A web service to transform web pages into markdown

akrabat/downmark

 
 

⬇️✅ downMARK

A webservice to translate a webpage into Markdown.

Disclaimer

This is a POC.

  • Copypasted code from the web
  • Ignored a lot of security concerns
  • Zero automatic tests
  • Poor documentation
  • Born unmaintained
  • From 0 to web in 3 hours of a lazy weekend
  • https://downmark.herokuapp.com is not assured to be online

Using my experimental webservice while it's publicly available is insecure and done at your own risk.

If you're OK with these facts and aware of what they imply, read on for some insufficient documentation.

Running the webservice

This is a little web service written in Node.js that runs on Express. You can deploy it wherever you want. Deploying on the Heroku free tier is as easy as

git clone https://github.com/pioneerskies/downmark.git
cd downmark
heroku create
git push heroku main

NOTE: you need to have the Heroku CLI installed

It also works when run locally on your machine at https://localhost:3000

git clone https://github.com/pioneerskies/downmark.git
cd downmark
yarn install
yarn start

NOTE: you need node and yarn (no engines restriction yet. YOLO)

Using the webservice - JSON

With the webservice running at some URL, send a GET request to URL/api/v1, passing the address of the page to convert in the query string's u parameter, e.g.:

https://downmark.herokuapp.com/api/v1?u=https://example.com

In the JSON response you'll get

  • title: the page title
  • content: the main content of the webpage in markdown format
  • excerpt: an excerpt of the content in plain text
  • byline: maybe the author (if heuristically found in page)

{
    "title": "Example Domain",
    "content": "This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.\n\n[More information...](https://www.iana.org/domains/example)",
    "excerpt": "This domain is for use in illustrative examples in documents. You may use this\n    domain in literature without prior coordination or asking for permission.",
    "byline": null
}

NOTE: by "main content" I mean the content as parsed by https://github.com/mozilla/readability
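
If you want to call the JSON endpoint from code, here is a minimal sketch. The function names and the serviceUrl parameter are made up for this example (they are not part of the repo), and it assumes a runtime with a global fetch (Node 18+ or a browser).

```javascript
// Build the request URL for the JSON endpoint described above.
// `serviceUrl` is the origin of your instance (hypothetical parameter name);
// any path prefix on it is ignored because '/api/v1' is an absolute path.
function buildApiUrl(serviceUrl, pageUrl) {
  const url = new URL('/api/v1', serviceUrl);
  url.searchParams.set('u', pageUrl); // percent-encodes the target URL
  return url.toString();
}

// Fetch and parse the JSON response. Pass your own `fetchImpl` if your
// runtime has no global fetch.
async function fetchMarkdown(serviceUrl, pageUrl, fetchImpl = fetch) {
  const response = await fetchImpl(buildApiUrl(serviceUrl, pageUrl));
  if (!response.ok) {
    throw new Error(`downMARK request failed with status ${response.status}`);
  }
  // Pick only the four fields documented above.
  const { title, content, excerpt, byline } = await response.json();
  return { title, content, excerpt, byline };
}
```

Usage would look like fetchMarkdown('https://downmark.herokuapp.com', 'https://example.com').then(console.log), as long as the public instance is online.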

Using the webservice - Obsidian

Save the resulting document directly into Obsidian. The file will be saved in the default vault, under the Clippings/ folder.

Endpoint:

https://downmark.herokuapp.com/obsidian

Params:

  • u: String (possibly encoded) URL to download
  • tags: [optional] Array(String) One or more tags. They will be appended to the default clippings tag.

Example:

https://downmark.herokuapp.com/?u=https%3A%2F%2Fwww.ilpost.it%2F2022%2F02%2F23%2Feuropa-debolezza-ucraina%2F&tags[]=foo&tags[]=bar
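
A request URL with the u and tags parameters can also be assembled in code. This is only a sketch: buildObsidianUrl is a hypothetical helper, and it assumes the server treats the percent-encoded key tags%5B%5D the same way as the literal tags[] shown in the example.

```javascript
// Assemble a request URL for the Obsidian endpoint.
// `endpoint` is the full endpoint URL of your instance (assumed parameter).
function buildObsidianUrl(endpoint, pageUrl, tags = []) {
  const url = new URL(endpoint);
  url.searchParams.set('u', pageUrl); // the page to clip, percent-encoded
  for (const tag of tags) {
    // One "tags[]" entry per tag, mirroring the array syntax of the example.
    url.searchParams.append('tags[]', tag);
  }
  return url.toString();
}
```

For example, buildObsidianUrl('https://downmark.herokuapp.com/obsidian', 'https://example.com', ['foo', 'bar']) yields a URL carrying one u parameter and two tags[] parameters.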

Bookmarklet

Create a bookmarklet in your browser with the following code: https://raw.githubusercontent.com/pioneerskies/downmark/main/bookmarklet/bookmarklet.js

NOTE: knowing how to create a bookmarklet is up to you and your search engine

NOTE: If you're running the webservice locally you can copy the development bookmarklet: https://raw.githubusercontent.com/pioneerskies/downmark/main/bookmarklet/bookmarklet.dev.js

Currently features are very limited:

  • the clipping will be created under the Clippings/ folder. Customization not implemented
  • vault customization is not implemented
  • the clipping will be created using Frontmatter for metadata

All these limitations would be easy to remove, but remember the disclaimer: this is a POC and a private experiment.
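
To give an idea of what "Frontmatter for metadata" and "under the Clippings/ folder" mean in practice, here is a rough sketch. It is not the code from this repo: the front matter field names (title, source, author) are guesses made for illustration, and the only thing taken from Obsidian itself is the obsidian://new URI with its file and content parameters.

```javascript
// Render a clipping with YAML front matter. Field names are illustrative
// guesses, not the format the service actually emits.
function toClipping({ title, content, byline }, sourceUrl, extraTags = []) {
  const tags = ['clippings', ...extraTags]; // default tag first, then yours
  const lines = [
    '---',
    `title: ${JSON.stringify(title)}`, // JSON strings are valid YAML scalars
    `source: ${JSON.stringify(sourceUrl)}`,
    ...(byline ? [`author: ${JSON.stringify(byline)}`] : []),
    `tags: [${tags.join(', ')}]`,
    '---',
    '',
    content,
  ];
  return lines.join('\n');
}

// Build an obsidian://new URI that creates the note under Clippings/.
// No vault parameter is set, so Obsidian picks the vault itself.
function toObsidianUri(fileName, markdown) {
  return 'obsidian://new?file=' + encodeURIComponent('Clippings/' + fileName) +
    '&content=' + encodeURIComponent(markdown);
}
```

Making the folder or the vault configurable would amount to turning the hard-coded 'Clippings/' prefix into a parameter and adding a vault parameter to the URI.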

Using the webservice - Download

Save on server

TODO

Download from browser

TODO

Backstory and credits

All the credit you could imagine goes to @kepano, who wrote this really smart version of the bookmarklet, and to @Moonbase59 for his frontmatter version.

I, as a user of that bookmarklet, experimented with this toy to overcome its main (and, for me, only) limitation: it doesn't work on sites that enforce a connect-src CSP (Content Security Policy) restriction. GitHub is just one example.

NOTE: kudos to all the websites implementing such a strict CSP; they're doing it for our security.

This version of the bookmarklet delegates all the work to the webservice, working around CSP restrictions (hopefully).
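
Why would delegating help? connect-src limits what a page's scripts may connect to (fetch, XHR, WebSocket and friends), but it does not govern top-level navigation. So a bookmarklet that only navigates to the webservice has nothing to fetch from the restricted page. The sketch below shows the idea; makeBookmarklet is a hypothetical helper and the real bookmarklet in this repo may well do things differently.

```javascript
// Generate a bookmarklet that sends the current page's URL to the service
// by navigating to it, instead of fetching anything from the page itself.
// `endpoint` is the service URL to navigate to (assumed parameter).
function makeBookmarklet(endpoint) {
  const body =
    `location.href=${JSON.stringify(endpoint)}+'?u='+encodeURIComponent(location.href);`;
  // Wrap in an IIFE so the bookmarklet leaves no globals behind.
  return `javascript:(function(){${body}})();`;
}
```

Calling makeBookmarklet('https://downmark.herokuapp.com/obsidian') returns a javascript: string you could paste into a bookmark's URL field.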

Customising the bookmarklet to connect to your webservice's instance

  • clone the repo
  • yarn install
  • edit bookmarklet/bookmarklet_src.js, changing the domain from downmark.herokuapp.com to your own
  • yarn build:bookmarklet
  • copy the resulting code from bookmarklet/bookmarklet.js

NOTE: you can also do this if you're running the webservice locally
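
If you'd rather script the "edit" step than do it by hand, something like the following could work. It is a sketch under assumptions: retargetBookmarkletSource is a hypothetical helper, and it assumes the default domain appears in the source file as the plain string downmark.herokuapp.com.

```javascript
// The domain hard-coded in the bookmarklet source, per the steps above.
const DEFAULT_HOST = 'downmark.herokuapp.com';

// Replace every occurrence of the default domain with your own host.
// Works on the file contents as a string; reading and writing the file
// (e.g. with fs.readFileSync / fs.writeFileSync) is left to the caller.
function retargetBookmarkletSource(source, newHost) {
  return source.split(DEFAULT_HOST).join(newHost);
}
```

After writing the result back to bookmarklet/bookmarklet_src.js, run yarn build:bookmarklet as described above. For a local instance you may also need to adjust the protocol in the source, which this helper does not touch.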

NAQ - Never Asked Questions

Q: Does it work on iOS Safari?

A: Yep, it does

Q: Does it work on the X browser?

A: It works on Firefox for macOS and on Safari for iOS. Other combinations have not been tested. Give it a try!

Q: Is it secure for me to test this solution using the https://downmark.herokuapp.com webservice?

A: I open-sourced this repo (winning against shame) so that you can read what the code does, but I cannot and do not want to demonstrate code parity between the repo and the deployed webservice. You have to take my word for it, and I don't mind if you can't. From the intrinsic code perspective, this code should be at least as secure as the original bookmarklet, but the code on https://downmark.herokuapp.com runs out of your sight: I could inject malicious code into the resulting markdown, redirect you to unwanted URLs, or even log the sites you're clipping. I'm not doing any of these things, but you should not put much weight on what I say I'm doing or not doing :) Just read the code, fork it, take the good ideas, share them, and give me feedback if you care to. No more, no less.
