html2md uses github.com/JohannesKaufmann/html-to-markdown to convert HTML into Markdown. That library relies on an HTML parser and avoids regexp as much as possible, which prevents some odd edge cases and lets it handle input whose structure is completely unknown.
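To illustrate the kind of edge case a regexp-based approach can run into, here is a minimal sketch with a naive `sed` substitution. It is purely illustrative: `naive_bold` is a made-up helper and is not taken from any of the libraries mentioned in this README. The greedy `.*` spans both bold elements, so the result is not valid Markdown.

```shell
# Naive regexp conversion of <b>...</b> to **...** (illustrative only).
naive_bold() {
  sed -E 's#<b>(.*)</b>#**\1**#'
}

echo '<b>one</b> and <b>two</b>' | naive_bold
# prints: **one</b> and <b>two**
```

A parser-based converter sees two separate elements instead of one long match, which is why this class of problem does not arise there.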
HTML to Markdown
Version 1.1.0 built on 2023-05-03
Copyright (C) 2020-2023, Tong Sun
HTML to Markdown converter on command line
Usage:
html2md [Options...]
Options:
-h, --help display help information
-i, --in *The html/xml file to read from (or stdin)
-d, --domain Domain of the web page, needed for links when --in is not url
-s, --sel CSS/goquery selectors [=body]
-x, --excl Excluding CSS/goquery selectors
--xc Excluding all children nodes
-v, --verbose Verbose mode (Multiple -v options increase the verbosity.)
--opt-heading-style Option HeadingStyle
--opt-horizontal-rule Option HorizontalRule
--opt-bullet-list-marker Option BulletListMarker
--opt-code-block-style Option CodeBlockStyle
--opt-fence Option Fence
--opt-em-delimiter Option EmDelimiter
--opt-strong-delimiter Option StrongDelimiter
--opt-link-style Option LinkStyle
--opt-link-reference-style Option LinkReferenceStyle
--opt-escape-mode Option EscapeMode
-A, --plugin-conf-attachment Plugin ConfluenceAttachments
-C, --plugin-conf-code Plugin ConfluenceCodeBlock
-F, --plugin-frontmatter Plugin FrontMatter
-G, --plugin-gfm Plugin GitHubFlavored
-S, --plugin-strikethrough Plugin Strikethrough
-T, --plugin-table Plugin Table
--plugin-table-compat Plugin TableCompat
-L, --plugin-task-list Plugin TaskListItems
-V, --plugin-vimeo Plugin VimeoEmbed
-Y, --plugin-youtube Plugin YoutubeEmbed
$ html2md -i https://github.com/suntong/html2md | head -3
[Skip to content](#start-of-content)
[Homepage](https://github.com/)
The most useful feature is passing a goquery selector to keep only the content you want.
$ html2md -i https://github.com/JohannesKaufmann/html-to-markdown -s "div.my-3"
[go](http://github.com/topics/go "Topic: go") [html](http://github.com/topics/html "Topic: html") [markdown](http://github.com/topics/markdown "Topic: markdown") [golang](http://github.com/topics/golang "Topic: golang") [converter](http://github.com/topics/converter "Topic: converter") [html-to-markdown](http://github.com/topics/html-to-markdown "Topic: html-to-markdown") [goquery](http://github.com/topics/goquery "Topic: goquery")
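The usage text also lists `-x, --excl` for excluding selectors, which can be combined with `-s`. Below is a minimal sketch of a wrapper for that, assuming `-x` drops matching nodes from the selected content, as the help text suggests. `html2md_part` is a hypothetical helper name, not part of html2md.

```shell
# Hypothetical helper: convert one part of a saved page, optionally
# dropping unwanted nodes. Only flags from the usage text are used.
# Usage: html2md_part FILE SELECTOR [EXCLUDE_SELECTOR]
html2md_part() {
  part_file=$1
  part_sel=$2
  part_excl=${3:-}
  if [ -n "$part_excl" ]; then
    html2md -i "$part_file" -s "$part_sel" -x "$part_excl"
  else
    html2md -i "$part_file" -s "$part_sel"
  fi
}
```

For example, `html2md_part page.html "article" "nav"` would run `html2md -i page.html -s article -x nav`. When reading from a local file, `--domain` may also be needed for links, per the usage text.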
Works as expected:
$ echo '<strong>Bold Text</strong>' | html2md -i
**Bold Text**
$ echo '<strong>Bold Text</strong>' | html2md -i --opt-strong-delimiter="__"
__Bold Text__
$ echo '<ul><li><input type=checkbox checked>Checked!</li><li><input type=checkbox>Check Me!</li></ul>' | html2md -i -G
- [x] Checked!
- [ ] Check Me!
$ echo 'Only <del>blue ones</del> <s> left</s>' | html2md -i --plugin-strikethrough
Only ~~blue ones~~ ~~left~~
$ cat $GOPATH/src/github.com/JohannesKaufmann/html-to-markdown/testdata/TestPlugins/table/input.html | html2md -i -T | head -6
| Firstname | Lastname | Age |
| --- | --- | --- |
| Jill | Smith | 50 |
| Eve | Jackson | 94 |
| Empty | | |
| End |
$ cat $GOPATH/src/github.com/JohannesKaufmann/html-to-markdown/testdata/TestPlugins/table/input.html | html2md -i -T --domain example.com | diff -wU 1 $GOPATH/src/github.com/JohannesKaufmann/html-to-markdown/testdata/TestPlugins/table/output.table.golden -
---
@@ -41 +41,2 @@
| `var` | b | c |
\ No newline at end of file
+
$ cat $GOPATH/src/github.com/JohannesKaufmann/html-to-markdown/testdata/TestPlugins/table/input.html | html2md -i --plugin-table-compat | head -6
Firstname · Lastname · Age
Jill · Smith · 50
Eve · Jackson · 94
$ cat $GOPATH/src/github.com/JohannesKaufmann/html-to-markdown/testdata/TestPlugins/table/input.html | html2md -i --plugin-table-compat --domain example.com | diff -wU 1 $GOPATH/src/github.com/JohannesKaufmann/html-to-markdown/testdata/TestPlugins/table/output.tablecompat.golden -
---
@@ -41 +41,2 @@
`var` · b · c
\ No newline at end of file
+
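The examples above convert one input at a time. For a folder of saved pages, a small loop can do the same job. This is a rough sketch, assuming `html2md` is on `PATH`; `convert_dir` is a hypothetical helper, not something html2md ships.

```shell
# Convert every *.html file in a directory to a sibling *.md file.
# convert_dir is a hypothetical helper, not part of html2md.
convert_dir() {
  conv_dir=${1:-.}
  for conv_f in "$conv_dir"/*.html; do
    # If nothing matches, the glob stays literal; skip it.
    [ -e "$conv_f" ] || continue
    html2md -i "$conv_f" > "${conv_f%.html}.md" || return 1
  done
}
```

Calling `convert_dir ./pages` would write `./pages/foo.md` next to `./pages/foo.html`. Extra flags such as `-G` or `-s` can be added to the `html2md` call as needed.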
- Johannes Kaufmann's html-to-markdown, which does the heavy lifting behind the scenes.
- turndown (js), a very good library written in JavaScript.
- lunny/html2md, which uses regex instead of goquery and exhibits a few edge cases, which prompted github.com/JohannesKaufmann/html-to-markdown.
- jaytaylor/html2text, which converts to plain text rather than Markdown.
sudo apt install -y html2md
- The latest binary executables are produced by the Continuous Integration (CI) process, i.e., they are built automatically from the source code at every git release by GitHub Actions.
- There are two ways to install these binary executables:
  - using the binary executables directly, or
  - using packages for your distro.
- The latest binary executables are directly available under https://github.com/suntong/html2md/releases/latest.
- Pick the one that suits your OS and its architecture. E.g., for Linux, it would be the html2md_verxx_linux_amd64.tar.gz file.
- Binary executables are available for:
  - Linux
  - Mac OS (darwin)
  - Windows
- If your OS and its architecture are not in the download list, please let me know and I'll add them.
- Manual installation just means unpacking the archive and moving or copying the binary executable to somewhere in PATH. For example,
tar -xvf html2md_*_linux_amd64.tar.gz
sudo mv -v html2md_*_linux_amd64/html2md /usr/local/bin/
rmdir -v html2md_*_linux_amd64
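The archive name in those commands depends on your platform. As a rough sketch, the glob can be derived from `uname`. The helper name `html2md_asset_glob` is made up here, and only the `linux_amd64` name is confirmed above; the `darwin` and `arm64` variants are assumptions, so check the release page for what actually exists.

```shell
# Print the release archive glob for a platform,
# e.g. html2md_*_linux_amd64.tar.gz.
# Arguments default to this machine's `uname -s` and `uname -m`.
html2md_asset_glob() {
  asset_os=${1:-$(uname -s)}
  asset_arch=${2:-$(uname -m)}
  case "$asset_os" in
    Linux)  asset_os=linux ;;
    Darwin) asset_os=darwin ;;     # assumption: same naming scheme as Linux
    *) echo "unsupported OS: $asset_os" >&2; return 1 ;;
  esac
  case "$asset_arch" in
    x86_64|amd64)  asset_arch=amd64 ;;
    aarch64|arm64) asset_arch=arm64 ;;   # assumption: may not be published
    *) echo "unsupported architecture: $asset_arch" >&2; return 1 ;;
  esac
  printf 'html2md_*_%s_%s.tar.gz\n' "$asset_os" "$asset_arch"
}
```

The printed pattern can then be used in place of the hard-coded name, e.g. `tar -xvf $(html2md_asset_glob)` in the directory holding the download.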
The URL of the repo setup instructions has been given above. For example, for Debian:
curl -1sLf \
'https://dl.cloudsmith.io/public/suntong/repo/setup.deb.sh' \
| sudo -E bash
# That's it. You then can do your normal operations, like
sudo apt update
apt-cache policy html2md
sudo apt install -y html2md
To install from source instead:
go install github.com/suntong/html2md@latest
Powered by WireFrame, the one-stop wire-framing solution for Go CLI-based projects, from init to deploy.
Thanks go to these wonderful people (emoji key):
- suntong 💻 🤔 🎨 🔣
- VPanteleev-S7 💻 🐛 📓
- itdoginfo 🐛 📓
- somename123 🐛 🤔 📓
- vivook 🐛 📓
- 097115 🐛 🤔 📓
- James Reynolds 👀 📢 📓

This project follows the all-contributors specification. Contributions of any kind are welcome!