Skip to content

God-damnit-all/html2md

 
 

Repository files navigation

html2md

All Contributors

MIT License GoDoc Go Report Card Build Status PoweredBy WireFrame

TOC

html2md - HTML to Markdown converter

The html2md makes use of github.com/JohannesKaufmann/html-to-markdown to convert HTML into Markdown, which is using an HTML Parser to avoid the use of regexp as much as possible, which can prevent some weird cases and allows it to be used for cases where the input is totally unknown.

gopher stading on top of a machine that converts a box of html to blocks of markdown

Usage

$ html2md

HTML to Markdown
Version 1.1.0 built on 2023-05-03
Copyright (C) 2020-2023, Tong Sun

HTML to Markdown converter on command line

Usage:
  html2md [Options...]

Options:

  -h, --help                       display help information 
  -i, --in                        *The html/xml file to read from (or stdin) 
  -d, --domain                     Domain of the web page, needed for links when --in is not url 
  -s, --sel                        CSS/goquery selectors [=body]
  -x, --excl                       Excluding CSS/goquery selectors 
      --xc                         Excluding all children nodes 
  -v, --verbose                    Verbose mode (Multiple -v options increase the verbosity.) 

      --opt-heading-style          Option HeadingStyle 
      --opt-horizontal-rule        Option HorizontalRule 
      --opt-bullet-list-marker     Option BulletListMarker 
      --opt-code-block-style       Option CodeBlockStyle 
      --opt-fence                  Option Fence 
      --opt-em-delimiter           Option EmDelimiter 
      --opt-strong-delimiter       Option StrongDelimiter 
      --opt-link-style             Option LinkStyle 
      --opt-link-reference-style   Option LinkReferenceStyle 
      --opt-escape-mode            Option EscapeMode 

  -A, --plugin-conf-attachment     Plugin ConfluenceAttachments 
  -C, --plugin-conf-code           Plugin ConfluenceCodeBlock 
  -F, --plugin-frontmatter         Plugin FrontMatter 
  -G, --plugin-gfm                 Plugin GitHubFlavored 
  -S, --plugin-strikethrough       Plugin Strikethrough 
  -T, --plugin-table               Plugin Table 
      --plugin-table-compat        Plugin TableCompat 
  -L, --plugin-task-list           Plugin TaskListItems 
  -V, --plugin-vimeo               Plugin VimeoEmbed 
  -Y, --plugin-youtube             Plugin YoutubeEmbed

Examples

Simplest form

$ html2md -i https://github.com/suntong/html2md | head -3
[Skip to content](#start-of-content)

[Homepage](https://github.com/)

Using goquery

The most useful feature is to use and pass a goquery selection to filter for the content you want.

$ html2md -i https://github.com/JohannesKaufmann/html-to-markdown -s "div.my-3"
[go](http://github.com/topics/go "Topic: go") [html](http://github.com/topics/html "Topic: html") [markdown](http://github.com/topics/markdown "Topic: markdown") [golang](http://github.com/topics/golang "Topic: golang") [converter](http://github.com/topics/converter "Topic: converter") [html-to-markdown](http://github.com/topics/html-to-markdown "Topic: html-to-markdown") [goquery](http://github.com/topics/goquery "Topic: goquery")

The options and plugins

Works as expected:

$ echo '<strong>Bold Text</strong>' | html2md -i
**Bold Text**

$ echo '<strong>Bold Text</strong>' | html2md -i --opt-strong-delimiter="__"
__Bold Text__


$ echo '<ul><li><input type=checkbox checked>Checked!</li><li><input type=checkbox>Check Me!</li></ul>' | html2md -i -G
- [x] Checked!
- [ ] Check Me!

$ echo 'Only <del>blue ones</del> <s> left</s>' | html2md -i --plugin-strikethrough
Only ~~blue ones~~ ~~left~~

Testing the new table plugins

$ cat $GOPATH/src/github.com/JohannesKaufmann/html-to-markdown/testdata/TestPlugins/table/input.html | html2md -i -T | head -6
| Firstname | Lastname | Age |
| --- | --- | --- |
| Jill | Smith | 50 |
| Eve | Jackson | 94 |
| Empty |  |  |
| End |

$ cat $GOPATH/src/github.com/JohannesKaufmann/html-to-markdown/testdata/TestPlugins/table/input.html | html2md -i -T --domain example.com | diff -wU 1 $GOPATH/src/github.com/JohannesKaufmann/html-to-markdown/testdata/TestPlugins/table/output.table.golden -
---
@@ -41 +41,2 @@
 | `var` | b | c |
\ No newline at end of file
+

$ cat $GOPATH/src/github.com/JohannesKaufmann/html-to-markdown/testdata/TestPlugins/table/input.html | html2md -i --plugin-table-compat | head -6
Firstname · Lastname · Age

Jill · Smith · 50

Eve · Jackson · 94

$ cat $GOPATH/src/github.com/JohannesKaufmann/html-to-markdown/testdata/TestPlugins/table/input.html | html2md -i --plugin-table-compat --domain example.com | diff -wU 1 $GOPATH/src/github.com/JohannesKaufmann/html-to-markdown/testdata/TestPlugins/table/output.tablecompat.golden -
---
@@ -41 +41,2 @@
 `var` · b · c
\ No newline at end of file
+

Credits

Credits

Similar Projects

Install Debian/Ubuntu package

sudo apt install -y html2md

Download/install binaries

  • The latest binary executables are available as the result of the Continuous-Integration (CI) process.
  • I.e., they are built automatically right from the source code at every git release by GitHub Actions.
  • There are two ways to get/install such binary executables
    • Using the binary executables directly, or
    • Using packages for your distro

The binary executables

  • The latest binary executables are directly available under
    https://github.com/suntong/html2md/releases/latest
  • Pick & choose the one that suits your OS and its architecture. E.g., for Linux, it would be the html2md_verxx_linux_amd64.tar.gz file.
  • Available OS for binary executables are
    • Linux
    • Mac OS (darwin)
    • Windows
  • If your OS and its architecture is not available in the download list, please let me know and I'll add it.
  • The manual installation is just to unpack it and move/copy the binary executable to somewhere in PATH. For example,
tar -xvf html2md_*_linux_amd64.tar.gz
sudo mv -v html2md_*_linux_amd64/html2md /usr/local/bin/
rmdir -v html2md_*_linux_amd64

Distro package

The repo setup instruction url has been given above. For example, for Debian --

Debian package

curl -1sLf \
  'https://dl.cloudsmith.io/public/suntong/repo/setup.deb.sh' \
  | sudo -E bash

# That's it. You then can do your normal operations, like

sudo apt update
apt-cache policy html2md

sudo apt install -y html2md

Install Source

To install the source code instead:

go install github.com/suntong/html2md@latest

Author

Tong SUN
suntong from cpan.org

Powered by WireFrame
PoweredBy WireFrame
the one-stop wire-framing solution for Go cli based projects, from init to deploy.

Contributors ✨

Thanks goes to these wonderful people (emoji key):

suntong
suntong

💻 🤔 🎨 🔣 ⚠️ 🐛 📖 📝 💡 🔧 📦 👀 💬 🚧 🚇
VPanteleev-S7
VPanteleev-S7

💻 🐛 📓
itdoginfo
itdoginfo

🐛 📓
somename123
somename123

🐛 🤔 📓
vivook
vivook

🐛 📓
097115
097115

🐛 🤔 📓
James Reynolds
James Reynolds

👀 📢 📓

This project follows the all-contributors specification. Contributions of any kind welcome!

About

HTML to Markdown converter

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Go 99.4%
  • Shell 0.6%