Datahtml is a library for processing and extracting data from HTML and XML content.

Datahtml lets you:

- Extract JSON-LD (``ld+json``) data from HTML
- Extract frequently used meta tags from HTML, such as those used for SEO and social media
- Extract article data from an HTML page, typically from news sites
- Parse RSS feeds from sites
- Crawl specific social media sites such as Google and YouTube

Under the hood, datahtml uses libraries such as BeautifulSoup, Newspaper2k, and feedparser, among others.
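
To give a feel for the kind of extraction listed above, here is a minimal sketch that pulls JSON-LD blocks and meta tags out of an HTML string. It deliberately uses only the Python standard library: it is not datahtml's API and does not use BeautifulSoup. The names ``MetadataParser``, ``extract_metadata`` and ``SAMPLE`` are hypothetical and exist only for this illustration.

.. code-block:: python

   import json
   from html.parser import HTMLParser


   class MetadataParser(HTMLParser):
       """Hypothetical helper: collects JSON-LD blocks and <meta> tags."""

       def __init__(self):
           super().__init__()
           self.json_ld = []  # parsed JSON-LD objects, in document order
           self.meta = {}     # meta "property" or "name" -> "content"
           self._in_json_ld = False
           self._buffer = []

       def handle_starttag(self, tag, attrs):
           attrs = dict(attrs)
           if tag == "script" and attrs.get("type") == "application/ld+json":
               # Start buffering the raw script body.
               self._in_json_ld = True
               self._buffer = []
           elif tag == "meta":
               # Open Graph tags use "property"; classic SEO tags use "name".
               key = attrs.get("property") or attrs.get("name")
               if key and attrs.get("content") is not None:
                   self.meta[key] = attrs["content"]

       def handle_data(self, data):
           if self._in_json_ld:
               self._buffer.append(data)

       def handle_endtag(self, tag):
           if tag == "script" and self._in_json_ld:
               self._in_json_ld = False
               try:
                   self.json_ld.append(json.loads("".join(self._buffer)))
               except json.JSONDecodeError:
                   pass  # skip malformed JSON-LD blocks


   def extract_metadata(html):
       """Return (json_ld_objects, meta_tags) found in an HTML string."""
       parser = MetadataParser()
       parser.feed(html)
       parser.close()
       return parser.json_ld, parser.meta


   SAMPLE = """
   <html><head>
   <meta property="og:title" content="Hello">
   <meta name="description" content="A sample page">
   <script type="application/ld+json">{"@type": "Article", "headline": "Hello"}</script>
   </head><body></body></html>
   """

   json_ld, meta = extract_metadata(SAMPLE)

In this sketch, ``json_ld`` ends up as a list of decoded JSON-LD objects and ``meta`` as a dictionary keyed by the tag's ``property`` or ``name`` attribute. Real-world pages are often malformed, which is one reason a more forgiving parser such as BeautifulSoup is a sensible choice for a library like this one.
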

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   api_reference