Welcome to datahtml

Datahtml is a library for processing and extracting data from HTML and XML content.

Datahtml lets you:

  • Extract ld+json data from HTML
  • Extract frequently used meta tags from HTML (such as those used for SEO and social media)
  • Extract article data from an HTML page, usually from newspaper sites
  • Parse RSS feeds from sites
  • Crawl some specific social media sites such as Google and YouTube

Under the hood, datahtml uses libraries such as BeautifulSoup, Newspaper2k, and feedparser, among others.
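
To give a rough idea of what the first item in the list above involves, here is a minimal sketch of ld+json extraction. It deliberately does not use datahtml's own API (see the API reference for that); it relies only on the Python standard library, and the names ``LdJsonParser``, ``extract_ld_json`` and ``SAMPLE`` are hypothetical, chosen for illustration.

```python
import json
from html.parser import HTMLParser


class LdJsonParser(HTMLParser):
    """Collect the decoded contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_ldjson = False  # True while inside an ld+json script tag
        self._chunks = []        # raw text pieces of the current script body
        self.items = []          # decoded JSON documents found so far

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ldjson = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_ldjson:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ldjson:
            self._in_ldjson = False
            try:
                self.items.append(json.loads("".join(self._chunks)))
            except json.JSONDecodeError:
                pass  # skip blocks that are not valid JSON


def extract_ld_json(html):
    """Return a list with every ld+json document embedded in the given HTML."""
    parser = LdJsonParser()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical sample page, for illustration only.
SAMPLE = """
<html><head>
<script type="application/ld+json">{"@type": "NewsArticle", "headline": "Example"}</script>
<script>var x = 1;</script>
</head><body></body></html>
"""

if __name__ == "__main__":
    for item in extract_ld_json(SAMPLE):
        print(item["@type"], "-", item["headline"])
```

Plain ``<script>`` tags are ignored, and malformed JSON blocks are skipped rather than raising, which is usually the safer default when processing pages from arbitrary sites.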

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   api_reference


Indices and tables