5j9/html-table-parse

HTML Table Parse

A lightweight HTML table parser that converts tables to Python data structures without pandas.

Installation

pip install html-table-parse

Usage

from html_table_parse import to_list, to_dict, to_dicts

html = """
<table>
    <tr><th>Name</th><th>Age</th><th>City</th></tr>
    <tr><td>Alice</td><td>30</td><td>NYC</td></tr>
    <tr><td>Bob</td><td>25</td><td>LA</td></tr>
</table>
"""

# List of lists
to_list(html)
# [['Name', 'Age', 'City'], ['Alice', '30', 'NYC'], ['Bob', '25', 'LA']]

# Dictionary of columns
to_dict(html)
# {'Name': ['Alice', 'Bob'], 'Age': ['30', '25'], 'City': ['NYC', 'LA']}

# List of dictionaries
to_dicts(html)
# [{'Name': 'Alice', 'Age': '30', 'City': 'NYC'}, 
#  {'Name': 'Bob', 'Age': '25', 'City': 'LA'}]
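
In the outputs above every cell value is a string (`'30'`, not `30`). When numeric values are needed, the conversion can be applied afterwards. The following is a minimal sketch that works on the `to_dicts` output shown above; `cast_column` is a hypothetical helper written for this example, not part of the package.

```python
# Records copied from the to_dicts() output shown above.
records = [
    {"Name": "Alice", "Age": "30", "City": "NYC"},
    {"Name": "Bob", "Age": "25", "City": "LA"},
]


def cast_column(rows, key, convert):
    """Return new records with rows[i][key] passed through `convert`.

    Hypothetical helper for illustration only; not part of html_table_parse.
    """
    return [{**row, key: convert(row[key])} for row in rows]


typed = cast_column(records, "Age", int)
# typed[0]["Age"] is now the integer 30; the other fields are unchanged.
```

The helper builds new dictionaries rather than mutating the parsed records, so the original string values remain available.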

Features

  • No pandas required - lightweight alternative to pandas.read_html()
  • Supports colspan and rowspan attributes
  • Handles duplicate headers (auto-numbered)
  • Multiple output formats: lists, dict of columns, or list of dicts
  • Automatic whitespace normalization
  • Fast parsing with lxml
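
To illustrate what supporting `colspan` and `rowspan` involves, here is a minimal sketch of one way to expand spanned cells into a plain grid. It is not the package's implementation: `expand_spans` and its `(text, colspan, rowspan)` input format are hypothetical, and repeating the cell text in every spanned position is an assumption about the desired output.

```python
def expand_spans(rows):
    """Expand rows of (text, colspan, rowspan) cells into rows of strings.

    Hypothetical sketch: spanned positions repeat the cell's text.
    Assumes well-formed spans (no overlaps, no gaps at the end of a row).
    """
    grid = []
    pending = {}  # column index -> (text, rows still to fill) from a rowspan
    for row in rows:
        out = []
        col = 0
        cells = iter(row)
        cell = next(cells, None)
        while cell is not None or col in pending:
            if col in pending:
                # This column is occupied by a cell spanning down from above.
                text, remaining = pending[col]
                out.append(text)
                if remaining > 1:
                    pending[col] = (text, remaining - 1)
                else:
                    del pending[col]
                col += 1
                continue
            text, colspan, rowspan = cell
            for _ in range(colspan):
                out.append(text)
                if rowspan > 1:
                    pending[col] = (text, rowspan - 1)
                col += 1
            cell = next(cells, None)
        grid.append(out)
    return grid


# "A" spans two rows, "B" spans two columns.
spanned = [
    [("A", 1, 2), ("B", 2, 1)],
    [("C", 1, 1), ("D", 1, 1)],
]
grid = expand_spans(spanned)
# grid == [['A', 'B', 'B'], ['A', 'C', 'D']]
```

Tracking pending row spans per column keeps every output row the same width, which is what makes the header-based formats (`to_dict`, `to_dicts`) line up.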

API

to_list(html: str, index: int = 0) -> list[list]

Parse table as list of rows.
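
The `index` parameter presumably selects which table in the document to parse, counting from 0 in document order; that reading is an assumption based on the signature. The sketch below shows the idea using only the standard library's `html.parser` (the package itself uses lxml). `sketch_to_list` and `_TableCollector` are hypothetical names, and nested tables are not handled.

```python
from html.parser import HTMLParser


class _TableCollector(HTMLParser):
    """Collect every <table> as a list of rows of cell text (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._cell = None  # list of text fragments while inside <td>/<th>

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag in ("td", "th") and self.tables and self.tables[-1]:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace and trim the ends.
            text = " ".join("".join(self._cell).split())
            self.tables[-1][-1].append(text)
            self._cell = None


def sketch_to_list(html, index=0):
    """Return the rows of the table at position `index` (0-based)."""
    collector = _TableCollector()
    collector.feed(html)
    collector.close()
    return collector.tables[index]


two_tables = (
    "<table><tr><th>a</th></tr></table>"
    "<table><tr><th> x </th><th>y</th></tr>"
    "<tr><td>1</td><td>  2\n</td></tr></table>"
)
second = sketch_to_list(two_tables, index=1)
# second == [['x', 'y'], ['1', '2']]
```

The sketch also shows one plausible form of whitespace normalization: splitting on whitespace and rejoining with single spaces.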

to_dict(html: str, index: int = 0) -> dict[str, list]

Parse table as dictionary of columns (first row = headers).
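
The relationship between rows and columns can be sketched in plain Python. The example below is an illustration, not the package's code: `rows_to_columns` and `dedupe_headers` are hypothetical helpers, and the `_2`, `_3` suffix used for duplicate headers is an assumed numbering scheme, since the exact format is not specified here. Rows are assumed to be rectangular.

```python
def dedupe_headers(headers):
    """Make header names unique by numbering repeats (assumed scheme: Name, Name_2)."""
    seen = {}
    unique = []
    for name in headers:
        seen[name] = seen.get(name, 0) + 1
        unique.append(name if seen[name] == 1 else f"{name}_{seen[name]}")
    return unique


def rows_to_columns(rows):
    """Treat the first row as headers and return {header: [column values]}."""
    header, *body = rows
    names = dedupe_headers(header)
    return {name: [row[i] for row in body] for i, name in enumerate(names)}


table_rows = [
    ["Name", "Age", "City"],
    ["Alice", "30", "NYC"],
    ["Bob", "25", "LA"],
]
columns = rows_to_columns(table_rows)
# columns == {'Name': ['Alice', 'Bob'], 'Age': ['30', '25'], 'City': ['NYC', 'LA']}
```

Renaming duplicates matters for this format in particular: two columns with the same header would otherwise collide on a single dictionary key and one of them would be lost.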

to_dicts(html: str, index: int = 0) -> list[dict]

Parse table as list of dictionaries (first row = headers).
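
As a rough picture of this format, each data row is paired with the header row. The sketch below assumes rectangular rows and unique headers; `rows_to_records` is a hypothetical helper written for this example, not the package's implementation.

```python
def rows_to_records(rows):
    """Treat the first row as headers and return one dict per remaining row."""
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


parsed_rows = [
    ["Name", "Age", "City"],
    ["Alice", "30", "NYC"],
    ["Bob", "25", "LA"],
]
people = rows_to_records(parsed_rows)
# people == [{'Name': 'Alice', 'Age': '30', 'City': 'NYC'},
#            {'Name': 'Bob', 'Age': '25', 'City': 'LA'}]
```

A table that contains only a header row yields an empty list under this scheme.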
