A lightweight HTML table parser that converts tables to Python data structures without pandas.
pip install html-table-parse
from html_table_parse import to_list, to_dict, to_dicts
html = """
<table>
<tr><th>Name</th><th>Age</th><th>City</th></tr>
<tr><td>Alice</td><td>30</td><td>NYC</td></tr>
<tr><td>Bob</td><td>25</td><td>LA</td></tr>
</table>
"""
# List of lists
to_list(html)
# [['Name', 'Age', 'City'], ['Alice', '30', 'NYC'], ['Bob', '25', 'LA']]
# Dictionary of columns
to_dict(html)
# {'Name': ['Alice', 'Bob'], 'Age': ['30', '25'], 'City': ['NYC', 'LA']}
# List of dictionaries
to_dicts(html)
# [{'Name': 'Alice', 'Age': '30', 'City': 'NYC'},
# {'Name': 'Bob', 'Age': '25', 'City': 'LA'}]
- No pandas required: a lightweight alternative to pandas.read_html()
- Supports colspan and rowspan attributes
- Handles duplicate headers (auto-numbered)
- Multiple output formats: lists, dict of columns, or list of dicts
- Automatic whitespace normalization
- Fast parsing with lxml
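To illustrate what colspan/rowspan support and whitespace normalization involve, here is a minimal standalone sketch. It does not use this library or reflect its internals: the helper names `normalize` and `expand_spans` are hypothetical, cells are assumed to be already extracted as `(text, colspan, rowspan)` tuples, and repeating a spanned cell's text in every position it covers is an assumed policy, not documented behavior.

```python
def normalize(text):
    """Collapse runs of whitespace into single spaces and trim both ends."""
    return " ".join(text.split())


def expand_spans(rows):
    """Expand (text, colspan, rowspan) cells into a grid of plain strings.

    Assumption: a spanned cell's text is repeated in each covered position.
    """
    grid = []
    carry = {}  # column index -> (text, rows still to fill) from rowspans above
    for row in rows:
        out = []
        cells = iter(row)
        col = 0
        while True:
            if col in carry:
                # This column is occupied by a rowspan from an earlier row.
                text, remaining = carry[col]
                out.append(text)
                if remaining > 1:
                    carry[col] = (text, remaining - 1)
                else:
                    del carry[col]
                col += 1
                continue
            cell = next(cells, None)
            if cell is None:
                break
            text, colspan, rowspan = cell
            text = normalize(text)
            for _ in range(colspan):
                out.append(text)
                if rowspan > 1:
                    carry[col] = (text, rowspan - 1)
                col += 1
        grid.append(out)
    return grid


rows = [
    [("Name", 1, 1), ("Contact", 2, 1)],
    [("Alice", 1, 2), ("a@x.org", 1, 1), ("555-0100", 1, 1)],
    [("  a@y.org ", 1, 1), ("555\n 0101", 1, 1)],
]
grid = expand_spans(rows)
```

With this input, the "Contact" header fills two columns and "Alice" is carried down into the third row, so every row ends up with three entries.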
to_list(html): Parse the table as a list of rows.
to_dict(html): Parse the table as a dictionary of columns (first row = headers).
to_dicts(html): Parse the table as a list of dictionaries (first row = headers).
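The three output formats are closely related: both dictionary forms can be derived from the list of rows by treating the first row as headers. The sketch below shows that relationship in plain Python. The function names are hypothetical and are not part of this library, and the `_2`, `_3` suffix used for duplicate headers is an assumed numbering scheme, since the exact format is not specified here.

```python
def unique_headers(headers):
    """Suffix repeated header names so each can serve as a dict key.

    Assumed scheme: the first occurrence is kept, later ones get _2, _3, ...
    """
    seen = {}
    result = []
    for name in headers:
        seen[name] = seen.get(name, 0) + 1
        result.append(name if seen[name] == 1 else f"{name}_{seen[name]}")
    return result


def rows_to_columns(rows):
    """Turn a list of rows into a dictionary of columns."""
    headers = unique_headers(rows[0])
    return {h: [row[i] for row in rows[1:]] for i, h in enumerate(headers)}


def rows_to_records(rows):
    """Turn a list of rows into a list of per-row dictionaries."""
    headers = unique_headers(rows[0])
    return [dict(zip(headers, row)) for row in rows[1:]]


rows = [["Name", "Age", "City"], ["Alice", "30", "NYC"], ["Bob", "25", "LA"]]
columns = rows_to_columns(rows)
records = rows_to_records(rows)
```

Without some form of header de-duplication, a repeated column name would silently overwrite an earlier column in either dictionary form, which is why auto-numbering matters for these two outputs.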