TableShare is a lightweight Python library for extracting table data from web pages and converting it into pandas DataFrames. It supports scraping tables from online resources.
You can install TableShare via pip:
pip install tableshare
# All tables on a static page
import tableshare as ts
url = 'http://example.com/table-page'
df = ts.show_all_tables(url)  # the dynamic parameter defaults to False
print(df)
# A single table on a static page, selected by index
import tableshare as ts
url = 'http://example.com/table-page'
df = ts.get_the_table(url, 0)  # 0 is the table index, used here as an example
print(df)
# All tables on a page whose table data is loaded dynamically by JavaScript
import tableshare as ts
url = 'http://example.com/table-page'
df = ts.show_all_tables(url, dynamic=True)
print(df)
# A single table on a dynamically loaded page, selected by index
import tableshare as ts
url = 'http://example.com/table-page'
df = ts.get_the_table(url, 0, dynamic=True)  # 0 is the table index, used here as an example
print(df)
Scrape a single table or multiple tables from web pages.
Scrape a single table or multiple tables whose data is loaded dynamically by JavaScript.
Convert scraped table data into pandas DataFrames for further analysis and processing.
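To illustrate what "table index" means in the calls above, here is a minimal sketch that collects the tables of an HTML document in page order using only the Python standard library. It is not TableShare's implementation; the `TableParser` class and the sample HTML are hypothetical, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Hypothetical helper: collect every <table> as a list of rows."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per table, in page order
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


# Made-up sample page with two tables
sample_html = """
<table>
  <tr><th>Name</th><th>Score</th></tr>
  <tr><td>Ann</td><td>90</td></tr>
</table>
<table>
  <tr><td>x</td><td>y</td></tr>
</table>
"""

parser = TableParser()
parser.feed(sample_html)
first_table = parser.tables[0]  # index 0 is the first table on the page
print(first_table)
```

Each collected table is a list of rows, so a header row plus data rows can be passed to `pandas.DataFrame(rows[1:], columns=rows[0])` if a DataFrame is wanted.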
Make sure the target website's robots.txt allows crawler access when scraping online resources. For dynamically loaded table data, you may need to use tools like Selenium to retrieve the complete page content. If you encounter any issues or have suggestions for improvement while using TableShare, please submit an issue or pull request on the GitHub repository.
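The robots.txt check mentioned above can be sketched with the standard library's `urllib.robotparser`. This is one possible approach, not part of TableShare; the helper names `is_allowed` and `is_allowed_online` and the sample rules are hypothetical.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines, url, user_agent="*"):
    """Check a URL against robots.txt rules supplied as a list of lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


def is_allowed_online(url, user_agent="*"):
    """Download robots.txt from the site's root and check the URL against it."""
    parts = urlsplit(url)
    parser = RobotFileParser(f"{parts.scheme}://{parts.netloc}/robots.txt")
    parser.read()  # performs a network request
    return parser.can_fetch(user_agent, url)


# Offline demonstration with made-up rules
sample_rules = [
    "User-agent: *",
    "Disallow: /private/",
]
print(is_allowed(sample_rules, "http://example.com/table-page"))
print(is_allowed(sample_rules, "http://example.com/private/data"))
```

Calling `is_allowed_online(url)` before passing `url` to TableShare is one way to skip pages the site asks crawlers to avoid.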
TableShare is released under the MIT license. For more information, please see the LICENSE file.
If you have any questions or need assistance, please contact us via:
Email: baiguanba@outlook.com
GitHub Repository: https://github.com/baiguanba/tableshare
PyPI Package: https://pypi.org/project/tableshare/
If you find TableShare useful and would like to support its development, please consider starring the repository on GitHub.