GitHub - KbearW/Web-scraping: A bot using Python with BeautifulSoup that scrapes the IRS website (prior form publication) by form number and returns the results as JSON. It provides the option to download PDFs over a range of years.

Extract Data from IRS website:

A Python bot built with BeautifulSoup that scrapes the IRS website (prior form publication) by form number and returns the results as JSON. If the user chooses to, it also downloads a copy of the respective PDF into a folder.

How to run the script:

This script runs on Python 3.8. Install the libraries listed in requirements.txt into a new environment, then run 'Script.py'.

The script will ask you for a form number, then scrape the IRS website.

To search for several forms, separate the tax form numbers with a comma followed by a space, such as:

--> Form W-2, Form 1095-C
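As an illustration of the input format above, here is a minimal sketch of how the comma-separated input could be split into individual form numbers. The function name `parse_form_numbers` is hypothetical and is not taken from Script.py; the actual script may parse its input differently.

```python
from typing import List


def parse_form_numbers(raw: str) -> List[str]:
    """Split input such as 'Form W-2, Form 1095-C' into form numbers.

    Hypothetical helper, not taken from Script.py.
    """
    # Split on commas, trim surrounding whitespace, and drop empty entries
    # so that stray commas or extra spaces do not produce blank searches.
    return [part.strip() for part in raw.split(",") if part.strip()]


print(parse_form_numbers("Form W-2, Form 1095-C"))
```

Splitting on the comma alone and trimming each part afterwards also tolerates input where the space after the comma is missing.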

The results will be returned as a JSON string. If there are no results, you'll get a 'No results' message instead.

Sample output: [{'form_number': 'Form W-2', 'form_title': 'Wage and Tax Statement (Info Copy Only)', 'max_year': '2022', 'min_year': '1954'}, {'form_number': 'Form 1095-C', 'form_title': 'Employer-Provided Health Insurance Offer and Coverage', 'max_year': '2022', 'min_year': '2014'}]
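The sample output reports one entry per form with the earliest and latest year found. A rough sketch of how such a summary might be built from scraped rows is shown below. The row shape `(form_number, form_title, year)` and the names `summarize` and `to_output` are assumptions made for illustration, and the rows are illustrative values rather than real scrape output. The sketch matches form numbers exactly, on the assumption that a search may also return related forms.

```python
import json
from typing import Dict, List, Tuple

# Assumed row shape: (form_number, form_title, year). Not taken from Script.py.
Row = Tuple[str, str, int]


def summarize(rows: List[Row], wanted: str) -> List[Dict[str, str]]:
    """Return a one-item summary for `wanted`, or an empty list if absent."""
    # Keep only rows whose form number matches exactly (case-insensitive).
    matches = [r for r in rows if r[0].lower() == wanted.lower()]
    if not matches:
        return []
    years = [r[2] for r in matches]
    # Take the number and title from the most recent matching row.
    latest = max(matches, key=lambda r: r[2])
    return [{
        "form_number": latest[0],
        "form_title": latest[1],
        "max_year": str(max(years)),
        "min_year": str(min(years)),
    }]


def to_output(rows: List[Row], wanted_forms: List[str]) -> str:
    """Serialize summaries for all requested forms, or return 'No results'."""
    results: List[Dict[str, str]] = []
    for wanted in wanted_forms:
        results.extend(summarize(rows, wanted))
    return json.dumps(results) if results else "No results"


# Illustrative rows only; the years for Form W-2C are made up.
rows = [
    ("Form W-2", "Wage and Tax Statement (Info Copy Only)", 2022),
    ("Form W-2", "Wage and Tax Statement (Info Copy Only)", 1954),
    ("Form W-2C", "Corrected Wage and Tax Statement", 2014),
]
print(to_output(rows, ["Form W-2"]))
```

Note that `json.dumps` emits double-quoted keys and values, so its output is valid JSON, whereas the sample above is shown in Python's single-quoted form.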

Note: To keep users engaged, the bot displays which task it is performing and which URL it is currently searching.
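The optional PDF download over a range of years could report its progress in the same way. The following is a sketch under stated assumptions, not the script's actual implementation: the row shape `(form_number, year, pdf_url)`, the file naming scheme, and the names `plan_downloads` and `download_all` are all hypothetical, and the URLs are placeholders.

```python
import os
from typing import List, Optional, Tuple
from urllib.request import urlretrieve

# Assumed row shape: (form_number, year, pdf_url). Not taken from Script.py.
PdfRow = Tuple[str, int, str]


def plan_downloads(rows: List[PdfRow], form_number: str,
                   start_year: int, end_year: int,
                   folder: Optional[str] = None) -> List[Tuple[str, str]]:
    """Pick the PDFs for one form within [start_year, end_year], inclusive."""
    folder = folder or form_number  # assumed default: one folder per form
    plan = []
    for number, year, url in rows:
        if number == form_number and start_year <= year <= end_year:
            target = os.path.join(folder, f"{form_number} - {year}.pdf")
            print(f"Queued {url} -> {target}")  # progress message
            plan.append((url, target))
    return plan


def download_all(plan: List[Tuple[str, str]]) -> None:
    """Fetch every planned PDF, creating target folders as needed."""
    for url, target in plan:
        os.makedirs(os.path.dirname(target), exist_ok=True)
        print(f"Downloading {url}")  # progress message
        urlretrieve(url, target)


# Placeholder URLs for illustration only.
pdf_rows = [
    ("Form W-2", 2019, "https://example.invalid/fw2--2019.pdf"),
    ("Form W-2", 2020, "https://example.invalid/fw2--2020.pdf"),
    ("Form W-2", 2022, "https://example.invalid/fw2--2022.pdf"),
]
plan = plan_downloads(pdf_rows, "Form W-2", 2019, 2020)
```

Separating the selection step from the network step keeps the year filtering easy to check without making any requests; `download_all(plan)` would then perform the actual downloads.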
