A set of custom scripts for web scraping
Imagine you have to pull a large amount of data from websites and you want to do it as quickly as possible. How would you do it without manually visiting each website and copying the data? Well, web scraping is the answer: a script fetches the pages and extracts the data for you, which makes the job much easier and faster.
As for whether web scraping is legal or not, some websites allow scraping and some don't. To check what a website allows, look at its “robots.txt” file, which lists the parts of the site that crawlers may or may not access. The file lives at the root of the site, so you can find it by appending “/robots.txt” to the site's domain (for example, https://example.com/robots.txt).
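Python's standard library can read these rules for you. The following is a minimal sketch using `urllib.robotparser.RobotFileParser`. The robots.txt content and the `my-scraper` user agent are hypothetical examples. They are parsed from a string so the snippet runs offline.

```python
# Minimal sketch: check robots.txt rules with Python's standard library.
# SAMPLE_ROBOTS_TXT and the "my-scraper" user agent are hypothetical examples.
from urllib.robotparser import RobotFileParser

SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
# For a real site you would instead point the parser at the live file:
#   parser.set_url("https://example.com/robots.txt")
#   parser.read()
parser.parse(SAMPLE_ROBOTS_TXT.splitlines())

for url in ("https://example.com/products", "https://example.com/private/data"):
    allowed = parser.can_fetch("my-scraper", url)
    print(url, "->", "allowed" if allowed else "disallowed")
```

Python's parser applies the first rule whose path matches. That is why the more specific `Disallow: /private/` line comes before the catch-all `Allow: /` in this sample.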
Most of the time I use Python for web scraping because it is easy, simple and powerful, and it has many libraries for the job:
- BeautifulSoup: Beautiful Soup is a Python package for parsing HTML and XML documents. It builds a parse tree that makes it easy to extract data.
To install:
pip install beautifulsoup4
Docs: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
- Selenium: Selenium is a web testing library. It is used to automate browser activities.
To install:
pip install selenium
Docs: https://www.selenium.dev/documentation/en/
- Pandas: Pandas is a library for data manipulation and analysis. In scraping scripts it is used to organize the extracted data and store it in the desired format, such as CSV.
To install:
pip install pandas
Docs: https://pandas.pydata.org/docs/
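Here is a minimal sketch of BeautifulSoup and Pandas working together. BeautifulSoup pulls the rows out of an HTML table, and Pandas turns them into a DataFrame and CSV text. The HTML snippet, the `products` table id and the `table_to_dataframe` helper are hypothetical. In a real script the HTML would come from the page being scraped, for example an HTTP response body or Selenium's `driver.page_source`.

```python
# Minimal sketch (hypothetical sample data): BeautifulSoup extracts a table,
# Pandas stores it. In a real script, SAMPLE_HTML would be the scraped page.
from bs4 import BeautifulSoup
import pandas as pd

SAMPLE_HTML = """
<table id="products">
  <tr><th>name</th><th>price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>19.50</td></tr>
</table>
"""


def table_to_dataframe(html: str, table_id: str) -> pd.DataFrame:
    """Hypothetical helper: turn the <table> with the given id into a DataFrame."""
    # "html.parser" is Python's built-in parser, so no extra parser package is needed.
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        raise ValueError(f"no <table id={table_id!r}> found")
    rows = table.find_all("tr")
    # The first row holds the column headers.
    headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    # Every following row holds one record's cells.
    records = [[td.get_text(strip=True) for td in row.find_all("td")] for row in rows[1:]]
    return pd.DataFrame(records, columns=headers)


df = table_to_dataframe(SAMPLE_HTML, "products")
df["price"] = df["price"].astype(float)  # scraped cell text is a string; convert it
csv_text = df.to_csv(index=False)  # with no path argument, to_csv returns a string
print(csv_text)
```

To write a file instead, pass a path, e.g. `df.to_csv("products.csv", index=False)`. The helper assumes the header row uses `<th>` cells and the data rows use `<td>` cells. Real pages often need selectors adapted to their markup.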