Web scraping is the process of automatically extracting data from websites, web pages, and online documents. It involves using software or algorithms to navigate through websites, locate and extract specific data, and then store it in a structured format for further analysis or use.
Below are some real-world examples of why we need data from the internet.
- Day-to-day use case
- In today's digital age, we mostly shop online. To buy something, we often scroll through numerous websites and pages, comparing the price, quality, and reliability of the products we wish to purchase.
- Organizational Use Cases
- Organizations collect and analyze online data for decision-making, identifying market trends and patterns, gathering customer feedback on products, and so on.
- We can use web scraping techniques to collect this data from the internet.
Python offers plenty of packages for web scraping, such as:
- Beautiful Soup
- MechanicalSoup
- Selenium
- Scrapy
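As a minimal offline sketch of the extraction step these packages automate, here is the standard library's `html.parser` pulling trip names and prices out of an HTML fragment. The HTML snippet and the `trip`/`name`/`price` class names are invented for illustration; real pages need real selectors.

```python
# Offline sketch of HTML extraction using only the standard library.
# The markup and class names below are invented sample data.
from html.parser import HTMLParser

SAMPLE_HTML = """
<div class="trip"><span class="name">Ladakh Road Trip</span><span class="price">18,999</span></div>
<div class="trip"><span class="name">Spiti Valley</span><span class="price">15,499</span></div>
"""

class TripParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.current = None   # class of the <span> we are inside, if any
        self.trips = []       # collected (name, price) pairs
        self._name = None

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.current = dict(attrs).get("class")

    def handle_data(self, data):
        if self.current == "name":
            self._name = data.strip()
        elif self.current == "price":
            self.trips.append((self._name, data.strip()))

    def handle_endtag(self, tag):
        if tag == "span":
            self.current = None

parser = TripParser()
parser.feed(SAMPLE_HTML)
print(parser.trips)
```

Packages like Beautiful Soup do the same traversal with far less boilerplate; this only shows what is happening underneath.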
We can also collect data via APIs; in some cases, you need to find hidden APIs (visible in the browser's developer tools, under the Network tab) to request the data from a website.
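Once a hidden API is found, its JSON response can be parsed directly, which is usually much easier than parsing HTML. The sketch below works on a hypothetical response body; the endpoint and field names are invented, and in practice you would fetch the body over HTTP (e.g. with `urllib.request`).

```python
# Hypothetical example: a site's hidden API might return JSON like this.
# The field names are invented; inspect the real response in your browser's
# Network tab. In practice the body would come from an HTTP request.
import json

response_body = """
{"trips": [
  {"name": "Meghalaya Backpacking", "price": 12999, "discount": 15},
  {"name": "Kashmir Winter Trip",   "price": 16499, "discount": 10}
]}
"""

data = json.loads(response_body)
for trip in data["trips"]:
    final_price = trip["price"] * (100 - trip["discount"]) / 100
    print(f"{trip['name']}: {final_price:.0f} after {trip['discount']}% off")
```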
Here I have used Selenium to scrape data about trips from the Go4Explore website.
What you need to scrape data from Go4Explore:
- Python IDE: to write and run the code
- Selenium: a Python package used to automate the browser and scrape data.
- Web driver: a tool that opens and controls the browser of your choice (Chrome, Firefox, Edge, or Safari). Whichever browser you use, the driver must match your browser version and operating system.
You can find all the information about Selenium here: https://selenium-python.readthedocs.io/
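A typical Selenium scrape has the shape below. This is only a sketch: the CSS selectors are placeholders that must be replaced with the real ones found by inspecting the Go4Explore pages, and running it requires `pip install selenium` plus a driver matching your browser version. The `selenium` import is deferred into the function so the price-parsing helper runs on its own.

```python
# Sketch of a Selenium scrape. The CSS selectors (.trip-card, .trip-name,
# .price) are placeholders -- inspect the actual page to find the real ones.

def parse_price(text: str) -> int:
    """Turn a displayed price like 'INR 18,999' into an integer."""
    digits = "".join(ch for ch in text if ch.isdigit())
    return int(digits) if digits else 0

def scrape_trips(url: str) -> list:
    # Imported here so parse_price stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()  # needs a ChromeDriver matching your Chrome
    try:
        driver.get(url)
        trips = []
        for card in driver.find_elements(By.CSS_SELECTOR, ".trip-card"):
            trips.append({
                "name": card.find_element(By.CSS_SELECTOR, ".trip-name").text,
                "price": parse_price(
                    card.find_element(By.CSS_SELECTOR, ".price").text),
            })
        return trips
    finally:
        driver.quit()  # always close the browser, even on errors

# scrape_trips("<Go4Explore listing page URL>")  # not run here; needs a live browser
```

The `try`/`finally` ensures the browser is closed even if a selector fails mid-scrape, which otherwise leaves orphaned browser processes behind.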
To see how I did the scraping, open the go4Explore.ipynb file for the full code.
I set out to find trip destinations, along with their prices and discounts. Specifically, I collected:
- Trip Name
- Prices
- Discounts
- Duration of the trip
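As noted at the start, scraped data should end up in a structured format for further analysis. One simple option is writing the collected fields to CSV with the standard library; the rows below are invented sample values standing in for real scraped results.

```python
# Save the scraped fields (trip name, price, discount, duration) as CSV.
# The two rows are invented sample data for illustration.
import csv

trips = [
    {"trip_name": "Ladakh Road Trip", "price": 18999,
     "discount": "15%", "duration": "7D/6N"},
    {"trip_name": "Spiti Valley", "price": 15499,
     "discount": "10%", "duration": "6D/5N"},
]

with open("trips.csv", "w", newline="") as f:
    writer = csv.DictWriter(
        f, fieldnames=["trip_name", "price", "discount", "duration"])
    writer.writeheader()      # column names as the first row
    writer.writerows(trips)   # one row per scraped trip
```

From here the file loads directly into a spreadsheet or a pandas DataFrame for price and discount analysis.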