This project is a basic data pipeline: extracting, transforming, loading, analysing and presenting. It was built with suitable tools for web scraping, databases, and data analysis and presentation.
- Create a crawler able to scrape offers and reviews from Extra web store, more specifically, offers and reviews about coolers, televisions and printers;
- Save the data in a database in an automated way;
- Analyze products and reviews data;
- Create a basic presentation using Extra offers information.
Getting information from a website programmatically requires a crawler (the crawler in the DS001 project was written in Python with Scrapy). Inspecting the Extra web store's source code and the requests in the browser reveals some API URLs being triggered. Using these API URLs makes the work easier.
Since review data can be extracted while scraping offer data, it makes sense to split the work into three spiders (coolers, televisions and printers) without creating additional spiders for reviews only. Review objects are bigger than offer objects, but the impact of scraping both in the same spider isn't too severe. The crawler itself saves the data to the MongoDB database using the files "pipelines.py" and "items.py".
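A pipeline that writes scraped items to MongoDB might look roughly like the sketch below. It is not the project's actual "pipelines.py": the setting names `MONGO_URI` and `MONGO_DATABASE`, the collection names `products` and `reviews`, the default database name `extra`, and the `review_id` field used to tell the two item types apart are all assumptions made for illustration.

```python
class MongoPipeline:
    """Store every scraped item in MongoDB (illustrative sketch)."""

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db
        self.client = None
        self.db = None

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook to build the pipeline from the project settings.
        # MONGO_URI and MONGO_DATABASE are assumed setting names.
        return cls(
            mongo_uri=crawler.settings.get("MONGO_URI", "mongodb://localhost:27017"),
            mongo_db=crawler.settings.get("MONGO_DATABASE", "extra"),
        )

    def open_spider(self, spider):
        # Imported here so the class can be loaded without pymongo installed.
        import pymongo

        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        # Assumption: review items carry a "review_id" field, offers do not.
        collection = "reviews" if "review_id" in item else "products"
        self.db[collection].insert_one(dict(item))
        # Return the item so any later pipeline still receives it.
        return item
```

Scrapy only runs a pipeline that is listed under `ITEM_PIPELINES` in "settings.py", so a class like this would have to be registered there with its full module path.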
Running the spiders with the command "scrapy crawl <<SPIDER_NAME>>":
So...
Data being saved in the MongoDB database:
- Product data format in the database:
- Review data format in the database:
I stopped the crawlers early because of the deadline to deliver the case 😳. So, the result... was about 31k documents saved in the MongoDB database.
MongoDB has its own tools for basic data analysis in the database:
In a Jupyter Notebook some incredible things can be done. Python is a really flexible and versatile programming language. Using libraries/packages like Matplotlib, Pandas, NumPy and Seaborn, a complete descriptive analysis is feasible.
- Exporting products data from MongoDB as CSV:
- Exporting reviews data from MongoDB as CSV:
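One way to do such an export from Python is sketched below, using only the standard `csv` module. This is an assumption about how the export could be done, not a record of how it was done here (MongoDB's own `mongoexport` tool can also write CSV). The helper name `export_to_csv` and the field names in the usage note are hypothetical.

```python
import csv


def export_to_csv(documents, path, fields):
    """Write an iterable of dict-like documents to a CSV file.

    Only the columns named in `fields` are written; missing values become
    empty cells. Returns the number of rows written.
    """
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        for doc in documents:
            # Keep only the requested fields (this also drops MongoDB's _id
            # unless it is explicitly listed in `fields`).
            writer.writerow({field: doc.get(field, "") for field in fields})
            count += 1
    return count
```

With pymongo, a cursor such as `db["products"].find()` yields plain dictionaries, so it could be passed directly as `documents`, for example `export_to_csv(db["products"].find(), "products.csv", ["sku", "name", "price"])`, where the collection and field names are again placeholders.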
The whole presentation was made in Power BI Desktop, an awesome tool for data visualization and presentation.
- Interactive charts presentation on a computer:
- Interactive charts presentation on a smartphone: