🎉 DS001 - Scraping to Analysis (Extra Store)

The present project is a basic process pipeline of extracting, transforming, loading, analyzing and presenting data. All of it was done using suitable web scraping, data analysis/presentation and database tools.

[cover image]


Objectives:

  • Create a crawler able to scrape offers and reviews from the Extra web store, more specifically offers and reviews about coolers, televisions and printers;
  • Save the data in a database in an automated way;
  • Analyze the product and review data;
  • Create a basic presentation using the Extra offers information.

💻 Step 1. Code code... and code

To gather website information programmatically, a crawler is needed (the crawler in the DS001 project was made with Python and Scrapy). Looking at the Extra web store's source code and the requests fired in the browser, we can find some API URLs being triggered. Using those API URLs makes the work much easier.
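As a rough idea of what such a spider can look like, here is a minimal sketch. The endpoint and the response field names are placeholders, not the actual Extra API; the real URL has to be copied from the browser's network tab.

```python
import scrapy


class CoolersSpider(scrapy.Spider):
    """Minimal sketch of a spider that queries a store search API instead of parsing HTML."""
    name = "coolers"
    # Hypothetical API endpoint; replace with the real URL seen in the browser.
    start_urls = ["https://api.example-store.com/v1/search?query=cooler&page=1"]

    def parse(self, response):
        payload = response.json()
        # Field names here are assumptions; adjust them to the actual API response.
        for product in payload.get("products", []):
            yield {
                "name": product.get("name"),
                "price": product.get("price"),
                "url": product.get("url"),
            }
```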

🛣️ Step 2. Choose a way to scrape and save the data

Since review data can be extracted while scraping offer data, a good approach is to split the work into three spiders (coolers, televisions and printers) without creating additional spiders for reviews only. Review objects are bigger than offer objects, so the impact of scraping both together in each spider isn't too severe. The crawler itself saves the data to a MongoDB database through the files "pipelines.py" and "items.py", as sketched below.
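Here is a minimal sketch of how an item pipeline can write scraped items straight to MongoDB. The connection string and the database/collection names are assumptions; a real project would typically read them from the Scrapy settings via from_crawler().

```python
import pymongo


class MongoPipeline:
    """Sketch of a Scrapy item pipeline that persists items to MongoDB."""

    def __init__(self, mongo_uri="mongodb://localhost:27017", db_name="extra_store"):
        # Connection details are assumptions; in practice they come from settings.
        self.client = pymongo.MongoClient(mongo_uri)
        self.db = self.client[db_name]

    def process_item(self, item, spider):
        # One collection per item type, e.g. "products" or "reviews" (names assumed).
        collection = "reviews" if "review" in type(item).__name__.lower() else "products"
        self.db[collection].insert_one(dict(item))
        return item

    def close_spider(self, spider):
        self.client.close()
```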

🕷️ Step 3. Run the spiders

Running the spiders with the command "scrapy crawl <SPIDER_NAME>":

[screenshot: step_3.1]

So...

[screenshot: step_3.2]
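As an alternative to launching each spider by hand, Scrapy also lets you run all three from a single Python script. The spider names below are assumptions based on the three categories mentioned above.

```python
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Spider names are assumptions based on the coolers/televisions/printers split.
process = CrawlerProcess(get_project_settings())
for spider_name in ("coolers", "televisions", "printers"):
    process.crawl(spider_name)
process.start()  # blocks until all three spiders finish
```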

💾 Step 4. Wait...

Data being saved in the MongoDB database:

[screenshot: step_4]

  • Product data format in the database:

[screenshot: step_4.1]

  • Review data format in the database:

[screenshot: step_4.2]

I stopped the crawlers early due to the deadline for delivering the case 😳. So, the result... was about 31k documents saved in the MongoDB database.

[screenshot: step_4.3]

🕶️ Step 5. Looking for a first understanding of the data

MongoDB has its own tools for basic data analysis inside the database:

[screenshot: step_5]
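The same kind of quick checks can also be done from Python with PyMongo. This is only a sketch: the connection string, database name, collection names and fields ("rating", "product_id") are assumptions, not the project's actual schema.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # connection string is an assumption
db = client["extra_store"]                         # database/collection names are placeholders

# Quick sanity check: how many documents landed in each collection.
print(db["products"].count_documents({}))
print(db["reviews"].count_documents({}))

# Small aggregation: top 5 products by average rating (assumes a "rating" field).
pipeline = [
    {"$group": {"_id": "$product_id", "avg_rating": {"$avg": "$rating"}}},
    {"$sort": {"avg_rating": -1}},
    {"$limit": 5},
]
for doc in db["reviews"].aggregate(pipeline):
    print(doc)
```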

📈 Step 6. Making a deeper descriptive analysis

In a Jupyter Notebook some incredible things can be done. Python is a really flexible and versatile programming language. Using libraries/packages like Matplotlib, Pandas, NumPy and Seaborn, a complete descriptive analysis is within reach.

[screenshot: step_6]
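For a flavor of what the notebook analysis can look like, here is a small sketch using Pandas and Seaborn. The collection and column names ("price", "rating") are assumptions, not the project's actual schema.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from pymongo import MongoClient

# Load the scraped collections into DataFrames (names are assumptions).
db = MongoClient("mongodb://localhost:27017")["extra_store"]
products = pd.DataFrame(list(db["products"].find({}, {"_id": 0})))
reviews = pd.DataFrame(list(db["reviews"].find({}, {"_id": 0})))

# Basic descriptive statistics, assuming numeric "price" and "rating" columns.
print(products["price"].describe())
print(reviews["rating"].value_counts(normalize=True))

# One quick visual: the price distribution.
sns.histplot(products["price"].dropna())
plt.show()
```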

🎨 Step 7. Exporting data and making a simple presentation

  • Exporting product data from MongoDB as CSV:

[screenshot: step_7.1]

  • Exporting review data from MongoDB as CSV:

[screenshot: step_7.2]
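The screenshots above use MongoDB's own export tooling; a Python alternative that produces the same CSVs could look like this sketch (connection and collection names are assumptions):

```python
import pandas as pd
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["extra_store"]  # names are assumptions

# Dump each collection to a CSV file that Power BI can import.
for name in ("products", "reviews"):
    df = pd.DataFrame(list(db[name].find({}, {"_id": 0})))
    df.to_csv(f"{name}.csv", index=False)
```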

The whole presentation was made in Power BI Desktop, an awesome tool for data visualization and presentation.

  • Interactive charts presentation on a computer:

[screenshot: step_7.3]

  • Interactive charts presentation on a smartphone:

[screenshot: step_7.3]

🚀 The end.
