Scrapy-Jupyter_Notebook

Description

A script built to crawl the IMDb website and scrape information about the best movies of all time, according to IMDb. It was made as a practice project to build a dataset from scratch that can be used for future analysis.

Access

https://github.com/BeatrizFerLim/Data-Science-Projects/tree/main/IMDB-scraping

Used technologies

Python - The source code is written in Python
Jupyter Notebook - The platform of choice to implement this project
Scrapy - A framework used to crawl and parse information (mainly) from websites

Preview of analysis

After saving the data to a JSON file, you can load it into a DataFrame with the pandas.read_json() function, which lets you analyze the data with any of the tools available in Python. The image below shows the file after it has been read with that function.
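As a rough illustration of that loading step, here is a minimal sketch. The file name movies.json, the field names (title, year, rating) and the sample records are all assumptions made for this example, not the actual output of the scraper. The helper load_movies is also hypothetical.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical records standing in for the scraped output.
# The real field names depend on how the spider was written.
sample_records = [
    {"title": "Movie A", "year": 1994, "rating": 9.3},
    {"title": "Movie B", "year": 1972, "rating": 9.2},
    {"title": "Movie C", "year": 2008, "rating": 9.0},
]


def load_movies(json_path):
    """Read a JSON file containing a list of records into a DataFrame."""
    return pd.read_json(json_path)


# Write the sample records to a temporary file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = Path(tmp_dir) / "movies.json"  # assumed file name
    json_path.write_text(json.dumps(sample_records), encoding="utf-8")
    movies = load_movies(json_path)

# Once loaded, the usual pandas tools apply, for example sorting by rating.
top_rated = movies.sort_values("rating", ascending=False).head(2)
print(top_rated)
```

With a real export, the temporary file would be replaced by the path to the JSON file produced by the crawl.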
