This repository stores the projects developed during my data science studies.

BeatrizFerLim/IMDB-Scraping

Scrapy-Jupyter_Notebook

Description

A script built to crawl the IMDb website and scrape information about the best movies of all time, according to IMDb. It was made as a practice project to build a dataset from scratch for future analysis.

Access

https://github.com/BeatrizFerLim/Data-Science-Projects/tree/main/IMDB-scraping

Used technologies

  • Python - The language the source code is written in
  • Jupyter Notebook - The platform chosen to implement this project
  • Scrapy - A framework used to crawl and parse information (mainly) from websites

Preview of analysis

After saving the data to a JSON file, you can load it into a DataFrame with the pandas.read_json() function, which lets you analyze the data with any of the tools available in Python. The image below shows the file after being read with that function: file_read_w_pandas
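
The loading step can be sketched as follows. The sample is placeholder data standing in for the scraped file, and the field names (`title`, `year`, `rating`) are assumptions, not necessarily the ones in this project's output. With a real file, you would pass its path to `pandas.read_json()` instead of the in-memory buffer.

```python
import io

import pandas as pd

# Placeholder records standing in for the scraped JSON file;
# the field names and values are assumptions, not real IMDb data.
sample_json = io.StringIO(
    """[
    {"title": "Movie A", "year": 1994, "rating": 8.1},
    {"title": "Movie B", "year": 1972, "rating": 9.0},
    {"title": "Movie C", "year": 2008, "rating": 8.5}
]"""
)

# Load the JSON array of objects into a DataFrame (one row per movie)
movies = pd.read_json(sample_json)

# Example of a first analysis step: order the movies by rating
top = movies.sort_values("rating", ascending=False)

print(movies.head())
print(top[["title", "rating"]])
```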
