Script built to crawl the IMDB website and scrape information about the best movies of all time, according to IMDB. It was made as a practice project: building a dataset from scratch to allow future analysis.
https://github.com/BeatrizFerLim/Data-Science-Projects/tree/main/IMDB-scraping
- Python - The source code is written in Python
- Jupyter Notebook - The platform of choice to implement this project
- Scrapy - A framework used to crawl websites and parse (mainly) their information
After saving the data to a JSON file, you can load it into a DataFrame with the pandas.read_json() function, which lets you analyze the data with any of the tools available in Python. Below is an image showing the file after being read with the pandas function mentioned earlier:
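A minimal sketch of that loading step follows. The records and the `movies.json` filename are hypothetical stand-ins for the scraped file, whose real fields may differ; the file is written to a temporary directory only so the example runs on its own.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample records; the real scraped file may use other fields.
records = [
    {"title": "Movie A", "year": 1994, "rating": 9.3},
    {"title": "Movie B", "year": 1972, "rating": 9.2},
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "movies.json"
    # Write a JSON array of objects, the layout Scrapy's JSON export produces.
    path.write_text(json.dumps(records), encoding="utf-8")
    # read_json builds one row per object and one column per key.
    df = pd.read_json(path)

# Example analysis: list the movies from highest to lowest rating.
print(df.sort_values("rating", ascending=False))
```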