Scraper for the IMDB Top 250 movies page
A spider built with the Scrapy scraping library crawls the IMDB Top 250 movies page (https://www.imdb.com/chart/top) and returns each film's name, rating, directors, genres and cast in JSON format.
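The per-movie extraction step of such a spider could look roughly like the sketch below. It assumes that each IMDB title page embeds a schema.org `Movie` object in a `<script type="application/ld+json">` tag. That is a swapped-in technique: the original spider may use CSS or XPath selectors instead, and IMDB's markup changes over time. The function names are hypothetical, and the sketch uses only the standard library so it runs without Scrapy.

```python
import json
from html.parser import HTMLParser


class LdJsonExtractor(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script contents may arrive in several chunks, so append them.
        if self._inside:
            self.blocks[-1] += data


def _as_list(value):
    # schema.org fields may hold a single value or a list of values.
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def _names(value):
    # People may be {"@type": "Person", "name": ...} objects or plain strings.
    return [p.get("name") if isinstance(p, dict) else p for p in _as_list(value)]


def movie_from_ld_json(text):
    """Map an (assumed) schema.org Movie JSON-LD object to the exported fields."""
    data = json.loads(text)
    return {
        "name": data.get("name"),
        "rating": (data.get("aggregateRating") or {}).get("ratingValue"),
        "directors": _names(data.get("director")),
        "genre": _as_list(data.get("genre")),
        "cast": _names(data.get("actor")),
    }


def parse_title_page(html):
    """Return the movie record from the first JSON-LD block, or None if absent."""
    parser = LdJsonExtractor()
    parser.feed(html)
    if not parser.blocks:
        return None
    return movie_from_ld_json(parser.blocks[0])
```

Inside a Scrapy callback, the script text could instead be taken with `response.css('script[type="application/ld+json"]::text').get()` and passed straight to `movie_from_ld_json`. Yielding the resulting dicts and running `scrapy crawl <spider-name> -o movies.json` would then write them to a JSON file through Scrapy's feed exports.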
The scraped data was then used for exploratory data analysis (EDA) and visualization, for example to count the movies in each genre and to find cast members who appear in several of the top 250 films.
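Both counts can be sketched with pandas by "exploding" the list-valued columns so that each (movie, genre) or (movie, actor) pair gets its own row. This is a minimal sketch, assuming each record carries `name`, `genre` and `cast` keys holding lists; the actual key names in the scraped JSON may differ, and the helper names are hypothetical.

```python
import pandas as pd


def genre_counts(movies):
    """Number of movies per genre; a movie with several genres counts once in each."""
    df = pd.DataFrame(movies)
    # One row per (movie, genre) pair; drop accidental repeats within a movie.
    pairs = df.explode("genre").drop_duplicates(["name", "genre"])
    return pairs["genre"].value_counts()


def common_cast(movies, min_films=2):
    """Cast members who appear in at least `min_films` of the movies."""
    df = pd.DataFrame(movies)
    # One row per (movie, actor) pair, so each actor counts once per film.
    pairs = df.explode("cast").drop_duplicates(["name", "cast"])
    counts = pairs["cast"].value_counts()
    return counts[counts >= min_films]
```

With the spider's output on disk, the records could be loaded with `pd.read_json("movies.json").to_dict("records")` (the file name is an assumption) and passed to either function.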
Visualizations produced as part of the EDA.
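A chart of movies per genre can be drawn from a genre-count table with plain Matplotlib. This is a minimal sketch, assuming the counts arrive as a pandas Series or a dict mapping genre to count; the function name and labels are hypothetical, and the original script may have used a Seaborn plot such as `barplot` instead.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the script also runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_genre_counts(counts, path=None):
    """Horizontal bar chart of movies per genre, largest bar at the top."""
    # Ascending order: barh draws the first bar at the bottom.
    counts = pd.Series(counts).sort_values()
    fig, ax = plt.subplots(figsize=(6, 0.4 * len(counts) + 1))
    ax.barh(counts.index, counts.values)
    ax.set_xlabel("Number of movies")
    ax.set_ylabel("Genre")
    ax.set_title("IMDB Top 250: movies per genre")
    fig.tight_layout()
    if path is not None:
        fig.savefig(path)
    return fig
```

Returning the `Figure` lets a notebook display it, while passing a `path` such as `"genres.png"` saves it to disk; calling `plt.close(fig)` afterwards frees the memory when many charts are generated in a loop.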
Technologies used => Python 3, Scrapy, pandas, Matplotlib, Seaborn
Contribution => Implemented a spider that scrapes data from the IMDB website, using the Scrapy documentation as a reference. Implemented a script that performs EDA on the scraped data using pandas, Matplotlib and Seaborn.
Major Learnings => Learnt how to write spiders with Scrapy, and how to preprocess scraped data and perform EDA on it.