IMDb Scraper

A simple web scraper that extracts metadata from the IMDb page of a movie or show.

Getting Started (main.py)

[ ! ] Before you begin

Install the required dependencies:

pip install beautifulsoup4
pip install requests

Make sure all files are in the same directory.
Do not scrape multiple titles in quick succession: IMDb may throttle your connection or ban your IP. Add a random delay between requests using time.sleep(), as shown in main.py.
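
The random delay described above can be sketched as follows. This is a minimal illustration, not the code from main.py: the helper name `scrape_politely`, the 5 to 15 second range, and the injectable `sleep` parameter are all assumptions chosen for the example.

```python
import random
import time


def scrape_politely(titles, scrape, min_delay=5.0, max_delay=15.0, sleep=time.sleep):
    """Call scrape() on each title, pausing a random interval between calls.

    `scrape` is any callable that takes one title or URL, for example the
    `scrape` method of an IMDb_Scraper instance. The delay range is an
    illustrative assumption, not a limit documented by IMDb.
    """
    results = []
    for index, title in enumerate(titles):
        if index > 0:
            # Wait before every request except the first one.
            sleep(random.uniform(min_delay, max_delay))
        results.append(scrape(title))
    return results
```

With a scraper instance at hand, `scrape_politely(["Morbius"], scraper.scrape)` would behave like a single scrape() call, and longer lists would get one pause between each pair of requests.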

Usage

Run the included main.py file, or create your own instance:

# main.py
import IMDbScraper

# Create an instance
scraper = IMDbScraper.IMDb_Scraper()

# Scrape by title name, or by IMDb title page URL
scraper.scrape("Morbius")
scraper.scrape("https://www.imdb.com/title/tt5108870/?ref_=fn_al_tt_1")

# Output
# Title: Morbius
# Type: Movie
# Year: 2022
# Runtime: 1h 44m
# Date:  April 1, 2022
# Age Rating: PG-13
# Genre: Action, Adventure, Horror, Sci-Fi, Thriller
# Cast: Jared Leto, Matt Smith, Adria Arjona, Jared Harris
# Directed by: Daniel Espinosa
# Writers: Matt Sazama, Burk Sharpless
# Keywords: vampire, based on comic, marvel comics, superhero, blood

Attributes

Attribute	Data type
title	str
original_title	str
title_type	str
year	int
end_year	int
day	int
month	int
date	str
runtime	int
age_rating	str
imdb_rating	int
votes	int
plot	str
poster_url	str
trailer_url	str
url	str
genre	list
cast	list
directors	list
writers	list
keywords	list
countries*	list
languages*	list
locations*	list

* Results for the marked attributes may not be 100% accurate.

Functions

1. scrape(str)

Takes the name of a movie or show, or an IMDb title page URL (for example, https://www.imdb.com/title/tt0111161). Returns a dictionary containing all extracted metadata.
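
Because scrape() accepts either a name or a URL, one way to tell the two apart is to look for an IMDb title ID (`tt` followed by digits) in the input. The sketch below is an assumption about how such a check could work, not the scraper's actual logic. `extract_title_id` and `TITLE_ID_PATTERN` are hypothetical names.

```python
import re
from typing import Optional

# IMDb title page URLs carry an ID such as tt0111161 after /title/.
TITLE_ID_PATTERN = re.compile(r"imdb\.com/title/(tt\d+)")


def extract_title_id(text: str) -> Optional[str]:
    """Return the IMDb title ID if text is a title page URL, else None."""
    match = TITLE_ID_PATTERN.search(text)
    return match.group(1) if match else None
```

Under this sketch, an input for which `extract_title_id` returns None would be treated as a name to search for rather than a page to fetch directly.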

2. format_runtime(int)

Converts a duration in seconds to the equivalent hours and minutes and returns it as a formatted string.

format_runtime(5570)

# Returns a string
# 1h 32m
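
The conversion can be sketched with integer division. This is an illustrative version that reproduces the example above, not necessarily the author's implementation. The name `format_runtime_sketch` is hypothetical, and dropping leftover seconds is an assumption.

```python
def format_runtime_sketch(seconds: int) -> str:
    """Format a duration in seconds as hours and minutes, e.g. '1h 32m'."""
    total_minutes = seconds // 60  # leftover seconds are dropped
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes}m"


print(format_runtime_sketch(5570))  # 1h 32m
```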

3. print_metadata()

Prints the scraped metadata in a readable format.

4. to_string(list)

Joins the items of a list into a single comma-separated string, which is returned.

my_list = ["spam", "eggs", "foo", "bar"]
to_string(my_list)

# Returns a string
# spam, eggs, foo, bar
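
A minimal sketch of this behavior, assuming the separator is a comma followed by a space as in the example above. The name `to_string_sketch` is hypothetical, and converting items with str() is an added assumption so that non-string items also work.

```python
def to_string_sketch(items: list) -> str:
    """Join list items into one comma-separated string."""
    # str() lets the sketch accept non-string items as well.
    return ", ".join(str(item) for item in items)


print(to_string_sketch(["spam", "eggs", "foo", "bar"]))  # spam, eggs, foo, bar
```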

5. generate_webpage()

Creates a simple webpage from the scraped data, including the poster and trailer.
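
As a rough idea of what such a page could look like, the sketch below builds an HTML string from a metadata dictionary. It is not the project's generate_webpage() code: the function name `render_page` and the page layout are hypothetical, and it assumes the dictionary uses the keys `title`, `plot`, `poster_url`, and `trailer_url` from the attribute table.

```python
import html


def render_page(metadata: dict) -> str:
    """Build a minimal HTML page from a metadata dictionary."""
    # Escape every value so text from the scraped page cannot inject markup.
    title = html.escape(str(metadata.get("title", "")))
    plot = html.escape(str(metadata.get("plot", "")))
    poster = html.escape(str(metadata.get("poster_url", "")), quote=True)
    trailer = html.escape(str(metadata.get("trailer_url", "")), quote=True)
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{title}</title></head><body>\n"
        f"<h1>{title}</h1>\n"
        f'<img src="{poster}" alt="Poster">\n'
        f"<p>{plot}</p>\n"
        f'<p><a href="{trailer}">Watch the trailer</a></p>\n'
        "</body></html>\n"
    )
```

The returned string can be written to a file with open(path, "w", encoding="utf-8") and opened in a browser.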

