
# **Web Scraping & Data Handling Challenge**



### **Website:**
JustWatch -  https://www.justwatch.com/in/movies?release_year_from=2000


### **Description:**

JustWatch is a popular platform that allows users to search for movies and TV shows across multiple streaming services like Netflix, Amazon Prime, Hulu, etc. For this assignment, you will be required to scrape movie and TV show data from JustWatch using Selenium, Python, and BeautifulSoup. Extract data from HTML, not by directly calling their APIs. Then, perform data filtering and analysis using Pandas, and finally, save the results to a CSV file.

### **Tasks:**

**1. Web Scraping:**

Use BeautifulSoup to scrape the following data from JustWatch:

   **a. Movie Information:**

      - Movie title
      - Release year
      - Genre
      - IMDb rating
      - Streaming services available (Netflix, Amazon Prime, Hulu, etc.)
      - URL to the movie page on JustWatch

   **b. TV Show Information:**

      - TV show title
      - Release year
      - Genre
      - IMDb rating
      - Streaming services available (Netflix, Amazon Prime, Hulu, etc.)
      - URL to the TV show page on JustWatch

  **c. Scope:**

```
   - Scrape data for at least 50 movies and 50 TV shows.
   - You can choose the entry point (e.g., starting with popular movies,
     or a specific genre, etc.) to ensure a diverse dataset.
```


**2. Data Filtering & Analysis:**

   After scraping the data, use Pandas to perform the following tasks:

   **a. Filter movies and TV shows based on specific criteria:**

   ```
      - Only include movies and TV shows released in the last 2 years (from the current date).
      - Only include movies and TV shows with an IMDb rating of 7 or higher.
```

   **b. Data Analysis:**

   ```
      - Calculate the average IMDb rating for the scraped movies and TV shows.
      - Identify the top 5 genres that have the highest number of available movies and TV shows.
      - Determine the streaming service with the most significant number of offerings.
      
   ```   
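
One possible way to compute these three figures with pandas is sketched below. It assumes the rating has already been converted to a number and that genres and services are stored as comma-separated text. The helper `count_items` and the values in the `streaming_service` column are hypothetical and serve only to make the example runnable.

```python
import pandas as pd

# Illustrative rows; the streaming_service values are made up for this sketch
df = pd.DataFrame({
    "title": ["Sam Bahadur", "Kannagi", "Blood Diamond"],
    "genres": [
        "War & Military, Drama, History",
        "Romance",
        "Action & Adventure, Drama, Mystery & Thriller",
    ],
    "imdb_rating": [7.9, 7.8, 8.0],
    "streaming_service": ["Zee5", "Amazon Prime Video", "Netflix, Amazon Prime Video"],
})


def count_items(series, sep=", "):
    """Split comma-separated text into one item per row and count each item."""
    return series.str.split(sep).explode().str.strip().value_counts()


# Average rating across all rows
average_rating = df["imdb_rating"].mean()

# The five most frequent genres
top_genres = count_items(df["genres"]).head(5)

# The service that appears on the most titles
top_service = count_items(df["streaming_service"]).idxmax()

print(round(average_rating, 2))
print(top_genres)
print(top_service)
```

Exploding the comma-separated values first means a title listed under several genres or services counts once for each of them.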

**3. Data Export:**

```
   - Dump the filtered and analysed data into a CSV file for further processing and reporting.

   - Keep the CSV file in your Drive folder and share the Drive link in the Colab notebook, with view access for anyone who has the link.
```
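
A short sketch of the export step, assuming the filtered data is already in a DataFrame. The file name `justwatch_filtered.csv` and the temporary directory are placeholders; in Colab the path would point at a mounted Drive folder instead.

```python
import os
import tempfile

import pandas as pd

# Illustrative stand-in for the filtered DataFrame
filtered = pd.DataFrame({
    "title": ["12th Fail", "Sam Bahadur"],
    "year": [2023, 2023],
    "imdb_rating": [9.2, 7.9],
})

# Hypothetical output path used only for this example
csv_path = os.path.join(tempfile.gettempdir(), "justwatch_filtered.csv")

# index=False keeps the row index out of the file
filtered.to_csv(csv_path, index=False)

# Read the file back to confirm the columns survived the round trip
round_trip = pd.read_csv(csv_path)
print(round_trip.columns.tolist())
```

Writing with the default `index=True` stores the row index as an unnamed first column, which pandas labels `Unnamed: 0` when the file is read back.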

**Submission:**
```
- Submit a link to your Colab made for the assignment.

- The Colab should contain your Python script (.py format only) with clear
  comments explaining the scraping, filtering, and analysis process.

- Your code should run without errors, from the first cell to the last, in one go.

- Place your dataset's Drive link in the notebook before the conclusion.
```



**Note:**

1. Properly handle errors and exceptions during web scraping to ensure a robust script.

2. Make sure your code is well-structured, easy to understand, and follows Python best practices.

3. The assignment will be evaluated based on the correctness of the scraped data, accuracy of data filtering and analysis, and the overall quality of the Python code.
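
As a sketch of what the first note could look like in practice, the helpers below wrap the request and the tag lookup so that a failed download or a missing element yields `None` instead of stopping the run. The names `fetch_soup` and `text_or_none` and the 10-second timeout are assumptions, not part of the assignment.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url, session=None, timeout=10):
    """Return the parsed HTML for `url`, or None if the request fails."""
    session = session or requests.Session()
    try:
        response = session.get(url, timeout=timeout)
        # Treat 4xx and 5xx responses as failures
        response.raise_for_status()
    except requests.RequestException as exc:
        print(f"Skipping {url}: {exc}")
        return None
    return BeautifulSoup(response.content, "html.parser")


def text_or_none(tag):
    """Return the stripped text of a tag, or None when the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else None
```

A scraping loop can then call `soup = fetch_soup(link)`, skip the link when `soup` is `None`, and read fields with calls such as `text_or_none(soup.find("h1"))`.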








# **Start The Project**

## **Task 1: Web Scraping**

In [None]:
# Install all necessary libraries
!pip install bs4
!pip install requests

Collecting bs4
  Downloading bs4-0.0.2-py2.py3-none-any.whl (1.2 kB)
Installing collected packages: bs4
Successfully installed bs4-0.0.2


In [None]:
# Import all necessary libraries
import requests
from bs4 import BeautifulSoup
import re
import pandas as pd
import numpy as np

## **Scraping Movie Data**

In [None]:
# URL of the JustWatch listing page from which movie data will be fetched
url = 'https://www.justwatch.com/in/movies?release_year_from=2000'

# Send an HTTP GET request to the URL
page = requests.get(url)
# Parse the HTML content using BeautifulSoup with the built-in 'html.parser'
soup = BeautifulSoup(page.text, 'html.parser')
# Print the prettified HTML content
print(soup.prettify())

## **Fetching Movie URLs**

In [None]:
# Collect the absolute URL of every movie on the listing page
movie_links = []
data_dict = dict()

url = 'https://www.justwatch.com/in/movies?release_year_from=2000'
response = requests.get(url)
tags = BeautifulSoup(response.content, "html.parser")

# Each title card is an <a> tag whose href is a site-relative path
links = tags.find_all("a", class_="title-list-grid__item--link")
for link in links:
    text = link['href']
    movie_links.append('https://www.justwatch.com' + text.strip())

data_dict['link'] = movie_links
# pd.DataFrame(data_dict)

In [None]:
pd.DataFrame(data_dict)


Unnamed: 0,link
0,https://www.justwatch.com/in/movie/animal-2022
1,https://www.justwatch.com/in/movie/12th-fail
2,https://www.justwatch.com/in/movie/salaar
3,https://www.justwatch.com/in/movie/sam-bahadur
4,https://www.justwatch.com/in/movie/aquaman-and...
...,...
95,https://www.justwatch.com/in/movie/pathaan
96,https://www.justwatch.com/in/movie/kannagi
97,https://www.justwatch.com/in/movie/blood-diamond
98,https://www.justwatch.com/in/movie/ala-vaikunt...
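
The link-building step above can also be written with `urllib.parse.urljoin`, which handles the join between the site root and a relative `href`. The sketch below additionally drops blank and repeated entries. The helper name `absolute_links` and the sample `href` values are illustrative assumptions.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.justwatch.com"


def absolute_links(hrefs, base=BASE_URL):
    """Turn relative hrefs into absolute URLs, dropping blanks and duplicates."""
    seen = set()
    result = []
    for href in hrefs:
        href = (href or "").strip()
        if not href:
            continue
        full = urljoin(base, href)
        # Keep the first occurrence so the listing order is preserved
        if full not in seen:
            seen.add(full)
            result.append(full)
    return result


# Sample hrefs shaped like the ones on the listing page
sample_hrefs = ["/in/movie/animal-2022", "/in/movie/12th-fail", "/in/movie/animal-2022", ""]
links = absolute_links(sample_hrefs)
print(links)
```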


## **Scraping Movie Title**

In [None]:
# Visit each movie page and read the title from its <h1> heading
movie_title = []
for link in movie_links:
    response = requests.get(link)
    tags = BeautifulSoup(response.content, "html.parser")

    # Extract the required data from the page and append it to list
    title = tags.find("h1").text.strip()
    movie_title.append(title)

data_dict['title'] = movie_title
# pd.DataFrame(data_dict)

In [None]:
pd.DataFrame(data_dict)


Unnamed: 0,link,title
0,https://www.justwatch.com/in/movie/animal-2022,Animal
1,https://www.justwatch.com/in/movie/12th-fail,12th Fail
2,https://www.justwatch.com/in/movie/salaar,Salaar
3,https://www.justwatch.com/in/movie/sam-bahadur,Sam Bahadur
4,https://www.justwatch.com/in/movie/aquaman-and...,Aquaman and the Lost Kingdom
...,...,...
95,https://www.justwatch.com/in/movie/pathaan,Pathaan
96,https://www.justwatch.com/in/movie/kannagi,Kannagi
97,https://www.justwatch.com/in/movie/blood-diamond,Blood Diamond
98,https://www.justwatch.com/in/movie/ala-vaikunt...,Ala Vaikunthapurramuloo


## **Scraping Release Year**

In [None]:
# Visit each movie page and read the release year shown next to the title
release_year = []
for link in movie_links:
    response = requests.get(link)
    tags = BeautifulSoup(response.content, "html.parser")

    # The year is rendered as text such as "(2023)" inside a muted <span>
    year = tags.find('span', class_="text-muted").text.strip()
    release_year.append(year)

data_dict['year'] = release_year
# pd.DataFrame(data_dict)

In [None]:
pd.DataFrame(data_dict)

Unnamed: 0,link,title,year
0,https://www.justwatch.com/in/movie/animal-2022,Animal,(2023)
1,https://www.justwatch.com/in/movie/12th-fail,12th Fail,(2023)
2,https://www.justwatch.com/in/movie/salaar,Salaar,(2023)
3,https://www.justwatch.com/in/movie/sam-bahadur,Sam Bahadur,(2023)
4,https://www.justwatch.com/in/movie/aquaman-and...,Aquaman and the Lost Kingdom,(2023)
...,...,...,...
95,https://www.justwatch.com/in/movie/pathaan,Pathaan,(2023)
96,https://www.justwatch.com/in/movie/kannagi,Kannagi,(2023)
97,https://www.justwatch.com/in/movie/blood-diamond,Blood Diamond,(2006)
98,https://www.justwatch.com/in/movie/ala-vaikunt...,Ala Vaikunthapurramuloo,(2020)
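
The year arrives as text wrapped in parentheses, so it needs converting before any date-based filtering. A minimal sketch, assuming every valid value contains a four-digit year; the helper name `parse_year` is hypothetical.

```python
import re


def parse_year(text):
    """Return the four-digit year found in text such as "(2023)", or None."""
    match = re.search(r"\d{4}", text or "")
    return int(match.group()) if match else None


# Values in the same shape as the scraped column, plus one unparseable entry
years = [parse_year(value) for value in ["(2023)", "(2006)", "None"]]
print(years)
```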


## **Scraping Genres**

In [None]:
# Visit each movie page and read the value listed under the "Genres" subheading
movie_genres = []
for link in movie_links:
    response = requests.get(link)
    tags = BeautifulSoup(response.content, "html.parser")

    # Extract the required data from the page and append it to list
    genres = tags.find('h3', class_="detail-infos__subheading", string="Genres").find_next('div', class_="detail-infos__value").text.strip()
    movie_genres.append(genres)

# pd.DataFrame(movie_genres)
# data_dict['genres'] = movie_genres
# pd.DataFrame(data_dict)

In [None]:
data_dict['genres'] = movie_genres
pd.DataFrame(data_dict)

Unnamed: 0,link,title,year,genres
0,https://www.justwatch.com/in/movie/animal-2022,Animal,(2023),"Action & Adventure, Drama, Crime, Mystery & Th..."
1,https://www.justwatch.com/in/movie/12th-fail,12th Fail,(2023),Drama
2,https://www.justwatch.com/in/movie/salaar,Salaar,(2023),"Mystery & Thriller, Action & Adventure, Crime,..."
3,https://www.justwatch.com/in/movie/sam-bahadur,Sam Bahadur,(2023),"War & Military, Drama, History"
4,https://www.justwatch.com/in/movie/aquaman-and...,Aquaman and the Lost Kingdom,(2023),"Fantasy, Action & Adventure, Science-Fiction"
...,...,...,...,...
95,https://www.justwatch.com/in/movie/pathaan,Pathaan,(2023),"Mystery & Thriller, Drama, Action & Adventure"
96,https://www.justwatch.com/in/movie/kannagi,Kannagi,(2023),Romance
97,https://www.justwatch.com/in/movie/blood-diamond,Blood Diamond,(2006),"Action & Adventure, Drama, Mystery & Thriller"
98,https://www.justwatch.com/in/movie/ala-vaikunt...,Ala Vaikunthapurramuloo,(2020),"Comedy, Drama, Action & Adventure"


## **Scraping IMDb Rating**

In [None]:
# Visit each movie page and read the IMDb rating text, e.g. "6.4 (76k)"
imdb_rating = []
for link in movie_links:
    response = requests.get(link)
    tags = BeautifulSoup(response.content, "html.parser")

    # Not every page has a rating block, so fall back to None when it is absent
    rating_tag = tags.find('div', class_="jw-scoring-listing__rating")
    rating = rating_tag.find_next('span').text.strip() if rating_tag else None
    imdb_rating.append(rating)

# pd.DataFrame(imdb_rating)
# data_dict['imdb_rating'] = imdb_rating
# pd.DataFrame(data_dict)

In [None]:
data_dict['imdb_rating'] = imdb_rating
pd.DataFrame(data_dict)

Unnamed: 0,link,title,year,genres,imdb_rating
0,https://www.justwatch.com/in/movie/animal-2022,Animal,(2023),"Action & Adventure, Drama, Crime, Mystery & Th...",6.4 (76k)
1,https://www.justwatch.com/in/movie/12th-fail,12th Fail,(2023),Drama,9.2 (93k)
2,https://www.justwatch.com/in/movie/salaar,Salaar,(2023),"Mystery & Thriller, Action & Adventure, Crime,...",6.6 (56k)
3,https://www.justwatch.com/in/movie/sam-bahadur,Sam Bahadur,(2023),"War & Military, Drama, History",7.9 (11k)
4,https://www.justwatch.com/in/movie/aquaman-and...,Aquaman and the Lost Kingdom,(2023),"Fantasy, Action & Adventure, Science-Fiction",5.7 (49k)
...,...,...,...,...,...
95,https://www.justwatch.com/in/movie/pathaan,Pathaan,(2023),"Mystery & Thriller, Drama, Action & Adventure",5.9 (155k)
96,https://www.justwatch.com/in/movie/kannagi,Kannagi,(2023),Romance,7.8 (5k)
97,https://www.justwatch.com/in/movie/blood-diamond,Blood Diamond,(2006),"Action & Adventure, Drama, Mystery & Thriller",8.0 (580k)
98,https://www.justwatch.com/in/movie/ala-vaikunt...,Ala Vaikunthapurramuloo,(2020),"Comedy, Drama, Action & Adventure",7.3 (23k)


## **Scraping Runtime/Duration**

In [None]:
# Visit each movie page and read the value listed under the "Runtime" subheading
movie_duration = []
for link in movie_links:
    response = requests.get(link)
    tags = BeautifulSoup(response.content, "html.parser")

    # Runtime is rendered as text such as "3h 21min"
    duration = tags.find('h3', class_="detail-infos__subheading", string="Runtime").find_next('div', class_="detail-infos__value").text.strip()
    movie_duration.append(duration)

# pd.DataFrame(movie_duration)
# data_dict['movie_duration'] = movie_duration
# pd.DataFrame(data_dict)

In [None]:
data_dict['movie_duration'] = movie_duration
pd.DataFrame(data_dict)

Unnamed: 0,link,title,year,genres,imdb_rating,movie_duration
0,https://www.justwatch.com/in/movie/animal-2022,Animal,(2023),"Action & Adventure, Drama, Crime, Mystery & Th...",6.4 (76k),3h 21min
1,https://www.justwatch.com/in/movie/12th-fail,12th Fail,(2023),Drama,9.2 (93k),2h 26min
2,https://www.justwatch.com/in/movie/salaar,Salaar,(2023),"Mystery & Thriller, Action & Adventure, Crime,...",6.6 (56k),2h 55min
3,https://www.justwatch.com/in/movie/sam-bahadur,Sam Bahadur,(2023),"War & Military, Drama, History",7.9 (11k),2h 30min
4,https://www.justwatch.com/in/movie/aquaman-and...,Aquaman and the Lost Kingdom,(2023),"Fantasy, Action & Adventure, Science-Fiction",5.7 (49k),2h 4min
...,...,...,...,...,...,...
95,https://www.justwatch.com/in/movie/pathaan,Pathaan,(2023),"Mystery & Thriller, Drama, Action & Adventure",5.9 (155k),2h 26min
96,https://www.justwatch.com/in/movie/kannagi,Kannagi,(2023),Romance,7.8 (5k),2h 38min
97,https://www.justwatch.com/in/movie/blood-diamond,Blood Diamond,(2006),"Action & Adventure, Drama, Mystery & Thriller",8.0 (580k),2h 23min
98,https://www.justwatch.com/in/movie/ala-vaikunt...,Ala Vaikunthapurramuloo,(2020),"Comedy, Drama, Action & Adventure",7.3 (23k),2h 45min
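
Runtime text such as `3h 21min` is easier to sort and compare once it is expressed in minutes. A minimal sketch, assuming the value always follows the hours-then-minutes pattern seen in the table above; the helper name `duration_to_minutes` is hypothetical.

```python
import re


def duration_to_minutes(text):
    """Convert text such as "3h 21min" to total minutes, or None if unparseable."""
    match = re.fullmatch(r"\s*(?:(\d+)h)?\s*(?:(\d+)min)?\s*", text or "")
    # Reject strings where neither the hour nor the minute part is present
    if match is None or (match.group(1) is None and match.group(2) is None):
        return None
    hours = int(match.group(1) or 0)
    minutes = int(match.group(2) or 0)
    return hours * 60 + minutes


minutes = [duration_to_minutes(value) for value in ["3h 21min", "2h 4min", "45min", ""]]
print(minutes)
```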


## **Scraping Age Rating**

In [None]:
# Visit each movie page and read the value listed under the "Age rating" subheading
age_rating = []
for link in movie_links:
    response = requests.get(link)
    tags = BeautifulSoup(response.content, "html.parser")

    # Some pages have no age rating, so fall back to None when the subheading is absent
    age_tag = tags.find('h3', class_="detail-infos__subheading", string="Age rating")
    rating = age_tag.find_next('div', class_="detail-infos__value").text.strip() if age_tag else None
    age_rating.append(rating)

# pd.DataFrame(age_rating)
# data_dict['age_rating'] = age_rating
# pd.DataFrame(data_dict)

In [None]:
data_dict['age_rating'] = age_rating
pd.DataFrame(data_dict)

Unnamed: 0,link,title,year,genres,imdb_rating,movie_duration,age_rating
0,https://www.justwatch.com/in/movie/animal-2022,Animal,(2023),"Action & Adventure, Drama, Crime, Mystery & Th...",6.4 (76k),3h 21min,A
1,https://www.justwatch.com/in/movie/12th-fail,12th Fail,(2023),Drama,9.2 (93k),2h 26min,
2,https://www.justwatch.com/in/movie/salaar,Salaar,(2023),"Mystery & Thriller, Action & Adventure, Crime,...",6.6 (56k),2h 55min,A
3,https://www.justwatch.com/in/movie/sam-bahadur,Sam Bahadur,(2023),"War & Military, Drama, History",7.9 (11k),2h 30min,UA
4,https://www.justwatch.com/in/movie/aquaman-and...,Aquaman and the Lost Kingdom,(2023),"Fantasy, Action & Adventure, Science-Fiction",5.7 (49k),2h 4min,
...,...,...,...,...,...,...,...
95,https://www.justwatch.com/in/movie/pathaan,Pathaan,(2023),"Mystery & Thriller, Drama, Action & Adventure",5.9 (155k),2h 26min,
96,https://www.justwatch.com/in/movie/kannagi,Kannagi,(2023),Romance,7.8 (5k),2h 38min,UA
97,https://www.justwatch.com/in/movie/blood-diamond,Blood Diamond,(2006),"Action & Adventure, Drama, Mystery & Thriller",8.0 (580k),2h 23min,A
98,https://www.justwatch.com/in/movie/ala-vaikunt...,Ala Vaikunthapurramuloo,(2020),"Comedy, Drama, Action & Adventure",7.3 (23k),2h 45min,UA


## **Fetching Production Country Details**

In [None]:
# Visit each movie page and read the value listed under the "Production country" subheading
production_country = []
for link in movie_links:
    response = requests.get(link)
    tags = BeautifulSoup(response.content, "html.parser")

    # The heading text is matched exactly as written, including its surrounding spaces
    country = tags.find('h3', class_="detail-infos__subheading", string=" Production country ").find_next('div', class_="detail-infos__value").text.strip()
    production_country.append(country)

# pd.DataFrame(production_country)
# data_dict['production_country'] = production_country
# pd.DataFrame(data_dict)

In [None]:
data_dict['production_country'] = production_country
pd.DataFrame(data_dict)

Unnamed: 0,link,title,year,genres,imdb_rating,movie_duration,age_rating,production_country
0,https://www.justwatch.com/in/movie/animal-2022,Animal,(2023),"Action & Adventure, Drama, Crime, Mystery & Th...",6.4 (76k),3h 21min,A,India
1,https://www.justwatch.com/in/movie/12th-fail,12th Fail,(2023),Drama,9.2 (93k),2h 26min,,India
2,https://www.justwatch.com/in/movie/salaar,Salaar,(2023),"Mystery & Thriller, Action & Adventure, Crime,...",6.6 (56k),2h 55min,A,India
3,https://www.justwatch.com/in/movie/sam-bahadur,Sam Bahadur,(2023),"War & Military, Drama, History",7.9 (11k),2h 30min,UA,India
4,https://www.justwatch.com/in/movie/aquaman-and...,Aquaman and the Lost Kingdom,(2023),"Fantasy, Action & Adventure, Science-Fiction",5.7 (49k),2h 4min,,United States
...,...,...,...,...,...,...,...,...
95,https://www.justwatch.com/in/movie/pathaan,Pathaan,(2023),"Mystery & Thriller, Drama, Action & Adventure",5.9 (155k),2h 26min,,"India, United States"
96,https://www.justwatch.com/in/movie/kannagi,Kannagi,(2023),Romance,7.8 (5k),2h 38min,UA,India
97,https://www.justwatch.com/in/movie/blood-diamond,Blood Diamond,(2006),"Action & Adventure, Drama, Mystery & Thriller",8.0 (580k),2h 23min,A,"United States, Germany"
98,https://www.justwatch.com/in/movie/ala-vaikunt...,Ala Vaikunthapurramuloo,(2020),"Comedy, Drama, Action & Adventure",7.3 (23k),2h 45min,UA,India


## **Fetching Streaming Service Details**

In [None]:
# Visit each movie page and read the availability summary paragraph
streaming_service = []
for link in movie_links:
    response = requests.get(link)
    tags = BeautifulSoup(response.content, "html.parser")

    # The first <p> after the <h2 class="heading"> is a sentence describing where the title can be watched
    platform = tags.find('h2', class_="heading").find_next('p').text.strip()
    streaming_service.append(platform)

# pd.DataFrame(streaming_service)
# data_dict['streaming_service'] = streaming_service
# pd.DataFrame(data_dict)

In [None]:
data_dict['streaming_service'] = streaming_service
pd.DataFrame(data_dict)

Unnamed: 0,link,title,year,genres,imdb_rating,movie_duration,age_rating,production_country,streaming_service
0,https://www.justwatch.com/in/movie/animal-2022,Animal,(2023),"Action & Adventure, Drama, Crime, Mystery & Th...",6.4 (76k),3h 21min,A,India,"Currently you are able to watch ""Animal"" strea..."
1,https://www.justwatch.com/in/movie/12th-fail,12th Fail,(2023),Drama,9.2 (93k),2h 26min,,India,"Currently you are able to watch ""12th Fail"" st..."
2,https://www.justwatch.com/in/movie/salaar,Salaar,(2023),"Mystery & Thriller, Action & Adventure, Crime,...",6.6 (56k),2h 55min,A,India,"Currently you are able to watch ""Salaar"" strea..."
3,https://www.justwatch.com/in/movie/sam-bahadur,Sam Bahadur,(2023),"War & Military, Drama, History",7.9 (11k),2h 30min,UA,India,"Currently you are able to watch ""Sam Bahadur"" ..."
4,https://www.justwatch.com/in/movie/aquaman-and...,Aquaman and the Lost Kingdom,(2023),"Fantasy, Action & Adventure, Science-Fiction",5.7 (49k),2h 4min,,United States,"You can buy ""Aquaman and the Lost Kingdom"" on ..."
...,...,...,...,...,...,...,...,...,...
95,https://www.justwatch.com/in/movie/pathaan,Pathaan,(2023),"Mystery & Thriller, Drama, Action & Adventure",5.9 (155k),2h 26min,,"India, United States","Currently you are able to watch ""Pathaan"" stre..."
96,https://www.justwatch.com/in/movie/kannagi,Kannagi,(2023),Romance,7.8 (5k),2h 38min,UA,India,"Currently you are able to watch ""Kannagi"" stre..."
97,https://www.justwatch.com/in/movie/blood-diamond,Blood Diamond,(2006),"Action & Adventure, Drama, Mystery & Thriller",8.0 (580k),2h 23min,A,"United States, Germany","Currently you are able to watch ""Blood Diamond..."
98,https://www.justwatch.com/in/movie/ala-vaikunt...,Ala Vaikunthapurramuloo,(2020),"Comedy, Drama, Action & Adventure",7.3 (23k),2h 45min,UA,India,"Currently you are able to watch ""Ala Vaikuntha..."


## **Now Creating Movies DataFrame**

In [None]:
# Assemble the movies DataFrame with the columns in a readable order
movies_df = pd.DataFrame(data_dict, columns=['title', 'year', 'movie_duration', 'imdb_rating', 'age_rating', 'production_country', 'genres', 'link', 'streaming_service'])
movies_df

Unnamed: 0,title,year,movie_duration,imdb_rating,age_rating,production_country,genres,link,streaming_service
0,Animal,(2023),3h 21min,6.4 (76k),A,India,"Action & Adventure, Drama, Crime, Mystery & Th...",https://www.justwatch.com/in/movie/animal-2022,"Currently you are able to watch ""Animal"" strea..."
1,12th Fail,(2023),2h 26min,9.2 (93k),,India,Drama,https://www.justwatch.com/in/movie/12th-fail,"Currently you are able to watch ""12th Fail"" st..."
2,Salaar,(2023),2h 55min,6.6 (56k),A,India,"Mystery & Thriller, Action & Adventure, Crime,...",https://www.justwatch.com/in/movie/salaar,"Currently you are able to watch ""Salaar"" strea..."
3,Sam Bahadur,(2023),2h 30min,7.9 (11k),UA,India,"War & Military, Drama, History",https://www.justwatch.com/in/movie/sam-bahadur,"Currently you are able to watch ""Sam Bahadur"" ..."
4,Aquaman and the Lost Kingdom,(2023),2h 4min,5.7 (49k),,United States,"Fantasy, Action & Adventure, Science-Fiction",https://www.justwatch.com/in/movie/aquaman-and...,"You can buy ""Aquaman and the Lost Kingdom"" on ..."
...,...,...,...,...,...,...,...,...,...
95,Pathaan,(2023),2h 26min,5.9 (155k),,"India, United States","Mystery & Thriller, Drama, Action & Adventure",https://www.justwatch.com/in/movie/pathaan,"Currently you are able to watch ""Pathaan"" stre..."
96,Kannagi,(2023),2h 38min,7.8 (5k),UA,India,Romance,https://www.justwatch.com/in/movie/kannagi,"Currently you are able to watch ""Kannagi"" stre..."
97,Blood Diamond,(2006),2h 23min,8.0 (580k),A,"United States, Germany","Action & Adventure, Drama, Mystery & Thriller",https://www.justwatch.com/in/movie/blood-diamond,"Currently you are able to watch ""Blood Diamond..."
98,Ala Vaikunthapurramuloo,(2020),2h 45min,7.3 (23k),UA,India,"Comedy, Drama, Action & Adventure",https://www.justwatch.com/in/movie/ala-vaikunt...,"Currently you are able to watch ""Ala Vaikuntha..."
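
Each field above is collected by a separate loop that downloads every movie page again. As a sketch of an alternative, the fields can be read from one parsed page at a time, so each page is fetched only once. The inline HTML below is a hand-written stand-in that reuses the class names from the cells above; it is not a copy of a real JustWatch page. The helper names `detail_value` and `parse_title_page` are hypothetical.

```python
from bs4 import BeautifulSoup

# Hand-written sample markup that mimics the class names used above
SAMPLE_HTML = """
<h1> Blood Diamond </h1><span class="text-muted">(2006)</span>
<div><h3 class="detail-infos__subheading">Genres</h3>
<div class="detail-infos__value">Action &amp; Adventure, Drama</div></div>
<div><h3 class="detail-infos__subheading"> Runtime </h3>
<div class="detail-infos__value">2h 23min</div></div>
"""


def detail_value(soup, label):
    """Return the value that follows the subheading named `label`, or None."""
    for heading in soup.find_all("h3", class_="detail-infos__subheading"):
        # Compare stripped text so stray spaces around the label do not matter
        if heading.get_text(strip=True) == label:
            value = heading.find_next("div", class_="detail-infos__value")
            return value.get_text(strip=True) if value else None
    return None


def parse_title_page(html):
    """Extract all fields of interest from a single detail page."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("h1")
    year = soup.find("span", class_="text-muted")
    return {
        "title": heading.get_text(strip=True) if heading else None,
        "year": year.get_text(strip=True) if year else None,
        "genres": detail_value(soup, "Genres"),
        "movie_duration": detail_value(soup, "Runtime"),
        "age_rating": detail_value(soup, "Age rating"),
        "production_country": detail_value(soup, "Production country"),
    }


record = parse_title_page(SAMPLE_HTML)
print(record)
```

Collecting one dictionary per page and passing the list to `pd.DataFrame` would produce the same columns while keeping every row's fields aligned, even when a page is missing a value.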


## **Scraping TV Show Data**

In [None]:
# URL of the JustWatch listing page from which TV show data will be fetched
tv_url = 'https://www.justwatch.com/in/tv-shows?release_year_from=2000'
# Send an HTTP GET request to the URL
page = requests.get(tv_url)
# Parse the HTML content using BeautifulSoup with the built-in 'html.parser'
soup = BeautifulSoup(page.text, 'html.parser')
# Print the prettified HTML content
print(soup.prettify())


## **Fetching TV Show URL Details**

In [None]:
# Collect the absolute URL of every TV show on the listing page
tv_show_links = []
tv_dict = dict()

tv_url = 'https://www.justwatch.com/in/tv-shows?release_year_from=2000'
response = requests.get(tv_url)
tags = BeautifulSoup(response.content, "html.parser")

# Each title card is an <a> tag whose href is a site-relative path
links = tags.find_all("a", class_="title-list-grid__item--link")
for link in links:
    text = link['href']
    tv_show_links.append('https://www.justwatch.com' + text.strip())

tv_dict['link'] = tv_show_links
# pd.DataFrame(tv_dict)

In [None]:
pd.DataFrame(tv_dict)

Unnamed: 0,link
0,https://www.justwatch.com/in/tv-show/jack-reacher
1,https://www.justwatch.com/in/tv-show/true-dete...
2,https://www.justwatch.com/in/tv-show/indian-po...
3,https://www.justwatch.com/in/tv-show/mirzapur
4,https://www.justwatch.com/in/tv-show/game-of-t...
...,...
95,https://www.justwatch.com/in/tv-show/the-big-b...
96,https://www.justwatch.com/in/tv-show/the-flash
97,https://www.justwatch.com/in/tv-show/alexander...
98,https://www.justwatch.com/in/tv-show/the-king-...


## **Fetching TV Show Title Details**

In [None]:
# Visit each TV show page and read the title from its <h1> heading
tvshow_title = []
for link in tv_show_links:
    response = requests.get(link)
    tags = BeautifulSoup(response.content, "html.parser")

    # Extract the required data from the page and append it to list
    title = tags.find("h1").text.strip()
    tvshow_title.append(title)

# pd.DataFrame(tvshow_title)
# tv_dict['title'] = tvshow_title
# pd.DataFrame(tv_dict)

In [None]:
tv_dict['title'] = tvshow_title
pd.DataFrame(tv_dict)

Unnamed: 0,link,title
0,https://www.justwatch.com/in/tv-show/jack-reacher,Reacher
1,https://www.justwatch.com/in/tv-show/true-dete...,True Detective
2,https://www.justwatch.com/in/tv-show/indian-po...,Indian Police Force
3,https://www.justwatch.com/in/tv-show/mirzapur,Mirzapur
4,https://www.justwatch.com/in/tv-show/game-of-t...,Game of Thrones
...,...,...
95,https://www.justwatch.com/in/tv-show/the-big-b...,The Big Bang Theory
96,https://www.justwatch.com/in/tv-show/the-flash,The Flash
97,https://www.justwatch.com/in/tv-show/alexander...,Alexander: The Making of a God
98,https://www.justwatch.com/in/tv-show/the-king-...,The King: Eternal Monarch


## **Fetching Release Year**

In [None]:
# Visit each TV show page and read the release year shown next to the title
release_year = []
for link in tv_show_links:
    response = requests.get(link)
    tags = BeautifulSoup(response.content, "html.parser")

    # Extract the required data from the page and append it to list
    year = tags.find('span', class_="text-muted").text.strip()
    release_year.append(year)

# pd.DataFrame(release_year)
# tv_dict['release_year'] = release_year
# pd.DataFrame(tv_dict)

In [None]:
tv_dict['release_year'] = release_year
pd.DataFrame(tv_dict)

Unnamed: 0,link,title,release_year
0,https://www.justwatch.com/in/tv-show/jack-reacher,Reacher,(2022)
1,https://www.justwatch.com/in/tv-show/true-dete...,True Detective,(2014)
2,https://www.justwatch.com/in/tv-show/indian-po...,Indian Police Force,(2024)
3,https://www.justwatch.com/in/tv-show/mirzapur,Mirzapur,(2018)
4,https://www.justwatch.com/in/tv-show/game-of-t...,Game of Thrones,(2011)
...,...,...,...
95,https://www.justwatch.com/in/tv-show/the-big-b...,The Big Bang Theory,(2007)
96,https://www.justwatch.com/in/tv-show/the-flash,The Flash,(2014)
97,https://www.justwatch.com/in/tv-show/alexander...,Alexander: The Making of a God,(2024)
98,https://www.justwatch.com/in/tv-show/the-king-...,The King: Eternal Monarch,(2020)


## **Fetching TV Show Genre Details**

In [None]:
# Visit each TV show page and read the value listed under the "Genres" subheading
tvshow_genre = []
for link in tv_show_links:
    response = requests.get(link)
    tags = BeautifulSoup(response.content, "html.parser")

    # Extract the required data from the page and append it to list
    genre = tags.find('h3', class_="detail-infos__subheading", string="Genres").find_next('div', class_="detail-infos__value").text.strip()
    tvshow_genre.append(genre)

# pd.DataFrame(tvshow_genre)
# tv_dict['genre'] = tvshow_genre
# pd.DataFrame(tv_dict)

In [None]:
tv_dict['genre'] = tvshow_genre
pd.DataFrame(tv_dict)

Unnamed: 0,link,title,release_year,genre
0,https://www.justwatch.com/in/tv-show/jack-reacher,Reacher,(2022),"Action & Adventure, Crime, Drama, Mystery & Th..."
1,https://www.justwatch.com/in/tv-show/true-dete...,True Detective,(2014),"Drama, Mystery & Thriller, Crime"
2,https://www.justwatch.com/in/tv-show/indian-po...,Indian Police Force,(2024),"Action & Adventure, Crime"
3,https://www.justwatch.com/in/tv-show/mirzapur,Mirzapur,(2018),"Crime, Action & Adventure, Drama, Mystery & Th..."
4,https://www.justwatch.com/in/tv-show/game-of-t...,Game of Thrones,(2011),"Drama, Action & Adventure, Science-Fiction, Fa..."
...,...,...,...,...
95,https://www.justwatch.com/in/tv-show/the-big-b...,The Big Bang Theory,(2007),"Comedy, Romance"
96,https://www.justwatch.com/in/tv-show/the-flash,The Flash,(2014),"Action & Adventure, Drama, Science-Fiction"
97,https://www.justwatch.com/in/tv-show/alexander...,Alexander: The Making of a God,(2024),"Documentary, Drama, History"
98,https://www.justwatch.com/in/tv-show/the-king-...,The King: Eternal Monarch,(2020),"Drama, Science-Fiction"


## **Fetching IMDb Rating Details**

In [None]:
# Visit each TV show page and read the IMDb rating text, e.g. "8.1 (195k)"
imdb_rating = []
for link in tv_show_links:
    response = requests.get(link)
    tags = BeautifulSoup(response.content, "html.parser")

    # find_next() searches the rest of the document, so a rating block without
    # its own <span> can return unrelated text; None marks a missing block
    rating_tag = tags.find('div', class_="jw-scoring-listing__rating")
    rating = rating_tag.find_next('span').text.strip() if rating_tag else None
    imdb_rating.append(rating)

# pd.DataFrame(imdb_rating)
# tv_dict['imdb_rating'] = imdb_rating
# pd.DataFrame(tv_dict)

In [None]:
tv_dict['imdb_rating'] = imdb_rating
pd.DataFrame(tv_dict)

Unnamed: 0,link,title,release_year,genre,imdb_rating
0,https://www.justwatch.com/in/tv-show/jack-reacher,Reacher,(2022),"Action & Adventure, Crime, Drama, Mystery & Th...",8.1 (195k)
1,https://www.justwatch.com/in/tv-show/true-dete...,True Detective,(2014),"Drama, Mystery & Thriller, Crime",8.9 (632k)
2,https://www.justwatch.com/in/tv-show/indian-po...,Indian Police Force,(2024),"Action & Adventure, Crime",5.8 (50k)
3,https://www.justwatch.com/in/tv-show/mirzapur,Mirzapur,(2018),"Crime, Action & Adventure, Drama, Mystery & Th...",8.5 (80k)
4,https://www.justwatch.com/in/tv-show/game-of-t...,Game of Thrones,(2011),"Drama, Action & Adventure, Science-Fiction, Fa...",9.2 (2m)
...,...,...,...,...,...
95,https://www.justwatch.com/in/tv-show/the-big-b...,The Big Bang Theory,(2007),"Comedy, Romance",8.2 (857k)
96,https://www.justwatch.com/in/tv-show/the-flash,The Flash,(2014),"Action & Adventure, Drama, Science-Fiction",7.5 (366k)
97,https://www.justwatch.com/in/tv-show/alexander...,Alexander: The Making of a God,(2024),"Documentary, Drama, History",5.1 (2k)
98,https://www.justwatch.com/in/tv-show/the-king-...,The King: Eternal Monarch,(2020),"Drama, Science-Fiction",Baek Sang-Hoon
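
The last row above shows a name where a rating should be, so the scraped text is worth validating before it is used in any numeric analysis. The sketch below splits a value such as `8.1 (195k)` into a score and a vote count and rejects anything that does not fit that pattern. The helper name `parse_rating` is hypothetical, and reading the `k` and `m` suffixes as thousands and millions is an assumption about how the site abbreviates vote counts.

```python
import re

# Score, optionally followed by a vote count in parentheses, e.g. "8.1 (195k)"
RATING_RE = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*(?:\((\d+(?:\.\d+)?)([km]?)\))?\s*$", re.IGNORECASE
)
# Assumed meaning of the vote-count suffixes
MULTIPLIERS = {"": 1, "k": 1_000, "m": 1_000_000}


def parse_rating(text):
    """Return (score, votes) parsed from rating text, or (None, None) if invalid."""
    match = RATING_RE.match(text or "")
    if match is None:
        return None, None
    score = float(match.group(1))
    # IMDb scores fall between 0 and 10
    if not 0 <= score <= 10:
        return None, None
    if match.group(2) is None:
        return score, None
    votes = round(float(match.group(2)) * MULTIPLIERS[match.group(3).lower()])
    return score, votes


for raw in ["8.1 (195k)", "9.2 (2m)", "Baek Sang-Hoon"]:
    print(raw, "->", parse_rating(raw))
```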


## **Fetching Age Rating Details**

In [None]:
# Visit each TV show page and read the value listed under the "Age rating" subheading
age_rating = []
for link in tv_show_links:
    response = requests.get(link)
    tags = BeautifulSoup(response.content, "html.parser")

    # Some pages have no age rating, so fall back to None when the subheading is absent
    age_tag = tags.find('h3', class_="detail-infos__subheading", string="Age rating")
    rating = age_tag.find_next('div', class_="detail-infos__value").text.strip() if age_tag else None
    age_rating.append(rating)

# pd.DataFrame(age_rating)
# tv_dict['age_rating'] = age_rating
# pd.DataFrame(tv_dict)

In [None]:
tv_dict['age_rating'] = age_rating
pd.DataFrame(tv_dict)

Unnamed: 0,link,title,release_year,genre,imdb_rating,age_rating
0,https://www.justwatch.com/in/tv-show/jack-reacher,Reacher,(2022),"Action & Adventure, Crime, Drama, Mystery & Th...",8.1 (195k),A
1,https://www.justwatch.com/in/tv-show/true-dete...,True Detective,(2014),"Drama, Mystery & Thriller, Crime",8.9 (632k),U
2,https://www.justwatch.com/in/tv-show/indian-po...,Indian Police Force,(2024),"Action & Adventure, Crime",5.8 (50k),A
3,https://www.justwatch.com/in/tv-show/mirzapur,Mirzapur,(2018),"Crime, Action & Adventure, Drama, Mystery & Th...",8.5 (80k),
4,https://www.justwatch.com/in/tv-show/game-of-t...,Game of Thrones,(2011),"Drama, Action & Adventure, Science-Fiction, Fa...",9.2 (2m),U
...,...,...,...,...,...,...
95,https://www.justwatch.com/in/tv-show/the-big-b...,The Big Bang Theory,(2007),"Comedy, Romance",8.2 (857k),U
96,https://www.justwatch.com/in/tv-show/the-flash,The Flash,(2014),"Action & Adventure, Drama, Science-Fiction",7.5 (366k),
97,https://www.justwatch.com/in/tv-show/alexander...,Alexander: The Making of a God,(2024),"Documentary, Drama, History",5.1 (2k),
98,https://www.justwatch.com/in/tv-show/the-king-...,The King: Eternal Monarch,(2020),"Drama, Science-Fiction",Baek Sang-Hoon,


## **Fetching Production Country details**

In [None]:
# Write Your Code here
production_country = []
for link in tv_show_links:
    response = requests.get(link)
    tags = BeautifulSoup(response.content, "html.parser")

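    # Locate the "Production country" subheading (matched with its surrounding spaces, as scraped)
    country_tag = tags.find('h3', class_="detail-infos__subheading", string=" Production country ")
    # Append None instead of raising AttributeError when the subheading is missing
    country = country_tag.find_next('div', class_="detail-infos__value").text.strip() if country_tag else None
    production_country.append(country)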

# pd.DataFrame(production_country)
# tv_dict['country'] = production_country
# pd.DataFrame(tv_dict)

In [None]:
tv_dict['country'] = production_country
pd.DataFrame(tv_dict)

Unnamed: 0,link,title,release_year,genre,imdb_rating,age_rating,country
0,https://www.justwatch.com/in/tv-show/jack-reacher,Reacher,(2022),"Action & Adventure, Crime, Drama, Mystery & Th...",8.1 (195k),A,United States
1,https://www.justwatch.com/in/tv-show/true-dete...,True Detective,(2014),"Drama, Mystery & Thriller, Crime",8.9 (632k),U,United States
2,https://www.justwatch.com/in/tv-show/indian-po...,Indian Police Force,(2024),"Action & Adventure, Crime",5.8 (50k),A,India
3,https://www.justwatch.com/in/tv-show/mirzapur,Mirzapur,(2018),"Crime, Action & Adventure, Drama, Mystery & Th...",8.5 (80k),,India
4,https://www.justwatch.com/in/tv-show/game-of-t...,Game of Thrones,(2011),"Drama, Action & Adventure, Science-Fiction, Fa...",9.2 (2m),U,United States
...,...,...,...,...,...,...,...
95,https://www.justwatch.com/in/tv-show/the-big-b...,The Big Bang Theory,(2007),"Comedy, Romance",8.2 (857k),U,United States
96,https://www.justwatch.com/in/tv-show/the-flash,The Flash,(2014),"Action & Adventure, Drama, Science-Fiction",7.5 (366k),,United States
97,https://www.justwatch.com/in/tv-show/alexander...,Alexander: The Making of a God,(2024),"Documentary, Drama, History",5.1 (2k),,United Kingdom
98,https://www.justwatch.com/in/tv-show/the-king-...,The King: Eternal Monarch,(2020),"Drama, Science-Fiction",Baek Sang-Hoon,,South Korea


## **Fetching Streaming Service details**

In [None]:
# Write Your Code here
streaming_service = []
for link in tv_show_links:
    response = requests.get(link)
    tags = BeautifulSoup(response.content, "html.parser")

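    # The first paragraph after the h2.heading element holds the availability sentence
    heading = tags.find('h2', class_="heading")
    offer_text = heading.find_next('p') if heading else None
    # Append None instead of raising AttributeError when either element is missing
    platform = offer_text.text.strip() if offer_text else None
    streaming_service.append(platform)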

# pd.DataFrame(streaming_service)
# tv_dict['streaming_service'] = streaming_service
# pd.DataFrame(tv_dict)

In [None]:
tv_dict['streaming_service'] = streaming_service
pd.DataFrame(tv_dict)

Unnamed: 0,link,title,release_year,genre,imdb_rating,age_rating,country,streaming_service
0,https://www.justwatch.com/in/tv-show/jack-reacher,Reacher,(2022),"Action & Adventure, Crime, Drama, Mystery & Th...",8.1 (195k),A,United States,"Currently you are able to watch ""Reacher"" stre..."
1,https://www.justwatch.com/in/tv-show/true-dete...,True Detective,(2014),"Drama, Mystery & Thriller, Crime",8.9 (632k),U,United States,"Currently you are able to watch ""True Detectiv..."
2,https://www.justwatch.com/in/tv-show/indian-po...,Indian Police Force,(2024),"Action & Adventure, Crime",5.8 (50k),A,India,"Currently you are able to watch ""Indian Police..."
3,https://www.justwatch.com/in/tv-show/mirzapur,Mirzapur,(2018),"Crime, Action & Adventure, Drama, Mystery & Th...",8.5 (80k),,India,"Currently you are able to watch ""Mirzapur"" str..."
4,https://www.justwatch.com/in/tv-show/game-of-t...,Game of Thrones,(2011),"Drama, Action & Adventure, Science-Fiction, Fa...",9.2 (2m),U,United States,"Currently you are able to watch ""Game of Thron..."
...,...,...,...,...,...,...,...,...
95,https://www.justwatch.com/in/tv-show/the-big-b...,The Big Bang Theory,(2007),"Comedy, Romance",8.2 (857k),U,United States,"Currently you are able to watch ""The Big Bang ..."
96,https://www.justwatch.com/in/tv-show/the-flash,The Flash,(2014),"Action & Adventure, Drama, Science-Fiction",7.5 (366k),,United States,We try to add new providers constantly but we ...
97,https://www.justwatch.com/in/tv-show/alexander...,Alexander: The Making of a God,(2024),"Documentary, Drama, History",5.1 (2k),,United Kingdom,"Currently you are able to watch ""Alexander: Th..."
98,https://www.justwatch.com/in/tv-show/the-king-...,The King: Eternal Monarch,(2020),"Drama, Science-Fiction",Baek Sang-Hoon,,South Korea,"Currently you are able to watch ""The King: Ete..."


## **Fetching Duration Details**

In [None]:
# Write Your Code here
movie_duration = []
for link in tv_show_links:
    response = requests.get(link)
    tags = BeautifulSoup(response.content, "html.parser")

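    # Locate the "Runtime" subheading; its value sits in the next detail-infos__value div
    runtime_tag = tags.find('h3', class_="detail-infos__subheading", string="Runtime")
    # Append None instead of raising AttributeError when the subheading is missing
    duration = runtime_tag.find_next('div', class_="detail-infos__value").text.strip() if runtime_tag else None
    movie_duration.append(duration)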

# pd.DataFrame(movie_duration)
# tv_dict['duration'] = movie_duration
# pd.DataFrame(tv_dict)

In [None]:
tv_dict['duration'] = movie_duration
pd.DataFrame(tv_dict)

Unnamed: 0,link,title,release_year,genre,imdb_rating,age_rating,country,streaming_service,duration
0,https://www.justwatch.com/in/tv-show/jack-reacher,Reacher,(2022),"Action & Adventure, Crime, Drama, Mystery & Th...",8.1 (195k),A,United States,"Currently you are able to watch ""Reacher"" stre...",48min
1,https://www.justwatch.com/in/tv-show/true-dete...,True Detective,(2014),"Drama, Mystery & Thriller, Crime",8.9 (632k),U,United States,"Currently you are able to watch ""True Detectiv...",59min
2,https://www.justwatch.com/in/tv-show/indian-po...,Indian Police Force,(2024),"Action & Adventure, Crime",5.8 (50k),A,India,"Currently you are able to watch ""Indian Police...",38min
3,https://www.justwatch.com/in/tv-show/mirzapur,Mirzapur,(2018),"Crime, Action & Adventure, Drama, Mystery & Th...",8.5 (80k),,India,"Currently you are able to watch ""Mirzapur"" str...",50min
4,https://www.justwatch.com/in/tv-show/game-of-t...,Game of Thrones,(2011),"Drama, Action & Adventure, Science-Fiction, Fa...",9.2 (2m),U,United States,"Currently you are able to watch ""Game of Thron...",58min
...,...,...,...,...,...,...,...,...,...
95,https://www.justwatch.com/in/tv-show/the-big-b...,The Big Bang Theory,(2007),"Comedy, Romance",8.2 (857k),U,United States,"Currently you are able to watch ""The Big Bang ...",20min
96,https://www.justwatch.com/in/tv-show/the-flash,The Flash,(2014),"Action & Adventure, Drama, Science-Fiction",7.5 (366k),,United States,We try to add new providers constantly but we ...,42min
97,https://www.justwatch.com/in/tv-show/alexander...,Alexander: The Making of a God,(2024),"Documentary, Drama, History",5.1 (2k),,United Kingdom,"Currently you are able to watch ""Alexander: Th...",39min
98,https://www.justwatch.com/in/tv-show/the-king-...,The King: Eternal Monarch,(2020),"Drama, Science-Fiction",Baek Sang-Hoon,,South Korea,"Currently you are able to watch ""The King: Ete...",1h 12min
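
The four loops above each download every title page again, and a `find(...)` that returns `None` has to be handled at every call site. A minimal sketch of a shared lookup helper follows. `detail_value` and `SAMPLE_HTML` are hypothetical names, and the markup is a simplified stand-in that only mirrors the class names used above, not the live JustWatch page.

```python
from bs4 import BeautifulSoup


def detail_value(soup, label, default=None):
    """Return the text of the value div that follows the subheading named `label`.

    Subheading text is compared after stripping whitespace, so "Age rating"
    and " Production country " are matched the same way.
    """
    for heading in soup.find_all("h3", class_="detail-infos__subheading"):
        if heading.get_text(strip=True) == label:
            value = heading.find_next("div", class_="detail-infos__value")
            return value.get_text(strip=True) if value else default
    return default


# Simplified stand-in markup (an assumption, not copied from the real page).
SAMPLE_HTML = """
<div class="detail-infos">
  <h3 class="detail-infos__subheading">Age rating</h3>
  <div class="detail-infos__value">A</div>
</div>
<div class="detail-infos">
  <h3 class="detail-infos__subheading"> Production country </h3>
  <div class="detail-infos__value">United States</div>
</div>
<div class="detail-infos">
  <h3 class="detail-infos__subheading">Runtime</h3>
  <div class="detail-infos__value">48min</div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# One parsed page yields all three fields, so each link is fetched only once.
row = {
    "age_rating": detail_value(soup, "Age rating"),
    "country": detail_value(soup, "Production country"),
    "duration": detail_value(soup, "Runtime"),
}
missing = detail_value(soup, "Director")  # label absent from the sample markup
print(row, missing)
```

In the notebook, the same helper could be called three times inside a single `for link in tv_show_links:` loop, which would cut the number of requests per title from four to one.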


## **Creating TV Show DataFrame**

In [None]:
# Write Your Code here
pd.DataFrame(tv_dict, columns=['title', 'release_year', 'duration', 'imdb_rating', 'age_rating', 'country', 'genre', 'link', 'streaming_service'])

Unnamed: 0,title,release_year,duration,imdb_rating,age_rating,country,genre,link,streaming_service
0,Reacher,(2022),48min,8.1 (195k),A,United States,"Action & Adventure, Crime, Drama, Mystery & Th...",https://www.justwatch.com/in/tv-show/jack-reacher,"Currently you are able to watch ""Reacher"" stre..."
1,True Detective,(2014),59min,8.9 (632k),U,United States,"Drama, Mystery & Thriller, Crime",https://www.justwatch.com/in/tv-show/true-dete...,"Currently you are able to watch ""True Detectiv..."
2,Indian Police Force,(2024),38min,5.8 (50k),A,India,"Action & Adventure, Crime",https://www.justwatch.com/in/tv-show/indian-po...,"Currently you are able to watch ""Indian Police..."
3,Mirzapur,(2018),50min,8.5 (80k),,India,"Crime, Action & Adventure, Drama, Mystery & Th...",https://www.justwatch.com/in/tv-show/mirzapur,"Currently you are able to watch ""Mirzapur"" str..."
4,Game of Thrones,(2011),58min,9.2 (2m),U,United States,"Drama, Action & Adventure, Science-Fiction, Fa...",https://www.justwatch.com/in/tv-show/game-of-t...,"Currently you are able to watch ""Game of Thron..."
...,...,...,...,...,...,...,...,...,...
95,The Big Bang Theory,(2007),20min,8.2 (857k),U,United States,"Comedy, Romance",https://www.justwatch.com/in/tv-show/the-big-b...,"Currently you are able to watch ""The Big Bang ..."
96,The Flash,(2014),42min,7.5 (366k),,United States,"Action & Adventure, Drama, Science-Fiction",https://www.justwatch.com/in/tv-show/the-flash,We try to add new providers constantly but we ...
97,Alexander: The Making of a God,(2024),39min,5.1 (2k),,United Kingdom,"Documentary, Drama, History",https://www.justwatch.com/in/tv-show/alexander...,"Currently you are able to watch ""Alexander: Th..."
98,The King: Eternal Monarch,(2020),1h 12min,Baek Sang-Hoon,,South Korea,"Drama, Science-Fiction",https://www.justwatch.com/in/tv-show/the-king-...,"Currently you are able to watch ""The King: Ete..."


## **Task 2 :- Data Filtering & Analysis**

In [None]:
# Write Your Code here
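
One way to apply the two criteria from the task description (released in the last 2 years, IMDb rating of 7 or higher) is sketched below. The scraped columns are strings such as `(2022)` and `8.1 (195k)`, so they are parsed into numbers first. `clean_and_filter`, the `year` and `imdb_score` columns, and the `today` parameter are hypothetical names introduced for this sketch; the sample rows are copied from the output above.

```python
import datetime as dt

import pandas as pd


def clean_and_filter(df, min_rating=7.0, years_back=2, today=None):
    """Return rows released within `years_back` years with a score >= `min_rating`."""
    today = today or dt.date.today()
    out = df.copy()
    # "(2022)" -> 2022
    out["year"] = pd.to_numeric(
        out["release_year"].astype(str).str.extract(r"(\d{4})", expand=False),
        errors="coerce",
    )
    # "8.1 (195k)" -> 8.1; values that do not start with a number become NaN
    out["imdb_score"] = pd.to_numeric(
        out["imdb_rating"].astype(str).str.extract(r"^\s*(\d+(?:\.\d+)?)", expand=False),
        errors="coerce",
    )
    recent = out["year"] >= today.year - years_back
    well_rated = out["imdb_score"] >= min_rating  # NaN compares as False
    return out[recent & well_rated]


sample = pd.DataFrame(
    {
        "title": ["Reacher", "True Detective", "Indian Police Force", "The King: Eternal Monarch"],
        "release_year": ["(2022)", "(2014)", "(2024)", "(2020)"],
        "imdb_rating": ["8.1 (195k)", "8.9 (632k)", "5.8 (50k)", "Baek Sang-Hoon"],
    }
)

# A fixed date keeps the example reproducible; omit `today` to use the current date.
filtered = clean_and_filter(sample, today=dt.date(2024, 6, 1))
print(filtered[["title", "year", "imdb_score"]])
```

Rows whose `imdb_rating` is not numeric, such as the last sample row, are dropped by the rating filter rather than raising an error.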


## **Calculating Mean IMDb Ratings for both Movies and TV Shows**

In [None]:
# Write Your Code here
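
A minimal sketch of the mean calculation, assuming the rating column still holds strings such as `8.1 (195k)`. `imdb_score` is a hypothetical helper. The TV values are taken from the output above, while the movie values are invented placeholders because the movie table is not shown in this section.

```python
import pandas as pd


def imdb_score(series):
    """Extract the leading number from strings like '8.1 (195k)'; others become NaN."""
    return pd.to_numeric(
        series.astype(str).str.extract(r"^\s*(\d+(?:\.\d+)?)", expand=False),
        errors="coerce",
    )


# TV ratings copied from the output above (the last one is not a rating).
tv_sample = pd.DataFrame(
    {"imdb_rating": ["8.1 (195k)", "8.9 (632k)", "5.8 (50k)", "Baek Sang-Hoon"]}
)
# Placeholder movie ratings, invented purely for illustration.
movie_sample = pd.DataFrame({"imdb_rating": ["7.0 (1k)", "9.0 (2k)"]})

# Series.mean() skips NaN by default, so unparseable ratings are ignored.
tv_mean = imdb_score(tv_sample["imdb_rating"]).mean()
movie_mean = imdb_score(movie_sample["imdb_rating"]).mean()
print(f"TV shows: {tv_mean:.2f}, movies: {movie_mean:.2f}")
```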


## **Analyzing Top Genres**

In [None]:
# Write Your Code here
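
Each `genre` cell holds a comma-separated list, so counting genres means splitting the cells before tallying. The sketch below does this with `str.split`, `explode`, and `value_counts`. The sample values are the untruncated genre strings from the output above, and `genre_counts` is a hypothetical name.

```python
import pandas as pd

sample = pd.DataFrame(
    {
        "genre": [
            "Action & Adventure, Crime",
            "Drama, Mystery & Thriller, Crime",
            "Comedy, Romance",
            "Documentary, Drama, History",
            "Drama, Science-Fiction",
        ]
    }
)

genre_counts = (
    sample["genre"]
    .str.split(",")   # "Comedy, Romance" -> ["Comedy", " Romance"]
    .explode()        # one row per genre
    .str.strip()      # drop the space left after each comma
    .value_counts()   # most frequent genre first
)
print(genre_counts.head(5))
```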


In [None]:
# Let's visualize it using a word cloud


## **Finding Predominant Streaming Service**

In [None]:
# Write Your Code here
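
The `streaming_service` column holds whole sentences rather than provider names, so the names have to be pulled out before counting. The sketch below assumes each sentence continues as `streaming on <provider list>.`, which the truncated output above does not confirm. The show names, provider lists, and the names `providers` and `PROVIDER_RE` are illustrative assumptions.

```python
import re
from collections import Counter

# Assumed sentence shape: '... streaming on Provider A, Provider B.'
PROVIDER_RE = re.compile(r"streaming on (.+?)(?:\.|$)")


def providers(sentence):
    """Return the provider names found in one availability sentence."""
    match = PROVIDER_RE.search(sentence or "")
    if not match:
        return []  # e.g. the "We try to add new providers..." message
    parts = re.split(r",\s*|\s+or\s+|\s+and\s+", match.group(1))
    return [part.strip() for part in parts if part.strip()]


# Hypothetical sentences written to match the assumed shape.
sentences = [
    'Currently you are able to watch "Show A" streaming on Amazon Prime Video.',
    'Currently you are able to watch "Show B" streaming on Netflix, Amazon Prime Video.',
    "We try to add new providers constantly but we could not find an offer.",
]

counts = Counter(name for sentence in sentences for name in providers(sentence))
top_service, top_count = counts.most_common(1)[0]
print(top_service, top_count)
```

If the real sentences follow a different pattern, only `PROVIDER_RE` and the split expression would need to change; the counting step stays the same.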


In [None]:
# Let's visualize it using a word cloud


## **Task 3 :- Data Export**

In [None]:
# Save the final DataFrame as "Final Data" in CSV format


In [None]:
# Save the filtered data as "Filter Data" in CSV format
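
A minimal sketch of both exports with `DataFrame.to_csv`. The frames `final_df` and `filtered_df` are small stand-ins for the notebook's real DataFrames, and a temporary directory is used here only so the example leaves no files behind; in the notebook the file names alone would do.

```python
import os
import tempfile

import pandas as pd

# Stand-in frames; in the notebook these would be the full and filtered DataFrames.
final_df = pd.DataFrame(
    {"title": ["Reacher", "True Detective"], "release_year": ["(2022)", "(2014)"]}
)
filtered_df = final_df[final_df["title"] == "Reacher"]

out_dir = tempfile.mkdtemp()
final_path = os.path.join(out_dir, "Final Data.csv")
filter_path = os.path.join(out_dir, "Filter Data.csv")

# index=False keeps the row index out of the file, so reloading the CSV
# does not produce an extra "Unnamed: 0" column.
final_df.to_csv(final_path, index=False)
filtered_df.to_csv(filter_path, index=False)

round_trip = pd.read_csv(final_path)
print(list(round_trip.columns))
```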


# **Dataset Drive Link (View Access with Anyone) -**

# ***Congratulations!!! You have completed your Assignment.***