
# Scraping Genre-Wise YTS Movie Details Using Python

- Extracting the latest movie details from the YTS website (ytsyify.live), one of the largest torrent platforms.
- The tools used are Python, `requests`, `BeautifulSoup` and `pandas`.

![download.jpg](https://i.imgur.com/0DuGlCV.jpg)




# What is web scraping?

Web scraping is an automated method of collecting large amounts of data from websites to build a dataset for analysis.
Web scraping can obtain data from many kinds of platforms, such as social media, OTT (streaming) services and ordinary websites.
Most of this data is unstructured HTML, which is converted into structured data (for example, a table or a database) that can be used in various applications.

![001-efficient-web-scraping.png](https://i.imgur.com/wEHurae.png)

# What is YTS?

YTS describes itself as the official YTS YIFY movie torrents website, offering free YIFY movie torrents in 720p, 1080p and 3D quality, with fast downloads at small file sizes.


![hqdefault.jpg](https://i.imgur.com/R1QebWg.jpg)

# A short overview of YTS

YIFY Torrents, or YTS, was (and still is) a peer-to-peer release group known for distributing large numbers of movies as free downloads through BitTorrent. YIFY releases are characterized by their small file sizes, which attracted many downloaders. The group also releases the latest Hollywood movies in various resolutions; the small files download quickly and take up little storage space.

# Objective of the project

- We are going to scrape the genre pages of 'https://ytsyify.live/', starting with 'https://ytsyify.live/genre/action/'.
- We will get the list of film genres.
- For each genre, we will get the latest movies on its page and write them to a CSV file.

### For each movie we will get:
- movie name with year
- image link
- IMDB rating
- Download link


# List of genres
1. Action
2. Adventure
3. Animation
4. Comedy
5. Crime
6. Drama
7. Family
8. Fantasy
9. Horror
10. Romance
11. Sci-Fi
12. Sport
13. Thriller
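Each genre name above corresponds to a page URL. The site's genre menu links Action to `https://ytsyify.live/genre/action/` and Adventure to `https://ytsyify.live/genre/adventure/`. A minimal sketch of building such URLs for the whole list, assuming every other genre follows the same lowercase pattern (not verified for names such as Sci-Fi); `genre_slug` and `genre_url` are hypothetical helper names:

```python
BASE_URL = "https://ytsyify.live/genre/"

# The genres listed above, in the same order
GENRES = ["Action", "Adventure", "Animation", "Comedy", "Crime", "Drama", "Family",
          "Fantasy", "Horror", "Romance", "Sci-Fi", "Sport", "Thriller"]


def genre_slug(name: str) -> str:
    """Turn a display name into a URL slug, e.g. 'Sci-Fi' -> 'sci-fi' (assumed pattern)."""
    return name.strip().lower().replace(" ", "-")


def genre_url(name: str) -> str:
    """Build a genre page URL such as https://ytsyify.live/genre/action/."""
    return BASE_URL + genre_slug(name) + "/"


for genre in GENRES[:3]:
    print(genre_url(genre))
```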

# Outline of the project:

1. Analyzing the structure of the [YTS website](https://ytsyify.live/)
2. Installing and importing the required libraries.
3. Downloading the page with `requests` and extracting the movie data with `BeautifulSoup`.
4. Obtaining the movie data for each genre.
5. Parsing the latest movie data into: movie name, image link, IMDb rating and download link.
6. Storing the extracted data for each movie in a dictionary.
7. Saving the data to a `CSV` file and compiling it into a DataFrame using `Pandas`.

In [None]:
!pip install jovian --upgrade --quiet

In [None]:
import jovian

In [None]:
# Execute this to save new versions of the notebook
jovian.commit(project="scraping-yts-movies-list")

[jovian] Updating notebook "vivekkalaiarasan/scraping-yts-movies-list" on https://jovian.ai
[jovian] Committed successfully! https://jovian.ai/vivekkalaiarasan/scraping-yts-movies-list


'https://jovian.ai/vivekkalaiarasan/scraping-yts-movies-list'

# Downloading the web page using requests
- We will use the `requests` library to download the page and then extract the information from it.

In [None]:
!pip install requests --quiet

In [None]:
import requests

The `requests` library is now installed and imported.

We can now use its functions to download a web page and read the data it returns.

In [None]:
url_page = 'https://ytsyify.live/genre/action/'

In [None]:
response = requests.get(url_page)

The web page is downloaded with the `requests.get` function, which returns a `Response` object.

In [None]:
type(response)

requests.models.Response

In [None]:
response.status_code

200

If the request was successful, `response.status_code` will be between 200 and 299.
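A minimal sketch of that rule as a helper function (`is_successful` is a hypothetical name, not part of `requests`). The `requests` library also provides `response.raise_for_status()`, which raises `requests.HTTPError` for 4xx and 5xx responses.

```python
def is_successful(status_code: int) -> bool:
    """Return True when an HTTP status code is in the 2xx (success) range."""
    return 200 <= status_code < 300


for code in (200, 204, 301, 404):
    print(code, is_successful(code))
```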

In [None]:
page_content = response.text
len(page_content)

153799

The contents of the web page can be accessed through the `.text` attribute of the response.

In [None]:
page_content[:1000]

'<!DOCTYPE html>\n<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en-US">\n <head>\n <!-- Global site tag (gtag.js) - Google Analytics -->\n<script async src="https://www.googletagmanager.com/gtag/js?id=UA-205875704-1"></script>\n<script>\n  window.dataLayer = window.dataLayer || [];\n  function gtag(){dataLayer.push(arguments);}\n  gtag(\'js\', new Date());\n\n  gtag(\'config\', \'UA-205875704-1\');\n</script>\n<meta charset="UTF-8">\n<meta name="robots" content="index,follow">\n<meta http-equiv="content-language" content="en">\n<meta property="og:image:width" content="800"/>\n<meta property="og:image:height" content="420"/>\n<meta property="og:image:type" content="image/png"/>\n<meta property="og:image" content="https://ytsyify.live/wp-content/themes/movies/images/fb-capture.png"/>\n<meta name="viewport" content="width=device-width, initial-scale=1, minimum-scale=1, maximum-scale=1">\n<meta name="google-site-verification" content="EafBggd2HZRFrnLGA_h2j_-bcxjrGwhNy8tELx

The output above is the beginning of the page's HTML source code, which defines the content displayed on the web page.

In [None]:
with open('action_movies_and_ratings.html', 'w', encoding = 'utf-8') as file:
    file.write(page_content)

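The cell above writes the page contents to a local file, `action_movies_and_ratings.html`, so that it can be read again later without downloading the page.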
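Saving the page also makes it possible to skip the download on later runs. A minimal sketch, assuming `fetch` is any function that downloads a URL and returns its HTML (such as the `fetch_page` function defined in the next section); `fetch_with_cache` is a hypothetical name:

```python
from pathlib import Path
from typing import Callable


def fetch_with_cache(url: str, cache_path: str, fetch: Callable[[str], str]) -> str:
    """Return the HTML for `url`, reading it from `cache_path` if it was saved before."""
    path = Path(cache_path)
    if path.exists():
        # Reuse the saved copy instead of downloading the page again
        return path.read_text(encoding="utf-8")
    html = fetch(url)
    # An explicit UTF-8 encoding keeps titles such as "I’ll Come Running" intact
    path.write_text(html, encoding="utf-8")
    return html
```

For example, `fetch_with_cache('https://ytsyify.live/genre/action/', 'action_movies_and_ratings.html', fetch_page)` downloads the page only if the file does not exist yet. Delete the file to force a fresh download.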

# Simplifying into a function

In [None]:
def fetch_page(url_page):
    # Fetch the page at url_page and return its HTML as a string
    response = requests.get(url_page)  # download the page
    # Any status code from 200 to 299 means the page was fetched successfully
    if not 200 <= response.status_code < 300:
        raise Exception('Fetch FAILED: status code {}'.format(response.status_code))

    return response.text

##### The function above fetches a web page using the `requests` library.
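`fetch_page` raises an exception on the first failed request. Since a site like this is not always reachable, a small retry wrapper can help. A minimal sketch, assuming `fetch` is any function that raises on failure (such as `fetch_page`); the name `fetch_with_retries` and the doubling back-off are illustrative choices, not part of the original notebook:

```python
import time
from typing import Callable


def fetch_with_retries(fetch: Callable[[str], str], url: str,
                       attempts: int = 3, delay: float = 1.0) -> str:
    """Call `fetch(url)` up to `attempts` times, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    wait = delay
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except Exception:
            if attempt == attempts:
                raise  # give up and re-raise the last error
            time.sleep(wait)
            wait *= 2
```

Usage: `html = fetch_with_retries(fetch_page, 'https://ytsyify.live/genre/action/')`.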

# Parsing the HTML source using BeautifulSoup

In [None]:
!pip install beautifulsoup4 --upgrade --quiet

In [None]:
from bs4 import BeautifulSoup

Using Python's built-in `open` function, we can read the saved HTML file back into a string.

In [None]:
with open('action_movies_and_ratings.html', 'r', encoding='utf-8') as file:
    html_src = file.read()

We pass the fetched page contents to `BeautifulSoup`, which parses them and returns a document of type `bs4.BeautifulSoup`.

In [None]:
doc = BeautifulSoup(page_content, 'html.parser')

In [None]:
type(doc)

bs4.BeautifulSoup

The document is now parsed, and methods such as `find` and `find_all` can be used to extract the required contents.

In [None]:
doc.title.text

'Action Movies List by YTS YIFY'

# Simplifying to a function

In [None]:
def get_docs(page_content):
    # Take an HTML (or XML) page as a string and return it as a BeautifulSoup document
    doc = BeautifulSoup(page_content, 'html.parser')
    return doc

# Extracting the movie details: name, IMDb rating, image source and download link

- First we find the `div` that holds each movie. Once we have all the movie blocks, we extract the individual fields from them.

# Finding the movie info

We use `doc.find_all()` with the `class_` argument set to `'ml-item'`; this returns a list of all the movie blocks in the document.

In [None]:
movie_doc = doc.find_all('div',class_= 'ml-item' )

In [None]:
len(movie_doc)

40

In [None]:
movie_docs = movie_doc[25]
movie_docs

<div class="ml-item" data-movie-id="67308">
<a class="ml-mask jt" data-hasqtip="112" data-url="" href="https://ytsyify.live/movie/the-big-bang-2011/" oldtitle="The Big Bang (2011)" title="">
<img alt="The Big Bang (2011)" class="mli-thumb" src="https://image.tmdb.org/t/p/w185/tnr5dg4YGGCRgLr84noLmbQfiQS.jpg"/>
<span class="mli-info"><h2>The Big Bang (2011)</h2></span>
</a>
<div id="hidden_tip">
<div class="qtip-title" id="">The Big Bang (2011)</div>
<div class="jtip-top">
<div class="jt-info jt-imdb"> IMDb: 5.4</div>
<div class="jt-info"><a href="https://ytsyify.live/release-year/2011/" rel="tag">2011</a></div>
<div class="jt-info">101</div>
<div class="clearfix"></div>
</div>
<p class="f-desc"></p><p>A private detective is hired to find a missing stripper but the job turns complicated when everyone he questions ends up dead. From the mean streets of Los Angeles to…</p>
<div class="block">Country: <a href="https://ytsyify.live/country/united-states/" rel="tag">United States</a></div>
<

###### The movie block above contains all the information we need to extract.

# Extracting the name

In [None]:
movie_name = movie_docs.find('div', class_ ='qtip-title').text
movie_name

'The Big Bang (2011)'

# Extracting the IMDb rating

In [None]:
imdb_rating = movie_docs.find('div',class_= 'jt-info jt-imdb').text
imdb_rating

' IMDb: 5.4'

# Extracting the download link

In [None]:
download_link =movie_docs.find('a', class_= 'btn btn-block btn-successful')['href']
download_link

'https://ytsyify.live/movie/the-big-bang-2011/'

# Extracting the header

In [None]:
matching_tags = doc.find_all('div',{'id' : 'menu' })

In [None]:
matching_tags

[<div id="menu">
 <ul class="top-menu" id="menu-main-menu"><li class="menu-item menu-item-type-custom menu-item-object-custom menu-item-home menu-item-14" id="menu-item-14"><a href="https://ytsyify.live/">Home</a></li>
 <li class="menu-item menu-item-type-post_type menu-item-object-page menu-item-136" id="menu-item-136"><a href="https://ytsyify.live/movies/">YIFY Movies</a></li>
 <li class="menu-item menu-item-type-custom menu-item-object-custom current-menu-ancestor current-menu-parent menu-item-has-children menu-item-15" id="menu-item-15"><a>Genre</a>
 <div class="sub-container" style="display: none;"><ul class="sub-menu">
 <li class="menu-item menu-item-type-taxonomy menu-item-object-category current-menu-item menu-item-122" id="menu-item-122"><a href="https://ytsyify.live/genre/action/">Action</a></li>
 <li class="menu-item menu-item-type-taxonomy menu-item-object-category menu-item-123" id="menu-item-123"><a href="https://ytsyify.live/genre/adventure/">Adventure</a></li>
 <li clas

In [None]:
header_link_tag = doc.find_all('a')[3]

In [None]:
header_link_tag.text

'Genre'

In [None]:
'https://ytsyify.live/'+ header_link_tag.text

'https://ytsyify.live/Genre'
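Appending the link text to the home page URL gives 'https://ytsyify.live/Genre', which is not one of the genre URLs listed in the menu. Those URLs sit in the menu's `sub-menu` list and follow the pattern `https://ytsyify.live/genre/<name>/`. A minimal sketch of reading them directly from the `href` attributes with a CSS selector, run here on a trimmed copy of the menu HTML printed above (`get_genre_links` is a hypothetical helper name):

```python
from bs4 import BeautifulSoup

# A trimmed copy of the genre menu HTML shown in the output above
MENU_HTML = """
<div id="menu">
 <ul class="top-menu" id="menu-main-menu">
  <li id="menu-item-15"><a>Genre</a>
   <div class="sub-container"><ul class="sub-menu">
    <li id="menu-item-122"><a href="https://ytsyify.live/genre/action/">Action</a></li>
    <li id="menu-item-123"><a href="https://ytsyify.live/genre/adventure/">Adventure</a></li>
   </ul></div>
  </li>
 </ul>
</div>
"""


def get_genre_links(doc):
    """Map each genre name in the menu's sub-menu to its page URL."""
    links = {}
    for a in doc.select("#menu ul.sub-menu a[href]"):
        links[a.get_text(strip=True)] = a["href"]
    return links


doc = BeautifulSoup(MENU_HTML, "html.parser")
print(get_genre_links(doc))
# {'Action': 'https://ytsyify.live/genre/action/', 'Adventure': 'https://ytsyify.live/genre/adventure/'}
```

On the real page, `get_genre_links(get_genre_page('action'))` would read the full menu in the same way.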

# Getting a genre page

Here we fetch the page for a given genre (topic), so that its movies can be extracted.

In [None]:
def get_genre_page(topic):
    # Download the page for one genre (e.g. 'action') and return it as a BeautifulSoup document
    genre_page_url = 'https://ytsyify.live/genre/' + topic
    response = requests.get(genre_page_url)
    if not 200 <= response.status_code < 300:
        print('status code:', response.status_code)
        raise Exception('Failed to fetch page ' + genre_page_url)

    doc = BeautifulSoup(response.text, 'html.parser')

    return doc

![download.jpg](https://i.imgur.com/oc27tNZ.png)

In [None]:
doc = get_genre_page('action')

In [None]:
doc.title.text

'Action Movies List by YTS YIFY'

In [None]:
doc2 = get_genre_page('romance')
doc2.title.text

'Romance Movies List by YTS YIFY'

# Simplifying into a function

In [None]:
def movie_data(movie_docs):
    # This function will extract all the information required and return as dictionary.
    movie_name = movie_docs.find('div', class_ ='qtip-title').text
    imgs_tag = movie_docs.find('img')['src']
    imdb_rating = movie_docs.find('div',class_= 'jt-info jt-imdb').text.strip()
    download_link =movie_docs.find('a', class_= 'btn btn-block btn-successful')['href']
    return {
        'Movie_Name' : movie_name,
        'Image_Link' : imgs_tag,
        'IMDB_Ratings' : imdb_rating,
        'Download' :  download_link
    }

The function above returns the data for one movie as a dictionary.
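`movie_data` assumes every field is present. If a movie block has no `a` tag with the class `btn btn-block btn-successful`, `find` returns `None` and `None['href']` raises a `TypeError`. A minimal sketch of a more forgiving variant, where a missing field becomes `None` instead (the helper names are hypothetical, and unlike `movie_data` it also strips whitespace from the name):

```python
from bs4 import BeautifulSoup


def text_or_none(block, name, class_):
    """Return the stripped text of the first matching tag, or None if there is no such tag."""
    tag = block.find(name, class_=class_)
    return tag.get_text(strip=True) if tag is not None else None


def attr_or_none(block, name, attr, class_=None):
    """Return `attr` of the first matching tag, or None if the tag or the attribute is missing."""
    tag = block.find(name, class_=class_) if class_ else block.find(name)
    return tag.get(attr) if tag is not None else None


def movie_data_safe(block):
    """Same fields as movie_data, but a missing element gives None instead of an error."""
    return {
        'Movie_Name': text_or_none(block, 'div', 'qtip-title'),
        'Image_Link': attr_or_none(block, 'img', 'src'),
        'IMDB_Ratings': text_or_none(block, 'div', 'jt-info jt-imdb'),
        'Download': attr_or_none(block, 'a', 'href', 'btn btn-block btn-successful'),
    }


# A trimmed movie block based on the output above, without a download button
sample = BeautifulSoup(
    '<div class="ml-item">'
    '<img class="mli-thumb" src="https://image.tmdb.org/t/p/w185/tnr5dg4YGGCRgLr84noLmbQfiQS.jpg"/>'
    '<div class="qtip-title">The Big Bang (2011)</div>'
    '<div class="jt-info jt-imdb"> IMDb: 5.4</div>'
    '</div>',
    'html.parser',
)
print(movie_data_safe(sample.find('div', class_='ml-item')))
```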

In [None]:
movie_data(movie_doc[35])

{'Movie_Name': 'Hansan: Rising Dragon (2022)',
 'Image_Link': 'https://image.tmdb.org/t/p/w185/erKuvxvfdkUU1nu9cSNGEfGfy4A.jpg',
 'IMDB_Ratings': 'IMDb: 6.8',
 'Download': 'https://ytsyify.live/movie/hansan-rising-dragon-2022/'}
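The rating is stored as text such as 'IMDb: 6.8', and some movies show 'IMDb: N/A' (see the romance output below). Converting it to a number makes sorting and filtering by rating possible. A minimal sketch using a regular expression (`parse_rating` is a hypothetical helper name):

```python
import re
from typing import Optional

# Matches the number in strings such as "IMDb: 6.8" or " IMDb: 5.4"
RATING_PATTERN = re.compile(r"IMDb:\s*(\d+(?:\.\d+)?)")


def parse_rating(text: str) -> Optional[float]:
    """Convert 'IMDb: 6.8' to 6.8; return None for 'IMDb: N/A' or unexpected text."""
    match = RATING_PATTERN.search(text)
    return float(match.group(1)) if match else None


for text in ("IMDb: 6.8", " IMDb: 5.4", "IMDb: N/A"):
    print(repr(text), parse_rating(text))
```

Returning `None` rather than `0.0` for missing ratings keeps unrated movies from being mistaken for badly rated ones.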

In [None]:
all_movies = [movie_data(tag) for tag in movie_doc]

In [None]:
len(all_movies)

40

In [None]:
def get_genre_movie(doc):
    # This function will get the data of the latest movies in the page.
    movie_doc = doc.find_all('div',class_= 'ml-item' )
    all_movies = [movie_data(tag) for tag in movie_doc]
    return all_movies

In [None]:
genre_romance = get_genre_page('romance')
data_romance = get_genre_movie(genre_romance)
data_romance[:20]

[{'Movie_Name': 'Love Nonetheless (2022)',
  'Image_Link': 'https://image.tmdb.org/t/p/w185/3iSO2QxyY4r30QtXY8z6xITBypl.jpg',
  'IMDB_Ratings': 'IMDb: N/A',
  'Download': 'https://ytsyify.live/movie/love-nonetheless-2022/'},
 {'Movie_Name': 'The Merry Widow (2007)',
  'Image_Link': 'https://image.tmdb.org/t/p/w185/oauzSoL2gHokbvRm49LLDOKlTOx.jpg',
  'IMDB_Ratings': 'IMDb: 5.3',
  'Download': 'https://ytsyify.live/movie/the-merry-widow-2007/'},
 {'Movie_Name': 'Fatherland (1994)',
  'Image_Link': 'https://image.tmdb.org/t/p/w185/8HbeLtS7FRaQ6Ys8Qf62HW1VBSn.jpg',
  'IMDB_Ratings': 'IMDb: 6.4',
  'Download': 'https://ytsyify.live/movie/fatherland-1994/'},
 {'Movie_Name': 'Rosita (2015)',
  'Image_Link': 'https://image.tmdb.org/t/p/w185/mExefbPnCH2bfee6Bqb43zA0BTC.jpg',
  'IMDB_Ratings': 'IMDb: 6.5',
  'Download': 'https://ytsyify.live/movie/rosita-2015/'},
 {'Movie_Name': 'I’ll Come Running (2008)',
  'Image_Link': 'https://image.tmdb.org/t/p/w185/fx60AOZM2b0LOYkLIC163c9lIZs.jpg',
  'IMDB

In [None]:
genre_horror = get_genre_page('horror')
data_horror = get_genre_movie(genre_horror)
data_horror[:20]

[{'Movie_Name': 'Doom: Annihilation (2019)',
  'Image_Link': 'https://image.tmdb.org/t/p/w185/7EGElXVSNnqcPjuhXPd6UVUX1rD.jpg',
  'IMDB_Ratings': 'IMDb: 3.7',
  'Download': 'https://ytsyify.live/movie/doom-annihilation-2019/'},
 {'Movie_Name': 'Doom (2005)',
  'Image_Link': 'https://image.tmdb.org/t/p/w185/eVjlW6aOjqEohH4Ph4PktyH4fMr.jpg',
  'IMDB_Ratings': 'IMDb: 5.2',
  'Download': 'https://ytsyify.live/movie/doom-2005/'},
 {'Movie_Name': 'Dog Soldiers (2002)',
  'Image_Link': 'https://image.tmdb.org/t/p/w185/39B0B9v089W4wykhZuDnBzwlFxs.jpg',
  'IMDB_Ratings': 'IMDb: 6.8',
  'Download': 'https://ytsyify.live/movie/dog-soldiers-2002/'},
 {'Movie_Name': 'The Curse of All Hallows’ Eve (2017)',
  'Image_Link': 'https://image.tmdb.org/t/p/w185/aA1jOI27cN9IVi3sAj8ovGX6uOY.jpg',
  'IMDB_Ratings': 'IMDb: 4.6',
  'Download': 'https://ytsyify.live/movie/the-curse-of-all-hallows-eve-2017/'},
 {'Movie_Name': 'The Damned (2013)',
  'Image_Link': 'https://image.tmdb.org/t/p/w185/jtTlnB7jOfEBiK5eVS

# Writing CSV files with headers

- Here we write the data to a `.csv` file so that it can be used later.

In [None]:
headers_ = list(data_horror[0].keys())
','.join(headers_) + '\n'

'Movie_Name,Image_Link,IMDB_Ratings,Download\n'

In [None]:
def write_csv(items, path):

    with open(path, 'w', encoding='utf-8') as file:   # Open the file in write mode

        if len(items) == 0:     # Return if there's nothing to write
            return

        # Write the headers in the first line
        headers = list(items[0].keys())
        file.write(','.join(headers) + '\n')

        for item in items:            # Write one item per line
            values = []
            for header in headers:
                values.append(str(item.get(header, "")))
            file.write(','.join(values) + "\n")

In [None]:
write_csv(data_horror, 'Horror.csv')

In [None]:
with open('Horror.csv', 'r', encoding='utf-8') as file:
    print(file.read())

Movie_Name,Image_Link,IMDB_Ratings,Download
Doom: Annihilation (2019),https://image.tmdb.org/t/p/w185/7EGElXVSNnqcPjuhXPd6UVUX1rD.jpg,IMDb: 3.7,https://ytsyify.live/movie/doom-annihilation-2019/
Doom (2005),https://image.tmdb.org/t/p/w185/eVjlW6aOjqEohH4Ph4PktyH4fMr.jpg,IMDb: 5.2,https://ytsyify.live/movie/doom-2005/
Dog Soldiers (2002),https://image.tmdb.org/t/p/w185/39B0B9v089W4wykhZuDnBzwlFxs.jpg,IMDb: 6.8,https://ytsyify.live/movie/dog-soldiers-2002/
The Curse of All Hallows’ Eve (2017),https://image.tmdb.org/t/p/w185/aA1jOI27cN9IVi3sAj8ovGX6uOY.jpg,IMDb: 4.6,https://ytsyify.live/movie/the-curse-of-all-hallows-eve-2017/
The Damned (2013),https://image.tmdb.org/t/p/w185/jtTlnB7jOfEBiK5eVSLUMWR6aKg.jpg,IMDb: 5.2,https://ytsyify.live/movie/the-damned-2013/
The Invitation (2022),https://image.tmdb.org/t/p/w185/jcTq6gIskCsHlKDvCKKouEfiU66.jpg,IMDb: 5.4,https://ytsyify.live/movie/the-invitation-2022/
Mansion of Blood (2015),https://image.tmdb.org/t/p/w185/c31XK5G3h8oO9XaLOnJH3M117eP.jpg,
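The `write_csv` function above joins the values with commas but does not quote them, so a movie title that itself contains a comma would be split across two columns. A minimal alternative, assuming the same list-of-dictionaries input, uses the standard library `csv` module, which quotes such fields automatically (`write_csv_dictwriter` is a hypothetical name):

```python
import csv


def write_csv_dictwriter(items, path):
    """Write a list of dictionaries to `path`, quoting fields that contain commas or quotes."""
    # The csv module documentation recommends newline="" for files it writes
    with open(path, "w", newline="", encoding="utf-8") as file:
        if not items:
            return  # nothing to write (the file is left empty, as in write_csv)
        headers = list(items[0].keys())
        # extrasaction="ignore" skips unknown keys; missing keys are written as ""
        writer = csv.DictWriter(file, fieldnames=headers, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(items)
```

Files written this way load correctly with `pandas.read_csv`, which understands the quoted fields.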

# Scraping movies from the topic page

Putting it all into a single function.

In [None]:
def scrape_all_movies(topic, path=None):
    # Get all the movie data for the topic (genre) and write it to a CSV file
    if path is None:
        path = topic + '.csv'
    genre_page_doc = get_genre_page(topic)
    genre_data = get_genre_movie(genre_page_doc)
    write_csv(genre_data, path)
    print('The genre data for the topic "{}" was written to the file "{}"'.format(topic, path))
    return path

In [None]:
scrape_all_movies('action')

The genre data for the topic "action" was written to the file "action.csv"


'action.csv'

In [None]:
scrape_all_movies('romance')

The genre data for the topic "romance" was written to the file "romance.csv"


'romance.csv'

In [None]:
scrape_all_movies('horror')

The genre data for the topic "horror" was written to the file "horror.csv"


'horror.csv'
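The three calls above can be extended to every genre in the list with a loop. A minimal sketch, assuming the genre slug is the lowercase genre name (unverified for every genre) and that `scrape` is a function such as `scrape_all_movies`; `scrape_genres` is a hypothetical name. A failed genre is recorded instead of stopping the loop, and a short pause is added between page downloads:

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple


def scrape_genres(genres: Iterable[str], scrape: Callable[[str], str],
                  pause: float = 1.0) -> Tuple[Dict[str, str], List[str]]:
    """Call `scrape(slug)` for each genre and collect the CSV paths it returns.

    Genres that raise an exception are listed in the second return value.
    """
    written: Dict[str, str] = {}
    failed: List[str] = []
    for name in genres:
        slug = name.strip().lower()  # assumed slug pattern, e.g. 'Action' -> 'action'
        try:
            written[slug] = scrape(slug)
        except Exception as error:
            print(f"Skipping {slug!r}: {error}")
            failed.append(slug)
        time.sleep(pause)  # wait between page downloads to go easy on the server
    return written, failed
```

Usage: `written, failed = scrape_genres(['Action', 'Romance', 'Horror'], scrape_all_movies)`.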

In [None]:
import pandas

# Reading the CSV files with pandas

The `pandas.read_csv` function reads a CSV file into a structured table (a `DataFrame`).

In [None]:
pandas.read_csv('action.csv')

| | Movie_Name | Image_Link | IMDB_Ratings | Download |
|---|---|---|---|---|
| 0 | Doom: Annihilation (2019) | https://image.tmdb.org/t/p/w185/7EGElXVSNnqcPj... | IMDb: 3.7 | https://ytsyify.live/movie/doom-annihilation-2... |
| 1 | Doom (2005) | https://image.tmdb.org/t/p/w185/eVjlW6aOjqEohH... | IMDb: 5.2 | https://ytsyify.live/movie/doom-2005/ |
| 2 | Dog Soldiers (2002) | https://image.tmdb.org/t/p/w185/39B0B9v089W4wy... | IMDb: 6.8 | https://ytsyify.live/movie/dog-soldiers-2002/ |
| 3 | The Marine 2 (2009) | https://image.tmdb.org/t/p/w185/z1PayLY6vlMeCr... | IMDb: 5.0 | https://ytsyify.live/movie/the-marine-2-2009/ |
| 4 | Recoil (2011) | https://image.tmdb.org/t/p/w185/5K0EDKAP4OvmCu... | IMDb: 5.1 | https://ytsyify.live/movie/recoil-2011/ |
| 5 | Free Ride (2013) | https://image.tmdb.org/t/p/w185/9vuDdLXgyjhiuq... | IMDb: 5.6 | https://ytsyify.live/movie/free-ride-2013/ |
| 6 | Lost at War (2007) | https://image.tmdb.org/t/p/w185/8IF24eO1T87y6E... | IMDb: 5.0 | https://ytsyify.live/movie/lost-at-war-2007/ |
| 7 | Bearry (2021) | https://image.tmdb.org/t/p/w185/rjUVnC6itSxUPF... | IMDb: 3.5 | https://ytsyify.live/movie/bearry-2021/ |
| 8 | Empire Of Lust (2015) | https://image.tmdb.org/t/p/w185/bT25JmqjEwPJyS... | IMDb: 6.0 | https://ytsyify.live/movie/empire-of-lust-2015/ |
| 9 | Restart The Earth (2022) | https://image.tmdb.org/t/p/w185/kl80N1g69v9QXe... | IMDb: 4.5 | https://ytsyify.live/movie/restart-the-earth-2... |


In [None]:
pandas.read_csv('romance.csv')

| | Movie_Name | Image_Link | IMDB_Ratings | Download |
|---|---|---|---|---|
| 0 | Love Nonetheless (2022) | https://image.tmdb.org/t/p/w185/3iSO2QxyY4r30Q... | IMDb: N/A | https://ytsyify.live/movie/love-nonetheless-2022/ |
| 1 | The Merry Widow (2007) | https://image.tmdb.org/t/p/w185/oauzSoL2gHokbv... | IMDb: 5.3 | https://ytsyify.live/movie/the-merry-widow-2007/ |
| 2 | Fatherland (1994) | https://image.tmdb.org/t/p/w185/8HbeLtS7FRaQ6Y... | IMDb: 6.4 | https://ytsyify.live/movie/fatherland-1994/ |
| 3 | Rosita (2015) | https://image.tmdb.org/t/p/w185/mExefbPnCH2bfe... | IMDb: 6.5 | https://ytsyify.live/movie/rosita-2015/ |
| 4 | I’ll Come Running (2008) | https://image.tmdb.org/t/p/w185/fx60AOZM2b0LOY... | IMDb: 6.6 | https://ytsyify.live/movie/ill-come-running-2008/ |
| 5 | Bearry (2021) | https://image.tmdb.org/t/p/w185/rjUVnC6itSxUPF... | IMDb: 3.5 | https://ytsyify.live/movie/bearry-2021/ |
| 6 | The Attraction Test (2022) | https://image.tmdb.org/t/p/w185/zwukL1kUeEq2Nj... | IMDb: N/A | https://ytsyify.live/movie/the-attraction-test... |
| 7 | Finding Hubby (2020) | https://image.tmdb.org/t/p/w185/sBCmm7sYip7JIO... | IMDb: 4.1 | https://ytsyify.live/movie/finding-hubby-2020/ |
| 8 | Snowed Inn Christmas (2017) | https://image.tmdb.org/t/p/w185/1GdsUysHya4mdl... | IMDb: 7.2 | https://ytsyify.live/movie/snowed-inn-christma... |
| 9 | Empire Of Lust (2015) | https://image.tmdb.org/t/p/w185/bT25JmqjEwPJyS... | IMDb: 6.0 | https://ytsyify.live/movie/empire-of-lust-2015/ |


In [None]:
pandas.read_csv('horror.csv')

| | Movie_Name | Image_Link | IMDB_Ratings | Download |
|---|---|---|---|---|
| 0 | Doom: Annihilation (2019) | https://image.tmdb.org/t/p/w185/7EGElXVSNnqcPj... | IMDb: 3.7 | https://ytsyify.live/movie/doom-annihilation-2... |
| 1 | Doom (2005) | https://image.tmdb.org/t/p/w185/eVjlW6aOjqEohH... | IMDb: 5.2 | https://ytsyify.live/movie/doom-2005/ |
| 2 | Dog Soldiers (2002) | https://image.tmdb.org/t/p/w185/39B0B9v089W4wy... | IMDb: 6.8 | https://ytsyify.live/movie/dog-soldiers-2002/ |
| 3 | The Curse of All Hallows’ Eve (2017) | https://image.tmdb.org/t/p/w185/aA1jOI27cN9IVi... | IMDb: 4.6 | https://ytsyify.live/movie/the-curse-of-all-ha... |
| 4 | The Damned (2013) | https://image.tmdb.org/t/p/w185/jtTlnB7jOfEBiK... | IMDb: 5.2 | https://ytsyify.live/movie/the-damned-2013/ |
| 5 | The Invitation (2022) | https://image.tmdb.org/t/p/w185/jcTq6gIskCsHlK... | IMDb: 5.4 | https://ytsyify.live/movie/the-invitation-2022/ |
| 6 | Mansion of Blood (2015) | https://image.tmdb.org/t/p/w185/c31XK5G3h8oO9X... | IMDb: 5.1 | https://ytsyify.live/movie/mansion-of-blood-2015/ |
| 7 | Goodnight Mommy (2022) | https://image.tmdb.org/t/p/w185/oHhD5jD4S5ElPN... | IMDb: N/A | https://ytsyify.live/movie/goodnight-mommy-2022/ |
| 8 | Inside (2016) | https://image.tmdb.org/t/p/w185/1PSctD5yezkDKl... | IMDb: 4.7 | https://ytsyify.live/movie/inside-2016/ |
| 9 | The Retreat (2021) | https://image.tmdb.org/t/p/w185/xDlc336bLsy9dg... | IMDb: 4.7 | https://ytsyify.live/movie/the-retreat-2021/ |
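The outline's last step compiles the data into a single DataFrame. A minimal sketch, assuming one CSV per genre as written by `scrape_all_movies`; `combine_genre_csvs` is a hypothetical name. A `Genre` column records where each row came from, so a movie listed under several genres (Bearry (2021) appears in both the action and romance tables above) shows up once per genre:

```python
import pandas as pd


def combine_genre_csvs(paths_by_genre):
    """Read one CSV per genre and stack them into a single DataFrame with a 'Genre' column."""
    frames = []
    for genre, path in paths_by_genre.items():
        frame = pd.read_csv(path)
        frame["Genre"] = genre  # remember which genre page each row came from
        frames.append(frame)
    # ignore_index=True numbers the rows 0..n-1 instead of repeating each file's index
    return pd.concat(frames, ignore_index=True)
```

Usage: `all_movies_df = combine_genre_csvs({'action': 'action.csv', 'romance': 'romance.csv', 'horror': 'horror.csv'})`.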


# Putting it all in one cell for a better understanding

- This cell shows how the functions fit together, one after another, to build the dataset:
- We fetch the page.
- We parse the page contents into a document using bs4 (BeautifulSoup 4).
- We get the genre page.
- We get all the movie data.
- We write it to a CSV file.
- Finally, the single function `scrape_all_movies` combines these steps: it scrapes all the movies on a genre page and writes them to a CSV file.

In [None]:
import requests
from bs4 import BeautifulSoup
url_page = 'https://ytsyify.live/genre/action/'


def fetch_page(url_page):
    # Fetch the page at url_page and return its HTML as a string
    response = requests.get(url_page)  # download the page
    # Any status code from 200 to 299 means the page was fetched successfully
    if not 200 <= response.status_code < 300:
        raise Exception('Fetch FAILED: status code {}'.format(response.status_code))

    return response.text

def get_docs(page_content):
    # Take an HTML (or XML) page as a string and return it as a BeautifulSoup document
    doc = BeautifulSoup(page_content, 'html.parser')
    return doc

def get_genre_page(topic):
    # Download the page for one genre (e.g. 'action') and return it as a BeautifulSoup document
    genre_page_url = 'https://ytsyify.live/genre/' + topic
    response = requests.get(genre_page_url)
    if not 200 <= response.status_code < 300:
        print('status code:', response.status_code)
        raise Exception('Failed to fetch page ' + genre_page_url)

    doc = BeautifulSoup(response.text, 'html.parser')

    return doc


def movie_data(movie_docs):
    # This function will extract all the information required and return as dictionary.
    movie_name = movie_docs.find('div', class_ ='qtip-title').text
    imgs_tag = movie_docs.find('img')['src']
    imdb_rating = movie_docs.find('div',class_= 'jt-info jt-imdb').text.strip()
    download_link =movie_docs.find('a', class_= 'btn btn-block btn-successful')['href']
    return {
        'Movie_Name' : movie_name,
        'Image_Link' : imgs_tag,
        'IMDB_Ratings' : imdb_rating,
        'Download' :  download_link
    }





def get_genre_movie(doc):
    # This function will get the data of the latest movies in the page.
    movie_doc = doc.find_all('div',class_= 'ml-item' )
    all_movies = [movie_data(tag) for tag in movie_doc]
    return all_movies



def write_csv(items, path):

    with open(path, 'w', encoding='utf-8') as file:   # Open the file in write mode

        if len(items) == 0:     # Return if there's nothing to write
            return

        # Write the headers in the first line
        headers = list(items[0].keys())
        file.write(','.join(headers) + '\n')


        for item in items:            #Write one item per line
            values = []
            for header in headers:
                values.append(str(item.get(header, "")))
            file.write(','.join(values) + "\n")

def scrape_all_movies(topic, path=None):
    # Get all the movie data for the topic (genre) and write it to a CSV file
    if path is None:
        path = topic + '.csv'
    genre_page_doc = get_genre_page(topic)
    genre_data = get_genre_movie(genre_page_doc)
    write_csv(genre_data, path)
    print('The genre data for the topic "{}" was written to the file "{}"'.format(topic, path))
    return path

# References and future work

### Summary

1. The project scrapes the latest movie lists from different genre pages on the YTS website (ytsyify.live), collecting each movie's name with year, image link, IMDb rating and download link.
2. This data could be used to identify highly rated movies that OTT platforms might acquire for streaming.
3. Movie critics could use it to give better-informed reviews.
4. It supports a broader analysis of viewer preferences, since a large audience still prefers free download sites.
5. This method could also help curb piracy to some extent, although it cannot stop it entirely.


### Future work
##### Music and music directors
Using the type of music, the number of plays, the number of likes and the music director behind each of the latest songs, we can determine which kinds of songs audiences prefer and why particular music directors are popular.
##### Fetching product data for competitive pricing
We can select the top-selling brands of a similar product on a shopping site and analyze them for competitive pricing, to improve sales.
##### Swiggy average food orders analysis
We can select a particular food, meal or restaurant and its average number of orders per day, and use this to suggest offers or pricing that would increase sales or orders.


### References
- Netflix: www.netflix.com
- ZEE5: www.zee5.com
- Amazon Prime Video: www.primevideo.com
- Other major OTT platforms, such as Voot and Hotstar

In [None]:
import jovian

In [None]:
jovian.commit(project="scraping-yts-movies-list", outputs=['action.csv','romance.csv','horror.csv'])

