In [None]:
# Jovian Commit Essentials
# Please retain and execute this cell without modifying the contents for `jovian.commit` to work
!pip install jovian --upgrade -q
import jovian
jovian.set_project('web-scraping-bookmyshow')
jovian.set_colab_id('1vu2frlBjFeboZujBe9QPCOnVX0WN5M1t')

# **Scraping BookMyShow Movie Data using Python & Selenium**

***Web scraping*** is an automated method for obtaining large amounts of data from websites. [Read more](https://en.wikipedia.org/wiki/Web_scraping)

![](https://i.imgur.com/Yfs3KKh.jpg)

***BookMyShow*** is India's largest entertainment ticketing website. Headquartered in Mumbai, it is a one-stop destination for movies and non-movie options such as events, plays and sports. Apart from movies, BookMyShow has ticketed more than 300 large-format live events and sporting events such as ICL, city marathons, etc.
To book tickets, [go to BookMyShow](https://in.bookmyshow.com/explore/home/).

In this project, we'll retrieve information from the [BookMyShow](https://in.bookmyshow.com/explore/home/) website using web scraping.

We'll use the Python library [Selenium](https://selenium-python.readthedocs.io/) to scrape data from the website.

Here's an outline of the steps we'll follow:

1. Install and import all the libraries to get the data from the WebElement.
2. Generate the list of Popular Cities and Other Cities using `find_element` method.
3. Extract the list of Movies, Censor Ratings, Language, Booking URL and convert it to a DataFrame for 3 Popular Cities.
4. Extract additional details such as Hearts, User Ratings for 3 Movies.
5. Extract Theatre details for the current day on the basis of Booking URL for 3 Hindi Movies.
6. Merge the DataFrames to generate the Combined data with all the details.
7. Save the extracted information to a CSV file

By the end of the project, we'll create a CSV file in the following format:

```
City,Movie Name,Language,Censor Rating,Hearts,Ratings Received,Theatre Name,Booking Url
MUMBAI,The Kashmir Files,Hindi,A,98%,199.3K ratings,"Cinepolis: Fun Republic Mall, Andheri (W)",https://in.bookmyshow.com/mumbai/movies/the-kashmir-files/ET00110845
...
```

### **How to Run the Code**

You can execute the code by using the "Run" button at the top of this page and selecting "Run on Colab". You can make changes and save your own version of the notebook to [Jovian](https://www.jovian.ai) by executing the following cells:

### **Install and import all the libraries to get the data from the WebElement**

We will use `jovian` library to run this notebook on Google Colab

In [None]:
!pip install jovian --upgrade --quiet

We will use `Selenium` together with the `kora` library, which supplies the ready-made WebDriver object `wd` imported below, to locate the required WebElements.

We will use `pandas` to convert the data into DataFrames and save it to a CSV file.

We will use `time` to pause the script while pages load.

The libraries can be installed by using `pip`.

In [None]:
!pip install requests --upgrade --quiet
!pip install kora --quiet
from kora.selenium import wd
from selenium.webdriver.common.by import By
#import os
import pandas as pd
import time

Now we have installed and imported all the required libraries


In [None]:
BMS_url="https://in.bookmyshow.com/explore/home"

### **Generate the list of Popular Cities and Other Cities using `find_element` method.**

We will generate a list of all the **Popular Cities** and **Other Cities** provided on the BookMyShow website.

![](https://imgur.com/Lc9hx4d.jpg)


Let's select Mumbai as the Default city to open the web page

In [None]:
wd.get(BMS_url+"/Mumbai")


`.get()` navigates the browser controlled by the WebDriver to the given URL. It takes the URL as a string and returns once the page has loaded.

After the page has opened, the next step is to access the list of Cities by clicking the button highlighted below.

![](https://imgur.com/n96E3uA.jpg)

The highlighted button can be clicked by using `.click()` on the WebElement.

## Getting the class name (HTML attribute) to create a WebElement

To get the class name, right-click the button to be clicked and select **Inspect**.

![](https://imgur.com/5zkffTL.jpg)

Get the class name and use it to locate the element and create a WebElement.

Selenium's **find element** methods take a locator (a `By` strategy such as class name or XPath, plus a value) and return a single WebElement. The **find elements** variants return a list of WebElements.
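The Inspect panel shows an element's classes separated by spaces, while the lookups in this notebook join them with dots. The sketch below shows that conversion. `class_attr_to_css` is a hypothetical helper written for illustration, and the space-separated input is an assumption about how the attribute appears in the Inspect panel.

```python
def class_attr_to_css(class_attr: str) -> str:
    """Turn an HTML class attribute value into a CSS selector that requires every class."""
    classes = class_attr.split()  # "a b c" -> ["a", "b", "c"]
    return "".join("." + name for name in classes)


# Class names taken from this notebook, written the way an HTML attribute lists them.
selector = class_attr_to_css("sc-kaNhvL jlISnX ellipsis")
print(selector)  # .sc-kaNhvL.jlISnX.ellipsis
```

A selector in this form can be passed to `wd.find_element(By.CSS_SELECTOR, selector)`.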

In [None]:
cities=wd.find_element(By.CLASS_NAME,"sc-kaNhvL.jlISnX.ellipsis") #Click on the Select region options
cities.click()


In [None]:
time.sleep(2)

`time.sleep(2)` pauses the script for 2 seconds so that the page can finish loading. Without the pause, the lookups that follow may return empty results.
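A fixed pause can be too short on a slow connection and wasteful on a fast one. As a rough alternative, the sketch below polls a condition until it succeeds or a timeout expires. `wait_until` and `fake_find_elements` are hypothetical names used only for this illustration, and the fake lookup stands in for a real page.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call `condition` repeatedly until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


# Stand-in for a page that only yields elements on the third check.
attempts = {"count": 0}


def fake_find_elements():
    attempts["count"] += 1
    return ["Mumbai", "NCR"] if attempts["count"] >= 3 else []


cities = wait_until(fake_find_elements, timeout=5.0, poll=0.01)
print(cities)  # ['Mumbai', 'NCR']
```

With a real driver, the condition could be a lambda that calls `wd.find_elements(...)`. Selenium also ships `WebDriverWait` in `selenium.webdriver.support.ui` for the same purpose.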

In [None]:
Cities_data=wd.find_elements(By.XPATH,'//span[@class="sc-iuDHTM uqCMs"]')
Other_Cities_tag=wd.find_element(By.CLASS_NAME,"sc-jxGEyO.fQHEXW") #Click on other Cities
Other_Cities_tag.click()
Other_Cities_data=wd.find_elements(By.XPATH,'//div[@class="sc-cqPOvA fmMura"]')

  


The first line gets the WebElements for the Popular Cities. Here, the class name of the `span` tag is passed inside an `XPATH` expression.

XPath is a query language for selecting nodes in an HTML or XML document. Selenium uses it to navigate the HTML structure of a page.
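To see what such an expression selects without opening a browser, here is a small sketch using Python's standard `xml.etree.ElementTree`, which supports a subset of XPath. The HTML fragment is a hand-written sample modelled on the markup this notebook prints, not the real page.

```python
import xml.etree.ElementTree as ET

# A tiny, well-formed fragment modelled on the markup printed by this notebook.
html = """
<div>
  <span class="sc-iuDHTM uqCMs">Mumbai</span>
  <span class="sc-iuDHTM uqCMs">NCR</span>
  <span class="other">Not a city</span>
</div>
"""

root = ET.fromstring(html)

# .//span         -> any <span> below the root
# [@class='...']  -> keep only spans whose class attribute equals this exact string
spans = root.findall(".//span[@class='sc-iuDHTM uqCMs']")
names = [span.text for span in spans]
print(names)  # ['Mumbai', 'NCR']
```

Note that the predicate compares the whole attribute string, so the classes must appear in the same order and with the same spacing as in the page source.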


In [None]:
print(Cities_data)

[<span class="sc-iuDHTM uqCMs">Mumbai</span>, <span class="sc-iuDHTM uqCMs">NCR</span>, <span class="sc-iuDHTM uqCMs">Bengaluru</span>, <span class="sc-iuDHTM uqCMs">Hyderabad</span>, <span class="sc-iuDHTM uqCMs">Ahmedabad</span>, <span class="sc-iuDHTM uqCMs">Chandigarh</span>, <span class="sc-iuDHTM uqCMs">Chennai</span>, <span class="sc-iuDHTM uqCMs">Pune</span>, <span class="sc-iuDHTM uqCMs">Kolkata</span>, <span class="sc-iuDHTM uqCMs">Kochi</span>]


In [None]:
type(Cities_data)

list

In [None]:
Cities=[]
for e in Cities_data:
  Cities.append(e.get_attribute('innerHTML'))

To extract the names of the Popular Cities, we loop over the list of WebElements and call `.get_attribute('innerHTML')` on each one.

In [None]:
print(Cities)

['Mumbai', 'NCR', 'Bengaluru', 'Hyderabad', 'Ahmedabad', 'Chandigarh', 'Chennai', 'Pune', 'Kolkata', 'Kochi']
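The append loop can also be written as a list comprehension. The sketch below uses `FakeElement`, a hypothetical stand-in for a Selenium WebElement, so that it runs without a browser.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement, for illustration only."""

    def __init__(self, inner_html):
        self._inner_html = inner_html

    def get_attribute(self, name):
        # Only the attribute used in this notebook is modelled.
        return self._inner_html if name == "innerHTML" else None


fake_cities_data = [FakeElement("Mumbai"), FakeElement("NCR"), FakeElement("Bengaluru")]

# Same result as the append loop, written as a list comprehension.
city_names = [element.get_attribute("innerHTML") for element in fake_cities_data]
print(city_names)  # ['Mumbai', 'NCR', 'Bengaluru']
```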


Now, we convert the `for` loop into a reusable function.

In [None]:
def loop(list_data):
  List_name=[]
  for e in list_data:
    List_name.append(e.get_attribute('innerHTML'))
  return List_name


Calling the **loop** function

In [None]:
popular_Cities=loop(Cities_data)
Other_Cities=loop(Other_Cities_data)
print(popular_Cities)
print(Other_Cities)

['Mumbai', 'NCR', 'Bengaluru', 'Hyderabad', 'Chandigarh', 'Ahmedabad', 'Chennai', 'Pune', 'Kolkata', 'Kochi']
['Aalo', 'Abohar', 'Abu Road', 'Achampet', 'Acharapakkam', 'Addanki', 'Adilabad', 'Adimali', 'Adipur', 'Adoni', 'Agar Malwa', 'Agartala', 'Agiripalli', 'Agra', 'Ahmedgarh', 'Ahmednagar', 'Ahore', 'Aizawl', 'Ajmer', 'Akaltara', 'Akbarpur', 'Akividu', 'Akluj', 'Akola', 'Akot', 'Alakode', 'Alangudi', 'Alangulam', 'Alappuzha', 'Alathur', 'Alibaug', 'Aligarh', 'Almora', 'Alsisar (Rajasthan)', 'Alur', 'Alwar', 'Amadalavalasa', 'Amalapuram', 'Amalner', 'Amangal', 'Amaravathi', 'Ambajogai', 'Ambala', 'Ambikapur', 'Ambur', 'Amgaon', 'Amravati', 'Amreli', 'Amritsar', 'Amroha', 'Anaikatti', 'Anakapalle', 'Anand', 'Anandapur', 'Anantapalli', 'Anantapur', 'Anaparthi', 'Anchal', 'Andaman And Nicobar', 'Anekal', 'Angadipuram', 'Angamaly', 'Angara', 'Angul', 'Anjad', 'Anjar', 'Anklav', 'Ankleshwar', 'Annigeri', 'Arakkonam', 'Arambagh', 'Arambol', 'Aranthangi', 'Aravakurichi', 'Ariyalur', 'Arka

### **Extract the list of Movies, Censor Ratings, Language, Booking URL and convert it to DataFrame for 3 Popular Cities.**

Once we have the list of Popular Cities, we perform the steps below to get the list of Movies, Censor Ratings, Languages and Booking URLs.


*   Open the website using `.get()`
*   Click on the **Select Region** button and then on the name of the city to scrape.
*   Click on the **See all** button to load all the movie names for that city.
*   Use `wd.execute_script("window.scrollTo(0, 10000)")` to scroll down so that the entire page is loaded.
*   Save the WebElements for Movies, Censor Ratings, Language and Booking URL in lists.
*   Create a function to convert the lists of WebElements into lists of readable values.


We will create a function called **populate_dataframe** that converts the lists of WebElements into lists of readable values, ready to be turned into a DataFrame.

To cover 3 cities, we will use a `for` loop and return a **dictionary** of the data.

In [None]:
def populate_dataframe(list_of_movies,Rating_Language,city_name,Booking_URL):
  for i in list_of_movies:
    Movies.append(i.text)
    City.append(city_name)
  for i in range(0,len(Rating_Language)):
    if i % 2==0:
      Rating.append(Rating_Language[i].text)
    else:
      language.append(Rating_Language[i].text)
  for i in range(1,len(Booking_URL)):
    Booking_url.append(Booking_URL[i].get_attribute('href')) 
  Movie_Details={'City': City,"Movie Name":Movies,'Language':language,'Censor Rating':Rating,'Booking Url':Booking_url} 
  return Movie_Details


**Rating, language:** the `Rating_Language` WebElements hold censor ratings and languages in alternating order, so even positions go to `Rating` and odd positions go to `language`.

**Booking_url:** the `href` attribute of each `Booking_URL` WebElement gives the booking link of the movie.

**City:** `city_name` is repeated for every movie listed under that city.

**Movies:** the `.text` property returns the visible text of a WebElement.
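The even/odd split of ratings and languages can also be expressed with slicing. The sketch below uses plain strings in place of WebElement text. The sample values are copied from the movie table in this notebook.

```python
# Text values in the order the page yields them: rating, language, rating, language, ...
rating_language_text = [
    "UA", "Telugu, Hindi, Tamil, Kannada, Malayalam",
    "A", "Hindi",
    "UA", "Hindi",
]

ratings = rating_language_text[0::2]    # even positions: 0, 2, 4, ...
languages = rating_language_text[1::2]  # odd positions: 1, 3, 5, ...

print(ratings)    # ['UA', 'A', 'UA']
print(languages)  # ['Telugu, Hindi, Tamil, Kannada, Malayalam', 'Hindi', 'Hindi']
```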

Next, we create the **get_WebElement_Movie** function, which collects the lists of WebElements for each city and calls **populate_dataframe**.

In [None]:

def get_WebElement_Movie(wd):
  for i in range(0,3):
    wd.get(BMS_url+"/Mumbai")
    wd.find_element(By.CLASS_NAME,"sc-kaNhvL.jlISnX.ellipsis").click() #Click on the Select region options
    time.sleep(2)
    wd.find_element(By.CLASS_NAME,"sc-jxGEyO.fQHEXW").click() #Click on other Cities
    emojis_imgs = wd.find_elements(By.XPATH,'//img[@class="sc-bqjOQT aUKrX"]') #Get the WebElements from the img tags
    city_name=emojis_imgs[i].get_attribute('alt') #Get the name of the i-th city
    emojis_imgs[i].click() #Open the i-th city (index 0 would open Mumbai on every pass)
    time.sleep(2)
    wd.find_element(By.CLASS_NAME,"sc-7o7nez-0.gUjRuq").click() #Click on See all
    wd.execute_script("window.scrollTo(0, 10000);") #Scroll the page down
    time.sleep(5)
    list_of_movies=wd.find_elements(By.CLASS_NAME,"sc-7o7nez-0.bJKnib") #Get movie names
    Rating_Language=wd.find_elements(By.XPATH,"//div[@class='sc-7o7nez-0 kQsZEC']") #Get rating and language
    Booking_URL=wd.find_elements(By.CLASS_NAME,'sc-133848s-11.sc-1ljcxl3-1.eQiiBj') #Get Booking Link
    Movie_Details=populate_dataframe(list_of_movies,Rating_Language,city_name,Booking_URL) #Appends to the lists created in the next cell
  return Movie_Details

Calling the **get_WebElement_Movie** function

In [None]:
Movies=[]
Rating=[]
language=[]
Booking_url=[]
City=[]
Movie_Details=get_WebElement_Movie(wd)


In [None]:
print(Movie_Details)

{'City': ['MUMBAI', 'MUMBAI', 'MUMBAI', 'MUMBAI', 'MUMBAI', 'MUMBAI', 'MUMBAI', 'MUMBAI', 'MUMBAI', 'MUMBAI', 'MUMBAI', 'MUMBAI', 'MUMBAI', 'MUMBAI', 'NCR', 'NCR', 'NCR', 'NCR', 'NCR', 'NCR', 'NCR', 'NCR', 'NCR', 'NCR', 'NCR', 'NCR', 'NCR', 'NCR', 'BANG', 'BANG', 'BANG', 'BANG', 'BANG', 'BANG', 'BANG', 'BANG', 'BANG', 'BANG', 'BANG', 'BANG', 'BANG', 'BANG'], 'Movie Name': ['RRR', 'The Kashmir Files', 'Bachchhan Paandey ', 'The Batman', 'Gangubai Kathiawadi', 'King Richard', 'Pawankhind', 'Jhund', 'Dear Father', 'Radhe Shyam', 'James', 'Uncharted', 'Pushpa: The Rise - Part 01', 'Dilwale Dulhania Le Jayenge', 'RRR', 'The Kashmir Files', 'Bachchhan Paandey ', 'The Batman', 'Gangubai Kathiawadi', 'King Richard', 'Pawankhind', 'Jhund', 'Dear Father', 'Radhe Shyam', 'James', 'Uncharted', 'Pushpa: The Rise - Part 01', 'Dilwale Dulhania Le Jayenge', 'RRR', 'The Kashmir Files', 'Bachchhan Paandey ', 'The Batman', 'Gangubai Kathiawadi', 'King Richard', 'Pawankhind', 'Jhund', 'Dear Father', 'Radh

## **Saving the Dictionary Data to DataFrame**

In [None]:
Details_df=pd.DataFrame(Movie_Details)
Details_df

Unnamed: 0,City,Movie Name,Language,Censor Rating,Booking Url
0,MUMBAI,RRR,"Telugu, Hindi, Tamil, Kannada, Malayalam",UA,https://in.bookmyshow.com/mumbai/movies/rrr/ET...
1,MUMBAI,The Kashmir Files,Hindi,A,https://in.bookmyshow.com/mumbai/movies/the-ka...
2,MUMBAI,Bachchhan Paandey,Hindi,UA,https://in.bookmyshow.com/mumbai/movies/bachch...
3,MUMBAI,The Batman,"English, Hindi, Tamil, Telugu",UA,https://in.bookmyshow.com/mumbai/movies/the-ba...
4,MUMBAI,Gangubai Kathiawadi,"Hindi, Telugu",UA,https://in.bookmyshow.com/mumbai/movies/gangub...
5,MUMBAI,King Richard,English,UA,https://in.bookmyshow.com/mumbai/movies/king-r...
6,MUMBAI,Pawankhind,Marathi,UA,https://in.bookmyshow.com/mumbai/movies/pawank...
7,MUMBAI,Jhund,Hindi,UA,https://in.bookmyshow.com/mumbai/movies/jhund/...
8,MUMBAI,Dear Father,Gujarati,U,https://in.bookmyshow.com/mumbai/movies/dear-f...
9,MUMBAI,Radhe Shyam,"Telugu, Tamil, Hindi, Malayalam, Kannada",UA,https://in.bookmyshow.com/mumbai/movies/radhe-...


## **Extract Additional details such as Hearts, User Ratings for 3 Movies**

Now, from the DataFrame **Details_df**, we will pick 3 Hindi movies and extract their Hearts and User Ratings.

We will perform the steps below:

* Filter the DataFrame **Details_df** for **Hindi** as the language and **MUMBAI** as the city.
* Pick 3 Booking URLs and use a `for` loop to extract the Hearts and User Ratings.
* Create lists for the Hearts and User Ratings and convert them into a DataFrame.

Here, we limit the scope of the scraping to Hindi movies in MUMBAI, because movies available in multiple languages show an additional language-selection step, which would require additional loops.

First, filter **Details_df** for **Hindi** as the language and **MUMBAI** as the city:

In [None]:
Movie_List=Details_df[Details_df['Language']=='Hindi']
Movie_List_new=Movie_List[Movie_List['City']=='MUMBAI']
Movie_List_new


Unnamed: 0,City,Movie Name,Language,Censor Rating,Booking Url
1,MUMBAI,The Kashmir Files,Hindi,A,https://in.bookmyshow.com/mumbai/movies/the-ka...
2,MUMBAI,Bachchhan Paandey,Hindi,UA,https://in.bookmyshow.com/mumbai/movies/bachch...
7,MUMBAI,Jhund,Hindi,UA,https://in.bookmyshow.com/mumbai/movies/jhund/...
13,MUMBAI,Dilwale Dulhania Le Jayenge,Hindi,U,https://in.bookmyshow.com/mumbai/movies/dilwal...


In [None]:
Movie_List_new.reset_index(drop=True, inplace=True)
Movie_List_new

Unnamed: 0,City,Movie Name,Language,Censor Rating,Booking Url
0,MUMBAI,The Kashmir Files,Hindi,A,https://in.bookmyshow.com/mumbai/movies/the-ka...
1,MUMBAI,Bachchhan Paandey,Hindi,UA,https://in.bookmyshow.com/mumbai/movies/bachch...
2,MUMBAI,Jhund,Hindi,UA,https://in.bookmyshow.com/mumbai/movies/jhund/...
3,MUMBAI,Dilwale Dulhania Le Jayenge,Hindi,U,https://in.bookmyshow.com/mumbai/movies/dilwal...


Resetting the index renumbers the rows from 0, so that the first three movies can be looked up as rows 0, 1 and 2 in the following steps.
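The two filters and the index reset can also be combined into one step. The sketch below runs on a small hand-made sample shaped like **Details_df**, with values copied from the tables in this notebook. `sample_df` and `hindi_mumbai` are illustrative names, not variables used elsewhere.

```python
import pandas as pd

# Small hand-made sample shaped like Details_df.
sample_df = pd.DataFrame({
    "City": ["MUMBAI", "MUMBAI", "MUMBAI", "NCR"],
    "Movie Name": ["RRR", "The Kashmir Files", "Gangubai Kathiawadi", "Jhund"],
    "Language": ["Telugu, Hindi, Tamil, Kannada, Malayalam", "Hindi", "Hindi, Telugu", "Hindi"],
})

# Combine both conditions with & and renumber the surviving rows from 0.
mask = (sample_df["Language"] == "Hindi") & (sample_df["City"] == "MUMBAI")
hindi_mumbai = sample_df[mask].reset_index(drop=True)

print(hindi_mumbai["Movie Name"].tolist())  # ['The Kashmir Files']
print(hindi_mumbai.index.tolist())          # [0]
```

Because the comparison is an exact match on `'Hindi'`, multi-language entries such as `'Hindi, Telugu'` are excluded, which matches the scope chosen for this section.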

**Pick 3 Booking URLs and use a `for` loop to extract the Hearts and User Ratings into lists.**

Create the function **get_WebElement_for_Ratings** to fetch the Hearts and User Ratings for one movie and return everything collected so far as a dictionary.

In [None]:
def get_WebElement_for_Ratings():
  #Uses the loop variable i and the lists URL_for_Movies, Hearts and Ratings_Values from the calling cell
  wd.get(Movie_List_new['Booking Url'][i])
  time.sleep(5) #Wait for the movie page to load
  Heart=wd.find_elements(By.CLASS_NAME,"sc-ec6ph5-3.clvawV")[0].text #Hearts percentage
  rate=wd.find_elements(By.CLASS_NAME,"sc-ec6ph5-4.geTheh")[0].text #Number of ratings received
  URL_for_Movies.append(Movie_List_new['Booking Url'][i])
  Hearts.append(Heart)
  Ratings_Values.append(rate)
  More_details={'Booking Url':URL_for_Movies,'Hearts':Hearts,"Ratings Received":Ratings_Values}
  return More_details

Calling the function get_WebElement_for_Ratings() in a for loop

In [None]:
URL_for_Movies=[]
Hearts=[]
Ratings_Values=[]
for i in range(0,3):
  More_details=get_WebElement_for_Ratings()




## **Saving the Dictionary Data to DataFrame**

In [None]:
More_details_df=pd.DataFrame(More_details)
More_details_df

Unnamed: 0,Booking Url,Hearts,Ratings Received
0,https://in.bookmyshow.com/mumbai/movies/the-ka...,92%,449.8K ratings
1,https://in.bookmyshow.com/mumbai/movies/bachch...,71%,38.4K ratings
2,https://in.bookmyshow.com/mumbai/movies/jhund/...,87%,29.8K ratings


## **Extract Theatre details for the current day on the basis of Booking URL for 3 Hindi Movies in Mumbai.**

Now we will extract the list of theatres showing each of the 3 movies on the current day.

We will perform the steps below:
* For each movie in the DataFrame **Movie_List_new**, navigate to its Booking URL
* Click on the **Book** button
* Create a function to extract the list of theatre names for each movie for the current day
* Convert the lists to a dictionary
* Convert the dictionary to a DataFrame


**Creating the function WebElement_to_data**

In [None]:
def WebElement_to_data(venue_name,URL):
  theatre_name=[]
  Booking_URL=[]
  for i in venue_name:
    theatre_name.append(i.text)
    Booking_URL.append(URL)
  return theatre_name,Booking_URL

**Creating the function get_theatre**

In [None]:

def get_theatre(wd,URL):
  wd.get(URL)
  Book_tickets=wd.find_elements(By.XPATH,'//div[@class="sc-1vmod7e-2 cgQNto"]') #Book Tickets
  Book_tickets[0].click()
  venue_name=wd.find_elements(By.XPATH,'//a[@class="__venue-name"]') #Get WebElement for Venue
  theatre,URL=WebElement_to_data(venue_name,URL)
  return theatre,URL
 

Calling the **get_theatre** function and saving the results in a dictionary

In [None]:
def Venue_Details_list(i):
  url=Movie_List_new['Booking Url'][i]
  print(url)
  venue_list,url_list=get_theatre(wd,url)
  Venue_name.extend(venue_list) #Accumulate theatre names across movies
  Movie_URL.extend(url_list) #One Booking Url per theatre row
  Theatre_data={"Booking Url":Movie_URL,"Theatre Name":Venue_name,'City':'MUMBAI'}
  return Theatre_data

In [None]:
Venue_name=[]
Movie_URL=[]
for i in range(0,3):
  Theatre_data=Venue_Details_list(i)

https://in.bookmyshow.com/mumbai/movies/the-kashmir-files/ET00110845
https://in.bookmyshow.com/mumbai/movies/bachchhan-paandey/ET00108170
https://in.bookmyshow.com/mumbai/movies/jhund/ET00077233
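The booking URLs printed above share a visible pattern: city, the word `movies`, a movie slug and an event code. As a quick sanity check on which city a link belongs to, the path can be split with the standard library. This sketch assumes the pattern holds for every URL, which the notebook does not verify.

```python
from urllib.parse import urlparse

booking_url = "https://in.bookmyshow.com/mumbai/movies/the-kashmir-files/ET00110845"

# Path looks like /<city>/movies/<movie-slug>/<event-code> for the URLs printed above.
parts = urlparse(booking_url).path.strip("/").split("/")
city_slug, _, movie_slug, event_code = parts

print(city_slug)   # mumbai
print(movie_slug)  # the-kashmir-files
print(event_code)  # ET00110845
```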


In [None]:
len(Venue_name)

162

 **Creating DataFrame for Theatre**

In [None]:
Theatre_data_df=pd.DataFrame(Theatre_data)
Theatre_data_df



Unnamed: 0,Booking Url,Theatre Name,City
0,https://in.bookmyshow.com/mumbai/movies/the-ka...,"CinemaStar: High Street Mall, Thane",MUMBAI
1,https://in.bookmyshow.com/mumbai/movies/the-ka...,"Cinepolis: Viviana Mall, Thane",MUMBAI
2,https://in.bookmyshow.com/mumbai/movies/the-ka...,"INOX: Korum Mall, Eastern Express Highway, Thane",MUMBAI
3,https://in.bookmyshow.com/mumbai/movies/the-ka...,"Carnival: Little World Mall, Kharghar",MUMBAI
4,https://in.bookmyshow.com/mumbai/movies/the-ka...,"Cinepolis: Seawoods Grand Central, Navi Mumbai",MUMBAI
...,...,...,...
157,https://in.bookmyshow.com/mumbai/movies/jhund/...,Movietime Cubic Mall: Chembur,MUMBAI
158,https://in.bookmyshow.com/mumbai/movies/jhund/...,"PVR: Market City, Kurla (Premiere)",MUMBAI
159,https://in.bookmyshow.com/mumbai/movies/jhund/...,PVR: Sion,MUMBAI
160,https://in.bookmyshow.com/mumbai/movies/jhund/...,"Cinemax: Eternity Mall, Thane",MUMBAI


## **Merge the DataFrames to generate the Combined data with all the details.**

Here, we will merge the 3 DataFrames to create the final dataset.

Below are the steps to be followed:

* **merged_data**: merge **Details_df** with **More_details_df** (outer join on **Booking Url**)
* **Movie_and_theatre**: merge **Theatre_data_df** with **merged_data** (left join on **Booking Url** and **City**)
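The two joins can be tried out on toy data first. The sketch below uses three tiny DataFrames with hypothetical short URLs in place of the real booking links. The frame names are illustrative and separate from the notebook's variables.

```python
import pandas as pd

# Hypothetical short URLs stand in for the real booking links.
details = pd.DataFrame({
    "City": ["MUMBAI", "MUMBAI"],
    "Movie Name": ["RRR", "Jhund"],
    "Booking Url": ["url/rrr", "url/jhund"],
})
more_details = pd.DataFrame({
    "Booking Url": ["url/jhund"],
    "Hearts": ["87%"],
})
theatres = pd.DataFrame({
    "Booking Url": ["url/jhund", "url/jhund"],
    "Theatre Name": ["PVR: Sion", "Cinemax: Eternity Mall, Thane"],
    "City": ["MUMBAI", "MUMBAI"],
})

# Outer join: every movie is kept; movies without ratings get NaN in "Hearts".
merged = pd.merge(details, more_details, on="Booking Url", how="outer")

# Left join from the theatre table: one output row per theatre row.
final = pd.merge(theatres, merged, on=["Booking Url", "City"], how="left")

print(len(merged))                   # 2
print(final["Movie Name"].tolist())  # ['Jhund', 'Jhund']
print(final["Hearts"].tolist())      # ['87%', '87%']
```

Starting the second join from the theatre table keeps only the movies for which theatres were scraped, which is why the final dataset covers just the 3 selected movies.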

In [None]:
Theatre_data_df  #theatre Details
More_details_df #Additional details
Details_df #Movies for 3 Cities

Unnamed: 0,City,Movie Name,Language,Censor Rating,Booking Url
0,MUMBAI,RRR,"Telugu, Hindi, Tamil, Kannada, Malayalam",UA,https://in.bookmyshow.com/mumbai/movies/rrr/ET...
1,MUMBAI,The Kashmir Files,Hindi,A,https://in.bookmyshow.com/mumbai/movies/the-ka...
2,MUMBAI,Bachchhan Paandey,Hindi,UA,https://in.bookmyshow.com/mumbai/movies/bachch...
3,MUMBAI,The Batman,"English, Hindi, Tamil, Telugu",UA,https://in.bookmyshow.com/mumbai/movies/the-ba...
4,MUMBAI,Gangubai Kathiawadi,"Hindi, Telugu",UA,https://in.bookmyshow.com/mumbai/movies/gangub...
5,MUMBAI,King Richard,English,UA,https://in.bookmyshow.com/mumbai/movies/king-r...
6,MUMBAI,Pawankhind,Marathi,UA,https://in.bookmyshow.com/mumbai/movies/pawank...
7,MUMBAI,Jhund,Hindi,UA,https://in.bookmyshow.com/mumbai/movies/jhund/...
8,MUMBAI,Dear Father,Gujarati,U,https://in.bookmyshow.com/mumbai/movies/dear-f...
9,MUMBAI,Radhe Shyam,"Telugu, Tamil, Hindi, Malayalam, Kannada",UA,https://in.bookmyshow.com/mumbai/movies/radhe-...


In [None]:
merged_data=pd.merge(Details_df,More_details_df,on='Booking Url',how='outer')
merged_data

Unnamed: 0,City,Movie Name,Language,Censor Rating,Booking Url,Hearts,Ratings Received
0,MUMBAI,RRR,"Telugu, Hindi, Tamil, Kannada, Malayalam",UA,https://in.bookmyshow.com/mumbai/movies/rrr/ET...,,
1,NCR,RRR,"Telugu, Hindi, Tamil, Kannada, Malayalam",UA,https://in.bookmyshow.com/mumbai/movies/rrr/ET...,,
2,BANG,RRR,"Telugu, Hindi, Tamil, Kannada, Malayalam",UA,https://in.bookmyshow.com/mumbai/movies/rrr/ET...,,
3,MUMBAI,The Kashmir Files,Hindi,A,https://in.bookmyshow.com/mumbai/movies/the-ka...,92%,449.8K ratings
4,NCR,The Kashmir Files,Hindi,A,https://in.bookmyshow.com/mumbai/movies/the-ka...,92%,449.8K ratings
5,BANG,The Kashmir Files,Hindi,A,https://in.bookmyshow.com/mumbai/movies/the-ka...,92%,449.8K ratings
6,MUMBAI,Bachchhan Paandey,Hindi,UA,https://in.bookmyshow.com/mumbai/movies/bachch...,71%,38.4K ratings
7,NCR,Bachchhan Paandey,Hindi,UA,https://in.bookmyshow.com/mumbai/movies/bachch...,71%,38.4K ratings
8,BANG,Bachchhan Paandey,Hindi,UA,https://in.bookmyshow.com/mumbai/movies/bachch...,71%,38.4K ratings
9,MUMBAI,The Batman,"English, Hindi, Tamil, Telugu",UA,https://in.bookmyshow.com/mumbai/movies/the-ba...,,


In [None]:
Movie_and_theatre=pd.merge(Theatre_data_df,merged_data,on=['Booking Url','City'],how='left')


## **Arrange the Sequence of Columns**

In [None]:
Movie_and_theatre = Movie_and_theatre[['City', 'Movie Name', 'Language', 'Censor Rating','Hearts','Ratings Received','Theatre Name','Booking Url']]
Movie_and_theatre

Unnamed: 0,City,Movie Name,Language,Censor Rating,Hearts,Ratings Received,Theatre Name,Booking Url
0,MUMBAI,The Kashmir Files,Hindi,A,92%,449.8K ratings,"CinemaStar: High Street Mall, Thane",https://in.bookmyshow.com/mumbai/movies/the-ka...
1,MUMBAI,The Kashmir Files,Hindi,A,92%,449.8K ratings,"Cinepolis: Viviana Mall, Thane",https://in.bookmyshow.com/mumbai/movies/the-ka...
2,MUMBAI,The Kashmir Files,Hindi,A,92%,449.8K ratings,"INOX: Korum Mall, Eastern Express Highway, Thane",https://in.bookmyshow.com/mumbai/movies/the-ka...
3,MUMBAI,The Kashmir Files,Hindi,A,92%,449.8K ratings,"Carnival: Little World Mall, Kharghar",https://in.bookmyshow.com/mumbai/movies/the-ka...
4,MUMBAI,The Kashmir Files,Hindi,A,92%,449.8K ratings,"Cinepolis: Seawoods Grand Central, Navi Mumbai",https://in.bookmyshow.com/mumbai/movies/the-ka...
...,...,...,...,...,...,...,...,...
157,MUMBAI,Jhund,Hindi,UA,87%,29.8K ratings,Movietime Cubic Mall: Chembur,https://in.bookmyshow.com/mumbai/movies/jhund/...
158,MUMBAI,Jhund,Hindi,UA,87%,29.8K ratings,"PVR: Market City, Kurla (Premiere)",https://in.bookmyshow.com/mumbai/movies/jhund/...
159,MUMBAI,Jhund,Hindi,UA,87%,29.8K ratings,PVR: Sion,https://in.bookmyshow.com/mumbai/movies/jhund/...
160,MUMBAI,Jhund,Hindi,UA,87%,29.8K ratings,"Cinemax: Eternity Mall, Thane",https://in.bookmyshow.com/mumbai/movies/jhund/...


## **Save the extracted information to a CSV file**

Here we will save the DataFrame **Movie_and_theatre** using the `.to_csv()` method in pandas and download the file using `files.download()`.
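Several theatre names contain commas, so the CSV file has to quote those fields for the columns to stay aligned. The sketch below shows this with Python's standard `csv` module on a few sample rows taken from the tables above. pandas' `to_csv` applies the same minimal quoting by default.

```python
import csv
import io

rows = [
    ["City", "Movie Name", "Theatre Name"],
    ["MUMBAI", "Jhund", "Cinemax: Eternity Mall, Thane"],  # value contains a comma
    ["MUMBAI", "Jhund", "PVR: Sion"],
]

buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
text = buffer.getvalue()
print(text)

# Reading the text back restores the original three fields per row.
parsed = list(csv.reader(io.StringIO(text)))
print(parsed[1])  # ['MUMBAI', 'Jhund', 'Cinemax: Eternity Mall, Thane']
```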

In [None]:
from google.colab import files
Movie_and_theatre.to_csv('BookMyShow_Data.csv',index=False) #index=False leaves out the row-number column
files.download('BookMyShow_Data.csv')

## **Future Work**

* We can fetch the movie details for all the Popular Cities and Other Cities.
* We can fetch the movie details for movies available in multiple languages.
* We can fetch the show timings for each venue.
* We can fetch details of other events such as stand-up comedy, musical events, etc.

## **Reference**

* https://en.wikipedia.org/wiki/Web_scraping
* https://selenium-python.readthedocs.io/
* https://pandas.pydata.org/docs/
* https://www.selenium.dev/pt-br/documentation/


In [None]:
jovian.commit(project="web-scraping-bookmyshow")

[jovian] Detected Colab notebook...
[jovian] Please enter your API key ( from https://jovian.ai/ ):
API KEY: 

In [None]:
jovian.commit(files=['BookMyShow_Data.csv'])

[jovian] Detected Colab notebook...
[jovian] Uploading colab notebook to Jovian...
[jovian] Uploading additional files...
Committed successfully! https://jovian.ai/gouravitandel1945/web-scraping-bookmyshow


'https://jovian.ai/gouravitandel1945/web-scraping-bookmyshow'