# Introduction <a id='intro'></a>
In this project, you will work with data from the entertainment industry. You will study a dataset with records on movies and shows. The research will focus on the "Golden Age" of television, which began in 1999 with the release of *The Sopranos* and is still ongoing.

The aim of this project is to investigate how the number of votes a title receives impacts its ratings. The assumption is that highly-rated shows (we will focus on TV shows, ignoring movies) released during the "Golden Age" of television also have the most votes.

# Brief description of the situation
The research will focus on the “Golden Age” of television, which began in 1999 with the release of The Sopranos and is still ongoing.

# Goal 
The aim of this project is to investigate how the number of votes a title receives from IMDb users impacts its ratings. The assumption is that highly-rated shows (we will focus on TV shows, ignoring movies) released during the “Golden Age” of television also have the most votes.

# Description of the data we are going to use

- `'name'`: first and last name of the actor (director)
- `'character'`: character played (for actors)
- `'role'`: the person's contribution to the title (either actor or director)
- `'title'`: title of the movie (show)
- `'type'`: show or movie
- `'genres'`: list of genres under which the movie (show) falls
- `'release_year'`: year when the movie (show) was released
- `'imdb_score'`: score on IMDb
- `'imdb_votes'`: votes on IMDb

# Stages 
Data on movies and shows is stored in the `/datasets/movies_and_shows.csv` file. There is no information about the quality of the data, so you will need to explore it before doing the analysis.

First, you'll evaluate the quality of the data and see whether its issues are significant. Then, during data preprocessing, you will try to account for the most critical problems.
 
Your project will consist of three stages:
 1. Data overview
 2. Data preprocessing
 3. Data analysis

# Stage 1. Data overview 

Open and explore the data.

You'll need `pandas`, so import it.

In [1]:
# importing pandas
import pandas as pd

Read the `movies_and_shows.csv` file from the `datasets` folder and save it in the `df` variable:

In [2]:
# reading the file and storing it in df
df = pd.read_csv('/datasets/movies_and_shows.csv')

Print the first 10 table rows:

In [3]:
# obtaining the first 10 rows from the df table
# hint: you can use head() and tail() in Jupyter Notebook without wrapping them into print()
df.head(10)

Unnamed: 0,name,Character,r0le,TITLE,Type,release Year,genres,imdb sc0re,imdb v0tes
0,Robert De Niro,Travis Bickle,ACTOR,Taxi Driver,MOVIE,1976,"['drama', 'crime']",8.2,808582.0
1,Jodie Foster,Iris Steensma,ACTOR,Taxi Driver,MOVIE,1976,"['drama', 'crime']",8.2,808582.0
2,Albert Brooks,Tom,ACTOR,Taxi Driver,MOVIE,1976,"['drama', 'crime']",8.2,808582.0
3,Harvey Keitel,Matthew 'Sport' Higgins,ACTOR,Taxi Driver,MOVIE,1976,"['drama', 'crime']",8.2,808582.0
4,Cybill Shepherd,Betsy,ACTOR,Taxi Driver,MOVIE,1976,"['drama', 'crime']",8.2,808582.0
5,Peter Boyle,Wizard,ACTOR,Taxi Driver,MOVIE,1976,"['drama', 'crime']",8.2,808582.0
6,Leonard Harris,Senator Charles Palantine,ACTOR,Taxi Driver,MOVIE,1976,"['drama', 'crime']",8.2,808582.0
7,Diahnne Abbott,Concession Girl,ACTOR,Taxi Driver,MOVIE,1976,"['drama', 'crime']",8.2,808582.0
8,Gino Ardito,Policeman at Rally,ACTOR,Taxi Driver,MOVIE,1976,"['drama', 'crime']",8.2,808582.0
9,Martin Scorsese,Passenger Watching Silhouette,ACTOR,Taxi Driver,MOVIE,1976,"['drama', 'crime']",8.2,808582.0


Obtain the general information about the table with one command:

In [4]:
# checking the data type of each column in df
df.dtypes

   name          object
Character        object
r0le             object
TITLE            object
  Type           object
release Year      int64
genres           object
imdb sc0re      float64
imdb v0tes      float64
dtype: object

In [5]:
# checking the number of rows and columns in df
df.shape

(85579, 9)

In [6]:
# obtaining general information about the data in df
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 85579 entries, 0 to 85578
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0      name       85579 non-null  object 
 1   Character     85579 non-null  object 
 2   r0le          85579 non-null  object 
 3   TITLE         85578 non-null  object 
 4     Type        85579 non-null  object 
 5   release Year  85579 non-null  int64  
 6   genres        85579 non-null  object 
 7   imdb sc0re    80970 non-null  float64
 8   imdb v0tes    80853 non-null  float64
dtypes: float64(2), int64(1), object(6)
memory usage: 5.9+ MB


The table contains nine columns. The majority store the same data type: object. The only exceptions are `'release Year'` (int64 type), `'imdb sc0re'` (float64 type) and `'imdb v0tes'` (float64 type). Scores and votes will be used in our analysis, so it's important to verify that they are present in the dataframe in the appropriate numeric format. Three columns (`'TITLE'`, `'imdb sc0re'` and `'imdb v0tes'`) have missing values.

According to the documentation:
- `'   name'`: first and last name of the actor (director)
- `'Character'`: character played (for actors)
- `'r0le'`: the person's contribution to the title (either actor or director)
- `'TITLE'`: title of the movie (show)
- `'  Type'`: show or movie
- `'release Year'`: year when the movie (show) was released
- `'genres'`: list of genres under which the movie (show) falls
- `'imdb sc0re'`: score on IMDb
- `'imdb v0tes'`: votes on IMDb

We can see three issues with the column names:
1. Some names are uppercase, while others are lowercase.
2. There are names containing whitespace.
3. A few column names have digit '0' instead of letter 'o'. 


# Conclusions

Each row in the table stores data about a movie or show. The columns can be divided into two categories: the first is about the roles held by different people who worked on the movie or show (role, name of the actor or director, and character if the row is about an actor); the second category is information about the movie or show itself (title, release year, genre, imdb figures).

It's clear that there is sufficient data to do the analysis and evaluate our assumption. However, to move forward, we need to preprocess the data.

# Stage 2. Data preprocessing 
Correct the formatting in the column headers and deal with the missing values. Then, check whether there are duplicates in the data.

In [7]:
# the list of column names in the df table
df.columns

Index(['   name', 'Character', 'r0le', 'TITLE', '  Type', 'release Year',
       'genres', 'imdb sc0re', 'imdb v0tes'],
      dtype='object')

Change the column names according to the rules of good style:
* If the name has several words, use snake_case
* All characters must be lowercase
* Remove whitespace
* Replace zero with letter 'o'

In [8]:
# renaming columns

df = df.rename(
    columns={
        '   name': 'name',
        'Character': 'character',
        'r0le': 'role',
        'TITLE': 'title',
        '  Type': 'type',
        'release Year': 'release_year',
        'imdb v0tes': 'imdb_votes',
        'imdb sc0re': 'imdb_score',        
    }
)

Check the result. Print the names of the columns once more:

In [9]:
# checking result: the list of column names
df.columns

Index(['name', 'character', 'role', 'title', 'type', 'release_year', 'genres',
       'imdb_score', 'imdb_votes'],
      dtype='object')
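
The same four rules can also be applied programmatically instead of listing every column by hand. The sketch below is only an illustration on a toy one-row frame that reuses the messy headers (the values and the `raw` name are made up). Note the assumption that no column legitimately contains the digit `0`; otherwise the last replacement would corrupt that name.

```python
import pandas as pd

# Toy frame reusing the messy headers from the table (values are invented)
raw = pd.DataFrame(
    [["Jane Doe", "Lead", "ACTOR", "Some Title", "SHOW", 2001, "['drama']", 7.5, 100.0]],
    columns=['   name', 'Character', 'r0le', 'TITLE', '  Type',
             'release Year', 'genres', 'imdb sc0re', 'imdb v0tes'],
)

raw.columns = (
    raw.columns
    .str.strip()                          # remove leading/trailing whitespace
    .str.lower()                          # make all characters lowercase
    .str.replace(' ', '_', regex=False)   # inner spaces become snake_case
    .str.replace('0', 'o', regex=False)   # digit zero becomes letter 'o'
)

print(list(raw.columns))
```

The explicit `rename()` mapping is easier to audit for nine columns; the chained version scales better when there are many headers with the same kinds of defects.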

# Missing values 
First, find the number of missing values in the table. To do so, combine two `pandas` methods:

In [10]:
# calculating missing values
df.isna().sum()

name               0
character          0
role               0
title              1
type               0
release_year       0
genres             0
imdb_score      4609
imdb_votes      4726
dtype: int64

Not all missing values affect the research: the single missing value in `'title'` is not critical. The missing values in the `'imdb_score'` and `'imdb_votes'` columns account for about 5.4% and 5.5% of all records (4,609 and 4,726, respectively, out of 85,579). This could affect our research, so we will drop the rows where these values are missing. The row with the missing title is dropped along with them.
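
As a quick check of the shares quoted above, the counts from the `isna().sum()` output can be divided by the total number of rows. This is a small sketch using the printed numbers directly; on the real table, `df.isna().mean() * 100` should give the same percentages for every column at once.

```python
import pandas as pd

# Counts copied from the isna().sum() output; 85579 is the row count from df.shape
missing_counts = pd.Series({'imdb_score': 4609, 'imdb_votes': 4726})
total_rows = 85579

# Share of rows with a missing value, in percent
missing_share = (missing_counts / total_rows * 100).round(2)
print(missing_share)
```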

In [11]:
# dropping rows where the title, score or votes columns have missing values
df = df.dropna(subset=['title', 'imdb_score', 'imdb_votes'])

Make sure the table doesn't contain any more missing values. Count the missing values again.

In [12]:
# counting missing values
df.isna().sum()

name            0
character       0
role            0
title           0
type            0
release_year    0
genres          0
imdb_score      0
imdb_votes      0
dtype: int64

# Duplicates 
Find the number of duplicate rows in the table using one command:

In [13]:
df.duplicated().sum()

6994

Review the duplicate rows to determine if removing them would distort our dataset.

In [14]:
# Select the rows flagged as duplicates (first occurrences are not included) and review the last 5 rows
dup_df = df[df.duplicated()]
print(dup_df.tail(5))

                      name                 character   role    title   type  \
85569       Jessica Cediel           Liliana Navarro  ACTOR  Lokillo  MOVIE   
85570  Javier Gardeaz?­bal  Agust??n "Peluca" Ort??z  ACTOR  Lokillo  MOVIE   
85571        Carla Giraldo            Valery Reinoso  ACTOR  Lokillo  MOVIE   
85572  Ana Mar??a S?­nchez                   Lourdes  ACTOR  Lokillo  MOVIE   
85577         Isabel Gaona                    Cacica  ACTOR  Lokillo  MOVIE   

       release_year      genres  imdb_score  imdb_votes  
85569          2021  ['comedy']         3.8        68.0  
85570          2021  ['comedy']         3.8        68.0  
85571          2021  ['comedy']         3.8        68.0  
85572          2021  ['comedy']         3.8        68.0  
85577          2021  ['comedy']         3.8        68.0  
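
By default `duplicated()` flags only the repeated rows, not the first occurrence of each. To view every repeat next to its original, one option is `keep=False`. A minimal sketch on invented data (`toy`, `flagged`, and `with_originals` are illustrative names):

```python
import pandas as pd

toy = pd.DataFrame({
    'name':  ['A', 'B', 'A', 'C', 'B'],
    'title': ['T1', 'T1', 'T1', 'T2', 'T1'],
})

# Default keep='first': only the second and later copies are flagged
flagged = toy[toy.duplicated()]

# keep=False: every member of a duplicated group is flagged, originals included
with_originals = toy[toy.duplicated(keep=False)].sort_values(['name', 'title'])

print(flagged)
print(with_originals)
```

Sorting the second selection places identical rows next to each other, which makes a visual check easier before calling `drop_duplicates()`.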


Each printed row is flagged as an exact repeat of an earlier row, so we can safely remove these rows.
Call the `pandas` method for getting rid of duplicate rows:

In [15]:
# removing duplicate rows
df = df.drop_duplicates()

Check for duplicate rows once more to make sure you have removed all of them:

In [16]:
# checking for duplicates
df.duplicated().sum()

0

Now get rid of implicit duplicates in the `'type'` column. For example, the string `'SHOW'` can be written in different ways. These kinds of errors will also affect the result.

Print a list of unique `'type'` names, sorted in alphabetical order. To do so:
* Retrieve the intended dataframe column 
* Apply a sorting method to it
* For the sorted column, call the method that will return all unique column values

In [17]:
# retrieve the 'type' column, sort it, and print its unique values
print(df['type'].sort_values().unique())

# implicit duplicates of 'SHOW' found in the printed list
wrong_show_list = ['shows', 'tv show', 'tv shows', 'tv series', 'tv']

Look through the list to find implicit duplicates of `'SHOW'` (`'MOVIE'` duplicates will be ignored since the assumption is about shows). These could be misspelled values or alternative names for the same type.

You will see the following implicit duplicates:
* `'shows'`
* `'tv show'`
* `'tv shows'`
* `'tv series'`
* `'tv'`

To get rid of them, declare the function `replace_wrong_show()` with three parameters: 
* `df`: the table to correct
* `wrong_shows_list`: the list of duplicates
* `correct_show`: the string with the correct value

The function should correct the names in the `'type'` column of the `df` table (i.e., replace each value from the `wrong_shows_list` list with the value in `correct_show`).

In [18]:
# function for replacing implicit duplicates

def replace_wrong_show(df, wrong_shows_list, correct_show):

    df['type'] = df['type'].replace(wrong_shows_list, correct_show)

Call `replace_wrong_show()` and pass it arguments so that it clears implicit duplicates and replaces them with `SHOW`:

In [19]:
# removing implicit duplicates
replace_wrong_show(df, wrong_show_list, 'SHOW')

Make sure the duplicate names are removed. Print the list of unique values from the `'type'` column:

In [20]:
# viewing unique type names
df['type'].unique()

array(['MOVIE', 'the movie', 'SHOW', 'movies'], dtype=object)
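
For comparison, here is a sketch of the same replacement written as a function that returns a new column instead of modifying the table in place. The name `normalize_type` and the toy values are assumptions made for illustration; `Series.replace()` is the same call the notebook function relies on.

```python
import pandas as pd

def normalize_type(type_column, wrong_values, correct_value):
    """Return a copy of the column with every wrong value replaced."""
    return type_column.replace(wrong_values, correct_value)

# Toy column mixing correct and alternative spellings (invented data)
toy_type = pd.Series(['SHOW', 'shows', 'tv show', 'tv series', 'MOVIE', 'the movie'])

fixed = normalize_type(toy_type, ['shows', 'tv show', 'tv shows', 'tv series', 'tv'], 'SHOW')
print(fixed.unique())
```

Returning a new column leaves the caller to decide whether to assign it back, which avoids surprises when the same table is reused elsewhere. The movie variants stay untouched here, as in the notebook.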

# Conclusions <a id='data_preprocessing_conclusions'></a>
We detected three issues with the data:

- Incorrect header styles
- Missing values
- Duplicate rows and implicit duplicates

The headers have been cleaned up to make processing the table simpler.

All rows with missing values have been removed. 

The absence of duplicates will make the results more precise and easier to understand.

Now we can move on to our analysis of the prepared data.

# Stage 3. Data analysis 

Based on the previous project stages, you can now define how the assumption will be checked. Calculate the average number of votes for each score (this data is available in the `imdb_score` and `imdb_votes` columns), and then check how these averages relate to each other. If the averages for shows with the highest scores are higher than those for shows with lower scores, the assumption appears to be true.

Based on this, complete the following steps:

- Filter the dataframe to only include shows released in 1999 or later.
- Group scores into buckets by rounding the values of the appropriate column (a set of 1-10 integers will help us make the outcome of our calculations more evident without damaging the quality of our research).
- Identify outliers among scores based on their number of votes, and exclude scores with few votes.
- Calculate the average votes for each score and check whether the assumption matches the results.
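
The steps above can be sketched end to end on a tiny invented table before applying them to the real data. The frame `toy` and the column `score_bucket` are hypothetical, and the outlier step is left out because five rows are too few to illustrate it.

```python
import pandas as pd

toy = pd.DataFrame({
    'title': ['Old Show', 'Show A', 'Show B', 'Show C', 'Movie X'],
    'type': ['SHOW', 'SHOW', 'SHOW', 'SHOW', 'MOVIE'],
    'release_year': [1995, 2005, 2010, 2015, 2012],
    'imdb_score': [8.9, 7.8, 8.3, 6.2, 9.1],
    'imdb_votes': [5000.0, 1000.0, 3000.0, 200.0, 90000.0],
})

# Step 1: keep only shows released in 1999 or later
shows = toy[(toy['release_year'] >= 1999) & (toy['type'] == 'SHOW')].copy()

# Step 2: group scores into integer buckets by rounding
shows['score_bucket'] = shows['imdb_score'].round()

# Step 4: average number of votes per bucket
avg_votes = shows.groupby('score_bucket')['imdb_votes'].mean().reset_index()
print(avg_votes)
```

Calling `.copy()` after filtering makes `shows` an independent table, so adding the bucket column does not trigger a chained-assignment warning.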

To filter the dataframe and only include shows released in 1999 or later, you will take two steps. First, keep only titles published in 1999 or later in our dataframe. Then, filter the table to only contain shows (movies will be removed).

In [21]:
# using conditional indexing modify df so it has only titles released after 1999 (with 1999 included)
# give the slice of dataframe new name

updated_release = df[df['release_year'] >= 1999] 
updated_release.head(10)

Unnamed: 0,name,character,role,title,type,release_year,genres,imdb_score,imdb_votes
1664,Jeff Probst,Himself - Host,ACTOR,Survivor,SHOW,2000,['reality'],7.4,24687.0
1955,Benicio del Toro,Franky Four Fingers,ACTOR,Snatch,MOVIE,2000,"['crime', 'comedy']",8.3,841435.0
1956,Dennis Farina,Cousin Avi,ACTOR,Snatch,MOVIE,2000,"['crime', 'comedy']",8.3,841435.0
1957,Vinnie Jones,Bullet Tooth Tony,ACTOR,Snatch,MOVIE,2000,"['crime', 'comedy']",8.3,841435.0
1958,Brad Pitt,Mickey O'Neil,ACTOR,Snatch,MOVIE,2000,"['crime', 'comedy']",8.3,841435.0
1959,Rade ÿerbed?ija,"Boris ""The Blade"" Yurinov",ACTOR,Snatch,MOVIE,2000,"['crime', 'comedy']",8.3,841435.0
1960,Jason Statham,Turkish,ACTOR,Snatch,MOVIE,2000,"['crime', 'comedy']",8.3,841435.0
1961,Jason Flemyng,Darren,ACTOR,Snatch,MOVIE,2000,"['crime', 'comedy']",8.3,841435.0
1962,Stephen Graham,Tommy,ACTOR,Snatch,MOVIE,2000,"['crime', 'comedy']",8.3,841435.0
1963,Alan Ford,Brick Top Polford,ACTOR,Snatch,MOVIE,2000,"['crime', 'comedy']",8.3,841435.0


In [22]:
updated_release.tail(10)

Unnamed: 0,name,character,role,title,type,release_year,genres,imdb_score,imdb_votes
85481,Chiago Liu,Peichia,ACTOR,I Missed You,the movie,2021,"['drama', 'romance']",5.7,250.0
85501,Jhai'ho,unknown,ACTOR,Momshies! Your Soul is Mine,the movie,2021,['comedy'],5.8,27.0
85527,Jeffrey Quizon,unknown,ACTOR,Princess 'Daya'Reese,the movie,2021,"['comedy', 'romance']",7.1,50.0
85560,Zainab Balogun,Temisan,ACTOR,Fine Wine,the movie,2021,"['romance', 'drama']",6.8,45.0
85564,Baaj Adebule,Sammy,ACTOR,Fine Wine,the movie,2021,"['romance', 'drama']",6.8,45.0
85573,A??da Morales,Maritza,ACTOR,Lokillo,the movie,2021,['comedy'],3.8,68.0
85574,Adelaida Buscato,Mar??a Paz,ACTOR,Lokillo,the movie,2021,['comedy'],3.8,68.0
85575,Luz Stella Luengas,Karen Bayona,ACTOR,Lokillo,the movie,2021,['comedy'],3.8,68.0
85576,In??s Prieto,Fanny,ACTOR,Lokillo,the movie,2021,['comedy'],3.8,68.0
85578,Julian Gaviria,unknown,DIRECTOR,Lokillo,the movie,2021,['comedy'],3.8,68.0


In [23]:
updated_release.sample(10)

Unnamed: 0,name,character,role,title,type,release_year,genres,imdb_score,imdb_votes
55646,Alba Gris Guti??rrez,Cosplay Party Guest #1,ACTOR,Unknown Origins,MOVIE,2020,"['crime', 'drama', 'thriller', 'comedy', 'acti...",6.1,6769.0
77161,Ayu Laksmi,Ibu Saski,ACTOR,A Perfect Fit,MOVIE,2021,"['comedy', 'drama', 'romance']",5.3,585.0
42193,Chris Lee,‡??‡??‘??,ACTOR,What She Put on the Table,SHOW,2017,['drama'],7.3,30.0
7945,Daniel Arrias,Carlos (uncredited),ACTOR,Blood and Bone,MOVIE,2009,"['thriller', 'sport', 'action', 'crime', 'drama']",6.7,33051.0
18980,Javier Escobar,El Mapache,ACTOR,The Perfect Dictatorship,MOVIE,2014,"['drama', 'comedy']",7.2,5432.0
17130,Sophie Kennedy Clark,Young Philomena,ACTOR,Philomena,MOVIE,2013,"['drama', 'comedy', 'european']",7.6,99647.0
67875,Guillermo Arengo,Padre,ACTOR,The Wrath of God,MOVIE,2022,"['thriller', 'drama']",5.7,2800.0
10032,Sonali Kulkarni,Pooja,ACTOR,Dil Chahta Hai,MOVIE,2001,"['romance', 'comedy', 'drama']",8.1,71586.0
60355,Zachary Quinto,Self,ACTOR,The Boys in the Band: Something Personal,MOVIE,2020,['documentation'],7.2,362.0
10108,Fumiko Orikasa,Hisana Kuchiki (voice),ACTOR,Bleach the Movie: Fade to Black,MOVIE,2008,"['fantasy', 'action', 'drama', 'animation']",7.1,2207.0


<div class="alert alert-success"; style="border-left: 7px solid green">
<b>✅ Reviewer's comment, v. 2</b> 
    
✔️ Good.
    
<a class="tocSkip"></a><s>

<div class="alert alert-danger"; style="border-left: 7px solid red">
<b>⛔️ Reviewer's comment, v. 1</b> 

Could you please save the slice that you correctly created with conditional indexing under a new name? It's required by the task.

<div class="alert alert-success"; style="border-left: 7px solid green">
<b>✅ Reviewer's comment, v. 2</b> 
    
✔️ It's OK, but usually it's enough to use one of the methods.
    
<a class="tocSkip"></a><s>

<div class="alert alert-warning"; style="border-left: 7px solid gold">
<b>⚠️ Reviewer's comment, v. 1</b> 

Please note that some dataframes can be quite big and reading the whole dataframe to display it might take significant resources. 
    
A more optimal way to check the contents of dataframes is using `head()`, `tail()` or `sample()` method. The first will display 5 first lines, the second one - 5 last lines, and `sample()` - a random line. You may specify a different number of lines you want to be displayed in the parentheses.

In [24]:
# repeat conditional indexing so df has only shows (movies are removed as result)
updated_release = updated_release[updated_release['type'] == 'SHOW']
print(updated_release.head(10))

                  name                  character   role      title  type  \
1664       Jeff Probst             Himself - Host  ACTOR   Survivor  SHOW   
2076     Mayumi Tanaka    Monkey D. Luffy (voice)  ACTOR  One Piece  SHOW   
2077      Kazuya Nakai       Roronoa Zoro (voice)  ACTOR  One Piece  SHOW   
2078     Akemi Okamura               Nami (voice)  ACTOR  One Piece  SHOW   
2079  Kappei Yamaguchi              Usopp (voice)  ACTOR  One Piece  SHOW   
2080    Hiroaki Hirata     Vinsmoke Sanji (voice)  ACTOR  One Piece  SHOW   
2081  Yuriko Yamaguchi         Nico Robin (voice)  ACTOR  One Piece  SHOW   
2082        Ikue Otani  Tony Tony Chopper (voice)  ACTOR  One Piece  SHOW   
2083        Kazuki Yao             Franky (voice)  ACTOR  One Piece  SHOW   
2084               Cho              Brook (voice)  ACTOR  One Piece  SHOW   

      release_year                                             genres  \
1664          2000                                        ['reality']   
2076  

<div class="alert alert-success"; style="border-left: 7px solid green">
<b>✅ Reviewer's comment, v. 2</b> 
    
✔️ 
    
<a class="tocSkip"></a><s>

<div class="alert alert-warning"; style="border-left: 7px solid gold">
<b>⚠️ Reviewer's comment, v. 1</b> 

When you update the previous step, please do not forget to change the name of the dataframe you are using here.

The scores that are to be grouped should be rounded. For instance, titles with scores like 7.8, 8.1, and 8.3 will all be placed in the same bucket with a score of 8.

In [25]:
# rounding the column with scores to whole numbers (other columns are left as they are)
updated_release = updated_release.round({'imdb_score': 0})
print(updated_release.shape)
#checking the outcome with tail()
updated_release.tail()

(13430, 9)


Unnamed: 0,name,character,role,title,type,release_year,genres,imdb_score,imdb_votes
85433,Maneerat Kam-Uan,Ae,ACTOR,Let's Eat,SHOW,2021,"['drama', 'comedy']",8.0,5.0
85434,Rudklao Amratisha,unknown,ACTOR,Let's Eat,SHOW,2021,"['drama', 'comedy']",8.0,5.0
85435,Jaturong Mokjok,unknown,ACTOR,Let's Eat,SHOW,2021,"['drama', 'comedy']",8.0,5.0
85436,Pisamai Wilaisak,unknown,ACTOR,Let's Eat,SHOW,2021,"['drama', 'comedy']",8.0,5.0
85437,Sarawut Wichiensarn,unknown,DIRECTOR,Let's Eat,SHOW,2021,"['drama', 'comedy']",8.0,5.0
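
One detail worth keeping in mind when bucketing: `round()` in pandas follows NumPy and rounds exact halves to the nearest even integer, so a score of 6.5 lands in bucket 6 while 7.5 lands in bucket 8. A short sketch with made-up scores:

```python
import pandas as pd

# Example scores, including two exact halves
scores = pd.Series([7.8, 8.1, 8.3, 6.5, 7.5])

rounded = scores.round()
print(rounded.tolist())
```

For this project the effect is small, but it explains why a title with a score ending in .5 may end up one bucket lower than expected.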


It is now time to identify outliers based on the number of votes.

In [26]:
# Use groupby() for scores and count the non-null rows in each group, print the result
print(updated_release.groupby('imdb_score').count())

            name  character  role  title  type  release_year  genres  \
imdb_score                                                             
2.0           24         24    24     24    24            24      24   
3.0           27         27    27     27    27            27      27   
4.0          180        180   180    180   180           180     180   
5.0          592        592   592    592   592           592     592   
6.0         2494       2494  2494   2494  2494          2494    2494   
7.0         4706       4706  4706   4706  4706          4706    4706   
8.0         4842       4842  4842   4842  4842          4842    4842   
9.0          557        557   557    557   557           557     557   
10.0           8          8     8      8     8             8       8   

            imdb_votes  
imdb_score              
2.0                 24  
3.0                 27  
4.0                180  
5.0                592  
6.0               2494  
7.0               4706  
8.0    

Based on the aggregation performed, it is evident that scores 2 (24 rows), 3 (27 rows), and 10 (only 8 rows) are outliers. Each row is one cast or crew record, so these scores cover very few shows, and there isn't enough data for the average number of votes to be meaningful.
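
Because every row describes one person's credit rather than one title, the row count per score can differ from the number of distinct shows. The sketch below, on invented data, compares the two; `rows_per_score` and `titles_per_score` are illustrative names, and it assumes that a title string identifies a show uniquely.

```python
import pandas as pd

toy = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D', 'E'],
    'title': ['Show 1', 'Show 1', 'Show 1', 'Show 2', 'Show 3'],
    'imdb_score': [8.0, 8.0, 8.0, 8.0, 2.0],
})

# Number of rows (credits) in each score bucket
rows_per_score = toy.groupby('imdb_score').size()

# Number of distinct titles in each score bucket
titles_per_score = toy.groupby('imdb_score')['title'].nunique()

print(rows_per_score)
print(titles_per_score)
```

Looking at both counts helps judge whether a bucket is small because few shows received that score or because those shows simply list few people.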

To obtain the mean numbers of votes for the selected scores (we identified a range of 4-9 as acceptable), use conditional filtering and grouping.

In [27]:
# filter dataframe using two conditions (scores to be in the range 4-9)
updated_release = updated_release[
    (updated_release['imdb_score'] >= 4) & (updated_release['imdb_score'] <= 9)
]

# shape of the full preprocessed table, printed for reference (df itself is not filtered)
print(df.shape)

# group scores and corresponding average number of votes, reset index and print the result
updated_release = updated_release.groupby('imdb_score')['imdb_votes'].mean().reset_index()
print(updated_release)

print(df.shape)

(73859, 9)
   imdb_score     imdb_votes
0         4.0    5277.583333
1         5.0    3143.942568
2         6.0    3481.717322
3         7.0    8727.068211
4         8.0   30299.460967
5         9.0  126904.109515
(73859, 9)


<div class="alert alert-success"; style="border-left: 7px solid green">
<b>✅ Reviewer's comment, v. 1</b> 
    
Well done, you correctly filtered the dataframe and calculated the mean values.

Now for the final step! Round the column with the averages, rename both columns, and print the dataframe in descending order.

In [28]:
# round column with averages
updated_release = updated_release.round()
# rename columns
updated_release = updated_release.rename(
    columns={
        'imdb_score': 'SCORE',
        'imdb_votes': 'VOTES',        
    }
)
# print dataframe in descending order
print(updated_release.sort_values(by='VOTES', ascending=False))

   SCORE     VOTES
5    9.0  126904.0
4    8.0   30299.0
3    7.0    8727.0
0    4.0    5278.0
2    6.0    3482.0
1    5.0    3144.0


The assumption matches the analysis: the shows with the top 3 scores have the highest average numbers of votes.

<div class="alert alert-success"; style="border-left: 7px solid green">
<b>✅ Reviewer's comment, v. 2</b> 
    
✔️ Great job!
    
<a class="tocSkip"></a><s>

<div class="alert alert-danger"; style="border-left: 7px solid red">
<b>⛔️ Reviewer's comment, v. 1</b> 

Please note that you did not save the result after you applied `round()`, VOTES column is not rounded. Could you please fix that?

## Conclusion <a id='hypotheses'></a>

The analysis supports the assumption that highly-rated shows released during the "Golden Age" of television also have the most votes. While shows with a score of 4 have more votes on average than those with scores of 5 and 6, the top three buckets (scores 7-9) have the highest averages. Dropping rows with missing values kept around 94% of the original records, and removing duplicates left 73,859 of the 85,579 rows (about 86%). The averages themselves were computed on shows released in 1999 or later.

<div class="alert alert-success"; style="border-left: 7px solid green">
<b>✅ Reviewer's comment, v. 1</b> 
    
Overall conclusion is an important part, where we should include the summary of the outcomes of the project.
    
Please note that you will need to write one for every project you make.