**SKILL-BASED TEST: DATA SCIENCE**

Create a notebook for the IMDb Review dataset on Kaggle and
answer the following questions:

1. Which movie had the highest rating?
2. How many reviewers gave it the highest rating?
3. What is its lowest rating, and what is the corresponding review detail?

# Import Libraries

In [1]:
# Let's import the libraries for our analysis
import pandas as pd
import numpy as np
import sqlite3 as sq3

# Suppress warnings that may appear
import warnings
warnings.filterwarnings(action='ignore') 

# Loading and Previewing our Dataset

In [2]:
# Create a variable, 'path', containing the path to 'IMDB_Movies_2021.db'
path = '/content/IMDB_Movies_2021.db'
# Create a connection, 'conn', to the database at 'path'
conn = sq3.connect(path)
# Create a variable, 'query', containing a SQL query that reads the
# AUTHOR, TITLE, REVIEW and RATING columns from the REVIEWS table
query = 'SELECT AUTHOR,TITLE,REVIEW,RATING FROM REVIEWS'

df = pd.read_sql_query(query,conn)

# Removing unnecessary newline characters
df = df.replace('\n','', regex=True)

# Preview the first 10 rows
df.head(10)

Unnamed: 0,AUTHOR,TITLE,REVIEW,RATING
0,margarida-44311,Not Bad,I don't get all the terrible reviews for this ...,5.0
1,joemay-2,What are all the bad reviews about is it a wo...,I cannot believe anyone could give this film l...,8.0
2,nebk,Great White=Jaws Lite,Great White is not the worst way to spend 90 m...,4.0
3,kuarinofu,Bare-bones killer shark film,Great White is as basic of a killer shark film...,4.0
4,Horror_Flick_Fanatic,"Terrible story, dialogue, and CGI","Terrible story, dialogue and CGI. The film has...",4.0
5,NickyDee07938,A decent effort,Whilst the 'shark survival' sub genre has plen...,6.0
6,Novelwolf,Nice Shark movie!,Much better than the ratings suggest. Its on p...,9.0
7,mbnn,"Nice, but could be so much better","First of all I love the film locations, drone ...",5.0
8,phobicsq,Typical movie for the genre,The film is meh when it comes to these types o...,4.0
9,rotini-52586,Liked it !,Thought it was a great Shark Movie . Special e...,7.0
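
Before running the query, it can help to confirm which tables and columns the database actually contains. The following is a minimal sketch, assuming an in-memory SQLite database with a toy REVIEWS table stands in for 'IMDB_Movies_2021.db'; the rows and the `demo_` names are made up for illustration.

```python
import sqlite3 as sq3
import pandas as pd

# Assumption: an in-memory database stands in for IMDB_Movies_2021.db.
demo_conn = sq3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE REVIEWS (AUTHOR TEXT, TITLE TEXT, REVIEW TEXT, RATING REAL)"
)
demo_conn.executemany(
    "INSERT INTO REVIEWS VALUES (?, ?, ?, ?)",
    [
        ("user_a", "Not Bad", "Line one\nline two", 5.0),  # made-up row
        ("user_b", "Great", "Loved it", None),              # made-up row, no rating
    ],
)
demo_conn.commit()

# List the tables stored in the database.
tables = pd.read_sql_query(
    "SELECT name FROM sqlite_master WHERE type='table'", demo_conn
)

# Inspect the column names and declared types of REVIEWS.
columns = pd.read_sql_query("PRAGMA table_info(REVIEWS)", demo_conn)

# Read the data; replacing newlines with a space (rather than an empty
# string) keeps the words on either side from fusing together.
demo_df = pd.read_sql_query(
    "SELECT AUTHOR, TITLE, REVIEW, RATING FROM REVIEWS", demo_conn
)
demo_df = demo_df.replace("\n", " ", regex=True)

demo_conn.close()
```

With the real database, the same two inspection queries can be run against `conn` before selecting columns.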


In [3]:
# Preview the last five rows
df.tail()

Unnamed: 0,AUTHOR,TITLE,REVIEW,RATING
5445,suryajijvania,More Parts,"It's master piece by Zack please part 2,3,4 al...",10.0
5446,shishirkmr-82243,It's a fantastic movie,No words to describe. It's awesome. One of the...,10.0
5447,moizsyed-07601,Awesome out standing!,Far better than previous one and better editin...,10.0
5448,samun_shrestha,EPIC,Why did the studio say no to this masterpiece?...,10.0
5449,mmuradali-65680,The best DC movie till date,Overall Opinion-Although the competitors Marve...,10.0


In [4]:
# Dataset shape as (rows, columns)
df.shape

(5450, 4)

The dataset has 5,450 rows and 4 columns.

In [5]:
# Count null values per column
df.isnull().sum()

AUTHOR      0
TITLE       0
REVIEW      0
RATING    118
dtype: int64

The output above shows that the RATING column has 118 null values.

In [6]:
# Drop rows with null values and confirm none remain
df1 = df.dropna()
df1.isnull().sum()

AUTHOR    0
TITLE     0
REVIEW    0
RATING    0
dtype: int64
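
Since only RATING contains nulls here, the drop can also be restricted to that column so that rows are never discarded because of gaps elsewhere. A minimal sketch on made-up data (the `toy` frame is hypothetical, not part of the dataset):

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration; one review has no rating.
toy = pd.DataFrame({
    "AUTHOR": ["a", "b", "c"],
    "TITLE": ["t1", "t2", "t3"],
    "REVIEW": ["r1", "r2", "r3"],
    "RATING": [8.0, np.nan, 10.0],
})

# Count nulls per column before dropping.
null_counts = toy.isnull().sum()

# Drop only the rows whose RATING is missing, then renumber the index.
toy_clean = toy.dropna(subset=["RATING"]).reset_index(drop=True)
```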

In [7]:
# Summarize the text (object) columns of the original DataFrame
df.describe(include = ['O'])

Unnamed: 0,AUTHOR,TITLE,REVIEW
count,5450,5450,5450
unique,4767,5073,5419
top,Gordon-11,Boring,Overall Opinion-Although the competitors Marve...
freq,10,19,2


# Which movie had the highest rating?

In [8]:
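# Group the data by TITLE and calculate the average rating for each title.
# Note: TITLE is the only title field in this query, and the preview above
# suggests it holds review headlines.
# Selecting RATING explicitly avoids averaging the text columns, which
# raises an error in recent pandas versions.
grouped_df = df1.groupby("TITLE")[["RATING"]].mean()

# Sort the DataFrame by average rating in descending order
sorted_df = grouped_df.sort_values("RATING", ascending=False)

# Get the title with the highest average rating (the first row after sorting)
highest_rated_movie = sorted_df.index[0]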

print(f"The movie with the highest rating is: {highest_rated_movie}")

The movie with the highest rating is:  " I AM ROY, AND ROY HAS DONE THIS."
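
Because `sorted_df.index[0]` returns a single label, any other titles that share the same top average are not reported. The sketch below, which uses made-up ratings rather than the real dataset, lists every title tied at the maximum average together with its review count:

```python
import pandas as pd

# Made-up ratings for illustration only.
toy = pd.DataFrame({
    "TITLE": ["A", "A", "B", "C", "C"],
    "RATING": [10.0, 8.0, 10.0, 10.0, 10.0],
})

# Average rating and number of reviews per title.
stats = toy.groupby("TITLE")["RATING"].agg(["mean", "count"])

# Keep every title whose average equals the best average,
# ordered so that titles backed by more reviews come first.
top_mean = stats["mean"].max()
tied = stats[stats["mean"] == top_mean].sort_values("count", ascending=False)
```

Checking the `count` column alongside the mean shows whether a top average rests on many reviews or on a single one.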


# How many reviewers gave it the highest rating?

In [9]:
# Filter the data to get only the reviews for the highest rated movie
highest_rated_movie_reviews = df1[df1["TITLE"] == highest_rated_movie]

# Count the number of reviews with a rating of 10 (the highest rating)
highest_rating_count = (highest_rated_movie_reviews["RATING"] == 10.0).sum()

print(f"The number of reviewers who gave the highest rating to {highest_rated_movie} is: {highest_rating_count}")

The number of reviewers who gave the highest rating to  " I AM ROY, AND ROY HAS DONE THIS." is: 1


In [10]:
# Cross-check: take the highest rating from the data instead of hard-coding 10
max_rating = highest_rated_movie_reviews["RATING"].max()
highest_rating_count = (highest_rated_movie_reviews["RATING"] == max_rating).sum()

print(f"The number of reviewers who gave the highest rating to {highest_rated_movie} is: {highest_rating_count}")

The number of reviewers who gave the highest rating to  " I AM ROY, AND ROY HAS DONE THIS." is: 1


# What is its lowest rating and review detail?

In [12]:
# Sort the reviews for the highest rated movie by rating in ascending order
sorted_reviews = highest_rated_movie_reviews.sort_values("RATING")

# Get the review with the lowest rating
least_rating_review = sorted_reviews.iloc[0]

# Get the rating and review text for the review with the lowest rating
least_rating = least_rating_review["RATING"]
review_text = least_rating_review["REVIEW"]

print(f"The least rating for {highest_rated_movie} is: {least_rating}")
print(f"The review detail for the least rating is: {review_text}")

The least rating for  " I AM ROY, AND ROY HAS DONE THIS." is: 10.0
The review detail for the least rating is: Awesome movie! Frank Grillo was superb, one of the Best of the Tough Guys! It was so cool to have a real life father and son acting in a movie.John Wick meets Kill Bill, with Edge of Tomorrow. Best line, by Brett, "That was rude". Non-stop action, great fight scenes,cool weaponry, and throwing the in '80's video games was genius. Watch it, you won't regret it!
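
As an alternative to sorting, the lowest-rated review can be looked up directly by its index label. A minimal sketch on made-up reviews (the `toy_reviews` frame and its index labels are hypothetical):

```python
import pandas as pd

# Made-up reviews for illustration; the index mimics original row labels.
toy_reviews = pd.DataFrame(
    {"REVIEW": ["Fine", "Weak plot", "Loved it"], "RATING": [7.0, 3.0, 10.0]},
    index=[11, 42, 57],
)

# idxmin returns the index label of the smallest rating.
lowest_idx = toy_reviews["RATING"].idxmin()

# Use that label to fetch the full row.
lowest = toy_reviews.loc[lowest_idx]

print(f"Lowest rating: {lowest['RATING']}")
print(f"Review detail: {lowest['REVIEW']}")
```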
