# Project 1: Python Standard Library

In this project, I practice using Python’s standard library.

Welcome to my first project! I practiced using Python’s standard library on a sample of 1075 rows from the [Netflix Dataset](https://www.kaggle.com/datasets/octopusteam/full-netflix-dataset) on Kaggle. Specifically, I calculated the mean, median, and mode of the “imdbAverageRating” column (which gives the average rating of each movie listed on Netflix), first with pandas and then again using only the standard library. I then created a sparkline to show the number of movies listed on Netflix that were released between 1950 and 2014. Figuring out the calculations (mean, median, and mode) and plotting a sparkline without using any third-party packages was super tricky. There was a lot of trial and error, but I learned a lot in the process and became much more confident in my ability to code!

In [1]:
import pandas as pd

#reading the data in pandas 
netflix_data = pd.read_csv('/Users/alexachan/Downloads/netflix_data - data.csv')
netflix_data

Unnamed: 0,url,title,type,genres,releaseYear,imdbId,imdbAverageRating,imdbNumVotes,availableCountries
0,https://www.netflix.com/title/60000724,Forrest Gump,movie,"Drama, Romance",1994,tt0109830,8.8,2313221,MX
1,https://www.netflix.com/title/1154386,The Fifth Element,movie,"Action, Adventure, Sci-Fi",1997,tt0119116,7.6,516523,"AT, CH, DE"
2,https://www.netflix.com/title/60031236,Kill Bill: Vol. 1,movie,"Action, Crime, Thriller",2003,tt0266697,8.2,1220488,"AE, AL, AO, AT, AU, AZ, BG, BH, BY, CA, CI, CM..."
3,https://www.netflix.com/title/70021659,Jarhead,movie,"Biography, Drama, War",2005,tt0418763,7.0,211314,"AD, AE, AG, AL, AO, AR, AT, AZ, BA, BB, BE, BG..."
4,https://www.netflix.com/title/1080395,Unforgiven,movie,"Drama, Western",1992,tt0105695,8.2,443310,"AU, BA, BE, BG, CZ, HR, HU, MD, ME, MK, NZ, PL..."
...,...,...,...,...,...,...,...,...,...
1070,https://www.netflix.com/title/70028900,Red Eye,movie,Thriller,2005,tt0421239,6.5,150744,"BR, ES"
1071,https://www.netflix.com/title/70036290,Slim Susie,movie,"Comedy, Crime, Mystery",2003,tt0323998,6.9,10607,"DK, FI, NO, SE"
1072,https://www.netflix.com/title/60020782,America's Sweethearts,movie,"Comedy, Romance",2001,tt0265029,5.7,61132,"AT, CH, CY, DE, GB, GG, GI, GR, IE, IN, LI, PK..."
1073,https://www.netflix.com/title/60020828,Black Knight,movie,"Adventure, Comedy, Fantasy",2001,tt0265087,4.9,43956,"BE, TN"


# Mean using pandas

In [2]:
mean = netflix_data['imdbAverageRating'].mean()
print(f"The average IMDB rating for Netflix movies is {mean:.1f}")

The average IMDB rating for Netflix movies is 6.7


# Median using pandas

In [3]:
median = netflix_data['imdbAverageRating'].median()
print(f"The median IMDB rating for Netflix movies is {median}")

The median IMDB rating for Netflix movies is 6.8


# Mode using pandas

In [4]:
mode = netflix_data['imdbAverageRating'].mode()[0]
print(f"The mode IMDB rating for Netflix movies is {mode}")

The mode IMDB rating for Netflix movies is 7.3


# Mean using Python standard library

In [5]:
import csv

with open('/Users/alexachan/Downloads/netflix_data - data.csv','r') as file:
    #read the file
    netflix_data=csv.DictReader(file)

    #create running totals (named "total" so the built-in sum() is not shadowed)
    total = 0
    count = 0
    
    for row in netflix_data:
        #convert ratings from strings to floats
        rating = float(row['imdbAverageRating'])
        #add up the value of each rating
        total += rating
        #count how many ratings there are in total
        count += 1

    #calculate the mean by dividing the sum of the ratings by the total number of ratings
    mean = total/count
        
    print(f"The average IMDB rating for Netflix is {mean:.1f}")

The average IMDB rating for Netflix is 6.7
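
The loop above can also be wrapped in a small reusable function. The following is a minimal sketch rather than the notebook's actual code: `mean_rating` is a hypothetical helper name, `io.StringIO` stands in for the real file, the first three sample rows are copied from the table preview, and the last row (with a blank rating) is made up to show one possible way of handling missing values.

```python
import csv
import io

# Three rows copied from the table preview, plus one made-up row with a
# blank rating (hypothetical) to exercise the missing-value check.
SAMPLE_CSV = """title,imdbAverageRating
Forrest Gump,8.8
The Fifth Element,7.6
Kill Bill: Vol. 1,8.2
Untitled Example,
"""


def mean_rating(file_obj, column="imdbAverageRating"):
    """Return the mean of a numeric CSV column, skipping blank cells."""
    total = 0.0
    count = 0
    for row in csv.DictReader(file_obj):
        value = row[column].strip()
        if not value:  # assumption: a blank cell means "no rating"
            continue
        total += float(value)
        count += 1
    if count == 0:
        raise ValueError("no ratings found")
    return total / count


sample_mean = mean_rating(io.StringIO(SAMPLE_CSV))
print(f"The average IMDB rating for the sample is {sample_mean:.1f}")
```

Passing an open file object instead of a path keeps the function easy to try on a few rows before pointing it at the full CSV.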


# Median using Python standard library

In [6]:
#create empty list
rating_list = []

with open('/Users/alexachan/Downloads/netflix_data - data.csv','r') as file:
    #read the file
    netflix = csv.DictReader(file)
    
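    for row in netflix:
        #convert ratings into floats and add to rating list
        rating_list.append(float(row['imdbAverageRating']))

    #sort ratings in numerical order (once, after all rows have been read)
    rating_list = sorted(rating_list)

    #find midpoint
    midpoint = len(rating_list)//2

    #calculate median: the middle value for an odd count,
    #or the average of the two middle values for an even count
    if len(rating_list) % 2 == 1:
        median = rating_list[midpoint]
    else:
        median = (rating_list[midpoint - 1] + rating_list[midpoint]) / 2
    
    print(f"The median IMDB rating for Netflix movies is {median}")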

The median IMDB rating for Netflix movies is 6.8
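
The median calculation can be packaged as a helper that treats odd and even list lengths separately, since by convention the median of an even number of values is the average of the two middle ones. This is a minimal sketch: `median_of` is a hypothetical name, and the sample ratings are taken from the first few rows of the table preview.

```python
def median_of(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)  # sort once, outside any loop
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: the single middle value
    # even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_median = median_of([8.8, 7.6, 8.2])        # middle of 7.6, 8.2, 8.8
even_median = median_of([8.8, 7.6, 8.2, 7.0])  # average of 7.6 and 8.2
print(odd_median, round(even_median, 1))
```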


# Mode using Python standard library

In [7]:
#create empty dictionary to assign movie titles to corresponding ratings
rating_movies_dict = {}

with open('/Users/alexachan/Downloads/netflix_data - data.csv', 'r') as file:
    #read the file
    netflix_data = csv.DictReader(file)

    for row in netflix_data:
        #assign rows to new variables
        movie = row['title']
        rating = float(row['imdbAverageRating'])

        # add unique ratings to dictionary
        if rating not in rating_movies_dict:
            rating_movies_dict[rating] = []

        # add movie titles to corresponding rating
        rating_movies_dict[rating].append(movie)

    #create new dictionary to keep track of # of movies per rating (rather than each individual title)
    number_of_movies_per_rating = {}
    
    for rating in rating_movies_dict.keys():
        #assign number of movies per rating to corresponding keys in new dictionary
        number_of_movies_per_rating[rating] = len(rating_movies_dict[rating])

    #calculate the mode: pair each count with its rating, then take the pair with the highest count (ties go to the higher rating)
    mode = max(zip(number_of_movies_per_rating.values(), number_of_movies_per_rating.keys()))[1]

    print(f'The mode IMDB rating for Netflix movies is {mode}')

The mode IMDB rating for Netflix movies is 7.3
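
The counting step above can also be sketched with `collections.Counter` from the standard library, and the result cross-checked against the `statistics` module. This is an alternative sketch, not the notebook's method, and the ratings below are illustrative values rather than the real dataset.

```python
from collections import Counter
import statistics

# Illustrative ratings (not the real dataset); 8.2 appears most often.
ratings = [8.8, 7.6, 8.2, 7.0, 8.2, 6.5, 8.2, 7.6]

# Counter maps each rating to the number of times it appears
rating_counts = Counter(ratings)
mode_rating, mode_count = rating_counts.most_common(1)[0]

# statistics.mode should agree when there is a single most common value
assert mode_rating == statistics.mode(ratings)
print(f"The mode is {mode_rating} ({mode_count} movies)")
```

When several ratings tie for the highest count, the approaches can disagree: `most_common` and `statistics.mode` return the first tied value encountered, while the `max(zip(...))` expression above returns the highest tied rating.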


# Data Visualization

In [8]:
#releaseYear x movie titles (show how many movies on Netflix were released in each year)
netflix_data = pd.read_csv('/Users/alexachan/Downloads/netflix_data - data.csv')
netflix_data[['releaseYear','title']]

Unnamed: 0,releaseYear,title
0,1994,Forrest Gump
1,1997,The Fifth Element
2,2003,Kill Bill: Vol. 1
3,2005,Jarhead
4,1992,Unforgiven
...,...,...
1070,2005,Red Eye
1071,2003,Slim Susie
1072,2001,America's Sweethearts
1073,2001,Black Knight


In [9]:
#create a thin wrapper around the built-in max (used below so that every year gets at least one asterisk)
def max1(*args):
    return max(args)

#create empty dictionary
year_movie_dict = {}

with open('/Users/alexachan/Downloads/netflix_data - data.csv', 'r') as file:
    #read the file
    netflix_data = csv.DictReader(file)

    for row in netflix_data:
        #assign rows to new variables
        year = row['releaseYear']
        movie = row['title']

        # add unique years to dictionary
        if year not in year_movie_dict:
            year_movie_dict[year] = []

        # add movie to corresponding year
        year_movie_dict[year].append(movie)

    #create new dictionary to keep track of the number of movies released in each year
    movies_from_each_year = {}
    
    for year in sorted(year_movie_dict.keys()):
        #store the movie count for each year, in chronological order of keys
        movies_from_each_year[year] = len(year_movie_dict[year])

    #show how many movies on Netflix were released in each year
    #print(movies_from_each_year)
    
    #separate years from movie counts
    years = list(movies_from_each_year.keys())
    movie_counts = list(movies_from_each_year.values())
    
#find the maximum
def max_movie(lst):
    largest = lst[0]
    for e in lst:
        if e > largest:
            largest = e
    return largest

#call the helper and store the result under a different name so the function is not overwritten
max_count = max_movie(movie_counts)

#find the minimum
def min_movie(lst):
    smallest = lst[0]
    for e in lst:
        if e < smallest:
            smallest = e
    return smallest

min_count = min_movie(movie_counts)

#find the range
range_movie = max_count - min_count

# print sparklines for each year (counts are scaled to between 1 and 100 asterisks)
bar = [
    max1(1, int((count - min_count) / range_movie * 100))
    for count in movie_counts
]
for i, year in enumerate(years):
    sparkline = '*' * bar[i]
    print(f"{year}: {sparkline}")


1950: *
1951: *
1952: *
1953: *
1954: *
1955: *
1956: *
1957: **
1959: *
1960: *
1961: *
1962: *
1963: **
1964: *****
1965: *
1966: *****
1967: *
1968: **
1969: *
1970: ****
1971: *
1972: *
1973: ***
1974: ******
1975: ***
1976: ******
1977: ***
1978: *****
1979: **
1980: ****
1981: *********
1982: ***
1983: ********
1984: ************
1985: ******
1986: *************
1987: ************
1988: *************
1989: *************
1990: **********
1991: *********************
1992: ********************
1993: **************************
1994: ***************************
1995: *****************************
1996: ***************************
1997: ***********************************
1998: ***************************************
1999: *********************************
2000: ********************************************************
2001: ***************************************************************
2002: ******************************************************************
2003: *********************
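
The scaling used for the bars above is a min-max normalization: each count is mapped to `(count - min) / (max - min) * 100` and then floored at one asterisk. A minimal sketch of that step as a function follows. `scale_to_bars` and its `width` parameter are hypothetical names, the counts are made up for illustration, and the guard for a zero range is an added assumption for the case where every year has the same count.

```python
def scale_to_bars(counts, width=100):
    """Map each count to a bar length between 1 and width (min-max scaling)."""
    lowest, highest = min(counts), max(counts)
    spread = highest - lowest
    if spread == 0:
        # assumption: if every count is equal, draw one asterisk for each
        return [1 for _ in counts]
    return [max(1, int((c - lowest) / spread * width)) for c in counts]


# Made-up counts for illustration, keyed by release year
movies_per_year = {"2000": 3, "2001": 5, "2002": 11, "2003": 7}
bar_lengths = scale_to_bars(list(movies_per_year.values()), width=20)

for year, length in zip(movies_per_year, bar_lengths):
    print(f"{year}: {'*' * length}")
```

A smaller `width` keeps the longest bar from wrapping in a narrow terminal, at the cost of less resolution between years with similar counts.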

# Disclaimers

I used ChatGPT to figure out how to represent the counts as asterisks in the data visualization section and to create max1 (a new max function that makes sure years with only 1 movie are still represented by an asterisk).