Import the pandas library

In [1]:
import pandas as pd


[How to see and download Netflix viewing history](https://help.netflix.com/en/node/101917)

Load NetflixViewingHistory.csv into a pandas DataFrame

In [2]:
netflix_data = pd.read_csv("NetflixViewingHistory.csv")


Take a look at the DataFrame

In [3]:
netflix_data


,Title,Date
0,Russian Doll: Season 2: Station to Station,5/4/22
1,Russian Doll: Season 2: Brain Drain,5/3/22
2,Russian Doll: Season 2: Coney Island Baby,4/26/22
3,Russian Doll: Season 2: Nowhen,4/26/22
4,Eternally Confused and Eager for Love: Season ...,4/10/22
...,...,...
1967,The OA: Part I: Chapter 4: Away,12/22/16
1968,The OA: Part I: Chapter 3: Champion,12/22/16
1969,The OA: Part I: Chapter 2: New Colossus,12/22/16
1970,The OA: Part I: Chapter 1: Homecoming,12/22/16


How many entries?

In [None]:
len(netflix_data)

In [None]:
netflix_data.shape[0]
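
Both cells should give the same number: `len()` of a DataFrame counts its rows, and `shape` is a `(rows, columns)` tuple. Here is a minimal sketch on a made-up three-row frame (the titles and dates are placeholders, not my real viewing history):

```python
import pandas as pd

# Hypothetical sample data standing in for NetflixViewingHistory.csv
sample = pd.DataFrame({
    "Title": ["Show A: Season 1: Pilot", "Show A: Season 1: Finale", "Some Movie"],
    "Date": ["5/4/22", "5/3/22", "4/26/22"],
})

n_rows_len = len(sample)        # number of rows
n_rows_shape = sample.shape[0]  # first element of the (rows, columns) tuple
n_cols = sample.shape[1]        # second element is the number of columns
```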

How many times did I watch an episode of the show *Grey's Anatomy*?

In [None]:
# Step 1: Get just the titles 
netflix_data["Title"]

[Reference for using .apply()](https://pandas.pydata.org/docs/reference/api/pandas.Series.apply.html)

In [None]:
# Step 2: Use .apply() to determine whether each title contains "Grey's Anatomy"
netflix_data["Title"].apply(lambda title: "Grey's Anatomy" in title)


In [None]:
# Step 3: Sum up how many titles have the keyword
netflix_data["Title"].apply(lambda title: "Grey's Anatomy" in title).sum()
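
As an alternative to `.apply()` with a lambda, pandas has a vectorized string method, `Series.str.contains()`. The sketch below runs both on a small made-up Series (the episode names are placeholders). Passing `regex=False` treats the keyword as literal text, and `na=False` counts a missing title as "no match".

```python
import pandas as pd

# Hypothetical sample titles, not the real viewing history
titles = pd.Series([
    "Grey's Anatomy: Season 1: Episode A",
    "Russian Doll: Season 2: Nowhen",
    "Grey's Anatomy: Season 1: Episode B",
])

# Same idea as the cells above: True where the keyword appears, then sum the Trues
count_with_apply = titles.apply(lambda title: "Grey's Anatomy" in title).sum()

# Vectorized version of the same check
count_with_str = titles.str.contains("Grey's Anatomy", regex=False, na=False).sum()
```

Summing works because each `True` counts as 1 and each `False` as 0.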



Which episodes of *Grey's Anatomy* did I watch?

[Reference for using .loc[]](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html)

In [None]:
# Use .loc[]
# First argument passed into loc specifies which rows to select
# Second argument passed into loc specifies which columns to select
netflix_data.loc[["Grey's Anatomy" in title for title in netflix_data["Title"]], :]

In [None]:
netflix_data.loc[["Grey's Anatomy" in title for title in netflix_data["Title"]], "Title"]
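
Since the same list of booleans is used in both cells, it can be saved in a variable and reused. This is a minimal sketch on made-up data; the name `mask` is my own choice, not something pandas requires.

```python
import pandas as pd

# Hypothetical sample data standing in for netflix_data
sample = pd.DataFrame({
    "Title": [
        "Grey's Anatomy: Season 1: Episode A",
        "Russian Doll: Season 2: Nowhen",
        "Grey's Anatomy: Season 1: Episode B",
    ],
    "Date": ["5/4/22", "5/3/22", "4/26/22"],
})

# One True/False per row: True where the title contains the keyword
mask = ["Grey's Anatomy" in title for title in sample["Title"]]

all_columns = sample.loc[mask, :]        # matching rows, every column
titles_only = sample.loc[mask, "Title"]  # matching rows, just the Title column
```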

On what date did I last watch (in other words, finish watching) the show *Kim's Convenience*?

In [4]:
# One approach
# Step 1: Zoom in on entries containing "Kim's Convenience" in the title
# Step 2: Sort by the date 
# Step 3: Get just the dates
# Step 4: Use .iloc[] to get the latest date


In [5]:
# Step 1: Zoom in on entries containing "Kim's Convenience" in the title
netflix_data.loc[["Kim's Convenience" in title for title in netflix_data["Title"]], :]

,Title,Date
359,Kim's Convenience: Season 5: Family Business,6/28/21
360,Kim's Convenience: Season 5: Hugs & Prayers,6/28/21
361,Kim's Convenience: Season 5: Matchy Matchy,6/28/21
362,Kim's Convenience: Season 5: Who's Pranking Who?,6/28/21
363,Kim's Convenience: Season 5: Field of Schemes,6/28/21
...,...,...
1529,Kim's Convenience: Season 1: Wingman,11/19/18
1530,Kim's Convenience: Season 1: Frank & Nayoung,11/19/18
1531,Kim's Convenience: Season 1: Ddong Chim,11/19/18
1532,Kim's Convenience: Season 1: Janet's Photos,11/19/18


[Reference for using .sort_values()](https://pandas.pydata.org/docs/reference/api/pandas.Series.sort_values.html)

In [6]:
# Step 2: Sort by the date
# Caution: "Date" holds strings (dtype object), so this sort is alphabetical, not chronological
netflix_data.loc[["Kim's Convenience" in title for title in netflix_data["Title"]], :].sort_values("Date", ascending=False)



,Title,Date
359,Kim's Convenience: Season 5: Family Business,6/28/21
360,Kim's Convenience: Season 5: Hugs & Prayers,6/28/21
361,Kim's Convenience: Season 5: Matchy Matchy,6/28/21
362,Kim's Convenience: Season 5: Who's Pranking Who?,6/28/21
363,Kim's Convenience: Season 5: Field of Schemes,6/28/21
...,...,...
1529,Kim's Convenience: Season 1: Wingman,11/19/18
1530,Kim's Convenience: Season 1: Frank & Nayoung,11/19/18
1531,Kim's Convenience: Season 1: Ddong Chim,11/19/18
1532,Kim's Convenience: Season 1: Janet's Photos,11/19/18


In [7]:
# Step 3: Get just the dates

netflix_data.loc[["Kim's Convenience" in title for title in netflix_data["Title"]], :].sort_values("Date", ascending=False)["Date"]


359      6/28/21
360      6/28/21
361      6/28/21
362      6/28/21
363      6/28/21
          ...   
1529    11/19/18
1530    11/19/18
1531    11/19/18
1532    11/19/18
1533    11/19/18
Name: Date, Length: 65, dtype: object

[Reference for using .iloc[]](https://pandas.pydata.org/docs/reference/api/pandas.Series.iloc.html)

In [8]:
# Step 4: Use .iloc[] to get the latest date

netflix_data.loc[["Kim's Convenience" in title for title in netflix_data["Title"]], :].sort_values("Date", ascending=False)["Date"].iloc[0]



'6/28/21'

In [12]:
# Another approach
# Step 1: Zoom in on entries containing "Kim's Convenience" in the title
# Step 2: Get just the dates
# Step 3: Sort by the date 
# Step 4: Use .iloc[] to get the latest date
netflix_data.loc[["Kim's Convenience" in title for title in netflix_data["Title"]], "Date"].sort_values(ascending=False).iloc[0]



'6/28/21'
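
One caveat: the `Date` column has dtype `object`, meaning the dates are stored as text, so `.sort_values()` compares them character by character rather than chronologically. The result happens to look right for this show, but it is not guaranteed in general. The sketch below uses made-up dates to show the difference and one possible fix with `pd.to_datetime()`. The format string `"%m/%d/%y"` is an assumption based on how the dates appear in the output above.

```python
import pandas as pd

# Hypothetical dates, chosen to show where text sorting goes wrong
dates = pd.Series(["6/28/21", "11/19/18", "9/1/19", "12/30/21"])

# Sorting the raw strings: "9..." sorts above "6..." and "1...", regardless of year
latest_as_text = dates.sort_values(ascending=False).iloc[0]

# Parsing into real dates first (assumed month/day/two-digit-year format)
parsed = pd.to_datetime(dates, format="%m/%d/%y")
latest_as_date = parsed.max()
```

Here `latest_as_text` is `'9/1/19'`, while `latest_as_date` is December 30, 2021.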

What is the last episode of *Kim's Convenience* that I watched?

In [13]:
# Step 1: Zoom in on entries containing "Kim's Convenience" in the title
# Step 2: Sort by the date 
# Step 3: Get just the titles
# Step 4: Get the latest title


In [14]:
# Steps 1-3 combined
netflix_data.loc[["Kim's Convenience" in title for title in netflix_data["Title"]], :].sort_values("Date", ascending=False)["Title"]


359         Kim's Convenience: Season 5: Family Business
360          Kim's Convenience: Season 5: Hugs & Prayers
361           Kim's Convenience: Season 5: Matchy Matchy
362     Kim's Convenience: Season 5: Who's Pranking Who?
363        Kim's Convenience: Season 5: Field of Schemes
                              ...                       
1529                Kim's Convenience: Season 1: Wingman
1530        Kim's Convenience: Season 1: Frank & Nayoung
1531             Kim's Convenience: Season 1: Ddong Chim
1532         Kim's Convenience: Season 1: Janet's Photos
1533           Kim's Convenience: Season 1: Gay Discount
Name: Title, Length: 65, dtype: object

In [15]:
# Step 4

netflix_data.loc[["Kim's Convenience" in title for title in netflix_data["Title"]], :].sort_values("Date", ascending=False)["Title"].iloc[0]


"Kim's Convenience: Season 5: Family Business"

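Several episodes share the latest date, so which one ends up first after sorting depends on how ties are ordered. Here is a sketch of another way to pick the last episode, under two assumptions: the dates are in month/day/two-digit-year format, and the export lists the most recently watched entry first (as the output above suggests). `idxmax()` returns the index label of the first row with the maximum value. The data is made up.

```python
import pandas as pd

# Hypothetical sample data; most recent entries listed first, as in the export
sample = pd.DataFrame({
    "Title": [
        "Kim's Convenience: Season 5: Episode C",
        "Kim's Convenience: Season 5: Episode B",
        "Another Show: Season 1: Episode A",
        "Kim's Convenience: Season 1: Episode A",
    ],
    "Date": ["6/28/21", "6/28/21", "9/1/19", "11/19/18"],
})

# Zoom in on the show, keeping all columns
kims = sample.loc[["Kim's Convenience" in title for title in sample["Title"]], :]

# Parse the text dates so comparisons are chronological
parsed_dates = pd.to_datetime(kims["Date"], format="%m/%d/%y")

# Label of the first row holding the latest date, then look up its title
latest_label = parsed_dates.idxmax()
last_episode = kims.loc[latest_label, "Title"]
```
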
What percentage of my Netflix viewing consists of watching shows? What percentage of my Netflix viewing consists of watching movies?

Assumptions: 

* Everything I watch on Netflix falls into one of 2 main categories: shows or movies
* "Season" and "Chapter" are the main keywords indicating that a title is from a show

In [None]:
# Step 1: Identify whether each title is from a show
# Step 2: Compute the proportion of Netflix viewing spent on watching shows
# Step 3: Convert the proportion for shows to a percentage and save in a variable for convenience
# Step 4: Compute the percentage for movies and save in a variable for convenience


In [16]:
# Step 1: Identify whether each title is from a show
netflix_data["Title"].apply(lambda title: "Season" in title or "Chapter" in title)

0        True
1        True
2        True
3        True
4        True
        ...  
1967     True
1968     True
1969     True
1970     True
1971    False
Name: Title, Length: 1972, dtype: bool

In [17]:
# Step 2: Compute the proportion of Netflix viewing spent on watching shows
netflix_data["Title"].apply(lambda title: "Season" in title or "Chapter" in title).sum() / len(netflix_data)


0.9031440162271805

In [19]:
# Step 3: Convert the proportion for shows to a percentage and save in a variable for convenience

show_percentage = netflix_data["Title"].apply(lambda title: "Season" in title or "Chapter" in title).sum() / len(netflix_data) * 100
show_percentage


90.31440162271805

In [20]:
# Step 4: Compute the percentage for movies and save in a variable for convenience

movie_percentage = 100 - show_percentage
movie_percentage

9.685598377281949
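
A shorter way to get the same proportion: the mean of a True/False Series is the fraction of `True` values, so `.mean()` can replace the sum-divided-by-length step. This is a minimal sketch on four made-up titles; the name `show_flags` is my own.

```python
import pandas as pd

# Hypothetical sample titles: three from shows, one movie
titles = pd.Series([
    "Show A: Season 1: Pilot",
    "Show B: Part I: Chapter 1: Start",
    "Show A: Season 1: Finale",
    "Some Movie",
])

# True where the title looks like it comes from a show
show_flags = titles.apply(lambda title: "Season" in title or "Chapter" in title)

show_percentage = show_flags.mean() * 100  # fraction of True values, as a percentage
movie_percentage = 100 - show_percentage
```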

Now let's say you don't want to use a *lambda* and would rather define a separate function that identifies whether a title is from a show.

In [21]:
# define function
def is_show(title):
    return "Season" in title or "Chapter" in title


In [22]:
# test out function
netflix_data["Title"].apply(is_show)


0        True
1        True
2        True
3        True
4        True
        ...  
1967     True
1968     True
1969     True
1970     True
1971    False
Name: Title, Length: 1972, dtype: bool

Now let's say you want to write a more generic way of checking whether a value contains a keyword.

In [23]:
# define function
def contains_keyword(value, keywords):
    return any(keyword in value for keyword in keywords)


Now let's create a list of keywords for our specific case and test the new function with it.

In [25]:
# create a list of keywords
keywords = ["Chapter", "Season"]
keywords

['Chapter', 'Season']

In [26]:
# test out function
netflix_data["Title"].apply(contains_keyword, args=(keywords,))


0        True
1        True
2        True
3        True
4        True
        ...  
1967     True
1968     True
1969     True
1970     True
1971    False
Name: Title, Length: 1972, dtype: bool
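
For comparison, here is a sketch of the same check without `.apply()`, joining the keywords into one regular expression for `Series.str.contains()`. `re.escape()` is used in case a keyword contains characters that have a special meaning in a regex. The titles are made up, and `contains_keyword` is repeated so the cell runs on its own.

```python
import re

import pandas as pd


def contains_keyword(value, keywords):
    # True if at least one keyword appears in the value
    return any(keyword in value for keyword in keywords)


# Hypothetical sample titles
titles = pd.Series([
    "Show A: Season 1: Pilot",
    "Show B: Part I: Chapter 1: Start",
    "Some Movie",
])
keywords = ["Chapter", "Season"]

# args=(keywords,) is a one-element tuple: it passes the whole list as the second argument
with_apply = titles.apply(contains_keyword, args=(keywords,))

# "Chapter|Season" means "Chapter or Season" in regex syntax
pattern = "|".join(re.escape(keyword) for keyword in keywords)
with_regex = titles.str.contains(pattern, regex=True)
```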