<p style="font-family: Arial; font-size:1.75em;color:maroon; font-weight:bold">
Case Study: Movie Data Analysis</p>

This notebook uses a dataset from the MovieLens website. We will describe the dataset further as we explore it using *pandas*.

### The Dataset
- No need to download the data files; they have been provided to you.

* **Data Source:** MovieLens web site (filename: ml-20m.zip)
* **Location:** https://grouplens.org/datasets/movielens/


### <span style="color:#2467C0"> Use `pandas.read_csv()` to Read the movies.csv Dataset </span>

Make sure the relative path is correct.
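
A wrong relative path is the most common reason `read_csv` fails here. The sketch below shows one way to check the path before reading. The helper name `load_csv_checked` and the temporary two-row file are assumptions made for illustration; they are not part of the provided materials.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_csv_checked(path):
    """Hypothetical helper: read a CSV, failing early with the resolved path."""
    path = Path(path)
    if not path.is_file():
        # Show the absolute path so a wrong working directory is easy to spot.
        raise FileNotFoundError(f"CSV not found at {path.resolve()}")
    return pd.read_csv(path)


# Demonstrate on a temporary two-row file instead of the real movies.csv.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "movies.csv"
    demo_path.write_text(
        "movieId,title,genres\n"
        "1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy\n"
        "2,Jumanji (1995),Adventure|Children|Fantasy\n",
        encoding="utf-8",
    )
    demo = load_csv_checked(demo_path)

    try:
        load_csv_checked(Path(tmp) / "missing.csv")
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True

print(demo.shape)       # (2, 3)
print(missing_raised)   # True
```

In the notebook itself, the equivalent check would be `Path('movies.csv').is_file()` run from the notebook's working directory.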


In [2]:
import pandas as pd

In [4]:
movies = pd.read_csv('movies.csv', sep=',')
print(type(movies))
movies.head(4)

<class 'pandas.core.frame.DataFrame'>


Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance


In [7]:
movies.shape

(87585, 3)

In [9]:
movies.columns

Index(['movieId', 'title', 'genres'], dtype='object')

### <span style="color:#2467C0"> Slicing Out Columns </span>

In [11]:
movies.title.head() # or movies['title']


0                      Toy Story (1995)
1                        Jumanji (1995)
2               Grumpier Old Men (1995)
3              Waiting to Exhale (1995)
4    Father of the Bride Part II (1995)
Name: title, dtype: object

In [13]:
movies[['title']].head()

Unnamed: 0,title
0,Toy Story (1995)
1,Jumanji (1995)
2,Grumpier Old Men (1995)
3,Waiting to Exhale (1995)
4,Father of the Bride Part II (1995)


In [15]:
df_id_titles = movies[['movieId','title']]  # Note the double brackets: a list of column names returns a DataFrame
print(type(df_id_titles))
df_id_titles.head()

<class 'pandas.core.frame.DataFrame'>


Unnamed: 0,movieId,title
0,1,Toy Story (1995)
1,2,Jumanji (1995)
2,3,Grumpier Old Men (1995)
3,4,Waiting to Exhale (1995)
4,5,Father of the Bride Part II (1995)
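
The difference between single and double brackets can be checked directly. The following is a minimal sketch on a small hand-built frame (`sample` is an assumption that mirrors the first rows shown above, not the full dataset).

```python
import pandas as pd

# Small hand-built sample mirroring the first rows of movies.csv (illustration only).
sample = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Toy Story (1995)", "Jumanji (1995)", "Grumpier Old Men (1995)"],
    "genres": [
        "Adventure|Animation|Children|Comedy|Fantasy",
        "Adventure|Children|Fantasy",
        "Comedy|Romance",
    ],
})

as_series = sample["title"]    # one label in single brackets -> Series
as_frame = sample[["title"]]   # a list of labels -> DataFrame with one column

print(type(as_series).__name__)          # Series
print(type(as_frame).__name__)           # DataFrame
print(as_series.shape, as_frame.shape)   # (3,) (3, 1)
```

A Series is one-dimensional, while the one-column DataFrame keeps its column header, which is why the two displays above look different.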


In [17]:
#Show only movieId and title
df = movies.iloc[:, :2]
df.head()

Unnamed: 0,movieId,title
0,1,Toy Story (1995)
1,2,Jumanji (1995)
2,3,Grumpier Old Men (1995)
3,4,Waiting to Exhale (1995)
4,5,Father of the Bride Part II (1995)


### <span style="color:#2467C0"> Slicing Out Rows </span>

In [19]:
it = movies.iloc[:3]  # returns a DataFrame
it.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance


In [21]:
# A single [] with a slice also selects rows
movies[-3:]

Unnamed: 0,movieId,title,genres
87582,292753,Orca (2023),Drama
87583,292755,The Angry Breed (1968),Drama
87584,292757,Race to the Summit (2023),Action|Adventure|Documentary


### <span style="color:#2467C0"> Slicing Out Both Rows and Columns </span>
 

In [23]:
movies.loc[0:2, ['movieId', 'title']]

Unnamed: 0,movieId,title
0,1,Toy Story (1995)
1,2,Jumanji (1995)
2,3,Grumpier Old Men (1995)


In [36]:
movies.iloc[0:2, [0,2]]

Unnamed: 0,movieId,genres
0,1,Adventure|Animation|Children|Comedy|Fantasy
1,2,Adventure|Children|Fantasy


In [None]:
movies.iloc[2]
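
The two outputs above differ in row count because `.loc` slices by label and includes the end label, while `.iloc` slices by position and excludes the end position. A minimal sketch on a hand-built frame follows (`sample` is an assumption with a default 0-based index, not the full dataset).

```python
import pandas as pd

# Hand-built sample with the default RangeIndex 0..4 (illustration only).
sample = pd.DataFrame({
    "movieId": [1, 2, 3, 4, 5],
    "title": [
        "Toy Story (1995)",
        "Jumanji (1995)",
        "Grumpier Old Men (1995)",
        "Waiting to Exhale (1995)",
        "Father of the Bride Part II (1995)",
    ],
    "genres": [
        "Adventure|Animation|Children|Comedy|Fantasy",
        "Adventure|Children|Fantasy",
        "Comedy|Romance",
        "Comedy|Drama|Romance",
        "Comedy",
    ],
})

by_label = sample.loc[0:2, ["movieId", "title"]]   # labels 0, 1, 2 -> three rows
by_position = sample.iloc[0:2, [0, 2]]             # positions 0, 1 -> two rows
single_row = sample.iloc[2]                        # one row comes back as a Series
last_two = sample[-2:]                             # plain [] with a slice selects rows

print(len(by_label), len(by_position))   # 3 2
print(type(single_row).__name__)         # Series
print(single_row["title"])               # Grumpier Old Men (1995)
print(list(last_two.index))              # [3, 4]
```

With the default integer index the labels and positions happen to coincide, so the inclusive versus exclusive end is the only visible difference.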

### <span style="color:orangered">Practice 1: Basic DataFrame Operations</span>

1. Show the first 4 rows of DataFrame **movies**.
2. Show the `title` and `genres` of the first five movies.
3. Show rows 0 and 2.
4. Show the `title` column of the first 3 rows of DataFrame **movies**.
5. Show the `movieId` and `genres` columns of the first 3 rows of DataFrame **movies**. Hint: use `.loc[]`.

In [46]:
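movies.head(4)  # not displayed: a notebook cell shows only its last expression
movies.iloc[0:6, 1:3]  # rows 0-5 (six rows) of title and genres; use 0:5 for exactly five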

Unnamed: 0,title,genres
0,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,Jumanji (1995),Adventure|Children|Fantasy
2,Grumpier Old Men (1995),Comedy|Romance
3,Waiting to Exhale (1995),Comedy|Drama|Romance
4,Father of the Bride Part II (1995),Comedy
5,Heat (1995),Action|Crime|Thriller


In [44]:
movies.iloc[0:6,[0,2]]

Unnamed: 0,movieId,genres
0,1,Adventure|Animation|Children|Comedy|Fantasy
1,2,Adventure|Children|Fantasy
2,3,Comedy|Romance
3,4,Comedy|Drama|Romance
4,5,Comedy
5,6,Action|Crime|Thriller


In [33]:
movies.loc[:5,:]

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy
5,6,Heat (1995),Action|Crime|Thriller
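
For comparison, here is one possible set of answers to the five Practice 1 tasks, sketched on a six-row hand-built frame (`sample` and the names `q1` to `q5` are assumptions for illustration). Other selections, such as `.iloc`, work equally well.

```python
import pandas as pd

# Hand-built sample mirroring the first six rows of movies.csv (illustration only).
sample = pd.DataFrame({
    "movieId": [1, 2, 3, 4, 5, 6],
    "title": [
        "Toy Story (1995)",
        "Jumanji (1995)",
        "Grumpier Old Men (1995)",
        "Waiting to Exhale (1995)",
        "Father of the Bride Part II (1995)",
        "Heat (1995)",
    ],
    "genres": [
        "Adventure|Animation|Children|Comedy|Fantasy",
        "Adventure|Children|Fantasy",
        "Comedy|Romance",
        "Comedy|Drama|Romance",
        "Comedy",
        "Action|Crime|Thriller",
    ],
})

q1 = sample.head(4)                          # 1. first 4 rows
q2 = sample[["title", "genres"]].head(5)     # 2. title and genres of the first five movies
q3 = sample.iloc[[0, 2]]                     # 3. rows 0 and 2 (a list of positions)
q4 = sample.loc[:2, "title"]                 # 4. title of the first 3 rows (label 2 is included)
q5 = sample.loc[:2, ["movieId", "genres"]]   # 5. movieId and genres of the first 3 rows

print(q1.shape, q2.shape, q5.shape)   # (4, 3) (5, 2) (3, 2)
print(list(q3.index))                 # [0, 2]
```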


### <span style="color:#2467C0"> Using Relational and Logical Operators to Create Filters </span>

In [42]:
is_animation = movies['genres'].str.contains('Animation')  # create a Boolean filter (mask)
movies[is_animation].head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
12,13,Balto (1995),Adventure|Animation|Children
47,48,Pocahontas (1995),Animation|Children|Drama|Musical|Romance
236,239,"Goofy Movie, A (1995)",Animation|Children|Comedy|Romance
241,244,Gumby: The Movie (1995),Animation|Children


In [48]:
is_animation = movies['genres'].str.contains('Animation')
is_1990 = movies['title'].str.contains('1990')
movies[is_animation & is_1990].head()

Unnamed: 0,movieId,title,genres
2000,2089,"Rescuers Down Under, The (1990)",Adventure|Animation|Children
9951,33463,DuckTales: The Movie - Treasure of the Lost La...,Adventure|Animation|Children|Comedy|Fantasy
17851,93208,Mickey's The Prince and the Pauper (1990),Animation|Children
18206,95165,Dragon Ball Z the Movie: The World's Strongest...,Action|Adventure|Animation|Sci-Fi|Thriller
18208,95182,Dragon Ball Z the Movie: The Tree of Might (Do...,Action|Adventure|Animation|Sci-Fi
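
Boolean filters combine element-wise with `&` (and), `|` (or), and `~` (not). Note also that `str.contains('1990')` matches "1990" anywhere in the title, not only as the release year. The sketch below illustrates both points on a hand-built frame; `sample` is an assumption, and its last row is a made-up title added only to show the difference.

```python
import pandas as pd

# Hand-built sample (illustration only). The last row is a made-up title.
sample = pd.DataFrame({
    "title": [
        "Toy Story (1995)",
        "Rescuers Down Under, The (1990)",
        "Mickey's The Prince and the Pauper (1990)",
        "Heat (1995)",
        "Hypothetical 1990 Revisited (2005)",   # hypothetical row, not in the dataset
    ],
    "genres": [
        "Adventure|Animation|Children|Comedy|Fantasy",
        "Adventure|Animation|Children",
        "Animation|Children",
        "Action|Crime|Thriller",
        "Animation",
    ],
})

is_animation = sample["genres"].str.contains("Animation")
mentions_1990 = sample["title"].str.contains("1990")         # "1990" anywhere in the title
released_1990 = sample["title"].str.contains(r"\(1990\)")    # only the "(1990)" year suffix

loose = sample[is_animation & mentions_1990]
strict = sample[is_animation & released_1990]
not_animation = sample[~is_animation]

print(list(loose.index))          # [1, 2, 4]
print(list(strict.index))         # [1, 2]
print(list(not_animation.index))  # [3]
```

The parentheses are escaped in the second pattern because `str.contains` treats the pattern as a regular expression by default.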


### <span style="color:orangered">Practice 2</span>

1. Show the first 5 movies that do not have the word "Story" in the title.
2. Show the first 5 adventure movies that have the word "Dragon" or "Tiger" in the title.

In [75]:
no_story = ~movies['title'].str.contains('Story')
movies[no_story].head(5)

Unnamed: 0,movieId,title,genres
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy
5,6,Heat (1995),Action|Crime|Thriller


In [77]:
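# Caution: the second positional argument of str.contains() is `case`, not another pattern,
# so this matches only 'Dragon' and applies no Adventure filter (see the output below).
contain = movies['title'].str.contains('Dragon', 'Tiger')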
movies[contain].head(5)

Unnamed: 0,movieId,title,genres
642,653,Dragonheart (1996),Action|Adventure|Fantasy
1007,1030,Pete's Dragon (1977),Adventure|Animation|Children|Musical
2491,2582,Twin Dragons (Shuang long hui) (1992),Action|Comedy
3893,3996,"Crouching Tiger, Hidden Dragon (Wo hu cang lon...",Action|Drama|Romance
3894,3997,Dungeons & Dragons (2000),Action|Adventure|Comedy|Fantasy
