# Pandas Fundamentals
## Part 1: The Series Object - columns

### Construct a Series Object from an Array

In [1]:
import pandas as pd
series = pd.Series([0,1,2,3,4])
series

0    0
1    1
2    2
3    3
4    4
dtype: int64

You can think of a Series as a one-dimensional array with labels. If you don't specify an index, pandas assigns integer labels starting from 0.

In [2]:
print("Values: ", series.values)
print("Indices: ", series.index)
print(series[1]) # Get a single value
print(series[1:4]) # Get a range of values

Values:  [0 1 2 3 4]
Indices:  RangeIndex(start=0, stop=5, step=1)
1
1    1
2    2
3    3
dtype: int64


But the index doesn't have to be numeric! Any hashable labels work, for example strings.

In [42]:
series2 = pd.Series([12,24,34,45],
                    index = ['a','b','c','d'])
series2

a    12
b    24
c    34
d    45
dtype: int64

In [4]:
print(series2['c']) # Get a single value
print(series2['b':'d']) # Get a range of values; the end label 'd' is included!

34
b    24
c    34
d    45
dtype: int64
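The inclusive end point is specific to label-based slicing. As a minimal sketch (the name `letters` is made up for illustration), compare it with position-based slicing through `.iloc`, which follows the usual Python rule and excludes the stop position:

```python
import pandas as pd

# Hypothetical Series mirroring series2 above
letters = pd.Series([12, 24, 34, 45], index=['a', 'b', 'c', 'd'])

# Label-based slice: both end points are included
by_label = letters.loc['b':'d']

# Position-based slice: the stop position (3) is excluded
by_position = letters.iloc[1:3]

print(by_label.tolist())     # [24, 34, 45]
print(by_position.tolist())  # [24, 34]
```

Using `.loc` and `.iloc` explicitly makes it clear to the reader which of the two rules applies.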


### Construct a Series Object from a Dictionary

In [43]:
import pandas as pd
d = {'a':12,
     'b':23,
     'c':34,
     'd':45}
series3 = pd.Series(d)
series3

a    12
b    23
c    34
d    45
dtype: int64

In [44]:
print(series3['c'])
print(series3['a':'c'])

34
a    12
b    23
c    34
dtype: int64
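A Series built from a dictionary keeps some dictionary-like behavior and adds array-style operations on top. The following is only a sketch; the name `stock` and its values are assumptions chosen for illustration:

```python
import pandas as pd

stock = pd.Series({'a': 12, 'b': 23, 'c': 34, 'd': 45})

has_b = 'b' in stock        # membership tests the index labels, like dict keys
doubled = stock * 2         # arithmetic applies to every value at once
large = stock[stock > 20]   # a boolean mask keeps only the matching entries

print(has_b)
print(doubled['c'])
print(list(large.index))
```

The boolean-mask pattern in the last step is the same one used for filtering DataFrame rows.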


## Part 2: The DataFrame Object - columns + rows

### Construct a DataFrame Object from a Single Series


In [7]:
import pandas as pd
s1 = pd.Series([12,23,34,45],
               index=['a','b','c','d'])
df1 = pd.DataFrame(s1, columns=['quantity']) # columns is optional
df1

,quantity
a,12
b,23
c,34
d,45


### Construct a DataFrame Object from a Dictionary

In [8]:
d = {"country":["China","USA","Canada"],
     "capital":["Beijing","Washington DC","Ottawa"],
     "population":[1.3,0.5,0.1]}
df2 = pd.DataFrame(d)
df2

,country,capital,population
0,China,Beijing,1.3
1,USA,Washington DC,0.5
2,Canada,Ottawa,0.1


### Construct a DataFrame Object from a Dictionary of Series Objects

In [46]:
s1 = pd.Series([12,23,34,45],
               index=['a','b','c','d'])
s2 = pd.Series([1.2,0.5,2.5,3.6],
               index=['a','b','c','d'])
df3 = pd.DataFrame({'quantity' : s1 ,
                    'price' : s2})
df3

,quantity,price
a,12,1.2
b,23,0.5
c,34,2.5
d,45,3.6
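In the example above both Series share the same index. When they do not, pandas aligns them by label and fills the gaps with NaN. A small sketch, using made-up Series `qty` and `price`:

```python
import pandas as pd

qty = pd.Series([12, 23, 34], index=['a', 'b', 'c'])      # no entry for 'd'
price = pd.Series([1.2, 0.5, 3.6], index=['a', 'b', 'd'])  # no entry for 'c'

# Rows are matched by index label, not by position
aligned = pd.DataFrame({'quantity': qty, 'price': price})
print(aligned)

# Labels present in only one Series produce missing values
print(aligned.isna().sum())
```

Because of the NaN entries, the `quantity` column becomes float rather than int in this case.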


### Construct a DataFrame Object by Importing Data from a File

In [47]:
import pandas as pd
df = pd.read_csv('sample_data/california_housing_test.csv')
df.head()

,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value
0,-122.05,37.37,27.0,3885.0,661.0,1537.0,606.0,6.6085,344700.0
1,-118.3,34.26,43.0,1510.0,310.0,809.0,277.0,3.599,176500.0
2,-117.81,33.78,27.0,3589.0,507.0,1484.0,495.0,5.7934,270500.0
3,-118.36,33.82,28.0,67.0,15.0,49.0,11.0,6.1359,330000.0
4,-119.67,36.33,19.0,1241.0,244.0,850.0,237.0,2.9375,81700.0


In [48]:
df = pd.read_json('sample_data/anscombe.json')
df.head()

,Series,X,Y
0,I,10,8.04
1,I,8,6.95
2,I,13,7.58
3,I,9,8.81
4,I,11,8.33


## Part 3: Data Wrangling
Read, view, and extract information

### Read the data

In [49]:
import pandas as pd
imdb_df = pd.read_csv('sample_data/imdb_top_1000.csv')
imdb_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 16 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Poster_Link    1000 non-null   object 
 1   Series_Title   1000 non-null   object 
 2   Released_Year  1000 non-null   object 
 3   Certificate    899 non-null    object 
 4   Runtime        1000 non-null   object 
 5   Genre          1000 non-null   object 
 6   IMDB_Rating    1000 non-null   float64
 7   Overview       1000 non-null   object 
 8   Meta_score     843 non-null    float64
 9   Director       1000 non-null   object 
 10  Star1          1000 non-null   object 
 11  Star2          1000 non-null   object 
 12  Star3          1000 non-null   object 
 13  Star4          1000 non-null   object 
 14  No_of_Votes    1000 non-null   int64  
 15  Gross          831 non-null    object 
dtypes: float64(2), int64(1), object(13)
memory usage: 125.1+ KB


In [52]:
imdb_df.head()

,Poster_Link,Series_Title,Released_Year,Certificate,Runtime,Genre,IMDB_Rating,Overview,Meta_score,Director,Star1,Star2,Star3,Star4,No_of_Votes,Gross
0,https://m.media-amazon.com/images/M/MV5BMDFkYT...,The Shawshank Redemption,1994,A,142 min,Drama,9.3,Two imprisoned men bond over a number of years...,80.0,Frank Darabont,Tim Robbins,Morgan Freeman,Bob Gunton,William Sadler,2343110,28341469
1,https://m.media-amazon.com/images/M/MV5BM2MyNj...,The Godfather,1972,A,175 min,"Crime, Drama",9.2,An organized crime dynasty's aging patriarch t...,100.0,Francis Ford Coppola,Marlon Brando,Al Pacino,James Caan,Diane Keaton,1620367,134966411
2,https://m.media-amazon.com/images/M/MV5BMTMxNT...,The Dark Knight,2008,UA,152 min,"Action, Crime, Drama",9.0,When the menace known as the Joker wreaks havo...,84.0,Christopher Nolan,Christian Bale,Heath Ledger,Aaron Eckhart,Michael Caine,2303232,534858444
3,https://m.media-amazon.com/images/M/MV5BMWMwMG...,The Godfather: Part II,1974,A,202 min,"Crime, Drama",9.0,The early life and career of Vito Corleone in ...,90.0,Francis Ford Coppola,Al Pacino,Robert De Niro,Robert Duvall,Diane Keaton,1129952,57300000
4,https://m.media-amazon.com/images/M/MV5BMWU4N2...,12 Angry Men,1957,U,96 min,"Crime, Drama",9.0,A jury holdout attempts to prevent a miscarria...,96.0,Sidney Lumet,Henry Fonda,Lee J. Cobb,Martin Balsam,John Fiedler,689845,4360000


If we don't specify an index, the DataFrame gets the default integer index (starting from 0). We can instead choose a column to use as the index, as follows:

In [53]:
imdb_df_indexed = pd.read_csv('sample_data/imdb_top_1000.csv', index_col='Series_Title')
# Alternatively, set the index on a DataFrame that is already loaded:
# imdb_df_indexed = imdb_df.set_index('Series_Title')
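Both approaches should give the same result. The sketch below checks this on a tiny in-memory CSV, so no file is needed; the data and the name `csv_text` are invented for illustration:

```python
import io
import pandas as pd

csv_text = "title,year\nAlpha,2001\nBeta,2002\n"

# Option 1: choose the index column while reading
at_read = pd.read_csv(io.StringIO(csv_text), index_col='title')

# Option 2: read first, then promote a column to the index
after_read = pd.read_csv(io.StringIO(csv_text)).set_index('title')

print(at_read.equals(after_read))
print(at_read.index.name)
```

Note that `set_index` returns a new DataFrame and leaves the original unchanged, which is why the result is assigned to a new name.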

### View the data

In [54]:
imdb_df_indexed.head()

Series_Title,Poster_Link,Released_Year,Certificate,Runtime,Genre,IMDB_Rating,Overview,Meta_score,Director,Star1,Star2,Star3,Star4,No_of_Votes,Gross
The Shawshank Redemption,https://m.media-amazon.com/images/M/MV5BMDFkYT...,1994,A,142 min,Drama,9.3,Two imprisoned men bond over a number of years...,80.0,Frank Darabont,Tim Robbins,Morgan Freeman,Bob Gunton,William Sadler,2343110,28341469
The Godfather,https://m.media-amazon.com/images/M/MV5BM2MyNj...,1972,A,175 min,"Crime, Drama",9.2,An organized crime dynasty's aging patriarch t...,100.0,Francis Ford Coppola,Marlon Brando,Al Pacino,James Caan,Diane Keaton,1620367,134966411
The Dark Knight,https://m.media-amazon.com/images/M/MV5BMTMxNT...,2008,UA,152 min,"Action, Crime, Drama",9.0,When the menace known as the Joker wreaks havo...,84.0,Christopher Nolan,Christian Bale,Heath Ledger,Aaron Eckhart,Michael Caine,2303232,534858444
The Godfather: Part II,https://m.media-amazon.com/images/M/MV5BMWMwMG...,1974,A,202 min,"Crime, Drama",9.0,The early life and career of Vito Corleone in ...,90.0,Francis Ford Coppola,Al Pacino,Robert De Niro,Robert Duvall,Diane Keaton,1129952,57300000
12 Angry Men,https://m.media-amazon.com/images/M/MV5BMWU4N2...,1957,U,96 min,"Crime, Drama",9.0,A jury holdout attempts to prevent a miscarria...,96.0,Sidney Lumet,Henry Fonda,Lee J. Cobb,Martin Balsam,John Fiedler,689845,4360000


What if I want to see the last 3 rows?

In [55]:
imdb_df_indexed.tail(3)

Series_Title,Poster_Link,Released_Year,Certificate,Runtime,Genre,IMDB_Rating,Overview,Meta_score,Director,Star1,Star2,Star3,Star4,No_of_Votes,Gross
From Here to Eternity,https://m.media-amazon.com/images/M/MV5BM2U3Yz...,1953,Passed,118 min,"Drama, Romance, War",7.6,"In Hawaii in 1941, a private is cruelly punish...",85.0,Fred Zinnemann,Burt Lancaster,Montgomery Clift,Deborah Kerr,Donna Reed,43374,30500000.0
Lifeboat,https://m.media-amazon.com/images/M/MV5BZTBmMj...,1944,,97 min,"Drama, War",7.6,Several survivors of a torpedoed merchant ship...,78.0,Alfred Hitchcock,Tallulah Bankhead,John Hodiak,Walter Slezak,William Bendix,26471,
The 39 Steps,https://m.media-amazon.com/images/M/MV5BMTY5OD...,1935,,86 min,"Crime, Mystery, Thriller",7.6,A man in London tries to help a counter-espion...,93.0,Alfred Hitchcock,Robert Donat,Madeleine Carroll,Lucie Mannheim,Godfrey Tearle,51853,


In [56]:
imdb_df_indexed.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1000 entries, The Shawshank Redemption to The 39 Steps
Data columns (total 15 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Poster_Link    1000 non-null   object 
 1   Released_Year  1000 non-null   object 
 2   Certificate    899 non-null    object 
 3   Runtime        1000 non-null   object 
 4   Genre          1000 non-null   object 
 5   IMDB_Rating    1000 non-null   float64
 6   Overview       1000 non-null   object 
 7   Meta_score     843 non-null    float64
 8   Director       1000 non-null   object 
 9   Star1          1000 non-null   object 
 10  Star2          1000 non-null   object 
 11  Star3          1000 non-null   object 
 12  Star4          1000 non-null   object 
 13  No_of_Votes    1000 non-null   int64  
 14  Gross          831 non-null    object 
dtypes: float64(2), int64(1), object(12)
memory usage: 125.0+ KB


In [57]:
imdb_df_indexed.shape

(1000, 15)

In [58]:
imdb_df_indexed.describe() # summary statistics for the numeric (int/float) columns

,IMDB_Rating,Meta_score,No_of_Votes
count,1000.0,843.0,1000.0
mean,7.9493,77.97153,273692.9
std,0.275491,12.376099,327372.7
min,7.6,28.0,25088.0
25%,7.7,70.0,55526.25
50%,7.9,79.0,138548.5
75%,8.1,87.0,374161.2
max,9.3,100.0,2343110.0
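By default `describe()` only summarizes numeric columns, which is why text columns such as `Genre` do not appear in the table. As a sketch on a toy frame (the column names and values are invented), passing `include='all'` adds counts, the number of unique values, and the most frequent value for the text columns:

```python
import pandas as pd

toy = pd.DataFrame({
    'genre': ['Drama', 'Crime', 'Drama'],
    'rating': [9.3, 9.2, 9.0],
})

numeric_only = toy.describe()             # only the 'rating' column
everything = toy.describe(include='all')  # both columns, NaN where a statistic does not apply

print(list(numeric_only.columns))
print(everything.loc[['count', 'unique', 'top', 'freq']])
```

Statistics that make no sense for a column type, such as the mean of a text column, show up as NaN in the combined table.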


### Extract the data

Working with Columns

In [20]:
# Extract the 'Genre' column
# genre_col = imdb_df_indexed.Genre
# or
genre_col_series = imdb_df_indexed['Genre'] # it returns a Series Object
genre_col_series.head()

Series_Title
The Shawshank Redemption                   Drama
The Godfather                       Crime, Drama
The Dark Knight             Action, Crime, Drama
The Godfather: Part II              Crime, Drama
12 Angry Men                        Crime, Drama
Name: Genre, dtype: object

In [59]:
print(type(genre_col_series))

<class 'pandas.core.series.Series'>


What if I want a DataFrame instead of a Series Object?

In [22]:
genre_col_df = imdb_df_indexed[['Genre']] # it returns a DataFrame Object
print(type(genre_col_df))
genre_col_df.head()

<class 'pandas.core.frame.DataFrame'>


Series_Title,Genre
The Shawshank Redemption,Drama
The Godfather,"Crime, Drama"
The Dark Knight,"Action, Crime, Drama"
The Godfather: Part II,"Crime, Drama"
12 Angry Men,"Crime, Drama"


What if I want to extract multiple columns?

In [23]:
extracted_cols_df = imdb_df_indexed[['Genre', 'IMDB_Rating', 'Director']]
extracted_cols_df.head()

Series_Title,Genre,IMDB_Rating,Director
The Shawshank Redemption,Drama,9.3,Frank Darabont
The Godfather,"Crime, Drama",9.2,Francis Ford Coppola
The Dark Knight,"Action, Crime, Drama",9.0,Christopher Nolan
The Godfather: Part II,"Crime, Drama",9.0,Francis Ford Coppola
12 Angry Men,"Crime, Drama",9.0,Sidney Lumet


Working with Rows

In [24]:
gf = imdb_df_indexed.loc['The Godfather'] # locate by index label
gf

Poster_Link      https://m.media-amazon.com/images/M/MV5BM2MyNj...
Released_Year                                                 1972
Certificate                                                      A
Runtime                                                    175 min
Genre                                                 Crime, Drama
IMDB_Rating                                                    9.2
Overview         An organized crime dynasty's aging patriarch t...
Meta_score                                                   100.0
Director                                      Francis Ford Coppola
Star1                                                Marlon Brando
Star2                                                    Al Pacino
Star3                                                   James Caan
Star4                                                 Diane Keaton
No_of_Votes                                                1620367
Gross                                                  134,966,411

In [25]:
men12 = imdb_df_indexed.iloc[4] # locate by integer position (0-based)
men12

Poster_Link      https://m.media-amazon.com/images/M/MV5BMWU4N2...
Released_Year                                                 1957
Certificate                                                      U
Runtime                                                     96 min
Genre                                                 Crime, Drama
IMDB_Rating                                                    9.0
Overview         A jury holdout attempts to prevent a miscarria...
Meta_score                                                    96.0
Director                                              Sidney Lumet
Star1                                                  Henry Fonda
Star2                                                  Lee J. Cobb
Star3                                                Martin Balsam
Star4                                                 John Fiedler
No_of_Votes                                                 689845
Gross                                                    4,360,000

In [26]:
imdb_df_indexed.head(10)

Series_Title,Poster_Link,Released_Year,Certificate,Runtime,Genre,IMDB_Rating,Overview,Meta_score,Director,Star1,Star2,Star3,Star4,No_of_Votes,Gross
The Shawshank Redemption,https://m.media-amazon.com/images/M/MV5BMDFkYT...,1994,A,142 min,Drama,9.3,Two imprisoned men bond over a number of years...,80.0,Frank Darabont,Tim Robbins,Morgan Freeman,Bob Gunton,William Sadler,2343110,28341469
The Godfather,https://m.media-amazon.com/images/M/MV5BM2MyNj...,1972,A,175 min,"Crime, Drama",9.2,An organized crime dynasty's aging patriarch t...,100.0,Francis Ford Coppola,Marlon Brando,Al Pacino,James Caan,Diane Keaton,1620367,134966411
The Dark Knight,https://m.media-amazon.com/images/M/MV5BMTMxNT...,2008,UA,152 min,"Action, Crime, Drama",9.0,When the menace known as the Joker wreaks havo...,84.0,Christopher Nolan,Christian Bale,Heath Ledger,Aaron Eckhart,Michael Caine,2303232,534858444
The Godfather: Part II,https://m.media-amazon.com/images/M/MV5BMWMwMG...,1974,A,202 min,"Crime, Drama",9.0,The early life and career of Vito Corleone in ...,90.0,Francis Ford Coppola,Al Pacino,Robert De Niro,Robert Duvall,Diane Keaton,1129952,57300000
12 Angry Men,https://m.media-amazon.com/images/M/MV5BMWU4N2...,1957,U,96 min,"Crime, Drama",9.0,A jury holdout attempts to prevent a miscarria...,96.0,Sidney Lumet,Henry Fonda,Lee J. Cobb,Martin Balsam,John Fiedler,689845,4360000
The Lord of the Rings: The Return of the King,https://m.media-amazon.com/images/M/MV5BNzA5ZD...,2003,U,201 min,"Action, Adventure, Drama",8.9,Gandalf and Aragorn lead the World of Men agai...,94.0,Peter Jackson,Elijah Wood,Viggo Mortensen,Ian McKellen,Orlando Bloom,1642758,377845905
Pulp Fiction,https://m.media-amazon.com/images/M/MV5BNGNhMD...,1994,A,154 min,"Crime, Drama",8.9,"The lives of two mob hitmen, a boxer, a gangst...",94.0,Quentin Tarantino,John Travolta,Uma Thurman,Samuel L. Jackson,Bruce Willis,1826188,107928762
Schindler's List,https://m.media-amazon.com/images/M/MV5BNDE4OT...,1993,A,195 min,"Biography, Drama, History",8.9,"In German-occupied Poland during World War II,...",94.0,Steven Spielberg,Liam Neeson,Ralph Fiennes,Ben Kingsley,Caroline Goodall,1213505,96898818
Inception,https://m.media-amazon.com/images/M/MV5BMjAxMz...,2010,UA,148 min,"Action, Adventure, Sci-Fi",8.8,A thief who steals corporate secrets through t...,74.0,Christopher Nolan,Leonardo DiCaprio,Joseph Gordon-Levitt,Elliot Page,Ken Watanabe,2067042,292576195
Fight Club,https://m.media-amazon.com/images/M/MV5BMmEzNT...,1999,A,139 min,Drama,8.8,An insomniac office worker and a devil-may-car...,66.0,David Fincher,Brad Pitt,Edward Norton,Meat Loaf,Zach Grenier,1854740,37030102


In [60]:
rows = imdb_df_indexed.loc['12 Angry Men':'Fight Club'] # '12 Angry Men' and 'Fight Club' are included
rows

Series_Title,Poster_Link,Released_Year,Certificate,Runtime,Genre,IMDB_Rating,Overview,Meta_score,Director,Star1,Star2,Star3,Star4,No_of_Votes,Gross
12 Angry Men,https://m.media-amazon.com/images/M/MV5BMWU4N2...,1957,U,96 min,"Crime, Drama",9.0,A jury holdout attempts to prevent a miscarria...,96.0,Sidney Lumet,Henry Fonda,Lee J. Cobb,Martin Balsam,John Fiedler,689845,4360000
The Lord of the Rings: The Return of the King,https://m.media-amazon.com/images/M/MV5BNzA5ZD...,2003,U,201 min,"Action, Adventure, Drama",8.9,Gandalf and Aragorn lead the World of Men agai...,94.0,Peter Jackson,Elijah Wood,Viggo Mortensen,Ian McKellen,Orlando Bloom,1642758,377845905
Pulp Fiction,https://m.media-amazon.com/images/M/MV5BNGNhMD...,1994,A,154 min,"Crime, Drama",8.9,"The lives of two mob hitmen, a boxer, a gangst...",94.0,Quentin Tarantino,John Travolta,Uma Thurman,Samuel L. Jackson,Bruce Willis,1826188,107928762
Schindler's List,https://m.media-amazon.com/images/M/MV5BNDE4OT...,1993,A,195 min,"Biography, Drama, History",8.9,"In German-occupied Poland during World War II,...",94.0,Steven Spielberg,Liam Neeson,Ralph Fiennes,Ben Kingsley,Caroline Goodall,1213505,96898818
Inception,https://m.media-amazon.com/images/M/MV5BMjAxMz...,2010,UA,148 min,"Action, Adventure, Sci-Fi",8.8,A thief who steals corporate secrets through t...,74.0,Christopher Nolan,Leonardo DiCaprio,Joseph Gordon-Levitt,Elliot Page,Ken Watanabe,2067042,292576195
Fight Club,https://m.media-amazon.com/images/M/MV5BMmEzNT...,1999,A,139 min,Drama,8.8,An insomniac office worker and a devil-may-car...,66.0,David Fincher,Brad Pitt,Edward Norton,Meat Loaf,Zach Grenier,1854740,37030102


In [28]:
other_rows = imdb_df_indexed.iloc[0:4] # the row at position 4 is not included
other_rows

Series_Title,Poster_Link,Released_Year,Certificate,Runtime,Genre,IMDB_Rating,Overview,Meta_score,Director,Star1,Star2,Star3,Star4,No_of_Votes,Gross
The Shawshank Redemption,https://m.media-amazon.com/images/M/MV5BMDFkYT...,1994,A,142 min,Drama,9.3,Two imprisoned men bond over a number of years...,80.0,Frank Darabont,Tim Robbins,Morgan Freeman,Bob Gunton,William Sadler,2343110,28341469
The Godfather,https://m.media-amazon.com/images/M/MV5BM2MyNj...,1972,A,175 min,"Crime, Drama",9.2,An organized crime dynasty's aging patriarch t...,100.0,Francis Ford Coppola,Marlon Brando,Al Pacino,James Caan,Diane Keaton,1620367,134966411
The Dark Knight,https://m.media-amazon.com/images/M/MV5BMTMxNT...,2008,UA,152 min,"Action, Crime, Drama",9.0,When the menace known as the Joker wreaks havo...,84.0,Christopher Nolan,Christian Bale,Heath Ledger,Aaron Eckhart,Michael Caine,2303232,534858444
The Godfather: Part II,https://m.media-amazon.com/images/M/MV5BMWMwMG...,1974,A,202 min,"Crime, Drama",9.0,The early life and career of Vito Corleone in ...,90.0,Francis Ford Coppola,Al Pacino,Robert De Niro,Robert Duvall,Diane Keaton,1129952,57300000
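Both `.loc` and `.iloc` also accept a second argument that selects columns, so rows and columns can be picked in one step. A minimal sketch on a toy DataFrame (the name `movies` and its contents are invented):

```python
import pandas as pd

movies = pd.DataFrame(
    {'year': [1994, 1972, 2008], 'rating': [9.3, 9.2, 9.0]},
    index=['Movie A', 'Movie B', 'Movie C'],
)

# Rows by label (end point included) and columns by name
by_label = movies.loc['Movie A':'Movie B', ['rating']]

# Rows and columns by position (stop positions excluded)
by_position = movies.iloc[0:2, 1:2]

# A single cell: row label first, then column name
one_cell = movies.loc['Movie C', 'year']

print(by_label.equals(by_position))
print(one_cell)
```

The two slices select the same data here; which form is more convenient depends on whether you know the labels or the positions.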


# Homework

Question 1 (10)

a) Please extract the rows between 'The Green Mile' and 'Se7en'.

b) Please extract the rows from row 25 to row 28 (row 28 is included).

In [29]:
# Question 1
# Part a)
rows_homework = imdb_df_indexed.loc["The Green Mile": "Se7en"]
rows_homework


Series_Title,Poster_Link,Released_Year,Certificate,Runtime,Genre,IMDB_Rating,Overview,Meta_score,Director,Star1,Star2,Star3,Star4,No_of_Votes,Gross
The Green Mile,https://m.media-amazon.com/images/M/MV5BMTUxMz...,1999,A,189 min,"Crime, Drama, Fantasy",8.6,The lives of guards on Death Row are affected ...,61.0,Frank Darabont,Tom Hanks,Michael Clarke Duncan,David Morse,Bonnie Hunt,1147794,136801374
La vita è bella,https://m.media-amazon.com/images/M/MV5BYmJmM2...,1997,U,116 min,"Comedy, Drama, Romance",8.6,When an open-minded Jewish librarian and his s...,59.0,Roberto Benigni,Roberto Benigni,Nicoletta Braschi,Giorgio Cantarini,Giustino Durano,623629,57598247
Se7en,https://m.media-amazon.com/images/M/MV5BOTUwOD...,1995,A,127 min,"Crime, Drama, Mystery",8.6,"Two detectives, a rookie and a veteran, hunt a...",65.0,David Fincher,Morgan Freeman,Brad Pitt,Kevin Spacey,Andrew Kevin Walker,1445096,100125643


In [30]:
# Part b)
other_rows_homework = imdb_df_indexed.iloc[25:29]
other_rows_homework

Series_Title,Poster_Link,Released_Year,Certificate,Runtime,Genre,IMDB_Rating,Overview,Meta_score,Director,Star1,Star2,Star3,Star4,No_of_Votes,Gross
The Green Mile,https://m.media-amazon.com/images/M/MV5BMTUxMz...,1999,A,189 min,"Crime, Drama, Fantasy",8.6,The lives of guards on Death Row are affected ...,61.0,Frank Darabont,Tom Hanks,Michael Clarke Duncan,David Morse,Bonnie Hunt,1147794,136801374
La vita è bella,https://m.media-amazon.com/images/M/MV5BYmJmM2...,1997,U,116 min,"Comedy, Drama, Romance",8.6,When an open-minded Jewish librarian and his s...,59.0,Roberto Benigni,Roberto Benigni,Nicoletta Braschi,Giorgio Cantarini,Giustino Durano,623629,57598247
Se7en,https://m.media-amazon.com/images/M/MV5BOTUwOD...,1995,A,127 min,"Crime, Drama, Mystery",8.6,"Two detectives, a rookie and a veteran, hunt a...",65.0,David Fincher,Morgan Freeman,Brad Pitt,Kevin Spacey,Andrew Kevin Walker,1445096,100125643
The Silence of the Lambs,https://m.media-amazon.com/images/M/MV5BNjNhZT...,1991,A,118 min,"Crime, Drama, Thriller",8.6,A young F.B.I. cadet must receive the help of ...,85.0,Jonathan Demme,Jodie Foster,Anthony Hopkins,Lawrence A. Bonney,Kasi Lemmons,1270197,130742922


✅*score: 10/10* 

Excellent!

Question 2 (10)

Which movie is the oldest, that is, which has the earliest Released_Year? (If there's a tie, you can return several titles or any one of them.)


In [31]:
# Question 2
released_year_col_df = imdb_df_indexed[["Released_Year"]]
oldest_movie_year = []
for year in released_year_col_df.Released_Year:
  oldest_movie_year.append(year)
oldest_movie = min(oldest_movie_year)
released_year_col_df[released_year_col_df["Released_Year"] == min(oldest_movie_year)]



Series_Title,Released_Year
Das Cabinet des Dr. Caligari,1920


✅*score: 10/10* 

Excellent! The last line can simply reuse the variable you already computed:

released_year_col_df[released_year_col_df["Released_Year"] == oldest_movie]

Is there an easier way to find the movies with the lowest IMDB_Rating?

In [61]:
imdb_df_indexed.describe()

,IMDB_Rating,Meta_score,No_of_Votes
count,1000.0,843.0,1000.0
mean,7.9493,77.97153,273692.9
std,0.275491,12.376099,327372.7
min,7.6,28.0,25088.0
25%,7.7,70.0,55526.25
50%,7.9,79.0,138548.5
75%,8.1,87.0,374161.2
max,9.3,100.0,2343110.0


In [62]:
imdb_df_indexed[imdb_df_indexed["IMDB_Rating"] == 7.6]

Series_Title,Poster_Link,Released_Year,Certificate,Runtime,Genre,IMDB_Rating,Overview,Meta_score,Director,Star1,Star2,Star3,Star4,No_of_Votes,Gross
Dark Waters,https://m.media-amazon.com/images/M/MV5BODQ0M2...,2019,PG-13,126 min,"Biography, Drama, History",7.6,A corporate defense attorney takes on an envir...,73.0,Todd Haynes,Mark Ruffalo,Anne Hathaway,Tim Robbins,Bill Pullman,60408,
Searching,https://m.media-amazon.com/images/M/MV5BMjIwOT...,2018,U/A,102 min,"Drama, Mystery, Thriller",7.6,"After his teenage daughter goes missing, a des...",71.0,Aneesh Chaganty,John Cho,Debra Messing,Joseph Lee,Michelle La,140840,26020957
Once Upon a Time... in Hollywood,https://m.media-amazon.com/images/M/MV5BOTg4ZT...,2019,A,161 min,"Comedy, Drama",7.6,A faded television actor and his stunt double ...,83.0,Quentin Tarantino,Leonardo DiCaprio,Brad Pitt,Margot Robbie,Emile Hirsch,551309,142502728
Nelyubov,https://m.media-amazon.com/images/M/MV5BNzk2Nm...,2017,R,127 min,Drama,7.6,A couple going through a divorce must team up ...,86.0,Andrey Zvyagintsev,Maryana Spivak,Aleksey Rozin,Matvey Novikov,Marina Vasileva,29765,566356
The Florida Project,https://m.media-amazon.com/images/M/MV5BMjg4Zm...,2017,A,111 min,Drama,7.6,"Set over one summer, the film follows precocio...",92.0,Sean Baker,Brooklynn Prince,Bria Vinaite,Willem Dafoe,Christopher Rivera,95181,5904366
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Breakfast at Tiffany's,https://m.media-amazon.com/images/M/MV5BNGEwMT...,1961,A,115 min,"Comedy, Drama, Romance",7.6,A young New York socialite becomes interested ...,76.0,Blake Edwards,Audrey Hepburn,George Peppard,Patricia Neal,Buddy Ebsen,166544,
Giant,https://m.media-amazon.com/images/M/MV5BODk3Yj...,1956,G,201 min,"Drama, Western",7.6,Sprawling epic covering the life of a Texas ca...,84.0,George Stevens,Elizabeth Taylor,Rock Hudson,James Dean,Carroll Baker,34075,
From Here to Eternity,https://m.media-amazon.com/images/M/MV5BM2U3Yz...,1953,Passed,118 min,"Drama, Romance, War",7.6,"In Hawaii in 1941, a private is cruelly punish...",85.0,Fred Zinnemann,Burt Lancaster,Montgomery Clift,Deborah Kerr,Donna Reed,43374,30500000
Lifeboat,https://m.media-amazon.com/images/M/MV5BZTBmMj...,1944,,97 min,"Drama, War",7.6,Several survivors of a torpedoed merchant ship...,78.0,Alfred Hitchcock,Tallulah Bankhead,John Hodiak,Walter Slezak,William Bendix,26471,
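Reading the minimum off `describe()` and typing it in by hand works, but the value can also be computed directly. A minimal sketch on a toy frame (titles and ratings are invented):

```python
import pandas as pd

toy = pd.DataFrame(
    {'IMDB_Rating': [9.3, 7.6, 8.1, 7.6]},
    index=['Movie A', 'Movie B', 'Movie C', 'Movie D'],
)

# Compute the minimum instead of hard-coding it
lowest = toy['IMDB_Rating'].min()

# All rows that share the minimum (handles ties)
lowest_rated = toy[toy['IMDB_Rating'] == lowest]

# Label of the first row holding the minimum
first_lowest = toy['IMDB_Rating'].idxmin()

print(list(lowest_rated.index))  # ['Movie B', 'Movie D']
print(first_lowest)              # Movie B
```

`idxmin()` returns only the first match, so the boolean filter is the safer choice when ties matter.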


Question 3 (10)

Please filter our movie DataFrame to show only movies from 2016.


In [32]:
# Question 3
imdb_df_indexed[imdb_df_indexed["Released_Year"] >= 2016]
# Doesn't work, perhaps because the "Released_Year" column is not numeric, so pandas cannot compare it with an integer

TypeError: ignored

✅*score: 9/10* 

This question is challenging. If you use .info(), you can see that Released_Year has dtype object, meaning it is stored as strings rather than numbers. You can then search for "convert object to int + pandas" to find the answer.

In [63]:
imdb_df_indexed.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1000 entries, The Shawshank Redemption to The 39 Steps
Data columns (total 15 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Poster_Link    1000 non-null   object 
 1   Released_Year  1000 non-null   object 
 2   Certificate    899 non-null    object 
 3   Runtime        1000 non-null   object 
 4   Genre          1000 non-null   object 
 5   IMDB_Rating    1000 non-null   float64
 6   Overview       1000 non-null   object 
 7   Meta_score     843 non-null    float64
 8   Director       1000 non-null   object 
 9   Star1          1000 non-null   object 
 10  Star2          1000 non-null   object 
 11  Star3          1000 non-null   object 
 12  Star4          1000 non-null   object 
 13  No_of_Votes    1000 non-null   int64  
 14  Gross          831 non-null    object 
dtypes: float64(2), int64(1), object(12)
memory usage: 157.3+ KB


In [65]:
type(imdb_df_indexed['Released_Year'].iloc[0]) # use .iloc for positional access on a label-indexed Series

str

In [66]:
imdb_df_indexed[imdb_df_indexed['Released_Year'].astype(int) == 2016]

ValueError: ignored

But when we try to apply the conversion directly, it raises a ValueError: at least one value in the column is not a number. Here the offending value is 'PG'.

We need to find the row containing 'PG' and replace the value.

In [68]:
imdb_df_indexed[~imdb_df_indexed.Released_Year.str.isnumeric()]

Series_Title,Poster_Link,Released_Year,Certificate,Runtime,Genre,IMDB_Rating,Overview,Meta_score,Director,Star1,Star2,Star3,Star4,No_of_Votes,Gross
Apollo 13,https://m.media-amazon.com/images/M/MV5BNjEzYj...,PG,U,140 min,"Adventure, Drama, History",7.6,NASA must devise a strategy to return Apollo 1...,77.0,Ron Howard,Tom Hanks,Bill Paxton,Kevin Bacon,Gary Sinise,269197,173837933


In [69]:
imdb_df_indexed.loc['Apollo 13','Released_Year']='1995' # replace 'PG' with the film's actual release year

In [70]:
imdb_df_indexed['Released_Year'].loc['Apollo 13']

'1995'

In [71]:
imdb_df_indexed[imdb_df_indexed["Released_Year"].astype(int)== 2016]

Series_Title,Poster_Link,Released_Year,Certificate,Runtime,Genre,IMDB_Rating,Overview,Meta_score,Director,Star1,Star2,Star3,Star4,No_of_Votes,Gross
Kimi no na wa.,https://m.media-amazon.com/images/M/MV5BODRmZD...,2016,U,106 min,"Animation, Drama, Fantasy",8.4,Two strangers find themselves linked in a biza...,79.0,Makoto Shinkai,Ryûnosuke Kamiki,Mone Kamishiraishi,Ryô Narita,Aoi Yûki,194838,5017246.0
Dangal,https://m.media-amazon.com/images/M/MV5BMTQ4Mz...,2016,U,161 min,"Action, Biography, Drama",8.4,Former wrestler Mahavir Singh Phogat and his t...,,Nitesh Tiwari,Aamir Khan,Sakshi Tanwar,Fatima Sana Shaikh,Sanya Malhotra,156479,12391761.0
Pink,https://m.media-amazon.com/images/M/MV5BNGI1MT...,2016,UA,136 min,"Drama, Thriller",8.1,When three young women are implicated in a cri...,,Aniruddha Roy Chowdhury,Taapsee Pannu,Amitabh Bachchan,Kirti Kulhari,Andrea Tariang,39216,1241223.0
Koe no katachi,https://m.media-amazon.com/images/M/MV5BZGRkOG...,2016,16,130 min,"Animation, Drama, Family",8.1,A young man is ostracized by his classmates af...,78.0,Naoko Yamada,Miyu Irino,Saori Hayami,Aoi Yûki,Kenshô Ono,47708,
Contratiempo,https://m.media-amazon.com/images/M/MV5BMDk0Yz...,2016,TV-MA,106 min,"Crime, Drama, Mystery",8.1,A successful entrepreneur accused of murder an...,,Oriol Paulo,Mario Casas,Ana Wagener,Jose Coronado,Bárbara Lennie,141516,
Ah-ga-ssi,https://m.media-amazon.com/images/M/MV5BNDJhYT...,2016,A,145 min,"Drama, Romance, Thriller",8.1,A woman is hired as a handmaiden to a Japanese...,84.0,Chan-wook Park,Kim Min-hee,Jung-woo Ha,Cho Jin-woong,Moon So-Ri,113649,2006788.0
Hacksaw Ridge,https://m.media-amazon.com/images/M/MV5BMjQ1Nj...,2016,A,139 min,"Biography, Drama, History",8.1,World War II American Army Medic Desmond T. Do...,71.0,Mel Gibson,Andrew Garfield,Sam Worthington,Luke Bracey,Teresa Palmer,435928,67209615.0
Airlift,https://m.media-amazon.com/images/M/MV5BMGE1ZT...,2016,UA,130 min,"Drama, History",8.0,"When Iraq invades Kuwait in August 1990, a cal...",,Raja Menon,Akshay Kumar,Nimrat Kaur,Kumud Mishra,Prakash Belawadi,52897,
La La Land,https://m.media-amazon.com/images/M/MV5BMzUzND...,2016,A,128 min,"Comedy, Drama, Music",8.0,"While navigating their careers in Los Angeles,...",94.0,Damien Chazelle,Ryan Gosling,Emma Stone,Rosemarie DeWitt,J.K. Simmons,505918,151101803.0
Lion,https://m.media-amazon.com/images/M/MV5BMjA3Nj...,2016,U,118 min,"Biography, Drama",8.0,A five-year-old Indian boy is adopted by an Au...,69.0,Garth Davis,Dev Patel,Nicole Kidman,Rooney Mara,Sunny Pawar,213970,51739495.0
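
Released_Year is stored as strings (the lookup above returns '1995'), so the filter converts it with astype(int) before comparing. That conversion raises an error if any entry is not a valid integer. As a hedged alternative, the sketch below uses pd.to_numeric with errors="coerce", which turns unparseable entries into NaN so they simply fail the comparison. The toy_df frame and its "Mystery Film" row with the value "PG" are hypothetical stand-ins for illustration, not rows taken from the dataset.

```python
import pandas as pd

# Hypothetical miniature stand-in for imdb_df_indexed (not the real data).
toy_df = pd.DataFrame(
    {"Released_Year": ["2016", "1995", "PG", "2016"]},
    index=pd.Index(
        ["La La Land", "Apollo 13", "Mystery Film", "Lion"],
        name="Series_Title",
    ),
)

# errors="coerce" converts entries that cannot be parsed into NaN
# instead of raising, which astype(int) would do on "PG".
years = pd.to_numeric(toy_df["Released_Year"], errors="coerce")

# NaN == 2016 evaluates to False, so the bad row is dropped by the mask.
movies_2016 = toy_df[years == 2016]
print(movies_2016.index.tolist())
```

If every value in the column is known to be a clean integer string, astype(int) as used above is sufficient.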


Question 4 (10)

How many movies have an IMDB_Rating of 8.5 or higher (8.5 included)?

In [None]:
imdb_df_indexed[imdb_df_indexed["IMDB_Rating"] >= 8.5]

Series_Title,Poster_Link,Released_Year,Certificate,Runtime,Genre,IMDB_Rating,Overview,Meta_score,Director,Star1,Star2,Star3,Star4,No_of_Votes,Gross
The Shawshank Redemption,https://m.media-amazon.com/images/M/MV5BMDFkYT...,1994,A,142 min,Drama,9.3,Two imprisoned men bond over a number of years...,80.0,Frank Darabont,Tim Robbins,Morgan Freeman,Bob Gunton,William Sadler,2343110,28341469.0
The Godfather,https://m.media-amazon.com/images/M/MV5BM2MyNj...,1972,A,175 min,"Crime, Drama",9.2,An organized crime dynasty's aging patriarch t...,100.0,Francis Ford Coppola,Marlon Brando,Al Pacino,James Caan,Diane Keaton,1620367,134966411.0
The Dark Knight,https://m.media-amazon.com/images/M/MV5BMTMxNT...,2008,UA,152 min,"Action, Crime, Drama",9.0,When the menace known as the Joker wreaks havo...,84.0,Christopher Nolan,Christian Bale,Heath Ledger,Aaron Eckhart,Michael Caine,2303232,534858444.0
The Godfather: Part II,https://m.media-amazon.com/images/M/MV5BMWMwMG...,1974,A,202 min,"Crime, Drama",9.0,The early life and career of Vito Corleone in ...,90.0,Francis Ford Coppola,Al Pacino,Robert De Niro,Robert Duvall,Diane Keaton,1129952,57300000.0
12 Angry Men,https://m.media-amazon.com/images/M/MV5BMWU4N2...,1957,U,96 min,"Crime, Drama",9.0,A jury holdout attempts to prevent a miscarria...,96.0,Sidney Lumet,Henry Fonda,Lee J. Cobb,Martin Balsam,John Fiedler,689845,4360000.0
The Lord of the Rings: The Return of the King,https://m.media-amazon.com/images/M/MV5BNzA5ZD...,2003,U,201 min,"Action, Adventure, Drama",8.9,Gandalf and Aragorn lead the World of Men agai...,94.0,Peter Jackson,Elijah Wood,Viggo Mortensen,Ian McKellen,Orlando Bloom,1642758,377845905.0
Pulp Fiction,https://m.media-amazon.com/images/M/MV5BNGNhMD...,1994,A,154 min,"Crime, Drama",8.9,"The lives of two mob hitmen, a boxer, a gangst...",94.0,Quentin Tarantino,John Travolta,Uma Thurman,Samuel L. Jackson,Bruce Willis,1826188,107928762.0
Schindler's List,https://m.media-amazon.com/images/M/MV5BNDE4OT...,1993,A,195 min,"Biography, Drama, History",8.9,"In German-occupied Poland during World War II,...",94.0,Steven Spielberg,Liam Neeson,Ralph Fiennes,Ben Kingsley,Caroline Goodall,1213505,96898818.0
Inception,https://m.media-amazon.com/images/M/MV5BMjAxMz...,2010,UA,148 min,"Action, Adventure, Sci-Fi",8.8,A thief who steals corporate secrets through t...,74.0,Christopher Nolan,Leonardo DiCaprio,Joseph Gordon-Levitt,Elliot Page,Ken Watanabe,2067042,292576195.0
Fight Club,https://m.media-amazon.com/images/M/MV5BMmEzNT...,1999,A,139 min,Drama,8.8,An insomniac office worker and a devil-may-car...,66.0,David Fincher,Brad Pitt,Edward Norton,Meat Loaf,Zach Grenier,1854740,37030102.0
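
The filter above displays the matching rows, but the question asks for a count. A minimal sketch of two ways to get that number from the same boolean mask is shown below. The toy_ratings frame and its film names are hypothetical, used only to keep the example self-contained; on the real data the same two expressions would be applied to imdb_df_indexed.

```python
import pandas as pd

# Hypothetical ratings table standing in for imdb_df_indexed.
toy_ratings = pd.DataFrame(
    {"IMDB_Rating": [9.3, 8.5, 8.4, 8.8]},
    index=pd.Index(["Film A", "Film B", "Film C", "Film D"], name="Series_Title"),
)

# Boolean mask: True where the rating is 8.5 or higher.
mask = toy_ratings["IMDB_Rating"] >= 8.5

# Option 1: True counts as 1, so summing the mask gives the number of matches.
count_from_mask = int(mask.sum())

# Option 2: filter the rows, then take the length of the result.
count_from_rows = len(toy_ratings[mask])

print(count_from_mask, count_from_rows)
```

Summing the mask avoids building the filtered DataFrame, which can matter on larger tables; both options give the same count.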


✅*score: 10/10* 

Excellent! 