# Pandas Intro

**pandas** is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language. (https://pandas.pydata.org/)

## Series
![series](series.png)

The two primary building blocks of **pandas** are the `Series` and the `DataFrame`.

A `Series` is essentially a column of data with
* a name
* a row index
* a datatype
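
The Series printed in this section can be reconstructed roughly as follows. This is a minimal sketch: the values are taken from the outputs shown here, but the variable names (`counts`, `apple`, `years`) are assumptions, since the original cells are not shown.

```python
import pandas as pd

# A Series built from a plain list gets a default integer index (0, 1, 2, ...).
counts = pd.Series([3, 2, 0, 1])

# The same values with an explicit name and a custom row index.
apple = pd.Series([3, 2, 0, 1], index=[1, 2, 3, 4], name="apple")

# Index labels do not have to be integers; here the dict keys become the index.
years = pd.Series({"XBX": 1998, "EP": 1912, "PKA": 1939})

print(counts)
print(apple)
print(years)
print(years.dtype)  # the single datatype shared by all values in the Series
```

The three attributes from the list above are available as `apple.name`, `apple.index` and `apple.dtype`.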

0    3
1    2
2    0
3    1
dtype: int64

1    3
2    2
3    0
4    1
Name: apple, dtype: int64

XBX    1998
EP     1912
PKA    1939
dtype: int64

dtype('int64')

## DataFrame

A `DataFrame` is a collection of one or more Series, that is, a 2-dimensional table of data with
* a Series per column
* a shared index for all the columns
* a name (label) for each column
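
One common way to build such a table is from a dictionary that maps column names to lists of values. The sketch below uses the apples and oranges values shown in this section; the names `data` and `df` are assumptions.

```python
import pandas as pd

# Each dictionary key becomes a column name, each list becomes a column.
data = {"apples": [3, 2, 0, 1], "oranges": [0, 3, 7, 2]}
df = pd.DataFrame(data)

# Every column of a DataFrame is itself a Series...
apples = df["apples"]
print(type(apples))

# ...and all columns share the same row index.
print(df["apples"].index.equals(df["oranges"].index))
```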

![](series-and-dataframe.png)

|   | apples | oranges |
|---|--------|---------|
| 0 | 3 | 0 |
| 1 | 2 | 3 |
| 2 | 0 | 7 |
| 3 | 1 | 2 |


### Components of a `DataFrame`

In [25]:
# column labels (stored as an Index object)

Index(['apples', 'oranges'], dtype='object')

In [31]:
# custom index

|        | apples | oranges |
|--------|--------|---------|
| June   | 3 | 0 |
| Robert | 2 | 3 |
| Lily   | 0 | 7 |
| David  | 1 | 2 |


In [32]:
# data

array([[3, 0],
       [2, 3],
       [0, 7],
       [1, 2]])

In [33]:
# datatypes

apples     int64
oranges    int64
dtype: object
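
The components above can be inspected with a handful of attributes. This is a sketch under the assumption that the DataFrame is built as below; the variable names are illustrative and not taken from the original cells.

```python
import pandas as pd

df = pd.DataFrame(
    {"apples": [3, 2, 0, 1], "oranges": [0, 3, 7, 2]},
    index=["June", "Robert", "Lily", "David"],  # custom row labels
)

row_labels = df.index       # the row index
column_labels = df.columns  # the column labels, also an Index object
values = df.to_numpy()      # the underlying data as a 2-D NumPy array
dtypes = df.dtypes          # one datatype per column

print(row_labels)
print(column_labels)
print(values)
print(dtypes)
```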

## Reading in a CSV
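
CSV data is typically loaded with `pd.read_csv`. The sketch below reads from an in-memory string so that it runs on its own; the four-column excerpt and the name `movies_small` are assumptions for illustration. With a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file on disk (a tiny excerpt, not the full dataset).
csv_text = """color,director_name,duration,imdb_score
Color,James Cameron,178,7.9
Color,Gore Verbinski,169,7.1
,Doug Walker,,7.1
"""

movies_small = pd.read_csv(io.StringIO(csv_text))

print(movies_small)
print(movies_small.dtypes)
```

Empty fields are read as `NaN`. Because `NaN` is a floating-point value, an otherwise integer column such as `duration` is stored as `float64`, which is why values like `178.0` appear in the table.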

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
0,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
1,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
2,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
3,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
4,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,...,,,,,,,12.0,7.1,,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4911,Color,Scott Smith,1.0,87.0,2.0,318.0,Daphne Zuniga,637.0,,Comedy|Drama,...,6.0,English,Canada,,,2013.0,470.0,7.7,,84
4912,Color,,43.0,43.0,,319.0,Valorie Curry,841.0,,Crime|Drama|Mystery|Thriller,...,359.0,English,USA,TV-14,,,593.0,7.5,16.00,32000
4913,Color,Benjamin Roberds,13.0,76.0,0.0,0.0,Maxwell Moody,0.0,,Drama|Horror|Thriller,...,3.0,English,USA,,1400.0,2013.0,0.0,6.3,,16
4914,Color,Daniel Hsia,14.0,100.0,0.0,489.0,Daniel Henney,946.0,10443.0,Comedy|Drama|Romance,...,9.0,English,USA,PG-13,,2012.0,719.0,6.3,2.35,660


In [37]:
# head

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
0,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
1,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
2,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
3,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
4,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,...,,,,,,,12.0,7.1,,0


In [52]:
# sample

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
4858,Color,Edward Burns,36.0,98.0,0.0,73.0,Michael McGlone,138.0,10246600.0,Comedy|Drama|Romance,...,36.0,English,USA,R,25000.0,1995.0,111.0,6.6,1.85,265
1249,Color,Diane Keaton,82.0,94.0,0.0,235.0,Celia Weston,374.0,36037909.0,Comedy|Drama,...,130.0,English,Germany,PG-13,60000000.0,2000.0,258.0,4.7,1.85,390
1076,Color,Antoine Fuqua,109.0,122.0,845.0,854.0,Snoop Dogg,18000.0,76261036.0,Crime|Drama|Thriller,...,633.0,English,USA,R,45000000.0,2001.0,881.0,7.7,2.35,0
3274,Color,Scott Cooper,273.0,112.0,108.0,175.0,Beth Grant,12000.0,39462438.0,Drama|Music|Romance,...,226.0,English,USA,R,7000000.0,2009.0,628.0,7.3,2.35,0
2587,Color,Gillian Armstrong,27.0,115.0,44.0,902.0,Kirsten Dunst,23000.0,50003300.0,Drama|Family|Romance,...,132.0,English,USA,PG,15000000.0,1994.0,4000.0,7.3,1.85,0
934,Color,Chris Columbus,65.0,124.0,0.0,701.0,Liam Aiken,8000.0,91030827.0,Comedy|Drama,...,252.0,English,USA,PG-13,50000000.0,1998.0,818.0,6.7,2.35,0
4138,Color,Steve Miner,235.0,91.0,49.0,20.0,Dana Kimmell,72.0,36200000.0,Horror|Thriller,...,372.0,English,USA,R,4000000.0,1982.0,31.0,5.7,2.35,0
4719,Color,Jean-Luc Godard,96.0,110.0,0.0,0.0,Anna Karina,710.0,,Crime|Drama|Romance,...,74.0,French,France,Not Rated,300000.0,1965.0,257.0,7.7,2.35,2000


In [38]:
# shape

(4916, 28)
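
The three inspection tools above can be tried on any DataFrame. A minimal sketch on a hypothetical ten-row table (`toy_df` and its columns are made up for illustration):

```python
import pandas as pd

# Hypothetical toy data: 10 rows, 2 columns.
toy_df = pd.DataFrame({"id": range(10), "score": [x * 0.5 for x in range(10)]})

first_rows = toy_df.head()  # the first 5 rows by default; head(n) returns the first n

# n random rows; random_state makes the draw reproducible.
random_rows = toy_df.sample(n=3, random_state=0)

n_rows, n_cols = toy_df.shape  # shape is an attribute, not a method

print(first_rows)
print(random_rows)
print(n_rows, n_cols)
```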

## Selecting a column

In [54]:
# index operator ([] notation)

# attribute access (dot notation)


0           James Cameron
1          Gore Verbinski
2              Sam Mendes
3       Christopher Nolan
4             Doug Walker
              ...        
4911          Scott Smith
4912                  NaN
4913     Benjamin Roberds
4914          Daniel Hsia
4915             Jon Gunn
Name: director_name, Length: 4916, dtype: object
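
Both notations return the same Series. A small sketch on a three-row stand-in for the movies table (the DataFrame built here is an assumption, not the full dataset):

```python
import pandas as pd

movies_small = pd.DataFrame(
    {
        "director_name": ["James Cameron", "Gore Verbinski", "Sam Mendes"],
        "imdb_score": [7.9, 7.1, 6.8],
    }
)

by_brackets = movies_small["director_name"]  # index operator
by_attribute = movies_small.director_name    # attribute access

print(by_brackets.equals(by_attribute))
```

Dot notation only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method, so the `[]` form is the safer default.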

## Selecting rows and columns

In [45]:
# select the row with index label 0 (.loc)


color                                                                    Color
director_name                                                    James Cameron
num_critic_for_reviews                                                     723
duration                                                                   178
director_facebook_likes                                                      0
actor_3_facebook_likes                                                     855
actor_2_name                                                  Joel David Moore
actor_1_facebook_likes                                                    1000
gross                                                              7.60506e+08
genres                                         Action|Adventure|Fantasy|Sci-Fi
actor_1_name                                                       CCH Pounder
movie_title                                                             Avatar
num_voted_users                                     

In [46]:
# select the row at integer position 3 (.iloc)

color                                                                    Color
director_name                                                Christopher Nolan
num_critic_for_reviews                                                     813
duration                                                                   164
director_facebook_likes                                                  22000
actor_3_facebook_likes                                                   23000
actor_2_name                                                    Christian Bale
actor_1_facebook_likes                                                   27000
gross                                                              4.48131e+08
genres                                                         Action|Thriller
actor_1_name                                                         Tom Hardy
movie_title                                              The Dark Knight Rises
num_voted_users                                     

In [47]:
# select index 0 and director name

movies_df.loc[0, 'director_name']

'James Cameron'

In [48]:
# select index 0 and movie title

movies_df.loc[0, 'movie_title']

'Avatar'

In [49]:
# select all rows of the director_name column

0           James Cameron
1          Gore Verbinski
2              Sam Mendes
3       Christopher Nolan
4             Doug Walker
              ...        
4911          Scott Smith
4912                  NaN
4913     Benjamin Roberds
4914          Daniel Hsia
4915             Jon Gunn
Name: director_name, Length: 4916, dtype: object
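
The difference between label-based (`.loc`) and position-based (`.iloc`) selection is easier to see when the index labels are not `0, 1, 2, ...`. The sketch below uses hypothetical labels 100 to 103 on a four-row excerpt; the DataFrame itself is an assumption for illustration.

```python
import pandas as pd

movies_small = pd.DataFrame(
    {
        "director_name": [
            "James Cameron",
            "Gore Verbinski",
            "Sam Mendes",
            "Christopher Nolan",
        ],
        "duration": [178.0, 169.0, 148.0, 164.0],
    },
    index=[100, 101, 102, 103],  # hypothetical row labels
)

row_by_label = movies_small.loc[100]             # row whose index label is 100
row_by_position = movies_small.iloc[3]           # fourth row, whatever its label
cell = movies_small.loc[100, "director_name"]    # single value: row label, column name
column = movies_small.loc[:, "director_name"]    # all rows, one column

print(row_by_label)
print(row_by_position)
print(cell)
print(column)
```

With the default integer index used in the movies table, label and position coincide, so `.loc[0]` and `.iloc[0]` return the same row.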

### `.value_counts()`

In [56]:
# get a count of all the directors

Steven Spielberg    26
Woody Allen         22
Clint Eastwood      20
Martin Scorsese     20
Ridley Scott        16
                    ..
Benh Zeitlin         1
Erik Canuel          1
Marc F. Adler        1
John H. Lee          1
Raja Menon           1
Name: director_name, Length: 2397, dtype: int64
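
A minimal sketch of how `value_counts()` behaves, using a short hypothetical Series with one missing value (the data here is made up for illustration):

```python
import pandas as pd

directors = pd.Series(
    [
        "Steven Spielberg",
        "Woody Allen",
        "Steven Spielberg",
        None,  # a missing director
        "Woody Allen",
        "Steven Spielberg",
    ],
    name="director_name",
)

# Counts per distinct value, sorted from most to least frequent.
# Missing values are left out by default.
counts = directors.value_counts()

# Pass dropna=False to count missing values as well.
counts_with_missing = directors.value_counts(dropna=False)

print(counts)
print(counts_with_missing)
```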

In [57]:
# size (an attribute, not a method)

4916

In [58]:
# shape

(4916,)

In [59]:
# len()

4916

How many distinct directors are in the dataset?

In [62]:
# length of unique() (NaN counts as a value here, hence one more than the 2397 entries from value_counts())

2398
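
The size-related tools above, side by side on a small hypothetical Series with one missing value. `nunique()` is not used in the cells above; it is included here as the variant that ignores missing values.

```python
import pandas as pd

directors = pd.Series(
    ["James Cameron", "Sam Mendes", None, "James Cameron"],
    name="director_name",
)

n_values = directors.size       # number of elements (attribute)
dims = directors.shape          # 1-tuple, since a Series is 1-dimensional
n_len = len(directors)          # same number via the built-in len()

n_unique_with_missing = len(directors.unique())  # distinct values, missing included
n_unique = directors.nunique()                   # distinct values, missing excluded

print(n_values, dims, n_len)
print(n_unique_with_missing, n_unique)
```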