## Series objects

The three key data structures in pandas are:

- **Series objects** (collections of values)  
- DataFrames (collections of Series objects)  
- Panels (collections of DataFrame objects; deprecated in pandas 0.20 and removed in 0.25)  

Series objects are built on NumPy arrays for fast computation and add features that are useful for analyzing data. A NumPy array is addressed only by integer position, for example, whereas a Series object can use other index types, such as a string index. Series objects can also hold mixed data types (stored with the object dtype), and they use NaN, the floating-point "not a number" value available as np.nan, to represent missing values.  
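
As a minimal sketch of the points above, the snippet below builds a small Series with a string index and one missing value. The film labels and scores are made up for illustration and do not come from the Fandango data set.

```python
import numpy as np
import pandas as pd

# Hypothetical scores; the labels act as a string index.
scores = pd.Series([74.0, np.nan, 80.0],
                   index=["Film A", "Film B", "Film C"])

film_c = scores["Film C"]          # label-based lookup -> 80.0
n_missing = scores.isna().sum()    # one value is NaN -> 1
mean_score = scores.mean()         # NaN is skipped by default -> 77.0

print(film_c, n_missing, mean_score)
```

Reductions such as `mean()` skip NaN by default, which is one reason NaN is a convenient marker for missing data.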
  
A Series object can hold many data types, including:  
  
- float - For float values  
- int - For integer values  
- bool - For Boolean values  
- datetime64[ns] - For date & time, without time zone  
- datetime64[ns, tz] - For date & time, with time zone  
- timedelta64[ns] - For representing differences in dates & times (seconds, minutes, etc.)  
- category - For categorical values  
- object - For string values and other arbitrary Python objects (including mixed types)  
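
To see how these dtypes arise in practice, the following sketch builds a few tiny Series from made-up values and collects the dtype pandas infers for each. The dictionary keys are arbitrary names chosen for this example. The exact datetime and timedelta resolution shown by pandas can depend on the installed version, so the sketch does not assume a particular unit.

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(["2015-01-01", "2015-06-30"]))

examples = {
    "float": pd.Series([1.5, 2.5]),
    "int": pd.Series([1, 2]),
    "bool": pd.Series([True, False]),
    "datetime": dates,
    "timedelta": dates - dates.iloc[0],   # differences between dates
    "category": pd.Series(["low", "high"], dtype="category"),
    "mixed": pd.Series(["text", 1]),      # mixed types fall back to object
}

# Map each example name to the name of its inferred dtype.
dtype_names = {name: str(s.dtype) for name, s in examples.items()}
for name, dtype in dtype_names.items():
    print(f"{name:10s} {dtype}")
```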

In [2]:
import pandas as pd
import numpy as np
fandango = pd.read_csv("data/fandango_score_comparison.csv")
fandango.head(2)

,FILM,RottenTomatoes,RottenTomatoes_User,Metacritic,Metacritic_User,IMDB,Fandango_Stars,Fandango_Ratingvalue,RT_norm,RT_user_norm,...,IMDB_norm,RT_norm_round,RT_user_norm_round,Metacritic_norm_round,Metacritic_user_norm_round,IMDB_norm_round,Metacritic_user_vote_count,IMDB_user_vote_count,Fandango_votes,Fandango_Difference
0,Avengers: Age of Ultron (2015),74,86,66,7.1,7.8,5.0,4.5,3.7,4.3,...,3.9,3.5,4.5,3.5,3.5,4.0,1330,271107,14846,0.5
1,Cinderella (2015),85,80,67,7.5,7.1,5.0,4.5,4.25,4.0,...,3.55,4.5,4.0,3.5,4.0,3.5,249,65709,12640,0.5


In [118]:
# 2: Integer Indexes
series_film = fandango["FILM"]
print(series_film.head())
series_rt = fandango["RottenTomatoes"]
print(series_rt.head())

0    Avengers: Age of Ultron (2015)
1                 Cinderella (2015)
2                    Ant-Man (2015)
3            Do You Believe? (2015)
4     Hot Tub Time Machine 2 (2015)
Name: FILM, dtype: object
0    74
1    85
2    80
3    18
4    14
Name: RottenTomatoes, dtype: int64


In [58]:
# 3: Custom Indexes
# Import the Series object from pandas
from pandas import Series
# When it comes to indexes, Series objects act like both dictionaries and lists. 
film_names = series_film.values
rt_scores = series_rt.values
series_custom = Series(rt_scores, index=film_names)
series_custom[['Minions (2015)', 'Leviathan (2014)']]

Minions (2015)      54
Leviathan (2014)    99
dtype: int64
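
The comment in the cell above says that Series objects behave like both dictionaries and lists. A small sketch of what that means, using made-up labels and scores rather than the Fandango data: `.loc` looks values up by label, `.iloc` by position.

```python
import pandas as pd

ratings = pd.Series([74, 85, 80], index=["Film A", "Film B", "Film C"])

by_label = ratings.loc["Film B"]              # dictionary-style lookup -> 85
by_position = ratings.iloc[1]                 # list-style lookup -> 85
several = ratings.loc[["Film C", "Film A"]]   # a list of labels, in that order
has_film_b = "Film B" in ratings              # membership tests the index

print(by_label, by_position, has_film_b)
print(several)
```

Using `.loc` and `.iloc` explicitly avoids ambiguity about whether a key is meant as a label or as a position.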

In [15]:
# 4: Integer Index Preservation
# A custom index does not remove positional access; .iloc slices by position
fiveten = series_custom.iloc[5:10]
fiveten

The Water Diviner (2015)        63
Irrational Man (2015)           42
Top Five (2014)                 86
Shaun the Sheep Movie (2015)    99
Love & Mercy (2015)             89
dtype: int64

In [60]:
# 5: Reindexing
original_index = series_custom.index.tolist()
sorted_index = sorted(original_index)
sorted_by_index = series_custom.reindex(sorted_index)

print(series_custom[:2],'\n')
print(series_custom.index[:2],'\n')
print(series_custom.index.tolist()[:2],'\n')
print(sorted_index[:2],'\n')
print(sorted_by_index[:2])

Avengers: Age of Ultron (2015)    74
Cinderella (2015)                 85
dtype: int64 

Index(['Avengers: Age of Ultron (2015)', 'Cinderella (2015)'], dtype='object') 

['Avengers: Age of Ultron (2015)', 'Cinderella (2015)'] 

["'71 (2015)", '5 Flights Up (2015)'] 

'71 (2015)             97
5 Flights Up (2015)    52
dtype: int64
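
One detail of `reindex` that the cell above does not exercise is what happens when a requested label is absent. The sketch below, on made-up data, reorders a Series and then asks for a label ("Film Z", a hypothetical name) that does not exist.

```python
import pandas as pd

ratings = pd.Series([74, 85, 80], index=["Film C", "Film A", "Film B"])

# Reorder the values to follow the sorted labels.
reordered = ratings.reindex(sorted(ratings.index))

# Labels that are not in the original index come back as NaN,
# which also forces the result to a float dtype.
with_missing = ratings.reindex(["Film A", "Film Z"])

print(reordered)
print(with_missing)
```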


In [66]:
# 6: Sorting (data alignment)
sc2 = series_custom.sort_index()
sc2.head(2)
sc3 = series_custom.sort_values()
sc3.head(2)

Paul Blart: Mall Cop 2 (2015)    5
Hitman: Agent 47 (2015)          7
dtype: int64
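
The difference between the two sorting methods is easier to see on a tiny example. This sketch uses made-up labels and scores: `sort_index` orders by label, `sort_values` orders by value, and both return a new Series without changing the original.

```python
import pandas as pd

ratings = pd.Series([74, 5, 99], index=["Film C", "Film A", "Film B"])

by_title = ratings.sort_index()                      # ordered by label
by_score = ratings.sort_values()                     # ordered by value, ascending
best_first = ratings.sort_values(ascending=False)    # highest score first

print(by_title.index.tolist())
print(by_score.tolist())
print(best_first.index[0])
```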

In [68]:
# 7: Transforming Columns with Vectorized Operations
series_normalized = series_custom / 20 # normalize to 0-5
series_normalized.head(2)

Avengers: Age of Ultron (2015)    3.70
Cinderella (2015)                 4.25
dtype: float64
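
Vectorized operations apply to every element at once and keep the index intact. The following sketch, on made-up scores, repeats the division from the cell above and adds a second arithmetic example (centering the scores on their mean), which is an illustration and not part of the original notebook.

```python
import pandas as pd

ratings = pd.Series([74, 85, 80], index=["Film A", "Film B", "Film C"])

normalized = ratings / 20               # element-wise division, 0-100 -> 0-5
centered = ratings - ratings.mean()     # subtract the same scalar from every element

print(normalized)
print(centered)
```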

In [77]:
# 8: Comparing and Filtering
criteria_one = series_custom > 97
criteria_two = series_custom <= 100
both_criteria = series_custom[criteria_one & criteria_two]
both_criteria.head(3)

Shaun the Sheep Movie (2015)    99
Leviathan (2014)                99
Selma (2014)                    99
dtype: int64
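
A comparison on a Series returns a Boolean Series with the same index, and that mask can then be used to filter. The sketch below shows this on made-up scores and also combines conditions with `|` (or). Each condition needs its own parentheses when written inline, because `&` and `|` bind more tightly than comparison operators in Python.

```python
import pandas as pd

ratings = pd.Series([5, 54, 99, 100],
                    index=["Film A", "Film B", "Film C", "Film D"])

high = ratings > 97            # Boolean mask, one entry per film
not_perfect = ratings < 100

selected = ratings[high & not_perfect]                    # both conditions
extremes = ratings[(ratings < 10) | (ratings == 100)]     # either condition

print(high.tolist())
print(selected)
print(extremes)
```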

In [81]:
# 9: Alignment
rt_critics = Series(fandango['RottenTomatoes'].values, index=fandango['FILM'])
rt_users = Series(fandango['RottenTomatoes_User'].values, index=fandango['FILM'])
rt_mean = (rt_critics + rt_users) / 2
rt_mean.head(2)

FILM
Avengers: Age of Ultron (2015)    80.0
Cinderella (2015)                 82.5
dtype: float64
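
In the cell above both Series share the same index in the same order, so alignment is invisible. A sketch with made-up data where the labels are ordered differently, and where one label exists on only one side, shows what pandas does: values are matched by label, and unmatched labels yield NaN.

```python
import pandas as pd

critics = pd.Series([74, 85], index=["Film A", "Film B"])
users = pd.Series([80, 86, 60], index=["Film B", "Film A", "Film C"])

# Addition pairs values by label, not by position.
mean_score = (critics + users) / 2

print(mean_score)
```

Here "Film A" averages 74 and 86, "Film B" averages 85 and 80, and "Film C" has no critic score, so its result is NaN.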

## DataFrames

The three key data structures in pandas are:

- Series objects (collections of values)  
- **DataFrames** (collections of Series objects)  
- Panels (collections of DataFrame objects)  

In [86]:
# 1: Shared Indexes
fandango = pd.read_csv("data/fandango_score_comparison.csv")
print(fandango.head(2))
print(fandango.index)

                             FILM  RottenTomatoes  RottenTomatoes_User  \
0  Avengers: Age of Ultron (2015)              74                   86   
1               Cinderella (2015)              85                   80   

   Metacritic  Metacritic_User  IMDB  Fandango_Stars  Fandango_Ratingvalue  \
0          66              7.1   7.8             5.0                   4.5   
1          67              7.5   7.1             5.0                   4.5   

   RT_norm  RT_user_norm         ...           IMDB_norm  RT_norm_round  \
0     3.70           4.3         ...                3.90            3.5   
1     4.25           4.0         ...                3.55            4.5   

   RT_user_norm_round  Metacritic_norm_round  Metacritic_user_norm_round  \
0                 4.5                    3.5                         3.5   
1                 4.0                    3.5                         4.0   

   IMDB_norm_round  Metacritic_user_vote_count  IMDB_user_vote_count  \
0              

In [116]:
# 2: Using Integer Indexes to Select Rows
last_row = fandango.shape[0] - 1 # shape[0] is the number of rows
first_last = fandango.iloc[[0, last_row]]
first_last

,FILM,RottenTomatoes,RottenTomatoes_User,Metacritic,Metacritic_User,IMDB,Fandango_Stars,Fandango_Ratingvalue,RT_norm,RT_user_norm,...,IMDB_norm,RT_norm_round,RT_user_norm_round,Metacritic_norm_round,Metacritic_user_norm_round,IMDB_norm_round,Metacritic_user_vote_count,IMDB_user_vote_count,Fandango_votes,Fandango_Difference
0,Avengers: Age of Ultron (2015),74,86,66,7.1,7.8,5.0,4.5,3.7,4.3,...,3.9,3.5,4.5,3.5,3.5,4.0,1330,271107,14846,0.5
145,"Kumiko, The Treasure Hunter (2015)",87,63,68,6.4,6.7,3.5,3.5,4.35,3.15,...,3.35,4.5,3.0,3.5,3.0,3.5,19,5289,41,0.0
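
Because `.iloc` is purely positional, it also accepts negative positions, which saves computing the last row number by hand. A minimal sketch on a made-up three-row DataFrame (the column names mirror the data set, the values are invented):

```python
import pandas as pd

df = pd.DataFrame({
    "FILM": ["Film A", "Film B", "Film C"],
    "RottenTomatoes": [74, 85, 87],
})

# A list of positions selects several rows; -1 means the last row.
first_last = df.iloc[[0, -1]]

print(first_last)
```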


In [127]:
# 3: Using Custom Indexes
fandango_films = fandango.set_index('FILM', drop=False)
fandango_films.index

Index(['Avengers: Age of Ultron (2015)', 'Cinderella (2015)', 'Ant-Man (2015)',
       'Do You Believe? (2015)', 'Hot Tub Time Machine 2 (2015)',
       'The Water Diviner (2015)', 'Irrational Man (2015)', 'Top Five (2014)',
       'Shaun the Sheep Movie (2015)', 'Love & Mercy (2015)',
       ...
       'The Woman In Black 2 Angel of Death (2015)', 'Danny Collins (2015)',
       'Spare Parts (2015)', 'Serena (2015)', 'Inside Out (2015)',
       'Mr. Holmes (2015)', "'71 (2015)", 'Two Days, One Night (2014)',
       'Gett: The Trial of Viviane Amsalem (2015)',
       'Kumiko, The Treasure Hunter (2015)'],
      dtype='object', name='FILM', length=146)

In [137]:
# 4: Using a Custom Index for Selection
movies = ["The Lazarus Effect (2015)", "Gett: The Trial of Viviane Amsalem (2015)", "Mr. Holmes (2015)"]
best_movies_ever = fandango_films.loc[movies]
best_movies_ever

FILM,FILM,RottenTomatoes,RottenTomatoes_User,Metacritic,Metacritic_User,IMDB,Fandango_Stars,Fandango_Ratingvalue,RT_norm,RT_user_norm,...,IMDB_norm,RT_norm_round,RT_user_norm_round,Metacritic_norm_round,Metacritic_user_norm_round,IMDB_norm_round,Metacritic_user_vote_count,IMDB_user_vote_count,Fandango_votes,Fandango_Difference
The Lazarus Effect (2015),The Lazarus Effect (2015),14,23,31,4.9,5.2,3.0,3.0,0.7,1.15,...,2.6,0.5,1.0,1.5,2.5,2.5,62,17691,1651,0.0
Gett: The Trial of Viviane Amsalem (2015),Gett: The Trial of Viviane Amsalem (2015),100,81,90,7.3,7.8,3.5,3.5,5.0,4.05,...,3.9,5.0,4.0,4.5,3.5,4.0,19,1955,59,0.0
Mr. Holmes (2015),Mr. Holmes (2015),87,78,67,7.9,7.4,4.0,4.0,4.35,3.9,...,3.7,4.5,4.0,3.5,4.0,3.5,33,7367,1348,0.0
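
The two steps above (setting a custom index, then selecting with `.loc`) can be sketched end to end on a small made-up DataFrame. The column names follow the data set, while the film labels and scores are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "FILM": ["Film A", "Film B", "Film C"],
    "RottenTomatoes": [74, 85, 87],
})

# drop=False keeps FILM as a regular column as well as the index.
films = df.set_index("FILM", drop=False)

picked = films.loc[["Film C", "Film A"]]        # rows come back in the requested order
one_score = films.loc["Film B", "RottenTomatoes"]  # row label plus column label

print(picked)
print(one_score)
```

Passing a label that is not in the index to `.loc` raises a `KeyError`, so it can be worth checking membership first when the labels come from user input.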


In [145]:
# 5: Apply() Logic Over the Columns in a DataFrame

# dtypes returns the data type of each column as a Series
types = fandango_films.dtypes
# keep only the float columns; the index of the filtered Series holds the column names
float_columns = types[types.values == 'float64'].index
# use bracket notation to filter columns to just float columns
float_df = fandango_films[float_columns]

# `x` is a Series object representing a column
deviations = float_df.apply(lambda x: np.std(x))
print(deviations)

Metacritic_User               1.505529
IMDB                          0.955447
Fandango_Stars                0.538532
Fandango_Ratingvalue          0.501106
RT_norm                       1.503265
RT_user_norm                  0.997787
Metacritic_norm               0.972522
Metacritic_user_nom           0.752765
IMDB_norm                     0.477723
RT_norm_round                 1.509404
RT_user_norm_round            1.003559
Metacritic_norm_round         0.987561
Metacritic_user_norm_round    0.785412
IMDB_norm_round               0.501043
Fandango_Difference           0.152141
dtype: float64
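
Two details of the cell above are worth making explicit. `np.std` computes the population standard deviation (`ddof=0`), whereas the pandas `std` method defaults to the sample version (`ddof=1`), so the two give different numbers unless `ddof` is set. The sketch below, on a small made-up DataFrame, shows this and also uses `select_dtypes` as an alternative way to pick the float columns. It is an illustration, not a replacement for the notebook's approach.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "FILM": ["Film A", "Film B", "Film C"],
    "votes": [10, 20, 30],
    "IMDB": [7.8, 7.1, 6.7],
    "stars": [5.0, 5.0, 3.5],
})

# Keep only the float64 columns.
float_df = df.select_dtypes(include="float64")

# apply() calls the function once per column; `col` is a Series.
population = float_df.apply(lambda col: col.std(ddof=0))   # matches np.std
sample = float_df.std()                                    # pandas default, ddof=1

print(population)
print(sample)
```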


In [149]:
# 6: Apply Logic Over Columns: Practice
double_df = float_df.apply(lambda x: x*2)
print(double_df.head(1))
halved_df = float_df.apply(lambda x: x/2)
print(halved_df.head(1))

                                Metacritic_User  IMDB  Fandango_Stars  \
FILM                                                                    
Avengers: Age of Ultron (2015)             14.2  15.6            10.0   

                                Fandango_Ratingvalue  RT_norm  RT_user_norm  \
FILM                                                                          
Avengers: Age of Ultron (2015)                   9.0      7.4           8.6   

                                Metacritic_norm  Metacritic_user_nom  \
FILM                                                                   
Avengers: Age of Ultron (2015)              6.6                  7.1   

                                IMDB_norm  RT_norm_round  RT_user_norm_round  \
FILM                                                                           
Avengers: Age of Ultron (2015)        7.8            7.0                 9.0   

                                Metacritic_norm_round  \
FILM                       

In [156]:
# 7: Apply() Over DataFrame Rows
rt_mt_user = float_df[['RT_user_norm', 'Metacritic_user_nom']]
# axis=1 applies the function to each row, so `x` holds one film's two scores
rt_mt_deviations = rt_mt_user.apply(lambda x: np.std(x), axis=1)
print(rt_mt_deviations[0:5])
rt_mt_means = rt_mt_user.apply(lambda x: np.mean(x), axis=1)
rt_mt_means

FILM
Avengers: Age of Ultron (2015)    0.375
Cinderella (2015)                 0.125
Ant-Man (2015)                    0.225
Do You Believe? (2015)            0.925
Hot Tub Time Machine 2 (2015)     0.150
dtype: float64


FILM
Avengers: Age of Ultron (2015)                    3.925
Cinderella (2015)                                 3.875
Ant-Man (2015)                                    4.275
Do You Believe? (2015)                            3.275
Hot Tub Time Machine 2 (2015)                     1.550
The Water Diviner (2015)                          3.250
Irrational Man (2015)                             3.225
Top Five (2014)                                   3.300
Shaun the Sheep Movie (2015)                      4.250
Love & Mercy (2015)                               4.300
Far From The Madding Crowd (2015)                 3.800
Black Sea (2015)                                  3.150
Leviathan (2014)                                  3.775
Unbroken (2014)                                   3.375
The Imitation Game (2014)                         4.350
Taken 3 (2015)                                    2.300
Ted 2 (2015)                                      3.075
Southpaw (2015)                            
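
For simple row-wise statistics, `apply(..., axis=1)` and the built-in reductions give the same result. The sketch below rebuilds two rows by hand (the two column names come from the notebook, and the four values are inferred from the outputs shown earlier, so treat them as illustrative) and compares both routes.

```python
import numpy as np
import pandas as pd

rt_mt_user = pd.DataFrame(
    {"RT_user_norm": [4.3, 4.0], "Metacritic_user_nom": [3.55, 3.75]},
    index=["Avengers: Age of Ultron (2015)", "Cinderella (2015)"],
)

# axis=1 passes each row to the function as a Series.
means_apply = rt_mt_user.apply(lambda row: row.mean(), axis=1)

# The built-in reduction computes the same row means without a Python-level loop.
means_direct = rt_mt_user.mean(axis=1)

print(means_apply)
```

The built-in form is usually the faster choice on large frames, while `apply` remains useful when the per-row logic has no vectorized equivalent.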