# Python: Pandas series

**Goal**: Structure and sort data with pandas series!

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Introduction" data-toc-modified-id="Introduction-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Introduction</a></span></li><li><span><a href="#Indexing-using-integers" data-toc-modified-id="Indexing-using-integers-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Indexing using integers</a></span></li><li><span><a href="#Customize-indexing" data-toc-modified-id="Customize-indexing-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Customize indexing</a></span></li><li><span><a href="#Reindex-series-object" data-toc-modified-id="Reindex-series-object-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Reindex series object</a></span></li><li><span><a href="#Sorting-series-object" data-toc-modified-id="Sorting-series-object-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Sorting series object</a></span></li><li><span><a href="#Column-transformation" data-toc-modified-id="Column-transformation-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Column transformation</a></span></li><li><span><a href="#Compare-&amp;-Filter" data-toc-modified-id="Compare-&amp;-Filter-7"><span class="toc-item-num">7&nbsp;&nbsp;</span>Compare &amp; Filter</a></span></li><li><span><a href="#Data-alignment" data-toc-modified-id="Data-alignment-8"><span class="toc-item-num">8&nbsp;&nbsp;</span>Data alignment</a></span></li></ul></div>

## Introduction

In this chapter, we take a closer look at the ``Series`` objects of pandas. The main pandas objects are the ``Series``, a one-dimensional sequence of labeled values, and the ``DataFrame``, a table whose columns are Series objects. Older pandas versions also provided ``Panel``, a set of DataFrame objects, but it has been removed from recent pandas versions. As a reminder, Series objects are backed by NumPy arrays by default, which keeps computations on them fast.
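To make the relationship between these objects concrete, here is a minimal sketch. The toy values and the names `scores`, `frame`, and `column` are made up for illustration and do not come from the Fandango file.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from the Fandango file.
scores = pd.Series([74, 85, 80], name="score")

# The values of a Series can be retrieved as a NumPy array.
values = scores.to_numpy()
print(type(values))

# Selecting a single column of a DataFrame returns a Series.
frame = pd.DataFrame({"film": ["A", "B", "C"], "score": [74, 85, 80]})
column = frame["score"]
print(type(column))
```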

In [1]:
import pandas as pd

In [2]:
fandango = pd.read_csv("fandango_score_comparison.csv")
fandango.head()

Unnamed: 0,FILM,RottenTomatoes,RottenTomatoes_User,Metacritic,Metacritic_User,IMDB,Fandango_Stars,Fandango_Ratingvalue,RT_norm,RT_user_norm,...,IMDB_norm,RT_norm_round,RT_user_norm_round,Metacritic_norm_round,Metacritic_user_norm_round,IMDB_norm_round,Metacritic_user_vote_count,IMDB_user_vote_count,Fandango_votes,Fandango_Difference
0,Avengers: Age of Ultron (2015),74,86,66,7.1,7.8,5.0,4.5,3.7,4.3,...,3.9,3.5,4.5,3.5,3.5,4.0,1330,271107,14846,0.5
1,Cinderella (2015),85,80,67,7.5,7.1,5.0,4.5,4.25,4.0,...,3.55,4.5,4.0,3.5,4.0,3.5,249,65709,12640,0.5
2,Ant-Man (2015),80,90,64,8.1,7.8,5.0,4.5,4.0,4.5,...,3.9,4.0,4.5,3.0,4.0,4.0,627,103660,12055,0.5
3,Do You Believe? (2015),18,84,22,4.7,5.4,5.0,4.5,0.9,4.2,...,2.7,1.0,4.0,1.0,2.5,2.5,31,3136,1793,0.5
4,Hot Tub Time Machine 2 (2015),14,28,29,3.4,5.1,3.5,3.0,0.7,1.4,...,2.55,0.5,1.5,1.5,1.5,2.5,88,19560,1021,0.5


## Indexing using integers

In [3]:
series_film = fandango["FILM"]
series_film[0:5]

0    Avengers: Age of Ultron (2015)
1                 Cinderella (2015)
2                    Ant-Man (2015)
3            Do You Believe? (2015)
4     Hot Tub Time Machine 2 (2015)
Name: FILM, dtype: object

In [4]:
series_rt = fandango["RottenTomatoes"]
series_rt[0:5]

0    74
1    85
2    80
3    18
4    14
Name: RottenTomatoes, dtype: int64

## Customize indexing

In [5]:
series_film[5], series_rt[5]

('The Water Diviner (2015)', 63)

In [6]:
series_custom = pd.Series(data=series_rt.values, index=series_film.values)
series_custom

Avengers: Age of Ultron (2015)                74
Cinderella (2015)                             85
Ant-Man (2015)                                80
Do You Believe? (2015)                        18
Hot Tub Time Machine 2 (2015)                 14
                                            ... 
Mr. Holmes (2015)                             87
'71 (2015)                                    97
Two Days, One Night (2014)                    97
Gett: The Trial of Viviane Amsalem (2015)    100
Kumiko, The Treasure Hunter (2015)            87
Length: 146, dtype: int64

In [7]:
series_custom[["Cinderella (2015)", "Ant-Man (2015)"]]

Cinderella (2015)    85
Ant-Man (2015)       80
dtype: int64

In [8]:
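# Select by position explicitly: plain [] with integers on a string index is deprecated in recent pandas versions.
series_custom.iloc[[1, 2]]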

Cinderella (2015)    85
Ant-Man (2015)       80
dtype: int64
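Selection by label and selection by position can be made explicit with `.loc` and `.iloc`. The sketch below uses a small hypothetical Series (`ratings`, with invented film names) rather than the Fandango data.

```python
import pandas as pd

# Hypothetical toy Series with string labels as index.
ratings = pd.Series([74, 85, 80], index=["Film A", "Film B", "Film C"])

# .loc selects by index label.
by_label = ratings.loc[["Film B", "Film C"]]

# .iloc selects by integer position (0-based).
by_position = ratings.iloc[[1, 2]]

# Both selections return the same rows here.
print(by_label.equals(by_position))
```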

## Reindex series object

In [9]:
original_index = series_custom.index
original_index

Index(['Avengers: Age of Ultron (2015)', 'Cinderella (2015)', 'Ant-Man (2015)',
       'Do You Believe? (2015)', 'Hot Tub Time Machine 2 (2015)',
       'The Water Diviner (2015)', 'Irrational Man (2015)', 'Top Five (2014)',
       'Shaun the Sheep Movie (2015)', 'Love & Mercy (2015)',
       ...
       'The Woman In Black 2 Angel of Death (2015)', 'Danny Collins (2015)',
       'Spare Parts (2015)', 'Serena (2015)', 'Inside Out (2015)',
       'Mr. Holmes (2015)', ''71 (2015)', 'Two Days, One Night (2014)',
       'Gett: The Trial of Viviane Amsalem (2015)',
       'Kumiko, The Treasure Hunter (2015)'],
      dtype='object', length=146)

In [10]:
sorted_original_index = sorted(original_index)
print(sorted_original_index)

["'71 (2015)", '5 Flights Up (2015)', 'A Little Chaos (2015)', 'A Most Violent Year (2014)', 'About Elly (2015)', 'Aloha (2015)', 'American Sniper (2015)', 'American Ultra (2015)', 'Amy (2015)', 'Annie (2014)', 'Ant-Man (2015)', 'Avengers: Age of Ultron (2015)', 'Big Eyes (2014)', 'Birdman (2014)', 'Black Sea (2015)', 'Black or White (2015)', 'Blackhat (2015)', 'Cake (2015)', 'Chappie (2015)', 'Child 44 (2015)', 'Cinderella (2015)', 'Clouds of Sils Maria (2015)', 'Danny Collins (2015)', 'Dark Places (2015)', 'Do You Believe? (2015)', 'Dope (2015)', 'Entourage (2015)', 'Escobar: Paradise Lost (2015)', 'Ex Machina (2015)', 'Fantastic Four (2015)', 'Far From The Madding Crowd (2015)', 'Fifty Shades of Grey (2015)', 'Focus (2015)', 'Furious 7 (2015)', 'Get Hard (2015)', 'Gett: The Trial of Viviane Amsalem (2015)', 'Hitman: Agent 47 (2015)', 'Home (2015)', 'Hot Pursuit (2015)', 'Hot Tub Time Machine 2 (2015)', "I'll See You In My Dreams (2015)", 'Infinitely Polar Bear (2015)', 'Inherent Vic

In [11]:
reindexed_series_custom = series_custom.reindex(sorted_original_index)
reindexed_series_custom

'71 (2015)                          97
5 Flights Up (2015)                 52
A Little Chaos (2015)               40
A Most Violent Year (2014)          90
About Elly (2015)                   97
                                    ..
What We Do in the Shadows (2015)    96
When Marnie Was There (2015)        89
While We're Young (2015)            83
Wild Tales (2014)                   96
Woman in Gold (2015)                52
Length: 146, dtype: int64
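Reindexing does more than reorder: a label that is absent from the original index is kept in the result with a missing value. The following sketch illustrates this on hypothetical toy data (`ratings`, `new_order`, and the film names are invented for the example).

```python
import pandas as pd

# Hypothetical toy Series in arbitrary order.
ratings = pd.Series([74, 85, 80], index=["Film C", "Film A", "Film B"])

# "Film D" does not exist in the original index.
new_order = ["Film A", "Film B", "Film D"]
reindexed = ratings.reindex(new_order)

# Existing labels keep their values; the unknown label gets NaN,
# which also turns the integer values into floats.
print(reindexed)
```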

## Sorting series object

In [12]:
series_custom_by_index = series_custom.sort_index()
series_custom_by_index

'71 (2015)                          97
5 Flights Up (2015)                 52
A Little Chaos (2015)               40
A Most Violent Year (2014)          90
About Elly (2015)                   97
                                    ..
What We Do in the Shadows (2015)    96
When Marnie Was There (2015)        89
While We're Young (2015)            83
Wild Tales (2014)                   96
Woman in Gold (2015)                52
Length: 146, dtype: int64

In [13]:
series_custom_by_values = series_custom.sort_values()
series_custom_by_values

Paul Blart: Mall Cop 2 (2015)                  5
Hitman: Agent 47 (2015)                        7
Hot Pursuit (2015)                             8
Fantastic Four (2015)                          9
Taken 3 (2015)                                 9
                                            ... 
Song of the Sea (2014)                        99
Phoenix (2015)                                99
Selma (2014)                                  99
Seymour: An Introduction (2015)              100
Gett: The Trial of Viviane Amsalem (2015)    100
Length: 146, dtype: int64
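Both sorting methods sort in ascending order by default. As a small sketch on hypothetical toy data, the `ascending=False` argument reverses the order:

```python
import pandas as pd

# Hypothetical toy Series, not taken from the Fandango file.
ratings = pd.Series([74, 5, 100], index=["Film B", "Film C", "Film A"])

# Highest rating first.
descending_values = ratings.sort_values(ascending=False)

# Index labels in reverse alphabetical order.
descending_index = ratings.sort_index(ascending=False)

print(descending_values)
print(descending_index)
```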

## Column transformation

In [14]:
import numpy as np

In [15]:
series_custom / 10

Avengers: Age of Ultron (2015)                7.4
Cinderella (2015)                             8.5
Ant-Man (2015)                                8.0
Do You Believe? (2015)                        1.8
Hot Tub Time Machine 2 (2015)                 1.4
                                             ... 
Mr. Holmes (2015)                             8.7
'71 (2015)                                    9.7
Two Days, One Night (2014)                    9.7
Gett: The Trial of Viviane Amsalem (2015)    10.0
Kumiko, The Treasure Hunter (2015)            8.7
Length: 146, dtype: float64

In [16]:
np.add(series_custom, series_custom)

Avengers: Age of Ultron (2015)               148
Cinderella (2015)                            170
Ant-Man (2015)                               160
Do You Believe? (2015)                        36
Hot Tub Time Machine 2 (2015)                 28
                                            ... 
Mr. Holmes (2015)                            174
'71 (2015)                                   194
Two Days, One Night (2014)                   194
Gett: The Trial of Viviane Amsalem (2015)    200
Kumiko, The Treasure Hunter (2015)           174
Length: 146, dtype: int64

In [17]:
np.sin(series_custom)

Avengers: Age of Ultron (2015)              -0.985146
Cinderella (2015)                           -0.176076
Ant-Man (2015)                              -0.993889
Do You Believe? (2015)                      -0.750987
Hot Tub Time Machine 2 (2015)                0.990607
                                               ...   
Mr. Holmes (2015)                           -0.821818
'71 (2015)                                   0.379608
Two Days, One Night (2014)                   0.379608
Gett: The Trial of Viviane Amsalem (2015)   -0.506366
Kumiko, The Treasure Hunter (2015)          -0.821818
Length: 146, dtype: float64

In [18]:
np.max(series_custom)

100

In [19]:
series_normalized = (series_custom / 20)
series_normalized

Avengers: Age of Ultron (2015)               3.70
Cinderella (2015)                            4.25
Ant-Man (2015)                               4.00
Do You Believe? (2015)                       0.90
Hot Tub Time Machine 2 (2015)                0.70
                                             ... 
Mr. Holmes (2015)                            4.35
'71 (2015)                                   4.85
Two Days, One Night (2014)                   4.85
Gett: The Trial of Viviane Amsalem (2015)    5.00
Kumiko, The Treasure Hunter (2015)           4.35
Length: 146, dtype: float64

## Compare & Filter

In [20]:
series_custom > 50

Avengers: Age of Ultron (2015)                True
Cinderella (2015)                             True
Ant-Man (2015)                                True
Do You Believe? (2015)                       False
Hot Tub Time Machine 2 (2015)                False
                                             ...  
Mr. Holmes (2015)                             True
'71 (2015)                                    True
Two Days, One Night (2014)                    True
Gett: The Trial of Viviane Amsalem (2015)     True
Kumiko, The Treasure Hunter (2015)            True
Length: 146, dtype: bool

In [21]:
series_custom_50 = series_custom[series_custom > 50]
series_custom_50

Avengers: Age of Ultron (2015)                74
Cinderella (2015)                             85
Ant-Man (2015)                                80
The Water Diviner (2015)                      63
Top Five (2014)                               86
                                            ... 
Mr. Holmes (2015)                             87
'71 (2015)                                    97
Two Days, One Night (2014)                    97
Gett: The Trial of Viviane Amsalem (2015)    100
Kumiko, The Treasure Hunter (2015)            87
Length: 94, dtype: int64

In [22]:
series_custom_50_75 = series_custom[(series_custom > 50) & (series_custom < 75)]
series_custom_50_75

Avengers: Age of Ultron (2015)                                            74
The Water Diviner (2015)                                                  63
Unbroken (2014)                                                           51
Southpaw (2015)                                                           59
Insidious: Chapter 3 (2015)                                               59
The Man From U.N.C.L.E. (2015)                                            68
Run All Night (2015)                                                      60
5 Flights Up (2015)                                                       52
Welcome to Me (2015)                                                      71
Saint Laurent (2015)                                                      51
Maps to the Stars (2015)                                                  60
Pitch Perfect 2 (2015)                                                    67
The Age of Adaline (2015)                                                 54
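The filters above rely on boolean masks: a comparison returns a Series of `True`/`False` values, and masks can be combined with `&` (and) or negated with `~` (not). A minimal sketch on hypothetical toy data, with invented names `above_50`, `below_75`, `middle`, and `outside`:

```python
import pandas as pd

# Hypothetical toy Series, not taken from the Fandango file.
ratings = pd.Series([14, 63, 74, 85],
                    index=["Film A", "Film B", "Film C", "Film D"])

# Each comparison produces a boolean Series with the same index.
above_50 = ratings > 50
below_75 = ratings < 75

# Keep the rows where both conditions hold.
middle = ratings[above_50 & below_75]

# Keep the rows where the combined condition does not hold.
outside = ratings[~(above_50 & below_75)]

print(middle)
print(outside)
```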

## Data alignment

In this section, we work through the following steps:

* create the Series rt_critics, which holds the values of the RottenTomatoes column (critics' ratings) indexed by movie name (FILM column)
* create the Series rt_users, which holds the values of the RottenTomatoes_User column, also indexed by movie name (FILM column)
* rt_critics and rt_users are thus two Series objects containing the critics' and users' average ratings for each movie
* since both Series share the same index (the movie names), use arithmetic operations to compute the mean of the critics' and users' ratings for each movie
* assign the resulting Series to the variable rt_mean
* display the result

In [23]:
rt_critics = pd.Series(fandango['RottenTomatoes'].values, index=fandango['FILM'])
rt_critics

FILM
Avengers: Age of Ultron (2015)                74
Cinderella (2015)                             85
Ant-Man (2015)                                80
Do You Believe? (2015)                        18
Hot Tub Time Machine 2 (2015)                 14
                                            ... 
Mr. Holmes (2015)                             87
'71 (2015)                                    97
Two Days, One Night (2014)                    97
Gett: The Trial of Viviane Amsalem (2015)    100
Kumiko, The Treasure Hunter (2015)            87
Length: 146, dtype: int64

In [24]:
rt_users = pd.Series(fandango['RottenTomatoes_User'].values, index=fandango['FILM'])  
rt_users

FILM
Avengers: Age of Ultron (2015)               86
Cinderella (2015)                            80
Ant-Man (2015)                               90
Do You Believe? (2015)                       84
Hot Tub Time Machine 2 (2015)                28
                                             ..
Mr. Holmes (2015)                            78
'71 (2015)                                   82
Two Days, One Night (2014)                   78
Gett: The Trial of Viviane Amsalem (2015)    81
Kumiko, The Treasure Hunter (2015)           63
Length: 146, dtype: int64

In [25]:
rt_mean = (rt_critics + rt_users) / 2 
rt_mean

FILM
Avengers: Age of Ultron (2015)               80.0
Cinderella (2015)                            82.5
Ant-Man (2015)                               85.0
Do You Believe? (2015)                       51.0
Hot Tub Time Machine 2 (2015)                21.0
                                             ... 
Mr. Holmes (2015)                            82.5
'71 (2015)                                   89.5
Two Days, One Night (2014)                   87.5
Gett: The Trial of Viviane Amsalem (2015)    90.5
Kumiko, The Treasure Hunter (2015)           75.0
Length: 146, dtype: float64
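The addition above works because pandas aligns the two Series on their index labels, not on their positions. The sketch below illustrates this on hypothetical toy data (`critics`, `users`, `other`, and the film names are invented) where the labels appear in a different order, and where a label exists in only one of the Series.

```python
import pandas as pd

# Hypothetical toy Series with the same labels in a different order.
critics = pd.Series([74, 85], index=["Film A", "Film B"])
users = pd.Series([80, 86], index=["Film B", "Film A"])

# Values are matched by label before the arithmetic is applied:
# Film A -> (74 + 86) / 2, Film B -> (85 + 80) / 2
mean = (critics + users) / 2
print(mean)

# A label present in only one Series yields NaN in the result.
other = pd.Series([90], index=["Film C"])
partial = critics + other
print(partial)
```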