# 1. Data Structures
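The key data structures in pandas are:

* Series objects (collections of values)
* DataFrames (collections of Series objects)
* Panels (collections of DataFrame objects); note that Panel was deprecated in pandas 0.20 and is no longer available in current releases, so Series and DataFrame are the two structures used in practice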
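
To make the relationship between the first two structures concrete, here is a minimal sketch that builds a DataFrame from two Series and then pulls one column back out. The two rows reuse values shown in this notebook's output; the variable names `films`, `scores`, and `df` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Tiny stand-in for the film data (values taken from the first two rows of the notebook output)
films = pd.Series(["Avengers: Age of Ultron (2015)", "Cinderella (2015)"], name="FILM")
scores = pd.Series([74, 85], name="RottenTomatoes")

# A DataFrame bundles several Series that share one index
df = pd.concat([films, scores], axis=1)

# Selecting a single column hands back a Series
column = df["RottenTomatoes"]
print(type(column).__name__)
print(df.shape)
```

Selecting with a single column label returns a Series, which is why the later cells can treat `fandango['FILM']` as a Series.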

In [2]:
import pandas as pd
fandango = pd.read_csv('fandango_score_comparison2.csv')
fandango.head(2)

Unnamed: 0.1,Unnamed: 0,FILM,RottenTomatoes,RottenTomatoes_User,Metacritic,Metacritic_User,IMDB,Fandango_Stars,Fandango_Ratingvalue,RT_norm,...,IMDB_norm,RT_norm_round,RT_user_norm_round,Metacritic_norm_round,Metacritic_user_norm_round,IMDB_norm_round,Metacritic_user_vote_count,IMDB_user_vote_count,Fandango_votes,Fandango_Difference
0,,Avengers: Age of Ultron (2015),74,86,66,7.1,7.8,5.0,4.5,3.7,...,3.9,3.5,4.5,3.5,3.5,4.0,1330,271107,14846,0.5
1,,Cinderella (2015),85,80,67,7.5,7.1,5.0,4.5,4.25,...,3.55,4.5,4.0,3.5,4.0,3.5,249,65709,12640,0.5


# 2. Integer indexes

In [3]:
series_film = fandango['FILM']
print(series_film[0:5])
series_rt = fandango['RottenTomatoes']
print(series_rt[0:5])

0    Avengers: Age of Ultron (2015)
1                 Cinderella (2015)
2                    Ant-Man (2015)
3            Do You Believe? (2015)
4     Hot Tub Time Machine 2 (2015)
Name: FILM, dtype: object
0    74
1    85
2    80
3    18
4    14
Name: RottenTomatoes, dtype: int64


# 3. Custom indexes
Series objects use NumPy arrays for fast computation, but add valuable features for analyzing data. For example, while NumPy arrays use an integer index, Series objects can use other index types, such as a string index. Series objects also allow for mixed data types, and use NaN (the floating-point "not a number" value) to represent missing values.
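
As a small illustration of these points, the sketch below builds a Series with a string index and one missing value. The two real scores come from this notebook's output; the third entry, `Hypothetical Film (2015)`, is invented purely to show how a gap is stored.

```python
import numpy as np
import pandas as pd

scores = pd.Series(
    [74, 85, np.nan],  # np.nan marks the made-up missing score
    index=[
        "Avengers: Age of Ultron (2015)",
        "Cinderella (2015)",
        "Hypothetical Film (2015)",  # illustrative label, not in the dataset
    ],
)

# NaN is a float, so the whole Series is stored as float64
print(scores.dtype)

# Missing values can be counted with isna()
print(scores.isna().sum())

# Values are looked up by string label
print(scores["Cinderella (2015)"])

# The underlying data is still a NumPy array
print(type(scores.to_numpy()).__name__)
```

Because NaN is a float, integer scores are stored as floats as soon as one value is missing.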

In [5]:
# Import the Series object from pandas
from pandas import Series

film_names = series_film.values
rt_scores = series_rt.values

# Use film name as index, rt_score as value to construct a Series object
series_custom = Series(rt_scores , index=film_names)
print(series_custom)

# Use film name index to access data
movie_scores = series_custom[['Minions (2015)', 'Leviathan (2014)']]
print(movie_scores)

Avengers: Age of Ultron (2015)                     74
Cinderella (2015)                                  85
Ant-Man (2015)                                     80
Do You Believe? (2015)                             18
Hot Tub Time Machine 2 (2015)                      14
The Water Diviner (2015)                           63
Irrational Man (2015)                              42
Top Five (2014)                                    86
Shaun the Sheep Movie (2015)                       99
Love & Mercy (2015)                                89
Far From The Madding Crowd (2015)                  84
Black Sea (2015)                                   82
Leviathan (2014)                                   99
Unbroken (2014)                                    51
The Imitation Game (2014)                          90
Taken 3 (2015)                                      9
Ted 2 (2015)                                       46
Southpaw (2015)                                    59
Night at the Museum: Secret 

# 4. Integer index preservation
Even though we specified that the Series object uses a custom string index, we can still select values by integer position. When it comes to indexes, Series objects act like both dictionaries and lists. We can access values with our custom index (like the keys in a dictionary), or by integer position (like the index in a list).
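
The two access styles can also be written explicitly with `.loc` (label-based) and `.iloc` (position-based), which avoids any ambiguity about which kind of lookup is intended. This is a minimal sketch on a three-film Series; `series_demo` is an illustrative name, and the scores are copied from the output above.

```python
import pandas as pd

series_demo = pd.Series(
    [63, 42, 86],
    index=["The Water Diviner (2015)", "Irrational Man (2015)", "Top Five (2014)"],
)

# Dictionary-style access by label
by_label = series_demo.loc["Irrational Man (2015)"]

# List-style access by integer position
by_position = series_demo.iloc[1]
print(by_label, by_position)

# Positional slices exclude the end position...
first_two = series_demo.iloc[0:2]

# ...while label slices include the end label
label_slice = series_demo.loc["The Water Diviner (2015)":"Irrational Man (2015)"]
print(first_two.equals(label_slice))
```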

In [15]:
five_seven = series_custom[['The Water Diviner (2015)', 'Irrational Man (2015)']]
print(five_seven)

five_seven = series_custom[5:7]
print(five_seven)

The Water Diviner (2015)    63
Irrational Man (2015)       42
dtype: int64
The Water Diviner (2015)    63
Irrational Man (2015)       42
dtype: int64


# 5. Reindexing

In [16]:
original_index = series_custom.index.tolist()
sorted_index = sorted(original_index)
print(sorted_index[0:7])
sorted_by_index = series_custom.reindex(sorted_index)
five_seven = sorted_by_index[5:7]
print(five_seven)

["'71 (2015)", '5 Flights Up (2015)', 'A Little Chaos (2015)', 'A Most Violent Year (2014)', 'About Elly (2015)', 'Aloha (2015)', 'American Sniper (2015)']
Aloha (2015)              19
American Sniper (2015)    72
dtype: int64


# 6. Sorting

In [17]:
sc2 = series_custom.sort_index()
sc3 = series_custom.sort_values()
print(sc2[0:10])
print(sc3[0:10])

'71 (2015)                    97
5 Flights Up (2015)           52
A Little Chaos (2015)         40
A Most Violent Year (2014)    90
About Elly (2015)             97
Aloha (2015)                  19
American Sniper (2015)        72
American Ultra (2015)         46
Amy (2015)                    97
Annie (2014)                  27
dtype: int64
Paul Blart: Mall Cop 2 (2015)     5
Hitman: Agent 47 (2015)           7
Hot Pursuit (2015)                8
Fantastic Four (2015)             9
Taken 3 (2015)                    9
The Boy Next Door (2015)         10
The Loft (2015)                  11
Unfinished Business (2015)       11
Mortdecai (2015)                 12
Seventh Son (2015)               12
dtype: int64


# 7. Transforming columns with vectorized operations

In [18]:
# Divide every value by 20 in one vectorized step,
# rescaling the 0-100 scores to a 0-5 scale
series_normalized = series_custom / 20

# 8. Comparing and filtering

In [21]:
criteria_one = series_custom > 60
criteria_two = series_custom < 70
both_criteria = series_custom[criteria_one & criteria_two]
print(both_criteria)

The Water Diviner (2015)                                                  63
The Man From U.N.C.L.E. (2015)                                            68
Pitch Perfect 2 (2015)                                                    67
Ricki and the Flash (2015)                                                64
The Hobbit: The Battle of the Five Armies (2014)                          61
The Second Best Exotic Marigold Hotel (2015)                              62
The 100-Year-Old Man Who Climbed Out the Window and Disappeared (2015)    67
Magic Mike XXL (2015)                                                     62
dtype: int64


# 9. Alignment
With DataFrame objects, the values link to the index labels and the column labels. pandas preserves these links unless we explicitly break them (by reassigning or editing a column or index label, for example).

This core tenet allows us to use pandas effectively when working with data, and offers a big advantage over using NumPy objects. For Series objects in particular, this means we can use the standard Python arithmetic operators (+, -, *, and /) to add, subtract, multiply, and divide the values at each index label for two different Series objects.
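
Alignment matters most when the two Series do not share the same labels in the same order. The sketch below assumes two small Series, `critics` and `users` (illustrative names), whose values are taken or derived from the scores shown in this notebook. The labels are deliberately ordered differently, and one film appears only in `users`.

```python
import pandas as pd

critics = pd.Series({"Cinderella (2015)": 85, "Ant-Man (2015)": 80})
users = pd.Series(
    {"Ant-Man (2015)": 90, "Cinderella (2015)": 80, "Top Five (2014)": 64}
)

# Values are matched by label, not by position
mean = (critics + users) / 2

print(mean["Ant-Man (2015)"])
print(mean["Cinderella (2015)"])

# A label present in only one Series produces NaN in the result
print(pd.isna(mean["Top Five (2014)"]))
```

With plain NumPy arrays the same addition would pair values by position, so differing order or length would either give wrong results or raise an error.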

In [22]:
rt_critics = Series(fandango['RottenTomatoes'].values, index=fandango['FILM'])
rt_users = Series(fandango['RottenTomatoes_User'].values, index=fandango['FILM'])
rt_mean = (rt_critics + rt_users)/2

print(rt_mean)

FILM
Avengers: Age of Ultron (2015)                    80.0
Cinderella (2015)                                 82.5
Ant-Man (2015)                                    85.0
Do You Believe? (2015)                            51.0
Hot Tub Time Machine 2 (2015)                     21.0
The Water Diviner (2015)                          62.5
Irrational Man (2015)                             47.5
Top Five (2014)                                   75.0
Shaun the Sheep Movie (2015)                      90.5
Love & Mercy (2015)                               88.0
Far From The Madding Crowd (2015)                 80.5
Black Sea (2015)                                  71.0
Leviathan (2014)                                  89.0
Unbroken (2014)                                   60.5
The Imitation Game (2014)                         91.0
Taken 3 (2015)                                    27.5
Ted 2 (2015)                                      52.0
Southpaw (2015)                                   69.5
Night