# Pandas Internals: DataFrame

--------

- Read `fandango_score_comparison.csv` into a dataframe named `fandango`.
- Use the `head` method to return the first two rows in the dataframe, then display them with the `print` function.
- Use the `index` attribute to return the index of the dataframe, and display it with the `print` function.

In [1]:
import pandas as pd

fandango = pd.read_csv('fandango_score_comparison.csv')
print(fandango.head(2))

print("Index: ")
print(fandango.index)

                             FILM  RottenTomatoes  RottenTomatoes_User  \
0  Avengers: Age of Ultron (2015)              74                   86   
1               Cinderella (2015)              85                   80   

   Metacritic  Metacritic_User  IMDB  Fandango_Stars  Fandango_Ratingvalue  \
0          66              7.1   7.8             5.0                   4.5   
1          67              7.5   7.1             5.0                   4.5   

   RT_norm  RT_user_norm         ...           IMDB_norm  RT_norm_round  \
0     3.70           4.3         ...                3.90            3.5   
1     4.25           4.0         ...                3.55            4.5   

   RT_user_norm_round  Metacritic_norm_round  Metacritic_user_norm_round  \
0                 4.5                    3.5                         3.5   
1                 4.0                    3.5                         4.0   

   IMDB_norm_round  Metacritic_user_vote_count  IMDB_user_vote_count  \
0              
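
The steps above can be tried without the CSV file. This is a minimal sketch that assumes a tiny, made-up CSV (the `Film A` rows and the `score` column are hypothetical, not taken from `fandango_score_comparison.csv`) and reads it from memory with `io.StringIO`:

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file on disk
csv_text = """FILM,score
Film A,74
Film B,85
Film C,80
"""

toy = pd.read_csv(io.StringIO(csv_text))

# head(2) returns a new dataframe holding the first two rows
first_two = toy.head(2)
print(first_two)

# With no index column specified, pandas builds a default integer index
print(toy.index)
```

The default index is a `RangeIndex` of integer positions, which is why the rows printed above are labeled `0` and `1`.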

Return a dataframe containing just the first and last rows, and assign it to `first_last`.

In [3]:
first_last = fandango.iloc[[0, -1]]
print(first_last)

                                   FILM  RottenTomatoes  RottenTomatoes_User  \
0        Avengers: Age of Ultron (2015)              74                   86   
145  Kumiko, The Treasure Hunter (2015)              87                   63   

     Metacritic  Metacritic_User  IMDB  Fandango_Stars  Fandango_Ratingvalue  \
0            66              7.1   7.8             5.0                   4.5   
145          68              6.4   6.7             3.5                   3.5   

     RT_norm  RT_user_norm         ...           IMDB_norm  RT_norm_round  \
0       3.70          4.30         ...                3.90            3.5   
145     4.35          3.15         ...                3.35            4.5   

     RT_user_norm_round  Metacritic_norm_round  Metacritic_user_norm_round  \
0                   4.5                    3.5                         3.5   
145                 3.0                    3.5                         3.0   

     IMDB_norm_round  Metacritic_user_vote_count  I
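
The positional selection used above can be sketched on a small frame. The data here is made up for illustration (`toy`, `Film A` and `score` are hypothetical names):

```python
import pandas as pd

# Hypothetical three-row dataframe
toy = pd.DataFrame({
    "FILM": ["Film A", "Film B", "Film C"],
    "score": [74, 85, 80],
})

# A list of positions returns a dataframe; -1 counts from the end
first_last = toy.iloc[[0, -1]]
print(first_last)

# A single position (no list) returns that row as a Series instead
first_row = toy.iloc[0]
print(type(first_row))
```

Passing a list to `iloc` is what keeps the result a dataframe, and the original index labels travel with the selected rows.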

Use the pandas dataframe method `set_index` to assign the `FILM` column as the custom index for the dataframe. 

Also, specify that we don't want to drop the `FILM` column from the dataframe. We want to keep the original dataframe, so assign the new one to `fandango_films`.

In [5]:
fandango_films = fandango.set_index('FILM', inplace=False, drop=False)
print(fandango_films)

                                                                                          FILM  \
FILM                                                                                             
Avengers: Age of Ultron (2015)                                  Avengers: Age of Ultron (2015)   
Cinderella (2015)                                                            Cinderella (2015)   
Ant-Man (2015)                                                                  Ant-Man (2015)   
Do You Believe? (2015)                                                  Do You Believe? (2015)   
Hot Tub Time Machine 2 (2015)                                    Hot Tub Time Machine 2 (2015)   
The Water Diviner (2015)                                              The Water Diviner (2015)   
Irrational Man (2015)                                                    Irrational Man (2015)   
Top Five (2014)                                                                Top Five (2014)   
Shaun the Sheep Movi
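
To see what `drop=False` changes, here is a minimal sketch on made-up data (`toy` and its values are hypothetical):

```python
import pandas as pd

# Hypothetical three-row dataframe
toy = pd.DataFrame({
    "FILM": ["Film A", "Film B", "Film C"],
    "score": [74, 85, 80],
})

# drop=False keeps FILM as a regular column as well as the index
kept = toy.set_index("FILM", drop=False)
print(kept.columns.tolist())

# The default (drop=True) moves FILM out of the columns
dropped = toy.set_index("FILM")
print(dropped.columns.tolist())

# set_index returns a new dataframe by default; toy keeps its integer index
print(toy.index)
```

This is why `FILM` appears twice in the printed output above: once as the index and once as a column.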

Select the following movies from `fandango_films` (in the order in which they appear), and assign them to `best_movies_ever`:

- "The Lazarus Effect (2015)"
- "Gett: The Trial of Viviane Amsalem (2015)"
- "Mr. Holmes (2015)"

In [6]:
best_movies_ever_names = ['The Lazarus Effect (2015)', 'Gett: The Trial of Viviane Amsalem (2015)', 'Mr. Holmes (2015)']

best_movies_ever = fandango_films.loc[best_movies_ever_names]
print(best_movies_ever)

                                                                                FILM  \
FILM                                                                                   
The Lazarus Effect (2015)                                  The Lazarus Effect (2015)   
Gett: The Trial of Viviane Amsalem (2015)  Gett: The Trial of Viviane Amsalem (2015)   
Mr. Holmes (2015)                                                  Mr. Holmes (2015)   

                                           RottenTomatoes  \
FILM                                                        
The Lazarus Effect (2015)                              14   
Gett: The Trial of Viviane Amsalem (2015)             100   
Mr. Holmes (2015)                                      87   

                                           RottenTomatoes_User  Metacritic  \
FILM                                                                         
The Lazarus Effect (2015)                                   23          31   
Gett: The Trial of 
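
Label-based selection with a list can be sketched the same way. The frame below is made up (`toy_films` and its labels are hypothetical); it suggests that rows come back in the order of the list passed to `loc`, not in the order they are stored:

```python
import pandas as pd

# Hypothetical dataframe indexed by film name
toy_films = pd.DataFrame(
    {"score": [74, 85, 80]},
    index=pd.Index(["Film A", "Film B", "Film C"], name="FILM"),
)

# Rows are returned in the order the labels are listed
picked = toy_films.loc[["Film C", "Film A"]]
print(picked)
```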

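Select the `float64` columns of `fandango_films`, then use the `apply()` method to calculate the standard deviation of each one.

In [7]: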
import numpy as np

# returns the data types as a Series
types = fandango_films.dtypes
# keep only the float64 entries; the `index` attribute then gives the column names
float_columns = types[types.values == 'float64'].index
# use bracket notation to filter columns to just float columns
float_df = fandango_films[float_columns]

# `x` is a Series object representing a column
deviations = float_df.apply(lambda x: np.std(x))

print(deviations)

Metacritic_User               1.505529
IMDB                          0.955447
Fandango_Stars                0.538532
Fandango_Ratingvalue          0.501106
RT_norm                       1.503265
RT_user_norm                  0.997787
Metacritic_norm               0.972522
Metacritic_user_nom           0.752765
IMDB_norm                     0.477723
RT_norm_round                 1.509404
RT_user_norm_round            1.003559
Metacritic_norm_round         0.987561
Metacritic_user_norm_round    0.785412
IMDB_norm_round               0.501043
Fandango_Difference           0.152141
dtype: float64
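
As an alternative to filtering `dtypes` by hand, pandas offers `select_dtypes`. The sketch below uses made-up data (`toy`, `votes` and `rating` are hypothetical) and also shows one thing to keep in mind when comparing results: `np.std` uses the population formula (`ddof=0`), while the pandas `std` method defaults to the sample formula (`ddof=1`).

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe with one string, one integer, and one float column
toy = pd.DataFrame({
    "FILM": ["Film A", "Film B", "Film C"],
    "votes": [10, 20, 30],
    "rating": [1.0, 2.0, 3.0],
})

# Keep only the float64 columns
float_toy = toy.select_dtypes(include="float64")
print(float_toy.columns.tolist())

# Population standard deviation (divides by n), matching np.std's default
population_std = float_toy.std(ddof=0)

# Sample standard deviation (divides by n - 1), the pandas default
sample_std = float_toy.std()

print(population_std)
print(sample_std)
```

The values printed by the cell above come from `np.std`, so they correspond to the `ddof=0` variant.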


Use the `apply()` method on `float_df` to halve each value, and assign the result to `halved_df`. Then, print the first row.

In [8]:
# DataFrame of doubled values
double_df = float_df.apply(lambda x: x*2)
print(double_df.head(1))

# DataFrame of halved values
halved_df = float_df.apply(lambda x: x/2)
print(halved_df.head(1))

                                Metacritic_User  IMDB  Fandango_Stars  \
FILM                                                                    
Avengers: Age of Ultron (2015)             14.2  15.6            10.0   

                                Fandango_Ratingvalue  RT_norm  RT_user_norm  \
FILM                                                                          
Avengers: Age of Ultron (2015)                   9.0      7.4           8.6   

                                Metacritic_norm  Metacritic_user_nom  \
FILM                                                                   
Avengers: Age of Ultron (2015)              6.6                  7.1   

                                IMDB_norm  RT_norm_round  RT_user_norm_round  \
FILM                                                                           
Avengers: Age of Ultron (2015)        7.8            7.0                 9.0   

                                Metacritic_norm_round  \
FILM                       
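
For element-wise arithmetic like this, `apply()` is not strictly needed. A minimal sketch on made-up data (`float_toy` and `rating` are hypothetical) suggests that plain dataframe arithmetic gives the same result:

```python
import pandas as pd

# Hypothetical single-column float dataframe
float_toy = pd.DataFrame(
    {"rating": [1.0, 2.0, 3.0]},
    index=["Film A", "Film B", "Film C"],
)

# apply() calls the lambda once per column; each `col` is a Series
halved_apply = float_toy.apply(lambda col: col / 2)

# Arithmetic on the whole dataframe is applied to every value
halved_vector = float_toy / 2

print(halved_apply.equals(halved_vector))
```

`apply()` becomes more useful when the function needs a whole column or row at once, as in the standard deviation cell earlier.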

Use the `apply()` method to calculate the average of each movie's values for `RT_user_norm` and `Metacritic_user_nom`, and assign the result to the variable `rt_mt_means`.

In [9]:
rt_mt_user = float_df[['RT_user_norm', 'Metacritic_user_nom']]

# Standard deviations
rt_mt_deviations = rt_mt_user.apply(lambda x: np.std(x), axis=1)
print(rt_mt_deviations[0:5])

# Means
rt_mt_means = rt_mt_user.apply(lambda x: np.mean(x), axis=1)
print(rt_mt_means[0:5])

FILM
Avengers: Age of Ultron (2015)    0.375
Cinderella (2015)                 0.125
Ant-Man (2015)                    0.225
Do You Believe? (2015)            0.925
Hot Tub Time Machine 2 (2015)     0.150
dtype: float64
FILM
Avengers: Age of Ultron (2015)    3.925
Cinderella (2015)                 3.875
Ant-Man (2015)                    4.275
Do You Believe? (2015)            3.275
Hot Tub Time Machine 2 (2015)     1.550
dtype: float64
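
The effect of `axis=1` can be checked on a small frame. This sketch uses made-up numbers (`toy`, `a` and `b` are hypothetical) and compares `apply()` with the built-in `mean` method:

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe: two films, two float columns
toy = pd.DataFrame(
    {"a": [4.0, 2.0], "b": [3.0, 1.0]},
    index=["Film A", "Film B"],
)

# axis=1 passes each row to the function, so `row` is a Series of one film's values
row_means_apply = toy.apply(lambda row: np.mean(row), axis=1)

# The built-in method computes the same per-row mean
row_means_builtin = toy.mean(axis=1)

print(row_means_apply)
print(row_means_builtin)
```

With the default `axis=0`, the function would receive each column instead, which is what the per-column standard deviation cell relies on.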
