## Basic Data Analysis with Pandas

Pandas supports a range of data types. Continuous variables hold numerical measurements such as height, weight, temperature or wages, while discrete variables take a limited set of values and are sometimes called categorical variables; examples are colour, season and name. The table below lists the common pandas data types.

 <table style="width:100%">
  <tr>
    <th>Common data types</th>
    <th>NumPy/pandas object</th>
    <th>Pandas string name</th>
    <th>Description</th>
  </tr>
  <tr>
    <td>Boolean</td>
    <td>np.bool_</td>
    <td>bool</td>
    <td>Stored as a single byte.</td>
  </tr>
  <tr>
    <td>Integer</td>
    <td>np.int64</td>
    <td>int</td>
    <td>Defaults to 64 bits. Unsigned integers are also available.</td>
  </tr>

  <tr>
    <td>Float</td>
    <td>np.float64</td>
    <td>float</td>
    <td>Defaults to 64 bits.</td>
  </tr>

  <tr>
    <td>Complex</td>
    <td>np.complex128</td>
    <td>complex</td>
    <td>Complex numbers, used in scientific computing. Rarely seen in data analysis.</td>
  </tr>

  <tr>
    <td>Datetime</td>
    <td>np.datetime64 and pd.Timestamp</td>
    <td>datetime64</td>
    <td>Specific moment in time, with nanosecond precision.</td>
  </tr>

  <tr>
    <td>Categorical</td>
    <td>pd.Categorical</td>
    <td>category</td>
    <td>Specific to pandas. Useful for object columns with relatively few unique values.</td>
  </tr>

</table> 
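
As a rough illustration of the table, the sketch below builds a small DataFrame with one column per data type and collects the dtype name pandas reports for each. The column names and values are made up for this example and do not come from any data set used later.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one column for each row of the table above
df = pd.DataFrame({
    "is_open": np.array([True, False, True]),
    "enrolled": np.array([120, 45, 300], dtype="int64"),
    "ratio": np.array([0.5, 0.25, 0.75]),
    "signal": np.array([1 + 2j, 3 - 1j, 0 + 0j]),
    "founded": pd.to_datetime(["1990-01-01", "2001-06-15", "2015-09-30"]),
    "state": pd.Series(["AL", "CA", "AL"], dtype="category"),
})

# Map each column to the string name of its dtype
dtype_names = {col: str(dtype) for col, dtype in df.dtypes.items()}
print(dtype_names)
```

The string names printed here are the same names that can be passed to `astype`, for example `astype('category')`.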

### Steps in data analysis

#### Can a data analysis project have a step-by-step process?

The short answer is yes. However, the procedure depends on the nature of the underlying data and on the business insights we want to generate (or the research questions we want to answer). Every data project starts with questions to be answered, along the lines of:

<b> <u> Marketing </u> </b>
* Would our customers prefer Product A to Product B?
* How do we improve our service level times for customers waiting in a queue? Or, can we learn from the type of queries that come to us? 
* What can we learn about customers' spending patterns using data on time of shopping, items bundled, discount offers, gift purchases, mode of purchase, geographic location etc.?
* Given different combinations of features, what features of a product would customers rank most highly (or most value)?

<b> <u> Scientific Research </u> </b>
* How effective are carbon capture and storage (CCS) devices at reducing greenhouse emissions compared with peak-shaving using rooftop solar power?
* Does drug C show better results in patients than drugs A and B?
* What are the best predictors of snow precipitation in the province of British Columbia after an unusually hot summer on the Western seaboard?
* Are El Niño events in the Gulf of Mexico a leading indicator of poor fish catches on the upper west side?

<b> <u> Sports / Entertainment </u> </b>
* What are the odds of France's national football team defeating the German team in the absence of their lead striker?
* What are the chances that the Canucks will win the Stanley Cup in the next 3 tournaments?
* If Columbia Pictures releases a movie 2 weeks prior to Disney's animated movie targeted at a similar demographic, would it impact the former's sales?
* Can the decline in ticket sales for Hollywood films be explained by competition from overseas studios making inroads into US markets?

<b> <u> Financial </u> </b>
* What are the chances that a luxury liner will fall short on ticket sales (or capacity utilization) if a hurricane is forecast 2 weeks before the sail date?
* How would a portfolio of mid-Western farm commodity assets perform in a given quarter compared with the previous 10 years?
* Given the substantiated "rumours" of LIBOR rates hardening over the next 3 months, would it be better to diversify or consolidate our asset portfolio? 

<b> <u> Other </u> </b>
* How long does it take to drive across downtown Manhattan on Boxing Day? Or would it be quicker to take a cab than to drive yourself?

To start with, all data needs to be examined and understood.

Some basic steps are common to all projects:

1) Examine the data quality - Broadly, this means checking field types and data completeness. Establish what kind of data each field holds: time series or date-time, numeric, object, textual, boolean, continuous or discrete, categorical, geolocation, financial and so on. Where data is missing, decide how to deal with it: repopulate by interpolation, k-NN or other methods, omit the records, substitute an average, or find linked variables. The right choice depends on the nature of the underlying data and, in some cases, on the distribution it follows.

2) Dictionary (and Metadata) - Ask yourself: does the data make intuitive sense, both as a whole and as individual entries? Can you visualize how it was generated? Most data in the business realm is still generated with human responses in the picture. Not all attributes follow a clear, intuitive nomenclature; in those cases a data dictionary helps in making sense of them. (Machine-generated or sensor-generated data is often harder to visualize because it is produced automatically; it is not covered here.)

3) ETL Operations - Data is often stored in different formats and systems, and data analysis projects occasionally require performing some variation of ETL (extract, transform, load) operations. This may include performing queries, joins, data blending, building ETL pipelines, and standardizing formats for analysis. 
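
The missing-data options mentioned in step 1 can be sketched on a toy example. The frame below is hypothetical (it is not part of the college data), and which option is appropriate depends on the data, as noted above.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly readings with two gaps
readings = pd.DataFrame({
    "hour": [0, 1, 2, 3, 4],
    "temp": [10.0, np.nan, 14.0, np.nan, 18.0],
})

# Data completeness: count the missing values in each column
missing_counts = readings.isna().sum()

# Option 1: omit incomplete rows
dropped = readings.dropna()

# Option 2: replace gaps with the column mean
mean_filled = readings["temp"].fillna(readings["temp"].mean())

# Option 3: linear interpolation between neighbouring observations
interpolated = readings["temp"].interpolate()

print(missing_counts)
print(interpolated.tolist())
```

Mean filling flattens the series around its average, while interpolation follows the local trend; for ordered measurements like these the two give visibly different results.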

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

from IPython.display import display

In [2]:
pd.options.display.max_columns=50

In [3]:
college_raw = pd.read_csv('data/college.csv')

In [4]:
college_raw.shape

(7535, 27)

In [5]:
college_raw.head()

Unnamed: 0,INSTNM,CITY,STABBR,HBCU,MENONLY,WOMENONLY,RELAFFIL,SATVRMID,SATMTMID,DISTANCEONLY,UGDS,UGDS_WHITE,UGDS_BLACK,UGDS_HISP,UGDS_ASIAN,UGDS_AIAN,UGDS_NHPI,UGDS_2MOR,UGDS_NRA,UGDS_UNKN,PPTUG_EF,CURROPER,PCTPELL,PCTFLOAN,UG25ABV,MD_EARN_WNE_P10,GRAD_DEBT_MDN_SUPP
0,Alabama A & M University,Normal,AL,1.0,0.0,0.0,0,424.0,420.0,0.0,4206.0,0.0333,0.9353,0.0055,0.0019,0.0024,0.0019,0.0,0.0059,0.0138,0.0656,1,0.7356,0.8284,0.1049,30300,33888.0
1,University of Alabama at Birmingham,Birmingham,AL,0.0,0.0,0.0,0,570.0,565.0,0.0,11383.0,0.5922,0.26,0.0283,0.0518,0.0022,0.0007,0.0368,0.0179,0.01,0.2607,1,0.346,0.5214,0.2422,39700,21941.5
2,Amridge University,Montgomery,AL,0.0,0.0,0.0,1,,,1.0,291.0,0.299,0.4192,0.0069,0.0034,0.0,0.0,0.0,0.0,0.2715,0.4536,1,0.6801,0.7795,0.854,40100,23370.0
3,University of Alabama in Huntsville,Huntsville,AL,0.0,0.0,0.0,0,595.0,590.0,0.0,5451.0,0.6988,0.1255,0.0382,0.0376,0.0143,0.0002,0.0172,0.0332,0.035,0.2146,1,0.3072,0.4596,0.264,45500,24097.0
4,Alabama State University,Montgomery,AL,1.0,0.0,0.0,0,425.0,430.0,0.0,4811.0,0.0158,0.9208,0.0121,0.0019,0.001,0.0006,0.0098,0.0243,0.0137,0.0892,1,0.7347,0.7554,0.127,26600,33118.5


In [6]:
with pd.option_context('display.max_rows', 25):
    display(college_raw.describe(include=[np.number]).T)

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
HBCU,7164.0,0.014238,0.118478,0.0,0.0,0.0,0.0,1.0
MENONLY,7164.0,0.009213,0.095546,0.0,0.0,0.0,0.0,1.0
WOMENONLY,7164.0,0.005304,0.072642,0.0,0.0,0.0,0.0,1.0
RELAFFIL,7535.0,0.190975,0.393096,0.0,0.0,0.0,0.0,1.0
SATVRMID,1185.0,522.819409,68.578862,290.0,475.0,510.0,555.0,765.0
SATMTMID,1196.0,530.76505,73.469767,310.0,482.0,520.0,565.0,785.0
DISTANCEONLY,7164.0,0.005583,0.074519,0.0,0.0,0.0,0.0,1.0
UGDS,6874.0,2356.83794,5474.275871,0.0,117.0,412.5,1929.5,151558.0
UGDS_WHITE,6874.0,0.510207,0.286958,0.0,0.2675,0.5557,0.747875,1.0
UGDS_BLACK,6874.0,0.189997,0.224587,0.0,0.036125,0.10005,0.2577,1.0


In [7]:
college_raw.describe(include=[object, 'category']).T

Unnamed: 0,count,unique,top,freq
INSTNM,7535,7535,Brightwood Career Institute-Pittsburgh,1
CITY,7535,2514,New York,87
STABBR,7535,59,CA,773
MD_EARN_WNE_P10,6413,598,PrivacySuppressed,822
GRAD_DEBT_MDN_SUPP,7503,2038,PrivacySuppressed,1510
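
The summary above shows that `MD_EARN_WNE_P10` and `GRAD_DEBT_MDN_SUPP` are stored as object columns because they contain the text `PrivacySuppressed`. One possible way to recover numeric values, sketched here on a made-up four-row Series rather than the real column, is `pd.to_numeric` with `errors='coerce'`, which turns unparseable entries into NaN.

```python
import pandas as pd

# Hypothetical excerpt imitating an earnings column with suppressed values
earnings = pd.Series(
    ["30300", "PrivacySuppressed", "39700", None],
    name="MD_EARN_WNE_P10",
)

# Entries that cannot be parsed as numbers become NaN
numeric = pd.to_numeric(earnings, errors="coerce")

print(numeric.dtype)
print(numeric.isna().sum())
```

Whether suppressed values should be treated the same way as genuinely missing ones is an analysis decision; this sketch simply assumes they can be.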


In [8]:
college_raw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7535 entries, 0 to 7534
Data columns (total 27 columns):
INSTNM                7535 non-null object
CITY                  7535 non-null object
STABBR                7535 non-null object
HBCU                  7164 non-null float64
MENONLY               7164 non-null float64
WOMENONLY             7164 non-null float64
RELAFFIL              7535 non-null int64
SATVRMID              1185 non-null float64
SATMTMID              1196 non-null float64
DISTANCEONLY          7164 non-null float64
UGDS                  6874 non-null float64
UGDS_WHITE            6874 non-null float64
UGDS_BLACK            6874 non-null float64
UGDS_HISP             6874 non-null float64
UGDS_ASIAN            6874 non-null float64
UGDS_AIAN             6874 non-null float64
UGDS_NHPI             6874 non-null float64
UGDS_2MOR             6874 non-null float64
UGDS_NRA              6874 non-null float64
UGDS_UNKN             6874 non-null float64
PPTUG_EF          

In [10]:
college_dict = pd.read_csv('data/college_data_dictionary.csv')

with pd.option_context('display.max_rows', 27):
    display(college_dict)

Unnamed: 0,column_name,description
0,INSTNM,Institution Name
1,CITY,City Location
2,STABBR,State Abbreviation
3,HBCU,Historically Black College or University
4,MENONLY,0/1 Men Only
5,WOMENONLY,0/1 Women only
6,RELAFFIL,0/1 Religious Affiliation
7,SATVRMID,SAT Verbal Median
8,SATMTMID,SAT Math Median
9,DISTANCEONLY,Distance Education Only
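
A data dictionary is easier to use when it can be queried by column name. The sketch below assumes the two-column layout shown above (`column_name`, `description`) and rebuilds three of its rows by hand so that it runs without the CSV file.

```python
import pandas as pd

# Three rows copied from the dictionary output above
college_dict = pd.DataFrame({
    "column_name": ["INSTNM", "CITY", "STABBR"],
    "description": ["Institution Name", "City Location", "State Abbreviation"],
})

# Index by column name so that a description can be looked up directly
lookup = college_dict.set_index("column_name")["description"]

print(lookup["STABBR"])
```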


### Reducing memory by changing data types

In [17]:
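# Note: 'SATMTID' is not a column of college_raw (the SAT math column is 'SATMTMID'),
# which is why pandas warns below and why the SATMTID column comes back as all NaN
different_cols = ['RELAFFIL', 'SATMTID', 'CURROPER', 'INSTNM', 'STABBR']
college2 = college_raw.loc[:, different_cols]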

Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
  return self._getitem_tuple(key)
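
The warning above is triggered when a requested label does not exist. A minimal sketch of the `.reindex()` alternative it recommends is shown below, together with a check that lists the unknown labels. The two-row frame is a hypothetical stand-in for `college_raw`.

```python
import pandas as pd

# Hypothetical stand-in for college_raw with only two columns
df = pd.DataFrame({"RELAFFIL": [0, 1], "SATMTMID": [420.0, 565.0]})

# 'SATMTID' is misspelled on purpose to reproduce the situation
wanted = ["RELAFFIL", "SATMTID"]

# Report any requested labels that are not present
missing = [col for col in wanted if col not in df.columns]

# reindex keeps the requested order and fills unknown columns with NaN
subset = df.reindex(columns=wanted)

print(missing)
print(subset)
```

Checking `missing` first makes a misspelled column name visible instead of silently producing a column of NaN.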


In [16]:
college2.head()

Unnamed: 0,RELAFFIL,SATMTID,CURROPER,INSTNM,STABBR
0,0,,1,Alabama A & M University,AL
1,0,,1,University of Alabama at Birmingham,AL
2,1,,1,Amridge University,AL
3,0,,1,University of Alabama in Huntsville,AL
4,0,,1,Alabama State University,AL


In [18]:
college2.dtypes

RELAFFIL      int64
SATMTID     float64
CURROPER      int64
INSTNM       object
STABBR       object
dtype: object

In [25]:
# We can evaluate memory usage with the DataFrame's built-in memory_usage method
college2.memory_usage(deep=True)

Index           80
RELAFFIL     60280
SATMTID      60280
CURROPER     60280
INSTNM      660240
STABBR      444565
dtype: int64

In [27]:
# Store this output so that we can compare against it later
original_mem_usuage = college2.memory_usage(deep=True)
original_mem_usuage

Index           80
RELAFFIL     60280
SATMTID      60280
CURROPER     60280
INSTNM      660240
STABBR      444565
dtype: int64

In [28]:
# RELAFFIL only holds 0 and 1, so an 8-bit integer is sufficient
college2['RELAFFIL'] = college2['RELAFFIL'].astype(np.int8)

In [29]:
college2.dtypes

RELAFFIL       int8
SATMTID     float64
CURROPER      int64
INSTNM       object
STABBR       object
dtype: object

In [30]:
# STABBR has only 59 unique values, so it is a good candidate for the category dtype
college2['STABBR'] = college2['STABBR'].astype('category')
college2.dtypes

RELAFFIL        int8
SATMTID      float64
CURROPER       int64
INSTNM        object
STABBR      category
dtype: object

In [31]:
new_memory_usage = college2.memory_usage(deep=True)
new_memory_usage

Index           80
RELAFFIL      7535
SATMTID      60280
CURROPER     60280
INSTNM      660240
STABBR       13576
dtype: int64

In [32]:
new_memory_usage / original_mem_usuage

Index       1.000000
RELAFFIL    0.125000
SATMTID     1.000000
CURROPER    1.000000
INSTNM      1.000000
STABBR      0.030538
dtype: float64
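
The same two techniques can be reproduced on synthetic data. The sketch below is not the college data: the frame, its column names and its size are made up. It uses `pd.to_numeric(..., downcast='integer')` as one way to let pandas choose the smallest integer type, as an alternative to naming `np.int8` explicitly.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: a 0/1 flag and a low-cardinality string column
n = 10_000
df = pd.DataFrame({
    "flag": np.tile([0, 1], n // 2).astype("int64"),
    "state": np.tile(["AL", "CA", "NY", "TX"], n // 4),
})
before = df.memory_usage(deep=True)

# Let pandas pick the smallest integer type that can hold the values
df["flag"] = pd.to_numeric(df["flag"], downcast="integer")

# Store each distinct string once and keep a small integer code per row
df["state"] = df["state"].astype("category")

after = df.memory_usage(deep=True)
ratio = after / before
print(ratio)
```

Going from 64-bit to 8-bit integers cuts that column to one eighth of its size, matching the 0.125 ratio seen for `RELAFFIL` above. The saving from the category dtype depends on how few distinct values the column has.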

### Selecting the min and max values, and sorting in order

In [74]:
movie = pd.read_csv('data/movie.csv')

In [75]:
movie2 = movie[['director_name', 'movie_title', 'imdb_score', 'budget']]

In [76]:
movie2.head()

Unnamed: 0,director_name,movie_title,imdb_score,budget
0,James Cameron,Avatar,7.9,237000000.0
1,Gore Verbinski,Pirates of the Caribbean: At World's End,7.1,300000000.0
2,Sam Mendes,Spectre,6.8,245000000.0
3,Christopher Nolan,The Dark Knight Rises,8.5,250000000.0
4,Doug Walker,Star Wars: Episode VII - The Force Awakens,7.1,


In [78]:
# Create a new column that expresses the budget in millions, which is easier to read
movie2['budget_millions'] = movie2['budget'] / 1_000_000

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
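
In pandas versions that emit this warning, it appears because the DataFrame being modified was obtained by selecting from another DataFrame. One common way to avoid it, sketched here on a hypothetical three-row frame, is to take an explicit `.copy()` of the selection before adding columns.

```python
import pandas as pd

# Hypothetical miniature of the movie table
movie = pd.DataFrame({
    "movie_title": ["A", "B", "C"],
    "budget": [237_000_000.0, None, 5_600_000.0],
})

# .copy() makes the selection an independent DataFrame
subset = movie[["movie_title", "budget"]].copy()

# Adding a column now modifies only the copy
subset["budget_millions"] = subset["budget"] / 1_000_000

print(subset)
print(list(movie.columns))
```

The original frame keeps its two columns, so it is clear which object the assignment changed.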
  


In [79]:
# The budget field has many missing values; for the purpose of this pandas exercise,
# replace them with zero (a number, so the column stays numeric)

movie2['budget_millions'] = movie2['budget_millions'].fillna(0)


In [80]:
movie2['budget_millions'] = movie2['budget_millions'].astype(float)


In [84]:
# Which movies have the lowest ratings from IMDb users?

movie2.nsmallest(25, 'imdb_score')

Unnamed: 0,director_name,movie_title,imdb_score,budget,budget_millions
2789,Jon M. Chu,Justin Bieber: Never Say Never,1.6,13000000.0,13.0
1126,Lawrence Kasanoff,Foodfight!,1.7,65000000.0,65.0
2240,Jason Friedberg,Disaster Movie,1.9,25000000.0,25.0
2266,Bob Clark,Superbabies: Baby Geniuses 2,1.9,20000000.0,20.0
4498,A. Raven Cruz,The Helix... Loaded,1.9,1000000.0,1.0
1713,Frédéric Auburtin,United Passions,2.0,24000000.0,24.0
3438,Don Michael Paul,Who's Your Caddy?,2.0,7000000.0,7.0
2934,Robert Iscove,From Justin to Kelly,2.1,12000000.0,12.0
3282,Vondie Curtis-Hall,Glitter,2.1,22000000.0,22.0
3595,Preston A. Whitmore II,Crossover,2.1,5600000.0,5.6


In [86]:
# Which movies were rated most highly by IMDb users?

movie2.nlargest(25, 'imdb_score')

Unnamed: 0,director_name,movie_title,imdb_score,budget,budget_millions
2725,John Blanchard,Towering Inferno,9.5,,0.0
1920,Frank Darabont,The Shawshank Redemption,9.3,25000000.0,25.0
3402,Francis Ford Coppola,The Godfather,9.2,6000000.0,6.0
2779,,Dekalog,9.1,,0.0
4312,John Stockwell,Kickboxer: Vengeance,9.1,17000000.0,17.0
66,Christopher Nolan,The Dark Knight,9.0,185000000.0,185.0
2791,Francis Ford Coppola,The Godfather: Part II,9.0,13000000.0,13.0
3415,,Fargo,9.0,,0.0
335,Peter Jackson,The Lord of the Rings: The Return of the King,8.9,94000000.0,94.0
1857,Steven Spielberg,Schindler's List,8.9,22000000.0,22.0


In [87]:
#let's create a new DF for sorting functions
movie3 = movie[['movie_title', 'title_year', 'imdb_score']]

In [95]:
#let's sort by movie_title
movie3.sort_values('movie_title', ascending=True).head()

Unnamed: 0,movie_title,title_year,imdb_score
4349,#Horror,2015.0,3.3
3629,10 Cloverfield Lane,2016.0,7.3
2964,10 Days in a Madhouse,2015.0,7.5
2799,10 Things I Hate About You,1999.0,7.2
276,"10,000 B.C.",,7.2


In [104]:
# Sort by year and then by imdb_score, both in descending order

# This answers the question: how do movies rank by IMDb score within each year?


movie3.sort_values(['title_year', 'imdb_score'], ascending=False).head()

Unnamed: 0,movie_title,title_year,imdb_score
4312,Kickboxer: Vengeance,2016.0,9.1
4277,A Beginner's Guide to Snuff,2016.0,8.7
3798,Airlift,2016.0,8.5
27,Captain America: Civil War,2016.0,8.2
98,Godzilla Resurgence,2016.0,8.2


In [123]:
# Which movie had the top IMDb score in its release year, for each of the 25 most recent years in the data?

movie3.sort_values(['title_year', 'imdb_score'], ascending=False).drop_duplicates('title_year').head(25)

Unnamed: 0,movie_title,title_year,imdb_score
4312,Kickboxer: Vengeance,2016.0,9.1
3745,Running Forever,2015.0,8.6
4369,Queen of the Mountains,2014.0,8.7
3935,"Batman: The Dark Knight Returns, Part 2",2013.0,8.4
3,The Dark Knight Rises,2012.0,8.5
3853,Samsara,2011.0,8.5
97,Inception,2010.0,8.8
67,Up,2009.0,8.3
66,The Dark Knight,2008.0,9.0
2646,U2 3D,2007.0,8.4


In [143]:
# Take an explicit copy so that the assignment below works on an independent DataFrame
movie4 = movie[['movie_title', 'title_year', 'budget', 'imdb_score', 'content_rating']].copy()
# Overwrite budget with the value in millions computed earlier (missing values already set to 0)
movie4['budget'] = movie2['budget_millions']


In [144]:
# For each of the 25 most recent years: the highest-rated movie, with ties broken by the lowest budget

movie4.sort_values(['title_year', 'imdb_score', 'budget'], 
                   ascending=[False, False, True]).drop_duplicates('title_year').head(25)

Unnamed: 0,movie_title,title_year,budget,imdb_score,content_rating
4312,Kickboxer: Vengeance,2016.0,17.0,9.1,
3745,Running Forever,2015.0,5.0,8.6,
4804,Butterfly Girl,2014.0,0.18,8.7,
3935,"Batman: The Dark Knight Returns, Part 2",2013.0,3.5,8.4,PG-13
293,Django Unchained,2012.0,100.0,8.5,R
3853,Samsara,2011.0,4.0,8.5,PG-13
97,Inception,2010.0,160.0,8.8,PG-13
582,Inglourious Basterds,2009.0,75.0,8.3,R
66,The Dark Knight,2008.0,185.0,9.0,PG-13
2646,U2 3D,2007.0,0.0,8.4,G


In [148]:
# Same pattern, ranking within each year by content_rating (descending) and then by lowest budget
movie4.sort_values(['title_year', 'content_rating', 'budget'],
                   ascending=[False, False, True]).drop_duplicates('title_year').head()

Unnamed: 0,movie_title,title_year,budget,imdb_score,content_rating
2077,Our Kind of Traitor,2016.0,0.0,6.4,R
4731,Bizarre,2015.0,0.5,4.3,Unrated
4513,Hidden Away,2014.0,0.0,7.2,Unrated
3615,R100,2013.0,5.5,6.1,Unrated
3887,How to Fall in Love,2012.0,4.0,6.3,TV-G


#### Simple combinations of sort functions to extract the necessary information

In [187]:
# Of the 100 most highly rated movies on IMDb, which 10 had the smallest budgets?

movie4.nlargest(100, 'imdb_score').sort_values('budget', ascending=True).head(10)

# Caution: the budgets of 0 below are not real values. They come from replacing NaN with 0 earlier.

Unnamed: 0,movie_title,title_year,budget,imdb_score,content_rating
2725,Towering Inferno,,0.0,9.5,
1801,The Honeymooners,,0.0,8.7,
1604,Friday Night Lights,,0.0,8.7,TV-14
398,Hannibal,,0.0,8.6,TV-14
1485,Luther,,0.0,8.6,TV-MA
1825,It's Always Sunny in Philadelphia,,0.0,8.8,TV-MA
2904,Spartacus: War of the Damned,,0.0,8.6,TV-MA
453,Daredevil,,0.0,8.8,TV-MA
1026,Outlander,,0.0,8.5,TV-MA
1648,Entourage,,0.0,8.5,TV-MA


In [190]:
# Refine the result with dropna(), which removes rows that contain any missing value
# (rows whose missing budget was replaced by 0 still remain)

movie4.nlargest(100, 'imdb_score').sort_values('budget', ascending=True).dropna().head(10)

Unnamed: 0,movie_title,title_year,budget,imdb_score,content_rating
2646,U2 3D,2007.0,0.0,8.4,G
3616,Rang De Basanti,2006.0,0.0,8.4,Not Rated
4801,Children of Heaven,1997.0,0.18,8.5,PG
4706,12 Angry Men,1957.0,0.35,8.9,Not Rated
4550,A Separation,2011.0,0.5,8.4,PG-13
4636,The Other Dream Team,2012.0,0.5,8.4,Not Rated
2215,Psycho,1960.0,0.806947,8.5,R
4425,Casablanca,1942.0,0.95,8.6,PG
4395,Reservoir Dogs,1992.0,1.2,8.4,R
4397,"The Good, the Bad and the Ugly",1966.0,1.2,8.9,Approved
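
The zero budgets in the results above come from the earlier `fillna` step. A possible alternative, sketched here on four made-up rows, is to leave unknown budgets as NaN and drop only the rows whose budget is missing by using `dropna(subset=...)`.

```python
import numpy as np
import pandas as pd

# Hypothetical films; budget is in millions and NaN means unknown
films = pd.DataFrame({
    "movie_title": ["A", "B", "C", "D"],
    "imdb_score": [9.5, 9.3, 9.2, 8.9],
    "budget": [np.nan, 25.0, 6.0, np.nan],
})

cheapest_known = (
    films.nlargest(4, "imdb_score")
         .dropna(subset=["budget"])   # discard only rows with an unknown budget
         .sort_values("budget")
         .head(2)
)

print(cheapest_known)
```

Restricting `dropna` to the budget column keeps rows that are missing other fields, such as `content_rating`, which a plain `dropna()` would also discard.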
