## Movies Exercise

* List all the columns in the data set.
* Create a new table that takes the Film and all the columns relating to IMDB.
* Keep only the good movies (i.e., any film with an IMDb score greater than or equal to 7) and remove the norm ratings.
* Find less popular movies that you may not have heard about, i.e., anything with under 20K votes.
* Export this file to a spreadsheet, excluding the index, so we can keep track of our future watchlist.

In [1]:
# Dependencies
import pandas as pd

In [4]:
# Read and display the CSV with Pandas
movie_file_pd = pd.read_csv('Resources/movie_scores.csv')
movie_file_pd.head(3)

Unnamed: 0,FILM,RottenTomatoes,RottenTomatoes_User,Metacritic,Metacritic_User,IMDB,Fandango_Stars,Fandango_Ratingvalue,RT_norm,RT_user_norm,...,IMDB_norm,RT_norm_round,RT_user_norm_round,Metacritic_norm_round,Metacritic_user_norm_round,IMDB_norm_round,Metacritic_user_vote_count,IMDB_user_vote_count,Fandango_votes,Fandango_Difference
0,Avengers: Age of Ultron (2015),74,86,66,7.1,7.8,5.0,4.5,3.7,4.3,...,3.9,3.5,4.5,3.5,3.5,4.0,1330,271107,14846,0.5
1,Cinderella (2015),85,80,67,7.5,7.1,5.0,4.5,4.25,4.0,...,3.55,4.5,4.0,3.5,4.0,3.5,249,65709,12640,0.5
2,Ant-Man (2015),80,90,64,8.1,7.8,5.0,4.5,4.0,4.5,...,3.9,4.0,4.5,3.0,4.0,4.0,627,103660,12055,0.5


In [3]:
# List all the columns in the table
movie_file_pd.columns

Index(['FILM', 'RottenTomatoes', 'RottenTomatoes_User', 'Metacritic',
       'Metacritic_User', 'IMDB', 'Fandango_Stars', 'Fandango_Ratingvalue',
       'RT_norm', 'RT_user_norm', 'Metacritic_norm', 'Metacritic_user_nom',
       'IMDB_norm', 'RT_norm_round', 'RT_user_norm_round',
       'Metacritic_norm_round', 'Metacritic_user_norm_round',
       'IMDB_norm_round', 'Metacritic_user_vote_count', 'IMDB_user_vote_count',
       'Fandango_votes', 'Fandango_Difference'],
      dtype='object')

In [5]:
# We only want IMDb data, so create a new table that takes the Film and all the columns relating to IMDB
imdb_table = movie_file_pd[["FILM", "IMDB", "IMDB_norm",
                            "IMDB_norm_round", "IMDB_user_vote_count"]]
imdb_table.head()

Unnamed: 0,FILM,IMDB,IMDB_norm,IMDB_norm_round,IMDB_user_vote_count
0,Avengers: Age of Ultron (2015),7.8,3.9,4.0,271107
1,Cinderella (2015),7.1,3.55,3.5,65709
2,Ant-Man (2015),7.8,3.9,4.0,103660
3,Do You Believe? (2015),5.4,2.7,2.5,3136
4,Hot Tub Time Machine 2 (2015),5.1,2.55,2.5,19560


In [6]:
# We only like good movies, so keep those that scored 7 or higher, and ignore the norm ratings
good_movies = movie_file_pd.loc[movie_file_pd["IMDB"] >= 7, [
    "FILM", "IMDB", "IMDB_user_vote_count"]]
good_movies.head()

Unnamed: 0,FILM,IMDB,IMDB_user_vote_count
0,Avengers: Age of Ultron (2015),7.8,271107
1,Cinderella (2015),7.1,65709
2,Ant-Man (2015),7.8,103660
5,The Water Diviner (2015),7.2,39373
8,Shaun the Sheep Movie (2015),7.4,12227


In [7]:
# Find less popular movies--i.e., those with fewer than 20K votes
unknown_movies = good_movies.loc[good_movies["IMDB_user_vote_count"] < 20000, [
    "FILM", "IMDB", "IMDB_user_vote_count"]]
unknown_movies.head()

Unnamed: 0,FILM,IMDB,IMDB_user_vote_count
8,Shaun the Sheep Movie (2015),7.4,12227
9,Love & Mercy (2015),7.8,5367
10,Far From The Madding Crowd (2015),7.2,12129
20,"McFarland, USA (2015)",7.5,13769
29,The End of the Tour (2015),7.9,1320


In [8]:
# Finally, export this table to a spreadsheet, without the index, so we can keep track of our future watchlist
unknown_movies.to_excel("output/movieWatchlist.xlsx", index=False)
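
The two filtering steps above can also be expressed as a single boolean mask. The sketch below is only an illustration: it uses a small made-up DataFrame (`sample_movies` and its rows are hypothetical, not taken from `movie_scores.csv`) and assumes the same column names as the exercise.

```python
import pandas as pd

# Hypothetical sample rows; the values are invented for illustration only
sample_movies = pd.DataFrame({
    "FILM": ["Film A", "Film B", "Film C", "Film D"],
    "IMDB": [7.4, 6.2, 8.1, 7.0],
    "IMDB_user_vote_count": [12000, 5000, 250000, 19999],
})

# Combine both conditions with &; each condition needs its own parentheses
mask = (sample_movies["IMDB"] >= 7) & (sample_movies["IMDB_user_vote_count"] < 20000)

# Keep only the matching rows and the columns we care about
watchlist = sample_movies.loc[mask, ["FILM", "IMDB", "IMDB_user_vote_count"]]
print(watchlist)
```

Filtering in two steps, as the cells above do, keeps the intermediate `good_movies` table available for inspection; the single mask is simply more compact when only the final result is needed.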

## Crime Exercise

* Get a count of rows within the DataFrame in order to determine if there are any null values
* Drop the rows which contain null values
* Search through the "Offense Type" column and replace any similar values with one consistent value
* Create a couple of DataFrames that each look into one Neighborhood only and print them to the screen

In [9]:
import pandas as pd
crime_df = pd.read_csv("Resources/crime_incident_data2017.csv")
crime_df.head(3)

Unnamed: 0,Address,Case Number,Crime Against,Neighborhood,Number of Records,Occur Date,Occur Month Year,Occur Time,Offense Category,Offense Count,Offense Type,Open Data Lat,Open Data Lon,Open Data X,Open Data Y,Report Date,Report Month Year
0,,17-X4762181,Person,,1,1/1/96,1/1/96,800,Sex Offenses,1,Rape,,,,,1/26/17,1/1/17
1,,17-X4757824,Property,Centennial,1,1/20/00,1/1/00,1615,Fraud Offenses,1,Identity Theft,,,,,1/20/17,1/1/17
2,200 BLOCK OF SE 78TH AVE,17-900367,Property,Montavilla,1,12/1/03,12/1/03,800,Fraud Offenses,1,False Pretenses/Swindle/Confidence Game,45.5207,-122.583,7668150.0,682825.0,1/9/17,1/1/17


In [10]:
# look for missing values
crime_df.count()

Address              37365
Case Number          41032
Crime Against        41032
Neighborhood         39712
Number of Records    41032
Occur Date           41032
Occur Month Year     41032
Occur Time           41032
Offense Category     41032
Offense Count        41032
Offense Type         41032
Open Data Lat        36712
Open Data Lon        36712
Open Data X          36712
Open Data Y          36712
Report Date          41032
Report Month Year    41032
dtype: int64

In [11]:
# drop null rows
no_null_crime_df = crime_df.dropna(how='any')
# verify counts
no_null_crime_df.count()

Address              36146
Case Number          36146
Crime Against        36146
Neighborhood         36146
Number of Records    36146
Occur Date           36146
Occur Month Year     36146
Occur Time           36146
Offense Category     36146
Offense Count        36146
Offense Type         36146
Open Data Lat        36146
Open Data Lon        36146
Open Data X          36146
Open Data Y          36146
Report Date          36146
Report Month Year    36146
dtype: int64
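
Comparing `count()` values across columns works, but missing values can also be counted directly. The following is a minimal sketch on a made-up DataFrame (`sample_df` and its values are hypothetical, not the real crime data) showing `isna().sum()` next to `dropna(how="any")`.

```python
import pandas as pd

# Hypothetical sample with a few missing values (not the real crime data)
sample_df = pd.DataFrame({
    "Address": ["100 BLOCK OF MAIN ST", None, "200 BLOCK OF OAK AVE"],
    "Neighborhood": ["Vernon", "Centennial", None],
    "Offense Count": [1, 1, 1],
})

# Number of missing values in each column
null_counts = sample_df.isna().sum()
print(null_counts)

# dropna(how="any") keeps only the rows that have no missing values at all
clean_df = sample_df.dropna(how="any")
print(len(clean_df))
```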

In [12]:
# Check whether "Offense Type" contains any misspelled or near-duplicate values
no_null_crime_df["Offense Type"].value_counts()

Theft From Motor Vehicle                       6947
Motor Vehicle Theft                            4689
All Other Larceny                              4558
Vandalism                                      3863
Burglary                                       2824
Shoplifting                                    2259
Identity Theft                                 1794
Simple Assault                                 1216
Drug/Narcotic Violations                       1095
Theft of Motor Vehicle Parts or Accessories    1073
Intimidation                                    900
Theft From Building                             895
False Pretenses/Swindle/Confidence Game         870
Aggravated Assault                              839
Robbery                                         608
Counterfeiting/Forgery                          448
Weapons Law Violations                          266
Credit Card/ATM Fraud                           226
Arson                                           200
Prostitution

In [17]:
# Combining similar offenses together
no_null_crime_df = no_null_crime_df.replace(
    {"Commercial Sex Acts": "Prostitution", "Assisting or Promoting Prostitution": "Prostitution"})
# Check to see if you combined similar offenses correctly in "Offense Type".
no_null_crime_df["Offense Type"].value_counts()

Theft From Motor Vehicle                       6947
Motor Vehicle Theft                            4689
All Other Larceny                              4558
Vandalism                                      3863
Burglary                                       2824
Shoplifting                                    2259
Identity Theft                                 1794
Simple Assault                                 1216
Drug/Narcotic Violations                       1095
Theft of Motor Vehicle Parts or Accessories    1073
Intimidation                                    900
Theft From Building                             895
False Pretenses/Swindle/Confidence Game         870
Aggravated Assault                              839
Robbery                                         608
Counterfeiting/Forgery                          448
Weapons Law Violations                          266
Credit Card/ATM Fraud                           226
Arson                                           200
Prostitution
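
The cell above calls `replace` on the whole DataFrame, so a matching string in any column would be rewritten. As a hedged alternative, the sketch below limits the replacement to the "Offense Type" column; `sample_offenses` is a hypothetical stand-in for the real data.

```python
import pandas as pd

# Hypothetical sample rows; only the column names mirror the exercise
sample_offenses = pd.DataFrame({
    "Offense Type": ["Prostitution", "Commercial Sex Acts",
                     "Assisting or Promoting Prostitution", "Arson"],
    "Neighborhood": ["Vernon", "Vernon", "Centennial", "Montavilla"],
})

# Replace values in one column only, leaving every other column untouched
sample_offenses["Offense Type"] = sample_offenses["Offense Type"].replace({
    "Commercial Sex Acts": "Prostitution",
    "Assisting or Promoting Prostitution": "Prostitution",
})

# Confirm that the similar offenses were merged into one value
counts = sample_offenses["Offense Type"].value_counts()
print(counts)
```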

In [15]:
# Create a new DataFrame that looks into a specific neighborhood
vernon_crime_df = no_null_crime_df.loc[no_null_crime_df["Neighborhood"] == "Vernon"]
vernon_crime_df.head(3)

Unnamed: 0,Address,Case Number,Crime Against,Neighborhood,Number of Records,Occur Date,Occur Month Year,Occur Time,Offense Category,Offense Count,Offense Type,Open Data Lat,Open Data Lon,Open Data X,Open Data Y,Report Date,Report Month Year
6,5000 BLOCK OF NE 19TH AVE,17-901079,Property,Vernon,1,11/8/13,11/1/13,1200,Fraud Offenses,1,False Pretenses/Swindle/Confidence Game,45.5594,-122.646,7652567.0,697337.0,1/26/17,1/1/17
7,5000 BLOCK OF NE 19TH AVE,17-901079,Property,Vernon,1,11/8/13,11/1/13,1200,Fraud Offenses,1,Identity Theft,45.5594,-122.646,7652567.0,697337.0,1/26/17,1/1/17
147,1000 BLOCK OF NE EMERSON ST,17-901190,Property,Vernon,1,11/26/16,11/1/16,2040,Fraud Offenses,1,Identity Theft,45.5619,-122.655,7650320.0,698297.0,1/29/17,1/1/17


## Pokemon Exercise

* Create a new table by extracting the following columns: "Type 1", "HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", and "Speed".
* Find the average stats for each type of Pokemon.
* Create a new DataFrame out of the averages.
* Calculate the total power level of each type of Pokemon by summing all of the previous stats together and place the results into a new column.
* Sort the table by strongest type and export the resulting table to a new CSV

In [18]:
# Dependencies
import pandas as pd
import numpy as np
# Read with Pandas
pokemon_pd = pd.read_csv("Resources/Pokemon.csv")
pokemon_pd.head(3)

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False


In [19]:
# Extract the following columns: "Type 1", "HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", and "Speed"
pokemon_type = pokemon_pd[["Type 1", "HP", "Attack",
                           "Defense", "Sp. Atk", "Sp. Def", "Speed"]]
pokemon_type.head()

Unnamed: 0,Type 1,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed
0,Grass,45,49,49,65,65,45
1,Grass,60,62,63,80,80,60
2,Grass,80,82,83,100,100,80
3,Grass,80,100,123,122,120,80
4,Fire,39,52,43,60,50,65


In [20]:
# Create a dataframe of the average stats for each type of pokemon.
pokemon_group = pokemon_type.groupby(["Type 1"])
pokemon_comparison = pokemon_group.mean()
pokemon_comparison.head(3)

Type 1,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed
Bug,56.884058,70.971014,70.724638,53.869565,64.797101,61.681159
Dark,66.806452,88.387097,70.225806,74.645161,69.516129,76.16129
Dragon,83.3125,112.125,86.375,96.84375,88.84375,83.03125


In [21]:
# Calculate the total power level of each type of pokemon by summing all of the stats together.
# Place the results into a new column.
pokemon_comparison["Total"] = pokemon_comparison.sum(axis=1)
pokemon_comparison.head(3)

Type 1,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Total
Bug,56.884058,70.971014,70.724638,53.869565,64.797101,61.681159,378.927536
Dark,66.806452,88.387097,70.225806,74.645161,69.516129,76.16129,445.741935
Dragon,83.3125,112.125,86.375,96.84375,88.84375,83.03125,550.53125


In [22]:
# Sort the table by strongest type and export the resulting table to a new CSV.
strongest_pokemon = pokemon_comparison.sort_values(["Total"], ascending=False)
strongest_pokemon.to_csv("output/pokemon_rankings.csv", index=True)
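
The group, average, total, and sort steps can be sketched end to end on a tiny made-up table. Everything here (`sample_pokemon`, its values, and the reduced set of stat columns) is hypothetical and only meant to show the shape of the pipeline.

```python
import pandas as pd

# Hypothetical sample; only two stat columns to keep the arithmetic easy to follow
sample_pokemon = pd.DataFrame({
    "Type 1": ["Grass", "Grass", "Fire", "Fire"],
    "HP": [40, 60, 50, 70],
    "Attack": [50, 70, 60, 100],
})

# Average each stat per type
type_means = sample_pokemon.groupby("Type 1").mean()

# Sum an explicit list of stat columns so that re-running this line
# never adds an existing "Total" column into the new total
stat_columns = ["HP", "Attack"]
type_means["Total"] = type_means[stat_columns].sum(axis=1)

# Strongest type first
ranked = type_means.sort_values("Total", ascending=False)
print(ranked)
```

Naming the stat columns explicitly is a design choice: calling `sum(axis=1)` on the whole table gives the same result the first time, but it would count "Total" itself if the cell were executed again.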

## Worst Striker Exercise

* Print out a list of all of the values within the "Preferred Position" column.
* Select a value from this list and create a new DataFrame that only includes players who prefer that position.
* Sort the DataFrame based upon a player's skill in that position.
* Reset the index for the DataFrame so that the index is in order.
* Print out the statistics for the worst player in a position to the screen.

In [23]:
# Dependencies
import pandas as pd
import numpy as np
# Import the CSV into a pandas DataFrame
soccer_2018_df = pd.read_csv("Resources/Soccer2018Data.csv")
soccer_2018_df.head(3)

Unnamed: 0,Name,Age,Nationality,Overall,Potential,Club,Preferred Position,CAM,CB,CDM,...,RB,RCB,RCM,RDM,RF,RM,RS,RW,RWB,ST
0,Cristiano Ronaldo,32,Portugal,94,94,Real Madrid CF,ST,89.0,53.0,62.0,...,61.0,53.0,82.0,62.0,91.0,89.0,92.0,91.0,66.0,92.0
1,L. Messi,30,Argentina,93,93,FC Barcelona,RW,92.0,45.0,59.0,...,57.0,45.0,84.0,59.0,92.0,90.0,88.0,91.0,62.0,88.0
2,Neymar,25,Brazil,92,94,Paris Saint-Germain,LW,88.0,46.0,59.0,...,59.0,46.0,79.0,59.0,88.0,87.0,84.0,89.0,64.0,84.0


In [24]:
# Collect a list of all the unique values in "Preferred Position"
soccer_2018_df["Preferred Position"].unique()

array(['ST', 'RW', 'LW', 'GK', 'CDM', 'CB', 'RM', 'CM', 'LM', 'LB', 'CAM',
       'RB', 'CF', 'RWB', 'LWB'], dtype=object)

In [25]:
# Looking only at strikers (ST) to start
strikers_2018_df = soccer_2018_df.loc[soccer_2018_df["Preferred Position"] == "ST", :]
# Sort the DataFrame by the values in the "ST" column to find the worst
strikers_2018_df = strikers_2018_df.sort_values("ST")
# Reset the index so that the index is now based on the sorting locations
strikers_2018_df = strikers_2018_df.reset_index(drop=True)
strikers_2018_df.head(3)

Unnamed: 0,Name,Age,Nationality,Overall,Potential,Club,Preferred Position,CAM,CB,CDM,...,RB,RCB,RCM,RDM,RF,RM,RS,RW,RWB,ST
0,L. Sackey,18,Ghana,46,64,Scunthorpe United,ST,29.0,45.0,38.0,...,40.0,45.0,30.0,38.0,29.0,30.0,31.0,29.0,38.0,31.0
1,M. Zettl,18,Germany,50,67,SpVgg Unterhaching,ST,47.0,32.0,36.0,...,39.0,32.0,42.0,36.0,46.0,49.0,43.0,49.0,41.0,43.0
2,O. Sowunmi,21,England,59,71,Yeovil Town,ST,35.0,58.0,47.0,...,52.0,58.0,37.0,47.0,38.0,38.0,44.0,37.0,49.0,44.0


In [26]:
# Save all of the information collected on the worst striker
worst_striker = strikers_2018_df.loc[0, :]
worst_striker

Name                          L. Sackey
Age                                  18
Nationality                       Ghana
Overall                              46
Potential                            64
Club                  Scunthorpe United
Preferred Position                   ST
CAM                                  29
CB                                   45
CDM                                  38
CF                                   29
CM                                   30
LAM                                  29
LB                                   40
LCB                                  45
LCM                                  30
LDM                                  38
LF                                   29
LM                                   30
LS                                   31
LW                                   29
LWB                                  38
RAM                                  29
RB                                   40
RCB                                  45
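
Sorting and resetting the index is one way to reach the lowest-rated player. As an alternative sketch, `idxmin()` returns the index label of the minimum value directly. The DataFrame below is hypothetical (`sample_players` and its ratings are invented); only the column names follow the exercise.

```python
import pandas as pd

# Hypothetical players; ratings are invented for illustration only
sample_players = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D"],
    "Preferred Position": ["ST", "ST", "GK", "ST"],
    "ST": [70.0, 31.0, 20.0, 55.0],
})

# Keep only the players who prefer the striker position
strikers = sample_players.loc[sample_players["Preferred Position"] == "ST"]

# idxmin() gives the index label of the lowest "ST" rating among strikers
worst = strikers.loc[strikers["ST"].idxmin()]
print(worst["Name"])
```

Note that the goalkeeper with the lowest "ST" rating overall is excluded, because the position filter is applied before looking for the minimum.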


## Crypto Exercise

* Read in both of the CSV files and print out their DataFrames.
* Perform an inner merge that combines both DataFrames on the "Date" column.
* Rename the columns within the newly merged DataFrame so that the headers are more descriptive.
* Create a summary table that includes the following information: 
    * Best Bitcoin Open
    * Best Dash Open
    * Best Bitcoin Close
    * Best Dash Close
    * Total Bitcoin Volume
    * Total Dash Volume.
* Total Bitcoin Volume and Total Dash Volume should be calculated to have units of "millions" and be rounded to two decimal places.

In [27]:
# Import Dependencies
import pandas as pd
bitcoin_df = pd.read_csv("Resources/bitcoin_cash_price.csv")
dash_df = pd.read_csv("Resources/dash_price.csv")

In [28]:
bitcoin_df.head(3)

Unnamed: 0,Date,Open,High,Low,Close,Volume,Market Cap
0,17-Sep-17,438.9,438.9,384.06,419.86,221828000.0,7279520000
1,16-Sep-17,424.49,450.98,388.2,440.22,313583000.0,7039590000
2,15-Sep-17,369.49,448.39,301.69,424.02,707231000.0,6126800000


In [29]:
dash_df.head(3)

Unnamed: 0,Date,Open,High,Low,Close,Volume,Market Cap
0,17-Sep-17,298.59,315.58,278.17,313.84,38081600.0,2257850000
1,16-Sep-17,284.5,301.23,276.57,298.86,43702600.0,2150800000
2,15-Sep-17,236.05,300.11,220.51,284.36,72695500.0,1784040000


In [30]:
# Merge the two DataFrames together based on the Dates they share
crypto_df = pd.merge(bitcoin_df, dash_df, on="Date")
crypto_df.head(3)

Unnamed: 0,Date,Open_x,High_x,Low_x,Close_x,Volume_x,Market Cap_x,Open_y,High_y,Low_y,Close_y,Volume_y,Market Cap_y
0,17-Sep-17,438.9,438.9,384.06,419.86,221828000.0,7279520000,298.59,315.58,278.17,313.84,38081600.0,2257850000
1,16-Sep-17,424.49,450.98,388.2,440.22,313583000.0,7039590000,284.5,301.23,276.57,298.86,43702600.0,2150800000
2,15-Sep-17,369.49,448.39,301.69,424.02,707231000.0,6126800000,236.05,300.11,220.51,284.36,72695500.0,1784040000


In [32]:
# Rename columns so that they are differentiated
crypto_df = crypto_df.rename(columns={"Open_x": "Bitcoin Open", "High_x": "Bitcoin High", "Low_x": "Bitcoin Low",
                                      "Close_x": "Bitcoin Close", "Volume_x": "Bitcoin Volume", "Market Cap_x": "Bitcoin Market Cap"})

crypto_df = crypto_df.rename(columns={"Open_y": "Dash Open", "High_y": "Dash High", "Low_y": "Dash Low",
                                      "Close_y": "Dash Close", "Volume_y": "Dash Volume", "Market Cap_y": "Dash Market Cap"})
crypto_df.head(3)

Unnamed: 0,Date,Bitcoin Open,Bitcoin High,Bitcoin Low,Bitcoin Close,Bitcoin Volume,Bitcoin Market Cap,Dash Open,Dash High,Dash Low,Dash Close,Dash Volume,Dash Market Cap
0,17-Sep-17,438.9,438.9,384.06,419.86,221828000.0,7279520000,298.59,315.58,278.17,313.84,38081600.0,2257850000
1,16-Sep-17,424.49,450.98,388.2,440.22,313583000.0,7039590000,284.5,301.23,276.57,298.86,43702600.0,2150800000
2,15-Sep-17,369.49,448.39,301.69,424.02,707231000.0,6126800000,236.05,300.11,220.51,284.36,72695500.0,1784040000


In [33]:
# Collecting best open for Bitcoin and Dash
bitcoin_open = crypto_df["Bitcoin Open"].max()
dash_open = crypto_df["Dash Open"].max()

# Collecting best close for Bitcoin and Dash
bitcoin_close = crypto_df["Bitcoin Close"].max()
dash_close = crypto_df["Dash Close"].max()

# Collecting the total volume for Bitcoin and Dash
bitcoin_volume = round(crypto_df["Bitcoin Volume"].sum()/1000000, 2)
dash_volume = round(crypto_df["Dash Volume"].sum()/1000000, 2)  

# Creating a summary DataFrame using above values
summary_df = pd.DataFrame({"Best Bitcoin Open": [bitcoin_open],
                           "Best Bitcoin Close": [bitcoin_close],
                           "Total Bitcoin Volume": [f"{bitcoin_volume} million"],
                           "Best Dash Open": [dash_open],
                           "Best Dash Close": [dash_close],
                           "Total Dash Volume": [f"{dash_volume} million"]})

summary_df

Unnamed: 0,Best Bitcoin Open,Best Bitcoin Close,Total Bitcoin Volume,Best Dash Open,Best Dash Close,Total Dash Volume
0,772.42,754.56,24383.05 million,400.42,399.85,2960.28 million
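
The merge and rename steps can be shortened by passing `suffixes` to `pd.merge`, which labels the overlapping columns at merge time. This is a sketch rather than the exercise's required approach: the rows are copied from the first rows displayed above, the Dash table is deliberately cut to two rows to show the inner-merge behavior, and the suffix names are an arbitrary choice.

```python
import pandas as pd

# A few rows copied from the outputs shown above (reduced to three columns)
sample_bitcoin = pd.DataFrame({
    "Date": ["17-Sep-17", "16-Sep-17", "15-Sep-17"],
    "Open": [438.9, 424.49, 369.49],
    "Volume": [221828000.0, 313583000.0, 707231000.0],
})
sample_dash = pd.DataFrame({
    "Date": ["17-Sep-17", "16-Sep-17"],
    "Open": [298.59, 284.5],
    "Volume": [38081600.0, 43702600.0],
})

# An inner merge keeps only the dates present in both tables;
# suffixes replaces the default _x/_y labels on overlapping columns
merged = pd.merge(sample_bitcoin, sample_dash, on="Date", how="inner",
                  suffixes=("_bitcoin", "_dash"))

# Total volume in millions, rounded to two decimal places
total_bitcoin_millions = round(merged["Volume_bitcoin"].sum() / 1_000_000, 2)
print(merged.columns.tolist())
print(total_bitcoin_millions)
```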


## Ted Talk Exercise

* Find the minimum "views" and maximum "views".
* Using the minimum and maximum "views" as a reference, create 10 bins in which to slice the data.
* Create a new column called "View Group" and fill it with the values collected through your slicing.
* Group the DataFrame based upon the values within "View Group".
* Find out how many rows fall into each group before finding the averages for "comments", "duration", and "languages".

In [34]:
# Import Dependencies
import pandas as pd
# Read data
ted_df = pd.read_csv("Resources/ted_talks.csv")
ted_df.head(3)

Unnamed: 0,comments,description,duration,event,languages,main_speaker,name,title,views
0,4553,Sir Ken Robinson makes an entertaining and pro...,1164,TED2006,60,Ken Robinson,Ken Robinson: Do schools kill creativity?,Do schools kill creativity?,47227110
1,265,With the same humor and humanity he exuded in ...,977,TED2006,43,Al Gore,Al Gore: Averting the climate crisis,Averting the climate crisis,3200520
2,124,New York Times columnist David Pogue takes aim...,1286,TED2006,26,David Pogue,David Pogue: Simplicity sells,Simplicity sells,1636292


In [35]:
# Figure out the minimum and maximum views for a TED Talk
print(ted_df["views"].max())
print(ted_df["views"].min())

47227110
50443


In [37]:
# Create bins in which to place values based upon TED Talk views
bins = [0, 199999, 399999, 599999, 799999, 999999,
        1999999, 2999999, 3999999, 4999999, 50000000]

# Create labels for these bins
group_labels = ["0 to 199k", "200k to 399k", "400k to 599k", "600k to 799k", "800k to 999k", "1mil to 2mil",
                "2mil to 3mil", "3mil to 4mil", "4mil to 5mil", "5mil to 50mil"]

# Slice the data and place it into bins
pd.cut(ted_df["views"], bins, labels=group_labels)

# Place the data series into a new column inside of the DataFrame
ted_df["View Group"] = pd.cut(ted_df["views"], bins, labels=group_labels)
ted_df.head(3)

Unnamed: 0,comments,description,duration,event,languages,main_speaker,name,title,views,View Group
0,4553,Sir Ken Robinson makes an entertaining and pro...,1164,TED2006,60,Ken Robinson,Ken Robinson: Do schools kill creativity?,Do schools kill creativity?,47227110,5mil to 50mil
1,265,With the same humor and humanity he exuded in ...,977,TED2006,43,Al Gore,Al Gore: Averting the climate crisis,Averting the climate crisis,3200520,3mil to 4mil
2,124,New York Times columnist David Pogue takes aim...,1286,TED2006,26,David Pogue,David Pogue: Simplicity sells,Simplicity sells,1636292,1mil to 2mil


In [38]:
# Create a GroupBy object based upon "View Group"
ted_group = ted_df.groupby("View Group")

# Find how many rows fall into each bin
print(ted_group["comments"].count())

# Get the average of each column within the GroupBy object
ted_group[["comments", "duration", "languages"]].mean()

View Group
0 to 199k          32
200k to 399k      135
400k to 599k      234
600k to 799k      307
800k to 999k      339
1mil to 2mil     1004
2mil to 3mil      239
3mil to 4mil       93
4mil to 5mil       68
5mil to 50mil      99
Name: comments, dtype: int64


View Group,comments,duration,languages
0 to 199k,76.9375,898.1875,4.0625
200k to 399k,81.992593,832.192593,18.785185
400k to 599k,107.162393,870.517094,22.940171
600k to 799k,118.912052,829.039088,24.400651
800k to 999k,119.628319,798.772861,25.678466
1mil to 2mil,168.136454,809.899402,27.899402
2mil to 3mil,299.481172,832.430962,32.807531
3mil to 4mil,360.870968,809.505376,34.258065
4mil to 5mil,507.088235,920.514706,35.720588
5mil to 50mil,650.393939,884.282828,40.252525
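
By default `pd.cut` treats each bin as open on the left and closed on the right, which is why the bin edges above end in 999. The sketch below illustrates that edge behavior with a hypothetical, shortened set of bins and labels; the view counts include the minimum and maximum printed earlier plus two invented boundary values.

```python
import pandas as pd

# Minimum and maximum views from above, plus values chosen to sit on a bin edge
sample_views = pd.Series([50443, 199999, 200000, 1636292, 47227110])

# Hypothetical, shortened bins: (0, 199999], (199999, 999999], (999999, 50000000]
bins = [0, 199999, 999999, 50000000]
labels = ["0 to 199k", "200k to 999k", "1mil to 50mil"]

# Each value is assigned the label of the interval it falls into
groups = pd.cut(sample_views, bins, labels=labels)
print(groups)
```

A value of exactly 199999 lands in the first bin, while 200000 lands in the second, since the right edge of each interval is inclusive.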
