# Pandas: grouping

In [16]:
import pandas as pd
import numpy as np

In [17]:
cars = pd.read_csv("data/vehicles.csv")

In [18]:
cars.head()

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,522.764706,1950
1,AM General,FJ8c Post Office,1984,4.2,6.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
2,AM General,Post Office DJ5 2WD,1985,2.5,4.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,555.4375,2100
3,AM General,Post Office DJ8 2WD,1985,4.2,6.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
4,ASC Incorporated,GNX,1987,3.8,6.0,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,14,21,16,555.4375,2550


How many car models are there?

In [19]:
# Count the distinct values in the "Model" column
cars["Model"].nunique()
# There are 3608 different car models in the dataset

3608

Group the data by `Make` and count the rows in each group.

In [20]:
# Count rows per make, largest group first
cars.groupby("Make")["Make"].count().sort_values(ascending=False)

Make
Chevrolet                             3643
Ford                                  2946
Dodge                                 2360
GMC                                   2347
Toyota                                1836
                                      ... 
Excalibur Autos                          1
S and S Coach Company  E.p. Dutton       1
Environmental Rsch and Devp Corp         1
E. P. Dutton, Inc.                       1
Lambda Control Systems                   1
Name: Make, Length: 127, dtype: int64

Converting Grams/Mile to Grams/Km

1 Mile = 1.60934 Km

Converting Gallons to Liters

1 Gallon = 3.78541 Liters
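
The two conversion factors above can be wrapped in small helpers. This is only a sketch: the function names are hypothetical and not part of the original notebook. Because the bodies are plain arithmetic, the same functions should also work when passed a whole pandas column instead of a single number.

```python
# Conversion factors quoted in the notes above
KM_PER_MILE = 1.60934
LITERS_PER_GALLON = 3.78541


def grams_per_mile_to_grams_per_km(grams_per_mile):
    """A mile is longer than a km, so the per-km figure is smaller: divide."""
    return grams_per_mile / KM_PER_MILE


def mpg_to_km_per_liter(mpg):
    """Convert miles to km (multiply) and gallons to liters (divide)."""
    return mpg * KM_PER_MILE / LITERS_PER_GALLON


# Values taken from the first row of the table shown earlier
co2_km = grams_per_mile_to_grams_per_km(522.764706)
city_kmpl = mpg_to_km_per_liter(18)
print(round(co2_km, 2), round(city_kmpl, 2))
```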

What brand has the most cars?

In [21]:
# value_counts() sorts in descending order, so the first row is the brand with the most cars
cars["Make"].value_counts()

Chevrolet                             3643
Ford                                  2946
Dodge                                 2360
GMC                                   2347
Toyota                                1836
                                      ... 
Excalibur Autos                          1
S and S Coach Company  E.p. Dutton       1
Environmental Rsch and Devp Corp         1
E. P. Dutton, Inc.                       1
Lambda Control Systems                   1
Name: Make, Length: 127, dtype: int64
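
The `groupby` count and `value_counts` cells return the same numbers. A minimal sketch on made-up data (the `toy` frame is an assumption, not the vehicles dataset) shows the equivalence and how to pull out just the top brand with `idxmax`:

```python
import pandas as pd

# Hypothetical toy data standing in for the "Make" column
toy = pd.DataFrame({"Make": ["Ford", "Ford", "Ford", "Toyota", "Toyota", "smart"]})

# Two routes to the same per-brand counts
by_groupby = toy.groupby("Make").size().sort_values(ascending=False)
by_value_counts = toy["Make"].value_counts()

print(by_groupby.to_dict() == by_value_counts.to_dict())

# idxmax returns the index label (the brand) with the largest count
top_brand = by_value_counts.idxmax()
print(top_brand)
```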

Show the average CO2 emission in grams/km by brand.

In [22]:
# Convert grams/mile to grams/km (1 mile = 1.60934 km); column arithmetic is vectorized
cars["CO2 Emission Grams/Km"] = cars["CO2 Emission Grams/Mile"] / 1.60934
cars.groupby("Make")["CO2 Emission Grams/Km"].mean()

Make
AM General                     379.881345
ASC Incorporated               345.133719
Acura                          262.583000
Alfa Romeo                     288.287195
American Motors Corporation    314.264744
                                  ...    
Volkswagen                     244.038998
Volvo                          270.796572
Wallace Environmental          408.857065
Yugo                           221.251107
smart                          153.498052
Name: CO2 Emission Grams/Km, Length: 127, dtype: float64

Show the average CO2 emission in grams/km by brand, sorted from highest to lowest.

In [23]:
# Same per-brand mean as above, sorted so the highest emitters come first
cars.groupby("Make")["CO2 Emission Grams/Km"].mean().sort_values(ascending=False)

Make
Vector                                651.919248
Superior Coaches Div E.p. Dutton      552.213951
S and S Coach Company  E.p. Dutton    552.213951
Bugatti                               542.497235
Laforza Automobile Inc                502.012683
                                         ...    
MINI                                  194.935105
Daihatsu                              192.742404
Fiat                                  189.311494
smart                                 153.498052
Fisker                                105.011992
Name: CO2 Emission Grams/Km, Length: 127, dtype: float64

# (Optional) 

Use `pd.cut` or `pd.qcut` to create 4 groups (bins) of cars, by Year. We want to explore how cars have evolved decade by decade.

In [24]:
cars['Year'].describe()

count    35952.00000
mean      2000.71640
std         10.08529
min       1984.00000
25%       1991.00000
50%       2001.00000
75%       2010.00000
max       2017.00000
Name: Year, dtype: float64

In [25]:
cars["Year"]

0        1984
1        1984
2        1985
3        1985
4        1987
         ... 
35947    2013
35948    2014
35949    2015
35950    2016
35951    2016
Name: Year, Length: 35952, dtype: int64

In [26]:
# qcut returns a Series aligned with cars, so it can be assigned directly
cars["quartiles_by_year"] = pd.qcut(cars["Year"], q=4)
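
The prompt mentions both `pd.cut` and `pd.qcut`, and they split the data differently: `pd.cut` uses bin edges you choose, while `pd.qcut` picks edges so each bin holds roughly the same number of rows. The sketch below uses a made-up list of years, and the decade edges and labels are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical years, not the real dataset
years = pd.Series([1984, 1987, 1991, 1995, 2001, 2005, 2010, 2017], name="Year")

# pd.cut: fixed edges, here one bin per decade (intervals are right-inclusive)
decade = pd.cut(
    years,
    bins=[1979, 1989, 1999, 2009, 2019],
    labels=["1980s", "1990s", "2000s", "2010s"],
)

# pd.qcut: edges are quantiles, so bins have (roughly) equal row counts
quartile = pd.qcut(years, q=4)

print(pd.DataFrame({"Year": years, "decade": decade, "quartile": quartile}))
```

With `pd.cut` the bins line up with calendar decades but can hold very different numbers of cars. With `pd.qcut` the groups are balanced but the year ranges are uneven, as in the `quartiles_by_year` labels used in the rest of this notebook.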

### Did cars consume more gas in the eighties?

Show the average city fuel economy (km/liter) by year range.

In [27]:
# Convert MPG to km/liter: miles -> km (x 1.60934), gallons -> liters (/ 3.78541)
cars["City KmPL"] = cars["City MPG"] * 1.60934 / 3.78541
cars["Highway KmPL"] = cars["Highway MPG"] * 1.60934 / 3.78541
cars["Combined KmPL"] = cars["Combined MPG"] * 1.60934 / 3.78541

cars.groupby("quartiles_by_year")["City KmPL"].mean()

quartiles_by_year
(1983.999, 1991.0]    7.326102
(1991.0, 2001.0]      7.210326
(2001.0, 2010.0]      7.208441
(2010.0, 2017.0]      8.394430
Name: City KmPL, dtype: float64

Which brands are more environmentally friendly?

In [31]:
# Average fuel use and CO2 per (year range, make), highest emitters first
(
    cars.groupby(["quartiles_by_year", "Make"])[["Fuel Barrels/Year", "CO2 Emission Grams/Km"]]
    .mean()
    .sort_values(by="CO2 Emission Grams/Km", ascending=False)
)

quartiles_by_year,Make,Fuel Barrels/Year,CO2 Emission Grams/Km
"(1983.999, 1991.0]",Lamborghini,43.051102,721.259038
"(1991.0, 2001.0]",Vector,38.912292,651.919248
"(1983.999, 1991.0]",Rolls-Royce,36.719711,615.185717
"(1983.999, 1991.0]",Aston Martin,36.145419,605.564292
"(2001.0, 2010.0]",Bugatti,32.961000,552.213951
...,...,...,...
"(2010.0, 2017.0]",Fiat,11.383588,189.311494
"(1983.999, 1991.0]",Daihatsu,11.299587,189.308267
"(2010.0, 2017.0]",smart,9.231674,153.543176
"(2001.0, 2010.0]",smart,9.155833,153.392764


Does the drivetrain affect fuel consumption?

In [33]:
# Average fuel consumption per drivetrain, lowest first
cars.groupby("Drivetrain")["Fuel Barrels/Year"].mean().sort_values()

Drivetrain
2-Wheel Drive, Front          11.771786
Front-Wheel Drive             14.266654
All-Wheel Drive               16.349672
4-Wheel Drive                 17.942952
Rear-Wheel Drive              19.587486
4-Wheel or All-Wheel Drive    20.484720
Part-time 4-Wheel Drive       20.628218
2-Wheel Drive                 21.069467
Name: Fuel Barrels/Year, dtype: float64

Do cars with automatic transmission consume more fuel than cars with manual transmission?

In [35]:
# Flag automatic transmissions, then compare average fuel consumption
cars["Automatic Transmission"] = cars["Transmission"].str.startswith("Auto")
cars.groupby("Automatic Transmission")["Fuel Barrels/Year"].mean()

Automatic Transmission
False    16.704904
True     18.043152
Name: Fuel Barrels/Year, dtype: float64

Use `groupby` and `agg` with different aggregation measures for different columns:

Aggregate with the average city km/liter and the count of the transmission column.

In [38]:
# Aggregate over the whole DataFrame (no groupby here): mean of one column, count of another
cars.agg({"City KmPL": "mean", "Automatic Transmission": "count"})

City KmPL                     7.50213
Automatic Transmission    35952.00000
dtype: float64

Aggregate with the average city km/liter and the minimum of the transmission column.

In [138]:
# Group by the transmission flag and aggregate each column with its own function
cars.groupby("Automatic Transmission").agg({"City KmPL": "mean", "Automatic Transmission": "min"})

Automatic Transmission,City KmPL,Automatic Transmission
False,7.968348,False
True,7.278292,True
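
One way to combine `groupby` with per-column aggregations, and get readable output column names, is pandas' named aggregation syntax. The sketch below runs on a small made-up frame. The `Kind` column and the result names `mean_city_kmpl` and `n_cars` are assumptions introduced for illustration, not names from the original notebook.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the notebook
toy = pd.DataFrame({
    "Transmission": ["Automatic 3-spd", "Manual 5-spd", "Automatic 4-spd", "Manual 4-spd"],
    "City KmPL": [6.0, 8.0, 7.0, 9.0],
})

# Turn the boolean flag into a readable group label
is_auto = toy["Transmission"].str.startswith("Auto")
toy["Kind"] = is_auto.map({True: "automatic", False: "manual"})

# Named aggregation: result_name=(source_column, function)
summary = toy.groupby("Kind").agg(
    mean_city_kmpl=("City KmPL", "mean"),
    n_cars=("Transmission", "count"),
)
print(summary)
```

Each keyword names one output column, so the mean and the count can come from different source columns without producing a multi-level header.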
