# Exercises

Several of the following exercises use datasets from the **pydataset** library. (If the import below raises an error, use **pip** to install the **pydataset** package.)

In [4]:
from pydataset import data

When the instructions say to load a dataset, pass the name of the dataset as a string to the data function. To view the documentation for a dataset, also pass the show_doc keyword argument.

In [5]:
# data('mpg', show_doc=True) # view the documentation for the dataset

mpg = data('mpg') # load the dataset and store it in a variable

In [6]:
data('mpg', show_doc=True)

mpg

PyDataset Documentation (adopted from R Documentation. The displayed examples are in R)

## Fuel economy data from 1999 and 2008 for 38 popular models of car

### Description

This dataset contains a subset of the fuel economy data that the EPA makes
available on http://fueleconomy.gov. It contains only models which had a new
release every year between 1999 and 2008 - this was used as a proxy for the
popularity of the car.

### Usage

    data(mpg)

### Format

A data frame with 234 rows and 11 variables

### Details

  * manufacturer. 

  * model. 

  * displ. engine displacement, in litres 

  * year. 

  * cyl. number of cylinders 

  * trans. type of transmission 

  * drv. f = front-wheel drive, r = rear wheel drive, 4 = 4wd 

  * cty. city miles per gallon 

  * hwy. highway miles per gallon 

  * fl. 

  * class. 




All the datasets loaded from the pydataset library will be pandas dataframes.

In [7]:
# .info(), .describe()
# .dtypes, .shape, .columns, .index

# mpg.columns = [col.upper() for col in mpg.columns]   #uppercase the columns
mpg

Unnamed: 0,manufacturer,model,displ,year,cyl,trans,drv,cty,hwy,fl,class
1,audi,a4,1.8,1999,4,auto(l5),f,18,29,p,compact
2,audi,a4,1.8,1999,4,manual(m5),f,21,29,p,compact
3,audi,a4,2.0,2008,4,manual(m6),f,20,31,p,compact
4,audi,a4,2.0,2008,4,auto(av),f,21,30,p,compact
5,audi,a4,2.8,1999,6,auto(l5),f,16,26,p,compact
...,...,...,...,...,...,...,...,...,...,...,...
230,volkswagen,passat,2.0,2008,4,auto(s6),f,19,28,p,midsize
231,volkswagen,passat,2.0,2008,4,manual(m6),f,21,29,p,midsize
232,volkswagen,passat,2.8,1999,6,auto(l5),f,16,26,p,midsize
233,volkswagen,passat,2.8,1999,6,manual(m5),f,18,26,p,midsize


## 1. Copy the code from the lesson to create a dataframe full of student grades.

In [8]:
import pandas as pd
import numpy as np

np.random.seed(123)

students = ['Sally', 'Jane', 'Suzie', 'Billy', 'Ada', 'John', 'Thomas',
            'Marie', 'Albert', 'Richard', 'Isaac', 'Alan']

# randomly generate scores for each student for each subject
# note that all the values need to have the same length here
math_grades = np.random.randint(low=60, high=100, size=len(students))
english_grades = np.random.randint(low=60, high=100, size=len(students))
reading_grades = np.random.randint(low=60, high=100, size=len(students))

df = pd.DataFrame({'name': students,
                   'math': math_grades,
                   'english': english_grades,
                   'reading': reading_grades})

type(df)

pandas.core.frame.DataFrame

In [9]:
df

Unnamed: 0,name,math,english,reading
0,Sally,62,85,80
1,Jane,88,79,67
2,Suzie,94,74,95
3,Billy,98,96,88
4,Ada,77,92,98
5,John,79,76,93
6,Thomas,82,64,81
7,Marie,93,63,90
8,Albert,92,62,87
9,Richard,69,80,94


#### a. Create a column named passing_english that indicates whether each student has a passing grade in english.

In [10]:
df['passing_english'] = (df['english'] >= 70).astype(int)
df

Unnamed: 0,name,math,english,reading,passing_english
0,Sally,62,85,80,1
1,Jane,88,79,67,1
2,Suzie,94,74,95,1
3,Billy,98,96,88,1
4,Ada,77,92,98,1
5,John,79,76,93,1
6,Thomas,82,64,81,0
7,Marie,93,63,90,0
8,Albert,92,62,87,0
9,Richard,69,80,94,1


In [11]:
# alternative
df['passing_english'] = df['english'] >= 70
df

Unnamed: 0,name,math,english,reading,passing_english
0,Sally,62,85,80,True
1,Jane,88,79,67,True
2,Suzie,94,74,95,True
3,Billy,98,96,88,True
4,Ada,77,92,98,True
5,John,79,76,93,True
6,Thomas,82,64,81,False
7,Marie,93,63,90,False
8,Albert,92,62,87,False
9,Richard,69,80,94,True
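
The two versions above differ only in the dtype of the new column. A minimal sketch on a toy frame (the names and scores are invented for illustration and are not the lesson's data; the passing mark of 70 is taken from the cells above):

```python
import pandas as pd

# toy data, invented for this sketch
toy = pd.DataFrame({'name': ['A', 'B', 'C'], 'english': [85, 64, 70]})

as_bool = toy['english'] >= 70               # boolean Series: True / False
as_int = (toy['english'] >= 70).astype(int)  # same information stored as 1 / 0

print(as_bool.tolist())  # [True, False, True]
print(as_int.tolist())   # [1, 0, 1]
```

Either form works for sorting and filtering; the boolean form can be used directly as a mask.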


#### b. Sort the english grades by the passing_english column. How are duplicates handled?

In [12]:
df.sort_values('passing_english')
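# Rows with the same passing_english value (ties) stay in their original index order here,
# not in order of english grade. The default quicksort does not guarantee this;
# pass kind='mergesort' for a stable sort.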

Unnamed: 0,name,math,english,reading,passing_english
6,Thomas,82,64,81,False
7,Marie,93,63,90,False
8,Albert,92,62,87,False
11,Alan,92,62,72,False
0,Sally,62,85,80,True
1,Jane,88,79,67,True
2,Suzie,94,74,95,True
3,Billy,98,96,88,True
4,Ada,77,92,98,True
5,John,79,76,93,True
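
To make the handling of ties explicit rather than incidental, a stable sort can be requested. This is a small sketch on invented toy data, assuming only that `kind='mergesort'` is acceptable for the frame's size:

```python
import pandas as pd

# toy data, invented for this sketch
toy = pd.DataFrame({'name': ['A', 'B', 'C', 'D'],
                    'passing': [True, False, True, False]})

# a stable sort guarantees that rows with equal keys keep their original order
ordered = toy.sort_values('passing', kind='mergesort')
print(ordered['name'].tolist())  # ['B', 'D', 'A', 'C']
```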


#### c. Sort the english grades first by passing_english and then by student name. All the students that are failing english should be first, and within the students that are failing english they should be ordered alphabetically. The same should be true for the students passing english. 

(Hint: you can pass a list to the .sort_values method.)

In [13]:
# chaining two sort_values calls does not combine them:
# the second call re-sorts the entire dataframe by name and discards the first ordering
df.sort_values('passing_english').sort_values('name')

# that is why running it without the first sort_values gives the same result

Unnamed: 0,name,math,english,reading,passing_english
4,Ada,77,92,98,True
11,Alan,92,62,72,False
8,Albert,92,62,87,False
3,Billy,98,96,88,True
10,Isaac,92,99,93,True
1,Jane,88,79,67,True
5,John,79,76,93,True
7,Marie,93,63,90,False
9,Richard,69,80,94,True
0,Sally,62,85,80,True


In [14]:
df.sort_values(['passing_english', 'name'])  # a list sorts by passing_english first, then by name within each group

Unnamed: 0,name,math,english,reading,passing_english
11,Alan,92,62,72,False
8,Albert,92,62,87,False
7,Marie,93,63,90,False
6,Thomas,82,64,81,False
4,Ada,77,92,98,True
3,Billy,98,96,88,True
10,Isaac,92,99,93,True
1,Jane,88,79,67,True
5,John,79,76,93,True
9,Richard,69,80,94,True
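
The difference between one call with a list of keys and two chained calls can be seen on a tiny frame. The data below is invented for this sketch; the `ascending` list is an extra option the exercise does not require:

```python
import pandas as pd

# toy data, invented for this sketch
toy = pd.DataFrame({'name': ['Zed', 'Amy', 'Bob', 'Cal'],
                    'passing': [False, True, False, True]})

# one call, two keys: 'passing' is the primary key, 'name' breaks ties
both = toy.sort_values(['passing', 'name'])
print(both['name'].tolist())     # ['Bob', 'Zed', 'Amy', 'Cal']

# chaining two calls re-sorts everything by the last key only
chained = toy.sort_values('passing').sort_values('name')
print(chained['name'].tolist())  # ['Amy', 'Bob', 'Cal', 'Zed']

# each key can have its own direction
mixed = toy.sort_values(['passing', 'name'], ascending=[False, True])
print(mixed['name'].tolist())    # ['Amy', 'Cal', 'Bob', 'Zed']
```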


#### d. Sort the english grades first by passing_english, and then by the actual english grade, similar to how we did in the last step.

In [15]:
df.sort_values(['passing_english','english'])

Unnamed: 0,name,math,english,reading,passing_english
8,Albert,92,62,87,False
11,Alan,92,62,72,False
7,Marie,93,63,90,False
6,Thomas,82,64,81,False
2,Suzie,94,74,95,True
5,John,79,76,93,True
1,Jane,88,79,67,True
9,Richard,69,80,94,True
0,Sally,62,85,80,True
4,Ada,77,92,98,True


#### e. Calculate each student's overall grade and add it as a column on the dataframe. The overall grade is the average of the math, english, and reading grades.

In [16]:
df['overall_grade'] = round((df.math + df.english + df.reading)/3,1)
df

Unnamed: 0,name,math,english,reading,passing_english,overall_grade
0,Sally,62,85,80,True,75.7
1,Jane,88,79,67,True,78.0
2,Suzie,94,74,95,True,87.7
3,Billy,98,96,88,True,94.0
4,Ada,77,92,98,True,89.0
5,John,79,76,93,True,82.7
6,Thomas,82,64,81,False,75.7
7,Marie,93,63,90,False,82.0
8,Albert,92,62,87,False,80.3
9,Richard,69,80,94,True,81.0


In [17]:
# another way
round(df[['math', 'english','reading']].mean(axis=1),1)

# a bit more programmatic, since the arithmetic isn't hard-coded

0     75.7
1     78.0
2     87.7
3     94.0
4     89.0
5     82.7
6     75.7
7     82.0
8     80.3
9     81.0
10    94.7
11    75.3
dtype: float64
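
The cell above only displays the row means. A minimal sketch of storing them as a column, on invented toy data (the `subjects` list is a name introduced here for illustration):

```python
import pandas as pd

# toy data, invented for this sketch
toy = pd.DataFrame({'name': ['A', 'B'],
                    'math': [90, 70], 'english': [80, 75], 'reading': [70, 80]})

subjects = ['math', 'english', 'reading']
# axis=1 averages across the selected columns of each row
toy['overall_grade'] = toy[subjects].mean(axis=1).round(1)
print(toy['overall_grade'].tolist())  # [80.0, 75.0]
```

Adding a subject later then only means extending the list.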

## 2. Load the mpg dataset. Read the documentation for the dataset and use it for the following questions:

In [18]:
# data('mpg', show_doc=True) # view the documentation for the dataset

mpg = data('mpg') # load the dataset and store it in a variable

In [20]:
mpg

Unnamed: 0,manufacturer,model,displ,year,cyl,trans,drv,cty,hwy,fl,class
1,audi,a4,1.8,1999,4,auto(l5),f,18,29,p,compact
2,audi,a4,1.8,1999,4,manual(m5),f,21,29,p,compact
3,audi,a4,2.0,2008,4,manual(m6),f,20,31,p,compact
4,audi,a4,2.0,2008,4,auto(av),f,21,30,p,compact
5,audi,a4,2.8,1999,6,auto(l5),f,16,26,p,compact
...,...,...,...,...,...,...,...,...,...,...,...
230,volkswagen,passat,2.0,2008,4,auto(s6),f,19,28,p,midsize
231,volkswagen,passat,2.0,2008,4,manual(m6),f,21,29,p,midsize
232,volkswagen,passat,2.8,1999,6,auto(l5),f,16,26,p,midsize
233,volkswagen,passat,2.8,1999,6,manual(m5),f,18,26,p,midsize


### How many rows and columns are there?

In [21]:
mpg.shape #attribute

(234, 11)

### What are the data types of each column?

In [22]:
mpg.dtypes #attribute

manufacturer     object
model            object
displ           float64
year              int64
cyl               int64
trans            object
drv              object
cty               int64
hwy               int64
fl               object
class            object
dtype: object

### Summarize the dataframe with .info and .describe

In [23]:
mpg.info() #method

<class 'pandas.core.frame.DataFrame'>
Index: 234 entries, 1 to 234
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   manufacturer  234 non-null    object 
 1   model         234 non-null    object 
 2   displ         234 non-null    float64
 3   year          234 non-null    int64  
 4   cyl           234 non-null    int64  
 5   trans         234 non-null    object 
 6   drv           234 non-null    object 
 7   cty           234 non-null    int64  
 8   hwy           234 non-null    int64  
 9   fl            234 non-null    object 
 10  class         234 non-null    object 
dtypes: float64(1), int64(4), object(6)
memory usage: 21.9+ KB


In [24]:
mpg.describe() #method

Unnamed: 0,displ,year,cyl,cty,hwy
count,234.0,234.0,234.0,234.0,234.0
mean,3.471795,2003.5,5.888889,16.858974,23.440171
std,1.291959,4.509646,1.611534,4.255946,5.954643
min,1.6,1999.0,4.0,9.0,12.0
25%,2.4,1999.0,4.0,14.0,18.0
50%,3.3,2003.5,6.0,17.0,24.0
75%,4.6,2008.0,8.0,19.0,27.0
max,7.0,2008.0,8.0,35.0,44.0
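
The `#attribute` and `#method` comments above mark a distinction that is easy to trip over: attributes are read without parentheses, methods must be called. A small sketch on invented toy data:

```python
import pandas as pd

# toy data, invented for this sketch
toy = pd.DataFrame({'displ': [1.8, 2.0, 2.8], 'cyl': [4, 4, 6]})

print(toy.shape)                  # (3, 2): an attribute, no parentheses
print(callable(toy.describe))     # True: a method, it has to be called
summary = toy.describe()          # the parentheses run the summary
print(summary.loc['max', 'cyl'])  # 6.0
```

Leaving the parentheses off a method does not raise an error; the notebook just echoes the method object instead of its result.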


### Rename the cty column to city.

In [25]:
mpg.rename(columns={'cty': 'city'}, inplace=True)
# inplace=True modifies mpg directly instead of returning a new dataframe

# you can also save the change by reassigning, with inplace left off:
#     mpg = mpg.rename(columns={'cty': 'city'})
# (reassigning together with inplace=True would set mpg to None)
# Misty recommends reassignment because not every method accepts an inplace argument


### Rename the hwy column to highway.

In [26]:
mpg.rename(columns={'hwy': 'highway'}, inplace=True)
mpg

Unnamed: 0,manufacturer,model,displ,year,cyl,trans,drv,city,highway,fl,class
1,audi,a4,1.8,1999,4,auto(l5),f,18,29,p,compact
2,audi,a4,1.8,1999,4,manual(m5),f,21,29,p,compact
3,audi,a4,2.0,2008,4,manual(m6),f,20,31,p,compact
4,audi,a4,2.0,2008,4,auto(av),f,21,30,p,compact
5,audi,a4,2.8,1999,6,auto(l5),f,16,26,p,compact
...,...,...,...,...,...,...,...,...,...,...,...
230,volkswagen,passat,2.0,2008,4,auto(s6),f,19,28,p,midsize
231,volkswagen,passat,2.0,2008,4,manual(m6),f,21,29,p,midsize
232,volkswagen,passat,2.8,1999,6,auto(l5),f,16,26,p,midsize
233,volkswagen,passat,2.8,1999,6,manual(m5),f,18,26,p,midsize
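
Both renames can also be done in one call, and the return value shows why reassignment and `inplace=True` should not be mixed. A sketch on an invented two-column frame:

```python
import pandas as pd

# toy data, invented for this sketch
toy = pd.DataFrame({'cty': [18, 21], 'hwy': [29, 29]})

# rename returns a new frame; one dict can rename several columns at once
renamed = toy.rename(columns={'cty': 'city', 'hwy': 'highway'})
print(list(renamed.columns))  # ['city', 'highway']
print(list(toy.columns))      # ['cty', 'hwy'] (the original is untouched)

# with inplace=True the frame itself is modified and the call returns None
result = toy.rename(columns={'cty': 'city'}, inplace=True)
print(result)                 # None
print(list(toy.columns))      # ['city', 'hwy']
```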


### Do any cars have better city mileage than highway mileage?

In [27]:
# the comparison is made row by row (each car against its own highway mileage, not against other rows),
# so the empty result means no car has better city mileage than highway mileage
mpg[mpg['city'] > mpg['highway']]

Unnamed: 0,manufacturer,model,displ,year,cyl,trans,drv,city,highway,fl,class


In [28]:
(mpg.city > mpg.highway).sum() # count them all up

0

In [29]:
len(mpg[mpg.city > mpg.highway])  # number of matching rows

0

In [30]:
(mpg.city > mpg.highway).any() # answers the question perfectly

False
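
The four cells above all start from the same boolean Series. A sketch on invented toy data where one row does qualify, so each form returns something visible:

```python
import pandas as pd

# toy data, invented for this sketch
toy = pd.DataFrame({'city': [18, 30, 21], 'highway': [29, 28, 21]})

better_in_city = toy['city'] > toy['highway']  # row-by-row comparison
print(better_in_city.tolist())  # [False, True, False]
print(better_in_city.any())     # True: at least one row qualifies
print(better_in_city.sum())     # 1: each True counts as 1
print(len(toy[better_in_city])) # 1: rows kept by the mask
```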

### Create a column named mileage_difference. This column should contain the difference between highway and city mileage for each car.

In [31]:
mpg['mileage_difference'] = (mpg.highway - mpg.city)

# mpg['mileage_difference'] = mpg['highway'] - mpg['city']    #another way#

mpg

Unnamed: 0,manufacturer,model,displ,year,cyl,trans,drv,city,highway,fl,class,mileage_difference
1,audi,a4,1.8,1999,4,auto(l5),f,18,29,p,compact,11
2,audi,a4,1.8,1999,4,manual(m5),f,21,29,p,compact,8
3,audi,a4,2.0,2008,4,manual(m6),f,20,31,p,compact,11
4,audi,a4,2.0,2008,4,auto(av),f,21,30,p,compact,9
5,audi,a4,2.8,1999,6,auto(l5),f,16,26,p,compact,10
...,...,...,...,...,...,...,...,...,...,...,...,...
230,volkswagen,passat,2.0,2008,4,auto(s6),f,19,28,p,midsize,9
231,volkswagen,passat,2.0,2008,4,manual(m6),f,21,29,p,midsize,8
232,volkswagen,passat,2.8,1999,6,auto(l5),f,16,26,p,midsize,10
233,volkswagen,passat,2.8,1999,6,manual(m5),f,18,26,p,midsize,8


### Which car (or cars) has the highest mileage difference?

In [32]:
mpg['mileage_difference'].idxmax() # result is the index label of the first maximum

107

In [33]:
mpg['mileage_difference'].max() #result is the mileage_difference 

12

In [34]:
# idxmax() returns the label of the first maximum only

mpg.loc[mpg['mileage_difference'].idxmax()] # .loc with one label shows a single row, so tied cars are missed

manufacturer               honda
model                      civic
displ                        1.8
year                        2008
cyl                            4
trans                   auto(l5)
drv                            f
city                          24
highway                       36
fl                             c
class                 subcompact
mileage_difference            12
Name: 107, dtype: object

In [35]:
# a boolean mask keeps every row at the maximum, so ties are shown (here, more than one car)
mpg[mpg['mileage_difference'] == mpg['mileage_difference'].max()]

Unnamed: 0,manufacturer,model,displ,year,cyl,trans,drv,city,highway,fl,class,mileage_difference
107,honda,civic,1.8,2008,4,auto(l5),f,24,36,c,subcompact,12
223,volkswagen,new beetle,1.9,1999,4,auto(l4),f,29,41,d,subcompact,12


In [36]:
mpg.sort_values('mileage_difference', ascending=False).head(2) # head(2) only works because we already know two cars tie

Unnamed: 0,manufacturer,model,displ,year,cyl,trans,drv,city,highway,fl,class,mileage_difference
107,honda,civic,1.8,2008,4,auto(l5),f,24,36,c,subcompact,12
223,volkswagen,new beetle,1.9,1999,4,auto(l4),f,29,41,d,subcompact,12
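
The three approaches above handle ties differently. A sketch on invented toy data (the column name `gap` and the values are made up; `nlargest` with `keep='all'` is offered as one more option, not as the lesson's method):

```python
import pandas as pd

# toy data, invented for this sketch; index labels chosen arbitrarily
toy = pd.DataFrame({'model': ['a', 'b', 'c'], 'gap': [12, 9, 12]},
                   index=[107, 150, 223])

first_only = toy['gap'].idxmax()                # label of the first maximum only
all_ties = toy[toy['gap'] == toy['gap'].max()]  # every row at the maximum
also_ties = toy.nlargest(1, 'gap', keep='all')  # keep='all' retains tied rows

print(first_only)                # 107
print(all_ties.index.tolist())   # [107, 223]
print(sorted(also_ties.index))   # [107, 223]
```

Unlike `head(2)`, the mask and `nlargest(..., keep='all')` do not need the number of tied rows to be known in advance.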


### Which compact class car has the LOWEST highway mileage? The best?

In [63]:
mpg['class'].value_counts()

class
suv           62
compact       47
midsize       41
subcompact    35
pickup        33
minivan       11
2seater        5
Name: count, dtype: int64

In [65]:
mpg[mpg['class'] == 'compact'] #shows all the compact class cars

Unnamed: 0,manufacturer,model,displ,year,cyl,trans,drv,city,highway,fl,class,mileage_difference,average_mileage
1,audi,a4,1.8,1999,4,auto(l5),f,18,29,p,compact,11,23.5
2,audi,a4,1.8,1999,4,manual(m5),f,21,29,p,compact,8,25.0
3,audi,a4,2.0,2008,4,manual(m6),f,20,31,p,compact,11,25.5
4,audi,a4,2.0,2008,4,auto(av),f,21,30,p,compact,9,25.5
5,audi,a4,2.8,1999,6,auto(l5),f,16,26,p,compact,10,21.0
6,audi,a4,2.8,1999,6,manual(m5),f,18,26,p,compact,8,22.0
7,audi,a4,3.1,2008,6,auto(av),f,18,27,p,compact,9,22.5
8,audi,a4 quattro,1.8,1999,4,manual(m5),4,18,26,p,compact,8,22.0
9,audi,a4 quattro,1.8,1999,4,auto(l5),4,16,25,p,compact,9,20.5
10,audi,a4 quattro,2.0,2008,4,manual(m6),4,20,28,p,compact,8,24.0


In [38]:
# compact class car with the lowest highway mileage
mpg[mpg['class'] == 'compact'].loc[mpg[mpg['class'] == 'compact']['highway'].idxmin()]
# idxmin() returns a single row, so any ties would be missed

manufacturer          volkswagen
model                      jetta
displ                        2.8
year                        1999
cyl                            6
trans                   auto(l4)
drv                            f
city                          16
highway                       23
fl                             r
class                    compact
mileage_difference             7
Name: 220, dtype: object

In [39]:
# storing the filtered dataframe in a variable keeps the next line readable
compact_cars = mpg[mpg['class'] == 'compact']

compact_cars.loc[compact_cars['highway'].idxmin()]

manufacturer          volkswagen
model                      jetta
displ                        2.8
year                        1999
cyl                            6
trans                   auto(l4)
drv                            f
city                          16
highway                       23
fl                             r
class                    compact
mileage_difference             7
Name: 220, dtype: object

In [40]:
compact_cars[compact_cars['highway'] == compact_cars['highway'].min()] #shows there truly is only one!

Unnamed: 0,manufacturer,model,displ,year,cyl,trans,drv,city,highway,fl,class,mileage_difference
220,volkswagen,jetta,2.8,1999,6,auto(l4),f,16,23,r,compact,7


In [67]:
# another way to find the worst highway mileage: sort ascending and take the first row
compact_cars.sort_values('highway').head(1)

# the compact_cars variable keeps this short; without it the line would be:
#     mpg[mpg['class'] == 'compact'].sort_values('highway').head(1)

Unnamed: 0,manufacturer,model,displ,year,cyl,trans,drv,city,highway,fl,class,mileage_difference
220,volkswagen,jetta,2.8,1999,6,auto(l4),f,16,23,r,compact,7


### Which compact class car has the BEST highway mileage?

In [42]:
compact_cars[compact_cars['highway'] == compact_cars['highway'].max()] # the mask would show every tied row; here there is only one

Unnamed: 0,manufacturer,model,displ,year,cyl,trans,drv,city,highway,fl,class,mileage_difference
213,volkswagen,jetta,1.9,1999,4,manual(m5),f,33,44,d,compact,11


### Create a column named average_mileage that is the mean of the city and highway mileage.

In [43]:
mpg['average_mileage'] = (mpg['city'] + mpg['highway']) / 2
mpg

Unnamed: 0,manufacturer,model,displ,year,cyl,trans,drv,city,highway,fl,class,mileage_difference,average_mileage
1,audi,a4,1.8,1999,4,auto(l5),f,18,29,p,compact,11,23.5
2,audi,a4,1.8,1999,4,manual(m5),f,21,29,p,compact,8,25.0
3,audi,a4,2.0,2008,4,manual(m6),f,20,31,p,compact,11,25.5
4,audi,a4,2.0,2008,4,auto(av),f,21,30,p,compact,9,25.5
5,audi,a4,2.8,1999,6,auto(l5),f,16,26,p,compact,10,21.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
230,volkswagen,passat,2.0,2008,4,auto(s6),f,19,28,p,midsize,9,23.5
231,volkswagen,passat,2.0,2008,4,manual(m6),f,21,29,p,midsize,8,25.0
232,volkswagen,passat,2.8,1999,6,auto(l5),f,16,26,p,midsize,10,21.0
233,volkswagen,passat,2.8,1999,6,manual(m5),f,18,26,p,midsize,8,22.0


### Which dodge car has the BEST average mileage? The worst?

In [44]:
#dodge car with the best average mileage

#1st make a variable to make it easier
dodge_car = mpg[mpg['manufacturer'] == 'dodge']

dodge_car[dodge_car['average_mileage'] == dodge_car['average_mileage'].max()]

Unnamed: 0,manufacturer,model,displ,year,cyl,trans,drv,city,highway,fl,class,mileage_difference,average_mileage
38,dodge,caravan 2wd,2.4,1999,4,auto(l3),f,18,24,r,minivan,6,21.0


In [45]:
#another way to see a single result

dodge_car.loc[dodge_car['average_mileage'].idxmax()]

manufacturer                dodge
model                 caravan 2wd
displ                         2.4
year                         1999
cyl                             4
trans                    auto(l3)
drv                             f
city                           18
highway                        24
fl                              r
class                     minivan
mileage_difference              6
average_mileage              21.0
Name: 38, dtype: object

In [46]:
# Misty's approach: sort descending and take the first row
dodge_car.sort_values('average_mileage', ascending=False).head(1)

Unnamed: 0,manufacturer,model,displ,year,cyl,trans,drv,city,highway,fl,class,mileage_difference,average_mileage
38,dodge,caravan 2wd,2.4,1999,4,auto(l3),f,18,24,r,minivan,6,21.0


### Which dodge car has the WORST average mileage?

In [47]:
dodge_car[dodge_car['average_mileage'] == dodge_car['average_mileage'].min()] # several dodge vehicles tie for the worst

Unnamed: 0,manufacturer,model,displ,year,cyl,trans,drv,city,highway,fl,class,mileage_difference,average_mileage
55,dodge,dakota pickup 4wd,4.7,2008,8,auto(l5),4,9,12,e,pickup,3,10.5
60,dodge,durango 4wd,4.7,2008,8,auto(l5),4,9,12,e,suv,3,10.5
66,dodge,ram 1500 pickup 4wd,4.7,2008,8,auto(l5),4,9,12,e,pickup,3,10.5
70,dodge,ram 1500 pickup 4wd,4.7,2008,8,manual(m6),4,9,12,e,pickup,3,10.5


In [48]:
# Misty's approach: tail(1) of the descending sort shows only one of the tied rows
dodge_car.sort_values('average_mileage', ascending=False).tail(1)

Unnamed: 0,manufacturer,model,displ,year,cyl,trans,drv,city,highway,fl,class,mileage_difference,average_mileage
70,dodge,ram 1500 pickup 4wd,4.7,2008,8,manual(m6),4,9,12,e,pickup,3,10.5
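
The filter-then-compare pattern used for the compact and dodge questions can be wrapped in a small helper. `rows_at_extreme` is a hypothetical name introduced for this sketch, and the data is invented:

```python
import pandas as pd

def rows_at_extreme(frame, column, how='min'):
    """Return every row whose `column` equals the column's min or max."""
    target = frame[column].min() if how == 'min' else frame[column].max()
    return frame[frame[column] == target]

# toy data, invented for this sketch
toy = pd.DataFrame({'manufacturer': ['dodge', 'dodge', 'dodge', 'audi'],
                    'average_mileage': [10.5, 21.0, 10.5, 23.5]})

dodge = toy[toy['manufacturer'] == 'dodge']
worst = rows_at_extreme(dodge, 'average_mileage', 'min')
best = rows_at_extreme(dodge, 'average_mileage', 'max')
print(len(worst))  # 2 (tied rows are all kept)
print(len(best))   # 1
```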


## 3. Load the Mammals dataset. Read the documentation for it, and use the data to answer these questions:

In [49]:
dataset_names = data()
dataset_names

Unnamed: 0,dataset_id,title
0,AirPassengers,Monthly Airline Passenger Numbers 1949-1960
1,BJsales,Sales Data with Leading Indicator
2,BOD,Biochemical Oxygen Demand
3,Formaldehyde,Determination of Formaldehyde
4,HairEyeColor,Hair and Eye Color of Statistics Students
...,...,...
752,VerbAgg,Verbal Aggression item responses
753,cake,Breakage Angle of Chocolate Cakes
754,cbpp,Contagious bovine pleuropneumonia
755,grouseticks,Data on red grouse ticks from Elston et al. 2001


In [50]:
mam = data('Mammals')
mam

Unnamed: 0,weight,speed,hoppers,specials
1,6000.0,35.0,False,False
2,4000.0,26.0,False,False
3,3000.0,25.0,False,False
4,1400.0,45.0,False,False
5,400.0,70.0,False,False
6,350.0,70.0,False,False
7,300.0,64.0,False,False
8,260.0,70.0,False,False
9,250.0,40.0,False,False
10,3800.0,25.0,False,True


### How many rows and columns are there?

In [51]:
mam.shape

(107, 4)

### What are the data types?

In [52]:
mam.dtypes

weight      float64
speed       float64
hoppers        bool
specials       bool
dtype: object

### Summarize the dataframe with .info and .describe

In [53]:
mam.info() # info is a method: without the parentheses the notebook only echoes the bound method and a dump of the dataframe

In [54]:
mam.describe() # describe is a method too, so it needs the parentheses to produce the summary statistics

### What is the weight of the fastest animal?

In [55]:
# Misty's approach: sort by speed descending, take the first row, read its weight

mam.sort_values('speed', ascending=False).head(1).weight

53    55.0
Name: weight, dtype: float64
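
An alternative that returns the weight as a plain number rather than a one-row Series is to combine `idxmax` with `.loc`. A sketch on invented toy values:

```python
import pandas as pd

# toy data, invented for this sketch
toy = pd.DataFrame({'weight': [6000.0, 55.0, 400.0],
                    'speed': [35.0, 110.0, 70.0]})

# idxmax gives the row label of the top speed; .loc then reads that row's weight
fastest_weight = toy.loc[toy['speed'].idxmax(), 'weight']
print(fastest_weight)  # 55.0
```

As with the mileage questions, `idxmax` picks only the first row if several animals share the top speed.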

### What is the overall percentage of specials?

In [56]:
mam.specials

1      False
2      False
3      False
4      False
5      False
6      False
7      False
8      False
9      False
10      True
11     False
12     False
13     False
14     False
15     False
16     False
17     False
18     False
19     False
20     False
21     False
22     False
23     False
24     False
25     False
26     False
27     False
28     False
29     False
30     False
31     False
32     False
33     False
34     False
35     False
36     False
37     False
38     False
39     False
40     False
41     False
42     False
43     False
44     False
45     False
46     False
47     False
48     False
49     False
50     False
51     False
52     False
53     False
54     False
55     False
56     False
57     False
58     False
59      True
60      True
61     False
62     False
63     False
64     False
65      True
66      True
67     False
68      True
69      True
70      True
71     False
72     False
73     False
74     False
75     False
76     False
77     False

In [57]:
# Misty's approach
# reminder not to overthink it, nice and simple

round(mam.specials.mean() * 100, 2)      # the mean of a boolean column is the fraction of True values

9.35
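
Why the mean works here: True counts as 1 and False as 0, so the mean equals the count of True values divided by the number of rows. A sketch on an invented four-value Series:

```python
import pandas as pd

# toy data, invented for this sketch
flags = pd.Series([True, False, False, False])

share = flags.mean()                 # fraction of True values
by_hand = flags.sum() / len(flags)   # the same thing spelled out
print(share)                  # 0.25
print(by_hand)                # 0.25
print(round(share * 100, 2))  # 25.0
```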

### How many animals are hoppers that are above the median speed? What percentage is this?

In [58]:
# Misty's approach
# median speed

mam.speed.median()

48.0

In [59]:
#only hoppers
hoppers = mam[mam.hoppers]
hoppers

Unnamed: 0,weight,speed,hoppers,specials
82,0.056,21.0,True,False
85,0.035,32.0,True,False
86,0.035,14.0,True,False
96,4.6,64.0,True,False
97,4.4,72.0,True,False
98,4.0,72.0,True,False
99,3.5,56.0,True,False
100,2.0,64.0,True,False
101,1.9,56.0,True,False
102,1.5,50.0,True,False


In [60]:
# count the hoppers with speeds greater than the median
(hoppers.speed > mam.speed.median()).sum()

7

In [61]:
# total number of animals
len(mam)

107

In [62]:
# the hoppers variable keeps this readable
# divide by the total number of animals to get the fraction (multiply by 100 for a percentage)

(hoppers.speed > mam.speed.median()).sum() / len(mam)

0.06542056074766354
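
The same answer can be reached without a separate hoppers frame by combining two boolean masks with `&`. A sketch on invented toy data, with the result expressed as a percentage:

```python
import pandas as pd

# toy data, invented for this sketch
toy = pd.DataFrame({'speed': [20.0, 40.0, 60.0, 80.0],
                    'hoppers': [True, False, True, True]})

median_speed = toy['speed'].median()  # 50.0 for these values
# a row qualifies only if it is a hopper AND faster than the median
fast_hoppers = toy['hoppers'] & (toy['speed'] > median_speed)

count = fast_hoppers.sum()
percent = round(fast_hoppers.mean() * 100, 1)
print(count)    # 2
print(percent)  # 50.0
```

The parentheses around the comparison matter, because `&` binds more tightly than `>` in Python.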

# Awesome Bonus

For much more practice with pandas, go to https://github.com/guipsamora/pandas_exercises and clone the repo down to your laptop. To clone a repository:

* Copy the SSH address of the repository

* Run cd ~/codeup-data-science in the terminal

* Run git clone git@github.com:guipsamora/pandas_exercises.git

* Run cd pandas_exercises

* Run git remote remove origin (so you won't accidentally try to push your work to guipsamora's repo)

Congratulations! You have cloned guipsamora's pandas exercises to your computer. Now you need to make a new, blank, repository on GitHub.

* Go to https://github.com/new to make a new repo. Name it pandas_exercises.

* DO NOT check any check boxes. We need a blank, empty repo.

* Finally, follow the directions to "push an existing repository from the command line" so that you can push up your changes to your own account.

* Now do your own work, add it, commit it, and push it!