# Dataframe Exercises

<hr style="border:2px solid gray">

All the datasets loaded from the pydataset library will be pandas dataframes.

1. Copy the code from the lesson to create a dataframe full of student grades.

- a. Create a column named passing_english that indicates whether each student has a passing grade in english.
- b. Sort the english grades by the passing_english column. How are duplicates handled?
- c. Sort the english grades first by passing_english and then by student name. All the students that are failing english should be first, and within the students that are failing english they should be ordered alphabetically. The same should be true for the students passing english. (Hint: you can pass a list to the .sort_values method)
- d. Sort the english grades first by passing_english, and then by the actual english grade, similar to how we did in the last step.
- e. Calculate each student's overall grade and add it as a column on the dataframe. The overall grade is the average of the math, english, and reading grades.
<hr style="border:0.5px solid black">

2. Load the ```mpg``` dataset. Read the documentation for the dataset and use it for the following questions:

- a. How many rows and columns are there?
- b. What are the data types of each column?
- c. Summarize the dataframe with .info and .describe
- d. Rename the cty column to city.
- e. Rename the hwy column to highway.
- f. Do any cars have better city mileage than highway mileage?
- g. Create a column named mileage_difference. This column should contain the difference between highway and city mileage for each car.
- h. Which car (or cars) has the highest mileage difference?
- i. Which compact class car has the lowest highway mileage? The best?
- j. Create a column named average_mileage that is the mean of the city and highway mileage.
- k. Which dodge car has the best average mileage? The worst?
<hr style="border:0.5px solid black">

3. Load the Mammals dataset. Read the documentation for it, and use the data to answer these questions:

- a. How many rows and columns are there?
- b. What are the data types?
- c. Summarize the dataframe with .info and .describe
- d. What is the weight of the fastest animal?
- e. What is the overall percentage of specials?
- f. How many animals are hoppers that are above the median speed? What percentage is this?

In [1]:
import pandas as pd
import numpy as np
from pydataset import data

<hr style="border:1px solid black">

### #1. Copy the code from the lesson to create a dataframe full of student grades.

In [2]:
np.random.seed(123)

students = ['Sally', 'Jane', 'Suzie', 'Billy', 'Ada', 'John', 'Thomas',
            'Marie', 'Albert', 'Richard', 'Isaac', 'Alan']

# randomly generate scores for each student for each subject
# note that all the values need to have the same length here
math_grades = np.random.randint(low=60, high=100, size=len(students))
english_grades = np.random.randint(low=60, high=100, size=len(students))
reading_grades = np.random.randint(low=60, high=100, size=len(students))

students_df = pd.DataFrame({'name': students,
                   'math': math_grades,
                   'english': english_grades,
                   'reading': reading_grades})

In [3]:
#look at our dataframe
students_df.head()

Unnamed: 0,name,math,english,reading
0,Sally,62,85,80
1,Jane,88,79,67
2,Suzie,94,74,95
3,Billy,98,96,88
4,Ada,77,92,98


In [4]:
#what columns, datatypes, and nulls do we have
students_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12 entries, 0 to 11
Data columns (total 4 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   name     12 non-null     object
 1   math     12 non-null     int64 
 2   english  12 non-null     int64 
 3   reading  12 non-null     int64 
dtypes: int64(3), object(1)
memory usage: 512.0+ bytes


<b>a. Create a column named passing_english that indicates whether each student has a passing grade in english.</b>

In [5]:
#new column- greater than 70 grade in english
students_df['passing_english'] = students_df['english'] > 70

In [6]:
#another way using .assign
students_df.assign(passing_english=students_df.english > 70)

Unnamed: 0,name,math,english,reading,passing_english
0,Sally,62,85,80,True
1,Jane,88,79,67,True
2,Suzie,94,74,95,True
3,Billy,98,96,88,True
4,Ada,77,92,98,True
5,John,79,76,93,True
6,Thomas,82,64,81,False
7,Marie,93,63,90,False
8,Albert,92,62,87,False
9,Richard,69,80,94,True


In [7]:
#take a look at our new column
students_df.head()

Unnamed: 0,name,math,english,reading,passing_english
0,Sally,62,85,80,True
1,Jane,88,79,67,True
2,Suzie,94,74,95,True
3,Billy,98,96,88,True
4,Ada,77,92,98,True


<hr style="border:0.5px solid grey">

<b>b. Sort the english grades by the passing_english column. How are duplicates handled?</b>

In [8]:
students_df.sort_values('passing_english')

Unnamed: 0,name,math,english,reading,passing_english
6,Thomas,82,64,81,False
7,Marie,93,63,90,False
8,Albert,92,62,87,False
11,Alan,92,62,72,False
0,Sally,62,85,80,True
1,Jane,88,79,67,True
2,Suzie,94,74,95,True
3,Billy,98,96,88,True
4,Ada,77,92,98,True
5,John,79,76,93,True


In [9]:
#how many students are passing vs failing
students_df.value_counts('passing_english')

passing_english
True     8
False    4
dtype: int64
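
On the question of duplicates: every failing row shares the key `False` and every passing row shares `True`, so the sort has to break ties somehow. In the output above, tied rows appear in their original index order. The default `kind='quicksort'` does not promise that ordering, so here is a minimal sketch that requests a stable algorithm instead. It uses a small made-up frame (`toy_df` is hypothetical, not the lesson data):

```python
import pandas as pd

# Hypothetical toy data: two failing and three passing students.
toy_df = pd.DataFrame({
    'name': ['Sally', 'Thomas', 'Jane', 'Alan', 'Ada'],
    'passing_english': [True, False, True, False, True],
})

# mergesort is stable: rows with equal keys keep their original relative order.
stable_sorted = toy_df.sort_values('passing_english', kind='mergesort')
print(stable_sorted)
```

With a stable sort, the failing rows come first and, within each group, the rows stay in the order they had before sorting.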

<hr style="border:0.5px solid grey">

<b>c. Sort the english grades first by passing_english and then by student name. All the students that are failing english should be first, and within the students that are failing english they should be ordered alphabetically. The same should be true for the students passing english. (Hint: you can pass a list to the .sort_values method)</b>

In [12]:
students_df.sort_values(['passing_english', 'name'], ascending=[True, True])

Unnamed: 0,name,math,english,reading,passing_english
11,Alan,92,62,72,False
8,Albert,92,62,87,False
7,Marie,93,63,90,False
6,Thomas,82,64,81,False
4,Ada,77,92,98,True
3,Billy,98,96,88,True
10,Isaac,92,99,93,True
1,Jane,88,79,67,True
5,John,79,76,93,True
9,Richard,69,80,94,True
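
The `ascending` argument takes one flag per sort column, so the columns can run in different directions. As a sketch on made-up data (`toy_df` is hypothetical), this keeps failing students first but orders names in reverse alphabetical order within each group:

```python
import pandas as pd

# Hypothetical toy data, not the lesson dataframe.
toy_df = pd.DataFrame({
    'name': ['Sally', 'Thomas', 'Alan', 'Ada'],
    'passing_english': [True, False, False, True],
})

# First key ascending (False before True), second key descending (Z to A).
mixed = toy_df.sort_values(['passing_english', 'name'], ascending=[True, False])
print(mixed)
```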


<hr style="border:0.5px solid grey">

<b>d. Sort the english grades first by passing_english, and then by the actual english grade, similar to how we did in the last step.</b>

In [13]:
students_df.sort_values(['passing_english', 'english'], ascending =(True, True))

Unnamed: 0,name,math,english,reading,passing_english
8,Albert,92,62,87,False
11,Alan,92,62,72,False
7,Marie,93,63,90,False
6,Thomas,82,64,81,False
2,Suzie,94,74,95,True
5,John,79,76,93,True
1,Jane,88,79,67,True
9,Richard,69,80,94,True
0,Sally,62,85,80,True
4,Ada,77,92,98,True


<hr style="border:0.5px solid grey">

<b>e. Calculate each student's overall grade and add it as a column on the dataframe. The overall grade is the average of the math, english, and reading grades.</b>

In [17]:
#new column- average of all grades
students_df['overall_grade'] = (students_df['english'] +students_df['math'] +students_df['reading'])/3

In [15]:
#another way using .mean()
students_df['overall_grade']= students_df[['english','math','reading']].mean(axis=1)

In [18]:
#look at this
students_df.head()

Unnamed: 0,name,math,english,reading,passing_english,overall_grade
0,Sally,62,85,80,True,75.666667
1,Jane,88,79,67,True,78.0
2,Suzie,94,74,95,True,87.666667
3,Billy,98,96,88,True,94.0
4,Ada,77,92,98,True,89.0
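
The two approaches above give the same numbers: `mean(axis=1)` averages across the selected columns for each row, which is what the manual sum divided by 3 does. A small sketch to check this, using the first two rows of grades shown above (the name `toy_df` is just for illustration):

```python
import pandas as pd

# Grades for Sally and Jane, copied from the output above.
toy_df = pd.DataFrame({
    'math': [62, 88],
    'english': [85, 79],
    'reading': [80, 67],
})

# Manual average of the three subjects.
manual = (toy_df['math'] + toy_df['english'] + toy_df['reading']) / 3

# Row-wise mean: axis=1 means "across the columns, once per row".
row_mean = toy_df[['math', 'english', 'reading']].mean(axis=1)

print(manual)
print(row_mean)
```

The `.mean(axis=1)` form scales better: adding a subject only means adding a column name to the list.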


<hr style="border:1px solid black">

### #2. Load the ```mpg``` dataset. Read the documentation for the dataset and use it for the following questions:

In [19]:
#assign the dataframe
mpg_df = data('mpg')

In [20]:
#look at the docs
data('mpg', show_doc=True)

mpg

PyDataset Documentation (adopted from R Documentation. The displayed examples are in R)

## Fuel economy data from 1999 and 2008 for 38 popular models of car

### Description

This dataset contains a subset of the fuel economy data that the EPA makes
available on http://fueleconomy.gov. It contains only models which had a new
release every year between 1999 and 2008 - this was used as a proxy for the
popularity of the car.

### Usage

    data(mpg)

### Format

A data frame with 234 rows and 11 variables

### Details

  * manufacturer. 

  * model. 

  * displ. engine displacement, in litres 

  * year. 

  * cyl. number of cylinders 

  * trans. type of transmission 

  * drv. f = front-wheel drive, r = rear wheel drive, 4 = 4wd 

  * cty. city miles per gallon 

  * hwy. highway miles per gallon 

  * fl. 

  * class. 




In [21]:
#look at the dataframe
mpg_df.head()

Unnamed: 0,manufacturer,model,displ,year,cyl,trans,drv,cty,hwy,fl,class
1,audi,a4,1.8,1999,4,auto(l5),f,18,29,p,compact
2,audi,a4,1.8,1999,4,manual(m5),f,21,29,p,compact
3,audi,a4,2.0,2008,4,manual(m6),f,20,31,p,compact
4,audi,a4,2.0,2008,4,auto(av),f,21,30,p,compact
5,audi,a4,2.8,1999,6,auto(l5),f,16,26,p,compact


<b>a. How many rows and columns are there?</b>

- <b>Answer</b>: rows: 234, columns: 11

In [22]:
mpg_df.shape

(234, 11)

<hr style="border:0.5px solid grey">

<b>b. What are the data types of each column?</b>

In [23]:
mpg_df.dtypes

manufacturer     object
model            object
displ           float64
year              int64
cyl               int64
trans            object
drv              object
cty               int64
hwy               int64
fl               object
class            object
dtype: object

<hr style="border:0.5px solid grey">

<b>c. Summarize the dataframe with .info and .describe</b>

In [24]:
#what datatypes, columns, and nulls do I have
mpg_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 234 entries, 1 to 234
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   manufacturer  234 non-null    object 
 1   model         234 non-null    object 
 2   displ         234 non-null    float64
 3   year          234 non-null    int64  
 4   cyl           234 non-null    int64  
 5   trans         234 non-null    object 
 6   drv           234 non-null    object 
 7   cty           234 non-null    int64  
 8   hwy           234 non-null    int64  
 9   fl            234 non-null    object 
 10  class         234 non-null    object 
dtypes: float64(1), int64(4), object(6)
memory usage: 21.9+ KB


In [25]:
#statistical summary
mpg_df.describe()

Unnamed: 0,displ,year,cyl,cty,hwy
count,234.0,234.0,234.0,234.0,234.0
mean,3.471795,2003.5,5.888889,16.858974,23.440171
std,1.291959,4.509646,1.611534,4.255946,5.954643
min,1.6,1999.0,4.0,9.0,12.0
25%,2.4,1999.0,4.0,14.0,18.0
50%,3.3,2003.5,6.0,17.0,24.0
75%,4.6,2008.0,8.0,19.0,27.0
max,7.0,2008.0,8.0,35.0,44.0


<hr style="border:0.5px solid grey">

<b>d. Rename the cty column to city.</b>

In [26]:
#reassign the df with the rename a column
mpg_df = mpg_df.rename(columns={'cty':'city'})

#take a look
mpg_df

Unnamed: 0,manufacturer,model,displ,year,cyl,trans,drv,city,hwy,fl,class
1,audi,a4,1.8,1999,4,auto(l5),f,18,29,p,compact
2,audi,a4,1.8,1999,4,manual(m5),f,21,29,p,compact
3,audi,a4,2.0,2008,4,manual(m6),f,20,31,p,compact
4,audi,a4,2.0,2008,4,auto(av),f,21,30,p,compact
5,audi,a4,2.8,1999,6,auto(l5),f,16,26,p,compact
...,...,...,...,...,...,...,...,...,...,...,...
230,volkswagen,passat,2.0,2008,4,auto(s6),f,19,28,p,midsize
231,volkswagen,passat,2.0,2008,4,manual(m6),f,21,29,p,midsize
232,volkswagen,passat,2.8,1999,6,auto(l5),f,16,26,p,midsize
233,volkswagen,passat,2.8,1999,6,manual(m5),f,18,26,p,midsize


<hr style="border:0.5px solid grey">

<b>e. Rename the hwy column to highway.</b>

In [27]:
#reassign the df with the rename a column
mpg_df = mpg_df.rename(columns={'hwy':'highway'})

#take a look
mpg_df

Unnamed: 0,manufacturer,model,displ,year,cyl,trans,drv,city,highway,fl,class
1,audi,a4,1.8,1999,4,auto(l5),f,18,29,p,compact
2,audi,a4,1.8,1999,4,manual(m5),f,21,29,p,compact
3,audi,a4,2.0,2008,4,manual(m6),f,20,31,p,compact
4,audi,a4,2.0,2008,4,auto(av),f,21,30,p,compact
5,audi,a4,2.8,1999,6,auto(l5),f,16,26,p,compact
...,...,...,...,...,...,...,...,...,...,...,...
230,volkswagen,passat,2.0,2008,4,auto(s6),f,19,28,p,midsize
231,volkswagen,passat,2.0,2008,4,manual(m6),f,21,29,p,midsize
232,volkswagen,passat,2.8,1999,6,auto(l5),f,16,26,p,midsize
233,volkswagen,passat,2.8,1999,6,manual(m5),f,18,26,p,midsize
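
Both renames can also be done in a single call, since `rename` accepts a dictionary with several old-to-new pairs. A minimal sketch on a made-up two-column frame (`toy_df` is hypothetical, standing in for `mpg_df`):

```python
import pandas as pd

# Hypothetical stand-in for the mpg dataframe.
toy_df = pd.DataFrame({'cty': [18, 21], 'hwy': [29, 29]})

# rename returns a new dataframe; the original only changes if you reassign it.
renamed = toy_df.rename(columns={'cty': 'city', 'hwy': 'highway'})

print(renamed.columns.tolist())
print(toy_df.columns.tolist())
```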


<hr style="border:0.5px solid grey">

<b>f. Do any cars have better city mileage than highway mileage?</b>

- <b>Answer</b>: No

In [28]:
#are there ANY instances where city is greater than highway?
(mpg_df['city'] > mpg_df['highway']).any()

False

In [29]:
#another way
city_greater = (mpg_df['city'] > mpg_df['highway']).value_counts()
city_greater

False    234
dtype: int64

<hr style="border:0.5px solid grey">

<b>g. Create a column named mileage_difference. This column should contain the difference between highway and city mileage for each car.</b>

In [30]:
#new column of mileage difference
mpg_df['mileage_difference'] = mpg_df['highway'] -mpg_df['city']

#take a look
mpg_df.head()

Unnamed: 0,manufacturer,model,displ,year,cyl,trans,drv,city,highway,fl,class,mileage_difference
1,audi,a4,1.8,1999,4,auto(l5),f,18,29,p,compact,11
2,audi,a4,1.8,1999,4,manual(m5),f,21,29,p,compact,8
3,audi,a4,2.0,2008,4,manual(m6),f,20,31,p,compact,11
4,audi,a4,2.0,2008,4,auto(av),f,21,30,p,compact,9
5,audi,a4,2.8,1999,6,auto(l5),f,16,26,p,compact,10


<hr style="border:0.5px solid grey">

<b>h. Which car (or cars) has the highest mileage difference?</b>

- <b>Answer</b>: 
    - 2008 Honda Civic (mileage diff: 12 mpg)
    - 1999 VW New Beetle (mileage diff: 12 mpg)


In [33]:
#sort_values sorts from least to greatest by default
# so .tail(2) gives us the two rows with the largest difference
mpg_df.sort_values('mileage_difference').tail(2)

Unnamed: 0,manufacturer,model,displ,year,cyl,trans,drv,city,highway,fl,class,mileage_difference
223,volkswagen,new beetle,1.9,1999,4,auto(l4),f,29,41,d,subcompact,12
107,honda,civic,1.8,2008,4,auto(l5),f,24,36,c,subcompact,12


In [36]:
#another way
mpg_df.sort_values('mileage_difference', ascending=False)

Unnamed: 0,manufacturer,model,displ,year,cyl,trans,drv,city,highway,fl,class,mileage_difference
107,honda,civic,1.8,2008,4,auto(l5),f,24,36,c,subcompact,12
223,volkswagen,new beetle,1.9,1999,4,auto(l4),f,29,41,d,subcompact,12
1,audi,a4,1.8,1999,4,auto(l5),f,18,29,p,compact,11
229,volkswagen,passat,1.8,1999,4,auto(l5),f,18,29,p,midsize,11
36,chevrolet,malibu,3.5,2008,6,auto(l4),f,18,29,r,midsize,11
...,...,...,...,...,...,...,...,...,...,...,...,...
80,ford,explorer 4wd,4.0,1999,6,auto(l5),4,14,17,r,suv,3
138,mercury,mountaineer 4wd,4.0,1999,6,auto(l5),4,14,17,r,suv,3
177,toyota,4runner 4wd,3.4,1999,6,manual(m5),4,15,17,r,suv,2
152,nissan,pathfinder 4wd,3.3,1999,6,manual(m5),4,15,17,r,suv,2
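
Slicing a sorted frame with `.tail(2)` only works once you already know that two cars tie for the top spot. A sketch that does not need that knowledge compares each row against the column maximum and keeps every match. The frame below is made-up illustration data (`toy_df` and its model names are hypothetical), not the `mpg` dataset:

```python
import pandas as pd

# Hypothetical toy data with a tie for the largest difference.
toy_df = pd.DataFrame({
    'model': ['car_a', 'car_b', 'car_c', 'car_d'],
    'mileage_difference': [12, 8, 12, 3],
})

# Keep every row whose value equals the maximum, however many there are.
max_diff = toy_df['mileage_difference'].max()
top = toy_df[toy_df['mileage_difference'] == max_diff]
print(top)
```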


<hr style="border:0.5px solid grey">

<b>i. Which compact class car has the lowest highway mileage? The best?</b>

<b>Answer:</b>
   - Worst Highway Mileage: 1999 VW Jetta - 6 cyl (Highway: 23)
   - Best Highway Mileage: 1999 VW Jetta - 4 cyl (Highway: 44)

In [37]:
#create a variable of ONLY compact cars
compact_cars = mpg_df[mpg_df['class'] == 'compact']

In [38]:
#take a look
compact_cars.head()

Unnamed: 0,manufacturer,model,displ,year,cyl,trans,drv,city,highway,fl,class,mileage_difference
1,audi,a4,1.8,1999,4,auto(l5),f,18,29,p,compact,11
2,audi,a4,1.8,1999,4,manual(m5),f,21,29,p,compact,8
3,audi,a4,2.0,2008,4,manual(m6),f,20,31,p,compact,11
4,audi,a4,2.0,2008,4,auto(av),f,21,30,p,compact,9
5,audi,a4,2.8,1999,6,auto(l5),f,16,26,p,compact,10


In [39]:
#lowest (worst) highway mileage
compact_cars.sort_values('highway').head(1)

Unnamed: 0,manufacturer,model,displ,year,cyl,trans,drv,city,highway,fl,class,mileage_difference
220,volkswagen,jetta,2.8,1999,6,auto(l4),f,16,23,r,compact,7


In [40]:
#highest (best) highway mileage
compact_cars.sort_values('highway').tail(1)

Unnamed: 0,manufacturer,model,displ,year,cyl,trans,drv,city,highway,fl,class,mileage_difference
213,volkswagen,jetta,1.9,1999,4,manual(m5),f,33,44,d,compact,11
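
An alternative to sorting is to ask for the index label of the extreme value with `idxmin` and `idxmax`, then look that row up with `.loc`. Both return only the first matching label, so this approach hides ties. The frame here is made-up (`toy_df` and its model names are hypothetical):

```python
import pandas as pd

# Hypothetical toy data; the index starts at 1 like the pydataset frames.
toy_df = pd.DataFrame(
    {'model': ['car_a', 'car_b', 'car_c'], 'highway': [29, 23, 44]},
    index=[1, 2, 3],
)

# idxmin / idxmax return the index label of the smallest / largest value.
worst = toy_df.loc[toy_df['highway'].idxmin()]
best = toy_df.loc[toy_df['highway'].idxmax()]

print(worst['model'], worst['highway'])
print(best['model'], best['highway'])
```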


<hr style="border:0.5px solid grey">

<b>j. Create a column named average_mileage that is the mean of the city and highway mileage.</b>

In [41]:
#create a column for average mileage
mpg_df['average_mileage']= (mpg_df['highway'] + mpg_df['city'])/2

#take a look at the new column
mpg_df

Unnamed: 0,manufacturer,model,displ,year,cyl,trans,drv,city,highway,fl,class,mileage_difference,average_mileage
1,audi,a4,1.8,1999,4,auto(l5),f,18,29,p,compact,11,23.5
2,audi,a4,1.8,1999,4,manual(m5),f,21,29,p,compact,8,25.0
3,audi,a4,2.0,2008,4,manual(m6),f,20,31,p,compact,11,25.5
4,audi,a4,2.0,2008,4,auto(av),f,21,30,p,compact,9,25.5
5,audi,a4,2.8,1999,6,auto(l5),f,16,26,p,compact,10,21.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
230,volkswagen,passat,2.0,2008,4,auto(s6),f,19,28,p,midsize,9,23.5
231,volkswagen,passat,2.0,2008,4,manual(m6),f,21,29,p,midsize,8,25.0
232,volkswagen,passat,2.8,1999,6,auto(l5),f,16,26,p,midsize,10,21.0
233,volkswagen,passat,2.8,1999,6,manual(m5),f,18,26,p,midsize,8,22.0


<hr style="border:0.5px solid grey">

<b>k. Which dodge car has the best average mileage? The worst?</b>

<b>Answer:</b>
   - Best Mileage: 1999 Dodge Caravan (21)
   - Worst Mileage: 2008 Dodge Ram 1500 (10.5)
   

In [42]:
#assign variable to ONLY dodge
dodge_cars = mpg_df[mpg_df['manufacturer'] == 'dodge']
dodge_cars.head()

Unnamed: 0,manufacturer,model,displ,year,cyl,trans,drv,city,highway,fl,class,mileage_difference,average_mileage
38,dodge,caravan 2wd,2.4,1999,4,auto(l3),f,18,24,r,minivan,6,21.0
39,dodge,caravan 2wd,3.0,1999,6,auto(l4),f,17,24,r,minivan,7,20.5
40,dodge,caravan 2wd,3.3,1999,6,auto(l4),f,16,22,r,minivan,6,19.0
41,dodge,caravan 2wd,3.3,1999,6,auto(l4),f,16,22,r,minivan,6,19.0
42,dodge,caravan 2wd,3.3,2008,6,auto(l4),f,17,24,r,minivan,7,20.5


In [43]:
#highest (best) average_mileage
dodge_cars.sort_values('average_mileage').tail(1)

Unnamed: 0,manufacturer,model,displ,year,cyl,trans,drv,city,highway,fl,class,mileage_difference,average_mileage
38,dodge,caravan 2wd,2.4,1999,4,auto(l3),f,18,24,r,minivan,6,21.0


In [44]:
#lowest (worst) average_mileage
dodge_cars.sort_values('average_mileage').head(1)

Unnamed: 0,manufacturer,model,displ,year,cyl,trans,drv,city,highway,fl,class,mileage_difference,average_mileage
70,dodge,ram 1500 pickup 4wd,4.7,2008,8,manual(m6),4,9,12,e,pickup,3,10.5


<hr style="border:1px solid black">

### #3. Load the ```Mammals``` dataset. Read the documentation for it, and use the data to answer these questions:

In [45]:
#assign dataframe
mammals_df = data('Mammals')

In [46]:
#let's look at the docs to see what it holds
data('Mammals', show_doc=True)

Mammals

PyDataset Documentation (adopted from R Documentation. The displayed examples are in R)

## Garland(1983) Data on Running Speed of Mammals

### Description

Observations on the maximal running speed of mammal species and their body
mass.

### Usage

    data(Mammals)

### Format

A data frame with 107 observations on the following 4 variables.

weight

Body mass in Kg for "typical adult sizes"

speed

Maximal running speed (fastest sprint velocity on record)

hoppers

logical variable indicating animals that ambulate by hopping, e.g. kangaroos

specials

logical variable indicating special animals with "lifestyles in which speed
does not figure as an important factor": Hippopotamus, raccoon (Procyon),
badger (Meles), coati (Nasua), skunk (Mephitis), man (Homo), porcupine
(Erithizon), oppossum (didelphis), and sloth (Bradypus)

### Details

Used by Chappell (1989) and Koenker, Ng and Portnoy (1994) to illustrate the
fitting of piecewise linear curves.

### Source

Garland, T. (

In [47]:
#take a look at the dataframe
mammals_df.head()

Unnamed: 0,weight,speed,hoppers,specials
1,6000.0,35.0,False,False
2,4000.0,26.0,False,False
3,3000.0,25.0,False,False
4,1400.0,45.0,False,False
5,400.0,70.0,False,False


<b>a. How many rows and columns are there?</b>
- <b>Answer</b>:
    - rows: 107
    - columns: 4

In [48]:
#take a look at how many 
mammals_df.shape

(107, 4)

<hr style="border:0.5px solid grey">

<b>b. What are the data types?</b>

In [49]:
mammals_df.dtypes

weight      float64
speed       float64
hoppers        bool
specials       bool
dtype: object

<hr style="border:0.5px solid grey">

<b>c. Summarize the dataframe with .info and .describe</b>

In [50]:
#what columns, dtypes, and nulls do we have
mammals_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 107 entries, 1 to 107
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   weight    107 non-null    float64
 1   speed     107 non-null    float64
 2   hoppers   107 non-null    bool   
 3   specials  107 non-null    bool   
dtypes: bool(2), float64(2)
memory usage: 2.7 KB


In [51]:
#statistical summary
mammals_df.describe()

Unnamed: 0,weight,speed
count,107.0,107.0
mean,278.688178,46.208411
std,839.608269,26.716778
min,0.016,1.6
25%,1.7,22.5
50%,34.0,48.0
75%,142.5,65.0
max,6000.0,110.0


<hr style="border:0.5px solid grey">

<b>d. What is the weight of the fastest animal?</b>
- <b>Answer</b>: 55 kg

In [52]:
#get highest speed only
mammals_df.sort_values(['speed'], ascending=(False)).head(1)

Unnamed: 0,weight,speed,hoppers,specials
53,55.0,110.0,False,False


<hr style="border:0.5px solid grey">

<b>e. What is the overall percentage of specials?</b>
- <b>Answer</b>:
    - True: 9.35%
    - False: 90.65%

In [53]:
#how many special animals vs non-special are there
mammals_df['specials'].value_counts()

False    97
True     10
Name: specials, dtype: int64

In [54]:
#Alfred Style!
#return the proportion of True vs False
mammals_df['specials'].value_counts(normalize=True)

False    0.906542
True     0.093458
Name: specials, dtype: float64

In [55]:
#what is the percent of special
percent_specials = (mammals_df['specials'].value_counts())/ len(mammals_df)

#take a look
percent_specials

False    0.906542
True     0.093458
Name: specials, dtype: float64
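
Because `True` counts as 1 and `False` as 0, the mean of a boolean column is the share of `True` values, which gives the percentage directly. A sketch on a made-up series (this `specials` is hypothetical data with one `True` out of four, not the Mammals column):

```python
import pandas as pd

# Hypothetical boolean series: 1 of 4 values is True.
specials = pd.Series([False, True, False, False], name='specials')

# Mean of booleans = proportion of True values.
share_true = specials.mean()

# normalize=True returns proportions instead of raw counts.
shares = specials.value_counts(normalize=True)

print(round(share_true * 100, 2))
print(shares)
```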

<hr style="border:0.5px solid grey">

<b>f. How many animals are hoppers that are above the median speed? What percentage is this?</b>

- <b>Answer</b>: 7 animals, which is 6.54% of all the mammals in the dataset

In [56]:
#first, find out how many are hoppers
mammals_df['hoppers'].value_counts()

False    96
True     11
Name: hoppers, dtype: int64

In [57]:
#second, what is the median speed
median = mammals_df['speed'].median()
median

48.0

In [58]:
#can also find the median like this:
mammals_df[['speed']].describe()

Unnamed: 0,speed
count,107.0
mean,46.208411
std,26.716778
min,1.6
25%,22.5
50%,48.0
75%,65.0
max,110.0


In [59]:
#hoppers and speed greater than median
mammals_df[(mammals_df.hoppers == True) & (mammals_df.speed > mammals_df.speed.median())]

Unnamed: 0,weight,speed,hoppers,specials
96,4.6,64.0,True,False
97,4.4,72.0,True,False
98,4.0,72.0,True,False
99,3.5,56.0,True,False
100,2.0,64.0,True,False
101,1.9,56.0,True,False
102,1.5,50.0,True,False


In [60]:
#count what percent are BOTH hoppers AND faster than median speed
round(len(mammals_df[(mammals_df.hoppers == True) & 
      (mammals_df.speed > mammals_df.speed.median())]) / len(mammals_df) *100, 2)

6.54
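
The count and the percentage can both come from one boolean mask, built once and reused. The sketch below uses made-up values (`toy_df` is hypothetical). Note the parentheses around the comparison: `&` binds more tightly than `>`, so leaving them out changes the meaning of the expression.

```python
import pandas as pd

# Hypothetical toy data; the median speed here is 50.0.
toy_df = pd.DataFrame({
    'speed': [10.0, 50.0, 60.0, 70.0, 30.0],
    'hoppers': [True, True, False, True, False],
})

# True only where the animal hops AND is faster than the median.
fast_hoppers = toy_df['hoppers'] & (toy_df['speed'] > toy_df['speed'].median())

count = fast_hoppers.sum()                      # how many rows match
percent = round(fast_hoppers.mean() * 100, 2)   # share of all rows, as a percent

print(count, percent)
```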