# Basic Statistics I: Percents

A **percentage** is a number or ratio expressed as a fraction of 100. We'll do some examples together to learn how to calculate percentages.

**Example 1:** A basket of 18 fruits contains 5 apples, 3 bananas, 6 peaches, and 4 oranges.

What percentage of fruits are apples? 

In [1]:
# Calculate percentage for apples
5/18*100

27.77777777777778

What percentage of fruits are oranges **and** peaches combined?

In [2]:
# Calculate percentage for oranges and peaches
(4+6)/18*100

55.55555555555556
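Both calculations above follow the same pattern: divide the part by the whole, then multiply by 100. As a minimal sketch, the pattern can be wrapped in a small helper. The function name `percentage` and the dictionary `basket` are illustrative names that do not appear in the original notebook.

```python
def percentage(part, whole):
    """Return part as a percentage of whole (illustrative helper)."""
    if whole == 0:
        raise ValueError("whole must be nonzero")
    return part / whole * 100

# Fruit counts from Example 1
basket = {"apples": 5, "bananas": 3, "peaches": 6, "oranges": 4}
total_fruits = sum(basket.values())  # 18 fruits in total

# Percentage of apples, and of oranges and peaches combined
apple_pct = percentage(basket["apples"], total_fruits)
orange_peach_pct = percentage(basket["oranges"] + basket["peaches"], total_fruits)

# Percentage for every fruit; these should add up to 100
all_pcts = {fruit: percentage(count, total_fruits) for fruit, count in basket.items()}

print(round(apple_pct, 2), round(orange_peach_pct, 2))
```

A useful sanity check is that the percentages of all the categories add up to 100 (up to floating-point rounding).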

**Example 2:** Let's calculate percentages using real-world data. We will work with a dataset of housing prices in Ames, Iowa.

In [6]:
# Import the fetch_openml function and download the Ames housing data
from sklearn.datasets import fetch_openml
housing = fetch_openml(name="house_prices", as_frame=True, parser="auto")

In [7]:
# Import pandas, so that we can work with the data frame version of the Ames housing data
import pandas as pd

In [10]:
# Load the dataset of house prices in Ames, and convert to
# a data frame format so it's easier to view and process
ames_df = pd.DataFrame(housing['data'], columns = housing['feature_names'])
ames_df['SalePrice'] = housing.target
ames_df

Unnamed: 0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,...,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
0,1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,...,0,,,,0,2,2008,WD,Normal,208500
1,2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,...,0,,,,0,5,2007,WD,Normal,181500
2,3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,...,0,,,,0,9,2008,WD,Normal,223500
3,4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,...,0,,,,0,2,2006,WD,Abnorml,140000
4,5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,...,0,,,,0,12,2008,WD,Normal,250000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1455,1456,60,RL,62.0,7917,Pave,,Reg,Lvl,AllPub,...,0,,,,0,8,2007,WD,Normal,175000
1456,1457,20,RL,85.0,13175,Pave,,Reg,Lvl,AllPub,...,0,,MnPrv,,0,2,2010,WD,Normal,210000
1457,1458,70,RL,66.0,9042,Pave,,Reg,Lvl,AllPub,...,0,,GdPrv,Shed,2500,5,2010,WD,Normal,266500
1458,1459,20,RL,68.0,9717,Pave,,Reg,Lvl,AllPub,...,0,,,,0,4,2010,WD,Normal,142125


The `SaleCondition` column lists the condition of the house sale:


* `Normal`: Normal Sale
* `Abnorml`: Abnormal Sale - trade, foreclosure, short sale
* `AdjLand`: Adjoining Land Purchase
* `Alloca`: Allocation - two linked properties with separate deeds, typically condo with a garage unit
* `Family`: Sale between family members
* `Partial`: Home was not completed when last assessed (associated with New Homes)


What percentage of the houses were sold normally? We'll see how to do this in two ways: with the `query` method and with boolean indexing.

In [12]:
# Determine the number of houses sold normally in two ways:
# (1) with the query method
num_normal = len(ames_df.query("SaleCondition == 'Normal'"))
num_normal

1198

In [13]:
# (2) using boolean indexing
num_normal = sum(ames_df["SaleCondition"] == "Normal")
num_normal

1198

Why do these two methods give the same answer?
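One way to see why the counts agree is to try both approaches on a tiny data frame. The sketch below uses a hypothetical four-row `toy_df` (not taken from the Ames data). The comparison produces one `True`/`False` value per row; `query` keeps the rows where the condition holds, while summing the booleans counts each `True` as 1.

```python
import pandas as pd

# Hypothetical toy data, not taken from the Ames dataset
toy_df = pd.DataFrame({"SaleCondition": ["Normal", "Abnorml", "Normal", "Family"]})

# The comparison produces a boolean Series, one True/False per row
is_normal = toy_df["SaleCondition"] == "Normal"

# (1) query keeps only the rows where the condition holds; len counts them
count_query = len(toy_df.query("SaleCondition == 'Normal'"))

# (2) summing treats True as 1 and False as 0, so it counts the True rows
count_bool = int(is_normal.sum())

print(is_normal.tolist(), count_query, count_bool)
```

Both approaches count the rows that satisfy the same condition, so they should always match.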

In [14]:
# Determine the total number of houses in the dataset
total_num = len(ames_df)

# Now calculate the percentage of houses sold normally.
num_normal/total_num*100

82.05479452054794
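If the percentage of every category is of interest, not only `Normal`, pandas can compute them all at once with `value_counts(normalize=True)`. The sketch below uses a hypothetical four-element `toy_conditions` Series rather than the Ames data.

```python
import pandas as pd

# Hypothetical toy data, not taken from the Ames dataset
toy_conditions = pd.Series(["Normal", "Normal", "Normal", "Abnorml"])

# normalize=True returns fractions instead of counts;
# multiplying by 100 turns the fractions into percentages
condition_pcts = toy_conditions.value_counts(normalize=True) * 100

print(condition_pcts)
```

Applied to the notebook's data, `ames_df["SaleCondition"].value_counts(normalize=True) * 100` should list one percentage per sale condition, with the `Normal` entry matching the value computed above.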

What percentage of houses have a price less than $200,000?

In [15]:
# Determine number of houses that cost less than $200,000
num_cost_less_200k = sum(ames_df["SalePrice"] < 200000)

# Calculate the percentage of houses that cost less than $200k.
num_cost_less_200k/total_num*100

70.2054794520548
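Because `True` counts as 1 and `False` as 0, the mean of a boolean Series is the fraction of `True` values. This gives a shortcut that skips the separate count and division. The sketch below uses a hypothetical `toy_prices` Series, not the Ames data.

```python
import pandas as pd

# Hypothetical toy prices, not taken from the Ames dataset
toy_prices = pd.Series([150000, 250000, 180000, 320000])

# Boolean Series: True where the price is below $200,000
under_200k = toy_prices < 200000

# Count-and-divide, as in the cell above
pct_via_count = under_200k.sum() / len(toy_prices) * 100

# Shortcut: the mean of the booleans is the fraction of True values
pct_via_mean = under_200k.mean() * 100

print(pct_via_count, pct_via_mean)
```

On the notebook's data, `(ames_df["SalePrice"] < 200000).mean() * 100` should reproduce the percentage computed above.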

What percentage of houses have a sale price **between** $200,000 and $500,000?

In [17]:
# Make a Series of booleans: True where the price is greater than $200,000 AND less than $500,000
between_200k_and_500k = (ames_df["SalePrice"] > 200000) & (ames_df["SalePrice"] < 500000)

# Determine number of houses that cost between $200,000 and $500,000
num_between_200k_and_500k = sum(between_200k_and_500k)

# Calculate the percentage of houses between $200,000 and $500,000
num_between_200k_and_500k/total_num*100

28.63013698630137
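The range check above uses strict inequalities, so a house sold for exactly $200,000 or $500,000 is not counted. pandas also offers `Series.between` for range checks. The sketch below assumes pandas 1.3 or later, where the `inclusive` argument accepts `"both"`, `"neither"`, `"left"`, or `"right"`, and uses a hypothetical `toy_prices` Series to show how the choice of boundaries changes the result.

```python
import pandas as pd

# Hypothetical toy prices, including values exactly on the boundaries
toy_prices = pd.Series([150000, 200000, 250000, 500000, 320000])

# Strict inequalities, as in the cell above
strict_mask = (toy_prices > 200000) & (toy_prices < 500000)

# Same check with between; inclusive="neither" excludes both endpoints
between_mask = toy_prices.between(200000, 500000, inclusive="neither")

# inclusive="both" (the default) also counts the boundary values
inclusive_mask = toy_prices.between(200000, 500000, inclusive="both")

pct_strict = between_mask.mean() * 100
pct_inclusive = inclusive_mask.mean() * 100

print(pct_strict, pct_inclusive)
```

Whether the endpoints belong in the range depends on the question being asked, so it is worth stating the choice explicitly.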

Good work! You just learned how to calculate percentages in Python!