# Introduction to Statistics Part II


Now that we have learned how to use the mean and median, we'll talk about some more advanced statistics: count statistics and percentages.

In [20]:
# import pandas and numpy

import numpy as np
import pandas as pd

## Count Statistics

*Count variables* are variables that represent the number of times an event in a specific category occurs. This can be anything, like the number of dogs in a park or how many people went to a concert. In both of these examples, each count must be a *whole number*.

Run the cell below to load a listing of the weather in Detroit for every day since 1950:

In [2]:
data_table = pd.read_csv( 'detroit_weather.csv' ) # Data from Mathematica WeatherData, 2019

Take a look at the contents of `data_table`:

In [3]:
# Print out data_table to look at its contents

data_table

Unnamed: 0.1,Unnamed: 0,YEAR,MONTH,DAY,Rain,Snow
0,0,1950,1,1,True,False
1,1,1950,1,2,True,False
2,2,1950,1,3,True,False
3,3,1950,1,4,True,True
4,4,1950,1,5,False,False
5,5,1950,1,6,False,True
6,6,1950,1,7,False,True
7,7,1950,1,8,False,True
8,8,1950,1,9,False,False
9,9,1950,1,10,True,False


This table records whether it rained and whether it snowed on each day in Detroit since 1950. We will use it as an example dataset.

In [19]:
# Look up the weather for May 1, 2019:

data_table.query( 'YEAR==2019 and MONTH==5 and DAY==1' )

# or...

data_table.query( 'YEAR==2019' ).query( 'MONTH==5' ).query( 'DAY==1' )

Unnamed: 0.1,Unnamed: 0,YEAR,MONTH,DAY,Rain,Snow
25313,25313,2019,5,1,True,False


As we can see, it was raining, but not snowing that day!
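The same lookup can also be written with a boolean mask instead of `query`. The sketch below is only an illustration: `sample` is a hypothetical three-row stand-in for `data_table`, built from rows shown in the outputs above, so that the example runs without the CSV file.

```python
import pandas as pd

# Hypothetical stand-in for data_table: three rows copied from the outputs above
sample = pd.DataFrame({
    "YEAR":  [1950, 1950, 2019],
    "MONTH": [1, 1, 5],
    "DAY":   [1, 5, 1],
    "Rain":  [True, False, True],
    "Snow":  [False, False, False],
})

# Select May 1, 2019 with query, as in the cell above
by_query = sample.query("YEAR == 2019 and MONTH == 5 and DAY == 1")

# Select the same row with a boolean mask (one True/False value per row)
mask = (sample["YEAR"] == 2019) & (sample["MONTH"] == 5) & (sample["DAY"] == 1)
by_mask = sample[mask]

print(by_mask)
```

Both approaches select the same row; `query` is usually shorter to read, while a mask can be stored in a variable and reused.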

Now, let's create some count statistics!

In [5]:
# Import the Counter class from collections to help us do the counting

from collections import Counter

`Counter` gives us an easy way to count the contents of any `list`:

In [6]:
# Create a list and count it using Counter

Counter( [1,1,1,1,1,2,2,2,2,2] )

Counter({1: 5, 2: 5})
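`Counter` is not limited to numbers. As a small sketch, the list `observations` below is made-up example data (it is not taken from the Detroit dataset), and it shows a few common ways to read the counts back out:

```python
from collections import Counter

# Hypothetical example data: the weather noted on six different days
observations = ["rain", "snow", "rain", "clear", "rain", "snow"]
weather_counts = Counter(observations)

# Look up the count for a single category
print(weather_counts["rain"])

# Find the most frequent category and its count
print(weather_counts.most_common(1))

# Add up all of the counts to get the total number of items
print(sum(weather_counts.values()))
```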

Now, let's count the weather data!

In [7]:
# Count how many days it has snowed in Detroit since 1950:

snow_days = Counter( data_table["Snow"] )
snow_days

Counter({False: 21079, True: 4235})

It looks like it has snowed on 4,235 days in that time period. That is a lot!

In [8]:
# Count how many days *per month* it has snowed since 1950:

snow_days_by_month = Counter( data_table.query( 'Snow' )["MONTH"] )
snow_days_by_month

Counter({1: 1110,
         2: 903,
         3: 648,
         4: 227,
         11: 369,
         12: 933,
         5: 10,
         10: 34,
         8: 1})

In [9]:
# How many days *per month* has it NOT snowed since 1950?

not_snow_days_by_month = Counter( data_table.query( 'not Snow' )["MONTH"] )
not_snow_days_by_month

Counter({1: 1060,
         2: 1074,
         3: 1522,
         4: 1873,
         5: 2130,
         6: 2070,
         7: 2139,
         8: 2137,
         9: 2069,
         10: 2105,
         11: 1701,
         12: 1199})

In [10]:
# How many days TOTAL have there been in each month since 1950?

days_by_month = Counter( data_table["MONTH"] )
days_by_month

Counter({1: 2170,
         2: 1977,
         3: 2170,
         4: 2100,
         5: 2140,
         6: 2070,
         7: 2139,
         8: 2138,
         9: 2069,
         10: 2139,
         11: 2070,
         12: 2132})
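Every day either had snow or did not, so for each month the snow days and the non-snow days should add up to the total number of days. The sketch below checks this using the counts copied from the three outputs above. It relies on the fact that two `Counter` objects can be added with `+`, which sums the counts key by key.

```python
from collections import Counter

# Counts copied from the outputs above
snow_days_by_month = Counter({1: 1110, 2: 903, 3: 648, 4: 227, 11: 369,
                              12: 933, 5: 10, 10: 34, 8: 1})
not_snow_days_by_month = Counter({1: 1060, 2: 1074, 3: 1522, 4: 1873,
                                  5: 2130, 6: 2070, 7: 2139, 8: 2137,
                                  9: 2069, 10: 2105, 11: 1701, 12: 1199})
days_by_month = Counter({1: 2170, 2: 1977, 3: 2170, 4: 2100, 5: 2140,
                         6: 2070, 7: 2139, 8: 2138, 9: 2069, 10: 2139,
                         11: 2070, 12: 2132})

# Adding two Counters sums the counts for each key
combined = snow_days_by_month + not_snow_days_by_month

# The combined counts should match the total days in every month
counts_are_consistent = (combined == days_by_month)
print(counts_are_consistent)
```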

## Percentages

A *percentage* represents the fraction of items in a set that fall into a given category. It can be calculated by dividing the number of items in the category by the total number of items in the set. This gives a number between 0 and 1; multiplying it by 100 expresses it as a percent.
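As a worked example of this calculation, the sketch below uses the overall snow counts shown earlier (4,235 days with snow and 21,079 days without). The variable names are chosen only for illustration.

```python
# Counts taken from the snow_days output shown earlier
snowy_days = 4235
non_snowy_days = 21079
total_days = snowy_days + non_snowy_days

# Divide the category count by the total to get a fraction between 0 and 1
snow_fraction = snowy_days / total_days

# The "%" format code multiplies by 100 and adds a percent sign
snow_percent_text = f"{snow_fraction:.1%}"
print(snow_fraction, snow_percent_text)
```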

We can use all of the counts that we created above to calculate percentages describing the weather in Detroit!

In [17]:
# Find the percentage of days in January where it snowed:

snow_days_by_month[1] / days_by_month[1]

0.511520737327189

A percentage of 51% means that, on average, it snowed on about 1 of every 2 days each January (1/2 = 50%).

In [18]:
# Now do the same for June:

snow_days_by_month[6] / days_by_month[6]

0.0

It shouldn't come as much of a surprise that it doesn't snow much in summer!
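Notice that June never appears in `snow_days_by_month`, yet the division still worked. This is because a `Counter` returns 0 for a key it has never seen, where a plain `dict` would raise a `KeyError`. The sketch below uses that behavior to calculate the snow percentage for every month at once, with the counts copied from the outputs above. The names `snow_fraction_by_month` and `snowiest_month` are introduced only for this illustration.

```python
from collections import Counter

# Counts copied from the outputs above
snow_days_by_month = Counter({1: 1110, 2: 903, 3: 648, 4: 227, 11: 369,
                              12: 933, 5: 10, 10: 34, 8: 1})
days_by_month = Counter({1: 2170, 2: 1977, 3: 2170, 4: 2100, 5: 2140,
                         6: 2070, 7: 2139, 8: 2138, 9: 2069, 10: 2139,
                         11: 2070, 12: 2132})

# Months with no snow days are missing from the Counter and count as 0
snow_fraction_by_month = {
    month: snow_days_by_month[month] / days_by_month[month]
    for month in sorted(days_by_month)
}

# Print each month's fraction as a percent with one decimal place
for month, fraction in snow_fraction_by_month.items():
    print(f"Month {month:2d}: {fraction:.1%}")

# Find the month with the largest fraction of snow days
snowiest_month = max(snow_fraction_by_month, key=snow_fraction_by_month.get)
print(snowiest_month)
```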

In this lesson you learned how to:

- Calculate count statistics using data from `pandas`
- Calculate percentages from count statistics

Now, let's continue to practice with your partner!