## Bonus: Other Recommended Analyses

### Temperature Analysis I
* Hawaii is reputed to enjoy mild weather all year.
* Is there a meaningful difference between the temperature in June and December?

```python
# Import dependencies
import pandas as pd
from datetime import datetime as dt
from scipy import stats
```

```python
# Read the CSV file into a DataFrame
df = pd.read_csv('hawaii_measurements.csv')
df.head()
```

| | station | date | prcp | tobs |
|---|---|---|---|---|
| 0 | USC00519397 | 2010-01-01 | 0.08 | 65 |
| 1 | USC00519397 | 2010-01-02 | 0.0 | 63 |
| 2 | USC00519397 | 2010-01-03 | 0.0 | 74 |
| 3 | USC00519397 | 2010-01-04 | 0.0 | 76 |
| 4 | USC00519397 | 2010-01-06 | NaN | 73 |


```python
# Check the data type of the date column
df.date.dtype
```

dtype('O')

```python
# Convert the date column from strings (object dtype) to datetime64.
# infer_datetime_format is deprecated since pandas 2.0; format inference is now the default.
df['date'] = pd.to_datetime(df['date'])
```

```python
# Set the date column as the DataFrame index
df = df.set_index(df['date'])
df
```

| date (index) | station | date | prcp | tobs |
|---|---|---|---|---|
| 2010-01-01 | USC00519397 | 2010-01-01 | 0.08 | 65 |
| 2010-01-02 | USC00519397 | 2010-01-02 | 0.00 | 63 |
| 2010-01-03 | USC00519397 | 2010-01-03 | 0.00 | 74 |
| 2010-01-04 | USC00519397 | 2010-01-04 | 0.00 | 76 |
| 2010-01-06 | USC00519397 | 2010-01-06 | NaN | 73 |
| ... | ... | ... | ... | ... |
| 2017-08-19 | USC00516128 | 2017-08-19 | 0.09 | 71 |
| 2017-08-20 | USC00516128 | 2017-08-20 | NaN | 78 |
| 2017-08-21 | USC00516128 | 2017-08-21 | 0.56 | 76 |
| 2017-08-22 | USC00516128 | 2017-08-22 | 0.50 | 76 |


```python
# Drop the now-redundant date column (its values are in the index)
df = df.drop(columns='date')
df
```

| date | station | prcp | tobs |
|---|---|---|---|
| 2010-01-01 | USC00519397 | 0.08 | 65 |
| 2010-01-02 | USC00519397 | 0.00 | 63 |
| 2010-01-03 | USC00519397 | 0.00 | 74 |
| 2010-01-04 | USC00519397 | 0.00 | 76 |
| 2010-01-06 | USC00519397 | NaN | 73 |
| ... | ... | ... | ... |
| 2017-08-19 | USC00516128 | 0.09 | 71 |
| 2017-08-20 | USC00516128 | NaN | 78 |
| 2017-08-21 | USC00516128 | 0.56 | 76 |
| 2017-08-22 | USC00516128 | 0.50 | 76 |
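
The conversion, indexing, and column drop in the cells above can also be done in a single read. A minimal sketch, assuming the CSV has the `station`, `date`, `prcp`, and `tobs` header shown in the first output; it reads the five rows displayed by `df.head()` from an in-memory string instead of `hawaii_measurements.csv` so that it runs on its own.

```python
import io

import pandas as pd

# The first five rows shown by df.head() above, inlined so the sketch runs
# without hawaii_measurements.csv (assumed to have the same header).
csv_text = """station,date,prcp,tobs
USC00519397,2010-01-01,0.08,65
USC00519397,2010-01-02,0.0,63
USC00519397,2010-01-03,0.0,74
USC00519397,2010-01-04,0.0,76
USC00519397,2010-01-06,,73
"""

# parse_dates converts 'date' to datetime64 while reading, and index_col
# moves it into the index, so no separate set_index/drop step is needed.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

print(df.index.dtype)
print(list(df.columns))          # ['station', 'prcp', 'tobs']
print(df["prcp"].isna().sum())   # the empty prcp cell is read as NaN
```

Because the index is built once from the parsed column, the date values are not duplicated and no intermediate DataFrame is created.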


### Compare June and December Dates
* Identify the average temperature for June and December at all stations across all available years in the dataset.

```python
# Filter data for June
jun_data = df[df.index.month == 6]
jun_data
```

| date | station | prcp | tobs |
|---|---|---|---|
| 2010-06-01 | USC00519397 | 0.00 | 78 |
| 2010-06-02 | USC00519397 | 0.01 | 76 |
| 2010-06-03 | USC00519397 | 0.00 | 78 |
| 2010-06-04 | USC00519397 | 0.00 | 76 |
| 2010-06-05 | USC00519397 | 0.00 | 77 |
| ... | ... | ... | ... |
| 2017-06-26 | USC00516128 | 0.02 | 79 |
| 2017-06-27 | USC00516128 | 0.10 | 74 |
| 2017-06-28 | USC00516128 | 0.02 | 74 |
| 2017-06-29 | USC00516128 | 0.04 | 76 |


```python
# Filter data for December
dec_data = df[df.index.month == 12]
dec_data
```

| date | station | prcp | tobs |
|---|---|---|---|
| 2010-12-01 | USC00519397 | 0.04 | 76 |
| 2010-12-03 | USC00519397 | 0.00 | 74 |
| 2010-12-04 | USC00519397 | 0.00 | 74 |
| 2010-12-06 | USC00519397 | 0.00 | 64 |
| 2010-12-07 | USC00519397 | 0.00 | 64 |
| ... | ... | ... | ... |
| 2016-12-27 | USC00516128 | 0.14 | 71 |
| 2016-12-28 | USC00516128 | 0.14 | 71 |
| 2016-12-29 | USC00516128 | 1.03 | 69 |
| 2016-12-30 | USC00516128 | 2.37 | 65 |


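```python
# Calculate the June means; numeric_only=True skips the text 'station' column
# (without it, pandas 2.x raises a TypeError instead of dropping the column)
jun_data.mean(numeric_only=True)
```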

```
prcp     0.136360
tobs    74.944118
dtype: float64
```

```python
# Calculate the December means; numeric_only=True skips the text 'station' column
dec_data.mean(numeric_only=True)
```

```
prcp     0.216819
tobs    71.041529
dtype: float64
```
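
The month filters above rely on the `month` attribute of the `DatetimeIndex`. A minimal sketch of the same filtering, plus a one-pass alternative with `groupby`, using only the first five June and first five December rows printed earlier as stand-in data (so its averages will not match the full-dataset means above).

```python
import pandas as pd

# First five June and first five December rows from the outputs above
# (a small stand-in for the full dataset).
dates = pd.to_datetime([
    "2010-06-01", "2010-06-02", "2010-06-03", "2010-06-04", "2010-06-05",
    "2010-12-01", "2010-12-03", "2010-12-04", "2010-12-06", "2010-12-07",
])
df = pd.DataFrame(
    {
        "station": ["USC00519397"] * 10,
        "prcp": [0.00, 0.01, 0.00, 0.00, 0.00, 0.04, 0.00, 0.00, 0.00, 0.00],
        "tobs": [78, 76, 78, 76, 77, 76, 74, 74, 64, 64],
    },
    index=pd.DatetimeIndex(dates, name="date"),
)

# Boolean masks on the month number of each index timestamp.
jun_data = df[df.index.month == 6]
dec_data = df[df.index.month == 12]

# numeric_only=True skips the text 'station' column.
print(jun_data.mean(numeric_only=True))
print(dec_data.mean(numeric_only=True))

# Alternative: average every month at once, then pick June (6) and December (12).
monthly = df.groupby(df.index.month)[["prcp", "tobs"]].mean()
print(monthly.loc[[6, 12]])
```

The `groupby` version scales to all twelve months without writing one filter per month.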

```python
# Select the June temperature observations (tobs) as a Series
jun_temp = jun_data.tobs
jun_temp
```

```
date
2010-06-01    78
2010-06-02    76
2010-06-03    78
2010-06-04    76
2010-06-05    77
              ..
2017-06-26    79
2017-06-27    74
2017-06-28    74
2017-06-29    76
2017-06-30    75
Name: tobs, Length: 1700, dtype: int64
```

```python
# Select the December temperature observations (tobs) as a Series
dec_temp = dec_data.tobs
dec_temp
```

```
date
2010-12-01    76
2010-12-03    74
2010-12-04    74
2010-12-06    64
2010-12-07    64
              ..
2016-12-27    71
2016-12-28    71
2016-12-29    69
2016-12-30    65
2016-12-31    65
Name: tobs, Length: 1517, dtype: int64
```

### Use a t-test to determine whether the difference in means, if any, is statistically significant
* A paired t-test compares the means of the same group or item under two separate conditions. It is computed on the per-pair differences, so the two conditions need not have equal variances, but every observation must have a matched partner.
* An unpaired (independent two-sample) t-test compares the means of two independent or unrelated groups.
* The standard (Student's) unpaired t-test assumes the two groups have equal variances; Welch's variant drops that assumption.
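
In SciPy, these variants map to different calls: `stats.ttest_rel` for paired samples, `stats.ttest_ind` for independent samples with pooled variance (Student's test, the default `equal_var=True`), and `stats.ttest_ind(..., equal_var=False)` for Welch's test. A minimal sketch on the first five June and December temperatures printed above; these are illustrative stand-ins, and pairing them position by position only demonstrates the call, since June and December readings are not natural pairs.

```python
from scipy import stats

# First five June and December tobs values from the outputs above.
jun = [78, 76, 78, 76, 77]
dec = [76, 74, 74, 64, 64]

# Paired: tests whether the mean of the per-pair differences is zero.
# Needs equal-length samples where position i in each list is a matched pair.
paired = stats.ttest_rel(jun, dec)

# Unpaired, Student's t-test: pools the two sample variances.
student = stats.ttest_ind(jun, dec)  # equal_var=True is the default

# Unpaired, Welch's t-test: does not assume equal variances.
welch = stats.ttest_ind(jun, dec, equal_var=False)

for name, res in [("paired", paired), ("student", student), ("welch", welch)]:
    print(f"{name:8s} t={res.statistic:.3f} p={res.pvalue:.4f}")
```

With equal sample sizes, Student's and Welch's statistics coincide and only the degrees of freedom (and therefore the p-value) differ; with unequal sizes such as the 1,700 June and 1,517 December readings, the statistics generally differ as well.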

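### An unpaired t-test was chosen for this analysis
* The June and December readings come from different days and differ in number (1,700 vs. 1,517 observations), so they cannot be matched into pairs. `stats.ttest_ind` runs the unpaired (independent two-sample) test.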

```python
# Run an unpaired (independent two-sample) t-test comparing the June and December means
stats.ttest_ind(jun_temp, dec_temp)
```

Ttest_indResult(statistic=31.60372399000329, pvalue=3.9025129038616655e-191)
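
With its default `equal_var=True`, `stats.ttest_ind` divides the difference in sample means by a standard error built from the pooled variance, then reads a two-sided p-value from a t distribution with n₁ + n₂ − 2 degrees of freedom. A minimal sketch that computes this by hand and checks it against SciPy; `pooled_t_test` is a hypothetical helper, and the short lists are a few June and December readings from the outputs above, not the full `jun_temp` and `dec_temp` series.

```python
import numpy as np
from scipy import stats


def pooled_t_test(a, b):
    """Student's two-sample t-test with pooled variance (two-sided)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = a.size, b.size
    # Sample variances with Bessel's correction (ddof=1).
    s1, s2 = a.var(ddof=1), b.var(ddof=1)
    dof = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 + (n2 - 1) * s2) / dof
    se = np.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (a.mean() - b.mean()) / se
    # Two-sided p-value from the t distribution's survival function.
    p = 2 * stats.t.sf(abs(t), dof)
    return t, p


# A few readings from the outputs above (illustrative subset, unequal sizes).
jun = [78, 76, 78, 76, 77, 75]
dec = [76, 74, 74, 64, 64]

t, p = pooled_t_test(jun, dec)
ref = stats.ttest_ind(jun, dec)
print(t, ref.statistic)
print(p, ref.pvalue)
```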

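## Analysis Result
* The mean temperature was about 74.9 °F in June and 71.0 °F in December, a difference of only about 3.9 °F. The t-test p-value is about 3.9 × 10⁻¹⁹¹, far below any conventional significance level such as 0.05, so the null hypothesis of equal means is rejected: the difference is statistically significant. Its practical size, however, is small, which is consistent with Hawaii's reputation for mild weather all year.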
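
The decision rule behind this conclusion can be written out explicitly. A minimal sketch, assuming the conventional significance level α = 0.05 (the notebook does not state one) and plugging in the means and p-value reported above; `interpret_ttest` is a hypothetical helper name.

```python
def interpret_ttest(pvalue, alpha=0.05):
    """Return the decision for a two-sided t-test at significance level alpha."""
    # A p-value below alpha means a difference this large would be very
    # unlikely if the true means were equal, so the null hypothesis is rejected.
    if pvalue < alpha:
        return "reject H0: the means differ significantly"
    return "fail to reject H0: no significant difference detected"


# Values reported in the outputs above.
jun_mean, dec_mean = 74.944118, 71.041529
pvalue = 3.9025129038616655e-191

print(f"difference in means: {jun_mean - dec_mean:.1f} °F")  # 3.9 °F
print(interpret_ttest(pvalue))  # reject H0: the means differ significantly
```

Statistical significance and practical size are separate questions: with over 3,000 observations, even a few degrees of difference yields an extremely small p-value.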