# Bonus: Temperature Analysis I

In [1]:
import pandas as pd
from datetime import datetime as dt

In [2]:
# "tobs" is "temperature observations"
df = pd.read_csv('Resources/hawaii_measurements.csv')
df.head()

,station,date,prcp,tobs
0,USC00519397,2010-01-01,0.08,65
1,USC00519397,2010-01-02,0.0,63
2,USC00519397,2010-01-03,0.0,74
3,USC00519397,2010-01-04,0.0,76
4,USC00519397,2010-01-06,,73


In [3]:
# Convert the date column from string to datetime and confirm the new dtype
df['date'] = pd.to_datetime(df['date'])
df.dtypes

station            object
date       datetime64[ns]
prcp              float64
tobs                int64
dtype: object

In [4]:
# Set the date column as the DataFrame index
df1 = df.set_index('date')
df1

date,station,prcp,tobs
2010-01-01,USC00519397,0.08,65
2010-01-02,USC00519397,0.00,63
2010-01-03,USC00519397,0.00,74
2010-01-04,USC00519397,0.00,76
2010-01-06,USC00519397,,73
...,...,...,...
2017-08-19,USC00516128,0.09,71
2017-08-20,USC00516128,,78
2017-08-21,USC00516128,0.56,76
2017-08-22,USC00516128,0.50,76


In [5]:
# Drop the date index, replacing it with the default integer index
df2 = df1.reset_index(drop=True)
df2

,station,prcp,tobs
0,USC00519397,0.08,65
1,USC00519397,0.00,63
2,USC00519397,0.00,74
3,USC00519397,0.00,76
4,USC00519397,,73
...,...,...,...
19545,USC00516128,0.09,71
19546,USC00516128,,78
19547,USC00516128,0.56,76
19548,USC00516128,0.50,76


### Compare June and December data across all years 

In [6]:
from scipy import stats

In [7]:
# Filter data for the desired months using the DatetimeIndex
df_jun = df1.loc[df1.index.month == 6]
df_dec = df1.loc[df1.index.month == 12]
df_dec.head()

date,station,prcp,tobs
2010-12-01,USC00519397,0.04,76
2010-12-03,USC00519397,0.0,74
2010-12-04,USC00519397,0.0,74
2010-12-06,USC00519397,0.0,64
2010-12-07,USC00519397,0.0,64


In [8]:
# Identify the average temperature for June
df_jun_avg = df_jun['tobs'].mean()
df_jun_avg

74.94411764705882

In [9]:
# Identify the average temperature for December
df_dec_avg = df_dec['tobs'].mean()
df_dec_avg

71.04152933421226

In [10]:
# Create collections of temperature data for each month
ctemp_jun = df_jun['tobs']
ctemp_dec = df_dec['tobs']
ctemp_dec

date
2010-12-01    76
2010-12-03    74
2010-12-04    74
2010-12-06    64
2010-12-07    64
              ..
2016-12-27    71
2016-12-28    71
2016-12-29    69
2016-12-30    65
2016-12-31    65
Name: tobs, Length: 1517, dtype: int64

In [15]:
# Run an independent (unpaired) two-sample t-test; stats.ttest_ind does not pair observations
print(f"Independent t-test result: {stats.ttest_ind(ctemp_jun, ctemp_dec)}")

Independent t-test result: Ttest_indResult(statistic=31.60372399000329, pvalue=3.9025129038616655e-191)


### Analysis

A paired t-test (also known as a dependent or correlated t-test) is a statistical test that compares the means of two related groups, where each observation in one group is matched with an observation in the other, to determine whether there is a significant difference between them.

Two-sample t-tests are statistical tests used to compare the means of two independent populations. Also known as Student's t-tests, their results are used to determine whether the difference between the means of two samples is unlikely to be due to sampling error or random chance. The June and December observations are separate groups rather than matched pairs, so the test above uses `stats.ttest_ind`, the independent two-sample version.
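The distinction between the two tests can be sketched with SciPy. This is a minimal illustration on synthetic data, not the Hawaii measurements: the sample sizes, means, and the names `june`, `december`, `ind_result`, and `paired_result` are all assumptions made for the example. `equal_var=False` selects Welch's variant, which is one option when the two groups may have different variances; the notebook cell above uses the default setting.

```python
import numpy as np
from scipy import stats

# Synthetic temperatures (hypothetical values, not the Hawaii measurements)
rng = np.random.default_rng(seed=0)
june = rng.normal(loc=75, scale=3, size=200)
december = rng.normal(loc=71, scale=4, size=150)

# Independent two-sample t-test: the groups may differ in size.
# equal_var=False requests Welch's t-test, which does not assume equal variances.
ind_result = stats.ttest_ind(june, december, equal_var=False)

# Paired t-test: requires equal-length arrays of matched observations.
# The slicing below pairs values arbitrarily, purely to show the call;
# in a real paired design each pair would come from the same unit.
paired_result = stats.ttest_rel(june[:150], december)

print("independent:", ind_result.statistic, ind_result.pvalue)
print("paired:     ", paired_result.statistic, paired_result.pvalue)
```

Note that `stats.ttest_rel` needs both inputs to have the same length, which is one practical reason an unpaired test fits two months with different numbers of observations.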

A p-value of 0.05 or lower is generally considered statistically significant. Our p-value of 3.9025129038616655e-191 is far below that threshold, so the difference between the June and December means is statistically significant.

The mean temperature difference between June and December is a mere 3.9 degrees Fahrenheit. So while the difference is statistically significant, it is small in practical terms, which suggests you can travel to Hawaii and enjoy temperatures in the 70s year-round!
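A p-value says nothing about how large a difference is. One common way to quantify magnitude separately from significance is a standardized effect size such as Cohen's d. The sketch below is an assumption-laden illustration: the helper `cohens_d` and the small sample lists are hypothetical and are not part of the original analysis.

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n_a, n_b = len(a), len(b)
    # Pooled variance weights each sample variance by its degrees of freedom
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Hypothetical temperature readings, for illustration only
sample_a = [76, 74, 78, 75, 73]
sample_b = [71, 69, 72, 70, 68]
d = cohens_d(sample_a, sample_b)
print(f"Cohen's d: {d:.2f}")
```

In the notebook, the same helper could be called as `cohens_d(ctemp_jun, ctemp_dec)` to report the standardized difference alongside the t-test result.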