Step 3 - Climate Analysis and Exploration

You are now ready to use Python and SQLAlchemy to do basic climate analysis and data exploration on your new weather station tables. All of the following analysis should be completed using SQLAlchemy ORM queries, Pandas, and Matplotlib.


Create a Jupyter Notebook file called climate_analysis.ipynb and use it to complete your climate analysis and data exploration.
Choose a start date and end date for your trip. Make sure that your vacation range is approximately 3-15 days total.
Use SQLAlchemy create_engine to connect to your sqlite database.
Use SQLAlchemy automap_base() to reflect your tables into classes and save a reference to those classes called Station and Measurement.
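The trip-date step above can be checked with a few lines of standard-library Python. This is only a sketch: the helper name trip_length_days and the trip dates are hypothetical, chosen for illustration, and the day count is assumed to include both the start and end date.

```python
from datetime import date


def trip_length_days(start: str, end: str) -> int:
    """Return the inclusive number of days between two %Y-%m-%d dates."""
    start_date = date.fromisoformat(start)
    end_date = date.fromisoformat(end)
    if end_date < start_date:
        raise ValueError("end date must not precede start date")
    # Count both the first and the last day of the trip.
    return (end_date - start_date).days + 1


# Hypothetical trip dates, used only for illustration.
trip_start, trip_end = "2018-05-03", "2018-05-10"
length = trip_length_days(trip_start, trip_end)

# True when the trip falls inside the suggested 3-15 day range.
in_range = 3 <= length <= 15
print(length, in_range)
```

Keeping the dates as %Y-%m-%d strings means the same values can later be passed to the date filters in the queries.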

In [None]:
# Import dependencies
import pandas as pd
import seaborn
import matplotlib.pyplot as plt
from sqlalchemy import create_engine
from sqlalchemy.ext.automap import automap_base

# Apply seaborn's default plot styling
seaborn.set()


In [None]:
# Connect to the SQLite database
engine = create_engine("sqlite:///hawaii.sqlite")

In [None]:
# Reflect the database tables into mapped classes
# (SQLAlchemy 1.4+; older releases used Base.prepare(engine, reflect=True))
Base = automap_base()
Base.prepare(autoload_with=engine)

In [None]:
#creating a session
from sqlalchemy.orm import sessionmaker
Session=sessionmaker(bind=engine)
session=Session()

In [None]:
# Save references to the reflected classes
Station = Base.classes.station
Measurement = Base.classes.measurement

Precipitation Analysis


Design a query to retrieve the last 12 months of precipitation data.
Select only the date and prcp values.
Load the query results into a Pandas DataFrame and set the index to the date column.
Plot the results using the DataFrame plot method.
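One way to pick the cutoff for the 12-month window is to step back 365 days from the most recent date in the data. The sketch below assumes dates are stored as %Y-%m-%d strings, as the queries in this notebook suggest. The helper name twelve_month_cutoff, the latest date, and the sample rows are all hypothetical.

```python
from datetime import date, timedelta


def twelve_month_cutoff(latest: str) -> str:
    """Return the %Y-%m-%d date 365 days before `latest`."""
    return (date.fromisoformat(latest) - timedelta(days=365)).isoformat()


# Hypothetical most recent date; in practice it would be read from the table.
latest_date = "2017-08-23"
cutoff = twelve_month_cutoff(latest_date)

# Hypothetical (date, prcp) rows. ISO-formatted strings sort in date order,
# so a plain string comparison is enough to select the window.
rows = [("2016-08-01", 0.10), ("2016-09-01", 0.00), ("2017-08-23", 0.45)]
recent = [row for row in rows if row[0] >= cutoff]
print(cutoff, recent)
```

The resulting string could then be used in place of a hard-coded date in a filter on Measurement.date.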




In [None]:
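# Query every date and prcp value with raw SQL
# (engine.execute() was removed in SQLAlchemy 2.0, so the statement runs through the session)
from sqlalchemy import text
results = session.execute(text("SELECT date, prcp FROM measurement")).fetchall()
results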

In [None]:
# ORM query for date and prcp after the chosen cutoff date, oldest first
prcp_results = session.query(Measurement.date, Measurement.prcp).\
    filter(Measurement.date > '2017-05-03').\
    order_by(Measurement.date).all()

In [None]:
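# Name the columns explicitly so later steps can refer to "date" and "prcp"
prcp_df = pd.DataFrame(prcp_results, columns=["date", "prcp"])
prcp_df.head()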

In [None]:
new_prcp_df=prcp_df.set_index("date")
new_prcp_df.head()

In [None]:
new_prcp_df.plot()
plt.xticks(rotation=45)
plt.xlabel("date")
plt.legend()
plt.show()

In [None]:
# Summary statistics of precipitation
new_prcp_df.describe()

Station Analysis


Design a query to calculate the total number of stations.

Design a query to find the most active stations.


List the stations and observation counts in descending order.
Which station has the highest number of observations?



Design a query to retrieve the last 12 months of temperature observation data (tobs).


Filter by the station with the highest number of observations.
Plot the results as a histogram with bins=12.
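The station steps above come down to a grouped count sorted in descending order. The sketch below shows that pattern with the standard-library sqlite3 module and an in-memory table, so it runs without the hawaii.sqlite file. The table layout is a simplified assumption and every row and station code is made up. Unlike the notebook, it counts distinct stations from the measurement table rather than from a separate station table.

```python
import sqlite3

# In-memory database with a simplified, assumed measurement table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurement (station TEXT, date TEXT, tobs REAL)")

# Hypothetical rows, for illustration only.
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?)",
    [
        ("ST_A", "2017-01-01", 70.0),
        ("ST_B", "2017-01-01", 68.0),
        ("ST_A", "2017-01-02", 72.0),
        ("ST_C", "2017-01-01", 75.0),
        ("ST_A", "2017-01-03", 71.0),
        ("ST_B", "2017-01-02", 69.0),
    ],
)

# Total number of distinct stations that reported observations.
total_stations = conn.execute(
    "SELECT COUNT(DISTINCT station) FROM measurement"
).fetchone()[0]

# Observation count per station, most active first.
station_counts = conn.execute(
    """
    SELECT station, COUNT(*) AS observations
    FROM measurement
    GROUP BY station
    ORDER BY observations DESC
    """
).fetchall()

# The first row belongs to the station with the most observations.
most_active = station_counts[0][0]
conn.close()
print(total_stations, station_counts, most_active)
```

The same grouping can be expressed through the ORM by combining func.count with group_by and order_by on the reflected Measurement class.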

In [None]:
# Total number of stations
from sqlalchemy import func
session.query(func.count(Station.station_id)).all()

In [None]:
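# The most active stations, ordered by observation count (highest first)
# (engine.execute() was removed in SQLAlchemy 2.0, so the statement runs through the session)
from sqlalchemy import text
station_data = session.execute(text("""select station, count(measurement_id) as observations
from measurement
group by station
order by observations desc""")).fetchall()
station_data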

In [None]:
# Station with the highest number of observations
best_station = station_data[0][0]

In [None]:
# Temperature observations (tobs) after the cutoff date for the most active station
tobs = session.query(Measurement.date, Measurement.tobs).\
    filter(Measurement.date > '2017-05-03').\
    filter(Measurement.station == best_station).\
    order_by(Measurement.date).all()
# Show only the first 10 rows
tobs[:10]

In [None]:
# Name the columns explicitly so the histogram can refer to "tobs"
tobs_df = pd.DataFrame(tobs, columns=["date", "tobs"])
tobs_df.head()

In [None]:
#plot the histogram
tobs_df.hist("tobs", bins=12)
plt.xlabel("temperature")
plt.ylabel("frequency")
plt.show()

Temperature Analysis


Write a function called calc_temps that will accept a start date and end date in the format %Y-%m-%d and return the minimum, average, and maximum temperatures for that range of dates.
Use the calc_temps function to calculate the min, avg, and max temperatures for your trip using the matching dates from the previous year (e.g. use "2017-01-01" if your trip start date was "2018-01-01").
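Shifting the trip dates back one year can be done with the standard library. In this sketch the helper name previous_year is hypothetical, and mapping February 29 to February 28 is an assumption about how to handle leap days, not something the assignment specifies.

```python
from datetime import date


def previous_year(day: str) -> str:
    """Return the same calendar day one year earlier, as %Y-%m-%d."""
    d = date.fromisoformat(day)
    try:
        return d.replace(year=d.year - 1).isoformat()
    except ValueError:
        # February 29 has no counterpart in a non-leap year (assumed fallback).
        return d.replace(year=d.year - 1, day=28).isoformat()


# The example from the text: a trip starting on 2018-01-01.
prior_start = previous_year("2018-01-01")
print(prior_start)
```

The shifted start and end strings are in the format that calc_temps expects as arguments.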

Plot the min, avg, and max temperature from your previous query as a bar chart.


Use the average temperature as the bar height.
Use the peak-to-peak (tmax-tmin) value as the y error bar (yerr).
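The bar height and error bar described above follow directly from the three summary values. Below is a minimal sketch with made-up temperatures; the helper name bar_summary is hypothetical.

```python
from statistics import mean


def bar_summary(temps):
    """Return (bar height, y error) as (average, tmax - tmin)."""
    tmin, tavg, tmax = min(temps), mean(temps), max(temps)
    return tavg, tmax - tmin


# Hypothetical temperature observations in degrees Fahrenheit.
temps = [68.0, 70.0, 72.0, 74.0, 76.0]
height, yerr = bar_summary(temps)
print(height, yerr)
```

These two numbers map onto the height and yerr arguments of plt.bar.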

In [None]:
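def calc_temps(start, end):
    """Return the min, average, and max tobs between start and end (inclusive, %Y-%m-%d)."""
    temp = session.query(Measurement.date, Measurement.tobs).\
        filter(Measurement.date >= start).\
        filter(Measurement.date <= end).\
        all()
    temp_df = pd.DataFrame(temp, columns=["date", "tobs"])
    return temp_df.tobs.min(), temp_df.tobs.mean(), temp_df.tobs.max()


# Date range used for the trip temperature summary
low, mid, hi = calc_temps("2017-05-03", "2017-05-10")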

In [None]:
# Single bar at the average temperature, with tmax - tmin as the error bar
plt.bar(0, mid, yerr=hi - low)
plt.axis([-1, 1, 0, 100])
plt.xticks([])
plt.ylabel("Temp (F)")
plt.title("Trip Avg Temp")
plt.show()