# Jupyter Notebook Demo
Here I will show a few of the basic data visualization and manipulation tools available with Python Pandas.

Here is a Pandas [cheatsheet](https://assets.datacamp.com/blog_assets/PandasPythonForDataScience.pdf).

I grabbed one year of electricity data for demonstration purposes. **production** is the solar production during each interval, in watt-hours. **consumption** is the electricity consumed by the house during each interval, in watt-hours. The goal is simply to examine the data more closely. Sources:
* [Production](http://www.soda-is.com/eng/services/services_radiation_free_eng.php)
* [Consumption](http://www.smartgridaustralia.com.au/)

In [None]:
import pandas as pd

## Read data from csv
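When a DataFrame has more rows than pandas' `display.max_rows` option allows, the notebook shows a truncated table containing only the first and last few rows. The exact counts depend on your pandas version and display settings.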

In [None]:
df = pd.read_csv('data.csv', index_col=0, parse_dates=[0])
df
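The truncation is governed by pandas display options rather than by Jupyter itself. Below is a minimal sketch of inspecting the row limit and raising it temporarily with `pd.get_option` and `pd.option_context`. The value 200 is an arbitrary choice; in a notebook you would call `display(df)` inside the `with` block to see more rows of the real data.

```python
import pandas as pd

# Current row limit before truncation; the default can differ between pandas versions.
original_max_rows = pd.get_option("display.max_rows")
print("display.max_rows =", original_max_rows)

# Raise the limit only inside this block (200 is an arbitrary choice).
with pd.option_context("display.max_rows", 200):
    inside_max_rows = pd.get_option("display.max_rows")
    print("inside context:", inside_max_rows)

# On exit, pandas restores the previous setting.
print("after context:", pd.get_option("display.max_rows"))
```

Using the context manager avoids permanently changing the option for the rest of the notebook, which `pd.set_option` would do.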

We can use `DataFrame.head(n)` and `DataFrame.tail(n)` to display the first or last `n` rows (`n` defaults to 5). Note that `print()` produces plain-text output; only the last expression in a cell is rendered as the nicely formatted table.

In [None]:
print("Head of df")
print(df.head(10))
print("\nTail of df")
df.tail(10)
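Since `data.csv` is not reproduced here, the sketch below builds a hypothetical stand-in with the same shape (2013 at 30-minute resolution, made-up values) and shows `head()` with its default `n` and `tail()` with an explicit `n`. The name `demo` and its values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.csv: 2013 at 30-minute resolution, made-up values.
index = pd.date_range("2013-01-01", periods=365 * 48, freq="30min")
demo = pd.DataFrame(
    {"production": np.arange(len(index), dtype=float),
     "consumption": np.ones(len(index))},
    index=index,
)

first = demo.head()    # n defaults to 5
last = demo.tail(10)   # the last 10 rows
print(first)
print(last.index[-1])  # final timestamp in the series
```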

We can see that we have one year of data, from 2013, in 30-minute increments.
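Rather than reading the spacing off the table, it can be checked programmatically. A minimal sketch, again using a hypothetical synthetic index in place of the real `df` (swap `demo` for `df` in the notebook): the differences between consecutive timestamps should collapse to a single value if the series is regular.

```python
import numpy as np
import pandas as pd

# Hypothetical regular index standing in for df.index.
index = pd.date_range("2013-01-01", periods=365 * 48, freq="30min")
demo = pd.DataFrame({"production": np.zeros(len(index))}, index=index)

# Gaps between consecutive timestamps; the leading NaT is dropped by value_counts().
steps = demo.index.to_series().diff().value_counts()
print(steps)

# Overall coverage and row count.
print(demo.index.min(), demo.index.max(), len(demo))

# pandas' guess at the frequency string (its spelling varies by pandas version).
print(pd.infer_freq(demo.index))
```

If the real data had gaps or duplicated timestamps, `steps` would show more than one interval length.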

# Plot the data
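Jupyter supports two options for plotting with matplotlib, selected with the `%matplotlib` magic: *inline* (static images) and *notebook* (interactive figures you can pan and zoom; this backend targets the classic Notebook interface).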

In [None]:
%matplotlib inline
df.plot()

Crap. Something looks funny with the production data. What's going on?! Let's use an interactive plot so we can zoom in.

In [None]:
%matplotlib notebook
df.plot(subplots=True)  # optionally add figsize=(13, 8) for a larger figure
df.describe()
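If the interactive backend is not available, you can still "zoom in" by slicing the DatetimeIndex with partial date strings and plotting only that window. A minimal sketch on a hypothetical synthetic frame; the one-week window in June is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df at 30-minute resolution.
index = pd.date_range("2013-01-01", periods=365 * 48, freq="30min")
demo = pd.DataFrame({"production": np.zeros(len(index)),
                     "consumption": np.ones(len(index))}, index=index)

# Partial-string slicing on a sorted DatetimeIndex includes both end dates in full.
week = demo.loc["2013-06-01":"2013-06-07"]
print(len(week))  # 7 days x 48 half-hours = 336 rows

# In the notebook, week.plot(subplots=True) would draw only this window.
```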

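Looks like some missing data! The production values drop to -999, a placeholder for missing readings rather than a real measurement. Let's look at the production values equal to -999.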

In [None]:
df.loc[df['production'] == -999, 'production'].describe()
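A boolean mask also gives the count and the positions of the sentinel directly. A minimal sketch on a small hypothetical series (the values are made up); with the real data you would build the mask from `df['production']`, and the index would hold timestamps.

```python
import pandas as pd

# Hypothetical production readings containing the -999 missing-data sentinel.
production = pd.Series([0.0, 120.5, -999.0, 340.2, -999.0, 15.0])

is_missing = production == -999
n_missing = int(is_missing.sum())                    # True counts as 1
missing_at = production[is_missing].index.tolist()   # where the sentinel occurs

print(n_missing)
print(missing_at)
```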

Ruhroh. 154 missing data points! And now the eternal question of how to handle missing data: set them to NaN? Set them to 0? Throw them out completely? I'll set them to 0 for now.
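To see how much the choice matters, here is a minimal sketch of the three options on a small hypothetical series (made-up values, not the real data): `replace` with `np.nan`, `replace` with 0, and filtering the rows out.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with two missing values marked as -999.
production = pd.Series([0.0, 120.5, -999.0, 340.2, -999.0, 15.0])

# Option 1: mark as missing; pandas skips NaN in count(), mean(), sum(), etc.
as_nan = production.replace(-999, np.nan)

# Option 2: treat missing intervals as zero production.
as_zero = production.replace(-999, 0)

# Option 3: drop the affected rows entirely.
dropped = production[production != -999]

for name, s in [("NaN", as_nan), ("zero", as_zero), ("dropped", dropped)]:
    print(f"{name:8s} count={s.count()} mean={s.mean():.2f}")
```

Zeroing pulls the mean down because the missing intervals are counted as real zero readings, while NaN and dropping both average only the observed values. NaN has the advantage of keeping the regular 30-minute index intact.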

In [None]:
df.loc[df.production == -999, 'production'] = 0
df.describe()

Be careful! Assigning with `.loc` modifies `df` in place in the kernel's memory, so every cell you run from now on uses the updated `df`. Rerun the plotting cells above to see the difference.
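If you want to keep the raw values around, one option is to clean an explicit copy instead of the original frame. A minimal sketch; the names `raw` and `clean` and the values are hypothetical.

```python
import pandas as pd

# Hypothetical raw data containing the -999 sentinel.
raw = pd.DataFrame({"production": [10.0, -999.0, 30.0]})

# Work on an independent copy so the raw data stays untouched.
clean = raw.copy()
clean.loc[clean["production"] == -999, "production"] = 0

print(raw["production"].tolist())    # original still holds the sentinel
print(clean["production"].tolist())  # cleaned copy has 0 instead
```

This way, rerunning the cleaning cell is harmless, and you can switch strategies (for example, NaN instead of 0) without reloading the CSV.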

## Save this notebook!
*File* --> *Download As*