* Anaconda
  * Out of the box modules
  * Environments
  * Spyder
* Jupyter Project
  * Formerly IPython Notebooks
  * Better support for non-Python kernels
  * Navigation
  * Tips and tricks
* Pandas
  * Series
  * DataFrame
  * Input functions
  * Boolean Indexing
  * Groupby and Pivot
  * Merge
  * Map and Apply
* Bokeh
  * Interactive data visualizations
  * Python -> JavaScript
* Blaze
  * Data transfer
  * Querying

In [None]:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
from IPython.display import FileLink, display, HTML
import blaze as bz

## Jupyter Project

In [None]:
os.path

In [None]:
HTML("hello.txt")

In [None]:
from IPython.display import YouTubeVideo
# Extra keyword arguments are passed to YouTube as URL parameters, so use autoplay=1
YouTubeVideo("dQw4w9WgXcQ", width=600, height=400, start=43, autoplay=1)

In [None]:
from IPython.display import Image
Image("rr.jpg")

In [None]:
! ls

In [None]:
! ping www.omahapython.org

In [None]:
%timeit range(1000)
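`%timeit` is an IPython magic, so it only works inside Jupyter or IPython. In a plain Python script the standard-library `timeit` module gives a comparable measurement. A minimal sketch (timing `list(range(1000))` so the work is actually done, not just a lazy `range` object created):

```python
import timeit

# Run the statement 1,000 times and report the average, similar to what %timeit prints
runs = 1000
elapsed = timeit.timeit("list(range(1000))", number=runs)
print(f"{elapsed / runs * 1e6:.2f} µs per loop")
```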

In [None]:
url="https://github.com/ipython/ipython/wiki/IPython-kernels-for-other-languages"
HTML(url)

## Pandas

### Why
* Data scrubbing and formatting through the power of python
* Data aggregation
* Data visualization through matplotlib, seaborn, bokeh, etc.
* Hooks to machine learning and other modules
* Combined with Jupyter, allows you to build a forgiving and dynamic workflow  

In [None]:
#reading in a csv file
df = pd.read_csv("bike_data_sub.csv",parse_dates=["start_date","end_date"])

In [None]:
from sqlalchemy import create_engine
engine = create_engine("postgresql://tbmh1:postgres@localhost:5432/bike_data")
df.to_sql("bike_trips",engine, schema="public", if_exists="replace",index=False)
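The cell above needs a running PostgreSQL server. To try the same `to_sql` round trip without one, pandas also accepts a plain `sqlite3` connection. A minimal sketch on a small made-up table (the values are hypothetical, not taken from the bike data):

```python
import sqlite3

import pandas as pd

# Toy stand-in for the bike trips table (hypothetical values)
trips = pd.DataFrame({"trip_id": [1, 2, 3], "duration": [540, 1200, 305]})

# An in-memory SQLite database; pandas accepts a sqlite3 connection directly
conn = sqlite3.connect(":memory:")
trips.to_sql("bike_trips", conn, if_exists="replace", index=False)

# Read the table back with an ordinary SQL query
long_trips = pd.read_sql(
    "SELECT trip_id, duration FROM bike_trips WHERE duration > 500 ORDER BY trip_id",
    conn,
)
print(long_trips)
```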

In [None]:
#Series and DataFrames
df.head(5)
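`df.head()` shows a DataFrame, but the relationship between the two core structures is easier to see on a tiny hand-built example. A sketch with made-up values: a Series is a one-dimensional labeled array, and a DataFrame is a set of Series sharing one index.

```python
import pandas as pd

# A Series: one-dimensional values plus an index of labels
durations = pd.Series([540, 1200, 305], index=["trip_a", "trip_b", "trip_c"], name="duration")

# A DataFrame: several columns (each a Series) that share the same row index
trips = pd.DataFrame({
    "start_station": ["San Jose City Hall", "Japantown", "San Jose City Hall"],
    "duration": [540, 1200, 305],
})

# Selecting one column of a DataFrame returns a Series
col = trips["duration"]
print(type(col).__name__)
print(durations["trip_b"])
```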

In [None]:
# Summary statistics for every numeric column in the DataFrame, whether you want them or not :-)
df.describe()

In [None]:
# Even more: column dtypes, non-null counts, and memory usage
df.info()

In [None]:
# Filtering with a boolean mask (named "mask" to avoid shadowing the built-in filter)
mask = (df['start_station'] == 'San Jose City Hall')
df[mask].head()
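Boolean masks can be combined. A sketch on a toy frame (station names and durations are illustrative): use `&` for "and", `|` for "or", and `~` for "not", wrapping each condition in parentheses, and use `isin()` to match several values at once.

```python
import pandas as pd

trips = pd.DataFrame({
    "start_station": ["San Jose City Hall", "Japantown", "San Jose City Hall", "SJSU 4th at San Carlos"],
    "duration": [540, 1200, 305, 90],
})

# Both conditions must hold; the parentheses are required because & binds tighter than ==
mask = (trips["start_station"] == "San Jose City Hall") & (trips["duration"] > 300)
print(trips[mask])

# isin() tests membership in a list of values
others = trips[trips["start_station"].isin(["Japantown", "SJSU 4th at San Carlos"])]
print(others)
```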

In [None]:
# Groupby and sorting (DataFrame.sort() was removed from pandas; use sort_values())
# df[['bike_num','duration']].groupby('bike_num').count().sort_values('duration', ascending=False).head()
df[['bike_num','duration']].groupby('bike_num').agg({'duration':['mean','sum']}).sort_values(('duration','sum'), ascending=False).head()
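Passing a dict of lists to `agg()` produces a two-level column index, which is why the sort key above is the tuple `('duration', 'sum')`. Named aggregation gives flat, readable column names instead. A sketch on made-up data:

```python
import pandas as pd

trips = pd.DataFrame({
    "bike_num": [7, 7, 12, 12, 12, 30],
    "duration": [100, 300, 50, 60, 70, 900],
})

# Each keyword becomes an output column: new_name=(source_column, aggregation)
per_bike = (
    trips.groupby("bike_num")
    .agg(trips_taken=("duration", "count"), total_duration=("duration", "sum"))
    .sort_values("total_duration", ascending=False)
)
print(per_bike)
```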

In [None]:
pd.pivot_table(df, values='duration', index='subscription_type', columns='start_station', aggfunc=len, fill_value=0)
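With `aggfunc=len` the pivot table simply counts trips per (subscription type, start station) pair. For pure frequency counts, `pd.crosstab` builds the same table directly from two columns. A sketch on toy data comparing the two:

```python
import pandas as pd

trips = pd.DataFrame({
    "subscription_type": ["Subscriber", "Customer", "Subscriber", "Subscriber"],
    "start_station": ["Japantown", "Japantown", "San Jose City Hall", "Japantown"],
    "duration": [540, 1200, 305, 90],
})

# Count rows per (subscription_type, start_station); missing pairs are filled with 0
counts = pd.pivot_table(trips, values="duration", index="subscription_type",
                        columns="start_station", aggfunc="count", fill_value=0)

# crosstab computes the same frequency table without naming a values column
xtab = pd.crosstab(trips["subscription_type"], trips["start_station"])
print(xtab)
```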

## pivot_table() is just the tip of the iceberg when it comes to reshaping in Pandas
http://pandas.pydata.org/pandas-docs/stable/reshaping.html

In [None]:
#merging two dataframes
dfRev = pd.read_csv('bike_trip_rev')
dfRev.head()

In [None]:
pd.merge(df,dfRev).head()
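Called with no keys, `pd.merge` does an inner join on every column name the two frames share. Being explicit about the key and join type avoids surprises. A sketch with a hypothetical revenue table (the real `bike_trip_rev` columns may differ):

```python
import pandas as pd

trips = pd.DataFrame({"trip_id": [1, 2, 3], "duration": [540, 1200, 305]})
# Hypothetical revenue data; trip 2 has no revenue row, trip 4 has no trip row
revenue = pd.DataFrame({"trip_id": [1, 3, 4], "revenue": [2.0, 1.5, 3.0]})

# Default behavior: inner join on all shared columns (here only trip_id)
inner = pd.merge(trips, revenue)

# Keep every trip; indicator=True adds a _merge column recording where each row came from
left = pd.merge(trips, revenue, on="trip_id", how="left", indicator=True)
print(left)
```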

In [None]:
#Return the day of the week as an integer, where Monday is 0 and Sunday is 6.
df['start_dow'] = df['start_date'].apply(lambda x : x.weekday())
df['end_dow'] = df['end_date'].apply(lambda x : x.weekday())
df.head()

In [None]:
def getHour(x):    
    return x.hour

In [None]:
df['start_hour'] = df['start_date'].apply(getHour)
df['end_hour'] = df['end_date'].apply(getHour)
df.head()
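`apply` with a lambda or a named function like `getHour` works for any per-element logic. For datetime columns specifically, the `.dt` accessor does the same extraction in vectorized form and is usually faster. A sketch on two sample timestamps:

```python
import pandas as pd

trips = pd.DataFrame({"start_date": pd.to_datetime(["2014-08-31 23:26", "2014-09-01 08:15"])})

# Vectorized equivalents of apply(lambda x: x.weekday()) and apply(getHour)
trips["start_dow"] = trips["start_date"].dt.weekday   # Monday is 0, Sunday is 6
trips["start_hour"] = trips["start_date"].dt.hour
# day_name() returns the weekday name, an alternative to mapping through a dict
trips["start_dow_name"] = trips["start_date"].dt.day_name()
print(trips)
```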

In [None]:
#dict keys = integers representing days and values = days of the week
d = {0 : 'Monday', 1 : 'Tuesday', 2 : 'Wednesday', 3 : 'Thursday', 4 : 'Friday',  5 : 'Saturday', 6 : 'Sunday'}
d

In [None]:
#map the values to our dataframe
df['start_dow_name'] = df['start_dow'].map(d)
df.head()
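One detail of `Series.map` with a dict: any value that is not a key becomes `NaN` instead of raising an error, so it is worth checking for missing results after mapping. `map` also accepts a function. A sketch with a deliberately invalid day number:

```python
import pandas as pd

dow = pd.Series([0, 5, 6, 9])
names = {0: "Monday", 1: "Tuesday", 2: "Wednesday", 3: "Thursday",
         4: "Friday", 5: "Saturday", 6: "Sunday"}

# 9 is not a key in the dict, so it maps to NaN
mapped = dow.map(names)
print(mapped)

# With a function, map() applies it to each element
is_weekend = dow.map(lambda d: d >= 5)
```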

## Pandas Resources
### Using SQL as a comparison
http://www.gregreda.com/2013/10/26/intro-to-pandas-data-structures/

http://www.gregreda.com/2013/10/26/working-with-pandas-dataframes/

http://www.gregreda.com/2013/10/26/using-pandas-on-the-movielens-dataset/

### Videos
https://www.youtube.com/watch?v=MxRMXhjXZos

https://www.youtube.com/watch?v=w26x-z-BdWQ

https://www.youtube.com/watch?v=rEalbu8UGeo

### Books
Python for Data Analysis (Wes McKinney): http://www.amazon.com/Python-Data-Analysis-Wrangling-IPython/dp/1449319793/ref=sr_1_1?ie=UTF8&qid=1414785088&sr=8-1&keywords=python+for+data+analysis


## Bokeh

In [None]:
# pandas integration with matplotlib (column names match those created above)
df[["start_dow_name","duration"]].groupby("start_dow_name").count().plot(kind="barh");
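The chart above is a static matplotlib image. Bokeh renders the same kind of bar chart as an interactive JavaScript plot from Python. A minimal sketch, assuming a recent Bokeh release; the per-day counts are hypothetical placeholders for the groupby result:

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Hypothetical trip counts per start day (stand-in for the groupby result)
counts = pd.Series({"Monday": 120, "Tuesday": 135, "Wednesday": 128, "Thursday": 131,
                    "Friday": 117, "Saturday": 42, "Sunday": 38})

days = list(counts.index)
source = ColumnDataSource(data={"day": days, "trips": counts.tolist()})

# Categorical y axis; hovering a bar shows its value via the tooltip
p = figure(y_range=days, height=300, title="Trips by start day",
           tooltips=[("trips", "@trips")])
p.hbar(y="day", right="trips", height=0.8, source=source)
```

In the notebook, call `output_notebook()` once (from `bokeh.plotting`) and then `show(p)` to render the plot inline with pan, zoom, and hover tools.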

## Bokeh Resources

http://bokeh.pydata.org/en/latest/

https://www.youtube.com/watch?v=O5OvOLK-xqQ

https://www.youtube.com/watch?v=S11GfFlQgtw


## Blaze

In [None]:
bz.into("postgresql:///bike_data::bike_trips","bike_data_sub_no_headers.csv")

In [None]:
bike_db = bz.Data('postgresql:///bike_data')
bz.by(bike_db.bike_trips.start_terminal, count=bike_db.bike_trips.trip_id.count())

In [None]:
bike_csv = bz.Data("bike_data_sub.csv")
bz.by(bike_csv.start_terminal, count=bike_csv.trip_id.count())

In [None]:
bdf = bz.Data(df)
bz.by(bdf.start_terminal,count=bdf.trip_id.count())
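The point of the three Blaze cells is that one expression runs unchanged against PostgreSQL, a CSV file, or a DataFrame. For comparison, the in-memory case corresponds roughly to a pandas groupby. A sketch on toy data (not a claim about how Blaze translates the expression internally):

```python
import pandas as pd

trips = pd.DataFrame({"trip_id": [1, 2, 3, 4, 5], "start_terminal": [2, 2, 10, 10, 10]})

# Roughly bz.by(t.start_terminal, count=t.trip_id.count()) expressed in pandas
by_terminal = (
    trips.groupby("start_terminal")["trip_id"]
    .count()
    .rename("count")
    .reset_index()
)
print(by_terminal)
```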

## Blaze Resources

https://www.youtube.com/watch?v=x0svOPW6DdE

http://blaze.pydata.org/en/latest/