* Anaconda
  * Out of the box modules
  * Environments
  * Spyder
* Jupyter Project
  * Formerly IPython Notebooks
  * Better support for non-Python kernels
  * Navigation
  * Tips and tricks
* Pandas
  * Series
  * DataFrame
  * Input functions
  * Boolean Indexing
  * Groupby and Pivot
  * Merge
  * Map and Apply
* Bokeh
  * Interactive data visualizations
  * Python -> JavaScript
* Blaze
  * Data transfer
  * Querying

In [None]:
import os
#IPython.display exposes all of the display helpers used below
from IPython.display import FileLink, display, HTML, Image, YouTubeVideo

# Jupyter Project

* Markdown
    * bullets

In [None]:
#running cells (ctrl-enter and shift-enter)
print(os.path)

In [None]:
#output
os.path 

In [None]:
#code completion
HTML("hello.txt")

In [None]:
#displaying a video
YouTubeVideo("dQw4w9WgXcQ",width=600, height=400,start=43,autoplay=True)

In [None]:
#displaying an image
Image("rr.jpg")

In [None]:
#bash commands
! ls

In [None]:
#magic functions (and so many more)
%timeit range(1000)

In [None]:
#open web pages
url="https://github.com/ipython/ipython/wiki/IPython-kernels-for-other-languages"
HTML(url)

### Misc Navigation

# Pandas

* Inspired by data frames in R
* Built on NumPy
* In-memory
* Actively developed


### Why
* Data scrubbing and formatting with the full power of Python
* Data aggregation
* Data visualization through Matplotlib, seaborn, Bokeh, etc.
* Hooks into machine learning and other modules
* Combined with Jupyter, allows you to build a forgiving and dynamic workflow

In [None]:
import pandas as pd

In [None]:
#reading in a csv file
df = pd.read_csv("bike_data_sub.csv",parse_dates=["start_date","end_date"])
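
A minimal sketch of what `parse_dates` changes. The two-row sample below is hypothetical (the real `bike_data_sub.csv` has more columns); the point is only that the named columns come back as datetimes, so they can be subtracted directly.

```python
import io
import pandas as pd

# Hypothetical sample standing in for bike_data_sub.csv
csv_text = (
    "trip_id,start_date,end_date\n"
    "1,2014-09-01 08:00:00,2014-09-01 08:15:00\n"
    "2,2014-09-01 09:30:00,2014-09-01 09:42:00\n"
)

# Without parse_dates the date columns are read as plain strings
plain = pd.read_csv(io.StringIO(csv_text))

# With parse_dates the named columns are converted to datetimes
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["start_date", "end_date"])

# Datetime columns support arithmetic, e.g. the length of each trip
trip_length = parsed["end_date"] - parsed["start_date"]
print(parsed.dtypes)
print(trip_length)
```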

In [None]:
#Series and DataFrames
df.head(5)

In [None]:
#rename columns post import
d = {"start_date" : "start_date_time", "end_date" : "end_date_time"}
df.rename(columns=d, inplace = True)
df.head()

In [None]:
#basic stats for every numeric value in your dataframe.  Whether you want it or not :-)
df.describe()

In [None]:
#even more stuff
df.info()

In [None]:
#filtering with a boolean mask (named mask so it does not shadow the built-in filter)
mask = (df['start_station'] == 'San Jose City Hall')
df[mask].head()
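
Boolean masks can also be combined. A small sketch on a made-up three-row frame (the station names and durations are illustrative, not taken from the bike data): use `&`, `|` and `~` rather than `and`, `or` and `not`, and keep each comparison in its own variable or in parentheses.

```python
import pandas as pd

# Made-up sample rows for illustration
trips = pd.DataFrame({
    "start_station": ["San Jose City Hall", "Market at 4th", "San Jose City Hall"],
    "duration": [300, 1200, 900],
})

at_city_hall = trips["start_station"] == "San Jose City Hall"
long_trip = trips["duration"] > 600

both = trips[at_city_hall & long_trip]     # rows matching both conditions
either = trips[at_city_hall | long_trip]   # rows matching at least one condition
not_city_hall = trips[~at_city_hall]       # rows where the first condition is False

print(both)
```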

In [None]:
#groupby and sorting (DataFrame.sort() was removed from pandas; use sort_values())
#df[['Bike #','Duration']].groupby('Bike #').count().sort_values('Duration',ascending=False).head()
df[['bike_num','duration']].groupby('bike_num').agg({'duration':['mean','sum']}).sort_values(('duration','sum')).head()
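
As an alternative sketch, named aggregation gives flat column names instead of a two-level column index, which makes the sort key simpler. The sample data and the output names `mean_duration` and `total_duration` are made up for this example.

```python
import pandas as pd

# Made-up sample: two bikes with a handful of trips each
trips = pd.DataFrame({
    "bike_num": [1, 1, 2, 2, 2],
    "duration": [100, 300, 50, 50, 50],
})

summary = (
    trips.groupby("bike_num")
    .agg(mean_duration=("duration", "mean"), total_duration=("duration", "sum"))
    .sort_values("total_duration", ascending=False)
)
print(summary)
```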

In [None]:
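#pivot: count trips per subscription type (rows) and start station (columns)
pd.pivot_table(df, values='duration', index='subscription_type', columns='start_station', aggfunc=len, fill_value=0)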
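
One way to read a pivot table that counts rows is as a groupby on two keys followed by an unstack. A sketch on made-up data (station names `A` and `B` are placeholders):

```python
import pandas as pd

# Made-up sample rows for illustration
trips = pd.DataFrame({
    "subscription_type": ["Subscriber", "Subscriber", "Customer", "Subscriber"],
    "start_station": ["A", "B", "A", "A"],
    "duration": [10, 20, 30, 40],
})

# Count trips per (subscription_type, start_station) with pivot_table
pivot = pd.pivot_table(trips, values="duration", index="subscription_type",
                       columns="start_station", aggfunc=len, fill_value=0)

# The same counts via groupby: group on both keys, count rows, move stations to columns
by_group = (
    trips.groupby(["subscription_type", "start_station"])
    .size()
    .unstack(fill_value=0)
)

print(pivot)
print(by_group)
```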

In [None]:
dfLatLongs = pd.read_csv("latlongs.csv")
dfLatLongs.head()

In [None]:
#merging
df = pd.merge(df,dfLatLongs,left_on="start_terminal",right_on="terminal", how="left")
df.head()
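
A left merge keeps every row of the left frame, filling the right-hand columns with NaN where no match exists. The sketch below uses made-up terminals and coordinates, and adds `indicator=True` to flag the unmatched rows.

```python
import pandas as pd

# Made-up trips; terminal 99 has no entry in the station table
trips = pd.DataFrame({"trip_id": [1, 2, 3], "start_terminal": [10, 11, 99]})
stations = pd.DataFrame({
    "terminal": [10, 11],
    "lat": [37.33, 37.79],
    "long": [-121.89, -122.40],
})

merged = pd.merge(trips, stations, left_on="start_terminal", right_on="terminal",
                  how="left", indicator=True)

# Rows flagged "left_only" found no matching terminal
unmatched = merged[merged["_merge"] == "left_only"]
print(merged)
```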

### Functional aspects of pandas

In [None]:
#plain old function
def getDate(x):
    return x.date()

In [None]:
#creating a new column and updating it to the date of a datetime.  WITHOUT a for loop
df["start_date"] = df["start_date_time"].apply(getDate)
df[["start_date_time","start_date"]].head()

In [None]:
#Return the day of the week as an integer, where Monday is 0 and Sunday is 6.
df["start_dow"] = df["start_date_time"].apply(lambda x : x.weekday())
df[["start_date_time","start_dow"]].head()

In [None]:
#dict keys = integers representing days and values = days of the week
d = {0 : 'Monday', 1 : 'Tuesday', 2 : 'Wednesday', 3 : 'Thursday', 4 : 'Friday',  5 : 'Saturday', 6 : 'Sunday'}
d

In [None]:
#map the values to our dataframe
df['start_dow_name'] = df['start_dow'].map(d)
df[["start_dow","start_dow_name"]].head()
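
For datetime columns specifically, the `.dt` accessor offers a vectorized alternative to `apply` and `map`. A sketch on two made-up timestamps:

```python
import pandas as pd

# Two made-up timestamps: a Monday and a Saturday
times = pd.Series(pd.to_datetime(["2014-09-01 08:00", "2014-09-06 17:30"]))

dates = times.dt.date           # same result as apply(getDate)
dow = times.dt.dayofweek        # Monday is 0 and Sunday is 6, like weekday()
dow_name = times.dt.day_name()  # day names without a lookup dict

print(pd.DataFrame({"date": dates, "dow": dow, "dow_name": dow_name}))
```

`apply` and `map` remain the general tools when no built-in accessor does the job.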

## What didn't we cover
* Indexes
* Time Series
* In addition to read_csv() there are many other ways to get data in (and out)
* In addition to pivot_table() check out other reshaping methods
* In addition to merge check out join and concat
* Panels (multiple related DataFrames; since deprecated and removed from pandas)
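
As a small taste of the first few items, the sketch below uses a datetime index, time-series resampling and `concat` on made-up trips. It is an illustration under those assumptions, not a tour of the full API.

```python
import pandas as pd

# Made-up trips across two days
trips = pd.DataFrame({
    "start_date_time": pd.to_datetime(
        ["2014-09-01 08:00", "2014-09-01 17:30", "2014-09-02 09:15"]
    ),
    "duration": [300, 900, 600],
})

# Index: use the timestamp column as the row labels
indexed = trips.set_index("start_date_time")

# Time series: trip count and total duration per calendar day
daily = indexed["duration"].resample("D").agg(["count", "sum"])

# concat: stack frames with the same columns on top of each other
day_one = indexed.loc["2014-09-01"]
day_two = indexed.loc["2014-09-02"]
both_days = pd.concat([day_one, day_two])

print(daily)
```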

## Pandas Resources
### Using SQL as a comparison
http://www.gregreda.com/2013/10/26/intro-to-pandas-data-structures/

http://www.gregreda.com/2013/10/26/working-with-pandas-dataframes/

http://www.gregreda.com/2013/10/26/using-pandas-on-the-movielens-dataset/

### Videos
https://www.youtube.com/watch?v=MxRMXhjXZos

https://www.youtube.com/watch?v=w26x-z-BdWQ

https://www.youtube.com/watch?v=rEalbu8UGeo

### Books
http://www.amazon.com/Python-Data-Analysis-Wrangling-IPython/dp/1449319793/ref=sr_1_1?ie=UTF8&qid=1414785088&sr=8-1&keywords=python+for+data+analysis


# Bokeh

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
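#note: bokeh.charts only ships with older Bokeh releases; it was later split out into the separate bkcharts package
from bokeh.charts import Bar, Line, TimeSeries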
from bokeh.plotting import show, output_notebook, figure, output_file, ColumnDataSource
from bokeh.models import Text, Rect, Triangle, Circle, HoverTool, Plot, Range1d
output_notebook()

In [None]:
#housekeeping
d = {"lat" : "start_lat", "long" : "start_long"}
df.rename(columns=d, inplace = True)

In [None]:
dow = df[["start_dow_name","duration"]].groupby("start_dow_name").count()

In [None]:
#pandas integration with matplotlib is pretty good
dow.plot(kind="bar");

In [None]:
#basic bokeh plot
p = Bar(dow)
show(p)

In [None]:
#prepping some data by grouping and cutting
grp = df[["start_terminal","start_station","trip_id","start_long","start_lat"]].groupby(["start_terminal","start_station","start_long","start_lat"]).count().reset_index()
grp["cuts"] = pd.cut(grp.trip_id,4,labels=[10,20,30,40]).astype("int")
grp.head()
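
A sketch of what the `pd.cut` call does, on made-up trip counts: the range from the smallest to the largest value is split into four equal-width bins, and each value gets the label of its bin, used here as a marker size.

```python
import pandas as pd

# Made-up trip counts per station
counts = pd.Series([5, 20, 60, 100])

# Four equal-width bins over 5..100; the bin edges fall near 28.75, 52.5 and 76.25
sizes = pd.cut(counts, 4, labels=[10, 20, 30, 40]).astype("int")

print(pd.DataFrame({"trips": counts, "size": sizes}))
```

Equal-width bins are sensitive to outliers: one very busy station pushes most of the others into the lowest bin.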

In [None]:
TOOLS='box_zoom,pan,box_select,crosshair,resize,reset,hover'

p = figure(plot_width=800, plot_height=600, title=None, tools=TOOLS)

#output_file("bike.html")

#keep every plotted value in the ColumnDataSource and refer to it by column name
source = ColumnDataSource(data=dict(x=grp["start_long"],y=grp["start_lat"],id=grp["start_terminal"],
                                    name=grp["start_station"],trips=grp["trip_id"],size=grp["cuts"] + 20))

p.circle("x", "y", size="size", color="blue", source=source)
hover = p.select(dict(type=HoverTool))
hover.tooltips = [("(x,y)", "($x, $y)"),("id", "@id"),("name", "@name"),("trips", "@trips")]

show(p)

### Bokeh gallery
http://bokeh.pydata.org/en/latest/docs/user_guide/interaction.html

http://bokeh.pydata.org/en/latest/docs/gallery.html#static-examples

## Bokeh Resources

http://bokeh.pydata.org/en/latest/

https://www.youtube.com/watch?v=O5OvOLK-xqQ

https://www.youtube.com/watch?v=S11GfFlQgtw


# Blaze

http://blaze.pydata.org/en/latest/_static/presentations/blaze.html#/0/4

In [None]:
import blaze as bz

In [None]:
#loading a csv into a postgres table using the PostgreSQL COPY command
bz.into("postgresql:///bike_data::bike_trips","bike_data_sub_no_headers.csv")

In [None]:
#doing a "groupby" using the postgres engine
bike_db = bz.Data('postgresql:///bike_data')
bz.by(bike_db.bike_trips.start_terminal, count=bike_db.bike_trips.trip_id.count())

In [None]:
#what's happening behind the scenes
print(bz.compute(bz.by(bike_db.bike_trips.start_terminal,
                       count=bike_db.bike_trips.trip_id.count())))

In [None]:
#and with a csv file
bike_csv = bz.Data("bike_data_sub.csv")
bz.by(bike_csv.start_terminal, count=bike_csv.trip_id.count())

In [None]:
#and with a pandas dataframe
bdf = bz.Data(df)
bz.by(bdf.start_terminal,count=bdf.trip_id.count())

## Blaze Resources

https://www.youtube.com/watch?v=x0svOPW6DdE

http://blaze.pydata.org/en/latest/