# Introduction to Jupyter Notebook (.ipynb) 
#### Rethink Excel Workbooks (.xlsb)


### Why Jupyter?  Rich, Modern, Beautiful 
<UL>- Allows you to easily share your code, data, plots, and explanation in a single notebook </UL>
<UL>- Publishing is flexible: PDF, HTML, ipynb, dashboards, slides, and more. </UL>

In [1]:
print("hello world")

hello world


#### We'll use Python and some of its packages/libraries:

<UL> - Pandas: import data via a url and create a dataframe to easily handle data for analysis and graphing. </UL>
<UL> - NumPy: a package for scientific computing with tools for algebra, random number generation, integrating with databases, and managing data. </UL>
<UL> - SciPy: a Python-based ecosystem of packages for math, science, and engineering.</UL>
<UL>- Plotly: a graphing library for making interactive, publication-quality graphs. More here: https://plot.ly/python. </UL>

In [2]:
import pandas as pd
import numpy as np
import scipy as sp
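#!pip install plotly
import plotly
# Replace the placeholder with the API key from your own Plotly account settings.
plotly.tools.set_credentials_file(username='rgoyal30', api_key='YOUR_API_KEY')
# Note: in plotly 4 and later, online plotting and credentials moved to the separate
# chart_studio package (chart_studio.plotly, chart_studio.tools).
import plotly.plotly as py
import plotly.figure_factory as ff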


In [3]:
df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/school_earnings.csv")

table = ff.create_table(df)

df.head(5)
#py.iplot(table, filename='jupyter-table1')

|   | School    | Women | Men | Gap |
|---|-----------|-------|-----|-----|
| 0 | MIT       | 94    | 152 | 58  |
| 1 | Stanford  | 96    | 151 | 55  |
| 2 | Harvard   | 112   | 165 | 53  |
| 3 | U.Penn    | 92    | 141 | 49  |
| 4 | Princeton | 90    | 137 | 47  |


In [4]:
df.describe()

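|       | Women     | Men       | Gap       |
|-------|-----------|-----------|-----------|
| count | 21.0      | 21.0      | 21.0      |
| mean  | 81.095238 | 113.52381 | 32.428571 |
| std   | 12.813683 | 25.705289 | 14.137084 |
| min   | 62.0      | 78.0      | 9.0       |
| 25%   | 72.0      | 92.0      | 22.0      |
| 50%   | 79.0      | 114.0     | 31.0      |
| 75%   | 92.0      | 131.0     | 40.0      |
| max   | 112.0     | 165.0     | 58.0      |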


Use dataframe.column_title to select a single column of the dataframe:



In [5]:
schools = df.School
schools[0]

'MIT'
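As a small illustration of column access, the sketch below rebuilds the first three rows of the table by hand (so it runs without a network connection) and compares attribute access with bracket access. The column name "Pay Gap" is made up for this example to show a case where only bracket access works.

```python
import pandas as pd

# The first rows of the earnings table shown earlier, rebuilt by hand for a self-contained demo.
df = pd.DataFrame({
    "School": ["MIT", "Stanford", "Harvard"],
    "Women": [94, 96, 112],
    "Men": [152, 151, 165],
})

by_attribute = df.School      # attribute access, as in the cell above
by_bracket = df["School"]     # bracket access returns the same Series

print(by_bracket[0])                    # MIT
print(by_attribute.equals(by_bracket))  # True

# Bracket access is required when a column name is not a valid Python identifier
# (here it contains a space). "Pay Gap" is a hypothetical column name.
df["Pay Gap"] = df["Men"] - df["Women"]
print(df["Pay Gap"].tolist())           # [58, 55, 53]
```

Attribute access is convenient for quick exploration, while bracket access is the safer choice in code that has to cope with arbitrary column names.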

Most pandas functions also work on an entire dataframe. For example, calling std() calculates the standard deviation for each column.


In [6]:
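# numeric_only=True skips the text column (School); recent pandas versions (2.0+) raise an error without it
df.std(numeric_only=True)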

Women    12.813683
Men      25.705289
Gap      14.137084
dtype: float64
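It may help to know that pandas computes the sample standard deviation by default (it divides by n - 1, i.e. ddof=1). The sketch below reproduces that calculation with the standard library only, using just the five Gap values visible in the df.head(5) output, so the result differs from the full-dataset value of 14.137084.

```python
import statistics

# Gap values for the five schools shown in the df.head(5) output (not the full dataset).
gaps = [58, 55, 53, 49, 47]

sample_std = statistics.stdev(gaps)       # divides by n - 1, like pandas' default ddof=1
population_std = statistics.pstdev(gaps)  # divides by n, like ddof=0

print(round(sample_std, 4))      # 4.4497
print(round(population_std, 4))  # 3.9799
```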

### Plotting Inline.

In [7]:
from plotly.graph_objs import Bar  # import only the trace type used below

data = [Bar(x=df.School,
            y=df.Gap)]

py.iplot(data, filename='jupyter-basic_bar')

High five! You successfully sent some data to your account on plotly. View your plot in your browser at https://plot.ly/~rgoyal30/0 or inside your plot.ly account where it is named 'jupyter-basic_bar'


Plotting multiple traces and styling the chart with custom colors and titles is simple with Plotly syntax. Additionally, you can control the privacy with sharing set to public, private, or secret.

In [8]:
import plotly.plotly as py
import plotly.graph_objs as go

trace_women = go.Bar(x=df.School,
                     y=df.Women,
                     name='Women',
                     marker=dict(color='#ffcdd2'))

trace_men = go.Bar(x=df.School,
                   y=df.Men,
                   name='Men',
                   marker=dict(color='#A2D5F2'))

trace_gap = go.Bar(x=df.School,
                   y=df.Gap,
                   name='Gap',
                   marker=dict(color='#59606D'))

data = [trace_women, trace_men, trace_gap]

layout = go.Layout(title="Average Earnings for Graduates",
                   xaxis=dict(title='School'),
                   yaxis=dict(title='Salary (in thousands)'))

fig = go.Figure(data=data, layout=layout)

py.iplot(fig, filename='jupyter-styled_bar')
#py.iplot(fig, sharing='private', filename='jupyter-styled_bar')

Now we have interactive charts displayed in our notebook. Hover over the chart to see the values for each bar, click and drag to zoom into a specific section, or click on the legend to hide or show a trace.

#### Plotting Interactive Maps

Plotly is now integrated with Mapbox. In this example we'll plot latitude and longitude data of nuclear waste sites. To plot on Mapbox maps with Plotly, you'll need a Mapbox account and a Mapbox access token, which you can add to your Plotly settings.
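One way to keep an access token out of a shared notebook is to read it from an environment variable. The sketch below is only an illustration: the helper load_mapbox_token and the variable name MAPBOX_ACCESS_TOKEN are assumptions made for this example, not something Plotly or Mapbox requires.

```python
import os


def load_mapbox_token(env_var="MAPBOX_ACCESS_TOKEN"):
    """Return the token stored in env_var, or raise a clear error if it is unset."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable before plotting.")
    return token


# Demonstration only: a made-up value stands in for a real token.
os.environ["MAPBOX_ACCESS_TOKEN"] = "pk.example-token"
mapbox_access_token = load_mapbox_token()
print(mapbox_access_token)  # pk.example-token
```

In practice you would set the variable in your shell or system settings rather than in the notebook itself, so the token never appears in the saved file.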

In [9]:
#@hidden_cell
import plotly.plotly as py
import plotly.graph_objs as go

import pandas as pd

# Replace the placeholder with your own Mapbox access token.
mapbox_access_token = 'YOUR_MAPBOX_ACCESS_TOKEN'

df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/Nuclear%20Waste%20Sites%20on%20American%20Campuses.csv')
site_lat = df.lat
site_lon = df.lon
locations_name = df.text

data = [
    go.Scattermapbox(
        lat=site_lat,
        lon=site_lon,
        mode='markers',
        marker=dict(
            size=17,
            color='rgb(255, 0, 0)',
            opacity=0.7
        ),
        text=locations_name,
        hoverinfo='text'
    ),
    go.Scattermapbox(
        lat=site_lat,
        lon=site_lon,
        mode='markers',
        marker=dict(
            size=8,
            color='rgb(242, 177, 172)',
            opacity=0.7
        ),
        text=locations_name,
        hoverinfo='text'
    )]


layout = go.Layout(
    title='Nuclear Waste Sites on Campus',
    autosize=True,
    hovermode='closest',
    showlegend=False,
    mapbox=dict(
        accesstoken=mapbox_access_token,
        bearing=0,
        center=dict(
            lat=38,
            lon=-94
        ),
        pitch=0,
        zoom=3,
        style='light'
    ),
)

fig = dict(data=data, layout=layout)

py.iplot(fig, filename='jupyter-Nuclear Waste Sites on American Campuses')


### And much more 

<UL> 3D Plots, Animation Plots </UL>
<UL> Dashboards </UL>

## And Machine Learning, Deep Learning 

## Getting Started
Once you've installed the Notebook, start it from your terminal with $ jupyter notebook (Windows users need to open their Command Prompt). This opens your browser at the local URL of the Notebook server, by default http://127.0.0.1:8888. You'll see a dashboard listing all your notebooks, and you can launch them from there. The Notebook has the advantage of looking the same when you're coding and when you're publishing; while a notebook is running, you additionally have the options to move code, run cells, change kernels, and use Markdown.

#### Helpful Commands
- Tab Completion: Jupyter supports tab completion! You can type object_name.<TAB> to view an object’s attributes. For tips on cell magics, running notebooks, and exploring objects, check out the Jupyter docs.
- Help: running ? in a cell provides an introduction and overview of IPython's features.
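Tab completion and the ? help are notebook features, but plain Python offers rough equivalents that work anywhere. The sketch below uses dir() to list attribute names, much as completion would suggest them, and inspect.getdoc() to fetch the docstring that str.upper? would display in a notebook.

```python
import inspect

# dir() lists an object's attribute names; filtering by prefix mimics typing "str.up<TAB>".
matches = [name for name in dir(str) if name.startswith("up")]
print(matches)  # ['upper']

# inspect.getdoc() returns the cleaned-up docstring of an object.
doc = inspect.getdoc(str.upper)
print(doc.splitlines()[0])
```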


#### Languages
The bulk of this tutorial discusses executing Python code in Jupyter notebooks. You can also use Jupyter notebooks to execute R code. Skip down to the [R section] for more information on using IRkernel with Jupyter notebooks and graphing examples.

#### Package Management
When installing packages in Jupyter, you either need to install the package in your actual shell, or run the ! prefix, e.g.:

!pip install packagename
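One caveat: the ! prefix runs whichever pip comes first on your shell's PATH, which is not necessarily the one belonging to the Python environment your kernel uses. A common workaround, sketched below, is to call pip through the interpreter that is running the kernel. The helper name pip_install_command is made up for this example, and the actual installation is left commented out. Recent IPython versions also provide a %pip magic for the same purpose.

```python
import subprocess
import sys


def pip_install_command(package):
    """Build a pip command that targets the interpreter running this kernel."""
    return [sys.executable, "-m", "pip", "install", package]


cmd = pip_install_command("packagename")
print(cmd)

# Uncomment to actually run the installation:
# subprocess.check_call(cmd)
```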

You may want to reload submodules if you've edited the code in one. IPython comes with automatic reloading magic: you can reload all changed modules before executing a new line.

%load_ext autoreload
%autoreload 2
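To see what reloading means, here is a minimal sketch of the manual equivalent using importlib.reload from the standard library. It writes a throwaway module (demo_settings, a hypothetical name) to a temporary directory, imports it, edits the file, and reloads it. With %autoreload 2 enabled, IPython performs the reload step for you.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    module_file = Path(tmp) / "demo_settings.py"  # hypothetical module name
    module_file.write_text("VALUE = 1\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()  # make sure the new file is visible to the import system
    import demo_settings
    before = demo_settings.VALUE

    # Simulate editing the module in an editor while the kernel keeps running.
    module_file.write_text("VALUE = 2000\n")
    importlib.reload(demo_settings)
    after = demo_settings.VALUE

    sys.path.remove(tmp)

print(before, after)  # 1 2000
```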

You probably already know the drill, but general best practices for programming include the following:

- Try to provide comments and documentation for your code. They might be a great help to others!
- Also consider a consistent naming scheme, code grouping, limiting your line length, and so on.
- Don't be afraid to refactor when necessary.

In addition to these general best practices for programming, you could also consider the following tips to make your notebooks the best source for other users to learn from:

- Don't forget to name your notebook documents!
- Try to keep the cells of your notebook simple: don't exceed the width of your cell, and make sure that you don't put too many related functions in one cell.
- If possible, import your packages in the first code cell of your notebook.
- Display the graphics inline. The magic command %matplotlib inline will definitely come in handy. Don't forget to end the final line of a plotting cell with a semicolon to suppress the text output of the function and give back just the plot itself.

Sometimes your notebook can become quite code-heavy, or maybe you just want a cleaner report. In those cases, you could consider hiding some of the code. You can already hide some of it by using magic commands such as %run, which executes a whole Python script as if it were in a notebook cell. However, this might not help you to the extent that you expect. In such cases, you can always check out this tutorial on optional code visibility or consider toggling your notebook's code cells.
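As a rough illustration of moving code out of a notebook, the sketch below uses runpy.run_path from the standard library as a plain-Python analogue of %run. It executes a script file and hands back the names the script defined. The script name prepare_data.py and its contents are invented for this example.

```python
import runpy
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "prepare_data.py"  # hypothetical helper script
    script.write_text("values = [1, 2, 3]\ntotal = sum(values)\n")

    # run_path executes the file and returns its top-level names as a dict.
    namespace = runpy.run_path(str(script))

total = namespace["total"]
print(total)  # 6
```

In a notebook, %run prepare_data.py would similarly make values and total available in later cells, while keeping the preparation code itself out of sight.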


## Jupyter Notebooks for Data Science Teams: Best Practices

### 1) Use two types of notebooks for a data science project: a lab notebook and a deliverable notebook.
#### The difference between the two (besides what you can infer from their names) is that individuals control the lab notebook, while the whole data science team controls the deliverable notebook.
### 2) Use some type of version control (Git, GitHub, ...).
### 3) Also commit the HTML file if your version control system lacks rendering capabilities.
### 4) Use explicit rules for naming your documents.
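An explicit naming rule is easiest to follow when it can be checked automatically. The sketch below assumes a purely hypothetical convention (ISO date, author initials, short hyphenated description) and tests file names against it with a regular expression. Your team's actual rule may look quite different.

```python
import re

# Hypothetical convention: YYYY-MM-DD-initials-short-description.ipynb
NOTEBOOK_NAME = re.compile(
    r"\d{4}-\d{2}-\d{2}-[a-z]{2,4}-[a-z0-9]+(?:-[a-z0-9]+)*\.ipynb"
)


def follows_convention(filename):
    """Return True if filename matches the assumed team naming rule."""
    return NOTEBOOK_NAME.fullmatch(filename) is not None


print(follows_convention("2018-05-07-rg-school-earnings.ipynb"))  # True
print(follows_convention("Untitled1.ipynb"))                      # False
```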

## Introduction
It happened a few years back. After working with SAS for more than 5 years, I decided to move out of my comfort zone. Being a data scientist, my hunt for other useful tools was ON! Fortunately, it didn’t take me long to decide: Python was my appetizer.

I always had an inclination towards coding. This was the time to do what I really loved. Code. Turned out, coding was so easy!

I learned the basics of Python within a week. Since then, I’ve not only explored this language in depth, but also helped many others learn it. Python was originally a general-purpose language. But over the years, with strong community support, it gained dedicated libraries for data analysis and predictive modeling.

Due to the lack of resources on Python for data science, I decided to create this tutorial to help many others learn Python faster. In this tutorial, we will take bite-sized information about how to use Python for data analysis, chew it till we are comfortable, and practice it at our own end.
