# Getting around

- Use `File > New Python 3 notebook` to create a new notebook
- Move the notebook into a folder in Google Drive as needed

- A notebook is made of two basic types of cells:
  - Code cells
  - Markdown cells

Shortcuts (see `Tools > keyboard shortcuts`)

- __&#8984;/Ctrl+Enter__ : Run the focused cell
- __&#8984;/Ctrl+Shift+Enter__ : Run selection
- __&#8984;/Ctrl+Alt+N__ : Open scratch code cell



# Python basics

## Getting help

In [0]:
help('list')  #help for list
?list  #prints code documentation for lists

## Python as a calculator

In [0]:
2 * 15 + 10
x = 2 * 15 + 10
x

## Functions

In notebooks, most Python code consists of function calls on objects loaded from libraries.

The following declares a function `f` that takes `x` as its argument, then calls `f` with `x = 2`:

In [0]:
def f(x):
    return x * x

f(2)
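
To make the point about calling functions on library objects concrete, here is a minimal sketch that uses only the standard-library `math` module. The helper name `hypotenuse` is made up for this example.

```python
import math  # standard-library module, used here only for illustration


def hypotenuse(a, b):
    """Hypothetical helper: hypotenuse length of a right triangle."""
    return math.sqrt(a * a + b * b)  # call a function loaded from a library


h = hypotenuse(3, 4)
label = "result".upper()  # a method is a function called on an object
print(label, h)
```

Functions such as `math.sqrt` are called with the module name as a prefix, while methods such as `upper` are called on the object itself.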

## Types
- Strings
- Numeric
- Data structures (arrays, lists, sets...)

In [0]:
arr = [1, 2.0, "3", '3', True, False]  #a list can mix types
arr[2]  #list indices start at 0, so this is the third element: "3"
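
As a rough illustration of the types listed above, the sketch below inspects each element of the same list with the built-in `type()` and builds a small set. The variable names `kinds`, `same` and `unique` are chosen only for this example.

```python
arr = [1, 2.0, "3", '3', True, False]  # same list as above

# type() reports the type of each element
kinds = [type(item).__name__ for item in arr]
print(kinds)

# single and double quotes create the same kind of string
same = arr[2] == arr[3]
print(same)

# a set keeps only distinct values
unique = set(["a", "b", "a"])
print(len(unique))
```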

# Libraries

[NumPy](http://www.numpy.org) and [Pandas](http://pandas.pydata.org) are the most widely used libraries for numerical computing and data analysis



In [0]:
import numpy as np  #import numpy
import pandas as pd  #import pandas
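
A short sketch of what the two libraries provide, using made-up numbers: NumPy arrays support element-wise arithmetic, and a pandas `Series` is a labelled one-dimensional column.

```python
import numpy as np
import pandas as pd

values = np.array([1.0, 2.0, 3.0, 4.0])  # made-up numbers
doubled = values * 2                      # arithmetic applies to every element
print(values.mean())
print(doubled)

s = pd.Series(values, name="x")           # a labelled one-dimensional column
print(s.max())
```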

# Working with Data

1. Load a dataset bundled with a library

In [0]:
from sklearn import datasets
iris = datasets.load_iris()

help(datasets)
list(iris.target_names)

df = pd.DataFrame(iris.data)  #put iris.data into a pandas dataframe
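
One possible refinement, sketched under the assumption that scikit-learn is installed: the loaded object also carries `feature_names` and `target`, which can be used to label the columns and add the class of each row. The names `iris_df` and `species` are chosen only for this example.

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()

# Use the feature names shipped with the dataset as column labels
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Map the integer class codes (0, 1, 2) to their species names
iris_df["species"] = pd.Categorical.from_codes(
    iris.target, categories=iris.target_names
)

print(iris_df.shape)
print(iris_df["species"].value_counts())
```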

2. Import a CSV file using `UPLOAD` from the `Files` pane of the toolbar on your left &larr;

In [0]:
df = pd.read_csv("heart-decease-cleveland.csv")
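
To try `pd.read_csv` without uploading anything, the sketch below reads made-up CSV text from memory. The column names and values are invented for illustration and are not taken from the real file.

```python
import io

import pandas as pd

# Made-up CSV content standing in for an uploaded file
csv_text = "age,chol\n63,233\n67,286\n41,204\n"

# read_csv accepts a file path or a file-like object
sample = pd.read_csv(io.StringIO(csv_text))
print(sample.shape)
print(list(sample.columns))
```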

# Pandas

Creating dataframes in code

In [0]:
df = pd.DataFrame({ 
    'A' : pd.Series([1,2,3,4,5,6]),
    'B' : pd.Timestamp('20130102'),
    'C' : pd.Series([1,2,3,4,5,6]),
    'D' : pd.Categorical(["one", "two", "three", "four", "five", "six"]),
    'E' : 'foo' })

In [0]:
df.head()  #display first 5 rows
df.tail()  #last 5 rows
df.columns  #list column names

df['C'][1]  #access by df[column = 'C'][row = 1]
df['B'][2]  #access by df[column = 'B'][row = 2]

df.C  #attribute access to column C; works only when the name is a valid Python identifier
df.C[1]  #access by df[column = 'C'][row = 1]

#you can also access by integer position with iloc, but that's less common
df.iloc[:, 1]  #access at location [rows = all, column = 1]
df.iloc[:, 3]  #access at location [rows = all, column = 3]
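
The access patterns above can be compared side by side on a small made-up frame. This is only a sketch; `demo` and its values are invented. It also shows `.loc`, which selects a row and a column by label in a single step.

```python
import pandas as pd

# Small made-up frame, only for illustration
demo = pd.DataFrame({"A": [10, 20, 30], "C": [1.5, 2.5, 3.5]})

by_label = demo.loc[1, "C"]    # row label 1, column "C"
by_position = demo.iloc[1, 1]  # second row, second column
print(by_label, by_position)

column = demo["C"]             # bracket access works for any column name
print(column.tolist())
```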

## Example: printing a table of descriptive statistics for numerical variables

In [0]:
df.describe()
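
By default `describe()` summarises only the numeric columns of a mixed frame. The sketch below shows this on a small made-up frame (`demo` and its columns are invented for illustration) and uses `include="all"` to cover the other columns as well.

```python
import pandas as pd

# Made-up frame with one numeric and one text column
demo = pd.DataFrame({"n": [1, 2, 3, 4], "label": ["a", "b", "a", "b"]})

stats = demo.describe()                   # numeric columns only
print(list(stats.columns))
print(stats.loc["mean", "n"])

all_stats = demo.describe(include="all")  # also summarises the text column
print(list(all_stats.columns))
```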

## Example: printing a frequency table

In [0]:
df = pd.read_csv("heart-decease-cleveland.csv")

In [0]:
df.describe()

In [0]:
df['bin'] = pd.cut(df['chol'], [80, 120, 160, 200, 240, 280, 320, 360, 400])
counts = df['bin'].value_counts().sort_index()  #count values per bin, in bin order
freq = counts.rename_axis('bin').reset_index(name='count')
freq['rf'] = freq['count'] / freq['count'].sum()  #relative frequency: share of the binned rows
freq['cf'] = freq['count'].cumsum()  #cumulative frequency

freq.columns = ['Chol mg/dl', 'No.', 'Rel. Freq.', 'Cum. Freq.']

freq
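
The frequency-table steps can be checked by hand on a few made-up values. The sketch below is not based on the real dataset; the numbers, bin edges and column names are invented for illustration.

```python
import pandas as pd

# Made-up cholesterol-like values, not taken from the dataset
values = pd.Series([110, 150, 155, 190, 195, 199, 230])
bins = [80, 120, 160, 200, 240]

binned = pd.cut(values, bins)                # intervals are right-closed: (80, 120], ...
counts = binned.value_counts().sort_index()  # one count per bin, in bin order

table = pd.DataFrame({
    "count": counts,
    "rel_freq": counts / counts.sum(),       # shares of the total, summing to 1
    "cum_freq": counts.cumsum(),             # running total of counts
})
print(table)
```

Dividing by `counts.sum()` rather than by the number of bins makes the relative frequencies add up to 1.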