# Python Basics

In this notebook, we'll go over some basics of Python programming.

## Cells

In [1]:
1+1

2

In [2]:
a = 5

### Printing

In [3]:
print(a)

5


## Functions

In [4]:
def hello():
    print("Hello World")

In [5]:
hello()

Hello World


In [6]:
def hello(name):
    print("Hello {}".format(name))

In [7]:
hello('Class')

Hello Class


## Data Structures

### Lists (Arrays)

In [8]:
a = [1,2,3]

In [9]:
a

[1, 2, 3]

In [10]:
a[1]

2

### Dictionaries

Dictionaries are Python’s implementation of a data structure that is more generally known as an associative array. A dictionary consists of a collection of key-value pairs. Each key-value pair maps the key to its associated value.

In [11]:
country = {}
country['New York'] = 'USA'
country['Chicago'] = 'USA'
country['Beijing'] = 'China'

In [12]:
print(country)

{'New York': 'USA', 'Chicago': 'USA', 'Beijing': 'China'}


In [13]:
continent = {
    'New York' : 'North America',
    'Chicago' : 'North America',
    'London' : 'Europe',
    'Beijing' : 'Asia'
}

In [14]:
continent['New York']

'North America'

In [15]:
continent['Paris'] = 'Europe'

In [16]:
print(continent)

{'New York': 'North America', 'Chicago': 'North America', 'London': 'Europe', 'Beijing': 'Asia', 'Paris': 'Europe'}
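
The key-value behavior described above can be sketched with a few common dictionary operations. This is a minimal illustration rather than part of the original notebook; the names `has_paris`, `paris_country`, and `cities_by_country` are introduced here for the example only.

```python
# Sample mapping, mirroring the city-to-country dictionary built earlier.
country = {'New York': 'USA', 'Chicago': 'USA', 'Beijing': 'China'}

# The `in` operator tests keys, not values.
has_paris = 'Paris' in country

# get() returns a fallback value instead of raising KeyError for a missing key.
paris_country = country.get('Paris', 'unknown')

# items() yields (key, value) pairs in insertion order (Python 3.7+).
for city, nation in country.items():
    print(city, '->', nation)

# Invert the mapping: group the cities under each country.
cities_by_country = {}
for city, nation in country.items():
    cities_by_country.setdefault(nation, []).append(city)
print(cities_by_country)
```

Using `get` with a default is one way to avoid a `KeyError` when a key may be absent; indexing with `country['Paris']` would raise instead.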


## Pandas Data Frames

A data frame is a two-dimensional data structure, with data aligned in a tabular fashion in rows and columns.

In [17]:
import pandas as pd
df = pd.DataFrame()
print(df)

Empty DataFrame
Columns: []
Index: []
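
To make the row and column structure concrete, here is a small sketch that builds a non-empty data frame and inspects its dimensions. The column names `letter` and `number` and the variable `df_small` are assumptions chosen for illustration, not names from the original notebook.

```python
import pandas as pd

# Hypothetical sample data: each key becomes a column, each list holds
# that column's values, one entry per row.
data = {'letter': ['a', 'b', 'c'], 'number': [1, 2, 3]}
df_small = pd.DataFrame(data)

print(df_small.shape)           # (number of rows, number of columns)
print(list(df_small.columns))   # column labels
print(list(df_small.index))     # row labels, 0..n-1 by default
```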


### Making a Data Frame from an Array

#### One-dimensional array

In [18]:
data = [1, 2, 3, 4, 5]
print(data)

[1, 2, 3, 4, 5]


In [19]:
df = pd.DataFrame(data)
print(df)

   0
0  1
1  2
2  3
3  4
4  5


#### Two-dimensional array

In [20]:
data = [['a', 1], ['b', 2], ['c',3]]

In [21]:
df = pd.DataFrame(data)
print(df)

   0  1
0  a  1
1  b  2
2  c  3


### Making a Data Frame from a List of Dictionaries

In [22]:
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data, index=['first','second'], columns=['a','b','c'])
print(df)

        a   b     c
first   1   2   NaN
second  5  10  20.0


In [23]:
print(df['a'])

first     1
second    5
Name: a, dtype: int64


In [24]:
print(df['b'])

first      2
second    10
Name: b, dtype: int64


In [25]:
df1 = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])

# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows instead.
df = pd.concat([df1, df2])
print(df)

   a  b
0  1  2
1  3  4
0  5  6
1  7  8


### Slicing

You can use slicing to select subsets of a data frame. The colon (`:`) specifies a range of rows or columns; a plain `df[start:stop]` slices rows by position and excludes `stop`.

In [26]:
print(df['a'][1:4])

1    3
0    5
1    7
Name: a, dtype: int64


In [27]:
print(df[:2])

   a  b
0  1  2
1  3  4
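
Because the stacked data frame above has repeated index labels (0, 1, 0, 1), it can help to be explicit about whether a selection is by position or by label. The following is a sketch under that assumption, using the standard `iloc` and `loc` indexers; the variable names are introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])
stacked = pd.concat([df1, df2])  # index labels repeat: 0, 1, 0, 1

# iloc slices by position; the end position is excluded.
by_position = stacked.iloc[1:4]

# loc selects by label; label 0 matches two rows here.
by_label = stacked.loc[0]

# A row slice and a column selection can be combined in one call.
subset = stacked.iloc[:2, [0]]  # first two rows, first column only

# ignore_index=True renumbers the rows so every label is unique.
renumbered = pd.concat([df1, df2], ignore_index=True)
```

Renumbering with `ignore_index=True` is one option when the original row labels carry no meaning; keeping them is reasonable when they identify the source rows.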


## File I/O with Pandas

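Pandas can load a file directly into a data frame with `read_csv`, which accepts a local path or a URL. By default, `read_csv` treats the first line of the file as column names. The `iris.data` file has no header line, so its first record becomes the column labels in the output below and the data frame has 149 rows instead of 150. Passing `header=None` avoids this.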

In [28]:
df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data")
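
By default, `read_csv` uses the first line of a file as the column names, which matters for files that have no header line. The sketch below shows the difference on a three-record inline sample in the same layout, so it runs without network access. The column names in `names` are an assumption chosen for illustration; the file itself does not supply them.

```python
import io
import pandas as pd

# Three records in the same comma-separated layout, with no header line.
text = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)

# Assumed column names, chosen here for readability.
names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']

# Default behavior: the first record is consumed as the header.
default_df = pd.read_csv(io.StringIO(text))

# header=None keeps every record as data; names= supplies the labels.
named_df = pd.read_csv(io.StringIO(text), header=None, names=names)

print(default_df.shape)
print(named_df.shape)
print(named_df.head())
```

The same `header=None, names=...` arguments can be passed when reading from a URL or a local path.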

### Exploring the dataset

In [29]:
df.shape

(149, 5)

In [30]:
df.head(n=5)

   5.1  3.5  1.4  0.2  Iris-setosa
0  4.9  3.0  1.4  0.2  Iris-setosa
1  4.7  3.2  1.3  0.2  Iris-setosa
2  4.6  3.1  1.5  0.2  Iris-setosa
3  5.0  3.6  1.4  0.2  Iris-setosa
4  5.4  3.9  1.7  0.4  Iris-setosa


In [31]:
df.describe()

              5.1         3.5         1.4         0.2
count  149.000000  149.000000  149.000000  149.000000
mean     5.848322    3.051007    3.774497    1.205369
std      0.828594    0.433499    1.759651    0.761292
min      4.300000    2.000000    1.000000    0.100000
25%      5.100000    2.800000    1.600000    0.300000
50%      5.800000    3.000000    4.400000    1.300000
75%      6.400000    3.300000    5.100000    1.800000
max      7.900000    4.400000    6.900000    2.500000
