# Introduction to Python


### Basics

- Variables  
- Assignment  
- Strings  
- Lists  
- Dictionaries  
 

### Programming Fundamentals

- Control flow (if-statements)  
- Loops  
- Functions


### Pandas 

- Selecting and subsetting data  
- Data types  
- Data manipulation and processing techniques  



### Plotting

- pandas  
- matplotlib  
- seaborn




## Python Data Types

- `int` : integers (R: numeric)
- `float` : floating-point (real) numbers (R: numeric)
- `bool` : booleans, `True` or `False` (R: logical, `TRUE` or `FALSE`)
- `str` : strings (R: character)


## Assignment & Data Types


In [3]:
x = 5  # R: x <- 5

In [2]:
y = 1.5  # R: y <- 1.5

In [None]:
z = True # R: z <- TRUE

In [4]:
w = "Hello" # R: w <- "Hello"

To check the data type of a variable, use `type()`:

In [8]:
type(x) # R: class(x)

int

In [9]:
type(y) # R: class(y)


float
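
As a small sketch of how these types behave, the snippet below inspects the four basic types with `type()` and converts between them with the built-in constructors. The variable names simply mirror the assignments above; the list `type_names` is introduced here for illustration only.

```python
# Illustrative values, mirroring the assignments above
x = 5
y = 1.5
z = True
w = "Hello"

# type() returns the class of an object; __name__ gives its short name
type_names = [type(v).__name__ for v in (x, y, z, w)]
print(type_names)  # ['int', 'float', 'bool', 'str']

# Built-in constructors convert between types
# (roughly R's as.integer, as.numeric, as.character)
print(int(y))      # 1   (truncates toward zero)
print(float(x))    # 5.0
print(str(x))      # 5   (now a string)
print(int("42"))   # 42
```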

## Print statements (in Python 3)


In [5]:
print(5)  # R: print(5)

5


In [7]:
print("Hello World") # R: print("Hello World")

Hello World


## Operators

Operators may behave differently depending on the data types of their operands.

In [11]:
print(x + 1) # R: print(x + 1)

6


In [16]:
print(w + " World") # R: print (paste(w,"World"))


Hello World


Adding an integer and a string results in an error:

In [17]:
print(x + "world")


TypeError: unsupported operand type(s) for +: 'int' and 'str'

Multiplying a string by an integer repeats the string:

In [21]:
"1" * 5


'11111'

Adjacent string literals are concatenated:

In [20]:
'1' '5'

'15'
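
The `TypeError` shown earlier in this section can be avoided by converting one operand explicitly. The following is a minimal sketch of the usual options; the names `greeting` and `message` are illustrative only.

```python
x = 5
w = "Hello"

# Convert the number to a string before concatenating
greeting = w + " " + str(x)
print(greeting)            # Hello 5

# An f-string formats values inline without an explicit conversion
print(f"{w} number {x}")   # Hello number 5

# Converting the string to a number makes + perform arithmetic instead
print(x + int("1"))        # 6

# The failing expression can also be caught explicitly
try:
    x + "world"
except TypeError as err:
    message = str(err)
    print("TypeError:", message)
```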

## Lists

As in R, Python lists can contain elements of different data types

In [23]:
l = [1, True, 3.14, 'Hello'] # R: l <- list(1, TRUE, 3.14, 'Hello')
print(l)

[1, True, 3.14, 'Hello']


**WARNING:** In Python the first element of a list has index 0 (in R it has index 1)

In [24]:
print(l[0]) # R: print(l[[1]])

1


**WARNING:** In Python negative indexes count from the end of the list
(while in R they exclude the elements at those positions)

In [27]:
print(l[-2]) # R: print(l[[length(l) - 1]])


3.14
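
To make the indexing rules concrete, here is a short sketch using the same list. It shows that a negative index `-k` refers to position `len(l) - k`, and that an index outside the list raises an `IndexError`. The names `last` and `second_last` are illustrative.

```python
l = [1, True, 3.14, 'Hello']

last = l[-1]          # same element as l[len(l) - 1]
second_last = l[-2]   # same element as l[len(l) - 2]
print(last, second_last)   # Hello 3.14

# Valid positions run from 0 to len(l) - 1; anything beyond raises IndexError
try:
    l[4]
except IndexError as err:
    print("IndexError:", err)
```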


## Slicing Lists
In Python a sub-list is extracted using the slice operator
`start:end` or `start:end:step`. 
The start index is included while the end index is excluded.

In [34]:
l = [1,2,3,4,5] # R: l <- list(1,2,3,4,5)

In [32]:
l[2:4] # R: l[3:4]

[3, 4]

In [35]:
l[0:5:2] # R: l[c(TRUE,FALSE)]

[1, 3, 5]

Omitted values default to:
- start: the first element
- end: the end of the list (the last element is included)
- step: 1

In [38]:
l[:3] # R: l[1:3]

[1, 2, 3]

In [40]:
l[3:] # R: l[4:5]

[4, 5]

In [41]:
l[::2] # R: l[c(TRUE,FALSE)]



[1, 3, 5]
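
A few further slicing patterns follow from the same rules. This is a small sketch, not an exhaustive list; the variable `head` is introduced only for illustration.

```python
l = [1, 2, 3, 4, 5]

print(l[-2:])    # [4, 5]           last two elements
print(l[:-1])    # [1, 2, 3, 4]     everything except the last element
print(l[::-1])   # [5, 4, 3, 2, 1]  a negative step walks backwards
print(l[1:100])  # [2, 3, 4, 5]     out-of-range bounds are clipped

# A slice is a new list: changing it leaves the original untouched
head = l[:3]
head[0] = 99
print(head)      # [99, 2, 3]
print(l)         # [1, 2, 3, 4, 5]
```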

## Dictionaries 

Dictionaries store `key:value` pairs.

While the elements of a list are unnamed and accessed by index,
the values of a dictionary are accessed by key.

In [57]:
d = {'name': 'John', 'employed': False, 'age':30, 'height':71.5} 
# R: d <- list('John', 30, 71.5, FALSE); names(d)<-c('name', 'age', 'height', 'employed') 

In [58]:
d['name'] # R: d[['name']]

'John'

In [59]:
d

{'name': 'John', 'employed': False, 'age': 30, 'height': 71.5}
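
Beyond looking up a value, the most common dictionary operations are adding or replacing entries, testing for a key, and iterating over the pairs. The sketch below uses the same dictionary; the `'city'` and `'salary'` keys are hypothetical additions for illustration.

```python
d = {'name': 'John', 'employed': False, 'age': 30, 'height': 71.5}

d['city'] = 'Boston'         # add a new key (hypothetical value)
d['age'] = 31                # overwrite an existing value

# d['salary'] would raise KeyError; .get() returns a default instead
print(d.get('salary'))       # None
print(d.get('salary', 0))    # 0
print('name' in d)           # True

# Iterate over key/value pairs in insertion order
for key, value in d.items():
    print(key, value)

keys = list(d.keys())
print(keys)  # ['name', 'employed', 'age', 'height', 'city']
```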

## Methods and Functions

As in R, Python has both functions and methods.
- _Functions_ are called with the object as an argument, e.g. `len(l)`
- _Methods_ are functions that belong to an object and are called on it with a period, e.g. `l.sort()`

In [65]:
l = [0,2,3,4,1,5]
len(l)

6

In [67]:
l.sort()  # sorts the list in place
print(l)

[0, 1, 2, 3, 4, 5]


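**IMPORTANT** periods have special meaning in Python: a period accesses an attribute or method of an object, so, unlike in R, it cannot be part of a variable name (e.g. `my.var`).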
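
The difference between a function and a method can be seen by sorting a list both ways. In this sketch the built-in function `sorted` returns a new list, while the list method `sort` changes the list in place and returns `None`; the names `s` and `result` are illustrative.

```python
l = [0, 2, 3, 4, 1, 5]

# Function: takes the object as an argument and returns a new sorted list
s = sorted(l)
print(s)       # [0, 1, 2, 3, 4, 5]
print(l)       # [0, 2, 3, 4, 1, 5]  (unchanged)

# Method: called on the object with a period; sorts in place, returns None
result = l.sort()
print(result)  # None
print(l)       # [0, 1, 2, 3, 4, 5]
```

A common mistake is writing `l = l.sort()`, which replaces the list with `None`.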


## Example of Methods

In [70]:
l = [0,1,2,3,4,5]
l.append(6)
l

[0, 1, 2, 3, 4, 5, 6]

In [72]:
d = {'fname': 'John', 'lname': 'Brown'} 
d.update({'lname': 'Davis', 'age': 35}); d

{'fname': 'John', 'lname': 'Davis', 'age': 35}

Methods are specific to the type of an object. 
For example, dictionaries do not have an `append` method.
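
One way to check which methods an object offers is sketched below, using the built-ins `hasattr` and `dir`. Calling a method that does not exist raises an `AttributeError`. The names `message` and `methods` are illustrative.

```python
d = {'fname': 'John', 'lname': 'Brown'}

print(hasattr(d, 'append'))   # False
print(hasattr([], 'append'))  # True

# Calling a missing method raises AttributeError
try:
    d.append(6)
except AttributeError as err:
    message = str(err)
    print(message)  # 'dict' object has no attribute 'append'

# List the public methods that a dictionary does provide
methods = [name for name in dir(d) if not name.startswith('_')]
print(methods)
```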


## Libraries

As in R, much of Python's functionality is not part of the core language but is provided by libraries that must be imported.

For example, to have some typical R objects available in Python you need to import specific libraries.

- `numpy` : Arrays and matrices  
- `pandas` : Dataframes



## Numpy `ndarray`

To read an array from a _csv_ file in Python you can use `numpy.loadtxt`

In [7]:
import numpy # R: library(readr)
arr = numpy.loadtxt('darray.csv', delimiter=',') 
#R: arr <- read_csv('darray.csv', col_names = FALSE)
arr

array([1.230e+02, 3.400e+01, 1.230e+02, 4.500e+01, 2.346e+03, 1.230e+02,
       5.236e+03, 2.340e+02, 6.510e+02, 6.400e+01, 2.430e+02, 2.346e+03,
       5.234e+03, 6.245e+03, 6.000e+00, 3.246e+03, 4.500e+01, 7.234e+03,
       6.000e+00])

or you may give an _alias_ to the library name, such as `np`

In [13]:
import numpy as np # R: library(readr)
arr = np.loadtxt('darray.csv', delimiter=',') 
#R: arr <- read_csv('darray.csv', col_names = FALSE)
arr

array([1.230e+02, 3.400e+01, 1.230e+02, 4.500e+01, 2.346e+03, 1.230e+02,
       5.236e+03, 2.340e+02, 6.510e+02, 6.400e+01, 2.430e+02, 2.346e+03,
       5.234e+03, 6.245e+03, 6.000e+00, 3.246e+03, 4.500e+01, 7.234e+03,
       6.000e+00])

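or, as in R, you may make all of the library's objects available in the main scope (this can silently overwrite names already defined there, so the alias form is usually preferred)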

In [28]:
from numpy import * # R: library(readr)
arr = loadtxt('darray.csv', delimiter=',') 
#R: arr <- read_csv('darray.csv', col_names = FALSE)
arr[:,0].mean() # R: mean(arr[[1]])

273.0
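
Column indexing such as `arr[:, 0]` only works when the array is two-dimensional. The following sketch reads a small, made-up two-column table from an in-memory string instead of a file, so it does not depend on `darray.csv`; the data in `csv_text` are hypothetical.

```python
import io
import numpy as np

# Hypothetical two-column data standing in for a CSV file
csv_text = "1,10\n2,20\n3,30\n"

# loadtxt accepts a file name or a file-like object
arr = np.loadtxt(io.StringIO(csv_text), delimiter=',')

print(arr.shape)          # (3, 2): three rows, two columns
print(arr[:, 0].mean())   # 2.0, mean of the first column
print(arr.mean(axis=0))   # mean of each column: 2.0 and 20.0
```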

## Pandas `DataFrame`

To read a data frame from a _csv_ file in Python you can use the `pandas` library

In [19]:
import pandas as pd
iris = pd.read_csv('iris.csv')
type(iris)

pandas.core.frame.DataFrame

In [20]:
iris.head() # R: head(iris)

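| Unnamed: 0 | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |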
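
An `Unnamed: 0` column typically appears when the first column of the file has an empty header, for example when row names were written out along with the data. As a sketch, assuming that is the case for `iris.csv`, passing `index_col=0` to `read_csv` uses that column as the row index instead. The inline `csv_text` below is a hypothetical stand-in for the file.

```python
import io
import pandas as pd

# Hypothetical stand-in for iris.csv: note the empty first header field
csv_text = """,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
"""

# Default: the unnamed first column becomes a regular column
default = pd.read_csv(io.StringIO(csv_text))
print(list(default.columns)[0])   # Unnamed: 0

# index_col=0: the first column becomes the row index
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(indexed.columns))
print(indexed['Sepal.Length'].mean())  # R: mean(iris$Sepal.Length)
```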
