# Preface

## Why Python?
1. General-purpose language - add what you need
2. Portable (Linux, Windows, Mac)
3. Interactive
4. Free
5. Community and ecosystem
6. Easy to use

## Working with Python
There are many possible workflows, so find your own! In this course we use Jupyter notebooks and Pandas:
* Python + Jupyter notebook + Pandas = A complete environment
* Interactive
* Encourages an iterative work process (research?)
* Documentation, code and visualization in one - literate programming
* Reproducing results and figures

# Introduction

In this session, you will learn the basic Python syntax for data manipulation & analysis, including:

1. General syntax
2. Basic operations
3. Objects & data types
4. Flow control




In [None]:
import numpy as np # Basic library for all kinds of numerical operations
import pandas as pd # Basic library for data manipulation in dataframes

# Basics

## The very basics

In [None]:
# Running a cell (Ctrl-Enter, Shift-Enter)
print('Hello world')

Hello world


## Variables

In [None]:
i = 6
print(i, type(i))

6 <class 'int'>


In [None]:
x = 3.2
print(x, type(x))

3.2 <class 'float'>


In [None]:
s = 'Hello'
print(s, type(s))

Hello <class 'str'>


## Value assignment & evaluation

In [None]:
x = 3         # Assignment
print('We assigned x the value of ', x)              # Evaluate the expression and print the result

We assigned x the value of  3


In [None]:
y = 4         # Assignment
y + 5         # Evaluation, y remains 4

9

In [None]:
z = x + 17*y  # Assignment
z             # Evaluation

71

In [None]:
a=2

In [None]:
# basic mathematical operations
print(x+y, x*y, x-y, x/y, a**2, x+y**2, (x+y)**2)

7 12 -1 0.75 4 19 49
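
Beyond the operators above, Python also has floor division (`//`), the remainder operator (`%`), and the usual precedence rules. A small sketch, reusing the values x = 3 and y = 4:

```python
x = 3
y = 4

print(x / y)        # true division always returns a float: 0.75
print(y // x)       # floor division: 1
print(y % x)        # remainder: 1
print(x + y * 2)    # multiplication binds tighter than addition: 11
print((x + y) * 2)  # parentheses override precedence: 14
```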


## Value comparison

Comparisons return boolean values: True or False

In [None]:
2==2  # Equality

True

In [None]:
2!=2  # Inequality

False

In [None]:
x <= y # less than or equal: "<", ">", and ">=" also work

True

In [None]:
(x | z) >= y  # | is the bitwise OR of two integers (3 | 71 == 71), not a logical "or"

True

In [None]:
(x & z) >= y  # & is the bitwise AND of two integers (3 & 71 == 3), not a logical "and"

False

In [None]:
x + z / 50 < y

False
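
Note that `|` and `&` act on the bits of integers. To combine comparisons logically, Python uses the keywords `and`, `or`, and `not`. A short sketch with the same values (x = 3, y = 4, z = 71):

```python
x, y, z = 3, 4, 71

bitwise_or = x | z      # combines the bits of 3 and 71 -> 71
bitwise_and = x & z     # keeps only the shared bits -> 3

both = (x >= y) and (z >= y)    # False: x is smaller than y
either = (x >= y) or (z >= y)   # True: z is larger than y
neither = not either            # False

print(bitwise_or, bitwise_and, both, either, neither)
```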

## Special Constants: None, NaN, Inf

In [None]:
print([1, None, 3])

[1, None, 3]
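
The heading also mentions NaN and Inf. As a minimal sketch using only the standard library: `None` marks a missing object, while `float('nan')` and `float('inf')` are special floating-point values. NaN never compares equal to anything, including itself, so it has to be tested with `math.isnan`.

```python
import math

missing = None
nan = float('nan')
inf = float('inf')

print(missing is None)    # True: test for None with `is`
print(nan == nan)         # False: NaN never compares equal
print(math.isnan(nan))    # True: the reliable NaN test
print(inf > 10**100)      # True: infinity exceeds any finite number
print(math.isinf(-inf))   # True: negative infinity also exists
```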


## Importing
We need to import libraries, or only parts of libraries, all the time. Follow the common naming conventions when doing so (for example `np` for NumPy and `pd` for Pandas).

In [None]:
from math import sqrt

In [None]:
a = 2
b = 3

c = sqrt(a**2 + b**2)
print(c)

3.605551275463989
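
The common import styles can be sketched with the standard library alone. The alias `st` below is only an illustration, not an established convention.

```python
import math                  # whole module; access names as math.sqrt
import statistics as st      # whole module under a shorter alias
from math import sqrt, pi    # only selected names

print(math.sqrt(16))         # 4.0
print(sqrt(16))              # 4.0, same function without the prefix
print(st.mean([1, 2, 3]))    # arithmetic mean via the alias
print(round(pi, 2))          # 3.14
```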


## Functions
* Define a function
* Function name: pythagoras
* Arguments: a, b
* Indentation (4 spaces, usually entered with the tab key) for the whole function body
* `return` statement

In [None]:
def pythagoras(a, b):
    return sqrt(a**2 + b**2) # Notice the tab!

In [None]:
print(pythagoras)

<function pythagoras at 0x7ff9ff41ec20>


In [None]:
c = pythagoras(a, b)
print(c)

3.605551275463989


In [None]:
some_list = [(2,4),(6,7),(8,9),(1,6)]
pd.DataFrame(some_list)

   0  1
0  2  4
1  6  7
2  8  9
3  1  6


In [None]:
[pythagoras(stuff[0],stuff[1]) for stuff in some_list]

[4.47213595499958, 9.219544457292887, 12.041594578792296, 6.082762530298219]

**Best practice:** Add documentation via
* A docstring (`"""`)
* Try placing the cursor at the function and press `<shift+tab>`

In [None]:
def pythagoras(a, b):
    """
    Computes the length of the hypotenuse of a right triangle

    Arguments
    a, b: the lengths of the two legs of the right triangle
    """

    return sqrt(a**2 + b**2)
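
The docstring is stored on the function object itself, so it can also be read programmatically. A small sketch with a shortened docstring:

```python
import inspect
from math import sqrt

def pythagoras(a, b):
    """Compute the length of the hypotenuse of a right triangle."""
    return sqrt(a**2 + b**2)

print(pythagoras.__doc__)           # the raw docstring
print(inspect.getdoc(pythagoras))   # the docstring with indentation cleaned up
```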

## Mini-assignment
* Construct a function that given two points $(x_1, y_1), (x_2, y_2)$ on a line computes the slope $a$ of the line
$$ y = ax + b$$
given by
$$ a = \frac{y_2- y_1}{x_2 - x_1}$$

In [None]:
def slope(x1,y1,x2,y2):
  return (y2-y1)/(x2-x1)
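
A quick check of the solution, plus one possible way to guard against vertical lines, where $x_1 = x_2$ and the slope is undefined. The name `safe_slope` and the choice to return `None` are assumptions made for illustration, not part of the assignment.

```python
def slope(x1, y1, x2, y2):
    return (y2 - y1) / (x2 - x1)

def safe_slope(x1, y1, x2, y2):
    # Hypothetical variant: a vertical line has no finite slope
    if x2 == x1:
        return None
    return (y2 - y1) / (x2 - x1)

print(slope(0, 1, 2, 5))        # 2.0
print(safe_slope(1, 1, 1, 5))   # None
```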

# Flow Control (loops & friends)

Python is designed for readability, and therefore indentation and line breaks are part of the syntax.


In [None]:
# If/else controls
x = 5
y = 10

if (x==0):
  y = 0
else:
  y = y/x
  print(y)

2.0
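
More than two cases can be chained with `elif`. A minimal sketch; the function name `describe` is hypothetical:

```python
def describe(x):
    # Branches are checked top to bottom; the first match wins
    if x < 0:
        return "negative"
    elif x == 0:
        return "zero"
    else:
        return "positive"

for value in (-2, 0, 5):
    print(value, describe(value))
```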


In [None]:
# For loops
for i in range(1,x+1):
  print("OMG, i just counted to " + str(i))

OMG, i just counted to 1
OMG, i just counted to 2
OMG, i just counted to 3
OMG, i just counted to 4
OMG, i just counted to 5


In [None]:
# While loop
x = 5

while x > 0:
  print(x)
  x = x-1

5
4
3
2
1


In [None]:
x = 1

while True:
  print(x)
  x = x + 1
  if x > 7:
    break

1
2
3
4
5
6
7


In [None]:
even = [] # empty list
for i in range(10):
    even.append(i*2)
even

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

In [None]:
odd = []
for i in even:
    odd.append(i+1)
odd

[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
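
The two loops above can also be written as list comprehensions, the same construct used earlier with `pythagoras`. A sketch that produces the same lists:

```python
even = [i * 2 for i in range(10)]   # same result as the append loop
odd = [i + 1 for i in even]         # one more than each even number

print(even)
print(odd)
```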

### Mini-assignment

Write a function `KtoC` that translates Kelvin to Celsius

$$ C = K - 273.15 \quad \text{with} \quad C\geq - 273.15$$

The function returns `None` when $C < -273.15$

In [None]:
def KtoC(K):
  if K >= 0:  # K >= 0 is equivalent to C >= -273.15
    return K-273.15
  else:
    return None
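
A few spot checks make the boundary behaviour explicit. This sketch assumes the validity check is done on the Kelvin input ($K \geq 0$ is equivalent to $C \geq -273.15$); the name `kelvin_to_celsius` is hypothetical.

```python
def kelvin_to_celsius(K):
    # Temperatures below absolute zero (K < 0) are not physical
    if K < 0:
        return None
    return K - 273.15

print(kelvin_to_celsius(273.15))   # 0.0
print(kelvin_to_celsius(0))        # -273.15
print(kelvin_to_celsius(-5))       # None
```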



# Object classes


## Vector

One-dimensional collection of values

In [None]:
# Numeric
v1 = [1,5,11,33] # [] initiate a list
v1

[1, 5, 11, 33]

In [None]:
# String
v2 = ["hello","world"]
v2

['hello', 'world']

In [None]:
# Boolean
v3 = [True, True, False, True]
v3

[True, True, False, True]

Evaluating elements in vectors

In [None]:
v1[0]

1

In [None]:
v1[1:3]

[5, 11]
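
Slices include the start index and exclude the stop index. A short sketch of a few more indexing patterns on the same list:

```python
v1 = [1, 5, 11, 33]

print(v1[-1])     # last element: 33
print(v1[:2])     # first two elements: [1, 5]
print(v1[2:])     # from index 2 to the end: [11, 33]
print(v1[::2])    # every second element: [1, 11]
print(len(v1))    # number of elements: 4
```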

Manipulating vector elements

In [None]:
v1[2] = 1337
v1

[1, 5, 1337, 33]

Combining vectors of different element types gives a list of lists (more on lists later), with all elements keeping their original type

In [None]:
v5 =[v1, v2, v3]
v5
# Integers (numbers) are still numbers, not strings (text). Easy to see because they don't have ' '

[[1, 5, 1337, 33], ['hello', 'world'], [True, True, False, True]]

Adding vectors concatenates them (it does not sum them element by element)

In [None]:
v1 + v3

[1, 5, 1337, 33, True, True, False, True]

In [None]:
# Same for multiplication: the list is repeated, not scaled
v1 * 2

[1, 5, 1337, 33, 1, 5, 1337, 33]
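
If an element-wise result is wanted without NumPy, `zip` and a list comprehension do the job. A minimal sketch with made-up lists `left` and `right`:

```python
left = [1, 5, 11, 33]
right = [10, 20, 30, 40]

concatenated = left + right                     # 8 elements, nothing is summed
summed = [p + q for p, q in zip(left, right)]   # pairwise sums
doubled = [p * 2 for p in left]                 # each element times 2

print(concatenated)
print(summed)    # [11, 25, 41, 73]
print(doubled)   # [2, 10, 22, 66]
```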

**Element-wise operations:** To do numerical operations on vectors, use NumPy arrays (`numpy.array`). NumPy is a library that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. Here you can already see that Python is a general-purpose programming language rather than a dedicated maths language.

In [None]:
v1_array = np.array(v1)
v2_array = np.array(v2)
v3_array = np.array(v3)

In [None]:
v1_array

array([   1,    5, 1337,   33])

In [None]:
v1_array + 5

array([   6,   10, 1342,   38])

In [None]:
v1_array + v3_array

array([   2,    6, 1337,   34])

In [None]:
# Arrays of different size
v1_array + np.array([1,7])

ValueError: operands could not be broadcast together with shapes (4,) (2,) 

In [None]:
# non-numerical arrays
v1_array + v2_array

UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('int64'), dtype('<U5')) -> None
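
Broadcasting succeeds when the shapes are compatible, for example when one operand has length 1 or both have the same length. A sketch, assuming NumPy is installed:

```python
import numpy as np

v = np.array([1, 5, 1337, 33])

plus_ten = v + np.array([10])           # length-1 array is stretched to length 4
masked = v * np.array([1, 0, 1, 0])     # same length: element-wise product
large = v[v > 10]                       # boolean mask keeps values above 10
print(plus_ten, masked, large)

try:
    v + np.array([1, 7])                # lengths 4 and 2 cannot be aligned
except ValueError as err:
    print("ValueError:", err)
```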

**Mathematical operations over the vector:** For most maths you need to use NumPy or other modules (Python is not, per se, a maths language)

In [None]:
# NumPy functions also accept plain Python lists
np.sum(v1)

1376

In [None]:
np.mean(v1)

344.0

In [None]:
# Population standard deviation: ddof (delta degrees of freedom) is 0 by default
np.std(v1, ddof=0)

573.4413657907842

In [None]:
# Sample standard deviation: ddof=1 divides by n-1 instead of n
np.std(v1, ddof=1)

662.1530538075518

In [None]:
np.corrcoef(v1,v1)

array([[1., 1.],
       [1., 1.]])
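
The same distinction between population and sample standard deviation exists in the standard library's `statistics` module, which can serve as a cross-check:

```python
import statistics

v1 = [1, 5, 1337, 33]

population_sd = statistics.pstdev(v1)   # divides by n, like ddof=0
sample_sd = statistics.stdev(v1)        # divides by n-1, like ddof=1

print(statistics.mean(v1))
print(population_sd)
print(sample_sd)
```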

Also consider this cheat sheet

https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf

## Lists
* An indexable collection of variables (objects)
* C-style or 0-indexed

In [None]:
l = ['Caroline', 1.0, pythagoras]
type(l)

list

In [None]:
l

['Caroline', 1.0, <function __main__.pythagoras(a, b)>]

In [None]:
l[0]

'Caroline'

In [None]:
type(l[0])

str

Common methods for lists

In [None]:
l.append(sqrt(2.0))
l

['Caroline', 1.0, <function __main__.pythagoras(a, b)>, 1.4142135623730951]

In [None]:
a = l.pop(2)
a

In [None]:
l

['Caroline', 1.0, 1.4142135623730951]

In [None]:
l.pop(0)
l.append(100)
l.sort(reverse=True)

In [None]:
l

In [None]:
l[1] = 2
l

In [None]:
l.extend([6.0, 4])
l
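
`sort` only works when all elements can be compared with each other; in the cells above the string had already been popped before sorting. A sketch of both cases:

```python
numbers = [1.0, 1.4142135623730951, 100]
numbers.sort(reverse=True)     # sorts in place and returns None
print(numbers)                 # [100, 1.4142135623730951, 1.0]

mixed = ['Caroline', 1.0, 100]
sort_failed = False
try:
    mixed.sort()               # str and float cannot be ordered
except TypeError as err:
    sort_failed = True
    print("TypeError:", err)

print(sorted([3, 1, 2]))       # sorted() returns a new list: [1, 2, 3]
```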

## Tuples
* Immutable "lists"

In [None]:
t = (1.0, 4.0)
t, type(t)

In [None]:
t[1]

In [None]:
t[1] = 2  # Raises TypeError: tuples are immutable
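
The assignment above fails because tuples cannot be changed after creation. A sketch showing the error, tuple unpacking, and how to build a new tuple instead:

```python
t = (1.0, 4.0)

assignment_failed = False
try:
    t[1] = 2                  # item assignment is not allowed
except TypeError as err:
    assignment_failed = True
    print("TypeError:", err)

first, second = t             # unpacking into two variables
print(first, second)          # 1.0 4.0

t = (t[0], 2)                 # build a new tuple instead of modifying
print(t)                      # (1.0, 2)
```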

## Dictionaries
- Like lists with user-definable indices
- Can, like lists and tuples, contain a mix of different types of data.
- The indices can *also* be different kinds of data - unlike lists and tuples.

In [None]:
d = {'one': 1, 2: 1 + 1, 3.0: 'three'}
d

Useful methods

In [None]:
d.keys()

In [None]:
d.items()

In [None]:
some_value = d.pop(3.0)
d

In [None]:
some_value

In [None]:
d['four'] = 4
d

In [None]:
d.update({'five': 5.0, 6: 6.0})
d
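
Looking up a missing key with `d[key]` raises a `KeyError`; `get` and `in` avoid that. A short sketch on a dictionary like the one above:

```python
d = {'one': 1, 2: 1 + 1, 3.0: 'three'}

print(d['one'])               # 1
print(d.get('ten'))           # None: key is missing, no error
print(d.get('ten', 0))        # 0: explicit default
print(2 in d)                 # True: `in` tests the keys

for key, value in d.items():  # iterate over key/value pairs
    print(key, '->', value)
```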

## Data Frames

In Python, data frames are provided by Pandas, a very comprehensive library for data manipulation and analysis.

We will introduce it in more detail later, so here is only a brief overview:

In [None]:
# We construct the DF from a dictionary which is indicated by {'some_key':['some_values']}

df1 = pd.DataFrame(
    {'ID':range(1,5), # Python counts from 0 and the last value in a range is excluded
     'FirstName':["Jesper","Jonas","Pernille","Helle"],
     'Female':[False,False,True,True],
     'Age':[22,33,44,55]
})

In [None]:
# Python has no built-in factor type (pandas offers the `category` dtype); as you can see, pandas inferred the column types from your input
df1.info()

In [None]:
df1.FirstName #dot notation

In [None]:
df1['FirstName'] #more traditional subsetting

In [None]:
df1.loc[:,'FirstName'] #more complex subsetting

In [None]:
df1.iloc[:,1] #index based

In [None]:
# Rows 1 and 2, columns 3 and 4 - the gender and age of Jesper & Jonas
df1.iloc[[0,1],[2,3]]


In [None]:
#Same thing
df1.loc[[0,1],['Female','Age']]

In [None]:
# Rows 1 and 3, all columns

df1.iloc[[0,2],:] # Python is 0-indexed, so subtract 1 from the row numbers when coming from R

In [None]:
# Find everyone over the age of 30 in the data (all columns are returned)
df1[df1.Age > 30]

In [None]:
# or "Query style" (There are always many ways of doing the same thing)
df1.query('Age > 30')
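
Conditions can be combined with `&` and `|`, each wrapped in parentheses, and `.loc` can return just one column. A sketch, assuming pandas is installed and rebuilding the same frame:

```python
import pandas as pd

df1 = pd.DataFrame({
    'ID': range(1, 5),
    'FirstName': ["Jesper", "Jonas", "Pernille", "Helle"],
    'Female': [False, False, True, True],
    'Age': [22, 33, 44, 55],
})

# Only the names of everyone over 30
names = df1.loc[df1.Age > 30, 'FirstName']
print(names.tolist())             # ['Jonas', 'Pernille', 'Helle']

# Women over 30: parentheses are required around the comparison
women = df1[(df1.Age > 30) & df1.Female]
print(women.FirstName.tolist())   # ['Pernille', 'Helle']
```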