# Introduction to Python for Data Science - Day 1 

Welcome to the course. These notebooks will guide you through the two days of the course.
They are designed for you to reproduce and experiment with, so feel free to modify the content.

In particular, the course is split into "theory" and "lab" sessions.
- The theory sessions take place in the morning and show, hands-on, how the main concepts work.
- The lab sessions are afternoon exercises designed to help you understand and try out the concepts learned in the morning.

On the first day we cover the basics of the Python language. In particular, we look at the main concepts and how to run code.
Please create a Google account to access Colab or, if you prefer, use Jupyter Notebook on your laptop.

**Acknowledgments**

The material for this day is adapted from Chapters 2 and 3 of the book
> [Python for Data Analysis, 3rd Edition](https://wesmckinney.com/book/) by Wes McKinney, published by O'Reilly Media.

The original Jupyter notebooks can be found in the [book's GitHub repository](https://github.com/wesm/pydata-book/tree/3rd-edition).


## Functions and Error handling

Functions are the building blocks of a programming language. A function is a repeatable set of operations with an assigned name. To declare a function, use the syntax

```python 
def function_name(param1, param2, ...): 
    code_of_the_function
    return result_of_the_function
```

In [136]:
def my_function(x, y):
    return x + y

You can call a function by its name and pass the **right** number of parameters as input. 

In [137]:
result = my_function(1, 2)
result

3
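Calling a function with the wrong number of arguments raises a `TypeError`. Arguments can also be passed by name (keyword arguments), in which case their order does not matter. The snippet below is a small sketch that redefines `my_function` so it runs on its own:

```python
def my_function(x, y):
    return x + y

# keyword arguments can be given in any order
print(my_function(y=2, x=1))  # 3

# passing too few arguments raises a TypeError
try:
    my_function(1)
except TypeError as err:
    print("TypeError:", err)
```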

A function may or may not return a value. A function without a `return` statement returns `None`:

In [138]:
def function_without_return(x):
    print(x)

result = function_without_return("hello!")
print(result)

hello!
None
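The same happens on any path through a function that ends without reaching a `return` statement. A minimal sketch with a hypothetical `sign_label` function:

```python
def sign_label(x):
    if x > 0:
        return "positive"
    elif x < 0:
        return "negative"
    # when x == 0 no return statement is reached, so the function returns None

print(sign_label(3))  # positive
print(sign_label(0))  # None
```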


A function can return multiple values and have *optional* parameters with a default value. 

```z``` is an optional parameter in the function below: when it is not passed, it defaults to 1.5.

In [139]:
def my_function2(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

In [140]:
print(my_function2(5, 6, z=0.7))
print(my_function2(3.14, 7, 3.5))
print(my_function2(10, 20))

0.06363636363636363
35.49
45.0
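The function above uses an optional parameter but returns a single value. To return multiple values, a function can list them after `return` separated by commas: Python packs them into a tuple, which the caller can unpack into separate variables. A minimal sketch with a hypothetical `min_max` function:

```python
def min_max(values):
    # the two results are packed into a tuple (smallest, largest)
    return min(values), max(values)

result = min_max([4, 0, 1, 5, 6])
print(result)     # (0, 6)

# tuple unpacking assigns each returned value to its own variable
low, high = min_max([4, 0, 1, 5, 6])
print(low, high)  # 0 6
```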


#### Global variables

A function can access a variable defined outside its body and, if the variable holds a mutable object such as a list, modify it in place.
In general, this practice is discouraged because the function then depends on hidden state, which can lead to subtle bugs, but it is useful in cases like program settings or variables that are shared among functions.

In [141]:
a = []
def func():
    for i in range(5):
        a.append(i)

In [142]:
func()
print(a)
func()
print(a)

[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
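Because `a` is shared, every call to `func` keeps appending to the same list. A common alternative, sketched here assuming we are free to change the function's signature, is to pass the list as a parameter, so the data being modified is visible at the call site (the name `append_range` is hypothetical):

```python
def append_range(target, n=5):
    # modifies the list passed by the caller, not a global variable
    for i in range(n):
        target.append(i)
    return target

b = []
append_range(b)
print(b)                    # [0, 1, 2, 3, 4]
print(append_range([], 3))  # [0, 1, 2]
```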


To assign a new value to a variable defined outside the function (rather than only modifying it in place), declare it inside the function with the keyword ```global```:

In [143]:
a = None
def bind_a_variable():
    global a
    a = []
bind_a_variable()
print(a)

[]
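Without `global`, an assignment inside a function creates a new *local* variable that shadows the outer one, and the outer variable is left unchanged. A minimal sketch contrasting the two cases (the function names are hypothetical):

```python
counter = 0

def increment_local():
    counter = 1  # creates a new local variable; the global counter is untouched
    return counter

def increment_global():
    global counter
    counter = counter + 1  # rebinds the module-level counter

increment_local()
print(counter)  # 0
increment_global()
print(counter)  # 1
```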


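The following example groups a sequence of string-cleaning steps (removing surrounding whitespace, removing punctuation, and fixing capitalization) into a single reusable function, using the regular expression module `re`:

In [144]: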
states = ["   Alabama ", "Georgia!", "Georgia", "georgia", "FlOrIda",
          "south   carolina##", "West virginia?"]

In [145]:
import re

def clean_strings(strings):
    result = []
    for value in strings:
        value = value.strip()
        value = re.sub("[!#?]", "", value)
        value = value.title()
        result.append(value)
    return result

In [146]:
clean_strings(states)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South   Carolina',
 'West Virginia']

Since functions are objects, they can be stored in lists and passed as parameters to other functions.

In [147]:
def remove_punctuation(value):
    return re.sub("[!#?]", "", value)

clean_ops = [str.strip, remove_punctuation, str.title]

def clean_strings(strings, ops):
    result = []
    for value in strings:
        for func in ops:
            value = func(value)  # apply each function in ops, in order, to the current string
        result.append(value)
    return result

In [148]:
clean_strings(states, clean_ops)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South   Carolina',
 'West Virginia']
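One benefit of passing the operations as a list is that the cleaning pipeline can be extended without changing `clean_strings`. For example, the output above still contains `'South   Carolina'` with repeated spaces. A minimal sketch, assuming we add a hypothetical `collapse_spaces` function that replaces each run of whitespace with a single space:

```python
import re

def remove_punctuation(value):
    return re.sub("[!#?]", "", value)

def collapse_spaces(value):
    # hypothetical helper: replace any run of whitespace with a single space
    return re.sub(r"\s+", " ", value)

def clean_strings(strings, ops):
    result = []
    for value in strings:
        for func in ops:
            value = func(value)
        result.append(value)
    return result

states = ["   Alabama ", "Georgia!", "Georgia", "georgia", "FlOrIda",
          "south   carolina##", "West virginia?"]
clean_ops = [str.strip, remove_punctuation, collapse_spaces, str.title]
print(clean_strings(states, clean_ops))
# ['Alabama', 'Georgia', 'Georgia', 'Georgia', 'Florida', 'South Carolina', 'West Virginia']
```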

Similarly, the built-in ```map``` function applies a function to each item of a list (or any other iterable):

In [149]:
for x in map(remove_punctuation, states):
    print(x)

   Alabama 
Georgia
Georgia
georgia
FlOrIda
south   carolina
West virginia
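`map` does not build a list: it returns a lazy `map` object (an iterator, covered in the section on iterators) that produces each result only when requested. To get all the results at once, pass it to `list`. When given several iterables, `map` calls the function with one item from each. A minimal sketch (`words` and `add` are hypothetical names):

```python
words = ["foo", "card", "bar"]

# map returns a lazy iterator; list() collects all the results
lengths = list(map(len, words))
print(lengths)  # [3, 4, 3]

# with two iterables, the function receives one item from each
def add(x, y):
    return x + y

print(list(map(add, [1, 2, 3], [10, 20, 30])))  # [11, 22, 33]
```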


#### Lambda functions

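Lambda functions are anonymous functions: they have no name and consist of a single expression, whose value is returned. They are convenient when a short function is needed only once, for example as a parameter of another function. The syntax is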

```python
lambda param1, param2, ...: expression
```

In [150]:
def short_function(x):
    return x * 2

equiv_anon = lambda x: x * 2

In [151]:
def apply_to_list(some_list, f):
    return [f(x) for x in some_list]

ints = [4, 0, 1, 5, 6]
apply_to_list(ints, lambda x: x * 2)

[8, 0, 2, 10, 12]

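Lambda functions are often used as the `key` parameter of `sort`, which orders the items by the value that the key function returns for each of them. In the example below, the key is the number of distinct letters in each string:

In [152]: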
strings = ["foo", "card", "bar", "aaaa", "abab"]

In [153]:

strings.sort(key=lambda x: len(set(x)))
strings

['aaaa', 'foo', 'abab', 'bar', 'card']
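`strings.sort` sorts the list in place. The built-in `sorted` accepts the same `key` parameter but returns a new list and leaves the original unchanged. A minimal sketch that sorts the same words by their last letter with a lambda:

```python
strings = ["foo", "card", "bar", "aaaa", "abab"]

# sorted() returns a new list; the key is the last character of each string
by_last_letter = sorted(strings, key=lambda s: s[-1])
print(by_last_letter)  # ['aaaa', 'abab', 'card', 'foo', 'bar']
print(strings)         # the original list is unchanged
```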

#### Iterators

An iterator is an object that produces the elements of a collection **one at a time**. It does not compute all its values upfront: each value is produced only when requested, for example by a `for` loop. When you loop over a dictionary, Python creates an iterator over its keys:

In [154]:
some_dict = {"a": 1, "b": 2, "c": 3}
for key in some_dict:
    print(key)

a
b
c


If you display an iterator, you only see its type and memory address, not its content,

In [155]:
dict_iterator = iter(some_dict)
dict_iterator

<dict_keyiterator at 0x7fc78c579440>

unless you consume it, for example by converting it into a list:

In [156]:
list(dict_iterator)

['a', 'b', 'c']
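Converting an iterator to a list consumes it: once its elements have been produced, the iterator is exhausted and yields nothing more. Elements can also be requested one at a time with the built-in `next`, which raises `StopIteration` when no elements remain (a `for` loop handles this automatically). A minimal sketch:

```python
some_dict = {"a": 1, "b": 2, "c": 3}
it = iter(some_dict)

print(next(it))  # a
print(next(it))  # b
print(list(it))  # ['c'] (only the remaining key)
print(list(it))  # [] (the iterator is now exhausted)

try:
    next(it)
except StopIteration:
    print("no more elements")
```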

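Generators are a concise way to create iterators. A generator is defined like a normal function but uses the keyword `yield` instead of `return`: calling it does not run its body but returns a generator object, and each value is computed only when requested.

In [157]: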
def squares(n=10):
    print(f"Generating squares from 1 to {n ** 2}")
    for i in range(1, n + 1):
        yield i ** 2

In [158]:
gen = squares()
gen

<generator object squares at 0x7fc78c6c18c0>

In [159]:
for x in gen:
    print(x, end=" ")

Generating squares from 1 to 100
1 4 9 16 25 36 49 64 81 100 
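Note that the message is printed only when the `for` loop starts requesting values, not when `squares()` is called. Because values are produced on demand, a generator can even describe an infinite sequence, as long as only a finite number of items is taken from it. A minimal sketch using `itertools.islice` from the standard library (the generator `naturals` is a hypothetical example):

```python
import itertools

def naturals():
    # hypothetical generator: yields 1, 2, 3, ... forever
    n = 1
    while True:
        yield n
        n += 1

# islice lazily takes only the first 5 values, so this terminates
first_five = list(itertools.islice(naturals(), 5))
print(first_five)  # [1, 2, 3, 4, 5]
```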

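A generator can also be created with a *generator expression*, written like a list comprehension but with parentheses instead of square brackets. Generator expressions can be passed directly as arguments to functions such as `sum` and `dict`:

In [160]: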
gen = (x ** 2 for x in range(100))
gen

<generator object <genexpr> at 0x7fc78c6c1a80>

In [161]:
sum(x ** 2 for x in range(100))
dict((i, i ** 2) for i in range(5))

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

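The standard-library module `itertools` contains a number of useful iterator functions. For example, `itertools.groupby` takes a sequence and a key function, and produces `(key, group)` pairs for consecutive elements that share the same key: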

In [162]:
import itertools
def first_letter(x):
    return x[0]

names = ["Alan", "Adam", "Wes", "Will", "Albert", "Steven"]

for letter, group in itertools.groupby(names, first_letter):
    print(letter, list(group))  # group is an iterator over the names in this group

A ['Alan', 'Adam']
W ['Wes', 'Will']
A ['Albert']
S ['Steven']
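`groupby` only groups *consecutive* elements with the same key, which is why `A` appears twice above. To collect all names that share a first letter into a single group, sort the sequence by the same key first. A minimal sketch:

```python
import itertools

def first_letter(x):
    return x[0]

names = ["Alan", "Adam", "Wes", "Will", "Albert", "Steven"]

# sorting by the grouping key makes equal keys adjacent
groups = {letter: list(group)
          for letter, group in itertools.groupby(sorted(names, key=first_letter),
                                                 first_letter)}
print(groups)  # {'A': ['Alan', 'Adam', 'Albert'], 'S': ['Steven'], 'W': ['Wes', 'Will']}
```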


### Error handling

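Sometimes a call to a function or an operation raises an error (in Python, an *exception*). For example, `float` cannot convert a string that does not represent a number: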

In [163]:
float("1.2345")
float("something")

ValueError: could not convert string to float: 'something'

To handle an error, we can use the syntax below. If no error name follows `except`, every error is caught:

```python
try: 
    code_with_potential_error
except error_name_or_empty: 
    code_if_error_occurs
``` 

In [164]:

def attempt_float(x):
    try:
        return float(x)
    except:
        return x

In [165]:

attempt_float("1.2345")
attempt_float("something")

'something'

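Catching every error with a bare `except` can hide unexpected problems. Note that `float` can also fail with a different type of error, a `TypeError`, when it receives an argument such as a tuple:

In [166]: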
float((1, 2))

TypeError: float() argument must be a string or a real number, not 'tuple'

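To handle only a specific type of error, name it after `except`. Any other error, such as the `TypeError` below, is not caught and is raised as usual:

In [167]: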
def attempt_float(x):
    try:
        return float(x)
    except ValueError:
        return x

In [168]:
attempt_float((1, 2))

TypeError: float() argument must be a string or a real number, not 'tuple'

To handle several types of error in the same way, list them in a tuple:

In [169]:
def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return x
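A `try` statement can also have an `else` block, which runs only if no error was raised, and a `finally` block, which runs in every case (useful for clean-up code such as closing a file). A minimal sketch that records which blocks run (the function `parse_number` and the `log` list are hypothetical):

```python
def parse_number(x, log):
    try:
        value = float(x)
    except (TypeError, ValueError):
        log.append("except")
        value = None
    else:
        log.append("else")     # runs only when no error was raised
    finally:
        log.append("finally")  # always runs
    return value

log = []
print(parse_number("1.5", log), log)        # 1.5 ['else', 'finally']
log = []
print(parse_number("something", log), log)  # None ['except', 'finally']
```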

With the keyword ```raise```, followed by an error (for example ```raise ValueError("message")```), we can raise errors in our own code, for instance to reject invalid input:

In [170]:
def check_receipt(amount):
    if amount < 0: 
        raise ValueError('The amount cannot be negative')
    else: 
        print(f"We received a receipt of {amount}DKK")

check_receipt(-5)

ValueError: The amount cannot be negative

In [171]:
check_receipt(6)

We received a receipt of 6DKK
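An error raised with `raise` can be handled like any built-in one, and `except ... as err` gives access to the error object and its message. A minimal sketch that redefines `check_receipt` so it runs on its own:

```python
def check_receipt(amount):
    if amount < 0:
        raise ValueError('The amount cannot be negative')
    print(f"We received a receipt of {amount}DKK")

for amount in [6, -5]:
    try:
        check_receipt(amount)
    except ValueError as err:
        # err holds the error raised inside check_receipt
        print(f"Rejected {amount}: {err}")
# We received a receipt of 6DKK
# Rejected -5: The amount cannot be negative
```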
