# Introduction to Python for Data Science - Day 1 

Welcome to the course. These notebooks will guide you through the two days of the course.
They are designed for you to reproduce and play with, so feel free to modify the content.

In particular, the course is split into "theory" and "lab" sessions.
- The theory sessions take place in the morning and demonstrate hands-on how the main concepts work.
- The lab sessions are afternoon exercises designed to help you understand and try out the concepts learned in the morning.

The schedule for the first day is as follows:

| Time | Topic |
| --- | --- |
| 9.00-10.00 | Basic introduction to Python, variables and operations, if-statements, loops |
| 10.00-11.00 | Collections: tuples, lists, dictionaries, sets |
| 11.00-12.00 | Functions, error handling |
| 12.00-13.00 | **Lunch break** |
| 13.00-14.00 | Installing conda, packages, Jupyter notebook, documentation, magic functions, pull from git |
| 14.00-15.00 | Tutorial, first program / exercises |

On the first day we cover the basics of the Python language. In particular, we will look into the main concepts and how to run code.
Please create a Google account to access Colab or, if you prefer, use Jupyter Notebook on your laptop.

**Acknowledgments**

The material for this day is adapted from Chapters 2 and 3 of the book
> [Python for Data Analysis, 3rd Edition](https://wesmckinney.com/book/) by Wes McKinney, published by O'Reilly Media.

The original jupyter notebooks can be found at the [book's Github repository](https://github.com/wesm/pydata-book/tree/3rd-edition).


## Python basics

### Variables 

Let us play a bit with Python. _What does the code below do?_

In [1]:
# This is a comment, it is not executed

In [2]:
import numpy as np
data = [np.random.standard_normal() for i in range(7)]
data

[-1.7976191451550851,
 1.1140066756926867,
 -1.2097935895351124,
 -2.010429307225568,
 -1.0457552822978402,
 -1.8471213052317386,
 -1.7753631702398598]

In Python, variables are assigned by _reference_: the `=` operator does not copy the object, it creates another name (an alias) for the same object.

In [3]:
a = [1, 2, 3]

In [4]:
b = a
b

[1, 2, 3]

In [5]:
a.append(4)
b

[1, 2, 3, 4]
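
If an independent list is needed instead of an alias, one option is to copy it explicitly. The sketch below is only an illustration; the variable names `original`, `alias`, and `copied` are not part of the course material.

```python
original = [1, 2, 3]
alias = original          # a second name for the same list object
copied = original.copy()  # a new list with the same elements; list(original) also works

original.append(4)

print(alias)   # [1, 2, 3, 4]: the alias sees the change
print(copied)  # [1, 2, 3]: the copy does not
```

Note that `copy()` makes a shallow copy: nested lists inside the list would still be shared.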

For the same reason, if a mutable object such as a list is passed to a function and modified inside it, the change is also visible outside the function. We will see what a function is soon!

In [6]:
def append_element(some_list, element):
    some_list.append(element)

In [7]:
data = [1, 2, 3]
append_element(data, 4)
data

[1, 2, 3, 4]
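
The change is visible outside the function only when the object itself is modified. Assigning a new value to the parameter name inside the function leaves the caller's object untouched. A minimal sketch, where `rebind_list` is a hypothetical helper added for contrast:

```python
def append_element(some_list, element):
    # Modifies the list object that the caller passed in
    some_list.append(element)


def rebind_list(some_list, element):
    # Builds a new list and binds it to the local name only
    some_list = some_list + [element]
    return some_list


data = [1, 2, 3]
append_element(data, 4)
new_data = rebind_list(data, 5)

print(data)      # [1, 2, 3, 4]: only the append is visible here
print(new_data)  # [1, 2, 3, 4, 5]
```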

In order to know the type of a variable, use the function ```type```. What is the result of the expression below? 

In [8]:
a = 5
print(type(a))
a = "foo"
print(type(a))

<class 'int'>
<class 'str'>


In [9]:
"5" + 5

TypeError: can only concatenate str (not "int") to str

In [10]:
a = 4.5
b = 2
# String formatting, to be visited later
print(f"a is {type(a)}, b is {type(b)}")
a / b

a is <class 'float'>, b is <class 'int'>


2.25

```isinstance``` can tell you whether a variable has a specific type. 

In [11]:
a = 5
isinstance(a, int)

True

In [12]:
a = 5; b = 4.5
isinstance(a, (int, float))
isinstance(b, (int, float))

True

In [13]:
a = "foo"

Try pressing the `<tab>` key after typing ```a.``` and you can appreciate the "magic".
All the functions (methods) associated with the string ```a``` become visible. They come with Python itself (they belong to the built-in string type) and can be used to manipulate strings.

In [14]:
getattr(a, "split")

<function str.split(sep=None, maxsplit=-1)>
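
The same list of attributes can be obtained in code with the built-in function `dir`. The sketch below filters out the names starting with an underscore and then looks up one method by name; `public_names` and `shout` are illustrative names.

```python
a = "foo"

# Names of the public attributes and methods of the string
public_names = [name for name in dir(a) if not name.startswith("_")]
print(public_names[:5])

# getattr returns the method, which can then be called like any function
upper_method = getattr(a, "upper")
shout = upper_method()
print(shout)  # FOO
```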

### Operators and comparisons

Let us now play a bit with some useful operations. These basic operators make it possible to transform and compare numbers, strings, and other objects in Python.

In [15]:
5 - 7
12 + 21.5
5 <= 2

False

In [16]:
a = [1, 2, 3]
b = a
c = list(a)
a is b
a is not c

True

In [17]:
a == c

True

In [18]:
a = None
a is None

True

In [19]:
a_list = ["foo", 2, [4, 5]]
a_list[2] = (3, 4)
a_list

['foo', 2, (3, 4)]

In [20]:
a_tuple = (3, 5, (4, 5))
a_tuple[1] = "four"

TypeError: 'tuple' object does not support item assignment

In [21]:
ival = 17239871
ival ** 6

26254519291092456596965462913230729701102721

In [22]:
fval = 7.243
fval2 = 6.78e-5

In [23]:
3 / 2

1.5

In [24]:
3 // 2

1
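
The two division operators are easy to mix up, so here is a small sketch comparing `/`, `//`, and the modulo operator `%`, including the behaviour for negative numbers. The variable names are illustrative.

```python
quotient = 7 // 2   # floor division: the result is rounded down
remainder = 7 % 2   # modulo: the remainder of the division

print(quotient, remainder)  # 3 1
print(divmod(7, 2))         # (3, 1): both values at once
print(-7 // 2)              # -4: floor division rounds toward negative infinity
print(int(-7 / 2))          # -3: int() truncates toward zero
```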

### String manipulation

In [25]:
c = """
This is a longer string that
spans multiple lines
"""

In [26]:
c.count("\n")

3

In [27]:
a = "this is a string"
a[10] = "f"

TypeError: 'str' object does not support item assignment

In the code above, the interpreter raised an error. Why?

**Strings are immutable objects**, which means that their value cannot be changed in place.
The only way to "modify" a string is to create a new one, for example with the method ```replace```.


In [28]:
b = a.replace("string", "longer string")
b

'this is a longer string'

In [29]:
a

'this is a string'
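
Another way to obtain the effect of the failed assignment `a[10] = "f"` is to build a new string from slices of the old one. This is only a sketch of one possible approach:

```python
a = "this is a string"

# Take everything before position 10, the new character, and everything after it
b = a[:10] + "f" + a[11:]

print(b)  # this is a ftring
print(a)  # this is a string (the original is unchanged)
```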

It is also possible to convert numbers to strings and vice versa. The operation that converts a value from one type to another is called _casting_. To cast a variable, apply one of the functions ```str, int, float, ...``` to it.

In [30]:
a = 5.6
s = str(a)
print(s)

5.6
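
Casting also resolves the `TypeError` raised earlier by `"5" + 5`: one of the two operands has to be converted first. The sketch below shows both directions, plus what happens when a string cannot be converted. The variable names are illustrative, and the `try`/`except` construct belongs to error handling, which is covered later.

```python
text = "5"
number = 5

as_sum = int(text) + number      # convert the string to an integer, then add
as_text = text + str(number)     # convert the integer to a string, then concatenate

print(as_sum)   # 10
print(as_text)  # 55

try:
    int("five")  # not a valid integer literal
except ValueError as error:
    print(f"conversion failed: {error}")
```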


In [31]:
s = "python"
list(s)
s[:3]

'pyt'

The backslash is the escape character in Python strings, so to insert a literal "\\" in your string you need to write it twice.

In [32]:
s = "12\\34"
print(s)

12\34


Or use a raw string, ```r"your_string"```, in which backslashes are not treated as escape characters.

In [33]:
s = r"this\has\no\special\characters"
s

'this\\has\\no\\special\\characters'
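
The doubled backslashes in the output above are only how the notebook displays the value; the string itself contains single backslashes. A short sketch to check this, with illustrative variable names:

```python
escaped = "12\\34"
raw = r"12\34"

print(escaped == raw)             # True: both spellings give the same string
print(len(raw))                   # 5: one backslash, four digits
print(len("a\tb"), len(r"a\tb"))  # 3 4: "\t" is one tab character unless the string is raw
```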

In [34]:
a = "this is the first half "
b = "and this is the second half"
a + b

'this is the first half and this is the second half'

**String formatting** is an important operation to insert computed values into your strings.

In [35]:
template = "{0:.2f} {1:s} are worth US${2:d}"

In [36]:
template.format(88.46, "Argentine Pesos", 1)

'88.46 Argentine Pesos are worth US$1'

In [37]:

amount = 10
rate = 88.46
currency = "Pesos"
result = f"{amount} {currency} is worth US${amount / rate}"

In [38]:

f"{amount} {currency} is worth US${amount / rate:.2f}"

'10 Pesos is worth US$0.11'
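
The part after the colon inside the braces is a format specification. A few common ones are sketched below; the value is made up for illustration.

```python
value = 1234.5678

print(f"{value:.2f}")     # 1234.57: two decimal places
print(f"{value:,.1f}")    # 1,234.6: thousands separator
print(f"{value:10.1f}|")  # '    1234.6|': right-aligned in a field of width 10
print(f"{0.113:.1%}")     # 11.3%: shown as a percentage
print(f"{42:05d}")        # 00042: integer padded with zeros
```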

In [39]:
val = "español"
val

'español'

In [40]:

val_utf8 = val.encode("utf-8")
val_utf8
type(val_utf8)

bytes

In [41]:

val_utf8.decode("utf-8")

'español'

In [42]:

val.encode("latin1")
val.encode("utf-16")
val.encode("utf-16le")

b'e\x00s\x00p\x00a\x00\xf1\x00o\x00l\x00'
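
Different encodings represent the same text with different bytes. As a rough illustration, the sketch below compares the number of characters with the number of bytes each encoding produces:

```python
val = "español"

print(len(val))                     # 7 characters
print(len(val.encode("utf-8")))     # 8 bytes: ñ takes two bytes in UTF-8
print(len(val.encode("latin1")))    # 7 bytes: one byte per character
print(len(val.encode("utf-16le")))  # 14 bytes: two bytes per character
```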

### Boolean operations

```True``` and ```False``` are the only values a boolean variable can take. Boolean variables can be combined with the operators ```and```, ```or```, and ```not```.

In [43]:
True and True
False or True

True

In [44]:
int(False)
int(True)

1

In [45]:
a = True
b = False
not a
not b

True

In [46]:
s = "3.14159"
fval = float(s)
type(fval)
int(fval)
bool(fval)
bool(0)

False
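
The function `bool` follows a simple rule for the built-in types: zero, empty containers, the empty string, and `None` convert to `False`, and everything else converts to `True`. A small sketch with a hand-picked list of values:

```python
values = [0, 0.0, "", [], None, 1, -2.5, "0", [0]]

for value in values:
    print(repr(value), bool(value))

# Keep only the values that convert to False
falsy = [value for value in values if not value]
print(falsy)  # [0, 0.0, '', [], None]
```

Note that the string `"0"` and the list `[0]` are not empty, so they count as `True`.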

```None``` is a keyword that represents the absence of a value (its own type is ```NoneType```). To check whether a variable is ```None```, use
```python
var is None
```

In [47]:

a = None
a is None
b = 5
b is not None

True
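
One common place where `None` shows up is as the result of a function that does not return anything. A minimal sketch, where `say_hello` is a made-up function:

```python
def say_hello():
    print("hello")  # there is no return statement


result = say_hello()

print(result is None)  # True
print(type(None))      # <class 'NoneType'>
```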

In [48]:

from datetime import datetime, date, time
dt = datetime(2011, 10, 29, 20, 30, 21)
dt.day
dt.minute

30

In [49]:

dt.date()
dt.time()

datetime.time(20, 30, 21)

In [50]:

dt.strftime("%Y-%m-%d %H:%M")

'2011-10-29 20:30'

In [51]:

datetime.strptime("20091031", "%Y%m%d")

datetime.datetime(2009, 10, 31, 0, 0)

In [52]:

dt_hour = dt.replace(minute=0, second=0)
dt_hour

datetime.datetime(2011, 10, 29, 20, 0)

In [53]:

dt

datetime.datetime(2011, 10, 29, 20, 30, 21)

In [54]:

dt2 = datetime(2011, 11, 15, 22, 30)
delta = dt2 - dt
delta
type(delta)

datetime.timedelta

In [55]:

dt
dt + delta

datetime.datetime(2011, 11, 15, 22, 30)
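
A `timedelta` stores a difference as days and seconds, and it can also be created directly to shift a date. The sketch below reuses the two dates from the cells above:

```python
from datetime import datetime, timedelta

dt = datetime(2011, 10, 29, 20, 30, 21)
dt2 = datetime(2011, 11, 15, 22, 30)
delta = dt2 - dt

print(delta.days, delta.seconds)  # 17 7179
print(delta.total_seconds())      # 1475979.0

# Shift a date by an explicit amount of time
shifted = dt + timedelta(days=1, hours=2)
print(shifted)  # 2011-10-30 22:30:21
```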

### Control statements (if)

Control statements are among the most important features of a programming language: they allow a piece of code to be executed only under specific conditions. In Python, the main control statement is the if-statement. The syntax is the following:
```python
if condition: 
    code_if_condition_true
else:
    code_if_condition_false
```

Let's see some examples.

In [56]:
# Does the code below print or not? 
a = 5; b = 7
c = 8; d = 4
if a < b or c > d:
    print("Made it")

Made it


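When there are more than two cases, use ```elif``` to add further conditions. They are checked in order, and only the first branch whose condition is true is executed.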

In [57]:
age = 27

if age < 10: 
    print('Child')
elif age < 15: 
    print('Teenager')
elif age < 30: 
    print('Young adult')
elif age < 60: 
    print('Adult')
else: 
    print('Senior')

Young adult


Comparison operators (>, <, >=, <=, ==, !=), logical operators (and, or, not), and the containment operator (in) are used to define the conditions of if-statements. Comparisons can also be chained, as in the next cell.

In [58]:
4 > 3 > 2 > 1

True
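
A chained comparison is shorthand for joining the individual comparisons with `and`. The sketch below shows this, together with a few uses of the containment operator `in`. The values are made up for illustration.

```python
x = 5

print(1 < x < 10)        # True
print(1 < x and x < 10)  # True: the same test written out in full

print(3 in [1, 2, 3])       # True: membership in a list
print("py" in "python")     # True: substring test
print("key" in {"key": 1})  # True: for dictionaries, the keys are checked
print(4 not in (1, 2, 3))   # True
```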

### Loops

Loops allow iterating over numbers or collections of objects. Python defines two kinds of loops:
- _for_ loops to iterate over objects
- _while_ loops to iterate as long as a condition is True

#### For-loops

The syntax of a for-loop is the following. 

```python
for variable in objects: 
    do_something
```

Let us generate all the pairs (i, j) of numbers from 0 to 3 with j <= i. The ```break``` statement leaves the inner loop as soon as j exceeds i.

In [59]:
for i in range(4):
    for j in range(4):
        if j > i:
            break
        print((i, j))

(0, 0)
(1, 0)
(1, 1)
(2, 0)
(2, 1)
(2, 2)
(3, 0)
(3, 1)
(3, 2)
(3, 3)
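
Note that `break` leaves only the innermost loop. Its counterpart `continue` skips the rest of the current iteration and moves on to the next one. The sketch below is a variation of the cell above that additionally skips every pair with j equal to 1; the list `visited` is an illustrative name.

```python
visited = []

for i in range(4):
    for j in range(4):
        if j > i:
            break      # leave the inner loop; the outer loop goes on
        if j == 1:
            continue   # skip this pair and go to the next j
        visited.append((i, j))

print(visited)
# [(0, 0), (1, 0), (2, 0), (2, 2), (3, 0), (3, 2), (3, 3)]
```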


The function ```range(10)``` returns an object that iterates over the numbers from 0 to 9.  

In [60]:

range(10)
list(range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [61]:
list(range(0, 20, 2))
list(range(5, 0, -1))

[5, 4, 3, 2, 1]

Similarly, one can write a for-loop over the indices of a predefined list of numbers.

In [62]:
seq = [1, 2, 3, 4]
for i in range(len(seq)):
    print(f"element {i}: {seq[i]}")

element 0: 1
element 1: 2
element 2: 3
element 3: 4
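
When both the index and the element are needed, the built-in function `enumerate` is a common alternative to `range(len(seq))`. A minimal sketch producing the same output as the cell above:

```python
seq = [1, 2, 3, 4]

# enumerate yields (index, element) pairs
for i, value in enumerate(seq):
    print(f"element {i}: {value}")

pairs = list(enumerate(seq))
print(pairs)  # [(0, 1), (1, 2), (2, 3), (3, 4)]
```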


For-loops and if-statements can, of course, be combined.

In [63]:
total = 0
for i in range(100_000):
    # % is the modulo operator
    if i % 3 == 0 or i % 5 == 0:
        total += i
print(total)

2333316668
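
The same total can be computed more compactly with the built-in `sum`, and it can be cross-checked with a formula. The helper `sum_of_multiples` below is a hypothetical function written for this check: it adds up the multiples of 3 and of 5 and subtracts the multiples of 15, which would otherwise be counted twice.

```python
# Same loop and condition, written as a single expression
total = sum(i for i in range(100_000) if i % 3 == 0 or i % 5 == 0)


def sum_of_multiples(k, limit):
    # Sum of k, 2k, 3k, ... below limit, using 1 + 2 + ... + n = n * (n + 1) / 2
    n = (limit - 1) // k
    return k * n * (n + 1) // 2


closed_form = (
    sum_of_multiples(3, 100_000)
    + sum_of_multiples(5, 100_000)
    - sum_of_multiples(15, 100_000)
)

print(total)        # 2333316668
print(closed_form)  # 2333316668
```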


#### While-loops

A while-loop repeats a piece of code as long as the condition is True. The syntax is the following.

```python
while condition: 
    do_something
```

In [64]:
s = 'p'

while len(s) < 5: 
    s += 'e'
    
print(s)

peeee
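
While-loops are useful when the number of iterations is not known in advance. The sketch below counts how many times a number can be halved before it reaches 1, and then shows a loop that is stopped from the inside with `break`. The names `value`, `steps`, and `attempts` are illustrative.

```python
value = 100
steps = 0

# Halve the value until it reaches 1
while value > 1:
    value = value // 2
    steps += 1

print(steps)  # 6

attempts = 0
while True:  # runs until break is reached
    attempts += 1
    if attempts == 3:
        break

print(attempts)  # 3
```

If the condition never becomes False and no `break` is reached, the loop runs forever, so make sure that something inside the loop changes the condition.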
