# Introduction to Python for Data Science - Day 1 

Welcome to the course. These notebooks will guide you through the two days of the course. 
They are designed for you to reproduce and play with, so feel free to modify the content. 

The course is split into "theory" and "lab" sessions. 
- The theory sessions take place in the morning and show, hands-on, how the main concepts work.
- The lab sessions are afternoon exercises designed to help you understand and try out the concepts learned in the morning. 

On the first day we cover the basics of the Python language. In particular, we will look at the main concepts and how to run code. 
Please create a Google account to access Colab or, if you prefer, use Jupyter Notebook on your laptop. 

**Acknowledgments**

The material for this day is adapted from Chapter 2 and Chapter 3 of the book 
> [Python for Data Analysis, 3rd Edition](https://wesmckinney.com/book/) by Wes McKinney, published by O'Reilly Media.

The original Jupyter notebooks can be found at the [book's GitHub repository](https://github.com/wesm/pydata-book/tree/3rd-edition).


## Collections: Tuples, Lists, Sets, Dictionaries

We now look more closely at four of the main collections in Python. A collection is a "container" of data. 

Depending on the application, you might find yourself choosing one or the other. 

### Tuples

A tuple is an *immutable*, fixed-length sequence in which each position holds an item of any type. A tuple is the natural extension of a pair. 
To declare a tuple, you can use the syntax ```(item1, item2, ..., itemN)```; the parentheses can be omitted where there is no ambiguity.

In [65]:
tup = (4, 5, 6)
tup

(4, 5, 6)

In [66]:
tup = 4, 5, 6
tup

(4, 5, 6)

The ```tuple``` function builds a tuple from any sequence, such as a list or a string. 

In [67]:
tuple([4, 0, 2])
tup = tuple('string')
tup

('s', 't', 'r', 'i', 'n', 'g')

Elements in a tuple can be accessed using square brackets. 

**Note:** The first element in the tuple is in position 0.

In [68]:
tup[0]

's'

In [69]:

nested_tup = (4, 5, 6), (7, 8)
nested_tup
nested_tup[0]
nested_tup[1]

(7, 8)

In [70]:
tup = tuple(['foo', [1, 2], True])
tup[2] = False

TypeError: 'tuple' object does not support item assignment

Why did the code above raise an error? 

Why is the code below valid instead? 

In [71]:
tup[1].append(3)
tup

('foo', [1, 2, 3], True)
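
One possible way to explore the two questions above is sketched here. It assumes the explanation that a tuple stores references to objects: the references cannot be replaced, but a referenced object that is itself mutable can still change. The names `inner` and `record` are illustrative and not part of the original notebook.

```python
# A tuple holds references: the references are fixed,
# but the objects they point to may still be mutable.
inner = [1, 2]
record = ("foo", inner, True)

identity_before = id(record[1])
record[1].append(3)            # mutates the list in place
identity_after = id(record[1])

print(record)                              # ('foo', [1, 2, 3], True)
print(identity_before == identity_after)   # True: still the same list object

try:
    record[1] = [9, 9]         # replacing a position is what tuples forbid
except TypeError as err:
    print(f"TypeError: {err}")
```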

You can concatenate tuples with the ```+``` operator. 

In [72]:
(4, None, 'foo') + (6, 0) + ('bar',)

(4, None, 'foo', 6, 0, 'bar')

If ```+``` concatenates tuples, what would ```*``` do? 

_Hint_: think of the mathematical definition of ```*```.

In [73]:
('foo', 'bar') * 4

('foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar')

Elements in a tuple can be "unrolled" (unpacked) and saved into different variables.

In [74]:
tup = (4, 5, 6)
a, b, c = tup
b

5

In [75]:
tup = 4, 5, (6, 7)
a, b, (c, d) = tup
d

7

In [76]:
a, b = 1, 2
print(f"a:{a}")
print(f"b:{b}")
b, a = a, b
print(f"a:{a}")
print(f"b:{b}")

a:1
b:2
a:2
b:1


This "unrolling" capability, combined with a for loop, offers a very easy way to read the values of a list of tuples. 

In [77]:
seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
for a, b, c in seq:
    print(f'a={a}, b={b}, c={c}')

a=1, b=2, c=3
a=4, b=5, c=6
a=7, b=8, c=9


If you do not need to keep all the values of a tuple after a certain point, the starred syntax ```*var_name``` is very useful: it collects the remaining values into a list. 

In [78]:
values = 1, 2, 3, 4, 5
a, b, *rest = values
a
b
rest

[3, 4, 5]

In [79]:
a, b, *_ = values
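
The starred name does not have to come last. As a small sketch (the variable names `first`, `middle`, `last`, `head` and `tail` are chosen here only for illustration), it can also collect the values in the middle or at the beginning:

```python
values = 1, 2, 3, 4, 5

# The starred name collects everything not matched by the other names.
first, *middle, last = values
print(first, middle, last)   # 1 [2, 3, 4] 5

# It may also appear first; the collected values are always a list.
*head, tail = values
print(head, tail)            # [1, 2, 3, 4] 5
```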

Tuples also have a few useful methods. What does ```count``` do? 

In [80]:
a = (1, 2, 2, 2, 3, 4, 2)
a.count(2)

4
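
As a short complement, tuples offer a second method, ```index```, and work with the built-in ```len```. The sketch below reuses the same tuple:

```python
a = (1, 2, 2, 2, 3, 4, 2)

print(a.count(2))   # 4: number of occurrences of the value 2
print(a.count(9))   # 0: a value that is absent occurs zero times
print(a.index(3))   # 4: position of the first occurrence of the value 3
print(len(a))       # 7: number of items in the tuple
```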

### Lists

As opposed to tuples, lists are _mutable_ objects, i.e., the items inside a list can change over time. 
Converting a tuple to a list therefore gives us a mutable copy of its elements that we can update; the original tuple is left unchanged. 

In [81]:
tup = ("foo", "bar", "baz")
b_list = list(tup)
b_list
b_list[1] = "peekaboo"
b_list

['foo', 'peekaboo', 'baz']

The ```range``` function, which we have already seen, returns a ```range``` object that produces the numbers in a certain range on demand. To obtain an actual list, pass it to ```list```. 

In [82]:
gen = range(10)
gen
list(gen)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

```range``` is a useful function (we will see the equivalent in NumPy). Try checking its documentation with the syntax

```python
range?
```

In [83]:
range?

Init signature: range(self, /, *args, **kwargs)
Docstring:     
range(stop) -> range object
range(start, stop[, step]) -> range object

Return an object that produces a sequence of integers from start (inclusive)
to stop (exclusive) by step.  range(i, j) produces i, i+1, i+2, ..., j-1.
start defaults to 0, and stop is omitted!  range(4) produces 0, 1, 2, 3.
These are exactly the valid indices for a list of 4 elements.
When step is given, it specifies the increment (or decrement).
Type:           type
Subclasses:     


#### Modifying and searching a list

We can add an element to the end of a list using the ```append``` method. 

In [84]:
b_list.append("dwarf")
b_list

['foo', 'peekaboo', 'baz', 'dwarf']

The ```.``` before ```append``` is the syntax used to call a _method_ of an object, that is, a function defined by the object's _class_.  You can see all the methods using ```b_list.<tab>```

Another common operation on a list is the insertion of an element at a specific position. 

In [85]:
b_list.insert(1, "red")
b_list

['foo', 'red', 'peekaboo', 'baz', 'dwarf']

... and deletion of an element at a specific position with ```pop```, which also returns the removed element.

In [86]:
b_list.pop(2)
b_list

['foo', 'red', 'baz', 'dwarf']

```remove``` removes the first occurrence of a given value from a list. However, it needs to scan the list from the beginning in order to find the element!

In [87]:
b_list.append("foo")
b_list
b_list.remove("foo")
b_list

['red', 'baz', 'dwarf', 'foo']
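
The difference between removing by value and removing by position can be sketched as follows. The list `colors` is made up for this illustration:

```python
colors = ["red", "blue", "red", "green"]

# remove() deletes by value, and only the first match.
colors.remove("red")
print(colors)            # ['blue', 'red', 'green']

# pop() deletes by position and returns the item;
# without an argument it removes the last item.
popped = colors.pop()
print(popped, colors)    # green ['blue', 'red']

# Removing a value that is not in the list raises a ValueError.
try:
    colors.remove("yellow")
except ValueError as err:
    print(f"ValueError: {err}")
```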

#### List operations

Checking if an element is in a list 

In [88]:

"dwarf" in b_list

True

In [89]:

"dwarf" not in b_list

False

Concatenation

In [90]:

[4, None, "foo"] + [7, 8, (2, 3)]

[4, None, 'foo', 7, 8, (2, 3)]

In [91]:

x = [4, None, "foo"]
x.extend([7, 8, (2, 3)])
x

[4, None, 'foo', 7, 8, (2, 3)]

Sorting a list

In [92]:

a = [7, 2, 5, 1, 3]
a.sort()
a

[1, 2, 3, 5, 7]

In [93]:
b = ["saw", "small", "He", "foxes", "six"]
b.sort(key=len)
b

['He', 'saw', 'six', 'small', 'foxes']
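
Note that ```sort``` modifies the list in place. As a hedged sketch, the built-in ```sorted``` function returns a new sorted list instead and accepts the same ```key``` argument, plus ```reverse```. The list `words` below is a copy of the example data:

```python
words = ["saw", "small", "He", "foxes", "six"]

# sorted() returns a new list and leaves the original untouched.
by_length_desc = sorted(words, key=len, reverse=True)
print(by_length_desc)     # ['small', 'foxes', 'saw', 'six', 'He']
print(words)              # ['saw', 'small', 'He', 'foxes', 'six']

# Any one-argument function can serve as key, e.g. str.lower
# for a case-insensitive alphabetical order.
case_insensitive = sorted(words, key=str.lower)
print(case_insensitive)   # ['foxes', 'He', 'saw', 'six', 'small']
```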

#### Slicing 

Slicing returns a sub-list from one position to another using the syntax ```seq[start:stop]```; the element at ```start``` is included, the one at ```stop``` is not. 

In [94]:
seq = [7, 2, 3, 7, 5, 6, 0, 1]
seq[1:5]

[2, 3, 7, 5]

In [95]:

seq[3:5] = [6, 3]
seq

[7, 2, 3, 6, 3, 6, 0, 1]

In [96]:

seq[:5]
seq[3:]

[6, 3, 6, 0, 1]

Negative indexes in slicing count positions from the end of the list. 

In [97]:

seq[-4:]
seq[-6:-2]

[3, 6, 3, 6]

In [98]:

seq[::2]

[7, 3, 3, 0]

What does the syntax below do? 

In [99]:
seq[::-1]

[1, 0, 6, 3, 6, 3, 2, 7]
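
The general form of a slice is ```seq[start:stop:step]```. The sketch below collects a few more cases on the same list; the name `seq_copy` is introduced only for this illustration:

```python
seq = [7, 2, 3, 6, 3, 6, 0, 1]

# start, stop and step together: positions 1, 3 and 5.
print(seq[1:7:2])     # [2, 6, 6]

# A slice always builds a new list, so seq[:] is a (shallow) copy.
seq_copy = seq[:]
seq_copy[0] = 100
print(seq[0])         # 7: the original list is unchanged

# Slice bounds that fall outside the list are clipped, not an error.
print(seq[5:100])     # [6, 0, 1]
```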

### Dictionaries

Dictionaries contain (key, value) pairs. This means that a _value_ stored in a dictionary can be looked up through its _key_. Think of a dictionary as an easy way to find any object by name. 
A dictionary is created and populated in this way. 

In [100]:
empty_dict = {}
d1 = {"a": "some value", "b": [1, 2, 3, 4]}
d1

{'a': 'some value', 'b': [1, 2, 3, 4]}

Dictionaries are _mutable_ objects: their content can be changed after creation. 

In [101]:
d1[7] = "an integer" #NOTE: this is not position 7 as in lists! 7 is a key for the string "an integer"
print(d1)
d1["b"]

{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}


[1, 2, 3, 4]

#### Dictionary operations
The ```in``` keyword is used for checking whether a dictionary contains a certain key 

In [102]:
"b" in d1

True

```del``` deletes the entry for a given key; ```pop``` deletes the entry and returns its value.

In [103]:
d1[5] = "some value"
print(d1)
d1["dummy"] = "another value"
print(d1)
del d1[5]
print(d1)
ret = d1.pop("dummy")
print(ret)
print(d1)

{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer', 5: 'some value'}
{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer', 5: 'some value', 'dummy': 'another value'}
{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer', 'dummy': 'another value'}
another value
{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}
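
Looking up a key that is not in the dictionary raises a ```KeyError```. As a complement not covered in the cells above, the ```get``` method returns a default value instead, and ```pop``` accepts a default too. A minimal sketch:

```python
d1 = {"a": "some value", "b": [1, 2, 3, 4], 7: "an integer"}

print(d1.get("a"))                # some value
print(d1.get("missing"))          # None: the default when none is given
print(d1.get("missing", 0))       # 0: an explicit default
print(d1.pop("missing", "n/a"))   # n/a: nothing is removed

# Square brackets, in contrast, raise an error for a missing key.
try:
    d1["missing"]
except KeyError as err:
    print(f"KeyError: {err}")
```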


Accessing the list of keys and the list of values

In [104]:
print(list(d1.keys()))
print(list(d1.values()))

['a', 'b', 7]
['some value', [1, 2, 3, 4], 'an integer']


Accessing the pairs of items.

In [105]:
list(d1.items())

[('a', 'some value'), ('b', [1, 2, 3, 4]), (7, 'an integer')]

Updating a dictionary with the pairs of another one: existing keys are overwritten, new keys are added.

In [106]:
d1.update({"b": "foo", "c": 12})
d1

{'a': 'some value', 'b': 'foo', 7: 'an integer', 'c': 12}

#### Useful operations with dictionaries

In [107]:
tuples = zip(range(5), reversed(range(5)))
print(tuples)
mapping = dict(tuples)
print(mapping)

<zip object at 0x7fc78c5e2ec0>
{0: 4, 1: 3, 2: 2, 3: 1, 4: 0}


In [108]:

words = ["apple", "bat", "bar", "atom", "book"]
by_letter = {}

for word in words:
    letter = word[0]
    if letter not in by_letter:
        by_letter[letter] = [word]
    else:
        by_letter[letter].append(word)

by_letter

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

The ```setdefault``` method returns the value of a key if the key exists; otherwise it first inserts the key with the given default value and then returns that value. 

In [109]:
by_letter = {}
for word in words:
    letter = word[0]
    by_letter.setdefault(letter, []).append(word)
by_letter

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

Similarly, one can use the ```defaultdict``` class from the ```collections``` module.

In [110]:
from collections import defaultdict
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)
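
The cell above produces no output, so here is a small sketch of what the resulting ```defaultdict``` contains. It also shows one side effect worth knowing: merely reading a missing key creates it with the default value.

```python
from collections import defaultdict

words = ["apple", "bat", "bar", "atom", "book"]
by_letter = defaultdict(list)   # missing keys start as an empty list
for word in words:
    by_letter[word[0]].append(word)

print(dict(by_letter))    # {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

print("z" in by_letter)   # False
by_letter["z"]            # reading a missing key inserts it
print("z" in by_letter)   # True
print(by_letter["z"])     # []
```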

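The _keys_ of a dictionary must be hashable objects, which in practice usually means immutable ones! 

For a hashable object it is possible to compute its hash value, that is, an integer derived from the object's content that "almost uniquely" represents the object (two different objects can occasionally share the same hash). 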

In [111]:
hash("string")
hash((1, 2, (2, 3)))
hash((1, 2, [2, 3])) # fails because lists are mutable

TypeError: unhashable type: 'list'

In [112]:
d = {}
d[tuple([1, 2, 3])] = 5
d

{(1, 2, 3): 5}
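
To check whether an object can be used as a dictionary key, one option is to try hashing it. The helper `is_hashable` below is a hypothetical function written for this illustration, not a built-in:

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, False otherwise."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable("string"))          # True
print(is_hashable((1, 2, (2, 3))))    # True: a tuple of hashable items
print(is_hashable((1, 2, [2, 3])))    # False: the tuple contains a list
print(is_hashable([1, 2, 3]))         # False

# Equal objects always have equal hash values.
print(hash((1, 2)) == hash((1, 2)))   # True
```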

### Sets

Sets are _mutable_, unordered collections that contain no repeated elements. They can be created with the ```set``` function or with curly braces.  

In [113]:
set([2, 2, 2, 1, 3, 3])
{2, 2, 2, 1, 3, 3}

{1, 2, 3}

In [114]:
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

#### Set operations

Sets are useful to perform set operations, such as ```union```

In [115]:
a.union(b)
a | b

{1, 2, 3, 4, 5, 6, 7, 8}

Intersection

In [116]:
a.intersection(b)
a & b

{3, 4, 5}

In-place union and in-place intersection

In [117]:
c = a.copy()
c |= b
print(c)
d = a.copy()
d &= b
print(d)

{1, 2, 3, 4, 5, 6, 7, 8}
{3, 4, 5}
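
Two further set operations follow the same pattern: difference (```-```) and symmetric difference (```^```). A minimal sketch on the same two sets, printed through ```sorted``` so that the display order is predictable:

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

# Difference: elements of the first set that are not in the second.
print(sorted(a - b))    # [1, 2]
print(sorted(b - a))    # [6, 7, 8]

# Symmetric difference: elements in exactly one of the two sets.
print(sorted(a ^ b))    # [1, 2, 6, 7, 8]

# The operators have method equivalents, as with union and intersection.
print(a.difference(b) == a - b)             # True
print(a.symmetric_difference(b) == a ^ b)   # True
```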


In [118]:
my_data = [1, 2, 3, 4]
my_set = {tuple(my_data)}
my_set

{(1, 2, 3, 4)}

Containment operations

In [119]:
a_set = {1, 2, 3, 4, 5}
{1, 2, 3}.issubset(a_set)
a_set.issuperset({1, 2, 3})

True

Equality checks whether two sets contain the same elements

In [120]:
{1, 2, 3} == {3, 2, 1}

True

### Useful functions on collections

In [121]:
sorted([7, 1, 2, 6, 0, 3, 2])
sorted("horse race")

[' ', 'a', 'c', 'e', 'e', 'h', 'o', 'r', 'r', 's']

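```zip``` pairs up the elements of two or more sequences. It returns an iterator of tuples, which can be turned into a list with ```list```, and it stops at the end of the shortest sequence. 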

In [122]:
seq1 = ["foo", "bar", "baz"]
seq2 = ["one", "two", "three"]
zipped = zip(seq1, seq2)
list(zipped)

[('foo', 'one'), ('bar', 'two'), ('baz', 'three')]

In [123]:
seq3 = [False, True]
list(zip(seq1, seq2, seq3))

[('foo', 'one', False), ('bar', 'two', True)]

The ```enumerate``` function returns (position, item) pairs for the items of a collection.

In [124]:
for index, (a, b) in enumerate(zip(seq1, seq2)):
    print(f"{index}: {a}, {b}")

0: foo, one
1: bar, two
2: baz, three
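
By default the positions start from 0. As a small sketch, the optional ```start``` argument of ```enumerate``` changes the first number, which is handy for human-readable numbering:

```python
seq1 = ["foo", "bar", "baz"]

# Number the items from 1 instead of 0.
numbered = list(enumerate(seq1, start=1))
print(numbered)   # [(1, 'foo'), (2, 'bar'), (3, 'baz')]
```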


In [125]:
list(reversed(range(10)))

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

#### Comprehensions

Comprehensions create collections from a for-loop in a very compact manner

In [126]:
strings = ["a", "as", "bat", "car", "dove", "python"]
[x.upper() for x in strings if len(x) > 2]

['BAT', 'CAR', 'DOVE', 'PYTHON']
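
To see what the comprehension is doing, it can help to write the equivalent explicit loop. In this sketch the name `result` is introduced only for the comparison:

```python
strings = ["a", "as", "bat", "car", "dove", "python"]

# Explicit version of: [x.upper() for x in strings if len(x) > 2]
result = []
for x in strings:
    if len(x) > 2:                  # the "if" part filters the items
        result.append(x.upper())    # the leading expression builds each item

print(result)   # ['BAT', 'CAR', 'DOVE', 'PYTHON']
print(result == [x.upper() for x in strings if len(x) > 2])   # True
```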

In [127]:
unique_lengths = {len(x) for x in strings}
unique_lengths

{1, 2, 3, 4, 6}

```map``` applies a function to each element of a collection and returns an iterator over the results.

In [128]:
set(map(len, strings)) # Note: len is a name of a function, as functions in python are objects

{1, 2, 3, 4, 6}

In [129]:
loc_mapping = {value: index for index, value in enumerate(strings)}
loc_mapping

{'a': 0, 'as': 1, 'bat': 2, 'car': 3, 'dove': 4, 'python': 5}

Collections can be "nested". For instance you can create a list of lists. 

In [130]:
all_data = [["John", "Emily", "Michael", "Mary", "Steven"],
            ["Maria", "Juan", "Javier", "Natalia", "Pilar"]]

In [131]:
names_of_interest = []
for names in all_data:
    enough_as = [name for name in names if name.count("a") >= 2]
    names_of_interest.extend(enough_as)
names_of_interest

['Maria', 'Natalia']

The same result can be obtained with a single nested list comprehension over the list of lists. 

In [132]:
result = [name for names in all_data for name in names
          if name.count("a") >= 2]
result

['Maria', 'Natalia']

In [133]:
some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
flattened = [x for tup in some_tuples for x in tup]
flattened

[1, 2, 3, 4, 5, 6, 7, 8, 9]

In [134]:
flattened = []

for tup in some_tuples:
    for x in tup:
        flattened.append(x)

In [135]:
[[x for x in tup] for tup in some_tuples]

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]