# Basics II - Data Structures

# Table of contents

[Executive Summary](#summary)
1. [Tuples](#tuple)\
    1.1. [How to modify a Tuple](#modify_tuple)\
    1.2. [Nested Tuples](#nested_tuple)
2. [Lists](#list)\
    2.1. [Nested Lists](#nested_list)\
    2.2. [for loop](#for)
3. [Dicts](#dict)
4. [Sets](#set)

### **Resources**: 

- [_Python for Finance (2nd ed.)_](http://shop.oreilly.com/product/0636920117728.do): Sec. 3.Basic Data Structures (Section 3.Excursus: Functional Programming is optional)
- [_The Python Tutorial_](https://docs.python.org/3.7/tutorial/): Sec. 3.1.3 (Lists), 4.2 (for Statements), 4.3 (The range() Function), 4.4 (break and continue Statements, and else Clauses on Loops), 5.1 (More on Lists), 5.3 (Tuples and Sequences), 5.4 (Sets), 5.5 (Dictionaries)

# Executive Summary <a name="summary"></a>

Intuitively, a _data structure_ is an object containing other objects, not necessarily of the same _data type_.

Standard Python provides four basic data structures, which can be differentiated at a high level by being:
- _ordered_ or _not ordered:_ that is, whether they preserve the order in which entries are added or not;
- _mutable_ or _immutable:_ that is, whether - once defined - they can be modified or not.

These data-structures are:

data-structure | ordered (or not) | mutable (or not)
--- | --- | ---
Tuples  | ordered | immutable |
Lists | ordered | mutable |
Dicts | insertion-ordered (since Python 3.7) | mutable |
Sets | not ordered | mutable |

The function `type()` can be called over any defined data-structure and returns its type: `tuple` for Tuples, `list` for Lists, `dict` for Dicts and `set` for Sets.
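
As a quick illustration of the table above, the sketch below builds one object of each kind and checks its type. The variable names and the values are made up for this example.

```python
# One example object per data structure (names and values are illustrative)
a_tuple = (1, 0.35, "GBP")
a_list = [1, 0.35, "GBP"]
a_dict = {"ccy": "GBP", "rate": 0.35}
a_set = {"GBP", "EUR", "USD"}

print(type(a_tuple))  # <class 'tuple'>
print(type(a_list))   # <class 'list'>
print(type(a_dict))   # <class 'dict'>
print(type(a_set))    # <class 'set'>
```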

The following sections are organized as follows: 
- In Sec. [1](#tuple) Tuples (`tuple`) are introduced as the Python data-structure for _ordered_ sequence-like objects that _cannot be_ modified once defined. 
- In Sec. [2](#list) Lists (`list`) are introduced as the Python data-structure for _ordered_ sequence-like objects that _can be_ modified once defined. In this context `for` loops are introduced in Sec. [2.2](#for).
- In Sec. [3](#dict) Dicts (`dict`) are introduced as the Python data-structure for collection-like objects that _can be_ modified once defined and that implement a _key-to-value_ map. Since Python 3.7 they preserve insertion order, although their elements are accessed by key and not by position.
- In Sec. [4](#set) Sets (`set`) are introduced as the Python data-structure for _not ordered_ collection-like objects that _can be_ modified once defined and that contain unique elements (that is, every element appears only once). 
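
To give a first flavour of the last two items, here is a minimal sketch of a key-to-value map and of the uniqueness of set elements. The names `fx_rates` and `currencies`, and the numbers, are invented for this example.

```python
# A dict maps keys to values (keys and values here are illustrative)
fx_rates = {"EURUSD": 1.10, "GBPUSD": 1.25}
fx_rates["USDJPY"] = 110.0   # dicts are mutable: add a new key-value pair
print(fx_rates["GBPUSD"])    # look up a value through its key

# A set keeps only one copy of each element
currencies = {"GBP", "EUR", "GBP", "USD", "EUR"}
print(len(currencies))       # 3: duplicates are discarded
currencies.add("JPY")        # sets are mutable too
```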

# 1. Tuples <a name="tuple"></a>

[Tuples](https://docs.python.org/3.7/tutorial/datastructures.html#tuples-and-sequences) consist of a number of values - in general, of heterogeneous data-type - packed together in an immutable sequence and separated by commas. 

In my experience, I haven't used tuples that often, probably because their _immutability_ goes against the dynamism of the trial-and-error phases of a typical quantitative analysis. On the other hand, for the very same reason, tuples can be a good asset, as they guarantee that the data stored in them is preserved.

Tuples can be defined with or without parentheses `()` surrounding the `,`-separated sequence.

In [1]:
tup = (1, 0.35, "GBP")

print(tup)
type(tup)

(1, 0.35, 'GBP')


tuple

In [2]:
tup = 1, 0.35, "GBP"

print(tup)
type(tup)

(1, 0.35, 'GBP')


tuple

The number of elements is easily retrieved by the `len()` function:

In [64]:
len(tup)

3

Tuples share a lot of properties with other sequence-like data-structures. For details take a look at the [Sequence Types — list, tuple, range](https://docs.python.org/3.7/library/stdtypes.html#sequence-types-list-tuple-range) page of the Python standard library.

Tuples share indexing features with strings (see [Basics_I___Data_Types.ipynb](https://github.com/gabrielepompa88/IT-For-Business-And-Finance-2019-20/blob/master/Notebooks/Basics_I___Data_Types.ipynb)) and lists (see Sec. [2](#list)).
In particular, elements of a tuple can be accessed by _zero-based_ indexes:

In [3]:
# 0 is the index of the first element of the tuple
print(tup[0])
type(tup[0])

1


int

In [4]:
# -1 is the index of the last element of the tuple
print(tup[-1])
type(tup[-1])

GBP


str

and tuples can be sliced. That is, you can select only a few elements of the tuple.

In [5]:
tup_slice = tup[0:2] # elements from position 0 (included) to 2 (excluded)

print(tup_slice)
type(tup_slice)

(1, 0.35)


tuple

In [6]:
tup[2:5] # elements from position 2 (included) to 5 (excluded)

('GBP',)
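
Two details of this last result are worth a closer look: a slice that runs past the end of the tuple does not raise an error, and a one-element tuple is displayed (and must be written) with a trailing comma. A minimal sketch, which redefines `tup` so that it runs on its own; the other names are introduced here for illustration:

```python
tup = (1, 0.35, "GBP")

# Slices are clipped to the available elements, so no IndexError is raised
last_only = tup[2:5]
print(last_only)          # ('GBP',)
print(len(last_only))     # 1

# The trailing comma is what makes a one-element tuple
single = ("GBP",)
not_a_tuple = ("GBP")     # just a string in parentheses
print(type(single), type(not_a_tuple))
```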

In [7]:
tup[:2]   # elements from the beginning to position 2 (excluded) --- equivalent to tup[0:2]

(1, 0.35)

In [8]:
tup[-2:]  # elements from the second-last (included) to the end

(0.35, 'GBP')

Analogously to strings - but differently from lists - tuples are _immutable_ objects. That is, if you try to change one of the elements of a tuple, you get
```python
TypeError: 'tuple' object does not support item assignment
```

In [9]:
# tup[0] = 17

In particular, you cannot simply use the `+` operator as you would do with a string to concatenate characters. That is, something like

```python
17 + tup[1:]
```
would cause the following error

```python
TypeError: unsupported operand type(s) for +: 'int' and 'tuple'
```

that simply tells you that you cannot _add_ `int` objects (like `17`) with `tuple` objects (like the slice `tup[1:]`).

In [10]:
# 17 + tup[1:]

Nevertheless, there is a workaround... read below once you know about `list` data-structures.
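
As a side note, the `+` operator does work when both operands are tuples: it builds a new tuple and leaves the original one untouched. The sketch below uses this to obtain a copy of `tup` with a different first element, without going through a list; `new_tup` is a name introduced here for illustration.

```python
tup = (1, 0.35, "GBP")

# (17,) is a one-element tuple, so this is tuple + tuple
new_tup = (17,) + tup[1:]

print(new_tup)  # (17, 0.35, 'GBP')
print(tup)      # (1, 0.35, 'GBP') -- the original tuple is unchanged
```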

## 1.1. How to modify a Tuple <a name="modify_tuple"></a>
**Read this section once you have covered Sec. [2](#list) on Lists**

Even if you cannot directly change an element of a tuple, you can: 
- use the `list()` _casting_ function to cast the tuple as a list
- modify the list
- re-cast it back as tuple using the casting function `tuple()`

In [11]:
list_tup = list(tup) # cast tup as a list

print(list_tup)
type(list_tup)

[1, 0.35, 'GBP']


list

In [12]:
list_tup[0] = 17 # change the element

In [13]:
tup = tuple(list_tup) # cast-back as a tuple

print(tup)
type(tup)

(17, 0.35, 'GBP')


tuple
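
The three steps above can be wrapped in a small helper. This is only a sketch: the function name `replace_item` is hypothetical and is not part of the Python standard library.

```python
def replace_item(tup, index, value):
    """Return a new tuple equal to `tup` except for position `index`."""
    as_list = list(tup)      # 1. cast the tuple to a list
    as_list[index] = value   # 2. modify the list
    return tuple(as_list)    # 3. cast the list back to a tuple


original = (1, 0.35, "GBP")
modified = replace_item(original, 0, 17)

print(modified)  # (17, 0.35, 'GBP')
print(original)  # (1, 0.35, 'GBP') -- a new tuple was built, the old one is intact
```

Note that no tuple is ever modified in this process: a new tuple is created, and in the cells above the name `tup` is simply re-assigned to it.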

## 1.2. Nested Tuples <a name="nested_tuple"></a>
**Read this section once you have covered Sec. [2](#list) on Lists**

Notice that even if the tuple itself is not mutable, its elements may be _mutable_ objects (such as lists) and/or _immutable_ objects (such as tuples themselves).

In [46]:
l = [87, 100, 99]          # a list
t = ("ACT/365", "ACT/360") # a tuple

nested_tup = (l, t, 100)

print(nested_tup)
type(nested_tup)

([87, 100, 99], ('ACT/365', 'ACT/360'), 100)


tuple

As we have seen, elements of `nested_tup` can be accessed through indexing:

In [15]:
print(nested_tup[0])
type(nested_tup[0])

[87, 100, 99]


list

In [16]:
print(nested_tup[1])
type(nested_tup[1])

('ACT/365', 'ACT/360')


tuple

In [17]:
print(nested_tup[2])
type(nested_tup[2])

100


int

You can also access the elements of list `l` and tuple `t` using a nested-indexing syntax:

In [18]:
# [0][0] is the index of the first element 
# of (list 'l' which is) the first element of the tuple 'nested_tup'
print(nested_tup[0][0]) 
type(nested_tup[0][0])

87


int

In [19]:
# [0][2] is the index of the third element 
# of (list 'l' which is) the first element of the tuple 'nested_tup'
print(nested_tup[0][2])
type(nested_tup[0][2])

99


int

In [20]:
# [1][0] is the index of the first element 
# of (tuple 't' which is) the second element of the tuple 'nested_tup'
print(nested_tup[1][0])
type(nested_tup[1][0])

ACT/365


str

In [21]:
# [1][1] is the index of the second element 
# of (tuple 't' which is) the second element of the tuple 'nested_tup'
print(nested_tup[1][1])
type(nested_tup[1][1])

ACT/360


str

Ok, you have understood how it works... This is actually a general rule that applies to Tuples (`tuple`), Lists (`list`) and Numpy arrays (`numpy.ndarray`, we'll talk about these in a future notebook), that is, to all three basic sequence-like data-structures used in Python.

If a sequence-like data structure, say `seq`, has nested sequence-like elements, then

```python
seq[i][j]
```

will point to the element of index `j` of the element of index `i` of `seq`.
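
The rule can be checked by splitting the nested index into two steps. A short sketch, with a sample `seq` defined here for illustration:

```python
seq = ([87, 100, 99], ("ACT/365", "ACT/360"))

i, j = 1, 0
inner = seq[i]                # first step: the element of index i of seq
print(inner[j])               # second step: its element of index j -> ACT/365
print(seq[i][j] == inner[j])  # True: the nested index does both steps at once
```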

**Warning**: 

- if you try to refer to an index that does not correspond to any element of the data structure (or of its nested data-structures, if any), the Python interpreter raises an _out of range_ `IndexError`

In [22]:
# produces: IndexError: tuple index out of range 
# because index 3 would refer to the 4th element of nested_tup, that does not exist.

# nested_tup[3]   

In [23]:
# produces: IndexError: list index out of range
# because index 3 would refer to the 4th element of nested_tup[0] (i.e. list 'l'), that does not exist

# nested_tup[0][3]

In [24]:
# produces: IndexError: tuple index out of range
# because index 2 would refer to the 3rd element of nested_tup[1] (i.e. tuple 't'), that does not exist

# nested_tup[1][2] 

- if you try to use an index on an element that is not indexable (like Integers, Floats,...), the Python interpreter raises an _object is not subscriptable_ `TypeError`

In [25]:
nested_tup[2]

100

In [26]:
# produces: TypeError: 'int' object is not subscriptable
# because we are trying to refer to the first element of nested_tup[2] (i.e. integer 100), 
# which, simply put, does not have any element inside and thus doesn't admit indexing.

# nested_tup[2][0]
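
If you prefer to see the two errors without interrupting the notebook, you can catch them with `try`/`except`. A minimal sketch, which redefines `nested_tup` so that it runs on its own; the list `messages` is introduced here only to collect the results:

```python
nested_tup = ([87, 100, 99], ("ACT/365", "ACT/360"), 100)
messages = []

try:
    nested_tup[3]          # there is no 4th element
except IndexError as err:
    messages.append(f"IndexError: {err}")

try:
    nested_tup[2][0]       # nested_tup[2] is the integer 100
except TypeError as err:
    messages.append(f"TypeError: {err}")

print(messages[0])
print(messages[1])
```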

Getting back to our nested tuple, you can modify only its mutable nested elements:

In [47]:
nested_tup

([87, 100, 99], ('ACT/365', 'ACT/360'), 100)

In [48]:
nested_tup[0][1] = 98
nested_tup

([87, 98, 99], ('ACT/365', 'ACT/360'), 100)

In [49]:
nested_tup[0].append(75)
nested_tup

([87, 98, 99, 75], ('ACT/365', 'ACT/360'), 100)
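
These in-place changes are possible because the tuple does not hold a copy of the list: it holds a reference to the very same object that the name `l` points to. A self-contained sketch of this behaviour:

```python
l = [87, 100, 99]
t = ("ACT/365", "ACT/360")
nested_tup = (l, t, 100)

print(nested_tup[0] is l)   # True: same object, not a copy

nested_tup[0].append(75)    # modify the list through the tuple...
print(l)                    # [87, 100, 99, 75] ...and the change shows up in l

# The tuple still holds the same three objects as before
print(len(nested_tup))      # 3
```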

**Warning**: the immutability of a tuple concerns which objects it contains, not the content of those objects. A list nested in a tuple is still a _mutable_ data-structure and can therefore be modified in place, as in the cells above.

What you cannot do is assign a new object to one of the positions of the tuple, even if the object currently stored there is mutable (yes, I agree with you, this is a bit strange). That is, an explicit redefinition of the element `nested_tup[0]` of `nested_tup` like this:
```python
nested_tup[0] = [67, 89]
```
would produce:
```python
TypeError: 'tuple' object does not support item assignment
```

In [51]:
# nested_tup[0] = [67, 89]

And of course you cannot modify nested elements which are immutable themselves:

In [54]:
print(nested_tup[1])
type(nested_tup[1])

('ACT/365', 'ACT/360')


tuple

In [56]:
# nested_tup[1][0] = "ACT/360" # TypeError: 'tuple' object does not support item assignment

# 2. Lists <a name="list"></a>

[Lists](https://docs.python.org/3.7/tutorial/introduction.html#lists) consist of a number of values - in general, of heterogeneous data-type - packed together in a mutable sequence and separated by commas between square brackets. 

Lists are very versatile data structures: being mutable, they offer flexibility, and they feature several built-in methods that can speed up coding. 

Lists are defined with square brackets `[]` surrounding the `,`-separated sequence.

In [73]:
lis = [1, 0.35, "GBP"]

print(lis)
type(lis)

[1, 0.35, 'GBP']


list

The number of elements is easily retrieved by the `len()` function:

In [74]:
len(lis)

3

Lists share a lot of properties with other sequence-like data-structures. In particular, they share _zero-based_ indexing and slicing with Strings and Tuples.

In [75]:
# 0 is the index of the first element of the list
print(lis[0])
type(lis[0])

1


int

In [76]:
# -1 is the index of the last element of the list
print(lis[-1])
type(lis[-1])

GBP


str

Here is how to slice a list (yes, always the same way):

In [77]:
lis_slice = lis[0:2] # elements from position 0 (included) to 2 (excluded)

print(lis_slice)
type(lis_slice)

[1, 0.35]


list

In [78]:
lis[2:5] # elements from position 2 (included) to 5 (excluded)

['GBP']

In [79]:
lis[:2]   # elements from the beginning to position 2 (excluded) --- equivalent to lis[0:2]

[1, 0.35]

In [80]:
lis[-2:]  # elements from the second-last (included) to the end

[0.35, 'GBP']

Differently from strings and tuples, lists are _mutable_ objects.

In [81]:
lis[0] = 17
lis

[17, 0.35, 'GBP']

For details on built-in methods see [5.1. More on Lists](https://docs.python.org/3.7/tutorial/datastructures.html#more-on-lists) of the Python tutorial. In particular, two useful built-in methods are worth mentioning:
- `list.append(x)`: which appends element `x` to the end of the list, extending it

In [82]:
lis.append('EUR')
lis

[17, 0.35, 'GBP', 'EUR']

- `list.sort()`: which sorts the list in place, in ascending order.

Notice that the elements of the list to be sorted must be comparable with each other (for example all numbers, or all strings), otherwise the interpreter will complain, as in this case:

```python
TypeError: '<' not supported between instances of 'str' and 'float'
```

In [96]:
# lis.sort()

but the sorting will work if we define a new list as the `[17, 0.35]` slice of the original one

In [97]:
lis_slice = lis[:2]  # [17, 0.35]
lis_slice.sort()
lis_slice

[0.35, 17]
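
Two related points, sketched below with a sample list defined for illustration: `list.sort()` modifies the list in place and returns `None`, whereas the built-in function `sorted()` leaves the list untouched and returns a new sorted list. Both accept a `key` argument; passing `key=str` is one possible way (an arbitrary choice made here) to sort elements of mixed type by their string representation.

```python
lis = [17, 0.35, "GBP", "EUR"]

# sorted() with key=str compares the string form of each element
by_text = sorted(lis, key=str)
print(by_text)        # [0.35, 17, 'EUR', 'GBP']
print(lis)            # [17, 0.35, 'GBP', 'EUR'] -- unchanged

numbers = lis[:2]
result = numbers.sort()   # sorts in place...
print(numbers)            # [0.35, 17]
print(result)             # None ...and returns nothing useful
```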

## 2.1. Nested Lists <a name="nested_list"></a>

Lists can nest other data structures, both _mutable_ objects (such as other lists) and/or _immutable_ objects (such as tuples).

In [100]:
l = [87, 100, 99]          # a list
t = ("ACT/365", "ACT/360") # a tuple

nested_lis = [l, t, 100]

print(nested_lis)
type(nested_lis)

[[87, 100, 99], ('ACT/365', 'ACT/360'), 100]


list

As we have seen, elements of `nested_lis` can be accessed through indexing:

In [101]:
print(nested_lis[0])
type(nested_lis[0])

[87, 100, 99]


list

In [102]:
print(nested_lis[1])
type(nested_lis[1])

('ACT/365', 'ACT/360')


tuple

In [103]:
print(nested_lis[2])
type(nested_lis[2])

100


int

You can also access the elements of list `l` and tuple `t` using a nested-indexing syntax:

In [104]:
# [0][0] is the index of the first element 
# of (list 'l' which is) the first element of the list 'nested_lis'
print(nested_lis[0][0]) 
type(nested_lis[0][0])

87


int

In [105]:
# [0][2] is the index of the third element 
# of (list 'l' which is) the first element of the list 'nested_lis'
print(nested_lis[0][2])
type(nested_lis[0][2])

99


int

In [106]:
# [1][0] is the index of the first element 
# of (tuple 't' which is) the second element of the list 'nested_lis'
print(nested_lis[1][0])
type(nested_lis[1][0])

ACT/365


str

In [107]:
# [1][1] is the index of the second element 
# of (tuple 't' which is) the second element of the list 'nested_lis'
print(nested_lis[1][1])
type(nested_lis[1][1])

ACT/360


str

Ok you have understood how it works... This is actually a general rule, that applies to all the sequence-like data structures: Tuples (`tuple`), Lists (`list`) but also Numpy arrays (`numpy.ndarray`) that will be introduced in a future notebook.

If a sequence-like data structure, say `seq`, has nested sequence-like elements, then

```python
seq[i][j]
```

is the element of index `j` of the element `seq[i]` of index `i` of `seq`. That is, `seq[i][j]` is the $(j+1)$-th element of the $(i+1)$-th element `seq[i]` of `seq`.

**Warning**: 

- if you try to refer to an index that does not correspond to any element of the data structure (or of its nested data-structures, if any), the Python interpreter raises an _out of range_ `IndexError`

In [22]:
# produces: IndexError: list index out of range 
# because index 3 would refer to the 4th element of nested_lis, that does not exist.

# nested_lis[3]   

In [23]:
# produces: IndexError: list index out of range
# because index 3 would refer to the 4th element of nested_lis[0] (i.e. list 'l'), that does not exist

# nested_lis[0][3]

In [110]:
# produces: IndexError: tuple index out of range
# because index 2 would refer to the 3rd element of nested_lis[1] (i.e. tuple 't'), that does not exist

# nested_lis[1][2] 

- if you try to use an index on an element that is not indexable (like Integers, Floats,...), the Python interpreter raises an _object is not subscriptable_ `TypeError`

In [108]:
nested_lis[2]

100

In [109]:
# produces: TypeError: 'int' object is not subscriptable
# because we are trying to refer to the first element of nested_lis[2] (i.e. integer 100), 
# which, simply put, does not have any element inside and thus doesn't admit indexing.

# nested_lis[2][0]
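
Unlike the nested tuple of Sec. [1.2](#nested_tuple), a nested list also accepts assignments to its top-level positions, since the outer container is itself mutable. A short self-contained sketch (the replacement values are arbitrary):

```python
l = [87, 100, 99]
t = ("ACT/365", "ACT/360")
nested_lis = [l, t, 100]

nested_lis[0][1] = 98        # modify the nested list in place
nested_lis[2] = 101          # replace a top-level element: fine for a list
nested_lis[1] = ("30/360",)  # the nested tuple cannot be modified, but it can be replaced

print(nested_lis)  # [[87, 98, 99], ('30/360',), 101]
print(l)           # [87, 98, 99] -- l and nested_lis[0] are the same object
```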

## 2.2. for loop <a name="for"></a>

# 3. Dicts <a name="dict"></a>

# 4. Sets <a name="set"></a>