<center>
    
# R406: Applied Economic Modelling with Python

</center>

<br> <br> 

<center>

## Basic Data Structures in Python

</center>

<br><br> 

<center>
<b> Andrey Vassilev </b>
</center>

 

# Outline

1. Basic Python types (some familiar): int, float, complex, bool, str, NoneType
2. Lists
3. Tuples
4. Dictionaries
5. Sets

# Review (and some extensions)

Last time we learned about some Python types and the corresponding assignment operations: integer types (`intvar = 5`), floats (`floatvar = 5.0`) and strings (`strvar = "five"`).

A frequently encountered operation is to update a variable according to some rule (e.g. increment its value by 1).

It can be done as follows:

In [None]:
x = 3
print(f"First x equals {x}.")
x = x+1
print(f"Then x equals {x}.")

Updates of this kind are so common in practice that Python (like many other languages) provides special update operations (augmented assignment operators). The syntax for the preceding example would be:

`x += 1` 

Try it out below!

In [None]:
x += 1
print(x)

There are similar update operations for the main binary operators:

`-=    *=    /=    **=   //=   %=`

Take a few minutes to try them out as well. Be sure to check how they work on strings.

In [None]:
# Try them out here
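For reference, here is one possible run through the update operators, including their behaviour on strings. The values are arbitrary; note that `/=` always produces a float, while `+=` and `*=` on strings mean concatenation and repetition.

```python
# Augmented assignment on integers
n = 7
n -= 2    # n = n - 2 -> 5
n *= 3    # n = n * 3 -> 15
n //= 4   # floor division -> 3
n **= 2   # exponentiation -> 9
n %= 5    # remainder -> 4

# True division always gives a float
f = 9
f /= 2    # -> 4.5

# On strings, += concatenates and *= repeats
s = "ab"
s += "c"  # -> "abc"
s *= 2    # -> "abcabc"
print(n, f, s)
```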

# Explicit conversion operations

There are functions that can explicitly cast a variable to another type.

For instance, this converts ints to floats:

In [None]:
x = 1
print(x) # Try also type(x) but 
         # temporarily comment out the commands below
x = float(x)
print(x) # Check the type again

This converts ints to strings:

In [None]:
x = 1
print(x) # check the type
x = str(x)
print(x) # check the type again

## Exercise

Define a **float** variable and try to add it to the string "one hundred" to check that it raises an error. Convert the variable to string and repeat the operation. What is the outcome?

In [None]:
# Use this cell to perform the exercise


Strings can be converted to ints or floats under certain conditions:

In [None]:
y = "1.1"
z1 = float(y) # Replace the assignment on this line with z1=int(y). 
# Does it work? After that, try z1 = int(float(y)).
# Does the conversion z1 = int(y) work if we assign y = "1"? 
# What if we use y = "spam"?
print(z1+1)
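The conversions hinted at in the comments above can be summarised in a single cell. This is a small sketch; it uses `try`/`except` (which catches the error so that the cell keeps running) ahead of its formal introduction.

```python
ok = int("1")                  # a string holding an integer literal converts fine -> 1
via_float = int(float("1.1"))  # first to float, then int() drops the fractional part -> 1
neg = int(-1.9)                # int() truncates toward zero -> -1 (not -2)

try:
    int("1.1")                 # "1.1" is not a valid integer literal...
    failed = False
except ValueError:
    failed = True              # ...so int() raises a ValueError
print(ok, via_float, neg, failed)
```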

# Basic Python types (scalar)

There are six basic scalar types in Python:
- int
- float
- str
- complex
- bool
- NoneType

We already saw the first three in action, so let's turn to the other three.

# Complex numbers

Complex numbers can be defined in two alternative ways:

`z = 1.0 + 2.1j`

`z = complex(1.0, 2.1)`

In [None]:
z1 = 1.0 + 2.1j # Try z1 = 1.0 + 2.1 j (with a space)
print(z1)

In [None]:
type(z1)

In [None]:
z2=complex(3.0,2.0)

One can obtain the basic characteristics of complex numbers (real part, imaginary part, complex conjugate and modulus) as follows:

In [None]:
print(f"The real part of z1 is {z1.real:.2f}.")
print(f"The imaginary part of z1 is {z1.imag:.2f}.")
print(f"z2 is {z2} and the complex conjugate of z2 is {z2.conjugate()}.") 
# Notice the parentheses after conjugate
print(abs(z1), abs(z2))
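The modulus returned by `abs()` is $\sqrt{(\mathrm{Re}\,z)^2 + (\mathrm{Im}\,z)^2}$, and multiplying a number by its conjugate gives the squared modulus, $z\bar{z} = |z|^2$. A quick check with a convenient value (chosen only because the arithmetic comes out exact):

```python
import math

z = complex(3.0, 4.0)
modulus = abs(z)                               # 5.0
by_formula = math.sqrt(z.real**2 + z.imag**2)  # the same value, from the definition
product = z * z.conjugate()                    # (25+0j): |z|^2 with zero imaginary part
print(modulus, by_formula, product)
```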

The basic operations on complex numbers go through directly:

In [None]:
print(z1+z2)
print(z1-z2)
print(z1*z2)
print(z1/z2)

# NoneType

Python also has a special type (`NoneType`) to capture the idea of an absent value. It has only one possible value: `None`.

This type is useful, for instance, as the default return value of a function that is called for its side effects.

In [None]:
type(None)

In [None]:
print_return_value = print("abc") # Catch the return value of print
                                  # (if any)
print(print_return_value)
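Methods that modify an object in place behave the same way. For instance, the list method `append()` (lists are introduced later in this lecture) changes the list and returns `None`, which is a frequent source of surprise:

```python
L = [1, 2]
result = L.append(3)  # append() is called for its side effect on L
print(result)         # None: there is no new list to catch
print(L)              # [1, 2, 3]: the list itself has changed
```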

# Booleans

The Boolean type has two possible values: `True` and `False`. (Notice that the capitalization matters here, as well as in Python generally.)

Boolean values are important because they are returned by *comparison operations* and they are the building block of *logical (Boolean) operations*.

# Comparison operations

|Operation | Description                  |
|----------| -----------------------------|
|a == b    | a equal to b                 |
|a != b    | a not equal to b             |
|a < b     | a less than b                |
|a > b     | a greater than b             |
|a <= b    | a less than or equal to b    |
|a >= b    | a greater than or equal to b |

# Comparison operations in action

In [None]:
x = 1
y = 2
z = 1.0
print(x==y)
print(x==z)
print(x>y)
print(z>=x)
print(z!=y)
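A caveat for `==` with floats: numbers such as 0.1 are stored in binary with a small rounding error, so results that are equal on paper may compare as unequal. The standard-library function `math.isclose()` compares up to a relative tolerance:

```python
import math

a = 0.1 + 0.2
print(a == 0.3)              # False: a is actually 0.30000000000000004
print(math.isclose(a, 0.3))  # True: equal up to a small relative tolerance
```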

# Logical (Boolean) operations

Boolean operations allow us to combine the results of separate comparison operations and evaluate the final outcome (which is also a Boolean value).

Boolean operations are **`and`**, **`or`** and **`not`**.

| Operation            | Outcome | Operation            | Outcome | Operation       | Outcome |
|--------------------- | --------| -------------------- | --------| --------------- | ------- |
| True and True   | True    | True or True   | True    | not True  | False   |
| True and False  | False   | True or False  | True    | not False | True    |
| False and True  | False   | False or True  | True    |                 |         |
| False and False | False   | False or False | False   |                 |         |

# Boolean operations in action

In [None]:
x = 1.1;   y = 2.2;       z=1.1
# Note that whitespace does not matter in the above statements
print(x==z)
print(not (x==1.1))
print(not x==1.1) # Same as the previous one
print(x>0 and x>1)
print(y<0 or y>2.0)
print(x>0 and not x>1)

A common use of `and` is to check whether a value lies within a range:

In [None]:
x = 5
x>0 and x<10

We can write this more compactly as:

In [None]:
0 < x < 10
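Another property of `and` and `or` worth knowing is *short-circuit evaluation*: Python evaluates the right-hand operand only if the left-hand one does not already settle the outcome. This can be used to guard against errors, as in this small sketch:

```python
x = 0
# x != 0 is False, so the whole `and` is False and 1/x is never computed
safe = x != 0 and 1/x > 1
print(safe)    # False, with no ZeroDivisionError

y = 5
# y > 0 is True, so the `or` is True without evaluating 1/0
quick = y > 0 or 1/0 > 1
print(quick)   # True
```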

# Type conversion for Booleans

The conversion command is `bool()`. Try it in the examples listed below:

In [None]:
y = 1
# y = 0
# y = -1.1
# y = 0.0
# y = 0.001
# y = "spam"
# y = "eggs"
# y = ""
# y = None
bool(y)
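The same rule extends to the containers introduced next: empty containers convert to `False` and non-empty ones to `True`, whatever they contain. A few cases not in the list above:

```python
empty_list = bool([])    # False: an empty list
one_zero = bool([0])     # True: one element, even though that element is 0
empty_dict = bool({})    # False: an empty dict
space = bool(" ")        # True: a string containing a space is not empty
print(empty_list, one_zero, empty_dict, space)
```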

# Data structures (containers)

Data structures provide a way to combine the simple types we saw up to here. (Think of a situation in which you need to store a sequence of results.)

The basic data structures we shall look at are:
* list
* tuple
* dict
* set

# Lists

Lists are ordered collections of elements.

They are defined by using square brackets and separating the elements with commas:

In [None]:
L1 = [1,2,3]
type(L1)

Lists can contain elements of any type, and a single list can mix elements of different types (i.e. it can be heterogeneous):

In [None]:
L1 = [1.1, 2.2, 3.3]
L2 = ["one", "two", "three", "four"]
L3 = [1, 2.0, "three", 4.0 + 0j]
L4 = [1, 2, [3, 4]] # They can even contain other lists
L5 = [] # This defines an empty list
print(L1) # Print the rest as well

# Properties of lists

Check the statements below and try changing each of them to see how they work.

In [None]:
L = [5,2,1,3]
# len(L) # The len() function returns 
#        # the list length (=number of elements)
# L.append(6) # The append() method adds an element at the end of the list
# print(L + ["one","two"]) # The + operation performs concatenation
# L.sort() # The sort() method sorts the list in-place
# L.reverse() # Reverses the list in-place
L
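A related point: `L.sort()` reorders the list in place, whereas the built-in function `sorted()` returns a new sorted list and leaves the original alone. A minimal comparison:

```python
L = [5, 2, 1, 3]
S = sorted(L)          # a new, sorted list
print(S, L)            # [1, 2, 3, 5] [5, 2, 1, 3]: L is unchanged
L.sort(reverse=True)   # sorts L itself; reverse=True gives descending order
print(L)               # [5, 3, 2, 1]
```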

# List indexing

List elements can be accessed by referring to their position (index) in the list, i.e. we can execute commands like "Get the first element." or "Set the fifth element equal to 5." An element is accessed by writing its index in square brackets after the list name, e.g. `L[5]`.

**Indexing in Python starts from 0!!!**     

(If that bothers you, check out https://xkcd.com/163/)

In [None]:
L = [-1,1,3,6,8]
L[0] # The first element

In [None]:
L[4] # As a consequence, the n-th element is accessed as L[n-1]
# Try L[len(L)-1]
# Try L[5] to raise an 'index out of range' error

You can also set an element of a list:

In [None]:
L[1] = 2
# Try L[0] = "spam" and L[3] = None
L

There is special syntax for referring to the end of the list:
- The last element is `L[-1]`
- The second-to-last element is `L[-2]`
- ... etc.

In [None]:
L[-1]
# Try also:
# L[-2]
# n = 2; print(L[-n]) # You can index by an integer variable
# L[-len(L)]
# L[-len(L)] = 0
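Negative indices are shorthand: for 1 <= k <= len(L), `L[-k]` refers to the same element as `L[len(L)-k]`. A quick check:

```python
L = [-1, 1, 3, 6, 8]
k = 2
print(L[-k], L[len(L) - k])   # 6 6: both are the second-to-last element
```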

It is also possible to delete an element of a list using the `pop()` method and the respective index:

In [None]:
L = [1,2,3,4,5]
L.pop(0)
# temp = L.pop(1) # The pop() method deletes 
                  # and returns the removed value. 
                  # Try print(temp).
# L.pop(-1)
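Two related options: calling `pop()` without an index removes the last element, and the method `remove()` deletes an element by *value* (its first occurrence) rather than by position:

```python
L = [1, 2, 3, 4, 5]
last = L.pop()   # no index: removes and returns the last element
print(last, L)   # 5 [1, 2, 3, 4]
L.remove(3)      # removes the first element equal to 3
print(L)         # [1, 2, 4]
```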

# List slicing

We often need to access a range of elements in a list (e.g. from the second to the fifth element). *List slicing* provides a way to do that.

The syntax is of the form `L[m:n]`, where the element at index m is included, while the element at index n is **not**.

In [None]:
L = [-1,1,3,6,8]
L[0:3]
# L[2:4]
# L[2:5] # This works (because the upper bound is not inclusive)... 
         # ... even though L[5] will raise an error
# L[5:2] # This will return an empty list

Leaving out the first index defaults to zero:

In [None]:
L[:3] # Same as L[0:3]

Leaving out the second index makes the slice run to the end of the list, so the last element is included:

In [None]:
L[2:] # Same as L[2:len(L)], i.e. L[2:5]
# Check whether L[2:-1] yields the same answer. Why?
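Because the upper bound is excluded, slices fit together neatly: for 0 <= m <= n <= len(L), `L[m:n]` has `n - m` elements, and splitting a list at any position `k` and rejoining the pieces gives back the original list:

```python
L = [-1, 1, 3, 6, 8]
m = 1
n = 4
print(len(L[m:n]) == n - m)   # True: [1, 3, 6] has 3 elements
k = 2
print(L[:k] + L[k:] == L)     # True: [-1, 1] + [3, 6, 8]
```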

We can also specify a third integer, the step size **k**, using the syntax `L[m:n:k]` (i.e. taking every **k**-th element):

In [None]:
L = [1,2,3,4,5,6,7,8,9]
L[0:7:2]
# L[:7:2] # Same as previous one
# L[::3] # Step by three
# L[1::2]

The step argument can be used to traverse the list backwards or reverse it:

In [None]:
L = [1,2,3,4,5,6,7,8,9]
L[6:2:-1] # Note we start with the higher-valued index
# L[2:6:-1] # Similar to the example L[5:2] we saw earlier
# L[::-1] # This simply produces a reversed copy of the list
# L[::-2]
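Note the difference between the reversing slice `L[::-1]`, which creates a new list, and the method `reverse()` seen earlier, which changes the list itself:

```python
L = [1, 2, 3, 4, 5]
R = L[::-1]    # a new list with the elements in reverse order
print(R, L)    # [5, 4, 3, 2, 1] [1, 2, 3, 4, 5]: L is untouched
L.reverse()    # reorders L in place (and returns None)
print(L)       # [5, 4, 3, 2, 1]
```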

We can also use slicing in assignments:

In [None]:
L = [1,2,3,4,5,6,7,8,9]
L[0:3] = [11,22,33]
# L[:3] = [-1,-2,-3]
# L[::2] = 666 # Try to set every other element equal to 666. 
               # What's the error?
# L[::2] = [666]*len(L[::2]) # Now try this.
# L[5:] = [] # We can also delete a part of a list using a slice.
L
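For a plain slice (no step), the assigned list does not need to have the same length as the slice, so the list can grow or shrink. With a step (an *extended slice*), the lengths must match exactly. A sketch of both cases:

```python
L = [1, 2, 3, 4, 5]
L[1:3] = ["a", "b", "c", "d"]   # two elements replaced by four: L grows
print(L)                        # [1, 'a', 'b', 'c', 'd', 4, 5]

try:
    L[::2] = [0, 0]             # L[::2] has 4 elements, but only 2 values are given
except ValueError as err:
    print("ValueError:", err)
```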

# Mutable and immutable types

Python types can be *mutable* or *immutable*. When a mutable object is created, we can change its contents afterwards. As the name suggests, immutable objects cannot be changed once created.

Among the Python types mentioned so far, the *immutable* ones are:
- the numeric types: int, float, complex (as well as bool)
- string
- tuple
- NoneType

The *mutable* types we have seen (or at least heard of) are:
- list
- dictionary (dict)
- set

To illustrate (im)mutability in practice, consider the following:

In [None]:
L = [7,8,9,10,11]
print(L[0])
print(L[0:4])
L[0] = 200
print(L)

In [None]:
S = "This is an immutable string."
print(S[0]) # The string can be indexed and sliced just like a list
print(S[0:4])
# However, uncomment and try to run the line below
# S[0] = "t"

Despite the above problem with changing the first letter of `S`, we can still do the following:

In [None]:
S = "This is an immutable string."
print(S)
S = "this is an immutable string."
print(S)
S = "...but doesn't it change?"
print(S)

What's going on? 

We appear to be unable to change a part of the string, yet we can change it as a whole...

# Variables as pointers

- The behaviour observed in the string example is due to a particular feature of Python variables. 
- Unlike in some other languages, a variable in Python is not a fixed container for a predefined kind of data but a pointer that can dynamically be redirected to point to various types of objects. 
- For this reason the following is legitimate:

In [None]:
x = 1
x = [1,2,3]
x = "A random string"
# Good luck pulling this trick in C or Java

In the last example, the integer `1` and the string `"A random string"` are immutable, while the list `[1,2,3]` is mutable. The variable `x` can be changed to point to any of these without a problem. A problem will arise if we try to modify the string `"A random string"` itself.

Now it is easy to see that in the previous example we were not modifying any strings but sequentially reassigning `S` to point to different strings!
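The pointer view also shows how to "change" a single letter of a string: build a new string and point the variable at it. In this sketch the extra name `old` keeps a reference to the original string to show that it is never modified:

```python
S = "This is an immutable string."
old = S            # a second name pointing at the same string object
S = "t" + S[1:]    # a NEW string: new first letter + slice of the old one
print(S)           # this is an immutable string.
print(old)         # This is an immutable string.  (the original is unchanged)
```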

The discussion of variables as pointers might seem technical and of little practical relevance. However, a sound understanding of the nature of Python variables is crucial to avoid making hard-to-detect mistakes in your code.

Here is an example of what may go wrong:

In [None]:
x = [1,2]
y = x
x.append(3)
y.append(4)
print(x)
print(y)

And here is how we can get around this behaviour by creating an explicit copy of the object:

In [None]:
x = [1,2]
y = x.copy()
x.append(3)
y.append(4)
print(x)
print(y)
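One caveat: `copy()` makes a *shallow* copy. It creates a new outer list, but any lists stored inside it are still shared. The standard-library function `copy.deepcopy()` copies the nested objects as well:

```python
import copy

x = [[1, 2], [3, 4]]
y = x.copy()           # new outer list, same inner lists
y[0].append(99)
print(x)               # [[1, 2, 99], [3, 4]]: x changed through y

z = copy.deepcopy(x)   # inner lists are copied too
z[1].append(100)
print(x)               # [[1, 2, 99], [3, 4]]: x is unaffected this time
print(z)               # [[1, 2, 99], [3, 4, 100]]
```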

# Identity operations

Building on the idea of variables as pointers, we can use the **identity operations** in Python to establish whether two variables point to the same object or merely have the same contents.

The identity operations in Python are:
* **`is`**
* **`is not`**

**Note:** The Python documentation lists these as comparison operations. We follow the taxonomy of the VanderPlas book.

In [None]:
x = [1,2]
y = x
z = [1,2]
print(x is y)
print(x is z)
print(x is not z) # x and z are not identical...
print(x==y)
print(x==z) # ... yet their contents are the same

# Tuples

Tuples can loosely be thought of as an immutable version of lists. They are commonly defined by using parentheses instead of square brackets.

Tuples can be indexed and sliced just like lists. The `len()` function works the same way.

In [None]:
t = ("one","two","three","four")
print(len(t))
print(t[0])
print(t[-1])
print(t[:2])
# t[0]="uno" # This doesn't work as tuples are immutable

Tuples can also be created without parentheses, except for creating an empty tuple:

In [None]:
t = "one","two","three","four"
print(t)
type(t)
# t=() # Here the parentheses are required

One-element tuples are created with a final comma (inside the parentheses if they are used):

In [None]:
t = (1,)
print(t); type(t)
# Be careful! The line below also creates a tuple:
# t = 2,
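The comma, not the parentheses, is what makes a tuple. Parentheses around a single value are just ordinary grouping, as this comparison shows:

```python
a = (1)     # the integer 1: the parentheses only group
b = (1,)    # a one-element tuple
print(type(a), type(b))   # <class 'int'> <class 'tuple'>
```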

Tuples are commonly used when a function needs to return multiple values.

As an example, the method `as_integer_ratio()` of a float returns a tuple:

In [None]:
x = 1/8
print(x)
x.as_integer_ratio()

Tuples have a convenient feature that is known as **tuple unpacking**:

In [None]:
num,denom = x.as_integer_ratio()
print(f"num = {num}"); print(f"denom = {denom}")
# a,b,c = ("A","B","C")
# Incidentally, unpacking works on lists as well
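Two further uses of unpacking: the built-in function `divmod()` returns a tuple (quotient, remainder) that can be unpacked directly, and two variables can be swapped in a single line:

```python
q, r = divmod(17, 5)   # divmod(a, b) returns (a // b, a % b)
print(q, r)            # 3 2

a, b = 1, 2
a, b = b, a            # the right side is packed into a tuple, then unpacked
print(a, b)            # 2 1
```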

# Dictionaries (dicts)

Sometimes the indexing of lists (by the integer index) is not convenient enough and more flexible indexing facilities are desirable. Dictionaries cater for that need.

Dicts have **keys** that map to **values**. They are defined as comma-separated `key:value` pairs enclosed in curly braces. 

In [None]:
numbers = {'one':1, 'two':2, 'three':3}
type(numbers)

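Dicts are mutable, and their elements are accessed via the respective keys. They are also unordered in the sense that their elements cannot be accessed by position (although since Python 3.7 a dict does remember the order in which its keys were inserted):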

In [None]:
numbers['three']

In [None]:
# This adds a new key-value pair to the dict
numbers["four"] = 4
numbers

The following illustrates that dicts are unordered (we cannot access their elements by position):

In [None]:
numbers[0]

Do not confuse this with the following definitions, where we happen to choose integers as keys:

In [None]:
nums = {0:"zero",1:"uno",3:"trois"}
print(nums[0])
print(nums)
nums[-200] = -200 # For a three-element list this would raise an 'index
                  # out of range' error; for a dict it just adds a new key
print(nums[-200])
nums["five"] = 5.0 # Notice that keys and values 
                   # do not have to be of the same type
print(nums["five"])
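Accessing a missing key with square brackets raises a `KeyError`. The dict method `get()` returns `None` instead, or a default value if one is supplied:

```python
numbers = {'one': 1, 'two': 2, 'three': 3}
print(numbers.get('two'))      # 2
print(numbers.get('ten'))      # None: no error for a missing key
print(numbers.get('ten', 0))   # 0: the default supplied as second argument
```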

The dict keys can be accessed like this (note that it is an object of type `dict_keys`):

In [None]:
nums.keys()

The dict values can be accessed like this (an object of type `dict_values`):

In [None]:
nums.values()

We will not use these yet, but keep them in mind: they will come in handy later.

An empty dict can be created in the following manner:

In [None]:
d={}
type(d)

Dict keys are required to be immutable (more precisely, hashable) objects:

In [None]:
d = {}
d["name"] = "John"
d[(1,2)] = 666 # A tuple (immutable) can serve as a dict key
print(d.keys())
# Now uncomment and try this one with a mutable object as key:
# d[[3,4]] = 777
# As well as this one, just to be sure:
# d[{"another":"attempt"}] = "FAIL"
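Tuple keys are handy as composite keys, e.g. a (name, year) pair. The sketch below uses made-up placeholder values and catches the error raised by a list key so that the cell runs to the end:

```python
d = {}
d[("Sofia", 2020)] = "a"       # a tuple works as a composite key
try:
    d[["Sofia", 2020]] = "b"   # a list is mutable (unhashable), so this fails
except TypeError as err:
    print("TypeError:", err)
print(d)                       # {('Sofia', 2020): 'a'}
```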

# Sets

- Sets are unordered collections of unique items. They correspond to the usual idea known from mathematics.
- Sets are defined like lists and tuples (i.e. using commas to separate the elements) but with curly braces.
- The last convention means that we can visually distinguish sets from dicts by checking whether they contain key-value pairs or not.

In [None]:
primes = {2, 3, 5, 7}
odds = {1, 3, 5, 7, 9}
type(primes)
# Try primes[0]

# Set operations

The set operations in Python are the usual set operations from mathematics. They can equivalently be invoked in operator form or using methods.

In [None]:
# SET UNION (= items appearing in either)
print(primes | odds) # with an operator
print(primes.union(odds)) # with a method

In [None]:
# SET INTERSECTION (= items appearing in both)
print(primes & odds) 
print(primes.intersection(odds)) 

In [None]:
# SET DIFFERENCE (= items appearing in primes but not in odds)
print(primes - odds) 
print(primes.difference(odds))
# Try changing the order of the sets to see 
# if the operation works symmetrically

In [None]:
# SYMMETRIC DIFFERENCE (= items appearing in only one set)
print(primes ^ odds)
print(primes.symmetric_difference(odds))
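The operations are related in the usual way; for example, the symmetric difference equals the union minus the intersection. The operator `<=` (method `issubset()`) additionally tests for a subset. A quick check with the sets from above (redefined so that the cell runs on its own):

```python
primes = {2, 3, 5, 7}
odds = {1, 3, 5, 7, 9}
sym = primes ^ odds
print(sym == (primes | odds) - (primes & odds))   # True: both are {1, 2, 9}
print({3, 5} <= primes)                            # True: {3, 5} is a subset of primes
print({3, 5}.issubset(primes))                     # the same test as a method
```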

## Sets are mutable

We can add elements using `add()` and remove them using `remove()`:

In [None]:
a = {2,3}
print(a)
a.add(1)
print(a)
a.remove(3)
print(a)
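`remove()` raises a `KeyError` if the element is not in the set. The method `discard()` does the same job but silently ignores missing elements:

```python
a = {1, 2}
a.discard(5)       # 5 is absent: nothing happens
try:
    a.remove(5)    # 5 is absent: KeyError
except KeyError:
    print("5 is not in the set")
print(a)           # {1, 2}
```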

Here is how we can obtain the empty set $\emptyset$:

In [None]:
a = {1,2}
b = {3,4}
a & b

The form of this output (`set()`) suggests that there may be alternative ways to define (and convert) sets. (Indeed, since an assignment like `x={}` creates an empty dict, there must be another way to create an empty set; similar ways exist for the other types as well.)

# More on type definitions and conversions

- A list can also be created using the function `list()`
- A tuple can also be created using the function `tuple()`
- A dict can also be created using the function `dict()`
- A set can also be created using the function `set()`

These functions can be used to create empty instances or to perform type conversions (as long as you do not get carried away).

# More on type definitions and conversions

In [None]:
# This is how we create empty instances
x = list()  # an empty list
y = tuple() # an empty tuple
# etc.

In [None]:
x = [1,2,3]
s_x = set(x)
t_x = tuple(x)
print(s_x, t_x)

In [None]:
# This will fail
d_x = dict(x)

In [None]:
# However, this works:
di = dict(x=1,y=2,z=3)
di
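`dict(x)` fails above because `dict()` expects key-value pairs rather than single values. Given a list of `(key, value)` tuples, or two parallel lists combined with the built-in `zip()`, the conversion works:

```python
pairs = [("x", 1), ("y", 2), ("z", 3)]
d1 = dict(pairs)                              # each element is a (key, value) pair
d2 = dict(zip(["x", "y", "z"], [1, 2, 3]))    # zip() pairs up the two lists
print(d1)                                     # {'x': 1, 'y': 2, 'z': 3}
print(d1 == d2)                               # True
```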

In [None]:
# We can also perform other conversions
S = {1,2,3,5}
L = list(S)
print(L)
T = tuple(S)
print(T)

In [None]:
# Note the following:
di = dict(x=1,y=2,z=3)
L = list(di) # list() of a dict returns the keys, 
             # but we are (more) likely to be interested in the values
# Key order: insertion order in Python >= 3.7 (and CPython 3.6);
# arbitrary in older versions
print(L)
# It is better to be explicit (in Python < 3.6 the order is still arbitrary):
Lk = list(di.keys()); print("Lk =",Lk)
Lv = list(di.values()); print("Lv =",Lv)

# Using sets to get unique elements

Sets can be used to extract unique elements as follows:

In [None]:
L = [1,2,1,3,4,5,3,7,8]
Lnew = list(set(L))
print(Lnew)
# Or, perhaps:
# Lnew.sort()
# print(Lnew)
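Converting through a set does not guarantee any particular order of the result. If the order matters, two common options are to sort the unique values, or to use `dict.fromkeys()`, which keeps the order of first appearance (dicts preserve insertion order from Python 3.7 on):

```python
L = [3, 1, 3, 2, 1]
unique_sorted = sorted(set(L))            # [1, 2, 3]: ascending order
unique_in_order = list(dict.fromkeys(L))  # [3, 1, 2]: order of first appearance
print(unique_sorted, unique_in_order)
```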

# Membership operations

We can check for membership using the operations:
- **`in`**
- **`not in`**

It is done like this:

In [None]:
L = [1,2,1,3,4,5,3,7,8] # Try to see how it works for other types
2 in L

In [None]:
100 in L

In [None]:
100 not in L
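Membership operations work on the other containers too, with one point to note for dicts: `in` checks the keys, not the values. On strings, `in` looks for substrings:

```python
d = {"one": 1, "two": 2}
print("one" in d)               # True: "one" is a key
print(1 in d)                   # False: 1 is a value, not a key
print(1 in d.values())          # True: search the values explicitly
print("ham" in "spam and ham")  # True: substring test
print(4 not in {1, 2, 3})       # True: sets support membership tests as well
```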