<center> 
# R406: Using Python for data analysis and modelling

<br> <br> 

## Lecture 2: Basic Data Structures in Python

<br>

<center> **Andrey Vassilev**

<br> 

<center> **2016/2017**
 

# Outline

1. Basic Python types (some familiar): int, float, complex, bool, str, NoneType
2. Mutable and immutable types
3. Lists
 - indexing, slicing, concatenation, appending an element, deleting an element
 - identity and membership operations
 - variables as pointers
4. Tuples
 - unpacking
5. Sets
 - logical operations on sets; using sets to get unique elements
 
Type conversions: see where it fits best

# Review (and some extensions)

Last time we learned about some Python types and the corresponding assignment operations: integer types (`intvar=5`), floats (`floatvar=5.0`) and strings (`strvar="five"`).

A frequently encountered operation is to update a variable according to some rule (e.g. increment its value by 1).

It can be done as follows:

In [None]:
x=3
print("First x equals %d."%(x))
x=x+1
print("Then x equals %d."%(x))

The last operation is used often in practice, so Python (like other languages) provides a set of special update operations (augmented assignment operators). The syntax for the preceding example would be 

`x+=1` 

Try it out below!

In [None]:
x+=1
print(x)

There are similar update operations for the main binary operators:

`-=    *=    /=    **=   //=   %=`

Take a few minutes to try them out as well. Be sure to check how they work on strings.
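
As a starting point for the experiments suggested above, here is a minimal sketch that chains the update operators on a number and then tries some of them on a string. The starting values are arbitrary and chosen only for illustration.

```python
# Each augmented assignment is shorthand for x = x <operator> value
x = 10
x -= 3    # x = x - 3   -> 7
x *= 4    # x = x * 4   -> 28
x /= 2    # x = x / 2   -> 14.0 (true division always yields a float)
x **= 2   # x = x ** 2  -> 196.0
x //= 5   # x = x // 5  -> 39.0 (floor division)
x %= 7    # x = x % 7   -> 4.0 (remainder)
print(x)

# On strings, += concatenates and *= repeats
s = "spam"
s += " and eggs"
s *= 2
print(s)

# Operators that are not defined for strings raise a TypeError
try:
    s -= "eggs"
except TypeError as err:
    print("TypeError:", err)
```

Only `+=` (with another string) and `*=` (with an integer) work on strings in this sketch; the subtraction fails and leaves `s` unchanged.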

# Explicit conversion operations

There are functions that can explicitly cast a variable to another type.

For instance, this converts ints to floats:

In [None]:
x=1
print(x) # Try also type(x) but temporarily comment out the commands below
x=float(x)
print(x) # Check the type again

This converts ints to strings:

In [None]:
x=1
print(x) # check the type
x=str(x)
print(x) # check the type again

## Exercise

Define a **float** variable and try to add it to the string "one hundred" to check that it raises an error. Convert the variable to a string and repeat the operation. What is the outcome?

In [None]:
# Use this cell to perform the exercise


Strings can be converted to ints or floats under certain conditions:

In [None]:
y="1.1"
z1=float(y) # Replace the assignment on this line with z1=int(y). 
# Does it work? After that, try z1=int(float(y)).
# Does the conversion z1=int(y) work if we assign y="1"? What if we use y="spam"?
print(z1+1)
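
The conversions suggested in the comments above can also be tried side by side. The sketch below uses a hypothetical helper, `try_convert`, which is not part of Python and is defined here only to catch the `ValueError` raised by a failed conversion, so that all cases can be run in one go.

```python
def try_convert(converter, text):
    """Hypothetical helper: return the converted value, or the error text on failure."""
    try:
        return converter(text)
    except ValueError as err:
        return "ValueError: " + str(err)

for text in ["1", "1.1", "spam"]:
    print(repr(text), "-> int:", try_convert(int, text),
          "| float:", try_convert(float, text))

# A string such as "1.1" can reach int only via float; the fractional part is dropped
print(int(float("1.1")))
```

In this sketch `int()` accepts only strings that look like whole numbers, while `float()` accepts both `"1"` and `"1.1"`; neither accepts `"spam"`.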

# Basic Python types (scalar)

There are six basic scalar types in Python:
- int
- float
- str
- complex
- bool
- NoneType

We already saw the first three in action, so let's turn to the other three.

# Complex numbers

Complex numbers can be defined in two alternative ways:

`z = 1.0 + 2.1j`

`z=complex(1.0, 2.1)`

In [None]:
z1 = 1.0 + 2.1j # Try z1 = 1.0 + 2.1 j (with a space)
print(z1)

In [None]:
type(z1)

In [None]:
z2=complex(3.0,2.0)

One can derive the basic characteristics of complex numbers as follows:

In [None]:
print("The real part of z1 is %.2f"%(z1.real))
print("The imaginary part of z1 is %.2f"%(z1.imag))
print("z2 is", z2, "and the complex conjugate of z2 is", z2.conjugate()) 
# Notice the parentheses after conjugate
print(abs(z1), abs(z2))
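
The quantities printed above are related to each other. The sketch below checks two of these relations for an illustrative number, `3 + 4j`, chosen (as an assumption of this example) so that the modulus is exactly 5.

```python
import math

z = complex(3.0, 4.0)  # illustrative value, not taken from the cells above

# abs() returns the modulus, i.e. sqrt(real^2 + imag^2)
modulus = abs(z)
print(modulus)
print(math.sqrt(z.real**2 + z.imag**2))

# A number times its conjugate equals the squared modulus (with zero imaginary part)
product = z * z.conjugate()
print(product)
```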

The basic operations on complex numbers go through directly:

In [None]:
print(z1+z2)
print(z1-z2)
print(z1*z2)
print(z1/z2)
print(z1**3)

# NoneType

Python also has a special type (`NoneType`) to capture the idea of an absent value. It has only one possible value: `None`.

This type is useful, for instance, as the default return value of a function that is called for its side effects.

In [None]:
type(None)

In [None]:
print_return_value = print("abc") # Capture the return value of print (if any)
print(print_return_value)
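
The same behaviour can be seen with a user-defined function. In the sketch below, `log_message` is a hypothetical function written for this illustration; it has no `return` statement, so calling it yields `None`.

```python
def log_message(message):
    """Hypothetical helper: prints a message and returns nothing explicitly."""
    print("LOG:", message)

result = log_message("called only for its side effect")
print(result)        # None
print(type(result))  # <class 'NoneType'>
```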

# Booleans

The Boolean type has two possible values: `True` and `False`. (Notice that the capitalization matters here, as well as in Python generally.)

Boolean values are important because they are returned by *comparison operations* and they are the building block of *logical (Boolean) operations*.

# Comparison operations

|Operation | Description                  |
|----------| -----------------------------|
|a == b    | a equal to b                 |
|a != b    | a not equal to b             |
|a < b     | a less than b                |
|a > b     | a greater than b             |
|a <= b    | a less than or equal to b    |
|a >= b    | a greater than or equal to b |

# Comparison operations in action

In [None]:
x=1
y=2
z=1.0
print(x==y)
print(x==z)
print(x>y)
print(z>=x)
print(z!=y)

# Logical (Boolean) operations

Boolean operations allow us to combine the results of separate comparison operations and evaluate the final outcome (which is also a Boolean value).

Boolean operations are **`and`**, **`or`** and **`not`**.

| Operation            | Outcome | Operation            | Outcome | Operation       | Outcome |
|--------------------- | --------| -------------------- | --------| --------------- | ------- |
| True and True   | True    | True or True   | True    | not True  | False   |
| True and False  | False   | True or False  | True    | not False | True    |
| False and True  | False   | False or True  | True    |                 |         |
| False and False | False   | False or False | False   |                 |         |

# Boolean operations in action

In [None]:
x=1.1;   y=2.2;       z=1.1
# Note that whitespace does not matter in the above statements
print(x==z)
print(not (x==1.1))
print(not x==1.1) # Same as the previous one
print(x>0 and x>1)
print(y<0 or y>2.0)
print(x>0 and not x>1)

# Type conversion for Booleans

The conversion command is `bool()`. Try it in the examples listed below:

In [None]:
y=1
# y=0
# y=-1.1
# y=0.0
# y=0.001
# y="spam"
# y="eggs"
# y=""
# y=None
bool(y)
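
Instead of uncommenting the lines one at a time, the candidate values can be converted in a loop. A minimal sketch, using a subset of the values from the cell above:

```python
values = [1, 0, -1.1, 0.0, 0.001, "spam", "", None]

for value in values:
    print(repr(value), "->", bool(value))

# Keep only the values that convert to True
truthy = [value for value in values if bool(value)]
print(truthy)
```

Among these values, only zero (as an int or a float), the empty string and `None` convert to `False`; every other number or non-empty string converts to `True`.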

# Data structures (containers)

Data structures provide a way to combine the simple types we saw up to here. (Think of a situation in which you need to store a sequence of results.)

The basic data structures we shall look at are:
* list
* tuple
* dict
* set

# Lists

Lists are an ordered collection structure.

They are defined by using square brackets and separating the elements with commas:

In [None]:
L1 = [1,2,3]
type(L1)

Lists can hold elements of any type, and a single list can mix several types (i.e. be heterogeneous):

In [None]:
L1 = [1.1, 2.2, 3.3]
L2 = ["one", "two", "three", "four"]
L3 = [1, 2.0, "three", 4.0 + 0j]
L4 = [1, 2, [3 ,4]] # They can even contain other lists
L5 = [] # This defines an empty list
print(L1) # Print the rest as well

# Properties of lists

Check the statements below and try changing each of them to see how they work.

In [None]:
L = [5,2,1,3]
len(L) # The len() function returns the list length (=number of elements)
L.append(6) # The append() method adds an element at the end of the list
print(L + ["one","two"]) # The + operation performs concatenation
L.sort() # The sort() method sorts the list in-place
L.reverse() # Reverses the list in-place
L
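
One point worth checking when experimenting with the in-place methods is what they return. The sketch below contrasts `sort()` with the built-in function `sorted()`, which is not used elsewhere in this lecture and is brought in here only for comparison.

```python
L = [5, 2, 1, 3]
sort_result = L.sort()   # Sorts L itself...
print(sort_result)       # ... and returns None
print(L)                 # [1, 2, 3, 5]

M = [5, 2, 1, 3]
N = sorted(M)            # Returns a new sorted list
print(N)                 # [1, 2, 3, 5]
print(M)                 # [5, 2, 1, 3] (M is left unchanged)
```

A statement like `L = L.sort()` would therefore replace the list with `None`.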

# List indexing

List elements can be accessed by referring to the position of the respective element in the list, i.e. we can execute commands like "Get the first element." or "Set the fifth element equal to 5." An element is accessed by writing its index in square brackets after the list name, e.g. `L[5]`.

**Indexing in Python starts from 0!!!**     

(If that bothers you, check out https://xkcd.com/163/)

In [None]:
L = [-1,1,3,6,8]
L[0] # The first element

In [None]:
L[4] # As a consequence, the n-th element is accessed as L[n-1]
# Try L[len(L)-1]
# Try L[5] to raise an 'index out of range' error

You can also set an element of a list:

In [None]:
L[1] = 2
# Try L[0] = "spam" and L[3] = None
L

There is special syntax for referring to the end of the list:
- The last element is `L[-1]`
- The second-to-last element is `L[-2]`
- ... etc.

In [None]:
L[-1]
# Try also:
# L[-2]
# n=2; print(L[-n]) # You can index by an integer variable
# L[-len(L)]
# L[-len(L)] = 0
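
Negative indices can be understood as counting back from `len(L)`. The short sketch below checks, for every valid `n`, that `L[-n]` and `L[len(L)-n]` refer to the same element.

```python
L = [-1, 1, 3, 6, 8]

for n in range(1, len(L) + 1):
    # Both expressions pick the n-th element counted from the end
    assert L[-n] == L[len(L) - n]
    print("L[-%d] = L[%d] = %s" % (n, len(L) - n, L[-n]))
```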

# List slicing

We often need to access a range of elements in a list (e.g. from the second to the fifth element). *List slicing* provides a way to do that.

The syntax is of the form `L[m:n]`, where the element at index `m` is included, while the element at index `n` is not.

In [None]:
L = [-1,1,3,6,8]
L[0:3]
# L[2:4]
# L[2:5] # This works (because the upper bound is not inclusive)... 
         # ... even though L[5] will raise an error
# L[5:2] # This will return an empty list

Leaving out the first index defaults to zero:

In [None]:
L[:3] # Same as L[0:3]

Leaving out the second index makes the slice run to the end of the list (the last element is included):

In [None]:
L[2:] # Same as L[2:5], i.e. L[2:len(L)]
# Check whether L[2:-1] yields the same answer. Why?
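
The default bounds can be compared directly. A minimal sketch, using the same five-element list:

```python
L = [-1, 1, 3, 6, 8]

print(L[2:])        # [3, 6, 8]
print(L[2:len(L)])  # [3, 6, 8] (the omitted upper bound behaves like len(L))
print(L[2:-1])      # [3, 6] (the upper bound -1 is the last index, and it is excluded)
print(L[:])         # [-1, 1, 3, 6, 8] (both bounds omitted: a copy of the whole list)
```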

We can also specify a third integer to denote a step size (i.e. skipping by **k** elements):

In [None]:
L = [1,2,3,4,5,6,7,8,9]
L[0:7:2]
# L[:7:2] # Same as previous one
# L[::3] # Step by three
# L[1::2]

In [None]:
L = [1,2,3,4,5,6,7,8,9]
L[6:2:-1] # Note we start with the higher-valued index
# L[2:6:-1] # Similar to the example L[5:2] we saw earlier
# L[::-1] # This simply produces a reversed copy of the list
# L[::-2]
L

We can also use slicing in assignments:

In [None]:
L = [1,2,3,4,5,6,7,8,9]
L[0:3] = [11,22,33]
# L[:3] = [-1,-2,-3]
# L[::2] = 666 # Try to set every other element equal to 666. What's the error?
# L[::2] = [666]*len(L[::2]) # Now try this
L
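
The two commented attempts above can be run together if the error is caught. The sketch below also adds one case that is not in the original cell: assigning to a contiguous slice a list of a different length, which changes the length of the list.

```python
L = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Assigning a single number to a stepped slice fails: an iterable is required
try:
    L[::2] = 666
except TypeError as err:
    print("TypeError:", err)

# Supplying one value per selected position works
L[::2] = [666] * len(L[::2])
print(L)  # [666, 2, 666, 4, 666, 6, 666, 8, 666]

# Extra case (not in the cell above): a contiguous slice can be
# replaced by a list of a different length
M = [1, 2, 3, 4, 5]
M[0:3] = [0]
print(M)  # [0, 4, 5]
```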

# Mutable and immutable types

Python types can be *mutable* or *immutable*. When a mutable object is created, we can change its contents afterwards. As the name suggests, immutable objects cannot be changed once created.

Among the Python types mentioned so far, the *immutable* ones are:
- the numeric types: int, float, complex
- string
- tuple

The *mutable* types we have seen (or at least heard of) are:
- list
- dict
- set

To illustrate (im)mutability in practice, consider the following:

In [None]:
L = [7,8,9,10,11]
print(L[0])
print(L[0:4])
L[0]=200
print(L)

In [None]:
S = "This is an immutable string."
print(S[0]) # The string can be indexed and sliced just like a list
print(S[0:4])
# However, uncomment and try to run the line below
# S[0] = "t"

Despite the above problem with changing the first letter of `S`, we can still do the following:

In [None]:
S = "This is an immutable string."
print(S)
S = "this is an immutable string."
print(S)
S = "...but doesn't it change?"
print(S)

What's going on? 

We appear to be unable to change a part of the string, yet we can change it as a whole...

# Variables as pointers

- The behaviour observed in the string example is due to a specific feature of Python variables. 
- Unlike in some other languages (e.g. C or Java), a variable in Python is not a fixed container for a predefined kind of data but a pointer that can dynamically be redirected to point to various types of objects. 
- For this reason the following is legitimate:

In [None]:
x=1
x=[1,2,3]
x="A random string"
# Good luck pulling this trick in C or Java

In the last example, the integer `1` and the string `"A random string"` are immutable, while the list `[1,2,3]` is mutable. The variable `x` can be changed to point to any of these without a problem. A problem will arise if we try to modify the string `"A random string"` itself.

Now it is easy to see that in the previous example we were not modifying any strings but sequentially reassigning `S` to point to different strings!
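
One way to make the reassignment visible is the built-in function `id()`, which returns an integer identifying the object a variable currently points to. In the sketch below, the extra name `original` is introduced only for this illustration, to keep the first string reachable after `S` has been redirected.

```python
S = "This is an immutable string."
original = S                   # A second name pointing to the same string object
print(id(S) == id(original))   # True: both names point to one object

S = "this is an immutable string."
print(id(S) == id(original))   # False: S now points to a different object
print(original)                # The first string itself was never modified
```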

The discussion of variables as pointers might seem technical and of little practical relevance. However, a sound understanding of the nature of Python variables is crucial to avoid making hard-to-detect mistakes in your code.

Here is an example of what may go wrong:

In [None]:
x=[1,2]
y=x
x.append(3)
y.append(4)
print(x)
print(y)
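
If an independent list is what we actually want, one option is to build a new list from the old one instead of assigning the name. A minimal sketch, in which `list(x)` creates a (shallow) copy:

```python
x = [1, 2]
y = x          # y is another name for the same list
z = list(x)    # z is a new list with the same elements

x.append(3)
print(x)  # [1, 2, 3]
print(y)  # [1, 2, 3] (changed along with x)
print(z)  # [1, 2] (unaffected by the change to x)
```

The slice `x[:]` would produce a copy in the same way.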

# Identity operations

Building on the idea of variables as pointers, we can use the **identity operations** in Python to establish whether two variables point to the very same object or merely to objects with the same contents.

The identity operations in Python are:
* **`is`**
* **`is not`**

In [None]:
x=[1,2]
y=x
z=[1,2]
print(x is y)
print(x is z)
print(x is not z) # x and z are not identical...
print(x==y)
print(x==z) # ... yet their contents are the same

# Tuples

# Dicts

# Sets

# Membership operations