# 8. Compound data types

This chapter is about non-scalar data types. The most important built-in ones are `list`, `tuple`, `str`, `dict` and `set`; generators also belong in this group. Because of time constraints we'll skip sets and only cover lists, tuples, strings, dictionaries and, briefly, generators.

## 8.1 Lists

Lists are very flexible because they can hold virtually any Python object. They are also **mutable**, which means that you can change them after creation, for example by adding, replacing or deleting items. The closest equivalent in MATLAB would be `cells`.

They can be constructed like this:

```Python
list_1 = ['a',2,92.1];
```
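To make the mutability mentioned above a bit more concrete, here is a minimal sketch. The list `shopping` and its contents are made up for illustration; `append()` is covered in more detail further down.

```Python
# A short list holding a string, an int and a float.
shopping = ['apples', 3, 2.5]

shopping.append('bread')   # add an item at the end
shopping[1] = 4            # replace the item at index 1
del shopping[2]            # delete the item at index 2

print(shopping)  # ['apples', 4, 'bread']
```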

**Exercise**

Create a list that contains any number of arbitrary objects like `ints`, `functions` and so on.

In [None]:
#your code here


### 8.1.1 Indexing lists

Indexing in lists works as we already learned: it is 0-based. That means the first element is `[0]` and the Nth element is `[N-1]`. Slices include the start index and exclude the stop index, so it helps to think of the indices as offsets from the start of the list.
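A small sketch of these rules, using a made-up list `letters` so that the exercise below stays untouched. The negative index in the last line is not discussed in the text; it is standard Python and counts from the end.

```Python
letters = ['a', 'b', 'c', 'd', 'e']

print(letters[0])     # a  (first element)
print(letters[4])     # e  (fifth element, index N-1)
print(letters[1:3])   # ['b', 'c']  (index 1 included, index 3 excluded)
print(letters[:2])    # ['a', 'b']  (an omitted start means "from the beginning")
print(letters[-1])    # e  (negative indices count from the end)
```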

In [19]:
a_list = [0,1,2,3,4];

**Exercises**

   1. Select the second value (`1`, at index 1) from the list.
   2. Select the subset [2,3,4] from the list.
   3. Replace the value at index 2 with `99`

In [36]:
#your code here


In [37]:
#your code here


In [38]:
#your code here


### 8.1.2 Create lists from other objects

You can use the `list()`-function to create a list from another object like this:
```Python
new_list = list( old_object );

```

You will need this for objects of type `generator`, `filter` and others. You don't have to understand this now, but take a look at how `range()` behaves.

In [26]:
a = range(4);
print(a);
print(type(a));

range(0, 4)
<class 'range'>


So far this is a `range` object. If you want to see its values in readable form, you can transform it into a `list`:

In [27]:
#your code here


### 8.1.3 List concatenation

Because of the syntax it is easy to confuse lists with MATLAB arrays. They behave **completely** differently. Consider the following cell, think about what you expect and run it:

In [29]:
a = [1,2,3] + [3,2,1];
print(a);

[1, 2, 3, 3, 2, 1]


So yeah. That's how list concatenation works: `+` joins the two lists end to end instead of adding them element by element.
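For comparison, here is a hedged sketch of what the operators do on lists and how element-wise addition could be spelled out by hand. The variable names are chosen for illustration only, and the explicit loop is just one of several possible ways.

```Python
a = [1, 2, 3]
b = [3, 2, 1]

joined = a + b        # concatenation -> [1, 2, 3, 3, 2, 1]
repeated = a * 2      # repetition    -> [1, 2, 3, 1, 2, 3]

# Element-wise addition has to be spelled out, here with a plain loop.
summed = []
for i in range(len(a)):
    summed.append(a[i] + b[i])

print(joined)     # [1, 2, 3, 3, 2, 1]
print(repeated)   # [1, 2, 3, 1, 2, 3]
print(summed)     # [4, 4, 4]
```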

### 8.1.4 Lists are objects

As should be clear by now, lists are objects. That means they have methods you can use to manipulate their content or get information about it, like `list.append()`, `list.remove()` and `list.index()`. We can't cover everything; see [here](https://docs.python.org/3/tutorial/datastructures.html) for the complete list.

In [1]:
a = [sum, 1, 'word'];

**Exercise**

Use the `list.append()` method to append a `float` of your choice (or anything really) to the list `a`. Then remove the `int` 1. Finally retrieve the index of 'word'. 

In [None]:
#your code here


In [30]:
#your code here


In [4]:
#your code here


### 8.1.5 len()

`len()` is a built-in function, not a method of the classes `list` or `tuple`. That means you can't call `my_list.len()`, but have to use the function form: `len(my_list)`. It works on lists, tuples, dictionaries and strings, and even on data types from third-party packages like **pandas** and **numpy**. Unlike indices, the length is a plain count of the items: a list with three items has length 3, although its last index is 2. There's not much more to say about it. Try it yourself and get the length of the following objects:

In [7]:
my_list = [1,2,3];
my_tuple = (1,2,3);
my_dict = {'a': 1, 'b': 2};

## 8.2 Tuples

Tuples are the immutable version of lists. That means that once they are created, they can't be changed. Apart from that, they can be used pretty much interchangeably with lists, and they can also hold any other object. The construction syntax is one of the following:

```Python
my_tuple_1 = (1, 2, 3 );
my_tuple_2 = 1, 2, 3;
```

As you see, the parentheses are optional: it is the comma that makes the tuple. If you want a tuple of length one, you therefore have to add a trailing comma. Otherwise the parentheses are interpreted in a mathematical sense:

In [11]:
a = (1);
b = (1,);
c = 1,;
for i in [a,b,c]:
    print( type( i ) )

<class 'int'>
<class 'tuple'>
<class 'tuple'>


<br/>

**Exercise**

Create a tuple of an integer, a function and a string. Use indexing to retrieve the string. Try to change one of the elements and find out what **immutability** means.

In [14]:
#your code here


## 8.3 Strings and characters

Strings and characters are the same thing in Python. They are very easy to handle if you're used to MATLAB because they are treated as a data type of their own from the beginning. Just like tuples they are **immutable**, meaning that you can't change them after creation. You can use methods to return manipulated versions though.

You can use either `"double quotes"` or `'single quotes'`, they act the same.

In [30]:
double_quote = "string";
single_quote = 'string';
print( double_quote == single_quote );

True


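You can escape a quote in your string with a backslash (`'That\'s it'`) or by using double quotes for the string and single quotes within (`"What's up"`). Note that doubling the quote as in MATLAB (`'That''s it'`) does not work: Python reads it as two adjacent strings and joins them to `Thats it`.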
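A short sketch of how Python treats quotes inside strings. The variable names are made up for illustration. Keep in mind that two adjacent string literals are simply joined, so a doubled quote does not produce an apostrophe.

```Python
escaped = 'That\'s it'     # backslash escapes the single quote
mixed = "What's up"        # double quotes outside, single quote inside
doubled = 'That''s it'     # two adjacent literals: 'That' and 's it'

print(escaped)   # That's it
print(mixed)     # What's up
print(doubled)   # Thats it
```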

### 8.3.1 String concatenation 

String concatenation is the act of joining strings together. It's really just *adding* two strings together, isn't it?

**Exercise**

Find out what I mean by that and concatenate the following strings.

In [33]:
s_1 = 'Well ';
s_2 = 'done!';

In [35]:
#your code here


This only works for strings. If you want to add e.g. numeric types as string, you have to convert them using the `str()`-constructor first.

**Exercise**

Add your age between the two strings:

In [4]:
s_1 = 'I am ';
s_2 = ' years old';

### 8.3.2 Indexing in strings

Works the same as in lists:

In [39]:
my_str = 'abcde';

**Exercise**

Retrieve the letter 'e' using an index.

In [40]:
#your code here


Want to find out what immutable means? Try to replace the 'a' in `my_str` with something else:

In [42]:
#your code here


### 8.3.3 String methods

Since `str`ings are objects, they provide a lot of functionality. Whatever you can think of, there is probably a method for that. If it's not in the built-in `str` class, you can import the standard-library module `string`, which offers some additional helpers.

Of course we can't cover all methods of `str`; you can find a list of them [here](https://docs.python.org/3/library/stdtypes.html).

We will have a look at some ways to format strings, that is, to do what you would do using `sprintf()` or `fprintf()` in MATLAB. Neither of those exists in Python. In Python, you format the string and then just `print()` it.

There are multiple ways to do this and they differ between Python 2 and 3. You'll learn about two now: f-strings (available since Python 3.6) and the `str.format()`-method.

#### 8.3.3.1 f-strings

f-strings are one way to format strings. The syntax is

```Python
age = 27;
f'I am {age} years old.';
```
You can put any expression in the curly brackets and Python will try its best to turn it into sensible output.
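As a rough illustration, the following sketch puts an arithmetic expression and a format specification inside the braces. The variables `price` and `n_items` are invented for this example.

```Python
price = 2.5
n_items = 3

# Any expression can go inside the braces, including arithmetic.
total_str = f'{n_items} items cost {price * n_items} in total.'
print(total_str)      # 3 items cost 7.5 in total.

# A format specification after a colon controls the output, here 2 decimals.
rounded_str = f'One item costs {price:.2f}.'
print(rounded_str)    # One item costs 2.50.
```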


**Exercise**

Assign your name, age and the number of your siblings to three variables and use these to format and print a string like this:<br/>
"My name is { }, I'm { } years old and I have { } siblings."

(Or format any other string using variables.)

In [15]:
#your code here


#### 8.3.3.2 The str.format() method

This is the other most common way to format strings. You use curly brackets as placeholders for the input. Then, in the parentheses of `.format()`, you pass the values to put in. The simplest form is this:

```Python
age = 27;
age_str = 'I am {} years old.'.format( age ); 
```
If you want a specific format of the input, e.g. integer or 2 decimals, you can do it like this:

```Python
vol_alc = 5.8;
beer_str = 'This beer has an alcohol content of {:.2f}%.'.format( vol_alc );
```
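Placeholders can also be numbered or named, which helps when a value appears more than once. This is a small sketch with made-up values; the keyword names `who` and `years` are arbitrary.

```Python
name = 'Ada'
age = 27

# Numbered placeholders refer to the arguments by position ...
by_position = '{0} is {1} years old. Yes, {1}!'.format(name, age)

# ... and named placeholders refer to keyword arguments.
by_name = '{who} is {years} years old.'.format(who=name, years=age)

print(by_position)  # Ada is 27 years old. Yes, 27!
print(by_name)      # Ada is 27 years old.
```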

**Exercise**

Format a string to state pi including 10 decimals.

In [12]:
from math import pi
#your code here


That's enough for strings. Of course there is way more to learn, but that will happen on its own in the future.

<br/> 

## 8.4 Dictionaries

Dictionaries are instances of the class `dict`. If `lists` are something like the equivalent to MATLAB `cells`, then `dicts` would be the equivalent to `structs`. They consist of `keys` and `values`. There are several ways to construct them, some more elegant than others. The most basic one is:

```Python
my_dict = { key1: value1, key2: value2, key3: value3};
```

You can use any immutable object as a key (strictly speaking, any *hashable* object). That means you can use a tuple as a key but not a list. Strings, numeric data types and booleans are fine too. Keys are what fieldnames are in structures.
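The rule about keys can be tried out with a sketch like the following. The dictionary contents are invented, and the `try`/`except` block is only there so that the example keeps running after the expected error.

```Python
# Strings, numbers and tuples are all valid keys.
valid = {'name': 'Ada', 42: 'answer', (52.5, 13.4): 'Berlin'}
print(valid[(52.5, 13.4)])   # Berlin

# A list is mutable and therefore rejected as a key.
list_key_failed = False
try:
    invalid = {[52.5, 13.4]: 'Berlin'}
except TypeError as err:
    list_key_failed = True
    print('TypeError:', err)
```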

**Exercise**

Describe yourself in a dictionary. You can e.g. add your age, height, favorite movie, income,...

In [17]:
#your code here


### 8.4.1 Indexing dictionaries

Positional (numeric) indexing doesn't work for dictionaries. Numbers only work in the sense that you can of course use numeric keys and then retrieve the values using these keys. The syntax for indexing is:

```Python
value1 = my_dict[ key1 ]; 
```
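The same bracket syntax is also used to change or add entries. Below is a minimal sketch with a made-up dictionary `beer`; `dict.get()` is not covered in the text but is a standard method that returns a default instead of raising a `KeyError`.

```Python
beer = {'name': 'Pils', 'vol_alc': 4.9}

print(beer['name'])          # Pils

# Assigning to a key changes the value or adds a new entry.
beer['vol_alc'] = 5.1
beer['country'] = 'Germany'
print(beer)  # {'name': 'Pils', 'vol_alc': 5.1, 'country': 'Germany'}

# A missing key raises a KeyError; dict.get() returns a default instead.
print(beer.get('price', 'unknown'))   # unknown
```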

**Exercise**

Retrieve one of your aspects from the dictionary using the respective key.

In [22]:
#your code here


### 8.4.2 Dict methods

If you only want the keys or the values of a dictionary, e.g. to loop over them, use `dict.keys()` and `dict.values()` for that. If you want to loop over both at the same time, use `dict.items()`.

**Exercise**

Loop over the items ( = keys + values ) of your dictionary. Use this to print a description of yourself to the screen:

In [26]:
#your code here


### 8.4.3 Dict from multiple iterables

If you have many values, creating dictionaries by typing out every pair is not really feasible. There are a few more sophisticated ways to create dictionaries. As usual, the class can be used as a constructor by calling it like a function: `dict()`. Since you need keys and values, you can't just pass e.g. a flat list of values; `dict()` expects key-value pairs.
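Two ways of calling `dict()` are sketched here, with made-up data: passing a list of `(key, value)` tuples and passing keyword arguments. Both are standard Python.

```Python
# A list of (key, value) tuples ...
pairs = [('a', 1), ('b', 2), ('c', 3)]
from_pairs = dict(pairs)

# ... or keyword arguments, which become string keys.
from_kwargs = dict(a=1, b=2, c=3)

print(from_pairs)                 # {'a': 1, 'b': 2, 'c': 3}
print(from_pairs == from_kwargs)  # True
```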

**Exercise** 

Find out how to use `zip()` to create a dictionary from the two lists. Use letters as keys and integers as values:

In [28]:
letters = ['a','b','c'];
integers = [1,2,3];
#your code here


<br/>

## 8.5 Generators

Generators are a bit harder to understand. I only cover them here because list comprehensions, which you will use a lot, are built on the same syntax. We'll keep it as superficial as possible while still allowing you to use that later.

Generators are objects without a memory. They use the `yield` keyword to give out one item at a time, but once it's gone it's gone. This is very memory efficient for objects that you only need once.
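To give an impression of what `yield` looks like in practice, here is a minimal sketch of a generator function. The name `count_up_to` is hypothetical and only serves this illustration.

```Python
def count_up_to(n):
    """Yield the integers 0, 1, ..., n-1 one at a time."""
    i = 0
    while i < n:
        yield i      # hand out the current value and pause here
        i += 1       # continue from here when the next value is requested

gen = count_up_to(3)
print(type(gen))     # <class 'generator'>
print(list(gen))     # [0, 1, 2]
```

Calling the function does not run its body; it returns a generator object that produces the values only when they are requested.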

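You could think of the `range` object as something like a generator (although it isn't one). Like a generator, it doesn't store all of its values: it only remembers start, stop and step and computes each value when needed, so even `range(100000)` takes up very little space. Unlike a generator, it can be indexed and looped over as often as you like. For a single loop it would be enough to have an object that increases its value every iteration and stops once it reaches the maximum value: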

In [50]:
class my_range:
     
    def __init__(self, n):
        self.n = n; #up to where do we go?
        self.i = 0; #where are we right now?
        
    def next_value(self):
        self.i += 1;
        if self.i > self.n:
            raise Exception("I'm empty")
        else:
            return self.i-1

In [None]:
a = my_range(5);
for i in range(6):
    b =   a.next_value();
    print(b);

This is not actually how generators are implemented, but you get the idea how they work. 
You can construct `generators` like this:

In [69]:
my_gen = (x for x in range(10) );
print( type( my_gen ) );

<class 'generator'>


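So what do I mean by "once they're empty, they're empty"? Generators have a method `__next__()` (usually called through the built-in `next()`) that returns the next value, as long as there is one. Once the generator is exhausted, it raises a `StopIteration` exception: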

In [None]:
while True:
    print( my_gen.__next__() );

And now it can't return anything anymore:

In [None]:
my_gen.__next__();

You can transform `generators` into lists if you want to keep all values. This approach uses the same `__next__()` method and thus the generator is still empty afterwards: 

In [None]:
my_gen = (x for x in range(4));
my_list = list( my_gen);
print(my_list);
#doesn't work anymore
my_gen.__next__()
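In everyday code you rarely call `__next__()` yourself. The following sketch, with made-up variable names, shows that a `for`-loop consumes a generator without an error and that a second pass finds nothing left. The default argument of `next()` is standard Python but not covered in the text.

```Python
squares = (x * x for x in range(4))

# A for-loop requests the values one by one and stops cleanly at the end.
first_pass = []
for value in squares:
    first_pass.append(value)

# The generator is exhausted now, so a second pass yields nothing.
second_pass = list(squares)

print(first_pass)    # [0, 1, 4, 9]
print(second_pass)   # []

# next() accepts a default that is returned instead of raising StopIteration.
print(next(squares, 'empty'))   # empty
```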

### 8.5.1 List comprehension

This is by far the most common place where you will meet this syntax. List comprehensions look like the generator expression above, but with square brackets, and they build the complete list right away. They are a very useful tool and are used a lot in idiomatic Python. Often, when you would write a loop in MATLAB, you can use a list comprehension, which also uses a `for`-loop but is usually more efficient and very readable. In easy terms, a list comprehension creates a list using a loop:

```Python
new_list = [ x*2 for x in range(100) ];
```

This will create a new list of length 100 containing the doubled values of the integers 0 to 99.

**Exercise**

Use list comprehension to create a list containing the squares of all integers from 1 to 100.

In [80]:
#your code here


<br/>

You can also use multiple `for`-statements in one list comprehension. This gets a bit confusing at first, but it's enough to remember that you write them in the same order as you would in nested loops.

This:
```Python
y = [];
for x in [1,2,3]:
    for z in [10,100,1000]:
        y.append(x+z);
```
becomes
```Python
y = [x+z for x in [1,2,3] for z in [10,100,1000]];
```

See for yourself:

In [91]:
y1 = [];
for x in [1,2,3]:
    for z in [10,100,1000]:
        y1.append( x+z)
y2 = [x+z for x in [1,2,3] for z in [10,100,1000] ];

print( y1 == y2 );

True


You might have expected that the `==`-operator compares the single values and returns a list of booleans. This is the way it works on MATLAB arrays. Here it compares the whole lists at once and returns a single boolean. To get element-wise results, you could either write your own list class and override the `__eq__()`-method. Or:

**Exercise**

You could use list comprehension to do that ;).

In [93]:
#your code here
bol = [x==y for x,y in zip(y1,y2 )]

#### 8.5.1.1 Conditional list comprehension

You can use `if` in two different ways in a list comprehension: either in a **ternary operator** or as an actual condition that decides whether an item is included at all.

**Ternary operator**

A quick reminder. The ternary operator is this nice little thing here:

```Python
a = 100;
size_of_a = 'small' if a < 50 else 'big';
```

You can extend this to a list like this:

```Python
a = [30,89,2,200,239];
size_of_a = ['small' if i < 50 else 'big' for i in a ]; 
```

The position is important: the ternary expression goes in front of the `for`!

**Exercise**

Use the ternary operator in list comprehension to dichotomize the following list of BDI scores. Use a cutoff-score of 14, i.e. everyone with a score of at least 14 is considered depressive.

In [95]:
BDI_scores = [11,8,14,19,2,3,16,12,15];

In [96]:
#your code here


**Actual condition**

The ternary operator can't be used to skip an iteration depending on value. It can only choose between two alternatives. There is also a way to skip iterations.

Assume we have a list of positive and negative values. We only want the ones > 0. In a regular loop this would look like this:

```Python
old_list = [-1, 34, 2, -122, 324];
new_list = [];
for old in old_list:
    if old > 0:
        new_list.append( old );
```

Having this structure in mind, it's more intuitive where to put the `if` in the list comprehension. Just like with multiple `for`-statements, we follow the order of the regular loop: first the `for`, then the `if`:

```Python
old_list = [-1, 34, 2, -122, 324];
new_list = [old for old in old_list if old > 0];
```
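Both kinds of `if` can appear in the same comprehension. The sketch below reuses the numbers from above under the made-up name `values` and is only meant to show where each part goes.

```Python
values = [-1, 34, 2, -122, 324]

# The trailing `if` drops the negative values; the ternary in front
# labels each remaining value.
labels = ['small' if v < 50 else 'big' for v in values if v > 0]

print(labels)   # ['small', 'small', 'big']
```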

**Exercise**

Use this approach to select only the even numbers from the following list:

In [97]:
odd_and_even = [4,23,443,226,23,36,658,2];

In [98]:
#your code here


List comprehensions are optimized very well and are thus often faster than an equivalent `for`-loop that appends to a list. When you can solve a problem with a list comprehension or with a more elaborate approach, the list comprehension is usually a good choice.
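If you want to check the speed claim on your own machine, a rough sketch with the standard-library module `timeit` could look like this. The function names `with_loop` and `with_comprehension` are made up, and the measured times will vary between machines and Python versions, so no specific numbers are shown.

```Python
import timeit

def with_loop(n):
    """Build the list with an explicit loop and append()."""
    result = []
    for x in range(n):
        result.append(x * 2)
    return result

def with_comprehension(n):
    """Build the same list with a list comprehension."""
    return [x * 2 for x in range(n)]

# Both variants produce the same list.
print(with_loop(1000) == with_comprehension(1000))   # True

# Time 200 calls of each variant.
t_loop = timeit.timeit(lambda: with_loop(1000), number=200)
t_comp = timeit.timeit(lambda: with_comprehension(1000), number=200)
print(f'loop: {t_loop:.4f} s, comprehension: {t_comp:.4f} s')
```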

### 8.5.2 Dict comprehension

Just for the sake of completeness: The same principle can be used to create dictionaries. In many cases you could just use the `dict( zip(a,b) )`-syntax here. But you can also do this:
```Python
new_dict = { a: b for a,b in zip(['a','b','c'],[1,2,3]) };
```
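A dict comprehension becomes more useful than `dict( zip(a,b) )` once the keys or values are computed or filtered. Here is a small sketch with made-up data:

```Python
words = ['list', 'tuple', 'dict']

# Map each word to its length.
lengths = {word: len(word) for word in words}
print(lengths)      # {'list': 4, 'tuple': 5, 'dict': 4}

# A trailing `if` filters entries, just like in a list comprehension.
long_words = {word: n for word, n in lengths.items() if n > 4}
print(long_words)   # {'tuple': 5}
```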



# Conclusion

This is the bare minimum you need to know about data types in Python. Next, let's have a look at how modules and packages are organized.