# Module 4: Data Structures

### CDH course "Programming in Python"

[index](https://colab.research.google.com/drive/1kFvnhumJ0tOTzDVJnIvvMDRRJ19yk9ZS)

Previous module: [3. Conditionals](https://colab.research.google.com/drive/1Lpr5qBYk9bqtAbY6bzfYcbGzCJpWM-ox)

### This module

- Working with collections of many values

## Data structures

- A way of organizing data so that it can be accessed efficiently
- Python offers several types of data structures
- For now, we will work with `list` and `tuple`

In [None]:
student1 = 'jasmin'
student2 = 'ravi'
student3 = 'john'
# not very efficient: what if we want to add another student, or remove one?

## Lists

- `list`: an ordered collection of values
- One type of *iterable*, a collection that allows iteration over its elements
- Syntax: `[element1, element2, ...]`
- Empty list also exists: `[]`

In [None]:
students = ['jasmin', 'ravi', 'john']

print(students)

Lists can contain values of mixed types:

In [None]:
['hello', 1, False]

Lists can also be created from variables:

In [None]:
usa = 'United States of America'
nl = 'The Netherlands'
countries = [usa, nl]

### Accessing elements
- Every element has an *index*: its position in the list
- Indices go from 0 to the length of the list minus 1
- A negative index counts backwards from the end: `-1` is the last element, `-2` the second to last, and so on

In [None]:
students = ['jasmin', 'ravi', 'john']
print(students[0])
print(students[1])
print(students[2])
print(students[-1])
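As a small sketch of the index rules above: a negative index `-k` refers to the same element as `len(students) - k`, and an index outside the valid range raises an `IndexError`. The `try`/`except` lines are only there so that the error is printed instead of stopping the cell.

```python
students = ['jasmin', 'ravi', 'john']

# -1 is the last element, so it is the same as index len(students) - 1
last = students[-1]
also_last = students[len(students) - 1]
print(last, also_last)  # john john

# valid indices for a list of 3 elements are 0, 1, 2 (or -1, -2, -3)
try:
    students[3]
except IndexError as error:
    print('IndexError:', error)  # IndexError: list index out of range
```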

- Lists can be *unpacked* into variables

In [None]:
numbers = [1, 2, 3]
one, two, three = numbers
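Unpacking only works when the number of variables matches the number of elements; otherwise Python raises a `ValueError`. A minimal sketch (again, `try`/`except` just keeps the cell running so the error can be printed):

```python
numbers = [1, 2, 3]
one, two, three = numbers
print(one, two, three)  # 1 2 3

# two variables for three elements: this fails, and one/two keep their old values
try:
    one, two = numbers
except ValueError as error:
    print('ValueError:', error)
```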

### Changing elements
- Assign a new value to an index just like you would assign a value to a variable

In [None]:
students = ['jasmin', 'ravi', 'john']
students[0] = 'johanna'

new_student = 'mark'
students[1] = new_student

students

### Adding and removing elements
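- The `+` operator joins two lists into a new list
- The `.append(value)` method adds an element to the end of a list; `.remove(value)` deletes the first element equal to `value`
- `del list[index]` deletes the element at a given index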

In [None]:
hello_world = ['hello', ',', 'world']
exclamation = ['!']

full_sentence = hello_world + exclamation
print(full_sentence)

In [None]:
# note: .append() works in-place, you don't need to reassign the variable
students = ['jasmin', 'ravi', 'john']
students.append('mark')
print(students)

# remove by value:
students.remove('john')
print(students)

# remove by index (index 2 is now 'mark'):
del students[2]
print(students)
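Two details worth knowing, shown as a short sketch with illustrative names: `+` builds a *new* list and leaves both original lists unchanged, and `.remove(value)` deletes only the *first* element that matches.

```python
first_half = ['jasmin', 'ravi']
second_half = ['john']

# + creates a new list; first_half and second_half are not modified
everyone = first_half + second_half
print(everyone)     # ['jasmin', 'ravi', 'john']
print(first_half)   # ['jasmin', 'ravi']

# .remove() only deletes the first occurrence of the value
names = ['ravi', 'john', 'ravi']
names.remove('ravi')
print(names)        # ['john', 'ravi']
```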

### Nested lists
- Anything goes in a list, including *another* list

In [None]:
small_list = [4, 5, 6]
big_list = [1, 2, 3, small_list]

print(big_list)
print(big_list[-1])
print(type(big_list[-1]))

# Access the last element of the small_list inside big_list:
print(big_list[-1][-1])
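Nested lists are a common way to store a small table or grid, with each inner list as one row. In this sketch with made-up numbers, the first index picks the row and the second index picks the element within that row.

```python
# a small 3x3 grid stored as a list of rows
grid = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

middle_row = grid[1]          # [4, 5, 6]
centre = grid[1][1]           # 5
bottom_right = grid[-1][-1]   # 9
print(middle_row, centre, bottom_right)

# an inner list can be changed through the outer list
grid[0][0] = 100
print(grid[0])  # [100, 2, 3]
```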

### Accessing multiple elements
- Select multiple values at once: *slicing*
- Syntax: `list[start_index:end_index]`
- `end_index` is *exclusive*: the slice goes up to, but does not include, the element at `end_index`

In [None]:
students = ['jasmin', 'ravi', 'john']
#           0         1       2
print(students[0:1])
print(students[0:2])

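`start_index` and `end_index` are optional: without `start_index` the slice starts at the beginning of the list, without `end_index` it runs to the end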

In [None]:
students = ['jasmin', 'ravi', 'john']
print(students[1:])
print(students[:-1])
print(students[:])
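Two properties of slices, shown as a small sketch: a slice is a *new* list, so changing it does not change the original, and unlike a single index, a slice that reaches past the end of the list does not raise an error.

```python
students = ['jasmin', 'ravi', 'john']

# a slice is a new list; modifying it leaves students unchanged
copy = students[:]
copy[0] = 'johanna'
print(students)  # ['jasmin', 'ravi', 'john']
print(copy)      # ['johanna', 'ravi', 'john']

# slicing beyond the end is allowed: it simply stops at the last element
print(students[1:10])  # ['ravi', 'john']
print(students[5:])    # []
```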

- Slices can be used to reassign several list elements at once

In [None]:
students = ['jasmin', 'ravi', 'john']

students[0:2] = ['johanna', 'mark']

print(students)

- In this way, you can also add or remove elements in the middle of a list

In [None]:
students = ['jasmin', 'ravi', 'john']

students[1:2] = []
print(students)

students[1:1] = ['ravi']
print(students)
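The replacement list does not need to be the same length as the slice it replaces, so a single slice assignment can make the list longer or shorter. A minimal sketch:

```python
students = ['jasmin', 'ravi', 'john']

# replace one element (index 1) with two elements
students[1:2] = ['ravi', 'mark']
print(students)  # ['jasmin', 'ravi', 'mark', 'john']

# replace two elements (indices 0 and 1) with a single one
students[0:2] = ['johanna']
print(students)  # ['johanna', 'mark', 'john']
```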

### Checking if an element is in a list
- Use the syntax `<element> in <list>`

In [None]:
students = ['jasmin', 'ravi', 'john']

print('ravi' in students)
print('Ravi' in students)
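Because `in` produces a boolean, it combines naturally with the conditionals from module 3, and `not in` checks the opposite. A sketch, where the new student's name is just an example:

```python
students = ['jasmin', 'ravi', 'john']
new_student = 'mark'

# membership compares whole elements, so part of a name is not enough
print('ravi' in students)      # True
print('rav' in students)       # False
print('mark' not in students)  # True

# only add the student if they are not in the list yet
if new_student not in students:
    students.append(new_student)
print(students)  # ['jasmin', 'ravi', 'john', 'mark']
```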

### Useful tricks
- The `len` *function* (we will learn about functions later) gives the length of a list, i.e. its number of elements

In [None]:
students = ['jasmin', 'ravi', 'john']
len(students)
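`len` only counts the elements at the top level: a nested list counts as a single element, and an empty list has length 0. A small sketch:

```python
students = ['jasmin', 'ravi', 'john']
big_list = [1, 2, 3, [4, 5, 6]]

print(len(students))      # 3
print(len(big_list))      # 4: the inner list counts as one element
print(len(big_list[-1]))  # 3: the inner list itself has three elements
print(len([]))            # 0
```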

- `list.index(<value>)` searches the list for a value and returns the index of its first occurrence

In [None]:
students = ['jasmin', 'ravi', 'john']
students.index('ravi')
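If the value occurs more than once, `.index()` returns the position of the *first* match; if it does not occur at all, `.index()` raises a `ValueError`. One way to avoid that error, sketched below with an example name, is to check with `in` first.

```python
students = ['jasmin', 'ravi', 'john', 'ravi']
print(students.index('ravi'))  # 1 (the first occurrence)

name = 'mark'
if name in students:
    print(students.index(name))
else:
    print(name, 'is not in the list')  # mark is not in the list
```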

## Tuples
- Another type of *iterable*
- Syntax: `(element1, element2, ...)`
- Important difference from lists: tuples are *immutable* (their elements cannot be changed)
- Often used for unpacking; we will work with tuples again in data analysis

In [None]:
students = ('jasmin', 'ravi', 'john')
students[0]
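A sketch of how tuples behave differently in practice, with illustrative values: assigning to an element raises a `TypeError`, a one-element tuple needs a trailing comma, tuples unpack just like lists, and `list()` gives a changeable copy when you need one.

```python
students = ('jasmin', 'ravi', 'john')

# tuples are immutable: changing an element is an error
try:
    students[0] = 'johanna'
except TypeError as error:
    print('TypeError:', error)

# a one-element tuple needs a trailing comma; without it, the parentheses do nothing
one_student = ('jasmin',)
not_a_tuple = ('jasmin')
print(type(one_student), type(not_a_tuple))  # <class 'tuple'> <class 'str'>

# unpacking works the same as for lists
first, second, third = students
print(first, third)  # jasmin john

# convert to a list when you need to change elements
student_list = list(students)
student_list[0] = 'johanna'
print(student_list)  # ['johanna', 'ravi', 'john']
```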

## Exercise 4.1: Lists

1. For each of the `print` statements below, what do you expect to be printed? Run the lines to check your predictions.

In [None]:
countries = ['japan', 'hungary', 'maldives', 'gabon', 'bhutan']

print(countries[0])
print(countries[-3])
print(countries[0:1] + countries[2:4])

more_countries = countries + ['mexico', 'haiti']
print(more_countries)

countries.append(['mexico', 'haiti'])
print(countries)

2. Transform the list below into `['jasmin', 'john', 'ravi']` in one line of code.

In [None]:
students = ['jasmin', 'ravi', 'john']

3. For each of the `print` statements below, what do you expect to be printed? Run the lines to check your predictions.

In [None]:
random_fruit = 'pineapple'
fruits = ['apple', 'pear', random_fruit]
print(fruits)

random_fruit = 'blueberry'
print(fruits)

random_veggie = ['brussels sprouts']
veggies = ['broccoli', 'green beans', random_veggie]
print(veggies)

random_veggie.append('kale')
print(veggies)

## Exercise 4.2: Bonus

Below we introduce another parameter in the list slice. Try to explain what it does.

In [None]:
countries = ['japan', 'hungary', 'maldives', 'gabon', 'bhutan']

print(countries[0:5:1])
print(countries[0:5:2])
print(countries[-1::-1])
print(countries[-1::-2])

The piece of code below is supposed to recognize "fancy" words: words that are longer than 5 characters, contain at least one copy of the letter 'a' and start with an uppercase letter. However, the code is broken. It does not recognize any of our fancy example words.

1. Change the value of `word` to each of the examples in the comments on the first two lines and run the code. See for yourself that the code considers none of the example words fancy. Try some other words as well.
2. Try to understand why the code gives the wrong result. Can you come up with a word that the code does consider fancy?
3. Repair the code so that it gives the right result for all examples, and for any other words that you come up with.

In [None]:
# fancy: Alhambra, Arthur, Jasmine, Turandot
# not so fancy: Jeep, paper, Python, Ada
word = 'Alhambra'

lengthy = len(word) > 5
has_a = 'a' in word
first_uppercase = 'A' <= word[1] <= 'Z'

if lengthy and has_a and first_uppercase:
    print('The word is fancy')
else:
    print('The word is not so fancy')

## Next module

[5. Assertions](https://colab.research.google.com/drive/1OBdYVZCMXGzb3fCM_FPAqY_IfeDR1kub)