# Week 5e: Data structures

This is the second notebook for this week. It is an introduction to data structures in Python.

Before you get started, let's make sure that this notebook is set up to run using the `nlp` conda environment that you created last week.

To set this notebook to the right environment, click the **Select kernel** button in the top right corner of this notebook, then select **Python Environments...** and then select the environment `nlp`.

To double check you have done this correctly, hit the run cell button (▶) on the cell below:

In [None]:
import os
print(os.environ['CONDA_DEFAULT_ENV'])

The output of this cell should say `nlp`.

## Tuple

A tuple is one of the simplest data structures in Python. It is an ordered collection of values that is **unchangeable** (immutable): once you have created it and given it values, those values cannot be changed. 

A tuple is created with round brackets `()` containing values separated by commas `,`:

In [None]:
drink_selection_tuple = ('cappuccino', 'large', True, 'oat')
print(drink_selection_tuple)
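One detail worth knowing when creating tuples: a tuple with a single item needs a trailing comma, otherwise the brackets are treated as ordinary grouping brackets. This is a small sketch; the variable names are just for illustration.

```python
# A tuple with only one item needs a trailing comma
single_item_tuple = ('espresso',)
# Without the comma, the brackets are just ordinary grouping brackets
not_a_tuple = ('espresso')

print(type(single_item_tuple))  # <class 'tuple'>
print(type(not_a_tuple))        # <class 'str'>
```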

To see how many items are in a tuple, you can use the `len()` function:

In [None]:
print(len(drink_selection_tuple))

Individual items in the tuple can be accessed with square brackets `[]` containing an integer that represents the index of the item you want to retrieve:

> **Note:** Python, like most programming languages, counts indexes from 0, so to access the 4th item in your tuple you need to use the index 3. 

In [None]:
print(drink_selection_tuple[0])
print(drink_selection_tuple[1])
print(drink_selection_tuple[2])
print(drink_selection_tuple[3])
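Square brackets take you from an index to an item. Tuples (and lists) also have an `.index()` function that goes the other way, from an item to the index where it first appears. A small sketch, reusing the same example tuple:

```python
drink_selection_tuple = ('cappuccino', 'large', True, 'oat')

# Square brackets go from an index to an item...
print(drink_selection_tuple[3])  # oat
# ...and .index() goes the other way, from an item to its index
print(drink_selection_tuple.index('oat'))  # 3
```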

If you try to access an index outside the tuple, you will get an `IndexError`:

In [None]:
print(drink_selection_tuple[4])

You will also get an error (a `TypeError`) if you try to change the value of an item in a tuple:

In [None]:
drink_selection_tuple[0] = 'americano'
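Because tuples can't be changed, the usual workaround is to build a new tuple instead. One way to sketch this, assuming you are happy to make a copy, is to convert the tuple to a list (covered in the next section), change the list, and convert it back. The names `drink_list` and `new_drink_selection` are just illustrative:

```python
drink_selection_tuple = ('cappuccino', 'large', True, 'oat')

# Copy the tuple into a list, which can be changed...
drink_list = list(drink_selection_tuple)
drink_list[0] = 'americano'
# ...then turn it back into a brand new tuple
new_drink_selection = tuple(drink_list)

print(new_drink_selection)    # ('americano', 'large', True, 'oat')
print(drink_selection_tuple)  # ('cappuccino', 'large', True, 'oat')
```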

A nice feature of Python is how easily a tuple can be unpacked into individual named variables, like so:

In [None]:
drink, size, has_milk, milk_type = drink_selection_tuple
print(drink)
print(size)
print(has_milk)
print(milk_type)

When a function returns multiple values separated by commas, Python automatically packs them into a tuple:

In [None]:
def abc_out():
    return 'a', 'b', 'c'

In Python it is normal practice to unpack these into their own variables when you call the function:

In [None]:
a, b, c = abc_out()
print(f'{a} {b} {c}')

But let's assign the result to a single variable and check its type:

In [None]:
my_var = abc_out()
print(my_var)
print(type(my_var))

Was the output a tuple?

## Lists

Lists are similar to tuples: they also contain values that are **ordered**, but lists are **changeable** (mutable).

This makes them useful in many more situations than tuples.

> **Note:** Python lists are analogous to, and look very similar to, arrays in other programming languages. However, Python lists are much more flexible and easier to use than arrays in most other languages (for example, they can grow and shrink, and can hold values of different types). This makes them very user friendly for us, but it does mean they are not as efficient. To process large, uniform arrays in Python efficiently, we use libraries like [numpy](https://numpy.org/).

Lists are created using square brackets `[]` with values separated by commas `,`:

In [None]:
greetings = ['Hi', 'Hello', 'Hey', 'Howdy', 'Yo', 'Whats up']

You can use the `len()` function again to see the size of your list:

In [None]:
print(len(greetings))

And you can access individual elements using an integer that represents the index of the element you want:

In [None]:
print(greetings[0])

A handy shortcut in Python for accessing the last element in a list is to use `-1` as the index:

In [None]:
print(greetings[-1])
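Negative indexes work for any position, not just the last one: they count backwards from the end of the list. A small sketch using the same greetings:

```python
greetings = ['Hi', 'Hello', 'Hey', 'Howdy', 'Yo', 'Whats up']

# Negative indexes count backwards from the end of the list:
# -1 is the last item, -2 is the second to last, and so on
print(greetings[-2])  # Yo

# greetings[-1] is a shortcut for greetings[len(greetings) - 1]
print(greetings[-1] == greetings[len(greetings) - 1])  # True
```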

Iterating over a list with a for loop is extremely easy in Python too:

In [None]:
for greeting in greetings:
    print(greeting)
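If you also need to know each item's position while looping, the built-in `enumerate()` function gives you the index alongside each item. A small sketch:

```python
greetings = ['Hi', 'Hello', 'Hey', 'Howdy', 'Yo', 'Whats up']

# enumerate() hands you each item's index (starting at 0) along with the item
for index, greeting in enumerate(greetings):
    print(index, greeting)
```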

You can use the `random.choice()` function from Python's built-in `random` module to get a random selection from a list.

In the context of building chatbots, this is a very handy way to add some variety to the dialogue that your chatbot outputs:

In [None]:
import random
print(random.choice(greetings))
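To keep chatbot code tidy, you might wrap this in a small function. The sketch below is one possible approach; `random_greeting()` is a hypothetical helper name, not part of any library:

```python
import random

greetings = ['Hi', 'Hello', 'Hey', 'Howdy', 'Yo', 'Whats up']

def random_greeting(name):
    """Hypothetical helper: pick a random greeting and add the user's name."""
    return f'{random.choice(greetings)} {name}!'

# Each call may print a different greeting
print(random_greeting('Bob'))
print(random_greeting('Bob'))
```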

You can add new things to your list with the `.append()` function:

In [None]:
greetings.append('Alreet mate')

Your list should now be bigger than before:

In [None]:
print(len(greetings))

And the last element in the list is the one that has just been appended:

In [None]:
print(greetings[-1])

You can remove specific elements with `.remove()`: 

In [None]:
greetings.remove('Howdy')

Now the list should be back to its original size:

In [None]:
print(len(greetings))

And if you iterate over the list, you should see that the greeting 'Howdy' has been removed:

In [None]:
for greeting in greetings:
    print(greeting)
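Two details about `.remove()` are worth knowing: it only deletes the first element that matches, and it raises a `ValueError` if the element isn't in the list at all. A small sketch using a separate, illustrative list called `sample_greetings` so that your `greetings` list is left alone:

```python
sample_greetings = ['Hi', 'Hello', 'Hey', 'Hi']

# .remove() only deletes the FIRST element that matches
sample_greetings.remove('Hi')
print(sample_greetings)  # ['Hello', 'Hey', 'Hi']

# Removing something that isn't in the list raises a ValueError,
# so check with `in` first if you aren't sure it's there
if 'Howdy' in sample_greetings:
    sample_greetings.remove('Howdy')
print(sample_greetings)  # ['Hello', 'Hey', 'Hi']
```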

#### Strings <> Lists

**Fun fact:** strings in Python are sequences of characters, a lot like lists of characters, so many of the things you can do with a list you can also do with a string:

In [None]:
abc = 'abc'

print(len(abc))
print(abc[0])
print(abc[-1])
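The similarity has limits, though. This sketch shows a string being looped over like a list, converted into a real list with `list()`, and the key difference: a list can be changed in place but a string can't.

```python
abc = 'abc'

# Like a list, you can loop over a string one character at a time
for character in abc:
    print(character)

# list() turns a string into a real list of its characters
abc_list = list(abc)
print(abc_list)  # ['a', 'b', 'c']

# Unlike a list, a string can't be changed in place:
# abc[0] = 'z' would raise a TypeError, but changing the list is fine
abc_list[0] = 'z'
print(abc_list)  # ['z', 'b', 'c']
```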

> If you want to know more about the similarities (and differences) between strings and lists in Python, you can check out this thread: https://stackoverflow.com/a/18264371

## Set

A set is an **unordered** collection of **unique** values, meaning that a set cannot contain duplicates.

> **Note**: The concept of a set comes from [set theory in Mathematics](https://en.wikipedia.org/wiki/Set_(mathematics)). Python sets support many operations from set theory, like [union](https://www.w3schools.com/python/ref_set_union.asp), [intersection](https://www.w3schools.com/python/ref_set_intersection.asp) and [difference](https://www.w3schools.com/python/ref_set_difference.asp). This class won't cover these, but they can be very useful for some of the things we might want to do when analysing data.

To create a set you can use curly brackets `{}`, with values separated by commas `,`:

In [None]:
drink_options = {'cappuccino', 'americano', 'flat white', 'espresso'}

When printed you may (or may not) see that the order of the variables in the set is different from when it was created:

In [None]:
print(drink_options)
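Because sets can't contain duplicates, converting a list to a set is a quick way to get its unique values. This sketch reuses the customer names from later in this notebook (where 'Bob' appears twice) and also shows a common trap: an empty pair of curly brackets creates an empty dictionary, so an empty set has to be made with `set()`.

```python
customer_names = ['Bob', 'Bill', 'Tracy', 'Linda', 'Bob', 'Callum']

# Converting the list to a set drops the duplicate 'Bob'
unique_names = set(customer_names)
print(len(customer_names))  # 6
print(len(unique_names))    # 5

# Careful: {} on its own creates an empty dictionary, not an empty set
print(type({}))     # <class 'dict'>
print(type(set()))  # <class 'set'>
```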

To check whether something is already in a set, you can use the `in` operator:

In [None]:
print('cappuccino' in drink_options)

In [None]:
print('latte' in drink_options)
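In a chatbot, the `in` check is a natural way to validate what the user typed. A minimal sketch, assuming the user's input arrives as a string; `is_valid_drink()` is a hypothetical helper, and normalising with `.strip()` and `.lower()` is just one reasonable choice:

```python
drink_options = {'cappuccino', 'americano', 'flat white', 'espresso'}

def is_valid_drink(choice):
    """Hypothetical helper: check a user's typed choice against the menu."""
    # .strip() removes surrounding spaces and .lower() ignores capital letters
    return choice.strip().lower() in drink_options

print(is_valid_drink('Cappuccino '))  # True
print(is_valid_drink('latte'))        # False
```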

To add a new item to a set use the function `.add()`:

In [None]:
drink_options.add('latte')

In [None]:
print(drink_options)

If you try to add an item that already exists in the set, you won't get an error:

In [None]:
drink_options.add('cappuccino')

But you also won't get a duplicate:

In [None]:
print(drink_options)

To remove an item use the `.remove()` function:

In [None]:
drink_options.remove('flat white')

In [None]:
print(drink_options)
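If you aren't sure whether an item is in the set, note that `.remove()` raises a `KeyError` when it's missing. Sets also have a `.discard()` function that removes the item if it is there and does nothing otherwise. A small sketch with an illustrative set called `sample_options`:

```python
sample_options = {'cappuccino', 'americano', 'latte'}

# .remove() raises a KeyError if the item isn't in the set,
# whereas .discard() quietly does nothing in that case
sample_options.discard('mocha')  # not in the set: no error
sample_options.discard('latte')  # in the set: removed

print(sample_options)  # the order of the two items may vary
```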

And remember, sets are **unordered**, so you will get an error (a `TypeError`) if you try to get an item using an integer index: 

In [None]:
print(drink_options[0])
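So how do you get items out of a set? You can loop over it (in no guaranteed order), or use the built-in `sorted()` function to get a new list in a predictable order, which can then be indexed. A small sketch:

```python
drink_options = {'cappuccino', 'americano', 'espresso', 'latte'}

# Sets have no positions, but you can still loop over them
# (the order you get the items back in is not guaranteed)
for drink in drink_options:
    print(drink)

# If you need a predictable order, sorted() returns a new, sorted list
sorted_drinks = sorted(drink_options)
print(sorted_drinks[0])  # americano
```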

## Dictionary

Dictionaries are the final data structure in Python that you need to know about. 

Dictionaries store data in **Key-Value** pairs. 

Like when you create a set, you use curly brackets `{}` with items separated by commas `,`. The difference is that each item needs a colon `:` to separate the key (left side of the colon) from the value (right side of the colon). 

By convention, the keys in dictionaries are usually strings. The values can be of whatever type you want.

Below is a dictionary where the values are a mix of Strings, Booleans and Ints:

In [None]:
drink_selection = {
    'drink': 'cappuccino',
    'size': 'large',
    'with_milk': True,
    'milk_type': 'oat',
    'customer_name': 'Bob',
    'price': 460
}

Print to see the dict:

In [None]:
print(drink_selection)

If you only want to see the keys in your dict you can use the `.keys()` function:

In [None]:
print(drink_selection.keys())

If you only want to see the values in your dict you can use the `.values()` function:

In [None]:
print(drink_selection.values())
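If you want the keys and values together, the `.items()` function gives you both as pairs, which makes it easy to loop over a whole dict. A small sketch using the same `drink_selection` dict:

```python
drink_selection = {
    'drink': 'cappuccino',
    'size': 'large',
    'with_milk': True,
    'milk_type': 'oat',
    'customer_name': 'Bob',
    'price': 460
}

# .items() gives you each key together with its value
for key, value in drink_selection.items():
    print(f'{key}: {value}')
```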

To access an element in a dict you use the square brackets `[]` using the key as the index inside the brackets:

In [None]:
print(drink_selection['drink'])

Use this method to print out the type of the value that is stored using the key `with_milk`:

In [None]:
# code goes here

To change the value of something in your dict, you just need to index it with the right key and then assign a new value:

In [None]:
drink_selection['drink'] = 'americano'

Now check to see the change has been made:

In [None]:
print(drink_selection['drink'])
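The same square-bracket assignment is also how you add a brand new key-value pair: if the key doesn't exist yet, it is created. A small sketch using an illustrative dict called `sample_order`, so your `drink_selection` is left unchanged:

```python
sample_order = {'drink': 'americano', 'size': 'large'}

# Assigning to a key that already exists replaces its value...
sample_order['size'] = 'small'
# ...and assigning to a key that doesn't exist yet adds a new key-value pair
sample_order['extra_shot'] = True

print(sample_order)  # {'drink': 'americano', 'size': 'small', 'extra_shot': True}
```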

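Unlike lists, dictionaries are not indexed by position, so you can't simply use an int to get an item out of your dict. (Since Python 3.7, dictionaries do keep items in the order they were added, but you still look them up by key.) Python will treat the int as a key, and as there is no key `0`, you will get a `KeyError`: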

> **Note:** In theory you can use Ints as keys in your dict, but don't do this. You will just confuse the heck out of yourself and others later on. 

In [None]:
print(drink_selection[0])
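If you aren't sure a key exists, the `.get()` function is a safer alternative to square brackets: it returns `None` (or a default you choose) instead of raising a `KeyError`. A small sketch with an illustrative dict called `sample_order`:

```python
sample_order = {'drink': 'americano', 'size': 'large'}

# .get() returns None instead of raising a KeyError when a key is missing...
print(sample_order.get('syrup'))  # None
# ...or a default value of your choice, passed as the second argument
print(sample_order.get('syrup', 'no syrup'))  # no syrup
# Keys that do exist work just like square brackets
print(sample_order.get('drink'))  # americano
```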

### Storing other data structures in a dictionary 

Dictionaries are super flexible. As well as storing values of many different types (which you can also do in lists and sets, although it is not recommended), you can also store other data structures (including another dictionary) inside your dictionary!

See the example below:

In [None]:
cafe_dict = {
    'drink_options': {'cappuccino', 'americano'},
    'drink_prices': { 
        'cappuccino': 350,
        'americano': 220
    },
    'customer_names': ['Bob','Bill','Tracy','Linda','Bob','Callum']
}

Can you name what each data structure is for `drink_options`, `drink_prices` and `customer_names`?

Just like before you use the `key` to get the value out, even if the value is itself a data structure containing other variables:

In [None]:
print(cafe_dict['drink_options'])

To access a dict within a dict, you just need to use two sets of square brackets, one after the other, with the correct keys:

In [None]:
print(cafe_dict['drink_prices']['cappuccino'])

To access a list within a dict, you also use two sets of square brackets, but this time it is a key (string) followed by an index (int):

In [None]:
print(cafe_dict['customer_names'][0])
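Combining these access patterns lets you work across the nested structures. As a sketch, assuming a hypothetical order stored as a list of drink names, you could add up its price by looking each drink up in `drink_prices`:

```python
cafe_dict = {
    'drink_options': {'cappuccino', 'americano'},
    'drink_prices': {
        'cappuccino': 350,
        'americano': 220
    },
    'customer_names': ['Bob', 'Bill', 'Tracy', 'Linda', 'Bob', 'Callum']
}

# A hypothetical order: a list of drink names
order = ['cappuccino', 'americano', 'cappuccino']

total = 0
for drink in order:
    # The first key picks the prices dict, the second picks this drink's price
    total += cafe_dict['drink_prices'][drink]

print(total)  # 920
```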

# Tasks

If you have not completed **tasks 1, 2 & 3** from [Week-5d-Functions.ipynb](Week-5d-Functions.ipynb), then finish those first before starting these tasks.

Your task now is to use data structures to simplify and improve your chatbot code. Continue working in the file `week-5d-coffee-shop-bot.py`:
- **Task 1:** Create three Sets called `drink_options`, `milk_options` and `size_options` that are used to store the options in your coffee shop. Use `size_options` to check if the size option chosen by the user is valid.
- **Task 2:** Create three Dictionaries called `drink_prices`, `milk_prices` and `size_prices` and use these to store the price related to each one (as given in [Week-5d-Functions.ipynb](Week-5d-Functions.ipynb))
- **Task 3:** Create a data structure to store an individual drink order, this could be a **Tuple** or a **Dictionary**

##### Bonus task

1. Can you adapt your code so that multiple drink orders can be placed (one at a time)?
   -  You will need to create a function that performs all of the steps of taking a drink order. 
   -  Use a list to put the drink orders into; this way you can add as many orders as you want to this list.
2. Once the user has ordered their drinks, use a loop to print out all of the drink orders to the user, listing each drink and its options one at a time.
3. Can you then output a total price at the end? 
   - To do this you will need to go through the list of drink orders, get the price for each drink, and sum them together before giving the total price.
