# Data Structures

In this notebook, we'll explore different types of data structures that Python can use to store information, namely **lists, tuples, and dictionaries.**


## At the end of this notebook, you'll be able to:
* Compare & contrast the types of structures that Python uses to store data points
* Recognize & create lists, tuples, and dictionaries in Python
* Index, slice, cast, and mutate lists
* Understand the implications of mutability and object-oriented programming

<hr>

## Lists
A _list_ is a mutable, ordered collection of items that can be of mixed types.

**Mutable** means that individual items in the object can be changed. Lists are mutable. Tuples and strings are not -- they're **immutable**.

Lists are created using square brackets `[ ]`, and individual elements are separated by commas.

In [2]:
# Create a list of fruits
my_list = ['Avocado','Tomato','Orange','Fig']
print(my_list)

['Avocado', 'Tomato', 'Orange', 'Fig']


### Useful list operations
- Check the length of your list by using `len(my_list)`
- Use `my_list.append(item)` to add a single element to the end of a list
- Remove elements by index using `del my_list[index]`
- Remove the first element that matches a value by using `my_list.remove('value')`
- Sort the list in place by using `my_list.sort()`
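
As a small sketch of how sorting behaves (the list `fruits` below is a made-up example, separate from `my_list`): `sort()` reorders the list in place and returns `None`, while the built-in `sorted()` leaves the original alone and returns a new list.

```python
# Hypothetical example list, not used elsewhere in this notebook
fruits = ['Orange', 'Fig', 'Avocado']

# append adds its argument as one new element at the end
fruits.append('Lemon')

# sorted() builds a new, sorted list and leaves fruits unchanged
alphabetical = sorted(fruits)

# sort() reorders fruits itself and returns None
result = fruits.sort()

print(alphabetical)
print(fruits)
print(result)
```

Because `sort()` returns `None`, writing `fruits = fruits.sort()` would replace the list with `None`, which is a common slip.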

In [3]:
# Try different list methods here
print(len(my_list))
my_list.append('Lemon')
print(my_list)
del my_list[4:]
print(my_list)

my_list.append('Lemon')
my_list.append('Lemon')
my_list.append('Lemon')
my_list.append('Lemon')
print(my_list)
my_list.remove('Lemon')
my_list.remove('Lemon')
my_list.remove('Lemon')
my_list.remove('Lemon')
print(my_list)
my_list.count('Lemon')

my_list.index('Tomato')

my_list.extend('Grapefruit')
print(my_list)
del my_list[4:]
print(my_list)
my_list.extend(my_list[2])
print(my_list)


my_list.insert(2,'Dragon fruit')
my_list.insert(1424,'Dragon fruit')
print(my_list)

my_list = ['Avocado','Tomato','Orange','Fig']
print(my_list)


4
['Avocado', 'Tomato', 'Orange', 'Fig', 'Lemon']
['Avocado', 'Tomato', 'Orange', 'Fig']
['Avocado', 'Tomato', 'Orange', 'Fig', 'Lemon', 'Lemon', 'Lemon', 'Lemon']
['Avocado', 'Tomato', 'Orange', 'Fig']
['Avocado', 'Tomato', 'Orange', 'Fig', 'G', 'r', 'a', 'p', 'e', 'f', 'r', 'u', 'i', 't']
['Avocado', 'Tomato', 'Orange', 'Fig']
['Avocado', 'Tomato', 'Orange', 'Fig', 'O', 'r', 'a', 'n', 'g', 'e']
['Avocado', 'Tomato', 'Dragon fruit', 'Orange', 'Fig', 'O', 'r', 'a', 'n', 'g', 'e', 'Dragon fruit']
['Avocado', 'Tomato', 'Orange', 'Fig']


### List indexing & slicing
**Indexing** refers to selecting an item from within a collection (e.g., lists, tuples, and strings). Indexing is done by placing the **index number** in square brackets, directly after the list variable.

For example, if `my_list = [1,3,5]`, we can get the second value using `my_list[1]`. (Remember that Python starts indexing at zero!)

### Reminders
- Python is zero-based (The first index is '0')
- Negative indices index backwards through a collection
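
The two reminders above can be sketched with a short example. The list `letters` is an assumption made up for illustration.

```python
# Hypothetical example list
letters = ['a', 'b', 'c', 'd']

first = letters[0]            # zero-based: index 0 is the first item
last = letters[-1]            # -1 is the last item
second_to_last = letters[-2]  # -2 is the item before it

# For a list of length n, index -k refers to the same item as index n - k
same_item = letters[-1] == letters[len(letters) - 1]

print(first, last, second_to_last, same_item)
```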

In [4]:
# Try indexing our list of fruits here
my_list = ['Tomato','Orange','Fig']

print(my_list[my_list.index('Tomato')])
a = 2
print(my_list[a**2-3])
print(my_list[1])
# Indices must be integers, so a float index raises a TypeError
print(my_list[1.0])


Tomato
Orange
Orange


TypeError: list indices must be integers or slices, not float

### If we want multiple items, we can **slice** the list.

There are a few ways to slice:

1. We can **slice** a part of a list using the syntax `[start:stop]`, which extracts the elements from index `start` up to, but not including, index `stop`.

**Notes**
- `start` is __included__; `stop` is __excluded__, so the last element returned is the one at index `stop - 1`.
- Negative values count backwards from the end of the list.

2. If we omit either (or both) of `start` or `stop` from `[start:stop]`, the defaults are the beginning and the end of the list, respectively, e.g. `[:3]`
3. We can also define the step size (instead of default 1) using the syntax `[start:stop:step]`
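
The three ways of slicing listed above might look like this in practice. The list `numbers` is a made-up example, chosen so it does not give away the task below.

```python
# Hypothetical example list
numbers = [10, 20, 30, 40, 50, 60]

middle = numbers[1:4]          # indices 1, 2, 3 (index 4 is excluded)
first_three = numbers[:3]      # start omitted: begins at index 0
from_third = numbers[2:]       # stop omitted: runs to the end
last_two = numbers[-2:]        # negative start counts from the end
every_other = numbers[::2]     # step of 2
reversed_copy = numbers[::-1]  # negative step walks backwards

print(middle, first_three, from_third)
print(last_two, every_other, reversed_copy)
```

A slice always returns a new list, even when it contains only one element.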

<div class="alert alert-success"><b>Task:</b> For our list of fruits, create three different slices, and save them as different variables:
    
1. A slice of the first two fruits.
2. A slice of the middle three fruits.
3. A slice of the last fruit.
    
</div>

In [5]:
# Your code here!
my_list = ['Avocado', 'Tomato', 'Dragon fruit', 'Orange', 'Orange','Orange']
my_list_start  = my_list[0:2]
my_list_middle  = my_list[2:5]
# [-1:] is a slice (a one-item list); [-1] alone would return the item itself
my_list_end  = my_list[-1:]

print(my_list_start)
print(my_list_middle)
print(my_list_end)



['Avocado', 'Tomato']
['Dragon fruit', 'Orange', 'Orange']
['Orange']


### Checking length
We can use the function `len( )` to check the length of lists.

**Note**: We can also use this to get the number of characters in a string!

In [6]:
my_list_end  = my_list[-1]
my_list_end_alt = my_list[len(my_list)-1]
print(my_list_end)
print(my_list_end_alt)

Orange
Orange


### Checking membership
We can use `in` to see if an item exists in a list. The `in` operator checks whether an element is present in a collection, and can be negated with `not`. _(More on operators in the next lecture)_
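
A minimal sketch of `in` and its negated form `not in`, using a made-up list called `colors`:

```python
# Hypothetical example list
colors = ['red', 'green', 'blue']

has_green = 'green' in colors
missing_pink = 'pink' not in colors

# Items are compared with ==, so capitalization matters
has_capital_red = 'Red' in colors

# On a string, `in` checks for a substring instead of a whole item
has_substring = 'ee' in 'green'

print(has_green, missing_pink, has_capital_red, has_substring)
```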

In [7]:
print('Orange' in my_list)
print('orange' in my_list)
print('Oranges' in my_list)


True
False
False


### Mutating lists
After definition, we can update members of our list _because lists are mutable!_ This also impacts aliases of our lists.

In [8]:
my_list = list('abcd')
print(my_list)
my_list_alias = my_list
my_list[-1] = 'z'
print(my_list)
print(my_list_alias)

my_list_alias[-2] = 'y'
print(my_list)
print(my_list_alias)

my_list[-2:] = ['cdef','ghijk','lmnop']
print(my_list)
print(my_list_alias)

my_list[1] = ['xyz','xyz','xyz']
print(my_list)
print(my_list_alias)

['a', 'b', 'c', 'd']
['a', 'b', 'c', 'z']
['a', 'b', 'c', 'z']
['a', 'b', 'y', 'z']
['a', 'b', 'y', 'z']
['a', 'b', 'cdef', 'ghijk', 'lmnop']
['a', 'b', 'cdef', 'ghijk', 'lmnop']
['a', ['xyz', 'xyz', 'xyz'], 'cdef', 'ghijk', 'lmnop']
['a', ['xyz', 'xyz', 'xyz'], 'cdef', 'ghijk', 'lmnop']
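
If an independent copy is wanted instead of an alias, one option is to copy the list before changing it. The sketch below uses made-up names (`original`, `alias`, `shallow_copy`, `sliced_copy`) to contrast the two behaviors.

```python
# Hypothetical example list
original = ['a', 'b', 'c']

alias = original                # a second name for the same list object
shallow_copy = original.copy()  # a new list holding the same items
sliced_copy = original[:]       # a full slice also creates a new list

original[0] = 'z'

print(alias)                     # follows the change
print(shallow_copy)              # unaffected
print(sliced_copy)               # unaffected
print(alias is original)         # same object
print(shallow_copy is original)  # different object
```

Both `copy()` and a full slice are shallow: any nested lists inside are still shared between the copy and the original.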


### Creating lists of lists
Sometimes, it's useful to create lists of lists. Often, if we import big datasets as lists, this is how they will be organized.

![](https://media.giphy.com/media/z1meXneq0oUh2/giphy.gif)

In [9]:
gene_1 = ['gene1',0.48,0.55]
gene_2 = ['gene2',0.38,0.85]
gene_3 = ['gene3',0.21,0.81]
all_genes = [gene_1, gene_2, gene_3]

# Indexing the outer list returns one of the inner lists
print(all_genes[0])

['gene1', 0.48, 0.55]
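
To reach a single value inside a list of lists, we can index twice: once for the inner list, and once for the item within it. The sketch below rebuilds the same gene lists so it runs on its own; names such as `first_values` are introduced here only for illustration.

```python
gene_1 = ['gene1', 0.48, 0.55]
gene_2 = ['gene2', 0.38, 0.85]
gene_3 = ['gene3', 0.21, 0.81]
all_genes = [gene_1, gene_2, gene_3]

second_gene = all_genes[1]          # the whole inner list
second_gene_name = all_genes[1][0]  # first item of the second inner list
last_value = all_genes[-1][-1]      # last item of the last inner list

# Collect the first numeric value of every gene
first_values = []
for gene in all_genes:
    first_values.append(gene[1])

print(second_gene)
print(second_gene_name, last_value)
print(first_values)
```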


## Tuples
A _tuple_ is an **immutable**, ordered collection of items that can be of mixed types.

* Tuples are created using parentheses.
* Indexing works the same way as for lists.

In [10]:
# Define a tuple
tup_1 = tuple('abcdef')
tup_2 = ('ab','c','def')

print(tup_1)
print(tup_2)
# Tuples are immutable, so item assignment raises a TypeError
tup_1[-1] = tup_2[0]


('a', 'b', 'c', 'd', 'e', 'f')
('ab', 'c', 'def')


TypeError: 'tuple' object does not support item assignment

<div class="alert alert-success"><b>Question</b>: Before running the cell below, try to predict: What will be printed out from running this code?</div>

In [11]:
lst = ['a', 'b', 'c']
tup = ('b', 'c', 'd')

if lst[-1] == tup[-1]:
    print('EndMatch')
elif tup[1] in lst:
    print('Overlap')
elif len(lst) == len(tup):
    print('Length')
else:
    print('None')

Overlap


### Casting between variable types
We can use `list( )` or `tuple( )` to convert variables into different types. This is called **casting**.

This is particularly useful when we use a function like `range( )`, which returns a `range` object that produces its values lazily instead of storing them in a list. Casting that object with `list( )` or `tuple( )` lets us see all of the values at once.

**Note**: `range`, like slicing, is defined with `start`, `stop`, and `step`, but with commas in between each instead of colons. Remember that you can always use `?range` or `help(range)` to get details on how a function works.
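
Casting is also a common way to work around immutability: convert a tuple to a list, change the list, and convert it back. This is a minimal sketch with made-up variable names (`point`, `as_list`, `updated_point`).

```python
# Hypothetical example tuple
point = (3, 4)

as_list = list(point)           # cast to a list so an item can be changed
as_list[0] = 10
updated_point = tuple(as_list)  # cast back to a tuple

letters = list('abc')              # casting a string splits it into characters
countdown = list(range(3, 0, -1))  # casting a range shows its values

print(point, updated_point)
print(letters, countdown)
```

The original tuple `point` is untouched; `updated_point` is a new tuple.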

In [12]:
# Test range here
print(range(0,5))
print(list(range(0,5)))
print(tuple(range(0,5)))
print(list(range(0,5,2)))

print(range(5,0))
print(list(range(5,0)))
print(list(range(5,0,-1)))
print(list(range(5,0,-2)))

print(2 in range(0,5,2))

tup = (range(5,0,-1),list(range(5,0,-2)),tuple(range(5,0,-3)))

print(tup)

range(0, 5)
[0, 1, 2, 3, 4]
(0, 1, 2, 3, 4)
[0, 2, 4]
range(5, 0)
[]
[5, 4, 3, 2, 1]
[5, 3, 1]
True
(range(5, 0, -1), [5, 3, 1], (5, 2))


## Dictionaries
Dictionaries are similar to lists, except that each element is a key-value pair, and values are looked up by key instead of by position. The syntax for dictionaries is `{key1: value1, ...}`.

### When dictionaries are useful
1. Flexible & efficient way to associate labels with heterogeneous data
2. Use where data items have, or can be given, labels
3. Appropriate for collecting data of different kinds (e.g., name, addresses, ages)

> In the cell below, create a dictionary for three countries and capitals using the syntax `{country:capital,...}`. Remember that strings still need quotation marks!

**Note**: You can also create an empty dictionary using `{}` and fill it using `dictionary['key'] = 'value'`.
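
A minimal sketch of the empty-dictionary approach from the note above. The dictionary `ages` and its entries are made up for illustration.

```python
# Hypothetical example: start empty and fill key by key
ages = {}
ages['Ada'] = 36
ages['Grace'] = 85

# Assigning to a key that already exists overwrites its value
ages['Ada'] = 37

print(ages)
print(len(ages))
```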

In [13]:
capitals = {'United Kingdom':'London','France':'Paris','Spain':'Madrid','Australia':'??'}
capitals
capitals.update({'Australia':'Canberra'})
capitals

{'United Kingdom': 'London',
 'France': 'Paris',
 'Spain': 'Madrid',
 'Australia': 'Canberra'}

<div class="alert alert-success"><b>Question:</b> Before running the cell below, predict: What would the following code produce?</div>

In [14]:
capitals.update({'Australia':'Canberra'})
capitals

{'United Kingdom': 'London',
 'France': 'Paris',
 'Spain': 'Madrid',
 'Australia': 'Canberra'}

In [15]:
capitals.update({'Argentina':'Buenos Aires'})
capitals['Argentina']
del capitals['Argentina']
capitals
# capitals['Argentina']
for k in capitals:
    print('The capital of', k,'is',capitals[k])

for k,v in capitals.items():
    print(v,'is the capital of', k)

v_lst = list(capitals.values())
k_lst = list(capitals.keys())
for v in capitals.values():
    print(v,'is the capital of', k_lst[v_lst.index(v)])


k = list(capitals.keys())
v = list(capitals.values())
for i in range(len(k)):
    print(v[i],'is the capital of', k[i])
    print('The capital of', k[i],'is',v[i])


The capital of United Kingdom is London
The capital of France is Paris
The capital of Spain is Madrid
The capital of Australia is Canberra
London is the capital of United Kingdom
Paris is the capital of France
Madrid is the capital of Spain
Canberra is the capital of Australia
London is the capital of United Kingdom
Paris is the capital of France
Madrid is the capital of Spain
Canberra is the capital of Australia
London is the capital of United Kingdom
The capital of United Kingdom is London
Paris is the capital of France
The capital of France is Paris
Madrid is the capital of Spain
The capital of Spain is Madrid
Canberra is the capital of Australia
The capital of Australia is Canberra


<div class="alert alert-success"><b>Task</b>: What happens if we look for a key that doesn't exist? Try this in the cell above.</div>

### Additional dictionary functionality
- Use `capitals.update(morecapitals)` to add the entries of another dictionary (values for existing keys are overwritten)
- Use `del capitals['key']` to delete an entry
- Loop over keys, values, or both (with `capitals.items()`)
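
The operations in the list above can be sketched together as follows. The small `capitals` dictionary is rebuilt here so the example runs on its own, and `more_capitals` is a made-up second dictionary.

```python
capitals = {'France': 'Paris', 'Spain': 'Madrid'}
more_capitals = {'Japan': 'Tokyo', 'Spain': 'Madrid'}  # hypothetical extra entries

# Merge in the entries of the second dictionary
capitals.update(more_capitals)

# Check that the key is present before deleting it
if 'France' in capitals:
    del capitals['France']

keys = list(capitals.keys())
values = list(capitals.values())
pairs = list(capitals.items())

for country, capital in capitals.items():
    print(capital, 'is the capital of', country)
```

Dictionaries keep their insertion order in Python 3.7 and later, so `keys`, `values`, and `pairs` line up with one another.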

<hr>

## Additional resources

<a href="https://python101.pythonlibrary.org/chapter3_lists_dicts.html">Python 101: Lists, Tuples, and Dictionaries</a>

<a href="https://github.com/jakevdp/WhirlwindTourOfPython/blob/6f1daf714fe52a8dde6a288674ba46a7feed8816/06-Built-in-Data-Structures.ipynb">Whirlwind Tour of Python: Built-In Data Structures</a>


## About this notebook
This notebook is largely derived from UCSD COGS18 Materials, created by Tom Donoghue & Shannon Ellis, as well as the <a href="https://github.com/jrjohansson/scientific-python-lectures/blob/master/Lecture-1-Introduction-to-Python-Programming.ipynb">Scientific Python Lecture</a> by J.R. Johansson.

Want to run this notebook as a slideshow? If you have Python (or Anaconda), follow <a href="http://www.blog.pythonlibrary.org/2018/09/25/creating-presentations-with-jupyter-notebook/">these instructions</a> to set up your computer with the RISE plugin.