# Data Structures

In this notebook, we'll explore different types of data structures that Python can use to store information, namely **lists, tuples, and dictionaries.**


## At the end of this notebook, you'll be able to:
* Compare & contrast the types of structures that Python uses to store data points
* Recognize & create lists, tuples, and dictionaries in Python
* Index, slice, cast, and mutate lists
* Understand the implications of mutability and object-oriented programming

<hr>

## Lists
A _list_ is an ordered, mutable collection of items, which can be of mixed types.

**Mutable** means that the items in an object can be changed after the object is created. Lists are mutable; tuples and strings are not, so they are **immutable**.

Lists are created using square brackets `[ ]`, and individual elements are separated by commas.

In [1]:
# Create a list of fruits
fruits = ['strawberry','raspberry','apple','mango']
fruits

['strawberry', 'raspberry', 'apple', 'mango']

### Useful list operations
- Check the length of a list with the built-in function `len(my_list)`
- Add an element to the end of a list with `my_list.append(value)`
- Remove an element by index with the `del` statement: `del my_list[index]`
- Remove the first element equal to a value with `my_list.remove(value)`
- Sort a list in place with `my_list.sort()`
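Here is a short sketch of these operations on a throwaway list (the name `berries` is just for illustration). It also shows two details that often surprise beginners: `remove()` deletes only the *first* matching value, and `sort()` sorts in place and returns `None`, whereas the built-in `sorted()` returns a new sorted list.

```python
berries = ['raspberry', 'blueberry', 'raspberry']

berries.append('cranberry')          # add to the end
print(len(berries))                  # 4

berries.remove('raspberry')          # removes only the FIRST 'raspberry'
print(berries)                       # ['blueberry', 'raspberry', 'cranberry']

del berries[0]                       # remove by index
print(berries)                       # ['raspberry', 'cranberry']

result = berries.sort()              # sorts the list in place...
print(result)                        # ...and returns None
print(berries)                       # ['cranberry', 'raspberry']

new_list = sorted(['b', 'c', 'a'])   # sorted() builds and returns a new list
print(new_list)                      # ['a', 'b', 'c']
```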

In [2]:
# Try different list methods here
print(fruits)
# fruits.remove('kiwi')  # would raise a ValueError here, since 'kiwi' is not in the list
print(len(fruits))
fruits

['strawberry', 'raspberry', 'apple', 'mango']
4


['strawberry', 'raspberry', 'apple', 'mango']

In [3]:
del fruits[1]
print(fruits)

['strawberry', 'apple', 'mango']


In [4]:
fruits.sort()
fruits

['apple', 'mango', 'strawberry']

In [5]:
fruits.sort(reverse=True)
fruits

['strawberry', 'mango', 'apple']

In [6]:
fruits.reverse()
fruits

['apple', 'mango', 'strawberry']
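Note that `reverse()` does not sort anything; it only flips the current order. Above, it happened to give alphabetical order because the list had just been sorted in descending order. A quick comparison on an unsorted list (the names `nums`, `flipped`, and `descending` are illustrative; `.copy()` is used so each operation starts from the same data):

```python
nums = [3, 1, 2]

flipped = nums.copy()
flipped.reverse()               # flip the current order
print(flipped)                  # [2, 1, 3]

descending = nums.copy()
descending.sort(reverse=True)   # sort from largest to smallest
print(descending)               # [3, 2, 1]
```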

### List indexing & slicing
**Indexing** refers to selecting an item from within a collection (e.g., lists, tuples, and strings). Indexing is done by placing the **index number** in square brackets, directly after the list variable.

For example, if `my_list = [1,3,5]`, we can get the second value using `my_list[1]`. (Remember that Python starts indexing at zero!)

### Reminders
- Python is zero-based (the first index is `0`)
- Negative indices count backward from the end of a collection (`-1` is the last item)
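A minimal sketch of positive and negative indexing on a small list, including what happens when an index is out of range:

```python
fruits = ['strawberry', 'raspberry', 'apple', 'mango']

print(fruits[0])                # strawberry  (first item)
print(fruits[1])                # raspberry   (second item)
print(fruits[-1])               # mango       (last item)
print(fruits[-2])               # apple       (second-to-last item)

# The last valid positive index is len(fruits) - 1
print(fruits[len(fruits) - 1])  # mango

# Indexing past the end raises an IndexError
try:
    fruits[4]
except IndexError as err:
    print('IndexError:', err)   # IndexError: list index out of range
```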

In [7]:
# Try indexing our list of fruits here
fruits = ['strawberry','raspberry','apple','mango','kiwi']

fruits[0:1]  # slice containing only the first fruit (index 0)

['strawberry']

### If we want multiple items, we can **slice** the list.

There are a few ways to slice:

1. We can **slice** a part of a list using the syntax `[start:stop]`, which extracts the elements from index `start` up to, but not including, index `stop`.

**Notes**
- The element at `start` is __included__; the element at `stop` is __excluded__.
- Negative values count backward from the end of the list.

2. If we omit `start`, `stop`, or both, they default to the beginning and the end of the list, respectively. For example, `[:3]` gives the first three items.
3. We can also set a step size other than the default of 1 using the syntax `[start:stop:step]`. A negative step walks backward through the list.
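A few slicing patterns, shown on a separate list of letters (an illustrative choice, so the fruit task below is left for you):

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f']

print(letters[1:4])     # ['b', 'c', 'd']  (indices 1, 2, 3; index 4 is excluded)
print(letters[:3])      # ['a', 'b', 'c']  (start defaults to 0)
print(letters[4:])      # ['e', 'f']       (stop defaults to the end)
print(letters[-3:])     # ['d', 'e', 'f']  (last three items)
print(letters[::2])     # ['a', 'c', 'e']  (every second item)
print(letters[1::2])    # ['b', 'd', 'f']  (every second item, starting at index 1)
print(letters[::-1])    # ['f', 'e', 'd', 'c', 'b', 'a']  (reversed copy)

# Unlike indexing, slicing past the end does not raise an error
print(letters[4:100])   # ['e', 'f']
```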

<div class="alert alert-success"><b>Task:</b> For our list of fruits, create three different slices, and save them as different variables:
    
1. A slice of the first two fruits.
2. A slice of the middle three fruits.
3. A slice of the last fruit.
    
</div>

In [8]:
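# Your code here!
fruits = ['strawberry','raspberry','apple','mango','kiwi']

first_two = fruits[0:2]     # indices 0 and 1
middle_three = fruits[1:4]  # indices 1, 2, and 3
last_fruit = fruits[-1:]    # a one-item slice; fruits[-1] would return the string 'kiwi' instead

print(first_two)
print(middle_three)
print(last_fruit)

['strawberry', 'raspberry']
['raspberry', 'apple', 'mango']
['kiwi']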


### Checking length
We can use the function `len( )` to check the length of lists.

**Note**: We can also use this to get the number of characters in a string!

In [9]:
len(fruits[0])

10
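`len()` counts the top-level items of a collection, so a nested list counts as a single item. A small sketch (the names `words` and `nested` are illustrative):

```python
words = ['strawberry', 'kiwi']
nested = [['a', 'b'], ['c']]

print(len(words))      # 2  -> number of items in the list
print(len(words[0]))   # 10 -> number of characters in 'strawberry'
print(len(nested))     # 2  -> each inner list counts as one item
print(len([]))         # 0  -> an empty list
```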

### Checking membership
The `in` operator checks whether an item is present in a collection such as a list, and `not in` checks that it is absent. _(More on operators in the next lecture.)_

In [10]:
'blueberry' not in fruits

True
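What `in` checks depends on the collection: for a list it looks for an item equal to the value, while for a string it looks for a substring. A short sketch (the name `fruits_sample` is illustrative):

```python
fruits_sample = ['strawberry', 'raspberry', 'apple']

print('apple' in fruits_sample)      # True
print('app' in fruits_sample)        # False: a list checks for whole, equal items
print('app' in 'apple')              # True: a string checks for substrings
print('kiwi' not in fruits_sample)   # True
```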

### Mutating lists
Because lists are mutable, we can change them after they are created. Such changes are also visible through every _alias_ of the list, that is, every other variable name that refers to the same list object.

In [11]:
# Create alias of our list
fruits2 = fruits

# Update the original list
fruits.append('blueberry') # add blueberry

# Check both lists
print(fruits) # print original
print(fruits2) # print alias

['strawberry', 'raspberry', 'apple', 'mango', 'kiwi', 'blueberry']
['strawberry', 'raspberry', 'apple', 'mango', 'kiwi', 'blueberry']


In [12]:
fruits.append?

Signature: fruits.append(object, /)
Docstring: Append object to the end of the list.
Type:      builtin_function_or_method

In [13]:
id(fruits)

140564179853632

In [14]:
id(fruits2)

140564179853632
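The identical `id()` values confirm that `fruits` and `fruits2` are two names for one list object; the `is` operator checks the same thing directly. If you need an independent list, one option is the built-in `list.copy()` method (slicing with `[:]` also works). A sketch, with illustrative names:

```python
original = ['strawberry', 'apple']
alias = original               # second name for the same object
independent = original.copy()  # new list containing the same items

original.append('kiwi')

print(alias)                   # ['strawberry', 'apple', 'kiwi']  (sees the change)
print(independent)             # ['strawberry', 'apple']          (unaffected)
print(alias is original)       # True: same object
print(independent is original) # False: different objects
print(independent == ['strawberry', 'apple'])  # True: == compares contents
```

Note that `.copy()` is a shallow copy: if the list contains other lists, those inner lists are still shared between the original and the copy.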

### Creating lists of lists
Lists can contain other lists. When a table of data is stored as plain Python lists, it is often organized this way, with one inner list per row.

![](https://swcarpentry.github.io/python-novice-inflammation/fig/indexing_lists_python.png)
<div align="center"><a href="https://swcarpentry.github.io/python-novice-inflammation/04-lists/index.html">Image source</a></div>

In [15]:
gene_1 = ['gene1',0.48,0.55]
gene_2 = ['gene2',0.38,0.85]
gene_3 = ['gene3',0.21,0.81]
all_genes = [gene_1, gene_2, gene_3]

# Index once to get an inner list...
print(all_genes[0])
# ...and index again to get a specific value within it
print(all_genes[0][1])

['gene1', 0.48, 0.55]
0.48
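Chained indexing reads like "row, then column". Because the outer list stores references to the inner lists rather than copies, changing `gene_1` afterward is also visible through `all_genes`. A sketch using the same data:

```python
gene_1 = ['gene1', 0.48, 0.55]
gene_2 = ['gene2', 0.38, 0.85]
gene_3 = ['gene3', 0.21, 0.81]
all_genes = [gene_1, gene_2, gene_3]

print(all_genes[1][2])    # 0.85   (row 1, column 2)
print(all_genes[-1][0])   # gene3  (last row, first column)

# all_genes[0] and gene_1 are the same list object
gene_1[1] = 0.50
print(all_genes[0])       # ['gene1', 0.5, 0.55]
```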


## Tuples
A _tuple_ is an ordered, **immutable** collection of items, which can be of mixed types.

* Tuples are usually created using parentheses, with items separated by commas.
* Indexing works the same way as with lists.

In [16]:
# Define a tuple
my_tuple = ('gene1',0.5)
my_tuple

('gene1', 0.5)
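A short sketch of what immutability means in practice, plus a common pitfall: a one-item tuple needs a trailing comma, because parentheses alone only group an expression.

```python
my_tuple = ('gene1', 0.5)
print(my_tuple[0])          # gene1  (indexing works as with lists)

# Tuples are immutable: assigning to an item raises a TypeError
try:
    my_tuple[1] = 0.7
except TypeError as err:
    print('TypeError:', err)   # TypeError: 'tuple' object does not support item assignment

# Without the trailing comma, the parentheses are just grouping
not_a_tuple = ('gene1')
one_item = ('gene1',)
print(type(not_a_tuple))    # <class 'str'>
print(type(one_item))       # <class 'tuple'>
```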

<div class="alert alert-success"><b>Question</b>: Before running the cell below, try to predict: What will be printed out from running this code?</div>

In [17]:
lst = ['a', 'b', 'c']
tup = ('b', 'c', 'd')

if lst[-1] == tup[-1]:
    print('EndMatch') #1 
elif tup[1] in lst:
    print('Overlap') #2
elif len(lst) == len(tup):
    print('Length') #3 
else:
    print('None') #4

Overlap


### Casting between variable types
We can use `list( )` or `tuple( )` to convert a value into a list or a tuple. This is called **casting**.

This is particularly useful with `range( )`, which returns a `range` object: a sequence that produces its numbers on demand instead of storing them all, so displaying it does not show the individual numbers.

**Note**: Like slicing, `range` takes `start`, `stop`, and `step`, but they are separated by commas rather than colons, and `stop` is again excluded. Remember that you can always use `?range` or `help(range)` to get details on how a function works.

In [18]:
# Test range here
tuple(range(0,100,10))

(0, 10, 20, 30, 40, 50, 60, 70, 80, 90)
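A few more casting examples. Printing a `range` object shows only its definition, so casting it to a list is a handy way to see the numbers it produces:

```python
r = range(0, 100, 10)
print(r)                        # range(0, 100, 10)
print(list(r))                  # [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
print(list(range(5)))           # [0, 1, 2, 3, 4]  (start defaults to 0, step to 1)
print(list(range(10, 0, -3)))   # [10, 7, 4, 1]    (negative step counts down)

# Casting also works between lists, tuples, and strings
print(tuple(['a', 'b']))        # ('a', 'b')
print(list(('a', 'b')))         # ['a', 'b']
print(list('abc'))              # ['a', 'b', 'c']
```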

## Dictionaries
A _dictionary_ is a mutable collection of key-value pairs. Where a list looks items up by position, a dictionary looks values up by key. The syntax for dictionaries is `{key1: value1, key2: value2, ...}`.

### When dictionaries are useful
1. Flexible & efficient way to associate labels with heterogeneous data
2. Use where data items have, or can be given, labels
3. Appropriate for collecting data of different kinds (e.g., name, addresses, ages)

> In the cell below, create a dictionary of three countries and their capitals using the syntax `{country: capital, ...}`. Remember that strings still need quotation marks!

**Note**: You can also create an empty dictionary using `{}` and fill it using `dictionary['key'] = 'value'`.

In [19]:
capitals = {'Spain':'Madrid','USA':'DC','Japan':'Tokyo'}
capitals['Japan']

'Tokyo'
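The empty-dictionary approach from the note above can be sketched as follows (the dictionary `ages` and its contents are made up for illustration). Assigning to a key that already exists replaces its value rather than adding a second entry:

```python
ages = {}            # start with an empty dictionary
ages['Ana'] = 31     # add a new key-value pair
ages['Ben'] = 27
ages['Ana'] = 32     # the key already exists, so its value is overwritten

print(ages)          # {'Ana': 32, 'Ben': 27}
print(len(ages))     # 2
```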

<div class="alert alert-success"><b>Question:</b> Before running the cell below, predict: What would the following code produce?</div>

In [20]:
capitals.update({'United Kingdom':'London'})
capitals

{'Spain': 'Madrid', 'USA': 'DC', 'Japan': 'Tokyo', 'United Kingdom': 'London'}

<div class="alert alert-success"><b>Task</b>: What happens if we look for a key that doesn't exist? Try this in the cell below.</div>

In [21]:
capitals['Kenya']

KeyError: 'Kenya'
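Two ways to avoid the `KeyError`: check for the key with `in` first (for dictionaries, `in` checks keys), or use the built-in `dict.get()` method, which returns `None` or a default you supply when the key is missing. A sketch with an illustrative dictionary:

```python
country_capitals = {'Spain': 'Madrid', 'USA': 'DC', 'Japan': 'Tokyo'}

print('Kenya' in country_capitals)                # False: `in` checks keys
print(country_capitals.get('Kenya'))              # None instead of a KeyError
print(country_capitals.get('Kenya', 'unknown'))   # unknown (custom default)
print(country_capitals.get('Japan', 'unknown'))   # Tokyo
```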

In [22]:
gene_dict = {'gene1':[0.3,0.2,0.5]}
gene_dict['gene1']

[0.3, 0.2, 0.5]

### Additional dictionary functionality
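- Use `capitals.update(morecapitals)` to add all entries from another dictionary, `morecapitals`; keys that already exist get their values overwritten
- Use `del capitals['USA']` to delete an entry by key
- Loop over keys (the default when looping over a dictionary), values with `.values()`, or key-value pairs with `.items()`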
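A sketch of these operations together; `more_capitals` and its contents are hypothetical example data:

```python
capitals = {'Spain': 'Madrid', 'USA': 'DC'}
more_capitals = {'Japan': 'Tokyo', 'USA': 'Washington, D.C.'}  # hypothetical extra data

capitals.update(more_capitals)   # adds 'Japan' and overwrites the value for 'USA'
del capitals['Spain']            # removes an entry by key

for country in capitals:                     # looping over a dict yields its keys
    print(country)
for capital in capitals.values():            # just the values
    print(capital)
for country, capital in capitals.items():    # key-value pairs
    print(country, '->', capital)
```

Since Python 3.7, dictionaries keep keys in insertion order, and updating an existing key does not move it.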

<hr>

## Additional resources
<a href="https://swcarpentry.github.io/python-novice-gapminder/11-lists/index.html">Software Carpentry: Lists</a>

<a href="https://python101.pythonlibrary.org/chapter3_lists_dicts.html">Python 101: Lists, Tuples, and Dictionaries</a>

<a href="https://github.com/jakevdp/WhirlwindTourOfPython/blob/6f1daf714fe52a8dde6a288674ba46a7feed8816/06-Built-in-Data-Structures.ipynb">Whirlwind Tour of Python: Built-In Data Structures</a>


## About this notebook
This notebook is largely derived from UCSD COGS18 Materials, created by Tom Donoghue & Shannon Ellis, as well as the <a href="https://github.com/jrjohansson/scientific-python-lectures/blob/master/Lecture-1-Introduction-to-Python-Programming.ipynb">Scientific Python Lecture</a> by J.R. Johansson.