# Programming Fundamentals II: Data Structures

In this notebook, we'll explore different types of data structures that Python can use to store information, namely **lists, tuples, and dictionaries.**


## At the end of this notebook, you'll be able to:
* Compare & contrast the types of structures that Python uses to store data points
* Recognize & create lists, tuples, and dictionaries in Python
* Index, slice, cast, and mutate lists
* Understand the implications of mutability and object-oriented programming

## A _list_ is a mutable collection of ordered items that can be of mixed types.

**Mutable** means that individual items in the object can be changed. Lists are mutable. Tuples and strings are not -- they're **immutable**.

Lists are created using square brackets `[ ]`, and individual elements are separated by commas.

In [None]:
# Create a list of at least five fruits (the slicing task below needs five or more)
fruits = 

### Useful list functions and methods
- Check the length of your list with `len(my_list)`
- Add an element to the end of a list with `my_list.append(item)`
- Remove an element by index with `del my_list[index]`
- Remove the first element equal to a value with `my_list.remove(value)`
- Sort a list in place with `my_list.sort()`
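A minimal sketch of the operations above, using a hypothetical list of numbers (the values are arbitrary and chosen only for illustration):

```python
# Arbitrary example values, chosen only for illustration
my_list = [3, 1, 2]

print(len(my_list))   # number of items: 3

my_list.append(5)     # add 5 to the end -> [3, 1, 2, 5]
del my_list[0]        # remove the item at index 0 -> [1, 2, 5]
my_list.remove(2)     # remove the first item equal to 2 -> [1, 5]

my_list.append(0)     # -> [1, 5, 0]
my_list.sort()        # sorts in place and returns None -> [0, 1, 5]
print(my_list)
```

Note that `.sort()` changes the list in place and returns `None`, so writing `my_list = my_list.sort()` would lose the list. The built-in `sorted(my_list)` returns a new sorted list instead.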

In [None]:
# Try different list functions

### List indexing & slicing
**Indexing** refers to selecting an item from within a collection (e.g., lists, tuples, and strings). Indexing is done by placing the **index number** in square brackets, directly after the list variable.

For example, if `my_list = [1,3,5]`, we can get the second value using `my_list[1]`. (Remember that Python starts indexing at zero!)

### Reminders
- Python is zero-based (the first index is `0`)
- Negative indices count backwards from the end of a collection (`-1` is the last item)

![](http://www.nltk.org/images/string-slicing.png)
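A short sketch of positive and negative indexing, using a hypothetical list `letters` that is not part of the exercises:

```python
letters = ['a', 'b', 'c', 'd']

print(letters[0])    # first item: a
print(letters[1])    # second item: b
print(letters[-1])   # last item: d
print(letters[-2])   # second-to-last item: c

# Indexing past the end raises an IndexError
try:
    letters[4]
except IndexError as err:
    print('IndexError:', err)
```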

In [None]:
# Get the third fruit in our list
third_fruit = 

### If we want multiple items, we can **slice** the list.

There are a few ways to slice:

1. We can **slice** a part of a list using the syntax `[start:stop]`, which extracts the elements from index `start` up to, but not including, index `stop`.

**Notes**
- The element at `start` is __included__; the element at `stop` is __excluded__.
- Negative values count backwards through the list.

2. If we omit `start`, `stop`, or both from `[start:stop]`, they default to the beginning and the end of the list, respectively, e.g. `[:3]`
3. We can also set the step size (instead of the default of 1) using the syntax `[start:stop:step]`
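A sketch of these slicing forms on a hypothetical list of numbers, so the fruit task below stays an exercise:

```python
numbers = [10, 20, 30, 40, 50, 60]

print(numbers[1:4])    # indices 1, 2, 3 -> [20, 30, 40]
print(numbers[:3])     # from the start up to index 3 -> [10, 20, 30]
print(numbers[3:])     # from index 3 to the end -> [40, 50, 60]
print(numbers[-2:])    # last two items -> [50, 60]
print(numbers[::2])    # every second item -> [10, 30, 50]
print(numbers[::-1])   # a negative step walks backwards -> [60, 50, 40, 30, 20, 10]
```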

<div class="alert alert-success"><b>Task:</b> For our list of fruits, create three different slices, and save them as different variables:
    
    1. A slice of the first two fruits.
    2. A slice of the middle three fruits.
    3. A slice of the last fruit.
    
</div>

In [None]:
# Your code here!


### Checking length
We can use the function `len( )` to check the length of lists.

**Note**: We can also use this to get the number of characters in a string!
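A quick sketch of `len( )` on a list, a string, and an empty list (the values are illustrative only):

```python
n_items = len(['a', 'b', 'c'])   # 3 items in the list
n_chars = len('banana')          # 6 characters in the string
n_empty = len([])                # an empty list has length 0
print(n_items, n_chars, n_empty)
```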

### Checking membership
We can use the `in` operator to check whether an item is present in a list (or another collection), and `not in` to check that it is absent. _(More on operators in the next lecture)_
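A short sketch of membership checks, using a hypothetical list `pets`:

```python
pets = ['cat', 'dog', 'fish']

print('dog' in pets)        # True
print('bird' in pets)       # False
print('bird' not in pets)   # True

# `in` also works on strings, where it checks for a substring
print('ana' in 'banana')    # True
```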

### Mutating lists
After a list is defined, we can update its items _because lists are mutable!_
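A minimal sketch of mutating a list by assigning to an index, contrasted with an immutable string (both variables are hypothetical examples):

```python
colors = ['red', 'green', 'blue']
colors[1] = 'yellow'     # replace the item at index 1
print(colors)            # ['red', 'yellow', 'blue']

# Strings are immutable, so the same kind of assignment fails
word = 'cat'
try:
    word[0] = 'b'
except TypeError as err:
    print('TypeError:', err)
```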

### Creating lists of lists
Sometimes, it's useful to create lists of lists. Often, if we import big datasets as lists, this is how it will be organized.

![](https://media.giphy.com/media/z1meXneq0oUh2/giphy.gif)

In [None]:
gene_1 = ['gene1',0.48,0.55]
gene_2 = ['gene2',0.38,0.85]
gene_3 = ['gene3',0.21,0.81]
all_genes = [gene_1, gene_2, gene_3]

# Chain two indexes: [0] picks the first inner list (gene_1), [-1] picks its last value
print(all_genes[0][-1])

## A _tuple_ is an **immutable** collection of ordered items that can be of mixed types.

* Tuples are created using parentheses, with items separated by commas.
* Indexing and slicing work the same way as for lists.

In [None]:
# Define a tuple
tup = (2, 'b', False)
tup[1]
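A short sketch of what *immutable* means in practice: reading from a tuple works, but assigning to one of its indices raises a `TypeError`.

```python
tup = (2, 'b', False)

print(tup[0])     # indexing works as with lists
print(tup[1:])    # so does slicing: ('b', False)

try:
    tup[0] = 5    # tuples cannot be changed after creation
except TypeError as err:
    print('TypeError:', err)

print(tup)        # unchanged: (2, 'b', False)
```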

<div class="alert alert-success"><b>Question</b>: Before running the cell below, try to predict: What will be printed out from running this code?</div>

In [None]:
lst = ['a', 'b', 'c']
tup = ('b', 'c', 'd')
if lst[-1] == tup[-1]:
    print('EndMatch')
elif tup[1] in lst:
    print('Overlap')
elif len(lst) == len(tup):
    print('Length')
else:
    print('None')

## Casting between variable types
We can use `list( )` or `tuple( )` to convert a variable into a different type. This is called **casting**.

This is particularly useful with a function like `range( )`, which does not build a list: it returns a `range` object that generates its numbers on demand rather than storing them all.

**Note**: `range` takes `start`, `stop`, and `step` like slicing does, but they are separated by commas instead of colons. As with slicing, `stop` is excluded. Remember that you can always use `?range` or `help(range)` to get details on how a function works. 
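A few more casting sketches with illustrative values, covering `range` with a step, list/tuple round trips, and a string:

```python
# Cast a range to a list and to a tuple
print(list(range(0, 10, 3)))    # start=0, stop=10 (excluded), step=3 -> [0, 3, 6, 9]
print(tuple(range(5)))          # only stop given -> (0, 1, 2, 3, 4)

# Cast between lists and tuples
as_tuple = tuple(['a', 'b'])    # ('a', 'b')
as_list = list(as_tuple)        # ['a', 'b']

# A string is a sequence of characters, so list() splits it up
print(list('abc'))              # ['a', 'b', 'c']
```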

In [10]:
this_range = range(1,10)
print(type(this_range))
list(this_range)

<class 'range'>
[1, 2, 3, 4, 5, 6, 7, 8, 9]

## Circling back to aliases
<b>Reminder</b>: An alias is a second name that refers to the same object as an existing variable; it is not a copy.

In [None]:
# Make a variable & an alias
# change value of original variable
a = 1
b = a
a = 2

print(a)
print(b)

### What happens if we make an alias of a **mutable** variable, like a list?

In [None]:
first_list = [1, 2, 3, 4]
alias_list = first_list
alias_list

In [None]:
# Change the second value of first_list
first_list[1] = 29
first_list

In [None]:
# check alias_list
alias_list

### **Takeaway**: For *mutable* types, a change made through one name shows up through every alias, because all the names refer to the same object.
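A minimal sketch, going slightly beyond the cells above, of checking whether two names are aliases with the `is` operator, and of breaking the link by making a copy with `list.copy()` (slicing with `first_list[:]` also makes a copy):

```python
first_list = [1, 2, 3, 4]
alias_list = first_list          # a second name for the same list
copied_list = first_list.copy()  # a new list holding the same items

print(alias_list is first_list)   # True: same object
print(copied_list is first_list)  # False: a different object

first_list[1] = 29
print(alias_list)    # [1, 29, 3, 4]  (sees the change)
print(copied_list)   # [1, 2, 3, 4]   (unaffected)
```

Keep in mind that `.copy()` is a *shallow* copy: in a list of lists, the inner lists are still shared between the original and the copy.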

### Why allow aliasing? 

Aliasing can get confusing and be difficult to track, so why does Python allow it?

Well, it's more efficient to let a second name point to an existing object than to make an entirely new copy of a very large variable storing a lot of data. 

Python accepts this potential for confusion in exchange for efficiency.

# Dictionaries
Dictionaries are collections like lists, except that each element is a key-value pair, and values are looked up by key rather than by position. The syntax for dictionaries is `{key1: value1, key2: value2, ...}`.
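A short sketch of creating, reading, and updating a dictionary. The `person` record below is hypothetical and only illustrates that values can be of mixed types:

```python
# Hypothetical record; keys are strings, values can be of mixed type
person = {'name': 'Sam', 'age': 36, 'languages': ['English', 'French']}

print(person['name'])      # look up a value by its key: Sam
print(len(person))         # number of key-value pairs: 3

person['age'] = 37         # dictionaries are mutable: update a value
person['city'] = 'London'  # assigning to a new key adds an entry
print(person)
```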

In the cell below, create a dictionary for three countries and their capitals (`{country: capital, ...}`). Remember that strings still need quotation marks!

In [None]:
capitals = 

<div class="alert alert-success"><b>Question:</b> Before running the cell below, predict: What would the following code produce?</div>

In [None]:
capitals['United Kingdom']

### What happens if we look for a key that doesn't exist?
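A sketch of the answer, using a separate hypothetical dictionary `capitals_example` so your own `capitals` is left untouched. Looking up a missing key raises a `KeyError`; the `.get()` method returns a fallback instead:

```python
capitals_example = {'France': 'Paris', 'Japan': 'Tokyo'}

try:
    capitals_example['Spain']    # key not in the dictionary
except KeyError as err:
    print('KeyError:', err)      # KeyError: 'Spain'

# .get() returns None (or a default you supply) instead of raising an error
print(capitals_example.get('Spain'))             # None
print(capitals_example.get('Spain', 'unknown'))  # unknown

# `in` checks the keys, so you can test before looking up
print('Spain' in capitals_example)               # False
```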

### Additional dictionary functionality
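- Use `capitals.update(morecapitals)` to add the entries of another dictionary (existing keys are overwritten)
- Use `del capitals['US']` to delete an entry by its key
- Loop over keys, values, or both, using the dictionary itself (or `.keys()`), `.values()`, and `.items()`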
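A minimal sketch of these operations on a hypothetical `capitals_demo` dictionary (the entries are illustrative):

```python
capitals_demo = {'France': 'Paris', 'US': 'Washington, D.C.'}
more_capitals = {'Japan': 'Tokyo', 'Kenya': 'Nairobi'}

capitals_demo.update(more_capitals)   # add all entries from another dictionary
del capitals_demo['US']               # delete one entry by key

for country in capitals_demo:                   # looping over a dict gives its keys
    print(country)

for capital in capitals_demo.values():          # values only
    print(capital)

for country, capital in capitals_demo.items():  # key-value pairs
    print(country, '->', capital)
```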

### When dictionaries are useful
1. Flexible & efficient way to associate labels with heterogeneous data
2. Use where data items have, or can be given, labels
3. Appropriate for collecting data of different kinds (e.g., names, addresses, ages)

# References
<a href="https://swcarpentry.github.io/python-novice-gapminder/11-lists/index.html">Software Carpentries Lists</a>

<a href="https://python101.pythonlibrary.org/chapter3_lists_dicts.html">Python 101: Lists, Tuples, and Dictionaries</a>

<a href="https://github.com/jakevdp/WhirlwindTourOfPython/blob/6f1daf714fe52a8dde6a288674ba46a7feed8816/06-Built-in-Data-Structures.ipynb">Whirlwind Tour of Python: Built-In Data Structures</a>


# About this notebook
This notebook is largely derived from UCSD COGS18 Materials, created by Tom Donoghue & Shannon Ellis, as well as the <a href="https://github.com/jrjohansson/scientific-python-lectures/blob/master/Lecture-1-Introduction-to-Python-Programming.ipynb">Scientific Python Lecture</a> by J.R. Johansson.

Want to run this notebook as a slideshow? If you have Python (or Anaconda), follow <a href="http://www.blog.pythonlibrary.org/2018/09/25/creating-presentations-with-jupyter-notebook/">these instructions</a> to set up your computer with the RISE plugin.