# Lists, Sets, and Tuples

Because we want to process large datasets, it is useful to be able to easily store and access collections of data.

Consider analyzing a large number of pixels (or [voxels](https://en.wikipedia.org/wiki/Voxel)) in a medical image, or the coordinates of many atoms in some large molecular structures. 


For example, here is the atomic structure of one of the four components of hemoglobin:

<img src = "figures/hemoglobin.png" width = "60%">

We don't want to, and realistically can't, hard-code a variable for each atomic coordinate, nor would we want to have only a handful of values stored in individual variables at any given time.

It is much better to have a single variable store a collection of all values for easy access. This is where Python's built-in collections come into play.

Python provides four built-in **collections**: **Lists**, **Sets**, **Tuples**, and **Dictionaries**.

Of these four, **Lists** are the most basic and ubiquitous of the collections. The majority of this notebook focuses on lists as a result. 

**Sets** and **Tuples** are more niche but have their uses, so they will be introduced briefly after lists.

**Dictionaries** allow us to associate data by storing key-value pairs. They are less general, but extremely useful. They will be covered in the next notebook.

# Lists

As stated, a **List** of values is the most basic and ubiquitous of these collections.

```python
lst = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

## Syntax

List syntax uses the square brackets: `[]`

You've seen them before when indexing and slicing into strings. This is because strings, like lists, are sequences: a string is a sequence of characters.

We can create an empty list or create a list with hard coded initial values:

In [8]:
# Create an empty list
lst1 = [] 
lst2 = list() # a second way to create an empty list

# A list of strings representing the days of the week
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

Note that list values are separated by commas.

Every list has a length which can be returned using the `len()` function.

In [2]:
print(len(days))

7


## Indices

Every element in a list has a position, its **index**.

In the list of days above, `"Monday"` is at index `0`, and `"Sunday"` is at index `6`.

Since we start counting from `0`, the last element in a list is always at index `len(list) - 1`.

**Accessing by Index**

You can access a list element by its index:

```python
first_element = lst[0]
```

This should look familiar. It is the same syntax we use for Strings.
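
As a minimal sketch of the indexing rule above (the list `planets` is a made-up example, not part of the exercises), the first element sits at index `0` and the last at `len(planets) - 1`:

```python
# Hypothetical example list, used only for illustration
planets = ["Mercury", "Venus", "Earth", "Mars"]

first = planets[0]                 # element at index 0
last = planets[len(planets) - 1]   # last valid index is len(planets) - 1

print(first)  # Mercury
print(last)   # Mars
```

Indexing one position further, with `planets[len(planets)]`, raises an `IndexError` because that position does not exist.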



**Try It**: Access and print the element at index 5 in the list of days:

In [None]:
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]



**Updating by Index**

Unlike strings, you can update elements in the list:

```python
nums = [1,3,3,4]
nums[1] = 2
print(nums) # [1,2,3,4]
```

**Try It**: Swap the two elements which are out of order and print out the corrected list.

In [None]:
lst = [2, 3, 7, 5, 11]



**Try It**: Try to update a character in a string and see what happens.

In [None]:
string = "banama"



**List Slicing**

Lists also support slicing.

```python
sublist = lst[2:5] # return a list containing the elements in lst at indices 2, 3, and 4.
```
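
To make the slice boundaries concrete, here is a small sketch using the list of primes from the start of this section. The start index is included and the stop index is excluded:

```python
lst = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

# Indices 2, 3, and 4 are included; index 5 is not
sublist = lst[2:5]

print(sublist)   # [5, 7, 11]
print(len(lst))  # 10 (slicing returns a new list; lst is unchanged)
```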

**Try It**: Extract a list of the *weekdays* from `days` using list slicing and print it. 

If you have a list `lst`, you can print it using `print(lst)`.

In [4]:
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]



## Inserting and Removing

Just as with strings, there is a set of [list methods](https://www.w3schools.com/python/python_ref_list.asp) we can use on lists.

A few of the most useful are:

- `insert(index, element)`
- `append(element)`
- `remove(element)`
- `count(element)`
- `index(element)`

You will use the first three of these below. See the link above for details on the others.
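
Since `count` and `index` are not exercised in this notebook, here is a rough sketch of how they behave (the list `letters` is invented for illustration):

```python
# Hypothetical example list
letters = ["a", "b", "a", "c", "a"]

n_a = letters.count("a")      # how many times "a" appears
pos_b = letters.index("b")    # position of the first "b"
first_a = letters.index("a")  # index() reports only the first match

print(n_a)      # 3
print(pos_b)    # 1
print(first_a)  # 0
```

Note that `index` raises a `ValueError` if the element is not in the list, whereas `count` simply returns `0`.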

### Inserting into a list

The `insert(index, element)` method allows us to insert `element` at the given `index`. All elements from that index to the end are shifted one position to the right to make room for `element`.
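
A minimal sketch of this shifting behavior, using an invented list `letters` rather than the exercise data:

```python
# Hypothetical example list that is missing "c"
letters = ["a", "b", "d", "e"]

# Insert "c" at index 2; "d" and "e" move to indices 3 and 4
letters.insert(2, "c")

print(letters)  # ['a', 'b', 'c', 'd', 'e']
```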

**Try It**: `insert`

The following list of prime numbers is missing `5`. Insert `5` into its proper position and print out the corrected list.

In [None]:
primes = [2, 3, 7, 11, 13]



### Appending to a list

`append(element)` adds `element` to the end of the list.
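
For instance, with an invented list `letters` (a sketch, not the exercise data):

```python
# Hypothetical example list
letters = ["a", "b", "c"]

# The new element always lands at the end of the list
letters.append("d")

print(letters)       # ['a', 'b', 'c', 'd']
print(len(letters))  # 4
```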

**Try it**: `append`

Append `17` onto your corrected list of primes from above.

Note: Once you execute a code cell in a notebook, its variables are available to any other code cell in that notebook.

### Removing from a list

`remove(element)` removes the first instance of `element` from the list.
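
The phrase "first instance" matters when a list contains duplicates. A small sketch with an invented list `colors`:

```python
# Hypothetical example list with a duplicate value
colors = ["red", "blue", "red", "green"]

# Only the first "red" (index 0) is removed
colors.remove("red")

print(colors)  # ['blue', 'red', 'green']
```

Calling `remove` with an element that is not in the list raises a `ValueError`.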

**Try it**: `remove`

Remove both instances of 15 from the following list and print out the corrected list:

In [12]:
primes = [2, 3, 5, 7, 11, 13, 15, 15, 17]



## Looping over lists

Looping over a list is trivially easy using a `for` loop.

```python
for element in lst:
    print(element)
```

The loop iterates over the list from index 0 to the end. `element` is a variable which holds the current element at every iteration of the loop.

**Try It**: Loop over days

Write a loop to iterate over `days` and print out each day one at a time.

In [None]:
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]



### Tracking indices: `enumerate`

Sometimes, it is useful to know both the element and its index in a loop. For example, if we are sorting a list, we will need to compare elements directly and know their positions so that we can swap them if need be.

We can use `enumerate()` in a `for` loop to track both the current element and its index.

```python
for index, element in enumerate(lst):
    print("Element {} is at index {}".format(element, index))
```
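
As a sketch of why the index is useful when comparing neighbors, the loop below records the positions of elements that are smaller than the element just before them. The names `values` and `out_of_order` are invented for this illustration:

```python
# Hypothetical example data
values = [4, 9, 7, 12, 10]

out_of_order = []
for index, element in enumerate(values):
    # Skip index 0 because it has no predecessor to compare against
    if index > 0 and element < values[index - 1]:
        out_of_order.append(index)

print(out_of_order)  # [2, 4]
```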

**Try it**: `enumerate` and days

Print each day and its index using a for loop with enumerate:

In [11]:
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]



# Sets

A **set** is an **unordered** collection of elements with **no duplicates**. Sets are useful if your dataset does not or should not contain any duplicates and if set operations ([set functions](https://www.w3schools.com/python/python_sets_methods.asp)) are useful to you.

As an unordered collection, a set's elements do not have indices, and there is no guarantee about the order in which elements will be stored.

**Set Creation**:

In [47]:
s0 = set() # Create an empty set
s1 = set("wednesday") # recall, a string is a sequence of characters
                      # s1 is an unordered collection of the unique
                      # characters in "wednesday"
print("s1:\n  {}".format(s1))

# Alternate syntax for initializing a set
roster1 = {"aaron", "laya", "qi", "avik", "jesse", "aaron"} 
print("roster1:\n  {}".format(roster1))

s1:
  {'w', 'y', 'e', 's', 'n', 'a', 'd'}
roster1:
  {'jesse', 'qi', 'avik', 'laya', 'aaron'}
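
The following sketch illustrates the two properties described above, using an invented set `roster`: duplicates collapse into a single element, and because there are no indices, membership is tested with `in` rather than by position.

```python
# Hypothetical example set; "aaron" is listed twice on purpose
roster = {"aaron", "laya", "qi", "aaron"}

print(len(roster))     # 3 (the duplicate "aaron" is stored once)
print("qi" in roster)  # True

# Sets do not support indexing
indexing_failed = False
try:
    roster[0]
except TypeError as err:
    indexing_failed = True
    print("TypeError:", err)
```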


**Set Intersection**:

In [43]:
roster1 = {"aaron", "laya", "qi", "avik", "jesse", "aaron"} 
roster2 = {"desmond", "sam", "aaron", "matilda", "qi", "aaron"} 

print("roster1:\n  {}".format(roster1))
print("roster2:\n  {}".format(roster2))

common = roster1.intersection(roster2)
print("roster1 INTERSECT roster2:\n  {}".format(common))

roster1:
  {'jesse', 'qi', 'avik', 'laya', 'aaron'}
roster2:
  {'desmond', 'qi', 'sam', 'aaron', 'matilda'}
roster1 INTERSECT roster2:
  {'qi', 'aaron'}
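
Intersection is one of several set operations. As a brief sketch, `union` and `difference` work the same way on the rosters; `sorted()` is used here only to print the results in a predictable order:

```python
roster1 = {"aaron", "laya", "qi", "avik", "jesse"}
roster2 = {"desmond", "sam", "aaron", "matilda", "qi"}

# Everyone who appears on at least one roster
everyone = roster1.union(roster2)

# People on roster1 who are not on roster2
only_roster1 = roster1.difference(roster2)

print(sorted(everyone))
print(sorted(only_roster1))  # ['avik', 'jesse', 'laya']
```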


# Tuples

A **tuple** is an indexable collection of values that is immutable. That is, once created, a tuple cannot be changed.

Tuples are most commonly used when a function needs to return multiple values. For example, each item produced by `enumerate()` is a tuple holding an index and the element at that index.

In [None]:
tup = (0, "Monday")

print(tup)
print(tup[0])
print(tup[1])
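
Here is a minimal sketch of both ideas: a function returning several values as one tuple, and immutability. The helper `min_and_max` is hypothetical and written only for this illustration:

```python
def min_and_max(values):
    """Hypothetical helper: return the smallest and largest value as a tuple."""
    return (min(values), max(values))

result = min_and_max([7, 2, 11, 5])
print(result)  # (2, 11)

# A tuple can be unpacked into separate variables
smallest, largest = result
print(smallest, largest)  # 2 11

# Tuples are immutable, so assigning to an index fails
assignment_failed = False
try:
    result[0] = 0
except TypeError as err:
    assignment_failed = True
    print("TypeError:", err)
```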

### Practice

#### `most_frequent_char`

Write a function that takes a string and returns a tuple of the form `(most_freq_char, frequency)`, where `frequency` is the number of times `most_freq_char` appears in the string.

*Tip: Look over the [string methods](https://www.w3schools.com/python/python_strings_methods.asp) to see if any of them can help you implement this function.*

In [None]:
def most_frequent_char(text):  # named `text` to avoid shadowing the built-in `str`
    # Your implementation here
    return ("", -1) # dummy return statement; replace it with your own

most_frequent_char("banana")