# Python Data Structures

All programming languages have ways to store and recognize different types of data (numbers, text, etc.), along with the logical mechanisms that let you do useful things with that data. Often you'll need a more structured way to organize your program's data than saving each item in its own variable.

**Data structures** are schemes for organizing your data so it can be stored and used more efficiently or effectively. Python offers many different options, and the 'best' choice depends heavily on the task your program is performing. Taking the time to think through your options can not only make the program easier to write, but also make it run faster and/or use less memory.

Table of Contents:

- <a href="#lists">Lists</a>
- <a href="#sets">Sets</a>
- <a href="#frozsets">Frozen Sets</a>
- <a href="#comprehensions">Comprehensions</a>
- <a href="#arrays">numpy Arrays</a>


## <a name="lists"></a>Lists

In Python, a `list` is a collection of items with the following properties:

- It's *mutable*: you can change it after you've created it
- It's *ordered*: each item's 'address' is its index (i.e., its position) in the list (note that this does not mean it's *sorted*; that's a different concept)
- Usually the items are all the same type (homogeneous list), but they don't have to be (heterogeneous list)
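
A minimal sketch of the first two properties, where the names `scores` and `before_id` are only for illustration. It shows that a list keeps insertion order (which is not the same as sorted order) and that it can be changed in place without becoming a new object:

```python
# Lists keep items in the order you inserted them; that order is not sorted
scores = [30, 10, 20]
print(scores)  # [30, 10, 20]

# Because order is part of a list's identity, these two lists are not equal
print([1, 2, 3] == [3, 2, 1])  # False

# Lists are mutable: the same list object can be changed in place
before_id = id(scores)
scores[0] = 99
scores.append(40)
print(scores)                   # [99, 10, 20, 40]
print(id(scores) == before_id)  # True: still the same object
```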

A list is what's called a sequence in Python, which means it's automatically iterable and items can be accessed using integer indices. 

[Python documentation on lists](https://docs.python.org/3/library/stdtypes.html#list)
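
As a rough illustration of what "sequence" buys you, here is a short sketch (the list `colors` is hypothetical) using the built-in `len()`, the `in` membership test, integer indexing, and iteration with `enumerate()`:

```python
colors = ['red', 'green', 'blue']

print(len(colors))        # 3
print('green' in colors)  # True
print(colors[2])          # blue

# Iterate over the items directly, or use enumerate() to get each index too
for idx, color in enumerate(colors):
    print(idx, color)
# 0 red
# 1 green
# 2 blue
```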

In [9]:
# Create a list with square brackets
l1 = [1, 2, 3, 4, 5]
print(l1)

# Create a list with the `list` type constructor and optionally another iterable
l_empty = list()
print(l_empty)

l_from_str = list('abc')
print(l_from_str)

[1, 2, 3, 4, 5]
[]
['a', 'b', 'c']


In [84]:
# Unlike arrays in many other languages, a list can hold items of different types
l_mixed = [42, True, 'Hello', None, 9.9, l_from_str]
print(f'Heterogeneous list: {l_mixed}\n')

for item in l_mixed:
    print(f'{item} is type {type(item)}')

Heterogeneous list: [42, True, 'Hello', None, 9.9, ['a', 'b', 'c']]

42 is type <class 'int'>
True is type <class 'bool'>
Hello is type <class 'str'>
None is type <class 'NoneType'>
9.9 is type <class 'float'>
['a', 'b', 'c'] is type <class 'list'>


In [12]:
# You can add items to the end of a list with the `.append()` method
print('Before append:', l_from_str)
l_from_str.append('d')
print('After append:', l_from_str)

Before append: ['a', 'b', 'c']
After append: ['a', 'b', 'c', 'd']
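
`.append()` is one of several ways to grow a list. This sketch compares it with `.extend()` and the `+` operator, both standard list operations that the cell above doesn't use; the names `letters` and `combined` are illustrative:

```python
letters = ['a', 'b']

# .append() adds its argument as ONE item, even if that argument is a list
letters.append(['c', 'd'])
print(letters)  # ['a', 'b', ['c', 'd']]

# .extend() adds each item from an iterable individually
letters = ['a', 'b']
letters.extend(['c', 'd'])
print(letters)  # ['a', 'b', 'c', 'd']

# + builds a NEW list and leaves both operands unchanged
combined = letters + ['e']
print(combined)  # ['a', 'b', 'c', 'd', 'e']
print(letters)   # ['a', 'b', 'c', 'd']
```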


### List Indexing and Slicing

As noted above, list items are accessed by their index. Python uses zero-based indexing, so the first item has index 0, the second has index 1, and so on. You can also use negative indices, which count backward from the end of the list: the last item is -1, the second-to-last is -2, and so on.

```text
Length of list: 6

Index from front:   0    1    2    3    4    5
List Items (L):    ['a', 'b', 'c', 'd', 'e', 'f']
Index from rear:    -6   -5   -4   -3   -2   -1
```

`L[0] -> 'a'` and `L[-1] -> 'f'`
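
A quick sketch using the same six-letter list `L` as the diagram. It checks that a negative index `-k` refers to the same item as `len(L) - k` and shows that indexing past either end raises an `IndexError`:

```python
L = list('abcdef')

# A negative index -k is shorthand for len(L) - k
print(L[-1], L[len(L) - 1])  # f f
print(L[-6] == L[0])         # True

# Indexing past either end of the list raises IndexError
try:
    L[6]
except IndexError as err:
    print('IndexError:', err)  # IndexError: list index out of range
```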

**Slicing** allows you to select a segment of items in the list. The syntax for a slice is:

```py
L[start:stop:step]
```

Note that the `stop` index is exclusive: the slice ends just *before* it. With the default `step` of 1, leaving the `start` or `stop` value blank defaults to the start or end of the list, respectively. So a quick way to create a (shallow) copy of a list is `L[:]`.
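
A small sketch of these slicing rules on the same example list `L` (the name `L_copy` is illustrative). It also covers two details not spelled out above: with a negative `step` the blank defaults flip to walk from the end toward the front, and out-of-range slice bounds are clipped rather than raising an error:

```python
L = list('abcdef')

print(L[1:4])    # ['b', 'c', 'd']  (stop index 4 is excluded)
print(L[:2])     # ['a', 'b']       (blank start -> beginning)
print(L[4:])     # ['e', 'f']       (blank stop -> end)

# With a negative step, blank start/stop mean "from the end, through the front"
print(L[::-2])   # ['f', 'd', 'b']

# Slice bounds past the end are clipped instead of raising IndexError
print(L[3:100])  # ['d', 'e', 'f']

# L[:] returns a NEW list object with the same items
L_copy = L[:]
print(L_copy == L, L_copy is L)  # True False
```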

You can use indexing and slicing to **access**, **update**, **insert**, or **delete** items in the list.

In [74]:
# Create example list
l_alpha = list('abcdefg')
print(l_alpha)

# Grab the middle item
mid_idx = len(l_alpha)//2
print(f'Middle index: {mid_idx}')
print(f'Middle item: {l_alpha[mid_idx]}')

['a', 'b', 'c', 'd', 'e', 'f', 'g']
Middle index: 3
Middle item: d


In [94]:
# Multi-level indexing: accessing items in lists of lists

# Create a matrix
M = [
    # 1st col 2nd col
    ['r0_c0', 'r0_c1'], # First row: M[0]
    ['r1_c0', 'r1_c1'], # Second row: M[1]
    ['r2_c0', 'r2_c1'], # Third row: M[2]
]

print(f'Starting matrix: {M}')
print(f'Get second row\'s first item: {M[1][0]}')

Starting matrix: [['r0_c0', 'r0_c1'], ['r1_c0', 'r1_c1'], ['r2_c0', 'r2_c1']]
Get second row's first item: r1_c0
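
Rows are easy to reach with one index, but a column needs one item from each row. A minimal sketch reusing the same `M`, where `first_col` and `M_T` are illustrative names; it uses the built-in `zip()` to transpose the matrix:

```python
M = [
    ['r0_c0', 'r0_c1'],
    ['r1_c0', 'r1_c1'],
    ['r2_c0', 'r2_c1'],
]

# A column holds one item from each row, so loop over the rows
first_col = []
for row in M:
    first_col.append(row[0])
print(first_col)  # ['r0_c0', 'r1_c0', 'r2_c0']

# zip(*M) groups items by position across rows, i.e. it transposes the matrix;
# each column comes back as a tuple, so convert each one to a list
M_T = list(map(list, zip(*M)))
print(M_T)  # [['r0_c0', 'r1_c0', 'r2_c0'], ['r0_c1', 'r1_c1', 'r2_c1']]
```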


In [75]:
# Create a shallow copy of a list
l_alpha_2 = l_alpha[:]  # l_alpha.copy() also works
print(l_alpha_2)

['a', 'b', 'c', 'd', 'e', 'f', 'g']
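
"Shallow" matters once a list contains other mutable objects. This sketch uses hypothetical names and the standard library's `copy.deepcopy()`. It shows that a slice copy shares its inner lists with the original, while a deep copy does not:

```python
import copy

nested = [[1, 2], [3, 4]]

shallow = nested[:]           # new outer list, but the inner lists are shared
deep = copy.deepcopy(nested)  # new outer list AND new inner lists

nested[0].append(99)          # mutate an inner list through the original

print(shallow)  # [[1, 2, 99], [3, 4]]  (sees the change)
print(deep)     # [[1, 2], [3, 4]]      (fully independent)

# Replacing a top-level item only affects the list you changed
nested[1] = 'replaced'
print(shallow[1])  # [3, 4]
```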


In [76]:
# Slicing with step examples
l_nums = list(range(11))
print(f'Starting list: {l_nums}')

# Reverse a list
print(f'\nReversed numbers: {l_nums[::-1]}')

# Grab every second item starting at index 0 (here, the even numbers)
print(f'\nEven numbers: {l_nums[::2]}')

# Grab every second item starting at index 1 (here, the odd numbers)
print(f'\nOdd numbers: {l_nums[1::2]}')

Starting list: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Reversed numbers: [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

Even numbers: [0, 2, 4, 6, 8, 10]

Odd numbers: [1, 3, 5, 7, 9]


### Access and Replace Items in a List

In [77]:
# Replace one item via its index
print(f'Starting list: {l_alpha}')
l_alpha[-1] = 'golf'
print(f'Updated list: {l_alpha}')

# Replace several items via a slice
print(f'\nStarting list: {l_alpha}')
l_alpha[:3] = ['alpha', 'bravo', 'charlie']
print(f'Updated list: {l_alpha}')

Starting list: ['a', 'b', 'c', 'd', 'e', 'f', 'g']
Updated list: ['a', 'b', 'c', 'd', 'e', 'f', 'golf']

Starting list: ['a', 'b', 'c', 'd', 'e', 'f', 'golf']
Updated list: ['alpha', 'bravo', 'charlie', 'd', 'e', 'f', 'golf']
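
In the cell above the replacement happens to be the same length as the slice, but it doesn't have to be. A sketch with a throwaway list `l_demo`, which also shows that slices with a step are stricter and raise `ValueError` on a length mismatch:

```python
l_demo = ['a', 'b', 'c', 'd', 'e']

# Two items replaced by one: the list shrinks
l_demo[1:3] = ['B']
print(l_demo)  # ['a', 'B', 'd', 'e']

# One item replaced by three: the list grows
l_demo[1:2] = ['x', 'y', 'z']
print(l_demo)  # ['a', 'x', 'y', 'z', 'd', 'e']

# Slices with a step (extended slices) require an exact length match
try:
    l_demo[::2] = ['only one']
except ValueError as err:
    print('ValueError:', err)
```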


### Remove Items from a List

In [78]:
# Remove items from a list by position
print(f'Starting list: {l_alpha}')
del l_alpha[4:6]
print(f'Updated list: {l_alpha}')

Starting list: ['alpha', 'bravo', 'charlie', 'd', 'e', 'f', 'golf']
Updated list: ['alpha', 'bravo', 'charlie', 'd', 'golf']


In [79]:
# Remove items from a list by value
print(f'Starting list: {l_alpha}')
l_alpha.remove('d')
print(f'Updated list: {l_alpha}')

Starting list: ['alpha', 'bravo', 'charlie', 'd', 'golf']
Updated list: ['alpha', 'bravo', 'charlie', 'golf']
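
Two details about removal that the cell above doesn't show: `.remove()` deletes only the *first* match (and raises `ValueError` if there is none), and `.pop()` removes by index while returning the removed item. A sketch with illustrative names:

```python
l_demo = ['x', 'y', 'x', 'z']

# .remove() deletes only the FIRST matching item
l_demo.remove('x')
print(l_demo)  # ['y', 'x', 'z']

# It raises ValueError for a missing item, so check first if you're unsure
if 'q' in l_demo:
    l_demo.remove('q')

# .pop() removes by index AND returns the item (the last item by default)
last = l_demo.pop()
first = l_demo.pop(0)
print(last, first, l_demo)  # z y ['x']
```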


### Insert Items into a List

In [80]:
# Insert items into a list by index
print(f'Original list: {l_alpha}')
l_alpha.insert(3, 'delta')

# Insert multiple items without creating a sub-list
l_alpha[4:4] = ['echo', 'foxtrot']
print(f'Updated list: {l_alpha}')

Original list: ['alpha', 'bravo', 'charlie', 'golf']
Updated list: ['alpha', 'bravo', 'charlie', 'delta', 'echo', 'foxtrot', 'golf']
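
A sketch of how `.insert()` interprets its index, using a throwaway list: the new item lands *before* whatever is currently at that index, negative indices count from the end, and an index past the end simply appends:

```python
l_demo = ['a', 'b', 'c']

l_demo.insert(0, 'start')         # before the current first item
print(l_demo)  # ['start', 'a', 'b', 'c']

l_demo.insert(-1, 'before_last')  # -1 means "before the current last item"
print(l_demo)  # ['start', 'a', 'b', 'before_last', 'c']

l_demo.insert(100, 'end')         # out-of-range index: no error, item is appended
print(l_demo)  # ['start', 'a', 'b', 'before_last', 'c', 'end']
```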


### Clear a List

In [81]:
# Clear a list's contents
print(f'Starting list: {l_alpha}')
l_alpha.clear()
print(f'Updated list: {l_alpha}')

Starting list: ['alpha', 'bravo', 'charlie', 'delta', 'echo', 'foxtrot', 'golf']
Updated list: []
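
`.clear()` empties the list object itself, which is different from pointing a variable at a new empty list. The difference only shows up when another name refers to the same list, as in this sketch (names are illustrative):

```python
original = [1, 2, 3]
alias = original  # a second name for the SAME list object

# Rebinding a name to a new empty list leaves the old object untouched
original = []
print(alias)  # [1, 2, 3]

# .clear() empties the object itself, so every name referring to it sees the change
original = alias
original.clear()
print(alias)  # []
```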


### Sort a List

In [None]:
# Two options to sort a list:
l_phonetic_end = ['whiskey', 'uniform', 'x-ray', 'zulu', 'victor', 'yankee']
print(f'Starting list: {l_phonetic_end}')

# 1) Sort is done via the `sorted()` function and returns a separate list
l_phon_sort = sorted(l_phonetic_end)
print(f'New sorted list: {l_phon_sort}')
print(f'Original list (unchanged): {l_phonetic_end}')

# 2) Sort is done in place via the `.sort()` method and mutates the original list
l_phonetic_end.sort()
print(f'Updated original list: {l_phonetic_end}')

l_phonetic_end
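
Both `sorted()` and `.sort()` also accept `key=` and `reverse=` keyword arguments. A sketch with a hypothetical mixed-case list, showing that the default string order puts uppercase before lowercase, that `key=str.lower` gives a case-insensitive order, and that items with equal keys keep their original relative order (Python's sort is stable):

```python
words = ['Victor', 'alpha', 'Zulu', 'bravo']

# Default string comparison uses character codes: uppercase sorts first
print(sorted(words))                 # ['Victor', 'Zulu', 'alpha', 'bravo']

# key= transforms each item before comparing
print(sorted(words, key=str.lower))  # ['alpha', 'bravo', 'Victor', 'Zulu']

# In-place sort, longest first; 'alpha' and 'bravo' tie on length and keep their order
words.sort(key=len, reverse=True)
print(words)                         # ['Victor', 'alpha', 'bravo', 'Zulu']
```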

### List Summary

Lists are a good data structure choice when:

- You need a flexible structure where you can change (add, update, remove) items on the fly in your program
- You also want items to remain in the order you placed them in the data structure

When there are better options:

- You are working with a huge collection of numbers of the same type (lists use much more memory than numpy arrays for this)
- You are working with vectors or matrices and need access to optimized calculations
- You need to efficiently add/remove items at both ends of the list (check out double-ended queues (`deque`) in the `collections` module)
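
For the last point, here is a minimal sketch with `collections.deque` (variable names are illustrative). Inserting or removing at the front of a list is O(n) because every later item has to shift, while a deque supports O(1) appends and pops at both ends:

```python
from collections import deque

queue = deque(['b', 'c'])

queue.appendleft('a')  # add to the front
queue.append('d')      # add to the back
print(queue)           # deque(['a', 'b', 'c', 'd'])

front = queue.popleft()  # remove from the front
back = queue.pop()       # remove from the back
print(front, back, list(queue))  # a d ['b', 'c']
```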


## <a name="sets"></a>Sets

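In Python, a `set` is a collection of items with the following properties:

- It's *mutable*: you can add and remove items after you've created it
- It's *unordered*: items have no index, so you can't access them by position
- Its items are *unique*: a set never holds duplicates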
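
A short sketch of these properties with illustrative names. It also shows one requirement not listed above: set items must be *hashable*, so immutable values like numbers, strings, and tuples work, but a list does not:

```python
# Duplicates are dropped automatically
tags = {'python', 'data', 'python'}
print(len(tags))  # 2

# Converting a list to a set is a common way to de-duplicate it;
# sorted() is used for printing because sets have no order of their own
unique_nums = set([3, 1, 3, 2, 1])
print(sorted(unique_nums))  # [1, 2, 3]

# Sets are mutable: add and discard items in place
tags.add('numpy')
tags.discard('data')    # unlike .remove(), no error if the item is missing
print('numpy' in tags)  # True

# Items must be hashable, so a list can't be stored in a set
try:
    bad = {[1, 2]}
except TypeError as err:
    print('TypeError:', err)
```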

## <a name="frozsets"></a>Frozen Sets

## <a name="comprehensions"></a>Comprehensions

## <a name="arrays"></a>numpy Arrays

In [82]:
import numpy as np