# Python Standard Library - Data Structures
---

Python's built-in data structures are simple, but quite powerful **containers for data**. 
Understanding them well and mastering their use is critical for a programmer to write efficient code for doing data science.

Also known as $Collections$, these built-in containers include `Tuples`, `Lists`, `Dictionaries` and `Sets`, 
each of which is characterized by a few common features:

- They are *iterable*, and most can be *subscripted* to access elements (tuples and lists by position, dictionaries by key; sets cannot be subscripted)
- They may or may not be *immutable*

Also important to know and remember about these containers is 

- how data is stored in them
- methods that work on them such as *search*
- applications they’re most suitable for

There are many more higher-level containers in the standard library; you can read about them [here](https://docs.python.org/3/library/collections.html)

## Tuple

A `tuple` is a sequence of Python objects that is

- one-dimensional, ordered, indexable
- each element can be of a different type
- immutable once created, hence also fixed-length

> Many functions from data science libraries like scikit-learn take tuples as arguments or produce tuples as output. <br> It is therefore important to understand how they work and what their advantages and limitations are.

### Creating `tuple` objects

We can create tuples in two ways
- A comma-separated sequence of values assigned to a variable (optional: placed inside parentheses)
- Calling `tuple()` on any iterable (e.g. a list or a string)

```python
tup = 1, 2, 6
nested_tup = (('a', 'b', 'c'), (1, 2), 'xyz', 123)

tuple([3.3, 1.1, 4.0])
tuple('forests')
```

#### Task - Run the following code. Note what happens. Why?

```python
tuple_1 = (1, 2, 3)
tuple_1[2] = 4
```

In [None]:
# create a tuple

### Accessing elements of a `tuple`

Subsetting works via the square brackets `[index]` accessor, with single indexes `0 <= index <= N-1` (where `N` is the length of the tuple) or *slices* such as `tup_1[0:5]`

An example

```python
In : nested_tup[0]
Out: ('a', 'b', 'c')
```
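The snippet below is a small sketch of indexing and slicing on a tuple. The tuple `tup_1` and its contents are made up for illustration.

```python
tup_1 = ('a', 'b', 'c', 'd', 'e', 'f')

first = tup_1[0]          # indexes start at 0
last = tup_1[-1]          # negative indexes count from the end
first_three = tup_1[0:3]  # a slice includes the start and excludes the stop
every_other = tup_1[::2]  # an optional third value sets the step

print(first, last)        # a f
print(first_three)        # ('a', 'b', 'c')
print(every_other)        # ('a', 'c', 'e')
```

Slicing a tuple always returns a new tuple; the original is left unchanged.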

Try it out for yourself in the cell below

In [None]:
# access an element

### Tuple Concatenation 

The `+` operator joins tuples to form longer tuples

An example

```python
(3, None, 'foo') + (0, -2.22) + ('bar', 'xyz')
```

Try it out for yourself in the cell below

In [None]:
# add two tuples

### Tuple $unpacking$

In an assignment statement, corresponding elements of a tuple are assigned to the respective names on the left-hand side (given that the number of names is the same as the length of the tuple). 
This makes it very easy to swap variables, a task that requires a third variable in many other languages.

> This pattern is frequently used in data science when calling modelling functions that return several objects at once, such as `train_test_split`

An example

```python
a, b, c = (1, 2, 3) 
# equivalent to writing a=1; b=2; c=3

a, b = b, a 
# a and b swap values
```
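As a rough illustration of unpacking a function's return value, the sketch below uses a hypothetical helper called `min_and_max`; it is not part of any library.

```python
def min_and_max(values):
    """Return the smallest and largest value as a 2-tuple (hypothetical helper)."""
    return min(values), max(values)

# The returned tuple is unpacked straight into two names
lowest, highest = min_and_max([4, 9, 1, 7])
print(lowest, highest)    # 1 9

# Unpacking also works inside a for loop over a sequence of tuples
pairs = [('a', 1), ('b', 2)]
for letter, number in pairs:
    print(letter, number)
```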

Try it out for yourself in the cell below

In [None]:
# unpack a tuple into variables

### Tuple Methods 

- There aren't too many methods that work on Tuples. 
- `count` and `index` are useful

An example 

```python
In:
tup_1 = ('a', 'p', 'p', 'l', 'e')
print(tup_1)
print(tup_1.count('p'))
print(tup_1.index('l'))

Out:
('a', 'p', 'p', 'l', 'e')
2
3
```

Try it out for yourself in the cell below.

In [None]:
# tuple methods

### Tuples - Coding practice

- Try to predict what will happen when you run the following code?

<br>

---
## Lists `list`

A Python list is simply an ordered collection of values or objects. 
It is similar to what in other languages might be called an array, but with some added functionality. 

Lists are
- one-dimensional
- containers for collections of objects of any type
- variable-length
- mutable, ie, their contents can be modified in-place

### Importance of Lists

Lists are frequently used as the preferred containers

- for storing column and row names in a table
- for iterating over ranges
- for creating new features

### Creating `list` objects

Lists can be defined using

- square brackets `[]`
- calling the `list()` function on an iterable
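- Python functions that return lists (such as `sorted`), and $Comprehensions$. Note that in Python 3 `range` returns a lazy `range` object, so wrap it in `list()` when you need an actual list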

Examples

```python
int_list = [1, 2, 3]
mixed_list = ["string", 0.1, True]
list_of_lists = [int_list, mixed_list, ['A', 'B']]
x = list(range(10))  # range() is lazy in Python 3; list() materializes it
```

Try it out in the cell below 

<br>

In [None]:
# make a list

<br>

---

### Subsetting: Accessing elements of a `list`

- Single elements can be accessed using their index or position

```python
list_1 = [1, 12, 34, 49, -97, 0, -15, 8]
list_1[4] # fetches the 5th element
list_1[-1] # fetches the last element
```

- Subsets of lists (smaller lists) can be accessed using **integer slicing**

```python
list_1[:4] # first four elements
list_1[4:] # all elements from the fifth to the end
list_1[1:5] # second to fifth
list_1[-3:] # last three
list_1[::-1] # reverse the list
```

Try it out in the cell below 

<br>

In [None]:
# subset and slice

<br>

---

### List methods

List objects in Python have convenient methods to append, remove, extract, insert and sort elements
- `.append()` adds a single element at the end of the list
- `.extend()` adds multiple elements at the end of an existing list
  - Lists can also be concatenated using the `+` operator
- `.insert()` to insert elements at a specific index (this is an expensive operation)
- `.pop()` removes and returns an element at a particular index (default: from the end)
- `.remove()` takes an element as input and removes its first occurrence
- `.sort()` is used to sort a list in the ascending order (default action) in-place
  - The `reverse=True` parameter sorts the list in reverse order

#### Task: Try the following code and observe the output

```python
x = ['x', 'y', 'z']
x.append('a')
x.extend([10, 11, 12])
[1, 2, 3] + ['a', 'b', 'c']
x.insert(3, 'a')
x.pop()
x.remove(10)

x = list(range(10, 1, -1))  # a range object has no .sort(); convert it to a list first
print("Before Sort:", x)
x.sort()
print("After Sort:", x)

y = list('abcde')
print("Before Sort:", y)
y.sort(reverse=True)
print("After Sort:", y)
``` 

<br>

In [None]:
# list methods

<br>

---

### Functions that work on `list` objects

In addition to methods, Python also has a few general-purpose functions that work with lists
- `len()` returns the number of elements in the list
- `in` checks whether an element belongs to a list and returns a boolean
- `sorted()` returns a new list from the elements of given list, sorted in ascending order (default action)
- `reversed()` yields an iterator to go over the list in reverse order

#### Task: Try the following code and observe the output

```python
x = list('just a string')
len(x)
't' in x
sorted(x)
reversed(x)
``` 
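One distinction that is easy to miss is that `sorted()` returns a new list while `.sort()` changes the list in place. A minimal sketch, using a made-up list called `scores`:

```python
scores = [3, 1, 2]

# sorted() builds a new list and leaves the original untouched
ordered = sorted(scores)
print(ordered, scores)          # [1, 2, 3] [3, 1, 2]

# .sort() reorders the list in place and returns None
result = scores.sort()
print(result, scores)           # None [1, 2, 3]

# reversed() gives an iterator; wrap it in list() to see the elements
backwards = list(reversed(scores))
print(backwards)                # [3, 2, 1]
```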

<br>

In [None]:
# functions on lists

<br>

---

### List $unpacking$

This works a lot like tuple unpacking: you can assign list elements to names using an assignment statement

```python
a, b, c, d = [1, 2, 3, 4]
``` 

<br>

In [None]:
# list unpacking

<br>

---

### Lists - Coding practice

- Change the type of each cell to code and run it
- Try to predict what will happen, comment and explain what the code does
 
<br>

<br>

---
## Careful when copying mutable objects

- Whenever you find yourself typing something like `list_1 = list_2`, stop and think about the consequences. 
- When you assign an existing list to a new variable name, you bind a *reference* to the data, not a copy of the data
    - No copy is created; instead, two variables *point* to the same object in memory
    - This means that modifying the list through either `list_1` or `list_2` changes the one object they both point to
- It's like a bank account shared by two people: either of them can debit or credit money, and the balance changes for both

> PS: <br> Use the `id()` function to get an object's identity; in CPython this is the address of the memory location where the object is stored

Examples

```python
In : list_1 = list(range(10))
In : print(list_1)
Out: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In : list_2 = list_1

In : id?

In : id(list_1)
Out: 4396728616

In : id(list_2)
Out: 4396728616

In : list_1.append(10)
In : print(list_1)
Out: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In : print(list_2)
Out: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In : list_2.remove(5)
In : print(list_1)
Out: [0, 1, 2, 3, 4, 6, 7, 8, 9, 10]
```

To avoid this, when you want to copy a list, use either (1) the `list()` function or (2) the `.copy()` method

```python
In : list_1 = list(range(10))

In : list_2 = list_1
In : id(list_1) == id(list_2)
Out: True

In : list_3 = list(list_2)
In : id(list_3) == id(list_2)
Out: False

In : list_4 = list_3.copy()
In : id(list_4) == id(list_3)
Out: False
```
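One caveat worth knowing: both `list()` and `.copy()` make *shallow* copies, so nested lists inside the copy are still shared with the original. The sketch below contrasts this with `copy.deepcopy` from the standard library; the variable names are illustrative.

```python
import copy

original = [[1, 2], [3, 4]]

shallow = list(original)          # new outer list, same inner lists
deep = copy.deepcopy(original)    # new outer list and new inner lists

original[0].append(99)            # mutate one of the inner lists

print(shallow[0])   # [1, 2, 99]  -> the inner list is shared
print(deep[0])      # [1, 2]      -> fully independent copy
```

For flat lists of numbers or strings a shallow copy is usually enough; `deepcopy` matters when the list contains other mutable containers.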

Try these out below

<br>

In [None]:
# practice - call by reference

<br>

---
## Dictionaries `dict`

`dict` is likely the *most important* built-in Python data container.
It is a collection of **key-value pairs** (where keys and values are Python objects, and keys must be hashable) and has the following properties
- flexibly-sized 
- insertion-ordered (guaranteed since Python 3.7; unordered in earlier versions) 
- mutable 

Example: A Python dictionary to store four pieces of information about a book

```python
doc_info = {
  "author": "dushyant",
  "title": "Data Science in Python",
  "chapters": 10,
  "tags": ["#data", "#science", "#datascience", "#python", "#analysis"]
}
```

### Creating `dict` objects

Dictionaries can be created using 

- comma-separated collection of colon-separated key-value pairs
- dictionary comprehensions
- from another object that stores key-value pairs, such as a `Series` from `pandas`

```python
empty_dict = {}
a_dict = {'key_1': 12, 'key_2': 36}
b_dict = {'a': 'a string', 'b' : [1, 2, 3, 4]}
```
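A few more ways of building dictionaries, sketched with made-up data. The `dict()` constructor and the comprehension syntax are standard Python; the names `point`, `capitals` and `lengths` are illustrative.

```python
# dict() called with keyword arguments
point = dict(x=1.5, y=-2.0)

# dict() called on a list of (key, value) tuples
capitals = dict([('France', 'Paris'), ('Japan', 'Tokyo')])

# A dictionary comprehension: word -> length of the word
words = ['tuple', 'list', 'set']
lengths = {word: len(word) for word in words}

print(point)      # {'x': 1.5, 'y': -2.0}
print(capitals)   # {'France': 'Paris', 'Japan': 'Tokyo'}
print(lengths)    # {'tuple': 5, 'list': 4, 'set': 3}
```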

Try it out 

<br>

In [None]:
# create a dictionary with 5 string keys and 5 float values

<br>

---
### Subsetting: Accessing elements of a `dict`

This is done using 
- the square brackets accessor `my_dict[key_1]`, or
- using the `.get()` method, which returns `None` (or a fallback value you supply) instead of raising a `KeyError` when the key is missing

```python
a_dict['key_2']
b_dict.get('a')
b_dict.get('c', 'Not found')
```
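The difference between the two accessors shows up when a key is missing. A minimal sketch, reusing the `b_dict` example from above:

```python
b_dict = {'a': 'a string', 'b': [1, 2, 3, 4]}

# Square brackets raise a KeyError when the key is missing
try:
    b_dict['c']
except KeyError as err:
    print('Missing key:', err)        # Missing key: 'c'

# .get() returns None, or the fallback you supply, instead of raising
missing = b_dict.get('c')
fallback = b_dict.get('c', 'Not found')
print(missing)                        # None
print(fallback)                       # Not found
```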

Try these out below

<br>

In [None]:
# dict subsetting

<br>

---
### `dict` methods

Dictionaries have the following important methods

- `.keys()` returns a view of the keys (wrap it in `list()` to get a list)
- `.values()` returns a view of the values
- `.items()` returns a view of the key-value pairs (as tuples)
- `.get(key, robj)` returns the value for the key provided. If the key does not exist, it returns the fallback object `robj` (default: `None`)
- `.pop(key)` removes the key-value pair for the key provided and returns its value
- `.update(other_dict)` merges `other_dict` into the dictionary in place and returns `None`
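The sketch below runs these methods on a small made-up dictionary called `stock`. The results of `.keys()`, `.values()` and `.items()` are wrapped in `list()` purely so that they print as plain lists.

```python
stock = {'apples': 3, 'pears': 5}

print(list(stock.keys()))     # ['apples', 'pears']
print(list(stock.values()))   # [3, 5]
print(list(stock.items()))    # [('apples', 3), ('pears', 5)]

removed = stock.pop('pears')  # removes the pair and returns its value
print(removed, stock)         # 5 {'apples': 3}

stock.update({'apples': 10, 'plums': 7})  # overwrites 'apples', adds 'plums'
print(stock)                  # {'apples': 10, 'plums': 7}
```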

### Task: Create a dictionary and experiment with these methods

<br>

In [None]:
# dict methods

<br> 

---
### Functions that work on `dict` objects

- Check if a `dict` contains a key using the `in` keyword
  - Example `'d' in b_dict`
  - This search is faster than searching a list or tuple because dictionary keys are *hashed*
- The `del` keyword is used to delete the key-value pair associated with the passed key
  - Example: `del a_dict['key_1']`
  
<br>

In [None]:
# in, del keywords

<br> 

---
### `dict` coding practice

- Change the type of each cell to code and run it
- Try to predict what will happen, comment and explain what the code does
 
<br>

<br>

---
### Task: Diner Menu

Using the data below

```python
food = ['ham', 'eggs', 'bacon', 'coffee', 'toast', 'jam']
price = [2, 0.5, 1, 1, 0.2, 0.1]
```

- Create a dictionary called `Diner`, using the food items as keys and their prices as values
- Find out if 'pancakes' is available at the Diner. If not, add it to the menu for USD 4
- Poultry prices have gone up. Change the price of 'eggs' to USD 3.40
- We are no longer serving pork products. Get rid of bacon.
- Use a for loop to print the entire menu.

<br>

<br> 

---
## Sets `set`


- A `set` is an *unordered* collection of **unique** elements
- They can be thought of as dicts that have keys only and no values
- Quite useful for set-theory operations such as *union, intersection, difference*


---
### Creating `set` objects

A set can be created in two ways:

- By calling the function `set()` on an iterable
- Using the curly braces operator, e.g. `{1, 2, 3}` (note that empty braces `{}` create a `dict`, not a `set`)

```python
In : set_1 = set([2, 2, 2, 1, 3, 3])
In : print(set_1)
Out: {1, 2, 3}
```
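A short sketch of the curly-brace form and of one common pitfall with empty braces. The variable names are illustrative, and `sorted()` is used only to print the elements in a predictable order.

```python
set_2 = {2, 2, 1, 3}        # duplicates are dropped
print(sorted(set_2))        # [1, 2, 3]

empty_set = set()           # the way to create an empty set
not_a_set = {}              # empty braces create an empty dict
print(type(empty_set), type(not_a_set))

letters = set('forests')    # set() works on any iterable, including strings
print(sorted(letters))      # ['e', 'f', 'o', 'r', 's', 't']
```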

---
### `set` Methods

Sets support mathematical set operations like union, intersection, difference, and symmetric difference.

- `set_a.union(set_b)` returns a collection of unique elements from sets a and b
- `set_a.intersection(set_b)` returns a collection of unique common elements from sets a and b
- `set_a.difference(set_b)` returns a collection of unique elements from set a that are not in b

Examples

```python
In:
setx = {1, 2, 3, 4}
sety = {3, 4, 5, 6}
print(setx.union(sety))
print(setx.intersection(sety))
print(setx.difference(sety))

Out:
{1, 2, 3, 4, 5, 6}
{3, 4}
{1, 2}
```
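Symmetric difference is mentioned above but not shown, and each of these methods also has an operator shorthand. A brief sketch using the same two sets:

```python
setx = {1, 2, 3, 4}
sety = {3, 4, 5, 6}

# Symmetric difference: elements that appear in exactly one of the two sets
sym_diff = setx.symmetric_difference(sety)
print(sorted(sym_diff))                          # [1, 2, 5, 6]

# Each method has an operator shorthand that gives the same result
print((setx | sety) == setx.union(sety))         # True
print((setx & sety) == setx.intersection(sety))  # True
print((setx - sety) == setx.difference(sety))    # True
print((setx ^ sety) == sym_diff)                 # True
```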

Try these operations out below

<br>

In [None]:
# set operations practice

<br> 

---
#### Task: Explore `set` methods 

- Find 3 methods not listed above
- What is their function?

Use the cell below

<br>

In [None]:
# set methods for updating / deleting

<br>

---
### Searching for elements in a `set`

- The `in` keyword can be used to find if an item belongs to a collection
- However, not all collections are equally performant for this operation
  - A `list` is searched using a linear scan - this becomes slower for longer lists
  - A `set` or `dict` is searched based on *hashes*, meaning the search is completed in roughly constant time on average, regardless of size

```python
In:
list_1 = list(range(10_000))  # a real list; a Python 3 range object would not be scanned linearly
set_1 = set(list_1)

In: %timeit 1123 in list_1
Out: 100000 loops, best of 3: 23.8 µs per loop

In: %timeit 1123 in set_1
Out: 100000 loops, best of 3: 194 ns per loop
```

A significant speedup! 
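The `%timeit` magic only works inside IPython or Jupyter. Outside a notebook, a comparable measurement can be sketched with the standard-library `timeit` module. The variable names and the choice of 1,000 repetitions are arbitrary, and the timings will differ from machine to machine.

```python
import timeit

numbers_list = list(range(10_000))
numbers_set = set(numbers_list)

# Time 1,000 membership tests against each container.
# 9_999 is the last element, which is the worst case for the list scan.
list_time = timeit.timeit(lambda: 9_999 in numbers_list, number=1_000)
set_time = timeit.timeit(lambda: 9_999 in numbers_set, number=1_000)

print(f"list: {list_time:.6f} s, set: {set_time:.6f} s")
```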

> Sets are most useful when we want to check if an item belongs to a long list,<br> for example, in checking if a URL is blacklisted or if an email exists in a database


<br>

Try the exercise below

In [None]:
# test speed of finding an element in a tuple vs. a dict

<br> 

---
### `set` coding practice

- Change the type of each cell to code and run it
- Try to predict what will happen, comment and explain what the code does
 
<br>

<br>

---
## Functions `zip()` and `enumerate()` 

- These functions come in handy when dealing with Python collections.
- They take in iterable collections as input and produce iterators as output
- `zip` produces tuples containing corresponding elements from each iterable
    - it stops at the end of the shortest iterable provided
- `enumerate` yields each element of the iterable paired with a counter (which starts at zero by default)

```python
# Syntax
zip(iterable_1, iterable_2, iterable_3 ... iterable_n)
enumerate(iterable_1)

# Examples
In : list(zip([1, 2, 3], (4, 5, 6), ['a', 'b', 'c']))
Out: [(1, 4, 'a'), (2, 5, 'b'), (3, 6, 'c')]   

In : enumerate(('one','two','three'))
Out: <enumerate at 0x10e527500>

In : list(enumerate(('one','two','three')))
Out: [(0, 'one'), (1, 'two'), (2, 'three')]
```
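Both functions are most often used directly in `for` loops. The sketch below uses made-up data and deliberately gives `zip` inputs of unequal length to show where it stops.

```python
cities = ['Pune', 'Oslo', 'Lima']
temps = [31, 4, 19, 25]          # one extra value on purpose

# zip stops at the shortest input, so the trailing 25 is ignored
pairs = list(zip(cities, temps))
print(pairs)    # [('Pune', 31), ('Oslo', 4), ('Lima', 19)]

# enumerate supplies a running counter; start=1 changes the first index
numbered = []
for position, city in enumerate(cities, start=1):
    numbered.append((position, city))
    print(position, city)
```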

Run the examples below

<br>

<br>

----
## Comprehensions

Comprehensions are Pythonic patterns that condense loops and if-then routines in a single line.<br>
They can be seen as a rearrangement of code written for a for-loop + if-else

Used for

- Creating new lists and dictionaries
- Filtering existing lists and dictionaries

These are extremely powerful programming patterns that express complex logic in a short expression.
When writing code for data science, we frequently employ these.

**Syntax**

```python
# List comprehension to filter elements from an iterable 
[f(x) for x in iterable if condition] 

# List comprehension to conditionally apply functions to elements
[f(x) if condition else g(x) for x in iterable]

# Dictionary comprehension to create a dict from filtered iterables
{k:v for k,v in zip/enumerate(iterable(s)) if condition}

```
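To make the three templates concrete, here is one runnable instance of each. The list `words` and the four-character threshold are arbitrary choices for illustration.

```python
words = ['tuple', 'list', 'dict', 'set', 'comprehension']

# Filter: keep only the short words, upper-cased
short_upper = [w.upper() for w in words if len(w) <= 4]
print(short_upper)      # ['LIST', 'DICT', 'SET']

# Conditional expression: label every word instead of dropping any
labels = ['short' if len(w) <= 4 else 'long' for w in words]
print(labels)           # ['long', 'short', 'short', 'short', 'long']

# Dictionary comprehension over enumerate: position -> word, even positions only
even_positions = {i: w for i, w in enumerate(words) if i % 2 == 0}
print(even_positions)   # {0: 'tuple', 2: 'dict', 4: 'comprehension'}
```

Note where the condition sits: a trailing `if` filters elements out, while an `if ... else` before the `for` chooses between two values and keeps every element.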

**Examples**

We want to create a list using squares of even numbers and cubes of odd numbers <br>
First, *without* comprehension

```python
list_1 = []
for i in range(11):
    if i % 2 == 0:
        list_1.append(i**2)
    else:
        list_1.append(i**3)
        
print(list_1)
```    

Now, using a list comprehension

```python
list_1 = [i**2 if i % 2 == 0 else i**3 for i in range(11)]

print(list_1)
# [0, 1, 4, 27, 16, 125, 36, 343, 64, 729, 100]
```

### How fast are comprehensions?

- Run the code below and find out

In [1]:
list_1 = []
for i in range(11):
    if i % 2 == 0:
        list_1.append(i**2)
    else:
        list_1.append(i**3)
        
print(list_1)

[0, 1, 4, 27, 16, 125, 36, 343, 64, 729, 100]


In [2]:
%%timeit

sqrs = []
for i in range(10**3):
    if i % 2 == 0:
        sqrs.append(i**2)

109 μs ± 5.82 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


In [3]:
%%timeit
sqrs = [i**2 for i in range(10**3) if i % 2 == 0]

94.3 μs ± 4.89 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


<br>

---
#### Task: What is the following comprehension doing?

```python
alphabets = list('abcdefghijklmnouvwxyz')
[a.upper() for a in alphabets if a not in set('aeiou')]
```

<br>

Try it out below

In [4]:
# task - alphabets
alphabets = list('abcdefghijklmnouvwxyz')
[a.upper() for a in alphabets if a not in set('aeiou')]

['B',
 'C',
 'D',
 'F',
 'G',
 'H',
 'J',
 'K',
 'L',
 'M',
 'N',
 'V',
 'W',
 'X',
 'Y',
 'Z']

<br>

---
#### Task: Create a list of numbers from 1 to 100. Store the cubes of all odd numbers in a list called `odd_cubes`

<br>

<br>

---
#### Task: Perfect squares

- Create a list of numbers from 1 to 1000.  
- Create another list from it, called `perfect_squares`, that contains only the perfect squares.

<br>

In [None]:
# perfect squares

<br>

---
#### Task: Create a dictionary of the form {value: value squared}, for multiples of 5 under 51

<br>

In [None]:
# squares of multiples of 5

<br>

---
#### Task: Create a dictionary with the counts for vowels (vowel: count) 

- Use the sentence 'a quick brown fox jumps over the lazy dog'

<br>

In [None]:
# count vowels

<br>

----
## Functions


Simply put, you can think of functions as groups of code that have a name and can be called using parentheses. 

But in practice, functions are perhaps the most powerful and convenient way of accomplishing two critical tasks:

- code organization (think about your project as being composed of modules, bundle up code in functions for each)
- code reuse (functionality that you will use in multiple places)

As a direct extension of the famous DRY principle (Don't Repeat Yourself), if you ever find yourself copy-pasting code
written earlier in a script to do a similar task, stop and write a function. As a guiding principle, take a leaf from the UNIX
philosophy, and

> Write short functions that do one thing, but do it really well.

Every function in Python is characterized by the following steps:  

1. Take some argument(s)
2. Flow it through the body of the function
3. Return object(s)  

---
**Syntax**

```python
# Define function
def function_name(parameters):
    """Docstring"""
    # ... function body ...
    return something

# Call function
result = function_name(arguments)
```

**Example**

```python
def add_one(num):
    """
    This function takes a number, and increases it by 1
    Returns the increased value
    """
    return num + 1

add_one(99)
100
```

<br>

#### Task: Write a function that takes two numbers X and Y and returns X/Y

In [None]:
# function division

---
### `args` and `kwargs`

- Functions in Python can take positional and keyword arguments. 
    - Note that when passing both kinds of arguments to a function, the positional arguments must be listed first. 
- By convention, `*args` in a function definition collects any extra positional arguments into a tuple, and `**kwargs` collects any extra keyword arguments into a dict.
- It is also possible to declare default arguments, which take the values specified in the function's definition unless overridden by explicitly passing arguments.
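A brief sketch of default and keyword arguments. The function `power` is a hypothetical example used only for illustration.

```python
def power(base, exponent=2):
    """Raise base to exponent; exponent defaults to 2 (hypothetical example)."""
    return base ** exponent

print(power(3))                   # 9   -> default exponent used
print(power(3, 3))                # 27  -> default overridden positionally
print(power(3, exponent=4))       # 81  -> default overridden by keyword
print(power(exponent=3, base=2))  # 8   -> keyword arguments may come in any order
```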

**Example - args**

```python
import numpy as np

def any_adder(*args):
    """
    args passed will be summed together
    This is a flexible function
    """
    return np.sum(args)

any_adder(1, 2, 3, 4)
10

any_adder(10, 78, -234, 99)
-47
```

**Example - kwargs**

```python
def adder(m, n, *args, **kwargs):
    # args collects the extra positional arguments; kwargs collects the keyword arguments
    print("the first positional arg is", args[0])
    print("Name is", kwargs['name'])
    return m + n

In : adder(998, 2, 10, 20, 30, 40, 50, name='Dush', car='None', color='Red')

Out: 
the first positional arg is 10
Name is Dush
1000
```

In [None]:
# practice args and kwargs

<br> 

---
## Functions are Objects 

You can 

- check their type
- put them in a list or dict
- return a function from a function

Run the examples below 

<br>

In [10]:
def add_one(x):
    return x+1

type(add_one)

function

In [12]:
def add_ten(x):
    return x+10

type(add_ten)

list_of_funcs = [add_one, add_ten]

print(list_of_funcs[0](99))
print(list_of_funcs[1](90))

100
100
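The other two points in the list can be sketched in the same way. Both `make_adder` and the `operations` dictionary are hypothetical names chosen for this illustration.

```python
def make_adder(n):
    """Return a new function that adds n to its argument (hypothetical helper)."""
    def adder(x):
        return x + n
    return adder

# A function returned from a function can be called like any other
add_five = make_adder(5)
print(add_five(10))                       # 15

# Functions can also be stored as dictionary values and looked up by key
operations = {'smallest': min, 'biggest': max, 'total': sum}
print(operations['total']([1, 2, 3]))     # 6
print(operations['biggest']([1, 2, 3]))   # 3
```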


<br> 

---
## Error Handling

- Performed using the `try`/`except` construct
- First, Python tries to run the code in the `try` block
- If it fails (raises an exception), the code in the `except` block is run

This is extremely helpful when writing functions to avoid the interruption of program execution due to bad input or any other factor. 

It allows us to 'catch' the problematic inputs.

Run the example below

<br>

In [None]:
def div_by(a, b):
    try:
        return a/float(b)
    except (TypeError, ValueError, ZeroDivisionError):  # name the expected errors instead of a bare except
        return 'Invalid Input'
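A variation on the same idea, sketched with a hypothetical helper called `to_number`. Naming the exception types in the `except` clause, and binding the exception with `as`, makes it possible to report what went wrong while leaving unrelated errors uncaught.

```python
def to_number(text):
    """Convert text to a float, or return None if it cannot be converted."""
    try:
        return float(text)
    except (TypeError, ValueError) as err:
        # Only the errors we expect from float() are caught here
        print("Could not convert", repr(text), "->", type(err).__name__)
        return None

print(to_number("3.5"))    # 3.5
print(to_number("abc"))    # reports a ValueError, then prints None
print(to_number(None))     # reports a TypeError, then prints None
```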

<br>

---
#### Task: Write a function to catch invalid input

- Your function should take 2 integers as input and return their product
- If an invalid argument is passed, such as boolean or string - store it in a separate list

<br>

In [None]:
# function to filter invalid inputs

<br>

---

## Lambda Functions

Lambda functions are a special category of functions that

- do not have a name
- are temporary in nature and intent
- are employed for one-off, use-and-throw-away functionality

These are used extensively in Data Science with Pandas and scikit-learn

<br>

In [None]:
def squarer(x):
    """
    This function takes a number and returns its square
    """
    return x**2

In [None]:
squarer(5)

In [None]:
squarer = lambda x: x**2

In [None]:
squarer(5)
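In practice, a lambda is usually passed straight to another function rather than assigned to a name. A minimal sketch using only built-ins, with made-up data in `people`:

```python
people = [('Asha', 31), ('Ben', 25), ('Chloe', 28)]

# Sort by the second element of each tuple (the age)
by_age = sorted(people, key=lambda person: person[1])
print(by_age)       # [('Ben', 25), ('Chloe', 28), ('Asha', 31)]

# Apply a throwaway function to every element with map()
squares = list(map(lambda x: x**2, [1, 2, 3]))
print(squares)      # [1, 4, 9]
```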