# Data Structures in Python

## Data types

Python supports many data types. Some of them are common among other programming languages and some of them are native to Python.
The list of data types is given in the picture below, and we discuss all the data types in detail, throughout this notebook.

<img src="../../../images/python_data_structures.png" style="width: 650px;">

### Initializing Variables

Python variables do not need declaration or specification of data type. For example,

```python
a = 5
b = 5.0
c = "string"
```
are all correct assignments.
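
To check which type Python inferred for each value, the built-in `type()` function can be used. The snippet below is a minimal sketch that reuses the three variables from above and then re-assigns one of them:

```python
a = 5
b = 5.0
c = "string"

# type() reports the type Python inferred from each assigned value
print(type(a))  # <class 'int'>
print(type(b))  # <class 'float'>
print(type(c))  # <class 'str'>

# Re-assigning a name to a value of another type is allowed
a = "now a string"
print(type(a))  # <class 'str'>
```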

The same value can also be assigned to several variables in a single chained statement, as shown below:

```python
a = b = c = 5
```

<b>Let us begin coding with our first exercise.</b>


### Exercise

Initialize and print 
* a) an integer variable (x) with value 4
* b) a floating point variable (y) with value 6.78
* c) a string variable (z) with value 'Hello'
* d) re-assign all three variables with a string 'I love python' and print all three variables.

The objective of this task is to show how easy it is to re-assign a variable in Python. No type declaration or cast is needed: Python infers the data type from the value that is assigned. For example, after x='Hello' the name 'x' refers to a string; if 'x' is then re-assigned with x=12.6, the same name refers to a float.

Hint: use only the variable names x, y and z in your solution. The exercise is checked against these names, so other names might not give the right results.

## Solution

```python
x = 4
y = 6.78
z = 'Hello'

print (x, y, z)

# Re-assigning all three variables
x = y = z = 'I love python'
print (x, y, z)

```

### Format specifiers in Python

A format specifier is a placeholder inside a string: it marks where a value should appear, and the value itself is passed separately. It is a short set of symbols that describes the format in which the value is to be printed. The format specifiers for the common data types in Python are similar to those of other programming languages:

* %d for integer
* %f for float
* %s for string
* %r for raw object using 'repr' method
* %x for hex code

We can use the format specifiers as placeholders in a string passed to print(), and pass the value of a specific variable using the '%' operator. Multiple variables/values can be passed by enclosing them in parentheses and separating them by commas.

An example:

```python
a = 10
b = 5

print("I want to print %d and %d in the same sentence"%(a,b))

>>> I want to print 10 and 5 in the same sentence
```

Another example:

```python
a = 10.264725
b = "a string type"

print("I want to print %f and %s in the same sentence"%(a,b))

>>> I want to print 10.264725 and a string type in the same sentence
```
As an alternative to the '%' operator, we can use the string method format(), with the placeholders written inside curly braces.

```python
a = 10
b = 5

print("I want to print {:d} and {:d} in the same sentence".format(a,b))

>>> I want to print 10 and 5 in the same sentence
```

```python
a = 10.264725
b = "a string type"

print("I want to print {:f} and {:s} in the same sentence".format(a,b))

>>> I want to print 10.264725 and a string type in the same sentence
```

#### Rounding off a float and adding preceding zeroes

In the above example, where we had a float value, we may want to restrict the number of decimals and print a rounded value. We may also want to add preceding zeroes to the number, for instance when the value a is a single observation in a larger pool of numbers that have 3 digits before the decimal point.
We can add preceding zeroes and round off float values by specifying the formatting after the '%' symbol (or after the ':' when using format()) and before the 'f' symbol of the format specifier.
* Use a period symbol ('.') to separate the total length from the number of decimals
* For preceding zeroes, add the minimum total length of the number (counting all digits and the decimal point) before the period symbol. Before this number, add a '0' to denote that unused positions on the left are to be filled with zeroes instead of blanks.
* For rounding off decimals, add the number of decimal places you want right after the period symbol and before the 'f' symbol

<img src="../../../images/format_specifiers.png">

A few examples:
```python
a = 10.264725 

print('''Variations of a floating point:
1. {:.2f}
2. {:.3f}
3. {:4.4f}
4. {:04.4f}
5. {:8.4f}
6. {:08.4f}
7. {:010.4f}'''.format(a,a,a,a,a,a,a))

>>> Variations of a floating point:
>>> 1. 10.26
>>> 2. 10.265
>>> 3. 10.2647
>>> 4. 10.2647
>>> 5.  10.2647
>>> 6. 010.2647
>>> 7. 00010.2647
```

<b>Important note:</b> In the above example '{:4.4f}' and '{:04.4f}' give the same output, in which we cannot see any padding (neither blanks nor zeroes). The number before the period is the minimum total length of the output, counting the digits on both sides of the decimal point and the decimal point itself. The 4 digits after the decimal point are produced first, so the value is rounded to '10.2647', which is 7 characters long.
Only then is the length checked. Since 7 characters already exceed the requested minimum of 4, no padding blanks or zeroes are added. Padding appears only when the requested length is larger than the formatted number, as with '{:8.4f}' and '{:08.4f}'.
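
The width in a format specifier acts as a minimum, not a maximum. The short sketch below (the variable names are only illustrative) formats the same value with three specifiers and prints the length of each result, to show when padding appears:

```python
a = 10.264725

no_padding = "{:4.4f}".format(a)     # needs 7 characters, so width 4 is already exceeded
space_padded = "{:8.4f}".format(a)   # one leading blank is added to reach width 8
zero_padded = "{:08.4f}".format(a)   # one leading zero is added to reach width 8

# repr() makes leading blanks visible in the printed result
for text in (no_padding, space_padded, zero_padded):
    print(repr(text), len(text))
```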

For more details on formatting refer to: https://pyformat.info/

#### Exercise

Given variables - 
* a = 5
* b = 134.264262
* c = "Hello! How are you?"

Print the following using appropriate format specifiers
* the integer and a floating point equivalent of a
* the value of b, up to one decimal place and padded with two preceding zeroes
* the string c, truncated up to first 10 characters

## Solution

```python
a = 5
b = 134.264262
c = "Hello! How are you?"

print('''The answers are:
1. {:d} and {:f}
2. {:07.1f}
3. {:.10s}'''.format(a,a,b,c))
```

## Variable Operators

Operators are the constructs which can manipulate the value of operands/variables.

### Arithmetic Operators

The standard arithmetic operators in Python are:

* \+ for addition - Adds the values of two variables
* \- for subtraction - Subtracts the value of right hand side variable from left hand side variable
* \* for multiplication - Multiplies the values of two variables with each other
* / for division - Divides the value of left hand side variable (called Dividend) by the value of right hand side variable (called Divisor) and returns the 'Quotient', which in Python 3 is always a float (10/5 gives 2.0)
* \*\* for exponentials - Raises the number on the left hand side to the power of the number on the right hand side
* % for modulus - Similar to the division operator, but returns the remainder of the division rather than the quotient
* @ for matrix multiplication - Multiplies two matrices, such as the NumPy arrays used in the example below

A few examples to understand output of each arithmetic operator:

```python
# Importing numpy library to create n-dimensional arrays for matrix multiplication. We will learn more about this later
import numpy as np

# Creating simple variables
a = 10
b = 5

# Creating arrays for matrix multiplication
c = np.array([[1,2,3],[4,5,6]])
d = np.array([[1,0,0],[0,1,0],[0,0,1]]) # This is identity matrix. We will learn about this in Math module

# Simple operations
sum_ab = a + b
diff_ab = a - b
prod_ab = a*b
div_ab = a/b
rem_ab = a%b
exp_ab = a**b

# Matrix multiplication
mat_mul = c@d

print('''The sum of a and b is: %.2f
The difference of a and b is: %.2f
The product of a and b is: %.2f
The quotient of a divided by b is: %.2f
The remainder of a divided by b is: %.2f
The exponential of a to the power of b is: %.2f
The matrix product of c and d is:'''%(sum_ab,diff_ab,prod_ab,div_ab,rem_ab,exp_ab),mat_mul)

# Output
>>> The sum of a and b is: 15.00
>>> The difference of a and b is: 5.00
>>> The product of a and b is: 50.00
>>> The quotient of a divided by b is: 2.00
>>> The remainder of a divided by b is: 0.00
>>> The exponential of a to the power of b is: 100000.00
>>> The matrix product of c and d is: [[1 2 3]
>>>  [4 5 6]]
```

As the name suggests, the arithmetic operators generally work on numeric values. The main exceptions are '+', which concatenates strings, lists and tuples, and '*', which repeats them when the other operand is an integer. Table of supported data structures:
<img src="../../../images/arith_support.png">
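
As a small illustration of '+' acting as concatenation, the sketch below joins two strings and two lists, and shows that mixing a string with a number raises an error unless the number is converted first. The variable names are made up for this example.

```python
greeting = "Hello, " + "world"      # '+' concatenates two strings
numbers = [1, 2] + [3, 4]           # '+' also concatenates two lists
print(greeting)
print(numbers)

try:
    result = "Total: " + 5          # str + int is not supported
except TypeError as err:
    print("TypeError:", err)
    result = "Total: " + str(5)     # convert the number explicitly

print(result)
```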

### Relational/Comparison Operators

The comparison operators are similar to most other programming languages with the commonly known symbols:
* \> for greater than
* < for less than
* = for assignment (not a comparison operator; listed here only to contrast it with ==)
* == for equal to
* \>= for greater than or equal to
* <= for less than or equal to
* != for not equal to

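The relational/comparison operators are most often used to compare numeric values (integer and float). Strings can be compared as well. The comparison is then made character by character, based on the Unicode code point of each character, so all uppercase ASCII letters sort before lowercase ones ('Z' < 'a' is True), which can be surprising.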

The operators return a boolean value 'True' or 'False', which can also be presented numerically as 1 for True and 0 for False.

An example:
```python
# Creating simple variables
a = 10
b = 5

# Simple operations
greater_ab = a>b
lesser_ab = a<b
gr_or_eq_ab = a>=b
le_or_eq_ab = a<=b
eq_to_ab = a==b
not_eq_to_ab = a!=b

print('''Is a greater than b: %r
Is a less than b: %r
Is a greater than or equal to b: %r
Is a less than or equal to b: %r
Is a equal to b: %r
Is a not equal to b: %r'''%(greater_ab,lesser_ab,gr_or_eq_ab,le_or_eq_ab,eq_to_ab,not_eq_to_ab))

#Output
>>> Is a greater than b: True
>>> Is a less than b: False
>>> Is a greater than or equal to b: True
>>> Is a less than or equal to b: False
>>> Is a equal to b: False
>>> Is a not equal to b: True
```
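
Because True and False behave as 1 and 0 in arithmetic, comparison results can be converted to integers or counted. A minimal sketch, reusing a and b from the example above (the names `is_greater` and `checks` are illustrative):

```python
a = 10
b = 5

is_greater = a > b
print(is_greater, int(is_greater))   # True 1
print(a < b, int(a < b))             # False 0

# Because True counts as 1, the number of conditions that hold can be summed
checks = [a > b, a == b, a != b]
print(sum(checks))                   # 2
```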

Another example:

```python
# Comparing 2 values:

a = 5 
b = 15

cond_1 = a > b
print('Is a greater than b: {}'.format(cond_1))

cond_2 = a == b
print('Are a and b equal: {}'.format(cond_2))

cond_3 = a < b
print('Is a smaller than b: {}'.format(cond_3))

cond_4 = a != b
print('Are a and b unequal: {}'.format(cond_4))

# Output
>>> Is a greater than b: False
>>> Are a and b equal: False
>>> Is a smaller than b: True
>>> Are a and b unequal: True
```

### Exercise

Given a=34 and b=57, verify the following conditions:
* Is the total of a and b equal to b?
* Is the total of a and b greater than a?
* Is the total of a and 23 less than or equal to b?
* Is the total of a and 23 equal to a?
* Is the product of a and 18, divided by 18, not equal to a itself?



### Solution code

```python

a = 34
b = 57

ans = a + b
cond_1 = ans == b
cond_2 = ans > a 

ans = a + 23
cond_3 = ans <= b
cond_4 = ans == a

ans = a * 18
cond_5 = (ans/18) != a

print('cond_1: {}'.format(cond_1))
print('cond_2: {}'.format(cond_2))
print('cond_3: {}'.format(cond_3))
print('cond_4: {}'.format(cond_4))
print('cond_5: {}'.format(cond_5))


```

### Membership Operator (in)

Python’s membership operators, 'in' and 'not in', test whether a value is present in a sequence, such as a string, list, or tuple.

For the code 'x in y', the result is True if x is an element of y or, when both are strings, if x is a substring of y:

```python
# list
a = [1,2.35,"Hello","cat",114]

# "\n" creates a new line while printing
print(" Is 15 in list a?", 15 in a, "\n", "Is 'cat' in list a?", 'cat' in a, "\n", "Is 'dog' in list a?", 'dog' in a, "\n", "Is 'H' in the word 'Hello'?", 'H' in 'Hello', "\n", "Is 'd' in the word 'cat'?", 'd' in 'cat')

# Output
>>> Is 15 in list a? False 
>>> Is 'cat' in list a? True 
>>> Is 'dog' in list a? False 
>>> Is 'H' in the word 'Hello'? True 
>>> Is 'd' in the word 'cat'? False
```

The 'in' keyword is also used in 'for' loops, where it steps through every element of a list, tuple or string, and the membership test itself is common in decision statements. We will see more usage of 'in' when we learn about loops and decision statements.
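
As a preview of both uses, here is a minimal sketch with the same list a as above. The variable `found` is introduced only for this illustration.

```python
a = [1, 2.35, "Hello", "cat", 114]

# 'in' as part of a for loop: visit every element in order
for item in a:
    print(item)

# 'in' as a membership test inside a decision statement
if "cat" in a:
    found = True
else:
    found = False
print("Found cat:", found)
```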

### Identity Operators (is) and the id() function

In order to check the identity of an object in Python, we use the id() function. When we pass a variable as an argument to the id() function, it returns an integer that is unique to the object for as long as the object exists. In CPython, the standard implementation, this integer is the memory location of the object.
Identity operators compare the identities of two Python objects. The 'is' operator returns True if both operands refer to the same object, and 'is not' returns True if they do not.

Two objects can contain the same value but can be different objects, stored at different memory locations. Here is an example which illustrates the 'is' operator:

```python
# Example 1
a = 5
b = 5
c = 5.0

print("Location of a:", id(a), "; Location of b:", id(b), "; Is a same as b?", a is b)
print("Location of a:", id(a), "; Location of c:", id(c), "; Is a same as c?", a is c)
print("Is a equal to b?", a==b, "\n", "Is a equal to c?", a==c)

# Output
>>> Location of a: 10919552 ; Location of b: 10919552 ; Is a same as b? True
>>> Location of a: 10919552 ; Location of c: 140045356024168 ; Is a same as c? False
>>> Is a equal to b? True
>>> Is a equal to c? True

# Example 2
a = 'This '
b = a + 'is a python tutorial'
c = 'This is a python tutorial'

b == c
>>> True
b is c
>>> False
```

<b>Note:</b> In the above example, the two variables 'a' and 'b' were initialized with the same value and both point to the same location. This is an optimization of CPython, which keeps a single shared object for each small integer (from -5 to 256) and may also share other immutable values such as short strings. It is an implementation detail and should not be relied upon. When operations on 'a' and 'b' produce new values, the names are re-bound to the objects holding those values, so the locations they reference change accordingly. Use '==' to compare values and reserve 'is' for identity checks.

```python
# Example 3
a = 5
b = 5
c = 5

# Before operating on a,b and c
print("Location of a:", id(a), "; Location of b:", id(b), "; Is a same as b?", a is b)
print("Location of a:", id(a), "; Location of c:", id(c), "; Is a same as c?", a is c)

# Manipulating a,b and c
a = 2*a
b = 2*b
c = 3*c

# After operating on a,b and c
print("Location of a:", id(a), "; Location of b:", id(b), "; Is a same as b?", a is b)
print("Location of a:", id(a), "; Location of c:", id(c), "; Is a same as c?", a is c)

# Output
>>> Location of a: 10919552 ; Location of b: 10919552 ; Is a same as b? True
>>> Location of a: 10919552 ; Location of c: 10919552 ; Is a same as c? True
>>> Location of a: 10919712 ; Location of b: 10919712 ; Is a same as b? True
>>> Location of a: 10919712 ; Location of c: 10919872 ; Is a same as c? False
```
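
Identity and equality are easiest to tell apart with mutable objects such as lists, because every list literal that is evaluated creates a new object. A minimal sketch (the names `x`, `y` and `z` are illustrative):

```python
x = [1, 2, 3]
y = [1, 2, 3]   # a second list with the same contents
z = x           # a second name for the same list object

print(x == y, x is y)   # True False
print(x == z, x is z)   # True True

z.append(4)             # changing the object through one name...
print(x)                # ...is visible through the other: [1, 2, 3, 4]
print(y)                # the separate list is unaffected: [1, 2, 3]
```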

For more on operators, refer to: https://docs.python.org/3/library/operator.html

#### Exercise

Given two variables - a="Hello" and b="Hello"
* Check if a is in b, i.e., is 'a' a substring of 'b'
* Check if a is equal to b. Also check if a is same as b, using the 'is' operator
* Verify above result by comparing the memory locations of a and b using the id() function

### Solution

```python
a = "Hello"
b = "Hello"

print("Is a in b?", a in b, "\n", "Is a equal to b?", a==b, "\n", "Is a same as b?", a is b)
print("Location of a:", id(a), "; Location of b:", id(b), "; Do a and b have same location?", id(a)==id(b))
```

## Python Lists

Python lists are ordered, changeable sequences, comparable to the arrays of other languages. For example:

fruits = ["Mango", "Banana", "Apple", "Orange"]

Python lists can have multiple types of objects. For example:

mixed_list = ["Apple", 3, 25.68, "This is a mixed list"]

### Exercise

Create a list vegetables with Tomato, Spinach, Asparagus, Lettuce and Jalapenos as the elements and print the list out

Hint: use 'vegetables' as the variable name of the list.


## Solution

```python

vegetables = ["Tomato", "Spinach", "Asparagus", "Lettuce", "Jalapenos"]
print (vegetables)

```

## Indexing in a list

You can access the objects in a list by their indices, which start at 0.

To access the first object:

```
>>> fruits[0]
'Mango'
```

To access the last object, especially when the list is long, it is convenient to use a negative index, which counts from the end of the list:

```
>>> fruits[-1]
'Orange'
```

A Python list is flexible in that it can contain objects of different types, unlike arrays in many other languages:

```
collection = ["cost", 100, "Apples", 5.0]
```
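
The sketch below pulls these indexing rules together on the fruits list from earlier. It also shows two things not demonstrated above: an index outside the list raises an IndexError, and an element can be replaced through its index.

```python
fruits = ["Mango", "Banana", "Apple", "Orange"]

print(fruits[0])                 # Mango
print(fruits[-1])                # Orange
print(fruits[len(fruits) - 1])   # Orange, the same element as fruits[-1]

# An index outside the list raises IndexError
try:
    fruits[10]
except IndexError as err:
    print("IndexError:", err)

# Lists are mutable, so an element can be replaced through its index
fruits[1] = "Grapes"
print(fruits)
```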

Ref: https://docs.python.org/3/tutorial/datastructures.html

<img src="https://s3.amazonaws.com/rfv2/list.png" style="width: 170px;">

### Exercise

* Retrieve the third element of the vegetables list created before and assign it to the variable veg_2
* Retrieve the last element using a negative index and assign it to the variable veg_1
* Print out the veg_1 and veg_2 variables

Hint: the index of a list starts with 0. You don't have to re-initialize the variable 'vegetables' here, as it was initialized in the previous exercise and stays available in memory throughout this lesson.

## Solution

```python

veg_2 = vegetables[2]
veg_1 = vegetables[-1]
print (veg_2, veg_1)

```

## Useful indexing tricks

Indexing in Python is powerful and helps retrieve elements of a list in more than just the traditional way. We have seen some simple indexing above. In this section we will look at slicing, which retrieves a range of elements and can skip elements at a regular interval.

### The colon and string reversal using indexing

In Python a string is a sequence of characters that supports the same indexing as a list. So string reversal works the same way as list reversal.

The colon plays an important part in indexing in Python.
* Using a single colon to separate two indices: The number on the left side of the colon is the lower index (inclusive) and the number on the right side of the colon is the upper index (exclusive). This notation retrieves all elements of the list starting from the lower index up to, but not including, the upper index.
* Using two colons to separate two indices and a step: A second colon can be used to separate the upper index from a step. The step is a number which determines the increment the index takes while traversing the list. By default this number is '1', so the index increases one place at a time, i.e., next index = previous index + step. An omitted lower index defaults to the start of the list and an omitted upper index to its end.

```python
# Single colon example

a = [1,2,3,4,5,6,8,10] # Defining the list
a[2:6] # 2 is lower index and 6 is upper index

>>> [3, 4, 5, 6] # Output

# Double colon example

a = [1,2,3,4,5,6,8,10] # Defining the list
a[2:6:2] # 2 is lower index, 6 is upper index and 2 is the step/increment in index

>>> [3, 5] # Output

# Another double colon example

a = [1,2,3,4,5,6,8,10] # Defining the list
a[::2] # lower index defaults to 0, upper index defaults to len(a) and 2 is the step/increment in index

>>> [1, 3, 5, 8] # Output

# Yet another example

a = [1,2,3,4,5,6,8,10] # Defining the list
a[1::2] # 1 is lower index, upper index defaults to len(a) and 2 is the step/increment in index

>>> [2, 4, 6, 10] # Output
```

Now, what would happen if the step were set to a negative number, say -1? The list would be traversed and its elements retrieved in reverse order. This is how we can perform a list/string reversal.

```python
# List reversal example

a = [1,2,3,4,5,6,8,10] # Defining the list
a[::-1] # the step is -1, so the list is traversed from the last element back to the first

>>> [10, 8, 6, 5, 4, 3, 2, 1] # Output
```
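
The same slicing notation applies to strings. A minimal sketch, using an arbitrary example word:

```python
word = "python"

reversed_word = word[::-1]   # a step of -1 walks the string backwards
first_three = word[:3]       # an omitted lower index defaults to the start
every_other = word[::2]      # characters at indices 0, 2 and 4

print(reversed_word)   # nohtyp
print(first_three)     # pyt
print(every_other)     # pto
```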

#### Exercise

Given two lists: a = [36,21,6,23,77,14,7,24,4,13] and b = [48,14,1,3,63,23,24,53,21,59]
* Starting from the first element of list a, retrieve every third element of the list. Print the result.
* Starting from the first element of list b, select every alternate element of the list, in reverse order. Print the result.
* Retrieve every alternate element of list a, starting from the first element, and every alternate element of list b, starting from the second element. Subtract the selected elements of list b from the selected elements of list a, position by position, and print both the lists.

### Solution code

```python
a = [36,21,6,23,77,14,7,24,4,13]
print(a[::3])

b = [48,14,1,3,63,23,24,53,21,59]
print(b[::-2])


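a_alt = a[::2]    # every alternate element of a, starting from the first
b_alt = b[1::2]   # every alternate element of b, starting from the second
print(a_alt,"\n",b_alt)

# Difference of the two selections, position by position
print([x - y for x, y in zip(a_alt, b_alt)])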
```

## List As Stack

<img src="https://s3.amazonaws.com/rfv2/stack.png" style="width: 350px;">
A list in python can be used as a stack and is flexible to support stack operations such as push and pop.

#### Push

Use the .append() method to push an object onto the end (the top) of a list.

#### Pop

Use the .pop() method to pop the last object off a list. It removes the object from the list and returns it.
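
A minimal sketch of both operations on a small list. The name `stack` and its contents are illustrative.

```python
stack = []

stack.append("a")   # push
stack.append("b")
stack.append("c")
print(stack)        # ['a', 'b', 'c']

top = stack.pop()   # pop removes and returns the last element pushed
print(top)          # c
print(stack)        # ['a', 'b']
```

The last element pushed is the first one popped, which is the Last-In-First-Out behaviour of a stack.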

### Exercise

* Use Stack operations on the vegetables list to first pop Jalapenos off the list, and then add Celery. Print the new list out.



Hint: re-initialize the variable 'vegetables' with Tomato, Spinach, Asparagus, Lettuce and Jalapenos at the beginning of this exercise, and do so again before each run. The push and pop operations change the data in the vegetables list, so running the same code multiple times without re-initializing gives different results, because the code then works on the changed list.


## Solution

```python

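vegetables = ["Tomato", "Spinach", "Asparagus", "Lettuce", "Jalapenos"]

vegetables.pop(-1)            # removes the last element, Jalapenos
vegetables.append("Celery")   # pushes Celery onto the end
print(vegetables)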

```

## List As Queue

A queue is a data structure similar to real-life queue systems. A queue follows the FIFO process, i.e. First-in-First-out, for insertion and deletion of elements: elements are added at the rear end and removed from the front end, so the element that has waited longest is always removed first. For example, when the first element is inserted in a queue, it is the only element present, so it is at both the front and the rear end. When the second element is inserted, it takes the rear end and the first element stays at the front end. From then on, every element that gets added joins at the rear end and the first element remains at the front end until it is deleted. After its deletion, the second element takes the front end.

Python lists can be used as queues as well. The collections module contains a class called 'deque'. It converts any list variable into a 'double-ended queue'. Once the variable is converted into a double-ended queue, elements can be added at the right end using the 'append' operation and deleted from the left end using the 'popleft' operation, which together give the FIFO (First-In-First-Out) order. The insertion of an element, or the enqueue (append) operation, is illustrated in the diagram below:

<img src="https://s3.amazonaws.com/rfv2/enqueue.png" style="width: 350px;">

The example code given below shows the 'popleft' operation on a deque created from a list.

from collections import deque

fruit_queue = deque(["Apples", "Oranges", "Mango"])
fruit_queue.append("Banana")        
fruit_queue.popleft()  

The output of the block of code given above is 'Apples'. The 'popleft' operation works by removing the first inserted element first (FIFO - First In, First Out), as shown below:

<img src="../../../images/deque.png" style="width: 350px;">
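
To see the FIFO order end to end, the sketch below empties the same fruit queue one element at a time. The list `served` is introduced only to record the order of removal.

```python
from collections import deque

queue = deque(["Apples", "Oranges", "Mango"])
queue.append("Banana")               # enqueue at the right end

served = []
while queue:                         # an empty deque evaluates as False
    served.append(queue.popleft())   # dequeue from the left end

print(served)   # ['Apples', 'Oranges', 'Mango', 'Banana']
```

The elements leave the queue in exactly the order in which they entered it.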

### Exercise

Convert the vegetables list into a queue with the name 'vegetables_queue' and pop the leftmost element. Print the remaining queue.

Hint: start from the vegetables list as it stands after the stack exercise (Jalapenos popped, Celery pushed), import deque from the collections module, and apply the operations shown in the code example of the 'List As Queue' lesson.


## Solution

```python

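from collections import deque

# The vegetables list as it stands after the stack exercise
vegetables = ["Tomato", "Spinach", "Asparagus", "Lettuce", "Celery"]

vegetables_queue = deque(vegetables)
vegetables_queue.popleft()   # removes the leftmost element, Tomato
print(vegetables_queue)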

```

### Tuples

A tuple is an immutable sequence of objects, i.e., once it is created, its elements cannot be added, removed or replaced. Tuples are <b>declared with parentheses</b>, whereas lists are declared with square brackets; the elements of both are accessed with square brackets. A tuple is an ordered sequence and is used for objects such as coordinates that contain a latitude and a longitude. Tuples can also be used to hold any data which you would want to be read-only.

Example: 

```python
loc = (30.456, 50.436)
atup = (12,35,123,"hello")

# printing the tuple and confirming the data type of variable as tuple
print(atup,type(atup))

# Checking whether hello is present in the tuple and retrieving 3rd element of the tuple
print('hello' in atup, atup[2])

# Output
>>> (12, 35, 123, 'hello') <class 'tuple'>
>>> True 123
```
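
To see the immutability in action, the sketch below tries to replace an element of atup. The attempt raises a TypeError and the tuple stays unchanged.

```python
atup = (12, 35, 123, "hello")

try:
    atup[0] = 99            # item assignment is not allowed on a tuple
except TypeError as err:
    print("TypeError:", err)

print(atup)                 # unchanged: (12, 35, 123, 'hello')
```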

#### Packing and Unpacking a tuple

Tuples can be accessed using the indices, just like a list, array or string.

```python
atup = (12,35,123,"hello")
print(atup[2],atup[:2])

# Output
>>> 123 (12, 35)
```
However, using indices we cannot access specific objects unless we know their position in the tuple.

One of the main uses of tuples is to store records of values that follow a fixed sequence of keys. Let's say we have a table of data with name, email id and address as three columns.
* The column names are the keys and each row, pertaining to a specific individual/entity is a set of values.
* Assume that we are expected to provide read access to this data but not allow manipulation of the same.
* Each record/row can then be defined as a tuple.

Packing a tuple means defining a tuple out of a specific set of values. For example:
```python
# This is called packing a tuple
Person_1 = ("John Doe", "john.doe@gmail.com", "31 Chandler Street, Phoenix, Arizona 02411")
```

Unpacking a tuple means assigning it to a sequence of labels (variable names) written on the left-hand side of the assignment. Each value in the tuple is attached, by position, to the corresponding label, so the number of labels must match the number of values. An example:
```python
# This is called unpacking a tuple
(name, email, address) = Person_1

print("The name of the person is {:s} and their email id is {:s}".format(name,email))

# Output
>>> The name of the person is John Doe and their email id is john.doe@gmail.com
```
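
Unpacking is especially handy when looping over several records, or when swapping two variables. The sketch below uses hypothetical records made up for this illustration, each packed as (name, email, address):

```python
# Hypothetical records, each packed as (name, email, address)
people = [
    ("John Doe", "john.doe@example.com", "31 Chandler Street"),
    ("Jane Roe", "jane.roe@example.com", "12 Main Street"),
]

names = []
for name, email, address in people:   # each tuple is unpacked per iteration
    print("{:s} can be reached at {:s}".format(name, email))
    names.append(name)

# Unpacking also allows two variables to be swapped in one statement
first, second = names
first, second = second, first
print(first, second)   # Jane Roe John Doe
```
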
#### Adding an element to a tuple

Though an existing tuple is immutable, we can combine it with another tuple using '+' to create a new tuple. To add a single object, wrap it in a one-element tuple, which is written with a trailing comma. The following examples show how to add an additional object to a tuple and how to add two tuples to create a new one.

```python
loc = (30.456, 50.436)
atup = (12,35,123,"hello")

# adding two tuples
loc = loc + atup
# adding a new object to atup
atup = atup + ("I can add an object",)

# printing the new tuples
print(loc,type(loc),"\n",atup,type(atup))

# Output
>>> (30.456, 50.436, 12, 35, 123, 'hello') <class 'tuple'>
>>> (12, 35, 123, 'hello', 'I can add an object') <class 'tuple'>
```
Ref: https://docs.python.org/3/c-api/tuple.html

#### Exercise

Create a tuple with the following objects:
* Your name
* Your email id
* Your mailing address

Once the tuple is initialized, add your phone number to the tuple. You may use string type for all objects.
* Print the tuple and also the type (using type()) to confirm that the object is indeed a tuple

### Solution code

```python
# Sample example solution

bio_data = ("John Doe", "john.doe@gmail.com", "31 Chandler Street, Phoenix, Arizona 02411")

# adding phone number
bio_data = bio_data + ("602-123-4567",)

# printing the new tuples
print(bio_data,type(bio_data))
```

### Sets

A set is an unordered <b>collection of unique objects</b>. Sets are collection types and are very useful for mathematicians, statisticians and data scientists, because they are equivalent to the mathematical definition of sets. Sets in Python are created using the function 'set()' or by listing the elements inside curly braces, as in {0, 2, 4, 6}. Any list or array type can also be converted into a set using the set() function.

Note that when a list containing multiple occurrences of a specific element is passed to a set() function, the resulting set contains only unique elements (single occurrence) and not all occurrences.

```python
# Example 1
even_number_set = set([0, 2, 4, 6])

# Example 2
list_A = [25,1,2,3,"Hello",2,4,3,2,4,7,"Hello"]

print(even_number_set, "\n", set(list_A))

# Output
>>> {0, 2, 4, 6}
>>> {'Hello', 1, 2, 3, 4, 7, 25}
```

Note that by definition a set is an <b>'unordered'</b> collection of objects. In the above output, the elements of set(list_A) happen to be printed in what looks like a sorted order. However, this is a side effect of how the elements are stored internally; the order is not guaranteed and may differ between runs. The set does not support indexing, as the elements/objects are not stored in any specific order.
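
The sketch below shows what happens when a set is indexed, and one way to obtain a defined order when it is needed: copying the elements into a sorted list. Only the integers are sorted here, because integers and strings cannot be compared with each other. The names `set_A` and `numbers` are illustrative.

```python
set_A = set([25, 1, 2, 3, "Hello", 2, 4, 3, 2, 4, 7, "Hello"])

try:
    set_A[0]                       # sets have no positions to index
except TypeError as err:
    print("TypeError:", err)

# To get a defined order, copy the integer elements into a sorted list
numbers = sorted(x for x in set_A if isinstance(x, int))
print(numbers)                     # [1, 2, 3, 4, 7, 25]
print(numbers[0])                  # 1
```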

<b>Using sets to reduce the search space:</b>
One way we can use sets is to check the existence of a certain object in a very large collection of frequently repeating objects. Converting the collection to a set keeps only the unique elements and thereby reduces its size drastically. In addition, the 'in' membership operator on a set looks the object up by its hash, so on average it does not need to go through the whole collection, whereas 'in' on a list compares the elements one by one.
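
A minimal sketch of this idea, using hypothetical data invented for the illustration: a long list with only three distinct values.

```python
# Hypothetical data: a long list with many repeated values
readings = [3, 7, 3, 9, 7, 3, 9, 9, 7, 3] * 1000

unique_readings = set(readings)

print(len(readings))          # 10000
print(len(unique_readings))   # 3

print(9 in unique_readings)   # True
print(5 in unique_readings)   # False
```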

Ref: https://docs.python.org/3/c-api/set.html

### Exercise

Convert the list given below into a set:
* a = [0.00000001,0.0000001,0.0000001,0.00000001,0.00000001,0.00000001,0.0000001,0.0000001,0.0000001,0.000001,0.00000001,0.0000001,0.00000001,0.00000000001,0.000000001,0.00000001,0.0000000001,0.000000001,0.00000000001,0.0000000000001,0.000000001,0.0000000000001,0.0000000001,0.0000000001,0.00000001,0.00000001,0.00000001,0.000000001,0.000000001,0.000000001,0.00000000000001,0.0000000000001]

* Print the lengths of both the list and the set in order to understand the maximum number of comparisons needed to search for an object within the list vs within the set

Expected output:

The list is [1e-08, 1e-07, 1e-07, 1e-08, 1e-08, 1e-08, 1e-07, 1e-07, 1e-07, 1e-06, 1e-08, 1e-07, 1e-08, 1e-11, 1e-09, 1e-08, 1e-10, 1e-09, 1e-11, 1e-13, 1e-09, 1e-13, 1e-10, 1e-10, 1e-08, 1e-08, 1e-08, 1e-09, 1e-09, 1e-09, 1e-14, 1e-13]
The length of list is: 32
The set is {1e-09, 1e-08, 1e-10, 1e-14, 1e-13, 1e-07, 1e-06, 1e-11}
The length of set is: 8


### Solution code

```python
# data
a = [0.00000001,0.0000001,0.0000001,0.00000001,0.00000001,0.00000001,0.0000001,0.0000001,0.0000001,0.000001,0.00000001,0.0000001,0.00000001,0.00000000001,0.000000001,0.00000001,0.0000000001,0.000000001,0.00000000001,0.0000000000001,0.000000001,0.0000000000001,0.0000000001,0.0000000001,0.00000001,0.00000001,0.00000001,0.000000001,0.000000001,0.000000001,0.00000000000001,0.0000000000001]

# Printing the list vs set
print('''The list is {}
The length of list is: {:d}
The set is {}
The length of set is: {:d}'''.format(a,len(a),set(a),len(set(a))))
```

### Dictionary

Dictionaries are hash maps containing key-value pairs. 
* Keys are unique identifiers which are similar to an index used to identify a specific value or set of values
* Values are said to be the observed attributes associated with a specific key
* Keys are unique whereas values need not be unique, i.e., two keys can have same value, but two keys can't be the same
* Values are accessed via the keys

Keys must be hashable, which in practice means immutable values such as numbers, strings or tuples of such values.

Example: A dictionary of city to state:
```python
location_dict = {'Boston': 'MA', 'Chicago': 'IL', 'New York': 'NY'}
```
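
The sketch below shows the most common operations on this dictionary: reading a value by its key, adding and overwriting entries, looking up a key that may be missing, and looping over all pairs. The extra cities are only illustrative.

```python
location_dict = {'Boston': 'MA', 'Chicago': 'IL', 'New York': 'NY'}

print(location_dict['Boston'])              # MA

location_dict['Phoenix'] = 'AZ'             # add a new key-value pair
location_dict['Boston'] = 'Massachusetts'   # overwrite the value of an existing key

# .get() returns a default instead of raising KeyError for a missing key
print(location_dict.get('Seattle', 'unknown'))   # unknown

for city, state in location_dict.items():
    print(city, '->', state)
```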

Ref: https://docs.python.org/3/c-api/dict.html

### Exercise

Build a location map that contains latitude and longitude of the following cities.

Boston: 42.318365, -71.086692<br>Chicago: 41.797568, -87.620958<br>New York: 40.685526, -73.887406<br><br>
Assign it to the variable location_map 

Print out the variable location_map

Hint: use a dictionary whose values are tuples.


## Solution

```python

location_map = {'Boston': (42.318365, -71.086692),
                'Chicago': (41.797568, -87.620958), 
                'New York': (40.685526, -73.887406)}
print(location_map)

```

### Nested Lists

In real-world data processing you may often need to write programs to store tabular data, such as a set of rows and columns. You may also have the need to create a list where each element of the list is in turn a list. This structure is called a list of lists or simply a nested list. A nested list could have multiple levels.  
A simple list of lists, where all elements are homogeneous (same data type) in nature, is similar to a two-dimensional array. In Python any table can be represented as a list of lists or nested list.

An example code on creation of a list of lists, is given below:
```
list_of_lists = [[1, 2, 3], [4, 5, 6]]
print(list_of_lists[0])
print(list_of_lists[1])
```
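
Individual elements of a nested list are reached with two indices, and nested loops visit every element. A minimal sketch on the same list (the variable `total` is illustrative):

```python
list_of_lists = [[1, 2, 3], [4, 5, 6]]

# The first index selects the row, the second index selects the column
print(list_of_lists[0][2])   # 3
print(list_of_lists[1][0])   # 4

# Nested loops visit every element, row by row
total = 0
for row in list_of_lists:
    for value in row:
        total += value
print(total)                 # 21
```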

Ref: https://docs.python.org/3.6/tutorial/datastructures.html#nested-list-comprehensions

#### Exercise

Given a list of lists matrix = [[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]], write a python program to transpose the input matrix and set the result in a variable transposed.

Expected value of transposed:

[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]



## Solution

```python

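matrix = [[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]

transposed = []
for i in range(len(matrix[0])):   # one output row per input column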
    transposed_row = []
    for row in matrix:
        transposed_row.append(row[i])
    transposed.append(transposed_row)

print(transposed)

```

## Nested Dictionaries

In Python, a nested dictionary is a dictionary inside a dictionary. It's a collection of dictionaries into one single dictionary.

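In the example below, the outer dictionary fruits_and_veggies has two keys, 'fruits' and 'vegetables', and the value of each key is itself a dictionary: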
```python
fruits_and_veggies = { 'fruits': {'apple': 1,'oranges': 3, 'grapes': 'lots'},
                'vegetables': {'tomato': 10,'potatoes': 'none', 'banana' : 'Not a veggie'}}

```
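
Values in a nested dictionary are reached by chaining the keys, outer key first. A minimal sketch on the same dictionary:

```python
fruits_and_veggies = {'fruits': {'apple': 1, 'oranges': 3, 'grapes': 'lots'},
                      'vegetables': {'tomato': 10, 'potatoes': 'none', 'banana': 'Not a veggie'}}

print(fruits_and_veggies['fruits']['apple'])        # 1
print(fruits_and_veggies['vegetables']['tomato'])   # 10

# Update a value inside an inner dictionary
fruits_and_veggies['fruits']['apple'] += 4
print(fruits_and_veggies['fruits']['apple'])        # 5

# Loop over the outer dictionary and list the keys of each inner one
for group, items in fruits_and_veggies.items():
    print(group, list(items.keys()))
```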

### Exercise

In the fruits_and_veggies dictionary, add another dictionary called 'juices' with the following item '{'pineappleJuice' : 10}'



## Solution

```python

fruits_and_veggies['juices'] = {'pineappleJuice' : 10}
print(fruits_and_veggies)

```