---

# Python Part 2. Lists, Tuples, and Sets

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/RandyRDavila/Data_Science_and_Machine_Learning_Spring_2022/blob/main/Lecture_1/Python_Part_2_Lists_Tuples_and_Sets.ipynb)



## 2.a. Lists

Python lists are ordered collections that can contain objects of any type, and a single list may mix types. For example, run the following code in the cell below:
```python
# List of numbers
num_list = [1, 3.5, 1.2, 2, 10, 6]
print(num_list)

# List of strings and numbers
num_str_list = [1, "one", 4, "five"]
print(num_str_list)

# Nested list
nested_list = [num_list, num_str_list]
print(nested_list)
```



---

---

Anytime you are dealing with sequential Python objects you may count the number of elements with the ```len()``` function. So if we would like to count the number of items in ```num_list``` we run the following code in the cell below:
```python
print(f"The number of items in {num_list} is {len(num_list)}")
```

---

---

## List Indexing


Since lists are *ordered* sequences of Python objects, each item in a list has a *location* specified by an index. In Python (as in most other programming languages) indexing starts at 0. For example, try running the following code in the cell below:
```python
sample_list = ["a", "b", "c", "d"]
print(f"{sample_list} has length {len(sample_list)}")
print(f"The 0 index element is: {sample_list[0]}")
print(f"The 1 index element is: {sample_list[1]}")
print(f"The 2 index element is: {sample_list[2]}")
print(f"The 3 index element is: {sample_list[3]}")
```


---

---

As you might have noticed in the above code, the final index is 3, which equals ```len(sample_list) - 1```. If we do not know the length of the list in question and would like to start indexing from the *end* of the list, we can use negative indices. Run the following code in the cell below:
```python
print(f"{sample_list} has length {len(sample_list)}")
print(f"The -1 index element is: {sample_list[-1]}")
print(f"The -2 index element is: {sample_list[-2]}")
print(f"The -3 index element is: {sample_list[-3]}")
print(f"The -4 index element is: {sample_list[-4]}")

```

**What do you observe?** What do you notice about the starting and ending values of the indices when starting from the end of the list?
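
One way to check your answer, as a small sketch: for a list of length `n`, index `-k` refers to the same element as index `n - k`. The names `letters` and `n` are illustrative and not part of the lesson code.

```python
letters = ["a", "b", "c", "d"]
n = len(letters)

# Index -k and index n - k refer to the same element for k = 1, ..., n.
for k in range(1, n + 1):
    print(f"letters[{-k}] = {letters[-k]!r} and letters[{n - k}] = {letters[n - k]!r}")

# Going past either end of the list raises an IndexError.
try:
    letters[-5]
except IndexError as error:
    print(f"IndexError: {error}")
```

So negative indices run from `-1` (the last element) down to `-len(letters)` (the first element).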


---

---

## List Slicing

Anytime you can index through an object in Python you can also *slice* through the object. For example, try running the following code in the cell below:
```python
print(f"sample_list = {sample_list} \n")
print(f"sample_list[1:4] = {sample_list[1:4]} \n")
```
**Note.** The syntax ```sample_list[a:b]``` means every entry of your ordered and iterable ```sample_list``` starting at index ```a``` and ending at index ```b-1```.
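
Slices also allow the start or stop to be omitted, and accept an optional third value (the step). The following is a minimal sketch of those variations; the list `letters` and the variable names are made up for illustration.

```python
letters = ["a", "b", "c", "d", "e", "f"]

first_three = letters[:3]      # start omitted: begins at index 0
from_index_2 = letters[2:]     # stop omitted: runs to the end of the list
every_other = letters[::2]     # step of 2: indices 0, 2, 4
reversed_copy = letters[::-1]  # step of -1: a reversed copy

print(f"letters[:3]   = {first_three}")
print(f"letters[2:]   = {from_index_2}")
print(f"letters[::2]  = {every_other}")
print(f"letters[::-1] = {reversed_copy}")
```

Note that slicing returns a new list and leaves the original list unchanged.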

---

---

## ```sum()```, ```max()```, and ```min()```

When your list contains numerical values we can find the sum of elements in the list, find the maximum of the elements in the list, and find the minimum of the elements in our list with the base Python functions ```sum()```, ```max()```, and ```min()```, respectively. For example, try running the following code in the cell below:
```python
sample_list = [10, 12, 2, 1.5, .25, 6, 8]

title = f"sample_list = {sample_list} \n"

print(title)
print("-"*len(title), "\n")

print(f"The sum of numbers in sample_list = {sum(sample_list)} \n")
print(f"The maximum of elements in sample_list = {max(sample_list)} \n")
print(f"The minimum of elements in sample_list  = {min(sample_list)} \n")

print("-"*len(title), "\n")

```

**Challenge.** The function calls of ```sum()```, ```max()```, and ```min()``` in this code should be clear, but what else is happening with this code? What is happening with ```print("-"*len(title), "\n")```?



---

---

As junior data scientists, we should always be aware of statistical computations! With the above base Python functions we can easily compute the mean of ```sample_list``` by running the following code in the cell below:
```python
print(f"The mean of sample_list = {sum(sample_list)/len(sample_list)}")
```
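
As a cross-check, the standard library's `statistics` module provides `mean()` and `median()`. The sketch below compares the manual computation with the library call; the variable names are illustrative.

```python
import statistics

values = [10, 12, 2, 1.5, 0.25, 6, 8]

manual_mean = sum(values) / len(values)   # mean computed by hand
library_mean = statistics.mean(values)    # mean from the standard library
median_value = statistics.median(values)  # middle value of the sorted data

print(f"manual mean  = {manual_mean}")
print(f"library mean = {library_mean}")
print(f"median       = {median_value}")
```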

---

---

### List Methods

As mentioned in a previous lesson, everything in Python is an object, and all objects have *attributes* and *methods*. Lists are no exception. Because lists are so useful in data science, I include all list methods below in alphabetical order:

1. ```append()```:	Adds an element at the end of the list
2. ```clear()```:	Removes all the elements from the list
3. ```copy()```:	Returns a copy of the list
4. ```count()```:	Returns the number of elements with the specified value
5. ```extend()```:	Adds the elements of a list (or any iterable) to the end of the current list
6. ```index()```:	Returns the index of the first element with the specified value
7. ```insert()```:	Adds an element at the specified position
8. ```pop()```:	Removes and returns the element at the specified position (the last element by default)
9. ```remove()```:	Removes the first item with the specified value
10. ```reverse()```:	Reverses the order of the list
11. ```sort()```:	Sorts the list


While looking at the descriptions of these methods you might notice that methods 1, 2, 5, 7, 8, 9, 10, and 11 do not return a new list but rather *modify* the current list. This indicates that Python lists are **mutable**, meaning that we can modify the entries of a given list instance. For example, run the following code in the cell below:
```python
sample_list = [10, 2, 9, 7]
print(f"sample_list before applying the append method: {sample_list} \n")
sample_list.append("apple")
print(f"sample_list after applying the append method: {sample_list} \n")

```
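
Several of the other methods in the list above can be tried the same way. The following is a short sketch with a made-up list named `items`; the comments show the state of the list after each call.

```python
items = [3, 1, 4, 1, 5]

items.insert(0, 9)         # items is now [9, 3, 1, 4, 1, 5]
last = items.pop()         # removes and returns 5 -> [9, 3, 1, 4, 1]
items.remove(1)            # removes the first 1 -> [9, 3, 4, 1]
position = items.index(4)  # index of the first 4 -> 2
ones = items.count(1)      # number of 1s remaining -> 1
items.reverse()            # items is now [1, 4, 3, 9]

print(f"items = {items}")
print(f"last = {last}, position = {position}, ones = {ones}")
```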

---

---

## Mutability Warning with List Methods!

Try running the code below in the following cell. 
```python
x = [10, 2, 9, 7]
print(f"x before applying the append method: {x} \n")
x = x.append("apple")
print(f"x after applying the append method: {x} \n")

```

What do you notice?

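**The ```append()``` method mutates the list in place and returns ```None```, so the assignment ```x = x.append("apple")``` rebinds ```x``` to ```None```.**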

---

---

Assigning a list to a new variable **does not make a copy**; it only gives the same list an additional name. For example, run the following code in the cell below:

```python
x = [10, 2, 9, 7]
y = x
print(f"x before applying the append method: {x} \n")
y.append("apple")
print(f"x after applying the append method ON y: {x} \n")

```

---

---

```y = x``` does not create a new list. It gives the list ```[10, 2, 9, 7]``` another name, ```y```. To get an independent copy, use the ```copy()``` method:

```python
x = [10, 2, 9, 7]
y = x.copy()
print(f"x before applying the append method: {x} \n")
print(f"y before applying the append method: {y} \n")
y.append("apple")
print(f"x after applying the append method ON y: {x} \n")
print(f"y after applying the append method ON y: {y} \n")

```
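
One caveat worth knowing: ```copy()``` makes a *shallow* copy, so nested lists inside the copy are still shared with the original. The sketch below contrasts this with `copy.deepcopy()` from the standard library; the list `nested` is a made-up example.

```python
import copy

nested = [[1, 2], [3, 4]]
shallow = nested.copy()       # new outer list, but the same inner lists
deep = copy.deepcopy(nested)  # new outer list and new inner lists

nested[0].append(99)          # mutate one of the inner lists

print(f"nested  = {nested}")
print(f"shallow = {shallow}")  # the change shows up here too
print(f"deep    = {deep}")     # the deep copy is unaffected
```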

---

---

The fact that Python lists are mutable is very important to remember, and also very useful. In data science we are often concerned with sorting lists of numerical values. With the ```sort()``` method we can achieve this goal easily. For example, run the following code in the cell below:
```python
sample_list = [10, 2, 9, 7]
print(f"sample_list before applying the sort method: {sample_list} \n")
sample_list.sort()
print(f"sample_list after applying the sort method: {sample_list} \n")

```



---

---

As you may notice, the ```sort()``` list method sorts the numerical entries (what happens with strings?) in non-decreasing order. But what if we would like to sort the entries of ```sample_list``` in non-increasing order? As with many things in Python, we should first check the built-in ```help()``` function. Try running the following code in the cell below:
```python
sample_list = [10, 2, 9, 7]
help(sample_list.sort)
```




---

---

After running the ```help()``` function on the ```sort()``` list method you should see:

----

```text
Help on built-in function sort:

sort(*, key=None, reverse=False) method of builtins.list instance
    Sort the list in ascending order and return None.
    
    The sort is in-place (i.e. the list itself is modified) and stable (i.e. the
    order of two equal elements is maintained).
    
    If a key function is given, apply it once to each list item and sort them,
    ascending or descending, according to their function values.
    
    The reverse flag can be set to sort in descending order.

```

---

If we look at the output from the ```help()``` function we notice the following line:
```text
The reverse flag can be set to sort in descending order.
```

This tells us that we need to provide a *keyword argument* in order to sort our list in descending order. In the case of the ```sort()``` list method, you can try running the following code in the cell below:
```python
sample_list = [10, 2, 9, 7]
print(f"sample_list before applying the sort method: {sample_list} \n")
sample_list.sort(reverse = True)
print(f"sample_list after applying the sort method (with reverse = True): {sample_list} \n")
```


**Note.** In Python, the Boolean value true is written ```True``` and the Boolean value false is written ```False```; both are capitalized.
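
The ```help()``` output also mentions a ```key``` function. As a minimal sketch, the example below sorts a made-up list of words by length, using the built-in ```sorted()``` function, which accepts the same ```key``` and ```reverse``` keyword arguments but returns a new list instead of modifying the original.

```python
words = ["pear", "fig", "banana", "kiwi"]

# sorted() returns a new list and leaves words unchanged.
by_length = sorted(words, key=len)
longest_first = sorted(words, key=len, reverse=True)

print(f"words         = {words}")
print(f"by_length     = {by_length}")
print(f"longest_first = {longest_first}")

# The sort() method sorts in place and returns None.
result = words.sort(key=len)
print(f"words after sort(key=len) = {words}")
print(f"return value of sort()    = {result}")
```

Because the sort is stable, words of equal length ("pear" and "kiwi") keep their original relative order.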

---

---

The ```extend()``` method joins one list to another in a manner similar to string concatenation. For example, run the following code in the cell below:
```python
a = [1, 2, 3, 4, 5]
b = ["a", "b", "c", "d"]

a.extend(b)
print(a)

```
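
It is easy to confuse ```extend()``` with ```append()```. The sketch below, using made-up lists, shows the difference, along with the ```+``` operator, which builds a new list rather than modifying an existing one.

```python
a = [1, 2, 3]
b = ["x", "y"]

extended = a.copy()
extended.extend(b)    # adds each element of b: [1, 2, 3, 'x', 'y']

appended = a.copy()
appended.append(b)    # adds b itself as one element: [1, 2, 3, ['x', 'y']]

concatenated = a + b  # new list; a and b are unchanged

print(f"extended     = {extended}")
print(f"appended     = {appended}")
print(f"concatenated = {concatenated}")
```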

---

----

Finally, we mention the ```count()``` method in relation to probabilities. Recall that, when all outcomes are equally likely, the probability of event $A$ in a sample space $S$ is given by the equation:

$$
\mathbb{P}(A) = \frac{|A|}{|S|}.
$$

Since the ```count()``` method counts the number of times an element appears in a list, we can easily find the probability of selecting an element in a list by combining this method with the ```len()``` function as follows:
```python
sample_space = ["apple", "banana", "apple", "blueberry", "lime", "apple"]
number_of_apples = sample_space.count("apple")

print(f"P(apple) = {number_of_apples/len(sample_space)}")

```
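
To get the probability of every distinct element at once, one option is `collections.Counter` from the standard library. This is a sketch of that alternative, not a replacement for the ```count()``` approach above.

```python
from collections import Counter

sample_space = ["apple", "banana", "apple", "blueberry", "lime", "apple"]

# Counter maps each distinct element to the number of times it appears.
counts = Counter(sample_space)

# Divide each count by the size of the sample space.
probabilities = {fruit: count / len(sample_space) for fruit, count in counts.items()}

for fruit, probability in probabilities.items():
    print(f"P({fruit}) = {probability}")
```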


----

---

## 2.b. Tuples

Python provides another type that is an ordered collection of objects, called a **tuple**.
Tuples behave like lists in most respects (indexing, slicing, ```len()```, and so on), except for the following properties:

1. Tuples are defined by enclosing the elements in parentheses ```()``` instead of square brackets ```[]```.

2. Tuples are immutable.

These are the main differences; because tuples are immutable, they also lack mutating methods such as ```append()``` and ```sort()```. For example, run the following code in the cell below:
```python
a = (1, 2, 3, 4, 5)
b = ("a", "b", "c", "d")

print(f"a = {a}")
print(f"len(a) = {len(a)}")
print(f"sum(a) = {sum(a)}")
print(f"min(a) = {min(a)}")
print(f"max(a) = {max(a)} \n")

print(f"b = {b}")
print(f"b[0] = {b[0]}")
print(f"b[-1] = {b[-1]}")
print(f"b[1:4] = {b[1:4]}")

```
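
To see immutability in action, the sketch below tries to modify a tuple and catches the resulting error. It also shows a common gotcha: a one-element tuple needs a trailing comma. The names `point`, `single`, and `not_a_tuple` are illustrative.

```python
point = (1, 2, 3)

# Item assignment is not allowed on a tuple.
try:
    point[0] = 10
except TypeError as error:
    print(f"TypeError: {error}")

# Tuples also lack list-style mutating methods such as append().
print(f"point has an append method: {hasattr(point, 'append')}")

single = (5,)      # the trailing comma makes a one-element tuple
not_a_tuple = (5)  # just the integer 5 in parentheses
print(f"type of (5,) is {type(single).__name__}")
print(f"type of (5)  is {type(not_a_tuple).__name__}")
```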

---

---

## 2.c. Sets

Python provides yet another type called a **set**. Python sets are designed to model the mathematical definition of a set: an unordered collection of elements with no repeated elements. So it is important to remember:

* Sets are unordered.
* Set elements are unique. Duplicate elements are not allowed.
* A set itself may be modified, but the elements contained in the set must be of an immutable (more precisely, hashable) type.


A set can be created in two ways: with curly braces ```{}``` or with the ```set()``` function. Run the following code in the cell below:
```python
set_one = {"a", "b", "c", "d"}
set_two = set(["c", "d", "e"])

print(f"set_one = {set_one}")
print(f"set_two = {set_two}")

```
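
The requirement that set elements be immutable can be checked directly. In the sketch below (with made-up names), a tuple is accepted as a set element while a list is rejected. It also shows that empty curly braces create a dictionary, not a set.

```python
# Numbers, strings, and tuples of immutable values can be set elements.
valid = {1, "two", (3, 4)}
valid.add(5)  # the set itself can still be modified
print(f"valid = {valid}")

# A list is mutable, so it cannot be placed in a set.
try:
    invalid = {1, [2, 3]}
except TypeError as error:
    print(f"TypeError: {error}")

empty_set = set()  # an empty set must be created with set()
empty_dict = {}    # empty braces create a dictionary
print(f"type(set()) = {type(empty_set).__name__}")
print(f"type({{}})    = {type(empty_dict).__name__}")
```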



---

---

Sets by definition contain unique elements. So, if you try to define a set with multiple instances of the same element, only one instance will be saved in the set. Try running the following code in the cell below:
```python
x = {"a", "a", "b", "c", "c", "d"}
print(f"x = {x}")

```

---

---

This removal of duplicate entries can be especially useful when you would like to see the unique elements in a list of data. For example:
```python
data = [100, 90, 100, 30, 30, 40, 30]
print(f"data = {data}")
print(f"unique data entries = {list(set(data))}")

```

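**Note.** In this code we first convert ```data``` to a set, which removes duplicate entries, and then convert back to a list using the ```list()``` function. Because sets are unordered, the order of the resulting list may differ from the order in ```data```.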
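
If the order of the unique entries matters, two common alternatives are sketched below: `dict.fromkeys()` keeps the first occurrence of each value in its original position (dictionaries preserve insertion order in Python 3.7 and later), and ```sorted()``` returns the unique values in increasing order.

```python
data = [100, 90, 100, 30, 30, 40, 30]

# Keep the first occurrence of each value, in original order.
unique_in_order = list(dict.fromkeys(data))

# Unique values sorted in increasing order.
unique_sorted = sorted(set(data))

print(f"unique, original order = {unique_in_order}")
print(f"unique, sorted         = {unique_sorted}")
```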

---

---

## Set Operations

Mathematical sets have several fundamental operations:
* The union of A and B = all elements contained in A or in B (or both)
* The intersection of A and B = all elements contained in both A and B
* The set difference of A and B = all elements in A that are not in B
* The symmetric difference of A and B = all elements in A or in B that are not contained in the intersection of A and B


Python has set methods for each of these operations. This can be illustrated by running the following code in the cell below:
```python
A = {"a", "b", "c", "d", "e"}
B = {"c", "d", "e", "f", "g"}

print(f"A = {A}")
print(f"B = {B} \n")
print("---------------- Set Operators ---------------- \n")
print(f"Set Union: A or B = {A.union(B)} \n")
print(f"Set Intersection: A and B = {A.intersection(B)} \n")
print(f"Set Difference: A - B = {A.difference(B)} \n")
print(f"Set Symmetric Difference: A (+) B = {A.symmetric_difference(B)} \n")

```
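
Each of these methods also has an operator form. The sketch below checks that the operators agree with the methods on the same two sets, and adds a subset test with ```<=``` using a made-up set `C`.

```python
A = {"a", "b", "c", "d", "e"}
B = {"c", "d", "e", "f", "g"}
C = {"c", "d"}  # hypothetical extra set for the subset check

print(f"A | B = {A | B}")  # union
print(f"A & B = {A & B}")  # intersection
print(f"A - B = {A - B}")  # set difference
print(f"A ^ B = {A ^ B}")  # symmetric difference

# The operators give the same results as the methods.
print((A | B) == A.union(B))
print((A ^ B) == A.symmetric_difference(B))

# C is a subset of A because every element of C is also in A.
print(f"C <= A: {C <= A}")
```

One difference in design: the methods accept any iterable as an argument (for example a list), while the operators require both operands to be sets.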

---

----

For more on Python sets and set operations, see the excellent article on RealPython.com by clicking [here](https://realpython.com/python-sets/).



---