# 2. Lists and Sets

## Learning objectives
- Understand the nature of a list.
- Know how to index and slice a list.
- Know some ways to add items to a list.
- Know some methods associated with a list.
<br> <br>
- Understand the nature of a set.
- Learn how to add to and update a set.
- Learn how to join sets using different methods.

## Lists

- A powerful data type in Python.
- It is denoted by square brackets [].
- Lists store items as a mutable ordered sequence of elements.
- Each element in a list is an item.
- Indexing and slicing operations apply to lists.
- Lists can be nested several levels deep.

![Lists](images/lists.png)

### Term Definitions
- Mutable: indicates that the item can be changed after creation (supports addition/removal/reassignment of items).
- Ordered: the elements keep the order in which they were provided, so each one can be accessed by a numeric index. For example, in the list \[2,1,3,4\], 2 remains the first element, 1 the second, and so on, until the list is explicitly modified.
- Sequence of elements: the items are stored one after another, each at its own position.
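
The definitions above can be illustrated with a minimal sketch. The list `numbers` is a hypothetical example, not part of the original notebook.

```python
# Hypothetical list used only for illustration.
numbers = [2, 1, 3, 4]

# Ordered: each element keeps its position, so it can be read by index.
first = numbers[0]

# Mutable: the same list object can be changed after creation.
numbers[0] = 10
numbers.append(5)

print(first)    # 2
print(numbers)  # [10, 1, 3, 4, 5]
```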

In [None]:
# A list can hold objects of different types.
my_list = [3, "three", 3.0, True]

In [None]:
my_list[0]

In [None]:
# The string method .split() returns a list; here the string is split at every 's'.
sentence = 'This is just a sentence'
sentence.split('s')

In [None]:
# The built-in type() function returns the data type of an object.
type(3)

In [None]:
type("three")

In [None]:
type(3.0)

In [None]:
type(True)

In [None]:
type(my_list)

## Indexing and Slicing

In Python, slicing

- begins at index 0 (zero).
- is inclusive of the lower bound (including).
- is exclusive of the upper bound (up to, but not including).

In [3]:
my_list = ['John', 'Paul', 'George', 'Ringo']
my_list

['John', 'Paul', 'George', 'Ringo']

In [None]:
# Index 0 outputs the first element.
my_list[0]

In [None]:
# Use a colon to indicate a slice; 1:3 returns the 2nd and 3rd items, but not the 4th.
my_list[1:3]

In [None]:
# In the absence of an upper bound, the slice starts at the given lower bound and runs to the end of the list.
my_list[1:]

In [None]:
# In the absence of a lower bound, the slicing begins from index 0 up to, but not including, the upper bound.
my_list[:3]

In [None]:
# The third argument defines the step size.
my_list[::2]

In [None]:
# A negative index counts from the end of the list: -1 is the last element, -2 the second to last.
print(my_list[-1])
print(my_list[-2])
# You can define the step size as negative, which will go backwards.
print(my_list[::-1])

In [None]:
# Elements in a list can be reassigned to new values.
my_list[1] = my_list[1].upper()
my_list

In [None]:
# The + operator concatenates lists into a new list; the original list is left unchanged.
my_list + ['Yoko']
print(my_list)

In [None]:
# To change the original list, you must reassign it.
my_list = my_list + ['Yoko']

my_list

In [None]:
# The list can be reassigned to a slice of the original list.

my_list = my_list[:4]

my_list

In [None]:
# The list can be multiplied by integer x to replicate the original list x times.
print(my_list * 2)
# As mentioned previously, to make changes to the original list, it must be reassigned.
my_list

In [None]:
# A list can be contained within another list.
# The concept of inserting a list into another is called nesting.
lst_1=[1,2,3]
lst_2=[4,5,6]
lst_3=[7,8,9]

# A nested list is formed by making a list of lists.
nest_list = [lst_1,lst_2,lst_3]
nest_list

>Note that it is possible to nest lists several levels deep.

In [None]:
print('Accessing second item', nest_list[1])
print('Accessing second item of the second item', nest_list[1][1])
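
As a minimal sketch of deeper nesting, the hypothetical list `deep_list` below has three levels. Each extra pair of square brackets moves one level further in.

```python
# Hypothetical three-level nested list.
deep_list = [1, [2, [3, 4]]]

# Chain one index per level to reach an inner element.
value = deep_list[1][1][0]
print(value)  # 3

# Nested elements can be reassigned like any other list element.
deep_list[1][1][0] = 'three'
print(deep_list)  # [1, [2, ['three', 4]]]
```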

## List Functions and Methods

Here, we review some of the most common functions and methods that apply to lists.

The difference between functions and methods is that functions are not bound to a specific data type, whereas methods are. In other words, a method is a function that is specific to a data type.

Therefore, all methods are functions, but not all functions are methods.

We encourage you to explore the official Python documentation [here](https://docs.python.org/3/tutorial/datastructures.html) for the complete set of list methods.
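
The distinction between functions and methods can be sketched as follows. The list `names` is a hypothetical example; `len`, `.append()`, and `hasattr` are standard Python.

```python
names = ['John', 'Paul']  # hypothetical example list

# len() is a built-in function: it accepts lists, strings, sets, and more.
print(len(names))   # 2
print(len('John'))  # 4

# .append() is a list method: it is called on the list itself.
names.append('George')
print(names)  # ['John', 'Paul', 'George']

# Strings have no .append() method, so this check prints False.
print(hasattr('John', 'append'))  # False
```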

### `len`

The built-in `len()` function counts the number of items in a data type containing multiple elements.

In [None]:
# We redefine my_list in order not to lose track of its contents.
my_list = ['John', 'Paul', 'George', 'Ringo']
len(my_list)

Note that the `len` function does not work for integers or floats; calling it on one raises a `TypeError`. It works for data types that are collections, such as lists or strings.

In [None]:
len(3)

### `min` and `max`

Apply min() and max() to find the lowest and highest items in a list, respectively.

In [None]:
num_list = [3, 7, 1, 9, 42]
print(min(num_list))
print(max(num_list))


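When applied to a list of strings, they compare the strings lexicographically by character code. This matches alphabetical order when the capitalisation is consistent; uppercase letters come before lowercase ones.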

In [8]:
print(min(my_list))
print(max(my_list))

George
Ringo
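
The effect of capitalisation on string comparison can be seen in a short sketch. The list `mixed_case` is a hypothetical example; the `key` argument is a standard parameter of `min` and `max`.

```python
mixed_case = ['paul', 'John', 'george', 'Ringo']  # hypothetical list

# Uppercase letters sort before lowercase ones, so 'John' is the minimum.
print(min(mixed_case))  # John
print(max(mixed_case))  # paul

# Passing key=str.lower compares the items case-insensitively.
print(min(mixed_case, key=str.lower))  # george
print(max(mixed_case, key=str.lower))  # Ringo
```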


### `.append()` and `.extend()`
- The .append() method adds a single item to the end of a list. If that item is itself a list, it is added as one nested element.
- The .extend() method adds the items in a list (or other iterable) one by one to the end of the list.
- The difference between the two is shown below.

In [None]:
# add items by the .append() method
my_list = ['John', 'Paul', 'George', 'Ringo']
my_list.append(['Lennon','McCartney','Harrison', 'Starr'])

print(my_list)

Notice that with the `append` method, we did not have to reassign the result to the variable. Instead, the list itself was modified in place.

This same principle applies to the `extend` method.

In [None]:
# add items in iterable itemwise by the .extend() method.
my_list = ['John', 'Paul', 'George', 'Ringo']
my_list.extend(['Lennon','McCartney','Harrison', 'Starr'])

my_list
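
One consequence of `.extend()` working on any iterable is worth a short sketch: a string is iterated character by character. The list `band` is a hypothetical example.

```python
band = ['John', 'Paul']  # hypothetical list

# .append() adds its argument as a single item, whatever its type.
band.append('George')

# .extend() iterates over its argument; a string yields one character at a time.
band.extend('Ri')
print(band)  # ['John', 'Paul', 'George', 'R', 'i']

# To add one whole string with .extend(), wrap it in a list.
band.extend(['Ringo'])
print(band[-1])  # Ringo
```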

### `.insert()`

The `.insert()` method adds an item at a specific index, shifting the later items one position to the right. Note that lists are 0-indexed.

In [12]:
my_list = ['John', 'Paul', 'George', 'Ringo']
my_list.insert(1, 'Lennon')
my_list

['John', 'Lennon', 'Paul', 'George', 'Ringo']

### `.pop()`

`pop` removes the last item by default. Like `append`, `extend`, and `insert`, it modifies the list in place. Unlike them, it also returns the removed item.

In [None]:
# last_item will be assigned the value of the last element.
last_item = my_list.pop()
last_item

Consequently, `my_list` will contain one element less.

In [None]:
my_list

The index to be removed can be specified.

In [None]:
# An index can be passed to .pop(); the default is -1 (the last item).
my_list.pop(0)
my_list

### `.remove()`

Another method for removing elements in a list is the `.remove()` method. This method finds and removes a specified item from a list.

In [40]:
my_list = ['John', 'Lennon', 'Paul', 'George']
my_list.remove('John')
print(my_list)

['Lennon', 'Paul', 'George']


Please exercise caution! If the specified item does not exist in the list, Python raises a `ValueError`.

In [None]:
my_list = ['John', 'Lennon', 'Paul', 'George']
my_list.remove('Yoko')
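
One way to avoid the error is to check membership with `in` before removing. The sketch below uses a hypothetical list `names`, and also shows that `.remove()` deletes only the first matching item.

```python
names = ['John', 'Paul', 'John']  # hypothetical list with a repeated item

# .remove() deletes only the first matching item.
names.remove('John')
print(names)  # ['Paul', 'John']

# Check membership with `in` before removing an item that may be absent.
if 'Yoko' in names:
    names.remove('Yoko')
print(names)  # ['Paul', 'John']
```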

### `.sort()` and `.reverse()`

`sort` orders a list based on the value of its elements.

In [3]:
# using the .sort() method to sort a list changes the original list, and no value is returned.

let_list = ["a", "d", "v", "x", "g"]

num_list = [13,42,4,24,2,46,3,7]

In [4]:
let_list.sort()
num_list.sort()

In [5]:
print(let_list)
print(num_list)

['a', 'd', 'g', 'v', 'x']
[2, 3, 4, 7, 13, 24, 42, 46]


In [6]:
A = ["G06 WTR", "WL11 WFL", "QW68 PQR"]
print(A.sort())

None
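
The `None` above appears because `.sort()` sorts in place and returns nothing. When a sorted copy is needed, the built-in `sorted()` function is one option; a minimal sketch, with `plates` as a hypothetical variable name:

```python
plates = ["G06 WTR", "WL11 WFL", "QW68 PQR"]

# sorted() returns a new sorted list and leaves the original untouched.
ordered = sorted(plates)
print(ordered)  # ['G06 WTR', 'QW68 PQR', 'WL11 WFL']
print(plates)   # ['G06 WTR', 'WL11 WFL', 'QW68 PQR']

# .sort() sorts in place and returns None, so do not assign its result.
result = plates.sort()
print(result)   # None
```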


The `.reverse()` method reverses the current order of a list in place. It does not sort the list.

In [None]:
# use the .reverse() method to reverse a list order.
num_list.reverse()

print(num_list)
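
The difference between reversing and sorting in descending order can be sketched with a hypothetical unsorted list, `values`. The `reverse=True` argument is a standard parameter of `.sort()`.

```python
values = [13, 42, 4, 24]  # hypothetical unsorted list

# .reverse() flips the current order; it does not sort.
values.reverse()
print(values)  # [24, 4, 42, 13]

# .sort(reverse=True) sorts from largest to smallest.
values.sort(reverse=True)
print(values)  # [42, 24, 13, 4]
```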

### `.join()` 

`join` is actually a string method, although it accepts a list (or any other iterable) of strings as its argument. It concatenates the items in the list, placing the string on which the method was called between each pair of items.

In [None]:
list_of_strings = ["This", "is", "a", "sentence."]

print("    Random String    ".join(list_of_strings))
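
A more typical use is joining with a single space or a comma. The sketch below also shows one way to handle non-string items, which `.join()` does not accept directly; the variable names are hypothetical.

```python
words = ["This", "is", "a", "sentence."]
sentence = " ".join(words)
print(sentence)  # This is a sentence.

# .join() requires strings, so convert other types first.
numbers = [1, 2, 3]
joined = ", ".join(str(n) for n in numbers)
print(joined)  # 1, 2, 3
```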

### `.index()` 

The `.index()` method is employed to determine the index of a certain element in a list.

In [37]:
my_list = ['John', 'Lennon', 'Paul', 'George']
idx = my_list.index('Lennon')
print(idx)

1


Please exercise caution! If the specified item is not in the list, Python raises a `ValueError`.

In [None]:
my_list = ['John', 'Lennon', 'Paul', 'George']
idx = my_list.index('Yoko')
print(idx)
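
One way to handle a missing item is to catch the `ValueError`. In this sketch, using `-1` as a "not found" marker is an assumption made for illustration, not a Python convention for lists.

```python
names = ['John', 'Lennon', 'Paul', 'George']

try:
    idx = names.index('Yoko')
except ValueError:
    # Hypothetical fallback: use -1 to signal "not found".
    idx = -1

print(idx)  # -1
```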

## A Brief Introduction to Sets

- Sets are a data type in Python.
- They follow the rules of mathematical sets that you should already be familiar with.
- They are mutable and unordered, and they do not contain repeated items (items are unique).
- This means one useful application of a set is to find all unique items in a list, as we will see.
- Sets also have their own methods, with operations derived from mathematical sets.

We can define a set using the built-in `set()` function to convert, for example, a list into a set. If the list contains repeated elements, the duplicates are removed in the set.

In [18]:
my_set = set([1, 2, 3, 4, 4, 4, 6])
print(my_set)

{1, 2, 3, 4, 6}


Observe above that number 4 appears only once in the set.

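Also, observe that sets are displayed with curly brackets. This gives a second way to define a set: write the items between curly brackets when assigning it to a variable. Note that empty brackets (`{}`) create an empty dictionary, not a set; use `set()` for an empty set.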

In [19]:
my_set = {1, 2, 3, 4, 4, 4, 6}
print(my_set)

{1, 2, 3, 4, 6}
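
One caveat about curly brackets can be checked with a short sketch: empty brackets produce a dictionary, so an empty set needs `set()`. The variable names are hypothetical.

```python
empty_braces = {}
empty_set = set()

print(type(empty_braces))  # <class 'dict'>
print(type(empty_set))     # <class 'set'>
```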


As mentioned above, sets are unordered and mutable. Mutable means that we can change the contents of a set, as we will see later in this notebook.

Unordered means that its elements don't have a specific order, and therefore sets can't be indexed.

In [20]:
# Trying to index a set
my_set[1]

TypeError: 'set' object is not subscriptable

After running the code above, we obtain a `TypeError`, because a set has no positions to index.

### Set functions and methods

We can retrieve the number of elements in a set using the `len` function, just as with a list.

In [22]:
my_set = {1, 5, 3, 6, 7, 5, 4, 5, 5, 5, 6}
len(my_set)

6
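
Combining `set()` with `len()` gives a quick way to count the distinct values in a list. A minimal sketch, with `votes` as a hypothetical list:

```python
# Hypothetical list with repeated names.
votes = ['John', 'Paul', 'John', 'Ringo', 'Paul', 'John']

unique_votes = set(votes)
print(len(votes))         # 6
print(len(unique_votes))  # 3

# Sets are unordered; sorted() returns a list with a predictable order.
print(sorted(unique_votes))  # ['John', 'Paul', 'Ringo']
```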

The `min` and `max` functions return the smallest and largest values in a set.

In [23]:
min(my_set)

1

#### <font size=+2>`.add()`</font>

`add` (as the name suggests) adds a single item to the set. Because sets are unordered, the item has no particular position.

In [24]:
set_x = set()

print(set_x)

set_x.add(1)

print(set_x)

set_x.add(2)

print(set_x)

# if we add 2 again, we see the set does not change, as items in a set are unique

set_x.add(2)

print(set_x)

set()
{1}
{1, 2}
{1, 2}
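
To add several items at once, sets also provide the `.update()` method, which accepts any iterable. A minimal sketch, with `set_y` as a hypothetical set:

```python
set_y = {1, 2}  # hypothetical set

# .update() adds every item from an iterable, modifying the set in place.
# Items already present (here, 2) are ignored.
set_y.update([2, 3, 4])
print(set_y == {1, 2, 3, 4})  # True
```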


#### <font size=+1>Mathematical Operations on Sets</font>

Sets in Python share the same principles as sets in maths, so you can use the same operations. 

The most common ones are `Union`, `Intersection`, `Difference`, and `Symmetric Difference`.

<p align=center><img src=images/sets.png width=400></p>

#### <font size=+2>`.union()`</font>

`union` returns a new set containing every element of both sets. The original sets are not modified.

In [27]:
set_1 = {'Dog', 'Cat', 'Platypus', 'Koala'}
set_2 = {'Crocodile', 'Hyena', 'Koala', 'Cat'}
print(set_1)
print(set_2)
union_set = set_1.union(set_2)
print(union_set)

{'Platypus', 'Koala', 'Dog', 'Cat'}
{'Crocodile', 'Koala', 'Hyena', 'Cat'}
{'Platypus', 'Crocodile', 'Dog', 'Koala', 'Hyena', 'Cat'}


Once again, the resulting set doesn't contain repeated values.
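
For two sets, the `|` operator is an equivalent way to write a union. The sketch below uses two small hypothetical sets and confirms that the originals stay unchanged.

```python
set_a = {'Dog', 'Cat'}    # hypothetical sets
set_b = {'Cat', 'Koala'}

combined = set_a.union(set_b)

# The | operator gives the same result as .union() for two sets.
print(combined == (set_a | set_b))  # True

# Neither original set is modified.
print(set_a == {'Dog', 'Cat'})  # True
```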

#### <font size=+2>`.intersection()`</font>

`intersection` returns a set containing the items common to both sets.

In [28]:
inter_set = set_1.intersection(set_2)
print(inter_set)

{'Koala', 'Cat'}


#### <font size=+2>`.difference()`</font>

`difference` returns a set with the items that are in `set_1` but not in `set_2`.

In [None]:
# a.difference(b) returns the items in a that are NOT in b
differ_set = set_1.difference(set_2)
print(differ_set)
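
Unlike union and intersection, the result of `difference` depends on which set the method is called on. A short sketch with the same two sets, redefined here so the example is self-contained:

```python
set_1 = {'Dog', 'Cat', 'Platypus', 'Koala'}
set_2 = {'Crocodile', 'Hyena', 'Koala', 'Cat'}

# Items only in set_1.
print(set_1.difference(set_2) == {'Dog', 'Platypus'})     # True

# Swapping the sets gives the items only in set_2.
print(set_2.difference(set_1) == {'Crocodile', 'Hyena'})  # True
```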

#### <font size=+2>`.symmetric_difference()`</font>

`symmetric_difference` returns a set with the items that are in either `set_1` or `set_2`, but not in both.

In [31]:
differ_set = set_1.symmetric_difference(set_2)
print(differ_set)

{'Crocodile', 'Platypus', 'Dog', 'Hyena'}
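
The symmetric difference can also be read as the union minus the intersection. The sketch below checks that relationship for the same two sets, redefined here so the example is self-contained; `sym` and `manual` are hypothetical names.

```python
set_1 = {'Dog', 'Cat', 'Platypus', 'Koala'}
set_2 = {'Crocodile', 'Hyena', 'Koala', 'Cat'}

sym = set_1.symmetric_difference(set_2)

# Build the same result from the operations covered earlier.
manual = set_1.union(set_2).difference(set_1.intersection(set_2))

print(sym == manual)  # True
```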


## Summary
We now understand:
- The nature of lists and sets.
- The basic concept of mutability.
<br><br>

We now know:
- How to index and slice lists.
- List functions and methods including len(), .append(), .extend() etc.
- How to use a set to find the unique values in a list.

<br>

Please use this notebook as a reference, and refer to the links below for more information.

## Further reading
- List methods: https://docs.python.org/3/tutorial/datastructures.html
- Sets: https://docs.python.org/3/library/stdtypes.html#set