# Container sequences

A container sequence gets its name because it holds a sequence of elements of other data types. In the data science world, we'll use these containers to store our data for aggregation, ordering, sorting, and more. Python provides several container types, such as lists, sets, and tuples, to name a few.

Containers can be mutable, meaning that elements can be added to and removed from them, or immutable, meaning that they cannot be altered after they are created. Mutability lets us replace individual data points with sums, averages, derivations, and so on, while immutability lets us protect our reference data.

We can iterate, also known as looping, over the data contained within these containers. Being able to iterate over these sequences allows us to group data, aggregate it, and process it over time. 
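As a minimal sketch of these ideas, the cell below contrasts a mutable list with an immutable tuple and then loops over the list. The names `readings` and `reference` and their values are made up for illustration.

```python
# A list is mutable: elements can be added or removed in place.
readings = [3, 5, 8]  # hypothetical sample data
readings.append(13)

# A tuple is immutable: trying to change an element raises a TypeError.
reference = (3, 5, 8)
try:
    reference[0] = 99
except TypeError as err:
    error_message = str(err)

# Iterating (looping) works the same way on both containers.
total = 0
for value in readings:
    total += value

print(readings)       # [3, 5, 8, 13]
print(error_message)
print(total)          # 29
```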

Let's start with learning about container types by looking at lists.



## Lists

Often we need to hold an ordered collection of items, and lists allow us to do just that. Lists are mutable so we can add or remove data from them. Lists also allow us to access an individual element within them using an index.

In [12]:
# Create a list containing the names: baby_names
baby_names = ['Ximena', 'Aliza', 'Ayden', 'Calvin']
baby_names

['Ximena', 'Aliza', 'Ayden', 'Calvin']

### .append() method

If we want to add individual data elements to a list, we can do it by using the .append() method. 

In [13]:
baby_names.append("anik")
baby_names

['Ximena', 'Aliza', 'Ayden', 'Calvin', 'anik']

In [14]:
new_baby_names = ["shafin", "nahin"]

# Now if we want to add these names we can use append method

baby_names.append(new_baby_names)
baby_names

['Ximena', 'Aliza', 'Ayden', 'Calvin', 'anik', ['shafin', 'nahin']]

But we can see there's a problem: .append() adds the whole list as a single element, while we wanted the individual names from that list. We can solve this by indexing, but first we have to remove the nested list of new baby names, which we can do with the .remove() method.

### .remove() method

In [15]:
baby_names.remove(new_baby_names)
baby_names

['Ximena', 'Aliza', 'Ayden', 'Calvin', 'anik']

Now we can add the names one at a time using indexing.

In [16]:
baby_names.append(new_baby_names[0])
baby_names.append(new_baby_names[1])

baby_names

['Ximena', 'Aliza', 'Ayden', 'Calvin', 'anik', 'shafin', 'nahin']

We can add many names this way with .append() and indexing, but it becomes tedious when there are a lot of names to add. To solve this, we can use the .extend() method.

### .extend() method

If you want to combine a list with another iterable (such as a list, set, or tuple), you can use the .extend() method on the list. It adds each element of the iterable to the end of the list.

In [23]:
new_baby_names2 = ["baby1", "baby2"]

baby_names.extend(new_baby_names2)
print(baby_names)

['Ximena', 'Aliza', 'Ayden', 'Calvin', 'anik', 'shafin', 'nahin', 'baby1', 'baby2']
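The example above extends a list with another list. As a small sketch with a made-up `names` list, the cell below shows that .extend() also works with a tuple or a generator, and that a string is treated as an iterable of characters, which is rarely what we want.

```python
names = ['Ximena', 'Aliza']

# extend() walks the iterable and appends each element in turn.
names.extend(('Ayden', 'Calvin'))         # a tuple works
names.extend(n.title() for n in ['anik'])  # so does a generator

# Caution: a string is an iterable of its characters.
letters = []
letters.extend('ab')

print(names)    # ['Ximena', 'Aliza', 'Ayden', 'Calvin', 'Anik']
print(letters)  # ['a', 'b']
```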


### Another way of removing elements

Suppose we want to remove the last element of the list. We can use the .pop() method.

### .pop() method

In [24]:
baby_names.pop()
baby_names

['Ximena', 'Aliza', 'Ayden', 'Calvin', 'anik', 'shafin', 'nahin', 'baby1']

We can see that the last name, baby2, has been removed. To remove a specific element instead, we can pass its position to .pop(), and the .index() method tells us the position of a given value.

### .index() method

Suppose we want to remove baby1 from the list. First we find its index, then we remove it.

In [25]:
idx = baby_names.index('baby1')
idx

7

In [26]:
## Now we can remove it by pop() method

baby_names.pop(idx)
baby_names

['Ximena', 'Aliza', 'Ayden', 'Calvin', 'anik', 'shafin', 'nahin']

The .pop() method also returns the removed value, so we can save it to a variable.

In [29]:
removed_name = baby_names.pop(baby_names.index("anik"))
removed_name

'anik'
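One thing to keep in mind is that .index() raises a ValueError when the value is not in the list. The sketch below wraps the pop-by-index pattern in a hypothetical helper, `pop_by_value`, which is not part of Python and is only here to illustrate a membership check before removing.

```python
names = ['Ximena', 'Aliza', 'Ayden']

def pop_by_value(items, value, default=None):
    """Remove and return value from items, or return default if it is absent.

    Hypothetical helper for illustration only.
    """
    if value in items:
        return items.pop(items.index(value))
    return default

removed = pop_by_value(names, 'Aliza')
missing = pop_by_value(names, 'Zed')

print(removed)  # Aliza
print(missing)  # None
print(names)    # ['Ximena', 'Ayden']
```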

### Sorting lists

Python provides the .sort() method and the sorted() function. The .sort() method sorts a list in place, while sorted() accepts an iterable such as a list and returns a new list with the elements in sorted order.

In [37]:
courses = ['math', 'physics', 'chem', 'biology', 'comsci']
courses

['math', 'physics', 'chem', 'biology', 'comsci']

### .sort() method

The .sort() method sorts the list in place and returns None.

In [38]:
c = courses.sort()
print(c)
print(courses)

None
['biology', 'chem', 'comsci', 'math', 'physics']


We can see that .sort() returned None and that the courses variable is no longer in its original order. This can lead to mistakes in bigger projects when other code depends on that order. To avoid this, we can use the sorted() function.

### sorted() function

In [39]:
courses2 = ['math', 'physics', 'chem', 'biology', 'comsci']
courses2

['math', 'physics', 'chem', 'biology', 'comsci']

In [40]:
d = sorted(courses2)

print(d)
print(courses2)

['biology', 'chem', 'comsci', 'math', 'physics']
['math', 'physics', 'chem', 'biology', 'comsci']


Here we can see that the sorted() function returns a new list and leaves the original list unchanged.

### Reversing a sorted list

We can also get the list in reverse sorted order by passing reverse=True to sorted().

In [41]:
d

['biology', 'chem', 'comsci', 'math', 'physics']

In [42]:
e = sorted(d, reverse=True)
e

['physics', 'math', 'comsci', 'chem', 'biology']
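As a short sketch of two related options, sorted() can sort in descending order directly from the unsorted list, and it also accepts a `key` function that decides what to compare. Here `len` is used as the key to sort by word length, and ties keep their original relative order.

```python
courses = ['math', 'physics', 'chem', 'biology', 'comsci']

# Descending alphabetical order in a single step.
descending = sorted(courses, reverse=True)

# Sort by length instead of alphabetically; equal lengths keep their order.
by_length = sorted(courses, key=len)

print(descending)  # ['physics', 'math', 'comsci', 'chem', 'biology']
print(by_length)   # ['math', 'chem', 'comsci', 'physics', 'biology']
print(courses)     # the original list is unchanged
```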

### .insert() method

The .insert() method adds an element at any position in the list. The first argument is the index to insert at, and the second is the value to insert.

In [44]:
courses

['biology', 'chem', 'comsci', 'math', 'physics']

Suppose we want to add a subject at the start of the list, which is index 0.

In [45]:
courses.insert(0, "bangla")
print(courses)

['bangla', 'biology', 'chem', 'comsci', 'math', 'physics']
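The same method works for positions other than the start. In this small sketch with a shortened, made-up `subjects` list, inserting at index 1 shifts the later elements to the right, and an index past the end of the list simply appends.

```python
subjects = ['biology', 'chem', 'math']

# Insert in the middle: existing elements from index 1 onward shift right.
subjects.insert(1, 'bangla')

# An index beyond the end does not raise an error; the value goes last.
subjects.insert(100, 'physics')

print(subjects)  # ['biology', 'bangla', 'chem', 'math', 'physics']
```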


## Tuples

The next container type we want to learn about is the tuple. Tuples are widely used internally in many of the systems we depend on, like databases.

Tuples are very much like lists: they hold data in order, and we can access elements inside a tuple with an index. However, the similarities end there.

Tuples are easier to process and more memory efficient than lists. Tuples are immutable, which means we can't add or remove elements from them. This is powerful because we can use them to ensure that our data is not altered. 

We can create tuples by pairing up elements. Finally, we can use something called unpacking to expand a tuple into named variables that represent each element in the tuple. 
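Before pairing elements up, it may help to see a few ways of writing a tuple directly. The sketch below uses made-up values, and the byte sizes reported by `sys.getsizeof` depend on the Python implementation and version, so treat them as indicative only.

```python
import sys

point = (3, 4)          # parentheses are the usual way to write a tuple
also_point = 3, 4       # the commas, not the parentheses, make the tuple
single = ('only',)      # a one-element tuple needs a trailing comma
not_a_tuple = ('only')  # without the comma this is just a string

as_list = ['Jada', 'Emily', 'Ava']
as_tuple = tuple(as_list)

# Size of each container object itself, in bytes (implementation-dependent).
list_size = sys.getsizeof(as_list)
tuple_size = sys.getsizeof(as_tuple)

print(type(single), type(not_a_tuple))
print(list_size, tuple_size)
```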

### Zipping tuples

Often, we'll have lists where we want to match up elements into pairs, and the zip function enables us to do that.

In [46]:
girl_names = ['JADA', 'Emily', 'Ava', 'SERENITY', 'Claire', 'SOPHIA', 'Sarah', 'ASHLEY', 'CHAYA']
boy_names = ['JOSIAH', 'ETHAN', 'David', 'Jayden', 'MASON', 'RYAN', 'CHRISTIAN', 'ISAIAH', 'JAYDEN']

Here we can see that the letter cases of the names are not the same. We can use many methods to solve this. Here we'll use .title() method to resolve this.

In [49]:
g_names = []

for i in girl_names:
    g_names.append(i.title())

print(g_names)

['Jada', 'Emily', 'Ava', 'Serenity', 'Claire', 'Sophia', 'Sarah', 'Ashley', 'Chaya']


In [50]:
b_names = []

for j in boy_names:
    b_names.append(j.title())

print(b_names)

['Josiah', 'Ethan', 'David', 'Jayden', 'Mason', 'Ryan', 'Christian', 'Isaiah', 'Jayden']
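The two loops above can also be written as list comprehensions, which build the new list in a single expression. This is only an alternative sketch of the same title-casing step, shown on shortened copies of the lists.

```python
girl_names = ['JADA', 'Emily', 'Ava', 'SERENITY']
boy_names = ['JOSIAH', 'ETHAN', 'David', 'Jayden']

# Apply .title() to every name and collect the results in a new list.
g_names = [name.title() for name in girl_names]
b_names = [name.title() for name in boy_names]

print(g_names)  # ['Jada', 'Emily', 'Ava', 'Serenity']
print(b_names)  # ['Josiah', 'Ethan', 'David', 'Jayden']
```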


Now we can pair up the girl names and boy names.

In [52]:
pairs = list(zip(g_names, b_names))
print(pairs)

[('Jada', 'Josiah'), ('Emily', 'Ethan'), ('Ava', 'David'), ('Serenity', 'Jayden'), ('Claire', 'Mason'), ('Sophia', 'Ryan'), ('Sarah', 'Christian'), ('Ashley', 'Isaiah'), ('Chaya', 'Jayden')]


We can see that pairs is a list of tuples. Notice that tuples are displayed with parentheses around their elements.
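The lists above happen to be the same length. As a sketch of what happens otherwise, zip stops at the end of the shortest input, while `itertools.zip_longest` from the standard library keeps going and fills the gaps. The shortened lists here are made up for illustration.

```python
from itertools import zip_longest

girls = ['Jada', 'Emily', 'Ava']
boys = ['Josiah', 'Ethan']

# zip() silently drops the unmatched 'Ava'.
short_pairs = list(zip(girls, boys))

# zip_longest() keeps every element and pads with fillvalue.
full_pairs = list(zip_longest(girls, boys, fillvalue=None))

print(short_pairs)  # [('Jada', 'Josiah'), ('Emily', 'Ethan')]
print(full_pairs)   # [('Jada', 'Josiah'), ('Emily', 'Ethan'), ('Ava', None)]
```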

### Unpacking tuples

Now let's look at how we can use unpacking with a tuple. Tuple unpacking, also sometimes called tuple expansion, allows us to assign the elements of a tuple to named variables for later use. This syntax allows us to create more readable and less error-prone code.

In [53]:
g_name_2, b_name_2 = pairs[1]
print(g_name_2)
print(b_name_2)

Emily
Ethan
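Unpacking expects the number of variables to match the number of elements. The sketch below, using made-up tuples, shows the ValueError raised on a mismatch and how a starred name can collect any leftover elements into a list.

```python
pair = ('Emily', 'Ethan')
girl, boy = pair

# The number of names on the left must match the tuple's length.
try:
    first, second, third = pair
except ValueError as err:
    mismatch_error = str(err)

# A starred name collects the remaining elements into a list.
top, *others = ('Jada', 'Emily', 'Ava')

print(girl, boy)       # Emily Ethan
print(mismatch_error)
print(top, others)     # Jada ['Emily', 'Ava']
```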


#### More unpacking in Loops

Unpacking is especially powerful in loops.

In [55]:
for g_n, b_n in pairs:
    print(f"Girl_Name: {g_n}")
    print(f"Boy_Name: {b_n}")

Girl_Name: Jada
Boy_Name: Josiah
Girl_Name: Emily
Boy_Name: Ethan
Girl_Name: Ava
Boy_Name: David
Girl_Name: Serenity
Boy_Name: Jayden
Girl_Name: Claire
Boy_Name: Mason
Girl_Name: Sophia
Boy_Name: Ryan
Girl_Name: Sarah
Boy_Name: Christian
Girl_Name: Ashley
Boy_Name: Isaiah
Girl_Name: Chaya
Boy_Name: Jayden


### Enumerating positions

Often we want to know the index of an element in an iterable. The enumerate function enables us to do that by creating tuples where the first element is the index of the item in the original list and the second is the item itself.

We can use this to track rankings in our data or skip elements we are not interested in. Here I'm going to enumerate our pairs list and split each resulting tuple into idx and names.

I can also use tuple unpacking on names to get all three components (the index, the girl name, and the boy name) separately. This can be exceptionally powerful.

In [62]:
for idx, names in enumerate(pairs):
    g_name, b_name = names
    print(f"Index: {idx}\n", f"Girl_Name: {g_name}\n", f"Boy_Name: {b_name}")

Index: 0
 Girl_Name: Jada
 Boy_Name: Josiah
Index: 1
 Girl_Name: Emily
 Boy_Name: Ethan
Index: 2
 Girl_Name: Ava
 Boy_Name: David
Index: 3
 Girl_Name: Serenity
 Boy_Name: Jayden
Index: 4
 Girl_Name: Claire
 Boy_Name: Mason
Index: 5
 Girl_Name: Sophia
 Boy_Name: Ryan
Index: 6
 Girl_Name: Sarah
 Boy_Name: Christian
Index: 7
 Girl_Name: Ashley
 Boy_Name: Isaiah
Index: 8
 Girl_Name: Chaya
 Boy_Name: Jayden
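For rankings, it is often more natural to count from 1. As a small sketch on a shortened copy of the pairs list, enumerate accepts a `start` argument, and the inner tuple can be unpacked directly in the loop header.

```python
pairs = [('Jada', 'Josiah'), ('Emily', 'Ethan'), ('Ava', 'David')]

ranked = []
# start=1 begins the count at 1; (g_name, b_name) unpacks each pair inline.
for rank, (g_name, b_name) in enumerate(pairs, start=1):
    ranked.append(f"{rank}. {g_name} and {b_name}")

print(ranked)  # ['1. Jada and Josiah', '2. Emily and Ethan', '3. Ava and David']
```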


## Sets

Now that you've learned about lists and tuples, let's look at our last built-in container type, the set. Sets are excellent for finding all the unique values in a column of your data, a list of elements, or even rows from a file. We use sets when we want to store unique data elements in an unordered fashion.

For example, I might want to store each type of cookie I had without any duplicates. Sets are also mutable, so I can add and remove elements from them.

### Creating Sets

A set is almost always created from a list. For example, I might have a list of cookies I've eaten today. I can make a set of them by passing the list to the set constructor.

In [74]:
cookies_eaten_today = ['chocolate chip', 'peanut butter', 'chocolate chip', 'oatmeal cream', 'chocolate chip']
print(cookies_eaten_today)
types_of_cookies_eaten = set(cookies_eaten_today)
print(types_of_cookies_eaten)

['chocolate chip', 'peanut butter', 'chocolate chip', 'oatmeal cream', 'chocolate chip']
{'oatmeal cream', 'peanut butter', 'chocolate chip'}


After printing, you might notice that although I had 3 chocolate chip cookies in my list, once I make it a set, there is only one occurrence of it in the set. This occurs because sets only store unique items.
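A set can also be written directly with curly braces. The sketch below uses a made-up `cookie_types` set and points out one catch: empty braces create a dictionary, so an empty set has to be made with `set()`.

```python
# A set literal; the duplicate 'chocolate chip' is stored only once.
cookie_types = {'chocolate chip', 'peanut butter', 'chocolate chip'}

empty_set = set()  # the way to create an empty set
empty_dict = {}    # empty braces give a dict, not a set

# Membership tests with `in` are a common use for sets.
has_anzac = 'anzac' in cookie_types

print(len(cookie_types))  # 2
print(type(empty_set), type(empty_dict))
print(has_anzac)          # False
```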

### Modifying Sets

### .add() method

When working with a set, we use the .add() method to add a new element. It only adds the element if it is not already present; otherwise the set is left unchanged.

In [75]:
print(types_of_cookies_eaten)

types_of_cookies_eaten.add('biscotti')

types_of_cookies_eaten.add('chocolate chip')

print(types_of_cookies_eaten)

{'oatmeal cream', 'peanut butter', 'chocolate chip'}
{'oatmeal cream', 'peanut butter', 'biscotti', 'chocolate chip'}


### .update() method

We can also add multiple items at once using the .update() method. It takes an iterable of items, such as a list, and adds each one to the set if it is not already present.

In [76]:
cookies_hugo_ate = ['chocolate chip', 'anzac']

types_of_cookies_eaten.update(cookies_hugo_ate)

print(types_of_cookies_eaten)

{'chocolate chip', 'biscotti', 'peanut butter', 'anzac', 'oatmeal cream'}
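As a brief sketch with made-up sets, .update() accepts more than one iterable in a single call. Passing a bare string is a common slip, because the string is treated as an iterable of characters.

```python
cookie_types = {'chocolate chip'}

# Several iterables can be passed at once; duplicates are ignored.
cookie_types.update(('anzac', 'biscotti'), ['oatmeal cream', 'anzac'])

# Caution: a string is added character by character.
letters = set()
letters.update('anzac')

print(sorted(cookie_types))  # ['anzac', 'biscotti', 'chocolate chip', 'oatmeal cream']
print(sorted(letters))       # ['a', 'c', 'n', 'z']
```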


### Removing data from sets

When removing data from a set, we can use the discard method to safely remove an element from the set by its value. No error will be thrown if the value is not found. 

We can also use the pop method to remove and return an arbitrary element from the set.

In [77]:
print(types_of_cookies_eaten)

types_of_cookies_eaten.discard('biscotti')

print(types_of_cookies_eaten)

{'chocolate chip', 'biscotti', 'peanut butter', 'anzac', 'oatmeal cream'}
{'chocolate chip', 'peanut butter', 'anzac', 'oatmeal cream'}


In [78]:
print(types_of_cookies_eaten)

print(types_of_cookies_eaten.pop())
print(types_of_cookies_eaten.pop())

print(types_of_cookies_eaten)

{'chocolate chip', 'peanut butter', 'anzac', 'oatmeal cream'}
chocolate chip
peanut butter
{'anzac', 'oatmeal cream'}
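For contrast with .discard(), sets also have a .remove() method, which raises a KeyError when the value is missing, and .pop() raises a KeyError on an empty set. The sketch below uses a made-up set and records each failure in a flag.

```python
cookie_types = {'chocolate chip', 'anzac'}

# discard() does nothing when the value is absent.
cookie_types.discard('biscotti')

# remove() raises KeyError for a missing value.
remove_failed = False
try:
    cookie_types.remove('biscotti')
except KeyError:
    remove_failed = True

# pop() raises KeyError when there is nothing left to pop.
pop_failed = False
try:
    set().pop()
except KeyError:
    pop_failed = True

print(remove_failed, pop_failed)  # True True
```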


### Set theory

We can also use operations from set theory in math to perform very quick comparisons.

### Set Operations - .union() method

The union method on a set accepts a set as an argument and returns all the unique elements from both sets as a new one. 

As an example, I'm going to create two new sets of the cookies Hugo and I have eaten. Then I'm going to use the union method to see the full set of cookies eaten by both of us.

In [79]:
cookies_jason_ate = set(['chocolate chip', 'oatmeal cream', 'peanut butter'])

cookies_hugo_ate = set(['chocolate chip', 'anzac'])

print(cookies_jason_ate.union(cookies_hugo_ate))

{'peanut butter', 'chocolate chip', 'anzac', 'oatmeal cream'}


### Set Operations - .intersection() method

The intersection method also accepts a set and returns the overlapping elements found in both sets. I'll use the intersection method to see the cookies that Hugo and I both ate. While these two methods help us find commonality, sets also provide methods to help us find differences.

In [82]:
cookies_jason_ate = set(['chocolate chip', 'oatmeal cream', 'peanut butter'])

cookies_hugo_ate = set(['chocolate chip', 'anzac'])

print(cookies_jason_ate.intersection(cookies_hugo_ate))

{'chocolate chip'}


### Set Operations - .difference() method

We can use the difference method, which accepts a set, to find elements in one set that are not present in another set. The target we call the difference method on is important as that will be the basis for our differences. 

Here I want to see the cookies that I ate that Hugo didn't, which I can do by calling difference on my set of cookies and passing in Hugo's set. I can perform the reverse of this operation by using Hugo's set as the target.

In [83]:
cookies_jason_ate = set(['chocolate chip', 'oatmeal cream', 'peanut butter'])

cookies_hugo_ate = set(['chocolate chip', 'anzac'])

cookies_jason_ate.difference(cookies_hugo_ate)

{'oatmeal cream', 'peanut butter'}

In [84]:
cookies_jason_ate = set(['chocolate chip', 'oatmeal cream', 'peanut butter'])

cookies_hugo_ate = set(['chocolate chip', 'anzac'])

cookies_hugo_ate.difference(cookies_jason_ate)

{'anzac'}
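When both operands are already sets, the same operations can be written with operators. The sketch below uses shortened variable names of my own, and adds the `^` operator (equivalent to the .symmetric_difference() method), which returns the elements found in exactly one of the two sets.

```python
jason = {'chocolate chip', 'oatmeal cream', 'peanut butter'}
hugo = {'chocolate chip', 'anzac'}

either = jason | hugo       # same result as jason.union(hugo)
both = jason & hugo         # same result as jason.intersection(hugo)
only_jason = jason - hugo   # same result as jason.difference(hugo)
exactly_one = jason ^ hugo  # same result as jason.symmetric_difference(hugo)

print(sorted(either))
print(sorted(both))         # ['chocolate chip']
print(sorted(only_jason))   # ['oatmeal cream', 'peanut butter']
print(sorted(exactly_one))  # ['anzac', 'oatmeal cream', 'peanut butter']
```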