<font color='blue'> First of all, please “Copy to Drive” to get your own copy for editing. </font>

<font color='red'> Run all the cells. For places with "Complete the codes below", please replace the "XXX" placeholder with your own codes.</font>

# Ch 3.1 Data Structures and Sequences

Python’s data structures are simple but powerful. We start with tuple, list, and dictionary, which are some of the most frequently used built-in collection types.

![data structure](http://drive.google.com/uc?export=view&id=18HnsL2KUdRCg_BKcZElZ0Wuf6QRpSeBy)

* *You can use curly braces to define a set like this: {1, 2, 3}. However, if you leave the curly braces empty like this: {} Python will instead create an empty dictionary. So to create an empty set, use set().
* **A dictionary itself is mutable, but each of its individual keys must be immutable.
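The first note above can be checked directly with `type()`. This is a minimal sketch; the variable names are only illustrative and do not appear elsewhere in the notebook.

```python
# Empty curly braces create a dictionary, not a set
empty_braces = {}
empty_set = set()

print(type(empty_braces))   # <class 'dict'>
print(type(empty_set))      # <class 'set'>

# Non-empty curly braces without colons do create a set
small_set = {1, 2, 3}
print(type(small_set))      # <class 'set'>
```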

## Tuple

### Basics

A tuple is a fixed-length, immutable sequence of Python objects which, once assigned, cannot be changed. The easiest way to create one is with a comma-separated sequence of values wrapped in parentheses:

In [None]:
tup = (4, 5, 6)
tup

(4, 5, 6)

In [None]:
tup = 7, 8, 9  # In many contexts, the parentheses can be omitted
tup

(7, 8, 9)

You can convert any sequence or iterator to a tuple by invoking `tuple()`:

In [None]:
mylist = [1, 2, 3]
tup1 = tuple(mylist)
tup1

(1, 2, 3)

In [None]:
string = "Hello"
tup2 = tuple(string)
tup2

('H', 'e', 'l', 'l', 'o')

Elements can be accessed with square brackets [] as with most other sequence types. Sequences are 0-indexed in Python:

In [None]:
tup1[0]

1

In [None]:
tup2[1]

'e'

When you're defining tuples within more complicated expressions, it’s often necessary to enclose the values in parentheses, as in this example of creating a tuple of tuples:

In [None]:
nested_tup = (4, 5, 6), (7, 8) # the outer parentheses can be omitted; the inner ones are required
nested_tup

((4, 5, 6), (7, 8))

In [None]:
nested_tup[0]

(4, 5, 6)

In [None]:
nested_tup[1]

(7, 8)

While the objects stored in a tuple may be mutable themselves, once the tuple is created it’s not possible to modify which object is stored in each slot:

In [None]:
tup = tuple(['foo', [1, 2], True])
tup[2] = False  # Error: a tuple's slots cannot be reassigned

TypeError: 'tuple' object does not support item assignment

If an object inside a tuple is mutable, such as a list, you can modify it in place:

In [None]:
tup[1].append(3)
tup

('foo', [1, 2, 3], True)

You can concatenate tuples using the `+` operator to produce longer tuples:

In [None]:
(4, None, 'foo') + (6, 0) + ('bar',)

(4, None, 'foo', 6, 0, 'bar')

Multiplying a tuple by an integer, as with lists, has the effect of concatenating that many copies of the tuple:

In [None]:
('foo', 'bar') * 4

('foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar')
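One detail worth keeping in mind is that repetition copies references, not the objects themselves. The sketch below uses illustrative names (`inner`, `repeated`) to show that a mutable element repeated this way is shared by every slot.

```python
inner = [0]
repeated = (inner,) * 3      # three references to the same list object

inner.append(1)              # the change is visible through every slot
print(repeated)                      # ([0, 1], [0, 1], [0, 1])
print(repeated[0] is repeated[2])    # True
```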

### Unpacking Tuples

If you assign to a tuple-like expression of variables, Python will attempt to unpack the value on the right-hand side of the equals sign:

In [None]:
a, b, c = (1, 2, 3)
b

2

In [None]:
tup = 4, 5, (6, 7) # It also works for sequences with nested tuples
a, b, (c, d) = tup
d

7

Using this functionality you can easily swap the values of two variables.

In [None]:
# In other languages:

a = 5
b = 6

temp = a
a = b
b = temp

print(f"a:{a}, b:{b}")

a:6, b:5


In [None]:
# In Python:

a = 5
b = 6

a, b = b, a

print(f"a:{a}, b:{b}")

a:6, b:5


A common use of variable unpacking is iterating over sequences of tuples or lists:

In [None]:
seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

for a, b, c in seq:
    print(f'a={a}, b={b}, c={c}')

a=1, b=2, c=3
a=4, b=5, c=6
a=7, b=8, c=9


There is a special syntax, `*rest`, for when you want to "pluck" a few elements from the beginning of a tuple and collect the remaining ones in a list:

In [None]:
values = (1, 2, 3, 4, 5)
a, b, *rest = values

print(f'a={a}, b={b}, rest={rest}')

a=1, b=2, rest=[3, 4, 5]


In [None]:
a, b, *_ = values  # The name 'rest' is not special; any variable name works.
_                  # By convention, an underscore (_) is used for values you do not need.


[3, 4, 5]
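The starred name does not have to come last. As a small sketch (the names `scores`, `middle`, `head`, and `tail` are made up for illustration), it can also sit in the middle or at the front of the target list, and it always collects a list:

```python
scores = (10, 20, 30, 40, 50)

# Starred name in the middle: keep the two ends, collect the rest
first, *middle, last = scores
print(first, middle, last)   # 10 [20, 30, 40] 50

# Starred name at the front: collect everything except the last element
*head, tail = scores
print(head, tail)            # [10, 20, 30, 40] 50
```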

### Tuple Methods

Since the size and contents of a tuple cannot be modified, it is very light on instance methods. A particularly useful one (also available on lists) is `count()`, which counts the number of occurrences of a value:

In [None]:
a = (1, 2, 2, 2, 3, 4, 2)
a.count(2)

4
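The other tuple method you are likely to meet is `index()`, which returns the position of the first occurrence of a value. A brief sketch, with `letters` as an illustrative name:

```python
letters = ('a', 'b', 'c', 'b')

print(letters.count('b'))    # 2
print(letters.index('b'))    # 1  (position of the first 'b')

# index() raises ValueError when the value is absent
try:
    letters.index('z')
except ValueError:
    print("'z' is not in the tuple")
```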

<font color='red'>Complete the codes in the cell below. Please replace the "XXX" placeholder with your own codes. </font>

In [None]:
list_of_tuples = [(1, 2), (3, 4, 5), (6, 7, 8, 9), (10, 11, 12, 13, 14),(15, 16, 17, 18, 19, 20)]

# Iterate over each tuple in the list (use variable unpacking)
for first, second, *rest in list_of_tuples:

    # Print the values of 'first', 'second', and 'rest' for each tuple
    print(f"First: {first}  Second: {second}  Rest: {rest}")

    # Calculate the total sum of the first two elements and the sum of the rest
    first_two = first + second
    rest_sum = sum(rest)

    # Print the results of the additional operations
    print(f"Sum of first two: {first_two}  Sum of Rest: {rest_sum}")
    print()

First: 1  Second: 2  Rest: []
Sum of first two: 3  Sum of Rest: 0

First: 3  Second: 4  Rest: [5]
Sum of first two: 7  Sum of Rest: 5

First: 6  Second: 7  Rest: [8, 9]
Sum of first two: 13  Sum of Rest: 17

First: 10  Second: 11  Rest: [12, 13, 14]
Sum of first two: 21  Sum of Rest: 39

First: 15  Second: 16  Rest: [17, 18, 19, 20]
Sum of first two: 31  Sum of Rest: 74



## List

### Basics

In contrast with tuples, lists are variable length and their contents can be modified in place. Lists are mutable. You can define them using square brackets [] or using the `list()` type function:

In [None]:
a_list = [2, 3, 7, None]

tup = ("foo", "bar", "baz")
b_list = list(tup)
b_list

['foo', 'bar', 'baz']

In [None]:
b_list[1] = "peekaboo"
b_list

['foo', 'peekaboo', 'baz']

The `list()` built-in function is frequently used in data processing as a way to materialize an iterator or generator expression:

In [None]:
gen = range(10)
gen

range(0, 10)

In [None]:
list(gen)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
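The same call works on a generator expression. The sketch below (with illustrative names) also shows one caveat: unlike a `range`, a generator can be consumed only once.

```python
squares = (n * n for n in range(5))   # generator expression; nothing is computed yet

first_pass = list(squares)            # materializes all values
second_pass = list(squares)           # the generator is now exhausted

print(first_pass)    # [0, 1, 4, 9, 16]
print(second_pass)   # []
```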

### Adding and removing elements

Elements can be appended to the end of the list with the **`append()`** method:


In [None]:
mylist = ['foo', 'bar', 'baz']
mylist.append("dwarf")
mylist

['foo', 'bar', 'baz', 'dwarf']

Using **`insert()`** you can insert an element at a specific location in the list:

In [None]:
mylist.insert(1, "red")  # The insertion index must be between 0 and the length of the list, inclusive.
mylist

['foo', 'red', 'bar', 'baz', 'dwarf']

With **`pop()`** you can remove and return an element at a particular index:

In [None]:
mylist.pop(2)

'bar'

In [None]:
mylist

['foo', 'red', 'baz', 'dwarf']

Elements can be removed by value with **`remove()`**, which locates the first such value and removes it from the list:

In [None]:
mylist.append('foo')
mylist

['foo', 'red', 'baz', 'dwarf', 'foo']

In [None]:
mylist.remove('foo')
mylist

['red', 'baz', 'dwarf', 'foo']
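Two edge cases of these methods are sketched below, using an illustrative list named `colors`: calling `pop()` without an index removes the last element, and `remove()` raises a `ValueError` if the value is not present.

```python
colors = ['red', 'green', 'blue']

last = colors.pop()          # no index: removes and returns the last element
print(last, colors)          # blue ['red', 'green']

try:
    colors.remove('purple')  # this value is not in the list
except ValueError:
    print("'purple' is not in the list")
```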

You can check if a list contains a value using the `in` keyword:

In [None]:
'red' in mylist

True

In [None]:
'country' not in mylist

True

### Concatenating and combining lists

Similar to tuples, adding two lists together with + concatenates them:

In [None]:
[4, None, "foo"] + [7, 8, (2, 3)]

[4, None, 'foo', 7, 8, (2, 3)]

If you have a list already defined, you can append multiple elements to it using the **`extend()`** method:

In [None]:
x = [4, None, "foo"]
x.extend([7, 8, (2, 3)])
x

[4, None, 'foo', 7, 8, (2, 3)]

Note that list **concatenation by addition is a comparatively expensive operation** since a new list must be created and the objects copied over. Using **extend** to append elements to an existing list, especially if you are building up a large list, **is usually preferable**.

    (Concatenation by addition)
    
    everything = []
    for chunk in list_of_lists:
        everything = everything + chunk

    (Preferred)
    
    everything = []
    for chunk in list_of_lists:
        everything.extend(chunk)
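A runnable version of the two loops might look like the sketch below. The list `chunks` is a hypothetical stand-in for `list_of_lists`, and the `is` checks only illustrate that `+` builds a new list object each time while `extend()` grows the existing one; they do not measure speed.

```python
chunks = [[1, 2], [3], [4, 5]]   # hypothetical stand-in for list_of_lists

# Concatenation: every `+` builds a new list object
by_addition = []
original_addition = by_addition
for chunk in chunks:
    by_addition = by_addition + chunk
print(by_addition is original_addition)  # False

# extend(): the same list object grows in place
by_extend = []
original_extend = by_extend
for chunk in chunks:
    by_extend.extend(chunk)
print(by_extend is original_extend)      # True

print(by_addition == by_extend)          # True
```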

### Sorting

You can sort a list in place (without creating a new object) by calling its **`sort()`** method:

In [None]:
a = [7, 2, 5, 1, 3]
a.sort()
a

[1, 2, 3, 5, 7]

In [None]:
b = ["saw", "small", "He", "foxes", "six"]
b.sort(key=len)  # We can pass a secondary sort key. Example: sort a collection of strings by their lengths
b

['He', 'saw', 'six', 'small', 'foxes']
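If you want to keep the original list untouched, the built-in `sorted()` function returns a new sorted list instead. A short sketch with an illustrative list named `words`:

```python
words = ["saw", "small", "He", "foxes", "six"]

by_length = sorted(words, key=len)       # returns a new list
print(by_length)   # ['He', 'saw', 'six', 'small', 'foxes']
print(words)       # ['saw', 'small', 'He', 'foxes', 'six']  (unchanged)

result = words.sort()                    # sorts in place and returns None
print(result)      # None
```

Because `sort()` returns `None`, writing `words = words.sort()` is a common mistake that discards the list.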

### Slicing

You can select sections of most sequence types by using slice notation, which in its basic form consists of `start:stop` passed to the indexing operator `[]`:

In [None]:
seq = [7, 2, 3, 7, 5, 6, 0, 1]
seq[1:5]

[2, 3, 7, 5]

In [None]:
seq[3:5] = [6, 3]  # Can also be assigned with a sequence
seq

[7, 2, 3, 6, 3, 6, 0, 1]

*  `start` index is included
*  `stop` index is not included
*  Number of elements in the result is `stop` - `start`.
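The three rules above can be checked with a quick sketch (the names `numbers`, `start`, `stop`, and `piece` are illustrative). Note that the length rule assumes both bounds fall inside the sequence; out-of-range bounds are clipped.

```python
numbers = [7, 2, 3, 7, 5, 6, 0, 1]
start, stop = 1, 5

piece = numbers[start:stop]
print(piece)                       # [2, 3, 7, 5]
print(len(piece) == stop - start)  # True

# Out-of-range bounds are clipped rather than raising an error
print(numbers[5:100])              # [6, 0, 1]
```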

Either the `start` or `stop` can be omitted, in which case they default to the start of the sequence and the end of the sequence, respectively:

In [None]:
seq[:5]

[7, 2, 3, 6, 3]

In [None]:
seq[3:]

[6, 3, 6, 0, 1]

In [None]:
seq[-4:] # Negative indices slice the sequence relative to the end

[3, 6, 0, 1]

In [None]:
seq[-6:-2]

[3, 6, 3, 6]

A `step` can also be given after a second colon, for example to take every third element (`step = 3`) or to reverse the sequence (`step = -1`):

In [None]:
seq = [1, 2, 3, 4, 5, 6, 7, 8, 9]

In [None]:
seq[::3]

[1, 4, 7]

In [None]:
seq[::-1]

[9, 8, 7, 6, 5, 4, 3, 2, 1]
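A step can be combined with explicit `start` and `stop` values. The sketch below uses an illustrative list named `digits` and also shows that a full slice produces a shallow copy.

```python
digits = [1, 2, 3, 4, 5, 6, 7, 8, 9]

print(digits[1:8:2])    # [2, 4, 6, 8]   indices 1, 3, 5, 7
print(digits[7:2:-2])   # [8, 6, 4]      indices 7, 5, 3

# A full slice makes a shallow copy: equal contents, different object
copy_of_digits = digits[:]
print(copy_of_digits == digits, copy_of_digits is digits)   # True False
```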

<font color='red'>Complete the codes in the cell below. Please replace the "XXX" placeholder with your own codes. </font>

In [None]:
list1 = [1, 'two', 3, 'four', 5, 'six', 7, 'eight', 9, 'ten']
list2 = [11, 'twelve', 13, 'fourteen', 15, 'sixteen', 17, 'eighteen', 19, 'twenty']
list3 = []

# Concatenate list1 and list2 to list3 using extend()
list3.extend(list1 + list2)

# Slice through list3 to keep only integer elements (step = 2)
list3 = list3[::2]

# Append numbers from 0 to 19 that are not in list3
for num in range(20):
    if num not in list3:
        list3.append(num)

# Sort the list in descending order
list3.sort(reverse=True)

# Print the final list
print(list3)


[19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]


## Dictionary

A dictionary stores a collection of key-value pairs, where key and value are Python objects. Each key is associated with a value so that a value can be conveniently retrieved, inserted, modified, or deleted given a particular key. Dictionaries use curly braces `{}` and colons to separate keys and values:

List vs. dictionary:
* a list stores a collection of items accessed by *position* (index)
* a dictionary stores a collection of items accessed by *key*
    - since Python 3.7, a dictionary preserves the insertion order of its keys, but items are still looked up by key, not by position

![Python](http://drive.google.com/uc?export=view&id=1e2y6fkuwhW88ZGaibO9W6g9fU7aS1Fda)

In [None]:
empty_dict = {}

d1 = {"a": "some value", "b": [1, 2, 3, 4]}
d1

{'a': 'some value', 'b': [1, 2, 3, 4]}

You can access, insert, or set elements using the same syntax as for accessing elements of a list or tuple:

In [None]:
d1[7] = "an integer" # dictionary[key] = value
d1

{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}

In [None]:
d1['b']

[1, 2, 3, 4]

You can check if a dictionary contains a key using the same syntax used for checking whether a list or tuple contains a value:

In [None]:
"b" in d1

True
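Checking with `in` first matters because indexing a missing key raises a `KeyError`. As a sketch, the `get()` method offers an alternative that returns a default instead; `inventory` is hypothetical example data.

```python
inventory = {"apples": 3, "pears": 0}   # hypothetical example data

print(inventory.get("apples"))        # 3
print(inventory.get("plums"))         # None  (default when no fallback is given)
print(inventory.get("plums", 0))      # 0     (explicit fallback)

try:
    inventory["plums"]
except KeyError as err:
    print("missing key:", err)        # missing key: 'plums'
```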

You can delete values using either the **`del`** keyword or the **`pop()`** method (which simultaneously returns the value and deletes the key):

In [None]:
d1[5] = "some value"
d1

{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer', 5: 'some value'}

In [None]:
d1["dummy"] = "another value"
d1

{'a': 'some value',
 'b': [1, 2, 3, 4],
 7: 'an integer',
 5: 'some value',
 'dummy': 'another value'}

In [None]:
del d1[5]
d1

{'a': 'some value',
 'b': [1, 2, 3, 4],
 7: 'an integer',
 'dummy': 'another value'}

In [None]:
ret = d1.pop("dummy")
ret

'another value'

In [None]:
d1

{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}
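Popping a key that does not exist raises a `KeyError` unless you pass a second argument as a fallback. A minimal sketch with hypothetical data:

```python
settings = {"theme": "dark"}   # hypothetical example data

removed = settings.pop("theme")
print(removed, settings)             # dark {}

# The key is gone now, so the fallback value is returned instead
fallback = settings.pop("theme", "not set")
print(fallback)                      # not set
```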

The **`keys()`** and **`values()`** methods give you iterable views of the dictionary's keys and values, respectively. The keys appear in insertion order, and both methods return their results in the same corresponding order:

In [None]:
list(d1.keys())

['a', 'b', 7]

In [None]:
list(d1.values())

['some value', [1, 2, 3, 4], 'an integer']

If you need to iterate over both the keys and values, you can use the **`items()`** method to iterate over the keys and values as 2-tuples:

In [None]:
list(d1.items())

[('a', 'some value'), ('b', [1, 2, 3, 4]), (7, 'an integer')]
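In practice, `items()` is usually combined with tuple unpacking in a `for` loop. A short sketch, with `prices` as made-up example data:

```python
prices = {"tea": 2.5, "coffee": 3.0}   # hypothetical example data

# Each item is a (key, value) 2-tuple, unpacked into two loop variables
for item, price in prices.items():
    print(f"{item}: {price}")
# tea: 2.5
# coffee: 3.0

total = sum(prices.values())
print(total)   # 5.5
```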

You can merge one dictionary into another using the **`update()`** method:

In [None]:
d2 = {"b": "foo", "c": 12}
d1.update(d2)  # Any existing keys in the data passed to update will have their old values discarded.
d1

{'a': 'some value', 'b': 'foo', 7: 'an integer', 'c': 12}
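Since `update()` changes the dictionary in place, you may sometimes prefer to build a new merged dictionary and leave the originals alone. One way to sketch this is with dictionary unpacking (`**`); the names `base` and `overrides` are illustrative.

```python
base = {"a": 1, "b": 2}
overrides = {"b": 20, "c": 30}

merged = {**base, **overrides}   # later entries win on duplicate keys
print(merged)   # {'a': 1, 'b': 20, 'c': 30}
print(base)     # {'a': 1, 'b': 2}  (unchanged)
```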

### Creating Dictionaries from Sequences

If you have two sequences that you want to pair up element-wise in a dictionary, you might try to do this:

In [None]:
mapping = {}
key_list = [1, 2, 3]
value_list = ['a', 'b', 'c']

for key, value in zip(key_list, value_list):
    mapping[key] = value

mapping


{1: 'a', 2: 'b', 3: 'c'}

However, since a dictionary is essentially a collection of 2-tuples, the **`dict()`** function accepts a list of 2-tuples:

In [None]:
keys = [1, 2, 3]
values = ['a', 'b', 'c']

# Pair up the elements (zip returns an iterator of 2-tuples)
tuples = zip(keys, values)

# Create a dictionary from the pairs
mapping = dict(tuples)

mapping

{1: 'a', 2: 'b', 3: 'c'}
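A dictionary comprehension is another option when the keys or values need to be transformed while pairing. The sketch below uses illustrative names and also shows that `zip` stops at the shorter input.

```python
codes = [1, 2, 3]
letters = ['a', 'b', 'c']

# Transform the values while building the dictionary
upper_map = {code: letter.upper() for code, letter in zip(codes, letters)}
print(upper_map)   # {1: 'A', 2: 'B', 3: 'C'}

# zip stops at the shorter input, so the unmatched key 3 is dropped
short_map = dict(zip(codes, ['a', 'b']))
print(short_map)   # {1: 'a', 2: 'b'}
```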

### Valid dictionary key types

While the values of a dictionary can be any Python object, the keys generally have to be immutable objects. The technical term here is hashability. You can check whether an object is hashable (can be used as a key in a dictionary) with the **`hash()`** function:

In [None]:
hash("string")

-4738782282401574538

In [None]:
hash((1, 2, (2, 3)))

-9209053662355515447

In [None]:
hash((1, 2, [2, 3])) # fails because lists are mutable

TypeError: unhashable type: 'list'

In [None]:
hash({1,2,3})

TypeError: unhashable type: 'set'

To use a list as a key, one option is to convert it to a tuple, which can be hashed as long as its elements also can be:

In [None]:
d = {}
d[tuple([1, 2, 3])] = 5
d

{(1, 2, 3): 5}
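If you want a yes/no answer rather than an error, the `hash()` check can be wrapped in a small function. `is_hashable` below is a hypothetical helper written for this sketch, not a built-in.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds (hypothetical helper)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable("string"))          # True
print(is_hashable((1, 2, (2, 3))))    # True
print(is_hashable((1, 2, [2, 3])))    # False  (contains a list)
print(is_hashable({1, 2, 3}))         # False  (sets are mutable)
```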

### **Create pandas DataFrame using dictionaries**

<font color='blue'> One of the most common ways to construct a DataFrame is from a dictionary of equal-length lists or NumPy arrays: </font>

In [None]:
import pandas as pd

In [None]:
df = pd.DataFrame({"Total Volume": [64236.62, 54876.98,118220.22],
                   "total bags": [8696.87, 9505.56, 8145.35],
                   "Small Bags": [8603.62, 9408.07, 8042.21],
                   "X Large bags": [0.0, 0.0, 0.0]})
df

|   | Total Volume | total bags | Small Bags | X Large bags |
|---|---|---|---|---|
| 0 | 64236.62 | 8696.87 | 8603.62 | 0.0 |
| 1 | 54876.98 | 9505.56 | 9408.07 | 0.0 |
| 2 | 118220.22 | 8145.35 | 8042.21 | 0.0 |


In [None]:
# convert the columns to a list
Cols = df.columns.tolist()
print(Cols)

['Total Volume', 'total bags', 'Small Bags', 'X Large bags']


## Set

A set is an unordered collection of unique elements. A set can be created in two ways: via the **`set()`** function or via a `set literal` with curly braces:

In [None]:
set([2, 2, 2, 1, 3, 3])

{1, 2, 3}

In [None]:
{2, 2, 2, 1, 3, 3}

{1, 2, 3}

Sets support mathematical set operations:

   *  union
   *  intersection
   *  difference
   *  symmetric difference.

In [None]:
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

 * We cannot access items in a set by referring to an index: since sets are unordered, the items have no index.
 * But we can loop through the set items using a `for` loop, or ask whether a specified value is present in a set by using the `in` keyword.

In [None]:
5 in a

True

In [None]:
for cha in a:
  print(cha)

1
2
3
4
5


The union of these two sets is the set of distinct elements occurring in either set. This can be computed with either the **`union()`** method or the `|` binary operator:

In [None]:
a.union(b)

{1, 2, 3, 4, 5, 6, 7, 8}

In [None]:
a | b

{1, 2, 3, 4, 5, 6, 7, 8}

In [None]:
a or b  # not a set union: "or" returns its first truthy operand, so a non-empty a is returned as is

{1, 2, 3, 4, 5}

The intersection contains the elements occurring in both sets. The `&` operator or the **`intersection()`** method can be used:

In [None]:
a.intersection(b)

{3, 4, 5}

In [None]:
a & b

{3, 4, 5}

In [None]:
a and b  # not a set intersection: "and" returns b here because a is non-empty (truthy)

{3, 4, 5, 6, 7, 8}
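The remaining two operations from the list, difference and symmetric difference, follow the same pattern of a method plus an operator. A brief sketch with illustrative sets `left` and `right`; the results are passed through `sorted()` only to make the printed order predictable.

```python
left = {1, 2, 3, 4, 5}
right = {3, 4, 5, 6, 7, 8}

# Difference: elements in left that are not in right
print(sorted(left - right))                      # [1, 2]
print(sorted(left.difference(right)))            # [1, 2]

# Symmetric difference: elements in exactly one of the two sets
print(sorted(left ^ right))                      # [1, 2, 6, 7, 8]
print(sorted(left.symmetric_difference(right)))  # [1, 2, 6, 7, 8]
```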

A list of commonly used set methods:

![data structure](http://drive.google.com/uc?export=view&id=1TCycAXpk78npC9BEUtO-E_MmPXqazF5q)

All of the logical set operations have in-place counterparts, which enable you to replace the contents of the set on the left side of the operation with the result. For very large sets, this may be more efficient:

In [None]:
c = a.copy()
c |= b
c

{1, 2, 3, 4, 5, 6, 7, 8}

In [None]:
d = a.copy()
d &= b
d

{3, 4, 5}

In [None]:
a

{1, 2, 3, 4, 5}

Like dictionary keys, set elements generally must be immutable, and they must be hashable. In order to store list-like elements (or other mutable sequences) in a set, you can convert them to tuples:

In [None]:
my_data = [1, 2, 3, 4]
my_set = {tuple(my_data)}
my_set

{(1, 2, 3, 4)}

In [None]:
hash(my_set)

TypeError: unhashable type: 'set'

In [None]:
my_data = [1, 2, 3, 4]
my_set = {my_data}

TypeError: unhashable type: 'list'
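For the same reason, a set cannot contain another set. If you need a set of sets, one option is the built-in `frozenset`, an immutable and hashable counterpart of `set`. A sketch with illustrative names:

```python
group = frozenset([1, 2, 3])                 # immutable, so it can be hashed
set_of_sets = {group, frozenset([4, 5])}

# Element order does not matter when comparing frozensets
print(frozenset([3, 2, 1]) in set_of_sets)   # True

lookup = {group: "first group"}              # also usable as a dictionary key
print(lookup[frozenset([1, 2, 3])])          # first group
```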

You can also check if a set is a subset of (is contained in) or a superset of (contains all elements of) another set:

In [None]:
a_set = {1, 2, 3, 4, 5}
{1, 2, 3}.issubset(a_set)

True

In [None]:
a_set.issuperset({1, 2, 3})

True
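The comparison operators offer a shorthand for these checks. The sketch below uses illustrative sets `big` and `small`; `<=` and `>=` correspond to `issubset()` and `issuperset()`, while `<` tests for a proper subset.

```python
big = {1, 2, 3, 4, 5}
small = {1, 2, 3}

print(small <= big)              # True   (subset)
print(big >= small)              # True   (superset)
print(small < small)             # False  (a set is not a proper subset of itself)
print(small.isdisjoint({7, 8}))  # True   (no elements in common)
```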

Sets are equal if and only if their contents are equal:

In [None]:
{1, 2, 3} == {3, 2, 1}

True

<font color='red'>Complete the codes in the cell below. Please replace the "XXX" placeholder with your own codes. </font>

In [None]:
# Maria's set of ingredients
maria = {'tomatoes', 'onions', 'garlic', 'olive oil', 'pasta'}

# John's set of ingredients
john = {'onions', 'pepper', 'chicken', 'mushrooms', 'rice'}

# Ingredients for a French dish
dish = {'chicken', 'red wine', 'bacon', 'onions', 'carrots',
                           'garlic', 'mushrooms', 'thyme', 'bay leaves', 'flour',
                           'butter', 'chicken broth', 'salt', 'pepper'}

# Combine Maria's and John's ingredients
combined_set = maria | john

# Find the ingredients needed for the French dish that are not in Maria's and John's combined set
#missing_ingredients = {item for item in dish if item not in combined_set}
missing_ingredients = dish - combined_set

# Print the needed ingredients
print(missing_ingredients)

{'bay leaves', 'red wine', 'carrots', 'butter', 'bacon', 'flour', 'thyme', 'chicken broth', 'salt'}


In [None]:
print("The end of my notebook")

The end of my notebook
