<h1 style="font-size: 40px; margin-bottom: 0px;">16.5 Python Data Structures</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 750px;"></hr>

Today in lecture, we went over some more compound data types and how to work with them in Python. Of particular importance are the data structures found in the pandas package, which is commonly used for data analysis and which we'll use extensively in the Fall semester. In today's exercises, you'll review lists, reinforce what you learned with new compound data types, and begin learning more about Series. Tomorrow, we'll continue by reinforcing our understanding of DataFrames and practicing working with them.

<strong>Today's learning objectives:</strong>
<ol>
    <li>Import necessary packages</li>
    <li>Learn to work with different compound data types</li>
    <li>Understand Series</li>
    <li>Practice basic operations on data structures</li>
</ol>

<h1 style="font-size: 40px; margin-bottom: 0px;">Exercises</h1>
<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 400px;"></hr>

<h1 style="font-size: 32px;">Exercise #1: Import packages</h1>

As a common practice, you generally will want to import all the packages that you'll need at the top of your notebook or Python file, so that you have all your definitions ready to go for when you need to call them up later on in your file.

Recall that importing a package into your notebook or Python file makes use of the <code>import</code> keyword, and you can also assign a name to the package or module that you're importing using the <code>as</code> keyword.

Run the code cell below to import the packages that we'll use today.

In [None]:
import numpy as np
import pandas as pd

<h1 style="font-size: 32px;">Exercise #2: Append lists</h1>

Recall that lists are mutable, and we can not only change elements within a list but also append new elements to an existing list. To do this, you'll need to make use of the <code>list.append()</code> function, where <code>list</code> is replaced by your list that you want to append something to. You'll pass your object to be appended in the parentheses, so your code would look something like the below example:

```
time = ['breakfast', 'lunch', 'dinner']

time.append('bonito flake')
```

Create a new code cell below and do the following:
<ol>
    <li>Create a new list and assign it to a variable</li>
    <li>Use the <code>list.append()</code> function to append a new object to your list</li>
    <li>Use the <code>print()</code> function to check if you successfully appended your object to your list</li>
</ol>

<h1 style="font-size: 32px;">Exercise #3: Use a list to create an array</h1>

Now using the list from exercise #2, create a numpy array using the <code>np.array()</code> function. <a href="https://numpy.org/doc/stable/reference/generated/numpy.array.html" rel="noopener noreferrer"><u>Documentation for the <code>np.array()</code> function can be found here.</u></a>

You'll need to pass your list to the <code>np.array()</code> function, and don't forget to assign your array to a new variable so that you'll be able to use it later on.

Unlike in <code>list.append()</code>, where <code>list</code> stands for your own list, the <code>np</code> portion of this function tells the Python interpreter that we are calling up a definition from the numpy package, so it knows where to look to find the correct definition.
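As a rough sketch of this step, the snippet below converts a list into an array. The names `meals` and `meals_array` are placeholders chosen for illustration; substitute your own list from exercise #2.

```python
import numpy as np

# Placeholder list for illustration; use your own list from exercise #2
meals = ['breakfast', 'lunch', 'dinner']
meals.append('bonito flake')

# np.array() builds a new array from the elements of the list
meals_array = np.array(meals)

print(type(meals))        # <class 'list'>
print(type(meals_array))  # <class 'numpy.ndarray'>
print(meals_array.shape)  # (4,)
```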

<h1 style="font-size: 32px;">Exercise #4: Accessing elements of an array</h1>

Like with lists, you can have Python pull and return specific elements or a slice of your array, and you can do so with the same syntax.

Give it a try below to see if you can pull a single element and if you can pull a slice out of your newly created array.

<h1 style="font-size: 32px;">Exercise #5: Math operations in arrays</h1>

Recall from lecture that arrays support element-wise math operations whereas lists do not: operators such as <code>+</code> and <code>*</code> act on a list the way they would on a string, concatenating or repeating it. In the code cell below, create a new array consisting of floats and/or ints and a list consisting of the same elements. Don't forget to assign them to their own variables.

Now perform some basic mathematical operations on your array and try to do the same on your list.

What ends up happening to the values in the array compared to what happens to the list? Do you run into any errors when trying to perform operations on a list?
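One possible way to set up the comparison is sketched below. The variable names and values are made up for illustration, and only two operators are shown.

```python
import numpy as np

values_list = [1, 2.5, 3]
values_array = np.array(values_list)

# Multiplication: element-wise on the array, repetition on the list
print(values_array * 2)
print(values_list * 2)

# Addition: element-wise on the array, an error on the list
print(values_array + 1)
try:
    values_list + 1   # a list can only be concatenated with another list
except TypeError as err:
    print('TypeError:', err)
```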

<h1 style="font-size: 32px;">Exercise #6: Modifying arrays</h1>

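Arrays, like lists, are mutable, so the elements in an array can be changed. Adding elements works a little differently: an array has a fixed size, so a function that appends to an array returns a new, longer array rather than changing the original.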

In the spot below, see if you can use the same syntax that you would to modify a single element in a list to update an element in your array that you set up in exercise #3.

Now see if you can make use of the <code>np.append()</code> function to add one or more new elements to your array. <a href="https://numpy.org/doc/stable/reference/generated/numpy.append.html" rel="noopener noreferrer"><u>Documentation for <code>np.append</code> can be found here.</u></a>

When you take a look at the documentation, what does the function need in order to perform its task?

What parameter will take your array? What parameter will take the new values that you want to append to your array?

Were you able to append new elements to your array? Did you encounter any errors with data type conflicts or other types of errors?
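For reference, here is a small sketch of both kinds of modification, using a made-up array called `numbers`. Note that `np.append()` returns a new array, so its result has to be assigned to a variable if you want to keep it.

```python
import numpy as np

numbers = np.array([1, 2, 3])

# Item assignment changes the array in place, just like with a list
numbers[0] = 10

# np.append(arr, values) returns a NEW array; the original is left unchanged
longer = np.append(numbers, [4, 5])

print(numbers)  # [10  2  3]
print(longer)   # [10  2  3  4  5]
```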

<h1 style="font-size: 32px;">Exercise #7: Set up a 2D list and a 2D array</h1>

This exercise will be slightly more challenging than the previous ones, and it'll require you to visualize the structure of your 2D list and 2D array in order to set it up from 1-dimensional text.

Recall that lists can be set up using the square brackets <code>&lbrack; &rbrack;</code>, and if you want to create a 2D list, you essentially need to nest list(s) within a larger list. In other words, you create a list of lists where the number of nested lists determines one dimension, and the number of elements in each nested list determines the second dimension.

See if you can set up a 2 column x 3 row list and use the <code>np.array()</code> function to create a 2D array from it. Don't forget to assign both to their own variables.

Now use the <code>print()</code> function to see how the output looks for a 2D list vs a 2D array. What difference do you see in terms of how they are written out?

<h1 style="font-size: 32px;">Exercise #8: Pull an element from a 2D list and 2D array</h1>

Now take what you know about accessing elements in a list one step further to pull out a specific element from a 2D list. Think about how you specified the position for a one-dimensional list, and how you can pull from a second axis if you have a two dimensional list.

Give it a try for a 2D array, which follows a similar syntax to pull elements out.
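If you get stuck, the following sketch shows one way to index into two dimensions. The names `grid_list` and `grid_array` and their values are invented for this example.

```python
import numpy as np

# 3 rows x 2 columns: three nested lists with two elements each
grid_list = [[1, 2], [3, 4], [5, 6]]
grid_array = np.array(grid_list)

print(grid_array.shape)   # (3, 2)

# Lists chain two index operations: first the row, then the column
print(grid_list[2][0])    # 5

# Arrays accept the same chained form, plus a single [row, column] pair
print(grid_array[2][0])   # 5
print(grid_array[2, 0])   # 5
```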

<h1 style="font-size: 32px;">Exercise #9: Understanding tuples</h1>

Tuples are another compound data type in Python, and they're largely similar to lists. The key difference is that while lists are changeable (mutable), tuples cannot be changed (immutable). This means that while you can extend a list or array with functions such as <code>list.append()</code> or <code>np.append()</code>, you cannot do so with a tuple.

Tuples can contain a heterogeneous set of data types, so in that respect, they are similar to a list. The elements stored within a tuple are indexed by integers, like lists and arrays, so you can have Python return specific elements of a tuple.

<strong>Creating a tuple</strong>

The syntax for creating a tuple in Python is with parentheses <code>( )</code> with each element of a tuple separated by a comma. 

```
(1, 4, 5, 19, 25, 60)
```

Since parentheses are also used in Python for grouping, there's an additional consideration you need to keep in mind if you ever want to create a tuple with a single element. You'll need to add a trailing comma after the single element to specify that it is a tuple containing just one element.

```
(1,)
```
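A quick way to see why the comma matters is to check the type with and without it. The variable names here are only for illustration.

```python
not_a_tuple = (1)   # parentheses only group here, so this is just the int 1
one_tuple = (1,)    # the trailing comma makes it a tuple

print(type(not_a_tuple))  # <class 'int'>
print(type(one_tuple))    # <class 'tuple'>
print(len(one_tuple))     # 1
```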

Give it a try below. Create a tuple and assign it to a variable.

<h1 style="font-size: 32px;">Exercise #10: Accessing elements of a tuple</h1>

Because elements in a tuple also have an index, much like lists and arrays, you can also use slice notation to access the elements of a tuple. Go ahead and give it a try below.

Now what happens if you try to update an element of a tuple with something else? Do you encounter an error? What is it telling you?

<h1 style="font-size: 32px;">Exercise #11: Packing and unpacking a tuple</h1>

Tuple packing occurs when you assign multiple values to a single variable. Python will interpret the assignment to generate a tuple from the different values that you assigned.

```
packed_potato = 1, 'hot potato', 2, 'tired potato', 3
```

The five objects to the right of the <code>=</code> operator will be packed together into a single tuple that is assigned to the variable <code>packed_potato</code>.

You can unpack a tuple to extract the values stored within it as individual variables:

```
u1, u2, u3, u4, u5 = packed_potato
```

In this case, you have multiple variables on the left side of the <code>=</code> operator, which Python will then interpret as unpacking the tuple into its individual objects and assigning each object to each variable. This requires that you have provided as many variables on the left as there are objects in the tuple.
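The sketch below repeats the packing and unpacking from above and then shows what happens when the number of variables does not match. The names `a` and `b` are placeholders.

```python
packed_potato = 1, 'hot potato', 2, 'tired potato', 3

# Five variables for five objects: unpacking succeeds
u1, u2, u3, u4, u5 = packed_potato
print(u2)   # hot potato

# Two variables for five objects: the counts differ, so Python raises ValueError
try:
    a, b = packed_potato
except ValueError as err:
    print('ValueError:', err)
```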

See if you can set up a handful of objects and pack them into a tuple and then unpack the tuple with a new set of variables. Then have Python output one of the unpacked tuple objects using the <code>print()</code> function.

<h1 style="font-size: 32px;">Exercise #12: Zip together lists and/or arrays to use in a for-loop</h1>

For this exercise, set up two lists and two arrays that are all equal in length. They don't have to have the same objects contained within them for this example, but their lengths should be equal.

Now set up a for-loop where you print out the resulting tuples that arise from zipping together two lists/arrays using the <code>zip()</code> function.
What do you need to set up a for-loop and what's the appropriate syntax to do so? Your for-loop should look something like:

```
for i in zip(list_1, list_2):
    print(i)
```

What do you notice about how the two lists/arrays got zipped together? Are they paired by position or by something else?

Now unpack the tuple while iterating through the zipped lists. What do you need to adjust in your original for-loop so that you can unpack the tuples?
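One possible shape for the unpacking loop is sketched here, assuming a made-up list `names` and array `ages` of equal length. The `pairs` list is only there to collect the results for inspection.

```python
import numpy as np

names = ['Tony', 'Victor', 'Liebchen']   # placeholder list
ages = np.array([1, 2, 3])               # placeholder array of the same length

# zip() pairs elements by position; each pair is unpacked into two variables
pairs = []
for name, age in zip(names, ages):
    print(name, age)
    pairs.append((name, int(age)))
```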


<h1 style="font-size: 32px;">Exercise #13: Understanding sets</h1>

Sets can contain either homogeneous data types or heterogeneous data types. While you can add and remove elements within a set, the elements themselves must be immutable. One notable distinction from the previous compound data types is that sets are unordered and also unindexed. Additionally, you can't have duplicate elements in a set, and since sets are unordered, you cannot rely on the order in which the elements appear. This means that you can't use the usual indexing and slicing syntax to retrieve information from a set. Instead, there is a special set of operators that allow you to operate on a set.

Sets are denoted with curly brackets <code>{ }</code>, and each object is separated by commas.

```
{'tomato', 1, 'potato', 3, 'rice', 'wheat', 6}
```

Create a few sets below with some shared and some not shared values. Don't forget to assign them to variables, as we'll be performing set operations on them.

<h1 style="font-size: 32px;">Exercise #14: Updating sets</h1>

You can add or remove elements from a set using either the <code>add()</code> function to add a single element, the <code>update()</code> function to add every element of a list or another set, or the <code>remove()</code> function to remove an element. For <code>remove()</code>, you will get an error if what you want to remove is not contained within the set. An alternative function is <code>discard()</code>, which won't raise an error if the value is not present in the set.
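As an illustration, the sketch below runs each of these functions on a made-up set called `grains`. It also uses <code>add()</code>, which inserts a single element, alongside <code>update()</code>, which expects a collection of elements.

```python
grains = {'rice', 'wheat'}

grains.add('oats')                 # add() inserts a single element
grains.update(['barley', 'rye'])   # update() inserts every element of an iterable

grains.remove('rye')               # remove() deletes an element that is present
grains.discard('quinoa')           # discard() silently ignores a missing element

try:
    grains.remove('quinoa')        # remove() raises KeyError for a missing element
except KeyError as err:
    print('KeyError:', err)

print(sorted(grains))  # ['barley', 'oats', 'rice', 'wheat']
```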

Play around with these functions to update your sets, and see if you can add three more elements to each set that are unique to each of them.

<h1 style="font-size: 32px;">Exercise #15: Operate on sets</h1>

To compare sets, we can use operators such as:
<ul>
    <li><code>&vert;</code> or <code>union()</code> to output all elements of both sets without duplication</li>
    <li><code>&amp;</code> or <code>intersection()</code> to output the shared elements of both sets</li>
    <li><code>-</code> or <code>difference()</code> to output the elements in the first set that are not in the second</li>
    <li><code>&Hat;</code> or <code>symmetric_difference()</code> to output the elements that are in either the first set or the second set but not in both</li>
</ul>
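The four operations listed above can be sketched on two small example sets. The sets `a` and `b` are made up for illustration, and the comments list the elements each result contains.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

print(a | b)   # union: 1, 2, 3, 4, 5, 6
print(a & b)   # intersection: 3, 4
print(a - b)   # difference: 1, 2
print(a ^ b)   # symmetric difference: 1, 2, 5, 6

# The method forms return the same results as the operators
print(a.union(b) == (a | b))   # True
```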

See if you can make use of each of these operators on the sets that you created to identify where the union, intersection, difference, and symmetric differences lie.

<h1 style="font-size: 32px;">Exercise #16: Understanding dictionaries</h1>

Dictionaries are a type of compound data structure that allows us to associate a value with a specific key for more efficient access to that value. While we can use lists to perform this same function, dictionaries are faster. In a dictionary, values are mapped to a unique key that can be used to quickly find a specific object (the mapped value). The speed comes from the fact that dictionary keys are hashable, meaning that Python can compute an integer (the hash) from each key and use it as a shortcut to find the associated value instead of searching element by element.
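To make the idea of hashable keys a little more concrete, here is a minimal sketch. The dictionary `ok` and its key are invented for this example.

```python
# Equal keys always produce the same hash within a Python session
print(hash('Tony') == hash('Tony'))   # True

# Mutable objects such as lists are not hashable, so they cannot be keys
try:
    hash(['Tony'])
except TypeError as err:
    print('TypeError:', err)

# A tuple of hashable items is itself hashable and works as a key
ok = {('Tony', 'cat'): 1}
print(ok[('Tony', 'cat')])   # 1
```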

You can test it out below. Check an object's ID using <code>id()</code> and then check its hash using <code>hash()</code>. Do you notice any differences between the outputs? What's the ID value telling us versus the hash value?

<h1 style="font-size: 32px;">Exercise #17: Creating dictionaries</h1>

To create a dictionary, you will make use of the curly brackets <code>{ }</code> or the <code>dict()</code> function. You'll pair up a key with a value by following the <code>key:value</code> syntax and separating pairs with commas.

So an example would be something like:

```
cat_dictionary = {'Tony': 1, 'Victor': 2, 'Liebchen': 3}
```

See if you can set up your own dictionary following the same syntax. Don't forget to save the dictionary to a variable.

<h1 style="font-size: 32px;">Exercise #18: Accessing values in a dictionary through a key</h1>

As shown in lecture, you can make use of 'dictionary' notation to pull out the value associated with a specific key. This isn't too different than trying to access a single element from an array or list. The only difference is that instead of specifying the index position, you'll specify the key instead.

For example:

```
cat_dictionary['Tony']
```

The example will pull the value associated with the key <code>'Tony'</code>.

Practice pulling values from your dictionary by specifying the associated key.

<h1 style="font-size: 32px;">Exercise #19: Updating values in a dictionary through a key</h1>

Similarly to lists and arrays, when we call up a key and its associated value, we can also assign a new value to that key by making use of the assignment operator <code>=</code>.

Give it a try below. See if you can assign a new value to one of your dictionary keys.

Now check to see if the value was properly updated.

<h1 style="font-size: 32px;">Exercise #19: Adding new values in a dictionary through a key</h1>

You can take advantage of 'dictionary' notation to essentially pull up a 'non-existent' key and assign a new value to it kind of like it had always existed. This will create a new key:value pair in your dictionary, and you'll be able to access that value through the same key.
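A short sketch of both updating and adding through a key, reusing the <code>cat_dictionary</code> example from earlier. The key <code>'Mochi'</code> is a made-up addition.

```python
cat_dictionary = {'Tony': 1, 'Victor': 2, 'Liebchen': 3}

cat_dictionary['Tony'] = 10    # existing key: the value is replaced
cat_dictionary['Mochi'] = 4    # new key: a new key:value pair is created

print(cat_dictionary)
# {'Tony': 10, 'Victor': 2, 'Liebchen': 3, 'Mochi': 4}
```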

Give it a try below, where you pull up a key that doesn't exist in your dictionary and then use the assignment operator to assign a new value to that key.

Now check to see if the assignment was successful and if your dictionary now has been updated with that new key:value pair.

<h1 style="font-size: 32px;">Exercise #20: Deleting key:value pairs in a dictionary</h1>

There are two ways to delete a key:value pair from a dictionary. The first is to make use of the <code>del</code> keyword followed by invoking the dictionary and key associated with the key:value pair that you want to delete. Alternatively, you can also make use of the <code>dict.pop()</code> function, which will delete the key:value pair for the key that you provide it and then also output the value that got deleted.

For example, if I wanted to delete the key value pair <code>'Tony': 1</code> from the <code>cat_dictionary</code>, the code would be as follows:

```
del cat_dictionary['Tony']
```

or

```
cat_dictionary.pop('Tony')
```

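The key distinction is that <code>dict.pop()</code> returns the value that it removed, whereas the <code>del</code> keyword returns nothing. Both will throw a <code>KeyError</code> if the key does not exist, but <code>dict.pop()</code> also accepts an optional second argument, a default value to return instead of raising an error when the key is missing.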
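The behavior of both approaches with a missing key can be checked with a sketch like the following. Note that <code>dict.pop()</code> only avoids the error when it is given a default value as its second argument.

```python
cat_dictionary = {'Tony': 1, 'Victor': 2, 'Liebchen': 3}

del cat_dictionary['Tony']                # removes the pair, returns nothing
removed = cat_dictionary.pop('Victor')    # removes the pair and returns its value

# With a missing key, del raises KeyError (and so does pop() without a default)
try:
    del cat_dictionary['Tony']
except KeyError as err:
    print('KeyError:', err)

# pop() with a default returns the default instead of raising
missing = cat_dictionary.pop('Tony', None)

print(removed, missing)   # 2 None
```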

See if you can delete a key:value pair from your dictionary using either of the above methods.

<h1 style="font-size: 32px;">Exercise #21: Using a random number generator</h1>

For these sets of exercises, we'll make use of a random number generator to populate our Series and DataFrames. One random number generator function that we'll use is the <code>np.random.rand()</code> function from the numpy package. <a href="https://numpy.org/doc/stable/reference/random/generated/numpy.random.rand.html" rel="noopener noreferrer"><u>Documentation for <code>np.random.rand()</code> can be found here.</u></a> This function simply needs us to pass a number that will define the shape of the array containing random numbers between 0 and 1 (excluding 1).

Give the random number generator a try below. Give it a single number. What is its output? Is it one-dimensional?



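What if, instead, we provide it with 2 numbers as separate arguments, such as <code>np.random.rand(2, 3)</code>? How does the output look? Note that this function expects each dimension as its own integer argument rather than as a list.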
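For comparison, the sketch below calls the function with one and with two integers. <code>np.random.rand()</code> takes each dimension as a separate integer argument; the sizes used here are arbitrary.

```python
import numpy as np

one_d = np.random.rand(5)      # one integer: 1D array of 5 values
two_d = np.random.rand(2, 3)   # two integers: 2D array, 2 rows x 3 columns

print(one_d.shape)   # (5,)
print(two_d.shape)   # (2, 3)
```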

<h1 style="font-size: 32px;">Exercise #22: Understanding Pandas Series</h1>

A pandas Series is a one-dimensional labeled array, and it acts similarly to an array. But a Series is also dictionary-like. As you saw in lecture, you can perform certain operations as if your Series is a dictionary. 

So to get a better understanding of what's going on under the hood, we'll create a Series and poke around with it to see how it's handled by Python.

We've already imported <code>pandas</code> as <code>pd</code> earlier, so the package is ready to use. To create a pandas Series, you will use the <code>pd.Series()</code> function. <a href="https://pandas.pydata.org/docs/reference/api/pandas.Series.html" rel="noopener noreferrer"><u>Documentation for <code>pd.Series()</code> can be found here.</u></a> Don't forget that we imported pandas as <code>pd</code>, so the <code>pd</code> is telling the Python interpreter where it can find the definitions we're invoking here.

```
pd.Series(data, index=index)
```

Where <code>data</code> can be a scalar value (which is basically just a single value of some kind) or an <code>ndarray</code> or a Python dictionary.

When creating a Series, the index is by default set up similarly to an <code>ndarray</code>, with the element in the first position indexed as 0, and the element in the last position indexed with the length minus 1, or <code>len(your_data_set)-1</code>. 

So let's start off by creating a Series from randomly generated values, and we can break the code down to better understand how to read Python code as things get slightly more complicated.

When you provide a scalar value (like a single <code>int</code> or <code>float</code>) along with an index when creating a Series, pandas will create a Series repeating that scalar value for the entire length of the index.
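A minimal sketch of the scalar case, using an arbitrary value of `5.0` and a made-up three-label index:

```python
import pandas as pd

# The scalar 5.0 is repeated once for every label in the index
scalar_series = pd.Series(5.0, index=['a', 'b', 'c'])

print(scalar_series)
print(len(scalar_series))   # 3
```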

For this exercise, we'll create an array of 10 randomly generated numbers and use that array as the basis for our Series.

```
rand_array = np.random.rand(10)
```

Then to use that array to set up our Series:

```
rand_series = pd.Series(rand_array)
```

To better understand this, we can break this line down:
<ul>
    <li><code>rand_series</code> - this is the variable to which we assign the Series we are creating</li>
    <li><code>=</code> - this is our assignment operator</li>
    <li><code>pd.Series()</code> - we are telling Python we want to use the <code>Series()</code> function of the pandas package that we imported as <code>pd</code></li>
    <li><code>rand_array</code> - our array that we want to be the basis of our Series</li>
</ul>

Go ahead and set that up below.

Now output your new Series using the <code>print()</code> function. Note that if you re-run the initial line of code to generate a new array and Series, then your output will change as well since it's randomly generated.

<h1 style="font-size: 32px;">Exercise #23: Specify an index when setting up a Series</h1>

You might have noticed in the documentation for <code>pd.Series()</code> that there is a parameter that you can use to specify your index values instead of having the index just reflect the position.

First, set up a list consisting of 10 strings.

Then, begin setting up the code to generate a new Series consisting of 10 randomly generated numbers, but this time pass your list of 10 strings as an argument to the <code>index=</code> parameter. Then create your Series.

Now output this Series and take a look at how the index looks compared to the other Series you set up without specifying an index.

<h1 style="font-size: 32px;">Exercise #24: Create a Series from a dictionary</h1>

Recall that Series are also dictionary-like, and as a result, you can also set up a Series from a dictionary object. The keys of the dictionary will become the index values (labels) of the Series, and the values of the dictionary will become the values of the Series.
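The mapping from keys to labels can be sketched with the <code>cat_dictionary</code> example from earlier. The name `cat_series` is a placeholder.

```python
import pandas as pd

cat_dictionary = {'Tony': 1, 'Victor': 2, 'Liebchen': 3}

# The dictionary is the only argument; no index needs to be specified
cat_series = pd.Series(cat_dictionary)

print(cat_series.index.tolist())   # ['Tony', 'Victor', 'Liebchen']
print(cat_series.tolist())         # [1, 2, 3]
```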

Below try setting up a Series using a dictionary that you have already set up from an earlier exercise. All you'll need to do is to provide the dictionary to the <code>pd.Series()</code> function, and you don't need to specify additional arguments.

Now output this Series. What dictionary component has become the labels of the Series and what has become the values?

<h1 style="font-size: 32px;">Exercise #25: Access elements of a Series</h1>

Since a Series shows similarities to lists, arrays, and dictionaries, you can access elements within a Series similarly. Play around below to access the elements of one of your Series using slice notation and 'dictionary' notation.
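As a starting point, the sketch below shows label-based and position-based access on a small Series. It uses the <code>.iloc</code> and <code>.loc</code> accessors, which are an addition here rather than something covered above; they make explicit whether you mean a position or a label.

```python
import pandas as pd

cat_series = pd.Series({'Tony': 1, 'Victor': 2, 'Liebchen': 3})

# 'Dictionary' notation: look up a value by its label
print(cat_series['Victor'])              # 2

# .iloc selects by integer position, like a list or array
print(cat_series.iloc[0])                # 1
print(cat_series.iloc[0:2])              # first two elements

# .loc selects by label; a label slice includes both endpoints
print(cat_series.loc['Tony':'Victor'])   # Tony and Victor
```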

Do you get any warnings or run into issues when using one method over the other? 

<h1 style="font-size: 40px; margin-bottom: 0px;">Summary</h1>
<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 400px;"></hr>

Today, you learned more about different compound data types, their differences, and how to operate on them. You also learned about Pandas Series and how to create a Series object. Much like with the other compound data types, you also learned how to operate on Series and to pull information out of them.