## **Python Bootcamp - Unit 4**
---
**Author:** David Dobolyi

**Key Concepts**
- [Variables](#Variables)
    - [Variable Naming](#VariableNaming)
    - [Augmented Assignment](#AugmentedAssignment)
    - [Multiple Assignment](#MultipleAssignment)
    - [Deleting Variables](#DeletingVariables)
- [Data Structures](#DataStructures)
    - [Lists](#Lists)
        - [Accessing Elements](#ListsAccessingElements)
        - [In and Not In](#InAndNotIn)
        - [Updating Elements](#ListsUpdatingElements)
        - [Adding Elements](#ListsAddingElements)
        - [Deleting Elements](#ListsDeletingElements)
        - [Joining](#ListsJoining)
        - [Other Methods](#ListsOtherMethods)
        - [Generating Ranges](#ListsGeneratingRanges)
    - [Tuples](#Tuples)
        - [Accessing Elements](#TuplesAccessingElements)
        - [Updating/Adding/Deleting Elements](#TuplesUpdatingAddingDeletingElements)
        - [Joining](#TuplesJoining)
        - [Other Methods](#TuplesOtherMethods)
    - [Sets](#Sets)
        - [Accessing Elements](#SetsAccessingElements)
        - [Updating Elements](#SetsUpdatingElements)
        - [Adding Elements](#SetsAddingElements)
        - [Deleting Elements](#SetsDeletingElements)
        - [Joining](#SetsJoining)
        - [Other Methods](#SetsOtherMethods)
    - [Dictionaries](#Dictionaries)
- [References vs. Copies](#ReferencesCopies)
    - [Comparison Operators: Identity](#ComparisonOperatorsIdentity)


---
### <a name = "Variables">Variables</a>

Python allows us to assign values to *variables* so we can refer to them later. To perform assignment, we must provide a named variable and a value to bind to it using the assignment operator `=`:

In [1]:
variable = 5

Once assigned, we can refer to the variable we created by name to call it back:

In [2]:
variable

5

Every value in Python has a data type, and Python is *dynamically typed*, meaning that the type associated with a variable is determined by the object it currently holds rather than being declared in advance (languages that require the latter are *statically typed*):

In [3]:
type(variable)

int

As shown in this example, because `variable` was set to `5`, which is an `int`, the variable itself is reported as an `int`.

If we change the value stored in our variable, note that the type will change dynamically as well:

In [4]:
variable = 5.5
type(variable)

float

#### <a name = "VariableNaming">Variable Naming</a>

When it comes to naming variables in Python, there are several important rules that must be considered. Specifically, variable names:
- May contain letters, numbers, underscores (i.e., `_`), and/or Unicode characters
- Must not begin with a number
- Must not contain punctuation (e.g., `!`, `@`)
- May be of any length
- Are case sensitive (e.g., `Variable` is not the same as `variable`)
- Must not be one of Python's reserved words (e.g., `True` or `None`)

Regarding the last point, to see a full list of reserved keywords, you can use the following command:

In [5]:
help('keywords')


Here is a list of the Python keywords.  Enter any keyword to get more help.

False               class               from                or
None                continue            global              pass
True                def                 if                  raise
and                 del                 import              return
as                  elif                in                  try
assert              else                is                  while
async               except              lambda              with
await               finally             nonlocal            yield
break               for                 not                 



In addition, there are several common conventions for creating variable names that are clear and easy to understand. For example, see the PEP 8 guide for a full set of [descriptive](https://www.python.org/dev/peps/pep-0008/#descriptive-naming-styles) and [prescriptive](https://www.python.org/dev/peps/pep-0008/#prescriptive-naming-conventions) recommendations.
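As a quick check of the naming rules above, the following sketch tests a few candidate names using the built-in `str.isidentifier` method and the standard-library `keyword` module. The helper name `isValidName` and the sample names are made up for illustration.

```python
import keyword

def isValidName(name):
    # a usable variable name must be a valid identifier and not a reserved keyword
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ['variable', 'Variable', '_temp2', '2fast', 'my-var', 'None']:
    print(candidate, isValidName(candidate))
```

The first three names are accepted, whereas the last three are rejected (a leading number, a punctuation character, and a reserved word respectively).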

#### <a name = "AugmentedAssignment">Augmented Assignment</a>

In addition to using `=` for assignment, Python supports several more operators for *augmented assignment*. Augmented assignment is useful for writing more condensed code when changing values in variables involves basic arithmetic. For instance, suppose we wanted to increment the value of our variable named `variable` by one. We could do this as follows:

In [6]:
variable = 5
variable = variable + 1
variable

6

Alternatively, we could use augmented assignment to simplify this operation:

In [7]:
variable = 5
variable += 1
variable

6

Augmented assignment can be done with a wide range of arithmetic operators, although addition and subtraction are the most common. For an example involving multiplication, consider the following:

In [8]:
variable = 5
variable *= 2
variable

10
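To illustrate a few of the other operators, here is a small sketch that chains several augmented assignments on one variable. The variable name `total` and the starting value are arbitrary choices for this example.

```python
total = 17
total -= 2    # subtraction: 17 - 2 gives 15
total //= 4   # floor division: 15 // 4 gives 3
total **= 2   # exponentiation: 3 ** 2 gives 9
total %= 5    # modulo (remainder): 9 % 5 gives 4
print(total)  # 4
```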

For more details on what's possible with augmented assignment, see `help('+=')`:

In [9]:
help('+=')

Augmented assignment statements
*******************************

Augmented assignment is the combination, in a single statement, of a
binary operation and an assignment statement:

   augmented_assignment_stmt ::= augtarget augop (expression_list | yield_expression)
   augtarget                 ::= identifier | attributeref | subscription | slicing
   augop                     ::= "+=" | "-=" | "*=" | "@=" | "/=" | "//=" | "%=" | "**="
             | ">>=" | "<<=" | "&=" | "^=" | "|="

(See section Primaries for the syntax definitions of the last three
symbols.)

An augmented assignment evaluates the target (which, unlike normal
assignment statements, cannot be an unpacking) and the expression
list, performs the binary operation specific to the type of assignment
on the two operands, and assigns the result to the original target.
The target is only evaluated once.

An augmented assignment expression like "x += 1" can be rewritten as
"x = x + 1" to achieve a similar, but not exactly equ

#### <a name = "MultipleAssignment">Multiple Assignment</a>

Although not as frequently used, Python supports assigning multiple values at once. For instance, you can assign a number of variables to the same value like so:

In [10]:
varA = varB = varC = 123
print(varA, varB, varC)

123 123 123


Alternatively, you can assign a set of values to specific variables in one line like so:

In [11]:
varA, varB, varC = 3, 4, 5
print(varA, varB, varC)

3 4 5


It's ultimately up to you to decide if you want to use multiple assignment syntax.
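Two places where this syntax tends to be handy are swapping values and splitting a collection into a first value and "everything else". The sketch below shows both; the variable names and values are only illustrative.

```python
varA, varB = 'left', 'right'
varA, varB = varB, varA          # swap the two values without a temporary variable

first, *rest = [1, 2, 3, 4]      # the starred name collects the leftover values in a list
print(varA, varB, first, rest)   # right left 1 [2, 3, 4]
```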

#### <a name = "DeletingVariables">Deleting Variables</a>

Variables can be unbound using the `del` statement:

In [12]:
varA = 5
varA

5

In [13]:
del varA

In [14]:
varA

NameError: name 'varA' is not defined

---
### <a name = "DataStructures">Data Structures</a>

Now that we've covered the basics of variables, let's talk about basic data structures in Python. Python supports a variety of built-in data structures for storing collections of values in a particular order, and these are known as [*sequences*](https://docs.python.org/3/library/stdtypes.html#typesseq). We've already covered one type of sequence, the text sequence, which involves zero or more characters that form a string (i.e., `str`). Other types of ordered sequences include:

- Lists: `list`
- Tuples: `tuple`

In addition to these sequences, Python supports collections of data whose values are not accessed by position, using structures such as:

- Sets: `set`
- Dictionaries: `dict`

Moreover, through the use of modules, Python can be extended to support a wide range of additional structures, including the following that are commonplace in data science:

- numpy.array ([NumPy](https://numpy.org/doc/1.18/reference/generated/numpy.array.html))
- pandas.DataFrame ([pandas](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html))

This unit will focus on the built-in data structures, but in the following unit we will review the latter two (i.e., arrays and DataFrames). Among the built-in structures, lists, tuples, and sets will be covered in detail, whereas dictionaries will be briefly introduced since they aren't as commonly used in data science.

#### <a name = "Lists">Lists</a>

Lists are one of the most basic and practically useful data structures in Python. At the most basic level, lists function similarly to an *array* (e.g., Java) or a *vector* (e.g., R). They are designed to hold a collection of values (including duplicates) that can be changed (i.e., they are designed to be *mutable*). Lists are defined using a sequence of comma-separated values inside of square brackets:

In [18]:
[1, 2, 3]

[1, 2, 3]

This list contains 3 numbers (i.e., 1, 2, and 3) in that order from left to right. Let's assign this list to a variable and call it back:

In [19]:
myList = [1, 2, 3]
myList

[1, 2, 3]

Note that the list has a specific type, which is named after its structure:

In [20]:
type(myList)

list

In Python, the individual values in the list can be of various types. For example, we can mix integers and strings in a list:

In [21]:
myList = [1, 'two', 3]

This resultant list still has the type `list`:

In [22]:
type(myList)

list

Finally, note that lists can be multidimensional, meaning that you can -- if desired -- create a list of lists (although this isn't a particularly common thing that you'll likely need to do):

In [23]:
[[1, 2, 3], ['A', 'B']] # two lists within a list

[[1, 2, 3], ['A', 'B']]

##### <a name = "ListsAccessingElements">**Accessing Elements**</a>

To access specific elements of a list, we need to reference values by *index*, or position, using bracket notation. It's important to note that the first element will always be in position 0, and the last element will be in the n-1 position (where n is the number of elements in the list). For example:

In [24]:
myList[0]

1

In [25]:
myList[1]

'two'

In [26]:
myList[2]

3

Note that since this list only contains 3 elements, the following statement will return an error:

In [27]:
myList[3]

IndexError: list index out of range

Negative indexing can also be used to return values, and these negative values work by starting at the end of the list and working backwards:

In [28]:
myList[-1]

3

In [29]:
myList[-2]

'two'

In [30]:
myList[-3]

1

In addition, we can use list *slicing* to get multiple list elements at once:

In [31]:
myList[1:3]

['two', 3]

In this simple example, the slice `1:3` is defined by a starting point (i.e., `1`) and an ending point (i.e., `3`), and the element at the ending point is not included. In other words, the slice returns the values in positions `1` and `2` only (the last value in myList is in position `2`, which is one less than the ending point of the slice, `3`). To see this more clearly, let's use a longer list example:

In [32]:
longerList = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
longerList

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [33]:
longerList[0:3]

[1, 2, 3]

In [34]:
longerList[3:5]

[4, 5]

More advanced list slicing can also be performed by providing a third value to the slice. For instance, to get every other number in a specified range of the list, you could write:

In [35]:
longerList[2:10:2]

[3, 5, 7, 9]

It's also possible to use negative indexing with slices (although note that this can quickly become confusing):

In [36]:
longerList[-5:-2]

[6, 7, 8]
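A few more slicing patterns are sketched below: either end of a slice can be left out, a negative step walks backwards, and elements of a nested list are reached by chaining brackets. The names `firstThree`, `lastThree`, `everyThird`, `reversedCopy`, and `nestedList` are made up for this example.

```python
longerList = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

firstThree = longerList[:3]      # omitted start defaults to the beginning -> [1, 2, 3]
lastThree = longerList[-3:]      # omitted end defaults to the end -> [8, 9, 10]
everyThird = longerList[::3]     # positions 0, 3, 6, 9 -> [1, 4, 7, 10]
reversedCopy = longerList[::-1]  # negative step walks backwards -> [10, 9, ..., 1]

nestedList = [[1, 2, 3], ['A', 'B']]
innerValue = nestedList[1][0]    # second inner list, then its first element -> 'A'

print(firstThree, lastThree, everyThird, innerValue)
```

Note that slicing returns a new list, so `longerList` itself is unchanged by any of these operations.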

##### <a name = "InAndNotIn">**In and Not In**</a>

In addition to extracting values from lists by index, we can also check if values exist in a list using the *sequence operators* ***in*** and ***not in***, which are useful when working with a collection of values. For example:

In [37]:
myList = [1, 'two', 3]
'two' in myList

True

In [38]:
'three' not in myList

True

With lists this short, these operators aren't particularly helpful, but they can come in handy when lists get large.

Moreover, these operators will come in handy for a number of other uses we will see in future units (e.g., for *iteration* and *control flow*).

##### <a name = "ListsUpdatingElements">**Updating Elements**</a>

List elements can be updated or modified by referencing by position. For instance, to change the first value to 100, we could do the following:

In [39]:
myList = [1, 'two', 3]

In [40]:
myList[0] = 100
myList

[100, 'two', 3]

We could also change two values at once using a slice:

In [41]:
myList = [1, 'two', 3] # reset the list
myList[0:3:2] = 100, 300
myList

[100, 'two', 300]

##### <a name = "ListsAddingElements">**Adding Elements**</a>

Elements can be added to the end of a list using the *append* method:

In [42]:
myList = [1, 'two', 3]
myList.append('four')
myList

[1, 'two', 3, 'four']

Note that appending an element does not require using the assignment operator to make the change permanent. This is because the *append* method is what's known as an *in-place* operation.

You can also add multiple elements to a list simultaneously by using *extend* and supplying a list of values to be tacked on:

In [43]:
myList.extend([5, 'six'])
myList

[1, 'two', 3, 'four', 5, 'six']

Finally, you can insert a value in a specific position using the aptly named *insert* method:

In [44]:
myList = [1, 'two', 3]
myList.insert(1, 'insertion')
myList

[1, 'insertion', 'two', 3]
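One point that is easy to trip over is the difference between *append* and *extend* when the argument is itself a list. The following sketch contrasts the two; `listA` and `listB` are illustrative names.

```python
listA = [1, 'two', 3]
listA.append([4, 5])   # the whole list is added as ONE element
print(listA)           # [1, 'two', 3, [4, 5]]

listB = [1, 'two', 3]
listB.extend([4, 5])   # each value is added as a separate element
print(listB)           # [1, 'two', 3, 4, 5]

print(len(listA), len(listB))  # 4 5
```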

##### <a name = "ListsDeletingElements">**Deleting Elements**</a>

Elements can be deleted from a list using *del*:

In [45]:
myList = [1, 'two', 3]
del myList[1] # remove 2nd element
myList

[1, 3]

Using slicing, you can also delete multiple elements at once:

In [46]:
myList = [1, 'two', 3]
del myList[0:2]
myList

[3]

Alternatively, if you would like to both remove a value by position and return that value while doing so, you can use the *pop* method:

In [47]:
myList = [1, 'two', 3]
myList.pop(1)

'two'

In [48]:
myList

[1, 3]

Finally, you can use *remove* to remove by value rather than by position:

In [49]:
myList = [1, 'two', 3]
myList.remove(1)
myList

['two', 3]

Note, however, that remove will only eliminate the first occurrence of a value:

In [50]:
myList = [1, 'two', 3, 1]
myList.remove(1)
myList

['two', 3, 1]
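Also be aware that *remove* raises a `ValueError` if the value is not in the list at all. Two possible ways of guarding against this are sketched below; the `try`/`except` syntax is used here ahead of any formal introduction, and the variable `removed` is just a flag invented for this example.

```python
myList = [1, 'two', 3]

# option 1: check membership first
if 'four' in myList:
    myList.remove('four')

# option 2: attempt the removal and handle the error if the value is missing
try:
    myList.remove('four')
except ValueError:
    removed = False
else:
    removed = True

print(myList, removed)  # [1, 'two', 3] False
```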

##### <a name = "ListsJoining">**Joining**</a>

In the previous unit, we saw how text sequences can be concatenated (or put together) using the `+` operator. For example:

In [51]:
'Hello' + ' ' + 'world!' # concatenate multiple strings

'Hello world!'

Lists can also be concatenated -- or combined -- in a similar fashion:

In [52]:
myListA = [1, 'two', 3]
myListB = ['four', 5, 'six']

myListA + myListB

[1, 'two', 3, 'four', 5, 'six']

The result of adding the two lists would need to be stored in a variable if you planned on retaining it.

Alternatively, you can also use *extend* to achieve the same result as an in-place operation:

In [53]:
myListA.extend(myListB)
myListA

[1, 'two', 3, 'four', 5, 'six']

##### <a name = "ListsOtherMethods">**Other Methods**</a>

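Python lists support a variety of other methods beyond the ones you have already seen (e.g., *append* and *extend*). To see a list of these (no pun intended), see `help(list)`. A few common examples follow (note that `len` is a built-in function that takes the list as its argument rather than a list method):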

In [54]:
# len (return the length of the list)
myList = [1, 'two', 3]
len(myList)

3

In [55]:
# clear (empty the list entirely)
myList = [1, 'two', 3]
myList.clear()
myList

[]

In [56]:
# reverse (reverse the list)
myList = [1, 'two', 3]
myList.reverse()
myList

[3, 'two', 1]

In [57]:
# count (count the number of times a specified value appears in the list)
myList = ['A', 'B', 'A', 'B', 'C', 'B', 'C', 'C']
myList.count('C')

3

In [58]:
# index (return the position of the first instance of a value in the list)
myList = ['A', 'B', 'A', 'B', 'C', 'B', 'C', 'C']
myList.index('C')

4

In [59]:
# sort (sort the list)
myList = [4, 7, 2, 2, 5]
myList.sort()
myList

[2, 2, 4, 5, 7]
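Since *sort* changes the list in-place, it can be useful to know about two variations: the `reverse` argument and the built-in `sorted` function, which returns a new list instead. A brief sketch follows (the names `sortedCopy` and `result` are illustrative):

```python
myList = [4, 7, 2, 2, 5]

sortedCopy = sorted(myList)   # returns a new sorted list; myList is unchanged
myList.sort(reverse=True)     # sorts myList itself, largest value first

print(sortedCopy)  # [2, 2, 4, 5, 7]
print(myList)      # [7, 5, 4, 2, 2]

result = [3, 1, 2].sort()     # in-place methods such as sort return None
print(result)                 # None
```

The last two lines show a common slip: assigning the result of an in-place method to a variable stores `None` rather than the sorted list.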

##### <a name = "ListsGeneratingRanges">**Generating Ranges**</a>

In an earlier example, we created a list of ten numbers by hand:

In [60]:
longerList = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Creating a list like this requires a lot of typing even with just ten values. To create integer ranges in Python quickly and conveniently, consider using the *range* function:

In [61]:
list(range(1, 11))

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Note that, similar to slicing, the first argument to `range` is the first value in the resulting list, whereas the second argument is the stopping point, which isn't actually included. In other words, in the example above, to get a list from 1 to 10, the second argument to `range` must be 11.

Also, be aware that range can take negative values in the arguments:

In [62]:
list(range(-3, 4))

[-3, -2, -1, 0, 1, 2, 3]

Using the other list methods described above, you can also get reverse number ranges. For instance:

In [63]:
revRange = list(range(-3, 4))
revRange.reverse()
revRange

[3, 2, 1, 0, -1, -2, -3]
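Like slices, `range` also accepts an optional third argument for the step size, which offers another way to skip values or count downwards. A short sketch (the variable names are illustrative):

```python
evens = list(range(0, 11, 2))       # every second value -> [0, 2, 4, 6, 8, 10]
countdown = list(range(3, -4, -1))  # negative step -> [3, 2, 1, 0, -1, -2, -3]
firstFive = list(range(5))          # a single argument starts at 0 -> [0, 1, 2, 3, 4]

print(evens, countdown, firstFive)
```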

#### <a name = "Tuples">Tuples</a>

Tuples are very similar to lists (i.e., a collection of values in a specific order that can include duplicates) with one major difference: tuples are *immutable*, meaning they cannot be changed once created. To define a tuple, we use parentheses instead of brackets:

In [64]:
('cat', 'dog', 'fish')

('cat', 'dog', 'fish')

Unsurprisingly, the type of a tuple is `tuple`:

In [65]:
myTuple = ('cat', 'dog', 'fish')
display(myTuple, type(myTuple))

('cat', 'dog', 'fish')

tuple

One important note with tuples: to create a tuple with a single element, you must put a comma at the end of the sequence. For example:

In [66]:
catTuple = ('cat',)
display(catTuple, type(catTuple))

('cat',)

tuple

Without the final comma, you will not actually end up with a tuple if supplying only one value:

In [67]:
notTuple = ('cat')
display(notTuple, type(notTuple))

'cat'

str

##### <a name = "TuplesAccessingElements">**Accessing Elements**</a>

Similar to lists (i.e., see above for more details), tuple elements can be accessed by position:

In [68]:
print(myTuple[0], myTuple[2]) # first and last elements of myTuple

cat fish


Negative indexing and slicing may also be used:

In [69]:
myTuple[-2] # return second-to-last element

'dog'

In [70]:
myTuple[1:3] # return 2nd and 3rd element

('dog', 'fish')

Finally, sequence operators such as `in` and `not in` apply to tuples as well:

In [71]:
'puppy' in myTuple

False

##### <a name = "TuplesUpdatingAddingDeletingElements">**Updating/Adding/Deleting Elements**</a>

As noted above, tuples are not meant to be changed once they are created, meaning you cannot update values, add new ones, or delete existing ones.

Trying to do any of these operations will result in an error:

In [72]:
myTuple[1] = 'hamster' # attempting to change the 2nd value to 'hamster'

TypeError: 'tuple' object does not support item assignment

That being said, if you do need to make edits to a tuple, you can convert it to a list quickly and easily as needed using the *list* function:

In [73]:
myList = list(myTuple)
display(myList, type(myList))

['cat', 'dog', 'fish']

list

Once converted to a list, the regular rules for lists apply; for example:

In [74]:
myList[1] = 'hamster'
myList

['cat', 'hamster', 'fish']

If desired, you can also convert your list back to a tuple using the *tuple* function:

In [75]:
myTuple = tuple(myList)
display(myTuple, type(myTuple))

('cat', 'hamster', 'fish')

tuple

##### <a name = "TuplesJoining">**Joining**</a>

Again, similar to lists, tuples can be combined via concatenation:

In [76]:
myTupleA = ('cat', 'dog', 'fish')
myTupleB = ('kitten', 'puppy')

myTupleA + myTupleB

('cat', 'dog', 'fish', 'kitten', 'puppy')
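Combining slicing with concatenation gives another possible way to produce an "edited" tuple without converting to a list first. The sketch below builds a new tuple and leaves the original alone; the name `editedTuple` is made up for this example.

```python
myTuple = ('cat', 'dog', 'fish')

# build a new tuple: everything before position 1, the new value, everything after it
editedTuple = myTuple[:1] + ('hamster',) + myTuple[2:]

print(editedTuple)  # ('cat', 'hamster', 'fish')
print(myTuple)      # ('cat', 'dog', 'fish'), i.e., the original is untouched
```

Note the trailing comma in `('hamster',)`, which is needed so that a one-element tuple (rather than a string) is being concatenated.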

##### <a name = "TuplesOtherMethods">**Other Methods**</a>

Given all the restrictions surrounding tuples, the options for working with them are limited when compared to lists. As per `help(tuple)`, the main options are essentially *count* and *index*, which have similar functionality in lists:

In [77]:
# count (count the number of times a specified value appears in the tuple)
myLongerTuple = ('cat', 'dog', 'cat', 'dog', 'fish', 'dog', 'fish')
myLongerTuple.count('fish')

2

In [78]:
# index (return the position of the first instance of a value in the tuple)
myLongerTuple = ('cat', 'dog', 'cat', 'dog', 'fish', 'dog', 'fish')
myLongerTuple.index('fish')

4

Various other functions that also work for lists (and other structures involving collections of values) apply to tuples as well, including `len`:

In [79]:
myLongerTuple = ('cat', 'dog', 'cat', 'dog', 'fish', 'dog', 'fish')
len(myLongerTuple)

7

#### <a name = "Sets">Sets</a>

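Unlike lists and tuples, sets are unordered collections of values (i.e., values have no indices/positions), and they cannot contain duplicates. Sets themselves are mutable (i.e., elements can be added and removed after creation), but an individual element cannot be modified in place, and each element must itself be an immutable value (e.g., a number, a string, or a tuple of such values).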

To create a set, we use curly braces:

In [80]:
{'yellow', 'green', 'blue'}

{'blue', 'green', 'yellow'}

You'll notice that when the set is displayed, the notebook shows its values in alphabetical order. Remember, however, that sets do not have an order per se, and as such position within the set is completely arbitrary. Let's assign our set to a variable and check its type:

In [81]:
mySet = {'yellow', 'green', 'blue'}
display(mySet, type(mySet))

{'blue', 'green', 'yellow'}

set

Unsurprisingly, the type is `set`.
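Because sets cannot contain duplicates, the *set* function offers a quick way to find the unique values in a list. The sketch below also shows how to create an empty set, which requires `set()` because empty curly braces create a dictionary instead. The variable names and color values are illustrative.

```python
colors = ['blue', 'green', 'blue', 'yellow', 'green']
uniqueColors = set(colors)             # duplicates collapse to a single entry
print(len(colors), len(uniqueColors))  # 5 3

emptySet = set()    # an empty set
emptyBraces = {}    # careful: empty braces create a dict, not a set
print(type(emptySet), type(emptyBraces))
```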

##### <a name = "SetsAccessingElements">**Accessing Elements**</a>

Since sets do not have indices, we can't use the bracket approach that applies to lists and tuples. Nevertheless, we can still use the operators `in` and `not in` described earlier to check if values exist in a set or not:

In [82]:
'green' in mySet

True

In [83]:
'red' not in mySet

True

Additionally, while it's something we'll cover in more detail later, we can also use iteration with sets (as well as with sequences such as lists and tuples) to extract values one at a time:

In [84]:
mySet = {'yellow', 'green', 'blue'}

for eachVal in mySet:
    print(eachVal)

yellow
green
blue


##### <a name = "SetsUpdatingElements">**Updating Elements**</a>

As noted above, set elements have no positions, so there is no way to change an existing value in place. As a workaround, however, you can delete the element you wanted to change and add a new one using the code described in the following two sections.

##### <a name = "SetsAddingElements">**Adding Elements**</a>

It is possible to add elements to a set, although given the lack of indices, we cannot choose where in the set they are placed.

To add a single element, use the *add* method:

In [85]:
mySet = {'yellow', 'green', 'blue'}
mySet.add('purple')
mySet

{'blue', 'green', 'purple', 'yellow'}

To add multiple elements, use the *update* method:

In [86]:
mySet = {'yellow', 'green', 'blue'}
mySet.update({'purple', 'pink'}) # you can also supply a list, tuple, etc. as the argument to update
mySet

{'blue', 'green', 'pink', 'purple', 'yellow'}

##### <a name = "SetsDeletingElements">**Deleting Elements**</a>

It is possible to delete elements from a set using various methods. These include:

In [87]:
# remove (delete an element by value and return an error if the value is not found)
mySet = {'yellow', 'green', 'blue'}
mySet.remove('green')
mySet

{'blue', 'yellow'}

In [88]:
# discard (delete an element by value but do not return an error if the value is not found)
mySet = {'yellow', 'green', 'blue'}
mySet.discard('purple') # not in the set, but not a problem
mySet

{'blue', 'green', 'yellow'}

In [89]:
# clear (empty the set completely)
mySet = {'yellow', 'green', 'blue'}
mySet.clear()
mySet

set()

In [90]:
# pop (remove an arbitrary element from the set while also returning the value)
mySet = {'yellow', 'green', 'blue'}
display(mySet)
display(mySet.pop()) # note that which element is removed is arbitrary, since sets are unordered
display(mySet)

{'blue', 'green', 'yellow'}

'yellow'

{'blue', 'green'}
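Putting adding and deleting together, here is a minimal sketch of "changing" a value in a set by discarding the old value and adding a new one. It also shows that *remove* raises a `KeyError` for a missing value; the `try`/`except` syntax and the color `'teal'` are used purely for illustration.

```python
mySet = {'yellow', 'green', 'blue'}

# "update" green to teal: discard the old value, then add the new one
mySet.discard('green')
mySet.add('teal')
print(mySet == {'yellow', 'teal', 'blue'})  # True

# remove raises a KeyError for a missing value, whereas discard does not
try:
    mySet.remove('purple')
except KeyError:
    print('purple was not in the set')
```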

##### <a name = "SetsJoining">**Joining**</a>

There are several methods for joining two sets. The primary ones are *union* and *update*. Regarding the former, `union` will return all the items across the two sets while omitting duplicates:

In [91]:
mySetA = {'yellow', 'green', 'blue', 'purple', 'red'}
mySetB = {'green', 'red', 'orange'}

mySetA.union(mySetB)

{'blue', 'green', 'orange', 'purple', 'red', 'yellow'}

Note that union returns the result of the operation but does not update either set:

In [92]:
display(mySetA, mySetB)

{'blue', 'green', 'purple', 'red', 'yellow'}

{'green', 'orange', 'red'}

To perform the operation in-place, use `update`:

In [94]:
mySetA.update(mySetB)
mySetA

{'blue', 'green', 'orange', 'purple', 'red', 'yellow'}

In this case, `mySetA` is updated, while `mySetB` is untouched:

In [95]:
mySetB

{'green', 'orange', 'red'}

Finally, note that union and update can also be performed using the operators `|` and `|=` respectively:

In [96]:
mySetA = {'yellow', 'green', 'blue', 'purple', 'red'}
mySetB = {'green', 'red', 'orange'}

mySetA | mySetB # equivalent to union

{'blue', 'green', 'orange', 'purple', 'red', 'yellow'}

In [97]:
mySetA |= mySetB # equivalent to update (i.e., mySetA is changed in-place)
mySetA

{'blue', 'green', 'orange', 'purple', 'red', 'yellow'}

##### <a name = "SetsOtherMethods">**Other Methods**</a>

As noted in `help(set)`, sets support other methods that can come in handy. Some functions, such as `len`, work with sets just like they do for other structures such as lists and tuples. Others are specific to sets; for instance, you can use set operations such as *intersection* and *difference* to make comparisons across two different sets, and these work in a fashion conceptually similar to structured query language (SQL):

In [98]:
# intersection (returns a set of values that appear across both sets without duplicates)
mySetA = {'yellow', 'green', 'blue', 'purple', 'red'}
mySetB = {'green', 'red', 'orange'}
mySetA.intersection(mySetB)

{'green', 'red'}

In [99]:
# difference (returns the values that appear in the first set but not in the second)
mySetA = {'yellow', 'green', 'blue', 'purple', 'red'}
mySetB = {'green', 'red', 'orange'}
mySetA.difference(mySetB)

{'blue', 'purple', 'yellow'}

Other methods can be used to see if one set is contained in another:

In [100]:
# superset (returns True or False depending on whether the argument set is contained in the overarching set or not)
mySetA = {'yellow', 'green', 'blue', 'purple', 'red'}
mySetB = {'green', 'red'}
mySetA.issuperset(mySetB)

True

In [101]:
# subset (returns True or False depending on whether the calling set is contained in the argument set or not)
mySetA = {'yellow', 'green', 'blue', 'purple', 'red'}
mySetB = {'green', 'red'}
mySetB.issubset(mySetA) # note this is essentially the inverse of superset above (mySetA and mySetB were swapped in the call)

True

Again, see `help(set)` for additional methods.
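Just as `|` can stand in for *union*, several of these methods have operator equivalents. The sketch below lists a few, including *symmetric_difference*, which was not shown above; the result variable names are made up for this example.

```python
mySetA = {'yellow', 'green', 'blue', 'purple', 'red'}
mySetB = {'green', 'red', 'orange'}

common = mySetA & mySetB                # same as mySetA.intersection(mySetB)
onlyInA = mySetA - mySetB               # same as mySetA.difference(mySetB)
inExactlyOne = mySetA ^ mySetB          # same as mySetA.symmetric_difference(mySetB)
isSubset = {'green', 'red'} <= mySetA   # same as {'green', 'red'}.issubset(mySetA)

print(common == {'green', 'red'})       # True
print(isSubset)                         # True
```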

#### <a name = "Dictionaries">Dictionaries</a>

Dictionaries are another useful data structure provided by Python for working with data. Dictionaries are mutable (i.e., can be changed), preserve the order in which items were inserted (as of Python 3.7), and include an index. The index, however, works differently than it does in lists and tuples: specifically, data in dictionaries are stored and referenced as key: value pairs that create a mapping (in other languages, this structure is sometimes referred to as an *associative array*).

To show how dictionaries are used, the best thing to do is to create an example. Consider the following dictionary for an employee at a company:

In [102]:
myDict = {
    'FName': 'Jane',
    'LName': 'Smith',
    'Dept': 'Accounting',
    'StartYear': 2015
}

myDict

{'FName': 'Jane', 'LName': 'Smith', 'Dept': 'Accounting', 'StartYear': 2015}

In [103]:
type(myDict)

dict

As shown above, a dictionary is defined using curly brackets, and within these brackets is a sequence of key: value pairs as mentioned earlier. More specifically, `myDict` contains four keys, which we can see directly using the *keys* method:

In [104]:
myDict.keys()

dict_keys(['FName', 'LName', 'Dept', 'StartYear'])
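Since values are referenced by key rather than by position, a brief sketch of the most common dictionary operations may help. The `'Title'` and `'Office'` keys and the values assigned to them are hypothetical additions for this example.

```python
myDict = {
    'FName': 'Jane',
    'LName': 'Smith',
    'Dept': 'Accounting',
    'StartYear': 2015
}

firstName = myDict['FName']             # look up a value by key -> 'Jane'
title = myDict.get('Title', 'unknown')  # get returns a default when the key is missing
myDict['Dept'] = 'Finance'              # update the value for an existing key
myDict['Office'] = 'B12'                # add a new key: value pair
del myDict['StartYear']                 # delete a pair by key

print(firstName, title)       # Jane unknown
print(list(myDict.values()))  # ['Jane', 'Smith', 'Finance', 'B12']
print('Office' in myDict)     # True (in checks the keys)
```

Looking up a missing key with brackets (e.g., `myDict['Title']`) would raise a `KeyError`, which is why *get* is used for the key that may not exist.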

Presumably, we will want to collect and store this information for every employee at our company. In that case, we can use *nesting* to store information on several employees at once:

In [105]:
myDictMore = {
    'emp1': {
        'FName': 'Jane',
        'LName': 'Smith',
        'Dept': 'Accounting',
        'StartYear': 2015
    },
    'emp2': {
        'FName': 'Lucy',
        'LName': 'Holland',
        'Dept': 'IT',
        'StartYear': 2012
    },
    'emp3': {
        'FName': 'Martin',
        'LName': 'Stephenson',
        'Dept': 'HR',
        'StartYear': 2020
    }
}

myDictMore

{'emp1': {'FName': 'Jane',
  'LName': 'Smith',
  'Dept': 'Accounting',
  'StartYear': 2015},
 'emp2': {'FName': 'Lucy',
  'LName': 'Holland',
  'Dept': 'IT',
  'StartYear': 2012},
 'emp3': {'FName': 'Martin',
  'LName': 'Stephenson',
  'Dept': 'HR',
  'StartYear': 2020}}

When organized this way, we can use methods we've seen before to work with the data; for instance, if we wanted to know the number of employees we have data on, we can use `len`:

In [106]:
len(myDictMore)

3
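To pull a specific value out of a nested dictionary, the keys can be chained from the outside in. The following sketch also loops over the employees using the *items* method; the loop variable names `empId` and `record` are arbitrary, and iteration itself is covered in more detail in a later unit.

```python
myDictMore = {
    'emp1': {'FName': 'Jane', 'LName': 'Smith', 'Dept': 'Accounting', 'StartYear': 2015},
    'emp2': {'FName': 'Lucy', 'LName': 'Holland', 'Dept': 'IT', 'StartYear': 2012},
    'emp3': {'FName': 'Martin', 'LName': 'Stephenson', 'Dept': 'HR', 'StartYear': 2020}
}

lucyDept = myDictMore['emp2']['Dept']   # outer key first, then inner key -> 'IT'
print(lucyDept)

# loop over the key: value pairs to print each employee's first name and start year
for empId, record in myDictMore.items():
    print(empId, record['FName'], record['StartYear'])
```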

In addition, if you are working with `dict` objects heavily, you can find additional modules that can make your life easier. For instance, to get a cleaner printout of a `dict`, consider importing the *pprint* function:

In [107]:
from pprint import pprint

pprint(myDictMore)

{'emp1': {'Dept': 'Accounting',
          'FName': 'Jane',
          'LName': 'Smith',
          'StartYear': 2015},
 'emp2': {'Dept': 'IT', 'FName': 'Lucy', 'LName': 'Holland', 'StartYear': 2012},
 'emp3': {'Dept': 'HR',
          'FName': 'Martin',
          'LName': 'Stephenson',
          'StartYear': 2020}}


### <a name = "ReferencesCopies">References vs. Copies</a>

One particularly important note regarding lists, tuples, and other data structures has to do with variable assignment. For instance, consider the following example involving lists:

In [108]:
myListA = [1, 'two', 3]
myListB = myListA

So far, it looks as though we have defined two lists, and both will return the same value when called back:

In [109]:
print(myListA, myListB)

[1, 'two', 3] [1, 'two', 3]


Now, let's go ahead and make a change to `myListA`:

In [110]:
myListA[1] = 2
myListA

[1, 2, 3]

As you can see, this code changed the second value of `myListA`. The question however is did this have any impact on `myListB`?

In [111]:
myListB

[1, 2, 3]

The answer is yes, and the reason is that `myListB` did not store a separate copy of the original list (i.e., `[1, 'two', 3]`) but rather a reference to the same list object that `myListA` refers to. As such, any in-place changes made through `myListA` will also be reflected in `myListB` (and vice versa).

While somewhat technical, we can use the *id* function to see the identity of both variables and confirm they are identical:

In [112]:
print(id(myListA), id(myListB))

id(myListA) == id(myListB)

4443127232 4443127232


True

Assuming this is not what you wanted/intended (e.g., perhaps you wanted to make myListA and myListB initially identical but not tied together for subsequent code), what you'd need to do instead is to make a copy of the list using the *copy* method:

In [113]:
myListA = [1, 'two', 3]
myListB = myListA.copy()

myListA[1] = 2

print(myListA, myListB)

[1, 2, 3] [1, 'two', 3]


Notice how changes to `myListA` do not impact `myListB` when the latter was set up as a copy. This is because copy creates a new list object for `myListB` that is independent of the one assigned to `myListA`:

In [114]:
print(id(myListA), id(myListB))

id(myListA) == id(myListB)

4443254016 4443103616


False

While the distinction of a reference vs. a copy is somewhat technical, keep this important distinction in mind to avoid accidentally making changes to one variable when working with another -- it's very easy to make a critical mistake if you are not keeping track of this distinction!
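One further caveat applies to nested structures: the *copy* method makes what is known as a *shallow* copy, meaning the outer list is new but any inner lists are still shared. The sketch below contrasts this with `copy.deepcopy` from the standard-library `copy` module; the variable names are made up for this example.

```python
import copy

nestedA = [[1, 2], ['A', 'B']]
shallowB = nestedA.copy()        # new outer list, but the inner lists are shared
deepC = copy.deepcopy(nestedA)   # new outer list AND new inner lists

nestedA[0][0] = 100              # change a value inside an inner list

print(nestedA)   # [[100, 2], ['A', 'B']]
print(shallowB)  # [[100, 2], ['A', 'B']] (affected via the shared inner list)
print(deepC)     # [[1, 2], ['A', 'B']] (fully independent)
```

For flat lists like the ones used in this section, a shallow copy is sufficient.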

---
#### <a name = "ComparisonOperatorsIdentity">Comparison Operators: Identity</a>

Speaking of references, now is a good time to return to the two remaining comparison operators that we hadn't yet talked about in the previous unit. These are:

- is: object identity
- is not: negated object identity

These operators test whether two names refer to the very same object. They often give the same results as the analogous comparison operators `==` and `!=` (for `is` and `is not` respectively), but they are not interchangeable. For instance, consider how these operators can be used to test whether different lists -- either references or copies -- refer to the same object (or not):

In [115]:
myListA = [1, 'two', 3]
myListB = myListA

myListB is myListA

True

In [116]:
myListA = [1, 'two', 3]
myListB = myListA.copy()

myListB is myListA

False

Note how the results of these comparisons differ as compared to using `==`:

In [117]:
myListA = [1, 'two', 3]
myListB = myListA

myListB == myListA

True

In [118]:
myListA = [1, 'two', 3]
myListB = myListA.copy()

myListB == myListA

True

The latter comparison returns `True` this time since it is really comparing values. By contrast, `is` is comparing identity. To simulate `is` with `==`, we'd need to wrap the comparison with the `id` function we used earlier:

In [119]:
myListA = [1, 'two', 3]
myListB = myListA.copy()

id(myListB) == id(myListA)

False

Now the result is as expected (i.e., since `myListB` is a copy of `myListA` and not simply a reference to it in this case).
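In practice, one of the most common uses of `is` and `is not` is checking whether a value is `None`, which is the comparison style that PEP 8 recommends for `None`. A small sketch follows, with `result`, `smallA`, and `smallB` as illustrative names:

```python
result = [3, 1, 2].sort()   # sort works in-place and returns None

print(result is None)       # True
print(result is not None)   # False

smallA = [1, 2]
smallB = [1, 2]
print(smallA == smallB, smallA is smallB)  # True False (equal values, separate objects)
```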