# Python Review

## Overview

Python and R are the primary languages used for data science and machine learning.  Since we are coming at this problem from an information security perspective and Python is so prevalent in this space, we will use Python throughout the course.

Students were informed that they should come to class with at least a basic background in Python.  This section will review the fundamentals of lists, tuples, arrays, and dictionaries in addition to introducing some of the fundamentals of NumPy.

## Goals

By the end of this lab, you should be able to:

 * Get all of the required libraries configured
 * Explain the differences between lists, tuples, arrays, and dictionaries
 * Create all of these various objects
 * Iterate over and enumerate these objects
 * Perform basic list comprehensions
 * Create NumPy arrays
 * Perform some basic manipulations of NumPy arrays
 
## Estimated Time: 30 - 45 minutes

In this class, we will rely heavily on Python, NumPy, Pandas, and a number of other libraries for our exercises.  For this reason, we want to take some time to have you walk through some material that is both refresher (the Python part, hopefully) and a gentle introduction to some fundamentals (for NumPy and Pandas).

# <img src="../images/task.png" width=20 height=20> Instructor Walkthrough

Before we dive into the Python Review lab, we want to make certain everything needed for all of the rest of the labs in the course is set up.  Additionally, your instructor will walk through using Anaconda to create a new environment, as outlined below.  If you are taking the class through OnDemand, you should walk through these steps along with the studio recording.

*PLEASE NOTE!* - If you are using a Macintosh, you must additionally:

1. Use a browser to go to https://brew.sh (Homebrew is a package management tool for macOS).
2. Copy the command line on that home page and paste it into a terminal window to have it automatically download and install the manager.
3. After the installation completes, in a terminal window execute `brew install cmake` to get the cmake package installed.  It’s not on a Mac by default.
4. Next, execute `brew install openssl` to install the OpenSSL development libraries and headers.

## Creating an Anaconda Environment

The Anaconda GUI is a wonderfully easy way to start tools like Jupyter Lab.  Unfortunately, if you start Jupyter through this interface, you will not be able to monitor Jupyter's status or see any warnings or errors that would be displayed at the console.  For this reason, we strongly recommend running Jupyter from the command line.

Additionally, you could use the `base` environment that comes with Anaconda, which already has Jupyter installed, but this isn't a great idea either.  Why not?  When working with Python, it isn't unusual to need to switch to different versions of Python, or to use different versions of various libraries to deal with different code dependencies.  If you get into the habit of installing everything into the `base` environment, or worse, into your system Python if you are on a Mac or on Linux, it can be very difficult to detangle all of the different versions.  This is where Python virtual environments come in.

# <img src="../images/task.png" width=20 height=20> Instructor Step 1

The first thing to do is to get an Anaconda prompt open.  You can do this by clicking on your Windows menu and typing `anaconda`.  Look for the entry for "Anaconda prompt."  Take note that it might not be the first result displayed!

![Anaconda Prompt](../images/anaconda-prompt.png)

Once you have opened a prompt, you can now create a new Python environment to work with.  We would recommend using Python 3.10.  You also need to select a name for the environment you will use.  Any name is fine.  It might be good to choose a name related to the class so that you know what it is when you see it again some months from now.

# <img src="../images/task.png" width=20 height=20> Instructor Step 2

In the open prompt, execute the following, setting the `name` to be anything you would like:

![Create an Environment](../images/create-environment.png)

It is *very likely* that you will be prompted about updating Anaconda.  You may do this if you wish, but it is absolutely not necessary for anything related to the course.

A great deal of information will stream by, ending with a prompt asking you whether Anaconda should proceed.  You should hit 'y' and press enter:

![Proceed](../images/proceed.png)

We next need to switch into our newly created environment and install some software.  The only additional tool that we need to install manually is Jupyter Lab.

# <img src="../images/task.png" width=20 height=20> Instructor Step 3

Change into your newly created environment and install Jupyter Lab. To accomplish this, please execute the following commands:

```
conda activate 595
pip install jupyterlab
```

Take note that `jupyterlab` is all one word.  Also note that `595` should be replaced with whatever name you chose to use for your environment.

![Activate the environment and install Jupyter](../images/jupyter.png)

# <img src="../images/task.png" width=20 height=20> Instructor Step 4

Next, change to the directory where your exercises have been copied and execute `jupyter lab` at your console.  Please take note that this is *two words!*

![Running jupyter lab](../images/run-jupyter.png)

# <img src="../images/task.png" width=20 height=20> Instructor Step 5

Please execute the following cell to pull the remaining dependencies into your new environment.


```
!pip3 install \
    mongoengine \
    bs4 \
    lxml \
    paramiko \
    graphviz \
    pydot-ng \
    scapy \
    py-postgresql \
    psycopg2-binary \
    Pillow \
    pandas \
    matplotlib \
    scipy \
    scikit-learn \
    tensorflow \
    pycryptodome \
    cryptography

# TensorFlow releases are numbered 2.x, so no version bound is applied above.  If you
# wish to configure GPU support on native Windows, install "tensorflow<2.11" instead:
# TensorFlow 2.10 was the last release to support GPUs on native Windows (newer
# releases require WSL2 for GPU support).
```



With these things out of the way, we're now ready to dive into Python!

# <img src="../images/task.png" width=20 height=20> Task 1.1

First, let's learn an important feature that makes using and troubleshooting Python using Jupyter very easy.  Use the following cell to multiply 2 by 14 and include *no other code* in the cell.  What happens?

```
2*14
```

```
28
```

A wonderful feature of Jupyter is that it will display the value of the final expression in every cell.  While you can explicitly print a value at any time, this makes troubleshooting very easy.  Rather than writing debugging code or explicitly printing a value, you can just put any variable that you want to inspect as the final (or only) element in a code cell and execute it.

# Important Data Types

NumPy n-dimensional arrays are a fundamental data type that we will work with throughout the course.  Most, if not all, machine learning and statistics libraries for Python make heavy use of NumPy arrays.  While they will feel very familiar to someone who has worked with Python arrays, there are definitely differences.  Before we dive into NumPy arrays, let's first take some time to experiment with Python tuples, lists, arrays, and dictionaries.  We'll end up using all of these at some point during this course.  Once we've reviewed a few fundamentals, we'll transition to NumPy arrays and make some comparisons.

## Tuples

The notion of a tuple is an important one.  We may not use tuples extensively in our course, but we will find a few places where they are used, especially when we start exploring the creation of TensorFlow datasets using generators.  A tuple is an *immutable ordered sequence of objects*.  Notice that we do not define a tuple as an ordered sequence of *values*.  This is because, as you are likely aware, everything in Python is an object.

To create a tuple, we can assign a list of objects in parentheses to a variable:

```
my_tuple = (3,4,5)
```

While this *looks* like we are assigning values (and you certainly can think about it in that way), we are really assigning an ordered set of three objects that happen to be integers.

A key difference between tuples and lists in Python is that tuples are *immutable*.  This means that they cannot be modified once created.
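A minimal sketch of a few tuple-creation details that often surprise newcomers (the variable names are illustrative): it is really the comma, not the parentheses, that makes a tuple, so a one-element tuple needs a trailing comma, and a tuple can be *unpacked* back into separate names.

```python
# The comma, not the parentheses, creates a tuple
packed = 3, 4, 5
print(type(packed), packed)        # <class 'tuple'> (3, 4, 5)

# A single value in parentheses is just that value...
not_a_tuple = (42)
print(type(not_a_tuple))           # <class 'int'>

# ...so a one-element tuple needs a trailing comma
single = (42,)
print(type(single), len(single))   # <class 'tuple'> 1

# Unpacking assigns each element to its own variable
x, y, z = packed
print(x, y, z)                     # 3 4 5
```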

# <img src="../images/task.png" width=20 height=20> Task 1.2

Create a tuple named `my_tuple`.  Assign the objects `'A string'`, `3.14159`, and `42` to your tuple.


```
my_tuple = ('A Sting', 3.14, 42)
```

### Iterating
Now that we have a tuple to work with, let's explore it just a bit.  As it turns out, Python tuples, lists, and arrays share a number of common behaviors.  A key example is that we can *iterate* over each of these types of objects, processing each element in turn.

There are a number of different ways to iterate over an iterable object.  One of the most familiar is the `for` loop.  We can use it as follows:

```
for item in my_tuple:
    print(item)
```

This instructs Python to take each element of the tuple in turn, assigning the current element to the variable `item`.  We can then do something with that element.  In this case, we just print out each element.

Continuing from the thought that tuples need not be homogeneous (i.e., do not require that all elements are the same type or class), let's prove that this is true in our case.  For example, is it possible that when we created our tuple with what appear to be mixed object types, Python just cast them all to be strings?

To investigate, we can make use of the `type()` function.  This built-in function will return the type or class of the object that you pass to it.

# <img src="../images/task.png" width=20 height=20> Task 1.3

Iterate over `my_tuple`, printing out the type and the value for each element.

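```
for item in my_tuple:
    # Print the class of each element followed by its value
    print(type(item), item)
```

```
<class 'str'> A Sting
<class 'float'> 3.14
<class 'int'> 42
```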


### F-String Aside

There are many ways that you could have solved the printing aspect of the last task.  For example, a very straightforward way to approach it would be to use the following statement inside of a `for` loop:

```
print(type(item), item)
```

Another alternative, introduced in Python 3.6, is the *f-string*, or formatted string literal.  Python has long supported formatted output of strings using a variety of formatting options, but f-strings offer a much simpler approach.  Using an f-string, we can embed the variables directly into our format string without having to work out positions and pass a long list of variables at the end!  Using them is quite simple.  For example:

```
print(f'Item class: {type(item)}\tValue: {item}')
```

Notice that our quoted string is prefixed with a lowercase *f*.  Within this string we can embed any variable, function call, or other expression that we wish by enclosing it within curly braces (`{}`).  Each expression is evaluated, converted to a string, and inserted into the output at that position.

You might also notice that we have used single quotes for our f-string.  This is not a requirement.  You can use either single or double quotes.  Which you choose to use is completely up to you and is ultimately a style choice.  It is good to be consistent, though you might choose to leverage them to avoid escaping characters.  Consider the following:

```
print(f"I'm sorry Dave, I'm afraid I can't do that.")
print(f'Hal said, "All my circuits are functioning properly."')
```

Both of these strings are completely valid.  Can you see the difference?  In the first case we used double quotes and in the second we used single quotes.  To what end?  To avoid having to escape the embedded quotes!

In a number of languages (Perl, Ruby, Bash, etc.) there is a very big distinction between single and double quoted strings.  Usually a single quoted string is a string literal and a double quoted string supports string interpolation.  *This is not true in Python.*
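f-strings also accept an optional *format specification* after a colon inside the braces, which controls how each value is rendered.  A few sketches follow; the variable names are illustrative, and the trailing `=` form requires Python 3.8 or later.

```python
pi = 3.14159
answer = 42
label = 'A string'

# .2f: fixed-point notation with two digits after the decimal point
print(f'{pi:.2f}')                                        # 3.14
# >6: right-align within a field six characters wide
print(f'[{answer:>6}]')                                   # [    42]
# Any expression works, including function calls and attribute access
print(f'{type(label).__name__} of length {len(label)}')   # str of length 8
# Python 3.8+: a trailing = prints the expression text along with its value
print(f'{answer * 2=}')                                   # answer * 2=84
```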

### Enumeration while Iterating

In many cases, we will want to iterate over a tuple, list, array, or other iterable and we will want to keep track of the *index* of the current object.  For example:

```
idx = 0
for item in my_tuple:
    print(f'Item index: {idx}\tItem value: {item}')
    idx = idx + 1
```

Even though this code works perfectly well, it is not *pythonic*.  Whether or not something is pythonic is determined by whether or not we have used the features of the language to express our intent in a way that is beautiful, explicit, simple, uncomplicated, as flat as possible, as sparse as possible, and as readable as possible.  To be frank, some of these are debatable.  What might be perfectly readable for one person might be completely unintelligible for another with less Python experience.

Even so, there is a built-in function, `enumerate()`, that can be used to achieve the same result more simply and more beautifully:

```
for idx, item in enumerate(my_tuple):
    print(f'Item index: {idx}\tItem value: {item}')
```

The `enumerate()` function takes any iterable as an argument and returns an iterator that produces a *tuple* on each pass: the index of the current object along with the object itself (not a copy).
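A minimal sketch (the `sample` tuple is illustrative, named so as not to overwrite `my_tuple`) that makes the yielded tuples visible, shows the optional `start` argument, and uses the `is` operator to confirm that each item is the original object rather than a copy:

```python
sample = ('A string', 3.14159, 42)

# enumerate() yields (index, item) tuples; list() collects them so we can see them
pairs = list(enumerate(sample))
print(pairs)                  # [(0, 'A string'), (1, 3.14159), (2, 42)]

# The optional start argument changes the first index, e.g. for 1-based numbering
for position, item in enumerate(sample, start=1):
    print(position, item)

# Each item is the very object stored in the tuple, not a copy
for idx, item in enumerate(sample):
    assert item is sample[idx]
```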

# <img src="../images/task.png" width=20 height=20> Task 1.4

Iterate over the `my_tuple` object using the `enumerate()` function.  Print out the index, value, and type of each object.

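```
for idx, item in enumerate(my_tuple):
    # enumerate() supplies the index; type() reports each element's class
    print(f'Index: {idx}  Value: {item}  Type: {type(item)}')
```

```
Index: 0  Value: A Sting  Type: <class 'str'>
Index: 1  Value: 3.14  Type: <class 'float'>
Index: 2  Value: 42  Type: <class 'int'>
```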


To wrap up with tuples, we want to do two things:
 * Prove that tuples are immutable
 * Introduce some basic exception handling
 
Let's take our tuple and, heedless of the Python documentation, attempt to modify it.  Much as we can with a list or an array, we can index the values in a tuple directly:

```
print(my_tuple[2])
```

What happens if we try to make an assignment?  Consider:

```
my_tuple[2] = 10

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-28e852db28b9> in <module>
----> 1 my_tuple[2] = 10

TypeError: 'tuple' object does not support item assignment
```

Clearly, we have angered the Python gods.  Obviously, we would never attempt something quite so brazen within our real code since we know that tuples are immutable.  We will discover situations, however, where exceptions might occur.  During development, we likely *want* exceptions to halt our code so that we can work out what went wrong and make corrections.  However, we may also find situations where the data that we are processing is sometimes unpredictable.  When we are pre-processing our data, which is a very common task in data science, or deploying our code into production, we likely do not want exceptions to interrupt processing!

### Exception Handling

To facilitate this, Python supports the `try...except...` clause.  Please note that this is similar to, but different from, many other languages.  It is far more common to have a `try...catch...` semantic.  We find ourselves mistakenly writing exactly this quite often.

How can we use this exception handling?  Like so:

```
try:
    my_tuple[2] = 10
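except TypeError:
    print("The documentation is correct; we cannot assign to an immutable tuple!")
```

This code will run just fine, albeit silencing the error.  Naming the exception you expect, as with `except TypeError:` here, is better practice than a bare `except:`, which would also swallow unrelated problems such as a misspelled variable name (`NameError`) or the user pressing Ctrl-C (`KeyboardInterrupt`).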

# <img src="../images/task.png" width=20 height=20> Task 1.5

Iterate over the `my_tuple` tuple, using 1.0 as the numerator and each value in the tuple as the denominator.  Print each result and prevent an exception from halting your program.

```
numerator = 1.0
for item in my_tuple:
    try:
        print(f'Numerator {numerator} over {item} = {numerator / item}')
    except (TypeError, ZeroDivisionError):
        # A non-numeric item or a zero denominator lands here instead of halting the loop
        print('Error in calculation')
```

```
Error in calculation
Numerator 1.0 over 3.14 = 0.3184713375796178
Numerator 1.0 over 42 = 0.023809523809523808
```


While our code can now execute, wouldn't it be nice to know what the problem was and to provide that as feedback to the user?  We can capture this information by providing a variable name to which the exception can be assigned.  For example:

```
numerator = 1.0
for item in my_tuple:
    try:
        print(f'{numerator} / {item} = {numerator / item}')
    except Exception as e:
        print(f'Could not divide {numerator} by {item} because exception {e.__class__} occurred!')
```

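Notice that we are now capturing the exception in a variable named `e`.  That object is an instance of the specific exception class that was raised (here, `TypeError`, a subclass of `Exception`).  Its `__class__` attribute refers to that class, which is why the message shows `<class 'TypeError'>`; `type(e).__name__` would give just the name, `TypeError`.  Just in case you are unfamiliar with them, Python has many "double underscore" attributes and methods, which are affectionately called "dunder methods" or "magic methods".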
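A slightly fuller sketch, assuming the tuple values requested in Task 1.2 (`sample` and `failures` are illustrative names, chosen so as not to overwrite `my_tuple`).  It catches only `TypeError`, the error raised when dividing a float by a string, and reports both the exception's class name (via `type(e).__name__`) and its message (via `str(e)`, which is also what `{e}` shows inside an f-string).  The `!r` conversion prints the item with its quotes, making it obvious that it is a string:

```python
numerator = 1.0
sample = ('A string', 3.14159, 42)   # illustrative values from Task 1.2
failures = []                        # (item, exception name, message) per failure

for item in sample:
    try:
        print(f'{numerator} / {item} = {numerator / item}')
    except TypeError as e:
        # type(e).__name__ is just the class name; str(e) is the error message
        failures.append((item, type(e).__name__, str(e)))
        print(f'Could not divide {numerator} by {item!r}: {type(e).__name__}: {e}')
```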

### Copies or References

Something that can get us into a bit of trouble in Python is how objects are handled during iteration.  Suppose we iterate over an immutable tuple.  Suppose that during each iteration, we modify the value of the item that is assigned to our temporary variable during iteration.  What happens?

# <img src="../images/task.png" width=20 height=20> Task 1.6

Iterate over `my_tuple`.  Assign a new value to the variable passed into your iteration loop and perform some operation on it.  What happens?

```
for item in my_tuple:
    item = item * 2    # rebind the loop variable to a new value
    print(item)
```

```
A StingA Sting
6.28
84
```

So what happened?  Did you have to use exception handling?  Unless you chose to do something weird, probably not!  But wait a minute, tuples are supposed to be immutable.  Does this mean that the tuple itself is immutable but the objects within it can be changed?  Let's check...

# <img src="../images/task.png" width=20 height=20> Task 1.7

You should have just iterated over `my_tuple`, changing the value passed into your iteration loop in every pass.  Print out the current value of `my_tuple`.  Prove to yourself whether or not the objects within the tuple have changed:

```
for item in my_tuple:
    print(item)
```

```
A Sting
3.14
42
```


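What does this mean is happening?  Assigning to the loop variable never modifies the tuple.  Each time through the loop, Python binds the name (`item`, or whatever you called it) to an object in the tuple; assigning a new value to that name simply rebinds it to a different object and leaves the tuple untouched.  Note that the loop variable is *not* a copy: it refers to the very same object stored in the tuple.  With immutable objects such as strings and numbers the difference is invisible, but if an element is mutable (a list, for example), calling one of its mutating methods through the loop variable *will* change what the tuple holds.  This is a *very* important distinction.  We'll explore this a bit more as we investigate lists and arrays since this behavior can lead to somewhat unexpected results.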
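The distinction can be made concrete with a minimal sketch (the names are illustrative).  Assigning to the loop variable only rebinds that name, while calling a mutating method on a mutable element changes the object that the container holds:

```python
numbers = (1, 2, 3)
for n in numbers:
    n = n * 10               # rebinds the name n; the tuple is untouched
print(numbers)               # (1, 2, 3)

# The tuple's structure is fixed, but the lists it holds are mutable
rows = ([1, 2], [3, 4])
for idx, row in enumerate(rows):
    assert row is rows[idx]  # the loop variable refers to the stored object itself
    row.append(0)            # mutates that list in place
print(rows)                  # ([1, 2, 0], [3, 4, 0])
```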

## Lists and Arrays

Lists and arrays are very similar in Python, but they are not identical.  They can be easily confused since we access the members of each in exactly the same way that we would the members of a tuple; that is, using square brackets or iteration.  Let's create a list and an array and run some experiments.

### Defining a List

Creating a list is very easy.  Simply assign a set of objects in a square-bracketed list to a variable:

```
a_list = [1, 2, 3, 4, 5]
```

Everything that we discussed with regard to iteration and tuples can also be performed using a list.

# <img src="../images/task.png" width=20 height=20> Task 1.8

Create a list containing the values 1, 2, 3, 5, 7, 11, and 13.  Iterate over the list, printing the index of each value, the value, and the square of each value.


```
my_list = [1, 2, 3, 4, 5, 7, 11, 13]
for idx, item in enumerate(my_list):
    print("Index: {} Value: {} Square: {}".format(idx, item, item*item))
```

```
Index: 0 Value: 1 Square: 1
Index: 1 Value: 2 Square: 4
Index: 2 Value: 3 Square: 9
Index: 3 Value: 4 Square: 16
Index: 4 Value: 5 Square: 25
Index: 5 Value: 7 Square: 49
Index: 6 Value: 11 Square: 121
Index: 7 Value: 13 Square: 169
```


### Defining an Array

Defining an array looks very similar to defining a list, but we must explicitly call the constructor of the `array` class.  Since Python arrays must be homogeneous (all elements of the same type), we must also pass a *type code* that configures the array for the specific type of data that we are storing.  Another difference is that we must import the `array` module first!  A typical alias for the `array` module is `arr`:

```
import array as arr
an_array = arr.array("i", [1, 2, 3, 4, 5])
```

While we certainly would not recommend it, it is pretty common to find code that reads more like this:

```
from array import array

an_array = array("i", [1, 2, 3, 4, 5])
```

This brings the `array` class from the `array` module directly into the global namespace.  This is a *bad idea* when we will be importing other libraries and modules that implement arrays differently.

In the examples above, we have used the `"i"` character to indicate that we are storing an integer.  What other values are possible and what do they mean?

| Code  |   C Type             |   Python Type       | Minimum Size in Bytes |
|:-----:|:--------------------:|:-------------------:|:---------------------:|
| b     |   signed char        | int                 |    1                  |
| B     |   unsigned char      | int                 |    1                  |
| u     |   wchar_t            | Unicode character   |    2                  |
| h     |   signed short       | int                 |    2                  |
| H     |   unsigned short     | int                 |    2                  |
| i     |   signed int         | int                 |    2                  |
| I     |   unsigned int       | int                 |    2                  |
| l     |   signed long        | int                 |    4                  |
| L     |   unsigned long      | int                 |    4                  |
| q     |   signed long long   | int                 |    8                  |
| Q     |   unsigned long long | int                 |    8                  |
| f     |   float              | float               |    4                  |
| d     |   double             | float               |    8                  |

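You might wonder why we make a point of indicating the number of bytes consumed by each type.  It turns out that one of the advantages of using an array over a list in Python is that arrays are stored more compactly: each element is stored as a raw C value rather than as a full Python object.  This can matter if we are dealing with millions or billions of elements and we want to minimize the footprint of each element in the array.  The sizes in the table are minimums; the actual size can vary by platform, and an array's `itemsize` attribute reports the real value.  You might also notice that we cannot store strings or other arbitrary objects in a basic Python array!  Apart from the `'u'` code, which holds individual Unicode characters, they are intended to store numeric values *only*.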
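A minimal sketch of the type code in action, using only the standard-library `array` module (the variable names are illustrative).  `typecode` and `itemsize` report how an array was configured and how many bytes each element really occupies on your platform, and values that do not match the type code are rejected with a `TypeError`:

```python
import array as arr

ints = arr.array('i', [1, 2, 3])
print(ints.typecode)        # i
print(ints.itemsize)        # bytes per element on this platform (commonly 4)

doubles = arr.array('d', [1.5, 2.5])
print(doubles.itemsize)     # 8

# A float or a string cannot be stored in an integer array
rejected = []
for bad_value in (2.5, 'four'):
    try:
        ints.append(bad_value)
    except TypeError as e:
        rejected.append(bad_value)
        print(f'Cannot store {bad_value!r}: {e}')

print(ints)                 # array('i', [1, 2, 3])
```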

Again, all of the previous iteration and enumeration operations work just fine on arrays.

# <img src="../images/task.png" width=20 height=20> Task 1.9

Create an integer array containing the values 1, 2, 3, 5, 7, 11, 13.  Iterate over the array, printing the index of each value, the value, and the square of each value.

```
import array as arr
my_array = arr.array("i", [1, 2, 3, 4, 5, 7, 11, 13])
for idx, item in enumerate(my_array):
    print("Index: {} Value: {} Square: {}".format(idx, item, item*item))
```

```
Index: 0 Value: 1 Square: 1
Index: 1 Value: 2 Square: 4
Index: 2 Value: 3 Square: 9
Index: 3 Value: 4 Square: 16
Index: 4 Value: 5 Square: 25
Index: 5 Value: 7 Square: 49
Index: 6 Value: 11 Square: 121
Index: 7 Value: 13 Square: 169
```


There are some things that might be counterintuitive when using lists and arrays in Python.  For example, what do you expect will happen if you attempt to multiply a list or an array by 2?

# <img src="../images/task.png" width=20 height=20> Task 1.10

Using your existing array and list, multiply each by two and see what the result is.

```
my_list2 = my_list * 2
my_array2 = my_array * 2
print(my_list2)
print(my_array2)
```

```
[1, 2, 3, 4, 5, 7, 11, 13, 1, 2, 3, 4, 5, 7, 11, 13]
array('i', [1, 2, 3, 4, 5, 7, 11, 13, 1, 2, 3, 4, 5, 7, 11, 13])
```


Wait... what??  What just happened??  Multiplying a list or an array by an integer *n* does not multiply the elements; it produces a new list or array in which the original sequence is repeated *n* times.  This can be handy in some cases, but it might be unexpected.  How can we double each value?  Well, we could use iteration, but could there be a better, more pythonic, way?  Definitely!
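Strictly speaking, repetition copies *references* to the original elements rather than the elements themselves.  That is harmless for numbers, which cannot be changed in place, but it matters when the elements are mutable.  A minimal sketch (the names are illustrative; the last lines use a list comprehension, the construct introduced in the next section):

```python
base = [1, 2]
print(base * 3)              # [1, 2, 1, 2, 1, 2]

# Repetition copies references: all three rows are the same list object
grid = [[0, 0]] * 3
grid[0][0] = 99
print(grid)                  # [[99, 0], [99, 0], [99, 0]]

# A comprehension creates a separate inner list for each row
safe_grid = [[0, 0] for _ in range(3)]
safe_grid[0][0] = 99
print(safe_grid)             # [[99, 0], [0, 0], [0, 0]]
```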

### List Comprehensions

A common expression in Python is that of the *list comprehension* (or the related *dictionary comprehension*).  It might not be clear what the word *comprehension* means here.  Instead, it is good enough to just think of this as a shorthand for iteration.  It takes the following form:

```
resulting_list = [i for i in original_list_or_array]
```

The example above creates a new list but otherwise makes no changes to the values.  We can, however, make any change we wish:

```
resulting_list = [i**2 for i in original_list_or_array]
```

This example iterates over the list, binding each value to the variable `i` in turn and squaring it.  The result of this comprehension is a new list where each element has been squared.  A comprehension is usually somewhat faster than building the same list with an explicit `for` loop and `append()`, and it is generally considered more readable, though for small lists the difference in speed is negligible.

**It is very important to note** that in both of the examples above, the returned object is a *list* **even if you started out with an array**.  This means that if you really need an array, you will have to create one using the *result of the list comprehension*.
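A minimal sketch pulling these ideas together (the names are illustrative): the comprehension is equivalent to an explicit loop with `append()`, an optional `if` clause filters items, a dictionary comprehension uses braces with a `key: value` pair, and wrapping the result in `arr.array()` (reusing the source array's `typecode`) turns the list back into an array.

```python
import array as arr

original = [1, 2, 3, 4, 5]

# An explicit loop...
squares_loop = []
for i in original:
    squares_loop.append(i**2)

# ...and the equivalent list comprehension
squares_comp = [i**2 for i in original]
print(squares_comp)                  # [1, 4, 9, 16, 25]

# An optional if clause keeps only some items
odd_squares = [i**2 for i in original if i % 2 == 1]
print(odd_squares)                   # [1, 9, 25]

# A dictionary comprehension builds key: value pairs
square_lookup = {i: i**2 for i in original}
print(square_lookup)                 # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

# A comprehension over an array still returns a list...
values = arr.array('i', [1, 2, 3, 4])
doubled = [v * 2 for v in values]
print(type(doubled))                 # <class 'list'>

# ...so rebuild an array, reusing the source array's type code
doubled_array = arr.array(values.typecode, doubled)
print(doubled_array)                 # array('i', [2, 4, 6, 8])
```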

# <img src="../images/task.png" width=20 height=20> Task 1.11

Using your existing list and array, use a comprehension on each of them.  Produce a list containing the squares of your list and an array containing the cubes of your array.  Print the squares and cubes.

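```
import array as arr

square_list = [i**2 for i in my_list]
# A comprehension always returns a list, even when it iterates over an array
square_array = [i**2 for i in my_array]
# Wrap the comprehension in arr.array() to get an actual array of cubes
cube_array = arr.array('i', [i**3 for i in my_array])
print(square_list)
print(cube_array)
```

```
[1, 4, 9, 16, 25, 49, 121, 169]
array('i', [1, 8, 27, 64, 125, 343, 1331, 2197])
```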


Before moving on from arrays and lists, we want to formalize something that was implied above.  *When iterating over a Python list, array, or dictionary, assigning to the loop variable only rebinds that name; it never writes a new value back into the container.*  Please execute the following cell and consider the output:

```
print(square_array)
for i in square_array:
    i = i * 2
    print(i)
print(square_array)
```

```
[1, 4, 9, 16, 25, 49, 121, 169]
2
8
18
32
50
98
242
338
[1, 4, 9, 16, 25, 49, 121, 169]
```


Look at the output above carefully and make sure you understand what it means.  The statement `i = i * 2` creates a new object and points the name `i` at it; the list still refers to its original values, so the change has *no effect* on the original.  To change the elements themselves, assign through the container by index (for example, `square_array[idx] = value` inside an `enumerate()` loop) or build a new list with a comprehension.  We're sure that for some of you, this is old news.  However, we are making a big deal about this because understanding this can save you hours of debugging headaches when you think you are modifying an array while iterating!

## Dictionaries

Python dictionaries are another very common object type that we will leverage throughout the class.  Dictionaries are similar to lists and arrays, but rather than retrieving an element by its index, we typically retrieve values using some type of key.  In fact, Python dictionaries are a form of a *key-value store*.

Even though dictionaries are similar to lists and arrays, there are differences.  Like lists (but unlike arrays), dictionaries allow us to mix types for both the keys and the values within the dictionary.  Values can be any object at all, including lists or arrays.  Keys, however, must be *hashable*: immutable objects such as strings, numbers, and tuples of such objects work as keys, but lists and arrays do not.
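A minimal sketch of the hashability rule (the dictionary contents are illustrative): a tuple works as a key, a list works as a value, but using a list as a key raises a `TypeError`.

```python
services = {}
services[('10.0.0.1', 443)] = 'https'    # a tuple of immutable objects is a valid key
services['ssh ports'] = [22, 2222]       # any object, including a list, can be a value

try:
    services[[22, 2222]] = 'ssh'         # a list is mutable and therefore unhashable
except TypeError as e:
    print(f'TypeError: {e}')             # TypeError: unhashable type: 'list'

print(len(services))                     # 2
```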

We can easily differentiate between arrays, lists, and dictionaries when we look at how they are created.  A dictionary is created as follows:

```
a_dictionary = {}
# or
another_dictionary = dict()
```

To assign a value, index the dictionary with the key and assign to it:

```
a_dictionary['life'] = 42
```

Notice that when we access a key, we use the same square-bracket notation that we use with lists, arrays, and tuples.  In this example, it might seem readily apparent which type of object we are using, since the key is a string.  However, consider the following code fragment:

```
value = obscure_object[i]
```

In the above code fragment, we have no way of knowing what kind of object `obscure_object` is, especially since we have no idea what value `i` holds.  This could be a tuple, a list, an array, or a dictionary.

We can also assign values upon creation, much as we can with a list or an array:

```
a_dictionary = {"life":42, "alf":"hungry", 42:"life"}
```

Notice that we have used mixed types for both the values and the keys in this case.

Iterating over a dictionary can be accomplished using the same `for` construct where we iterate over the keys:

```
for k in a_dictionary:
    print(f'Key: {k}\tValue: {a_dictionary[k]}')
```

# <img src="../images/task.png" width=20 height=20> Task 1.12

Create a dictionary named `grades` with the following keys and values, then iterate over the dictionary displaying all keys and values.

| Key      |  Value        |
|:--------:|:-------------:|
| 'quiz 1' | 90            |
| 'quiz 2' | 70            |
| 'quiz 3' | 60            |
| 'test 1' | 75            |
| 'final'  | 98            |

```
grades = {"quiz 1": 90, "quiz 2": 70, "quiz 3": 60, "test 1": 75, "final": 98}
for key in grades:
    print("K {} Value {}".format(key, grades[key]))
```

```
K quiz 1 Value 90
K quiz 2 Value 70
K quiz 3 Value 60
K test 1 Value 75
K final Value 98
```


While this approach works just fine, if our goal is to access the values within the dictionary, there is a much more pythonic way to approach this.  The dictionary class implements several useful methods:

 * **keys()** : The `keys()` method on a dictionary returns a *view* of the keys (it can be iterated like a list; wrap it in `list()` if you need an actual list).
 * **values()** : The `values()` method on a dictionary returns a view of the values.
 * **items()** : The `items()` method on a dictionary returns a view of `(key, value)` tuples, one for each entry.
 
That last method, `items()`, can be used to simplify our logic a bit:

```
for k,v in grades.items():
    print(f'Key: {k}\tValue: {v}')
```
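The objects returned by `keys()`, `values()`, and `items()` are views rather than lists.  A minimal sketch (the `scores` dictionary is illustrative, named so as not to overwrite `grades`) showing that views stay in sync with the dictionary and can be converted with `list()` when indexing is needed:

```python
scores = {'quiz 1': 90, 'quiz 2': 70}

keys = scores.keys()
print(keys)                        # dict_keys(['quiz 1', 'quiz 2'])

# Views are live: they reflect later changes to the dictionary
scores['final'] = 98
print(keys)                        # dict_keys(['quiz 1', 'quiz 2', 'final'])

# Views cannot be indexed; convert to a list first when that is needed
first_value = list(scores.values())[0]
print(first_value)                 # 90
print(list(scores.items()))        # [('quiz 1', 90), ('quiz 2', 70), ('final', 98)]
```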

# <img src="../images/task.png" width=20 height=20> Task 1.13

Prove to yourself that the `items()` method can be used to iterate over tuples of keys and values by simplifying your code from Task 1.12.

```
for key, value in grades.items():
    print("K {} Value {}".format(key, value))
```

```
K quiz 1 Value 90
K quiz 2 Value 70
K quiz 3 Value 60
K test 1 Value 75
K final Value 98
```


# Using NumPy

The NumPy package is used by virtually every machine learning library for Python.  Its name is short for "Numerical Python."  As pointed out in the class discussion, NumPy provides native, compiled numeric extensions for Python with optimized linear algebra capabilities.  You may not yet see why this matters, but it will turn out to be exceptionally important in this class.


We'll start by importing NumPy.  In Python it is possible to import every public name from a library directly into the global namespace by using:

```
from numpy import *
```

While this would make all of the classes and utility functions in NumPy available without needing to specify a namespace, this is actually terrible practice.  Imagine that we have two Python libraries that we are going to import.  Both of them define a class with the name `array`.  If we import the libraries like this:

```
from libraryA import *
from libraryB import *
```

Which `array` class will be used if we define an array?  What if we want to use the base Python `array` type?  What has happened is that we have polluted the global namespace: each star-import silently overwrites any existing name it shares, so `array` refers to whichever library was imported *last*.
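A minimal sketch of the collision described above, using the standard-library `array` module and NumPy as the two libraries that both define `array`.  Run it in a scratch notebook or script, because the star-imports will overwrite many names in whatever namespace they run in, which is exactly the problem being demonstrated:

```python
import array as std_array
import numpy as np

from array import *    # binds the name `array` to the standard-library class
print(array is std_array.array)   # True

from numpy import *    # silently rebinds `array` to NumPy's array() function
print(array is std_array.array)   # False
print(array is np.array)          # True
```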

Much better practice is to import only the specific names that we need, or to import the library under a namespace alias.  For example:

```
import numpy as np
import pandas as pd
```

Now both NumPy and Pandas have been imported.  To reference something from NumPy, we simply prefix it with `np.`; to access a part of Pandas, we use `pd.`.

# <img src="../images/task.png" width=20 height=20> Task 1.14

Please use the next cell to import the NumPy library.  Import NumPy as `np`.

In [16]:
import numpy as np
import pandas as pd 

Because NumPy's optimized numeric capabilities will matter so much in this class, let's have a look at its analog to Python lists and arrays:  the NumPy array.

## NumPy Arrays

There are other types within NumPy that we will make use of, but the core object type that we care about is the NumPy array.  While these arrays are similar to Python arrays, they are not the same.  Let's jump into our exploration by creating some arrays and manipulating them.

The first step is always to import NumPy.  We've already done that in this notebook, in Task 1.14.  Even so, for completeness, let's import it again when we create our first array.  Creating the array will look very similar to creating a Python array:

```
import numpy as np
my_array = np.array([1, 2, 3, 5, 7, 11, 13, 17])
```

Except for the fact that we are instantiating a NumPy array, the call is nearly identical.  You will notice, however, that we do not need to specify the type.  Why not?  There are two reasons.  Let's experiment and show you the first.
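For comparison, here is a sketch that assumes the "Python arrays" from the earlier review are the standard library's `array.array`.  That type requires an explicit type code, while `np.array()` works out an element type (its `dtype`) on its own.  The exact integer `dtype` NumPy picks depends on the platform, so the sketch only checks that it is an integer type:

```python
import array
import numpy as np

primes = [1, 2, 3, 5, 7, 11, 13, 17]

# The standard-library array needs a type code: 'i' means signed int.
py_array = array.array('i', primes)

# NumPy infers the element type (dtype) from the values themselves.
np_array = np.array(primes)

print(py_array)        # array('i', [1, 2, 3, 5, 7, 11, 13, 17])
print(np_array.dtype)  # an integer dtype, e.g. int64 (platform-dependent)
```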

# <img src="../images/task.png" width=20 height=20> Task 1.15

Please create a NumPy array named `numpy_array` that contains the values 1, 2, 3, 5, 7, 11, 13, 17.  After creating the array, please use `print()` to print out the object.

In [17]:
numpy_array = np.array([1,2,3,5,7,11,13,17])
print(numpy_array)

[ 1  2  3  5  7 11 13 17]


So far, so good.  We have created the array, we can print it, and it seems to contain integers.  How did that happen without us specifying the type?  NumPy *inferred* the type based on the values that we passed to it.  Let's try that again, but this time passing in values of several different types.
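You can check what NumPy inferred by inspecting an array's `dtype` attribute.  In this short sketch (the example arrays are our own), an all-integer list yields an integer `dtype`, while adding a single float promotes every element to floating point:

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed_numbers = np.array([1, 2.2, 15])

print(ints.dtype.kind)      # i -> a signed integer type
print(mixed_numbers.dtype)  # float64: the integers were promoted to floats
```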

# <img src="../images/task.png" width=20 height=20> Task 1.16

Create another NumPy array named `test_array` that contains the values 1, 2.2, 15, 'test', 3.14159265358979323846264338327950288419716939937510.  After creating the array, print it.

In [18]:
test_array = np.array([1, 2.2, 15, 'test', 3.14159265358979323846264338327950288419716939937510])
print(test_array)

['1' '2.2' '15' 'test' '3.141592653589793']


What happened?  Can you see a difference?

Notice that the output from our first array looks like this:
    
```
[ 1  2  3  5  7 11 13 17]
```

Our new array looks like this:

```
['1' '2.2' '15' 'test' '3.141592653589793']
```

Once again, NumPy has inferred the type for our values, but it has made them all strings!  Just like Python arrays, all of the elements in a NumPy array must be of the same type.  In this case, `np.array()` inferred a common type to which all of the values could be converted: a string.
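If the inferred type is not what you want, `np.array()` also accepts a `dtype` argument.  This sketch (with a shortened value of pi) confirms that the mixed array became text, forces a float type on purely numeric input, and uses `dtype=object` to keep the original Python objects, at the cost of NumPy's fast numeric operations:

```python
import numpy as np

values = [1, 2.2, 15, 'test', 3.14159]

inferred = np.array(values)
print(inferred.dtype.kind)  # U -> Unicode string: every value became text

# Request a numeric type explicitly (this only works if every value is numeric).
as_float = np.array([1, 2.2, 15], dtype=float)
print(as_float.dtype)  # float64

# dtype=object stores references to the original Python objects instead.
as_object = np.array(values, dtype=object)
print(type(as_object[0]), type(as_object[3]))  # <class 'int'> <class 'str'>
```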

Let's return to our first array, `numpy_array`, and attempt some arithmetic.

# <img src="../images/task.png" width=20 height=20> Task 1.17

Multiply `numpy_array`, which was defined previously, by 3 and print the result.  Is the result different from what happens when you multiply a Python array by a value?

In [19]:
print(numpy_array*3)

[ 3  6  9 15 21 33 39 51]


This is a huge difference.  Rather than repeating the contents of the sequence, as multiplying a Python list or array does, NumPy arrays perform element-wise arithmetic!
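A side-by-side sketch makes the contrast concrete, using a plain Python list of three values:

```python
import numpy as np

primes = [1, 2, 3]

print(primes * 3)            # [1, 2, 3, 1, 2, 3, 1, 2, 3]  (the list is repeated)
print(np.array(primes) * 3)  # [3 6 9]                      (element-wise)
```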

In fact, what you are really seeing here is an artifact of linear algebra.  In linear algebra, our array is really a *vector*, or an ordered list of values... which is what an array is!  The number three that we are multiplying it by would be called a *scalar* value; that is, a value that scales a vector.  In linear algebra, multiplying a vector by a scalar multiplies each element of the vector by that scalar.
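Written out, multiplying a vector $\mathbf{v}$ with $n$ elements by a scalar $c$ looks like this.  Plugging in $c = 3$ and the eight values of `numpy_array` reproduces the output of Task 1.17:

$$
c\,\mathbf{v} = c\,(v_1, v_2, \dots, v_n) = (c\,v_1, c\,v_2, \dots, c\,v_n)
$$

$$
3\,(1, 2, 3, 5, 7, 11, 13, 17) = (3, 6, 9, 15, 21, 33, 39, 51)
$$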

Let's try this again, this time using division.

# <img src="../images/task.png" width=20 height=20> Task 1.18

Divide `numpy_array` by 7 and print the result.

In [20]:
print(numpy_array/7)

[0.14285714 0.28571429 0.42857143 0.71428571 1.         1.57142857
 1.85714286 2.42857143]


NumPy, as a general rule, uses a *functional programming* approach to manipulating data.  This means that NumPy functions will not change the data passed in, but instead will return a transformed copy of the data.

> Be warned that there can be what may seem to be some inconsistency here.  There are some NumPy functions that modify the NumPy array in-place!  This is definitely the exception, however, and not the rule.  An example of this would be the `np.random.shuffle(ndarray)` function which will shuffle the values in a NumPy array in-place.
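The difference between returning a copy and working in place is easy to check.  This sketch (the sample values are our own) pairs `np.random.permutation()`, which returns a shuffled copy, with the in-place `np.random.shuffle()` mentioned above:

```python
import numpy as np

values = np.array([1, 2, 3, 5, 7, 11, 13, 17])

scaled = values / 7   # returns a new array
print(values)         # [ 1  2  3  5  7 11 13 17]  (unchanged)

shuffled_copy = np.random.permutation(values)  # returns a shuffled copy
result = np.random.shuffle(values)             # shuffles `values` itself
print(result)         # None: the change happened to `values` in place
```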

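The behavior that we are observing is a property of linear algebra, but within NumPy you will often hear it referred to as *broadcasting*.  Scalar arithmetic is the simplest case: NumPy behaves as if the scalar were stretched to the shape of the array.  The same rules also let NumPy combine arrays of different, but compatible, shapes.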
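As a brief preview of broadcasting beyond scalars (the full rules are in NumPy's broadcasting documentation), a one-dimensional array of four values can be added to every row of a 2x4 grid.  The arrays here are our own examples:

```python
import numpy as np

grid = np.array([[1, 2, 3, 5],
                 [7, 11, 13, 17]])   # shape (2, 4)
row = np.array([10, 20, 30, 40])     # shape (4,)

# `row` is broadcast across both rows of `grid`.
total = grid + row
print(total.tolist())  # [[11, 22, 33, 45], [17, 31, 43, 57]]
```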

Let's look at a few other features of NumPy arrays and how they can be interrogated and manipulated.  One of the things that we will often want to check is the *shape* of our data.  We can find this by examining the `shape` attribute of our array.

# <img src="../images/task.png" width=20 height=20> Task 1.19

Print out the `shape` of the `numpy_array` that you created above.

In [21]:
print(numpy_array.shape)

(8,)


The result that we receive is a tuple, in this case `(8,)`; the trailing comma is how Python writes a tuple with a single element.  This indicates that our array has a single axis holding eight elements.  Let's reshape this data into a grid, or *matrix*, of values.  To do this we can use the `reshape()` method.
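For comparison, here is the shape of the same eight values laid out as a flat vector and as a 2x4 matrix built directly from nested lists:

```python
import numpy as np

vector = np.array([1, 2, 3, 5, 7, 11, 13, 17])
matrix = np.array([[1, 2, 3, 5], [7, 11, 13, 17]])

print(vector.shape)        # (8,)   one axis with 8 elements
print(matrix.shape)        # (2, 4) two axes: 2 rows, 4 columns
print(type(vector.shape))  # <class 'tuple'>
```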

# <img src="../images/task.png" width=20 height=20> Task 1.20

Let's look at the help, or *docstring*, for the `reshape()` method.  To do this, please execute the following cell:

In [22]:
help(numpy_array.reshape)

Help on built-in function reshape:

reshape(...) method of numpy.ndarray instance
    a.reshape(shape, order='C')
    
    Returns an array containing the same data with a new shape.
    
    Refer to `numpy.reshape` for full documentation.
    
    See Also
    --------
    numpy.reshape : equivalent function
    
    Notes
    -----
    Unlike the free function `numpy.reshape`, this method on `ndarray` allows
    the elements of the shape parameter to be passed in as separate arguments.
    For example, ``a.reshape(10, 11)`` is equivalent to
    ``a.reshape((10, 11))``.



The built-in function `help()` prints the docstring for Python objects and functions, saving us the effort of looking things up on the internet.  This is also practical for another reason: when we use the `help()` function, we know that we are looking at the documentation *for the version that we have installed* rather than for some random version that we happen to find on the internet.
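Two related checks can be handy here: the installed NumPy version (the one `help()` is describing) and the raw docstring text, which Python stores in an object's `__doc__` attribute.  A minimal sketch:

```python
import numpy as np

# The version of NumPy that is actually installed in this environment.
print(np.__version__)

# help() renders an object's docstring; the raw text lives in __doc__.
docstring = np.reshape.__doc__
if docstring:               # docstrings can be stripped, e.g. by `python -OO`
    print(docstring[:200])  # the opening lines of the np.reshape documentation
```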

This method's docstring refers us to the more general `numpy.reshape()` function for full documentation, so let's look at that as well.

# <img src="../images/task.png" width=20 height=20> Task 1.21

Please use the following cell to print out the help for the NumPy function `reshape()`.

In [23]:
help(np.reshape)


That's a lot to take in, but we don't need to understand every aspect of that documentation to use the `reshape()` function on an array.

Our array currently has shape `(8,)`.  Let's reshape it into a 2x4 grid; that is, 2 rows of 4 columns each.

# <img src="../images/task.png" width=20 height=20> Task 1.22

Use the following cell to reshape `numpy_array` as a 2x4 grid, storing the result in a new variable named `numpy_array2`.  Print the resulting array.

In [24]:
numpy_array2 = numpy_array.reshape(2, 4)
print(numpy_array2)

[[ 1  2  3  5]
 [ 7 11 13 17]]

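If we look at the shape of the resulting array, we have the expected shape of `(2, 4)`.  Looking at the output above, we can see that the values are displayed as nested rows.  Under the hood, however, NumPy still keeps the eight values in a single block of memory; the shape only describes how to index into that block.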
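One consequence worth knowing: for a contiguous array like ours, `reshape()` typically returns a *view*, a new shape laid over the same data rather than a copy.  This sketch uses `np.shares_memory()` to confirm that, then shows that writing through the view changes the original:

```python
import numpy as np

flat = np.array([1, 2, 3, 5, 7, 11, 13, 17])
grid = flat.reshape(2, 4)

# Both names refer to the same underlying buffer of eight values.
print(np.shares_memory(flat, grid))  # True

grid[0, 0] = 100  # writing through the 2x4 view...
print(flat[0])    # 100  ...changes the original flat array too
```

If you need an independent array, call `.copy()` on the result of `reshape()`.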

In addition to the shape of an array, we are frequently interested in the number of *dimensions*.  While these might intuitively feel like the same thing, they are different.  To see the number of dimensions, we can look at the `ndim` attribute.
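The two are closely related: `ndim` is always the length of the `shape` tuple.  A quick sketch with three small example arrays, including a zero-dimensional array that holds a single value:

```python
import numpy as np

examples = [
    np.array(7),                                # a single value (0-d array)
    np.array([1, 2, 3, 5]),                     # shape (4,)
    np.array([[1, 2, 3, 5], [7, 11, 13, 17]]),  # shape (2, 4)
]

for arr in examples:
    print(arr.shape, arr.ndim)
# () 0
# (4,) 1
# (2, 4) 2
```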

# <img src="../images/task.png" width=20 height=20> Task 1.23

Output the number of dimensions of `numpy_array2`, the array that you reshaped in Task 1.22.

In [25]:
numpy_array2.ndim

2

The result tells us that there are two dimensions to our current representation.  It is simple coincidence that this is the same as the number of rows.  Let's prove that this is true.

# <img src="../images/task.png" width=20 height=20> Task 1.24

Please reshape the `numpy_array` in the following ways.  After reshaping, print the reshaped array and the number of dimensions for each.

 * 1, 8
 * 4, 2
 * 8
 * 2, 2, 2


In [26]:
for new_shape in [(1, 8), (4, 2), (8,), (2, 2, 2)]:
    reshaped = numpy_array.reshape(new_shape)
    print(reshaped)
    print("shape:", reshaped.shape, "ndim:", reshaped.ndim)

[[ 1  2  3  5  7 11 13 17]]
shape: (1, 8) ndim: 2
[[ 1  2]
 [ 3  5]
 [ 7 11]
 [13 17]]
shape: (4, 2) ndim: 2
[ 1  2  3  5  7 11 13 17]
shape: (8,) ndim: 1
[[[ 1  2]
  [ 3  5]]

 [[ 7 11]
  [13 17]]]
shape: (2, 2, 2) ndim: 3


As you can see, the number of dimensions is *not* the same as the number of rows.  Instead, it has to do with how the arrays are structured or nested.  In our examples, we have only created an array that has a maximum of 3 dimensions, but by no means are NumPy arrays limited to this.  In fact, the NumPy objects that we are working with are `ndarray` objects.  This stands for "n-dimensional arrays", meaning that we can use any number of dimensions when representing our data.
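To see that nothing special happens past three dimensions, this sketch builds a four-dimensional array.  It uses `np.arange(24)`, which we have not covered yet; it simply produces the integers 0 through 23:

```python
import numpy as np

# 24 values arranged into four dimensions: 2 x 3 x 2 x 2 = 24
four_d = np.arange(24).reshape(2, 3, 2, 2)

print(type(four_d))  # <class 'numpy.ndarray'>
print(four_d.ndim)   # 4
print(four_d.shape)  # (2, 3, 2, 2)
```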

## Conclusion

Obviously there is far more to Python lists, arrays and NumPy arrays than we have reviewed here.  Rather than inundate you with everything at once, we will stop here for now, adding in additional capabilities in other labs.