
### NumPy 1
### University of Virginia
### DS 1002: Programming for Data Science
### Last Updated: 10 Sept 2023
---  

### PREREQUISITES
- import
- functions
- for ... in

### SOURCES
- https://numpy.org/
- https://en.wikipedia.org/wiki/NumPy
- https://www.scipy.org/
- https://en.wikipedia.org/wiki/SciPy
- https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html
- https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.randint.html
- https://numpy.org/devdocs/user/absolute_beginners.html

### OBJECTIVES
- Introduction to NumPy (continued)

### CONCEPTS
- The numpy package contains useful functions for math operations
- The ndarray is the workhorse of the package

# NumPy

**A new data structure**

Essentially, NumPy introduces a new data structure to Python: the **n-dimensional array**. Along with it, NumPy provides a collection of functions and methods that take advantage of this data structure.

The data structure is designed to support the use of **numerical methods**: algorithmic approximations to the problems of mathematical analysis.

**New Functions**

It also provides a new way of applying functions to data, made possible by the data structure: **vectorized functions**.  
Vectorized functions replace the use of loops and comprehensions to apply a function to a set of data.
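
As a minimal sketch (the list `data` and the other names are illustrative, not part of the original lesson), here is the same computation written once with a list comprehension and once as vectorized NumPy expressions:

```python
import numpy as np

data = [1.0, 4.0, 9.0, 16.0]

# Loop/comprehension version: Python applies the function one element at a time
roots_loop = [x ** 0.5 for x in data]

# Vectorized version: np.sqrt is applied to the whole array in a single call
values = np.array(data)
roots_vec = np.sqrt(values)

# Arithmetic operators are vectorized too: every element is doubled at once
doubled = values * 2

print(roots_loop)
print(roots_vec)
print(doubled)
```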

In addition, given the data structure, it provides a library of **linear algebra** functions.
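
For instance, a small system of linear equations can be solved with `np.linalg.solve`. The system below is an illustrative example chosen so the answer is easy to check by hand (x = 1, y = 3):

```python
import numpy as np

# Coefficients and right-hand side of the illustrative system
#   2x +  y = 5
#    x + 3y = 10
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])

solution = np.linalg.solve(A, b)  # solves A @ [x, y] = b
check = A @ solution              # matrix-vector product; should reproduce b

print(solution)
print(check)
```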

**New Data Types**

NumPy also introduces a bunch of new **data types**.
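
A quick way to see these data types is the `dtype` attribute of an array. In this sketch the arrays are arbitrary examples; note that the default integer type can depend on the platform and NumPy version:

```python
import numpy as np

ints = np.array([1, 2, 3])                        # default integer type (platform/version dependent)
floats = np.array([1.0, 2.5, 3.0])                # default float type: float64
small_ints = np.array([1, 2, 3], dtype=np.int8)   # explicitly request 8-bit integers
flags = np.array([True, False, True])             # boolean type

for a in (ints, floats, small_ints, flags):
    # dtype names the element type; itemsize is the number of bytes per element
    print(a.dtype, a.itemsize)
```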

**Python for Science**

Finally, because [numerical methods](https://www.britannica.com/science/numerical-analysis) are so important to so many sciences, NumPy is the basis of what is called **the scientific "stack"** in Python, which includes SciPy, Matplotlib, scikit-learn, and pandas. All of these assume that you have some knowledge of NumPy.

Let's take a look at it.

In [None]:
import numpy as np

NumPy is by widespread convention aliased as `np`.

# The ndarray

The ndarray is a multidimensional array object.

## About Dimensions

The term *dimension* is ambiguous.
* It sometimes refers to the dimensions of things in the world, such as space and time.
* It sometimes refers to the dimensions of a data structure, independent of what it represents in the world.

NumPy dimensions are the latter, although they can be used to represent the former, as physicists do.

The dimensions of data structures are sometimes called **axes**.

Consider this: Three-dimensional space can be represented as three columns in a two-dimensional table OR as three axes in a data cube.
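
A minimal sketch of the two representations (the coordinates and grid sizes are made up for illustration): a table of points with one column per spatial dimension is a 2D array, while a grid of values indexed by x, y, and z is a 3D array.

```python
import numpy as np

# Option 1: a 2D table, one row per point and three columns (x, y, z)
points = np.array([[0.0, 0.0, 0.0],
                   [1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0],
                   [7.0, 8.0, 9.0]])
print(points.ndim, points.shape)   # 2 (4, 3)

# Option 2: a 3D "data cube" with one axis per spatial dimension;
# each cell would hold a value measured at that (x, y, z) grid location
cube = np.zeros((4, 5, 6))         # 4 x-positions, 5 y-positions, 6 z-positions
print(cube.ndim, cube.shape)       # 3 (4, 5, 6)
```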


# Basic Indexing and Slicing

In [None]:
arr = np.arange(10)
arr

In [None]:
arr[5]

In [None]:
arr[5:8]

Notice that if we assign a scalar to a slice, all of the elements of the slice get that value. This is called **broadcasting**.

In [None]:
arr[5:8] = 12

In [None]:
arr

Also, notice that changes to slices are changes to the arrays they are slices of. They are **views**, not copies.

In [None]:
arr_slice = arr[5:8]
arr_slice

In [None]:
arr_slice[1] = 12345
arr

In [None]:
arr_slice[:] = 64
arr

In [None]:
arr_slice

NumPy was designed with large datasets in mind; if it copied data every time you took a slice, you could imagine the performance and memory problems that would cause.

⭐ If you want a copy of a slice of an ndarray instead of a view, you will need to explicitly copy the array; for example `arr[5:8].copy()`.
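
The difference between a view and a copy can be checked with `np.shares_memory`, which reports whether two arrays overlap in memory. The array `a` below is a fresh example, so the `arr` used above is left alone:

```python
import numpy as np

a = np.arange(10)

view = a[5:8]           # a view: shares memory with a
copy = a[5:8].copy()    # an independent copy of the same three elements

view[0] = -1            # also changes a[5]
copy[1] = -2            # a[6] is unaffected

print(a)
print(np.shares_memory(a, view))   # True
print(np.shares_memory(a, copy))   # False
```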

**Higher Dimensional Arrays**

In [None]:
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d

In [None]:
arr2d[2]

In [None]:
arr2d[0][2]

**Simplified notation**

In [None]:
arr2d[0, 2]
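
Both notations return the same element here, but they are not interchangeable in general. In this sketch on the same `arr2d`, chained indexing applies one index at a time to an intermediate result, which matters once slices are involved:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Chained indexing: arr2d[0] first builds the row [1, 2, 3], then [2] picks from it
chained = arr2d[0][2]
# Combined indexing: a single lookup with a (row, column) tuple
combined = arr2d[0, 2]
print(chained, combined)   # 3 3

# With slices the two forms differ
print(arr2d[:, 1])   # all rows, column 1: [2 5 8]
print(arr2d[:][1])   # arr2d[:] is the whole array, so this is just row 1: [4 5 6]
```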

A nice visual of a 2D array

<img src="https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781449323592/files/httpatomoreillycomsourceoreillyimages2172112.png" height="50%" width="50%"/>

**Two-Dimensional Array Slicing**

<img src="https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781449323592/files/httpatomoreillycomsourceoreillyimages2172114.png" height="50%" width="50%"/>
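
Some common 2D slicing patterns on the same `arr2d`, as a sketch. Each axis gets its own slice, separated by a comma; an integer index on an axis drops that axis, while a slice keeps it:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

sub = arr2d[:2, 1:]     # first two rows, columns 1 onward
row = arr2d[2, :]       # integer index on axis 0: the axis is dropped (1D result)
row_2d = arr2d[2:, :]   # slice on axis 0: the axis is kept (2D result)
cols = arr2d[:, :2]     # all rows, first two columns
part = arr2d[1, :2]     # row 1, first two columns

for name, result in [("sub", sub), ("row", row), ("row_2d", row_2d),
                     ("cols", cols), ("part", part)]:
    print(name, result.shape)
    print(result)
```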

**3D arrays**

We can add another dimension to the array and generate what is called a *tensor*.

https://en.wikipedia.org/wiki/Tensor_(machine_learning)

This can be done by stacking 2 arrays one after the other, as we would do with books on a shelf.

As an example we stack arr2d and the new arr2d2 (defined below) together.

In [None]:
print(arr2d)
arr2d2 = np.array([[10,11,12],[13,14,15],[16,17,18]])
print(arr2d2)

In [None]:
arr3d = np.array([arr2d,arr2d2])
arr3d

We may be tempted to use np.hstack(), but it won't do what we want. Let's check:

In [None]:
arr3d = np.hstack((arr2d,arr2d2))
arr3d
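
To see why, compare the shapes produced by `np.hstack`, `np.vstack`, and `np.stack` on the same two 3 × 3 arrays. `hstack` and `vstack` join along an axis the inputs already have, while `stack` creates a new one:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d2 = np.array([[10, 11, 12], [13, 14, 15], [16, 17, 18]])

h = np.hstack((arr2d, arr2d2))   # joins along existing axis 1: side by side
v = np.vstack((arr2d, arr2d2))   # joins along existing axis 0: one under the other
s = np.stack((arr2d, arr2d2))    # adds a NEW axis 0: two 3 x 3 "pages"

print(h.shape, v.shape, s.shape)   # (3, 6) (6, 3) (2, 3, 3)
```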

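We can also create a 3D array directly from nested lists. Note that this replaces `arr3d` with a new 2 × 2 × 3 array (two arrays of shape 2 × 3), which the rest of this section uses: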

In [None]:
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

In [None]:
arr3d.shape

In [None]:
arr3d


💡 **Here is a way to visualize 3 and higher dimensional data:**

```python
[ # AXIS 0                     CONTAINS 2 ELEMENTS (arrays)
    [ # AXIS 1                 CONTAINS 2 ELEMENTS (arrays)
        [1, 2, 3], # AXIS 2    CONTAINS 3 ELEMENTS (integers)
        [4, 5, 6]  # AXIS 2
    ],  
    [ # AXIS 1
        [7, 8, 9],
        [10, 11, 12]
    ]
]
```
Each axis is a level in the nested hierarchy, i.e., a tree or DAG (directed acyclic graph).

* Each axis is a container.
* There is only one top container.
* Only the bottom containers have data.
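
The container picture can be checked directly: each axis corresponds to one level of nesting, and `len()` at each level matches one entry of `.shape`. A sketch using the same 2 × 2 × 3 values as `arr3d`:

```python
import numpy as np

arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

print(arr3d.ndim)          # 3 levels of nesting
print(arr3d.shape)         # (2, 2, 3)
print(len(arr3d))          # 2: containers on axis 0
print(len(arr3d[0]))       # 2: containers on axis 1
print(len(arr3d[0][0]))    # 3: data values on axis 2, the bottom level
```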

**Omitting later indices**

In multidimensional arrays, if you omit later indices, the returned object will be a **lower-dimensional ndarray** consisting of all the data along the axes you did not index.

So in the 2 × 2 × 3 array `arr3d`, `arr3d[0]` is a 2 × 3 array:

In [None]:
arr3d[0]

Both scalar values and arrays can be assigned to `arr3d[0]`. Here we first save a copy of the data, then overwrite it with the scalar 42:

In [None]:
old_values = arr3d[0].copy()
arr3d[0] = 42
arr3d

Putting the data back.

In [None]:
arr3d[0] = old_values
arr3d

Similarly, `arr3d[1, 0]` gives you all of the values whose indices start with (1, 0), forming a 1-dimensional array:

In [None]:
arr3d[1, 0]

In [None]:
x = arr3d[1]
x

In [None]:
x[0]
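
The cells above follow a general rule: each integer index you supply removes one axis from the front of the shape. A sketch on the same 2 × 2 × 3 `arr3d` values:

```python
import numpy as np

arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

print(arr3d[1].shape)      # (2, 3): one index, 2D result
print(arr3d[1, 0].shape)   # (3,): two indices, 1D result (same values as arr3d[1][0])
print(arr3d[1, 0, 2])      # 9: three indices, a single element
```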

## Student Exercises

In the first lesson, you created a `sudoku_game` two-dimensional NumPy array. Perhaps you have hundreds of sudoku game arrays, and you'd like to save the solution for this one, `sudoku_solution`, as part of the same array as its corresponding game in order to organize your sudoku data better. You could accomplish this by stacking the two 2D arrays on top of each other to create a 3D array.



In [None]:
# sudoku_game is a Python list containing a sudoku game

sudoku_game = [[0, 0, 4, 3, 0, 0, 2, 0, 9],
               [0, 0, 5, 0, 0, 9, 0, 0, 1],
               [0, 7, 0, 0, 6, 0, 0, 4, 3],
               [0, 0, 6, 0, 0, 2, 0, 8, 7],
               [1, 9, 0, 0, 0, 7, 4, 0, 0],
               [0, 5, 0, 0, 8, 3, 0, 0, 0],
               [6, 0, 0, 0, 0, 0, 1, 0, 5],
               [0, 0, 3, 5, 0, 8, 6, 9, 0],
               [0, 4, 2, 9, 1, 0, 3, 0, 0]]

In [None]:
sudoku_solution = [[8, 6, 4, 3, 7, 1, 2, 5, 9],
                   [3, 2, 5, 8, 4, 9, 7, 6, 1],
                   [9, 7, 1, 2, 6, 5, 8, 4, 3],
                   [4, 3, 6, 1, 9, 2, 5, 8, 7],
                   [1, 9, 8, 6, 5, 7, 4, 3, 2],
                   [2, 5, 7, 4, 8, 3, 9, 1, 6],
                   [6, 8, 9, 7, 3, 4, 1, 2, 5],
                   [7, 1, 3, 5, 2, 8, 6, 9, 4],
                   [5, 4, 2, 9, 1, 6, 3, 7, 8]]


* Create a 3D array called `game_and_solution` by stacking the two 2D arrays, created from `sudoku_game` and `sudoku_solution`, on top of one another; in the final array, `sudoku_game` should appear before `sudoku_solution`.  

* Print `game_and_solution`.

In [None]:
# Create the game_and_solution 3D array


# Print game_and_solution


* Create another 3D array called `new_game_and_solution` with a different 2D game and 2D solution pair: `new_sudoku_game` and `new_sudoku_solution`. `new_sudoku_game` should appear before `new_sudoku_solution`.   

* Create a 4D array called `games_and_solutions` by making an array out of the two 3D arrays: `game_and_solution` and `new_game_and_solution`, in that order.  

* Print the shape of `games_and_solutions`  

In [None]:
new_sudoku_game = [[0, 0, 4, 3, 0, 0, 0, 0, 0],
                   [8, 9, 0, 2, 0, 0, 6, 7, 0],
                   [7, 0, 0, 9, 0, 0, 0, 5, 0],
                   [5, 0, 0, 0, 0, 8, 1, 4, 0],
                   [0, 7, 0, 0, 3, 2, 0, 6, 0],
                   [6, 0, 0, 0, 0, 1, 3, 0, 8],
                   [0, 0, 1, 7, 5, 0, 9, 0, 0],
                   [0, 0, 5, 0, 4, 0, 0, 1, 2],
                   [9, 8, 0, 0, 0, 6, 0, 0, 5]]

new_sudoku_solution = [[2, 5, 4, 3, 6, 7, 8, 9, 1],
                       [8, 9, 3, 2, 1, 5, 6, 7, 4],
                       [7, 1, 6, 9, 8, 4, 2, 5, 3],
                       [5, 3, 2, 6, 9, 8, 1, 4, 7],
                       [1, 7, 8, 4, 3, 2, 5, 6, 9],
                       [6, 4, 9, 5, 7, 1, 3, 2, 8],
                       [4, 2, 1, 7, 5, 3, 9, 8, 6],
                       [3, 6, 5, 8, 4, 9, 7, 1, 2],
                       [9, 8, 7, 1, 2, 6, 4, 3, 5]]

In [None]:
# Create a second 3D array of another game and its solution

# Create a 4D array of both game and solution 3D arrays

# Print the shape of your 4D array


* Flatten `sudoku_game` so that it is a 1D array, and save it as `flattened_game`.  

* Print the `.shape` of `flattened_game`.

* Hint: look up the documentation on `.flatten`.

In [None]:
# Convert sudoku_game to a NumPy array


In [None]:
# Flatten sudoku_game, save it as flattened_game, and print it





In [None]:
# Print the shape of flattened_game


* Reshape the `flattened_game` back to its original shape of nine rows and nine columns; save the new array as `reshaped_game`.  

* Look up the documentation on `.reshape`.

* Does NumPy keep the array elements in the same order after they are flattened and reshaped?

In [None]:
# Reshape flattened_game back to a nine by nine array
reshaped_game = ...  # replace ... with your code

In [None]:
# Print sudoku_game and reshaped_game
print(sudoku_game)
print("-" * 23)
print(reshaped_game)

In other words, a multidimensional array can be thought of as a 1D sequence of elements plus a shape that tells NumPy how to group them, in this example (9, 9).
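
A small sketch of the same idea on a 2 × 3 array (not the sudoku data, so the exercise is left for you). By default NumPy flattens row by row ("C order"), so reshaping to the original shape restores the original layout; `order="F"` reads column by column instead. `ravel()` is related to `flatten()` but returns a view when it can, whereas `flatten()` always returns a copy:

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)   # rows [0 1 2] and [3 4 5]

flat = grid.flatten()               # copy, read row by row: [0 1 2 3 4 5]
back = flat.reshape(2, 3)           # same elements, same order, original shape
print(np.array_equal(grid, back))   # True

print(grid.flatten(order="F"))      # column by column: [0 3 1 4 2 5]

r = grid.ravel()                    # a view for this contiguous array
print(np.shares_memory(grid, r))    # True
print(np.shares_memory(grid, flat)) # False
```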