<a href="https://colab.research.google.com/github/dylanwalker/MGSC496/blob/main/MGSC496_R04.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
#@title Your Info

your_name = '' #@param {type:"string"}
your_email = '' #@param {type:"string"}
today_date = '' #@param {type:"date"}


# How to "read" this notebook

As you go through this notebook (or any notebook for this class), you will encounter new concepts and python code that implements them -- just like you would see in a textbook. Of course, in a textbook, it's easy to read code and an explanation of what it does and think that you understand it.
<br />
<br />

### Learn by doing
But this notebook is different from a textbook because it allows you to not just read the code, but play with it. **You can and should try out changing the code that you see**. In fact, in many places throughout this reading notebook, you will be asked to write your own code to experiment with a concept that was just covered. This is a form of "active reading" and the idea behind it is that we really learn by **doing**. 
<br />
<br />

### Change everything
But don't feel limited to only change code when I prompt you. This notebook is your learning environment and your playground. I encourage you to try changing and running all the code throughout the notebook and even to **add your own notes and new code blocks**. Adding comments to code to explain what you are testing, experimenting with or trying to do is really helpful to understand what you were thinking when you revisit it later. 
<br />
<br />
### Make this notebook your own
Make this notebook your own. Write your questions and thoughts. At the end of every reading notebook, I will ask the same set of questions to try to elicit your questions, reaction and feedback. When we review the reading notebook in class, I encourage you to   



# Numpy - A Module for Scientific Computing

![](https://drive.google.com/uc?id=1K1CAmcBTb97VGWp-AKydGK_TEQ-95Js7)
## Why Numpy?
We have seen collections in Python and how useful they are. However, when it comes to handling numerical data, such as storing and computing large arrays or matrices of numbers, collections are not appropriate. The reason for this is that collections can handle heterogeneous types of data -- you can have a list with elements that are floats, strings, ints, other lists, etc. This means that the items in a list can't be stored contiguously in memory, making it slower to access each successive item.

Numpy allows you to define an array or matrix of elements which are all the same data type (which is often the case for many applications in data science). This means that numpy can store the elements contiguously in memory and also take advantage of many computationally efficient tricks to speed up calculations.

Here is a comparison of how lists store items in memory compared to how numpy stores an array:

![](https://drive.google.com/uc?id=1dvdA39EgYWkkVX2rJmeq871vVewdXEsU)

In a list, pointers to each object (the items) are stored in a contiguous block of memory. A pointer is just the memory address where the object itself is located. Because pointers are a fixed size (e.g., 64 bits), it's easy to start at the top and then zip through the memory in chunks of the pointer size (i.e., in 64-bit increments) to get the location of the object you want.

To access the third item of a list, Python first goes to the location in memory where the pointers to items start, travels ahead by (3-1)*64 bits, reads the next 64 bits to get the address, and then looks in the memory location at that address to find the object. This involves two reference lookups and jumping around in memory.

In contrast, because the elements of a numpy array are all the same type (e.g., a float), they have a fixed size and can all be stored sequentially in memory. This means we can read the third element of a numpy array by looking at the address where the array starts, looking ahead by (3-1)*sizeof(element) bytes, and then just reading sizeof(element) bytes. This doesn't involve looking to a second location in memory and is therefore much faster.

In addition, numpy implements its computational routines in C, which means it can use all sorts of low-level tricks to parallelize computations and take advantage of specialized linear algebra libraries that speed up common matrix operations (such as Intel's Math Kernel Library).
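
The address arithmetic described above can be checked on a real array. This is a minimal sketch, assuming a small `float64` array with made-up values; the names `a`, `offset`, and `third` are just for illustration. It reads the third element straight out of the array's raw bytes using the offset (3-1)*sizeof(element).

```python
import numpy as np

# Illustrative array (values are arbitrary); each float64 element is 8 bytes
a = np.array([10.0, 20.0, 30.0, 40.0], dtype=np.float64)

print(a.itemsize)  # size of one element in bytes
print(a.strides)   # bytes to step forward to reach the next element

# Byte offset of the third element: (3 - 1) * sizeof(element)
offset = (3 - 1) * a.itemsize

# Read one float64 directly from the raw bytes at that offset
third = np.frombuffer(a.tobytes(), dtype=np.float64, count=1, offset=offset)[0]
print(third)
```

No pointer is followed here: the position of the element is computed from the start of the data and the element size alone.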







# Using Numpy - Getting Started



Before we can use numpy, we first have to import it. We'll import it as the name ```np``` so that we won't need to keep typing ```numpy.``` everywhere:

In [None]:
import numpy as np

Now let's make a simple array of ints:

In [None]:
#Make an array
np.array([2,3,7,-1,5])

Notice that we didn't tell numpy what data type the array should hold. It inferred the type from the elements of the list we provided as the argument to ```np.array()```. In the above case, it inferred that the elements were ints.
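
One way to see which type was chosen is to inspect the array's `dtype` attribute. This is a minimal sketch; the names `ints` and `is_integer` are just illustrative, and the exact integer width that gets printed depends on your platform and numpy version.

```python
import numpy as np

ints = np.array([2, 3, 7, -1, 5])
print(ints.dtype)  # an integer type; the exact width depends on the platform

# np.issubdtype checks the general kind of dtype without assuming a width
is_integer = np.issubdtype(ints.dtype, np.integer)
print(is_integer)
```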

We can specify the data type with the ```dtype``` keyword:

In [None]:
np.array([2, 3, 7, -1,5], dtype='float32')

Remember that the elements of a numpy array all have to be the same data type. If we try to make an array from elements of different data types, numpy will "upconvert" the elements so that the data types of all elements match:



In [None]:
np.array([1,3,5.0,3]) # numpy will upconvert to float

In [None]:
np.array([1,5,"hello"]) # numpy will upconvert to a fixed-size type that can handle characters

We can also create multi-dimensional arrays:

In [None]:
np.array( [ [1,5,0], [0,1,0], [-1,3,2] ] )

## Creating Arrays from Scratch



Often we will want to start with an array that we will later fill with data. Numpy has some useful methods for this:
- ```np.ones()``` -- create an array filled with ones
- ```np.zeros()``` -- create an array filled with zeros
- ```np.full()``` -- create an array filled with a particular value
- ```np.eye()``` -- create an array that is the identity matrix, with ones on the diagonal and zeros elsewhere
- ```np.empty()``` -- create an array without filling the values, the values will be whatever happens to be in the memory location.


In [None]:
# Create a length-10 integer array filled with zeros
np.zeros(10, dtype=int)

In [None]:
# Create a 3x5 floating-point array filled with ones
np.ones((3, 5), dtype=float)

In [None]:
# Create a 3x5 array filled with 3.14
np.full((3, 5), 3.14)

In [None]:
# Create a 3x3 identity matrix
np.eye(3)

In [None]:
# Create an uninitialized array of three floats (float is the default dtype)
# The values will be whatever happens to already exist at that memory location
np.empty(3)

And some methods to fill arrays with sequences:
- ```np.arange()``` - create an array in steps between a start and end value
- ```np.linspace()``` - create an array of values evenly spaced between a start and end value



In [None]:
# Create an array filled with a linear sequence
# Starting at 0, ending at 20, stepping by 2
# (this is similar to the built-in range() function)
np.arange(0, 20, 2)

In [None]:
# Create an array of five values evenly spaced between 0 and 1
np.linspace(0, 1, 5)


And some methods to fill arrays with random numbers (from different types of distributions):
- ```np.random.random()``` - create an array with random numbers uniformly distributed in the interval [0, 1).
- ```np.random.normal()``` - create an array with random numbers normally distributed with some mean and standard deviation.
- ```np.random.randint()``` - create an array of random integers between a start value (inclusive) and an end value (exclusive).

In [None]:
# Create a 3x3 array of uniformly distributed
# random values between 0 and 1
np.random.random((3, 3))

In [None]:
# Create a 3x3 array of normally distributed random values
# with mean 0 and standard deviation 1
np.random.normal(0, 1, (3, 3))

In [None]:
# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0, 10, (3, 3))

<hr/>
<img src="https://drive.google.com/uc?id=1sk8CSP26YY7sfyzmHGFXncuNRujkvu9v" align="left">

<font size=3 color="darkred">Dice Roller</font>

<font>
Write a small dice-rolling function. The function should take two inputs: the number of sides of the die and the number of rolls. It should return a numpy array holding the values of the rolls.
</font>

In [None]:
# Try it out
def diceRoller(numSides, numRolls):
  # Enter your code here
  pass  # placeholder so the cell runs; replace this line with your code

# Test it out with some inputs:
print(diceRoller(6,10))
print(diceRoller(20,3))

<hr/>

# Numpy Data Types - Reference



It is useful to be familiar with the different data types that numpy can use. Picking the right data type for your task is essential, particularly when working with large-scale data, so that you use "just enough" space in memory to accomplish what you need.

There are two ways to specify a data type:

In [None]:
np.zeros(10, dtype='int16') # specify the data type with a string

In [None]:
np.zeros(10, dtype=np.int16) # specify the data type with the numpy type object


Here is a table of the different data types:


| Data type | Description | 
|---------------|-------------| 
| ``bool_`` | Boolean (True or False) stored as a byte | 
| ``int_`` | Default integer type (same as C ``long``; normally either ``int64`` or ``int32``)| 
| ``intc`` | Identical to C ``int`` (normally ``int32`` or ``int64``)| 
| ``intp`` | Integer used for indexing (same as C ``ssize_t``; normally either ``int32`` or ``int64``)| 
| ``int8`` | Byte (-128 to 127)| 
| ``int16`` | Integer (-32768 to 32767)| 
| ``int32`` | Integer (-2147483648 to 2147483647)| 
| ``int64`` | Integer (-9223372036854775808 to 9223372036854775807)| 
| ``uint8`` | Unsigned integer (0 to 255)| 
| ``uint16`` | Unsigned integer (0 to 65535)| 
| ``uint32`` | Unsigned integer (0 to 4294967295)| 
| ``uint64`` | Unsigned integer (0 to 18446744073709551615)| 
| ``float_`` | Shorthand for ``float64`` (this alias was removed in NumPy 2.0; use ``float64`` there).| 
| ``float16`` | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa| 
| ``float32`` | Single precision float: sign bit, 8 bits exponent, 23 bits mantissa| 
| ``float64`` | Double precision float: sign bit, 11 bits exponent, 52 bits mantissa| 
| ``complex_`` | Shorthand for ``complex128`` (this alias was removed in NumPy 2.0; use ``complex128`` there).| 
| ``complex64`` | Complex number, represented by two 32-bit floats| 
| ``complex128``| Complex number, represented by two 64-bit floats| 


More advanced type specification is possible, such as specifying big or little endian numbers; for more information, refer to the [Numpy documentation](http://numpy.org/). 
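
To get a feel for what "just enough" space means, here is a minimal sketch that stores the same number of zeros with two different integer types and compares the memory used. The array length and the names `small`, `large`, and `info` are arbitrary choices for illustration. `np.iinfo` reports the range of values an integer type can hold.

```python
import numpy as np

# The same one million zeros stored with two different integer types
small = np.zeros(1_000_000, dtype=np.int8)
large = np.zeros(1_000_000, dtype=np.int64)

print(small.nbytes)  # total bytes used with 1-byte elements
print(large.nbytes)  # total bytes used with 8-byte elements

# np.iinfo reports the range an integer type can represent
info = np.iinfo(np.int8)
print(info.min, info.max)
```

If your values fit in the smaller range, the smaller type uses one eighth of the memory here.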


# Using Numpy - Array Basics

Data manipulation in Python is nearly synonymous with Numpy array manipulation. As we will see shortly, even newer tools like Pandas are built around the Numpy array.

This section will present several examples of using Numpy array manipulation to access data, subarrays, and to split, reshape, and join arrays.

Becoming familiar with these fundamental operations is important for becoming fluent with manipulating data in Python. More advanced tools such as PyTorch, for working with neural networks, were built by those very familiar with working in the "grammar style" of Numpy -- they built their tools with this in mind.


Below, we'll cover:

- *Attributes of arrays*: Determining the size, shape, memory consumption, and data types of arrays
- *Indexing of arrays*: Getting and setting the value of individual array elements
- *Slicing of arrays*: Getting and setting smaller subarrays within a larger array
- *Reshaping of arrays*: Changing the shape of a given array
- *Joining and splitting of arrays*: Combining multiple arrays into one, and splitting one array into many

## Numpy Array Attributes

First let's discuss some useful array attributes.
We'll start by defining three random arrays, a one-dimensional, two-dimensional, and three-dimensional array.

We'll use Numpy's random number generator, which we will *seed* with a set value in order to ensure that the same random arrays are generated each time this code is run:

In [None]:
np.random.seed(0)  # seed for reproducibility

x1 = np.random.randint(10, size=6)  # One-dimensional array
x2 = np.random.randint(10, size=(3, 4))  # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array
print(x1)
print(x2)
print(x3)

Each array has attributes ``ndim`` (the number of dimensions), ``shape`` (the size of each dimension), and ``size`` (the total size of the array):

In [None]:
print(f"x3 ndim: {x3.ndim}")
print(f"x3 shape: {x3.shape}")
print(f"x3 size: {x3.size}")

Another useful attribute is the ``dtype``, the data type of the array:

In [None]:
print(f"dtype: {x3.dtype}")

Other attributes include ``itemsize``, which lists the size (in bytes) of each array element, and ``nbytes``, which lists the total size (in bytes) of the array:

In [None]:
print(f"itemsize: {x3.itemsize} bytes")
print(f"nbytes: {x3.nbytes} bytes")

In general, we expect that ``nbytes`` is equal to ``itemsize*size``.

<hr/>
<img src="https://drive.google.com/uc?id=1sk8CSP26YY7sfyzmHGFXncuNRujkvu9v" align="left">

<font size=3 color="darkred">Exercise: Make a grid representing the Electromagnetic Field in a volume of space</font>

<font>
Make a 100x100x100x6 array of floating point numbers drawn from a normal distribution with mean and standard deviation both equal to 1. This represents the 6 components of the electromagnetic field in a volume of space. The electric and magnetic fields both have 3 components (in the 3 directions).

Now print out the data type of your array, the size in bytes of each element and the total size of the array in bytes. Verify that the size of the array is equal to the itemsize times the number of elements.

</font>

In [None]:
# Try it out


<hr/>

## Array Indexing: Accessing Single Elements

Indexing in Numpy is very similar to indexing standard Python lists.

In a one-dimensional array, the $i^{th}$ value (counting from zero) can be accessed by specifying the desired index in square brackets, just as with Python lists:

In [None]:
x1

In [None]:
x1[0]

In [None]:
x1[4]

To index from the end of the array, you can use negative indices:

In [None]:
x1[-1]

In [None]:
x1[-2]

In a multi-dimensional array, items can be accessed using a comma-separated tuple of indices:

In [None]:
x2

In [None]:
x2[0, 0]

In [None]:
x2[2, 0]

In [None]:
x2[2, -1]

Values can also be modified using any of the above index notation:

In [None]:
x2[0, 0] = 12
x2

IMPORTANT: Unlike Python lists, Numpy arrays have a fixed type.
This means that if you attempt to insert a floating-point value to an integer array, the value will be ***silently truncated***. Don't be caught unaware by this behavior!

In [None]:
x1[0] = 3.14159  # this will be truncated!
x1

<hr/>
<img src="https://drive.google.com/uc?id=1sk8CSP26YY7sfyzmHGFXncuNRujkvu9v" align="left">

<font size=3 color="darkred">Exercise: Make a two move tic-tac-toe game</font>


Use numpy to make an array that will represent a tic-tac-toe board. Since tic-tac-toe uses X's and O's, let's make it a character array. We can do this with:
```python
board = np.chararray((3,3), unicode=True)
```
Use the `input()` function to have the first player enter their move as `row,col`. Then process this string so that you can use numpy indexing to put an 'X' in the position in the array that the user indicated. Then, do the same thing for the 'O' player, but first you must check that the position they gave is not already occupied (and keep asking for a move until you get a valid move). Print out the board before and after the move. **Don't try to implement the entire tic-tac-toe game now**, just get two valid moves and update the board.


In [None]:
# Try it out
board = np.chararray((3,3),unicode=True)



<hr/>

## Array Slicing: Accessing Subarrays

Just as we can use square brackets to access individual array elements, we can also use them to access subarrays with the *slice* notation, marked by the colon (``:``) character.
The Numpy slicing syntax follows that of the standard Python list; to access a slice of an array ``x``, use this:
``` python
x[start:stop:step]
```
If any of these are unspecified, they default to the values ``start=0``, ``stop=``*``size of dimension``*, ``step=1``.
We'll take a look at accessing sub-arrays in one dimension and in multiple dimensions.
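
As a quick check of those defaults, here is a minimal sketch (the names `implicit` and `explicit` are just illustrative) showing that leaving all three values out gives the same result as writing the defaults explicitly.

```python
import numpy as np

x = np.arange(10)

# Leaving out start, stop, and step is the same as writing the defaults explicitly
implicit = x[::]
explicit = x[0:x.shape[0]:1]
print(np.array_equal(implicit, explicit))

# Any subset of the three values can be omitted
print(x[2:8:3])  # start=2, stop=8, step=3
print(x[:8:3])   # start defaults to 0
```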

### One-dimensional subarrays

In [None]:
x = np.arange(10)
x

In [None]:
x[:5]  # first five elements

In [None]:
x[5:]  # elements from index 5 onward

In [None]:
x[4:7]  # middle sub-array

In [None]:
x[::2]  # every other element

In [None]:
x[1::2]  # every other element, starting at index 1

A potentially confusing case is when the ``step`` value is negative.
In this case, the defaults for ``start`` and ``stop`` are swapped.
This becomes a convenient way to reverse an array:

In [None]:
x[::-1]  # all elements, reversed

In [None]:
x[5::-2]  # reversed every other from index 5

### Multi-dimensional subarrays

Multi-dimensional slices work in the same way, with multiple slices separated by commas.
For example:

In [None]:
x2

In [None]:
x2[:2, :3]  # two rows, three columns

In [None]:
x2[:3, ::2]  # all rows, every other column

Finally, subarray dimensions can even be reversed together:

In [None]:
print(x2)
x2[::-1, ::-1] # reverse the rows and reverse the columns

#### Accessing array rows and columns

One commonly needed routine is accessing single rows or columns of an array.
This can be done by combining indexing and slicing, using an empty slice marked by a single colon (``:``):

In [None]:
print(x2[:, 0])  # first column of x2

In [None]:
print(x2[0, :])  # first row of x2

In the case of row access, ***the empty slice can be omitted*** for a more compact syntax:

In [None]:
print(x2[0])  # equivalent to x2[0, :]

### Subarrays as no-copy views

**IMPORTANT**: Array slices return *views* rather than *copies* of the array data.
This is one area in which Numpy array slicing differs from Python list slicing: in lists, slices will be copies.

Consider our two-dimensional array from before:

In [None]:
print(x2)

Let's extract a $2 \times 2$ subarray from this:

In [None]:
x2_sub = x2[:2, :2]
print(x2_sub)

Now if we modify this subarray, we'll see that the original array is changed! Observe:

In [None]:
x2_sub[0, 0] = 99
print(x2_sub)

In [None]:
print(x2)

This default behavior is actually quite useful: it means that when we work with large datasets, we can access and process pieces of these datasets without the need to copy the underlying data buffer.
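
The contrast with list slicing mentioned at the start of this section can be checked side by side. This is a minimal sketch with made-up values; `np.shares_memory` is used here as one way to ask numpy whether two arrays overlap in memory.

```python
import numpy as np

# Slicing a Python list makes a copy, so the original list is unaffected
py_list = [1, 2, 3, 4]
list_slice = py_list[:2]
list_slice[0] = 99
print(py_list)  # still [1, 2, 3, 4]

# Slicing a numpy array makes a view, so the original array changes too
arr = np.array([1, 2, 3, 4])
arr_slice = arr[:2]
arr_slice[0] = 99
print(arr)  # the first element is now 99

# np.shares_memory reports whether two arrays overlap in memory
print(np.shares_memory(arr, arr_slice))
```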

### Creating copies of arrays

Despite the nice features of array views, it is sometimes useful to instead explicitly copy the data within an array or a subarray. This can be most easily done with the ``copy()`` method:

In [None]:
print(x2)
x2_sub_copy = x2[:2, :2].copy()
print(x2_sub_copy)

If we now modify this subarray, the original array is not touched:

In [None]:
x2_sub_copy[0, 0] = 42
print(x2_sub_copy)

In [None]:
print(x2)

<hr/>
<img src="https://drive.google.com/uc?id=1sk8CSP26YY7sfyzmHGFXncuNRujkvu9v" align="left">

<font size=3 color="darkred">Exercise: Tile a chessboard with the correct colors</font>

First look up an image of a chessboard to see how the 8x8 board is tiled with black and white. Now, **add only three lines of code** below to tile the chess board with 'W' and 'B'. You will need to use slicing to do this.

<font size=2 color="blue">hint: One of your lines should set all cells to be the same color.</font>

In [None]:
# Try it out
chessboard = np.chararray((8,8), unicode=True)
# Insert only three lines of code here to tile the chessboard correctly with 'W' and 'B'
print(chessboard)

<hr/>

## Reshaping of Arrays

Another useful type of operation is reshaping of arrays.
The most flexible way of doing this is with the ``reshape`` method.
For example, if you want to put the numbers 1 through 9 in a $3 \times 3$ grid, you can do the following:

In [None]:
grid = np.arange(1, 10).reshape((3, 3))
print(np.arange(1,10))
print(grid)

Note that for this to work, the size of the initial array must match the size of the reshaped array. 
Where possible, the ``reshape`` method will use a no-copy view of the initial array, but with non-contiguous memory buffers this is not always the case.
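
Both points can be tried out in a few lines. This is a minimal sketch; the names `mismatch_raised` and `flat_t` are just illustrative. It shows a reshape that fails because the sizes don't match, a reshape that returns a view, and one case (reshaping a transposed array) where numpy has to copy the data instead.

```python
import numpy as np

a = np.arange(1, 10)  # 9 elements

# 9 elements fit a 3x3 grid, so this works
grid = a.reshape((3, 3))

# 9 elements cannot fill a 2x5 grid, so numpy raises a ValueError
try:
    a.reshape((2, 5))
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print(err)

# For a contiguous array like this one, the reshaped result is a view of the same data
print(np.shares_memory(a, grid))

# The transpose is not laid out in row order, so flattening it needs a copy
flat_t = grid.T.reshape(9)
print(np.shares_memory(grid, flat_t))
```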

Another common reshaping pattern is the conversion of a one-dimensional array into a two-dimensional row or column matrix.
This can be done with the ``reshape`` method, or more easily done by making use of the ``newaxis`` keyword within a slice operation:

In [None]:
x = np.array([1, 2, 3])
print(x)

# row vector via reshape
x.reshape((1, 3))

In [None]:
print(x)
# row vector via newaxis
x[np.newaxis, :]

In [None]:
print(x)
# column vector via reshape
x.reshape((3, 1))

In [None]:
print(x)
# column vector via newaxis
x[:, np.newaxis]

## Array Concatenation and Splitting

All of the preceding routines worked on single arrays. It's also possible to combine multiple arrays into one, and to conversely split a single array into multiple arrays. We'll take a look at those operations here.

### Concatenation of arrays

Concatenation, or joining of two arrays in NumPy, is primarily accomplished using the routines ``np.concatenate``, ``np.vstack``, and ``np.hstack``.
``np.concatenate`` takes a tuple or list of arrays as its first argument, as we can see here:

In [None]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])

You can also concatenate more than two arrays at once:

In [None]:
z = [99, 99, 99]
print(np.concatenate([x, y, z]))

It can also be used for two-dimensional arrays:

In [None]:
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
grid

In [None]:
# concatenate along the first axis
np.concatenate([grid, grid])

In [None]:
# concatenate along the second axis (zero-indexed)
np.concatenate([grid, grid], axis=1)

For working with arrays of mixed dimensions, it can be clearer to use the ``np.vstack`` (vertical stack) and ``np.hstack`` (horizontal stack) functions:

In [None]:
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# vertically stack the arrays
np.vstack([x, grid])

In [None]:
# horizontally stack the arrays
y = np.array([[99],
              [99]])
np.hstack([grid, y])

Similarly, ``np.dstack`` will stack arrays along the third axis (depth).
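
Here is a minimal sketch of depth stacking, using two small made-up arrays `a` and `b`. Looking at the shape of the result is usually the easiest way to see which axis was used.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([[7, 8, 9],
              [10, 11, 12]])

stacked = np.dstack([a, b])
print(stacked.shape)  # two 2x3 arrays become one 2x3x2 array
print(stacked[0, 0])  # the first cell now holds one value from each input
```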

### Splitting of arrays

The opposite of concatenation is splitting, which is implemented by the functions ``np.split``, ``np.hsplit``, and ``np.vsplit``.  For each of these, we can pass a list of indices giving the split points:

In [None]:
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)

Notice that *N* split points lead to *N + 1* subarrays.
The related functions ``np.hsplit`` and ``np.vsplit`` are similar:

In [None]:
grid = np.arange(16).reshape((4, 4))
grid

In [None]:
upper, lower = np.vsplit(grid, [2])
print(upper)
print(lower)

In [None]:
left, right = np.hsplit(grid, [2])
print(left)
print(right)

Similarly, ``np.dsplit`` will split arrays along the third axis.
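
A minimal sketch of a depth split, assuming a small three-dimensional array (the names `cube`, `front`, and `back` are made up for this example):

```python
import numpy as np

cube = np.arange(16).reshape((2, 2, 4))

# Split along the third axis at index 2
front, back = np.dsplit(cube, [2])
print(front.shape)
print(back.shape)

# Stacking the pieces back along the third axis recovers the original
print(np.array_equal(np.dstack([front, back]), cube))
```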

<hr/>
<img src="https://drive.google.com/uc?id=1sk8CSP26YY7sfyzmHGFXncuNRujkvu9v" align="left">

<font size=3 color="darkred">Exercise: Split array vertically into two equal sized parts</font>

Suppose you have been given a rectangular array whose size is randomly generated and with arbitrary shape, except that its number of rows is guaranteed to be even. Your task is to split this array into two equally sized arrays and then print them out.


In [None]:
# Try it out
nrow = np.random.randint(1,15)*2
ncol = np.random.randint(1,30)
rectArray = np.arange(nrow*ncol).reshape((nrow,ncol))
# Enter your code here


<hr/>

# Computing with Numpy



To make computing with numpy very natural, it provides:
- Universal Functions - specialized vectorized versions of common operations
- Aggregation - functions that aggregate the elements of arrays, such as ``max()``, ``min()``, ``mean()``, ``std()``
- Broadcasting - simple rules for applying ufuncs on arrays of different sizes

## Universal Functions

Looping in Python is very slow relative to many other languages. This is because most implementations of Python are interpreted rather than compiled. A compiler sees the entirety of the code that will be executed, so it can map high-level code into machine code that runs quickly. Interpreted code, in contrast, is executed line by line. Numpy gets around this limitation because it is mostly implemented in low-level C. The key to efficient computation on arrays is to vectorize the operations, that is, to express them as operations on whole arrays rather than as Python loops over individual elements.

Ufuncs provide access to statically typed and compiled routines that are vectorized. They override common operators (such as ``+``, ``*``, ``/``) so that when you operate on numpy arrays, you can use the standard operations without thinking about it.

The following table lists the arithmetic operators implemented in Numpy:

| Operator	    | Equivalent ufunc    | Description                           |
|---------------|---------------------|---------------------------------------|
|``+``          |``np.add``           |Addition (e.g., ``1 + 1 = 2``)         |
|``-``          |``np.subtract``      |Subtraction (e.g., ``3 - 2 = 1``)      |
|``-``          |``np.negative``      |Unary negation (e.g., ``-2``)          |
|``*``          |``np.multiply``      |Multiplication (e.g., ``2 * 3 = 6``)   |
|``/``          |``np.divide``        |Division (e.g., ``3 / 2 = 1.5``)       |
|``//``         |``np.floor_divide``  |Floor division (e.g., ``3 // 2 = 1``)  |
|``**``         |``np.power``         |Exponentiation (e.g., ``2 ** 3 = 8``)  |
|``%``          |``np.mod``           |Modulus/remainder (e.g., ``9 % 4 = 1``)|

So, for example, when you use ``+`` to add two numpy arrays, Python will actually call ``np.add()``.

Additionally there are Boolean/bitwise operators that we'll look at later.
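
The correspondence in the table can be checked directly. This is a minimal sketch with two made-up arrays; each line compares the result of an operator with the result of calling the matching ufunc by name.

```python
import numpy as np

x = np.array([1, 5, 1, -1, 3])
y = np.array([2, -1, 3, 4, 5])

# Each operator gives the same result as its named ufunc
print(np.array_equal(x + y, np.add(x, y)))
print(np.array_equal(x * y, np.multiply(x, y)))
print(np.array_equal(x // y, np.floor_divide(x, y)))
print(np.array_equal(-x, np.negative(x)))
```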

In [None]:
#%%timeit
# Add two arrays
x = np.array([1,5,1,-1,3])
y = np.array([2,-1,3,4,5])
x+y

This happens significantly faster than running through a for loop and adding each of the corresponding elements of the arrays.
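
If you want to measure the difference yourself, here is a rough sketch using the standard `timeit` module. The array size, the number of repetitions, and the helper `add_with_loop` are arbitrary choices for illustration, and the exact timings will vary from machine to machine.

```python
import timeit
import numpy as np

x = np.arange(100_000)
y = np.arange(100_000)

def add_with_loop(a, b):
    """Add two arrays element by element with a Python for loop."""
    result = np.empty_like(a)
    for i in range(len(a)):
        result[i] = a[i] + b[i]
    return result

# Time three runs of each approach
loop_time = timeit.timeit(lambda: add_with_loop(x, y), number=3)
ufunc_time = timeit.timeit(lambda: x + y, number=3)

print(f"loop:  {loop_time:.4f} s")
print(f"ufunc: {ufunc_time:.4f} s")

# Both approaches produce the same values
same = np.array_equal(add_with_loop(x, y), x + y)
print(same)
```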

Here are some other examples:

In [None]:
np.sin(x)

In [None]:
x/y

In [None]:
np.log(x+y)*np.exp(x)

## Aggregating Numpy Arrays

Often when faced with a large amount of data, a first step is to compute summary statistics for the data in question. Perhaps the most common summary statistics are the mean and standard deviation, which allow you to summarize the "typical" values in a dataset, but other aggregates are useful as well (the sum, product, median, minimum and maximum, quantiles, etc.).


|Function Name      |   NaN-safe Version  | Description                                   |
|-------------------|---------------------|-----------------------------------------------|
| ``np.sum``        | ``np.nansum``       | Compute sum of elements                       |
| ``np.prod``       | ``np.nanprod``      | Compute product of elements                   |
| ``np.mean``       | ``np.nanmean``      | Compute mean of elements                      |
| ``np.std``        | ``np.nanstd``       | Compute standard deviation                    |
| ``np.var``        | ``np.nanvar``       | Compute variance                              |
| ``np.min``        | ``np.nanmin``       | Find minimum value                            |
| ``np.max``        | ``np.nanmax``       | Find maximum value                            |
| ``np.argmin``     | ``np.nanargmin``    | Find index of minimum value                   |
| ``np.argmax``     | ``np.nanargmax``    | Find index of maximum value                   |
| ``np.median``     | ``np.nanmedian``    | Compute median of elements                    |
| ``np.percentile`` | ``np.nanpercentile``| Compute rank-based statistics of elements     |
| ``np.any``        | N/A                 | Evaluate whether any elements are true        |
| ``np.all``        | N/A                 | Evaluate whether all elements are true        |



In [None]:
x = np.random.random(100)
print('Summary Statistics')
print('------------------')
print(f'sum:\t{x.sum()}')
print(f'mean:\t{x.mean()}')
print(f'stdev:\t{x.std()}')
print(f'min:\t{x.min()}')
print(f'max:\t{x.max()}')
for p in range(25,100,25):
  print(f'{p}%ile:\t{np.percentile(x,p)}')

You can also aggregate multi-dimensional arrays by specifying which axis you want to aggregate over:

In [None]:
M = np.random.random([5,5])
M

In [None]:
M.min(axis=0) # take the min for each column (aggregate min() over rows axis)

## Exercise: Dice Roller 2.0

Modify the dice-rolling function that you made earlier. Instead of returning the full array of rolls, return only the max.

In [None]:
def diceRollerMax(numSides,numRolls):
  # Enter your code here
  pass  # placeholder so the cell runs; replace this line with your code



#Test it out on some values:
print(diceRollerMax(6,3))
print(diceRollerMax(20,5))

## Broadcasting in Numpy

Often you will want to do a simple operation on arrays of different sizes. Broadcasting is a set of rules for ufuncs to do this.

For example, suppose you have an initial array (``x``) and want to add one to every element. Technically, the correct way to do this would be to create a new array of all ones (e.g., with ``np.ones()``) that is the same shape as ``x`` and then add this to ``x``:

In [None]:
x = np.array([3,1,5,4,2,-2,5,3])
x + np.ones(x.shape,dtype='int')

But broadcasting makes this much more compact:

In [None]:
x+1

And, similarly, broadcasting lets you do this:

In [None]:
1/x

In [None]:
1/x + (x + 3)**5

In general, broadcasting tries to guess at your intent when adding arrays of different sizes. To do so, it follows a strict set of rules for "stretching" the arrays so the shapes match:

- Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is *padded* with ones on its leading (left) side.
- Rule 2: If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.
- Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.

The rules are illustrated like this:

![](https://drive.google.com/uc?id=1lNSFvGLFcvcE-6Mmri-2w2FAff-qCi9I)
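
You can also ask numpy what shape the rules produce without building any arrays. This is a minimal sketch using `np.broadcast_shapes`, which I'm assuming is available in your installation (it was added in NumPy 1.20); the shapes themselves are made up to exercise each rule.

```python
import numpy as np

# Rules 1 and 2: (8,) is padded to (1, 8), then the size-1 dimensions are stretched
print(np.broadcast_shapes((8,), (2, 1)))

# Rule 2 only: the size-1 dimension is stretched to match
print(np.broadcast_shapes((3, 1), (3, 4)))

# Rule 3: sizes 3 and 2 disagree and neither is 1, so an error is raised
try:
    np.broadcast_shapes((3,), (2,))
    incompatible = False
except ValueError as err:
    incompatible = True
    print(err)
```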

Here's another example:

In [None]:
y = np.array([1,2]).reshape(2,1)
print(x)
print(y)
print(x+y)

# Indexing and Masking with Numpy

## Boolean Logic and Numpy Arrays



We can use comparison operators (``==``,``!=``,``>``,``<``,``>=``,``<=``, etc.) on numpy arrays as well. When we do, the result will be a boolean array of the same shape as the (broadcasted) arrays we are operating on.

For example:

In [None]:
x=np.random.randint(0,6,10)
print(x)
print(x>3)

In [None]:
y=np.random.randint(0,6,[2,1])
print(x)
print(y)
print(x>y)

Ok, but what if we want to logically operate on boolean arrays, how do we do that?

Remember from the first lecture the difference between ``and`` and ``or`` keywords and the ``&`` and ``|`` operators?  

``and`` and ``or`` treat the entire object as True or False, whereas ``&`` and ``|`` operate on the bits of the object -- in other words "inside" the object. 

For that reason, numpy overrides the ``&`` and ``|`` operators (defines ufuncs for them).  When used on numpy arrays:
-  ``&`` performs "elementwise" **AND**  
-  ``|`` performs "elementwise" **OR**  

Let's see an example:

In [None]:
b1 = np.array([True,False,False,True])
b2 = np.array([False,True,False,True])
print(b1)
print(b2)
print(b1&b2)
print(b1|b2)

And of course, we can combine this with comparison operators to do more sophisticated comparisons:



In [None]:
x = np.array([1,4,3,6,8,9])
y = np.array([3,2,1,1,10,7])

(x>y)&(x==y+2) # note the use of parentheses here -- they are VERY important

IMPORTANT: When combining logical expressions, we have to be careful to **always wrap sub-expressions in parentheses**. If we don't, we get an error:

In [None]:
x>y&x==y+2

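This is because ``&`` has higher precedence than the comparison operators, so without the parentheses ``y&x`` is evaluated first. Python then treats what is left as a chained comparison, which requires reducing a whole array to a single True or False, and that raises a ``ValueError``.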
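
Here is a minimal sketch that catches the error so you can see it without stopping the notebook, and then shows two ways to write the expression safely. `np.logical_and` is a named ufunc that does the same elementwise AND; using it is just one alternative, not a requirement.

```python
import numpy as np

x = np.array([1, 4, 3, 6, 8, 9])
y = np.array([3, 2, 1, 1, 10, 7])

# Without parentheses, Python reads this as the chained comparison x > (y & x) == (y + 2)
try:
    x > y & x == y + 2
    raised = False
except ValueError:
    raised = True
print(raised)

# Parentheses make the intended grouping explicit
with_parens = (x > y) & (x == y + 2)

# np.logical_and takes the two comparisons as arguments, so precedence never comes up
with_func = np.logical_and(x > y, x == y + 2)
print(np.array_equal(with_parens, with_func))
```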

Combining boolean expressions with aggregation can make some operations really simple. 

For example, if we wanted to count the number of occurrences of a particular value in an array, we could do this:

In [None]:
ints=np.random.randint(0,9,30)
print(ints)
numFives = (ints==5).sum() # First get an array of bools for the expression, then sum it (sum() works because True is converted to 1, False is converted to 0)
print(numFives)

# Run this a few times to see that it works

In [None]:
print(ints)
print((ints>3).sum()) # print how many integers in the ints array are bigger than three
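
The same idea works with other aggregations and along an axis. The sketch below uses fixed arrays (`values` and `grid` are illustrative names) so the counts can be checked by hand. `np.count_nonzero` is an alternative way to count the `True` entries, and `.mean()` on a boolean array gives the fraction that are `True`:

```python
import numpy as np

values = np.array([5, 1, 5, 8, 3, 5, 0, 7])
is_five = values == 5

print(is_five.sum())              # 3
print(np.count_nonzero(is_five))  # 3, same count
print((values > 3).mean())        # 0.625, the fraction of entries bigger than three

grid = np.array([[1, 5, 5],
                 [5, 2, 5]])
print((grid == 5).sum(axis=0))    # count of fives in each column: [1 1 2]
print((grid == 5).sum(axis=1))    # count of fives in each row: [2 2]
```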

<hr/>
<img src="https://drive.google.com/uc?id=1sk8CSP26YY7sfyzmHGFXncuNRujkvu9v" align="left">

<font size=3 color="darkred">Exercise: Tic-Tac-Toe 2.0</font>


Suppose you are given a tic-tac-toe `board` that is a character array (just as we had before). But unlike before, some of the squares are filled in with X's and O's. 

<br />

1. Make a boolean array `hasX` where the cells are `True` whenever an X is in the cell. 
2. Count how many X's are in each row and column by using the `.sum(axis=1)` and `.sum(axis=0)` methods. Store these as variables `row_counts` and `col_counts`. Remember: If you sum over the rows (axis=0), then your result will be an array of counts for each column, and vice versa.
3. Write a function `isWin()` that takes the arguments `board` and `player` ( either an `'X'` or an `'O'`) and returns `True` if any of the elements in `row_counts` or `col_counts` is equal to `3`. You may find the function `np.any()` helpful.


Now you can use your function to determine if either player won by getting 3 in a row or column. Of course, we still haven't checked the diagonals. **Don't worry about checking the diagonals for now.** 

Try changing the configuration of the board and testing out your function. Make sure it works as expected.




In [None]:
# Try it out
# Here's an example board with some moves (I'm just putting X's and ignoring O's for now)
#  You will want to change the position of the X's in order to test out your function
board = np.chararray((3,3),unicode=True)
board[2,0] = 'X'
board[2,1] = 'X'
board[2,2] = 'X'
print(board)


In [None]:
# 1. Define hasX, where the cells are True whenever an X is in the cell. This should be a single line that makes use of numpy



In [None]:
# 2. Define row_counts and col_counts variables, which count how many X's are in each row and column using the .sum() method w/ arguments axis=0 or axis=1

In [None]:
# 3. Write a function isWin() that takes board and player (this is just a variable that equals 'X' or 'O') and returns True if any of the elements of row_counts or col_counts equals 3
#    hint: you may find np.any() helpful

<hr/>

## Using booleans to mask an array



One cool and very useful thing we can do with bools and numpy is mask (or hide) elements of an array according to some logical criteria.

Let me show you with an example:

In [None]:
x=np.arange(0,10)
print(x)

# Mask (hide) all the elements of x where the expression (x>5) evaluates to False
x[x>5] # This will return a 1-D array of the values of x where x>5

In [None]:
x[(x>2)&(x<=7)] # mask (hide) all the elements of x where the expression (x>2)&(x<=7) evaluates to False

In [None]:
# What about a multi-dimensional array?
x = np.arange(1,10).reshape([3,3]) # make a 3x3 grid of numbers from 1 to 9
print(x)
x[x>5] # This will be a 1-D array -- masks always return 1-D arrays

Notice that when we mask, ***we always obtain a one-dimensional array*** of all the values of the original array at the positions where the condition evaluates to True.
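
A small sketch with fixed values makes this concrete. It stores the mask in a variable (`mask`, `selected`, and `clipped` are illustrative names) and also shows, as a side note, that the same mask can be used on the left-hand side of an assignment:

```python
import numpy as np

grid = np.arange(1, 10).reshape(3, 3)
mask = grid > 5                 # boolean array with the same shape as grid

selected = grid[mask]           # 1-D array of the values where mask is True
print(selected, selected.ndim)  # [6 7 8 9] 1

clipped = grid.copy()           # work on a copy so grid itself is unchanged
clipped[mask] = 0               # assign to every position where mask is True
print(clipped)
```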


## Fancy Indexing



Fancy indexing allows you to specify the indices of the values of an array that you'd like to select.

For example:

In [None]:
x = np.random.randint(0,10,9)
print(x)
indices= np.array([0,2,7,3])
x[indices] # return the values of x at the indices.

Note that ``x[indices]`` will have whatever shape ``indices`` has.

For example:

In [None]:
x[indices.reshape(2,2)]
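
Since `x` above is random, here is the same idea with fixed values so the result can be checked by eye. The names `data`, `idx`, and `picked` are made up for this sketch:

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])
idx = np.array([[0, 4],
                [2, 2]])   # a 2x2 array of positions (repeats are allowed)

picked = data[idx]
print(picked.shape)        # (2, 2), the shape of idx
print(picked)              # [[10 50] [30 30]]
```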

What about multi-dimensional arrays?  You can use fancy indexing with them too! Of course you need to provide an index for each dimension.

So, for a 2-D array, we need to provide a row index and a column index:

In [None]:
x = np.random.randint(0,10,9).reshape(3,3)
print(x)
rowInd=np.array([0,1,0])
colInd=np.array([2,0,0])
x[rowInd,colInd]

# Make sure you understand how this works -- fancy indexing is essential to working efficiently with data in numpy!
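
One way to check your understanding is to compare the fancy-indexed result with picking the elements one at a time. In this sketch (all names are illustrative), the row and column indices are paired up position by position:

```python
import numpy as np

grid = np.arange(9).reshape(3, 3)  # [[0 1 2] [3 4 5] [6 7 8]]
rows = np.array([0, 1, 0])
cols = np.array([2, 0, 0])

paired = grid[rows, cols]          # elements at (0,2), (1,0), (0,0)
by_hand = np.array([grid[r, c] for r, c in zip(rows, cols)])

print(paired)                      # [2 3 0]
print(np.array_equal(paired, by_hand))
```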

You can also assign values with fancy indexing:

In [None]:
print(x)
x[rowInd,colInd] = 0
print(x)

You can even combine slicing with Fancy Indexing:

In [None]:
print(x)
colInd=np.array([0,2])
x[1:,colInd] = 20 # for rows 1 or greater and the columns specified by colInd, set the value to 20
print(x)
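
Here is a deterministic sketch of the same combination, using a hypothetical 3x4 array so that the selected block is easy to see. The slice picks the rows and the index array picks the columns:

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)
cols = np.array([0, 3])

block = grid[1:, cols]  # rows 1 and 2, columns 0 and 3
print(block)            # [[4 7] [8 11]]

grid[1:, cols] = -1     # assign to the same positions
print(grid)
```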


<hr/>
<img src="https://drive.google.com/uc?id=1sk8CSP26YY7sfyzmHGFXncuNRujkvu9v" align="left">

<font size=3 color="darkred">Exercise: Tic-Tac-Toe 3.0</font>


Copy and paste your function from the Exercise Tic-Tac-Toe 2.0 below. Your task is to add a check for a win on either the first or second diagonal.

<br /> 
First, write down (on paper or as a code comment) the positions of the elements on the first diagonal (top left to bottom right) as tuples, like this: `(r1,c1)`, `(r2,c2)` and `(r3,c3)`. Now write these tuples as indices of rows and columns (e.g., `rowInd` and `colInd` as in the example above). Use fancy indexing to index the elements of `board` that make up the first diagonal and create a boolean array that is `True` wherever a cell of the diagonal is equal to `X`. You can check if all of the elements are `True` using `np.all()`.

Repeat this for the second diagonal and incorporate the entire thing into your `isWin()` function, which should now return `True` if `player` has 3 in a row, column or either diagonal.

As a last step, you may find it satisfying to tie all the tic-tac-toe exercises in this notebook together to make a fully playable game for two players.



In [None]:
# Try it out
# Here's an example board with some moves (I'm just putting X's and ignoring O's for now)
#  You will want to change the position of the X's in order to test out your function
board = np.chararray((3,3),unicode=True)
board[2,0] = 'X'
board[2,1] = 'X'
board[2,2] = 'X'
print(board)


In [None]:
# Write your isWin function here:

In [None]:
# Write your entire game code here

<hr/>

# Feedback
What did you think about this notebook? What questions do you have? Were any parts confusing? Write your thoughts in the text box below.

<font size =2> note: You can double click this text box in colab to edit it.</font>

PUT YOUR THOUGHTS HERE

# Submit
Don't forget to submit your notebook before class! Make sure you have saved your work (**Colab Menu: File-> Save**) and then download a pure python copy (**Colab Menu: File-> Download -> Download .py**) and a python notebook copy (**Colab Menu: File-> Download -> Download .ipynb**). You will upload both of these to the assignment on the Canvas page.
