# Introduction to NumPy

Sources for this tutorial:
+ [Datacamp Numpy Tutorial](https://www.datacamp.com/community/tutorials/python-numpy-tutorial)
+ [Official NumPy Tutorial](https://docs.scipy.org/doc/numpy/user/quickstart.html)

_Note: Text that has been borrowed from these resources is noted throughout the text._

**Instructions**: For this assignment, please complete each of the tasks by entering and executing your own code. When you have completed all of the tasks, please save the notebook and upload it to your GitHub repository.


## Notebook Setup
First, we need to import the numpy library.  All packages are imported at the top of the notebook. Execute the following line of code to get started:

In [1]:
# Import numpy
import numpy as np

## The NumPy Array

What is an array? An array is a data structure that stores elements of the same type (e.g. integers, strings, etc.) in a structure with one or more dimensions (e.g. a list, a 2D matrix, a 3D matrix, etc.). In Python, the list data type provides this kind of functionality; however, it lacks important operations that make it useful for scientific computing. NumPy is a Python package that defines N-dimensional arrays and provides support for linear algebra and other functions useful for scientific computing.
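
To make the difference concrete, here is a small sketch comparing a Python list with a NumPy array under the same operators. The names `py_list` and `np_arr` are illustrative only:

```python
import numpy as np

py_list = [1, 2, 3]
np_arr = np.array([1, 2, 3])

# With lists, `*` repeats the list and `+` concatenates
print(py_list * 2)        # [1, 2, 3, 1, 2, 3]
print(py_list + py_list)  # [1, 2, 3, 1, 2, 3]

# With arrays, the same operators act element-wise
print(np_arr * 2)         # [2 4 6]
print(np_arr + np_arr)    # [2 4 6]
```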

From the Numpy QuickStart Tutorial: 
> NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of positive integers. In NumPy dimensions are called axes. 

_Note: a "tuple" is an immutable, ordered sequence of values: e.g. (2,4) is a tuple containing two numbers._

From the DataCamp Tutorial, NumPy arrays can be visualized in the following way:

<img src="http://community.datacamp.com.s3.amazonaws.com/community/production/ckeditor_assets/pictures/332/content_arrays-axes.png">

(image source: https://www.datacamp.com/community/tutorials/python-numpy-tutorial)

Using Python lists, arrays are created in the following way:

```python
# a 1-dimensional list of numbers
my_array = [1,2,3,4]

# a 2-dimensional list of numbers
my_2d_array = [[1,2,3,4], [5,6,7,8]]

# a 3-dimensional list of numbers
my_3d_array = [[[1,2,3,4], [5,6,7,8]], [[1,2,3,4], [9,10,11,12]]]

# Two lists of boolean values
a = [True, True, False, False]
b = [False, False, True, True]
```

Using Numpy, arrays are created using the `np.array()` function. For example, arrays with the same contents as above are created in the following way:

```python
import numpy as np

# a 1-dimensional array of numbers
my_array = np.array([1,2,3,4])

# a 2-dimensional array of numbers
my_2d_array = np.array([[1,2,3,4], [5,6,7,8]])

# a 3-dimensional array of numbers
my_3d_array = np.array([[[1,2,3,4], [5,6,7,8]], [[1,2,3,4], [9,10,11,12]]])

# Two arrays of boolean values
a = np.array([True,True,False,False])
b = np.array([False,False,True,True])
```

## Creating Arrays
Let's create some arrays that we will use for learning NumPy.

### TASK 1
Re-create the NumPy arrays above by entering the same lines of code into the code cell below:

In [None]:
# a 1-dimensional list of numbers
my_array = np.array([1,2,3,4])

# a 2-dimensional list of numbers
my_2d_array = np.array([[1,2,3,4], [5,6,7,8]])

# a 3-dimensional list of numbers
my_3d_array = np.array([[[1,2,3,4], [5,6,7,8]], [[1,2,3,4], [9,10,11,12]]])

# Two lists of boolean values
a = np.array([True,True,False,False])
b = np.array([False,False,True,True])

In the cell below, print the contents of any one of the arrays using the `print()` function.

In [None]:
print(my_3d_array)

## Accessing Array Attributes
In this section we retrieve information about arrays. Once an array is created, you can access information about it such as its number of dimensions, its shape, its size, the data type it stores, and the number of bytes it consumes. There are a variety of attributes you can use, such as:
+ `ndim`
+ `shape`
+ `size`
+ `dtype`
+ `itemsize`
+ `data`
+ `nbytes`

For example, to get the number of dimensions for an array:
```Python
# Print the number of dimensions for the array:
print(my_3d_array.ndim)
```
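
A minimal sketch of every attribute listed above, applied to the 3D array from Task 1. The printed `dtype` and `itemsize` are platform dependent (commonly `int64` and 8 bytes), so no fixed output is claimed for them:

```python
import numpy as np

my_3d_array = np.array([[[1,2,3,4], [5,6,7,8]], [[1,2,3,4], [9,10,11,12]]])

# Number of axes (dimensions)
print(my_3d_array.ndim)      # 3
# Length of each axis, as a tuple
print(my_3d_array.shape)     # (2, 2, 4)
# Total number of elements (the product of the shape)
print(my_3d_array.size)      # 16
# Data type of the elements (platform dependent)
print(my_3d_array.dtype)
# Number of bytes used by one element
print(my_3d_array.itemsize)
# Total bytes used by the elements (size * itemsize)
print(my_3d_array.nbytes)
# Buffer object pointing to the start of the array's data
print(my_3d_array.data)
```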

### Task 2

In the code cell below, practice using each of the attributes above. Add a comment line, as shown in the preceding code, to describe what each attribute is for. Use the [NumPy ndarray reference page](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html) if you need help understanding the attributes.

_Note: We use dot notation to access these attributes, but we do not add parentheses `()` as we would for a function call. This is because we are accessing attributes (i.e. member variables) of the NumPy array object, not calling a function._

In [None]:
print(my_3d_array.nbytes)

## Creating Initialized Arrays

Here we will learn to create initialized arrays. Some refer to these as "empty" arrays, but most of them are not empty: they are filled with default or specified values. The exception is `np.empty()`, which only allocates memory and leaves the contents uninitialized, so its values are arbitrary. NumPy provides a variety of easy-to-use functions for creating and initializing arrays. These include:

+ `np.ones()`
+ `np.zeros()`
+ `np.random.random()`
+ `np.empty()`
+ `np.full()`
+ `np.arange()`
+ `np.linspace()`

For example, to create a 2-dimensional _3 x 4_ array initialized with all zeros:
```Python
# Create a 3x4 array initialized with zeros
zeros = np.zeros((3,4))
```
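
The sketch below tries each creation function once; the variable names are illustrative, and the random and `np.empty()` values will differ on every run (so `uninit` is deliberately not printed):

```python
import numpy as np

# 3 x 4 array of zeros (float64 by default)
zeros = np.zeros((3, 4))
# 2 x 3 array of ones
ones = np.ones((2, 3))
# 2 x 2 array of uniform random floats in [0.0, 1.0)
rand = np.random.random((2, 2))
# 2 x 2 array whose values are NOT initialized (arbitrary memory contents)
uninit = np.empty((2, 2))
# 2 x 2 array filled with the value 7
sevens = np.full((2, 2), 7)
# Evenly spaced integers from 0 up to (not including) 10, with step 2
evens = np.arange(0, 10, 2)
# 5 evenly spaced floats from 0 to 1, endpoint included
steps = np.linspace(0, 1, 5)

print(zeros, ones, rand, sevens, evens, steps, sep="\n\n")
```

Note the difference between the last two: `np.arange()` takes a step size, while `np.linspace()` takes the number of points.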

### TASK 3

Practice creating initialized arrays by using each of the functions above in the code cell below. Just as in the preceding code example, add a comment above each function call describing what is being done. Use the [Numpy Function Reference](https://docs.scipy.org/doc/numpy/reference/routines.html) to learn more about each function. Be sure to follow each array creation with a call to `print()` to display your newly created arrays.

In [None]:
zeros = np.zeros((3,4))
print(zeros)

## Performing Math and Broadcasting

At times you may want to apply mathematical operations between arrays. For example, suppose you wanted to add, multiply or divide the contents of two arrays. If the two arrays have the same shape, this is straightforward. However, if the arrays have different shapes, it is more challenging. This is where broadcasting comes into play:

> The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. (https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)


### Arrays of the same size
To demonstrate math with arrays of the same size, the following cell contains code that creates two arrays of the exact same size: _3 x 4_.  Execute the cell to create those arrays:

In [None]:
# Define demo arrays:
demo_a = np.ones((3,4))
demo_b = np.random.random((3,4))
print(f"demo_a shape: {demo_a.shape}\ndemo_b shape: {demo_b.shape}")

Let's print the arrays to see what they contain:

In [None]:
print(demo_a)
print(demo_b)

Because these arrays have the same shape, we can perform basic math using common arithmetic symbols. Execute the following cell to see the result of adding the two demo arrays:

In [None]:
# These arrays have the same shape, so they are added element by element
demo_a + demo_b

The addition is element-wise: each value in the result is the sum of the values at the same position in `demo_a` and `demo_b`.

### Broadcasting
You can also perform math on two arrays of different shapes, as long as NumPy can broadcast them:

>When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way forward. Two dimensions are compatible when they are equal, or one of them is 1 (https://docs.scipy.org/doc/numpy-1.13.0/user/basics.broadcasting.html)

Consider two arrays of the following dimensions:

+ array #1:  10 x 1 x 3 x 1
+ array #2:       2 x 1 x 9

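These two arrays are compatible for broadcasting because, comparing dimensions from right to left:
+ In the last dimension, array #1 has a 1 (array #2 has 9).
+ In the second-to-last dimension, array #2 has a 1 (array #1 has 3).
+ In the third-to-last dimension, array #1 has a 1 (array #2 has 2).
+ Array #2 has no fourth-to-last dimension, so it is treated as having size 1 there (array #1 has 10).

In each position the result takes the larger size, so the result after broadcasting is an array of size _10 x 2 x 3 x 9_.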
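
You can check this shape rule directly. A minimal sketch using arrays of ones with exactly the two shapes above (`arr1` and `arr2` are illustrative names); `np.broadcast(...).shape` reports the broadcast shape without doing the arithmetic:

```python
import numpy as np

arr1 = np.ones((10, 1, 3, 1))
arr2 = np.ones((2, 1, 9))

# Shape NumPy computes for the broadcast, without computing values
print(np.broadcast(arr1, arr2).shape)  # (10, 2, 3, 9)

# Performing the addition produces an array of that shape
result = arr1 + arr2
print(result.shape)                    # (10, 2, 3, 9)
```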

To demonstrate math with arrays of different shapes, the following cell contains code that creates two arrays: one of shape _3 x 4_ and a 1-dimensional array of 4 elements (shape `(4,)`). Execute the cell to create those arrays:

In [None]:
demo_c = np.ones((3,4))
demo_d = np.arange(4)
print(f"demo_c shape: {demo_c.shape}\ndemo_d shape: {demo_d.shape}")

Let's print the arrays to see what they contain:

In [None]:
print(demo_c)
print(demo_d)

Because these arrays meet the broadcasting requirements, we can perform basic math using common arithmetic symbols. Execute the following cell to see the result of adding the two demo arrays:

In [None]:
demo_c + demo_d

The addition added each value of the smaller array, `demo_d`, to the value in the corresponding column of every row of the larger array, `demo_c`.
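
One way to picture this is to build the "stretched" copy of `demo_d` explicitly. The sketch below uses `np.tile` to repeat `demo_d` into 3 rows (`stretched_d` is an illustrative name) and checks that the result matches broadcasting; broadcasting gives the same values without actually allocating the copy:

```python
import numpy as np

demo_c = np.ones((3, 4))
demo_d = np.arange(4)          # [0 1 2 3], shape (4,)

# Explicitly repeat demo_d as 3 rows so both operands are 3 x 4
stretched_d = np.tile(demo_d, (3, 1))
print(stretched_d.shape)        # (3, 4)

# Broadcasting produces the same result as adding the explicit copy
print(np.array_equal(demo_c + demo_d, demo_c + stretched_d))  # True
```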

### Broadcasting With Higher Dimensions

Consider the following arrays of 2 and 3 dimensions. 

In [None]:
demo_e = np.ones((3, 4))
demo_f = np.random.random((5,1,4))
print(f"demo_e shape: {demo_e.shape}\ndemo_f shape: {demo_f.shape}")

Print the arrays to see what they contain:

In [None]:
print(demo_e)
print(demo_f)

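These two arrays meet the rules for broadcasting because, comparing from the right, both have a 4 in their last dimension, `demo_f` has a 1 where `demo_e` has a 3, and `demo_e` has no third dimension, so it is treated as size 1 against the 5 in `demo_f`. Perform the math by executing the following cell: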

In [None]:
demo_e + demo_f

The resulting array has dimensions of _5 x 3 x 4_. For this math to work, the values from `demo_f` had to be "stretched" (i.e. copied and then added) along its second dimension, from 1 to 3, while `demo_e` was reused for each of the 5 entries along the first dimension.
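
A quick sketch to confirm both the shape and which values get combined; the indices `[2, 1]` are an arbitrary choice for illustration:

```python
import numpy as np

demo_e = np.ones((3, 4))
demo_f = np.random.random((5, 1, 4))

result = demo_e + demo_f
print(result.shape)   # (5, 3, 4)

# Row 1 of block 2 is demo_e's row 1 plus demo_f's single row in block 2
print(np.allclose(result[2, 1], demo_e[1] + demo_f[2, 0]))  # True
```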

### Task 4
Try practicing math operations by creating your own arrays of differing sizes. In the cell below, experiment with adding, multiplying or dividing two arrays of different (but compatible) sizes. Do this with two different sets of arrays.

### Task 5
Find an example of non-compatible array shapes for an operation, and explain why it fails. You can demonstrate using code or written text. If you use written text, be sure to switch the cell below to use Markdown.

## NumPy Aggregate Functions
NumPy also provides a variety of functions that "aggregate" data. 
Examples of aggregating data include calculating the sum of every element in the array, the mean, the standard deviation, etc. Below are a few examples of aggregation functions provided by NumPy:

+ `np.sum()`
+ `np.min()`
+ `np.max()`
+ `np.cumsum()`
+ `np.mean()`
+ `np.median()`
+ `np.corrcoef()`
+ `np.std()`

For example:
```Python
# Calculate the sum of our demo data from above
np.sum(demo_e)
```
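
Most of these functions also accept an `axis` argument that aggregates along one axis instead of over the whole array. A small sketch on a fixed 2 x 3 array (`data` is an illustrative name):

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6]])

print(np.sum(data))                    # 21: every element
print(np.sum(data, axis=0))            # [5 7 9]: down each column
print(np.sum(data, axis=1))            # [ 6 15]: across each row
print(np.min(data), np.max(data))      # 1 6
print(np.cumsum(data))                 # [ 1  3  6 10 15 21]: running total of the flattened array
print(np.mean(data), np.median(data))  # 3.5 3.5
print(np.std(data))                    # population standard deviation (ddof=0 by default)
# Correlation between the two rows; they rise together, so every entry is 1 (up to rounding)
print(np.corrcoef(data))
```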


### Task 6
Create three to five arrays (or more as needed) and experiment with each of the aggregation functions above. For each function, add a comment line above it that describes what it does.  Use the [Numpy Function Reference](https://docs.scipy.org/doc/numpy/reference/routines.html) to learn more about each function.

In [None]:
np.sum(demo_e)

### Logical Aggregate Functions
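When arrays contain boolean values there are additional logical functions you can use. Strictly speaking, these operate element-wise and return an array of the same shape; `np.any()` and `np.all()` are the logical functions that reduce an array to a single value: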

 + `logical_and()`
 + `logical_or()`    
 + `logical_not()`    
 
For example:
```Python
# Two lists of boolean values
a = [True, True, False, False]
b = [False, False, True, True]
# Perform a logical "or":
np.logical_or(a, b)
```
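
A sketch of all three functions on NumPy boolean arrays. Note that `b` here is changed from the example above (to `[False, True, True, False]`) so that each result is easier to tell apart:

```python
import numpy as np

a = np.array([True, True, False, False])
b = np.array([False, True, True, False])

print(np.logical_and(a, b))  # [False  True False False]
print(np.logical_or(a, b))   # [ True  True  True False]
print(np.logical_not(a))     # [False False  True  True]

# np.any / np.all reduce a boolean array to a single value
print(np.any(a), np.all(a))  # True False
```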

### Task 7

Using the code cell below, practice using each of the three logical aggregate functions listed above.

In [None]:
# Task 7

## Basic Indexing: Subsets and Slicing

We often want to consider a subset of a given array. You will recognize basic subsetting as it is similar to indexing of Python lists.  

The following code examples demonstrate how to subset a NumPy array:

```python
# Get items from "start" to "end" (but the end is not included!)
a[start:end] 

# Get all items from "start" through the rest of the array
a[start:]    

# Get items from the beginning to "end" (but the end is not included!)
a[:end]      
```
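
Here is the same slicing syntax on a concrete array, plus the optional step value and a 2D slice with one slice per axis (`arr` and `grid` are illustrative names):

```python
import numpy as np

arr = np.arange(10)      # [0 1 2 3 4 5 6 7 8 9]

print(arr[2:5])          # [2 3 4]  (index 5 is excluded)
print(arr[7:])           # [7 8 9]
print(arr[:3])           # [0 1 2]
print(arr[::3])          # [0 3 6 9] (the optional third value is the step)

# For a 2D array, give one slice per axis, separated by commas
grid = np.arange(12).reshape(3, 4)
print(grid[1:, :2])      # rows 1-2, columns 0-1
```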

As with Python lists, retrieving elements from the end of a NumPy array uses negative indexing. Execute the example code below to see a demonstration:

In [None]:
# Create a 5 x 2 array of random numbers
demo_g = np.random.random((5,2))
print(demo_g)

# Get the last item from the last 'row':
demo_g[-1, -1]

### TASK 8
Perform the following in the code cell below:

1. Create (or re-use) 3 arrays, each containing three dimensions.
2. Slice each of these arrays so that:
    + One element / number is returned.
    + One dimension is returned.
    + A subset of a dimension is returned.
3. What is the difference between `[x:]` and `[x, ...]`? (hint, try on high-dimension arrays).
    
*Exactly what you choose to return is not important at this point. The goal of this task is to train you so that, given an n-dimensional NumPy array, you will be able to write an index or slice that returns a subset of desired positions.*

## "Fancy" Indexing

Fancy indexing allows you to provide an array of indices or an array of boolean values in order to subset an array.


### Using a Boolean Array for Indexing
Rather than using an index range, as shown in the previous section, we can provide an array of boolean values where `True` indicates that we want the value in the position where `True` is found, and `False` indicates we do not want it.  Creating these boolean arrays is simple if we use conditional statements. 

For example, review and then execute the following code:

In [None]:
# Create a 5 x 2 array of random numbers
demo_g = np.random.random((5,2))

# Find all values in the matrix less than 0.5
demo_g < 0.5

Notice the return value is an array of boolean values. `True` indicates that the value was less than 0.5; `False` indicates it was greater than or equal to 0.5. We can use this boolean array as an index into the same array to return only those values that satisfy the condition. Try executing the following code:

In [None]:
demo_g[demo_g < 0.5]

Or alternatively:

In [None]:
sig_list = demo_g < 0.5
demo_g[sig_list]
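
Conditions can be combined with `&` (and) and `|` (or). A sketch on a fixed array (`demo_h` is a hypothetical name) so the output is predictable; note that a boolean mask always returns a 1D array of the selected values, in row-major order:

```python
import numpy as np

demo_h = np.array([[0.1, 0.7],
                   [0.4, 0.9],
                   [0.5, 0.2]])

# Parentheses are required: & and | bind more tightly than < and >
between = (demo_h > 0.15) & (demo_h < 0.8)
print(demo_h[between])    # [0.7 0.4 0.5 0.2]

outside = (demo_h < 0.15) | (demo_h > 0.8)
print(demo_h[outside])    # [0.1 0.9]
```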

### TASK 9
In the code cell below, experiment with the following boolean conditionals to generate boolean arrays for indexing:
  + Greater than
  + Less than
  + Equals
  + Combine two or more of the above with:
      + or `|`
      + and `&`

You can create arrays or use existing ones:

### Using exact indices

Alternatively, if there are specific elements from the array that we want to retrieve we can provide the specific numeric indices.  

For example, review and then execute the following code:

In [None]:
# Generate an array of 500 random numbers
demo_f = np.random.random((500))

# Retrieve the 5 values at positions 0, 100, 200, 300 and 400
demo_f[[0,100,200,300,400]]
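
With predictable values it is easier to see what index arrays do. In this sketch (`demo_i` and `grid` are illustrative names), indices may repeat or be negative, and on a 2D array one index list selects rows while two lists select individual (row, column) pairs:

```python
import numpy as np

demo_i = np.arange(10, 20)        # [10 11 12 13 14 15 16 17 18 19]
print(demo_i[[0, 3, 3, -1]])      # [10 13 13 19]

grid = np.arange(12).reshape(3, 4)
print(grid[[0, 2]])               # rows 0 and 2
print(grid[[0, 2], [1, 3]])       # elements (0, 1) and (2, 3): [ 1 11]
```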

# Intermission -- Getting Help

Python has a built-in function, `help()`, that we can call on any object to find out more about it. As we move deeper into the functions provided by most packages, we often need to know exactly what a given function expects as arguments.

The output of these `help()` calls can be long. Try executing the following help call for the `np.array` function:

In [None]:
# Call help on a thing from a package.
help(np.array)

Additionally, we can get help about an object that we created! Execute the following code to try it out:

In [None]:
# Call help on an object we created.
x = np.array([1, 2, 3, 4])
help(x)

### TASK 10

In the code cell below, call `help()` on one of the following functions:
 + `np.transpose()`
 + `np.reshape()`
 + `np.resize()`
 + `np.ravel()`
 + `np.append()`
 + `np.delete()`
 + `np.concatenate()`
 + `np.vstack()`
 + `np.hstack()`
 + `np.column_stack()`
 + `np.vsplit()`
 + `np.hsplit()` 

## Manipulating Arrays

Thus far, we have learned to create arrays, perform basic math, aggregate values, and index arrays. Finally, we need to learn to manipulate them by transposing, reshaping, splitting, joining, appending, and deleting arrays.


### Transposing
Transposing an array swaps its rows and columns: it reflects a matrix across its main diagonal, so the element in row _i_, column _j_ moves to row _j_, column _i_, as shown in the following animated image:

<img src="https://upload.wikimedia.org/wikipedia/commons/e/e4/Matrix_transpose.gif">

(image source: https://en.wikipedia.org/wiki/Transpose)

NumPy allows you to transpose a matrix in one of two ways:

+ Using the `transpose()` function
+ Accessing the `T` attribute.

Execute the following code to see an example of an array transpose:

In [None]:
# Create a 2 x 3 random matrix
demo_f = np.random.random((2,3))

print("The original matrix")
print(demo_f)

print("\nThe matrix after being transposed")
print(np.transpose(demo_f))

print("\nThe transposed matrix from the T attribute")
print(demo_f.T)
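
With fixed values the row/column swap is easy to verify. A minimal sketch (`m` and `mt` are illustrative names):

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])
mt = m.T

print(m.shape, mt.shape)    # (2, 3) (3, 2)
print(mt)
# [[1 4]
#  [2 5]
#  [3 6]]

# Element (i, j) of the original becomes element (j, i) of the transpose
print(m[0, 2] == mt[2, 0])  # True
```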


### Reshaping and Resizing
You can change the dimensions of your array by use of the following two functions:
 + `resize()`
 + `reshape()`
 
The `resize()` function lets you change an array to a new shape with a different number of elements. When the new shape is larger, `np.resize()` fills it by repeating the original values. This can be useful if you need to adjust an array prior to performing arithmetic and broadcasting.

The `reshape()` function allows you to change the dimensions of an existing array without changing its data, so the total number of elements must stay the same. For example, if you have a _3 x 2_ array you can change it to a _6 x 1_ array using the `reshape()` function without losing any of the data values in the array.

Examine and execute the following code adapted from the DataCamp Tutorial:

In [None]:
# Create a 1-dimensional array x with 4 elements. Print the shape of `x`
x = np.array([1,1,1,1])
print(x.shape)

# Resize `x` to ((6,4))
np.resize(x, (6,4))

Notice how the array was resized from a 1-dimensional array of 4 elements (shape `(4,)`) to a _6 x 4_ array, with the original values repeated to fill the new shape.

In [None]:
# Reshape `x` to (2,2)
x = np.array([1,2,3,4])
print("\noriginal:")
print(x)
print("\nreshaped:")
print(x.reshape((2,2)))
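
The contrast between the two functions shows up when the sizes do not match. In this sketch (`x` and `y` are illustrative), `np.resize` repeats the data cyclically, while `reshape` refuses a shape with a different element count. (The in-place method `ndarray.resize` behaves differently: it pads with zeros.)

```python
import numpy as np

x = np.array([1, 2, 3])

# np.resize fills the new shape by repeating the data cyclically
print(np.resize(x, (2, 4)))
# [[1 2 3 1]
#  [2 3 1 2]]

# reshape keeps every value, so the total size must stay the same (6 = 2 * 3)
y = np.arange(6)
print(y.reshape((2, 3)))
# One dimension may be given as -1 and NumPy infers it
print(y.reshape((3, -1)).shape)   # (3, 2)

try:
    y.reshape((4, 2))              # 8 elements requested, but y has 6
except ValueError as err:
    print("reshape failed:", err)
```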


### Appending Arrays
Sometimes you may want to append one array to another. You can do this with the `append()` function, along any axis. Remember that NumPy arrays have **axes**. When you append one array to another you should specify the axis (e.g. rows or columns for a 2D array) along which to append; if you omit it, both arrays are flattened and a 1D array is returned. Axes are identified using a numeric index starting from 0, therefore:

+ `0`: the first dimension (for a 2D array, appending along axis 0 adds rows)
+ `1`: the second dimension (for a 2D array, appending along axis 1 adds columns)
+ `2`: the third dimension
+ `3`: the fourth dimension
+ etc...

For example, examine and execute this code borrowed from the DataCamp tutorial:

In [None]:
# Append a 1D array to your `my_array`
my_array = np.array([1,2,3,4])
new_array = np.append(my_array, [7, 8, 9, 10])

# Print `new_array`
print(new_array)

# Append an extra column to your `my_2d_array`
my_2d_array = np.array([[1,2,3,4], [5,6,7,8]])
new_2d_array = np.append(my_2d_array, [[7], [8]], axis=1)

# Print `new_2d_array`
print(new_2d_array)

In the code above, in the first example, the array `[7, 8, 9, 10]` is appended to the existing 1D `my_array`. In the second example, the values `7` and `8` are added as a new column, one value at the end of each row (note the `axis=1` parameter).
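
For comparison, here is a sketch of appending along `axis=0` (a new row) and of what happens when no axis is given (`with_row` and `flat` are illustrative names):

```python
import numpy as np

my_2d_array = np.array([[1,2,3,4], [5,6,7,8]])

# axis=0 adds a new row; the appended block must be 2D with 4 columns
with_row = np.append(my_2d_array, [[9, 10, 11, 12]], axis=0)
print(with_row.shape)    # (3, 4)

# Without an axis, both inputs are flattened first and a 1D array is returned
flat = np.append(my_2d_array, [9])
print(flat)              # [1 2 3 4 5 6 7 8 9]
```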

### Task 11
Practice appending matrices to one another. In the code cell below perform the following:
 + Create a three dimensional array
 + append another row to the array
 + append another column to the array
 + print the final results

### Inserting and Deleting Elements
You can add new elements to an array with the `insert()` function and remove elements with the `delete()` function. Examine the `help()` documentation for how to use these functions.
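
A short sketch of both functions (`arr` and `grid` are illustrative names). Both return a new array and leave the original unchanged; like `append()`, they flatten the input when no `axis` is given:

```python
import numpy as np

arr = np.array([10, 20, 30, 40])

# Insert 99 before index 1
print(np.insert(arr, 1, 99))       # [10 99 20 30 40]
# Delete the elements at indices 0 and 2
print(np.delete(arr, [0, 2]))      # [20 40]

grid = np.arange(6).reshape(2, 3)  # rows [0 1 2] and [3 4 5]
# Delete column 1 (axis=1); columns 0 and 2 remain
print(np.delete(grid, 1, axis=1))
```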


### Joining Arrays
There are a variety of functions for joining arrays:

 + `concatenate()`
 + `vstack()`
 + `hstack()`
 + `column_stack()`

Each of these functions is used in the following code borrowed from the DataCamp tutorial. Examine and execute the following code cell:

In [None]:
# Concatenate `my_array` and `x`: similar to np.append()
my_array = np.array([1,2,3,4])
x = np.array([1,1,1,1])
print("concatenate:")
print(np.concatenate((my_array, x)))

# Stack arrays row-wise
my_2d_array = np.array([[1,2,3,4], [5,6,7,8]])
print("\nvstack:")
print(np.vstack((my_array, my_2d_array)))

# Stack arrays horizontally
print("\nhstack:")
print(np.hstack((my_2d_array, my_2d_array)))

# Stack arrays column-wise
print("\ncolumn_stack:")
print(np.column_stack((my_2d_array, my_2d_array)))
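
For 2D inputs `hstack()` and `column_stack()` give the same result, as in the cell above; they differ for 1D inputs. A sketch with illustrative names `u`, `v` and `m`:

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

print(np.hstack((u, v)))        # [1 2 3 4 5 6]: 1D arrays are joined end to end
print(np.column_stack((u, v)))  # each 1D array becomes a column: shape (3, 2)
print(np.vstack((u, v)))        # each 1D array becomes a row: shape (2, 3)

# concatenate joins along an existing axis (axis=0 by default)
m = np.array([[1, 2], [3, 4]])
print(np.concatenate((m, m), axis=1).shape)  # (2, 4)
```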


### Task 12
Examine the output from each of the function calls in the cell above. Also, review the help pages for each tool either using the `help()` command or the [Numpy Function Reference](https://docs.scipy.org/doc/numpy/reference/routines.html). Can you identify what is happening with each of them?

### Splitting an Array
You may find that you need to split arrays. The following functions allow you to split horizontally or vertically:
 + `vsplit()`
 + `hsplit()`
 
Examine and execute the following code borrowed from the DataCamp Tutorial:

In [None]:
# Create a 2D array.
my_2d_array = np.array([[1,2,3,4], [5,6,7,8]])
print("original:")
print(my_2d_array)

# Split `my_2d_array` horizontally (by columns) into 2 equal parts
print("\nhsplit:")
print(np.hsplit(my_2d_array, 2))

# Split `my_2d_array` vertically (by rows) into 2 equal parts
print("\nvsplit:")
print(np.vsplit(my_2d_array, 2))
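
The second argument can be either a number of equal pieces or a list of split positions. A sketch of both forms (`left`, `right` and `parts` are illustrative names); an integer that does not divide the axis length evenly, such as `np.hsplit(my_2d_array, 3)`, raises an error:

```python
import numpy as np

my_2d_array = np.array([[1,2,3,4], [5,6,7,8]])

# An integer N splits into N equal pieces (it must divide the axis length)
left, right = np.hsplit(my_2d_array, 2)
print(left.shape, right.shape)      # (2, 2) (2, 2)

# A list gives the column indices where each new piece starts
parts = np.hsplit(my_2d_array, [1, 3])
print([p.shape for p in parts])     # [(2, 1), (2, 2), (2, 1)]
```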

### Task 13
Examine the output from each of the functions used in the cell above. Review the help pages for each tool either using the `help()` command or the [Numpy Function Reference](https://docs.scipy.org/doc/numpy/reference/routines.html). Can you identify what is happening with each of them?