# Session 1: Introduction to NumPy

## Overview

**NumPy (Numerical Python)** is a foundational package for scientific computing in Python. Its core is implemented in the C programming language for efficiency, and it provides a powerful *n*-dimensional array object (ndarray).

In data analytics, NumPy offers:

- Fast (vectorized) array operations for data processing
- Efficient descriptive statistics
- Manipulations for merging multiple data sets

To use NumPy, the library must be installed in the Python environment (Anaconda already has it installed). It must then be imported for use.

In [1]:
# Import statement
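
A minimal sketch of the import, assuming NumPy is already installed. The alias `np` is the common convention and matches the `np.` prefix used for function names in these notes:

```python
import numpy as np  # "np" is the conventional alias for the library

# Printing the version is a quick way to confirm the import worked
print(np.__version__)
```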


## Creating ndarrays

The ndarray (alias: array) is an *n*-dimensional array object similar to a list, but designed to facilitate fast computation. To be useful, an array should hold a single type of object. For most analytics using NumPy, the primary focus is on int, float, and boolean arrays.

Arrays will most likely be loaded from external data sources (later). For now, an array can be created manually.

In [2]:
# The array function receives an argument of an array or other collection of elements
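
One possible way to create arrays manually is sketched below. The variable names and values are only illustrative:

```python
import numpy as np

# Each array is built from a list; NumPy infers one dtype per array
int_array = np.array([1, 2, 3, 4])
float_array = np.array([1.5, 2.0, 3.25])
bool_array = np.array([True, False, True])

# A list of equal-length lists produces a two-dimensional array
nested_array = np.array([[1, 2, 3], [4, 5, 6]])

print(int_array)
print(float_array)
print(bool_array)
print(nested_array)
```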


### Other Ways to Create Arrays

- `np.arange(start, stop, step)`: similar to the built-in `range`, but returns an array
- `np.zeros(shape)`: where shape is a sequence of dimension sizes, to create an array of 0s
- `np.ones(shape)`: where shape is a sequence of dimension sizes, to create an array of 1s
- `np.full(shape, value)`: where shape is a sequence of dimension sizes, to create an array of specific values
- `np.random.rand(d0, d1…,dn)`: where d0..dn are dimension sizes, to create an array of random values between 0 and 1, exclusive of 1
- `np.random.randn(d0, d1…,dn)`: where d0..dn are dimension sizes, to create an array of random values drawn from the standard normal distribution (mean = 0, variance = 1)
- `np.random.randint(low, high, size)`: where low is the low bound (inclusive), high is the high bound (exclusive), and size is the number of values to generate
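
The creation functions listed above can be sketched as follows. The shapes and values are arbitrary choices for illustration, and the random arrays will differ on every run:

```python
import numpy as np

evens = np.arange(0, 10, 2)         # 0, 2, 4, 6, 8 (stop value is excluded)
zeros = np.zeros((2, 3))            # 2 rows, 3 columns of 0.0
ones = np.ones(4)                   # one-dimensional array of four 1.0 values
sevens = np.full((2, 2), 7)         # 2 x 2 array filled with 7

uniform = np.random.rand(2, 3)      # random floats in [0, 1)
normal = np.random.randn(5)         # five draws from the standard normal
dice = np.random.randint(1, 7, 10)  # ten integers from 1 through 6

print(evens)
print(zeros)
print(sevens)
print(dice)
```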

### Common Array Attributes

- `ndim`: number of dimensions in the array
- `shape`: size of each dimension, as a tuple (for example, rows and columns)
- `dtype`: data type of values in the array
- `size`: total number of values in the array
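
A short sketch of the attributes above, using a small illustrative array named `table`:

```python
import numpy as np

table = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

print(table.ndim)   # 2
print(table.shape)  # (2, 3)
print(table.dtype)  # float64
print(table.size)   # 6
```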

### Casting Arrays to Other DTypes

Use the `astype()` method, which returns a new array of the requested dtype. The original array is not changed.
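
A minimal sketch of casting with `astype()`. The arrays are illustrative. Note that casting floats to int drops the fractional part rather than rounding:

```python
import numpy as np

prices = np.array([1.7, 2.2, -3.9])

# Casting to int truncates toward zero
whole = prices.astype(int)
print(whole)         # [ 1  2 -3]

# Casting numbers to bool: 0 becomes False, anything else becomes True
flags = np.array([0, 1, 2]).astype(bool)
print(flags)         # [False  True  True]

# The original array keeps its dtype
print(prices.dtype)  # float64
```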

For more information on NumPy objects, review the documentation at: https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html.

# Session 2: Operations With Arrays

## Array Operations

Arrays are designed to support fast computation and comparisons. The most common types of operations are:
- Between arrays and scalars (one value at a time)
- Universal functions (np.function_name)
    - Unary (performed on a single array): abs, sqrt, ceil, floor
    - Binary (performed between two arrays): `+`, `-`, `<`, `&` (and), etc.
- Mathematical and statistical functions
    - Aggregation: mean, sum, std, var (variance), min/max, etc.
    - Non-aggregation: cumsum, cumprod

### Broadcasting

You can add a scalar value to an array. The scalar is added to every element, as if it were an array of the same shape.

Two arrays can also be broadcast together if their shapes are compatible.
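
A small sketch of both kinds of broadcasting described above. The names and values are illustrative:

```python
import numpy as np

values = np.array([1, 2, 3])

# Scalar broadcast: 10 is added to every element
shifted = values + 10
print(shifted)   # [11 12 13]

# Array broadcast: the 1-D row is applied to each row of the 2-D array,
# which works because the row length matches the number of columns
matrix = np.array([[1, 2, 3], [4, 5, 6]])
row = np.array([10, 20, 30])
combined = matrix + row
print(combined)
```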

### Comparison

A comparison can take place with a scalar, or with an array of compatible shape. The result is a Boolean array with one value per element.
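
The two kinds of comparison might look like this, with illustrative arrays named `scores` and `targets`:

```python
import numpy as np

scores = np.array([55, 70, 85, 90])

# Comparison with a scalar: each element is tested against 70
passing = scores >= 70
print(passing)   # [False  True  True  True]

# Comparison with an array of the same shape: elements are paired up
targets = np.array([55, 75, 85, 95])
matches = scores == targets
print(matches)   # [ True False  True False]
```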

## Practice Problems

1. Create and print a one-dimensional array from a list comprehension that produces all odd numbers from 5 through 10.
2. Re-do \#1 without a list comprehension. Then, use broadcasting to square each value.
3. **Challenge:** Using your answer from \#2, print an array of all elements that have a squared value that is at most 50.

In [3]:
# Answer



## Universal Functions

A **universal function** is a standalone function that performs element-wise operations, using one or two arrays or array-like arguments. To use a universal function, the syntax is:
`np.function_name(array_name)`

More detail about universal functions can be found in the NumPy documentation: https://numpy.org/doc/stable/reference/ufuncs.html.
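
A brief sketch of unary and binary universal functions. The input values are illustrative:

```python
import numpy as np

values = np.array([-4.0, 1.0, 9.0])
other = np.array([1.0, 2.0, 3.0])

# Unary: applied to each element of a single array
magnitudes = np.abs(values)         # [4. 1. 9.]
roots = np.sqrt(magnitudes)         # [2. 1. 3.]

# Binary: applied to pairs of elements from two arrays
totals = np.add(values, other)      # same result as values + other
larger = np.maximum(values, other)  # element-wise larger value

print(roots)
print(totals)
print(larger)
```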

## Aggregate Functions

An **aggregate function** is a function that reduces all of the values in an array to a single summary value.

In [4]:
# Examples of aggregate functions
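
A few aggregate functions, sketched on a small illustrative array. By default, `np.var` and `np.std` compute the population variance and standard deviation:

```python
import numpy as np

values = np.array([1, 2, 3, 4])

total = np.sum(values)     # 10
average = np.mean(values)  # 2.5
lowest = np.min(values)    # 1
highest = np.max(values)   # 4
variance = np.var(values)  # 1.25
spread = np.std(values)    # square root of the variance

print(total, average, lowest, highest, variance, spread)
```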


## Non-Aggregate Functions

A **non-aggregate** function does not reduce the values in an array to a single value. Instead, it performs an action against the values and returns an array of results.

In [5]:
# Examples of non-aggregate functions
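
The two non-aggregate functions named earlier, `cumsum` and `cumprod`, can be sketched as:

```python
import numpy as np

values = np.array([1, 2, 3, 4])

# Each result has the same length as the input
running_total = np.cumsum(values)     # [ 1  3  6 10]
running_product = np.cumprod(values)  # [ 1  2  6 24]

print(running_total)
print(running_product)
```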


## Practice Problems

1. Create and print a one-dimensional array for all values between -5 and 10 inclusive.
2. Determine and print the average value within the array.
3. Determine and print a cumulative sum of the values within the array.
4. **Challenge:** Determine and print the sum of values within the array, using the distance of each value from 0. For example, both 4 and -4 have a distance of 4 from 0.

In [6]:
# Answer


## Arrays vs. Lists

Arrays may seem similar to lists (e.g., they are both mutable and iterable sequences), but they are distinct data structures. Use an array whenever you are performing large-scale computations or comparisons.

However, lists do not have restrictions on the size of nested sequences, whereas the nested sequences used to build an array must all have the same length to produce a useful form of the object.

In [7]:
# Example - Cannot create a "holey" array
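
A hedged sketch of what happens with rows of unequal length. The behavior depends on the NumPy version: recent releases (1.24 and later) raise a `ValueError`, while older releases issued a warning and built an array of list objects. The `try`/`except` below is written to run either way:

```python
import numpy as np

ragged_rows = [[1, 2, 3], [4, 5]]  # the second row is shorter

try:
    holey = np.array(ragged_rows)
    outcome = "created with dtype " + str(holey.dtype)
except ValueError as error:
    outcome = "ValueError: " + str(error)
print(outcome)

# Asking for dtype=object stores the two lists as opaque objects,
# which gives a 1-D array of lists, not a useful 2-D numeric array
as_objects = np.array(ragged_rows, dtype=object)
print(as_objects.shape)  # (2,)
```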


### Execution Time

#### Using `%%timeit` to check the time for executing a full cell and `%timeit` for executing a single line of code, we can see the length of time for creation of a list vs. an array

Example: `%timeit -r5 -n10000`<br>
`-r` = number of times to repeat the timer<br>
`-n` = number of times to execute the statement in each repeat
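
The magic commands only work inside IPython or Jupyter. As a rough stand-in that also runs in a plain Python script, the standard-library `timeit` module accepts `repeat` and `number` arguments that play the same roles as the r and n options. The sizes below are arbitrary, and the timings will vary by machine:

```python
import timeit

import numpy as np

# Five repeats of 1,000 executions each, similar to: %timeit -r5 -n1000 ...
list_times = timeit.repeat(lambda: list(range(1000)), repeat=5, number=1000)
array_times = timeit.repeat(lambda: np.arange(1000), repeat=5, number=1000)

# The fastest repeat is usually the most representative measurement
print("list :", min(list_times))
print("array:", min(array_times))
```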

#### List Creation

#### Array Creation

<font color='red'>Using the array yields a far faster time to run!!!</font>

## Indexing and Slicing

Array indexing and slicing are similar to lists.

Slicing a list creates a new list (a copy), while slicing an array returns a *view* (shallow copy) of the original data.
- Any changes to a list slice are not reflected in the original list.
- Any changes to an array slice are reflected in the original array.

To force an independent (deep) copy to be created for an array, use the `copy()` method.
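
The difference can be sketched with a small list and array. The names are illustrative:

```python
import numpy as np

# List slice: a separate list, so the original is untouched
numbers_list = [1, 2, 3, 4, 5]
list_slice = numbers_list[1:3]
list_slice[0] = 99
print(numbers_list)   # [1, 2, 3, 4, 5]

# Array slice: shares data with the original array
numbers_array = np.array([1, 2, 3, 4, 5])
array_slice = numbers_array[1:3]
array_slice[0] = 99
print(numbers_array)  # [ 1 99  3  4  5]

# copy() breaks the link to the original
independent = numbers_array[1:3].copy()
independent[1] = -1
print(numbers_array)  # still [ 1 99  3  4  5]
```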

## Filtering Data

### Index Arrays

One way to filter data is to create an array based on the indices of another array. Each element in the index array is replaced with the corresponding value in the original array. An array copy is made.
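
A minimal sketch of filtering with an index array. The arrays `values` and `positions` are illustrative:

```python
import numpy as np

values = np.array([10, 20, 30, 40, 50])
positions = np.array([0, 2, 4, 2])  # indices may repeat and be in any order

picked = values[positions]
print(picked)   # [10 30 50 30]

# The result is a copy, so changing it leaves the original alone
picked[0] = -1
print(values)   # [10 20 30 40 50]
```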

### Boolean Arrays

Another way to filter data is to create an array based on another array of Booleans. Each element in the original array is returned if the corresponding Boolean value is True. An array copy is made.
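
A short sketch of filtering with a Boolean array of the same length. The names are illustrative:

```python
import numpy as np

values = np.array([10, 20, 30, 40, 50])
mask = np.array([True, False, True, False, True])

# Only the positions marked True are kept
kept = values[mask]
print(kept)   # [10 30 50]
```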

### Conditional Filtering

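Another way to filter data is to create an array based on a conditional value. Only elements that meet the criteria are included. An array copy is made. Conditions can be combined with the operators below; wrap each condition in parentheses, because `&` and `|` bind more tightly than comparison operators.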

* `&` - and
* `|` - or
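
Conditional filtering with `&` and `|` might look like the sketch below. Each condition is wrapped in parentheses so that the comparison is evaluated before the operator combines the results:

```python
import numpy as np

values = np.array([10, 20, 30, 40, 50])

# and: both conditions must hold
middle = values[(values > 15) & (values < 45)]
print(middle)   # [20 30 40]

# or: either condition may hold
edges = values[(values < 20) | (values > 40)]
print(edges)    # [10 50]
```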

## Practice Problems

1. Create an array containing all numbers between 10 and 20, inclusive.
2. Using the original array created, create another array that contains the values: 11, 12, 19, 18, 13.
3. Repeat the previous problem, using the original array, but now the array should contain the values: 11, 12, 13, 18, 19. Create the array using a different technique than before.
4. Using the original array, create another array containing all values between 14 and 18, exclusive of these values.

In [8]:
# Answer


## Boolean Data for Data Analysis

Boolean arrays can be used to learn about the data.
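
As a sketch of the idea, a Boolean array can be counted and tested with `np.sum`, `np.any`, and `np.all`. The `temps` array is illustrative data:

```python
import numpy as np

temps = np.array([61, 75, 82, 90, 68])
hot = temps > 80               # Boolean array, one value per element

# True counts as 1, so summing counts the matches
hot_count = np.sum(hot)        # 2

# Does at least one value meet the condition?
any_hot = np.any(hot)          # True

# Do all values meet a condition?
all_warm = np.all(temps > 50)  # True

print(hot_count, any_hot, all_warm)
```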

In [9]:
# Example: Count the number of values that meet a conditional


In [10]:
# Example: Do any of the values satisfy a given condition?


In [11]:
# Example: Do all of the values satisfy a given condition


## Working with Multidimensional Arrays

Indexing and slicing multidimensional arrays is fairly intuitive.
- A one-dimensional array contains 0-dimensional values (scalars).
- A two-dimensional array is an array of 1-d arrays.
- A three-dimensional array has 3 dimensions corresponding to the position of each 2-d array, 1-d array, and scalar value, respectively.
- And so on, for higher dimensions.

Dimensions are accessed in order, by successive indexing and by slicing operations.

### Creating a Two-Dimensional Array

Specify two dimensions (rows and columns).
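
Two possible ways to build a 3 x 4 array are sketched below. The name `grid` and its values are illustrative:

```python
import numpy as np

# From nested lists: each inner list is one row
grid = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8],
                 [9, 10, 11, 12]])

# From a range of values arranged into 3 rows and 4 columns
same_grid = np.arange(1, 13).reshape(3, 4)

print(grid)
print(grid.shape)   # (3, 4)
```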

### Slicing Rows and Columns

Slicing rows and columns of a multidimensional array is performed in a similar manner to a one-dimensional array, with one index or slice per dimension, separated by commas.
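
A sketch of row and column slicing on an illustrative 3 x 4 array named `grid`. The row selection comes first and the column selection second:

```python
import numpy as np

grid = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8],
                 [9, 10, 11, 12]])

first_row = grid[0]           # [1 2 3 4]
second_column = grid[:, 1]    # [ 2  6 10]

# All rows starting at the 2nd row, for the 1st and 2nd columns
lower_left = grid[1:, :2]     # [[ 5  6] [ 9 10]]

print(first_row)
print(second_column)
print(lower_left)
```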

In [12]:
# Slice the first row of the previous array


In [13]:
# Slice the second column of the previous array


In [14]:
# Show all rows, starting at the 2nd row, for the 1st and 2nd column


### Creating a Three-Dimensional Array

Specify three dimensions (number of matrices, rows, and columns). Each element is a two-dimensional array.
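
A minimal sketch of a three-dimensional array holding 2 matrices of 3 rows and 4 columns each. The name `cube` is illustrative:

```python
import numpy as np

cube = np.arange(24).reshape(2, 3, 4)
print(cube.shape)             # (2, 3, 4)

# The 2nd matrix (index 1 along the first dimension)
second_matrix = cube[1]
print(second_matrix)

# For all matrices, all columns of the 2nd row
second_rows = cube[:, 1, :]
print(second_rows)            # [[ 4  5  6  7] [16 17 18 19]]
```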

In [15]:
# Access the 2nd matrix


In [16]:
# For all matrices, access all columns, in the 2nd row


## Generating Values Based on Condition

Previous way: A `for` loop, `map` with a lambda function, or a list comprehension could be used to generate the values, which could then be cast to an array. That is too much work and processing!

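For arrays, use the `where()` function. `np.where(condition, x, y)` returns values from `x` where the condition is True and from `y` where it is False.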

In [17]:
# Flip a coin a number of times
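
One way to sketch the coin flip: generate random 0s and 1s, then let `np.where` translate them. Treating 1 as heads is an arbitrary choice, and the output changes on every run:

```python
import numpy as np

# Ten random integers that are each 0 or 1
flips = np.random.randint(0, 2, 10)

# Where the flip is 1 use "Heads", otherwise use "Tails"
faces = np.where(flips == 1, "Heads", "Tails")

print(flips)
print(faces)
```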


Calls to the `where()` function can be nested to choose between more than two options.

In [18]:
# Choose between 3 options
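
A sketch of nesting, where the "otherwise" value of the outer call is itself another `np.where` call. The labels and cutoffs are illustrative:

```python
import numpy as np

rolls = np.array([1, 2, 3, 4, 5, 6])

# 1-2 -> "low", 3-4 -> "medium", everything else -> "high"
labels = np.where(rolls <= 2, "low",
                  np.where(rolls <= 4, "medium", "high"))

print(labels)
```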


## Practice Problems

Create an array that contains 5 columns and 3 rows of random values. Then, using the array you created:
1. Print the second row
2. Print the first and third rows
3. Print the middle three columns
4. **Challenge:** Starting with the second column, print every other column

In [20]:
# Answer 


## Sorting

The `np.sort(array)` function returns a sorted copy of an array. The original array is not changed.

In [19]:
# Example sorting
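
A minimal sketch of sorting into a new array. Reversing the result with a slice is one simple way to get descending order:

```python
import numpy as np

unsorted = np.array([3, 1, 2])

ascending = np.sort(unsorted)
descending = np.sort(unsorted)[::-1]

print(ascending)    # [1 2 3]
print(descending)   # [3 2 1]
print(unsorted)     # [3 1 2] (unchanged)
```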


An array can also be sorted in place, without creating a new array, by calling the array's own `sort()` method.

It is also possible to gather the indexes that would sort an array, using the `argsort()` function.
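
Both ideas can be sketched as follows, assuming the array's `sort()` method for in-place sorting and `np.argsort` for the indexes. The indexes are gathered first, while the array is still unsorted:

```python
import numpy as np

values = np.array([30, 10, 20])

# Indexes that would put the values in ascending order
order = np.argsort(values)
print(order)            # [1 2 0]
print(values[order])    # [10 20 30]

# In-place sort: the array itself is reordered, nothing is returned
values.sort()
print(values)           # [10 20 30]
```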

## Manipulating and Combining Arrays

Sometimes you will need to manipulate or combine multiple arrays of data prior to performing any analysis. Built-in functions can help.

### Manipulating Arrays

Using the `reshape()` function, the data in an array can be arranged into different dimensions. The shape of the original array *is not* changed.
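
A short sketch of `reshape()`, called here as a method on an illustrative array. The new shape must hold the same total number of values:

```python
import numpy as np

flat = np.arange(6)          # [0 1 2 3 4 5]
grid = flat.reshape(2, 3)    # 2 rows, 3 columns

print(grid)
print(flat.shape)            # (6,) since the original keeps its shape
```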

Using the array's `resize()` method is similar to the `reshape()` function, but the original array *is* changed.
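
A hedged sketch of in-place resizing with the array's `resize()` method. Passing `refcheck=False` is a choice made here because the default reference check can raise an error in interactive sessions that keep extra references to the array:

```python
import numpy as np

values = np.arange(6)                   # [0 1 2 3 4 5]

# Nothing is returned; the array itself takes the new shape
values.resize((3, 2), refcheck=False)

print(values.shape)                     # (3, 2)
print(values)
```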

The `T` attribute returns a transposed view (shallow copy) with rows and columns switched. The original array *is not* changed.
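
A small sketch of transposing. Because the result is a view, it shares its data with the original array even though the original's shape stays the same:

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])
flipped = grid.T

print(flipped)       # [[1 4] [2 5] [3 6]]
print(grid.shape)    # (2, 3)
print(np.shares_memory(grid, flipped))   # True
```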

The `flatten()` function takes a multidimensional array and converts it to a single dimension array. The original array *is not* changed because a deep copy is created.

The `flatten()` function accepts an `order` argument, such as:
- `'C'` for flatten in C style: by rows, then by columns in each row (default)
- `'F'` for flatten in Fortran style: by columns, then by rows in each column
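
The two orderings can be sketched on an illustrative 2 x 3 array:

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])

by_rows = grid.flatten()         # default 'C' order: [1 2 3 4 5 6]
by_columns = grid.flatten("F")   # Fortran order:     [1 4 2 5 3 6]

# flatten() returns a copy, so editing it leaves the original alone
by_rows[0] = 99

print(by_rows)
print(by_columns)
print(grid)
```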

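The `ravel()` function is similar to the `flatten()` function, but it returns a view (shallow copy) whenever possible instead of a deep copy. The shape of the original array *is not* changed, but changes made to the raveled array can be reflected in the original array.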
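
A sketch of how `ravel()` differs from `flatten()` in practice. For a freshly created array like the illustrative one below, `ravel()` returns a view, so an edit to the result shows up in the original:

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])
flat_view = grid.ravel()

flat_view[0] = 99

print(flat_view)     # [99  2  3  4  5  6]
print(grid)          # first value is now 99 as well
print(grid.shape)    # (2, 3)
```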

### Combining Arrays

Vertical stacking with the `vstack()` function adds more rows.

Horizontal stacking with the `hstack()` function adds more columns.
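
Both kinds of stacking are sketched below with small illustrative arrays. The arrays are passed together as a tuple, and their shapes must match along the dimension that is not being extended:

```python
import numpy as np

top = np.array([[1, 2], [3, 4]])

# Vertical: the new array must have the same number of columns
bottom = np.array([[5, 6]])
taller = np.vstack((top, bottom))
print(taller)   # [[1 2] [3 4] [5 6]]

# Horizontal: the new array must have the same number of rows
side = np.array([[7], [8]])
wider = np.hstack((top, side))
print(wider)    # [[1 2 7] [3 4 8]]
```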

## File Input/Output

There are two primary ways to save/load NumPy arrays to/from a file.
- Binary format (.npy): `np.save` and `np.load`
- Delimited text (.txt): `np.savetxt` and `np.loadtxt`

Using these pre-built functions is easier than using Python's built-in methods for reading from a file.

In [21]:
# Example 
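
A minimal round-trip sketch of both formats. The file names are hypothetical, and a temporary folder is used here only so the example cleans up after itself:

```python
import os
import tempfile

import numpy as np

data = np.array([[1.5, 2.5], [3.5, 4.5]])

with tempfile.TemporaryDirectory() as folder:
    binary_path = os.path.join(folder, "data.npy")
    text_path = os.path.join(folder, "data.txt")

    # Binary format: dtype and shape are stored in the file
    np.save(binary_path, data)
    from_binary = np.load(binary_path)

    # Delimited text: values are written as comma-separated rows
    np.savetxt(text_path, data, delimiter=",")
    from_text = np.loadtxt(text_path, delimiter=",")

print(from_binary)
print(from_text)
```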


### File Output

### File Input

To load a text file, the `np.loadtxt` function can be used. Details: https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html. Each row in the text file **must** have the same number of values.

Required Parameter:
* `fname` - Filename where the data is located

Some Optional Parameters:
* `dtype` - Data type of the resulting array. The default is float.
* `delimiter` - String used to separate values. The default value is whitespace. For CSV files, a ',' should be used.
* `skiprows` - Integer representing the number of rows to skip importing. The default is 0.
* `usecols` - Integer or sequence representing which columns to read. The default is to read all columns. When pulling multiple columns, use a tuple.
* `max_rows` - Integer representing the number of lines of content to read after any `skiprows` lines. The default is to read all lines.
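
The parameters above can be sketched together. To keep the example self-contained, an in-memory `io.StringIO` object stands in for a file on disk, and the rows are made-up sample values rather than real data:

```python
import io

import numpy as np

# Made-up sample content: one header line followed by three data rows
sample_csv = io.StringIO(
    "day,open,high,low\n"
    "1,10.0,12.0,9.5\n"
    "2,11.0,13.5,10.5\n"
    "3,12.5,14.0,12.0\n"
)

highs_lows = np.loadtxt(
    sample_csv,
    delimiter=",",    # values are separated by commas
    skiprows=1,       # skip the header line
    usecols=(2, 3),   # read only the 3rd and 4th columns (0-based positions)
    max_rows=2,       # read only the first two data rows
)

print(highs_lows)     # [[12.   9.5] [13.5 10.5]]
```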

## Case Study: Basic Analysis of Stock Quotes

Using the data from https://www.nasdaq.com/market-activity/stocks/amzn/historical, stock prices for Amazon (AMZN) were pulled for the year 2021.

In reviewing the data file, there are six columns for which data can be pulled. Noting the position of the column relative to the data file is important to determine which columns to use:
1. Date - The date the stock was traded
2. Close/Last - The last price at which the stock traded during the regular trading day
3. Volume - The number of shares traded
4. Open - The opening price at which the stock traded
5. High - The high price at which the stock traded
6. Low - The low price at which the stock traded


## Practice Problems

Using the stock data set, analyze the data to programmatically print answers to the following business questions, making sure each answer is written as a well-formatted sentence.

1. What were the lowest and highest opening prices for the AMZN stock?
2. How many days saw a high trading price that exceeded \$2,500?
3. Did the AMZN stock ever have an opening price less than \$1,500? Your answer must include the phrase "did have" or "did not have".
4. What were the top 3 opening prices for the AMZN stock?


In [22]:
# Answer


## Preview: Data Visualization

Using the matplotlib Python library, a simple line graph can be created to plot data gathered from the NumPy arrays, such as the change in closing price.