<a href="https://colab.research.google.com/github/SimeonHristov99/ML_22-23/blob/main/Week_01 - Hello%2C Numpy/hello_numpy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Hello, [NumPy](https://numpy.org/doc/stable/reference/)!

<img src="https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Ftse1.mm.bing.net%2Fth%3Fid%3DOIP.vPezx00A1u0WAfS8e8wBXQHaHa%26pid%3DApi&f=1" alt="drawing" width="200"/>



What is NumPy?

- the core library for **scientific computing** in Python;
- almost all of the libraries in the [PyData](https://pydata.org/) ecosystem (`pandas`, `scipy`, `scikit-learn`, etc.) and many deep learning frameworks such as [TensorFlow](https://www.tensorflow.org/) and [PyTorch](https://pytorch.org/) **rely on NumPy as one of their main building blocks**;
- the main advantage of NumPy is that **it's incredibly fast**, as its array operations run in compiled C code rather than in the Python interpreter.

NumPy has many built-in functions and capabilities. We won't cover them all but instead we will focus on some of the most important aspects such as:
- arrays;
- vectors;
- matrices;
- number generation.

# NumPy Arrays

NumPy arrays are the main way we will use NumPy throughout the course. They come in two flavors: vectors and matrices.
- vectors are 1-dimensional (1D) arrays;
- matrices are 2-dimensional (2D) arrays (but you should note a matrix can still have only one row or one column);
- we'll call any structure that has 3 or more dimensions just an array or tensor;
- NumPy arrays are **homogeneous**: all elements of an array share a single data type!

## Why not just a list?

Some reasons are:
- Memory efficiency: a NumPy array is stored more compactly than a list;
- Easily expands to N-dimensional arrays;
- Speed: calculations on NumPy arrays are much faster than loops over lists;
- **Broadcasting** operations and functions with NumPy;
- All machine learning libraries we use are built on NumPy;
- For a discussion on why you would want to use NumPy arrays instead of Python lists, check out this great [StackOverflow post](http://stackoverflow.com/questions/993984/why-numpy-instead-of-python-lists).

## Some terminology

- **Axis**: The index of a dimension (a non-negative integer).
- **Shape**: The number of elements along each dimension (a tuple).

1D array:

![1d_array](https://raw.githubusercontent.com/SimeonHristov99/ML_22-23/main/assets/1d.png)

2D array:

![2d_array](https://raw.githubusercontent.com/SimeonHristov99/ML_22-23/main/assets/2d.png)

3D array:

![3D array](https://raw.githubusercontent.com/SimeonHristov99/ML_22-23/main/assets/3d.png)
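To connect the terminology with the pictures above, here is a minimal sketch that builds one array of each kind and prints its number of axes and its shape. The variable names and the values are made up for illustration.

```python
import numpy as np

vector = np.array([1, 2, 3])                 # 1D: one axis (axis 0)
matrix = np.array([[1, 2, 3], [4, 5, 6]])    # 2D: axis 0 indexes rows, axis 1 indexes columns
tensor = np.zeros((2, 3, 4))                 # 3D: three axes

for name, a in [("vector", vector), ("matrix", matrix), ("tensor", tensor)]:
    # ndim is the number of axes; shape holds the length of each axis
    print(name, a.ndim, a.shape)
```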

In [None]:
# Import numpy
import numpy as np

In [None]:
# Check the version of numpy

# Creating NumPy Arrays

## From Python Lists

We can create an array by directly converting a list or list of lists.
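A short sketch of both conversions, with illustrative variable names. It also shows what the homogeneity rule means in practice: mixing integers and floats produces an array of floats.

```python
import numpy as np

my_list = [1, 2, 3]
arr_1d = np.array(my_list)            # list -> vector

my_matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
arr_2d = np.array(my_matrix)          # list of lists -> matrix

mixed = np.array([1, 2.5, 3])         # ints are upcast so every element is a float

print(arr_1d.shape, arr_2d.shape, mixed.dtype)
```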

In [None]:
# Of course we don't need to have a predefined variable.

## Using built-in methods

There are lots of built-in ways to generate arrays.

## arange

The analog to the Python `range` function. [[reference](https://numpy.org/doc/stable/reference/generated/numpy.arange.html?highlight=arange#numpy.arange)]
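As with `range`, the stop value is excluded and an optional step can be passed. A small sketch:

```python
import numpy as np

first_ten = np.arange(10)        # 0, 1, ..., 9 (stop is exclusive)
evens = np.arange(0, 11, 2)      # start, stop, step -> 0, 2, 4, 6, 8, 10

print(first_ten)
print(evens)
```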

## zeros and ones

Generate arrays of zeros or ones. [[reference](https://numpy.org/doc/stable/reference/generated/numpy.zeros.html?highlight=zeros#numpy.zeros)]

Note that, by default, the resulting arrays hold **floating point** numbers.
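A minimal sketch of both functions. For more than one dimension the shape is passed as a tuple, and the `dtype` parameter can be used when floats are not wanted.

```python
import numpy as np

zeros_vec = np.zeros(3)               # 1D array of three zeros
ones_mat = np.ones((2, 3))            # 2D array: the shape is given as a tuple
int_zeros = np.zeros(3, dtype=int)    # request integers explicitly

print(zeros_vec.dtype, ones_mat.shape, int_zeros.dtype)
```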

## linspace 

Return evenly spaced numbers over a specified interval. [[reference](https://www.numpy.org/devdocs/reference/generated/numpy.linspace.html)]

> **Note**: `.linspace()` *includes* the stop value.

To obtain an array of common fractions, increase the number of items:
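For instance, here is a sketch over the interval from `0` to `5`. The interval and the item counts are arbitrary choices for illustration.

```python
import numpy as np

coarse = np.linspace(0, 5, 6)     # 6 points: 0, 1, 2, 3, 4, 5 (stop is included)
fine = np.linspace(0, 5, 21)      # 21 points: steps of 0.25

print(coarse)
print(fine)
```

Unlike `arange`, the third argument is the number of points, not the step size.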

## eye

Creates an identity matrix [[reference](https://numpy.org/doc/stable/reference/generated/numpy.eye.html?highlight=eye#numpy.eye)]. The resulting array holds floating point numbers.
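A quick sketch, assuming we want a 3x3 identity matrix:

```python
import numpy as np

identity = np.eye(3)    # ones on the main diagonal, zeros elsewhere

print(identity)
print(identity.dtype)
```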

## Random 

NumPy also has lots of ways to create random number arrays.

### rand
Creates an array of the given shape and populates it with random samples from a uniform distribution over ``[0, 1)`` [[reference](https://numpy.org/doc/stable/reference/random/generated/numpy.random.rand.html?highlight=rand#numpy.random.rand)].

Read more about the uniform distribution [here](https://www.wallstreetmojo.com/uniform-distribution/).
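A minimal sketch of the call. Note that the dimensions are passed as separate arguments rather than as a tuple; the sizes used here are arbitrary.

```python
import numpy as np

vec = np.random.rand(5)       # 5 values, each in [0, 1)
mat = np.random.rand(2, 3)    # 2 rows, 3 columns

print(vec)
print(mat)
```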

In [None]:
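# Draw a large sample first (the sample size is an arbitrary choice)
samples = np.random.rand(100_000)
print(f'Expected mean if true uniform distribution: {1 / 2}. Got: {samples.mean()}')
print(f'Expected standard deviation if true uniform distribution: {np.sqrt(1 / 12)}. Got: {samples.std()}')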

### randn

Returns a sample (or samples) from the "standard normal" distribution [μ = 0, σ = 1]. Unlike **rand**, which is uniform, **randn** is more likely to return values close to zero [[reference](https://numpy.org/doc/stable/reference/random/generated/numpy.random.randn.html?highlight=randn#numpy.random.randn)].

Read more about the standard normal distribution [here](https://online.stat.psu.edu/stat500/lesson/3/3.3/3.3.2).
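A sketch that draws a small matrix and then a larger sample to check the statistics. The sample size is an arbitrary choice; with enough draws the mean should land near 0 and the standard deviation near 1.

```python
import numpy as np

small = np.random.randn(2, 3)          # 2x3 matrix of standard normal draws
normal_samples = np.random.randn(100_000)

print(small)
print(normal_samples.mean(), normal_samples.std())
```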

In [None]:
# print the mean and standard deviation

### randint

Returns random integers from `low` (inclusive) to `high` (exclusive) drawn from the discrete uniform distribution.  [[reference](https://numpy.org/doc/stable/reference/random/generated/numpy.random.randint.html?highlight=randint#numpy.random.randint)]
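As an illustration, simulating rolls of a six-sided die means asking for integers from `1` up to, but not including, `7`. The dice scenario is just an example.

```python
import numpy as np

single_roll = np.random.randint(1, 7)          # one integer in {1, ..., 6}
rolls = np.random.randint(1, 7, size=10)       # ten rolls in a 1D array

print(single_roll)
print(rolls)
```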

In [None]:
# Print the mean.

### seed

Can be used to set the random state, so that the same "random" results can be reproduced.
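A small sketch of the effect: resetting the seed to the same value replays the same sequence. The seed value `42` is arbitrary.

```python
import numpy as np

np.random.seed(42)
first = np.random.rand(3)

np.random.seed(42)            # same seed -> same "random" numbers
second = np.random.rand(3)

print(first)
print(second)
```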

# Useful Attributes and Methods

In [None]:
arr

In [None]:
rand_arr

## Reshape

Returns an array containing the same data with a new shape.

> **Note**: The product of the elements in the new shape **MUST** equal the total number of elements! 
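A minimal sketch with 12 elements, which fit into a 3x4 grid but not into a 5x5 one. Passing `-1` for one dimension asks NumPy to infer it from the total number of elements.

```python
import numpy as np

arr = np.arange(12)
grid = arr.reshape(3, 4)          # 3 * 4 == 12, so this works
inferred = arr.reshape(2, -1)     # -1 is inferred as 6

print(arr.size)                   # total number of elements
print(grid)

try:
    arr.reshape(5, 5)             # 5 * 5 != 12
except ValueError as err:
    print("reshape failed:", err)
```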

In [None]:
# Get the total number of elements

## max, min, argmax, argmin

`max` and `min` return the largest and smallest value in an array; `argmax` and `argmin` return the index at which that value is located.
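A short sketch on a fixed array (the values are made up so the results are easy to check by eye):

```python
import numpy as np

values = np.array([12, 45, 7, 33])

print(values.max())       # 45
print(values.argmax())    # 1, the position of 45
print(values.min())       # 7
print(values.argmin())    # 2, the position of 7
```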

In [None]:
rand_arr

In [None]:
# get the maximum number

In [None]:
# get the index of the maximum number

In [None]:
# get the minimum number

In [None]:
# get the index of the minimum number

## Shape


In [None]:
# Vector

In [None]:
# Matrix

In [None]:
# Notice the two sets of brackets in the output

## dtype

Get the data type of the elements in the array.
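The sketch below looks at both attributes together, using illustrative arrays. A vector reshaped to one row is a matrix, which is why its printed form has two sets of brackets.

```python
import numpy as np

vec = np.arange(5)
row = vec.reshape(1, 5)       # a matrix with a single row

print(vec.shape)              # (5,)
print(row.shape)              # (1, 5)
print(row)                    # [[0 1 2 3 4]]

floats = np.array([1.0, 2.0])
print(vec.dtype)              # an integer type (exact width depends on the platform)
print(floats.dtype)           # float64
print(vec.astype(float).dtype)  # convert to floats explicitly
```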

# Indexing and Selection

How to select elements or groups of elements from an array?

In [None]:
# Create a sample array

### Bracket Indexing and Selection

The simplest way to pick one or some elements of an array looks very similar to Python lists.

In [None]:
# Get a value at an index

In [None]:
# Get values in a range

In [None]:
# Get values in a range

### **Broadcasting**

NumPy arrays differ from normal Python lists because of their ability to **broadcast**.

- in general, broadcasting allows operations between arrays of different (but compatible) shapes;
- with lists, assigning to a slice requires a sequence. That is, if you wanted to replace the first 5 elements in a list with a new value, you would have to pass in a new 5-element list. With NumPy arrays, you can broadcast a single value across a larger set of values.
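The two points above can be sketched as follows. The arrays and values are illustrative only.

```python
import numpy as np

# With a list, we must build a replacement of the right length ourselves
lst = list(range(11))
lst[0:5] = [100] * 5

# With an array, a single value is broadcast across the whole slice
arr = np.arange(0, 11)
arr[0:5] = 100

# Broadcasting also works between arrays: the row is added to every row of the matrix
matrix = np.ones((3, 3))
row = np.array([1, 2, 3])
shifted = matrix + row

print(lst)
print(arr)
print(shifted)
```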

In [None]:
# Technically, we should have the same shape on both arrays, but numpy can automatically create it for us.

In [None]:
# broadcasting makes it easier to write
# code like this

In [None]:
# Setting a value with index range (Broadcasting)

> **Note**: Changes in sliced arrays get propagated to the original array.

In [None]:
# Reset
arr = np.arange(0, 11)
arr

In [None]:
# Important notes on slices

`slice_of_arr` is **NOT** a copy of `arr`. It is a view onto the same underlying data.
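A minimal sketch of the difference between a slice and an explicit copy. The slice bounds and the values written are arbitrary.

```python
import numpy as np

arr = np.arange(0, 11)
slice_of_arr = arr[0:6]
slice_of_arr[:] = 99                  # writes through to arr

print(arr)                            # the first six elements are now 99
print(np.shares_memory(arr, slice_of_arr))

arr_copy = arr.copy()                 # an independent copy
arr_copy[:] = 0                       # arr is not affected

print(arr)
```

NumPy returns views from slices to avoid copying large arrays; call `.copy()` whenever the original must stay untouched.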

In [None]:
# Change Slice

# Slice has changed

In [None]:
# But so has the original array

In [None]:
# To get a copy, we need to be explicit

## Indexing a 2D array (matrices)

The general format is `arr_2d[row][col]` or `arr_2d[row, col]`. The more common approach is to use the comma notation.
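Here is a sketch of both notations on a small 3x3 matrix. The matrix values are an assumption chosen for illustration.

```python
import numpy as np

arr_2d = np.array([[5, 10, 15],
                   [20, 25, 30],
                   [35, 40, 45]])

second_row = arr_2d[1]              # a whole row
chained = arr_2d[1][0]              # row 1, then element 0 -> 20
comma = arr_2d[1, 0]                # same element with the comma notation

bottom_row = arr_2d[2, :]           # all elements from the bottom row
top_right = arr_2d[:2, 1:]          # rows 0-1, columns 1-2

print(second_row, chained, comma)
print(bottom_row)
print(top_right)
```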

In [None]:
# Indexing row

In [None]:
# Getting individual element value

In [None]:
# Using the comma notation

In [None]:
# 2D array slicing

# Get all the elements from the bottom row

In [None]:
# Grabbing the top right corner

## Indexing is Often Confusing

Indexing a 2D matrix can be a bit confusing at first, especially when you start to add in step size. Try searching Google Images for *NumPy indexing* to find useful diagrams, like this one:

![numpy_indexing](https://scipy-lectures.org/_images/numpy_indexing.png)

## Conditional Selection

This is **an important concept** that will directly translate to `pandas` later on!

Let's briefly go over how to use brackets for selection based on comparison operators.
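A minimal sketch: a comparison produces a boolean array, and passing that array inside the brackets keeps only the elements where it is `True`. The sample array is illustrative.

```python
import numpy as np

arr = np.arange(1, 11)

mask = arr > 4                           # boolean array, one entry per element
selected = arr[mask]                     # 5, 6, ..., 10

# Conditions are combined with & (and) and | (or); the parentheses are required
between = arr[(arr > 2) & (arr < 6)]     # 3, 4, 5

print(mask)
print(selected)
print(between)
```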

In [None]:
# Most commonly, the boolean array (also called mask)
# is not stored in a variable.

# NumPy Operations

## Arithmetic

You can easily perform *array with array* arithmetic, or *scalar with array* arithmetic thanks to broadcasting. Let's see some examples:
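A sketch of the common cases on an illustrative array. The two divisions normally emit a `RuntimeWarning`; here `np.errstate` silences it so the output stays clean, which is a choice made for this example only.

```python
import numpy as np

arr = np.arange(0, 10)

total = arr + arr          # array with array, element by element
doubled = arr * 2          # scalar with array
squared = arr ** 2

with np.errstate(divide="ignore", invalid="ignore"):
    ratio = arr / arr      # 0 / 0 gives nan in the first position
    inverse = 1 / arr      # 1 / 0 gives inf in the first position

print(total)
print(ratio)
print(inverse)
```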

In [None]:
# This will raise a Warning on division by zero, but not an error!
# It just fills the spot with nan (Not A Number)

In [None]:
# Also a warning (but not an error) relating to infinity

In [None]:
# See the power of broadcasting

## Universal Array Functions

NumPy comes with many [universal array functions](http://docs.scipy.org/doc/numpy/reference/ufuncs.html), or <em>ufuncs</em>, which are essentially just mathematical operations that can be applied across the array.<br>Let's show some common ones:
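For example, on a small array of perfect squares (values chosen so the results are easy to verify), each function is applied to every element at once:

```python
import numpy as np

arr = np.array([1.0, 4.0, 9.0])

roots = np.sqrt(arr)       # [1. 2. 3.]
exps = np.exp(arr)         # e raised to each element
sines = np.sin(arr)        # sine of each element (in radians)
logs = np.log(arr)         # natural logarithm of each element

print(roots)
print(exps)
print(sines)
print(logs)
```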

In [None]:
# Taking Square Roots

In [None]:
# Calculating exponential (e^)

In [None]:
# Trigonometric Functions like sine

In [None]:
# Taking the Natural Logarithm

## Summary Statistics on Arrays

NumPy also offers common summary statistics like <em>sum</em>, <em>mean</em> and <em>max</em>. You would call these as methods on an array.

In [None]:
# Take the sum

In [None]:
# Take the mean

In [None]:
# Take the maximum element

<strong>Other summary statistics include:</strong>
<pre>
arr.min() returns 0                   minimum
arr.var() returns 8.25                variance
arr.std() returns 2.8722813232690143  standard deviation
</pre>
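The values listed above are consistent with an array holding the integers `0` to `9`, so the sketch below assumes `arr = np.arange(0, 10)`:

```python
import numpy as np

arr = np.arange(0, 10)    # assumed input: 0, 1, ..., 9

print(arr.sum())          # 45
print(arr.mean())         # 4.5
print(arr.max())          # 9
print(arr.min())          # 0
print(arr.var())          # 8.25
print(arr.std())          # square root of the variance
```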

## Axis Logic

![axis_logic](https://raw.githubusercontent.com/SimeonHristov99/ML_22-23/main/assets/axis_logic.png)

When working with 2-dimensional arrays (matrices) we have to consider rows and columns. This becomes very important when we get to the section on `pandas`. In array terms, axis 0 (zero) is the vertical axis (rows), and axis 1 is the horizontal axis (columns). These values (0,1) correspond to the order in which <tt>arr.shape</tt> values are returned.

Let's see how this affects our summary statistic calculations from above.

By passing in <tt>axis=0</tt>, we're returning an array of sums along the vertical axis, essentially <tt>[(1+5+9), (2+6+10), (3+7+11), (4+8+12)]</tt>

This tells us that <tt>arr_2d</tt> has 3 rows and 4 columns.

So what should <tt>arr_2d.sum(axis=1)</tt> return?
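To check the reasoning, here is a sketch that assumes `arr_2d` is the 3x4 matrix holding the integers `1` to `12`, as the sums above suggest:

```python
import numpy as np

arr_2d = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12]])

print(arr_2d.shape)          # (3, 4): 3 rows, 4 columns
print(arr_2d.sum(axis=0))    # collapse the rows: one sum per column
print(arr_2d.sum(axis=1))    # collapse the columns: one sum per row
```

A useful way to remember this: the axis passed in is the one that disappears from the result.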

# Let's try it out

## Task 1

Create a NumPy array of `101` evenly spaced points between `0` and `10` (both included). Assign this array to a variable called `arr`. Print `arr`.

In [None]:
# Your code here

In [None]:
# Expected output

## Task 2

Check how many rolls were greater than `2`. For example if `dice_rolls=[1,2,3]` then the answer is `1`.

In [None]:
dice_rolls = np.array([3, 1, 5, 2, 5, 1, 1, 5, 1, 4, 2, 1, 4, 5, 3, 4, 5, 2, 4, 2, 6, 6, 3, 6, 2, 3, 5, 6, 5])

# Your code here

In [None]:
# Expected output

## Task 3

A bank account has had withdrawals and deposits tracked in a numpy array called `account_transactions`. Based on this list of account transactions, what is the total remaining in the account?

In [None]:
# Solution
account_transactions = np.array([100, -200, 300, -400, 100, 100, -230, 450, 500, 2000])

# Your code here

In [None]:
# Expected output