# NumPy

 - NumPy, which stands for Numerical Python, is a Python library used for scientific computing.
 - It provides an efficient multi-dimensional container for data (the array) and a collection of routines for processing those arrays.
 - It is one of the most fundamental and powerful packages for working with data in Python.

## Importing and Documentation

To use NumPy, we first need to import it, because it is not part of core Python. Importing a library loads it into memory so that we can work with it. To import NumPy, run the following line:

In [2]:
import numpy as np # usually we add the second part 'as np' so we can access it with 'np.command' instead of 'numpy.command'

IPython gives you the ability to quickly explore the contents of a package (by using the tab-completion feature), as well as the documentation of various functions.

For example, to display all the contents of the numpy namespace, you can type np. and then press the TAB key:

In [43]:
# Try it here: (Might take a few seconds for the result to show up)
np.

And to display NumPy's built-in documentation, you can use this:

In [44]:
np?

More detailed documentation, along with tutorials and other resources, can be found at http://www.numpy.org.

## Creating NumPy Arrays

### From List

First, we can use np.array to create arrays from Python lists:

In [12]:
# integer array:
np.array([1, 4, 2, 5, 3])

array([1, 4, 2, 5, 3])

NumPy arrays are constrained to contain elements of a single type. If the types do not match, NumPy will upcast if possible (here, integers are upcast to floating point):

In [13]:
np.array([3.14, 4, 2, 3])

array([3.14, 4.  , 2.  , 3.  ])
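One way to check the upcasting rule is to inspect each array's dtype attribute. The sketch below also passes the dtype keyword to np.array to set the type explicitly; the variable names are illustrative only.

```python
import numpy as np

ints = np.array([1, 4, 2, 5, 3])    # all integers: stays an integer array
mixed = np.array([3.14, 4, 2, 3])   # one float: every element is upcast to float

print(ints.dtype)   # an integer type whose width depends on the platform
print(mixed.dtype)  # float64

# Request a specific type explicitly with the dtype keyword
as_float32 = np.array([1, 2, 3, 4], dtype='float32')
print(as_float32.dtype)  # float32
```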

NumPy arrays can explicitly be multi-dimensional; here's one way of initializing a multidimensional array using a list of lists:

In [14]:
# nested lists result in multi-dimensional arrays
np.array([range(i, i + 3) for i in [2, 4, 6]])

array([[2, 3, 4],
       [4, 5, 6],
       [6, 7, 8]])

### From Scratch

Especially for larger arrays, it is more efficient to create arrays from scratch using routines built into NumPy. Here are several examples:

In [None]:
# Create a length-10 integer array filled with zeros
np.zeros(10, dtype=int)

In [None]:
# Create a 3x5 floating-point array filled with ones
np.ones((3, 5), dtype=float)

In [None]:
# Create a 3x5 array filled with 3.14
np.full((3, 5), 3.14)

In [None]:
# Create an array filled with a linear sequence
# Starting at 0, ending at 20, stepping by 2
# (this is similar to the built-in range() function)
np.arange(0, 20, 2)

In [None]:
# Create an array of five values evenly spaced between 0 and 1
np.linspace(0, 1, 5)
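The two routines above treat the end point differently, which is easy to trip over. As a small sketch (variable names are illustrative): np.arange excludes the stop value, like range(), while np.linspace includes it by default.

```python
import numpy as np

stepped = np.arange(0, 20, 2)   # the stop value 20 is excluded
spaced = np.linspace(0, 1, 5)   # the stop value 1 is included

print(stepped)  # [ 0  2  4  6  8 10 12 14 16 18]
print(spaced)   # [0.   0.25 0.5  0.75 1.  ]
```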

In [None]:
# Create a 3x3 array of uniformly distributed
# random values between 0 and 1
np.random.random((3, 3))

In [None]:
# Create a 3x3 array of normally distributed random values
# with mean 0 and standard deviation 1
np.random.normal(0, 1, (3, 3))

In [None]:
# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0, 10, (3, 3))

In [42]:
# Create a 3x3 identity matrix
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [None]:
# Create an uninitialized array of three values (floating point by default)
# The values will be whatever happens to already exist at that memory location
np.empty(3)

#### Quick Exercise:

<font color=blue>Create a numpy array in a range from 1 to 101 (including 101) with a step of 10:</font>

In [39]:
# Your code goes here:


## Array Attributes

Let's start exploring some of the key aspects of NumPy arrays by defining three random arrays, a one-dimensional, two-dimensional, and three-dimensional array. We'll use NumPy's random number generator, which we will seed with a set value in order to ensure that the same random arrays are generated each time this code is run:

In [16]:
np.random.seed(0)  # seed for reproducibility

x1 = np.random.randint(10, size=6)  # One-dimensional array
x2 = np.random.randint(10, size=(3, 4))  # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array

Each array has attributes ndim (the number of dimensions), shape (the size of each dimension), and size (the total size of the array):

In [21]:
print("x1 ndim: ", x1.ndim)
print("x1 shape:", x1.shape)
print("x1 size: ", x1.size)

x1 ndim:  1
x1 shape: (6,)
x1 size:  6
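Beyond these three, arrays expose a few more attributes that are often useful. The sketch below uses a small, separate array (the name grid is illustrative) and shows dtype, itemsize, and nbytes alongside the attributes above.

```python
import numpy as np

grid = np.zeros((2, 3), dtype=np.int64)

print("ndim:    ", grid.ndim)      # 2
print("shape:   ", grid.shape)     # (2, 3)
print("size:    ", grid.size)      # 6
print("dtype:   ", grid.dtype)     # int64
print("itemsize:", grid.itemsize)  # 8 bytes per element
print("nbytes:  ", grid.nbytes)    # 48, which is size * itemsize
```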


#### Quick Exercise

<font color=blue>Inspect the attributes of arrays x2 and x3:</font>

## Understanding NumPy Axes

NumPy axes are one of the harder concepts to grasp in NumPy, particularly if you are just getting started.

Don't worry, it's not you: many Python data science beginners struggle with this.

#### NUMPY AXES ARE LIKE AXES IN A COORDINATE SYSTEM

<tr>
<td> <img src="attachment:numpy-axes_cartesian-coordinate-example.png" width="100%"/> </td>
<td> <img src="attachment:numpy-axes_point-in-cartesian-coordinates-example.png" width="100%"/> </td>
</tr>

#### NUMPY AXES ARE THE DIRECTIONS ALONG THE ROWS AND COLUMNS

![numpy-arrays-have-axes.png](attachment:numpy-arrays-have-axes.png)
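To make the picture concrete, here is a small sketch for a two-dimensional array (the array and names are illustrative). Axis 0 runs down the rows and axis 1 runs across the columns, so an aggregation along an axis collapses that axis.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])  # shape (2, 3): 2 rows along axis 0, 3 columns along axis 1

down_rows = grid.sum(axis=0)       # collapse axis 0: one sum per column
across_columns = grid.sum(axis=1)  # collapse axis 1: one sum per row

print(down_rows)       # [5 7 9]
print(across_columns)  # [ 6 15]
```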

## Array Indexing and Slicing

## Array Reshaping

## Array Concatenation and Splitting

## Aggregations: Min, Max, and Everything In Between

## NumPy Arrays vs. Python Lists

In some ways, NumPy arrays are like Python's built-in list type, but NumPy arrays provide much more efficient storage and data operations as the arrays grow larger in size.  

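NumPy arrays use **less memory**, are **faster**, and are **more convenient** than lists. Also, Python lists do not support element-wise arithmetic (addition, subtraction, multiplication, division, and exponentiation) directly: adding two lists concatenates them and multiplying a list by an integer repeats it, whereas NumPy arrays apply these operators to each element.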
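A quick sketch of the difference (names are illustrative): the arithmetic operators exist for lists, but they concatenate or repeat rather than compute element by element.

```python
import numpy as np

py_list = [1, 2, 3]
np_array = np.array(py_list)

print(py_list + py_list)    # [1, 2, 3, 1, 2, 3]  (concatenation)
print(py_list * 2)          # [1, 2, 3, 1, 2, 3]  (repetition)
print(np_array + np_array)  # [2 4 6]  (element-wise addition)
print(np_array * 2)         # [2 4 6]  (element-wise multiplication)

# With a plain list, element-wise work needs an explicit loop or comprehension
doubled_list = [x * 2 for x in py_list]
print(doubled_list)         # [2, 4, 6]
```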

For example, imagine we have an array of values and we'd like to compute the reciprocal of each. A straightforward approach might look like this:

In [8]:
import numpy as np
np.random.seed(0) # This ensures that the output of random number generator is always consistent.

def compute_reciprocals(values):
    output = np.empty(len(values)) # Requires a separate output array to hold the results (more memory)
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output
        
values = np.random.randint(1, 10, size=5) # Create an array of 5 random ints from 1 to 9 (the upper bound 10 is excluded)
compute_reciprocals(values)

array([0.16666667, 1.        , 0.25      , 0.25      , 0.125     ])

This implementation probably feels fairly natural to someone from, say, a C or Java background. But if we measure the execution time of this code for a large input, we see that this operation is very slow, perhaps surprisingly so! We'll benchmark this with IPython's **%timeit** magic:

In [9]:
big_array = np.random.randint(1, 100, size=1000000)
%timeit compute_reciprocals(big_array)

1.49 s ± 9.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


The bottleneck here is not the arithmetic itself but the type checking and function dispatch that Python performs on every iteration of the loop. For many types of operations, NumPy provides a convenient interface into statically typed, compiled routines that avoid this overhead. This is known as a vectorized operation: you simply perform the operation on the array, and it is applied to each element.

In [10]:
print(compute_reciprocals(values)) # Standard approach adopted above
print(1.0 / values) # A vectorized operation

[0.16666667 1.         0.25       0.25       0.125     ]
[0.16666667 1.         0.25       0.25       0.125     ]


In [11]:
%timeit (1.0 / big_array)

3.9 ms ± 50.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
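%timeit is only available inside IPython or Jupyter. Outside of them, a rough comparison can be sketched with the standard library's timeit module. The array size and repeat count below are arbitrary choices, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np


def compute_reciprocals(values):
    # Same explicit loop as above, repeated so this block runs on its own
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output


sample = np.random.randint(1, 100, size=100_000)  # smaller than big_array to keep the run short

# Total time in seconds for 3 calls of each approach
loop_time = timeit.timeit(lambda: compute_reciprocals(sample), number=3)
vectorized_time = timeit.timeit(lambda: 1.0 / sample, number=3)

print(f"loop:       {loop_time:.4f} s")
print(f"vectorized: {vectorized_time:.4f} s")
```

Both approaches return the same values; only the time taken differs.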


The following table lists the arithmetic operators implemented in NumPy:

| Operator | Equivalent ufunc | Description |
| --- | --- | --- |
| + | np.add | Addition (e.g., 1 + 1 = 2) |
| - | np.subtract | Subtraction (e.g., 3 - 2 = 1) |
| - | np.negative | Unary negation (e.g., -2) |
| * | np.multiply | Multiplication (e.g., 2 * 3 = 6) |
| / | np.divide | Division (e.g., 3 / 2 = 1.5) |
| // | np.floor_divide | Floor division (e.g., 3 // 2 = 1) |
| ** | np.power | Exponentiation (e.g., 2 ** 3 = 8) |
| % | np.mod | Modulus/remainder (e.g., 9 % 4 = 1) |
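As a quick check of the table, each operator gives the same result as its ufunc. The array x below is only an illustration.

```python
import numpy as np

x = np.arange(4)  # [0 1 2 3]

# Each line prints the operator result followed by the ufunc result
print(x + 5, np.add(x, 5))            # [5 6 7 8] twice
print(-x, np.negative(x))             # [ 0 -1 -2 -3] twice
print(x // 2, np.floor_divide(x, 2))  # [0 0 1 1] twice
print(x ** 2, np.power(x, 2))         # [0 1 4 9] twice
print(x % 2, np.mod(x, 2))            # [0 1 0 1] twice
```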

### Finding the max, min and sum of an array

In [None]:
numpy_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])


In [None]:
numpy_array.max()

In [None]:
numpy_array.min()

In [None]:
numpy_array.sum()

### Finding the mean, median, variance and standard deviation of an array:

In [None]:
numpy_array.mean()

In [None]:
np.median(numpy_array)

In [None]:
numpy_array.var()

In [None]:
numpy_array.std()
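For reference, here is a sketch of what these calls return for the array above. Note that var() and std() compute the population statistics by default (ddof=0); passing ddof=1 gives the sample versions.

```python
import numpy as np

numpy_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

print(numpy_array.max())        # 9
print(numpy_array.min())        # 1
print(numpy_array.sum())        # 45
print(numpy_array.mean())       # 5.0
print(np.median(numpy_array))   # 5.0
print(numpy_array.var())        # about 6.667 (population variance, 60 / 9)
print(numpy_array.std())        # about 2.582 (square root of the variance)
print(numpy_array.var(ddof=1))  # 7.5 (sample variance, 60 / 8)
```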

### Subsetting

### How to extract specific items from an array?


In [None]:
a = np.array([1, 2, 3])

Extract the item at index 2 (the third element):

In [None]:
a[2]

In [None]:
b = np.array([[1, 2, 3],[4,5,6]])

Extract the item at row index 1 (the 2nd row) and column index 2 (the 3rd column):

In [None]:
b[1,2]

You can extract specific portions of an array using zero-based indexing, much as you would with Python lists.

But unlike lists, NumPy arrays accept as many indices in the square brackets as there are dimensions.
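The difference can be sketched by indexing the same data stored as a nested list and as an array (names are illustrative). The nested list needs one pair of brackets per level, while the array takes a comma-separated index for each dimension.

```python
import numpy as np

nested_list = [[1, 2, 3], [4, 5, 6]]
grid = np.array(nested_list)

print(nested_list[1][2])  # 6: index the outer list, then the inner list
print(grid[1, 2])         # 6: row index 1, column index 2 in one step
print(grid[1][2])         # 6: chained indexing also works on arrays

# A nested list does not accept a comma-separated index
try:
    nested_list[1, 2]
except TypeError as error:
    print("TypeError:", error)
```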



In [None]:
arr2 = np.array([[ 1,  2,  3,  4], [ 3,  4,  5,  6], [ 5,  6,  7,  8]])

Extract the first 2 rows and the first 2 columns:

In [None]:
arr2[:2, :2]

### Question 

How would you extract the 2nd column of the following array?

In [None]:
arr2d = np.array([[1,2,3],[4,5,6]])

#### Answer

Write your answer to the question in this box.

In [None]:
# Question Code Answer

### Slicing

Extract the items from index 0 up to (but not including) index 2:

In [None]:
a = np.array([1, 2, 3])
a[0:2]

Extract the items in column 1 of rows 0 up to (but not including) 2:

In [None]:
b = np.array([[1, 2, 3],[4,5,6]])
b[0:2,1]

Extract the rows up to (but not including) index 1, i.e. only the first row:

In [None]:
b[:1]
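For reference, a sketch of what the three slices above return. Note that b[:1] keeps the result two-dimensional (one row, three columns), whereas indexing with b[0] returns a one-dimensional row.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([[1, 2, 3], [4, 5, 6]])

print(a[0:2])       # [1 2]
print(b[0:2, 1])    # [2 5]
print(b[:1])        # [[1 2 3]]
print(b[:1].shape)  # (1, 3)
print(b[0].shape)   # (3,)
```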