# Introduction to Numpy

### Outline
* [Getting started](#getting-started)
* [ndarray](#ndarray)
* [Creating single-dimensional arrays with a Python list](#single-dimentional-arrays-with-python-list)
* [Array attributes](#array-attributes)
    * dtype
    * ndim
    * size
    * shape
    * itemsize
    * nbytes
* [Creating multi-dimensional arrays with Python sequences](#multi-dimentional-arrays-with-python-sequences)
* [More creational routines](#more-creational-routines)
    * zeros
    * ones
    * arange
* [Array manipulation routines](#array-manipulation-routines) 
    * reshape
    * ravel
* More routines
    * linspace
    * random
* [Structured arrays](#structured-arrays)
* [Basic operations](#basic-operations)
    * Scalar addition
    * Scalar multiplication
    * Element-wise addition
    * Element-wise subtraction
    * Element-wise multiplication
    * Matrix dot product
* [Universal functions](#universal-functions)
* Aggregate functions
    * sum
    * min 
    * max 
    * mean 
    * std
* Indexing
* Slicing
* Iteration
* Conditions and boolean arrays
* Array manipulation
    * joining arrays
        * vstack
        * hstack
        * column_stack
        * row_stack
    * splitting arrays
        * hsplit
        * vsplit
        * split
* Vectorization
* Broadcasting
* Reading and writing data files
    * saving data as binary
    * loading binary data
    * reading tabular data

<a id="getting-started"></a>
### Getting Started

By convention, NumPy is imported under the alias np.

In [1]:
import numpy as np

<a id="ndarray"></a>
### ndarray

An [ndarray](https://numpy.org/devdocs/reference/arrays.ndarray.html#) is a multidimensional homogeneous array with a predetermined number of items.

* Homogeneous means that all items have the same [dtype](https://numpy.org/doc/1.17/reference/arrays.dtypes.html?highlight=dtype) and therefore the same size.
* The data type is described by a separate NumPy object called a dtype (data-type).
* Each ndarray is associated with exactly one dtype.
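
A quick way to see homogeneity in action: when a list mixes integers and a float, NumPy picks one dtype that can hold every element and converts the rest. The values below are illustrative only.

```python
import numpy as np

# A list mixing Python ints and a float.
mixed = np.array([1, 2, 3.5])

# NumPy chooses a single dtype for the whole array, so the ints become floats.
print(mixed.dtype)       # float64
print(mixed.tolist())    # [1.0, 2.0, 3.5]

# Every element has the same size in bytes.
print(mixed.itemsize)    # 8
```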

<a id="single-dimentional-arrays-with-python-list"></a>
### Creating single-dimensional arrays with a Python list

Arrays can be constructed using [creational routines](https://numpy.org/devdocs/reference/generated/numpy.array.html#numpy.array).

The critic scores below come from Metacritic ([data source](https://www.metacritic.com/movie/serenity/critic-reviews)).

In [2]:
serenity_movie_critic_scores = [
    100, 91, 90, 90, 88, 88, 80, 80, 80, 80, 
     80, 80, 78, 75, 75, 75, 75, 75, 75, 75, 
     75, 75, 75, 75, 70, 70, 70, 70, 70, 70,
     63, 50, 50, 50
]

Next, we pass the list to [numpy.array](https://numpy.org/devdocs/reference/generated/numpy.array.html#numpy.array), asking for 32-bit integer elements (np.int32) via the dtype parameter.

In [3]:
x = np.array(serenity_movie_critic_scores, dtype=np.int32)

In [4]:
x

array([100,  91,  90,  90,  88,  88,  80,  80,  80,  80,  80,  80,  78,
        75,  75,  75,  75,  75,  75,  75,  75,  75,  75,  75,  70,  70,
        70,  70,  70,  70,  63,  50,  50,  50], dtype=int32)

In [5]:
type(x)

numpy.ndarray

<a id="array-attributes"></a>
### Array Attributes

[Array attributes](https://numpy.org/doc/1.17/reference/arrays.ndarray.html#array-attributes) reflect information that is intrinsic to the array itself. 

In [6]:
# Data-type of the array’s elements.
x.dtype

dtype('int32')

In [7]:
# Number of array dimensions.
x.ndim

1

In [8]:
# Number of elements in the array.
x.size

34

In [9]:
# Tuple of array dimensions.
x.shape

(34,)

In [10]:
# Length of one array element in bytes.
x.itemsize

4

In [11]:
# Total bytes consumed by the elements of the array.
# Same as x.size * x.itemsize
x.nbytes

136
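
As a small sanity check of the relationship in the comment above, here is a sketch using a separate 16-bit integer array (the name small is just for illustration).

```python
import numpy as np

# Each np.int16 element occupies 2 bytes.
small = np.array([1, 2, 3, 4, 5], dtype=np.int16)
print(small.itemsize)  # 2
print(small.nbytes)    # 10

# nbytes always equals size * itemsize for the element buffer.
assert small.nbytes == small.size * small.itemsize
```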

**Knowledge check**  
When we created the array, we specified that each item should have a data type of np.int32.  
How would the value of the itemsize attribute change if we changed the data type to np.int64?  
Try it out!

<a id="multi-dimentional-arrays-with-python-sequences"></a>
### Creating Multi-Dimensional Arrays with Python Sequences

Matrix resources:
* [Khan Academy - Introduction to matrices](https://www.khanacademy.org/math/precalculus/x9e81a4f98389efdf:matrices/x9e81a4f98389efdf:mat-intro/v/introduction-to-the-matrix)
* [Wikipedia - Matrix](https://en.wikipedia.org/wiki/Matrix_(mathematics))

Let's create a 2D array containing the quarterly US sales for Harley-Davidson motorcycles.

|  Year  | Q1     | Q2     | Q3     |
| ------ | ------ | ------ | ------ |
| 2018   | 29309  | 46490  | 36220  |
| 2019   | 28091  | 42762  | 34903  |

Data sources:
* [2019 Q1](https://investor.harley-davidson.com/static-files/34d087e4-95d5-45ff-9098-fe8c09bee292)
* [2019 Q2](https://investor.harley-davidson.com/static-files/2a5df0f5-6ea5-4860-803e-bc31828e0526)
* [2019 Q3](https://investor.harley-davidson.com/static-files/51bb4f70-77c6-4526-9c87-046c3a7c0f5e)



In [12]:
# We could also have used a list of lists or a tuple of tuples.
hd_us_sales = [(29309, 46490, 36220), (28091, 42762, 34903)]

In [13]:
hd_us_sales

[(29309, 46490, 36220), (28091, 42762, 34903)]

In [14]:
# dtype is omitted here, so NumPy infers it from the contents of the sequence.
# For Python ints this is usually the platform's default integer type
# (int64 on most 64-bit systems), as the dtype attribute below shows.
X = np.array(hd_us_sales)

In [15]:
X

array([[29309, 46490, 36220],
       [28091, 42762, 34903]])

In [16]:
type(X)

numpy.ndarray

Once again, let's have a look at the attributes associated with this ndarray.

In [17]:
X.dtype

dtype('int64')

In [18]:
X.ndim

2

In [19]:
X.size

6

In [20]:
X.shape

(2, 3)

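X has two axes (X.ndim is 2). The first axis has length 2 (one row per year) and the second has length 3 (one column per quarter). The number of axes is sometimes called the array's rank, but that is different from the linear-algebra rank of a matrix, which np.linalg.matrix_rank computes below. For X, both happen to be 2.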

In [21]:
np.linalg.matrix_rank(X)

2
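
To see that the number of axes and the matrix rank are different ideas, here is a small sketch with a made-up 2x3 array whose two rows are identical: it still has two axes, but only one linearly independent row.

```python
import numpy as np

# Hypothetical data: both rows are the same.
same_rows = np.array([[1, 2, 3],
                      [1, 2, 3]])

print(same_rows.ndim)                    # 2 (number of axes)
print(np.linalg.matrix_rank(same_rows))  # 1 (linear-algebra rank)
```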

<a id="more-creational-routines"></a>
### More Creational Routines

Sometimes we want to create an array whose elements all start with the same default value.

#### Zeros
Let's create an array of a given shape filled with zeros.  
The [zeros creation routine](https://numpy.org/devdocs/reference/generated/numpy.zeros.html#numpy.zeros) requires a shape. Its most commonly used optional parameters are dtype, which defaults to float, and order, which selects row-major ('C') or column-major ('F') memory layout.

numpy.zeros(shape, dtype=float, order='C')

In [22]:
x = np.zeros((2, 3), dtype=int)

In [23]:
x

array([[0, 0, 0],
       [0, 0, 0]])
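
Two details worth checking, sketched below: without dtype, zeros produces float64 elements, and the related zeros_like routine copies the shape and dtype of an existing array. The hd array here simply reuses the Harley-Davidson figures from earlier.

```python
import numpy as np

# No dtype given, so the elements are float64. An int works as a 1-D shape.
z = np.zeros(3)
print(z.dtype)       # float64
print(z.shape)       # (3,)

# zeros_like takes its shape and dtype from another array.
hd = np.array([[29309, 46490, 36220], [28091, 42762, 34903]])
blank = np.zeros_like(hd)
print(blank.shape)   # (2, 3)
```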

#### Ones

Similarly, the [ones creational routine](https://numpy.org/devdocs/reference/generated/numpy.ones.html#numpy.ones) will generate an array where every element's value is one.

In [24]:
x = np.ones((4, 4))

In [25]:
x

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])
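
Like zeros, ones defaults to float64, which is why the output above shows 1. rather than 1. A short sketch of requesting integers instead, and of np.full for a constant other than 0 or 1:

```python
import numpy as np

# Pass dtype to get integer ones instead of floats.
int_ones = np.ones((2, 3), dtype=int)
print(int_ones.dtype.kind)  # i  (kind code for signed integers)

# np.full fills an array of the given shape with any value.
sevens = np.full((2, 2), 7)
print(sevens)
# [[7 7]
#  [7 7]]
```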

#### Range

The [arange creational routine](https://numpy.org/devdocs/reference/generated/numpy.arange.html#numpy.arange) is useful when we need an array of evenly spaced values within a given interval. The interval is half-open: start is included and stop is excluded.

numpy.arange([start, ]stop, [step, ]dtype=None)

In [26]:
x = np.arange(10)
y = np.arange(20, 32)
z = np.arange(40, 50, 2)

In [27]:
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [28]:
y

array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31])

In [29]:
z

array([40, 42, 44, 46, 48])
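
The half-open interval is easy to verify, and a fractional step produces a float array. A brief sketch with arbitrary bounds:

```python
import numpy as np

# 5 is included, 10 is not.
ints = np.arange(5, 10)
print(ints.tolist())       # [5, 6, 7, 8, 9]

# A fractional step yields floats.
quarters = np.arange(0, 1, 0.25)
print(quarters.tolist())   # [0.0, 0.25, 0.5, 0.75]
print(quarters.dtype)      # float64
```

For non-integer steps, the NumPy documentation recommends numpy.linspace instead, because floating-point rounding can make the number of elements returned by arange hard to predict.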

<a id="array-manipulation-routines"></a>
### Array manipulation routines

NumPy provides many [array manipulation routines](https://numpy.org/devdocs/reference/routines.array-manipulation.html#array-manipulation-routines). Let's explore some that change the shape of an array.
Keep in mind that these functions usually return a new array object (often a view that shares the original data) rather than modifying the existing array in place.

#### reshape

The [reshape function](https://numpy.org/devdocs/reference/generated/numpy.reshape.html#numpy.reshape) gives a new shape to an array without changing its data.

In [30]:
np.reshape(x, (2, 5))

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [31]:
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

**Knowledge check**  
x remains unchanged. Why?  
How can we fix it?  
Was this behavior expected? Where is this documented?

In [32]:
y.reshape((3, 4))

array([[20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

**Note**  
We could also have chained these calls on a single line.

In [33]:
y = np.arange(20, 32).reshape((3, 4))
y

array([[20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])
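
reshape also lets one dimension be given as -1, in which case NumPy infers it from the array's size. The new shape must account for exactly the same number of elements, as this sketch (with an arbitrary 12-element array) shows:

```python
import numpy as np

flat = np.arange(12)

# -1 means "whatever length makes the sizes match".
print(flat.reshape(3, -1).shape)   # (3, 4)
print(flat.reshape(-1, 6).shape)   # (2, 6)

# 5 * 3 = 15 elements cannot be filled from 12, so this raises ValueError.
raised = False
try:
    flat.reshape(5, 3)
except ValueError as err:
    raised = True
    print("cannot reshape:", err)
```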

#### ravel
The [ravel function](https://numpy.org/devdocs/reference/generated/numpy.ravel.html#numpy.ravel) returns a contiguous flattened array.

In [34]:
y = np.ravel(y)

In [35]:
y

array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31])
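
A related method, flatten, also produces a 1-D array. The difference, sketched below with np.shares_memory, is that ravel returns a view of the original data when it can, while flatten always makes a copy.

```python
import numpy as np

m = np.arange(6).reshape(2, 3)

r = np.ravel(m)    # a view here, because m is contiguous
f = m.flatten()    # always a fresh copy

print(np.shares_memory(m, r))  # True
print(np.shares_memory(m, f))  # False
print(r.tolist() == f.tolist())  # True: same values either way
```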

<a id="structured-arrays"></a>
### Structured Arrays

Jen Barber, the Relationship Manager of the IT Department, would like to use NumPy for the organization's infrastructure data, but she needs to be able to manipulate the data by field (column) rather than by row.

She has provided you with a snippet of the data structure for disaster recovery assets. 

| Site | UPS Count | Generator Count |
| ---- | --------- | --------------- |
| US   | 4         | 2               |
| UK   | 2         | 1               |
| IN   | 3         | 2               |


Using this information, let's create a [structured array](https://docs.scipy.org/doc/numpy-1.10.1/user/basics.rec.html) to prove to her that her requirements can be met.

In [36]:
infrastructure = np.array( [(4, 2), (2, 1), (3, 2)], dtype=[('ups_count', 'i4'), ('gen_count', 'i4')])

In [37]:
infrastructure

array([(4, 2), (2, 1), (3, 2)],
      dtype=[('ups_count', '<i4'), ('gen_count', '<i4')])

In [38]:
infrastructure['ups_count']

array([4, 2, 3], dtype=int32)

In [39]:
infrastructure['gen_count']

array([2, 1, 2], dtype=int32)
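
The array above leaves out the Site column from Jen's table. As a hypothetical extension, a structured dtype can also hold text: here the site code is stored as a 2-character Unicode string ('U2'), and each field can still be used on its own.

```python
import numpy as np

# Hypothetical extension: add a 'site' field alongside the two counts.
site_dtype = [('site', 'U2'), ('ups_count', 'i4'), ('gen_count', 'i4')]
sites = np.array([('US', 4, 2), ('UK', 2, 1), ('IN', 3, 2)], dtype=site_dtype)

print(sites.dtype.names)            # ('site', 'ups_count', 'gen_count')
print(sites['site'])                # ['US' 'UK' 'IN']
print(sites['ups_count'].sum())     # 9
```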

<a id="basic-operations"></a>
### Basic operations

#### Scalar Addition

Jen is satisfied with the structured array. She tells you that Reynholm Industries needs to improve its site uptime. To do so, the company intends to add two more uninterruptible power supply (UPS) units and one more generator to each site.

Perform the necessary operations to update the infrastructure object according to these requirements. 

In [40]:
infrastructure['ups_count'] += 2

In [41]:
infrastructure

array([(6, 2), (4, 1), (5, 2)],
      dtype=[('ups_count', '<i4'), ('gen_count', '<i4')])

In [42]:
infrastructure['gen_count'] += 1

In [43]:
infrastructure

array([(6, 3), (4, 2), (5, 3)],
      dtype=[('ups_count', '<i4'), ('gen_count', '<i4')])
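
The same scalar addition works on any numeric ndarray. Note the difference between + and +=: the first returns a new array, while the second updates the existing one in place, as in the cells above. A sketch with a plain array of the original UPS counts:

```python
import numpy as np

ups = np.array([4, 2, 3])

# ups + 2 adds the scalar to every element and returns a new array.
added = ups + 2
print(added)   # [6 4 5]
print(ups)     # [4 2 3]  (unchanged)

# += writes the result back into the existing array instead.
ups += 2
print(ups)     # [6 4 5]
```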

#### Scalar Multiplication

Recent site reliability improvements at Reynholm Industries have not gone unnoticed, and the company is now growing faster than ever. Denholm Reynholm has informed the IT department that they are expecting a 20% growth in the usage of cloud-hosted services. Jen Barber has once again asked for your help, this time in estimating the total increase in operational costs. For simplicity's sake, you assume that costs will also rise by 20%. Given the current costs below, how can you use NumPy to arrive at an estimate?


| Provider | Cloud Compute | Cloud Storage | 
| -------- | ------------- | ------------- |
| AWS      | 2,500         | 3,200         |
| GCP      | 4,000         | 250           |
| Azure    | 1,000         | 750           |


Let's convert this table into an ndarray.

In [44]:
cloud_cost = np.array( [(2500, 3200), (4000, 250), (1000, 750)] )

In [45]:
cloud_cost

array([[2500, 3200],
       [4000,  250],
       [1000,  750]])

We can now use scalar multiplication to compute the estimated cost increase for each element.

In [46]:
cloud_cost * 0.20  # element-wise operation

array([[500., 640.],
       [800.,  50.],
       [200., 150.]])

We can then call the sum aggregate method on the result to arrive at the total estimated increase. We will cover aggregate functions in more detail in a later section.

In [47]:
(cloud_cost * .20).sum()

2340.0
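
Because every element is scaled by the same factor, summing first and scaling once gives the same answer (up to floating-point rounding). sum also accepts an axis argument; as a preview of aggregate functions, axis=1 totals each provider's row.

```python
import numpy as np

cloud_cost = np.array([(2500, 3200), (4000, 250), (1000, 750)])

total = cloud_cost.sum()
print(total)   # 11700

# Scale-then-sum matches sum-then-scale.
print(np.isclose((cloud_cost * 0.20).sum(), total * 0.20))  # True

# One total per provider (AWS, GCP, Azure).
print(cloud_cost.sum(axis=1))   # [5700 4250 1750]
```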

#### Element-wise addition

Denholm, the director of Reynholm Industries, is an avid believer in teamwork. As such, he has requested a single ndarray containing the combined total of each measurement across all of his teams.


**Blue Team**

| Month   | Commit Count | Issues Resolved | New Features |
| ------- | ------------ | --------------- | ------------ |
| Jan     | 100          | 3               | 1            |
| Feb     | 250          | 2               | 2            |    
| Mar     | 235          | 3               | 2            |


**Red Team**

| Month   | Commit Count | Issues Resolved | New Features |
| ------- | ------------ | --------------- | ------------ |
| Jan     | 75           | 1               | 0            |
| Feb     | 137          | 1               | 1            |
| Mar     | 200          | 2               | 2            | 


Let's start by putting the data from these tables into ndarray objects.

In [48]:
blue_team = np.array([ (100, 3, 1), (250, 2, 2), (235, 3, 2) ])

In [49]:
blue_team

array([[100,   3,   1],
       [250,   2,   2],
       [235,   3,   2]])

In [50]:
red_team = np.array([ (75, 1, 0), (137, 1, 1), (200, 2, 2) ])

In [51]:
red_team

array([[ 75,   1,   0],
       [137,   1,   1],
       [200,   2,   2]])

In [52]:
blue_team + red_team

array([[175,   4,   1],
       [387,   3,   3],
       [435,   5,   4]])
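
The + operator here is shorthand for the np.add ufunc. Element-wise operations also need compatible shapes: in the sketch below, adding a 2x3 slice of red_team to the 3x3 blue_team raises a ValueError because the rows cannot be paired up.

```python
import numpy as np

blue_team = np.array([(100, 3, 1), (250, 2, 2), (235, 3, 2)])
red_team = np.array([(75, 1, 0), (137, 1, 1), (200, 2, 2)])

# + and np.add produce the same result.
print(np.array_equal(blue_team + red_team, np.add(blue_team, red_team)))  # True

# A (3, 3) array and a (2, 3) array are not compatible.
raised = False
try:
    blue_team + red_team[:2]
except ValueError as err:
    raised = True
    print("incompatible shapes:", err)
```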

#### Element-wise subtraction

After some consideration, Denholm has decided that he would like a better picture of how each team performed against its initial target.

**Q1 Target For Each Team**

| Month   | Commit Count | Issues Resolved | New Features |
| ------- | ------------ | --------------- | ------------ |
| Jan     | 100          | 1               | 2            |
| Feb     | 150          | 2               | 3            |
| Mar     | 200          | 3               | 4            | 

In [53]:
q1_target = np.array([ (100, 1, 2), (150, 2, 3), (200, 3, 4) ])

In [54]:
q1_target

array([[100,   1,   2],
       [150,   2,   3],
       [200,   3,   4]])

In [55]:
blue_team - q1_target

array([[  0,   2,  -1],
       [100,   0,  -1],
       [ 35,   0,  -2]])

In [56]:
red_team - q1_target

array([[-25,   0,  -2],
       [-13,  -1,  -2],
       [  0,  -1,  -2]])

**Thought experiment**  
Let's pretend for a moment that the measurements taken here are:  

a) actually meaningful towards the company's end goals of happier customers, higher profits, and an engaged staff.  
b) directly associated with a healthy balance of quality and productivity rather than the result of metric fixation.  

Now that you have suspended disbelief, which team performed better?

#### Element-wise Multiplication

Matrix multiplication resources (for contrast: the `*` operator used below multiplies element by element, which is not a matrix product):  
[Wikipedia - Matrix Multiplication](https://en.wikipedia.org/wiki/Matrix_multiplication)  
[Khan Academy - Matrix vector products](https://www.khanacademy.org/math/linear-algebra/vectors-and-spaces/null-column-space/v/matrix-vector-products?modal=1)




During a managers' meeting, word spread that NumPy is an incredibly useful tool. The marketing department is preparing for a major recruiting event and would like to make sure it has enough swag to hand out to potential future employees.

Using element-wise multiplication, we can help them work out how many units of each product they currently have in stock.

**Swag Inventory**

| Product     | Pallets In Storage |
| ----------- | ------------------ |
| Coffee Mugs | 2                  | 
| Mouse Mats  | 3                  |
| Stickers    | 1                  |



| Product     | Units Per Pallet |
| ----------- | ---------------- |
| Coffee Mugs | 8,000            |
| Mouse Mats  | 20,000           |
| Stickers    | 150,000          |



In [57]:
A = np.array([ (2, 3, 1)])

In [58]:
A

array([[2, 3, 1]])

In [59]:
B = np.array([ (8000, 20000, 150000) ])

In [60]:
B

array([[  8000,  20000, 150000]])

In [61]:
C = A * B

In [62]:
C

array([[ 16000,  60000, 150000]])
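
Since the resources above cover matrix multiplication, here is a sketch contrasting it with the element-wise product. With the @ operator, A (shape (1, 3)) times the transpose of B (shape (3, 1)) gives a (1, 1) result whose single entry is the sum of the pairwise products. Adding mugs, mats, and stickers into one number is only meant to illustrate the arithmetic.

```python
import numpy as np

A = np.array([(2, 3, 1)])               # pallets in storage, shape (1, 3)
B = np.array([(8000, 20000, 150000)])   # units per pallet, shape (1, 3)

# Element-wise: each product's pallet count times its units per pallet.
units_per_product = A * B
print(units_per_product)   # [[ 16000  60000 150000]]

# Matrix product: (1, 3) @ (3, 1) -> (1, 1), the sum of the pairwise products.
grand_total = A @ B.T
print(grand_total)         # [[226000]]
```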

<a id="universal-functions"></a>
### Universal Functions

A [universal function](https://numpy.org/doc/1.17/reference/ufuncs.html) (or ufunc for short) is a function that operates on ndarrays in an element-by-element fashion.
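
A few ufuncs in a short sketch with arbitrary values: np.sqrt applies to each element independently, arithmetic operators map to ufuncs such as np.multiply, and binary ufuncs provide extra methods such as reduce.

```python
import numpy as np

values = np.array([1, 4, 9, 16])

# Applied element by element.
print(np.sqrt(values))          # [1. 2. 3. 4.]

# values * 2 calls the same ufunc as np.multiply(values, 2).
print(np.multiply(values, 2))   # [ 2  8 18 32]

# reduce repeatedly applies the ufunc along an axis: 1 + 4 + 9 + 16.
print(np.add.reduce(values))    # 30
```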


**Required Units**

| Product     | Units Required   |
| ----------- | ---------------- |
| Coffee Mugs | 30,000           |
| Mouse Mats  | 50,000           |
| Stickers    | 150,000          |


In [63]:
R = np.array([30000, 50000, 150000])

In [64]:
R

array([ 30000,  50000, 150000])

In [65]:
np.greater_equal(C, R)

array([[False,  True,  True]])
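
Going one step further, other ufuncs can work out how many units marketing still needs to order. In this sketch, on_hand repeats C as a 1-D array and required uses the values from the Required Units table; np.maximum clips negative gaps (surpluses) to zero.

```python
import numpy as np

on_hand = np.array([16000, 60000, 150000])    # C from above, flattened to 1-D
required = np.array([30000, 50000, 150000])   # from the Required Units table

# The >= operator calls the same ufunc as np.greater_equal.
print(on_hand >= required)                    # [False  True  True]

# Units still to order; a negative gap means a surplus, so clip at zero.
shortfall = np.maximum(required - on_hand, 0)
print(shortfall)                              # [14000     0     0]
```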