## Content


- **Introduction to use case**

- **2-D arrays (Matrices)**
    - `reshape()`
    - 2 Questions
    - Transpose
    - Converting Matrix back to Vector - `flatten()`

- **Indexing and Slicing on 2D**
    - Indexing
    - Slicing
    - Masking (Fancy Indexing)

- **Universal Functions (ufunc) on 2D**
    - Aggregate Function/ Reduction functions - `sum()`, `mean()`, `min()`, `max()`
    - Axis argument
    - Logical Operations


### Use Case: Fitbit

### Imagine you are a Data Scientist at Fitbit

You've been given user data to analyse and asked to find insights that can be shown on the smartwatch.

#### But why would we want to analyse the user data when designing the watch?

Insights from the user data can help the business make customer-oriented decisions about the product design.



#### Let's first look at the data we have gathered

Link: https://drive.google.com/file/d/1Uxwd4H-tfM64giRS1VExMpQXKtBBtuP0/view?usp=sharing

<img src='https://drive.google.com/uc?id=1Uxwd4H-tfM64giRS1VExMpQXKtBBtuP0'>


#### Notice that there are some user features in the data

These are provided as the columns of the data.

#### Every row is called a record or data point


#### What are all the features provided to us? 

- Date
- Step Count
- Mood (Categorical)
- Calories Burned
- Hours of sleep
- Feeling Active (Categorical)


**Using NumPy, we will explore this data to look for some interesting insights - Exploratory Data Analysis.**

#### EDA is all about asking the right questions

#### What kind of questions can we answer using this data?

- How many records and features are there in the dataset?
- What is the **average step count**?
- On which day was the **step count highest/lowest?**


#### Can we find some deeper insights?

We can probably see how daily activity affects sleep and mood.

We will try to find out:
- How does daily activity affect mood?

In [None]:
import numpy as np

## Working with 2-D arrays (Matrices)


#### Question : How do we create a matrix using numpy?

In [None]:
m1 = np.array([[1,2,3],[4,5,6]])
m1
# Nicely printing out in a Matrix form

array([[1, 2, 3],
       [4, 5, 6]])

How can we check shape of a numpy array?

In [None]:
m1.shape # m1 has 2 rows and 3 columns

(2, 3)

#### Question: What is the type of the result of `m1.shape`? Which data structure is this?
Tuple

#### Now, What is the dimension of this array?

In [None]:
m1.ndim

2

#### Question
```

a = np.array([[1,2,3],
              [4,5,6],
              [7,8,9]])

b = len(a)
```
What'll be the value of b? 

```
Ans: 3
```
**Explanation: `len()` of an n-D array gives the size of its first dimension (for a matrix, the number of rows)**


In [None]:
a = np.array([[1,2,3],
              [4,5,6],
              [7,8,9]])

In [None]:
a

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [None]:
len(a)

3

#### What will be the shape of array `a`?

In [None]:
a.shape

(3, 3)

- So, it is a **2-D array** with **3 rows and 3 columns**

Clearly, if **we have to create high-dimensional arrays, we cannot do this using `np.arange()`** directly

### How can we create high dimensional arrays?

- Using `reshape()`

For a 2D array
- **First argument** is **no. of rows**
- **Second argument** is **no. of columns**

In [None]:
m2 = np.arange(1, 13)
m2

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

- We can pass the **desired dimensions** of array in `reshape()`

#### In what ways can we convert this array with 12 values into high-dimensional array?

#### Can we make `m2` a $4\times4$ array?

- Obviously NO
- **$4\times4$ requires 16 values**, but **we only have 12 in `m2`**

In [None]:
m2 = np.arange(1, 13)
m2.reshape(4, 4)

ValueError: cannot reshape array of size 12 into shape (4,4)

#### So, What are the ways in which we can reshape it?

- $4\times3$
- $3\times4$
- $6\times2$
- $2\times6$
- $1\times12$
- $12\times1$
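
The valid 2D shapes listed above are exactly the factor pairs of 12. A minimal sketch that enumerates them and checks that each reshape succeeds (the loop and the name `valid_shapes` are illustrative, not part of the original notebook):

```python
import numpy as np

m2 = np.arange(1, 13)  # 12 values

# Every (rows, cols) pair whose product is 12 is a valid 2D shape
valid_shapes = [(r, m2.size // r) for r in range(1, m2.size + 1) if m2.size % r == 0]
print(valid_shapes)

for rows, cols in valid_shapes:
    reshaped = m2.reshape(rows, cols)
    # reshape never changes the total number of elements
    assert reshaped.size == m2.size
```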

In [None]:
m2 = np.arange(1, 13)
m2.reshape(4, 3)

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [None]:
m2 = np.arange(1, 13)
m2

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

In [None]:
m2.shape

(12,)

#### Let's do some reshaping here

In [None]:
m2.reshape(12, 1)

array([[ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5],
       [ 6],
       [ 7],
       [ 8],
       [ 9],
       [10],
       [11],
       [12]])

#### Now, what's the difference between `(12,)` and `(12, 1)`?

- **`(12,)`** means it's a **1D array** with 12 elements
- **`(12, 1)`** means it's a **2D array** with 12 rows and 1 column
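
The difference shows up as soon as we index the two arrays. A minimal sketch (the names `v` and `col` are illustrative, not from the notebook):

```python
import numpy as np

v = np.arange(1, 13)      # shape (12,)   -> 1D
col = v.reshape(12, 1)    # shape (12, 1) -> 2D with a single column

print(v.ndim, col.ndim)   # number of dimensions of each array
print(v[3])               # one index reaches a scalar in the 1D array
print(col[3])             # one index gives a whole row (a length-1 array) in the 2D array
print(col[3, 0])          # two indices are needed to reach the scalar
```

Both arrays hold the same 12 values; only the number of axes differs.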

#### Question

What will be output for the following code?
```
a = np.array([[1,2,3],[0,1,4]])
print(a.ndim)
```

**Ans: 2**


In [None]:
a = np.array([[1,2,3],[0,1,4]])
print(a.ndim)

2


Since it is a 2-dimensional array, `ndim` is 2.

### Transpose

- **Change rows into columns and columns into rows**

- Just use `<Matrix>.T`

In [None]:
a = np.arange(3)
a

array([0, 1, 2])

In [None]:
a.T

array([0, 1, 2])

#### Why did transpose not work?

- Because **numpy sees `a` as a vector (3,), NOT a matrix**; it has only one axis, so there is nothing to swap

- We'll have to **reshape the vector `a` to make it a matrix**

In [None]:
a = np.arange(3).reshape(1, 3)
a
# Now a has dimensions (1, 3) instead of just (3,)
# It has 1 row and 3 columns

array([[0, 1, 2]])

In [None]:
a.T
# It has 3 rows and 1 column

array([[0],
       [1],
       [2]])



#### Conclusion

- **Transpose only has a visible effect on matrices (2D or higher)**; on a 1D array `.T` returns the array unchanged
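
To see the effect on a matrix that is not just a single row, here is a small sketch on a $3\times4$ array (the names `A`, `At` and `v` are illustrative):

```python
import numpy as np

A = np.arange(12).reshape(3, 4)
At = A.T

print(A.shape, At.shape)   # rows and columns swap places

# Element (i, j) of A becomes element (j, i) of A.T
i, j = 1, 3
print(A[i, j], At[j, i])

# On a 1D array there is only one axis, so .T changes nothing
v = np.arange(3)
print(v.T.shape)
```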


### Flattening of an array

#### What if we want to convert this 2D or nD array back to a 1D array?

The `flatten()` method helps you do so.



In [None]:
A = np.arange(12).reshape(3, 4)
A

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [None]:
A.flatten() 

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
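
One detail worth knowing: `flatten()` returns a copy, so changing the flattened array leaves the original matrix untouched. A minimal sketch (the name `flat` is illustrative):

```python
import numpy as np

A = np.arange(12).reshape(3, 4)
flat = A.flatten()

flat[0] = 99          # modify the flattened copy
print(flat.shape)     # 1D array holding all 12 elements
print(A[0, 0])        # the original matrix still holds its old value
```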

## Indexing and Slicing on 2D Numpy arrays





### Indexing in np arrays

- Works same as lists

In [None]:
m1 = np.arange(1,10).reshape((3,3))

In [None]:
m1

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [None]:
m1[1][2]

6

OR

- We can write `m1[1, 2]` (**indexes separated by commas**)

#### What will be the output of this?

In [None]:
m1[1, 1] #m1[row, column]

5
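
Both notations reach the same element. A short sketch comparing them (the names `chained` and `direct` are illustrative):

```python
import numpy as np

m1 = np.arange(1, 10).reshape((3, 3))

chained = m1[1][2]   # first take row 1, then element 2 of that row
direct = m1[1, 2]    # row 1, column 2 in a single indexing step
print(chained, direct)

# Negative indices count from the end, as with lists
print(m1[-1, -1])
```

The comma form is usually preferred because it performs one indexing operation instead of two.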

#### We saw how we can use a list of indexes on a numpy array

In [None]:
m1 = np.array([100,200,300,400,500,600])

In [None]:
m1[[2,3,4,1,2,2]]

array([300, 400, 500, 200, 300, 300])



How will a list of indexes work on a 2D array?

In [None]:
m1 = np.arange(9).reshape((3,3))

In [None]:
m1

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [None]:
m1[[0,1,2],[0,1,2]] # picking up element (0,0), (1,1) and (2,2)

array([0, 4, 8])
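
The two lists are paired up position by position: the first list supplies row indexes and the second supplies column indexes. As a sketch of the same idea with a different pairing (the names `rows`, `cols` and `anti_diag` are illustrative):

```python
import numpy as np

m1 = np.arange(9).reshape((3, 3))

rows = [0, 1, 2]
cols = [2, 1, 0]
# Picks elements (0,2), (1,1) and (2,0): the anti-diagonal
anti_diag = m1[rows, cols]
print(anti_diag)
```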

### Slicing

- Need to **provide two slice ranges** - one for **row** and one for **column**
- Can also **mix Indexing and Slicing**

In [None]:
m1 = np.arange(12).reshape(3,4)
m1

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [None]:
m1[:2] # gives first two rows

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

#### How can we get columns from 2D array?

In [None]:
m1[:, :2] # gives first two columns

array([[0, 1],
       [4, 5],
       [8, 9]])

**Question: Given a 2-D array**

```
m1 = [[0,1,2,3],
     [4,5,6,7],
     [8,9,10,11]]

```

In [None]:
m1 = m1.reshape((3,4))

#### Question for you: Can you just get this much of our array `m1`?
```
[[5, 6],
 [9, 10]]
```

#### Remember our `m1` is:
```
m1 = [[0, 1, 2, 3],
      [4, 5, 6, 7],
      [8, 9, 10, 11]]
```

In [None]:
# First get rows 1 to all
# Then get columns 1 to 3 (not included)
m1[1:, 1:3]

array([[ 5,  6],
       [ 9, 10]])

#### Question: What if I need the columns at index 1 and 3?
```
[[1, 3],
 [5, 7],
 [9,11]]
```

In [None]:
# Get all rows
# Then get columns from 1 to all with step of 2

m1[:, 1::2]

array([[ 1,  3],
       [ 5,  7],
       [ 9, 11]])

- **We can also pass indices of required columns as a Tuple** to get the same result

In [None]:
# Get all rows
# Then get columns 1 and 3

m1[:, (1,3)]

array([[ 1,  3],
       [ 5,  7],
       [ 9, 11]])
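
A quick check that the two approaches agree, plus one thing an explicit index list can do that a slice cannot (the names `by_slice`, `by_list` and `reordered` are illustrative):

```python
import numpy as np

m1 = np.arange(12).reshape(3, 4)

by_slice = m1[:, 1::2]     # columns 1 and 3 via a stepped slice
by_list = m1[:, [1, 3]]    # the same columns via an explicit list of indexes
print(np.array_equal(by_slice, by_list))

# An index list can pick columns in any order
reordered = m1[:, [3, 0]]
print(reordered)
```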

### Fancy indexing (Masking)


#### What would happen if we do this?


In [None]:
m1 = np.arange(12).reshape(3, 4)
m1 < 6

array([[ True,  True,  True,  True],
       [ True,  True, False, False],
       [False, False, False, False]])


- A **matrix having boolean values** `True` and `False` is returned 



- **We can use this boolean matrix to filter our array**


#### Now, Let's use this to filter or mask values from our array

- **Condition will be passed instead of indices and slice ranges**

In [None]:
m1[m1 < 6]
# Value corresponding to True is retained
# Value corresponding to False is filtered out

array([0, 1, 2, 3, 4, 5])




#### How can we filter/mask even values from our array?

In [None]:
m1[m1%2 == 0]

array([ 0,  2,  4,  6,  8, 10])

#### But did you notice that matrix gets converted into a 1D array after masking?

In [None]:
m1

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [None]:
m1[m1%2 == 0]

array([ 0,  2,  4,  6,  8, 10])

#### It happens because

- To retain the matrix shape, it **would have to retain all the elements**
- It **cannot keep its $3\times4$ shape with fewer elements**
- So, this filtering operation **returns the selected elements as a 1D array**
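
A small sketch that makes the shapes explicit (the name `mask` is illustrative):

```python
import numpy as np

m1 = np.arange(12).reshape(3, 4)
mask = m1 % 2 == 0

print(mask.shape)        # the mask has the same shape as m1
print(mask.sum())        # True counts as 1, so this is the number of kept elements
print(m1[mask].shape)    # the result is 1D with that many elements
```

Counting the `True` values with `mask.sum()` is one way to find out how many elements the filtered array will contain.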


#### If we want, we can reshape the resulting 1D array into 2D

- But, we need to know **beforehand** the **number of elements** in the resulting 1D array, so that the new shape fits it exactly

In [None]:
m1[m1%2==0].shape

(6,)

In [None]:
m1[m1%2==0].reshape(2, 3)

array([[ 0,  2,  4],
       [ 6,  8, 10]])

## Universal Functions (`ufunc`) on 2D & Axis 

### Aggregate Functions/ Reduction functions

We saw how aggregate functions work on 1D arrays in the last class



In [None]:
arr = np.arange(3)
arr

array([0, 1, 2])

In [None]:
arr.sum()

3

#### Let's apply Aggregate functions on 2D array

### `np.sum()`

In [None]:
a = np.arange(12).reshape(3, 4)
a

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [None]:
np.sum(a)  # sums all the values present in array

66

#### What if we want to sum the elements row-wise or column-wise?

- By **setting `axis` parameter**

#### What will `np.sum(a, axis=0)` do?

- **`np.sum(a, axis=0)` adds together values in DIFFERENT rows**, giving one sum per column
- **`axis = 0` ---> Changes will happen along the vertical axis**
- Summing of values happens **in the vertical direction**
- Rows collapse/merge when we do `axis=0`


In [None]:
np.sum(a, axis=0)

array([12, 15, 18, 21])

#### Now, What if we specify `axis=1`?

- **`np.sum(a, axis=1)` adds together values in DIFFERENT columns**, giving one sum per row
- **`axis = 1` ---> Changes will happen along the horizontal axis**
- Summing of values happens **in the horizontal direction**
- Columns collapse/merge when we do `axis=1`

In [None]:
np.sum(a, axis=1)

array([ 6, 22, 38])
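
To make the direction of each `axis` concrete, here is a sketch that compares `np.sum()` with sums computed explicitly per column and per row (the `manual_*` names are illustrative):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

col_sums = np.sum(a, axis=0)   # collapses the 3 rows    -> one value per column
row_sums = np.sum(a, axis=1)   # collapses the 4 columns -> one value per row
print(col_sums.shape, row_sums.shape)

# The same results computed explicitly, one column / one row at a time
manual_cols = [int(a[:, j].sum()) for j in range(a.shape[1])]
manual_rows = [int(a[i, :].sum()) for i in range(a.shape[0])]
print(manual_cols, manual_rows)
```

The axis you pass is the one that disappears from the result's shape.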

***

#### Now, What if we want to find the average value or median value of all the elements in an array?

In [None]:
np.mean(a) # no need to give any axis

5.5

#### What if we want to find the mean of elements in each row or in each column?

- We can do **same thing with `axis` parameter** like we did for `np.sum()` function


#### Question: What will `np.mean(a, axis=0)` give?

- It will give **mean of values in DIFFERENT rows**, i.e. one mean per column
- **`axis = 0` ---> Changes will happen along the vertical axis**
- Mean of values will be calculated **in the vertical direction**
- Rows collapse/merge when we do `axis=0`

In [None]:
np.mean(a, axis=0)

array([4., 5., 6., 7.])

#### How can we get the mean of elements in each row?

- **`np.mean(a, axis=1)` will give mean of values in DIFFERENT columns**, i.e. one mean per row
- **`axis = 1` ---> Changes will happen along the horizontal axis**
- Mean of values will be calculated **in the horizontal direction**
- Columns collapse/merge when we do `axis=1`

In [None]:
np.mean(a, axis=1)

array([1.5, 5.5, 9.5])
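
The median was mentioned earlier alongside the mean. `np.median()` accepts the same `axis` argument; a minimal sketch on the same array:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

print(np.median(a))           # median of all 12 values
print(np.median(a, axis=0))   # median of each column
print(np.median(a, axis=1))   # median of each row
```

For this evenly spaced data the medians happen to coincide with the means; on skewed data they generally differ.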

#### Now, we want to find the minimum value in the array

**`np.min()` function can help us with this**

In [None]:
a

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [None]:
np.min(a)

0

#### What if we want to find row wise minimum value?

### Use `axis` argument!!

In [None]:
np.min(a, axis = 1 )

array([0, 4, 8])

#### We can also find max elements in an array.

**`np.max()`** function will give us *maximum value in the array*

We can also use `axis` argument to find row wise/ column wise max.

In [None]:
a

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [None]:
np.max(a) # maximum value

11

In [None]:
np.max(a, axis = 0) # column wise max

array([ 8,  9, 10, 11])
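
For completeness, a sketch of the two remaining combinations, row-wise maximum and column-wise minimum (the names `row_max` and `col_min` are illustrative):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

row_max = np.max(a, axis=1)   # largest value in each row
col_min = np.min(a, axis=0)   # smallest value in each column
print(row_max)
print(col_min)

# The method form gives the same result as the function form
print(np.array_equal(a.max(axis=1), row_max))
```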

### Logical Operations

#### Now, What if we want to check whether "any" element of array follows a specific condition?

#### Let's say we have 2 arrays:

In [None]:
a = np.array([1,2,3,4])
b = np.array([4,3,2,1])
a, b

(array([1, 2, 3, 4]), array([4, 3, 2, 1]))

#### Let's say we want to find out if any element in array `a` is smaller than the corresponding element in array `b`


#### `np.any()` comes in handy here


- `np.any()` returns `True` if **at least one element** of its argument is `True`; here the argument is the boolean array `a < b`, so it tells us whether **any pair of corresponding elements** satisfies the condition.


In [None]:
a = np.array([1,2,3,4])
b = np.array([4,3,2,1])
np.any(a<b) # At least 1 element in a < corresponding element in b

True

#### Let's try the same condition with different arrays:

In [None]:
a = np.array([4,5,6,7])
b = np.array([4,3,2,1])
np.any(a<b) # All elements in a >= corresponding elements in b

False

- In this case, **NONE of the elements in `a` were smaller than their corresponding elements in `b`**

- So, `np.any(a<b)` returned `False`

***

#### What if we want to check whether "all" the elements in our array are non-zero or follow a specified condition?

`np.all()` can help here.

#### Let's say we want to find out if all the elements in array `a` are smaller than their corresponding elements in array `b`

Again, let's say we have 2 arrays:


In [None]:
a = np.array([1,2,3,4])
b = np.array([4,3,2,1])
a, b

(array([1, 2, 3, 4]), array([4, 3, 2, 1]))

In [None]:
np.all(a<b) # Not all elements in a < corresponding elements in b

False

#### Let's try it with different arrays

In [None]:
a = np.array([1,0,0,0])
b = np.array([4,3,2,1])
np.all(a<b) # All elements in a < corresponding elements in b

True

- In this case, **ALL the elements in `a` were smaller than their corresponding elements in `b`**

- So, `np.all(a<b)` returned `True`

#### Multiple conditions for `.all()` function

In [None]:
a = np.array([1, 2, 3, 2])
b = np.array([2, 2, 3, 2])
c = np.array([6, 4, 4, 5])
((a <= b) & (b <= c)).all()

True
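
The examples so far use 1D arrays, but `np.any()` and `np.all()` also work on 2D arrays and accept the same `axis` argument as the aggregate functions. A minimal sketch (the result names are illustrative):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

any_big = np.any(a > 10)                # is any element greater than 10?
all_non_negative = np.all(a >= 0)       # are all elements non-negative?
rows_with_big = np.any(a > 6, axis=1)   # per row: does the row contain a value > 6?
cols_all_big = np.all(a > 2, axis=0)    # per column: are all values in the column > 2?

print(any_big, all_non_negative)
print(rows_with_big)
print(cols_all_big)
```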

#### What if we want to update an array based on a condition?

Suppose you are given an array of integers and you want to update it based on the following conditions:
- if an element is > 0, change it to +1
- if an element is < 0, change it to -1

#### How will you do it?


In [None]:
arr = np.array([-3,4,27,34,-2, 0, -45,-11,4, 0 ])
arr

array([ -3,   4,  27,  34,  -2,   0, -45, -11,   4,   0])

You can use masking to update the array (as discussed in the last class)

In [None]:
arr[arr > 0]  = 1
arr [arr < 0] = -1

In [None]:
arr

array([-1,  1,  1,  1, -1,  0, -1, -1,  1,  0])

There is also a numpy function which can help us with this.

#### np.where()

Function signature: 
`np.where(condition, [x, y])`

This function returns an ndarray whose elements are taken from `x` where the condition is `True` and from `y` where it is `False`.



In [None]:
arr = np.array([-3,4,27,34,-2, 0, -45,-11,4, 0 ])


In [None]:
np.where(arr > 0, +1, -1)

array([-1,  1,  1,  1, -1, -1, -1, -1,  1, -1])

In [None]:
arr

array([ -3,   4,  27,  34,  -2,   0, -45, -11,   4,   0])

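Notice that it didn't change the original array; `np.where()` returns a new one.

Also notice that the zeros became `-1` here: `arr > 0` is `False` for `0`, so `np.where()` picks the second value for them. The masking approach above left the zeros untouched.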
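
If the zeros should stay `0`, as they did with masking, one option is to nest a second `np.where()` call. This is a sketch of one possible approach, not the only one (the name `signs` is illustrative):

```python
import numpy as np

arr = np.array([-3, 4, 27, 34, -2, 0, -45, -11, 4, 0])

# Inner where handles the "not positive" case: -1 for negatives, 0 otherwise
signs = np.where(arr > 0, 1, np.where(arr < 0, -1, 0))
print(signs)

# np.sign gives the same result for this integer input
print(np.array_equal(signs, np.sign(arr)))
```

As before, `arr` itself is left unchanged.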