## <code style="background:yellow;color:black">Summary:</code> 

1. reshape() method
2. Method chaining
3. Transpose of a 2D numpy matrix
4. Nested or chained indexing and Concise indexing of 2D numpy matrices
5. Slicing on 2D numpy matrices
6. Aggregate functions
7. diag() method
8. Logical operations
9. where() function and its use
10. Fitbit Case Study

<hr style="border: 1px solid gray;">

# <code style="background:yellow;color:black">2D Arrays:</code>

**Today we will work primarily with 2D arrays, because most real-world data arrives as a table of rows and columns.
A table is simply a 2D array, that is, a two-dimensional representation of data.**

In [1]:
import numpy as np

In [2]:
a = np.array(range(16))

a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

In [3]:
a.shape

(16,)

In [4]:
a.ndim

1

<hr style="border: 1px solid gray;">

# <code style="background:yellow;color:black">reshape() method:</code>

* In the last session we learnt about the np.array() function, which creates an nD numpy array from scratch, i.e. from nested python lists.
* Let's suppose we want to convert a 1D array to a 2D array. We can use the array's reshape() method (also available as the function np.reshape()).
* **<code style="background:yellow;color:black">The reshape() method</code> in Numpy is used to change the shape or dimensions of a NumPy array while keeping the same elements.** 
* **It allows us to reorganize the elements of an array into a new shape, provided that the total number of elements remains the same.**
* The reshape() method takes the new shape that we want for our array. The shape can be passed either as separate integers, as in a.reshape(8, 2), or as a single tuple of dimensions, as in a.reshape((8, 2)).
* The product of the dimensions in the new shape should equal the total number of elements in the original array. If it doesn't, we will get a ValueError because it's not possible to reshape the array into the specified shape without changing the total number of elements.
* **Format of function is: <code style="background:yellow;color:black">np_array.reshape(num_rows, num_columns)</code>**

In the example below, a.reshape(8, 2) converts the 1D array into a 2D array with 8 rows and 2 columns. Once we have a numpy array, we can rearrange its structure freely, as long as the number of elements stays the same.

In [5]:
a.reshape(8, 2)

array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11],
       [12, 13],
       [14, 15]])

In [6]:
a.reshape(4, 4)

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

**In the code below, our array has 16 elements, which cannot fill the 20 slots of a 4 x 5 array. Always make sure that the product of rows and columns equals the total number of elements in the original array.**

In [7]:
a.reshape(4, 5)

ValueError: cannot reshape array of size 16 into shape (4,5)
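As a small sketch of the rule above, the element count can be checked before reshaping. The helper name `can_reshape` is hypothetical and introduced only for illustration; it is not part of NumPy.

```python
import numpy as np

a = np.array(range(16))

def can_reshape(arr, rows, cols):
    # Hypothetical helper: a reshape is valid only when rows * cols equals arr.size
    return rows * cols == arr.size

print(can_reshape(a, 4, 4))   # 16 == 16
print(can_reshape(a, 4, 5))   # 20 != 16

# reshape() also accepts the shape as a single tuple
same = a.reshape((4, 4))

# reshape() returns a new array object; "a" itself keeps its 1D shape
print(a.shape, same.shape)
```

Because reshape() does not change "a" in place, the result has to be assigned to a name if we want to keep it.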

* **<code style="background:yellow;color:black">This reshape() method is actually very powerful.</code> In our array "a", we have 16 elements, so if we ask for 8 rows, the number of columns can only be 2; there is no other option when we have 16 values in our data.** 
* Let's suppose the data has millions of elements and we don't want to calculate the number of columns ourselves. We can simply give the 2nd argument as -1, as in a.reshape(8, -1). The -1 tells numpy to calculate that dimension automatically.
* In general, -1 can be used as one of the dimensions in the new shape, and numpy will calculate it from the size of the array and the other specified dimensions. This is useful when we want to reshape an array without specifying all dimensions explicitly.
* Note that the -1 must be written explicitly. Passing only the number of rows, as in a.reshape(8), does not work here: it asks for a 1D array of 8 elements and raises a ValueError.
* We can do this for rows as well. With a.reshape(-1, 4), numpy computes that since there are 16 total elements and we want 4 columns, there must be 4 rows to make the product of rows and columns 16.
* The example code below, a.reshape(-1, -1), is ambiguous, because multiple shapes satisfy the condition: 4 x 4, 2 x 8 and 8 x 2 all work. Numpy cannot arbitrarily choose one of them, so it raises an error.
* **<code style="background:yellow;color:black">So, in order for numpy to compute any one of the arguments, we MUST provide the other argument.</code> We can only specify one unknown dimension; the other dimension or argument needs to be specified.**

In [8]:
a.reshape(8, -1)

array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11],
       [12, 13],
       [14, 15]])

In [9]:
a.reshape(-1, 4)

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [10]:
a.reshape(-1, -1)

ValueError: can only specify one unknown dimension
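A minimal sketch of what -1 stands for: the missing dimension is the total size divided by the known dimension. The variable names below are illustrative.

```python
import numpy as np

a = np.arange(16)

rows = 8
cols = a.size // rows          # what numpy infers for the -1 in a.reshape(8, -1)

inferred = a.reshape(rows, -1)
explicit = a.reshape(rows, cols)

print(inferred.shape)                      # (8, 2)
print(np.array_equal(inferred, explicit))  # True

# If the known dimension does not divide the size, -1 cannot be resolved
try:
    a.reshape(5, -1)
except ValueError as err:
    print("ValueError:", err)
```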

* **Since 2D numpy arrays behave like python 2D (nested) lists in this respect, we can get the number of rows by using the len() function.**
* **We can get the number of columns as len(a[0]). Here len(a[0]) = len(a[1]) = len(a[2]), because in a matrix every row contains the same number of columns.** 

In [11]:
a = a.reshape(8, 2)

a

array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11],
       [12, 13],
       [14, 15]])

In [12]:
len(a)  

8

In [13]:
len(a[0])  

2

In [14]:
b = np.arange(12)

b

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [15]:
b = b.reshape(3, 4)

b

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [16]:
len(b)

3

In [17]:
len(b[0])  

4
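len() works, but numpy also stores both dimensions in the shape attribute used earlier in this session. A short sketch that reads rows and columns in one step:

```python
import numpy as np

b = np.arange(12).reshape(3, 4)

n_rows, n_cols = b.shape       # unpack (rows, columns) in one step

print(n_rows, n_cols)          # 3 4
print(len(b) == n_rows)        # True
print(len(b[0]) == n_cols)     # True
print(b.size)                  # 12, total number of elements
```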

<hr style="border: 1px solid gray;">

# <code style="background:yellow;color:black">Method Chaining:</code>

* **<code style="background:yellow;color:black">Method chaining</code> in NumPy refers to the practice of applying multiple methods or operations in a sequence, where each method is called on the result of the previous one. This allows for concise and readable code when performing a series of operations on arrays.**

* In the code cells above, we followed these steps: 
    * Step 1- We created a numpy array of 12 elements by calling the np.arange() function.
    * Step 2- We reshaped that array by calling its reshape() method.

* **In method chaining, we can do it all together. We can chain both the methods one after the other in a single step.**

* **Advantage: The syntax is compact. When two tasks are always done one after the other, chaining saves typing and reads well. Method chaining does not make the code run faster; it only makes it tidier.**

* Optimization refers to the performance of code, not to the shape and size of the code. 

* Still, keeping code clean, readable and quick to write is vital, because most of the time we work in a team whose members need to understand our code, and we may have to collaborate on a single python file.

* **Following are two examples of method chaining.**

In [18]:
# Creating an example array
arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# Example of method chaining
result = (
    arr             # Starting with the original array
    .sum(axis = 0)  # Sum of each column -> [12, 15, 18]
    .mean()         # Calculate the mean of the column sums
)

print(result)

15.0


In [19]:
b = np.arange(12).reshape(3, 4)

b

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [20]:
b.shape

(3, 4)
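To see that chaining changes only the style and not the result, the sketch below builds the same array in two steps and in one chained expression, then compares them.

```python
import numpy as np

# Two separate steps
step = np.arange(12)
step = step.reshape(3, 4)

# One chained expression
chained = np.arange(12).reshape(3, 4)

print(np.array_equal(step, chained))   # True

# Chains can be longer: create, reshape, then sum each column
col_totals = np.arange(12).reshape(3, 4).sum(axis=0)
print(col_totals)                      # [12 15 18 21]
```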

<hr style="border: 1px solid gray;">

# <code style="background:yellow;color:black">Transpose of a matrix:</code>

* **The transpose of a matrix is obtained by swapping its rows and columns.**

* **In NumPy, we can obtain the transpose of a matrix using the <code style="background:yellow;color:black">.T attribute</code> or the <code style="background:yellow;color:black">numpy.transpose() function.</code>** 

* **T represents the transpose of any matrix in numpy. Transposing flips a matrix over its main diagonal: rows become columns and columns become rows, sort of like a pivot. It is not a 90 degree rotation, which would also reverse the order of the rows or columns. Transpose of a matrix is readily available in numpy.**

* Transpose is a fundamental linear algebra operation that is used for various purposes in mathematics, science, and engineering.

* **<code style="background:yellow;color:black">Why do we need Transpose?</code> We will do a fitbit case study to see the application of transpose.**

* In image processing, transposing a matrix, often combined with reversing its rows or columns, is used for operations such as rotating or flipping images.

* In linear regression and related statistical techniques, the transpose is used when calculating coefficients or solving the normal equations.

* If we have data with rows representing variables and columns representing observations, we can transpose it to switch the orientation for better analysis or visualization.

**Following are a few examples illustrating the transpose of matrices.**

In [21]:
b

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [22]:
b.T

array([[ 0,  4,  8],
       [ 1,  5,  9],
       [ 2,  6, 10],
       [ 3,  7, 11]])

In [23]:
b.T.shape

(4, 3)

In [24]:
b

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [25]:
# Creating an example matrix
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Transposing the matrix using the .T attribute
transposed_matrix = matrix.T

print("Original matrix:")
print(matrix)

print("\nTransposed matrix:")
print(transposed_matrix)

Original matrix:
[[1 2 3]
 [4 5 6]
 [7 8 9]]

Transposed matrix:
[[1 4 7]
 [2 5 8]
 [3 6 9]]


In [26]:
# Creating an example matrix
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Transposing the matrix using numpy.transpose()
transposed_matrix = np.transpose(matrix)

print("Original matrix:")
print(matrix)

print("\nTransposed matrix:")
print(transposed_matrix)

Original matrix:
[[1 2 3]
 [4 5 6]
 [7 8 9]]

Transposed matrix:
[[1 4 7]
 [2 5 8]
 [3 6 9]]
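A short sketch contrasting the transpose with a rotation, using a non-square matrix. np.rot90 appears here only for comparison; it is not used elsewhere in these notes.

```python
import numpy as np

b = np.arange(12).reshape(3, 4)

# .T and np.transpose() give the same result
print(np.array_equal(b.T, np.transpose(b)))   # True

# Element (i, j) of b becomes element (j, i) of b.T
print(b[1, 3], b.T[3, 1])                     # 7 7

# Transposing twice returns the original matrix
print(np.array_equal(b.T.T, b))               # True

# A 90 degree rotation has the same shape as the transpose but different contents
print(np.rot90(b).shape, b.T.shape)           # (4, 3) (4, 3)
print(np.array_equal(np.rot90(b), b.T))       # False
```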


<hr style="border: 1px solid gray;">

# <code style="background:yellow;color:black">Indexing of a matrix:</code>

* **Indexing in 2D NumPy arrays involves specifying the row and column indices to access or manipulate individual elements or subsets of the array. The basic syntax is array[row_index, column_index].**

* **In the following example, the expression b[0][1] gives us the value located in the first row (index 0) and the second column (index 1) of the NumPy array "b".** 

* b[0][1] corresponds to the element at the intersection of the first row and the second column, which is 1. This is how we access a specific element in a multi-dimensional array in NumPy. **This type of indexing is called <code style="background:yellow;color:black">nested indexing or double indexing or chained indexing.</code>**

* **In the next code cell, the expression b[0, 1] uses a form of indexing that we will call <code style="background:yellow;color:black">concise indexing</code>. It only works with numpy arrays -> NOT WITH PYTHON LISTS.**

* **At first this may not look like much of an advantage, and it can even seem confusing. Besides making it easy to pick diagonal elements, which we discuss further below, concise indexing has other advantages:** 

    * **<u>Performance:</u>** Concise indexing is often more efficient. When we use b[0, 1], NumPy accesses the element in a single indexing operation, whereas nested indexing b[0][1] first builds the intermediate row b[0] and then indexes into it.
    * **<u>Multi-dimensional Arrays:</u>** When working with multi-dimensional arrays (more than two dimensions), concise indexing becomes essential for clarity and ease of use.
    * **<u>Broadcasting:</u>** NumPy supports broadcasting, which allows operations on arrays with different shapes. When lists of indexes are used, as in c[[0,1,2], [0,1,2]] further below, the row and column index lists are broadcast against each other, and concise indexing is what makes this pairing of row and column indexes possible.

In [27]:
b[0][1] 

1

In [28]:
b[0, 1] 

1
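The sketch below compares the two styles on a numpy array and on a plain nested list holding the same data. Variable names are illustrative.

```python
import numpy as np

b = np.arange(12).reshape(3, 4)
nested = b.tolist()            # the same data as a plain Python nested list

# On a numpy array both styles return the same element
print(b[0][1] == b[0, 1])      # True

# Nested indexing first builds the intermediate row b[0], then indexes into it
row = b[0]
print(row[1])                  # 1

# A Python list supports only nested indexing
print(nested[0][1])            # 1
try:
    nested[0, 1]
except TypeError as err:
    print("TypeError:", err)
```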

* **Let's suppose we have two numpy arrays called "c" and "d", as in the code below.**

* This builds on a concept from the previous lecture. We learnt that if we have a numpy array like "d", then **inside the index brackets we can provide a list of indexes, and all of the corresponding elements in the array will be returned.**

* This is possible with a 1D np array. But is this kind of thing possible for a 2D np array? **<code style="background:yellow;color:black">YES. We provide one list of row indexes and one list of column indexes. In other words, we are asking numpy to pick the (i, j)th elements from the array.</code>**

* Inside the index brackets we provide 2 lists of indexes, one for the rows and one for the columns. With c[[0,1,2], [0,1,2]], it returns an array containing the diagonal elements of the "c" array. 

* So basically, it picks the (0,0)th, (1,1)th and (2,2)th elements from the array and returns them as an array of diagonal elements.

* **We can pick ANY NUMBER of elements in a 2D array by providing lists in concise indexing, unlike b[0, 1], where we get just one element.**

* Just as this works with 1D arrays, where we provide only one list of positions, it also works with 2D arrays, where we provide both a row list and a column list.

In [29]:
c = np.arange(9).reshape(3, 3)

c

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [30]:
d = np.arange(10)

d

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [31]:
d[[6,7,8,1]]

array([6, 7, 8, 1])

In [32]:
c

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [33]:
c[[0,1,2], [0,1,2]] 

array([0, 4, 8])

**We can pick any elements; they don't have to lie on the diagonal or follow any pattern. Here we pick the elements at (0, 1) and (2, 1).**

In [34]:
c[[0,2], [1,1]]

array([1, 7])

**Here column index 5 does not exist, because the array has only 3 columns (valid indexes 0 to 2). So we get an out of bounds error.**

In [35]:
c[[0,2], [1,5]] 

IndexError: index 5 is out of bounds for axis 1 with size 3

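**Here the row list has 3 indexes but the column list has only 2, so they cannot be paired up. This gives a shape mismatch error.**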

In [36]:
c[[1,2,0], [1,2]]

IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (3,) (2,) 

**Lets suppose we have a numpy array called test like this:**

In [37]:
test = np.arange(64).reshape(8,8)

test

array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29, 30, 31],
       [32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47],
       [48, 49, 50, 51, 52, 53, 54, 55],
       [56, 57, 58, 59, 60, 61, 62, 63]])

* **The code below works as follows:** 
    * First, len(test) is evaluated, which is 8. So we have range(8), and list(range(8)) produces the list of numbers from 0 to 7.
    * So, the expression essentially looks like this: test[[0,1,2,3,4,5,6,7], [0,1,2,3,4,5,6,7]]. 
    * This picks the (0,0)th, (1,1)th, (2,2)th, (3,3)th, (4,4)th elements and so on, returning the diagonal elements.

In [38]:
test[list(range(len(test))),list(range(len(test)))]

array([ 0,  9, 18, 27, 36, 45, 54, 63])
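The two list(range(...)) calls can also be written with np.arange, and the result can be cross-checked against np.diag, which is covered later in this session. A sketch:

```python
import numpy as np

test = np.arange(64).reshape(8, 8)

idx = np.arange(len(test))     # array([0, 1, ..., 7])
diagonal = test[idx, idx]      # pairs up (0,0), (1,1), ..., (7,7)

print(diagonal)                                 # [ 0  9 18 27 36 45 54 63]
print(np.array_equal(diagonal, np.diag(test)))  # True

# Reversing the column indexes picks the anti-diagonal instead
anti_diagonal = test[idx, idx[::-1]]
print(anti_diagonal)                            # [ 7 14 21 28 35 42 49 56]
```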

* **<code style="background:yellow;color:black">Fancy indexing or masking</code> also works on 2D numpy arrays; the code below returns a 1D array.**

* The selected elements (here 21 of them) generally cannot be arranged back into a rectangular 2D shape; for example, 21 values do not fit the original 8 x 8 layout. Numpy does not try to guess new dimensions, so it simply returns the result as a 1D array.

* The crux is that masking on 2D arrays works just like masking on 1D arrays. The only difference is that the result is always a 1D array, whatever the shape of the input.

In [39]:
test[test < 21]

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20])

In [40]:
test

array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29, 30, 31],
       [32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47],
       [48, 49, 50, 51, 52, 53, 54, 55],
       [56, 57, 58, 59, 60, 61, 62, 63]])

In [41]:
test < 21

array([[ True,  True,  True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True, False, False, False],
       [False, False, False, False, False, False, False, False],
       [False, False, False, False, False, False, False, False],
       [False, False, False, False, False, False, False, False],
       [False, False, False, False, False, False, False, False],
       [False, False, False, False, False, False, False, False]])
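A sketch showing what the mask does: the boolean array has the shape of the input, while the selected values come back flattened. Counting the True values tells us how long the result will be.

```python
import numpy as np

test = np.arange(64).reshape(8, 8)

mask = test < 21
selected = test[mask]

print(mask.shape)        # (8, 8), same shape as test
print(selected.shape)    # (21,), always 1D
print(mask.sum())        # 21, each True counts as 1

# Masks can be combined with & (and) and | (or); the parentheses are required
between = test[(test >= 10) & (test < 15)]
print(between)           # [10 11 12 13 14]
```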

<hr style="border: 1px solid gray;">

# <code style="background:yellow;color:black">Slicing of a matrix:</code>

* **<code style="background:yellow;color:black">Slicing in 2D NumPy arrays</code> involves extracting a portion of the array by specifying ranges for both rows and columns. <code style="background:yellow;color:black">The basic syntax is array[start_row:end_row, start_column:end_column].</code>**

* Having covered indexing in 2D arrays, we now cover slicing in 2D arrays, which is just as important.

* Similar to how we can provide indexes for both rows and columns, we can also provide slices for both rows and columns.

* **<code style="background:yellow;color:black">By default slicing is done on rows: when only one slice is given, it applies to the rows.</code>**

In [43]:
a = np.arange(12).reshape(3, 4)

a

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

* **In the code below, slicing gives us the 0th row and the 1st row; the end index 2 is not included.**

* The slice is applied to the rows, and we get all the columns. 

* **So when we have a 2D array, a single slice means we want a subset of the rows, with all columns.**

* A slice can also mean that we want a subset of the columns with all the rows. This is explained in the notes.

* A slice can also mean that we want neither all the rows nor all the columns, for example the 1st and 2nd rows but only a few columns; basically, some rows and some columns.

* **<code style="background:yellow;color:black">All 3 of the possibilities discussed above can occur when slicing.</code>**

### <code style="background:yellow;color:black">First possibility:</code> 

**This is the simplest case: we provide a single slice, as in the code below:**

In [44]:
a[:2] 

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

### <code style="background:yellow;color:black">Second possibility:</code> 

* If we write a[:], it returns all the data: the start and end are taken by default, so everything in between is selected.

* **We can add a comma after the colon and then add a second slice for the columns. So, essentially we are saying that we want all the rows, and the columns from index 1 up to (but not including) 3, as shown below.**

* It returns all rows with the 1st and 2nd columns, and NOT the 0th and 3rd columns.

* **We can have any subset of rows and column elements that we want, which is illustrated in the next code cell.**

* Whenever we work with 2D numpy arrays, which is what we will do in the real world 99% of the time, we have added capabilities such as concise indexing, 2D slicing and more that we will study further. Python lists support basic indexing and slicing too, but numpy extends these to rows and columns at the same time.

In [45]:
a[:, 1:3]

array([[ 1,  2],
       [ 5,  6],
       [ 9, 10]])

In [46]:
a[:2, 1:3]

array([[1, 2],
       [5, 6]])
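The cases above can be compared side by side by looking at the shape of each slice. The last lines show one thing to be careful about, based on standard NumPy behaviour rather than on these notes: a slice is a view of the original array, so writing into it changes the original unless we call copy().

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

print(a[:2].shape)        # (2, 4): some rows, all columns
print(a[:, 1:3].shape)    # (3, 2): all rows, some columns
print(a[:2, 1:3].shape)   # (2, 2): some rows, some columns

# A slice is a view: writing into it also changes "a"
view = a[:2, 1:3]
view[0, 0] = 100
print(a[0, 1])            # 100

# copy() gives an independent array
independent = a[:2, 1:3].copy()
independent[0, 0] = -1
print(a[0, 1])            # still 100
```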

### <code style="background:yellow;color:black">Third possibility:</code> 

**The third possibility is adding a step size (jump value), which also works when slicing 2D arrays.**

In [10]:
test

array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29, 30, 31],
       [32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47],
       [48, 49, 50, 51, 52, 53, 54, 55],
       [56, 57, 58, 59, 60, 61, 62, 63]])

In [11]:
test[3:7:2, 1:6:3] 

array([[25, 28],
       [41, 44]])
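To make the step slice easier to read, the sketch below lists the row and column indexes that each slice visits and rebuilds the same block from them. A slice start:stop:step visits the same indexes as range(start, stop, step).

```python
import numpy as np

test = np.arange(64).reshape(8, 8)

# Indexes visited by the row slice 3:7:2 and the column slice 1:6:3
row_idx = list(range(3, 7, 2))   # [3, 5]
col_idx = list(range(1, 6, 3))   # [1, 4]
print(row_idx, col_idx)

sliced = test[3:7:2, 1:6:3]

# Build the same block element by element from the index lists
manual = np.array([[test[r, c] for c in col_idx] for r in row_idx])

print(sliced)
print(np.array_equal(sliced, manual))   # True
```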

In [12]:
a = np.array([1, 2, 3, 4, 5])

b = np.array([8, 7, 6])

**Pick everything after the 3rd element.**

In [13]:
a[3:]

array([4, 5])

**The negative step makes the slice start from the end and move backwards in jumps of 2, so "6" and "8" will be picked.**

In [14]:
b[::-2]

array([6, 8])

**Take the result of b[::-2] and assign it to the positions selected by a[3:].**

In [15]:
a[3:] = b[::-2]

**So we are replacing result of a[3:] that is 4,5 with result of b[::-2] that is 6,8 to finally get the array "a".**

In [16]:
a

array([1, 2, 3, 6, 8])

<hr style="border: 1px solid gray;">

# <code style="background:yellow;color:black">Aggregate functions:</code>

* **<code style="background:yellow;color:black">Aggregate functions are basically mathematical operations.</code>** 
* If we have numpy imported, we won't have to import the math module in 99% of the cases.
* **Aggregate functions in Numpy are functions that operate on an array or a portion of an array to compute a single, summary statistic or value.** 
* These functions allow us to perform common mathematical and statistical operations on arrays, often summarizing the data in some way. Aggregate functions are particularly useful for data analysis and processing. 

**<code style="background:yellow;color:black">Some common aggregate functions in NumPy include:</code>**

- numpy.sum(): Computes the sum of all elements in an array or along a specified axis.

- numpy.mean(): Calculates the mean (average) of the elements in an array or along a specified axis.

- numpy.median(): Computes the median (middle value) of the elements in an array.

- numpy.min(): Returns the minimum value in an array.

- numpy.max(): Returns the maximum value in an array.

- numpy.var(): Computes the variance of the elements in an array, measuring the spread or dispersion of the data.

- numpy.std(): Calculates the standard deviation of the elements in an array, which is a measure of the data's dispersion around the mean.

- numpy.prod(): Computes the product of all elements in an array or along a specified axis.

- numpy.cumsum(): Calculates the cumulative sum of elements along a specified axis.

- numpy.cumprod(): Computes the cumulative product of elements along a specified axis.

- numpy.percentile(): Calculates the specified percentile value (e.g., 25th, 50th, or 75th percentile) of the data.

- numpy.histogram(): Generates a histogram of the data, returning the counts and bin edges.

- numpy.unique(): Returns the sorted unique elements in an array; with return_counts=True it also returns their counts.

- numpy.bincount(): Counts occurrences of non-negative integers in an array, returning a count for each integer.

- numpy.nanmean(), numpy.nanmedian(), and other "nan" functions: These are similar to their non-"nan" counterparts, but they ignore NaN (Not-a-Number) values in the computation.

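**Some of these (such as sum, min and max) exist in python itself, so why are we studying and using them in numpy? Because the numpy versions work directly on arrays and, as we will see below, can aggregate along a chosen axis of a 2D array.**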

In [47]:
a = np.arange(1, 11)

a

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [48]:
np.sum(a)

55

In [49]:
np.mean(a)

5.5

In [50]:
np.min(a)

1

In [51]:
np.max(a)

10

In [52]:
a = np.arange(12).reshape(3, 4)

a

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

* Implementing row-wise and column-wise sums on a python 2D list is long and cumbersome: we have to write custom code.

* **But numpy can do that with just one line of code. It's very useful because in excel we frequently create grand totals of columns or rows, and numpy can compute them directly.**

* The code below will add all the elements.

In [53]:
np.sum(a)

66

**This code will sum each column. In our array "a", axis = 0 is the vertical axis. So it adds all elements of column 0, then all elements of column 1 and so on, and returns an array of these column totals.** 

In [54]:
np.sum(a, axis = 0)

array([12, 15, 18, 21])

**Similarly we can do row wise sum by specifying axis = 1.**

In [55]:
np.sum(a, axis = 1)

array([ 6, 22, 38])

In [56]:
np.mean(a, axis = 0)

array([4., 5., 6., 7.])
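For comparison, here is a sketch of the column-wise and row-wise sums written with plain Python lists, next to the NumPy one-liners. The nested list holds the same data as the array "a".

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
rows = a.tolist()

# Plain Python: explicit loops (written here as comprehensions)
col_sums = [sum(row[j] for row in rows) for j in range(len(rows[0]))]
row_sums = [sum(row) for row in rows]

print(col_sums)                 # [12, 15, 18, 21]
print(row_sums)                 # [6, 22, 38]

# NumPy: one call each
print(a.sum(axis=0).tolist() == col_sums)   # True
print(a.sum(axis=1).tolist() == row_sums)   # True

# The output has one entry per column for axis=0 and one per row for axis=1
print(a.sum(axis=0).shape, a.sum(axis=1).shape)   # (4,) (3,)
```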

In [57]:
m = np.arange(16).reshape(4, 4)

m

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

**How do we get the sum of all diagonal elements? We can pick them with concise indexing and then sum them.**

In [65]:
n = m[[0, 1, 2, 3], [0, 1, 2, 3]]

np.sum(n)

30
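The index lists do not have to be typed by hand. As a sketch, the same total can be obtained with np.diag (covered in the next section) or with np.trace, which returns the sum of the main diagonal directly.

```python
import numpy as np

m = np.arange(16).reshape(4, 4)

by_index = m[[0, 1, 2, 3], [0, 1, 2, 3]].sum()
by_diag = np.diag(m).sum()
by_trace = np.trace(m)

print(by_index, by_diag, by_trace)   # 30 30 30
```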

<hr style="border: 1px solid gray;">

## <code style="background:yellow;color:black">diag() method:</code> 

**In NumPy, <code style="background:yellow;color:black">the diag() method</code> is used to extract the diagonal elements of a 2D array or construct a diagonal matrix from a 1D array. The behavior of the diag() method depends on the input provided. Here's an overview of its usage:**

### <code style="background:yellow;color:black">Extracting Diagonal Elements:</code>

* **If the input is a 2D array, diag() returns a 1D array containing the diagonal elements of the input array.** 
* We can also specify an offset to extract diagonals parallel to the main diagonal.
* **The k parameter, when specified, determines the diagonal offset. A positive k value corresponds to diagonals above the main diagonal, while a negative k value corresponds to diagonals below the main diagonal.**

In [63]:
m

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [61]:
np.diag(m)

array([ 0,  5, 10, 15])

In [62]:
np.diag(m, k = 1)

array([ 1,  6, 11])

In [64]:
np.diag(m, k = -1)

array([ 4,  9, 14])

### <code style="background:yellow;color:black">Constructing Diagonal Matrix:</code>

* **If the input is a 1D array, diag() constructs a 2D diagonal matrix with the elements of the input array on its main diagonal.**

In [66]:
# Creating a 1D array
arr_1d = np.array([1, 2, 3])

# Constructing a diagonal matrix
np.diag(arr_1d)

array([[1, 0, 0],
       [0, 2, 0],
       [0, 0, 3]])
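The k offset also works when constructing a matrix, and the two uses of diag() undo each other. A brief sketch:

```python
import numpy as np

arr_1d = np.array([1, 2, 3])

# k = 1 places the values on the diagonal above the main one,
# so the matrix grows to 4 x 4 to make room
shifted = np.diag(arr_1d, k=1)
print(shifted.shape)            # (4, 4)
print(shifted)

# Building a diagonal matrix and extracting its diagonal gives back the input
round_trip = np.diag(np.diag(arr_1d))
print(np.array_equal(round_trip, arr_1d))   # True
```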

<hr style="border: 1px solid gray;">

# <code style="background:yellow;color:black">Logical operations:</code>

* **Logical operations in Numpy are operations that allow us to perform element-wise comparisons between elements in Numpy arrays. These operations return Boolean arrays with True and False values, where each value represents the result of the element-wise comparison.** 

* **There are 2 basic logical operations that we will study - .any() and .all()**

* Let's say we have a budget: we can only purchase an item that costs up to 30 rupees. The task is to check whether there is any value in the array with a price less than or equal to 30, returning True if the condition holds for even a single value. Without numpy, we would usually write a loop that compares every value with 30; if we find a value less than or equal to 30, we set a flag to True and return it.

* "can_afford" is our flag. The np.any() function evaluates the condition against the array elements, and if it is true for ANY OF THE VALUES, our "can_afford" flag becomes True.

* So when we are checking multiple values and a single value satisfying the condition is enough for us to move ahead, we can use the np.any() function. It's sort of like an OR operation.

* **In NumPy, the <code style="background:yellow;color:black">np.any() and np.all() functions</code> are used to check whether any or all elements of a given array satisfy a particular condition, respectively.**

    * **<code style="background:yellow;color:black">np.any():</code>** np.any() returns True if at least one element in the input array satisfies the specified condition, and False if all elements fail to meet the condition. It takes an array as its argument and an optional axis parameter to specify along which axis to check for the condition. If the axis is not specified, it checks the entire array.

    * **<code style="background:yellow;color:black">np.all():</code>** np.all() returns True only if all elements in the input array satisfy the specified condition, and False if any element fails to meet the condition. Like np.any(), it also takes an optional axis parameter to specify along which axis to check for the condition.

In [67]:
prices = np.array([50, 45, 25, 20, 35])

In [68]:
can_afford = np.any(prices <= 30)

can_afford

True

In [69]:
task_completion = np.array([1, 1, 1, 1, 1, 1, 0])

* **The all() method evaluates the condition against the array elements, and only if it is true for ALL OF THE VALUES does our "can_go_out_play" flag become True.**

* It returns True only if all the values satisfy the condition. It's sort of like an AND operation.

In [70]:
can_go_out_play = np.all(task_completion == 1)

can_go_out_play

False
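For reference, a sketch of the loop-and-flag version described above next to np.any(), plus the optional axis parameter on a small 2D array. The 2D array "scores" is made up for illustration.

```python
import numpy as np

prices = np.array([50, 45, 25, 20, 35])

# Loop-and-flag version of np.any(prices <= 30)
can_afford = False
for price in prices:
    if price <= 30:
        can_afford = True
        break

print(can_afford == np.any(prices <= 30))     # True

# With axis, the check is done per column (axis=0) or per row (axis=1)
scores = np.array([[1, 0, 1],
                   [1, 1, 1]])   # illustrative data

print(np.all(scores == 1, axis=1))   # [False  True], one result per row
print(np.any(scores == 0, axis=0))   # [False  True False], one result per column
```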

In [71]:
a = np.array([1, 4, 3, 2])

b = np.array([2, 2, 3, 2])

c = np.array([6, 4, 4, 5])

* **Below is a way to combine multiple conditions on multiple np arrays before calling the all() method. The conditions are combined element-wise with &, and every element must satisfy all of them for all() to return True.**

In [72]:
((a <= b) & (b <= c)).all()

False

In [73]:
a = np.array([-3, 4, 27, 34, -2, 0, -45, -11, 4, 0])

a

array([ -3,   4,  27,  34,  -2,   0, -45, -11,   4,   0])

* Setting values for elements selected via a boolean mask (fancy indexing).

* We can select elements with a condition and change them in the same statement.

* Here, every element greater than 0 is replaced with 1, every element less than 0 is replaced with -1, and 0 stays 0.

In [74]:
a[a > 0] = 1

a[a < 0] = -1

a

array([-1,  1,  1,  1, -1,  0, -1, -1,  1,  0])
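The same -1 / 0 / 1 result can be produced without modifying the array in place. As a sketch, np.sign returns the sign of each element as a new array:

```python
import numpy as np

a = np.array([-3, 4, 27, 34, -2, 0, -45, -11, 4, 0])

signs = np.sign(a)             # new array; "a" is left unchanged
print(signs)                   # [-1  1  1  1 -1  0 -1 -1  1  0]

# In-place version with boolean masks, applied to a copy
b = a.copy()
b[b > 0] = 1
b[b < 0] = -1

print(np.array_equal(signs, b))   # True
```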

<hr style="border: 1px solid gray;">

# <code style="background:yellow;color:black">where() function:</code>

* **In NumPy, <code style="background:yellow;color:black">the where() function</code> is used to return the indices of elements in an array where a specified condition is true. It can be particularly useful for extracting elements or performing operations based on certain criteria.**

* **<code style="background:yellow;color:black">The basic syntax of np.where() is: numpy.where(condition[, x, y])</code>**
    * **condition:** A boolean array indicating where the specified condition is true.
    * **x, y (optional):** Values to be used for elements where the condition is true or false, respectively. If not provided, where() returns indices.<br><br>

* **When x and y are provided, the <code style="background:yellow;color:black">where() function</code> returns a new array with elements from x where the condition is True and elements from y where the condition is False. The result has the broadcast shape of condition, x and y; when all three have the same shape, that is simply the shape of the inputs.**

* **<code style="background:yellow;color:black">Format of np.where() -- np.where(condition, value_if_true, value_if_false)</code>**<br><br>

* Below, if the price of a product is greater than 50, we want to give a 10% discount on that price. So, we want the updated prices after applying the discount.

* So, with this **where() function**, we can build a new array from our np array, choosing each element based on a given condition. The original array is not modified.

In [75]:
prices = np.array([45, 55, 60, 30, 75, 20, 100, 90])

In [76]:
discounted_prices = np.where(prices > 50, prices * 0.9, prices)

discounted_prices

array([45. , 49.5, 54. , 30. , 67.5, 20. , 90. , 81. ])
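A sketch of the other forms of np.where() described above: with only a condition it returns indexes, and the x / y values can be plain scalars. The labels "high" and "low" are made up for illustration.

```python
import numpy as np

prices = np.array([45, 55, 60, 30, 75, 20, 100, 90])

# Condition only: a tuple with one index array per dimension
idx = np.where(prices > 50)
print(idx[0])                  # [1 2 4 6 7]
print(prices[idx])             # [ 55  60  75 100  90]

# Scalars for x and y are broadcast to every element
labels = np.where(prices > 50, "high", "low")
print(labels)

# np.where() builds a new array; "prices" itself is unchanged
discounted = np.where(prices > 50, prices * 0.9, prices)
print(prices[1], discounted[1])
```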

<hr style="border: 1px solid gray;">

# <code style="background:yellow;color:black">FITBIT CASE STUDY:</code>

In [2]:
!gdown https://drive.google.com/uc?id=1vk1Pu0djiYcrdc85yUXZ_Rqq2oZNcohd

Downloading...
From: https://drive.google.com/uc?id=1vk1Pu0djiYcrdc85yUXZ_Rqq2oZNcohd
To: C:\Users\admin\Desktop\Scaler\Module 5- Python Libraries\My Practice\fit.txt

  0%|          | 0.00/3.43k [00:00<?, ?B/s]
100%|##########| 3.43k/3.43k [00:00<?, ?B/s]


In [3]:
data = np.loadtxt("fit.txt", dtype = "str")

In [4]:
data[:5]

array([['06-10-2017', '5464', 'Neutral', '181', '5', 'Inactive'],
       ['07-10-2017', '6041', 'Sad', '197', '8', 'Inactive'],
       ['08-10-2017', '25', 'Sad', '0', '5', 'Inactive'],
       ['09-10-2017', '5461', 'Sad', '174', '4', 'Inactive'],
       ['10-10-2017', '6915', 'Neutral', '223', '5', 'Active']],
      dtype='<U10')

In [5]:
data.shape

(96, 6)

In [6]:
data[0]

array(['06-10-2017', '5464', 'Neutral', '181', '5', 'Inactive'],
      dtype='<U10')

In [7]:
# Approach 1

data[:, 0] 

array(['06-10-2017', '07-10-2017', '08-10-2017', '09-10-2017',
       '10-10-2017', '11-10-2017', '12-10-2017', '13-10-2017',
       '14-10-2017', '15-10-2017', '16-10-2017', '17-10-2017',
       '18-10-2017', '19-10-2017', '20-10-2017', '21-10-2017',
       '22-10-2017', '23-10-2017', '24-10-2017', '25-10-2017',
       '26-10-2017', '27-10-2017', '28-10-2017', '29-10-2017',
       '30-10-2017', '31-10-2017', '01-11-2017', '02-11-2017',
       '03-11-2017', '04-11-2017', '05-11-2017', '06-11-2017',
       '07-11-2017', '08-11-2017', '09-11-2017', '10-11-2017',
       '11-11-2017', '12-11-2017', '13-11-2017', '14-11-2017',
       '15-11-2017', '16-11-2017', '17-11-2017', '18-11-2017',
       '19-11-2017', '20-11-2017', '21-11-2017', '22-11-2017',
       '23-11-2017', '24-11-2017', '25-11-2017', '26-11-2017',
       '27-11-2017', '28-11-2017', '29-11-2017', '30-11-2017',
       '01-12-2017', '02-12-2017', '03-12-2017', '04-12-2017',
       '05-12-2017', '06-12-2017', '07-12-2017', '08-12

In [8]:
# Approach 2

data.T[0]

array(['06-10-2017', '07-10-2017', '08-10-2017', '09-10-2017',
       '10-10-2017', '11-10-2017', '12-10-2017', '13-10-2017',
       '14-10-2017', '15-10-2017', '16-10-2017', '17-10-2017',
       '18-10-2017', '19-10-2017', '20-10-2017', '21-10-2017',
       '22-10-2017', '23-10-2017', '24-10-2017', '25-10-2017',
       '26-10-2017', '27-10-2017', '28-10-2017', '29-10-2017',
       '30-10-2017', '31-10-2017', '01-11-2017', '02-11-2017',
       '03-11-2017', '04-11-2017', '05-11-2017', '06-11-2017',
       '07-11-2017', '08-11-2017', '09-11-2017', '10-11-2017',
       '11-11-2017', '12-11-2017', '13-11-2017', '14-11-2017',
       '15-11-2017', '16-11-2017', '17-11-2017', '18-11-2017',
       '19-11-2017', '20-11-2017', '21-11-2017', '22-11-2017',
       '23-11-2017', '24-11-2017', '25-11-2017', '26-11-2017',
       '27-11-2017', '28-11-2017', '29-11-2017', '30-11-2017',
       '01-12-2017', '02-12-2017', '03-12-2017', '04-12-2017',
       '05-12-2017', '06-12-2017', '07-12-2017', '08-12

In [9]:
data_t = data.T

In [10]:
data_t.shape

(6, 96)

In [11]:
date, step_count, mood, calories_burned, hours_of_sleep, activity_status = data_t

In [12]:
step_count

array(['5464', '6041', '25', '5461', '6915', '4545', '4340', '1230', '61',
       '1258', '3148', '4687', '4732', '3519', '1580', '2822', '181',
       '3158', '4383', '3881', '4037', '202', '292', '330', '2209',
       '4550', '4435', '4779', '1831', '2255', '539', '5464', '6041',
       '4068', '4683', '4033', '6314', '614', '3149', '4005', '4880',
       '4136', '705', '570', '269', '4275', '5999', '4421', '6930',
       '5195', '546', '493', '995', '1163', '6676', '3608', '774', '1421',
       '4064', '2725', '5934', '1867', '3721', '2374', '2909', '1648',
       '799', '7102', '3941', '7422', '437', '1231', '1696', '4921',
       '221', '6500', '3575', '4061', '651', '753', '518', '5537', '4108',
       '5376', '3066', '177', '36', '299', '1447', '2599', '702', '133',
       '153', '500', '2127', '2203'], dtype='<U10')

In [13]:
# Can you figure out if there is a correlation between step_count and mood?

Can we say that in all of these individual arrays date, step_count, mood, calories_burned, hours_of_sleep and activity_status, the 0th element is the data for the 1st row (the 1st recorded day), the 1st element is the data for the 2nd row, the 2nd element is the data for the 3rd row and so on? Yes. Looking at the same index in all of these arrays gives the data that belongs to one row of the original table, because each array is one column of that table, i.e. one row of its transpose.
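
A minimal sketch of this alignment on a tiny made-up table (the values and the names `table`, `day`, `steps` and `feeling` are hypothetical, not taken from fit.txt):

```python
import numpy as np

# Hypothetical 3-row table: each row is one day (date, steps, mood).
table = np.array([
    ["01-01-2020", "1000", "Sad"],
    ["02-01-2020", "5000", "Happy"],
    ["03-01-2020", "3000", "Neutral"],
])

# Transposing turns columns into rows, so unpacking yields one array per column.
day, steps, feeling = table.T

# Index i in every column array refers to row i of the original table.
i = 1
print(day[i], steps[i], feeling[i])
print(table[i])
```

Both print statements show the same three values, which is why a position in one column array can be matched with the same position in another.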

Let's suppose I create a mask on the mood array. In the mood data, the values are Neutral, Sad and Happy. I create a mask on mood with mood == "Happy". This returns an array of True and False values. I can then apply the mask to an array to get filtered data. This is what we do in masking: we write a condition on an array to create a mask, and then apply that mask to an array to get filtered data.

**Now, I can create a mask using mood, but apply it to step_count.** The 1st element of mood is Neutral and the 1st element of step_count is 5464. So, in the mask mood == "Happy" the first element is False, which means the step count 5464 was recorded when the mood was not happy, and it is left out of the filtered result. So, we have created a mask from one array and applied it to another array.
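
The idea can be tried first on a small made-up example before applying it to the real data. The arrays `feeling` and `steps` below are hypothetical and only assumed to be aligned index by index:

```python
import numpy as np

# Hypothetical aligned arrays: index i in both refers to the same day.
feeling = np.array(["Sad", "Happy", "Neutral", "Happy"])
steps = np.array([1200, 5400, 3100, 4800])

# Mask built from one array ...
mask = feeling == "Happy"
print(mask)

# ... and applied to another array of the same length.
happy_steps = steps[mask]
print(happy_steps)
```

This only works because both arrays have the same length and the same ordering; a mask of a different length would raise an IndexError.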

In [14]:
mood

array(['Neutral', 'Sad', 'Sad', 'Sad', 'Neutral', 'Sad', 'Sad', 'Sad',
       'Sad', 'Sad', 'Sad', 'Sad', 'Happy', 'Sad', 'Sad', 'Sad', 'Sad',
       'Neutral', 'Neutral', 'Neutral', 'Neutral', 'Neutral', 'Neutral',
       'Happy', 'Neutral', 'Happy', 'Happy', 'Happy', 'Happy', 'Happy',
       'Happy', 'Happy', 'Neutral', 'Happy', 'Happy', 'Happy', 'Happy',
       'Happy', 'Happy', 'Happy', 'Happy', 'Happy', 'Happy', 'Neutral',
       'Happy', 'Happy', 'Happy', 'Happy', 'Happy', 'Happy', 'Happy',
       'Happy', 'Happy', 'Neutral', 'Sad', 'Happy', 'Happy', 'Happy',
       'Happy', 'Happy', 'Happy', 'Happy', 'Sad', 'Neutral', 'Neutral',
       'Sad', 'Sad', 'Neutral', 'Neutral', 'Happy', 'Neutral', 'Neutral',
       'Sad', 'Neutral', 'Sad', 'Neutral', 'Neutral', 'Sad', 'Sad', 'Sad',
       'Sad', 'Happy', 'Neutral', 'Happy', 'Neutral', 'Sad', 'Sad', 'Sad',
       'Neutral', 'Neutral', 'Sad', 'Sad', 'Happy', 'Neutral', 'Neutral',
       'Happy'], dtype='<U10')

In [16]:
# Creating a mask on mood array

mood == "Happy"

array([False, False, False, False, False, False, False, False, False,
       False, False, False,  True, False, False, False, False, False,
       False, False, False, False, False,  True, False,  True,  True,
        True,  True,  True,  True,  True, False,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True, False,  True,
        True,  True,  True,  True,  True,  True,  True,  True, False,
       False,  True,  True,  True,  True,  True,  True,  True, False,
       False, False, False, False, False, False,  True, False, False,
       False, False, False, False, False, False, False, False, False,
        True, False,  True, False, False, False, False, False, False,
       False, False,  True, False, False,  True])

In [17]:
# Converting all str values to ints in step_count array.

step_count = np.array(step_count, dtype = "int")

In [18]:
# Here I am applying a mask generated on mood array to step_count array.

step_count_happy = step_count[mood == "Happy"]

In [19]:
# So these were the step counts for all the records where the mood was happy.

step_count_happy

array([4732,  330, 4550, 4435, 4779, 1831, 2255,  539, 5464, 4068, 4683,
       4033, 6314,  614, 3149, 4005, 4880, 4136,  705,  269, 4275, 5999,
       4421, 6930, 5195,  546,  493,  995, 3608,  774, 1421, 4064, 2725,
       5934, 1867, 7422, 5537, 5376,  153, 2203])

In [20]:
step_count_sad = step_count[mood == "Sad"]

In [21]:
# So these were the step counts for all the records where the mood was sad.

step_count_sad

array([6041,   25, 5461, 4545, 4340, 1230,   61, 1258, 3148, 4687, 3519,
       1580, 2822,  181, 6676, 3721, 1648,  799, 1696,  221, 4061,  651,
        753,  518,  177,   36,  299,  702,  133])

In [22]:
step_count_neutral = step_count[mood == "Neutral"]

In [23]:
# So these were the step counts for all the records where the mood was neutral.

step_count_neutral

array([5464, 6915, 3158, 4383, 3881, 4037,  202,  292, 2209, 6041,  570,
       1163, 2374, 2909, 7102, 3941,  437, 1231, 4921, 6500, 3575, 4108,
       3066, 1447, 2599,  500, 2127])

In [24]:
# Here we find the mean of all 3. From this analysis, we find that the mean step count is highest when the mood is happy and lowest when it is sad.

step_count_happy.mean(), step_count_sad.mean(), step_count_neutral.mean()

(3392.725, 2103.0689655172414, 3153.777777777778)
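
The three separate masks above can also be written as one loop over the distinct mood values. This is only a sketch on small made-up arrays (`feeling`, `steps` and `mean_steps` are hypothetical names), using np.unique() to list the categories and astype() as an alternative way to convert strings to ints:

```python
import numpy as np

# Hypothetical aligned data; the step counts start as strings, as in the loaded file.
feeling = np.array(["Sad", "Happy", "Neutral", "Happy", "Sad"])
steps = np.array(["1000", "5000", "3000", "4000", "2000"]).astype(int)

# One mask per distinct mood value, applied to the step counts.
mean_steps = {}
for label in np.unique(feeling):
    mean_steps[str(label)] = float(steps[feeling == label].mean())

print(mean_steps)
```

A difference in means like this shows an association between mood and step count in the data; by itself it does not tell us which one causes the other.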