


# 5 Must-know Numpy functions for Data Scientists



**NumPy** is a Python library for **multi-dimensional array manipulation**. Operations on plain Python lists are slow by comparison, while **NumPy operations** are often **many times faster** (a figure of around 50 times is commonly quoted, though the actual speedup depends on the operation and the data). This makes it a very popular library for data science and for manipulating large datasets. Internally, **NumPy's core is written largely in C**, which improves overall performance.
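The speed difference can be checked directly. The following is a minimal sketch, assuming a workload of one million integers; the names `python_sum` and `numpy_sum` are hypothetical helpers introduced only for this comparison, and the measured ratio will vary from machine to machine.

```python
import timeit

import numpy as np

# Hypothetical workload: sum the squares of one million integers
n = 1_000_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

def python_sum():
    # Pure-Python loop over a list
    return sum(x * x for x in py_list)

def numpy_sum():
    # Vectorized NumPy equivalent
    return int(np.sum(np_arr * np_arr))

# Both approaches give the same answer
same_result = python_sum() == numpy_sum()

# Time a few repetitions of each; the ratio depends on the machine
list_time = timeit.timeit(python_sum, number=3)
numpy_time = timeit.timeit(numpy_sum, number=3)
print(f"list: {list_time:.3f}s, numpy: {numpy_time:.3f}s")
```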

- **numpy.mean**
- **numpy.reshape**
- **numpy.transpose**
- **numpy.count_nonzero**
- **numpy.std**

The recommended way to run this notebook is to click the "Run" button at the top of this page, and select "Run on Binder". This will run the notebook on mybinder.org, a free online service for running Jupyter notebooks.

In [52]:
!pip install jovian --upgrade -q

In [53]:
import jovian

In [54]:
jovian.commit(project='numpy-array-operations')

[jovian] Updating notebook "ar-puuk/numpy-array-operations" on https://jovian.ai
[jovian] Committed successfully! https://jovian.ai/ar-puuk/numpy-array-operations


'https://jovian.ai/ar-puuk/numpy-array-operations'

Let's begin by importing Numpy and listing out the functions covered in this notebook.

In [55]:
import numpy as np

In [56]:
# List of functions explained
function1 = np.mean  # compute the arithmetic mean
function2 = np.reshape  # give an array a new shape
function3 = np.transpose  # transpose the input array
function4 = np.count_nonzero  # count the number of non-zero values
function5 = np.std  # compute the standard deviation

## Function 1 - np.mean

**np.mean** calculates the **arithmetic mean** of an array, optionally along a specified axis.

**np.mean** takes the following parameters and returns a scalar when **axis** is None, or an **ndarray** otherwise:

`numpy.mean(a, axis=None, dtype=None, out=None, keepdims=<no value>, *, where=<no value>)`

- **a**: Input array, **ndarray** (or any object that can be converted to one)
- **axis** *(optional)*: Axis or axes along which the means are computed, **int** or **tuple**
- **dtype** *(optional)*: Type to use in computing the mean, **data type**
- **out** *(optional)*: Alternate output array in which to place the result, **ndarray**
- **keepdims** *(optional)*: If True, the reduced axes are kept in the result as dimensions of size one, **bool**
- **where** *(optional)*: Elements to include in the mean, **array of bool**
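The optional parameters are easier to follow on a small array. This is a minimal sketch using a hypothetical 2x3 array named `params_demo`; it assumes NumPy 1.20 or later, where the `where` argument of `np.mean` is available.

```python
import numpy as np

# Hypothetical demo array (not part of the examples in this notebook)
params_demo = np.array([[1.0, 2.0, 3.0],
                        [4.0, 5.0, 6.0]])

# keepdims=True keeps the reduced axis as a dimension of size one
row_means = np.mean(params_demo, axis=1, keepdims=True)
print(row_means.shape)  # (2, 1)

# where= restricts the mean to the elements selected by a boolean mask
mean_above_two = np.mean(params_demo, where=params_demo > 2)
print(mean_above_two)  # 4.5, the mean of 3, 4, 5 and 6
```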

In [57]:
# Example 1 - working

# Create an array of random values
func1_arr1 = np.random.random(10)

# Calculate the mean of the array
np.mean(func1_arr1)

0.42933527385333886

The call returns the mean of the elements in the **func1_arr1** array as a single float. Because the array is filled with random values, the result changes on every run.

In [58]:
# Example 2 - working

# Create a 2x3 array of random values
func1_arr2 = np.random.random((2, 3))

# Calculate the mean of each column (axis=0 averages down the rows)
np.mean(func1_arr2, axis=0)

array([0.52788756, 0.3180774 , 0.77851811])

**np.mean** also takes an **axis** parameter to calculate the **mean** along a particular axis of a multi-dimensional array.
With axis=0 the mean is taken down the rows, so the above code calculates the mean of each column, resulting in a 1D array with 3 elements.
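To make the axis convention concrete, here is a small sketch with fixed values instead of random ones, so the results can be checked by hand. The array name `axis_demo` is a hypothetical one used only here.

```python
import numpy as np

# Hypothetical 2x3 array with known values
axis_demo = np.array([[1, 2, 3],
                      [4, 5, 6]])

# axis=0 collapses the rows: one mean per column
column_means = np.mean(axis_demo, axis=0)
print(column_means)  # [2.5 3.5 4.5]

# axis=1 collapses the columns: one mean per row
row_means = np.mean(axis_demo, axis=1)
print(row_means)  # [2. 5.]
```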

In [59]:
# Example 3 - breaking (to illustrate when it breaks)

# Create an array with a string element
func1_arr3 = np.array([1, 2, 'hello'])

# Calculate the mean of the array
np.mean(func1_arr3)

TypeError: cannot perform reduce with flexible type

The **np.mean** function will fail if the array contains elements that cannot be averaged (e.g., strings or dictionaries). The above code raises a **TypeError** because of the string 'hello': NumPy stores all elements of **func1_arr3** as strings, and strings cannot be summed into a mean.
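The cause of the error can be seen by inspecting the array before calling `np.mean`. This short sketch uses a hypothetical array named `mixed` with the same contents as the breaking example.

```python
import numpy as np

# Mixing numbers and a string makes NumPy store every element as a string
mixed = np.array([1, 2, 'hello'])

print(mixed.dtype.kind)  # 'U', the code for a Unicode string dtype
print(mixed)             # ['1' '2' 'hello']
```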

To fix this error, we can remove the string element from the array and convert the remaining elements to a numeric type before calculating the mean. Passing the **dtype** parameter to np.mean is not enough on its own, because the array would still hold strings.

```python
# Remove the string element from the array (the remaining elements are still strings)
func1_arr3 = func1_arr3[func1_arr3 != 'hello']

# Convert the remaining elements to floats, then calculate the mean
np.mean(func1_arr3.astype(float))
```


**np.mean** is a very quick way to compute the mean of a large array, either over all elements or along a chosen axis.

In [60]:
jovian.commit()


## Function 2 - np.reshape

**np.reshape** is a function in the NumPy library that gives an array a new shape without changing its data.

**np.reshape** takes the following parameters and returns an **ndarray**:
- the input **ndarray** to be reshaped
- the **shape** of the desired array

In [61]:
# Example 1 - working

# Create a 1D array with 8 elements
func2_arr1 = np.array([1, 2, 3, 4, 5, 6, 7, 8])

# Reshape the array to a 4x2 array
np.reshape(func2_arr1, (4, 2))

array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])

The **np.reshape** function returns an array with shape (4, 2) and the same elements as **func2_arr1**.
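One detail worth knowing is that the reshaped array does not necessarily own a separate copy of the data. The sketch below, which uses hypothetical names `original` and `reshaped`, shows that for a simple contiguous input the result shares memory with the source array.

```python
import numpy as np

original = np.arange(1, 9)  # [1 2 3 4 5 6 7 8]
reshaped = np.reshape(original, (4, 2))

# For a contiguous input like this one, reshape returns a view
print(np.shares_memory(original, reshaped))  # True

# Writing through the view therefore changes the original array too
reshaped[0, 0] = 100
print(original[0])  # 100
```

If an independent array is needed, calling `.copy()` on the result avoids this coupling.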

In [62]:
# Example 2 - working

# Create a 1D array with 8 elements
func2_arr2 = np.array([1, 2, 3, 4, 5, 6, 7, 8])

# Reshape the array to a 2x-1 array (the second dimension will be inferred as 4)
np.reshape(func2_arr2, (2, -1))

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In the above example, the **-1** placeholder specifies that a dimension should be inferred from the length of the array and the other dimensions.
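The placeholder works in any position and with more than two dimensions, but only one dimension can be left unknown. A brief sketch, using a hypothetical 12-element array named `values`:

```python
import numpy as np

values = np.arange(12)

# The -1 dimension is inferred so that the total stays at 12 elements
print(np.reshape(values, (3, -1)).shape)     # (3, 4)
print(np.reshape(values, (2, -1, 3)).shape)  # (2, 2, 3)

# Using -1 twice is ambiguous, so NumPy raises a ValueError
try:
    np.reshape(values, (-1, -1))
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```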

In [63]:
# Example 3 - breaking (to illustrate when it breaks)

# Create a 1D array with 8 elements
func2_arr3 = np.array([1, 2, 3, 4, 5, 6, 7, 8])

# Try to reshape the array to a 3x3 array (incompatible shape)
np.reshape(func2_arr3, (3, 3))

ValueError: cannot reshape array of size 8 into shape (3,3)

The **np.reshape** function will fail if the new shape is not compatible with the number of elements in the array. In the above example, the code raised a **ValueError** because a 3x3 array needs 9 elements but the input array has only 8.

To fix this error, we can either change the shape to a compatible size (e.g., (2, 4) or (4, 2)) or use the -1 placeholder to specify that a dimension should be inferred from the length of the array and the other dimensions.

```python
# Reshape the array to a 2x4 array (compatible shape)
np.reshape(func2_arr3, (2, 4))

# OR,

# Reshape the array to a 2x-1 array (the second dimension will be inferred as 4)
np.reshape(func2_arr3, (2, -1))
```
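The compatibility rule can also be checked before reshaping. The helper `can_reshape` below is a hypothetical function written for this notebook, not part of NumPy, and it assumes the target shape contains no -1 placeholder.

```python
import math

import numpy as np

def can_reshape(arr, shape):
    """Hypothetical helper: True if `shape` holds exactly arr.size elements.

    Assumes `shape` contains no -1 placeholder.
    """
    return arr.size == math.prod(shape)

data = np.array([1, 2, 3, 4, 5, 6, 7, 8])
print(can_reshape(data, (3, 3)))  # False: 3 * 3 = 9, but the array has 8 elements
print(can_reshape(data, (2, 4)))  # True: 2 * 4 = 8
```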

**np.reshape** is a very quick way to change the shape of a large ndarray without changing its elements.

In [None]:
jovian.commit()

## Function 3 - np.transpose

The **np.transpose** function returns the **transposed input array**.
<br>The **np.transpose** function takes the following parameters and returns an **ndarray**:
- the input **ndarray**
- **a tuple of axes** *(optional)*

In [64]:
# Example 1 - working

func3_arr1 = np.array([[1, 2], 
        [3, 4.]])

np.transpose(func3_arr1)

array([[1., 3.],
       [2., 4.]])

The **np.transpose** function takes the **ndarray func3_arr1 as input** and returns the **transposed ndarray**.

In [65]:
# Example 2 - working

func3_arr2 = np.ones((2, 3, 1))

np.transpose(func3_arr2, (1, 2, 0))

array([[[1., 1.]],

       [[1., 1.]],

       [[1., 1.]]])

The **np.transpose** function takes the **ndarray func3_arr2 and the axes in a tuple as inputs** and returns the **transposed ndarray**. The axes are rearranged according to the tuple parameter.
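Comparing shapes makes the axis rearrangement easier to see. In this sketch the hypothetical array `labelled` holds distinct values, so that individual elements can be traced through the transpose.

```python
import numpy as np

# Hypothetical array with shape (2, 3, 1) and distinct values 0..5
labelled = np.arange(6).reshape(2, 3, 1)
moved = np.transpose(labelled, (1, 2, 0))

# New axis 0 is old axis 1, new axis 1 is old axis 2, new axis 2 is old axis 0
print(labelled.shape)  # (2, 3, 1)
print(moved.shape)     # (3, 1, 2)

# Element [i, j, k] of the result is element [k, i, j] of the input
print(moved[2, 0, 1] == labelled[1, 2, 0])  # True
```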

In [66]:
# Example 3 - breaking (to illustrate when it breaks)

func3_arr3 = [[1, 2], 
        [3, 4.]]

np.transpose(func3_arr3, (1, 2, 0))

ValueError: axes don't match array

Here the **np.transpose** function is given the nested list **func3_arr3**, which it converts to a 2D array, together with **a tuple of three axes**.

The function **breaks** because **func3_arr3 has only 2 axes** but the **tuple passed to np.transpose names 3 axes**. Passing a tuple with **2 axes instead of 3**, such as (1, 0), will fix this issue.
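A minimal sketch of the corrected call, using a hypothetical list named `nested_list` with the same values as the breaking example:

```python
import numpy as np

nested_list = [[1, 2],
               [3, 4.]]

# A 2D input needs exactly two axes in the tuple; (1, 0) swaps rows and columns
fixed = np.transpose(nested_list, (1, 0))
print(fixed)
# [[1. 3.]
#  [2. 4.]]
```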

**np.transpose** can be used to quickly swap the axes or dimensions of an ndarray.

In [67]:
jovian.commit()


## Function 4 - np.count_nonzero

**np.count_nonzero** counts the number of non-zero values in the array.
<br>The **np.count_nonzero** function takes the following parameters and returns the count as an **int**, or as an **ndarray of ints** when an axis is given:
- the input **ndarray**
- **axis** *(optional)*: Axis along which the count should be computed.

In [68]:
# Example 1 - working

func4_arr1 = np.array([[10, 7, 4],
                      [8, 12., 0.]])

np.count_nonzero(func4_arr1)

5

Since the **np.count_nonzero** function was not provided with the **axis parameter**, the **default behaviour** is to compute the **count over a flattened version of the array**.

In [69]:
# Example 2 - working

func4_arr2 = np.array([[10, 7, 4],
                      [8, 12., 0.]])

np.count_nonzero(func4_arr2, axis = 0)

array([2, 2, 1])

Since the **np.count_nonzero** function was provided with the **axis parameter as 0**, the **count is computed along axis 0 of the array**, giving one count per column.
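For comparison, the sketch below counts along both axes and also shows a common idiom: because True counts as non-zero, `np.count_nonzero` can count the elements that satisfy a condition. The array name `counts_demo` is hypothetical and holds the same values as the examples above.

```python
import numpy as np

counts_demo = np.array([[10, 7, 4],
                        [8, 12., 0.]])

per_column = np.count_nonzero(counts_demo, axis=0)
print(per_column)  # [2 2 1]

per_row = np.count_nonzero(counts_demo, axis=1)
print(per_row)  # [3 2]

# Count how many elements are greater than 7 (10, 8 and 12)
above_seven = np.count_nonzero(counts_demo > 7)
print(above_seven)  # 3
```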

In [70]:
# Example 3 - breaking (to illustrate when it breaks)

func4_arr3 = np.array([[10, 7, 4],
                      [8, 12., 0.]])

np.count_nonzero(func4_arr3, axis = 2)

AxisError: axis 2 is out of bounds for array of dimension 2

The **np.count_nonzero** function was provided with the **axis parameter as 2** but the **input array does not have a 3rd axis** and hence the **error**.
<br>The error **can be fixed** by **changing the axis parameter from 2 to either 0 or 1**.

**np.count_nonzero** is a very quick way to count the non-zero elements of a large **ndarray**.

In [None]:
jovian.commit()

## Function 5 - np.std

**np.std** is a function to **compute the standard deviation** of an **ndarray**.
<br>The **np.std** function takes the following parameters and returns a float when no axis is given, or an **ndarray** otherwise:
- the input **ndarray**
- **axis** *(optional)*: Axis along which the standard deviation should be computed
- **dtype** *(optional)*: Type to use in computing the standard deviation
- **out** *(optional)*: Alternate **ndarray** in which to place the result; it must have the same shape as the expected output

In [71]:
# Example 1 - working

func5_arr1 = np.array([[10, 7, 4], 
              [3, 2, 1]])

np.std(func5_arr1)

3.095695936834452

Since the **np.std** function was not provided with the **axis parameter**, the **default behaviour** is to compute **standard deviation along a flattened version of the array**.
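The value above can be reproduced from the definition of the population standard deviation, the square root of the mean squared deviation from the mean. This is a sketch using a hypothetical array `std_demo` with the same values; it also shows the `ddof` parameter of `np.std`, which is not covered in the parameter list above.

```python
import numpy as np

std_demo = np.array([[10, 7, 4],
                     [3, 2, 1]])

# Population standard deviation computed step by step
deviations = std_demo - np.mean(std_demo)
manual_std = np.sqrt(np.mean(deviations ** 2))
print(np.isclose(manual_std, np.std(std_demo)))  # True

# ddof=1 divides by (N - 1) instead of N, giving the sample standard deviation
sample_std = np.std(std_demo, ddof=1)
print(sample_std > manual_std)  # True
```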

In [72]:
# Example 2 - working

func5_arr2 = np.array([[10, 7, 4], 
              [3, 2, 1]])

np.std(func5_arr2, axis = 0)

array([3.5, 2.5, 1.5])

Since the **np.std** function was provided with the **axis parameter as 0**, the **standard deviation is computed along axis 0 of the array**, giving one value per column.
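For comparison, this sketch computes the standard deviation along each axis of a hypothetical array `std_axes` that holds the same values as the example above.

```python
import numpy as np

std_axes = np.array([[10, 7, 4],
                     [3, 2, 1]])

# axis=0: one value per column, e.g. the spread of 10 and 3 is 3.5
per_column = np.std(std_axes, axis=0)
print(per_column)  # [3.5 2.5 1.5]

# axis=1: one value per row
per_row = np.std(std_axes, axis=1)
print(per_row.shape)  # (2,)
```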

In [73]:
# Example 3 - breaking (to illustrate when it breaks)

func5_arr3 = np.array([[10, 7, 4], 
              [3, 2, 1]])

np.std(func5_arr3, axis = 2)

AxisError: axis 2 is out of bounds for array of dimension 2

The **np.std** function was provided with the **axis parameter as 2** but the **input array does not have a 3rd axis** and hence the **error**.
<br>The error **can be fixed** by **changing the axis parameter from 2 to either 0 or 1**.

**np.std** is a very quick way to compute the standard deviation of a large ndarray.

In [74]:
jovian.commit()


## Conclusion

- In this assignment, we have covered 5 NumPy functions that can be used for array manipulation and statistics.
- There are many other important NumPy functions that are worth practicing.

## Reference Links
* NumPy official tutorial: https://numpy.org/doc/stable/user/quickstart.html

In [75]:
jovian.commit()
