## Exercise 09: NumPy practice

The objective of this exercise is to practice your NumPy skills.

In [6]:
import numpy as np

### Counting zeros

For a 1-d array $x$, we'll define its `number_of_zeros` as the number of elements in the array that are equal to zero.
For example, for the array 

```Python
[1, 5, 0, 6, 0, 1]
```

The `number_of_zeros` is equal to 2.
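
One way to get this count without the restricted functions is to compare the array against zero and sum the resulting boolean mask, since each `True` counts as 1. This is only a minimal sketch; the variable names are illustrative:

```python
import numpy as np

x = np.array([1, 5, 0, 6, 0, 1])
is_zero = x == 0             # boolean mask: True where the element is zero
count = int(is_zero.sum())   # each True contributes 1 to the sum
print(count)  # 2
```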

We can also apply `number_of_zeros` to a matrix $X$ (i.e., a 2-d array).
The definition can be applied to either the columns or the rows of the matrix, producing an array with one `number_of_zeros` value per column or row.  Your task is to write a function that computes `number_of_zeros` for a 2-d array.  You may not use the NumPy functions `count_nonzero`, `nonzero`, or `argwhere`.

For example, for the matrix
```Python
2 0 3 0
0 0 1 5
0 0 0 6
```

when applied to the columns, the result should be an array that contains the numbers

```Python
2 3 1 1
```

When applied to the rows, the result should be an array that contains the numbers

```Python
2 2 3
```
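
The two results above can be reproduced with a boolean mask summed along different axes. The sketch below is one possible approach, not the required solution; it assumes that `axis=0` collapses the rows (one count per column) and `axis=1` collapses the columns (one count per row), which is the standard NumPy convention:

```python
import numpy as np

X = np.array([[2, 0, 3, 0],
              [0, 0, 1, 5],
              [0, 0, 0, 6]])
mask = X == 0                   # True wherever an entry is zero
per_column = mask.sum(axis=0)   # collapse rows: one count per column
per_row = mask.sum(axis=1)      # collapse columns: one count per row
print(per_column)  # [2 3 1 1]
print(per_row)     # [2 2 3]
```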

Fill in the following function for computing `number_of_zeros`.  The `axis` argument should control whether the operation is performed on columns (`axis=0`) or rows (`axis=1`).

In addition to writing the function, write code that tests its correctness, i.e. compares its output to a result you know is correct, returning True/False on whether it matches that correct output.

In [381]:
def number_of_zeros(X, axis=0):
    # X == 0 is a boolean mask; summing it along `axis` counts the zeros
    # (axis=0 gives one count per column, axis=1 one count per row)
    return np.sum(X == 0, axis=axis)

def t_or_f(arr1, arr2):
    return np.array_equal(arr1, arr2)

In [383]:
# test your code here
# your testing should verify that the code works correctly, i.e.
# will return a True/False on whether it matches a result you know
# is correct
test = np.array([[0,1,2,3],[4,0,5,0],[6,7,8,0]])
axis0 = number_of_zeros(test)
axis1 = number_of_zeros(test, 1)
axis0_result = np.array([1,1,0,2], dtype=np.float64)
axis1_result = np.array([1,2,1], dtype=np.float64)
print("Test array:\n", test)
print("\nAxis 0 result: ", axis0_result, ", ", t_or_f(axis0, axis0_result), 
      "\nAxis1 result: ", axis1_result, ", ", t_or_f(axis1, axis1_result))

Test array:
 [[0 1 2 3]
 [4 0 5 0]
 [6 7 8 0]]

Axis 0 result:  [1. 1. 0. 2.] ,  True 
Axis1 result:  [1. 2. 1.] ,  True


### Removing sparse columns

Write a function that removes sparse columns from a 2-d array.
We define a sparse column as one that contains mostly zeros; more specifically, a column in which at least 90% of the entries are zero.  For example, if we apply this to the matrix

```Python
2 0 3 0
0 0 1 5
0 0 0 6
```

the second column would be removed, because all of its entries are zero; every other column is at most two-thirds zeros.
You can use the `number_of_zeros` function you just wrote to help you in this task.
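
To make the 90% rule concrete, the fraction of zeros in each column of the example matrix can be computed as the mean of a boolean mask. This is a sketch under the definition above; `zero_fraction` and `is_sparse` are illustrative names:

```python
import numpy as np

X = np.array([[2, 0, 3, 0],
              [0, 0, 1, 5],
              [0, 0, 0, 6]])
zero_fraction = (X == 0).mean(axis=0)  # fraction of zeros in each column
is_sparse = zero_fraction >= 0.9       # sparse: at least 90% zeros
print(is_sparse)  # [False  True False False]
```

Only the second column reaches the threshold; the first column is two-thirds zeros and the last two are one-third zeros.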

As in the previous problem, you also need to write code to test whether your function works correctly by comparing its output to a case where you know the correct solution.

In [385]:
def remove_sparse_columns(X):
    # fraction of zeros in each column
    zero_fraction = number_of_zeros(X) / X.shape[0]
    # keep only the columns with fewer than 90% zeros
    return X[:, zero_fraction < 0.9]

In [387]:
# test your code here
# your testing should verify that the code works correctly, i.e.
# will return a True/False on whether it matches a result you know
# is correct

sparse = np.array([[0,1,2,0],[4,0,5,0],[6,7,8,0]])
sparse_result = np.array([[0,1,2],[4,0,5],[6,7,8]])
removed = remove_sparse_columns(sparse)

print("Original array:\n", sparse, "\n\nNew array:\n", removed)
print("\nRemoved result: ", t_or_f(sparse_result, removed))

Original array:
 [[0 1 2 0]
 [4 0 5 0]
 [6 7 8 0]] 

New array:
 [[0 1 2]
 [4 0 5]
 [6 7 8]]

Removed result:  True
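
With only three rows, a column can reach 90% zeros only if it is entirely zero, so the test above does not exercise the threshold itself. A further check might use ten rows, where nine zeros is exactly 90%. The sketch below is self-contained and uses a hypothetical helper, `drop_sparse_columns`, with an assumed `threshold` parameter, rather than the function defined above:

```python
import numpy as np

def drop_sparse_columns(X, threshold=0.9):
    """Hypothetical helper: keep columns whose zero fraction is below threshold."""
    zero_fraction = (X == 0).mean(axis=0)
    return X[:, zero_fraction < threshold]

X = np.ones((10, 3), dtype=int)
X[:9, 0] = 0   # column 0: 9 of 10 entries are zero (exactly 90%), so it is sparse
X[:8, 1] = 0   # column 1: 8 of 10 entries are zero (80%), so it is kept
expected = X[:, 1:]
result = drop_sparse_columns(X)
print(np.array_equal(result, expected))  # True
```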


### Replacing NaNs with zeros

You are given a feature matrix that has some NaN values.  Write a function that creates a new matrix in which all the NaN values are replaced with zeros.
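
NaN values cannot be located with `==`, because NaN never compares equal to anything, including itself; `np.isnan` gives the mask that is needed instead. The following is a small sketch on a made-up matrix, not the required solution:

```python
import numpy as np

X = np.array([[1.0, np.nan],
              [np.nan, 4.0]])
print((X == np.nan).any())         # False: NaN is not equal even to itself
mask = np.isnan(X)                 # True exactly where X holds NaN
cleaned = np.where(mask, 0.0, X)   # builds a new array; X is left unchanged
print(cleaned)
```

NumPy also provides `np.nan_to_num`, which by default replaces NaN with zero and additionally maps infinities to large finite values, so it is not an exact equivalent when the matrix may contain infinities.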


In [365]:
# your code here
def nan_to_zero(X):
    new = X.copy()
    new[np.isnan(new)] = 0
    return new

In [379]:
# write code that verifies that there are no NaN values in the matrix
# returned by your function
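from numpy.random import default_rng
rng = default_rng()
# random 10x10 matrix of integers between 1 and 8
x = rng.integers(1,9, size=(10,10))
# replace every occurrence of one randomly chosen value with NaN
x = np.where(x == rng.integers(1,9), np.nan, x)
y = nan_to_zero(x)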

print("Before NaN removal:\n", x, "\nAfter NaN removal:\n", y)

Before NaN removal:
 [[ 1.  7.  6.  7. nan  6.  7.  7. nan nan]
 [ 3.  4. nan  3.  5.  7.  8. nan  3.  5.]
 [ 6.  4. nan  4.  5.  8.  8.  7.  1.  5.]
 [ 6.  3.  7.  7.  6.  5.  8.  5.  1.  7.]
 [ 4.  6.  7.  1.  5.  5.  8.  4.  7.  5.]
 [nan  5.  3.  7.  6. nan  5.  8.  1.  5.]
 [ 4.  4.  4. nan  1.  3. nan  1.  8.  6.]
 [nan nan  6.  5.  6.  5.  8.  1.  8.  1.]
 [ 3.  3.  3.  8.  1.  3.  7.  5.  3.  5.]
 [ 1. nan  4. nan  8.  8. nan  4.  7.  7.]] 
After NaN removal:
 [[1. 7. 6. 7. 0. 6. 7. 7. 0. 0.]
 [3. 4. 0. 3. 5. 7. 8. 0. 3. 5.]
 [6. 4. 0. 4. 5. 8. 8. 7. 1. 5.]
 [6. 3. 7. 7. 6. 5. 8. 5. 1. 7.]
 [4. 6. 7. 1. 5. 5. 8. 4. 7. 5.]
 [0. 5. 3. 7. 6. 0. 5. 8. 1. 5.]
 [4. 4. 4. 0. 1. 3. 0. 1. 8. 6.]
 [0. 0. 6. 5. 6. 5. 8. 1. 8. 1.]
 [3. 3. 3. 8. 1. 3. 7. 5. 3. 5.]
 [1. 0. 4. 0. 8. 8. 0. 4. 7. 7.]]
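
Printing the two matrices allows a visual check, but the task asks for code that verifies the result. One possible programmatic check is sketched below; it is self-contained, so it uses a stand-in named `replace_nan` (a hypothetical name) in place of the function above, and a fixed seed, which is an assumption made only for repeatability:

```python
import numpy as np

def replace_nan(X):
    """Stand-in for the NaN-replacement function (hypothetical name)."""
    out = X.copy()
    out[np.isnan(out)] = 0
    return out

rng = np.random.default_rng(0)  # fixed seed, assumed here for repeatability
x = rng.integers(1, 9, size=(10, 10)).astype(float)
x[x == 3] = np.nan              # mark an arbitrary value as missing
y = replace_nan(x)

was_nan = np.isnan(x)
no_nans_left = not np.isnan(y).any()                        # nothing missing remains
others_unchanged = np.array_equal(y[~was_nan], x[~was_nan]) # other entries untouched
print(no_nans_left, others_unchanged)  # True True
```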
