# Welcome to this Kernel

* This kernel is a compilation of 70 NumPy exercises with solutions from this webpage:

https://www.machinelearningplus.com/python/101-numpy-exercises-python/

## <span style="color:green">* If you want to learn **sklearn** check this kernel with tricks and tips:</span>

https://www.kaggle.com/python10pm/sklearn-24-best-tips-and-tricks

<a id='table_of_contents'></a>
# Table of contents

[Imports and helper functions](#imports)

[1. Import numpy as np and see the version](#q1)

[2. How to create a 1D array?](#q2)

[3. How to create a boolean array?](#q3)

[4. How to extract items that satisfy a given condition from 1D array?](#q4)

[5. How to replace items that satisfy a condition with another value in numpy array?](#q5)

[6. How to replace items that satisfy a condition without affecting the original array?](#q6)

[7. How to reshape an array?](#q7)

[8. How to stack two arrays vertically?](#q8)

[9. How to stack two arrays horizontally?](#q9)

[10. How to generate custom sequences in numpy without hardcoding?](#q10)

[11. How to get the common items between two python numpy arrays?](#q11)

[12. How to remove from one array those items that exist in another?](#q12)

[13. How to get the positions where elements of two arrays match?](#q13)

[14. How to extract all numbers between a given range from a numpy array?](#q14)

[15. How to make a python function that handles scalars to work on numpy arrays?](#q15)

[16. How to swap two columns in a 2d numpy array?](#q16)

[17. How to swap two rows in a 2d numpy array?](#q17)

[18. How to reverse the rows of a 2D array?](#q18)

[19. How to reverse the columns of a 2D array?](#q19)

[20. How to create a 2D array containing random floats between 5 and 10?](#q20)

[21. How to print only 3 decimal places in python numpy array?](#q21)

[22. How to pretty print a numpy array by suppressing the scientific notation (like 1e10)?](#q22)

[23. How to limit the number of items printed in output of numpy array?](#q23)

[24. How to print the full numpy array without truncating](#q24)

[25. How to import a dataset with numbers and texts keeping the text intact in python numpy?](#q25)

[26. How to extract a particular column from 1D array of tuples?](#q26)

[27. How to convert a 1d array of tuples to a 2d numpy array?](#q27)

[28. How to compute the mean, median, standard deviation of a numpy array?](#q28)

[29. How to normalize an array so the values range exactly between 0 and 1?](#q29)

[30. How to compute the softmax score?](#q30)

[31. How to find the percentile scores of a numpy array?](#q31)

[32. How to insert values at random positions in an array?](#q32)

[33. How to find the position of missing values in numpy array?](#q33)

[34. How to filter a numpy array based on two or more conditions?](#q34)

[35. How to drop rows that contain a missing value from a numpy array?](#q35)

[36. How to find the correlation between two columns of a numpy array?](#q36)

[37. How to find if a given array has any null values?](#q37)

[38. How to replace all missing values with 0 in a numpy array?](#q38)

[39. How to find the count of unique values in a numpy array?](#q39)

[40. How to convert a numeric to a categorical (text) array?](#q40)

[41. How to create a new column from existing columns of a numpy array?](#q41)

[42. How to do probabilistic sampling in numpy?](#q42)

[43. How to get the second largest value of an array when grouped by another array?](#q43)

[44. How to sort a 2D array by a column](#q44)

[45. How to find the most frequent value in a numpy array?](#q45)

[46. How to find the position of the first occurrence of a value greater than a given value?](#q46)

[47. How to replace all values greater than a given value to a given cutoff?](#q47)

[48. How to get the positions of top n values from a numpy array?](#q48)

[49. How to compute the row wise counts of all possible values in an array?](#q49)

[50. How to convert an array of arrays into a flat 1d array?](#q50)

[51. How to generate one-hot encodings for an array in numpy?](#q51)

[52. How to create row numbers grouped by a categorical variable?](#q52)

[53. How to create group ids based on a given categorical variable?](#q53)

[54. How to rank items in an array using numpy?](#q54)

[55. How to rank items in a multidimensional array using numpy?](#q55)

[56. How to find the maximum value in each row of a numpy array 2d?](#q56)

[57. How to compute the min-by-max for each row for a numpy array 2d?](#q57)

[58. How to find the duplicate records in a numpy array?](#q58)

[59. How to find the grouped mean in numpy?](#q59)

[60. How to convert a PIL image to numpy array?](#q60)

[61. How to drop all missing values from a numpy array?](#q61)

[62. How to compute the euclidean distance between two arrays?](#q62)

[63. How to find all the local maxima (or peaks) in a 1d array?](#q63)

[64. How to subtract a 1d array from a 2d array, where each item of 1d array subtracts from respective row?](#q64)

[65. How to find the index of n'th repetition of an item in an array](#q65)

[66. How to convert numpy's datetime64 object to datetime's datetime object?](#q66)

[67. How to compute the moving average of a numpy array?](#q67)

[68. How to create a numpy array sequence given only the starting point, length and the step?](#q68)

[69. How to fill in missing dates in an irregular series of numpy dates?](#q69)

[70. How to create strides from a given 1D array?](#q70)

<a id='imports'></a>
# Imports and helper functions

[Go back to the table of contents](#table_of_contents)

In [1]:
# Allow several prints in one cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# importing the core library
import numpy as np

# helper functions to list the datasets available
def print_files():
    import os
    for dirname, _, filenames in os.walk('/kaggle/input'):
        for filename in filenames:
            print(os.path.join(dirname, filename))
print_files()

/kaggle/input/exercise-60-denali-mt-mckinleyjpg/Denali Mt McKinley.jpg
/kaggle/input/titanic/gender_submission.csv
/kaggle/input/titanic/test.csv
/kaggle/input/titanic/train.csv
/kaggle/input/iris/Iris.csv
/kaggle/input/iris/database.sqlite


# Numpy exercise

<a id='q1'></a>
**1. Import numpy as np and see the version**

[Go back to the table of contents](#table_of_contents)

In [2]:
# Q. Import numpy as np and print the version number.

# Solution
print("Solution")
import numpy as np
print(np.__version__)

Solution
1.17.4


<a id = 'q2'></a>
**2. How to create a 1D array?**

[Go back to the table of contents](#table_of_contents)

In [3]:
# Q. Create a 1D array of numbers from 0 to 9

# Desired Output
# > array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# Solution
print("Solution")
arr = np.arange(10)
arr

Solution


array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

<a id = 'q3'></a>
**3. How to create a boolean array?**

[Go back to the table of contents](#table_of_contents)

In [4]:
# Q. Create a 3×3 numpy array of all True’s

# Solution
print("Solution")
arr = np.repeat(True, 9).reshape(3, -1)
arr

Solution


array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])
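
As a possible alternative to `np.repeat(...).reshape(...)`, the same array can probably be built in one call. The sketch below assumes nothing beyond NumPy itself; the names `full_true` and `ones_true` are illustrative.

```python
import numpy as np

# Build a 3x3 Boolean array directly, without repeat + reshape
full_true = np.full((3, 3), True, dtype=bool)

# An equivalent route: an array of ones with a Boolean dtype
ones_true = np.ones((3, 3), dtype=bool)

print(full_true)
print(np.array_equal(full_true, ones_true))
```

Both forms state the target shape explicitly, which may be easier to read than reshaping a flat array.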

<a id = 'q4'></a>
**4. How to extract items that satisfy a given condition from 1D array?**

[Go back to the table of contents](#table_of_contents)

In [5]:
# Q. Extract all odd numbers from arr
# Input 
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
print("Input")
arr

# Desired Output
# > array([1, 3, 5, 7, 9])

# Solution
print("Solution")
odds = arr[arr%2 != 0]
odds

Input


array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Solution


array([1, 3, 5, 7, 9])

<a id = 'q5'></a>
**5. How to replace items that satisfy a condition with another value in numpy array?**

[Go back to the table of contents](#table_of_contents)

In [6]:
# Q. Replace all odd numbers in arr with -1
# Input 
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
print("Input")
arr

# Desired Output
# >  array([ 0, -1,  2, -1,  4, -1,  6, -1,  8, -1])

# Solution
print("Solution")
arr[arr%2 != 0] = -1
arr

Input


array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Solution


array([ 0, -1,  2, -1,  4, -1,  6, -1,  8, -1])

<a id = 'q6'></a>
**6. How to replace items that satisfy a condition without affecting the original array?**

[Go back to the table of contents](#table_of_contents)

In [7]:
# Q. Replace all odd numbers in arr with -1 without changing arr
# Input 
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
print("Input")
arr

# Desired Output
# out #>  array([ 0, -1,  2, -1,  4, -1,  6, -1,  8, -1])
# arr #>  array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# Solution
out = arr.copy()
out[out%2 != 0] = -1
print("Solution")
print("Modified array")
out
print("Original array")
arr

Input


array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Solution
Modified array


array([ 0, -1,  2, -1,  4, -1,  6, -1,  8, -1])

Original array


array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
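
A copy-free way to express the same idea is `np.where`, which returns a new array and leaves its input untouched. This is a minimal sketch, not the kernel's own solution; `out` and `arr` mirror the names used above.

```python
import numpy as np

arr = np.arange(10)

# np.where(condition, value_if_true, value_if_false) builds a new array,
# so arr itself is never modified
out = np.where(arr % 2 == 1, -1, arr)

print(out)  # odd values replaced by -1
print(arr)  # original values, unchanged
```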

<a id = 'q7'></a>
**7. How to reshape an array?**

[Go back to the table of contents](#table_of_contents)

In [8]:
# Q. Convert a 1D array to a 2D array with 2 rows
# Input 
arr = np.arange(10)
print("Input")
arr

# Desired Output
# > array([[0, 1, 2, 3, 4],
# >        [5, 6, 7, 8, 9]])

# Solution
print("Solution: reshaped array")
arr.reshape(2, -1)

Input


array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Solution: reshaped array


array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

<a id = 'q8'></a>
**8. How to stack two arrays vertically?**

[Go back to the table of contents](#table_of_contents)

In [9]:
# Q. Stack arrays a and b vertically
# Input
print("Input")
a = np.arange(10).reshape(2,-1)
b = np.repeat(1, 10).reshape(2,-1)
a
b

# Desired Output
#> array([[0, 1, 2, 3, 4],
#>        [5, 6, 7, 8, 9],
#>        [1, 1, 1, 1, 1],
#>        [1, 1, 1, 1, 1]])

# Solution
print("Solution: vertically stacked arrays")
np.vstack((a, b))

Input


array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

array([[1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])

Solution: vertically stacked arrays


array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])

<a id = 'q9'></a>
**9. How to stack two arrays horizontally?**

[Go back to the table of contents](#table_of_contents)

In [10]:
# Q. Stack the arrays a and b horizontally.
# Input
print("Input")
a = np.arange(10).reshape(2,-1)
b = np.repeat(1, 10).reshape(2,-1)

# Desired Output
# > array([[0, 1, 2, 3, 4, 1, 1, 1, 1, 1],
# >        [5, 6, 7, 8, 9, 1, 1, 1, 1, 1]])

# Solution
print("Solution: horizontally stacked arrays")
np.hstack((a, b))

Input
Solution: horizontally stacked arrays


array([[0, 1, 2, 3, 4, 1, 1, 1, 1, 1],
       [5, 6, 7, 8, 9, 1, 1, 1, 1, 1]])

<a id = 'q10'></a>
**10. How to generate custom sequences in numpy without hardcoding?**

[Go back to the table of contents](#table_of_contents)

In [11]:
# Q. Create the following pattern without hardcoding. Use only numpy functions and the below input array a.

# Input
print("Input")
a = np.array([1,2,3])
a

# Desired Output
# > array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3])

# Solution
print("Solution")
solution_array = np.hstack((np.repeat(a, 3), a, a, a)) # np.repeat builds the 1, 1, 1, 2, 2, 2, ... part; hstack then appends the original array three times
np.set_printoptions(threshold=len(solution_array)) # raise the print threshold so the whole array is shown
solution_array

Input


array([1, 2, 3])

Solution


array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3])
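
Writing `a, a, a` by hand still hardcodes the number of repetitions. A sketch that avoids this, assuming the repetition count is kept in a variable (`n` is an illustrative name), could combine `np.repeat` with `np.tile`:

```python
import numpy as np

a = np.array([1, 2, 3])
n = 3  # illustrative: how many times each part is repeated

# np.repeat repeats each element n times: 1 1 1 2 2 2 3 3 3
# np.tile repeats the whole array n times: 1 2 3 1 2 3 1 2 3
pattern = np.concatenate((np.repeat(a, n), np.tile(a, n)))

print(pattern)
```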

<a id = 'q11'></a>
**11. How to get the common items between two python numpy arrays?**

[Go back to the table of contents](#table_of_contents)

In [12]:
# Q. Get the common items between a and b

# Input
print("Input")
a = np.array([1,2,3,2,3,4,3,4,5,6])
b = np.array([7,2,10,2,7,4,9,4,9,8])
a
b

# Desired Output
# array([2, 4])

# Solution
print("Solution")
np.intersect1d(a, b) # sorted unique values present in both arrays, regardless of position

Input


array([1, 2, 3, 2, 3, 4, 3, 4, 5, 6])

array([ 7,  2, 10,  2,  7,  4,  9,  4,  9,  8])

Solution


array([2, 4])
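
The exercise asks for values shared by both arrays, wherever they occur. The sketch below uses two small illustrative arrays (`x` and `y`, not from the kernel) to show how an element-wise comparison differs from `np.intersect1d` when the shared values sit at different positions.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([4, 3, 9, 1])

# Element-wise comparison only finds values that match at the same index
positional = np.unique(x[x == y])

# np.intersect1d finds values present in both arrays, at any position
common = np.intersect1d(x, y)

print(positional)  # empty: no index holds equal values
print(common)      # 1, 3 and 4 occur in both arrays
```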

<a id = 'q12'></a>
**12. How to remove from one array those items that exist in another?**

[Go back to the table of contents](#table_of_contents)

In [13]:
# Q. From array a remove all items present in array b

# Input
print("Input")
a = np.array([1,2,3,4,5])
b = np.array([5,6,7,8,9])
a
b

# Desired Output
# array([1,2,3,4])

# Solution
print("Solution")
a[~np.isin(a,b)] # np.isin returns a Boolean mask marking the items of a that also occur in b; ~ inverts the mask so only the other items are kept

Input


array([1, 2, 3, 4, 5])

array([5, 6, 7, 8, 9])

Solution


array([1, 2, 3, 4])
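
NumPy also ships a set-difference helper that may express this more directly. A minimal sketch, with the caveat that `np.setdiff1d` returns sorted unique values, so duplicates and the original order of `a` are not preserved:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([5, 6, 7, 8, 9])

# Values of a that do not occur in b (sorted, duplicates removed)
remaining = np.setdiff1d(a, b)

print(remaining)
```

When order or duplicates matter, the Boolean mask with `np.isin` is probably the safer choice.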

<a id = 'q13'></a>
**13. How to get the positions where elements of two arrays match?**

[Go back to the table of contents](#table_of_contents)

In [14]:
# Q. Get the positions where elements of a and b match

# Input
print("Input")
a = np.array([1,2,3,2,3,4,3,4,5,6])
b = np.array([7,2,10,2,7,4,9,4,9,8])
a
b

# Desired Output
# > (array([1, 3, 5, 7]),)

# Solution
print("Solution")
np.where(a == b) # Notice: the solution is the INDEX of each match, not the values

Input


array([1, 2, 3, 2, 3, 4, 3, 4, 5, 6])

array([ 7,  2, 10,  2,  7,  4,  9,  4,  9,  8])

Solution


(array([1, 3, 5, 7]),)

<a id = 'q14'></a>
**14. How to extract all numbers between a given range from a numpy array?**

[Go back to the table of contents](#table_of_contents)

In [15]:
# Q. Get all items between 5 and 10 from a.

# Input
print("Input")
a = np.array([2, 6, 1, 9, 10, 3, 27])
a

# Desired Output
# (array([6, 9, 10]),)

# Solution
print("Solution")
a[(a >= 5) & (a <= 10)] # both bounds are inclusive, so 10 is kept as in the desired output

Input


array([ 2,  6,  1,  9, 10,  3, 27])

Solution


array([ 6,  9, 10])

<a id = 'q15'></a>
**15. How to make a python function that handles scalars to work on numpy arrays?**

[Go back to the table of contents](#table_of_contents)

In [16]:
# Q. Convert the function maxx that works on two scalars, to work on two arrays.

# Input

def maxx(x, y):
    """
    Get the maximum of two items
    """
    
    if x >= y:
        return x
    else:
        return y
print("Result of the maxx function")
maxx(1, 5)

print("Input")
a = np.array([5, 7, 9, 8, 6, 4, 5])
b = np.array([6, 3, 4, 8, 9, 7, 1])
a
b

# Desired Output
# pair_max(a, b)
#> array([ 6.,  7.,  9.,  8.,  9.,  7.,  5.])

# Solution
print("Solution")

def pair_max(a, b):
    return np.array([maxx(x, y) for x, y in zip(a, b)]) # apply the scalar function maxx to each pair of elements (paired with zip) and collect the results in a numpy array

pair_max(a, b)

Result of the maxx function


5

Input


array([5, 7, 9, 8, 6, 4, 5])

array([6, 3, 4, 8, 9, 7, 1])

Solution


array([6, 7, 9, 8, 9, 7, 5])
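
Two other routes are worth knowing. `np.vectorize` wraps a scalar function so it accepts arrays (it is a convenience wrapper, not a speed-up), and `np.maximum` is the built-in element-wise maximum. The sketch below redefines `maxx` so it runs on its own; `otypes=[float]` is an assumption chosen to reproduce the float output shown in the desired result.

```python
import numpy as np

def maxx(x, y):
    """Return the larger of two scalars."""
    return x if x >= y else y

# Wrap the scalar function; otypes=[float] forces a float result
pair_max = np.vectorize(maxx, otypes=[float])

a = np.array([5, 7, 9, 8, 6, 4, 5])
b = np.array([6, 3, 4, 8, 9, 7, 1])

vectorized = pair_max(a, b)   # float array
builtin = np.maximum(a, b)    # integer array, computed natively

print(vectorized)
print(builtin)
```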

<a id = 'q16'></a>
**16. How to swap two columns in a 2d numpy array?**

[Go back to the table of contents](#table_of_contents)

In [17]:
# Q. Swap columns 1 and 2 in the array arr.

# Input
print("Input")
arr = np.arange(9).reshape(3,3)
arr

# Solution
print("Solution")
temp = arr[:,0].copy() # temporary variable
arr[:,0], arr[:,1] = arr[:,1], temp
arr

Input


array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

Solution


array([[1, 0, 2],
       [4, 3, 5],
       [7, 6, 8]])
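
Swapping can also be written without a temporary variable by indexing with a list that names the columns (or rows) in their new order. A minimal sketch; note that this returns a new array rather than modifying `arr` in place.

```python
import numpy as np

arr = np.arange(9).reshape(3, 3)

# List the columns in the order wanted: second, first, third
swapped_cols = arr[:, [1, 0, 2]]

# The same idea applied to rows
swapped_rows = arr[[1, 0, 2], :]

print(swapped_cols)
print(swapped_rows)
```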

<a id = 'q17'></a>
**17. How to swap two rows in a 2d numpy array?**

[Go back to the table of contents](#table_of_contents)

In [18]:
# Q. Swap rows 1 and 2 in the array arr:

# Input
print("Input")
arr = np.arange(9).reshape(3,3)
arr

# Solution
print("Solution")
temp = arr[0,:].copy() # temporary variable
arr[0,:], arr[1,:] = arr[1,:], temp
arr


Input


array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

Solution


array([[3, 4, 5],
       [0, 1, 2],
       [6, 7, 8]])

<a id = 'q18'></a>

**18. How to reverse the rows of a 2D array?**

[Go back to the table of contents](#table_of_contents)

In [19]:
# Q. Reverse the rows of a 2D array arr.

# Input
print("Input")
arr = np.arange(9).reshape(3,3)
arr

# Solution
print("Solution")
arr[::-1]
arr[::-1, :] # exactly the same

Input


array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

Solution


array([[6, 7, 8],
       [3, 4, 5],
       [0, 1, 2]])

array([[6, 7, 8],
       [3, 4, 5],
       [0, 1, 2]])

<a id = 'q19'></a>
**19. How to reverse the columns of a 2D array?**

[Go back to the table of contents](#table_of_contents)

In [20]:
# Q. Reverse the columns of a 2D array arr.

# Input
print("Input")
arr = np.arange(9).reshape(3,3)
arr

# Solution
print("Solution")
arr[:, ::-1]

Input


array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

Solution


array([[2, 1, 0],
       [5, 4, 3],
       [8, 7, 6]])

<a id = 'q20'></a>
**20. How to create a 2D array containing random floats between 5 and 10?**

[Go back to the table of contents](#table_of_contents)

In [21]:
# Q. Create a 2D array of shape 5x3 to contain random decimal numbers between 5 and 10.

# Solution
print("Solution")
# randint arguments: lower bound, upper bound (exclusive, so 11 is needed for 10 to be possible) and number of samples. Then we reshape.
# Note: randint returns integers, so this only approximates the exercise; np.random.uniform(5, 10, size=(5, 3)) would return floats in [5, 10).
np.random.randint(5, 11, 15).reshape(5, 3) 

Solution


array([[ 5,  5,  5],
       [ 7, 10,  7],
       [ 6,  5,  9],
       [ 5, 10,  7],
       [ 6,  5,  9]])
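
Since the exercise asks for random floats, a sketch with a uniform distribution may be closer to the intent. It assumes NumPy 1.17 or later for `np.random.default_rng`; the seed value is arbitrary and only there to make the run repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only

# Floats drawn uniformly from the half-open interval [5, 10)
floats = rng.uniform(5, 10, size=(5, 3))

print(floats)
```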

<a id = 'q21'></a>
**21. How to print only 3 decimal places in python numpy array?**

[Go back to the table of contents](#table_of_contents)

In [22]:
# Q. Print or show only 3 decimal places of the numpy array rand_arr.

# Setting print options to default
np.set_printoptions(edgeitems=3,infstr='inf', linewidth=75, nanstr='nan', precision=8, suppress=False, threshold=1000, formatter=None)

# Input
print("Input")
rand_arr = np.random.random((5,3))
rand_arr

# Solution
print("Solution")
np.set_printoptions(precision=3)
rand_arr

Input


array([[0.31026373, 0.85583078, 0.1492071 ],
       [0.57720891, 0.12015007, 0.27606996],
       [0.4696676 , 0.28325313, 0.42910143],
       [0.73269082, 0.84507254, 0.01846609],
       [0.48404417, 0.89820126, 0.04742885]])

Solution


array([[0.31 , 0.856, 0.149],
       [0.577, 0.12 , 0.276],
       [0.47 , 0.283, 0.429],
       [0.733, 0.845, 0.018],
       [0.484, 0.898, 0.047]])

<a id = 'q22'></a>
**22. How to pretty print a numpy array by suppressing the scientific notation (like 1e10)?**

[Go back to the table of contents](#table_of_contents)

In [23]:
# Q. Pretty print rand_arr by suppressing the scientific notation (like 1e10)

# Setting print options to default
np.set_printoptions(edgeitems=3,infstr='inf', linewidth=75, nanstr='nan', precision=8, suppress=False, threshold=1000, formatter=None)

# Input
print("Input")
np.random.seed(100)
rand_arr = np.random.random([3,3])/1e3
rand_arr

# Desired Output
# > array([[ 0.000543,  0.000278,  0.000425],
# >        [ 0.000845,  0.000005,  0.000122],
# >        [ 0.000671,  0.000826,  0.000137]])

# Solution
print("Solution")
np.set_printoptions(suppress=True)
rand_arr

Input


array([[5.43404942e-04, 2.78369385e-04, 4.24517591e-04],
       [8.44776132e-04, 4.71885619e-06, 1.21569121e-04],
       [6.70749085e-04, 8.25852755e-04, 1.36706590e-04]])

Solution


array([[0.0005434 , 0.00027837, 0.00042452],
       [0.00084478, 0.00000472, 0.00012157],
       [0.00067075, 0.00082585, 0.00013671]])

<a id = 'q23'></a>
**23. How to limit the number of items printed in output of numpy array?**

[Go back to the table of contents](#table_of_contents)

In [24]:
# Q. Limit the number of items printed in python numpy array a to a maximum of 6 elements.

# Setting print options to default
np.set_printoptions(edgeitems=3,infstr='inf', linewidth=75, nanstr='nan', precision=8, suppress=False, threshold=1000, formatter=None)

# Input
print("Input")
a = np.arange(15)
a

# Desired Output
# > array([ 0,  1,  2, ..., 12, 13, 14])

# Solution
print("Solution")
np.set_printoptions(threshold=6)
a

Input


array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

Solution


array([ 0,  1,  2, ..., 12, 13, 14])

<a id = 'q24'></a>
**24. How to print the full numpy array without truncating**

[Go back to the table of contents](#table_of_contents)

In [25]:
# Q. Print the full numpy array a without truncating.

# Input
print("Input")
np.set_printoptions(threshold=6)
a = np.arange(15)
a

# Desired Output
# > array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

# Solution
print("Solution")
np.set_printoptions(threshold=len(a))
a

Input


array([ 0,  1,  2, ..., 12, 13, 14])

Solution


array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
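
`np.set_printoptions` changes the setting globally, which is why several cells above reset the options first. A sketch of a more local approach, assuming NumPy 1.15 or later, uses the `np.printoptions` context manager so the setting only applies inside the `with` block:

```python
import numpy as np

a = np.arange(15)

# The threshold applies only inside the with block
with np.printoptions(threshold=6):
    short = repr(a)

# Outside the block the previous print options are back in effect
full = repr(a)

print(short)
print(full)
```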

<a id = 'q25'></a>
**25. How to import a dataset with numbers and texts keeping the text intact in python numpy?**

[Go back to the table of contents](#table_of_contents)

In [26]:
# Q. Import the iris dataset keeping the text intact.

# Input
# Use the iris dataset provided
print_files()

# All the available options of the numpy genfromtxt function
# numpy.genfromtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, skip_header=0, 
#                    skip_footer=0, converters=None, missing_values=None, filling_values=None, 
#                    usecols=None, names=None, excludelist=None, deletechars=" !#$%&'()*+, -./:;<=>?@[\]^{|}~", 
#                    replace_space='_', autostrip=False, case_sensitive=True, defaultfmt='f%i', 
#                    unpack=None, usemask=False, loose=True, invalid_raise=True, max_rows=None, encoding='bytes')

# Solution
print("Solution")
iris = np.genfromtxt('/kaggle/input/iris/Iris.csv', delimiter=',', skip_header=1, usecols = [0, 1, 2, 3, 4, 5], dtype = None)
iris

/kaggle/input/exercise-60-denali-mt-mckinleyjpg/Denali Mt McKinley.jpg
/kaggle/input/titanic/gender_submission.csv
/kaggle/input/titanic/test.csv
/kaggle/input/titanic/train.csv
/kaggle/input/iris/Iris.csv
/kaggle/input/iris/database.sqlite
Solution




array([(  1, 5.1, 3.5, 1.4, 0.2, b'Iris-setosa'),
       (  2, 4.9, 3. , 1.4, 0.2, b'Iris-setosa'),
       (  3, 4.7, 3.2, 1.3, 0.2, b'Iris-setosa'), ...,
       (148, 6.5, 3. , 5.2, 2. , b'Iris-virginica'),
       (149, 6.2, 3.4, 5.4, 2.3, b'Iris-virginica'),
       (150, 5.9, 3. , 5.1, 1.8, b'Iris-virginica')],
      dtype=[('f0', '<i8'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<f8'), ('f5', 'S15')])

<a id = 'q26'></a>
**26. How to extract a particular column from 1D array of tuples?**

[Go back to the table of contents](#table_of_contents)

In [27]:
# Q. Extract the text column species from the 1D iris imported in previous question.

# Input
# Use the iris dataset provided
print_files()

# Use this if you are working on your local machine
# url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
# iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)

# Solution
print("Solution")
iris = np.genfromtxt('/kaggle/input/iris/Iris.csv', delimiter=',', skip_header=1, usecols = [5], dtype='str')
iris

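# Solution from the website
# Note: skip_header is not set here, so the header row is read as data and b'Species' appears as the first item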
print("Solution from website")
iris_1d = np.genfromtxt('/kaggle/input/iris/Iris.csv', delimiter=',', dtype=None)
species = np.array([row[5] for row in iris_1d])
species[:5]

/kaggle/input/exercise-60-denali-mt-mckinleyjpg/Denali Mt McKinley.jpg
/kaggle/input/titanic/gender_submission.csv
/kaggle/input/titanic/test.csv
/kaggle/input/titanic/train.csv
/kaggle/input/iris/Iris.csv
/kaggle/input/iris/database.sqlite
Solution


array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', ..., 'Iris-virginica',
       'Iris-virginica', 'Iris-virginica'], dtype='<U15')

Solution from website




array([b'Species', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa',
       b'Iris-setosa'], dtype='|S18')

<a id = 'q27'></a>
**27. How to convert a 1d array of tuples to a 2d numpy array?**

[Go back to the table of contents](#table_of_contents)


In [28]:
# Q. Convert the 1D iris to 2D array iris_2d by omitting the species text field.

# Input
# Use the iris dataset provided
print_files()

# Use this if you are working on your local machine
# url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
# iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)

# Solution
print("Solution")
iris = np.genfromtxt('/kaggle/input/iris/Iris.csv', delimiter=',', skip_header=1, usecols = [0, 1, 2, 3, 4], dtype = None)
iris[:4]

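# Another solution from the website
# Note: skip_header is not set here, so the header row is read as data and every value is kept as a byte string;
# row.tolist()[:4] also keeps the Id column and drops the last measurement column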
print("Another solution from website")
iris_1d = np.genfromtxt('/kaggle/input/iris/Iris.csv', delimiter=',', dtype=None)
iris_2d = np.array([row.tolist()[:4] for row in iris_1d])
iris_2d[:4]

/kaggle/input/exercise-60-denali-mt-mckinleyjpg/Denali Mt McKinley.jpg
/kaggle/input/titanic/gender_submission.csv
/kaggle/input/titanic/test.csv
/kaggle/input/titanic/train.csv
/kaggle/input/iris/Iris.csv
/kaggle/input/iris/database.sqlite
Solution


array([(1, 5.1, 3.5, 1.4, 0.2), (2, 4.9, 3. , 1.4, 0.2),
       (3, 4.7, 3.2, 1.3, 0.2), (4, 4.6, 3.1, 1.5, 0.2)],
      dtype=[('f0', '<i8'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<f8')])

Another solution from website




array([[b'Id', b'SepalLengthCm', b'SepalWidthCm', b'PetalLengthCm'],
       [b'1', b'5.1', b'3.5', b'1.4'],
       [b'2', b'4.9', b'3.0', b'1.4'],
       [b'3', b'4.7', b'3.2', b'1.3']], dtype='|S13')
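
Neither result above is a numeric 2D array: the first is a 1D array of tuples and the second holds byte strings. A sketch that probably gets closer to the goal reads only the four measurement columns with the default float dtype. To stay runnable anywhere, it parses a small inline sample (a stand-in for `Iris.csv`; the header text is an assumption and is skipped anyway).

```python
import io
import numpy as np

# Inline stand-in for the first rows of Iris.csv (header text is assumed)
csv_text = """Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
1,5.1,3.5,1.4,0.2,Iris-setosa
2,4.9,3.0,1.4,0.2,Iris-setosa
3,4.7,3.2,1.3,0.2,Iris-setosa
"""

# Skip the header, drop the Id and Species columns, keep the default float dtype
iris_2d = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                        skip_header=1, usecols=[1, 2, 3, 4])

print(iris_2d.shape)
print(iris_2d)
```

With a homogeneous float dtype, `genfromtxt` returns a regular 2D array, so column slicing such as `iris_2d[:, 0]` works directly.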

<a id = 'q28'></a>
**28. How to compute the mean, median, standard deviation of a numpy array?**

[Go back to the table of contents](#table_of_contents)

In [29]:
# Q. Find the mean, median, standard deviation of iris's sepallength (1st column)

# Input
# Use the iris dataset provided
print_files()

# Use this if you are working on your local machine
# url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
# iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)

# Solution
print("Solution")
iris = np.genfromtxt('/kaggle/input/iris/Iris.csv', delimiter=',', skip_header=1, usecols = [1])
iris

import pandas as pd
pd.Series(iris).describe()

from scipy import stats 
stats.describe(iris) 

# Solution from the website
print("Another solution from the website")
mu, med, sd = np.mean(iris), np.median(iris), np.std(iris)
print(mu, med, sd)

/kaggle/input/exercise-60-denali-mt-mckinleyjpg/Denali Mt McKinley.jpg
/kaggle/input/titanic/gender_submission.csv
/kaggle/input/titanic/test.csv
/kaggle/input/titanic/train.csv
/kaggle/input/iris/Iris.csv
/kaggle/input/iris/database.sqlite
Solution


array([5.1, 4.9, 4.7, ..., 6.5, 6.2, 5.9])

count    150.000000
mean       5.843333
std        0.828066
min        4.300000
25%        5.100000
50%        5.800000
75%        6.400000
max        7.900000
dtype: float64

DescribeResult(nobs=150, minmax=(4.3, 7.9), mean=5.843333333333334, variance=0.6856935123042507, skewness=0.3117530585022963, kurtosis=-0.5735679489249765)

Another solution from the website
5.843333333333334 5.8 0.8253012917851409
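
The two standard deviations above differ (0.828066 from pandas, 0.825301 from `np.std`) because pandas divides by n - 1 by default while NumPy divides by n. The sketch below shows the `ddof` parameter that controls this, on a small illustrative array `x` rather than the iris data.

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Population standard deviation: divide by n (NumPy's default, ddof=0)
pop_sd = np.std(x)

# Sample standard deviation: divide by n - 1 (the pandas default)
sample_sd = np.std(x, ddof=1)

print(pop_sd)
print(sample_sd)
```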


<a id = 'q29'></a>
**29. How to normalize an array so the values range exactly between 0 and 1?**

[Go back to the table of contents](#table_of_contents)

In [30]:
# Q. Create a normalized form of iris's sepallength whose values range exactly between 0 and 1 so that the minimum has value 0 and maximum has value 1.

# Input
# Use the iris dataset provided
print_files()

# Use this if you are working on your local machine
# url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
# iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)

# Solution
print("Solution")
iris = np.genfromtxt('/kaggle/input/iris/Iris.csv', delimiter=',', skip_header=1, usecols = [1])
(iris - np.min(iris))/(np.max(iris) - np.min(iris))

# Another solution from the website
print("Another solution from the website")
np.ptp(iris) # peak to peak: the same as np.max(iris) - np.min(iris); the function form also works in NumPy 2, where the ndarray.ptp method was removed
(iris - np.min(iris))/np.ptp(iris)

/kaggle/input/exercise-60-denali-mt-mckinleyjpg/Denali Mt McKinley.jpg
/kaggle/input/titanic/gender_submission.csv
/kaggle/input/titanic/test.csv
/kaggle/input/titanic/train.csv
/kaggle/input/iris/Iris.csv
/kaggle/input/iris/database.sqlite
Solution


array([0.22222222, 0.16666667, 0.11111111, ..., 0.61111111, 0.52777778,
       0.44444444])

Another solution from the website


3.6000000000000005

array([0.22222222, 0.16666667, 0.11111111, ..., 0.61111111, 0.52777778,
       0.44444444])

<a id = 'q30'></a>
**30. How to compute the softmax score?**

[Go back to the table of contents](#table_of_contents)

In [31]:
# Q. Compute the softmax score of sepallength.

# Input
# Use the iris dataset provided
print_files()

# Use this if you are working on your local machine
# url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
# iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)

# Solution
# The softmax function normalizes a vector of scores into a probability distribution: every value lies between 0 and 1 and the values sum to 1.
print("Solution")
iris = np.genfromtxt('/kaggle/input/iris/Iris.csv', delimiter=',', skip_header=1, usecols = [1])
softmax = np.exp(iris)/sum(np.exp(iris))
softmax.sum() # must sum to 1

# We can also apply this to more than one column.
iris = np.genfromtxt('/kaggle/input/iris/Iris.csv', delimiter=',', skip_header=1, usecols = [1, 2, 3, 4])
softmax = np.exp(iris)/sum(np.exp(iris)) # the built-in sum adds the rows together, so each column is normalized separately
softmax.sum() # 4.0, since there are 4 columns and each sums to 1
softmax

# Solution from the website
print("Solution from the website")

def softmax(x):
    """Compute softmax values for each set of scores in x.
    https://stackoverflow.com/questions/34968722/how-to-implement-the-softmax-function-in-python"""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

print(softmax(iris).sum())
print(softmax(iris))


/kaggle/input/exercise-60-denali-mt-mckinleyjpg/Denali Mt McKinley.jpg
/kaggle/input/titanic/gender_submission.csv
/kaggle/input/titanic/test.csv
/kaggle/input/titanic/train.csv
/kaggle/input/iris/Iris.csv
/kaggle/input/iris/database.sqlite
Solution


0.9999999999999997

4.0

array([[0.00221959, 0.00944058, 0.00021035, 0.00188094],
       [0.00181724, 0.005726  , 0.00021035, 0.00188094],
       [0.00148783, 0.00699375, 0.00019033, 0.00188094],
       ...,
       [0.00900086, 0.005726  , 0.00940269, 0.011379  ],
       [0.006668  , 0.00854219, 0.01148447, 0.01536005],
       [0.00493978, 0.005726  , 0.00850791, 0.00931634]])

Solution from the website
4.0
[[0.00221959 0.00944058 0.00021035 0.00188094]
 [0.00181724 0.005726   0.00021035 0.00188094]
 [0.00148783 0.00699375 0.00019033 0.00188094]
 ...
 [0.00900086 0.005726   0.00940269 0.011379  ]
 [0.006668   0.00854219 0.01148447 0.01536005]
 [0.00493978 0.005726   0.00850791 0.00931634]]
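
Subtracting the maximum before exponentiating does not change the result mathematically, but it keeps `np.exp` from overflowing. A sketch on deliberately large illustrative scores (not iris values) shows the difference; `stable_softmax` is a hypothetical name for a 1D version of the function above.

```python
import numpy as np

def stable_softmax(x):
    """Softmax of a 1D array, shifted by the maximum for numerical stability."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

scores = np.array([1000.0, 1001.0, 1002.0])

# Naive version: exp(1000) overflows to inf, and inf / inf gives nan
with np.errstate(over='ignore', invalid='ignore'):
    naive = np.exp(scores) / np.exp(scores).sum()

stable = stable_softmax(scores)

print(naive)
print(stable, stable.sum())
```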


<a id = 'q31'></a>
**31. How to find the percentile scores of a numpy array?**

[Go back to the table of contents](#table_of_contents)

In [32]:
# Q. Find the 5th and 95th percentile of iris's sepallength

# Input
# Use the iris dataset provided
print_files()

# Use this if you are working on your local machine
# url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
# iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)

# Solution
print("Solution")
iris = np.genfromtxt('/kaggle/input/iris/Iris.csv', delimiter=',', skip_header=1, usecols = [1])
iris
np.percentile(iris, q = [5, 95])

/kaggle/input/exercise-60-denali-mt-mckinleyjpg/Denali Mt McKinley.jpg
/kaggle/input/titanic/gender_submission.csv
/kaggle/input/titanic/test.csv
/kaggle/input/titanic/train.csv
/kaggle/input/iris/Iris.csv
/kaggle/input/iris/database.sqlite
Solution


array([5.1, 4.9, 4.7, ..., 6.5, 6.2, 5.9])

array([4.6  , 7.255])

<a id = 'q32'></a>
**32. How to insert values at random positions in an array?**

[Go back to the table of contents](#table_of_contents)

In [33]:
# Q. Insert np.nan values at 20 random positions in iris_2d dataset

# Input
# Use the iris dataset provided
print_files()

# Use this if you are working on your local machine
# url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
# iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)

# Solution
print("Solution")
iris = np.genfromtxt('/kaggle/input/iris/Iris.csv', delimiter=',', skip_header=1, usecols = [1, 2, 3, 4])
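index = np.random.randint(0, 150, 20) # 20 random row indices, drawn with replacement
iris[index] = np.nan # note: this sets entire rows (all 4 columns) to nan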
iris

# Solution from the website
print("Another solution from the website")
iris = np.genfromtxt('/kaggle/input/iris/Iris.csv', delimiter=',', skip_header=1, usecols = [1, 2, 3, 4])
i, j = np.where(iris) # row and column indices of all 600 (non-zero) elements of the array
nan_index = (np.random.choice(i, 20), np.random.choice(j, 20)) # 20 random row indices and 20 random column indices
iris[nan_index] = np.nan # a tuple of index arrays addresses individual (row, column) cells
iris

# Solution 3 from the website
print("Solution 3 from the website")
iris = np.genfromtxt('/kaggle/input/iris/Iris.csv', delimiter=',', skip_header=1, usecols = [1, 2, 3, 4])
iris[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan # the upper bound is exclusive, so 150 covers rows 0..149
iris

/kaggle/input/exercise-60-denali-mt-mckinleyjpg/Denali Mt McKinley.jpg
/kaggle/input/titanic/gender_submission.csv
/kaggle/input/titanic/test.csv
/kaggle/input/titanic/train.csv
/kaggle/input/iris/Iris.csv
/kaggle/input/iris/database.sqlite
Solution


array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       ...,
       [6.5, 3. , 5.2, 2. ],
       [6.2, 3.4, 5.4, 2.3],
       [5.9, 3. , 5.1, 1.8]])

Another solution from the website




array([[nan, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       ...,
       [6.5, 3. , 5.2, 2. ],
       [6.2, 3.4, 5.4, 2.3],
       [5.9, 3. , 5.1, 1.8]])

Solution 3 from the website


array([[5.1, 3.5, nan, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       ...,
       [6.5, 3. , 5.2, nan],
       [6.2, 3.4, 5.4, 2.3],
       [5.9, 3. , 5.1, 1.8]])
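
Random index draws with `np.random.randint` can repeat a position, so fewer than 20 cells may end up missing. The sketch below is one way to get exactly 20 distinct cells, assuming a hypothetical 150 x 4 stand-in array and NumPy's `default_rng` generator rather than the legacy `np.random` functions used above.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator, for repeatability
arr = np.arange(600, dtype=float).reshape(150, 4)  # hypothetical stand-in for iris

# Draw 20 distinct flat positions out of the 600 cells, then write through .flat
flat_idx = rng.choice(arr.size, size=20, replace=False)
arr.flat[flat_idx] = np.nan

n_missing = int(np.isnan(arr).sum())
print(n_missing)
```

Sampling flat positions without replacement avoids collisions between separately drawn row and column indices.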

<a id = 'q33'></a>
**33. How to find the position of missing values in numpy array?**

[Go back to the table of contents](#table_of_contents)

In [34]:
# Q. Find the number and position of missing values in iris_2d's sepallength (1st column)

# Input
# Use the iris dataset provided
print_files()

# Use this if you are working on your local machine
# url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
# iris_2d = np.genfromtxt(url, delimiter=',', dtype='float')
# iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan

# Solution
print("Solution")
iris = np.genfromtxt('/kaggle/input/iris/Iris.csv', delimiter=',', skip_header=1, usecols = [1, 2, 3, 4])
iris[:,0][np.random.randint(0 , len(iris), 50)] = np.nan # set random positions in the first column to nan (indices can repeat, so fewer than 50 may end up missing)
iris
nan_index_1 = np.where(np.isnan(iris)) # use np.isnan to test for nan: nan != nan, so a comparison with == never matches
nan_index_1
iris[nan_index_1]
print("Number of missing values: \n", np.isnan(iris[:, 0]).sum())

/kaggle/input/exercise-60-denali-mt-mckinleyjpg/Denali Mt McKinley.jpg
/kaggle/input/titanic/gender_submission.csv
/kaggle/input/titanic/test.csv
/kaggle/input/titanic/train.csv
/kaggle/input/iris/Iris.csv
/kaggle/input/iris/database.sqlite
Solution


array([[nan, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [nan, 3.2, 1.3, 0.2],
       ...,
       [6.5, 3. , 5.2, 2. ],
       [nan, 3.4, 5.4, 2.3],
       [5.9, 3. , 5.1, 1.8]])

(array([  0,   2,   8, ..., 145, 146, 148]), array([0, 0, 0, ..., 0, 0, 0]))

array([nan, nan, nan, ..., nan, nan, nan])

Number of missing values: 
 47
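
A small sketch of the same idea on a hypothetical five-element column: it shows why `==` cannot find nan, and uses `np.flatnonzero` to get the positions as a single 1-D index array.

```python
import numpy as np

col = np.array([5.1, np.nan, 4.7, np.nan, 5.0])  # hypothetical column with two gaps

# nan never compares equal to anything, itself included
equal_hits = int((col == np.nan).sum())

mask = np.isnan(col)
positions = np.flatnonzero(mask)  # indices of the missing entries
count = int(mask.sum())
print(equal_hits, positions, count)
```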


<a id = 'q34'></a>
**34. How to filter a numpy array based on two or more conditions?**

[Go back to the table of contents](#table_of_contents)

In [35]:
# Q. Filter the rows of iris_2d that have petallength (3rd column) > 1.5 and sepallength (1st column) < 5.0

# Input
# Use the iris dataset provided
print_files()

# Use this if you are working on your local machine
# url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
# iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])

# Solution
print("Solution")
iris = np.genfromtxt('/kaggle/input/iris/Iris.csv', delimiter=',', skip_header=1, usecols = [1, 2, 3, 4])
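# with usecols = [1, 2, 3, 4], petal length is column index 2 and sepal length is column index 0;
# the indices 3 and 1 used in this cell select petal width and sepal width
reduce_array = iris[(iris[:,3] > 1.5) & (iris[:,1] < 5)]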
iris.shape
reduce_array.shape
reduce_array

# Another solution
print("Using criteria saved as objects")
cond1 = iris[:,3] > 1.5
cond2 = iris[:,1] < 5
reduce_array2 = iris[cond1 & cond2]
reduce_array2.shape
reduce_array2

# Another solution using reduce
print("Using reduce")
from functools import reduce
criteria = reduce(lambda x, y: x & y, (cond1, cond2))
iris[criteria].shape
iris[criteria]

/kaggle/input/exercise-60-denali-mt-mckinleyjpg/Denali Mt McKinley.jpg
/kaggle/input/titanic/gender_submission.csv
/kaggle/input/titanic/test.csv
/kaggle/input/titanic/train.csv
/kaggle/input/iris/Iris.csv
/kaggle/input/iris/database.sqlite
Solution


(150, 4)

(52, 4)

array([[6.3, 3.3, 4.7, 1.6],
       [5.9, 3.2, 4.8, 1.8],
       [6.7, 3. , 5. , 1.7],
       ...,
       [6.5, 3. , 5.2, 2. ],
       [6.2, 3.4, 5.4, 2.3],
       [5.9, 3. , 5.1, 1.8]])

Using criteria saved as objects


(52, 4)

array([[6.3, 3.3, 4.7, 1.6],
       [5.9, 3.2, 4.8, 1.8],
       [6.7, 3. , 5. , 1.7],
       ...,
       [6.5, 3. , 5.2, 2. ],
       [6.2, 3.4, 5.4, 2.3],
       [5.9, 3. , 5.1, 1.8]])

Using reduce


(52, 4)

array([[6.3, 3.3, 4.7, 1.6],
       [5.9, 3.2, 4.8, 1.8],
       [6.7, 3. , 5. , 1.7],
       ...,
       [6.5, 3. , 5.2, 2. ],
       [6.2, 3.4, 5.4, 2.3],
       [5.9, 3. , 5.1, 1.8]])
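
Column positions are easy to mix up when filtering by index. The sketch below is one possible way to make them explicit, using hypothetical named constants and a four-row stand-in array; `np.logical_and.reduce` combines any number of boolean masks.

```python
import numpy as np

# Hypothetical rows: sepal length, sepal width, petal length, petal width
data = np.array([
    [4.8, 3.4, 1.6, 0.2],
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 2.5, 4.5, 1.7],
    [6.3, 3.3, 6.0, 2.5],
])
SEPAL_LENGTH, PETAL_LENGTH = 0, 2  # named column positions (illustrative)

conditions = [
    data[:, PETAL_LENGTH] > 1.5,
    data[:, SEPAL_LENGTH] < 5.0,
]
mask = np.logical_and.reduce(conditions)  # AND across all masks
selected = data[mask]
print(selected)
```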

<a id = 'q35'></a>

**35. How to drop rows that contain a missing value from a numpy array?**

[Go back to the table of contents](#table_of_contents)

In [36]:
# Q. Select the rows of iris_2d that do not have any nan value.

# Input
# Use the titanic dataset provided
print_files()

# reset the print options to NumPy's defaults
np.set_printoptions(edgeitems=3,infstr='inf', linewidth=75, nanstr='nan', precision=8, suppress=False, threshold=1000, formatter=None)

# Solution
print("Solution")

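# Importing the titanic df
def import_titanic():
    # Naive CSV parsing: splitting on "," also splits the quoted "Surname, Name" field,
    # so every column after Name is shifted one position to the right.
    with open("/kaggle/input/titanic/train.csv", "r") as f:
        data = f.read()
        l = []
        for row in data.split("\n")[1:-1]: # skip the header row and the last split element
            r_ = row.split(",")
            l_ = []
            for c in r_:
                if c == "":
                    l_.append(np.nan) # empty field -> missing value
                else:
                    try:
                        l_.append(float(c))
                    except ValueError: # not a number: keep the raw string
                        l_.append(c)
            l.append(l_)
    return l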

l = import_titanic()
# only numeric columns
a = np.array(l, dtype = object)[:,[1, 2, 6, 7, 8, 10]]
# convert to float
arr = np.array(a, dtype = float)
# boolean mask: True for the rows that contain no nan value
nan_r = np.array([~np.any(np.isnan(row)) for row in arr])
# filter the array
arr_no_nan = arr[nan_r]
arr_no_nan
# check: the sum of nans must be zero
np.isnan(arr_no_nan).sum()

# Solution 2
print("Solution from the website")
l = import_titanic()
a = np.array(l, dtype = object)[:,[1, 2, 6, 7, 8, 10]]
arr = np.array(a, dtype = float)
arr[np.sum(np.isnan(arr), axis = 1) == 0] # much more elegant solution
np.isnan(arr[np.sum(np.isnan(arr), axis = 1) == 0]).sum()

/kaggle/input/exercise-60-denali-mt-mckinleyjpg/Denali Mt McKinley.jpg
/kaggle/input/titanic/gender_submission.csv
/kaggle/input/titanic/test.csv
/kaggle/input/titanic/train.csv
/kaggle/input/iris/Iris.csv
/kaggle/input/iris/database.sqlite
Solution


array([[ 0.    ,  3.    , 22.    ,  1.    ,  0.    ,  7.25  ],
       [ 1.    ,  1.    , 38.    ,  1.    ,  0.    , 71.2833],
       [ 1.    ,  3.    , 26.    ,  0.    ,  0.    ,  7.925 ],
       ...,
       [ 1.    ,  1.    , 19.    ,  0.    ,  0.    , 30.    ],
       [ 1.    ,  1.    , 26.    ,  0.    ,  0.    , 30.    ],
       [ 0.    ,  3.    , 32.    ,  0.    ,  0.    ,  7.75  ]])

0

Solution from the website


array([[ 0.    ,  3.    , 22.    ,  1.    ,  0.    ,  7.25  ],
       [ 1.    ,  1.    , 38.    ,  1.    ,  0.    , 71.2833],
       [ 1.    ,  3.    , 26.    ,  0.    ,  0.    ,  7.925 ],
       ...,
       [ 1.    ,  1.    , 19.    ,  0.    ,  0.    , 30.    ],
       [ 1.    ,  1.    , 26.    ,  0.    ,  0.    , 30.    ],
       [ 0.    ,  3.    , 32.    ,  0.    ,  0.    ,  7.75  ]])

0
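
The row filter can also be written with `any(axis=1)` on the nan mask. A minimal sketch on a hypothetical 4 x 3 array:

```python
import numpy as np

arr = np.array([
    [1.0, 2.0, 3.0],
    [4.0, np.nan, 6.0],
    [7.0, 8.0, 9.0],
    [np.nan, np.nan, 1.0],
])

row_has_nan = np.isnan(arr).any(axis=1)  # one boolean per row
complete_rows = arr[~row_has_nan]
print(complete_rows)
```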

<a id = 'q36'></a>
**36. How to find the correlation between two columns of a numpy array?**

[Go back to the table of contents](#table_of_contents)

In [37]:
# Q. Find the correlation between SepalLength(1st column) and PetalLength(3rd column) in iris_2d

# Input
# Use the iris dataset provided
print_files()

# Use this if you are working on your local machine
# url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
# iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])

# Solution
print("Solution")
iris = np.genfromtxt('/kaggle/input/iris/Iris.csv', delimiter=',', skip_header=1, usecols = [1, 2, 3, 4])
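# with usecols = [1, 2, 3, 4], indices 1 and 3 select sepal width and petal width (sepal length and petal length are indices 0 and 2)
np.correlate(iris[:,1], iris[:,3]) # signal-processing cross-correlation, c[k] = sum_n a[n+k] * conj(v[n]); this is not a correlation coefficient
np.corrcoef(iris[:,1], iris[:,3]) # Pearson correlation matrix; the off-diagonal entry is the coefficient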

# Solution from the website
print("Solution from the website using scipy")
from scipy.stats import pearsonr
corr, p_value = pearsonr(iris[:, 1], iris[:, 3])
print(corr)
print(p_value)

/kaggle/input/exercise-60-denali-mt-mckinleyjpg/Denali Mt McKinley.jpg
/kaggle/input/titanic/gender_submission.csv
/kaggle/input/titanic/test.csv
/kaggle/input/titanic/train.csv
/kaggle/input/iris/Iris.csv
/kaggle/input/iris/database.sqlite
Solution


array([531.53])

array([[ 1.        , -0.35654409],
       [-0.35654409,  1.        ]])

Solution from the website using scipy
-0.3565440896138058
7.523890956067452e-06
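
To see what `np.corrcoef` returns, the Pearson coefficient can be computed by hand from centered vectors. This is a sketch on two hypothetical five-element arrays, not on the iris columns.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Pearson r = sum(xc * yc) / sqrt(sum(xc^2) * sum(yc^2)), with centered vectors
xc, yc = x - x.mean(), y - y.mean()
r_manual = (xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc))

r_numpy = np.corrcoef(x, y)[0, 1]  # off-diagonal entry of the 2 x 2 matrix
print(r_manual, r_numpy)
```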


<a id = 'q37'></a>
**37. How to find if a given array has any null values?**

[Go back to the table of contents](#table_of_contents)

In [38]:
# Q. Find out if iris_2d has any missing values.

# Input
# Use the iris dataset provided
print_files()

# Use this if you are working on your local machine
# url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
# iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])

# Solution
print("Solution")
iris = np.genfromtxt("/kaggle/input/iris/Iris.csv", delimiter=",",  dtype='float', usecols=[1,2,3,4], skip_header=1)
iris[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan # insert some null values

print("It's {} that we have nan values. The total amount of nan values is {}".format(np.any(np.isnan(iris)), np.isnan(iris).sum())) # first returns True, second the total of nan values


/kaggle/input/exercise-60-denali-mt-mckinleyjpg/Denali Mt McKinley.jpg
/kaggle/input/titanic/gender_submission.csv
/kaggle/input/titanic/test.csv
/kaggle/input/titanic/train.csv
/kaggle/input/iris/Iris.csv
/kaggle/input/iris/database.sqlite
Solution
It's True that we have nan values. The total amount of nan values is 20


<a id = 'q38'></a>

**38. How to replace all missing values with 0 in a numpy array?**

[Go back to the table of contents](#table_of_contents)


In [39]:
# Q. Replace all occurrences of nan with 0 in numpy array

# Input
# Use the iris dataset provided
print_files()

# Use this if you are working on your local machine
# url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
# iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
# iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan

# Solution
print("Solution")
iris = np.genfromtxt("/kaggle/input/iris/Iris.csv", delimiter=",",  dtype='float', usecols=[1,2,3,4], skip_header=1)
iris[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan # insert some null values

print("Before applying nan_to_num.")
np.isnan(iris).sum()
a = np.nan_to_num(iris, nan=0.0) # pass the fill value by keyword: the second positional argument is `copy`
print("After applying nan_to_num.")
np.isnan(a).sum()

# Solution from the website
print("Solution from the website")
iris = np.genfromtxt("/kaggle/input/iris/Iris.csv", delimiter=",",  dtype='float', usecols=[1,2,3,4], skip_header=1)
iris[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan # insert some null values
np.isnan(iris).sum()
iris[np.isnan(iris)] = 0
np.isnan(iris).sum() # check the array that was just modified

/kaggle/input/exercise-60-denali-mt-mckinleyjpg/Denali Mt McKinley.jpg
/kaggle/input/titanic/gender_submission.csv
/kaggle/input/titanic/test.csv
/kaggle/input/titanic/train.csv
/kaggle/input/iris/Iris.csv
/kaggle/input/iris/database.sqlite
Solution
Before applying nan_to_num.


20

After applying nan_to_num.


0

Solution from the website


19

0
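
A short sketch of the copy semantics, on a hypothetical four-element array. It assumes a NumPy version that supports the `nan=` keyword of `np.nan_to_num` (added in NumPy 1.17).

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0, np.nan])

# Keyword form: returns a new array and leaves `a` untouched (copy=True is the default)
filled = np.nan_to_num(a, nan=0.0)

# Equivalent out-of-place form with np.where
filled_where = np.where(np.isnan(a), 0.0, a)
print(filled, int(np.isnan(a).sum()))
```

Boolean-mask assignment (`a[np.isnan(a)] = 0`) is the in-place alternative when the original values are no longer needed.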

<a id = 'q39'></a>
**39. How to find the count of unique values in a numpy array?**

[Go back to the table of contents](#table_of_contents)


In [40]:
# Q. Find the unique values and the count of unique values in iris's species

# Input
# Use the iris dataset provided
print_files()

# Use this if you are working on your local machine
# url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
# iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
# names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

# Solution
print("Solution using list comprehension")
iris = np.genfromtxt("/kaggle/input/iris/Iris.csv", delimiter = ",", usecols=[5], dtype=object, skip_header=1)
l = [(v, np.count_nonzero(iris == v)) for v in np.unique(iris)] # count the True entries of the comparison mask for each unique value
l

# Solution from the website
print("Solution from the website")
iris = np.genfromtxt("/kaggle/input/iris/Iris.csv", delimiter = ",", usecols=[5], dtype=object, skip_header=1)
np.unique([v for v in iris], return_counts=True) # much more elegant

/kaggle/input/exercise-60-denali-mt-mckinleyjpg/Denali Mt McKinley.jpg
/kaggle/input/titanic/gender_submission.csv
/kaggle/input/titanic/test.csv
/kaggle/input/titanic/train.csv
/kaggle/input/iris/Iris.csv
/kaggle/input/iris/database.sqlite
Solution using list comprehension


[(b'Iris-setosa', 50), (b'Iris-versicolor', 50), (b'Iris-virginica', 50)]

Solution from the website


(array([b'Iris-setosa', b'Iris-versicolor', b'Iris-virginica'],
       dtype='|S15'), array([50, 50, 50]))
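
The two arrays returned by `np.unique(..., return_counts=True)` are often easier to read as a dictionary. A minimal sketch on a hypothetical six-element label array:

```python
import numpy as np

species = np.array(['setosa', 'virginica', 'setosa', 'versicolor', 'setosa', 'virginica'])

values, counts = np.unique(species, return_counts=True)
count_by_value = dict(zip(values.tolist(), counts.tolist()))
print(count_by_value)
```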

<a id = 'q40'></a>

**40. How to convert a numeric to a categorical (text) array?**

[Go back to the table of contents](#table_of_contents)


In [41]:
# Q. Bin the petal length (3rd) column of iris_2d to form a text array, such that if petal length is:
# Less than 3 --> 'small'
# 3 to less than 5 --> 'medium'
# >= 5 --> 'large'

# Input
# Use the iris dataset provided
print_files()

# Use this if you are working on your local machine
# url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
# iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
# names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

# Solution
print("Solution")
iris = np.genfromtxt("/kaggle/input/iris/Iris.csv", delimiter = ",", usecols=[3], dtype=float, skip_header=1)
bin_ = np.digitize(iris.astype('float'), [0, 3, 5, 10]) # bin 1: [0, 3), bin 2: [3, 5), bin 3: [5, 10), bin 4: >= 10
bin_
label_map = {1: 'small', 2: 'medium', 3: 'large', 4: np.nan}
cat_ = [label_map[x] for x in bin_]
cat_[:5]

/kaggle/input/exercise-60-denali-mt-mckinleyjpg/Denali Mt McKinley.jpg
/kaggle/input/titanic/gender_submission.csv
/kaggle/input/titanic/test.csv
/kaggle/input/titanic/train.csv
/kaggle/input/iris/Iris.csv
/kaggle/input/iris/database.sqlite
Solution


array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 2, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 3, 2, 3, 3, 2, 2, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3])

['small', 'small', 'small', 'small', 'small']
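
The dictionary lookup can be replaced by indexing a label array with the bin numbers. The sketch below assumes hypothetical petal-length values and only the two inner bin edges, so the bin numbers start at 0.

```python
import numpy as np

petal_length = np.array([1.4, 3.0, 4.9, 5.0, 6.1])  # hypothetical values
labels = np.array(['small', 'medium', 'large'])

# With bins [3, 5]: x < 3 -> 0, 3 <= x < 5 -> 1, x >= 5 -> 2
bin_index = np.digitize(petal_length, bins=[3, 5])
categories = labels[bin_index]
print(categories)
```

The result stays a NumPy array, so it can be used directly in later vectorized steps.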

<a id = 'q41'></a>
**41. How to create a new column from existing columns of a numpy array?**

[Go back to the table of contents](#table_of_contents)


In [42]:
# Q. Create a new column for volume in iris_2d, where volume is (pi x petallength x sepal_length^2)/3

# Input
# Use the iris dataset provided
print_files()

# Use this if you are working on your local machine
# url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
# iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
# names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

# Solution
print("Solution: use numpy.c_[] for columns")
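# usecols=[2, 3] loads SepalWidthCm and PetalLengthCm (SepalLengthCm is column 1 of the file)
iris = np.genfromtxt("/kaggle/input/iris/Iris.csv", delimiter = ",", usecols=[2, 3], dtype=float, skip_header=1)
iris.shape
iris = np.c_[iris, (np.array(iris[:,0] * 3.14 * (iris[:,1])**2))/3] # (3.14 x column 0 x column 1^2)/3; 3.14 instead of np.pi explains the small difference from the website solution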
iris.shape
iris[:5]

# Solution from the website
print("Solution from website")
iris = np.genfromtxt("/kaggle/input/iris/Iris.csv", delimiter = ",", usecols=[2, 3], dtype=float, skip_header=1)
s = iris[:, 1]
p = iris[:, 0]
volume = (np.pi * p * (s**2))/3
# Introduce new dimension to match iris_2d's
volume = volume[:, np.newaxis]
# Add the new column
out = np.hstack([iris, volume])
# View
out[:4]

/kaggle/input/exercise-60-denali-mt-mckinleyjpg/Denali Mt McKinley.jpg
/kaggle/input/titanic/gender_submission.csv
/kaggle/input/titanic/test.csv
/kaggle/input/titanic/train.csv
/kaggle/input/iris/Iris.csv
/kaggle/input/iris/database.sqlite
Solution: use numpy.c_[] for columns


(150, 2)

(150, 3)

array([[3.5       , 1.4       , 7.18013333],
       [3.        , 1.4       , 6.1544    ],
       [3.2       , 1.3       , 5.66037333],
       [3.1       , 1.5       , 7.3005    ],
       [3.6       , 1.4       , 7.38528   ]])

Solution from website


array([[3.5       , 1.4       , 7.1837752 ],
       [3.        , 1.4       , 6.1575216 ],
       [3.2       , 1.3       , 5.66324436],
       [3.1       , 1.5       , 7.30420292]])
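
`np.column_stack` is another way to append a derived column, since it turns a 1-D array into a column on its own. The sketch below applies the volume formula from the question to a hypothetical three-row array that holds sepal length and petal length.

```python
import numpy as np

# Hypothetical columns: sepal length, petal length
data = np.array([[5.1, 1.4],
                 [4.9, 1.4],
                 [4.7, 1.3]])
sepal_length, petal_length = data[:, 0], data[:, 1]

volume = np.pi * petal_length * sepal_length**2 / 3
with_volume = np.column_stack([data, volume])  # the 1-D input becomes a new column
print(with_volume.shape)
```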

<a id = 'q42'></a>
**42. How to do probabilistic sampling in numpy?**

[Go back to the table of contents](#table_of_contents)


In [43]:
# Q. Randomly sample iris's species such that setosa is twice the number of versicolor and virginica

# Input
# Use the iris dataset provided
print_files()

# Use this if you are working on your local machine
# url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
# iris = np.genfromtxt(url, delimiter=',', dtype='object')

# Solution from the website
print("Solution from the website")
iris = np.genfromtxt("/kaggle/input/iris/Iris.csv", delimiter = ",", dtype=object, skip_header=1)
# Get the species column
species = iris[:, 5]

# Approach 1: Generate Probabilistically
print("Solution 1: generate probabilistically")
np.random.seed(100)
a = np.array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'])
species_out = np.random.choice(a, 150, p=[0.5, 0.25, 0.25])
species_out

# Approach 2: Probabilistic Sampling (preferred)
# relies on the rows being ordered by species (50 setosa, then 50 versicolor, then 50 virginica)
print("Solution 2: probabilistic sampling")
np.random.seed(100)
probs = np.r_[np.linspace(0, 0.500, num=50), np.linspace(0.501, .750, num=50), np.linspace(.751, 1.0, num=50)]
index = np.searchsorted(probs, np.random.random(150))
species_out = species[index]
print(np.unique(species_out, return_counts=True))

/kaggle/input/exercise-60-denali-mt-mckinleyjpg/Denali Mt McKinley.jpg
/kaggle/input/titanic/gender_submission.csv
/kaggle/input/titanic/test.csv
/kaggle/input/titanic/train.csv
/kaggle/input/iris/Iris.csv
/kaggle/input/iris/database.sqlite
Solution from the website
Solution 1: generate probabilistically


array(['Iris-versicolor', 'Iris-setosa', 'Iris-setosa', 'Iris-virginica',
       'Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-virginica',
       'Iris-setosa', 'Iris-versicolor', 'Iris-virginica', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-virginica',
       'Iris-virginica', 'Iris-setosa', 'Iris-virginica', 'Iris-setosa',
       'Iris-setosa', 'Iris-virginica', 'Iris-virginica', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-virginica', 'Iris-setosa', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-virginica', 'Iris-virginica', 'Iris-setosa',
       'Iris-virginica', 'Iris-versicolor', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-versicolor', 'Iris-setosa', 'Iris-setosa',
       'Iris-versicolor', 'Iris-virginica', 'Iris-setosa', 'Iris-setosa',
       'Iris-virginica', 'Iris-virginica', 'Iris-virginica',
       'Iris-setosa', 'Iris-versic

Solution 2: probabilistic sampling
(array([b'Iris-setosa', b'Iris-versicolor', b'Iris-virginica'],
      dtype=object), array([77, 37, 36]))
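
With a larger sample, the observed shares settle close to the requested 2 : 1 : 1 ratio. This is a sketch that assumes NumPy's `default_rng` generator instead of the legacy `np.random.seed` interface, and a sample size of 10,000 chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(100)
species = np.array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'])

# Target ratio 2 : 1 : 1, expressed as probabilities that sum to 1
sample = rng.choice(species, size=10_000, p=[0.5, 0.25, 0.25])
values, counts = np.unique(sample, return_counts=True)
share = counts / counts.sum()
print(dict(zip(values.tolist(), share.round(2).tolist())))
```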


<a id = 'q43'></a>
**43. How to get the second largest value of an array when grouped by another array?**

[Go back to the table of contents](#table_of_contents)


In [44]:
# Q. What is the value of second longest petallength of species setosa

# Input
# Use the iris dataset provided
print_files()

# Use this if you are working on your local machine
# url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
# iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
# names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

# Solution
print("Solution")
headers = np.genfromtxt("/kaggle/input/iris/Iris.csv", delimiter = ",", dtype=object, skip_header=0)
headers[0]
iris = np.genfromtxt("/kaggle/input/iris/Iris.csv", delimiter = ",", dtype=object, skip_header=1)
# iris
sorted_iris = iris[iris[:,5] == b'Iris-setosa'][:,3]
sorted_iris = sorted_iris.astype(float)
sorted_iris = np.unique(sorted_iris) # np.unique returns the distinct values already sorted in ascending order
sorted_iris[::-1][1] # second largest distinct value

# Solution from the website
print("Solution from the website")
iris = np.genfromtxt("/kaggle/input/iris/Iris.csv", delimiter = ",", dtype=object, skip_header=1)
petal_len_setosa = iris[iris[:, 5] == b'Iris-setosa', [3]].astype('float')
np.unique(np.sort(petal_len_setosa))[-2]

/kaggle/input/exercise-60-denali-mt-mckinleyjpg/Denali Mt McKinley.jpg
/kaggle/input/titanic/gender_submission.csv
/kaggle/input/titanic/test.csv
/kaggle/input/titanic/train.csv
/kaggle/input/iris/Iris.csv
/kaggle/input/iris/database.sqlite
Solution


array([b'Id', b'SepalLengthCm', b'SepalWidthCm', b'PetalLengthCm',
       b'PetalWidthCm', b'Species'], dtype=object)

1.7

Solution from the website


1.7
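
The same lookup can be wrapped in a small helper that also handles groups with fewer than two distinct values. The function name `second_largest` and the sample arrays are hypothetical, introduced only for this sketch.

```python
import numpy as np

values = np.array([1.4, 1.9, 1.7, 4.5, 1.9, 5.1])
groups = np.array(['setosa', 'setosa', 'setosa', 'versicolor', 'setosa', 'virginica'])

def second_largest(values, groups, group):
    """Second-largest distinct value within one group, or None if it does not exist."""
    distinct = np.unique(values[groups == group])  # sorted ascending, duplicates removed
    return float(distinct[-2]) if distinct.size >= 2 else None

result_setosa = second_largest(values, groups, 'setosa')
result_versicolor = second_largest(values, groups, 'versicolor')
print(result_setosa, result_versicolor)
```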

<a id = 'q44'></a>
**44. How to sort a 2D array by a column?**

[Go back to the table of contents](#table_of_contents)


In [45]:
# Q. Sort the iris dataset based on sepallength column.

# Input
# Use the iris dataset provided
print_files()

# Use this if you are working on your local machine
# url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
# iris = np.genfromtxt(url, delimiter=',', dtype='object')
# names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

# Solution
print("Solution")
headers = np.genfromtxt("/kaggle/input/iris/Iris.csv", delimiter = ",", dtype=object, skip_header=0)
print(list(headers[0]))
iris = np.genfromtxt("/kaggle/input/iris/Iris.csv", delimiter = ",", usecols=[0,1,2,3,4], skip_header=1)
iris[iris[:,1].argsort()][:5]

# Solution from the website
print("Solution from the website")
print(iris[iris[:,1].argsort()][:5]) # same solution

/kaggle/input/exercise-60-denali-mt-mckinleyjpg/Denali Mt McKinley.jpg
/kaggle/input/titanic/gender_submission.csv
/kaggle/input/titanic/test.csv
/kaggle/input/titanic/train.csv
/kaggle/input/iris/Iris.csv
/kaggle/input/iris/database.sqlite
Solution
[b'Id', b'SepalLengthCm', b'SepalWidthCm', b'PetalLengthCm', b'PetalWidthCm', b'Species']


array([[14. ,  4.3,  3. ,  1.1,  0.1],
       [43. ,  4.4,  3.2,  1.3,  0.2],
       [39. ,  4.4,  3. ,  1.3,  0.2],
       [ 9. ,  4.4,  2.9,  1.4,  0.2],
       [42. ,  4.5,  2.3,  1.3,  0.3]])

Solution from the website
[[14.   4.3  3.   1.1  0.1]
 [43.   4.4  3.2  1.3  0.2]
 [39.   4.4  3.   1.3  0.2]
 [ 9.   4.4  2.9  1.4  0.2]
 [42.   4.5  2.3  1.3  0.3]]
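
Several rows above share the same sepal length, so the order among them depends on the sort algorithm. The sketch below, on a hypothetical 3 x 3 array, shows a stable sort and a tie-break on a second column with `np.lexsort`.

```python
import numpy as np

data = np.array([[3, 4.4, 3.2],
                 [1, 4.3, 3.0],
                 [2, 4.4, 2.9]])

# A stable sort keeps the original order of rows that tie on the key column
by_col1 = data[data[:, 1].argsort(kind='stable')]

# np.lexsort uses the LAST key as the primary key: column 1, ties broken by column 2
by_col1_then_col2 = data[np.lexsort((data[:, 2], data[:, 1]))]
print(by_col1[:, 0], by_col1_then_col2[:, 0])
```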


<a id = 'q45'></a>
**45. How to find the most frequent value in a numpy array?**

[Go back to the table of contents](#table_of_contents)


In [46]:
# Q. Find the most frequent value of petal length (3rd column) in iris dataset.

# Input
# Use the iris dataset provided
print_files()

# Use this if you are working on your local machine
# url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
# iris = np.genfromtxt(url, delimiter=',', dtype='object')
# names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

# Solution
np.set_printoptions(edgeitems=3,infstr='inf', linewidth=75, nanstr='nan', precision=3, suppress=False, threshold=1000, formatter=None)
print("Solution")
headers = np.genfromtxt("/kaggle/input/iris/Iris.csv", delimiter = ",", dtype=object, skip_header=0)
print(list(headers[0]))
iris = np.genfromtxt("/kaggle/input/iris/Iris.csv", delimiter = ",", usecols=[4], skip_header=1) # column 4 is PetalWidthCm (PetalLengthCm is column 3)
counts = np.unique(iris, return_counts=True) # a tuple of two arrays: the unique values and their counts
sort_list = sorted(list(zip(counts[0], counts[1])), key = lambda x: x[1]) # pair each value with its count and sort by the counts
sort_list[::-1][0] # reverse the list and take the first pair: the most frequent value in this column is 0.2, which occurs 28 times

# Solution from the website
print("Solution from the website")
vals, counts = np.unique(iris, return_counts=True)
print(vals[np.argmax(counts)]) # much more elegant solution

/kaggle/input/exercise-60-denali-mt-mckinleyjpg/Denali Mt McKinley.jpg
/kaggle/input/titanic/gender_submission.csv
/kaggle/input/titanic/test.csv
/kaggle/input/titanic/train.csv
/kaggle/input/iris/Iris.csv
/kaggle/input/iris/database.sqlite
Solution
[b'Id', b'SepalLengthCm', b'SepalWidthCm', b'PetalLengthCm', b'PetalWidthCm', b'Species']


(0.2, 28)

Solution from the website
0.2
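
When two values share the highest count, `np.argmax` reports only the first of them. A minimal sketch on a hypothetical array with such a tie:

```python
import numpy as np

vals = np.array([0.2, 1.5, 0.2, 1.5, 1.3])

uniq, counts = np.unique(vals, return_counts=True)
mode = uniq[np.argmax(counts)]            # first maximum: the smallest value among ties
all_modes = uniq[counts == counts.max()]  # every value that reaches the top count
print(mode, all_modes)
```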


<a id = 'q46'></a>
**46. How to find the position of the first occurrence of a value greater than a given value?**

[Go back to the table of contents](#table_of_contents)


In [47]:
# Q. Find the position of the first occurrence of a value greater than 1.0 in petalwidth 4th column of iris dataset.

# Input
# Use the iris dataset provided
print("Input")
print_files()

# Use this if you are working on your local machine
# url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
# iris = np.genfromtxt(url, delimiter=',', dtype='object')

# Solution
print("Solution")
iris = np.genfromtxt("/kaggle/input/iris/Iris.csv", delimiter = ",", usecols=[4], skip_header=1)
i = np.where(iris > 1) # returns the indices where the condition is True
i
iris[i[0][0]] # the first occurrence is at index 50
iris[:i[0][0] + 1]

# Solution from the website
print("Solution from website")
np.argwhere(iris.astype(float) > 1.0)[0] # same index, obtained in a single expression

Input
/kaggle/input/exercise-60-denali-mt-mckinleyjpg/Denali Mt McKinley.jpg
/kaggle/input/titanic/gender_submission.csv
/kaggle/input/titanic/test.csv
/kaggle/input/titanic/train.csv
/kaggle/input/iris/Iris.csv
/kaggle/input/iris/database.sqlite
Solution


(array([ 50,  51,  52,  53,  54,  55,  56,  58,  59,  61,  63,  64,  65,
         66,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,  78,  80,
         82,  83,  84,  85,  86,  87,  88,  89,  90,  91,  92,  94,  95,
         96,  97,  98,  99, 100, 101, 102, 103, 104, 105, 106, 107, 108,
        109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121,
        122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134,
        135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147,
        148, 149]),)

1.4

array([0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.2, 0.1,
       0.1, 0.2, 0.4, 0.4, 0.3, 0.3, 0.3, 0.2, 0.4, 0.2, 0.5, 0.2, 0.2,
       0.4, 0.2, 0.2, 0.2, 0.2, 0.4, 0.1, 0.2, 0.1, 0.2, 0.2, 0.1, 0.2,
       0.2, 0.3, 0.3, 0.2, 0.6, 0.4, 0.3, 0.2, 0.2, 0.2, 0.2, 1.4])

Solution from website


array([50])
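
Both approaches above raise an `IndexError` when no element exceeds the threshold. The helper below is a sketch of one way to guard against that; the name `first_index_greater` and the return value -1 are choices made for this example.

```python
import numpy as np

def first_index_greater(values, threshold):
    """Index of the first element > threshold, or -1 when none qualifies."""
    mask = np.asarray(values) > threshold
    # argmax returns 0 for an all-False mask, so check mask.any() first
    return int(mask.argmax()) if mask.any() else -1

widths = np.array([0.2, 0.4, 1.4, 0.3, 1.5])  # hypothetical petal widths
found = first_index_greater(widths, 1.0)
missing = first_index_greater(widths, 2.0)
print(found, missing)
```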

<a id = 'q47'></a>
**47. How to replace all values greater than a given value to a given cutoff?**

[Go back to the table of contents](#table_of_contents)


In [48]:
# Q. From the array a, replace all values greater than 30 with 30 and all values less than 10 with 10.
# Input
print("Input")
np.random.seed(100)
a = np.random.uniform(1,50, 20)
a

# Solution
print("Solution")
c1 = np.where(a > 30)
c2 = np.where(a < 10)
a[c1] = 30
a[c2] = 10
a

# Solution from the website
print("Solution from the website")

# Solution 1: Using np.clip
np.random.seed(100)
a = np.random.uniform(1,50, 20)
np.clip(a, a_min=10, a_max=30) # probably the most elegant solution

# Solution 2: Using np.where
np.random.seed(100)
a = np.random.uniform(1,50, 20)
print(np.where(a < 10, 10, np.where(a > 30, 30, a)))

Input


array([27.627, 14.64 , 21.801, 42.394,  1.231,  6.957, 33.867, 41.467,
        7.699, 29.18 , 44.675, 11.251, 10.081,  6.31 , 11.765, 48.953,
       40.772,  9.425, 40.995, 14.43 ])

Solution


array([27.627, 14.64 , 21.801, 30.   , 10.   , 10.   , 30.   , 30.   ,
       10.   , 29.18 , 30.   , 11.251, 10.081, 10.   , 11.765, 30.   ,
       30.   , 10.   , 30.   , 14.43 ])

Solution from the website


array([27.627, 14.64 , 21.801, 30.   , 10.   , 10.   , 30.   , 30.   ,
       10.   , 29.18 , 30.   , 11.251, 10.081, 10.   , 11.765, 30.   ,
       30.   , 10.   , 30.   , 14.43 ])

[27.627 14.64  21.801 30.    10.    10.    30.    30.    10.    29.18
 30.    11.251 10.081 10.    11.765 30.    30.    10.    30.    14.43 ]
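
`np.clip` returns a new array by default; its `out` argument writes the result back into an existing array. A minimal sketch with hypothetical values:

```python
import numpy as np

a = np.array([27.6, 42.4, 1.2, 9.4, 30.0])
original = a.copy()

clipped = np.clip(a, 10, 30)   # new array; `a` is unchanged at this point
unchanged = np.array_equal(a, original)

np.clip(a, 10, 30, out=a)      # in place: the result is written back into `a`
print(clipped, a)
```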


<a id = 'q48'></a>
**48. How to get the positions of top n values from a numpy array?**

[Go back to the table of contents](#table_of_contents)


In [49]:
# Q. Get the positions of top 5 maximum values in a given array a.

# Input
print("Input")
np.random.seed(100)
a = np.random.uniform(1,50, 20)
a

# Solution
print("Solution")
a.argsort() # argsort() returns the indices that sort the array from min to max; index 15 holds the max value
a.argsort()[-5:] # indices of the top 5 values, in ascending order of value
a.argsort()[-5:][::-1] # reverse so that the largest comes first; a.argsort()[::-1][:5] is equivalent
a[a.argsort()[-5:][::-1]] # get the values

# Solution from the website
print("Solution from the webpage")
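# Solution:
print(a.argsort())
#> prints all 20 indices ordered by value; the last five, [18 7 3 10 15], are the positions of the top 5 values

# Solution 2:
np.argpartition(-a, 5)[:5] # positions of the 5 largest values; the order within these 5 is not guaranteed
#> [15 10  3  7 18]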

# Below methods will get you the values.
# Method 1:
a[a.argsort()][-5:]

# Method 2:
np.sort(a)[-5:]

# Method 3:
np.partition(a, kth=-5)[-5:]

# Method 4:
a[np.argpartition(-a, 5)][:5]

Input


array([27.627, 14.64 , 21.801, 42.394,  1.231,  6.957, 33.867, 41.467,
        7.699, 29.18 , 44.675, 11.251, 10.081,  6.31 , 11.765, 48.953,
       40.772,  9.425, 40.995, 14.43 ])

Solution


array([ 4, 13,  5,  8, 17, 12, 11, 14, 19,  1,  2,  0,  9,  6, 16, 18,  7,
        3, 10, 15])

array([18,  7,  3, 10, 15])

array([15, 10,  3,  7, 18])

array([48.953, 44.675, 42.394, 41.467, 40.995])

Solution from the webpage
[ 4 13  5  8 17 12 11 14 19  1  2  0  9  6 16 18  7  3 10 15]


array([15, 10,  3,  7, 18])

array([40.995, 41.467, 42.394, 44.675, 48.953])

array([40.995, 41.467, 42.394, 44.675, 48.953])

array([40.995, 41.467, 42.394, 44.675, 48.953])

array([48.953, 44.675, 42.394, 41.467, 40.995])
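
`np.argpartition` does not order the selected positions, so a common pattern is to sort only the k selected items afterwards. The sketch below assumes a hypothetical seven-element array and k = 3.

```python
import numpy as np

a = np.array([27.6, 14.6, 42.4, 1.2, 48.9, 33.9, 44.7])
k = 3

# argpartition places the k largest in the last k slots, in no particular order
top_unsorted = np.argpartition(a, -k)[-k:]
# sort just those k indices by value, largest first
top_sorted = top_unsorted[np.argsort(a[top_unsorted])[::-1]]
print(top_sorted, a[top_sorted])
```

Only k items are fully sorted, which matters when the array is large and k is small.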

<a id = 'q49'></a>

**49. How to compute the row wise counts of all possible values in an array?**

[Go back to the table of contents](#table_of_contents)


In [50]:
# Q. Compute the counts of unique values row-wise.

# Input
print("Input")
np.random.seed(100)
arr = np.random.randint(1,11,size=(6, 10))
arr

# Desired Output
# > [[1, 0, 2, 1, 1, 1, 0, 2, 2, 0],
# >  [2, 1, 3, 0, 1, 0, 1, 0, 1, 1],
# >  [0, 3, 0, 2, 3, 1, 0, 1, 0, 0],
# >  [1, 0, 2, 1, 0, 1, 0, 2, 1, 2],
# >  [2, 2, 2, 0, 0, 1, 1, 1, 1, 0],
# >  [1, 1, 1, 1, 1, 2, 0, 0, 2, 1]]

# Output contains 10 columns representing numbers from 1 to 10. The values are the counts of the numbers in the respective rows.
# For example, Cell(0,2) has the value 2, which means, the number 3 occurs exactly 2 times in the 1st row.

# Solution
print("Solution")
from collections import Counter
rows = arr.shape[0]
lc = []
for row in range(rows): # iterate over all rows in the numpy array
    counter = Counter() # on every row, create a new Counter
    counter.update(arr[row]) # feed the Counter with the row
    lc.append(dict(counter)) # append the counts as a dict: each row gets its own dict/counter

# map every element to the number of times it occurs in its own row (lc[row].get looks the count up in that row's dict)
# note: this yields one count per element, whereas the desired output above has one column per value from 1 to 10
np.array([np.vectorize(lc[row].get)(arr[row]) for row in range(rows)])

# more interesting approaches here:
# https://stackoverflow.com/questions/16992713/translate-every-element-in-numpy-array-according-to-key

# Solution from the website
print("Solution from the website (incorrect)")

def counts_of_all_values_rowwise(arr2d):
    # Unique values and its counts row wise
    num_counts_array = [np.unique(row, return_counts=True) for row in arr2d]
    # Counts of all values row wise
    return([[int(b[a==i]) if i in a else 0 for i in np.unique(arr2d)] for a, b in num_counts_array])

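# flagged as a bug in this notebook; note that its output matches the Desired Output above (one column per value from 1 to 10)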
print("The solution from the website has a bug.")
counts_of_all_values_rowwise(arr)

def counts_of_all_values_rowwise_corrected(arr2d):
    num_counts_array = [np.unique(row, return_counts=True) for row in arr2d] # same as in the previous solution
    # we have a list of tuples that contain, for each row, the unique elements and their counts
    ll = [] # create an empty list of lists, we will later convert it into a numpy array
    for i in range(arr2d.shape[0]): # for each row in arr2d
        l = [] # create a new list where we will be adding the mappings
        rmapper = num_counts_array[i] # rmapper = row mapper. num_counts_array has one (elements, counts) tuple per row of arr2d
        for v in arr2d[i]: # for each value in a row
            l.append(rmapper[1][np.where(rmapper[0] == v)][0]) # append the count (rmapper[1]) to the list; np.where finds the index of the element
        ll.append(l) # append the list to the list of lists
    return np.array(ll) # convert the list of lists into a 2d numpy array

print("The solution from the website corrected.")
counts_of_all_values_rowwise_corrected(arr)

Input


array([[ 9,  9,  4,  8,  8,  1,  5,  3,  6,  3],
       [ 3,  3,  2,  1,  9,  5,  1, 10,  7,  3],
       [ 5,  2,  6,  4,  5,  5,  4,  8,  2,  2],
       [ 8,  8,  1,  3, 10, 10,  4,  3,  6,  9],
       [ 2,  1,  8,  7,  3,  1,  9,  3,  6,  2],
       [ 9,  2,  6,  5,  3,  9,  4,  6,  1, 10]])

Solution


array([[2, 2, 1, 2, 2, 1, 1, 2, 1, 2],
       [3, 3, 1, 2, 1, 1, 2, 1, 1, 3],
       [3, 3, 1, 2, 3, 3, 2, 1, 3, 3],
       [2, 2, 1, 2, 2, 2, 1, 2, 1, 1],
       [2, 2, 1, 1, 2, 2, 1, 2, 1, 2],
       [2, 1, 2, 1, 1, 2, 1, 2, 1, 1]])

Solution from the website (incorrect)
The solution from the website has a bug.


[[1, 0, 2, 1, 1, 1, 0, 2, 2, 0],
 [2, 1, 3, 0, 1, 0, 1, 0, 1, 1],
 [0, 3, 0, 2, 3, 1, 0, 1, 0, 0],
 [1, 0, 2, 1, 0, 1, 0, 2, 1, 2],
 [2, 2, 2, 0, 0, 1, 1, 1, 1, 0],
 [1, 1, 1, 1, 1, 2, 0, 0, 2, 1]]

The solution from the website corrected.


array([[2, 2, 1, 2, 2, 1, 1, 2, 1, 2],
       [3, 3, 1, 2, 1, 1, 2, 1, 1, 3],
       [3, 3, 1, 2, 3, 3, 2, 1, 3, 3],
       [2, 2, 1, 2, 2, 2, 1, 2, 1, 1],
       [2, 2, 1, 1, 2, 2, 1, 2, 1, 2],
       [2, 1, 2, 1, 1, 2, 1, 2, 1, 1]])
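
The Desired Output for this exercise has one column per possible value (1 to 10). A minimal sketch of one way to get that layout with `np.bincount`, assuming the values are known integers in a fixed range; the helper name `rowwise_value_counts` and its `low`/`high` parameters are hypothetical and not part of the original exercise:

```python
import numpy as np

# The input array printed above (seed 100), typed in so the example is self-contained.
arr = np.array([[9, 9, 4, 8, 8, 1, 5, 3, 6, 3],
                [3, 3, 2, 1, 9, 5, 1, 10, 7, 3],
                [5, 2, 6, 4, 5, 5, 4, 8, 2, 2],
                [8, 8, 1, 3, 10, 10, 4, 3, 6, 9],
                [2, 1, 8, 7, 3, 1, 9, 3, 6, 2],
                [9, 2, 6, 5, 3, 9, 4, 6, 1, 10]])

def rowwise_value_counts(arr2d, low=1, high=10):
    """Count how often each value in [low, high] occurs in every row (hypothetical helper)."""
    n_values = high - low + 1
    # Shift the values so that `low` falls into bin 0, then count per row.
    return np.array([np.bincount(row - low, minlength=n_values) for row in arr2d])

counts = rowwise_value_counts(arr)
print(counts)
```

Because the range is passed explicitly, a value that never occurs anywhere in the array still gets its own (all-zero) column.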

<a id = 'q50'></a>
**50. How to convert an array of arrays into a flat 1d array?**

[Go back to the table of contents](#table_of_contents)


In [51]:
# Q. Convert array_of_arrays into a flat linear 1d array.

# Input
print("Input")
arr = np.arange(9).reshape(3,3)
arr

# Solution
print("Solution")
arr.flatten()

# Solution from the website
arr1 = np.arange(3)
arr2 = np.arange(3,7)
arr3 = np.arange(7,10)

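array_of_arrays = np.array([arr1, arr2, arr3], dtype=object) # the arrays have different lengths, so an object array is required (recent NumPy versions raise an error otherwise)
print('array_of_arrays: ', array_of_arrays)

# Solution 1
arr_2d = np.array([a for arr in array_of_arrays for a in arr]) # nested list comprehension: for each array, for each item in it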
arr_2d

print("------------------")
arr_2d = []
for arr in array_of_arrays:
    for a in arr:
        arr_2d.append(a)
np.array(arr_2d)


# Solution 2:
arr_2d = np.concatenate(array_of_arrays)
print(arr_2d)

Input


array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

Solution


array([0, 1, 2, 3, 4, 5, 6, 7, 8])

array_of_arrays:  [array([0, 1, 2]) array([3, 4, 5, 6]) array([7, 8, 9])]


array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

------------------


array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

[0 1 2 3 4 5 6 7 8 9]


<a id = 'q51'></a>
**51. How to generate one-hot encodings for an array in numpy?**

[Go back to the table of contents](#table_of_contents)


In [52]:
# Q. Compute the one-hot encodings (dummy binary variables for each unique value in the array)

# Input
print("Input")
np.random.seed(101) 
arr = np.random.randint(1,4, size=6)
arr

# Desired Output
# > array([[ 0.,  1.,  0.],
# >        [ 0.,  0.,  1.],
# >        [ 0.,  1.,  0.],
# >        [ 0.,  1.,  0.],
# >        [ 0.,  1.,  0.],
# >        [ 1.,  0.,  0.]])

# Solution
print("Solution using pandas")
import pandas as pd
df = pd.DataFrame(arr)
dummies = pd.get_dummies(df[0])
np.array(dummies)

# Solution using pure python
print("Solution using pure python")
ll = []
for i in sorted(set(arr)): # sorted() makes the column order deterministic
    l = []
    for j in arr:
        l.append(1 if i == j else 0)
    ll.append(l)
np.array(ll).T

# Solution using pure python with list comprehension
print("Solution using pure python with list comprehension (list of lists)")
np.array([[1 if i == j else 0 for i in sorted(set(arr))] for j in arr])

# Solution using numpy
# Solution from the website
print("Solution from the website")
print("Solution 1 using numpy")

uniques = np.unique(arr)
out = np.zeros((arr.shape[0], uniques.shape[0]))
for i, k in enumerate(arr):
    print(i, k)
    out[i, k-1] = 1 # works because the values are exactly 1..n, so value k maps to column k-1
out

print("Solution 2 using numpy")
print("arr[:, None] evaluates all the numpy array to the unique elements and returns True or False")
(arr[:,None] == np.unique(arr))
print("we add .view(np.int8) to convert Boolean to 1 or zero")
(arr[:,None] == np.unique(arr)).view(np.int8)


Input


array([2, 3, 2, 2, 2, 1])

Solution using pandas


array([[0, 1, 0],
       [0, 0, 1],
       [0, 1, 0],
       [0, 1, 0],
       [0, 1, 0],
       [1, 0, 0]], dtype=uint8)

Solution using pure python


array([[0, 1, 0],
       [0, 0, 1],
       [0, 1, 0],
       [0, 1, 0],
       [0, 1, 0],
       [1, 0, 0]])

Solution using pure python with list comprehension (list of lists)


array([[0, 1, 0],
       [0, 0, 1],
       [0, 1, 0],
       [0, 1, 0],
       [0, 1, 0],
       [1, 0, 0]])

Solution from the website
Solution 1 using numpy
0 2
1 3
2 2
3 2
4 2
5 1


array([[0., 1., 0.],
       [0., 0., 1.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [1., 0., 0.]])

Solution 2 using numpy
arr[:, None] evaluates all the numpy array to the unique elements and returns True or False


array([[False,  True, False],
       [False, False,  True],
       [False,  True, False],
       [False,  True, False],
       [False,  True, False],
       [ True, False, False]])

we add .view(np.int8) to convert Boolean to 1 or zero


array([[0, 1, 0],
       [0, 0, 1],
       [0, 1, 0],
       [0, 1, 0],
       [0, 1, 0],
       [1, 0, 0]], dtype=int8)
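
The loop-based NumPy solution relies on the values being exactly 1..n. A minimal sketch of a variant that does not need that assumption, using `np.unique(..., return_inverse=True)` together with rows of an identity matrix (the variable names are illustrative):

```python
import numpy as np

arr = np.array([2, 3, 2, 2, 2, 1])  # the input printed above

# `inverse` holds, for every element, the position of its value in `uniques`.
uniques, inverse = np.unique(arr, return_inverse=True)

# Row k of the identity matrix is the one-hot vector for category k.
one_hot = np.eye(len(uniques), dtype=int)[inverse]
print(one_hot)
```

This also works for arbitrary labels (for example strings), since only the position within `uniques` is used.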

<a id = 'q52'></a>
**52. How to create row numbers grouped by a categorical variable?**

[Go back to the table of contents](#table_of_contents)


In [53]:
# Q. Create row numbers grouped by a categorical variable. Use the following sample from iris species as input.

# Input
# Use the iris dataset provided
print("Input")
print_files()

# Use this if you are working on your local machine
# species = np.genfromtxt(url, delimiter=',', dtype='str', usecols=4)
# species_small = np.sort(np.random.choice(species, size=20))
# species_small
# > array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
# >        'Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor',
# >        'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
# >        'Iris-versicolor', 'Iris-virginica', 'Iris-virginica',
# >        'Iris-virginica', 'Iris-virginica', 'Iris-virginica',
# >        'Iris-virginica', 'Iris-virginica', 'Iris-virginica'],
# >       dtype='<U15')


# Desired Output
# > [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 6, 7]

# Solution
print("Solution")
species = np.genfromtxt("/kaggle/input/iris/Iris.csv", delimiter=',', dtype='str', usecols=5, skip_header=1)
species_small = np.sort(np.random.choice(species, size=20))
species_small
ll = [[i for i in range(len(species_small[species_small == j]))] for j in np.unique(species_small)] # create a list of lists: one range of row numbers per species
[i for l in ll for i in l] # flatten the list of lists; read the loops left to right: for each list l in ll, for each i in l, emit i
# https://stackoverflow.com/questions/952914/how-to-make-a-flat-list-out-of-list-of-lists

# Solution from the website
print("Solution from the website")
[i for val in np.unique(species_small) for i, grp in enumerate(species_small[species_small==val])]

Input
/kaggle/input/exercise-60-denali-mt-mckinleyjpg/Denali Mt McKinley.jpg
/kaggle/input/titanic/gender_submission.csv
/kaggle/input/titanic/test.csv
/kaggle/input/titanic/train.csv
/kaggle/input/iris/Iris.csv
/kaggle/input/iris/database.sqlite
Solution


array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-virginica', 'Iris-virginica',
       'Iris-virginica', 'Iris-virginica', 'Iris-virginica',
       'Iris-virginica', 'Iris-virginica', 'Iris-virginica',
       'Iris-virginica'], dtype='<U15')

[0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 6, 7, 8]

Solution from the website


[0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 6, 7, 8]
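
A minimal sketch of the same idea built from group sizes, assuming (as in this exercise) that the input is already sorted so that equal labels are adjacent. The sample array mirrors the species printed above:

```python
import numpy as np

# Same composition as the sample above: 5 setosa, 6 versicolor, 9 virginica.
species_small = np.array(['Iris-setosa'] * 5 + ['Iris-versicolor'] * 6 + ['Iris-virginica'] * 9)

# Size of every group, in sorted label order.
_, counts = np.unique(species_small, return_counts=True)

# One 0..n-1 counter per group, joined into a single array.
row_numbers = np.concatenate([np.arange(c) for c in counts])
print(row_numbers)
```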


<a id = 'q53'></a>
**53. How to create group ids based on a given categorical variable?**

[Go back to the table of contents](#table_of_contents)

In [54]:
# Q. Create group ids based on a given categorical variable. Use the following sample from iris species as input

# Input
# Use the iris dataset provided
print("Input")
print_files()

# Use this if you are working on your local machine
# species = np.genfromtxt(url, delimiter=',', dtype='str', usecols=4)
# species_small = np.sort(np.random.choice(species, size=20))
# species_small
# > array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
# >        'Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor',
# >        'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
# >        'Iris-versicolor', 'Iris-virginica', 'Iris-virginica',
# >        'Iris-virginica', 'Iris-virginica', 'Iris-virginica',
# >        'Iris-virginica', 'Iris-virginica', 'Iris-virginica'],
# >       dtype='<U15')

# Desired Output
# > [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]

# Solution
print("Solution")
species = np.genfromtxt("/kaggle/input/iris/Iris.csv", delimiter=',', dtype='str', usecols=5, skip_header=1)
species_small = np.sort(np.random.choice(species, size=20))
species_small

d = dict((k, i) for i, k in enumerate(np.unique(species_small))) # map every species to an integer id and store the mapping in a dictionary
np.vectorize(d.get)(species_small) # apply the dictionary's .get method to the whole array at once

# Solution from the website
print("Solution from the website")
print("Solution using numpy")
output = [np.argwhere(np.unique(species_small) == s).tolist()[0][0] for val in np.unique(species_small) for s in species_small[species_small==val]]
output

# Solution: For Loop version
print("Solution using for loops")
output = []
uniqs = np.unique(species_small)

for val in uniqs:  # uniq values in group
    for s in species_small[species_small==val]:  # each element in group
        groupid = np.argwhere(uniqs == s).tolist()[0][0]  # groupid
        output.append(groupid)
output

Input
/kaggle/input/exercise-60-denali-mt-mckinleyjpg/Denali Mt McKinley.jpg
/kaggle/input/titanic/gender_submission.csv
/kaggle/input/titanic/test.csv
/kaggle/input/titanic/train.csv
/kaggle/input/iris/Iris.csv
/kaggle/input/iris/database.sqlite
Solution


array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-virginica', 'Iris-virginica',
       'Iris-virginica', 'Iris-virginica', 'Iris-virginica',
       'Iris-virginica', 'Iris-virginica', 'Iris-virginica'], dtype='<U15')

array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2])

Solution from the website
Solution using numpy


[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]

Solution using for loops


[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]
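
For comparison, a short sketch using `np.unique(..., return_inverse=True)`, which returns the group id of every element directly. The sample array mirrors the species printed above:

```python
import numpy as np

# Same composition as the sample above: 6 setosa, 6 versicolor, 8 virginica.
species_small = np.array(['Iris-setosa'] * 6 + ['Iris-versicolor'] * 6 + ['Iris-virginica'] * 8)

# group_ids[i] is the position of species_small[i] within the sorted unique labels.
uniques, group_ids = np.unique(species_small, return_inverse=True)
print(group_ids)
```

Unlike the loop versions, this does not require the input to be sorted.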

<a id = 'q54'></a>
**54. How to rank items in an array using numpy?**

[Go back to the table of contents](#table_of_contents)

In [55]:
# Q. Create the ranks for the given numeric array a.

# Input
print("Input")
np.random.seed(10)
a = np.random.randint(20, size=10)
a

# Desired Output
# [4 2 6 0 8 7 9 3 5 1]

# Solution
print("Solution")
a.argsort().argsort() # use argsort twice: the first call gives the sorting order of the array, the second turns that order into ranks
print("Order of the array: at index 3 of the array (a) we have the value 0, which is the smallest value,")
print("at index 9 we have the other 0, which is the second smallest value, and so on")
a.argsort()
print("Second argsort: the first value (9) has rank 4 (0-based), the second value (4) has rank 2, and so on")
a.argsort().argsort()

# reference: https://stackoverflow.com/questions/5284646/rank-items-in-an-array-using-python-numpy-without-sorting-array-twice/

Input


array([ 9,  4, 15,  0, 17, 16, 17,  8,  9,  0])

Solution


array([4, 2, 6, 0, 8, 7, 9, 3, 5, 1])

Order of the array: at index 3 of the array (a) we have the value 0, which is the smallest value,
at index 9 we have the other 0, which is the second smallest value, and so on


array([3, 9, 1, 7, 0, 8, 2, 5, 4, 6])

Second argsort: the first value (9) has rank 4 (0-based), the second value (4) has rank 2, and so on


array([4, 2, 6, 0, 8, 7, 9, 3, 5, 1])

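The ranks can also be obtained with a single sort. This is a sketch of the idea discussed in the Stack Overflow thread referenced in the cell above: sort once, then write the positions 0..n-1 back through the sorting order. `kind="stable"` is used here so that ties are ranked in order of appearance:

```python
import numpy as np

a = np.array([9, 4, 15, 0, 17, 16, 17, 8, 9, 0])  # the input printed above

order = a.argsort(kind="stable")   # indices that would sort `a`
ranks = np.empty_like(order)
ranks[order] = np.arange(a.size)   # the element at order[k] receives rank k
print(ranks)                       # [4 2 6 0 8 7 9 3 5 1]
```
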
<a id = 'q55'></a>
**55. How to rank items in a multidimensional array using numpy?**

[Go back to the table of contents](#table_of_contents)

In [56]:
# Q. Create a rank array of the same shape as a given numeric array a.

# Input
print("Input")
np.random.seed(10)
a = np.random.randint(20, size=[2,5])
a

# Desired Output
# > [[4 2 6 0 8]
# >  [7 9 3 5 1]]

# Solution
print("Solution")
a.flatten().argsort().argsort().reshape(2, -1) # first flatten the array, then apply the double-argsort technique from the previous exercise, then restore the shape

# Solution from the website
print("Solution from the website")
print(a.ravel().argsort().argsort().reshape(a.shape))

'''
Difference between flatten and ravel:

- flatten is an ndarray method that always returns a copy of the data.

- ravel returns a view of the original array whenever possible (and is also available as the library-level function np.ravel, which accepts any array-like).
'''

Input


array([[ 9,  4, 15,  0, 17],
       [16, 17,  8,  9,  0]])

Solution


array([[4, 2, 6, 0, 8],
       [7, 9, 3, 5, 1]])

Solution from the website
[[4 2 6 0 8]
 [7 9 3 5 1]]


'\nDifference between flatten and ravel:\n\n- flatten is an ndarray method that always returns a copy of the data.\n\n- ravel returns a view of the original array whenever possible (and is also available as the library-level function np.ravel, which accepts any array-like).\n'
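
A small sketch that makes the practical difference between `flatten` and `ravel` visible: `flatten` always copies, while `ravel` returns a view when the memory layout allows it (as it does for the contiguous example array assumed here):

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

flat_copy = a.flatten()  # always an independent copy
flat_view = a.ravel()    # a view here, because `a` is contiguous

print(np.shares_memory(a, flat_copy))  # False
print(np.shares_memory(a, flat_view))  # True

flat_view[0] = 99        # writing through the view changes `a` as well
print(a[0, 0])           # 99
print(flat_copy[0])      # 0, the copy is unaffected
```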

<a id = 'q56'></a>
**56. How to find the maximum value in each row of a numpy array 2d?**

[Go back to the table of contents](#table_of_contents)

In [57]:
# Q. Compute the maximum for each row in the given array.

# Input
print("Input")
np.random.seed(100)
a = np.random.randint(1,10, [5,3])
a

# Solution
print("Solution")
np.array([max(row) for row in a])

# Solution from the website
print("Solution from the website")

# Solution 1
np.amax(a, axis=1)

# Solution 2
np.apply_along_axis(np.max, arr=a, axis=1)

Input


array([[9, 9, 4],
       [8, 8, 1],
       [5, 3, 6],
       [3, 3, 3],
       [2, 1, 9]])

Solution


array([9, 8, 6, 3, 9])

Solution from the website


array([9, 8, 6, 3, 9])

array([9, 8, 6, 3, 9])

<a id = 'q57'></a>
**57. How to compute the min-by-max for each row for a numpy array 2d?**

[Go back to the table of contents](#table_of_contents)

In [58]:
# Q. Compute the min-by-max for each row for given 2d numpy array.

# Input
print("Input")
np.random.seed(100)
a = np.random.randint(1,10, [5,3])
a

# Solution
print("Solution")
np.array([min(row)/max(row) for row in a])

# Solution from the website
print("Solution from the website")
np.apply_along_axis(lambda x: np.min(x)/np.max(x), arr=a, axis=1) # maybe a slightly more elegant solution

Input


array([[9, 9, 4],
       [8, 8, 1],
       [5, 3, 6],
       [3, 3, 3],
       [2, 1, 9]])

Solution


array([0.444, 0.125, 0.5  , 1.   , 0.111])

Solution from the website


array([0.444, 0.125, 0.5  , 1.   , 0.111])
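
Both the row-wise maximum and the min-by-max ratio can also be written with the `axis` argument of the array methods, which avoids a Python-level loop over rows. A minimal sketch on the input printed above:

```python
import numpy as np

a = np.array([[9, 9, 4],
              [8, 8, 1],
              [5, 3, 6],
              [3, 3, 3],
              [2, 1, 9]])

row_max = a.max(axis=1)          # maximum of every row
ratio = a.min(axis=1) / row_max  # min-by-max of every row

print(row_max)  # [9 8 6 3 9]
print(ratio)
```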

<a id = 'q58'></a>
**58. How to find the duplicate records in a numpy array?**

[Go back to the table of contents](#table_of_contents)

In [59]:
# Q. Find the duplicate entries (2nd occurrence onwards) in the given numpy array and mark them as True. First time occurrences should be False.

# Input
print("Input")
np.random.seed(100)
a = np.random.randint(0, 5, 10)
a

# Desired Output
# > [False  True False  True False False  True  True  True  True]

# Solution
print("Solution")
print("Oops, I misunderstood the problem. This solution marks as True every element whose value occurs more than once (including its first occurrence) and as False the values that occur only once.")
counts = np.unique(a, return_counts=True)
np.array([True if counts[1][np.where(counts[0] == x)] > 1 else False for x in a])

# Solution from the website
print("Solution from the website")
out = np.full(a.shape[0], True) # Create an all True array
unique_positions = np.unique(a, return_index=True)[1] # Find the index positions of unique elements
out[unique_positions] = False # Mark those positions as False
out

Input


array([0, 0, 3, 0, 2, 4, 2, 2, 2, 2])

Solution
Oops, I misunderstood the problem. This solution marks as True every element whose value occurs more than once (including its first occurrence) and as False the values that occur only once.


array([ True,  True, False,  True,  True, False,  True,  True,  True,
        True])

Solution from the website


array([False,  True, False,  True, False, False,  True,  True,  True,
        True])

<a id = 'q59'></a>
**59. How to find the grouped mean in numpy?**

[Go back to the table of contents](#table_of_contents)

In [60]:
# Q. Find the mean of a numeric column grouped by a categorical column in a 2D numpy array

# Input
# Use the iris dataset provided
print("Input")
print_files()

# Use this if you are working on your local machine
# url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
# iris = np.genfromtxt(url, delimiter=',', dtype='object')
# names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

# Desired Output
# > [[b'Iris-setosa', 3.418],
# >  [b'Iris-versicolor', 2.770],
# >  [b'Iris-virginica', 2.974]]

# Solution
print("Solution")
iris = np.genfromtxt("/kaggle/input/iris/Iris.csv", delimiter=',', dtype='object', skip_header=1)
sepalwidth = iris[:,2].astype(float) # column 2 of Iris.csv is SepalWidthCm (column 0 is Id)
names = iris[:,5]
[[name, np.mean(sepalwidth[np.where(names == name)])] for name in np.unique(names)]

# Solution from the website
print("Solution from the website")
numeric_column = iris[:, 2].astype('float')  # sepalwidth
grouping_column = iris[:, 5]  # species

# List comprehension version
[[group_val, numeric_column[grouping_column==group_val].mean()] for group_val in np.unique(grouping_column)]

# For Loop version
output = []
for group_val in np.unique(grouping_column):
    output.append([group_val, numeric_column[grouping_column==group_val].mean()])

output


Input
/kaggle/input/exercise-60-denali-mt-mckinleyjpg/Denali Mt McKinley.jpg
/kaggle/input/titanic/gender_submission.csv
/kaggle/input/titanic/test.csv
/kaggle/input/titanic/train.csv
/kaggle/input/iris/Iris.csv
/kaggle/input/iris/database.sqlite
Solution


[[b'Iris-setosa', 3.418],
 [b'Iris-versicolor', 2.7700000000000005],
 [b'Iris-virginica', 2.974]]

Solution from the website


[[b'Iris-setosa', 3.418],
 [b'Iris-versicolor', 2.7700000000000005],
 [b'Iris-virginica', 2.974]]

[[b'Iris-setosa', 3.418],
 [b'Iris-versicolor', 2.7700000000000005],
 [b'Iris-virginica', 2.974]]
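
A minimal sketch of a loop-free grouped mean using `np.unique(..., return_inverse=True)` and `np.bincount`. The `groups` and `values` arrays below are made-up toy data, not the iris columns:

```python
import numpy as np

# Toy data (hypothetical): a categorical column and a numeric column.
groups = np.array(['a', 'b', 'a', 'c', 'b', 'a'])
values = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 5.0])

# inverse[i] is the integer group id of groups[i].
labels, inverse = np.unique(groups, return_inverse=True)

sums = np.bincount(inverse, weights=values)  # sum of values per group
counts = np.bincount(inverse)                # number of rows per group
means = sums / counts

print(dict(zip(labels.tolist(), means.tolist())))  # {'a': 3.0, 'b': 4.0, 'c': 4.0}
```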

<a id = 'q60'></a>
**60. How to convert a PIL image to numpy array?**

[Go back to the table of contents](#table_of_contents)

In [61]:
# Q. Import the image from the following URL and convert it to a numpy array.

# Input
print("Input")
print_files()

# Use this if you are working on your local machine
# URL = 'https://upload.wikimedia.org/wikipedia/commons/8/8b/Denali_Mt_McKinley.jpg'

# Solution
print("Solution")
from PIL import Image
# pic = Image.open("/kaggle/input/exercise-60-denali-mt-mckinleyjpg/Denali Mt McKinley.jpg").convert("L") # works also without convert("L")
pic = Image.open("/kaggle/input/exercise-60-denali-mt-mckinleyjpg/Denali Mt McKinley.jpg")
imgarr = np.array(pic) 
imgarr

# Solution from the website
print("Solution from the website")
from io import BytesIO
from PIL import Image
import PIL

# Read it as Image
I = Image.open("/kaggle/input/exercise-60-denali-mt-mckinleyjpg/Denali Mt McKinley.jpg")

# Optionally resize
I = I.resize((150, 150)) # size is given as a (width, height) tuple

# Convert to numpy array
arr = np.asarray(I)
arr

# Optionally convert it back to an image and show
im = PIL.Image.fromarray(np.uint8(arr))
im.show()

Input
/kaggle/input/exercise-60-denali-mt-mckinleyjpg/Denali Mt McKinley.jpg
/kaggle/input/titanic/gender_submission.csv
/kaggle/input/titanic/test.csv
/kaggle/input/titanic/train.csv
/kaggle/input/iris/Iris.csv
/kaggle/input/iris/database.sqlite
Solution


array([[[  9,  72, 125],
        [  9,  72, 125],
        [  9,  72, 125],
        ...,
        [ 42, 103, 147],
        [ 42, 103, 147],
        [ 43, 104, 148]],

       [[  9,  72, 125],
        [  9,  72, 125],
        [ 10,  73, 126],
        ...,
        [ 42, 103, 147],
        [ 42, 103, 147],
        [ 43, 104, 148]],

       [[  9,  72, 125],
        [ 10,  73, 126],
        [ 10,  73, 126],
        ...,
        [ 44, 105, 150],
        [ 45, 106, 151],
        [ 45, 106, 151]],

       ...,

       [[ 21,  41,  50],
        [ 29,  51,  64],
        [ 28,  54,  69],
        ...,
        [ 27,  58,  79],
        [ 22,  53,  74],
        [ 19,  47,  69]],

       [[ 27,  51,  63],
        [ 29,  55,  68],
        [ 27,  56,  72],
        ...,
        [ 33,  70,  97],
        [ 27,  64,  91],
        [ 22,  57,  85]],

       [[ 18,  45,  56],
        [ 20,  48,  62],
        [ 21,  52,  70],
        ...,
        [ 15,  51,  77],
        [ 12,  48,  74],
        [ 11,  45,  72]]

Solution from the website


array([[[  9,  72, 125],
        [ 10,  73, 126],
        [ 11,  74, 127],
        ...,
        [ 44, 104, 154],
        [ 44, 104, 154],
        [ 42, 103, 147]],

       [[ 10,  73, 126],
        [ 11,  74, 127],
        [ 12,  75, 128],
        ...,
        [ 46, 106, 156],
        [ 46, 106, 156],
        [ 48, 109, 154]],

       [[ 11,  74, 127],
        [ 12,  75, 128],
        [ 13,  76, 129],
        ...,
        [ 47, 107, 157],
        [ 47, 107, 157],
        [ 45, 105, 155]],

       ...,

       [[ 15,  28,  36],
        [ 24,  46,  60],
        [ 16,  33,  43],
        ...,
        [ 23,  60,  79],
        [ 39,  73,  98],
        [ 42,  79, 106]],

       [[ 32,  50,  60],
        [ 21,  48,  65],
        [ 27,  49,  62],
        ...,
        [ 23,  46,  54],
        [ 22,  41,  56],
        [ 20,  43,  59]],

       [[ 29,  55,  68],
        [ 26,  59,  78],
        [ 19,  48,  64],
        ...,
        [ 25,  71,  95],
        [ 34,  75, 107],
        [ 27,  64,  91]]

<a id = 'q61'></a>
**61. How to drop all missing values from a numpy array?**

[Go back to the table of contents](#table_of_contents)

In [62]:
# Q. Drop all nan values from a 1D numpy array

# Input
print("Input")
a = np.array([1,2,3,np.nan,5,6,7,np.nan])
a

# Desired Output
# array([ 1.,  2.,  3.,  5.,  6.,  7.])

# Solution
print("Solution")
a[[not np.isnan(x) for x in a]] # build a boolean mask that is True wherever the value is not nan, then index the original array with it

# Solution from the website
print("Solution from the website")
a[~np.isnan(a)]

Input


array([ 1.,  2.,  3., nan,  5.,  6.,  7., nan])

Solution


array([1., 2., 3., 5., 6., 7.])

Solution from the website


array([1., 2., 3., 5., 6., 7.])

<a id = 'q62'></a>
**62. How to compute the euclidean distance between two arrays?**

[Go back to the table of contents](#table_of_contents)

In [63]:
# Q. Compute the euclidean distance between two arrays a and b.

# Input
print("Input")
a = np.array([1,2,3,4,5])
b = np.array([4,5,6,7,8])
a
b

# Solution
print("Solution")
dist = np.linalg.norm(a-b)
dist

# The website uses the same solution

Input


array([1, 2, 3, 4, 5])

array([4, 5, 6, 7, 8])

Solution


6.708203932499369
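
For reference, `np.linalg.norm(a - b)` computes the square root of the sum of squared differences. A short sketch that spells the formula out and compares it with the library call:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([4, 5, 6, 7, 8])

# Every difference is -3, so the sum of squares is 5 * 9 = 45.
dist_manual = np.sqrt(np.sum((a - b) ** 2))

print(dist_manual)
print(np.isclose(dist_manual, np.linalg.norm(a - b)))  # True
```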

<a id = 'q63'></a>
**63. How to find all the local maxima (or peaks) in a 1d array?**

[Go back to the table of contents](#table_of_contents)

In [64]:
# Q. Find all the peaks in a 1D numpy array a. Peaks are points surrounded by smaller values on both sides.

# Input
print("Input")
a = np.array([1, 3, 7, 1, 2, 6, 0, 1])
a

# Desired Output
# > array([2, 5])
# where, 2 and 5 are the positions of peak values 7 and 6.

# Solution
print("Solution using scipy")
from scipy.signal import find_peaks

ipeaks, _ = find_peaks(a)
ipeaks
a[ipeaks]

# Solution from the website
print("Solution from the website using pure numpy")
doublediff = np.diff(np.sign(np.diff(a)))
peak_locations = np.where(doublediff == -2)[0] + 1
peak_locations

Input


array([1, 3, 7, 1, 2, 6, 0, 1])

Solution using scipy


array([2, 5])

array([7, 6])

Solution from the website using pure numpy


array([2, 5])
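
The double-diff solution is easier to follow with its intermediate arrays printed. A sketch of the same computation, step by step (the names `slopes` and `turns` are illustrative):

```python
import numpy as np

a = np.array([1, 3, 7, 1, 2, 6, 0, 1])

slopes = np.sign(np.diff(a))  # +1 where the series rises, -1 where it falls
turns = np.diff(slopes)       # -2 marks a rise followed directly by a fall
peak_locations = np.where(turns == -2)[0] + 1  # +1 realigns with positions in `a`

print(slopes)          # [ 1  1 -1  1  1 -1  1]
print(turns)           # [ 0 -2  2  0 -2  2]
print(peak_locations)  # [2 5]
```

Note that a flat-topped peak (two equal neighbouring values) produces a slope of 0 in between, so this pure NumPy version would not report it.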

<a id = 'q64'></a>
**64. How to subtract a 1d array from a 2d array, where each item of 1d array subtracts from respective row?**

[Go back to the table of contents](#table_of_contents)

In [65]:
# Q. Subtract the 1d array b_1d from the 2d array a_2d, such that each item of b_1d subtracts from respective row of a_2d.

# Input
print("Input")
a_2d = np.array([[3,3,3],[4,4,4],[5,5,5]])
b_1d = np.array([1,2,3])
a_2d
b_1d

# Desired Output
# > [[2 2 2]
# >  [2 2 2]
# >  [2 2 2]]

# Solution
print("Solution")
a_2d - b_1d[:, None]

Input


array([[3, 3, 3],
       [4, 4, 4],
       [5, 5, 5]])

array([1, 2, 3])

Solution


array([[2, 2, 2],
       [2, 2, 2],
       [2, 2, 2]])
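
The `[:, None]` indexing is what makes the subtraction row-wise: it turns `b_1d` from shape `(3,)` into a column of shape `(3, 1)`, which broadcasting then stretches across the columns of `a_2d`. A small sketch contrasting it with the plain subtraction:

```python
import numpy as np

a_2d = np.array([[3, 3, 3], [4, 4, 4], [5, 5, 5]])
b_1d = np.array([1, 2, 3])

column = b_1d[:, None]   # shape (3, 1): one value per row
row_wise = a_2d - column # b_1d[i] is subtracted from every item of row i
col_wise = a_2d - b_1d   # shape (3,) lines up with the columns instead

print(column.shape)  # (3, 1)
print(row_wise)
print(col_wise)
```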

<a id = 'q65'></a>
**65. How to find the index of n'th repetition of an item in an array**

[Go back to the table of contents](#table_of_contents)


In [66]:
# Q. Find the index of 5th repetition of number 1 in x.

# Input
print("Input")
x = np.array([1, 2, 1, 1, 3, 4, 3, 1, 1, 2, 1, 1, 2])
x

# Solution
print("Solution")
np.where(x == 1)[0][4] # np.where returns a tuple, so [0] takes the index array; [4] is then the index of the 5th occurrence

# Solution from the website
print("Solution from the website")
n = 5
[i for i, v in enumerate(x) if v == 1][n-1]

print("Solution using numpy")
np.where(x == 1)[0][n-1] # n = 5, so n - 1 = 4 is the same index used above

Input


array([1, 2, 1, 1, 3, 4, 3, 1, 1, 2, 1, 1, 2])

Solution


8

Solution from the website


8

Solution using numpy


8

<a id = 'q66'></a>
**66. How to convert numpy's datetime64 object to datetime's datetime object?**

[Go back to the table of contents](#table_of_contents)

In [67]:
# Q. Convert numpy's datetime64 object to datetime's datetime object

# Input
print("Input")
dt64 = np.datetime64('2018-02-25 22:10:10')
dt64

# Solution from the website
print("Solution from the website")
from datetime import datetime
dt64.tolist()

# or

dt64.astype(datetime)

Input


numpy.datetime64('2018-02-25T22:10:10')

Solution from the website


datetime.datetime(2018, 2, 25, 22, 10, 10)

datetime.datetime(2018, 2, 25, 22, 10, 10)

<a id = 'q67'></a>
**67. How to compute the moving average of a numpy array?**

[Go back to the table of contents](#table_of_contents)

In [68]:
# Q. Compute the moving average of window size 3, for the given 1D array.

# Input
print("Input")
np.random.seed(100)
a = np.random.randint(10, size=10)
a

# Solution
print("Solution")
# using the solution from https://stackoverflow.com/questions/14313510/how-to-calculate-moving-average-using-numpy/54628145
def moving_average(a, n=3):
    ret = np.cumsum(a, dtype=float)
    ret[n:] = ret[n:] - ret[:-n]
    return ret[n - 1:] / n

moving_average(a)

# Solution from the website
print("Solution from the website")
np.convolve(a, np.ones(3)/3, mode='valid')

Input


array([8, 8, 3, 7, 7, 0, 4, 2, 5, 2])

Solution


array([6.333, 6.   , 5.667, 4.667, 3.667, 2.   , 3.667, 3.   ])

Solution from the website


array([6.333, 6.   , 5.667, 4.667, 3.667, 2.   , 3.667, 3.   ])
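
Another way to look at the moving average is as the mean over explicit sliding windows. A minimal sketch using `numpy.lib.stride_tricks.sliding_window_view`, assuming NumPy 1.20 or newer where that function is available:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.array([8, 8, 3, 7, 7, 0, 4, 2, 5, 2])  # the input printed above

windows = sliding_window_view(a, window_shape=3)  # shape (8, 3), one row per window
moving_avg = windows.mean(axis=1)

print(windows[:2])   # [[8 8 3] [8 3 7]]
print(moving_avg)
```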

<a id = 'q68'></a>
**68. How to create a numpy array sequence given only the starting point, length and the step?**

[Go back to the table of contents](#table_of_contents)

In [69]:
# Q. Create a numpy array of length 10, starting from 5 and has a step of 3 between consecutive numbers

# Solution
print("Solution")

np.arange(5, (5 + (10*3)), 3) # first argument is the starting point, second is the end, and the third the step.

Solution


array([ 5,  8, 11, 14, 17, 20, 23, 26, 29, 32])
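
The end point `5 + 10*3` is derived from the start, length and step. A small sketch that wraps this in a helper so the end never has to be computed by hand; the name `seq` is hypothetical:

```python
import numpy as np

def seq(start, length, step):
    """Return `length` values beginning at `start`, spaced by `step` (hypothetical helper)."""
    return start + step * np.arange(length)

result = seq(5, 10, 3)
print(result)  # [ 5  8 11 14 17 20 23 26 29 32]
```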

<a id = 'q69'></a>
**69. How to fill in missing dates in an irregular series of numpy dates?**

[Go back to the table of contents](#table_of_contents)

In [70]:
# Q. Given an array of a non-continuous sequence of dates. Make it a continuous sequence of dates, by filling in the missing dates.

# Input
print("Input")
dates = np.arange(np.datetime64('2018-02-01'), np.datetime64('2018-02-25'), 2)
dates

# Solution from the website
print("Solution from the website")
# Solution ---------------
filled_in = np.array([np.arange(date, (date+d)) for date, d in zip(dates, np.diff(dates))]).reshape(-1)

# add the last day
output = np.hstack([filled_in, dates[-1]])
output

# For loop version -------
out = []
for date, d in zip(dates, np.diff(dates)):
    out.append(np.arange(date, (date+d)))

filled_in = np.array(out).reshape(-1)

# add the last day
output = np.hstack([filled_in, dates[-1]])
output

Input


array(['2018-02-01', '2018-02-03', '2018-02-05', '2018-02-07',
       '2018-02-09', '2018-02-11', '2018-02-13', '2018-02-15',
       '2018-02-17', '2018-02-19', '2018-02-21', '2018-02-23'],
      dtype='datetime64[D]')

Solution from the website


array(['2018-02-01', '2018-02-02', '2018-02-03', '2018-02-04',
       '2018-02-05', '2018-02-06', '2018-02-07', '2018-02-08',
       '2018-02-09', '2018-02-10', '2018-02-11', '2018-02-12',
       '2018-02-13', '2018-02-14', '2018-02-15', '2018-02-16',
       '2018-02-17', '2018-02-18', '2018-02-19', '2018-02-20',
       '2018-02-21', '2018-02-22', '2018-02-23'], dtype='datetime64[D]')

array(['2018-02-01', '2018-02-02', '2018-02-03', '2018-02-04',
       '2018-02-05', '2018-02-06', '2018-02-07', '2018-02-08',
       '2018-02-09', '2018-02-10', '2018-02-11', '2018-02-12',
       '2018-02-13', '2018-02-14', '2018-02-15', '2018-02-16',
       '2018-02-17', '2018-02-18', '2018-02-19', '2018-02-20',
       '2018-02-21', '2018-02-22', '2018-02-23'], dtype='datetime64[D]')
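
The comprehension above stacks equally sized chunks, which works here because every gap is two days. As a sketch for series with uneven gaps, `np.arange` can simply span the first to the last date in one-day steps. The `dates` array below is a made-up irregular example and is assumed to be sorted:

```python
import numpy as np

# Hypothetical irregular series with gaps of 1, 3 and 4 days.
dates = np.array(['2018-02-01', '2018-02-02', '2018-02-05', '2018-02-09'],
                 dtype='datetime64[D]')

# The stop value is exclusive, so add one day to include the last date.
filled = np.arange(dates[0], dates[-1] + np.timedelta64(1, 'D'))
print(filled)
```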

<a id = 'q70'></a>
**70. How to create strides from a given 1D array?**

[Go back to the table of contents](#table_of_contents)

In [71]:
# Q. From the given 1d array arr, generate a 2d matrix using strides, with a window length of 4 and strides of 2, like [[0,1,2,3], [2,3,4,5], [4,5,6,7]..]

# Input
print("Input")
arr = np.arange(15) 
arr

# Desired Output
# > [[ 0  1  2  3]
# >  [ 2  3  4  5]
# >  [ 4  5  6  7]
# >  [ 6  7  8  9]
# >  [ 8  9 10 11]
# >  [10 11 12 13]]

# Solution
print("Solution")
index_ = np.arange(0, 15, 2)
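arr_ = np.array([arr[index_[i]:index_[i+2]] for i in range(6)]) # index_[i] is the window start and index_[i+2] = start + 4 its end; np.array stacks the windows into a 2d matrix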
arr_

# Solution from the website
print("Solution from the website")
def gen_strides(a, stride_len=5, window_len=5):
    
    n_strides = ((a.size-window_len)//stride_len) + 1
    
    return np.array([a[s:(s+window_len)] for s in np.arange(0, n_strides*stride_len, stride_len)])

print(gen_strides(np.arange(15), stride_len=2, window_len=4))

Input


array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

Solution


array([[ 0,  1,  2,  3],
       [ 2,  3,  4,  5],
       [ 4,  5,  6,  7],
       [ 6,  7,  8,  9],
       [ 8,  9, 10, 11],
       [10, 11, 12, 13]])

Solution from the website
[[ 0  1  2  3]
 [ 2  3  4  5]
 [ 4  5  6  7]
 [ 6  7  8  9]
 [ 8  9 10 11]
 [10 11 12 13]]


# This is the end. Thank you for following the exercises. I hope you learned a lot of Numpy.