<h1><center>NUMPY</center></h1>
## Questions 
31. How to find the percentile scores of a numpy array?
32. How to insert values at random positions in an array?
33. How to find the position of missing values in numpy array?
34. How to filter a numpy array based on two or more conditions?
35. How to drop rows that contain a missing value from a numpy array?
36. How to find the correlation between two columns of a numpy array?
37. How to find if a given array has any null values?
38. How to replace all missing values with 0 in a numpy array?
39. How to find the count of unique values in a numpy array?
40. How to convert a numeric to a categorical (text) array?
41. How to create a new column from existing columns of a numpy array?
42. How to do probabilistic sampling in numpy?
43. How to get the second largest value of an array when grouped by another array?
44. How to sort a 2D array by a column?
45. How to find the most frequent value in a numpy array?
46. How to find the position of the first occurrence of a value greater than a given value?
47. How to replace all values greater than a given value with a given cutoff?
48. How to get the positions of top n values from a numpy array?
49. How to compute the row wise counts of all possible values in an array?
50. How to convert an array of arrays into a flat 1d array?

In [1]:
#1 percentile score
import numpy as np
a=np.array([1,2,3,4,5,67,8])

In [2]:
a

array([ 1,  2,  3,  4,  5, 67,  8])

In [3]:
np.percentile(a,35)

3.0999999999999996

In [4]:
#2 insert a value (here 43, at the fixed index 5)
a=np.insert(a,5,43)

In [5]:
a

array([ 1,  2,  3,  4,  5, 43, 67,  8])
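The cell above inserts at a fixed index. For random positions, one possible sketch is below. It assumes a seeded `np.random.default_rng` generator; the names `base` and `new_values` are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only to make the run repeatable

base = np.array([1, 2, 3, 4, 5, 67, 8])
new_values = np.array([43, 99])  # hypothetical values to insert

# Draw one insertion index per new value; an index equal to len(base) appends at the end.
positions = rng.integers(0, len(base) + 1, size=len(new_values))

# np.insert interprets the indices relative to the original array and returns a copy.
result = np.insert(base, positions, new_values)
print(positions, result)
```

`base` itself is not modified; `result` holds the original values in their original order with the new ones interleaved.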

In [6]:
#3 position of missing values
idx=np.where(np.isnan(a))

In [7]:
idx

(array([], dtype=int64),)
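The result above is empty because `a` is an integer array and cannot hold `np.nan`. A minimal sketch on a hypothetical float array `x` that does contain missing values:

```python
import numpy as np

# Hypothetical float array; np.nan can only be stored in a floating-point dtype.
x = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

# np.isnan gives a boolean mask; np.flatnonzero returns the indices where it is True.
nan_positions = np.flatnonzero(np.isnan(x))
print(nan_positions)  # [1 3]
```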

In [8]:
#4 filtering a numpy array
a=a[(a>2) & (a<50)]

In [9]:
a

array([ 3,  4,  5, 43,  8])

In [10]:
#5 drop missing values (a is 1D here and has none, so nothing is removed)
a[~np.isnan(a)]

array([ 3,  4,  5, 43,  8])

In [11]:
a

array([ 3,  4,  5, 43,  8])
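Since `a` is 1D, it has no rows to drop. A sketch of the row-wise case, assuming a hypothetical 2D float array `m` with missing values in two of its rows:

```python
import numpy as np

# Hypothetical 2D array with missing values in rows 1 and 3.
m = np.array([[1.0, 2.0, 3.0],
              [4.0, np.nan, 6.0],
              [7.0, 8.0, 9.0],
              [np.nan, 1.0, 2.0]])

# True for every row that contains at least one NaN.
has_nan = np.isnan(m).any(axis=1)

# Keep only the complete rows.
clean = m[~has_nan]
print(clean)
```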

In [12]:
b=np.array([1,33,4,5,6,np.nan])

In [13]:
#6 correlation between two columns
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])

# Solution 1
np.corrcoef(iris[:, 0], iris[:, 2])[0, 1]


0.8717541573048718

In [14]:
#7 find if array has null values

np.isnan(b).any()

True

In [15]:
#8 replace null with 0
b[np.isnan(b)]=0

In [16]:
b

array([ 1., 33.,  4.,  5.,  6.,  0.])

In [17]:
#9 number of distinct values
len(np.unique(a))

5
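The cell above gives the number of distinct values. If the count of each distinct value is wanted instead, `np.unique` with `return_counts=True` is one way to get it. The array `x` below is hypothetical data with repeats:

```python
import numpy as np

x = np.array([3, 4, 5, 4, 3, 3, 8])  # hypothetical data with repeats

# values are the sorted distinct entries; counts[i] is how often values[i] occurs.
values, counts = np.unique(x, return_counts=True)
count_by_value = dict(zip(values.tolist(), counts.tolist()))
print(count_by_value)  # {3: 3, 4: 2, 5: 1, 8: 1}
```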

In [18]:
#10 numeric array to text (array2string returns one string for the whole array)
np.array2string(a)

'[ 3  4  5 43  8]'
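`np.array2string` returns a single string, not one label per element. A possible sketch of binning into text categories with `np.digitize` follows; the bin edges and the labels `small`, `medium`, `large` are assumptions made up for this example.

```python
import numpy as np

x = np.array([3, 4, 5, 43, 8])

# Hypothetical bin edges and labels: below 5 is "small", 5 to 9 is "medium", 10 and up is "large".
edges = np.array([5, 10])
labels = np.array(["small", "medium", "large"])

# np.digitize returns, for each value, the index of the bin it falls into (0, 1 or 2 here).
bin_index = np.digitize(x, edges)
categories = labels[bin_index]
print(categories)
```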

In [19]:
#11 create new columns from existing columns (np.append adds one element to the 1D array)
np.append(a,np.zeros(1))

array([ 3.,  4.,  5., 43.,  8.,  0.])
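The cell above appends a single element to a 1D array. For a 2D array, a new column computed from existing ones could be added as sketched here. The `table` array and its width/height meaning are hypothetical.

```python
import numpy as np

# Hypothetical table: column 0 is width, column 1 is height.
table = np.array([[2.0, 3.0],
                  [4.0, 5.0],
                  [6.0, 1.0]])

# Derive a new column from the existing ones (here: width * height).
area = table[:, 0] * table[:, 1]

# np.column_stack appends the 1D result as a third column and returns a new array.
table_with_area = np.column_stack([table, area])
print(table_with_area)
```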

In [20]:
# 12 probabilistic sampling in numpy array
np.random.choice(a,5)

array([3, 4, 5, 8, 5])
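The call above draws every element with equal probability. To weight the draw, a probability vector can be passed through the `p` argument. A minimal sketch, assuming a seeded generator and made-up weights:

```python
import numpy as np

rng = np.random.default_rng(42)  # seed chosen only to make the run repeatable

species = np.array(["setosa", "versicolor", "virginica"])
# Hypothetical weights: setosa should be drawn about twice as often as each other class.
probs = [0.5, 0.25, 0.25]

sample = rng.choice(species, size=1000, p=probs)

# Tally the draws to check that the proportions roughly follow probs.
values, counts = np.unique(sample, return_counts=True)
print(dict(zip(values.tolist(), counts.tolist())))
```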

In [21]:
#13 second largest value (of the whole array, no grouping)
import heapq
heapq.nlargest(2,a)[1]

8
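The cell above takes the second largest value of the whole array. A sketch of the grouped version follows, assuming hypothetical `values` and `groups` arrays of equal length and treating "second largest" as the second largest distinct value within each group:

```python
import numpy as np

# Hypothetical data: `values` are measurements, `groups` labels the group of each one.
values = np.array([5.1, 4.9, 7.0, 6.4, 6.9, 6.3, 5.8])
groups = np.array(["a", "a", "b", "b", "b", "c", "c"])

second_largest = {}
for g in np.unique(groups):
    # Distinct values within this group, sorted ascending by np.unique.
    distinct = np.unique(values[groups == g])
    # A group needs at least two distinct values to have a second largest.
    second_largest[str(g)] = float(distinct[-2]) if distinct.size >= 2 else None

print(second_largest)  # {'a': 4.9, 'b': 6.9, 'c': 5.8}
```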

In [22]:
#14 sort a 2D array by a column (here column 1)
d=np.array([[1,2],[3,1]])
d[d[:,1].argsort()]

array([[3, 1],
       [1, 2]])

In [23]:
#15 most frequent value (iris column 2)
vals, counts = np.unique(iris[:, 2], return_counts=True)
print(vals[np.argmax(counts)])

1.5


In [24]:
#16 position of the first value greater than 5
np.where(a>5)[0][0]

3

In [25]:
#17 replace every value greater than cut (here with the fixed value 20)
cut=10
a[a>cut]=20

In [26]:
a

array([ 3,  4,  5, 20,  8])
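The cell above writes a fixed 20 rather than the cutoff itself. If the values should be capped at the cutoff, `np.where` or `np.clip` could be used as sketched below. The names `x` and `cutoff` are illustrative.

```python
import numpy as np

x = np.array([3, 4, 5, 43, 8])
cutoff = 10

# Values above the cutoff are replaced by the cutoff itself; x is left unchanged.
capped = np.where(x > cutoff, cutoff, x)

# np.clip with only an upper bound gives the same result.
capped_clip = np.clip(x, None, cutoff)
print(capped)  # [ 3  4  5 10  8]
```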

In [27]:
#18 top n values (values only, not their positions)
heapq.nlargest(5,a)

[20, 8, 5, 4, 3]
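`heapq.nlargest` returns the values themselves. To get their positions, `np.argsort` is one option, sketched here with a hypothetical `n = 2`:

```python
import numpy as np

x = np.array([3, 4, 5, 20, 8])
n = 2

# argsort gives indices in ascending order of value; take the last n and reverse
# them so the index of the largest value comes first.
top_n_positions = np.argsort(x)[-n:][::-1]
print(top_n_positions)     # [3 4]
print(x[top_n_positions])  # [20  8]
```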

In [28]:
#19 row-wise counts of all values
# Solution
def counts_of_all_values_rowwise(arr2d):
    # Unique values and their counts for each row
    num_counts_array = [np.unique(row, return_counts=True) for row in arr2d]

    # For every distinct value in the whole array, its count in each row (0 if absent)
    return [[int(cnts[vals == i][0]) if i in vals else 0 for i in np.unique(arr2d)] for vals, cnts in num_counts_array]

# Print
print(np.arange(1,11))
counts_of_all_values_rowwise(a)

[ 1  2  3  4  5  6  7  8  9 10]


[[1, 0, 0, 0, 0],
 [0, 1, 0, 0, 0],
 [0, 0, 1, 0, 0],
 [0, 0, 0, 0, 1],
 [0, 0, 0, 1, 0]]
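The call above passes the 1D array `a`, so each "row" is a single element. A sketch on a genuinely 2D input follows, assuming a hypothetical 6x10 array of random integers from 1 to 10 and using broadcasting instead of a per-row loop. The function name `rowwise_counts` is made up for this example.

```python
import numpy as np

rng = np.random.default_rng(100)  # seed chosen only to make the run repeatable
arr2d = rng.integers(1, 11, size=(6, 10))  # hypothetical 6x10 array of values 1..10

def rowwise_counts(arr):
    """Count, for every row, how often each distinct value of `arr` occurs."""
    all_values = np.unique(arr)
    # Compare each element with each candidate value, then sum over the row axis.
    # Resulting shape: (n_rows, n_distinct_values).
    counts = (arr[:, :, None] == all_values[None, None, :]).sum(axis=1)
    return all_values, counts

all_values, counts = rowwise_counts(arr2d)
print(all_values)
print(counts)
```

Column `j` of `counts` corresponds to `all_values[j]`, and each row of `counts` sums to the row length.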

In [29]:
#20 flatten to 1D (a is already 1D, so this returns a copy)
a.flatten()

array([ 3,  4,  5, 20,  8])
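Since `a` is already flat, the call above changes nothing. A minimal sketch for an actual array of arrays, assuming three hypothetical 1D arrays of different lengths stored in an object array:

```python
import numpy as np

# Hypothetical ragged input: three 1D arrays of different lengths in an object array.
arr1 = np.arange(3)
arr2 = np.arange(3, 7)
arr3 = np.arange(7, 10)
array_of_arrays = np.array([arr1, arr2, arr3], dtype=object)

# np.concatenate joins the inner arrays end to end into one flat 1D array.
flat = np.concatenate(list(array_of_arrays))
print(flat)  # [0 1 2 3 4 5 6 7 8 9]
```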