### NumPy
NumPy stands for "Numerical Python". It provides fast, efficient operations on arrays of homogeneous data. It supports typical numerical calculations on multidimensional arrays, along with more sophisticated functions, and makes it easy to work with linear algebra and to carry out calculations such as matrix inversion and Fourier transforms.

Let's see how NumPy arrays differ from lists:

NumPy supports vectorized operations such as element-wise addition and subtraction. Basic lists do not: for lists, `+` concatenates and `*` repeats.

In [1]:
l1 = [1,2,3,4]
l2 = [5,6,7,8]
print(l1 + l2)

[1, 2, 3, 4, 5, 6, 7, 8]


In [2]:
l1 * 3

[1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]

In [3]:
import numpy as np

In [4]:
arr1 = np.array(l1)
arr2 = np.array(l2)
print(arr1 + arr2)

[ 6  8 10 12]


In [5]:
arr1 - arr2

array([-4, -4, -4, -4])

In [6]:
arr1 * arr2

array([ 5, 12, 21, 32])

In [7]:
arr = np.array([1,2,3,4])
arr

array([1, 2, 3, 4])

In [8]:
print(type(arr))

<class 'numpy.ndarray'>


In [9]:
arr + 10

array([11, 12, 13, 14])

In [10]:
arr * 3

array([ 3,  6,  9, 12])
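
As a short recap of the comparison above, here is a minimal sketch that puts the two approaches side by side. The variable names and values are illustrative only.

```python
import numpy as np

# Illustrative values, not taken from any dataset.
first = [1, 2, 3, 4]
second = [5, 6, 7, 8]

# With plain lists, element-wise addition needs an explicit loop or comprehension.
list_sum = [x + y for x, y in zip(first, second)]

# With NumPy arrays, the + operator is applied element by element.
array_sum = np.array(first) + np.array(second)

print(list_sum)
print(array_sum)
```

Both approaches give the same values; the NumPy version avoids the Python-level loop.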

# NumPy Object Creation:
Calculations in NumPy are performed on the ndarray (n-dimensional array) object, which can have any number of dimensions.

In [20]:
a = np.array([1,2,3,4])
b = np.array([[1,2,3,4],[5,6,7,8]])
print(a)
print('-------')
print(b)

[1 2 3 4]
-------
[[1 2 3 4]
 [5 6 7 8]]


#### Shape:

In [21]:
print(a.shape)

(4,)


In [22]:
print(b.shape)

(2, 4)


#### Dimension:

In [23]:
print(a.ndim)

1


In [24]:
print(b.ndim)

2


In [25]:
a.dtype

dtype('int32')

In [26]:
b.dtype

dtype('int32')
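
The same attributes work for arrays with more than two dimensions. The sketch below builds a 3-D array and also sets a dtype explicitly; the names `c` and `f` are illustrative. Note that the default integer dtype depends on the platform and NumPy version, so `int64` may appear where this notebook shows `int32`.

```python
import numpy as np

# 24 consecutive integers arranged as 2 blocks of 3 rows and 4 columns.
c = np.arange(24).reshape(2, 3, 4)
print(c.shape)  # one entry per dimension
print(c.ndim)   # number of dimensions

# The dtype can be requested explicitly instead of being inferred.
f = np.array([1, 2, 3], dtype=np.float64)
print(f.dtype)
```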

## Matrix Creation:

Next, let's learn how to create a matrix using NumPy. There are three methods:

**Method 1**: Using a NumPy array to form a matrix.

**Method 2**: Using NumPy's built-in matrix function.

**Method 3**: Using miscellaneous functions such as zeros(), ones(), etc.

* Method 1: Using array and reshape to convert an array into a matrix

In [27]:
np.array([5,12,32,54,21,24])

array([ 5, 12, 32, 54, 21, 24])

In [28]:
np.array([5,12,32,54,21,24]).reshape(2,3)

array([[ 5, 12, 32],
       [54, 21, 24]])

In [29]:
np.array([5,12,32,54,21,24]).reshape(3,2)

array([[ 5, 12],
       [32, 54],
       [21, 24]])

* Method 2: Using the matrix function

In [30]:
n = np.matrix([[1,2],[3,4]])
n

matrix([[1, 2],
        [3, 4]])

In [31]:
p = np.matrix([[1,2,3],[3,4,5]])
print(p)
p.shape

[[1 2 3]
 [3 4 5]]


(2, 3)

In [32]:
p.reshape(3,2)

matrix([[1, 2],
        [3, 3],
        [4, 5]])
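
The NumPy documentation no longer recommends `np.matrix`, and suggests regular 2-D arrays instead. As a hedged alternative to the cells above, the sketch below stores the same values in plain arrays and uses the `@` operator for matrix multiplication. The names `n_arr` and `p_arr` are illustrative.

```python
import numpy as np

# Same values as n and p above, stored as ordinary 2-D arrays.
n_arr = np.array([[1, 2], [3, 4]])
p_arr = np.array([[1, 2, 3], [3, 4, 5]])

# @ performs matrix multiplication: (2, 2) @ (2, 3) gives shape (2, 3).
product = n_arr @ p_arr
print(product)

# * on arrays stays element-wise, unlike * on np.matrix objects.
elementwise = n_arr * n_arr
print(elementwise)
```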

* Method 3: Using miscellaneous functions

In [33]:
np.eye(3) # Identity matrix

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [34]:
np.zeros((4,3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [41]:
q = np.ones((3,3),dtype=np.int8)
q

array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]], dtype=int8)

In [36]:
q.dtype

dtype('int8')

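`np.arange()` returns evenly spaced values from a start value up to, but not including, a stop value, with an optional step: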

In [37]:
np.arange(1,20)

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19])

In [38]:
np.arange(1,20,2)

array([ 1,  3,  5,  7,  9, 11, 13, 15, 17, 19])

##### Practice Exercise
1. Store the marks and IDs of three students as NumPy arrays. The marks are [20, 30, 40] and the IDs are [0, 2, 4]. Store the array of marks in a variable `marks` and the array of IDs in a variable `ids`. Display both `marks` and `ids`.

In [42]:
marks = np.array([20,30,40])
ids = np.array([0,2,4])
print('Marks:',marks)
print('IDs:',ids)

Marks: [20 30 40]
IDs: [0 2 4]


### Indexing and Slicing:

In [43]:
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
a

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [44]:
a[0]

array([1, 2, 3])

In [45]:
# Pull out second element of third row
a[2][1]

8

In [46]:
# Pull out first two rows and columns
a[:2,:2]

array([[1, 2],
       [4, 5]])

In [47]:
a[:1,:2]

array([[1, 2]])

In [48]:
a[:2,1:]

array([[2, 3],
       [5, 6]])

In [49]:
a[2,:]

array([7, 8, 9])
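
One detail worth knowing about the slices above: basic slicing returns a view on the original data, not a copy. The sketch below illustrates this; `block` and `independent` are illustrative names.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# A basic slice is a view: writing through it changes the original array.
block = a[:2, :2]
block[0, 0] = 100
print(a[0, 0])

# An explicit copy is independent of the original.
independent = a[:2, :2].copy()
independent[0, 0] = -1
print(a[0, 0])

# a[2][1] and a[2, 1] select the same element; the second form indexes in one step.
print(a[2][1] == a[2, 1])
```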

### Integer Array Indexing:

In [50]:
a = np.array([[1,2],[3,4],[5,6]])
a

array([[1, 2],
       [3, 4],
       [5, 6]])

In [51]:
a[[0,1,2],[0,1,0]]

array([1, 4, 5])

In [52]:
a[[1,2,0],[1,0,1]]

array([4, 5, 2])

In [53]:
np.array([a[0,0],a[1,1],a[2,0]])

array([1, 4, 5])

In [54]:
np.array([a[0,1],a[0,1]])

array([2, 2])
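
A common use of integer array indexing is to pick, or update, one element from each row. This is a minimal sketch; the `rows` and `cols` arrays are illustrative.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

rows = np.arange(a.shape[0])   # [0, 1, 2]: every row once
cols = np.array([0, 1, 0])     # the column to take from each row

# Pairs up (0, 0), (1, 1) and (2, 0), like a[[0,1,2],[0,1,0]] above.
picked = a[rows, cols]
print(picked)

# The same index arrays can be used to modify those elements in place.
a[rows, cols] += 10
print(a)
```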

### Boolean Indexing:

In [55]:
a = np.array([[4,7,1],[2,5,7],[7,1,1]])
print(a)
mask = a > 3
print(mask)

[[4 7 1]
 [2 5 7]
 [7 1 1]]
[[ True  True False]
 [False  True  True]
 [ True False False]]


In [56]:
print(a[mask])

[4 7 5 7 7]
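
Boolean masks can also be combined and used for assignment. The sketch below uses the same array; note that conditions are combined with `&` and `|` (each wrapped in parentheses), not with `and` / `or`.

```python
import numpy as np

a = np.array([[4, 7, 1], [2, 5, 7], [7, 1, 1]])

# Elements greater than 3 AND less than 7.
between = a[(a > 3) & (a < 7)]
print(between)

# Assigning through a mask changes only the selected elements.
capped = a.copy()
capped[capped > 3] = 3
print(capped)
```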


### Vectorization:

In [57]:
a

array([[4, 7, 1],
       [2, 5, 7],
       [7, 1, 1]])

In [58]:
a[a > 3]

array([4, 7, 5, 7, 7])

In [59]:
# Creating two arrays for operations

a = np.array([[1,2,3],[4,5,6],[7,8,9]])
b = np.array([[10,11,12],[13,14,15],[16,17,18]])
print(a)
print(b)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
[[10 11 12]
 [13 14 15]
 [16 17 18]]


In [60]:
a + b

array([[11, 13, 15],
       [17, 19, 21],
       [23, 25, 27]])

In [61]:
a - b

array([[-9, -9, -9],
       [-9, -9, -9],
       [-9, -9, -9]])

In [62]:
a * b

array([[ 10,  22,  36],
       [ 52,  70,  90],
       [112, 136, 162]])

In [63]:
a / b

array([[0.1       , 0.18181818, 0.25      ],
       [0.30769231, 0.35714286, 0.4       ],
       [0.4375    , 0.47058824, 0.5       ]])

In [64]:
a = np.array([[1,4,9],[16,25,36]])
np.sqrt(a)

array([[1., 2., 3.],
       [4., 5., 6.]])

**These operations can also be written as function calls**:

1. np.add(a,b)
2. np.subtract(a,b)
3. np.multiply(a,b)
4. np.divide(a,b)
5. np.sqrt(a)
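
As a quick check, the sketch below confirms that each function form gives the same result as the corresponding operator, using the two 3x3 arrays from earlier in this section.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[10, 11, 12], [13, 14, 15], [16, 17, 18]])

# Each comparison prints True when the function and the operator agree.
print(np.array_equal(np.add(a, b), a + b))
print(np.array_equal(np.subtract(a, b), a - b))
print(np.array_equal(np.multiply(a, b), a * b))
print(np.allclose(np.divide(a, b), a / b))
```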

### Practice Exercise:

We have a buy-sell problem:

Initialize an array [[40, 35, 20], [21, 48, 70]] that holds the prices on 2 consecutive days, at 3 different sessions of each day. The objective is to buy at the minimum price on day 1 and sell at the maximum price on day 2.

1. Find the minimum price on day 1.
2. Find the maximum price on day 2.
3. Calculate the profit and print it.

In [None]:
import numpy as np

In [None]:
arr = np.array([[40, 35, 20], [21, 48, 70]])
arr

In [None]:
buy_price = arr[0].min()
buy_price

In [None]:
sell_price = arr[1].max()
sell_price

In [None]:
profit = sell_price - buy_price
print('Profit:',profit)

### Axes Notation

In [65]:
a = np.array([[2,5,7],[4,25,30]])
a

array([[ 2,  5,  7],
       [ 4, 25, 30]])

In [66]:
# sum along axis 0 (down the rows): one total per column
a.sum(axis = 0)

array([ 6, 30, 37])

In [67]:
# sum along axis 1 (across the columns): one total per row
a.sum(axis = 1)

array([14, 59])

In [68]:
# total sum
a.sum()

73
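
The `axis` argument works the same way for other reductions. The sketch below, using the same array, also shows `keepdims=True`, which keeps the reduced axis with length 1 so the result lines up with the original array. The variable names are illustrative.

```python
import numpy as np

a = np.array([[2, 5, 7], [4, 25, 30]])

# Largest value in each column (axis 0 is collapsed).
col_max = a.max(axis=0)
print(col_max)

# Row totals kept as a (2, 1) column instead of a flat (2,) array.
row_totals = a.sum(axis=1, keepdims=True)
print(row_totals.shape)

# Each element as a share of its row total; every row then sums to 1.
row_share = a / row_totals
print(row_share.sum(axis=1))
```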

### Broadcasting:

In [69]:
np.arange(1,20)

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19])

In [70]:
np.arange(1,20,2)

array([ 1,  3,  5,  7,  9, 11, 13, 15, 17, 19])

In [73]:
np.arange(3)

array([0, 1, 2])

In [72]:
np.arange(3) + 4

array([4, 5, 6])

In [74]:
np.ones((3,3)) + np.arange(3)

array([[1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.]])

In [75]:
np.arange(3) 

array([0, 1, 2])

In [76]:
np.arange(3).reshape(3,1)

array([[0],
       [1],
       [2]])

In [77]:
np.arange(3).reshape(3,1) + np.arange(3)

array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])
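
The examples above work because NumPy compares shapes from the trailing dimension backwards: two dimensions are compatible when they are equal or when one of them is 1. The sketch below prints the shapes involved and shows an incompatible pair; the names are illustrative.

```python
import numpy as np

col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(3)                 # shape (3,)

# (3, 1) and (3,) broadcast to (3, 3).
result = col + row
print(col.shape, row.shape, result.shape)

# Shapes (3, 3) and (4,) do not match on the last dimension, so this fails.
try:
    np.ones((3, 3)) + np.arange(4)
    shapes_compatible = True
except ValueError:
    shapes_compatible = False
print(shapes_compatible)
```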

### Mini Project:
Let us build a project to learn more about NumPy:

Features of the data set:

- age: Age of the person
- education-num: No. of years of education they had
- race: Person's race
    - 0 - Amer-Indian-Eskimo
    - 1 - Asian-Pac-Islander
    - 2 - Black
    - 3 - Other
    - 4 - White
- sex: Person's gender
    - 0 - Female
    - 1 - Male
- capital-gain: Income from investment sources, apart from wages/salary
- capital-loss: Losses from investment sources, apart from wages/salary
- hours-per-week: No. of hours per week the person works
- income: Annual income of the person
    - 0 : Less than or equal to 50K
    - 1 : More than 50K

In [None]:
data_file = 'makeSenseOfCensus.csv'

In [None]:
# Loading the Data
data = np.genfromtxt(data_file,delimiter=',',skip_header=1)
print(data)

In [None]:
type(data)

In [None]:
data.shape

#### Append the Data
Append 'new_record' (given) to 'data' using "np.concatenate()"

In [None]:
new_record = [[50,9,4,1,0,0,40,0]]
new_data = np.concatenate((data,new_record),axis = 0)
print(new_data)

In [None]:
new_data.shape
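
Because the CSV file is not included here, the sketch below repeats the append step on a small made-up array with the same eight columns. The two rows in `sample` are invented for illustration. It also shows why `new_record` is written with double brackets: `np.concatenate` needs both inputs to have the same number of dimensions.

```python
import numpy as np

# Two made-up rows standing in for the census data (8 columns each).
sample = np.array([[39, 13, 4, 1, 2174, 0, 40, 0],
                   [50, 13, 4, 1, 0, 0, 13, 0]], dtype=float)
new_record = [[50, 9, 4, 1, 0, 0, 40, 0]]   # 2-D: one row, eight columns

appended = np.concatenate((sample, new_record), axis=0)
print(sample.shape, appended.shape)

# A flat 1-D record does not match the 2-D array and raises ValueError.
try:
    np.concatenate((sample, [50, 9, 4, 1, 0, 0, 40, 0]), axis=0)
    flat_record_ok = True
except ValueError:
    flat_record_ok = False
print(flat_record_ok)
```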

In [None]:
# Create a new array called 'age' by taking only the age column
# (age is the column with index 0) of the 'new_data' array.
age = new_data[:,0]
age

In [None]:
# Find the max age and store it in a variable called 'max_age'.
max_age = age.max()
print('Maximum age:',max_age)

In [None]:
# Find the min age and store it in a variable called 'min_age'.
min_age = age.min()
print('Minimum age:',min_age)

In [None]:
# Find the mean age and store it in a variable called 'mean_age'.
mean_age = age.mean()
print('Average age:',mean_age)

According to government records, citizens above 60 should not work more than 25 hours a week. Let us check whether this policy is being followed.

In [None]:
senior_citizens = age[age>60]
senior_citizens

In [None]:
num_of_senior_citizen =  len(senior_citizens)
num_of_senior_citizen

In [None]:
working_hour = new_data[:,6]
working_hour_sum = np.sum(working_hour[age > 60])
print('Total working hours of Senior Citizens:',working_hour_sum)

In [None]:
avg_working_hours = working_hour_sum / num_of_senior_citizen
print(avg_working_hours)

In [None]:
if avg_working_hours <= 25:
    print('Govt policy is followed')
else:
    print('Govt policy is not followed')
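
The check above compares the average working hours of senior citizens with the limit. The sketch below repeats that calculation on made-up values and, as an optional extra, counts how many individual seniors exceed the limit, since an average can hide individual cases. All numbers and names here are invented for illustration.

```python
import numpy as np

# Made-up ages and weekly working hours, one entry per person.
ages = np.array([25, 61, 70, 45, 65])
hours = np.array([40, 20, 30, 50, 10])

is_senior = ages > 60

# Mean of the masked hours: same as sum / count in the cells above.
avg_senior_hours = hours[is_senior].mean()
print(avg_senior_hours)

# Number of seniors who individually work more than 25 hours.
seniors_over_limit = int(np.sum(is_senior & (hours > 25)))
print(seniors_over_limit)
```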

# Check whether more highly educated people have better pay in general.

In [None]:
edu = new_data[:,1]
high = edu[edu > 10]
low = edu[edu <= 10]
print(len(high))
print(len(low))

In [None]:
income = new_data[:,7]
avg_pay_high = np.mean(income[edu>10])
print(avg_pay_high)
avg_pay_low = np.mean(income[edu<=10])
print(avg_pay_low)

In [None]:
if avg_pay_high > avg_pay_low:
    print('Higher educated people have better pay in general')
else:
    print('Higher educated people do not have better pay in general')
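
Since `income` is coded as 0 (at most 50K) or 1 (more than 50K), its mean over a group is the share of that group earning more than 50K. The sketch below shows this on made-up values; the arrays are invented for illustration.

```python
import numpy as np

# Made-up years of education and income flags (0 or 1), one entry per person.
edu_years = np.array([9, 13, 10, 16, 12])
income_flag = np.array([0, 1, 0, 1, 0])

# Mean of a 0/1 column = fraction of ones in the selected group.
share_high = income_flag[edu_years > 10].mean()
share_low = income_flag[edu_years <= 10].mean()

print(share_high)
print(share_low)
```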

In [None]:
import numpy as np
data_file = 'KAG_Conversion_Data.csv'

In [None]:
data_str = np.genfromtxt(data_file,delimiter=',',skip_header = 1,dtype='str')
data_float = np.genfromtxt(data_file,delimiter=',',skip_header = 1,dtype='float')

In [None]:
print(data_str)

In [None]:
print(data_float)
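
The file is loaded twice because a NumPy array holds a single dtype. With `dtype='float'`, fields that are not numbers (such as an age range or a gender code) come back as `nan`; with `dtype='str'`, every field is kept as text. The sketch below shows this on a made-up three-column CSV held in memory; the text and column names are invented for illustration.

```python
import io
import numpy as np

# Made-up CSV text standing in for the real file.
csv_text = "ad_id,age,clicks\n1,30-34,5\n2,35-39,7\n"

as_float = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                         skip_header=1, dtype='float')
as_str = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                       skip_header=1, dtype='str')

print(as_float)                          # the age column cannot be parsed
print(as_str)                            # every field kept as text
print(np.isnan(as_float[:, 1]).all())    # True if the whole age column is nan
```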

- Load the data. The file path is already given to you in the variable `data_file`.

- How many unique ad campaigns (xyzcampaignid) does this data contain? How many times was each campaign run?

- What are the age groups that were targeted through these ad campaigns?

- What was the average, minimum and maximum amount spent on the ads?

- What is the id of the ad having the maximum number of clicks?

- How many people bought the product after seeing the ad with the most clicks? Is that the maximum number of purchases in this dataset?

- So the ad with the most clicks didn't fetch the maximum number of purchases. Find the details of the ad having the maximum number of purchases.

Features:

1. ad_id:    unique ID for each ad
2. xyzcampaignid:    an ID associated with each ad campaign of XYZ company
3. fbcampaignid:    an ID associated with how Facebook tracks each campaign
4. age:    age of the person to whom the ad is shown
5. gender:    gender of the person to whom the ad is shown
6. interest:    a code specifying the category to which the person’s interest belongs (interests are as mentioned in the person’s Facebook public profile)
7. Impressions:    the number of times the ad was shown
8. Clicks:    number of clicks on that ad
9. Spent:    Amount paid by company xyz to Facebook, to show that ad
10. Total conversion:    Total number of people who enquired about the product after seeing the ad
11. Approved conversion:    Total number of people who bought the product after seeing the ad

In [None]:
# How many unique ad campaigns (xyzcampaignid) does this data contain? How many times was each campaign run?
xyz = data_str[:,1]
unique_id = np.unique(xyz)
print(unique_id)

In [None]:
unique_id,count = np.unique(xyz, return_counts = True)
print(unique_id)
print(count)

In [None]:
for i in range(len(unique_id)):
    print('{} ran {} times'.format(unique_id[i],count[i]))
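
The loop above can also be written with `zip`, which pairs each ID with its count directly. The sketch below uses made-up campaign IDs. Because the IDs are strings, `np.unique` sorts them as text, not as numbers.

```python
import numpy as np

# Made-up campaign IDs stored as strings, as in data_str.
campaign_ids = np.array(['916', '936', '936', '1178', '1178', '1178'])

unique_ids, counts = np.unique(campaign_ids, return_counts=True)

# zip pairs each unique ID with its count; no index variable is needed.
for campaign, runs in zip(unique_ids, counts):
    print('{} ran {} times'.format(campaign, runs))

# The same pairing collected into a plain dictionary.
runs_per_campaign = dict(zip(unique_ids.tolist(), counts.tolist()))
print(runs_per_campaign)
```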

In [None]:
# What are the age groups that were targeted through these ad campaigns?
age_group = data_str[:,3]
print(age_group)

In [None]:
np.unique(age_group)

In [None]:
# What was the average, minimum and maximum amount spent on the ads?
spent = data_float[:,8]
avg_spent = spent.mean()
min_spent = spent.min()
max_spent = spent.max()
print('Average amount spent: ', avg_spent)
print('Minimum amount spent: ', min_spent)
print('Maximum amount spent: ', max_spent)

In [None]:
# What is the id of the ad having the maximum number of clicks?
id_col = data_float[:,0]
clicks = data_float[:,7]
print('Maximum no of clicks',clicks.max())
print('Id of the ad having maximum no of clicks',id_col[clicks == clicks.max()])
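
An alternative to the boolean mask is `np.argmax`, which returns the position of the maximum. The sketch below uses made-up IDs and click counts. One difference to keep in mind: `argmax` returns only the first position when several ads tie for the maximum, while the mask returns all of them.

```python
import numpy as np

# Made-up ad IDs and click counts, one entry per ad.
ad_ids = np.array([708746, 708749, 708771])
ad_clicks = np.array([1, 9, 4])

# Position of the largest click count, then the ID at that position.
best_position = np.argmax(ad_clicks)
best_ad_id = ad_ids[best_position]
print(best_ad_id)

# The mask-based form used above gives the same ID, wrapped in an array.
print(ad_ids[ad_clicks == ad_clicks.max()])
```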