# NumPy for Data Science

## Introduction

##### What is NumPy?

- NumPy is a Python library for working with arrays.
- It also provides functions for linear algebra, Fourier transforms, and matrix operations.
- It is an open-source project, and you can use it freely.
- It is written partly in Python, but most of the parts that require fast computation are written in C or C++.
- NumPy stands for Numerical Python.

##### Features of NumPy

- Operations on NumPy arrays are usually much faster than the equivalent loops over Python lists.
- Unlike lists, NumPy arrays are stored in one contiguous block of memory, so processes can access and manipulate them very efficiently.
- This behavior is called locality of reference in computer science.
- This is the main reason why NumPy is faster than lists. Its core routines are also compiled code that is optimized for modern CPU architectures.
- NumPy uses much less memory to store data, and it lets you specify the data type of the elements.
- This allows the code to be optimized even further.
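The memory and data type points above can be checked with a small sketch. The array length and the choice of `int64` and `int16` are arbitrary assumptions for illustration, and the list size estimate relies on `sys.getsizeof`, whose exact numbers depend on the Python implementation.

```python
import sys
import numpy as np

n = 1000  # arbitrary length chosen for illustration

# A Python list stores pointers to separate integer objects.
py_list = list(range(n))
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# A NumPy array stores the raw values side by side in one block.
arr64 = np.arange(n, dtype=np.int64)  # 8 bytes per element
arr16 = np.arange(n, dtype=np.int16)  # 2 bytes per element

print("list (approx.):", list_bytes, "bytes")
print("int64 array:   ", arr64.nbytes, "bytes")
print("int16 array:   ", arr16.nbytes, "bytes")
print("contiguous:    ", arr64.flags["C_CONTIGUOUS"])
```

Choosing a smaller data type reduces memory use, but values outside its range cannot be stored, so the type should match the data.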

![image.png](attachment:image.png)

##### NumPy Array vs Python List

![image.png](attachment:image.png)

![image.png](attachment:image.png)

##### When To Use Python Lists?

- When you need to work with data of different data types (such as strings, integers, and booleans).
- When you need to add or remove elements from the collection.
- When you need to work with a small amount of data.

##### When To Use NumPy Arrays?

- When you need to perform mathematical or scientific operations on large arrays of data.
- When you need to work with homogeneous data types (such as floats or integers).
- When you need to manipulate or reshape arrays quickly and efficiently.
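The trade-offs in the two lists above show up in a few lines. This is only a small sketch with made-up values: it contrasts how the same operators behave on a list and on an array.

```python
import numpy as np

# A list can mix data types and grow in place.
mixed = ["a", 1, True]
numbers = [1, 2, 3]
doubled_list = numbers * 2      # repeats the list: [1, 2, 3, 1, 2, 3]

# An array holds one data type and applies arithmetic to every element.
arr = np.array(numbers)
doubled_arr = arr * 2           # multiplies each element: [2 4 6]
upcast = np.array([1, 2.5, 3])  # integers are converted to float64

# Appending to an array builds a new array instead of growing the old one.
grown = np.append(arr, 4)

print(doubled_list)
print(doubled_arr)
print(upcast.dtype)
print(arr, grown)
```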

In [None]:
# Create NumPy arrays

import numpy as np

# 1D array
a = np.array([1, 2])
print(a)

# 2D array
b = np.array([[1, 2], [2, 3]])
print(type(b))

# 3D array with shape (1, 3, 2)
c = np.array([[[1, 2], [2, 3], [4, 5]]])
c


In [None]:
# Convert a Python list to a NumPy array

py_list = [1, 2, 3]  # avoid the name `list`, which would shadow the built-in type

print(type(py_list))

a = np.array(py_list)

type(a)


In [None]:
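# Array of zeros

np.zeros((2, 5))

# Array of ones

np.ones((2, 5))

# Array filled with a constant (the fill value 7 is an arbitrary choice for illustration)

np.full((2, 5), 7)

# Identity matrix

np.eye(2)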

In [None]:
#arange

# np.arange(5)

# np.arange(5,10)

# np.arange(2,11,2)

![image.png](attachment:image.png)
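As a quick sketch of the three `np.arange` call forms used in the cell above: the stop value is always excluded, and the variable names are only illustrative.

```python
import numpy as np

first_five = np.arange(5)        # stop only: 0, 1, 2, 3, 4
five_to_nine = np.arange(5, 10)  # start and stop: 5, 6, 7, 8, 9
evens = np.arange(2, 11, 2)      # start, stop, step: 2, 4, 6, 8, 10

print(first_five)
print(five_to_nine)
print(evens)
```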

In [None]:
# np.linspace(0, 1, 5)

# np.linspace(-10, 10, 5)

# np.linspace(1, 2, 7)

![image.png](attachment:image.png)
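Unlike `np.arange`, `np.linspace` takes the number of points rather than a step, and it includes the end point by default. A minimal sketch using the same calls as the cell above (the `retstep=True` option is added here only to show the spacing):

```python
import numpy as np

unit = np.linspace(0, 1, 5)           # 0, 0.25, 0.5, 0.75, 1
symmetric = np.linspace(-10, 10, 5)   # -10, -5, 0, 5, 10

# retstep=True also returns the spacing between neighbouring points.
values, step = np.linspace(1, 2, 7, retstep=True)

print(unit)
print(symmetric)
print(values, step)
```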

In [None]:
#random number

# np.random.rand(2,3,5)

# np.random.randint(1,100,10)

# np.random.randn(2)

np.random.randn(5,5)
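The three random functions above differ in the distribution they draw from. The sketch below fixes a seed so the numbers repeat between runs; the seed value 0 is an arbitrary assumption.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only to make the output repeatable

uniform = np.random.rand(2, 3, 5)         # uniform floats in [0, 1), shape (2, 3, 5)
integers = np.random.randint(1, 100, 10)  # 10 integers from 1 to 99 (100 is excluded)
normal = np.random.randn(5, 5)            # standard normal samples, shape (5, 5)

print(uniform.shape, uniform.min(), uniform.max())
print(integers)
print(normal.shape)
```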

### Reshape

In [None]:
a = np.array([[1,2,3,4,5,6],[1,2,4,5,6,7]])

# a.shape         # (2, 6): 2 rows and 6 columns

# print(a.shape)

a.size  # total number of elements: 12

In [None]:
arr1d = np.array([1,2,3,4,5,6])

arr2d = arr1d.reshape(3,2)

print(arr1d)

print(arr2d)

In [None]:
arr1d = np.arange(12)

arr3d = arr1d.reshape(3,2,2)

print(arr1d)

print(arr3d)

In [None]:
arr3d = np.array([[[1, 2],
                   [3, 4]],
                  [[5, 6],
                   [7, 8]]])

print(arr3d.shape)

arr2d = arr3d.reshape((4, 2))

print(arr2d)

![image.png](attachment:image.png)
![image.png](attachment:image-2.png)
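Two details of `reshape` are worth a short sketch: one dimension can be given as `-1` so NumPy infers it, and the new shape must hold exactly the same number of elements. The example also shows that reshaping a contiguous array like this one returns a view on the same data.

```python
import numpy as np

arr1d = np.arange(12)

# -1 asks NumPy to infer that dimension: 12 elements / 3 rows = 4 columns.
arr2d = arr1d.reshape(3, -1)
print(arr2d.shape)

# The reshaped array shares memory with the original here.
arr2d[0, 0] = 99
print(arr1d[0], np.shares_memory(arr1d, arr2d))

# 12 elements cannot fill a 5 x 2 array, so this raises ValueError.
try:
    arr1d.reshape(5, 2)
    reshape_failed = False
except ValueError as exc:
    reshape_failed = True
    print("reshape failed:", exc)
```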

### NumPy Array Indexing

In [None]:
a = np.array([1,2,3,4])

# Access a single element
# print(a[2])

# Access a slice
# print(a[1:])

# Access elements based on a condition
print(a[a>2])
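Conditional indexing works in two steps: the comparison produces a boolean array, and that array selects the matching elements. A small sketch, where the combined condition is an added illustration:

```python
import numpy as np

a = np.array([1, 2, 3, 4])

condition = a > 2        # boolean array, one entry per element
selected = a[condition]  # keeps the elements where the entry is True

# Conditions are combined with & and |, each wrapped in parentheses.
middle = a[(a > 1) & (a < 4)]

print(condition)
print(selected)
print(middle)
```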

### 2D Array Indexing

In [None]:
a = np.array([[1, 2, 3], 
              [4, 5, 6], 
              [7, 8, 9]]
            )

# Access a single element
# print(a[1,0])
# a[1][0]

# Access a slice: rows 0-1 and columns 1-2
# print(a[:2,1:])
# [[2 3]
#  [5 6]]

# Chained slicing is different: a[:2] keeps rows 0-1, then [1:] drops the first of those rows
# a[:2][1:]
# [[4 5 6]]

# Access a row

# a[0]

# Access a column

# a[:,0]

# Access elements based on a condition

a[a%2==0]
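The difference between `a[:2, 1:]` and `a[:2][1:]` is easy to miss, so here is a runnable sketch that compares them on the same 3x3 array. The variable names are only illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# One index per axis: rows 0-1 and columns 1-2.
block = a[:2, 1:]

# Two separate slices, both applied to the rows:
# a[:2] keeps rows 0-1, then [1:] drops the first of them.
chained = a[:2][1:]

first_column = a[:, 0]  # every row, column 0
evens = a[a % 2 == 0]   # boolean condition flattens the result

print(block)
print(chained)
print(first_column)
print(evens)
```

For a single element, `a[1, 0]` and `a[1][0]` return the same value, but the first form does the lookup in one step.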

### NumPy Broadcasting

##### What?

- Broadcasting is a mechanism that allows NumPy to perform operations on arrays of different shapes.
- The smaller array is broadcast across the larger array so that they have compatible shapes.
- Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python.
- It does this without making needless copies of data and usually leads to efficient algorithm implementations.

##### Why?

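- Without broadcasting, the smaller array would have to be copied or looped over explicitly before it could be combined with the larger one.
- With broadcasting, an operation on arrays of different shapes can be written as a single expression.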

##### Rules

- If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.
- If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.
- If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
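The three rules above can be traced on concrete shapes. The following is a minimal sketch with arbitrary example arrays: the comments follow each rule in turn, and the last case shows the error from the third rule.

```python
import numpy as np

# Rule 1: (3,) is padded on the left to (1, 3).
# Rule 2: the size-1 dimension is stretched, giving (3, 3).
a = np.ones((3, 3))
b = np.arange(3)
print((a + b).shape)

# Both operands can be stretched: (3, 1) and (4,) -> (1, 4) give (3, 4).
col = np.arange(3).reshape(3, 1)
row = np.arange(4)
outer_sum = col + row
print(outer_sum.shape)

# Rule 3: (3, 3) and (4,) -> (1, 4) disagree in the last dimension
# (3 vs 4) and neither size is 1, so NumPy raises ValueError.
try:
    np.ones((3, 3)) + np.ones(4)
    mismatch_raised = False
except ValueError as exc:
    mismatch_raised = True
    print("broadcast failed:", exc)
```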

![image.png](attachment:image.png)

In [None]:
import numpy as np

# Creating a 3x3 matrix
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Creating a 1D array
b = np.array([10, 20, 30])

# Adding the 1D array to each row of the matrix using broadcasting
c = a + b

# b is treated as if it were stretched to a 3x3 matrix:
#   [10,20,30]
#   [10,20,30]
#   [10,20,30]

print("Matrix A:\n", a)
print("Array B:\n", b)
print("Matrix C:\n", c)
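The cell above adds `b` to every row. To add it to every column instead, one option is to reshape `b` into a column of shape (3, 1) first. This sketch reuses the same values; the name `b_col` is an illustrative assumption.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([10, 20, 30])

# Shape (3, 1): one value per row, stretched across the columns.
b_col = b.reshape(3, 1)
d = a + b_col

print(d)
```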

### Fancy Indexing

In [None]:
# boolean mask

arr = np.array([1, 2, 3, 4, 5, 6])
mask = np.array([True, False, True, False, True, False])
print(arr[mask])

In [None]:
# integer array index

arr = np.array([1, 2, 3, 4, 5, 6])
idx = np.array([0, 3, 5])
print(arr[idx])

In [None]:
# Combination of Boolean Mask & Integer Array

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
mask = np.array([True, False, True])
idx = np.array([0, 1])
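# print(arr[mask, idx])   # pairs row and column indices: elements (0, 0) and (2, 1)

# Apply the mask first, then pick rows 0 and 1 of the masked result
print(arr[mask][idx])

# Rows kept by the mask, with their new positions:
# 0 [1,2,3]
# 1 [7,8,9]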
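The two forms in the cell above give different results, which the following sketch makes explicit. In `arr[mask, idx]` the mask is turned into the row indices 0 and 2 and paired element by element with the column indices in `idx`; in `arr[mask][idx]` the mask is applied first and `idx` then selects rows of the result.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
mask = np.array([True, False, True])
idx = np.array([0, 1])

# Pairs (row 0, column 0) and (row 2, column 1).
paired = arr[mask, idx]

# Rows 0 and 2 survive the mask; idx then picks rows 0 and 1 of that result.
chained = arr[mask][idx]

print(paired)
print(chained)
```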



In [None]:
arr2d = np.zeros((10,10))
arr2d

In [None]:
arr_length = arr2d.shape[0]  # number of rows, since the loop below indexes rows
arr_length

In [None]:
for i in range(arr_length):
    arr2d[i] = i
    
arr2d

In [None]:
arr2d[[2,4,6,8]]

In [None]:
arr2d

In [None]:
arr2d[[6,4,2,7]]
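Indexing with a list of row numbers returns the rows in the order given, and the result is a copy rather than a view. A short sketch that rebuilds the same 10x10 array; the negative indices are an added illustration.

```python
import numpy as np

arr2d = np.zeros((10, 10))
for i in range(arr2d.shape[0]):
    arr2d[i] = i  # every element of row i is set to i

picked = arr2d[[6, 4, 2, 7]]     # rows in the requested order
from_end = arr2d[[-1, -3]]       # negative indices count from the last row

# Changing the result leaves the original array untouched.
picked[0] = -1

print(picked[:, 0])
print(from_end[:, 0])
print(arr2d[6, 0])
```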


### NumPy Array Operations

In [None]:
a = [1,2,3]
b = [4,5,6]

# type(a)

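# a + b      # on Python lists, + concatenates: [1, 2, 3, 4, 5, 6]
np.add(a,b)  # converts the lists to arrays and adds element by element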


In [None]:
a = np.array([1,2,3])
b = np.array([4,5,6])

# a + b


# np.sqrt(a)
# np.exp(a)
np.sin(a)


For more information, see: https://numpy.org/doc/stable/reference/routines.math.html
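The functions in the cells above are applied element by element. Here is a small sketch of that behavior, using values chosen so the results are easy to check by hand:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

total = a + b                          # element-wise sum: [5 7 9]
product = a * b                        # element-wise product: [4 10 18]
roots = np.sqrt(np.array([1, 4, 9]))   # [1. 2. 3.]
sines = np.sin(np.array([0.0, np.pi / 2]))

# The same + on plain lists joins them instead of adding.
joined = [1, 2, 3] + [4, 5, 6]

print(total, product)
print(roots, sines)
print(joined)
```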

In [None]:
arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([4, 8, 6, 7, 8])
# Correlation coefficient matrix of the two arrays

np.corrcoef(arr1,arr2)

# Aggregate functions

# np.sum(arr1)
# np.max(arr1)
# np.mean(arr1)
# np.var(arr1)
# np.argmax(arr1)
# np.argmin(arr1)
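As a sketch of what these calls return for the two arrays above: the aggregates reduce an array to a single number, while `np.corrcoef` returns a 2x2 matrix whose off-diagonal entry is the correlation between the two inputs.

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([4, 8, 6, 7, 8])

print(np.sum(arr1))     # 15
print(np.max(arr1))     # 5
print(np.mean(arr1))    # 3.0
print(np.var(arr1))     # 2.0 (population variance)
print(np.argmax(arr1))  # 4, the position of the largest value
print(np.argmin(arr1))  # 0, the position of the smallest value

corr = np.corrcoef(arr1, arr2)
r = corr[0, 1]  # correlation between arr1 and arr2
print(corr.shape, r)
```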

In [None]:
arr = np.array([[1, 2], 
                [3, 4], 
                [5, 6]])

print(np.sum(arr, axis=0))  # sum down each column: [9, 12]

print(np.sum(arr, axis=1))  # sum across each row: [3, 7, 11]

print(np.sum(arr))          # sum of all elements: 21

In [None]:
# Linear algebra functions

a = np.array([[1,2],
              [3,4]])
b = np.array([[4,5],[6,7]])

print(np.matmul(a,b))    # matrix product
print(np.transpose(a))   # transpose

print(np.linalg.inv(a))  # inverse

print(np.linalg.eig(a))  # eigenvalues and eigenvectors
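One way to make sense of these results is to check them against their definitions. This sketch verifies the inverse and the eigen-decomposition for the same matrix `a`; `np.allclose` is used because floating-point results are rarely exact.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[4, 5], [6, 7]])

product = np.matmul(a, b)  # same as a @ b
print(product)

# A matrix times its inverse gives the identity matrix.
a_inv = np.linalg.inv(a)
print(np.allclose(a @ a_inv, np.eye(2)))

# eig returns the eigenvalues and a matrix whose columns are the eigenvectors.
vals, vecs = np.linalg.eig(a)

# Each column v with eigenvalue w satisfies a @ v == w * v.
print(np.allclose(a @ vecs, vecs * vals))
```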



In [None]:
b = np.array([70, 75, 80, 85, 90, 95, 100, 85, 80, 75,
              70, 75, 80, 85, 90, 95, 100, 85, 80, 75,
              70, 75, 80, 85, 90, 95, 100, 85, 80, 75,
              70, 75, 80, 85, 90, 95, 100, 85, 80, 75,
              70, 75, 80, 85, 90, 95, 100, 85, 80, 75])


print(np.histogram(b,bins=[0, 60, 70, 80, 90, 100]))

import matplotlib.pyplot as plt

plt.hist(b, bins=[0, 60, 70, 80, 90, 100])
plt.show()
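`np.histogram` returns two arrays: the count in each bin and the bin edges. The sketch below unpacks them for the same data, built here with `np.tile` as a shortcut for the repeated pattern. Each bin includes its left edge and excludes its right edge, except the last bin, which includes both.

```python
import numpy as np

# The same 50 values as above: a pattern of 10 grades repeated 5 times.
pattern = [70, 75, 80, 85, 90, 95, 100, 85, 80, 75]
b = np.tile(pattern, 5)

counts, edges = np.histogram(b, bins=[0, 60, 70, 80, 90, 100])

# Pair each count with the range of its bin.
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left}-{right}: {count}")
```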

### Project

We want to calculate the average grade for each assignment/exam and determine which students received a grade above the class average on each assignment/exam.

In [None]:
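import numpy as np

# np.loadtxt expects purely numeric content. This assumes one row per student
# and one column per assignment/exam; if the file has a header row, pass skiprows=1.
data = np.loadtxt("C:/Users/91926/OneDrive/Desktop/StudentsPerformance.csv", delimiter=',')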


In [None]:
avg_grades = np.mean(data,axis=0)
avg_grades

In [None]:
# create boolean array for grades above class average
above_avg = data > avg_grades

# count number of students above class average for each assignment/exam
num_above_avg = np.sum(above_avg, axis=0)

# print results
for i in range(len(avg_grades)):
    print("For assignment/exam {}, the average grade was {:.2f} and {} students scored above average".format(i+1, avg_grades[i], num_above_avg[i]))
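The cells above count the students above average but do not list them. A possible way to do that is sketched below on a small made-up table, since the CSV file is not part of this notebook: the grades, the 4x3 shape, and the name `students_above` are all hypothetical.

```python
import numpy as np

# Hypothetical grades: 4 students (rows) x 3 assignments/exams (columns).
data = np.array([[80, 90, 70],
                 [60, 70, 90],
                 [90, 80, 80],
                 [70, 60, 60]], dtype=float)

avg_grades = np.mean(data, axis=0)  # one average per column
above_avg = data > avg_grades       # the averages are broadcast across the rows
num_above_avg = np.sum(above_avg, axis=0)

# For each column, collect the row numbers of the students above average.
students_above = {}
for j in range(data.shape[1]):
    students_above[j] = np.flatnonzero(above_avg[:, j])
    print("Assignment/exam {}: average {:.2f}, students above average: {}".format(
        j + 1, avg_grades[j], students_above[j].tolist()))
```

Row numbers start at 0 here, so a real data set would still need a mapping from row number to student name or ID.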