# NumPy Essentials

## 1. What is NumPy and why it's important in machine learning


NumPy (Numerical Python) is a powerful library for numerical computing in Python. It provides support for arrays, matrices, and a wide range of mathematical functions to operate on these data structures efficiently. NumPy is essential in machine learning for several reasons:
1. **Efficient Data Handling**: NumPy arrays are more memory-efficient and faster than Python lists, making them ideal for handling large datasets commonly used in machine learning.
2. **Mathematical Operations**: NumPy provides a wide range of mathematical functions that can be applied to arrays, enabling efficient computations required in machine learning algorithms.
3. **Integration with Other Libraries**: Many machine learning libraries, such as scikit-learn, are built on top of NumPy, and others, such as TensorFlow, accept and return NumPy arrays, making it a fundamental component of the machine learning ecosystem.
4. **Broadcasting**: NumPy's broadcasting feature allows for operations on arrays of different shapes, simplifying code and improving performance.
5. **Linear Algebra Support**: Many machine learning algorithms rely on linear algebra operations, which NumPy handles efficiently.
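
The first two points can be illustrated with a small sketch. It assumes a hypothetical workload (summing the squares of 100,000 integers) and only checks that the loop and the vectorized expression agree; actual speedups depend on the machine and are not measured here.

```python
import numpy as np

# Hypothetical data: 100,000 integers to square and sum
values = list(range(100_000))

# Pure-Python approach: an explicit loop over a list of Python int objects
loop_total = sum(v * v for v in values)

# NumPy approach: one vectorized expression over a typed, contiguous array
arr = np.array(values, dtype=np.int64)
vector_total = int(np.sum(arr * arr))

print(loop_total == vector_total)  # True
print(arr.nbytes)                  # 800000 (8 bytes per int64 element)
```

The array stores raw 8-byte integers back to back, whereas a list stores pointers to separate Python objects, which is where the memory and speed difference comes from.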

## 2. NumPy Arrays
NumPy arrays are the core data structure in NumPy. They are similar to Python lists but offer several advantages, including fixed data types, efficient memory usage, and support for vectorized operations.

### 1. Creating Arrays
```python   
import numpy as np
# From a Python list
arr1 = np.array([1, 2, 3, 4, 5])
# From a nested list (2D array)
arr2 = np.array([[1, 2, 3], [4, 5, 6]])
# Using built-in functions
arr3 = np.zeros((3, 4))  # 3x4 array of zeros
arr4 = np.ones((2, 3))   # 2x3 array of ones
arr5 = np.arange(0, 10, 2)  # [0 2 4 6 8]: start 0, stop 10 (exclusive), step 2
arr6 = np.linspace(0, 1, 5)  # 5 evenly spaced values from 0 to 1, both endpoints included
```
### 2. Array Attributes
```python
# Shape of the array
print(arr1.shape)  # (5,)
print(arr2.shape)  # (2, 3)
# Data type of the array
print(arr1.dtype)  # int64 on most platforms (the default integer type is platform-dependent)
print(arr3.dtype)  # float64
# Number of dimensions
print(arr1.ndim)   # 1
print(arr2.ndim)   # 2
# Size (total number of elements)
print(arr1.size)   # 5
print(arr2.size)   # 6
```
### 3. Array Operations
NumPy supports a wide range of operations on arrays, including arithmetic operations, statistical functions, and linear algebra operations.
```python
# Arithmetic operations
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a + b)  # [5 7 9]
print(a * b)  # [ 4 10 18]
print(a - b)  # [-3 -3 -3]
print(a / b)  # [0.25 0.4  0.5]
# Statistical functions
print(np.mean(a))  # 2.0
print(np.median(b))  # 5.0
print(np.std(a))  # 0.816496580927726
print(np.corrcoef(a, b)[0, 1])  # ~1.0 (Pearson correlation coefficient of a and b)
# Linear algebra operations
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(np.dot(A, B))  # Matrix multiplication
print(np.linalg.inv(A))  # Inverse of matrix A
```
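
As a complement to the operations above, the following sketch shows the `axis` argument that most reductions accept, the `@` matrix-multiplication operator, and `np.linalg.solve` as an alternative to forming an explicit inverse. The matrix `M` and vector `rhs` are illustrative values, not taken from the text.

```python
import numpy as np

M = np.array([[1, 2, 3], [4, 5, 6]])

# Reductions collapse the whole array by default, or one axis when asked
print(M.sum())         # 21
print(M.sum(axis=0))   # [5 7 9]   (down the rows, one value per column)
print(M.sum(axis=1))   # [ 6 15]   (across the columns, one value per row)

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# The @ operator is matrix multiplication, equivalent to np.dot for 2D arrays
print(A @ B)           # [[19 22]
                       #  [43 50]]

# Solving A x = rhs directly avoids computing the inverse of A
rhs = np.array([5, 11])
x = np.linalg.solve(A, rhs)
print(np.allclose(A @ x, rhs))  # True
```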
### 4. Universal Functions (ufuncs) & Adding, removing, sorting elements
Universal functions are functions that operate element-wise on arrays. They are optimized for performance and can handle broadcasting.
```python
# Example of ufuncs
arr = np.array([1, 2, 3, 4, 5])
print(np.sqrt(arr))  # Square root
print(np.exp(arr))   # Exponential
print(np.log(arr))   # Natural logarithm
print(np.sin(arr))   # Sine function   
print(np.cos(arr))   # Cosine function
# Adding elements
arr = np.array([1, 2, 3])
arr = np.append(arr, [4, 5])  # Add elements to the end
print(arr)  # [1 2 3 4 5]
# Removing elements
arr = np.array([1, 2, 3, 4, 5])
arr = np.delete(arr, [0, 2])  # Remove elements at index 0 and 2
print(arr)  # [2 4 5]
# Sorting elements
arr = np.array([3, 1, 4, 2, 5])
sorted_arr = np.sort(arr)  # Sort the array
print(sorted_arr)  # [1 2 3 4 5]
```
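
Two related points are worth a short sketch: `np.argsort` returns the ordering rather than the sorted values, and `np.append` returns a new array instead of modifying its input. The `scores` and `labels` arrays are made-up illustrative data.

```python
import numpy as np

scores = np.array([0.3, 0.9, 0.1, 0.7])
labels = np.array(['a', 'b', 'c', 'd'])

# argsort returns the indices that would sort the array in ascending order
order = np.argsort(scores)
print(order)            # [2 0 3 1]
print(labels[order])    # ['c' 'a' 'd' 'b']

# Reverse the order and slice to get the two highest-scoring labels
top2 = order[::-1][:2]
print(labels[top2])     # ['b' 'd']

# np.append copies: the original array keeps its size
base = np.array([1, 2, 3])
extended = np.append(base, 4)
print(base.size, extended.size)  # 3 4

# When combining many pieces, a single concatenate avoids repeated copies
chunks = [np.array([1, 2]), np.array([3]), np.array([4, 5])]
print(np.concatenate(chunks))    # [1 2 3 4 5]
```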

### 5. Indexing and Slicing
NumPy arrays can be indexed and sliced similarly to Python lists, but with additional capabilities for multi-dimensional arrays.
```python
# 1D array indexing and slicing
arr = np.array([10, 20, 30, 40, 50])
print(arr[0])    # 10
print(arr[1:4])  # [20 30 40]
print(arr[-1])   # 50
# 2D array indexing and slicing
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(arr2d[0, 1])    # 2
print(arr2d[1:, :2])  # [[4 5]
                      #  [7 8]]
print(arr2d[:, 2])    # [3 6 9]
```
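
Among the additional capabilities mentioned above are boolean masks and integer (fancy) indexing. The sketch below also shows one behavior that often surprises newcomers: basic slices are views onto the original data, while fancy indexing returns a copy.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Boolean mask: keep the elements greater than 5 (the result is always 1D)
mask = arr2d > 5
print(arr2d[mask])      # [6 7 8 9]

# Integer (fancy) indexing: pick rows 0 and 2
print(arr2d[[0, 2]])    # [[1 2 3]
                        #  [7 8 9]]

# A basic slice is a view: writing through it changes the original array
view = arr2d[0, :]
view[0] = 100
print(arr2d[0, 0])      # 100

# Fancy indexing returns a copy: the original array is untouched
copied = arr2d[[1]]
copied[0, 0] = -1
print(arr2d[1, 0])      # 4
```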
### 6. Reshaping and Transposing
NumPy provides functions to reshape and transpose arrays, which can be useful for preparing data for machine learning models.
```python
# Reshaping an array
arr = np.arange(12)  # 1D array with 12 elements
reshaped_arr = arr.reshape((3, 4))  # Reshape to 3x4
print(reshaped_arr)
# Transposing an array
transposed_arr = reshaped_arr.T  # Transpose the array
print(transposed_arr)
```
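
A few reshaping details can be sketched as follows: `-1` lets NumPy infer one dimension, reshaping a contiguous array shares memory with the original, and an incompatible target shape raises `ValueError`. This is a minimal illustration rather than a full account of when NumPy copies.

```python
import numpy as np

arr = np.arange(12)

# -1 asks NumPy to infer that dimension from the total number of elements
inferred = arr.reshape(-1, 4)
print(inferred.shape)                     # (3, 4)

# Reshaping a contiguous array returns a view that shares memory
print(np.shares_memory(arr, inferred))    # True

# ravel returns a view when it can; flatten always returns a copy
flat_view = inferred.ravel()
flat_copy = inferred.flatten()
print(np.shares_memory(arr, flat_view))   # True
print(np.shares_memory(arr, flat_copy))   # False

# 12 elements cannot fill a 5x3 array, so this raises ValueError
try:
    arr.reshape(5, 3)
except ValueError as exc:
    print("reshape failed:", exc)
```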
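### 7. Broadcasting
Broadcasting lets NumPy apply element-wise operations to arrays of different but compatible shapes. Shapes are compared from the trailing dimension backwards, and two dimensions are compatible when they are equal or when one of them is 1; a dimension of size 1 is stretched to match the other without copying data.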
```python
a = np.array([1, 2, 3])
b = np.array([[10], [20], [30]])
# Broadcasting allows us to add a (3,) array to a (3,1) array
result = a + b
print(result)
# Output:
# [[11 12 13]
#  [21 22 23]
#  [31 32 33]]
```
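
A common use of broadcasting in machine learning is centering the columns of a feature matrix. The sketch below uses a hypothetical 4x3 matrix `X`; it also shows `np.broadcast_shapes`, which reports the result shape without doing any arithmetic, and what happens when shapes are incompatible.

```python
import numpy as np

# Hypothetical feature matrix: 4 samples, 3 features
X = np.array([[1.0, 10.0, 100.0],
              [2.0, 20.0, 200.0],
              [3.0, 30.0, 300.0],
              [4.0, 40.0, 400.0]])

col_means = X.mean(axis=0)       # shape (3,)
centered = X - col_means         # (4, 3) - (3,) -> (4, 3)
print(np.allclose(centered.mean(axis=0), 0.0))   # True

# Ask NumPy what shape a broadcast would produce
print(np.broadcast_shapes((3,), (3, 1)))         # (3, 3)

# Trailing dimensions 3 and 4 are neither equal nor 1, so this fails
try:
    X + np.ones(4)
except ValueError as exc:
    print("broadcast failed:", exc)

# Adding an explicit axis makes it work: (4, 3) + (4, 1) -> (4, 3)
row_offsets = np.arange(4.0)[:, np.newaxis]
print((X + row_offsets).shape)                   # (4, 3)
```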
### 8. Random Number Generation
NumPy includes a module for generating random numbers, which is useful for initializing weights in machine learning models and for creating synthetic datasets.
```python
# Generating random numbers
rand_arr = np.random.rand(3, 4)  # 3x4 array of random floats in [0.0, 1.0)
rand_int_arr = np.random.randint(0, 10, size=(2, 3))  # 2x3 array of random integers in [0, 10)
print(rand_arr) 
print(rand_int_arr)
```
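
For reproducible experiments, NumPy's newer `Generator` interface can be used in place of the module-level functions. The sketch below assumes an arbitrary seed of 42 and a hypothetical 8/2 train/test split over 10 sample indices.

```python
import numpy as np

# A seeded Generator: the same seed reproduces the same stream of numbers
rng = np.random.default_rng(seed=42)

uniform = rng.random((3, 4))                      # floats in [0.0, 1.0)
integers = rng.integers(0, 10, size=(2, 3))       # integers in [0, 10)
normal = rng.normal(loc=0.0, scale=1.0, size=5)   # standard normal samples

# A second generator with the same seed yields identical values
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(uniform, rng_again.random((3, 4))))  # True

# Shuffle indices, for example for a hypothetical train/test split
indices = rng.permutation(10)
train_idx, test_idx = indices[:8], indices[8:]
print(train_idx.shape, test_idx.shape)            # (8,) (2,)
```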

## 3. NumPy with objects, CSV files
NumPy can also store non-numeric data: arrays of strings get a fixed-width string dtype, and arrays of mixed Python objects can be created with `dtype=object`. Additionally, NumPy can read data from and write data to CSV files, which is useful for data preprocessing in machine learning workflows.
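
A self-contained sketch of both ideas follows. To avoid depending on a file on disk, it reads a small made-up CSV from an in-memory `io.StringIO` buffer; the column names and values are purely illustrative.

```python
import io
import numpy as np

# String arrays get a fixed-width Unicode dtype, not a generic object dtype
fruits = np.array(['apple', 'banana', 'cherry'])
print(fruits.dtype.kind)     # U

# Mixed Python objects need dtype=object, which gives up most vectorized speed
mixed = np.array([1, 'two', 3.0], dtype=object)
print(mixed.dtype)           # object

# Hypothetical CSV content held in memory instead of a file
csv_text = "height,weight\n1.5,60\n1.8,80\n"

# skip_header=1 drops the header row; the result is a 2D float array
table = np.genfromtxt(io.StringIO(csv_text), delimiter=',', skip_header=1)
print(table.shape)           # (2, 2)

# Write the array back out as CSV text, with the header restored
buffer = io.StringIO()
np.savetxt(buffer, table, delimiter=',', fmt='%.1f',
           header='height,weight', comments='')
print(buffer.getvalue())
```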

```python
# Creating an array of strings
arr = np.array(['apple', 'banana', 'cherry'])
print(arr)

# Reading from a CSV file
data = np.genfromtxt('data.csv', delimiter=',', dtype=None, encoding='utf-8')
print(data)

# Writing to a CSV file
np.savetxt('output.csv', data, delimiter=',', fmt='%s')
```

## 4. Examples of NumPy in Machine Learning
### 1-Linear Regression using NumPy
We can implement a simple linear regression model directly in NumPy to see how it works under the hood. NumPy suits this task because the algorithm reduces to array and matrix computations: the example generates noisy linear data, adds a bias column, and solves for the weights with the Normal Equation, `theta = (X^T X)^-1 X^T y`.

```python
import numpy as np
import matplotlib.pyplot as plt
# Generate some sample data
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
# Add bias term (intercept)
X_b = np.c_[np.ones((100, 1)), X]  # add x0 = 1 to each instance
# Calculate weights using the Normal Equation
theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)
# Make predictions
X_new = np.array([[0], [2]])
X_new_b = np.c_[np.ones((2, 1)), X_new]  # add x0 = 1 to each instance
y_predict = X_new_b.dot(theta_best)
# Plot the results
plt.plot(X_new, y_predict, "r-")
plt.plot(X, y, "b.")
plt.xlabel("X")
plt.ylabel("y")
plt.title("Linear Regression using NumPy")
plt.show()
```
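
As a cross-check on the Normal Equation, the same weights can be obtained with `np.linalg.lstsq`, which solves the least-squares problem without forming an explicit inverse. The sketch below is a variation rather than the original code: it assumes the same data recipe (`y = 4 + 3x` plus Gaussian noise) but draws it from a seeded `default_rng`, so the numbers differ slightly from the example above.

```python
import numpy as np

# Assumed data recipe: y = 4 + 3x + Gaussian noise, 100 samples
rng = np.random.default_rng(0)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X + rng.standard_normal((100, 1))

# Design matrix with a leading column of ones for the intercept
X_b = np.hstack([np.ones((100, 1)), X])

# Normal Equation with an explicit inverse
theta_inv = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y

# Least-squares solver: same answer, no explicit inverse
theta_lstsq, residuals, rank, singular_values = np.linalg.lstsq(X_b, y, rcond=None)

print(theta_lstsq.ravel())                  # close to [4, 3], up to noise
print(np.allclose(theta_inv, theta_lstsq))  # True

# Mean squared error of the fitted line on the training data
mse = np.mean((X_b @ theta_lstsq - y) ** 2)
print(mse)
```

For a small, well-conditioned problem like this one the two approaches agree; a dedicated solver is generally the safer choice when `X_b.T @ X_b` is close to singular.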

### 2-Data preprocessing using NumPy
Data preprocessing is a crucial step in machine learning, and NumPy provides efficient tools for handling and transforming data. Here are some common data preprocessing tasks that can be performed using NumPy.

```python
# Feature scaling: standardize each column to zero mean and unit variance
features = np.array([[1, 2], [3, 4], [5, 6]])
normalized = (features - np.mean(features, axis=0)) / np.std(features, axis=0)
print("Normalized Features:\n", normalized)
# Handling missing values
data = np.array([[1, 2, np.nan], [4, np.nan, 6], [7, 8, 9]])
# Replace NaN with column mean
col_mean = np.nanmean(data, axis=0)
inds = np.where(np.isnan(data))  # (row indices, column indices) of the NaNs
data[inds] = np.take(col_mean, inds[1])
print("Data after handling missing values:\n", data)
# One-hot encoding
categories = np.array(['cat', 'dog', 'cat', 'bird'])
unique_categories = np.unique(categories)
one_hot_encoded = np.zeros((categories.shape[0], unique_categories.shape[0]))
for i, category in enumerate(categories):
    one_hot_encoded[i, np.where(unique_categories == category)[0][0]] = 1
print("One Hot Encoded:\n", one_hot_encoded)
```
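
The one-hot encoding loop can also be written without an explicit Python loop. The sketch below is one possible vectorized variant, using `np.unique` with `return_inverse=True` to get integer class codes and indexing an identity matrix with them.

```python
import numpy as np

categories = np.array(['cat', 'dog', 'cat', 'bird'])

# return_inverse gives, for each element, its index into the sorted unique values
unique_categories, codes = np.unique(categories, return_inverse=True)
print(unique_categories)   # ['bird' 'cat' 'dog']
print(codes)               # [1 2 1 0]

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(len(unique_categories))[codes]
print(one_hot)
# [[0. 1. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [1. 0. 0.]]

# argmax along each row recovers the original labels
decoded = unique_categories[one_hot.argmax(axis=1)]
print(np.array_equal(decoded, categories))  # True
```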

### 3-K-means clustering using NumPy
K-means clustering is an unsupervised machine learning algorithm that groups similar data points into clusters. Its core step, assigning each point to its nearest centroid, maps naturally onto NumPy broadcasting. The snippet computes all point-to-centroid distances at once and then performs that assignment step:

```python
# Distance calculations using broadcasting
data_points = np.array([[1, 2], [3, 4], [5, 6]])
centroids = np.array([[2, 3], [4, 5]])
distances = np.sqrt(((data_points[:, np.newaxis, :] - centroids[np.newaxis, :, :]) ** 2).sum(axis=2))
print("Distances:\n", distances)
# Assign clusters based on minimum distance
clusters = np.argmin(distances, axis=1)
print("Cluster Assignments:\n", clusters)
```
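
The distance and assignment computation above is one half of the algorithm; the other half recomputes each centroid as the mean of its assigned points, and the two steps repeat until the centroids stop moving. A minimal sketch of that loop follows. The function name `kmeans`, its parameters, the random initialization from data points, and the toy data are all assumptions made for illustration, not part of the original text.

```python
import numpy as np

def kmeans(points, k, n_iters=100, seed=0):
    """Minimal K-means sketch; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct data points
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        # Assignment step: pairwise distances via broadcasting, shape (n, k)
        diffs = points[:, np.newaxis, :] - centroids[np.newaxis, :, :]
        labels = np.linalg.norm(diffs, axis=2).argmin(axis=1)
        # Update step: mean of each cluster; keep the old centroid if a cluster is empty
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels

# Hypothetical toy data: two well-separated groups of three points
points = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
                   [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])
centroids, labels = kmeans(points, k=2)
print("Centroids:\n", centroids)
print("Labels:", labels)
```

Which group receives label 0 depends on the random initialization, so only the grouping itself is meaningful, not the label values.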

### 4-Neural Networks Activation Functions
Neural networks often use activation functions to introduce non-linearity into the model. NumPy can be used to implement common activation functions such as sigmoid, ReLU, and softmax. Here's how you can implement these activation functions using NumPy:

```python
def relu(x):
    return np.maximum(0, x)
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
def softmax(x):
    exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
    return exp_x / exp_x.sum(axis=1, keepdims=True)
# Example usage
z = np.array([[1, 2, 3], [0.1, 0.2, 0.3]])
print("ReLU:\n", relu(z))
print("Sigmoid:\n", sigmoid(z))
print("Softmax:\n", softmax(z))
```
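
The `softmax` above subtracts the row maximum before exponentiating. A short sketch shows why: the result is mathematically unchanged, but `np.exp` no longer overflows for large inputs. The `naive_softmax` helper and the input values are hypothetical, included only for comparison.

```python
import numpy as np

def softmax(x):
    # Subtracting the row maximum leaves the result unchanged but keeps exp() bounded
    shifted = x - np.max(x, axis=1, keepdims=True)
    exp_x = np.exp(shifted)
    return exp_x / exp_x.sum(axis=1, keepdims=True)

def naive_softmax(x):
    # Hypothetical unshifted version, for comparison only
    exp_x = np.exp(x)
    return exp_x / exp_x.sum(axis=1, keepdims=True)

large = np.array([[1000.0, 1001.0, 1002.0]])

# exp(1000) overflows to inf, and inf / inf is nan
with np.errstate(over='ignore', invalid='ignore'):
    print(naive_softmax(large))   # [[nan nan nan]]

stable = softmax(large)
print(np.isclose(stable.sum(axis=1), 1.0))   # [ True]

# Shifting every input by the same constant does not change the output
small = np.array([[0.0, 1.0, 2.0]])
print(np.allclose(stable, softmax(small)))   # True
```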

For more information, please refer to the [NumPy Documentation](https://numpy.org/doc/2.3/).