<a href="https://colab.research.google.com/github/cedamusk/AI-N-ML/blob/Tools/NumPy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# What is NumPy
**NumPy** (Numerical Python) is a popular Python library used for numerical computations. It provides support for large multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on them efficiently. NumPy is the foundation for many other Python libraries, such as Pandas, scikit-learn, TensorFlow, and PyTorch, making it essential for data science and machine learning.

## Core Features of NumPy
1. **N-dimensional array (`ndarray`)**: Enables efficient storage and manipulation of homogeneous data types in multiple dimensions.

2. **Mathematical Functions**: Offers a wide range of functions for statistical, linear algebra, and trigonometric computations.

3. **Broadcasting**: Simplifies element-wise operations on arrays of different shapes.

4. **High Performance**: Uses low-level C and Fortran code for speed, outperforming native Python lists for numerical tasks.
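
To make the broadcasting feature above concrete, here is a minimal sketch. The array names (`samples`, `offsets`) and their values are made up for illustration. It adds a per-column offset to every row of a matrix and checks the result against an explicit Python loop.

```python
import numpy as np

# A (3, 2) matrix of samples and a (2,) vector of per-column offsets
# (illustrative values, not taken from a real dataset)
samples = np.array([[1.0, 10.0],
                    [2.0, 20.0],
                    [3.0, 30.0]])
offsets = np.array([0.5, 5.0])

# Broadcasting stretches `offsets` across every row, so no loop is needed
shifted = samples + offsets

# The same computation written with explicit Python loops, for comparison
looped = np.array([[value + offsets[j] for j, value in enumerate(row)]
                   for row in samples])

print("Shifted:\n", shifted)
print("Matches loop version:", np.array_equal(shifted, looped))
```

The shapes `(3, 2)` and `(2,)` are compatible because their trailing dimensions match; NumPy treats the smaller array as if it were repeated along the missing axis.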

## NumPy in Machine Learning
In machine learning, data is often represented as numerical arrays or matrices, and operations such as matrix multiplication, statistical analysis, and data scaling need to run efficiently. NumPy provides a foundation for manipulating and preparing datasets, and it supplies the linear algebra functions that algorithms such as regression, clustering, and neural networks depend on. Its performance also enables faster development and training of ML models.

## Use Cases in ML
1. **Data Preparation**
* **Loading Datasets**: NumPy can handle data in CSV, text, or binary format.

* **Data Cleaning**: Perform operations like replacing missing values, filtering rows, or scaling features.

* **Data Transformation**: Convert raw data into numerical formats, normalize, or apply feature engineering.

2. **Mathematical Computations**
* **Vectorized Operations**: Perform arithmetic across entire datasets without writing explicit loops.

* **Matrix Operations**: Useful for linear regression, SVMs and deep learning where matrix multiplication and inversion are critical.

* **Statistical Analysis**: Compute mean, median, standard deviation, or variance to analyze datasets.

3. **Feature Scaling**: Rescale features to a common range or distribution (e.g., min-max normalization or z-score standardization).

4. **Implementation of algorithms**: Instead of using ML frameworks, one can use NumPy to build algorithms from scratch.

5. **Performance Optimization**: Use NumPy arrays instead of Python lists to speed up custom implementations of ML models.
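
The data preparation steps listed above (loading, cleaning, filtering) could be sketched as follows. The CSV text, the column names, and the threshold are hypothetical, and filling missing values with the column mean is only one possible strategy.

```python
import io
import numpy as np

# Hypothetical CSV content with one missing value in the first column
csv_text = "height,weight\n1.0,2.0\n,4.0\n5.0,6.0\n"

# genfromtxt marks empty fields as NaN when reading floats
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

# Replace each NaN with the mean of its column, ignoring NaNs in the mean
col_means = np.nanmean(data, axis=0)
cleaned = np.where(np.isnan(data), col_means, data)

# Keep only the rows whose first column exceeds an arbitrary threshold
tall = cleaned[cleaned[:, 0] > 2.0]

print("Raw:\n", data)
print("Cleaned:\n", cleaned)
print("Filtered rows:\n", tall)
```

In a real project the `io.StringIO` object would be replaced by a file path; it is used here only so the example runs without external files.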



## NumPy Guide
**Install NumPy**


In [5]:
!pip install numpy



**Import NumPy**

In [None]:
import numpy as np

### Basic NumPy operations

**Creating Arrays**

In [7]:
#Create a 1D array
array_1d=np.array([1,2,3,4,5])
print("1D array:", array_1d)

#Create a 2D array (matrix)
array_2d=np.array([[1,2,3],
                   [4,5,6]])
print("\n2D array:\n", array_2d)

#Create arrays with specific values
zeros=np.zeros((3,3)) #3 by 3 array of zeros
ones=np.ones((2,2)) #2 by 2 array of ones
random=np.random.rand(3,3) #3 by 3 array of random values

print("\nZeros:\n", zeros)
print("\nOnes\n", ones)
print("\nRandom:\n", random)

1D array: [1 2 3 4 5]

2D array:
 [[1 2 3]
 [4 5 6]]

Zeros:
 [[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]

Ones
 [[1. 1.]
 [1. 1.]]

Random:
 [[0.00803289 0.62444621 0.02780389]
 [0.4356554  0.40285531 0.94543718]
 [0.4971037  0.08185212 0.25949347]]


**Basic Operations**

In [8]:
#Array arithmetic
a=np.array([1,2,3])
b=np.array([4,5,6])

print("Addition:", a+b)
print("Multiplication:", a*b)
print("Square root:", np.sqrt(a))
print("Exponential:", np.exp(a))

#Matrix Operations
matrix_a=np.array([[1,2], [3,4]])
matrix_b=np.array([[5,6], [7,8]])

print("\nMatrix multiplication:\n", np.dot(matrix_a, matrix_b))
print("\nTranspose:\n", matrix_a.T)

Addition: [5 7 9]
Multiplication: [ 4 10 18]
Square root: [1.         1.41421356 1.73205081]
Exponential: [ 2.71828183  7.3890561  20.08553692]

Matrix multiplication:
 [[19 22]
 [43 50]]

Transpose:
 [[1 3]
 [2 4]]
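
A common source of confusion is that `*` multiplies arrays element by element, while matrix multiplication needs `np.dot`, `np.matmul`, or the `@` operator. The short sketch below reuses the matrices from the cell above to show the difference, and adds a matrix inverse as one more linear algebra operation.

```python
import numpy as np

matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6], [7, 8]])

# Element-wise product: each entry is multiplied by its counterpart
elementwise = matrix_a * matrix_b

# Matrix product: `@` gives the same result as np.dot for 2D arrays
matmul = matrix_a @ matrix_b

# A matrix times its inverse should be (numerically) the identity
identity_check = matrix_a @ np.linalg.inv(matrix_a)

print("Element-wise product:\n", elementwise)
print("Matrix product:\n", matmul)
print("Close to identity:", np.allclose(identity_check, np.eye(2)))
```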


### Working with Data

**Data Loading and manipulation**

In [9]:
#Create sample dataset
data=np.array([[1,2,3],
               [4,5,6],
               [7,8,9],
               [10,11,12]])

#Accessing elements
print("First row:", data[0])
print("First Column:", data[:, 0])
print("Specific element:", data[1,1])

#Slicing
print("\nFirst two rows:\n", data[:2])
print("\nLast two columns:\n", data[:, 1:])

#Reshaping
reshaped=data.reshape(2,6)
print("\nReshaped array:\n", reshaped)

First row: [1 2 3]
First Column: [ 1  4  7 10]
Specific element: 5

First two rows:
 [[1 2 3]
 [4 5 6]]

Last two columns:
 [[ 2  3]
 [ 5  6]
 [ 8  9]
 [11 12]]

Reshaped array:
 [[ 1  2  3  4  5  6]
 [ 7  8  9 10 11 12]]
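
Beyond integer indexing and slicing, rows can be selected with a boolean mask, which is how the row filtering mentioned earlier under data cleaning is usually done. The sketch below rebuilds the same 4 by 3 dataset; the threshold of 4 is arbitrary.

```python
import numpy as np

# Same values as the sample dataset above, built with arange
data = np.arange(1, 13).reshape(4, 3)

# Boolean mask: True for rows whose first column is greater than 4
mask = data[:, 0] > 4
filtered = data[mask]

# Passing -1 lets NumPy infer that dimension from the array size
auto_reshaped = data.reshape(-1, 6)

# Flatten to one dimension
flat = data.ravel()

print("Mask:", mask)
print("Filtered rows:\n", filtered)
print("Inferred shape:", auto_reshaped.shape)
print("Flattened:", flat)
```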


**Data Preprocessing**

In [10]:
#Normalize data (scale to 0-1 range)
def normalize(data):
  min_vals=np.min(data, axis=0)
  max_vals=np.max(data, axis=0)
  normalized=(data-min_vals)/(max_vals-min_vals)
  return normalized

#Standardize data (zero mean, unit variance)
def standardize(data):
  mean=np.mean(data, axis=0)
  std=np.std(data, axis=0)
  standardized=(data-mean)/std
  return standardized

#Example usage
raw_data=np.array([[1, 10, 100],
                   [2, 20, 200],
                   [3, 30, 300]])

print("Normalized data:\n", normalize(raw_data))
print("\nStandardized data:\n", standardize(raw_data))

Normalized data:
 [[0.  0.  0. ]
 [0.5 0.5 0.5]
 [1.  1.  1. ]]

Standardized data:
 [[-1.22474487 -1.22474487 -1.22474487]
 [ 0.          0.          0.        ]
 [ 1.22474487  1.22474487  1.22474487]]
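
Both helpers above divide by a per-column quantity, so a column in which every value is identical leads to a division by zero and `NaN` results. One possible guard is sketched below; `safe_normalize` is a hypothetical name, and mapping constant columns to 0 is an assumption about the desired behavior, not the only reasonable choice.

```python
import numpy as np

def safe_normalize(data):
    """Min-max scale each column to [0, 1]; constant columns become 0."""
    data = np.asarray(data, dtype=float)
    min_vals = data.min(axis=0)
    ranges = data.max(axis=0) - min_vals
    # Replace zero ranges with 1 so constant columns yield 0 instead of NaN
    safe_ranges = np.where(ranges == 0, 1.0, ranges)
    return (data - min_vals) / safe_ranges

# The second column is constant, which would break a plain min-max scaling
with_constant = np.array([[1, 7],
                          [2, 7],
                          [3, 7]])
result = safe_normalize(with_constant)
print("Safely normalized:\n", result)
```

The same idea applies to standardization: a column with zero standard deviation needs a similar guard before dividing.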


**Statistical Operations**

In [11]:
data=np.random.normal(0, 1, 1000)

#Basic stats
print("Mean:", np.mean(data))
print("Median:", np.median(data))
print("Standard deviation:", np.std(data))
print("Variance:", np.var(data))

#Percentiles
percentiles=np.percentile(data, [25, 50, 75])
print("\nQuartiles:", percentiles)

#Correlation
data_2d=np.random.rand(100,3)
correlation=np.corrcoef(data_2d.T)
print("\nCorrelation matrix:\n", correlation)

Mean: -0.03321690110323572
Median: -0.029637261193196415
Standard deviation: 1.0082805546063294
Variance: 1.0166296767972474

Quartiles: [-0.70105876 -0.02963726  0.62569705]

Correlation matrix:
 [[ 1.         -0.19262576  0.02536167]
 [-0.19262576  1.          0.11429591]
 [ 0.02536167  0.11429591  1.        ]]
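
Two details of the functions above are easy to miss. `np.std` and `np.var` compute the population statistic by default (`ddof=0`); passing `ddof=1` gives the sample estimate. `np.corrcoef` treats each row as a variable, which is why the cell above transposes the data; `rowvar=False` achieves the same result without the transpose. The sketch below uses its own seeded random data, so its numbers will differ from the output above.

```python
import numpy as np

# Seeded generator so the example is reproducible
rng = np.random.default_rng(0)
sample = rng.normal(0, 1, 1000)

# Population (ddof=0, the default) vs. sample (ddof=1) standard deviation
population_std = np.std(sample)
sample_std = np.std(sample, ddof=1)

# Two equivalent ways to correlate the columns of a (100, 3) array
features = rng.random((100, 3))
corr_transposed = np.corrcoef(features.T)
corr_rowvar = np.corrcoef(features, rowvar=False)

print("Population std:", population_std)
print("Sample std:", sample_std)
print("Same correlation matrix:", np.allclose(corr_transposed, corr_rowvar))
```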


**Simple Linear Regression from scratch**

In [14]:
#generate sample data
np.random.seed(0)
X=np.random.rand(100, 1)
y=2*X+1+np.random.normal(0, 0.1, (100,1))

#Implement linear regression
def linear_regression(X, y):
  #add bias term
  X_b=np.c_[np.ones((X.shape[0], 1)), X]

  #Calculate weights using normal equation
  weights=np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)

  return weights

#Train model
weights=linear_regression(X, y)
print("Linear weights (bias, slope):", weights.flatten())

#Make predictions
X_new=np.array([[0], [1]])
X_new_b=np.c_[np.ones((2,1)), X_new]
y_pred=X_new_b.dot(weights)

print("\nPredictions for X=0 and X=1:", y_pred.flatten())


Linear weights (bias, slope): [1.02221511 1.9936935 ]

Predictions for X=0 and X=1: [1.02221511 3.01590861]
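
The function above solves the normal equation, w = (XᵀX)⁻¹Xᵀy, by explicitly inverting XᵀX. That works for this small, well-conditioned example, but explicit inversion can be numerically fragile when features are strongly correlated. As an alternative sketch (a substitution for the inversion step, not a change to the original approach), `np.linalg.lstsq` solves the same least-squares problem directly. The data generation below repeats the cell above so the block runs on its own.

```python
import numpy as np

# Recreate the same synthetic data as above
np.random.seed(0)
X = np.random.rand(100, 1)
y = 2 * X + 1 + np.random.normal(0, 0.1, (100, 1))

# Add the bias column of ones
X_b = np.c_[np.ones((X.shape[0], 1)), X]

# Normal equation with an explicit inverse, as in the cell above
weights_inv = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y

# Least-squares solver; rcond=None selects NumPy's current default cutoff
weights_lstsq, residuals, rank, singular_values = np.linalg.lstsq(
    X_b, y, rcond=None
)

print("Normal equation:", weights_inv.flatten())
print("lstsq:          ", weights_lstsq.flatten())
print("Solutions agree:", np.allclose(weights_inv, weights_lstsq))
```

Both approaches should recover a bias near 1 and a slope near 2, the values used to generate the data.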
