# NumPy Fundamentals and Machine Learning Basics

This notebook walks through NumPy operations and machine learning fundamentals using scikit-learn. The examples progress from basic array operations to supervised models, clustering, and dimensionality reduction.

## Prerequisites
```bash
pip install numpy scikit-learn matplotlib pandas
```

## Table of Contents
1. [NumPy Fundamentals](#numpy)
2. [Machine Learning Basics](#ml-basics)
3. [Advanced Machine Learning](#ml-advanced)

## 1. NumPy Fundamentals <a id="numpy"></a>

### Array Creation Methods
Learn different ways to create NumPy arrays for various use cases.

In [None]:
import numpy as np

# Creating a 1D array
arr1 = np.array([1, 2, 3, 4, 5])
print("1D Array:", arr1)

# Creating a 2D array (matrix)
arr2 = np.array([[1, 2, 3], [4, 5, 6]])
print("2D Array:\n", arr2)

# Creating an array of zeros
zeros = np.zeros((3, 3))
print("3x3 Zero Matrix:\n", zeros)

# Creating an array of ones
ones = np.ones((2, 4))
print("2x4 Ones Matrix:\n", ones)

# Creating an array with a range of numbers
range_arr = np.arange(0, 10, 2)  # from 0 up to (but not including) 10, step 2
print("Range Array:", range_arr)

# Creating an array with equally spaced values
lin_arr = np.linspace(0, 1, 5)  # 5 evenly spaced values from 0 to 1, both endpoints included
print("Linearly spaced values:", lin_arr)
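
To make the endpoint behavior concrete, here is a minimal sketch comparing `np.arange` and `np.linspace` and inspecting the `shape`, `ndim`, and `dtype` attributes every array carries. The variable names (`stepped`, `spaced`, `grid`) are illustrative and do not appear in the cell above.

```python
import numpy as np

# arange excludes the stop value; linspace includes it by default
stepped = np.arange(0, 10, 2)
spaced = np.linspace(0, 1, 5)
print("arange:", stepped)            # [0 2 4 6 8]
print("linspace last value:", spaced[-1])  # 1.0

# Every array reports its shape, number of dimensions, and element type
grid = np.zeros((3, 3))
print(grid.shape, grid.ndim, grid.dtype)   # (3, 3) 2 float64
```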

### Array Operations and Vectorization
Perform element-wise mathematical operations efficiently.

In [None]:
import numpy as np

a = np.array([10, 20, 30, 40])
b = np.array([1, 2, 3, 4])

# Element-wise addition
print("Addition:", a + b)

# Element-wise subtraction
print("Subtraction:", a - b)

# Element-wise multiplication
print("Multiplication:", a * b)

# Element-wise division
print("Division:", a / b)

# Square root
print("Square root of a:", np.sqrt(a))

# Exponentiation
print("a squared:", np.power(a, 2))
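
The sketch below illustrates what "vectorization" means in practice: the same computation written as a Python loop and as a single array expression, followed by a small broadcasting example. The names `looped`, `vectorized`, and `col` are assumptions made for this illustration.

```python
import numpy as np

a = np.array([10, 20, 30, 40])

# Explicit Python loop: handles one element at a time
looped = np.array([x * 2 + 1 for x in a])

# Vectorized form: the scalars 2 and 1 are broadcast across every element
vectorized = a * 2 + 1
print(np.array_equal(looped, vectorized))  # True

# Broadcasting also combines arrays of compatible shapes:
# a (3, 1) column plus a (4,) row gives a (3, 4) result
col = np.array([[0], [100], [200]])
print((col + a).shape)  # (3, 4)
```

On large arrays the vectorized form is usually much faster because the loop runs in compiled code rather than in the Python interpreter.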

### Array Indexing and Slicing
Access and modify array elements using various indexing techniques.

In [None]:
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60])

# Accessing elements
print("First element:", arr[0])
print("Last element:", arr[-1])

# Slicing
print("First 3 elements:", arr[0:3])
print("Every second element:", arr[::2])
 
# Modifying elements
arr[2] = 99
print("Modified array:", arr)

# 2D array slicing
mat = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

print("Element at (1,2):", mat[1, 2])   # zero-based: second row, third column (value 6)
print("First row:", mat[0, :])
print("Second column:", mat[:, 1])
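
Two indexing behaviors are worth a short sketch: boolean masks, and the fact that a basic slice is a view onto the original data rather than a copy. The names `mask`, `view`, and `safe` are illustrative.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60])

# Boolean mask: keep only the elements greater than 30
mask = arr > 30
print(arr[mask])   # [40 50 60]

# A basic slice is a view: writing through it changes the original array
view = arr[:3]
view[0] = -1
print(arr[0])      # -1

# .copy() produces independent data, so the original is untouched
safe = arr[:3].copy()
safe[1] = 0
print(arr[1])      # 20
```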

### Statistical Functions
Compute descriptive statistics and aggregate functions.

In [None]:
import numpy as np

arr = np.array([3, 7, 2, 9, 5])

print("Max:", np.max(arr))
print("Min:", np.min(arr))
print("Sum:", np.sum(arr))
print("Mean:", np.mean(arr))
print("Standard Deviation:", np.std(arr))  # population std (ddof=0 by default)
print("Index of Max Value:", np.argmax(arr))
print("Index of Min Value:", np.argmin(arr))
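
Aggregate functions also accept an `axis` argument for 2D arrays, and `np.std` takes `ddof` to switch between population and sample standard deviation. A minimal sketch, with illustrative variable names:

```python
import numpy as np

mat = np.array([[1, 2, 3],
                [4, 5, 6]])

# axis=0 aggregates down the rows (one result per column)
print(mat.sum(axis=0))    # [5 7 9]
# axis=1 aggregates across the columns (one result per row)
print(mat.mean(axis=1))   # [2. 5.]

arr = np.array([3, 7, 2, 9, 5])
pop_std = np.std(arr)             # divides by n (ddof=0, the default)
sample_std = np.std(arr, ddof=1)  # divides by n - 1
print("Population std:", pop_std)
print("Sample std:", sample_std)
```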

### Random Number Generation
Generate random data for testing, simulation, and data augmentation.

In [None]:
import numpy as np

# 5 random integers from 1 to 9 (the upper bound 10 is exclusive)
rand_ints = np.random.randint(1, 10, size=5)
print("Random Integers:", rand_ints)

# Random floats between 0 and 1
rand_floats = np.random.rand(5)
print("Random Floats:", rand_floats)

# 3x3 matrix drawn from the standard normal distribution (mean 0, std 1)
rand_matrix = np.random.randn(3, 3)
print("Random Normal Distribution Matrix:\n", rand_matrix)

# Shuffling
arr = np.array([1, 2, 3, 4, 5])
np.random.shuffle(arr)
print("Shuffled Array:", arr)
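
The cell above uses NumPy's legacy global random functions. As a sketch of the same operations with the newer `Generator` interface, assuming a seed of 42 chosen only for illustration, `np.random.default_rng` gives reproducible draws without touching global state:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

ints = rng.integers(1, 10, size=5)       # integers in [1, 10), i.e. 1 through 9
floats = rng.random(5)                   # floats in [0, 1)
normal = rng.standard_normal((3, 3))     # standard normal samples
shuffled = rng.permutation(np.array([1, 2, 3, 4, 5]))  # shuffled copy

# A generator created with the same seed reproduces the same draws
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(ints, rng_again.integers(1, 10, size=5)))  # True
```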

## 2. Machine Learning Basics <a id="ml-basics"></a>

### Train-Test Split
Properly divide datasets for model training and evaluation.

In [None]:
import numpy as np
from sklearn.model_selection import train_test_split

# Example dataset
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])  # Features
y = np.array([2, 4, 6, 8, 10, 12, 14, 16])              # Labels

# Split the dataset (test_size=0.2; with 8 samples the test set is rounded up to 2, leaving 6 for training)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("Training Data:\n", X_train, y_train)
print("Testing Data:\n", X_test, y_test)
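
For classification data it is often useful to keep class proportions equal in both splits. The sketch below uses the `stratify` argument of `train_test_split` on a small made-up dataset (10 samples, two balanced classes); the data is an assumption for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples, 2 features, two balanced classes
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 5 + [1] * 5)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)   # (8, 2) (2, 2)
print(np.bincount(y_train))          # [4 4]
print(np.bincount(y_test))           # [1 1]
```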

### Linear Regression
Predict continuous values using linear relationships.

In [None]:
import numpy as np
from sklearn.linear_model import LinearRegression

# Training data
X = np.array([[1], [2], [3], [4], [5]])  # Feature
y = np.array([2, 4, 6, 8, 10])           # Label (y = 2x)

# Model
model = LinearRegression()
model.fit(X, y)

# Prediction
pred = model.predict([[6]])
print("Prediction for 6:", pred)

# Model parameters
print("Slope (Coefficient):", model.coef_)
print("Intercept:", model.intercept_)
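
The example above fits noise-free data, so the fit is exact. As a sketch of how a regression could be evaluated on held-out data, the block below generates synthetic noisy data (an assumption: `y = 2x + 1` plus Gaussian noise) and reports mean squared error and R² on the test split.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data for illustration: y = 2x + 1 plus Gaussian noise
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 50).reshape(-1, 1)
y = 2 * X.ravel() + 1 + rng.normal(0, 0.5, size=50)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Estimated slope:", model.coef_[0])
print("Estimated intercept:", model.intercept_)
print("Test MSE:", mse)
print("Test R^2:", r2)
```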

### Logistic Regression
Binary classification using the logistic (sigmoid) function to turn a linear score into a probability.

In [None]:
import numpy as np
from sklearn.linear_model import LogisticRegression

# Simple dataset: study hours → pass/fail
X = np.array([[1], [2], [3], [4], [5], [6], [7]])
y = np.array([0, 0, 0, 1, 1, 1, 1])  # 0 = Fail, 1 = Pass

# Model
model = LogisticRegression()
model.fit(X, y)

# Predictions
print("Prediction for 2.5 hours:", model.predict([[2.5]]))
print("Prediction for 6 hours:", model.predict([[6]]))

# Probabilities
print("Probabilities for 2.5 hours:", model.predict_proba([[2.5]]))

### Decision Tree Classification
Tree-based classification with interpretable decision rules.

In [None]:
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
import matplotlib.pyplot as plt

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Train Decision Tree
clf = DecisionTreeClassifier(max_depth=3, random_state=42)  # fixed seed for reproducible splits
clf.fit(X, y)

# Prediction
print("Prediction for first sample:", clf.predict([X[0]]))

# Visualize tree
plt.figure(figsize=(10,6))
tree.plot_tree(clf, feature_names=iris.feature_names, class_names=iris.target_names, filled=True)
plt.show()
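
Besides the plot, the learned decision rules can be printed as text. A minimal sketch using `sklearn.tree.export_text`, with the same iris data and `max_depth=3`; the `random_state` value is an assumption chosen for reproducibility.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(iris.data, iris.target)

# Render the tree as indented if/else style rules
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
print("Tree depth:", clf.get_depth())
```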

### Feature Scaling and Standardization
Standardize features to zero mean and unit variance so that features on larger scales do not dominate scale-sensitive algorithms.

In [None]:
import numpy as np
from sklearn.preprocessing import StandardScaler

# Example dataset
X = np.array([[10, 100],
              [20, 200],
              [30, 300],
              [40, 400]])

# Standardization (mean=0, std=1)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print("Original Data:\n", X)
print("Standardized Data:\n", X_scaled)
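
As a sketch of what `StandardScaler` computes, the block below reproduces the transform by hand and shows the scaler being fitted once and then reused on new data. The `X_new` row is a made-up sample for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[10, 100],
                    [20, 200],
                    [30, 300],
                    [40, 400]], dtype=float)
X_new = np.array([[25, 250]], dtype=float)  # hypothetical unseen sample

# Fit on the training data only, then reuse the learned mean and std
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_new_scaled = scaler.transform(X_new)

# Manual equivalent: subtract the column mean, divide by the column std (ddof=0)
manual = (X_train - X_train.mean(axis=0)) / X_train.std(axis=0)

print("Learned means:", scaler.mean_)       # column means: 25 and 250
print(np.allclose(X_train_scaled, manual))  # True
print(X_new_scaled)                         # the new row sits at the mean, so it maps to zeros
```

Fitting the scaler on training data and only transforming later data keeps information from the test set out of the preprocessing step.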

## 3. Advanced Machine Learning <a id="ml-advanced"></a>

### K-Means Clustering
Unsupervised learning to group similar data points.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Sample dataset (2D points)
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# Create KMeans model (2 clusters)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)  # explicit n_init keeps behavior consistent across scikit-learn versions
kmeans.fit(X)

# Cluster centers and labels
print("Cluster Centers:\n", kmeans.cluster_centers_)
print("Labels:", kmeans.labels_)

# Visualization
plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_, cmap='viridis')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], 
            color='red', marker='X', s=200, label="Centroids")
plt.legend()
plt.title("KMeans Clustering")
plt.show()
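
One common way to judge the number of clusters is to compare the inertia (within-cluster sum of squared distances) for several values of `k`. A minimal sketch on the same six points; the `inertias` dictionary and the two query points are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# Inertia for k = 1, 2, 3: look for the point where improvement levels off
inertias = {}
for k in (1, 2, 3):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = km.inertia_
print(inertias)   # large drop from k=1 to k=2, small drop afterwards

# A fitted model can also assign new points to the nearest centroid
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)
new_labels = kmeans.predict(np.array([[0, 0], [12, 3]]))
print("Labels for new points:", new_labels)
```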

### Principal Component Analysis (PCA)
Reduce dimensionality while retaining as much of the data's variance as possible.

In [None]:
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt

# Load dataset
iris = load_iris()
X = iris.data

# PCA with 2 components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

print("Original Shape:", X.shape)
print("Reduced Shape:", X_pca.shape)

# Visualization
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=iris.target, cmap="viridis")
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.title("PCA on Iris Dataset")
plt.show()
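
How much variance the two components actually retain can be read from `explained_variance_ratio_`. The sketch below also maps the reduced data back to the original space with `inverse_transform` to show what is lost; `X_back` and `recon_error` are illustrative names.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
pca = PCA(n_components=2).fit(X)

ratio = pca.explained_variance_ratio_
print("Variance explained per component:", ratio)
print("Total variance retained:", ratio.sum())   # about 0.98 for iris

# Project to 2D and back to 4D; the difference is the information discarded
X_back = pca.inverse_transform(pca.transform(X))
recon_error = np.mean((X - X_back) ** 2)
print("Mean squared reconstruction error:", recon_error)
```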

### Hierarchical Clustering
An alternative clustering approach that builds a tree of clusters by repeatedly merging the closest pair.

In [None]:
from sklearn.cluster import AgglomerativeClustering
import numpy as np
import matplotlib.pyplot as plt

# Sample dataset
X = np.array([[1, 2], [2, 3], [3, 4],
              [8, 7], [9, 6], [10, 8]])

# Hierarchical Clustering (2 clusters)
clustering = AgglomerativeClustering(n_clusters=2)
labels = clustering.fit_predict(X)

print("Cluster Labels:", labels)

# Visualization
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='rainbow')
plt.title("Agglomerative (Hierarchical) Clustering")
plt.show()
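
The merge tree itself can be inspected with SciPy, which is installed as a scikit-learn dependency but is not listed in the prerequisites above, so treat its use here as an assumption. This sketch builds a Ward linkage matrix and cuts it into two clusters; passing `Z` to `scipy.cluster.hierarchy.dendrogram` would draw the tree.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

X = np.array([[1, 2], [2, 3], [3, 4],
              [8, 7], [9, 6], [10, 8]])

# Each row of Z records one merge: the two clusters joined, their distance,
# and the size of the new cluster
Z = linkage(X, method="ward")
print(Z.shape)   # (5, 4): n - 1 merges for n = 6 points

# Cut the tree so that at most 2 clusters remain (labels start at 1)
labels = fcluster(Z, t=2, criterion="maxclust")
print("Cluster labels:", labels)
```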

## Conclusion

This notebook covered fundamental NumPy operations and machine learning concepts:

### NumPy Fundamentals:
1. **Array Creation**: Various methods to create arrays (zeros, ones, ranges, linear spaces)
2. **Operations**: Element-wise arithmetic, vectorization benefits
3. **Indexing**: Accessing and modifying array elements
4. **Statistics**: Descriptive statistics and aggregate functions
5. **Random**: Random number generation for testing and simulation

### Machine Learning:
1. **Data Preparation**: Train-test splitting, feature scaling
2. **Supervised Learning**: Linear regression, logistic regression, decision trees
3. **Unsupervised Learning**: K-means clustering, PCA, hierarchical clustering
4. **Model Inspection**: Predictions, class probabilities, visualization

### Key Libraries:
- **NumPy**: Foundation for numerical computing in Python
- **scikit-learn**: Comprehensive machine learning library
- **Matplotlib**: Visualization and plotting

### Best Practices:
1. Always use train-test splits for proper evaluation
2. Scale features when using distance-based algorithms
3. Visualize data and results when possible
4. Use random seeds for reproducible results
5. Start with simple models before trying complex ones
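
Several of these practices can be combined in a few lines. The following is a minimal sketch, not a recommended recipe: it assumes the iris dataset, a fixed seed of 42, and logistic regression as the "simple model", with scaling bundled into a pipeline so it is fitted on the training split only.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Hold out 20% for evaluation; fixed seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# The pipeline fits the scaler on the training data, then the classifier
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)

accuracy = pipe.score(X_test, y_test)
print("Held-out accuracy:", accuracy)
```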

### Algorithm Selection Guide:
- **Linear Regression**: Continuous target, linear relationship
- **Logistic Regression**: Binary classification, interpretable
- **Decision Trees**: Non-linear relationships, interpretable rules
- **K-Means**: Spherical clusters, known number of clusters
- **PCA**: Dimensionality reduction, visualization
- **Hierarchical Clustering**: Unknown number of clusters, dendrograms

### Next Steps:
- Explore ensemble methods (Random Forest, Gradient Boosting)
- Learn neural networks and deep learning (TensorFlow, PyTorch)
- Study cross-validation and model selection techniques
- Practice with real-world datasets (Kaggle competitions)
- Understand bias-variance tradeoff and overfitting

### Applications:
- **Regression**: Price prediction, sales forecasting
- **Classification**: Image recognition, spam detection
- **Clustering**: Customer segmentation, market research
- **Dimensionality Reduction**: Data visualization, feature extraction
- **Preprocessing**: Data cleaning, feature engineering