# Introduction to Machine Learning
* Supervised vs. Unsupervised Learning.
* Linear Regression & Logistic Regression.
* Decision Trees & K-Means Clustering.
* Using Scikit-learn for model training.
* Hands-on: Building and evaluating a simple ML model.


## **1. Supervised vs. Unsupervised Learning**

### **Supervised Learning**
- In supervised learning, the model is trained on a labeled dataset, meaning that each training example is paired with an output label.
- The goal is to learn a mapping from inputs to outputs so that it can make predictions on new, unseen data.
- Examples:
  - **Regression:** Predicting continuous values (e.g., house prices, temperature).
  - **Classification:** Predicting discrete class labels (e.g., spam detection, disease diagnosis).


### **Unsupervised Learning**
- In unsupervised learning, the model is given data without explicit labels and must learn patterns and structure from the data itself.
- Common applications include clustering and dimensionality reduction.
- Examples:
  - **Clustering:** Grouping similar items together (e.g., customer segmentation, anomaly detection).
  - **Dimensionality Reduction:** Reducing the number of features while preserving as much information as possible (e.g., principal component analysis).
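The practical difference shows up in how a model is fitted: a supervised model receives both inputs and labels, while an unsupervised model receives inputs only. The sketch below illustrates this with scikit-learn. The toy data and the choice of `LogisticRegression` and `KMeans` are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: two well-separated groups of 1-D points.
X = np.array([[1.0], [1.5], [2.0], [8.0], [8.5], [9.0]])
y = np.array([0, 0, 0, 1, 1, 1])  # labels, used only by the supervised model

# Supervised: the model sees both the inputs X and the labels y.
clf = LogisticRegression()
clf.fit(X, y)
supervised_pred = clf.predict([[1.2], [8.8]])

# Unsupervised: the model sees only X and must find the groups itself.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
km.fit(X)
cluster_ids = km.labels_

print(supervised_pred)  # predicted class labels
print(cluster_ids)      # cluster index assigned to each point
```

Note that the cluster indices are arbitrary: K-Means may call the first group `0` or `1`, because it never sees the labels in `y`.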


---

## **2. Linear Regression & Logistic Regression**

### **Linear Regression**
- Used for predicting **continuous values** by fitting a linear relationship between input features and the target variable.
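- Equation (one feature): $y = mx + b$, where $m$ is the slope and $b$ is the intercept. With several features this generalizes to $y = w_1 x_1 + \dots + w_n x_n + b$.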
- Example: Predicting house prices based on size, number of rooms, etc.
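As a minimal sketch, the snippet below fits a line to a handful of made-up house sizes and prices. The numbers are hypothetical and were generated from `price = 3 * size + 50`, so the fitted slope and intercept can be checked against the known values of $m$ and $b$.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: house size in square metres -> price in thousands.
# Prices follow price = 3 * size + 50 exactly, so the fit is easy to check.
X = np.array([[50], [80], [100], [120], [150]])
y = 3 * X.ravel() + 50

model = LinearRegression()
model.fit(X, y)

slope = model.coef_[0]        # m in y = mx + b
intercept = model.intercept_  # b in y = mx + b
predicted = model.predict([[110]])[0]

print(f"m = {slope:.2f}, b = {intercept:.2f}")
print(f"Predicted price for 110 m^2: {predicted:.1f}")
```

Because the data is noise-free, the model should recover a slope of about 3 and an intercept of about 50. Real data will not fit this cleanly.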



---

### **Logistic Regression**
- Used for **classification tasks** (e.g., predicting whether an email is spam or not).
- Uses the **sigmoid function** to output probabilities between 0 and 1.
- Equation: $P(y=1) = \frac{1}{1 + e^{-z}}$, where $z = w \cdot x + b$ is a linear combination of the input features.
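The following sketch shows how the sigmoid connects to a fitted model. It trains scikit-learn's `LogisticRegression` on hypothetical "hours studied vs. passed" data, then computes $P(y=1)$ twice: once with `predict_proba` and once by hand, applying the sigmoid to the linear score $z$ returned by `decision_function`. The dataset and the helper `sigmoid` are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def sigmoid(z):
    """Map any real number z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical data: hours studied -> passed the exam (1) or not (0).
X = np.array([[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [4.5]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

# predict_proba returns one column per class; column 1 is P(y=1).
x_new = np.array([[2.5]])
p_sklearn = clf.predict_proba(x_new)[0, 1]

# The same probability computed by hand: z = w * x + b, then sigmoid(z).
z = clf.decision_function(x_new)[0]
p_manual = sigmoid(z)

print(f"P(y=1 | x=2.5) = {p_sklearn:.3f} (manual: {p_manual:.3f})")
```

The two values should agree, which confirms that for a binary problem the model's probability is the sigmoid of its linear score.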


---

## **3. Decision Trees & K-Means Clustering**

### **Decision Trees**
- A decision tree splits data into branches based on feature values to make predictions.
- Works well for both classification and regression tasks.
- Example: Predicting whether a loan applicant will default based on income and credit score.
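A small sketch of the loan-default example is shown below. The applicant data is invented for illustration, and it is constructed so that credit score alone separates the two classes. `export_text` prints the learned splits as readable rules.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical applicants: [annual income in thousands, credit score].
X = [[25, 580], [30, 600], [35, 620], [40, 700],
     [60, 640], [80, 720], [95, 750], [120, 780]]
y = [1, 1, 1, 0, 1, 0, 0, 0]  # 1 = defaulted, 0 = repaid

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# Print the learned splits as readable if/else rules.
rules = export_text(tree, feature_names=["income", "credit_score"])
print(rules)

# Classify two new applicants.
prediction = tree.predict([[50, 610], [90, 760]])
print(prediction)
```

On this toy data the tree only needs a single split on `credit_score`; with noisier real-world data it would typically combine several features and grow deeper.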





### **K-Means Clustering**
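- A clustering algorithm that groups data points into **K clusters** by assigning each point to the nearest cluster center (centroid) and then updating the centroids, repeating until the assignments stop changing.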
- Example: Grouping customers based on purchasing behavior.

#### **Implementation in Python**
```python
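from sklearn.cluster import KMeans
import numpy as np

# Six 2-D points forming two obvious groups (x = 1 and x = 10)
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

# Fix n_init and random_state so the result is reproducible
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(X)

print(kmeans.labels_)           # Cluster assignment for each point
print(kmeans.cluster_centers_)  # Coordinates of the two centroids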
```

---

## **4. Using Scikit-learn for Model Training**
- **Scikit-learn** is a popular Python library for machine learning.
- It provides easy-to-use functions for training models, making predictions, and evaluating performance.
- Common steps:
  1. **Load Data**
  2. **Preprocess Data**
  3. **Split Data (Training & Testing)**
  4. **Train Model**
  5. **Make Predictions**
  6. **Evaluate Model Performance**

#### **Example: Training a Simple Model**
```python
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Sample data
X = [[1], [2], [3], [4], [5]]
y = [0, 1, 0, 1, 0]

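# Split data into training and testing sets
# (20% of 5 samples leaves a single test sample)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model (random_state makes the forest reproducible)
model = RandomForestClassifier(random_state=42)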
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)
print(predictions)
```
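The example above goes straight from splitting to prediction and skips steps 2 (preprocessing) and 6 (evaluation). The sketch below walks through all six steps. It makes several assumptions for illustration: the data is synthetic (generated with `make_classification`), the preprocessing step is feature scaling with `StandardScaler`, and the model is swapped for `LogisticRegression`, which benefits from scaled inputs.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Load data (here: a synthetic dataset generated for illustration).
X, y = make_classification(n_samples=200, n_features=4, random_state=42)

# 3. Split before fitting anything, so the test set stays unseen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 2 + 4. Preprocess and train: the pipeline fits the scaler on the
# training data only, then fits the classifier on the scaled features.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# 5. Make predictions.
predictions = model.predict(X_test)

# 6. Evaluate model performance.
accuracy = accuracy_score(y_test, predictions)
print(f"Test accuracy: {accuracy:.2f}")
```

Wrapping the scaler and the model in one pipeline is a design choice that keeps test data out of the preprocessing step, since the scaler's statistics come from `X_train` alone.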

---

## **5. Hands-on: Building and Evaluating a Simple ML Model**

### **Task: Predict if a person will buy a product based on age and salary**
#### **Full Workflow: Load Data, Train, Predict, and Evaluate**
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Sample data
data = pd.DataFrame({
    'Age': [22, 25, 47, 52, 46, 56, 29, 31],
    'Salary': [20000, 25000, 47000, 52000, 46000, 56000, 29000, 31000],
    'Purchased': [0, 0, 1, 1, 1, 1, 0, 0]
})

# Define features and target
X = data[['Age', 'Salary']]
y = data['Purchased']

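# Split data (20% of 8 rows rounds up to 2 test rows)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model (random_state makes the forest reproducible)
model = RandomForestClassifier(random_state=42)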
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate model
accuracy = accuracy_score(y_test, y_pred)
print(f'Model Accuracy: {accuracy * 100:.2f}%')
```
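With `test_size=0.2` on eight rows, the test set above holds only two examples, so the reported accuracy can only be 0%, 50%, or 100%. One common way to get a steadier estimate on small data is k-fold cross-validation. The sketch below reuses the same sample data with `cross_val_score`; the choice of four folds is an assumption that suits this dataset (four rows per class).

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Same sample data as in the example above
data = pd.DataFrame({
    'Age': [22, 25, 47, 52, 46, 56, 29, 31],
    'Salary': [20000, 25000, 47000, 52000, 46000, 56000, 29000, 31000],
    'Purchased': [0, 0, 1, 1, 1, 1, 0, 0]
})
X = data[['Age', 'Salary']]
y = data['Purchased']

model = RandomForestClassifier(random_state=42)

# cv=4 trains four models, each tested on a different quarter of the data.
# For classifiers, the folds are stratified to keep the class balance.
scores = cross_val_score(model, X, y, cv=4)

print(scores)
print(f'Mean accuracy: {scores.mean() * 100:.2f}%')
```

Every row is used for testing exactly once, so the mean score reflects all eight examples rather than just two.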

---

### **Conclusion**
- We covered the basics of machine learning, including **supervised vs. unsupervised learning**, **linear & logistic regression**, **decision trees & clustering**, and using **Scikit-learn**.
- Finally, we built a simple ML model to **predict user behavior** and evaluated its performance.

🚀 **Next Steps:** Try modifying the dataset or using different ML models!

