<a href="https://colab.research.google.com/github/SyedAmeenG/AI-Placement-Practice-Notebook/blob/main/PLT_AIML_WEEK1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Decision Tree Implementation – Description

A **Decision Tree** is a supervised machine learning algorithm used for both **classification** and **regression**.  
It works by recursively splitting the dataset into smaller subsets based on feature values, forming a **tree-like structure** of decisions.

### 🔹 How It Works
1. At each step, the algorithm chooses the **best feature** to split on, based on a metric like:
   - **Gini Impurity**
   - **Entropy (Information Gain)**
   - **Mean Squared Error** (for regression)
2. The process continues until:
   - A stopping condition is reached (e.g., maximum depth).
   - The subset is “pure” (all samples belong to one class).
3. The final result is a tree where:
   - **Internal nodes** represent decisions on features.
   - **Branches** represent outcomes of those decisions.
   - **Leaves** represent final predictions.
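
The split metrics named above can be computed by hand. The following is a minimal sketch, not scikit-learn's internal implementation; the helper names (`gini`, `entropy`, `weighted_impurity`) and the toy labels are assumptions made for illustration. It compares a split that separates the classes perfectly with one that leaves them mixed.

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def weighted_impurity(left, right, measure):
    """Impurity after a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * measure(left) + len(right) / n * measure(right)

# Toy node with four samples of each class (hypothetical data)
parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])
perfect_left, perfect_right = parent[:4], parent[4:]
mixed_left, mixed_right = parent[[0, 1, 4, 5]], parent[[2, 3, 6, 7]]

print("Parent Gini:", gini(parent))        # 0.5
print("Parent entropy:", entropy(parent))  # 1.0
print("Perfect split Gini:", weighted_impurity(perfect_left, perfect_right, gini))  # 0.0
print("Mixed split Gini:", weighted_impurity(mixed_left, mixed_right, gini))        # 0.5
```

The tree-building algorithm prefers the split with the largest drop in impurity, so here it would pick the perfect split (0.5 → 0.0) over the mixed one (0.5 → 0.5).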

### 🔹 Advantages
- Easy to interpret and visualize.
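- Handles both numerical and categorical data (scikit-learn's implementation expects categorical features to be numerically encoded first).
- Requires little data preprocessing (no feature scaling or normalization is needed).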

### 🔹 Disadvantages
- Can easily **overfit** on training data.
- Small changes in data may produce a very different tree.
- Often less accurate than ensemble methods (e.g., Random Forests).

### 🔹 Applications
- Classification tasks (spam detection, medical diagnosis, customer churn).
- Regression tasks (predicting prices, demand forecasting).
- Rule-based decision systems.

---


In [2]:
# Import libraries
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

In [3]:
# Load sample dataset (Iris flower dataset)
iris = load_iris()
X, y = iris.data, iris.target

In [4]:
# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create Decision Tree model
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
clf.fit(X_train, y_train)


In [5]:
# Predict on test set
y_pred = clf.predict(X_test)

In [6]:
# Print accuracy
print("Accuracy:", accuracy_score(y_test, y_pred))


Accuracy: 1.0
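
A perfect score on a 45-sample test set is plausible for Iris, but a single train/test split can be optimistic. As a hedged sanity check, the sketch below runs 5-fold cross-validation with the same hyperparameters; the variable names `clf_cv` and `scores` are assumptions for this example, and no particular result is claimed.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# Same settings as the model above, evaluated on 5 different held-out folds
clf_cv = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
scores = cross_val_score(clf_cv, iris.data, iris.target, cv=5)

print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```

If the fold accuracies vary noticeably, the mean is a more trustworthy estimate than the single-split figure.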


In [7]:

# Show the rules of the decision tree
tree_rules = export_text(clf, feature_names=iris.feature_names)
print(tree_rules)

|--- petal length (cm) <= 2.45
|   |--- class: 0
|--- petal length (cm) >  2.45
|   |--- petal length (cm) <= 4.75
|   |   |--- petal width (cm) <= 1.60
|   |   |   |--- class: 1
|   |   |--- petal width (cm) >  1.60
|   |   |   |--- class: 2
|   |--- petal length (cm) >  4.75
|   |   |--- petal width (cm) <= 1.75
|   |   |   |--- class: 1
|   |   |--- petal width (cm) >  1.75
|   |   |   |--- class: 2



**Example 2: Manual Tiny Decision Tree**

In [8]:
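def classify(sample):
    """Hand-written decision tree: decide whether to play from weather and humidity."""
    if sample["weather"] == "sunny":
        if sample["humidity"] == "high":
            return "No"   # don't play
        else:
            return "Yes"  # play
    else:
        return "Yes"      # play in any non-sunny weather (e.g., cloudy, rainy)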

# Test cases
print(classify({"weather": "sunny", "humidity": "high"}))
print(classify({"weather": "rainy", "humidity": "normal"}))


No
Yes


# 🌲 Random Forest Implementation – Description

A **Random Forest** is an **ensemble learning algorithm** that builds multiple **Decision Trees** and combines their results to improve accuracy and reduce overfitting.  
It is commonly used for both **classification** and **regression** tasks.

---

### How It Works
1. **Bootstrap Sampling**:  
   - From the training data, random samples (with replacement) are taken to build each decision tree.  
   - Combined with the aggregation step below, this process is called **bagging** (Bootstrap Aggregating).

2. **Random Feature Selection**:  
   - At each split in a tree, only a random subset of features is considered.  
   - This adds diversity among trees.

3. **Aggregation**:  
   - For classification → uses **majority voting** from all trees.  
   - For regression → uses the **average prediction** of all trees.
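
The three steps above can be sketched by hand with plain decision trees. This is an illustrative approximation, not how `RandomForestClassifier` is implemented internally; the tree count (`n_trees = 25`) and the variable names are assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

rng = np.random.default_rng(0)
n_trees = 25  # illustrative choice
trees = []
for i in range(n_trees):
    # 1. Bootstrap sampling: draw row indices with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # 2. Random feature selection: max_features limits the features tried at each split
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree.fit(X_train[idx], y_train[idx])
    trees.append(tree)

# 3. Aggregation: majority vote across trees for every test sample
all_votes = np.stack([t.predict(X_test) for t in trees])  # shape (n_trees, n_test)
majority = np.array([np.bincount(col, minlength=3).argmax() for col in all_votes.T])

print("Hand-rolled forest accuracy:", (majority == y_test).mean())
```

Note that scikit-learn's `RandomForestClassifier` averages the trees' predicted class probabilities rather than counting hard votes, so its predictions can differ slightly from this sketch.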

---

### Advantages
- Usually more accurate and robust than a single decision tree.
- Reduces the risk of **overfitting** by averaging many decorrelated trees.
- Works well with both classification and regression problems.
- Handles high-dimensional data effectively.


### Disadvantages
- Less interpretable compared to a single decision tree.
- Can be computationally expensive with very large datasets.
- Predictions are slower because multiple trees are evaluated.



### Applications
- Credit risk analysis and fraud detection.
- Stock market and financial forecasting.
- Customer churn prediction.
- Medical diagnosis and image classification.




In [10]:
# Import libraries
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

In [11]:
# Load dataset (Iris dataset)
iris = load_iris()
X, y = iris.data, iris.target

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [12]:
# Create Random Forest model
rf = RandomForestClassifier(n_estimators=100, max_depth=3, random_state=42)
rf.fit(X_train, y_train)

In [13]:
# Make predictions
y_pred = rf.predict(X_test)

In [14]:
# Evaluate performance
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred, target_names=iris.target_names))

Accuracy: 1.0

Classification Report:
               precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        19
  versicolor       1.00      1.00      1.00        13
   virginica       1.00      1.00      1.00        13

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45



In [15]:
# Check feature importance
print("\nFeature Importances:")
for name, importance in zip(iris.feature_names, rf.feature_importances_):
    print(f"{name}: {importance:.4f}")



Feature Importances:
sepal length (cm): 0.1023
sepal width (cm): 0.0251
petal length (cm): 0.4331
petal width (cm): 0.4396
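
`feature_importances_` is impurity-based and is computed from the training data. As a complementary check, the sketch below uses permutation importance on the held-out test set: each feature is shuffled in turn and the drop in accuracy is measured. The model is refit here so the block is self-contained; `rf_check` and `n_repeats=10` are assumptions for the example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

rf_check = RandomForestClassifier(n_estimators=100, max_depth=3, random_state=42)
rf_check.fit(X_train, y_train)

# Shuffle each feature 10 times and record the mean drop in test accuracy
result = permutation_importance(rf_check, X_test, y_test, n_repeats=10, random_state=42)

for name, mean, std in zip(iris.feature_names, result.importances_mean, result.importances_std):
    print(f"{name}: {mean:.4f} +/- {std:.4f}")
```

When two features are strongly correlated, as the petal measurements are, shuffling one may change accuracy less than its impurity-based importance would suggest.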


# Support Vector Machine (SVM) – Description

A **Support Vector Machine (SVM)** is a supervised machine learning algorithm used for **classification** and **regression**.  
It works by finding the **best decision boundary (hyperplane)** that separates different classes in the feature space.


### How It Works
1. SVM tries to find the **maximum-margin hyperplane** that best separates classes.
2. The data points closest to the boundary are called **support vectors** – they define the position of the hyperplane.
3. For non-linear data, SVM uses the **kernel trick** to implicitly map data into a higher-dimensional space where it can be separated, without computing that mapping explicitly.
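
To make the margin and the support vectors concrete, here is a minimal sketch on a two-class, two-feature slice of Iris with a linear kernel. The choice of classes and features is an assumption made so the geometry stays simple; for a linear SVM the margin width is 2 / ||w||.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Keep setosa vs. versicolor and only the two petal features
mask = y < 2
X2, y2 = X[mask][:, 2:4], y[mask]

linear_svm = SVC(kernel="linear", C=1.0)
linear_svm.fit(X2, y2)

w = linear_svm.coef_[0]        # normal vector of the hyperplane
b = linear_svm.intercept_[0]   # offset of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)

print("Training samples:", len(X2))
print("Support vectors:", len(linear_svm.support_vectors_))
print("Margin width:", margin_width)
```

Only the support vectors influence `w` and `b`; removing any other training point and refitting would leave the boundary unchanged.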



### Kernel Types
- **Linear** → For linearly separable data.
- **Polynomial** → For more complex curved boundaries.
- **RBF (Radial Basis Function)** → The default in scikit-learn's `SVC`; a good first choice for many datasets.
- **Sigmoid** → Uses a tanh function, resembling a neural-network activation.


### Advantages
- Effective in **high-dimensional spaces**.
- Works well for both linear and non-linear problems.
- Robust against overfitting in many cases.

### Disadvantages
- Can be slow with very large datasets.
- Choosing the right kernel and parameters can be tricky.
- Less interpretable compared to decision trees.


### Key Notes (Implementation)
- `kernel`: Defines the type of decision boundary (`linear`, `poly`, `rbf`, `sigmoid`).
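- `C`: Regularization parameter (higher values penalize misclassified training points more heavily, fitting the training data more closely at the risk of overfitting).
- `gamma`: Controls how far the influence of a single training sample reaches (low → smooth boundary, high → complex boundary); used by the `rbf`, `poly`, and `sigmoid` kernels.
- `support_vectors_`: Attribute of a fitted scikit-learn model that holds the support vectors.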
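
Because good values of `C` and `gamma` depend on the dataset, they are usually tuned rather than guessed. The sketch below uses `GridSearchCV` for this; the grid values are arbitrary assumptions chosen for illustration, not recommended settings.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Hypothetical search grid; widen or refine it for a real problem
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1, 1]}

search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", search.best_score_)
print("Test accuracy:", search.score(X_test, y_test))
print("Support vectors in best model:", len(search.best_estimator_.support_vectors_))
```

The search is cross-validated on the training split only, so the test set remains an unbiased final check.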

### Applications
- Text classification (e.g., spam detection).
- Image recognition (e.g., handwritten digit classification).
- Bioinformatics (e.g., cancer detection).
- Finance (e.g., stock market prediction).




In [17]:
# Import libraries
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

In [18]:
# Load dataset (Iris dataset)
iris = load_iris()
X, y = iris.data, iris.target
# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [19]:
# Create SVM model (RBF kernel by default)
svm_model = SVC(kernel="rbf", C=1.0, gamma="scale", random_state=42)
svm_model.fit(X_train, y_train)

In [20]:
# Make predictions
y_pred = svm_model.predict(X_test)

In [22]:

# Evaluate performance
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\n Classification Report:\n", classification_report(y_test, y_pred, target_names=iris.target_names))

Accuracy: 1.0

 Classification Report:
               precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        19
  versicolor       1.00      1.00      1.00        13
   virginica       1.00      1.00      1.00        13

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45



# Naive Bayes – Description

**Naive Bayes** is a family of probabilistic algorithms based on **Bayes’ Theorem**.  
It assumes that features are **independent** (the “naive” assumption) and uses probabilities to predict class membership.  
It is widely used for **classification tasks**.


### How It Works
1. Uses **Bayes’ Theorem**:  
   $$
   P(\text{Class} \mid \text{Features}) = \frac{P(\text{Features} \mid \text{Class}) \cdot P(\text{Class})}{P(\text{Features})}
   $$
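2. Each feature is assumed to contribute independently given the class, so the likelihood P(Features | Class) is the product of the per-feature likelihoods.
3. The algorithm chooses the class with the **highest posterior probability**; since P(Features) is the same for every class, it can be ignored when comparing classes.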
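
The calculation above can be reproduced by hand for the Gaussian case. This is a minimal sketch under stated assumptions: per-class means and variances are estimated from the full Iris dataset, the helper name `posterior` is hypothetical, and the tiny variance smoothing that `GaussianNB` applies (`var_smoothing`) is ignored, so the numbers agree with scikit-learn only approximately.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
classes = np.unique(y)

# P(Class): relative class frequencies
priors = np.array([(y == c).mean() for c in classes])
# Parameters of P(feature | Class): one Gaussian per class and feature
means = np.array([X[y == c].mean(axis=0) for c in classes])
variances = np.array([X[y == c].var(axis=0) for c in classes])

def posterior(x):
    """Return P(Class | x) for every class using Bayes' theorem."""
    # Independence assumption: sum the per-feature log-likelihoods
    log_likelihood = -0.5 * np.sum(
        np.log(2 * np.pi * variances) + (x - means) ** 2 / variances, axis=1
    )
    log_joint = np.log(priors) + log_likelihood
    # Dividing by P(Features) is just normalization across classes
    log_joint -= log_joint.max()  # for numerical stability
    probs = np.exp(log_joint)
    return probs / probs.sum()

sample = X[100]
manual = posterior(sample)
reference = GaussianNB().fit(X, y).predict_proba([sample])[0]

print("Manual posterior:      ", manual)
print("GaussianNB posterior:  ", reference)
```

Working in log space avoids underflow when many small likelihoods are multiplied together.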



### Types of Naive Bayes
- **Gaussian Naive Bayes** → For continuous features (assumes normal distribution).  
- **Multinomial Naive Bayes** → For discrete counts (e.g., word frequencies in text).  
- **Bernoulli Naive Bayes** → For binary features (yes/no, true/false).  



### Advantages
- Very **fast and efficient**, even on large datasets.
- Works well with **text classification** (spam detection, sentiment analysis).
- Simple and interpretable.



### Disadvantages
- Strong independence assumption rarely holds in real-world data.
- Struggles when features are highly correlated.
- Continuous features require a distributional assumption (e.g., Gaussian).


### Key Notes (Implementation)
- In scikit-learn:
  - `GaussianNB` → continuous features.
  - `MultinomialNB` → word counts (e.g., NLP).
  - `BernoulliNB` → binary features.
- Often performs well on **high-dimensional data** (e.g., text, documents).
- Often used as a **baseline classifier**.


### Applications
- Email spam filtering.  
- Sentiment analysis (positive/negative reviews).  
- Document classification (topic labeling).  
- Medical diagnosis (probabilistic predictions).  



In [23]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report

In [25]:
# Load dataset (Iris dataset)
iris = load_iris()
X, y = iris.data, iris.target

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [26]:
# Create Naive Bayes model (GaussianNB for continuous data)
nb_model = GaussianNB()
nb_model.fit(X_train, y_train)

In [27]:
# Make predictions
y_pred = nb_model.predict(X_test)

In [28]:
# Evaluate performance
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred, target_names=iris.target_names))

Accuracy: 0.9777777777777777

Classification Report:
               precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        19
  versicolor       1.00      0.92      0.96        13
   virginica       0.93      1.00      0.96        13

    accuracy                           0.98        45
   macro avg       0.98      0.97      0.97        45
weighted avg       0.98      0.98      0.98        45



In [29]:
# Show class probabilities for first test sample
print("\nPredicted probabilities for first sample:", nb_model.predict_proba([X_test[0]]))


Predicted probabilities for first sample: [[4.15880005e-88 9.95527834e-01 4.47216606e-03]]


# Fuzzy Logic – Description

**Fuzzy Logic** is a form of logic used for reasoning under uncertainty.  
Unlike classical (binary) logic where variables are either **True (1)** or **False (0)**,  
fuzzy logic allows variables to take **continuous values between 0 and 1** (degrees of truth).  

It is widely used in **control systems, decision-making, and expert systems**.



### How It Works
1. **Fuzzification**  
   - Converts crisp input values (e.g., temperature = 30°C) into fuzzy sets with degrees of membership.  
   - Example: Temperature = 30°C → 0.7 “Warm”, 0.3 “Hot”.

2. **Rule Evaluation (Inference Engine)**  
   - Applies a set of IF-THEN rules to the fuzzy inputs.  
   - Example:  
     - IF temperature is **Hot** AND humidity is **High** THEN fan_speed = **High**.

3. **Aggregation**  
   - Combines results from multiple fuzzy rules.

4. **Defuzzification**  
   - Converts fuzzy output back into a crisp value.  
   - Example: Fan speed = 70%.
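
The first two steps can be sketched in plain Python. The helper `triangular` and the crisp inputs (32°C, 80% humidity) are assumptions for illustration; the set shapes are borrowed from the scikit-fuzzy cells in this notebook, so the resulting degrees differ from the illustrative 0.7/0.3 figures in the list above.

```python
def triangular(x, a, b, c):
    """Degree of membership of x in a triangular fuzzy set with feet a, c and peak b."""
    if x <= a or x >= c:
        # Shoulder sets (a == b or b == c) keep full membership at the peak
        return 1.0 if x == b else 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

temperature, humidity = 32.0, 80.0  # hypothetical crisp inputs

# 1. Fuzzification: crisp value -> degree of membership in each fuzzy set
warm = triangular(temperature, 15, 25, 35)
hot = triangular(temperature, 30, 40, 40)
high = triangular(humidity, 60, 100, 100)

# 2. Rule evaluation: AND is commonly taken as the minimum of the memberships
fast_strength = min(hot, high)

print(f"warm={warm:.2f}, hot={hot:.2f}, high={high:.2f}")  # warm=0.30, hot=0.20, high=0.50
print(f"Firing strength of 'hot AND high -> fast': {fast_strength:.2f}")  # 0.20
```

Using the minimum for AND is one common convention; other operators, such as the product, are also used.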

### Advantages
- Handles **uncertainty and vagueness** well.  
- Mimics **human reasoning** (“warm”, “tall”, “fast”).  
- Useful in **control systems** (washing machines, air conditioners, robotics).  


### Disadvantages
- Rule-based → requires **expert knowledge** to design rules.  
- Performance depends on **quality of membership functions**.  
- Not as scalable as modern ML for large datasets.  


### Key Notes
- Based on **degrees of truth** rather than strict binary logic.  
- Core components: **Fuzzifier, Inference Engine, Rule Base, Defuzzifier**.  
- Common in **engineering, decision support, and consumer electronics**.  


### Applications
- Washing machines (adjusting wash cycle).  
- Air conditioning (temperature control).  
- Robotics (navigation and obstacle avoidance).  
- Medical diagnosis (handling uncertain symptoms).  




# Fuzzy Logic – Step-by-Step Explanation

Fuzzy Logic is a way to make decisions with **uncertain or vague information**, unlike classical logic which is strictly True or False.  
Here’s how it works in a typical fuzzy logic system:

### Step 1: Fuzzification
- Convert **crisp inputs** (exact numbers) into **fuzzy values**.  
- Example: Temperature = 30°C might be partially “Warm” (0.7) and partially “Hot” (0.3); the exact degrees depend on how the membership functions are defined.  
- Each fuzzy set is described by a **membership function** that maps an input value to a degree of truth (0 to 1).


### Step 2: Define Fuzzy Rules
- Create **IF-THEN rules** to describe human reasoning.  
- Example:  
  1. IF temperature is **Hot** AND humidity is **High**, THEN fan speed is **Fast**.  
  2. IF temperature is **Cold**, THEN fan speed is **Slow**.  
- Rules form the **core decision logic** of the system.


### Step 3: Inference / Rule Evaluation
- Apply all rules to the fuzzy inputs.  
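- Determine how strongly each rule **fires** based on the degrees of membership of its inputs (for an AND rule, commonly the minimum of those degrees).  
- Each fired rule contributes its output fuzzy set, limited by that firing strength.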


### Step 4: Aggregation
- Combine the outputs from all rules into a **single fuzzy set**.  
- The resulting set represents all possible outcomes and their degrees of truth.


### Step 5: Defuzzification
- Convert the aggregated fuzzy output into a **crisp number** that can be acted on, commonly by taking the centroid of the aggregated set.  
- Example: Fan speed = 6.8 (on a scale of 0–10).  
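
Steps 4 and 5 can be sketched with NumPy alone. The helper `trimf` below is a hand-written stand-in, not the scikit-fuzzy function, and the centroid is a simple discrete approximation. The firing strengths were worked out by hand from the membership functions defined later in this notebook for temperature = 30 and humidity = 70: "warm" is 0.5 and "high" is 0.25, so only the "warm AND high → medium" rule fires, at strength 0.25.

```python
import numpy as np

def trimf(x, a, b, c):
    """Vectorized triangular membership function with feet a, c and peak b."""
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    if a != b:
        rising = (a < x) & (x < b)
        y[rising] = (x[rising] - a) / (b - a)
    if b != c:
        falling = (b < x) & (x < c)
        y[falling] = (c - x[falling]) / (c - b)
    y[x == b] = 1.0
    return y

speed = np.arange(0, 11, 1)  # output universe: 0 to 10
slow = trimf(speed, 0, 0, 5)
medium = trimf(speed, 3, 5, 7)
fast = trimf(speed, 5, 10, 10)

# Firing strength per output set (derived by hand for 30 degrees C, 70% humidity)
strength = {"slow": 0.0, "medium": 0.25, "fast": 0.0}

# Step 4: clip each output set at its firing strength, then combine with max
aggregated = np.maximum.reduce([
    np.minimum(strength["slow"], slow),
    np.minimum(strength["medium"], medium),
    np.minimum(strength["fast"], fast),
])

# Step 5: centroid defuzzification (discrete approximation)
crisp = np.sum(speed * aggregated) / np.sum(aggregated)
print(f"Crisp fan speed: {crisp:.2f}")  # 5.00
```

Because only the symmetric "medium" set is active, the centroid sits at its peak, which is consistent with the 5.00 that the scikit-fuzzy simulation in this notebook prints for the same inputs.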



### Key Points
- Handles **uncertainty** and **gradual reasoning**.  
- Useful in **control systems**, **decision-making**, and **expert systems**.  
- Mimics **human thinking** instead of strict yes/no rules.  
- Main components: **Fuzzifier, Rule Base, Inference Engine, Defuzzifier**.  


### Applications
- Air conditioning and temperature control.  
- Washing machines (auto wash cycles).  
- Robotics (navigation and obstacle avoidance).  
- Medical diagnosis (handling uncertain symptoms).  


In [31]:
# Install scikit-fuzzy
!pip install -q scikit-fuzzy

In [32]:
import numpy as np
import skfuzzy as fuzz
from skfuzzy import control as ctrl

In [33]:
# 1. Define fuzzy variables
# Inputs
temperature = ctrl.Antecedent(np.arange(0, 41, 1), 'temperature')  # 0°C to 40°C
humidity = ctrl.Antecedent(np.arange(0, 101, 1), 'humidity')       # 0% to 100%

In [34]:
# Output
fan_speed = ctrl.Consequent(np.arange(0, 11, 1), 'fan_speed')      # 0 to 10

In [35]:
# 2. Define membership functions
temperature['cold'] = fuzz.trimf(temperature.universe, [0, 0, 20])
temperature['warm'] = fuzz.trimf(temperature.universe, [15, 25, 35])
temperature['hot'] = fuzz.trimf(temperature.universe, [30, 40, 40])

In [36]:
humidity['low'] = fuzz.trimf(humidity.universe, [0, 0, 50])
humidity['medium'] = fuzz.trimf(humidity.universe, [30, 50, 70])
humidity['high'] = fuzz.trimf(humidity.universe, [60, 100, 100])

In [37]:
fan_speed['slow'] = fuzz.trimf(fan_speed.universe, [0, 0, 5])
fan_speed['medium'] = fuzz.trimf(fan_speed.universe, [3, 5, 7])
fan_speed['fast'] = fuzz.trimf(fan_speed.universe, [5, 10, 10])

In [38]:
# 3. Define fuzzy rules
rule1 = ctrl.Rule(temperature['hot'] & humidity['high'], fan_speed['fast'])
rule2 = ctrl.Rule(temperature['hot'] & humidity['medium'], fan_speed['fast'])
rule3 = ctrl.Rule(temperature['warm'] & humidity['high'], fan_speed['medium'])
rule4 = ctrl.Rule(temperature['warm'] & humidity['medium'], fan_speed['medium'])
rule5 = ctrl.Rule(temperature['cold'], fan_speed['slow'])

In [39]:
# 4. Create control system
fan_ctrl = ctrl.ControlSystem([rule1, rule2, rule3, rule4, rule5])
fan_sim = ctrl.ControlSystemSimulation(fan_ctrl)

In [40]:
# 5. Test the system
fan_sim.input['temperature'] = 30  # Example temperature
fan_sim.input['humidity'] = 70     # Example humidity
fan_sim.compute()

In [41]:
print(f"✅ For temperature = 30°C and humidity = 70%, predicted fan speed = {fan_sim.output['fan_speed']:.2f}")

✅ For temperature = 30°C and humidity = 70%, predicted fan speed = 5.00
