In traditional programming, we write explicit rules using constructs like if/else to make decisions. For example, you might write:

In [7]:
def classify_temperature(temp):
    if temp > 30:
        return "Hot"
    elif temp < 10:
        return "Cold"
    else:
        return "Moderate"

print(classify_temperature(35))  # "Hot"

Hot


Here, we manually define thresholds and conditions that determine the outcome. This works well for problems where rules are simple and well understood.

#### From If/Else to Machine Learning
Machine Learning (ML) represents a shift from explicitly programmed rules to systems that learn decision boundaries from data. Instead of hard-coding rules, you provide examples and let the algorithm find patterns.
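
As a minimal sketch of that shift, the temperature rule above can be treated as a source of labeled examples rather than as code. The snippet below trains a decision tree on those examples and inspects the boundaries it finds. The integer range 0 to 40 and the helper name `label_temperature` are illustrative assumptions, not part of the original example.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def label_temperature(temp):
    """Hand-written rule, used here only to generate labeled examples."""
    if temp > 30:
        return "Hot"
    elif temp < 10:
        return "Cold"
    return "Moderate"


# Assumed training data: integer temperatures from 0 to 40 degrees.
temps = np.arange(0, 41).reshape(-1, 1)
labels = np.array([label_temperature(t) for t in temps.ravel()])

# The tree sees only (temperature, label) pairs, never the rule itself.
tree = DecisionTreeClassifier(random_state=0)
tree.fit(temps, labels)

# Internal nodes have a feature index >= 0; leaves are marked with -2.
learned_thresholds = sorted(
    float(t) for t in tree.tree_.threshold[tree.tree_.feature >= 0]
)
print("Learned thresholds:", learned_thresholds)
print("Prediction for 35:", tree.predict([[35]])[0])
```

With this data the learned boundaries should sit between 9 and 10 and between 30 and 31, so the tree recovers the hand-written rule from examples alone.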

#### Key Differences:
**Deterministic Rules vs. Learned Patterns:**

**If/Else:** The decision is fixed by the programmer.

**ML:** The model infers a decision boundary from data (which might be non-linear and complex).

**Flexibility:**

**If/Else:** Adding more conditions often means manually updating rules.

**ML:** More data can be used to update or retrain the model, and it can adapt to complex patterns.

**Handling Complexity:**

**If/Else:** Becomes cumbersome when the number of conditions or features increases.

**ML:** Algorithms (e.g., decision trees, neural networks) automatically learn how to weight and combine multiple features.

**A Simple Example: Predicting Categories**

**Using If/Else:**
Imagine we want to classify fruits as "Apple" or "Orange" based on features like weight and texture. With if/else, you might have:

In [8]:
def classify_fruit(weight, texture):
    if weight < 150 and texture == "smooth":
        return "Apple"
    else:
        return "Orange"

print(classify_fruit(120, "smooth"))  # "Apple"

Apple


Here, you assume that every apple weighs less than 150 g and is smooth, and that anything else is an orange. Real-world data is rarely that clean.

**Using Machine Learning (Decision Tree):**

With machine learning, you would:

**1. Collect Data:** Gather examples of fruits with labels (Apple, Orange) along with their features.

**2. Train a Model:** Let a decision tree or another classifier learn the decision boundaries from the data.

**3. Predict:** Use the trained model to classify new examples.

In [9]:
from sklearn.tree import DecisionTreeClassifier
import numpy as np


# Example dataset: [weight (grams), texture (0 for smooth, 1 for bumpy)]
X = np.array([
    [130, 0],  # Orange
    [150, 0],  # Orange
    [170, 1],  # Apple
    [160, 1]   # Apple
])
y = np.array(["Orange", "Orange", "Apple", "Apple"]) 

# Create and train the decision tree model
clf = DecisionTreeClassifier()
clf.fit(X, y)

# Predict on a new fruit
new_fruit = np.array([[140, 0]])
print("Predicted fruit:", clf.predict(new_fruit)[0])

Predicted fruit: Orange


Here, the decision tree picks its own thresholds from the data. Because either feature separates these four samples perfectly, it needs only one split, for example "if weight > 155 then Apple, otherwise Orange" (155 being the midpoint between the heaviest orange and the lightest apple in the training data). Notice that the rule isn't written by a programmer but is derived from the data.
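
To see which rule the tree actually learned rather than guess at it, scikit-learn's `export_text` prints a fitted tree as nested conditions. The sketch below refits the same four-sample dataset. The feature names passed to `export_text` and the fixed `random_state` are assumptions added for readability and reproducibility.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Same toy dataset as above: [weight (grams), texture (0 = smooth, 1 = bumpy)]
X = np.array([[130, 0], [150, 0], [170, 1], [160, 1]])
y = np.array(["Orange", "Orange", "Apple", "Apple"])

# random_state is fixed only so repeated runs break ties the same way.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# Render the learned splits as human-readable if/else-style rules.
rules = export_text(clf, feature_names=["weight", "texture"])
print(rules)
print("Tree depth:", clf.get_depth())
```

Because either feature separates these four samples on its own, the printed tree should contain a single split, on weight or on texture depending on tie-breaking.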

<u>**Evolution in a Nutshell:**</u>

<u>**If/Else:**</u>

**1. Explicit:** You define all the conditions.

**2. Limited to simple scenarios:** Becomes hard to read and maintain as the number of rules grows.

**3. Static:** Rules don't change unless manually updated.

<u>**Machine Learning:**</u>

**1. Implicit:** The algorithm learns rules from the data.

**2. Handles complexity:** Can model non-linear and high-dimensional decision boundaries.

**3. Dynamic:** Models can be retrained as new data becomes available.
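
The "Dynamic" point can be illustrated with a small retraining sketch. The weights below are made-up values chosen for illustration (a single feature, weight in grams). The model is first fit on an initial batch, then refit on the initial batch plus newly collected samples, and its prediction for the same fruit changes without any rule being edited by hand.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Initial (hypothetical) batch: one feature, weight in grams.
X_initial = np.array([[120], [130], [160], [170]])
y_initial = np.array(["Orange", "Orange", "Apple", "Apple"])

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_initial, y_initial)
before = clf.predict([[140]])[0]

# New (hypothetical) samples arrive: some apples are lighter than expected.
X_new = np.array([[138], [142]])
y_new = np.array(["Apple", "Apple"])

# Retrain from scratch on all data collected so far.
X_all = np.vstack([X_initial, X_new])
y_all = np.concatenate([y_initial, y_new])
clf.fit(X_all, y_all)
after = clf.predict([[140]])[0]

print("Before retraining:", before)
print("After retraining:", after)
```

The equivalent change in the if/else version would mean editing the weight threshold by hand each time the data shifts.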

# **_Fruit Classification with Five Features_**

Suppose we have the following features for each fruit:

1. **Weight (grams)**
2. **Color Score** (a numeric value representing redness/brightness)
3. **Texture Score** (smoothness vs. roughness, scaled 0–1)
4. **Diameter (cm)**
5. **Sugar Level** (grams per 100g)

We build a dataset with these five features and let a decision tree learn how to classify the fruit.

In [10]:
from sklearn.tree import DecisionTreeClassifier
import numpy as np

# Example dataset:
# Each row corresponds to a fruit sample with five features:
# [weight, color score, texture score, diameter, sugar level]
X = np.array([
    [165, 0.8, 0.2, 7.0, 14],   # Likely an Apple
    [160, 0.75, 0.25, 7.2, 15],  # Likely an Apple
    [140, 0.6, 0.7, 8.0, 20],    # Likely an Orange
    [130, 0.65, 0.6, 8.5, 22],   # Likely an Orange
    [170, 0.85, 0.1, 6.5, 13],   # Likely an Apple
    [145, 0.55, 0.8, 8.8, 23]    # Likely an Orange
])

# Labels for the samples
y = np.array(["Apple", "Apple", "Orange", "Orange", "Apple", "Orange"])

# Create and train the decision tree classifier
clf = DecisionTreeClassifier(random_state=76)
clf.fit(X, y)

# Predict on a new fruit sample with five features
# New fruit: [weight, color score, texture score, diameter, sugar level]
new_fruit = np.array([[155, 0.8, 0.2, 7.0, 14]])
prediction = clf.predict(new_fruit)

print("Predicted fruit:", prediction[0])


Predicted fruit: Apple


In [11]:
def classify_fruit(weight, color_score, texture_score, diameter, sugar_level):
    """
    Classify a fruit as 'Apple' or 'Orange' based on five features.
    
    Parameters:
    - weight: Weight in grams.
    - color_score: Numeric score indicating color intensity (higher is redder/brighter).
    - texture_score: Score representing smoothness (lower means smoother).
    - diameter: Diameter in centimeters.
    - sugar_level: Sugar content in grams per 100g.
    
    Returns:
    - A string: "Apple" or "Orange"
    """
    if (weight > 155 and 
        color_score > 0.7 and 
        texture_score < 0.4 and 
        diameter < 7.5 and 
        sugar_level < 16):
        return "Apple"
    else:
        return "Orange"

# Test cases:
sample_fruit1 = classify_fruit(170, 0.8, 0.2, 7.0, 14)  # Expected: "Apple"
sample_fruit2 = classify_fruit(150, 0.6, 0.7, 8.0, 20)  # Expected: "Orange"

print("Sample Fruit 1:", sample_fruit1)
print("Sample Fruit 2:", sample_fruit2)


Sample Fruit 1: Apple
Sample Fruit 2: Orange


# **_Explanation:_**

<u>**Manual If/Else Approach:**</u>  
To get the same behavior with if/else, you have to hard-code a condition for each of the five features (e.g., weight > threshold and color score > threshold, and so on), as in the `classify_fruit` function above. This becomes complex and brittle as the number of features increases.

<u>**Machine Learning Approach:**</u>  
The decision tree learns its thresholds and splits in the five-dimensional feature space from the provided training data, choosing each split greedily to best separate the classes. It then uses these learned rules to classify new fruit samples without manually defined if/else conditions.
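
One way to check what the five-feature tree actually relies on is to inspect its `feature_importances_` attribute and its depth. The sketch below refits the six-sample dataset from the earlier cell. The `feature_names` list is an assumption added only to label the output.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Assumed labels for the five columns, used only for printing.
feature_names = ["weight", "color_score", "texture_score", "diameter", "sugar_level"]

# Same six samples as in the five-feature example above.
X = np.array([
    [165, 0.8, 0.2, 7.0, 14],
    [160, 0.75, 0.25, 7.2, 15],
    [140, 0.6, 0.7, 8.0, 20],
    [130, 0.65, 0.6, 8.5, 22],
    [170, 0.85, 0.1, 6.5, 13],
    [145, 0.55, 0.8, 8.8, 23],
])
y = np.array(["Apple", "Apple", "Orange", "Orange", "Apple", "Orange"])

clf = DecisionTreeClassifier(random_state=76)
clf.fit(X, y)

# Importance is the share of impurity reduction credited to each feature.
for name, importance in zip(feature_names, clf.feature_importances_):
    print(f"{name}: {importance:.2f}")
print("Tree depth:", clf.get_depth())
```

On a dataset this small and cleanly separated, every feature splits the classes perfectly on its own, so the tree should need only one split and credit a single feature with all of the importance. A larger, noisier dataset is needed before the tree has any reason to combine features.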

**Advantages:**

1. **Flexibility:** The decision tree can capture non-linear boundaries between classes.

2. **Scalability:** As the number of features increases, manually creating if/else rules becomes impractical, whereas ML models handle many features efficiently.

3. **Adaptability:** The model can be retrained with new data to improve accuracy without re-writing the rules.

This example demonstrates how machine learning evolves the basic if/else paradigm into a data-driven approach that can handle multiple features and complex decision boundaries.

The next cell scales the idea up: it generates a synthetic dataset of 100 apples and 100 oranges, trains a random forest on 80% of it, and evaluates the model on the held-out 20%.

In [12]:
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.preprocessing import StandardScaler

# Seed for reproducibility
np.random.seed(7)

# Generate synthetic data for 100 apples
apples = np.column_stack([
    np.random.normal(170, 5, 100),       # Weight around 170 grams
    np.random.normal(0.8, 0.05, 100),    # Color Score high
    np.random.normal(0.2, 0.05, 100),    # Texture Score low
    np.random.normal(7.0, 0.2, 100),     # Diameter around 7.0 cm
    np.random.normal(14, 1, 100)         # Sugar Level lower
])

# Generate synthetic data for 100 oranges
oranges = np.column_stack([
    np.random.normal(150, 5, 100),       # Weight around 150 grams
    np.random.normal(0.6, 0.05, 100),    # Color Score lower
    np.random.normal(0.7, 0.05, 100),    # Texture Score higher
    np.random.normal(8.0, 0.2, 100),     # Diameter around 8.0 cm
    np.random.normal(20, 1, 100)         # Sugar Level higher
])

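# Combine the data and create labels
X = np.vstack([apples, oranges])
y = np.array(["Apple"] * 100 + ["Orange"] * 100)

# Split the dataset into training and test sets before scaling
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=76, shuffle=True)

# Standardize the features, fitting the scaler on the training data only
# so that test-set statistics do not leak into preprocessing.
# (Tree-based models do not require scaling; it is kept here for illustration.)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Create and train a RandomForestClassifier
clf = RandomForestClassifier(n_estimators=100, random_state=76)
clf.fit(X_train, y_train)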

# Evaluate the model on the test set
y_pred = clf.predict(X_test)
print("Classification Report:")
print(classification_report(y_test, y_pred))

# Predict on a new fruit sample
# New fruit features: [weight, color score, texture score, diameter, sugar level]
apple_sample = np.array([[170, 0.82, 0.22, 7.1, 14]])
apple_sample_scaled = scaler.transform(apple_sample)  # Scale the new sample
print("Prediction for apple sample:", clf.predict(apple_sample_scaled)[0])

# Sample features for an orange: [weight, color score, texture score, diameter, sugar level]
orange_sample = np.array([[155, 0.65, 0.75, 8.2, 19]])  # Example values for an orange

# Scale the sample using the same scaler used for training
orange_sample_scaled = scaler.transform(orange_sample)

# Predict the class of the sample
orange_prediction = clf.predict(orange_sample_scaled)
print("Prediction for the orange sample:", orange_prediction[0])


Classification Report:
              precision    recall  f1-score   support

       Apple       1.00      1.00      1.00        23
      Orange       1.00      1.00      1.00        17

    accuracy                           1.00        40
   macro avg       1.00      1.00      1.00        40
weighted avg       1.00      1.00      1.00        40

Prediction for apple sample: Apple
Prediction for the orange sample: Orange
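
For comparison, the hand-written five-condition rule from the earlier cell can be scored on data drawn from the same distributions. This is only a sketch: the random seed, the sample count, and the use of `np.random.default_rng` are assumptions, so the numbers will not match the report above exactly.

```python
import numpy as np
from sklearn.metrics import accuracy_score


def classify_fruit(weight, color_score, texture_score, diameter, sugar_level):
    """Hand-written rule copied from the earlier if/else example."""
    if (weight > 155 and color_score > 0.7 and texture_score < 0.4
            and diameter < 7.5 and sugar_level < 16):
        return "Apple"
    return "Orange"


rng = np.random.default_rng(7)  # assumed seed
n = 100  # assumed number of samples per class

# Means and standard deviations mirror the synthetic-data cell above.
means_apple = [170, 0.8, 0.2, 7.0, 14]
means_orange = [150, 0.6, 0.7, 8.0, 20]
stds = [5, 0.05, 0.05, 0.2, 1]

apples = rng.normal(means_apple, stds, size=(n, 5))
oranges = rng.normal(means_orange, stds, size=(n, 5))
X = np.vstack([apples, oranges])
y_true = np.array(["Apple"] * n + ["Orange"] * n)

# Apply the fixed rule row by row and score it like any other classifier.
y_rule = np.array([classify_fruit(*row) for row in X])
rule_accuracy = accuracy_score(y_true, y_rule)
print(f"Hand-written rule accuracy: {rule_accuracy:.2f}")
```

Any apple that falls outside even one of the five fixed thresholds is labeled an orange, so the rule can lose accuracy on noisy samples, and improving it means adjusting thresholds by hand rather than retraining.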
