## Question 1 (50 pts)

Using the perceptron learning algorithm, train a perceptron for the binary classification problem defined by the given dataset.  
Use the following requirements:

- Activation function: **Sigmoid**
- Initial weights: **w1 = 0.5**, **w2 = -0.3**
- Initial bias: **0.2**
- Learning rate: **0.1**
- Number of iterations: **100**

Train the perceptron and report the final weights and bias.

### Answer:

In [1]:
import numpy as np

# Given dataset
X = np.array([[2.5, 2.3],
              [1.3, 1.9],
              [3.1, 2.8],
              [6.5, 7.2],
              [7.1, 6.8],
              [8.2, 7.5]])

y = np.array([0, 0, 0, 1, 1, 1])  # Target values

# Initialize parameters
w = np.array([0.5, -0.3])   # Weights
bias = 0.2                   # Bias term
learning_rate = 0.1
epochs = 100

# Sigmoid activation
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Training the perceptron
for epoch in range(epochs):
    for i in range(len(X)):
        z = np.dot(w, X[i]) + bias
        y_pred = sigmoid(z)
        error = y[i] - y_pred

        # Delta-rule update: shift weights and bias in proportion to the prediction error
        w += learning_rate * error * X[i]
        bias += learning_rate * error

print("Final Weights:", w)
print("Final Bias:", bias)


Final Weights: [0.06235377 1.20316401]
Final Bias: -5.05487101516288
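
For reference, the per-sample update performed in the training loop can be written out as follows. This restates the code above rather than adding a new method; $\eta$ stands for `learning_rate` and $\sigma$ for the sigmoid. Note that the update omits the sigmoid-derivative factor, which makes it the same step as stochastic gradient descent on the cross-entropy loss.

```latex
\begin{aligned}
z_i &= w_1 x_{i1} + w_2 x_{i2} + b, \\
\hat{y}_i &= \sigma(z_i) = \frac{1}{1 + e^{-z_i}}, \\
w_j &\leftarrow w_j + \eta\,(y_i - \hat{y}_i)\,x_{ij}, \quad j = 1, 2, \\
b &\leftarrow b + \eta\,(y_i - \hat{y}_i).
\end{aligned}
```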


### Final Result

After 100 epochs, training produced the following parameters:

- **Final Weights:** `[0.06235377 1.20316401]`  
- **Final Bias:** `-5.05487101516288`

These learned parameters define the decision boundary `w1*x1 + w2*x2 + bias = 0`, the line on which the sigmoid output equals 0.5, which separates the two classes in the dataset.
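
As a quick sanity check, the reported parameters can be plugged back into the model to classify the six training points. This is a minimal sketch: the 0.5 threshold on the sigmoid output is an assumption (the question does not state a decision threshold), and the weights are copied from the printed output rather than recomputed.

```python
import numpy as np

# Dataset and labels copied from the training cell
X = np.array([[2.5, 2.3], [1.3, 1.9], [3.1, 2.8],
              [6.5, 7.2], [7.1, 6.8], [8.2, 7.5]])
y = np.array([0, 0, 0, 1, 1, 1])

# Final parameters as reported in the output above
w = np.array([0.06235377, 1.20316401])
bias = -5.05487101516288

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Assumption: predict class 1 when the sigmoid output is at least 0.5,
# which is equivalent to z >= 0
probs = sigmoid(X @ w + bias)
preds = (probs >= 0.5).astype(int)

print("Predictions:", preds)
print("Training accuracy:", (preds == y).mean())
```

With these parameters, all six training points fall on the correct side of the boundary.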

## Question 2 (50 pts)

Using the dataset **sank.csv**, compare the performance of the following classifiers:

- Logistic Regression  
- Naive Bayes (Gaussian)  
- Naive Bayes (Multinomial)  
- Decision Tree Classifier  

You must evaluate them using:
- **Accuracy**
- **Precision**
- **Recall**
- **F1-score**

Finally, choose the best algorithm and explain your reasoning.


### Answer:

In [2]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Load dataset
data = pd.read_csv("sank.csv")

# Fix missing values
data["Age"] = data["Age"].fillna(data["Age"].median())

# Label encoding for categorical columns
encoder = LabelEncoder()
data["Sex"] = encoder.fit_transform(data["Sex"])

# Features & target
X = data[["Class", "Sex", "Age", "Fare"]]
y = data["Alive"]

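# Scaling (helps logistic regression converge; Gaussian NB and the decision tree do not need it)
# Note: the scaler is fit on all rows before the split, so test-set statistics leak into training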
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Split into training/testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42
)

# Models to test
models = {
    "Logistic Regression": LogisticRegression(),
    "Naive Bayes (Gaussian)": GaussianNB(),
    "Naive Bayes (Multinomial)": MultinomialNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=42)
}

results = {}

for name, model in models.items():

    # Multinomial NB requires non-negative values, so shift by the global training minimum
    if name == "Naive Bayes (Multinomial)":
        model.fit(X_train - X_train.min(), y_train)
        y_pred = model.predict(X_test - X_train.min())

    else:
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)

    results[name] = {
        "Accuracy": accuracy_score(y_test, y_pred),
        "Precision": precision_score(y_test, y_pred),
        "Recall": recall_score(y_test, y_pred),
        "F1-score": f1_score(y_test, y_pred)
    }

# Display results
results_df = pd.DataFrame(results).T
results_df

| Model | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|
| Logistic Regression | 0.793296 | 0.768116 | 0.716216 | 0.741259 |
| Naive Bayes (Gaussian) | 0.759777 | 0.706667 | 0.716216 | 0.711409 |
| Naive Bayes (Multinomial) | 0.709497 | 0.866667 | 0.351351 | 0.5 |
| Decision Tree | 0.748603 | 0.693333 | 0.702703 | 0.697987 |
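
The F1 column can be cross-checked against the precision and recall columns, since F1 is their harmonic mean. The sketch below uses only the numbers reported above; the helper name `f1_from` is hypothetical and chosen for illustration.

```python
# (precision, recall, reported F1) copied from the results above
reported = {
    "Logistic Regression": (0.768116, 0.716216, 0.741259),
    "Naive Bayes (Gaussian)": (0.706667, 0.716216, 0.711409),
    "Naive Bayes (Multinomial)": (0.866667, 0.351351, 0.5),
    "Decision Tree": (0.693333, 0.702703, 0.697987),
}

def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Recompute F1 for every model from its precision and recall
recomputed = {name: f1_from(p, r) for name, (p, r, _) in reported.items()}

for name, (p, r, f1) in reported.items():
    print(f"{name}: reported F1 = {f1:.4f}, recomputed = {recomputed[name]:.4f}")
```

Because the harmonic mean is pulled toward the smaller of the two values, a model with high precision but low recall still ends up with a low F1-score.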


## **Conclusion**

### **Best Model: Logistic Regression**

Logistic Regression performs the best overall with:
- **Highest accuracy (79.3%)**
- **Highest F1-score (74.1%)**
- Balanced precision (76.8%) and recall (71.6%)

This makes it the most reliable model for this classification task.

### Naive Bayes (Gaussian)
- Performs reasonably well (Accuracy: 76.0%).
- Slightly weaker F1-score than Logistic Regression (71.1% vs. 74.1%).

Good, but not the top performer.

### Naive Bayes (Multinomial)
- High precision (86.7%) but **very low recall (35.1%)**.
- Misses many positive cases → **not suitable** for this dataset.

Multinomial NB is designed for **count or frequency features** (for example, word counts in text), not continuous numeric variables such as Age and Fare.
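
The mismatch can be seen directly: scikit-learn's `MultinomialNB` refuses negative inputs, which is why standardized features have to be shifted before fitting. The sketch below uses made-up toy values (not rows from sank.csv) to illustrate this; the shift makes the input acceptable but does not turn the features into counts.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical standardized-looking features (not taken from sank.csv)
X_toy = np.array([[-1.2, 0.4], [0.3, -0.8], [1.1, 0.9], [-0.2, -0.5]])
y_toy = np.array([0, 0, 1, 1])

# Fitting on data with negative entries raises a ValueError
try:
    MultinomialNB().fit(X_toy, y_toy)
    fit_failed = False
except ValueError as exc:
    fit_failed = True
    print("MultinomialNB rejected the input:", exc)

# Shifting by the global minimum makes every entry non-negative,
# so fitting succeeds, but the shifted values are still not counts
X_shifted = X_toy - X_toy.min()
model = MultinomialNB().fit(X_shifted, y_toy)
print("Predictions on shifted data:", model.predict(X_shifted))
```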

### Decision Tree
- Reasonable accuracy (74.9%), but slightly behind Gaussian Naive Bayes on every metric.
- Prone to overfitting, since the tree is grown with default settings (no depth limit or pruning).

Could improve with pruning or tuned parameters.
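
One way such tuning might look is a small cross-validated grid search over depth, leaf size, and cost-complexity pruning. This is a sketch under stated assumptions: it runs on synthetic data from `make_classification` as a stand-in for sank.csv, and the grid values are illustrative choices, not tuned recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: sank.csv is not loaded here)
X_demo, y_demo = make_classification(
    n_samples=400, n_features=4, n_informative=3, n_redundant=1, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Illustrative grid: limit depth, require larger leaves, or prune via ccp_alpha
param_grid = {
    "max_depth": [2, 3, 5, None],
    "min_samples_leaf": [1, 5, 10],
    "ccp_alpha": [0.0, 0.01],
}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=42), param_grid, scoring="f1", cv=5
)
search.fit(X_tr, y_tr)

print("Best parameters:", search.best_params_)
print("Held-out F1:", search.score(X_te, y_te))
```

Scoring on F1 rather than accuracy keeps the search aligned with the metric used to rank the models above.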

## **Final Recommendation**

Use **Logistic Regression** for production or reporting,  
and **avoid Multinomial Naive Bayes** for mixed numerical/categorical data.
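
If Logistic Regression is carried forward, wrapping the scaler and the classifier in a single pipeline keeps the scaling statistics tied to the training split only. The following is a minimal sketch on synthetic stand-in data (an assumption, since sank.csv is not loaded here), not a reproduction of the results above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: replaces the sank.csv features)
X_demo, y_demo = make_classification(
    n_samples=400, n_features=4, n_informative=3, n_redundant=1, random_state=42
)

# Split first, so the test rows never influence the scaler
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# The pipeline fits StandardScaler on the training rows only,
# then reuses those statistics when transforming the test rows
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X_tr, y_tr)

print("Held-out accuracy:", clf.score(X_te, y_te))
```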