# Task 10: Benchmark Top ML Algorithms

This task tests your ability to use different ML algorithms when solving a specific problem.


### Dataset
Predict Loan Eligibility for Dream Housing Finance company

Dream Housing Finance company deals in all kinds of home loans and has a presence across urban, semi-urban, and rural areas. A customer first applies for a home loan, and the company then validates the customer's eligibility for it.

The company wants to automate the loan eligibility process in real time, based on the details customers provide when filling in the online application form. These details include Gender, Marital Status, Education, Number of Dependents, Income, Loan Amount, Credit History, and others. To automate this process, the company has provided a dataset for identifying the customer segments that are eligible for a loan, so that it can specifically target these customers.

Train: https://raw.githubusercontent.com/subashgandyer/datasets/main/loan_train.csv

Test: https://raw.githubusercontent.com/subashgandyer/datasets/main/loan_test.csv
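
Loading the files and turning them into numeric features could look like the sketch below. The column names `Loan_ID` and `Loan_Status`, the helper names `load_loan_data` and `prepare_features`, and the simple median/mode imputation are assumptions for illustration, not part of the task statement; check `df.columns` against the actual files before relying on them.

```python
import pandas as pd

TRAIN_URL = "https://raw.githubusercontent.com/subashgandyer/datasets/main/loan_train.csv"


def load_loan_data(url=TRAIN_URL):
    """Read one of the loan CSV files (needs network access)."""
    return pd.read_csv(url)


def prepare_features(df, target="Loan_Status", id_col="Loan_ID"):
    """Split a loan DataFrame into a numeric feature matrix X and labels y.

    `Loan_Status` and `Loan_ID` are assumed column names.
    """
    df = df.drop(columns=[id_col], errors="ignore")
    y = df[target]
    X = df.drop(columns=[target])

    # Fill gaps: median for numeric columns, most frequent value otherwise.
    for col in X.columns:
        if pd.api.types.is_numeric_dtype(X[col]):
            X[col] = X[col].fillna(X[col].median())
        else:
            X[col] = X[col].fillna(X[col].mode().iloc[0])

    # One-hot encode categorical columns so every feature is numeric.
    X = pd.get_dummies(X, drop_first=True)
    return X, y


# Usage (not executed here because it downloads the file):
# X, y = prepare_features(load_loan_data())
```

Computing the fill values on the full training file is a simplification; a stricter setup would fit the imputation on the training split only.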

## Task Requirements
### Build classification models using the following ML algorithms
- Decision Tree
- KNN
- Logistic Regression
- SVM
- Random Forest
- Any other algorithm of your choice

### Use GridSearchCV to find the best model with the best hyperparameters

- Build models
- Create a parameter grid
- Run GridSearchCV
- Choose the best model with the best hyperparameters
- Report the best accuracy
- Also, benchmark the best accuracy you can get for every classification algorithm listed above
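
The steps above can be sketched for a single algorithm as follows. This is a minimal illustration on synthetic data from `make_classification`, not the loan dataset; the grid values and step names (`scale`, `clf`) are arbitrary choices. Putting the scaler inside a `Pipeline` is one way to make sure each cross-validation fold fits the scaler only on its own training part.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data; replace with the loan features and labels.
X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# 1. Build the model: scaling is a pipeline step, so it is refit per CV fold.
pipeline = Pipeline([("scale", StandardScaler()), ("clf", SVC())])

# 2. Create the parameter grid; keys use the "<step>__<parameter>" form.
param_grid = {"clf__C": [0.1, 1, 10], "clf__kernel": ["linear", "rbf"]}

# 3. Run GridSearchCV with 5-fold cross-validation.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# 4. Read off the best hyperparameters and their mean CV accuracy,
#    then score the refitted best model on the held-out test set.
best_params = search.best_params_
cv_accuracy = search.best_score_
test_accuracy = search.score(X_test, y_test)
print(best_params, round(cv_accuracy, 3), round(test_accuracy, 3))
```

Repeating this for each algorithm, with its own estimator and grid, yields one row per algorithm for the benchmark.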

#### Your final output will be something like this:
- Best algorithm accuracy
- Best accuracy and hyperparameters for every algorithm

**Table 1 (Algorithm-wise best model with best hyperparameters)**

| Algorithm | Accuracy | Hyperparameters |
| --- | --- | --- |
| DT | | |
| KNN | | |
| LR | | |
| SVM | | |
| RF | | |
| Any other | | |

**Table 2 (Best overall)**

| Algorithm | Accuracy | Hyperparameters |
| --- | --- | --- |
| | | |
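
One way to fill both tables from collected results is a small formatter like the sketch below. The helper name `to_markdown_table` and the example rows are hypothetical placeholders, not measured results; the row dictionaries are assumed to carry the three keys shown in the table headers.

```python
def to_markdown_table(rows, columns=("Algorithm", "Accuracy", "Hyperparameters")):
    """Render a list of result dicts as a pipe-delimited Markdown table."""
    lines = [
        "| " + " | ".join(columns) + " |",
        "| " + " | ".join("---" for _ in columns) + " |",
    ]
    for row in rows:
        cells = []
        for col in columns:
            value = row[col]
            # Show accuracies with four decimals; leave other values as-is.
            cells.append(f"{value:.4f}" if isinstance(value, float) else str(value))
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


# Hypothetical placeholder rows, not measured results.
example_rows = [
    {"Algorithm": "DT", "Accuracy": 0.5, "Hyperparameters": {"max_depth": 3}},
    {"Algorithm": "KNN", "Accuracy": 0.25, "Hyperparameters": {"n_neighbors": 5}},
]

# Table 1: one row per algorithm.
table_1 = to_markdown_table(example_rows)
# Table 2: the single row with the highest accuracy.
table_2 = to_markdown_table([max(example_rows, key=lambda r: r["Accuracy"])])
print(table_1)
print(table_2)
```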



### Submission
- Submit a notebook containing all executed code with its outputs saved
- Submit a document containing the two tables above

In [5]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_iris  # can be replaced with your own dataset

# Load the data (Iris is used here as an example)
data = load_iris()
X, y = data.data, data.target

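# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features (some algorithms, such as SVM, KNN, and LR, need this);
# fit the scaler on the training set only so test data does not leak into it
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)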

# Define each algorithm and its hyperparameter search grid
models = {
    "Decision Tree": (DecisionTreeClassifier(), {"max_depth": [3, 5, 10, None]}),
    "KNN": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 7, 9]}),
    "Logistic Regression": (LogisticRegression(max_iter=1000), {"C": [0.01, 0.1, 1, 10]}),
    "SVM": (SVC(), {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}),
    "Random Forest": (RandomForestClassifier(), {"n_estimators": [50, 100, 200], "max_depth": [5, 10, None]})
}

best_results = []

# Loop over each model and run GridSearchCV
for name, (model, param_grid) in models.items():
    grid_search = GridSearchCV(model, param_grid, cv=5, scoring='accuracy', n_jobs=-1)
    grid_search.fit(X_train, y_train)
    
    best_model = grid_search.best_estimator_
    best_params = grid_search.best_params_
    
    # Accuracy on the test set
    y_pred = best_model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    
    best_results.append({"Algorithm": name, "Accuracy": accuracy, "Hyperparameters": best_params})

# Convert to a DataFrame for easier display
df_results = pd.DataFrame(best_results)
df_results_sorted = df_results.sort_values(by="Accuracy", ascending=False)

# Find the best-performing model
best_overall = df_results_sorted.iloc[0]

# Display the results
print("\n🏆 Best Overall Model:")
print(best_overall)

# Save the results to a CSV file (optional)
df_results_sorted.to_csv("ml_algorithm_results.csv", index=False)
print("\nResults saved to 'ml_algorithm_results.csv'")



🏆 Best Overall Model:
Algorithm              Decision Tree
Accuracy                         1.0
Hyperparameters    {'max_depth': 10}
Name: 0, dtype: object

Results saved to 'ml_algorithm_results.csv'
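
With only 30 held-out Iris samples, several models may tie on test accuracy, in which case the row that ends up first is largely arbitrary. A hedged sketch of one possible refinement is to record the mean cross-validated accuracy (`best_score_`) next to the test accuracy and use it as a tie-breaker. The column names `CV Accuracy` and `Test Accuracy` are illustrative choices, and only two of the algorithms are shown to keep the example short.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# A fixed random_state makes the decision tree reproducible between runs.
models = {
    "Decision Tree": (DecisionTreeClassifier(random_state=42), {"max_depth": [3, 5, 10, None]}),
    "KNN": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 7, 9]}),
}

rows = []
for name, (model, grid) in models.items():
    search = GridSearchCV(model, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    rows.append({
        "Algorithm": name,
        "CV Accuracy": search.best_score_,              # mean accuracy over the 5 folds
        "Test Accuracy": search.score(X_test, y_test),  # accuracy on the held-out split
        "Hyperparameters": search.best_params_,
    })

# Rank by test accuracy first, then break ties with the CV accuracy.
ranked = pd.DataFrame(rows).sort_values(
    by=["Test Accuracy", "CV Accuracy"], ascending=False
)
print(ranked.to_string(index=False))
```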
