# AI Model
This notebook implements a Support Vector Machine (SVM) classifier that assigns rice samples to a class based on their attributes. The dataset is provided as an Excel file and converted to CSV for easier handling with the Pandas library. Hyperparameters are tuned with GridSearchCV, which uses stratified k-fold cross-validation to select the best model parameters.
### Import all required packages

In [None]:
import pandas as pd 
import os 

### Conversion function 
This function converts the given Excel dataset to CSV format for easier loading and processing with Pandas.

In [None]:
def convert_do_csv(filePath):
    """
    Converts an Excel file to a CSV file.
    Parameters: 
    filePath (str): Path to the Excel file.

    Returns: 
    None
    """
    data = pd.read_excel(filePath)
    newPath = os.path.join(os.path.dirname(filePath), 'output.csv')
    data.to_csv(newPath, index=False)
    print("Conversion successful")
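
To make it clearer where the converted file ends up, the path logic used above can be isolated in a small helper. This is only a minimal sketch: `csv_path_for` and its `csv_name` parameter are hypothetical names introduced for illustration and are not part of the notebook.

```python
import os

def csv_path_for(excel_path, csv_name="output.csv"):
    """Return the path of a CSV file placed in the same directory as excel_path."""
    # Hypothetical helper: mirrors the os.path.join(os.path.dirname(...)) logic
    # used by the conversion function.
    return os.path.join(os.path.dirname(excel_path), csv_name)

example = csv_path_for(os.path.join("sol", "dataset", "Rice_Cammeo_Osmancik.xlsx"))
print(example)
```

The CSV is always written next to the Excel file, so the path used when loading the data later has to point at that same directory.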

In [None]:
# Set the path to the dataset directory and the Excel file
PATH = os.path.abspath(os.path.join(os.getcwd(), os.pardir))
excelPath = os.path.join(PATH, "sol/dataset/Rice_Cammeo_Osmancik.xlsx")

# Uncomment the line below to run the conversion function if needed
# convert_do_csv(excelPath)

### SVM Model 
Import the required packages for the SVM model and for model evaluation. The model uses a pipeline that standardizes the features and then trains the SVM classifier.

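**IMPORTANT:**
**The grid search below evaluates 72 parameter combinations with 5-fold cross-validation (360 fits, plus a final refit), so it will take some time depending on your system's resources.**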

In [None]:
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report, accuracy_score

In [None]:
# Load the dataset from the CSV file
csv_path = os.path.join(PATH, "Rice_Dataset_Commeo_and_Osmancik/output.csv")

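# Read the CSV file and split it into a feature matrix and a target vector.
# The conversion function writes the CSV with index=False, so there is no index
# column to read; index_col=0 would turn the first feature column into the index
# and silently drop it from X.
rice_df = pd.read_csv(csv_path)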
X = rice_df.drop(columns=["Class"])
Y = rice_df["Class"]

# Create a pipeline for preprocessing and model training
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('svc', SVC())
])

# Define the hyperparameter grid for grid search
param_grid = {
    'svc__kernel': ['linear', 'rbf', 'poly'],  # kernel types to compare
    'svc__C': [0.1, 1, 10, 100],  # regularization parameter
    'svc__gamma': ['scale', 'auto', 0.001, 0.01, 0.1, 1]  # kernel coefficient (ignored by the linear kernel)
}

# set up stratified k-fold cross-validation
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# initialise GridSearchCV to search for the best hyperparameters
print("Running GridSearchCV...")
grid_search = GridSearchCV(pipeline, param_grid, cv=cv, scoring='accuracy', n_jobs=-1)

grid_search.fit(X, Y)

print(f"Best Hyperparameters: {grid_search.best_params_}")

print(f"Best Cross-Validation Accuracy: {grid_search.best_score_:.2f}")
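
One possible refinement, shown as a sketch rather than as part of this notebook's method: `gamma` has no effect on the linear kernel, so the grid can be written as a list of dictionaries to avoid fitting equivalent linear models several times. `ParameterGrid` is used here only to count the candidates, and `single_grid`, `split_grid`, `C_values`, and `gamma_values` are illustrative names.

```python
from sklearn.model_selection import ParameterGrid

C_values = [0.1, 1, 10, 100]
gamma_values = ['scale', 'auto', 0.001, 0.01, 0.1, 1]

# Same grid as in the cell above: every kernel is paired with every gamma.
single_grid = {
    'svc__kernel': ['linear', 'rbf', 'poly'],
    'svc__C': C_values,
    'svc__gamma': gamma_values,
}

# Split grid: the linear kernel is searched over C only.
split_grid = [
    {'svc__kernel': ['linear'], 'svc__C': C_values},
    {'svc__kernel': ['rbf', 'poly'], 'svc__C': C_values, 'svc__gamma': gamma_values},
]

n_single = len(ParameterGrid(single_grid))
n_split = len(ParameterGrid(split_grid))
print(f"Candidates: {n_single} (single grid) vs {n_split} (split grid)")
```

`GridSearchCV` accepts a list of dictionaries like `split_grid` in place of a single dictionary, so it could be passed in directly.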


In [None]:
# get the best model from grid search
best_model = grid_search.best_estimator_

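# Use the best model to make predictions.
# Note: X is the same data the grid search was fitted on, so this report
# reflects training performance and is likely optimistic.
y_pred = best_model.predict(X)
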
print(classification_report(Y, y_pred))

In [None]:
# Calculate and print the accuracy of the predictions on X (the data the model was fitted on)
acc = accuracy_score(Y, y_pred)
print(f"Accuracy Score: {acc:.2f}")
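
The scores above are computed on the same rows the grid search was fitted on. A minimal sketch of a held-out evaluation follows, assuming a stratified 80/20 split. It uses synthetic data from `make_classification` so that it runs on its own; the split ratio, the fixed SVC settings, and all variable names are assumptions, not this notebook's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the rice data (assumption: 7 numeric features, 2 classes).
X_demo, y_demo = make_classification(
    n_samples=400, n_features=7, n_informative=4, random_state=42
)

# Hold out 20% of the rows, keeping the class proportions in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42
)

# Fit the scaler and the classifier on the training part only.
demo_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('svc', SVC(kernel='rbf', C=1, gamma='scale'))
])
demo_pipeline.fit(X_train, y_train)

# Score on rows the model has not seen during fitting.
y_test_pred = demo_pipeline.predict(X_test)
test_acc = accuracy_score(y_test, y_test_pred)
print(f"Held-out accuracy: {test_acc:.2f}")
```

With the real data, the grid search would be fitted on the training portion only and the classification report computed on the held-out portion.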