In [3]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder, StandardScaler
from imblearn.over_sampling import SMOTE
from sklearn.metrics import accuracy_score, classification_report

# Load dataset
file_path = "dataset.csv"  # Update if necessary
df = pd.read_csv(file_path)

# Drop duplicate rows if any
df = df.drop_duplicates()

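# Fill missing values: mode for categorical (object) columns, median for numeric ones.
# Assign the result back instead of calling fillna(..., inplace=True) on df[col]:
# that chained-assignment pattern triggers pandas' FutureWarning/SettingWithCopyWarning
# and stops modifying df under copy-on-write in pandas 3.0.
for col in df.columns:
    if df[col].dtype == "object":
        df[col] = df[col].fillna(df[col].mode()[0])
    else:
        df[col] = df[col].fillna(df[col].median())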

# Encode categorical (object) columns as integers with LabelEncoder,
# keeping each fitted encoder so new inputs can be encoded the same way
label_encoders = {}
for col in df.select_dtypes(include=['object']).columns:
    le = LabelEncoder()
    # Cast to str first so columns mixing numbers and text encode consistently
    df[col] = df[col].astype(str)
    df[col] = le.fit_transform(df[col])
    label_encoders[col] = le

# Split features and target variable
X = df.drop(columns=['Attack_label'])  # Assuming 'Attack_label' is the target column
y = df['Attack_label']

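# Split into training and testing sets *before* oversampling, so that no
# synthetic samples derived from test rows leak into training.
# Stratify only when there are at least two classes to preserve.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42,
    stratify=y if y.nunique() > 1 else None,
)

# SMOTE needs at least two classes; oversample the minority class in the
# training set only (sampling_strategy=1.0 balances a binary target 1:1)
if y_train.nunique() > 1:
    smote = SMOTE(sampling_strategy=1.0, random_state=42, k_neighbors=1)
    X_train, y_train = smote.fit_resample(X_train, y_train)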

# Standardize all (already numeric-encoded) features, fitting on the training set only
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train Random Forest model
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Predictions
y_pred = clf.predict(X_test)

# Model evaluation
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

# Display results
print("Accuracy:", accuracy)
print("Classification Report:\n", report)

# Test the model with a sample input
sample_input = X_test[:1]  # Taking one test sample
sample_prediction = clf.predict(sample_input)
print("Sample Prediction:", sample_prediction)


Accuracy: 1.0
Classification Report:
               precision    recall  f1-score   support

         1.0       1.00      1.00      1.00     15015

    accuracy                           1.00     15015
   macro avg       1.00      1.00      1.00     15015
weighted avg       1.00      1.00      1.00     15015

Sample Prediction: [1.]
