
# SVM model test with categorical dataset using Python


Support Vector Machines (SVMs) can be used for both classification and regression. When the target variable is categorical, as it is here, the task is a classification problem. In Python, scikit-learn provides an SVM classifier as `sklearn.svm.SVC`. SVMs operate on numeric inputs, so categorical features must be encoded as numbers before training; the example below assumes the feature columns in the CSV file are already numeric.

Here's an example of how to use an SVM classifier with a categorical dataset in Python:


Note: In this example, we're using a linear kernel for the SVM classifier. You can try different kernel functions, such as radial basis function (RBF) or polynomial, to see which one works best for your dataset.

In [1]:
import pandas as pd
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the dataset
df = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/Data/CGPA 1.csv")

# Split the dataset into training and testing sets
X = df.drop(["CGPA Category"], axis=1)
y = df["CGPA Category"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the SVM model on the training data
clf = svm.SVC(kernel='linear')
clf.fit(X_train, y_train)

# Make predictions on the test data
y_pred = clf.predict(X_test)

# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 0.2857142857142857
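
The note above suggests trying different kernels, and an SVM needs numeric inputs when the features are categorical. The following is a minimal sketch of both ideas, not the method used in this notebook: it one-hot encodes the features inside a pipeline and compares kernels with cross-validation. The toy DataFrame, its values, and the `score_kernels` helper are hypothetical and exist only for illustration.

```python
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import SVC

# Hypothetical toy data: two categorical features and a categorical target.
toy = pd.DataFrame({
    "Gender": ["F", "M"] * 6,
    "Study Year": ["1st", "1st", "2nd", "2nd", "3rd", "3rd"] * 2,
    "CGPA Category": ["High", "Low"] * 6,
})
X_toy = toy.drop(columns=["CGPA Category"])
y_toy = toy["CGPA Category"]


def score_kernels(X, y, kernels=("linear", "rbf", "poly"), cv=3):
    """Return the mean cross-validated accuracy for each SVM kernel."""
    scores = {}
    for kernel in kernels:
        # One-hot encode every feature column, then fit an SVC with this kernel.
        model = make_pipeline(
            OneHotEncoder(handle_unknown="ignore"),
            SVC(kernel=kernel),
        )
        scores[kernel] = cross_val_score(model, X, y, cv=cv).mean()
    return scores


kernel_scores = score_kernels(X_toy, y_toy)
for kernel, score in kernel_scores.items():
    print(f"{kernel}: {score:.3f}")
```

Cross-validation averages the accuracy over several splits, which tends to be more stable than a single 80/20 split when the dataset is small.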


# Converting CSV to ARFF


In [None]:
import pandas as pd

# Load the CSV file into a pandas DataFrame
df = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/Data/Current-year-of-Study.csv")

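# Write the rows as headerless CSV with an .arff extension.
# Note: this produces only the data rows; a valid ARFF file also needs
# @relation, @attribute, and @data header lines before them.
df.to_csv("/content/drive/MyDrive/Colab Notebooks/Data/Current-year-of-Study.arff", index=False, header=False)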
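
The cell above writes comma-separated rows, but an ARFF file also needs a header that declares the relation and each attribute. The following is a minimal sketch of such a conversion using only the standard library. The `csv_to_arff` helper, the type inference rule (all-numeric columns become `numeric`, everything else becomes nominal), and the sample values are assumptions for illustration; missing values and date attributes are not handled.

```python
import csv
import io


def _is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def _quote(value):
    """Single-quote a value, escaping backslashes and embedded quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def csv_to_arff(csv_text, relation):
    """Convert CSV text with a header row into ARFF text."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, data = rows[0], rows[1:]
    # A column is numeric only if every one of its values parses as a number.
    numeric = [all(_is_number(row[i]) for row in data) for i in range(len(header))]

    lines = [f"@relation {_quote(relation)}", ""]
    for i, name in enumerate(header):
        if numeric[i]:
            kind = "numeric"
        else:
            # Nominal attribute: list the distinct values seen in the column.
            values = sorted({row[i] for row in data})
            kind = "{" + ",".join(_quote(v) for v in values) + "}"
        lines.append(f"@attribute {_quote(name)} {kind}")
    lines += ["", "@data"]
    for row in data:
        lines.append(",".join(v if numeric[i] else _quote(v) for i, v in enumerate(row)))
    return "\n".join(lines) + "\n"


# Hypothetical sample rows, not taken from the notebook's dataset.
sample_csv = "Gender,Age,CGPA Category\nFemale,21,3.00 - 3.49\nMale,23,3.50 - 4.00\n"
arff_text = csv_to_arff(sample_csv, "students")
print(arff_text)
```

With a DataFrame already loaded, `csv_to_arff(df.to_csv(index=False), "Current-year-of-Study")` would produce the ARFF text, which can then be written to a file with the `.arff` extension.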


# J48 model test with categorical dataset using Python

J48 is WEKA's implementation of the C4.5 decision tree algorithm. scikit-learn does not implement C4.5 itself: its `DecisionTreeClassifier` is based on an optimized version of CART. Setting the entropy criterion gives information-gain-style splits similar to C4.5, so the model below approximates J48 rather than reproducing it. To run J48 itself from Python, you would need to call WEKA, for example through a Python wrapper.

Here's an example of approximating J48 on a categorical dataset in Python using the scikit-learn library:

Note: In this example, we're using the entropy criterion for splitting the nodes in the decision tree. You can also try the Gini impurity criterion (`criterion='gini'`, the scikit-learn default) to see which one works better for your dataset.

In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score  
from sklearn.preprocessing import LabelEncoder 

# Load your categorical dataset
df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/Data/CGPA.csv')

# Encode categorical variables into numerical values
le = LabelEncoder()
for col in df.columns:
    if df[col].dtype == "object":
        df[col] = le.fit_transform(df[col])


# Split the data into features (X) and target (y)
X = df.drop("CGPA Category", axis=1)
y = df["CGPA Category"]


# Split the data into training and testing sets
# (no random_state is set, so the split and the accuracy vary between runs)
train_data, test_data, train_labels, test_labels = train_test_split(X, y, test_size=0.2)

# Train the decision tree (scikit-learn's CART with entropy, standing in for J48)
j48 = DecisionTreeClassifier(criterion='entropy')
j48.fit(train_data, train_labels)

# Make predictions on the test data
predictions = j48.predict(test_data)

# Evaluate the accuracy of the model
accuracy = accuracy_score(test_labels, predictions)
print("Accuracy:", accuracy)


Accuracy: 0.5714285714285714
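
To make the difference between the entropy and Gini criteria concrete, here is a small worked sketch in plain Python. It computes both impurity measures for a parent node and for one candidate split. The label lists and helper names are hypothetical; scikit-learn performs this kind of computation internally when it chooses a split.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def weighted_impurity(groups, impurity):
    """Average impurity of the child nodes, weighted by their sizes."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * impurity(g) for g in groups)


# Hypothetical node with 4 "High" and 4 "Low" samples, split into two children.
parent = ["High"] * 4 + ["Low"] * 4
children = [["High"] * 3 + ["Low"], ["High"] + ["Low"] * 3]

info_gain = entropy(parent) - weighted_impurity(children, entropy)
gini_decrease = gini(parent) - weighted_impurity(children, gini)

print("Parent entropy:", entropy(parent))   # 1.0
print("Parent Gini:", gini(parent))         # 0.5
print("Information gain:", round(info_gain, 4))
print("Gini decrease:", round(gini_decrease, 4))
```

Both criteria reward splits that make the child nodes purer; they often pick the same split, which is why trying both is usually a cheap experiment.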


# Naive Bayes model test with categorical dataset using Python

Naive Bayes is a probabilistic algorithm that is commonly used for classification problems. To use Naive Bayes with a categorical dataset in Python, you can use the scikit-learn library.

Here's an example of how to use Naive Bayes with a categorical dataset in Python:

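Note: In this example, we're using the Categorical Naive Bayes algorithm (`CategoricalNB`), which models each feature as a categorical distribution and expects the categories to be encoded as non-negative integers. If the features in your dataset are continuous, you can use Gaussian Naive Bayes instead; for count or binary features, Multinomial or Bernoulli Naive Bayes may be a better fit.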

In [4]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import LabelEncoder 
from sklearn.metrics import accuracy_score 

# Load the dataset
df = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/Data/CGPA.csv")

# Encode categorical variables into numerical values
le = LabelEncoder()
for col in df.columns:
    if df[col].dtype == "object":
        df[col] = le.fit_transform(df[col])

# Split the data into features (X) and target (y)
X = df.drop("CGPA Category", axis=1)
y = df["CGPA Category"]

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Train a Naive Bayes model
model = CategoricalNB()
model.fit(X_train, y_train)

# Predict on the test set
y_pred = model.predict(X_test)

# Evaluate the model's accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: ", accuracy)


Accuracy:  0.42857142857142855
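
One practical caveat with `CategoricalNB` is that it infers the number of categories per feature from the training data, so a category that appears only in the test rows can cause prediction to fail. The following is a minimal sketch of one way to guard against that with the `min_categories` parameter. The toy DataFrame and labels are hypothetical, and `OrdinalEncoder` is swapped in for the column-by-column `LabelEncoder` loop used above.

```python
import pandas as pd
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data; the "Year" value "3rd" appears only in the last row.
toy = pd.DataFrame({
    "Gender": ["F", "M", "F", "M", "F", "M"],
    "Year": ["1st", "1st", "2nd", "2nd", "1st", "3rd"],
})
labels = ["High", "Low", "High", "Low", "High", "Low"]

# Encode all rows so that every category receives an integer code.
encoder = OrdinalEncoder()
X_encoded = encoder.fit_transform(toy).astype(int)

# Train on the first five rows only, so "3rd" is never seen during fitting.
X_train, y_train = X_encoded[:5], labels[:5]
X_test = X_encoded[5:]

# min_categories reserves a slot for every known category of each feature.
n_categories = [len(cats) for cats in encoder.categories_]
model = CategoricalNB(min_categories=n_categories)
model.fit(X_train, y_train)

prediction = model.predict(X_test)
print(prediction)
```

Thanks to the default Laplace smoothing (`alpha=1.0`), the unseen category still receives a small non-zero probability for every class.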


# Artificial neural network classifier model test with categorical dataset using Python

Here's an example of how you could create and evaluate a neural network classifier for a categorical dataset in Python using the Keras library:

Note that in this example, the CSV file should contain the features and the categorical labels of the dataset, with the labels stored in the "CGPA Category" column. Additionally, you may need to adjust the hyperparameters (such as the number of hidden units and the number of epochs) depending on the specific characteristics of your dataset.

In [5]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense

# Load the dataset
df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/Data/CGPA.csv')


# Encode categorical variables into numerical values
le = LabelEncoder()
for col in df.columns:
    if df[col].dtype == "object":
        df[col] = le.fit_transform(df[col])


# Split the dataset into features (X) and labels (y)
X = df.drop('CGPA Category', axis=1)
y = df['CGPA Category']

# Encode the labels as consecutive integers starting at 0
# (this changes nothing if the loop above already encoded this column)
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y)

# Convert the integer-encoded labels to one-hot encodings
y_one_hot = to_categorical(y_encoded)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y_one_hot, test_size=0.2, random_state=42)

# Build the neural network model
model = Sequential()
model.add(Dense(32, input_dim=X.shape[1], activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dense(y_one_hot.shape[1], activation='softmax'))

# Compile the model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train the model
model.fit(X_train, y_train, epochs=100, batch_size=32, verbose=0)

# Evaluate the model on the test set
test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)
print('Test accuracy:', test_accuracy)


Test accuracy: 0.2857142984867096
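
To clarify what the one-hot encoding, the softmax output layer, and the categorical cross-entropy loss do in the cell above, here is a small NumPy sketch of the same computations. It is an illustration under assumed inputs, not Keras's actual implementation: the helper functions, the integer labels, and the logits are all hypothetical.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows (the idea behind to_categorical)."""
    encoded = np.zeros((len(labels), num_classes))
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded


def softmax(logits):
    """Convert each row of raw scores into probabilities that sum to 1."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def categorical_crossentropy(y_true, y_prob):
    """Mean negative log-probability assigned to the true class."""
    return float(-np.mean(np.sum(y_true * np.log(y_prob), axis=1)))


# Hypothetical integer labels and raw network outputs for three samples.
labels = np.array([0, 2, 1])
logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 0.3, 1.5],
                   [0.0, 0.0, 0.0]])

y_true = one_hot(labels, num_classes=3)
probs = softmax(logits)
loss = categorical_crossentropy(y_true, probs)
accuracy = float(np.mean(probs.argmax(axis=1) == labels))

print("One-hot labels:\n", y_true)
print("Loss:", loss)
print("Accuracy:", accuracy)
```

The third sample has equal scores for every class, so `argmax` falls back to class 0 and the prediction counts as wrong, which shows how accuracy ignores the confidence that the loss does measure.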





