<a href="https://www.kaggle.com/code/rafiadsadat/online-learning-effectiveness-prediction?scriptVersionId=100842498" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

#### **Importing Necessary Libraries:**

In [1]:
import numpy as np
import pandas as pd

from sklearn.preprocessing import OrdinalEncoder
from sklearn.feature_selection import chi2

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

#### **Loading Dataset:**

In [2]:
data = pd.read_csv("../input/student-flexibility-in-online-learning/students_adaptability_level_online_education.csv")
data

Unnamed: 0,Education Level,Institution Type,Gender,Age,Device,IT Student,Location,Financial Condition,Internet Type,Network Type,Flexibility Level
0,University,Private,Male,23,Tab,No,Town,Mid,Wifi,4G,Moderate
1,University,Private,Female,23,Mobile,No,Town,Mid,Mobile Data,4G,Moderate
2,College,Public,Female,18,Mobile,No,Town,Mid,Wifi,4G,Moderate
3,School,Private,Female,11,Mobile,No,Town,Mid,Mobile Data,4G,Moderate
4,School,Private,Female,18,Mobile,No,Town,Poor,Mobile Data,3G,Low
...,...,...,...,...,...,...,...,...,...,...,...
1200,College,Private,Female,18,Mobile,No,Town,Mid,Wifi,4G,Low
1201,College,Private,Female,18,Mobile,No,Rural,Mid,Wifi,4G,Moderate
1202,School,Private,Male,11,Mobile,No,Town,Mid,Mobile Data,3G,Moderate
1203,College,Private,Female,18,Mobile,No,Rural,Mid,Wifi,4G,Low


There are 11 features including the target feature and 1205 records in the dataset.

#### **Data Analysis:**

In [3]:
data.dtypes

Education Level        object
Institution Type       object
Gender                 object
Age                     int64
Device                 object
IT Student             object
Location               object
Financial Condition    object
Internet Type          object
Network Type           object
Flexibility Level      object
dtype: object

Observing the above information, we conclude that all the features except **Age** are categorical in nature and that **Flexibility Level** is our target feature.

In [4]:
print(f'Unique values of "Age" : {len(data.Age.unique())}')

Unique values of "Age" : 6


As "Age" has only 6 unique values, we can treat it as categorical data as well.

In [5]:
data.Age = data.Age.astype(str)
data.dtypes

Education Level        object
Institution Type       object
Gender                 object
Age                    object
Device                 object
IT Student             object
Location               object
Financial Condition    object
Internet Type          object
Network Type           object
Flexibility Level      object
dtype: object

**As both the inputs and the output are categorical, we will test Random Forest (RF), K-Nearest Neighbors (KNN), and Support Vector Machine (SVM) models on this dataset.**

#### **Feature Engineering:**

**Handling Missing Values:**

In [6]:
data.isna().sum()

Education Level        0
Institution Type       0
Gender                 0
Age                    0
Device                 0
IT Student             0
Location               0
Financial Condition    0
Internet Type          0
Network Type           0
Flexibility Level      0
dtype: int64

There are no missing values in the dataset.

**Encoding Features:**

In [7]:
data = pd.DataFrame(OrdinalEncoder(dtype = np.int64).fit_transform(data), columns=data.columns)
data.head()

Unnamed: 0,Education Level,Institution Type,Gender,Age,Device,IT Student,Location,Financial Condition,Internet Type,Network Type,Flexibility Level
0,2,0,1,3,2,0,1,0,1,2,2
1,2,0,0,3,1,0,1,0,0,2,2
2,0,1,0,2,1,0,1,0,1,2,2
3,1,0,0,1,1,0,1,0,0,2,2
4,1,0,0,2,1,0,1,1,0,1,1
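
The integer codes above are easier to read with the label-to-code mapping next to them. The following is a minimal sketch on a hypothetical toy frame (`toy` and `mapping` are illustrative names, not part of the notebook) showing how the mapping can be recovered from the fitted encoder's `categories_` attribute. By default, `OrdinalEncoder` sorts the labels of each column, so the codes follow alphabetical order rather than any natural ranking such as Low < Moderate < High.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy frame; the labels are borrowed from the columns shown above.
toy = pd.DataFrame({
    "Education Level": ["University", "College", "School", "College"],
    "Flexibility Level": ["Moderate", "Low", "High", "Moderate"],
})

encoder = OrdinalEncoder(dtype=np.int64)
encoded = pd.DataFrame(encoder.fit_transform(toy), columns=toy.columns)

# categories_ holds the sorted labels of each column;
# a label's position in that array is its integer code.
mapping = {
    col: {label: code for code, label in enumerate(cats)}
    for col, cats in zip(toy.columns, encoder.categories_)
}
print(mapping)
```

If an ordered encoding is preferred, the `categories` parameter of `OrdinalEncoder` accepts an explicit label order per column.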


**Feature Selection:**

In [8]:
# chi2 returns the chi-square statistic and the p-value for each feature
chi2Values, pValues = chi2(data[data.columns[:-1]], data[data.columns[-1]])
print(*(f'{data.columns[i]} : {round(chi2Values[i], 2)}' for i in range(len(data.columns)-1)), sep = '\n')

Education Level : 8.66
Institution Type : 73.15
Gender : 6.05
Age : 5.03
Device : 3.77
IT Student : 14.65
Location : 18.44
Financial Condition : 167.25
Internet Type : 12.13
Network Type : 3.95


From the chi-square test, "Device" has the lowest score (3.77), so we treat it as the least relevant feature and drop it from the dataset.
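
The chi-square scores can also be read together with their p-values, which `chi2` returns as its second output. The sketch below uses synthetic data and an assumed significance level of 0.05; the column names `informative` and `noise`, the `summary` table, and the threshold are illustrative choices and not part of the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import chi2

# Synthetic example: one feature copies the target, the other is random.
rng = np.random.default_rng(0)
y = rng.integers(0, 3, size=600)
X = pd.DataFrame({
    "informative": y,                        # perfectly dependent on the target
    "noise": rng.integers(0, 3, size=600),   # generated independently of the target
})

# chi2 expects non-negative features and returns (statistics, p-values).
chi2_stats, p_values = chi2(X, y)

summary = pd.DataFrame({"chi2": chi2_stats, "p_value": p_values}, index=X.columns)

# Assumed significance level; a feature is kept when its p-value falls below it.
alpha = 0.05
summary["keep"] = summary["p_value"] < alpha
print(summary.round(4))
```

Applied to the notebook's data, the same table would show whether the low-scoring features fall above or below the chosen threshold instead of relying on the raw scores alone.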

In [9]:
data.drop(["Device"], axis = 1, inplace = True)
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1205 entries, 0 to 1204
Data columns (total 10 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   Education Level      1205 non-null   int64
 1   Institution Type     1205 non-null   int64
 2   Gender               1205 non-null   int64
 3   Age                  1205 non-null   int64
 4   IT Student           1205 non-null   int64
 5   Location             1205 non-null   int64
 6   Financial Condition  1205 non-null   int64
 7   Internet Type        1205 non-null   int64
 8   Network Type         1205 non-null   int64
 9   Flexibility Level    1205 non-null   int64
dtypes: int64(10)
memory usage: 94.3 KB


#### **Model Application:**

In [10]:
features = data[data.columns[:len(data.columns)-1]]
target = data[data.columns[-1]]

In [11]:
trainFeatures, testFeatures, trainTarget, testTarget = train_test_split(features, target, test_size = 0.2, random_state = 1)
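
If the target classes are unevenly represented, a plain random split may give the test set a different class mix from the training set. A minimal sketch of a stratified split on hypothetical toy data (the arrays `X` and `y` below are made up for illustration), using the `stratify` parameter of `train_test_split`:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced target: 80 samples of class 0, 20 of class 1.
y = np.array([0] * 80 + [1] * 20)
X = np.arange(len(y)).reshape(-1, 1)

# stratify=y keeps the class proportions roughly equal in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y
)

print("Share of class 1 overall:", y.mean())
print("Share of class 1 in train:", y_train.mean())
print("Share of class 1 in test:", y_test.mean())
```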

In [12]:
models = [RandomForestClassifier(), KNeighborsClassifier(), SVC()]

for model in models:
    model.fit(trainFeatures, trainTarget)
    predTarget = model.predict(testFeatures)
    print(f'Model: {model}\nAccuracy Score:{round(accuracy_score(testTarget, predTarget)*100, 2)}%\n')

Model: RandomForestClassifier()
Accuracy Score:80.5%

Model: KNeighborsClassifier()
Accuracy Score:75.52%

Model: SVC()
Accuracy Score:65.56%
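
The scores above come from a single 80/20 split, and `RandomForestClassifier()` is created without a fixed `random_state`, so the numbers can shift from run to run. One way to get a steadier comparison is k-fold cross-validation. The sketch below is an illustration on synthetic data from `make_classification`, not a rerun of the notebook; the dataset shape, the 5 folds, and the `cv_scores` dictionary are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the encoded features and target (3 classes, 9 features).
X, y = make_classification(
    n_samples=300, n_features=9, n_informative=5, n_classes=3, random_state=1
)

models = {
    "RandomForest": RandomForestClassifier(random_state=1),
    "KNN": KNeighborsClassifier(),
    "SVC": SVC(),
}

# Mean and standard deviation of accuracy over 5 folds for each model.
cv_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    cv_scores[name] = (scores.mean(), scores.std())
    print(f"{name}: {scores.mean() * 100:.2f}% (+/- {scores.std() * 100:.2f})")
```

With the notebook's data, `features` and `target` would take the place of `X` and `y`.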

