# **CSMODEL Project: AI Copilot in Education Analysis**
Group 5

## **1. Introduction**

This dataset was created by Emille Villacerat and Celbert Himang as part of their paper “Data on behavioural intention to use AI copilot through TAM and AI ecological education policy lens” (May 13, 2025). According to the paper, the study introduces a dataset examining the behavioral intention of the faculty and students of Cebu Technological University to adopt AI Copilot. The analysis is grounded in the Technology Acceptance Model (TAM) and the AI Ecological Education Policy Framework. Data was gathered through a quantitative survey, administered digitally to a diverse group of participants, including professors and students from different academic departments and year levels.

The researchers gathered the data through an online, five-point Likert scale survey that assessed perceptions of AI Copilot adoption among respondents from Cebu Technological University. The survey questions were derived from existing literature on technology acceptance and educational frameworks.

Each row in the dataset corresponds to one respondent. A total of 414 responses were gathered, but only 396 were deemed valid; 18 respondents were excluded because their answers showed low variability. The full survey, from demographics to Likert scale questions, comprises 45 variables.

## **2. Project Requirements**

### **Importing Libraries**

In [345]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re

### **Loading the Dataset**

In [346]:
# Clone github repository for the data
!git clone https://github.com/CyAdrienneRamos/csmodel-mco.git
# Load the dataset into a dataframe
file_path_raw = '/content/csmodel-mco/dataset/survey_data_raw.xls'
raw_df = pd.read_excel(file_path_raw, nrows=414)

fatal: destination path 'csmodel-mco' already exists and is not an empty directory.


## **3. Data Preprocessing**

### **Initial Look Into the Dataset**

We start data preprocessing by inspecting the `head()` and `tail()` of the dataframe. This gives a first check on the completeness of the rows and columns of the dataset and helps identify the preparations needed for the rest of the process.

#### Inspecting the Head and Tail

We can immediately see that the column labels are very long and messy. There are also unnecessary columns that we will deal with later. Other than those, the starting and ending rows match the dataset.



In [347]:
# Check if the data is properly imported
raw_df.head(3)

Unnamed: 0,TimeStamp(OnOpenForm),Timestamp(OnClickSubmit),Duration(in Minutes),"I am aware of the purpose of this survey questionnaire on AI Copilot in Education: A Study on Usage and Acceptance through TAM and AI Ecological Policy Lens. The survey is designed to understand my experiences, perceptions, and attitudes toward using AI",Age:,Sex:,Highest Educational Attainment,Type of respondent,College:,"If student, specify year level",...,PEOU2:My interaction with AI Copilot does not require much mental effort.,PEOU3: It is easy to become skillful using AI Copilot,PEOU4: I found AI Copilot easy to use.,PEOU5: It would be easy for me to find information using AI Copilot.,ITU1: I intend to use AI Copilot to a greater extent.,ITU2: I think doing my schoolwork using AI Copilot is interesting.,ITU3: I believe that AI Copilot is a valuable tool for doing my schoolwork.,ITU4: I will recommend AI Copilot to another schoolmate or colleague.,ITU5: I believe that AI Copilot has given me a unique experience.,Please check the box(es) corresponding to the barriers and challenges that may be encountered in the continuous usage of AI Copilot
0,2024-10-17 21:24:58.000,2024-10-17 21:36:58.000,12,I consent voluntarily to participate in this s...,21,Male,College Level,Student,"College of Computer, Information and Communica...",Second Year,...,4,4,5,5,4,4,5,4,4,AI Copilot ethical dilemma
1,2024-10-17 21:17:47.246,2024-10-17 21:28:47.246,11,I consent voluntarily to participate in this s...,19,Male,College Level,Student,"College of Computer, Information and Communica...",Second Year,...,5,5,5,4,3,5,5,5,5,"AI Copilot ethical dilemma, Inflexible teachin..."
2,2024-10-17 21:20:07.997,2024-10-17 21:34:07.997,14,I consent voluntarily to participate in this s...,19,Male,College Level,Student,"College of Computer, Information and Communica...",Second Year,...,3,2,3,5,4,4,3,3,4,AI Copilot ethical dilemma


In [348]:
# Check if the data is properly imported
raw_df.tail(3)

Unnamed: 0,TimeStamp(OnOpenForm),Timestamp(OnClickSubmit),Duration(in Minutes),"I am aware of the purpose of this survey questionnaire on AI Copilot in Education: A Study on Usage and Acceptance through TAM and AI Ecological Policy Lens. The survey is designed to understand my experiences, perceptions, and attitudes toward using AI",Age:,Sex:,Highest Educational Attainment,Type of respondent,College:,"If student, specify year level",...,PEOU2:My interaction with AI Copilot does not require much mental effort.,PEOU3: It is easy to become skillful using AI Copilot,PEOU4: I found AI Copilot easy to use.,PEOU5: It would be easy for me to find information using AI Copilot.,ITU1: I intend to use AI Copilot to a greater extent.,ITU2: I think doing my schoolwork using AI Copilot is interesting.,ITU3: I believe that AI Copilot is a valuable tool for doing my schoolwork.,ITU4: I will recommend AI Copilot to another schoolmate or colleague.,ITU5: I believe that AI Copilot has given me a unique experience.,Please check the box(es) corresponding to the barriers and challenges that may be encountered in the continuous usage of AI Copilot
411,2024-10-28 12:41:05.672,2024-10-28 12:50:05.672,9,I consent voluntarily to participate in this s...,19,Female,College Level,Student,"College of Computer, Information and Communica...",Second Year,...,2,2,4,4,4,4,2,4,4,"AI Copilot ethical dilemma, AI Copilot privacy..."
412,2024-10-28 14:36:35.273,2024-10-28 14:50:35.273,14,I consent voluntarily to participate in this s...,19,Female,College Level,Student,College of Management and Entrepreneurship,Second Year,...,3,3,3,3,3,3,3,4,4,"AI Copilot ethical dilemma, AI Copilot privacy..."
413,2024-10-28 19:15:49.326,2024-10-28 19:29:49.326,14,I consent voluntarily to participate in this s...,19,Female,College Level,Student,College of Management and Entrepreneurship,Second Year,...,3,3,4,4,3,4,5,5,5,"AI Copilot privacy concern, AI Copilot gives i..."


#### Cleaning the columns

We start by removing the unnecessary columns and renaming the rest with more concise labels, using the question codes from the survey questionnaire where applicable. We will not be using the timestamp and consent columns; the duration column is kept.

In [349]:
# We will not be using the timestamps and consent; Duration (position 2) is kept
raw_df = raw_df.drop(columns=raw_df.columns[[0, 1, 3]])
raw_df.columns

Index(['Duration(in Minutes)', 'Age:', 'Sex:',
       'Highest  Educational Attainment', 'Type of respondent', 'College:',
       'If student, specify year level', 'Years of AI usage',
       'RAE1: The assessment design should allow AI such as AI Copilot to enhance learning outcomes.',
       'RAE2: An assessment must focus on students' comprehension',
       'RAE3: Assessment must be automated scoring.',
       'RAE4: Teachers must use portfolio assessments to track students’ progress. ',
       'RAE5: Teachers must use rubrics to set clear performance standards, allowing for fair and organized assessment of student work.',
       'RAE6: The assessment design should not allow AI Copilot to enhance learning outcomes. ',
       'GS1: Teachers must teach the students to evaluate the credibility of AI Copilot-generated content.',
       'GS2: Teachers must create learning activities incorporating instructional strategies that enhance students’ critical thinking skills.',
       'GS3: The
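
Positional indexes are easy to misread, so it can help to print which labels a set of positions selects before dropping them. The sketch below uses a toy frame with the same leading column order as the survey export; `'Consent'` is a hypothetical short stand-in for the long consent statement. Position 2, the duration, is not in the list and therefore stays, which matches the `Index` output above.

```python
import pandas as pd

# Toy frame with the same leading column order as the survey export.
# 'Consent' is a hypothetical short stand-in for the long consent statement.
toy = pd.DataFrame(columns=[
    'TimeStamp(OnOpenForm)', 'Timestamp(OnClickSubmit)',
    'Duration(in Minutes)', 'Consent', 'Age:',
])

# Positions 0, 1 and 3 select the two timestamps and the consent column
to_drop = toy.columns[[0, 1, 3]].tolist()
kept = toy.drop(columns=to_drop).columns.tolist()

print(to_drop)
print(kept)
```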

In [350]:
# We first take the parts before each ":"
raw_df.columns = raw_df.columns.str.split(':').str[0]
raw_df.columns

Index(['Duration(in Minutes)', 'Age', 'Sex', 'Highest  Educational Attainment',
       'Type of respondent', 'College', 'If student, specify year level',
       'Years of AI usage', 'RAE1', 'RAE2', 'RAE3', 'RAE4', 'RAE5', 'RAE6',
       'GS1', 'GS2', 'GS3', 'GS4', 'GS5', 'PS1', 'PS2', 'PS3', 'PS4', 'PS5',
       'BA1', 'BA2', 'BA3', 'BA4', 'BA5', 'PU1', 'PU2', 'PU3', 'PU4', 'PU5',
       'PU6', 'PEOU1', 'PEOU2', 'PEOU3', 'PEOU4', 'PEOU5', 'ITU1', 'ITU2',
       'ITU3', 'ITU4', 'ITU5',
       'Please check the box(es) corresponding to the barriers and challenges that may be encountered in the continuous usage of AI Copilot'],
      dtype='object')
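
As a small illustration of the split step, the sketch below applies the same expression to three labels taken from the raw column names. Labels without a `:` pass through unchanged, which is why the long demographic labels still need the manual renaming that follows.

```python
import pandas as pd

# Three labels taken from the raw column names shown earlier
labels = pd.Index([
    'Age:',
    'PEOU4: I found AI Copilot easy to use.',
    'Years of AI usage',
])

# Keep only the text before the first ':'; labels without ':' are unchanged
short = labels.str.split(':').str[0].tolist()
print(short)
```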

In [351]:
# Then we list the replacement names for some of the labels
columns_keys_map = {
    'Duration(in Minutes)':'Duration',
    'Highest  Educational Attainment':'Education',
    'Type of respondent':'Respondent',
    'If student, specify year level':'YearLevel',
    'Years of AI usage':'UsageYears',
    'Please check the box(es) corresponding to the barriers and challenges that may be encountered in the continuous usage of AI Copilot':'Barriers'
}
# Then apply the changes
raw_df.columns = [columns_keys_map.get(x, x) for x in raw_df.columns]
raw_df.columns

Index(['Duration', 'Age', 'Sex', 'Education', 'Respondent', 'College',
       'YearLevel', 'UsageYears', 'RAE1', 'RAE2', 'RAE3', 'RAE4', 'RAE5',
       'RAE6', 'GS1', 'GS2', 'GS3', 'GS4', 'GS5', 'PS1', 'PS2', 'PS3', 'PS4',
       'PS5', 'BA1', 'BA2', 'BA3', 'BA4', 'BA5', 'PU1', 'PU2', 'PU3', 'PU4',
       'PU5', 'PU6', 'PEOU1', 'PEOU2', 'PEOU3', 'PEOU4', 'PEOU5', 'ITU1',
       'ITU2', 'ITU3', 'ITU4', 'ITU5', 'Barriers'],
      dtype='object')
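
For reference, pandas offers `DataFrame.rename(columns=...)`, which also leaves labels that are missing from the mapping untouched. A minimal sketch on a toy frame (the frame and the one-entry `name_map` are illustrative only) showing that both forms give the same labels:

```python
import pandas as pd

# Toy frame with one label that needs renaming and two that do not
toy = pd.DataFrame(columns=['Duration(in Minutes)', 'Age', 'RAE1'])
name_map = {'Duration(in Minutes)': 'Duration'}

# List-comprehension form, as in the cell above
via_get = [name_map.get(col, col) for col in toy.columns]

# Equivalent built-in: rename() leaves labels missing from the map untouched
via_rename = toy.rename(columns=name_map).columns.tolist()

print(via_get == via_rename)
```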

In [352]:
raw_df.head()

Unnamed: 0,Duration,Age,Sex,Education,Respondent,College,YearLevel,UsageYears,RAE1,RAE2,...,PEOU2,PEOU3,PEOU4,PEOU5,ITU1,ITU2,ITU3,ITU4,ITU5,Barriers
0,12,21,Male,College Level,Student,"College of Computer, Information and Communica...",Second Year,0 - 1 year,5,5,...,4,4,5,5,4,4,5,4,4,AI Copilot ethical dilemma
1,11,19,Male,College Level,Student,"College of Computer, Information and Communica...",Second Year,0 - 1 year,5,3,...,5,5,5,4,3,5,5,5,5,"AI Copilot ethical dilemma, Inflexible teachin..."
2,14,19,Male,College Level,Student,"College of Computer, Information and Communica...",Second Year,1 - 2 years,5,5,...,3,2,3,5,4,4,3,3,4,AI Copilot ethical dilemma
3,15,19,Male,College Level,Student,"College of Computer, Information and Communica...",Second Year,1 - 2 years,4,4,...,4,4,4,4,4,3,3,3,3,"AI Copilot privacy concern, Insufficient budge..."
4,13,19,Female,College Level,Student,"College of Computer, Information and Communica...",Second Year,0 - 1 year,4,5,...,4,4,4,4,4,4,4,4,4,Insufficient budget for the adoption of AI Cop...


#### Completeness and Data Types

Looking at the output of `raw_df.info()`, we can see that there are null values in the `YearLevel` and `Barriers` columns. Some columns also defaulted to the `object` data type. We will deal with these in the following sections.

In [353]:
# Here we see the columns, number of rows, data types, and null counts
raw_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 414 entries, 0 to 413
Data columns (total 46 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Duration    414 non-null    int64 
 1   Age         414 non-null    int64 
 2   Sex         414 non-null    object
 3   Education   414 non-null    object
 4   Respondent  414 non-null    object
 5   College     414 non-null    object
 6   YearLevel   387 non-null    object
 7   UsageYears  414 non-null    object
 8   RAE1        414 non-null    int64 
 9   RAE2        414 non-null    int64 
 10  RAE3        414 non-null    int64 
 11  RAE4        414 non-null    int64 
 12  RAE5        414 non-null    int64 
 13  RAE6        414 non-null    int64 
 14  GS1         414 non-null    int64 
 15  GS2         414 non-null    int64 
 16  GS3         414 non-null    int64 
 17  GS4         414 non-null    int64 
 18  GS5         414 non-null    int64 
 19  PS1         414 non-null    int64 
 20  PS2       

#### Removing the 18 excluded respondents

Since the paper does not identify the specific responses that were excluded, we attempt to identify the 18 'low variability' responses ourselves. We assume that these are the 18 responses with the lowest standard deviation across the 1-5 Likert scale questions.

In [354]:
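# Take the Likert scale questions
likert = raw_df.loc[:, 'RAE1':'ITU5']
# Compute the row-wise standard deviation of each respondent's answers
raw_df['likert_std'] = likert.std(axis=1)
# Remove the 18 entries with the lowest variability
# (.copy() avoids chained-assignment warnings when columns are added later)
df = raw_df.sort_values(by='likert_std').iloc[18:].copy()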
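
To make the filtering rule concrete, here is a minimal sketch on toy data (hypothetical answers and a cut-off of `k = 2` instead of 18). It uses `Series.nsmallest` to pick the rows with the lowest spread, which keeps the remaining rows in their original order. If several respondents share the same standard deviation at the cut-off, which of them is removed depends on ordering, so checking the result against the paper remains important.

```python
import pandas as pd

# Toy Likert answers (hypothetical); rows 'a' and 'c' are straight-lined
toy = pd.DataFrame(
    {'Q1': [3, 5, 4, 1], 'Q2': [3, 4, 4, 5], 'Q3': [3, 5, 4, 3]},
    index=['a', 'b', 'c', 'd'],
)

# Row-wise sample standard deviation; identical answers give 0.0
row_std = toy.std(axis=1)

# Drop the k rows with the lowest spread while keeping the original row order
k = 2
kept = toy.drop(index=row_std.nsmallest(k).index)

print(row_std.round(3).to_dict())
print(kept.index.tolist())
```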

We then verify the values of the filtered `df` against the values shown in the paper.


In [355]:
# The age group counts match the paper's values
age_bins = [0, 24, 34, 44, 54, 100]
age_bin_labels = ['18-24 years', '25-34 years', '35-44 years', '45-54 years', '55+ years']
df['AgeGroup'] =  pd.cut(df['Age'], bins=age_bins, labels=age_bin_labels)
df['AgeGroup'].value_counts()

| AgeGroup | count |
|---|---|
| 18-24 years | 374 |
| 25-34 years | 12 |
| 35-44 years | 6 |
| 45-54 years | 3 |
| 55+ years | 1 |
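
`pd.cut` treats bins as right-inclusive by default, so with the edges above an age of 24 falls in the first group and 25 in the second. A short sketch with a few boundary ages (the ages themselves are illustrative) confirms how the edges behave:

```python
import pandas as pd

age_bins = [0, 24, 34, 44, 54, 100]
age_bin_labels = ['18-24 years', '25-34 years', '35-44 years', '45-54 years', '55+ years']

# Edge cases: bins are right-inclusive by default, so 24 -> first bin, 25 -> second
ages = pd.Series([18, 24, 25, 34, 57])
groups = pd.cut(ages, bins=age_bins, labels=age_bin_labels).astype(str).tolist()
print(groups)
```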


In [356]:
# The college counts match the paper's values
df['College'].value_counts()

| College | count |
|---|---|
| College of Computer, Information and Communications Technology | 138 |
| College of Technology | 71 |
| College of Education | 67 |
| College of Engineering | 46 |
| College of Arts and Sciences | 35 |
| College of Management and Entrepreneurship | 34 |
| College of Nursing | 3 |
| College of Hospitality and tourism management | 1 |
| College of Customs | 1 |


In [357]:
# The mean values for these columns match the values presented in the paper.
df.loc[:, 'RAE1':'RAE5'].mean()

| Column | Mean |
|---|---|
| RAE1 | 3.646465 |
| RAE2 | 4.090909 |
| RAE3 | 3.419192 |
| RAE4 | 4.025253 |
| RAE5 | 4.280303 |


Since these checks match the paper, we accept this filtered dataframe as the data used in the paper.

### **Data Cleaning - Demographic Section**

#### Age

There are no problems with the Age column.

In [358]:
# No problems with the values
df['Age'].describe()

| Statistic | Age |
|---|---|
| count | 396.0 |
| mean | 20.757576 |
| std | 4.498868 |
| min | 18.0 |
| 25% | 19.0 |
| 50% | 20.0 |
| 75% | 21.0 |
| max | 57.0 |


#### Sex

We convert the data type to categorical.

In [359]:
# No problems with the unique values and counts
df['Sex'].value_counts(dropna=False)

| Sex | count |
|---|---|
| Female | 200 |
| Male | 196 |


In [360]:
# Convert to categorical
df['Sex'] = pd.Categorical(df['Sex'], categories=df['Sex'].unique())

#### Respondent

We convert the data type to categorical.

In [361]:
# No problems with the unique values and counts
df['Respondent'].value_counts(dropna=False)

| Respondent | count |
|---|---|
| Student | 372 |
| Teacher | 24 |


In [362]:
# Convert to categorical
df['Respondent'] = pd.Categorical(df['Respondent'], categories=df['Respondent'].unique())

#### Usage Years

We convert the data type to categorical.

In [363]:
# No problems with the unique values and counts
df['UsageYears'].value_counts(dropna=False)

| UsageYears | count |
|---|---|
| 1 - 2 years | 185 |
| 0 - 1 year | 137 |
| 3 - 4 years | 47 |
| beyond 4 years | 27 |


In [364]:
# Convert to categorical
df['UsageYears'] = pd.Categorical(df['UsageYears'], categories=df['UsageYears'].unique())
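
`df['UsageYears'].unique()` returns the categories in order of first appearance, which is fine for nominal columns such as `Sex`. Since the `UsageYears` options have a natural order, an ordered categorical may be more convenient in later analysis. A minimal sketch on toy answers, assuming the order in `usage_order` below is the intended one:

```python
import pandas as pd

# Assumed natural order of the answer options (labels copied from the counts above)
usage_order = ['0 - 1 year', '1 - 2 years', '3 - 4 years', 'beyond 4 years']

# Toy answers in arbitrary order
answers = pd.Series(['3 - 4 years', '0 - 1 year', 'beyond 4 years', '1 - 2 years'])
usage = pd.Series(pd.Categorical(answers, categories=usage_order, ordered=True))

# Sorting and comparisons now follow usage_order rather than first appearance
print(usage.sort_values().tolist())
print((usage > '1 - 2 years').tolist())
```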

#### College


We remove respondents from colleges not under the main Cebu Technological University. We then convert the data type to categorical.

5 rows were dropped in the process (396 rows before, 391 after).

In [365]:
# Checking the unique values, we can see that there are several colleges
# with very low counts. These are colleges outside CTU.
df['College'].value_counts(dropna=False)

| College | count |
|---|---|
| College of Computer, Information and Communications Technology | 138 |
| College of Technology | 71 |
| College of Education | 67 |
| College of Engineering | 46 |
| College of Arts and Sciences | 35 |
| College of Management and Entrepreneurship | 34 |
| College of Nursing | 3 |
| College of Hospitality and tourism management | 1 |
| College of Customs | 1 |


In [366]:
# Identify colleges to remove
colleges_to_remove = [
    'College of Nursing',
    'College of Hospitality and tourism management ',
    'College of Customs',
    'College of Medical Technology ',
]
# Remove the colleges
df = df[~df['College'].isin(colleges_to_remove)]
df['College'].value_counts()

| College | count |
|---|---|
| College of Computer, Information and Communications Technology | 138 |
| College of Technology | 71 |
| College of Education | 67 |
| College of Engineering | 46 |
| College of Arts and Sciences | 35 |
| College of Management and Entrepreneurship | 34 |
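
Some names in `colleges_to_remove` end with a trailing space, so the match relies on the raw values carrying exactly the same whitespace. A minimal sketch on toy values (hypothetical, chosen so that the whitespace differs on purpose) showing how stripping both sides makes `isin` independent of stray spaces:

```python
import pandas as pd

# Toy column (hypothetical values); note the trailing space on the second entry
colleges = pd.Series([
    'College of Nursing',
    'College of Customs ',
    'College of Education',
])
to_remove = ['College of Nursing ', 'College of Customs']

# Exact matching misses both entries because the whitespace differs
exact = colleges[~colleges.isin(to_remove)].tolist()

# Stripping both sides makes the match independent of stray spaces
clean_remove = [name.strip() for name in to_remove]
stripped = colleges[~colleges.str.strip().isin(clean_remove)].tolist()

print(len(exact), stripped)
```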


In [367]:
# Convert to categorical
df['College'] = pd.Categorical(df['College'], categories=df['College'].unique())

#### Year Level

Of the 27 respondents with missing `YearLevel` entries, 23 are teachers and 4 are students. Since a student should always have a year level, we drop those 4 rows. For the teachers, we assume that a missing value means they are not currently studying and assign them the value "Teaching". Lastly, we convert the data type to categorical.

In [368]:
df['YearLevel'].value_counts(dropna=False)

| YearLevel | count |
|---|---|
| Second Year | 192 |
| First Year | 93 |
| Third Year | 52 |
| Fourth Year | 27 |
| NaN | 27 |


In [369]:
# If the respondent is a teacher, then it makes sense for the YearLevel to
# be missing
df[df['YearLevel'].isna()]['Respondent'].value_counts()

| Respondent | count |
|---|---|
| Teacher | 23 |
| Student | 4 |


In [370]:
# Found 4 students with missing YearLevel, we will be dropping their
# corresponding rows
df = df[~((df['YearLevel'].isna()) & (df['Respondent'] == 'Student'))]
# We will be using the value "Teaching" for the teachers
df['YearLevel'] = df['YearLevel'].fillna('Teaching')
df['YearLevel'].value_counts(dropna=False)

| YearLevel | count |
|---|---|
| Second Year | 192 |
| First Year | 93 |
| Third Year | 52 |
| Fourth Year | 27 |
| Teaching | 23 |


We convert the data type to categorical.

In [371]:
# Convert to categorical
df['YearLevel'] = pd.Categorical(df['YearLevel'], categories=df['YearLevel'].unique())

#### Education

Looking at the unique values and their distribution across respondents, we drop this column: almost all responses fall under the vague "College Level" option, and the remaining options have too few responses to be useful.

In [372]:
df.groupby('Respondent', observed=False)['Education'].value_counts()

| Respondent | Education | count |
|---|---|---|
| Student | College Level | 352 |
| Student | Bachelor's Degree | 8 |
| Student | Currently A 2nd Year College Student | 1 |
| Student | Senior High School Graduate | 1 |
| Student | Still a college student | 1 |
| Student | Doctorate Degree (PhD, EdD) | 0 |
| Student | Master's Degree | 0 |
| Student | Professional Degree (JD, MD) | 0 |
| Teacher | Bachelor's Degree | 8 |
| Teacher | Master's Degree | 8 |


In [373]:
# Drop the Education column
df = df.drop(['Education'], axis=1)

### **Data Cleaning - Main Section**

#### Barriers

The question behind the `Barriers` column is a checkbox-style question, which allows the respondent to select multiple options or provide their own through the "Others" option. In the dataset, the selected options are joined together with commas.

We will clean the data and separate it into the options provided in the survey. Since very few rows contain input in the "Others" option, and most of those also have other options selected, we ignore the text of the "Others" input and simply record that the respondent chose to write their own option.

In [374]:
# Check the values of the Barriers column
df['Barriers']

Unnamed: 0,Barriers
359,AI Copilot gives inappropriate or misaligned t...
400,Lack of cognitive scaffolding for students
338,Insufficient budget for the adoption of AI Cop...
195,"AI Copilot privacy concern, Inflexible teachin..."
408,"AI Copilot ethical dilemma, AI Copilot privacy..."
...,...
102,"AI Copilot ethical dilemma, AI Copilot privacy..."
149,"AI Copilot ethical dilemma, AI Copilot privacy..."
265,"Inflexible teaching methods and curricula, Lac..."
315,Insufficient budget for the adoption of AI Cop...


We first deal with the missing value.

In [375]:
# Deal with the missing value
df['Barriers'] = df['Barriers'].fillna('NA')

We then split the entries into the individual options selected.

In [376]:
# Provided options in the survey and their codes
# NA was included to match the placeholder
barriers_provided = {
    'AI Copilot ethical dilemma':'BC1',
    'AI Copilot privacy concern':'BC2',
    'AI Copilot gives inappropriate or misaligned text suggestions':'BC3',
    'Inflexible teaching methods and curricula':'BC4',
    'Insufficient budget for the adoption of AI Copilot':'BC5',
    'Lack of AI literacy':'BC6',
    'Lack of cognitive scaffolding for students':'BC7',
    'Lack of teacher confidence and digital competence':'BC8',
    'Unequal access to AI Copilot':'BC9',
    'User resistance to AI Copilot adoption':'BC10',
    'NA':'NA'
}

# Split individual options
# - The regex splits them by the commas, ignoring commas within sentences
# - The strip(', ') deals with blank input on "Others" option
barriers_split = [set(map(str.strip, re.split(r',\s+(?=[A-Z])', x.strip(', ')))) for x in df['Barriers']]

# Get all unique chosen options
barriers = set.union(*barriers_split)
barriers

{'AI Copilot ethical dilemma',
 'AI Copilot gives inappropriate or misaligned text suggestions',
 'AI Copilot privacy concern',
 'Decline in the cognitive processing of students as they will become dependent on AI usage rather than actively creating, participating, and solving their own problems.',
 'Dependency and misuse of AI Copilot',
 "I don't have any experience yet using AI Copilot",
 'Inflexible teaching methods and curricula',
 'Insufficient budget for the adoption of AI Copilot',
 'Lack of AI literacy',
 'Lack of cognitive scaffolding for students',
 'Lack of cognitive scaffolding for students, students might rely permanently on AI Copilot',
 'Lack of teacher confidence and digital competence',
 "Like sometimes they don't give you accurate answers or the answers you've been looking for. Nothing beats experiences from teachers.",
 'NA',
 'Nahimong tapulan ang mga  people tungod naay AI',
 'No further knowledge about AI copilot',
 'Over-reliance and Skill Degradation',
 "Student

In [377]:
# Get other inputs
barriers_others = barriers.difference(set(barriers_provided.keys()))
barriers_others

{'Decline in the cognitive processing of students as they will become dependent on AI usage rather than actively creating, participating, and solving their own problems.',
 'Dependency and misuse of AI Copilot',
 "I don't have any experience yet using AI Copilot",
 'Lack of cognitive scaffolding for students, students might rely permanently on AI Copilot',
 "Like sometimes they don't give you accurate answers or the answers you've been looking for. Nothing beats experiences from teachers.",
 'Nahimong tapulan ang mga  people tungod naay AI',
 'No further knowledge about AI copilot',
 'Over-reliance and Skill Degradation',
 "Student won't be learning in the correct way, but some can still use AI for learning and self studying.",
 'Students might weaken their ability to think critically and logically'}

In [378]:
# Inspect one of the entries
target = 'Lack of cognitive scaffolding for students, students might rely permanently on AI Copilot'
target_row = next(i for i, row in enumerate(barriers_split) if target in row)
barriers_split[target_row]

{'Lack of cognitive scaffolding for students, students might rely permanently on AI Copilot'}
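
The entry above stays in one piece because the split pattern only breaks at a comma that is followed by whitespace and an uppercase letter. A small sketch on a hypothetical raw answer (two provided options, the free-text entry, and a trailing `', '` such as an empty "Others" box would leave) shows the behaviour:

```python
import re

# Hypothetical raw answer: two provided options, one free-text "Others" entry
# containing a comma, and the trailing ', ' left by an empty "Others" box
raw = ('AI Copilot ethical dilemma, AI Copilot privacy concern, '
       'Lack of cognitive scaffolding for students, students might rely '
       'permanently on AI Copilot, ')

# Split only at commas followed by whitespace and an uppercase letter
parts = re.split(r',\s+(?=[A-Z])', raw.strip(', '))
for part in parts:
    print(part)
```

The comma before the lowercase "students" is not a split point, so the provided option and the free-text continuation remain fused and need the manual fix that follows.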

In [379]:
# Fix that entry
barriers_split[target_row] = {'Lack of cognitive scaffolding for students', 'Students might rely permanently on AI Copilot'}
barriers = set.union(*barriers_split)
barriers

{'AI Copilot ethical dilemma',
 'AI Copilot gives inappropriate or misaligned text suggestions',
 'AI Copilot privacy concern',
 'Decline in the cognitive processing of students as they will become dependent on AI usage rather than actively creating, participating, and solving their own problems.',
 'Dependency and misuse of AI Copilot',
 "I don't have any experience yet using AI Copilot",
 'Inflexible teaching methods and curricula',
 'Insufficient budget for the adoption of AI Copilot',
 'Lack of AI literacy',
 'Lack of cognitive scaffolding for students',
 'Lack of teacher confidence and digital competence',
 "Like sometimes they don't give you accurate answers or the answers you've been looking for. Nothing beats experiences from teachers.",
 'NA',
 'Nahimong tapulan ang mga  people tungod naay AI',
 'No further knowledge about AI copilot',
 'Over-reliance and Skill Degradation',
 "Student won't be learning in the correct way, but some can still use AI for learning and self studyin

Construct a dataframe from the selected barrier options.

In [380]:
# Convert to df with option codes as column names and 1 to indicate selection
# Count options not in the provided options as "BCX" which corresponds to the
# "Others" option
barriers_df = pd.DataFrame([{barriers_provided.get(item, "BCX"): 1 for item in row} for row in barriers_split]).fillna(0)
# Rearrange the columns, remove the placeholder NA
barriers_df = barriers_df[['BC1', 'BC2', 'BC3', 'BC4', 'BC5', 'BC6', 'BC7', 'BC8', 'BC9', 'BC10', 'BCX']]
barriers_df

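| | BC1 | BC2 | BC3 | BC4 | BC5 | BC6 | BC7 | BC8 | BC9 | BC10 | BCX |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 3 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 4 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 382 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 383 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 384 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 |
| 385 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |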
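
The construction above can be followed more easily on a tiny input. The sketch below uses a toy option map and toy split answers (all hypothetical) and additionally casts the flags to integers, which is an optional variation on the float output shown above.

```python
import pandas as pd

# Toy option-to-code map and toy split answers (hypothetical)
codes = {'Option A': 'BC1', 'Option B': 'BC2', 'NA': 'NA'}
rows = [{'Option A'}, {'Option B', 'my own answer'}, {'NA'}]

# One dict per respondent; anything outside the map is counted as "BCX"
records = [{codes.get(item, 'BCX'): 1 for item in row} for row in rows]

# Missing keys become NaN, so fill with 0, cast to int, and fix the column order
flags = pd.DataFrame(records).fillna(0).astype(int)[['BC1', 'BC2', 'BCX']]
print(flags)
```

The third respondent only has the `NA` placeholder, so all of their flags are 0 once the placeholder column is left out.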


Replace the `Barriers` column with the new columns.

In [381]:
# Replace the Barriers column with the new columns
df = df.drop(['Barriers'], axis=1)
df = pd.concat([df, barriers_df], axis=1)
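
One thing worth checking at this step: `df` keeps its original row labels after sorting and filtering, while `barriers_df` is built from a plain list and therefore gets a fresh 0-based index. Because `pd.concat(..., axis=1)` aligns rows by index label rather than by position, mismatched indexes can produce extra rows and NaN values. A minimal sketch on toy data (hypothetical frames `left` and `right`) showing the effect and one possible way to avoid it:

```python
import pandas as pd

# Toy stand-ins (hypothetical): `left` keeps non-default row labels, as a frame
# does after sorting and filtering; `right` has a fresh 0..n-1 index
left = pd.DataFrame({'Age': [21, 19, 20]}, index=[359, 400, 2])
right = pd.DataFrame({'BC1': [1, 0, 1]})

# Label-based alignment: only label 2 is shared, so the result has 5 rows
misaligned = pd.concat([left, right], axis=1)

# Positional pairing: give `right` the same labels before concatenating
aligned = pd.concat([left, right.set_axis(left.index, axis=0)], axis=1)

print(misaligned.shape, aligned.shape)
print(int(misaligned.isna().sum().sum()), int(aligned.isna().sum().sum()))
```

Applied to this notebook, the same idea would be to relabel `barriers_df` with `df.index` before the concat. This is a suggestion under the assumption that the rows of `barriers_df` are in the same order as the rows of `df`, which holds when it is built by iterating over `df['Barriers']`.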

In [382]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 413 entries, 359 to 373
Data columns (total 57 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   Duration    387 non-null    float64 
 1   Age         387 non-null    float64 
 2   Sex         387 non-null    category
 3   Respondent  387 non-null    category
 4   College     387 non-null    category
 5   YearLevel   387 non-null    category
 6   UsageYears  387 non-null    category
 7   RAE1        387 non-null    float64 
 8   RAE2        387 non-null    float64 
 9   RAE3        387 non-null    float64 
 10  RAE4        387 non-null    float64 
 11  RAE5        387 non-null    float64 
 12  RAE6        387 non-null    float64 
 13  GS1         387 non-null    float64 
 14  GS2         387 non-null    float64 
 15  GS3         387 non-null    float64 
 16  GS4         387 non-null    float64 
 17  GS5         387 non-null    float64 
 18  PS1         387 non-null    float64 
 19  PS2        