# **Student Applications**

This notebook focuses on processing student application data. The goal is to clean and prepare the dataset for analysis — specifically to identify students who are compatible with the Accelerator Data Science Bootcamp.

## Import dependencies:

In [847]:
import pandas as pd

## Load the data:

In [848]:
applications_df = pd.read_csv("../data/skills_test_student_applications.csv")

## Data exploration functions:

In [849]:
def column_missing_values(df, column_name):
    if df[column_name].isnull().sum() > 0:
        if df[column_name].isnull().sum() == df.shape[0]:
            return f"'{column_name}' is empty."
        else:
            return column_name

In [850]:
def get_total_columns_missing_values(df):
    columns_with_missing_values = []

    for column_name in df.columns:
        missing_values_column = column_missing_values(df, column_name)
        if missing_values_column:
            columns_with_missing_values.append(missing_values_column)

    return len(columns_with_missing_values), columns_with_missing_values

In [851]:
def column_type(df, column_name):
    return df[column_name].dtype

## Assertion functions:

In [852]:
def is_column_values_unique(df, column_name):
    assert df[
        column_name
    ].is_unique, f"Expected True, meaning all values are unique, but got '{df[column_name].is_unique}', which means all values are not unique."

## Explore the data:

In [853]:
pd.set_option("display.max_columns", None)

In [854]:
applications_df.head()

Unnamed: 0,Application ID,Learner ID,Gender,Race,Nationality,Country of residence,Registration date,Application date,Total score,Literacy score,Logic score,Sequence score,Problem solving score,Numeracy score,Statistics score,Has Taken Aptitude Test,Aptitude test date attempted,Has Coding experience,Status,Programming language experince,Code Challenge plagirism,Code challenge time taken(min),Coding challenge completed,Coding challenge score,Code challenge multiple choice score,Code challenge final score,Code challenge cheat tab leaving (no.),Code challenge cheat plagiarism,Code challenge cheat pasted code,Code challenge cheat suspicious activity,Code challenge cheat ai usage,Coding experience description,Other programming language experince specify,Learner holds coding certs,Learner coding certs list,Learner program,Learner career streams,has_completed_application,application_accepted,pathways,is_south_african_citizen,learner_referral,how_far_away_from_city,internet_access_type,residential_area_type,internet_data_access_type,other_specify_residential_area_type,other_specify_internet_access_type,learner_comfortable_disclosing_income,income_currency,learner_monthly_income,has_yoma_platform,computer_access_type,other_programming_language_specify,has_criminal_record,home_language,has_disability_or_differently_abled,nearest_city,is_refugee,preferred_language_of_instruction,is_currently_employed,employment_type,other_race_specify,english_proficiency,second_language,highest_education_level,field_of_current_education,is_currently_completing_education,ever_regisitered_for_yes4youth,is_umuzi_alumnus,current_institution_study,currently_studying,institution_highest_education,qualification_highest_education,career_stream,asresidential_area_type,career_stream.1,age_formula,age_range,age_range_order,application_status
0,41581,572,Man,African/Black,South African,South Africa,2024-05-11T13:27:10.490Z,2025-03-22T12:03:12.669Z,48,8,9,7,7,7,10,,2025-03-30T19:03:46.734Z,False,Applied,[null],,,,,,,,,,,,,,,,Umuzi Programmes Application Form,"[""Full Stack Web Development""]",,,[null],,,6-20km,Mobile phone (using mobile data),Urban,"I have reliable internet (wifi/data), more tha...",,,,,,,I have my own computer,,,Sepedi,,,,3.0,,Seeking employment,,"I can read, write and speak English easily",English,High School,,,,,,,Pretoria Technical High School,Diploma,"[""Full Stack Web Development""]",,,27,26-35,3,Applied
1,73815,857,Woman,African/Black,South African,South Africa,2024-05-12T18:55:56.670Z,2025-07-21T07:32:00.689Z,51,9,8,9,9,8,8,,2024-05-21T21:04:48.489Z,True,Applied,"[""C++"",""HTML/CSS"",""JavaScript"",""Python""]",,,,,,,,,,,,Self-Taught with Limited Projects: I am primar...,,False,,Umuzi Programmes Application Form,"[""Full Stack Web Development"",""Software Engine...",,,[null],,,6-20km,"Home broadband or Wifi (e.g., ADSL, fiber)",Township,"I have reliable internet (wifi/data), more tha...",,,,,,,I have my own computer but it is faulty,,,siSwati,,,,3.0,,Seeking employment,,"I can read, write and speak English easily",English,High School,,,,,,,Sybrand van Niekerk High School,National Senior Certificate,"[""Full Stack Web Development"",""Software Engine...",,,26,26-35,3,Applied
2,46124,908,Man,African/Black,South African,South Africa,2024-05-13T03:47:02.685Z,2025-04-15T22:45:48.117Z,43,9,7,6,3,9,9,,2025-04-16T11:19:07.408Z,False,Applied,[null],,,,,,,,,,,,,,,,Umuzi Programmes Application Form,"[""Data Science "",""Full Stack Web Development"",...",,,[null],,,6-20km,Mobile phone (using mobile data),Township,"I have reliable internet (wifi/data), more tha...",,,,,,,I have my own computer,,,siSwati,,,,3.0,,Studying,,"I can read, write and speak English easily",English,High School,,,,,,,Thembeka secondary school,beng,"[""Data Science "",""Full Stack Web Development"",...",,,30,26-35,3,Applied
3,254193,928,Man,African/Black,South African,South Africa,2024-05-13T06:12:08.009Z,2025-09-19T13:47:28.466Z,48,9,9,7,9,9,5,,2024-05-13T07:52:28.779Z,True,Applied,"[""HTML/CSS"",""JavaScript"",""Other (please specif...",,,,,,,,,,,,Bootcamp/Intensive Training: I have completed ...,,True,IITSA acredited certificate in React,Umuzi Programmes Application Form,"[""Full Stack Web Development"",""Project Managem...",,,[null],,,0-5km,"Home broadband or Wifi (e.g., ADSL, fiber)",Urban,"I have reliable internet (wifi/data), more tha...",,,,,,True,I have my own computer,,,isiZulu,,,,3.0,,Seeking employment,,"I can read, write and speak English easily",English,Other,,,,,,,mLab Codetribe academy,IITPSA acredited certificate in React javascript,"[""Full Stack Web Development"",""Project Managem...",,,32,26-35,3,Applied
4,76156,1132,Woman,African/Black,South African,South Africa,2024-05-13T14:32:21.448Z,2025-08-03T08:55:31.590Z,57,9,10,9,10,10,9,,2025-08-03T09:12:32.861Z,True,Applied,"[""HTML/CSS"",""Java"",""Python""]",,,,,,,,,,,,Some Exposure: I have taken a few introductory...,,True,FNB APP ACADEMY CERTIFICATE,Umuzi Programmes Application Form,"[""Full Stack Web Development""]",,,[null],,,0-5km,"Home broadband or Wifi (e.g., ADSL, fiber)",Urban,"I have reliable internet (wifi/data), more tha...",,,,,,True,I have my own computer,,,isiZulu,,,,3.0,,Studying,,"I can read, write and speak English easily",English,High School,Information Technology - Computer Science,True,,,University of Kwazulu natal westville campus,"Bachelor’s Degree (e.g., BA, BSc, BCom)",Umzikazi secondary school,i have no qualification yet,"[""Full Stack Web Development""]",,,21,18-25,2,Applied


In [855]:
applications_df.shape

(599, 81)

Initial number of `applications_df` rows is `599`, and `applications_df` columns is `81`.

In [856]:
is_column_values_unique(applications_df, "Application ID")

In [857]:
initial_rows_number = applications_df.shape[0]
initial_columns_number = applications_df.shape[1]

In [858]:
applications_df.duplicated(subset="Application ID").sum()

np.int64(0)

There are no duplicate rows.

In [859]:
total_columns, columns_with_missing_values = get_total_columns_missing_values(
    applications_df
)

In [860]:
empty_columns = []

for column_name in columns_with_missing_values:
    if "is empty." in column_name:
        empty_columns.append(column_name)

f"There are {len(empty_columns)} empty columns:{empty_columns}"

'There are 13 empty columns:["\'Has Taken Aptitude Test\' is empty.", "\'has_completed_application\' is empty.", "\'application_accepted\' is empty.", "\'is_south_african_citizen\' is empty.", "\'learner_referral\' is empty.", "\'other_specify_internet_access_type\' is empty.", "\'other_programming_language_specify\' is empty.", "\'has_criminal_record\' is empty.", "\'nearest_city\' is empty.", "\'is_currently_employed\' is empty.", "\'other_race_specify\' is empty.", "\'asresidential_area_type\' is empty.", "\'career_stream.1\' is empty."]'

Out of 48 columns that have missing values, 13 are empty.

## Manipulate data: 

In [861]:
applications_df = applications_df.dropna(axis=1, how="all")

In [862]:
applications_df.shape

(599, 68)

In [863]:
total_columns, columns_with_missing_values = get_total_columns_missing_values(
    applications_df
)
empty_columns = []

for column_name in columns_with_missing_values:
    if "is empty." in column_name:
        empty_columns.append(column_name)

f"There are {len(empty_columns)} empty columns:{empty_columns}"

'There are 0 empty columns:[]'

In [864]:
applications_df.head()

Unnamed: 0,Application ID,Learner ID,Gender,Race,Nationality,Country of residence,Registration date,Application date,Total score,Literacy score,Logic score,Sequence score,Problem solving score,Numeracy score,Statistics score,Aptitude test date attempted,Has Coding experience,Status,Programming language experince,Code Challenge plagirism,Code challenge time taken(min),Coding challenge completed,Coding challenge score,Code challenge multiple choice score,Code challenge final score,Code challenge cheat tab leaving (no.),Code challenge cheat plagiarism,Code challenge cheat pasted code,Code challenge cheat suspicious activity,Code challenge cheat ai usage,Coding experience description,Other programming language experince specify,Learner holds coding certs,Learner coding certs list,Learner program,Learner career streams,pathways,how_far_away_from_city,internet_access_type,residential_area_type,internet_data_access_type,other_specify_residential_area_type,learner_comfortable_disclosing_income,income_currency,learner_monthly_income,has_yoma_platform,computer_access_type,home_language,has_disability_or_differently_abled,is_refugee,preferred_language_of_instruction,employment_type,english_proficiency,second_language,highest_education_level,field_of_current_education,is_currently_completing_education,ever_regisitered_for_yes4youth,is_umuzi_alumnus,current_institution_study,currently_studying,institution_highest_education,qualification_highest_education,career_stream,age_formula,age_range,age_range_order,application_status
0,41581,572,Man,African/Black,South African,South Africa,2024-05-11T13:27:10.490Z,2025-03-22T12:03:12.669Z,48,8,9,7,7,7,10,2025-03-30T19:03:46.734Z,False,Applied,[null],,,,,,,,,,,,,,,,Umuzi Programmes Application Form,"[""Full Stack Web Development""]",[null],6-20km,Mobile phone (using mobile data),Urban,"I have reliable internet (wifi/data), more tha...",,,,,,I have my own computer,Sepedi,,,3.0,Seeking employment,"I can read, write and speak English easily",English,High School,,,,,,,Pretoria Technical High School,Diploma,"[""Full Stack Web Development""]",27,26-35,3,Applied
1,73815,857,Woman,African/Black,South African,South Africa,2024-05-12T18:55:56.670Z,2025-07-21T07:32:00.689Z,51,9,8,9,9,8,8,2024-05-21T21:04:48.489Z,True,Applied,"[""C++"",""HTML/CSS"",""JavaScript"",""Python""]",,,,,,,,,,,,Self-Taught with Limited Projects: I am primar...,,False,,Umuzi Programmes Application Form,"[""Full Stack Web Development"",""Software Engine...",[null],6-20km,"Home broadband or Wifi (e.g., ADSL, fiber)",Township,"I have reliable internet (wifi/data), more tha...",,,,,,I have my own computer but it is faulty,siSwati,,,3.0,Seeking employment,"I can read, write and speak English easily",English,High School,,,,,,,Sybrand van Niekerk High School,National Senior Certificate,"[""Full Stack Web Development"",""Software Engine...",26,26-35,3,Applied
2,46124,908,Man,African/Black,South African,South Africa,2024-05-13T03:47:02.685Z,2025-04-15T22:45:48.117Z,43,9,7,6,3,9,9,2025-04-16T11:19:07.408Z,False,Applied,[null],,,,,,,,,,,,,,,,Umuzi Programmes Application Form,"[""Data Science "",""Full Stack Web Development"",...",[null],6-20km,Mobile phone (using mobile data),Township,"I have reliable internet (wifi/data), more tha...",,,,,,I have my own computer,siSwati,,,3.0,Studying,"I can read, write and speak English easily",English,High School,,,,,,,Thembeka secondary school,beng,"[""Data Science "",""Full Stack Web Development"",...",30,26-35,3,Applied
3,254193,928,Man,African/Black,South African,South Africa,2024-05-13T06:12:08.009Z,2025-09-19T13:47:28.466Z,48,9,9,7,9,9,5,2024-05-13T07:52:28.779Z,True,Applied,"[""HTML/CSS"",""JavaScript"",""Other (please specif...",,,,,,,,,,,,Bootcamp/Intensive Training: I have completed ...,,True,IITSA acredited certificate in React,Umuzi Programmes Application Form,"[""Full Stack Web Development"",""Project Managem...",[null],0-5km,"Home broadband or Wifi (e.g., ADSL, fiber)",Urban,"I have reliable internet (wifi/data), more tha...",,,,,True,I have my own computer,isiZulu,,,3.0,Seeking employment,"I can read, write and speak English easily",English,Other,,,,,,,mLab Codetribe academy,IITPSA acredited certificate in React javascript,"[""Full Stack Web Development"",""Project Managem...",32,26-35,3,Applied
4,76156,1132,Woman,African/Black,South African,South Africa,2024-05-13T14:32:21.448Z,2025-08-03T08:55:31.590Z,57,9,10,9,10,10,9,2025-08-03T09:12:32.861Z,True,Applied,"[""HTML/CSS"",""Java"",""Python""]",,,,,,,,,,,,Some Exposure: I have taken a few introductory...,,True,FNB APP ACADEMY CERTIFICATE,Umuzi Programmes Application Form,"[""Full Stack Web Development""]",[null],0-5km,"Home broadband or Wifi (e.g., ADSL, fiber)",Urban,"I have reliable internet (wifi/data), more tha...",,,,,True,I have my own computer,isiZulu,,,3.0,Studying,"I can read, write and speak English easily",English,High School,Information Technology - Computer Science,True,,,University of Kwazulu natal westville campus,"Bachelor’s Degree (e.g., BA, BSc, BCom)",Umzikazi secondary school,i have no qualification yet,"[""Full Stack Web Development""]",21,18-25,2,Applied


In [865]:
lower_columns = []

for column_name in applications_df.columns:
    if column_name == "Total score":
        column_name = "Aptitude Total Score"
    elif column_name == "Programming language experince":
        column_name = "Programming language experience"
    elif column_name == "Other programming language experince specify":
        column_name = "other_programming_language_experience_specify"

    lower_columns.append(column_name.lower().replace(" ", "_"))

applications_df.columns = lower_columns
applications_df.head(0)

Unnamed: 0,application_id,learner_id,gender,race,nationality,country_of_residence,registration_date,application_date,aptitude_total_score,literacy_score,logic_score,sequence_score,problem_solving_score,numeracy_score,statistics_score,aptitude_test_date_attempted,has_coding_experience,status,programming_language_experience,code_challenge_plagirism,code_challenge_time_taken(min),coding_challenge_completed,coding_challenge_score,code_challenge_multiple_choice_score,code_challenge_final_score,code_challenge_cheat_tab_leaving_(no.),code_challenge_cheat_plagiarism,code_challenge_cheat_pasted_code,code_challenge_cheat_suspicious_activity,code_challenge_cheat_ai_usage,coding_experience_description,other_programming_language_experience_specify,learner_holds_coding_certs,learner_coding_certs_list,learner_program,learner_career_streams,pathways,how_far_away_from_city,internet_access_type,residential_area_type,internet_data_access_type,other_specify_residential_area_type,learner_comfortable_disclosing_income,income_currency,learner_monthly_income,has_yoma_platform,computer_access_type,home_language,has_disability_or_differently_abled,is_refugee,preferred_language_of_instruction,employment_type,english_proficiency,second_language,highest_education_level,field_of_current_education,is_currently_completing_education,ever_regisitered_for_yes4youth,is_umuzi_alumnus,current_institution_study,currently_studying,institution_highest_education,qualification_highest_education,career_stream,age_formula,age_range,age_range_order,application_status


In [866]:
applications_df.columns.duplicated().sum()

np.int64(0)

In [867]:
mandatory_columns = (
    "application_id",
    "learner_id",
    "gender",
    "race",
    "nationality",
    "country_of_residence",
    "registration_date",
    "application_date",
    "aptitude_total_score",
    "qualification_highest_education",
    "currently_studying",
    "field_of_current_education",
    "highest_education_level",
    "programming_language_experience",
    "other_programming_language_experience_specify",
    "has_coding_experience",
    "numeracy_score",
    "logic_score",
    "problem_solving_score",
    "sequence_score",
    "literacy_score",
    "statistics_score",
    "learner_holds_coding_certs",
    "learner_coding_certs_list",
    "learner_program",
    "learner_career_streams",
    "career_stream",
    "age_formula",
    "age_range",
    "age_range_order",
    "employment_type",
    "english_proficiency",
    "computer_access_type",
)

In [868]:
non_mandatory = []

for column_name in applications_df.columns.to_list():
    if str(column_name) not in mandatory_columns:
        non_mandatory.append(column_name)

non_mandatory

['aptitude_test_date_attempted',
 'status',
 'code_challenge_plagirism',
 'code_challenge_time_taken(min)',
 'coding_challenge_completed',
 'coding_challenge_score',
 'code_challenge_multiple_choice_score',
 'code_challenge_final_score',
 'code_challenge_cheat_tab_leaving_(no.)',
 'code_challenge_cheat_plagiarism',
 'code_challenge_cheat_pasted_code',
 'code_challenge_cheat_suspicious_activity',
 'code_challenge_cheat_ai_usage',
 'coding_experience_description',
 'pathways',
 'how_far_away_from_city',
 'internet_access_type',
 'residential_area_type',
 'internet_data_access_type',
 'other_specify_residential_area_type',
 'learner_comfortable_disclosing_income',
 'income_currency',
 'learner_monthly_income',
 'has_yoma_platform',
 'home_language',
 'has_disability_or_differently_abled',
 'is_refugee',
 'preferred_language_of_instruction',
 'second_language',
 'is_currently_completing_education',
 'ever_regisitered_for_yes4youth',
 'is_umuzi_alumnus',
 'current_institution_study',
 'inst

In [869]:
applications_df.head()

Unnamed: 0,application_id,learner_id,gender,race,nationality,country_of_residence,registration_date,application_date,aptitude_total_score,literacy_score,logic_score,sequence_score,problem_solving_score,numeracy_score,statistics_score,aptitude_test_date_attempted,has_coding_experience,status,programming_language_experience,code_challenge_plagirism,code_challenge_time_taken(min),coding_challenge_completed,coding_challenge_score,code_challenge_multiple_choice_score,code_challenge_final_score,code_challenge_cheat_tab_leaving_(no.),code_challenge_cheat_plagiarism,code_challenge_cheat_pasted_code,code_challenge_cheat_suspicious_activity,code_challenge_cheat_ai_usage,coding_experience_description,other_programming_language_experience_specify,learner_holds_coding_certs,learner_coding_certs_list,learner_program,learner_career_streams,pathways,how_far_away_from_city,internet_access_type,residential_area_type,internet_data_access_type,other_specify_residential_area_type,learner_comfortable_disclosing_income,income_currency,learner_monthly_income,has_yoma_platform,computer_access_type,home_language,has_disability_or_differently_abled,is_refugee,preferred_language_of_instruction,employment_type,english_proficiency,second_language,highest_education_level,field_of_current_education,is_currently_completing_education,ever_regisitered_for_yes4youth,is_umuzi_alumnus,current_institution_study,currently_studying,institution_highest_education,qualification_highest_education,career_stream,age_formula,age_range,age_range_order,application_status
0,41581,572,Man,African/Black,South African,South Africa,2024-05-11T13:27:10.490Z,2025-03-22T12:03:12.669Z,48,8,9,7,7,7,10,2025-03-30T19:03:46.734Z,False,Applied,[null],,,,,,,,,,,,,,,,Umuzi Programmes Application Form,"[""Full Stack Web Development""]",[null],6-20km,Mobile phone (using mobile data),Urban,"I have reliable internet (wifi/data), more tha...",,,,,,I have my own computer,Sepedi,,,3.0,Seeking employment,"I can read, write and speak English easily",English,High School,,,,,,,Pretoria Technical High School,Diploma,"[""Full Stack Web Development""]",27,26-35,3,Applied
1,73815,857,Woman,African/Black,South African,South Africa,2024-05-12T18:55:56.670Z,2025-07-21T07:32:00.689Z,51,9,8,9,9,8,8,2024-05-21T21:04:48.489Z,True,Applied,"[""C++"",""HTML/CSS"",""JavaScript"",""Python""]",,,,,,,,,,,,Self-Taught with Limited Projects: I am primar...,,False,,Umuzi Programmes Application Form,"[""Full Stack Web Development"",""Software Engine...",[null],6-20km,"Home broadband or Wifi (e.g., ADSL, fiber)",Township,"I have reliable internet (wifi/data), more tha...",,,,,,I have my own computer but it is faulty,siSwati,,,3.0,Seeking employment,"I can read, write and speak English easily",English,High School,,,,,,,Sybrand van Niekerk High School,National Senior Certificate,"[""Full Stack Web Development"",""Software Engine...",26,26-35,3,Applied
2,46124,908,Man,African/Black,South African,South Africa,2024-05-13T03:47:02.685Z,2025-04-15T22:45:48.117Z,43,9,7,6,3,9,9,2025-04-16T11:19:07.408Z,False,Applied,[null],,,,,,,,,,,,,,,,Umuzi Programmes Application Form,"[""Data Science "",""Full Stack Web Development"",...",[null],6-20km,Mobile phone (using mobile data),Township,"I have reliable internet (wifi/data), more tha...",,,,,,I have my own computer,siSwati,,,3.0,Studying,"I can read, write and speak English easily",English,High School,,,,,,,Thembeka secondary school,beng,"[""Data Science "",""Full Stack Web Development"",...",30,26-35,3,Applied
3,254193,928,Man,African/Black,South African,South Africa,2024-05-13T06:12:08.009Z,2025-09-19T13:47:28.466Z,48,9,9,7,9,9,5,2024-05-13T07:52:28.779Z,True,Applied,"[""HTML/CSS"",""JavaScript"",""Other (please specif...",,,,,,,,,,,,Bootcamp/Intensive Training: I have completed ...,,True,IITSA acredited certificate in React,Umuzi Programmes Application Form,"[""Full Stack Web Development"",""Project Managem...",[null],0-5km,"Home broadband or Wifi (e.g., ADSL, fiber)",Urban,"I have reliable internet (wifi/data), more tha...",,,,,True,I have my own computer,isiZulu,,,3.0,Seeking employment,"I can read, write and speak English easily",English,Other,,,,,,,mLab Codetribe academy,IITPSA acredited certificate in React javascript,"[""Full Stack Web Development"",""Project Managem...",32,26-35,3,Applied
4,76156,1132,Woman,African/Black,South African,South Africa,2024-05-13T14:32:21.448Z,2025-08-03T08:55:31.590Z,57,9,10,9,10,10,9,2025-08-03T09:12:32.861Z,True,Applied,"[""HTML/CSS"",""Java"",""Python""]",,,,,,,,,,,,Some Exposure: I have taken a few introductory...,,True,FNB APP ACADEMY CERTIFICATE,Umuzi Programmes Application Form,"[""Full Stack Web Development""]",[null],0-5km,"Home broadband or Wifi (e.g., ADSL, fiber)",Urban,"I have reliable internet (wifi/data), more tha...",,,,,True,I have my own computer,isiZulu,,,3.0,Studying,"I can read, write and speak English easily",English,High School,Information Technology - Computer Science,True,,,University of Kwazulu natal westville campus,"Bachelor’s Degree (e.g., BA, BSc, BCom)",Umzikazi secondary school,i have no qualification yet,"[""Full Stack Web Development""]",21,18-25,2,Applied


In [870]:
not_applied = []

for row in applications_df["status"]:
    if row != "Applied":
        not_applied.append(row)

not_applied

[]

In [871]:
not_null = []

for row_number, row_item in enumerate(applications_df["english_proficiency"]):
    if row_item != "I can read, write and speak English easily":
        not_null.append({row_number: row_item})

not_null

[{20: 'I can read, write and speak English but it isnt easy'},
 {112: nan},
 {160: 'I can read, write and speak English but it isnt easy'},
 {193: 'I can read, write and speak English but it isnt easy'},
 {196: 'I can read, write and speak English but it isnt easy'},
 {200: 'I can read, write and speak English but it isnt easy'},
 {320: 'I struggle to read, write and speak english'},
 {323: 'I can read, write and speak English but it isnt easy'},
 {330: 'I can read, write and speak English but it isnt easy'},
 {341: 'I can read, write and speak English but it isnt easy'},
 {352: 'I can read, write and speak English but it isnt easy'},
 {444: 'I can read, write and speak English but it isnt easy'},
 {451: 'I can read, write and speak English but it isnt easy'},
 {497: 'I can read, write and speak English but its difficult'},
 {564: 'I can read, write and speak English but it isnt easy'},
 {574: 'I can read, write and speak English but it isnt easy'},
 {588: 'I can read, write and speak 

In [872]:
for column_name in non_mandatory:
    applications_df = applications_df.drop(columns=[column_name])

applications_df.head(5)

Unnamed: 0,application_id,learner_id,gender,race,nationality,country_of_residence,registration_date,application_date,aptitude_total_score,literacy_score,logic_score,sequence_score,problem_solving_score,numeracy_score,statistics_score,has_coding_experience,programming_language_experience,other_programming_language_experience_specify,learner_holds_coding_certs,learner_coding_certs_list,learner_program,learner_career_streams,computer_access_type,employment_type,english_proficiency,highest_education_level,field_of_current_education,currently_studying,qualification_highest_education,career_stream,age_formula,age_range,age_range_order
0,41581,572,Man,African/Black,South African,South Africa,2024-05-11T13:27:10.490Z,2025-03-22T12:03:12.669Z,48,8,9,7,7,7,10,False,[null],,,,Umuzi Programmes Application Form,"[""Full Stack Web Development""]",I have my own computer,Seeking employment,"I can read, write and speak English easily",High School,,,Diploma,"[""Full Stack Web Development""]",27,26-35,3
1,73815,857,Woman,African/Black,South African,South Africa,2024-05-12T18:55:56.670Z,2025-07-21T07:32:00.689Z,51,9,8,9,9,8,8,True,"[""C++"",""HTML/CSS"",""JavaScript"",""Python""]",,False,,Umuzi Programmes Application Form,"[""Full Stack Web Development"",""Software Engine...",I have my own computer but it is faulty,Seeking employment,"I can read, write and speak English easily",High School,,,National Senior Certificate,"[""Full Stack Web Development"",""Software Engine...",26,26-35,3
2,46124,908,Man,African/Black,South African,South Africa,2024-05-13T03:47:02.685Z,2025-04-15T22:45:48.117Z,43,9,7,6,3,9,9,False,[null],,,,Umuzi Programmes Application Form,"[""Data Science "",""Full Stack Web Development"",...",I have my own computer,Studying,"I can read, write and speak English easily",High School,,,beng,"[""Data Science "",""Full Stack Web Development"",...",30,26-35,3
3,254193,928,Man,African/Black,South African,South Africa,2024-05-13T06:12:08.009Z,2025-09-19T13:47:28.466Z,48,9,9,7,9,9,5,True,"[""HTML/CSS"",""JavaScript"",""Other (please specif...",,True,IITSA acredited certificate in React,Umuzi Programmes Application Form,"[""Full Stack Web Development"",""Project Managem...",I have my own computer,Seeking employment,"I can read, write and speak English easily",Other,,,IITPSA acredited certificate in React javascript,"[""Full Stack Web Development"",""Project Managem...",32,26-35,3
4,76156,1132,Woman,African/Black,South African,South Africa,2024-05-13T14:32:21.448Z,2025-08-03T08:55:31.590Z,57,9,10,9,10,10,9,True,"[""HTML/CSS"",""Java"",""Python""]",,True,FNB APP ACADEMY CERTIFICATE,Umuzi Programmes Application Form,"[""Full Stack Web Development""]",I have my own computer,Studying,"I can read, write and speak English easily",High School,Information Technology - Computer Science,"Bachelor’s Degree (e.g., BA, BSc, BCom)",i have no qualification yet,"[""Full Stack Web Development""]",21,18-25,2


In [873]:
applications_df.shape

(599, 33)

In [874]:
valid_field_and_level_df = applications_df[
    [
        "application_id",
        "learner_id",
        "learner_holds_coding_certs",
        "learner_coding_certs_list",
        "highest_education_level",
        "qualification_highest_education",
    ]
]

In [875]:
requirements_metric = {
    "requirements": ["Valid field", "Valid level", "Valid aptitude score"],
    "points": [1, 1, 1],
}

requirements_metric_df = pd.DataFrame(requirements_metric)
requirements_metric_df.head()

Unnamed: 0,requirements,points
0,Valid field,1
1,Valid level,1
2,Valid aptitude score,1


In [876]:
valid_fields = [
    "Data",
    "math",
    "analytical",
    "disciplines",
    "data science",
    "data analytics",
    "data mining",
    "business analytics",
    "information systems",
    "information science",
    "statistics",
    "mathematics",
    "applied mathematics",
    "quantitative",
    "actuarial science",
    "econometrics",
    "economics",
    "finance",
    "management accounting",
    "computer science",
    "software engineering",
    "IT",
    "information technology",
    "ICT",
    "informatics",
    "web development",
    "programming",
    "systems development",
    "engineering",
    "computer engineering",
    "electrical engineering",
    "mechanical engineering",
    "science",
    "physical science",
    "Life science",
    "physics",
    "biology",
    "chemistry",
    "biochemistry",
    "biomedicine",
    "environmental science",
    "machine learning",
    "AI",
    "big data",
    "bioinformatics",
    "cybersecurity",
    "DevOps",
]

valid_fields = [field.lower() for field in valid_fields]
len(valid_fields)

47

In [877]:
valid_levels = (
    "bachelor",
    "bsc",
    "bcom",
    "honours",
    "postgraduate diploma",
    "advanced diploma",
    "pgd",
    "master's",
    "phd",
    "doctorate",
    "mba",
    "diploma",
    "higher certificate",
    "certificate",
    "short course",
    "online certification",
    "university undergraduate",
    "nqf level 5",
    "nqf level 6",
    "nqf level 7",
    "nqf level 8",
    "nqf 5",
    "nqf 6",
    "nqf 7",
    "nqf 8",
)
valid_levels

('bachelor',
 'bsc',
 'bcom',
 'honours',
 'postgraduate diploma',
 'advanced diploma',
 'pgd',
 "master's",
 'phd',
 'doctorate',
 'mba',
 'diploma',
 'higher certificate',
 'certificate',
 'short course',
 'online certification',
 'university undergraduate',
 'nqf level 5',
 'nqf level 6',
 'nqf level 7',
 'nqf level 8',
 'nqf 5',
 'nqf 6',
 'nqf 7',
 'nqf 8')

In [878]:
valid_field_and_level_df.head()

Unnamed: 0,application_id,learner_id,learner_holds_coding_certs,learner_coding_certs_list,highest_education_level,qualification_highest_education
0,41581,572,,,High School,Diploma
1,73815,857,False,,High School,National Senior Certificate
2,46124,908,,,High School,beng
3,254193,928,True,IITSA acredited certificate in React,Other,IITPSA acredited certificate in React javascript
4,76156,1132,True,FNB APP ACADEMY CERTIFICATE,High School,i have no qualification yet


In [879]:
eligibility_columns = {
    "has_valid_field": [],
    "has_valid_level": [],
    "achieved_minimum_aptitude_score": [],
}

for row_number, qualification in enumerate(
    valid_field_and_level_df["qualification_highest_education"]
):
    qualification = str(qualification).lower()
    for field in valid_fields:
        previous_length = len(eligibility_columns["has_valid_field"])
        if field in qualification:
            eligibility_columns["has_valid_field"].append(("True", "Eligible field"))
            break

    if len(eligibility_columns["has_valid_field"]) == previous_length:
        eligibility_columns["has_valid_field"].append(
            ("False", f"Invalid field: {qualification}")
        )

for row_number, qualification in enumerate(
    valid_field_and_level_df["learner_coding_certs_list"]
):
    qualification = str(qualification).lower()
    for field in valid_fields:
        if field in qualification:
            eligibility_columns["has_valid_field"] = ("True", "Eligible field")
            break

    if "False" in eligibility_columns["has_valid_field"][row_number]:
        status, reason = eligibility_columns["has_valid_field"][row_number]
        eligibility_columns["has_valid_field"][row_number] = (
            status,
            reason,
            f"Invalid field: {qualification}",
        )


for row_number, qualification in enumerate(
    valid_field_and_level_df["qualification_highest_education"]
):
    qualification = str(qualification).lower()
    for level in valid_levels:
        previous_length = len(eligibility_columns["has_valid_level"])
        if level in qualification:
            eligibility_columns["has_valid_level"].append(("True", "Eligible level"))
            break

    if len(eligibility_columns["has_valid_level"]) == previous_length:
        eligibility_columns["has_valid_level"].append(
            ("False", f"Invalid level: {qualification}")
        )

for row_number, applicant_level in enumerate(
    valid_field_and_level_df["learner_coding_certs_list"]
):
    applicant_level = str(applicant_level).lower()
    for level in valid_levels:
        if level in applicant_level:
            eligibility_columns["has_valid_level"][row_number] = (
                "True",
                "Eligible level",
            )
            break
    if "False" in eligibility_columns["has_valid_level"][row_number]:
        status, reason = eligibility_columns["has_valid_level"][row_number]
        eligibility_columns["has_valid_level"][row_number] = (
            status,
            reason,
            f"Invalid level: {applicant_level}",
        )

for row_number, applicant_level in enumerate(
    valid_field_and_level_df["highest_education_level"]
):
    applicant_level = str(applicant_level).lower()
    for level in valid_levels:
        if level in applicant_level:
            eligibility_columns["has_valid_level"][row_number] = (
                "True",
                "Eligible level",
            )
            break
    if "False" in eligibility_columns["has_valid_level"][row_number]:
        status, reason_1, reason_2 = eligibility_columns["has_valid_level"][row_number]
        eligibility_columns["has_valid_level"][row_number] = (
            status,
            reason_1,
            reason_2,
            f"Invalid level: {applicant_level}",
        )

# len(list(eligibility_reasons.values())[0])
# eligibility_reasons
len(list(eligibility_columns.values())[0])
# len(valid_field_and_level_df["qualification_highest_education"])
# eligibility_columns

# "bsc" in valid_levels

IndexError: tuple index out of range

In [None]:
applications_df.head()

Unnamed: 0,application_id,learner_id,gender,race,nationality,country_of_residence,registration_date,application_date,aptitude_total_score,literacy_score,logic_score,sequence_score,problem_solving_score,numeracy_score,statistics_score,has_coding_experience,programming_language_experience,other_programming_language_experience_specify,learner_holds_coding_certs,learner_coding_certs_list,learner_program,learner_career_streams,computer_access_type,employment_type,english_proficiency,highest_education_level,field_of_current_education,currently_studying,qualification_highest_education,career_stream,age_formula,age_range,age_range_order
0,41581,572,Man,African/Black,South African,South Africa,2024-05-11T13:27:10.490Z,2025-03-22T12:03:12.669Z,48,8,9,7,7,7,10,False,[null],,,,Umuzi Programmes Application Form,"[""Full Stack Web Development""]",I have my own computer,Seeking employment,"I can read, write and speak English easily",High School,,,Diploma,"[""Full Stack Web Development""]",27,26-35,3
1,73815,857,Woman,African/Black,South African,South Africa,2024-05-12T18:55:56.670Z,2025-07-21T07:32:00.689Z,51,9,8,9,9,8,8,True,"[""C++"",""HTML/CSS"",""JavaScript"",""Python""]",,False,,Umuzi Programmes Application Form,"[""Full Stack Web Development"",""Software Engine...",I have my own computer but it is faulty,Seeking employment,"I can read, write and speak English easily",High School,,,National Senior Certificate,"[""Full Stack Web Development"",""Software Engine...",26,26-35,3
2,46124,908,Man,African/Black,South African,South Africa,2024-05-13T03:47:02.685Z,2025-04-15T22:45:48.117Z,43,9,7,6,3,9,9,False,[null],,,,Umuzi Programmes Application Form,"[""Data Science "",""Full Stack Web Development"",...",I have my own computer,Studying,"I can read, write and speak English easily",High School,,,beng,"[""Data Science "",""Full Stack Web Development"",...",30,26-35,3
3,254193,928,Man,African/Black,South African,South Africa,2024-05-13T06:12:08.009Z,2025-09-19T13:47:28.466Z,48,9,9,7,9,9,5,True,"[""HTML/CSS"",""JavaScript"",""Other (please specif...",,True,IITSA acredited certificate in React,Umuzi Programmes Application Form,"[""Full Stack Web Development"",""Project Managem...",I have my own computer,Seeking employment,"I can read, write and speak English easily",Other,,,IITPSA acredited certificate in React javascript,"[""Full Stack Web Development"",""Project Managem...",32,26-35,3
4,76156,1132,Woman,African/Black,South African,South Africa,2024-05-13T14:32:21.448Z,2025-08-03T08:55:31.590Z,57,9,10,9,10,10,9,True,"[""HTML/CSS"",""Java"",""Python""]",,True,FNB APP ACADEMY CERTIFICATE,Umuzi Programmes Application Form,"[""Full Stack Web Development""]",I have my own computer,Studying,"I can read, write and speak English easily",High School,Information Technology - Computer Science,"Bachelor’s Degree (e.g., BA, BSc, BCom)",i have no qualification yet,"[""Full Stack Web Development""]",21,18-25,2


In [None]:
aptitude_df = applications_df[
    [
        "application_id",
        "learner_id",
        "literacy_score",
        "logic_score",
        "sequence_score",
        "problem_solving_score",
        "numeracy_score",
        "statistics_score",
        "aptitude_total_score",
    ]
]
aptitude_df

Unnamed: 0,application_id,learner_id,literacy_score,logic_score,sequence_score,problem_solving_score,numeracy_score,statistics_score,aptitude_total_score
0,41581,572,8,9,7,7,7,10,48
1,73815,857,9,8,9,9,8,8,51
2,46124,908,9,7,6,3,9,9,43
3,254193,928,9,9,7,9,9,5,48
4,76156,1132,9,10,9,10,10,9,57
...,...,...,...,...,...,...,...,...,...
594,58925,73115,7,8,6,8,7,9,45
595,58555,73225,7,8,7,6,8,8,44
596,79130,73590,8,7,7,8,9,7,46
597,45413,74028,9,9,8,9,10,8,53


In [None]:
score_thresholds = {
    "literacy_score": 5,
    "statistics_score": 5,
    "numeracy_score": 6,
    "logic_score": 6,
    "problem_solving_score": 6,
    "sequence_score": 6,
}
# Initialize status list
aptitude_status = eligibility_columns["achieved_minimum_aptitude_score"]

# Iterate through each row
for row_index, row in aptitude_df.iterrows():
    passed_aptitude = "False"

    for column_name, threshold in score_thresholds.items():
        if column_name in row and row[column_name] < threshold:
            passed_aptitude = ("False", f"Low {column_name}")
            break
        else:
            passed_aptitude = ("True", "Eligible score")

    aptitude_status.append(passed_aptitude)

In [None]:
# application_results_df = applications_df[["application_id", "learner_id"]]

met_requirements_points = {"met_requirements_points": []}

for row_index in range(len(list(eligibility_columns.values())[0])):
    met_requirements_points["met_requirements_points"].append(0)

for eligibility_column, status_list in eligibility_columns.items():
    for row_number, status in enumerate(status_list):
        if "True" in status:
            met_requirements_points["met_requirements_points"][row_number] += 1

In [None]:
eligibility_conditions = {"eligibility_conditions": []}

for points in met_requirements_points.values():
    for point in points:
        if point == 0:
            eligibility_conditions["eligibility_conditions"].append("3 missing issues")
        elif point == 1:
            eligibility_conditions["eligibility_conditions"].append("2 missing issues")
        elif point == 2:
            eligibility_conditions["eligibility_conditions"].append("1 missing issue")
        elif point == 3:
            eligibility_conditions["eligibility_conditions"].append("No missing issues")
eligibility_conditions

{'eligibility_conditions': ['1 missing issue',
  '1 missing issue',
  '3 missing issues',
  'No missing issues',
  '1 missing issue',
  '1 missing issue',
  'No missing issues',
  'No missing issues',
  'No missing issues',
  '2 missing issues',
  'No missing issues',
  '2 missing issues',
  '2 missing issues',
  '2 missing issues',
  'No missing issues',
  '1 missing issue',
  'No missing issues',
  '2 missing issues',
  '2 missing issues',
  'No missing issues',
  '2 missing issues',
  '2 missing issues',
  '1 missing issue',
  '1 missing issue',
  'No missing issues',
  '2 missing issues',
  '2 missing issues',
  '2 missing issues',
  '1 missing issue',
  '1 missing issue',
  'No missing issues',
  '2 missing issues',
  'No missing issues',
  '2 missing issues',
  '2 missing issues',
  '1 missing issue',
  '2 missing issues',
  '1 missing issue',
  '2 missing issues',
  '2 missing issues',
  '1 missing issue',
  '2 missing issues',
  '2 missing issues',
  '1 missing issue',
  '2 mis

In [None]:
eligibility_outcomes = {"eligibility_outcomes": []}

for evaluations in eligibility_conditions.values():
    for evaluation in evaluations:
        if evaluation == "No missing issues":
            eligibility_outcomes["eligibility_outcomes"].append("Eligible")
        elif evaluation in ["1 missing issue", "2 missing issues"]:
            eligibility_outcomes["eligibility_outcomes"].append("Borderline")
        elif evaluation == "3 missing issues":
            eligibility_outcomes["eligibility_outcomes"].append("Not eligible")

In [None]:
eligibility_reasons = {"notes": []}

for eligibility_column, status_list in eligibility_columns.items():
    for row_number, status_and_reasons in enumerate(status_list):
        status, *reasons = status_and_reasons
        note = ""
        for reason in reasons:
            note += reason + ", "
        if note.endswith(", "):
            note = note.rstrip(", ")
        if len(eligibility_reasons["notes"]) < 599:
            eligibility_reasons["notes"].append(note)
        else:
            eligibility_reasons["notes"][row_number] += ", " + note

eligibility_reasons["notes"][2]

'Invalid field: beng, Invalid level: beng, Invalid level: nan, Invalid level: high school, Low problem_solving_score'

In [None]:
application_results_df = applications_df[["application_id", "learner_id"]]
additional_columns = [
    met_requirements_points,
    eligibility_conditions,
    eligibility_outcomes,
    eligibility_reasons,
]

for column in additional_columns:
    application_results_df = pd.concat(
        [application_results_df, pd.DataFrame(column)], axis=1
    )

application_results_df.head()

Unnamed: 0,application_id,learner_id,met_requirements_points,eligibility_conditions,eligibility_outcomes,notes
0,41581,572,2,1 missing issue,Borderline,"Invalid field: diploma, Eligible level, Eligib..."
1,73815,857,2,1 missing issue,Borderline,"Invalid field: national senior certificate, El..."
2,46124,908,0,3 missing issues,Not eligible,"Invalid field: beng, Invalid level: beng, Inva..."
3,254193,928,3,No missing issues,Eligible,"Eligible field, Eligible level, Eligible score"
4,76156,1132,2,1 missing issue,Borderline,"Invalid field: i have no qualification yet, El..."
