# Data Analysis of UPV Dataset

## 1. Introduce environmental variables

In [3]:
# prepare environmental variables for the project
WORKING_DIRECTORY <- "C:\\Users\\User\\Desktop\\Studia\\ERASMUS\\DATA_ANALYSIS\\"

DATASET_PATH <- "resources\\"
COURSES <- "students_courses.csv"
ENROLMENTS <- "students_enrolment.csv"
YEARS <- "students_years.csv"

OUTPUT <- "output\\"

## 2. Load the datasets

The provided data has been presented in the form of csv file. What is worth noticing - there has been used a ";" separator insted of "," one.


In [4]:
# Load the data 
courses_df <-  read.csv(paste(WORKING_DIRECTORY, DATASET_PATH, COURSES, sep='') ,header=TRUE, sep=';')
enrolments_df <- read.csv(paste(WORKING_DIRECTORY, DATASET_PATH, ENROLMENTS, sep='') ,header=TRUE, sep=';')
years_df <- read.csv(paste(WORKING_DIRECTORY, DATASET_PATH, YEARS, sep='') ,header=TRUE, sep=';')

In [7]:
# preview the data
head(enrolments_df, 3)
head(courses_df, 3)
head(years_df, 3)

Stud_ID,Degree_ID,Start_Year,Enrolment_Type,Admission_Grade,Prev_Studies,Entry_Type,Start_Age,Sex,Father_Edu_Lev,Mother_Edu_Lev,Numpre,NA,NA.1,NA.2
439444,156,2010,NAI,61,J,10,G,,19,V,5,5,6,8
466072,156,2010,NAP,602,S,10,G,,25,V,4,4,0,1
354734,156,2010,NAP,69,J,10,G,,22,V,5,5,0,1


NIP,TIT,ASI,CACA,GLM,CTOT,BLO,TPBLO,COND_ASI,ACTA,NOTA,CAL,TIPCRE
12428,156,11553,2014,RE1,6,3,B,,S,71,N,B
12428,156,11567,2014,RE1,45,3,G,,S,81,N,O
12428,156,11572,2014,RE1,45,4,G,,S,8,N,O


NIP,TIT,CACA,TRABAJO_A,OBTIENE_BECA,ESTADO,CRE_MAT,CRE_SUP,CRE_SUS,CRE_NP,CRE_MAT_CN,CRE_SUP_CN,CRE_SUS_CN,CRE_NP_CN
504943,158,2012,,-,,255,255,0,0,255,255,0,0
504911,158,2012,,-,,255,195,6,0,255,195,6,0
504898,158,2012,,-,,285,195,0,9,285,195,0,9


## 3. Translate the headers
As the data has been privided in spanish, some translations are required in order for the avarege user to understand it. Such translations has been applied:
NT == "No Transtlation"  
    

### 3.1 Students_enrolment.csv
For each student enrolling in one of the selected four degrees, the file includes demographic 
information, admission grade, previous studies, parents’ educational level, etc. It is important 
to point out that it refers only to the first registration in a given degree.

NIP --> Stud_ID - a unique identification number for the student

TIT --> Degree_ID - identification code for the degree on which the student was registered

- 156: Bachelor's Degree in Informatics Engineering (School of Informatics, Valencia)

- 158: Bachelor's Degree in Business Administration and Management (Faculty of Business Administration and Management, Valencia)

- 182: Bachelor's Double Degree in Business Administration and Management + Informatics Engineering (Valencia)

- 189: Bachelor's Degree in Data Science (School of Informatics, Valencia)

ANYCOM --> Start_Year - the student’s starting year in the degree

ING_INGRESO --> Enrolment_Type

ING_NOTA --> Admission_Grade - university admission grade for this degree (ranges depend on the entry 
type)

ING_EST --> Prev_Studies - previous studies

ING_CUPO --> Entry_Type - Entry Type (?)

EDAD --> Start_Age - age at the beginning of the academic year (1st September ANYCOM [Start Year])

SEXO -->  Sex - gender

ESTUDIS_P --> Father_Edu_Lev -  father's educational leve

ESTUDIS_M --> Mother_Edu_Lev - mother's educational level

NUMPRE --> NT (?) - position of the degree in the ordered options of the pre-registration


In [8]:
colnames(enrolments_df) <- c(
    "Stud_ID",
    "Degree_ID",
    "Start_Year",
    "Enrolment_Type",
    "Admission_Grade",
    "Prev_Studies",
    "Entry_Type",
    "Start_Age",
    "Sex",
    "Father_Edu_Lev",
    "Mother_Edu_Lev",
    "Numpre")
head(enrolments_df)

Stud_ID,Degree_ID,Start_Year,Enrolment_Type,Admission_Grade,Prev_Studies,Entry_Type,Start_Age,Sex,Father_Edu_Lev,Mother_Edu_Lev,Numpre,NA,NA.1,NA.2
439444,156,2010,NAI,61,J,10,G,,19,V,5,5,6.0,8
466072,156,2010,NAP,602,S,10,G,,25,V,4,4,0.0,1
354734,156,2010,NAP,69,J,10,G,,22,V,5,5,0.0,1
460803,156,2010,NAP,6833,J,11,G,2014.0,18,V,5,4,60.0,1
460956,156,2010,NSA,9016,J,10,G,,17,V,4,4,,2
460850,156,2010,NAP,9129,J,5,G,,22,V,5,4,12.0,1


### 3.1 Students_courses.csv
For each student enrolling in one of the selected four degrees, and for each academic year, the 
file includes information about each course in which the student is enrolled, its characteristics 
as well as the final grade

NIP --> Stud_ID: a unique identification number for the student

TIT --> Degree_ID identification code for the degree on which the student is registered

ASI --> Course_ID: identification code of the course

CACA --> Academic_Year: academic year

GLM --> Course_Section: section where the student is registered for that course

CTOT --> ECTS: ECTS credits

BLO --> Course_Mod: module to which the course belongs

TPBLO --> Mod_Type: module type to which the course belongs

NOTA --> Final_Grade: final grade as shown on the student transcript

TIPCRE --> Course_Type: course type


In [9]:
head(courses_df)
colnames(courses_df) <- c(
    "Stud_ID",
    "Degree_ID",
    "Course_ID",
    "Academic_Year",
    "Course_Section",
    "ECTS",
    "Course_Mod",
    "Mod_Type",
    "Final_Grade",
    "Course_Type")
head(courses_df)

NIP,TIT,ASI,CACA,GLM,CTOT,BLO,TPBLO,COND_ASI,ACTA,NOTA,CAL,TIPCRE
12428,156,11553,2014,RE1,6,3,B,,S,71,N,B
12428,156,11567,2014,RE1,45,3,G,,S,81,N,O
12428,156,11572,2014,RE1,45,4,G,,S,8,N,O
12428,156,11568,2014,RE1,45,3,G,,S,78,N,O
12428,156,11660,2015,TFG,12,4,B,,-,75,N,G
12428,156,11560,2015,RE1,45,3,B,,S,71,N,B


Stud_ID,Degree_ID,Course_ID,Academic_Year,Course_Section,ECTS,Course_Mod,Mod_Type,Final_Grade,Course_Type,NA,NA.1,NA.2
12428,156,11553,2014,RE1,6,3,B,,S,71,N,B
12428,156,11567,2014,RE1,45,3,G,,S,81,N,O
12428,156,11572,2014,RE1,45,4,G,,S,8,N,O
12428,156,11568,2014,RE1,45,3,G,,S,78,N,O
12428,156,11660,2015,TFG,12,4,B,,-,75,N,G
12428,156,11560,2015,RE1,45,3,B,,S,71,N,B


## Students_years.csv
For each student enrolling in one of the selected four degrees, and for each academic year, the 
file includes information about the total number of registered credits, as well as passed, failed 
and no-show credits. It also contains some additional socio-economical information.
Columns are:

NIP --> Stud_ID: a unique identification number for the student

TIT --> Degree_ID: identification code for the degree on which the student is registered

CACA --> Academic_Year: academic year

TRABAJO_A --> Stud_Job: has the student a job (part time, full-time)?

OBTIENE_BECA --> Stud_Scholarship has the student obtained a scholarship?

ESTADO --> Is_Reg_Open: is the registration active?

CRE_MAT --> Credits_Total: total number of registered credits on that academic year

CRE_SUP --> Credits_Passed_Total: total number of passed credits on that academic year

CRE_SUS --> Credits_Failed_Total: total number of failed credits on that academic year

CRE_NP --> Credits_No_Show_Total: total number of no-show credits on that academic year

CRE_MAT_CN --> Credits_Normal_Reg_Total total number of registered credits in “normal conditions”, excluding recognitions, adaptations…on that academic 

CRE_SUP_CN --> Credits_Passed_Normal: total number of passed credits in “normal conditions”

CRE_SUS_CN --> Credits_Failed_Normal: total number of failed credits in “normal conditions”

CRE_NP_CN --> Credits_No_Show_Normal total number of no-show credits in “normal conditions

In [11]:
head(years_df)
colnames(years_df) <- c(
    "Stud_ID",
    "Degree_ID",
    "Academic_Year",
    "Stud_Job",
    "Stud_Scholarship",
    "Is_Reg_Open",
    "Credits_Total",
    "Credits_Passed_Total",
    "Credits_Failed_Total",
    "Credits_No_Show_Total",
    "Credits_Normal_Reg_Total",
    "Credits_Passed_Normal",
    "Credits_Failed_Normal",
    "Credits_No_Show_Normal")
head(years_df)

NIP,TIT,CACA,TRABAJO_A,OBTIENE_BECA,ESTADO,CRE_MAT,CRE_SUP,CRE_SUS,CRE_NP,CRE_MAT_CN,CRE_SUP_CN,CRE_SUS_CN,CRE_NP_CN
504943,158,2012,,-,,255,255,0,0,255,255,0,0
504911,158,2012,,-,,255,195,6,0,255,195,6,0
504898,158,2012,,-,,285,195,0,9,285,195,0,9
504997,158,2012,,-,,15,0,9,6,15,0,9,6
505000,158,2012,,-,,15,15,0,0,15,15,0,0
504999,158,2012,,-,,27,225,0,45,27,225,0,45


Stud_ID,Degree_ID,Academic_Year,Stud_Job,Stud_Scholarship,Is_Reg_Open,Credits_Total,Credits_Passed_Total,Credits_Failed_Total,Credits_No_Show_Total,Credits_Normal_Reg_Total,Credits_Passed_Normal,Credits_Failed_Normal,Credits_No_Show_Normal
504943,158,2012,,-,,255,255,0,0,255,255,0,0
504911,158,2012,,-,,255,195,6,0,255,195,6,0
504898,158,2012,,-,,285,195,0,9,285,195,0,9
504997,158,2012,,-,,15,0,9,6,15,0,9,6
505000,158,2012,,-,,15,15,0,0,15,15,0,0
504999,158,2012,,-,,27,225,0,45,27,225,0,45
