# A walk-through of the preliminary data analysis

#### What is the total size of your data?
- 1 file describing passed courses (df_courses): 12 KB, 190 rows
- 7 files describing course evaluations (df_eval_semester_week): 618 KB, 150-600 rows per file

#### What are other properties?
##### df_courses
- Passed courses for MSc Design and Innovation (D&I) during the semesters E23, F24, and E24, plus a few special cases for F25

Each course record includes:
- Semester code
- Course code
- Course name (in Danish)
- Number of MSc D&I students who passed in the given semester (masked when five or fewer D&I students attended)
- Average grade of the MSc D&I students (masked when five or fewer D&I students attended)

##### df_eval_semester_week
- Evaluations for all courses, covering all students attending each course (not only one study line).
- Average score given by respondents for each evaluation question.

#### Show the fundamental distributions of the data
Scroll down!

### The original idea
The original idea was to show which courses are attended by a specific study line, including each course's popularity, evaluations, and grades within that study line.

### Fitting to the constraints of reality
Because of anonymization, the study-line-specific columns in df_courses (number of passed students and average grade) are mostly masked with the value `<= 5 personer` (five or fewer persons). Average grades must therefore be obtained for all students rather than for the specific study line. Similarly, the evaluation datasets we received cover all students in each course. Some of the data we wanted to use is therefore unavailable, but we can still show and filter which courses are attended by a specific study line, including their popularity.
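To see how much of df_courses is affected, the masked cells can be turned into missing values and counted. The snippet below is a minimal sketch: the three sample rows mirror rows shown in the df_courses output further down, while the names `sample`, `cleaned`, `MASK`, and the `masked` flag column are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Sample rows mirroring the df_courses output (column names from the Excel file).
sample = pd.DataFrame({
    "Termin": ["E19", "E19", "E20"],
    "Akt.Kode": ["02266", "02633", "02266"],
    "antal_beståede": [14, "<= 5 personer", 13],
    "gennemsnitskarakter": [11.4285714285714, "<= 5 personer", 10.1538461538462],
})

MASK = "<= 5 personer"  # anonymization marker used in the source file
value_cols = ["antal_beståede", "gennemsnitskarakter"]

cleaned = sample.copy()
# Flag anonymized rows before the marker is overwritten.
cleaned["masked"] = cleaned["antal_beståede"].eq(MASK)
for col in value_cols:
    # errors="coerce" turns the marker (and any other non-numeric text) into NaN.
    cleaned[col] = pd.to_numeric(cleaned[col], errors="coerce")

print(cleaned)
print("Share of masked rows:", cleaned["masked"].mean())
```

Keeping the `masked` flag separate from the NaN values makes it possible to tell anonymized rows apart from rows that are missing for other reasons.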

### Initial ideas for alterations
- Translate the data to English, including course names
- Obtain grades and evaluations from the received xlsx files as general numbers for all study lines
- Calculate popularity as an average across the semesters
- Add a column that categorizes the courses as mandatory, semi-mandatory, or elective
- Reduce the evaluation datasets to the useful columns only
- Merge the evaluation data files
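The popularity idea from the list above could be sketched as follows, assuming popularity means the average number of passed D&I students per course across semesters. The sample rows follow the df_courses output shown later; `passed`, `popularity`, `mean_passed`, and `semesters_with_data` are hypothetical names chosen for this illustration.

```python
import pandas as pd

# Sample rows mirroring the df_courses output; masked counts are kept as in the source.
passed = pd.DataFrame({
    "Termin": ["E19", "E19", "E20", "E20"],
    "Akt.Kode": ["02266", "02633", "02266", "02393"],
    "antal_beståede": [14, "<= 5 personer", 13, "<= 5 personer"],
})

# Masked counts become NaN and are ignored by mean() and count().
passed["antal_beståede"] = pd.to_numeric(passed["antal_beståede"], errors="coerce")

popularity = (
    passed.groupby("Akt.Kode")["antal_beståede"]
    .agg(mean_passed="mean", semesters_with_data="count")
    .sort_values("mean_passed", ascending=False)
)
print(popularity)
```

Because masked semesters are dropped from the mean, courses that alternate between masked and unmasked semesters will look more popular than they are. The `semesters_with_data` column is included so that such cases can be spotted.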

In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## DATA LOAD
`df_courses` contains the courses passed by MSc Design & Innovation students during the semesters E23, F24, and E24, plus some special cases from F25.

In [6]:
df_courses = pd.read_excel("DI_courses_2020-24.xlsx")

In [7]:
df_courses

Unnamed: 0,Termin,Akt.Kode,Langtnavn,antal_beståede,gennemsnitskarakter
0,E19,02266,02266 User experience engineering,14,11.4285714285714
1,E19,02633,02633 Introduktion til programmering og databe...,<= 5 personer,<= 5 personer
2,E19,S01-10-1,Specialkursus ved Institut for Matematik og Co...,<= 5 personer,<= 5 personer
3,E20,02266,02266 User experience engineering,13,10.1538461538462
4,E20,02393,02393 Programmering i C++,<= 5 personer,<= 5 personer
...,...,...,...,...,...
611,F25,S41-02-1,Specialkursus ved Institut for Byggeri og Meka...,<= 5 personer,<= 5 personer
612,F25,S41-05-3,Specialkursus ved Institut for Byggeri og Meka...,<= 5 personer,<= 5 personer
613,F25,S41-05-4,Specialkursus ved Institut for Byggeri og Meka...,<= 5 personer,<= 5 personer
614,F25,S41-10-1,Specialkursus ved Institut for Byggeri og Meka...,<= 5 personer,<= 5 personer


In [8]:
df_eval_E23_W13 = pd.read_excel("Evaluations/E20-03.xlsx") 

In [9]:
df_eval_E23_W13

Unnamed: 0,kursusnummer,kursusnavn,svarprocent_antal_personer_som_har_svaret_i_forhold_til_dem_der_kunne_svare,jeg_har_laert_meget_i_dette_kursus_1_1_inkluderet,jeg_har_laert_meget_i_dette_kursus_1_1_helt_uenig,jeg_har_laert_meget_i_dette_kursus_1_1_uenig,jeg_har_laert_meget_i_dette_kursus_1_1_hverken_eller,jeg_har_laert_meget_i_dette_kursus_1_1_enig,jeg_har_laert_meget_i_dette_kursus_1_1_helt_enig,jeg_har_laert_meget_i_dette_kursus_1_1_gennemsnit,...,det_generelt_har_vaeret_klart_for_mig_hvad_der_forventes_af_mig_i_ovelser_projektarbejde_og_lignende_1_5_enig,det_generelt_har_vaeret_klart_for_mig_hvad_der_forventes_af_mig_i_ovelser_projektarbejde_og_lignende_1_5_helt_enig,det_generelt_har_vaeret_klart_for_mig_hvad_der_forventes_af_mig_i_ovelser_projektarbejde_og_lignende_1_5_gennemsnit,x5_ects_point_er_normeret_til_9_arbejdstimer_uge_i_13_ugersperioden_45_arbejdstimer_uge_i_treugers_perioden_jeg_mener_at_den_tid_jeg_har_brugt_pa_kurset_er_2_1_inkluderet,x5_ects_point_er_normeret_til_9_arbejdstimer_uge_i_13_ugersperioden_45_arbejdstimer_uge_i_treugers_perioden_jeg_mener_at_den_tid_jeg_har_brugt_pa_kurset_er_2_1_meget_mindre,x5_ects_point_er_normeret_til_9_arbejdstimer_uge_i_13_ugersperioden_45_arbejdstimer_uge_i_treugers_perioden_jeg_mener_at_den_tid_jeg_har_brugt_pa_kurset_er_2_1_noget_mindre,x5_ects_point_er_normeret_til_9_arbejdstimer_uge_i_13_ugersperioden_45_arbejdstimer_uge_i_treugers_perioden_jeg_mener_at_den_tid_jeg_har_brugt_pa_kurset_er_2_1_det_samme,x5_ects_point_er_normeret_til_9_arbejdstimer_uge_i_13_ugersperioden_45_arbejdstimer_uge_i_treugers_perioden_jeg_mener_at_den_tid_jeg_har_brugt_pa_kurset_er_2_1_noget_mere,x5_ects_point_er_normeret_til_9_arbejdstimer_uge_i_13_ugersperioden_45_arbejdstimer_uge_i_treugers_perioden_jeg_mener_at_den_tid_jeg_har_brugt_pa_kurset_er_2_1_meget_mere,x5_ects_point_er_normeret_til_9_arbejdstimer_uge_i_13_ugersperioden_45_arbejdstimer_uge_i_treugers_perioden_jeg_mener_at_den_tid_jeg_har_brugt_pa_kurset_er_2_1_gennemsnit
0,01125,01125 Topologiske grundbegreber og metriske ru...,48,Ja,,,,,,4.5,...,,,4.0,Ja,,,,,,3.0
1,01257,01257 Videregående modellering - anvendt matem...,5,Ja,,,,,,4.0,...,,,5.0,Ja,,,,,,3.5
2,02121,02121 Introduktion til softwareteknologi E20,30,Ja,,,,,,3.5,...,,,3.1,Ja,,,,,,2.3
3,02148,02148 Introduktion til koordinering af fordelt...,14,Ja,,,,,,3.8,...,,,3.4,Ja,,,,,,3.2
4,02191,02191 Computersikkerhedsundersøgelser Jan 21,21,Ja,,,,,,4.0,...,,,3.3,Ja,,,,,,4.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
153,62739,62739 Elektromagnetiske sensorer og digital si...,11,Ja,,,,,,3.6,...,,,4.3,Ja,,,,,,2.7
154,62780,62780 Projekter i Elektrisk Energi Jan 21,25,Ja,,,,,,2.7,...,,,3.1,Ja,,,,,,3.3
155,62783,62783 Projektarbejde 3. semester Jan 21,11,Ja,,,,,,1.3,...,,,2.0,Ja,,,,,,3.7
156,88383,88383 Akademisk informationssøgning - styrk di...,18,Ja,,,,,,4.6,...,,,5.0,Ja,,,,,,3.0
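The evaluation table is very wide, with one column per answer option for each question. One possible way to reduce it, and to merge several evaluation files, is to keep only the identifier columns plus the `_gennemsnit` (average) columns and then concatenate the files with a period label. This is a sketch under assumptions: `keep_averages`, `evaluation_period`, the period labels, the course names, and the second frame's values are all hypothetical. Only the column naming pattern and the first frame's course numbers and averages come from the output above.

```python
import pandas as pd

def keep_averages(df, label):
    """Keep course identifiers and '_gennemsnit' columns; tag rows with a period label."""
    id_cols = ["kursusnummer", "kursusnavn"]
    avg_cols = [c for c in df.columns if c.endswith("_gennemsnit")]
    out = df[id_cols + avg_cols].copy()
    out.insert(0, "evaluation_period", label)
    return out

Q = "jeg_har_laert_meget_i_dette_kursus_1_1"  # question prefix from the output above

# First frame: course numbers and averages as shown above; names are placeholders.
eval_a = pd.DataFrame({
    "kursusnummer": ["01125", "01257"],
    "kursusnavn": ["Course A", "Course B"],
    f"{Q}_inkluderet": ["Ja", "Ja"],
    f"{Q}_helt_enig": [None, None],
    f"{Q}_gennemsnit": [4.5, 4.0],
})
# Second frame: entirely hypothetical, standing in for another evaluation file.
eval_b = pd.DataFrame({
    "kursusnummer": ["01125"],
    "kursusnavn": ["Course A"],
    f"{Q}_inkluderet": ["Ja"],
    f"{Q}_helt_enig": [None],
    f"{Q}_gennemsnit": [4.2],
})

merged = pd.concat(
    [keep_averages(eval_a, "period_1"), keep_averages(eval_b, "period_2")],
    ignore_index=True,
)
print(merged)
```

Stacking the files in long form with a period column, rather than joining them side by side, keeps the column count fixed however many evaluation files are added.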
