# A walk-through of the preliminary data analysis

#### What is the total size of your data?
- 1 file describing passed courses (df_courses): 12 KB, 190 rows
- 7 files describing course evaluations (df_eval_semester_week): 618 KB, 150-600 rows per file

#### What are other properties?
##### df_courses
- Passed courses for MSc Design and Innovation (D&I) during the semesters E23, F24, and E24, plus a few special cases from F25

Each course record contains:
- Semester code
- Course code
- Course name (in Danish)
- Number of MSc D&I students who passed the course in the given semester (masked as "<= 5 personer" when five or fewer D&I students attended)
- Average grade of the MSc D&I students (masked in the same way)

##### df_eval_semester_week
- Evaluations for all courses, covering all students attending each course (not only D&I).
- Grade average of the respondents for each individual evaluation section.

#### Show the fundamental distributions of the data
Scroll down!

### The original idea
The original idea was to show which courses are attended by students from a specific study line, together with each course's popularity, evaluations, and grades within that study line.

### Fitting to the constraints of reality
Because of anonymization, the number of passed students and the average grade in df_courses are mostly masked ("<= 5 personer"), so average grades must be obtained for all students instead of for the specific study line. Secondly, the evaluation datasets we received cover all students in each course. Some of the data we wanted to use is therefore not available, but we can still show and filter which courses are attended by a specific study line, including their popularity.
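
A minimal sketch of how the masked cells could be handled, assuming the mask is always the exact string "<= 5 personer" seen in the df_courses output. The helper name `unmask_numeric` is hypothetical; the three sample rows are copied from the first rows of df_courses.

```python
import pandas as pd

MASK = "<= 5 personer"  # anonymization marker seen in df_courses

def unmask_numeric(df, columns):
    """Return a copy where the given columns are numeric and masked cells are NaN."""
    out = df.copy()
    for col in columns:
        # errors="coerce" turns every non-numeric cell into NaN.
        # Assumption: the mask is the only non-numeric value in these columns.
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out

# First three rows of df_courses, rebuilt in memory
sample = pd.DataFrame({
    "Termin": ["E23", "E23", "E23"],
    "Akt.Kode": ["02162", "02266", "02402"],
    "antal_beståede": [MASK, 8, MASK],
    "gennemsnitskarakter": [MASK, 11, MASK],
})

clean = unmask_numeric(sample, ["antal_beståede", "gennemsnitskarakter"])

# Share of rows where the D&I count is hidden by anonymization
masked_share = clean["antal_beståede"].isna().mean()
```

Running the same helper on the full df_courses would give the actual share of masked rows, which quantifies how much of the study-line-specific data is lost.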

### Initial ideas for alterations
- Translate the data to English, including course names.
- Obtain grades and evaluations from the received xlsx files as overall figures for all study lines.
- Calculate popularity as an average across the semesters.
- Add a column that categorizes the courses as mandatory, semi-mandatory, or elective.
- Reduce the evaluation datasets so they consist only of useful columns.
- Merge the evaluation data files.
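
The popularity item above could be sketched as follows, assuming popularity means the mean number of passed D&I students per course over the semesters where a count is available. How masked semesters should count is an open choice; here they are simply ignored. The F24 and E24 counts are made-up illustration values, and the output column names are hypothetical.

```python
import pandas as pd

# Small stand-in for df_courses; only the E23 rows come from the real data
courses = pd.DataFrame({
    "Termin": ["E23", "F24", "E24", "E23"],
    "Akt.Kode": ["02266", "02266", "02266", "02162"],
    "antal_beståede": [8, 10, 12, "<= 5 personer"],
})

# Masked cells become NaN, which mean() skips by default
courses["passed"] = pd.to_numeric(courses["antal_beståede"], errors="coerce")

popularity = (
    courses.groupby("Akt.Kode")
    .agg(
        semesters_listed=("Termin", "nunique"),  # semesters in which the course appears
        avg_passed=("passed", "mean"),           # mean over unmasked semesters only
    )
    .reset_index()
)
```

A course whose counts are masked in every semester ends up with a missing `avg_passed`; for such courses `semesters_listed` is the only popularity signal left.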

In [24]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## DATA LOAD
df_courses contains the courses passed by MSc Design & Innovation students during the semesters E23, F24, and E24, plus some special cases from F25.

In [3]:
df_courses = pd.read_excel("Kurser_DI.xlsx")

In [4]:
df_courses

Unnamed: 0,Termin,Akt.Kode,Langtnavn,antal_beståede,gennemsnitskarakter
0,E23,02162,02162 Software Engineering 2,<= 5 personer,<= 5 personer
1,E23,02266,02266 User experience engineering,8,11
2,E23,02402,02402 Statistik (Polyteknisk grundlag),<= 5 personer,<= 5 personer
3,E23,02441,02441 Anvendt statistik og statistisk programmel,<= 5 personer,<= 5 personer
4,E23,02455,02455 Eksperimenter i kognitionsvidenskab,<= 5 personer,<= 5 personer
...,...,...,...,...,...
186,F25,E41-35,Institut for Byggeri og Mekanisk Teknologi,<= 5 personer,<= 5 personer
187,F25,S41-05-3,Specialkursus ved Institut for Byggeri og Meka...,<= 5 personer,<= 5 personer
188,F25,S41-05-4,Specialkursus ved Institut for Byggeri og Meka...,<= 5 personer,<= 5 personer
189,F25,S41-10-1,Specialkursus ved Institut for Byggeri og Meka...,<= 5 personer,<= 5 personer


In [5]:
df_eval_E23_W13 = pd.read_excel("Kursuseval E23-13.xlsx") 

In [6]:
df_eval_E23_W13

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Unnamed: 2,jeg har lært meget i dette kursus.,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,...,Unnamed: 35,Unnamed: 36,Unnamed: 37,"5 ECTS point er normeret til 9 arbejdstimer/uge i 13-ugersperioden (45 arbejdstimer/uge i treugers-perioden). \nJeg mener, at den tid jeg har brugt på kurset er",Unnamed: 39,Unnamed: 40,Unnamed: 41,Unnamed: 42,Unnamed: 43,Unnamed: 44
0,Kursusnummer,Kursusnavn,Svarprocent. Antal personer som har svaret i f...,1.1 Inkluderet,1.1 Helt uenig,1.1 Uenig,1.1 Hverken eller,1.1 Enig,1.1 Helt enig,1.1 Gennemsnit,...,1.5 Enig,1.5 Helt enig,1.5 Gennemsnit,2.1 Inkluderet,2.1 Meget mindre,2.1 Noget mindre,2.1 Det samme,2.1 Noget mere,2.1 Meget mere,2.1 Gennemsnit
1,01001,01001 Matematik 1a (Polyteknisk grundlag) E23,63,Ja,,,,,,4.2,...,,,3.7,Ja,,,,,,3.7
2,01003,01003 Matematik 1a (Polyteknisk grundlag) E23,63,Ja,,,,,,4.5,...,,,3.9,Ja,,,,,,3.7
3,01017,01017 Diskret matematik E23,45,Ja,,,,,,3.8,...,,,3.4,Ja,,,,,,3.3
4,01018,01018 Diskret matematik 2: algebra E23,46,Ja,,,,,,4.6,...,,,4.6,Ja,,,,,,3.3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
574,63850,63850 Bæredygtig forretningsudvikling i medici...,61,Ja,,,,,,4.3,...,,,4.1,Ja,,,,,,3
575,63860,63860 System- og netværkssikkerhed E23,27,Ja,,,,,,3.4,...,,,3,Ja,,,,,,3
576,63880,63880 Data analytics i transport med fokus på ...,100,Ja,,,,,,4,...,,,4,Ja,,,,,,3
577,KU010,KU010 Bevægeapparatets Biomekanik E23,22,Ja,,,,,,3.8,...,,,3.7,Ja,,,,,,3
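
In the output above, the real column labels (Kursusnummer, Kursusnavn, 1.1 Gennemsnit, ...) sit in row 0, while the DataFrame header holds question texts and "Unnamed" placeholders. A minimal sketch of how row 0 could be promoted to the header before the files are merged, assuming every evaluation file shares this layout. The function name `promote_header`, the added `semester` and `week` columns, and the trimmed four-column excerpt are hypothetical; the values are taken from the E23-13 output.

```python
import pandas as pd

def promote_header(raw, semester, week):
    """Use row 0 as column names, drop that row, and tag rows with their source file."""
    df = raw.iloc[1:].copy()
    df.columns = raw.iloc[0].astype(str).str.strip().tolist()
    df = df.reset_index(drop=True)
    df["semester"] = semester
    df["week"] = week
    return df

# Trimmed excerpt of df_eval_E23_W13 (the real file has many more columns)
raw = pd.DataFrame(
    [
        ["Kursusnummer", "Kursusnavn", "1.1 Gennemsnit", "2.1 Gennemsnit"],
        ["01001", "01001 Matematik 1a (Polyteknisk grundlag) E23", 4.2, 3.7],
        ["01017", "01017 Diskret matematik E23", 3.8, 3.3],
    ],
    columns=["Unnamed: 0", "Unnamed: 1", "Unnamed: 9", "Unnamed: 44"],
)

tidy = promote_header(raw, semester="E23", week=13)

# The averages are stored as objects after the header row is removed
for col in ["1.1 Gennemsnit", "2.1 Gennemsnit"]:
    tidy[col] = pd.to_numeric(tidy[col], errors="coerce")

# Stand-in for a second file: the same excerpt tagged as another semester
other = promote_header(raw, semester="F24", week=13)
merged = pd.concat([tidy, other], ignore_index=True)
```

If the layout really is identical across the files, passing `header=1` to `pd.read_excel` may give the same column names at load time; that has not been checked against the files here.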
