# <center>Introduction to Curricular Simulations</center>

<center>
    <b>Gregory L. Heileman$^\dagger$, Jiacheng Zhang$^\ddagger$ and Hayden W. Free$^\bullet$</b> <br>
    $^\dagger$Department of Electrical & Computer Engineering <br>
    University of Arizona <br>
    heileman@arizona.edu <br>
    $^\ddagger$Department of Computer Science <br>
    jiachengzhang1@arizona.edu <br>
    University of Arizona<br>
    $^\bullet$Department of Computer Science <br>
    hayden.free@uky.edu <br>
    University of Kentucky
</center>

## 1. Introduction

This notebook demonstrates how to use the simulation capabilities that are included as part of the [CurricularAnalytics toolbox](https://github.com/CurricularAnalytics/CurricularAnalytics.jl). If you would like to become more familiar with the notions behind curricular analytics, we suggest you read <cite data-cite="he:18">Heileman et al. (2018)</cite>, and also examine the Introduction to the Curricular Analytics Toolbox notebook that accompanies this notebook.

The simulation capabilities include the ability to simulate the flow of students through a curriculum, towards graduation, using discrete event simulation. Specifically, a population of students attempts to complete the selected curriculum by taking courses in the order prescribed by the curriculum. At each step (semester) of the simulation, a given student enrolls in a set of courses, earning either a passing or failing grade in each. At the end of a given semester, if a student has passed all of the courses in the curriculum, they are deemed a graduate. If a student has not yet graduated, then they may stop out (according to a prescribed stop-out model) or enroll in the next set of courses available to them. One of the intended uses of these simulation capabilities is to estimate the impact that particular curricular changes or instructional improvements will have on student progress.
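To make the loop described above concrete, here is a minimal, self-contained sketch of a term-by-term cohort simulation. It is not the toolbox's implementation: the function `toy_simulate`, its keyword arguments, the fixed per-term course load, and the constant stop-out probability are all assumptions chosen for illustration. As in the reports later in this notebook, a student who exhausts the allowed attempts on a course is counted as stopped out.

```julia
using Random

# Toy cohort simulation; every name here is illustrative, not toolbox API.
# Each term a student attempts up to `load` not-yet-passed courses, in order.
function toy_simulate(rng, n_students, passrates; load=5, attempt_limit=2,
                      stopout_prob=0.05, num_terms=12)
    n_courses = length(passrates)
    graduated = 0
    stopped_out = 0
    for _ in 1:n_students
        passed = falses(n_courses)
        attempts = zeros(Int, n_courses)
        status = :enrolled
        for term in 1:num_terms
            # Enroll in the next `load` courses that have not been passed yet.
            todo = findall(!, passed)
            for c in todo[1:min(load, length(todo))]
                attempts[c] += 1
                passed[c] = rand(rng) < passrates[c]
            end
            if all(passed)
                status = :graduated
                break
            end
            # Leave after exhausting the attempts on a failed course,
            # or at random according to the (assumed constant) stop-out probability.
            exhausted = any((attempts .>= attempt_limit) .& .!passed)
            if exhausted || rand(rng) < stopout_prob
                status = :stopped_out
                break
            end
        end
        graduated += status == :graduated
        stopped_out += status == :stopped_out
    end
    return (graduated=graduated, stopped_out=stopped_out,
            still_enrolled=n_students - graduated - stopped_out)
end

rng = MersenneTwister(2024)
result = toy_simulate(rng, 1000, fill(0.9, 40))   # 40 hypothetical courses at a 90% pass rate
println(result)
```

The three counts always sum to the cohort size, so raising a pass rate or the attempt limit can only move students between the graduated, stopped-out and still-enrolled groups.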

The simulation framework (shown below) was originally developed by <cite data-cite="hi:18">Hickman (2014)</cite>, and subsequent development has allowed it to be integrated into the CurricularAnalytics toolbox.

<img src="SimulationFramework.png" width="600">

Notice that the simulation "engine" requires three inputs: a curriculum, a model for students, and a model for student performance. The results of the simulation are returned in an object that may be viewed using the `simulation_report()` function, as demonstrated below.

In order to perform curricular simulations, first load the Curricular Analytics toolbox modules:

In [21]:
using CurricularAnalytics, CurricularVisualization

## 2. Setting up the Simulation Environment

The first thing we will do is read in a degree plan. The students in the simulation will attempt to complete the curriculum associated with the degree plan in the order prescribed by the degree plan. Specifically, in each semester each student will enroll in courses they have not yet taken, in the order specified by the degree plan, until they reach the maximum allowed number of credit hours.
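The enrollment rule just described can be sketched as a greedy pass over the plan. The `ToyCourse` type, the `pick_courses` function and the course names below are hypothetical. Skipping courses with unmet prerequisites, and skipping rather than stopping at a course that would exceed the cap, are assumptions about the details rather than a description of the toolbox's `Enrollment` module.

```julia
# Hypothetical course record for illustration; not the toolbox's Course type.
struct ToyCourse
    name::String
    credits::Int
    prereqs::Vector{String}
end

# Walk the plan in order and pick the courses a student can take this term.
function pick_courses(plan::Vector{ToyCourse}, passed::Set{String}, max_credits::Int)
    chosen = String[]
    total = 0
    for course in plan
        course.name in passed && continue                   # already completed
        all(p -> p in passed, course.prereqs) || continue    # prerequisites unmet
        total + course.credits > max_credits && continue     # would exceed the cap
        push!(chosen, course.name)
        total += course.credits
    end
    return chosen, total
end

plan = [ToyCourse("MATH 1", 4, String[]),
        ToyCourse("ENGR 1", 3, String[]),
        ToyCourse("MATH 2", 4, ["MATH 1"]),
        ToyCourse("PHYS 1", 4, ["MATH 1"]),
        ToyCourse("ENGL 1", 3, String[])]

# A student who has only passed ENGR 1, with an 8-credit cap for the term.
chosen, total = pick_courses(plan, Set(["ENGR 1"]), 8)
println(chosen, " => ", total, " credits")
```

Here the student is placed in MATH 1 and ENGL 1 (7 credits): MATH 2 and PHYS 1 are skipped because MATH 1 has not been passed yet.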

### 2.1 Reading the Degree Plan

The following commands read in a degree plan stored in the CSV file format, and then display a visualization of the resulting plan.

In [2]:
AE_degree_plan = read_csv("Univ_of_Arizona-Aero.csv")
visualize(AE_degree_plan, notebook=true, scale=0.8)

In [3]:
basic_metrics(AE_degree_plan)
AE_degree_plan.metrics

Dict{String,Any} with 8 entries:
  "total credit hours"         => 129
  "avg. credits per term"      => 16.125
  "min. credits in a term"     => 15
  "term credit hour std. dev." => 0.927025
  "number of terms"            => 8
  "max. credits in a term"     => 18
  "min. credit term"           => 4
  "max. credit term"           => 3

In [4]:
CS_degree_plan = read_csv("Univ_of_Arizona-CS.csv")
visualize(CS_degree_plan, notebook=true, scale=0.8)

In [5]:
EE_degree_plan = read_csv("Ga_Tech-EE.csv")
visualize(EE_degree_plan, notebook=true, scale=1.0)

In [6]:
basic_metrics(EE_degree_plan)
EE_degree_plan.metrics

Dict{String,Any} with 8 entries:
  "total credit hours"         => 134
  "avg. credits per term"      => 16.75
  "min. credits in a term"     => 15
  "term credit hour std. dev." => 1.08972
  "number of terms"            => 8
  "max. credits in a term"     => 18
  "min. credit term"           => 4
  "max. credit term"           => 5

### 2.2 Creating the Student Cohort

The following commands create an initial cohort of `n` students using a simple enrollment model. Specifically, with this simple model, all students are assumed equally likely (or unlikely) to pass a given class according to the course pass rate probability model.

In [7]:
enrollment_model = Enrollment  # use the Enrollment module to determine if/when a student may enroll in a course
stopouts = true  # assume that students may stop out of the cohort
n = 1000   # student cohort size will be 1000
students = simple_students(n);  # create a student cohort

### 2.3 The Course Performance Model

First let's see what happens if the instructional complexity is the same for both programs. Specifically, we will use course pass rates as a proxy for instructional complexity, and we'll set the pass rates for all courses at 90%. 

In [8]:
performance_model = PassRate
real_passrate = false
set_passrates(AE_degree_plan.curriculum.courses, 0.9)
set_passrates(CS_degree_plan.curriculum.courses, 0.9)
course_passrate = 0.9;

### 2.4 Setting the Simulation Parameters

In [9]:
max_credits = 18  # the maximum number of credit hours a student may enroll in during a semester
duration_lock = false  # if true, run for the full number of terms instead of stopping once no students are left in the cohort
num_terms = 12  # the maximum number of terms in the simulation
course_attempt_limit = 2;  # number of times a student may attempt a course
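The attempt limit interacts strongly with the pass rate. As a rough back-of-the-envelope check, assume each attempt at a course succeeds independently with probability $p$ and ignore stop-outs. The chance of passing one course within $k$ attempts is then $1 - (1-p)^k$, and the chance of clearing all $n$ courses is that value raised to the power $n$. The helper names and the curriculum size of 40 courses below are hypothetical.

```julia
# Probability of passing a single course within `k` attempts when each
# attempt succeeds independently with probability `p`.
pass_within(p, k) = 1 - (1 - p)^k

# Probability of passing every one of `n` courses within the attempt limit.
clear_all(p, k, n) = pass_within(p, k)^n

n_courses = 40   # hypothetical curriculum size, for illustration only
for k in (2, 3)
    println("attempt limit ", k, ": ",
            round(pass_within(0.9, k), digits=4), " per course, ",
            round(clear_all(0.9, k, n_courses), digits=3),
            " across ", n_courses, " courses")
end
```

With a 90% pass rate, a limit of two attempts gives 0.99 per course but only about 0.67 across 40 courses, while a limit of three raises the latter to about 0.96. Under these simplifying assumptions, even a small per-course failure probability compounds over a full curriculum.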

### 2.5 Running the Simulation 
The `simulate` function is used to execute the simulation. Depending upon how many students are in the cohort, this may take some time to run.

In [10]:
simulation = simulate(AE_degree_plan, course_attempt_limit, students,
                      max_credits = max_credits,
                      performance_model = performance_model,
                      enrollment_model = enrollment_model,
                      duration = num_terms,
                      duration_lock = duration_lock,
                      stopouts = stopouts);

In order to view the results of the simulation, use the `simulation_report` function:

In [11]:
simulation_report(simulation, num_terms, course_passrate, max_credits, real_passrate)


[0m[1m------------ Simulation Report ------------[22m
Aerospace Engineering, BS -- 2019-20 Degree Plan

-------- Simulation Statistics --------
Number of terms: 12
Max Credits per Term: 18
Max Course Attempts: 2
Number of Students: 1000
Preset Course Pass Rates: 90.0%

-------- Graduation Statistics --------
Number of Students Graduated: 449
Graduation Rate: 44.9%
Term Graduation Rates: 
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.056, 0.301, 0.439, 0.448, 0.449]
Average time to degree: 9.229398663697104 terms

-------- Stop out Statistics --------
Number of Students Stopped Out (Stopout Model Prediction + Reached Max Attempts): 551
Number of Students Reaching Max Attempts: 275
Stop-out Rate: 55.1%
Cumulative Term Stop-out Rates (including reached max course attempts students): 
[0.068, 0.231, 0.312, 0.39, 0.435, 0.469, 0.49, 0.511, 0.538, 0.549, 0.551, 0.551]

Cumulative Term Stop-out Rates (excluding reaching max course attempts students): 
[0.068, 0.171, 0.203, 0.25, 0.27, 0.271, 0.27

│ 21  │ 62.6%  │
│ 22  │ 66.0%  │
│ 23  │ 57.2%  │
│ 24  │ 58.6%  │
│ 25  │ 57.4%  │
│ 26  │ 58.7%  │
│ 27  │ 56.1%  │
│ 28  │ 98.2%  │
│ 29  │ 54.4%  │
│ 30  │ 54.6%  │
│ 31  │ 53.1%  │
│ 32  │ 52.6%  │
│ 33  │ 97.4%  │
│ 34  │ 52.0%  │
│ 35  │ 50.3%  │
│ 36  │ 50.1%  │
│ 37  │ 50.9%  │
│ 38  │ 51.1%  │
│ 39  │ 51.6%  │
│ 40  │ 79.3%  │
│ 41  │ 48.4%  │
│ 42  │ 50.3%  │
│ 43  │ 50.4%  │
│ 44  │ 49.0%  │
│ 45  │ 48.6%  │
│ 46  │ 44.9%  │
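The graduation statistics in the report above are related to one another. Reading the term graduation rates as cumulative fractions of the cohort (an interpretation, but one that reproduces the reported figure), the average time to degree is the term number weighted by the share of graduates finishing in each term. The variable names in this sketch are our own.

```julia
# Cumulative term graduation rates copied from the report above.
cumulative = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.056, 0.301, 0.439, 0.448, 0.449]

# Fraction of the cohort graduating in each individual term.
per_term = diff(vcat(0.0, cumulative))

# Average time to degree: term numbers weighted by the share of graduates.
avg_terms = sum(t * share for (t, share) in enumerate(per_term)) / cumulative[end]
println(round(avg_terms, digits=3))
```

The result agrees with the reported average time to degree of roughly 9.23 terms.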

Now let's run the same set of students through the Computer Science curriculum:

In [12]:
simulation = simulate(CS_degree_plan, course_attempt_limit, students,
                      max_credits = max_credits,
                      performance_model = performance_model,
                      enrollment_model = enrollment_model,
                      duration = num_terms,
                      duration_lock = duration_lock,
                      stopouts = stopouts);
simulation_report(simulation, num_terms, course_passrate, max_credits, real_passrate)


[0m[1m------------ Simulation Report ------------[22m
Computer Science, BS -- 2019-20 Degree Plan

-------- Simulation Statistics --------
Number of terms: 12
Max Credits per Term: 18
Max Course Attempts: 2
Number of Students: 1000
Preset Course Pass Rates: 90.0%

-------- Graduation Statistics --------
Number of Students Graduated: 483
Graduation Rate: 48.3%
Term Graduation Rates: 
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.259, 0.474, 0.483, 0.0, 0.0]
Average time to degree: 8.4824016563147 terms

-------- Stop out Statistics --------
Number of Students Stopped Out (Stopout Model Prediction + Reached Max Attempts): 517
Number of Students Reaching Max Attempts: 214
Stop-out Rate: 51.7%
Cumulative Term Stop-out Rates (including reached max course attempts students): 
[0.079, 0.226, 0.278, 0.357, 0.399, 0.431, 0.465, 0.496, 0.515, 0.517, 0.0, 0.0]

Cumulative Term Stop-out Rates (excluding reaching max course attempts students): 
[0.079, 0.191, 0.219, 0.273, 0.287, 0.296, 0.304, 0.304, 0

In [13]:
real_passrate = false
set_passrates(EE_degree_plan.curriculum.courses, 0.9)
simulation = simulate(EE_degree_plan, course_attempt_limit, students,
                      max_credits = max_credits,
                      performance_model = performance_model,
                      enrollment_model = enrollment_model,
                      duration = num_terms,
                      duration_lock = duration_lock,
                      stopouts = stopouts);
simulation_report(simulation, num_terms, course_passrate, max_credits, real_passrate)


[0m[1m------------ Simulation Report ------------[22m
Electrical Engineering, BS -- 2017-18 Plan

-------- Simulation Statistics --------
Number of terms: 12
Max Credits per Term: 18
Max Course Attempts: 2
Number of Students: 1000
Preset Course Pass Rates: 90.0%

-------- Graduation Statistics --------
Number of Students Graduated: 435
Graduation Rate: 43.5%
Term Graduation Rates: 
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.061, 0.329, 0.426, 0.435, 0.0]
Average time to degree: 9.124137931034483 terms

-------- Stop out Statistics --------
Number of Students Stopped Out (Stopout Model Prediction + Reached Max Attempts): 565
Number of Students Reaching Max Attempts: 278
Stop-out Rate: 56.5%
Cumulative Term Stop-out Rates (including reached max course attempts students): 
[0.079, 0.23, 0.286, 0.356, 0.418, 0.457, 0.497, 0.528, 0.556, 0.564, 0.565, 0.0]

Cumulative Term Stop-out Rates (excluding reaching max course attempts students): 
[0.079, 0.183, 0.211, 0.254, 0.276, 0.279, 0.288, 0.28

Next we will use a model to determine whether or not a student passes a course. We will use actual pass/fail rates computed using historical data for the courses in the degree plan shown above. 

In [14]:
course_passrate = 0.9  # use if a course is not contained in the CSV file
real_passrate = true  # use the actual pass rates, rather than course_passrate for all courses
set_passrates_from_csv(AE_degree_plan.curriculum.courses, "./Student_Grades_sp17_to_fall19.csv", course_passrate)

Note: A more realistic model for predicting student performance could be used here. Specifically, a more realistic model might:

- take student demographics into account, including the major they are in,
- take prior grades into account when predicting future grades,
- take into account factors that influence student stop-out, e.g., academic standing, GPA, unmet need, etc.

Learning the model parameters using actual student data would improve the fidelity of the simulation.
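As one illustration of the second bullet, the sketch below adjusts a course's baseline pass rate according to a student's prior GPA using a logistic function. The `ToyStudent` type, the `pass_probability` function, and the `weight` and `gpa_ref` values are hypothetical. They are not part of the toolbox, and in practice such parameters would be fitted to historical data.

```julia
using Random

# Hypothetical student record; the field is an assumption for illustration.
struct ToyStudent
    gpa::Float64        # prior grade point average on a 0-4 scale
end

# Logistic model: shift a course's baseline pass rate up or down with prior GPA.
# `base_passrate` must lie strictly between 0 and 1.
function pass_probability(student::ToyStudent, base_passrate; weight=1.0, gpa_ref=3.0)
    logit = log(base_passrate / (1 - base_passrate)) + weight * (student.gpa - gpa_ref)
    return 1 / (1 + exp(-logit))
end

rng = MersenneTwister(7)
for gpa in (2.0, 3.0, 4.0)
    p = pass_probability(ToyStudent(gpa), 0.7)
    outcome = rand(rng) < p ? "pass" : "fail"
    println("GPA ", gpa, ": pass probability ", round(p, digits=3), " -> ", outcome)
end
```

A student at the reference GPA keeps the course's baseline pass rate, while students above or below it see a higher or lower probability of passing.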

In [15]:
simulation = simulate(AE_degree_plan, course_attempt_limit, students,
                      max_credits = max_credits,
                      performance_model = performance_model,
                      enrollment_model = enrollment_model,
                      duration = num_terms,
                      duration_lock = duration_lock,
                      stopouts = stopouts);
simulation_report(simulation, num_terms, course_passrate, max_credits, real_passrate)


[0m[1m------------ Simulation Report ------------[22m
Aerospace Engineering, BS -- 2019-20 Degree Plan

-------- Simulation Statistics --------
Number of terms: 12
Max Credits per Term: 18
Max Course Attempts: 2
Number of Students: 1000
Preset Course Pass Rates: 90.0%

-------- Graduation Statistics --------
Number of Students Graduated: 269
Graduation Rate: 26.900000000000002%
Term Graduation Rates: 
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.017, 0.163, 0.261, 0.268, 0.269]
Average time to degree: 9.364312267657992 terms

-------- Stop out Statistics --------
Number of Students Stopped Out (Stopout Model Prediction + Reached Max Attempts): 731
Number of Students Reaching Max Attempts: 502
Stop-out Rate: 73.1%
Cumulative Term Stop-out Rates (including reached max course attempts students): 
[0.088, 0.33, 0.479, 0.603, 0.656, 0.682, 0.694, 0.705, 0.715, 0.729, 0.731, 0.731]

Cumulative Term Stop-out Rates (excluding reaching max course attempts students): 
[0.088, 0.194, 0.197, 0.219, 0

Now, let's see what happens if we use the actual course pass rates in the computer science program.

In [16]:
set_passrates_from_csv(CS_degree_plan.curriculum.courses, "./Student_Grades_sp17_to_fall19.csv", course_passrate)
simulation = simulate(CS_degree_plan, course_attempt_limit, students,
                      max_credits = max_credits,
                      performance_model = performance_model,
                      enrollment_model = enrollment_model,
                      duration = num_terms,
                      duration_lock = duration_lock,
                      stopouts = stopouts);
simulation_report(simulation, num_terms, course_passrate, max_credits, real_passrate)


[0m[1m------------ Simulation Report ------------[22m
Computer Science, BS -- 2019-20 Degree Plan

-------- Simulation Statistics --------
Number of terms: 12
Max Credits per Term: 18
Max Course Attempts: 2
Number of Students: 1000
Preset Course Pass Rates: 90.0%

-------- Graduation Statistics --------
Number of Students Graduated: 280
Graduation Rate: 28.000000000000004%
Term Graduation Rates: 
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.116, 0.258, 0.28, 0.0, 0.0]
Average time to degree: 8.664285714285715 terms

-------- Stop out Statistics --------
Number of Students Stopped Out (Stopout Model Prediction + Reached Max Attempts): 720
Number of Students Reaching Max Attempts: 526
Stop-out Rate: 72.0%
Cumulative Term Stop-out Rates (including reached max course attempts students): 
[0.071, 0.361, 0.444, 0.536, 0.607, 0.661, 0.68, 0.703, 0.718, 0.72, 0.0, 0.0]

Cumulative Term Stop-out Rates (excluding reaching max course attempts students): 
[0.071, 0.145, 0.159, 0.179, 0.188, 0.192, 0.

## 3. What-if Analyses

These simulation capabilities allow us to conduct what-if analyses around the impact that changes to curricular structure or instructional improvements will have on student success.  First, let's consider how changing the number of allowed attempts (from 2 to 3) would impact graduation rates in both of these programs.

In [17]:
course_attempt_limit = 3
simulation = simulate(AE_degree_plan, course_attempt_limit, students,
                      max_credits = max_credits,
                      performance_model = performance_model,
                      enrollment_model = enrollment_model,
                      duration = num_terms,
                      duration_lock = duration_lock,
                      stopouts = stopouts);
simulation_report(simulation, num_terms, course_passrate, max_credits, real_passrate)


[0m[1m------------ Simulation Report ------------[22m
Aerospace Engineering, BS -- 2019-20 Degree Plan

-------- Simulation Statistics --------
Number of terms: 12
Max Credits per Term: 18
Max Course Attempts: 3
Number of Students: 1000
Preset Course Pass Rates: 90.0%

-------- Graduation Statistics --------
Number of Students Graduated: 578
Graduation Rate: 57.8%
Term Graduation Rates: 
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.02, 0.256, 0.476, 0.566, 0.578]
Average time to degree: 9.719723183391004 terms

-------- Stop out Statistics --------
Number of Students Stopped Out (Stopout Model Prediction + Reached Max Attempts): 420
Number of Students Reaching Max Attempts: 110
Stop-out Rate: 42.0%
Cumulative Term Stop-out Rates (including reached max course attempts students): 
[0.071, 0.211, 0.272, 0.336, 0.374, 0.389, 0.404, 0.406, 0.415, 0.418, 0.419, 0.42]

Cumulative Term Stop-out Rates (excluding reaching max course attempts students): 
[0.071, 0.211, 0.248, 0.287, 0.303, 0.307, 0.

In [18]:
simulation = simulate(CS_degree_plan, course_attempt_limit, students,
                      max_credits = max_credits,
                      performance_model = performance_model,
                      enrollment_model = enrollment_model,
                      duration = num_terms,
                      duration_lock = duration_lock,
                      stopouts = stopouts);
simulation_report(simulation, num_terms, course_passrate, max_credits, real_passrate)


[0m[1m------------ Simulation Report ------------[22m
Computer Science, BS -- 2019-20 Degree Plan

-------- Simulation Statistics --------
Number of terms: 12
Max Credits per Term: 18
Max Course Attempts: 3
Number of Students: 1000
Preset Course Pass Rates: 90.0%

-------- Graduation Statistics --------
Number of Students Graduated: 554
Graduation Rate: 55.400000000000006%
Term Graduation Rates: 
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.117, 0.432, 0.537, 0.554, 0.0]
Average time to degree: 9.03971119133574 terms

-------- Stop out Statistics --------
Number of Students Stopped Out (Stopout Model Prediction + Reached Max Attempts): 446
Number of Students Reaching Max Attempts: 156
Stop-out Rate: 44.6%
Cumulative Term Stop-out Rates (including reached max course attempts students): 
[0.066, 0.19, 0.287, 0.342, 0.381, 0.406, 0.429, 0.438, 0.444, 0.446, 0.446, 0.0]

Cumulative Term Stop-out Rates (excluding reaching max course attempts students): 
[0.066, 0.19, 0.219, 0.265, 0.277, 0.284

Next, notice that CSC 110 -- Intro to Computer Programming I is clearly a gateway course for the computer science program at the University of Arizona, and that it has a relatively low success rate. There are a few other courses with the CSC prefix that also have low success rates. What would happen if the instruction and instructional support were changed in a way that enabled the students taking CSC 110, CSC 120, CSC 210, CSC 345 and CSC 352 to obtain a 90% pass rate?

In [19]:
convert_ids(CS_degree_plan.curriculum)
csc110 = course(CS_degree_plan.curriculum, "CSC", "110", "Intro to Computer Programming I", "")
csc120 = course(CS_degree_plan.curriculum, "CSC", "120", "Intro to Computer Programming II", "")
csc210 = course(CS_degree_plan.curriculum, "CSC", "210", "Software Development", "")
csc345 = course(CS_degree_plan.curriculum, "CSC", "345", "Analysis of Discrete Structures", "")
csc352 = course(CS_degree_plan.curriculum, "CSC", "352", "Systems Programming & Unix", "")

csc110.passrate = csc120.passrate = csc210.passrate = csc345.passrate = csc352.passrate = 0.9;

In [20]:
simulation = simulate(CS_degree_plan, course_attempt_limit, students,
                      max_credits = max_credits,
                      performance_model = performance_model,
                      enrollment_model = enrollment_model,
                      duration = num_terms,
                      duration_lock = duration_lock,
                      stopouts = stopouts);
simulation_report(simulation, num_terms, course_passrate, max_credits, real_passrate)


[0m[1m------------ Simulation Report ------------[22m
Computer Science, BS -- 2019-20 Degree Plan

-------- Simulation Statistics --------
Number of terms: 12
Max Credits per Term: 18
Max Course Attempts: 3
Number of Students: 1000
Preset Course Pass Rates: 90.0%

-------- Graduation Statistics --------
Number of Students Graduated: 627
Graduation Rate: 62.7%
Term Graduation Rates: 
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.221, 0.558, 0.619, 0.627, 0.0]
Average time to degree: 8.770334928229666 terms

-------- Stop out Statistics --------
Number of Students Stopped Out (Stopout Model Prediction + Reached Max Attempts): 373
Number of Students Reaching Max Attempts: 56
Stop-out Rate: 37.3%
Cumulative Term Stop-out Rates (including reached max course attempts students): 
[0.086, 0.215, 0.273, 0.309, 0.342, 0.353, 0.363, 0.365, 0.367, 0.372, 0.373, 0.0]

Cumulative Term Stop-out Rates (excluding reaching max course attempts students): 
[0.086, 0.215, 0.255, 0.287, 0.306, 0.308, 0.317, 0.3

# References

Heileman, G. L., Abdallah, C. T., Slim, A., and Hickman, M. (2018). Curricular analytics: A framework for quantifying the impact of curricular reforms and pedagogical innovations. arXiv:1811.09676 [cs.CY].

Hickman, M. (2014). Development of a Curriculum Analysis and Simulation Library with Applications in Curricular Analytics. MS thesis, University of New Mexico, Albuquerque, NM.