# <center>University of Arizona Curricula - Computer Science</center>

This notebook contains a preliminary analysis of the curriculum and degree plan associated with the undergraduate Computer Science program in the College of Science at the University of Arizona. In addition, we provide some comparisons to other undergraduate Computer Science programs around the country using data collected as a part of a prior study.

In [21]:
using Pkg
if split(pwd(),Base.Filesystem.path_separator)[end] != "CurricularAnalytics.jl"
    cd("../../CurricularAnalytics.jl/")
end
pkg"activate ."
using CurricularAnalytics
cd("../CA-Notebooks/Arizona Curricula")

[32m[1mActivating[22m[39m environment at `~/Library/Mobile Documents/com~apple~CloudDocs/work/research/Curricular Analytics/CurricularAnalytics.jl/Project.toml`


In [22]:
#using CurricularAnalytics
using Glob
using CSV
using DataFrames
using Statistics

## Curricular Analytics Toolbox

The analyses in this notebook make use of the Curricular Analytics toolbox, which is built using the Julia programming language and is available as open source software [2]. As a starting point, you may find it useful to read the toolbox documentation, as well as the curricular analytics paper listed in the References section below [1].

### Create the Data Structures 
The degree plan associated with the CS program was stored as a CSV file using the format for degree plans specified in the Curricular Analytics toolbox. The files are organized in a directory structure that is assumed to be in the same directory as this notebook, as follows: `./programs/<college-name>/`. Any degree plans for other programs that you place in this folder will be loaded and analyzed as well.

Assuming the aforementioned directory structure, we first create a dictionary called `plans` containing the degree plans for each of the programs in a given college, in this case the College of Science.

In [18]:
college = "college_of_science"
plans = Dict{String, DegreePlan}()
program_files = glob("*", "./programs/$college")
for program in program_files
    dp = read_csv(program)
    complexity(dp.curriculum)  # compute the curricular complexity of the degree plan
    plans[dp.curriculum.name] = dp    # store the degree plan in the dictionary
end

## The Computer Science Program

First we will analyze the structural properties of a curriculum in the college.  The structural properties of a curriculum are determined by the underlying structural properties of its corresponding curriculum graph (i.e., the graph showing the prerequisite relationships between the courses in a curriculum, ignoring term information).  Here's the degree plan for the Computer Science program.  By hovering your mouse over the courses in this figure, various metrics will be displayed.

In [4]:
CS_plan = plans["Computer Science"]
visualize(CS_plan, notebook=true)

The `basic_metrics()` function can be used to output a set of basic metrics associated with a curriculum. As an example, here are the basic curricular metrics associated with the Computer Science program:

In [5]:
metrics = basic_metrics(CS_plan.curriculum)
println(String(take!(metrics)))


University of Arizona 
Curriculum: Computer Science
  credit hours = 122
  number of courses = 37
  Blocking Factor --
    entire curriculum = 30
    max. value = 10, for course(s): MATH 120R - Pre-Calculus
  Centrality --
    entire curriculum = 78
    max. value = 25, for course(s): CSC 110 - Intro to Computer Programming I, CSC 120 - Intro to Computer Programming II
  Delay Factor --
    entire curriculum = 79.0
    max. value = 5.0, for course(s): CSC 110 - Intro to Computer Programming I, MATH 120R - Pre-Calculus, CSC 120 - Intro to Computer Programming II, CSC 210 - Software Development, CSC 245 - Intro to Discrete Structures, CSC 252 - Computer Organization, CSC 352 - Systems Programming & Unix, CSC 335 - Object-Oriented Programming, CSC 345 - Analysis of Discrete Structures
  Complexity --
    entire curriculum = 109.0
    max. value = 15.0, for course(s): MATH 120R - Pre-Calculus
  Longest Path(s) --
    length = 5, number of paths = 5
    path(s):
    path 1 = MATH 120R - Pr

From a student progression perspective, MATH 120R (Pre-Calculus) appears to be the most important course in the curriculum: it has the highest blocking factor (10) and the highest complexity (15). That is, improving the success rates of CS students in this course would do the most to facilitate completion of the CS degree. Intro to Computer Programming I and II (CSC 110 and CSC 120) are the most central courses in the curriculum, each with a centrality of 25. That is, from a knowledge flow perspective, these courses require the most requisite knowledge and supply the most knowledge to follow-on courses in the curriculum.
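The metrics in the output above are related: MATH 120R has a blocking factor of 10 and a delay factor of 5, which sum to its complexity of 15, and the curriculum totals satisfy 30 + 79 = 109. The following is a minimal sketch of that relationship, not the toolbox's implementation. It assumes that the blocking factor of a course is the number of courses reachable from it, and that the delay factor is the number of courses on the longest path through it. The toy prerequisite graph and the helper names (`reachable`, `longest`, `blocking_factor`, `delay_factor`, `course_complexity`) are hypothetical.

```julia
# Toy prerequisite graph (hypothetical): course => courses that directly require it.
succ = Dict(
    "A" => ["B", "C"],
    "B" => ["D"],
    "C" => ["D"],
    "D" => String[],
    "E" => String[],
)

# Predecessor map derived from the successor map.
pred = Dict(c => String[] for c in keys(succ))
for (c, nexts) in succ
    for n in nexts
        push!(pred[n], c)
    end
end

# Set of all courses reachable from `c` by following `adj` (excluding `c` itself).
function reachable(adj, c)
    seen = Set{String}()
    stack = copy(adj[c])
    while !isempty(stack)
        v = pop!(stack)
        if !(v in seen)
            push!(seen, v)
            append!(stack, adj[v])
        end
    end
    return seen
end

# Number of courses on the longest path that starts at `c` and follows `adj`.
longest(adj, c) = isempty(adj[c]) ? 1 : 1 + maximum(longest(adj, n) for n in adj[c])

# Assumed definitions (see the lead-in); `c` is counted once in the delay factor.
blocking_factor(c) = length(reachable(succ, c))
delay_factor(c) = longest(pred, c) + longest(succ, c) - 1
course_complexity(c) = blocking_factor(c) + delay_factor(c)

for c in sort(collect(keys(succ)))
    println("$c: blocking = $(blocking_factor(c)), delay = $(delay_factor(c)), complexity = $(course_complexity(c))")
end
```

Centrality is not covered by this sketch; use `basic_metrics()` for the values the toolbox actually reports.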

One issue worth investigating is the need for a 122 credit hour program. Given the number of electives in the program, it seems that it would be relatively straightforward to create a 120 credit hour program without threatening the attainment of any of the program's student learning outcomes.

### Extraneous Prerequisites
The following function will find extraneous prerequisites in a curriculum. These are redundant prerequisites that are unnecessary in a curriculum. For example, if a curriculum has the prerequisite relationships $c_1 \rightarrow c_2 \rightarrow c_3$ and $c_1 \rightarrow c_3$, and $c_1$ and $c_2$ are *not* co-requisites, then $c_1 \rightarrow c_3$ is redundant and therefore extraneous. Extraneous prerequisites do not affect the curricular complexity metric; they are simply unnecessary clutter in a curriculum or degree plan.

In [25]:
for plan in plans
    extraneous_requisites(plan[2].curriculum, print=true)
end




There are no extraneous prerequisites in the CS program.
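To illustrate the idea behind this check, here is a minimal sketch that flags a prerequisite edge as extraneous when its target can also be reached through some longer path. It is not the toolbox's `extraneous_requisites` implementation: the edge list and the helper `reachable_indirectly` are hypothetical, and co-requisites are not modeled at all.

```julia
# Hypothetical prerequisite edges (from, to); no co-requisites are modeled.
edges = [("c1", "c2"), ("c2", "c3"), ("c1", "c3")]

# Adjacency list built from the edge list.
adj = Dict{String, Vector{String}}()
for (u, v) in edges
    push!(get!(adj, u, String[]), v)
    get!(adj, v, String[])   # make sure every course has an entry
end

# True if `target` can be reached from `start` without using the direct edge start -> target.
function reachable_indirectly(adj, start, target)
    stack = [n for n in adj[start] if n != target]
    seen = Set{String}()
    while !isempty(stack)
        v = pop!(stack)
        v == target && return true
        v in seen && continue
        push!(seen, v)
        append!(stack, adj[v])
    end
    return false
end

# Edges whose target is already implied by a longer prerequisite chain.
extraneous = [(u, v) for (u, v) in edges if reachable_indirectly(adj, u, v)]
println(extraneous)
```

On this toy graph only the edge from `c1` to `c3` is flagged, matching the example in the text.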

### Dead End Courses
The following function can be used to find "dead end" courses in a curriculum. Dead end courses are those that appear at the end of a path (i.e., sink vertices) and are not courses associated with the major. E.g., in the case of the CS curriculum above, these would be courses at the end of a path that do not have the "CSC" prefix. One might consider these courses dead ends, as their course outcomes are not (formally) used by any major-specific course, i.e., by any course with the prefix "CSC".

In [26]:
prefixes = Dict{String, Array{String,1}}()
prefixes["Computer Science"] = ["CSC"];

In [27]:
for plan in plans
    de = dead_ends(plan[2].curriculum, prefixes[plan[2].curriculum.name])
    println("\nDead end courses in the $(plan[2].curriculum.name) curriculum:")
    for course in de[2]
        println("$(course.prefix) $(course.num): $(course.name)")
    end
end


Dead end courses in the Computer Science curriculum:
ENGL 102: English Composition II
MATH 129: Calculus II


English Composition II is a general education course, and the skills developed in this class are informally used in other classes in the curriculum, particularly those that are writing intensive.  The Calculus II class, on the other hand, warrants further investigation.  Are the learning outcomes from the class informally required by any other courses in the curriculum, specifically, in any of the typical CS elective courses?
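The dead end criterion described in this section can be sketched in a few lines. This is an illustration under stated assumptions rather than the toolbox's `dead_ends` implementation: courses are plain named tuples, the course list is made up, and `toy_dead_ends` and `course_id` are hypothetical helpers.

```julia
# Hypothetical courses; `requisites` holds "PREFIX NUM" identifiers of prerequisite courses.
courses = [
    (prefix = "MATH", num = "101", requisites = String[]),
    (prefix = "MATH", num = "102", requisites = ["MATH 101"]),
    (prefix = "CSC",  num = "101", requisites = ["MATH 101"]),
    (prefix = "CSC",  num = "201", requisites = ["CSC 101"]),
]

course_id(c) = "$(c.prefix) $(c.num)"

# Courses that no other course requires (sink vertices) and whose prefix
# is not one of the major's prefixes.
function toy_dead_ends(courses, major_prefixes)
    required = Set{String}()
    for c in courses
        union!(required, c.requisites)
    end
    return [c for c in courses if !(course_id(c) in required) && !(c.prefix in major_prefixes)]
end

dead = toy_dead_ends(courses, ["CSC"])
println([course_id(c) for c in dead])
```

Here `CSC 201` is also a sink, but it carries the major prefix, so only `MATH 102` is reported.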

### Comparisons to Other CS Programs
We recently conducted a study that looked at the rankings of Computer Science departments in the United States and related them to the complexity of the undergraduate programs in those departments. We found a statistically significant *inverse* relationship between ranking and curricular complexity. That is, higher ranked programs tend to have less complex curricula. A notebook describing the details of this study is available at: 

#### Methodology
The study involved grouping undergraduate CS programs into three tiers (top, middle, and bottom) based on their rankings. For this purpose, we used the rankings of Computer Science departments provided by CSRankings as a proxy for program quality. That is, for the purpose of this study, we assumed that the highest ranked computer science departments are synonymous with the highest quality computer science undergraduate programs. We acknowledge the concerns that are routinely expressed about rankings such as these. However, it should be noted that this study uses aggregations of schools within tiers, and the statistics associated with those aggregations. Thus, the specific rankings of the schools within the tiers are irrelevant; all that matters is the tier in which a school is placed. Upon inspection of the schools within each tier, we believe that knowledgeable and impartial observers would agree that the three tiers constructed in this study are highly correlated with program quality.

#### Results
The study results are provided in the notched box plot below, where the notches roughly correspond to the confidence intervals around the median values.
![CS Study Box Plot](CS_study_results.png)
An ANOVA determined that there is a statistically significant difference between the means of the complexities of the programs in these tiers.

The curricular complexity of the University of Arizona undergraduate Computer Science program is most similar to that of the top tier of Computer Science programs in this country.

### Degree Plan Optimization 
The Curricular Analytics toolbox contains a number of functions that will create different degree plans for a curriculum depending upon various optimization criteria. To use these functions, you must first install the Gurobi solver, [Gurobi Optimizer](https://www.gurobi.com/downloads/gurobi-optimizer-eula). Gurobi is a commercial product and requires a license key; however, [academic licenses](https://www.gurobi.com/downloads/end-user-license-agreement-academic) are available at no cost.

In [23]:
# Uncomment the following two lines if the Gurobi package has not yet been included in your Julia environment.
#using Pkg
#Pkg.add("Gurobi")
using Gurobi

Below is an analysis of the 2019-20 CS degree plan. 

In [24]:
metrics = basic_metrics(CS_plan)
println(String(take!(metrics)))


Curriculum: Computer Science
Degree Plan: 2019-20 Degree Plan
  total credit hours = 122
  number of terms = 8
  max. credits in a term = 17, in term 2
  min. credits in a term = 12, in term 8
  avg. credits per term = 15.25, with std. dev. = 1.3919410907075054
  total distance between all requisites = 14



In [14]:
plan_new = optimize_plan(CS_plan.curriculum, 8, 12, 18, balance_obj);
visualize(plan_new, notebook=true)

Academic license - for non-commercial use only
An optimal solution was found with objective value = 24.0


In [15]:
metrics = basic_metrics(plan_new)
println(String(take!(metrics)))


Curriculum: Computer Science
Degree Plan: 
  total credit hours = 122
  number of terms = 8
  max. credits in a term = 16, in term 2
  min. credits in a term = 15, in term 1
  avg. credits per term = 15.25, with std. dev. = 0.4330127018922193
  total distance between all requisites = 37
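The summary statistics in the output above can be reproduced by hand. Assuming every term carries a whole number of credits, the only way for 8 terms to total 122 credits with a minimum of 15 and a maximum of 16 is two 16-credit terms and six 15-credit terms. The sketch below uses one such arrangement (the output only fixes term 1 at 15 credits and term 2 at 16; the rest of the ordering is an assumption) and shows that the reported standard deviation matches the population formula, i.e. `corrected=false` in `std` from Julia's `Statistics` standard library.

```julia
using Statistics

# One arrangement consistent with the output above (the term order is an assumption).
credits = [15, 16, 15, 15, 15, 15, 16, 15]

total = sum(credits)                   # total credit hours
avg = mean(credits)                    # average credits per term
sd = std(credits, corrected=false)     # population standard deviation

println("total = $total, avg = $avg, std. dev. = $sd")
```

A smaller standard deviation means the credit load is spread more evenly across terms, which is what `balance_obj` targets.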



In [19]:
plan_new = optimize_plan(CS_plan.curriculum, 8, 12, 18, [balance_obj, req_distance_obj]);
visualize(plan_new, notebook=true)

Academic license - for non-commercial use only
An optimal solution was found with objective value = 24.0


In [20]:
metrics = basic_metrics(plan_new)
println(String(take!(metrics)))


Curriculum: Computer Science
Degree Plan: 
  total credit hours = 122
  number of terms = 8
  max. credits in a term = 16, in term 2
  min. credits in a term = 15, in term 1
  avg. credits per term = 15.25, with std. dev. = 0.4330127018922193
  total distance between all requisites = 11
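Adding `req_distance_obj` reduces the total distance between all requisites from 37 to 11 while keeping the same credit balance. As a rough illustration of what this quantity measures, the sketch below assumes it is the sum, over every requisite pair, of the number of terms separating the requisite from the course that needs it. That definition, the toy plans, and the helper `total_distance` are assumptions made for illustration, not the toolbox's code.

```julia
# Hypothetical requisite pairs (requisite, course).
reqs = [("A", "B"), ("B", "C"), ("A", "D")]

# Sum of term gaps across all requisite pairs for a plan given as course => term.
total_distance(term, reqs) = sum(term[c] - term[r] for (r, c) in reqs)

# Two hypothetical schedules of the same four courses.
spread_plan = Dict("A" => 1, "B" => 2, "C" => 3, "D" => 4)
tight_plan  = Dict("A" => 1, "B" => 2, "C" => 3, "D" => 2)

println(total_distance(spread_plan, reqs))
println(total_distance(tight_plan, reqs))
```

Scheduling `D` right after its requisite `A` lowers the total, which is the kind of improvement the second optimization produces.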



## References
<a id='References'></a>

[1] Heileman, G. L., Abdallah, C. T., Slim, A., and Hickman, M. (2018). Curricular analytics: A framework for quantifying the impact of curricular reforms and pedagogical innovations. www.arXiv.org, arXiv:1811.09676 [cs.CY].

[2] Heileman, G. L., Free, H. W., Abar, O., and Thompson-Arjona, W. G. (2019). CurricularAnalytics.jl Toolbox. https://github.com/heileman/CurricularAnalytics.jl.