## K Means Clustering

This mini project had two main goals. The first was to construct a basic k means clustering model using a variable number of clusters and a Euclidean distance function. The second was to use k means clustering to explore the relationship between class grade distributions and class subjects at the University of Wisconsin at Madison between 2006 and 2018.

In [1]:
%matplotlib inline

import pandas as pd
import numpy as np
import random
from collections import defaultdict

import matplotlib.pyplot as plt

#### The datasets

Here I use three datasets (which are a subset of the datasets that I used in the UW_Madison project).
- ```/inputs/grade_distributions.csv``` contains the count of each grade assigned for each course section taught at the university during the given time period
- ```/inputs/subject_memberships.csv``` contains the subject codes associated with each course section 
- ```/inputs/subjects.csv``` associates a subject name to each subject code

In [2]:
# Loading in the data

grades = pd.read_csv('../UW_Madison/inputs/grade_distributions.csv')
subject_mems = pd.read_csv('../UW_Madison/inputs/subject_memberships.csv')
subject_names = pd.read_csv('../UW_Madison/inputs/subjects.csv')


In [3]:
# Converting the assigned grade counts to assigned grade percentages

graded_students_count = (grades['a_count'] + grades['ab_count']+ grades['b_count'] + grades['bc_count']
                      + grades['c_count'] + grades['d_count'] + grades['f_count'])

percent_A = grades['a_count']/graded_students_count
percent_AB = grades['ab_count']/graded_students_count
percent_B = grades['b_count']/graded_students_count
percent_BC = grades['bc_count']/graded_students_count
percent_C = grades['c_count']/graded_students_count
percent_D = grades['d_count']/graded_students_count
percent_F = grades['f_count']/graded_students_count


In [4]:
# Creating a dataframe of grade percentages for each course section offered

uw_section_grades = pd.DataFrame()

uw_section_grades['course_offering_uuid'] = grades['course_offering_uuid']
uw_section_grades['percent_a'] = percent_A
uw_section_grades['percent_ab'] = percent_AB
uw_section_grades['percent_b'] = percent_B
uw_section_grades['percent_bc'] = percent_BC
uw_section_grades['percent_c'] = percent_C
uw_section_grades['percent_d'] = percent_D
uw_section_grades['percent_f'] = percent_F


In [5]:
uw_section_grades.set_index('course_offering_uuid', drop = True, inplace = True)
uw_section_grades.head()

course_offering_uuid,percent_a,percent_ab,percent_b,percent_bc,percent_c,percent_d,percent_f
344b3ebe-da7e-314c-83ed-9425269695fd,1.0,0.0,0.0,0.0,0.0,0.0,0.0
f718e6cd-33f0-3c14-a9a6-834d9c3610a8,1.0,0.0,0.0,0.0,0.0,0.0,0.0
ea3b717c-d66b-30dc-8b37-964d9688295f,0.891026,0.076923,0.012821,0.0,0.019231,0.0,0.0
075da420-5f49-3dd0-93df-13e3c152e1b1,1.0,0.0,0.0,0.0,0.0,0.0,0.0
2b4e216d-a728-3713-8c7c-19afffc6b2fd,1.0,0.0,0.0,0.0,0.0,0.0,0.0


In [6]:
# And eliminating any course sections for which no student received a letter grade
# This ends up being roughly half of the total sections

uw_section_grades.dropna(axis = 0, how = 'any', inplace = True)

#### Associating subjects to each class

The University of Wisconsin has many classes that are cross-listed, that is, they are taught across multiple departments. Thus, my next step was to construct a dictionary with an entry for each course section which contained a list of all of the departments to which it belongs.  These lists can range from a single department up to ten, or even more. 

I then converted the resulting dictionary into a dataframe and merged it with my grades data to ensure that the two sets of observations were in the same order.

Finally, I split the data into two numpy arrays, the first ```grades``` containing all of the percentages of grades for each course section, and the second ```subjects``` containing all of the assigned subjects for each course section.
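
The dictionary-building step described above can also be sketched with a pandas `groupby`. This is only an illustration on made-up rows, not the code used in this notebook; the names `toy_mems`, `subject_lists`, and `wide` are hypothetical.

```python
import pandas as pd

# Toy stand-in for subject_memberships.csv (values are made up for illustration)
toy_mems = pd.DataFrame({
    'course_offering_uuid': ['a', 'a', 'b', 'c', 'c', 'c'],
    'subject_code': [220, 320, 736, 864, 900, 271],
})

# Collect every subject code listed for each course offering
subject_lists = toy_mems.groupby('course_offering_uuid')['subject_code'].apply(list)

# Spread the lists into columns Suj0, Suj1, ...; shorter lists are padded with NaN
wide = pd.DataFrame(subject_lists.tolist(), index=subject_lists.index)
wide.columns = ['Suj' + str(j) for j in range(wide.shape[1])]
```

As in the loop-based version, cross-listed offerings end up with several filled columns and single-department offerings with one.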

In [7]:
subjects_dict = {}

for i in range(len(subject_mems)):
    code = subject_mems.loc[i, 'course_offering_uuid']
    if code not in subjects_dict.keys():
        subjects_dict[code] = [subject_mems.loc[i, 'subject_code']]
    else:
        subjects_dict[code].append(subject_mems.loc[i, 'subject_code'])
        
max_subjects = 1
for key in subjects_dict.keys():
    if len(subjects_dict[key]) > max_subjects:
        max_subjects = len(subjects_dict[key])
        
cols = []
for j in range(max_subjects):
    cols.append('Suj' + str(j))
        
consolidated_subject_mems = pd.DataFrame.from_dict(subjects_dict, orient = 'index', columns = cols)

cols = ['course_offering_uuid']
cols.extend(consolidated_subject_mems.columns.tolist())

consolidated_subject_mems.reset_index(inplace = True, drop = False)
consolidated_subject_mems.columns = cols

consolidated_subject_mems.set_index('course_offering_uuid', drop = True, inplace = True)

In [8]:
consolidated_subject_mems.head()

course_offering_uuid,Suj0,Suj1,Suj2,Suj3,Suj4,Suj5,Suj6,Suj7,Suj8,Suj9,Suj10,Suj11
344b3ebe-da7e-314c-83ed-9425269695fd,220,320.0,346.0,612.0,636.0,207.0,490.0,240.0,,,,
f718e6cd-33f0-3c14-a9a6-834d9c3610a8,220,320.0,346.0,612.0,636.0,207.0,490.0,418.0,240.0,,,
ea3b717c-d66b-30dc-8b37-964d9688295f,220,320.0,684.0,346.0,612.0,207.0,636.0,490.0,418.0,240.0,,
075da420-5f49-3dd0-93df-13e3c152e1b1,220,320.0,346.0,612.0,207.0,636.0,490.0,240.0,,,,
2b4e216d-a728-3713-8c7c-19afffc6b2fd,220,320.0,684.0,346.0,612.0,207.0,636.0,490.0,240.0,,,


In [9]:
uw_section_grades = pd.merge(uw_section_grades, consolidated_subject_mems,
                           how='left', left_index = True,right_index =True, sort=False,
                           copy=False)



In [10]:
len(uw_section_grades)

87209

In [11]:
uw_section_grades.head(10)

course_offering_uuid,percent_a,percent_ab,percent_b,percent_bc,percent_c,percent_d,percent_f,Suj0,Suj1,Suj2,Suj3,Suj4,Suj5,Suj6,Suj7,Suj8,Suj9,Suj10,Suj11
000085b6-0eb9-386e-881e-60cc62be5b62,1.0,0.0,0.0,0.0,0.0,0.0,0.0,736,,,,,,,,,,,
00015734-b612-3152-bf5f-7f6855e1c0c0,0.294118,0.223529,0.164706,0.105882,0.188235,0.023529,0.0,224,,,,,,,,,,,
0002389b-0bda-3f47-b5e7-e9d8973cb2e9,0.636364,0.363636,0.0,0.0,0.0,0.0,0.0,544,,,,,,,,,,,
00028b06-6e42-3a3e-b484-69fd61baf978,0.444444,0.148148,0.333333,0.037037,0.0,0.037037,0.0,224,,,,,,,,,,,
00049821-be2e-3e8e-b697-848621267154,1.0,0.0,0.0,0.0,0.0,0.0,0.0,938,,,,,,,,,,,
0005d259-9986-3184-b638-e0f23ed55040,1.0,0.0,0.0,0.0,0.0,0.0,0.0,938,,,,,,,,,,,
0006b3f8-3403-35d5-bda2-9489e3c84434,0.730769,0.115385,0.076923,0.038462,0.038462,0.0,0.0,896,,,,,,,,,,,
000967ae-98da-36b7-888a-cfcf87c571e5,0.25,0.458333,0.208333,0.083333,0.0,0.0,0.0,864,900.0,271.0,,,,,,,,,
000967ae-98da-36b7-888a-cfcf87c571e5,0.391304,0.26087,0.130435,0.173913,0.043478,0.0,0.0,864,900.0,271.0,,,,,,,,,
00098cbe-d97f-3b05-b976-e67a6efdecbf,0.333333,0.333333,0.25,0.083333,0.0,0.0,0.0,418,240.0,,,,,,,,,,


In [12]:
grades = np.array(uw_section_grades[['percent_a', 'percent_ab', 'percent_b', 'percent_bc',
                            'percent_c', 'percent_d', 'percent_f']])

subjects = np.array(uw_section_grades[['Suj0', 'Suj1', 'Suj2', 'Suj3', 'Suj4', 'Suj5', 
                                       'Suj6', 'Suj7', 'Suj8', 'Suj9', 'Suj10', 'Suj11']])

In [13]:
grades[:5]

array([[1.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        ],
       [0.29411765, 0.22352941, 0.16470588, 0.10588235, 0.18823529,
        0.02352941, 0.        ],
       [0.63636364, 0.36363636, 0.        , 0.        , 0.        ,
        0.        , 0.        ],
       [0.44444444, 0.14814815, 0.33333333, 0.03703704, 0.        ,
        0.03703704, 0.        ],
       [1.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        ]])

## Building the k means clustering model

In order to construct a model for k means, I built a series of functions.
- ```euclid_dist``` computes the Euclidean distance between two points in $\mathbb{R}^n$,
- ```assign_center``` takes an observation and determines which of our current cluster centers it is closest to, then assigns it to that cluster,
- ```compute_centers``` takes an array of observations along with their assigned clusters and computes the center of each current cluster,
- ```kmeans_centers``` randomly selects k points from an array of observations as initial cluster centers, then iterates ```assign_center``` and ```compute_centers``` until the cluster centers converge,
- ```internal_dist``` computes the sum of the total pairwise internal distance among observations in each cluster, and
- ```best_kmeans``` calls ```kmeans_centers``` repeatedly and returns the cluster centers obtained by the best performing run.
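
The assign-then-recompute loop that these functions implement can be sketched compactly on toy data. The sketch below uses NumPy broadcasting instead of the dictionary-based functions defined in this notebook, and it leaves an empty cluster at its previous center rather than re-seeding it; `toy_kmeans` and its `seed` parameter are hypothetical names introduced for illustration.

```python
import numpy as np

def toy_kmeans(points, k, epsilon=0.001, seed=0):
    """Illustrative k-means loop; not the notebook's implementation."""
    rng = np.random.default_rng(seed)
    # Pick k distinct observations as the initial centers
    centers = points[rng.choice(points.shape[0], size=k, replace=False)].astype(float)
    while True:
        # Distance from every point to every center, shape (n_points, k)
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Mean of each cluster; an empty cluster keeps its previous center
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Total distance the centers moved in this round
        moved = np.linalg.norm(new_centers - centers, axis=1).sum()
        centers = new_centers
        if moved <= epsilon:
            return centers, labels

# Two well-separated groups of points in the plane
toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
toy_centers, toy_labels = toy_kmeans(toy, k=2)
```

The stopping rule mirrors the one described above: iteration ends once the summed movement of the centers falls to `epsilon` or below.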

In [14]:
def euclid_dist(p1, p2):
    
    """
    Computes the Euclidean distance between two points in  R^n

    Parameters
    ----------
    p1 : list or numpy array
        A point in R^n
    p2 : list or numpy array
        A second point in R^n

    Returns
    -------
    float
        The Euclidean distance between the two points

    """
    
    v1 = np.array(p1)
    v2 = np.array(p2)
    
    diff_squared = (v1 - v2) ** 2
    
    dist = np.sqrt(np.sum(diff_squared))
    
    return dist

In [15]:
def assign_center(p1, centers): # p1 is list or np array, centers is dictionary
    
    """
    Takes an observation and determines to which of our current cluster centers it is closest, 
    then assigns it to that cluster


    Parameters
    ----------
    p1 : numpy array 
        An observation
    centers : dictionary
        The centers of our current clusters 

    Returns
    -------
    int
        The key of the closest cluster center

    """
    
    current_closest = -1
    current_dist = np.inf

    for key in centers.keys():
        new_dist = euclid_dist(p1, centers[key])
        if new_dist < current_dist:
            current_closest = key
            current_dist = new_dist
    
    return current_closest

In [16]:
def compute_centers(da, assignments, k):
    
    """
    Takes an array of observations along with their assigned clusters and 
    computes the center of each current cluster


    Parameters
    ----------
    da : numpy array 
        The observations on which we are basing our model
    assignments : list
        The assigned cluster for each observation
    k : int
        The number of clusters in the current model
    
    Returns
    -------
    dictionary
        The new centers for the clusters

    """
    
    new_centers = {}
    
    for i in range(k):
        
        # retrieve the points assigned to the given group
        
        mask = [assignments[j] == i for j in range(len(assignments))]
        grouped_points = da[mask, :]
        
        if grouped_points.shape[0] > 0:
            # compute the center of that group
            mean = np.mean(grouped_points, axis = 0)
            new_centers[i] = mean
        else:
            # the cluster is empty, so re-seed it with a randomly chosen observation
            random_index = random.randrange(da.shape[0])
            new_centers[i] = da[random_index, :]
        
    return new_centers

In [17]:
def kmeans_centers(da, k, epsilon = 0.001):
    
    """
    Randomly selects k points from an array of observations as initial cluster centers, 
    then iterates ```assign_center``` and ```compute_centers``` until the cluster centers converge


    Parameters
    ----------
    da : numpy array 
        The observations on which we are basing our model
    k : int
        The number of clusters we are creating
    epsilon : float
        Controls at what point the algorithm is said to have converged, default value = 0.001

    Returns
    -------
    dictionary
        The cluster centers after the algorithm has converged

    """
    
    # Randomly choose k initial centers
    
    center_indices = random.sample(list(range(da.shape[0])), k)
        
    centers = {}
    for i in range(len(center_indices)):
        centers[i] = da[center_indices[i], :]
        
    keep_going = True
      
    rounds = 0
    while keep_going:
        rounds += 1
        # Assign each point to the closest center
    
        assignments = [-1 for i in range(da.shape[0])]
        for i in range(da.shape[0]):
            assignments[i] = assign_center(da[i, :], centers)
    
        # Compute the center of each chunk
    
        new_centers = compute_centers(da, assignments, k)
    
        # Compute the total distance that the centers have moved
    
        distance_moved = 0
        for key in centers.keys():
            distance_moved += euclid_dist(centers[key], new_centers[key])
            
        if distance_moved > epsilon:
            centers = new_centers
        else:
            keep_going = False
    
    return centers

In [18]:
def internal_dist(da, centers):
    
    """
    Computes the sum of the total pairwise internal distance among observations in each cluster


    Parameters
    ----------
    da : numpy array 
        The observations on which we are basing our model
    centers : dictionary
        The centers of our current clusters

    Returns
    -------
    float
        The total internal distance for all of the current clusters

    """
    
    total_distance = 0
    assignments = [-1 for i in range(da.shape[0])]
    for i in range(da.shape[0]):
        assignments[i] = assign_center(da[i], centers)
        
    for key in centers.keys():
        mask = [assignments[k] == key for k in range(len(assignments))]
        grouped_points = da[mask]
        
        group_distance = 0
        for j in range(grouped_points.shape[0]):
            for k in range(grouped_points.shape[0]):
                group_distance += euclid_dist(grouped_points[j], grouped_points[k])
        
        total_distance += group_distance

    return total_distance
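
The double loop in ```internal_dist``` visits every ordered pair of points in a cluster, so each pair is counted twice and each point's zero distance to itself is included. A vectorized sketch of that per-cluster sum, assuming the group fits in memory as an n × n × d array, might look like this; `pairwise_total` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def pairwise_total(points):
    """Sum of Euclidean distances over all ordered pairs of rows in points."""
    # diffs[j, k] holds the coordinate-wise difference between row j and row k
    diffs = points[:, None, :] - points[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2)).sum()

# Three points on a line: distances 1, 1 and 2, each counted twice
toy_group = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
total = pairwise_total(toy_group)
```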

In [19]:
def best_kmeans(da, k, n = 3, epsilon = 0.001):
    
    """
    Calls ```kmeans_centers``` repeatedly and returns the 
    cluster centers that were obtained by the best performing run


    Parameters
    ----------
    da : numpy array 
        The observations on which we are basing our model
    k : int
        The number of clusters we are creating
    n : int
        The number of models to build and compare
    epsilon : float
        Controls at what point the algorithm is said to have converged, default value = 0.001

    Returns
    -------
    dictionary
        The cluster centers from the run with the smallest total internal distance

    """
    
    best_dist = np.inf
    best_centers = {}
    
    for i in range(n):
        centers = kmeans_centers(da, k, epsilon)
        dist = internal_dist(da, centers)
        if dist < best_dist:
            best_dist = dist
            best_centers = centers
            
    return best_centers

## Fitting a model to our grades data

Due to the size of the data set (87,209 observations), I fit the model on a random sample of 5,000 observations.
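
Sampling of this kind can be made repeatable by seeding a random generator. The following is a minimal sketch on a made-up array; the seed value and the names `toy_grades`, `sample_idx`, and `toy_sampled` are assumptions for illustration, and the notebook itself uses `random.sample` without a seed.

```python
import numpy as np

# Stand-in for the grades array: 20 rows of 7 grade fractions (made-up values)
rng = np.random.default_rng(42)   # the seed value is an arbitrary choice
toy_grades = rng.dirichlet(np.ones(7), size=20)

# Draw 5 distinct row indices, analogous to sampling 5000 rows of the full data
sample_idx = rng.choice(toy_grades.shape[0], size=5, replace=False)
toy_sampled = toy_grades[sample_idx]
```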

In [20]:
sample = random.sample(list(range(grades.shape[0])), 5000)
sampled = grades[sample]

In [21]:
centers = best_kmeans(sampled, 10)

In [22]:
centers

{0: array([0.50341434, 0.15205297, 0.23667483, 0.04300628, 0.0454299 ,
        0.01009001, 0.00933166]),
 1: array([0.16221663, 0.33506411, 0.36633835, 0.08644442, 0.03577865,
        0.00825849, 0.00589934]),
 2: array([0.27108101, 0.54121236, 0.13527558, 0.02874053, 0.01543891,
        0.00467976, 0.00357184]),
 3: array([0.20849883, 0.13624864, 0.27956597, 0.13708489, 0.16898335,
        0.04424513, 0.02537319]),
 4: array([0.51166991, 0.3784247 , 0.07353292, 0.01974098, 0.00931554,
        0.00360621, 0.00370973]),
 5: array([0.2392817 , 0.07582715, 0.5777588 , 0.03531778, 0.057615  ,
        0.00882902, 0.00537056]),
 6: array([0.84934366, 0.09138226, 0.03879665, 0.00762495, 0.00713333,
        0.00212931, 0.00358983]),
 7: array([0.33289192, 0.29833688, 0.22901418, 0.07862603, 0.03859912,
        0.01253314, 0.00999873]),
 8: array([0.69334177, 0.2195652 , 0.05784988, 0.01266468, 0.00874554,
        0.00341936, 0.00441355]),
 9: array([9.90431470e-01, 4.32021127e-03, 3.29813258e-

## Assigning subjects to clusters

In order to assign subjects to clusters, I constructed the following functions:
- ```subjects_by_asst``` produces a dictionary of frequencies for each subject code associated to a class found in a specified cluster, 
- ```subject_clusters``` assigns each subject code to the cluster for which it has the highest frequency
- ```subject_name_clusters``` creates a dictionary of subject names associated to each cluster
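
The idea behind the first two functions, counting how often each subject code appears in each cluster and then keeping the most frequent cluster, can be sketched on toy data with `collections.Counter`. The labels and subject codes below are made up, and the variable names are hypothetical.

```python
from collections import Counter, defaultdict

# Made-up cluster labels for five course sections and their subject codes
toy_assignments = [0, 0, 1, 1, 1]
toy_subjects = [[220, 320], [220], [220, 736], [736], [736]]

# Count how often each subject code appears in each cluster
counts = defaultdict(Counter)
for cluster, codes in zip(toy_assignments, toy_subjects):
    for code in codes:
        counts[code][cluster] += 1

# Assign each subject to the cluster where it appears most often
subject_to_cluster = {code: c.most_common(1)[0][0] for code, c in counts.items()}
```

Here subject 220 appears twice in cluster 0 and once in cluster 1, so it is assigned to cluster 0.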


In [23]:
def subjects_by_asst(assignments, asst, da):
    
    """
    Produces a dictionary of frequencies for each subject code associated to a class
    found in a specified cluster


    Parameters
    ----------
    assignments : list 
        The cluster to which each observation is assigned
    asst : int
        The number of the cluster under consideration
    da : numpy array
        The subject assignments for each course section

    Returns
    -------
    dict
        The number of occurrences of each subject code for course sections assigned
        to the given cluster

    """
    
    assignment_mask = [assignments[i] == asst for i in range(len(assignments))]
    subjects = da[assignment_mask].tolist()
    
    subject_list = [item for sublist in subjects for item in sublist if str(item) != 'nan']
    
    my_subject_dict = defaultdict(int)
    for item in subject_list:
        my_subject_dict[item] += 1
    
    return my_subject_dict

In [24]:
def subject_clusters(centers, da, subjects):
    
    """
    Assigns each subject code to the cluster for which it has the highest frequency


    Parameters
    ----------
    centers : dictionary 
        The centers of our current clusters
    da : numpy array
        The observations on which we are basing our model
    subjects: numpy array
        The subject codes assigned to each course section

    Returns
    -------
    pandas dataframe
        The cluster most associated to each subject code

    """
    
    # First assign each observation in the dataset to a center
    
    assignments = [-1 for i in range(da.shape[0])]
    for i in range(da.shape[0]):
        assignments[i] = assign_center(da[i, :], centers)
        
    # Second, create a dictionary of dictionaries of subject counts assigned to each cluster
    
    subject_cluster_dicts = {}
    
    for i in range(len(centers)):
        subject_cluster_dicts[i] = subjects_by_asst(assignments, i, subjects)
        
    # Third, create a list of keys across all dictionaries in subject_cluster_dicts
    
    subject_keys = []
    
    for i in subject_cluster_dicts.keys():
        subject_keys.extend(subject_cluster_dicts[i].keys())
        
    # And now assign each key to a cluster based on which cluster it occurs in most frequently
    
    subject_cluster_asst = pd.DataFrame()

    for key in subject_keys:
        cluster_count = [subject_cluster_dicts[i][key] for i in subject_cluster_dicts.keys()]
        subject_cluster_asst.loc[key, 'assignment'] = np.argmax(cluster_count)
        
    subject_cluster_asst.reset_index(inplace = True)
    
    return subject_cluster_asst

In [25]:
subject_assignments = subject_clusters(centers, grades, subjects)

In [26]:
subject_names.drop(
    [subject_names.loc[(subject_names['code'] == 'ZZZ')].index.tolist()[0], 
     subject_names.loc[(subject_names['code'] == 'SAB')].index.tolist()[0]],
inplace = True)

In [27]:
subject_names = subject_names.astype({'code': 'int32'})

In [28]:
subject_names.dtypes

code             int32
name            object
abbreviation    object
dtype: object

In [29]:
subject_names.set_index('code', drop = True, inplace = True)

In [30]:
def subject_name_clusters(subject_assignments, subject_names):
    
    """
    Creates a dictionary of subject names associated to each cluster


    Parameters
    ----------
    subject_assignments : pandas dataframe 
        Associates a cluster to each subject code
    subject_names : pandas dataframe
        Associates a subject name to each subject code

    Returns
    -------
    dictionary
        The subject names associated with each cluster

    """
    
    subject_clusters = {}
    
    possible_assignments = list(set(subject_assignments['assignment']))
    
    for asst in possible_assignments:
        subjects_assigned = list(subject_assignments.loc[subject_assignments['assignment'] == asst]['index'])
        cluster_subjects = list(subject_names.loc[subjects_assigned, 'name'])
        subject_clusters[asst] = cluster_subjects
        
    return subject_clusters

In [31]:
subject_clusters = subject_name_clusters(subject_assignments, subject_names)

Finally, I want to examine the subject clusters that I've produced and the cluster centers that go with them.

In [38]:
print(centers[0])
subject_clusters[0]

[0.50341434 0.15205297 0.23667483 0.04300628 0.0454299  0.01009001
 0.00933166]


['Molecular and Environmental Toxicology Center',
 'Pathology and Laboratory Medicine',
 'East Asian Area Studies',
 'Portuguese (Spanish and Portuguese)',
 'Latin (Classics)',
 'ANIMAL HEALTH AND BIOMEDICAL SCIENCES',
 'Agronomy',
 'German',
 'Comparative Biosciences',
 'Greek (Classics)',
 'TRANSPORTATION AND PUBLIC UTILITIES']

In [49]:
for key in subject_clusters.keys():
    print('For cluster {} we have:'.format(np.rint(key)))
    print('        The cluster center is ', centers[key])
    print('        The cluster subjects are', list(np.sort(subject_clusters[key])))
    print('===========')

For cluster 0.0 we have:
        The cluster center is  [0.50341434 0.15205297 0.23667483 0.04300628 0.0454299  0.01009001
 0.00933166]
        The cluster subjects are ['ANIMAL HEALTH AND BIOMEDICAL SCIENCES', 'Agronomy', 'Comparative Biosciences', 'East Asian Area Studies', 'German', 'Greek (Classics)', 'Latin (Classics)', 'Molecular and Environmental Toxicology Center', 'Pathology and Laboratory Medicine', 'Portuguese (Spanish and Portuguese)', 'TRANSPORTATION AND PUBLIC UTILITIES']
For cluster 1.0 we have:
        The cluster center is  [0.16221663 0.33506411 0.36633835 0.08644442 0.03577865 0.00825849
 0.00589934]
        The cluster subjects are ['CLINICAL LABORATORY SCIENCE', 'Communication Arts', 'Legal Studies', 'Marketing', 'Senior Medical Program']
For cluster 2.0 we have:
        The cluster center is  [0.27108101 0.54121236 0.13527558 0.02874053 0.01543891 0.00467976
 0.00357184]
        The cluster subjects are ['Engineering Professional Development', 'Jewish Studies', 'P

## Results

Using the k means model that I produced, I found that there does indeed seem to be a relationship between grading severity and subject (or academic discipline) at the University of Wisconsin. The following table shows the subjects assigned to each cluster along with the grade values for the cluster center. The table is sorted from fewest As to most As as a fraction of total grades. Note that while three cluster centers had fewer than 25% assigned A grades, five had more than 50% assigned A grades, and one had a whopping 99% assigned A grades. Note also that only one cluster center has more than 5% D or F grades: the second cluster in the table, associated primarily with Engineering and the Sciences. By contrast, there are five clusters where fewer than 1% of assigned grades were D or F.
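
The combined D and F shares discussed above can be recomputed from the cluster centers in the table. This is a small sketch using the table's values rounded to three decimals; `table_centers` and the other variable names are hypothetical.

```python
import numpy as np

# Cluster centers from the results table, rounded to three decimals
# (columns: A, AB, B, BC, C, D, F; rows in the table's order)
table_centers = np.array([
    [0.162, 0.335, 0.366, 0.086, 0.036, 0.008, 0.006],
    [0.208, 0.136, 0.280, 0.137, 0.169, 0.044, 0.025],
    [0.239, 0.076, 0.578, 0.035, 0.058, 0.009, 0.005],
    [0.271, 0.541, 0.135, 0.029, 0.015, 0.005, 0.004],
    [0.332, 0.298, 0.229, 0.079, 0.039, 0.013, 0.010],
    [0.503, 0.152, 0.237, 0.043, 0.045, 0.010, 0.009],
    [0.512, 0.378, 0.074, 0.020, 0.009, 0.004, 0.004],
    [0.693, 0.220, 0.058, 0.013, 0.009, 0.003, 0.004],
    [0.849, 0.091, 0.039, 0.008, 0.007, 0.002, 0.004],
    [0.990, 0.004, 0.003, 0.001, 0.001, 0.000, 0.001],
])

d_or_f = table_centers[:, 5] + table_centers[:, 6]   # combined D and F share per cluster
above_5_percent = np.flatnonzero(d_or_f > 0.05)      # zero-based row positions in the table
below_1_percent = np.flatnonzero(d_or_f < 0.01)
```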


|A|AB|B|BC|C|D|F|Subjects|
|-|--|-|--|-|-|-|:--------|
|0.162| 0.335| 0.366| 0.086| 0.036| 0.008| 0.006|<ul><li>Clinical Laboratory Science</li><li> Communication Arts</li><li> Legal Studies</li><li> Marketing</li><li> Senior Medical Program</li></ul>|
|0.208| 0.136| 0.280| 0.137| 0.169| 0.044| 0.025|<ul><li> Accounting and Information Systems</li><li> Agricultural and Applied Economics</li><li> Anthropology</li><li> Astronomy</li><li> Atmospheric and Oceanic Sciences</li><li> Biological Systems Engineering</li><li> Biology</li><li> Botany</li><li> Chemical and Biological Engineering</li><li> Chemistry</li><li> Computer Sciences</li><li> Economics</li><li> Electrical and Computer Engineering</li><li> Engineering Mechanics and Astronautics</li><li> Food Science</li><li> Geological Engineering</li><li> Geoscience</li><li> Mathematics</li><li> Mechanical Engineering</li><li> Nutritional Sciences</li><li> Physics</li><li> Statistics</li><li> Wildlife Ecology</li><li> Zoology</li></ul>|
|0.239| 0.076| 0.578|  0.035|0.058|   0.009| 0.005 |<ul><li>Biology Core Curriculum</li><li> Law</li></ul>|
|0.271| 0.541| 0.135| 0.029| 0.015| 0.005| 0.004|<ul><li> Engineering Professional Development</li><li> Jewish Studies</li><li> Pediatrics</li><li> Physician Assistant Program</li><li> Psychiatry</li></ul>|
|0.332| 0.298| 0.229| 0.079| 0.039| 0.013| 0.010|<ul><li> Actuarial Science</li><li> Afro-American Studies</li><li> Art History</li><li> Chicana/o and Latina/o Studies</li><li> Classics</li><li> Community and Environmental Sociology</li><li> English as a Second Language</li><li> Farm and Industry Short Course</li><li> Finance</li><li> Investment and Banking</li><li> Forest and Wildlife Ecology</li><li> Geography</li><li> History</li><li> History of Science</li><li> Horticulture</li><li> International Studies</li><li> Italian (French and Italian)</li><li> Landscape Architecture</li><li> Languages and Cultures of Asia</li><li> Linguistics</li><li> Literature in Translation</li><li> Medical History and Bioethics</li><li> Medical Sciences - Medical School</li><li> Medieval Studies</li><li> Nuclear Engineering</li><li> Operations and Technology Management</li><li> Philosophy</li><li> Plant Pathology</li><li> Political Science</li><li> Real Estate and Urban Land Economics</li><li> Religious Studies</li><li> Risk Management and Insurance</li><li> Sociology</li><li> Soil Science</li><li> Spanish (Spanish and Portuguese)</li></ul>|
|0.503| 0.152| 0.237| 0.043| 0.045| 0.010| 0.009| <ul><li>Animal Health and Biomedical Sciences</li><li> Agronomy</li><li> Comparative Biosciences</li><li> East Asian Area Studies</li><li> German</li><li> Greek (Classics)</li><li> Latin (Classics)</li><li>Molecular and Environmental Toxicology Center</li><li> Pathology and Laboratory Medicine</li><li> Portuguese (Spanish and Portuguese)</li><li> Transportation and Public Utilities</li></ul>|
|0.512| 0.378| 0.074|0.020| 0.009| 0.004| 0.004 |<ul><li>Asian American Studies</li><li> Biomedical Engineering</li><li> Comparative Literature</li><li> English</li><li> Emergency Medicine</li><li> French (French and Italian)</li><li> Gender and Women’s Studies</li><li> Human Oncology</li><li> Interdisciplinary Courses (Engineering)</li><li> Journalism and Mass Communication</li><li> La Follette School of Public Affairs</li><li> Latin American, Caribbean, and Iberian Studies</li><li> Management and Human Resources</li><li> Medical Physics</li><li> Scandinavian Studies</li></ul>|
|0.693|0.220|  0.058| 0.013| 0.009| 0.003| 0.004|<ul><li> African Languages and Literature</li><li> American Indian Studies</li><li> Civil Society and Community Studies</li><li> Collaborative Nursing Program</li><li> Design Studies</li><li> East Asian Languages and Literature</li><li> Family and Consumer Communications</li><li> Folklore Program</li><li> Hebrew</li><li> Human Development and Family Studies</li><li> Industrial Relations</li><li> Information Systems</li><li> Integrated Arts</li><li> Life Sciences Communication</li><li> Physical Educ Activity Progm</li><li> Pharmacy</li><li> Radiology</li><li> Slavic (Slavic Languages)</li></ul>|
|0.849|0.091| 0.039| 0.008| 0.007| 0.002| 0.004|<ul><li> Air Force Aerospace Studies</li><li> Consumer Science</li><li> Dance</li><li> Engineering Physics</li><li> General Business</li><li> Interdisciplinary Courses (CALS)</li><li> Interdisciplinary Courses (LandS)</li><li> Interdisciplinary Courses (SOHE)</li><li> Languages and Cultures of Asia - Languages</li><li> Professional Orientation</li></ul>|
|0.990| 0.004| 0.003| 0.001| 0.001| 0.000| 0.001|<ul><li> Agroecology</li><li> Anatomy</li><li> Anesthesiology</li><li> Animal Sciences</li><li> Art Department</li><li> Art Education (Department of Art)</li><li> Biochemistry</li><li> Biomolecular Chemistry</li><li> Biostatistics and Medical Informatics</li><li> Cell and Regenerative Biology</li><li> Civil and Environmental Engineering</li><li> Communication Sciences and Disorders</li><li> Counseling Psychology</li><li> Curriculum and Instruction</li><li> Dairy Science</li><li> Educational Leadership and Policy Analysis</li><li> Educational Policy Studies</li><li> Educational Psychology</li><li> English</li><li> Entomology</li><li> Environmental Studies - Gaylord Nelson Institute</li><li> Family Medicine</li><li> Genetics</li><li> Hebrew-Modern</li><li> Industrial and Systems Engineering</li><li> Integrated Liberal Studies</li><li> Integrated Science</li><li> International Business</li><li> Kinesiology</li><li> Library and Information Studies</li><li> Materials Science and Engineering</li><li> Medical Genetics</li><li> Medical Microbiology and Immunology</li><li> Medical Sciences - Veterinary Medicine</li><li> Medicine</li><li> Microbiology</li><li> Military Science</li><li> Molecular Biology</li><li> Music</li><li> Music-Performance</li><li> Naval Science</li><li> Neurological Surgery</li><li> Neurology</li><li> Neuroscience</li><li> Neuroscience Training Program</li><li> Nursing</li><li> Obstetrics and Gynecology</li><li> Occupational Therapy (Department of Kinesiology)</li><li> Oncology</li><li> Ophthalmology and Visual Sciences</li><li> Patho-Biological Sciences</li><li> Pharmaceutical Sciences</li><li> Pharmacology</li><li> Pharmacy Practice</li><li> Physical Therapy</li><li> Physiology</li><li> Population Health Sciences</li><li> Psychology</li><li> Rehabilitation Psychology and Special Education</li><li> Science and Technology Studies</li><li> Social Work</li><li> Social and Administrative Pharmacy</li><li> Surgery</li><li> Surgical Sciences</li><li> Theatre and Drama</li><li> Therapeutic Science (Department of Kinesiology)</li><li> Urban and Regional Planning</li></ul>|