
# Project Option 2

Collin Heyden

12-13-2023



# Part 1: Colley Method

## Colley Method

Colley's method, as explained in section 2 of the paper, is a system for ranking sports teams that goes beyond simple winning percentages. It was developed by astrophysicist Wesley Colley. The method uses Laplace's rule of succession as its foundation, adjusting the standard winning percentage formula.

The core idea of Colley's method is to adjust a team's simple win-loss record by considering the strength of its opponents. This is accomplished by solving a system of linear equations that interrelate the ratings of all teams. Each team's rating depends not only on its own performance but also on the performance of the teams it has played.

Colley's method starts by transforming the standard winning percentage into a modified formula, given by:

$$r_i = \frac{1 + w_i}{2 + t_i}$$


where $r_i$ is the rating of team $i$, $w_i$ is its number of wins, and $t_i$ is the total number of games it has played. Before any games are played ($w_i = t_i = 0$) the formula gives $r_i = \frac{1}{2}$, so every team starts the season with the same rating.

The interdependence of team ratings is a key feature of Colley's method: a team's rating depends on the ratings of the opponents it has faced. The resulting system of equations is written in matrix form as $Cr = b$, where $C$ is the Colley matrix. Each diagonal entry is $2 + t_i$, the number of games team $i$ has played plus the constant 2; each off-diagonal entry $C_{ij}$ is the negative of the number of games played between teams $i$ and $j$; and $b_i = 1 + (w_i - l_i)/2$, where $l_i$ is the number of losses of team $i$. Solving the system gives the rating of every team.
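
As a small illustration of the matrix form described above, the sketch below builds and solves the Colley system for a made-up three-team round robin. The game list and the names `games_ex`, `C_ex`, `b_ex`, and `colley_example` are assumptions for illustration only; they do not come from the paper or from the data files used later.

```r
# Hypothetical round robin: team 1 beat team 2, team 1 beat team 3, team 2 beat team 3.
games_ex <- data.frame(winner = c(1, 1, 2), loser = c(2, 3, 3))
n_ex <- 3

C_ex <- diag(2, n_ex)        # 2 on the diagonal before any games are counted
wins_ex <- numeric(n_ex)
losses_ex <- numeric(n_ex)

for (g in seq_len(nrow(games_ex))) {
  w <- games_ex$winner[g]
  l <- games_ex$loser[g]
  C_ex[w, w] <- C_ex[w, w] + 1   # one more game played by the winner
  C_ex[l, l] <- C_ex[l, l] + 1   # one more game played by the loser
  C_ex[w, l] <- C_ex[w, l] - 1   # one more game between the two teams
  C_ex[l, w] <- C_ex[l, w] - 1
  wins_ex[w] <- wins_ex[w] + 1
  losses_ex[l] <- losses_ex[l] + 1
}

b_ex <- 1 + (wins_ex - losses_ex) / 2   # right-hand side: 2, 1, 0
colley_example <- solve(C_ex, b_ex)
print(colley_example)                   # 0.7 0.5 0.3
```

The ratings average 0.5, which is a general property of the Colley system: every row of the matrix sums to 2 and the entries of the right-hand side sum to the number of teams.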

## Laplace's Rule of Succession

Laplace's Rule of Succession is a principle in probability theory that estimates the probability of a future event from past occurrences. It is particularly useful when outcomes have been observed but the underlying probability of those outcomes is unknown. The rule is the starting point for Colley's method of sports ranking.

The rule assumes that the probability of an event (such as a sports team winning a match) is not known and must be estimated from past occurrences. The formula for the Rule of Succession is $\frac{k+1}{n+2}$, where $k$ is the number of observed successes (wins, in the context of sports) and $n$ is the total number of trials (games played). With no observations ($k = n = 0$) the estimate is $\frac{1}{2}$.

In a sports ranking context, this principle is used to adjust a team's win-loss record by considering the likelihood of future wins based on past performance. It starts with a prior belief that every outcome is equally likely, and then updates this belief based on observed outcomes (wins and losses). The strength of this approach lies in its ability to handle uncertainty and provide a more nuanced ranking that reflects not just the number of wins but the probability of winning future games based on past performance.
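
A minimal sketch of the rule as a function may help make the update concrete. The function name `laplace_estimate` is an assumption for illustration; the second call uses Davidson's record (19 wins in 21 games) from the results table further down.

```r
# Laplace's Rule of Succession: (successes + 1) / (trials + 2)
laplace_estimate <- function(k, n) {
  (k + 1) / (n + 2)
}

no_games <- laplace_estimate(0, 0)     # 0.5: the prior before any games are played
davidson <- laplace_estimate(19, 21)   # 20 / 23, about 0.8696
print(c(no_games, davidson))
```

The second value agrees with the `InitialRating` column computed in the code cells below, which uses the same formula.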

In Colley's method, Laplace's Rule of Succession supplies the starting point: each team's rating begins as $\frac{1 + w_i}{2 + t_i}$. The rule itself uses only a team's own record; the adjustment for the strength of opponents comes from Colley's system of linear equations. Together they give an assessment that is more informative than simply counting wins and losses, since it reflects both a team's performance and the performance of its opponents.

This understanding of Laplace's Rule of Succession and its application in sports ranking is derived from a combination of mathematical details and theoretical explanations found on Wikipedia and Jonathan Weisberg's discussion on the subject.

Link 1: https://en.wikipedia.org/wiki/Rule_of_succession

Link 2: https://jonathanweisberg.org/post/inductive-logic-2/

In [None]:
########################## Matrices from Text

# Colley matrix
C <- matrix(c(10, -2, -2, -2, -2,
              -2, 9, -2, -2, -2,
              -2, -2, 9, -2, -1,
              -2, -2, -2, 10, -2,
              -2, -1, -1, -2, 8),
            byrow = TRUE, nrow = 5)

# Right-hand side vector
b <- c(3, 1.5, 0.5, 0, 0)

# Solve for colley ratings
r <- solve(C, b)

# Print ratings
print("Colley ratings from paper:")
print(C)
print(r)
cat("\n")


# Colley matrix
C <- matrix(c(10, -2, -2, -2, -2,
              -2, 10, -2, -2, -2,
              -2, -2, 9, -2, -1,
              -2, -2, -2, 10, -2,
              -2, -2, -1, -2, 9),
            byrow = TRUE, nrow = 5)

# Right-hand side vector
b <- c(3, 1, 0.5, 0, 0.5)

# Solve for colley ratings
r <- solve(C, b)

# Print ratings
print("Colley ratings from paper:")
print(C)
print(r)
cat("\n\n")

[1] "Colley ratings from paper:"
     [,1] [,2] [,3] [,4] [,5]
[1,]   10   -2   -2   -2   -2
[2,]   -2    9   -2   -2   -2
[3,]   -2   -2    9   -2   -1
[4,]   -2   -2   -2   10   -2
[5,]   -2   -1   -1   -2    8
[1] 0.7025 0.6300 0.5000 0.4525 0.4300

[1] "Colley ratings from paper:"
     [,1] [,2] [,3] [,4] [,5]
[1,]   10   -2   -2   -2   -2
[2,]   -2   10   -2   -2   -2
[3,]   -2   -2    9   -2   -1
[4,]   -2   -2   -2   10   -2
[5,]   -2   -2   -1   -2    9
[1] 0.6666667 0.5000000 0.4583333 0.4166667 0.4583333




In [None]:
########################## Data from Text
########################## Colley Rating

########################## Setting up data

library(readr)
library(dplyr)
library(Matrix)
library(lubridate)

teams <- read_csv("teams.csv", col_names = c("team_number", "team_name"), show_col_types = FALSE)
scores <- read_csv("scores.csv", col_names = c("days_since", "date", "team1_number", "homefield1", "team1_score", "team2_number", "homefield2", "team2_score"), show_col_types = FALSE)

teams$Wins <- 0
teams$Losses <- 0

# Calculate wins and losses
for (i in 1:nrow(scores)) {
    if (scores$team1_score[i] > scores$team2_score[i]) {
        teams$Wins[teams$team_number == scores$team1_number[i]] <- teams$Wins[teams$team_number == scores$team1_number[i]] + 1
        teams$Losses[teams$team_number == scores$team2_number[i]] <- teams$Losses[teams$team_number == scores$team2_number[i]] + 1
    } else if (scores$team1_score[i] < scores$team2_score[i]) {
        teams$Wins[teams$team_number == scores$team2_number[i]] <- teams$Wins[teams$team_number == scores$team2_number[i]] + 1
        teams$Losses[teams$team_number == scores$team1_number[i]] <- teams$Losses[teams$team_number == scores$team1_number[i]] + 1
    }
}

# Calculate Games Played and Initial Rating
teams$GamesPlayed <- teams$Wins + teams$Losses
teams$InitialRating <- (1 + teams$Wins) / (2 + teams$GamesPlayed)

# Create the data frame
data <- data.frame(Team = teams$team_name,
                   Wins = teams$Wins,
                   Losses = teams$Losses,
                   GamesPlayed = teams$GamesPlayed,
                   InitialRating = teams$InitialRating)

# Sort the data
sorted_data <- data %>%
  arrange(desc(InitialRating))

# Print the sorted data
print("Stats from teams.csv and scores.csv:")
print(sorted_data)
cat("\n\n")

########################## Colley Rating

# Initialize the Colley matrix and b vector
num_teams <- nrow(teams)
C <- Diagonal(x=rep(2, num_teams))  # Start with 2 on the diagonal
b <- rep(1, num_teams)  # Start with 1 in b-vector

# Update the matrix and vector for each game
for (i in 1:nrow(scores)) {
    team1 <- as.integer(scores$team1_number[i])
    team2 <- as.integer(scores$team2_number[i])

    C[team1, team1] <- C[team1, team1] + 1
    C[team2, team2] <- C[team2, team2] + 1
    C[team1, team2] <- C[team1, team2] - 1
    C[team2, team1] <- C[team2, team1] - 1

    if (scores$team1_score[i] > scores$team2_score[i]) {
        b[team1] <- b[team1] + 1
        b[team2] <- b[team2] - 1
    } else {
        b[team1] <- b[team1] - 1
        b[team2] <- b[team2] + 1
    }
}

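# Rescale b: the loop leaves b = 1 + (wins - losses); this line gives
# 0.5 + (wins - losses) / 2. Colley's right-hand side is 1 + (wins - losses) / 2,
# so every rating below sits exactly 0.25 under the Colley value (order unchanged).
b <- 1 + 0.5 * (b - 2)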

# Solve for ratings
ratings <- solve(C, b)

# Add ratings to the teams data frame
teams$ColleyRating <- ratings[teams$team_number]

# Order teams by Colley rating
sorted_teams <- teams %>%
  arrange(desc(ColleyRating))

# Print the sorted teams
print("Teams sorted by Colley Rating:")
print(sorted_teams)

[1] "Stats from teams.csv and scores.csv:"
             Team Wins Losses GamesPlayed InitialRating
1        Davidson   19      2          21     0.8695652
2     Ga_Southern   13      7          20     0.6363636
3         Wofford   12      7          19     0.6190476
4  UNC_Greensboro   11      9          20     0.5454545
5  Col_Charleston   10      9          19     0.5238095
6            Elon   10     10          20     0.5000000
7      W_Carolina   11     11          22     0.5000000
8          Furman    9     11          20     0.4545455
9         Samford    8     11          19     0.4285714
10 Appalachian_St    8     12          20     0.4090909
11    Chattanooga    5     14          19     0.2857143
12        Citadel    3     16          19     0.1904762


[1] "Teams sorted by Colley Rating:"
# A tibble: 12 × 7
   team_number team_name      Wins Losses GamesPlayed InitialRating ColleyRating
         <dbl> <chr>         <dbl

# Part 2: Massey method

## Massey Method

The Massey Method is a sophisticated ranking system used for sports teams, particularly in contexts like the NCAA Men's Basketball Tournament. This method is distinct from simpler approaches like winning percentage calculations, as it incorporates the margin of victory in games, offering a more nuanced understanding of a team's performance.

Fundamental Principle: The foundational concept of the Massey Method is the assumption of transitivity, albeit approximately. Transitivity in this context means if Team A beats Team B by a certain margin, and Team B beats Team C by another margin, one might expect Team A to beat Team C by a combined margin of these two games. However, sports outcomes are rarely perfectly transitive, but assuming approximate transitivity allows for a more sophisticated ranking of teams.

Calculation Method: The core of the Massey Method involves setting up a system of linear equations based on the margins of victory in games. For instance, if Team A beats Team B by 10 points, and Team B beats Team C by 5 points, these outcomes are translated into linear equations reflecting the differences in team ratings. The actual ratings for teams are computed so that they best fit all game outcomes in the least squares sense - meaning, the system attempts to minimize the total squared difference between the predicted and actual game margins.

Implementation: In practice, the number of games (equations) typically exceeds the number of teams (variables), resulting in an overdetermined system that generally has no exact solution. To resolve this, the Massey Method uses a least-squares approximation: it finds the ratings that minimize the residual error (the difference between the observed game margins and those predicted by the ratings). Writing the games as a matrix $X$ with one row per game and the margins as a vector $y$, this leads to the system $X^T X r = X^T y$. The matrix $M = X^T X$ is singular, so either one equation is replaced by the condition that the ratings sum to zero or a small regularization term is added.

Adaptability: An interesting aspect of the Massey Method is its flexibility to include game scores in the ratings, which is not taken into account in methods like the Colley Method. This inclusion of game scores allows for a more dynamic and potentially accurate representation of a team's strength as it reflects not just whether a team wins or loses, but also how decisively they do so.

Application: In practical applications like predicting the outcomes of the NCAA tournament, the Massey ratings can be quite insightful. By considering the margin of victory, the ratings can capture aspects of a team's performance that might be missed by methods focusing solely on win-loss records. This makes the Massey Method a valuable tool in sports analytics, particularly in contexts where the strength of a victory is as telling as the victory itself.
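
The least-squares set-up described above can be sketched on a tiny example. The three games below are hypothetical (the first two margins echo the 10-point and 5-point example in the text, the third is made up), and the names `X_ex`, `y_ex`, `M_ex`, `p_ex`, and `massey_example` are assumptions for illustration.

```r
# One row per game: +1 for the winner, -1 for the loser.
# Hypothetical games: A beat B by 10, B beat C by 5, A beat C by 12.
X_ex <- rbind(c(1, -1,  0),
              c(0,  1, -1),
              c(1,  0, -1))
y_ex <- c(10, 5, 12)

M_ex <- crossprod(X_ex)                    # t(X) %*% X, the Massey matrix
p_ex <- as.vector(crossprod(X_ex, y_ex))   # t(X) %*% y, total point differentials

# M_ex is singular (its rows sum to zero), so replace the last equation
# with the condition that the ratings sum to zero.
M_ex[3, ] <- 1
p_ex[3] <- 0

massey_example <- solve(M_ex, p_ex)
print(massey_example)   # about 7.333 -1.667 -5.667
```

Before the last row is replaced, `M_ex` has the number of games played on the diagonal and minus the number of meetings off the diagonal, which is the same structure the loops in the code cells below build game by game.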

## Least Squares

The method of least squares is a statistical approach used to determine the best-fitting line or curve for a set of data points. This is achieved by minimizing the sum of the squares of the offsets or residuals, which are the differences between the observed data points and the values predicted by the model. In essence, it seeks to minimize the discrepancies between the actual data and the model's predictions.

The least squares method is particularly useful in regression analysis, a statistical process used to understand the relationship between variables. This method is employed to estimate the true value of a quantity based on observations or measurements, especially when dealing with linear relationships. However, it can also be generalized for nonlinear relationships.

In the context of sports ranking, such as in the Massey Method, the least squares approach is applied to rank teams by creating a system of linear equations based on the margins of victory in games. These equations reflect the differences in team ratings. The method aims to find team ratings that best fit all game outcomes in a least squares sense, which means minimizing the total squared difference between the predicted and actual game margins.

The application of the least squares method in sports ranking involves setting up and solving a system of equations. The rankings are determined in such a way that they best represent the performance of the teams with respect to their game margins. This system is usually overdetermined (having more equations than unknowns), and the least squares method provides a way to find the best possible solution to this system by minimizing the error between the predicted and actual results.
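
Written out, the problem described above takes the following form. The notation is an assumption chosen for this note: game $k$ is played between teams $i_k$ and $j_k$ with observed margin $y_k$, there are $m$ games, and $X$ is the $m \times n$ matrix with $+1$ in column $i_k$ and $-1$ in column $j_k$ of row $k$.

$$\min_{r} \sum_{k=1}^{m} \left(r_{i_k} - r_{j_k} - y_k\right)^2 = \min_{r} \lVert Xr - y \rVert_2^2$$

Setting the gradient with respect to $r$ to zero gives the normal equations:

$$X^T X \, r = X^T y$$

Here $X^T X$ has the number of games played by each team on its diagonal and minus the number of meetings between two teams off the diagonal, and $X^T y$ holds each team's total point differential.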

In summary, the least squares method is crucial for deriving a sophisticated and accurate sports ranking system like the Massey Method. It allows for a nuanced analysis of team performances by considering not just the win-loss records, but also the margins of victory, providing a more comprehensive view of a team's strength.

Link: https://www.britannica.com/topic/least-squares-approximation

Link: https://byjus.com/maths/least-square-method/

Link: https://www.cuemath.com/data/least-squares/

In [None]:
######################## Linear System from Text

# Massey Matrix
A <- matrix(c(2, -1, -1, -1, 2, -1, 1, 1, 1), nrow = 3, byrow = TRUE)

# Right-hand side vector
b <- c(9, -5, 0)

# Solve for ratings
x <- solve(A, b)

# Print the Massey ratings
print(A)
print("Massey Ratings:")
print(x)


     [,1] [,2] [,3]
[1,]    2   -1   -1
[2,]   -1    2   -1
[3,]    1    1    1
[1] "Massey Ratings:"
[1]  3.000000 -1.666667 -1.333333


In [None]:
########################## Data from Text
########################## Massey Rating

# Initialize the Massey matrix M and p vector
M <- matrix(0, nrow = num_teams, ncol = num_teams)
p <- rep(0, num_teams)

# Update the matrix and vector for each game
for (i in 1:nrow(scores)) {
    team1 <- as.integer(scores$team1_number[i])
    team2 <- as.integer(scores$team2_number[i])
    point_diff <- scores$team1_score[i] - scores$team2_score[i]

    M[team1, team1] <- M[team1, team1] + 1
    M[team2, team2] <- M[team2, team2] + 1
    M[team1, team2] <- M[team1, team2] - 1
    M[team2, team1] <- M[team2, team1] - 1
    p[team1] <- p[team1] + point_diff
    p[team2] <- p[team2] - point_diff
}

# Solve for ratings (add a regularization term to handle singular matrix)
lambda <- 0.001
I <- diag(num_teams)
M_reg <- M + lambda * I
massey_ratings <- solve(M_reg, p)

# Add Massey ratings to the teams data frame
teams$MasseyRating <- massey_ratings[teams$team_number]

# Order teams by Massey rating
sorted_teams_massey <- teams %>%
  arrange(desc(MasseyRating))

# Print the sorted teams with Massey rating
print("Teams sorted by Massey Rating:")
print(sorted_teams_massey)

[1] "Teams sorted by Massey Rating:"
# A tibble: 12 × 8
   team_number team_name      Wins Losses GamesPlayed InitialRating ColleyRating
         <dbl> <chr>         <dbl>  <dbl>       <dbl>         <dbl>        <dbl>
 1           5 Davidson         19      2          21         0.870       0.592
 2          12 Wofford          12      7          19         0.619       0.369
 3           4 Col_Charlest…    10      9          19         0.524       0.282
 4           8 Ga_Southern      13      7          20         0.636       0.371
 5           7 Furman            9     11          20         0.455       0.230
 6          11 W_Carolina       11     11          22         0.5         0.249
 7           1 Appalachian_…     8     12          20         0.409       0.159

# Part 3: Weighting methods

Linear weighting involves assigning different weights to games based on when they occurred in the season. This is done to give more importance to recent games, under the assumption that a team's recent performance is more indicative of its current strength.

For the Colley method, linear weighting alters the system of equations used to calculate team ratings. The weighting function modifies the number of wins and losses factored into the rating calculation, reflecting the time when games were played. This approach maintains the fundamental properties of the Colley method while adding temporal sensitivity to the ratings.

In the Massey method, linear weighting is integrated using a weighted least squares method. This approach adjusts the emphasis on each game in the system of linear equations that determines team ratings. Games with higher weights, typically more recent games, have a greater influence on the final team ratings. This modification allows the Massey method to reflect more recent performances more strongly in the team rankings.
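
In symbols, the weighted least squares problem can be written as follows. The notation is an assumption chosen for this note: $w_k$ is the weight of game $k$, $W$ is the diagonal matrix of the weights, and $X$ and $y$ are the game matrix and margin vector of the unweighted Massey method.

$$\min_{r} \sum_{k=1}^{m} w_k \left(r_{i_k} - r_{j_k} - y_k\right)^2 \quad\Longrightarrow\quad X^T W X \, r = X^T W y$$

Each game therefore adds $w_k$ instead of 1 to the corresponding entries of the Massey matrix, and $w_k$ times its point difference to the right-hand side.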

Overall, linear weighting in both methods is a means to adapt the traditional ranking systems to account for the timing of games, potentially offering a more accurate reflection of a team's current form.
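
A weight that grows with the date of the game, as described above, could be sketched like this. The function name `linear_weight` and its arguments are assumptions for illustration; `alpha` controls how far the earliest game is discounted.

```r
# Linear weight that rises from 1 - alpha for the earliest game
# to 1 for the latest game.
linear_weight <- function(day, first_day, last_day, alpha = 0.5) {
  (1 - alpha) + alpha * (day - first_day) / (last_day - first_day)
}

# Hypothetical 100-day season: weights at the start, middle, and end.
example_weights <- linear_weight(c(0, 50, 100), first_day = 0, last_day = 100)
print(example_weights)   # 0.50 0.75 1.00
```

With `alpha = 0` every game has weight 1 and the unweighted methods are recovered; with `alpha = 1` the earliest game has weight 0.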

In [None]:
########################## Data from Text
########################## Colley and Massey Methods with Linear Weighting

########################## Linear Weighting Implementation

# Define the linear weighting function
earliest_game_date <- min(scores$date)
latest_game_date <- max(scores$date)
total_days_span <- as.numeric(latest_game_date - earliest_game_date)
alpha <- 0.5

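# Weight falls linearly from 1 (earliest game) to 1 - alpha (latest game).
# To weight recent games more heavily, use
# (1 - alpha) + alpha * days_since_earliest_game / total_days_span instead.
get_weight <- function(game_date) {
    days_since_earliest_game <- as.numeric(game_date - earliest_game_date)
    return(1 - alpha * days_since_earliest_game / total_days_span)
}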

########################## Colley Rating with Linear Weighting

# Initialize the Colley matrix and b vector
num_teams <- nrow(teams)
C <- Diagonal(x=rep(2, num_teams))  # Start with 2 on the diagonal
b <- rep(1, num_teams)  # Start with 1 in b-vector

# Update the matrix and vector for each game
for (i in 1:nrow(scores)) {
    weight <- get_weight(scores$date[i])
    team1 <- as.integer(scores$team1_number[i])
    team2 <- as.integer(scores$team2_number[i])

    C[team1, team1] <- C[team1, team1] + weight
    C[team2, team2] <- C[team2, team2] + weight
    C[team1, team2] <- C[team1, team2] - weight
    C[team2, team1] <- C[team2, team1] - weight

    if (scores$team1_score[i] > scores$team2_score[i]) {
        b[team1] <- b[team1] + weight
        b[team2] <- b[team2] - weight
    } else {
        b[team1] <- b[team1] - weight
        b[team2] <- b[team2] + weight
    }
}

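# Rescale b: the loop leaves b = 1 + (weighted wins - weighted losses); this line
# gives 0.5 + half that difference. The Colley right-hand side is 1 + half the
# difference, so every rating below sits exactly 0.25 under that value (order unchanged).
b <- 1 + 0.5 * (b - 2)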

# Solve for ratings
ratings <- solve(C, b)

# Add ratings to the teams data frame
teams$ColleyRating <- ratings[teams$team_number]

# Order teams by Colley rating
sorted_teams <- teams %>%
  arrange(desc(ColleyRating))

# Print the sorted teams
print("Teams sorted by Colley Rating with Linear Weighting:")
print(sorted_teams)
cat("\n")

########################## Massey Rating with Linear Weighting

# Initialize the Massey matrix M and p vector
M <- matrix(0, nrow = num_teams, ncol = num_teams)
p <- rep(0, num_teams)

# Update the matrix and vector for each game
for (i in 1:nrow(scores)) {
    weight <- get_weight(scores$date[i])
    team1 <- as.integer(scores$team1_number[i])
    team2 <- as.integer(scores$team2_number[i])
    point_diff <- scores$team1_score[i] - scores$team2_score[i]

    M[team1, team1] <- M[team1, team1] + weight
    M[team2, team2] <- M[team2, team2] + weight
    M[team1, team2] <- M[team1, team2] - weight
    M[team2, team1] <- M[team2, team1] - weight
    p[team1] <- p[team1] + weight * point_diff
    p[team2] <- p[team2] - weight * point_diff
}

# Solve for ratings (add a regularization term to handle singular matrix)
lambda <- 0.001
I <- diag(num_teams)
M_reg <- M + lambda * I
massey_ratings <- solve(M_reg, p)

# Add Massey ratings to the teams data frame
teams$MasseyRating <- massey_ratings[teams$team_number]

# Order teams by Massey rating
sorted_teams_massey <- teams %>%
  arrange(desc(MasseyRating))

# Print the sorted teams with Massey rating
print("Teams sorted by Massey Rating with Linear Weighting:")
print(sorted_teams_massey, n = Inf, width = Inf)

[1] "Teams sorted by Colley Rating with Linear Weighting:"
# A tibble: 12 × 8
   team_number team_name      Wins Losses GamesPlayed InitialRating ColleyRating
         <dbl> <chr>         <dbl>  <dbl>       <dbl>         <dbl>        <dbl>
 1           5 Davidson         19      2          21         0.870      0.582
 2           8 Ga_Southern      13      7          20         0.636      0.380
 3          12 Wofford          12      7          19         0.619      0.357
 4           4 Col_Charlest…    10      9          19         0.524      0.297
 5           6 Elon             10     10          20         0.5        0.279
 6          11 W_Carolina       11     11          22         0.5        0.274
 7          10 UNC_Greensbo…    11      9          20

# Part 4: Application to real data

I retrieved the teams and scores for the 2023 men's college football season (games through 2023-12-09) and applied both the Colley and Massey methods to the data below.

---



In [None]:
########################## Current Mens College Football as of 2023-12-09
########################## Setting up data

library(readr)
library(dplyr)
library(Matrix)
library(lubridate)

teams <- read_csv("collegeteams.csv", col_names = c("team_number", "team_name"), show_col_types = FALSE)
scores <- read_csv("collegescores.csv", col_names = c("days_since", "date", "team1_number", "homefield1", "team1_score", "team2_number", "homefield2", "team2_score"), show_col_types = FALSE)

teams$Wins <- 0
teams$Losses <- 0

# Calculate wins and losses
for (i in 1:nrow(scores)) {
    if (scores$team1_score[i] > scores$team2_score[i]) {
        teams$Wins[teams$team_number == scores$team1_number[i]] <- teams$Wins[teams$team_number == scores$team1_number[i]] + 1
        teams$Losses[teams$team_number == scores$team2_number[i]] <- teams$Losses[teams$team_number == scores$team2_number[i]] + 1
    } else if (scores$team1_score[i] < scores$team2_score[i]) {
        teams$Wins[teams$team_number == scores$team2_number[i]] <- teams$Wins[teams$team_number == scores$team2_number[i]] + 1
        teams$Losses[teams$team_number == scores$team1_number[i]] <- teams$Losses[teams$team_number == scores$team1_number[i]] + 1
    }
}

# Calculate Games Played and Initial Rating
teams$GamesPlayed <- teams$Wins + teams$Losses
teams$InitialRating <- (1 + teams$Wins) / (2 + teams$GamesPlayed)

# Create the data frame
data <- data.frame(Team = teams$team_name,
                   Wins = teams$Wins,
                   Losses = teams$Losses,
                   GamesPlayed = teams$GamesPlayed,
                   InitialRating = teams$InitialRating)

# Sort the data
sorted_data <- data %>%
  arrange(desc(InitialRating))

# Print the sorted data
print("Stats from collegeteams.csv and collegescores.csv:")
head(sorted_data, 20)
cat("\n\n")

########################## Colley Rating

# Initialize the Colley matrix and b vector
num_teams <- nrow(teams)
C <- Diagonal(x=rep(2, num_teams))  # Start with 2 on the diagonal
b <- rep(1, num_teams)  # Start with 1 in b-vector

# Update the matrix and vector for each game
for (i in 1:nrow(scores)) {
    team1 <- as.integer(scores$team1_number[i])
    team2 <- as.integer(scores$team2_number[i])

    C[team1, team1] <- C[team1, team1] + 1
    C[team2, team2] <- C[team2, team2] + 1
    C[team1, team2] <- C[team1, team2] - 1
    C[team2, team1] <- C[team2, team1] - 1

    if (scores$team1_score[i] > scores$team2_score[i]) {
        b[team1] <- b[team1] + 1
        b[team2] <- b[team2] - 1
    } else {
        b[team1] <- b[team1] - 1
        b[team2] <- b[team2] + 1
    }
}

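# Rescale b: the loop leaves b = 1 + (wins - losses); this line gives
# 0.5 + (wins - losses) / 2. Colley's right-hand side is 1 + (wins - losses) / 2,
# so every rating below sits exactly 0.25 under the Colley value (order unchanged).
b <- 1 + 0.5 * (b - 2)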

# Solve for ratings
ratings <- solve(C, b)

# Add ratings to the teams data frame
teams$ColleyRating <- ratings[teams$team_number]

# Order teams by Colley rating
sorted_teams <- teams %>%
  arrange(desc(ColleyRating))

# Print the sorted teams
print("Teams sorted by Colley Rating:")
head(sorted_teams, 20)
cat("\n\n")

########################## Massey Rating

# Initialize the Massey matrix M and p vector
M <- matrix(0, nrow = num_teams, ncol = num_teams)
p <- rep(0, num_teams)

# Update the matrix and vector for each game
for (i in 1:nrow(scores)) {
    team1 <- as.integer(scores$team1_number[i])
    team2 <- as.integer(scores$team2_number[i])
    point_diff <- scores$team1_score[i] - scores$team2_score[i]

    M[team1, team1] <- M[team1, team1] + 1
    M[team2, team2] <- M[team2, team2] + 1
    M[team1, team2] <- M[team1, team2] - 1
    M[team2, team1] <- M[team2, team1] - 1
    p[team1] <- p[team1] + point_diff
    p[team2] <- p[team2] - point_diff
}

# Solve for ratings (add a regularization term to handle singular matrix)
lambda <- 0.001
I <- diag(num_teams)
M_reg <- M + lambda * I
massey_ratings <- solve(M_reg, p)

# Add Massey ratings to the teams data frame
teams$MasseyRating <- massey_ratings[teams$team_number]

# Order teams by Massey rating
sorted_teams_massey <- teams %>%
  arrange(desc(MasseyRating))

# Print the sorted teams with Massey rating
print("Teams sorted by Massey Rating:")
head(sorted_teams_massey, 20)

[1] "Stats from collegeteams.csv and collegescores.csv:"


   Team            Wins  Losses GamesPlayed InitialRating
   <chr>           <dbl> <dbl>  <dbl>       <dbl>
 1 Colorado_Mines  14    0      14          0.9375
 2 Harding         14    0      14          0.9375
 3 N_Central_IL    14    0      14          0.9375
 4 Northwestern_IA 14    0      14          0.9375
 5 Florida_St      13    0      13          0.9333333
 6 Liberty         13    0      13          0.9333333
 7 Michigan        13    0      13          0.9333333
 8 S_Dakota_St     13    0      13          0.9333333
 9 Washington      13    0      13          0.9333333
10 Chaffey         11    0      11          0.9230769




[1] "Teams sorted by Colley Rating:"


team_number team_name      Wins  Losses GamesPlayed InitialRating ColleyRating
<dbl>       <chr>          <dbl> <dbl>  <dbl>       <dbl>         <dbl>
846         Washington     13    0      13          0.9333333     0.8460742
192         Cortland_St    13    1      14          0.875         0.8309001
476         Michigan       13    0      13          0.9333333     0.8125073
7           Alabama        12    1      13          0.8666667     0.8110757
661         S_Dakota_St    13    0      13          0.9333333     0.8026988
539         N_Central_IL   14    0      14          0.9375        0.7968943
765         Texas          12    1      13          0.8666667     0.7894912
174         Colorado_Mines 14    0      14          0.9375        0.7889831
268         Florida_St     13    0      13          0.9333333     0.7875666
295         Georgia        12    1      13          0.8666667     0.7722832




[1] "Teams sorted by Massey Rating:"


team_number team_name  Wins  Losses GamesPlayed InitialRating ColleyRating MasseyRating
<dbl>       <chr>      <dbl> <dbl>  <dbl>       <dbl>         <dbl>        <dbl>
604         Oregon     11    2      13          0.8           0.7040163    84.25738
476         Michigan   13    0      13          0.9333333     0.8125073    83.05331
619         Penn_St    10    2      12          0.7857143     0.698701     80.99266
595         Ohio_St    11    1      12          0.8571429     0.7642066    79.58932
765         Texas      12    1      13          0.8666667     0.7894912    79.49059
598         Oklahoma   10    2      12          0.7857143     0.6815731    78.9592
295         Georgia    12    1      13          0.8666667     0.7722832    76.82455
379         Kansas_St  8     4      12          0.6428571     0.5587302    75.6365
7           Alabama    12    1      13          0.8666667     0.8110757    75.54158
846         Washington 13    0      13          0.9333333     0.8460742    75.04934


## Findings

My code above analyzes the 2023 men's college football season with three ratings: the initial rating, the Colley rating, and the Massey rating. It reads the teams and game scores from the CSV files and computes the initial rating from each team's wins and losses. The Colley rating uses wins, losses, and the schedule of opponents; the Massey rating also uses point differences. The top teams under each rating are displayed so that the methods can be compared. For example, Washington ranks first by Colley rating but tenth by Massey rating, while Oregon ranks first by Massey rating despite two losses.

I compared my results with the two sources below, one listing standings by Colley rating and the other by Massey rating. My results are similar to these sources but do not match exactly. There are several possible reasons: my data may come from a different source, may be missing some games, or may be collected, processed, and updated differently. The tables above also include teams from outside the FBS (for example Cortland_St and Colorado_Mines), whereas the Massey link below lists FBS ratings.

Colley Link: https://www.colleyrankings.com/currank.html

Massey Link: https://masseyratings.com/cf/fbs/ratings
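
One way to put a number on how closely the two orderings agree is a rank correlation. The sketch below is an illustration under stated assumptions, not part of the original analysis: it uses only the ten teams shown in the "Teams sorted by Massey Rating" table above (values copied from that table), and the names `top_massey` and `rho` are hypothetical. Because these ten teams were selected by Massey rating, the result says little about agreement over the full set of teams.

```r
# Colley and Massey ratings for the ten teams printed in the Massey table above.
top_massey <- data.frame(
  team = c("Oregon", "Michigan", "Penn_St", "Ohio_St", "Texas",
           "Oklahoma", "Georgia", "Kansas_St", "Alabama", "Washington"),
  colley = c(0.7040163, 0.8125073, 0.6987010, 0.7642066, 0.7894912,
             0.6815731, 0.7722832, 0.5587302, 0.8110757, 0.8460742),
  massey = c(84.25738, 83.05331, 80.99266, 79.58932, 79.49059,
             78.95920, 76.82455, 75.63650, 75.54158, 75.04934)
)

# Spearman correlation compares the rank orderings rather than the raw values.
rho <- cor(top_massey$colley, top_massey$massey, method = "spearman")
print(rho)
```

A value near 1 would mean the two methods order these teams almost identically; a value near 0 or below would mean they disagree substantially. Running the same call on the full `teams` data frame would give a fairer comparison.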



# Part 5: Group member contributions

- Total time spent on project: 20+ hours
- This is an individual project so this is all my work.