# Estimating Transition Matrices for Grades

## 1. Markov Transition Matrix

A transition matrix, $P$, for a system with $N$ states is an $N \times N$ matrix where each element $P_{ij}$ represents the probability of moving from state $i$ to state $j$ in one step.

### 1.1. Defining the States

- States (N=6): The possible grades are A, B, C, D, E, U.
- The rows of the matrix represent the starting state (Mock Exam Grade).
- The columns of the matrix represent the ending state (Final Exam Grade).

The matrix will look like this:

$$
P = 
\begin{pmatrix}
P_{AA} & P_{AB} & P_{AC} & P_{AD} & P_{AE} & P_{AU} \\
P_{BA} & P_{BB} & P_{BC} & P_{BD} & P_{BE} & P_{BU} \\
P_{CA} & P_{CB} & P_{CC} & P_{CD} & P_{CE} & P_{CU} \\
P_{DA} & P_{DB} & P_{DC} & P_{DD} & P_{DE} & P_{DU} \\
P_{EA} & P_{EB} & P_{EC} & P_{ED} & P_{EE} & P_{EU} \\
P_{UA} & P_{UB} & P_{UC} & P_{UD} & P_{UE} & P_{UU}
\end{pmatrix}
$$

### 1.2. Calculating the Probabilities ($P_{ij}$)

The probability $P_{ij}$ is estimated from the data: count the students who moved from grade $i$ (in the Mock) to grade $j$ (in the Final) and divide by the total number of students who started with grade $i$.

$$
P_{ij} = \frac{\text{Number of students who got } i \text{ (Mock) and } j \text{ (Final)}}{\text{Total number of students who got } i \text{ (Mock)}}
$$
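As a minimal sketch of this ratio, the snippet below counts (mock, final) pairs and divides by the number of students with each mock grade. The `pairs` list and the helper name `estimate_probability` are illustrative assumptions, not part of the original analysis.

```python
from collections import Counter

# Hypothetical (mock, final) grade pairs, for illustration only
pairs = [("A", "A"), ("A", "B"), ("B", "A"), ("B", "B"), ("B", "C")]

pair_counts = Counter(pairs)                      # students with mock i and final j
mock_counts = Counter(mock for mock, _ in pairs)  # students with mock i

def estimate_probability(i, j):
    """Estimate P_ij as count(i -> j) / count(i); None if nobody had mock grade i."""
    if mock_counts[i] == 0:
        return None  # the ratio is undefined without observations
    return pair_counts[(i, j)] / mock_counts[i]

print(estimate_probability("A", "B"))  # 1 of the 2 students with mock A got final B
print(estimate_probability("B", "C"))  # 1 of the 3 students with mock B got final C
```

Returning `None` for an unobserved mock grade is one possible design choice; it makes the undefined case explicit instead of silently reporting a probability of 0.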

### 1.3. Key Property

The sum of the probabilities in each row must equal 1, as a student starting in grade $i$ must end up in some grade ($j \in \{A, B, C, D, E, U\}$).

$$
\sum_{j=A}^{U} P_{ij} = 1 \quad \text{for all } i
$$
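A quick numerical check of this property might look like the sketch below. It assumes the matrix is available as a NumPy array; here `P` is typed in by hand from the example probabilities in Section 2.3, and `P_missing` is a hypothetical variant in which no student had mock grade A.

```python
import numpy as np

# Transition matrix from the example data (rows and columns ordered A, B, C, D, E, U)
P = np.array([
    [1/2, 1/2, 0,   0,   0,   0  ],
    [1/3, 1/3, 1/3, 0,   0,   0  ],
    [0,   1/2, 1/2, 0,   0,   0  ],
    [0,   0,   1/3, 1/3, 1/3, 0  ],
    [0,   0,   0,   1/2, 1/2, 0  ],
    [0,   0,   0,   0,   1/3, 2/3],
])

# Compare each row sum with 1, allowing for floating-point rounding
rows_ok = np.isclose(P.sum(axis=1), 1.0)
print(rows_ok.all())

# Hypothetical case: a mock grade that was never observed gives an all-zero row
P_missing = P.copy()
P_missing[0] = 0
rows_ok_missing = np.isclose(P_missing.sum(axis=1), 1.0)
print(rows_ok_missing)
```

The second check is worth keeping in mind because reindexing with `fill_value=0` adds an all-zero row for any mock grade that has no students, and such a row does not satisfy the property.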

## 2. Implementation in Python

### 2.1. Data Setup

In [6]:
import pandas as pd
import numpy as np

# Define the order of the grades (states)
grades = ['A', 'B', 'C', 'D', 'E', 'U']

# --- Example data setup (replace with your actual data) ---
# range(1, n + 1) generates StudentIDs 1..n, so the stop value is the number of students + 1
data = {
    'StudentID': range(1, 16), 
    'Mock_Grade': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'D', 'D', 'D', 'E', 'E', 'U', 'U', 'U'],
    'Final_Grade': ['A', 'B', 'A', 'B', 'C', 'B', 'C', 'C', 'D', 'E', 'D', 'E', 'E', 'U', 'U']
}
df = pd.DataFrame(data)
# -----------------------------------------------------------

### 2.2. Counting the Transitions (The Joint Frequencies)

In [4]:
# 1. Create a contingency table (counts)
# Index = Mock_Grade (Start State)
# Columns = Final_Grade (End State)
transition_counts = pd.crosstab(
    df['Mock_Grade'],
    df['Final_Grade'],
    dropna=False  # Do not drop rows or columns whose entries are all NaN
)

# Reindex so that all six grades appear in the fixed order, even if some were never observed
transition_counts = transition_counts.reindex(index=grades, columns=grades, fill_value=0)

print("--- Transition Counts (Frequency Table) ---")
print(transition_counts)

--- Transition Counts (Frequency Table) ---
Final_Grade  A  B  C  D  E  U
Mock_Grade                   
A            1  1  0  0  0  0
B            1  1  1  0  0  0
C            0  1  1  0  0  0
D            0  0  1  1  1  0
E            0  0  0  1  1  0
U            0  0  0  0  1  2


### 2.3. Calculating the Transition Matrix (The Probabilities)

In [5]:
# 2. Convert counts to probabilities (Transition Matrix P)
# normalize='index' divides each count by its row total, so every observed row sums to 1
transition_matrix_P = pd.crosstab(
    df['Mock_Grade'],
    df['Final_Grade'],
    normalize='index', # Normalizes by row sum to get probabilities
    dropna=False
)

# Reindex again for a complete, ordered matrix; a mock grade with no students becomes an all-zero row
transition_matrix_P = transition_matrix_P.reindex(index=grades, columns=grades, fill_value=0)

# Optional: Format the probabilities for better readability
transition_matrix_P_formatted = transition_matrix_P.apply(lambda x: pd.Series([f'{v:.3f}' for v in x], index=x.index), axis=1)

print("\n--- Markov Transition Matrix P (Probabilities) ---")
print(transition_matrix_P_formatted)


--- Markov Transition Matrix P (Probabilities) ---
Final_Grade      A      B      C      D      E      U
Mock_Grade                                           
A            0.500  0.500  0.000  0.000  0.000  0.000
B            0.333  0.333  0.333  0.000  0.000  0.000
C            0.000  0.500  0.500  0.000  0.000  0.000
D            0.000  0.000  0.333  0.333  0.333  0.000
E            0.000  0.000  0.000  0.500  0.500  0.000
U            0.000  0.000  0.000  0.000  0.333  0.667


This transition matrix can then be used to predict the expected grade distribution in the final exam from the distribution in the mock exam. With the mock distribution written as a row vector over the grades A to U, the prediction is the matrix product:

$$
\text{Final\_Distribution} = \text{Mock\_Distribution} \times P
$$
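The multiplication can be sketched with NumPy as follows. The matrix `P` is typed in by hand from the example output above, and the mock counts (2 A, 3 B, 2 C, 3 D, 2 E, 3 U) are read off the example data; the variable names are illustrative.

```python
import numpy as np

grades = ['A', 'B', 'C', 'D', 'E', 'U']

# Transition matrix from the example data (rows = mock grade, columns = final grade)
P = np.array([
    [1/2, 1/2, 0,   0,   0,   0  ],
    [1/3, 1/3, 1/3, 0,   0,   0  ],
    [0,   1/2, 1/2, 0,   0,   0  ],
    [0,   0,   1/3, 1/3, 1/3, 0  ],
    [0,   0,   0,   1/2, 1/2, 0  ],
    [0,   0,   0,   0,   1/3, 2/3],
])

# Mock grade counts from the example data, converted to proportions (row vector)
mock_counts = np.array([2, 3, 2, 3, 2, 3])
mock_distribution = mock_counts / mock_counts.sum()

# Row vector times matrix: entry j is the sum over i of mock_distribution[i] * P[i, j]
final_distribution = mock_distribution @ P

for grade, share in zip(grades, final_distribution):
    print(f"{grade}: {share:.3f}")
```

Because `P` was estimated from these same 15 students, the product simply reproduces their observed final grade proportions. The prediction only becomes informative when `P` is applied to the mock distribution of a different cohort, under the assumption that the transition probabilities carry over.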




## 3. The Entire Code in One Block

In [7]:
import pandas as pd
import numpy as np

# Define the order of the grades (states)
grades = ['A', 'B', 'C', 'D', 'E', 'U']

# --- Example data setup (replace with your actual data) ---
# range(1, n + 1) generates StudentIDs 1..n, so the stop value is the number of students + 1
data = {
    'StudentID': range(1, 16), 
    'Mock_Grade': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'D', 'D', 'D', 'E', 'E', 'U', 'U', 'U'],
    'Final_Grade': ['A', 'B', 'A', 'B', 'C', 'B', 'C', 'C', 'D', 'E', 'D', 'E', 'E', 'U', 'U']
}
df = pd.DataFrame(data)
# -----------------------------------------------------------




# 1. Create a contingency table (counts)
# Index = Mock_Grade (Start State)
# Columns = Final_Grade (End State)
transition_counts = pd.crosstab(
    df['Mock_Grade'],
    df['Final_Grade'],
    dropna=False  # Do not drop rows or columns whose entries are all NaN
)

# Reindex so that all six grades appear in the fixed order, even if some were never observed
transition_counts = transition_counts.reindex(index=grades, columns=grades, fill_value=0)

print("--- Transition Counts (Frequency Table) ---")
print(transition_counts)




# 2. Convert counts to probabilities (Transition Matrix P)
# normalize='index' divides each count by its row total, so every observed row sums to 1
transition_matrix_P = pd.crosstab(
    df['Mock_Grade'],
    df['Final_Grade'],
    normalize='index', # Normalizes by row sum to get probabilities
    dropna=False
)

# Reindex again for a complete, ordered matrix; a mock grade with no students becomes an all-zero row
transition_matrix_P = transition_matrix_P.reindex(index=grades, columns=grades, fill_value=0)

# Optional: Format the probabilities for better readability
transition_matrix_P_formatted = transition_matrix_P.apply(lambda x: pd.Series([f'{v:.3f}' for v in x], index=x.index), axis=1)

print("\n--- Markov Transition Matrix P (Probabilities) ---")
print(transition_matrix_P_formatted)




--- Transition Counts (Frequency Table) ---
Final_Grade  A  B  C  D  E  U
Mock_Grade                   
A            1  1  0  0  0  0
B            1  1  1  0  0  0
C            0  1  1  0  0  0
D            0  0  1  1  1  0
E            0  0  0  1  1  0
U            0  0  0  0  1  2

--- Markov Transition Matrix P (Probabilities) ---
Final_Grade      A      B      C      D      E      U
Mock_Grade                                           
A            0.500  0.500  0.000  0.000  0.000  0.000
B            0.333  0.333  0.333  0.000  0.000  0.000
C            0.000  0.500  0.500  0.000  0.000  0.000
D            0.000  0.000  0.333  0.333  0.333  0.000
E            0.000  0.000  0.000  0.500  0.500  0.000
U            0.000  0.000  0.000  0.000  0.333  0.667
