<a href="https://colab.research.google.com/github/baker-jr-john/automated-summary-evaluation-llm/blob/main/automated_summary_evaluation_llm_rubric_feedback.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import pandas as pd
import numpy as np
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

# Load the dataset - use comma separator (default for CSV)
df = pd.read_csv('/content/drive/MyDrive/Courses/2025/3_Fall/EDUC_6192_Large_Language_Model_Applications_in_Education/Project/data/dataset/ASAP2_train_sourcetexts.csv',
                 encoding='ISO-8859-1')

# Explore the structure
print("Columns:", df.columns.tolist())
print("\nShape:", df.shape)
print("\nFirst few rows:")
print(df.head())

# Check what types of assignments/prompts exist
print("\nUnique prompts:")
print(df['prompt_name'].value_counts())

# Look at score distribution
print("\nScore distribution:")
print(df['score'].value_counts().sort_index())

Mounted at /content/drive
Columns: ['essay_id', 'score', 'full_text', 'assignment', 'prompt_name', 'economically_disadvantaged', 'student_disability_status', 'ell_status', 'race_ethnicity', 'gender', 'source_text_1', 'source_text_2', 'source_text_3', 'source_text_4']

Shape: (24728, 14)

First few rows:
               essay_id  score  \
0  AAAVUP14319000159574      4   
1  AAAVUP14319000159542      2   
2  AAAVUP14319000159461      3   
3  AAAVUP14319000159420      2   
4  AAAVUP14319000159419      2   

                                           full_text  \
0  The author suggests that studying Venus is wor...   
1  NASA is fighting to be alble to to go to Venus...   
2  "The Evening Star", is one of the brightest po...   
3  The author supports this idea because from rea...   
4  How the author supports this idea is that he s...   

                                          assignment      prompt_name  \
0  In "The Challenge of Exploring Venus," the aut...  Exploring Venus   
1  In "

In [2]:
# Look at the actual assignment prompts to understand task types
print("=" * 80)
for prompt in df['prompt_name'].unique():
    subset = df[df['prompt_name'] == prompt]
    print(f"\n{prompt} ({len(subset)} responses)")
    print(f"Score range: {subset['score'].min()}-{subset['score'].max()}")
    print("\nAssignment:")
    print(subset['assignment'].iloc[0][:300] + "...")  # First 300 chars
    print("-" * 80)


Exploring Venus (4480 responses)
Score range: 1-6

Assignment:
In "The Challenge of Exploring Venus," the author suggests studying Venus is a worthy pursuit despite the dangers it presents. Using details from the article, write an essay evaluating how well the author supports this idea. Be sure to include: a claim that evaluates how well the author supports the...
--------------------------------------------------------------------------------

Facial action coding system (4883 responses)
Score range: 1-6

Assignment:
In the article "Making Mona Lisa Smile," the author describes how a new technology called the Facial Action Coding System enables computers to identify human emotions. Using details from the article, write an essay arguing whether the use of this technology to read the emotional expressions of stude...
--------------------------------------------------------------------------------

The Face on Mars (3015 responses)
Score range: 1-6

Assignment:
You have read the article
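
The per-prompt loop above filters the frame once per prompt. As a rough sketch of the same summary in a single pass, a `groupby` with named aggregations gives the response count and score range per prompt. The `toy_df` frame and its values are made up for illustration; only the column names `prompt_name` and `score` come from the notebook.

```python
import pandas as pd

# Tiny made-up frame reusing the notebook's column names (values are illustrative only).
toy_df = pd.DataFrame({
    "prompt_name": ["Exploring Venus", "Exploring Venus", "The Face on Mars"],
    "score": [4, 2, 3],
})

# One row per prompt: number of responses plus lowest and highest score.
summary = toy_df.groupby("prompt_name")["score"].agg(
    responses="count", min_score="min", max_score="max"
)
print(summary)
```

Applied to the real `df`, the same call would replace the three per-prompt `print` lines for counts and score range, though it does not show the assignment text.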

In [3]:
# Filter for Exploring Venus responses
venus_df = df[df['prompt_name'] == 'Exploring Venus'].copy()

print(f"Total Venus responses: {len(venus_df)}")
print("\nScore distribution:")
print(venus_df['score'].value_counts().sort_index())

# Look at the source text
print("\n" + "="*80)
print("SOURCE TEXT:")
print("="*80)
print(venus_df['source_text_1'].iloc[0])

# Examine sample responses across score levels
print("\n" + "="*80)
print("SAMPLE RESPONSES BY SCORE LEVEL:")
print("="*80)

for score in sorted(venus_df['score'].unique()):
    print(f"\n--- SCORE {score} EXAMPLE ---")
    sample = venus_df[venus_df['score'] == score].iloc[0]
    print(sample['full_text'][:400] + "...")

Total Venus responses: 4480

Score distribution:
score
1     567
2    1419
3    1469
4     808
5     175
6      42
Name: count, dtype: int64

SOURCE TEXT:
The Challenge of Exploring Venus
Venus, sometimes called the “Evening Star,” is one of the brightest points of light in the night sky, making it simple for even and amateur stargazer to spot. However, this nickname is misleading since Venus is actually a planet. While Venus is simple to see from the distant but safe vantage point of Earth, it has proved a very challenging place to examine more closely.
Often referred to as Earth's “twin,” Venus is the closest planet to Earth in terms of density and size, and occasionally the closest in distance too. Earth, Venus, and Mars, our other planetary neighbor, orbit the sun at different speeds. These differences in speed mean that sometimes we are closer to Mars and other times to Venus. Because Venus is sometimes right around the corner - in space terms - humans have spent numerous
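
Curly quotes in text like this can come out garbled (for example as `â` followed by invisible control characters) when UTF-8 bytes are decoded as ISO-8859-1, the encoding passed to `read_csv` in cell 1. The sketch below reproduces that mix-up on a short string and reverses it. It assumes the CSV is actually UTF-8, which the notebook does not confirm, and `repair_mojibake` is a hypothetical helper, not part of pandas.

```python
# Assumption: the file's bytes are UTF-8 but were decoded as ISO-8859-1.
original = "\u201cEvening Star,\u201d"  # text with curly double quotes
garbled = original.encode("utf-8").decode("iso-8859-1")


def repair_mojibake(text: str) -> str:
    """Undo a UTF-8-read-as-ISO-8859-1 mix-up; return text unchanged if it does not apply."""
    try:
        return text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside ISO-8859-1 or the bytes are not valid UTF-8.
        return text


repaired = repair_mojibake(garbled)
print(repr(garbled))
print(repaired)
```

If the assumption holds, reading the file with `encoding='utf-8'` would likely avoid the problem at the source; the helper is only useful for repairing strings that have already been loaded.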

In [4]:
import random
import numpy as np

# Seed Python and NumPy RNGs; the pandas .sample() calls below also pass random_state=42 explicitly
random.seed(42)
np.random.seed(42)

# Proportional allocation: 36 responses across scores 1-5, based on the 6-point distribution.
# One score-6 response is added separately below, for 37 in total.
# Population proportions: 1=13%, 2=32%, 3=33%, 4=18%, 5=4%, 6=1%
samples_needed = {
    1: 5,   # ~13% (567/4480)
    2: 11,  # ~32% (1,419/4480)
    3: 12,  # ~33% (1,469/4480)
    4: 6,   # ~18% (808/4480)
    5: 2,   # ~4% (175/4480)
    6: 0    # ~1% (42/4480) - too rare for proportional sampling; handled separately below
}

# Score 6 has only 42 responses, so its proportional share of 36 rounds to zero.
# One score-6 response is sampled explicitly after the loop so the top score is represented.

sampled_rows = []
for score, n_samples in samples_needed.items():
    if n_samples > 0:
        score_subset = venus_df[venus_df['score'] == score]
        if len(score_subset) >= n_samples:
            sample = score_subset.sample(n=n_samples, random_state=42)
            sampled_rows.append(sample)

# Add one score-6 response (if any exist) so the top score appears in the validation set
score_6_subset = venus_df[venus_df['score'] == 6]
if len(score_6_subset) > 0:
    score_6_sample = score_6_subset.sample(n=1, random_state=42)
    sampled_rows.append(score_6_sample)

venus_validation_sample = pd.concat(sampled_rows)

print(f"\nSampled {len(venus_validation_sample)} Venus responses for validation")
print("\n6-point score distribution in sample:")
print(venus_validation_sample['score'].value_counts().sort_index())
print("\nPercentages:")
print(venus_validation_sample['score'].value_counts(normalize=True).sort_index() * 100)

# Save validation sample
venus_validation_sample.to_csv('/content/drive/MyDrive/Courses/2025/3_Fall/EDUC_6192_Large_Language_Model_Applications_in_Education/Project/data/validation_set_venus_36.csv', index=False)

print("\n✅ Saved validation sample (6-point scale)!")


Sampled 37 Venus responses for validation

6-point score distribution in sample:
score
1     5
2    11
3    12
4     6
5     2
6     1
Name: count, dtype: int64

Percentages:
score
1    13.513514
2    29.729730
3    32.432432
4    16.216216
5     5.405405
6     2.702703
Name: proportion, dtype: float64

✅ Saved validation sample (6-point scale)!
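
The per-score counts in `samples_needed` were entered by hand. As a hedged alternative, the sketch below derives an allocation from the score counts with the largest-remainder method, which is a technique swapped in here for illustration and not the one the notebook uses. `proportional_allocation` is a hypothetical helper; the counts are copied from the Venus score distribution printed in cell 3.

```python
def proportional_allocation(counts: dict[int, int], total: int) -> dict[int, int]:
    """Split `total` samples across strata in proportion to `counts` (largest-remainder method)."""
    population = sum(counts.values())
    # Whole-number quota and leftover numerator per stratum, using exact integer arithmetic.
    quotas = {k: divmod(n * total, population) for k, n in counts.items()}
    allocation = {k: whole for k, (whole, _) in quotas.items()}
    leftover = total - sum(allocation.values())
    # Give the remaining slots to the strata with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda k: quotas[k][1], reverse=True)
    for k in by_remainder[:leftover]:
        allocation[k] += 1
    return allocation


# Score counts copied from the Venus score distribution printed in cell 3.
venus_counts = {1: 567, 2: 1419, 3: 1469, 4: 808, 5: 175, 6: 42}
allocation = proportional_allocation(venus_counts, total=36)
print(allocation)
```

For these counts the method yields 5, 11, 12, 7, 1, and 0 responses for scores 1 through 6, so it differs slightly from the hand-chosen allocation (6 and 2 for scores 4 and 5). It still assigns zero to score 6, which is why an explicit extra draw would be needed either way.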
