<a href="https://colab.research.google.com/github/baker-jr-john/automated-summary-evaluation-llm/blob/main/automated_summary_evaluation_llm_rubric_feedback.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import pandas as pd
import numpy as np
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

# Load the dataset - use comma separator (default for CSV)
df = pd.read_csv('/content/drive/MyDrive/Courses/2025/3_Fall/EDUC_6192_Large_Language_Model_Applications_in_Education/Project/data/dataset/ASAP2_train_sourcetexts.csv',
                 encoding='ISO-8859-1')

# Explore the structure
print("Columns:", df.columns.tolist())
print("\nShape:", df.shape)
print("\nFirst few rows:")
print(df.head())

# Check what types of assignments/prompts exist
print("\nUnique prompts:")
print(df['prompt_name'].value_counts())

# Look at score distribution
print("\nScore distribution:")
print(df['score'].value_counts().sort_index())

Mounted at /content/drive
Columns: ['essay_id', 'score', 'full_text', 'assignment', 'prompt_name', 'economically_disadvantaged', 'student_disability_status', 'ell_status', 'race_ethnicity', 'gender', 'source_text_1', 'source_text_2', 'source_text_3', 'source_text_4']

Shape: (24728, 14)

First few rows:
               essay_id  score  \
0  AAAVUP14319000159574      4   
1  AAAVUP14319000159542      2   
2  AAAVUP14319000159461      3   
3  AAAVUP14319000159420      2   
4  AAAVUP14319000159419      2   

                                           full_text  \
0  The author suggests that studying Venus is wor...   
1  NASA is fighting to be alble to to go to Venus...   
2  "The Evening Star", is one of the brightest po...   
3  The author supports this idea because from rea...   
4  How the author supports this idea is that he s...   

                                          assignment      prompt_name  \
0  In "The Challenge of Exploring Venus," the aut...  Exploring Venus   
1  In "

In [2]:
# Look at the actual assignment prompts to understand task types
print("=" * 80)
for prompt in df['prompt_name'].unique():
    subset = df[df['prompt_name'] == prompt]
    print(f"\n{prompt} ({len(subset)} responses)")
    print(f"Score range: {subset['score'].min()}-{subset['score'].max()}")
    print("\nAssignment:")
    print(subset['assignment'].iloc[0][:300] + "...")  # First 300 chars
    print("-" * 80)


Exploring Venus (4480 responses)
Score range: 1-6

Assignment:
In "The Challenge of Exploring Venus," the author suggests studying Venus is a worthy pursuit despite the dangers it presents. Using details from the article, write an essay evaluating how well the author supports this idea. Be sure to include: a claim that evaluates how well the author supports the...
--------------------------------------------------------------------------------

Facial action coding system (4883 responses)
Score range: 1-6

Assignment:
In the article "Making Mona Lisa Smile," the author describes how a new technology called the Facial Action Coding System enables computers to identify human emotions. Using details from the article, write an essay arguing whether the use of this technology to read the emotional expressions of stude...
--------------------------------------------------------------------------------

The Face on Mars (3015 responses)
Score range: 1-6

Assignment:
You have read the article

In [3]:
# Filter for Exploring Venus responses
venus_df = df[df['prompt_name'] == 'Exploring Venus'].copy()

print(f"Total Venus responses: {len(venus_df)}")
print(f"\nScore distribution:")
print(venus_df['score'].value_counts().sort_index())

# Look at the source text
print("\n" + "="*80)
print("SOURCE TEXT:")
print("="*80)
print(venus_df['source_text_1'].iloc[0])

# Examine sample responses across score levels
print("\n" + "="*80)
print("SAMPLE RESPONSES BY SCORE LEVEL:")
print("="*80)

for score in sorted(venus_df['score'].unique()):
    print(f"\n--- SCORE {score} EXAMPLE ---")
    sample = venus_df[venus_df['score'] == score].iloc[0]
    print(sample['full_text'][:400] + "...")

Total Venus responses: 4480

Score distribution:
score
1     567
2    1419
3    1469
4     808
5     175
6      42
Name: count, dtype: int64

SOURCE TEXT:
The Challenge of Exploring Venus
Venus, sometimes called the “Evening Star,” is one of the brightest points of light in the night sky, making it simple for even and amateur stargazer to spot. However, this nickname is misleading since Venus is actually a planet. While Venus is simple to see from the distant but safe vantage point of Earth, it has proved a very challenging place to examine more closely. 
Often referred to as Earth's “twin,” Venus is the closest planet to Earth in terms of density and size, and occasionally the closest in distance too. Earth, Venus, and Mars, our other planetary neighbor, orbit the sun at different speeds. These differences in speed mean that sometimes we are closer to Mars and other times to Venus. Because Venus is sometimes right around the corner - in space terms - humans have sp
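The `encoding='ISO-8859-1'` argument passed to `pd.read_csv` decodes every byte as its own character, so any UTF-8 punctuation in the article text (curly quotes, for example) would surface as multi-character debris, and that debris would also flow into any prompt built from `source_text_1`. A minimal sketch of the effect and one way to undo it, assuming the underlying bytes are valid UTF-8 (this has not been verified for the CSV); `repair_mojibake` is a hypothetical helper, not part of the notebook:

```python
def repair_mojibake(text: str) -> str:
    """Undo UTF-8 text that was mistakenly decoded as ISO-8859-1.

    Returns the input unchanged when the round trip is not possible,
    e.g. when the text was never mis-decoded in the first place.
    """
    try:
        return text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


original = "\u201cEvening Star,\u201d"  # a curly-quoted phrase
garbled = original.encode("utf-8").decode("iso-8859-1")  # simulate the wrong codec
repaired = repair_mojibake(garbled)

print(repr(garbled))
print(repaired)
```

If the file does turn out to be UTF-8 throughout, passing `encoding='utf-8'` to `pd.read_csv` would avoid the problem at load time instead of repairing it afterwards.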

In [4]:
import random

# Set random seed for reproducibility
random.seed(42)
np.random.seed(42)

# Calculate proportional samples (36 essays across scores 1-5, based on the original 6-point distribution)
# Proportions: 1=13%, 2=32%, 3=33%, 4=18%, 5=4%, 6=1%
samples_needed = {
    1: 5,   # ~13% (567/4480)
    2: 11,  # ~32% (1,419/4480)
    3: 12,  # ~33% (1,469/4480)
    4: 6,   # ~18% (808/4480)
    5: 2,   # ~4% (175/4480)
    6: 0    # ~1% (42/4480) - too few for a proportional slot; one essay is added separately below
}

# Score 6 has only 42 essays, so it gets no proportional slot.
# One score-6 essay is sampled separately below, bringing the validation set to 37.

sampled_rows = []
for score, n_samples in samples_needed.items():
    if n_samples > 0:
        score_subset = venus_df[venus_df['score'] == score]
        if len(score_subset) >= n_samples:
            sample = score_subset.sample(n=n_samples, random_state=42)
            sampled_rows.append(sample)

# For score 6, sample 1 if we want to include it
score_6_subset = venus_df[venus_df['score'] == 6]
if len(score_6_subset) > 0:
    score_6_sample = score_6_subset.sample(n=1, random_state=42)
    sampled_rows.append(score_6_sample)

venus_validation_sample = pd.concat(sampled_rows)

print(f"\nSampled {len(venus_validation_sample)} Venus responses for validation")
print("\n6-point score distribution in sample:")
print(venus_validation_sample['score'].value_counts().sort_index())
print("\nPercentages:")
print(venus_validation_sample['score'].value_counts(normalize=True).sort_index() * 100)

# Save validation sample
venus_validation_sample.to_csv('/content/drive/MyDrive/Courses/2025/3_Fall/EDUC_6192_Large_Language_Model_Applications_in_Education/Project/data/validation_set_venus_36.csv', index=False)

print("\n✅ Saved validation sample (6-point scale)!")


Sampled 37 Venus responses for validation

6-point score distribution in sample:
score
1     5
2    11
3    12
4     6
5     2
6     1
Name: count, dtype: int64

Percentages:
score
1    13.513514
2    29.729730
3    32.432432
4    16.216216
5     5.405405
6     2.702703
Name: proportion, dtype: float64

✅ Saved validation sample (6-point scale)!
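The per-score counts in `samples_needed` were chosen by hand from the rounded percentages. As a hedged alternative, the same allocation can be derived mechanically with the largest-remainder method. The sketch below uses the Venus score counts printed by In [3]; `proportional_allocation` is a hypothetical helper, not something the notebook defines.

```python
def proportional_allocation(counts: dict, total: int) -> dict:
    """Split `total` samples across strata in proportion to `counts`,
    using the largest-remainder method (integer arithmetic only)."""
    population = sum(counts.values())
    # Floor of each stratum's exact quota: count * total / population
    alloc = {k: (c * total) // population for k, c in counts.items()}
    # Hand the leftover samples to the strata with the largest remainders
    leftover = total - sum(alloc.values())
    by_remainder = sorted(
        counts, key=lambda k: (counts[k] * total) % population, reverse=True
    )
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc


venus_counts = {1: 567, 2: 1419, 3: 1469, 4: 808, 5: 175, 6: 42}
allocation = proportional_allocation(venus_counts, total=36)
print(allocation)
```

The result need not match the hand-picked dictionary: for these counts the method assigns 7 essays to score 4 and 1 to score 5, whereas the notebook uses 6 and 2, which gives the rarer high score slightly more coverage.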


In [5]:
# ========================================
# QUICK TEST: Generate 3 Synthetic Examples
# ========================================
import pandas as pd
from datetime import datetime
import time
from google.colab import drive, userdata
from openai import OpenAI

drive.mount('/content/drive', force_remount=True)

api_key = userdata.get('OPENAI_API_KEY')
client = OpenAI(api_key=api_key)

# Load Venus source text
df = pd.read_csv('/content/drive/MyDrive/Courses/2025/3_Fall/EDUC_6192_Large_Language_Model_Applications_in_Education/Project/data/dataset/ASAP2_train_sourcetexts.csv',
                 encoding='ISO-8859-1')
venus_df = df[df['prompt_name'] == 'Exploring Venus']
VENUS_SOURCE = venus_df['source_text_1'].iloc[0]
VENUS_ASSIGNMENT = venus_df['assignment'].iloc[0]

# Rubric
RUBRIC = """
COMPLETENESS (1-5): Coverage of main ideas and supporting details
ACCURACY (1-5): Factual correctness and faithful representation
COHERENCE (1-5): Organization, transitions, logical flow
CONCISENESS (1-5): Appropriate length without repetition
"""

def generate_example(essay_id, score, error_type, instructions, word_target):
    prompt = f"""You are simulating a grade 7-8 middle school student writing an evaluative essay.

ASSIGNMENT: {VENUS_ASSIGNMENT}

SOURCE TEXT: {VENUS_SOURCE}

RUBRIC: {RUBRIC}

TASK: Write a student response earning score {score}/6 with these characteristics:
{instructions}

Target length: {word_target} words
Use authentic middle school vocabulary and style.
Write only the student essay (no meta-commentary):"""

    try:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": "You simulate authentic middle school writing for research."},
                {"role": "user", "content": prompt}
            ],
            temperature=0.9,
            max_tokens=800
        )

        text = response.choices[0].message.content.strip()

        return {
            'essay_id': essay_id,
            'score': score,
            'full_text': text,
            'synthetic_flag': True,
            'target_error_pattern': error_type,
            'word_count': len(text.split()),
            'generation_date': datetime.now().isoformat(),
            'assignment': VENUS_ASSIGNMENT,
            'prompt_name': 'Exploring Venus',
            'source_text_1': VENUS_SOURCE
        }
    except Exception as e:
        print(f"Error: {e}")
        return None

# TEST: Generate 3 examples
configs = [
    {'essay_id': 'SYNTH_V_01_S1', 'score': 1, 'error_type': 'Severe incompleteness + fabrication',
     'word_target': '100-150',
     'instructions': 'Cover only 1-2 superficial details. Include 2-3 fabricated facts. Show fundamental misunderstanding. Random organization.'},

    {'essay_id': 'SYNTH_V_04_S2', 'score': 2, 'error_type': 'Completeness gap',
     'word_target': '150-200',
     'instructions': 'Identify main claim correctly but provide only 1-2 vague examples. Omit specific evidence (temperatures, NASA solutions). Very superficial.'},

    {'essay_id': 'SYNTH_V_11_S3', 'score': 3, 'error_type': 'Good content, weak coherence',
     'word_target': '220-250',
     'instructions': 'Cover all main ideas with adequate detail. Use awkward transitions. Random ordering. Choppy flow. Good content, poor organization.'},
]

results = []
for i, cfg in enumerate(configs, 1):
    print(f"[{i}/3] Generating {cfg['essay_id']}...", end=" ")
    result = generate_example(**cfg)
    if result:
        results.append(result)
        print(f"✓ ({result['word_count']} words)")
        print(f"Preview: {result['full_text'][:200]}...\n")
    time.sleep(1)

# Save test results
test_df = pd.DataFrame(results)
test_path = '/content/drive/MyDrive/Courses/2025/3_Fall/EDUC_6192_Large_Language_Model_Applications_in_Education/Project/data/test_synthetic_3.csv'
test_df.to_csv(test_path, index=False)

print(f"✅ Generated {len(results)} test examples")
print(f"📁 Saved to: {test_path}")
print("\n👀 Review these examples. If they look good, proceed to full generation!")

Mounted at /content/drive
[1/3] Generating SYNTH_V_01_S1... ✓ (120 words)
Preview: In "The Challenge of Exploring Venus," the author thinks studying Venus is important, even if it's dangerous. They say Venus has volcanoes and acid clouds, which sounds scary. However, they don't expl...

[2/3] Generating SYNTH_V_04_S2... ✓ (175 words)
Preview: In "The Challenge of Exploring Venus," the author argues that studying Venus is worth the risk because it can give us important information about our solar system. While the dangers on Venus are reall...

[3/3] Generating SYNTH_V_11_S3... ✓ (253 words)
Preview: In "The Challenge of Exploring Venus," the author makes a strong case that studying Venus is a worthy endeavor, even though it poses many dangers. The article emphasizes how Venus, often called Earth'...

✅ Generated 3 test examples
📁 Saved to: /content/drive/MyDrive/Courses/2025/3_Fall/EDUC_6192_Large_Language_Model_Applications_in_Education/Project/data/test_synthetic_3.csv

ü
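Each config carries a `word_target` string such as `'100-150'`, but the generation loop only prints the resulting `word_count` and never compares the two. A small sketch of such a check, using the word counts reported in the run above; `within_target` is a hypothetical helper and assumes the target is always written as `low-high`:

```python
def within_target(word_count: int, target: str) -> bool:
    """Return True if `word_count` lies inside a 'low-high' range string."""
    low, high = (int(part) for part in target.split("-"))
    return low <= word_count <= high


# Word counts reported in the test run, paired with each config's word_target
reported = [
    ("SYNTH_V_01_S1", 120, "100-150"),
    ("SYNTH_V_04_S2", 175, "150-200"),
    ("SYNTH_V_11_S3", 253, "220-250"),
]
for essay_id, count, target in reported:
    status = "ok" if within_target(count, target) else "outside target"
    print(f"{essay_id}: {count} words, target {target} -> {status}")
```

For this run the score 3 essay lands just above its 220-250 target, which matters because the conciseness dimension of the rubric is tied to length.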

In [6]:
# ========================================
# REGENERATE 2 EXAMPLES WITH REFINED PROMPTS
# ========================================
from openai import OpenAI
import pandas as pd
from datetime import datetime
import time
from google.colab import drive, userdata

drive.mount('/content/drive', force_remount=True)

api_key = userdata.get('OPENAI_API_KEY')
client = OpenAI(api_key=api_key)

# Load Venus source text (if not already loaded)
df = pd.read_csv('/content/drive/MyDrive/Courses/2025/3_Fall/EDUC_6192_Large_Language_Model_Applications_in_Education/Project/data/dataset/ASAP2_train_sourcetexts.csv',
                 encoding='ISO-8859-1')
venus_df = df[df['prompt_name'] == 'Exploring Venus']
VENUS_SOURCE = venus_df['source_text_1'].iloc[0]
VENUS_ASSIGNMENT = venus_df['assignment'].iloc[0]

# Rubric
RUBRIC = """
COMPLETENESS (1-5): Coverage of main ideas and supporting details
ACCURACY (1-5): Factual correctness and faithful representation
COHERENCE (1-5): Organization, transitions, logical flow
CONCISENESS (1-5): Appropriate length without repetition
"""

# Generation function
def generate_example(essay_id, score, error_type, instructions, word_target):
    prompt = f"""You are simulating a grade 7-8 middle school student writing an evaluative essay.

ASSIGNMENT: {VENUS_ASSIGNMENT}

SOURCE TEXT: {VENUS_SOURCE}

RUBRIC: {RUBRIC}

TASK: Write a student response earning score {score}/6 with these characteristics:
{instructions}

Target length: {word_target} words
Use authentic middle school vocabulary and style.
Write only the student essay (no meta-commentary):"""

    try:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": "You simulate authentic middle school writing for research."},
                {"role": "user", "content": prompt}
            ],
            temperature=0.9,
            max_tokens=800
        )

        text = response.choices[0].message.content.strip()

        return {
            'essay_id': essay_id,
            'score': score,
            'full_text': text,
            'synthetic_flag': True,
            'target_error_pattern': error_type,
            'word_count': len(text.split()),
            'generation_date': datetime.now().isoformat(),
            'assignment': VENUS_ASSIGNMENT,
            'prompt_name': 'Exploring Venus',
            'source_text_1': VENUS_SOURCE
        }
    except Exception as e:
        print(f"Error: {e}")
        return None

# ========================================
# REFINED CONFIGURATIONS (THE KEY CHANGES)
# ========================================

refined_configs = [
    # SCORE 1 - REVISED for more chaos
    {
        'essay_id': 'SYNTH_V_01_S1_REVISED',
        'score': 1,
        'error_type': 'Severe incompleteness + fabrication',
        'word_target': '100-150',
        'instructions': """Cover only 1-2 superficial details like "Venus is bright" or "it's hot."
Include 2-3 fabricated facts (e.g., say "hotter than the sun" or "NASA already went there").

CRITICAL - Make it SCATTERED and CHAOTIC:
- Jump between totally unrelated ideas with NO logical connections
- NO clear introduction-body-conclusion structure
- Let ideas trail off or suddenly change direction mid-thought
- Use only simple transitions: "Also," "And," or just start new sentences randomly
- Make at least 2-3 sentences that don't connect to anything around them
- Reader should feel confused trying to follow your point

Example of scattered style you should use:
"Venus is really hot I think. Also there's blimps or something? The article talks about dangers but I can't remember. And they should explore Mars instead because it's better. Venus has acid maybe. I heard NASA already landed there but it broke. Also space is cool."

Show you fundamentally misunderstood the assignment. Make it feel random and unfocused."""
    },

    # SCORE 3 - REVISED for choppier flow
    {
        'essay_id': 'SYNTH_V_11_S3_REVISED',
        'score': 3,
        'error_type': 'Good content, weak coherence',
        'word_target': '220-250',
        'instructions': """Cover ALL main ideas with GOOD specific details:
- Dangers: 800°F temperature, sulfuric acid, 97% CO2 atmosphere, 90x pressure
- Solutions: NASA's blimp at 30 miles altitude, mechanical computers, silicon carbide
- Value: Venus may have had oceans, Earth's twin, scientific curiosity

CRITICAL - Good content but CHOPPY execution:
- Use ONLY weak transitions: "Also," "And," "Another thing," "So," "Plus"
- NEVER use sophisticated ones: "Furthermore," "Additionally," "Moreover," "In conclusion"
- Present good ideas but in somewhat random order - jump between topics
- Make each paragraph feel disconnected from the previous one

Example of choppy style you should use:
"The author talks about Venus being super dangerous. It's like 800 degrees with sulfuric acid everywhere and 97% carbon dioxide. Also NASA has this idea about using blimps to float above Venus at like 30 miles up. And the temperature would still be hot but humans could survive. Plus Venus might have had oceans a long time ago so that's why scientists care. Another thing is they're making mechanical computers that can handle the extreme heat and pressure."

Reader should think: "This student clearly understood the article and has good details, but the organization and flow are rough. It feels choppy."

DO NOT write a polished conclusion that ties everything together - make it feel more abrupt."""
    }
]

# ========================================
# GENERATE THE 2 REVISED EXAMPLES
# ========================================

print("="*60)
print("GENERATING 2 REVISED EXAMPLES WITH REFINED PROMPTS")
print("="*60)
print()

revised_results = []
for i, cfg in enumerate(refined_configs, 1):
    print(f"[{i}/2] Generating {cfg['essay_id']}...")
    print(f"Target: {cfg['error_type']}")
    print()

    result = generate_example(**cfg)

    if result:
        revised_results.append(result)
        print(f"✓ Generated! ({result['word_count']} words)\n")
        print(f"FULL TEXT:")
        print("-"*60)
        print(result['full_text'])
        print("-"*60)
        print()

    time.sleep(2)  # Rate limiting

# ========================================
# SAVE REVISED EXAMPLES
# ========================================

if revised_results:
    revised_df = pd.DataFrame(revised_results)
    revised_path = '/content/drive/MyDrive/Courses/2025/3_Fall/EDUC_6192_Large_Language_Model_Applications_in_Education/Project/data/test_synthetic_REVISED.csv'
    revised_df.to_csv(revised_path, index=False)

    print("\n" + "="*60)
    print("✅ REGENERATION COMPLETE!")
    print("="*60)
    print(f"📁 Saved to: {revised_path}")
    print()
    print("📊 COMPARISON:")
    # Original word counts below are the ones reported in the In [5] test run output
    print("  Original Score 1: 120 words, somewhat coherent")
    print(f"  Revised Score 1:  {revised_results[0]['word_count']} words")
    print()
    print("  Original Score 3: 253 words, too smooth")
    print(f"  Revised Score 3:  {revised_results[1]['word_count']} words")
    print()
    print("🔍 REVIEW THE TEXT ABOVE")
    print("   Check if Score 1 now feels scattered/chaotic")
    print("   Check if Score 3 now feels choppy but has good content")
    print()
    print("✅ If satisfied → Proceed to STEP 2")
    print("❌ If not quite right → Adjust instructions and regenerate")

Mounted at /content/drive
GENERATING 2 REVISED EXAMPLES WITH REFINED PROMPTS

[1/2] Generating SYNTH_V_01_S1_REVISED...
Target: Severe incompleteness + fabrication

✓ Generated! (136 words)

FULL TEXT:
------------------------------------------------------------
Venus is really hot! It's like way hotter than the sun, but I think it has cool clouds. Also, NASA already went there and their spaceship didn't make it back because it melted. The article mentions scientists want to explore Venus more, but it seems silly because it's so dangerous. I mean, there are volcanoes and acid. And maybe they should just focus on exploring Mars instead since it's closer and better. There's like a blimp they can use to study Venus too, but why? Blimps seem slow. Also, they might find aliens there or something, which would be awesome, but I don't know if they will. The idea of floating above Venus is kind of weird too. So, exploring Venus is dangerous but maybe fun? I don't really get it, but space is j
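`generate_example` catches every exception, prints it, and returns `None`, so a transient API failure silently drops that essay from the results. One hedged option is to retry with exponential backoff before giving up. The sketch below is generic and does not call the OpenAI client; `call_with_retries` and its parameters are hypothetical names introduced for illustration.

```python
import time


def call_with_retries(fn, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `fn()` until it succeeds or `max_attempts` is reached.

    Waits base_delay * 2**attempt seconds after each failed attempt and
    re-raises the last exception if every attempt fails.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))


# Demo with a stand-in function that fails twice before succeeding
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("simulated transient failure")
    return "essay text"


delays = []
result = call_with_retries(flaky, sleep=delays.append)  # record delays instead of sleeping
print(result, delays)
```

To use this with the notebook's generator, the API call would need to raise on failure rather than return `None`, so that the wrapper can see the error and retry.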

In [7]:
"""
COMPLETE SYNTHETIC EXAMPLES GENERATOR - REFINED VERSION
Generates 23 synthetic Venus summaries with improved authenticity
- Score 1: Enhanced chaotic, scattered organization
- Score 3: Enhanced choppy flow with good content
- Scores 2, 4, 5: Unchanged (already working well)
"""

from openai import OpenAI
import pandas as pd
from datetime import datetime
import time
from google.colab import drive, userdata

drive.mount('/content/drive', force_remount=True)

# ===========================================
# CONFIGURATION
# ===========================================

# Set your OpenAI API key
api_key = userdata.get('OPENAI_API_KEY')
client = OpenAI(api_key=api_key)

# File paths
AUTHENTIC_PATH = '/content/drive/MyDrive/Courses/2025/3_Fall/EDUC_6192_Large_Language_Model_Applications_in_Education/Project/data/validation_set_venus_36.csv'
SYNTHETIC_PATH = '/content/drive/MyDrive/Courses/2025/3_Fall/EDUC_6192_Large_Language_Model_Applications_in_Education/Project/data/validation_set_synthetic_23.csv'
COMBINED_PATH = '/content/drive/MyDrive/Courses/2025/3_Fall/EDUC_6192_Large_Language_Model_Applications_in_Education/Project/data/validation_set_combined_60.csv'

# ===========================================
# LOAD VENUS SOURCE TEXT
# ===========================================

def load_venus_source():
    """Load the Venus article text and assignment prompt from the ASAP2 dataset"""
    df = pd.read_csv('/content/drive/MyDrive/Courses/2025/3_Fall/EDUC_6192_Large_Language_Model_Applications_in_Education/Project/data/dataset/ASAP2_train_sourcetexts.csv',
                     encoding='ISO-8859-1')
    venus_df = df[df['prompt_name'] == 'Exploring Venus']
    source_text = venus_df['source_text_1'].iloc[0]
    assignment = venus_df['assignment'].iloc[0]
    return source_text, assignment

VENUS_SOURCE, VENUS_ASSIGNMENT = load_venus_source()

# ===========================================
# RUBRIC
# ===========================================

RUBRIC = """
RUBRIC FOR VENUS SUMMARY EVALUATION (1-5 scale per dimension):

COMPLETENESS (1-5):
5: Comprehensive coverage of all main ideas with strong supporting details
4: Covers main ideas with good supporting details, minor gaps acceptable
3: Covers basic main ideas but missing some key supporting details
2: Partial coverage with significant gaps
1: Minimal coverage, major ideas missing

ACCURACY (1-5):
5: All information factually correct
4: Mostly accurate with only minor imprecisions
3: Generally accurate but with some notable errors
2: Multiple factual errors or significant misrepresentations
1: Major factual errors, invented details, fundamental misunderstandings

COHERENCE (1-5):
5: Excellent organization with smooth transitions and clear logical flow
4: Well-organized with generally good transitions
3: Basic organization but with awkward transitions or logical gaps
2: Poor organization, weak transitions, difficult to follow
1: Incoherent, random organization, no clear structure

CONCISENESS (1-5):
5: Appropriately concise (200-250 words), no unnecessary repetition
4: Reasonably concise (250-280 words), minimal redundancy
3: Somewhat verbose (280-320 words) or with noticeable repetition
2: Too long (320-400 words) with significant repetition
1: Extremely brief (<150 words) or excessively long (>400 words)
"""

# ===========================================
# GENERATION FUNCTIONS
# ===========================================

def create_generation_prompt(score, specific_instructions, word_count_guidance):
    """Create prompt for generating a synthetic example"""
    return f"""You are simulating an authentic middle school student (grades 7-8) writing an evaluative essay about whether the author of an article successfully supports their argument. This is for educational research to test an automated assessment system.

ASSIGNMENT:
{VENUS_ASSIGNMENT}

SOURCE TEXT:
{VENUS_SOURCE}

RUBRIC:
{RUBRIC}

YOUR TASK:
Write a student response that would realistically earn a score of {score} on the 6-point scale based on the rubric above.

SPECIFIC REQUIREMENTS FOR THIS SAMPLE:
{specific_instructions}

WRITING GUIDELINES:
- Use vocabulary and sentence structure typical of grades 7-8
- Target length: {word_count_guidance} words
- Include some natural middle school writing patterns (minor grammar quirks, occasional informal phrasing)
- Make it feel authentic - not overly polished or obviously AI-generated
- Focus on the CONTENT errors specified above (don't make it artificially bad with excessive spelling/grammar errors)
- Stay focused on the Venus exploration topic
- Remember: this is evaluating HOW WELL THE AUTHOR SUPPORTS THE IDEA, not just summarizing

Write only the student essay response (no meta-commentary):"""

def generate_synthetic_example(example_config, model="gpt-4o-mini"):
    """Generate a single synthetic example with the given chat model (default: gpt-4o-mini)"""

    prompt = create_generation_prompt(
        score=example_config['score'],
        specific_instructions=example_config['instructions'],
        word_count_guidance=example_config['word_count']
    )

    try:
        # ✅ Use Chat Completions via the client
        response = client.chat.completions.create(
            model=model,
            messages=[
                {
                    "role": "system",
                    "content": (
                        "You are an expert at simulating authentic middle school "
                        "student writing for educational research purposes."
                    )
                },
                {
                    "role": "user",
                    "content": prompt
                },
            ],
            temperature=0.9,
            max_tokens=800,
        )

        generated_text = response.choices[0].message.content.strip()

        return {
            'essay_id': example_config['essay_id'],
            'score': example_config['score'],
            'full_text': generated_text,
            'assignment': VENUS_ASSIGNMENT,
            'prompt_name': 'Exploring Venus',
            'source_text_1': VENUS_SOURCE,
            'source_text_2': None,
            'source_text_3': None,
            'source_text_4': None,
            'economically_disadvantaged': 'Synthetic',
            'student_disability_status': 'Synthetic',
            'ell_status': 'Synthetic',
            'race_ethnicity': 'Synthetic',
            'gender': 'Synthetic',
            'synthetic_flag': True,
            'target_error_pattern': example_config['target_error'],
            'generation_date': datetime.now().isoformat(),
            'generation_model': model,
            'word_count': len(generated_text.split())
        }

    except Exception as e:
        print(f"Error generating {example_config['essay_id']}: {e}")
        return None

# ===========================================
# ALL 23 SYNTHETIC EXAMPLE CONFIGURATIONS
# ===========================================

SYNTHETIC_EXAMPLES = [

    # ========================================
    # SCORE 1 EXAMPLES (3 total) - REFINED
    # ========================================

    {
        'essay_id': 'SYNTH_V_01_S1',
        'score': 1,
        'target_error': 'Severe incompleteness + fabrication',
        'word_count': '100-150',
        'instructions': """Cover only 1-2 superficial details like "Venus is bright" or "it's hot."
Include 2-3 fabricated facts (e.g., say "hotter than the sun" or "NASA already went there").

CRITICAL - Make it SCATTERED and CHAOTIC:
- Jump between totally unrelated ideas with NO logical connections
- NO clear introduction-body-conclusion structure
- Let ideas trail off or suddenly change direction mid-thought
- Use only simple transitions: "Also," "And," or just start new sentences randomly
- Make at least 2-3 sentences that don't connect to anything around them
- Reader should feel confused trying to follow your point

Example scattered style: "Venus is really hot I think. Also there's blimps or something? The article talks about dangers but I can't remember. And they should explore Mars instead. Venus has acid maybe."

Show you fundamentally misunderstood the assignment. Make it feel random and unfocused."""
    },

    {
        'essay_id': 'SYNTH_V_02_S1',
        'score': 1,
        'target_error': 'Extreme brevity + major misunderstandings',
        'word_count': '80-120',
        'instructions': """Write only 3-5 sentences total (extremely brief).
Fundamentally misrepresent the article's main argument (say scientists have already successfully explored Venus when the article is about future plans and challenges).

CRITICAL - Make it SCATTERED and CHAOTIC:
- Jump between ideas with no connections
- NO structure at all
- Treat Venus exploration as if it's already accomplished rather than a future challenge
- Miss the evaluative component entirely - don't assess whether the author supported their claim well
- Use simple transitions only: "Also," "And," or none
- Let thoughts trail off incompletely

Show you didn't understand what the article was actually about. Make it feel very confused."""
    },

    {
        'essay_id': 'SYNTH_V_03_S1',
        'score': 1,
        'target_error': 'Off-topic rambling + factual confusion',
        'word_count': '150-200',
        'instructions': """Spend most of the essay on tangential topics (other planets, space exploration in general, why space is cool).
Confuse Venus with Mars or Mercury in several places.
Include information about planets that wasn't in the article at all.

CRITICAL - Make it SCATTERED and CHAOTIC:
- Jump wildly between unrelated topics: Venus → other planets → space careers → back to Venus → random facts
- NO clear focus on the assigned task
- Incoherent connections between ideas
- Simple or no transitions
- Several sentences that feel completely disconnected
- Never clearly address whether the author supported their argument

Show the student didn't focus on the assigned task and got distracted by tangents."""
    },

    # ========================================
    # SCORE 2 EXAMPLES (7 total) - UNCHANGED
    # ========================================

    {
        'essay_id': 'SYNTH_V_04_S2',
        'score': 2,
        'target_error': 'Completeness gap - missing critical supporting details',
        'word_count': '150-200',
        'instructions': """Correctly identify the main claim (studying Venus is worthy despite dangers).
Mention that the author discusses dangers and solutions.
BUT only provide 1-2 very vague examples (e.g., "there are dangers" without specifying what).
Omit key evidence like specific temperatures, NASA's blimp solution, mechanical computers, etc.
Show basic understanding but very superficial engagement with the text.
Address the evaluation aspect but without sufficient detail to be convincing."""
    },

    {
        'essay_id': 'SYNTH_V_05_S2',
        'score': 2,
        'target_error': 'Accuracy issues - multiple misrepresentations',
        'word_count': '180-220',
        'instructions': """Capture the basic structure (dangers → solutions → why it's worth it).
Include several factual errors: wrong temperature (say 600°F instead of 800°F), wrong atmospheric pressure, wrong altitude for NASA's blimp.
Misattribute information (e.g., say Mercury is Earth's twin, or confuse which planet has the hottest surface).
Mix up timeframes (say missions were recent when they were decades ago).
Show the student read the article but remembered details incorrectly."""
    },

    {
        'essay_id': 'SYNTH_V_06_S2',
        'score': 2,
        'target_error': 'Coherence problems - poor organization',
        'word_count': '180-220',
        'instructions': """Include relevant content from the article but present it in random order.
Jump from dangers to solutions back to dangers to why Venus is interesting with no logical flow.
Use very weak or missing transitions ("Also..." or "And another thing...").
Make it hard to follow the argument even though the information is present.
Repeat the same point in different places rather than grouping related ideas."""
    },

    {
        'essay_id': 'SYNTH_V_07_S2',
        'score': 2,
        'target_error': 'Conciseness problems - excessive length with repetition',
        'word_count': '400-450',
        'instructions': """Include accurate information about Venus.
Repeat the same points 3-4 times using slightly different wording each time.
Say things like "Venus is dangerous because of the heat. The heat on Venus is extreme. The temperatures on Venus are very hot."
Include unnecessary elaboration on minor details.
Make it feel like padding to meet a length requirement.
Could easily be cut to 200 words without losing content."""
    },

    {
        'essay_id': 'SYNTH_V_08_S2',
        'score': 2,
        'target_error': 'Quote-heavy with minimal synthesis',
        'word_count': '180-220',
        'instructions': """Rely heavily on direct phrases from the article (don't use actual quotation marks, but use near-quotes).
String together borrowed phrases with minimal original summarization.
Reads like a patchwork: "The article says [near quote]. It also mentions [near quote]. The author states [near quote]."
Very little original synthesis or paraphrasing.
Shows the student didn't process the information, just copied it.
Weak evaluation of whether the author's support is effective."""
    },

    {
        'essay_id': 'SYNTH_V_09_S2',
        'score': 2,
        'target_error': 'Shallow coverage - lists facts without connections',
        'word_count': '160-200',
        'instructions': """Write in a bullet-point style or list-like structure even without actual bullets.
"First, Venus is hot. Second, there is acid. Third, NASA has an idea."
List facts from the article without connecting them or showing relationships.
No clear evaluation of the author's argument - just recitation.
Miss the analytical component entirely.
Each sentence feels disconnected from the previous one."""
    },

    {
        'essay_id': 'SYNTH_V_10_S2',
        'score': 2,
        'target_error': 'Personal opinion intrusion',
        'word_count': '180-220',
        'instructions': """Start summarizing the article but then shift into personal opinions.
Use phrases like "I think we should explore Venus because..." or "I believe the author is right because I've always been interested in space..."
Include 2-3 paragraphs about the student's own views on space exploration.
Lose objectivity required for summary/evaluation.
Confuse personal response with evaluation of the author's support.
Mix "the author supports this" with "I agree because..." """
    },

    # ========================================
    # SCORE 3 EXAMPLES (8 total) - REFINED
    # ========================================

    {
        'essay_id': 'SYNTH_V_11_S3',
        'score': 3,
        'target_error': 'Good completeness, weak coherence',
        'word_count': '220-250',
        'instructions': """Cover ALL main ideas with GOOD specific details:
- Dangers: 800°F temperature, sulfuric acid, 97% CO2 atmosphere, 90x pressure
- Solutions: NASA's blimp at 30 miles altitude, mechanical computers, silicon carbide
- Value: Venus may have had oceans, Earth's twin, scientific curiosity

CRITICAL - Good content but CHOPPY execution:
- Use ONLY weak transitions: "Also," "And," "Another thing," "So," "Plus"
- NEVER use: "Furthermore," "Additionally," "Moreover," "In conclusion"
- Present good ideas in somewhat random order - jump between topics
- Make each paragraph feel disconnected from the previous one

Example choppy style: "The author talks about Venus being super dangerous. It's like 800 degrees with sulfuric acid. Also NASA has this idea about blimps. And the temperature would be hot but survivable. Plus Venus might have had oceans."

Reader should think: "Good content but rough organization." DO NOT write polished conclusion."""
    },

    {
        'essay_id': 'SYNTH_V_12_S3',
        'score': 3,
        'target_error': 'Good accuracy/completeness, conciseness issues',
        'word_count': '320-350',
        'instructions': """Cover all main ideas with accurate, thorough detail.
Include all key facts with correct information.

CRITICAL - Good content but CHOPPY execution AND TOO LONG:
- Use ONLY weak transitions: "Also," "And," "Another thing," "So," "Plus"
- Be moderately too long (320-350 words)
- Include some unnecessary elaboration or minor tangential details
- Some repetition of ideas
- Could be tightened significantly without losing content
- Good substance but needs editing
- Somewhat random organization

Make it feel like the student knows the material well but wrote too much with choppy flow."""
    },

    {
        'essay_id': 'SYNTH_V_13_S3',
        'score': 3,
        'target_error': 'Good structure, minor accuracy lapses',
        'word_count': '220-250',
        'instructions': """Write with decent organization and appropriate length.
Include good coverage of main ideas.

CRITICAL - Good content but CHOPPY execution PLUS minor errors:
- Use ONLY weak transitions: "Also," "And," "So," "Plus"
- Include 2-3 minor factual errors (slightly wrong numbers)
- Example: say the blimp would be 20 miles up instead of 30, or say 80% carbon dioxide instead of 97%
- Errors are small enough that overall understanding is clear
- Somewhat choppy flow with weak transitions

Otherwise solid summary with adequate evaluation."""
    },

    {
        'essay_id': 'SYNTH_V_14_S3',
        'score': 3,
        'target_error': 'Adequate but mechanical',
        'word_count': '200-230',
        'instructions': """Hit all required elements in a formulaic way.
"The author supports this idea in three ways. First,... Second,... Third,..."

CRITICAL - Good content but CHOPPY execution AND mechanical:
- Use ONLY weak transitions: "Also," "And," "First," "Second," "Third"
- Very five-paragraph-essay structure that feels paint-by-numbers
- Overly simplistic sentence structure throughout (mostly simple sentences, few complex ones)
- Feels formulaic but technically complete
- Adequate but uninspired
- Choppy transitions between sections

Make it feel like following a template rather than natural writing."""
    },

    {
        'essay_id': 'SYNTH_V_15_S3',
        'score': 3,
        'target_error': 'Good content, weak introduction/conclusion',
        'word_count': '220-250',
        'instructions': """Write strong middle paragraphs with good detail about dangers, solutions, and value.
Include specific facts and evidence.

CRITICAL - Good content but CHOPPY execution PLUS weak framing:
- Use ONLY weak transitions in body: "Also," "And," "Plus," "So"
- Unclear or missing claim statement in introduction
- Introduction jumps straight into details without setting up the evaluation
- Abrupt ending or incomplete conclusion that doesn't tie ideas together
- The body is strong (score 4 content level) but framing is weak and choppy

Make the middle good but the beginning and end feel rough."""
    },

    {
        'essay_id': 'SYNTH_V_16_S3',
        'score': 3,
        'target_error': 'Imbalanced coverage',
        'word_count': '220-250',
        'instructions': """Write excellent, detailed coverage of the dangers (sulfuric acid, heat, pressure, etc.).
Then only 2-3 sentences total on NASA's solutions (very superficial).
Barely mention why Venus is scientifically valuable.

CRITICAL - Good content but CHOPPY execution AND imbalanced:
- Use ONLY weak transitions: "Also," "And," "Another thing"
- Show engagement with some sections but uneven attention
- Good depth in dangers, inadequate in solutions/value
- Choppy flow throughout
- Somewhat random organization

Make it obvious the student focused on one section and rushed through others."""
    },

    {
        'essay_id': 'SYNTH_V_17_S3',
        'score': 3,
        'target_error': 'Nearly good but with redundancy',
        'word_count': '260-290',
        'instructions': """Write with accurate information and decent structure.
Include good evaluation of the author's support.

CRITICAL - Good content but CHOPPY execution PLUS redundancy:
- Use ONLY weak transitions: "Also," "And," "Plus," "So"
- Repeat 2-3 points unnecessarily
- Example: mention the extreme heat in paragraph 2, then mention it again in paragraph 3 in similar words
- Some ideas stated twice without adding new information
- Choppy transitions throughout
- Could be excellent if tightened and smoothed

Make it feel like good understanding but needs editing for flow and conciseness."""
    },

    {
        'essay_id': 'SYNTH_V_18_S3',
        'score': 3,
        'target_error': 'Good summary with minor coherence gaps',
        'word_count': '220-250',
        'instructions': """Write comprehensive and accurate coverage.
Include good specific details.

CRITICAL - Good content but CHOPPY execution PLUS coherence hiccups:
- Use ONLY weak transitions: "Also," "And," "So," "Plus"
- Make one paragraph or section feel disconnected from the rest
- Include slightly confusing pronoun references (unclear antecedents)
- One transition that doesn't quite work
- Reader might need to reread one part to understand the connection
- Overall good but with noticeable choppiness

Make it feel like the content is there but organization could be smoother."""
    },

    # ========================================
    # SCORE 4 EXAMPLES (4 total) - UNCHANGED
    # ========================================

    {
        'essay_id': 'SYNTH_V_19_S4',
        'score': 4,
        'target_error': 'Excellent overall, slightly too lengthy',
        'word_count': '290-310',
        'instructions': """Write a clear, explicit evaluation of how well the author supports the argument.
Include comprehensive coverage of dangers (specific examples), solutions (blimp, mechanical computers), and scientific value.
Make it accurate throughout with good detail.
Organize well with smooth transitions.
BUT make it slightly longer than ideal (290-310 words).
Could be tightened by 40-60 words without losing substance.
Very strong work with only minor conciseness issue."""
    },

    {
        'essay_id': 'SYNTH_V_20_S4',
        'score': 4,
        'target_error': 'Very good but with minor conciseness issue',
        'word_count': '250-270',
        'instructions': """Write with excellent structure, accuracy, and completeness.
Include strong evaluation of the author's argument.
Make it clear and coherent throughout.
BUT include 1-2 sentences that could be tightened.
One slightly redundant point or phrase.
Example: might say both "very hot" and "extremely high temperatures" in close proximity.
Nearly perfect with just minor tightening needed."""
    },

    {
        'essay_id': 'SYNTH_V_21_S4',
        'score': 4,
        'target_error': 'Strong summary, minor accuracy detail',
        'word_count': '230-260',
        'instructions': """Write with excellent organization, completeness, and conciseness.
Include clear evaluation with strong supporting evidence.
Use smooth, sophisticated writing.
BUT include one small factual error that doesn't undermine the overall argument.
Example: say 85 times atmospheric pressure instead of 90, or 750°F instead of 800°F.
Error is minor enough that understanding remains strong.
Otherwise near-perfect."""
    },

    {
        'essay_id': 'SYNTH_V_22_S4',
        'score': 4,
        'target_error': 'Near-excellent but slightly mechanical',
        'word_count': '240-260',
        'instructions': """Hit all rubric criteria very well.
Make it accurate, complete, organized, and reasonably concise.
Include clear evaluation with good evidence.
BUT lack the sophisticated synthesis and insightful connections of score 5-6.
Be slightly formulaic in approach.
Very competent and thorough but doesn't have the "spark" of exceptional writing.
Very good but not quite excellent."""
    },

    # ========================================
    # SCORE 5 EXAMPLES (1 total) - UNCHANGED
    # ========================================

    {
        'essay_id': 'SYNTH_V_23_S5',
        'score': 5,
        'target_error': 'Excellent summary with very minor flaw',
        'word_count': '230-250',
        'instructions': """Write a clear, insightful evaluation of how the author builds their argument.
Include comprehensive coverage of all key evidence (dangers, NASA solutions, scientific value, mechanical computers, past missions).
Make all information accurate and well-synthesized.
Use excellent organization with smooth, sophisticated transitions.
Be appropriately concise (230-250 words) with no redundancy.
Show deep understanding and analytical thinking.
BUT include one VERY minor issue (e.g., two ideas that could be connected more explicitly, or one transition that's good but could be slightly smoother).
The flaw should be extremely subtle - this is nearly perfect work.
Should feel like strong high school or early college writing."""
    }
]

# ===========================================
# MAIN GENERATION FUNCTION
# ===========================================

def generate_all_synthetic_examples(save_path, delay=1.5):
    """Generate all 23 synthetic examples and save to CSV"""

    print("=" * 70)
    print(f"GENERATING {len(SYNTHETIC_EXAMPLES)} SYNTHETIC EXAMPLES")
    print("=" * 70)
    print(
        f"Estimated time: {len(SYNTHETIC_EXAMPLES) * 2:.0f} seconds "
        f"(~{len(SYNTHETIC_EXAMPLES) * 2 / 60:.0f} minutes)"
    )
    print()

    results = []

    for i, example_config in enumerate(SYNTHETIC_EXAMPLES, 1):
        print(
            f"[{i:2d}/{len(SYNTHETIC_EXAMPLES)}] "
            f"{example_config['essay_id']} (Score {example_config['score']})...",
            end=" ",
        )

        # ✅ actually generate one example
        result = generate_synthetic_example(example_config)

        if result is not None:
            results.append(result)
            word_count = result["word_count"]
            print(f"✓ ({word_count} words)")
        else:
            print("✗ FAILED")

        # Rate limiting
        if i < len(SYNTHETIC_EXAMPLES):
            time.sleep(delay)

    # If all generations failed, bail out gracefully
    if not results:
        print(
            "\nNo synthetic examples were generated. "
            "Check the error messages above (likely an API or config issue)."
        )
        return pd.DataFrame()

    # Create DataFrame
    synthetic_df = pd.DataFrame(results)

    # Save to CSV
    synthetic_df.to_csv(save_path, index=False)

    print()
    print("=" * 70)
    print("✅ GENERATION COMPLETE!")
    print("=" * 70)
    print(f"Generated: {len(results)}/{len(SYNTHETIC_EXAMPLES)} examples")
    print(f"📁 Saved to: {save_path}")
    print()

    print("Score distribution:")
    print(synthetic_df["score"].value_counts().sort_index())
    print()

    return synthetic_df

# ===========================================
# COMBINE WITH AUTHENTIC SAMPLES
# ===========================================

def combine_with_authentic(authentic_path, synthetic_df, output_path):
    """Combine authentic and synthetic samples into final validation set"""

    print("="*70)
    print("COMBINING WITH AUTHENTIC SAMPLES")
    print("="*70)

    # Load authentic samples
    authentic_df = pd.read_csv(authentic_path)

    # Add metadata columns to authentic samples
    authentic_df['synthetic_flag'] = False
    authentic_df['target_error_pattern'] = 'Authentic student work'
    authentic_df['generation_date'] = None
    authentic_df['generation_model'] = None
    authentic_df['word_count'] = authentic_df['full_text'].apply(lambda x: len(str(x).split()))

    # Combine
    combined_df = pd.concat([authentic_df, synthetic_df], ignore_index=True)

    # Shuffle to mix authentic and synthetic
    combined_df = combined_df.sample(frac=1, random_state=42).reset_index(drop=True)

    # Save
    combined_df.to_csv(output_path, index=False)

    print()
    print("✅ COMBINED DATASET CREATED!")
    print(f"   Authentic samples: {len(authentic_df)} ({len(authentic_df)/len(combined_df)*100:.1f}%)")
    print(f"   Synthetic samples: {len(synthetic_df)} ({len(synthetic_df)/len(combined_df)*100:.1f}%)")
    print(f"   Total samples: {len(combined_df)}")
    print()
    print(f"📁 Saved to: {output_path}")
    print()
    print("Final score distribution:")
    print(combined_df['score'].value_counts().sort_index())
    print()

    return combined_df

Mounted at /content/drive
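
Before spending API calls on generation, it can help to sanity-check the `SYNTHETIC_EXAMPLES` configs themselves. The sketch below is not part of the original pipeline: `parse_word_range` and `check_example_configs` are hypothetical helpers, and they assume only what the configs above show (a `'min-max'` string in `word_count`, and an `essay_id` ending in `_S<score>`).

```python
import re

def parse_word_range(spec):
    """Parse a 'min-max' string such as '150-200' into a (min, max) tuple of ints."""
    match = re.fullmatch(r"\s*(\d+)\s*-\s*(\d+)\s*", spec)
    if match is None:
        raise ValueError(f"Unrecognized word_count spec: {spec!r}")
    low, high = int(match.group(1)), int(match.group(2))
    if low > high:
        raise ValueError(f"Lower bound exceeds upper bound in {spec!r}")
    return low, high

def check_example_configs(configs):
    """Return a list of human-readable problems found in the config dicts."""
    problems = []
    seen_ids = set()
    for cfg in configs:
        essay_id = cfg["essay_id"]
        if essay_id in seen_ids:
            problems.append(f"{essay_id}: duplicate essay_id")
        seen_ids.add(essay_id)
        # Assumption: IDs end in _S<score>, e.g. SYNTH_V_04_S2 for a score of 2
        suffix = re.search(r"_S(\d+)$", essay_id)
        if suffix is None or int(suffix.group(1)) != cfg["score"]:
            problems.append(f"{essay_id}: ID suffix does not match score {cfg['score']}")
        try:
            parse_word_range(cfg["word_count"])
        except ValueError as err:
            problems.append(f"{essay_id}: {err}")
    return problems

# Illustrative configs only: the second one deliberately mismatches ID and score
demo_configs = [
    {"essay_id": "SYNTH_V_04_S2", "score": 2, "word_count": "150-200"},
    {"essay_id": "SYNTH_V_05_S2", "score": 3, "word_count": "180-220"},
]
print(check_example_configs(demo_configs))
```

In the notebook this would be called as `check_example_configs(SYNTHETIC_EXAMPLES)`; an empty list means no problems were found.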


In [8]:
# ===========================================
# QUICK SANITY CHECK (RUNS WHEN YOU EXECUTE THIS CELL)
# ===========================================

print("Running quick sanity check with one synthetic example...")
test_config = SYNTHETIC_EXAMPLES[0]
test_sample = generate_synthetic_example(test_config)

print("Result is None?", test_sample is None)
if test_sample:
    print("Sample word_count:", test_sample["word_count"])
    print(test_sample["full_text"][:400], "...")
    # Optional: stop here while debugging
    # import sys
    # sys.exit("Stopping after sanity check.")

Running quick sanity check with one synthetic example...
Result is None? False
Sample word_count: 159
Venus is super bright in the sky, like one of the best stars you can see. Also, the article says it's really hot there, hotter than the sun or something. I don't remember exactly why scientists want to go to Venus because of the dangers. They said it has crazy acid clouds, I think. And NASA went there already and they sent robots that got crushed in like minutes. That's why it's so hard to study V ...
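
Since `generate_synthetic_example` signals failure by returning `None`, a transient API error currently costs one example for the whole run. One possible mitigation, sketched here under assumptions, is a small retry wrapper with exponential backoff. `generate_with_retries`, `flaky_generate`, and the parameter names are hypothetical and not part of the notebook; the demo uses a stand-in generator rather than the real API call.

```python
import time

def generate_with_retries(generate_fn, config, max_attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call generate_fn(config) until it returns a non-None result.

    Waits base_delay, then 2 * base_delay, and so on between attempts.
    Returns None if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        result = generate_fn(config)
        if result is not None:
            return result
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))
    return None

# Stand-in generator that fails twice, then succeeds (for illustration only)
calls = {"n": 0}

def flaky_generate(config):
    calls["n"] += 1
    if calls["n"] < 3:
        return None
    return {"essay_id": config["essay_id"], "word_count": 159}

waits = []  # record the requested delays instead of actually sleeping
sample = generate_with_retries(flaky_generate, {"essay_id": "SYNTH_V_01_S1"}, sleep=waits.append)
print(sample, waits)
```

In the generation loop, `generate_with_retries(generate_synthetic_example, example_config)` could replace the direct call; the existing `None` check would still handle examples that fail on every attempt.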


In [9]:
# ===========================================
# RUN COMPLETE GENERATION
# ===========================================

if __name__ == "__main__":
    print("\n" + "="*70)
    print("SYNTHETIC EXAMPLES GENERATION - REFINED VERSION")
    print("="*70)
    print()

    # Step 1: Generate synthetic examples
    synthetic_df = generate_all_synthetic_examples(SYNTHETIC_PATH, delay=1.5)

    if synthetic_df.empty:
        print("❌ Skipping combination because no synthetic examples were generated.")
    else:
        # Step 2: Combine with authentic samples
        final_df = combine_with_authentic(AUTHENTIC_PATH, synthetic_df, COMBINED_PATH)
        print("="*70)
        print("🎉 ALL DONE!")
        print("="*70)
        print()
        print("Next steps:")
        print("1. ✅ Review synthetic examples for quality")
        print("2. ✅ Regenerate any that need adjustment")
        print("3. ✅ Proceed to Phase 2: Expert Rating")
        print()
        print(f"Your validation set is ready: {COMBINED_PATH}")
        print()


SYNTHETIC EXAMPLES GENERATION - REFINED VERSION

GENERATING 23 SYNTHETIC EXAMPLES
Estimated time: 46 seconds (~1 minutes)

[ 1/23] SYNTH_V_01_S1 (Score 1)... ✓ (139 words)
[ 2/23] SYNTH_V_02_S1 (Score 1)... ✓ (86 words)
[ 3/23] SYNTH_V_03_S1 (Score 1)... ✓ (240 words)
[ 4/23] SYNTH_V_04_S2 (Score 2)... ✓ (180 words)
[ 5/23] SYNTH_V_05_S2 (Score 2)... ✓ (210 words)
[ 6/23] SYNTH_V_06_S2 (Score 2)... ✓ (240 words)
[ 7/23] SYNTH_V_07_S2 (Score 2)... ✓ (485 words)
[ 8/23] SYNTH_V_08_S2 (Score 2)... ✓ (225 words)
[ 9/23] SYNTH_V_09_S2 (Score 2)... ✓ (206 words)
[10/23] SYNTH_V_10_S2 (Score 2)... ✓ (261 words)
[11/23] SYNTH_V_11_S3 (Score 3)... ✓ (219 words)
[12/23] SYNTH_V_12_S3 (Score 3)... ✓ (373 words)
[13/23] SYNTH_V_13_S3 (Score 3)... ✓ (273 words)
[14/23] SYNTH_V_14_S3 (Score 3)... ✓ (216 words)
[15/23] SYNTH_V_15_S3 (Score 3)... ✓ (258 words)
[16/23] SYNTH_V_16_S3 (Score 3)... ✓ (252 words)
[17/23] SYNTH_V_17_S3 (Score 3)... ✓ (269 words)
[18/23] SY

In [10]:
combined_df = pd.read_csv(COMBINED_PATH)

print(combined_df.shape)  # should be (60, ...)
print(combined_df["synthetic_flag"].value_counts())
print(combined_df["score"].value_counts().sort_index())

# Look at a few synthetic rows
combined_df[combined_df["synthetic_flag"]].head()[["essay_id", "score", "word_count"]]

(60, 19)
synthetic_flag
False    37
True     23
Name: count, dtype: int64
score
1     8
2    18
3    20
4    10
5     3
6     1
Name: count, dtype: int64


         essay_id  score  word_count
3   SYNTH_V_09_S2      2         206
5   SYNTH_V_18_S3      3         260
7   SYNTH_V_12_S3      3         373
9   SYNTH_V_21_S4      4         238
10  SYNTH_V_10_S2      2         261


In [11]:
synthetic = combined_df[combined_df["synthetic_flag"]]

# Check word-count ranges by score
synthetic.groupby("score")["word_count"].describe()

       count        mean        std    min     25%    50%    75%    max
score
1        3.0  155.000000   78.23682   86.0  112.50  139.0  189.5  240.0
2        7.0  258.142857  103.31413  180.0  208.00  225.0  250.5  485.0
3        8.0  265.000000   48.55630  216.0  243.75  259.0  270.0  373.0
4        4.0  252.250000  18.391574  238.0  241.75  246.0  256.5  279.0
5        1.0  235.000000        NaN  235.0  235.00  235.0  235.0  235.0
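
The per-score summary hides how individual essays compare to the `word_count` range requested in their config. For instance, SYNTH_V_07_S2 asked for 400-450 words and came back at 485. A rough per-essay check could look like the sketch below. `flag_out_of_range` and its `tolerance` parameter are hypothetical; the demo values are copied from the configs and outputs above.

```python
def flag_out_of_range(records, targets, tolerance=0.10):
    """Return (essay_id, word_count, low, high) for essays outside their target range.

    records   -- iterable of (essay_id, word_count) pairs
    targets   -- dict mapping essay_id to a (low, high) word-count range
    tolerance -- proportional slack on both bounds (0.10 allows 10% under or over)
    """
    flagged = []
    for essay_id, word_count in records:
        if essay_id not in targets:
            continue  # no target known for this essay (e.g. authentic samples)
        low, high = targets[essay_id]
        if not (low * (1 - tolerance) <= word_count <= high * (1 + tolerance)):
            flagged.append((essay_id, word_count, low, high))
    return flagged

# Target ranges from the configs and actual counts from the outputs above
demo_targets = {
    "SYNTH_V_07_S2": (400, 450),
    "SYNTH_V_09_S2": (160, 200),
    "SYNTH_V_10_S2": (180, 220),
    "SYNTH_V_12_S3": (320, 350),
    "SYNTH_V_21_S4": (230, 260),
}
demo_records = [
    ("SYNTH_V_07_S2", 485),
    ("SYNTH_V_09_S2", 206),
    ("SYNTH_V_10_S2", 261),
    ("SYNTH_V_12_S3", 373),
    ("SYNTH_V_21_S4", 238),
]

for essay_id, count, low, high in flag_out_of_range(demo_records, demo_targets):
    print(f"{essay_id}: {count} words (target {low}-{high})")
```

With the notebook's data, `records` could be built as `zip(synthetic["essay_id"], synthetic["word_count"])`. The 10% default is an arbitrary choice; setting `tolerance=0.0` flags every essay that misses its stated range.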
