# Statistical Analysis 5: Question Type Difficulty and Average Grades

This notebook investigates whether some question types are easier to answer than others by analyzing the average band scores across different prompt clusters.

## Key Findings

The analysis reveals **variation in difficulty across question types**: average band scores per prompt cluster range from approximately 6.04 (cluster 19) to 6.46 (cluster 28), a spread of about 0.42 bands. This range is small compared with the within-cluster standard deviation of roughly 1.2 bands, but it still points to differences in essay quality and student performance across prompt themes.

**Easier topics** tend to involve familiar subjects like government spending priorities, while **more challenging topics** often require nuanced discussion of abstract concepts like the dual purposes of cultural institutions.

## Imports and Setup

---

In [2]:
import sys
from os import path

import pandas as pd

# Resolve the project root (parent of the notebook's working directory)
# and make it importable.
project_root = path.dirname(path.abspath(""))
sys.path.append(project_root)
print(project_root)

/Users/finnferchau/dev/team-10


In [3]:
pd.options.plotting.backend = "plotly"

## Data Import

---

Load the clustered essays dataset containing essays, prompts, evaluations, band scores, and cluster assignments.

In [4]:
csv_file = "/data/clean_clustered_train.csv"
csv_file_path = project_root + csv_file
print(csv_file_path)

df = pd.read_csv(csv_file_path)
df.sample(5)

/Users/finnferchau/dev/team-10/data/clean_clustered_train.csv


Unnamed: 0,prompt,essay,evaluation,band_score_old,task_achievement_description,task_achievement_score,coherence_and_cohesion_description,coherence_and_cohesion_score,lexical_resource_description,lexical_resource_score,grammatical_range_and_accuracy_description,grammatical_range_and_accuracy_score,overall_band_score_description,band_score,cluster
2646,Some people think that the government should e...,"In the modern world, there are opinions that t...",## Task Achievement:\nThe candidate has effect...,6.5\r\r\r\r\r,Task Achievement: The candidate has effectivel...,7.0,Coherence and Cohesion: The essay demonstrates...,7.0,Lexical Resource (Vocabulary): The essay exhib...,6.5,Grammatical Range and Accuracy: The essay disp...,6.0,Overall Band Score: **6.5** The essay demonstr...,6.5,3
6034,Some people believe that eventually all jobs w...,"These days, artificially intelligent robots ar...",**Task Achievement:**\n\n- The essay adequatel...,<4\r\r\r,Task Achievement:** - The essay adequately add...,4.0,Coherence and Cohesion:** - The essay is gener...,4.0,Lexical Resource (Vocabulary):** - The essay d...,3.5,Grammatical Range and Accuracy:** - The essay ...,3.5,Overall Band Score:** Considering the holistic...,4.0,0
3605,People nowadays tend to have children at older...,"In the present world, many couples prefer to g...",**Task Achievement: 4.5**\n\nThe essay address...,4.5\r\r,Task Achievement: 4.5** The essay addresses th...,4.5,Coherence and Cohesion: 4.0** The essay is wel...,4.0,Lexical Resource (Vocabulary): 4.0** The write...,4.0,Grammatical Range and Accuracy: 4.0** The essa...,4.0,"Overall Band Score: 4.5** Overall, the essay i...",4.5,4
8527,Some people think that museums should be enjoy...,There has been much discussion revolving aroun...,## Task Achievement:\n- The candidate has adeq...,4\r,Task Achievement: - The candidate has adequate...,4.0,Coherence and Cohesion: - Transitions between ...,4.0,Lexical Resource (Vocabulary): - The vocabular...,3.5,Grammatical Range and Accuracy: - There are so...,3.5,Overall Band Score: - Considering the holistic...,4.0,19
8988,"As well as making money, businesses also have ...",I believe that the main purpose of businesses ...,**Task Achievement:**\n- The candidate has ade...,7.5,Task Achievement:** - The candidate has adequa...,7.0,Coherence and Cohesion:** - Transitions betwee...,7.0,Lexical Resource (Vocabulary):** - The candida...,7.0,Grammatical Range and Accuracy:** - Sentence s...,7.0,Overall Band Score:** - The essay demonstrates...,7.5,25
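The sample above shows a leftover index column (`Unnamed: 0`) and trailing carriage returns in `band_score_old` (for example `6.5\r\r\r\r\r`). The analysis below only uses `band_score`, so this is optional, but a minimal cleanup sketch could look like the following. The small frame `raw` is a hypothetical stand-in built from three of the values printed above, not the full dataset.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of the loaded CSV.
raw = pd.DataFrame(
    {
        "Unnamed: 0": [2646, 6034, 8527],
        "band_score_old": ["6.5\r\r\r\r\r", "<4\r\r\r", "4\r"],
        "band_score": [6.5, 4.0, 4.0],
    }
)

# Drop the leftover index column written by an earlier to_csv call.
cleaned = raw.drop(columns=["Unnamed: 0"])

# Strip surrounding whitespace, which includes the trailing "\r" characters.
cleaned["band_score_old"] = cleaned["band_score_old"].str.strip()

print(cleaned["band_score_old"].tolist())
```

Values such as `<4` remain strings after stripping, so `band_score_old` would still need separate handling before any numeric use.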


## Overall Band Score Distribution

---

### Basic Statistics

Calculate and display fundamental statistics for the overall band score distribution, including mean, median, minimum, and maximum values.

In [5]:
# Create a frequency count of band scores
band_counts = df["band_score"].value_counts().sort_index()

# Get some basic statistics
print(f"Mean band score: {df['band_score'].mean():.2f}")
print(f"Median band score: {df['band_score'].median():.2f}")
print(f"Min band score: {df['band_score'].min():.2f}")
print(f"Max band score: {df['band_score'].max():.2f}")

Mean band score: 6.25
Median band score: 6.50
Min band score: 4.00
Max band score: 8.50


### Band Score Frequency Distribution

Visualize the frequency distribution of band scores using a bar chart to understand the overall scoring patterns in the dataset.

In [6]:
band_counts.plot(kind="bar")

## Cluster-Based Difficulty Analysis

---

### Average Band Scores by Prompt Cluster

Calculate mean, count, and standard deviation of band scores for each prompt cluster, then rank clusters from highest to lowest average scores to identify easier and more difficult question types.

In [7]:
cluster_avg = (
    df.groupby("cluster")["band_score"].agg(["mean", "count", "std"]).reset_index()
)
cluster_avg = cluster_avg.sort_values("mean", ascending=False)

print(f"Number of clusters: {len(cluster_avg)}")
print("\nTop 5 highest-scoring clusters:")
print(cluster_avg.head(5))
print("\nBottom 5 lowest-scoring clusters:")
print(cluster_avg.tail(5))

Number of clusters: 30

Top 5 highest-scoring clusters:
    cluster      mean  count       std
28       28  6.460938    192  1.243743
17       17  6.413043    207  1.245253
8         8  6.392544    456  1.204602
18       18  6.386441    295  1.162694
20       20  6.377432    257  1.254506

Bottom 5 lowest-scoring clusters:
    cluster      mean  count       std
2         2  6.134568    405  1.217565
11       11  6.120763    236  1.239952
14       14  6.111399    193  1.319633
16       16  6.092014    576  1.204429
19       19  6.041475    217  1.205940
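To gauge how large the gap between the extremes is relative to the spread within clusters, the printed summary statistics can be plugged into Welch's t statistic and Cohen's d. This is a rough sketch rather than part of the original analysis: the helper names `welch_t` and `cohens_d` are introduced here for illustration, and the inputs are copied from the table above for clusters 28 and 19.

```python
import math

# Summary statistics copied from the printed table above.
top = {"cluster": 28, "mean": 6.460938, "count": 192, "std": 1.243743}
bottom = {"cluster": 19, "mean": 6.041475, "count": 217, "std": 1.205940}


def welch_t(a: dict, b: dict) -> float:
    """Welch's t statistic for two groups given mean, std, and count."""
    standard_error = math.sqrt(
        a["std"] ** 2 / a["count"] + b["std"] ** 2 / b["count"]
    )
    return (a["mean"] - b["mean"]) / standard_error


def cohens_d(a: dict, b: dict) -> float:
    """Cohen's d using the pooled standard deviation."""
    pooled_var = (
        (a["count"] - 1) * a["std"] ** 2 + (b["count"] - 1) * b["std"] ** 2
    ) / (a["count"] + b["count"] - 2)
    return (a["mean"] - b["mean"]) / math.sqrt(pooled_var)


t_stat = welch_t(top, bottom)
effect_size = cohens_d(top, bottom)
print(f"Welch t statistic: {t_stat:.2f}")
print(f"Cohen's d: {effect_size:.2f}")
```

For these two clusters the sketch gives a t statistic of roughly 3.45 and a Cohen's d of roughly 0.34. Because the pair was chosen as the two extremes out of 30 clusters, the t statistic overstates the evidence; a test across all clusters with a correction for multiple comparisons would be needed before calling the differences statistically significant.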


### Barplot of the Mean Band Score per Prompt Cluster

In [8]:
# Calculate mean by cluster, sorted by mean band score (lowest to highest)
cluster_stats = df.groupby("cluster")["band_score"].agg(["mean", "count"]).reset_index()
cluster_stats = cluster_stats.sort_values("mean", ascending=True)
cluster_stats["cluster"] = cluster_stats["cluster"].astype(str)
cluster_stats["cluster"] = pd.Categorical(
    cluster_stats["cluster"], categories=cluster_stats["cluster"], ordered=True
)

# Create figure
import plotly.graph_objects as go

fig = go.Figure()

# Add mean bars
fig.add_trace(
    go.Bar(
        x=cluster_stats["cluster"],
        y=cluster_stats["mean"],
        name="Mean Band Score",
        marker_color="lightblue",
        opacity=0.8,
    )
)

# Set axis titles
fig.update_xaxes(title_text="Cluster ID")
fig.update_yaxes(title_text="Mean Band Score")

# Update layout
fig.update_layout(
    title="Mean Band Scores by Prompt Cluster (Sorted by Mean Score)",
    width=1000,
    height=600,
    showlegend=False,
)

fig.show()

## Question Type Examples

---

### Most Difficult Question Type (Cluster 19)

Display a sample prompt and essay from the lowest-scoring cluster (mean band score 6.04) to illustrate the characteristics of more challenging question types. This cluster deals with the purpose of museums (entertainment vs. education).

In [9]:
cluster_19 = df[df["cluster"] == 19]

sample = cluster_19.sample(1).iloc[0]
print("Prompt: ")
print(sample["prompt"])
print("\nEssay: ")
print(sample["essay"])

Prompt: 
Some people think that museums should be enjoyable places to entertain people, while others believe that the purpose of museums is to educate. Discuss both views and give your own opinion.

Essay: 
People have views that museums should focus on entertainment, while there are some opponents who argue that the main function of museums is education. I believe that museums should have both two points. On the one hand, it can be argued that the main role of museums should be entertained. The reason why for this is that museums usually collect a variety of exhibitions that attract considerable tourists who want to see these objects. And then, if museums only give visitors the dull data, such as figures, texts and the others to introduce these exhibits, people will become tired and bored quickly. As a result, they may not be interested in museums any more. That is why currently some museums provide some enjoyable facilities such as toys, VR glasses and 3D movies to improve museums' e

### Easiest Question Type (Cluster 28)

Show a sample prompt and essay from the highest-scoring cluster (mean band score 6.46) to demonstrate the features of easier question types. This cluster focuses on government spending on arts vs. health/education, although the random sample drawn below is about illegal music downloads.

In [10]:
cluster_28 = df[df["cluster"] == 28]

sample = cluster_28.sample(1).iloc[0]
print("Prompt: ")
print(sample["prompt"])
print("\nEssay: ")
print(sample["essay"])

Prompt: 
Some people think that illegal internet download are having a negative effect on the music industry. Others feel that they have little or no impact on artists. Discuss both views and give own opinion.

Essay: 
Since the introduction of the internet, people have always debated whether the illegal usage of such technologies will outweigh the advances it will bring forth. One of the main topics to talk about is the music industry. It is clear that no matter what is done unauthorized usage of tracks can not be controlled, yet the question that remains is "Does it have a profound effect on the whole industry?". Firstly, it should be noted that the phenomenon definitely has a negative sum result. Some people stand behind the idea that such actions will hurt the music. The line of thought they have is, that if an artist is making less money as a result of an individual not paying to buy the track, they will have less to produce the next one. This is true, especially in the case of ne
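A single random draw may not be representative of a cluster, and it changes on every run. A sketch of two small safeguards follows: listing the most frequent prompts in a cluster, and fixing the sampling seed. The helper `top_prompts` and the tiny frame with placeholder prompts are assumptions made for illustration; in the notebook the function would be applied to the real `df`.

```python
import pandas as pd

# Tiny stand-in for the notebook's `df`; prompts are shortened placeholders.
df = pd.DataFrame(
    {
        "cluster": [28, 28, 28, 19, 19],
        "prompt": [
            "prompt about arts funding",
            "prompt about arts funding",
            "prompt about music downloads",
            "prompt about museums",
            "prompt about museums",
        ],
    }
)


def top_prompts(frame: pd.DataFrame, cluster_id: int, n: int = 3) -> pd.Series:
    """Return the n most frequent prompts in a cluster with their counts."""
    in_cluster = frame[frame["cluster"] == cluster_id]
    return in_cluster["prompt"].value_counts().head(n)


print(top_prompts(df, 28))

# A fixed seed makes the illustrative sample reproducible across runs.
sample = df[df["cluster"] == 19].sample(1, random_state=0).iloc[0]
print(sample["prompt"])
```

Looking at the most frequent prompts gives a more stable picture of what a cluster is about than one sampled essay.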

### [`Click here to go back to the Homepage`](../Homepage.md)