# The Hewlett Foundation: Automated Essay Scoring Dataset

## Description
This dataset was created for the William and Flora Hewlett Foundation (Hewlett) Automated Student Assessment Prize (ASAP). It consists of hand-scored student essays written in response to 8 different prompts.

### Data Sources

- [Link to Kaggle](https://www.kaggle.com/competitions/asap-aes/overview)
- [Link to Data Source](https://osf.io/9fdrw/)

In [28]:
!python3 -m pip install openpyxl
import zipfile
import os
import pandas as pd

In [2]:
# Extract files from dataset

with zipfile.ZipFile('asap-aes.zip', 'r') as zip_ref:
    zip_ref.extractall('asap-aes')

with zipfile.ZipFile('./asap-aes/Essay_Set_Descriptions.zip', 'r') as zip_ref:
    zip_ref.extractall('descriptions')
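
Before extracting, it can help to check what an archive actually contains. The following is a minimal sketch that does not touch the real `asap-aes.zip`: it builds a small in-memory archive (the member names are made up for illustration) and lists its members with `ZipFile.namelist()`.

```python
import io
import zipfile

# Build a tiny in-memory archive; these member names are hypothetical stand-ins
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zip_out:
    zip_out.writestr('example_set.tsv', 'essay_id\tessay\n1\tSome text\n')
    zip_out.writestr('nested/readme.txt', 'placeholder')

# List the members before extracting anything
with zipfile.ZipFile(buffer, 'r') as zip_ref:
    member_names = zip_ref.namelist()
    tsv_members = [name for name in member_names if name.endswith('.tsv')]

print(member_names)
print(tsv_members)
```

The same `namelist()` call should work on the downloaded archive if a file path is passed in place of the buffer.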

In [46]:
data_filepath = './asap-aes'

# Load train, test, and validation data into pandas dataframes
train = pd.read_csv(os.path.join(data_filepath, 'training_set_rel3.tsv'), delimiter='\t', encoding='ISO-8859-1')
test = pd.read_csv(os.path.join(data_filepath, 'test_set.tsv'), delimiter='\t', encoding='ISO-8859-1')
validation = pd.read_csv(os.path.join(data_filepath, 'valid_set.tsv'), delimiter='\t', encoding='ISO-8859-1')

In [47]:
# Keep only the essay_id, essay_set, essay, and domain1_score columns
# For now, use the train set because it includes the score (which we can use for stratified sampling)
essay_data = train[['essay_id', 'essay_set', 'essay', 'domain1_score']]
#test = test[['essay_id', 'essay_set', 'essay']]
#validation = validation[['essay_id', 'essay_set', 'essay']]
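
The stratified sampling mentioned in the comment above could look roughly like the sketch below. It assumes the goal is to preserve the score distribution within each essay set; the toy frame, the 50% fraction, and the random seed are illustrative choices, not values from this notebook.

```python
import pandas as pd

# Toy stand-in for essay_data; the values are made up for illustration
toy = pd.DataFrame({
    'essay_id': range(1, 13),
    'essay_set': [1] * 6 + [2] * 6,
    'domain1_score': [2, 2, 2, 2, 4, 4, 1, 1, 3, 3, 3, 3],
})

# Draw half of the essays from every (essay_set, score) group so the sample
# keeps the same score distribution within each prompt
sample = (
    toy.groupby(['essay_set', 'domain1_score'], group_keys=False)
       .sample(frac=0.5, random_state=0)
)

print(sample.groupby(['essay_set', 'domain1_score']).size())
```

Grouping on both columns matters here because the score ranges differ between prompts, so sampling on `domain1_score` alone would mix incomparable scales.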

In [48]:
prompt_data = pd.read_excel('descriptions/Essay_Set_Descriptions/essay_set_descriptions.xlsx', sheet_name='Sheet1', header=0)

In [49]:
# Copy the selection so the new columns below are added to an independent frame
prompt_data = prompt_data[['essay_set', 'grade_level']].copy()
prompt_data['type'] = ['persuasive', 'persuasive', 'litanalysis', 'litanalysis', 'litanalysis', 'litanalysis', 'narrative', 'narrative']
prompt_data['prompt'] = ['More and more people use computers, but not everyone agrees that this benefits society. Those who support advances in technology believe that computers have a positive effect on people. They teach hand-eye coordination, give people the ability to learn about faraway places and people, and even allow people to talk online with other people. Others have different ideas. Some experts are concerned that people are spending too much time on their computers and less time exercising, enjoying nature, and interacting with family and friends. Write a letter to your local newspaper in which you state your opinion on the effects computers have on people. Persuade the readers to agree with you.',
                         '"All of us can think of a book that we hope none of our children or any other children have taken off the shelf. But if I have the right to remove that book from the shelf -- that work I abhor -- then you also have exactly the same right and so does everyone else. And then we have no books left on the shelf for any of us." --Katherine Paterson, Author \nWrite a persuasive essay to a newspaper reflecting your views on censorship in libraries. Do you believe that certain materials, such as books, music, movies, magazines, etc., should be removed from the shelves if they are found offensive? Support your position with convincing arguments from your own experience, observations, and/or reading.',
                         'Read "ROUGH ROAD AHEAD: Do Not Exceed Posted Speed Limit" by Joe Kurmaskie. Write a response that explains how the features of the setting affect the cyclist. In your response, include examples from the essay that support your conclusion.',
                         'Read "Winter Hibiscus" by Minfong Ho. Read the last paragraph of the story. \n"When they come back, Saeng vowed silently to herself, in the spring, when the snows melt and the geese return and this hibiscus is budding, then I will take that test again." \nWrite a response that explains why the author concludes the story with this paragraph. In your response, include details and examples from the story that support your ideas.',
                         'Read "Narciso Rodriguez" by Narciso Rodriguez. Describe the mood created by the author in the memoir. Support your answer with relevant and specific information from the memoir.',
                         'Read the excerpt from "The Mooring Mast" by Marcia Amidon Lüsted. Based on the excerpt, describe the obstacles the builders of the Empire State Building faced in attempting to allow dirigibles to dock there. Support your answer with relevant and specific information from the excerpt.',
                         'Write about patience. Being patient means that you are understanding and tolerant. A patient person experience difficulties without complaining. Do only one of the following: write a story about a time when you were patient OR write a story about a time when someone you know was patient OR write a story in your own way about patience.',
                         'We all understand the benefits of laughter. For example, someone once said, “Laughter is the shortest distance between two people.” Many other people believe that laughter is an important part of any relationship. Tell a true story in which laughter was one element or part.']
prompt_data.to_csv('prompt_data.csv', index=False)
merged_data = pd.merge(essay_data, prompt_data, on='essay_set', how='left')
merged_data.to_csv('full_dataset.csv', index=False)
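
A left merge like the one above silently produces missing values if an `essay_set` has no matching prompt row. The sketch below shows one way to check for that, using the `validate` and `indicator` options of `pd.merge` on toy frames (the values are made up and stand in for `essay_data` and `prompt_data`).

```python
import pandas as pd

# Toy stand-ins for essay_data and prompt_data; values are made up
essays = pd.DataFrame({'essay_id': [1, 2, 3], 'essay_set': [1, 1, 2]})
prompts = pd.DataFrame({'essay_set': [1, 2], 'type': ['persuasive', 'narrative']})

# validate='many_to_one' raises MergeError if essay_set repeats in prompts;
# indicator=True adds a _merge column recording where each row matched
checked = pd.merge(essays, prompts, on='essay_set', how='left',
                   validate='many_to_one', indicator=True)

# Rows marked left_only had no matching prompt
unmatched = checked[checked['_merge'] == 'left_only']
print(len(checked), len(unmatched))
```

If `unmatched` is empty, every essay found its prompt and the `_merge` column can be dropped before saving.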

TOTAL ESSAYS:
- 3583 essays in `train`
- 1194 essays in `test`
- 1189 essays in `validation`