<a href="https://colab.research.google.com/github/Gayathri-achari/AES/blob/Projects/AES.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [8]:
import pandas as pd
import numpy as np
import nltk
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

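# Download NLTK tokenizer data if not already present.
# Newer NLTK releases load 'punkt_tab' instead of 'punkt', so check both.
for resource in ('punkt', 'punkt_tab'):
    try:
        nltk.data.find(f'tokenizers/{resource}')
    except LookupError:
        nltk.download(resource)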

# Load data. pandas infers zip compression from the file extension and reads the
# single CSV inside; it must contain 'full_text' (essay text) and 'score' (label) columns.
data = pd.read_csv('/archive (2).zip')

print("Columns in the DataFrame:")
print(data.columns)

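# Feature extraction: three surface-level features per essay.
# word_tokenize returns punctuation marks as separate tokens, so they count
# toward word_count and average_word_length.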
def extract_features(essay):
    words = nltk.word_tokenize(essay)
    sentences = nltk.sent_tokenize(essay)
    return {
        'word_count': len(words),
        'sentence_count': len(sentences),
        'average_word_length': np.mean([len(word) for word in words]) if words else 0
    }

# Create feature set
features = data['full_text'].apply(extract_features).tolist()
features_df = pd.DataFrame(features)

# Prepare data and labels
X = features_df
y = data['score']

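# Evaluate with repeated random train/test splits. These are not training epochs:
# each iteration fits a fresh forest on a different 80/20 split (random_state=42 + epoch),
# so the spread of MSE values reflects split-to-split variation.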
num_epochs = 5
mse_list = []
model = None

for epoch in range(num_epochs):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42 + epoch)
    model = RandomForestRegressor(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    mse = mean_squared_error(y_test, predictions)
    mse_list.append(mse)
    print(f"Epoch {epoch + 1}/{num_epochs} - Mean Squared Error: {mse:.4f}")

print("\nTraining complete. You can now enter essays to get a predicted score.")
print("Type 'exit' to quit.")

# Score a single essay with the model from the last split (trained on 80% of the data).
def score_essay(essay_text):
    features = extract_features(essay_text)
    features_df = pd.DataFrame([features])
    return model.predict(features_df)[0]

while True:
    user_input = input("\nEnter an essay to be scored (or type 'exit' to quit): ").strip()
    if user_input.lower() == 'exit':
        print("Exiting program.")
        break
    if not user_input:
        print("Empty input. Please enter a valid essay text.")
        continue
    predicted_score = score_essay(user_input)
    print(f"Predicted score: {predicted_score:.2f}")

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


Columns in the DataFrame:
Index(['essay_id', 'full_text', 'score'], dtype='object')
Epoch 1/5 - Mean Squared Error: 0.5651
Epoch 2/5 - Mean Squared Error: 0.5811
Epoch 3/5 - Mean Squared Error: 0.5792
Epoch 4/5 - Mean Squared Error: 0.5914
Epoch 5/5 - Mean Squared Error: 0.5845

Training complete. You can now enter essays to get a predicted score.
Type 'exit' to quit.

Enter an essay to be scored (or type 'exit' to quit): The Indian freedom struggle is one of the most significant progress in the history of India. In 1600, the Britishers entered India in the name of trade-specific items like tea, cotton and silk and started ruling our country. Later on, they started ruling our country and made our Indian people their slaves. So, our country has to face the most challenging times to gain independence from British rule. In 1857, the first movement against the British was initiated by Mangal Pandey, an Indian soldier.
Predicted score: 1.56

Enter an essay to be scored (or type 'exit' to qu
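
To see what `extract_features` measures without downloading NLTK data, here is a minimal sketch that approximates it with regular expressions. This is a swapped-in approximation, not NLTK's tokenizer: `extract_features_approx`, `TOKEN_RE` and `SENT_SPLIT_RE` are hypothetical names, and NLTK handles contractions, abbreviations and quotation marks differently, so counts on real essays will differ somewhat.

```python
import re
from statistics import mean

# Hypothetical regex stand-ins for nltk.word_tokenize / nltk.sent_tokenize.
# Runs of word characters and individual punctuation marks become separate tokens.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")
# Split into sentences at whitespace that follows '.', '!' or '?'.
SENT_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")


def extract_features_approx(essay: str) -> dict:
    """Approximate the notebook's three features without NLTK (assumption)."""
    words = TOKEN_RE.findall(essay)
    sentences = [s for s in SENT_SPLIT_RE.split(essay.strip()) if s]
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "average_word_length": mean(len(w) for w in words) if words else 0,
    }


print(extract_features_approx("Hello world. This is a test."))
# {'word_count': 8, 'sentence_count': 2, 'average_word_length': 2.875}
```

As with `nltk.word_tokenize`, the two full stops count as tokens here, which pulls `average_word_length` down to 2.875 instead of the 3.5 you would get from the six words alone. The features therefore mix essay length with punctuation density.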

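The five "epochs" in the first cell are repeated random train/test splits, so their MSEs are more informative reported as a mean and spread than one by one. A minimal sketch using only the standard library and the values printed above (`mse_per_split` is a hypothetical name; in the notebook the same numbers are already in `mse_list`). Taking the square root gives an RMSE in the units of the `score` column.

```python
import math
import statistics

# Per-split test MSEs as printed by the first cell (Epoch 1/5 to Epoch 5/5).
mse_per_split = [0.5651, 0.5811, 0.5792, 0.5914, 0.5845]

mean_mse = statistics.mean(mse_per_split)
std_mse = statistics.stdev(mse_per_split)  # sample standard deviation across splits
rmse = math.sqrt(mean_mse)                 # in the units of the 'score' column

print(f"MSE over {len(mse_per_split)} splits: {mean_mse:.4f} ± {std_mse:.4f}")
print(f"RMSE from mean MSE: {rmse:.4f}")
# MSE over 5 splits: 0.5803 ± 0.0097
# RMSE from mean MSE: 0.7617
```

scikit-learn offers the same evaluation scheme directly through `ShuffleSplit(n_splits=5, test_size=0.2)` passed as `cv` to `cross_val_score` with `scoring="neg_mean_squared_error"` (which returns negated MSEs); its splits would not coincide with the `random_state=42 + epoch` splits used here.
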
In [9]:
import pandas as pd
import numpy as np
import nltk
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Download NLTK tokenizer data if not already present.
# Newer NLTK releases load 'punkt_tab' instead of 'punkt', so check both.
for resource in ('punkt', 'punkt_tab'):
    try:
        nltk.data.find(f'tokenizers/{resource}')
    except LookupError:
        nltk.download(resource)

# Load data. pandas infers zip compression from the file extension and reads the
# single CSV inside; it must contain 'full_text' (essay text) and 'score' (label) columns.
data = pd.read_csv('/archive (2).zip')
print("Columns in the DataFrame:")
print(data.columns)


# Feature extraction: three surface-level features per essay.
# word_tokenize returns punctuation marks as separate tokens, so they count
# toward word_count and average_word_length.
def extract_features(essay):
    words = nltk.word_tokenize(essay)
    sentences = nltk.sent_tokenize(essay)
    return {
        'word_count': len(words),
        'sentence_count': len(sentences),
        'average_word_length': np.mean([len(word) for word in words]) if words else 0
    }


# Create feature set
features = data['full_text'].apply(extract_features).tolist()
features_df = pd.DataFrame(features)

# Prepare data and labels
X = features_df
y = data['score']

# Evaluate with repeated random train/test splits. These are not training epochs:
# each iteration fits a fresh forest on a different 80/20 split (random_state=42 + epoch).
num_epochs = 5
mse_list = []
mae_list = []
r2_list = []
model = None

for epoch in range(num_epochs):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42 + epoch)
    model = RandomForestRegressor(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)

    # Calculate evaluation metrics on the held-out split
    mse = mean_squared_error(y_test, predictions)
    mae = mean_absolute_error(y_test, predictions)
    r2 = r2_score(y_test, predictions)

    mse_list.append(mse)
    mae_list.append(mae)
    r2_list.append(r2)
    print(f"Epoch {epoch + 1}/{num_epochs} - MSE: {mse:.4f}, MAE: {mae:.4f}, R²: {r2:.4f}")

print("\nTraining complete. You can now enter essays to get a predicted score.")
print("Type 'exit' to quit.")


# Score a single essay with the model from the last split (trained on 80% of the data).
def score_essay(essay_text):
    features = extract_features(essay_text)
    features_df = pd.DataFrame([features])
    return model.predict(features_df)[0]


while True:
    user_input = input("\nEnter an essay to be scored (or type 'exit' to quit): ").strip()
    if user_input.lower() == 'exit':
        print("Exiting program.")
        break
    if not user_input:
        print("Empty input. Please enter a valid essay text.")
        continue
    predicted_score = score_essay(user_input)
    print(f"Predicted score: {predicted_score:.2f}")

Columns in the DataFrame:
Index(['essay_id', 'full_text', 'score'], dtype='object')
Epoch 1/5 - MSE: 0.5651, MAE: 0.5758, R²: 0.4878
Epoch 2/5 - MSE: 0.5811, MAE: 0.5797, R²: 0.4698
Epoch 3/5 - MSE: 0.5792, MAE: 0.5794, R²: 0.4639
Epoch 4/5 - MSE: 0.5914, MAE: 0.5840, R²: 0.4490
Epoch 5/5 - MSE: 0.5845, MAE: 0.5786, R²: 0.4848

Training complete. You can now enter essays to get a predicted score.
Type 'exit' to quit.

Enter an essay to be scored (or type 'exit' to quit): Seasonal Festivals In India, most festivals are seasonal in nature. They announce the change in the season and mark the harvesting seasons. All the seasonal festivals are celebrated during two harvesting seasons, Kharif and Rabi. Besides, spring is another period of seasonal festivals. In Punjab, the Lohri festival indicates the harvesting of the winter crop. Pongal, Bihu and Onam celebrations mark the harvesting of paddy crops. Similarly, Holi and Baisakhi are celebrated to mark the harvesting of new rabi crops. Thus,
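
The second cell adds MAE and R² to the per-split report. For reference, a dependency-free sketch of the three metrics using their standard definitions, which is what `mean_squared_error`, `mean_absolute_error` and `r2_score` compute for a single 1-D target with default settings, checked on a small made-up example. The function names `mse`, `mae` and `r2` are hypothetical, and unlike `r2_score` this sketch does not special-case a constant `y_true`.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: average of absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined when y_true is constant (SS_tot == 0); this sketch then raises
    ZeroDivisionError rather than applying any special-case convention.
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up scores and predictions, for illustration only.
y_true = [3, 2, 4, 5]
y_pred = [2.5, 2, 4.5, 4]
print(f"MSE: {mse(y_true, y_pred):.4f}, MAE: {mae(y_true, y_pred):.4f}, R²: {r2(y_true, y_pred):.4f}")
# MSE: 0.3750, MAE: 0.5000, R²: 0.7000
```

Because SS_res / n is the MSE and SS_tot / n is the (population) variance of the test scores, R² equals 1 − MSE / Var(y_test). Read that way, the reported R² values of about 0.45 to 0.49 mean the three length-based features explain a little under half of the score variance on each held-out split.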