### About Dataset
This dataset contains both AI-generated and human-written essays for training purposes.

The challenge is to develop a machine learning model that can accurately detect whether an essay was written by a student or an LLM.

The competition dataset comprises a mix of student-written essays and essays generated by a variety of LLMs.

The dataset contains more than 28,000 essays, some written by students and some generated by AI.

* Features:

`text`: the essay text

`generated`: the target label (0 = human-written essay, 1 = AI-generated essay)

### Importing data from drive

In [1]:
! gdown 15tLbPLssLAGKFlolQgY83vCsfrcVDMAF

# Unzip file
! unzip llm.zip

Downloading...
From: https://drive.google.com/uc?id=15tLbPLssLAGKFlolQgY83vCsfrcVDMAF
To: /content/llm.zip
100% 19.5M/19.5M [00:00<00:00, 41.0MB/s]
Archive:  llm.zip
  inflating: Training_Essay_Data.csv  


### Importing libraries

In [2]:
import pandas as pd
import numpy as np
import seaborn as sns

In [4]:
essay = pd.read_csv("Training_Essay_Data.csv")
essay.head()

Unnamed: 0,text,generated
0,Car-free cities have become a subject of incre...,1
1,"Car Free Cities Car-free cities, a concept ga...",1
2,A Sustainable Urban Future Car-free cities ...,1
3,Pioneering Sustainable Urban Living In an e...,1
4,The Path to Sustainable Urban Living In an ...,1


In [5]:
# Checking length of dataframe
len(essay)

29145

In [7]:
# Checking for missing texts & labels
essay.isnull().sum()

text         0
generated    0
dtype: int64
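Before modelling, it can help to look at how the two labels are distributed, since accuracy is easier to interpret when the class shares are known. The following is a minimal sketch using only the standard library; `class_shares` and `toy_labels` are hypothetical names for illustration, and in the notebook the same check could presumably be run on `essay["generated"]` (for example with `essay["generated"].value_counts(normalize=True)`).

```python
from collections import Counter

def class_shares(labels):
    """Return the fraction of samples belonging to each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in sorted(counts.items())}

# Hypothetical toy labels; in the notebook this would be essay["generated"]
toy_labels = [0, 0, 0, 1, 1]
shares = class_shares(toy_labels)
print(shares)
```

If one class dominates, a stratified split and per-class metrics become more informative than plain accuracy.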

### Data Preprocessing


In [11]:
# Creating a function to remove punctuation from text data

import string
def remove_punctuations(text):
    translator = str.maketrans('', '', string.punctuation)
    return text.translate(translator)

In [12]:
essay["text"] = essay["text"].apply(remove_punctuations)

In [13]:
# Checking text for punctuations
essay["text"][0]

'Carfree cities have become a subject of increasing interest and debate in recent years as urban areas around the world grapple with the challenges of congestion pollution and limited resources The concept of a carfree city involves creating urban environments where private automobiles are either significantly restricted or completely banned with a focus on alternative transportation methods and sustainable urban planning This essay explores the benefits challenges and potential solutions associated with the idea of carfree cities  Benefits of CarFree Cities  Environmental Sustainability Carfree cities promote environmental sustainability by reducing air pollution and greenhouse gas emissions Fewer cars on the road mean cleaner air and a significant decrease in the contribution to global warming  Improved Public Health A reduction in automobile usage can lead to better public health outcomes Fewer cars on the road result in fewer accidents and a safer urban environment for pedestrians 
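Note that deleting punctuation outright merges hyphenated words, which is why "Car-free" appears as "Carfree" in the output above. The sketch below reproduces that behaviour on a short sample and contrasts it with a variant that replaces punctuation with spaces instead. `punctuation_to_spaces` is a hypothetical alternative for illustration, not something the notebook uses.

```python
import string

def remove_punctuations(text):
    # Same approach as the notebook: delete every punctuation character
    return text.translate(str.maketrans("", "", string.punctuation))

def punctuation_to_spaces(text):
    # Hypothetical variant: map each punctuation character to a space,
    # then collapse the resulting runs of whitespace
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    return " ".join(text.translate(table).split())

sample = "Car-free cities, a concept gaining traction!"
stripped = remove_punctuations(sample)
spaced = punctuation_to_spaces(sample)
print(stripped)
print(spaced)
```

Which variant is preferable depends on whether tokens such as "carfree" should be treated as a single vocabulary entry or as two separate words.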

### Vectorizing text

In [25]:
# Splitting data into features & labels
x = essay["text"]
y = essay["generated"]

len(x), len(y)

(29145, 29145)

In [26]:
# Splitting data into training & test sets

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x,
                                                    y,
                                                    test_size=0.2)

In [31]:
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer()
xv_train = vectorizer.fit_transform(x_train)
xv_test = vectorizer.transform(x_test)

In [32]:
xv_train.shape, xv_test.shape

((23316, 75353), (5829, 75353))
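Both matrices have the same number of columns because the vocabulary is learned from the training texts only and then reused for the test texts. A small sketch on a toy corpus illustrates this; the documents and the names `toy_vectorizer`, `train_docs`, and `test_docs` are made up for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus standing in for x_train and x_test
train_docs = ["cars pollute the city", "bikes keep the city clean"]
test_docs = ["trams keep the city quiet"]

toy_vectorizer = TfidfVectorizer()
train_matrix = toy_vectorizer.fit_transform(train_docs)  # learns the vocabulary
test_matrix = toy_vectorizer.transform(test_docs)        # reuses it, no refit

vocabulary = sorted(toy_vectorizer.vocabulary_)
print(vocabulary)
print(train_matrix.shape, test_matrix.shape)
```

Words that occur only in the test text ("trams", "quiet") have no column and are silently ignored, so the test row has non-zero entries only for "keep", "the", and "city".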

### Fitting a Logistic Regression model

In [33]:
from sklearn.linear_model import LogisticRegression

lr = LogisticRegression()
lr.fit(xv_train,y_train)
lr.score(xv_test, y_test)

0.9849030708526334
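The split above is unseeded, so the exact score will vary slightly between runs. As an optional alternative (not the notebook's own approach), the vectorizer and classifier can be bundled in a scikit-learn pipeline and the split made reproducible and stratified. The sketch below assumes a tiny made-up corpus; `texts`, `labels`, and `model` are hypothetical names, and `random_state=42` and `max_iter=1000` are arbitrary choices.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical toy data; the notebook would use essay["text"] and essay["generated"]
texts = [
    "i think cars are loud and my street is noisy",
    "my friend and i walked to school yesterday",
    "we should ride bikes because it is fun",
    "i dont like traffic near my house",
    "in conclusion sustainable urban planning promotes environmental resilience",
    "furthermore sustainable transportation fosters environmental wellbeing",
    "moreover urban planning initiatives promote sustainable outcomes",
    "additionally environmental policy fosters sustainable urban growth",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# stratify keeps the class ratio equal in both splits; random_state fixes the shuffle
x_tr, x_te, y_tr, y_te = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

# The pipeline fits TF-IDF on the training texts only, then fits the classifier
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(x_tr, y_tr)

predictions = model.predict(x_te)
accuracy = model.score(x_te, y_te)
print(predictions, accuracy)
```

A pipeline also accepts raw strings at prediction time, which removes the need to call the vectorizer separately.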

### Evaluating the Model

In [34]:
from sklearn.metrics import classification_report

y_pred = lr.predict(xv_test)
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.98      0.99      0.99      3557
           1       0.99      0.97      0.98      2272

    accuracy                           0.98      5829
   macro avg       0.99      0.98      0.98      5829
weighted avg       0.98      0.98      0.98      5829
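To make the columns of the report concrete, the sketch below computes precision, recall, and F1 for one class directly from true and predicted labels. It uses made-up toy labels and a hypothetical helper `binary_metrics`; it illustrates the standard definitions and is not scikit-learn's implementation.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the class given by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

# Hypothetical toy labels: 1 = AI generated, 0 = human written
toy_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 1]

metrics = binary_metrics(toy_true, toy_pred)
print(metrics)
```

Here three of the four AI-generated essays are caught (recall 0.75) and three of the four essays flagged as AI-generated really are (precision 0.75).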



#### Creating a function to manually test model predictions

In [39]:
def output_label(n):
    if n == 0:
        return "Text is Human Written"
    elif n == 1:
        return "Text is AI generated"

def manual_testing(essay_text):
    # Apply the same punctuation removal that was used on the training data
    cleaned = remove_punctuations(essay_text)
    # Vectorize with the already-fitted TF-IDF vectorizer (transform only, no refit)
    new_xv_test = vectorizer.transform([cleaned])
    pred_LR = lr.predict(new_xv_test)

    print("\n\nLR Prediction: {}".format(output_label(pred_LR[0])))

In [40]:
text = input()
manual_testing(text)

Carfree cities have become a subject of increasing interest and debate in recent years as urban areas around the world grapple with the challenges of congestion pollution and limited resources The concept of a carfree city involves creating urban environments where private automobiles are either significantly restricted or completely banned with a focus on alternative transportation methods and sustainable urban planning This essay explores the benefits challenges and potential solutions associated with the idea of carfree cities  Benefits of CarFree Cities  Environmental Sustainability Carfree cities promote environmental sustainability by reducing air pollution and greenhouse gas emissions Fewer cars on the road mean cleaner air and a significant decrease in the contribution to global warming  Improved Public Health A reduction in automobile usage can lead to better public health outcomes Fewer cars on the road result in fewer accidents and a safer urban environment for pedestrians a