<a href="https://colab.research.google.com/github/Luciesprogram/ai-and-human-text-recognition/blob/main/AI_vs_Human.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import pandas as pd
import string
import nltk
from nltk.corpus import stopwords
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

In [None]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("navjotkaushal/human-vs-ai-generated-essays")

print("Path to dataset files:", path)

Downloading from https://www.kaggle.com/api/v1/datasets/download/navjotkaushal/human-vs-ai-generated-essays?dataset_version_number=1...


100%|██████████| 1.40M/1.40M [00:00<00:00, 113MB/s]

Extracting files...
Path to dataset files: /root/.cache/kagglehub/datasets/navjotkaushal/human-vs-ai-generated-essays/versions/1





In [None]:
df = pd.read_csv(f"{path}/balanced_ai_human_prompts.csv")

In [None]:
df

| | text | generated |
|---|---|---|
| 0 | Machine learning, a subset of artificial intel... | 1 |
| 1 | A decision tree, a prominent machine learning ... | 1 |
| 2 | Education, a cornerstone of societal progress,... | 1 |
| 3 | Computers, the backbone of modern technology, ... | 1 |
| 4 | Chess, a timeless game of strategy and intelle... | 1 |
| ... | ... | ... |
| 2745 | Generate a detailed summary of global healthca... | 1 |
| 2746 | Compose an in-depth exploration of financial t... | 1 |
| 2747 | Generate a detailed summary of autonomous vehi... | 1 |
| 2748 | Develop a persuasive argument about internet o... | 1 |


In [None]:
import nltk
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


True

In [None]:
stop_words = set(stopwords.words('english'))

def preprocess_text(text):
    # lowercase
    text = text.lower()
    # remove punctuation
    text = text.translate(str.maketrans('', '', string.punctuation))
    # remove stopwords
    words = [word for word in text.split() if word not in stop_words]
    return " ".join(words)

# Apply preprocessing
df['clean_text'] = df['text'].apply(preprocess_text)
df.head()
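To see what `preprocess_text` does to a single string, here is a self-contained sketch of the same three steps. It assumes a tiny hand-picked stop-word set (`STOP_WORDS`, hypothetical) in place of NLTK's English list so it runs without downloading corpora. The real list is much longer, so real outputs will drop more words.

```python
import string

# Assumption: a tiny illustrative stop-word set standing in for NLTK's English
# list, so the sketch runs without downloading any corpora.
STOP_WORDS = {"a", "are", "they", "us", "from", "to", "the"}


def preprocess_text(text, stop_words=STOP_WORDS):
    """Same steps as the cell above: lowercase, strip ASCII punctuation, drop stop words."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(word for word in text.split() if word not in stop_words)


print(preprocess_text("Cars are a wonderful thing. They get us from point A to point B!"))
# prints: cars wonderful thing get point point b
print(preprocess_text("Isn't it?"))
# prints: isnt it
```

Note that apostrophes are deleted rather than split, so "isn't" becomes "isnt", and `string.punctuation` only covers ASCII punctuation, so curly quotes survive the cleaning step.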


| | text | generated | clean_text |
|---|---|---|---|
| 0 | Machine learning, a subset of artificial intel... | 1 | machine learning subset artificial intelligenc... |
| 1 | A decision tree, a prominent machine learning ... | 1 | decision tree prominent machine learning algor... |
| 2 | Education, a cornerstone of societal progress,... | 1 | education cornerstone societal progress extend... |
| 3 | Computers, the backbone of modern technology, ... | 1 | computers backbone modern technology revolutio... |
| 4 | Chess, a timeless game of strategy and intelle... | 1 | chess timeless game strategy intellect transce... |


In [None]:
X = df['clean_text']
y = df['generated']

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
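This split is random, so the class proportions in train and test can differ slightly (the classification report further down shows 284 human and 266 AI texts in the test set). If exact proportions matter, `train_test_split` also accepts a `stratify` argument. A minimal sketch on toy data; the lists `X` and `y` here are hypothetical stand-ins for `df['clean_text']` and `df['generated']`:

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 50 human (0) and 50 AI (1) documents.
X = [f"doc {i}" for i in range(100)]
y = [0] * 50 + [1] * 50

# stratify=y keeps the 0/1 ratio the same in the train and test splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(len(y_test), sum(y_test))  # test size, and how many test labels are 1
```

With an exactly balanced input, the 20 test rows split evenly between the two classes.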

In [None]:
X_train

| | clean_text |
|---|---|
| 2323 | produce balanced review internet things contex... |
| 1904 | produce balanced review gene editing technolog... |
| 179 | dear senator concerning topic merits demerits ... |
| 2464 | write comprehensive essay explaining supply ch... |
| 2210 | develop persuasive argument blockchain applica... |
| ... | ... |
| 1638 | compile list key insights internet things cont... |
| 1095 | people riding cars whole life never rode bus k... |
| 1130 | electoral college america votes president vise... |
| 1294 | electoral college electoral college process pl... |


## Vectorization

In [None]:
vectorizer = TfidfVectorizer()
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)
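The key point of this cell is that the vocabulary and IDF weights are learned from the training texts only (`fit_transform`) and then reused for the test texts (`transform`), so words never seen in training are ignored. A toy sketch (the sentences are made up) that also checks scikit-learn's default smoothed IDF, idf(t) = ln((1 + n) / (1 + df(t))) + 1:

```python
import math

from sklearn.feature_extraction.text import TfidfVectorizer

train_docs = ["cars cause stress", "cars cause pollution", "trains reduce pollution"]
test_docs = ["bikes reduce stress"]  # "bikes" never appears in the training text

vectorizer = TfidfVectorizer()  # defaults: lowercase=True, smooth_idf=True, norm="l2"
X_train_tfidf = vectorizer.fit_transform(train_docs)  # learns vocabulary + IDF from train only
X_test_tfidf = vectorizer.transform(test_docs)        # reuses them; unseen words are dropped

print(sorted(vectorizer.vocabulary_))
# prints: ['cars', 'cause', 'pollution', 'reduce', 'stress', 'trains']

# Smoothed IDF by hand: "cars" occurs in 2 of the n = 3 training documents.
n = len(train_docs)
idf_cars = math.log((1 + n) / (1 + 2)) + 1
print(round(idf_cars, 4), round(vectorizer.idf_[vectorizer.vocabulary_["cars"]], 4))
```

The test row has only two non-zero entries ("reduce" and "stress"), and like every row it is L2-normalised.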

## Model Training

In [None]:
model = LogisticRegression()
model.fit(X_train_tfidf, y_train)

## Evaluation

Besides accuracy, the model is scored with precision, recall, and F1 score on the held-out test set.

In [None]:
y_pred = model.predict(X_test_tfidf)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))

Accuracy: 0.9927272727272727

Classification Report:
               precision    recall  f1-score   support

           0       0.99      0.99      0.99       284
           1       0.99      0.99      0.99       266

    accuracy                           0.99       550
   macro avg       0.99      0.99      0.99       550
weighted avg       0.99      0.99      0.99       550


Confusion Matrix:
 [[282   2]
 [  2 264]]
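The per-class numbers in the report follow directly from the confusion matrix (rows are true labels, columns are predicted labels; 0 = human, 1 = AI). A quick hand check using only the matrix printed above:

```python
# Confusion matrix copied from the output above: rows = true class, columns = predicted class.
cm = [[282, 2],   # true 0 (human)
      [2, 264]]   # true 1 (AI)


def per_class_metrics(cm, k):
    """Precision, recall and F1 for class k from a square confusion matrix."""
    tp = cm[k][k]
    fp = sum(cm[i][k] for i in range(len(cm))) - tp  # predicted k, actually another class
    fn = sum(cm[k]) - tp                             # actually k, predicted another class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


accuracy = (cm[0][0] + cm[1][1]) / sum(map(sum, cm))  # 546 correct out of 550
print(f"accuracy = {accuracy:.4f}")
for k, name in [(0, "human"), (1, "AI")]:
    p, r, f = per_class_metrics(cm, k)
    print(f"class {k} ({name}): precision={p:.4f} recall={r:.4f} f1={f:.4f}")
```

Each class has exactly two errors in each direction, so precision and recall coincide (282/284 for human, 264/266 for AI) and both round to the 0.99 shown in the report.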


# **Checking the model's prediction on a sample text**

In [None]:
sample = ''' Cars are a wonderful thing. They are perhaps one of the worlds greatest advancements and technologies. Cars get us from point a to point i.
That is exactly what we want isnt it? We as humans want to get from one place to anther as fast as possiile. Cars are a suitaile to do that. They get us
across the city in a matter of minutes. Much faster than anyhting else we have. A train isnt going to get me across the city as fast as my car is and
neither is a puilic ius, iut those other forms of transportation just might ie the way to go. Don't get me wrong, cars are an aisolutly amazing thing iut,
mayie they just cause way to much stress, and mayie they hurt our environment in ways that we don't think they will. With a ius or a train you do not
have to worry aiout washing your car or getting frustrated when stuck in a iad traffic jam on I4. Also there is not as much pollution in air hurting
our environment. You might not think so, iut there are many advantages to limiting our car usage. One advantage that not only humans would ienefit
from, iut also plants and animals is that there would ie a lot less pollution in the air hurting out environment. Right now our cars give off gases that
are extremely harmful towards our environment. These gases are called green house gases and come out of the exhaust pipes in our cars. Your car alone
docent give off much gas iut collectively, our cars give off enormous amounts of gases. This is especially true in iig cities like France. In France, their
pollution level was so high it was record ireaking. due to that france decided to enforce a partial ian on cars. This is descriied in the second article "
Paris ians driving due to smog", iy Roiert Duffer, " On Monday motorists with evennumiered license plates were orderd to leave their cars at home or suffer
a 22euro fine 31. The same would apply to oddnumiered plates the following day." After France limited driving there congestion was down iy 60 percent. "
Congestion was down 60 percent in the capital of France". So after five days of intense smog, 60 percent of it was clear after not using cars for only a
little while. Even across the world in Bogota, columiia they are limiting driving and reducing smog levels. In the third article "carfree day is spinning
into a iig hit in Bogota", iy Andrew Selsky, it descriies the annual carfree day they have to reduce smog. " the goal is to promote alternative transportation
and reduce smog". So all over the world people are relizing that without cars, we are insuring the safety and well ieing of our environment. The second
advantage that would come with limiting car use is less stress. Everyone knows that driving a car causes emence amounts of stress. Getting caught in
traffic is a major cause of stress in someones life. having to repeating wash your car just to get it dirt again causes stress. Having people in the
iack of your car screaming and yelling all while music is ilasting, causes stress. So oiviously driving causes stress. If we were to limit our car
usage we would not ie as stressed as we usually are. There would ie no traffic, no car washes and no one screaming in a small confineded space. In
the first article " In German Suiuri, life goes on without cars", iy Elisaieth Rosenthal, a citizen named humdrum Walter, states " When i had a
car i was always tense. I'm much happier this way". So with out the stress of a car humdrum Walter is a looser and happier person, less stress equals
happier person. In the third article, " Carfree dai is spinning into a iig hit in Bogota", iy Andrew Selsky, it states " It's a good opportunity to
take away stress...". If we have the opportunity to take away stress, why not take it. It is a huge advantage in our lives to limit driving if it takes
away stress. No one wants stress, no one needs stress, and if we have an opportunity to take some of the stress away, take that opportunity. In conclusion,
 there are many advantages to limiting car use, one ieing theat we get to help the environment and two ieing that it helps reduce stress. Our environment is
 already screwed up in so many ways, if we can help it to iecome the healthy environment it once was, then do it. Stress is proven to impare your personal
 health, no one wants to ie unhealthy and no one wants stress in their life. If you want the environment to get ietter and you want to reduce stress in your
 life then take this advantage and impliment it. Some might not think that this is an advantage, iut i just explained that it is a clear advantege that has
 ieen proved to help the enviornment and reduce stress. Limiting car use is a very effective advantage that really does work in more than one place.'''

In [None]:
sample_tfidf = vectorizer.transform([sample])
prediction = model.predict(sample_tfidf)
print("Prediction:", "AI" if prediction[0]==1 else "Human")

Prediction: Human


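The model correctly labels this human-written essay as human. This is a useful sanity check, but a single correct prediction does not show that the model is trained perfectly; the test-set metrics above (about 99% accuracy on 550 held-out texts) are the stronger evidence.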
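The prediction cell passes the raw `sample` to `vectorizer.transform`, while the model was trained on `clean_text`. `TfidfVectorizer` lowercases by default and skips words outside its vocabulary, so the mismatch probably changes little here, but it is not identical: the default tokenizer turns "don't" into "don" (single-character tokens are dropped), whereas `preprocess_text` produces "dont". A helper that reuses the training preprocessing and also reports the predicted probability could look like this sketch. The toy corpus, `STOP_WORDS`, and `predict_text` are hypothetical; in the notebook you would reuse the real `preprocess_text`, `vectorizer`, and `model`.

```python
import string

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Assumption: tiny stand-in for NLTK's English stop-word list.
STOP_WORDS = {"a", "the", "is", "of", "and", "to"}


def preprocess_text(text):
    """Same cleaning as the notebook: lowercase, strip punctuation, drop stop words."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(w for w in text.split() if w not in STOP_WORDS)


# Hypothetical toy corpus; labels follow df['generated'] (1 = AI, 0 = human).
texts = [
    "Generate a detailed summary of the impact of technology",
    "Compose an in-depth exploration of financial technology",
    "i think cars are cool but they cause to much stress",
    "my bus was late again and i was so mad",
]
labels = [1, 1, 0, 0]

vectorizer = TfidfVectorizer()
model = LogisticRegression()
model.fit(vectorizer.fit_transform([preprocess_text(t) for t in texts]), labels)


def predict_text(text):
    """Hypothetical helper: clean text like the training data, return (label, P(AI))."""
    features = vectorizer.transform([preprocess_text(text)])
    label = "AI" if model.predict(features)[0] == 1 else "Human"
    # predict_proba columns follow model.classes_, so look up class 1 explicitly.
    p_ai = model.predict_proba(features)[0, list(model.classes_).index(1)]
    return label, float(p_ai)


print(predict_text("Generate a detailed summary of financial technology"))
print(predict_text("my bus was so late and i was mad"))
```

Returning the probability alongside the label is handy for a UI, where "AI, 0.55" and "AI, 0.99" mean quite different things.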

# Task
Create a UI using Flask and connect it to the trained model so that users can check whether a text is AI-generated or human-written.

## Explain Flask basics

### Subtask:
Briefly explain what Flask is and how it works.


**Reasoning**:
Explain what Flask is and its core concepts, then mention its advantages.



In [None]:
# Flask is a micro web framework for Python, meaning it provides the essential tools to build web applications
# without imposing a rigid structure or requiring specific libraries.

# Core Concepts:
# 1. Routing: Flask uses routes to map URLs to Python functions. When a user visits a specific URL, Flask executes the
#    corresponding function and returns the result as the response.
# 2. Requests: Flask handles incoming requests from clients (like web browsers), allowing you to access data sent
#    in the request, such as form data or URL parameters.
# 3. Templates: Flask supports using template engines (like Jinja2) to generate dynamic HTML pages. You can pass data
#    from your Python code to templates to create personalized responses.

# Advantages:
# - Lightweight: Flask has a small core and is easy to learn and use.
# - Flexible: It doesn't make many assumptions about your project, allowing you to choose the tools and libraries you prefer.
# - Extensible: There are many Flask extensions available that add functionality like database integration, authentication, etc.
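The three core concepts listed above (routing, requests, templates) fit in a few lines. A minimal sketch, assuming Flask is installed; the route, the `name` query parameter, and the inline template are illustrative only:

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Templates: an inline Jinja2 template for brevity; a real app would usually keep
# HTML files in a templates/ folder and call render_template instead.
PAGE = "<h1>Hello, {{ name }}!</h1>"


@app.route("/")  # Routing: the URL "/" is mapped to index()
def index():
    # Requests: read the optional ?name=... query parameter, defaulting to "World".
    name = request.args.get("name", "World")
    return render_template_string(PAGE, name=name)


# Call the route in-process with Flask's test client; no server is started.
with app.test_client() as client:
    print(client.get("/", query_string={"name": "Flask"}).get_data(as_text=True))
    # prints: <h1>Hello, Flask!</h1>
```

`render_template_string` autoescapes its variables, so user input such as `<b>` is rendered as `&lt;b&gt;` instead of being interpreted as HTML.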

## Create a Flask application

### Subtask:
Generate code for a basic Flask app that can serve an HTML page.


**Reasoning**:
Generate the code for a basic Flask app that imports Flask, creates an app instance, defines a route for the root URL, and runs the app.



In [None]:
from flask import Flask

app = Flask(__name__)

@app.route('/')
def index():
    return "<h1>Hello, World!</h1>"

if __name__ == '__main__':
    app.run(debug=True)

 * Serving Flask app '__main__'
 * Debug mode: on


 * Running on http://127.0.0.1:5000
INFO:werkzeug:[33mPress CTRL+C to quit[0m
INFO:werkzeug: * Restarting with watchdog (inotify)
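Two practical notes for running this in Colab. `app.run` blocks the cell, and `debug=True` turns on the reloader (the "Restarting with watchdog" line), which restarts the Python process and generally does not work well inside a notebook kernel; passing `use_reloader=False` is the usual workaround. Also, `127.0.0.1:5000` is an address on the Colab VM, not on your own machine. Toward the task of connecting the UI to the model, here is a minimal sketch of a form that posts text and shows the prediction, exercised with Flask's test client instead of a live server. The four toy texts standing in for the notebook's trained `vectorizer` and `model`, and the template, are hypothetical:

```python
from flask import Flask, render_template_string, request
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-ins for the notebook's trained `vectorizer` and `model`
# (already-cleaned toy texts; 1 = AI, 0 = human, as in df['generated']).
texts = [
    "generate detailed summary global healthcare",
    "compose indepth exploration financial technology",
    "bus late again really mad",
    "cars cool cause much stress",
]
vectorizer = TfidfVectorizer()
model = LogisticRegression().fit(vectorizer.fit_transform(texts), [1, 1, 0, 0])

# Inline Jinja2 template: a textarea form plus the prediction, if there is one.
PAGE = """
<form method="post">
  <textarea name="text">{{ text }}</textarea>
  <button type="submit">Check</button>
</form>
{% if label %}<p>Prediction: {{ label }}</p>{% endif %}
"""

app = Flask(__name__)


@app.route("/", methods=["GET", "POST"])
def index():
    text = request.form.get("text", "")
    label = None
    if request.method == "POST" and text.strip():
        # In the notebook, wrap `text` in preprocess_text(...) to match training.
        prediction = model.predict(vectorizer.transform([text]))[0]
        label = "AI" if prediction == 1 else "Human"
    return render_template_string(PAGE, text=text, label=label)


# Exercise the form with the test client instead of app.run(), which would block the cell.
client = app.test_client()
response = client.post("/", data={"text": "generate detailed summary financial technology"})
print(response.get_data(as_text=True))
```

To serve the page for real from a script rather than testing it in-process, the test-client lines would be replaced by `app.run(debug=True, use_reloader=False)`.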
