<h1 align="center"><font color="#050B2B"><strong>Sarcasm Detection with Machine Learning</strong></font></h1>


Sarcasm is a form of communication, often humorous, in which the intended meaning is the opposite of the literal one and is usually signaled by tone. Detecting it draws on language skills and on an understanding of what the speaker has in mind.
Can machines learn to detect sarcasm? Yes! This project explores sarcasm detection with machine learning using Python.
Sarcasm is prevalent in language, from conversations to news and social media. Detecting it is a binary classification task in natural language processing: a model is trained to label a sentence as sarcastic or not sarcastic, here using a dataset of news headlines from Kaggle.

_________________________________________________________________________________________________________________________________________________________________

<div align="center">
    <a href="https://github.com/Saadat-Khalid">
        <img src="https://cdn.jsdelivr.net/npm/simple-icons@3.0.1/icons/github.svg" alt="GitHub Profile" width="50">
    </a>
    &nbsp;&nbsp;
    <a href="https://www.linkedin.com/in/saadatawan/">
        <img src="https://cdn.jsdelivr.net/npm/simple-icons@3.0.1/icons/linkedin.svg" alt="LinkedIn Profile" width="50">
    </a>
    &nbsp;&nbsp;
    <a href="https://www.facebook.com/Saadat.Khalid.Awan/">
        <img src="https://cdn.jsdelivr.net/npm/simple-icons@3.0.1/icons/facebook.svg" alt="Facebook Profile" width="50">
    </a>
    &nbsp;&nbsp;
    <a href="https://twitter.com/saadat_96">
        <img src="https://cdn.jsdelivr.net/npm/simple-icons@v3/icons/twitter.svg" alt="Twitter Profile" width="50">
    </a>
</div>

_________________________________________________________________________

## Step - 1: Importing Libraries

In [2]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer # Converts text into a numerical matrix of token counts
from sklearn.model_selection import train_test_split # Splits the data into training and test sets
from sklearn.naive_bayes import BernoulliNB # Naive Bayes variant for binary features, often used for text classification (spam, sentiment, sarcasm, topics)

## Step - 2: Loading the Dataset

To train a model to detect sarcasm, we first need data. Here we load the [dataset](https://raw.githubusercontent.com/amankharwal/Website-data/master/Sarcasm.json), which the code below expects to be saved locally as `Sarcasm.json`.

In [3]:
data = pd.read_json("Sarcasm.json", lines=True)
print(data.head())

                                        article_link  \
0  https://www.huffingtonpost.com/entry/versace-b...   
1  https://www.huffingtonpost.com/entry/roseanne-...   
2  https://local.theonion.com/mom-starting-to-fea...   
3  https://politics.theonion.com/boehner-just-wan...   
4  https://www.huffingtonpost.com/entry/jk-rowlin...   

                                            headline  is_sarcastic  
0  former versace store clerk sues over secret 'b...             0  
1  the 'roseanne' revival catches up to our thorn...             0  
2  mom starting to fear son's web series closest ...             1  
3  boehner just wants wife to listen, not come up...             1  
4  j.k. rowling wishes snape happy birthday in th...             0  
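
The `lines=True` argument matters because the file is in JSON Lines form: every line is its own JSON object rather than one large array. As a minimal sketch of that behavior, the two records below are made up for illustration and are not taken from the dataset:

```python
import io

import pandas as pd

# Two made-up records in JSON Lines form: one JSON object per line.
sample = io.StringIO(
    '{"headline": "local man wins award for doing nothing", "is_sarcastic": 1}\n'
    '{"headline": "city council approves new budget", "is_sarcastic": 0}\n'
)

# lines=True tells pandas to parse each line as a separate JSON object.
sample_df = pd.read_json(sample, lines=True)

print(sample_df.shape)
print(sample_df["is_sarcastic"].tolist())
```

Each line becomes one row and each key becomes one column, which is how the real file ends up with the `article_link`, `headline` and `is_sarcastic` columns shown above.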


## Step - 3: Mapping the Values

The “is_sarcastic” column in this dataset contains the labels that we have to predict for the task of sarcasm detection.
It contains the binary values 1 and 0, where 1 means sarcastic and 0 means not sarcastic. For readability, I will map these values to “Sarcasm” and “Not Sarcasm” instead of 1 and 0.

In [4]:
data["is_sarcastic"] = data["is_sarcastic"].map({0: "Not Sarcasm", 1: "Sarcasm"})
print(data.head())

                                        article_link  \
0  https://www.huffingtonpost.com/entry/versace-b...   
1  https://www.huffingtonpost.com/entry/roseanne-...   
2  https://local.theonion.com/mom-starting-to-fea...   
3  https://politics.theonion.com/boehner-just-wan...   
4  https://www.huffingtonpost.com/entry/jk-rowlin...   

                                            headline is_sarcastic  
0  former versace store clerk sues over secret 'b...  Not Sarcasm  
1  the 'roseanne' revival catches up to our thorn...  Not Sarcasm  
2  mom starting to fear son's web series closest ...      Sarcasm  
3  boehner just wants wife to listen, not come up...      Sarcasm  
4  j.k. rowling wishes snape happy birthday in th...  Not Sarcasm  


## Step - 4: Separating & Splitting the Data
* Before building the model, we first separate the "Features" from the "Labels".
* After the separation, we split the data into "Training (80%) & Testing (20%)" sets.

In [5]:
data = data[["headline", "is_sarcastic"]]
x = np.array(data["headline"]) # Feature
y = np.array(data["is_sarcastic"]) # Label

cv = CountVectorizer() # Converts each headline into a vector of token counts
X = cv.fit_transform(x) # Learn the vocabulary and build the document-term matrix
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42) # Splitting the data (80% train, 20% test)
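
To make "converting text into numerical form" concrete, here is a small sketch of what `CountVectorizer` produces. The three headlines and the `toy_` names are made up for illustration; only the default settings of `CountVectorizer` are assumed.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Three made-up headlines, used only for illustration.
toy_headlines = [
    "great another monday",
    "monday meeting moved to tuesday",
    "great great news",
]

toy_cv = CountVectorizer()
toy_counts = toy_cv.fit_transform(toy_headlines)  # sparse document-term matrix

# Tokens listed in column order (vocabulary_ maps each token to its column index).
feature_names = sorted(toy_cv.vocabulary_, key=toy_cv.vocabulary_.get)
print(feature_names)

# One row per headline; each entry counts how often that token occurs.
print(toy_counts.toarray())
```

Every distinct token gets its own column, so the third headline has a 2 in the `great` column. The real matrix `X` has the same layout, with one row per headline in the dataset.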

## Step - 5: Building the Machine Learning Model
* Bernoulli Naive Bayes

Bernoulli Naive Bayes is a variant of the Naive Bayes algorithm that is often used for text classification. It is designed for binary features, such as whether a word is present in or absent from a headline, so by default scikit-learn's `BernoulliNB` reduces the counts produced by `CountVectorizer` to present/absent. Like every Naive Bayes model, it assumes that, given the class, the presence or absence of one feature is independent of the presence or absence of the others.
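
The present/absent behavior can be checked directly. The sketch below uses a made-up count matrix and relies on the `binarize` parameter of `BernoulliNB`, which defaults to `0.0`. It compares a model fitted on raw counts with one fitted on counts that were thresholded by hand.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Made-up word counts for four tiny documents (rows) over three tokens (columns).
counts = np.array([
    [3, 0, 1],
    [2, 0, 0],
    [0, 4, 1],
    [0, 1, 0],
])
labels = np.array(["Sarcasm", "Sarcasm", "Not Sarcasm", "Not Sarcasm"])

# With the default binarize=0.0, any count above 0 is treated as "present".
nb_on_counts = BernoulliNB().fit(counts, labels)

# Thresholding by hand and switching off the built-in step (binarize=None).
presence = (counts > 0).astype(int)
nb_on_presence = BernoulliNB(binarize=None).fit(presence, labels)

# Both models assign the same class probabilities to the training rows.
same = np.allclose(
    nb_on_counts.predict_proba(counts),
    nb_on_presence.predict_proba(presence),
)
print(same)
```

In other words, how many times a word appears in a headline does not matter to this model, only whether it appears at all.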

## Step - 6: Training The Model

In [6]:
model = BernoulliNB()
model.fit(X_train, y_train)
print(model.score(X_test, y_test)) # Accuracy on the test set

0.8448146761512542


The model's accuracy on the test set is 0.8448, which is about 84%.
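
In Step 4, `CountVectorizer` is fitted on every headline before the split, so its vocabulary also contains words that occur only in test headlines. A common alternative, sketched here on a handful of made-up headlines, is to split the raw text first and let a scikit-learn pipeline fit the vectorizer on the training portion only. The sketch also shows that `score` is plain accuracy. The 84% reported above comes from the original code, not from this variant.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Made-up headlines and labels, only to keep the sketch self-contained.
texts = np.array([
    "area man thrilled to sit in traffic again",
    "nation overjoyed by another monday",
    "local dog elected mayor to nobody's surprise",
    "man bravely finishes entire pizza alone",
    "city council approves new budget",
    "scientists publish study on sleep",
    "team wins championship after overtime",
    "company reports quarterly earnings",
])
labels = np.array(["Sarcasm"] * 4 + ["Not Sarcasm"] * 4)

# Split the raw text first, so the test headlines stay unseen.
text_train, text_test, label_train, label_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

# The pipeline fits CountVectorizer on the training text only.
pipeline = make_pipeline(CountVectorizer(), BernoulliNB())
pipeline.fit(text_train, label_train)

# score() returns accuracy: the fraction of test labels predicted correctly.
accuracy = pipeline.score(text_test, label_test)
manual_accuracy = np.mean(pipeline.predict(text_test) == label_test)
print(accuracy, manual_accuracy)
```

With only eight headlines the accuracy itself means little; the point is the order of operations, which on the full dataset would be the same calls applied to `x` and `y`.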

## Step - 7: Testing Model

In [7]:
# user = input("Enter a Text: ")
user = "I'm not lazy, I'm just on energy saving mode"
user_vector = cv.transform([user]) # Reuse the fitted vectorizer; a new name avoids overwriting the `data` DataFrame
output = model.predict(user_vector)
print(output)

['Sarcasm']


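For the input "I'm not lazy, I'm just on energy saving mode", the model predicts Sarcasm.

The model gives the expected label for this sentence, although a single example says less about its quality than the 84% test accuracy above.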
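
To try several sentences at once and see how confident the model is, the transform-then-predict steps can be wrapped in a small helper. This is a sketch under stated assumptions: `classify_sentences` is a hypothetical name, and the toy training data is made up so that the block runs on its own. With the notebook's objects, `cv` and `model` would be passed in instead.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB


def classify_sentences(sentences, vectorizer, classifier):
    """Return (sentence, predicted label, probability of that label) tuples."""
    features = vectorizer.transform(sentences)  # sparse input is accepted as-is
    probabilities = classifier.predict_proba(features)
    predictions = classifier.classes_[probabilities.argmax(axis=1)]
    return [
        (sentence, str(label), float(row.max()))
        for sentence, label, row in zip(sentences, predictions, probabilities)
    ]


# Made-up training data standing in for the real headlines.
toy_texts = [
    "nation overjoyed by another monday",
    "area man thrilled to sit in traffic again",
    "city council approves new budget",
    "company reports quarterly earnings",
]
toy_labels = ["Sarcasm", "Sarcasm", "Not Sarcasm", "Not Sarcasm"]

toy_cv = CountVectorizer()
toy_model = BernoulliNB().fit(toy_cv.fit_transform(toy_texts), toy_labels)

results = classify_sentences(
    ["man thrilled by monday traffic", "council reports new budget"],
    toy_cv,
    toy_model,
)
for sentence, label, confidence in results:
    print(f"{label:12s} {confidence:.2f}  {sentence}")
```

The probability column helps separate confident predictions from borderline ones, which a bare label hides.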

<h2 align="center"><font color="#050B2B">The End</font></h2>