# DSI Module 3 - Team Air Quotes
## Natural Language Processing (NLP)


<img src='https://miro.medium.com/max/960/0*xLRsbQ02J7sQpNNy'>

Image courtesy of ITNEXT

## Goal:
In this module, the purpose is to explore and apply NLP. After each team member had proposed at least one idea, we decided to build a sarcasm detector.

## Background
The Cambridge English dictionary defines sarcasm as "the use of remarks that clearly mean the opposite of what they say, made in order to hurt someone's feelings or to criticize something in a humorous way" [1]. The Merriam-Webster dictionary defines it as "a sharp and often satirical or ironic utterance designed to cut or give pain" [2]. Not everybody agrees with these definitions, but sarcasm usually involves positive words used to convey a negative message. How it is used and perceived differs from person to person and depends heavily on culture, gender and many other factors.

## Motivation:
Identifying sarcasm can be a challenge, especially for beginner learners of a language. Things can be lost in translation, and people can be hurt unintentionally. A sarcasm detector would help people recognise when something is sarcastic and not take it the wrong way. This might be especially applicable on social media such as Twitter and Facebook. In the future it could also help discriminate between harmful content and witty commentary.

## Dataset 


## Team members
Nmeso, Mekondjo, Lali, Akhil

## Conclusion

## Recommendations for future works

## References:
1. https://dictionary.cambridge.org/dictionary/english/sarcasm

2. https://www.merriam-webster.com/dictionary/sarcasm 

3. https://aclanthology.org/D13-1066.pdf

4. https://paperswithcode.com/task/sarcasm-detection

5. https://towardsdatascience.com/sarcasm-detection-with-nlp-cbff1723f69a



In [None]:
pip install gradio



In [None]:
pip install transformers



In [None]:
import gradio as gr
import re

from transformers import pipeline

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
!ls

drive  flagged	sample_data


In [None]:
# load the pickled Naive Bayes model from Google Drive
import pickle as cPickle
with open('/content/drive/MyDrive/M3/Naive_Bayes_model.sav', 'rb') as fid1:
    loaded_Naive_Bayes = cPickle.load(fid1)

https://scikit-learn.org/stable/modules/model_persistence.html#security-maintainability-limitations


In [None]:
with open('/content/drive/MyDrive/M3/SVC_model.sav', 'rb') as fid2:
    loaded_SVC = cPickle.load(fid2)

https://scikit-learn.org/stable/modules/model_persistence.html#security-maintainability-limitations
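The scikit-learn persistence notes linked above warn that unpickling a file can execute arbitrary code, so a model file should only be loaded if it comes from a trusted source. One possible safeguard, sketched below, is to record a SHA-256 digest when the model is saved and check it before loading. The helper names `sha256_of` and `load_verified` are hypothetical and not part of the notebook; the sketch assumes the expected digest is stored somewhere trustworthy.

```python
import hashlib
import pickle


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large model files do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_verified(path, expected_sha256):
    """Unpickle `path` only if its digest matches `expected_sha256`."""
    actual = sha256_of(path)
    if actual != expected_sha256:
        raise ValueError(f"digest mismatch for {path}: {actual}")
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

A digest check only detects that the file changed after it was saved; it does not make a file from an unknown author safe to load.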


In [None]:
# text cleaning
def clean_text(raw_text):
    # Each step operates on the output of the previous one.
    text = raw_text.encode("ascii", errors="ignore").decode("ascii")  # remove non-ascii, Chinese characters
    text = text.lower()  # lower case
    text = re.sub(r"http\S+", "", text)  # remove URLs while their punctuation is still intact
    text = re.sub(r"\'t", " not", text)  # Change 't to 'not'
    text = re.sub(r"(@.*?)[\s]", " ", text)  # Remove @name
    text = re.sub(r"$\d+\W+|\b\d+\b|\W+\d+$", " ", text)  # remove digits
    text = re.sub(r"[^\w\s\#]", "", text)  # remove special characters except hashtags
    text = re.sub(r"\s+", " ", text).strip()  # collapse newlines, tabs and repeated spaces into one space
    return text
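The order of the cleaning steps matters: URLs and @mentions are easiest to match while their punctuation is still present, so they should be removed before punctuation is stripped. A reduced sketch of that ordering follows, using a hypothetical helper `clean_tweet` (not the notebook's `clean_text`) and a made-up sample tweet.

```python
import re


def clean_tweet(raw_text):
    """Hypothetical, reduced cleaner that illustrates step ordering only."""
    text = raw_text.lower()
    text = re.sub(r"http\S+", "", text)    # URLs first, while "://" is intact
    text = re.sub(r"@\w+", "", text)       # then @mentions
    text = re.sub(r"[^\w\s#]", "", text)   # then punctuation, keeping hashtags
    return re.sub(r"\s+", " ", text).strip()  # finally collapse whitespace


print(clean_tweet("Oh GREAT, @bob... another Monday! https://t.co/xyz #blessed"))
# prints: oh great another monday #blessed
```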

In [None]:
sarcasm_NB = loaded_Naive_Bayes


# sarcasm detector
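def sarcasm_detector_NB(text):
    data = [clean_text(text)]  # predict takes a batch, so wrap the single text in a list
    my_prediction = sarcasm_NB.predict(data)[0]  # first (and only) prediction in the batch

    # class 1 is reported as "Positive", class 0 as "Negative"
    if my_prediction == 1:
        return "Positive"
    elif my_prediction == 0:
        return "Negative"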
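The detector above returns only a label. If a confidence score is also wanted, one option is to read class probabilities from the model. The sketch below assumes the loaded model exposes `predict_proba` and `classes_`, as scikit-learn Naive Bayes classifiers do; whether the pickled model does has not been verified here. `StubModel` and `label_and_score` are hypothetical names used purely for illustration, and the stub's probabilities are invented.

```python
class StubModel:
    """Hypothetical stand-in for the loaded classifier (illustration only)."""

    classes_ = [0, 1]

    def predict_proba(self, texts):
        # Invented rule: texts containing "great" get 0.8 for class 1.
        return [[0.2, 0.8] if "great" in t else [0.9, 0.1] for t in texts]


def label_and_score(model, text):
    """Return (label, probability) for the most likely class."""
    probs = list(model.predict_proba([text])[0])
    best = max(range(len(probs)), key=probs.__getitem__)
    label = "Positive" if model.classes_[best] == 1 else "Negative"
    return label, round(float(probs[best]), 5)


print(label_and_score(StubModel(), "oh great"))
# prints: ('Positive', 0.8)
```

A function shaped like this returns two values, which would line up with an interface that declares both a label output and a score output.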


In [None]:
# sentiment = pipeline("sentiment-analysis") 
# # you can swop out "sentiment-analysis" for other task identifiers such as "summarization" or "zero-shot-classification".

# # I've added optional lines for text cleaning
# # note that the sentiment-analysis pipeline returns 2 values - a label and a score
# def sentiment_analysis(text):
#     text = text.encode("ascii", errors="ignore").decode(
#         "ascii"
#     )  # remove non-ascii, Chinese characters
#     text = text.lower()  # lower case
#     text = re.sub(r"\n", " ", text)
#     text = re.sub(r"\n\n", " ", text)
#     text = re.sub(r"\t", " ", text)
#     text = text.strip(" ")
#     text = re.sub(r"[^\w\s]", "", text)  # remove punctuation and special characters
#     text = re.sub(
#         " +", " ", text
#     ).strip()  # get rid of multiple spaces and replace with a single
#     results = sentiment(text)
#     return results[0]["label"], round(results[0]["score"], 5)

In [None]:
# sarcasm_NB = loaded_Naive_Bayes

# def sarcasm_analysis_NB(text):
#     text = text.encode("ascii", errors="ignore").decode(
#         "ascii"
#     )  # remove non-ascii, Chinese characters
#     text = text.lower()  # lower case
#     text = re.sub(r"\n", " ", text)
#     text = re.sub(r"\n\n", " ", text)
#     text = re.sub(r"\t", " ", text)
#     text = text.strip(" ")
#     text = re.sub(r"[^\w\s]", "", text)  # remove punctuation and special characters
#     text = re.sub(
#         " +", " ", text
#     ).strip()  # get rid of multiple spaces and replace with a single
#     prediction = sarcasm_NB.predict(text)

#     if prediction == 1:
#         return "Positive"
#     elif prediction == 0:
#         return "Negative"

#     # class_names = ['positive', 'negative']
#     # return {class_names[i]: prediction[i] for i in range(2)}
#     #results = sarcasm_NB(text)
#     #print(results)
#     #return results[0]["label"], round(results[0]["score"], 5)

In [None]:
#sarcasm_analysis_NB('hello')

In [None]:
# sarcasm_SVC = loaded_SVC

# def sarcasm_analysis_SVC(text):
#     text = text.encode("ascii", errors="ignore").decode(
#         "ascii"
#     )  # remove non-ascii, Chinese characters
#     text = text.lower()  # lower case
#     text = re.sub(r"\n", " ", text)
#     text = re.sub(r"\n\n", " ", text)
#     text = re.sub(r"\t", " ", text)
#     text = text.strip(" ")
#     text = re.sub(r"[^\w\s]", "", text)  # remove punctuation and special characters
#     text = re.sub(
#         " +", " ", text
#     ).strip()  # get rid of multiple spaces and replace with a single
#     results = sarcasm_SVC(text)
#     print(results)
#     #return results[0]["label"], round(results[0]["score"], 5)

In [None]:
#sarcasm_analysis_SVC('hello')

In [None]:
# https://examples.yourdictionary.com/examples-of-sarcasm.html
#https://github.com/chuachinhon/gradio_nlp/blob/main/notebooks/1.0_gradio_sentiment.ipynb
samples=[['That\'s just what I needed today!'],
          ['Well, what a surprise.'],
          ['Very good; well done!'],
          ['I love the DSI!'],
          ['Are we done yet?'],
          ['Is it time for your medication or mine?']]

article = '''
<!DOCTYPE html>
<html>
<body>
<br>
<p>
DSI Module 3 - Team: Air Quotes
</p>
<p><i>Composed of: Nmeso, Mekondjo, Lali, and Akhil</i></p> 
</body>
</html>
'''

gradio_ui = gr.Interface(
    fn=sarcasm_detector_NB,
    title="Sarcasm Detector",
    description="Enter some text and see if the model can evaluate sarcasm correctly. <br> \
    Some sample texts can be selected below.",
    theme = 'huggingface',
    examples = samples,
    article = article,
    inputs=gr.inputs.Textbox(lines=10, label="Enter some text here:"),
    # sarcasm_detector_NB returns a single string, so declare a single output
    outputs=gr.outputs.Textbox(label="Sarcasm Label"),
)

In [None]:
# set gradio_ui.launch(share=True) if you need to share it outside of your local machine.
# The link works for 24 hours and as long as your notebook is running

gradio_ui.launch()

Colab notebook detected. To show errors in colab notebook, set `debug=True` in `launch()`
Running on public URL: https://44793.gradio.app

This share link expires in 72 hours. For free permanent hosting, check out Spaces (https://huggingface.co/spaces)


(<fastapi.applications.FastAPI at 0x7f4049783ed0>,
 'http://127.0.0.1:7863/',
 'https://44793.gradio.app')