<a href="https://colab.research.google.com/github/dixonhow8/sms_spam_detector/blob/main/gradio_sms_text_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install gradio

Collecting gradio
  Downloading gradio-5.0.2-py3-none-any.whl.metadata (15 kB)
Collecting aiofiles<24.0,>=22.0 (from gradio)
  Downloading aiofiles-23.2.1-py3-none-any.whl.metadata (9.7 kB)
Collecting fastapi<1.0 (from gradio)
  Downloading fastapi-0.115.2-py3-none-any.whl.metadata (27 kB)
Collecting ffmpy (from gradio)
  Downloading ffmpy-0.4.0-py3-none-any.whl.metadata (2.9 kB)
Collecting gradio-client==1.4.0 (from gradio)
  Downloading gradio_client-1.4.0-py3-none-any.whl.metadata (7.1 kB)
Collecting httpx>=0.24.1 (from gradio)
  Downloading httpx-0.27.2-py3-none-any.whl.metadata (7.1 kB)
Collecting huggingface-hub>=0.25.1 (from gradio)
  Downloading huggingface_hub-0.25.2-py3-none-any.whl.metadata (13 kB)
Collecting orjson~=3.0 (from gradio)
  Downloading orjson-3.10.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (50 kB)

In [2]:
# Import pandas
import pandas as pd
# Import the required dependencies from sklearn
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Set the column width to view the text message data.
pd.set_option('max_colwidth', 200)

# Import Gradio
import gradio as gr

In [3]:
def sms_classification(sms_text_df):
    """
    Perform SMS classification using a pipeline with TF-IDF vectorization and Linear Support Vector Classification.

    Parameters:
    - sms_text_df (pd.DataFrame): DataFrame containing 'text_message' and 'label' columns for SMS classification.

    Returns:
    - text_clf (Pipeline): Fitted pipeline model for SMS classification.
    """

    # Set the features variable to the text message column.
    X = sms_text_df['text_message']

    # Set the target variable to the "label" column.
    y = sms_text_df['label']

    # Split data into training and testing and set the test_size = 33%
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42, stratify=y)

    # Build a pipeline that converts the text to TF-IDF features and classifies them with a linear SVM.
    text_clf = Pipeline([
        ('tfidf', TfidfVectorizer()),  # TF-IDF Vectorization
        ('svc', LinearSVC())            # Linear Support Vector Classification
    ])

    # Fit the model to the transformed training data and return model.
    text_clf.fit(X_train, y_train)

    return text_clf


In [5]:
# Load the dataset into a DataFrame
sms_text_df = pd.read_csv('SMSSpamCollection.csv')

# Display the first few rows of the DataFrame to verify it loaded correctly
sms_text_df.head()


Unnamed: 0,label,text_message
0,ham,"Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat..."
1,ham,Ok lar... Joking wif u oni...
2,spam,Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive entry question(std txt rate)T&C's apply 08452810075over18's
3,ham,U dun say so early hor... U c already then say...
4,ham,"Nah I don't think he goes to usf, he lives around here though"
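The `Unnamed: 0` column in the preview above usually means the CSV was saved with its row index. A minimal sketch of how that column could be folded back into the index with `index_col=0`, using the five preview rows as inline data; the empty first header field is an assumption about how the file was written, and `raw_df` and `clean_df` are illustrative names only:

```python
import io

import pandas as pd

# The five preview rows, inlined so the example runs without the CSV file.
# Assumption: the file was written with its index, leaving the first header field empty.
csv_text = """,label,text_message
0,ham,"Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat..."
1,ham,Ok lar... Joking wif u oni...
2,spam,Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive entry question(std txt rate)T&C's apply 08452810075over18's
3,ham,U dun say so early hor... U c already then say...
4,ham,"Nah I don't think he goes to usf, he lives around here though"
"""

# Default read: pandas names the unlabeled first column "Unnamed: 0".
raw_df = pd.read_csv(io.StringIO(csv_text))

# Treat the first column as the row index instead of a data column.
clean_df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(raw_df.columns))
print(list(clean_df.columns))

# Class balance matters here because the train/test split is stratified on the label.
print(clean_df["label"].value_counts())
```

The classifier only reads the `label` and `text_message` columns, so the extra column is harmless; dropping it mainly keeps the preview tidy.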


In [6]:
# Call the sms_classification function with the DataFrame and set the result to the "text_clf" variable
text_clf = sms_classification(sms_text_df)
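`sms_classification` holds out 33% of the data but never scores the model on it. A minimal sketch of how the held-out split could be evaluated, assuming a small hypothetical toy dataset (`toy_df`) in place of `SMSSpamCollection.csv` and a separate `eval_clf` pipeline so the notebook's `text_clf` is left untouched:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data standing in for SMSSpamCollection.csv.
toy_df = pd.DataFrame({
    "label": ["ham", "spam"] * 6,
    "text_message": [
        "See you at lunch tomorrow",
        "Win a free prize now, text WIN to claim",
        "Can you pick up some milk on the way home",
        "Free entry to our weekly cash draw, reply now",
        "Running late, be there in ten minutes",
        "You have won free tickets, call to claim your prize",
        "Thanks for dinner last night",
        "Urgent! Claim your free cash reward today",
        "Did you finish the homework",
        "Text now to win a free holiday",
        "I will call you after the meeting",
        "Congratulations, you are a lucky winner of a free phone",
    ],
})

# Same split settings as sms_classification.
X_train, X_test, y_train, y_test = train_test_split(
    toy_df["text_message"], toy_df["label"],
    test_size=0.33, random_state=42, stratify=toy_df["label"],
)

# Same pipeline steps as sms_classification.
eval_clf = Pipeline([("tfidf", TfidfVectorizer()), ("svc", LinearSVC())])
eval_clf.fit(X_train, y_train)

# Score the fitted pipeline on messages it did not see during training.
test_accuracy = accuracy_score(y_test, eval_clf.predict(X_test))
print(f"Held-out accuracy: {test_accuracy:.2f}")
```

On the real dataset the same check could be done by returning `X_test` and `y_test` from `sms_classification` alongside the model; the score on twelve toy messages says nothing about real performance.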

In [7]:
# Create a function called `sms_prediction` that takes in the SMS text and predicts whether the text is "not spam" or "spam".
# The function should return the SMS message and state whether the text is "not spam" or "spam".

def sms_prediction(text, model):
    """
    Predict the spam/ham classification of a given text message using a pre-trained model.

    Parameters:
    - text (str): The text message to be classified.
    - model (Pipeline): The pre-trained pipeline model for classification.

    Returns:
    - str: A message indicating whether the text message is classified as spam or not.

    This function takes a text message and a pre-trained pipeline model, then predicts the
    spam/ham classification of the text. The result is a message stating whether the text is
    classified as spam or not.
    """
    # Create a variable that will hold the prediction of a new text.
    prediction = model.predict([text])[0]

    # Using a conditional statement to return the appropriate message
    if prediction == 'ham':
        return f'The text message: "{text}", is not spam.'
    else:
        return f'The text message: "{text}", is spam.'



In [8]:
# Create a sms_app that takes a textbox for the inputs and has a textbox for the output.
# Provide labels for each textbox.

import gradio as gr

# Wrap `sms_prediction` so the Gradio callback only needs the message text;
# the fitted `text_clf` pipeline from the earlier cell is passed in here.
def classify_message(text):
    return sms_prediction(text, text_clf)

# CSS styles for the interface
custom_css = """
  #component-0 {  /* Input textbox */
      background-color: black;
      color: white;
  }
  #component-1 {  /* Output textbox */
      background-color: #333333; /* Dark Gray for output */
      color: white;
  }
  button {  /* All buttons */
      margin-bottom: 10px;  /* Add some spacing */
      width: auto;  /* Set buttons to auto width */
  }
  .clear-button {
      background-color: darkgray; /* Clear button color */
      color: white;
  }
  .submit-button {
      background-color: orange; /* Submit button color */
      color: white;
  }
  .flag-button {
      background-color: darkgray; /* Flag button color */
      color: white;
  }
"""

with gr.Blocks(css=custom_css) as sms_app:
    gr.Markdown("# SMS Text Message Tester")
    gr.Markdown("Enter a text message to see what our app determines.")

    with gr.Row():  # Place input and output textboxes side by side
        message_input = gr.Textbox(label="What is the text message you want to test?", placeholder="Type your message here...")
        output_text = gr.Textbox(lines=10, label="Our App has determined:", interactive=False)  # Non-interactive output

    with gr.Row():  # Create a new row for buttons
        with gr.Column():  # Left side for Clear and Submit
            # elem_classes (not elem_id) is used so the `.clear-button` style class selectors above match
            clear_button = gr.Button("Clear", elem_classes="clear-button")  # Assign custom class
            submit_button = gr.Button("Submit", elem_classes="submit-button")  # Assign custom class
        with gr.Column():  # Right side for Flag button
            flag_button = gr.Button("Flag", elem_classes="flag-button")  # Assign custom class

    # Define the button actions
    submit_button.click(fn=classify_message, inputs=message_input, outputs=output_text)
    clear_button.click(fn=lambda: "", inputs=None, outputs=message_input)

# Launch the app
sms_app.launch(share=True)









Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://0ad1df0b354ff92bcb.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




## Test the following text messages.

---

1. You are a lucky winner of $5000!
2. You won 2 free tickets to the Super Bowl.
3. You won 2 free tickets to the Super Bowl text us to claim your prize.
4. Thanks for registering. Text 4343 to receive free updates on medicare.
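The four messages above can also be checked in one batch without opening the app. A minimal sketch, assuming a hypothetical four-message training set (`toy_texts`, `toy_labels`) and a stand-in pipeline (`demo_clf`) so the block runs on its own; in the notebook the fitted `text_clf` would take its place, and the predictions from this toy model are not meaningful:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy training data; the notebook trains on SMSSpamCollection.csv instead.
toy_texts = [
    "Are we still meeting for lunch today",
    "Call me when you get home",
    "Congratulations you won a free prize, text to claim",
    "Free entry to win cash, reply now",
]
toy_labels = ["ham", "ham", "spam", "spam"]

# Stand-in for the notebook's fitted `text_clf`.
demo_clf = Pipeline([("tfidf", TfidfVectorizer()), ("svc", LinearSVC())])
demo_clf.fit(toy_texts, toy_labels)

test_messages = [
    "You are a lucky winner of $5000!",
    "You won 2 free tickets to the Super Bowl.",
    "You won 2 free tickets to the Super Bowl text us to claim your prize.",
    "Thanks for registering. Text 4343 to receive free updates on medicare.",
]

# Same ham/spam branching as `sms_prediction`, applied to the whole batch.
predictions = demo_clf.predict(test_messages)
results = [
    f'The text message: "{message}", is not spam.' if label == "ham"
    else f'The text message: "{message}", is spam.'
    for message, label in zip(test_messages, predictions)
]

for line in results:
    print(line)
```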