<a href="https://colab.research.google.com/github/GrandWizard1102/NM_projects/blob/main/NLPTask.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Text classification**

In [None]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GRU, Dense
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.datasets import imdb
import gradio as gr

# Hyperparameters
VOCAB_SIZE = 10000  # Number of most frequent words to keep
MAX_LEN = 200       # Maximum sequence length
EMBEDDING_DIM = 64  # Word embedding dimension

# Load and preprocess data
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=VOCAB_SIZE)
X_train = pad_sequences(X_train, maxlen=MAX_LEN)
X_test = pad_sequences(X_test, maxlen=MAX_LEN)

# Build GRU model
model = Sequential([
    Embedding(VOCAB_SIZE, EMBEDDING_DIM),  # input_length is deprecated in Keras 3 and not needed
    GRU(64, return_sequences=False),
    Dense(64, activation='relu'),
    Dense(1, activation='sigmoid')  # Binary classification output
])

# Compile and train the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=32, epochs=5, validation_split=0.2)

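# imdb.load_data shifts every word index by 3 (0 = padding, 1 = start, 2 = unknown)
word_index = imdb.get_word_index()  # loaded once rather than on every request
INDEX_OFFSET = 3
START_INDEX, OOV_INDEX = 1, 2

# Function for prediction using the trained model
def predict_sentiment(text):
    # Preprocess input text: tokenize, map words to the indices used in training, and pad
    sequence = [START_INDEX]
    for word in text.lower().split():
        index = word_index.get(word, -1) + INDEX_OFFSET
        # Unknown words and words outside the VOCAB_SIZE most frequent become OOV
        sequence.append(index if INDEX_OFFSET <= index < VOCAB_SIZE else OOV_INDEX)
    padded_sequence = pad_sequences([sequence], maxlen=MAX_LEN)
    prediction = model.predict(padded_sequence)[0][0]
    sentiment = "Positive" if prediction > 0.5 else "Negative"

    # Return sentiment and the predicted probability of a positive review
    return sentiment, f"{prediction:.2f}"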

# Create Gradio Interface
interface = gr.Interface(
    fn=predict_sentiment,
    inputs=gr.Textbox(lines=2, placeholder="Enter a movie review..."),
    outputs=[gr.Label(label="Sentiment"), gr.Textbox(label="Confidence Score")],
    title="Movie Review Sentiment Classifier",
    description="Enter a movie review to predict whether it's Positive or Negative."
)

# Launch the Gradio app
interface.launch(debug=True)


Epoch 1/5
625/625 ━━━━━━━━━━━━━━━━━━━━ 9s 12ms/step - accuracy: 0.6919 - loss: 0.5442 - val_accuracy: 0.8628 - val_loss: 0.3257
Epoch 2/5
625/625 ━━━━━━━━━━━━━━━━━━━━ 8s 12ms/step - accuracy: 0.8978 - loss: 0.2588 - val_accuracy: 0.8722 - val_loss: 0.3164
Epoch 3/5
625/625 ━━━━━━━━━━━━━━━━━━━━ 10s 12ms/step - accuracy: 0.9419 - loss: 0.1604 - val_accuracy: 0.8732 - val_loss: 0.3345
Epoch 4/5
625/625 ━━━━━━━━━━━━━━━━━━━━ 10s 12ms/step - accuracy: 0.9661 - loss: 0.0973 - val_accuracy: 0.8496 - val_loss: 0.4247
Epoch 5/5
625/625 ━━━━━━━━━━━━━━━━━━━━ 10s 12ms/step - accuracy: 0.9794 - loss: 0.0640 - val_accuracy: 0.8698 - val_loss: 0.4294
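
The training log above suggests the model starts to overfit after the second epoch: validation loss bottoms out at 0.3164 and climbs to 0.4294 while training accuracy keeps rising. Below is a minimal sketch of patience-based early stopping, replayed on the validation losses from the log. The helper name `best_epoch_with_patience` and the patience value of 2 are illustrative assumptions, not part of the notebook. In Keras the same idea is available as the `tf.keras.callbacks.EarlyStopping` callback, which can be passed to `model.fit`.

```python
def best_epoch_with_patience(val_losses, patience=2):
    """Return (best_epoch, stop_epoch), both 1-based.

    Training stops once the validation loss has failed to improve
    for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses)

# Validation losses copied from the five-epoch training log
val_losses = [0.3257, 0.3164, 0.3345, 0.4247, 0.4294]
best_epoch, stop_epoch = best_epoch_with_patience(val_losses, patience=2)
print(f"best epoch: {best_epoch}, training would stop after epoch {stop_epoch}")
# prints: best epoch: 2, training would stop after epoch 4
```

With these numbers, the weights from epoch 2 would be the ones worth keeping; the later epochs mostly fit the training set more tightly.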

**Part-of-speech tagging**

In [None]:
import gradio as gr
import nltk

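# Ensure NLTK resources are downloaded
# (NLTK 3.9+ looks for the *_tab / *_eng variants; older releases use the original names)
nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger')
nltk.download('averaged_perceptron_tagger_eng')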

from nltk import word_tokenize, pos_tag

# Function for POS tagging
def pos_tagging(text):
    tokens = word_tokenize(text)  # Tokenize the input text
    tagged_words = pos_tag(tokens)  # Perform POS tagging
    result = "\n".join([f"{word}: {tag}" for word, tag in tagged_words])  # Format output
    return result

# Gradio Interface
interface = gr.Interface(
    fn=pos_tagging,
    inputs=gr.Textbox(lines=5, placeholder="Enter your text here..."),
    outputs=gr.Textbox(label="POS Tags"),
    title="Part-of-Speech Tagging",
    description="Enter a sentence to see its words tagged with their grammatical roles."
)

# Launch the interface
interface.launch(debug=True)
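
The tags returned by `pos_tag` are Penn Treebank abbreviations, which are hard to read without a legend. The sketch below attaches a plain-English description to each tag. The `TAG_DESCRIPTIONS` table covers only a handful of common tags and `describe_tags` is a hypothetical helper name, so treat this as a starting point rather than a complete legend.

```python
# Partial legend for Penn Treebank tags (assumption: only a few common tags are listed)
TAG_DESCRIPTIONS = {
    "DT": "determiner",
    "NN": "noun, singular",
    "NNS": "noun, plural",
    "NNP": "proper noun, singular",
    "VB": "verb, base form",
    "VBD": "verb, past tense",
    "VBZ": "verb, 3rd person singular present",
    "JJ": "adjective",
    "RB": "adverb",
    "IN": "preposition or subordinating conjunction",
    "PRP": "personal pronoun",
}

def describe_tags(tagged_words):
    """Format (word, tag) pairs as 'word: TAG (description)' lines."""
    lines = []
    for word, tag in tagged_words:
        description = TAG_DESCRIPTIONS.get(tag)
        if description:
            lines.append(f"{word}: {tag} ({description})")
        else:
            lines.append(f"{word}: {tag}")  # tag not in the legend: keep the raw abbreviation
    return "\n".join(lines)

# Hand-written pairs in the shape that nltk.pos_tag returns
sample = [("The", "DT"), ("dog", "NN"), ("barked", "VBD"), ("loudly", "RB"), (".", ".")]
print(describe_tags(sample))
```

Inside `pos_tagging`, the same helper could replace the `"\n".join(...)` line by calling `describe_tags(tagged_words)`.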




**Information retrieval**

In [None]:
import gradio as gr
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Sample document corpus
documents = [
    "The cat sat on the mat.",
    "Dogs are great companions.",
    "Cats are independent animals.",
    "The dog barked loudly.",
    "I love my pet cat.",
    "Pets bring joy and happiness.",
    "Kavi is a cat lover"
]

# Fit the TF-IDF vectorizer once on the corpus so that a query cannot change the IDF weights
vectorizer = TfidfVectorizer()
document_matrix = vectorizer.fit_transform(documents)

# Function for Information Retrieval
def retrieve_documents(query):
    # Project the query into the vocabulary learned from the corpus
    query_vector = vectorizer.transform([query])

    # Compute cosine similarity between the query and all documents
    cosine_similarities = cosine_similarity(query_vector, document_matrix)[0]

    # Get indices of documents sorted by similarity score, highest first
    similar_indices = cosine_similarities.argsort()[::-1]

    # Retrieve the top 3 documents, skipping those that share no terms with the query
    top_n = 3
    results = [documents[i] for i in similar_indices[:top_n] if cosine_similarities[i] > 0]

    return "\n".join(results) if results else "No relevant documents found."

# Gradio Interface
interface = gr.Interface(
    fn=retrieve_documents,
    inputs=gr.Textbox(lines=2, placeholder="Enter your query here..."),
    outputs=gr.Textbox(label="Top Relevant Documents"),
    title="Information Retrieval System",
    description="Enter a query to retrieve the most relevant documents from the corpus."
)

# Launch the Gradio app
interface.launch(debug=True)
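
To make the ranking less of a black box, here is a dependency-free sketch of the scoring that `TfidfVectorizer` and `cosine_similarity` perform with their default settings: smoothed IDF, `ln((1 + n) / (1 + df)) + 1`, followed by L2 normalisation. The helper names (`tokenize`, `build_idf`, `tfidf_vector`, `rank`) and the three-document corpus are illustrative assumptions, and the IDF here is computed from the corpus alone, without the query.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep tokens of two or more word characters
    return re.findall(r"\b\w\w+\b", text.lower())

def build_idf(corpus_tokens):
    n_docs = len(corpus_tokens)
    doc_freq = Counter(term for tokens in corpus_tokens for term in set(tokens))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    return {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}

def tfidf_vector(tokens, idf):
    counts = Counter(t for t in tokens if t in idf)  # out-of-vocabulary terms are ignored
    weights = {term: count * idf[term] for term, count in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {term: w / norm for term, w in weights.items()} if norm else {}

def cosine(a, b):
    # Both vectors are already L2-normalised, so the dot product is the cosine
    return sum(weight * b.get(term, 0.0) for term, weight in a.items())

def rank(query, corpus):
    corpus_tokens = [tokenize(doc) for doc in corpus]
    idf = build_idf(corpus_tokens)
    doc_vectors = [tfidf_vector(tokens, idf) for tokens in corpus_tokens]
    query_vector = tfidf_vector(tokenize(query), idf)
    scores = [(cosine(query_vector, vec), doc) for vec, doc in zip(doc_vectors, corpus)]
    return sorted(scores, key=lambda pair: pair[0], reverse=True)

corpus = ["The cat sat on the mat.", "Dogs are great companions.", "I love my pet cat."]
for score, doc in rank("pet cat", corpus):
    print(f"{score:.3f}  {doc}")
```

The document that contains both query terms ranks first, the one that contains only "cat" second, and the document with no shared terms scores zero.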



**Topic modeling**

In [None]:
import gradio as gr
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Function for Topic Modeling
def extract_topics(documents, num_topics):
    # Split the input documents by newlines, dropping blank lines
    doc_list = [line.strip() for line in documents.split("\n") if line.strip()]
    if not doc_list:
        return "Please enter at least one document."

    # Create a CountVectorizer to convert text into a bag-of-words representation
    vectorizer = CountVectorizer(stop_words='english')
    doc_term_matrix = vectorizer.fit_transform(doc_list)

    # Apply Latent Dirichlet Allocation (LDA) for topic modeling
    # (the slider value may arrive as a float, so cast it to int)
    lda = LatentDirichletAllocation(n_components=int(num_topics), random_state=42)
    lda.fit(doc_term_matrix)

    # Extract topics and their top words
    feature_names = vectorizer.get_feature_names_out()
    topics = []
    for topic_idx, topic in enumerate(lda.components_):
        top_words = [feature_names[i] for i in topic.argsort()[:-6:-1]]  # Top 5 words per topic
        topics.append(f"Topic {topic_idx + 1}: {', '.join(top_words)}")

    return "\n".join(topics)

# Gradio Interface
interface = gr.Interface(
    fn=extract_topics,
    inputs=[
        gr.Textbox(lines=10, placeholder="Enter one document per line..."),  # Input for documents
        gr.Slider(minimum=2, maximum=10, step=1, value=3, label="Number of Topics")  # Input for the number of topics
    ],
    outputs=gr.Textbox(label="Extracted Topics"),  # Output: List of topics with top words
    title="Topic Modeling with LDA",
    description="Enter a set of documents (one per line) and specify the number of topics to extract. The app will return the top words for each topic."
)

# Launch the Gradio app
interface.launch(debug=True)
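
The slice `topic.argsort()[:-6:-1]` in `extract_topics` is compact but easy to misread: `argsort` orders indices from the smallest to the largest weight, and `[:-6:-1]` walks backwards from the end, yielding the five largest. Below is a plain-Python sketch of the same selection. The helper name `top_word_indices`, the vocabulary, and the weights are invented for illustration.

```python
def top_word_indices(weights, n_top=5):
    """Indices of the n_top largest weights, largest first."""
    ascending = sorted(range(len(weights)), key=lambda i: weights[i])  # plays the role of argsort()
    return ascending[:-(n_top + 1):-1]  # walk back from the end, n_top steps

# Invented topic-word weights for a seven-word vocabulary
feature_names = ["cat", "dog", "pet", "joy", "mat", "bark", "love"]
topic_weights = [4.1, 0.3, 2.7, 0.9, 3.5, 0.2, 1.8]

top_words = [feature_names[i] for i in top_word_indices(topic_weights)]
print(top_words)  # ['cat', 'mat', 'pet', 'love', 'joy']
```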



**Text generation**

In [None]:
import gradio as gr
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load pre-trained model and tokenizer
model_name = 'gpt2'
model = GPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)

# Function to generate text
def generate_text(prompt):
    # Tokenize with an attention mask so generate() does not have to infer it
    inputs = tokenizer(prompt, return_tensors='pt')
    outputs = model.generate(
        **inputs, max_length=100, num_return_sequences=1, no_repeat_ngram_size=2,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 defines no pad token
    )
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return generated_text

# Gradio Interface for Text Generation
interface = gr.Interface(
    fn=generate_text,
    inputs=gr.Textbox(lines=2, placeholder="Enter your prompt here..."),
    outputs=gr.Textbox(label="Generated Text"),
    title="Text Generation with GPT-2",
    description="Enter a prompt to generate text using the GPT-2 model."
)

# Launch the Gradio app
interface.launch(debug=True)
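
`no_repeat_ngram_size=2` tells `generate` never to emit the same bigram twice. The sketch below shows the idea on plain token lists: given the tokens produced so far, it lists the next tokens that would complete an n-gram already seen. The function name `banned_next_tokens` is hypothetical, and this is a simplified illustration, not the library's implementation, which operates on batched tensors of token ids.

```python
def banned_next_tokens(tokens, ngram_size=2):
    """Tokens that would repeat an n-gram already present in `tokens`."""
    if ngram_size < 1 or len(tokens) < ngram_size:
        return set()
    # The last (ngram_size - 1) tokens form the prefix of the next n-gram
    prefix = tuple(tokens[len(tokens) - (ngram_size - 1):])
    banned = set()
    for start in range(len(tokens) - ngram_size + 1):
        ngram = tuple(tokens[start:start + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned

history = ["the", "cat", "sat", "on", "the"]
print(banned_next_tokens(history, ngram_size=2))  # {'cat'}
```

Because "the cat" already occurred, "cat" is ruled out as the next token after the second "the". During generation, such tokens would have their scores suppressed before the next token is chosen.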



In [None]:
%pip install sympy

