# Notebook Summary: Text Classification and Generation with Machine Learning

This notebook showcases various techniques for classifying and generating text using machine learning models. It includes implementations of text classification, data preprocessing, model training, saving and loading models, and generating descriptive text.

## Table of Contents

1. [Text Classification](#text-classification)
    - [Logistic Regression Classifier](#logistic-regression-classifier)
    - [Keras Model for Text Classification](#keras-model-for-text-classification)
    - [Evaluation and Prediction](#evaluation-and-prediction)
    - [Model Saving](#model-saving)
2. [Text Generation](#text-generation)
3. [Conclusion](#conclusion)

## Text Classification

### Logistic Regression Classifier

The first section implements a logistic regression classifier to differentiate between meaningful and gibberish text.

- **Libraries Used**: `pandas`, `sklearn`, `pickle`
- **Data Loading**: The combined dataset is loaded; it has two columns, `response` (the text) and `label`.
- **Data Preprocessing**: The data is split into features (X) and labels (y), followed by a train-test split.
- **Vectorization**: TF-IDF vectorization is applied to the text data.
- **Model Training**: A logistic regression model is trained on the vectorized training data.
- **Model Evaluation**: The model is evaluated on the test set, and the accuracy is printed.
- **Model Saving**: The trained model and vectorizer are saved using pickle.

### Keras Model for Text Classification

This section uses a Keras Sequential model for classifying text.

- **Libraries Used**: `numpy`, `tensorflow.keras`, `sklearn`
- **Data Preparation**: The dataset is prepared, and sequences are tokenized and padded.
- **Model Definition**: A Keras model is defined with several dense layers and dropout for regularization.
- **Model Compilation**: The model is compiled with Adam optimizer and binary cross-entropy loss.
- **Training**: The model is trained for a specified number of epochs.
- **Evaluation**: The model's predictions are evaluated using accuracy score and classification report.

### Evaluation and Prediction

In this part, new texts are classified using the trained Keras model.

- **Input Texts**: Example texts are defined for prediction.
- **Tokenization and Padding**: The new texts are tokenized and padded to match the model's input shape.
- **Predictions**: The model predicts whether the texts are meaningful or gibberish.

### Model Saving

The Keras model and tokenizer are saved for future use.

- **Keras Model**: Saved in HDF5 format.
- **Tokenizer**: Saved using pickle for loading later.

## Text Generation

This section utilizes the T5 model for generating text based on a descriptive prompt.

- **Libraries Used**: `transformers`
- **Model Loading**: The pre-trained T5 model and tokenizer are loaded.
- **Input Prompt**: A specific input prompt is defined to guide the generation process.
- **Text Generation**: The model generates text using sampling parameters like `top_k`, `top_p`, and `temperature`.
- **Output**: The generated text is decoded and printed.
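
To illustrate what the sampling parameters named above control, here is a minimal pure-Python sketch of temperature, `top_k`, and `top_p` filtering over a toy score list. The function `sample_next_token` and the scores are hypothetical, and this is a simplification for intuition, not how `transformers` implements generation internally.

```python
import math
import random


def sample_next_token(logits, top_k=50, top_p=0.95, temperature=1.0, rng=None):
    """Pick one token id from raw scores using temperature, top-k and top-p."""
    rng = rng or random.Random()

    # Temperature rescales the scores: values below 1 sharpen the distribution.
    scaled = [score / temperature for score in logits]

    # Softmax turns the scaled scores into probabilities.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # top_k keeps only the k most probable tokens.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    ranked = ranked[:top_k]

    # top_p keeps the smallest prefix whose cumulative probability reaches p.
    kept, cumulative = [], 0.0
    for token_id in ranked:
        kept.append(token_id)
        cumulative += probs[token_id]
        if cumulative >= top_p:
            break

    # Sample among the surviving tokens (random.choices renormalises the weights).
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a four-token vocabulary
token = sample_next_token(toy_logits, top_k=3, top_p=0.9, temperature=0.7,
                          rng=random.Random(0))
print(token)
```

Lower `temperature`, smaller `top_k`, or smaller `top_p` all narrow the set of candidate tokens, which makes the output more predictable; larger values allow more varied text.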

## Conclusion

This notebook walks through a workflow for text classification and generation: loading data, preprocessing text, training a logistic regression classifier and a Keras model, saving both for reuse, and generating text with T5. The techniques covered can be extended or modified for more advanced applications in natural language processing.



# Gibberish vs Meaningful Text Classifier

This notebook trains a **Logistic Regression model** to classify text prompts as either gibberish or meaningful. The dataset includes two CSV files:

1. **`non_sensical_combined.csv`** - Contains nonsensical or gibberish prompts.
2. **`prompts_collection.csv`** - Contains meaningful prompts.

### Steps:
1. **Data Loading**: Load the datasets and combine them into one DataFrame.
2. **Text Preprocessing**: Use **TF-IDF Vectorization** to transform the text into numerical features.
3. **Model Training**: Train a **Logistic Regression** model on the processed text data.
4. **Evaluation**: Evaluate the model using accuracy as a performance metric.
5. **Model Saving**: Save the trained model and vectorizer as `.pkl` files for later use.

The goal is to classify text prompts accurately and save the trained model for future predictions.


In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import pickle

# Load datasets
nonsensical_file = "/kaggle/input/non-sensical-and-meaningful-prompts-ai-art/non_sensical_combined.csv"
prompt_collection_file = "/kaggle/input/non-sensical-and-meaningful-prompts-ai-art/prompts_collection.csv"

df_nonsensical = pd.read_csv(nonsensical_file)
df_prompt_collection = pd.read_csv(prompt_collection_file)

# Combine the datasets
df_combined = pd.concat([df_nonsensical, df_prompt_collection])

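# Preprocessing: vectorize the 'response' column.
# Note: the vectorizer is fitted on the full dataset before the split, so the
# test rows influence the vocabulary and IDF weights; later cells fit on the
# training split only.
vectorizer = TfidfVectorizer(max_features=5000)
X = vectorizer.fit_transform(df_combined['response'].values.astype('U'))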

# Target (labels)
y = df_combined['label']

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a Logistic Regression model (the default max_iter=100 leads to the
# convergence warning shown in the output below)
model = LogisticRegression()
model.fit(X_train, y_train)

# Evaluate the model
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy * 100:.2f}%")

# Save the model as a pickle file
with open('gibberish_classifier.pkl', 'wb') as f:
    pickle.dump(model, f)

# Save the vectorizer for later use
with open('vectorizer.pkl', 'wb') as f:
    pickle.dump(vectorizer, f)


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


Model Accuracy: 99.51%


# Text Classification Using Pretrained Model

This section demonstrates how to load a pretrained **Logistic Regression model** and **TF-IDF vectorizer** to classify new text as either gibberish or meaningful.

### Steps:
1. **Load Pretrained Model**: Load the saved **Logistic Regression model** and **TF-IDF vectorizer** from `.pkl` files.
2. **Text Classification Function**:
   - The `classify_text()` function takes a new text input, vectorizes it using the loaded TF-IDF vectorizer, and predicts the label using the loaded model.
   - The prediction is returned as either "Meaningful" or "Gibberish" based on the model's output.
3. **Example**: An example input (`"a car with its doors open in sunlight"`) is classified using the `classify_text()` function.

This script enables the use of the pretrained model to classify any arbitrary text prompt.


In [8]:
import pickle
from sklearn.feature_extraction.text import TfidfVectorizer

# Load the model and vectorizer
with open('/kaggle/working/gibberish_classifier.pkl', 'rb') as model_file:
    model = pickle.load(model_file)

with open('/kaggle/working/vectorizer.pkl', 'rb') as vec_file:
    vectorizer = pickle.load(vec_file)

# Function to classify a new text
def classify_text(text):
    # Vectorize the input text
    text_vector = vectorizer.transform([text])
    
    # Predict using the loaded model
    prediction = model.predict(text_vector)
    print(prediction)
    # Convert the prediction to a meaningful label
    label = "Meaningful" if prediction[0] == 1 else "Gibberish"
    
    return label

# Example usage
input_text = "a car with it's doors open in sunlight"
result = classify_text(input_text)
print(f"The input text is classified as: {result}")


[1]
The input text is classified as: Meaningful
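
The function above returns only a hard label. As a hedged sketch, the variant below also reports how confident the model is by using scikit-learn's `predict_proba`. The tiny corpus and the name `classify_with_confidence` are made up for illustration; in the notebook the loaded `model` and `vectorizer` would take their place.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny made-up corpus, for illustration only (1 = meaningful, 0 = gibberish).
texts = ["a car in sunlight", "a man with arms spread", "xqz wvk plm", "dfg hjk qwe"]
labels = [1, 1, 0, 0]

vectorizer = TfidfVectorizer()
model = LogisticRegression(max_iter=500)
model.fit(vectorizer.fit_transform(texts), labels)


def classify_with_confidence(text):
    """Return the predicted label together with the model's probability for it."""
    text_vector = vectorizer.transform([text])
    probabilities = model.predict_proba(text_vector)[0]
    # model.classes_ gives the label that each probability column refers to.
    best = probabilities.argmax()
    label = "Meaningful" if model.classes_[best] == 1 else "Gibberish"
    return label, float(probabilities[best])


label, confidence = classify_with_confidence("a car with its doors open")
print(f"{label} ({confidence:.2f})")
```

A probability close to 0.5 signals a borderline input, which a caller may want to treat differently from a confident prediction.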


# Training a Logistic Regression Model for Gibberish vs Meaningful Text Classification

This notebook trains a **Logistic Regression** model to classify text as either gibberish or meaningful using the combined dataset.

### Steps:
1. **Data Loading**: 
   - Load the **gibberish** dataset (assumed `label = 0`).
   - Load the **meaningful** dataset (assumed `label = 1`).
   - Combine the datasets into a single DataFrame for processing.
   
2. **Data Shuffling**: 
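   - Shuffle the combined dataset so that rows from the two source files are interleaved rather than grouped by class. (`train_test_split` also shuffles by default, so this step is a safeguard rather than a requirement.)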

3. **Text Preprocessing**: 
   - Split the combined dataset into input (`X`: the text) and target (`y`: the labels).
   - Use **TF-IDF Vectorization** to convert the text data into numerical features.

4. **Model Training**:
   - Split the dataset into training and testing sets using an 80-20 ratio.
   - Train a **Logistic Regression** model using the vectorized text data.

5. **Evaluation**: 
   - Transform the test data using the vectorizer fitted on the training split and evaluate the model's performance on unseen data.
   - Display the accuracy of the model.

This workflow demonstrates how to classify text prompts as gibberish or meaningful using machine learning techniques.


In [9]:
import pandas as pd

# Load the gibberish and meaningful datasets
gibberish_df = pd.read_csv(r'/kaggle/input/non-sensical-and-meaningful-prompts-ai-art/non_sensical_combined.csv')  # Assumed label = 0
meaningful_df = pd.read_csv(r'/kaggle/input/non-sensical-and-meaningful-prompts-ai-art/prompts_collection.csv')  # Assumed label = 1

# Combine the datasets
combined_df = pd.concat([gibberish_df, meaningful_df])

# Shuffle the data
combined_df = combined_df.sample(frac=1).reset_index(drop=True)

# Now use this combined_df for training
X = combined_df['response']
y = combined_df['label']

# Split the data for training and testing
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train your model
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Vectorize the text data
vectorizer = TfidfVectorizer()
X_train_vectorized = vectorizer.fit_transform(X_train)

# Train the logistic regression model
model = LogisticRegression(max_iter=500)
model.fit(X_train_vectorized, y_train)

# Evaluate on test data
X_test_vectorized = vectorizer.transform(X_test)
accuracy = model.score(X_test_vectorized, y_test)
print(f'Accuracy: {accuracy}')


Accuracy: 0.9936159881825684
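
Accuracy on its own can hide weak performance on the smaller class if the two datasets differ in size. As a minimal sketch, the hypothetical helper `binary_report` below computes precision, recall, and F1 by hand on made-up labels; scikit-learn's `classification_report` provides the same figures for real predictions.

```python
def binary_report(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 8 meaningful prompts (1) and 2 gibberish prompts (0).
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
# A classifier that always answers "meaningful" is 80% accurate here ...
y_pred = [1] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy: {accuracy:.2f}")
# ... yet it never detects gibberish, which the recall for class 0 exposes.
print(binary_report(y_true, y_pred, positive=0))
```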


# Saving the Trained Model and Vectorizer

This section shows how to save the trained **Logistic Regression model** and the **TF-IDF vectorizer** for future use.

### Steps:
1. **Save the Model**: 
   - Use the `pickle` library to serialize and save the trained **Logistic Regression model** into a `.pkl` file (`gibberish_classifier2.pkl`).
   
2. **Save the Vectorizer**: 
   - Similarly, save the trained **TF-IDF vectorizer** into a separate `.pkl` file (`vectorizer2.pkl`).
   
3. **Confirmation**: 
   - A success message is printed to confirm that the model and vectorizer have been successfully saved.

These saved files can later be loaded to classify new text without retraining the model.


In [10]:
import pickle

# Save the trained model
with open('gibberish_classifier2.pkl', 'wb') as model_file:
    pickle.dump(model, model_file)

# Save the vectorizer
with open('vectorizer2.pkl', 'wb') as vec_file:
    pickle.dump(vectorizer, vec_file)

print("Model and vectorizer saved!")


Model and vectorizer saved!
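
Saving the model and vectorizer as two files means they must always be loaded as a matching pair. One possible alternative, sketched here on a made-up corpus, is to wrap both in a scikit-learn pipeline via `make_pipeline` and pickle that single object. This is an illustration of the idea, not what the notebook does.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus, for illustration only (1 = meaningful, 0 = gibberish).
texts = ["a car in sunlight", "a man with arms spread", "xqz wvk plm", "dfg hjk qwe"]
labels = [1, 1, 0, 0]

# The pipeline vectorizes and classifies in one object.
pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=500))
pipeline.fit(texts, labels)

# Serialize to bytes here; pickle.dump(pipeline, file) would write to disk instead.
payload = pickle.dumps(pipeline)
restored = pickle.loads(payload)

# The restored pipeline accepts raw strings directly.
print(restored.predict(["a man in sunlight"]))
```

With this layout a caller cannot accidentally combine a model with a vectorizer from a different training run.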


# Loading and Using the Saved Model for Text Classification

This section demonstrates how to load a previously saved **Logistic Regression model** and **TF-IDF vectorizer** to classify new text inputs.

### Steps:
1. **Load the Model and Vectorizer**:
   - Use `pickle` to load the trained **Logistic Regression model** (`gibberish_classifier2.pkl`) and the **TF-IDF vectorizer** (`vectorizer2.pkl`) from disk.

2. **Text Classification Function**:
   - The `classify_text()` function takes a text input, vectorizes it using the loaded vectorizer, and predicts the class using the loaded model.
   - The prediction is converted to a meaningful label: "Meaningful" for class `1` and "Gibberish" for class `0`.

3. **Example Usage**:
   - An external text input (`"a man with his arms spread"`) is classified using the `classify_text()` function, and the result is printed.

This setup enables easy classification of new text prompts using the pretrained model and vectorizer.


In [11]:
import pickle

# Load the model and vectorizer
with open('/kaggle/working/gibberish_classifier2.pkl', 'rb') as model_file:
    model = pickle.load(model_file)

with open('/kaggle/working/vectorizer2.pkl', 'rb') as vec_file:
    vectorizer = pickle.load(vec_file)

# Function to classify a new text
def classify_text(text):
    # Vectorize the input text
    text_vector = vectorizer.transform([text])
    
    # Predict using the loaded model
    prediction = model.predict(text_vector)
    
    # Convert the prediction to a meaningful label
    label = "Meaningful" if prediction[0] == 1 else "Gibberish"
    
    return label

# Example usage for external text
input_text = "a man with his arms spread"
result = classify_text(input_text)
print(f"The input text is classified as: {result}")


The input text is classified as: Meaningful


# Combining and Saving the Gibberish and Meaningful Text Datasets

This section demonstrates how to combine two datasets (gibberish and meaningful prompts) and save the resulting dataset for further use.

### Steps:
1. **Load the Datasets**:
   - Load the **gibberish** dataset (assumed to have `label = 0`).
   - Load the **meaningful** dataset (assumed to have `label = 1`).
   
2. **Combine the Datasets**:
   - Concatenate both datasets into a single DataFrame.

3. **Save the Combined Dataset**:
   - Save the combined dataset as `Combined.csv` for future use.

This ensures both gibberish and meaningful prompts are merged into one unified dataset, making it easier for training machine learning models.


In [12]:
import pandas as pd

# Load the gibberish and meaningful datasets
gibberish_df = pd.read_csv(r'/kaggle/input/non-sensical-and-meaningful-prompts-ai-art/non_sensical_combined.csv')  # Assuming label = 0
meaningful_df = pd.read_csv(r'/kaggle/input/non-sensical-and-meaningful-prompts-ai-art/prompts_collection.csv')  # Assuming label = 1

# Combine both datasets
combined_df = pd.concat([gibberish_df, meaningful_df])

# Save the combined dataset as Combined.csv
combined_df.to_csv('Combined.csv', index=False)

print("Combined dataset saved as Combined.csv")


Combined dataset saved as Combined.csv
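
The code comments above only assume that the gibberish file carries `label = 0` and the meaningful file `label = 1`. A quick check before combining can confirm this. The sketch below uses small made-up DataFrames in place of the two CSV files, and the missing-value handling is an assumption about what the real data might need.

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files.
gibberish_df = pd.DataFrame({"response": ["xqz wvk", "dfg hjk"], "label": [0, 0]})
meaningful_df = pd.DataFrame({"response": ["a car in sunlight", None], "label": [1, 1]})

# Confirm the assumed labelling before training on it.
assert set(gibberish_df["label"]) == {0}
assert set(meaningful_df["label"]) == {1}

# ignore_index gives the combined frame a clean 0..n-1 index.
combined_df = pd.concat([gibberish_df, meaningful_df], ignore_index=True)

# Drop rows whose text is missing, if any, then inspect the class balance.
combined_df = combined_df.dropna(subset=["response"])
print(combined_df["label"].value_counts())
```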


# Training and Saving a Logistic Regression Model for Text Classification

This section demonstrates the process of training a **Logistic Regression model** on a combined dataset of gibberish and meaningful text prompts, and saving the model along with the **TF-IDF vectorizer** for future use.

### Steps:
1. **Load the Combined Dataset**:
   - Load the previously saved `Combined.csv` file, which contains both gibberish and meaningful text prompts with corresponding labels.

2. **Data Splitting**:
   - Split the dataset into features (`X`: the text) and labels (`y`: the classification labels).
   - Further divide the data into training and testing sets (80% training, 20% testing).

3. **Text Vectorization**:
   - Use **TF-IDF Vectorization** to convert the text data into numerical features.

4. **Model Training**:
   - Train a **Logistic Regression** model using the vectorized training data.

5. **Model Evaluation**:
   - Transform the test data using the vectorizer and evaluate the model's performance by calculating its accuracy on the test set.

6. **Saving the Model and Vectorizer**:
   - Save the trained **Logistic Regression model** as `gibberish_classifier3.pkl`.
   - Save the **TF-IDF vectorizer** as `vectorizer3.pkl`.

This process results in a trained model and vectorizer that can be used for classifying new text prompts as either gibberish or meaningful.


In [14]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import pickle

# Load the combined dataset
combined_df = pd.read_csv('/kaggle/working/Combined.csv')

# Split the data into features (X) and labels (y)
X = combined_df['response']
y = combined_df['label']

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Vectorize the text data
vectorizer = TfidfVectorizer()
X_train_vectorized = vectorizer.fit_transform(X_train)

# Train the logistic regression model
model = LogisticRegression(max_iter=500)
model.fit(X_train_vectorized, y_train)

# Evaluate the model on the test set
X_test_vectorized = vectorizer.transform(X_test)
accuracy = model.score(X_test_vectorized, y_test)
print(f'Accuracy: {accuracy}')

# Save the model and vectorizer
with open('gibberish_classifier3.pkl', 'wb') as model_file:
    pickle.dump(model, model_file)

with open('vectorizer3.pkl', 'wb') as vec_file:
    pickle.dump(vectorizer, vec_file)

print("Model and vectorizer saved!")


Accuracy: 0.9935909126139228
Model and vectorizer saved!


# Text Classification: Gibberish vs Meaningful Using a Pretrained Keras Model

This code shows how to classify an input text as **gibberish** or **meaningful** using a **Keras deep learning model**. The saved tokenizer first converts the input text into a padded integer sequence, and the model then classifies that sequence.

### Key Components:
1. **Loading the Pretrained Keras Model**:
   - The Keras model is loaded from a saved `.h5` file, which was trained to distinguish between gibberish and meaningful text.

2. **Loading the Tokenizer**:
   - The tokenizer is loaded from a `tokenizer.pkl` file. This tokenizer is essential for converting the input text into sequences that the model can process.

3. **Text Classification Function**:
   - The `classify_text()` function accepts an input string, tokenizes and pads it to the required input length (`maxlen=100`), and then uses the model to predict whether the text is gibberish or meaningful.
   - Predictions are made on the tokenized text, and the result is either `Meaningful` (for a class of 1) or `Gibberish` (for a class of 0).

4. **User Interaction**:
   - The script accepts a text input from the user, processes it through the `classify_text()` function, and then outputs whether the text is **Meaningful** or **Gibberish**.

### Usage:
- This code can be used to classify any given text in real-time, making it useful for detecting non-sensical text in natural language processing tasks.



In [20]:
#1.0 classifier between normal text and gibberish text
import numpy as np
import pickle
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Load the trained Keras model
model = load_model(r'/kaggle/input/test-classifier/keras/default/1/trained_model.h5')

# Load the tokenizer
with open(r'/kaggle/input/test-classifier/keras/default/1/tokenizer.pkl', 'rb') as f:
    tokenizer = pickle.load(f)

# Function to classify new text
def classify_text(input_text):
    # Tokenize and pad the input text
    input_text_seq = tokenizer.texts_to_sequences([input_text])
    input_text_padded = pad_sequences(input_text_seq, maxlen=100)  # Adjust maxlen as per your training setup

    # Make prediction
    prediction = model.predict(input_text_padded)

    # Convert prediction to binary (0 or 1)
    predicted_class = (prediction > 0.5).astype("int32")

    # Interpret the result
    if predicted_class[0][0] == 1:
        return "Meaningful"
    else:
        return "Gibberish"

# Take user input
input_text = input("Enter a text: ")

# Get prediction
result = classify_text(input_text)

# Display the result
print(f"Text: {input_text}\nPredicted Class: {result}")


Enter a text:  dwdwhdhgadgdw


1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 63ms/step
Text: dwdwhdhgadgdw
Predicted Class: Gibberish


# Text Classification Using a Neural Network: Gibberish vs Meaningful Text

This Python script classifies text into **gibberish** or **meaningful** using a **neural network** built with TensorFlow and Keras. It processes the text data, trains a model, and evaluates its performance on a test dataset.

### Key Components:
1. **Loading the Dataset**:
   - The dataset is loaded from a CSV file, and it is assumed that the file contains two columns: `response` (the text data) and `label` (the target labels, where 0 indicates gibberish and 1 indicates meaningful text).

2. **Splitting the Data**:
   - The dataset is split into training and test sets using an 80/20 split.

3. **Text Preprocessing**:
   - The text data is tokenized using Keras's `Tokenizer`, which converts the text into sequences of integers, where each integer represents a word. 
   - These sequences are then padded to ensure uniform length (100 in this case).

4. **Building the Neural Network Model**:
   - A **Sequential model** is defined, which includes:
     - A dense layer with 64 units and ReLU activation.
     - A dropout layer to prevent overfitting.
     - Another dense layer with 32 units and ReLU activation.
     - A final output layer with a sigmoid activation for binary classification.
   
5. **Compiling the Model**:
   - The model uses the **Adam optimizer** and **binary crossentropy loss function** for training, which is appropriate for binary classification problems.
   - The `accuracy` metric is used to evaluate the model’s performance.

6. **Training the Model**:
   - The model is trained for 5 epochs with a batch size of 32, and 10% of the training data is used for validation during training.

7. **Evaluating the Model**:
   - The model's predictions on the test set are compared to the true labels, and the classification accuracy and a detailed classification report (precision, recall, F1-score) are printed.
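
To make the tokenization and padding step above concrete, here is a simplified pure-Python sketch of the idea. It is not the Keras implementation: the helper names `build_word_index`, `texts_to_sequences`, and `pad` are hypothetical, the texts are made up, and splitting on whitespace stands in for Keras's more thorough text cleaning. It assumes the Keras defaults of zero padding on the left and no out-of-vocabulary token.

```python
from collections import Counter


def build_word_index(texts, num_words):
    """Map the most frequent words to integers starting at 1 (0 is the pad value)."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # Keep num_words - 1 words, since index 0 is reserved for padding.
    most_common = counts.most_common(num_words - 1)
    return {word: i + 1 for i, (word, _) in enumerate(most_common)}


def texts_to_sequences(texts, word_index):
    """Replace each known word with its integer; unknown words are dropped."""
    return [[word_index[w] for w in text.lower().split() if w in word_index]
            for text in texts]


def pad(sequences, maxlen):
    """Left-pad with zeros and keep only the last maxlen items of long sequences."""
    return [[0] * (maxlen - len(seq[-maxlen:])) + seq[-maxlen:] for seq in sequences]


train_texts = ["sunflower fields in sunlight", "fields in a valley"]
word_index = build_word_index(train_texts, num_words=10)

new_texts = ["sunflower fields in a valley in sunlight", "fefefe grgsgr"]
sequences = texts_to_sequences(new_texts, word_index)
padded = pad(sequences, maxlen=8)
for text, row in zip(new_texts, padded):
    print(text, "->", row)
```

One consequence worth noticing: a text made up only of words the tokenizer never saw becomes a sequence of zeros, so the network receives no word information for it.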

### Usage:
- This code provides an efficient way to train a binary classifier that distinguishes between gibberish and meaningful text. It is suitable for various natural language processing tasks where text quality needs to be evaluated.

### Dependencies:
- pandas
- scikit-learn
- tensorflow/keras



In [24]:
import pandas as pd
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from sklearn.metrics import accuracy_score, classification_report

# Load dataset
df = pd.read_csv('/kaggle/input/prompt-and-gibberish-for-ai-art-gen/Combined-non_sensical-meaningful.csv')

# Assuming the dataset has columns 'response' and 'label'
X = df['response']
y = df['label']

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Convert all entries in X_train and X_test to strings
X_train = X_train.astype(str)
X_test = X_test.astype(str)

# Tokenize and pad sequences
tokenizer = Tokenizer(num_words=5000)
tokenizer.fit_on_texts(X_train)
X_train_seq = tokenizer.texts_to_sequences(X_train)
X_test_seq = tokenizer.texts_to_sequences(X_test)
X_train_padded = pad_sequences(X_train_seq, maxlen=100)
X_test_padded = pad_sequences(X_test_seq, maxlen=100)

# Define the model
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(100,)))
model.add(Dropout(0.5))
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(X_train_padded, y_train, epochs=5, batch_size=32, validation_split=0.1)

# Evaluate the model
y_pred = (model.predict(X_test_padded) > 0.5).astype("int32")
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))


  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


Epoch 1/5
98702/98702 ━━━━━━━━━━━━━━━━━━━━ 188s 2ms/step - accuracy: 0.9606 - loss: 1.0841 - val_accuracy: 0.9756 - val_loss: 0.0717
Epoch 2/5
98702/98702 ━━━━━━━━━━━━━━━━━━━━ 186s 2ms/step - accuracy: 0.9756 - loss: 0.0769 - val_accuracy: 0.9786 - val_loss: 0.0664
Epoch 3/5
98702/98702 ━━━━━━━━━━━━━━━━━━━━ 189s 2ms/step - accuracy: 0.9767 - loss: 0.0744 - val_accuracy: 0.9775 - val_loss: 0.0783
Epoch 4/5
98702/98702 ━━━━━━━━━━━━━━━━━━━━ 195s 2ms/step - accuracy: 0.9769 - loss: 0.0745 - val_accuracy: 0.9801 - val_loss: 0.0692
Epoch 5/5
98702/98702 ━━━━━━━━━━━━━━━━━━━━ 185s 2ms/step - accuracy: 0.9768 - loss: 0.0715 - val_accuracy: 0.9793 - val_loss: 0.0675
27418/27418 ━━━━━━━━━━━━━━━━━━━━ 33s 1ms/step
Accuracy: 0.9794175173363363
Classification Report:
               precision    recal

# Classifying New Text Data: Gibberish vs Meaningful Text

This code snippet demonstrates how to classify new text inputs as either **gibberish** or **meaningful** using a pre-trained neural network model.

### Key Components:

1. **New Text Data**:
   - A list of new text samples is defined for classification. For example:
     ```python
     new_texts = [
         "sunflower fields in a valley in sunlight",
         "fefefe grgsgr"
     ]
     ```

2. **Text Preprocessing**:
   - The new texts are tokenized using the previously defined `tokenizer`, converting each text sample into a sequence of integers that represent words.
   - The sequences are then padded to ensure they all have the same length (100 in this case) using the `pad_sequences` function from Keras.

3. **Making Predictions**:
   - The model, which has been trained on the dataset, is used to predict the classes of the new padded sequences.
   - Predictions are made using the `model.predict()` method. Because the output layer is a single sigmoid unit, it returns one score per text: the estimated probability that the text is meaningful (class `1`).

4. **Converting Predictions to Binary**:
   - The output predictions are converted to binary values (0 or 1) based on a threshold of 0.5. 
   - If the predicted score is greater than 0.5, it is classified as `1` (meaningful); otherwise, it is classified as `0` (gibberish).

5. **Displaying Predictions**:
   - The results are printed, showing each text alongside its predicted class. For example:
     ```python
     for text, pred in zip(new_texts, predicted_classes):
         print(f"Text: {text}\nPredicted Class: {'Meaningful' if pred[0] == 1 else 'Gibberish'}\n")
     ```
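
The thresholding rule described above can be sketched in isolation. The helper `to_labels` and the scores below are hypothetical; the scores stand in for the output of `model.predict()` flattened to a plain list.

```python
def to_labels(probabilities, threshold=0.5):
    """Map sigmoid outputs to class names; scores above the threshold are meaningful."""
    return ["Meaningful" if p > threshold else "Gibberish" for p in probabilities]


# Made-up scores standing in for flattened model.predict(...) output.
scores = [0.91, 0.50, 0.07]
default_labels = to_labels(scores)                # threshold of 0.5
lenient_labels = to_labels(scores, threshold=0.4)  # flags less text as gibberish
print(default_labels)
print(lenient_labels)
```

Because the comparison is strict, a score of exactly 0.5 is treated as gibberish. Moving the threshold trades false alarms against missed gibberish, so it is a parameter worth tuning on validation data.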

### Usage:
- This code can be used to evaluate any new text inputs against a previously trained model, allowing users to quickly identify whether the text is meaningful or gibberish.

### Dependencies:
- This snippet assumes that the necessary libraries (e.g., TensorFlow, Keras) and the trained model and tokenizer are already imported and set up in the environment.


In [40]:
# Example new text data
new_texts = [
    "sunflower fields in a valley in sunlight",
    "fefefe grgsgr"
]

# Tokenize and pad the new texts
new_texts_seq = tokenizer.texts_to_sequences(new_texts)
new_texts_padded = pad_sequences(new_texts_seq, maxlen=100)

# Make predictions
predictions = model.predict(new_texts_padded)

# Convert predictions to binary (0 or 1)
predicted_classes = (predictions > 0.5).astype("int32")

# Display predictions
for text, pred in zip(new_texts, predicted_classes):
    print(f"Text: {text}\nPredicted Class: {'Meaningful' if pred[0] == 1 else 'Gibberish'}\n")


[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 29ms/step
Text: sunflower fields in a valley in sunlight
Predicted Class: Gibberish

Text: fefefe grgsgr
Predicted Class: Gibberish



# Saving the Keras Model and Tokenizer

This code snippet demonstrates how to save a trained Keras model and its associated tokenizer for later use.

### Key Components:

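1. **Saving the Keras Model**:
   - The trained model is saved in HDF5 format using the `save` method provided by Keras, so it can be reloaded later for inference without retraining.
   ```python
   model.save('prompt_classifier_model.h5')  # Save Keras model in HDF5 format
   ```

2. **Saving the Tokenizer**:
   - The fitted tokenizer is saved with `pickle` as `prompt_classifier_tokenizer.pkl`, because new text must be tokenized with the same word index that was used during training.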


In [26]:
import pickle

# Save the Keras model
model.save('prompt_classifier_model.h5')  # Save Keras model in HDF5 format

# Optionally, you can also save the tokenizer
with open('prompt_classifier_tokenizer.pkl', 'wb') as f:
    pickle.dump(tokenizer, f)


# Text Classifier for Meaningful vs. Gibberish Text

This Python script implements a classifier that distinguishes between meaningful text and gibberish using a pre-trained Keras model.

## Overview

The script loads a trained Keras model and a tokenizer, processes input text, and outputs whether the text is meaningful or gibberish. This can be useful for applications like chatbots, content moderation, or text validation.

## Code Breakdown

### 1. Importing Libraries

The script begins by importing the necessary libraries:

```python
import numpy as np
import pickle
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.sequence import pad_sequences
```

### 2. Loading the Model

```python
model = load_model(r'/kaggle/input/test-classifier/keras/default/1/trained_model.h5')
```

### 3. Loading the Tokenizer

```python
with open(r'/kaggle/input/test-classifier/keras/default/1/tokenizer.pkl', 'rb') as f:
    tokenizer = pickle.load(f)
```


In [41]:
#1.0 classifier between normal text and gibberish text
import numpy as np
import pickle
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Load the trained Keras model
model = load_model(r'/kaggle/input/test-classifier/keras/default/1/trained_model.h5')

# Load the tokenizer
with open(r'/kaggle/input/test-classifier/keras/default/1/tokenizer.pkl', 'rb') as f:
    tokenizer = pickle.load(f)

# Function to classify new text
def classify_text(input_text):
    # Tokenize and pad the input text
    input_text_seq = tokenizer.texts_to_sequences([input_text])
    input_text_padded = pad_sequences(input_text_seq, maxlen=100)  # Adjust maxlen as per your training setup

    # Make prediction
    prediction = model.predict(input_text_padded)

    # Convert prediction to binary (0 or 1)
    predicted_class = (prediction > 0.5).astype("int32")

    # Interpret the result
    if predicted_class[0][0] == 1:
        return "Meaningful"
    else:
        return "Gibberish"

# Take user input
input_text = input("Enter a text: ")

# Get prediction
result = classify_text(input_text)

# Display the result
print(f"Text: {input_text}\nPredicted Class: {result}")


Enter a text:  sunflower field in the valley in sunlight


1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 74ms/step
Text: sunflower field in the valley in sunlight
Predicted Class: Meaningful
