<a href="https://colab.research.google.com/github/Areebanaeemsatti/Quotes/blob/main/Quotesexplorer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install faiss.cpu
!pip install sentence-transformers
!pip install gradio

Collecting faiss.cpu
  Downloading faiss_cpu-1.11.0-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (4.8 kB)
Downloading faiss_cpu-1.11.0-cp311-cp311-manylinux_2_28_x86_64.whl (31.3 MB)
Installing collected packages: faiss.cpu
Successfully installed faiss.cpu-1.11.0
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from to

In [None]:
!pip freeze > requirements.txt


In [None]:
import ast

import faiss
import numpy as np
import pandas as pd
from sentence_transformers import SentenceTransformer

def load_and_normalize_data(filepath):
    df = pd.read_csv(filepath)

    # Clean quote and author columns
    df['quote'] = df['quote'].astype(str).str.strip().str.lower()
    df['author'] = df['author'].astype(str).str.strip().str.lower()

    # Normalize one value of the category column into a list of lowercase strings
    def clean_category(value):
        if isinstance(value, str):
            try:
                # Try parsing the string as a Python literal, e.g. "['love', 'life']"
                result = ast.literal_eval(value)
                # If it parsed to a list, clean each item
                if isinstance(result, list):
                    return [str(x).strip().lower() for x in result]
                # Any other literal (e.g. a quoted string) becomes a one-item list
                return [str(result).strip().lower()]
            except (ValueError, SyntaxError, TypeError):
                # Plain text such as "love" is not a literal: clean it and wrap it in a list
                return [value.strip().lower()]
        else:
            # Non-string values (e.g. NaN for a missing category) become an empty list
            return []

    # Apply this cleaning function to every row in the 'category' column
    df['category'] = df['category'].apply(clean_category)

    return df


# -----------------SBERT----------------------------

def vectorize_quotes(df):
    model = SentenceTransformer('all-MiniLM-L6-v2')
    embeddings = model.encode(df['quote'].tolist(), convert_to_numpy=True).astype("float32")
    return embeddings, model


# --------------FAISS-------------------------------

def create_faiss_index(embeddings):
    index = faiss.IndexFlatL2(embeddings.shape[1])
    index.add(embeddings)
    return index


# ----------------MAIN FUNCTION----------------

def main():
    dataset_path = "/content/quotes (1).csv"
    index_path = "quotes_index.faiss"

    print("📥 Loading dataset...")
    df = load_and_normalize_data(dataset_path)

    print("🧠 Creating embeddings with SBERT...")
    embeddings, model = vectorize_quotes(df)

    print("⚡ Creating FAISS index...")
    index = create_faiss_index(embeddings)
    faiss.write_index(index, index_path)
    print("✅ FAISS index saved.")

    return df, model, index


# ------------ENTRY POINT-----------

if __name__ == "__main__":
    main()


📥 Loading dataset...
🧠 Creating embeddings with SBERT...


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`



⚡ Creating FAISS index...
✅ FAISS index saved.
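
The cell above builds a `faiss.IndexFlatL2`, which does an exhaustive search and ranks results by squared Euclidean distance. As a rough illustration of that ranking, here is a minimal pure-Python sketch. The helper names `squared_l2` and `flat_l2_search` and the toy vectors are assumptions made for this example; they are not part of FAISS or of this notebook.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_l2_search(vectors, query, k):
    """Exhaustive k-nearest-neighbour search over a list of vectors.

    Returns (distances, indices), both ordered by ascending distance.
    """
    # Score every stored vector against the query; no shortcuts, like a flat index
    scored = ((squared_l2(v, query), i) for i, v in enumerate(vectors))
    nearest = heapq.nsmallest(k, scored)
    distances = [d for d, _ in nearest]
    indices = [i for _, i in nearest]
    return distances, indices


# Toy 2-D "embeddings" (hypothetical values, not real SBERT output)
vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
distances, indices = flat_l2_search(vectors, query=[1.0, 1.0], k=2)
print(indices, distances)  # [1, 0] [1.0, 2.0]
```

Smaller distances mean closer matches, which is why the search cell can walk the returned indices in order and treat the first hits as the best ones.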


In [None]:
# Import necessary libraries
import pandas as pd                       # CSV loading and data manipulation
import faiss                              # Fast similarity search over vector embeddings (developed at Facebook)
import gradio as gr                       # Web-based UI
from sentence_transformers import SentenceTransformer  # Encodes text into vector embeddings for semantic similarity

# ------------------ Load Data and Models ------------------

# File paths for the data and FAISS index
DATA_PATH = "/content/quotes (1).csv"
FAISS_INDEX_PATH = "quotes_index.faiss"

# Load and normalize the quotes dataset
df = load_and_normalize_data(DATA_PATH)

# Load the pre-trained SentenceTransformer model that converts input queries
# into semantic vectors (the same model that was used to build the index)
model = SentenceTransformer('all-MiniLM-L6-v2')

# Load FAISS index for efficient similarity search
index = faiss.read_index(FAISS_INDEX_PATH)

# ------------------ Quote Search Function ------------------

# Function to search for quotes based on user query and optional filters
def search_quotes(query, author_filter, category_filter):
    results = []  # List to store matching quotes

    # Clean and normalize author and category inputs
    author = author_filter.strip().lower() if author_filter else None
    category = category_filter.strip().lower() if category_filter else None

    # Check if the query is empty or only whitespace
    if not query or not query.strip():
        return "❌ Please enter something to search."

    # Encode the query into a semantic vector
    query_vector = model.encode([query.strip().lower()]).astype("float32")  # float32 for FAISS compatibility

    # Search the FAISS index for the 20 most similar quote embeddings
    distances, indices = index.search(query_vector, 20)

    # Loop through the returned indices and fetch the corresponding quotes from the DataFrame
    for idx in indices[0]:
        # FAISS pads the result with -1 when the index holds fewer than 20 vectors
        if idx < 0:
            continue
        row = df.iloc[idx]

        # Skip if author doesn't match
        if author and row['author'].strip().lower() != author:
            continue

        # Skip if category doesn't match
        if category and category not in [c.strip().lower() for c in row['category']]:
            continue

        # Format the quote and append to results
        quote = f"“{row['quote']}” — {row['author'].title()} [📚 {', '.join(row['category'])}]"
        results.append(quote)

        # Stop after collecting 5 quotes
        if len(results) == 5:
            break

    # Return results or no match message
    return "\n\n".join(results) if results else "❌ No matching quotes found."

# ------------------ Random Author Generator ------------------

# Function to pick up to 5 random authors from the dataset
def random_authors():
    authors = df['author'].dropna().unique().tolist()                  # Get unique authors
    sample = pd.Series(authors).sample(min(5, len(authors))).tolist()  # Pick up to 5 at random
    return ", ".join([a.title() for a in sample])                      # Format and return

# ------------------ Random Category Generator ------------------

# Function to pick up to 5 random categories from the dataset
def random_categories():
    categories = df['category'].explode().dropna().unique().tolist()         # Flatten list of categories
    sample = pd.Series(categories).sample(min(5, len(categories))).tolist()  # Pick up to 5 at random
    return "📚 " + " | ".join([c.title() for c in sample])                   # Format and return

# ------------------ Gradio Interface ------------------

# Define the main Gradio Blocks interface
with gr.Blocks(title="💬 Quote Explorer") as demo:

    # 🔍 Search Quotes Tab
    with gr.Tab("🔍 Search Quotes"):
        gr.Markdown("### 🔎 Find Meaningful Quotes")  # Description text

        # Input fields
        query = gr.Textbox(label="Search by Meaning", placeholder="e.g. love, courage")
        author = gr.Textbox(label="Filter by Author", placeholder="Optional")
        category = gr.Textbox(label="Filter by Category", placeholder="Optional")

        # Search button and output textbox
        search_btn = gr.Button("🔍 Search")
        search_output = gr.Textbox(label="Results", lines=8, interactive=False)

        # Connect search button to the search_quotes function
        search_btn.click(search_quotes, inputs=[query, author, category], outputs=search_output)

    # 🎲 Random Authors Tab
    with gr.Tab("🎲 Random Authors"):
        gr.Markdown("### 🎨 Discover Random Authors")

        author_btn = gr.Button("Show Authors")                     # Button to show authors
        authors_output = gr.Textbox(label="Authors", interactive=False)  # Output box

        # Connect button to random author generator
        author_btn.click(random_authors, outputs=authors_output)

    # 📚 Random Categories Tab
    with gr.Tab("📚 Random Categories"):
        gr.Markdown("### 📚 Explore Random Quote Categories")

        category_btn = gr.Button("Show Categories")                     # Button to show categories
        category_output = gr.Textbox(label="Categories", interactive=False)  # Output box

        # Connect button to random category generator
        category_btn.click(random_categories, outputs=category_output)

    # 💬 Feedback Tab
    with gr.Tab("💬 Feedback"):
        gr.Markdown("### 💬 Share Your Feedback")

        # Input and output fields for feedback
        feedback_input = gr.Textbox(label="Your Feedback", placeholder="Write your thoughts here...", lines=4)
        feedback_output = gr.Textbox(label="Thanks!", lines=1, interactive=False)
        feedback_btn = gr.Button("Submit Feedback")

        # Function to handle feedback submission
        def handle_feedback(text):
            print("📝 Feedback received:", text)         # Print to console
            return "✅ Thanks for your feedback!"         # Acknowledge to user

        # Connect feedback button to handler
        feedback_btn.click(handle_feedback, inputs=feedback_input, outputs=feedback_output)

    # ℹ️ About Tab
    with gr.Tab("ℹ️ About"):
        gr.Markdown("""
        ## About Quote Explorer

        This application helps you discover inspirational and thought-provoking quotes using semantic search with Sentence Transformers and FAISS indexing.

        - Built using Python, Gradio, and SBERT
        - Developed by: A ~ I ~ A
        - Version: 1.0.0
        """)

# ------------------ Launch Gradio App ------------------

demo.launch()  # Start the app (in Colab, Gradio prints a public share link)


It looks like you are running Gradio on a hosted a Jupyter notebook. For the Gradio app to work, sharing must be enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://270dd9dc37e93b93c8.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
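
`search_quotes` fetches 20 nearest neighbours first and applies the author and category filters afterwards, so a narrow filter can leave fewer than 5 results, or none. The sketch below isolates that filter-after-search step on plain Python data so it can be checked without loading the model or the index. The function name `filter_candidates` and the sample rows are hypothetical and only stand in for `df` and the FAISS output.

```python
def filter_candidates(candidate_ids, rows, author=None, category=None, limit=5):
    """Walk candidates in rank order and keep those that pass the optional filters."""
    kept = []
    for idx in candidate_ids:
        if idx < 0:
            # FAISS uses -1 to pad results when fewer than k neighbours exist
            continue
        row = rows[idx]
        if author and row["author"] != author:
            continue
        if category and category not in row["category"]:
            continue
        kept.append(idx)
        if len(kept) == limit:
            break
    return kept


# Hypothetical rows, already normalized to lowercase like the notebook's DataFrame
rows = [
    {"author": "a. author", "category": ["life"]},
    {"author": "b. writer", "category": ["love", "life"]},
    {"author": "a. author", "category": ["courage"]},
]
candidate_ids = [2, 0, 1, -1]  # best match first, padded with -1

print(filter_candidates(candidate_ids, rows, author="a. author"))        # [2, 0]
print(filter_candidates(candidate_ids, rows, category="life", limit=1))  # [0]
```

If filtered searches often come back short, one option is to request more than 20 neighbours from the index before filtering; whether that is worthwhile depends on the dataset size.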


