In [1]:
!pip install transformers torch sentence-transformers faiss-cpu flask pyngrok




In [2]:
import os
import json
import time
import logging
from typing import List, Dict, Any
from flask import Flask, request, jsonify
from threading import Thread

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, GenerationConfig, AutoModel
from sentence_transformers import SentenceTransformer
import faiss
from pyngrok import ngrok

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("shoplite_rag")

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
logger.info(f"Using device: {DEVICE}")

# ------------------------------
# Cell 3: Knowledge base (embedded)
# ------------------------------
# 15 documents taken from the earlier generated KB. Titles and content included.
KNOWLEDGE_BASE = [
    {
        "id": "doc1",
        "title": "Shoplite User Registration Process",
        "text": (
            "To create a Shoplite account, users visit the registration page and provide their email address, "
            "password, and profile information such as full name and phone number. Email verification is mandatory "
            "within 24 hours to activate the account. Users can choose between two types: a buyer account (free and immediate) "
            "or a seller account (requires business verification including tax ID and proof of ownership). Shoplite enforces strong "
            "password requirements with at least eight characters, a number, and a symbol. Account recovery options are available through "
            "both email and SMS authentication. Security notifications are sent when suspicious login activity is detected. Users may also "
            "enable two-factor authentication for added security. For sellers, additional onboarding steps include submitting business documents, "
            "waiting for manual review, and accepting Shoplite’s seller terms. Once approved, sellers gain access to inventory management and payment "
            "setup tools. Shoplite accounts are protected by industry-standard encryption to safeguard user data."
        )
    },
    {
        "id": "doc2",
        "title": "Shoplite Shopping Cart Features",
        "text": (
            "The Shoplite shopping cart allows users to add products from multiple sellers, save items for later, and apply promotional codes during checkout. "
            "Cart contents are preserved across sessions for logged-in users, while guest carts expire after 24 hours. Each item in the cart displays the seller, "
            "price, estimated delivery time, and applicable discounts. Users can update quantities, remove products, or move items to their wish list. A built-in price "
            "calculator updates totals in real time, showing shipping fees, taxes, and any active discount codes. If inventory runs out, the cart automatically notifies "
            "users before checkout. Items saved for later are not reserved, ensuring fair availability across all buyers. The cart supports multi-currency pricing depending "
            "on the user’s region. For sellers, the cart integrates with inventory systems to lock stock once a checkout is initiated, preventing overselling. The cart also "
            "includes an upsell feature, recommending similar or complementary products."
        )
    },
    {
        "id": "doc3",
        "title": "Shoplite Checkout and Payment Security",
        "text": (
            "Shoplite’s checkout process is designed for both convenience and safety. Users can choose guest checkout or log into their account for faster processing. "
            "At checkout, shipping details are validated in real time, and the system calculates estimated delivery dates based on user location. Payment options include "
            "credit/debit cards, PayPal, digital wallets, and Shoplite’s own gift cards. All transactions are processed through PCI-DSS-compliant gateways with end-to-end encryption. "
            "Users are redirected to secure payment providers for sensitive input, minimizing risk. Fraud detection algorithms flag unusual activity, such as mismatched billing addresses or repeated failed transactions. "
            "Buyers can save preferred payment methods in their account, secured with tokenization. Shoplite offers one-click checkout for logged-in users with verified addresses and payment methods. "
            "To protect sellers, funds are held temporarily until delivery confirmation or after a defined protection window. Refunds, chargebacks, and disputes are handled under strict financial compliance protocols."
        )
    },
    {
        "id": "doc4",
        "title": "Shoplite Order Tracking and Delivery",
        "text": (
            "After checkout, buyers receive an order confirmation email with an estimated delivery date. Each order is assigned a tracking number, accessible via the user’s account dashboard. "
            "Shoplite integrates with major logistics providers, updating status in real time. Buyers can view stages such as “Order Confirmed,” “Packed,” “Shipped,” and “Delivered.” Notifications are sent by email and in-app push alerts when key events occur. "
            "In case of delays, automated alerts inform buyers with revised estimates. For sellers, the system enforces timely shipment deadlines and issues penalties for late dispatches. Sellers can upload tracking numbers directly, which sync with the buyer’s account. "
            "For orders with multiple items, partial shipments are supported, with separate tracking for each package. Buyers may opt into delivery preferences, such as requiring a signature or leaving the package at a pickup location. Shoplite’s delivery system prioritizes transparency and ensures both buyers and sellers are informed at every stage."
        )
    },
    {
        "id": "doc5",
        "title": "Shoplite Return and Refund Policies",
        "text": (
            "Shoplite maintains a buyer-friendly return policy designed to balance customer trust with seller protection. Buyers can initiate returns within 14 days of delivery, provided the item is unused, in original packaging, and includes proof of purchase. "
            "Certain categories, such as digital goods, perishable items, and hygiene-sensitive products, are non-returnable. Refunds are issued back to the original payment method within 5–7 business days after seller approval. "
            "If sellers fail to respond within 3 business days, Shoplite automatically approves the return. Disputes are escalated to Shoplite’s resolution team, which mediates between buyer and seller. Return shipping costs may be covered by the seller or deducted from the refund, depending on seller settings. "
            "For defective or incorrect items, full refunds including shipping are guaranteed. Sellers receive detailed analytics on return rates to help identify quality issues. Shoplite enforces these rules to maintain fair trade while minimizing abuse."
        )
    },
    {
        "id": "doc6",
        "title": "Shoplite Product Reviews and Ratings",
        "text": (
            "Product reviews allow buyers to share feedback, build trust, and guide purchasing decisions. After order delivery, buyers are invited to rate items on a five-star scale and provide optional written feedback. "
            "Shoplite verifies that only buyers who completed a purchase can leave reviews, reducing fake or spam content. Reviews are public and visible on product pages, with the ability to sort by most recent or most helpful. Sellers can respond to reviews publicly, addressing concerns or thanking customers. Negative reviews trigger automated alerts, encouraging sellers to improve service. Shoplite also uses AI moderation tools to detect offensive language and spam. Aggregate ratings affect product ranking in search results and seller reputation scores. Verified reviews display a special badge to indicate authenticity. Buyers may also upload product photos, further enhancing credibility. Reviews contribute to community trust, and consistent poor ratings may trigger seller performance reviews or penalties."
        )
    },
    {
        "id": "doc7",
        "title": "Shoplite Seller Account Setup",
        "text": (
            "To become a seller on Shoplite, users must create a seller account through the registration portal. Required information includes business name, tax ID, bank details, and contact information. Sellers must upload proof of business ownership, such as incorporation certificates or trade licenses. "
            "Shoplite’s compliance team reviews submissions within 3–5 business days. Once approved, sellers can access the Seller Dashboard to list products, manage inventory, and configure shipping options. Shoplite enforces strict guidelines on prohibited items, including counterfeit goods and restricted categories. Sellers must also accept the Shoplite Seller Agreement, which outlines responsibilities and commission rates. Identity verification includes both digital checks and, in some cases, video calls. Sellers can set up multiple user roles within their accounts, such as administrators and staff accounts with limited permissions. Shoplite also provides onboarding resources, including tutorials and policy guides, to help new sellers succeed quickly."
        )
    },
    {
        "id": "doc8",
        "title": "Shoplite Inventory Management",
        "text": (
            "Inventory management on Shoplite ensures accurate stock levels and prevents overselling. Sellers can manually update quantities or integrate through Shoplite’s API for automated synchronization with external systems. When an item’s stock falls below a predefined threshold, Shoplite sends low-inventory alerts. Products automatically display “Out of Stock” once quantities reach zero. Bulk upload tools allow sellers to manage large catalogs efficiently using CSV or API endpoints. The system supports variants such as size, color, and bundle packs. Reserved stock is automatically deducted when a buyer initiates checkout but is released if payment fails or the session expires. For sellers with warehouses in multiple regions, Shoplite supports geo-based inventory, ensuring buyers see availability in their area. Reports provide insights into sales velocity and inventory turnover. Shoplite enforces strict penalties for sellers who repeatedly oversell, as this harms buyer trust and platform reputation."
        )
    },
    {
        "id": "doc9",
        "title": "Shoplite Commission and Fees",
        "text": (
            "Shoplite operates on a commission-based model supplemented by service fees. Standard commission rates vary between 8% and 15%, depending on the product category. High-demand categories, such as electronics, may have slightly higher rates, while books and media enjoy reduced fees. In addition, sellers pay a fixed transaction fee per completed order. Premium sellers who meet specific performance criteria may qualify for reduced commissions. Shoplite also offers subscription plans for high-volume sellers, providing access to advanced analytics, advertising tools, and priority customer support. All fees are deducted automatically before disbursement of seller earnings. Shoplite maintains full transparency by providing detailed invoices, accessible in the Seller Dashboard. Sellers must account for these costs when pricing products. Late shipment penalties and dispute resolution fees may also apply. Shoplite’s fee structure is reviewed annually to ensure competitiveness while funding platform development and operational security."
        )
    },
    {
        "id": "doc10",
        "title": "Shoplite Customer Support Procedures",
        "text": (
            "Shoplite’s customer support system is designed for both buyers and sellers. Buyers can reach support through live chat, email, or a ticketing system accessible from their account. Common issues include tracking inquiries, refunds, and payment troubleshooting. Sellers have a dedicated support line for account-related problems, policy clarifications, and technical integration help. Response times vary by issue severity, with priority given to payment and security-related cases. Shoplite also maintains a self-service knowledge base covering common FAQs and troubleshooting steps. Escalation protocols are in place: unresolved issues move from first-level agents to specialized teams. For disputes between buyers and sellers, Shoplite’s mediation team provides final resolution. Support channels are available 24/7 in multiple languages. Regular training ensures support agents remain updated on policies and new platform features. Shoplite continuously collects feedback on support performance to improve response quality."
        )
    },
    {
        "id": "doc11",
        "title": "Shoplite Mobile App Features",
        "text": (
            "The Shoplite mobile app provides buyers with seamless access to products and sellers with tools to manage their business on the go. Features include biometric login, personalized recommendations, and push notifications for order updates. Buyers can scan QR codes to apply discounts or track shipments. Sellers can use the app to update inventory, process orders, and respond to buyer messages. Offline functionality allows users to browse cached product pages and wish lists, with updates syncing when connectivity returns. The app includes a secure wallet feature for storing gift cards and promotional credits. Shoplite regularly updates the app with performance enhancements and new features, maintaining compatibility across iOS and Android platforms. App performance is optimized for low bandwidth, ensuring global accessibility. Ratings and reviews for the app itself are actively monitored, and user feedback helps prioritize new development."
        )
    },
    {
        "id": "doc12",
        "title": "Shoplite API Documentation Overview",
        "text": (
            "Shoplite provides a RESTful API for developers integrating with external systems such as ERPs, CRMs, and logistics platforms. The API supports authentication via OAuth 2.0 and issues access tokens for secure communication. Endpoints cover product listing, inventory management, order retrieval, payment confirmation, and account management. Rate limits are enforced to prevent abuse, with higher thresholds available for enterprise partners. Documentation includes example requests and responses in JSON format, with error codes clearly defined. Sandbox environments are available for testing before production deployment. Developers must register applications within their Shoplite account to receive client IDs and secrets. API updates are versioned, ensuring backward compatibility. Shoplite also provides SDKs in popular languages like Python, JavaScript, and PHP. Webhooks enable real-time updates on events such as order creation or shipment status. Developer support includes forums, tutorials, and dedicated technical assistance for enterprise clients."
        )
    },
    {
        "id": "doc13",
        "title": "Shoplite Security and Privacy Policies",
        "text": (
            "Shoplite prioritizes data protection and user privacy. All user information is encrypted both in transit (TLS 1.3) and at rest (AES-256). Access to sensitive data is restricted to authorized personnel under strict audit controls. Shoplite complies with GDPR, CCPA, and other international privacy regulations. Users may request account deletion at any time, with all personal data purged within 30 days. Cookie usage is limited to essential and analytics purposes, with explicit consent required in applicable regions. Shoplite employs regular penetration testing and vulnerability assessments, with immediate patching of identified issues. Login systems enforce adaptive security measures, including CAPTCHA challenges and IP-based risk scoring. Data breaches trigger mandatory notifications within 72 hours. Third-party integrations undergo security vetting before approval. Shoplite maintains a bug bounty program, rewarding security researchers for responsibly disclosing vulnerabilities. These policies ensure trust and regulatory compliance."
        )
    },
    {
        "id": "doc14",
        "title": "Shoplite Promotional Codes and Discounts",
        "text": (
            "Shoplite provides flexible promotional tools for both buyers and sellers. Buyers can enter promo codes at checkout, with discounts applied instantly to eligible items. Promotions may include percentage discounts, fixed-amount reductions, free shipping, or buy-one-get-one offers. Sellers can create custom campaigns via the Seller Dashboard, specifying start/end dates, usage limits, and applicable product categories. Shoplite automatically prevents overlapping discounts that would exceed policy limits. Promotional performance is tracked through analytics dashboards, showing redemption rates and revenue impact. Platform-wide seasonal sales, such as Black Friday, are coordinated by Shoplite, with sellers invited to participate. Buyers are notified of active promotions via email and push notifications. Promo codes are case-sensitive and may expire once redemption thresholds are reached. Abuse, such as unauthorized coupon sharing, may result in account suspension. Shoplite ensures discounts are applied transparently, maintaining fairness across the marketplace."
        )
    },
    {
        "id": "doc15",
        "title": "Shoplite Developer Best Practices",
        "text": (
            "Shoplite encourages developers to follow best practices when integrating with the platform. Proper error handling is essential, with retries implemented for temporary failures. Developers should minimize API calls by caching responses where possible. Authentication tokens must be stored securely and refreshed before expiration. Sensitive information, such as API keys, must never be hard-coded. Applications should respect rate limits to avoid throttling. Shoplite recommends using webhooks for event-driven integrations instead of constant polling. Developers should validate input thoroughly to prevent injection attacks or data corruption. For UI integrations, Shoplite advises following accessibility guidelines and providing multilingual support. Automated testing in sandbox environments ensures smooth deployments. Documentation updates should be monitored regularly, as deprecations may occur. Developers are also encouraged to join the Shoplite developer community to exchange knowledge and gain access to early beta features. Following these practices improves reliability and security of integrations."
        )
    }
]


In [3]:
PROMPTS = {
    "version": "1.1",
    "created": "2025-09-30",
    "author": "Your Name",

    "base_retrieval_prompt": {
        "role": "You are a helpful Shoplite customer service assistant.",
        "goal": "Provide accurate answers using only the provided Shoplite documentation.",
        "context_guidelines": [
            "Use only information from the provided document snippets.",
            "Cite specific documents when possible."
        ],
        "response_format": "Answer: [Provide a clear, concise response based on the context]\nSources: [List document titles referenced]"
    },

    "complex_question_prompt": {
        "role": "You are a Shoplite knowledge assistant specializing in combining multiple documents.",
        "goal": "Provide comprehensive answers that synthesize details across two or more documents.",
        "context_guidelines": [
            "Retrieve and integrate information from all relevant sources.",
            "If documents present overlapping details, explain them clearly.",
            "Always list each document used."
        ],
        "response_format": "Answer: [Provide a detailed response combining multiple documents]\nSources: [List document titles used]"
    },

    "no_context_prompt": {
        "role": "You are a Shoplite assistant that prioritizes factual accuracy.",
        "goal": "When no relevant document is retrieved, politely refuse to answer instead of guessing.",
        "context_guidelines": [
            "Do not invent or hallucinate answers.",
            "If context is missing, say you cannot provide an answer.",
            "Encourage the user to rephrase or provide more details."
        ],
        "response_format": "Answer: I'm sorry, I don’t have information about that in the Shoplite documentation.\nSources: None"
    },

    "clarification_prompt": {
        "role": "You are a Shoplite assistant skilled at guiding customers.",
        "goal": "Ask clarifying questions when a user request is ambiguous or incomplete.",
        "context_guidelines": [
            "Identify unclear terms or missing details in the query.",
            "Politely ask the user to provide additional information.",
            "Do not attempt to answer until clarification is received."
        ],
        "response_format": "Answer: Could you clarify your question? For example: [List possible clarifications]\nSources: None"
    }
}
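
Each entry in `PROMPTS` shares the same four keys, so one entry can be flattened into a single instruction block. The following is a minimal sketch of one way to do that. The helper name `render_system_block` and the `example_config` stand-in are hypothetical and are not part of the notebook. Unlike the prompt builder used later, this sketch also includes `context_guidelines`.

```python
from typing import Any, Dict


def render_system_block(prompt_config: Dict[str, Any]) -> str:
    """Flatten one prompt entry into text: role, goal, bulleted guidelines, format."""
    guidelines = "\n".join(
        f"- {g}" for g in prompt_config.get("context_guidelines", [])
    )
    return (
        f"{prompt_config['role']}\n\n"
        f"Goal: {prompt_config['goal']}\n\n"
        f"Guidelines:\n{guidelines}\n\n"
        f"Response format:\n{prompt_config['response_format']}"
    )


# Stand-in entry with the same keys as the PROMPTS entries (hypothetical copy).
example_config = {
    "role": "You are a helpful Shoplite customer service assistant.",
    "goal": "Provide accurate answers using only the provided Shoplite documentation.",
    "context_guidelines": [
        "Use only information from the provided document snippets.",
        "Cite specific documents when possible.",
    ],
    "response_format": "Answer: [...]\nSources: [...]",
}

system_block = render_system_block(example_config)
print(system_block)
```
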


In [4]:
EMBEDDING_MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"  # small, fast in Colab; change if you prefer
embedder = SentenceTransformer(EMBEDDING_MODEL_NAME)

def create_document_embeddings(kb: List[Dict[str,Any]]):
    texts = [doc["title"] + "\n\n" + doc["text"] for doc in kb]
    embeddings = embedder.encode(texts, convert_to_numpy=True, show_progress_bar=True)
    return embeddings

def build_faiss_index(embeddings, kb: List[Dict[str,Any]]):
    dim = embeddings.shape[1]
    index = faiss.IndexFlatIP(dim)  # inner product; equals cosine similarity on L2-normalized vectors
    faiss.normalize_L2(embeddings)  # normalizes the array in place
    index.add(embeddings)
    # Row i of the index corresponds to kb[i], so no separate idx->doc mapping is kept.
    return index

logger.info("Creating embeddings for knowledge base (this may take a moment)...")
KB_EMBEDDINGS = create_document_embeddings(KNOWLEDGE_BASE)
FAISS_INDEX = build_faiss_index(KB_EMBEDDINGS, KNOWLEDGE_BASE)
logger.info("FAISS index built with %d documents", FAISS_INDEX.ntotal)

# Retrieval helper: embed the query, normalize it, and return the top_k most similar documents
def retrieve(query: str, top_k: int = 4):
    q_emb = embedder.encode([query], convert_to_numpy=True)
    faiss.normalize_L2(q_emb)
    D, I = FAISS_INDEX.search(q_emb, top_k)
    results = []
    for idx in I[0]:
        if idx < 0 or idx >= len(KNOWLEDGE_BASE):
            continue
        doc = KNOWLEDGE_BASE[idx]
        results.append({"id": doc["id"], "title": doc["title"], "text": doc["text"]})
    return results

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
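
The retrieval code relies on the fact that the inner product of L2-normalized vectors is their cosine similarity, which is why `IndexFlatIP` is paired with `faiss.normalize_L2`. The sketch below reproduces that ranking with NumPy only, on toy 2-D vectors, so the behavior can be checked without FAISS or an embedding model. The function names and toy data are illustrative assumptions, not part of the notebook.

```python
import numpy as np


def l2_normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit length (guarding against zero-length rows)."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)


def cosine_top_k(query: np.ndarray, docs: np.ndarray, k: int):
    """Rank docs by cosine similarity to query via a normalized inner product."""
    q = l2_normalize(query.reshape(1, -1))
    d = l2_normalize(docs)
    scores = (d @ q.T).ravel()          # inner product of unit vectors
    order = np.argsort(-scores)[:k]     # highest score first
    return order, scores[order]


# Toy "embeddings": vector length differs, but only direction affects the ranking.
docs = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
query = np.array([2.0, 0.0])

top_idx, top_scores = cosine_top_k(query, docs, k=2)
print(top_idx, top_scores)
```
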



In [5]:
# Cell 1: Installation, Upgrade, and Restart Script

# 1. Install and force-upgrade all necessary libraries
# -U ensures the latest version of bitsandbytes (for 4-bit) and transformers are used.
!pip install -q -U transformers torch accelerate bitsandbytes huggingface-hub sentencepiece

# 2. Restart Runtime Script
# This is REQUIRED to resolve the 'GPTNeoXTokenizer' and 'bitsandbytes version' errors.
import os
print("\nInstallation complete. The runtime must now restart to load the new libraries.")
# os.kill(os.getpid(), 9) is the command to crash and restart the runtime automatically in Colab
# If the runtime doesn't restart automatically, manually click 'Runtime' -> 'Restart runtime'.
# I'll leave it as a print statement in case you prefer to manually restart, but the intended code is:
# os.kill(os.getpid(), 9)

Installation complete. The runtime must now restart to load the new libraries.


In [5]:
# Cell 2: Configuration, Login, and Model Execution (Run AFTER Runtime Restart)

# 1. Imports and Logging
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, BitsAndBytesConfig
from google.colab import userdata
from huggingface_hub import login
import logging

# Set up a simple logger
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger('model_loader')

# Set DEVICE
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# 2. QUANTIZATION CONFIGURATION (Crucial for Free Colab)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# 3. SECURE HUGGING FACE LOGIN
print("Attempting secure Hugging Face login...")
try:
    hf_token = userdata.get('HUGGING_FACE_TOKEN')
    login(token=hf_token, add_to_git_credential=True)
    print("✅ Hugging Face authentication successful.")

except userdata.SecretNotFoundError:
    print("❌ ERROR: Secret 'HUGGING_FACE_TOKEN' not found.")
    print("Please set your Hugging Face token in the Colab Secrets panel (🔑 icon).")
except Exception as e:
    print(f"❌ ERROR during Hugging Face login: {e}")

# ----------------------------------------------------------------------
# 4. Model Loading Logic
# ----------------------------------------------------------------------
MODEL_NAME_PRIMARY = "microsoft/Phi-3-mini-4k-instruct"
MODEL_NAME_FALLBACK = "stabilityai/stablelm-zephyr-3b"

tokenizer = None
model = None
generator = None

def load_model(model_name=MODEL_NAME_PRIMARY):
    global tokenizer, model, generator

    # Primary Model Attempt
    try:
        logger.info(f"Attempting to load primary model: {model_name}")
        tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)

        # Load model with quantization
        model = AutoModelForCausalLM.from_pretrained(
            model_name,
            device_map="auto" if torch.cuda.is_available() else None,
            torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
            low_cpu_mem_usage=True,
            quantization_config=bnb_config, # <-- 4-bit quantization applied
        )

        # No `device` argument: the model is already placed via device_map
        generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
        logger.info(f"✅ Primary Model ({model_name}) loaded successfully.")

    except Exception as e:
        logger.warning(f"❌ Failed to load primary model {model_name}: {e}")

        # --- Fallback Attempt ---
        try:
            logger.info(f"Loading fallback model: {MODEL_NAME_FALLBACK}")
            # trust_remote_code=True for custom tokenizers (like GPTNeoX)
            tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME_FALLBACK, use_fast=False, trust_remote_code=True)

            # Load fallback model with quantization
            model = AutoModelForCausalLM.from_pretrained(
                MODEL_NAME_FALLBACK,
                device_map="auto" if torch.cuda.is_available() else None,
                quantization_config=bnb_config, # <-- 4-bit quantization applied
                trust_remote_code=True
            )

            # No `device` argument: the model is already placed via device_map
            generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
            logger.info(f"✅ Fallback model ({MODEL_NAME_FALLBACK}) loaded successfully.")

        except Exception as e_fallback:
            logger.error(f"❌ Failed to load **both** primary and fallback models. Final Error: {e_fallback}")
            print("\nFATAL ERROR: Could not load any model. Check your Hugging Face access, token, and model names.")


# --- Execution ---
load_model()

Attempting secure Hugging Face login...
✅ Hugging Face authentication successful.


`torch_dtype` is deprecated! Use `dtype` instead!


Device set to use cuda:0
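
As a rough sanity check on why 4-bit loading matters on a free Colab GPU, the arithmetic below estimates weight memory at two precisions. The parameter count of 3.8e9 is an assumption based on the published size of Phi-3-mini. The estimate ignores quantization constants, activations, and the KV cache, so real usage will be higher. The fp16 figure is in line with the two safetensors shards the primary model downloads (4.97 GB and 2.67 GB).

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 3.8e9  # assumption: approximate parameter count of Phi-3-mini

fp16_gb = weight_memory_gb(N_PARAMS, 16)  # half-precision weights
nf4_gb = weight_memory_gb(N_PARAMS, 4)    # 4-bit weights, overhead not counted

print(f"fp16 weights:  ~{fp16_gb:.1f} GB")
print(f"4-bit weights: ~{nf4_gb:.1f} GB")
```
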


In [6]:
# Cell 7: RAG pipeline: combine retrieval + generation
# ------------------------------

from typing import List, Dict, Any

def build_prompt_with_context(user_query: str, retrieved_docs: List[Dict[str, Any]], prompt_config: Dict[str, Any] = None):
    """
    Build a prompt for the LLM using the retrieved documents and prompt configuration.
    """
    # Use base retrieval prompt by default
    prompt_config = prompt_config or PROMPTS["base_retrieval_prompt"]
    role = prompt_config["role"]
    goal = prompt_config["goal"]
    response_format = prompt_config["response_format"]

    # Compose context: include titles and clipped text for each retrieved doc
    context_pieces = []
    for d in retrieved_docs:
        snippet = d["text"][:800].replace("\n", " ")
        context_pieces.append(f"=== {d['title']} ===\n{snippet}")
    context_block = "\n\n".join(context_pieces)

    prompt = (
        f"{role}\n\n"
        f"Goal: {goal}\n\n"
        f"Context:\n{context_block}\n\n"
        f"User question: {user_query}\n\n"
        f"Instructions: Follow the response format below.\n"
        f"{response_format}\n\n"
        f"Final Answer:\n"
    )
    return prompt


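def generate_answer(prompt: str, max_new_tokens: int = 512, temperature: float = 0.0):
    """
    Generate an answer from the LLM for a given prompt.
    The output is truncated at the first "Sources:" marker, so the returned
    text never contains a sources list.
    """
    if generator is None:
        return "LLM not loaded."

    do_sample = temperature > 0.0
    gen_kwargs = {
        "max_new_tokens": max_new_tokens,
        "do_sample": do_sample,
        "return_full_text": False,
    }
    # Pass temperature only when sampling; greedy decoding ignores it and
    # transformers may warn if it is set alongside do_sample=False.
    if do_sample:
        gen_kwargs["temperature"] = temperature

    out = generator(prompt, **gen_kwargs)

    chunk = out[0]["generated_text"] if isinstance(out, list) else str(out)

    # Drop "Sources:" and everything after it to prevent repeated instructions
    if "Sources:" in chunk:
        chunk = chunk.split("Sources:")[0]

    return chunk.strip()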


def extract_sources(answer_text: str, retrieved_docs: List[Dict[str, Any]]):
    """
    Extract the source titles listed after "Sources:" in the answer text.
    Falls back to the retrieved document titles when none are listed.
    """
    # partition() yields an empty src_text when "Sources:" is absent
    _, _, src_text = answer_text.partition("Sources:")
    sources = [line.strip("-• ").strip() for line in src_text.split("\n") if line.strip()]
    return sources or [d["title"] for d in retrieved_docs]


def rag_respond(question: str, top_k: int = 4):
    """
    Full RAG response pipeline: retrieve docs, build prompt, generate answer, extract sources.
    """
    # Cap retrieval at two documents, regardless of the requested top_k
    top_k = min(top_k, 2)
    retrieved = retrieve(question, top_k=top_k)

    if not retrieved:
        return {
            "answer": PROMPTS["no_context_prompt"]["response_format"],
            "sources": [],
            "retrieved_docs": []
        }

    # Build prompt and generate answer
    prompt = build_prompt_with_context(question, retrieved)
    answer_text = generate_answer(prompt)

    # Extract sources cleanly
    sources = extract_sources(answer_text, retrieved)

    return {
        "answer": answer_text,
        "sources": sources,
        "retrieved_docs": retrieved
    }
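
The source-extraction step above can be tried in isolation. The sketch below is a standalone variant, not the notebook's own function: `parse_sources` and the sample strings are hypothetical names introduced here for illustration. It assumes the model ends its answer with a "Sources:" section listing one title per line, which is exactly the part that `generate_answer` trims away in this notebook.

```python
from typing import List


def parse_sources(answer_text: str, fallback_titles: List[str]) -> List[str]:
    """Return titles listed after a 'Sources:' marker, else the fallback titles."""
    # partition() returns (text, "", "") when the marker is absent.
    _, marker, tail = answer_text.partition("Sources:")
    if not marker:
        return list(fallback_titles)
    # Strip bullet characters and whitespace, then drop lines that end up empty.
    stripped = (line.strip("-• \t") for line in tail.splitlines())
    titles = [title for title in stripped if title]
    return titles or list(fallback_titles)


# Hypothetical model output that still carries its "Sources:" section.
sample = (
    "Seller accounts require business verification.\n"
    "Sources:\n"
    "- Shoplite Seller Account Setup\n"
    "• Shoplite User Registration Process"
)

print(parse_sources(sample, ["fallback title"]))
print(parse_sources("No marker in this answer.", ["Shoplite User Registration Process"]))
```

If the reported sources should reflect what the model actually cited, one option is to parse the raw generation before trimming it; otherwise the fallback to retrieved titles is the only path that runs.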


In [7]:
# ------------------------------
# Cell 8: Flask app for endpoints
# ------------------------------
from flask import Flask, request, jsonify
from threading import Thread

app = Flask(__name__)

@app.route("/health", methods=["GET"])
def health():
    return jsonify({
        "status": "ok",
        "device": DEVICE,
        "model_loaded": model is not None,
        "faiss_index_size": FAISS_INDEX.ntotal
    })

@app.route("/ping", methods=["POST"])
def ping():
    """
    Direct LLM ping without retrieval.
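    JSON body: {"prompt": "...", "max_new_tokens": 128, "temperature": 0.0}
    """
    try:
        # Treat a missing or non-JSON body as empty so the 400 check below handles it
        payload = request.get_json(silent=True) or {}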
        prompt = payload.get("prompt", "")
        max_new_tokens = int(payload.get("max_new_tokens", 128))
        temp = float(payload.get("temperature", 0.0))
        if not prompt:
            return jsonify({"error": "Empty prompt"}), 400
        out = generate_answer(prompt, max_new_tokens=max_new_tokens, temperature=temp)
        return jsonify({"response": out})
    except Exception as e:
        logger.exception("Ping error")
        return jsonify({"error": str(e)}), 500

@app.route("/chat", methods=["POST"])
def chat():
    """
    RAG chat endpoint.
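    JSON body: {"question": "...", "top_k": 2}
    top_k defaults to 2, and rag_respond() caps it at 2.
    """
    try:
        # Treat a missing or non-JSON body as empty so the 400 check below handles it
        payload = request.get_json(silent=True) or {}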
        question = payload.get("question", "")
        top_k = int(payload.get("top_k", 2))
        if not question:
            return jsonify({"error": "Empty question"}), 400
        result = rag_respond(question, top_k=top_k)
        return jsonify(result)
    except Exception as e:
        logger.exception("Chat error")
        return jsonify({"error": str(e)}), 500

def run_flask():
    # Flask listens on all interfaces (0.0.0.0), port 5005
    app.run(host="0.0.0.0", port=5005)

# Run Flask in a background thread so notebook remains interactive
flask_thread = Thread(target=run_flask, daemon=True)
flask_thread.start()
logger.info("Flask started in background thread.")
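
Once the server thread is running, the endpoints can be called from any HTTP client. The following is a minimal sketch using only the standard library. `build_chat_request` and `BASE_URL` are hypothetical names introduced here; the port matches `run_flask` above, and the address assumes the client runs on the same machine as the notebook. The actual network call is left commented out so the snippet runs without a live server.

```python
import json
import urllib.request

# Assumption: the Flask server started by run_flask() is reachable locally.
BASE_URL = "http://127.0.0.1:5005"


def build_chat_request(question: str, top_k: int = 2, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for the /chat endpoint with a JSON body."""
    body = json.dumps({"question": question, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("Which payment methods does Shoplite accept?")
print(req.get_method(), req.full_url)

# To send it while the Flask thread is running:
# with urllib.request.urlopen(req, timeout=120) as resp:
#     result = json.loads(resp.read())
#     print(result["answer"], result["sources"])
```

The same request shape works against the public ngrok URL by passing it as `base_url`.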

In [8]:
!pip install ngrok

Collecting ngrok
  Downloading ngrok-1.5.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (18 kB)
Downloading ngrok-1.5.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.8 MB)
Installing collected packages: ngrok
Successfully installed ngrok-1.5.1


In [9]:
from pyngrok import ngrok
ngrok.kill()


In [12]:
from pyngrok import ngrok, conf
from google.colab import userdata
# 🔐 Load secrets from Colab's Secrets tab
ngrok_token = userdata.get("NGROK_AUTH_TOKEN")
reserved_domain = userdata.get("NGROK_DOMAIN")

# ✅ Safety check
if not ngrok_token or not reserved_domain:
    raise ValueError("Missing ngrok credentials. Please add NGROK_AUTH_TOKEN and NGROK_DOMAIN in the Secrets tab.")

# 🔧 Set token and kill any lingering tunnels
conf.get_default().auth_token = ngrok_token
ngrok.kill()

# 🚀 Start tunnel bound to your reserved domain
public_url = ngrok.connect(
    addr=5005,
    bind_tls=True,
    hostname=reserved_domain
).public_url

# 📣 Display endpoints
print("✅ ngrok tunnel established at:", public_url)
print("Endpoints:")
print(f" - Chat (RAG): {public_url}/chat")
print(f" - Ping (LLM only): {public_url}/ping")
print(f" - Health: {public_url}/health")


✅ ngrok tunnel established at: https://shu-tenantlike-muddly.ngrok-free.dev
Endpoints:
 - Chat (RAG): https://shu-tenantlike-muddly.ngrok-free.dev/chat
 - Ping (LLM only): https://shu-tenantlike-muddly.ngrok-free.dev/ping
 - Health: https://shu-tenantlike-muddly.ngrok-free.dev/health
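
The safety check in the cell above reports both secrets as missing even when only one is. A small helper can name exactly which keys are absent, which also keeps the error message in sync with the keys actually requested. This is a sketch under assumptions: `require_secrets` is a hypothetical helper, and a plain dict stands in for `google.colab.userdata` so the code runs outside Colab. Colab's `userdata.get` may raise rather than return an empty value for a secret that was never created; that case is not handled here.

```python
from typing import Callable, Dict, Iterable, Optional


def require_secrets(get: Callable[[str], Optional[str]], names: Iterable[str]) -> Dict[str, str]:
    """Fetch each named secret via `get`; raise one error listing all that are missing."""
    values = {name: get(name) for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise ValueError(f"Missing secrets: {', '.join(missing)}. Add them in the Secrets tab.")
    return values


# Stand-in for the Colab secrets store (assumption: placeholder values only).
fake_store = {"NGROK_AUTH_TOKEN": "token-placeholder", "NGROK_DOMAIN": ""}

try:
    require_secrets(fake_store.get, ["NGROK_AUTH_TOKEN", "NGROK_DOMAIN"])
except ValueError as err:
    print(err)
```

In the notebook, the getter would be `userdata.get` and the returned dict would supply the token and the reserved domain.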


In [13]:
# ------------------------------
# Cell 10: Quick local tests (optional)
# ------------------------------
# Example tests you can run in the notebook cell
example_qs = [
    "How do I create a seller account on Shoplite?",
    "What are the return policies and how do I track an order?",
    "Which payment methods does Shoplite accept?"
]

for q in example_qs:
    print("Q:", q)
    res = rag_respond(q)
    print("A:", res["answer"][:600])
    print("Sources:", res["sources"])
    print("-" * 80)

# ------------------------------
# End of notebook
# ------------------------------


The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Q: How do I create a seller account on Shoplite?




A: To create a seller account on Shoplite, you need to follow these steps:

1. Visit the Shoplite registration portal and click on the 'Create Seller Account' button.
2. Fill in the required information, including your business name, tax ID, bank details, and contact information.
3. Upload proof of business ownership, such as incorporation certificates or trade licenses.
4. Wait for the Shoplite compliance team to review your submission, which typically takes 3–5 business days.
5. Once your account is approved, log in to the Seller Dashboard to list your products, manage inventory, and configure 
Sources: ['Shoplite Seller Account Setup', 'Shoplite User Registration Process']
--------------------------------------------------------------------------------
Q: What are the return policies and how do I track an order?
A: Return policies at Shoplite allow buyers to return unused items in their original packaging within 14 days of delivery, with refunds issued back to the original payment m