# NICK'S EXPERT QUERY ANSWER
### Hello guys, I have created this expert conversational chatbot that
### can answer questions about How To Talk To Girls... 🤩🤩

## 1️⃣ Setup: Imports & Configuration

In [26]:
# Imports 
import os
import glob
from dotenv import load_dotenv
import gradio as gr

# LangChain imports
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_text_splitters import CharacterTextSplitter
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_chroma import Chroma
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

# Visualization imports (not used in the cells shown here)
import numpy as np
from sklearn.manifold import TSNE
import plotly.graph_objects as go

In [27]:
# Load environment variables
load_dotenv()
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY", "your-key-if-not-using-env")
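
With the fallback string above, a missing key only shows up later as an authentication error. A minimal sketch of failing fast instead, assuming a hypothetical helper `require_api_key` that is not part of this notebook:

```python
import os


def require_api_key(name="OPENAI_API_KEY", placeholder="your-key-if-not-using-env"):
    """Return the key stored in the environment, or raise if it is missing.

    `require_api_key` is a hypothetical helper for illustration only.
    """
    value = os.environ.get(name, "").strip()
    # Treat both an empty value and the placeholder string as "not configured"
    if not value or value == placeholder:
        raise RuntimeError(f"{name} is not set; add it to your .env file or environment.")
    return value
```

Calling `require_api_key()` right after `load_dotenv()` would stop the notebook before any embedding request is made.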

In [28]:
# Define Model and Database Name
MODEL = "gpt-4o-mini"
DB_NAME = "vector_db"

## 2️⃣ Load and Process the Markdown File

In [30]:
# Ensure we're loading only directories inside the 'knowledge-base' folder
folders = [f for f in glob.glob("knowledge-base/*") if os.path.isdir(f)]

# Set text loader configurations
text_loader_kwargs = {'encoding': 'utf-8'}

documents = []
for folder in folders:
    doc_type = os.path.basename(folder)
    
    # Load all markdown files from the folder
    loader = DirectoryLoader(folder, glob="**/*.md", loader_cls=TextLoader, loader_kwargs=text_loader_kwargs)
    folder_docs = loader.load()

    for doc in folder_docs:
        doc.metadata["doc_type"] = doc_type
        documents.append(doc)

# Fail early if nothing was loaded (expects "How-to-Talk-to-Girls.md" in a subfolder)
if not documents:
    raise ValueError("No documents found! Ensure 'How-to-Talk-to-Girls.md' is in a subfolder of 'knowledge-base'.")

print(f"✅ Loaded {len(documents)} documents from knowledge base.")


✅ Loaded 1 documents from knowledge base.


## 3️⃣ Split Documents into Chunks for Vector Storage

In [31]:
# Split the document into smaller chunks for embedding
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(documents)

print(f"✅ Split documents into {len(chunks)} chunks for vector storage.")


Created a chunk of size 1180, which is longer than the specified 1000


✅ Split documents into 180 chunks for vector storage.
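
The "longer than the specified 1000" warning appears because `CharacterTextSplitter` splits on a separator (a blank line, `"\n\n"`, by default) and then merges the pieces up to `chunk_size`; a single piece that is already longer than the limit is kept whole. The sketch below is a simplified illustration of that behavior, not LangChain's implementation: `merge_paragraphs` is a hypothetical function and it ignores `chunk_overlap`.

```python
def merge_paragraphs(text, chunk_size=1000, separator="\n\n"):
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size.

    Simplified illustration: no overlap, and a piece that is longer than
    chunk_size on its own is emitted as an oversized chunk.
    """
    chunks, current = [], ""
    for piece in text.split(separator):
        piece = piece.strip()
        if not piece:
            continue
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece  # may exceed chunk_size if the piece itself is too long
    if current:
        chunks.append(current)
    return chunks
```

If oversized chunks are a concern, LangChain's `RecursiveCharacterTextSplitter` is one option, since it falls back to progressively smaller separators.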


## 4️⃣ Create & Store Embeddings in ChromaDB

In [32]:
# Use OpenAI embeddings (can replace with HuggingFace if needed)
embeddings = OpenAIEmbeddings()

# Delete existing vector database if it exists
if os.path.exists(DB_NAME):
    Chroma(persist_directory=DB_NAME, embedding_function=embeddings).delete_collection()

# Create new vector store
vectorstore = Chroma.from_documents(documents=chunks, embedding=embeddings, persist_directory=DB_NAME)
print(f"✅ Vectorstore created with {vectorstore._collection.count()} document chunks.")

# Retrieve collection for later use
collection = vectorstore._collection


✅ Vectorstore created with 180 document chunks.


## 5️⃣ Retrieve a Sample Embedding & Validate

In [33]:
# Get a sample embedding to verify it worked
sample_embedding = collection.get(limit=1, include=["embeddings"])["embeddings"][0]
dimensions = len(sample_embedding)

print(f"✅ The vectors have {dimensions} dimensions.")


✅ The vectors have 1536 dimensions.
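
Each chunk is now a 1536-dimensional vector, and retrieval works by ranking those vectors against the embedded question. The sketch below shows the idea on toy vectors in plain Python. It assumes cosine similarity as the metric, which is an illustrative choice; the metric the vector store actually uses depends on its configuration. `cosine_similarity` and `top_k` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=4):
    """Return the indices of the k vectors most similar to the query."""
    ranked = sorted(
        range(len(vectors)),
        key=lambda i: cosine_similarity(query, vectors[i]),
        reverse=True,
    )
    return ranked[:k]
```

For example, `top_k([1, 0], [[1, 0], [0, 1], [0.9, 0.1]], k=2)` ranks the first and third vectors ahead of the second.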


## 6️⃣ Set Up Conversation Memory & LLM Chain

In [34]:
# Initialize the language model
llm = ChatOpenAI(temperature=0.7, model_name=MODEL)

# Set up conversation memory for history tracking
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

# Create retriever abstraction over vector database
retriever = vectorstore.as_retriever()

# Set up the conversational retrieval chain (RAG)
conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory)

print("✅ Conversation chain with memory initialized.")


✅ Conversation chain with memory initialized.
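
Conceptually, a conversational retrieval chain rephrases a follow-up question using the chat history, retrieves chunks for the rephrased question, and then asks the model to answer from those chunks. The sketch below shows that flow with plain functions standing in for the LLM and the retriever. Every name in it is hypothetical, and it is not LangChain's code.

```python
def answer_with_history(question, history, condense, retrieve, generate):
    """Run one turn of a retrieval-augmented conversation.

    condense(question, history) -> standalone question (str)
    retrieve(standalone_question) -> list of context strings
    generate(standalone_question, context) -> answer (str)
    history is a list of (user, assistant) pairs and is updated in place.
    """
    # The first turn has no history, so the question is used as-is
    standalone = condense(question, history) if history else question
    context = retrieve(standalone)
    answer = generate(standalone, context)
    history.append((question, answer))
    return answer
```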




## 7️⃣ Define Chatbot Function for Gradio

In [35]:
# Chat function for Gradio interface
def chat(message, history):
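    """
    Send the user's message through the conversational retrieval chain and return the answer.

    Gradio passes `history`, but it is unused here: the chain's own memory tracks the conversation.
    """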
    result = conversation_chain.invoke({"question": message})
    return result["answer"]
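
With `type="messages"`, Gradio passes `history` as a list of dicts with `role` and `content` keys. The function above ignores it because the chain's memory object already stores the conversation, which also means that memory is shared by everyone using the app. If the history were to be supplied per request instead, it would first need converting. A small sketch, assuming a hypothetical helper `history_to_pairs`:

```python
def history_to_pairs(history):
    """Convert [{"role": ..., "content": ...}, ...] into (user, assistant) pairs."""
    pairs, pending_user = [], None
    for turn in history:
        role, content = turn.get("role"), turn.get("content")
        if role == "user":
            pending_user = content
        elif role == "assistant" and pending_user is not None:
            pairs.append((pending_user, content))
            pending_user = None
    # A trailing user message with no reply yet is dropped
    return pairs
```

Pairs of this shape are one format that could be passed as `chat_history` to a chain built without a memory object; treat that wiring as an assumption to verify against the LangChain version in use.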


## 8️⃣ Build Gradio Chat Interface

In [36]:
# Launch the chat interface with Gradio
view = gr.ChatInterface(chat, type="messages").launch(inbrowser=True)

print("✅ Gradio chat interface running. Open browser to interact.")


* Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.


✅ Gradio chat interface running. Open browser to interact.
