In [1]:
pip install transformers sentence-transformers faiss-cpu langchain openai


Collecting transformers
  Downloading transformers-4.52.4-py3-none-any.whl.metadata (38 kB)
Collecting sentence-transformers
  Downloading sentence_transformers-4.1.0-py3-none-any.whl.metadata (13 kB)
Collecting faiss-cpu
  Downloading faiss_cpu-1.11.0-cp310-cp310-win_amd64.whl.metadata (5.0 kB)
Collecting langchain
  Downloading langchain-0.3.25-py3-none-any.whl.metadata (7.8 kB)
Collecting openai
  Downloading openai-1.83.0-py3-none-any.whl.metadata (25 kB)
Collecting filelock (from transformers)
  Using cached filelock-3.18.0-py3-none-any.whl.metadata (2.9 kB)
Collecting huggingface-hub<1.0,>=0.30.0 (from transformers)
  Downloading huggingface_hub-0.32.4-py3-none-any.whl.metadata (14 kB)
Collecting numpy>=1.17 (from transformers)
  Downloading numpy-2.2.6-cp310-cp310-win_amd64.whl.metadata (60 kB)
Collecting pyyaml>=5.1 (from transformers)
  Using cached PyYAML-6.0.2-cp310-cp310-win_amd64.whl.metadata (2.1 kB)
Collecting regex!=2019.12.17 (from transformers)
  Using cached regex-20

In [8]:
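import pandas as pd
# In langchain 0.3.x the integrations live in the separate `langchain-community`
# package (pip install langchain-community); the old `langchain.*` import paths
# are deprecated.
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import OpenAI
from langchain.chains import RetrievalQA
from langchain_core.documents import Document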

In [4]:
from dotenv import load_dotenv
import os

# Load variables from .env file into os.environ
load_dotenv()

# Now you can access the key
api_key = os.getenv("GROQ_API_KEY")


In [1]:
from dotenv import load_dotenv
import os
from groq import Groq

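# Step 1: Load the .env file into os.environ
load_dotenv()
# Step 2: Read the API key and fail early if it is missing
# (assigning None to os.environ would raise a TypeError)
api_key = os.getenv("GROQ_API_KEY")
if not api_key:
    raise RuntimeError("GROQ_API_KEY is not set; add it to your .env file")
# Step 3: Pass the API key to the client
client = Groq(api_key=api_key)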

# Step 4: Call the model
completion = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",
    messages=[
        {
            "role": "user",
            "content": "Which is the best machine learning model for text generation?"
        }
    ],
    temperature=0.6,
    max_completion_tokens=4096,
    top_p=0.95,
    stream=True,
    stop=None,
)

# Step 5: Stream the result
for chunk in completion:
    print(chunk.choices[0].delta.content or "", end="")


<think>
Okay, so I'm trying to figure out which machine learning model is best for text generation. I've heard a bit about different models like GPT and BERT, but I'm not entirely sure how they all stack up against each other. Let me start by breaking this down.

First, I know that text generation is about creating coherent and natural-sounding text. So the model needs to understand context and produce the next word or character that makes sense. I've heard terms like RNN, LSTM, and Transformer. I think RNNs are older and maybe not as good for long texts because of something called vanishing gradients. LSTMs are a type of RNN that helps with that problem, so they can handle longer sequences better. But then Transformers came along, right? They use attention mechanisms which allow them to process sequences in parallel and handle long-range dependencies more effectively. So maybe models based on Transformers are better for text generation these days.

Then there's GPT, which I believe st
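
The streamed output above starts with a `<think>` block, which is the model's intermediate reasoning rather than its final answer. A minimal sketch of removing that block before showing or storing the answer, assuming the reasoning is always wrapped in `<think>...</think>` tags as in this run. `strip_reasoning` is a hypothetical helper, not part of the Groq SDK, and the sample pieces are made up for illustration:

```python
import re

# Matches a complete <think>...</think> block, including newlines inside it.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_reasoning(text: str) -> str:
    """Remove <think>...</think> blocks and surrounding whitespace.

    Hypothetical helper. If the closing tag is missing (for example, the
    response was cut off), everything from the opening tag onward is dropped.
    """
    cleaned = THINK_BLOCK.sub("", text)
    # Handle an unterminated block left by a truncated response.
    cleaned = re.sub(r"<think>.*", "", cleaned, flags=re.DOTALL)
    return cleaned.strip()


# Collect the streamed pieces first, then clean the joined text.
pieces = [
    "<think>\nLet me compare",
    " RNNs and Transformers.\n</think>\n\n",
    "Transformer-based models ",
    "are a common choice.",
]
answer = strip_reasoning("".join(pieces))
print(answer)
```

In the streaming loop, the same idea would mean appending each `chunk.choices[0].delta.content or ""` to a list instead of printing it directly, then cleaning the joined string once the stream ends.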

Embedding to Vector DB

In [12]:
# 1. Load the CSV
df = pd.read_csv(r'C:\Users\v-niranr\OneDrive - Microsoft\Desktop\Telecom Usecase\Data\bitext-telco-llm-chatbot-training-dataset.csv')  # 🔁 Use your CSV file path
print(df.head())


                                         instruction           intent  \
0  there are charges on my phone bill that i do n...  dispute_invoice   
1  my internet bill is incorrect, can you help me...  dispute_invoice   
2              my invoice is not corect challenge it  dispute_invoice   
3  I don't recognize some charges on my bill, can...  dispute_invoice   
4       i have to dispute a fucking bill i need help  dispute_invoice   

  category    tags                                           response  
0  BILLING  BCELQZ  If you have noticed discrepancies in your bill...  
1  BILLING   BCILZ  If there is an issue with your bill and you wo...  
2  BILLING  BCELQZ  If you find any discrepancies in your invoice ...  
3  BILLING    BCIL  If you have found charges on your bill that yo...  
4  BILLING    BCQW  If you believe there is an error on your bill ...  


In [13]:
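# 2. Convert each row (all columns) to a single string
# Note: Series.to_string() shortens long values to pandas' display.max_colwidth
# (50 characters by default) and drops the column names, so each page_content
# below ends in "..." instead of holding the full text.
documents = [
    Document(page_content=row.to_string(index=False)) for _, row in df.iterrows()
]

print(documents)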

[Document(metadata={}, page_content='there are charges on my phone bill that i do no...\n                                   dispute_invoice\n                                           BILLING\n                                            BCELQZ\nIf you have noticed discrepancies in your bill ...'), Document(metadata={}, page_content='my internet bill is incorrect, can you help me ...\n                                   dispute_invoice\n                                           BILLING\n                                             BCILZ\nIf there is an issue with your bill and you wou...'), Document(metadata={}, page_content='             my invoice is not corect challenge it\n                                   dispute_invoice\n                                           BILLING\n                                            BCELQZ\nIf you find any discrepancies in your invoice a...'), Document(metadata={}, page_content="I don't recognize some charges on my bill, can ...\n                 
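
The printed documents show that the instruction and response text is cut off at "...", so the embeddings would be computed on truncated strings. One possible way around this, sketched here with plain dictionaries so it runs on its own, is to format each record as `column: value` lines. `row_to_text` is a hypothetical helper, and the record values are made up; only the column names come from the `df.head()` output:

```python
def row_to_text(row: dict) -> str:
    """Render one record as 'column: value' lines, keeping the full values."""
    return "\n".join(f"{column}: {value}" for column, value in row.items())


# Toy record: column names follow the dataset, values are illustrative only.
record = {
    "instruction": " ".join(["please review every charge on my invoice"] * 3),
    "intent": "dispute_invoice",
    "category": "BILLING",
}

text = row_to_text(record)
print(text)
```

With the real data, `df.to_dict(orient="records")` yields dictionaries of this shape, so each one could be wrapped as `Document(page_content=row_to_text(r), metadata={"intent": r["intent"]})`. Keeping `intent` and `category` in `metadata` is an assumption about what would be useful for filtering later, not something the notebook does.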

In [None]:
# 2. Load Embedding Model
embedding_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)


In [None]:

# 3. Create Vector DB
vectorstore = FAISS.from_documents(documents, embedding_model)

# 4. Load LLM (OpenAI here; can be replaced with HuggingFace model)
llm = OpenAI(temperature=0)  # Requires `OPENAI_API_KEY` set in your environment

# 5. Create RAG QA Chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever(),
    return_source_documents=True
)

# 6. Ask a Question
query = "Where is the Eiffel Tower?"
result = qa_chain.invoke({"query": query})  # calling the chain directly is deprecated

# 7. Show Answer + Context
print("Answer:", result["result"])
print("Context used:")
for doc in result["source_documents"]:
    print("-", doc.page_content)
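
To make the retrieval step less of a black box, here is a rough sketch of what `vectorstore.as_retriever()` does conceptually: embed the query, then return the stored texts whose vectors are closest to it. It assumes Euclidean (L2) distance, which I believe is the default for the LangChain FAISS wrapper, and uses made-up 3-dimensional vectors in place of real MiniLM embeddings. `l2_distance` and `top_k` are hypothetical names, and this brute-force loop only illustrates the idea; it is not how FAISS is implemented:

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, indexed, k=4):
    """Return the k (distance, text) pairs closest to query_vec."""
    scored = [(l2_distance(query_vec, vec), text) for text, vec in indexed]
    return sorted(scored)[:k]


# Toy index: (text, vector) pairs with invented 3-d "embeddings".
indexed = [
    ("dispute an invoice", [0.9, 0.1, 0.0]),
    ("cancel a subscription", [0.1, 0.9, 0.0]),
    ("report a network outage", [0.0, 0.2, 0.9]),
]

query_vec = [0.8, 0.2, 0.1]  # pretend embedding of a billing question
hits = top_k(query_vec, indexed, k=2)
for distance, text in hits:
    print(f"{distance:.3f}  {text}")
```

The retrieved texts are then placed into the prompt that `RetrievalQA` sends to the LLM. A query unrelated to the indexed data still gets its nearest neighbours back, just at larger distances, so it is worth looking at the returned source documents when judging an answer.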

