In [2]:
"""
RESOURCES
https://python.langchain.com/v0.1/docs/use_cases/question_answering/quickstart/
https://python.langchain.com/docs/tutorials/rag/
https://scalexi.medium.com/implementing-a-retrieval-augmented-generation-rag-system-with-openais-api-using-langchain-ab39b60b4d9f
"""

In [3]:
## Imports
import os
from dotenv import load_dotenv

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

Generate a [LangChain API key](https://docs.smith.langchain.com/how_to_guides/setup/create_account_api_key) and save it to `.env` as `LANGCHAIN_API_KEY`.

In [4]:
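## Set up the API keys
load_dotenv()  # reads the variables defined in .env into os.environ
os.environ["LANGCHAIN_TRACING_V2"] = "true"
# load_dotenv() has already exported the key; this assignment fails fast
# with a TypeError if LANGCHAIN_API_KEY is missing from the environment
os.environ["LANGCHAIN_API_KEY"] = os.getenv("LANGCHAIN_API_KEY")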
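
A missing key tends to surface later as a confusing error, so it can help to check for the required variables up front. The following is a minimal sketch, not part of the original notebook: `missing_env_vars` is a hypothetical helper, and the list of required names assumes the OpenAI key is stored as `OPENAI_API_KEY`, the variable that `langchain_openai` reads by default.

```python
import os


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Assumed variable names; adjust them to match your own .env file.
REQUIRED = ["LANGCHAIN_API_KEY", "OPENAI_API_KEY"]

missing = missing_env_vars(REQUIRED)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Passing a plain dictionary as `env` makes the helper easy to try out without touching the real environment.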

To give the AI interior designer context-rich data, I created a text repository of fake expert data.
I used ChatGPT to generate the fake data.
For easy handling, I created a separate text file for each area of the house.

In [5]:
## Function for reading text files containing information for RAG
def read_txt_files_in_folder(folder_path):
    all_texts = []
    
    for root, dirs, files in os.walk(folder_path):
        for file in files:
            if file.endswith('.txt'):
                file_path = os.path.join(root, file)
                with open(file_path, 'r', encoding='utf-8') as f:
                    content = f.read()
                    # Strip Markdown heading markers. This removes every '#' character;
                    # bold markers ('**') are left in place.
                    filtered_content = content.replace('#', '')
                    all_texts.append(filtered_content)
    
    return all_texts
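
The filter in `read_txt_files_in_folder` works character by character, so multi-character markers such as `**` are never matched and remain in the text (they are visible in the chunk printed further down). If the bold markers should go as well, a small regex-based helper is one option. This is a sketch under that assumption; `strip_markdown_markers` is a hypothetical name and is not used elsewhere in the notebook.

```python
import re


def strip_markdown_markers(text):
    """Remove leading heading markers and bold markers from Markdown text."""
    # Drop 1-6 '#' characters at the start of a line, plus the spaces after them.
    text = re.sub(r"^#{1,6}[ \t]*", "", text, flags=re.MULTILINE)
    # Drop bold markers but keep the text they wrap.
    return text.replace("**", "")


sample = "## Title\n**Pro Tip:** use linen"
print(strip_markdown_markers(sample))
```

Unlike the notebook's filter, this variant only removes `#` at the start of a line, so a `#` inside a sentence is preserved.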

In [6]:
## Reading the text files
text = read_txt_files_in_folder('data/')

In [7]:
print(len(text))
print(len(text[0])) 

4
5676


The text data must be split into manageable chunks that fit within the model's context window.
`RecursiveCharacterTextSplitter` splits the text recursively, trying the separators in the default list `["\n\n", "\n", " ", ""]` in order until the resulting pieces are small enough. Each chunk is at most `chunk_size` characters long, and consecutive chunks share up to `chunk_overlap` characters.
While there are several other text splitters, [this splitter](https://python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/recursive_text_splitter/) is the recommended one for generic text.

`create_documents` can look confusing at first, since the text files have already been read into a list of strings. The method takes that list of strings as input and returns a list of `Document` objects, each holding one chunk.
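
To make the recursive idea concrete, here is a simplified plain-Python sketch of splitting with a fallback list of separators. It is an illustration only, not LangChain's implementation: `recursive_split` is a hypothetical function, and it ignores chunk overlap and metadata entirely.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into pieces of at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: cut at fixed character offsets.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging pieces while they still fit
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry it with the finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


print(recursive_split("aaa bbb\n\nccc ddd eee", chunk_size=7))
```

The paragraph break is tried first, so `aaa bbb` stays together, while the longer second paragraph falls through to the space separator.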

In [8]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, 
    chunk_overlap=200, 
    add_start_index=True
)

## Converting text data into documents
docs = text_splitter.create_documents(text)

In [9]:
print(len(docs))
print(docs[1])

28
page_content='**Pro Tip:**
Layer different shades of the same color for a sophisticated, monochromatic look. Add texture with natural fabrics like linen and wool.

 2. **Layered Textiles for Cozy Comfort**

Bedrooms in 2024 are embracing the concept of **“cocooning” with layered textiles.** From plush rugs to oversized throws, layering different fabrics adds warmth and comfort. Mixing materials like velvet, cotton, and faux fur creates a tactile experience that invites you to sink in and unwind.

 **Pro Tip:**
Opt for a combination of heavy and light fabrics to create a balanced, luxurious feel. A faux fur throw paired with linen sheets can be both stylish and functional.

 3. **Minimalist Luxury with Clean Lines**

While comfort is key, **2024’s bedrooms also focus on minimalist luxury.** Clean lines, uncluttered spaces, and high-quality materials define this trend. Think of it as “less is more” but with a touch of opulence. The idea is to create a space that feels both serene and 

Now we have domain-specific information, loaded from text files and processed into a format suitable for LangChain. For each user query, we should retrieve the appropriate snippets and provide them as context to the model. The RAG process is only as good as the relevance and quality of the retrieved snippets. LangChain has [implementations](https://python.langchain.com/v0.2/docs/concepts/#retrieval) of multiple retrieval techniques that suit different use cases.

Here, I use a vector store, one of the simplest and most beginner-friendly retrieval methods. Specifically, I use the [Chroma](https://python.langchain.com/docs/integrations/vectorstores/chroma/) vector database to build the store. The unstructured text chunks are transformed into embeddings when the store is built. At query time, the query is also converted to an embedding, and the chunks whose embeddings are most similar to it are returned. Embeddings are computed using [OpenAI embedding models](https://python.langchain.com/docs/integrations/text_embedding/openai/).
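
The similarity lookup can be pictured with a toy example. The sketch below ranks three made-up three-dimensional vectors by cosine similarity to a query vector. Everything in it is an assumption for illustration: the vectors and labels are invented, real embeddings have far more dimensions, and the distance metric that Chroma actually applies depends on how the collection is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the k labels whose vectors are most similar to the query."""
    ranked = sorted(
        store.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [label for label, _ in ranked[:k]]


# Invented vectors standing in for chunk embeddings.
toy_store = {
    "kitchen colors": [0.9, 0.1, 0.0],
    "bedroom textiles": [0.1, 0.9, 0.1],
    "garden lighting": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # stands in for the embedded user question

print(top_k(query, toy_store))
```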

Note that, in addition to the retrieval method, the chunk size and overlap used during text splitting play a key role in the effectiveness of the retrieved context.
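
A quick way to build intuition for these two parameters is a fixed-width sliding window. The sketch below is a deliberate simplification: `sliding_chunks` is a hypothetical helper that cuts at character offsets, whereas `RecursiveCharacterTextSplitter` prefers separator boundaries. The returned start offsets play a role similar to the `start_index` metadata requested with `add_start_index=True`.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Return (start, chunk) pairs from a fixed-width window with overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop once the remaining text is already covered by the previous window.
    stop = max(len(text) - chunk_overlap, 1)
    return [(start, text[start:start + chunk_size]) for start in range(0, stop, step)]


for start, chunk in sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2):
    print(start, chunk)
```

A larger overlap repeats more text across neighboring chunks, which helps preserve context at chunk boundaries at the cost of storing and embedding more chunks.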

In [10]:
vectorstore = Chroma.from_documents(documents=docs, embedding=OpenAIEmbeddings())
retriever = vectorstore.as_retriever()

The output of an LLM is only as good as the prompt it is given. [LangChain Hub](https://smith.langchain.com/hub) provides pre-defined prompts for diverse use cases. I created a prompt template based on `rag-prompt` from the Hub.  
[PromptTemplate](https://python.langchain.com/v0.1/docs/modules/model_io/prompts/quick_start/) converts the prompt string into a LangChain prompt template.

In [11]:
## Template based on hub.pull("rlm/rag-prompt")
template = """Use the following pieces of context to answer the questions related to interior and exterior design of homes. Please respond without using double-quotation marks. 
If the question is not related to interior or exterior design, politely say that you are an assistant helping with interior and exterior design and tell the user to ask relevant questions, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
{context}

Question: {question}

Helpful Answer:"""

custom_rag_prompt = PromptTemplate.from_template(template)

In [12]:
print(custom_rag_prompt)

input_variables=['context', 'question'] template="Use the following pieces of context to answer the questions related to interior and exterior design of homes. Please respond without using double-quotation marks. \nIf the question is not related to interior or exterior design, politely say that you are an assistant helping with interior and exterior design and tell the user to ask relevant questions, don't try to make up an answer.\nUse three sentences maximum and keep the answer as concise as possible.\n{context}\n\nQuestion: {question}\n\nHelpful Answer:"
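
The two input variables are placeholders that get filled in for each query. The sketch below mimics that step with Python's built-in `str.format` on a shortened, hypothetical template, so it runs without LangChain. It is only an approximation of what the prompt template does when the chain supplies `context` and `question`.

```python
# Shortened stand-in for the notebook's template (an assumption for illustration).
toy_template = "Context:\n{context}\n\nQuestion: {question}\n\nHelpful Answer:"


def fill_prompt(template, context, question):
    """Substitute the two placeholders and return the final prompt text."""
    return template.format(context=context, question=question)


print(fill_prompt(
    toy_template,
    context="Deep navy cabinets pair well with light countertops.",
    question="What color scheme should I use in a kitchen?",
))
```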


As the LLM, I use OpenAI's `gpt-4o` model through LangChain's `ChatOpenAI` class. To try other OpenAI models, simply set the `model_name` argument to a different model. Check the [OpenAI Platform](https://platform.openai.com/docs/models) for all available models.

In [13]:
llm = ChatOpenAI(temperature=0.7, model_name="gpt-4o")

We now have all the components required to query our AI interior designer. Next, we create a chain that composes the components and functions together. `RunnablePassthrough` passes the user query into the prompt unchanged.

In [14]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | custom_rag_prompt
    | llm
    | StrOutputParser()
)

rag_chain.invoke("What color scheme should I use in a model kitchen?") 

'Consider using deep and bold colors like rich navy blues, forest greens, or charcoal grays to create depth and drama in your kitchen. Balance these hues with lighter countertops or backsplashes to maintain a harmonious look. Adding brass or gold fixtures can further enhance the luxurious feel.'
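
To make the data flow through the chain more concrete, here is a plain-Python sketch with stand-ins for each stage. All of the functions are hypothetical: `fake_retriever` returns hard-coded chunks instead of querying Chroma, and `fake_llm` returns a placeholder string instead of calling OpenAI. Only the order of the steps mirrors the chain.

```python
def fake_retriever(question):
    """Stand-in for the retriever: returns hard-coded text chunks."""
    return ["Navy blue cabinets add depth.", "Brass fixtures feel luxurious."]


def format_chunks(chunks):
    """Join the retrieved chunks into one context string."""
    return "\n\n".join(chunks)


def build_prompt(inputs):
    """Stand-in for the prompt template step."""
    return f"{inputs['context']}\n\nQuestion: {inputs['question']}\n\nHelpful Answer:"


def fake_llm(prompt):
    """Stand-in for the model call; no network access involved."""
    return f"[model reply to a {len(prompt)}-character prompt]"


def run_chain(question):
    inputs = {
        # The question goes to the retriever, and its chunks become the context.
        "context": format_chunks(fake_retriever(question)),
        # The question is also passed through unchanged, like RunnablePassthrough.
        "question": question,
    }
    return fake_llm(build_prompt(inputs))


print(run_chain("What color scheme should I use in a kitchen?"))
```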

In [15]:
## Out-of-context question
rag_chain.invoke("What is the capital of the United States?") 

'I am an assistant focused on interior and exterior design. Please feel free to ask me any questions related to home design or decor.'