# Strategic Planning Project (RAG and LLM Prompt Solution Section Preview) #

**by Pranai Fuang-Arom**

In our quest to enhance the quality of life for citizens, we examine the underlying causes of pressing issues such as chronic traffic congestion in the Bangkok Metropolitan Region. By identifying these root causes and their associated pain points, we help Members of Parliament (MPs) communicate more effectively with their constituents and address their longstanding concerns with precision and empathy.

Our approach necessitates rigorous validation of proposed solutions. This involves scrutinizing whether such measures align with guidelines and recommendations set forth by esteemed global organizations like the United Nations and the World Economic Forum. Such a methodical review ensures that our proposals are not only effective but also resonate with international best practices.

Furthermore, it is imperative to convey the significance of these proposals to the public in a way that highlights their direct impact on daily life and future prospects. Drawing on key insights from Maslow's hierarchy of needs and other relevant indicators, we aim to show how these solutions improve the well-being and livelihoods of individuals and their families, paving the way for a brighter future for the next generation.

The success of this endeavor hinges on public belief in the relevance and urgency of these issues and solutions. Once there is widespread recognition of their importance, we can anticipate substantial support for implementing these proposals in the Thai Parliament. This collective effort will not only solve immediate problems but also lay the groundwork for sustainable development and prosperity.

## Introduction

**Retrieval-Augmented Generation (RAG) in the Prompt**:
The prompt uses RAG to augment the LLM's knowledge with additional data, such as recent or private documents that were not part of the model's training set. This is essential for AI applications that must reason about current or domain-specific information.
In the prompt, RAG consists of two stages: indexing (loading data from a source, splitting it, and storing it in a searchable index) and the retrieval-generation chain (retrieving the relevant chunks from the index at runtime and passing them to the model together with the user's question). Combining each query with retrieved source material lets the prompt ground its answers for user groups, such as residents of Thailand, in that material rather than in the model's training data alone (Langchain (RAG), 2023).

**Multiple Chains in the Prompt for Complex Query Processing**:
The prompt leverages the concept of multiple chains from Langchain, allowing for more complex and nuanced processing of queries. This involves using different components or "chains" in a sequence or parallel configuration to handle various aspects of a query.
For example, in finding solutions for specific user groups and benefits, the prompt might use one chain to retrieve current data relevant to the query, another to process this data in the context of existing knowledge, and yet another to generate insights or recommendations. This multi-chain approach allows the prompt to handle complex queries with greater depth and precision, offering more detailed and contextually relevant solutions (Langchain (Multiple), 2023).
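The multi-chain idea can be illustrated without any LLM calls. The sketch below is a plain-Python stand-in, not LangChain's API: the hypothetical functions `retrieve_chain`, `process_chain`, and `generate_chain` play the roles of the retrieve, process, and generate chains described above, and `run_sequence` feeds each chain's output into the next.

```python
from functools import reduce
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Chain = Callable[[State], State]

# Hypothetical mini-corpus standing in for the indexed documents
CORPUS = [
    "Bus coverage is sparse in the outer districts.",
    "Rail service is concentrated in the city centre.",
    "Rice exports rose last year.",
]


def retrieve_chain(state: State) -> State:
    """Chain 1: keep corpus lines that share at least one word with the query."""
    query_words = set(state["query"].lower().split())
    hits = [doc for doc in CORPUS if query_words & set(doc.lower().rstrip(".").split())]
    return {**state, "context": hits}


def process_chain(state: State) -> State:
    """Chain 2: restate each retrieved line as a lowercase finding."""
    gaps = [doc.rstrip(".").lower() for doc in state["context"]]
    return {**state, "gaps": gaps}


def generate_chain(state: State) -> State:
    """Chain 3: draft one recommendation per finding (an LLM would do this in practice)."""
    recs = [f"{i}. Address the finding that {gap}." for i, gap in enumerate(state["gaps"], 1)]
    return {**state, "recommendations": recs}


def run_sequence(chains: List[Chain], state: State) -> State:
    """Feed each chain's output state into the next chain, in order."""
    return reduce(lambda current, chain: chain(current), chains, state)


result = run_sequence(
    [retrieve_chain, process_chain, generate_chain],
    {"query": "bus and rail service"},
)
for line in result["recommendations"]:
    print(line)
```

Passing a single state dictionary between steps keeps every intermediate result (context, findings, recommendations) available for inspection, which is convenient when debugging a longer chain.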


```python
# Importing necessary modules
# (import paths match the LangChain releases of late 2023; newer versions move
# these classes to the langchain_community and langchain_openai packages)
import json
import os

from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import WebBaseLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.prompts import ChatPromptTemplate
from langchain.schema import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

# Set the OpenAI API key as an environment variable.
# Replace the placeholder with your own key, or export OPENAI_API_KEY in your shell and delete this line.
os.environ["OPENAI_API_KEY"] = "REPLACE THIS WITH YOUR OWN API KEY"
```

## Data Preprocessing:

The data preprocessing section involves several steps to prepare the document for subsequent processing in the RAG pipeline.

### Document Loading:

- **WebBaseLoader Initialization:**
  - The document loader is initialized with a specific URL. For this demo, the chosen document is "A Grid-Based Spatial Analysis for Detecting Supply–Demand Gaps of Public Transports: A Case Study of the Bangkok Metropolitan Region."

### Text Splitting:

- **RecursiveCharacterTextSplitter Initialization:**
  - A text splitter is initialized to break down large documents into smaller chunks.
  - Parameters:
    - `chunk_size=500`: The maximum length of each chunk, measured in characters by default.
    - `chunk_overlap=0`: The number of characters shared by consecutive chunks; `0` means chunks do not overlap.

- **Document Splitting:**
  - The loaded document is split into chunks using the initialized text splitter (`text_splitter.split_documents(loader.load())`).
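To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window character splitter. It is not LangChain's `RecursiveCharacterTextSplitter`, which first tries to split on paragraph, line, and word boundaries (`"\n\n"`, `"\n"`, `" "`, `""`) before falling back to raw characters; `split_fixed` is a hypothetical helper that only shows how the two parameters interact.

```python
from typing import List


def split_fixed(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters. Simplified illustration only.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window moves forward
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


print(split_fixed("abcdefghij", chunk_size=4, chunk_overlap=0))
print(split_fixed("abcdefghij", chunk_size=4, chunk_overlap=2))
```

With no overlap, the windows are `abcd`, `efgh`, `ij`; with an overlap of 2 they become `abcd`, `cdef`, `efgh`, `ghij`. The notebook's `chunk_overlap=0` means a sentence that straddles a chunk boundary is cut in two, while a small overlap keeps some shared context between neighbouring chunks.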

### Text Embeddings:

- **OpenAIEmbeddings Initialization:**
  - OpenAI embeddings are initialized to convert text into vector representations.
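Embeddings are useful because texts with similar meaning map to nearby vectors, which are commonly compared with cosine similarity. The sketch below uses hand-picked three-dimensional vectors as hypothetical stand-ins; real OpenAI embeddings have far more dimensions (1,536 for the `text-embedding-ada-002` model).

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors standing in for embeddings of three short texts
bus_delays = [0.9, 0.1, 0.0]
train_delays = [0.8, 0.2, 0.1]
rice_prices = [0.0, 0.1, 0.9]

print(round(cosine_similarity(bus_delays, train_delays), 3))
print(round(cosine_similarity(bus_delays, rice_prices), 3))
```

The two transport vectors score about 0.984, while the bus and rice vectors score about 0.012. LangChain's FAISS wrapper ranks by Euclidean distance by default, but for unit-length vectors (OpenAI embeddings are normalized to length 1) that ranking is the same as ranking by cosine similarity.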

### Vector Store Creation:

- **FAISS Vector Store Creation:**
  - A FAISS vector store is created from the document chunks using OpenAI embeddings.
  - Parameters:
    - `documents=splits`: The chunks obtained from document splitting.
    - `embedding=OpenAIEmbeddings()`: The embedding method used for vectorization.

### Retriever Initialization:

- **Retriever Creation:**
  - The vector store is converted into a retriever (`retriever = vectorstore.as_retriever()`), enabling subsequent information retrieval in the RAG pipeline.

This data preprocessing prepares the document by breaking it into manageable chunks, converting text into vector representations, and creating a retriever for efficient information retrieval in the later stages of the pipeline.
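The flow summarized above (embed each chunk once, embed the query, return the most similar chunks) can be imitated in a few lines. This sketch swaps OpenAI embeddings for a bag-of-words count vector and FAISS for a brute-force scan, so it only illustrates the idea; `ToyVectorStore`, `embed`, and `cosine` are hypothetical names. Like LangChain's `as_retriever()`, which returns 4 documents by default, `retrieve` defaults to `k=4`.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Hypothetical stand-in for OpenAIEmbeddings: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Brute-force in-memory index; FAISS performs the same lookup far more efficiently."""

    def __init__(self, chunks: List[str]) -> None:
        # Indexing: embed every chunk once and store the vector next to its text
        self.index: List[Tuple[str, Counter]] = [(chunk, embed(chunk)) for chunk in chunks]

    def retrieve(self, query: str, k: int = 4) -> List[str]:
        # Retrieval: embed the query and return the k most similar chunks
        query_vec = embed(query)
        ranked = sorted(self.index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore([
    "Bus stops are sparse in outer districts.",
    "Rail lines serve the city centre.",
    "Rice prices rose this year.",
])
print(store.retrieve("where are bus stops sparse", k=1))
```

Embedding the chunks once at indexing time is what makes retrieval cheap: each query needs only one new embedding plus a similarity lookup.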

```python
# Initialize a document loader with a specific URL. For this demo, we use "A Grid-Based Spatial Analysis
# for Detecting Supply–Demand Gaps of Public Transports: A Case Study of the Bangkok Metropolitan Region"
loader = WebBaseLoader("https://www.mdpi.com/2071-1050/12/24/10382/htm")
# Initialize a text splitter that breaks large documents into chunks of up to 500 characters
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
# Split the loaded document into chunks using the text splitter
splits = text_splitter.split_documents(loader.load())
# Initialize OpenAI embeddings
embeddings = OpenAIEmbeddings()
# Create a FAISS vector store from the document chunks, reusing the embeddings instance above
vectorstore = FAISS.from_documents(documents=splits, embedding=embeddings)
# Convert the vector store into a retriever for subsequent information retrieval
retriever = vectorstore.as_retriever()
```

## RAG Pipeline Overview:

The `rag_chain` is built from components joined with the `|` operator: each component receives the previous component's output as its input. Together they handle context retrieval, prompt construction, answer generation, and output parsing.

### Retriever and Question Pass-through:

- `{"context": retriever, "question": RunnablePassthrough()}`:
  - **retriever:** The retriever created earlier. It receives the user's question and returns the most similar document chunks, which fill the `context` key.
  - **RunnablePassthrough():** Passes the question through unchanged, so it fills the `question` key.

### Prompt for Context and Question:

- **prompt:**
  - **ChatPromptTemplate:** A template that instructs the model to answer from the supplied context. It fills the `{context}` and `{question}` placeholders with the values produced by the previous step.
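`ChatPromptTemplate.from_template` treats `{context}` and `{question}` as f-string-style placeholders. A plain `str.format` call shows roughly what the filled prompt looks like; the context snippet and question below are made-up examples, and `demo_template` is a copy of the notebook's template under a different name so it does not overwrite the real one.

```python
demo_template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use five sentences maximum and keep the answer as concise as possible.
{context}

Question: {question}
Helpful Answer:"""

# Hypothetical retrieved snippet and user question
filled = demo_template.format(
    context="Bus coverage is sparse in the outer districts of the region.",
    question="Where are public transport gaps largest?",
)
print(filled)
```

Because the instructions and the retrieved text share one prompt, the "don't make up an answer" line only helps if the retrieved chunks actually contain the answer; retrieval quality still determines answer quality.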

### Chat Model for Answer Generation:

- **model:**
  - **ChatOpenAI:** A chat model backed by OpenAI's API. It receives the filled prompt and generates an answer as a chat message.

### Output Parsing:

- **StrOutputParser():**
  - Extracts the text of the model's chat message as a plain string, which is easier to handle and print.
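The `|` operator is what turns these four components into one chain: each component's output becomes the next one's input, and a dictionary of components runs every entry on the same input. The following is a simplified plain-Python imitation of that behaviour, not LangChain's implementation; `Step`, `to_step`, and the `fake_*` components are hypothetical stand-ins so the flow runs without an API key.

```python
from typing import Any, Callable, Dict, Union


class Step:
    """Simplified imitation of a runnable: wraps a function and supports `|` chaining."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: Any) -> "Step":
        # self | other: run self first, then feed its output to other
        nxt = to_step(other)
        return Step(lambda value: nxt.invoke(self.invoke(value)))

    def __ror__(self, other: Any) -> "Step":
        # Handles a plain dict on the left of `|`, as in {"context": ...} | prompt
        return to_step(other) | self


def to_step(obj: Union[Step, Dict[str, Any], Callable[[Any], Any]]) -> Step:
    """Coerce a Step, a dict of steps, or a plain function into a Step."""
    if isinstance(obj, Step):
        return obj
    if isinstance(obj, dict):
        # A dict runs every entry on the same input and collects the results by key
        steps = {key: to_step(value) for key, value in obj.items()}
        return Step(lambda value: {key: step.invoke(value) for key, step in steps.items()})
    return Step(obj)


# Hypothetical stand-ins for the retriever, prompt, model, and output parser
passthrough = Step(lambda x: x)
fake_retriever = Step(lambda question: ["Bus coverage is sparse in outer districts."])
fake_prompt = Step(lambda d: f"Context: {' '.join(d['context'])}\nQuestion: {d['question']}")
fake_model = Step(lambda text: {"content": "Stub answer. " + text.splitlines()[0]})
fake_parser = Step(lambda message: message["content"])

chain = {"context": fake_retriever, "question": passthrough} | fake_prompt | fake_model | fake_parser
answer = chain.invoke("Where are transit gaps largest?")
print(answer)
```

The single question string is fanned out by the dictionary step: the retriever turns it into context while the passthrough keeps it as the question, which is why `rag_chain.invoke(...)` only needs the question as input.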


```python
# Define a template for the chat prompt
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use five sentences maximum and keep the answer as concise as possible.
{context}

Question: {question}
Helpful Answer:"""
prompt = ChatPromptTemplate.from_template(template)

# Initialize a chat model using ChatOpenAI
model = ChatOpenAI()

# Define a runnable chain for the Retrieval-Augmented Generation (RAG) process
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

# Inspect the raw loaded document
loader.load()
```


## Summarization of the Outputs

### User Group & Desired Benefit:

The user group is set to "Residents of Thailand" and the desired benefit to "Traveling faster." These two inputs set the context for the solutions that follow.

### Wanted Features:

The chain's first query asks which feature the user group needs. It answers with an improvement to the public transport system that offers faster transportation options, which is appropriate given the retrieved context.

### Solutions:

The chain's second query returns a numbered list of potential solutions that align with the identified user needs:

1. **Improving Existing Transport System:**
   - It suggests enhancing the efficiency and reliability of the current public transport system in Thailand. This aligns with the desire for faster travel options and addresses existing infrastructure.

2. **Investing in Infrastructure Development:**
   - Recommends investing in transportation infrastructure development, such as expanding and upgrading public transportation networks. This solution targets long-term enhancements to accommodate faster travel.

3. **Promoting Alternative Modes:**
   - Suggests promoting alternative modes of transportation, such as cycling or walking, for shorter distances. This diversifies the approach and offers sustainable options for short trips that avoid road congestion.

4. **Encouraging Carpooling/Ride-sharing:**
   - Recommends encouraging carpooling or ride-sharing services to reduce the number of vehicles on the road. This addresses traffic congestion and potentially enhances travel speed.

5. **Implementing Traffic Management Measures:**
   - Proposes implementing measures to reduce traffic congestion, like traffic management systems or congestion pricing. This tackles one of the significant barriers to faster travel.


```python
# Define the user group and benefit as prompt variables
user_group = "Residents of Thailand"
for_benefit = "Traveling faster"

# Define a function that queries the RAG chain for wanted features and solutions
def find_solutions(user_group, for_benefit):
    wanted_features = rag_chain.invoke(
        f"For the {user_group} and their wishes for {for_benefit}, what feature is needed"
    )
    solutions = rag_chain.invoke(
        f"For the {user_group} and their wishes for {for_benefit}, list suitable solutions"
    )
    json_object = {
        "user_group": user_group,
        "wanted_features": wanted_features,
        "for_benefit": for_benefit,
        "solutions": solutions.split("\n"),
    }
    json_string = json.dumps(json_object, indent=2)
    print(json_string)

# Invoke the function with the specified user group and benefit
find_solutions(user_group, for_benefit)
```

```json
{
  "user_group": "Residents of Thailand",
  "wanted_features": "Improvement of the public transport system with faster transportation options.",
  "for_benefit": "Traveling faster",
  "solutions": [
    "1. Improving the efficiency and reliability of the existing public transport system in Thailand.",
    "2. Investing in the development of transportation infrastructure, such as expanding and upgrading public transportation networks.",
    "3. Promoting the use of alternative modes of transportation, such as bicycles or walking, for shorter distances.",
    "4. Encouraging the adoption of carpooling or ride-sharing services to reduce the number of vehicles on the road.",
    "5. Implementing measures to reduce traffic congestion, such as implementing traffic management systems or implementing congestion pricing."
  ]
}
```
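`solutions.split("\n")` keeps the model's own numbering and would add an empty string `""` for any blank line in the reply. A small post-processing helper could strip both. `parse_list_answer` is a hypothetical addition, not part of the original notebook, and it assumes the model answers with one item per line, optionally prefixed by `1.`, `2)`, `-`, `*`, or `•` followed by a space.

```python
import re
from typing import List

# Leading list marker ("1." / "2)" or a bullet) followed by at least one space
_MARKER = re.compile(r"^(?:\d+[.)]|[-*•])\s+")


def parse_list_answer(text: str) -> List[str]:
    """Turn a numbered or bulleted LLM answer into clean list items."""
    items = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines the model may emit between items
        items.append(_MARKER.sub("", line, count=1))
    return items


print(parse_list_answer("1. Improve buses\n\n2) Build rail\n- Carpool\n3.5 million riders"))
```

Requiring a space after the marker keeps values such as `3.5 million riders` intact. Inside `find_solutions`, the `"solutions"` entry could then be built as `parse_list_answer(solutions)`.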


## Further Applications of the Codes and the Outputs

### Community Engagement and Grassroots Campaigns:

**Application:**
Utilizing the solutions as a foundation for grassroots campaigns and community engagement initiatives in Thailand to encourage behaviors like carpooling and alternative transport modes.

**Utilization:**
The proposed solutions align with strategies described in the TDM Encyclopedia, providing tangible suggestions for community-driven campaigns that promote alternative transportation modes in Thailand (Institute, 2018).

### Technology Development for Traffic Management:

**Application:**
Exploring technological solutions, such as implementing traffic management systems, inspired by the suggestions provided to alleviate congestion in Thailand.

**Utilization:**
The proposed solutions echo the strategies outlined in the ITF report, serving as a practical starting point for the development and implementation of traffic management systems tailored to Thailand's congestion issues (OECD, 2007).


## Proposed Improvements & Problems

### Improvements Required:

**Adding Functions for Dynamic Capabilities Theory:**

The prompt can be improved by incorporating the principles of Dynamic Capabilities Theory, which focuses on an organization's ability to adapt to changes in its environment by reconfiguring its resources and capabilities (Marcello M. Mariani, 2022). To implement this, the prompt can integrate functionalities that allow it to dynamically:

- Search for and integrate data from various sources, especially those that reflect current market trends and public demands.
- Regularly update its source URLs to include current and emerging topics in relevant fields.
- Utilize APIs from frequently updated data sources like news sites, journals, and databases that provide insights into changing public demands.
- Implement a feedback mechanism to continuously refine and update the search parameters based on the changing environment.
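The source-updating items above could start as simply as a registry that records when each source was last loaded and flags the stale ones for re-ingestion. Everything in this sketch (`Source`, `due_for_refresh`, the `example.org` URLs, and the refresh intervals) is hypothetical; re-ingesting a flagged source would reuse the loader, splitter, and vector-store steps from the Data Preprocessing section.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Source:
    url: str
    last_loaded: datetime
    max_age: timedelta  # how long this source's content is trusted before reloading


def due_for_refresh(sources: List[Source], now: datetime) -> List[str]:
    """Return the URLs whose loaded content is older than their allowed age."""
    return [s.url for s in sources if now - s.last_loaded > s.max_age]


now = datetime(2024, 1, 31)
sources = [
    # Fast-moving source: news older than 12 hours is considered stale
    Source("https://example.org/traffic-news", datetime(2024, 1, 30), timedelta(hours=12)),
    # Slow-moving source: an academic study is refreshed at most once a year
    Source("https://example.org/transit-study", datetime(2023, 12, 1), timedelta(days=365)),
]
print(due_for_refresh(sources, now))
```

Giving each source its own `max_age` lets frequently updated feeds refresh often without repeatedly re-embedding slow-changing documents. A feedback mechanism could later adjust these intervals based on how often a source's content actually changes.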

**Combining Sources for Detailed Outputs:**

The current prompt primarily uses a single source for data retrieval. To improve this, the system can be designed to aggregate and synthesize information from multiple sources (Marcello M. Mariani, 2022). This can be achieved by:

- Expanding the document-loading step to pull data from a variety of sources (for example, by passing several URLs to `WebBaseLoader`).
- Implementing a keyword co-occurrence analysis to identify relevant themes and topics across different sources.
- Using advanced text analysis techniques to combine and synthesize information from these sources, ensuring a more comprehensive understanding of the topic.
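Keyword co-occurrence analysis counts how often pairs of keywords appear in the same source; frequent pairs point to themes shared across sources. A minimal sketch, assuming each source has already been reduced to a set of keywords (the keyword sets below are invented for illustration, and `co_occurrence` is a hypothetical helper):

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Set


def co_occurrence(keyword_sets: Iterable[Set[str]]) -> Counter:
    """Count unordered keyword pairs that appear together in the same source."""
    counts: Counter = Counter()
    for keywords in keyword_sets:
        # sorted() makes ("bus", "rail") and ("rail", "bus") the same key
        for pair in combinations(sorted(keywords), 2):
            counts[pair] += 1
    return counts


# Hypothetical keyword sets extracted from three sources
keyword_sets = [
    {"congestion", "bus", "pricing"},
    {"congestion", "bus", "rail"},
    {"congestion", "pricing"},
]
pair_counts = co_occurrence(keyword_sets)
print(pair_counts.most_common(2))
```

Here `("bus", "congestion")` and `("congestion", "pricing")` each appear in two sources, suggesting bus service and congestion pricing as recurring themes worth retrieving from every source.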

### Problems Identified:

**Specificity of User Group or Benefits:**

If the `user_group` or `for_benefit` parameters are too specific, they may not find relevant data in the chosen sources. To mitigate this issue:

- Broaden the retrieval query, or the set of source URLs loaded by `WebBaseLoader`, to cover general keywords related to the specific `user_group` or `for_benefit`.
- Implement a tiered search approach where the system first looks for specific information and, if not found, gradually broadens its search scope.
- Include a fallback mechanism where the system can provide generalized suggestions or insights if specific data is not available.
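The tiered search and fallback ideas can be combined in one loop: try the most specific query first, broaden it step by step, and return a generic answer only if every tier comes back empty. In this sketch, `tiered_search` and `naive_search` are hypothetical; in the notebook, the search function would be a retriever call, and the query tiers and corpus below are invented examples.

```python
from typing import Callable, List, Sequence, Tuple


def tiered_search(
    queries: Sequence[str],
    search: Callable[[str], List[str]],
    fallback: str,
) -> Tuple[str, List[str]]:
    """Try queries from most to least specific and return the first non-empty result.

    Returns (query_used, results), or ("fallback", [fallback]) if no tier finds anything.
    """
    for query in queries:
        results = search(query)
        if results:
            return query, results
    return "fallback", [fallback]


# Hypothetical corpus and naive case-insensitive substring search
CORPUS = ["Bus coverage is sparse for commuters in outer Bangkok districts."]


def naive_search(query: str) -> List[str]:
    return [doc for doc in CORPUS if query.lower() in doc.lower()]


tiers = [
    "night-shift workers in Nonthaburi",  # too specific: no match in the corpus
    "commuters in outer Bangkok",         # broader: matches
    "Bangkok",                            # broadest: only tried if the tiers above fail
]
print(tiered_search(tiers, naive_search, "No specific data found; showing general guidance."))
```

Returning the query that produced the hit lets the final answer state how far the search had to broaden, which keeps generalized suggestions clearly distinguishable from group-specific ones.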


## Bibliography

1. Institute, V. T. (2018). *Transit Examples Determining the Value of Public Transit Service.* TDM Encyclopedia.

2. Langchain (Multiple). (2023). Retrieved from Multiple Chains: [https://python.langchain.com/docs/expression_language/cookbook/multiple_chains](https://python.langchain.com/docs/expression_language/cookbook/multiple_chains)

3. Langchain (RAG). (2023). Retrieved from Retrieval-augmented generation (RAG): [https://python.langchain.com/docs/use_cases/question_answering/](https://python.langchain.com/docs/use_cases/question_answering/)

4. Marcello M. Mariani, I. M. (2022). *Artificial intelligence in innovation research: A systematic review, conceptual framework, and future research directions.* Elsevier, 25.

5. OECD. (2007). *Managing Urban Traffic Congestion.* Joint Transport Research Centre, 31.
