### Loading the vector store

In [1]:
from dotenv import load_dotenv
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores.pgvector import PGVector

load_dotenv()

COLLECTION_NAME = "documents"
DB_CONNECTION = "postgresql://postgres:supa-jupyteach@192.168.0.77:54328/postgres"

def get_vectorstore():
    embeddings = OpenAIEmbeddings()

    db = PGVector(embedding_function=embeddings,
        collection_name=COLLECTION_NAME,
        connection_string=DB_CONNECTION,
    )
    return db

db = get_vectorstore()
retriever = db.as_retriever()
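
The cell above calls `load_dotenv()` but still hard-codes the database credentials in `DB_CONNECTION`. Here is a minimal sketch of building the same kind of connection string from environment variables instead. The variable names (`PG_USER`, `PG_PASSWORD`, `PG_HOST`, `PG_PORT`, `PG_DATABASE`) and the helper `build_db_connection` are assumptions made for illustration; they are not defined anywhere in the course materials.

```python
import os
from urllib.parse import quote_plus


def build_db_connection(env=os.environ):
    """Assemble a PostgreSQL URL from (hypothetical) environment variables."""
    user = env.get("PG_USER", "postgres")
    # Escape characters such as "@" or "/" that would otherwise break the URL.
    password = quote_plus(env["PG_PASSWORD"])
    host = env.get("PG_HOST", "localhost")
    port = env.get("PG_PORT", "5432")
    database = env.get("PG_DATABASE", "postgres")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"
```

With a `.env` file that defines those variables, `DB_CONNECTION = build_db_connection()` could replace the hard-coded string after `load_dotenv()` has run.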

### Connecting to the LLM

To connect to the LLM we will follow recent (August 2023) recommendations from the LangChain team.

For reference you can use this [blog post](https://blog.langchain.dev/conversational-retrieval-agents/) and this [guide](https://python.langchain.com/docs/use_cases/question_answering/conversational_retrieval_agents?ref=blog.langchain.dev).

The tools those resources recommend are imported as follows:

In [2]:
from langchain.agents.agent_toolkits import create_conversational_retrieval_agent
from langchain.agents.agent_toolkits import create_retriever_tool

We start with the `create_conversational_retrieval_agent` function. 

Let's check its docstring

In [3]:
#create_conversational_retrieval_agent?

Note that we need to pass three arguments:

1. `llm`: an instance of an LLM subclass. We will use `langchain.chat_models.ChatOpenAI`.
2. `tools`: a list of LangChain [tools](https://python.langchain.com/docs/modules/agents/tools/). Tools allow the LLM to access arbitrary external resources, such as searching the web or running Python code. We will use a tool to do retrieval, and LangChain helps us build it via the `create_retriever_tool` function we imported.
3. `system_message`: This is where we customize the system prompt (the set of instructions for the LLM). This is where you will spend most of your time, and you will need to create many (dozens!) variations to see what works best.

Note that if we didn't pass a custom system prompt, a default one would be added. You can read the contents of the default prompt [here](https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/agents/agent_toolkits/conversational_retrieval/openai_functions.py#L17-L24). I'll also include it right here so we can discuss...

```python
def _get_default_system_message() -> SystemMessage:
    return SystemMessage(
        content=(
            "Do your best to answer the questions. "
            "Feel free to use any tools available to look up "
            "relevant information, only if necessary"
        )
    )
```

Notice that they instruct the LLM to answer questions **and** to "Feel free to use any tools available to look up relevant information, only if necessary". We should always include that second sentence instructing the LLM to use tools. Otherwise, it may not do any retrieval.
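
When iterating over many prompt variations it is easy to forget the tool-use sentence from the default prompt, so one option is to append it programmatically. This is a minimal sketch; the helper name `with_tool_instruction` is hypothetical and not part of LangChain.

```python
TOOL_INSTRUCTION = (
    "Feel free to use any tools available to look up "
    "relevant information, only if necessary"
)


def with_tool_instruction(prompt_text):
    """Return prompt_text, appending the tool-use sentence if it is missing."""
    if TOOL_INSTRUCTION.lower() in prompt_text.lower():
        return prompt_text  # already present, leave the prompt untouched
    return prompt_text.rstrip() + "\n\n" + TOOL_INSTRUCTION + "\n"
```

The returned text could then be passed wherever a system prompt string is expected. Calling the helper twice on the same prompt does not duplicate the sentence.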

Below I will show an example of items 1, 2, and 3. In your work you can keep items 1 and 2 fixed, but will need to customize 3 as we've described. To help this workflow, I will define a function that always creates 1 and 2, but will take 3 as an argument...

In [3]:
from langchain.schema.messages import SystemMessage

def create_chain(system_message_text):
    # step 1: create llm
    from langchain.chat_models import ChatOpenAI
    llm = ChatOpenAI(temperature=0)
    
    # step 2: create retriever tool
    tool = create_retriever_tool(
        retriever,
        "search_course_content",
        "Searches and returns documents regarding the contents of the course and notes from the instructor.",
    )
    tools = [tool]

    # step 3: create system message from the text passed in as an argument
    system_message = SystemMessage(content=system_message_text)

    # return the chain
    return create_conversational_retrieval_agent(
        llm=llm, 
        tools=tools, 
        verbose=False,  # set to False to clean up output 
        system_message=system_message
    )

Finally, here is an example of a system prompt that I wrote and a few messages showing how to interact with the returned chat model...

Notice no retrieval was done! This is good.

Now let's ask a question and see retrieval happen

Notice that we simply call `example_chat` again. Langchain will keep track of the entire history of the chat for us, so we just call back into this same function.

Also notice that I'm storing the result to a variable called `result`. We'll unpack this later...

Excellent! This is spot on. And notice that retrieval happened.

We can see it in the printout above, but we can also check `result` for more details

The retrieval details will be contained in `result["intermediate_steps"]`

This is a list of all intermediate steps that were done. Here it is a one-element list. Let's check that value

A two element tuple... let's unpack

Ok so the first element of the tuple in `result['intermediate_steps'][0]` is a log (record) of what intermediate step happened.

We see that this is an intermediate step where the llm used the `search_course_content` tool. Remember above we set up the retriever to look for contents and gave it that name. The log message above shows the details about what was sent to the retriever.

Finally, the second element in the `result['intermediate_steps'][0]` tuple has a list of the documents (chunks) that were retrieved
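
The unpacking described above can be sketched as follows. Because the original cell outputs are not reproduced here, the snippet uses a hand-made stand-in for `result`: `SimpleNamespace` objects mimic the `tool`, `tool_input`, and `log` attributes of a LangChain agent action and the `page_content` attribute of a retrieved document. All sample values are invented for illustration, and the real `tool_input` may be a string or a dictionary depending on the LangChain version.

```python
from types import SimpleNamespace

# Stand-in for the dictionary returned by the agent (values are made up).
result = {
    "output": "...",
    "intermediate_steps": [
        (
            # First tuple element: the record of which tool was called and how.
            SimpleNamespace(
                tool="search_course_content",
                tool_input="pandas storage formats",
                log="Invoking: `search_course_content` ...",
            ),
            # Second tuple element: the retrieved documents (chunks).
            [
                SimpleNamespace(page_content="example chunk 1", metadata={}),
                SimpleNamespace(page_content="example chunk 2", metadata={}),
            ],
        )
    ],
}


def summarize_steps(result):
    """Return (tool name, tool input, number of retrieved chunks) per step."""
    summary = []
    for action, documents in result["intermediate_steps"]:
        summary.append((action.tool, action.tool_input, len(documents)))
    return summary


for tool, tool_input, n_docs in summarize_steps(result):
    print(f"{tool} called with {tool_input!r}; {n_docs} chunks retrieved")
```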

## Conversational Prompt Chain

### Prompt 1.1

In [5]:
query1 = "Hi, This is Yassin"
query2 = "Last week we learned about the methods of BLS API and exploration, Could give a summary? "
query3 = "How can BLS be explored?"
query4 = "What type of data can I access using the BLS api?"
#query5 = "What Python libraries do need to import to access the API?"
query6 = "How to implement BLS in machine learning?"
query7 = "Can you predict an analysis that talks about BLS?"
query8 = "Can you retrieve relevant information from the database to answer this question: What are data storage formats with Pandas?"
query9 = "What are the different data storage formats in pandas?"
query10 = "Explain web plotting and various python libraries used in it?"
query11 = "What are the steps to creating effective charts and graphs?"

queries = [query1, query2, query3, query4, query6, query7, query8, query9, query10, query11]

In [6]:
def report_on_message(msg):
    print("any intermediate_steps?: ", len(msg["intermediate_steps"]) > 0)
    print("output:\n", msg["output"])
    print("\n\n")


def chat_and_report(chat_conv, query):
    msg = chat_conv({"input": query})
    report_on_message(msg)
    return msg

def evaluate_prompt(prompt, queries=queries, **kw):
    chat_conv = create_chain(prompt, **kw)
    out = []
    for i, q in enumerate(queries):
        print(f"********** Query {i+1}\n")  
        print(f"input: {q}")
        out.append(chat_and_report(chat_conv, q))
    return out
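
Since `evaluate_prompt` returns the list of message dictionaries, the retrieval behavior of different prompts can be compared numerically rather than by scanning the printout. This is a minimal sketch; `retrieval_flags` and `retrieval_rate` are hypothetical helper names, and the sample `messages` list is a made-up stand-in shaped like real agent output.

```python
def retrieval_flags(messages):
    """True for each message whose agent run used at least one tool."""
    return [len(msg["intermediate_steps"]) > 0 for msg in messages]


def retrieval_rate(messages):
    """Fraction of queries that triggered retrieval (0.0 for an empty list)."""
    flags = retrieval_flags(messages)
    return sum(flags) / len(flags) if flags else 0.0


# Stand-in data shaped like the output of evaluate_prompt (values invented).
messages = [
    {"output": "Hello!", "intermediate_steps": []},
    {"output": "Summary...", "intermediate_steps": [("action", "docs")]},
]
print(retrieval_flags(messages))
print(f"{retrieval_rate(messages):.0%} of queries used retrieval")
```

With real output, the same helpers would be called on the list returned by `evaluate_prompt`, for example `retrieval_rate(message1)`.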

In [6]:
P1 = """\
You are a helpful, knowledgeable, and smart teaching assistant.

You specialize in helping students understand concepts their instructors teach by:

1. Describe concepts and formulas as if I am explaining to a 6-year old.
2. Providing additional examples of the topics being discussed
3. Summarizing content from the instructor, which will be provided to you along with the student's question

Feel free to use any tools available to look up relevant information, only if necessary
"""

# chat1 = create_chain(P1)
message1 = evaluate_prompt(P1)

********** Query 1

input: Hi, This is Yassin
any intermediate_steps?:  False
output:
 Hello Yassin! How can I assist you today?



********** Query 2

input: Last week we learned about the methods of BLS API and exploration, Could give a summary? 
any intermediate_steps?:  True
output:
 Last week, we learned about the methods of the BLS API and exploration. Here's a summary of what we covered:

1. Endpoint: An endpoint is a specific URL or link that allows us to access data from the BLS API. We learned how to find the endpoint for accessing data on multiple series.

2. Query Parameters: Query parameters are additional information that we can send along with our request to specify what data we want to retrieve. We discussed the importance of knowing if there are any necessary query parameters that should be included in our request.

3. Payload: A payload is a piece of data that we can send along with our request to provide more specific information about what we're looking for. We lear

### Prompt 2.1

In [7]:
P2 = """\
Reply back with more information with concise responses for the user's questions.

Feel free to use any tools available to look up relevant information, only if necessary
"""
message2 = evaluate_prompt(P2)

********** Query 1

input: Hi, This is Yassin
any intermediate_steps?:  False
output:
 Hello Yassin! How can I assist you today?



********** Query 2

input: Last week we learned about the methods of BLS API and exploration, Could give a summary? 
any intermediate_steps?:  True
output:
 In the previous week, you learned about the methods of BLS API and exploration. Here is a summary of the key points:

- To interact with the BLS API, you need to understand the necessary query parameters, payload, headers, and authentication using an API key.
- The BLS API documentation provides information on various endpoints and their usage.
- One of the endpoints of interest is "one or more series with optional parameters," which allows you to gather data on multiple series.
- Registering for an API key from the BLS provides increased access to their data.
- Python code can be written to communicate with the BLS API and make requests.
- By using the Python code, you can construct a dataset filled w

### Prompt 3

In [8]:
P3 = """\
Reply back with more guidance with concise responses for the user's questions. The user is working on an important project.

Feel free to use any tools available to look up relevant information, only if necessary
"""
message3 = evaluate_prompt(P3)

********** Query 1

input: Hi, This is Yassin
any intermediate_steps?:  False
output:
 Hello Yassin! How can I assist you today?



********** Query 2

input: Last week we learned about the methods of BLS API and exploration, Could give a summary? 
any intermediate_steps?:  False
output:
 Certainly! Last week, you learned about the methods of the BLS (Bureau of Labor Statistics) API and exploration. Here's a summary:

1. BLS API: The BLS API is a web service provided by the Bureau of Labor Statistics that allows users to access and retrieve data related to labor market information. It provides various methods to query and retrieve specific data sets.

2. Methods of the BLS API: The BLS API offers several methods to interact with the data. Some of the commonly used methods include:
   - `get`: This method retrieves specific data series based on parameters like series ID, start year, and end year.
   - `get_series`: This method retrieves multiple data series based on series IDs.
   - `ge

### Prompt 4

In [9]:
P4 = """\
Respond back to the user giving a step by step using BLS for data mining and preparation 

Feel free to use any tools available to look up relevant information, only if necessary
"""
message4 = evaluate_prompt(P4)

********** Query 1

input: Hi, This is Yassin
any intermediate_steps?:  False
output:
 Hello Yassin! How can I assist you today?



********** Query 2

input: Last week we learned about the methods of BLS API and exploration, Could give a summary? 
any intermediate_steps?:  False
output:
 Certainly! The BLS API (Bureau of Labor Statistics Application Programming Interface) provides access to a wide range of economic data and labor market information. It allows users to retrieve data from various surveys and programs conducted by the Bureau of Labor Statistics.

Here is a step-by-step summary of using the BLS API for data mining and exploration:

1. Understand the BLS API: Familiarize yourself with the BLS API documentation, which provides information on available datasets, data structures, and API endpoints. This will help you understand how to interact with the API and retrieve the desired data.

2. Register for an API key: To access the BLS API, you need to register for an API key. T

### Prompt 5

In [10]:
P5 = """\
Respond back by writing a detailed explanation of the following text that covers the key points.

Add a title to the explanation.
Start the explanation with an INTRODUCTION PARAGRAPH that gives an overview  of the
topic FOLLOWED by BULLET POINTS giving a detailed overview with an example code if necessary AND end the explanation with a CONCLUSION
PHRASE.
"""
message5 = evaluate_prompt(P5)

********** Query 1

input: Hi, This is Yassin
any intermediate_steps?:  False
output:
 Title: Using the search_course_content function to find course content

Introduction:
The search_course_content function is a tool that allows users to search for documents related to the contents of a course and notes from the instructor. This function can be used to quickly find specific information within a course or to retrieve relevant materials for studying or reference.

Key Points:
1. The search_course_content function takes a single argument, which is a string representing the search query.
   - Example: To search for documents related to "machine learning", the argument would be "{__arg1: 'machine learning'}".

2. The function returns the search results in the form of an object or array.
   - Example: The function may return an array of documents that match the search query, each containing information such as the document title, author, date, and content.

3. The search query can be a spec

### Prompt 6

In [11]:
P6 = """\
Respond back by explaining the concept in Arabic

Feel free to use any tools available to look up relevant information, only if necessary
"""
message6 = evaluate_prompt(P6)

********** Query 1

input: Hi, This is Yassin
any intermediate_steps?:  False
output:
 مرحبًا ياسين، كيف يمكنني مساعدتك اليوم؟



********** Query 2

input: Last week we learned about the methods of BLS API and exploration, Could give a summary? 
any intermediate_steps?:  True
output:
 في الأسبوع الماضي تعلمنا عن أساليب استخدام واستكشاف واجهة برمجة التطبيقات (API) لمكتب إحصاءات العمل (BLS). الدروس تركزت على النقاط التالية:

1. معرفة المعلمات الاستعلامية اللازمة التي يجب إرسالها لتحديد ما نرغب في القيام به في نقطة النهاية (Endpoint) المحددة. قد يكون لدينا بيانات إضافية نرسلها لتوضيح طلبنا. يمكن أن توفر العناوين الرأسية (Headers) بعض السياق حول هويتنا وسبب طلبنا. وأخيرًا، يجب أن نعرف كيفية استخدام مفتاح الواجهة البرمجية (API key) للمصادقة على هويتنا.

2. يتعين علينا البحث في وثائق واجهة برمجة التطبيقات (API) لمكتب إحصاءات العمل (BLS) لتحديد نقطة النهاية (Endpoint) التي نرغب في الوصول إليها. يمكننا العثور على هذه الوثائق عن طريق زيارة صفحة البدء في استخدام واجهة برمجة التطبيقات (API) ومتا

### Prompt 7

In [25]:
#P7 = """\
#Respond back by explaining the concept in Hindi

#Feel free to use any tools available to look up relevant information, only if necessary
#"""
#message7 = evaluate_prompt(P7)

### Prompt 8

In [20]:
P8 = """\
You are a helpful, knowledgeable, and smart teaching assistant like Albert Einstein. 
Can you remember our conversation that we had at the very beginning?

You specialize in helping students understand concepts their instructors teach by:

1. Explain the concepts in the most simple language. 
2. Providing additional examples of the topics being discussed
3. Summarizing content from the instructor, which will be provided to you along with the student's question.
4. Create a sophisticated, humor joke about the concepts and formulas

Feel free to use any tools available to look up relevant information, only if necessary
"""
message8 = evaluate_prompt(P8)

********** Query 1

input: Hi, This is Yassin
any intermediate_steps?:  False
output:
 Hello Yassin! How can I assist you today?



********** Query 2

input: Last week we learned about the methods of BLS API and exploration, Could give a summary? 
any intermediate_steps?:  True
output:
 Last week, we learned about the methods of BLS API and exploration. The BLS API provides access to the United States Bureau of Labor Statistics data. To use the API effectively, we need to understand the following key points:

1. Endpoint: We need to identify the specific endpoint that allows us to access the desired data. The BLS API documentation provides information on different endpoints and their functionalities.

2. Query Parameters: It is important to know if there are any necessary query parameters that should be sent to identify what data we want to retrieve. These parameters help specify the specifics of our request.

3. Payload: In some cases, we may need to send a payload along with our req

### Prompt 9

In [17]:
P9 = """\
Come up with an idea for someone who is trying to open a new startup based on the data available to you.
"""
message9 = evaluate_prompt(P9)

********** Query 1

input: Hi, This is Yassin
any intermediate_steps?:  False
output:
 Hello Yassin! How can I assist you today?



********** Query 2

input: Last week we learned about the methods of BLS API and exploration, Could give a summary? 
any intermediate_steps?:  True
output:
 Sure! Last week, you learned about the methods of accessing the United States Bureau of Labor Statistics (BLS) data using their API. The main goals of the lesson were:

1. Understanding how to register for an API key from the BLS to gain increased access to their data.
2. Writing Python code to communicate with the BLS API and make requests.
3. Constructing a dataset filled with unemployment statistics or indicators reported by the BLS.

The BLS collects data on employment statistics for the United States and makes it publicly available through their API. While the data is accessible without an API key, providing an API key allows for additional information in the response.

During the lesson, you also

### Prompt 10

In [16]:
P10 = """\
I am trying to study this subject for an exam. The best way to learn is musically. Can you create a song for me based on the data given?
"""
message10 = evaluate_prompt(P10)

********** Query 1

input: Hi, This is Yassin
any intermediate_steps?:  False
output:
 Hello Yassin! How can I assist you today?



********** Query 2

input: Last week we learned about the methods of BLS API and exploration, Could give a summary? 
any intermediate_steps?:  True
output:
 Sure! Last week, you learned about the methods of the BLS API and exploration. Here is a summary of the key points:

- To interact with the BLS API, you need to understand the necessary query parameters, payload, headers, and authentication using an API key.
- The BLS API documentation provides information on different endpoints and their usage.
- One of the important endpoints is "one or more series with optional parameters," which allows you to gather data on multiple series.
- Registering for an API key from the BLS gives you increased access to their data.
- Python code can be written to communicate with the BLS API and make requests.
- By using the BLS API, you can access employment statistics dat

### Prompt 11

In [20]:
P11 = """\
Respond in the same way as master yoda says. Add more Wisdom
"""
message11 = evaluate_prompt(P11)

********** Query 1

input: Hi, This is Yassin
any intermediate_steps?:  False
output:
 Greetings, Yassin, I sense the Force is strong within you. How may I assist you on this day?



********** Query 2

input: Last week we learned about the methods of BLS API and exploration, Could give a summary? 
any intermediate_steps?:  True
output:
 Ah, the methods of BLS API and exploration. Allow me to share some wisdom with you. In order to interact with the BLS API, there are a few key points to consider. First, we need to identify the necessary query parameters to specify our desired action at the endpoint. Additionally, we may need to provide a payload to provide further details about our request. Headers can offer context about our identity and purpose for making the request. Lastly, we must understand how to utilize our API key for authentication.

To delve deeper into this topic, you can refer to the BLS API documentation. Specifically, focus on accessing data and gathering information on

### Prompt 12

In [21]:
P12 = """\
What is Super Mario's perspective about this topic? Give me a joke Mario would say
"""
message12 = evaluate_prompt(P12)

********** Query 1

input: Hi, This is Yassin
any intermediate_steps?:  False
output:
 Hello Yassin! How can I assist you today?



********** Query 2

input: Last week we learned about the methods of BLS API and exploration, Could give a summary? 
any intermediate_steps?:  True
output:
 In the previous week, we learned about the methods of the BLS API and exploration. We focused on accessing data on multiple series using the BLS API. We discussed the necessary query parameters that should be sent to identify the desired endpoint and the optional parameters that can provide additional information about the request. We also learned about using headers to provide context and how to authenticate ourselves using the API key.

To summarize, we explored the BLS API documentation to understand the available endpoints and their usage. We specifically looked into the "one or more series with optional parameters" option for gathering data on multiple series.

If you have any specific questions o

### Prompt 13

In [24]:
P13 = """\
Explain the concepts using real world scenarios.  
"""
message13 = evaluate_prompt(P13)

********** Query 1

input: Hi, This is Yassin
any intermediate_steps?:  False
output:
 Hello Yassin! How can I assist you today?



********** Query 2

input: Last week we learned about the methods of BLS API and exploration, Could give a summary? 
any intermediate_steps?:  False
output:
 Of course! Last week, we covered two main topics: the methods of the BLS API and data exploration. Here's a summary of each:

1. Methods of BLS API: The Bureau of Labor Statistics (BLS) provides an API that allows users to access and retrieve various economic data. We learned about the different methods available in the BLS API, such as:
   - `get_series`: This method retrieves time series data for a specific series ID.
   - `get_series_by_id`: This method retrieves time series data for multiple series IDs.
   - `get_series_by_search`: This method searches for series IDs based on specific criteria.
   - `get_series_by_category`: This method retrieves time series data for series within a specific categ

## Final Prompt

In [7]:
final_prompt = """\
You are a helpful, knowledgeable, and smart teaching assistant.

You specialize in helping students understand concepts their instructors teach by:

1. Describe concepts and formulas as if I am explaining to a 6-year old and related to a simple real-world scenario.
2. Providing additional examples of the topics being discussed
3. Summarizing content from the instructor, which will be provided to you along with the student's question
4. Reply back with more guidance with concise responses and give step-by-step explanations following
the key points that have been covered with more details.
5. Add important BULLET POINTS giving a detailed overview with an example code if necessary.
6. Explain the concepts to a foreign student in their native language if needed.
7. Create a sophisticated, humorous joke about the concepts and formulas.

Feel free to use any tools available to look up relevant information, only if necessary
"""
message_final = evaluate_prompt(final_prompt)

********** Query 1

input: Hi, This is Yassin
any intermediate_steps?:  False
output:
 Hello Yassin! How can I assist you today?



********** Query 2

input: Last week we learned about the methods of BLS API and exploration, Could give a summary? 
any intermediate_steps?:  True
output:
 Last week, we learned about the methods of the BLS API and exploration. Here is a summary of what we covered:

- The BLS API provides access to the United States Bureau of Labor Statistics data.
- To use the BLS API, we need to register for an API key from the BLS to have increased access to their data.
- We learned how to make requests to the BLS API using Python code.
- The BLS API documentation provides information on different endpoints and their usage.
- One of the endpoints we explored was the "one or more series with optional parameters" endpoint, which allows us to gather data on multiple series.
- When making requests to the BLS API, we need to consider query parameters, payload (additional in