### Task 1: Calling an LLM with a Basic Prompt

Implement a call to an LLM with a plain text prompt.

The prompt should make the model explain a given topic in simple language. In this task, the topic is "Quantum Computing".

The model's response should include a definition, key advantages, and current research in this field.

Limit the response to 200 characters and specify in the prompt that the answer should be brief (this will save you time and money on generated tokens).
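A minimal sketch of a prompt that covers all three requested parts and the length limit. Note that a generation cap such as `max_new_tokens` limits *tokens*, not characters, so the 200-character limit is also enforced client-side here. The helper names `build_brief_prompt` and `truncate_to_limit` are hypothetical, not part of LangChain.

```python
MAX_CHARS = 200


def build_brief_prompt(topic: str, max_chars: int = MAX_CHARS) -> str:
    """Ask explicitly for the three required parts and a hard length limit."""
    return (
        f"Explain the topic '{topic}' in simple language for a non-technical reader. "
        "Include: (1) a one-sentence definition, (2) key advantages, "
        "(3) current research directions. "
        f"Be brief: the whole answer must fit in {max_chars} characters."
    )


def truncate_to_limit(text: str, max_chars: int = MAX_CHARS) -> str:
    """Enforce the character limit on the model's answer, cutting at a word boundary."""
    text = text.strip()
    if len(text) <= max_chars:
        return text
    # Reserve one character for the ellipsis, then drop the last (possibly partial) word.
    cut = text[: max_chars - 1].rsplit(" ", 1)[0]
    return cut.rstrip(" ,;:") + "…"
```

Usage with the model from this task would look like `truncate_to_limit(mistral.invoke(build_brief_prompt("Quantum Computing")))`.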

You can use either a model from HuggingFace (I recommend Mistral) or ChatGPT (GPT-4 or GPT-3) as the LLM. In both cases, import the corresponding LangChain "wrapper" (the class that initializes the model and calls it via the API) and read your personal token from a local file, for example `creds.json`. Do NOT submit this file with the homework and do NOT commit it to git 😏

Set your temperature value at your discretion (there is no right or wrong value here) and explain why you chose that specific value for this task.
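The cells that read the token and create the `mistral` model used below are not part of this export. A minimal sketch of both, under stated assumptions: the key name `HF_TOKEN` inside `creds.json` and the repo id `mistralai/Mistral-7B-Instruct-v0.2` are assumptions, and the helper names `load_creds` / `make_mistral` are hypothetical. The LangChain import is done lazily so the helper file loads even without `langchain-huggingface` installed.

```python
import json
from pathlib import Path


def load_creds(path: str = "creds.json") -> dict:
    """Read API tokens from a local JSON file (keep it out of git, e.g. via .gitignore)."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def make_mistral(creds: dict, temperature: float = 0.2, max_new_tokens: int = 200):
    """Build a HuggingFace-hosted Mistral model wrapped for LangChain."""
    # Lazy import: only needed when the model is actually created.
    from langchain_huggingface import HuggingFaceEndpoint

    return HuggingFaceEndpoint(
        repo_id="mistralai/Mistral-7B-Instruct-v0.2",  # assumption: any hosted Mistral instruct model
        huggingfacehub_api_token=creds["HF_TOKEN"],  # assumed key name in creds.json
        temperature=temperature,  # low value: focused, mostly on-topic answers
        max_new_tokens=max_new_tokens,  # caps generated tokens, not characters
    )
```

With this sketch, `mistral = make_mistral(load_creds())` would produce the object the following cells call.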

Requests can be written in either Ukrainian or English: consider whether and where you might want to publish this project later, and which language would fit better. In the solutions, prompts should be in Ukrainian.

Quantum computing

Note: Environment variable `HF_TOKEN` is set and is the current active token independently from the token you've just configured.


In [3]:
request = "Explain quantum computing in a way that anyone can understand, even without a technical background."

In [4]:
print(mistral.invoke(request))





Quantum computing is a new way to process information using the strange and counterintuitive rules of quantum mechanics. In our everyday life, we use classical computers that work with bits, which are like tiny switches that can be either on or off. Quantum computers, on the other hand, use quantum bits, or qubits, which can exist in multiple states at once.

Imagine a coin that can be heads or tails, but also spin at the same time, making it heads and tails at the same time in different directions. This might sound confusing, but it's the key to quantum computing's power. With qubits, quantum computers can process a vast number of possibilities all at once, making them incredibly fast for certain types of calculations.

Classical computers are like a single person trying to solve a complex puzzle by looking at one piece at a time. Quantum computers, however, are like a team of people, each looking


# Conclusions 1
- For this task, a Mistral model from HuggingFace was used; it is free to use and performed well on this task.
- The model gave a general explanation of how quantum computing differs from classical computing and started to describe its advantages, but it did not get to current research: the answer appears to have been cut off by the 200-token generation limit. The prompt also did not explicitly ask for a definition, advantages, current research, or a brief answer, which made a complete answer within the limit less likely.
- A temperature of 0.2 was chosen for this task: it keeps generation specific and focused while leaving a little room for creativity. At higher temperatures, the model started to stray from the topic and add a lot of irrelevant information.

### Task 2: Creating a Parameterized Prompt for Text Generation
Now we want to update the previous functionality so that we can pass the topic as a parameter in the prompt. To do this, use `PromptTemplate` from `langchain` and implement a parameterized prompt and a model call with it.

Run the updated functionality (prompt + model) to get explanations of the following topics:
- "Bayesian methods in machine learning"
- "Transformers in machine learning"
- "Explainable AI"

Display the results of the model's execution on the screen.

In [8]:
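from langchain_core.prompts import PromptTemplate

# Parameterized prompt: {topic} is filled in at call time
prompt = PromptTemplate(
    input_variables=["topic"],
    template="Explain {topic} in a way that anyone can understand, even without a technical background, but precise enough for a technical audience."
)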

topic_list = ['Bayesian methods in machine learning', 'Transformers in machine learning', 'Explainable AI']

for topic in topic_list:
    print("\nCURRENT TOPIC IS: ", topic)
    print(mistral.invoke(prompt.format(topic=topic)))




CURRENT TOPIC IS:  Bayesian methods in machine learning


Bayesian methods in machine learning are a way to make predictions or find patterns in data using probabilities. Imagine you have a box with different colored balls, and you want to know what percentage of each color is in the box without looking inside. You can make an educated guess based on the information you have, such as the colors of the balls you've seen before or the total number of balls.

In machine learning, we have data instead of balls, and we want to make predictions or find patterns. Bayesian methods allow us to update our beliefs or probabilities about the underlying patterns based on new data.

Let's say we have a dataset of emails, and we want to know if an email is spam or not. We can start with some initial beliefs or priors about the probability of an email being spam or not. As we receive more data, we can update our beliefs based on the new information.

For example, if we

CURRENT TOPIC IS:  Transformer





Transformers are a type of machine learning model that can be thought of as a super-powered version of autocorrelation, a statistical technique used in signal processing to understand the relationship between a signal and a lagged version of itself. In simpler terms, autocorrelation helps us find patterns in data by comparing it to a slightly delayed version of itself.

Transformers take this idea and apply it to understanding the relationships between words in a sentence or documents in a collection. Instead of comparing words to a lagged version of themselves, Transformers compare words to all other words in the context, allowing the model to understand the meaning and relationships between words in a more nuanced way.

Imagine a large party where everyone is talking at once. Each person's conversation is a sentence, and the overall noise is the collection of all the sentences. Autocorrelation would be like trying to understand one person's conversation by listening to their own wo



In [9]:
chain = prompt | mistral

print(chain.invoke({"topic": "Bayesian methods in machine learning"}))



Bayesian methods in machine learning are a way to make predictions or find patterns in data using probabilities. Imagine you have a box with different colored balls, and you want to know what percentage of each color is in the box without looking inside. You can make an educated guess based on the information you have, such as the colors of the balls you've seen before or the total number of balls.

In machine learning, we have data instead of balls, and we want to make predictions or find patterns. Bayesian methods allow us to update our beliefs or probabilities about the underlying patterns based on new data.

Let's say we have a dataset of emails, and we want to know if an email is spam or not. We can start with some initial beliefs or priors about the probability of an email being spam or not. As we receive more data, we can update our beliefs based on the new information.

For example, if we
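Instead of formatting the prompt in a loop, the same chain can process all topics in one call with `Runnable.batch`, which returns outputs in input order and may run the requests concurrently. A minimal sketch; the helper names `explain_topics` and `print_explanations` are hypothetical, and `chain` is assumed to be built as `prompt | mistral` above.

```python
def explain_topics(chain, topics):
    """Call the chain once per topic via batch() and key each answer by its topic."""
    # Each dict fills the {topic} variable of the PromptTemplate.
    outputs = chain.batch([{"topic": topic} for topic in topics])
    return dict(zip(topics, outputs))


def print_explanations(results):
    """Print each answer under the topic that produced it."""
    for topic, text in results.items():
        print(f"\nCURRENT TOPIC IS: {topic}\n{text}")
```

Usage: `print_explanations(explain_topics(prompt | mistral, topic_list))`. Keying the answers by topic keeps each printed header tied to the input that actually produced the text.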




# Conclusions 2
- The same language model was used, but the prompt had to be extended with "but precise enough for a technical audience": once the prompt was moved into `PromptTemplate`, the model started generating overly simplified, "childish" explanations.
- A chain (`prompt | mistral`) was also used, to practice how it works. Comparing the two calls for "Bayesian methods in machine learning", the outputs are identical, down to the same example of how Bayesian methods can be used. This is expected, since both calls send exactly the same formatted prompt to the model, and the low temperature leaves little room for variation.

### Task 3: Using an Agent to Automate Processes
Create an agent that automatically searches for information about the latest scientific publications in various fields. For example, the agent should find the 5 most recent publications on artificial intelligence.

**Steps:**
1. Set up a ReAct type agent in LangChain to perform automatic queries.
2. Create a prompt that directs the agent to search for information on the internet or in databases of scientific publications.
3. The agent should output a list of publications, each containing the title, authors, and a brief description.

To interact with the search, it is necessary to create a `Tool`. In the lecture, we used `serpapi`. You can continue using it or choose another API for searching (some of them are free). A list of various APIs available in LangChain and a pricing guide for queries can be found in a separate document [here](https://hannapylieva.notion.site/API-12994835849480a69b2adf2b8441cbb3?pvs=4).
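The built-in `arxiv` tool returns only the date, title, authors, and summary (see the agent log below), with no paper identifier, and it does not sort by recency. A hedged sketch of a custom `Tool` that queries the `arxiv` Python package directly, sorts by submission date, and passes the real abstract URL (`entry_id`) to the agent. It assumes the `arxiv` package 2.x (`arxiv.Client`) is installed; the names `search_recent_arxiv` and `arxiv_recent` are hypothetical.

```python
def format_arxiv_results(results) -> str:
    """Format arxiv.Result-like objects, keeping the real abstract URL (entry_id)."""
    blocks = []
    for r in results:
        authors = ", ".join(author.name for author in r.authors)
        summary = " ".join(r.summary.split())  # collapse the hard line breaks in abstracts
        blocks.append(f"Title: {r.title}\nAuthors: {authors}\nLink: {r.entry_id}\nSummary: {summary}")
    return "\n\n".join(blocks)


def search_recent_arxiv(query: str, max_results: int = 5) -> str:
    """Return the `max_results` most recently submitted arXiv papers matching `query`."""
    import arxiv  # lazy import: needs `pip install arxiv` and network access

    search = arxiv.Search(
        query=query,
        max_results=max_results,
        sort_by=arxiv.SortCriterion.SubmittedDate,
    )
    return format_arxiv_results(arxiv.Client().results(search))


def make_recent_arxiv_tool():
    """Wrap the search in a LangChain Tool (hypothetical name `arxiv_recent`)."""
    from langchain_core.tools import Tool

    return Tool(
        name="arxiv_recent",
        description="Search arXiv for the most recent papers on a topic. Input: a search query. "
        "Returns title, authors, link and summary; copy links exactly as returned.",
        func=search_recent_arxiv,
    )
```

It could be added next to the other tools with `tools.append(make_recent_arxiv_tool())`.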

Below is also an example that uses one of the free search APIs, DuckDuckGo (no token required!). You might like it :)

In [None]:
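from langchain import hub
from langchain_community.agent_toolkits.load_tools import load_tools

# `llm` is the gpt-4o ChatOpenAI model (temperature=0), created as in Task 4 below
tools = load_tools(["ddg-search"], llm=llm)   # DuckDuckGo web search, no token needed
tools.extend(load_tools(["arxiv"], llm=llm))  # search in the arXiv paper database

# Standard ReAct prompt from the LangChain Hub
prompt = hub.pull("hwchase17/react")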



In [143]:
tools

[DuckDuckGoSearchRun(api_wrapper=DuckDuckGoSearchAPIWrapper(region='wt-wt', safesearch='moderate', time='y', max_results=5, backend='auto', source='text')),
 ArxivQueryRun(api_wrapper=ArxivAPIWrapper(arxiv_search=<class 'arxiv.Search'>, arxiv_exceptions=(<class 'arxiv.ArxivError'>, <class 'arxiv.UnexpectedEmptyPageError'>, <class 'arxiv.HTTPError'>), top_k_results=3, ARXIV_MAX_QUERY_LENGTH=300, continue_on_failure=False, load_max_docs=100, load_all_available_meta=False, doc_content_chars_max=4000))]

In [146]:
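from langchain.agents import AgentExecutor, create_react_agent

agent = create_react_agent(llm, tools, prompt)
# handle_parsing_errors=True sends malformed ReAct output back to the model instead of raising
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, handle_parsing_errors=True)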

In [None]:
response = agent_executor.invoke({
    'input': "Search for publications, papers, or news on the topic of 'Artificial Intelligence'. Provide 5 publications on this topic. Provide a list of publications, including the title, authors, and a brief summary of each paper (link if possible)."
})

save_response = response
save_response



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mTo find recent publications on the topic of 'Artificial Intelligence', I will search for relevant papers on arXiv, as it is a comprehensive source for scientific articles in this field.

Action: arxiv
Action Input: Artificial Intelligence[0m[33;1m[1;3mPublished: 2023-04-06
Title: The Governance of Physical Artificial Intelligence
Authors: Yingbo Li, Anamaria-Beatrice Spulber, Yucong Duan
Summary: Physical artificial intelligence can prove to be one of the most important
challenges of the artificial intelligence. The governance of physical
artificial intelligence would define its responsible intelligent application in
the society.

Published: 2020-05-21
Title: Does an artificial intelligence perform market manipulation with its own discretion? -- A genetic algorithm learns in an artificial market simulation
Authors: Takanobu Mizuta
Summary: Who should be charged with responsibility for an artificial intelligence
performing 

{'input': "Search for publications, papers, or news on the topic of 'Artificial Intelligence'. Provide 5 publications on this topic. Provide a list of publications, including the title, authors, and a brief summary of each paper (link if possible).",
 'output': '1. **Title:** The Governance of Physical Artificial Intelligence  \n   **Authors:** Yingbo Li, Anamaria-Beatrice Spulber, Yucong Duan  \n   **Summary:** This paper discusses the governance of physical artificial intelligence, emphasizing its importance as a challenge in AI. It explores how governance can ensure responsible and intelligent application in society.  \n   [Link to paper](https://arxiv.org/abs/2304.006)\n\n2. **Title:** Does an artificial intelligence perform market manipulation with its own discretion? -- A genetic algorithm learns in an artificial market simulation  \n   **Authors:** Takanobu Mizuta  \n   **Summary:** The study investigates whether AI can discover market manipulation strategies through learning in

In [150]:
print(save_response['output'])

1. **Title:** The Governance of Physical Artificial Intelligence  
   **Authors:** Yingbo Li, Anamaria-Beatrice Spulber, Yucong Duan  
   **Summary:** This paper discusses the governance of physical artificial intelligence, emphasizing its importance as a challenge in AI. It explores how governance can ensure responsible and intelligent application in society.  
   [Link to paper](https://arxiv.org/abs/2304.006)

2. **Title:** Does an artificial intelligence perform market manipulation with its own discretion? -- A genetic algorithm learns in an artificial market simulation  
   **Authors:** Takanobu Mizuta  
   **Summary:** The study investigates whether AI can discover market manipulation strategies through learning in an artificial market simulation. It highlights the need for regulation to prevent AI from engaging in market manipulation.  
   [Link to paper](https://arxiv.org/abs/2005.021)

3. **Title:** Impact of Artificial Intelligence on Economic Theory  
   **Authors:** Tshilid

# Conclusions 3

- For the current agent, we connected various tools:
    - ddg-search for searching the internet for various articles on the topic
    - arxiv - a wrapper specifically designed for searching scientific articles in the open database arXiv
- In the first iterations, free LLMs such as Mistral and Gemma were used, but they hallucinated heavily: they produced non-existent article titles and invented authors, and it was very difficult to make them follow the instructions correctly.
- For the final solution, gpt-4o was connected with a temperature of 0, since we only need real articles and nothing should be invented. It finds articles reliably both on arXiv and on the web in general, and the titles, authors, and summaries it returns are real.
- HOWEVER, the arXiv links do not work: the part after the `.` is wrong. Since 2015, arXiv identifiers have 5 digits after the dot, but the generated links have only 3, and these appear to be just the day of the `Published` date padded with a zero (2023-04-06 → `2304.006`, 2020-05-21 → `2005.021`). The likely cause is the arXiv wrapper: as the log above shows, the `arxiv` tool returns only the publication date, title, authors, and summary, without the paper identifier, so the model reconstructed the "ID" from the date. Links obtained through the web search do not have this problem.
- **Does an artificial intelligence perform market manipulation with its own discretion? -- A genetic algorithm learns in an artificial market simulation** 
    - Real link https://arxiv.org/abs/2005.10488
    - Generated   https://arxiv.org/abs/2005.021

- **The Governance of Physical Artificial Intelligence**  
    - Real link https://arxiv.org/abs/2304.02924 
    - Generated    https://arxiv.org/abs/2304.006
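Since the agent may fabricate identifiers, a cheap post-check can flag links whose arXiv ID is malformed before the answer is shown. A minimal sketch (the helper names are hypothetical): it only validates new-style IDs (`YYMM.NNNN` up to 2014, `YYMM.NNNNN` from 2015), so a well-formed ID that points to the wrong paper still passes, and old-style IDs such as `hep-th/...` are reported as suspicious.

```python
import re

# Capture whatever follows /abs/ up to whitespace, ')' or ']' (Markdown link endings).
LINK_RE = re.compile(r"https?://arxiv\.org/abs/([^\s)\]]+)")
ID_RE = re.compile(r"^(\d{2})(\d{2})\.(\d{4,5})(v\d+)?$")


def is_plausible_arxiv_id(arxiv_id: str) -> bool:
    """Check the new-style arXiv ID format, including the 4- vs 5-digit rule."""
    m = ID_RE.match(arxiv_id)
    if not m:
        return False
    yy, mm, seq = int(m.group(1)), int(m.group(2)), m.group(3)
    if not 1 <= mm <= 12:
        return False
    expected_len = 5 if yy >= 15 else 4  # 5-digit sequence numbers since January 2015
    return len(seq) == expected_len


def find_suspicious_links(text: str) -> list:
    """Return the arXiv IDs in `text` whose format cannot be a real identifier."""
    return [aid for aid in LINK_RE.findall(text) if not is_plausible_arxiv_id(aid)]
```

Applied to `save_response['output']`, this would flag `2304.006` and `2005.021`, while the real IDs `2304.02924` and `2005.10488` pass.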

### Task 4: Creating an Assistant Agent for Solving Business Problems

Create an agent that helps solve business analytics tasks. The agent should assist the user in creating a sales forecast for the next year, taking into account the level of inflation and weather conditions. The agent should be able to use Python and browse the internet to obtain current data.

**Steps:**
1. Set up an agent that works with analytical data provided as text. For example, the user writes:

```
We export oranges from Brazil. In 2021 we exported 200 tons, in 2022 - 190 tons, in 2023 - 210 tons, and in 2024 which has not yet ended - 220 tons. Make an estimate of how many oranges we will be able to export in 2025 considering the weather conditions in Brazil and the global demand for oranges based on the economic situation.
```

2. Create a request to the agent that states the task clearly: provide the result of the business analysis, or explicitly state that it cannot do so. The task and the user's request can simply be combined in one message.

3. Run the agent and analyze the results. What can be improved?

In [None]:
from langchain_core.tools import Tool
from langchain_experimental.utilities import PythonREPL

# Warning: PythonREPL executes arbitrary code on the local machine
python_repl = PythonREPL()
python_tool = Tool(
    name="python_interpreter",
    description="A Python shell. Use this to execute python commands. Input should be a valid python command. If you want to see the output of a value, you must print it with `print(...)`. Otherwise you won't see the result! It's very important.",
    func=python_repl.run,
)

In [None]:
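import json
import os

from langchain_openai import ChatOpenAI

# creds.json is stored locally and never committed to git
with open('creds.json') as file:
    creds = json.load(file)
os.environ["OPENAI_API_KEY"] = creds["OPENAI_API_KEY"]
# temperature=0: the agent should report data, not invent it
llm = ChatOpenAI(temperature=0, model_name="gpt-4o")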

In [121]:
os.environ["SERPAPI_API_KEY"] = creds["SERPAPI_API_KEY"]
tools = load_tools(["serpapi"], llm=llm)
tools.append(python_tool)
prompt = hub.pull("hwchase17/react")



In [None]:
print(prompt)

input_variables=['agent_scratchpad', 'input', 'tool_names', 'tools'] input_types={} partial_variables={} metadata={'lc_hub_owner': 'hwchase17', 'lc_hub_repo': 'react', 'lc_hub_commit_hash': 'd15fe3c426f1c4b3f37c9198853e4a86e20c425ca7f4752ec0c9b0e97ca7ea4d'} template='Answer the following questions as best you can. You have access to the following tools:\n\n{tools}\n\nUse the following format:\n\nQuestion: the input question you must answer\nThought: you should always think about what to do\nAction: the action to take, should be one of [{tool_names}]\nAction Input: the input to the action\nObservation: the result of the action\n... (this Thought/Action/Action Input/Observation can repeat N times)\nThought: I now know the final answer\nFinal Answer: the final answer to the original input question\n\nBegin!\n\nQuestion: {input}\nThought:{agent_scratchpad}'


In [123]:
agent = create_react_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

In [None]:
question = (
    "We export oranges from Brazil. In 2021, we exported 200 tonnes, in 2022 - 190 tonnes, "
    "in 2023 - 210 tonnes, in 2024, which has not yet ended - 220 tonnes. "
    "Estimate how many oranges we can export in 2025, given the weather conditions in Brazil "
    "and the demand for oranges in the world based on the economic situation. "
    "If you cannot perform this analysis, explicitly state that you cannot do so."
)

response = agent_executor.invoke({"input": question})

print(response)



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mTo estimate the export of oranges from Brazil in 2025, I would need to consider several factors, including historical export data, weather conditions in Brazil, and global demand for oranges based on the economic situation. I have historical data for the years 2021 to 2024, but I need to gather information on the current weather conditions in Brazil and the global demand for oranges.

Action: Search
Action Input: "Brazil weather conditions 2024 impact on orange production"[0m[36;1m[1;3mDespite this decrease, Brazil's production is anticipated to recover slightly from the current season, which was the lowest since 1988. Post forecasts orange weight at 160 grams/5.64 ounces in MY 2024/25, three percent lighter than Post's previous estimate of 165 grams for MY 2023/24 due to adverse weather conditions.[0m[32;1m[1;3mThe weather conditions in Brazil have been challenging, leading to a decrease in orange production. However, 

# Conclusions 4

- For this task, the agent was given two tools: web search (serpapi) to find current information and a Python interpreter (PythonREPL) to run calculations.
- As in the previous task, only gpt-4o was able to handle the task.
- The agent's forecast, *The estimated export of oranges from Brazil in 2025 is approximately 232 tonnes*, looks logical given its reasoning: it took the weather conditions into account and tried to factor in economic indicators. In addition to extrapolating the average year-over-year growth of exports, the model added +2% for demand, based on information it found on the internet. However, it is not entirely clear where exactly the 2% came from; the model seems to have made some additional assumptions.
- Overall, its result looks plausible.
- Model improvements:

    - First and foremost, more data and detail are needed. It was hard to account for weather conditions averaged over a whole year (the range between the minimum and maximum is too wide); monthly export data would be better, since the weather could then be averaged per month and its impact assessed more precisely.
    - The prompt could also specify which economic indicators to focus on, since there are many of them and the LLM may pick less relevant ones during the search. To avoid choosing them manually, the prompt can instead ask the agent to first identify the 2-3 economic indicators most important for orange exports and only then take them into account in the forecast.
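The agent's intermediate Python code is not shown in the output above, so the following is only one possible reconstruction (an assumption, not the agent's actual calculation) that lands at the same ~232 tonnes: average the year-over-year growth rates for 2021-2024, apply the result to 2024, then add the +2% demand adjustment. It is useful as a transparent baseline to compare the agent's answer against. One caveat: the 2024 figure covers an unfinished year, so it understates the full-year total and the baseline inherits that bias.

```python
def mean_yoy_growth(values):
    """Average of year-over-year growth rates, e.g. 200 -> 190 is -5%."""
    rates = [curr / prev - 1 for prev, curr in zip(values, values[1:])]
    return sum(rates) / len(rates)


def forecast_next(values, extra_adjustment=0.0):
    """Extrapolate one year with the mean growth rate, then apply an extra multiplicative adjustment."""
    return values[-1] * (1 + mean_yoy_growth(values)) * (1 + extra_adjustment)


exports = [200, 190, 210, 220]  # tonnes, 2021-2024 (2024 is still incomplete)
baseline = forecast_next(exports)
adjusted = forecast_next(exports, extra_adjustment=0.02)  # the agent's +2% for demand
print(f"baseline: {baseline:.1f} t, with +2%: {adjusted:.1f} t")
# → baseline: 227.5 t, with +2%: 232.1 t
```

The mean growth here is about 3.4% per year; whether an additive or multiplicative +2% is applied changes the result by well under one tonne.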