# 🔎 Hands-On: Retrieval-Augmented Generation (RAG) with LangChain + Chroma

**Last updated:** 2025-09-08 23:15

**Why this topic?**
- Bridges the gap between *playing with LLMs* and building **end-to-end applications**.
- Introduces **retrieval pipelines, embeddings, and vector databases**.
- **LangChain (or LlamaIndex)** orchestrates these components in a reproducible workflow.

**Agenda (45–60 min)**
1. Intro: Why RAG? Reducing hallucinations by grounding in data
2. Install & Setup
3. Load & Chunk Documents
4. Store/Retrieve with Chroma
5. Connect an LLM (HF or OpenAI)
6. Pipeline test (ask Qs about docs)
7. Mini-experiments: embedding swap, chunk size sensitivity
8. Log reproducibility
9. Wrap-up tasks

In [2]:
# Install dependencies
!pip -q install -U langchain langchain-community chromadb sentence-transformers pypdf transformers accelerate
# Optional OpenAI
# %pip -q install -U openai tiktoken langchain-openai

In [3]:
import json, sys, platform, os, chromadb, transformers, sentence_transformers
try:
    import torch
    torch_v = torch.__version__
    cuda_ok = torch.cuda.is_available()
    device_name = torch.cuda.get_device_name(0) if cuda_ok else "CPU"
except:
    torch_v, cuda_ok, device_name = "N/A", False, "CPU"

env = {
    "python": sys.version,
    "platform": platform.platform(),
    "torch": torch_v,
    "cuda": cuda_ok,
    "device": device_name,
    "transformers": transformers.__version__,
    "sentence_transformers": sentence_transformers.__version__,
    "chromadb": chromadb.__version__
}
print(json.dumps(env, indent=2))
with open("env_rag.json","w") as f: json.dump(env, f, indent=2)

{
  "python": "3.12.11 (main, Jun  4 2025, 08:56:18) [GCC 11.4.0]",
  "platform": "Linux-6.1.123+-x86_64-with-glibc2.35",
  "torch": "2.8.0+cu126",
  "cuda": false,
  "device": "CPU",
  "transformers": "4.56.1",
  "sentence_transformers": "5.1.0",
  "chromadb": "1.1.0"
}


In [4]:
sample_text = """
Can AI Agents simulate real-world trading environments to investigate the impact of external factors on
stock trading activities (e.g., macroeconomics, policy changes, company fundamentals, and global events)?
These factors, which frequently influence trading behaviors, are critical elements in the quest for maximizing
investors’ profits. Our work attempts to solve this problem through large language model based agents. We
have developed a multi-agent AI system called StockAgent, driven by LLMs, designed to simulate investors’
trading behaviors in response to the real stock market. The StockAgent allows users to evaluate the impact
of different external factors on investor trading and to analyze trading behavior and profitability effects.
Additionally, StockAgent avoids the test set leakage issue present in existing trading simulation systems based
on AI Agents. Specifically, it prevents the model from leveraging prior knowledge it may have acquired related
to the test data. We evaluate different LLMs under the framework of StockAgent in a stock trading environment
that closely resembles real-world conditions. The experimental results demonstrate the impact of key external
factors on stock market trading, including trading behavior and stock price fluctuation rules. This research
explores the study of agents’ free trading gaps in the context of no prior knowledge related to market data.
The patterns identified through StockAgent simulations provide valuable insights for LLM-based investment
advice and stock recommendation. The code is available at https://github.com/MingyuJ666/Stockagent.
CCS Concepts: • Computing methodologies → Multi-agent systems.
∗These authors contributed equally to this research.
Authors’ Contact Information: Chong Zhang, University of Liverpool, , UK; Xinyi Liu, Peking University, , China; Zhongmou
Zhang, Shanghai University of Finance and Economics, , China; Mingyu Jin, Rutgers University, , USA; Lingyao Li,
University of Michigan, , USA; Zhenting Wang, Rutgers University, , USA; Wenyue Hua, Rutgers University, , USA; Dong
Shu, Northwestern University, , USA; Suiyuan Zhu, New York University, , USA; Xiaobo Jin, Xi’an Jiaotong-Liverpool
University, , China; Sujian Li, Peking University, , China; Mengnan Du, New Jersey Institute of Technology, , USA; Yongfeng
Zhang, Rutgers University, , USA.
Please use nonacm option or ACM Engage class to enable CC licenses
This work is licensed under a Creative Commons Attribution 4.0 International License.
© 2024 Preprint. Under review.
ACM XXXX-XXXX/2024/9-ART
https://doi.org/10.1145/nnnnnnn.nnnnnnn
Preprint, Under review.
arXiv:2407.18957v4 [q-fin.TR] 21 Sep 2024
2 Chong Zhang, Xinyi Liu, Mingyu Jin, Zhongmou Zhang et al.
Additional Key Words and Phrases: Multi-Agent AI System, Simulated Investor Behavior, Systematic Biases,
Stock trading simulation
ACM Reference Format:
Chong Zhang, Xinyi Liu, Zhongmou Zhang, Mingyu Jin, Lingyao Li, Zhenting Wang, Wenyue Hua, Dong
Shu, Suiyuan Zhu, Xiaobo Jin, Sujian Li, Mengnan Du, and Yongfeng Zhang. 2024. When AI Meets Finance
(StockAgent): Large Language Model-based Stock Trading in Simulated Real-world Environments. 1, 1
(September 2024), 33 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn
1 Introduction
The stock market operates within a complex and volatile ecosystem where the buying and selling of
stocks often occurs amidst high liquidity and constant price fluctuations [13, 16]. This environment
is consistently shaped by the interplay of various participants, including individual investors,
institutions, and automated trading systems [36, 48]. The trading behaviors of these participants
reflect a complex result of competition and cooperation, influenced by uncertainties such as
economic trends [31], geopolitical events [15], and psychological factors (e.g., market sentiment)
[45]. Consequently, the rapid pace of trading, coupled with the vast amount of available information,
poses significant challenges for traders to make consistently well-informed decisions.
This underscores the necessity for data-driven tools like algorithms or simulations to facilitate
informed and strategic investment decisions. The prevalent method involves backtesting with
historical data to simulate trading environments and assess strategies. Popular tools for this purpose
include Zipline [26], Backtrader [18], and PyAlgoTrade [19]. These event-driven models for trading
simulation can help evaluate trader-designed strategies. In addition, the integration of machine
learning techniques has opened up new avenues for supporting investment decisions [12, 25, 37].
Among those, one popular tool is called Trading Gym [32], which leverages reinforcement learning
to optimize investment strategies.
While historical data-based analysis offers valuable insights, its static nature and retrospective
bias can limit its effectiveness in supporting financial decisions in a dynamic and evolving environment [4, 6]. Consequently, simulations or machine learning models trained on historical data
may overfit and fail to accurately reflect real-time market liquidity or the influence of collective
sentiment. On the other hand, understanding the mechanisms of market operations, particular the
interplay between stakeholders and market sentiment, is crucial for developing robust investment
strategies [5, 11]. Simulating human sentiments and behaviors can provide a closer approximation
of how investment strategies can work in response to the market’s rapidly evolving nature, bridging
gaps left by traditional data analysis.
Recent advancements in large language models (LLMs) like GPT and Gemini can facilitate simulations of complex stakeholder interactions [17, 23]. Integrating these LLMs into agent-based
systems has been demonstrated useful in conducting intricate simulations across various disciplines
[23, 53]. One reason is that these LLMs are trained on extensive datasets and have shown immense
capability to comprehend, generate, and reason through diverse scenarios. These reasoning abilities empower LLMs to model intricate systems where individual LLM-based agents can interact
dynamically and make decisions under evolving conditions [1], which is particularly advantageous
for simulating the dynamics of stock markets. In this regard, multiple recent studies have leveraged
LLMs to enhance stock market simulations that can mimic complex human behaviors, addressing
backtesting limitations. These simulations can also replicate social dynamics by leveraging LLMs’
generalization capabilities [10, 17, 23, 28, 38, 41, 52, 57].
However, applying these technologies to fully comprehend the nuanced investor behaviors in
financial markets still remains a largely under-explored area. Several research questions are worth
further exploration. First, it is imperative to explore how various external factors can influence the
Preprint under review
When AI Meets Finance (StockAgent): Large Language Model-based Stock Trading in Simulated Real-world Environments3
trading behaviors and performance outcomes of AI Agents. Second, it is still unclear to what extent
the inherent tendencies and learning capabilities of LLM-based agents can impact the reliability and
effectiveness of stock recommendations and quantitative trading strategies across various scenarios.
To address these gaps, our study introduces StockAgent, an innovative LLM-based multi-agent
stock trading framework that operates on event-driven simulations. Our study aims to investigate
the following three questions:
• RQ1. Simulation Effectiveness Are the simulation results (i.e., trading decisions) of StockAgent reliable when using different LLMs to drive the simulation?
• RQ2. LLM Reliability Does StockAgent’s stock recommendation and trading strategy be
influenced by the inherent tendencies of the selected LLM itself?
• RQ3. Simulated Trading under External Conditions Can StockAgent reasonably and
autonomously simulate stock trading based on different scenario settings, especially exploring how external factors (e.g., financial data, market indicators, benchmark interest rates,
emergencies, and after-hours Bulletin Board System (BBS) discussions) affect its trading
behavior?
We use two LLMs, namely GPT (gpt-3.5-turbo-0125) and Gemini (Gemini-pro-1.0), to develop
StockAgent. Its simulation encompasses multiple stages of trading, including (1) Pre-Trading
Preparation, involving interest rates and financial events; (2) Trading Sessions, handling transactions
and account updates; and (3) Post-Trading, focusing on future actions and strategy sharing. The
unique contribution of StockAgent lies in its ability to assess the impact of external factors, asset
quantity, and strategies on trading by tuning its parameters, as detailed in subsection 5.2. In addition,
it can minimize the influence of the model’s prior knowledge on market predictions. Our study
aims to develop a trading agent capable of simulating diverse trading behaviors across scenarios to
support stock investment decision-making. It can also be adapted to incorporate more dynamic
information for advancing financial AI Agent development.
2 Background and Related Work
2.1 Stock Simulation Model
Computational finance leverages platforms like Zipline [47] and Backtrader [26] for backtesting
strategies with historical data. These tools allow researchers and traders to test their trading
algorithms on past market data and thus assess their potential performance. However, in addition
to localized solutions, cloud solutions such as QuantConnect [34] also provide a simulation of the
global market, which enables users to validate their strategies in a wider market environment. In
addition, Trading Gym provides a reinforcement learning environment dedicated to the development
and testing of algorithms [3]. This environment can be very beneficial for those looking to leverage
machine learning techniques to optimize their trading strategies.
In terms of strategy development and stock trading, PyAlgoTrade and Alpaca are the two very
popular tools that simplify the strategy development process and provide the ability to simulate
trading [43]. These platforms allow traders to test their ideas and strategies without risking real
money to better understand their underlying market behavior and performance.
While these tools are extremely useful for strategy optimization and backtesting, there are
challenges in transitioning from the backtesting phase to actual trading. A major problem is
overfitting, where strategies perform well on historical data but underperform in real-time markets.
In addition, many strategies are developed without considering key factors such as market sentiment
or liquidity, which often play a crucial role in actual trading [9]. Therefore, while these tools provide
a powerful platform to test and optimize trading strategies, traders need to be careful in practice
Preprint under review
4 Chong Zhang, Xinyi Liu, Mingyu Jin, Zhongmou Zhang et al.
to ensure that their strategies not only work on historical data but also achieve similar results in
real-time markets.
2.2 Large Language Model based Agents
LLMs are changing AI Agents’ capabilities by giving them advanced cognitive skills for reasoning
and interaction. One of the key techniques contributing to this transformation is Chain of Thought
(CoT) [29, 49, 50, 54]. CoT enables LLM-driven AI Agents to address tasks that have traditionally
been the domain of symbolic AI. By breaking down complex problems into a series of intermediate
reasoning steps, CoT enhances the problem-solving capabilities of these agents, making them more
effective in handling intricate tasks.
These agents are applied to natural language tasks in different domains, such as ChatDev [39]
and MetaGPT [21], two platforms that allow users to write or generate their own AI Agents
according to their needs, to deal with a wide range of tasks in different scenarios. AutoGen [51] is
an open-source framework that allows developers to develop agents. Developers can build LLM
applications through multiple agents that can interact with each other to complete various LLM
tasks. These tasks include math, programming, decision-making, etc. WarAgent [23] simulates a
set of adversarial systems that can be used to simulate military operations through an AI Agent.
CosmoAgent [27] focuses on the field of universe exploration and scientific research by using AI
Agent. Such adaptability is crucial for developing reactive agents that can adjust their strategies
based on real-time feedback and evolving contexts.
These cross-domain applications underscore the flexibility and diversity of LLM agents in
managing complex environments. The integration of advanced reasoning techniques, multimodal
learning, and feedback-driven adaptability highlights the significant strides made in enhancing the
cognitive and interactive capabilities of AI Agents [55, 56].
2.3 Large Language Models with Finance
The intersection of LLMs and economics is changing how we approach financial analysis and
prediction. Traditional quantitative models are giving way to LLMs’ sophisticated processing of
economic literature, improving trend forecasts, as shown by Alonso-Robisco and Carbó [2]. For
instance, Alonso-Robisco and Carbó [2] demonstrates how LLMs can enhance trend forecasts by
analyzing economic literature, leading to more nuanced predictions. Compliance and risk management sectors benefit from LLMs’ ability to detect textual risks, enhancing regulatory compliance
and threat management, as evidenced by [14]. Furthermore, specialized models like PIXIU’s Financial Language Model, built on Llama, offer superior assessments of financial statements, as noted
by Zhao et al. [58].
In algorithmic trading, LLMs can contribute through improved sentiment analysis and financial
text interpretation, as highlighted by Huang et al. [24]. This allows for more responsive trading
algorithms that can swiftly adapt to market sentiments. Overall, LLMs are significantly impacting
economic decision-making and market dynamics by providing more accurate and strategic insights
through advanced data processing and interpretation capabilities.
2.4 Behavioral Finance
Behavioral finance [20] combines the research results of psychology and economics to study the
behavior and decision-making process of investors in financial markets. It focuses on how the
irrationality and psychological biases of human behavior affect financial markets and investment decisions, arguing that market participants are not always completely rational, which is in opposition
to the assumptions of traditional finance [7, 20].
Preprint under review
When AI Meets Finance (StockAgent): Large Language Model-based Stock Trading in Simulated Real-world Environments5
The core concepts and theories in behavioral finance include: Bounded Rationality, that is,
investors’ rationality is limited by cognitive ability and access to information [42]; Psychological
Biases, such as overconfidence, loss aversion, and anchoring, significantly influence investment
decisions [30]; Market Inefficiency, where the irrational behavior of investors causes market prices
to deviate from their intrinsic values, resulting in price bubbles and crashes [40]; Emotions and
Feelings, fear and greed affect investors’ decisions [33]; And Heuristic Decision-Making, where
investors rely on simple rules rather than thorough analysis in complex decision situations [46].
These concepts and theories reveal the complexity and irrational characteristics of human behavior
and challenge the rationality assumption of traditional finance.
3 StockAgent Architecture Design
We design StockAgent as Multi-Agent System (MAS) that can simulate actual stock markets and
investor trades through agent interaction, comprising (1) Investment Agent module, (2) Transaction
module, and (3) BBS module, structured to prevent agent action conflicts and ensure message
availability.
3.1 Investment Agent Module
Investment Agents: Investment Agents are initialized with random capital, liabilities, and one of
four personalities: Conservative, Aggressive, Balanced, and Growth-Oriented, to study personality
impacts on decision-making. Initial liabilities incentivize profit-driven trading.
Agent Interaction: In our framework, agents are tasked with decisions on loans, trading, forecasting tomorrow’s actions, and sharing tips on BBS. They must consider varied information, such
as market data and BBS exchanges, detailed in Appendix A. Following [23], a “secretary” corrects
agents’ invalid responses, preventing unrealistic actions like spending beyond available cash.
3.2 Transaction Module
Agents’ buy and sell orders are logged into an order book, which is managed using a dictionary data
structure in Stock Agent. After an agent completes a trade, the system assesses if and how much was
traded, updating the order book accordingly. With agents competing in simultaneous transactions,
our prompt-driven simulation risks deadlock. Therefore, with the help of the page replacement
algorithm in the multitasking operating system, we propose a random clock page replacement
algorithm, which is shown in Figure 1. Agents are randomly sequenced for decision-making,
preventing deadlock by avoiding concurrent contention.
13
5
24
31
9
44
27
16
Fig. 1. The schematic diagram of random clock page replacement algorithm. During our trading period,
agents are assigned IDs in a random sequence using random numbers, allowing them to make decisions in a
random order.
Preprint under review
6 Chong Zhang, Xinyi Liu, Mingyu Jin, Zhongmou Zhang et al.
3.3 Bulletin Board System (BBS) Module
Our framework includes a unique BBS, which allows agents to post messages. At the end of each
day, we ask agents to share their trading tips on BBS. By making this information available to all
agents, we aim to simulate a realistic environment where others’ opinions may influence investor
decisions.
4 StockAgent Simulation Design
In this study, we investigate stock trading within a MAS to understand market dynamics and
agent interactions. We simulate trading to observe how AI Agents’ decision-making, informed
by varied information, influences market indicators like volatility and liquidity. The simulation
flow is presented in Figure 2. In the following, we refer to each agent adhering to the StockAgent
framework design as an “AI Agent,” since the simulation and backbone LLMs may vary. To clarify,
StockAgent is a framework design, while an AI Agent is an independent agent entity that follows
the StockAgent framework but is specifically aimed at implementing trading strategies.
In addition, our simulation replicates real market conditions to assess agents’ impact on these
indicators. In this regard, we design our trading processes based on the trading sessions and
mechanisms employed by NASDAQ and the Hong Kong Stock Exchange [22, 35]. To further
illustrate the trading process, we select two stocks from the United States stock market, referred to
as anonymized Stock A and Stock B. We use their financial reports—specifically, the balance sheet,
income statement, and cash flow statement—as the basis for calculating the initial settings of our
simulation. Overall, Stock A is a stock that is already circulating in the stock market, while Stock B
is a newly issued stock that is currently in the IPO stage.
4.1 Initial Assumptions and Settings
In StockAgent trading simulations, it is crucial to set clear initial assumptions and initial settings.
These underlying parameters ensure that the simulation gives a true representation of market conditions and behavior. A clear initial configuration prevents AI agents from behaving unpredictably
or arriving at results that are unreliable or difficult to interpret. By setting clear boundaries and
capabilities for AI agents and market environments, we can systematically explore how different
conditions can affect trading outcomes. These assumptions and settings include (1) agent initial
assumptions, (2) agent initialization settings, and (3) market initialization settings.
4.1.1 StockAgent Initial Assumptions.
• StockAgent Setup Assuming basic common sense of stock market trading, our StockAgent
can make profitable trades to maximize profits. This maximizes profits and consists of
maximizing the total value of assets and cash. The StockAgent also has a basic estimation
ability.
• StockAgent Trading Behavior We restrict the StockAgent actions to buy, sell, hold, long
and short. Among them, buying, selling, and holding are defined as basic behaviors, long
and short selling are defined as derived behaviors, and StockAgent can use basic behaviors
according to the basic knowledge of the stock market.
• StockAgent Financial Behavior We assume that under the existing StockAgent framework,
the StockAgent can understand the financial attempt settings we set for loans, interest rates,
dividends, and bankruptcy and use or avoid the above situations according to the goal of
maximizing profits.
4.1.2 StockAgent Initialization Settings.
Preprint under review
When AI Meets Finance (StockAgent): Large Language Model-based Stock Trading in Simulated Real-world Environments7
• Personality and Assets Allocation: Each agent is endowed with distinct personality traits,
influencing their trading styles. Initial assets are allocated randomly within a range of 100,000
to 5,000,000 currency units, which is the sum of an agent’s available cash and the market
value of stocks. The random allocation ensures a diverse spectrum of wealth among agents.
• Debt Configuration: Agents may incur debt, with the amount not exceeding the value of
their capital. The loan-to-value ratio is capped to ensure realistic leverage levels.
• Transaction Cost: We set the transaction stamp tax according to the rules of the American
stock market. We put the transaction fee of each Agent in purchasing shares to 0.005 currency
units per share, the minimum transaction fee of each transaction is 1 currency unit, and the
maximum transaction fee is 5.95 currency units.
4.1.3 Market Initialization Settings.
• Trading Year: The simulation covers one year comprising 264 trading days, divided into
four quarters of 66 trading days each.
• Interest Rates and Deposits: For companies and individuals, the deposit rate is 0%, reflecting
contemporary low-interest-rate environments.
• Personal Loan Costs: Agents are subject to varying interest rates on loans, with different
terms and corresponding costs. The actual annualized interest rate is 2.7% for a one-month
loan, 3% for a two-month loan, and 3.3% for a three-month loan.
4.2 Simulation Flow
Figure 2 shows the simulation flow. On each trading day, after repayment, check, and decisions
about loans, agents are asked to trade stocks in 3 trading sessions, including pre-trading preparation,
trading session, and post-trading procedures. At the end of the trade, agents estimate tomorrow’s
actions and post trading tips on BBS.
Market
Initialization
Agents
Initialization
Loan
Repayment
Bankrupt
Check
Events /
Financial
Report
Loan
Decision
Buy & Sell
Decision
Sequence
Generation
Price
Update
Tomorrow’s
Action
Estimation
Trading Tip
Sharing on
BBS
Interest
Repayment
Day Simulation
Session Simulation
:Occurs on selected dates
:Interact with LLM agents
Buy & Sell Decision
Tomorrow’s Action Estimation
Stock market information
of this session…Decide
whether to buy, sell or hold.
{"action_type":"buy","stoc
k":"A", amount: 100, price :
30.1}
Estimate whether you
will buy and sell stock A
and stock B tomorrow,
and whether you will
choose loan.
{"buy_A":"yes",
"buy_B":"no",
"sell_A":"yes", "sell_B":
"no", "loan": "yes"}
Please briefly post your trading tips on the
forum.
The market is likely to remain volatile in the near
term due to global economic uncertainties and
ongoing geopolitical tensions…
Loan Decision Trading Tips sharing on BBS
You are a stock trader… Stock
background…Stock price… Last day BBS… Loan
information… Decide whether to loan.
{"loan": "yes", "loan_type": 3, "amount": 1000}
:Occurs everyday
:Initialization
Fig. 2. The workflow of trading simulation.
4.2.1 Pre-Trading Preparation.
• Interest Repayment: Agents pay interest on their loans with cash on the last day of each
month days (𝑒.𝑔., 22, 44, 66), which of these days is the last trading day of each month. We
use simple interest as the interest charge.
• Loan Repayment: When a loan matures, agents need to repay the loan with cash.
Preprint under review
8 Chong Zhang, Xinyi Liu, Mingyu Jin, Zhongmou Zhang et al.
• Bankrupt Check: After interest and loan repayment, agents with negative cash undergo
bankruptcy proceedings. Bankrupted agents will sell all of their holdings and withdraw from
subsequent trading.
• Special Events: Pre-defined special events occur on specific trading days, such as a reserve
requirement ratio reduction and an interest rate increase.
• Financial Report Release: Companies A and B are scheduled to release their quarterly
financial reports on days 12, 78, 144, and 210. The financial reports are announced to all
agents.
• Loan Decision: Agents choose whether to take out a loan and decide the amount and
duration of the loan.
4.2.2 Trading Sessions.
• Sequence Generation: In each session, our simulation randomly generates a sequence, and
agents initiate transactions in this order.
• Buy&Sell Decision: Agents can decide whether to buy, sell, or hold shares and determine
the transaction price and quantity. Their orders are then matched in a market order book,
when the bid and ask coincide, a trade is made.
• Price Update: To simplify the actual stock trading procedure, we only update stock prices
at the end of trading sessions. Stock prices are updated to the price of the last transaction in
this session.
4.2.3 Post-Trading Procedures.
• Tomorrow’s Action Estimation: Based on current information, agents are asked to estimate
whether they will loan, buy, or sell tomorrow.
• Trading Tip Sharing on BBS: Agents share their trading tips and insights anonymously on
BBS. Messages on BBS are available to all agents on the next day.
4.3 External Event Simulation Settings
We use external economic and financial events from 2014 to 2019 to align our valuation and volume
energy calculations with reality. The specific conditions are listed below:
• Suppose that Company A and Company B, which issue Stock A and Stock B, both announce
the financial situation of the previous quarter at the end of each quarter; for example, the
third quarter of the first year announces the financial data of the second quarter of the first
year, and so forth.
• Suppose that on the first trading day of the first quarter of the second year, both Company A
and Company B announce their financial results for the previous year’s fourth quarter. On
the same day, the government announces a reduction in the reserve ratio, causing a boost in
markets M1 and M2. This results in a decrease in the loan cost for both companies from 6%
to 4.5%.
• Suppose that on the first trading day of Q1 in year three, the economy overheats, which
leads to the government announcing an interest rate hike and balance sheet contraction. This
results in a decrease in market liquidity and a rise in loan costs from 4.5% to 5%.
• Also assume that on the first trading day of the first quarter of the third year, Company A
announces that it expects revenue in the fourth quarter of the second year to be 3% below
expectations because of special events in the quarter, and Company B announces that it
expects revenue in the fourth quarter of the second year to be 2% above expectations because
of special events in the quarter.
Preprint under review
When AI Meets Finance (StockAgent): Large Language Model-based Stock Trading in Simulated Real-world Environments9
All assumptions are based on real events, and the ratios fall within a reasonable range. The
detailed events are shown in Table 1.
Table 1. Special Events Timeline for Trading Periods
Trading Period 1Q3 1Q4(78) 2Q1(144) 2Q2
Special Events - Monetary Easing - Interest Rate Hike
4.4 Financial Analysis in the Simulation
To support the financial analysis in the simulation, we use two companies’ stock trading as an
example, with Tables 2 to 7 showing the Statement of Valuation, financial indicators, and ideal stock
price conclusion for Company A and Company B, respectively. First, the enterprise’s financial data
are collected, including the income statement, balance sheet, and cash flow statement. We use the
income statement to show non-cash expenses, such as net income depreciation and amortization.
Further, we use the balance sheet to show long-term debt and shareholders’ equity.
The valuation method we use is FCFF (discounted free cash flow) valuation, as illustrated below:
Total market value =
∑︁𝑛
𝑡=1
FCFt
(1 + WACC)
t
+
FV
(1 + WACC)
n
(1)
Which FCFt represents the cash flow in the growth period, FV represents the present value of the
cash flow discounted to the end of the growth period,
Value of equity = Total market value − Present value of debt (2)
IPO listing price =

Number of listed shares
Number of total shares × Value of equity + IPO fees
/Number of listed shares
(3)
𝑊 𝐴𝐶𝐶 =
Ke × E
D + E
+
Kd × D
D + E
(4)
Weighted average cost of capital (WACC) represents a company’s average after-tax cost of capital
from all sources, including common stock, preferred stock, bonds, and other forms of debt. Where
𝐾𝑒 is the cost of equity, 𝐾𝑑 is the cost of debt, 𝐸 is equity capital, and 𝐷 is debt capital. According
to the Capital Asset Pricing Model (CAPM) :
Ke = Rf + 𝛽 × (Rf − Rm) (5)
Where 𝑅𝑓
is the risk-free rate of return, 𝑅𝑚 is the expected market rate of return, and 𝛽 is the
beta coefficient. Where 𝐾𝑑 can be calculated as follows,
Kd =
(SD × SR + LD × LR)
D
× AF × (1 − TR) (6)
Where 𝑆𝐷 is short-term debt, 𝐿𝐷 is long-term debt, 𝑆𝑅 is the short-term interest rate, 𝐿𝑅 is the
long-term interest rate, 𝐴𝐹 is bond adjustment factor, and 𝑇 𝑅 is the income tax rate and 𝐷 is debt
capital.
Preprint under review
10 Chong Zhang, Xinyi Liu, Mingyu Jin, Zhongmou Zhang et al.
Table 2. The Valuation Table of Company A for Trading Days
Trading Day D1 D12 D78 D144 D210
1 1930.38 3197.27 2429.94 3306.45 3203.58
2 2177.57 3415.69 2525.99 3420.97 3271.76
3 2166.84 3583.25 2629.69 3728.14 3647.98
4 2435.53 4101.14 2952.86 4157.99 3998.96
5 2631.68 4491.81 3176.54 4524.21 4342.30
FV 2802.14 4961.16 3457.12 5005.83 4812.35
Valuation 1 56379.29 98789.18 70745.68 101847.84 97966.06
1 1937.84 3214.78 2442.73 3328.10 3227.63
2 2193.92 3448.79 2547.91 3457.03 3308.96
3 2191.11 3634.12 2662.47 3783.43 3704.62
4 2471.97 4178.62 3001.14 4237.97 4078.29
5 2680.87 4598.64 3241.54 4632.24 4448.11
FV 2865.03 5104.56 3542.89 5149.74 4952.47
Upper Bound 57545.93 101435.23 72370.51 104570.65 99952.27
1 1922.93 3179.79 2417.19 3284.86 3179.63
2 2161.29 3382.79 2504.19 3385.15 3234.84
3 2142.75 3532.89 2597.22 3673.45 3591.99
4 2399.49 4024.78 2905.21 4079.22 3920.88
5 2583.22 4387.02 3112.63 4418.28 4238.60
FV 2740.40 4821.19 3373.13 4865.38 4675.63
Lower Bound 55233.23 96204.11 69153.24 98522.87 94729.68
Table 3. The financial indicators of Company A
Common Financial Indicators D1 D78 D144
Cost of depts (Kd) 6% 5% 5%
Cost of equity (Ke) 9% 9% 9%
Cost of debt ratio 5% 5% 5%
Cost of equity ratio 95% 95% 95%
WACC 8.85% 8.78% 8.80%
Sustainable growth rate 5% 5% 5%
Number of shares 200,000 Shares
4.5 Ideal Stock Price Conclusion
According to the financial analysis in the previous sections, Tables 6 and 7 show the ideal stock
prices of stocks A and B for each year and quarter, respectively. The financial analysis provided the
following valuations, which can be used as initial data for our simulations.
Preprint under review
When AI Meets Finance (StockAgent): Large Language Model-based Stock Trading in Simulated Real-world Environments 11
Table 4. The Valuation Table of Company B for Trading Days
Trading Day D1 D12 D78 D144 D210
1 874.70 910.86 915.98 988.02 1024.25
2 1138.94 1178.67 1175.10 1259.19 1244.98
3 1405.41 1372.95 1328.84 1339.54 1371.82
4 1589.73 1580.71 1554.98 1604.62 1636.20
5 1938.51 1888.52 1844.53 1877.51 1905.27
FV 2315.54 2198.77 2132.51 2131.85 2179.07
Valuation 45357.95 43338.41 43363.09 43554.37 44486.81
1 881.94 920.36 927.17 1001.97 1039.84
2 1157.58 1200.08 1198.32 1285.95 1273.13
3 1439.86 1408.31 1365.07 1377.93 1412.88
4 1641.82 1633.77 1609.35 1662.08 1697.04
5 2018.13 1966.28 1922.81 1958.08 1989.85
FV 2430.10 2305.95 2238.89 2238.44 2291.31
Upper Bound 47480.03 45337.32 45415.46 45175.15 46207.95
1 867.45 901.40 904.85 974.18 1008.77
2 1120.45 1157.50 1152.19 1232.81 1217.25
3 1371.51 1338.24 1293.32 1301.97 1331.67
4 1538.90 1529.01 1502.09 1548.77 1577.12
5 1861.41 1813.31 1768.93 1799.74 1823.71
FV 2205.53 2095.85 2030.49 2029.66 2071.59
Lower Bound 43317.66 41416.61 41392.46 41165.26 41985.16
Table 5. Financial Constants of Company B
Common Financial Indicators D1 D78 D144
Cost of depts (Kd) 6% 5% 5%
Cost of equity (Ke) 9% 9% 9%
Debt cost ratio 7% 7% 7%
Equity cost ratio 93% 93% 93%
WACC 8.79% 8.69% 8.72%
Sustainable growth rate 5% 5% 5%
Number of Shares 100,000 Shares
Table 6. The Ideal Stock Price of Company A
Ideal Stock Price D1 D12 D78 D144 D210
Upper Bound 27.33 48.18 34.38 49.34 47.48
Lower Bound 26.24 45.70 32.85 46.80 45.00
5 Experiment, Results, and Validation
Our experimental design is demonstrated in Figure 3. Agents make trading decisions based on
numerous external information in our simulations. The simulation evaluation is conducted from
Preprint under review
12 Chong Zhang, Xinyi Liu, Mingyu Jin, Zhongmou Zhang et al.
Table 7. The Ideal Stock Price Table of Company B
Ideal Stock Price D1 D12 D78 D144 D210
Upper Bound 44.16 42.16 42.24 42.01 42.97
Lower Bound 40.29 38.52 38.49 38.28 39.05
three perspectives. To address RQ1, we simulate trading behavior using the StockAgent and further
evaluate it through statistical analysis. To address RQ2, we analyze the transation logs based on the
two different LLM-driven StockAgents. To address RQ3, we simulate 30 rounds over 10 days under
different experimental environments to illustrate the inherent tendencies of the selected LLMs.
Buy Stock A.
I recommend Stock
A because…
I think Stock B
is undervalued…
I believe Stock A
has potential…
Stock price BBS Discussion
Special Event
Reduction in the reserve
requirement ratio!
Financial Report
Gross Margin 70.8% Net Profit Margin -16.9%
Return On Investment -9.6% Operating Margin 15.6%
03/23 06/23 09/23 12/23
Total Revenue 1.82B 2.48B 3.4B 2.22B
Gross Profit 1.11B 1.74B 2.62B 1.56B
Operating
Income
7M 538M 1.5B -493M
Net Income 117M 650M 4.37B -349M Fig. 3. The demonstration of stock market investment. In our simulation, agents make investment decisions
based on multiple external sources of information.
5.1 Backbone Large Language Models
For the experiment, we select two widely used LLMs, namely Gemini and GPT at the time of this
study, as the backbone LLMs to conduct the stock trading tasks. These two LLMs have shown
robust abilities to comprehend and generate information in complex simulated environments, as
demonstrated by prior studies listed in Sections 2.3 and 2.4. Both models represent current state-ofthe-art in large language models, each with unique strengths in natural language processing and
generation. For simplicity, we call them GPT and Gemini in our subsequent experiment and result
analysis.
Gemini-Pro (Gemini-pro-1.0): Gemini is a family of multimodal language models developed
by Google DeepMind [44]. As the successor to LaMDA and part of the PaLM2 family, it represents a
significant advancement in AI capabilities. The Gemini family includes Ultra, Pro, and Nano models,
each designed for different use cases. Gemini-Pro, used in this study, offers a balance of advanced
language processing and computational efficiency. It can handle various input types including text,
images, audio, and video, making it versatile for diverse applications.
GPT-3.5-Turbo (gpt-3.5-turbo-0125): Part of the GPT (Generative Pre-trained Transformer)
series by OpenAI [8], GPT-3.5-Turbo is an improved iteration of the GPT-3.5 model. It offers
enhanced accuracy in responding to specific format requests and improved efficiency. This model
Preprint under review
When AI Meets Finance (StockAgent): Large Language Model-based Stock Trading in Simulated Real-world Environments 13
Table 8. Summary of experiment evaluations and the corresponding research questions.
Evaluation Method RQs Analysis Description
Simulation
effectiveness
RQ1 (a) How to evaluate the correlation of price movements?
(Section 5.3.1)
(b) How to determine the trading behavior characteristics?
(Section 5.3.2)
(c) How do we determine capital flow and AI Agent
features? (Section 5.3.3)
LLM reliability RQ2 (a) Does the propensity to trade caused by the prior
knowledge of LLMs affect reliability? (Section 5.4)
Simulated stock
trading under
external conditions
RQ3 (a) How to carry out transaction pattern recognition and
behavioral analysis? (Section 5.5.1)
(b) How to evaluate the performance and quantitative
analysis of StockAgent? (Section 5.5.2)
can generate up to 4,096 output tokens and is optimized for a wide range of language tasks, from
simple queries to complex reasoning.
5.2 Simulation Evaluation
To evaluate the simulation, we focus on three aspects: (1) simulation effectiveness, (2) LLM reliability,
and (3) the impact of external factors on simulated stock trading, as highlighted by our three
research questions. The first two evaluations aim to establish a foundation for understanding how
the selection of LLM can affect the simulation. In contrast, the last evaluation aims to demonstrate
that our developed AI Agents can respond to external events and make reasonable trading decisions.
Simulation Effectiveness. In response to RQ1, we use GPT and Gemini to trade stocks for
10 trading days under the same premise and setup. We collect the transaction logs of each model
to conduct price trend correlation, volume, and transaction frequency to judge the similarity of
trading volume. Combining these properties, we evaluate the effect of simulation based on different
AI Agents and determine the trading preferences of LLMS.
LLM Reliability. In response to RQ2, we investigate how AI agents, driven by different LLMs,
make trading decisions over multiple simulation rounds. Specifically, we aim to assess whether
the propensity to trade, influenced by prior LLM knowledge, can impact the decision reliability.
Therefore, we simulate each AI Agent’s longitudinal decision-making and analyze their trading
patterns and outcomes.
Simulated Stock Trading under External Conditions. In response to RQ3, we first identify
trading patterns through data analysis to analyze deviations in trading behavior in simulated stock
trading, focusing on frequency, timing, volume, and transaction decisions. Statistical methods can
highlight anomalies or notable deviations. We must also evaluate how these biases affect investment
performance, often by comparing trading returns against benchmarks. Tools like regression analysis
and ANOVA help quantify the impact of behavioral bias on performance. Table 8 and Table 9 below
mainly elaborate on the Evaluation method of our Research Questions and the experimental
environment setting.
Preprint under review
14 Chong Zhang, Xinyi Liu, Mingyu Jin, Zhongmou Zhang et al.
Table 9. Experiment environment settings.
Evaluation Method RQs Experiment Environment Settings
Simulation effectiveness RQ1 200 AI Agents
Special events (subsection 5.2)
10 trading days (subsection 4.4)
Company A and Company B basic financial information
(subsection 4.4)
LLM reliability RQ2 200 AI Agents
Special events (subsection 5.2)
10 trading days (subsection 4.4)
Company A and Company B basic financial Information
(subsection 4.4)
Simulated stock trading RQ3 200 AI Agents
under external Special events (subsection 5.2)
conditions 154 trading days (subsection 5.2)
Company A and Company B basic financial Information(subsection 4.4)
Price
Fig. 4. The correlation of price movements of Stock A and B Trading. The LLMs include Gemini and GPT
in 10 days round. The top right shows the stock price movement of the GPT-based simulation, and the
bottom right shows the simulated stock price movement based on Gemini.
5.3 RQ1. Simulation Effectiveness
In this simulation effectiveness experiment, we use GPT-3.5-Turbo and Gemini-Pro as two backbone
LLMs, respectively, to conduct simulated trading for 10 days under our standard experimental
environment. The detailed settings are in Table 9.
5.3.1 Correlation of Price Movements. In this part, we analyze the similarity of price movement on
two 10-day trading data based on Gemini and GPT. We use the correlation coefficient to measure
and visualize the linear relationship between the stock price time series of two stocks, Stock A
and B. The visualization results of correlation analysis are listed as follows Figure 4. Based on
the experimental results, it is evident that GPT and Gemini exhibit different trading tendencies.
Preprint under review
When AI Meets Finance (StockAgent): Large Language Model-based Stock Trading in Simulated Real-world Environments 15
Gemini tends to be more pessimistic about the market, while GPT shows a more optimistic outlook.
Consequently, within the first ten days, there are distinct and opposing trading patterns. GPT
favors long positions when trading Stock A and Stock B and is more bullish on Stock A based on
the initial financial data. On the other hand, Gemini prefers short positions. The similarities in
stock price changes under the same model suggest that different LLMs could have their own stock
trading preferences. The order price statistics of Stock A and Stock B are presented in the Figure 7
and Figure 6.
5.3.2 Trading Behavior Analysis. In this part, we analyze the difference between GPT and Geminidriven AI Agents’ trading behaviors, including trading volume, price, and frequency. From the
results shown in Table 10 and Table 11, we observe the trading preferences of the two LLMs from
the perspective of the trading volume. The trading volume of the GPT group is significantly higher
than that of the Gemini group, but the trading frequency of the GPT group is lower than that of the
Gemini group. This feature appears not only in Stock A but also in Stock B. We refer to the idea
stock price A and B in Table 6 and Table 7. In addition, GPT agents are more cautious about their
trading decisions in Stock B than in Stock A. Regarding trading frequency, Gemini trades more
days, while GPT follows the trend.
Table 10. The trading behavior of GPT and Gemini-driven AI Agents (part 1).
AI Agent A Trade Shares B Trade Shares A Volume B Volume
GPT 3118792 329590 176758380.8 14109526.0
Gemini 128981 112134 3588331.52 4325246.50
Table 11. The trading behavior of GPT and Gemini-driven AI Agents (part 2).
AI Agent Stock A price Stock B price A Trading Times B Trading Times
GPT 55.70 43.43 384 263
Gemini 23.46 36.03 800 688
Fig. 5. The T-SNE visualization of the GPT and Gemini agents. (The left one is GPT Agent and the right one
is Gemini Agent). K-means attempted the clustering process to perform 3-class clustering.
Preprint under review
16 Chong Zhang, Xinyi Liu, Mingyu Jin, Zhongmou Zhang et al.
5.3.3 AI Agent Features. In this section, we focus on the AI Agent features. We compare the agents
between traders and give a reasonable qualitative analysis. Further, we use clustering analysis to
cluster according to the trading behavior of each AI Agent to see whether the trading crowd based
on the two models conforms to behavioral finance.
We collect each AI Agent’s asset changes, earnings, stock holdings, and the number of stock
A-shares bought and sold, clustered the situation of each Agent, and perform T-SNE visualization.
According to the results in Figure 5, the investors of Gemini Agent have similar characteristics
and only a very few investors show performance different from that of the general public in the
visualization results. However, in the T-SNE visualization of GPT agents, samples are relatively
more dispersed, which means that GPT-driven agents have more subjective decision-making ability
and thus conduct fewer herd and trend transactions than Gemini.
5.4 RQ2. Large Language Model Reliability
Fig.6 and Fig.7 depict the final prices of Stock A and Stock B at the end of each trading day based
on the trading decisions made by GPT and Gemini-driven AI Agents, respectively. Overall, the
trading price change trend of the same stock on different LLMs is similar, but the final price gap at
the end of each trading day is large.
The comparison of Gemini and GPT-driven AI Agent trading behaviors reveals interesting
insights. First, both agents demonstrate similar overall trends for each stock, suggesting a comparable underlying understanding of market dynamics. However, their execution strategies differ
significantly. The Gemini-driven AI Agent exhibits a more conservative approach, with lower
volatility and smaller price ranges over a shorter trading period of about 450 rounds. In contrast,
the GPT-based agent displays a more aggressive strategy, allowing for larger price fluctuations
over a much longer trading duration of approximately 1750 rounds. This results in higher potential
gains but also increased risk.
Despite these differences, both agents consistently show Stock A outperforming Stock B, indicating a shared ability to identify relative stock performance. The GPT-based AI Agent achieves
higher final prices for both stocks, while the Gemini-based AI Agent maintains more stable, albeit
lower, prices. These observations indicate that while different LLMs share similar foundational
knowledge of stock market behavior, their distinct abilities or implementations can lead to different
trading outcomes.
5.5 RQ3. Simulated Stock Trading under External Conditions
In this simulated stock trading experiment, we use Gemini-Pro as the baseline model to conduct
an ablation study. The study simulates 30 rounds over 10 days under different experimental
environments, removing financial information, BBS, financial state, loan, and interest change,
respectively. The detailed settings are presented in Table 9, and detailed experimental data can be
found in the Appendix B.
5.5.1 Transaction Pattern Recognition and Behavioral Analysis. In this section, we analyze the
transaction pattern and transaction behavior through the analysis of the log results. First, here’s a
comparison of stock price movements in different trading environments in Figure 8. The trading
frequency of Stock A and Stock B in each case is shown in Figure 9. According to our experimental
results, canceling the benchmark interest rate information significantly promotes Stock A’s trading,
while the lack of other information has little impact. The AI Agent is relatively sensitive to the
interest rate brought by the loan in the transaction style. The cancellation of the interest rate
promotes the interest-free loan to a certain extent, which makes the Agent optimistic about the
transaction.
Preprint under review
When AI Meets Finance (StockAgent): Large Language Model-based Stock Trading in Simulated Real-world Environments 17
Rounds
Fig. 6. The Gemini-driven AI-Agent order price.
Rounds
Fig. 7. The GPT-driven AI-Agent order price.
For Stock B, the lack of BBS information communication directly causes the AI Agent to drive
down the Stock price of Stock B, lower than the ideal valuation. The lack of loan interest rate also
causes the Agent to pull up Stock B after the 23rd trading round. Different from those who are
firmly bullish on Stock B in other situations, the agents who lack financial conditions have a flash
crash on the 21st trading round when trading Stock B, which may be due to the lack of financial
conditions of the company and the lack of confidence in the profitability of the company behind
Stock B. However, the lack of first-round trading loans and macro-financial information had little
impact on the trading of Stock B.
From the perspective of the overall transaction frequency (see Figure 9), removing the BBS
communication of investors has a reduced impact on the transaction frequency of Stock A and B,
while removing the change of interest rate has a significant promotion effect on the transaction
Preprint under review
18 Chong Zhang, Xinyi Liu, Mingyu Jin, Zhongmou Zhang et al.
Fig. 8. The comparison of stock trend with external condition ablation. (Stock A is above, and stock B is
below.)
Fig. 9. The comparison of stock transaction frequency with external condition ablation.
frequency of Stock A, and has a reduced effect on Stock B. The other three conditions have no
obvious impact on the transaction frequency.
Summarizing the commonality of the two stocks, the absence of an interest rate makes the Agent
more optimistic about the market and the performance of the company, while the lack of BBS
Preprint under review
When AI Meets Finance (StockAgent): Large Language Model-based Stock Trading in Simulated Real-world Environments 19
8 6 4 2 0 2 4 6
Value 1e6
0
50
100
150
200
Agent
Non-finance
Non-BBS
Non-Loan
Non-statement
Non-Interest-Change
Fig. 10. The 3D profit and loss diagram of the agent group.
discussion makes the psychological valuation of the stock become conservative and bear the stock
price to a certain extent.
5.5.2 Evaluation of Performance and Quantitative Analysis. In this part, we analyze the profitability
effect of the AI Agent group and individuals under different external environments. The bar chart
in Figure 10 shows the profitability of individual and group agents. As the figure shows, canceling
non-first-round loans leads our agents to adopt a more conservative and bearish trading approach.
Without the BBS sharing feature, agents tend to act more cautiously and avoid large transactions.
Interestingly, the elimination of earnings reports and interest rate changes caused many agents
to shift from losses to profits, increasing the overall profitability of the group. However, when all
financial information support is withdrawn, the gap between the gains and losses of the agents
increases and the competitive nature of the market intensifies.
6 Discussion
6.1 Key Findings and Implications
In this study, we design a MAS called StockAgent to simulate stock trading activities. We utilize
two widely used LLMs, namely GPT-3.5-Turbo and Gemini-Pro, to drive the simulation and test
StockAgent’s performance in an event-driven environment. Through comprehensive financial and
statistical analyses, we identify several key findings and implications in response to each research
question, which are outlined below:
Different LLMs can cause AI Agents to exhibit significant differences in trading behavior.
When comparing GPT-based and Gemini-based AI Agents, we observe significant differences
Preprint under review
20 Chong Zhang, Xinyi Liu, Mingyu Jin, Zhongmou Zhang et al.
in trading behavior. Although the GPT-driven AI Agents trades less frequently, the volume is
significantly higher than that driven by Gemini. In addition, the GPT-driven AI Agents makes
more cautious trading decisions on stock A, while the Gemini-driven AI Agents trades with a
similar frequency on stocks A and B, showing a higher trading activity. This suggests that different
LLMs may have different understandings of the market and strategic preferences. The experimental
results of our experiments 5.3 support this view. This finding may have a certain impact on users
and exchanges that use StockAgent for testing, and their simulation results may be biased due to
different LLM choices.
AI Agents show differences in in-group trading behavior across simulations. Through
cluster analysis and T-SNE visualization, we find that Gemini-based agents exhibit similar characteristics and more consistent group behavior. In contrast, GPT-driven agents display greater
individual differences and subjective decision-making abilities. The sample of GPT-driven agents
is more dispersed in the visualizations, possibly indicating less herding and trend trading, and
instead demonstrating more independence and diversity. This suggests that GPT agents can embody
personalized investment strategies and styles when facing the market. This observation can be
further helpful for developers when debugging the StockAgent platform, as it helps configure the
roles of each group in the agent according to the specific scenarios they need to simulate.
Different external factors can significantly affect AI Agents’ trading behaviors. Through
our simulated trading experiments, we observe that disabling the functions of non-first round loans
and BBS sharing makes individual agents more conservative while canceling the profit report and
interest rate change makes some agents turn from loss to profit. At the same time, removing the
support of any financial information can widen the profit and loss gap (more variance) between
agents, and the overall market competition becomes more intense. This finding is mainly aimed at
the Stock exchange that intends to use this simulation. Therefore, to achieve high reliability in the
StockAgent simulation, it is crucial to provide comprehensive influence factors.
6.2 Opportunities for Future Work
This study opens several avenues for future work. First, we could perform more technical analysis
by employing different trading strategies on the current platform, including in-depth research on
indicators. For instance, we could explore the effectiveness of commonly used technical indicators
such as Moving Average, Relative Strength Index, and Bollinger Bands under various market
conditions. Additionally, we could integrate other aspects into the StockAgent system, such as
algorithm optimization, high-frequency trading strategies, risk management, market sentiment
analysis, and data visualization.
In addition, we could improve and strengthen the module design in several ways. For example,
we could improve and strengthen the sentiment analysis module, incorporating both the trading
changes caused by the external environment and the emotional impact of trading behavior on other
traders. We could also add customizable prompts into the module design, which could increase
the robustness of the overall trading system. As a result, enhancing the interpretability of the
module design within the StockAgent could provide deeper insights into trading dynamics through
discrete-time simulations.
Another future work could consider developing a stock market simulation experiment platform.
This platform could conduct customizable agent-based trading experiments across various stock
markets. On this platform, we could also expand the variety of LLMs driving the simulation, increase
the number of trading days and rounds, and vary the number of traders to enhance the experiment’s
robustness. The customizable platform can be tailored to the trading rules of specific exchanges in
the stock market, providing a comprehensive and adaptable simulation environment.
Preprint under review
When AI Meets Finance (StockAgent): Large Language Model-based Stock Trading in Simulated Real-world Environments 21
Lastly, it is important to recognize that the reliability of stock recommendations and trading
strategies based on LLM agents requires further research. We believe that, given the same external
information and trading market environment, different LLM agents could exhibit varied trading
styles and decision-making behaviors. These differences may influence the effectiveness of stock
recommendations and quantitative trading methods based on these models, as they are shaped by
the inherent knowledge and biases of the models themselves. Moreover, despite being influenced
by external factors, the intrinsic trading tendencies of the models remain evident in one-step
transactions, indicating that context learning does not significantly alter their fundamental approach
to the market. This issue present an opportunity for future work with a specific focus on examining
whether different LLMs can reliably function as traders despite their inherent biases.
7 Conclusions
This study introduces StockAgent, an AI Agent framework driven by LLMs, designed to simulate
stock trading in real-world environments. StockAgent replicates the processes and external conditions of actual trading, offering valuable insights into AI Agent-based stock trading methods.
The framework is designed to mitigate the influence of historical stock trends on LLMs’ prior
knowledge, thereby freeing AI Agents to explore diverse trading patterns and preferences within
the stock trading environment. Our findings highlight significant differences between GPT and
Gemini-driven AI Agents in trading behavior, group tendencies, and responses to external factors. These observations present opportunities for further research into the reliability of stock
recommendations and trading strategies based on LLMs.
"""
with open("sample.txt","w", encoding="utf-8") as f: f.write(sample_text)
print("Created sample.txt")

Created sample.txt


In [5]:
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from pathlib import Path

docs = TextLoader("sample.txt", encoding="utf-8").load()
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
chunks = splitter.split_documents(docs)
print("Chunks:", len(chunks))
print("First chunk:\n", chunks[0].page_content[:300])

Chunks: 148
First chunk:
 Can AI Agents simulate real-world trading environments to investigate the impact of external factors on
stock trading activities (e.g., macroeconomics, policy changes, company fundamentals, and global events)?
These factors, which frequently influence trading behaviors, are critical elements in the 


In [6]:
from langchain_community.embeddings import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma

emb = SentenceTransformerEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectordb = Chroma.from_documents(chunks, emb, persist_directory="chroma_minilm")
retriever = vectordb.as_retriever(search_kwargs={"k": 4})
print("Chroma DB ready")

  emb = SentenceTransformerEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md: 0.00B [00:00, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/612 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/90.9M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/350 [00:00<?, ?B/s]

vocab.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

Chroma DB ready


In [7]:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from langchain_community.llms import HuggingFacePipeline

MODEL_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # fallback: "distilgpt2"
tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
pipe = pipeline("text-generation", model=model, tokenizer=tok, max_new_tokens=200)
llm = HuggingFacePipeline(pipeline=pipe)
print("LLM ready:", MODEL_ID)

tokenizer_config.json: 0.00B [00:00, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/551 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/608 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/2.20G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

Device set to use cpu


LLM ready: TinyLlama/TinyLlama-1.1B-Chat-v1.0


  llm = HuggingFacePipeline(pipeline=pipe)


In [8]:
from langchain.chains import RetrievalQA
qa = RetrievalQA.from_chain_type(llm=llm, retriever=retriever, chain_type="stuff")
q = "What does this course focus on?"
print("Q:", q)
print("A:", qa.run(q))

Q: What does this course focus on?


  print("A:", qa.run(q))


A: Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

recommendations and quantitative trading methods based on these models, as they are shaped by
the inherent knowledge and biases of the models themselves. Moreover, despite being influenced
by external factors, the intrinsic trading tendencies of the models remain evident in one-step
transactions, indicating that context learning does not significantly alter their fundamental approach
to the market. This issue present an opportunity for future work with a specific focus on examining

recommendations and trading strategies based on LLMs.

text interpretation, as highlighted by Huang et al. [24]. This allows for more responsive trading
algorithms that can swiftly adapt to market sentiments. Overall, LLMs are significantly impacting
economic decision-making and market dynamics by providing more accurate and strategic insights

In [9]:
emb_e5 = SentenceTransformerEmbeddings(model_name="intfloat/e5-small-v2")
vectordb_e5 = Chroma.from_documents(chunks, emb_e5, persist_directory="chroma_e5")
qa_e5 = RetrievalQA.from_chain_type(llm=llm, retriever=vectordb_e5.as_retriever(), chain_type="stuff")
print("MiniLM vs E5-small test:\n")
print("MiniLM:", qa.run("List two GenAI techniques emphasized."))
print("E5-small:", qa_e5.run("List two GenAI techniques emphasized."))

modules.json:   0%|          | 0.00/387 [00:00<?, ?B/s]

README.md: 0.00B [00:00, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/57.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/615 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/133M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/314 [00:00<?, ?B/s]

vocab.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/125 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/200 [00:00<?, ?B/s]

MiniLM vs E5-small test:

MiniLM: Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

reasoning steps, CoT enhances the problem-solving capabilities of these agents, making them more
effective in handling intricate tasks.
These agents are applied to natural language tasks in different domains, such as ChatDev [39]
and MetaGPT [21], two platforms that allow users to write or generate their own AI Agents
according to their needs, to deal with a wide range of tasks in different scenarios. AutoGen [51] is

LLMs are changing AI Agents’ capabilities by giving them advanced cognitive skills for reasoning
and interaction. One of the key techniques contributing to this transformation is Chain of Thought
(CoT) [29, 49, 50, 54]. CoT enables LLM-driven AI Agents to address tasks that have traditionally
been the domain of symbolic AI. By breaking down complex problems into a series of int

In [10]:
splitter_small = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
chunks_small = splitter_small.split_documents(docs)
vectordb_small = Chroma.from_documents(chunks_small, emb)
qa_small = RetrievalQA.from_chain_type(llm=llm, retriever=vectordb_small.as_retriever(), chain_type="stuff")
print("Default chunks:", qa.run("Summarize the course in one sentence."))
print("Smaller chunks:", qa_small.run("Summarize the course in one sentence."))

Default chunks: Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

and threat management, as evidenced by [14]. Furthermore, specialized models like PIXIU’s Financial Language Model, built on Llama, offer superior assessments of financial statements, as noted
by Zhao et al. [58].
In algorithmic trading, LLMs can contribute through improved sentiment analysis and financial
text interpretation, as highlighted by Huang et al. [24]. This allows for more responsive trading

based on multiple external sources of information.
5.1 Backbone Large Language Models
For the experiment, we select two widely used LLMs, namely Gemini and GPT at the time of this
study, as the backbone LLMs to conduct the stock trading tasks. These two LLMs have shown
robust abilities to comprehend and generate information in complex simulated environments, as

(Section 5.3.2)
(c) How do we determine capital 

In [11]:
repro = {
    "embedding_models": ["all-MiniLM-L6-v2","intfloat/e5-small-v2"],
    "chunking": [{"size":500,"overlap":100},{"size":300,"overlap":50}],
    "llm": MODEL_ID
}
with open("rag_run_config.json","w") as f: json.dump(repro,f,indent=2)
print("Saved rag_run_config.json")

Saved rag_run_config.json
