# Simple Arb Bot Project Report
Wanli Zhou         Aug 17 2024

## I. Project Description

The Simple Arb Bot is an exploratory project that shows how an arbitrage implementation monitors asset prices across exchanges on the blockchain and executes trades on the same asset pair in different markets to capture a low-risk profit from the price difference.

The project sets up a locally run fork of the Ethereum blockchain and simulates real-world trading dynamics through procedurally generated trades of random sizes based on different trader profiles. Through the implementation, the Simple Arb Bot demonstrates the end-to-end flow of price monitoring, opportunity detection, and on-chain execution, and serves as an educational resource for exploring blockchain interactions and development.

## II. Background and Motivation

Arbitrage is a trading strategy that involves taking advantage of price discrepancies of the same asset in different markets or forms to make a profit without risk. The core idea is to buy low in one market and sell high in another simultaneously, thereby capitalizing on the price difference. 

With the rise of blockchain technology, decentralized exchanges (DEXes) have seen tremendous growth. The top 10 DEXes, including Uniswap, Curve, and PancakeSwap, had a combined trading volume of $66.6 billion in Q2 2023. Price discrepancies for the same asset across these DEX markets therefore create sizable arbitrage opportunities that I'm interested in exploring.

This project aims to develop an automated arbitrage bot for Uniswap V2 DEXes that profits from the price differences of the same asset in different DEX markets. The bot collects live prices for a list of assets from on-chain smart contracts and makes informed decisions about when to execute simultaneous arbitrage transactions. The ultimate goal is to profit from these transactions while contributing to the overall efficiency and balance of DeFi markets by aligning prices across different markets or exchanges.

### How Arbitrage Works in Decentralized Exchanges
1. **Identification of Price Discrepancy**: The trader identifies an asset that is priced differently in two or more markets.
2. **Simultaneous Transactions**: The trader buys the asset at the lower price in one market and sells it at the higher price in another market simultaneously.
3. **Profit Realization**: The profit is the difference between the buying and selling prices, minus any transaction costs such as blockchain gas fees and exchange costs.

### Major Types of On-Chain DEXes
On-chain exchanges are decentralized platforms that facilitate the trading of cryptocurrencies directly on the blockchain. The two major types of on-chain exchanges are Central Limit Order Book (CLOB) exchanges and Automated Market Maker (AMM) exchanges.

1. Central Limit Order Book (CLOB) Exchanges operate similarly to traditional financial exchanges. They maintain an order book that lists all the buy and sell orders for a particular asset, and trades are executed when buy and sell orders match.
   
   Key features of a CLOB exchange include:
	- **Order Book**: A ledger where all buy and sell orders are recorded.
	  
	  <img src="images/Orderbook.png" alt="ETH-USD Perpetual's Orderbook on dYdX Exchange" width="700"/>
	
	- **Order Matching**: Orders are matched based on price and time priority.
	  
	- **Trade Execution**: Trades occur when buy and sell orders match.
	  
	- **Liquidity Providers**: Users place orders in the order book, providing liquidity.
   

2. Automated Market Maker (AMM) Exchanges use smart contracts to create liquidity pools, where users can trade against the pool rather than matching with another user. Prices are determined by mathematical formulas.

   Key features of an AMM exchange include:
	- **Liquidity Pools**: Pools of tokens provided by liquidity providers (LPs). Each liquidity pool is a trading venue for a pair of tokens.

	  <img src="images/AMM_pool_by_Uniswap.png" alt="Concept of an AMM pool by Uniswap" width="700"/><br>
	
	- **Pricing Algorithm**: Prices are determined by algorithms like the constant product formula (x * y = k).

	  <img src="images/AMM_Swap_by_Uniswap.png" alt="Concept of an AMM Swap by Uniswap" width="700"/><br>
	
	- **Liquidity Providers**: Users provide liquidity to pools and earn fees.
	
<br>

Since prices are discovered differently under each model, arbitrage opportunities arise between **CLOB x AMM** or even between two different AMMs.
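
As a rough illustration of the constant product formula mentioned above, the sketch below computes the output of a single swap against an AMM pool. The function name `get_amount_out`, the 0.3% fee (the Uniswap V2 pool fee), and the pool reserves are assumptions chosen for illustration and are not taken from the project's code.

```python
def get_amount_out(amount_in: float, reserve_in: float, reserve_out: float,
                   fee: float = 0.003) -> float:
    """Output of a constant-product (x * y = k) swap, with the fee taken from the input."""
    if amount_in <= 0 or reserve_in <= 0 or reserve_out <= 0:
        raise ValueError("amounts and reserves must be positive")
    amount_in_after_fee = amount_in * (1 - fee)
    # Solve (reserve_in + amount_in_after_fee) * (reserve_out - amount_out) = reserve_in * reserve_out
    return amount_in_after_fee * reserve_out / (reserve_in + amount_in_after_fee)


# Hypothetical pool: 1,000 ETH and 2,000,000 USDC, i.e. a spot price of 2,000 USDC per ETH.
eth_reserve, usdc_reserve = 1_000.0, 2_000_000.0
usdc_out = get_amount_out(10.0, eth_reserve, usdc_reserve)
effective_price = usdc_out / 10.0

print(f"USDC received for 10 ETH: {usdc_out:.2f}")
print(f"Effective price: {effective_price:.2f} USDC per ETH (spot: 2000.00)")
```

The effective price comes out below the spot price because the trade itself moves the pool's reserves. Two pools holding the same pair with different reserves therefore quote different prices, which is the gap an arbitrageur trades against.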

### Factors in a Profitable Arbitrage Transaction
1. **Slippage**<br>
   Slippage refers to the difference between the expected price of a trade and the actual price at which the trade is executed. In the context of an arbitrage trade, slippage refers to the price change between the time an arbitrage opportunity is spotted and the time the arbitrage trade is finalized on-chain. Slippage can occur due to various factors, including low liquidity, high volatility, or delays in transaction processing.
   
   - Liquidity: <br>
      AMMs rely on liquidity pools to facilitate trades. If a liquidity pool has a small amount of an asset, large trades can significantly impact the price, leading to higher slippage. In contrast, larger liquidity pools can absorb bigger trades with minimal price impact, reducing slippage.
     
   - Volatility: <br>
      High volatility in the market can lead to rapid price changes, increasing the likelihood of slippage.
     
   - Blockchain transaction speed: <br>
      Due to the nature of blockchain, every on-chain transaction participates in an auction in which users compete on how much they are willing to pay to have their transactions validated first. Users set a **gas price** for their transaction: a higher gas price means a higher gas cost but faster processing, while a lower gas price saves on transaction costs but risks the transaction taking too long and the arbitrage opportunity being lost. Thus, an arbitrageur must always look for the best transaction price, balancing cost and speed.
<br>
<br>

2. **Concurrency**<br>
   Trades must be executed as fast as possible to capture the price difference of an asset. This means that funds need to be deposited in the wallets for both markets so that the bot can execute the buy and sell transactions at the same time.
<br>
<br>

3. **Costs**<br>
   The profitability of an arbitrage trade also needs to factor in the blockchain gas cost and the exchange transaction fees.
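
The factors above can be sketched numerically. The example below is a minimal illustration, assuming a constant-product pool with fees ignored for the price-impact part; the helper names `price_impact` and `net_arbitrage_profit` and all figures are hypothetical and not part of the project's code.

```python
def price_impact(amount_in: float, reserve_in: float, reserve_out: float) -> float:
    """Fractional shortfall of the execution price versus the pool's spot price (fees ignored)."""
    spot_price = reserve_out / reserve_in
    amount_out = amount_in * reserve_out / (reserve_in + amount_in)
    return 1 - (amount_out / amount_in) / spot_price


def net_arbitrage_profit(buy_cost: float, sell_proceeds: float,
                         gas_cost: float, exchange_fees: float) -> float:
    """Profit left after subtracting the purchase cost, gas, and exchange fees."""
    return sell_proceeds - buy_cost - gas_cost - exchange_fees


# The same 10 ETH trade against a shallow and a deep hypothetical pool (spot price 2,000 in both).
shallow_impact = price_impact(10.0, 100.0, 200_000.0)
deep_impact = price_impact(10.0, 10_000.0, 20_000_000.0)
print(f"Shallow pool price impact: {shallow_impact:.2%}")
print(f"Deep pool price impact:    {deep_impact:.2%}")

# A hypothetical trade in USD: buy for 10,000, sell for 10,300, pay 40 in gas and 60 in fees.
profit = net_arbitrage_profit(10_000.0, 10_300.0, 40.0, 60.0)
print(f"Net profit: ${profit:.2f}")
```

The shallow pool loses a much larger share of the trade to price impact than the deep pool, which is why pool liquidity matters when sizing an arbitrage order.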

## III. Product Design

### Simple Arb Bot Workflow
![Simple Arbitrage Bot Architecture Design](images/Simple_Arb_Bot_Architecture.png)

To run the arb bot and capture on-chain opportunities, the demo implementation has two core components.
1. Arbitrage Bot
   
    Arbitrage Bot consists of an off-chain `opportunity_analysis` module that 
    
    - sets up the bot with user-provided configurations including slippage, target profitability, and run time

    - fetches token prices from on-chain DEXes including Uniswap and SushiSwap
  
    - analyzes arbitrage opportunities factoring in gas, transaction costs, and price differences

    - sends on-chain transactions to the arbitrage contract
  
    - logs arbitrage trade details on console and in a csv file

2. Sim Traders

   Sim Traders consists of traders that conduct four different types of trading scenarios in the `trading_env_sims` module, which is described in detail in the **Trading Environment Simulation** section below.
   
   This component sends trades of random values between random token pairs through the two different router contracts to mimic a realistic trading environment. These random trades are intended to cause price discrepancies between exchanges and create arbitrage opportunities.

### Strategy Design
With the profitability factors listed in the last section of II. Background and Motivation, the strategy implemented in the arbitrage bot is thus formulated around the following principles:
1. An arbitrage trade is executed only when the observed price difference between the two markets reaches the required $\text{price difference(\%)}$, which is derived from a user-defined $\text{minimum profitability(\%)}$.
   
2. The required $\text{price difference(\%)}$ is calculated using the following formula:
   $$\text{price difference(\%)} = \text{min. profitability}+\frac{\text{gas costs + trading fees}}{\text{order value}}+\text{slippage buffer}$$

3. The $\text{slippage buffer}$ is a user-defined percentage enforced on AMM trades, which will fail a trade if the final slippage exceeds the pre-defined percentage.

4. Trades in one arbitrage transaction will be executed only when both trades can be executed under predefined $\text{minimum profitability}$ and $\text{slippage buffer}$, through smart contract enforcement. If any condition causes at least one trade to fail, the transaction is reverted, ensuring that neither trade is executed.

5. Because of the static nature of the Ethereum fork hosted locally for this demo implementation, trades are likely to encounter large slippage. To reduce the possibility of reverts, the default bot strategy (without user configurations) sets $\text{minimum profitability}$ to 5% and $\text{slippage buffer}$ to 10%.
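
The principles above can be sketched as a simple threshold check. This is a minimal sketch and not the project's implementation: the function names, the example cost figures, and the choice to measure the observed difference relative to the lower price are assumptions. Only the formula and the 5% and 10% defaults come from the text.

```python
def required_price_difference(min_profitability: float, gas_cost: float, trading_fees: float,
                              order_value: float, slippage_buffer: float) -> float:
    """Required price difference as a fraction: min. profitability + costs / order value + buffer."""
    if order_value <= 0:
        raise ValueError("order_value must be positive")
    return min_profitability + (gas_cost + trading_fees) / order_value + slippage_buffer


def should_execute(price_low: float, price_high: float, threshold: float) -> bool:
    """Assumption: the observed difference is measured relative to the lower of the two prices."""
    observed_difference = (price_high - price_low) / price_low
    return observed_difference >= threshold


# Default strategy values from the text: 5% minimum profitability and 10% slippage buffer.
# The gas cost, trading fees, and order value (all in USD) are hypothetical.
threshold = required_price_difference(
    min_profitability=0.05, gas_cost=20.0, trading_fees=60.0,
    order_value=10_000.0, slippage_buffer=0.10,
)
print(f"Required price difference: {threshold:.2%}")
print("Execute at 2000 vs 2400:", should_execute(2_000.0, 2_400.0, threshold))
print("Execute at 2000 vs 2200:", should_execute(2_000.0, 2_200.0, threshold))
```

With the defaults, the fixed 15% from profitability and slippage buffer dominates the threshold, so only large price gaps qualify. This fits the simulated environment described here, where such gaps do occur.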

### Trading Environment Simulation
As the local Ethereum environment is forked from a fixed state, the project uses procedurally generated transactions of random trades to create a trading environment to test and optimize the performance of the strategy implemented.

In order to mimic a realistic trading environment with different trader profiles and liquidity operation scenarios, four different trade simulation bots are created in `trading_env_sims`. Specifically,

1. `trading_env_sims/whale_trading.py`: a whale trader simulator that trades ETH (the gas token on Ethereum) for ERC20 tokens (other non-gas assets) on UniswapV2.
    - The whale account holds ~$300k worth of each baseAsset listed in `configs/mainnet.json`.
    - Every 5 seconds, the whale trades ETH for another random baseAsset for a random value between $30,000 and $100,000 in a single swap.
<br>
<br>

2. `trading_env_sims/whale_erc20_trading.py`: a whale trader simulator that trades ERC20 for ERC20 on UniswapV2.
    - The whale account holds ~$300k worth of each baseAsset listed in `configs/mainnet.json`.
    - Every 5 seconds, the whale trades a random baseAsset for another random baseAsset for a random value between $30,000 and $100,000 in a single swap.
<br>
<br>

3. `trading_env_sims/regular_trading.py`: a regular trader simulator that trades ETH for ERC20 on SushiSwap.
    - The regular trader account holds ~$15k worth of each baseAsset listed in `configs/mainnet.json`.
    - Every second, the regular trader trades ETH for another random baseAsset for a random value between $300 and $9,000 in a single swap.
<br>
<br>

4. `trading_env_sims/regular_erc20_trading.py`: a regular trader simulator that trades ERC20 for ERC20 on SushiSwap.
    - The regular trader account holds ~$15k worth of each baseAsset listed in `configs/mainnet.json`.
    - Every second, the regular trader trades a random baseAsset for another random baseAsset for a random value between $300 and $9,000 in a single swap.
<br>
<br>

The trade simulation bots are run through the shell script `./scripts/simulate_trade_env.sh`.
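
To illustrate how such randomized trades might be generated, here is a minimal sketch. It only produces trade parameters and does not send any transaction. The function name `random_trade`, the token list, the uniform distribution of trade values, and the fixed seed are assumptions for illustration, not details of the scripts in `trading_env_sims`.

```python
import random


def random_trade(assets: list, min_usd: float, max_usd: float, rng: random.Random) -> dict:
    """Pick two distinct assets and a trade value drawn uniformly from [min_usd, max_usd]."""
    token_in, token_out = rng.sample(assets, 2)
    return {
        "token_in": token_in,
        "token_out": token_out,
        "value_usd": rng.uniform(min_usd, max_usd),
    }


rng = random.Random(42)  # fixed seed so the sketch is reproducible
base_assets = ["WETH", "USDC", "DAI", "WBTC"]  # hypothetical asset list

# Whale profile from the text: $30,000 to $100,000 per swap.
whale_trades = [random_trade(base_assets, 30_000, 100_000, rng) for _ in range(5)]
# Regular trader profile from the text: $300 to $9,000 per swap.
regular_trades = [random_trade(base_assets, 300, 9_000, rng) for _ in range(5)]

for trade in whale_trades + regular_trades:
    print(f"{trade['token_in']} -> {trade['token_out']}: ${trade['value_usd']:,.2f}")
```

In a running simulator, a loop would call a function like this once per interval (every 5 seconds for whales, every second for regular traders) and submit the resulting swap to the corresponding router.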

For future iterations, more types of trade simulation bots will be created, covering profiles including:
- liquidity providers who add and withdraw liquidity
- competing arbitrage bots

## IV. Code Design 
### UML (before implementation, with changes made in the implementation)
![Simple Arbitrage Bot Architecture Design UML](images/Simple_Arb_Bot_Code_Design.png)

## V. Code Implementation 
GitHub link:
https://github.com/bigbowlz/simple_arb_bot

Please follow the README.md to set up a local trade environment and start a bot instance.

## VI. Bot Performance Evaluation

### Performance Metrics
1. Profitability
    - Net Profit: Calculate the total profits after deducting all costs, including transaction fees, gas fees, and any other operational costs. Net Profit = BalanceAfter - BalanceBefore.
    - Return on Investment (ROI): ROI = (Net Profit / Total Investment) * 100%. This measures the efficiency of the investment. Specifically, total investment refers to all liquidity provided to the arb contract as well as gas. In the actual implementation, gas is supplied from a test account with unlimited ETH and cannot be meaningfully accounted for, so it is omitted from the ROI calculation.
    - Profit per Trade: Average profit per arbitrage opportunity.
2. Success Rate
    - Winning Trades vs. All Trades: The ratio of profitable trades to all trades. Losing trades include both trades that execute on exchanges with a net loss and trades that fail to execute on exchanges but still incur gas costs.
3. Execution Speed
    - Analysis Latency: Time taken to detect and execute an arbitrage opportunity.
    - On-chain Execution Time: The time between placing the order and the order getting executed, which reflects gas price competitiveness and serves as a reference for gas optimization.
4. Trade Volume: The total volume of trades executed by the bot.
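
The profitability and success-rate metrics above can be sketched as follows. The function name `summarize_trades` and the sample figures are hypothetical; the formulas follow the definitions in the list, with failed trades that only incur gas represented as negative profits.

```python
def summarize_trades(profits_usd: list, total_investment_usd: float) -> dict:
    """Compute net profit, ROI (%), profit per trade, and success rate from per-trade profits."""
    trade_count = len(profits_usd)
    if trade_count == 0 or total_investment_usd <= 0:
        raise ValueError("need at least one trade and a positive investment")
    net_profit = sum(profits_usd)
    winning_trades = sum(1 for profit in profits_usd if profit > 0)
    return {
        "net_profit": net_profit,
        "roi_pct": net_profit / total_investment_usd * 100,
        "profit_per_trade": net_profit / trade_count,
        "success_rate": winning_trades / trade_count,
    }


# Hypothetical per-trade profits in USD; negative entries are losing or failed trades.
summary = summarize_trades([120.0, -15.0, 45.0, -5.0], total_investment_usd=1_000.0)
for metric, value in summary.items():
    print(f"{metric}: {value:.2f}")
```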
	

### Performance Experiment
To evaluate the performance of the bot, I conducted three bot runs with the same configuration and recorded the performance of each run for an aggregated analysis.

The results of each run are located in the `experiment_logs` directory.

The three runs were conducted with the exact same configurations listed below, which can be found in their `arb_bot_config.json` file.
- 500 bps in minimum profit
- 1000 bps in slippage buffer
- and 30 minutes in run time

The following performance analysis is based on the `trade_logs_bot.csv` file of each run:

### Profitability and Success Rate Analysis

```python
import pandas as pd

trade_logs_1 = pd.read_csv('../experiment_logs/08171715/trade_logs_bot.csv')
trade_logs_2 = pd.read_csv('../experiment_logs/08171749/trade_logs_bot.csv')
trade_logs_3 = pd.read_csv('../experiment_logs/08171915/trade_logs_bot.csv')

# get the number of arb trades in each bot run
trade_count_1 = int(len(trade_logs_1))
trade_count_2 = int(len(trade_logs_2))
trade_count_3 = int(len(trade_logs_3))

# get the final net result of each bot run and combine them in one dataframe
final_result_1_df = trade_logs_1.iloc[-1].to_frame().T
final_result_2_df = trade_logs_2.iloc[-1].to_frame().T
final_result_3_df = trade_logs_3.iloc[-1].to_frame().T
result_combined_df = pd.concat([final_result_1_df, final_result_2_df, final_result_3_df], ignore_index=True)

# get the net_profit, profit_per_trade, net_roi, success_rate, total_trade_volume_usd of the three bot runs
# .copy() makes net_result_df an independent DataFrame, so the assignments below
# do not trigger pandas' SettingWithCopyWarning
net_result_df = result_combined_df.iloc[:, :5].copy()
net_result_df['trade_count'] = [trade_count_1, trade_count_2, trade_count_3]
average_row = net_result_df.mean()
net_result_df.loc['Average'] = average_row

net_result_df
```

| | net_profit($) | profit_per_trade($) | net_roi | success_rate | total_trade_volume_usd($) | trade_count |
|---|---|---|---|---|---|---|
| 0 | 304882.539391 | 101627.51 | 11.77 | 1.0 | 601374.323119 | 3.0 |
| 1 | 302088.947561 | 60417.79 | 11.67 | 1.0 | 1182458.815662 | 5.0 |
| 2 | 372608.626829 | 74521.73 | 14.39 | 1.0 | 1294585.0645 | 5.0 |
| Average | 326526.704593 | 78855.676667 | 12.61 | 1.0 | 1026139.401093 | 4.333333 |

As shown in the output dataframe, over the half-hour run time:
- The average net_profit of the bot is $326,526.70, while the average profit per trade is close to $80k.
- The net return on investment, which refers to (Net Profit / Total Investment), reaches 1261% on average (shown as the ratio 12.61 in the `net_roi` column). This is consistent with the initial investment of $25,895 across all tokens held in the bot contract account.
- The success rate has been 100% across all three bot runs. 
- The total trade volume on average amounts to $1.03m, and on average there are more than 4 arb trades for each bot instance.
 
The three bot runs are **consistent** in their performance. However, the profits and trade volume **cannot be taken at face value** for two reasons:

1. In value reporting calculations, tokens are valued at their off-chain dollar prices. However, as the experiment is conducted in a closed trading environment with just a few trading accounts, the randomness of trades from these simulator accounts can generate dramatic arbitrage opportunities without interference from other participants. For example, over time, it's likely that in the experiment's on-chain environment a trader is able to trade 1000 USDT for 1 BTC because of the lack of USDT liquidity in a pool (while the off-chain price is 50,000 USDT for 1 BTC). 

    In a real trading environment, such large opportunities would not persist, because competing bots and other trading participants would have initiated price correction trades long before prices diverged so much across exchanges. 

2. Also, the simulator traders implemented in this experiment didn't consider slippage when generating random trades, meaning that even when they can only get an unreasonably small amount of token B in return with a certain amount of token A in a trade, they would still proceed without worrying about the profit or loss. 

For these two reasons, the only way to measure the bot's profitability at off-chain dollar values is to deploy the bot on a production blockchain where real participants correct prices constantly. This will be explored in future iterations after further optimization of the bot.

### Execution Speed Analysis

```python
latency_columns_1 = trade_logs_1[['analysis_latency(ms)', 'on_chain_execution_time(ms)']]
latency_columns_2 = trade_logs_2[['analysis_latency(ms)', 'on_chain_execution_time(ms)']]
latency_columns_3 = trade_logs_3[['analysis_latency(ms)', 'on_chain_execution_time(ms)']]

# get the average latency across trades for each bot run
average_latency_1 = latency_columns_1.mean()
average_latency_2 = latency_columns_2.mean()
average_latency_3 = latency_columns_3.mean()

# create a df for average latency records of the three bot runs
columns = ['analysis_latency(ms)', 'on_chain_execution_time(ms)']
average_latency_combined_df = pd.DataFrame(columns=columns)
average_latency_combined_df.loc['bot_run_1'] = average_latency_1
average_latency_combined_df.loc['bot_run_2'] = average_latency_2
average_latency_combined_df.loc['bot_run_3'] = average_latency_3

# add an average latency row for the three bot runs
average_latency = average_latency_combined_df.mean()
average_latency_combined_df.loc['Average'] = average_latency

average_latency_combined_df
```


| | analysis_latency(ms) | on_chain_execution_time(ms) |
|---|---|---|
| bot_run_1 | 7.424593 | 145.66199 |
| bot_run_2 | 12.202168 | 18.310976 |
| bot_run_3 | 9.334421 | 18.084717 |
| Average | 9.653727 | 60.685894 |


As shown in the output table above, except for one anomaly in the on-chain execution time of bot_run_1 (about 146 ms), all other latency metrics are consistently low, well under 100 ms.

This is expected, as both the on-chain token price queries made by the `opportunity_analysis` module and the smart contract execution take place in a locally hosted environment with fast block generation.

The anomaly is potentially due to the initiation of the simulator trades.

### Console Performance Logging 
Apart from the CSV logs, while the bot is running, users are able to read the performance logs from the console after every trade, allowing them to manage the bot instance in real time.
<img src="images/arb_trade_success_log.png" alt="Console log of a successful arbitrage trade" width="700"/>


### Industry Performance Reference (not benchmarked)
Because arbitrage bots are commercially sensitive and benchmarking time was limited, I could not find any easily obtainable open-source implementation intended for *production* use. Thus all of the following performance benchmarks/references are obtained from communications (social media posts or blog announcements) about some arbitrage bots, rather than from actual benchmarking in the same test environment.
1. Profitability

    A reference implementation was able to achieve the following ROI with an initial $20 investment for each asset after running for 12 hours on a production blockchain. 
    - weth: 69.58bps
    - wnear: 966.04bps
    - usdt: 124.10bps
    - aurora: 7.26bps
    - atust: 585.36bps
    - pad: 0.00bps
    - usdc: 194.33bps

    With annualization, this would translate to 1300% in Annual Percentage Yield (APY). However, because the investment size is very small ($20 for each of the seven assets, or $140 in total), even this high APY would have yielded only $1,820 in annual profits, and the strategy faces scalability challenges.
    Source: https://jamesbachini.com/dex-arbitrage/

    Hummingbot community discussions (Reddit) have shown that Hummingbot users have seen returns ranging from 0.1% to 1% per day. Compounded daily, 0.1% per day is roughly 44% in APY and about 0.3% per day is roughly 200%, while 1% per day would exceed 3,000%. However, this is highly dependent on market conditions, strategy, and transaction costs. 

    While my simple arb bot implementation currently achieves 1261% * 24 * 2 * 365 = 22,092,720% when its half-hour ROI is annualized without compounding, these performance results cannot be compared, as the variables and environment were configured entirely differently. 
    
    However, the online references provide some insights into the profitability of arbitrage trade bots: with a small investment, we could usually see really high return that reaches several times the original investment, but scalability is limited in a single strategy, so in future iterations, it's worth combining a set of trading strategies together to respond more dynamically to market changes.

2. Execution Speed
   
    Latency is one of the most important factors in on-chain arbitrage trades, as transactions are subject to attacks while they reside in the mempool. The mempool is a publicly open pool of transactions that are yet to be mined and are not finalized on-chain. There are bots specifically targeting the mempool for profitable transactions, which conduct Maximal Extractable Value (MEV) attacks to execute trades against them.

    Although my arbitrage bot is able to achieve millisecond performance in the test environment, for real-world on-chain transactions, finality usually takes more than 12 seconds (which is the current block generation time) or even minutes when there's congestion. Through research, I have noticed that there are certain protocols and applications that are built to bypass the mempool during arbitrage trades, resulting in milliseconds in latency as the transactions can be finalized/mined immediately after submission. Such protocols include Flashbots. 

    Other attempts to avoid latency in on-chain finality include designs that batch transactions and match orders without directly exposing them to the public mempool, and Cowswap is a representative application that implemented trade matching.
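
The annualization arithmetic used in the profitability references above can be sketched as follows. The function names are hypothetical. The first function reproduces the simple (non-compounded) scaling applied to the bot's half-hour ROI, and the second shows daily compounding for comparison.

```python
def simple_annualized_pct(period_return_pct: float, periods_per_year: int) -> float:
    """Scale a per-period return (%) linearly to one year, without compounding."""
    return period_return_pct * periods_per_year


def compounded_apy_pct(period_return_pct: float, periods_per_year: int) -> float:
    """Annual percentage yield (%) when the per-period return is reinvested every period."""
    return ((1 + period_return_pct / 100) ** periods_per_year - 1) * 100


# The bot's average half-hour ROI of 1261%, scaled to a year: 24 hours * 2 half-hours * 365 days.
half_hours_per_year = 24 * 2 * 365
print(f"Simple annualization: {simple_annualized_pct(1261, half_hours_per_year):,}%")

# Daily returns at the two ends of the quoted range, compounded daily.
for daily_return_pct in (0.1, 1.0):
    apy = compounded_apy_pct(daily_return_pct, 365)
    print(f"{daily_return_pct}% per day compounds to about {apy:,.0f}% APY")
```

Linear scaling and compounding diverge quickly at high per-period returns, which is one more reason the figures from different sources should not be compared directly.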

    

### Next Steps for Benchmarking
There are paid arbitrage bot implementations which I am planning to use for future profitability benchmarking and optimization references, including https://wundertrading.com/en/account/subscription/pricing.

The research findings on arbitrage implementations that try to avoid latency also provide guidance for future research and for integrations between my arbitrage bot and protocols like Flashbots and Cowswap to improve finality speed.

## VII. Toolings and Implementation References

#### Smart Contract development and interactions
**web3.py** - a Python library for interacting with Ethereum.

web3.py is used in the project to help with sending transactions, interacting with smart contracts, reading block data and transaction data, and other utility functions, including converting currencies.

**Hardhat** -  a development environment for Ethereum software. Compile, test, debug and deploy smart contracts. Supporting network forking to simulate real data in a test environment. Tutorial: https://hardhat.org/tutorial

Hardhat is used in the project to host a locally run Ethereum mainnet fork and deploy the arbitrage contract.


#### Existing bot implementations:
HummingBot: open-source framework that helps you design, backtest, and deploy automated trading bots. https://hummingbot.org/

An educational implementation: https://github.com/jamesbachini/DEX-Arbitrage



## VIII. Learnings

The Simple Arb Bot was a great learning experience that taught me blockchain development basics, honed my Python coding skills, and most importantly, trained me to solve problems in all kinds of error scenarios when working with new coding languages, libraries, and frameworks. 

This is my first CS project that involves blockchain development. Through this project, I explored important tools in blockchain development, like the Hardhat development framework and the web3.py Python library for interacting with the blockchain; I also started to code in Solidity and JavaScript, the two most commonly used languages in the blockchain development ecosystem. However, as many of these tools were built recently, there aren't many experiences shared online for debugging and working around limitations, so figuring out how to solve issues that derived from these tools became the biggest chunk of my work in the actual development. This was a tough and time-consuming experience, but it was rewarding because I learned valuable problem-solving skills that will help me with future explorations of any new library or framework. I found that reading the documentation details and the code implementation itself was helpful in understanding the uses of a library or framework function. I also became more experienced in identifying the root cause of a bug through traceback logs.

Developing a bot that can monitor and interact with multiple exchanges and contracts simultaneously also helped me gain deeper knowledge of how blockchain networks, smart contracts and DEXes work. It gave me a chance to look into the economic and functional designs of top blockchain exchanges, and gain insights into financial concepts like arbitrage, market making, and liquidity provision. Understanding these is valuable not just in blockchain, but also in fintech and traditional finance sectors. While trying to get the contract ABIs from the production Ethereum blockchain, I learned to interact with APIs, which taught me the fundamentals of HTTP protocols. This knowledge is transferable to many areas of software development.

As the project is mostly built in Python, I employed OOP design principles in Python for the first time, and modeled the arbitrage bot as a class that allows me to easily access the smart contract functions and blockchain utility features. The choice of encapsulation not only helped with tweaking bot configurations with different setups to optimize performance, but also proved helpful in running multiple trader instances for different trade profiles.

There are many limitations in my bot implementation, and the current state of strategy design and code implementation is far from production-ready. However, the current implementation demonstrates the end-to-end flow of an arbitrage bot in a simulated dynamic trading environment, and provides a solid foundation on which a production-use bot can be built. In future iterations, I'm planning to focus on the following to improve the bot:
- diversify trading strategies for better scalability and opportunity capture in different market conditions
- utilize searching and sorting algorithms for faster arb opportunity capture
- integrate with existing bot frameworks or platforms to improve on latency
- benchmark production bot implementations to understand profitability performance and reverse-engineer their strategy through on-chain events

By working on a blockchain arbitrage bot project, I gained hands-on experience that ties together theory and practice across multiple areas of computer science, programming, and blockchain technology. I hope to keep exploring the topic of arbitrage and, in future iterations, extend the opportunity capture of my bot to more types of decentralized financial applications, such as derivatives and basis trading.