# Zeth cadCAD simulations

---------

- **Author:** Antoine Rondelet (ar@clearmatics.com)
- **Version:** 0.2.0

---------

In this notebook, we simulate the evolution of a simplified blockchain system on which Zeth is deployed, in order to better inform the choice of the various protocol parameters (number of input notes, number of output notes, depth of the Merkle tree, etc.). This notebook can also be used for further experiments (e.g. with parameters that are not yet documented).

This notebook focuses on some key questions regarding the growth of the blockchain state under different configurations. This is particularly important because the growth of the blockchain state is a key factor that impacts the number of nodes on the distributed system: it drives the HW requirements for existing nodes on the network, and it affects how easy it is for new nodes to join the network (i.e. to sync a new node).
In other words, as the number of nodes on a blockchain "boils down to convenience", we are interested in seeing how convenient (easy/fast/cheap) it is to validate on a blockchain under various network assumptions. Studying the state growth provides valuable hints in that regard. Nevertheless, the reader is reminded that, by the very essence of modeling, we make several simplifying assumptions in the sections below, which therefore ignore various aspects of a "real-life" running system.

------------------

<em><center>"All models are wrong, but some are useful"</center></em>
<center>George E. P. Box.</center>

------------------

Hopefully this one is somewhat useful...

## Open questions

This notebook is structured around some key open questions that help to better understand the impact of Zeth on a blockchain system. In future work, we will likewise investigate how well Zecale, a privacy-preserving scalability solution, performs in terms of both data compression and TPS.

### Question 1

After how much time does the Zeth Merkle tree become full, for a given Merkle tree depth?
- Assumption: all blocks mined are full and only made of Zeth transactions

### Question 2

How does the chain state size compare when only Zeth transactions are used, as opposed to the case where only "plain" EOA-to-EOA Ethereum transactions are used?

### Question 3

What is the gas cost per byte for EoA-to-EoA transactions and for Zeth transactions?

### Question 4

After how much time does the chain data become larger than 1TB? (1TB is the maximum storage of the latest XPS-15 laptop. We use this threshold as an indicator of how long it takes for running a node to become inconvenient and to require some "specialized" HW.)

### Question 5

What is the impact of Zeth on the TPS of the system?
- Assumption: all blocks mined are full and only made of Zeth transactions

### Question 6

How well does Zecale compress the state (compared to "vanilla Zeth")? Furthermore:
- How is TPS impacted when batching Zeth transactions with Zecale?
- Are data compression and TPS moving in the same direction?

**This question is answered in another notebook dedicated to Zecale**

## Setup code dependencies

In [None]:
# cadCAD configuration modules
from cadCAD.configuration.utils import config_sim
from cadCAD.configuration import Experiment
from cadCAD import configs

# cadCAD simulation engine modules
from cadCAD.engine import ExecutionMode, ExecutionContext
from cadCAD.engine import Executor

# Analysis and plotting modules
import pandas as pd
import plotly
pd.options.plotting.backend = "plotly"

# Misc
## Pretty print function
from pprint import pprint
# Numpy
import numpy as np

## Model

### Zeth state

We assume that the Zeth state ($\zeta_z$) is only made of the following:

- Zeth Merkle tree, simply modelled as a set of leaf nodes, i.e. Merkle tree Leaves Set (denoted $\mathtt{MKLS}$). The full tree with intermediate nodes can be recovered by recursive hashing from the tree leaves.
- Nullifiers set (denoted $\mathtt{NS}$)
- Roots set (denoted $\mathtt{RS}$)

Importantly, we do not account for the storage cost of the Zeth contracts (a one-time operation carried out at initialization time), nor for their various storage **constants** (i.e. constant protocol parameters), etc.

### Blockchain state

We assume that the blockchain state ($\zeta_b$) is *only* made of:
- The chain of block headers
- The chain of block bodies (the set of transactions)
- The chain of receipts (past transaction results and contract logs)
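Under these assumptions, the storage cost of one block can be approximated as its header plus, for each transaction, its body entry and its receipt, all scaled down by the database compression ratio (assumption 7 below). The sketch below only illustrates this arithmetic: the helper `block_size_bytes` is hypothetical, it assumes one receipt per transaction, and it plugs in constants that are defined later in this notebook (mainnet median gas limit, plain EoA-to-EoA transactions).

```python
def block_size_bytes(nb_txs, tx_size, receipt_size, header_size, compression_ratio):
    """Approximate size of one block under the simplified state model:
    one header, plus one body entry and one receipt per transaction.
    Returns (uncompressed size, size after compression), in bytes."""
    raw = header_size + nb_txs * (tx_size + receipt_size)
    return raw, raw / compression_ratio

# Constants defined later in this notebook: mainnet median block gas limit,
# DGAS, ETHTXSIZE, ETH_RECEIPT_SIZE, BLOCKHEADERSIZE and COMPRESSION_RATIO.
nb_txs = 7996822 // 21000  # full block of plain EoA-to-EoA transfers
raw, compressed = block_size_bytes(nb_txs, 111, 12, 508, 1.5)
print(nb_txs, raw, round(compressed, 2))  # 380 47248 31498.67
```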

### Further assumptions

*Some of these assumptions are not strictly necessary, but it is helpful to make them for now in order to further simplify the system and remove any potential unexpected moving pieces*

1. We assume that all transactions emitted are eventually mined. More precisely, we assume:
    - no network failures (no messages are dropped/lost)
    - miners have unbounded memory (no need to drop transactions from the pool)
    - miners mine transactions in the order they receive them (no censorship etc)
2. We only consider two types of blockchain transactions:
    - plain "EOA-to-EOA" transactions with no extra `data`
    - Zeth transactions
3. We assume the number of accounts is fixed throughout the simulation (for now, we simply reason in terms of number of transactions, without considering the accounts they originate from).
4. We model the blockchain as a mere chain of blocks (no forks, no ommer blocks etc) that are made of a header and a list of transactions.
5. We assume that all blockchain related configuration parameters are fixed (fixed block gas limit etc.)
6. We assume that all Zeth contracts are already deployed (no deployment cost (size/storage-wise and gas-wise) to take into consideration).
7. We assume that the blockchain state is stored by the client in a database which supports automatic data compression:
    - We assume that compression is instantaneous
    - We assume a fixed compression ratio on the stored state as plain text data
    - We ignore the potential overhead associated with accessing values associated with hashes on disk
    - We assume that the state fits entirely in memory

<u>Note:</u> At the time of writing, the [go-ethereum](https://github.com/ethereum/go-ethereum) client uses the [LevelDB database](https://github.com/google/leveldb), which compresses data with [Snappy](https://github.com/google/snappy). Other clients may, however, use other databases. For instance, the [openethereum client](https://github.com/openethereum/openethereum/blob/v3.2.0/bin/oe/db/rocksdb/mod.rs) uses [Rocksdb](https://rocksdb.org/docs/getting-started.html), whose compression can be configured to use [lz4](https://github.com/lz4/lz4), for instance (though [Snappy is kept as default](https://github.com/facebook/rocksdb/wiki/Compression)). See also the documentation of [Turbo-Geth](https://github.com/ledgerwatch/turbo-geth/blob/v2021.02.04/docs/programmers_guide/db_walkthrough.MD), which proposes an alternative to go-ethereum's way of organising the persistent data in the database.
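The fixed $\mathtt{COMPRESSION\_RATIO}$ is a simplification: the achievable ratio depends on the data being stored. The sketch below measures the ratio on two synthetic payloads. It uses `zlib` from the Python standard library purely as a stand-in for Snappy (Snappy trades compression ratio for speed, so its figures will differ), and the payloads are illustrative assumptions rather than actual chain data.

```python
import os
import zlib

def compression_ratio(payload: bytes, level: int = 6) -> float:
    """Uncompressed size divided by compressed size.
    zlib (DEFLATE) is only a stand-in for Snappy here."""
    return len(payload) / len(zlib.compress(payload, level))

# 32-byte words made of 24 zero bytes and 8 fresh random bytes,
# loosely mimicking zero-padded ABI-encoded values
padded = b"".join(b"\x00" * 24 + os.urandom(8) for _ in range(1000))
# Uniformly random bytes, standing in for hashes or ciphertexts
random_bytes = os.urandom(32000)

print(round(compression_ratio(padded), 2))
print(round(compression_ratio(random_bytes), 2))
```

Zero-padded words compress well, whereas pseudo-random bytes barely compress at all (zlib may even add a few bytes of overhead), which suggests that applying one ratio to every transaction type is a coarse approximation.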

### Constants

Below are the parameters that remain constant across simulations.

- The size (in bytes) of a plain Ethereum transaction is denoted by $\mathtt{ETHTXSIZE}$
- The intrinsic/default gas cost of an Ethereum transaction ("plain EOA to EOA" transaction) is denoted by $\mathtt{DGAS}$
- The gas cost for the set of supported pre-compiled contracts is denoted by $\mathtt{PRECOMPILED\_CURVES}$
- The database compression ratio is denoted by $\mathtt{COMPRESSION\_RATIO}$

### Variable parameters

Below are the parameters that may change across (and during) simulations.

#### Zeth parameters

- We denote the curve used by Zeth as $\mathtt{ZETHCRV} \in \{\mathtt{BN254}, \mathtt{BLS12-377}\}$ (it determines which set of precompiled contracts is needed to verify the proof during the Zeth state transition, and hence how expensive that state transition is)
- Merkle tree depth $\mathtt{MDEPTH}$ (i.e. $|\mathtt{MKLS}| \leq 2^{\mathtt{MDEPTH}}$)
- The number of Zeth input notes $\mathtt{JSIN}$
- The number of Zeth output notes $\mathtt{JSOUT}$
- The size (in bytes) of the Zeth Mix transaction is denoted by $\mathtt{ZETHTXSIZE}$.
- The number of Zeth Mix inputs is denoted by $\mathtt{ZETHINPSIZE} = 1 + \mathtt{JSOUT} + \mathtt{JSIN} + 1 + \mathtt{JSIN} + 1$ (MK root + JSOUT commitments + JSIN nullifiers + h_sig + JSIN h_i tags + residual_bits)
- The gas cost of a Zeth Mix call is denoted by $\mathtt{ZETHGCOST}$
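As a quick sanity check of the definitions above, the sketch below evaluates $\mathtt{ZETHINPSIZE}$ and the Merkle tree capacity $2^{\mathtt{MDEPTH}}$. The function names are illustrative only.

```python
def zeth_mix_input_count(jsin: int, jsout: int) -> int:
    """ZETHINPSIZE = 1 + JSOUT + JSIN + 1 + JSIN + 1:
    Merkle root, JSOUT commitments, JSIN nullifiers, h_sig,
    JSIN h_i tags and the residual bits."""
    return 1 + jsout + jsin + 1 + jsin + 1

def merkle_capacity(mdepth: int) -> int:
    """Maximum number of leaves, i.e. the upper bound on |MKLS|."""
    return 2 ** mdepth

print(zeth_mix_input_count(2, 2))  # 9  (JSIN = JSOUT = 2)
print(zeth_mix_input_count(2, 3))  # 10 (JSOUT = 3, e.g. to pay a relay)
print(merkle_capacity(32))         # 4294967296
```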

#### Blockchain parameters

- We denote the blockchain by $\mathcal{B}$ and see it as a mere chain of blocks
- We denote the block gas limit as $\mathtt{BGLIM}$ (important to know how many Zeth transactions can fit into a block)
    - **TODO:** Consider treating $\mathtt{BGLIM}$ as an "elastic" param/variable (instead of a constant), as in EIP-1559.
- We denote the block production time target as $\mathtt{BTIMETRGT}$
- (Optional) We denote the block production time lag $\mathtt{BTIMELAG}$ (randomly selected in a time window to account for potential delays due to PoW and/or due to network latency in block propagation)
    - For now, $\mathtt{BTIMELAG} = 0$
    - **TODO:** Consider adding a randomized block production lag in future iterations of the model in order to use Monte Carlo executions.
- We denote the block production time as $\mathtt{BTIME} = \mathtt{BTIMETRGT} + \mathtt{BTIMELAG}$
- The size (in bytes) of a block is denoted by $\mathtt{BLKSIZE}$
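The sketch below shows how $\mathtt{BTIME}$, the number of blocks produced over a 24h horizon, and the resulting throughput of plain 21000-gas transfers follow from these parameters. It assumes full blocks of identical transactions and $\mathtt{BTIMELAG} = 0$; the helper names are hypothetical, and the configurations are the ones swept over later in this notebook.

```python
SECONDS_PER_DAY = 86400

def btime(btimetrgt: int, btimelag: int = 0) -> int:
    """BTIME = BTIMETRGT + BTIMELAG (BTIMELAG is 0 in this version of the model)."""
    return btimetrgt + btimelag

def txs_per_block(bglim: int, tx_gas: int) -> int:
    """Number of identical transactions that fit in a full block."""
    return bglim // tx_gas

def tps(bglim: int, tx_gas: int, btimetrgt: int, btimelag: int = 0) -> float:
    """Transactions per second when every block is full."""
    return txs_per_block(bglim, tx_gas) / btime(btimetrgt, btimelag)

# (BGLIM, BTIMETRGT) pairs, plain EoA-to-EoA transfers costing DGAS = 21000
for bglim, trgt in [(25000000, 15), (12500000, 5), (5000000, 1), (7996822, 15)]:
    blocks_per_day = SECONDS_PER_DAY // btime(trgt)
    print(bglim, trgt, blocks_per_day, round(tps(bglim, 21000, trgt), 2))
```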

### Initial state

These are the initial values, which do not vary across executions:

- $\mathtt{MKLS} = \emptyset$
- $\mathtt{NS} = \emptyset$
- $\mathtt{RS} = \emptyset$
- $\mathcal{B}$ = $\mathcal{B}_{genesis}$ (The chain is instantiated, an empty genesis block is mined)

### State transition

Each mined block contains $\lfloor \frac{\mathtt{BGLIM}}{\mathtt{ZETHGCOST}} \rfloor$ new Zeth transactions, and each Zeth transaction mined adds:
- $\mathtt{JSOUT}$ leaves to the set $\mathtt{MKLS}$
- $\mathtt{JSIN}$ nullifiers to the set $\mathtt{NS}$
- $\mathtt{JSIN}$ roots to the set $\mathtt{RS}$
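Question 1 can be approximated in closed form from this state transition: with full blocks, $\mathtt{MKLS}$ grows by $\lfloor \mathtt{BGLIM} / \mathtt{ZETHGCOST} \rfloor \cdot \mathtt{JSOUT}$ leaves per block, so the tree is full after $\lceil 2^{\mathtt{MDEPTH}} / (\text{leaves per block}) \rceil$ blocks. A minimal sketch, assuming the BN254 benchmark gas cost and the mainnet median block parameters used later in this notebook, and ignoring $\mathtt{BTIMELAG}$ (the function name is hypothetical):

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def time_to_fill_merkle_tree(bglim, zeth_gcost, jsout, mdepth, btime):
    """Return (txs per block, leaves per block, seconds until |MKLS| = 2^MDEPTH),
    assuming every block is full of Zeth transactions."""
    txs_per_block = bglim // zeth_gcost
    leaves_per_block = txs_per_block * jsout
    if leaves_per_block == 0:
        raise ValueError("a single Zeth transaction does not fit in a block")
    # Integer ceiling division: the last block may only be partially needed
    blocks_needed = -(-(2 ** mdepth) // leaves_per_block)
    return txs_per_block, leaves_per_block, blocks_needed * btime

# BN254, MDEPTH = 32, JSIN = JSOUT = 2 on the mainnet median configuration
txs, leaves, seconds = time_to_fill_merkle_tree(7996822, 1315520, 2, 32, 15)
print(txs, leaves, round(seconds / SECONDS_PER_YEAR, 1))  # 6 12 170.1
```

With these inputs, only 6 Zeth transactions fit in a block, and filling a depth-32 tree would take roughly 170 years.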

## Modeling Zeth with different protocol parameters

We start by tracking the blockchain state growth when only plain "EoA-to-EoA" transactions are carried out. Then, we model Zeth with different protocol parameters to see how the blockchain state size grows under different conditions, and to track the rate at which the Merkle tree of Zeth note commitments is filled.

We use A/B testing and "Parameters Sweep" simulations to study the state growth under different blockchain configurations (block gas limit etc.) and Zeth configurations (Merkle tree depth, JSIN/JSOUT etc.):
- **Simulation A:** Only "plain" (with no extra `data`) EoA-to-EoA transactions are mined. This simulation uses "Parameters Sweep" to simulate the system under various blockchain configurations.
- **Simulation B:** Only Zeth transactions are mined. This simulation uses "Parameters Sweep" to simulate the system under various blockchain configurations. The Zeth configuration tested is:
    - Merkle tree depth = 32
    - JSIN = JSOUT = 2
    - Curve = BN254
- **Simulation C:** Only Zeth transactions are mined. This simulation uses "Parameters Sweep" to simulate the system under various blockchain configurations. The Zeth configuration tested is:
    - Merkle tree depth = 32
    - JSIN = JSOUT = 2
    - Curve = BLS12_377
    
All these simulations are deterministic (no MC runs) and **represent 24h worth of data**. Since no random runs are employed, the simulation results can be cached into a file to avoid multiple (expensive) runs of the model's simulations.
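Because all parameters are constant and every block is full, the modeled chain grows linearly, so the 24h results can be extrapolated to answer Question 4. A minimal sketch, assuming growth from an empty chain and taking 1TB as $10^{12}$ bytes; the 180 MB/day input is a placeholder, not a simulation result, and the function name is hypothetical:

```python
def days_until_threshold(bytes_per_day, threshold_bytes=1e12):
    """Days until the chain data reaches `threshold_bytes`, assuming it starts
    empty and grows by a constant `bytes_per_day` (linear growth)."""
    if bytes_per_day <= 0:
        raise ValueError("bytes_per_day must be positive")
    return threshold_bytes / bytes_per_day

# Placeholder input: a hypothetical 24h result of 180 MB of stored chain data
days = days_until_threshold(180e6)
print(round(days, 1), "days, i.e. about", round(days / 365.25, 1), "years")
```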

### Simulation dataset

Before proceeding with the simulations, it is worth clarifying how the input dataset was obtained.

Ideally, in order to determine the gas cost of a state transition, one may want to use the blockchain network's gas table along with the set of opcodes defining the state transition, in order to derive a deterministic formula that computes the cost of the smart-contract call. However, such an approach is not sufficient to properly determine the cost of a state transition, since several opcodes (such as `SSTORE`) have different costs depending on the smart-contract's state (e.g. depending on whether a storage slot is initialized from zero or simply overwritten).
As a consequence, and to ease the process, the following data (transaction gas costs and byte sizes) were obtained via empirical experiments, during which a set of transactions was fired on a test network. The values below are the arithmetic means of the results of these experiments.
Importantly, certain Zeth configurations (i.e. certain curve selections: `BLS12_377` and `BW6_761`) require extensions to the EVM in order to support curve operations (point addition, scalar multiplication) and pairings on the corresponding pairing-friendly curves. As such, Zeth-related simulations have been carried out on an [extended version of ganache-cli](https://github.com/clearmatics/ganache-cli).

We use some Ethereum mainnet data as basis to determine values for the blockchain-related variables and constants.

In [None]:
# Size of a "standard" raw (i.e. rlp encoded) EoA-to-EoA transaction (in bytes)
# Here, we assume that no extra `data` is set in the transaction. We obtain this value
# by taking the arithmetic mean of the size of a few "plain" EoA-to-EoA transactions
# (i.e. without additional `data`).
ETHTXSIZE = 111

# See https://github.com/ethereum/go-ethereum/blob/v1.10.1/core/types/receipt.go#L48
# and
# https://github.com/ethereum/go-ethereum/blob/v1.10.1/core/types/receipt.go#L92-L97
# Receipts for successful EoA-to-EoA transactions will be of the form:
# ["0x5208520852085208",[],"0x1"], leading to RLP encodings of the form:
# 0xcb885208520852085208c001, which are 12 bytes long.
ETH_RECEIPT_SIZE = 12

# Approximate size of an Ethereum block header (in bytes)
# See: https://ethereum.github.io/yellowpaper/paper.pdf and
# https://github.com/ethereum/go-ethereum/blob/v1.10.1/core/types/bloom9.go#L32-L38
# for reference
BLOCKHEADERSIZE = 508 # 32 + 32 + 20 + 32 + 32 + 32 + 256 + 32 + 32 + 8

# Intrinsic gas cost of a transaction
DGAS = 21000

# Compression ratio for Snappy on the chain state
# See: https://github.com/google/snappy#performance
COMPRESSION_RATIO = 1.5

# The block gas limit and block time below are obtained as the median
# of the "Value" column from the following datasets provided by Etherscan.io:
# - https://etherscan.io/chart/blocktime (exported on 11/03/2021 into a file named `export-BlockTime.csv`)
# - https://etherscan.io/chart/gaslimit (exported on 11/03/2021 into a file named `export-GasLimit.csv`)
# More precisely, the values were obtained by running the following commands:
# ```python
# import math
# import pandas as pd
#
# csv_result = pd.read_csv('export-GasLimit.csv')
# MEDIAN_BLOCKGASLIMIT = math.ceil(csv_result["Value"].median())
#
# csv_result = pd.read_csv('export-BlockTime.csv')
# MEDIAN_BLOCKTIME = math.ceil(csv_result["Value"].median())
# ```
MAINNET_MEDIAN_BLOCKGASLIMIT = 7996822
MAINNET_MEDIAN_BLOCKTIME = 15

PRECOMPILED_CURVES = {}
# See Ethereum Istanbul gas table:
# https://github.com/ethereum/go-ethereum/blob/master/params/protocol_params.go
PRECOMPILED_CURVES['BN254'] = {'ECADDCOST': 150, 'ECMULCOST': 6000, 'ECPAIRBASECOST': 45000, 'ECPAIRPERPOINTCOST': 34000}
# Gas table extension obtained during the early Zecale simulations (June 2020).
# WARNING: The values of the parameters below need to be refined (new software benchmarks need to be carried out.)
PRECOMPILED_CURVES['BLS12_377'] = {'ECADDCOST': 300, 'ECMULCOST': 12000, 'ECPAIRBASECOST': 90000, 'ECPAIRPERPOINTCOST': 68000}

pprint(PRECOMPILED_CURVES)
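
The 12-byte receipt size used for `ETH_RECEIPT_SIZE` can be double-checked with a small RLP encoder. The sketch below is a minimal, hypothetical implementation of the RLP rules (Ethereum Yellow Paper, Appendix B) for byte strings and nested lists only; it is not the encoder used by go-ethereum.

```python
def _length_prefix(length: int, offset: int) -> bytes:
    """RLP length prefix; offset is 0x80 for byte strings and 0xC0 for lists."""
    if length < 56:
        return bytes([offset + length])
    len_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(len_bytes)]) + len_bytes

def rlp_encode(item) -> bytes:
    """Minimal RLP encoder for byte strings and (nested) lists of them."""
    if isinstance(item, bytes):
        if len(item) == 1 and item[0] < 0x80:
            return item  # a single byte below 0x80 is its own encoding
        return _length_prefix(len(item), 0x80) + item
    if isinstance(item, list):
        payload = b"".join(rlp_encode(x) for x in item)
        return _length_prefix(len(payload), 0xC0) + payload
    raise TypeError("only bytes and lists are supported")

# Receipt of a successful EoA-to-EoA transaction, as given in the comment above:
# ["0x5208520852085208", [], "0x1"]
receipt = [bytes.fromhex("5208520852085208"), [], b"\x01"]
encoded = rlp_encode(receipt)
print(encoded.hex(), len(encoded))  # cb885208520852085208c001 12
```

The encoding matches the `0xcb885208520852085208c001` value quoted above, i.e. 12 bytes.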

In [None]:
#########################
# Benchmark dataset     #
#########################

# Data obtained via the `singleton_deterministic_agent.sh` Zeth script
# run under different Zeth configurations. The data used below is obtained
# as the arithmetic mean of all the transactions fired by the bot script above
# on a local testnet (and on the Autonity Bakerloo Testnet).

# Note: In order to have a more flexible set of simulation scripts to use against
# our local test network (without additional tooling), it is desirable for the issue
# https://github.com/trufflesuite/ganache-core/issues/135
# to be tackled and integrated into clearmatics/ganache-cli.
# (`eth_getRawTransactionByHash` is already available in Geth and Autonity).

#########
# TODO: #
#########
#
# - Consider moving this to an external CSV file that we load here.
# - Gather more data points for the various settings of interest AND/OR
# consider computing some of these data points as part of the model
# from the system's parameters (e.g. gas cost/size of txs etc.)
#########

# - The documented sizes are the sizes (in bytes) of the Zeth JSON transaction objects.
#metrics_df = pd.DataFrame(
#    [
#        ["BN254", 32, 2, 2, 3090, 1315520],
#        ["BLS12_377", 32, 2, 2, 3603, 1353261]
#        # Switch JSOUT to 3 (e.g. to pay a Relay with an output note)
#        #["BN254", 32, 2, 3, XX, XX], # TODO
#        #["BLS12_377", 32, 2, 3, XX, XX] # TODO
#    ],
#    columns=['curve', 'mk_depth', 'jsin', 'jsout', 'zeth_tx_size', 'zeth_tx_gcost']
#)

# Approximate size (in bytes) for RLP encoded Zeth transaction receipt
# Follows the specified storage encoding of a receipt
# https://github.com/ethereum/go-ethereum/blob/v1.10.1/core/types/receipt.go#L92-L97
#
# This value has been obtained by:
# 1. Inspecting Zeth transaction receipts, such as:
# {
#   "blockHash":"0xa1ca015b7b7472f6a4a649890fb8d6cd7a85955e03e3d1b8603b2fa819c14071",
#   "blockNumber":"0x56b73",
#   "contractAddress":null,
#   "cumulativeGasUsed":"0x1b2cd3",
#   "from":"0xee0c66a2c570b0331c5bb1991124ed0529d11c4f",
#   "gasUsed":"0x1b2cd3",
#   "logs":[
#     {
#       "address":"0x26895344ba95f7a9762a5a4f871b5d5202115039",
#       "topics":[
#         "0x36ed7c3f2ecfb5a5226c478b034d33144c060afe361be291e948f861dcddc618"
#       ],
#       "data":"0x14ce028fa1e1df2d8c3b298a0659da99ae576eabd78e50607180755382193d2091f11ae060b100db666d01db0e8ab71b423192d9a1a7e75363af0ce5444ea1f96f8d3a4610c44a20545020204f0b19623b1abbb1784eb87c101ae398da0378351786c6efd30b72af6e1d7f875a8351872a4657dacb778b186d8a93e914df34490c3badefa35ee7b3a27f1b191c66bf9c066c8199452ed8f5c5f747f0997f627700000000000000000000000000000000000000000000000000000000000000c0000000000000000000000000000000000000000000000000000000000000004000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000098ebf17ba5c8801a93c71f110a2a000f952fe24d018991295ae4c1b384b29dcd662cca2d57afc4c57dadcc1304122888ba13b6531cdc0afd4bb48bac8011d4c1c01bf04f27920e50b1792b6a6713644412c28b52b7f9142bd2d9dd59297fd7a546e4511b32f0d3eb3cbf19fd65374d0a55cd171b4b2d5b342802ab2788e91b5837087db3d0944f45ec011ba6b9723bbc24bf98f8ee1b732750000000000000000000000000000000000000000000000000000000000000000000000000000000981c1743cfcd668e4683946881b81ba9a69f79bcc87a5176df49c19aebfbf33d7d789be4f6de07bf66375aa208f5954a2cc41e0137e5a07239f97944e6f982493cb127c977091738c687532c7e3548394194ffa448b7e59b1222ae9d4bd9d969a6cf53b501be95300bb7b4a21bad83ffc33bc4140108d458f9b07847e7e7b3f38f27439322754d4a54e976549b40b8b10dab0a8d5ce89689050000000000000000",
#       "blockNumber":"0x56b73",
#       "transactionHash":"0xb4e683d7bbf4709fe7eb59fcd9041b1b90ab36266790a224d47ef894f0afa703",
#       "transactionIndex":"0x0",
#       "blockHash":"0xa1ca015b7b7472f6a4a649890fb8d6cd7a85955e03e3d1b8603b2fa819c14071",
#       "logIndex":"0x0",
#       "removed":false
#     }
#   ],
#   "logsBloom":"0x00000000000000400000000000000000000000000000000000000002000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000800000000000000000000000000000000000000000000000800000000000000000000000000000000000000000000000020000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000",
#   "status":"0x1",
#   "to":"0x26895344ba95f7a9762a5a4f871b5d5202115039",
#   "transactionHash":"0xb4e683d7bbf4709fe7eb59fcd9041b1b90ab36266790a224d47ef894f0afa703",
#   "transactionIndex":"0x0"
# }
#
# 2. Following the structure of the storage encoding of a receipt
# https://github.com/ethereum/go-ethereum/blob/v1.10.1/core/types/receipt.go#L92-L97
# and removing redundant fields from the JSON receipt, to obtain something like:
# ["0x1b2cd3", ["0x26895344ba95f7a9762a5a4f871b5d5202115039",["0x36ed7c3f2ecfb5a5226c478b034d33144c060afe361be291e948f861dcddc618"],"0x14ce028fa1e1df2d8c3b298a0659da99ae576eabd78e50607180755382193d2091f11ae060b100db666d01db0e8ab71b423192d9a1a7e75363af0ce5444ea1f96f8d3a4610c44a20545020204f0b19623b1abbb1784eb87c101ae398da0378351786c6efd30b72af6e1d7f875a8351872a4657dacb778b186d8a93e914df34490c3badefa35ee7b3a27f1b191c66bf9c066c8199452ed8f5c5f747f0997f627700000000000000000000000000000000000000000000000000000000000000c0000000000000000000000000000000000000000000000000000000000000004000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000098ebf17ba5c8801a93c71f110a2a000f952fe24d018991295ae4c1b384b29dcd662cca2d57afc4c57dadcc1304122888ba13b6531cdc0afd4bb48bac8011d4c1c01bf04f27920e50b1792b6a6713644412c28b52b7f9142bd2d9dd59297fd7a546e4511b32f0d3eb3cbf19fd65374d0a55cd171b4b2d5b342802ab2788e91b5837087db3d0944f45ec011ba6b9723bbc24bf98f8ee1b732750000000000000000000000000000000000000000000000000000000000000000000000000000000981c1743cfcd668e4683946881b81ba9a69f79bcc87a5176df49c19aebfbf33d7d789be4f6de07bf66375aa208f5954a2cc41e0137e5a07239f97944e6f982493cb127c977091738c687532c7e3548394194ffa448b7e59b1222ae9d4bd9d969a6cf53b501be95300bb7b4a21bad83ffc33bc4140108d458f9b07847e7e7b3f38f27439322754d4a54e976549b40b8b10dab0a8d5ce89689050000000000000000","0x56b73","0xb4e683d7bbf4709fe7eb59fcd9041b1b90ab36266790a224d47ef894f0afa703","0x0","0xa1ca015b7b7472f6a4a649890fb8d6cd7a85955e03e3d1b8603b2fa819c14071","0x0","0x0"],"0x1"]
# 
# 3. which can then be RLP encoded into something like:
# "0xf9030b831b2cd3f903039426895344ba95f7a9762a5a4f871b5d5202115039e1a036ed7c3f2ecfb5a5226c478b034d33144c060afe361be291e948f861dcddc618b9028014ce028fa1e1df2d8c3b298a0659da99ae576eabd78e50607180755382193d2091f11ae060b100db666d01db0e8ab71b423192d9a1a7e75363af0ce5444ea1f96f8d3a4610c44a20545020204f0b19623b1abbb1784eb87c101ae398da0378351786c6efd30b72af6e1d7f875a8351872a4657dacb778b186d8a93e914df34490c3badefa35ee7b3a27f1b191c66bf9c066c8199452ed8f5c5f747f0997f627700000000000000000000000000000000000000000000000000000000000000c0000000000000000000000000000000000000000000000000000000000000004000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000098ebf17ba5c8801a93c71f110a2a000f952fe24d018991295ae4c1b384b29dcd662cca2d57afc4c57dadcc1304122888ba13b6531cdc0afd4bb48bac8011d4c1c01bf04f27920e50b1792b6a6713644412c28b52b7f9142bd2d9dd59297fd7a546e4511b32f0d3eb3cbf19fd65374d0a55cd171b4b2d5b342802ab2788e91b5837087db3d0944f45ec011ba6b9723bbc24bf98f8ee1b732750000000000000000000000000000000000000000000000000000000000000000000000000000000981c1743cfcd668e4683946881b81ba9a69f79bcc87a5176df49c19aebfbf33d7d789be4f6de07bf66375aa208f5954a2cc41e0137e5a07239f97944e6f982493cb127c977091738c687532c7e3548394194ffa448b7e59b1222ae9d4bd9d969a6cf53b501be95300bb7b4a21bad83ffc33bc4140108d458f9b07847e7e7b3f38f27439322754d4a54e976549b40b8b10dab0a8d5ce8968905000000000000000083056b73a0b4e683d7bbf4709fe7eb59fcd9041b1b90ab36266790a224d47ef894f0afa70300a0a1ca015b7b7472f6a4a649890fb8d6cd7a85955e03e3d1b8603b2fa819c14071000001"
#
# TODO: Add the receipt size to the metrics_df below (as it depends on the config, most notably on the JSOUT)
ZETH_RECEIPT_SIZE = 785

# - The documented sizes are the sizes (in bytes) of the Zeth RAW (RLP encoded) transaction objects.
metrics_df = pd.DataFrame(
    [
        ["BN254", 32, 2, 2, 1335, 1315520],
        ["BLS12_377", 32, 2, 2, 1557, 1353261] # (approx. obtained as 3603/3090 * 1335)
        # Switch JSOUT to 3 (e.g. to pay a Relay with an output note)
        #["BN254", 32, 2, 3, XX, XX], # TODO
        #["BLS12_377", 32, 2, 3, XX, XX] # TODO
    ],
    columns=['curve', 'mk_depth', 'jsin', 'jsout', 'zeth_tx_size', 'zeth_tx_gcost']
)


def get_benchmark_data(curve, mk_depth, jsin, jsout):
    """
    Function that queries the benchmark dataset to retrieve the data points
    relevant to the given model configuration.
    """
    result_df = metrics_df.query(
        'curve == @curve and\
        mk_depth == @mk_depth and\
        jsin == @jsin and\
        jsout == @jsout'
    )
    # Assert that the retrieved df has only 1 line
    # (to avoid data inconsistency on duplicated lines)
    assert result_df.shape[0] == 1, "[ERROR] Wrong number of data points in benchmark data"
    
    zeth_tx_size = result_df.iloc[0]['zeth_tx_size']
    zeth_tx_gcost = result_df.iloc[0]['zeth_tx_gcost']
    return (zeth_tx_size, zeth_tx_gcost)

# Test the query function
size, gas = get_benchmark_data("BN254", 32, 2, 2)
print("Result size: {} and gas: {}".format(size, gas))
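
The benchmark data above is enough for a first estimate of Question 3 (gas cost per byte). A minimal sketch, dividing each transaction's gas cost by its raw (RLP-encoded) size; receipts are not counted, and the `BLS12_377` size is itself an approximation (see the comment in `metrics_df`). The dictionary and function names are illustrative only.

```python
# Transaction type -> (raw tx size in bytes, gas cost), from the values above
datapoints = {
    "EoA-to-EoA": (111, 21000),
    "Zeth BN254 (JSIN=JSOUT=2)": (1335, 1315520),
    "Zeth BLS12_377 (JSIN=JSOUT=2)": (1557, 1353261),
}

def gas_per_byte(size_bytes: int, gas: int) -> float:
    """Gas paid per byte of raw transaction data."""
    return gas / size_bytes

for name, (size, gas) in datapoints.items():
    print("{}: {:.2f} gas/byte".format(name, gas_per_byte(size, gas)))
```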

### Genesis state of the simulation(s)

In [None]:
###############################
# State variables             #
###############################

# Genesis state: the same one is used for all simulations (A, B and C)
genesis_state = {
    # All sets are initialized with the empty set (they have no elements)
    'MKLS_cardinality': 0,
    'NS_cardinality': 0,
    'RS_cardinality': 0,
    # The size of the blockchain is assumed to be 0 at starting time
    'B_size': 0,
    'B_txcount': 0
}

### System parameters of the simulation

The set of parameters (of interest) used for the A/B(/C) testing and "Parameter Sweep" simulation of Zeth is defined below.

In [None]:
##############################
# Simulation configuration   #
##############################

# Array of params used during the "Parameter Sweeps" Simulation
# to simulate under different blockchain configuration assumptions.
BLOCKCHAIN_PARAMS = [
    # Arbitrary set of blockchain config params
    # (from big blocks mined "slowly" to smaller blocks mined "frequently")
    { 'bglim': 25000000, 'btimetrgt': 15 },
    { 'bglim': 12500000, 'btimetrgt': 5 },
    { 'bglim': 5000000, 'btimetrgt': 1 },
    # Ethereum mainnet median data
    { 'bglim': MAINNET_MEDIAN_BLOCKGASLIMIT, 'btimetrgt': MAINNET_MEDIAN_BLOCKTIME }
]

# Simulate a system where only plain "EoA-to-EoA" transactions are
# carried out (no smart contract deployed)
system_params_A = {
    'chain' : BLOCKCHAIN_PARAMS,
}

# Below, we simulate different Zeth configurations (using different curves)
# on different blockchain configurations.
system_params_B = {
    'chain' : BLOCKCHAIN_PARAMS,
    'zeth' : [
        # Test all the blockchain configs with this Zeth config
        { 'curve': "BN254", 'mkdepth': 32, 'jsin': 2, 'jsout': 2 },
    ]
}

system_params_C = {
    'chain' : BLOCKCHAIN_PARAMS,
    'zeth' : [
        # Test all the blockchain configs with this Zeth config
        { 'curve': "BLS12_377", 'mkdepth': 32, 'jsin': 2, 'jsout': 2 },
    ]
}

### Policy functions

In [None]:
#############################
# Policy functions          #
#############################
#
# Manage time on the system
#
# We need to simulate the various block time targets
# If btimetrgt = 1, a block is added a each time step
# If btimetrgt = 5, a block is added every 5 time steps
# If btimetrgt = 15, a block is added every 15 time steps
def p_is_add_block(params, substep, state_history, previous_state):
    """
    Function that determines whether we need to mine a block at this time step or not
    """
    # At t = 0, this condition will be true for all block intervals
    # hence, producing the genesis block
    if previous_state['timestep'] % params['chain']['btimetrgt'] == 0:
        return ({'add_block': True})
    return ({'add_block': False})

#### Policy functions (simulation A: EoA-to-EoA transactions)

In [None]:
# Computes the block size (bytes) for Ethereum (EoA-to-EoA) only txs
def p_get_ethereum_block_size_bytes(params, substep, state_history, previous_state):
    nb_txs = params['chain']['bglim'] // DGAS
    # One receipt is stored per transaction.
    # Account for the snappy compression in the state DB
    block_size = (nb_txs * (ETHTXSIZE + ETH_RECEIPT_SIZE) + BLOCKHEADERSIZE) / COMPRESSION_RATIO
    return ({'block_size': block_size, 'number_txs': nb_txs})

#### Policy functions (simulation B/C: Zeth transactions)

In [None]:
# Computes the block size (bytes) for Zeth only txs
# - Returns the block size and the number of txs in the block
def p_get_zeth_block_size_bytes(params, substep, state_history, previous_state):
    # Retrieve Zeth benchmark data
    curve = params['zeth']['curve']
    mkdepth = params['zeth']['mkdepth']
    jsin = params['zeth']['jsin']
    jsout = params['zeth']['jsout']
    zeth_tx_size, zeth_tx_gas = get_benchmark_data(curve, mkdepth, jsin, jsout)
    
    nb_txs = params['chain']['bglim'] // zeth_tx_gas
    # One receipt is stored per transaction.
    # Account for the snappy compression in the state DB
    block_size = (nb_txs * (zeth_tx_size + ZETH_RECEIPT_SIZE) + BLOCKHEADERSIZE) / COMPRESSION_RATIO
    return ({'block_size': block_size, 'number_txs': nb_txs})

### State update functions (SUFs)

In [None]:
#################################
# Simple state update functions #
#################################
#
# - `params`: Python dictionary containing the system parameters
# - `substep`: Integer value representing a step within a single timestep
# - `state_history`: Python list of all previous states
# - `previous_state`: Python dictionary that defines what the state of the system was at the previous timestep or substep
# - `policy_input`: Python dictionary of signals or actions from policy functions

def s_update_B_size(params, substep, state_history, previous_state, policy_input):
    """
    Update the size of the blockchain B
    """
    new_value = previous_state['B_size'] + policy_input['add_block'] * policy_input['block_size']
    return 'B_size', new_value

def s_update_B_txcount(params, substep, state_history, previous_state, policy_input):
    """
    Update the number of transactions in the blockchain B
    """
    new_value = previous_state['B_txcount'] + policy_input['add_block'] * policy_input['number_txs']
    return 'B_txcount', new_value

#### SUFs (simulation A: EoA-to-EoA transactions)

In [None]:
# None are specific to simulation A

#### SUFs (simulation B/C: Zeth transactions)

In [None]:
def s_update_MKLS_cardinality(params, substep, state_history, previous_state, policy_input):
    """
    Update the cardinality of the set MKLS
    """
    new_value = previous_state['MKLS_cardinality'] + policy_input['add_block'] * policy_input['number_txs'] * params['zeth']['jsout']
    return 'MKLS_cardinality', new_value

def s_update_NS_cardinality(params, substep, state_history, previous_state, policy_input):
    """
    Update the cardinality of the set NS
    """
    new_value = previous_state['NS_cardinality'] + policy_input['add_block'] * policy_input['number_txs'] * params['zeth']['jsin']
    return 'NS_cardinality', new_value

def s_update_RS_cardinality(params, substep, state_history, previous_state, policy_input):
    """
    Update the cardinality of the set RS
    """
    new_value = previous_state['RS_cardinality'] + policy_input['add_block'] * policy_input['number_txs'] * params['zeth']['jsin']
    return 'RS_cardinality', new_value

### Partial State Update Blocks (PSUBs)

#### PSUBs (simulation A: EoA-to-EoA transactions)

In [None]:
partial_state_update_blocks_A = [
    { 
        'policies': {
            'is_add_block': p_is_add_block,
            'get_ethereum_block_size_bytes': p_get_ethereum_block_size_bytes
        },
        # Update all these variables in parallel
        'variables': {
            'B_size': s_update_B_size,
            'B_txcount': s_update_B_txcount
        }
    }
]

#### PSUBs (simulation B/C: Zeth transactions)

In [None]:
# Partial State Update Block shared by both simulation B and simulation C
partial_state_update_blocks_B_and_C = [
    { 
        'policies': {
            'is_add_block': p_is_add_block,
            'get_zeth_block_size_bytes': p_get_zeth_block_size_bytes
        },
        # Update all these variables in parallel
        'variables': {
            'MKLS_cardinality': s_update_MKLS_cardinality,
            'NS_cardinality': s_update_NS_cardinality,
            'RS_cardinality': s_update_RS_cardinality,
            'B_size': s_update_B_size,
            'B_txcount': s_update_B_txcount
        }
    }
]
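
Each PSUB above declares two policies whose outputs end up in the single `policy_input` dict read by the state update functions. To our understanding, cadCAD's default behaviour is to merge the policy outputs key by key and to add the values of any key returned by more than one policy. The sketch below reproduces that assumed behaviour in plain Python; it is not cadCAD code, and the policy outputs shown are hypothetical.

```python
from functools import reduce


def aggregate_policy_outputs(policy_outputs, combine=lambda a, b: a + b):
    """Merge policy output dicts; values of keys shared by several policies
    are folded with `combine` (addition by default)."""
    def merge(acc, output):
        merged = dict(acc)
        for key, value in output.items():
            merged[key] = combine(merged[key], value) if key in merged else value
        return merged
    return reduce(merge, policy_outputs, {})


# Hypothetical outputs of p_is_add_block and p_get_zeth_block_size_bytes
policy_input = aggregate_policy_outputs([
    {'add_block': 1},
    {'block_size': 45_000, 'number_txs': 12},
])
print(policy_input)  # {'add_block': 1, 'block_size': 45000, 'number_txs': 12}
```

Since the two policies in each PSUB return disjoint keys, the combining function makes no difference here; it would only matter if two policies emitted the same key.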

### Simulation configuration

In [None]:
# For multiple Monte Carlo runs, the state['run'] variable can be used to get the current run number
MONTE_CARLO_RUNS = 1
SIMULATION_TIMESTEPS = 86400 # Number of seconds in a day: 60*60*24

In [None]:
sim_config_A = config_sim({
    'N': MONTE_CARLO_RUNS,
    'T': range(SIMULATION_TIMESTEPS),
    'M': system_params_A
})

In [None]:
sim_config_B = config_sim({
    'N': MONTE_CARLO_RUNS,
    'T': range(SIMULATION_TIMESTEPS),
    'M': system_params_B
})

In [None]:
sim_config_C = config_sim({
    'N': MONTE_CARLO_RUNS,
    'T': range(SIMULATION_TIMESTEPS),
    'M': system_params_C
})

In [None]:
# Print the simulation configuration and the system parameters of a simulation
def print_config_and_system_params(sim_config, system_params):
    print('sim_config: ')
    pprint(sim_config)
    print('  ')
    print('system_params: ')
    pprint(system_params)

print('=== Simulation A: ===\n')
print_config_and_system_params(sim_config_A, system_params_A)
print('\n=== Simulation B: ===\n')
print_config_and_system_params(sim_config_B, system_params_B)
print('\n=== Simulation C: ===\n')
print_config_and_system_params(sim_config_C, system_params_C)

### Simulations

In [None]:
# Carry out the simulations

# Clear any prior configs
del configs[:]

# Create new experiment
experiment = Experiment()

# Append Simulation A config (only EoA-to-EoA transactions hit the chain)
experiment.append_configs(
    initial_state = genesis_state,
    partial_state_update_blocks = partial_state_update_blocks_A,
    sim_configs = sim_config_A
)
# Append Simulation B config (only Zeth transactions hit the chain: BN254, MK depth 32, JSIN=JSOUT=2)
experiment.append_configs(
    initial_state = genesis_state,
    partial_state_update_blocks = partial_state_update_blocks_B_and_C,
    sim_configs = sim_config_B
)
# Append Simulation C config (only Zeth transactions hit the chain: BLS12_377, MK depth 32, JSIN=JSOUT=2)
experiment.append_configs(
    initial_state = genesis_state,
    partial_state_update_blocks = partial_state_update_blocks_B_and_C,
    sim_configs = sim_config_C
)

# Print the relation between the simulation ID, subset ID and run ID
# and the actual user-defined configurations.
# This helps interpret the simulation results in the next section of the notebook.
for config in configs:
    #pprint(config.__dict__)
    print(config.sim_config)

In [None]:
# Hack to use the cached simulation results by default
CACHED_SIMULATION = True

**If you have already carried out the simulation, cached its results, and simply want to plot the simulation results, please jump to [this step](#Load-cached-simulation-results) (and do not execute the boxes below).**

In [None]:
# If this box is executed, then we run the full simulations
# and thus won't be plotting from the cached results
CACHED_SIMULATION = False

In [None]:
# Run all the simulations (A, B and C) appended to the experiment above
exec_mode = ExecutionMode()
exec_context = ExecutionContext(context=exec_mode.multi_mode)

simulation = Executor(exec_context=exec_context, configs=configs)
raw_result, tensor_field, sessions = simulation.execute()

In [None]:
simulation_result = pd.DataFrame(raw_result)
simulation_result

In [None]:
#from tabulate import tabulate
#print(tabulate(simulation_result, headers='keys', tablefmt='psql'))

#### Cache simulation results

In [None]:
# Cache the simulation results
compression_opts = dict(method='zip', archive_name='simulation_result.csv')
simulation_result.to_csv('simulation_result.zip', index=False, compression=compression_opts)

#### Load cached simulation results

If you wish to plot the results of a past simulation (without re-running the full set of simulations above), please start here.

In [None]:
# If we use cached simulation results
if CACHED_SIMULATION:
    print("Loading cached simulation results...")
    compression_opts = dict(method='zip', archive_name='simulation_result.csv')
    simulation_result = pd.read_csv('simulation_result.zip', compression=compression_opts)
    print("Loading completed!")

### Simulation data processing and plots

In [None]:
# Copy the simulation data in a new data frame
df = simulation_result.copy()
df_simulation_sweep_A = df[df['simulation'] == 0]
df_simulation_sweep_B = df[df['simulation'] == 1]
df_simulation_sweep_C = df[df['simulation'] == 2]
df.head(5)

In [None]:
import plotly.express as px
# Multiple plots for each `subset` (i.e. for each system configuration during the "Param Sweep" simulation) 
px.line(
    df,
    x='timestep',
    y=['B_size'],
    facet_row='subset', # Each row is a blockchain config (and zeth config if applicable)
    facet_col='simulation', # Columns = EoA-to-EoA (simulation 0), Zeth with BN254 (simulation 1), Zeth with BLS12_377 (simulation 2)
    color='subset',
    title='Growth of the Blockchain state under various protocol configurations (Facets view)',
    labels=dict(timestep="Timesteps (sec)", value="Chain size (bytes)", subset="Configuration")
)

In [None]:
# Multiple plots for each `subset` (i.e. for each system configuration during the "Param Sweep" simulation) 
px.line(
    df,
    x='timestep',
    y=['B_size'],
    color='subset',
    facet_col='simulation',
    title='Growth of the Blockchain state under various protocol configurations (Overlaid view)',
    labels=dict(timestep="Timesteps (sec)", value="Blockchain size (in bytes)", subset="Configuration number")
)

In [None]:
# We can further extract the results of a specific simulation (here simulation B: Zeth with BN254)
df_simulation_zeth_bn254 = df[df['simulation'] == 1]
# Extract a specific sweep run (i.e. a run with a specific configuration)
df_simulation_sweep_run = df_simulation_zeth_bn254[df_simulation_zeth_bn254['subset'] == 0]
# Display the first timesteps of the simulation with config 0
df_simulation_sweep_run.head()

In [None]:
# The Zeth simulations are simulations B and C (i.e. at index 1 and 2 in the results)
# The simulation A (at index 0) contains the results for the EoA-to-EoA case
df_simulation_zeth = df[df['simulation'].isin([1, 2])]
df_simulation_zeth.head()
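
Several cells below read the final state of each configuration by filtering on `simulation` and `subset` and calling `.tail(1)` in a loop. As an alternative sketch (not what the notebook uses), pandas can extract the last row of every `(simulation, subset)` pair in a single pass with `groupby(...).tail(1)`. The toy DataFrame and the `final_states` helper are hypothetical and only mimic the relevant columns of `simulation_result`.

```python
import pandas as pd


def final_states(df):
    """Return the last recorded row (highest timestep) of each (simulation, subset) pair."""
    return (
        df.sort_values('timestep', kind='mergesort')  # stable sort on time
          .groupby(['simulation', 'subset'])
          .tail(1)                                     # last row of each group
          .sort_values(['simulation', 'subset'])
          .reset_index(drop=True)
    )


# Toy data using the same column names as the cadCAD results
toy = pd.DataFrame({
    'simulation': [0, 0, 0, 0, 1, 1],
    'subset':     [0, 0, 1, 1, 0, 0],
    'timestep':   [0, 1, 0, 1, 0, 1],
    'B_size':     [0, 10, 0, 20, 0, 30],
})
result = final_states(toy)
print(result[['simulation', 'subset', 'B_size']])
```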

In [None]:
px.line(
    df_simulation_zeth,
    x='timestep',
    y=['MKLS_cardinality'],
    facet_row='subset',
    facet_col='simulation',
    color='subset',
    title='Growth of the Merkle tree leaves set cardinality under various protocol configurations (Facets view)',
    labels=dict(timestep="Timesteps (sec)", value="MKLS", subset="Config")
)

In [None]:
px.line(
    df_simulation_zeth,
    x='timestep',
    y=['MKLS_cardinality'],
    facet_col='simulation',
    color='subset',
    title='Growth of the Merkle tree leaves set cardinality under various protocol configurations (Overlaid view)',
    labels=dict(timestep="Timesteps (sec)", value="Number of leaves in the Merkle tree", subset="Configuration number")
)

In [None]:
# Get the number of Zeth simulations
nb_simulations = df_simulation_zeth['simulation'].nunique()
# For now we only account for 2 zeth param sweep simulations
# We can make the code more generic if necessary
assert nb_simulations == 2

In [None]:
# All simulations should have the same number of subsets (should be equal to the number of parameter sweeps)
nb_sweeps = len(BLOCKCHAIN_PARAMS)
nb_sweeps

In [None]:
# Track the ratio of leaves (in the MK tree) that are allocated during a
# simulation run over the total number of leaves in the tree
mktree_occupancy_ratios_result = {}
# Track the number of times the given simulation needs to be run
# (i.e. number of days) before the Merkle tree is completely filled.
simulation_runs_to_mktree_occupancy_result = {}


# Build dataset representing the % of Merkle tree occupancy rate for all "swept simulations"
for sim_id in [1,2]:
    # Get the specific simulation results
    df_simulation_id = df_simulation_zeth[df_simulation_zeth['simulation'] == sim_id]
    
    mktree_occupancy_ratio = []
    simulation_runs_to_mktree_occupancy = []
    for sweep_id in range(nb_sweeps):
        # Get the specific simulation's sweep results
        df_sweep_id = df_simulation_id[df_simulation_id['subset'] == sweep_id]
        # `configs` lists all the sweeps of simulation A, then B, then C
        config_id = nb_sweeps * sim_id + sweep_id
        # Get Merkle tree depth of the associated configuration
        mkdepth = configs[config_id].sim_config['M']['zeth']['mkdepth']
        # Get last state of the simulation
        final_state_simulation = df_sweep_id.tail(1)
        # Read final cardinality of MKLS (at the end of the simulation)
        final_mkls_cardinality = final_state_simulation.iloc[0]['MKLS_cardinality']
        # Total number of leaves in a Merkle tree of depth `mkdepth`
        total_leaves = 2**mkdepth
        # Compute occupancy ratio (allocated leaves over total leaves, in %)
        ratio_occupied = (final_mkls_cardinality / total_leaves) * 100
        mktree_occupancy_ratio.append([ratio_occupied, sweep_id])
        simulation_runs_to_mktree_occupancy.append(total_leaves // final_mkls_cardinality)
        
    mktree_occupancy_ratios_result[sim_id] = mktree_occupancy_ratio
    simulation_runs_to_mktree_occupancy_result[sim_id] = simulation_runs_to_mktree_occupancy

pprint(mktree_occupancy_ratios_result)
pprint(simulation_runs_to_mktree_occupancy_result)
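
The occupancy rate above is the share of all `2**mkdepth` leaves used during one run. A sketch with a deliberately tiny, hypothetical tree shows why the denominator matters: dividing by the *free* leaves instead of the *total* leaves gives nearly the same number while the tree is almost empty, but diverges sharply as it fills. Both helper names are hypothetical.

```python
def occupancy_pct(occupied_leaves, mkdepth):
    """Share of the 2**mkdepth leaves that are occupied, in percent."""
    return occupied_leaves / 2**mkdepth * 100


def occupied_over_free_pct(occupied_leaves, mkdepth):
    """Occupied leaves divided by the remaining free leaves, in percent (not an occupancy rate)."""
    return occupied_leaves / (2**mkdepth - occupied_leaves) * 100


# Hypothetical depth-4 tree (16 leaves)
for occupied in (1, 8, 12):
    print(occupied, occupancy_pct(occupied, 4), occupied_over_free_pct(occupied, 4))
```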

In [None]:
# See here for the list of available colors:
# https://community.plotly.com/t/plotly-colours-list/11730/3

import plotly.graph_objects as go

merkle_tree_df_zeth_A = pd.DataFrame(
    mktree_occupancy_ratios_result[1],
    columns=['ratio_occupied', 'subset'],
)
merkle_tree_df_zeth_B = pd.DataFrame(
    mktree_occupancy_ratios_result[2],
    columns=['ratio_occupied', 'subset'],
)

fig = go.Figure()
fig.add_trace(go.Bar(
    x=merkle_tree_df_zeth_A['subset'],
    y=merkle_tree_df_zeth_A['ratio_occupied'],
    name='Zeth (BN254)',
    marker_color='darkcyan'
))
fig.add_trace(go.Bar(
    x=merkle_tree_df_zeth_B['subset'],
    y=merkle_tree_df_zeth_B['ratio_occupied'],
    name='Zeth (BLS12_377)',
    marker_color='darkgray'
))

fig.update_layout(
    title="Occupancy rate of Merkle Tree per simulation run under various system configurations",
    xaxis_title="Various blockchain configurations",
    yaxis_title="Percentage of Merkle Tree occupancy per simulation run",
    barmode='group'
)
fig.show()

## Answers to the open questions

### Answer to [question 1](#Question-1)

In [None]:
# Number of successive one-day simulation runs required to completely fill the Merkle tree
# in the different situations
for sim_id, runs_per_config in simulation_runs_to_mktree_occupancy_result.items():
    print('=== Simulation {}: ==='.format(sim_id))
    for i, nb_runs in enumerate(runs_per_config):
        print("- Config {} needs to be run {} times to fill the Merkle tree".format(i, nb_runs))
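
The count printed above uses floor division, i.e. the number of complete one-day runs that fit in the tree. The sketch below assumes a constant, hypothetical number of new leaves per day and contrasts that count with the ceiling (the day on which the tree actually becomes full), then converts the latter into years. The `days_to_fill_tree` helper is hypothetical.

```python
def days_to_fill_tree(mkdepth, leaves_per_day):
    """Return (full days that fit, day on which the tree fills up, that day in years)."""
    total_leaves = 2 ** mkdepth
    full_days = total_leaves // leaves_per_day          # what the cell above reports
    day_tree_full = -(-total_leaves // leaves_per_day)  # integer ceiling division
    return full_days, day_tree_full, day_tree_full / 365.25


# Hypothetical: 1,000,000 new commitments per day in a depth-32 tree
full_days, day_full, years = days_to_fill_tree(32, 1_000_000)
print(full_days, day_full, round(years, 1))  # 4294 4295 11.8
```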

### Answer to [question 2](#Question-2) and [question 4](#Question-4)

In [None]:
# 1 TB = 10**12 bytes (SI units; 1 TiB would be 2**40 bytes)
terabyte = 10**12

def compute_simulation_terabyte_occupancy_rate(simulation_df):
    terabyte_occupancy_rate = []
    nb_simulation_runs = []
    for i in range(nb_sweeps):
        # Get the specific simulation results
        df_sweeped_simulation = simulation_df[simulation_df['subset'] == i]
        # Get last state of the simulation
        final_state_simulation = df_sweeped_simulation.tail(1)
        final_blockchain_size = final_state_simulation.iloc[0]['B_size']
        # Compute terabyte occupancy ratio
        ratio_occupied = (final_blockchain_size / terabyte) * 100
        # Append simulation results to the aggregate arrays (aggregating these metrics for all simulations)
        terabyte_occupancy_rate.append([ratio_occupied, "config-"+str(i)])
        nb_simulation_runs.append(terabyte // final_blockchain_size)
    return (terabyte_occupancy_rate, nb_simulation_runs)


terabyte_occupancy_rate_simulations_result = {}
nb_simulation_runs_simulations_result = {}

nb_simulations = df['simulation'].nunique()
for sim_id in range(nb_simulations):
    df_simulation_id = df[df['simulation'] == sim_id]
    res_simulation_id = compute_simulation_terabyte_occupancy_rate(df_simulation_id)
    terabyte_occupancy_rate_simulations_result[sim_id] = res_simulation_id[0]
    nb_simulation_runs_simulations_result[sim_id] = res_simulation_id[1]


pprint(terabyte_occupancy_rate_simulations_result)
pprint(nb_simulation_runs_simulations_result)
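
As a quick sanity check on units, the sketch below estimates how many days a constant, hypothetical daily growth takes to reach 1 TB (10**12 bytes, as in the cell above) versus 1 TiB (2**40 bytes). It rounds up, whereas the cell above rounds down, and converts the daily growth to `int` because `B_size` read back from the cached CSV may be a float. The `days_to_reach` helper is hypothetical.

```python
TB = 10**12   # terabyte (SI), as used in the notebook
TIB = 2**40   # tebibyte (binary unit)


def days_to_reach(capacity_bytes, bytes_per_day):
    """Number of days (rounded up) for a constant daily growth to reach `capacity_bytes`."""
    bytes_per_day = int(bytes_per_day)  # B_size may come back from the CSV as a float
    return -(-capacity_bytes // bytes_per_day)


# Hypothetical growth of 10 GB (10**10 bytes) per day
print(days_to_reach(TB, 10**10), days_to_reach(TIB, 1e10))  # 100 110
```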

In [None]:
# terabyte occupancy rate after simulation
terabyte_df_simulation_A = pd.DataFrame(
    terabyte_occupancy_rate_simulations_result[0],
    columns=['ratio_occupied', 'subset'],
)
terabyte_df_simulation_B = pd.DataFrame(
    terabyte_occupancy_rate_simulations_result[1],
    columns=['ratio_occupied', 'subset'],
)
terabyte_df_simulation_C = pd.DataFrame(
    terabyte_occupancy_rate_simulations_result[2],
    columns=['ratio_occupied', 'subset'],
)

fig = go.Figure()
fig.add_trace(go.Bar(
    x=terabyte_df_simulation_A['subset'],
    y=terabyte_df_simulation_A['ratio_occupied'],
    name='EoA-to-EoA',
    marker_color='indianred'
))
fig.add_trace(go.Bar(
    x=terabyte_df_simulation_B['subset'],
    y=terabyte_df_simulation_B['ratio_occupied'],
    name='Zeth (BN254)',
    marker_color='darkcyan'
))
fig.add_trace(go.Bar(
    x=terabyte_df_simulation_C['subset'],
    y=terabyte_df_simulation_C['ratio_occupied'],
    name='Zeth (BLS12_377)',
    marker_color='darkgray'
))

fig.update_layout(
    title="Occupancy rate of 1TB per simulation run under various system configurations",
    xaxis_title="Various blockchain configurations",
    yaxis_title="Percentage of occupancy of 1TB per simulation run",
    barmode='group'
)
fig.show()

In [None]:
# Number of successive one-day simulation runs required to fill 1TB of storage
# in the different situations
for sim_id, runs_per_config in nb_simulation_runs_simulations_result.items():
    print('=== Simulation {}: ==='.format(sim_id))
    for i, nb_runs in enumerate(runs_per_config):
        print("- Config {} needs to be run {} times to fill 1TB of storage".format(i, nb_runs))

### Answer to [question 3](#Question-3)

In [None]:
# TODO: Refactor this block
#
# For now we use hardcoded queries. May be worth refactoring this to
# query the benchmark data by reading the configurations of the simulations
# to have a more generic approach.
zethSizeBN254, zethGasBN254 = get_benchmark_data("BN254", 32, 2, 2)
zethSizeBLS12377, zethGasBLS12377 = get_benchmark_data("BLS12_377", 32, 2, 2)

# Compute byte cost (in gas) of the transactions
BYTE_COST_EOA_TO_EOA = DGAS / (ETHTXSIZE + ETH_RECEIPT_SIZE)
byteCostZethTxBN254 = zethGasBN254 / (zethSizeBN254 + ZETH_RECEIPT_SIZE)
byteCostZethTxBLS12377 = zethGasBLS12377 / (zethSizeBLS12377 + ZETH_RECEIPT_SIZE)

# Compute the size ratio of Zeth transactions vs plain EoA-to-EoA transactions
sizeRatioBN254 = (zethSizeBN254 + ZETH_RECEIPT_SIZE) / (ETHTXSIZE + ETH_RECEIPT_SIZE)
sizeRatioBLS12377 = (zethSizeBLS12377 + ZETH_RECEIPT_SIZE) / (ETHTXSIZE + ETH_RECEIPT_SIZE)

# Compute the cost ratio of Zeth transactions vs plain EoA-to-EoA transactions
costRatioBN254 = zethGasBN254 / DGAS
costRatioBLS12377 = zethGasBLS12377 / DGAS

print(" == Gas paid per byte added to the chain == ")
print("Config A (EoA-to-EoA): ", BYTE_COST_EOA_TO_EOA)
print("Config B (Zeth with BN254): ", byteCostZethTxBN254)
print("Config C (Zeth with BLS12377): ", byteCostZethTxBLS12377)
print(" ")
print(" == Size ratios (tracks how much bigger a Zeth tx is w.r.t. a plain EoA-to-EoA tx) == ")
print("Config B (Zeth with BN254): ", sizeRatioBN254)
print("Config C (Zeth with BLS12377): ", sizeRatioBLS12377)
print(" ")
print(" == Cost ratios (tracks how much more expensive a Zeth tx is w.r.t. a plain EoA-to-EoA tx) == ")
print("Config B (Zeth with BN254): ", costRatioBN254)
print("Config C (Zeth with BLS12377): ", costRatioBLS12377)
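
The three quantities printed above are linked: the relative gas paid per byte (Zeth over EoA-to-EoA) equals the cost ratio divided by the size ratio, so it exceeds 1 exactly when the cost ratio exceeds the size ratio. The sketch below checks this identity with hypothetical figures: 21,000 gas is the base cost of a plain Ethereum transfer, while the byte counts and the Zeth gas figure are made up for illustration and are not taken from the benchmark data. The helper name is hypothetical.

```python
def relative_gas_per_byte(zeth_gas, zeth_bytes, eoa_gas, eoa_bytes):
    """Gas per byte of a Zeth tx divided by gas per byte of an EoA-to-EoA tx."""
    cost_ratio = zeth_gas / eoa_gas
    size_ratio = zeth_bytes / eoa_bytes
    return cost_ratio / size_ratio


# Hypothetical inputs (bytes are meant to include the receipt, as in the cell above)
ratio = relative_gas_per_byte(zeth_gas=1_000_000, zeth_bytes=2_000, eoa_gas=21_000, eoa_bytes=110)
direct = (1_000_000 / 2_000) / (21_000 / 110)
print(round(ratio, 3), round(direct, 3))  # 2.619 2.619
```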

### Answer to [question 5](#Question-5)

In [None]:
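# Transaction count tracker
# (more precisely, here we track the total number of transactions included in the
# chain at the end of the simulation under the different configurations; dividing
# it by the number of simulated seconds gives an average TPS)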

def get_number_transactions_end_simulation(simulation_df):
    nb_transactions = []
    for i in range(nb_sweeps):
        # Get the specific simulation results
        df_sweeped_simulation = simulation_df[simulation_df['subset'] == i]
        # Get last state of the simulation
        final_state_simulation = df_sweeped_simulation.tail(1)
        final_transaction_nb = final_state_simulation.iloc[0]['B_txcount']
        nb_transactions.append([final_transaction_nb, "config-"+str(i)])
    return nb_transactions


nb_transactions_simulations_result = {}

nb_simulations = df['simulation'].nunique()
for sim_id in range(nb_simulations):
    df_simulation_id = df[df['simulation'] == sim_id]
    res_simulation_id = get_number_transactions_end_simulation(df_simulation_id)
    nb_transactions_simulations_result[sim_id] = res_simulation_id


pprint(nb_transactions_simulations_result)
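
Since each timestep represents one second and a run covers `SIMULATION_TIMESTEPS` (86400) seconds, the final transaction counts gathered above translate directly into an average TPS. A minimal sketch, assuming the `{sim_id: [[count, label], ...]}` layout of `nb_transactions_simulations_result`; the `average_tps` helper and the counts below are hypothetical.

```python
def average_tps(results, simulated_seconds=86400):
    """Map {sim_id: [[tx_count, label], ...]} to {sim_id: {label: average TPS}}."""
    return {
        sim_id: {label: count / simulated_seconds for count, label in rows}
        for sim_id, rows in results.items()
    }


# Hypothetical final transaction counts for two simulations
toy_results = {0: [[864_000, 'config-0']], 1: [[43_200, 'config-0']]}
print(average_tps(toy_results))  # {0: {'config-0': 10.0}, 1: {'config-0': 0.5}}
```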

In [None]:
# Number of processed transactions
nb_transactions_simulation_A = pd.DataFrame(
    nb_transactions_simulations_result[0],
    columns=['B_txcount', 'subset'],
)
nb_transactions_simulation_B = pd.DataFrame(
    nb_transactions_simulations_result[1],
    columns=['B_txcount', 'subset'],
)
nb_transactions_simulation_C = pd.DataFrame(
    nb_transactions_simulations_result[2],
    columns=['B_txcount', 'subset'],
)

fig = go.Figure()
fig.add_trace(go.Bar(
    x=nb_transactions_simulation_A['subset'],
    y=nb_transactions_simulation_A['B_txcount'],
    name='EoA-to-EoA',
    marker_color='indianred'
))
fig.add_trace(go.Bar(
    x=nb_transactions_simulation_B['subset'],
    y=nb_transactions_simulation_B['B_txcount'],
    name='Zeth (BN254)',
    marker_color='darkcyan'
))
fig.add_trace(go.Bar(
    x=nb_transactions_simulation_C['subset'],
    y=nb_transactions_simulation_C['B_txcount'],
    name='Zeth (BLS12_377)',
    marker_color='darkgray'
))

fig.update_layout(
    title="Number of transactions processed at the end of each simulation",
    xaxis_title="Various blockchain configurations",
    yaxis_title="Number of transactions processed per simulation run",
    barmode='group'
)
fig.show()

## Notes and Observations

###  Note 1

Our results are obtained using an *over-simplification of the real-life system* (simplifying assumptions are inherent to modeling). The goal of this work is simply to get a "taste" of how the system evolves over time under different configurations. **Importantly, however, we stress that these numbers should not be interpreted as a fully accurate and trustworthy representation of a real-life system's growth.**

###  Note 2

The data fed into the simulations comes from different sources: some is derived from data sets provided by [Etherscan.io](https://etherscan.io/) (see [details here](#Simulation-dataset)), some comes from [our BLS12-377-enabled fork of ganache](https://github.com/clearmatics/ganache-cli), some comes from the [Autonity Bakerloo testnet](https://explorer.bakerloo.autonity.network/), etc. Hence, the multiple sources of data used in this work add another layer of noise on top of the model's simplifications of the real world. Further iterations of this set of simulations will be carried out in the future with more uniform data sources and more accurate data.

### Observations

Overall, no surprising results here.

- We see, on the various plots above summarizing the simulation results, that using Zeth does not yield a "state blow-up", for the mere reason that fewer transactions hit the chain. It would, of course, be foolish to conclude that "Zeth is more scalable than EoA-to-EoA transactions"; in fact, it is quite the contrary. As expected, Zeth significantly impacts the chain state: a Zeth transaction is an order of magnitude bigger (byte-wise) than an EoA-to-EoA transaction (see the boxes above). Nevertheless, it is worth noting that the gas ratio between Zeth and EoA-to-EoA transactions is *significantly higher* than the size ratio. This means that Zeth transactions are "less gas-efficient" than EoA-to-EoA transactions, since more gas is paid per byte added to the state when carrying out such state transitions. Whether this is good or bad depends on the context: it is certainly not good news for Zeth users, who pay more (byte-wise) than plain EoA-to-EoA transaction senders do; however, making byte additions to the state "expensive" eases tensions around _"state rent"_, which is good overall for the health of the system. This also means that significantly fewer Zeth transactions are included and mined into blocks, which affects the throughput (TPS) of the system. Hence, to keep a TPS similar to the EoA-to-EoA case, one needs to adjust the L1 parameters to mine bigger blocks, effectively compromising on the state growth. (This is the main motivation behind Zecale as a way to reconcile privacy-preserving state transitions and scalability.)

- It is worth noting that, for similar configuration parameters (i.e. for fixed `JSIN`/`JSOUT`/`MK Tree depth`), the BN254 and BLS12-377 simulations provide slightly different results. While some of these results (especially the different occupancy rates of the Zeth Merkle tree) may seem surprising at first glance, they are not. In fact, using different curves implies:
    - using different precompiled contracts for the state transition (these EVM "extensions" have different costs)
    - sending different pieces of data on-chain (these data pieces have different byte lengths and therefore incur different costs)

Hence, switching the curve implies different transaction costs, which means a potentially different number of transactions fitting in a block (given a fixed gas limit per block). This ultimately impacts the number of commitments added to the Zeth tree per block, hence leading to potentially different occupancy rates for the Merkle tree.
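
The chain of reasoning above (from curve, to gas per transaction, to transactions per block, to commitments per block) can be made concrete with a small sketch. The helper name, the gas limit and the per-transaction gas costs are all hypothetical and chosen only for illustration; they are not the benchmark values used in the simulations.

```python
def commitments_per_block(block_gas_limit, gas_per_tx, jsout):
    """Transactions fitting in a block under a gas limit, and the leaves they append."""
    txs_per_block = block_gas_limit // gas_per_tx
    return txs_per_block, txs_per_block * jsout


# Hypothetical gas limit and per-transaction costs for two curves
GAS_LIMIT = 12_000_000
print(commitments_per_block(GAS_LIMIT, gas_per_tx=1_000_000, jsout=2))  # (12, 24)
print(commitments_per_block(GAS_LIMIT, gas_per_tx=1_200_000, jsout=2))  # (10, 20)
```

With the same `JSOUT`, the costlier transaction fills the Merkle tree more slowly simply because fewer of them fit in each block.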

## Final remarks

- As described in the [Zecale paper](https://arxiv.org/abs/2008.05958), using Zecale can provide sender anonymity (under additional assumptions) and can save gas and bytes for the nested transactions. Importantly, we know that beyond the existing tension between (trustless) privacy and scalability (i.e. more data needs to be added to the chain to maximize "indistinguishability"), there also exists a tension between (trustless) "privacy" and "composability" on blockchain systems. In fact, manipulating one's funds in a privacy-preserving fashion is an impediment to composability (which is fully leveraged in the "clear").
- No solution is a silver bullet. Whether "multi/side-chain"-based solutions, sharding approaches (somewhat related to the "multi-blockchain" paradigm) or a "rollup"/L2 approach is employed, one needs to be aware of the various tradeoffs on composability (challenges around "cross-rollup execution"/"cross-shard execution"), on decentralization and censorship resistance (potential increased centralization with rollups, potential "data availability" violations), etc.

In this experiment, we simply modeled various systems under extreme conditions (some where **only** EoA-to-EoA transactions are carried out, some where **only** Zeth transactions are mined). In practice, this is much more nuanced, and blocks contain various types of transactions (various "DApp transactions", ERC token transactions, plain EoA-to-EoA transactions, etc.). As such, it is worth remembering that in real-life systems, Zeth transactions are not expected to represent 100% of the chain usage. Since Zeth can be used to transact fungible assets (ERC20/223 tokens, ETH, etc.), we are interested in estimating the proportion of block space occupied by such transactions on Ethereum. To that end, we looked at several thousand Ethereum transactions, which showed that approximately 85% of them could be carried out in a privacy-preserving manner using Zeth on mainnet (i.e. around 85% of the transactions analyzed were EoA-to-EoA transactions or ERC token transactions).

Knowing this, along with the known tension between privacy and composability (i.e. manipulating one's funds in a privacy-preserving fashion is an impediment to composability), which would likely drive the number of Zeth transactions down (numerous "clear-text" EoA-to-EoA/ERC token transfer transactions would still need to be carried out to compose with DeFi applications on-chain), we assume, for illustration purposes, that Zeth transactions represent 50% of the chain "traffic". This means that, despite Zeth transactions being approx. 18 times bigger than EoA-to-EoA transactions, using Zeth **does not** translate into a state growing 18 times faster, but rather into **50% of the state growing 18 times faster** (for a fixed number of transactions).
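
The 50% assumption can be turned into an overall growth factor with a weighted average. This is a sketch under the same simplifying assumptions as the paragraph above: a fixed total number of transactions, non-Zeth transactions treated as EoA-to-EoA-sized, and the approximate 18x size ratio quoted in the text. The helper name is hypothetical.

```python
def relative_state_growth(zeth_share, zeth_size_ratio):
    """State growth relative to an all EoA-to-EoA chain carrying the same number of txs."""
    eoa_share = 1 - zeth_share
    return eoa_share * 1 + zeth_share * zeth_size_ratio


print(relative_state_growth(0.5, 18))   # 9.5
print(relative_state_growth(0.85, 18))  # if every eligible tx (approx. 85%) used Zeth
```

Under these assumptions, the state would grow roughly 9.5 times faster than on an EoA-only chain with the same number of transactions, rather than 18 times faster.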

--------

*Note: The high number of EoA-to-EoA transactions seems to be partially due to mining pools that redistribute rewards to participants on the network. Interestingly, a Zeth contract configured to handle a high `JSOUT` could be used for private multi-recipient payments within a mining pool. No simulation has been carried out under such a configuration, however. This is left for future work.*

## Conclusions

This study offers preliminary results about the impact of Zeth on the state growth of the underlying blockchain. While these results are due to be refined in the future, we think they provide an acceptable estimate of the state growth of the system under different conditions.

Importantly, for public blockchain networks to be sustainable, we think that all L2 developers *MUST* carry out some analysis to track the impact that their smart-contracts/layer 2 protocols will have on the layer 1 they build upon (be it Ethereum or another one). Doing so before deploying the smart-contracts is particularly important to prevent bloating the system's state which, as we know, is immutable. It is the community's responsibility to protect "its common goods" and prevent the so-called ["tragedy of the commons"](https://science.sciencemag.org/content/162/3859/1243).

## Up next

In the next notebook, we will simulate our systems under different Zecale configurations to study the data compression achieved for different batch sizes. We will see that limited data compression can be achieved without compromising "data availability" and censorship resistance. However, if we are willing to compromise on "data availability", better compression can be achieved. Appropriate tradeoffs are always required.

## Additional References

- Zeth paper: https://arxiv.org/abs/1904.00905
- Zeth specifications: https://github.com/clearmatics/zeth-specifications
- Reference implementation: https://github.com/clearmatics/zeth
- Chainwipe Gist: https://gist.github.com/karalabe/60be7bef184c8ec286fc7ee2b35b0b5b
- The State Growth Problem Facing Blockchains: https://thecontrol.co/state-growth-a-look-at-the-problem-and-its-solutions-6de9d7634b0b
- RocksDB wiki: https://github.com/facebook/rocksdb/wiki/Compression
- Which database(s) do the ethereum clients use and why? https://ethereum.stackexchange.com/questions/824/which-databases-do-the-ethereum-clients-use-and-why
- The Ethereum-blockchain size will not exceed 1TB anytime soon: https://dev.to/5chdn/the-ethereum-blockchain-size-will-not-exceed-1tb-anytime-soon-58a
- Lies, Damn Lies And SSD Benchmark Test Result: https://www.seagate.com/gb/en/tech-insights/lies-damn-lies-and-ssd-benchmark-master-ti/
- Turbo-Geth: https://github.com/ledgerwatch/turbo-geth
- How Nervos is Tackling the State Explosion Problem Facing Smart Contract Blockchains: https://medium.com/nervosnetwork/how-nervos-is-tackling-the-state-explosion-problem-facing-smart-contract-blockchains-a9acc4c5708e

---------------------

<center>Found a mistake, or thinking about a way to improve this document?</center>

<center>Great! All contributions are welcome. Please feel free to open a Pull Request or an Issue on the repository :)</center>

---------------------