<a href="https://colab.research.google.com/github/IFFranciscoME/CL-AMM/blob/main/Monolith.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Brief Vision
---


The requested/suggested perspective and evaluation criteria for this task had these 3 areas of focus:

1. Reasoning
2. Engineering decisions
3. Priorities

I re-interpreted these as the following types of activities:

1. Modeling Financial Process.
2. Data Architecture & Systems Design.
3. Product Management / Project Management.

## Update general knowledge

Refresh knowledge of liquidity provision in on-chain Automated Market Makers with a Concentrated Liquidity architecture (CL-AMM).

References:

- Docs for **EKUBO**: https://docs.ekubo.org/
- Docs for **Uniswap V3**: https://docs.uniswap.org/contracts/v3/overview | https://uniswapv3book.com/
- **Fritsch, Robin.** Concentrated Liquidity in Automated Market Makers. Proceedings of the 2021 ACM CCS workshop on decentralized finance and security. 2021.

## Fundamental Abstract Components

Define the fundamental elements of a for-profit liquidity provision strategy for a CL-AMM:

- Anchor Price: A single value. Decision: how to choose it.
- Price Range: A tuple of 2 values. Decisions: which unit of definition to use, what magnitude it will have, and whether it is symmetric in both directions.
- Policies: Range Symmetry, Anchor trailing, Inventory, Operational.
- Tools: Metrics, Stats, Models.

Define fundamental areas of expertise/focus needed:

- Management / Business logic expectations
	- Resources availability
	- time window
- Engineering

## Overlooked principles

- Tick distances in order-driven CEXs are post-defined by market demand and, although not always present by default, they are always quotable (usable) at the trader's will. In a CL-AMM, tick spacings are pre-defined by the protocol, using a criterion based on volatility.
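
To make the pre-defined tick grid concrete, here is a minimal sketch of how a desired price range could be snapped to it. It relies on the Uniswap V3 convention that the price at tick `i` is `1.0001 ** i`; the function names are hypothetical, prices are raw ratios that ignore token decimals, and floating point is used for illustration only (the on-chain contracts use fixed-point square-root prices).

```python
import math

TICK_BASE = 1.0001  # Uniswap V3 convention: price(tick) = 1.0001 ** tick


def price_to_tick(price: float) -> int:
    """Largest tick whose price is <= the given price (float approximation)."""
    if price <= 0:
        raise ValueError("price must be positive")
    return math.floor(math.log(price) / math.log(TICK_BASE))


def tick_to_price(tick: int) -> float:
    """Price implied by a tick index."""
    return TICK_BASE ** tick


def snap_range_to_spacing(lower_price: float, upper_price: float, tick_spacing: int):
    """Widen a price range outward so both bounds sit on usable ticks."""
    lower_tick = math.floor(price_to_tick(lower_price) / tick_spacing) * tick_spacing
    upper_tick = math.ceil(price_to_tick(upper_price) / tick_spacing) * tick_spacing
    if lower_tick == upper_tick:  # keep the range at least one spacing wide
        upper_tick += tick_spacing
    return lower_tick, upper_tick


# Example: a +/-5% range around a price of 1.0 with a tick spacing of 60
# (the spacing used by the 0.3% fee tier in Uniswap V3).
lower_tick, upper_tick = snap_range_to_spacing(0.95, 1.05, 60)
```

The snapped range is always at least as wide as the requested one, which means the quoted bounds are never tighter than intended.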


# Expectations
---

70% of the effort should go towards:
- An algo development platform (Codes, References, Hypothesis, Working group).
- A deployment system with "dev/staging/production" levels of maturity.
- A monitoring system to keep bounded costs (the resulting combination of PnL and Direct infrastructure costs).

The remaining 30% should go towards the development and continuous improvement of the capacity to conduct rapid iterations of trading hypotheses. Ideally, anyone in the company with enough time and skill/resources should be able to develop an idea, test it and present it to decision makers.

The system's expected stability will be measured across the aspects below, each representing a different attribution of the company's results, from the least controllable to the almost fully controllable:

- Market metrics:
	- Existing Volume per LP.
- Operations metrics:
	- Computational Metrics
	- Networking Metrics
	- Uptime, Posted Liquidity.
	- From idea to deployment cycle
- Performance metrics:
	- Quoted Volume per LP.
	- Risk-adjusted Profit
- Risk Metrics:
	- Adverse selection indicator (VPIN Model).
	- Token Inflow/Outflow DEX/CEX Monitoring.
	- Expected clusters of trading activity (Hawkes Model).



# Roadmap
---

## Phase 0 : Hypothesis

- Duration: 3 days
- Purpose: Acknowledge existing resources, requirements, and knowledge from other team members, industry content and Academic literature.
- Tasks:
	- Benchmark definition (The cost of doing nothing, as a baseline).
	- Explore the state of the art (internally in the company, industry content, and academia).
- Outcome:
	- Definition of the source of alpha: The unfair advantage we can have.
	- Notes on available resources and constraints: People, Tech, Knowledge, Time, Data, Infrastructure, Licenses, Tools.

## Phase 1: PoC exploration

- Duration: 2 weeks
- Purpose: Test the most fundamental and risky assumptions in the Hypothesis.
- Tasks:
	- Build a basic data acquisition layer (later will become the data-infrastructure repo/project)
	- Build basic data structures (later will become the data-infrastructure repo/project)
	- Build data pre-processing and EDA locally (later will become the model-amm repo/project)
	- Exploratory Data Analysis for Stylized Facts validations
		- For comparison purposes, choose 3 pairs that are very different according to a given dimension (tokenomics, purpose, liquidity profile ...).
		- Define, build and execute a basic Exploratory Data Analysis with the basic statistical tests (distributions, outliers, ranges)
		- Define a list of questions whose answer might be a stylized fact:
			- Is there an observable pattern for a particular metric? (quoted volume across different hours of the day, intraday volatility clusters, spoof trading, pump and dump...)
			- Do the involved tokens have known restricted/scheduled monetary/minting policies for scheduling supply?
			- Other similar questions and respective data-based explorations.
- Outcome:
	- repos:
		- data-infrastructure: Data Acquisition codes
		- model-amm: EDA methods for Hypothesis Testing
	- presentation:
		- Present results and discuss findings with the team.

## Phase 2 : Alpha version

- Duration: 6 weeks
- Purpose: Build the initial data-infrastructure, modeling, execution, and monitoring functional layers to test them all together.
- Tasks:
	- Data infrastructure:
		- SDK for basic operations with smart contracts in Ekubo and Uniswap v3.
		- Data Structures: Standardize naming, units, etc., across venues.
		- Data Pipeline: Serialization/Deserialization processes, pre-processing for standardization.
		- Data Storage: Partitioning schemas, Database definition/access.
		- Data Monitoring: Data completeness metric definition (expected amount of data or activity per unit of time).
	- Modeling:
		- Forecast Boundaries:
			- target variable:
			- lower-bound price + upper-bound price + bounded-volume.
		- Option B:
			- forecast volatility
		- price chase
		- rebalance
		- Adverse selection indicator (VPIN Model).
		- Token Inflow/Outflow DEX/CEX Monitoring.
		- Expected clusters of trading activity (Hawkes Model).
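
The Data Monitoring item above asks for a data completeness metric based on the expected amount of data per unit of time. One possible definition, sketched under the assumption that the expected count per bucket is a known constant, is the ratio of observed to expected records per time bucket, capped at 1. The function name and arguments are hypothetical.

```python
import math
from collections import Counter


def completeness_by_bucket(timestamps, start, end, bucket_seconds, expected_per_bucket):
    """Observed/expected record ratio per time bucket, capped at 1.0."""
    n_buckets = math.ceil((end - start) / bucket_seconds)
    counts = Counter(
        int((ts - start) // bucket_seconds) for ts in timestamps if start <= ts < end
    )
    return [
        min(1.0, counts.get(i, 0) / expected_per_bucket) for i in range(n_buckets)
    ]


# Two 60-second buckets, expecting 3 records in each.
ratios = completeness_by_bucket(
    timestamps=[0, 10, 20, 70], start=0, end=120,
    bucket_seconds=60, expected_per_bucket=3,
)
```

Buckets with a ratio below a chosen threshold can then be flagged for re-acquisition before the data reaches the modeling layer.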

- Basic position management (mint, burn, collect fees)
- Price oracle integration with redundancy and validation
- Gas estimation and optimization framework

## Phase 1: Data-Infrastructure

- Purpose: Build, test and present the fundamental definitions and methods for data acquisition, pre-processing and storing.
- Paradigm: Monolithic
- Stack: Terraform + Shell + Docker + SQL
- Initial global data model:
	- Standardize naming, units for symbols, CEX/DEX, chains, transactions.
	- Data structures: Aware of Serialization and Partition needs.
	- Base methods: for HTTP, WSS, gRPC, Unstructured/Raw, Data Systems.
- Initial Source Mapping:
	- Oracle categorization/exploration/prioritization based on Hypothesis.
- Monitoring
	- MEV monitoring HoneyPot

## Data Layer (Off-chain).

- Create the global data model:
	- Standardize naming, units for symbols, CEX/DEX, chains, transactions.
	- Data structures: Aware of Serialization and Partition needs.
	- Base methods: for HTTP, WSS, gRPC, Unstructured/Raw.
- Source Catalog:
	- Oracle categorization/exploration
	- Prioritization based on hypothesis
	- Access/resource/monitoring allocation based on purpose

- e.g. The pair selected in the Hypothesis dictates data pipeline sourcing priority: not only price and volume, but also any other tokenomics-related, category-related, or usage-related data.
- Build the Global Price Model (Oracle integrations with redundant price feeds)

## Compute Layer (Off-chain)

- Local Development
	- Preferably: Unix-like (or Linux distro), local repo, execution on Docker images, local stack, env variables.
- Data Worker
	- Docker, Throughput or i/o optimized image, Serializer/Deserializer, DB Connectivity, Pre-processing.
- Compute Worker
	- Docker, Compute optimized image
- Execution Worker
	- private mempool integration for MEV protection.

## Infrastructure

- Cloud Development Setup.
- Storage model.
- Data Acquisition Workers.
- Data Processing Workers.

Later stages depending on success and lessons learned from the previous ones:
- Advanced Modeling
- Model Calibration
- Model efficacy and Resources efficiency



# Prioritization
---

Impact dimensions:

- TS: Time spent
- FR: Financial risk
- FP: Financial profit
- TR: Technology risk
- OR: Operational risk

Effort scale:

- Hours Spent Developing
- Hours Spent Maintaining


## Alpha Version Features

Purpose: Start the iteration cycle with a focus on simplicity, monitoring capacity, and lower financial impact.

- **Price-Range**:
	- Determination: Lower and upper bound prices set as an offset in basis points (bps) from the top of book (TOB).
	- Management: Rebalancing every hour (for monitoring purposes as well).
	- High priority for time: Main source of time spent (monitoring, tooling, simplistic models).
	- Lower financial risk at the expense of lower financial profit.
	- Do not avoid technology and operational risk, but identify it well.
- **Posted-Volume**:
	- Determination: The intentional volume or amount to post within the price boundary.
	- Low priority: The upper bound of supported posted volume will later be optimized.
	- This lowers financial risk, and lowers impact of technology and operational risk as well.
- **MEV Protection**: Flashbots integration for private transaction submission.
	- Medium priority: Metadata around the execution context is important, but only after the static price-range and the price-chase are defined and tested.
- **Protocol Support**: Initial focus on Uniswap V3
	- High priority since it potentially has a faster development cycle.
	- This lowers financial risk, and lowers impact of technology and operational risk as well.
- **Capital Management**: Basic fee collection
	- Lower priority on auto-compounding.

## Beta Version Features

Purpose: Enhance the iteration cycle by focusing on pattern finding and value extraction, with monitoring capacity extended towards performance attribution and a noticeable upside financial exposure.

- **Advanced Rebalancing Modes**: Price-chase and volatility-adaptive strategies
- **Ekubo Integration**: Cross-protocol optimization and comparison
- **Auto-compounding**: Automated fee reinvestment with gas optimization
- **Multi-position Management**: Risk distribution across multiple ranges

## Future Iterations

Purpose: Either expand breadth towards other venues for the same token class, or double down on extracting more value from the same set of LPs or token pairs.

- **Cross-protocol Arbitrage**: Automated opportunity detection and execution
- **ML-based Optimization**: Reinforcement learning for strategy parameter tuning
- **Flash Loan Integration**: Capital efficiency improvements through leveraged positions
- **Cross-chain Deployment**: Extension to L2s and alternative chains


# Risks and Mitigation
---


| Risk Category                      | Severity | Likelihood | Mitigation Strategy                                                                                               |
| ---------------------------------- | -------- | ---------- | ----------------------------------------------------------------------------------------------------------------- |
| **MEV Extraction**                 | High     | High       | - Private mempool integration<br>- Transaction timing optimization<br>- Slippage bounds enforcement               |
| **Oracle Manipulation**            | Critical | Low        | - Multi-oracle architecture.<br>- Stabilized Cross-reference Pricing.<br>- Circuit breakers for extreme movements |
| **Gas Cost Spikes**                | Medium   | High       | - Dynamic gas pricing<br>- Profitability checks before rebalancing                                                |
| **Impermanent Loss During Trends** | High     | Medium     | - Trend detection algorithms<br>- Asymmetric range adjustment<br>- Position hedging options                       |
| **Protocol Changes**               | Medium   | Medium     | - Version detection<br>- Graceful degradation<br>- Rapid deployment pipelines for updates                         |
| **Liquidity Fragmentation**        | Medium   | Low        | - Token Flow Tracking<br>- Minimum liquidity thresholds<br>- Alternative venue fallbacks                          |
| **Regulatory Changes**             | High     | Low        | - Compliance monitoring<br>- Jurisdiction flexibility<br>- Operational transparency features                      |

## MEV Extraction

- **Private Mempool:** Use Flashbots, CoW Swap, Eden Network, or a comparable service as a "private mempool" during the execution phase.
- **Record MEV events:** Keep a record of identified, and even suspected, MEV events. Use it as a record of "Hot time-windows" for execution.
- **Execution Context:** The metadata recorded for an executed swap, or SDK operation, should include a "surrounding measure" that captures activity before, during, and after execution, for post-trade analysis.
- **Execution Atomicity:** The execution of intentions should be atomic (all-or-nothing), to add extra protection and to enforce the bounds of allowed slippage.

## Oracle manipulation

- **Multi-oracle utilization:** Build a Stabilized Cross-Referenced Price using the median of the weighted prices from at least 2 DEX and 2 CEX sources.
- **Circuit breaker:** Establish boundaries of volume to trade whenever a "Hot time-window" is occurring or has historically occurred.
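
A minimal sketch of the cross-referenced price described above: it takes one (already weighted) price per source, requires at least 2 DEX and 2 CEX quotes, and returns the median together with any sources that deviate from it by more than a threshold. The function name, the input shape and the 2% deviation threshold are assumptions for illustration.

```python
from statistics import median


def stabilized_price(quotes, min_dex=2, min_cex=2, max_deviation=0.02):
    """Median across sources plus the list of sources deviating too far from it.

    quotes: list of (venue_type, price) tuples, venue_type in {"DEX", "CEX"}.
    """
    dex = [price for kind, price in quotes if kind == "DEX"]
    cex = [price for kind, price in quotes if kind == "CEX"]
    if len(dex) < min_dex or len(cex) < min_cex:
        raise ValueError("not enough independent sources for a reference price")
    reference = median(dex + cex)
    outliers = [
        (kind, price)
        for kind, price in quotes
        if abs(price / reference - 1.0) > max_deviation
    ]
    return reference, outliers


quotes = [("DEX", 100.0), ("DEX", 101.0), ("CEX", 100.5), ("CEX", 120.0)]
reference, outliers = stabilized_price(quotes)
```

A non-empty `outliers` list is a natural trigger for the oracle deviation warning, and the median keeps a single manipulated source from moving the reference price.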

## Gas costs

**Dynamic Gas Pricing Model:**
- Option 1: Martingale model as benchmark (the previous price is the expected next price).
- Option 2: Regression model: non-linear features with a regularized linear model.
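
To show what the martingale benchmark of option 1 amounts to, the sketch below forecasts each gas price as the previous observation and scores it with the mean absolute error. The gas series is made-up illustration data, not a measurement.

```python
def martingale_forecast(series):
    """Forecast for step t+1 is the observation at step t."""
    return list(series[:-1])


def mean_absolute_error(actual, predicted):
    """Average absolute difference between paired observations and forecasts."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


gas_gwei = [30.0, 32.0, 31.0, 40.0, 38.0]   # illustrative values only
predictions = martingale_forecast(gas_gwei)  # forecasts for gas_gwei[1:]
benchmark_mae = mean_absolute_error(gas_gwei[1:], predictions)
```

Any candidate model from option 2 would need to beat `benchmark_mae` on held-out data to justify its added complexity.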

**Avoid NFT-Minted Position:**
The V3 SDK excels at off-chain position mathematics, providing precise calculations without blockchain interaction: [Uniswap v3 Docs - position-data](https://docs.uniswap.org/sdk/v3/guides/liquidity/position-data)

**Profitability Check:**
- Expected PnL should include the estimated boundaries for gas price + slippage.
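
One way to express this check, as a sketch: subtract the upper-bound estimates of gas and slippage costs from the expected fee income and only act when the result clears a minimum edge. All argument names are hypothetical, and costs are assumed to be expressed in the same quote currency as the fees.

```python
def rebalance_expected_pnl(
    expected_fees: float,
    gas_units: int,
    gas_price_upper_gwei: float,
    native_token_price: float,
    notional: float,
    slippage_upper_bps: float,
) -> float:
    """Expected PnL of a rebalance using upper-bound gas and slippage estimates."""
    gas_cost = gas_units * gas_price_upper_gwei * 1e-9 * native_token_price
    slippage_cost = notional * slippage_upper_bps / 10_000
    return expected_fees - gas_cost - slippage_cost


def is_profitable(expected_pnl: float, min_edge: float = 0.0) -> bool:
    """Act only if the expected PnL clears the required minimum edge."""
    return expected_pnl > min_edge


# Example: 50 in expected fees, 300k gas at 40 gwei, native token at 2000,
# 10,000 notional with a 10 bps slippage bound.
pnl = rebalance_expected_pnl(50.0, 300_000, 40.0, 2000.0, 10_000.0, 10.0)
```

Using the upper bounds rather than point estimates makes the check conservative, which matches the Alpha version's preference for lower financial risk.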

## Impermanent Loss Protection

- **Adverse Selection Model:** Volume-Synchronized Probability of Informed Trading (VPIN), a volume-based detection of liquidity risk and trade flow toxicity.
- **Asymmetric range adjustment:** Adjust the range asymmetry according to the output of the adverse selection model.
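
A simplified sketch of the VPIN computation: trades are packed into equal-volume buckets, and the metric is the average absolute buy/sell imbalance over the last `window` buckets, divided by the bucket volume. Note the assumption that each trade already carries a buy/sell sign; the original VPIN formulation infers the split with bulk volume classification instead. Names and example values are hypothetical.

```python
def vpin(trades, bucket_volume: float, window: int):
    """VPIN over the last `window` full volume buckets, or None if too few.

    trades: iterable of (volume, side) with side > 0 for buys, otherwise sells.
    """
    imbalances = []
    buy = sell = filled = 0.0
    for volume, side in trades:
        remaining = float(volume)
        while remaining > 0:
            space = bucket_volume - filled
            take = min(remaining, space)
            if side > 0:
                buy += take
            else:
                sell += take
            remaining -= take
            if take == space:  # bucket is full: record imbalance and reset
                imbalances.append(abs(buy - sell))
                buy = sell = filled = 0.0
            else:
                filled += take
    recent = imbalances[-window:]
    if len(recent) < window:
        return None
    return sum(recent) / (window * bucket_volume)


trades = [(100, +1), (50, -1), (50, +1), (200, -1)]
toxicity = vpin(trades, bucket_volume=100, window=4)
```

Values close to 1 indicate one-sided flow, which is where widening or skewing the range would be considered.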

## Protocol Changes

- **Track technology risk:** SDK / Protocol uptime and fundamental changes detection.

## Liquidity Fragmentation

- **Token Flow Tracking:** Inflow/Outflow monitoring through identified active wallets interacting with the protocol and particular LPs.
- **Alternative venue fallbacks**: List of alternative venues for position off-loading.




# Technical Design
---

## Uniswap V3 Integration

The Uniswap V3 ecosystem provides comprehensive tooling for off-chain automation, for example the [Aperture Finance uniswap-v3-automation-sdk](https://github.com/Aperture-Finance/uniswap-v3-automation-sdk):

- **Position Management**: Full capacity, including minting and burning a position as well as fee collection.
- **Quote Generation**: Real-time price impact calculation and slippage estimation. This is very convenient for a PoC and the Beta version; further versions should also include an enhanced Optimal Execution based on Price Impact.
- **Transaction Building**: Automated transaction construction with multi-hop routing optimization.
- **Analytics Integration**: Historical performance data.

**NFT minting can be completely avoided** through direct pool interaction.

## Ekubo Protocol Integration

Ekubo's singleton architecture is specifically designed for external integrations: [Ekubo Docs](https://docs.ekubo.org/integration-guides/reference/ekubo-api)

- **Extensions Framework**: Custom logic insertion at pool lifecycle events without core contract modification
- **API-First Design**: REST API endpoints, which makes integration very convenient.
- **Real-time Data Streams**: WSS connections with sub-second market data updates.
- **Cross-Chain Abstraction**: Unified interface across Starknet and Ethereum for standard data structure and deployment definition.


## Data-Infrastructure

### Adverse Selection & Price-Range-Volume Chaser Model (mermaid)
```mermaid
sequenceDiagram
autonumber

participant Data-processing as Data-Processing
participant DL as Data Lake
participant Report-API as Protocol-SDK
participant FE as Features Engine
participant DW as Data Warehouse
participant IE as Inference Engine

Note over Report-API : START
Note over Report-API : Event & Push to Topic <br/> [New forecast requested]
Report-API ->> Data-processing :
Note over Data-processing: Pull from Topic <br/> [New forecast requested]

par forecast Data Generation
	Data-processing --> Data-processing: Parse and validate
	Data-processing -->> DL: Insert forecast Data
	DL -->> Data-processing: Successful Insert forecast Data
end

Note over Data-processing: Event & Push to Topic <br/> [forecast Data Ready]

Note over FE: Pull from Topic <br/> [forecast Data Ready]
par Feature Data Generation
	FE -->> DL: Fetch data
	FE -->> FE: Compute Features Data
	FE -->> DW: Insert Features Data
	DW -->> FE: Successful Insert Features Data
end
Note over FE: Event & Push to Topic <br/> [Features Data Ready]

Note over IE: Pull from Topic <br/> [Features Data Ready]
par Inference Data Generation
	IE -->> DW: Fetch Model
	IE -->> DW: Fetch features
	IE --> IE: Model Inference
	IE -->> DW: Insert results
	DW -->> IE: Successful Insert Inference Data
end
Note over IE: Event & Push to Topic <br/> [Inference Data Ready]

Note over Report-API: Pull from Topic <br/> [Inference Data Ready]
Report-API ->> DW: Fetch forecast
Note over Report-API : FINISH
```

## Software Design Paradigm

- Paradigm: Event-Driven
- Taxonomy: Many-to-Many

## Events

### Main Functionality

- **forecast_request**:
    - **pf_forecast**: The request to produce the forecast for vpin, in this case by **SDK**.
    - **pm_forecast**: The request to produce the forecast for price-range-chaser, in this case by **SDK**.
- **forecast_data**:
    - **generation_ok**: a satisfactory internal data generation and/or external retrieval, in this case by **Data-processing**.
    - **insertion_ok**: a satisfactory response gathered by the Process, in this case **Data-processing**.
    - **existence_ok**: existence of the full and correct data in the DB, in this case **Data-Lake**.
- **features_data**:
    - **generation_ok**: a satisfactory internal data generation, in this case by **Features-Engine**.
    - **insertion_ok**: a satisfactory response gathered by the Process, in this case **Features-Engine**.
    - **existence_ok**: existence of the full and correct data in the DB, in this case **Data-Warehouse**.
- **inference_data**:
    - **generation_ok**: a satisfactory internal data generation, in this case by **Inference-Engine**.
    - **insertion_ok**: a satisfactory response gathered by the Process, in this case **Inference-Engine**.
    - **existence_ok**: existence of the full and correct data in the DB, in this case **Data-Warehouse**.

### Logs and Monitoring

- **log**:
    - **DEBUG**: Free to use for development purposes.
    - **INFO**: Constant info for validation/branching/structuring data sets.
    - **WARNING**: A non-critical error.
    - **SYSTEM_ERROR**: A systems error.
    - **PROCESS_ERROR**: A modeling process error.
    - **CRITICAL**: An unavoidable and urgent error to be fixed.
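
The two non-standard levels above (SYSTEM_ERROR and PROCESS_ERROR) could be registered with Python's standard `logging` module as sketched here. The numeric values 42 and 44 are an assumption, chosen only so that both sit between the built-in ERROR (40) and CRITICAL (50); the logger name and the in-memory handler are illustrative.

```python
import logging

# Assumed numeric values, placed between logging.ERROR (40) and logging.CRITICAL (50).
SYSTEM_ERROR = 42
PROCESS_ERROR = 44
logging.addLevelName(SYSTEM_ERROR, "SYSTEM_ERROR")
logging.addLevelName(PROCESS_ERROR, "PROCESS_ERROR")


class ListHandler(logging.Handler):
    """Collects formatted records in memory (stand-in for a topic publisher)."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))


logger = logging.getLogger("forecast_pipeline")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
handler = ListHandler()
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger.addHandler(handler)

logger.log(PROCESS_ERROR, "model inference failed")
logger.warning("features table is stale")
```

In the real pipeline the handler would publish to the logs topic instead of appending to a list.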

## Events schemas

- **Name**: {forecast_request, forecast_data, features_data, inference_data}.
- **Payload**: {pf_forecast, pm_forecast, generation_ok, insertion_ok, existence_ok}

## Routings

- topic for forecast : **topic_forecast_events**
- topic for logs : **topic_forecast_logs**
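
The event names, payloads and topics listed above can be captured in a small validated message type. This is a sketch: the `Event` class, the JSON layout with `name` and `payload` keys, and the mapping of payloads to event names (taken from the Main Functionality list) are assumptions about how the schema might be encoded.

```python
import json
from dataclasses import asdict, dataclass

TOPIC_EVENTS = "topic_forecast_events"
TOPIC_LOGS = "topic_forecast_logs"

STATUS_PAYLOADS = {"generation_ok", "insertion_ok", "existence_ok"}
ALLOWED_PAYLOADS = {
    "forecast_request": {"pf_forecast", "pm_forecast"},
    "forecast_data": STATUS_PAYLOADS,
    "features_data": STATUS_PAYLOADS,
    "inference_data": STATUS_PAYLOADS,
}


@dataclass(frozen=True)
class Event:
    """A message for the forecast events topic, validated on creation."""

    name: str
    payload: str

    def __post_init__(self):
        allowed = ALLOWED_PAYLOADS.get(self.name)
        if allowed is None:
            raise ValueError(f"unknown event name: {self.name!r}")
        if self.payload not in allowed:
            raise ValueError(f"payload {self.payload!r} not valid for {self.name!r}")

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "Event":
        data = json.loads(raw)
        return cls(name=data["name"], payload=data["payload"])


message = Event("forecast_request", "pf_forecast").to_json()
```

Validating at construction time means a malformed message is rejected by the publisher rather than discovered by a downstream subscriber.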

## Ancillary code

- **topic_publish.py** : A script to manually publish a message into an existing Pub/Sub topic.
- **data_io.py** : A script to read/write data from and into both the Data Lake and the Data Warehouse.
- **synthetic_data.py** : A script to create random content that follows the correct expected schema (both for the DL and DW).
- **data_catalog.json** / **features_catalog.json** / **models_catalog.json** : Schemas for each case.

## Error Handling

### Input Data Stage

- **SDK** fails to push to forecast topic the new_report message:
    - Dead letter Queue : Do a 10x re-try cycle.
    - Alternative: Run a manual script to push to topic.
- **Data-processing** fails to pull from the forecast topic the new_report message:
    - Do a 10x re-try cycle.
    - Alternative: Use a script to manually trigger process::{forecast Data Generation}

- **Data-processing** fails to parse and validate forecast_data:
    - Use a script to manually create data.
- **Data-processing** fails to insert forecast_data fully or partially into Data Lake:
    - Do a 3x re-try cycle
    - Alternative: Use a script to manually insert data into Data Lake.
- **Data-processing** fails to push to forecast topic the forecast_data message:
    - Dead letter queue: Do a 10x re-try cycle.
    - Alternative: Run a manual script to push to topic.

###  Feature Stage

- **FEATURES ENGINE** fails to pull from forecast topic the forecast_data message:
    - Do a 3x re-try cycle
    - Alternative: Use a script to manually read the data from Data Lake.
- **FEATURES ENGINE** fails to generate values:
    - Alternative: Use a script to manually generate data.
- **FEATURES ENGINE** fails to insert features_data:
    - Do a 3x re-try cycle
    - Alternative: Use a script to manually insert data into the Data Warehouse.
- **FEATURES ENGINE** fails to push to forecast topic the features_data message:
    - Dead letter Queue : Do a 10x re-try cycle.
    - Alternative: Run a manual script to push to topic.

### Inference Stage

- **INFERENCE ENGINE** fails to pull from forecast topic the features_data message:
    - Do a 3x re-try cycle
    - Alternative: Use a script to manually read data from Data Warehouse.
- **INFERENCE ENGINE** fails to generate values:
    - Alternative: Use a script to manually generate data.
- **INFERENCE ENGINE** fails to insert inference_data:
    - Alternative: Use a script to manually insert data to Data Warehouse.
- **INFERENCE ENGINE** fails to push to forecast topic the inference_data message:
    - Dead letter Queue : Do a 10x re-try cycle.
    - Alternative: Run a manual script to push to topic.

### Final Data Stage

- **SDK** fails to pull from forecast topic the inference_data message:
    - Dead letter Queue : Do a 10x re-try cycle
    - Alternative: Run a manual script to read from Data Warehouse.
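
The retry-then-dead-letter pattern used throughout the stages above can be sketched generically as follows. The function name, the record stored in the dead letter queue, and the exponential backoff between attempts are assumptions; the source only specifies the number of retries per step.

```python
import time


def deliver_with_retry(action, message, max_attempts, dead_letter_queue,
                       base_delay=0.5, sleep=time.sleep):
    """Call action(message) up to max_attempts times, then dead-letter it.

    Returns the action's result on success, or None after the final failure.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return action(message)
        except Exception as exc:  # sketch: a real worker would catch narrower errors
            last_error = exc
            if attempt < max_attempts:
                sleep(base_delay * 2 ** (attempt - 1))  # assumed exponential backoff
    dead_letter_queue.append(
        {"message": message, "error": repr(last_error), "attempts": max_attempts}
    )
    return None
```

A stage with a 10x cycle would call `deliver_with_retry(push_to_topic, msg, 10, dlq)`, where `push_to_topic` is whatever publish function that stage uses; the manual scripts listed as alternatives then work from the contents of `dlq`.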

### Storage, retention, access

- Immediate: Entries in a table of the Logs and Metrics Database.
- Archive: Grouped in zipped file.

### Monitoring and Alerts

Grafana and its alerting systems (Slack/Email/SMS)

- Dashboard for logs aggregation and visualization.
- Alerting system for critical events.



# UI & Design
---

## **Streamlit** (Recommended for Alpha and Beta version)

- **Simplicity**: Streamlit excels at rapid prototyping with minimal code ([source](https://docs.kanaries.net/topics/Streamlit/streamlit-vs-dash)).
- **Learning Curve**: Nearly zero, since it uses pure Python syntax ([source](https://blog.streamlit.io/how-to-build-a-real-time-live-dashboard-with-streamlit/)).
- **Setup Time**: Dashboard in <50 lines of code.

**Strategy Overview Panel**:

- Current positions across both protocols with P&L visualization
- Real-time fee earnings and impermanent loss tracking
- Performance metrics: APR, Sharpe ratio, max drawdown

**Risk Monitoring Section**:

- MEV attack detection alerts
- Oracle deviation warnings
- Gas cost impact analysis

**Position Management Interface**:

- Manual range adjustment with price impact simulation
- Rebalancing trigger configuration (price thresholds, time intervals)
- Emergency position closure controls

**Configuration Visualization Settings**:

- Slippage tolerance adjustment (0.1% - 2% range)
- MEV protection preferences (speed vs. protection trade-off)
- Capital allocation limits and position sizing rules

# Testing & Validation
---

### Strategy Correctness Validation

**Economic Model Validation**: Mathematical verification of the fee calculation logic and the impermanent loss computation. This includes edge case testing for extreme market conditions.
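
As one example of such a verification, the closed-form impermanent loss of a full-range constant-product position, `IL(r) = 2 * sqrt(r) / (1 + r) - 1` with `r` the ratio of the final to the initial price, can serve as a baseline oracle in tests. This is a sketch under that full-range assumption; a concentrated position generally shows a larger loss while in range, so its computation needs its own checks.

```python
import math


def impermanent_loss_full_range(price_ratio: float) -> float:
    """Relative loss versus holding, for a full-range constant-product position."""
    if price_ratio <= 0:
        raise ValueError("price_ratio must be positive")
    return 2.0 * math.sqrt(price_ratio) / (1.0 + price_ratio) - 1.0


# Properties a test suite can assert:
no_move = impermanent_loss_full_range(1.0)     # no price change, no loss
quadrupled = impermanent_loss_full_range(4.0)  # price x4, roughly a 20% loss
quartered = impermanent_loss_full_range(0.25)  # symmetric in r and 1/r
```

Property-style checks (zero at `r = 1`, never positive, symmetric under `r -> 1/r`) cover the extreme-market edge cases without relying on hard-coded market data.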

**Protocol Integration Testing**: Exhaustive testing of smart contract interactions with both the Ekubo and Uniswap v3 protocols. Tests cover normal operations and error conditions.

