## LP Finance Docs Training

### Convert Document To json and To df

In [1]:
uri = [
    "https://docs.lp.finance/",
    "https://docs.lp.finance/lp-finance-protocol/quick-user-guide",
    "https://docs.lp.finance/lp-finance-protocol/view-protocol-data",
    "https://docs.lp.finance/lp-finance-protocol/user-faq",
    "https://docs.lp.finance/protocol/minting-borrowing-zsol",
    "https://docs.lp.finance/protocol/zsol-sol-liquidity-providers",
    "https://docs.lp.finance/protocol/peg-stability-module",
    "https://docs.lp.finance/protocol/liquidation",
    "https://docs.lp.finance/lpfi-staking/lpfi-staking",
    "https://docs.lp.finance/governance/governance",
    "https://docs.lp.finance/addresses/programs",
    "https://docs.lp.finance/addresses/tokens",
    "https://docs.lp.finance/links/links"
]

In [2]:
data = {
    "title": [
        "LP Finance Protocol",
        "LP Finance Protocol",
        "LP Finance Protocol",
        "LP Finance Protocol",
        "Architecture",
        "Architecture",
        "Architecture",
        "Architecture",
        "LPFi Staking",
        "Governance",
        "Address",
        "Address",
        "Links"
    ],
    "heading": [
        "LP Finance Protocol",
        "Quick User Guide",
        "View Protocol Data",
        "User FAQ",
        "Minting/Borrowing zSOL",
        "zSOL-SOL Liquidity Providers",
        "Peg-stability Module",
        "Liquidation",
        "LPFi Staking",
        "Governance",
        "Programs",
        "Tokens",
        "Links"
    ],
    "content": [
        """
        In this document, the background behind the protocol design and economic model is explained.\n
        LP Finance is a decentralized synthetic asset issuance protocol on Solana. LP Finance designed Protocol Debt Vault (PDV) and liquidity provider incentives advancing stability fees to dramatically enhance scalability of synthetic assets.
        At a high level, the protocol allows users to leverage zSOL to create strategies such as leveraged liquid staking and short selling. Liquidity providers can earn interest rates and swap fees at the same time.
        On LP Finance, users can do the following: Leverage SOL staking yields, Short-sell, Earn interest by providing liquidity
        """,
        """
        ## Deposit Collateral: Select a token and amount to deposit. Confirm the minimum amount to deposit and deposit fee. Click "Deposit" and approve transaction.
        ## Borrow zSOL: Select amount of zSOL to borrow or click "Max" to borrow maximum amount. There is no borrow fee. Click "Borrow" and approve transaction.
        ## View Account Data: After performing the prior steps, users can view the account's data on "Your Account" section. Here are definitions of the terms
        - Borrow Limit: Maximum value (USD) of zSOL that can be borrowed.
        - Liquidation Threshold: A threshold of collateral value where the account would be liquidated. If collateral value drops below "Liquidation Threshold" liquidation might occur.
        - LTV: Loan to Value ratio, which is calculated as follows.
        `LTV = borrowedValueUsd / collateralValueUsd * 100`
        ## Repay zSOL: Select amount of zSOL to repay or click "Max" to fully repay the debt. Click "Repay" and confirm transaction.
        ## Withdraw Collateral: Select a token and amount to withdraw. If there is zSOL debt remaining, it is impossible to withdraw full amount. Click "Withdraw" and confirm transaction.
        """,
        """
        Following is the default page on LP Finance, https://app.lp.finance. Users can view protocol data. Following are the definitions.
        - Stability Fee Epoch: Period left until stability fee is applied
        - TVL: Total Value Locked
        - Global LTV: Global LTV (Protocol net LTV)
        - Stability Fee: Current borrow interest (APR)
        - Max mSOL APY: Maximum APY users can earn when fully leveraging mSOL position
        ## Collateral Composition: Click the pie chart on "Protocol Overview" to view current collateral composition. 
        ## Historical Data: Click "Data" on "Protocol Overview" to view historical data.
        """,
        """
        Question: Why is "Your Account" page not updating after transaction?
        Answer: If the account info does not update, plz try refreshing the app and reconnecting wallet.
        
        Question: When is the borrow interest applied?
        Answer: On the main page, "Stability Fee Epoch" is the time left until next borrow interest is applied. Borrow interest is applied every 24h.
        """,
        """
        zSOL is a over collateralized synthetic asset, which can be minted after depositing collateral. Even if the UX is identical to lending protocols, borrowing/repaying is performed as minting/burning zSOL.
        The collateral accepted on LP Finance are as follows.
        - SOL
        - mSOL
        - USDC
        It is only possible to deposit one collateral per account. To deposit other collateral, account should fully withdraw collateral.
        Once an account borrows zSOL, stability fee would be applied which is equivalent to borrow interest on lending protocols. Stability fee is applied daily and compounding which is calculated as follows.
        `daily_stability_fee = stability_fee_apr / 365`
        Stability fee is dynamic which is decided by LP Finance DAO through governance. Unlike lending protocols, there are no "zSOL suppliers". Therefore stability fee is paid to zSOL-SOL liquidity providers to allow liquidity to linearly scale in respect of zSOL supply.
        ## Mint Halt: zSOL mint is halted if zSOL's market price is 5% below SOL price.
        """,
        """
        zSOL-SOL liquidty providers can deposit Saber zSOL-SOL LP tokens to earn stability fee for zSOL borrowers. This allows zSOL market liquidity to scale linearly in respect to zSOL supply, increasing the capacity of zSOL to be used for leveraging purposes.
        Same as zSOL stability fee epoch, the zSOL rewards are distributed every 24h and can claim by interacting with the program.
        """,
        """
        Peg-stability module (PSM) is a common function in most stablecoin issuance protocols. PSM allows 1:1 swap between a stablecoin issued by a protocol and quote token (ex. USDC).
        For zSOL PSM should allow zSOL <-> SOL at 1:1 ratio, however LP Finance uses mSOL, which is a liquid staking derivative of SOL.
        ## Pricing of mSOL & zSOL: mSOL and zSOL's market price is not correctly accounted to SOL price. Therefore the program treats the price as follows.
        - mSOL: True price (value when delayed unstake)
        - zSOL: SOL price
        This prevents oracle exploits where zSOL or mSOL price depegs, causing higher redemption of mSOL or higher issuance of zSOL, causing zSOL to be undercollateralized.
        ## Risk of using mSOL
        mSOL is not always pegged to staked SOL's value. As mSOL to be fully claimed for SOL, a full epoch has to pass, which means in a volatile market condition, people would rather sell mSOL on market rather than wait to claim.
        Here is an example of mSOL depegging during FTX collapse event. (mSOL-SOL Saber)
        In case of mSOL depeg event, zSOL would be depegged at an equivalent rate. This is not a bug but a "design". Short-sellers can take benefit by shorting zSOL instead of SOL as zSOL is likely to fall more in price where SOL price crashes.
        ## Mechanism: Unlike other PSM, LP Finance implement "Protocol Debt Vault" which is an account managed by LP Finance DAO. It acts same as other accounts, but have solely mSOL as collateral, maximum LTV of 100%, and cannot be liquidated.
        PSM interacts with the Protocol Debt Vault in the backend as follows.
        - Swap mSOL-->zSOL: Deposit mSOL as collateral & borrow zSOL
        - Swap zSOL-->mSOL: Repay zSOL & withdraw mSOL collateral
        There is a swap fee for claiming mSOL (zSOL-->mSOL), which is a revenue source for LP Finance DAO. Additionally, when PSM activities are static, Protocol Debt Vault earns staking yields which is more efficient than having SOL.
        """,
        """
        On LP Finance, there are two types of liquidation.
        - T1: Accounts with SOL or mSOL as collateral
        - T2: Accounts with USDC as collateral
        ## T1 Liquidation: T1 liquidation is likely to happen when stability fee exceeds the value accural of SOL or mSOL. When using SOL as collateral, account is in higher risk of liquidation when not managed continuously. However, mSOL has a low risk of liquidation due to staking yields.
        Additionally, as mSOL price is calculated using "True price" liquidation does not occur even if mSOL depegs.
        Following are steps of T1 liquidation.
        1. Liquidator triggers liquidation of target account
        2. IF SOL, mint mSOL via calling Marinade Finance CPI
        3. Target account's mSOL collateral transferred to Protocol Debt Vault
        4. Target account's zSOL debt transferred to Protocol Debt Vault
        5. Target account initialized
        There is near-to-zero risk of bad debt for T1 liquidation. In case liquidation occurs, Protocol Debt Vault earns liquidation fees which is normally (100 - liquidation_threshold)%.
        ## T2 Liquidation: T2 liquidation occurs when SOL price rises, which triggers LTV to increase. Following are steps of T2 liquidation.
        1. Liquidator triggers liquidation of target account
        2. Liquidator transfers mSOL to Protocol Debt Vault
        3. Liquidator redeems USDC from target account
        4. Target account's zSOL debt transferred to Protocol Debt Vault
        The amount of USDC to be claimed with mSOL is calculated as follows.
        ```
        repaid_zsol_amt = msol_transfer_amt * (msol_price / sol_price)
        repaid_ratio = repaid_zsol_amt / borrowed_zsol_amt
        usdc_claim_amt = usdc_collateral_amt * repaid_ratio
        ```
        Let's assume liquidating the following account
        - Collateral: 1000 USDC (1000 USD)
        - Debt: 32.5 zSOL (650 USD)
        If liquidator transfers 16.25 zSOL (325 USD, assume 1 mSOL= 1 SOL),
        ```
        usdc_claim_amt = 1000 * (16.25 * (20/20)) / 32.5
        >> usdc_claim_amt = 500
        In this case, liquidator was able to earn 175 USDC. The script is currently closed source, but users can interact with the UI to perform liquidation on https://app.lp.finance/liquidate.
        ```
        """,
        """
        Revenue genereated by protocol is distributed to LPFi staking pool. Users can stake LPFi to earn SOL. 
        In order to concentrate staking rewards to long-term holders, unstaking fee exists, which can be checked by clicking "Unstaking Fee" on "Staking Overview".
        Staking is possible on the website https://staking.lp.finance
        """,
        """
        LP Finance is governed by xLPFi token. Details can be found in the links below.
        - xLPFi Governance Document: https://x.lp.finance
        - Mint xLPFi: https://staking.lp.finance/xlpfi-minting
        - Realms: https://app.realms.today/dao/xLPFi
        """,
        """
        zSOL Issuance Program Address: 8Hka1oR6uoNLqjpYXXDKpF6NwiYidD6L3ncQdfu11aWw
        LP Incentives Program Address: FzTfrps1jZstduuSJ3ePsMTTR4rYX1hp73AcJ22ip4Gu
        LPFi Staking Program Address: HDXUGdC2hJNmwDY8DtfuWumnPyd4u2BvHhrvy5or8BZP
        xLPFi Minting Program Address: J8ttQ7yrZ3s1gkgjkXNCLRzhS6SyQnwLTL2FwxFgEBeN
        """,
        """
        zSOL Address: So111DzVTTNpDq81EbeyKZMi4SkhU9yekqB8xmMpqzA
        LPFi Address: LPFiNAybMobY5oHfYVdy9jPozFBGKpPiEGoobK2xCe3
        xLPFi Address: xLPFiPmWve5rUnAYcHZSZWjwgyqEdcV6dDzoBJRtNw9
        """,
        """
        LP Finance App Link: https://app.lp.finance
        LPFi Staking Link: https://staking.lp.finance
        Realms Governance Link: https://app.realms.today/dao/xLPFi
        Governance Document Link: https://x.lp.finance
        Twitter Link: https://twitter.com/LPFinance_
        Github Link: https://github.com/LP-Finance-Inc
        """
    ]
}

In [3]:
import pandas as pd
import numpy as np
df = pd.DataFrame.from_dict(data)

### Get Embedding

In [4]:
import openai
import os
from dotenv import load_dotenv

load_dotenv()
openai.api_key=os.environ.get("OPENAI_KEY")

COMPLETIONS_MODEL = "text-davinci-003"
EMBEDDING_MODEL = "text-embedding-ada-002"

In [5]:
def get_embedding(text: str, model: str=EMBEDDING_MODEL) -> list[float]:
    result = openai.Embedding.create(
      model=model,
      input=text
    )
    return result["data"][0]["embedding"]

def compute_doc_embeddings(df: pd.DataFrame) -> dict[tuple[str, str], list[float]]:
    """
    Create an embedding for each row in the dataframe using the OpenAI Embeddings API.
    
    Return a dictionary that maps between each embedding vector and the index of the row that it corresponds to.
    """
    return {
        idx: get_embedding(r.content) for idx, r in df.iterrows()
    }

In [6]:
computed_embeddings = compute_doc_embeddings(df)
computed_embeddings

{0: [0.0022572746966034174,
  0.02176707237958908,
  0.0026687930803745985,
  -0.023975729942321777,
  -0.002543774899095297,
  0.028309695422649384,
  -0.009209680370986462,
  -0.00971669889986515,
  0.0018023469019681215,
  -0.0556192472577095,
  0.0235728919506073,
  0.004375639837235212,
  -0.021517036482691765,
  0.01911390759050846,
  -0.005903641227632761,
  0.0067544602788984776,
  0.005740422755479813,
  -0.041617199778556824,
  0.0073552425019443035,
  -0.007153823971748352,
  0.002364929299801588,
  -0.005445240996778011,
  -0.01771092414855957,
  -0.0038061123341321945,
  0.00486529478803277,
  0.011300263926386833,
  0.016113467514514923,
  -0.020489107817411423,
  0.0031358753331005573,
  0.014613248407840729,
  -0.010925209149718285,
  -0.014724375680088997,
  -0.016321832314133644,
  -0.008779061958193779,
  0.0027469296474009752,
  -0.038061123341321945,
  -0.015016085468232632,
  -0.02155870944261551,
  0.01022371742874384,
  -0.0002851979515980929,
  0.01500219479203

In [7]:
embeddings_array = np.array([computed_embeddings[i] for i in range(len(computed_embeddings.keys()))])

embeddings_array_T = embeddings_array.transpose().tolist()
len(embeddings_array_T)

1536

In [10]:
for i in range(len(embeddings_array_T)):
    df[i] = embeddings_array_T[i]
    
df

Unnamed: 0,title,heading,content,0,1,2,3,4,5,6,...,1526,1527,1528,1529,1530,1531,1532,1533,1534,1535
0,LP Finance Protocol,LP Finance Protocol,"\n In this document, the background beh...",0.002257,0.021767,0.002669,-0.023976,-0.002544,0.02831,-0.00921,...,-0.010189,-0.007793,0.029532,-0.017808,-0.02956,0.015391,0.020725,0.008647,0.01089,-0.032505
1,LP Finance Protocol,Quick User Guide,\n ## Deposit Collateral: Select a toke...,-0.005194,0.014542,0.013266,-0.018567,-0.042635,0.016202,-0.025179,...,-0.010416,-0.002054,0.043347,-0.016586,-0.041837,-0.004029,0.022243,0.011264,0.015888,-0.035823
2,LP Finance Protocol,View Protocol Data,\n Following is the default page on LP ...,0.007837,0.029183,0.014268,-0.044984,-0.031123,0.00131,-0.011,...,-0.025064,-0.021634,0.021044,-0.004375,-0.02737,0.0079,0.012448,0.0167,-0.005918,-0.042453
3,LP Finance Protocol,User FAQ,"\n Question: Why is ""Your Account"" page...",-0.008645,0.014053,0.022719,-0.021894,-0.032189,-0.002704,-0.015943,...,-0.009738,-0.010686,0.010995,0.006528,-0.006212,0.000187,0.015201,-0.000595,0.008954,-0.032244
4,Architecture,Minting/Borrowing zSOL,\n zSOL is a over collateralized synthe...,0.002925,0.020394,0.012717,-0.011294,-0.025754,0.029778,-0.035647,...,0.01141,-0.012369,0.016923,-0.010749,-0.026118,0.019871,0.007742,0.013843,0.016414,-0.044449
5,Architecture,zSOL-SOL Liquidity Providers,\n zSOL-SOL liquidty providers can depo...,0.018179,0.011689,0.008429,-0.018207,-0.006914,0.009826,-0.026296,...,0.002985,-0.01444,0.039444,-0.000311,-0.006942,0.01811,0.00435,-0.006803,0.012036,-0.033412
6,Architecture,Peg-stability Module,\n Peg-stability module (PSM) is a comm...,0.011689,0.013543,-0.008909,-0.010381,-0.033882,0.029887,-0.029025,...,0.019297,-0.034629,0.010173,0.001764,-0.023809,0.017099,-0.003537,0.016998,0.008097,-0.041641
7,Architecture,Liquidation,"\n On LP Finance, there are two types o...",0.010173,0.00777,0.00836,-0.009353,-0.004655,0.00531,-0.042706,...,0.024461,-0.017454,0.021814,-0.007141,-0.016346,-0.010446,0.014375,0.002975,0.009993,-0.028087
8,LPFi Staking,LPFi Staking,\n Revenue genereated by protocol is di...,0.003489,0.003156,-0.006718,-0.049445,-0.026995,0.0027,-0.025408,...,0.003154,-0.005603,0.030974,-0.006792,-0.017127,0.014143,0.017759,-0.001879,0.019641,-0.028097
9,Governance,Governance,\n LP Finance is governed by xLPFi toke...,0.023129,0.022068,0.003066,-0.031626,-0.024202,0.007101,-0.038579,...,-0.008766,0.003514,0.026807,-0.012276,-0.01643,0.013571,0.019652,-0.005725,0.015585,-0.025706


In [11]:
df.to_csv("df.csv")

## Find most similar document embedings

In [132]:
def load_embeddings(fname: str) -> dict[tuple[str, str], list[float]]:
    """
    Read the document embeddings and their keys from a CSV.
    
    fname is the path to a CSV with exactly these named columns: 
        "title", "heading", "0", "1", ... up to the length of the embedding vectors.
    """
    
    df = pd.read_csv(fname, header=0)
    max_dim = max([int(c) for c in df.columns if c != "title" and c != "heading"])
    return {
           (r.title, r.heading): [r[str(i)] for i in range(max_dim + 1)] for _, r in df.iterrows()
    }

In [133]:
document_embeddings = load_embeddings("data.csv")

In [134]:
# An example embedding:
example_entry = list(document_embeddings.items())[0]
print(f"{example_entry[0]} : {example_entry[1][:5]}... ({len(example_entry[1])} entries)")

('LP Finance Protocol', 'LP Finance Protocol') : [0.002262801, 0.021795081, 0.002703562, -0.02397459, -0.002594239]... (1536 entries)


In [135]:
def vector_similarity(x: list[float], y: list[float]) -> float:
    """
    Returns the similarity between two vectors.
    
    Because OpenAI Embeddings are normalized to length 1, the cosine similarity is the same as the dot product.
    """
    return np.dot(np.array(x), np.array(y))

def order_document_sections_by_query_similarity(query: str, contexts: dict[(str, str), np.array]) -> list[(float, (str, str))]:
    """
    Find the query embedding for the supplied query, and compare it against all of the pre-calculated document embeddings
    to find the most relevant sections. 
    
    Return the list of document sections, sorted by relevance in descending order.
    """
    query_embedding = get_embedding(query)
    
    document_similarities = sorted([
        (vector_similarity(query_embedding, doc_embedding), doc_index) for doc_index, doc_embedding in contexts.items()
    ], reverse=True)
    
    return document_similarities

In [136]:
order_document_sections_by_query_similarity("What is the condition for getting liquidated?", document_embeddings)[:5]

[(0.7957855478665574, ('Architecture', 'Liquidation')),
 (0.749251445285552, ('LP Finance Protocol', 'Quick User Guide')),
 (0.7399240594935899, ('LP Finance Protocol', 'LP Finance Protocol')),
 (0.7366096712421534, ('LPFi Staking', 'LPFi Staking')),
 (0.7353469899147111, ('Architecture', 'zSOL-SOL Liquidity Providers'))]

## Add most relevant section to prompt

In [137]:
import tiktoken
MAX_SECTION_LEN = 500
SEPARATOR = "\n* "
ENCODING = "gpt2"  # encoding for text-davinci-003

encoding = tiktoken.get_encoding(ENCODING)
separator_len = len(encoding.encode(SEPARATOR))

f"Context separator contains {separator_len} tokens"

'Context separator contains 3 tokens'

In [159]:
def construct_prompt(question: str, context_embeddings: dict, df: pd.DataFrame) -> str:
    """
    Fetch relevant 
    """
    most_relevant_document_sections = order_document_sections_by_query_similarity(question, context_embeddings)
    
    chosen_sections = []
    chosen_sections_indexes = []
    
    # change most_relevant_document_sections[:n] to select n docs
    for _, section_index in most_relevant_document_sections[:1]:

        # Add contexts until we run out of space.        
#         document_section = df.loc[section_index]
        idx = df.heading.tolist().index(section_index[1])
        content_section = df.content[idx]
        eligible_uri = uri[idx]
            
        chosen_sections.append(SEPARATOR + content_section.replace("\n", " "))
        chosen_sections_indexes.append(str(section_index))
        
            
    # Useful diagnostic information
    print(f"Selected {len(chosen_sections)} document sections:")
    print("\n".join(chosen_sections_indexes))
    
    header = """Answer the question as truthfully as possible using the provided context, and if the answer is not contained within the text below, say "I do not have a clear response for it."\n\nContext:\n"""
    
    return header + "".join(chosen_sections) + "\n\n Q: " + question + "\n A:", eligible_uri

In [160]:
prompt = construct_prompt(
    "How can I borrow zSOL?",
    document_embeddings,
    df
)
print("===\n", prompt)

Selected 1 document sections:
('Architecture', 'Minting/Borrowing zSOL')
===
 ('Answer the question as truthfully as possible using the provided context, and if the answer is not contained within the text below, say "I do not have a clear response for it."\n\nContext:\n\n*          zSOL is a over collateralized synthetic asset, which can be minted after depositing collateral. Even if the UX is identical to lending protocols, borrowing/repaying is performed as minting/burning zSOL.         The collateral accepted on LP Finance are as follows.         - SOL         - mSOL         - USDC         It is only possible to deposit one collateral per account. To deposit other collateral, account should fully withdraw collateral.         Once an account borrows zSOL, stability fee would be applied which is equivalent to borrow interest on lending protocols. Stability fee is applied daily and compounding which is calculated as follows.         `daily_stability_fee = stability_fee_apr / 365`      

## Answer question based on context

In [186]:
COMPLETIONS_API_PARAMS = {
    # We use temperature of 0.0 because it gives the most predictable, factual answer.
    "temperature": 0.0,
    "max_tokens": 150,
    "model": COMPLETIONS_MODEL,
}

In [187]:
def answer_query_with_context(
    query: str,
    df: pd.DataFrame,
    document_embeddings: dict[(str, str), np.array],
    show_prompt: bool = False
) -> str:
    prompt, eligible_uri = construct_prompt(
        query,
        document_embeddings,
        df
    )
    
    if show_prompt:
        print(prompt)

    response = openai.Completion.create(
                prompt=prompt,
                **COMPLETIONS_API_PARAMS
            )

    return response["choices"][0]["text"].strip(" \n")  + " Reference: " + eligible_uri

In [196]:
answer_query_with_context("How do I withdraw collateral?", df, document_embeddings)

Selected 1 document sections:
('LP Finance Protocol', 'Quick User Guide')


'Select a token and amount to withdraw. If there is zSOL debt remaining, it is impossible to withdraw full amount. Click "Withdraw" and confirm transaction. Reference: https://docs.lp.finance/lp-finance-protocol/quick-user-guide'