## Introduction

This study analyzes how users interact with several lending protocols, with a focus on their borrowing and lending patterns. To do so, it loads loan data from different sources and compares user behaviour across the protocols.

## Objective
The primary objective of this study is to analyze user behavior across multiple lending protocols. We will achieve this by examining the data on loans, including user information, protocol details, collateral, and debt amounts. Our analysis will focus on answering key questions related to user engagement with different protocols, such as the number of users providing liquidity or borrowing on one or multiple protocols and the distribution of staked/borrowed capital across these protocols.

## Methodology

To conduct this analysis, we applied the following structured approach:
1. Data Loading: Create a data-loading function that allows easy switching between Google Cloud Storage and a SQL database, ensuring flexibility in data sourcing.
2. Data Visualization: Visualize user behaviour across the lending protocols to answer:
   - The number of users providing liquidity or borrowing on just one protocol versus multiple protocols.
   - The distribution of staked and borrowed capital across different lending protocols.
3. Venn Diagram Creation: Plot the overlap of user participation across different lending protocols, giving a clear visual representation of multi-protocol engagement.
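
The switchable loader described in step 1 could be sketched as follows. This is a minimal sketch under stated assumptions, not the notebook's actual implementation: `load_loans`, `read_from_gcs`, and `read_from_postgres` are hypothetical names, the URL template is inferred from the Parquet URLs used in the loading section of this notebook, and the connection string is a placeholder the caller must supply.

```python
import pandas as pd

# Assumed URL pattern, inferred from the Parquet URLs used in this notebook.
GCS_URL_TEMPLATE = (
    "https://storage.googleapis.com/derisk-persistent-state/{protocol}_data/loans.parquet"
)


def read_from_gcs(protocol):
    """Hypothetical reader: fetch one protocol's loans from its public Parquet URL."""
    return pd.read_parquet(GCS_URL_TEMPLATE.format(protocol=protocol))


def read_from_postgres(protocol, connection_string):
    """Hypothetical reader: read one protocol's table from a SQL database."""
    from sqlalchemy import create_engine  # imported lazily; only needed for this source

    engine = create_engine(connection_string)
    return pd.read_sql_table(protocol, con=engine)


def load_loans(protocols, read_one):
    """Load every protocol with the given reader and tag each row with its protocol."""
    frames = []
    for protocol in protocols:
        df = read_one(protocol).copy()
        df["Protocol"] = protocol
        frames.append(df)
    return pd.concat(frames, ignore_index=True)
```

Switching sources then only means passing a different reader, for example `load_loans(protocols, read_from_gcs)` or `load_loans(protocols, lambda p: read_from_postgres(p, db_connection_string))`.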


## Expected Outcomes

This study will not only shed light on current user engagement patterns but also pave the way for future research and development in decentralized lending and borrowing platforms.

### Importing Libraries

In [1]:
# importing necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib_venn import venn3
from sqlalchemy import create_engine
import gcsfs

### Loading the Data

#### From PostgreSQL

In [2]:
# from sqlalchemy import create_engine

# # List of protocols (table names in the PostgreSQL database)
# protocols = ["zklend", "nostra_alpha", "nostra_mainnet", "hashstack_v0", "hashstack_v1"]

# # Database connection string
# db_connection_string = 'postgresql://username:password@hostname:port/database'

# # Load data from PostgreSQL
# postgres_df_list = []
# engine = create_engine(db_connection_string)

# for protocol in protocols:
#     df = pd.read_sql_table(protocol, con=engine)
#     df['Protocol'] = protocol
#     postgres_df_list.append(df)

# # Combine all PostgreSQL DataFrames into one
# df_loans_postgres = pd.concat(postgres_df_list, ignore_index=True)

#### From GCS

In [3]:
# Dictionary of Parquet URLs
parquet_urls = {
    "zklend": "https://storage.googleapis.com/derisk-persistent-state/zklend_data/loans.parquet",
    "nostra_alpha": "https://storage.googleapis.com/derisk-persistent-state/nostra_alpha_data/loans.parquet",
    "nostra_mainnet": "https://storage.googleapis.com/derisk-persistent-state/nostra_mainnet_data/loans.parquet",
    "hashstack_v0": "https://storage.googleapis.com/derisk-persistent-state/hashstack_v0_data/loans.parquet",
    "hashstack_v1": "https://storage.googleapis.com/derisk-persistent-state/hashstack_v1_data/loans.parquet",
}

# Load data from GCS
gcs_df_list = []
# Create the file system client once and reuse it for every protocol
fs = gcsfs.GCSFileSystem()
for protocol, url in parquet_urls.items():
    # Turn the public HTTPS URL into a "bucket/path" string for gcsfs
    gcs_path = url.replace('https://storage.googleapis.com/', '')
    with fs.open(gcs_path, 'rb') as f:
        df = pd.read_parquet(f, engine='pyarrow')
        df['Protocol'] = protocol
        gcs_df_list.append(df)

# Combine all GCS DataFrames into one
df_loans = pd.concat(gcs_df_list, ignore_index=True)

In [None]:
df_loans.head()

### Determine User Activity
#### Users Providing Liquidity and their Protocols

In [None]:
# Number of loan records per protocol
df_loans['Protocol'].value_counts()

##### Subset the DataFrame for Users Who Provide Liquidity

In [10]:
from collections import defaultdict, Counter

liquidity_data = df_loans[df_loans['Collateral (USD)'] > 0]

# Initialize a dictionary to store users and their associated protocols for liquidity
user_protocols_liquidity = defaultdict(set)

# Populate the dictionary
for _, row in liquidity_data.iterrows():
    user = row['User']
    protocol = row['Protocol']
    user_protocols_liquidity[user].add(protocol)

# Count the number of protocols each user lends on
user_protocol_counts_liquidity = Counter([len(protocols) for protocols in user_protocols_liquidity.values()])

# Convert the counter to a DataFrame for better readability
protocol_count_df_liquidity = pd.DataFrame.from_dict(user_protocol_counts_liquidity, orient='index').reset_index()
protocol_count_df_liquidity.columns = ['Number of Protocols', 'Number of Users']

# Sort the DataFrame by the number of protocols
protocol_count_df_liquidity = protocol_count_df_liquidity.sort_values(by='Number of Protocols')
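
The row-by-row loop above can also be expressed as a pandas group-by. The following is a minimal sketch on made-up data, assuming the same `User`, `Protocol`, and `Collateral (USD)` columns as `df_loans`; `count_users_by_protocol_count` is a hypothetical helper name, not part of the original notebook.

```python
import pandas as pd


def count_users_by_protocol_count(df, value_column):
    """Count how many users are active on 1, 2, ... protocols for the given column."""
    active = df[df[value_column] > 0]
    # Number of distinct protocols per user
    protocols_per_user = active.groupby("User")["Protocol"].nunique()
    # How many users fall into each protocol count, sorted by protocol count
    counts = protocols_per_user.value_counts().sort_index()
    return counts.rename_axis("Number of Protocols").reset_index(name="Number of Users")


# Made-up example data (not taken from the real loans dataset)
example = pd.DataFrame({
    "User": ["a", "a", "b", "c", "d"],
    "Protocol": ["zklend", "nostra_mainnet", "zklend", "nostra_alpha", "hashstack_v1"],
    "Collateral (USD)": [100.0, 50.0, 10.0, 0.0, 5.0],
})
result = count_users_by_protocol_count(example, "Collateral (USD)")
print(result)
```

In this toy data, user `c` is dropped because their collateral is zero, users `b` and `d` are each active on one protocol, and user `a` is active on two. The same helper should work for borrowers by passing `'Debt (USD)'` as the column.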

##### Users Providing Liquidity Across the Top 3 Protocols

In [11]:
# Helper functions
# Function to get unique users per protocol (value_column is currently unused)
def get_unique_users(df, value_column):
    protocol_users = defaultdict(set)
    for protocol in df['Protocol'].unique():
        users = set(df[df['Protocol'] == protocol]['User'])
        protocol_users[protocol].update(users)
    return protocol_users
    
# Helper function to plot Venn diagram
def plot_venn_diagram(user_sets, title, set_labels=('zklend', 'nostra_mainnet', 'nostra_alpha')):
    plt.figure(figsize=(10, 8))
    # set_labels must be listed in the same order as user_sets
    venn3(subsets=(user_sets[0], user_sets[1], user_sets[2]),
          set_labels=set_labels)
    plt.title(title)
    plt.show()
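
The numbers shown in a three-set Venn diagram are sizes of set intersections and differences, so they can be checked without plotting. The sketch below uses plain Python sets on made-up user identifiers; `overlap_counts` and its region names are hypothetical and only meant to illustrate the idea.

```python
def overlap_counts(a, b, c):
    """Return the size of each of the seven regions of a three-set Venn diagram."""
    return {
        "only_a": len(a - b - c),
        "only_b": len(b - a - c),
        "only_c": len(c - a - b),
        "a_and_b_only": len((a & b) - c),
        "a_and_c_only": len((a & c) - b),
        "b_and_c_only": len((b & c) - a),
        "all_three": len(a & b & c),
    }


# Made-up user sets (not real addresses)
zklend_users = {"u1", "u2", "u3", "u4"}
nostra_mainnet_users = {"u3", "u4", "u5"}
nostra_alpha_users = {"u4", "u6"}

regions = overlap_counts(zklend_users, nostra_mainnet_users, nostra_alpha_users)
print(regions)
```

The seven region sizes always add up to the number of distinct users across the three sets, which gives a quick sanity check for the plotted diagram.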

In [None]:
# Get unique users providing liquidity
liquidity_df = df_loans[df_loans['Collateral (USD)'] > 0]
liquidity_protocol_users = get_unique_users(liquidity_df, 'Collateral (USD)')


# Prepare sets for Venn diagrams (top 3 protocols by user count)
top_protocols = ['zklend', 'nostra_mainnet', 'nostra_alpha']
liquidity_user_sets = [liquidity_protocol_users[protocol] for protocol in top_protocols]


# Plot Venn diagrams
plot_venn_diagram(liquidity_user_sets, 'Users Providing Liquidity Across Top 3 Protocols')
# plot_venn_diagram(debt_user_sets, 'Users Borrowing Across Top 3 Protocols')
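
The list of top protocols is hardcoded above. If it should follow the data instead, one option is to rank protocols by their number of distinct users. This is a minimal sketch on made-up data, assuming the `User` and `Protocol` columns of `df_loans`; `top_protocols_by_users` is a hypothetical helper name.

```python
import pandas as pd


def top_protocols_by_users(df, n=3):
    """Return the n protocols with the most distinct users, largest first."""
    users_per_protocol = df.groupby("Protocol")["User"].nunique()
    return users_per_protocol.nlargest(n).index.tolist()


# Made-up example data (not taken from the real loans dataset)
example = pd.DataFrame({
    "User": ["a", "b", "c", "d", "a", "b", "c", "a", "b", "a"],
    "Protocol": [
        "zklend", "zklend", "zklend", "zklend",
        "nostra_mainnet", "nostra_mainnet", "nostra_mainnet",
        "nostra_alpha", "nostra_alpha",
        "hashstack_v1",
    ],
})
top = top_protocols_by_users(example)
print(top)
```

With real data, the result could be passed on as `top_protocols`, keeping the Venn diagram labels in the same order as the user sets.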

#### Users Borrowing Behavior and their Protocols

In [15]:
# Subset the DataFrame for users who have debt
debt_data = df_loans[df_loans['Debt (USD)'] > 0]

# Initialize a dictionary to store users and their associated protocols for debt
user_protocols_debt = defaultdict(set)

# Populate the dictionary
for _, row in debt_data.iterrows():
    user = row['User']
    protocol = row['Protocol']
    user_protocols_debt[user].add(protocol)

# Count the number of protocols each user borrows on
user_protocol_counts_debt = Counter([len(protocols) for protocols in user_protocols_debt.values()])

# Convert the counter to a DataFrame for better readability
protocol_count_df_debt = pd.DataFrame.from_dict(user_protocol_counts_debt, orient='index').reset_index()
protocol_count_df_debt.columns = ['Number of Protocols', 'Number of Users']

# Sort the DataFrame by the number of protocols
protocol_count_df_debt = protocol_count_df_debt.sort_values(by='Number of Protocols')

# Print the result for debt
# print("Users borrowing:")
# print(protocol_count_df_debt)

##### Users Borrowing Across the Top 3 Protocols

In [None]:
# Get unique users having debt
debt_df = df_loans[df_loans['Debt (USD)'] > 0]
debt_protocol_users = get_unique_users(debt_df, 'Debt (USD)')


# Prepare sets for Venn diagrams (top 3 protocols by user count)
top_protocols = ['zklend', 'nostra_mainnet', 'nostra_alpha']
debt_user_sets = [debt_protocol_users[protocol] for protocol in top_protocols]

# Plot Venn diagrams
plot_venn_diagram(debt_user_sets, 'Users Borrowing Across Top 3 Protocols')

#### Distribution of Staked/Borrowed Capital Across Protocols

In [None]:
import seaborn as sns

# Function to calculate total capital per protocol
def calculate_capital(df, column_name):
    capital_per_protocol = df.groupby('Protocol')[column_name].sum()
    return capital_per_protocol

# Function to plot a bar chart of total capital per protocol
def plot_capital(capital, title):
    plt.figure(figsize=(10, 6))
    sns.barplot(x=capital.index, y=capital.values)
    plt.xlabel('Protocol')
    plt.ylabel('Total Capital (USD)')
    plt.title(title)
    plt.xticks(rotation=45)
    plt.show()

# Calculate total staked capital per protocol
staked_capital = calculate_capital(liquidity_df, 'Collateral (USD)')
plot_capital(staked_capital, 'Total Staked Capital Across Protocols')
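
Absolute totals can hide how concentrated the capital is, so it may help to express each protocol's total as a percentage share as well. The following is a minimal sketch on made-up numbers, assuming the same `Protocol` and `Collateral (USD)` columns; `capital_share` is a hypothetical helper name.

```python
import pandas as pd


def capital_share(df, column_name):
    """Return each protocol's percentage share of the summed column."""
    totals = df.groupby("Protocol")[column_name].sum()
    return totals / totals.sum() * 100


# Made-up example data (not taken from the real loans dataset)
example = pd.DataFrame({
    "Protocol": ["zklend", "zklend", "nostra_mainnet"],
    "Collateral (USD)": [60.0, 15.0, 25.0],
})
shares = capital_share(example, "Collateral (USD)")
print(shares)
```

In this toy example, zklend holds 75% of the collateral and nostra_mainnet the remaining 25%.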




#### Total Capital Borrowed per Protocol

In [None]:
# Calculate total borrowed capital per protocol
borrowed_capital = calculate_capital(debt_df, 'Debt (USD)')
plot_capital(borrowed_capital, 'Total Borrowed Capital Across Protocols')

#### Amounts Staked on a Per-Token Basis Across the Protocols

In [19]:
import re
# List of tokens
tokens = ["ETH", "wBTC", "USDC", "DAI", "USDT", "wstETH", "LORDS", "STRK", "UNO", "ZEND"]

def parse_token_amounts(column, protocol_column, tokens):
    token_amounts = defaultdict(lambda: defaultdict(float))
    for entry, protocol in zip(column, protocol_column):
        for token in tokens:
            # \b stops a token such as "ETH" from also matching inside "wstETH"
            match = re.search(rf'\b{re.escape(token)}: ([0-9.]+)', entry)
            if match:
                token_amounts[protocol][token] += float(match.group(1))
    return token_amounts

# Extract token amounts for collateral and debt
collateral_amounts = parse_token_amounts(df_loans['Collateral'], df_loans['Protocol'], tokens)
debt_amounts = parse_token_amounts(df_loans['Debt'], df_loans['Protocol'], tokens)
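
To illustrate the per-token parsing on a single entry, here is a minimal sketch. It assumes that each `Collateral` or `Debt` entry is a string of `TOKEN: amount` pairs such as `"wstETH: 2.0, ETH: 1.5"`; that format is an assumption based on the regular expression used above, and the example string is made up. `parse_entry` is a hypothetical helper, and unlike the function above it sums every occurrence of a token rather than only the first.

```python
import re
from collections import defaultdict


def parse_entry(entry, tokens):
    """Sum the amounts found for each token in one 'TOKEN: amount' string."""
    amounts = defaultdict(float)
    for token in tokens:
        # \b keeps "ETH" from matching the tail of "wstETH"
        pattern = rf"\b{re.escape(token)}: ([0-9]+(?:\.[0-9]+)?)"
        for value in re.findall(pattern, entry):
            amounts[token] += float(value)
    return dict(amounts)


# Made-up entry (assumed format, not taken from the real dataset)
entry = "wstETH: 2.0, ETH: 1.5, USDC: 100"
parsed = parse_entry(entry, ["ETH", "wstETH", "USDC", "DAI"])
print(parsed)
```

Without the word boundary, the `ETH` pattern would also match inside `wstETH: 2.0`, and the wstETH amount would be counted as ETH.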

In [20]:
# Aggregating the data
# Convert the aggregated data to DataFrame for better readability
collateral_list = [(protocol, token, amount) for protocol, tokens in collateral_amounts.items() for token, amount in tokens.items()]
collateral_df = pd.DataFrame(collateral_list, columns=['Protocol', 'Token', 'Total Collateral (USD)'])

debt_list = [(protocol, token, amount) for protocol, tokens in debt_amounts.items() for token, amount in tokens.items()]
debt_df = pd.DataFrame(debt_list, columns=['Protocol', 'Token', 'Total Debt (USD)'])

In [None]:
collateral_df.groupby(['Protocol','Token'])['Total Collateral (USD)'].sum()
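
The grouped totals can be easier to compare as a protocol-by-token table. The sketch below pivots a made-up frame with the same columns as `collateral_df`; the numbers are illustrative only.

```python
import pandas as pd

# Made-up example data with the same columns as collateral_df
example = pd.DataFrame({
    "Protocol": ["zklend", "zklend", "nostra_mainnet"],
    "Token": ["ETH", "USDC", "ETH"],
    "Total Collateral (USD)": [10.0, 200.0, 4.0],
})

# One row per protocol, one column per token; missing combinations become 0
table = example.pivot_table(
    index="Protocol",
    columns="Token",
    values="Total Collateral (USD)",
    aggfunc="sum",
    fill_value=0,
)
print(table)
```

Each cell holds the total for one protocol and token, so gaps such as a token that is absent from a protocol show up as zeros.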

#### Data Visualization

In [None]:
# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Plotting collateral amounts
plt.figure(figsize=(12, 8))
sns.barplot(data=collateral_df, x='Protocol', y='Total Collateral (USD)', hue='Token')
plt.xlabel('Protocol')
plt.ylabel('Total Collateral (USD)')
plt.title('Total Collateral per Token and Protocol')
plt.xticks(rotation=45)
plt.legend(title='Token')
plt.show()

# Plotting debt amounts
plt.figure(figsize=(12, 8))
sns.barplot(data=debt_df, x='Protocol', y='Total Debt (USD)', hue='Token')
plt.xlabel('Protocol')
plt.ylabel('Total Debt (USD)')
plt.title('Total Debt per Token and Protocol')
plt.xticks(rotation=45)
plt.legend(title='Token')
plt.show()
