# Get Repays

In this notebook, we compile the data for repay transactions from the API.

In [1]:
#import packages
import requests
import json
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from datetime import datetime
import numpy as np
import matplotlib.dates as md
import math
import time


## Fetch Repay Transaction Data

First, if this data was previously compiled, we load the existing data and record its latest timestamp, so that only newer transactions are fetched.

In [2]:
start_time = 0
try:
    old_data = pd.read_csv('repays.csv')
    exists = True
    start_time = old_data['timestamp'].max()
    old_data.info()
except (FileNotFoundError, pd.errors.EmptyDataError):
    #no usable file yet, so start from scratch
    old_data = pd.DataFrame()
    exists = False

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 86575 entries, 0 to 86574
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   amount           86575 non-null  float64
 1   onBehalfOf       86575 non-null  object 
 2   pool             86575 non-null  object 
 3   reserve          86575 non-null  object 
 4   timestamp        86575 non-null  int64  
 5   user             86575 non-null  object 
 6   type             86575 non-null  object 
 7   reservePriceETH  86575 non-null  float64
 8   reservePriceUSD  86575 non-null  float64
 9   amountETH        86575 non-null  float64
 10  amountUSD        86575 non-null  float64
dtypes: float64(5), int64(1), object(5)
memory usage: 7.3+ MB


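Here, we write a query to fetch repay transactions newer than the last saved timestamp. We request 1000 records at a time, ordered by id, and ask each time for ids greater than the last id received, until no records remain.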

In [3]:
lastId='""'
repay_data = []
url = 'https://api.thegraph.com/subgraphs/name/aave/protocol-v2'
#loop until no more data left
while True:
    #set query: next 1000 repays with id greater than lastId and newer than start_time
    query="""
    {
        repays (first: 1000 orderBy: id where:{id_gt:"""+lastId+""",timestamp_gt:"""+str(start_time)+"""}) {
        id,
        user{
            id
        }
        onBehalfOf{
            id
        }
        pool{
            id
        }
        amount,
        reserve {
          id,
          symbol
        },
        timestamp
        }
    }
    """
    try:
        #make request
        request = requests.post(url,json={'query':query})
        batch = request.json()['data']['repays']
    except (requests.RequestException, ValueError, KeyError, TypeError):
        #stop on a failed request or a malformed response
        break
    #exit when no more data left to get
    if not batch:
        break
    #store data and remember the last id as the cursor for the next request
    repay_data.extend(batch)
    lastId = "\""+batch[-1]['id']+"\""

#create repays data frame
df_repays = pd.DataFrame(repay_data)
df_repays['type']='repay'
df_repays.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3271 entries, 0 to 3270
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   id          3271 non-null   object
 1   user        3271 non-null   object
 2   onBehalfOf  3271 non-null   object
 3   pool        3271 non-null   object
 4   amount      3271 non-null   object
 5   reserve     3271 non-null   object
 6   timestamp   3271 non-null   int64 
 7   type        3271 non-null   object
dtypes: int64(1), object(7)
memory usage: 204.6+ KB
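
The paging pattern used in the fetch loop can be shown in isolation. The following is a minimal sketch, not the notebook's actual code: `fetch_all`, `fetch_page`, and `fake_fetch_page` are hypothetical names, and the fake page function stands in for the real `requests.post` call so the example runs without network access.

```python
def fetch_all(fetch_page, page_size=1000):
    """Collect every record by repeatedly asking for ids greater than the last one seen.

    fetch_page is a hypothetical callable (last_id, page_size) -> list of dicts,
    each with an 'id' key and sorted by id; in the notebook it would wrap requests.post.
    """
    records = []
    last_id = ""
    while True:
        page = fetch_page(last_id, page_size)
        if not page:
            # an empty page means there is nothing left to fetch
            break
        records.extend(page)
        # the last id of this page becomes the cursor for the next request
        last_id = page[-1]["id"]
    return records


# stand-in for the API: 2500 fake records with sortable string ids
_fake_rows = [{"id": f"{n:05d}"} for n in range(2500)]

def fake_fetch_page(last_id, page_size):
    newer = [row for row in _fake_rows if row["id"] > last_id]
    return newer[:page_size]

all_rows = fetch_all(fake_fetch_page)
print(len(all_rows))
```

Stopping on an empty page, rather than on an exception, keeps a genuine request failure distinguishable from the normal end of the data.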


### Re-Format Repay Transaction Data

Some of the values in the dataset are dictionaries, so we must re-format the data to get the wanted field from those dictionaries.

In [4]:
#get id's
def getUser(row):
    return row['user']['id']
df_repays['user']=df_repays.apply(lambda x: getUser(x), axis=1)

def getOnBehalfOf(row):
    if not isinstance(row['onBehalfOf'],float):
        return row['onBehalfOf']['id']
    else:
        return np.nan
df_repays['onBehalfOf']=df_repays.apply(lambda x: getOnBehalfOf(x), axis=1)

def getPool(row):
    return row['pool']['id']
df_repays['pool']=df_repays.apply(lambda x: getPool(x), axis=1)

#get symbols
def getReserve(row):
    if not isinstance(row['reserve'],float):
        return row['reserve']['symbol']
    else:
        return np.nan
df_repays['reserve']=df_repays.apply(lambda x: getReserve(x), axis=1)
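
As an illustration of the same re-formatting step, the sketch below uses one small helper for every nested column instead of a separate function per column. The helper name `pick` and the sample rows are made up for this example; only the shape of the data follows the API response described above.

```python
import numpy as np
import pandas as pd

def pick(value, key):
    """Return value[key] when value is a dict, otherwise NaN."""
    return value[key] if isinstance(value, dict) else np.nan

# made-up rows shaped like the API response
sample = pd.DataFrame({
    "user": [{"id": "0xaaa"}, {"id": "0xbbb"}],
    "reserve": [{"id": "0x1", "symbol": "DAI"}, np.nan],
})

flat = sample.copy()
flat["user"] = sample["user"].map(lambda v: pick(v, "id"))
flat["reserve"] = sample["reserve"].map(lambda v: pick(v, "symbol"))
print(flat)
```

Checking for a dict directly, rather than checking that the value is not a float, also covers a missing value that arrives as `None`.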

### Fetch Price Data

The API does not list the price of the asset for each transaction, so we look up the most recent recorded price of the asset at or before the time of the transaction. Prices are reported in Ether, not USD, so we also fetch the price of Tether (USDT) at the same time. Tether is a stablecoin, so its price should always be close to one dollar. Dividing the asset price by the price of Tether therefore gives an approximate price of the asset in USD.

In [5]:
pricesSym=[]
pricesUSDT=[]
i=0
#get prices for each asset at time of transaction, and price for USDT at time of transaction
def getPrice(row, sym):
    global i
    
    #get symbol and time
    symbol = row[sym]
    timestamp = row['timestamp']
    
    #there doesn't seem to be price data for ammWETH, so instead we will just substitute WETH as they have the same price
    if symbol =='AmmWETH':
        symbol='WETH'
    
    #get query
    query="""
    {
    reserves(where: { symbol_in:["USDT",\""""+symbol+"""\"] }){
        symbol,
        price{
            priceInEth,
            priceHistory(where:{timestamp_lte: """+str(timestamp)+"""} orderBy: timestamp orderDirection: desc first:1){
                price,
                timestamp
              }
            }
          }
        }    
    """
    #keep trying request until it is successful
    while(True):
        try:
            #get json
            url = 'https://api.thegraph.com/subgraphs/name/aave/protocol-v2'
            request = requests.post(url,json={'query':query})
            req_json = request.json()
            break
        except:
            #if request unsuccessful, try again in 10 seconds
            print('stalling')
            time.sleep(10)
 
    try:
        #if only data for 1 asset...
        if len(req_json['data']['reserves'])<2:
            #if USDT, add data
            if symbol=="USDT":
                pricesSym.append(req_json['data']['reserves'][0]['price']['priceHistory'][0]['price'])
                pricesUSDT.append(req_json['data']['reserves'][0]['price']['priceHistory'][0]['price'])
            #otherwise, symbol not found
            else:
                pricesSym.append(np.nan)
                pricesUSDT.append(np.nan)
        #if both present...
        else:
            #ensure price data exists for asset
            phistory = req_json['data']['reserves'][0]['price']['priceHistory']
            #if data not available, set as null
            if len(phistory)==0:
                #WETH doesn't seem to return a priceHistory list, so we use the priceInEth field
                #(as the price in ETH for WETH is always 1)
                if symbol=='WETH' or symbol=='AmmWETH':
                    pricesSym.append(req_json['data']['reserves'][0]['price']['priceInEth'])
                    pricesUSDT.append(req_json['data']['reserves'][1]['price']['priceHistory'][0]['price'])
                else:
                    pricesSym.append(np.nan)
                    pricesUSDT.append(req_json['data']['reserves'][1]['price']['priceHistory'][0]['price'])
            #otherwise add data
            else:
                pricesSym.append(phistory[0]['price'])
                pricesUSDT.append(req_json['data']['reserves'][1]['price']['priceHistory'][0]['price'])
    except:
        print('ERROR')
        print(req_json)
        return
    
    #update progress
    i+=1
    if i%5000==0:
        print(i)
        
#get repay prices
print('getting repay prices...')
df_repays.apply(lambda x: getPrice(x,'reserve'),axis=1)

df_repays['priceSym']=pricesSym
df_repays['priceUSDT']=pricesUSDT

getting repay prices...
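
The rule "most recent price at or before the transaction" can also be illustrated locally. The sketch below is not how this notebook obtains prices (it queries the API once per row); it assumes a price history table has already been downloaded and uses `pandas.merge_asof` on made-up data to apply the same rule.

```python
import pandas as pd

# made-up transactions and price history (timestamps in seconds)
txs = pd.DataFrame({
    "timestamp": [50, 100, 250, 400],
    "symbol": ["DAI", "DAI", "DAI", "DAI"],
})
history = pd.DataFrame({
    "timestamp": [90, 200, 300],
    "symbol": ["DAI", "DAI", "DAI"],
    "price": [10, 20, 30],
})

# both frames must be sorted by the 'on' key;
# direction="backward" picks the last history row at or before each transaction
priced = pd.merge_asof(
    txs.sort_values("timestamp"),
    history.sort_values("timestamp"),
    on="timestamp",
    by="symbol",
    direction="backward",
)
print(priced)
```

A transaction that predates all recorded prices (timestamp 50 here) gets a missing price, which matches the notebook's choice of `np.nan` when no price history is found.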


### Get Decimal Data

The amount value is reported in units of the lowest denomination for each currency. To standardize these values, we must know the number of decimals each currency is reported in. The following query creates a dictionary holding the number of decimals for each currency.

In [6]:
#set query
query="""
        {
  reserves(first:1000){
    symbol
    decimals
  }
}
        """
#make request
url = 'https://api.thegraph.com/subgraphs/name/aave/protocol-v2'
request = requests.post(url,json={'query':query})
jsondata=request.json()['data']['reserves']

#create dictionary of the number of decimals in each asset
decimals=dict()
for data in jsondata:
    decimals[data['symbol']]=int(data['decimals'])
    
decimals

{'TUSD': 18,
 'AmmUniWBTCUSDC': 18,
 'RAI': 18,
 'GUSD': 2,
 'YFI': 18,
 'BAT': 18,
 'MANA': 18,
 'DPI': 18,
 'AmmBptWBTCWETH': 18,
 'UNI': 18,
 'AmmWBTC': 8,
 'WBTC': 8,
 'AmmUniYFIWETH': 18,
 'AmmUniCRVWETH': 18,
 'REN': 18,
 'AmmUniSNXWETH': 18,
 'BUSD': 18,
 'AmmGUniDAIUSDC': 18,
 'LINK': 18,
 'SUSD': 18,
 'AmmBptBALWETH': 18,
 'AmmDAI': 18,
 'DAI': 18,
 'AAVE': 18,
 'FRAX': 18,
 'XSUSHI': 18,
 'AmmUniRENWETH': 18,
 'PAX': 18,
 'FEI': 18,
 'MKR': 18,
 'AmmUSDC': 6,
 'USDC': 6,
 'AmmUniLINKWETH': 18,
 'AmmUniDAIWETH': 18,
 'AmmUniDAIUSDC': 18,
 'STETH': 18,
 'AmmUniUSDCWETH': 18,
 'AmmUniBATWETH': 18,
 'BAL': 18,
 'AmmUniWBTCWETH': 18,
 'SNX': 18,
 'AmmWETH': 18,
 'WETH': 18,
 'AmmUniMKRWETH': 18,
 'AmmGUniUSDCUSDT': 18,
 'AmmUniUNIWETH': 18,
 'AMPL': 9,
 'RENFIL': 18,
 'CRV': 18,
 'AmmUSDT': 6,
 'USDT': 6,
 'KNC': 18,
 'AmmUniAAVEWETH': 18,
 'ZRX': 18,
 'ENJ': 18}

### Adjust Amount

Next, we convert the amount column to type float. Then, with the information we gathered above, we write a function to adjust the transaction amounts. 

In [7]:
#transform amount column to float
df_repays['amount']=df_repays['amount'].astype(float)

In [8]:
#function to divide each amount based on the reserve
def adjustAmount(row):
    decs = decimals[row['reserve']]
    return row['amount']/(10**decs)

#adjust amounts
df_repays['amount']=df_repays.apply(lambda x: adjustAmount(x),axis=1)
df_repays['amount'].describe()

count    3.271000e+03
mean     5.504758e+05
std      4.576628e+06
min      3.797500e-14
25%      2.093528e+03
50%      1.532111e+04
75%      7.992115e+04
max      1.078706e+08
Name: amount, dtype: float64
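
For larger data frames, the same adjustment could be done without a row-wise `apply`. The following is a sketch on made-up amounts, using three entries taken from the decimals dictionary above; `decimals_sample`, `raw`, and `amount_adjusted` are names chosen for this example only.

```python
import pandas as pd

# subset of the decimals dictionary shown above
decimals_sample = {"USDC": 6, "WBTC": 8, "DAI": 18}

# made-up raw amounts in each currency's lowest denomination
raw = pd.DataFrame({
    "reserve": ["USDC", "WBTC", "DAI"],
    "amount": [2_500_000.0, 150_000_000.0, 3e18],
})

# look up each row's decimals, then scale the raw amount
scale = 10.0 ** raw["reserve"].map(decimals_sample)
raw["amount_adjusted"] = raw["amount"] / scale
print(raw)
```

Unlike the dictionary lookup in `adjustAmount`, `map` yields a missing value instead of raising a `KeyError` when a reserve symbol is absent from the dictionary, so unknown symbols would need a separate check.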

### Re-Format Price Data

We format the raw price data so that the prices are reported in USD. We will also determine the total amount of the transaction in USD.

In [9]:
#raw prices are scaled by 1e18 (wei), so the ratio of asset price to USDT price is the price in USD
df_repays['reservePriceETH'] = df_repays['priceSym'].astype(float)
df_repays['reservePriceUSD'] = df_repays['reservePriceETH']/df_repays['priceUSDT'].astype(float)
#convert the asset price from wei to ETH
df_repays['reservePriceETH'] = df_repays['reservePriceETH']/1e18

#get amount in ETH
df_repays['amountETH'] = df_repays['amount'].astype(float)*df_repays['reservePriceETH']

#get amount in USD
df_repays['amountUSD']=df_repays['amount'].astype(float)*df_repays['reservePriceUSD']                                     

#drop redundant columns
df_repays.drop(columns=['priceSym','priceUSDT'],inplace=True)
                                               
#use the transaction id as the index
df_repays = df_repays.set_index('id')           
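
A small worked example may make the conversion easier to follow. It assumes, as the division by 1e18 above suggests, that raw prices are expressed in wei, and that one USDT is worth one dollar. The function name `to_eth_and_usd` and the numbers are made up for illustration.

```python
WEI_PER_ETH = 10**18

def to_eth_and_usd(price_sym_wei, price_usdt_wei):
    """Convert a raw asset price to ETH and to USD (USDT assumed to be worth 1 USD)."""
    # wei -> ETH
    price_eth = price_sym_wei / WEI_PER_ETH
    # both prices share the same scale, so it cancels in the ratio
    price_usd = price_sym_wei / price_usdt_wei
    return price_eth, price_usd

# made-up prices: the asset costs 0.02 ETH and 1 USDT costs 0.0004 ETH
price_eth, price_usd = to_eth_and_usd(2 * 10**16, 4 * 10**14)
print(price_eth, price_usd)
```

Because both raw prices carry the same 1e18 scale, the USD price can be computed before or after the ETH price is rescaled, as long as the same scale is used for both values in the ratio.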

### Save Final Data Frame

Finally, we append the new transactions to the previously saved data and save the combined data frame to a CSV file.

In [10]:
#DataFrame.append was removed in pandas 2.0, so use pd.concat
df_repays = pd.concat([old_data, df_repays])
df_repays.info()

<class 'pandas.core.frame.DataFrame'>
Index: 89846 entries, 0 to 0xfffba79f5164528608f74834e2ed63bf05e65e3f9e4f4867b21febb7e70ca77d:4
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   amount           89846 non-null  float64
 1   onBehalfOf       89846 non-null  object 
 2   pool             89846 non-null  object 
 3   reserve          89846 non-null  object 
 4   timestamp        89846 non-null  int64  
 5   user             89846 non-null  object 
 6   type             89846 non-null  object 
 7   reservePriceETH  89846 non-null  float64
 8   reservePriceUSD  89846 non-null  float64
 9   amountETH        89846 non-null  float64
 10  amountUSD        89846 non-null  float64
dtypes: float64(5), int64(1), object(5)
memory usage: 8.2+ MB


In [11]:
df_repays.to_csv("repays.csv",index=False)