# Preprocessing TVL chains

## Libraries

In [19]:
import pandas as pd
import numpy as np
import json

In [20]:
# Helper that reads a JSON file and returns the parsed Python object
def loadJSON(filepath):
    with open(filepath, encoding='utf-8') as file:
        return json.load(file)

In [21]:
# Loading the JSON and converting it into a DataFrame
jsonFile = loadJSON('../../data/json/all-chains.json')

df = pd.DataFrame(jsonFile)
print(df.shape)
df.head()

(305, 6)


|   | gecko_id | tvl | tokenSymbol | cmcId | name | chainId |
|---|---|---|---|---|---|---|
| 0 | harmony | 1895183.0 | ONE | 3945.0 | Harmony | 1666600000.0 |
| 1 | mantle | 461335500.0 | MNT | 27075.0 | Mantle | 5000.0 |
| 2 | x-layer | 9910153.0 |  |  | X Layer |  |
| 3 | fraxtal | 32884530.0 | FXTL |  | Fraxtal | 252.0 |
| 4 | aurora-near | 16711900.0 | AURORA | 14803.0 | Aurora | 1313161554.0 |

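`pd.DataFrame` builds one column per key found across the JSON records, and a record that lacks a key (or has it set to `null`) gets a missing value in that column, which is how cells like `X Layer`'s `tokenSymbol` end up empty. A minimal sketch with a hypothetical two-record sample (made-up values, not taken from the real file):

```python
import pandas as pd

# Hypothetical records shaped like the entries in all-chains.json (values are made up)
records = [
    {"gecko_id": "chain-a", "tvl": 100.0, "tokenSymbol": "AAA", "name": "Chain A", "chainId": 1},
    {"gecko_id": None, "tvl": 50.0, "name": "Chain B"},  # no tokenSymbol / chainId keys
]

# One column per key seen in any record; absent keys and None become missing values
sample_df = pd.DataFrame(records)
print(sample_df.shape)
print(sample_df.isna().sum())
```
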

## Removing unusable data

In [22]:
# Checking basic info (especially null counts)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 305 entries, 0 to 304
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   gecko_id     237 non-null    object 
 1   tvl          305 non-null    float64
 2   tokenSymbol  248 non-null    object 
 3   cmcId        210 non-null    object 
 4   name         305 non-null    object 
 5   chainId      139 non-null    object 
dtypes: float64(1), object(5)
memory usage: 14.4+ KB

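`df.info()` reports non-null counts; when the question is specifically how many values are missing, `isna().sum()` gives that number per column and `isna().mean()` gives the fraction of rows. A small sketch on a toy frame standing in for `df` (hypothetical values):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for df (hypothetical values)
toy = pd.DataFrame({
    "tokenSymbol": ["ONE", None, "MNT", None],
    "chainId": [1.0, np.nan, np.nan, 4.0],
    "tvl": [10.0, 20.0, 30.0, 40.0],
})

null_counts = toy.isna().sum()           # missing values per column
null_share = toy.isna().mean().round(2)  # fraction of rows missing per column
print(pd.DataFrame({"nulls": null_counts, "share": null_share}))
```

On the real `df`, the same call would report 57 missing `tokenSymbol` values (305 rows minus 248 non-null).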

In [23]:
# Dropping rows with a null tokenSymbol and checking again
df = df.dropna(subset=['tokenSymbol'])
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 248 entries, 0 to 304
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   gecko_id     236 non-null    object 
 1   tvl          248 non-null    float64
 2   tokenSymbol  248 non-null    object 
 3   cmcId        210 non-null    object 
 4   name         248 non-null    object 
 5   chainId      103 non-null    object 
dtypes: float64(1), object(5)
memory usage: 13.6+ KB

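`dropna(subset=['tokenSymbol'])` only looks at `tokenSymbol`, so rows that have a symbol but a missing `chainId` or `cmcId` are kept; that is why `chainId` still has only 103 non-null values out of 248. Calling `dropna()` without `subset` would instead drop every row with a null in any column. A sketch on toy data (hypothetical values):

```python
import numpy as np
import pandas as pd

# Toy frame: row 1 has no symbol, row 2 has a symbol but no chainId
toy = pd.DataFrame({
    "tokenSymbol": ["ONE", None, "MNT"],
    "chainId": [1.0, 2.0, np.nan],
})

only_symbol = toy.dropna(subset=["tokenSymbol"])  # drops row 1 only
any_column = toy.dropna()                          # drops rows 1 and 2
print(len(only_symbol), len(any_column))
```
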

## Something to point out...
When we later fetch the information for each token from the CoinMarketCap API, the `tokenSymbol` values "ETHF", "FXTL" and "-" cause API errors. To avoid those errors, we remove the rows with these symbols here.

In [24]:
# Token symbols that cause errors in the CoinMarketCap API step
values_to_remove = ['ETHF', 'FXTL', '-']

In [25]:
# Filter out rows where tokenSymbol is in values_to_remove
filtered_df = df[~df['tokenSymbol'].isin(values_to_remove)]

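Before filtering, it can help to look at which rows the exclusion list actually matches. `Series.isin` returns a boolean mask, and `~` negates it so that only the non-matching rows are kept. A sketch on hypothetical rows (the names and symbols other than the excluded ones are made up):

```python
import pandas as pd

values_to_remove = ["ETHF", "FXTL", "-"]

# Hypothetical rows; only the excluded symbols come from the notebook
toy = pd.DataFrame({
    "name": ["Chain A", "Chain B", "Chain C", "Chain D"],
    "tokenSymbol": ["ETHF", "ONE", "-", "MNT"],
})

mask = toy["tokenSymbol"].isin(values_to_remove)  # True where the symbol is excluded
removed = toy[mask]   # rows that will be dropped
kept = toy[~mask]     # rows that survive the filter
print(removed["name"].tolist())
print(len(toy), "->", len(kept))
```
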
In [26]:
# Saving the filtered chains as a JSON array with one object per row
filtered_df.to_json('../../data/json/tvl-chains-symbol.json', orient='records', indent=4)
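
With `orient='records'`, `to_json` writes a JSON array with one object per row and leaves the index out, so the gaps left in the index by the filtering do not end up in the file, and missing values are written as `null`. A sketch checking this in memory on a toy frame (hypothetical values); when no path is passed, `to_json` returns the JSON text instead of writing a file:

```python
import json

import numpy as np
import pandas as pd

# Toy frame with a gap in its index, as left by dropna/filtering
toy = pd.DataFrame(
    {"tokenSymbol": ["ONE", "MNT"], "chainId": [1666600000.0, np.nan]},
    index=[0, 4],
)

# Without a path, to_json returns the JSON string (indent requires pandas >= 1.0)
text = toy.to_json(orient="records", indent=4)
records = json.loads(text)
print(records)
```
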