# Standardizing Event Data for DeRisk

DeRisk currently works with event data in a raw format where relevant data (like the user, token, amount, etc.) is saved in a dictionary or list-like structure in one of the columns. To better extract information from the database, it is helpful to have a unified data structure where all relevant pieces of information are saved in separate columns. This allows for easy querying of all events of a given type (deposit, withdrawal, liquidation) for a given user, lending protocol, etc.

The following steps outline how to take a sample of events and convert them to a standardized format that can store information about any type of event and any lending protocol.


1. Load Event Data from a Parquet File

In [None]:
import pandas as pd
# Define the GCS path to the Parquet file
gcs_path = 'https://storage.googleapis.com/derisk-persistent-state/zklend_data/events_sample.parquet'

# Read the Parquet file into a Pandas DataFrame
df = pd.read_parquet(gcs_path, engine='pyarrow')

In [None]:
df.head()

In [None]:
# Show the entire second row without truncating long column values
pd.set_option('display.max_colwidth', None)
df.iloc[1,:]

2. Define a Function to Decode Byte Strings
Define a helper function to decode byte strings if necessary. This function will be used to ensure that data in the keys and data columns is correctly decoded before further processing.

In [None]:
def decode_byte_string(value):
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return value
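
As a quick sanity check, the helper can be exercised on a bytes value and on a value that is already a string. This is only a sketch: the sample values below are invented for illustration and are not taken from the DeRisk dataset.

```python
def decode_byte_string(value):
    """Return a UTF-8 string for bytes input; pass any other value through unchanged."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return value

# Hypothetical sample values (not from the real dataset)
raw_keys = b"['0x1a2b']"
already_text = "['0x1a2b']"

decoded = decode_byte_string(raw_keys)        # bytes are decoded to str
unchanged = decode_byte_string(already_text)  # str is returned as-is
print(decoded == unchanged)
```

Because non-bytes values pass through untouched, the helper is safe to call on a column regardless of whether it was stored as bytes or as text.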

3. Define a Function to Transform Each Row
Define a function that decodes the keys and data columns, converts the hexadecimal strings to integers, and structures the information into a standardized format.

In [None]:
import ast

def decode_and_convert_row(row):
    # Decode 'keys' and 'data' fields
    keys_decoded = decode_byte_string(row['keys'])
    data_decoded = decode_byte_string(row['data'])
    
    # Convert string representation of list to actual list
    keys = ast.literal_eval(keys_decoded)
    data = ast.literal_eval(data_decoded)
    
    # Convert a hexadecimal string to an integer; return the value unchanged
    # if it is not a string or is not valid hexadecimal
    def hex_to_int(hex_str):
        try:
            return int(hex_str, 16)
        except (TypeError, ValueError):
            return hex_str
    
    # Extract and structure the information
    standardized_row = {
        'block_hash': row['block_hash'],
        'block_number': row['block_number'],
        'transaction_hash': row['transaction_hash'],
        'event_index': row['event_index'],
        'timestamp': row['timestamp'],
        'user': row['from_address'],
        'event_type': row['key_name'],
        'token': keys[0] if keys else None,
        'amount': hex_to_int(data[0]) if data else None
    }
    
    return standardized_row
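
The two parsing steps inside the function can be tried in isolation. The sketch below assumes a column value stored as the string form of a list of hexadecimal strings; the sample value is made up for illustration.

```python
import ast

# Hypothetical string form of a 'data' column entry (values invented for illustration)
data_text = "['0x5f5e100', '0x0']"

# ast.literal_eval parses Python literals only, so it does not execute arbitrary code
data = ast.literal_eval(data_text)

# int(..., 16) accepts hexadecimal strings with or without the '0x' prefix
amounts = [int(item, 16) for item in data]
print(amounts)
```

Using `ast.literal_eval` rather than `eval` keeps the parsing restricted to literals, which matters when the input comes from an external data source.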

4. Apply the Transformation Function to the DataFrame
Apply the decode_and_convert_row function to each row of the DataFrame to create a new standardized DataFrame.

In [None]:
# Apply the function to each row
standardized_rows = df.apply(decode_and_convert_row, axis=1)
standardized_df = pd.DataFrame(standardized_rows.tolist())
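
To illustrate why the result is passed through `tolist()` before building the new DataFrame, here is a minimal sketch on a toy DataFrame. The column values and the `to_record` helper are hypothetical and only mirror the shape of the real transformation.

```python
import ast
import pandas as pd

# Toy input with invented values, mimicking two of the raw columns
toy_df = pd.DataFrame({
    "key_name": ["Deposit", "Withdrawal"],
    "data": ["['0x10']", "['0x20']"],
})

def to_record(row):
    """Hypothetical per-row transformation that returns a dict."""
    data = ast.literal_eval(row["data"])
    return {"event_type": row["key_name"], "amount": int(data[0], 16)}

# With axis=1, a function that returns dicts yields a Series of dicts
records = toy_df.apply(to_record, axis=1)

# A list of dicts becomes a DataFrame with one column per key
toy_standardized = pd.DataFrame(records.tolist())
print(toy_standardized)
```

Row-wise `apply` is convenient for a sample of this size; for much larger datasets a vectorized approach may be worth considering.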

5. Save the Standardized DataFrame to a CSV File
Save the transformed data to a CSV file for further analysis and easy querying.

In [None]:
# Save the standardized DataFrame to a CSV file
output_file = 'standardized_events.csv'
standardized_df.to_csv(output_file, index=False)

# Display the first few rows of the standardized DataFrame
standardized_df.head()
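
Once events are in this shape, the kind of querying described in the introduction becomes a simple filter or group-by. The sketch below uses a small hand-made DataFrame with the same column names; the addresses and amounts are invented and much shorter than real values.

```python
import pandas as pd

# Hypothetical standardized events (values invented for illustration)
events = pd.DataFrame({
    "user": ["0xabc", "0xabc", "0xdef"],
    "event_type": ["Deposit", "Withdrawal", "Deposit"],
    "token": ["0x01", "0x01", "0x02"],
    "amount": [100, 40, 250],
})

# All deposits made by one user
deposits_by_user = events[
    (events["event_type"] == "Deposit") & (events["user"] == "0xabc")
]

# Total amount per user and event type
totals = events.groupby(["user", "event_type"])["amount"].sum()

print(deposits_by_user)
print(totals)
```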