# Wallet Feature Engineering

The purpose of this notebook is to...

# Read Data

Description of the dataset ...
Event dataset extracted and provided by [莊惟翔](https://github.com/Fred-Zhuang)

A list of events that occur on the NFTs tracked by OpenSea. The `event_type` field indicates the type of event (transfer, successful auction, etc.), and the results are sorted by event timestamp.


In [1]:
import os
import re
import time
import datetime
import pandas as pd

data_dir = os.path.join(os.getcwd(), 'data')
cool_cats_nft = os.path.join(data_dir, 'cool-cats-nft.feather')

start_time = time.time()
wallets = pd.read_feather(cool_cats_nft)
total_time = time.time() - start_time
print("Total seconds to load:", total_time)

Total seconds to load: 7.475966215133667


In [2]:
wallets.info(show_counts=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2398450 entries, 0 to 2398449
Data columns (total 29 columns):
 #   Column                  Non-Null Count    Dtype  
---  ------                  --------------    -----  
 0   index                   2398450 non-null  int64  
 1   event_timestamp         2398450 non-null  object 
 2   event_type              2398450 non-null  object 
 3   token_id                2382796 non-null  object 
 4   num_sales               2382796 non-null  float64
 5   listing_time            2268796 non-null  object 
 6   token_owner_address     2382796 non-null  object 
 7   token_seller_address    2395806 non-null  object 
 8   deal_price              2398450 non-null  float64
 9   payment_token_symbol    2398411 non-null  object 
 10  payment_token_decimals  2398445 non-null  float64
 11  payment_token_usdprice  2397937 non-null  float64
 12  quantity                2394188 non-null  float64
 13  starting_price          0 non-null        float64
 14  en

In [3]:
wallets.drop(["index", "starting_price", "ending_price",
              "approved_account", "bid_amount",
              "pages"], axis=1, inplace=True)

In [4]:
wallets.event_timestamp = pd.to_datetime(wallets.event_timestamp)
wallets.listing_time = pd.to_datetime(wallets.listing_time)

In [5]:
print("Most recent event:", wallets.event_timestamp.max())

Most recent event: 2022-05-10 23:35:55


In [6]:
print("Earliest event:", wallets.event_timestamp.min())

Earliest event: 2017-07-04 04:33:49


In [7]:
print("Length of this time series dataset:", wallets.event_timestamp.max() - wallets.event_timestamp.min())

Length of this time series dataset: 1771 days 19:02:06


In [8]:
print("Total number of wallet addresses used to retrieve data from OpenSea:", wallets.wallet_address_input.nunique())

Total number of wallet addresses used to retrieve data from OpenSea: 9766


### WIP

# Generate Features

## Impute data

### `buy` vs. `sell` `event_type`

_N.b._ The dataset lacks a `winner_account_address` attribute that would positively confirm the buyer is the
same as the wallet owner, i.e. `wallet_address_input`. We therefore infer whether the wallet owner
is the buyer or the seller as follows:

In [9]:
import numpy as np

wallets.event_type = np.where(wallets.wallet_address_input == wallets.token_seller_address, "sell", "buy")
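As a quick sanity check of this rule on toy rows (placeholder addresses), note that a row with no `token_seller_address` falls through to `"buy"`. Per the `info()` output above, 2,644 rows (2,398,450 - 2,395,806) have no seller address. The `event_type_strict` column is a hypothetical alternative that labels those rows `"unknown"` instead; it is a sketch, not part of the pipeline.

```python
import numpy as np
import pandas as pd

# Toy rows for one tracked wallet; addresses are shortened placeholders.
toy_sides = pd.DataFrame({
    "wallet_address_input": ["0xaaa", "0xaaa", "0xaaa"],
    "token_seller_address": ["0xaaa", "0xbbb", None],
})

# Same rule as In [9]: the tracked wallet is the seller -> "sell", otherwise "buy".
is_seller = toy_sides["wallet_address_input"] == toy_sides["token_seller_address"]
toy_sides["event_type"] = np.where(is_seller, "sell", "buy")

# Hypothetical stricter variant: rows without a seller address are labelled "unknown".
seller = toy_sides["token_seller_address"]
toy_sides["event_type_strict"] = np.select(
    [seller.isna(), is_seller],
    ["unknown", "sell"],
    default="buy",
)
print(toy_sides[["event_type", "event_type_strict"]])
```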

### `duration`
The time between when the token was listed and when the sale completed.

_What should we do when `listing_time` is `NaT`?_

In [10]:
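wallets.rename(columns={"user_account_address": "token_seller_address"}, inplace=True)  # renaming a missing column is silently ignored
# Bracket assignment also works when the column does not exist yet; attribute assignment does not
wallets["duration"] = wallets.event_timestamp - wallets.listing_time
wallets.loc[:, ["token_seller_address", "event_type", "duration"]].head(10)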

| | token_seller_address | event_type | duration |
|---|---|---|---|
| 0 | 0x82dc39052703cb51718b92fd62a6da6d1e749a0c | sell | 0 days 09:45:26 |
| 1 | 0xd44a7b02e9692f491fb360d6a509e37c06bcd579 | buy | 0 days 00:56:41 |
| 2 | 0x56a7a519cb9d369334a24c98b44164d18a9b8385 | buy | 0 days 00:10:33 |
| 3 | 0x278d9db7032ffe25c5fcec6fb517f4e2041805d3 | buy | 0 days 10:08:26 |
| 4 | 0xef9fdc930d645299d01440d82b6c417cbd8f7162 | buy | 0 days 00:23:56 |
| 5 | 0xef9fdc930d645299d01440d82b6c417cbd8f7162 | buy | 0 days 00:07:53 |
| 6 | 0xd0d20158daa57b04c1094b7c0fa31efbdd675b52 | buy | 0 days 00:18:01 |
| 7 | 0x721b1b99af3ccbc2d42c1934f0aabc006ea36e31 | buy | 0 days 00:12:30 |
| 8 | 0x82dc39052703cb51718b92fd62a6da6d1e749a0c | sell | 0 days 01:44:32 |
| 9 | 0x165dedca327cebac0a8222b71dc76a62b4727b83 | buy | NaT |

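One possible answer to the `NaT` question, sketched on two toy rows (the timestamps are those of the first sample-wallet row shown further down; the `has_listing_time` and `duration_hours` names are hypothetical): keep the missing duration as `NaT`/`NaN` and add an explicit indicator instead of imputing a listing time. Per the `info()` output, 129,654 rows (2,398,450 - 2,268,796) lack `listing_time`.

```python
import pandas as pd

# Two toy rows; the second has no listing time.
toy_durations = pd.DataFrame({
    "event_timestamp": pd.to_datetime(["2022-05-04 03:29:18", "2022-05-04 03:29:18"]),
    "listing_time": pd.to_datetime(["2022-05-04 03:19:37", None]),
})

toy_durations["duration"] = toy_durations["event_timestamp"] - toy_durations["listing_time"]
# Keep NaT instead of inventing a listing time, and record whether one existed.
toy_durations["has_listing_time"] = toy_durations["listing_time"].notna()
# Duration in hours as a float feature; NaT becomes NaN.
toy_durations["duration_hours"] = toy_durations["duration"].dt.total_seconds() / 3600
print(toy_durations[["duration", "has_listing_time", "duration_hours"]])
```

Most tree-based models and imputers can consume the `NaN` plus the indicator directly, which avoids baking an arbitrary fill value into the feature.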

### `deal_price_usd` and payment token attributes

In [11]:
print(sum(wallets.payment_token_symbol.isna()), 
      sum(wallets.payment_token_decimals.isna()),
      sum(wallets.payment_token_usdprice.isna()))

39 5 513


In [12]:
wallets[wallets.payment_token_symbol.isna() |
        wallets.payment_token_decimals.isna() |
        wallets.payment_token_usdprice.isna()].shape[0]

513

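is the total number of records missing at least one of the payment token symbol, the token decimals (i.e. the deal-price scaling factor),
or the token-to-USD exchange rate. Because this equals the 513 missing exchange rates, every record missing a symbol or decimals is also missing its exchange rate. **We will ignore these records for now.**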

In [13]:
wallets["deal_price_usd"] = wallets.deal_price / 10 ** wallets.payment_token_decimals * wallets.payment_token_usdprice
wallets.deal_price_usd.agg(["max", "min", "mean"])

max     6.925186e+06
min     0.000000e+00
mean    2.557588e+03
Name: deal_price_usd, dtype: float64
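The conversion in `In [13]` divides the raw amount by `10 ** payment_token_decimals` to get whole tokens, then multiplies by the token's USD rate. A minimal sketch with a hypothetical `to_usd` helper: the 2,390.99 USD/ETH rate is back-calculated from the first row of the sample wallet below (0.845 ETH, `deal_price_usd` 2020.38655), not read from the data, and the second toy row shows how a missing rate propagates as `NaN`.

```python
import numpy as np
import pandas as pd

def to_usd(deal_price, decimals, usd_rate):
    """Raw token amount -> whole tokens (divide by 10**decimals) -> USD."""
    return deal_price / 10.0 ** decimals * usd_rate

# 8.45e17 wei = 0.845 ETH; the rate is back-calculated, not taken from the dataset.
single = to_usd(8.45e17, 18, 2390.99)
print(round(single, 5))

# Vectorised over a toy frame: a missing exchange rate propagates as NaN.
toy_prices = pd.DataFrame({
    "deal_price": [8.45e17, 1.0e15],
    "payment_token_decimals": [18.0, 18.0],
    "payment_token_usdprice": [2390.99, np.nan],
})
toy_prices["deal_price_usd"] = to_usd(
    toy_prices["deal_price"],
    toy_prices["payment_token_decimals"],
    toy_prices["payment_token_usdprice"],
)
# Drop rows with a missing payment-token attribute, as the text above suggests.
clean_prices = toy_prices.dropna(
    subset=["payment_token_decimals", "payment_token_usdprice", "deal_price_usd"]
)
```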

In [14]:
wallets[wallets.quantity.isna()]["deal_price_usd"].quantile(q=[x / 10 for x in range(0, 10)])

0.0       0.000000
0.1      11.217255
0.2      19.093200
0.3      32.133298
0.4      47.733000
0.5      72.282328
0.6     115.448000
0.7     188.907200
0.8     377.833480
0.9    1158.180000
Name: deal_price_usd, dtype: float64

### `quantity`

In [15]:
wallets.quantity.unique()

array([1.00000000e+00, 2.00000000e+00, 4.00000000e+00, 7.00000000e+01,
       3.00000000e+00, 9.00000000e+00, 1.00000000e+01, 2.00000000e+01,
       5.00000000e+00, 1.50000000e+01, 6.00000000e+00, 2.50000000e+01,
       5.00000000e+01, 7.00000000e+00, 8.00000000e+00, 1.10000000e+01,
       1.20000000e+01, 6.50000000e+02, 1.40000000e+01, 1.30000000e+01,
       1.90000000e+01, 1.60000000e+01,            nan, 1.28000000e+02,
       1.00000000e+11, 1.80000000e+01, 2.40000000e+01, 2.20000000e+01,
       2.10000000e+01, 1.70000000e+01, 2.90000000e+01, 3.50000000e+01,
       4.00000000e+01, 2.80000000e+01, 3.00000000e+01, 1.00000000e+02,
       6.40000000e+01, 3.20000000e+01, 1.92000000e+02, 4.20000000e+02,
       2.50000000e+02, 5.10000000e+01, 2.00000000e+02, 6.90000000e+01,
       1.00000000e+03, 8.00000000e+01, 1.72000000e+02, 1.88000000e+02,
       5.50000000e+01, 1.00000000e+22, 4.00000000e+18, 2.70000000e+01,
       1.00000000e+09, 1.00000000e+04, 2.50000000e+03, 2.30000000e+01,
      

In [16]:
sum(wallets.quantity.isna())

4262

In [17]:
wallets.quantile(q=[x / 100 for x in range(100)], numeric_only=True)

| quantile | num_sales | deal_price | payment_token_decimals | payment_token_usdprice | quantity | block_number | is_private | deal_price_usd |
|---|---|---|---|---|---|---|---|---|
| 0.00 | 0.0 | 0.000000e+00 | 0.0 | 5.011600e-11 | 1.0 | 3971224.00 | 0.0 | 0.000000 |
| 0.01 | 1.0 | 3.212983e+08 | 18.0 | 2.218410e+03 | 1.0 | 11817809.29 | 0.0 | 0.022499 |
| 0.02 | 1.0 | 5.000000e+15 | 18.0 | 2.249890e+03 | 1.0 | 12120872.98 | 0.0 | 12.005050 |
| 0.03 | 1.0 | 9.500000e+15 | 18.0 | 2.254960e+03 | 1.0 | 12434482.70 | 0.0 | 22.313492 |
| 0.04 | 1.0 | 1.000000e+16 | 18.0 | 2.254960e+03 | 1.0 | 12612300.28 | 0.0 | 23.825900 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 0.95 | 191.0 | 4.200000e+18 | 18.0 | 2.422550e+03 | 1.0 | 14597862.00 | 0.0 | 9659.160000 |
| 0.96 | 459.0 | 5.200000e+18 | 18.0 | 2.436340e+03 | 1.0 | 14621963.52 | 0.0 | 12005.050000 |
| 0.97 | 1124.0 | 6.890000e+18 | 18.0 | 2.436340e+03 | 1.0 | 14649170.13 | 0.0 | 15725.714117 |
| 0.98 | 2823.0 | 9.000000e+18 | 18.0 | 2.442540e+03 | 1.0 | 14681174.26 | 0.0 | 20799.630000 |


In [18]:
wallets.quantity.max()  # Series.max() skips NaN; the built-in max() does not handle NaN reliably

1e+22
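The 98th percentile of `quantity` is still 1, yet the maximum is 1e+22, and values such as 4e+18 or 1e+18 look more like raw base-unit amounts than edition counts (an unverified guess). A minimal sketch for flagging such rows rather than trusting them; the `1e6` cutoff and the `flag_quantity` helper are hypothetical and the cutoff should be tuned after inspecting the affected collections.

```python
import numpy as np
import pandas as pd

# Hypothetical cutoff: quantities at or above this are treated as suspect, not as real counts.
SUSPECT_QUANTITY = 1e6

def flag_quantity(quantity: pd.Series, cutoff: float = SUSPECT_QUANTITY) -> pd.DataFrame:
    """Return flags for missing / suspect quantities and a cleaned copy (suspect -> NaN)."""
    out = pd.DataFrame({"quantity": quantity})
    out["quantity_missing"] = quantity.isna()
    out["quantity_suspect"] = quantity >= cutoff  # NaN compares as False
    out["quantity_clean"] = quantity.mask(out["quantity_suspect"])
    return out

# Toy values drawn from the unique values listed above.
quantity_flags = flag_quantity(pd.Series([1.0, 2.0, np.nan, 1e22, 650.0]))
print(quantity_flags)
```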

### `is_private` sales

_Do we assume `NaN` is __not__ private?_

In [19]:
wallets.is_private.value_counts(dropna=False)

0.0    2230703
NaN     129654
1.0      38093
Name: is_private, dtype: int64
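Under the (unverified) assumption that a missing flag means a public sale, a sketch that imputes 0 while keeping a missing-value indicator; the column names are hypothetical. Note that the 129,654 missing `is_private` values exactly match the 129,654 rows missing `listing_time`, which hints, without confirming, that both gaps come from the same records; `(wallets.is_private.isna() == wallets.listing_time.isna()).all()` would check this.

```python
import numpy as np
import pandas as pd

toy_private = pd.Series([0.0, np.nan, 1.0, 0.0], name="is_private")

private_features = pd.DataFrame({
    # Record which rows had no information before imputing.
    "is_private_missing": toy_private.isna(),
    # Assumption (unverified): a missing flag means a public sale.
    "is_private": toy_private.fillna(0).astype(int),
})
print(private_features)
```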

## WIP

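Group the records by wallet address.

For now, pick an arbitrary wallet address to develop the feature calculations below (they will be wrapped in a loop over all wallets at the end).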

In [20]:
sectors = wallets.groupby("wallet_address_input")

df_temp3 = sectors.get_group("0x5338035c008ea8c4b850052bc8dad6a33dc2206c")
df_temp3 = df_temp3.reset_index(drop=True)  # Necessary: later cells look up rows by integer labels 0..n-1

In [21]:
df_temp3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5521 entries, 0 to 5520
Data columns (total 24 columns):
 #   Column                  Non-Null Count  Dtype          
---  ------                  --------------  -----          
 0   event_timestamp         5521 non-null   datetime64[ns] 
 1   event_type              5521 non-null   object         
 2   token_id                5503 non-null   object         
 3   num_sales               5503 non-null   float64        
 4   listing_time            5472 non-null   datetime64[ns] 
 5   token_owner_address     5503 non-null   object         
 6   token_seller_address    5521 non-null   object         
 7   deal_price              5521 non-null   float64        
 8   payment_token_symbol    5521 non-null   object         
 9   payment_token_decimals  5521 non-null   float64        
 10  payment_token_usdprice  5521 non-null   float64        
 11  quantity                5521 non-null   float64        
 12  asset_bundle            18 non-nul

In [22]:
df_temp3.head()

Unnamed: 0,event_timestamp,event_type,token_id,num_sales,listing_time,token_owner_address,token_seller_address,deal_price,payment_token_symbol,payment_token_decimals,...,transaction_hash,block_hash,block_number,is_private,duration,created_date,collection_slug,contract_address,wallet_address_input,deal_price_usd
0,2022-05-04 03:29:18,buy,2977,1.0,2022-05-04 03:19:37,0x5338035c008ea8c4b850052bc8dad6a33dc2206c,0x4984fc170325e8fe57e9de1c2b74ce5eabb6f9da,8.45e+17,ETH,18.0,...,0x5b7266357899fc13841a02456c8128b56e8852bc55ae...,0xe0e7e1a0a2516ec3ec8764e1f29773b9a77da8355ff3...,14708585.0,0.0,0 days 00:09:41,2022-05-04T03:29:50.900288,fragments-by-james-jean,0x7f268357a8c2552623316e2562d90e642bb538e5,0x5338035c008ea8c4b850052bc8dad6a33dc2206c,2020.38655
1,2022-05-04 03:29:18,buy,3016,1.0,2022-05-04 00:14:49,0x5338035c008ea8c4b850052bc8dad6a33dc2206c,0xc293bc1602efeba837cb240c49476e1d3fe0fd98,8.45e+17,ETH,18.0,...,0x5b7266357899fc13841a02456c8128b56e8852bc55ae...,0xe0e7e1a0a2516ec3ec8764e1f29773b9a77da8355ff3...,14708585.0,0.0,0 days 03:14:29,2022-05-04T03:29:50.461508,fragments-by-james-jean,0x7f268357a8c2552623316e2562d90e642bb538e5,0x5338035c008ea8c4b850052bc8dad6a33dc2206c,2020.38655
2,2022-05-04 03:29:18,buy,4956,1.0,2022-05-03 19:39:05,0x5338035c008ea8c4b850052bc8dad6a33dc2206c,0xe2fb909159dea75b1520c382ca102989cdd1a276,8.4e+17,ETH,18.0,...,0x5b7266357899fc13841a02456c8128b56e8852bc55ae...,0xe0e7e1a0a2516ec3ec8764e1f29773b9a77da8355ff3...,14708585.0,0.0,0 days 07:50:13,2022-05-04T03:29:50.009580,fragments-by-james-jean,0x7f268357a8c2552623316e2562d90e642bb538e5,0x5338035c008ea8c4b850052bc8dad6a33dc2206c,2008.4316
3,2022-05-04 03:29:18,buy,5078,1.0,2022-05-04 01:40:56,0x5338035c008ea8c4b850052bc8dad6a33dc2206c,0x8a45d09b2dbbf1657fb8c14561b6525443631d22,8.5e+17,ETH,18.0,...,0x5b7266357899fc13841a02456c8128b56e8852bc55ae...,0xe0e7e1a0a2516ec3ec8764e1f29773b9a77da8355ff3...,14708585.0,0.0,0 days 01:48:22,2022-05-04T03:29:49.627993,fragments-by-james-jean,0x7f268357a8c2552623316e2562d90e642bb538e5,0x5338035c008ea8c4b850052bc8dad6a33dc2206c,2032.3415
4,2022-05-04 03:29:18,buy,5800,1.0,2022-05-04 02:52:43,0x5338035c008ea8c4b850052bc8dad6a33dc2206c,0xcc60f720388551bc9159cfed814a15de2f49d1e9,8.49e+17,ETH,18.0,...,0x5b7266357899fc13841a02456c8128b56e8852bc55ae...,0xe0e7e1a0a2516ec3ec8764e1f29773b9a77da8355ff3...,14708585.0,0.0,0 days 00:36:35,2022-05-04T03:29:49.240455,fragments-by-james-jean,0x7f268357a8c2552623316e2562d90e642bb538e5,0x5338035c008ea8c4b850052bc8dad6a33dc2206c,2029.95051
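The plan noted above (develop features on one wallet, then loop over all of them) could take the shape below: a per-wallet function applied to each `groupby` group. `wallet_summary` and its four features are hypothetical placeholders for the calculations developed in the following cells, and the frame is toy data.

```python
import pandas as pd

def wallet_summary(events: pd.DataFrame) -> pd.Series:
    """Hypothetical per-wallet feature function; the real features would go here."""
    return pd.Series({
        "n_events": len(events),
        "n_sells": int((events["event_type"] == "sell").sum()),
        "first_event": events["event_timestamp"].min(),
        "last_event": events["event_timestamp"].max(),
    })

toy_wallets = pd.DataFrame({
    "wallet_address_input": ["0xaaa", "0xaaa", "0xbbb"],
    "event_type": ["buy", "sell", "buy"],
    "event_timestamp": pd.to_datetime(["2022-01-01", "2022-02-01", "2022-03-01"]),
})

rows = {}
for address, group in toy_wallets.groupby("wallet_address_input"):
    # reset_index gives positional labels 0..n-1, which the single-wallet cells rely on.
    rows[address] = wallet_summary(group.reset_index(drop=True))
wallet_features = pd.DataFrame(list(rows.values()), index=list(rows.keys()))
print(wallet_features)
```

With about 9,766 tracked wallets, a plain loop like this is simple to debug; the per-wallet function can later be swapped for vectorised `groupby` operations where speed matters.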


In [23]:
import numpy as np

# Purchase price, assuming the Ethereum chain: ETH = deal_price / 10**18
df_temp3["cost"] = np.where(df_temp3["wallet_address_input"][0]==df_temp3["token_seller_address"], 0,df_temp3["deal_price"]/10**18)
# Sale price
df_temp3["sellprice"] = np.where(df_temp3["wallet_address_input"][0]==df_temp3["token_seller_address"], df_temp3["deal_price"]/10**18, 0)

## cost and sellprice probably aren't necessary, since 'Buy_Sell' is created below
## a deal_price_usd column is recommended instead

# Date conversion
df_temp3["Datetime"] = pd.to_datetime(df_temp3["event_timestamp"]) # this can be stored in event_timestamp instead of a new column
# Buy/sell flag
# Consider overwriting event_type column
df_temp3["Buy_Sell"] = np.where(df_temp3["wallet_address_input"][0]==df_temp3["token_seller_address"], "S", 'B')
# Portfolio (holdings) << what do we plan to store here?
df_temp3["Profolio"] = np.nan
# Profit and loss (P&L) << How do we plan to calculate this for each event row?
df_temp3["PL"] = 0
# Number of NFTs held
df_temp3["NFT_total_num"] = 0
# Combine collection_slug and token_id into one key identifying each token the wallet holds
df_temp3["collection_slug_tokenid"] = df_temp3["collection_slug"] + df_temp3["token_id"]
# Holding period of a token, from its secondary-market purchase to its sale (NaT until set)
df_temp3["HoldPeriod"] = pd.Series(pd.NaT, index=df_temp3.index, dtype="timedelta64[ns]")
df_temp3["Position"] = 0
df_temp3["Sell"] = 0
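Following the note above that a USD price is preferable, a sketch of USD-denominated counterparts to `cost` and `sellprice` (the `cost_usd` / `sellprice_usd` names are hypothetical; the second row's values are toy data). It compares each row with its own `wallet_address_input` rather than row 0's, so the same lines would also work on the full multi-wallet `wallets` frame.

```python
import numpy as np
import pandas as pd

toy_trades = pd.DataFrame({
    "wallet_address_input": ["0xaaa", "0xaaa"],
    "token_seller_address": ["0xbbb", "0xaaa"],
    "deal_price_usd": [2020.38655, 2500.0],
})

# Row-wise comparison: no dependence on row 0, so several wallets can share one frame.
is_sell = toy_trades["wallet_address_input"] == toy_trades["token_seller_address"]
toy_trades["cost_usd"] = np.where(is_sell, 0.0, toy_trades["deal_price_usd"])
toy_trades["sellprice_usd"] = np.where(is_sell, toy_trades["deal_price_usd"], 0.0)
print(toy_trades)
```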

In [24]:
df_temp3.head()

Unnamed: 0,event_timestamp,event_type,token_id,num_sales,listing_time,token_owner_address,token_seller_address,deal_price,payment_token_symbol,payment_token_decimals,...,sellprice,Datetime,Buy_Sell,Profolio,PL,NFT_total_num,collection_slug_tokenid,HoldPeriod,Position,Sell
0,2022-05-04 03:29:18,buy,2977,1.0,2022-05-04 03:19:37,0x5338035c008ea8c4b850052bc8dad6a33dc2206c,0x4984fc170325e8fe57e9de1c2b74ce5eabb6f9da,8.45e+17,ETH,18.0,...,0.0,2022-05-04 03:29:18,B,,0,0,fragments-by-james-jean2977,,0,0
1,2022-05-04 03:29:18,buy,3016,1.0,2022-05-04 00:14:49,0x5338035c008ea8c4b850052bc8dad6a33dc2206c,0xc293bc1602efeba837cb240c49476e1d3fe0fd98,8.45e+17,ETH,18.0,...,0.0,2022-05-04 03:29:18,B,,0,0,fragments-by-james-jean3016,,0,0
2,2022-05-04 03:29:18,buy,4956,1.0,2022-05-03 19:39:05,0x5338035c008ea8c4b850052bc8dad6a33dc2206c,0xe2fb909159dea75b1520c382ca102989cdd1a276,8.4e+17,ETH,18.0,...,0.0,2022-05-04 03:29:18,B,,0,0,fragments-by-james-jean4956,,0,0
3,2022-05-04 03:29:18,buy,5078,1.0,2022-05-04 01:40:56,0x5338035c008ea8c4b850052bc8dad6a33dc2206c,0x8a45d09b2dbbf1657fb8c14561b6525443631d22,8.5e+17,ETH,18.0,...,0.0,2022-05-04 03:29:18,B,,0,0,fragments-by-james-jean5078,,0,0
4,2022-05-04 03:29:18,buy,5800,1.0,2022-05-04 02:52:43,0x5338035c008ea8c4b850052bc8dad6a33dc2206c,0xcc60f720388551bc9159cfed814a15de2f49d1e9,8.49e+17,ETH,18.0,...,0.0,2022-05-04 03:29:18,B,,0,0,fragments-by-james-jean5800,,0,0


### Is the code block below attempting to calculate the _current cumulative stats_ of each wallet?

In [25]:
import copy

porfolio_dict = {}      # NFTs held, as {collection_slug: [token_id, ...]}
porfolio_costdict = {}  # purchase cost of each held NFT, keyed by collection_slug_tokenid
porfolio_datedict = {}  # purchase time of each held NFT, keyed by collection_slug_tokenid
count = 0
error = []
# Store deep-copied snapshots in an object column: assigning porfolio_dict itself would make
# every row reference the same dict, which keeps changing as the loop runs.
df_temp3["Profolio"] = df_temp3["Profolio"].astype(object)
# The data run from newest to oldest, so iterate in reverse to accumulate from oldest to newest.
for i in range(len(df_temp3)-1,-1,-1):
    # Collection not seen before (first purchase from this NFT collection)
    if df_temp3["collection_slug"][i] not in porfolio_dict.keys():
        if df_temp3["Buy_Sell"][i]=="B":
            # Holdings +1
            count = count+1
            porfolio_dict[df_temp3["collection_slug"][i]] = [df_temp3["token_id"][i]]
            df_temp3.at[i, "Profolio"] = copy.deepcopy(porfolio_dict)
            df_temp3.loc[i, "NFT_total_num"] = count
            # NFT cost
            porfolio_costdict[df_temp3["collection_slug_tokenid"][i]] = df_temp3["cost"][i]
            # NFT purchase time
            porfolio_datedict[df_temp3["collection_slug_tokenid"][i]] = df_temp3["Datetime"][i]
            # Position
            df_temp3.loc[i, "Position"] = sum(porfolio_costdict.values())
            
        else:
            # A sale here means the token may have been transferred in from another wallet earlier;
            # its prior holding cost cannot be computed.
            df_temp3.loc[i, "NFT_total_num"] = count
            df_temp3.loc[i, "Position"] = sum(porfolio_costdict.values())
    else:
        # The wallet already holds NFTs from this collection
        if df_temp3["token_id"][i] not in porfolio_dict[df_temp3["collection_slug"][i]]:
            if df_temp3["Buy_Sell"][i]=="B":
                # Add to the position
                porfolio_dict[df_temp3["collection_slug"][i]].append(df_temp3["token_id"][i])
                df_temp3.at[i, "Profolio"] = copy.deepcopy(porfolio_dict)
                # Holdings +1
                count = count+1
                df_temp3.loc[i, "NFT_total_num"] = count
                # NFT cost
                porfolio_costdict[df_temp3["collection_slug_tokenid"][i]] = df_temp3["cost"][i]
                # NFT purchase time
                porfolio_datedict[df_temp3["collection_slug_tokenid"][i]] = df_temp3["Datetime"][i]
                # Position
                df_temp3.loc[i, "Position"] = sum(porfolio_costdict.values())
                
            else:
                # Sale; the token may have been transferred in from another wallet,
                # so its prior holding cost cannot be computed.
                df_temp3.loc[i, "NFT_total_num"] = count
                df_temp3.at[i, "Profolio"] = copy.deepcopy(porfolio_dict)
                df_temp3.loc[i, "Position"] = sum(porfolio_costdict.values())
        else:
            if df_temp3["Buy_Sell"][i]=="B":
                # Should not happen, since token_id is unique?
                df_temp3.loc[i, "NFT_total_num"] = count
                df_temp3.at[i, "Profolio"] = copy.deepcopy(porfolio_dict)
                df_temp3.loc[i, "Position"] = sum(porfolio_costdict.values())
            else:
                # P&L is realized here: one complete buy-then-sell round trip
                # Holdings -1
                count = count-1
                df_temp3.loc[i, "NFT_total_num"] = count
                # Remove the token from the portfolio
                porfolio_dict[df_temp3["collection_slug"][i]].remove(df_temp3["token_id"][i])
                df_temp3.at[i, "Profolio"] = copy.deepcopy(porfolio_dict)
                if df_temp3["collection_slug_tokenid"][i] in porfolio_costdict.keys():
                    profit = df_temp3["sellprice"][i] - porfolio_costdict[df_temp3["collection_slug_tokenid"][i]]
                    df_temp3.loc[i, "PL"] = profit
                    # Drop the key and value, since the token has been sold
                    porfolio_costdict.pop(df_temp3["collection_slug_tokenid"][i])
                    df_temp3.loc[i, "Position"] = sum(porfolio_costdict.values())
                    # Time the token was held, from purchase to sale
                    hold_period = df_temp3["Datetime"][i] - porfolio_datedict[df_temp3["collection_slug_tokenid"][i]]
                    df_temp3.loc[i, "HoldPeriod"] = hold_period
                    # Sale flag
                    df_temp3.loc[i, "Sell"] = 1
                    
                else:
                    # Normally unreachable
                    error.append([df_temp3["wallet_address_input"][0],df_temp3["collection_slug_tokenid"][i]])
                    df_temp3.at[i, "Profolio"] = copy.deepcopy(porfolio_dict)
                    df_temp3.loc[i, "Position"] = sum(porfolio_costdict.values())
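On the question above: the loop walks the wallet's history from oldest to newest, so each row's `NFT_total_num` and `Position` appear to describe the wallet's state just after that event, while `PL`, `HoldPeriod`, and `Sell` are set only on a sale that closes a round trip. A condensed restatement under simplifying assumptions: `round_trip_pnl` is a hypothetical helper, its `side`, `token_key`, and `price` columns stand in for `Buy_Sell`, `collection_slug_tokenid`, and `cost`/`sellprice`, and the portfolio snapshots and `Position` are omitted.

```python
import pandas as pd

def round_trip_pnl(events: pd.DataFrame) -> pd.DataFrame:
    """Condensed, hypothetical restatement of the loop above for one wallet.

    Expects columns: event_timestamp, side ("B" or "S"), token_key, price.
    Rows may arrive in any order; they are processed oldest first
    (ties keep their input order).
    """
    events = events.sort_values("event_timestamp", kind="stable").reset_index(drop=True)
    held = {}  # token_key -> (purchase price, purchase time)
    pnl, hold_period, n_held = [], [], []
    for row in events.itertuples(index=False):
        if row.side == "B":
            # As in the loop above, a repeat buy of a token already held changes nothing.
            held.setdefault(row.token_key, (row.price, row.event_timestamp))
            pnl.append(0.0)
            hold_period.append(pd.NaT)
        elif row.token_key in held:
            # Sale that closes a round trip: realise P&L and the holding period.
            cost, bought_at = held.pop(row.token_key)
            pnl.append(row.price - cost)
            hold_period.append(row.event_timestamp - bought_at)
        else:
            # Sale with no recorded purchase (e.g. transferred in): cost unknown.
            pnl.append(0.0)
            hold_period.append(pd.NaT)
        n_held.append(len(held))
    return events.assign(PL=pnl, HoldPeriod=hold_period, NFT_total_num=n_held)


# Toy events, newest first like df_temp3.
toy_events = pd.DataFrame({
    "event_timestamp": pd.to_datetime(["2022-01-03", "2022-01-02", "2022-01-01"]),
    "side": ["S", "S", "B"],
    "token_key": ["cats1", "dogs7", "cats1"],
    "price": [1.5, 0.3, 1.0],
})
result = round_trip_pnl(toy_events)
print(result[["event_timestamp", "side", "PL", "HoldPeriod", "NFT_total_num"]])
```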

In [26]:
# P&L is positive
def positive_SIGN(row):
    if row['PL_sign'] == 1:
        return 1
    return 0

# P&L is negative
def negative_SIGN(row):
    if row['PL_sign'] == -1 :
        return 1
    return 0

In [27]:
# Cumulative P&L over the complete buy-then-sell round trips in this wallet
# (rows run newest to oldest, so reverse, cumsum, then reverse back)
df_temp3['cum_PL'] = df_temp3.loc[::-1, 'PL'].cumsum()[::-1]
# Total profit
df_temp3['TotalRevenue'] = df_temp3['cum_PL'] - df_temp3["Position"]
# Sign of the P&L
df_temp3["PL_sign"] = np.sign(df_temp3["PL"])
# Cumulative number of sales
df_temp3["cum_Sell"] = df_temp3.loc[::-1, 'Sell'].cumsum()[::-1]
# Flag positive P&L
df_temp3["positive_sign"] = df_temp3.apply(positive_SIGN, axis=1)
# Flag negative P&L
df_temp3["negative_sign"] = df_temp3.apply(negative_SIGN, axis=1)
# Cumulative number of profitable sales
df_temp3["cum_positive_sign"] = df_temp3.loc[::-1, 'positive_sign'].cumsum()[::-1]
# Cumulative number of losing sales
df_temp3["cum_negative_sign"] = df_temp3.loc[::-1, 'negative_sign'].cumsum()[::-1]
# Win rate
df_temp3["winrate"] = df_temp3["cum_positive_sign"] / df_temp3['cum_Sell']
# Loss rate
df_temp3["lossrate"] = df_temp3["cum_negative_sign"] / df_temp3['cum_Sell']
# Fill missing values with 0
df_temp3["winrate"] = df_temp3["winrate"].fillna(0)
df_temp3["lossrate"] = df_temp3["lossrate"].fillna(0)
# Flag sales made by accepting an offer (WETH payment is used as the proxy)
df_temp3["Bid_sell"] = np.where((df_temp3["payment_token_symbol"]=="WETH")&(df_temp3["Buy_Sell"]=="S"), 1,0)
# Flag purchases made through an offer
df_temp3["Bid_buy"] = np.where((df_temp3["payment_token_symbol"]=="WETH")&(df_temp3["Buy_Sell"]=="B"), 1,0)
# Cumulative number of purchases made through offers
df_temp3["cum_Bid_buy"] = df_temp3.loc[::-1, 'Bid_buy'].cumsum()[::-1]
# Cumulative number of sales made by accepting offers
df_temp3["cum_Bid_sell"] = df_temp3.loc[::-1, 'Bid_sell'].cumsum()[::-1]

# Offer rates (buying through offers & selling by accepting offers): a high buy rate suggests the
# wallet is good at "fishing" for bargains; a high sell rate suggests lost confidence or an
# inability to resist a tempting high price.
df_temp3["Bid_sell_rate"] = df_temp3["cum_Bid_sell"] / df_temp3["cum_Sell"]
df_temp3["Bid_sell_rate"] = df_temp3["Bid_sell_rate"].fillna(0)
# NB: NFT_total_num is 0 whenever the wallet holds no tracked NFTs, so the two ratios below
# can contain inf (x/0) or NaN (0/0).
df_temp3["Bid_buy_rate"] = df_temp3["cum_Bid_buy"] / df_temp3["NFT_total_num"]
# NFTs sold / NFTs currently held
df_temp3["sellposition_rate"] = df_temp3["cum_Sell"]/df_temp3["NFT_total_num"]
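`winrate` and `lossrate` can hit `0/0`, while `Bid_buy_rate`, `sellposition_rate`, and `Bid_sell_rate` can hit `x/0`, which `fillna(0)` does not catch. A sketch of a hypothetical `safe_rate` helper that returns 0 whenever the denominator is 0, together with vectorised replacements for `positive_SIGN` / `negative_SIGN`, on toy `PL`/`Sell` columns kept in the same newest-first order as `df_temp3`:

```python
import numpy as np
import pandas as pd

def safe_rate(numerator: pd.Series, denominator: pd.Series) -> pd.Series:
    """Hypothetical helper: numerator / denominator, returning 0 where the denominator is 0."""
    den = denominator.astype(float)
    return (numerator.astype(float) / den.where(den != 0)).fillna(0.0)

# Toy columns in the same newest-first order as df_temp3.
pl = pd.Series([0.0, -0.2, 0.5, 0.0])
sell = pd.Series([0, 1, 1, 0])               # stand-in for the `Sell` flag
positive = (np.sign(pl) == 1).astype(int)    # vectorised positive_SIGN
negative = (np.sign(pl) == -1).astype(int)   # vectorised negative_SIGN

# Reverse, accumulate oldest -> newest, reverse back (same trick as the cell above).
cum_sell = sell[::-1].cumsum()[::-1]
winrate = safe_rate(positive[::-1].cumsum()[::-1], cum_sell)
lossrate = safe_rate(negative[::-1].cumsum()[::-1], cum_sell)
print(pd.DataFrame({"winrate": winrate, "lossrate": lossrate}))
```

Whether 0 is the right value for "no sales yet" is a modelling choice; keeping `NaN` plus an indicator is an alternative.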

In [28]:
df_temp3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5521 entries, 0 to 5520
Data columns (total 52 columns):
 #   Column                   Non-Null Count  Dtype          
---  ------                   --------------  -----          
 0   event_timestamp          5521 non-null   datetime64[ns] 
 1   event_type               5521 non-null   object         
 2   token_id                 5503 non-null   object         
 3   num_sales                5503 non-null   float64        
 4   listing_time             5472 non-null   datetime64[ns] 
 5   token_owner_address      5503 non-null   object         
 6   token_seller_address     5521 non-null   object         
 7   deal_price               5521 non-null   float64        
 8   payment_token_symbol     5521 non-null   object         
 9   payment_token_decimals   5521 non-null   float64        
 10  payment_token_usdprice   5521 non-null   float64        
 11  quantity                 5521 non-null   float64        
 12  asset_bundle        

In [29]:
df_temp3.describe().loc["mean"]

num_sales                                101.757587
deal_price                     597986835388445568.0
payment_token_decimals                         18.0
payment_token_usdprice                  2390.965238
quantity                                   1.049085
block_number                        13350087.601812
is_private                                 0.001279
duration                  3 days 12:03:26.675250534
deal_price_usd                          1429.741645
cost                                       0.169379
sellprice                                  0.428607
PL                                        -0.007802
NFT_total_num                            543.792972
Position                                 179.554514
Sell                                       0.098714
cum_PL                                    42.495312
TotalRevenue                            -137.059203
PL_sign                                    0.018475
cum_Sell                                 225.092556
positive_sig

## WIP

In [30]:
wallets.token_seller_address.nunique()

215099

In [31]:
sum(wallets.payment_token_symbol.isna())

39

In [33]:
wallets[~wallets.payment_token_symbol.isna()].quantity.unique()

array([1.0000e+00, 2.0000e+00, 4.0000e+00, 7.0000e+01, 3.0000e+00,
       9.0000e+00, 1.0000e+01, 2.0000e+01, 5.0000e+00, 1.5000e+01,
       6.0000e+00, 2.5000e+01, 5.0000e+01, 7.0000e+00, 8.0000e+00,
       1.1000e+01, 1.2000e+01, 6.5000e+02, 1.4000e+01, 1.3000e+01,
       1.9000e+01, 1.6000e+01,        nan, 1.2800e+02, 1.0000e+11,
       1.8000e+01, 2.4000e+01, 2.2000e+01, 2.1000e+01, 1.7000e+01,
       2.9000e+01, 3.5000e+01, 4.0000e+01, 2.8000e+01, 3.0000e+01,
       1.0000e+02, 6.4000e+01, 3.2000e+01, 1.9200e+02, 4.2000e+02,
       2.5000e+02, 5.1000e+01, 2.0000e+02, 6.9000e+01, 1.0000e+03,
       8.0000e+01, 1.7200e+02, 1.8800e+02, 5.5000e+01, 1.0000e+22,
       4.0000e+18, 2.7000e+01, 1.0000e+09, 1.0000e+04, 2.5000e+03,
       2.3000e+01, 4.9000e+01, 2.6000e+01, 2.4700e+02, 6.0000e+03,
       1.0000e+18, 1.2618e+04, 8.8880e+03, 8.7000e+01, 8.9000e+01,
       1.0000e+10, 1.0000e+19, 5.0000e+18, 3.1000e+01, 1.2500e+02,
       4.3000e+10, 6.1650e+10, 2.5000e+10, 6.1200e+20, 9.2100e

In [34]:
wallets[wallets.quantity.isna()]

Unnamed: 0,event_timestamp,event_type,token_id,num_sales,listing_time,token_owner_address,token_seller_address,deal_price,payment_token_symbol,payment_token_decimals,...,transaction_hash,block_hash,block_number,is_private,duration,created_date,collection_slug,contract_address,wallet_address_input,deal_price_usd
97775,2020-06-27 19:12:07,buy,147226,1.0,NaT,0x3bd77b00f02c8bcff586c565e2c5e6b6c5878ec3,0x42a60d2f2ffa2150c568010a8d425f0aad284fd2,3.249210e+16,ETH,18.0,...,0x2ff432a6205be89a625a7016815b5a81991541d94173...,0x0c2cd5d23b6676863b2278e225ed650f4c0884860c4b...,10349619.0,,NaT,2020-06-27T19:12:48.763876,axie,0xf4985070ce32b6b1994329df787d1acc9a2dd9e2,0x3bd77b00f02c8bcff586c565e2c5e6b6c5878ec3,75.729669
97776,2020-06-20 16:16:52,buy,44584,2.0,NaT,0x6d1f5fac38edca69b9d02637a173c1e62d331896,0x8b3ad493c077e894a034db7eb53e8285560298fd,2.450000e+16,ETH,18.0,...,0x21746dd417132bb6e3b6ca77c809ba62322302ff16ea...,0x7124a292f83004a5f888acd3d808a4e6544c94568534...,10303653.0,,NaT,2020-06-20T16:17:15.773754,axie,0xf4985070ce32b6b1994329df787d1acc9a2dd9e2,0x3bd77b00f02c8bcff586c565e2c5e6b6c5878ec3,57.102395
97777,2020-06-19 17:01:06,buy,140164,4.0,NaT,0xc301878610b1952e94a57d9958fba4ec043537e4,0x9b7061023cd42263448d48c48572507f19f39b78,7.500000e+16,ETH,18.0,...,0xb0993b84033070e3b6e11c10e55ffcc78386f287188c...,0xc08102bade64f5976a023213267cc09b06dc714a5df4...,10297405.0,,NaT,2020-06-19T17:01:53.569908,axie,0xf4985070ce32b6b1994329df787d1acc9a2dd9e2,0x3bd77b00f02c8bcff586c565e2c5e6b6c5878ec3,174.803250
97778,2020-06-19 16:56:36,buy,71500,3.0,NaT,0x3bd77b00f02c8bcff586c565e2c5e6b6c5878ec3,0x937b9093cd9d8798930e394d188f9b5596d49f54,5.000000e+16,ETH,18.0,...,0x97ce92af142c265957c66c5db4909a66d282b39e2df6...,0xe42efeaef744ded41ffd2d35a40eccef40a87c89a98b...,10297384.0,,NaT,2020-06-19T16:57:06.608060,axie,0xf4985070ce32b6b1994329df787d1acc9a2dd9e2,0x3bd77b00f02c8bcff586c565e2c5e6b6c5878ec3,116.535500
97779,2020-06-19 16:52:38,buy,121128,2.0,NaT,0x7a9112792211461205db4191381866b0508fa4a8,0xb4d055d63cc6a7bfb51a588ddaeb245ce5e3fc48,1.890000e+16,ETH,18.0,...,0xeb9f95b93ee31757b257ba06a30d35a56bb0de35a511...,0xf0237c4fada0f5fb6fdde8c1ba8702bac7cf1598ea44...,10297361.0,,NaT,2020-06-19T16:52:55.760877,axie,0xf4985070ce32b6b1994329df787d1acc9a2dd9e2,0x3bd77b00f02c8bcff586c565e2c5e6b6c5878ec3,44.050419
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2168131,2019-05-20 01:12:22,buy,23300,1.0,2019-05-20 00:09:28,0xa5a0b7c3dd5dddbfbd51e56b9170bb6d1253788b,0x05f9bbbd6af1699cb3ca8c14bd38b2b47bd6b2ec,1.000000e+15,ETH,18.0,...,0x37e0c7d0b3b606c2dd9ec83fdf3c1c6aff70e02b60d8...,"b""t`\x91E\x9d\x96;\x9e\xe3\xd6y'\xfaO\xca\xbe\...",7793907.0,0.0,0 days 01:02:54,2019-05-20T01:12:56.975962,neon-district,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0xa5a0b7c3dd5dddbfbd51e56b9170bb6d1253788b,2.382100
2168132,2019-05-20 01:12:07,buy,32194,1.0,2019-05-20 00:11:56,0xa5a0b7c3dd5dddbfbd51e56b9170bb6d1253788b,0x05f9bbbd6af1699cb3ca8c14bd38b2b47bd6b2ec,1.000000e+15,ETH,18.0,...,0x0362ec6a23c7cc9aff6ca78587ffaa09bf8708ad51cd...,"b""w\xb2\xa2\xecN\xe2\xf4\x10N\xd8\xfc\xa5\x8d'...",7793905.0,0.0,0 days 01:00:11,2019-05-20T01:12:31.755591,neon-district,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0xa5a0b7c3dd5dddbfbd51e56b9170bb6d1253788b,2.382100
2168133,2019-05-20 01:10:19,buy,23301,1.0,2019-05-20 00:09:59,0xa5a0b7c3dd5dddbfbd51e56b9170bb6d1253788b,0x05f9bbbd6af1699cb3ca8c14bd38b2b47bd6b2ec,1.000000e+15,ETH,18.0,...,0x55145bccaeed167e32804c9b2af89a859993ddba6598...,b'\x06B\x1c7D\xcd\x89\xd1\xbe\xcb\xf1\xd5~?v\x...,7793900.0,0.0,0 days 01:00:20,2019-05-20T01:10:57.798340,neon-district,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0xa5a0b7c3dd5dddbfbd51e56b9170bb6d1253788b,2.382100
2168134,2019-05-20 01:09:42,buy,33912,1.0,2019-05-20 00:16:54,0xa5a0b7c3dd5dddbfbd51e56b9170bb6d1253788b,0x05f9bbbd6af1699cb3ca8c14bd38b2b47bd6b2ec,1.000000e+15,ETH,18.0,...,0x613a7d28e158d5db005e808dbd4d152cbf0ed8968b68...,b'4>S\xda\xff\x15\x02\xe5\x1dP=\xaa\xfb{]\xc7\...,7793896.0,0.0,0 days 00:52:48,2019-05-20T01:10:15.673318,neon-district,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0xa5a0b7c3dd5dddbfbd51e56b9170bb6d1253788b,2.382100


## Create features

In [35]:
wallets.info(show_counts=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2398450 entries, 0 to 2398449
Data columns (total 24 columns):
 #   Column                  Non-Null Count    Dtype          
---  ------                  --------------    -----          
 0   event_timestamp         2398450 non-null  datetime64[ns] 
 1   event_type              2398450 non-null  object         
 2   token_id                2382796 non-null  object         
 3   num_sales               2382796 non-null  float64        
 4   listing_time            2268796 non-null  datetime64[ns] 
 5   token_owner_address     2382796 non-null  object         
 6   token_seller_address    2395806 non-null  object         
 7   deal_price              2398450 non-null  float64        
 8   payment_token_symbol    2398411 non-null  object         
 9   payment_token_decimals  2398445 non-null  float64        
 10  payment_token_usdprice  2397937 non-null  float64        
 11  quantity                2394188 non-null  float64        
 12  

### Wallet age (WIP)

In [47]:
df=wallets.groupby("user_account_address") \
    .agg({"event_timestamp": ["max", "min"]}) \
    .assign(wallet_age=lambda x : x.loc[:, ("event_timestamp", "max")] - x.loc[:, ("event_timestamp", "min")])

In [48]:
df.columns.set_levels([['event_timestamp'], ['max', 'min', 'wallet_age']], level=[1,1])

MultiIndex([('event_timestamp',        'max'),
            ('event_timestamp',        'min'),
            (     'wallet_age', 'wallet_age')],
           )

In [49]:
df

| user_account_address | event_timestamp (max) | event_timestamp (min) | wallet_age |
|---|---|---|---|
| 0x000000000000123ca35c69ba3f852a46b2a27c94 | 2022-02-17 09:37:34 | 2022-02-17 09:08:33 | 0 days 00:29:01 |
| 0x0000000000015b23c7e20b0ea5ebd84c39dcbe60 | 2021-07-26 15:47:53 | 2021-06-07 23:09:17 | 48 days 16:38:36 |
| 0x00000000000360176d958e11c140308cd0863679 | 2021-09-09 19:10:47 | 2021-08-06 06:43:27 | 34 days 12:27:20 |
| 0x000000000004d7463d0f9c77383600bc82d612f5 | 2021-11-14 09:27:10 | 2021-11-14 09:27:10 | 0 days 00:00:00 |
| 0x00000000000a486c964069bb7390ae37010a04ca | 2022-05-03 01:38:31 | 2022-04-18 23:53:59 | 14 days 01:44:32 |
| ... | ... | ... | ... |
| 0xffffe59e4ebefce216470864fd92407023288cb4 | 2022-01-30 09:43:45 | 2022-01-30 09:43:45 | 0 days 00:00:00 |
| 0xffffe96d5df4b535022bcf9a901716ba3ebd8a82 | 2022-02-06 21:14:21 | 2022-01-06 15:05:31 | 31 days 06:08:50 |
| 0xfffff6e70842330948ca47254f2be673b1cb0db7 | 2021-06-30 17:05:50 | 2021-06-29 22:57:27 | 0 days 18:08:23 |
| 0xffffff5800b709071d4adc74759ae4b89bef2a9d | 2022-05-04 17:03:44 | 2022-04-19 04:24:04 | 15 days 12:39:40 |


In [50]:
df.droplevel(0, axis=1)

| user_account_address | max | min | |
|---|---|---|---|
| 0x000000000000123ca35c69ba3f852a46b2a27c94 | 2022-02-17 09:37:34 | 2022-02-17 09:08:33 | 0 days 00:29:01 |
| 0x0000000000015b23c7e20b0ea5ebd84c39dcbe60 | 2021-07-26 15:47:53 | 2021-06-07 23:09:17 | 48 days 16:38:36 |
| 0x00000000000360176d958e11c140308cd0863679 | 2021-09-09 19:10:47 | 2021-08-06 06:43:27 | 34 days 12:27:20 |
| 0x000000000004d7463d0f9c77383600bc82d612f5 | 2021-11-14 09:27:10 | 2021-11-14 09:27:10 | 0 days 00:00:00 |
| 0x00000000000a486c964069bb7390ae37010a04ca | 2022-05-03 01:38:31 | 2022-04-18 23:53:59 | 14 days 01:44:32 |
| ... | ... | ... | ... |
| 0xffffe59e4ebefce216470864fd92407023288cb4 | 2022-01-30 09:43:45 | 2022-01-30 09:43:45 | 0 days 00:00:00 |
| 0xffffe96d5df4b535022bcf9a901716ba3ebd8a82 | 2022-02-06 21:14:21 | 2022-01-06 15:05:31 | 31 days 06:08:50 |
| 0xfffff6e70842330948ca47254f2be673b1cb0db7 | 2021-06-30 17:05:50 | 2021-06-29 22:57:27 | 0 days 18:08:23 |
| 0xffffff5800b709071d4adc74759ae4b89bef2a9d | 2022-05-04 17:03:44 | 2022-04-19 04:24:04 | 15 days 12:39:40 |


In [52]:
# Flatten the column MultiIndex, recompute wallet_age from the flat columns, sort ascending
(df.reset_index(col_level=1)
   .droplevel(0, axis=1)
   .assign(wallet_age=lambda x: x["max"] - x["min"])
   .sort_values("wallet_age"))

,user_account_address,max,min,,wallet_age
107549,0x7fe24b3b9a1003afb0af084e084a555d7e40e7d4,2022-04-25 13:34:57,2022-04-25 13:34:57,0 days 00:00:00,0 days 00:00:00
118565,0x8d03b5788b1ed9951ac37c72b4a14a7fea946c5b,2022-04-17 03:36:47,2022-04-17 03:36:47,0 days 00:00:00,0 days 00:00:00
118564,0x8d033ec65dcfd6fcccef68b1f4c895d1e88dc580,2022-01-12 22:38:38,2022-01-12 22:38:38,0 days 00:00:00,0 days 00:00:00
118561,0x8d0261fe9f96aa3c604ae3d1fb7c06c47f01122c,2021-09-10 08:01:31,2021-09-10 08:01:31,0 days 00:00:00,0 days 00:00:00
118560,0x8d01bdf55fa7f1ccfef7b670a11b8c14faf827bf,2021-10-24 01:19:27,2021-10-24 01:19:27,0 days 00:00:00,0 days 00:00:00
...,...,...,...,...,...
57641,0x442dccee68425828c106a3662014b4f131e3bd9b,2022-05-05 02:13:02,2018-03-01 06:27:09,1525 days 19:45:53,1525 days 19:45:53
16022,0x12a0e25e62c1dbd32e505446062b26aecb65f028,2022-05-07 12:12:30,2018-02-24 18:31:49,1532 days 17:40:41,1532 days 17:40:41
35901,0x2a5ba6819249aa93c0ad8711a9f8058360083fb7,2022-05-09 13:13:33,2018-02-26 02:14:48,1533 days 10:58:45,1533 days 10:58:45
177796,0xd387a6e4e84a6c86bd90c158c6028a58cc8ac459,2022-05-08 21:45:05,2018-02-24 19:53:03,1534 days 01:52:02,1534 days 01:52:02
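The `max`/`min` MultiIndex columns above need `droplevel` and a recomputed `wallet_age` before they can be sorted. A sketch of the same calculation with pandas named aggregation, which produces flat column names directly. The toy addresses and the names `first_seen`/`last_seen` are illustrative, not from the notebook. This "age" only spans the first and last events present in this dataset, not the wallet's full on-chain history.

```python
import pandas as pd

# Toy event log; column names mirror the notebook, values are made up
events = pd.DataFrame({
    "user_account_address": ["0xaaa", "0xaaa", "0xbbb"],
    "event_timestamp": pd.to_datetime([
        "2022-02-17 09:08:33", "2022-02-17 09:37:34", "2021-11-14 09:27:10",
    ]),
})

# Named aggregation gives flat column names, so no droplevel/reset_index juggling is needed
wallet_age = (
    events.groupby("user_account_address")
          .agg(first_seen=("event_timestamp", "min"),
               last_seen=("event_timestamp", "max"))
          .assign(wallet_age=lambda d: d["last_seen"] - d["first_seen"])
          .sort_values("wallet_age")
)
print(wallet_age)
```

On the real data the analogous call would group `wallets` by `user_account_address`.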


## Transform dataset

In [54]:
wallets.user_account_address.nunique()

215099

In [42]:
wallets.columns

Index(['event_timestamp', 'event_type', 'token_id', 'num_sales',
       'listing_time', 'token_owner_address', 'token_seller_address',
       'deal_price', 'payment_token_symbol', 'payment_token_decimals',
       'payment_token_usdprice', 'quantity', 'asset_bundle', 'auction_type',
       'transaction_hash', 'block_hash', 'block_number', 'is_private',
       'duration', 'created_date', 'collection_slug', 'contract_address',
       'wallet_address_input', 'deal_price_usd'],
      dtype='object')

In [55]:
wallets.wallet_address_input.nunique()

9766

In [44]:
wallets.groupby(["user_account_address", "event_type"]).sum()

user_account_address,event_type,num_sales,deal_price,payment_token_decimals,payment_token_usdprice,quantity,block_number,is_private,deal_price_usd
0x000000000000123ca35c69ba3f852a46b2a27c94,buy,3.0,1.479000e+18,36.0,4773.25,2.0,28445621.0,0.0,3536.472260
0x0000000000015b23c7e20b0ea5ebd84c39dcbe60,buy,10.0,3.730000e+18,36.0,4737.92,2.0,25493004.0,0.0,8785.110700
0x00000000000360176d958e11c140308cd0863679,buy,16799.0,4.304900e+18,216.0,28500.43,12.0,156312940.0,0.0,10278.447125
0x000000000004d7463d0f9c77383600bc82d612f5,buy,2.0,2.500000e+18,18.0,2249.89,1.0,13613175.0,0.0,5624.725000
0x00000000000a486c964069bb7390ae37010a04ca,buy,3.0,5.440000e+16,36.0,4731.51,2.0,29313998.0,0.0,128.946970
...,...,...,...,...,...,...,...,...,...
0xffffe59e4ebefce216470864fd92407023288cb4,buy,3.0,4.790000e+17,18.0,2392.29,1.0,14106408.0,0.0,1145.906910
0xffffe96d5df4b535022bcf9a901716ba3ebd8a82,buy,4.0,2.330000e+18,36.0,4713.89,2.0,28107432.0,0.0,5460.431400
0xfffff6e70842330948ca47254f2be673b1cb0db7,buy,6.0,9.195000e+18,54.0,7077.25,3.0,38203705.0,0.0,22076.943750
0xffffff5800b709071d4adc74759ae4b89bef2a9d,buy,2.0,9.600000e+16,36.0,4732.86,2.0,29325598.0,0.0,227.553940
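Calling `.sum()` on every numeric column also adds up fields such as `block_number` and `payment_token_decimals`, whose totals carry no meaning. A sketch that aggregates only columns whose per-user totals make sense, on made-up rows. The output names `n_events`, `total_usd`, and `total_quantity` are illustrative.

```python
import pandas as pd

# Illustrative rows; column names follow the notebook
events = pd.DataFrame({
    "user_account_address": ["0xaaa", "0xaaa", "0xaaa", "0xbbb"],
    "event_type": ["buy", "buy", "sell", "buy"],
    "deal_price_usd": [1584.8828, 1951.58946, 2501.205, 128.94697],
    "quantity": [1.0, 1.0, 1.0, 1.0],
})

# Aggregate only meaningful columns per (user, event type)
summary = (
    events.groupby(["user_account_address", "event_type"])
          .agg(n_events=("deal_price_usd", "size"),   # number of rows in the group
               total_usd=("deal_price_usd", "sum"),
               total_quantity=("quantity", "sum"))
)
print(summary)
```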


In [45]:
wallets[wallets.user_account_address == "0x000000000000123ca35c69ba3f852a46b2a27c94"]

,event_timestamp,event_type,token_id,num_sales,listing_time,token_owner_address,user_account_address,deal_price,payment_token_symbol,payment_token_decimals,...,transaction_hash,block_hash,block_number,is_private,duration,created_date,collection_slug,contract_address,wallet_address_input,deal_price_usd
86922,2022-02-17 09:37:34,buy,7527,1.0,2022-02-15 20:00:50,0xd311bdacb151b72bddfee9cbdc414af22a5e38dc,0x000000000000123ca35c69ba3f852a46b2a27c94,6.8e+17,ETH,18.0,...,0xabd7bba058aceebb37d724fb0875579ef7e447884289...,0xb726498fab138863e12d88abf217b70b1facdc69c2b4...,14222873.0,0.0,1 days 13:36:44,2022-02-17T09:37:47.808970,raidpartyfighters,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0x4bdc1cad2d045ec17955688d0fefeb99a385c30f,1584.8828
2277773,2022-02-17 09:08:33,buy,354,2.0,2022-02-15 20:01:20,0xd311bdacb151b72bddfee9cbdc414af22a5e38dc,0x000000000000123ca35c69ba3f852a46b2a27c94,7.99e+17,ETH,18.0,...,0x26549fd9bf01563258e633a6c6d0239718cff37f35a8...,0xfce46d606649b8608f9e1b31e3ee6de18d157fbd8d05...,14222748.0,0.0,1 days 13:07:13,2022-02-17T09:08:52.893135,raidpartyfighters,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0x150fa2afc4db393b4d231cbace82ecfc7d3b4be9,1951.58946
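`deal_price` is stored in the payment token's smallest unit (wei for ETH, which uses 18 decimals), so dividing by `10 ** payment_token_decimals` gives the amount in ETH. For the two rows above, the USD rates implied by `deal_price_usd` (about 2330.71 and 2442.54) add up to 4773.25, the `payment_token_usdprice` total shown for this address in the `groupby(...).sum()` output. That suggests `deal_price_usd = deal_price / 10**payment_token_decimals * payment_token_usdprice`, a relationship inferred from these rows rather than documented here. A small check in plain Python; the helper name `to_token_amount` is hypothetical.

```python
import math

def to_token_amount(deal_price: float, decimals: int) -> float:
    """Convert a raw on-chain amount (e.g. wei) into whole token units (e.g. ETH)."""
    return deal_price / 10 ** decimals

# Values copied from the two rows shown above:
# (deal_price in smallest unit, payment_token_decimals, deal_price_usd)
rows = [
    (6.8e17, 18, 1584.8828),
    (7.99e17, 18, 1951.58946),
]

# Back out the USD-per-ETH rate each row implies
implied_rates = [usd / to_token_amount(price, dec) for price, dec, usd in rows]
for (price, dec, _), rate in zip(rows, implied_rates):
    print(f"{to_token_amount(price, dec):.3f} ETH at ~{rate:.2f} USD/ETH")
```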


In [59]:
wallets.groupby("user_account_address")["event_type"].nunique().sort_values()

user_account_address
0x000000000000123ca35c69ba3f852a46b2a27c94    1
0xa9016ca47a8234986e5f32424948c86e7685917f    1
0xa90177eb7a438b534518d6c152becd730bd65121    1
0xa901bdf0b405069f671320b9d7bfeeb30dade032    1
0xa901ca455ba935b5dd8bbf8dd986ec34b931f8e3    1
                                             ..
0x0aa568cfc61041aa215cce4a39b883004276a0be    2
0xa5eae3eacf95a344cc5c54413729cf5331b9b495    2
0x4949338bb2586b9e99b6fb48f3ce8f3cd88a5aac    2
0x87f4efff19b3ddd7302f5c2219382ff1211139eb    2
0x5cd1c9be0bbe4294d70a87a826323958caf94e4a    2
Name: event_type, Length: 215099, dtype: int64
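The sorted Series only shows its two ends. Counting users per value summarizes how many wallets have a single event type versus both. A sketch on toy data, assuming `event_type` holds only "buy" and "sell", as in the outputs here:

```python
import pandas as pd

# Toy event log; addresses and events are made up
events = pd.DataFrame({
    "user_account_address": ["0xaaa", "0xaaa", "0xbbb", "0xccc", "0xccc"],
    "event_type": ["buy", "sell", "buy", "sell", "sell"],
})

# Distinct event types per user (1 = only buys or only sells, 2 = both)
n_types = events.groupby("user_account_address")["event_type"].nunique()

# Number of users in each bucket
counts = n_types.value_counts().sort_index()
print(counts)
```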

# Explore Data



## Which users both bought and sold NFTs during the specified period?

In [104]:
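# Users with more than one distinct event type (buy and sell in this data)
x = wallets.groupby("user_account_address")["event_type"].nunique().reset_index()
# Keep just the address column; a second reset_index() would add an "index" column that later gets summed
x = x.loc[x.event_type > 1, ["user_account_address"]]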

In [106]:
# `y` is not defined in this section; it must come from an earlier cell
z = x.merge(y, on="user_account_address")
z.groupby(["user_account_address", "event_type"]).sum().reset_index().sort_values("user_account_address")

,user_account_address,event_type,index,num_sales,deal_price,payment_token_decimals,payment_token_usdprice,quantity,block_number,is_private,deal_price_usd
0,0x000000000ad266ec3db44bbe580e87f9baa358e6,buy,20,1.0,1.069000e+18,18.0,2397.48,1.0,1.469526e+07,0.0,2562.906120
1,0x001096190e8f3fa24b749704be231ab232348cb1,sell,6768,23337.0,2.661100e+19,846.0,111853.15,47.0,6.755032e+08,0.0,63338.054350
2,0x0014ea9bbe130c8af7f00c8e61fc07368bdaaf7d,buy,154,2.0,2.700000e+16,18.0,2380.60,1.0,1.302324e+07,0.0,64.276200
3,0x003e790127e6375be0e016e3294de76fe57ed369,buy,319,3.0,3.000000e+17,18.0,2380.60,1.0,1.309129e+07,0.0,714.180000
4,0x00493aa44bcfd6f0c2ecc7f8b154e4fb352d1c81,buy,716,3.0,3.500000e+17,36.0,4727.62,2.0,2.771503e+07,0.0,819.058500
...,...,...,...,...,...,...,...,...,...,...,...
3533,0xff955eff3d270d44b39d228f7ecdfe41ad5760b3,buy,214773,36280.0,1.030000e+18,18.0,2396.91,1.0,1.405268e+07,0.0,2468.817300
3534,0xff955eff3d270d44b39d228f7ecdfe41ad5760b3,sell,15678429,75488.0,2.264989e+19,1314.0,174437.31,82.0,9.803911e+08,1.0,54122.462054
3535,0xffa4d998539cc03b97bbc5ffab6232e08dd5201f,sell,18044376,43346.0,2.443730e+19,1512.0,199970.40,84.0,1.180700e+09,0.0,58175.436380
3536,0xffaeb8245a90057fe513f45ef571e102788fd71d,buy,214842,1.0,1.550000e+18,18.0,2397.48,1.0,1.444719e+07,0.0,3716.094000
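A crosstab of users by event type answers the same question without the intermediate `reset_index` and merge steps, and the resulting counts are usable features in their own right. A sketch on toy data, assuming `event_type` only takes the values "buy" and "sell"; on the real frame the analogous call would be `pd.crosstab(wallets.user_account_address, wallets.event_type)`.

```python
import pandas as pd

# Toy event log; addresses and events are made up
events = pd.DataFrame({
    "user_account_address": ["0xaaa", "0xaaa", "0xbbb", "0xccc", "0xccc"],
    "event_type": ["buy", "sell", "buy", "sell", "sell"],
})

# Rows: users, columns: event types, cells: number of events
counts = pd.crosstab(events["user_account_address"], events["event_type"])

# Users with at least one buy and at least one sell
both = counts[(counts["buy"] > 0) & (counts["sell"] > 0)]
print(both)
```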


In [99]:
wallets[wallets.user_account_address == "0xfffa6fc6acc3dbe04b175862376f1c5ff88cf9c1"]

,event_timestamp,event_type,token_id,num_sales,listing_time,token_owner_address,user_account_address,deal_price,payment_token_symbol,payment_token_decimals,...,transaction_hash,block_hash,block_number,is_private,duration,created_date,collection_slug,contract_address,wallet_address_input,deal_price_usd
21607,2022-04-16 00:23:47,buy,6825,4.0,2022-04-15 23:02:26,0x3c050243e71db15ed07e05784eb9c9b74f7a3b71,0xfffa6fc6acc3dbe04b175862376f1c5ff88cf9c1,8e+18,ETH,18.0,...,0x0b579aaa7c5e0559d9750ece43e6050158d6d6d1f768...,0x5288a812de59d7091f7dfa049d52ed1287f81e89bcc2...,14593138.0,0.0,0 days 01:21:21,2022-04-16T00:24:25.491437,cool-cats-nft,0x7f268357a8c2552623316e2562d90e642bb538e5,0x3c050243e71db15ed07e05784eb9c9b74f7a3b71,19179.84
156452,2022-04-01 12:19:19,buy,8053,6.0,2022-03-31 19:31:11,0xc1a282e93651a3959bc4c0b15dc8879ae22086b1,0xfffa6fc6acc3dbe04b175862376f1c5ff88cf9c1,2.5e+19,ETH,18.0,...,0x91aa71143ce6f0814f483683d4d53c80597c183b506d...,0xd75c468f6db8ca2b3d83027f08ad5c677b8d76214357...,14500315.0,0.0,0 days 16:48:08,2022-04-01T12:19:53.978532,azuki,0x7f268357a8c2552623316e2562d90e642bb538e5,0xc1a282e93651a3959bc4c0b15dc8879ae22086b1,57509.0
184616,2022-03-02 06:40:15,buy,4502,4.0,2022-03-01 22:56:14,0xd75233704795206de38cc58b77a1f660b5c60896,0xfffa6fc6acc3dbe04b175862376f1c5ff88cf9c1,1.125e+20,ETH,18.0,...,0x4b1ab04b1df5a77bfa0e614cc0f8f1708cafcf4a826e...,0xdcfbff895b037ed5bf100ecf682157d964687017eb68...,14306038.0,0.0,0 days 07:44:01,2022-03-02T06:40:52.256536,boredapeyachtclub,0x7f268357a8c2552623316e2562d90e642bb538e5,0xd75233704795206de38cc58b77a1f660b5c60896,259995.375
2171542,2022-05-01 05:25:25,sell,3514,8.0,2022-05-01 00:18:37,0x272937fbc0d9afb04eda6e67973346eb5998327b,0xfffa6fc6acc3dbe04b175862376f1c5ff88cf9c1,1.05e+18,ETH,18.0,...,0x526401149dfcb40904f432a299004473802ae61e2f4d...,0xc83eb16ffa8e69851c679859875f90475b4502c866e3...,14690113.0,0.0,0 days 05:06:48,2022-05-01T05:25:34.444577,fragments-by-james-jean,0x7f268357a8c2552623316e2562d90e642bb538e5,0xfffa6fc6acc3dbe04b175862376f1c5ff88cf9c1,2501.205
2171543,2022-05-01 05:24:50,sell,6270,4.0,2022-05-01 00:19:08,0xaae6b492d1e1bc0e8263d9965a2e8e565d6d176b,0xfffa6fc6acc3dbe04b175862376f1c5ff88cf9c1,1.05e+18,ETH,18.0,...,0xd0a4374da6a8e761749964c3a52d3e3407d5fa73f9b9...,0x2f4089732165e442a074d8c57c287f56607e961ddb88...,14690112.0,0.0,0 days 05:05:42,2022-05-01T05:25:32.269839,fragments-by-james-jean,0x7f268357a8c2552623316e2562d90e642bb538e5,0xfffa6fc6acc3dbe04b175862376f1c5ff88cf9c1,2501.205
2171544,2022-05-01 05:22:59,sell,2356,1.0,2022-05-01 00:18:51,0x8471d92cd0d98bee82258c106adf147bde76ed9f,0xfffa6fc6acc3dbe04b175862376f1c5ff88cf9c1,1.05e+18,ETH,18.0,...,0x9fc1003b78d1b596bec4918cceacadfd8e66cba47e2d...,0x888a49d4233d15b4b0025028585a6de8497421d496a3...,14690100.0,0.0,0 days 05:04:08,2022-05-01T05:23:19.645256,fragments-by-james-jean,0x7f268357a8c2552623316e2562d90e642bb538e5,0xfffa6fc6acc3dbe04b175862376f1c5ff88cf9c1,2501.205
2171545,2022-05-01 05:22:38,sell,2357,1.0,2022-05-01 00:19:40,0x445791c312981d59e0d1f464ba64d872bc016b83,0xfffa6fc6acc3dbe04b175862376f1c5ff88cf9c1,1.05e+18,ETH,18.0,...,0x6e81442dc86543d80c28bbcbdc985c1645fd0fe4a266...,0xbfa84f7f98591d76550419bb5831365f8c2e3b530807...,14690099.0,0.0,0 days 05:02:58,2022-05-01T05:23:01.828797,fragments-by-james-jean,0x7f268357a8c2552623316e2562d90e642bb538e5,0xfffa6fc6acc3dbe04b175862376f1c5ff88cf9c1,2501.205
2171546,2022-05-01 05:21:08,sell,2358,1.0,2022-05-01 00:19:18,0x0e453354e99553c807561683d538cbf1457bd76d,0xfffa6fc6acc3dbe04b175862376f1c5ff88cf9c1,1.05e+18,ETH,18.0,...,0x8c83e3718ca767bc9d2dc83f4927836e8f68bb90a0f4...,0xcf4b5418846deaa0d075dc0c567c9048e8bc2ab20878...,14690090.0,0.0,0 days 05:01:50,2022-05-01T05:21:14.800199,fragments-by-james-jean,0x7f268357a8c2552623316e2562d90e642bb538e5,0xfffa6fc6acc3dbe04b175862376f1c5ff88cf9c1,2501.205
2171547,2022-05-01 05:18:36,sell,2355,3.0,2022-05-01 00:19:30,0x11d4e98df3ce5bd7874e83cfa3867e804d455909,0xfffa6fc6acc3dbe04b175862376f1c5ff88cf9c1,1.05e+18,ETH,18.0,...,0xecdee41c49df0c603faa5456bfa1a099ec837f44ef81...,0x467678764b691995b2cb580410c181e22efb2e746b51...,14690078.0,0.0,0 days 04:59:06,2022-05-01T05:18:57.361462,fragments-by-james-jean,0x7f268357a8c2552623316e2562d90e642bb538e5,0xfffa6fc6acc3dbe04b175862376f1c5ff88cf9c1,2501.205
2171548,2022-05-01 05:12:51,sell,6704,1.0,2022-05-01 00:18:59,0x050f30bd067ac136b471ed7cb7e7be05ca11d779,0xfffa6fc6acc3dbe04b175862376f1c5ff88cf9c1,1.05e+18,ETH,18.0,...,0x3599535d02ce28f1153c3b968d13c2ba6b70edcbe6e6...,0xcb54bc103a9daa1cccb81d2738f589f09448e04c5b5a...,14690057.0,0.0,0 days 04:53:52,2022-05-01T05:13:22.644746,fragments-by-james-jean,0x7f268357a8c2552623316e2562d90e642bb538e5,0xfffa6fc6acc3dbe04b175862376f1c5ff88cf9c1,2501.205
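The rows above show this wallet recording seven `fragments-by-james-jean` sells in under 13 minutes (05:12:51 to 05:25:25). The gap between consecutive events per wallet is one way to capture that kind of burst as a feature. A sketch using the timestamps of those sell rows; the shortened address and the `gap_s` column name are illustrative.

```python
import pandas as pd

# Timestamps copied from the sell rows above; address shortened for readability
sells = pd.DataFrame({
    "user_account_address": ["0xfffa6f"] * 7,
    "event_timestamp": pd.to_datetime([
        "2022-05-01 05:25:25", "2022-05-01 05:24:50", "2022-05-01 05:22:59",
        "2022-05-01 05:22:38", "2022-05-01 05:21:08", "2022-05-01 05:18:36",
        "2022-05-01 05:12:51",
    ]),
})

# Sort chronologically within each wallet, then take the gap to the previous event in seconds
sells = sells.sort_values(["user_account_address", "event_timestamp"])
sells["gap_s"] = (
    sells.groupby("user_account_address")["event_timestamp"].diff().dt.total_seconds()
)

# Per-wallet gap summary; the first event of each wallet has no gap (NaN) and is skipped
gap_stats = sells.groupby("user_account_address")["gap_s"].agg(["median", "max"])
print(gap_stats)
```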


# Note
Feature engineering in ML typically involves four steps:
1. Feature creation
1. Transformations
1. Feature extraction
1. Feature selection
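The first two steps can be sketched as a per-wallet feature table built from the event log. This is a minimal sketch on made-up rows, not the notebook's own pipeline. The feature names (`n_events`, `n_collections`, `usd_buy`, `usd_sell`, `wallet_age_days`, `log_usd_buy`) are hypothetical, and the log scale is only a common choice for heavy-tailed amounts. Extraction and selection are left out.

```python
import numpy as np
import pandas as pd

# Illustrative rows only; column names follow the notebook
events = pd.DataFrame({
    "user_account_address": ["0xaaa", "0xaaa", "0xaaa", "0xbbb"],
    "event_type": ["buy", "buy", "sell", "buy"],
    "event_timestamp": pd.to_datetime([
        "2022-02-17 09:08:33", "2022-02-17 09:37:34",
        "2022-03-01 12:00:00", "2021-11-14 09:27:10",
    ]),
    "collection_slug": ["raidpartyfighters", "raidpartyfighters", "azuki", "cool-cats-nft"],
    "deal_price_usd": [1951.58946, 1584.8828, 2501.205, 5624.725],
})

# Feature creation: per-wallet counts and activity window
features = events.groupby("user_account_address").agg(
    n_events=("event_type", "size"),
    n_collections=("collection_slug", "nunique"),
    first_seen=("event_timestamp", "min"),
    last_seen=("event_timestamp", "max"),
)

# Total USD per event type, one column each (0 if the wallet never did that)
usd = events.pivot_table(index="user_account_address", columns="event_type",
                         values="deal_price_usd", aggfunc="sum", fill_value=0)
features = features.join(usd.add_prefix("usd_"))

# Transformations: activity window in days, log-scaled spend
features["wallet_age_days"] = (
    features["last_seen"] - features["first_seen"]
).dt.total_seconds() / 86400
features["log_usd_buy"] = np.log1p(features["usd_buy"])
features = features.drop(columns=["first_seen", "last_seen"])
print(features)
```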