# Wallet Feature Engineering

The purpose of this notebook is to create features from OpenSea `Asset Events` time series
in order to:
- model and predict NFT fear of missing out (FOMO) behavior
- classify types of people participating in NFT exchanges

# Read Data

__Description of the dataset:__ Asset events ("events") were extracted by
[莊惟翔](https://github.com/Fred-Zhuang)
via the https://api.opensea.io/api/v1/assets endpoint.
This dataset contains only __successful__ events that occurred on the NFTs
and were tracked by OpenSea.

1. A list of `token_seller_address`es with event timestamps
    between 2022-05-03 and 2022-05-18 was used as the primer to
    extract all events involving these addresses
    (see `os_successful_events.feather`).
2. The final list of events was then used for feature engineering.

_Note:_ The `event_type` field indicates the type of event (transfer, successful auction, etc.),
and the results are sorted by event timestamp
(see [OpenSea API documentation](https://docs.opensea.io/reference/getting-assets)).

In [1]:
import os
import re
import time
import datetime
import pandas as pd

data_dir = os.path.join(os.getcwd(), 'data')
cool_cats_nft = os.path.join(data_dir, 'cool-cats-nft.feather')

start_time = time.time()
wallets = pd.read_feather(cool_cats_nft)
total_time = time.time() - start_time
print("Total seconds to load:", total_time)

Total seconds to load: 5.250505208969116


In [2]:
wallets.info(show_counts=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2398450 entries, 0 to 2398449
Data columns (total 29 columns):
 #   Column                  Non-Null Count    Dtype         
---  ------                  --------------    -----         
 0   index                   2398450 non-null  int64         
 1   event_timestamp         2398450 non-null  datetime64[ns]
 2   event_type              2398450 non-null  object        
 3   token_id                2382796 non-null  object        
 4   num_sales               2382796 non-null  float64       
 5   listing_time            2268796 non-null  datetime64[ns]
 6   token_owner_address     2382796 non-null  object        
 7   token_seller_address    2395806 non-null  object        
 8   deal_price              2398450 non-null  float64       
 9   payment_token_symbol    2398411 non-null  object        
 10  payment_token_decimals  2398445 non-null  float64       
 11  payment_token_usdprice  2397937 non-null  float64       
 12  quantity      

In [3]:
wallets.drop(["index", "starting_price", "ending_price",
              "approved_account", "bid_amount",
              "pages"], axis=1, inplace=True)

In [4]:
print("Most recent event:", max(wallets.event_timestamp))

Most recent event: 2022-05-10 23:35:55


In [5]:
print("Earliest event:", min(wallets.event_timestamp))

Earliest event: 2017-07-04 04:33:49


In [6]:
print("Length of this time series dataset:", max(wallets.event_timestamp) - min(wallets.event_timestamp))

Length of this time series dataset: 1771 days 19:02:06


In [7]:
print("Total number of wallet addresses used to retrieve data from OpenSea:", wallets.wallet_address_input.nunique())

Total number of wallet addresses used to retrieve data from OpenSea: 9766


## Top collections in this dataset

_The number represents the count of successful sales._

In [8]:
wallets.groupby("collection_slug").size().sort_values(ascending=False).head(20)

collection_slug
cool-cats-nft            39846
parallelalpha            18308
pudgypenguins            13679
deadfellaz               12764
robotos-official         11859
boredapeyachtclub        11832
rarible                  11239
mutant-ape-yacht-club    10691
thewickedcraniums        10569
cryptoadz-by-gremplin    10207
ape-gang-old             10096
creatureworld             9929
coolpetsnft               8916
bored-ape-kennel-club     8841
supducks                  8803
doodles-official          8515
animetas                  8359
adam-bomb-squad           8066
world-of-women-nft        7684
cyberkongz-vx             7680
dtype: int64

## WIP

... data distribution

## Questions

- What is asset_event `created_date`, and how does it differ from `event_timestamp`?
  Would asset_contract created_date be more useful than asset_event created_date?
- Why are `starting_price` and `ending_price` always _null_?

# Generate Features

## Impute data

### `buy` vs. `sell` event_type

_N.b._ The dataset is missing the `winner_account_address` attribute needed to confirm that the buyer is indeed
the wallet owner, i.e. `wallet_address_input`. We therefore infer whether the wallet owner
is the buyer or the seller as follows:

In [9]:
import numpy as np

wallets.event_type = np.where(wallets.wallet_address_input == wallets.token_seller_address, "sell", "buy")

### `duration`
The time between the listing of the token and the completion of the sale.

_What should we do when `listing_time` is `NaN`?_ Rows where `listing_time` is NA produce a `NaT` duration.

In [10]:
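# Bracket assignment creates or overwrites the column; attribute assignment cannot create a new one.
wallets["duration"] = wallets.event_timestamp - wallets.listing_time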
wallets.loc[:, ["token_seller_address", "event_type", "duration"]].head(10)

Unnamed: 0,token_seller_address,event_type,duration
0,0x82dc39052703cb51718b92fd62a6da6d1e749a0c,sell,0 days 09:45:26
1,0xd44a7b02e9692f491fb360d6a509e37c06bcd579,buy,0 days 00:56:41
2,0x56a7a519cb9d369334a24c98b44164d18a9b8385,buy,0 days 00:10:33
3,0x278d9db7032ffe25c5fcec6fb517f4e2041805d3,buy,0 days 10:08:26
4,0xef9fdc930d645299d01440d82b6c417cbd8f7162,buy,0 days 00:23:56
5,0xef9fdc930d645299d01440d82b6c417cbd8f7162,buy,0 days 00:07:53
6,0xd0d20158daa57b04c1094b7c0fa31efbdd675b52,buy,0 days 00:18:01
7,0x721b1b99af3ccbc2d42c1934f0aabc006ea36e31,buy,0 days 00:12:30
8,0x82dc39052703cb51718b92fd62a6da6d1e749a0c,sell,0 days 01:44:32
9,0x165dedca327cebac0a8222b71dc76a62b4727b83,buy,NaT


In [11]:
wallets.query('duration.isna()')

Unnamed: 0,event_timestamp,event_type,token_id,num_sales,listing_time,token_owner_address,token_seller_address,deal_price,payment_token_symbol,payment_token_decimals,...,auction_type,transaction_hash,block_hash,block_number,is_private,duration,created_date,collection_slug,contract_address,wallet_address_input
9,2022-05-02 16:16:29,buy,13597,2.0,NaT,0x82dc39052703cb51718b92fd62a6da6d1e749a0c,0x165dedca327cebac0a8222b71dc76a62b4727b83,3.000000e+18,WETH,18.0,...,,0x86ab9e6360090aa0716197b36c1734001a6857e660bd...,0x4717c9782bdc9eeb08867872171dd3e4dcbdc19dea7e...,14699338.0,,NaT,2022-05-02T16:16:46.199124,cryptojankyz,0x7f268357a8c2552623316e2562d90e642bb538e5,0x82dc39052703cb51718b92fd62a6da6d1e749a0c
20,2022-04-27 19:36:05,buy,1767512068118639850098031759718131858502097765...,2.0,NaT,0x0000000000000000000000000000000000000000,0xd0d20158daa57b04c1094b7c0fa31efbdd675b52,3.890100e+18,WETH,18.0,...,,0xa1fc4b9d6316e54dc47fe355a4a10793987bc8d14d61...,0xf5be8c1fa9b0671ed0862c9d0cbc9b81d40316b37122...,14668416.0,,NaT,2022-04-27T19:36:20.229901,the-lenny-collection,0x7f268357a8c2552623316e2562d90e642bb538e5,0x82dc39052703cb51718b92fd62a6da6d1e749a0c
25,2022-04-25 23:40:37,buy,57,1.0,NaT,0x82dc39052703cb51718b92fd62a6da6d1e749a0c,0x0fad8a3a515dedf35181af78371e9fea1dc064c2,3.200000e+18,WETH,18.0,...,,0xfdadb1210984a9ffffc6b4ea5faab144f750e3062249...,0x62ede6312238cf5cc92c6dfe309b50f878d558b2046c...,14656761.0,,NaT,2022-04-25T23:40:58.250711,the-carton,0x7f268357a8c2552623316e2562d90e642bb538e5,0x82dc39052703cb51718b92fd62a6da6d1e749a0c
33,2022-04-23 18:54:24,buy,187,5.0,NaT,0x82dc39052703cb51718b92fd62a6da6d1e749a0c,0x6cdff80c43c67f03fc4e875ad26ede41f69a9a46,4.000000e+18,WETH,18.0,...,,0x556e9d160057ed6b9a28472764092bae112d11083575...,0xb1b3580f26824ae6397852bdab7913ac1083ebe4e3f2...,14642792.0,,NaT,2022-04-23T18:54:45.294386,the-carton,0x7f268357a8c2552623316e2562d90e642bb538e5,0x82dc39052703cb51718b92fd62a6da6d1e749a0c
39,2022-04-20 13:17:22,buy,92,1.0,NaT,0x82dc39052703cb51718b92fd62a6da6d1e749a0c,0x6a60114b678b04be3fa094eb5abdc2d4ecd80769,3.880000e+18,WETH,18.0,...,,0x7db4150efec5c4180f97470f5b6ba36312e89ad5e528...,0xd6445e64bdf84e2afc558bf908b84b7c552db1c804a0...,14622132.0,,NaT,2022-04-20T13:17:33.592886,superplastic-supergucci,0x7f268357a8c2552623316e2562d90e642bb538e5,0x82dc39052703cb51718b92fd62a6da6d1e749a0c
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2398393,2022-01-09 04:55:19,buy,4053,3.0,NaT,0x158f369abbbc8b516f985dfd0fa2197a2e47bf39,0x8e842d82e896a9f2cdea465d1afca20e386dcad9,2.000000e+18,WETH,18.0,...,,0xfe2f82387f67c6bf02c35ca8be4443d579473c0daade...,0xd06128eb8f0a63e1d8eef194248f38523ca47a2572dd...,13969293.0,,NaT,2022-01-09T04:55:46.770694,onchainmonkey,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0x899f7e7bba83a4462144e576f8f14f018bb30d2a
2398402,2021-12-23 17:50:48,buy,5045,4.0,NaT,0xd3f290145b344952f1ace04397d7dbc8673d85f9,0x7d544a853dbcd39a53315e7002f4951a6d2f080d,9.500000e+17,WETH,18.0,...,,0xb3b7b6f025ea83678ee8b3a5baf6bb70702573d1c9ab...,0x1a73a3612a92fc3b2cbc342edb23b7e3fce37341b45c...,13862857.0,,NaT,2021-12-23T17:51:22.333160,onchainmonkey,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0x899f7e7bba83a4462144e576f8f14f018bb30d2a
2398403,2021-12-23 15:01:22,buy,6607,3.0,NaT,0x2fc0b1d8f79ac8ac781105c6a601543d0cfdd672,0x7d544a853dbcd39a53315e7002f4951a6d2f080d,9.900000e+17,WETH,18.0,...,,0x97071c2bf76d37ab0df3989d17485b19907bb6141f07...,0x2ac25144cb2710dbba00cde512d0cd1162f3e03fb66c...,13862114.0,,NaT,2021-12-23T15:01:52.293115,onchainmonkey,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0x899f7e7bba83a4462144e576f8f14f018bb30d2a
2398404,2021-12-23 14:33:12,buy,1209,3.0,NaT,0x24b1502a3df1d97be5b72cddd1db04fce596cd47,0x7d544a853dbcd39a53315e7002f4951a6d2f080d,9.990000e+17,WETH,18.0,...,,0x59c06b5c3620162d2eeba2304b9865dd8a0ec6c439cc...,0x1b0cadc16328ce1e6d7e4fbeeeec23393e59c4b56d86...,13862001.0,,NaT,2021-12-23T14:33:34.650029,onchainmonkey,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0x899f7e7bba83a4462144e576f8f14f018bb30d2a
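
One possible answer to the question above is to leave `duration` as `NaT` and record the missingness explicitly instead of imputing a value. The sketch below uses a toy frame rather than the real data; `listing_time_missing` and `duration_hours` are hypothetical column names that do not exist in the dataset.

```python
import pandas as pd

# Toy events; the second row has no listing_time, like the rows shown above.
events = pd.DataFrame({
    "event_timestamp": pd.to_datetime(["2022-05-02 16:16:29", "2022-04-27 19:36:05"]),
    "listing_time": pd.to_datetime(["2022-05-02 15:16:29", None]),
})

# Subtracting a missing listing_time yields NaT, so no row is silently dropped.
events["duration"] = events["event_timestamp"] - events["listing_time"]

# Hypothetical indicator so a model can tell "unknown" apart from "zero".
events["listing_time_missing"] = events["listing_time"].isna()

# Hypothetical numeric version in hours; NaT becomes NaN.
events["duration_hours"] = events["duration"].dt.total_seconds() / 3600
```

Whether a missing `listing_time` carries meaning of its own (for example, a sale that was never listed) is still an open question, so the indicator keeps that information available.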


### `deal_price_usd` and payment token attributes

In [12]:
print(sum(wallets.payment_token_symbol.isna()), 
      sum(wallets.payment_token_decimals.isna()),
      sum(wallets.payment_token_usdprice.isna()))

39 5 513


In [13]:
wallets[wallets.payment_token_symbol.isna() |
        wallets.payment_token_decimals.isna() |
        wallets.payment_token_usdprice.isna()].shape[0]

513

is the total number of records missing the payment token symbol, the token decimals (i.e. the deal price scaling factor),
or the token-to-USD exchange rate. **We will ignore these records for now.**

In [14]:
wallets["deal_price_usd"] = wallets.deal_price / 10 ** wallets.payment_token_decimals * wallets.payment_token_usdprice
wallets.deal_price_usd.agg(["max", "mean", "min"])

max     6.925186e+06
mean    2.557588e+03
min     0.000000e+00
Name: deal_price_usd, dtype: float64
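
The conversion used above is `deal_price / 10 ** payment_token_decimals * payment_token_usdprice`. As a worked example, here is a minimal plain-Python sketch of the same arithmetic. The function name `deal_price_usd` and the exchange rate of 2000 USD per ETH are assumptions for illustration only, not values from the dataset.

```python
import math

def deal_price_usd(deal_price, decimals, usd_price):
    """Convert a raw token amount to USD; return NaN if any input is missing."""
    inputs = (deal_price, decimals, usd_price)
    if any(v is None or (isinstance(v, float) and math.isnan(v)) for v in inputs):
        return float("nan")
    # Scale the raw amount down by the token's decimals, then apply the rate.
    return deal_price / 10 ** decimals * usd_price

# Hypothetical values: 0.845 ETH (8.45e17 wei) at an assumed 2000 USD per ETH.
example = deal_price_usd(8.45e17, 18, 2000.0)

# A record with a missing exchange rate stays NaN instead of raising.
missing = deal_price_usd(8.45e17, 18, None)
```

In the pandas version, rows with a missing symbol, decimals, or exchange rate likewise end up with a `NaN` `deal_price_usd`, which is consistent with ignoring them for now.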

In [15]:
wallets[wallets.quantity.isna()]["deal_price_usd"].quantile(q=[x / 10 for x in range(0, 10)])

0.0       0.000000
0.1      11.217255
0.2      19.093200
0.3      32.133298
0.4      47.733000
0.5      72.282328
0.6     115.448000
0.7     188.907200
0.8     377.833480
0.9    1158.180000
Name: deal_price_usd, dtype: float64

### `quantity`

_How does this field differ from `num_sales`?_

In [16]:
wallets.quantity.unique()

array([1.00000000e+00, 2.00000000e+00, 4.00000000e+00, 7.00000000e+01,
       3.00000000e+00, 9.00000000e+00, 1.00000000e+01, 2.00000000e+01,
       5.00000000e+00, 1.50000000e+01, 6.00000000e+00, 2.50000000e+01,
       5.00000000e+01, 7.00000000e+00, 8.00000000e+00, 1.10000000e+01,
       1.20000000e+01, 6.50000000e+02, 1.40000000e+01, 1.30000000e+01,
       1.90000000e+01, 1.60000000e+01,            nan, 1.28000000e+02,
       1.00000000e+11, 1.80000000e+01, 2.40000000e+01, 2.20000000e+01,
       2.10000000e+01, 1.70000000e+01, 2.90000000e+01, 3.50000000e+01,
       4.00000000e+01, 2.80000000e+01, 3.00000000e+01, 1.00000000e+02,
       6.40000000e+01, 3.20000000e+01, 1.92000000e+02, 4.20000000e+02,
       2.50000000e+02, 5.10000000e+01, 2.00000000e+02, 6.90000000e+01,
       1.00000000e+03, 8.00000000e+01, 1.72000000e+02, 1.88000000e+02,
       5.50000000e+01, 1.00000000e+22, 4.00000000e+18, 2.70000000e+01,
       1.00000000e+09, 1.00000000e+04, 2.50000000e+03, 2.30000000e+01,
      

### `is_private` sales

_Do we assume NaN is __not__ private?_

In [17]:
wallets.is_private.value_counts(dropna=False)

0.0    2230703
NaN     129654
1.0      38093
Name: is_private, dtype: int64
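
Until that question is settled, one option is to keep the missing values as a category of their own. The sketch below shows both choices on a toy series; the labels `public`, `private`, and `unknown` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy column mirroring the three states seen above: 0.0, 1.0 and NaN.
is_private = pd.Series([0.0, np.nan, 1.0, 0.0], name="is_private")

# Option 1: map to labels, keeping "unknown" distinct from "public".
labels = is_private.map({0.0: "public", 1.0: "private"}).fillna("unknown")

# Option 2: act on the assumption that NaN means "not private".
assumed_public = is_private.fillna(0.0).astype(bool)
```

Option 1 loses no information, while option 2 is simpler but bakes the assumption into the feature.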

### `deal_price == 0`

Do we keep these rows?

In [18]:
wallets[wallets.deal_price == 0]

Unnamed: 0,event_timestamp,event_type,token_id,num_sales,listing_time,token_owner_address,token_seller_address,deal_price,payment_token_symbol,payment_token_decimals,...,transaction_hash,block_hash,block_number,is_private,duration,created_date,collection_slug,contract_address,wallet_address_input,deal_price_usd
1588,2022-01-01 19:21:07,buy,2417312282534667090492932735795092001469160449...,1.0,2022-01-01 17:58:19,0x0000000000000000000000000000000000000000,0x357180aea6a6de030dc561fc7d9455c3e7271d12,0.0,ETH,18.0,...,0x443ecdf9864af31499e7ec43d7109ec2c6c797fa66c4...,0xf9777aaa44ccd066e8828f8c93b710dc13ae1ce35eb7...,13921384.0,0.0,0 days 01:22:48,2022-01-01T19:21:53.087963,demoncrazy,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0xa5a88a21896f963f59f2c3e0ee2247565dd9f257,0.0
1589,2022-01-01 19:11:30,buy,2417312282534667090492932735795092001469160449...,1.0,2022-01-01 17:58:00,0x0000000000000000000000000000000000000000,0x357180aea6a6de030dc561fc7d9455c3e7271d12,0.0,ETH,18.0,...,0xe0955f340393640e3eb8d517ee2cbba8e0d95a326253...,0x107ce240e83c26577ef80e643088309f379ed09b7edc...,13921349.0,0.0,0 days 01:13:30,2022-01-01T19:12:13.846942,demoncrazy,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0xa5a88a21896f963f59f2c3e0ee2247565dd9f257,0.0
1590,2022-01-01 19:07:30,buy,2417312282534667090492932735795092001469160449...,1.0,2022-01-01 18:01:15,0x0000000000000000000000000000000000000000,0x357180aea6a6de030dc561fc7d9455c3e7271d12,0.0,ETH,18.0,...,0x264c41d336dd85b9e5aff83a21333e1ef338e32f2bb1...,0xc57c8497652001dc0385f7df57cab712321fcd2d3c24...,13921333.0,0.0,0 days 01:06:15,2022-01-01T19:08:13.332175,demoncrazy,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0xa5a88a21896f963f59f2c3e0ee2247565dd9f257,0.0
1705,2021-08-25 14:53:03,buy,2461859473977232508107101225196807725398377359...,27.0,2021-08-23 11:03:54,0x0000000000000000000000000000000000000000,0x366da17e4ce9f7b7e294044928483ca18b291ad7,0.0,ETH,18.0,...,0x4107879a0873df3740a5c07bd1d849bcc6edf58e39be...,0x449dc711f5b9468b43f02286a000be9edb79bb63e334...,13095165.0,0.0,2 days 03:49:09,2021-08-25T14:53:44.861868,crownedshhans,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0xa5a88a21896f963f59f2c3e0ee2247565dd9f257,0.0
1709,2022-05-07 20:14:22,sell,2192766832090105266929319151737047525070907282...,1.0,2022-05-04 13:04:18,0x0000000000000000000000000000000000000000,0x307a9ed60faabde3c98eabd2903fcf59f4ec16f9,0.0,ETH,18.0,...,0x6f8b29ee8e9fd3b375fddff06c5cc777babde143c105...,0x10645d263e031eb33799e93fd46634beeb9634a9d4b3...,14731848.0,1.0,3 days 07:10:04,2022-05-07T20:15:02.005799,the-multiverse-of-leos,0x7f268357a8c2552623316e2562d90e642bb538e5,0x307a9ed60faabde3c98eabd2903fcf59f4ec16f9,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2396900,2021-10-25 13:37:26,buy,8049430702452934601805365049091252991673968081...,318.0,2021-10-25 09:14:42,0x0000000000000000000000000000000000000000,0xb1f629cb1b6841e5ac98aa5b1cffab1b6c88d8b7,0.0,ETH,18.0,...,0xf9e1b509d462cc2f495030274ee427d4d539099f98c1...,0x956a6240baf1431ca723b896601d79aafc6b423ab66a...,13486907.0,1.0,0 days 04:22:44,2021-10-25T13:37:53.931085,ghxsts-cxlture,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0x0b67a3d8708e3375a5226208eb3764e345b9ada2,0.0
2396927,2021-10-12 21:55:50,buy,8049430702452934601805365049091252991673968081...,321.0,2021-09-23 14:33:16,0x0000000000000000000000000000000000000000,0xb1f629cb1b6841e5ac98aa5b1cffab1b6c88d8b7,0.0,ETH,18.0,...,0x69f08dff10af4be087d806b680ce57056e3885772116...,0xea0b471322f3c2fe6ac41ac496b341995770dc0fd8da...,13406197.0,1.0,19 days 07:22:34,2021-10-12T21:56:12.436029,ghxsts-cxlture,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0x0b67a3d8708e3375a5226208eb3764e345b9ada2,0.0
2396928,2021-10-12 21:35:05,buy,8049430702452934601805365049091252991673968081...,384.0,2021-10-11 11:26:18,0x0000000000000000000000000000000000000000,0xb1f629cb1b6841e5ac98aa5b1cffab1b6c88d8b7,0.0,ETH,18.0,...,0x021bd30738781c57280825851190f66059e3429ae9d4...,0xd1b635ca2c6335de144ebfba355034e063e9d3519995...,13406100.0,1.0,1 days 10:08:47,2021-10-12T21:35:36.925393,ghxsts-cxlture,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0x0b67a3d8708e3375a5226208eb3764e345b9ada2,0.0
2397060,2021-08-29 19:05:29,buy,3854399958718048308645366768870891195385370715...,1.0,2021-08-29 04:36:29,0x0000000000000000000000000000000000000000,0x55372173689c288552885d897d32f5f706f79aa6,0.0,ETH,18.0,...,0x19775255f725515fd0daa893f6d230eed7372c2cb5d6...,0x0ef791b5b9e90fabe91a7729db7728096d39a2aaaf38...,13122142.0,1.0,0 days 14:29:00,2021-08-29T19:05:58.694639,bitgans,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0x0b67a3d8708e3375a5226208eb3764e345b9ada2,0.0


### missing payment tokens

In [19]:
print("number of rows missing payment token data:",
      sum(wallets.payment_token_symbol.isna()))

number of rows missing payment token data: 39


### quantity is missing

In [20]:
wallets[~wallets.payment_token_symbol.isna()].quantity.quantile(q=[x / 1000 for x in range(0, 1000)]).tail(20)

0.980     1.0
0.981     1.0
0.982     1.0
0.983     1.0
0.984     1.0
0.985     1.0
0.986     1.0
0.987     1.0
0.988     1.0
0.989     2.0
0.990     2.0
0.991     2.0
0.992     2.0
0.993     2.0
0.994     3.0
0.995     3.0
0.996     5.0
0.997     5.0
0.998     7.0
0.999    10.0
Name: quantity, dtype: float64

Since nearly 99% of events involve a single token, do we impute the following NA values with 1?

In [21]:
wallets[wallets.quantity.isna()]

Unnamed: 0,event_timestamp,event_type,token_id,num_sales,listing_time,token_owner_address,token_seller_address,deal_price,payment_token_symbol,payment_token_decimals,...,transaction_hash,block_hash,block_number,is_private,duration,created_date,collection_slug,contract_address,wallet_address_input,deal_price_usd
97775,2020-06-27 19:12:07,buy,147226,1.0,NaT,0x3bd77b00f02c8bcff586c565e2c5e6b6c5878ec3,0x42a60d2f2ffa2150c568010a8d425f0aad284fd2,3.249210e+16,ETH,18.0,...,0x2ff432a6205be89a625a7016815b5a81991541d94173...,0x0c2cd5d23b6676863b2278e225ed650f4c0884860c4b...,10349619.0,,NaT,2020-06-27T19:12:48.763876,axie,0xf4985070ce32b6b1994329df787d1acc9a2dd9e2,0x3bd77b00f02c8bcff586c565e2c5e6b6c5878ec3,75.729669
97776,2020-06-20 16:16:52,buy,44584,2.0,NaT,0x6d1f5fac38edca69b9d02637a173c1e62d331896,0x8b3ad493c077e894a034db7eb53e8285560298fd,2.450000e+16,ETH,18.0,...,0x21746dd417132bb6e3b6ca77c809ba62322302ff16ea...,0x7124a292f83004a5f888acd3d808a4e6544c94568534...,10303653.0,,NaT,2020-06-20T16:17:15.773754,axie,0xf4985070ce32b6b1994329df787d1acc9a2dd9e2,0x3bd77b00f02c8bcff586c565e2c5e6b6c5878ec3,57.102395
97777,2020-06-19 17:01:06,buy,140164,4.0,NaT,0xc301878610b1952e94a57d9958fba4ec043537e4,0x9b7061023cd42263448d48c48572507f19f39b78,7.500000e+16,ETH,18.0,...,0xb0993b84033070e3b6e11c10e55ffcc78386f287188c...,0xc08102bade64f5976a023213267cc09b06dc714a5df4...,10297405.0,,NaT,2020-06-19T17:01:53.569908,axie,0xf4985070ce32b6b1994329df787d1acc9a2dd9e2,0x3bd77b00f02c8bcff586c565e2c5e6b6c5878ec3,174.803250
97778,2020-06-19 16:56:36,buy,71500,3.0,NaT,0x3bd77b00f02c8bcff586c565e2c5e6b6c5878ec3,0x937b9093cd9d8798930e394d188f9b5596d49f54,5.000000e+16,ETH,18.0,...,0x97ce92af142c265957c66c5db4909a66d282b39e2df6...,0xe42efeaef744ded41ffd2d35a40eccef40a87c89a98b...,10297384.0,,NaT,2020-06-19T16:57:06.608060,axie,0xf4985070ce32b6b1994329df787d1acc9a2dd9e2,0x3bd77b00f02c8bcff586c565e2c5e6b6c5878ec3,116.535500
97779,2020-06-19 16:52:38,buy,121128,2.0,NaT,0x7a9112792211461205db4191381866b0508fa4a8,0xb4d055d63cc6a7bfb51a588ddaeb245ce5e3fc48,1.890000e+16,ETH,18.0,...,0xeb9f95b93ee31757b257ba06a30d35a56bb0de35a511...,0xf0237c4fada0f5fb6fdde8c1ba8702bac7cf1598ea44...,10297361.0,,NaT,2020-06-19T16:52:55.760877,axie,0xf4985070ce32b6b1994329df787d1acc9a2dd9e2,0x3bd77b00f02c8bcff586c565e2c5e6b6c5878ec3,44.050419
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2168131,2019-05-20 01:12:22,buy,23300,1.0,2019-05-20 00:09:28,0xa5a0b7c3dd5dddbfbd51e56b9170bb6d1253788b,0x05f9bbbd6af1699cb3ca8c14bd38b2b47bd6b2ec,1.000000e+15,ETH,18.0,...,0x37e0c7d0b3b606c2dd9ec83fdf3c1c6aff70e02b60d8...,"b""t`\x91E\x9d\x96;\x9e\xe3\xd6y'\xfaO\xca\xbe\...",7793907.0,0.0,0 days 01:02:54,2019-05-20T01:12:56.975962,neon-district,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0xa5a0b7c3dd5dddbfbd51e56b9170bb6d1253788b,2.382100
2168132,2019-05-20 01:12:07,buy,32194,1.0,2019-05-20 00:11:56,0xa5a0b7c3dd5dddbfbd51e56b9170bb6d1253788b,0x05f9bbbd6af1699cb3ca8c14bd38b2b47bd6b2ec,1.000000e+15,ETH,18.0,...,0x0362ec6a23c7cc9aff6ca78587ffaa09bf8708ad51cd...,"b""w\xb2\xa2\xecN\xe2\xf4\x10N\xd8\xfc\xa5\x8d'...",7793905.0,0.0,0 days 01:00:11,2019-05-20T01:12:31.755591,neon-district,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0xa5a0b7c3dd5dddbfbd51e56b9170bb6d1253788b,2.382100
2168133,2019-05-20 01:10:19,buy,23301,1.0,2019-05-20 00:09:59,0xa5a0b7c3dd5dddbfbd51e56b9170bb6d1253788b,0x05f9bbbd6af1699cb3ca8c14bd38b2b47bd6b2ec,1.000000e+15,ETH,18.0,...,0x55145bccaeed167e32804c9b2af89a859993ddba6598...,b'\x06B\x1c7D\xcd\x89\xd1\xbe\xcb\xf1\xd5~?v\x...,7793900.0,0.0,0 days 01:00:20,2019-05-20T01:10:57.798340,neon-district,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0xa5a0b7c3dd5dddbfbd51e56b9170bb6d1253788b,2.382100
2168134,2019-05-20 01:09:42,buy,33912,1.0,2019-05-20 00:16:54,0xa5a0b7c3dd5dddbfbd51e56b9170bb6d1253788b,0x05f9bbbd6af1699cb3ca8c14bd38b2b47bd6b2ec,1.000000e+15,ETH,18.0,...,0x613a7d28e158d5db005e808dbd4d152cbf0ed8968b68...,b'4>S\xda\xff\x15\x02\xe5\x1dP=\xaa\xfb{]\xc7\...,7793896.0,0.0,0 days 00:52:48,2019-05-20T01:10:15.673318,neon-district,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0xa5a0b7c3dd5dddbfbd51e56b9170bb6d1253788b,2.382100


In [22]:
sum(wallets.quantity.isna())

4262

Outlier or bad data?

In [23]:
max(wallets.quantity)

1e+22
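
If the answer to both questions is to impute and to flag rather than drop, the steps might look like the sketch below. It runs on a toy series; the names `quantity_was_missing`, `quantity_suspect`, and the `MAX_PLAUSIBLE_QUANTITY` cut-off are assumptions, and the threshold in particular would need to be chosen from the real distribution.

```python
import numpy as np
import pandas as pd

# Toy quantities including a missing value and an implausibly large one.
quantity = pd.Series([1.0, 2.0, np.nan, 1e22, 1.0], name="quantity")

# Record which rows were missing before imputing (hypothetical flag).
quantity_was_missing = quantity.isna()

# Impute with 1, the value taken by the large majority of rows.
quantity_imputed = quantity.fillna(1.0)

# Hypothetical cut-off; whether values above it are outliers or bad data
# is still an open question, so they are flagged rather than removed.
MAX_PLAUSIBLE_QUANTITY = 1e6
quantity_suspect = quantity_imputed > MAX_PLAUSIBLE_QUANTITY
```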

## WIP

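1. Restrict quantity to one (bundles are not counted), 2022/6/09

2. Group the data by wallet address

3. For now, pick an arbitrary wallet address and run the feature calculations below on it (a loop will tie everything together at the end)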

_*Note to Fred:*_ 'quantity' has already been converted to float during the initial data load.

In [24]:
# 2022/6/09
df_temp2 = wallets.drop(columns=["payment_token_usdprice", "asset_bundle", "auction_type", "transaction_hash",
                                 "block_hash", "block_number", "is_private", "duration"])

# Restrict quantity to one (bundles are not counted), 2022/6/09
df_temp2 = df_temp2[df_temp2['quantity'] == 1]

In [25]:
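# Group by wallet address
sectors = df_temp2.groupby("wallet_address_input")

# For now, pick an arbitrary wallet address for the feature calculations below
# (a loop will tie everything together at the end)
df_temp3 = sectors.get_group("0x5338035c008ea8c4b850052bc8dad6a33dc2206c")
df_temp3 = df_temp3.reset_index(drop=True)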
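
The last step of the plan, looping over every wallet instead of a single hard-coded address, could look roughly like the sketch below. `summarize_wallet` is a hypothetical placeholder for the per-wallet feature calculation, and the addresses in the toy frame are made up.

```python
import pandas as pd

def summarize_wallet(events: pd.DataFrame) -> dict:
    """Hypothetical per-wallet feature function; here it only counts things."""
    return {
        "n_events": len(events),
        "n_collections": events["collection_slug"].nunique(),
    }

# Toy data with two wallets.
toy = pd.DataFrame({
    "wallet_address_input": ["0xaaa", "0xaaa", "0xbbb"],
    "collection_slug": ["cool-cats-nft", "doodles-official", "cool-cats-nft"],
})

# Iterate over all groups instead of calling get_group for one address.
rows = []
for address, events in toy.groupby("wallet_address_input"):
    features = summarize_wallet(events.reset_index(drop=True))
    features["wallet_address_input"] = address
    rows.append(features)

wallet_features = pd.DataFrame(rows).set_index("wallet_address_input")
```

Collecting one dictionary per wallet keeps the per-wallet logic testable on a single group before it is run across all addresses.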

In [26]:
df_temp3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5496 entries, 0 to 5495
Data columns (total 16 columns):
 #   Column                  Non-Null Count  Dtype         
---  ------                  --------------  -----         
 0   event_timestamp         5496 non-null   datetime64[ns]
 1   event_type              5496 non-null   object        
 2   token_id                5496 non-null   object        
 3   num_sales               5496 non-null   float64       
 4   listing_time            5448 non-null   datetime64[ns]
 5   token_owner_address     5496 non-null   object        
 6   token_seller_address    5496 non-null   object        
 7   deal_price              5496 non-null   float64       
 8   payment_token_symbol    5496 non-null   object        
 9   payment_token_decimals  5496 non-null   float64       
 10  quantity                5496 non-null   float64       
 11  created_date            5496 non-null   object        
 12  collection_slug         5496 non-null   object  

In [27]:
df_temp3.head()

Unnamed: 0,event_timestamp,event_type,token_id,num_sales,listing_time,token_owner_address,token_seller_address,deal_price,payment_token_symbol,payment_token_decimals,quantity,created_date,collection_slug,contract_address,wallet_address_input,deal_price_usd
0,2022-05-04 03:29:18,buy,2977,1.0,2022-05-04 03:19:37,0x5338035c008ea8c4b850052bc8dad6a33dc2206c,0x4984fc170325e8fe57e9de1c2b74ce5eabb6f9da,8.45e+17,ETH,18.0,1.0,2022-05-04T03:29:50.900288,fragments-by-james-jean,0x7f268357a8c2552623316e2562d90e642bb538e5,0x5338035c008ea8c4b850052bc8dad6a33dc2206c,2020.38655
1,2022-05-04 03:29:18,buy,3016,1.0,2022-05-04 00:14:49,0x5338035c008ea8c4b850052bc8dad6a33dc2206c,0xc293bc1602efeba837cb240c49476e1d3fe0fd98,8.45e+17,ETH,18.0,1.0,2022-05-04T03:29:50.461508,fragments-by-james-jean,0x7f268357a8c2552623316e2562d90e642bb538e5,0x5338035c008ea8c4b850052bc8dad6a33dc2206c,2020.38655
2,2022-05-04 03:29:18,buy,4956,1.0,2022-05-03 19:39:05,0x5338035c008ea8c4b850052bc8dad6a33dc2206c,0xe2fb909159dea75b1520c382ca102989cdd1a276,8.4e+17,ETH,18.0,1.0,2022-05-04T03:29:50.009580,fragments-by-james-jean,0x7f268357a8c2552623316e2562d90e642bb538e5,0x5338035c008ea8c4b850052bc8dad6a33dc2206c,2008.4316
3,2022-05-04 03:29:18,buy,5078,1.0,2022-05-04 01:40:56,0x5338035c008ea8c4b850052bc8dad6a33dc2206c,0x8a45d09b2dbbf1657fb8c14561b6525443631d22,8.5e+17,ETH,18.0,1.0,2022-05-04T03:29:49.627993,fragments-by-james-jean,0x7f268357a8c2552623316e2562d90e642bb538e5,0x5338035c008ea8c4b850052bc8dad6a33dc2206c,2032.3415
4,2022-05-04 03:29:18,buy,5800,1.0,2022-05-04 02:52:43,0x5338035c008ea8c4b850052bc8dad6a33dc2206c,0xcc60f720388551bc9159cfed814a15de2f49d1e9,8.49e+17,ETH,18.0,1.0,2022-05-04T03:29:49.240455,fragments-by-james-jean,0x7f268357a8c2552623316e2562d90e642bb538e5,0x5338035c008ea8c4b850052bc8dad6a33dc2206c,2029.95051


In [28]:
# Purchase price, assuming the Ethereum chain: ETH = deal_price / 10**18
df_temp3["cost"] = np.where(df_temp3["wallet_address_input"][0]==df_temp3["token_seller_address"], 0,df_temp3["deal_price"]/10**18)
# Sale price
df_temp3["sellprice"] = np.where(df_temp3["wallet_address_input"][0]==df_temp3["token_seller_address"], df_temp3["deal_price"]/10**18, 0)

## cost and sellprice probably aren't necessary since there is the 'Buy_Sell' created below
## a deal_price_usd is recommended

# Date conversion
df_temp3["Datetime"] = pd.to_datetime(df_temp3["event_timestamp"]) # this can be stored in event_timestamp instead of a new column
# Buy/sell flag
# Consider overwriting event_type column
df_temp3["Buy_Sell"] = np.where(df_temp3["wallet_address_input"][0]==df_temp3["token_seller_address"], "S", 'B')
# Portfolio (inventory) << what do we plan to store here?
df_temp3["Profolio"] = np.nan
# Profit and loss << Profit? How do we plan to calculate this for each row of event?
df_temp3["PL"] = 0
# Number of tokens held <<
df_temp3["NFT_total_num"] = 0
# Combine collection_slug and token_id into a separate column to record the tokens held by the wallet
df_temp3["collection_slug_tokenid"] = df_temp3["collection_slug"] + df_temp3["token_id"]
# Time the token was held between the secondary-market purchase and the sale
df_temp3["HoldPeriod"] = np.nan
df_temp3["Position"] = 0
df_temp3["Sell"] = 0

In [29]:
df_temp3.iloc[:,22:].head()

Unnamed: 0,NFT_total_num,collection_slug_tokenid,HoldPeriod,Position,Sell
0,0,fragments-by-james-jean2977,,0,0
1,0,fragments-by-james-jean3016,,0,0
2,0,fragments-by-james-jean4956,,0,0
3,0,fragments-by-james-jean5078,,0,0
4,0,fragments-by-james-jean5800,,0,0


In [30]:
df_temp3.iloc[:,22:].tail()

Unnamed: 0,NFT_total_num,collection_slug_tokenid,HoldPeriod,Position,Sell
5491,0,fortune-media1,,0,0
5492,0,fortune-media1,,0,0
5493,0,knightstory35613,,0,0
5494,0,rumble-kong-league3933,,0,0
5495,0,rumble-kong-league3933,,0,0


### Is the code block below attempting to calculate the _current cumulative stat_ of each wallet?

In [31]:
porfolio_dict = {}  # NFT collections currently held
porfolio_costdict = {}  # purchase cost per token
porfolio_datedict = {}  # purchase time per token
count = 0
error = []
# The data is ordered newest to oldest, so iterate in reverse to accumulate from oldest to newest.
for i in range(len(df_temp3)-1,-1,-1):
    # First event for this NFT collection
    if df_temp3["collection_slug"][i] not in porfolio_dict.keys():
        if df_temp3["Buy_Sell"][i]=="B":
            # Inventory plus one
            count = count+1
            porfolio_dict[df_temp3["collection_slug"][i]] = [df_temp3["token_id"][i]]
            df_temp3.loc[i, "Profolio"] = [porfolio_dict]
            df_temp3.loc[i, "NFT_total_num"] = count
            # NFT cost
            porfolio_costdict[df_temp3["collection_slug_tokenid"][i]] = df_temp3["cost"][i]
            # NFT purchase time
            porfolio_datedict[df_temp3["collection_slug_tokenid"][i]] = df_temp3["Datetime"][i]
            #position
            df_temp3.loc[i, "Position"] = sum(porfolio_costdict.values())

        else:
            # A sell here means the token may have been transferred in from another wallet,
            # so the earlier holding cost cannot be computed.
            df_temp3.loc[i, "NFT_total_num"] = count
            df_temp3.loc[i, "Position"] = sum(porfolio_costdict.values())
    else:
        # The wallet already holds NFTs from this collection
        if df_temp3["token_id"][i] not in porfolio_dict[df_temp3["collection_slug"][i]]:
            if df_temp3["Buy_Sell"][i]=="B":
                # Buy more (add to the position)
                porfolio_dict[df_temp3["collection_slug"][i]].append(df_temp3["token_id"][i])
                df_temp3.loc[i, "Profolio"] = [porfolio_dict]
                # Inventory plus one
                count = count+1
                df_temp3.loc[i, "NFT_total_num"] = count
                # NFT cost
                porfolio_costdict[df_temp3["collection_slug_tokenid"][i]] = df_temp3["cost"][i]
                # NFT purchase time
                porfolio_datedict[df_temp3["collection_slug_tokenid"][i]] = df_temp3["Datetime"][i]
                #position
                df_temp3.loc[i, "Position"] = sum(porfolio_costdict.values())

            else:
                # Sell. The token may have been transferred in from another wallet,
                # so the earlier holding cost cannot be computed.
                df_temp3.loc[i, "NFT_total_num"] = count
                df_temp3.loc[i, "Profolio"] = [porfolio_dict]
                df_temp3.loc[i, "Position"] = sum(porfolio_costdict.values())
        else:
            if df_temp3["Buy_Sell"][i]=="B":
                # Should not happen, since token_id is unique?
                df_temp3.loc[i, "NFT_total_num"] = count
                df_temp3.loc[i, "Profolio"] = [porfolio_dict]
                df_temp3.loc[i, "Position"] = sum(porfolio_costdict.values())
            else:
                # Profit or loss is realized here: one buy and sell round trip is complete
                # Inventory minus one
                count = count-1
                df_temp3.loc[i, "NFT_total_num"] = count
                # Remove the token from the portfolio
                porfolio_dict[df_temp3["collection_slug"][i]].remove(df_temp3["token_id"][i])
                df_temp3.loc[i, "Profolio"] = [porfolio_dict]
                if df_temp3["collection_slug_tokenid"][i] in porfolio_costdict.keys():
                    profit = df_temp3["sellprice"][i] - porfolio_costdict[df_temp3["collection_slug_tokenid"][i]]
                    df_temp3.loc[i, "PL"] =  profit
                    # Drop the key and value because the token was sold
                    porfolio_costdict.pop(df_temp3["collection_slug_tokenid"][i])
                    df_temp3.loc[i, "Position"] = sum(porfolio_costdict.values())
                    # Holding period of the token from purchase to sale
                    date_substrate = df_temp3["Datetime"][i] - porfolio_datedict[df_temp3["collection_slug_tokenid"][i]]
                    df_temp3.loc[i, "HoldPeriod"] =  date_substrate
                    # Sell flag
                    df_temp3.loc[i, "Sell"] =  1

                else:
                    # Normally not reached
                    error.append([df_temp3["wallet_address_input"][0],df_temp3["collection_slug_tokenid"][i]])
                    df_temp3.loc[i, "Profolio"] = [porfolio_dict]
                    df_temp3.loc[i, "Position"] = sum(porfolio_costdict.values())
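
To make the intent of the loop above easier to check, the core bookkeeping (pair each sell with the earlier buy of the same token, then derive profit and holding period) can be sketched on plain Python data. This is a simplified illustration, not a drop-in replacement: the function name `match_trades`, the tuple layout, and the sample events are all assumptions, and it does not track inventory counts or position size.

```python
from datetime import datetime

def match_trades(events):
    """Pair each sell with the earlier buy of the same token.

    `events` must be sorted oldest first. Each item is a tuple of
    (timestamp, side, token_key, price_eth) with side "B" or "S".
    Returns one (token_key, profit, hold_period) tuple per matched sell.
    """
    open_cost = {}   # token_key -> purchase price
    open_date = {}   # token_key -> purchase timestamp
    closed = []
    for timestamp, side, token_key, price in events:
        if side == "B":
            open_cost[token_key] = price
            open_date[token_key] = timestamp
        elif token_key in open_cost:
            profit = price - open_cost.pop(token_key)
            hold_period = timestamp - open_date.pop(token_key)
            closed.append((token_key, profit, hold_period))
        # A sell without a recorded buy (e.g. a transferred token) has no
        # known cost, so it is left out of the profit and loss.
    return closed

# Hypothetical events for one wallet, oldest first.
events = [
    (datetime(2022, 1, 1), "B", "cool-cats-nft1", 1.0),
    (datetime(2022, 1, 2), "S", "doodles-official7", 3.0),  # no prior buy
    (datetime(2022, 1, 4), "S", "cool-cats-nft1", 1.5),
]
trades = match_trades(events)
```

Here only the `cool-cats-nft1` round trip is matched; the unmatched sell is skipped, mirroring the branches in the loop where the prior holding cost is unknown.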

In [32]:
# Profit and loss is positive
def positive_SIGN(row):
    if row['PL_sign'] == 1:
        return 1
    return 0

# Profit and loss is negative
def negative_SIGN(row):
    if row['PL_sign'] == -1 :
        return 1
    return 0

In [33]:
# Cumulative P&L accrues from completed buy-and-sell round trips within one wallet.
df_temp3['cum_PL'] = df_temp3.loc[::-1, 'PL'].cumsum()[::-1]
# Total profit
df_temp3['TotalRevenue'] = df_temp3['cum_PL'] - df_temp3["Position"]
# Sign of the P&L
df_temp3["PL_sign"] = np.sign(list(df_temp3["PL"].values))
# Cumulative number of sells
df_temp3["cum_Sell"] = df_temp3.loc[::-1, 'Sell'].cumsum()[::-1]
# Flag rows with a positive P&L
df_temp3["positive_sign"] = df_temp3.apply(lambda row: positive_SIGN(row), axis=1)
# Flag rows with a negative P&L
df_temp3["negative_sign"] = df_temp3.apply(lambda row: negative_SIGN(row), axis=1)
# Cumulative count of positive P&L
df_temp3["cum_positive_sign"] = df_temp3.loc[::-1, 'positive_sign'].cumsum()[::-1]
# Cumulative count of negative P&L
df_temp3["cum_negative_sign"] = df_temp3.loc[::-1, 'negative_sign'].cumsum()[::-1]
# Win rate
df_temp3["winrate"] = df_temp3["cum_positive_sign"] / df_temp3['cum_Sell']
# Loss rate
df_temp3["lossrate"] = df_temp3["cum_negative_sign"] / df_temp3['cum_Sell']
# Fill missing values with 0
df_temp3["winrate"] = df_temp3["winrate"].fillna(0)
df_temp3["lossrate"] = df_temp3["lossrate"].fillna(0)
# Flag sells made by accepting a bid
df_temp3["Bid_sell"] = np.where((df_temp3["payment_token_symbol"]=="WETH")&(df_temp3["Buy_Sell"]=="S"), 1,0)
# Flag buys made by placing a bid
df_temp3["Bid_buy"] = np.where((df_temp3["payment_token_symbol"]=="WETH")&(df_temp3["Buy_Sell"]=="B"), 1,0)
# Cumulative number of bid buys
df_temp3["cum_Bid_buy"] = df_temp3.loc[::-1, 'Bid_buy'].cumsum()[::-1]
# Cumulative number of accepted-bid sells
df_temp3["cum_Bid_sell"] = df_temp3.loc[::-1, 'Bid_sell'].cumsum()[::-1]

# Rates for bid buys and accepted-bid sells: the former suggests skill at fishing for bargains,
# the latter a loss of confidence or an inability to resist a high offer
df_temp3["Bid_sell_rate"] = df_temp3["cum_Bid_sell"] / df_temp3["cum_Sell"]
df_temp3["Bid_sell_rate"] = df_temp3["Bid_sell_rate"].fillna(0)
df_temp3["Bid_buy_rate"] = df_temp3["cum_Bid_buy"] / df_temp3["NFT_total_num"]
df_temp3["Bid_buy_rate"] = df_temp3["Bid_buy_rate"].fillna(0) #2022/06/09
# Number of tokens sold / number of tokens held
df_temp3["sellposition_rate"] = df_temp3["cum_Sell"]/df_temp3["NFT_total_num"]
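
The `df.loc[::-1, col].cumsum()[::-1]` idiom used throughout this cell accumulates from the last row upward. That is only a running total over time if the rows are stored newest-first, which is an assumption here rather than something this cell states. A small sketch on a toy frame (`toy` and its values are made up) shows the effect and why the `0/0` win rate needs `fillna(0)`:

```python
import pandas as pd

# Toy data, newest event first (assumed ordering); PL is 0 on rows without a sale.
toy = pd.DataFrame({
    "PL":   [0.0, -0.5, 2.0, 1.0, 0.0],
    "Sell": [0, 1, 1, 1, 0],
})

# Reverse, accumulate, reverse back: each row holds the total up to that event in time.
toy["cum_PL"] = toy.loc[::-1, "PL"].cumsum()[::-1]
toy["win"] = (toy["PL"] > 0).astype(int)
toy["cum_win"] = toy.loc[::-1, "win"].cumsum()[::-1]
toy["cum_Sell"] = toy.loc[::-1, "Sell"].cumsum()[::-1]

# Before the first sale the ratio is 0/0 = NaN; fill it with 0 as the notebook does.
toy["winrate"] = (toy["cum_win"] / toy["cum_Sell"]).fillna(0)
print(toy)
```

The oldest row (last in the frame) has no sales yet, so its win rate is filled with 0; the newest row carries the final totals.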

In [34]:
df_temp3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5496 entries, 0 to 5495
Data columns (total 44 columns):
 #   Column                   Non-Null Count  Dtype         
---  ------                   --------------  -----         
 0   event_timestamp          5496 non-null   datetime64[ns]
 1   event_type               5496 non-null   object        
 2   token_id                 5496 non-null   object        
 3   num_sales                5496 non-null   float64       
 4   listing_time             5448 non-null   datetime64[ns]
 5   token_owner_address      5496 non-null   object        
 6   token_seller_address     5496 non-null   object        
 7   deal_price               5496 non-null   float64       
 8   payment_token_symbol     5496 non-null   object        
 9   payment_token_decimals   5496 non-null   float64       
 10  quantity                 5496 non-null   float64       
 11  created_date             5496 non-null   object        
 12  collection_slug          5496 non-

In [35]:
df_temp3.describe().loc["mean"]

num_sales                 9.213319e+01
deal_price                5.946526e+17
payment_token_decimals    1.800000e+01
quantity                  1.000000e+00
deal_price_usd            1.421770e+03
cost                      1.670759e-01
sellprice                 4.275767e-01
PL                       -7.837791e-03
NFT_total_num             5.413473e+02
Position                  1.784616e+02
Sell                      9.916303e-02
cum_PL                    4.250707e+01
TotalRevenue             -1.359545e+02
PL_sign                   1.855895e-02
cum_Sell                  2.252778e+02
positive_sign             5.877001e-02
negative_sign             4.021106e-02
cum_positive_sign         1.615608e+02
cum_negative_sign         6.364938e+01
winrate                   7.558965e-01
lossrate                  2.437949e-01
Bid_sell                  8.551674e-03
Bid_buy                   1.819505e-04
cum_Bid_buy               2.805677e-01
cum_Bid_sell              2.139374e+01
Bid_sell_rate            

## WIP

In [36]:
wallets.token_seller_address.nunique()

215099

In [37]:
wallets.quantity.unique()

array([1.00000000e+00, 2.00000000e+00, 4.00000000e+00, 7.00000000e+01,
       3.00000000e+00, 9.00000000e+00, 1.00000000e+01, 2.00000000e+01,
       5.00000000e+00, 1.50000000e+01, 6.00000000e+00, 2.50000000e+01,
       5.00000000e+01, 7.00000000e+00, 8.00000000e+00, 1.10000000e+01,
       1.20000000e+01, 6.50000000e+02, 1.40000000e+01, 1.30000000e+01,
       1.90000000e+01, 1.60000000e+01,            nan, 1.28000000e+02,
       1.00000000e+11, 1.80000000e+01, 2.40000000e+01, 2.20000000e+01,
       2.10000000e+01, 1.70000000e+01, 2.90000000e+01, 3.50000000e+01,
       4.00000000e+01, 2.80000000e+01, 3.00000000e+01, 1.00000000e+02,
       6.40000000e+01, 3.20000000e+01, 1.92000000e+02, 4.20000000e+02,
       2.50000000e+02, 5.10000000e+01, 2.00000000e+02, 6.90000000e+01,
       1.00000000e+03, 8.00000000e+01, 1.72000000e+02, 1.88000000e+02,
       5.50000000e+01, 1.00000000e+22, 4.00000000e+18, 2.70000000e+01,
       1.00000000e+09, 1.00000000e+04, 2.50000000e+03, 2.30000000e+01,
      

## Create features

In [38]:
wallets.info(show_counts=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2398450 entries, 0 to 2398449
Data columns (total 24 columns):
 #   Column                  Non-Null Count    Dtype          
---  ------                  --------------    -----          
 0   event_timestamp         2398450 non-null  datetime64[ns] 
 1   event_type              2398450 non-null  object         
 2   token_id                2382796 non-null  object         
 3   num_sales               2382796 non-null  float64        
 4   listing_time            2268796 non-null  datetime64[ns] 
 5   token_owner_address     2382796 non-null  object         
 6   token_seller_address    2395806 non-null  object         
 7   deal_price              2398450 non-null  float64        
 8   payment_token_symbol    2398411 non-null  object         
 9   payment_token_decimals  2398445 non-null  float64        
 10  payment_token_usdprice  2397937 non-null  float64        
 11  quantity                2394188 non-null  float64        
 12  

### Wallet age

In [39]:
# Wallet age: time between a wallet's first and last observed event
grp = wallets.rename(columns={"token_seller_address": "user_account_address"}) \
    .groupby("user_account_address")
grp.agg({"event_timestamp": ["max", "min"]}) \
    .assign(wallet_age=lambda x: x.loc[:, ("event_timestamp", "max")] - x.loc[:, ("event_timestamp", "min")]) \
    .sort_values(by=["wallet_age", ("event_timestamp", "min")])

Unnamed: 0_level_0,event_timestamp,event_timestamp,wallet_age
Unnamed: 0_level_1,max,min,Unnamed: 3_level_1
user_account_address,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2
0x9bb6e0fd262cf8f4340d0466fbf03e5605a9c6c5,2018-02-23 22:44:46,2018-02-23 22:44:46,0 days 00:00:00
0xd3321eab160b18ce8540934dc6180354bdeaa6d9,2018-02-24 02:08:16,2018-02-24 02:08:16,0 days 00:00:00
0x77b14fae06182a5f5bcefafeb5283156b4b57b08,2018-02-24 02:45:49,2018-02-24 02:45:49,0 days 00:00:00
0x8e83809eca1ce61e2e1932a03e9049333638b1be,2018-02-24 06:36:28,2018-02-24 06:36:28,0 days 00:00:00
0x1dda7c712aaf1f323d61557ecb18f3f9728c8be5,2018-02-24 17:06:19,2018-02-24 17:06:19,0 days 00:00:00
...,...,...,...
0x442dccee68425828c106a3662014b4f131e3bd9b,2022-05-05 02:13:02,2018-03-01 06:27:09,1525 days 19:45:53
0x12a0e25e62c1dbd32e505446062b26aecb65f028,2022-05-07 12:12:30,2018-02-24 18:31:49,1532 days 17:40:41
0x2a5ba6819249aa93c0ad8711a9f8058360083fb7,2022-05-09 13:13:33,2018-02-26 02:14:48,1533 days 10:58:45
0xd387a6e4e84a6c86bd90c158c6028a58cc8ac459,2022-05-08 21:45:05,2018-02-24 19:53:03,1534 days 01:52:02
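
Note that this "age" is only the span between the first and last event observed in the dataset, so a wallet with a single event has an age of zero. As a hedged alternative, the same feature can be sketched with named aggregation, which yields flat column names instead of a column MultiIndex. The frame `toy` and the names `first_seen` and `last_seen` are hypothetical:

```python
import pandas as pd

toy = pd.DataFrame({
    "user_account_address": ["0xaaa", "0xaaa", "0xbbb"],
    "event_timestamp": pd.to_datetime(
        ["2022-01-01 00:00:00", "2022-01-11 12:00:00", "2022-03-05 08:00:00"]),
})

ages = (
    toy.groupby("user_account_address")
       .agg(first_seen=("event_timestamp", "min"),
            last_seen=("event_timestamp", "max"))
       .assign(wallet_age=lambda x: x["last_seen"] - x["first_seen"])
       .sort_values(["wallet_age", "first_seen"])
)
print(ages)
```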


### _buy_ vs _sell_ to date

- The number of transactions (__count__) and the quantity of NFTs exchanged (__sum__)
- The median and the total transaction value in USD

In [40]:
df=wallets.rename(columns={"token_seller_address": "user_account_address"}) \
    .loc[:, ["user_account_address", "event_type", "quantity", "deal_price_usd"]] \
    .pivot_table(index="user_account_address",
                 columns="event_type",
                 values=["quantity", "deal_price_usd"],
                 aggfunc={"quantity": ["count", "sum"], "deal_price_usd": ["median", "sum"]},
                 fill_value=0)
df

Unnamed: 0_level_0,deal_price_usd,deal_price_usd,deal_price_usd,deal_price_usd,quantity,quantity,quantity,quantity
Unnamed: 0_level_1,median,median,sum,sum,count,count,sum,sum
event_type,buy,sell,buy,sell,buy,sell,buy,sell
user_account_address,Unnamed: 1_level_3,Unnamed: 2_level_3,Unnamed: 3_level_3,Unnamed: 4_level_3,Unnamed: 5_level_3,Unnamed: 6_level_3,Unnamed: 7_level_3,Unnamed: 8_level_3
0x000000000000123ca35c69ba3f852a46b2a27c94,1768.236130,0.0,3536.472260,0.0,2,0,2.0,0.0
0x0000000000015b23c7e20b0ea5ebd84c39dcbe60,4392.555350,0.0,8785.110700,0.0,2,0,2.0,0.0
0x00000000000360176d958e11c140308cd0863679,513.410040,0.0,10278.447125,0.0,12,0,12.0,0.0
0x000000000004d7463d0f9c77383600bc82d612f5,5624.725000,0.0,5624.725000,0.0,1,0,1.0,0.0
0x00000000000a486c964069bb7390ae37010a04ca,64.473485,0.0,128.946970,0.0,2,0,2.0,0.0
...,...,...,...,...,...,...,...,...
0xffffe59e4ebefce216470864fd92407023288cb4,1145.906910,0.0,1145.906910,0.0,1,0,1.0,0.0
0xffffe96d5df4b535022bcf9a901716ba3ebd8a82,2730.215700,0.0,5460.431400,0.0,2,0,2.0,0.0
0xfffff6e70842330948ca47254f2be673b1cb0db7,345.487500,0.0,22076.943750,0.0,3,0,3.0,0.0
0xffffff5800b709071d4adc74759ae4b89bef2a9d,113.776970,0.0,227.553940,0.0,2,0,2.0,0.0


Examples of users who _bundled_ multiple NFTs in a single transaction

In [41]:
df[ df[("quantity", "sum", "sell")] > df[("quantity", "count", "sell")] ].loc[:, "quantity"]

Unnamed: 0_level_0,count,count,sum,sum
event_type,buy,sell,buy,sell
user_account_address,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
0x0004ff7e7217dc672874fece2c7588581e97b1a7,1,12,1.0,16.0
0x000f9aa9783be4d2955faca0f9a4d3c676fc9e0b,30,436,30.0,445.0
0x0015b091ba5d9b3a7a84b77bc33007b1f4700dc7,5,61,5.0,67.0
0x00668bd79ede077b99bbe1c4db59418bc333d4cf,87,482,91.0,529.0
0x00845d3a8773c9323a1046d9fa885917f39987ba,21,313,21.0,314.0
...,...,...,...,...
0xff9911abdbe9d1f7d1a19595b93905c2a9ad60f4,59,596,59.0,597.0
0xffacee28004c857ef41a8b6ebd82e8c6d2c68355,1,21,1.0,22.0
0xffaeb8245a90057fe513f45ef571e102788fd71d,4,21,4.0,22.0
0xffce09ca00041e196e10458d5f981c0a1a76fe98,5,127,5.0,128.0
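
The filter works because `count` tallies sell transactions while `sum` adds up their quantities, so `sum > count` implies at least one sale with a quantity greater than 1. A minimal sketch of the same comparison on made-up data, using a plain `groupby` instead of the pivot table:

```python
import pandas as pd

toy = pd.DataFrame({
    "user_account_address": ["0xaaa", "0xaaa", "0xbbb", "0xbbb"],
    "event_type": ["sell", "sell", "sell", "buy"],
    "quantity": [1.0, 3.0, 1.0, 1.0],
})

sells = (
    toy[toy["event_type"] == "sell"]
    .groupby("user_account_address")["quantity"]
    .agg(["count", "sum"])
)
# More units sold than sell transactions means at least one multi-unit sale.
bundlers = sells[sells["sum"] > sells["count"]]
print(bundlers)
```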


Transaction history of the first of these wallets

In [42]:
wallets.set_index("token_seller_address") \
    .loc["0x0004ff7e7217dc672874fece2c7588581e97b1a7",
    ["event_timestamp", "event_type", "quantity"]].sort_values("event_timestamp")

Unnamed: 0_level_0,event_timestamp,event_type,quantity
token_seller_address,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
0x0004ff7e7217dc672874fece2c7588581e97b1a7,2021-09-09 18:42:46,sell,1.0
0x0004ff7e7217dc672874fece2c7588581e97b1a7,2021-09-19 20:50:23,sell,1.0
0x0004ff7e7217dc672874fece2c7588581e97b1a7,2021-09-21 14:28:56,sell,1.0
0x0004ff7e7217dc672874fece2c7588581e97b1a7,2021-09-21 14:28:56,buy,1.0
0x0004ff7e7217dc672874fece2c7588581e97b1a7,2021-09-28 11:47:31,sell,1.0
0x0004ff7e7217dc672874fece2c7588581e97b1a7,2021-09-29 15:30:31,sell,1.0
0x0004ff7e7217dc672874fece2c7588581e97b1a7,2021-09-30 01:05:38,sell,1.0
0x0004ff7e7217dc672874fece2c7588581e97b1a7,2021-10-06 17:24:58,sell,1.0
0x0004ff7e7217dc672874fece2c7588581e97b1a7,2021-10-12 12:55:16,sell,1.0
0x0004ff7e7217dc672874fece2c7588581e97b1a7,2021-12-25 22:55:30,sell,1.0


### `cash_flow` as a simple method to calculate profit

In [43]:
wallets["cash_flow"] = np.where(wallets.event_type == "buy",
                                -wallets.deal_price_usd,
                                 wallets.deal_price_usd)
wallets.loc[:, ["event_type", "cash_flow"]]

Unnamed: 0,event_type,cash_flow
0,sell,10786.09500
1,buy,-2636.60100
2,buy,-2396.91000
3,buy,-234.89718
4,buy,-575.25840
...,...,...
2398445,buy,-585.74511
2398446,buy,-352.85850
2398447,buy,-235.23900
2398448,buy,-1705.48275
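
The sign convention treats buys as money out and sells as money in, so summing `cash_flow` per wallet gives a net figure. A small sketch on made-up data illustrates the convention and one limitation: a wallet that has bought but not yet sold shows a negative total even though it still holds the token.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "token_seller_address": ["0xaaa", "0xaaa", "0xbbb"],
    "event_type": ["buy", "sell", "buy"],
    "deal_price_usd": [100.0, 150.0, 40.0],
})
# Buys are money out (negative), sells are money in (positive).
toy["cash_flow"] = np.where(toy["event_type"] == "buy",
                            -toy["deal_price_usd"],
                            toy["deal_price_usd"])
net = toy.groupby("token_seller_address")["cash_flow"].sum()
print(net)
```

Here `0xaaa` nets +50 from a completed round trip, while `0xbbb` shows -40 purely because its purchase has no matching sale yet.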


Profit (net cash flow) per wallet in January 2022

In [44]:
wallets.set_index("event_timestamp") \
    .loc["2022-01"] \
    .groupby("token_seller_address")["cash_flow"].sum().sort_values(ascending=False)

token_seller_address
0x17082a8fbae3c10d73a361f218ae77bafb62bf4d    5.245384e+06
0xb3ee5011a7965905cde351ea4905ff4725189a3b    3.814523e+06
0x69bab6810fa99475854bca0a3dd72ae6a0728ece    3.662032e+06
0x91338ccfb8c0adb7756034a82008531d7713009d    3.202948e+06
0x7a9fe22691c811ea339d9b73150e6911a5343dca    2.156538e+06
                                                  ...     
0xcc2a855946a3c20683858fe6ee15acf8b836f0b3   -1.006100e+06
0x28f8ca3b0eddd849c93986df0fd194252c4e4b03   -1.254518e+06
0x6639c089adfba8bb9968da643c6be208a70d6daa   -1.439506e+06
0x1919db36ca2fa2e15f9000fd9cdc2edcf863e685   -1.650039e+06
0xcfbae0fd418b61f563cfbceab3faad56e3a993b3   -2.079963e+06
Name: cash_flow, Length: 58915, dtype: float64
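
To compare months without re-running the cell for each one, the monthly totals can be computed in a single pass. This is a sketch on made-up data, assuming a `cash_flow` column built as above; the `month` column is a hypothetical helper:

```python
import pandas as pd

toy = pd.DataFrame({
    "event_timestamp": pd.to_datetime(
        ["2022-01-05", "2022-01-20", "2022-02-01", "2022-01-09"]),
    "token_seller_address": ["0xaaa", "0xaaa", "0xaaa", "0xbbb"],
    "cash_flow": [-100.0, 150.0, -30.0, -40.0],
})

monthly = (
    toy.assign(month=toy["event_timestamp"].dt.strftime("%Y-%m"))
       .groupby(["month", "token_seller_address"])["cash_flow"]
       .sum()
)
# Select one month from the (month, wallet) index and rank wallets by net cash flow.
jan = monthly.loc["2022-01"].sort_values(ascending=False)
print(jan)
```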

Is the top wallet a big trader?

In [45]:
wallets.loc[wallets.token_seller_address == "0x17082a8fbae3c10d73a361f218ae77bafb62bf4d"]

Unnamed: 0,event_timestamp,event_type,token_id,num_sales,listing_time,token_owner_address,token_seller_address,deal_price,payment_token_symbol,payment_token_decimals,...,block_hash,block_number,is_private,duration,created_date,collection_slug,contract_address,wallet_address_input,deal_price_usd,cash_flow
19400,2022-03-05 05:10:24,buy,5524,3.0,2022-03-04 23:28:42,0x0b7b36fd11da4f49dea97a90999201bcaf0120ec,0x17082a8fbae3c10d73a361f218ae77bafb62bf4d,3.300000e+17,ETH,18.0,...,0x1f0cdd19830e8ba862a67238d031d3c3d308c0c21ca3...,14324928.0,0.0,0 days 05:41:42,2022-03-05T05:10:47.222267,nuclear-nerds-of-the-accidental-apocalypse,0x7f268357a8c2552623316e2562d90e642bb538e5,0x4e080dd8f496f80ca25ff0ffac6ea5785dcd8588,791.1684,-791.1684
34981,2021-07-19 12:45:15,buy,5623,7.0,2021-07-19 10:26:00,0xf57d762b6ece30242c4a2a1c022ed155ed5fba83,0x17082a8fbae3c10d73a361f218ae77bafb62bf4d,1.890000e+18,ETH,18.0,...,0x287f84422c6bc0ff01321fa01e08b04f124f35d6b740...,12857058.0,0.0,0 days 02:19:15,2021-07-19T12:45:45.452405,bored-ape-kennel-club,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0xcc8e6d83a65bfabf0007f2633c5fc3f8c85c680c,4516.1739,-4516.1739
64755,2021-10-11 01:54:45,buy,4729,1.0,2021-10-11 00:02:53,0x08abed3598f358b6749130b04b05bba04e1c77b1,0x17082a8fbae3c10d73a361f218ae77bafb62bf4d,1.300000e+17,ETH,18.0,...,0x9bc7691d9131165f18b71de549afff092c4b7d11716b...,13394512.0,0.0,0 days 01:51:52,2021-10-11T01:55:22.816206,spunks-nft,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0x08abed3598f358b6749130b04b05bba04e1c77b1,309.4780,-309.4780
82417,2022-01-05 05:40:47,buy,2585,3.0,2022-01-05 05:24:47,0x06ebcc73c07a74dd0e2ba1050e07543b618be295,0x17082a8fbae3c10d73a361f218ae77bafb62bf4d,1.500000e+19,ETH,18.0,...,0x3af87b10d6753dea3dce33a4d1ba66c651238b186e2a...,13943622.0,0.0,0 days 00:16:00,2022-01-05T05:41:18.999748,mutant-ape-yacht-club,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0xab1dd7ccf8d14a5c817d9c03855ff95634d040c7,35898.3000,-35898.3000
131777,2022-01-21 12:38:36,buy,3641,1.0,NaT,0x44dd1ed9129b54bbb802127b8981272e7a553458,0x17082a8fbae3c10d73a361f218ae77bafb62bf4d,8.500000e+19,WETH,18.0,...,0xb44cee2d1541d13231a7170475e5a44c773e201c0496...,14048952.0,,NaT,2022-01-21T12:39:03.595310,boredapeyachtclub,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0x44dd1ed9129b54bbb802127b8981272e7a553458,190490.1000,-190490.1000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2161599,2021-07-16 11:04:23,buy,2807,1.0,2021-07-16 10:09:04,0x87ec3ede5dd11e67d749293899d5554617b61541,0x17082a8fbae3c10d73a361f218ae77bafb62bf4d,4.000000e+16,ETH,18.0,...,0x1182cf7aaed8a0e598be2f8de6b00912a0794b326519...,12837539.0,0.0,0 days 00:55:19,2021-07-16T11:07:17.578336,divine-zodiac,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0x87ec3ede5dd11e67d749293899d5554617b61541,95.2840,-95.2840
2259293,2021-08-24 16:02:47,buy,4930,1.0,2021-08-24 14:25:51,0xd686b93c14819ebe26568db204bab347289af538,0x17082a8fbae3c10d73a361f218ae77bafb62bf4d,7.000000e+17,ETH,18.0,...,0x2eb805f12b18ef8fc0273aac1b04805c168f66d7d813...,13089018.0,0.0,0 days 01:36:56,2021-08-24T16:03:18.477213,vogu,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0xe8c8eab7617f6ee168577498562c7ceff762113d,1688.2250,-1688.2250
2293937,2022-02-19 08:26:32,buy,,,2021-11-22 19:14:52,,0x17082a8fbae3c10d73a361f218ae77bafb62bf4d,5.000000e+17,ETH,18.0,...,0x0629d4ad2c06b61426452b3ff40efeeb17207a638db8...,14235459.0,0.0,88 days 13:11:40,2022-02-19T08:26:48.457149,wearetheoutkast,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0x858c8349e9f1d6da491c08aaf91ddc9b10f7da16,1218.1700,-1218.1700
2340990,2021-07-17 07:16:51,buy,5311,1.0,2021-07-15 08:23:35,0x8c72dc9bd9af2921db19bc950e3c65a15f7e9dc3,0x17082a8fbae3c10d73a361f218ae77bafb62bf4d,4.000000e+16,ETH,18.0,...,0xe2595bd7879e6084267695858416ea03714f10297fd4...,12842893.0,0.0,1 days 22:53:16,2021-07-17T07:17:19.576400,hd--punks,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0x8c72dc9bd9af2921db19bc950e3c65a15f7e9dc3,94.8500,-94.8500


## Transform dataset (WIP)

In [46]:
wallets.columns

Index(['event_timestamp', 'event_type', 'token_id', 'num_sales',
       'listing_time', 'token_owner_address', 'token_seller_address',
       'deal_price', 'payment_token_symbol', 'payment_token_decimals',
       'payment_token_usdprice', 'quantity', 'asset_bundle', 'auction_type',
       'transaction_hash', 'block_hash', 'block_number', 'is_private',
       'duration', 'created_date', 'collection_slug', 'contract_address',
       'wallet_address_input', 'deal_price_usd', 'cash_flow'],
      dtype='object')

In [47]:
wallets[wallets.token_seller_address == "0x000000000000123ca35c69ba3f852a46b2a27c94"]

Unnamed: 0,event_timestamp,event_type,token_id,num_sales,listing_time,token_owner_address,token_seller_address,deal_price,payment_token_symbol,payment_token_decimals,...,block_hash,block_number,is_private,duration,created_date,collection_slug,contract_address,wallet_address_input,deal_price_usd,cash_flow
86922,2022-02-17 09:37:34,buy,7527,1.0,2022-02-15 20:00:50,0xd311bdacb151b72bddfee9cbdc414af22a5e38dc,0x000000000000123ca35c69ba3f852a46b2a27c94,6.8e+17,ETH,18.0,...,0xb726498fab138863e12d88abf217b70b1facdc69c2b4...,14222873.0,0.0,1 days 13:36:44,2022-02-17T09:37:47.808970,raidpartyfighters,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0x4bdc1cad2d045ec17955688d0fefeb99a385c30f,1584.8828,-1584.8828
2277773,2022-02-17 09:08:33,buy,354,2.0,2022-02-15 20:01:20,0xd311bdacb151b72bddfee9cbdc414af22a5e38dc,0x000000000000123ca35c69ba3f852a46b2a27c94,7.99e+17,ETH,18.0,...,0xfce46d606649b8608f9e1b31e3ee6de18d157fbd8d05...,14222748.0,0.0,1 days 13:07:13,2022-02-17T09:08:52.893135,raidpartyfighters,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0x150fa2afc4db393b4d231cbace82ecfc7d3b4be9,1951.58946,-1951.58946


In [48]:
wallets.groupby("token_seller_address")["event_type"].nunique().sort_values()

token_seller_address
0x000000000000123ca35c69ba3f852a46b2a27c94    1
0xa9016ca47a8234986e5f32424948c86e7685917f    1
0xa90177eb7a438b534518d6c152becd730bd65121    1
0xa901bdf0b405069f671320b9d7bfeeb30dade032    1
0xa901ca455ba935b5dd8bbf8dd986ec34b931f8e3    1
                                             ..
0x0aa568cfc61041aa215cce4a39b883004276a0be    2
0xa5eae3eacf95a344cc5c54413729cf5331b9b495    2
0x4949338bb2586b9e99b6fb48f3ce8f3cd88a5aac    2
0x87f4efff19b3ddd7302f5c2219382ff1211139eb    2
0x5cd1c9be0bbe4294d70a87a826323958caf94e4a    2
Name: event_type, Length: 215099, dtype: int64

# Explore Data

In [49]:
wallets.groupby(["token_seller_address", "event_type"]).sum()

Unnamed: 0_level_0,Unnamed: 1_level_0,num_sales,deal_price,payment_token_decimals,payment_token_usdprice,quantity,block_number,is_private,deal_price_usd,cash_flow
token_seller_address,event_type,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
0x000000000000123ca35c69ba3f852a46b2a27c94,buy,3.0,1.479000e+18,36.0,4773.25,2.0,28445621.0,0.0,3536.472260,-3536.472260
0x0000000000015b23c7e20b0ea5ebd84c39dcbe60,buy,10.0,3.730000e+18,36.0,4737.92,2.0,25493004.0,0.0,8785.110700,-8785.110700
0x00000000000360176d958e11c140308cd0863679,buy,16799.0,4.304900e+18,216.0,28500.43,12.0,156312940.0,0.0,10278.447125,-10278.447125
0x000000000004d7463d0f9c77383600bc82d612f5,buy,2.0,2.500000e+18,18.0,2249.89,1.0,13613175.0,0.0,5624.725000,-5624.725000
0x00000000000a486c964069bb7390ae37010a04ca,buy,3.0,5.440000e+16,36.0,4731.51,2.0,29313998.0,0.0,128.946970,-128.946970
...,...,...,...,...,...,...,...,...,...,...
0xffffe59e4ebefce216470864fd92407023288cb4,buy,3.0,4.790000e+17,18.0,2392.29,1.0,14106408.0,0.0,1145.906910,-1145.906910
0xffffe96d5df4b535022bcf9a901716ba3ebd8a82,buy,4.0,2.330000e+18,36.0,4713.89,2.0,28107432.0,0.0,5460.431400,-5460.431400
0xfffff6e70842330948ca47254f2be673b1cb0db7,buy,6.0,9.195000e+18,54.0,7077.25,3.0,38203705.0,0.0,22076.943750,-22076.943750
0xffffff5800b709071d4adc74759ae4b89bef2a9d,buy,2.0,9.600000e+16,36.0,4732.86,2.0,29325598.0,0.0,227.553940,-227.553940


## Which users have both bought and sold NFTs during the specified period?

In [50]:
x = wallets.groupby("token_seller_address")["event_type"].nunique().reset_index()
x = x[x.event_type > 1]

In [51]:
y = wallets.merge(x, on="token_seller_address")
y.set_index("event_timestamp", inplace=True)

In [52]:
y.loc["2022-04"]["token_seller_address"].nunique()
#query('token_seller_address == "0xfffa6fc6acc3dbe04b175862376f1c5ff88cf9c1"')

5189
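
As written, cells 50 to 52 select wallets that have both event types anywhere in the dataset and then count those with at least one event in April 2022. If the intent is wallets that both bought and sold within the month itself, the period filter would need to come first. A sketch of that variant on made-up data:

```python
import pandas as pd

toy = pd.DataFrame({
    "event_timestamp": pd.to_datetime(
        ["2022-04-01", "2022-04-15", "2022-04-02", "2022-03-30", "2022-04-20"]),
    "token_seller_address": ["0xaaa", "0xaaa", "0xbbb", "0xbbb", "0xccc"],
    "event_type": ["buy", "sell", "buy", "sell", "sell"],
})

# Restrict to the period first, then look for wallets with both event types.
in_april = toy[(toy["event_timestamp"] >= "2022-04-01")
               & (toy["event_timestamp"] < "2022-05-01")]
kinds = in_april.groupby("token_seller_address")["event_type"].nunique()
both = kinds[kinds > 1].index
print(list(both))
```

In this toy data `0xbbb` bought in April but sold in March, so it would be counted by the dataset-wide approach and excluded by this one.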

In [53]:
hide_columns = ['token_owner_address', 'payment_token_decimals',
                'payment_token_usdprice',
                'transaction_hash', 'block_hash', 'block_number']
wallets.loc[:,~wallets.columns.isin(hide_columns)]

Unnamed: 0,event_timestamp,event_type,token_id,num_sales,listing_time,token_seller_address,deal_price,payment_token_symbol,quantity,asset_bundle,auction_type,is_private,duration,created_date,collection_slug,contract_address,wallet_address_input,deal_price_usd,cash_flow
0,2022-05-07 13:20:01,sell,13921,1.0,2022-05-07 03:34:35,0x82dc39052703cb51718b92fd62a6da6d1e749a0c,4.500000e+18,ETH,1.0,,,0.0,0 days 09:45:26,2022-05-07T13:20:33.224540,otherdeed,0x7f268357a8c2552623316e2562d90e642bb538e5,0x82dc39052703cb51718b92fd62a6da6d1e749a0c,10786.09500,10786.09500
1,2022-05-07 09:03:02,buy,562954248415769,4.0,2022-05-07 08:06:21,0xd44a7b02e9692f491fb360d6a509e37c06bcd579,1.100000e+18,ETH,1.0,,,0.0,0 days 00:56:41,2022-05-07T09:03:17.428510,10ktf,0x7f268357a8c2552623316e2562d90e642bb538e5,0x82dc39052703cb51718b92fd62a6da6d1e749a0c,2636.60100,-2636.60100
2,2022-05-07 03:28:14,buy,281479271685666,2.0,2022-05-07 03:17:41,0x56a7a519cb9d369334a24c98b44164d18a9b8385,1.000000e+18,ETH,1.0,,,0.0,0 days 00:10:33,2022-05-07T03:28:49.898598,10ktf,0x7f268357a8c2552623316e2562d90e642bb538e5,0x82dc39052703cb51718b92fd62a6da6d1e749a0c,2396.91000,-2396.91000
3,2022-05-06 11:45:23,buy,1022766819668093232954669218231971723193525699...,1.0,2022-05-06 01:36:57,0x278d9db7032ffe25c5fcec6fb517f4e2041805d3,9.800000e+16,ETH,1.0,,,0.0,0 days 10:08:26,2022-05-06T11:45:37.683956,ens,0x7f268357a8c2552623316e2562d90e642bb538e5,0x82dc39052703cb51718b92fd62a6da6d1e749a0c,234.89718,-234.89718
4,2022-05-05 03:36:13,buy,7588,1.0,2022-05-05 03:12:17,0xef9fdc930d645299d01440d82b6c417cbd8f7162,2.400000e+17,ETH,1.0,,,0.0,0 days 00:23:56,2022-05-05T03:36:43.828893,somethingtoken,0x7f268357a8c2552623316e2562d90e642bb538e5,0x82dc39052703cb51718b92fd62a6da6d1e749a0c,575.25840,-575.25840
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2398445,2021-10-11 22:22:25,buy,5008,1.0,2021-10-11 17:39:07,0x4de910a6ca7cec4fe0db9edc24c3a66d6558ea3f,2.490000e+17,ETH,1.0,,,0.0,0 days 04:43:18,2021-10-11T22:22:56.482359,eponym,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0x899f7e7bba83a4462144e576f8f14f018bb30d2a,585.74511,-585.74511
2398446,2021-10-11 20:04:51,buy,6886,1.0,2021-10-11 17:51:29,0xf9d681c3b81aa1d0ecb3fdb4c69ca57714eb63f4,1.500000e+17,ETH,1.0,,,0.0,0 days 02:13:22,2021-10-11T20:05:14.211280,eponym,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0x899f7e7bba83a4462144e576f8f14f018bb30d2a,352.85850,-352.85850
2398447,2021-10-11 17:45:54,buy,8855789430591774980265252005795383506440366856...,1.0,2021-10-08 14:18:44,0xc3c9fdee83ad8c7b29b5ce2c6b8d19fa116c0e74,1.000000e+17,ETH,1.0,,,0.0,3 days 03:27:10,2021-10-11T17:46:35.187753,legendz,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0x899f7e7bba83a4462144e576f8f14f018bb30d2a,235.23900,-235.23900
2398448,2021-10-10 00:51:27,buy,8953,3.0,2021-10-09 03:30:15,0x15f7320adb990020956d29edb6ba17f3d468001e,7.250000e+17,ETH,1.0,,,0.0,0 days 21:21:12,2021-10-10T00:52:15.403998,onchainmonkey,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0x899f7e7bba83a4462144e576f8f14f018bb30d2a,1705.48275,-1705.48275


## Collections

Check the size (number of successful events) of each collection

In [54]:
df=wallets.groupby("collection_slug", as_index=False).size() \
    .sort_values("size", ascending=False).reset_index(drop=True)
df.head(20)

Unnamed: 0,collection_slug,size
0,cool-cats-nft,39846
1,parallelalpha,18308
2,pudgypenguins,13679
3,deadfellaz,12764
4,robotos-official,11859
5,boredapeyachtclub,11832
6,rarible,11239
7,mutant-ape-yacht-club,10691
8,thewickedcraniums,10569
9,cryptoadz-by-gremplin,10207


_Are these popular collections on OpenSea, or is this a bias from the data collection process?_

In [55]:
df.query('size < 100')

Unnamed: 0,collection_slug,size
2708,lilium,99
2709,metapals-pass,99
2710,elondaogmi,99
2711,coolmonkes-boosters,99
2712,otter-army,99
...,...,...
25058,perpetual,1
25059,dailystudies,1
25060,perreomarketsina,1
25061,perriev-1,1


_Private / personal collections?_

In [56]:
wallets.groupby(["collection_slug", "event_type"], as_index=False).size() \
    .pivot(index="collection_slug", columns="event_type", values="size") \
    .assign(diff=lambda x: x.buy - x.sell) \
    .sort_values(by=["buy", "sell"], ascending=False).head(20)

event_type,buy,sell,diff
collection_slug,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
cool-cats-nft,19500.0,20346.0,-846.0
parallelalpha,9457.0,8851.0,606.0
rarible,6600.0,4639.0,1961.0
boredapeyachtclub,5789.0,6043.0,-254.0
pudgypenguins,5684.0,7995.0,-2311.0
deadfellaz,5512.0,7252.0,-1740.0
ape-gang-old,5472.0,4624.0,848.0
robotos-official,5200.0,6659.0,-1459.0
cryptoadz-by-gremplin,4938.0,5269.0,-331.0
thewickedcraniums,4723.0,5846.0,-1123.0


It is reasonable to expect more buy events than sell events for a given collection,
but how do we explain collections with more sell events? Could it be mint > transfer > sell?

## NFT_ID

In [57]:
wallets["nft_id"] = wallets.collection_slug + '-' + wallets.token_id

In [58]:
wallets.groupby(["collection_slug", "nft_id", "event_type"], as_index=False).size() \
    .pivot(index=["collection_slug", "nft_id"], columns="event_type", values="size") \
    .sort_values(by=["buy", "sell"], ascending=False).head(20)

Unnamed: 0_level_0,event_type,buy,sell
collection_slug,nft_id,Unnamed: 2_level_1,Unnamed: 3_level_1
adidasoriginals,adidasoriginals-0,1727.0,3417.0
pixel-vault-mintpass,pixel-vault-mintpass-0,1707.0,2071.0
888innercircle,888innercircle-888,1444.0,2515.0
bobutoken,bobutoken-1,1222.0,718.0
lostpoets,lostpoets-1,1057.0,2326.0
metaverse-hq,metaverse-hq-70196056058896361747704672441801371315898722973429726505227809712513925252572,868.0,565.0
oncyber,oncyber-5,736.0,1319.0
woodies-mint-passport,woodies-mint-passport-2,715.0,819.0
unitedplanets,unitedplanets-5,696.0,1477.0
rtfkt-mnlth,rtfkt-mnlth-1,656.0,1077.0


_*Tokens that have been exchanged multiple times._

## Weekly Price and Volume (WIP)

In [59]:
grp=wallets.query('collection_slug == "cool-cats-nft"') \
    .groupby(pd.Grouper(key="event_timestamp", freq="1W"))

In [60]:
grp.agg({"deal_price": ["sum", "median"], "token_id": "count"}) \
    .assign(sum_pct_chg=lambda x: x[("deal_price", "sum")].pct_change(),
            median_pct_chg=lambda x: x[("deal_price", "median")].pct_change()) \
    .loc["2022-01":,].head(20)

Unnamed: 0_level_0,deal_price,deal_price,token_id,sum_pct_chg,median_pct_chg
Unnamed: 0_level_1,sum,median,count,Unnamed: 4_level_1,Unnamed: 5_level_1
event_timestamp,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2
2022-01-02,6.192346e+21,9.74e+18,591,1.67689,0.307558
2022-01-09,5.530636e+21,1.12e+19,460,-0.106859,0.149897
2022-01-16,4.325907e+21,1.21e+19,341,-0.217828,0.080357
2022-01-23,3.358874e+21,1.266e+19,254,-0.223545,0.046281
2022-01-30,7.591932e+21,1.48e+19,508,1.260261,0.169036
2022-02-06,5.958029e+21,1.42e+19,417,-0.215216,-0.040541
2022-02-13,2.210485e+21,1.1645e+19,180,-0.628991,-0.17993
2022-02-20,3.283386e+21,1.14e+19,280,0.485369,-0.021039
2022-02-27,3.904059e+21,8.1e+18,475,0.189035,-0.289474
2022-03-06,1.634373e+21,7.69e+18,204,-0.581366,-0.050617
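
The `deal_price` values above are raw amounts on the order of 1e18 to 1e21, which makes the table hard to read. Assuming `payment_token_decimals` gives the number of decimal places of the payment token (18 for ETH and WETH in the rows shown earlier), the prices can be scaled to whole tokens before aggregating. A sketch on made-up data; `price_token` is a hypothetical column name, and the sketch assumes all events use tokens of comparable value:

```python
import pandas as pd

toy = pd.DataFrame({
    "event_timestamp": pd.to_datetime(
        ["2022-01-03", "2022-01-05", "2022-01-10", "2022-01-12"]),
    "deal_price": [1.0e19, 1.2e19, 0.9e19, 1.5e19],
    "payment_token_decimals": [18.0, 18.0, 18.0, 18.0],
})
# Scale raw amounts to whole tokens (assumed meaning of the decimals column).
toy["price_token"] = toy["deal_price"] / 10 ** toy["payment_token_decimals"]

# Weekly bins ending on Sunday, matching the weekly grouping used above.
weekly = (
    toy.groupby(pd.Grouper(key="event_timestamp", freq="W"))["price_token"]
       .agg(["sum", "median", "count"])
)
weekly["median_pct_chg"] = weekly["median"].pct_change()
print(weekly)
```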


### Subsetting the events by date

In [61]:
wallets.set_index("event_timestamp").loc['2021-10-10']

Unnamed: 0_level_0,event_type,token_id,num_sales,listing_time,token_owner_address,token_seller_address,deal_price,payment_token_symbol,payment_token_decimals,payment_token_usdprice,...,block_number,is_private,duration,created_date,collection_slug,contract_address,wallet_address_input,deal_price_usd,cash_flow,nft_id
event_timestamp,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2021-10-10 01:17:18,buy,9931,2.0,2021-10-09 18:56:55,0xfe6273fb8ccca5ef8304450cd34447989363bb7c,0x025c7ca2e2892bf6cb3664817828c93dfcee9172,2.700000e+17,ETH,18.0,2396.91,...,13387957.0,0.0,0 days 06:20:23,2021-10-10T01:18:21.875655,zombiecat,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0x088941f320d9980c02a73de3a56210206f819af1,647.165700,-647.165700,zombiecat-9931
2021-10-10 04:19:06,buy,808,5.0,2021-10-10 01:15:18,0xef3398709aa0de1a3edc741b19065b62ce400003,0x5fd36a4a4bcfe5ea059706a6a09c26b62be4059a,8.900000e+17,ETH,18.0,2396.91,...,13388747.0,0.0,0 days 03:03:48,2021-10-10T04:19:41.181945,mutantcats,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0x2cc286af1d641712bc4ff3407991bfcb2b26b63b,2133.249900,-2133.249900,mutantcats-808
2021-10-10 00:42:32,buy,4096,3.0,2021-10-08 19:20:52,0x2fc0b1d8f79ac8ac781105c6a601543d0cfdd672,0xee5ce06accce11bc77c5a93723c8032d9108f22d,9.000000e+17,ETH,18.0,2396.91,...,13387812.0,0.0,1 days 05:21:40,2021-10-10T00:43:06.074973,onchainmonkey,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0x2cc286af1d641712bc4ff3407991bfcb2b26b63b,2157.219000,-2157.219000,onchainmonkey-4096
2021-10-10 12:51:34,buy,1838,3.0,2021-10-10 12:37:43,0x9759cd43042bb2ce7ba22d3e2beb675153442d80,0xedc0c829caafb2755582bf3cc2c56c4ad403be43,8.450000e+17,ETH,18.0,2396.91,...,13390976.0,0.0,0 days 00:13:51,2021-10-10T12:51:53.525937,thehumanoids,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0x9759cd43042bb2ce7ba22d3e2beb675153442d80,2025.388950,-2025.388950,thehumanoids-1838
2021-10-10 22:36:37,sell,2192766832090105266929319151737047525070907282...,1.0,2021-10-10 10:59:27,0x0000000000000000000000000000000000000000,0x307a9ed60faabde3c98eabd2903fcf59f4ec16f9,1.200000e+17,ETH,18.0,2396.91,...,13393593.0,1.0,0 days 11:37:10,2021-10-10T22:37:10.228097,cryptoleos,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0x307a9ed60faabde3c98eabd2903fcf59f4ec16f9,287.629200,287.629200,cryptoleos-21927668320901052669293191517370475...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2021-10-10 01:54:53,buy,4716,9.0,2021-10-10 01:46:56,0xabc6a5fd49166f728c699bde072147bb89626bbb,0x4cddff23d036e15fe786508ffa39b27f73b4a01a,8.111000e+17,ETH,18.0,2352.39,...,13388100.0,0.0,0 days 00:07:57,2021-10-10T01:55:31.205653,unstackedtoadz,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0x0c99ce8c2c27839f93658ee82877eb9a7a8c9fbd,1908.023529,-1908.023529,unstackedtoadz-4716
2021-10-10 01:52:02,buy,5689,6.0,2021-10-10 01:35:31,0xdcd58462d2c40a5299edc905b8f484b4a7dad390,0x2b5481a537b3639ed18e805209e4de4793b92954,8.000000e+17,ETH,18.0,2352.39,...,13388090.0,0.0,0 days 00:16:31,2021-10-10T01:53:32.220151,unstackedtoadz,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0x0c99ce8c2c27839f93658ee82877eb9a7a8c9fbd,1881.912000,-1881.912000,unstackedtoadz-5689
2021-10-10 20:59:21,sell,35000371,7.0,2021-10-08 19:18:26,0x92e9ca19fd44ed10d8183090b04eb72453ea22ac,0xee2401e429ad36a609059db84ceeb349f276cd60,2.220000e+18,ETH,18.0,2352.39,...,13393191.0,0.0,2 days 01:40:55,2021-10-10T20:59:51.457111,aerial-view-by-dalenz,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0xee2401e429ad36a609059db84ceeb349f276cd60,5222.305800,5222.305800,aerial-view-by-dalenz-35000371
2021-10-10 00:51:27,buy,8953,3.0,2021-10-09 03:30:15,0xdbe991bb2a089d37e687cbc8b5e046626a0f3dca,0x15f7320adb990020956d29edb6ba17f3d468001e,7.250000e+17,ETH,18.0,2352.39,...,13387845.0,0.0,0 days 21:21:12,2021-10-10T00:52:15.403998,onchainmonkey,0x7be8076f4ea4a4ad08075c2508e481d6c946d12b,0x899f7e7bba83a4462144e576f8f14f018bb30d2a,1705.482750,-1705.482750,onchainmonkey-8953


# Exporting Data for Other Experiments

# Note
Feature engineering in ML consists of:
1. Feature Creation
1. Transformations
1. Feature Extraction
1. Feature Selection