## OPTYFI CryptoPunks Data Competition 

CryptoPunks is one of the earliest NFT projects and helped lead the cryptoart movement. The project consists of 10,000 unique CryptoPunk tokens, each with a different set of physical attributes. The prices at which these tokens trade have been rising at a considerable rate, which raises the question: what drives the price of these tokens, and can we predict it?

The provided datasets contain information on the physical attributes of each of the 10,000 CryptoPunks and transaction data for each sale of a Punk. The data set was prepared by one-hot encoding the features and anonymizing the token IDs and feature names.

The goal of this competition is to train a model that predicts the price of a sale given the transaction and token information. The final score is the mean squared error (MSE) between the price predictions in your submission file and the hidden test data.

MSE = $\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
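
As a quick illustration of the metric, the sketch below computes the MSE by hand with NumPy, which is the same quantity that `sklearn.metrics.mean_squared_error` returns with its default arguments. It also takes the square root (RMSE), which is expressed in the same units as the price. The price values are made up for the example and are not competition data.

```python
import numpy as np

# Made-up true and predicted prices, for illustration only.
y_true = np.array([96.0, 137.77, 26.5, 130.0])
y_pred = np.array([90.0, 140.0, 30.0, 120.0])

# MSE: mean of the squared differences between true and predicted prices.
mse = np.mean((y_true - y_pred) ** 2)

# RMSE: square root of the MSE, in the same units as the prices.
rmse = np.sqrt(mse)

print('MSE:', mse)
print('RMSE:', rmse)
```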

Please submit your **submission.csv** and **price_estimation.csv** to david@opty.fi.

[CryptoPunks](https://www.larvalabs.com/cryptopunks)


In [14]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

import xgboost as xgb

## cryptopunks_token_data.csv
Each row represents a unique CryptoPunk identified by its token ID (this number has been randomly changed). Columns 0-90 are features that represent the attributes of the punk.

In [15]:
cryptopunks_tokens = pd.read_csv('/content/cryptopunks_token_data.csv')
cryptopunks_tokens.head()

Unnamed: 0,token_id,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,...,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90
0,9276,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,4,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,2511,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,2,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
2,980,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0
3,9278,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,2,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
4,2163,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,2,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0


## cryptopunks_transactions.csv
Each row is the sale of a CryptoPunk at a certain time. The 'price [ETH]' column holds the price at which the punk was sold.

In [16]:
cryptopunks_transactions = pd.read_csv('/content/cryptopunks_transactions.csv')
cryptopunks_transactions.head()

Unnamed: 0,txn_id,token_id,txn_type,time,price [ETH]
0,0,6941,1,0.895753,96.0
1,1,574,1,0.946493,137.77
2,2,4672,1,0.791861,26.5
3,3,3820,1,0.895977,130.0
4,4,9696,1,0.946683,159.0


## submission_data_cryptopunks_transactions.csv
This is the fraction of cryptopunks_transactions.csv that was withheld. The goal of this competition is to make accurate predictions for the 'price [ETH]' column on this data.

In [17]:
submission = pd.read_csv('/content/submission_data_cryptopunks_transactions.csv')
submission.head()

Unnamed: 0,txn_id,token_id,txn_type,time
0,11497,675,1,0.839811
1,11498,4168,1,0.785025
2,11499,8384,1,0.807303
3,11500,3359,1,0.663873
4,11501,6759,1,0.773599


## price_estimation.csv 
This dataframe is used to predict a fair price for each Punk if it were sold at time 1 with transaction type 1. Make predictions on the data and append them to this dataframe.

In [26]:
price_estimation = pd.read_csv('/content/price_estimation.csv')
price_estimation.head()

Unnamed: 0,token_id,txn_type,time
0,0,1,1
1,1,1,1
2,2,1,1
3,3,1,1
4,4,1,1


## Example

In [18]:
X_df = cryptopunks_transactions.join(cryptopunks_tokens, on='token_id', rsuffix='_1').drop(['token_id', 'txn_id', 'token_id_1', 'price [ETH]'], axis=1)
y_df = cryptopunks_transactions['price [ETH]']

X = X_df.values
y = y_df.values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
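
One thing worth checking in the cell above: `DataFrame.join(other, on='token_id')` matches the caller's `token_id` column against the *index* of `other`, not against a `token_id` column in `other`. If `cryptopunks_tokens` still has its default row index, rows may be paired by row position rather than by token. The sketch below uses two small made-up frames (not the competition data) to show the difference and one possible key-based alternative. Whether this matters for the real files depends on how they are indexed, so treat it as something to verify rather than a confirmed problem.

```python
import pandas as pd

# Toy stand-ins for the token and transaction tables (values are made up).
tokens = pd.DataFrame({'token_id': [30, 10, 20], 'feat': [3, 1, 2]})
txns = pd.DataFrame({'token_id': [10, 20], 'time': [0.5, 0.9]})

# join(on=...) looks up txns['token_id'] in the index of tokens (0, 1, 2 here),
# so no row matches and the joined feature column is all NaN.
by_index = txns.join(tokens, on='token_id', rsuffix='_1')

# Using token_id as the index of the right-hand frame makes the lookup key-based.
by_key = txns.join(tokens.set_index('token_id'), on='token_id')

print(by_index)
print(by_key)
```

If the same change were applied to the notebook, the training, submission, and price-estimation frames would all need the same join and the same column order, since the model is fit on `.values` arrays.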

In [19]:
xgb_regressor = xgb.XGBRegressor(
    objective='reg:squarederror',
    colsample_bytree=0.3,
    learning_rate=0.1,
    max_depth=5,
    alpha=10,
    n_estimators=500,
)

xgb_regressor.fit(X_train, y_train)
preds = xgb_regressor.predict(X_test)

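# mean_squared_error returns the MSE; the value printed below is its
# square root (RMSE), which is in the same units as the price.
score = mean_squared_error(y_test, preds)
print('Avg Error: ' + str(np.sqrt(score)))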

Avg Error: 42.599923887688014
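
To put an error like this in context, it can help to compare it against a trivial baseline that always predicts the mean training price. The sketch below is a minimal illustration using synthetic numbers (the `*_demo` arrays are assumptions for the example); in the notebook, `y_train` and `y_test` from the split above would take their place.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for y_train / y_test (not competition data).
y_train_demo = rng.gamma(shape=2.0, scale=30.0, size=200)
y_test_demo = rng.gamma(shape=2.0, scale=30.0, size=100)

# Baseline: predict the mean training price for every test row.
baseline_pred = np.full_like(y_test_demo, y_train_demo.mean())

# RMSE of the baseline, computed the same way as the model's score above.
baseline_rmse = np.sqrt(np.mean((y_test_demo - baseline_pred) ** 2))

print('Baseline RMSE:', baseline_rmse)
```

A model whose RMSE is not clearly below this baseline is probably learning little from the features.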


In [20]:
submission_df = submission.join(cryptopunks_tokens, on='token_id', rsuffix='_1').drop(['token_id', 'txn_id', 'token_id_1'], axis=1)

In [21]:
submission_data = submission_df.values
submission_label = xgb_regressor.predict(submission_data)

In [24]:
submission['price [ETH]'] = submission_label
submission.head()

Unnamed: 0,txn_id,token_id,txn_type,time,label,price [ETH]
0,11497,675,1,0.839811,26.05579,26.05579
1,11498,4168,1,0.785025,35.047634,35.047634
2,11499,8384,1,0.807303,27.075163,27.075163
3,11500,3359,1,0.663873,-1.660334,-1.660334
4,11501,6759,1,0.773599,15.663399,15.663399


In [25]:
submission.to_csv('submission.csv', index=False)
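
Before sending the file, a few sanity checks may be worthwhile. The `head()` output above already shows a negative predicted price, for example. The helper below is a hypothetical sketch (the name `check_submission` and the specific checks are assumptions, not requirements stated by the organizers), demonstrated on a made-up frame that mirrors the submission layout.

```python
import pandas as pd

def check_submission(df, expected_rows, price_col='price [ETH]'):
    """Return a list of problems found in a submission frame (empty if none)."""
    if price_col not in df.columns:
        return [f'missing column: {price_col}']
    problems = []
    if len(df) != expected_rows:
        problems.append(f'expected {expected_rows} rows, found {len(df)}')
    if df[price_col].isna().any():
        problems.append('price column contains NaN')
    if (df[price_col] < 0).any():
        problems.append('price column contains negative values')
    return problems

# Toy frame mirroring the submission layout (values are made up).
demo = pd.DataFrame({'txn_id': [1, 2, 3], 'price [ETH]': [26.1, -1.7, 15.7]})
print(check_submission(demo, expected_rows=3))
```

In the notebook this could be called as `check_submission(submission, expected_rows=len(submission_df))`.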

In [29]:
price_estimation_df = price_estimation.join(cryptopunks_tokens, on='token_id', rsuffix='_1').drop(['token_id', 'token_id_1'], axis=1)
price_estimation_data = price_estimation_df.values
price_estimation_label = xgb_regressor.predict(price_estimation_data)
price_estimation['price [ETH]'] = price_estimation_label
price_estimation.head()

Unnamed: 0,token_id,txn_type,time,price [ETH]
0,0,1,1,78.062439
1,1,1,1,80.062698
2,2,1,1,115.729736
3,3,1,1,111.239746
4,4,1,1,73.616096


In [None]:
price_estimation.to_csv('price_estimation.csv', index=False)