Given that spring break is on the way, you have a longer time for this assignment and I included two steps for this. Step 1 is the reader and you need to have that correctly done. Step 2 is more for your exploration and learning at this stage, try to do something about it as much as you can, a naive initial model would be ok here.

Step 1: Data Reader is a program that reads data items from files and folders into your machine learning program, and creates training/testing examples. Depending on how clean/structured is your data this step might need a bit of work for you.

Please, push the code to your GitHub repository and post the commit's URL here in D2L. 

Step 2: Initial Model. When you have the data reader in place, use a learning algorithm on the data, and evaluate its performance, even a naive initial working model that is made by your team would be sufficient.  Also, please write up the following information: 

What is the setting of your project? is it supervised, unsupervised, or semisupervised? (I think everybody got supervision for this class though). 

How you represent X (Features/Representation  and their explanation, justification)
How do you represent Y (binary? multi-class? sequences? trees? graphs?)

Which libraries you will use?

Note that, all this information will be a part of your final project report.

 --You might use any machine learning or deep learning libraries for your project or even use your own implemented algorithm, in any case, you will need to do the following:

Make a test/train split of the data
2. Use your machine learning algorithm to learn from your training set
3. Evaluate the model with the test set.
4. Send a report summary about the model's performance on train and test datasets (it is fine that it is very low at this stage).
5. Push your code to your repository on GitHub, and post a link to it here in response to this assignment.

====================================================================================================================

For our first naive initial model we will build a NFT recommendations engine using cosine_similarity. Cosine_similarity appears to be a very popular way to build recommendation engines. For our second model we will build it using Collaborative Filters and test which model provides us with better results. The various recommendation models recommended in our research were Collaborative Filtering ("Wisdom of the Crowd") models and Similarity Based recomendatoin models. Our Final model will be which ever one performs better. 

In [1]:
#Import the libraries we need
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer

Import NFT dataset. It can be downloaded from this URL : 
https://www.kaggle.com/datasets/hemil26/nft-collections-dataset?resource=download

In [2]:
data_frame = pd.read_csv('nft_sales.csv', error_bad_lines=False)

In [3]:
data_frame.head()

Unnamed: 0,Collections,Sales,Buyers,Txns,Owners
0,Axie Infinity,"$4,090,222,023",1790587,17670824,2130467
1,Bored Ape Yacht Club,"$2,439,754,017",12052,32670,6586
2,CryptoPunks,"$2,388,467,992",6076,22269,3804
3,Mutant Ape Yacht Club,"$1,744,822,678",23768,51775,13121
4,Art Blocks,"$1,310,734,558",33549,184470,36091


In [4]:
#Remove NaN rows
data_frame = data_frame.dropna()

Since we are using Cosine Similairty for our first recommendations model, we will need to find a way to convert our number inputs into text. We will do this the following way. We first will clean the data by converting each element in a column to a integer value. Then we will find the percentiles of the values in each column. Then we will replace the value with a string denoting what percentile the value belonged in.

In [5]:
#Remove Commas
data_frame["Owners"]=data_frame["Owners"].str.replace(',','')
#Remove empty rows

#turn into int
data_frame["Owners"] = data_frame["Owners"].astype(int)

In [6]:
#Get Percentiles
print(data_frame["Owners"].quantile(0.25))
print(data_frame["Owners"].quantile(0.50))
print(data_frame["Owners"].quantile(0.75))

3253.0
4411.0
5719.5


In [7]:
#Convert number of owners to text for cosine similarity. This is not the size of the collection. This is the number of unique wallets which own a part of the collection
def round_numbers_oweners(number_input):
    if number_input < 3253:
        return "small"
    elif number_input < 4411:
        return "medium"
    elif number_input < 5719:
        return "large"
    else:
        return "huge"

In [8]:
#Remove Commas
data_frame["Txns"]=data_frame["Txns"].str.replace(',','')
#Remove empty rows

#turn into int
data_frame["Txns"] = data_frame["Txns"].astype(int)

#Get Percentiles
print(data_frame["Txns"].quantile(0.25))
print(data_frame["Txns"].quantile(0.50))
print(data_frame["Txns"].quantile(0.75))

13111.0
18437.0
24606.5


In [9]:
def round_numbers_transactions(number_input):
    if number_input < 13111:
        return "pequeno"
    elif number_input < 18437:
        return "medio"
    elif number_input < 24606:
        return "grande"
    else:
        return "enorme"

In [10]:
#Remove Commas
data_frame["Buyers"]=data_frame["Buyers"].str.replace(',','')
#Remove empty rows

#turn into int
data_frame["Buyers"] = data_frame["Buyers"].astype(int)

#Get Percentiles
print(data_frame["Buyers"].quantile(0.25))
print(data_frame["Buyers"].quantile(0.50))
print(data_frame["Buyers"].quantile(0.75))

5324.0
8239.0
11204.5


In [11]:
def round_numbers_buyers(number_input):
    if number_input < 5324:
        return "jageun"
    elif number_input < 8239:
        return "junggan"
    elif number_input < 11204:
        return "keun"
    else:
        return "eomcheongnan"

In [12]:
#Remove Commas
data_frame["Sales"]=data_frame["Sales"].str.replace(',','')
#Remove Money Sign
data_frame["Sales"]=data_frame["Sales"].str.replace('$','')
#Remove empty rows

#turn into int
data_frame["Sales"] = data_frame["Sales"].astype('int64')

#Get Percentiles
print(data_frame["Sales"].quantile(0.25))
print(data_frame["Sales"].quantile(0.50))
print(data_frame["Sales"].quantile(0.75))

29761266.5
46444604.0
87674392.0


In [13]:
def round_numbers_sales(number_input):
    if number_input < 29761266:
        return "petit"
    elif number_input < 46444604:
        return "moyen"
    elif number_input < 87674392:
        return "enorme"
    else:
        return "massif"

In [14]:
# Convert Number input into text defining the percentile the value was in
data_frame["Sales"]= data_frame["Sales"].apply(round_numbers_sales)
data_frame["Buyers"]= data_frame["Buyers"].apply(round_numbers_buyers)
data_frame["Txns"]= data_frame["Txns"].apply(round_numbers_transactions)
data_frame["Owners"]= data_frame["Owners"].apply(round_numbers_oweners)

In [15]:
#This creates a new features column which has text from all of the other coloumns
data_frame["Features"] = data_frame["Collections"] + " " + (data_frame["Sales"]) + " " + (data_frame["Buyers"]) + " " + (data_frame["Txns"]) + " " + (data_frame["Owners"])

In [16]:
data_frame.head()

Unnamed: 0,Collections,Sales,Buyers,Txns,Owners,Features
0,Axie Infinity,massif,eomcheongnan,enorme,huge,Axie Infinity massif eomcheongnan enorme huge
1,Bored Ape Yacht Club,massif,eomcheongnan,enorme,huge,Bored Ape Yacht Club massif eomcheongnan enorm...
2,CryptoPunks,massif,junggan,grande,medium,CryptoPunks massif junggan grande medium
3,Mutant Ape Yacht Club,massif,eomcheongnan,enorme,huge,Mutant Ape Yacht Club massif eomcheongnan enor...
4,Art Blocks,massif,eomcheongnan,enorme,huge,Art Blocks massif eomcheongnan enorme huge


In [17]:
#convert text from features to matrix
cm = CountVectorizer().fit_transform(data_frame["Features"])

In [18]:
#Get the cosine similarity matrix from the count matrix
cs = cosine_similarity(cm)
print(cs)

[[1.         0.57735027 0.18257419 ... 0.         0.         0.16666667]
 [0.57735027 1.         0.15811388 ... 0.         0.         0.14433757]
 [0.18257419 0.15811388 1.         ... 0.36514837 0.18257419 0.18257419]
 ...
 [0.         0.         0.36514837 ... 1.         0.33333333 0.33333333]
 [0.         0.         0.18257419 ... 0.33333333 1.         0.33333333]
 [0.16666667 0.14433757 0.18257419 ... 0.33333333 0.33333333 1.        ]]


In [31]:
#Get the id of the collection
collection_id = data_frame[data_frame.Collections == 'Nifty League DEGENs'].index[0]
name_of_collection = 'Nifty League DEGENs'
collection_id

183

In [32]:
#Create list of tuples in the form (collection_id, similairty_score)
scores = list(enumerate(cs[collection_id]))
print(scores)

[(0, 0.0), (1, 0.0), (2, 0.0), (3, 0.0), (4, 0.0), (5, 0.0), (6, 0.0), (7, 0.19999999999999998), (8, 0.0), (9, 0.0), (10, 0.0), (11, 0.0), (12, 0.0), (13, 0.0), (14, 0.18257418583505539), (15, 0.15811388300841894), (16, 0.39999999999999997), (17, 0.1690308509457033), (18, 0.0), (19, 0.19999999999999998), (20, 0.0), (21, 0.0), (22, 0.39999999999999997), (23, 0.0), (24, 0.0), (25, 0.3380617018914066), (26, 0.18257418583505539), (27, 0.19999999999999998), (28, 0.0), (29, 0.0), (30, 0.36514837167011077), (31, 0.1690308509457033), (32, 0.0), (33, 0.0), (34, 0.19999999999999998), (35, 0.0), (36, 0.18257418583505539), (37, 0.0), (38, 0.0), (39, 0.0), (40, 0.19999999999999998), (41, 0.0), (42, 0.0), (43, 0.0), (44, 0.1690308509457033), (45, 0.0), (46, 0.0), (47, 0.0), (48, 0.18257418583505539), (49, 0.0), (50, 0.1690308509457033), (51, 0.18257418583505539), (52, 0.0), (53, 0.19999999999999998), (54, 0.0), (55, 0.0), (56, 0.0), (57, 0.19999999999999998), (58, 0.14907119849998596), (59, 0.182574

In [33]:
#Sort Similarity Scores from highest to lowest
scores_sorted = sorted(scores, key=lambda x:x[1], reverse=True)
scores_sorted

[(183, 0.9999999999999999),
 (112, 0.6),
 (198, 0.6),
 (212, 0.6),
 (213, 0.6),
 (217, 0.6),
 (222, 0.6),
 (186, 0.5477225575051662),
 (211, 0.5477225575051662),
 (216, 0.5477225575051662),
 (220, 0.5477225575051662),
 (229, 0.5477225575051662),
 (196, 0.50709255283711),
 (199, 0.50709255283711),
 (208, 0.50709255283711),
 (215, 0.50709255283711),
 (224, 0.47434164902525683),
 (132, 0.4472135954999579),
 (16, 0.39999999999999997),
 (22, 0.39999999999999997),
 (82, 0.39999999999999997),
 (88, 0.39999999999999997),
 (107, 0.39999999999999997),
 (116, 0.39999999999999997),
 (126, 0.39999999999999997),
 (144, 0.39999999999999997),
 (149, 0.39999999999999997),
 (153, 0.39999999999999997),
 (178, 0.39999999999999997),
 (182, 0.39999999999999997),
 (187, 0.39999999999999997),
 (192, 0.39999999999999997),
 (200, 0.39999999999999997),
 (206, 0.39999999999999997),
 (226, 0.39999999999999997),
 (30, 0.36514837167011077),
 (97, 0.36514837167011077),
 (104, 0.36514837167011077),
 (113, 0.3651483716

In [37]:
#Get the 10 most similar NFT Collections
j = 1
for i in range(0, 11):
    collection_name = data_frame.iloc[scores_sorted[i][0]].values[0]
    if collection_name != name_of_collection:
        print(str(j) + " : " + collection_name)
        j+=1
    

1 : BAPETAVERSE
2 : Akutars
3 : Chimpers
4 : Voxies
5 : Nouns
6 : Solsteads
7 : Deafbeef
8 : X Rabbits Club
9 : Cupcats Official
10 : Llamaverse Genesis
11 : Stoner Cats


In [38]:
#Now creating a function which will do the following for us
def getRecommendations(input_collection_name):
    collection_id = data_frame[data_frame.Collections == input_collection_name].index[0]
    scores = list(enumerate(cs[collection_id]))
    scores_sorted = sorted(scores, key=lambda x:x[1], reverse=True)
    
    j = 1
    for i in range(0, 11):
        collection_name = data_frame.iloc[scores_sorted[i][0]].values[0]
        if collection_name != input_collection_name:
            print(str(j) + " : " +collection_name)
            j+=1


In [39]:
getRecommendations("CryptoPunks")

1 : FLUF World
2 : Cryptoadz
3 : CyberBrokers
4 : Aurory
5 : Hashmasks
6 : Animetas
7 : Acrocalypse
8 : 0N1 Force
9 : Emblem Vault
10 : Vox Collectibles


In [40]:
getRecommendations("ZombieClub Token")

1 : ApeKidsClub
2 : Smilesss
3 : SolPunks
4 : The Humanoids
5 : Koala Intelligence Agency
6 : the littles NFT
7 : EveraiDuo
8 : Shinsekai
9 : CatBloxGenesis
10 : Arcade Land
11 : Galaxy Eggs


In [41]:
getRecommendations("MoodRollers")

1 : Antonym
2 : Cryptoadz
3 : Pixelmon
4 : CryptoSkulls
5 : GalacticApes
6 : MutantCats
7 : Shinsekai
8 : CatBloxGenesis
9 : Acrocalypse
10 : 0N1 Force
11 : Sup Ducks


Testing this model is a bit different from the various classification models covered in class. Since this is a recommendations model, the only real way to see if our model is working would be to show the recommendations to real users and ask them to rate our recommendations. Testing will be different for our Collaborative Filtering Model, since we can test to see if a user belongs to certain groups or not