# Modelling

## Jaccard Similarity

[Calculate Jaccard Similarity in Python - Data Science Parichay](https://datascienceparichay.com/article/jaccard-similarity-python/)

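TLDR:
Jaccard similarity is the size of the intersection of two sets divided by the size of their union. For two one-hot encoded rows, that is the number of features both rows have, over the number of features at least one of them has.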

[Clear Example of Jaccard Similarity // Visual Explanation of What is the Jaccard Index? - YouTube](https://www.youtube.com/watch?v=YotbvhndSf4)


![image-4.png.webp (273×92)](https://datascienceparichay.com/wp-content/uploads/2021/11/image-4.png.webp)
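As a quick sanity check of the formula, here is a minimal sketch on plain Python sets. The function name `jaccard_similarity` and the two feature sets are made up for illustration; they are not taken from the dataset.

```python
def jaccard_similarity(a, b):
    """Return |A ∩ B| / |A ∪ B| for two iterables of hashable items."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical feature sets for two models
model_a = {"fill-mask", "bert", "transformers", "BertForMaskedLM"}
model_b = {"fill-mask", "roberta", "transformers", "RobertaForMaskedLM"}

score = jaccard_similarity(model_a, model_b)
print(score)  # 2 shared items out of 6 distinct items
```

Identical sets score 1 and disjoint sets score 0, which is why models that agree on every selected feature cannot be told apart by this measure.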

In [6]:
import pandas as pd
from sklearn.metrics.pairwise import pairwise_distances

In [4]:
df = pd.read_csv('../data/raw/data.csv')
df.head()

| | modelId | lastModified | tags | pipeline_tag | author | architectures | model_type | datasets | downloads | library_name |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | jonatasgrosman/wav2vec2-large-xlsr-53-english | 2023-03-25T10:56:55.000Z | ['pytorch', 'jax', 'safetensors', 'wav2vec2', ... | automatic-speech-recognition | jonatasgrosman | ['Wav2Vec2ForCTC'] | wav2vec2 | ['common_voice', 'mozilla-foundation/common_vo... | 47102358 | transformers |
| 1 | bert-base-uncased | 2022-11-16T15:15:39.000Z | ['pytorch', 'tf', 'jax', 'rust', 'safetensors'... | fill-mask | | ['BertForMaskedLM'] | bert | ['bookcorpus', 'wikipedia'] | 46484719 | transformers |
| 2 | Davlan/distilbert-base-multilingual-cased-ner-hrl | 2022-06-27T10:49:50.000Z | ['pytorch', 'tf', 'distilbert', 'token-classif... | token-classification | Davlan | ['DistilBertForTokenClassification'] | distilbert | | 29407063 | transformers |
| 3 | gpt2 | 2022-12-16T15:44:21.000Z | ['pytorch', 'tf', 'jax', 'tflite', 'rust', 'sa... | text-generation | | ['GPT2LMHeadModel'] | gpt2 | | 21999611 | transformers |
| 4 | xlm-roberta-base | 2023-04-07T12:46:17.000Z | ['pytorch', 'tf', 'jax', 'onnx', 'safetensors'... | fill-mask | | ['XLMRobertaForMaskedLM'] | xlm-roberta | | 20333162 | transformers |


In [5]:
df.columns

Index(['modelId', 'lastModified', 'tags', 'pipeline_tag', 'author',
       'architectures', 'model_type', 'datasets', 'downloads', 'library_name'],
      dtype='object')

Model with minimal columns for POC

In [7]:
df_mini = df[['modelId', 'pipeline_tag', 'architectures', 'model_type', 'library_name']]

In [9]:
# Step 1: Select feature columns
feature_cols = df_mini.columns[1:]
print(feature_cols)

Index(['pipeline_tag', 'architectures', 'model_type', 'library_name'], dtype='object')


In [10]:
# minimal cleaning: strip the list brackets from the stringified architectures.
# assign() returns a new DataFrame, which avoids the SettingWithCopyWarning that
# direct column assignment on the df_mini slice triggers.
df_mini = df_mini.assign(
    architectures=df_mini['architectures'].apply(lambda x: str(x).strip('[]'))
)


In [35]:
# One-hot encode columns 1 to end
df_one_hot = pd.get_dummies(df_mini.iloc[:, 1:])

print(df_one_hot.columns)

Index(['pipeline_tag_audio-classification', 'pipeline_tag_audio-to-audio',
       'pipeline_tag_automatic-speech-recognition',
       'pipeline_tag_conversational', 'pipeline_tag_depth-estimation',
       'pipeline_tag_document-question-answering',
       'pipeline_tag_feature-extraction', 'pipeline_tag_fill-mask',
       'pipeline_tag_graph-ml', 'pipeline_tag_image-classification',
       ...
       'library_name_span_marker', 'library_name_speechbrain',
       'library_name_stable-baselines3', 'library_name_stable-diffusion',
       'library_name_stanza', 'library_name_timm', 'library_name_transformers',
       'library_name_txtai', 'library_name_ultralytics',
       'library_name_yolov5'],
      dtype='object', length=1007)


In [36]:
# Concatenate the encoded DataFrame with the modelId column
df_one_hot = pd.concat([df_mini['modelId'], df_one_hot], axis=1)


In [37]:
df_one_hot.head()

| | modelId | pipeline_tag_audio-classification | pipeline_tag_audio-to-audio | pipeline_tag_automatic-speech-recognition | pipeline_tag_conversational | pipeline_tag_depth-estimation | pipeline_tag_document-question-answering | pipeline_tag_feature-extraction | pipeline_tag_fill-mask | pipeline_tag_graph-ml | ... | library_name_span_marker | library_name_speechbrain | library_name_stable-baselines3 | library_name_stable-diffusion | library_name_stanza | library_name_timm | library_name_transformers | library_name_txtai | library_name_ultralytics | library_name_yolov5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | jonatasgrosman/wav2vec2-large-xlsr-53-english | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 1 | bert-base-uncased | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 2 | Davlan/distilbert-base-multilingual-cased-ner-hrl | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 3 | gpt2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 4 | xlm-roberta-base | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |


In [18]:
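# Step 3: Calculate pairwise Jaccard similarity on the one-hot features only.
# modelId is an identifier, not a feature, and the jaccard metric expects booleans.
features = df_one_hot.drop(columns='modelId').to_numpy().astype(bool)
similarity_matrix = 1 - pairwise_distances(features, metric='jaccard')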
print(similarity_matrix.shape)



(10000, 10000)


In [41]:
def get_recommendations(df, similarity_matrix, modelId, recommend_no, method = "jaccard"):
    # Look up the row index of the target model; stop early if it is missing
    if modelId not in df['modelId'].values:
        print(f'Error: modelId "{modelId}" not found in dataframe.')
        return

    index = df[df['modelId'] == modelId].index[0]
    print(f"Find model of index {index}")

    if "jaccard" == method.lower():
        # Rank all models by descending similarity and skip the first hit.
        # Caveat: when several models tie with the target (score 1.0), the
        # first hit is not guaranteed to be the target itself.
        similar_indices = similarity_matrix[index].argsort()[::-1][1:recommend_no+1]
        similar_models = df.iloc[similar_indices]['modelId']
        similarity_scores = similarity_matrix[index][similar_indices]

        print(f'Target Model: {df.iloc[index]["modelId"]}')
        print(f'{recommend_no} Recommended Models:\n{similar_models}')
        print(f'Similarity Scores: {similarity_scores}')

    else:
        print(f"{method} not supported")

In [42]:
get_recommendations(df_one_hot, similarity_matrix, "bert-base-uncased", 5, method="jaccard")

Find model of index 1
Target Model: bert-base-uncased
5 Recommended Models:
4376                             asafaya/bert-mini-arabic
9562    anon-submission-mk/bert-base-macedonian-bulgar...
3610                     jcblaise/bert-tagalog-base-cased
1292                                             hfl/rbt3
5972             pierreguillou/bert-base-cased-pt-lenerbr
Name: modelId, dtype: object
Similarity Scores: [1. 1. 1. 1. 1.]


In [43]:
df_mini.iloc[[1,4376,9562, 3610]]

| | modelId | pipeline_tag | architectures | model_type | library_name |
|---|---|---|---|---|---|
| 1 | bert-base-uncased | fill-mask | 'BertForMaskedLM' | bert | transformers |
| 4376 | asafaya/bert-mini-arabic | fill-mask | 'BertForMaskedLM' | bert | transformers |
| 9562 | anon-submission-mk/bert-base-macedonian-bulgar... | fill-mask | 'BertForMaskedLM' | bert | transformers |
| 3610 | jcblaise/bert-tagalog-base-cased | fill-mask | 'BertForMaskedLM' | bert | transformers |


Note that the similarity score is exactly 1 for all of the top 5 records: they share the same pipeline_tag, architectures, model_type and library_name as the target. With only these four features, this simple model likely performs the same as a filtered search on those columns.

This can serve as a simple baseline, and we may have to consider more features to yield more interesting results than a filtered search.
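To make the comparison with a filtered search concrete, here is a small sketch on toy data. The rows, the helper names `filtered_search` and `jaccard`, and the reduced column list are all hypothetical; only the column names mirror `df_mini`.

```python
import pandas as pd

# Toy data (hypothetical rows, column names borrowed from df_mini)
toy = pd.DataFrame({
    "modelId": ["a", "b", "c", "d"],
    "pipeline_tag": ["fill-mask", "fill-mask", "fill-mask", "text-generation"],
    "model_type": ["bert", "bert", "roberta", "gpt2"],
    "library_name": ["transformers"] * 4,
})
feature_cols = ["pipeline_tag", "model_type", "library_name"]


def filtered_search(df, model_id, cols):
    """Return the other models that match the target on every column in cols."""
    target = df.loc[df["modelId"] == model_id, cols].iloc[0]
    mask = (df[cols] == target).all(axis=1) & (df["modelId"] != model_id)
    return df.loc[mask, "modelId"].tolist()


def jaccard(u, v):
    """Jaccard similarity of two boolean vectors."""
    return (u & v).sum() / (u | v).sum()


one_hot = pd.get_dummies(toy[feature_cols]).to_numpy().astype(bool)

matches = filtered_search(toy, "a", feature_cols)
print(matches)                          # models identical to "a" on all columns
print(jaccard(one_hot[0], one_hot[1]))  # "a" vs "b": every feature matches
print(jaccard(one_hot[0], one_hot[2]))  # "a" vs "c": model_type differs
```

With one value per column, a Jaccard score of 1 occurs exactly when every column matches, which is the same set of rows the filter returns.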

In [44]:
get_recommendations(df_one_hot, similarity_matrix, "asafaya/bert-mini-arabic", 5, method="jaccard")

Find model of index 4376
Target Model: asafaya/bert-mini-arabic
5 Recommended Models:
4376                             asafaya/bert-mini-arabic
9562    anon-submission-mk/bert-base-macedonian-bulgar...
3610                     jcblaise/bert-tagalog-base-cased
1292                                             hfl/rbt3
5972             pierreguillou/bert-base-cased-pt-lenerbr
Name: modelId, dtype: object
Similarity Scores: [1. 1. 1. 1. 1.]


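Recommendations are not symmetric when we chain searches: if get_recommendations for Model 1 returns Models 2, 3 and 4, calling it for Model 2 may not return Model 1. Many models tie at a score of 1, only the first `recommend_no` of them are returned, and the order among tied models is arbitrary. For the same reason the target can show up in its own list, as asafaya/bert-mini-arabic does above.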
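One possible way to make the ranking deterministic under ties is to drop the query index explicitly instead of skipping the first sorted entry. The sketch below assumes a dense similarity row; `top_k_excluding_self` and the toy matrix are illustrative names and values, not part of the notebook.

```python
import numpy as np


def top_k_excluding_self(similarity_row, query_index, k):
    """Return indices of the k highest scores, never including query_index."""
    # A stable sort on the negated scores keeps tied items in index order.
    order = np.argsort(-similarity_row, kind="stable")
    order = order[order != query_index]
    return order[:k]


# Hypothetical similarity matrix: items 0, 1 and 2 are identical (score 1.0)
sim = np.array([
    [1.0, 1.0, 1.0, 0.2],
    [1.0, 1.0, 1.0, 0.2],
    [1.0, 1.0, 1.0, 0.2],
    [0.2, 0.2, 0.2, 1.0],
])

for query in range(3):
    print(query, top_k_excluding_self(sim[query], query, k=2))
```

This does not make the results symmetric once more models tie than `k` allows, but it does guarantee that a model is never recommended for itself.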

## Cosine Similarity

![2b4a7a82-ad4c-4b2a-b808-e423a334de6f.png (488×376)](https://www.oreilly.com/api/v2/epubs/9781788295758/files/assets/2b4a7a82-ad4c-4b2a-b808-e423a334de6f.png)
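The formula can be checked by hand on two small count vectors. This is a minimal sketch; the helper name `cosine`, the vocabulary and the counts are made up for illustration.

```python
import numpy as np


def cosine(u, v):
    """Cosine of the angle between u and v: (u . v) / (||u|| * ||v||)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    # Assumed convention: similarity with an all-zero vector is 0.
    return float(u @ v / denom) if denom else 0.0


# Hypothetical token counts over the vocabulary ["bert", "pytorch", "tf", "jax"]
doc_a = [1, 1, 1, 0]
doc_b = [1, 1, 0, 1]

print(cosine(doc_a, doc_b))  # 2 / (sqrt(3) * sqrt(3))
```

Unlike Jaccard on one-hot rows, cosine similarity on counts also reflects how often a token occurs, not only whether it occurs.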

In [49]:
import sys
sys.path.append('../src')
import preprocess

In [54]:
pwd

'/Users/walter/code/aiap12/HFMRS/notebooks'

In [60]:
input_path= "../../data/raw/data.csv"
output_path = "data/processed/data.csv"

preprocess_pipeline = preprocess.Preprocess(input_path = input_path, output_path = output_path, return_df=True)

df_soup = preprocess_pipeline.main()

22:41:45 : INFO : Soup of words
22:41:47 : INFO : Cleaned Data
22:41:47 : INFO : Returning Pandas dataframe
22:41:47 : INFO : Exported to csv file - /Users/walter/code/aiap12/HFMRS/data/processed/data.csv


In [61]:
df_soup.head()

| | modelId | tags | pipeline_tag | author | architectures | model_type | datasets | downloads | library_name | soup |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | jonatasgrosman/wav2vec2-large-xlsr-53-english | pytorch jax safetensors wav2vec2 ... | automatic speech recognition | jonatasgrosman | wav2vec2forctc | wav2vec2 | common voice mozilla foundation common vo... | 47102358 | transformers | pytorch jax safetensors wav2vec2 ... |
| 1 | bert-base-uncased | pytorch tf jax rust safetensors ... | fill mask | | bertformaskedlm | bert | bookcorpus wikipedia | 46484719 | transformers | pytorch tf jax rust safetensors ... |
| 2 | Davlan/distilbert-base-multilingual-cased-ner-hrl | pytorch tf distilbert token classif... | token classification | davlan | distilbertfortokenclassification | distilbert | | 29407063 | transformers | pytorch tf distilbert token classif... |
| 3 | gpt2 | pytorch tf jax tflite rust sa... | text generation | | gpt2lmheadmodel | gpt2 | | 21999611 | transformers | pytorch tf jax tflite rust sa... |
| 4 | xlm-roberta-base | pytorch tf jax onnx safetensors ... | fill mask | | xlmrobertaformaskedlm | xlm roberta | | 20333162 | transformers | pytorch tf jax onnx safetensors ... |


In [66]:
# Import CountVectorizer to build the count matrix
from sklearn.feature_extraction.text import CountVectorizer

# Import cosine_similarity to compare the rows of the count matrix
from sklearn.metrics.pairwise import cosine_similarity

count = CountVectorizer()
count_matrix = count.fit_transform(df_soup['soup'])

# Construct a reverse map from modelId to row index
indices = pd.Series(df_soup.index, index=df_soup['modelId']).drop_duplicates()

# Compute the cosine similarity matrix based on the count_matrix
cosine_sim = cosine_similarity(count_matrix, count_matrix)

In [68]:
def get_recommendations_cosine(df, cosine_sim, model, recommend_no):
    # Get the row index of the model that matches the modelId.
    # Looking it up in df avoids depending on the global `indices` map.
    idx = df.index[df['modelId'] == model][0]

    # Get the pairwise similarity scores of all models with that model
    sim_scores = list(enumerate(cosine_sim[idx]))

    # Sort the models based on the similarity scores
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Get the scores of the `recommend_no` most similar models (skip the first hit)
    sim_scores = sim_scores[1:recommend_no+1]

    # Get the model indices
    return_indices = [i[0] for i in sim_scores]

    # Return the top `recommend_no` most similar models
    return df.iloc[return_indices]

In [70]:
cosine_sim_result = get_recommendations_cosine(df_soup, cosine_sim, 'bert-base-uncased', 5)
print(cosine_sim_result['modelId'])

11                            bert-base-cased
66                         bert-large-uncased
218     bert-large-uncased-whole-word-masking
303                          bert-large-cased
2091      bert-large-cased-whole-word-masking
Name: modelId, dtype: object


In [71]:
# take jaccard similarity and compare again
get_recommendations(df_one_hot, similarity_matrix, "bert-base-uncased", 5, method="jaccard")

Find model of index 1
Target Model: bert-base-uncased
5 Recommended Models:
4376                             asafaya/bert-mini-arabic
9562    anon-submission-mk/bert-base-macedonian-bulgar...
3610                     jcblaise/bert-tagalog-base-cased
1292                                             hfl/rbt3
5972             pierreguillou/bert-base-cased-pt-lenerbr
Name: modelId, dtype: object
Similarity Scores: [1. 1. 1. 1. 1.]


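The results are different: cosine similarity over the soup ranks bert-base-cased and the bert-large variants first, whereas the Jaccard model returns five of the many models that tie at a score of 1.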
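The effect of the extra tokens can be reproduced on toy data. In this sketch the three soup strings are invented, and the pipeline simply mirrors the `CountVectorizer` plus `cosine_similarity` steps used above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical soups: each string has five distinct tokens
soups = [
    "bert fill mask pytorch english",
    "bert fill mask pytorch arabic",
    "gpt2 text generation pytorch english",
]

counts = CountVectorizer().fit_transform(soups)
sim = cosine_similarity(counts)

print(sim[0, 1])  # four of five tokens shared
print(sim[0, 2])  # two of five tokens shared
```

Because every additional shared token moves the score, the ranking becomes graded instead of collapsing into a large group of ties.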

## KNN

[sklearn.neighbors.NearestNeighbors — scikit-learn 1.2.2 documentation](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html)
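As a small illustration of what `kneighbors` returns, the sketch below queries a toy matrix. The rows are hypothetical one-hot vectors; the point is that neighbours come back already ordered by ascending distance, so no further sorting is needed.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical one-hot rows; rows 0 and 1 are identical
X = np.array([
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
])

nn = NearestNeighbors(n_neighbors=3, metric="euclidean").fit(X)
distances, neighbours = nn.kneighbors(X[[2]])  # query with row 2

print(neighbours[0])  # row positions of the neighbours, nearest first
print(distances[0])   # matching distances in ascending order
```

Here the query row is its own nearest neighbour at distance 0, followed by rows 0 and 1, which are equally far away.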

In [74]:
from sklearn.neighbors import NearestNeighbors

In [86]:
knn_feat_cols = df_one_hot.iloc[:, 1:]

In [88]:
X = knn_feat_cols

In [89]:
knn = NearestNeighbors(n_neighbors=len(df), algorithm='auto').fit(X)
distances, indices = knn.kneighbors(X)

In [93]:
distances

array([[0.        , 0.        , 0.        , ..., 2.82842712, 2.82842712,
        2.82842712],
       [0.        , 0.        , 0.        , ..., 2.82842712, 2.82842712,
        2.82842712],
       [0.        , 0.        , 0.        , ..., 2.82842712, 2.82842712,
        2.82842712],
       ...,
       [0.        , 0.        , 0.        , ..., 2.82842712, 2.82842712,
        2.82842712],
       [0.        , 0.        , 0.        , ..., 2.82842712, 2.82842712,
        2.82842712],
       [0.        , 1.41421356, 1.41421356, ..., 2.82842712, 2.82842712,
        2.82842712]])

In [91]:
def get_recommendations_knn(df, distances, indices, modelId, recommend_no):
    # Look up the row index of the target model; stop early if it is missing
    if modelId not in df['modelId'].values:
        print(f'Error: modelId "{modelId}" not found in dataframe.')
        return

    index = df[df['modelId'] == modelId].index[0]
    print(f"Find model of index {index}")

    # kneighbors() already returns neighbours sorted by ascending distance,
    # so take them directly and skip position 0 (normally the model itself)
    similar_indices = indices[index][1:recommend_no+1]
    similar_models = df.iloc[similar_indices]['modelId']
    # 1 - Euclidean distance is only a rough score: it is 1 for identical rows
    # and turns negative once the distance exceeds 1
    similarity_scores = 1 - distances[index][1:recommend_no+1]

    print(f'Target Model: {df.iloc[index]["modelId"]}')
    print(f'{recommend_no} Recommended Models:\n{similar_models}')
    print(f'Similarity Scores: {similarity_scores}')

In [92]:
get_recommendations_knn(df_mini, distances, indices, "bert-base-uncased", 5)

Find model of index 1
Target Model: bert-base-uncased
5 Recommended Models:
4330              IDEA-CCNL/Erlangshen-Ubert-110M-Chinese
4022                        OpenMatch/cocodr-base-msmarco
9443    redewiedergabe/bert-base-historical-german-rw-...
3113                        uer/chinese_roberta_L-2_H-768
226                                        klue/bert-base
Name: modelId, dtype: object
Similarity Scores: [1. 1. 1. 1. 1.]


In [90]:
product_index = 1  # Index of product to recommend similar products for
k = 5  # Number of similar products to recommend
similar_indices = indices[product_index][1:k+1]  # Ignore first index, which is the product itself

# 4. Return the k most similar products
similar_products = df.iloc[similar_indices]['modelId']
similarity_scores = 1 - distances[product_index][1:k+1]  # Convert distances to similarities

print(f'Target Product: {df.iloc[product_index]["modelId"]}')
print(f'{k} Recommended Products:\n{similar_products}')
print(f'Similarity Scores: {similarity_scores}')

Target Product: bert-base-uncased
5 Recommended Products:
4330              IDEA-CCNL/Erlangshen-Ubert-110M-Chinese
4022                        OpenMatch/cocodr-base-msmarco
9443    redewiedergabe/bert-base-historical-german-rw-...
3113                        uer/chinese_roberta_L-2_H-768
226                                        klue/bert-base
Name: modelId, dtype: object
Similarity Scores: [1. 1. 1. 1. 1.]


## K-Means

In [97]:
from sklearn.cluster import KMeans

In [94]:
df_mini

| | modelId | pipeline_tag | architectures | model_type | library_name |
|---|---|---|---|---|---|
| 0 | jonatasgrosman/wav2vec2-large-xlsr-53-english | automatic-speech-recognition | 'Wav2Vec2ForCTC' | wav2vec2 | transformers |
| 1 | bert-base-uncased | fill-mask | 'BertForMaskedLM' | bert | transformers |
| 2 | Davlan/distilbert-base-multilingual-cased-ner-hrl | token-classification | 'DistilBertForTokenClassification' | distilbert | transformers |
| 3 | gpt2 | text-generation | 'GPT2LMHeadModel' | gpt2 | transformers |
| 4 | xlm-roberta-base | fill-mask | 'XLMRobertaForMaskedLM' | xlm-roberta | transformers |
| ... | ... | ... | ... | ... | ... |
| 9995 | ans/vaccinating-covid-tweets | text-classification | 'RobertaForSequenceClassification' | roberta | transformers |
| 9996 | bakrianoo/sinai-voice-ar-stt | automatic-speech-recognition | 'Wav2Vec2ForCTC' | wav2vec2 | transformers |
| 9997 | danyaljj/gpt2_question_generation_given_paragr... | text-generation | 'GPT2LMHeadModel' | gpt2 | transformers |
| 9998 | dkleczek/Polish-Hate-Speech-Detection-Herbert-... | text-classification | 'BertForSequenceClassification' | bert | transformers |


In [95]:
df_oh_feats = pd.get_dummies(df_mini.iloc[:, 1:])
df_oh_feats

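| | pipeline_tag_audio-classification | pipeline_tag_audio-to-audio | pipeline_tag_automatic-speech-recognition | pipeline_tag_conversational | pipeline_tag_depth-estimation | pipeline_tag_document-question-answering | pipeline_tag_feature-extraction | pipeline_tag_fill-mask | pipeline_tag_graph-ml | pipeline_tag_image-classification | ... | library_name_span_marker | library_name_speechbrain | library_name_stable-baselines3 | library_name_stable-diffusion | library_name_stanza | library_name_timm | library_name_transformers | library_name_txtai | library_name_ultralytics | library_name_yolov5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 9996 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 9997 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 9998 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |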


In [110]:
kmeans = KMeans(n_clusters=10)
kmeans.fit(df_oh_feats)



In [112]:
df_one_hot

| | modelId | pipeline_tag_audio-classification | pipeline_tag_audio-to-audio | pipeline_tag_automatic-speech-recognition | pipeline_tag_conversational | pipeline_tag_depth-estimation | pipeline_tag_document-question-answering | pipeline_tag_feature-extraction | pipeline_tag_fill-mask | pipeline_tag_graph-ml | ... | library_name_span_marker | library_name_speechbrain | library_name_stable-baselines3 | library_name_stable-diffusion | library_name_stanza | library_name_timm | library_name_transformers | library_name_txtai | library_name_ultralytics | library_name_yolov5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | jonatasgrosman/wav2vec2-large-xlsr-53-english | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 1 | bert-base-uncased | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 2 | Davlan/distilbert-base-multilingual-cased-ner-hrl | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 3 | gpt2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 4 | xlm-roberta-base | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | ans/vaccinating-covid-tweets | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 9996 | bakrianoo/sinai-voice-ar-stt | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 9997 | danyaljj/gpt2_question_generation_given_paragr... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 9998 | dkleczek/Polish-Hate-Speech-Detection-Herbert-... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |


In [116]:
def get_recommendations_kmeans(df, df_encoded, model, modelId, recommend_no):
    # Encode the input product's features
    product_encoded = df[df['modelId'] == modelId].drop('modelId', axis=1)

    # Predict the cluster of the input product with the fitted model passed in
    # (previously this read the global `kmeans` and ignored the `model` argument)
    cluster = model.predict(product_encoded)

    # Get indices of all products in the same cluster as the input product
    indices = df_encoded[model.labels_ == cluster[0]].index

    # Compute similarity scores between the input product and all products in the same cluster
    similarity_scores = cosine_similarity(product_encoded, df_encoded.loc[indices])[0]

    # Sort the products by similarity score and return top k recommendations
    similar_indices = similarity_scores.argsort()[::-1][1:recommend_no+1]
    similar_products = df.iloc[indices[similar_indices]]['modelId']

    return similar_products

In [117]:
similar_products_kmeans = get_recommendations_kmeans(df_one_hot, df_oh_feats, kmeans, "bert-base-uncased", 5)
similar_products_kmeans

4349    KBLab/bert-base-swedish-cased-new
4035                  SI2M-Lab/DarijaBERT
4096    alexanderfalk/danbert-small-cased
4098               avichr/Legal-heBERT_ft
4102      recobo/agriculture-bert-uncased
Name: modelId, dtype: object
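The cluster-then-rank idea can be tried on a toy matrix. This is only a sketch under assumptions: the rows are invented, `recommend_in_cluster` is a hypothetical helper, and `n_init` and `random_state` are set here purely to make the toy run repeatable.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical one-hot rows: rows 0-2 are similar, rows 3-4 are identical
X = np.array([
    [1, 0, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
], dtype=float)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)


def recommend_in_cluster(X, km, query, k):
    """Rank the other members of the query's cluster by cosine similarity."""
    members = np.flatnonzero(km.labels_ == km.labels_[query])
    members = members[members != query]  # drop the query explicitly
    scores = cosine_similarity(X[[query]], X[members])[0]
    order = np.argsort(-scores, kind="stable")[:k]
    return members[order], scores[order]


ids, scores = recommend_in_cluster(X, km, query=0, k=2)
print(ids, scores)
```

Restricting the ranking to one cluster keeps the similarity computation small, at the cost of never recommending a model from a neighbouring cluster.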

## Hierarchical Clustering (#todo)

In [118]:
from scipy.cluster.hierarchy import dendrogram, linkage, fcluster

In [121]:
def get_recommendations_hierarchical(df, df_encoded, modelId, recommend_no):
    # Encode the input product's features (not used yet, see #todo)
    product_encoded = df[df['modelId'] == modelId].drop('modelId', axis=1)
    
    # Compute pairwise cosine similarity between all products
    similarity_matrix = cosine_similarity(df_encoded)
    
    # Perform hierarchical clustering. Note that linkage() treats a square 2-D
    # array as an observation matrix, so each row of the similarity matrix is
    # used as a feature vector here rather than as precomputed distances.
    linkage_matrix = linkage(similarity_matrix, method='ward')
    
    # Cut the tree at distance 0.5 and return the flat cluster label of every
    # product (depth only affects the 'inconsistent' criterion)
    cluster = fcluster(linkage_matrix, t=0.5, criterion='distance', depth=10)
    return cluster

#todo

In [123]:
cluster = get_recommendations_hierarchical(df_one_hot, df_oh_feats, "bert-base-uncased", 5)
cluster

array([471, 148, 523, ..., 242, 147, 187], dtype=int32)

In [125]:
cluster.size

10000

Aborted for now, as it takes very long to run.

Note that hierarchical clustering can be computationally expensive for large datasets: here it works on a 10000 × 10000 similarity matrix, which alone holds 100 million values. It may not be the best choice for very large product catalogs.
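For experimenting at a smaller scale, one option might be to run the linkage on a small feature matrix directly. The sketch below uses invented rows and asks `fcluster` for a fixed number of clusters via `criterion="maxclust"`; both choices are assumptions for illustration, not the notebook's settings.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical one-hot rows: rows 0-2 are similar, rows 3-4 are identical
X = np.array([
    [1, 0, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
], dtype=float)

# Ward linkage on the observation matrix (Euclidean distances between rows)
Z = linkage(X, method="ward")

# Cut the tree so that at most two flat clusters remain
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

The linkage matrix has one row per merge, so `n` observations give `n - 1` rows; the same pattern could be applied to a random sample of the catalog before committing to a full run.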

## DBSCAN (#todo)

[Movie Recommendation System (DBSCAN) | Kaggle](https://www.kaggle.com/code/olyapotemkina/movie-recommendation-system-dbscan)

In [126]:
from sklearn.cluster import DBSCAN

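In [132]:
from sklearn.metrics.pairwise import cosine_distances

# DBSCAN with metric="precomputed" expects pairwise distances, not similarities
distance_matrix = cosine_distances(df_oh_feats)

In [133]:
dbscan = DBSCAN(eps=0.5, min_samples=2, metric="precomputed")

In [134]:
cluster_labels = dbscan.fit_predict(distance_matrix)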

In [131]:
len(df_oh_feats)

10000

In [136]:
cluster_labels.size

10000
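To get a feel for how the labels should be read, here is a toy run of DBSCAN on a precomputed cosine distance matrix. The rows and the `eps` value are hypothetical and chosen only so that the example produces two clusters and one noise point.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics.pairwise import cosine_distances

# Hypothetical one-hot rows: two groups plus one row unlike the rest
X = np.array([
    [1, 0, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

# Cosine distance = 1 - cosine similarity, so identical rows are at distance 0
D = cosine_distances(X)

labels = DBSCAN(eps=0.3, min_samples=2, metric="precomputed").fit_predict(D)
print(labels)  # -1 marks noise, i.e. rows that belong to no cluster
```

For recommendations, the rows sharing a label with the target would be the candidate set, and a target labelled -1 would need a fallback such as a plain nearest-neighbour search.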