# Web 3 Recruiter Skillset Recommendations using Market Basket Analysis

## 1. Introduction

In the rapidly evolving Web 3 space, recruiters need to identify candidates with the right combination of skills. **Market Basket Analysis (MBA)**, a technique originally used in retail to analyze customer purchases, can be repurposed to help recruiters discover **skillsets** that tend to appear together in successful candidates. In our Web 3 job recruitment platform **Swip3**, we apply **Market Basket Analysis** to recommend candidates based on the skillsets that are frequently associated with one another.

For recruiters, this provides critical insights into not only which skills are essential for a given role but also which combinations of skills are most likely to lead to success in the dynamic Web 3 landscape. **Swip3** leverages these insights to optimize candidate recommendations, ensuring that recruiters find individuals who possess complementary and valuable skills for specific job roles in decentralized technologies.

## 2. Use Case: Swip3 Skillset Recommendations

**Swip3** aims to match Web 3 researchers and developers with companies seeking talent in the blockchain and decentralized technology space. By applying **Market Basket Analysis** to past successful hires and candidate skillsets, we can identify **frequent patterns** in skill combinations that are valued by recruiters.

For example:
- Candidates proficient in **Solidity** are often skilled in **Ethereum** development.
- Those with experience in **Smart Contracts** often have expertise in building **dApps**.
- Developers familiar with **Zero-Knowledge Proofs** may also have experience in **Blockchain Security**.

These relationships can guide recruiters to select candidates with the most **comprehensive** and **complementary** skillsets, even if those skills were not explicitly mentioned in the job description.

## 3. Definition of Algorithm

**Market Basket Analysis** is powered by **association rule learning**, a rule-based machine learning technique that identifies relationships between items in a dataset. In the case of **Swip3**, the "items" are **skills** or **competencies** possessed by candidates. The objective is to find **strong associations** between these skillsets, which can then be used to make more informed recruitment decisions.

The most commonly used algorithm for this analysis is the **Apriori Algorithm**, which finds frequent itemsets (skill combinations) and generates **association rules** that estimate how likely a candidate with one set of skills is to also have another. The rules are evaluated with the following metrics:


### 3.1. Support
- **Definition**: Support measures how frequently a combination of skillsets (itemset) appears in the dataset. For example, if **Solidity** and **Smart Contracts** frequently co-occur in the profiles of successful candidates, this combination will have high support.
  
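$$
Support(X \Rightarrow Y) = \frac{|\{t \in T : X \cup Y \subseteq t\}|}{N}
$$

Where \( X \) and \( Y \) are disjoint skillsets, \( T \) is the set of candidate profiles, and \( N = |T| \) is the total number of candidate profiles analyzed. The numerator counts the profiles that contain every skill in both \( X \) and \( Y \). The support of a single skillset, such as \( Support(Y) \), is defined the same way: the fraction of profiles that contain every skill in \( Y \).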
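As a minimal sketch of the support calculation, the snippet below uses only the five rows visible in the dataset preview later in this notebook, so the values will not match the full 100-profile dataset. The `support` helper and the `profiles` variable are illustrative names, not part of the Swip3 codebase.

```python
# The five candidate profiles shown in the dataset preview, as sets of skills
profiles = [
    {"Solidity", "Ethereum", "Smart Contracts", "dApps"},
    {"Smart Contracts", "Solidity", "Blockchain", "dApps"},
    {"Smart Contracts", "Ethereum", "dApps"},
    {"DeFi", "Web3.js", "dApps", "Zero-Knowledge Proofs"},
    {"Blockchain", "Smart Contracts", "dApps"},
]

def support(itemset, profiles):
    """Fraction of profiles that contain every skill in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for profile in profiles if itemset <= profile)
    return hits / len(profiles)

# Solidity and Smart Contracts appear together in 2 of the 5 profiles
print(support({"Solidity", "Smart Contracts"}, profiles))  # 0.4
```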


### 3.2. Confidence
- **Definition**: Confidence measures the strength of the association between skillsets. It tells us how likely it is that a candidate who possesses one skill will also possess the related skill.
  
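$$
Confidence(X \Rightarrow Y) = \frac{Support(X \Rightarrow Y)}{Support(X)}
$$

That is, confidence is the fraction of profiles containing \( X \) that also contain \( Y \). For example, if a candidate is skilled in **Solidity**, how likely are they to also have experience with **Ethereum**?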
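A small sketch of the confidence calculation, again restricted to the five rows visible in the dataset preview (an assumption made for illustration only). It also shows that confidence is not symmetric: swapping antecedent and consequent can change the value. The `confidence` helper is a hypothetical name.

```python
# The five candidate profiles shown in the dataset preview, as sets of skills
profiles = [
    {"Solidity", "Ethereum", "Smart Contracts", "dApps"},
    {"Smart Contracts", "Solidity", "Blockchain", "dApps"},
    {"Smart Contracts", "Ethereum", "dApps"},
    {"DeFi", "Web3.js", "dApps", "Zero-Knowledge Proofs"},
    {"Blockchain", "Smart Contracts", "dApps"},
]

def confidence(antecedent, consequent, profiles):
    """Fraction of profiles containing `antecedent` that also contain `consequent`."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [p for p in profiles if antecedent <= p]
    if not with_antecedent:
        raise ValueError("antecedent never occurs, confidence is undefined")
    with_both = [p for p in with_antecedent if consequent <= p]
    return len(with_both) / len(with_antecedent)

print(confidence({"Solidity"}, {"Ethereum"}, profiles))      # 0.5
print(confidence({"Smart Contracts"}, {"dApps"}, profiles))  # 1.0
print(confidence({"dApps"}, {"Smart Contracts"}, profiles))  # 0.8
```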


### 3.3. Lift
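- **Definition**: Lift measures how much more likely two skillsets are to appear together than if they were independent of each other. A Lift greater than 1 indicates a positive association (the skillsets co-occur more often than expected by chance), a Lift of 1 indicates independence, and a Lift below 1 indicates that they co-occur less often than expected.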
  
$$
Lift(X \Rightarrow Y) = \frac{Confidence(X \Rightarrow Y)}{Support(Y)}
$$

For example, if candidates proficient in **Blockchain Security** are much more likely to also know **Zero-Knowledge Proofs** than by chance, the Lift value will be significantly greater than 1.
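
As a worked illustration, take the rule `dApps, DeFi -> Web3.js` from the results table later in this notebook, which reports a Confidence of 0.555556 and a Lift of 3.08642. The value \( Support(\text{Web3.js}) \approx 0.18 \) is not printed in the notebook; it is an assumption inferred from those two reported figures, and it agrees with the rule `Zero-Knowledge Proofs, DeFi -> Web3.js` (Confidence 1.0, Lift 5.555556, since \( 1.0 / 0.18 \approx 5.556 \)).

$$
Lift(\{\text{dApps}, \text{DeFi}\} \Rightarrow \{\text{Web3.js}\}) = \frac{0.555556}{0.18} \approx 3.086
$$

Under this reading, candidates who list both dApps and DeFi are roughly three times as likely to list Web3.js as a candidate chosen at random.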

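### 3.4. Conviction
- **Definition**: Conviction compares how often a rule would be wrong if the skillsets were independent with how often it is actually wrong.

$$
Conviction(X \Rightarrow Y) = \frac{1 - Support(Y)}{1 - Confidence(X \Rightarrow Y)}
$$

It can be interpreted as the ratio of the expected frequency with which \( X \) occurs without \( Y \) if \( X \) and \( Y \) were independent to the observed frequency of incorrect predictions. A high value means that the consequent depends strongly on the antecedent. A rule with a Confidence of 1 has infinite conviction, because it is never wrong in the data.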
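
A minimal sketch of the conviction formula, taking the consequent's support and the rule's confidence as plain numbers. The input values below are made up for illustration and do not come from the Swip3 data; the `conviction` helper is a hypothetical name. The division by zero that occurs when the confidence equals 1 is handled by returning infinity.

```python
import math

def conviction(support_y, confidence_xy):
    """Conviction of X => Y from Support(Y) and Confidence(X => Y)."""
    if confidence_xy >= 1.0:
        # The rule is never wrong in the data, so conviction is unbounded
        return math.inf
    return (1.0 - support_y) / (1.0 - confidence_xy)

# Illustrative values only: Support(Y) = 0.5, Confidence(X => Y) = 0.75
print(conviction(0.5, 0.75))  # 2.0
print(conviction(0.5, 1.0))   # inf
```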


## 4. Application in Swip3

In **Swip3**, we apply the **Apriori Algorithm** to analyze candidate profiles, discovering frequent patterns in Web 3 skillsets. Here’s how this can be used:

### 4.1. Identifying Strong Skill Combinations
- By analyzing historical data of successful hires, we can determine common skill combinations that are highly valued by recruiters. For example, we might find that candidates proficient in **DeFi protocols** also tend to have skills in **Smart Contract Auditing**.
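
The idea of finding common skill combinations can be sketched by counting how often each pair of skills co-occurs and keeping the pairs above a support threshold. This is only the pair-counting step, not the full Apriori algorithm, and it uses the five rows visible in the dataset preview rather than the full dataset. The threshold of 0.4 is an arbitrary choice for this small sample.

```python
from collections import Counter
from itertools import combinations

# The five candidate profiles shown in the dataset preview, as sets of skills
profiles = [
    {"Solidity", "Ethereum", "Smart Contracts", "dApps"},
    {"Smart Contracts", "Solidity", "Blockchain", "dApps"},
    {"Smart Contracts", "Ethereum", "dApps"},
    {"DeFi", "Web3.js", "dApps", "Zero-Knowledge Proofs"},
    {"Blockchain", "Smart Contracts", "dApps"},
]

# Count every unordered pair of skills once per profile
pair_counts = Counter()
for profile in profiles:
    for pair in combinations(sorted(profile), 2):
        pair_counts[pair] += 1

# Keep pairs whose support meets the (illustrative) threshold
min_support = 0.4
frequent_pairs = {
    pair: count / len(profiles)
    for pair, count in pair_counts.items()
    if count / len(profiles) >= min_support
}

for pair, supp in sorted(frequent_pairs.items(), key=lambda kv: -kv[1]):
    print(pair, supp)
```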

### 4.2. Improving Job Descriptions and Recommendations
- Based on the findings of our analysis, **Swip3** can recommend improved job descriptions to recruiters, suggesting which skill combinations are most likely to attract candidates who meet the role’s requirements.
  
For instance:
- A job posting for a **Web 3 Security Researcher** might be updated to highlight **Zero-Knowledge Proofs** and **Smart Contract Auditing**, based on frequent co-occurrences in previous hires.

### 4.3. Personalized Candidate Recommendations
- Recruiters can benefit from personalized candidate suggestions based on skillset combinations. If a recruiter is looking for a **Solidity Developer**, Swip3 can suggest candidates who also possess complementary skills such as **Ethereum**, **dApps development**, or **Zero-Knowledge Proofs**, increasing the likelihood of a successful hire.
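
One way such suggestions could be derived from mined rules is sketched below. The three rules are copied from the results table later in this notebook (which happens to contain no Solidity rules, so the example queries DeFi-related skills instead). The `suggest_skills` function and its ranking by lift are assumptions for illustration, not Swip3's actual recommendation logic.

```python
# Rules copied from the notebook's results: (antecedent, consequent, confidence, lift)
rules = [
    ({"Zero-Knowledge Proofs", "DeFi"}, {"Web3.js"}, 1.0, 5.555556),
    ({"dApps", "DeFi"}, {"Web3.js"}, 0.555556, 3.08642),
    ({"Web3.js", "DeFi"}, {"Zero-Knowledge Proofs", "dApps"}, 1.0, 3.125),
]

def suggest_skills(required, rules):
    """Return complementary skills implied by rules whose antecedent is covered
    by `required`, ranked by the highest lift of any rule suggesting them."""
    required = set(required)
    best_lift = {}
    for antecedent, consequent, _confidence, lift in rules:
        if antecedent <= required:
            for skill in consequent - required:
                best_lift[skill] = max(lift, best_lift.get(skill, 0.0))
    # Highest lift first, ties broken alphabetically
    return sorted(best_lift.items(), key=lambda kv: (-kv[1], kv[0]))

print(suggest_skills({"DeFi", "Web3.js"}, rules))
# [('Zero-Knowledge Proofs', 3.125), ('dApps', 3.125)]
```

A job requirement that matches no rule antecedent simply yields an empty list, so a fallback (for example, matching on the required skills alone) would be needed in practice.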


## Load Library

In [1]:
import requests
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from bs4 import BeautifulSoup

from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics.pairwise import cosine_similarity

## Dataset Import

In [3]:
job_list = pd.read_csv('C:/Users/joeyi/Swip3/model/web3_researcher_skills.csv', index_col=False)
job_list

| | ITEM1 | ITEM2 | ITEM3 | ITEM4 |
|---|---|---|---|---|
| 0 | Solidity | Ethereum | Smart Contracts | dApps |
| 1 | Smart Contracts | Solidity | Blockchain | dApps |
| 2 | Smart Contracts | Ethereum | Smart Contracts | dApps |
| 3 | DeFi | Web3.js | dApps | Zero-Knowledge Proofs |
| 4 | Blockchain | Smart Contracts | Smart Contracts | dApps |
| ... | ... | ... | ... | ... |
| 95 | DeFi | Web3.js | dApps | Zero-Knowledge Proofs |
| 96 | Blockchain | Smart Contracts | Smart Contracts | dApps |
| 97 | Smart Contracts | Ethereum | dApps | Zero-Knowledge Proofs |
| 98 | Blockchain | Web3.js | dApps | Zero-Knowledge Proofs |


## Data Preprocessing

In [6]:
num_records = len(job_list)
num_records

100

In [28]:
records = []
for i in range(0, num_records):
    records.append([str(job_list.values[i,j]) 
                  for j in range(0,4)])

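In [30]:
# Build one indicator column per skill from the `records` list created above.
# stack() flattens the four item columns into one Series, get_dummies() creates
# the indicator columns, and groupby(level=0).sum() collapses them back to one
# row per candidate. A skill listed twice in the same row produces a count of 2.
df = pd.DataFrame(records)
one_hot_encoded_df = pd.get_dummies(df.stack()).groupby(level=0).sum()
print(one_hot_encoded_df)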


    Blockchain  Cryptography  DeFi  Ethereum  Smart Contracts  Solidity  \
0            0             0     0         1                1         1   
1            1             0     0         0                1         1   
2            0             0     0         1                2         0   
3            0             0     1         0                0         0   
4            1             0     0         0                2         0   
..         ...           ...   ...       ...              ...       ...   
95           0             0     1         0                0         0   
96           1             0     0         0                2         0   
97           0             0     0         1                1         0   
98           1             0     0         0                0         0   
99           1             0     0         1                0         0   

    Web3.js  Zero-Knowledge Proofs  dApps  
0         0                      0      1  
1         0
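
The counts of 2 in the `Smart Contracts` column come from rows that list the same skill twice. Since Market Basket Analysis only needs to know whether a skill is present, the encoding should be binary. The following pure-Python sketch shows the idea on the five rows visible in the dataset preview; the `encode` helper is a hypothetical name, and this is an illustration of the principle rather than a replacement for the pandas code in this notebook.

```python
# First five rows as displayed in the dataset preview
records = [
    ["Solidity", "Ethereum", "Smart Contracts", "dApps"],
    ["Smart Contracts", "Solidity", "Blockchain", "dApps"],
    ["Smart Contracts", "Ethereum", "Smart Contracts", "dApps"],
    ["DeFi", "Web3.js", "dApps", "Zero-Knowledge Proofs"],
    ["Blockchain", "Smart Contracts", "Smart Contracts", "dApps"],
]

# One column per distinct skill, in sorted order
skills = sorted({skill for record in records for skill in record})

def encode(record, skills):
    """Binary indicator row: 1 if the skill appears at least once, else 0."""
    present = set(record)  # a set drops repeated skills within a row
    return [1 if skill in present else 0 for skill in skills]

encoded = [encode(record, skills) for record in records]

col = skills.index("Smart Contracts")
print([row[col] for row in encoded])  # [1, 1, 1, 0, 1]
```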

## Application of Apriori

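#### Specify the parameters for `apriori` and `association_rules` (mlxtend)

- The one-hot encoded DataFrame of transactions (passed to `apriori`)
- `min_support` (argument of `apriori`)
- Minimum confidence (set with `metric="confidence"` and `min_threshold` in `association_rules`)
- Minimum lift (applied afterwards as a filter on the `lift` column of the rules)
- Minimum length (applied afterwards as a filter requiring at least 2 items in the antecedent of each rule)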

In [39]:
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Rebuild the records as a list of transactions (one list of four skills per candidate)
records = []
for i in range(0, num_records):
    records.append([str(job_list.values[i, j]) for j in range(0, 4)])

# Step 1: Convert the records into a DataFrame for one-hot encoding
df = pd.DataFrame(records)

# Step 2: Perform one-hot encoding using pd.get_dummies
one_hot_encoded_df = pd.get_dummies(df.stack()).groupby(level=0).sum()

# Step 3: Ensure that the values are binary (convert any value > 1 to 1)
one_hot_encoded_df = one_hot_encoded_df.clip(upper=1)

# Step 4: Apply the Apriori algorithm to find frequent itemsets
frequent_itemsets = apriori(one_hot_encoded_df, min_support=0.05, use_colnames=True)

# Step 5: Generate association rules based on the frequent itemsets
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)

# Step 6: Filter the rules based on lift and the length of the antecedents
rules = rules[(rules['lift'] >= 3) & (rules['antecedents'].apply(len) >= 2)]

# Step 7: Convert the association rules to a list, including support, confidence, and lift
results = []
for index, rule in rules.iterrows():
    antecedents = list(rule['antecedents'])
    consequents = list(rule['consequents'])
    rule_str = ', '.join(antecedents) + " -> " + ', '.join(consequents)
    
    support = rule['support']
    confidence = rule['confidence']
    lift = rule['lift']
    
    # Store the rule, support, confidence, and lift
    results.append((rule_str, support, confidence, lift))

# Step 8: Convert the results to a DataFrame
labels = ['Rule', 'Support', 'Confidence', 'Lift']
association_results_df = pd.DataFrame(results, columns=labels)

# Display the DataFrame
print(association_results_df)

# Step 9: Print the number of association rules generated
print(f"Number of association rules generated: {len(association_results_df)}")


                                                Rule  Support  Confidence  \
0             Zero-Knowledge Proofs, DeFi -> Web3.js     0.05    1.000000   
1                             dApps, DeFi -> Web3.js     0.05    0.555556   
2  Blockchain, Web3.js -> Zero-Knowledge Proofs, ...     0.05    1.000000   
3      Zero-Knowledge Proofs, dApps, DeFi -> Web3.js     0.05    1.000000   
4      Zero-Knowledge Proofs, DeFi -> dApps, Web3.js     0.05    1.000000   
5      dApps, DeFi -> Zero-Knowledge Proofs, Web3.js     0.05    0.555556   
6      Web3.js, DeFi -> Zero-Knowledge Proofs, dApps     0.05    1.000000   

       Lift  
0  5.555556  
1  3.086420  
2  3.125000  
3  5.555556  
4  7.142857  
5  3.086420  
6  3.125000  
Number of association rules generated: 7




In [41]:
association_results_df

| | Rule | Support | Confidence | Lift |
|---|---|---|---|---|
| 0 | Zero-Knowledge Proofs, DeFi -> Web3.js | 0.05 | 1.0 | 5.555556 |
| 1 | dApps, DeFi -> Web3.js | 0.05 | 0.555556 | 3.08642 |
| 2 | Blockchain, Web3.js -> Zero-Knowledge Proofs, ... | 0.05 | 1.0 | 3.125 |
| 3 | Zero-Knowledge Proofs, dApps, DeFi -> Web3.js | 0.05 | 1.0 | 5.555556 |
| 4 | Zero-Knowledge Proofs, DeFi -> dApps, Web3.js | 0.05 | 1.0 | 7.142857 |
| 5 | dApps, DeFi -> Zero-Knowledge Proofs, Web3.js | 0.05 | 0.555556 | 3.08642 |
| 6 | Web3.js, DeFi -> Zero-Knowledge Proofs, dApps | 0.05 | 1.0 | 3.125 |
