Welcome!
Downloading the spaCy trained models and pipelines

In [None]:
!python -m spacy download en_core_web_sm

In [None]:
!python -m spacy download en_core_web_md

In [None]:
!python -m spacy download en_core_web_lg

Import the relevant libraries, keep the list of tags in **my_list**, and store the word to search for in **test**.


In [None]:
import pandas as pd
import spacy
import tensorflow_hub as hub
from sklearn.metrics.pairwise import cosine_similarity
my_list =['Communication', 'Self-learning', 'Attitude', 'Listening Skills', 'Leadership', 'Adaptability', 'Team  Player', 'Python',  'Django', 'PostgreSQL', 'Restframe work', 'Angular', 'Cypress', 'Selenium', 'Java', 'SaaS sales experience', 'Lead generation', 'Problem solving', 'Good  Candidate']
df = pd.DataFrame()
df['tags'] = my_list
test = 'speaking'


Because we rely on pretrained models, we must test which of them suits our use case. We already know that the word **'speaking'** is directly related to the tag **'Communication'**, so we will check which of these models gives **'Communication'** the highest score.
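
The verdicts used in this notebook (fails, partially fulfilled, fully satisfied) are not defined precisely. The sketch below shows one possible grading rule, assuming that "fully satisfied" means the expected tag is ranked first, "partially fulfilled" means it appears in the top 5 but not first, and "fails" means it is absent from the top 5. The function name `grade_ranking` and the sample rankings are hypothetical, made up for illustration.

```python
def grade_ranking(ranked_tags, expected, k=5):
    """Grade a ranking of tags against the tag we expect to see on top.

    Assumed rule (not stated in the notebook):
      'full'    -> expected tag is ranked first
      'partial' -> expected tag is in the top k, but not first
      'fail'    -> expected tag is not in the top k
    """
    top_k = [tag.lower() for tag in ranked_tags[:k]]
    target = expected.lower()
    if top_k and top_k[0] == target:
        return "full"
    if target in top_k:
        return "partial"
    return "fail"


# Hypothetical rankings, invented only to exercise the three branches.
print(grade_ranking(["Communication", "Listening Skills", "Leadership"], "communication"))
print(grade_ranking(["Leadership", "Communication", "Python"], "communication"))
print(grade_ranking(["Java", "Django", "Angular"], "communication"))
```

In the cells that follow, the list passed as `ranked_tags` could be `top_5_with_score['tags'].tolist()`.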

In [None]:
nlp = spacy.load("en_core_web_sm")
test_vec = nlp(test)
all_docs = [nlp(row) for row in df['tags']]
# Score every tag against the test word.
sims = [doc.similarity(test_vec) for doc in all_docs]
sims_docs = pd.DataFrame({'doc_id': range(len(sims)), 'sims': sims})
sims_docs_sorted = sims_docs.sort_values(by='sims', ascending=False)
# Keep the five highest-scoring tags (positions 0-4 of the sorted ranking).
top_5 = df.iloc[sims_docs_sorted['doc_id'][:5]]
top_5_with_score = pd.concat([top_5, sims_docs_sorted['sims'][:5]], axis=1)
top_5_with_score

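Our test fails when we use the model **en_core_web_sm**. This is not entirely surprising: the small spaCy pipelines ship without static word vectors, so their similarity scores are less reliable.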

In [None]:
nlp = spacy.load("en_core_web_md")
test_vec = nlp(test)
all_docs = [nlp(row) for row in df['tags']]
# Score every tag against the test word.
sims = [doc.similarity(test_vec) for doc in all_docs]
sims_docs = pd.DataFrame({'doc_id': range(len(sims)), 'sims': sims})
sims_docs_sorted = sims_docs.sort_values(by='sims', ascending=False)
# Keep the five highest-scoring tags (positions 0-4 of the sorted ranking).
top_5 = df.iloc[sims_docs_sorted['doc_id'][:5]]
top_5_with_score = pd.concat([top_5, sims_docs_sorted['sims'][:5]], axis=1)
top_5_with_score

Our test is partially fulfilled when we use the model **en_core_web_md**.


In [None]:
nlp = spacy.load("en_core_web_lg")
test_vec = nlp(test)
all_docs = [nlp(row) for row in df['tags']]
# Score every tag against the test word.
sims = [doc.similarity(test_vec) for doc in all_docs]
sims_docs = pd.DataFrame({'doc_id': range(len(sims)), 'sims': sims})
sims_docs_sorted = sims_docs.sort_values(by='sims', ascending=False)
# Keep the five highest-scoring tags (positions 0-4 of the sorted ranking).
top_5 = df.iloc[sims_docs_sorted['doc_id'][:5]]
top_5_with_score = pd.concat([top_5, sims_docs_sorted['sims'][:5]], axis=1)
top_5_with_score

Our test is partially fulfilled when we use the model **en_core_web_lg**.

Having gone through each **spaCy** model, we did not obtain the outcome we were hoping for. Now we move on to the **Google** models.

In [None]:
import tensorflow_hub as hub
from sklearn.metrics.pairwise import cosine_similarity
nlp = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
# The encoder expects a list of strings and returns one embedding per string.
test_vec = nlp([test])
all_docs = [nlp([row]) for row in df['tags']]
# cosine_similarity returns a 1x1 matrix here; [0][0] extracts the score.
sims = [cosine_similarity(doc, test_vec)[0][0] for doc in all_docs]
sims_docs = pd.DataFrame({'doc_id': range(len(sims)), 'sims': sims})
sims_docs_sorted = sims_docs.sort_values(by='sims', ascending=False)
# Keep the five highest-scoring tags (positions 0-4 of the sorted ranking).
top_5 = df.iloc[sims_docs_sorted['doc_id'][:5]]
top_5_with_score = pd.concat([top_5, sims_docs_sorted['sims'][:5]], axis=1)
top_5_with_score

Our test is fully satisfied when we use **universal-sentence-encoder/4**. Let us look at one more Google model.

In [None]:
nlp = hub.load("https://tfhub.dev/google/universal-sentence-encoder-large/5")
# The encoder expects a list of strings and returns one embedding per string.
test_vec = nlp([test])
all_docs = [nlp([row]) for row in df['tags']]
# cosine_similarity returns a 1x1 matrix here; [0][0] extracts the score.
sims = [cosine_similarity(doc, test_vec)[0][0] for doc in all_docs]
sims_docs = pd.DataFrame({'doc_id': range(len(sims)), 'sims': sims})
sims_docs_sorted = sims_docs.sort_values(by='sims', ascending=False)
# Keep the five highest-scoring tags (positions 0-4 of the sorted ranking).
top_5 = df.iloc[sims_docs_sorted['doc_id'][:5]]
top_5_with_score = pd.concat([top_5, sims_docs_sorted['sims'][:5]], axis=1)
top_5_with_score

Our test fails when we use the model **universal-sentence-encoder-large/5**.

In conclusion, Google's **universal-sentence-encoder/4** gives the best outcome for this particular use case.

# The finalized code appears in the cell below. We can search with any input.


---



In [None]:
import tensorflow_hub as hub
from sklearn.metrics.pairwise import cosine_similarity
nlp = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
# Read the search term from the user and embed it.
test_vec = nlp([input()])
all_docs = [nlp([row]) for row in df['tags']]
# cosine_similarity returns a 1x1 matrix here; [0][0] extracts the score.
sims = [cosine_similarity(doc, test_vec)[0][0] for doc in all_docs]
sims_docs = pd.DataFrame({'doc_id': range(len(sims)), 'sims': sims})
sims_docs_sorted = sims_docs.sort_values(by='sims', ascending=False)
# Keep the five highest-scoring tags (positions 0-4 of the sorted ranking).
top_5 = df.iloc[sims_docs_sorted['doc_id'][:5]]
top_5_with_score = pd.concat([top_5, sims_docs_sorted['sims'][:5]], axis=1)
top_5_with_score

The top result

In [None]:
print('similar tag is {}{}{} and corresponding similarity score is {}{:.2f}{}'.format('\033[1m',top_5_with_score['tags'].iloc[0], '\033[0m', '\033[1m', top_5_with_score['sims'].iloc[0],'\033[0m'))
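
The search cell above could also be wrapped in a reusable function. The following is a minimal sketch, not the notebook's own code: `search_tags`, `cosine`, and `toy_embed` are hypothetical names, and `toy_embed` is a deliberately crude stand-in (letter counts, with no semantic meaning) so that the example runs without downloading a model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def search_tags(query, tags, embed, k=5):
    """Return the k tags most similar to `query` as (tag, score) pairs.

    `embed` is any callable that maps a list of strings to a list of vectors.
    """
    vectors = embed([query] + list(tags))  # one batched call for query and tags
    query_vec, tag_vecs = vectors[0], vectors[1:]
    scored = [(tag, cosine(vec, query_vec)) for tag, vec in zip(tags, tag_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def toy_embed(texts):
    """Stand-in embedder (assumption for this demo): counts of the letters a-z."""
    return [[text.lower().count(chr(ord("a") + i)) for i in range(26)]
            for text in texts]


results = search_tags("python", ["Python", "Java", "Django"], toy_embed, k=2)
for tag, score in results:
    print(f"{tag}: {score:.2f}")
```

With the encoder loaded in the notebook, something like `embed=lambda texts: nlp(texts).numpy()` should work in place of `toy_embed`, assuming the loaded model returns an eager TensorFlow tensor as it does in the cells above.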

P.S. There are many other trained models available that I have not used here; I selected only some of the most popular, highly rated ones. We can also train a model from scratch and calculate similarity with the cosine similarity formula cos(x, y) = (x . y) / (||x|| * ||y||). Other measures for comparing vectors or sets include Euclidean distance, Manhattan distance, Minkowski distance, and Jaccard similarity.
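
For reference, the measures named above can be sketched in plain Python. This is an illustrative sketch using only the standard library; the function names and sample inputs are chosen here and do not come from any particular package. Euclidean and Manhattan distance are written as the p = 2 and p = 1 cases of Minkowski distance, and Jaccard similarity is computed on sets (for example, sets of tokens) rather than on vectors.

```python
import math


def cosine_sim(x, y):
    """cos(x, y) = (x . y) / (||x|| * ||y||); assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def minkowski_distance(x, y, p):
    """(sum |x_i - y_i|^p)^(1/p) for p >= 1."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


def euclidean_distance(x, y):
    """Minkowski distance with p = 2."""
    return minkowski_distance(x, y, 2)


def manhattan_distance(x, y):
    """Minkowski distance with p = 1."""
    return minkowski_distance(x, y, 1)


def jaccard_similarity(a, b):
    """|A intersect B| / |A union B|; defined as 1.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


x, y = [1, 0, 1], [1, 1, 0]
print(cosine_sim(x, y))
print(euclidean_distance(x, y))
print(manhattan_distance(x, y))
print(jaccard_similarity("team player".split(), "team lead".split()))
```

Note that cosine and Jaccard are similarities (higher means closer), while the other three are distances (lower means closer), so rankings built on them sort in opposite directions.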

# Thank you.