#### Author: Ernie Sumoso

1- Find the similarity between any two words.

Input :

- word1="amazing"
- word2="terrible"
- word3="excellent"

Desired Output:

- #> similarity between amazing and terrible is 0.46189071343764604
- #> similarity between amazing and excellent is 0.6388207086737778

In [1]:
import nltk
from nltk.corpus import wordnet

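def get_similarity(w1, w2):
    # Compare only the first WordNet sense listed for each word
    s1 = wordnet.synsets(w1)[0]
    s2 = wordnet.synsets(w2)[0]
    print(s1, s2)
    # Wu-Palmer similarity, based on the depths of both senses and of their common ancestor
    return s1.wup_similarity(s2)

print(get_similarity("amazing", "terrible"))
print(get_similarity("amazing", "excellent"))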

Synset('amaze.v.01') Synset('awful.s.02')
0.3333333333333333
Synset('amaze.v.01') Synset('excellent.s.01')
0.3333333333333333
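
The printed synsets show that the first sense of "amazing" is a verb (amaze.v.01) while the other two words resolve to adjective senses, which may partly explain why both scores are identical and differ from the desired output. For reference, Wu-Palmer similarity scores two senses as 2 * depth(LCS) / (depth(s1) + depth(s2)), where LCS is their deepest common ancestor. The sketch below applies that formula to a small made-up taxonomy; the PARENT table and the helper names are hypothetical and not part of WordNet or NLTK, and the root is counted as depth 1.

```python
# Hypothetical toy taxonomy: each node maps to its parent (None marks the root).
PARENT = {
    "entity": None,
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "oak": "plant",
}

def path_to_root(node, parent=PARENT):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

def depth(node, parent=PARENT):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node, parent))

def wup(a, b, parent=PARENT):
    """Wu-Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b))."""
    ancestors_b = set(path_to_root(b, parent))
    # Walking upward from `a`, the first node shared with `b` is the deepest common ancestor.
    lcs = next(n for n in path_to_root(a, parent) if n in ancestors_b)
    return 2 * depth(lcs, parent) / (depth(a, parent) + depth(b, parent))

print(wup("dog", "cat"))  # common ancestor "animal": 2*2 / (3+3)
print(wup("dog", "oak"))  # common ancestor "entity": 2*1 / (3+3)
```

In this toy tree, siblings under a deep shared parent score higher than nodes that only meet at the root, which is the behaviour the real measure relies on.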


2- Find the similarity between any two text documents.

Input :
- text1="John lives in Canada"
- text2="James lives in America, though he's not from there"

Desired Output :
- 0.792817083631068

In [2]:
import numpy as np
from numpy.linalg import norm
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

def get_similarity_between_documents(doc1, doc2):
    docs = [doc1, doc2]
    tfidf = TfidfVectorizer()
    feature_matrix = tfidf.fit_transform(docs)
    col_names = tfidf.get_feature_names_out()
    arr = feature_matrix.toarray()
    df = pd.DataFrame(arr, columns = col_names)
    # Note: this returns 1 - cosine similarity, i.e. the cosine distance between the two TF-IDF rows
    return 1 - np.dot(df.iloc[0], df.iloc[1])/(norm(df.iloc[0])*norm(df.iloc[1]))

doc1 = "John lives in Canada"
doc2 = "James lives in America, though he's not from there"
get_similarity_between_documents(doc1, doc2)

0.7939163649860618
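
Note that the expression 1 - cos(a, b) computed above is the cosine distance; the cosine similarity is the cos(a, b) term itself, so a value near 0.79 here means the two documents share relatively little vocabulary. A minimal standard-library sketch of both quantities follows. It assumes raw word counts instead of TF-IDF weights, so its numbers will not match the output above, and the helper names are hypothetical.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text and count tokens of two or more word characters."""
    return Counter(re.findall(r"\w\w+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty documents
    return dot / (norm_a * norm_b)

doc1 = bag_of_words("John lives in Canada")
doc2 = bag_of_words("James lives in America, though he's not from there")

similarity = cosine_similarity(doc1, doc2)
distance = 1 - similarity
print("similarity:", similarity)
print("distance:  ", distance)
```

Only "lives" and "in" are shared, so the similarity is low and the distance is correspondingly high.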

3- How to detect the language of entered text with spaCy?

Q. Find out the language of the given text

Input :

text= "El agente imprime su pase de abordaje. Los oficiales de seguridad del aeropuerto pasan junto a él con un perro grande. El perro está olfateando alrededor del equipaje de las personas tratando de detectar drogas o explosivos."

Desired Output:

{'language': 'es', 'score': 0.9999963653206719}

 El agente imprime su pase de abordaje. {'language': 'es', 'score': 0.9999969081229643} 

In [3]:
# Only the packages used in the next cell are needed:
# !pip install spacy-language-detection
# !python -m spacy download en_core_web_sm

In [5]:
import spacy
from spacy.language import Language
from spacy_language_detection import LanguageDetector

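# Register the detector factory once, at import time
@Language.factory("language_detector")
def get_lang_detector(nlp, name):
    return LanguageDetector(seed=42)  # fixed seed so repeated runs give the same scores

def get_text_language(text):
    # Load a small pipeline and append the language detector as its last component
    nlp_model = spacy.load("en_core_web_sm")
    nlp_model.add_pipe("language_detector", last=True)
    doc = nlp_model(text)
    # doc._.language is a dict with the detected language code and a confidence score
    return doc._.language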

print(get_text_language("El agente imprime su pase de abordaje. Los oficiales de seguridad del aeropuerto pasan junto a él con un perro grande. El perro está olfateando alrededor del equipaje de las personas tratando de detectar drogas o explosivos."))
print(get_text_language("El agente imprime su pase de abordaje."))

{'language': 'es', 'score': 0.9999941404260209}
{'language': 'es', 'score': 0.9999931701366478}
