# 0. Overview

Take the discussion questions from Devopedia articles and, for each question, find the most similar other questions.

For sentence embeddings, we use [Sentence BERT](https://www.sbert.net/).

To find similar questions, we compare their embeddings using [FAISS](https://github.com/facebookresearch/faiss). The `faiss` package on PyPI is an old version, so we install a more recent one via Conda instead.

# 1. Installation

In [None]:
!pip install -q condacolab
import condacolab
condacolab.install()

In [12]:
import condacolab
condacolab.check()

✨🍰✨ Everything looks OK!


In [None]:
!conda install -y -c pytorch faiss-gpu

In [None]:
!pip install -U sentence-transformers

In [14]:
# Connect to Google Drive to get Devopedia dataset
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


# 2. Preprocess Data

In [15]:
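import pandas as pd

# Devopedia dump on the mounted Google Drive
ArticleFile = '/content/drive/My Drive/NLP-Resources/Devopedia/devopediaArticles.v2525.json'
# orient='index': each top-level JSON key becomes one row
articles = pd.read_json(ArticleFile, orient='index')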

In [21]:
# Expand each article's 'secs' entry into one column per section name
secs = articles['secs'].apply(pd.Series)

In [30]:
def get_qsts(qas):
    """Return the question text ('qst') of every Q&A pair in one article."""
    return tuple(qa['qst'] for qa in qas)

In [38]:
# One row per question; the index still identifies the source article
qsts = secs['Discussion'].apply(get_qsts).explode()
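
The flattening done by `get_qsts` and `explode()` can be pictured without pandas. The sketch below is only an illustration on made-up data: the article titles, the question texts and the `ans` key are assumptions, and only the `qst` key is taken from the notebook.

```python
# Toy stand-in for the 'Discussion' column: one list of Q&A dicts per article.
# Titles, questions and the 'ans' key are hypothetical; only 'qst' is from the notebook.
discussion = {
    "Article A": [
        {"qst": "What is A?", "ans": "..."},
        {"qst": "Why use A?", "ans": "..."},
    ],
    "Article B": [
        {"qst": "What is B?", "ans": "..."},
    ],
}


def get_qsts(qas):
    """Return the question text of every Q&A pair in one article."""
    return tuple(qa["qst"] for qa in qas)


# Plain-Python counterpart of .apply(get_qsts).explode():
# one (article, question) row per question.
rows = [(title, qst) for title, qas in discussion.items() for qst in get_qsts(qas)]

for title, qst in rows:
    print(title, "|", qst)
```

Each article contributes as many rows as it has questions, which is why the exploded series is longer than `articles`.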

# 3. Main Task

In [55]:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('paraphrase-distilroberta-base-v1')
embeddings = model.encode(qsts.tolist())

In [59]:
import faiss
# Flat index: exact search by L2 distance; argument is the embedding dimension
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)
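
A flat L2 index does an exact, exhaustive comparison of each query against every stored vector. As a rough mental model only, the NumPy sketch below does the same on random toy vectors. The function name `brute_force_l2_search` and the toy data are assumptions for illustration, and this is not how FAISS is implemented internally.

```python
import numpy as np


def brute_force_l2_search(database, queries, k):
    """Exact k-nearest-neighbour search by squared L2 distance (toy sizes only)."""
    # Pairwise differences, shape (n_queries, n_database, dim)
    diff = queries[:, None, :] - database[None, :, :]
    dists = (diff ** 2).sum(axis=-1)
    # Indices of the k smallest distances per query, nearest first
    idx = np.argsort(dists, axis=1, kind="stable")[:, :k]
    return np.take_along_axis(dists, idx, axis=1), idx


# Hypothetical stand-in for the sentence embeddings: 6 vectors of dimension 8
rng = np.random.default_rng(0)
toy = rng.random((6, 8), dtype=np.float32)

D_toy, I_toy = brute_force_l2_search(toy, toy, k=3)
print(I_toy)
```

Because the queries are the database vectors themselves, the first column of `I_toy` is each vector's own index with distance zero.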

In [68]:
k = 4  # each question matches itself, so ask for 4 to get 3 other matches
# D: distances to the k nearest vectors, I: their row positions in qsts
D, I = index.search(embeddings, k)
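
The values in `D` are squared Euclidean distances, so smaller means more similar. If a cosine similarity score is preferred, one option is to scale the embeddings to unit length first, in which case the two measures are tied by a simple identity. The sketch below checks that identity on random toy vectors; the normalisation step is an assumption and is not something the notebook does.

```python
import numpy as np

# Hypothetical stand-in for sentence embeddings
rng = np.random.default_rng(0)
vecs = rng.normal(size=(5, 16))

# Scale every row to unit length (not done in the notebook)
unit = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

# Cosine similarity of unit vectors is just their dot product
cosine = unit @ unit.T

# Squared L2 distance between every pair of unit vectors
diff = unit[:, None, :] - unit[None, :, :]
sq_l2 = (diff ** 2).sum(axis=-1)

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
recovered_cosine = 1.0 - sq_l2 / 2.0
print(np.allclose(cosine, recovered_cosine))
```

With normalised embeddings, ranking by smallest L2 distance and ranking by largest cosine similarity give the same order.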

In [91]:
# Show some results
import numpy as np

toshow = np.random.choice(I.shape[0], 10)
for i, results in zip(toshow, I[toshow,:]):
    print("Question:", qsts.iloc[i])
    print(" Nearest: ")
    # exclude self; assumes the question itself is the first hit
    for result in results[1:]:
        print("         ", qsts.iloc[result])
    print("")

Question: In which languages are dependencies more often used?
 Nearest: 
          Could you give examples of dependency managers in various languages?
          What are the different types of dependency injection?
          Are there frameworks for dependency injection?

Question: What are some use cases of fingerprinting algorithms?
 Nearest: 
          What are the characteristics of a good fingerprinting algorithm?
          Could you describe some fingerprinting algorithms?
          How's digital fingerprinting related to fingerprinting algorithms?

Question: What are the differences between 4G/LTE and 5G NR channels?
 Nearest: 
          How is 5G NR RLC different from LTE RLC?
          How do 5G NR L2 sub-layers differ from LTE's E-UTRA sub-layers?
          What's the difference between 5G NR L1 and L3 measurements?

Question: How does Xamarin compare against other cross-platform frameworks?
 Nearest: 
          When should I use Xamarin? 
          Why should I use Xamarin