## Visualizing Urban Dictionary definitions with BERT and t-SNE
In this notebook we'll take definitions from the [Urban Dictionary](https://www.urbandictionary.com/), run them through a BERT-family model to create sentence embeddings, and then visualize those embeddings in two dimensions with t-SNE.

In [6]:
import json
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.manifold import TSNE

First we'll define our transformer model. In this case we'll use `roberta-large-nli-stsb-mean-tokens`, a RoBERTa-large model trained on NLI data and then fine-tuned on the STS benchmark, which makes it well suited to semantic textual similarity. It is a big model (about a 1.3 GB download), but it ran fine on my laptop.

See the UKP Lab's [sentence-transformers](https://github.com/UKPLab/sentence-transformers) repo for a pretrained model zoo and implementation details.

In [4]:
transformer = SentenceTransformer('roberta-large-nli-stsb-mean-tokens')

100%|██████████| 1.31G/1.31G [02:31<00:00, 8.65MB/s] 
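Embeddings from a model tuned for semantic textual similarity are usually compared with cosine similarity, so it helps to see what that measure does. The sketch below uses made-up 3-dimensional vectors in place of real embeddings (the real ones are much longer), and `cosine_similarity` is a helper written for this illustration, not part of the notebook's pipeline.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for sentence embeddings (invented for illustration)
emb_a = [1.0, 2.0, 0.0]
emb_b = [2.0, 4.0, 0.0]  # same direction as emb_a, different length
emb_c = [0.0, 0.0, 3.0]  # orthogonal to emb_a

print(cosine_similarity(emb_a, emb_b))  # very close to 1
print(cosine_similarity(emb_a, emb_c))  # 0
```

With real embeddings, two definitions that say similar things should score closer to 1 than two unrelated ones, which is the structure we hope t-SNE will preserve when it lays the points out in 2-D.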


### Names

In [12]:
# Load the Urban Dictionary definitions for names
with open("data/names_definitions.json", 'r') as f:
    data = json.load(f)
terms = [x['term'] for x in data]
definitions = [x['definition'] for x in data]

# Embed each definition, then project the embeddings down to 2-D
bert_embeddings = transformer.encode(definitions)
tsne_embeddings = TSNE(n_components=2, perplexity=80, random_state=0).fit_transform(np.array(bert_embeddings))

# Pair each term and definition with its 2-D coordinates
names = []
for term, definition, emb in zip(terms, definitions, tsne_embeddings.tolist()):
    names.append({
        'term': term,
        'definition': definition,
        'x': emb[0],
        'y': emb[1]
    })

# Write to file
with open("results/names.json", 'w') as f:
    json.dump(names, f)
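One thing to watch in the cell above is `perplexity=80`. As far as I know, recent scikit-learn releases require the perplexity to be smaller than the number of samples, so this setting assumes the file holds more than 80 definitions. A minimal guard, assuming you want to fall back to the largest allowed value on smaller datasets, could look like this (`clamp_perplexity` is a hypothetical helper, not something the notebook or scikit-learn provides):

```python
def clamp_perplexity(n_samples, requested=80):
    """Return a perplexity that is strictly less than n_samples.

    Keeps the requested value when the dataset is large enough,
    otherwise falls back to n_samples - 1.
    """
    if n_samples < 2:
        raise ValueError("t-SNE needs at least two samples")
    return min(requested, n_samples - 1)

print(clamp_perplexity(500))  # dataset is large enough, keeps 80
print(clamp_perplexity(30))   # too small for 80, falls back to 29
```

It would be passed in as `TSNE(n_components=2, perplexity=clamp_perplexity(len(definitions)), random_state=0)`. A clamped value only avoids the error; very small datasets may still produce layouts that are hard to interpret.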

### Star signs

In [13]:
# Load the Urban Dictionary definitions for star signs
with open("data/starsigns_definitions.json", 'r') as f:
    data = json.load(f)
terms = [x['term'] for x in data]
definitions = [x['definition'] for x in data]

# Embed each definition, then project the embeddings down to 2-D
bert_embeddings = transformer.encode(definitions)
tsne_embeddings = TSNE(n_components=2, perplexity=80, random_state=0).fit_transform(np.array(bert_embeddings))

# Pair each term and definition with its 2-D coordinates
starsigns = []
for term, definition, emb in zip(terms, definitions, tsne_embeddings.tolist()):
    starsigns.append({
        'term': term,
        'definition': definition,
        'x': emb[0],
        'y': emb[1]
    })

# Write to file
with open("results/starsigns.json", 'w') as f:
    json.dump(starsigns, f)
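The names and star signs cells run the same steps on different files, so they could be folded into one function. The sketch below is one possible shape, not how the notebook was actually written: `embed_definitions` and its `encode` and `project` parameters are names invented here, and the encoder and the 2-D projection are passed in as callables so the function itself does not depend on any particular model. It also creates the output directory if it is missing, which the cells above do not do.

```python
import json
from pathlib import Path

def embed_definitions(in_path, out_path, encode, project):
    """Load definitions, embed them, project to 2-D, and write the points as JSON.

    encode:  callable mapping a list of strings to a sequence of vectors
    project: callable mapping those vectors to a sequence of (x, y) pairs
    """
    with open(in_path, 'r') as f:
        data = json.load(f)
    terms = [x['term'] for x in data]
    definitions = [x['definition'] for x in data]

    # Embed, then reduce to two dimensions
    coords = project(encode(definitions))

    # float() turns NumPy scalars into plain floats so json can serialize them
    points = [
        {'term': t, 'definition': d, 'x': float(c[0]), 'y': float(c[1])}
        for t, d, c in zip(terms, definitions, coords)
    ]

    # Create the output directory if needed, then write the points
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, 'w') as f:
        json.dump(points, f)
    return points
```

In this notebook, `encode` would be `transformer.encode` and `project` would be something like `lambda v: TSNE(n_components=2, perplexity=80, random_state=0).fit_transform(np.array(v))`, so the star signs cell reduces to a single call with `"data/starsigns_definitions.json"` and `"results/starsigns.json"`.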