<a href="https://colab.research.google.com/github/ddannenb/sentence-transformers/blob/master/sentence_similarity.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install transformers

In [None]:
!pip install -U sentence-transformers

In [None]:
import scipy.spatial.distance  # import the submodule explicitly so scipy.spatial.distance.cdist resolves
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('bert-base-nli-mean-tokens')

In [None]:
# Get a sample corpus to search over
_c="""
Local government should focus on technology sector
Discussion:
Bringing in technology companies - particularly software, bio tech, renewable energy and clean manufacturing is important to the economic growth of Northern New Mexico.
Now, regional government is doing too little to promote this direction with current activity and investment almost exclusively centered on growing tourism.
I’ve worked in Tech for over 35 years.
The first 20 of those for big companies in Phoenix and the more recent 18 years in a variety of ventures while living here in Santa Fe.
I can say with certainty that it is possible to run a Tech company here but it’s not an easy or obvious place to do so.
Also, for me, doing Tech in New Mexico has meant hiring remote, out of state employees, working with investors and partners long distance and typically billing out of state clients.
Very little of my business has contributed directly back to the Northern New Mexico Economy in creating new jobs.
This can change if local government and the handful of tech companies currently here work together to make it happen.
What is the motivation to shift local government away from a near singular focus on tourism and onto growing Tech?
Foremost, Tech companies pay high salaries and their demands on local resources and infrastructure are relatively low, particularly in comparison to tourism.
According to glassdoor.com, which surveys salaries nationwide, software engineers earn on average 103K per year.
Adding tech companies and their high paying jobs to the economic mix of the region means that more cash flows into the community which means more tax revenues and more business for the established construction, service and retail industry.
High paying companies provide opportunities for salary growth in the community as local workers develop needed skills through workforce training or pursuing advanced degrees.
As for infrastructure, a software company needs employees, an accessible office space, high speed internet and not much else to produce a valuable product.
The region cannot sustain continued growth of tourism which in contrast is a strain on resources.
It requires steady investment in infrastructures such as utilities, roads, policing and emergency services to support growing peak tourist numbers.
It does little to improve resident quality of life bringing more crowds and higher prices.
With low salaries in hospitality – 43K per year average, according to glass door – there is little salary growth opportunity for those working in the industry.
What is the equation for bringing in Tech?
Northern New Mexico already has many of the pieces in place to attract Tech.
It is a great place to live.
It has a beautiful and rich cultural heritage and a great mix of city life and the outdoors.
It’s people values diversity and sustainability, which is in line with the belief systems of Tech.
LANL is already contributing in providing some technology spin outs and has the intellectual capacity to fuel more local innovation if other pieces are in place.
Tech companies thrive in a place that has other tech companies, universities and investors that create an innovation ecosystem.
Although, typically a research university is central to the ecosystem, SFCC has shown that it can meet the challenge having already deployed complex initiatives in areas such as biofuels production, micro grids, solar energy, and greenhouse operations plus their partnerships with UNM and other research universities.
I think that SFCC can be a cornerstone for developing a Tech ecosystem here.
"""

In [None]:
# Split the corpus into a list of sentences (one per line),
# skipping blank lines and lines with fewer than four words
corpus = [line for line in _c.split('\n') if line != '' and len(line.split(' ')) >= 4]
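
The filter above can be checked on a small input before running it on the full text. The following is a minimal sketch, not part of the original notebook: `sample_text` and `split_into_sentences` are hypothetical names introduced here for illustration, and the four-word threshold is exposed as a `min_words` parameter.

```python
# Hypothetical toy text standing in for the notebook's `_c` string.
sample_text = """
Short title here
Discussion:
This line has more than four words in it.

Another sufficiently long line of text.
"""

def split_into_sentences(text, min_words=4):
    """Keep non-empty lines with at least `min_words` space-separated tokens."""
    return [line for line in text.split('\n')
            if line != '' and len(line.split(' ')) >= min_words]

sentences = split_into_sentences(sample_text)
for s in sentences:
    print(s)
```

Note that short lines such as `Discussion:` are dropped by the word-count test, so section labels never reach the encoder.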

In [None]:
# Encode each sentence in the corpus into an embedding vector
corpus_embeddings = model.encode(corpus)

In [None]:
# Define search queries and embed them to vectors as well
queries = [
    'Can Santa Fe survive the continued strain on resources caused by tourism?']
query_embeddings = model.encode(queries)

In [None]:
# For each query, print the closest_n most similar sentences in the corpus
closest_n = 2
for query, query_embedding in zip(queries, query_embeddings):
    distances = scipy.spatial.distance.cdist([query_embedding], corpus_embeddings, "cosine")[0]

    results = zip(range(len(distances)), distances)
    results = sorted(results, key=lambda x: x[1])

    print("\n\n======================\n\n")
    print("Query:", query)
    print(f"\nTop {closest_n} most similar sentences in corpus:")

    for idx, distance in results[0:closest_n]:
        print(corpus[idx].strip(), "(Score: %.4f)" % (1-distance))
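
The score printed above is cosine similarity, that is, one minus the cosine distance returned by `cdist`. As a rough sketch of the same ranking without the model or SciPy, the snippet below computes cosine similarity directly with NumPy on toy 2-D vectors. The function `top_k_cosine` and the toy data are assumptions made for illustration; real embeddings from `model.encode` have many more dimensions.

```python
import numpy as np

def top_k_cosine(query_vec, corpus_vecs, k=2):
    """Return (index, cosine similarity) for the k corpus rows closest to the query."""
    query_vec = np.asarray(query_vec, dtype=float)
    corpus_vecs = np.asarray(corpus_vecs, dtype=float)
    # Cosine similarity = dot product divided by the product of the vector norms.
    norms = np.linalg.norm(corpus_vecs, axis=1) * np.linalg.norm(query_vec)
    sims = corpus_vecs @ query_vec / norms
    # Sort by descending similarity (equivalent to ascending cosine distance).
    order = np.argsort(-sims)[:k]
    return [(int(i), float(sims[i])) for i in order]

# Hypothetical toy vectors standing in for sentence embeddings.
toy_corpus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_query = [1.0, 0.0]

hits = top_k_cosine(toy_query, toy_corpus, k=2)
for idx, score in hits:
    print(idx, "(Score: %.4f)" % score)
```

Sorting by descending similarity gives the same order as sorting by ascending distance, which is what the loop above does.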


In [57]:
# Clustering

from sklearn.cluster import KMeans
import numpy as np

num_clusters = 5
clustering_model = KMeans(n_clusters=num_clusters)
clustering_model.fit(corpus_embeddings)
cluster_assignment = clustering_model.labels_
for i in range(num_clusters):
    print()
    print(f'Cluster {i + 1} contains:')
    clust_sent = np.where(cluster_assignment == i)
    for k in clust_sent[0]:
        print(f'- {corpus[k]}')


Cluster 1 contains:
- Bringing in technology companies - particularly software, bio tech, renewable energy and clean manufacturing is important to the economic growth of Northern New Mexico.
- This can change if local government and the handful of tech companies currently here work together to make it happen.
- Adding tech companies and their high paying jobs to the economic mix of the region means that more cash flows into the community which means more tax revenues and more business for the established construction, service and retail industry.
- High paying companies provide opportunities for salary growth in the community as local workers develop needed skills through workforce training or pursuing advanced degrees.
- It requires steady investment in infrastructures such as utilities, roads, policing and emergency services to support growing peak tourist numbers.
- Northern New Mexico already has many of the pieces in place to attract Tech.
- It’s people values diversity and susta