# Searching Embeddings
Once the embeddings have been generated, we want to be able to search them. To do this efficiently, we use `gensim`, which first requires the embeddings to be in a file format that it accepts. Assuming that you have run the `create_embeddings` notebook, the code below converts that notebook's output file, `test_embed.txt`, into the GloVe text format: one entry per line, consisting of an ID followed by space-separated vector values.
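
To make the two formats concrete, here is a minimal sketch of the conversion applied to a single row. The helper name `to_glove_line` and the sample ID and values are made up for illustration; only the row layout (ID, the literal `text_embedding`, and comma-separated values, separated by tabs) is taken from the conversion code in this notebook.

```python
def to_glove_line(line, precision=3):
    """Convert one tab-separated embedding row to a GloVe-style line.

    Returns None for rows that are not text embeddings.
    (Hypothetical helper, shown for illustration only.)
    """
    row = line.rstrip("\n").split("\t")
    if len(row) != 3 or row[1] != "text_embedding":
        return None
    row_id, _, raw_vector = row
    # format every component with a fixed number of decimal places
    values = [f"{float(x):.{precision}f}" for x in raw_vector.split(",")]
    return f"{row_id} {' '.join(values)}"


# made-up sample row: <id> TAB text_embedding TAB <comma-separated values>
sample = "example/owner/repo\ttext_embedding\t0.12345,-0.5,1.0\n"
print(to_glove_line(sample))  # example/owner/repo 0.123 -0.500 1.000
```

Because the output format is whitespace-delimited, this sketch assumes that IDs contain no spaces.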

In [None]:
in_path = "test_embed.txt"
out_path = "test_embed_glove.txt"

vec_size = 0    # dimension of the embeddings, taken from the first row
line_count = 0  # number of embeddings written

with open(in_path, "r") as in_file, open(out_path, "w") as out_file:
    for line in in_file:
        # each row is expected to be: <id> TAB <field name> TAB <comma-separated values>
        row = line.rstrip("\n").split("\t")
        if len(row) == 3 and row[1] == "text_embedding":
            line_count += 1

            row_id = row[0]
            vector = row[2].split(",")

            if vec_size == 0:
                vec_size = len(vector)

            # GloVe text format: the id followed by space-separated values
            reformat_vector = [f"{float(x):.3f}" for x in vector]
            out_file.write(f"{row_id} {' '.join(reformat_vector)}\n")

# save the "<line count> <vector size>" header for the next step
header_path = "test_embed_header.txt"
with open(header_path, "w") as header_file:
    header_file.write(f"{line_count} {vec_size}\n")

## Another conversion
However, `gensim` loads the vectors in the word2vec text format, which additionally requires a header line containing the number of lines and the dimension of the vectors. The script above wrote that header to `test_embed_header.txt`, so we can prepend it to the converted file with the following command:
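
On systems where `cat` is not available, the same concatenation could be sketched in Python. The function name `prepend_header` and the toy files written to a temporary directory are assumptions for illustration, not part of the original notebook.

```python
import os
import shutil
import tempfile


def prepend_header(header_path, body_path, combined_path):
    """Write the header file followed by the body file to combined_path."""
    with open(combined_path, "w") as combined:
        for path in (header_path, body_path):
            with open(path, "r") as part:
                shutil.copyfileobj(part, combined)


# toy demonstration with made-up contents
with tempfile.TemporaryDirectory() as tmp:
    header = os.path.join(tmp, "header.txt")
    body = os.path.join(tmp, "glove.txt")
    combined = os.path.join(tmp, "wordvec.txt")

    with open(header, "w") as f:
        f.write("2 3\n")  # 2 vectors, 3 dimensions each
    with open(body, "w") as f:
        f.write("a 0.100 0.200 0.300\nb 0.400 0.500 0.600\n")

    prepend_header(header, body, combined)
    with open(combined, "r") as f:
        combined_text = f.read()

print(combined_text)
```

For the files in this notebook, the call would be `prepend_header("test_embed_header.txt", "test_embed_glove.txt", "test_embed_wordvec.txt")`.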

In [None]:
!cat test_embed_header.txt test_embed_glove.txt > test_embed_wordvec.txt

In [None]:
!cat test_embed_wordvec.txt

# Loading the file and doing searches
Now we want to load this file and run some searches. However, the searches would not be very interesting, because the file above contains only one software entry. We have included a larger file, `software_wordvec.txt.gz`, which we can use to try out the searches.

In [None]:
!gunzip software_wordvec.txt.gz

In [None]:
print("importing sentence_transformers")
from sentence_transformers import SentenceTransformer
print("importing gensim")
from gensim.models import KeyedVectors

print("loading model")
model = SentenceTransformer("bert-base-nli-cls-token")

print("loading embeddings")
software_embeddings = KeyedVectors.load_word2vec_format("software_wordvec.txt", binary=False)


# keep answering queries until the kernel is interrupted
while True:
    query = input("What is your query?: ")
    # encode() takes a list of sentences; we pass one and take its vector
    embedding = model.encode([query])[0]
    results = software_embeddings.similar_by_vector(embedding, topn=10)
    for obj_id, similarity in results:
        # get the github url from the object id
        # (assumes the owner and repository are the 2nd and 3rd "/"-separated parts)
        parts = obj_id.split("/")
        github_url = f"https://github.com/{parts[1]}/{parts[2]}"

        print(f"cosine similarity: {similarity:.3f}, url: {github_url}")

importing sentence_transformers
importing gensim
loading model
loading embeddings
What is your query?: this is a test
cosine similarity: 0.960, url: https://github.com/jolespin/soothsayer
cosine similarity: 0.960, url: https://github.com/bslceb/planets
cosine similarity: 0.956, url: https://github.com/Chiliad-Spring/TACH-Two-level-Attributed-Consistent-Hashing
cosine similarity: 0.924, url: https://github.com/NevermoreBryce/Spoon-Knife
cosine similarity: 0.872, url: https://github.com/TobbeTripitaka/strat2file
cosine similarity: 0.861, url: https://github.com/GiuliaFedrizzi/swd2_2020-03-11
cosine similarity: 0.859, url: https://github.com/unicus-skmk/test
cosine similarity: 0.859, url: https://github.com/ppernot/SK-Ana
cosine similarity: 0.859, url: https://github.com/TrapperTeam/Trapper
cosine similarity: 0.859, url: https://github.com/downsj/charts
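
The scores above are cosine similarities between the query embedding and each stored embedding. The following is a minimal pure-Python sketch of that ranking on made-up two-dimensional vectors. It only illustrates the idea and is not how `gensim` implements the search; the names `cosine_similarity`, `most_similar`, and `toy_vectors` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, vectors, topn=10):
    """Return the topn (key, similarity) pairs, most similar first."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# made-up embeddings keyed by made-up ids
toy_vectors = {
    "repo_a": [1.0, 0.0],
    "repo_b": [0.0, 1.0],
    "repo_c": [1.0, 1.0],
}

ranked = most_similar([1.0, 0.1], toy_vectors, topn=2)
for key, similarity in ranked:
    print(f"cosine similarity: {similarity:.3f}, id: {key}")
```

This brute-force version compares the query against every stored vector, which is fine for a toy example but is one reason to rely on a library for larger collections.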
