I tried out different re-ranking models based on the guide on sentence-transformers website, all the top ranked one performed well. 
given that the re-ranking of poems is a subjective test, i dont think the re ranker made any major difference. ymmv

In [2]:
from sentence_transformers import SentenceTransformer, util


In [3]:
model_name = "sentence-transformers/all-mpnet-base-v2"
model = SentenceTransformer(model_name)
model.max_seq_length = 384

In [4]:
corpus_embedding_cache_path = (
    "data/all-poems-corpus-embeddings-sentence-transformers-all-mpnet-base-v2.pkl"
)

In [5]:
import pickle

# Load the Poem embeddings

with open(corpus_embedding_cache_path, "rb") as fIn:
    poem_cache_data = pickle.load(fIn)
    poems = poem_cache_data["poems"]
    corpus_embeddings = poem_cache_data["embeddings"]

In [6]:
import ngtpy

corpus_ix_name = (
    "indices/ngt_index_corpus_embeddings_sentence-transformers-all-mpnet-base-v2"
)
ngt_corpus_index = ngtpy.Index(bytes(corpus_ix_name, encoding="utf8"))


In [101]:
# Load index
# "indices/ngt_index_corpus_embeddings_sentence-transformers-all-mpnet-base-v2" - definite improvement on reranking
import ngtpy

merged_ix_name = (
    "indices/ngt_index_merged_embeddings_sentence-transformers-all-mpnet-base-v2"
)
ngt_merged_index = ngtpy.Index(bytes(corpus_ix_name, encoding="utf8"))


In [65]:
EMB_SIZE = corpus_embeddings.shape[1]

In [None]:
from sentence_transformers import CrossEncoder

cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-12-v2")


In [196]:
query = ["what it's like to wake up in the morning"]
query_embedding = model.encode(query)


In [197]:
corpus_results = ngt_corpus_index.search(query_embedding, size=15)

# print("ID\tDistance")
# for result in corpus_results:
#     print("{}\t{}".format(*result))
# print(
#     "# of distance computations="
#     + str(ngt_corpus_index.get_num_of_distance_computations())
# )

In [198]:
# Re-Ranking
cross_inp = [[query[0], poems[result[0]]] for result in corpus_results]
cross_scores = cross_encoder.predict(cross_inp)


In [199]:
corpus_result_dict = [{"corpus_id": tup[0], "score": tup[1]}
                      for tup in corpus_results]

# Add 'cross-score' to each dict
for idx in range(len(cross_scores)):
    corpus_result_dict[idx]["cross-score"] = cross_scores[idx]


In [200]:
cross_scores

array([ -4.6374702 ,   4.019782  ,  -0.3510906 ,  -2.6707845 ,
         5.9262915 ,  -5.978517  ,  -6.489192  ,  -0.38145903,
        -8.265888  , -10.927521  ,  -6.7275505 ,  -3.9414783 ,
        -6.9339647 , -10.733882  , -10.004258  ], dtype=float32)

In [202]:
import numpy as np

x = np.concatenate((query_embedding, query_embedding))
merged_results = ngt_merged_index.search(x, size=15)

In [203]:
# Re-Ranking
merged_cross_inp = [[query[0], poems[result[0]]] for result in merged_results]
merged_cross_scores = cross_encoder.predict(merged_cross_inp)


In [204]:
merged_cross_scores


array([  4.019782 ,  -4.6374702,  -1.5630159,  -7.324999 ,  -2.7436388,
        -8.265888 , -10.245801 ,  -0.7476058,   0.6734859,  -0.3510906,
        -6.489192 ,  -8.761709 ,  -5.5522566,  -7.7540994,   1.3845894],
      dtype=float32)

In [205]:
merged_corpus_result_dict = [
    {"corpus_id": tup[0], "score": tup[1]} for tup in merged_results
]

# Add 'cross-score' to each dict
for idx in range(len(merged_cross_scores)):
    merged_corpus_result_dict[idx]["cross-score"] = merged_cross_scores[idx]

In [206]:
[poems[result[0]] for result in corpus_results[0:3]]


['By: James K. Zimmerman\n this morning I felt my life if you were dead the expansiveness of the bed the birds still singing the remnants of the smell of coffee in the morning the emptiness of thought the deafening silence of my heart',
 'By: Rose Fyleman\n I wake in the morning early And always, the very first thing, I poke out my head and I sit up in bed And I sing and I sing and I sing',
 'By: Richard Tagett\n I like to lie with you wordless on black cloud rooft beach in late june 5 o’clock tempest on clump weed bed with sand fitting your contours like tailor made and I like to wash my summer brown face in north cold hudson rapids with octagon soap knees niched in steamy rocks where last night’s frog stared at our buddhist sleep but most of all I like to see the morning happen . . . I like to go down vertical mountains where lanny goshkitch meditated crashing poplars sap sticky arms flailing as thermosed green tea anoints sneakers and blood soakt brow I taste and love myself a split

In [207]:
[poems[result[0]] for result in merged_results[0:3]]

['By: Rose Fyleman\n I wake in the morning early And always, the very first thing, I poke out my head and I sit up in bed And I sing and I sing and I sing',
 'By: James K. Zimmerman\n this morning I felt my life if you were dead the expansiveness of the bed the birds still singing the remnants of the smell of coffee in the morning the emptiness of thought the deafening silence of my heart',
 'By: Kate Rushin\n In the hour before dawn, I rise up to give myself a little bit before it all starts again. “Rise up” is not really what I do; I lie there, awake, on my pallet, and very still, barely breathing. I listen, make sure no one else is stirring, make sure nobody hears me. I take a few moments to listen to my blood beating in my ear, hear my own breath easing out my lips. I let myself sink, ease down again, for just a few minutes in the cool gray before it all starts all over again and goes and goes until the middle of the night and I collapse on rough cloth, too tired to ease into sleep

In [208]:
print("Top-3 Cross-Encoder Re-ranker hits")
print("query was: ", query)
reranked_hits = sorted(
    corpus_result_dict, key=lambda x: x["cross-score"], reverse=True)
for hit in reranked_hits[0:3]:
    print(
        "\t{:.3f}\t{}".format(
            hit["cross-score"], poems[hit["corpus_id"]].replace("\n", " ")
        )
    )


Top-3 Cross-Encoder Re-ranker hits
query was:  ["what it's like to wake up in the morning"]
	5.926	By: Edward Hirsch  I used to mock my father and his chums for getting up early on Sunday morning and drinking coffee at a local spot but now I’m one of those chumps. No one cares about my old humiliations but they go on dragging through my sleep like a string of empty tin cans rattling behind an abandoned car. It’s like this: just when you think you have forgotten that red-haired girl who left you stranded in a parking lot forty years ago, you wake up early enough to see her disappearing around the corner of your dream on someone else’s motorcycle roaring onto the highway at sunrise. And so now I’m sitting in a dimly lit café﻿ full of early morning risers where the windows are covered with soot and the coffee is warm and bitter
	4.020	By: Rose Fyleman  I wake in the morning early And always, the very first thing, I poke out my head and I sit up in bed And I sing and I sing and I sing
	-0.

In [209]:
print("Top-3 Cross-Encoder merged Re-ranker hits")
print("query was: ", query)
merged_reranked_hits = sorted(
    merged_corpus_result_dict, key=lambda x: x["cross-score"], reverse=True
)
for hit in merged_reranked_hits[0:3]:
    print(
        "\t{:.3f}\t{}".format(
            hit["cross-score"], poems[hit["corpus_id"]].replace("\n", " ")
        )
    )

Top-3 Cross-Encoder merged Re-ranker hits
query was:  ["what it's like to wake up in the morning"]
	4.020	By: Rose Fyleman  I wake in the morning early And always, the very first thing, I poke out my head and I sit up in bed And I sing and I sing and I sing
	1.385	By: Rusty Morrison  In through our bedroom window, the full dawn-scape concusses. Difficult to sustain sleep's equilibrium of wordlessness. Naming anything, like stepping barefoot in wet sand up to my ankles. Name after name, sinking me farther beneath waking's buoyancy. House, this morning, is pale with the rush of what night siphoned off. Objects, still emptied of resemblance, hum their chord-less cantos. Bloodless, my knuckles knock on walls without echo, testing singularities. Sun on the cutlery offers an ageless sheen. Though it ages the silver relentlessly. New, but still rudimentary tools to be gleaned from my over-used weaponry
	0.673	By: Ron Padgett  The morning coffee. I'm not sure why I drink it. Maybe it's the rit

In [7]:
import pandas as pd


In [9]:
df = pd.read_csv("data/all-poems-en-for-embedding.csv")


In [10]:
df.head()


Unnamed: 0,poem_id,cleaned_gen,embed_content
0,58566,Emotions: The poem evokes a sense of curiosity...,"By: Kathleen Jamie\nWell, friend, we’re here a..."
1,53997,Emotions: The poem evokes a sense of melanchol...,By: Walter Clyde Curry\nGrieve not that winter...
2,27139,Emotions: - Loneliness - Sadness - Nostalgi...,By: Theodore Weiss\nwho can bear the idea of E...
3,53778,Emotions: The poem evokes a sense of restlessn...,By: Israel Zangwill\nProsaic miles of streets ...
4,152076,"Emotions: Confusion, frustration, boredom, a d...",By: Matthew Zapruder\nAcross the deep eternal ...


In [13]:
df[df.poem_id == 57527].embed_content

13137    By: Ko Un\nYou fools who ask what god is shoul...
Name: embed_content, dtype: object