Bank of all prompts_responses

In [60]:
bank = [
    "This is a beautiful day",
    "Today is Wednesday",
    "I went to the beach",
    "The beach is nice",
    "What am I singing?",
    "What am I saying?",
    "Stanford is cool",
    "Are there lots of people in this class?",
    "Which courses are the hardest?",
    "This curriculum is nicely-structured.",
    "Do you play any sports?",
    "Sports fans are irritating haha.",
]

Embed every sentence in the bank into GloVe embedding space, then L2-normalise each row

In [61]:
from zeugma.embeddings import EmbeddingTransformer
import numpy as np

glove = EmbeddingTransformer('glove')
bank_embeds = glove.transform(bank)
bank_embeds = bank_embeds / np.linalg.norm(bank_embeds, axis=-1)[:, None]
print(bank_embeds.shape)


(12, 25)
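
The row-wise normalisation above is what lets cosine similarity be read as a plain dot product later on. A minimal sketch of that fact, using made-up random vectors as a stand-in for `bank_embeds` (the name `toy` and its values are assumptions for illustration, not the GloVe embeddings):

```python
import numpy as np

# Toy stand-in for bank_embeds: 4 made-up vectors of dimension 3.
rng = np.random.default_rng(0)
toy = rng.normal(size=(4, 3))

# Same row-wise L2 normalisation as in the cell above.
toy_unit = toy / np.linalg.norm(toy, axis=-1, keepdims=True)

# After normalisation every row has length 1 ...
row_norms = np.linalg.norm(toy_unit, axis=-1)

# ... so the matrix of dot products is the matrix of cosine similarities.
dot_sims = toy_unit @ toy_unit.T

# Reference: cosine similarity computed directly from the raw vectors.
norms = np.linalg.norm(toy, axis=-1)
cos_sims = (toy @ toy.T) / np.outer(norms, norms)

print(np.allclose(row_norms, 1.0))
print(np.allclose(dot_sims, cos_sims))
```

Both checks should print `True`; the diagonal of `dot_sims` is each point's similarity to itself, which is why the later cells need to avoid picking the source point.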


Which point has **highest** cosine similarity to "I went to the beach"?

In [62]:
src_idx = 2
print(f"src sentence: {bank[src_idx]}")

# cosine similarities from the src pt to every pt in the bank (including itself)
from sklearn.metrics.pairwise import cosine_similarity
sims = cosine_similarity(bank_embeds[src_idx:src_idx+1], Y=bank_embeds, dense_output=True)
print(sims.shape)

# sort cosine sims, [::-1] for descending order, idx 1 and not 0 to avoid picking self
dst_idx = sims[0].argsort()[::-1][1]

print(f"dst sentence: {bank[dst_idx]}")


src sentence: I went to the beach
(1, 12)
dst sentence: The beach is nice


Which point has **lowest** cosine similarity to "I went to the beach"?

In [63]:
src_idx = 2
print(f"src sentence: {bank[src_idx]}")

# cosine similarities from the src pt to every pt in the bank (including itself)
from sklearn.metrics.pairwise import cosine_similarity
sims = cosine_similarity(bank_embeds[src_idx:src_idx+1], Y=bank_embeds, dense_output=True)
print(sims.shape)

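# sort cosine sims, ascending order
# NOTE: self has similarity 1 and therefore sorts LAST here, so idx 0 is the true minimum;
# idx 1 (used below) returns the second-lowest similarity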
dst_idx = sims[0].argsort()[1]

print(f"dst sentence: {bank[dst_idx]}")


src sentence: I went to the beach
(1, 12)
dst sentence: This curriculum is nicely-structured.
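
The two lookups above differ only in which end of the sorted similarities they read. A minimal sketch of both in one helper, assuming unit-normalised rows so that a dot product equals cosine similarity; the helper name `most_and_least_similar` and the 2-D vectors in `toy` are hypothetical and are not the GloVe embeddings. The source point is masked out by index rather than skipped by position, because it sorts first only in descending order.

```python
import numpy as np

def most_and_least_similar(embeds, src_idx):
    """Return (most_similar_idx, least_similar_idx) for row src_idx.

    Assumes the rows of embeds are unit-normalised. The source row itself
    is excluded from both answers.
    """
    sims = embeds @ embeds[src_idx]                       # shape (n,)
    others = np.delete(np.arange(len(embeds)), src_idx)   # every idx but src
    most = others[sims[others].argmax()]
    least = others[sims[others].argmin()]
    return int(most), int(least)

# Made-up unit vectors in 2-D (illustration only).
toy = np.array([
    [1.0, 0.0],
    [0.96, 0.28],
    [0.0, 1.0],
    [-0.6, 0.8],
    [0.6, 0.8],
])

most, least = most_and_least_similar(toy, src_idx=0)
print(most, least)  # 1 3
```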


When suggesting the next pt to the user after k iters, the k pts already seen act as k centroids.
Which pt is the **furthest away from ALL centroids**?

Say the pts seen so far are:
- "This is a beautiful day"
- "I went to the beach"
- "What am I singing?"
- "Stanford is cool"
- "Are there lots of people in this class?"

I would expect the pt furthest from all of these to be about a different topic (sports).

In [64]:
src_idxs = [0, 2, 4, 6, 7]
print("src sentences:")
for idx in src_idxs:
    print(bank[idx])

# cosine similarities from each src pt to every pt in the bank
from sklearn.metrics.pairwise import cosine_similarity
sims = cosine_similarity(bank_embeds[src_idxs], Y=bank_embeds, dense_output=True)
print(sims.shape)

# sum cosine sims vertically (for every dst pt)
sims = sims.sum(axis=0)

# sort the summed sims, ascending order (least similar first)
order = sims.argsort()

# src pts are left in: each one's sum includes its self-similarity of 1,
# so a src pt is unlikely (though not guaranteed) to rank lowest
# to exclude them explicitly:
# order = [idx for idx in order if idx not in src_idxs]

# chosen pt
dst_idx = order[0]

print(f"dst sentence: {bank[dst_idx]}")


src sentences:
This is a beautiful day
I went to the beach
What am I singing?
Stanford is cool
Are there lots of people in this class?
(5, 12)
dst sentence: Sports fans are irritating haha.


### An example loop that runs for T iters, each time choosing the pt in the least-explored region of the embedding space

In [66]:
T = 5

# say idx 0 chosen to begin
src_idxs = [0]
print(f'we begin by randomly picking: {bank[src_idxs[0]]}')

for t in range(T):
    # cosine similarities from each already-picked pt to every pt in the bank
    sims = cosine_similarity(bank_embeds[src_idxs], Y=bank_embeds, dense_output=True)

    # sum cosine sims vertically (for every dst pt)
    sims = sims.sum(axis=0)

    # sort the summed sims, ascending order, then drop pts that were already picked
    order = sims.argsort()
    order = [x for x in order if x not in src_idxs]

    # chosen pt
    dst_idx = order[0]
    print(f"new pick: {bank[dst_idx]}")
    
    # ADD NEW PT TO SET OF ALREADY PICKED IDXS
    src_idxs.append(dst_idx)


we begin by randomly picking: This is a beautiful day
new pick: Sports fans are irritating haha.
new pick: This curriculum is nicely-structured.
new pick: What am I singing?
new pick: I went to the beach
new pick: What am I saying?
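
The loop above can be wrapped into a reusable function. This is a minimal sketch under stated assumptions: the function name `greedy_diverse_order` is hypothetical, rows are assumed to be unit-normalised (so a dot product stands in for `cosine_similarity`), and the 2-D vectors in `toy` are made up rather than taken from the GloVe embeddings. Already-picked points are masked with `np.inf` instead of being filtered out of the sorted order; the selection rule is otherwise the same summed-similarity criterion.

```python
import numpy as np

def greedy_diverse_order(embeds, start_idx, T):
    """Greedily pick T more rows, each with the lowest summed cosine
    similarity to all rows picked so far.

    Assumes the rows of embeds are unit-normalised.
    """
    T = min(T, len(embeds) - 1)          # cannot pick more pts than exist
    picked = [start_idx]
    for _ in range(T):
        # (len(picked), n) similarities, summed over the picked pts -> (n,)
        summed = (embeds[picked] @ embeds.T).sum(axis=0)
        summed[picked] = np.inf          # never re-pick a seen pt
        picked.append(int(summed.argmin()))
    return picked

# Made-up unit vectors in 2-D (illustration only).
toy = np.array([
    [1.0, 0.0],
    [0.96, 0.28],
    [0.0, 1.0],
    [-0.6, 0.8],
    [0.6, 0.8],
])

order = greedy_diverse_order(toy, start_idx=0, T=3)
print(order)  # [0, 3, 1, 2]
```

Because the criterion is a sum, a point that is strongly dissimilar to one picked point can be chosen even if it sits close to another; that trade-off is inherited from the loop above rather than introduced here.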
