In [1]:
from sentence_transformers import CrossEncoder

In [2]:
model = CrossEncoder('cross-encoder/ms-marco-TinyBERT-L-2-v2', max_length=512)
scores = model.predict(
    [('How many people live in Berlin?', 
      'Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.'), 
    ('How many people live in Berlin?', 'Berlin is well known for its museums.')]
)
scores

array([ 7.1523685, -6.2870407], dtype=float32)
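The ms-marco cross-encoder returns raw, unbounded logits, so 7.15 vs. -6.29 is only meaningful as an ordering. If a bounded relevance score is easier to read, a logistic sigmoid is one common mapping. The sketch below is plain Python applied to the two logits printed above; because the sigmoid is monotonic, it never changes the ranking.

```python
import math

def sigmoid(x: float) -> float:
    """Map a raw cross-encoder logit into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Raw logits copied from the ms-marco-TinyBERT-L-2-v2 output above
logits = [7.1523685, -6.2870407]
probs = [sigmoid(x) for x in logits]
print([round(p, 4) for p in probs])
```

The relevant passage lands close to 1 and the irrelevant one close to 0; the relative order is the same as for the raw logits.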

In [3]:
model = CrossEncoder('cross-encoder/stsb-TinyBERT-L-4')
scores = model.predict(
    [("The weather today is beautiful", "It's raining!"),
     ("The weather today is beautiful", "Today is a sunny day")]
)
scores

array([0.46552283, 0.6350212 ], dtype=float32)

In [4]:
from datasets import load_dataset

dataset = load_dataset('jamescalam/ai-arxiv-chunked')
dataset['train']

Dataset({
    features: ['doi', 'chunk-id', 'chunk', 'id', 'title', 'summary', 'source', 'authors', 'categories', 'comment', 'journal_ref', 'primary_category', 'published', 'updated', 'references'],
    num_rows: 41584
})

In [5]:
dataset['train'][0]

{'doi': '1910.01108',
 'chunk-id': '0',
 'chunk': 'DistilBERT, a distilled version of BERT: smaller,\nfaster, cheaper and lighter\nVictor SANH, Lysandre DEBUT, Julien CHAUMOND, Thomas WOLF\nHugging Face\n{victor,lysandre,julien,thomas}@huggingface.co\nAbstract\nAs Transfer Learning from large-scale pre-trained models becomes more prevalent\nin Natural Language Processing (NLP), operating these large models in on-theedge and/or under constrained computational training or inference budgets remains\nchallenging. In this work, we propose a method to pre-train a smaller generalpurpose language representation model, called DistilBERT, which can then be ﬁnetuned with good performances on a wide range of tasks like its larger counterparts.\nWhile most prior work investigated the use of distillation for building task-speciﬁc\nmodels, we leverage knowledge distillation during the pre-training phase and show\nthat it is possible to reduce the size of a BERT model by 40%, while retaining 97%\nof i

In [6]:
chunks = dataset['train']['chunk']
len(chunks)

41584

In [7]:
from sentence_transformers import SentenceTransformer

bi_encoder = SentenceTransformer('multi-qa-MiniLM-L6-cos-v1')
bi_encoder.max_seq_length = 512
corpus_embeddings = bi_encoder.encode(chunks, convert_to_tensor=True, show_progress_bar=True)

Batches:   0%|          | 0/1300 [00:00<?, ?it/s]
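The batch count in the progress bar follows from the corpus size and the encoder's batch size. This sketch assumes `encode` uses its default `batch_size=32`, and that `multi-qa-MiniLM-L6-cos-v1` produces 384-dimensional float32 embeddings (an assumption about the model, used here only to estimate memory).

```python
import math

num_chunks = 41584   # len(chunks) from the cell above
batch_size = 32      # assumed default batch_size of SentenceTransformer.encode
emb_dim = 384        # assumed embedding dimension of multi-qa-MiniLM-L6-cos-v1

num_batches = math.ceil(num_chunks / batch_size)
last_batch = num_chunks - (num_batches - 1) * batch_size
# float32 = 4 bytes per value
corpus_bytes = num_chunks * emb_dim * 4

print(num_batches)                   # → 1300
print(last_batch)                    # → 16
print(round(corpus_bytes / 2**20, 1))  # size of corpus_embeddings in MiB
```

So the whole corpus is encoded once up front (1,300 batches, the last one holding 16 chunks), and the resulting embedding matrix is small enough to keep in GPU or CPU memory for search.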

In [8]:
from sentence_transformers import util

query = "what is rlhf?"
top_k = 25
# encode() already returns the tensor on the model's device, matching corpus_embeddings
query_emb = bi_encoder.encode(query, convert_to_tensor=True)
hits = util.semantic_search(query_emb, corpus_embeddings, top_k=top_k)[0]
hits

[{'corpus_id': 14679, 'score': 0.6051740646362305},
 {'corpus_id': 14725, 'score': 0.5746961832046509},
 {'corpus_id': 39564, 'score': 0.5672644376754761},
 {'corpus_id': 17387, 'score': 0.5642737150192261},
 {'corpus_id': 5628, 'score': 0.5358618497848511},
 {'corpus_id': 14805, 'score': 0.5182210206985474},
 {'corpus_id': 14802, 'score': 0.5044544339179993},
 {'corpus_id': 20652, 'score': 0.5011643767356873},
 {'corpus_id': 9761, 'score': 0.499434232711792},
 {'corpus_id': 9755, 'score': 0.4980108439922333},
 {'corpus_id': 9763, 'score': 0.49280524253845215},
 {'corpus_id': 20653, 'score': 0.48917996883392334},
 {'corpus_id': 20711, 'score': 0.4854634702205658},
 {'corpus_id': 14750, 'score': 0.4836026430130005},
 {'corpus_id': 14680, 'score': 0.4797966778278351},
 {'corpus_id': 14716, 'score': 0.4774951934814453},
 {'corpus_id': 14806, 'score': 0.47698211669921875},
 {'corpus_id': 35250, 'score': 0.46669578552246094},
 {'corpus_id': 14821, 'score': 0.46372905373573303},
 {'corpus_id
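Conceptually, `util.semantic_search` scores the query against every corpus embedding with cosine similarity and keeps the `top_k` best. The NumPy sketch below illustrates that idea on toy vectors; it is not the library's implementation (which works on torch tensors and processes the corpus in chunks), and `semantic_search_np` is a hypothetical name. Since `multi-qa-MiniLM-L6-cos-v1` is documented as producing normalized embeddings, cosine similarity and dot product should coincide for it.

```python
import numpy as np

def semantic_search_np(query_emb: np.ndarray, corpus_emb: np.ndarray, top_k: int = 25):
    """Cosine-similarity top-k search returning dicts shaped like util.semantic_search hits."""
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_emb / np.linalg.norm(corpus_emb, axis=1, keepdims=True)
    scores = c @ q                      # one cosine score per corpus row
    top_k = min(top_k, len(scores))
    # argpartition selects the top_k in O(n); argsort then orders just those
    idx = np.argpartition(-scores, top_k - 1)[:top_k]
    idx = idx[np.argsort(-scores[idx])]
    return [{"corpus_id": int(i), "score": float(scores[i])} for i in idx]

# Toy corpus of three 2-d "embeddings"
corpus = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([1.0, 0.2])
toy_hits = semantic_search_np(query, corpus, top_k=2)
print(toy_hits)
```

The bi-encoder is cheap here because the corpus side was encoded once; each new query costs one encoding plus a matrix-vector product.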

In [26]:
retrival_corpus_ids = [hit['corpus_id'] for hit in hits]
len(retrival_corpus_ids)

25

In [18]:
for i, hit in enumerate(hits[:3]):
    sample = dataset['train'][hit['corpus_id']]
    score = hit['score']
    print(f"score:{score}, source:{sample['source']}")
    print(f"{sample['chunk']}")
    print('-------------------\n')

score:0.6051740646362305, source:http://arxiv.org/pdf/2204.05862
learning from human feedback, which we improve on a roughly weekly cadence. See Section 2.3.
4This means that our helpfulness dataset goes ‘up’ in desirability during the conversation, while our harmlessness
dataset goes ‘down’ in desirability. We chose the latter to thoroughly explore bad behavior, but it is likely not ideal
for teaching good behavior. We believe this difference in our data distributions creates subtle problems for RLHF, and
suggest that others who want to use RLHF to train safer models consider the analysis in Section 4.4.
[Figure residue: panels "Mean Zero-Shot Accuracy" and "Mean Few-Shot Accuracy"; Mean Eval Acc vs. Number of Parameters; Plain Language Model vs. RLHF]
Figure 3 RLHF model performance on zero-shot and few-shot NLP tasks. For each model size, we plot
the mean accuracy on MMMLU, Lambada, HellaSwag, OpenBookQA, 

In [24]:
from sentence_transformers import CrossEncoder

cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')
cross_inp = [[query, chunks[hit['corpus_id']]] for hit in hits]
cross_scores = cross_encoder.predict(cross_inp)
cross_scores

array([ 1.2227583 ,  2.2057667 ,  1.2897233 ,  5.048051  ,  4.4136815 ,
        0.72113264,  1.2272785 ,  0.16303732,  2.563827  ,  2.9497519 ,
        2.3555288 ,  1.3877895 ,  1.3640015 ,  1.584944  ,  2.229299  ,
        0.8184781 ,  1.676284  ,  2.0563674 ,  1.5945492 ,  2.3106794 ,
        0.9949666 ,  1.0842168 , -0.4363297 ,  2.2930846 ,  5.590805  ],
      dtype=float32)

In [31]:
for idx in range(len(cross_scores)):
    hits[idx]['cross-score'] = cross_scores[idx]
hits = sorted(hits, key=lambda x: x['cross-score'], reverse=True)
hits

[{'corpus_id': 14797, 'score': 0.45471128821372986, 'cross-score': 5.590805},
 {'corpus_id': 9755, 'score': 0.4980108439922333, 'cross-score': 5.048051},
 {'corpus_id': 9761, 'score': 0.499434232711792, 'cross-score': 4.4136815},
 {'corpus_id': 14725, 'score': 0.5746961832046509, 'cross-score': 2.9497519},
 {'corpus_id': 14680, 'score': 0.4797966778278351, 'cross-score': 2.563827},
 {'corpus_id': 35250, 'score': 0.46669578552246094, 'cross-score': 2.3555288},
 {'corpus_id': 14728, 'score': 0.4577748775482178, 'cross-score': 2.3106794},
 {'corpus_id': 20652, 'score': 0.5011643767356873, 'cross-score': 2.2930846},
 {'corpus_id': 20653, 'score': 0.48917996883392334, 'cross-score': 2.229299},
 {'corpus_id': 17387, 'score': 0.5642737150192261, 'cross-score': 2.2057667},
 {'corpus_id': 14802, 'score': 0.5044544339179993, 'cross-score': 2.0563674},
 {'corpus_id': 39564, 'score': 0.5672644376754761, 'cross-score': 1.676284},
 {'corpus_id': 14679, 'score': 0.6051740646362305, 'cross-score': 1.5
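The retrieve-then-rerank step above can be wrapped in a small helper. This is a sketch: `rerank` and the word-overlap scorer are hypothetical stand-ins, and in the notebook `score_pairs` would be `cross_encoder.predict`. One design choice differs from the cells here: the helper copies each hit instead of mutating it, so running a second reranker (such as the BGE model later on) does not overwrite the `'cross-score'` written by the first.

```python
from typing import Callable, Dict, List, Sequence

def rerank(query: str, hits: List[Dict], passages: Sequence[str],
           score_pairs: Callable[[List[List[str]]], Sequence[float]]) -> List[Dict]:
    """Attach a cross-encoder-style score to each bi-encoder hit and sort by it."""
    pairs = [[query, passages[h["corpus_id"]]] for h in hits]
    scores = score_pairs(pairs)
    # Build new dicts so the caller's hits list is left untouched
    rescored = [{**h, "cross-score": float(s)} for h, s in zip(hits, scores)]
    return sorted(rescored, key=lambda h: h["cross-score"], reverse=True)

def word_overlap(pairs: List[List[str]]) -> List[int]:
    """Hypothetical toy scorer: number of words shared by query and passage."""
    return [len(set(q.split()) & set(p.split())) for q, p in pairs]

passages = ["rlhf is reinforcement learning from human feedback",
            "berlin museums",
            "rlhf"]
toy_hits = [{"corpus_id": 1, "score": 0.9},
            {"corpus_id": 0, "score": 0.5},
            {"corpus_id": 2, "score": 0.4}]
reranked = rerank("what is rlhf", toy_hits, passages, word_overlap)
print([h["corpus_id"] for h in reranked])
```

The cross-encoder only runs on the `top_k` candidates (25 here) rather than on all 41,584 chunks, which is what makes the expensive pairwise model affordable.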

In [34]:
msmarco_l6_corpus_ids = [hit['corpus_id'] for hit in hits]
len(msmarco_l6_corpus_ids)

25

In [33]:
for i, hit in enumerate(hits[:3]):
    sample = dataset['train'][hit['corpus_id']]
    score = hit['score']
    print(f"score:{score}, cross-score:{hit['cross-score']}, source:{sample['source']}")
    print(f"{sample['chunk']}")
    print('-------------------\n')

score:0.45471128821372986, cross-score:5.5908050537109375, source:http://arxiv.org/pdf/2204.05862
size of 1024 tokens, except for the ‘online’ model described in Section 4.5, where we trained with 2048,
which may help stabilize RLHF on long contexts.
For both PMP and human feedback ﬁnetuning, we append a special ‘end-of-context’ token at the end of each
sample, such that the PM score is predicted directly on top of this token. As explained in Appendix C.4 of
[Askell et al., 2021], this appears to improve PM performance.
[Figure residue: panels "Zero-Shot Accuracy on Lambada", "Zero-Shot Accuracy on ARC-Easy", "Zero-Shot Accuracy on ARC-Challenge", and a fourth panel whose title is cut off; Accuracy vs. Number of Parameters; Plain Language Model vs. RLHF]

In [37]:
cross_encoder = CrossEncoder('BAAI/bge-reranker-base')
cross_inp = [[query, chunks[hit['corpus_id']]] for hit in hits]
cross_scores = cross_encoder.predict(cross_inp)
for idx in range(len(cross_scores)):
    hits[idx]['cross-score'] = cross_scores[idx]

In [39]:
hits = sorted(hits, key=lambda x: x['cross-score'], reverse=True)
hits

[{'corpus_id': 20638, 'score': 0.4512876570224762, 'cross-score': 0.9574586},
 {'corpus_id': 17387, 'score': 0.5642737150192261, 'cross-score': 0.9408788},
 {'corpus_id': 14679, 'score': 0.6051740646362305, 'cross-score': 0.886943},
 {'corpus_id': 9761, 'score': 0.499434232711792, 'cross-score': 0.871898},
 {'corpus_id': 39564, 'score': 0.5672644376754761, 'cross-score': 0.80197424},
 {'corpus_id': 20632, 'score': 0.4619232714176178, 'cross-score': 0.8005458},
 {'corpus_id': 14725, 'score': 0.5746961832046509, 'cross-score': 0.71381146},
 {'corpus_id': 9763, 'score': 0.49280524253845215, 'cross-score': 0.6758993},
 {'corpus_id': 14797, 'score': 0.45471128821372986, 'cross-score': 0.671386},
 {'corpus_id': 9771, 'score': 0.45408281683921814, 'cross-score': 0.64563036},
 {'corpus_id': 14680, 'score': 0.4797966778278351, 'cross-score': 0.63509965},
 {'corpus_id': 14750, 'score': 0.4836026430130005, 'cross-score': 0.6212804},
 {'corpus_id': 14805, 'score': 0.5182210206985474, 'cross-score'

In [40]:
bge_corpus_ids = [hit['corpus_id'] for hit in hits]

In [41]:
for i, hit in enumerate(hits[:3]):
    sample = dataset['train'][hit['corpus_id']]
    score = hit['score']
    print(f"score:{score}, cross-score:{hit['cross-score']}, source:{sample['source']}")
    print(f"{sample['chunk']}")
    print('-------------------\n')

score:0.4512876570224762, cross-score:0.9574586153030396, source:http://arxiv.org/pdf/2302.07459
We examine the inﬂuence of the amount of RLHF training for two reasons. First, RLHF [13, 57] is an
increasingly popular technique for reducing harmful behaviors in large language models [3, 21, 52]. Some of
these models are already deployed [52], so we believe the impact of RLHF deserves further scrutiny. Second,
previous work shows that the amount of RLHF training can signiﬁcantly change metrics on a wide range of
personality, political preference, and harm evaluations for a given model size [41]. As a result, it is important
to control for the amount of RLHF training in the analysis of our experiments.
3.2 Experiments
3.2.1 Overview
We test the effect of natural language instructions on two related but distinct moral phenomena: stereotyping
and discrimination. Stereotyping involves the use of generalizations about groups in ways that are often
harmful or undesirable.4To measure stereotypi

In [43]:
# Compare the three rankings position by position for all top_k retrieved chunks
for i in range(top_k):
    print(f"top {i+1}. bi-encoder {retrival_corpus_ids[i]}, cross-encoder {msmarco_l6_corpus_ids[i]}, bge {bge_corpus_ids[i]}")

top 1. bi-encoder 14679, cross-encoder 14797, bge 20638
top 2. bi-encoder 14725, cross-encoder 9755, bge 17387
top 3. bi-encoder 39564, cross-encoder 9761, bge 14679
top 4. bi-encoder 17387, cross-encoder 14725, bge 9761
top 5. bi-encoder 5628, cross-encoder 14680, bge 39564
top 6. bi-encoder 14805, cross-encoder 35250, bge 20632
top 7. bi-encoder 14802, cross-encoder 14728, bge 14725
top 8. bi-encoder 20652, cross-encoder 20652, bge 9763
top 9. bi-encoder 9761, cross-encoder 20653, bge 14797
top 10. bi-encoder 9755, cross-encoder 17387, bge 9771
top 11. bi-encoder 9763, cross-encoder 14802, bge 14680
top 12. bi-encoder 20653, cross-encoder 39564, bge 14750
top 13. bi-encoder 20711, cross-encoder 14679, bge 14805
top 14. bi-encoder 14750, cross-encoder 14750, bge 9755
top 15. bi-encoder 14680, cross-encoder 14806, bge 14821
top 16. bi-encoder 14716, cross-encoder 14821, bge 14802
top 17. bi-encoder 14806, cross-encoder 5628, bge 5628
top 18. bi-encoder 35250, cross-encoder 20632, bge 1
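Reading the table row by row is hard, so a simple summary is how many chunk ids two rankings share in their first k positions. The sketch below uses the top-5 ids copied from the comparison printed above; in the notebook, the full lists (`retrival_corpus_ids`, `msmarco_l6_corpus_ids`, `bge_corpus_ids`) could be passed instead. `topk_overlap` is a hypothetical helper name.

```python
from typing import Sequence

def topk_overlap(a: Sequence[int], b: Sequence[int], k: int) -> int:
    """Number of corpus ids shared by the first k entries of two rankings."""
    return len(set(a[:k]) & set(b[:k]))

# Top-5 corpus ids copied from the comparison printed above
bi = [14679, 14725, 39564, 17387, 5628]
msmarco = [14797, 9755, 9761, 14725, 14680]
bge = [20638, 17387, 14679, 9761, 39564]

print(topk_overlap(bi, bge, 5))       # → 3
print(topk_overlap(bi, msmarco, 5))   # → 1
print(topk_overlap(msmarco, bge, 5))  # → 1
```

On this query, the BGE reranker keeps three of the bi-encoder's top five, while the ms-marco MiniLM reranker shares only one top-5 chunk with either of the other two, so the two rerankers disagree with each other about as much as they disagree with the bi-encoder.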