beefyben/DocGenius2

pipeline_tag: sentence-similarity
tags: text-embedding, embeddings, information-retrieval, beir, text-classification, language-model, text-clustering, text-semantic-similarity, text-evaluation, prompt-retrieval, text-reranking, sentence-transformers, feature-extraction, sentence-similarity, transformers, t5, English, Sentence Similarity, natural_questions, ms_marco, fever, hotpot_qa, mteb
language: en
inference: false
license: apache-2.0
model-index
name: INSTRUCTOR
results:
task dataset metrics
type
Classification
type name config split revision
mteb/amazon_counterfactual
MTEB AmazonCounterfactualClassification (en)
en
test
e8379541af4e31359cca9fbcf4b00f2671dba205
type value
accuracy
88.13432835820896
type value
ap
59.298209334395665
type value
f1
83.31769058643586
task dataset metrics
type
Classification
type name config split revision
mteb/amazon_polarity
MTEB AmazonPolarityClassification
default
test
e2d317d38cd51312af73b3d32a06d1a08b442046
type value
accuracy
91.526375
type value
ap
88.16327709705504
type value
f1
91.51095801287843
task dataset metrics
type
Classification
type name config split revision
mteb/amazon_reviews_multi
MTEB AmazonReviewsClassification (en)
en
test
1399c76144fd37290681b995c656ef9b2e06e26d
type value
accuracy
47.856
type value
f1
45.41490917650942
task dataset metrics
type
Retrieval
type name config split revision
arguana
MTEB ArguAna
default
test
None
type value
map_at_1
31.223
type value
map_at_10
47.947
type value
map_at_100
48.742000000000004
type value
map_at_1000
48.745
type value
map_at_3
43.137
type value
map_at_5
45.992
type value
mrr_at_1
32.432
type value
mrr_at_10
48.4
type value
mrr_at_100
49.202
type value
mrr_at_1000
49.205
type value
mrr_at_3
43.551
type value
mrr_at_5
46.467999999999996
type value
ndcg_at_1
31.223
type value
ndcg_at_10
57.045
type value
ndcg_at_100
60.175
type value
ndcg_at_1000
60.233000000000004
type value
ndcg_at_3
47.171
type value
ndcg_at_5
52.322
type value
precision_at_1
31.223
type value
precision_at_10
8.599
type value
precision_at_100
0.991
type value
precision_at_1000
0.1
type value
precision_at_3
19.63
type value
precision_at_5
14.282
type value
recall_at_1
31.223
type value
recall_at_10
85.989
type value
recall_at_100
99.075
type value
recall_at_1000
99.502
type value
recall_at_3
58.89
type value
recall_at_5
71.408
task dataset metrics
type
Clustering
type name config split revision
mteb/arxiv-clustering-p2p
MTEB ArxivClusteringP2P
default
test
a122ad7f3f0291bf49cc6f4d32aa80929df69d5d
type value
v_measure
43.1621946393635
task dataset metrics
type
Clustering
type name config split revision
mteb/arxiv-clustering-s2s
MTEB ArxivClusteringS2S
default
test
f910caf1a6075f7329cdf8c1a6135696f37dbd53
type value
v_measure
32.56417132407894
task dataset metrics
type
Reranking
type name config split revision
mteb/askubuntudupquestions-reranking
MTEB AskUbuntuDupQuestions
default
test
2000358ca161889fa9c082cb41daa8dcfb161a54
type value
map
64.29539304390207
type value
mrr
76.44484017060196
task dataset metrics
type
STS
type name config split revision
mteb/biosses-sts
MTEB BIOSSES
default
test
d3fb88f8f02e40887cd149695127462bbcf29b4a
type value
cos_sim_spearman
84.38746499431112
task dataset metrics
type
Classification
type name config split revision
mteb/banking77
MTEB Banking77Classification
default
test
0fd18e25b25c072e09e0d92ab615fda904d66300
type value
accuracy
78.51298701298701
type value
f1
77.49041754069235
task dataset metrics
type
Clustering
type name config split revision
mteb/biorxiv-clustering-p2p
MTEB BiorxivClusteringP2P
default
test
65b79d1d13f80053f67aca9498d9402c2d9f1f40
type value
v_measure
37.61848554098577
task dataset metrics
type
Clustering
type name config split revision
mteb/biorxiv-clustering-s2s
MTEB BiorxivClusteringS2S
default
test
258694dd0231531bc1fd9de6ceb52a0853c6d908
type value
v_measure
31.32623280148178
task dataset metrics
type
Retrieval
type name config split revision
BeIR/cqadupstack
MTEB CQADupstackAndroidRetrieval
default
test
None
type value
map_at_1
35.803000000000004
type value
map_at_10
48.848
type value
map_at_100
50.5
type value
map_at_1000
50.602999999999994
type value
map_at_3
45.111000000000004
type value
map_at_5
47.202
type value
mrr_at_1
44.635000000000005
type value
mrr_at_10
55.593
type value
mrr_at_100
56.169999999999995
type value
mrr_at_1000
56.19499999999999
type value
mrr_at_3
53.361999999999995
type value
mrr_at_5
54.806999999999995
type value
ndcg_at_1
44.635000000000005
type value
ndcg_at_10
55.899
type value
ndcg_at_100
60.958
type value
ndcg_at_1000
62.302
type value
ndcg_at_3
51.051
type value
ndcg_at_5
53.351000000000006
type value
precision_at_1
44.635000000000005
type value
precision_at_10
10.786999999999999
type value
precision_at_100
1.6580000000000001
type value
precision_at_1000
0.213
type value
precision_at_3
24.893
type value
precision_at_5
17.740000000000002
type value
recall_at_1
35.803000000000004
type value
recall_at_10
68.657
type value
recall_at_100
89.77199999999999
type value
recall_at_1000
97.67
type value
recall_at_3
54.066
type value
recall_at_5
60.788
task dataset metrics
type
Retrieval
type name config split revision
BeIR/cqadupstack
MTEB CQADupstackEnglishRetrieval
default
test
None
type value
map_at_1
33.706
type value
map_at_10
44.896
type value
map_at_100
46.299
type value
map_at_1000
46.44
type value
map_at_3
41.721000000000004
type value
map_at_5
43.486000000000004
type value
mrr_at_1
41.592
type value
mrr_at_10
50.529
type value
mrr_at_100
51.22
type value
mrr_at_1000
51.258
type value
mrr_at_3
48.205999999999996
type value
mrr_at_5
49.528
type value
ndcg_at_1
41.592
type value
ndcg_at_10
50.77199999999999
type value
ndcg_at_100
55.383
type value
ndcg_at_1000
57.288
type value
ndcg_at_3
46.324
type value
ndcg_at_5
48.346000000000004
type value
precision_at_1
41.592
type value
precision_at_10
9.516
type value
precision_at_100
1.541
type value
precision_at_1000
0.2
type value
precision_at_3
22.399
type value
precision_at_5
15.770999999999999
type value
recall_at_1
33.706
type value
recall_at_10
61.353
type value
recall_at_100
80.182
type value
recall_at_1000
91.896
type value
recall_at_3
48.204
type value
recall_at_5
53.89699999999999
task dataset metrics
type
Retrieval
type name config split revision
BeIR/cqadupstack
MTEB CQADupstackGamingRetrieval
default
test
None
type value
map_at_1
44.424
type value
map_at_10
57.169000000000004
type value
map_at_100
58.202
type value
map_at_1000
58.242000000000004
type value
map_at_3
53.825
type value
map_at_5
55.714
type value
mrr_at_1
50.470000000000006
type value
mrr_at_10
60.489000000000004
type value
mrr_at_100
61.096
type value
mrr_at_1000
61.112
type value
mrr_at_3
58.192
type value
mrr_at_5
59.611999999999995
type value
ndcg_at_1
50.470000000000006
type value
ndcg_at_10
63.071999999999996
type value
ndcg_at_100
66.964
type value
ndcg_at_1000
67.659
type value
ndcg_at_3
57.74399999999999
type value
ndcg_at_5
60.367000000000004
type value
precision_at_1
50.470000000000006
type value
precision_at_10
10.019
type value
precision_at_100
1.29
type value
precision_at_1000
0.13899999999999998
type value
precision_at_3
25.558999999999997
type value
precision_at_5
17.467
type value
recall_at_1
44.424
type value
recall_at_10
77.02
type value
recall_at_100
93.738
type value
recall_at_1000
98.451
type value
recall_at_3
62.888
type value
recall_at_5
69.138
task dataset metrics
type
Retrieval
type name config split revision
BeIR/cqadupstack
MTEB CQADupstackGisRetrieval
default
test
None
type value
map_at_1
26.294
type value
map_at_10
34.503
type value
map_at_100
35.641
type value
map_at_1000
35.724000000000004
type value
map_at_3
31.753999999999998
type value
map_at_5
33.190999999999995
type value
mrr_at_1
28.362
type value
mrr_at_10
36.53
type value
mrr_at_100
37.541000000000004
type value
mrr_at_1000
37.602000000000004
type value
mrr_at_3
33.917
type value
mrr_at_5
35.358000000000004
type value
ndcg_at_1
28.362
type value
ndcg_at_10
39.513999999999996
type value
ndcg_at_100
44.815
type value
ndcg_at_1000
46.839
type value
ndcg_at_3
34.02
type value
ndcg_at_5
36.522
type value
precision_at_1
28.362
type value
precision_at_10
6.101999999999999
type value
precision_at_100
0.9129999999999999
type value
precision_at_1000
0.11399999999999999
type value
precision_at_3
14.161999999999999
type value
precision_at_5
9.966
type value
recall_at_1
26.294
type value
recall_at_10
53.098
type value
recall_at_100
76.877
type value
recall_at_1000
91.834
type value
recall_at_3
38.266
type value
recall_at_5
44.287
task dataset metrics
type
Retrieval
type name config split revision
BeIR/cqadupstack
MTEB CQADupstackMathematicaRetrieval
default
test
None
type value
map_at_1
16.407
type value
map_at_10
25.185999999999996
type value
map_at_100
26.533
type value
map_at_1000
26.657999999999998
type value
map_at_3
22.201999999999998
type value
map_at_5
23.923
type value
mrr_at_1
20.522000000000002
type value
mrr_at_10
29.522
type value
mrr_at_100
30.644
type value
mrr_at_1000
30.713
type value
mrr_at_3
26.679000000000002
type value
mrr_at_5
28.483000000000004
type value
ndcg_at_1
20.522000000000002
type value
ndcg_at_10
30.656
type value
ndcg_at_100
36.864999999999995
type value
ndcg_at_1000
39.675
type value
ndcg_at_3
25.319000000000003
type value
ndcg_at_5
27.992
type value
precision_at_1
20.522000000000002
type value
precision_at_10
5.795999999999999
type value
precision_at_100
1.027
type value
precision_at_1000
0.13999999999999999
type value
precision_at_3
12.396
type value
precision_at_5
9.328
type value
recall_at_1
16.407
type value
recall_at_10
43.164
type value
recall_at_100
69.695
type value
recall_at_1000
89.41900000000001
type value
recall_at_3
28.634999999999998
type value
recall_at_5
35.308
task dataset metrics
type
Retrieval
type name config split revision
BeIR/cqadupstack
MTEB CQADupstackPhysicsRetrieval
default
test
None
type value
map_at_1
30.473
type value
map_at_10
41.676
type value
map_at_100
43.120999999999995
type value
map_at_1000
43.230000000000004
type value
map_at_3
38.306000000000004
type value
map_at_5
40.355999999999995
type value
mrr_at_1
37.536
type value
mrr_at_10
47.643
type value
mrr_at_100
48.508
type value
mrr_at_1000
48.551
type value
mrr_at_3
45.348
type value
mrr_at_5
46.744
type value
ndcg_at_1
37.536
type value
ndcg_at_10
47.823
type value
ndcg_at_100
53.395
type value
ndcg_at_1000
55.271
type value
ndcg_at_3
42.768
type value
ndcg_at_5
45.373000000000005
type value
precision_at_1
37.536
type value
precision_at_10
8.681
type value
precision_at_100
1.34
type value
precision_at_1000
0.165
type value
precision_at_3
20.468
type value
precision_at_5
14.495
type value
recall_at_1
30.473
type value
recall_at_10
60.092999999999996
type value
recall_at_100
82.733
type value
recall_at_1000
94.875
type value
recall_at_3
45.734
type value
recall_at_5
52.691
task dataset metrics
type
Retrieval
type name config split revision
BeIR/cqadupstack
MTEB CQADupstackProgrammersRetrieval
default
test
None
type value
map_at_1
29.976000000000003
type value
map_at_10
41.097
type value
map_at_100
42.547000000000004
type value
map_at_1000
42.659000000000006
type value
map_at_3
37.251
type value
map_at_5
39.493
type value
mrr_at_1
37.557
type value
mrr_at_10
46.605000000000004
type value
mrr_at_100
47.487
type value
mrr_at_1000
47.54
type value
mrr_at_3
43.721
type value
mrr_at_5
45.411
type value
ndcg_at_1
37.557
type value
ndcg_at_10
47.449000000000005
type value
ndcg_at_100
53.052
type value
ndcg_at_1000
55.010999999999996
type value
ndcg_at_3
41.439
type value
ndcg_at_5
44.292
type value
precision_at_1
37.557
type value
precision_at_10
8.847
type value
precision_at_100
1.357
type value
precision_at_1000
0.16999999999999998
type value
precision_at_3
20.091
type value
precision_at_5
14.384
type value
recall_at_1
29.976000000000003
type value
recall_at_10
60.99099999999999
type value
recall_at_100
84.245
type value
recall_at_1000
96.97200000000001
type value
recall_at_3
43.794
type value
recall_at_5
51.778999999999996
task dataset metrics
type
Retrieval
type name config split revision
BeIR/cqadupstack
MTEB CQADupstackRetrieval
default
test
None
type value
map_at_1
28.099166666666665
type value
map_at_10
38.1365
type value
map_at_100
39.44491666666667
type value
map_at_1000
39.55858333333334
type value
map_at_3
35.03641666666666
type value
map_at_5
36.79833333333334
type value
mrr_at_1
33.39966666666667
type value
mrr_at_10
42.42583333333333
type value
mrr_at_100
43.28575
type value
mrr_at_1000
43.33741666666667
type value
mrr_at_3
39.94975
type value
mrr_at_5
41.41633333333334
type value
ndcg_at_1
33.39966666666667
type value
ndcg_at_10
43.81741666666667
type value
ndcg_at_100
49.08166666666667
type value
ndcg_at_1000
51.121166666666674
type value
ndcg_at_3
38.73575
type value
ndcg_at_5
41.18158333333333
type value
precision_at_1
33.39966666666667
type value
precision_at_10
7.738916666666667
type value
precision_at_100
1.2265833333333331
type value
precision_at_1000
0.15983333333333336
type value
precision_at_3
17.967416666666665
type value
precision_at_5
12.78675
type value
recall_at_1
28.099166666666665
type value
recall_at_10
56.27049999999999
type value
recall_at_100
78.93291666666667
type value
recall_at_1000
92.81608333333334
type value
recall_at_3
42.09775
type value
recall_at_5
48.42533333333334
task dataset metrics
type
Retrieval
type name config split revision
BeIR/cqadupstack
MTEB CQADupstackStatsRetrieval
default
test
None
type value
map_at_1
23.663
type value
map_at_10
30.377
type value
map_at_100
31.426
type value
map_at_1000
31.519000000000002
type value
map_at_3
28.069
type value
map_at_5
29.256999999999998
type value
mrr_at_1
26.687
type value
mrr_at_10
33.107
type value
mrr_at_100
34.055
type value
mrr_at_1000
34.117999999999995
type value
mrr_at_3
31.058000000000003
type value
mrr_at_5
32.14
type value
ndcg_at_1
26.687
type value
ndcg_at_10
34.615
type value
ndcg_at_100
39.776
type value
ndcg_at_1000
42.05
type value
ndcg_at_3
30.322
type value
ndcg_at_5
32.157000000000004
type value
precision_at_1
26.687
type value
precision_at_10
5.491
type value
precision_at_100
0.877
type value
precision_at_1000
0.11499999999999999
type value
precision_at_3
13.139000000000001
type value
precision_at_5
9.049
type value
recall_at_1
23.663
type value
recall_at_10
45.035
type value
recall_at_100
68.554
type value
recall_at_1000
85.077
type value
recall_at_3
32.982
type value
recall_at_5
37.688
task dataset metrics
type
Retrieval
type name config split revision
BeIR/cqadupstack
MTEB CQADupstackTexRetrieval
default
test
None
type value
map_at_1
17.403
type value
map_at_10
25.197000000000003
type value
map_at_100
26.355
type value
map_at_1000
26.487
type value
map_at_3
22.733
type value
map_at_5
24.114
type value
mrr_at_1
21.37
type value
mrr_at_10
29.091
type value
mrr_at_100
30.018
type value
mrr_at_1000
30.096
type value
mrr_at_3
26.887
type value
mrr_at_5
28.157
type value
ndcg_at_1
21.37
type value
ndcg_at_10
30.026000000000003
type value
ndcg_at_100
35.416
type value
ndcg_at_1000
38.45
type value
ndcg_at_3
25.764
type value
ndcg_at_5
27.742
type value
precision_at_1
21.37
type value
precision_at_10
5.609
type value
precision_at_100
0.9860000000000001
type value
precision_at_1000
0.14300000000000002
type value
precision_at_3
12.423
type value
precision_at_5
9.009
type value
recall_at_1
17.403
type value
recall_at_10
40.573
type value
recall_at_100
64.818
type value
recall_at_1000
86.53699999999999
type value
recall_at_3
28.493000000000002
type value
recall_at_5
33.660000000000004
task dataset metrics
type
Retrieval
type name config split revision
BeIR/cqadupstack
MTEB CQADupstackUnixRetrieval
default
test
None
type value
map_at_1
28.639
type value
map_at_10
38.951
type value
map_at_100
40.238
type value
map_at_1000
40.327
type value
map_at_3
35.842
type value
map_at_5
37.617
type value
mrr_at_1
33.769
type value
mrr_at_10
43.088
type value
mrr_at_100
44.03
type value
mrr_at_1000
44.072
type value
mrr_at_3
40.656
type value
mrr_at_5
42.138999999999996
type value
ndcg_at_1
33.769
type value
ndcg_at_10
44.676
type value
ndcg_at_100
50.416000000000004
type value
ndcg_at_1000
52.227999999999994
type value
ndcg_at_3
39.494
type value
ndcg_at_5
42.013
type value
precision_at_1
33.769
type value
precision_at_10
7.668
type value
precision_at_100
1.18
type value
precision_at_1000
0.145
type value
precision_at_3
18.221
type value
precision_at_5
12.966
type value
recall_at_1
28.639
type value
recall_at_10
57.687999999999995
type value
recall_at_100
82.541
type value
recall_at_1000
94.896
type value
recall_at_3
43.651
type value
recall_at_5
49.925999999999995
task dataset metrics
type
Retrieval
type name config split revision
BeIR/cqadupstack
MTEB CQADupstackWebmastersRetrieval
default
test
None
type value
map_at_1
29.57
type value
map_at_10
40.004
type value
map_at_100
41.75
type value
map_at_1000
41.97
type value
map_at_3
36.788
type value
map_at_5
38.671
type value
mrr_at_1
35.375
type value
mrr_at_10
45.121
type value
mrr_at_100
45.994
type value
mrr_at_1000
46.04
type value
mrr_at_3
42.227
type value
mrr_at_5
43.995
type value
ndcg_at_1
35.375
type value
ndcg_at_10
46.392
type value
ndcg_at_100
52.196
type value
ndcg_at_1000
54.274
type value
ndcg_at_3
41.163
type value
ndcg_at_5
43.813
type value
precision_at_1
35.375
type value
precision_at_10
8.676
type value
precision_at_100
1.678
type value
precision_at_1000
0.253
type value
precision_at_3
19.104
type value
precision_at_5
13.913
type value
recall_at_1
29.57
type value
recall_at_10
58.779
type value
recall_at_100
83.337
type value
recall_at_1000
95.979
type value
recall_at_3
44.005
type value
recall_at_5
50.975
task dataset metrics
type
Retrieval
type name config split revision
BeIR/cqadupstack
MTEB CQADupstackWordpressRetrieval
default
test
None
type value
map_at_1
20.832
type value
map_at_10
29.733999999999998
type value
map_at_100
30.727
type value
map_at_1000
30.843999999999998
type value
map_at_3
26.834999999999997
type value
map_at_5
28.555999999999997
type value
mrr_at_1
22.921
type value
mrr_at_10
31.791999999999998
type value
mrr_at_100
32.666000000000004
type value
mrr_at_1000
32.751999999999995
type value
mrr_at_3
29.144
type value
mrr_at_5
30.622
type value
ndcg_at_1
22.921
type value
ndcg_at_10
34.915
type value
ndcg_at_100
39.744
type value
ndcg_at_1000
42.407000000000004
type value
ndcg_at_3
29.421000000000003
type value
ndcg_at_5
32.211
type value
precision_at_1
22.921
type value
precision_at_10
5.675
type value
precision_at_100
0.872
type value
precision_at_1000
0.121
type value
precision_at_3
12.753999999999998
type value
precision_at_5
9.353
type value
recall_at_1
20.832
type value
recall_at_10
48.795
type value
recall_at_100
70.703
type value
recall_at_1000
90.187
type value
recall_at_3
34.455000000000005
type value
recall_at_5
40.967
task dataset metrics
type
Retrieval
type name config split revision
climate-fever
MTEB ClimateFEVER
default
test
None
type value
map_at_1
10.334
type value
map_at_10
19.009999999999998
type value
map_at_100
21.129
type value
map_at_1000
21.328
type value
map_at_3
15.152
type value
map_at_5
17.084
type value
mrr_at_1
23.453
type value
mrr_at_10
36.099
type value
mrr_at_100
37.069
type value
mrr_at_1000
37.104
type value
mrr_at_3
32.096000000000004
type value
mrr_at_5
34.451
type value
ndcg_at_1
23.453
type value
ndcg_at_10
27.739000000000004
type value
ndcg_at_100
35.836
type value
ndcg_at_1000
39.242
type value
ndcg_at_3
21.263
type value
ndcg_at_5
23.677
type value
precision_at_1
23.453
type value
precision_at_10
9.199
type value
precision_at_100
1.791
type value
precision_at_1000
0.242
type value
precision_at_3
16.2
type value
precision_at_5
13.147
type value
recall_at_1
10.334
type value
recall_at_10
35.177
type value
recall_at_100
63.009
type value
recall_at_1000
81.938
type value
recall_at_3
19.914
type value
recall_at_5
26.077
task dataset metrics
type
Retrieval
type name config split revision
dbpedia-entity
MTEB DBPedia
default
test
None
type value
map_at_1
8.212
type value
map_at_10
17.386
type value
map_at_100
24.234
type value
map_at_1000
25.724999999999998
type value
map_at_3
12.727
type value
map_at_5
14.785
type value
mrr_at_1
59.25
type value
mrr_at_10
68.687
type value
mrr_at_100
69.133
type value
mrr_at_1000
69.14099999999999
type value
mrr_at_3
66.917
type value
mrr_at_5
67.742
type value
ndcg_at_1
48.625
type value
ndcg_at_10
36.675999999999995
type value
ndcg_at_100
41.543
type value
ndcg_at_1000
49.241
type value
ndcg_at_3
41.373
type value
ndcg_at_5
38.707
type value
precision_at_1
59.25
type value
precision_at_10
28.525
type value
precision_at_100
9.027000000000001
type value
precision_at_1000
1.8339999999999999
type value
precision_at_3
44.833
type value
precision_at_5
37.35
type value
recall_at_1
8.212
type value
recall_at_10
23.188
type value
recall_at_100
48.613
type value
recall_at_1000
73.093
type value
recall_at_3
14.419
type value
recall_at_5
17.798
task dataset metrics
type
Classification
type name config split revision
mteb/emotion
MTEB EmotionClassification
default
test
4f58c6b202a23cf9a4da393831edf4f9183cad37
type value
accuracy
52.725
type value
f1
46.50743309855908
task dataset metrics
type
Retrieval
type name config split revision
fever
MTEB FEVER
default
test
None
type value
map_at_1
55.086
type value
map_at_10
66.914
type value
map_at_100
67.321
type value
map_at_1000
67.341
type value
map_at_3
64.75800000000001
type value
map_at_5
66.189
type value
mrr_at_1
59.28600000000001
type value
mrr_at_10
71.005
type value
mrr_at_100
71.304
type value
mrr_at_1000
71.313
type value
mrr_at_3
69.037
type value
mrr_at_5
70.35
type value
ndcg_at_1
59.28600000000001
type value
ndcg_at_10
72.695
type value
ndcg_at_100
74.432
type value
ndcg_at_1000
74.868
type value
ndcg_at_3
68.72200000000001
type value
ndcg_at_5
71.081
type value
precision_at_1
59.28600000000001
type value
precision_at_10
9.499
type value
precision_at_100
1.052
type value
precision_at_1000
0.11100000000000002
type value
precision_at_3
27.503
type value
precision_at_5
17.854999999999997
type value
recall_at_1
55.086
type value
recall_at_10
86.453
type value
recall_at_100
94.028
type value
recall_at_1000
97.052
type value
recall_at_3
75.821
type value
recall_at_5
81.6
task dataset metrics
type
Retrieval
type name config split revision
fiqa
MTEB FiQA2018
default
test
None
type value
map_at_1
22.262999999999998
type value
map_at_10
37.488
type value
map_at_100
39.498
type value
map_at_1000
39.687
type value
map_at_3
32.529
type value
map_at_5
35.455
type value
mrr_at_1
44.907000000000004
type value
mrr_at_10
53.239000000000004
type value
mrr_at_100
54.086
type value
mrr_at_1000
54.122
type value
mrr_at_3
51.235
type value
mrr_at_5
52.415
type value
ndcg_at_1
44.907000000000004
type value
ndcg_at_10
45.446
type value
ndcg_at_100
52.429
type value
ndcg_at_1000
55.169000000000004
type value
ndcg_at_3
41.882000000000005
type value
ndcg_at_5
43.178
type value
precision_at_1
44.907000000000004
type value
precision_at_10
12.931999999999999
type value
precision_at_100
2.025
type value
precision_at_1000
0.248
type value
precision_at_3
28.652
type value
precision_at_5
21.204
type value
recall_at_1
22.262999999999998
type value
recall_at_10
52.447
type value
recall_at_100
78.045
type value
recall_at_1000
94.419
type value
recall_at_3
38.064
type value
recall_at_5
44.769
task dataset metrics
type
Retrieval
type name config split revision
hotpotqa
MTEB HotpotQA
default
test
None
type value
map_at_1
32.519
type value
map_at_10
45.831
type value
map_at_100
46.815
type value
map_at_1000
46.899
type value
map_at_3
42.836
type value
map_at_5
44.65
type value
mrr_at_1
65.037
type value
mrr_at_10
72.16
type value
mrr_at_100
72.51100000000001
type value
mrr_at_1000
72.53
type value
mrr_at_3
70.682
type value
mrr_at_5
71.54599999999999
type value
ndcg_at_1
65.037
type value
ndcg_at_10
55.17999999999999
type value
ndcg_at_100
58.888
type value
ndcg_at_1000
60.648
type value
ndcg_at_3
50.501
type value
ndcg_at_5
52.977
type value
precision_at_1
65.037
type value
precision_at_10
11.530999999999999
type value
precision_at_100
1.4460000000000002
type value
precision_at_1000
0.168
type value
precision_at_3
31.483
type value
precision_at_5
20.845
type value
recall_at_1
32.519
type value
recall_at_10
57.657000000000004
type value
recall_at_100
72.30199999999999
type value
recall_at_1000
84.024
type value
recall_at_3
47.225
type value
recall_at_5
52.113
task dataset metrics
type
Classification
type name config split revision
mteb/imdb
MTEB ImdbClassification
default
test
3d86128a09e091d6018b6d26cad27f2739fc2db7
type value
accuracy
88.3168
type value
ap
83.80165516037135
type value
f1
88.29942471066407
task dataset metrics
type
Retrieval
type name config split revision
msmarco
MTEB MSMARCO
default
dev
None
type value
map_at_1
20.724999999999998
type value
map_at_10
32.736
type value
map_at_100
33.938
type value
map_at_1000
33.991
type value
map_at_3
28.788000000000004
type value
map_at_5
31.016
type value
mrr_at_1
21.361
type value
mrr_at_10
33.323
type value
mrr_at_100
34.471000000000004
type value
mrr_at_1000
34.518
type value
mrr_at_3
29.453000000000003
type value
mrr_at_5
31.629
type value
ndcg_at_1
21.361
type value
ndcg_at_10
39.649
type value
ndcg_at_100
45.481
type value
ndcg_at_1000
46.775
type value
ndcg_at_3
31.594
type value
ndcg_at_5
35.543
type value
precision_at_1
21.361
type value
precision_at_10
6.3740000000000006
type value
precision_at_100
0.931
type value
precision_at_1000
0.104
type value
precision_at_3
13.514999999999999
type value
precision_at_5
10.100000000000001
type value
recall_at_1
20.724999999999998
type value
recall_at_10
61.034
type value
recall_at_100
88.062
type value
recall_at_1000
97.86399999999999
type value
recall_at_3
39.072
type value
recall_at_5
48.53
task dataset metrics
type
Classification
type name config split revision
mteb/mtop_domain
MTEB MTOPDomainClassification (en)
en
test
d80d48c1eb48d3562165c59d59d0034df9fff0bf
type value
accuracy
93.8919288645691
type value
f1
93.57059586398059
task dataset metrics
type
Classification
type name config split revision
mteb/mtop_intent
MTEB MTOPIntentClassification (en)
en
test
ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
type value
accuracy
67.97993616051072
type value
f1
48.244319183606535
task dataset metrics
type
Classification
type name config split revision
mteb/amazon_massive_intent
MTEB MassiveIntentClassification (en)
en
test
31efe3c427b0bae9c22cbb560b8f15491cc6bed7
type value
accuracy
68.90047074646941
type value
f1
66.48999056063725
task dataset metrics
type
Classification
type name config split revision
mteb/amazon_massive_scenario
MTEB MassiveScenarioClassification (en)
en
test
7d571f92784cd94a019292a1f45445077d0ef634
type value
accuracy
73.34566240753195
type value
f1
73.54164154290658
task dataset metrics
type
Clustering
type name config split revision
mteb/medrxiv-clustering-p2p
MTEB MedrxivClusteringP2P
default
test
e7a26af6f3ae46b30dde8737f02c07b1505bcc73
type value
v_measure
34.21866934757011
task dataset metrics
type
Clustering
type name config split revision
mteb/medrxiv-clustering-s2s
MTEB MedrxivClusteringS2S
default
test
35191c8c0dca72d8ff3efcd72aa802307d469663
type value
v_measure
32.000936217235534
task dataset metrics
type
Reranking
type name config split revision
mteb/mind_small
MTEB MindSmallReranking
default
test
3bdac13927fdc888b903db93b2ffdbd90b295a69
type value
map
31.68189362520352
type value
mrr
32.69603637784303
task dataset metrics
type
Retrieval
type name config split revision
nfcorpus
MTEB NFCorpus
default
test
None
type value
map_at_1
6.078
type value
map_at_10
12.671
type value
map_at_100
16.291
type value
map_at_1000
17.855999999999998
type value
map_at_3
9.610000000000001
type value
map_at_5
11.152
type value
mrr_at_1
43.963
type value
mrr_at_10
53.173
type value
mrr_at_100
53.718999999999994
type value
mrr_at_1000
53.756
type value
mrr_at_3
50.980000000000004
type value
mrr_at_5
52.42
type value
ndcg_at_1
42.415000000000006
type value
ndcg_at_10
34.086
type value
ndcg_at_100
32.545
type value
ndcg_at_1000
41.144999999999996
type value
ndcg_at_3
39.434999999999995
type value
ndcg_at_5
37.888
type value
precision_at_1
43.653
type value
precision_at_10
25.014999999999997
type value
precision_at_100
8.594
type value
precision_at_1000
2.169
type value
precision_at_3
37.049
type value
precision_at_5
33.065
type value
recall_at_1
6.078
type value
recall_at_10
16.17
type value
recall_at_100
34.512
type value
recall_at_1000
65.447
type value
recall_at_3
10.706
type value
recall_at_5
13.158
task dataset metrics
type
Retrieval
type name config split revision
nq
MTEB NQ
default
test
None
type value
map_at_1
27.378000000000004
type value
map_at_10
42.178
type value
map_at_100
43.32
type value
map_at_1000
43.358000000000004
type value
map_at_3
37.474000000000004
type value
map_at_5
40.333000000000006
type value
mrr_at_1
30.823
type value
mrr_at_10
44.626
type value
mrr_at_100
45.494
type value
mrr_at_1000
45.519
type value
mrr_at_3
40.585
type value
mrr_at_5
43.146
type value
ndcg_at_1
30.794
type value
ndcg_at_10
50.099000000000004
type value
ndcg_at_100
54.900999999999996
type value
ndcg_at_1000
55.69499999999999
type value
ndcg_at_3
41.238
type value
ndcg_at_5
46.081
type value
precision_at_1
30.794
type value
precision_at_10
8.549
type value
precision_at_100
1.124
type value
precision_at_1000
0.12
type value
precision_at_3
18.926000000000002
type value
precision_at_5
14.16
type value
recall_at_1
27.378000000000004
type value
recall_at_10
71.842
type value
recall_at_100
92.565
type value
recall_at_1000
98.402
type value
recall_at_3
49.053999999999995
type value
recall_at_5
60.207
task dataset metrics
type
Retrieval
type name config split revision
quora
MTEB QuoraRetrieval
default
test
None
type value
map_at_1
70.557
type value
map_at_10
84.729
type value
map_at_100
85.369
type value
map_at_1000
85.382
type value
map_at_3
81.72
type value
map_at_5
83.613
type value
mrr_at_1
81.3
type value
mrr_at_10
87.488
type value
mrr_at_100
87.588
type value
mrr_at_1000
87.589
type value
mrr_at_3
86.53
type value
mrr_at_5
87.18599999999999
type value
ndcg_at_1
81.28999999999999
type value
ndcg_at_10
88.442
type value
ndcg_at_100
89.637
type value
ndcg_at_1000
89.70700000000001
type value
ndcg_at_3
85.55199999999999
type value
ndcg_at_5
87.154
type value
precision_at_1
81.28999999999999
type value
precision_at_10
13.489999999999998
type value
precision_at_100
1.54
type value
precision_at_1000
0.157
type value
precision_at_3
37.553
type value
precision_at_5
24.708
type value
recall_at_1
70.557
type value
recall_at_10
95.645
type value
recall_at_100
99.693
type value
recall_at_1000
99.995
type value
recall_at_3
87.359
type value
recall_at_5
91.89699999999999
task dataset metrics
type
Clustering
type name config split revision
mteb/reddit-clustering
MTEB RedditClustering
default
test
24640382cdbf8abc73003fb0fa6d111a705499eb
type value
v_measure
63.65060114776209
task dataset metrics
type
Clustering
type name config split revision
mteb/reddit-clustering-p2p
MTEB RedditClusteringP2P
default
test
282350215ef01743dc01b456c7f5241fa8937f16
type value
v_measure
64.63271250680617
task dataset metrics
type
Retrieval
type name config split revision
scidocs
MTEB SCIDOCS
default
test
None
type value
map_at_1
4.263
type value
map_at_10
10.801
type value
map_at_100
12.888
type value
map_at_1000
13.224
type value
map_at_3
7.362
type value
map_at_5
9.149000000000001
type value
mrr_at_1
21
type value
mrr_at_10
31.416
type value
mrr_at_100
32.513
type value
mrr_at_1000
32.58
type value
mrr_at_3
28.116999999999997
type value
mrr_at_5
29.976999999999997
type value
ndcg_at_1
21
type value
ndcg_at_10
18.551000000000002
type value
ndcg_at_100
26.657999999999998
type value
ndcg_at_1000
32.485
type value
ndcg_at_3
16.834
type value
ndcg_at_5
15.204999999999998
type value
precision_at_1
21
type value
precision_at_10
9.84
type value
precision_at_100
2.16
type value
precision_at_1000
0.35500000000000004
type value
precision_at_3
15.667
type value
precision_at_5
13.62
type value
recall_at_1
4.263
type value
recall_at_10
19.922
type value
recall_at_100
43.808
type value
recall_at_1000
72.14500000000001
type value
recall_at_3
9.493
type value
recall_at_5
13.767999999999999
task dataset metrics
type
STS
type name config split revision
mteb/sickr-sts
MTEB SICK-R
default
test
a6ea5a8cab320b040a23452cc28066d9beae2cee
type value
cos_sim_spearman
81.27446313317233
task dataset metrics
type
STS
type name config split revision
mteb/sts12-sts
MTEB STS12
default
test
a0d554a64d88156834ff5ae9920b964011b16384
type value
cos_sim_spearman
76.27963301217527
task dataset metrics
type
STS
type name config split revision
mteb/sts13-sts
MTEB STS13
default
test
7e90230a92c190f1bf69ae9002b8cea547a64cca
type value
cos_sim_spearman
88.18495048450949
task dataset metrics
type
STS
type name config split revision
mteb/sts14-sts
MTEB STS14
default
test
6031580fec1f6af667f0bd2da0a551cf4f0b2375
type value
cos_sim_spearman
81.91982338692046
task dataset metrics
type
STS
type name config split revision
mteb/sts15-sts
MTEB STS15
default
test
ae752c7c21bf194d8b67fd573edf7ae58183cbe3
type value
cos_sim_spearman
89.00896818385291
task dataset metrics
type
STS
type name config split revision
mteb/sts16-sts
MTEB STS16
default
test
4d8694f8f0e0100860b497b999b3dbed754a0513
type value
cos_sim_spearman
85.48814644586132
task dataset metrics
type
STS
type name config split revision
mteb/sts17-crosslingual-sts
MTEB STS17 (en-en)
en-en
test
af5e6fb845001ecf41f4c1e033ce921939a2a68d
type value
cos_sim_spearman
90.30116926966582
task dataset metrics
type
STS
type name config split revision
mteb/sts22-crosslingual-sts
MTEB STS22 (en)
en
test
6d1ba47164174a496b7fa5d3569dae26a6813b80
type value
cos_sim_spearman
67.74132963032342
task dataset metrics
type
STS
type name config split revision
mteb/stsbenchmark-sts
MTEB STSBenchmark
default
test
b0fddb56ed78048fa8b90373c8a3cfc37b684831
type value
cos_sim_spearman
86.87741355780479
task dataset metrics
type
Reranking
type name config split revision
mteb/scidocs-reranking
MTEB SciDocsRR
default
test
d3c5e1fc0b855ab6097bf1cda04dd73947d7caab
type value
map
82.0019012295875
type value
mrr
94.70267024188593
task dataset metrics
type
Retrieval
type name config split revision
scifact
MTEB SciFact
default
test
None
type value
map_at_1
50.05
type value
map_at_10
59.36
type value
map_at_100
59.967999999999996
type value
map_at_1000
60.023
type value
map_at_3
56.515
type value
map_at_5
58.272999999999996
type value
mrr_at_1
53
type value
mrr_at_10
61.102000000000004
type value
mrr_at_100
61.476
type value
mrr_at_1000
61.523
type value
mrr_at_3
58.778
type value
mrr_at_5
60.128
type value
ndcg_at_1
53
type value
ndcg_at_10
64.43100000000001
type value
ndcg_at_100
66.73599999999999
type value
ndcg_at_1000
68.027
type value
ndcg_at_3
59.279
type value
ndcg_at_5
61.888
type value
precision_at_1
53
type value
precision_at_10
8.767
type value
precision_at_100
1.01
type value
precision_at_1000
0.11100000000000002
type value
precision_at_3
23.444000000000003
type value
precision_at_5
15.667
type value
recall_at_1
50.05
type value
recall_at_10
78.511
type value
recall_at_100
88.5
type value
recall_at_1000
98.333
type value
recall_at_3
64.117
type value
recall_at_5
70.867
task dataset metrics
type
PairClassification
type name config split revision
mteb/sprintduplicatequestions-pairclassification
MTEB SprintDuplicateQuestions
default
test
d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46
type value
cos_sim_accuracy
99.72178217821782
type value
cos_sim_ap
93.0728601593541
type value
cos_sim_f1
85.6727976766699
type value
cos_sim_precision
83.02063789868667
type value
cos_sim_recall
88.5
type value
dot_accuracy
99.72178217821782
type value
dot_ap
93.07287396168348
type value
dot_f1
85.6727976766699
type value
dot_precision
83.02063789868667
type value
dot_recall
88.5
type value
euclidean_accuracy
99.72178217821782
type value
euclidean_ap
93.07285657982895
type value
euclidean_f1
85.6727976766699
type value
euclidean_precision
83.02063789868667
type value
euclidean_recall
88.5
type value
manhattan_accuracy
99.72475247524753
type value
manhattan_ap
93.02792973059809
type value
manhattan_f1
85.7727737973388
type value
manhattan_precision
87.84067085953879
type value
manhattan_recall
83.8
type value
max_accuracy
99.72475247524753
type value
max_ap
93.07287396168348
type value
max_f1
85.7727737973388
task dataset metrics
type
Clustering
type name config split revision
mteb/stackexchange-clustering
MTEB StackExchangeClustering
default
test
6cbc1f7b2bc0622f2e39d2c77fa502909748c259
type value
v_measure
68.77583615550819
task dataset metrics
type
Clustering
type name config split revision
mteb/stackexchange-clustering-p2p
MTEB StackExchangeClusteringP2P
default
test
815ca46b2622cec33ccafc3735d572c266efdb44
type value
v_measure
36.151636938606956
task dataset metrics
type
Reranking
type name config split revision
mteb/stackoverflowdupquestions-reranking
MTEB StackOverflowDupQuestions
default
test
e185fbe320c72810689fc5848eb6114e1ef5ec69
type value
map
52.16607939471187
type value
mrr
52.95172046091163
task dataset metrics
type
Summarization
type name config split revision
mteb/summeval
MTEB SummEval
default
test
cda12ad7615edc362dbf25a00fdd61d3b1eaf93c
type value
cos_sim_pearson
31.314646669495666
type value
cos_sim_spearman
31.83562491439455
type value
dot_pearson
31.314590842874157
type value
dot_spearman
31.83363065810437
task dataset metrics
type
Retrieval
type name config split revision
trec-covid
MTEB TRECCOVID
default
test
None
type value
map_at_1
0.198
type value
map_at_10
1.3010000000000002
type value
map_at_100
7.2139999999999995
type value
map_at_1000
20.179
type value
map_at_3
0.528
type value
map_at_5
0.8019999999999999
type value
mrr_at_1
72
type value
mrr_at_10
83.39999999999999
type value
mrr_at_100
83.39999999999999
type value
mrr_at_1000
83.39999999999999
type value
mrr_at_3
81.667
type value
mrr_at_5
83.06700000000001
type value
ndcg_at_1
66
type value
ndcg_at_10
58.059000000000005
type value
ndcg_at_100
44.316
type value
ndcg_at_1000
43.147000000000006
type value
ndcg_at_3
63.815999999999995
type value
ndcg_at_5
63.005
type value
precision_at_1
72
type value
precision_at_10
61.4
type value
precision_at_100
45.62
type value
precision_at_1000
19.866
type value
precision_at_3
70
type value
precision_at_5
68.8
type value
recall_at_1
0.198
type value
recall_at_10
1.517
type value
recall_at_100
10.587
type value
recall_at_1000
41.233
type value
recall_at_3
0.573
type value
recall_at_5
0.907
task dataset metrics
type
Retrieval
type name config split revision
webis-touche2020
MTEB Touche2020
default
test
None
type value
map_at_1
1.894
type value
map_at_10
8.488999999999999
type value
map_at_100
14.445
type value
map_at_1000
16.078
type value
map_at_3
4.589
type value
map_at_5
6.019
type value
mrr_at_1
22.448999999999998
type value
mrr_at_10
39.82
type value
mrr_at_100
40.752
type value
mrr_at_1000
40.771
type value
mrr_at_3
34.354
type value
mrr_at_5
37.721
type value
ndcg_at_1
19.387999999999998
type value
ndcg_at_10
21.563
type value
ndcg_at_100
33.857
type value
ndcg_at_1000
46.199
type value
ndcg_at_3
22.296
type value
ndcg_at_5
21.770999999999997
type value
precision_at_1
22.448999999999998
type value
precision_at_10
19.796
type value
precision_at_100
7.142999999999999
type value
precision_at_1000
1.541
type value
precision_at_3
24.490000000000002
type value
precision_at_5
22.448999999999998
type value
recall_at_1
1.894
type value
recall_at_10
14.931
type value
recall_at_100
45.524
type value
recall_at_1000
83.243
type value
recall_at_3
5.712
type value
recall_at_5
8.386000000000001
task dataset metrics
type
Classification
type name config split revision
mteb/toxic_conversations_50k
MTEB ToxicConversationsClassification
default
test
d7c0de2777da35d6aae2200a62c6e0e5af397c4c
type value
accuracy
71.049
type value
ap
13.85116971310922
type value
f1
54.37504302487686
task dataset metrics
type
Classification
type name config split revision
mteb/tweet_sentiment_extraction
MTEB TweetSentimentExtractionClassification
default
test
d604517c81ca91fe16a244d1248fc021f9ecee7a
type value
accuracy
64.1312959818902
type value
f1
64.11413877009383
task dataset metrics
type
Clustering
type name config split revision
mteb/twentynewsgroups-clustering
MTEB TwentyNewsgroupsClustering
default
test
6125ec4e24fa026cec8a478383ee943acfbd5449
type value
v_measure
54.13103431861502
task dataset metrics
type
PairClassification
type name config split revision
mteb/twittersemeval2015-pairclassification
MTEB TwitterSemEval2015
default
test
70970daeab8776df92f5ea462b6173c0b46fd2d1
type value
cos_sim_accuracy
87.327889372355
type value
cos_sim_ap
77.42059895975699
type value
cos_sim_f1
71.02706903250873
type value
cos_sim_precision
69.75324344950394
type value
cos_sim_recall
72.34828496042216
type value
dot_accuracy
87.327889372355
type value
dot_ap
77.4209479346677
type value
dot_f1
71.02706903250873
type value
dot_precision
69.75324344950394
type value
dot_recall
72.34828496042216
type value
euclidean_accuracy
87.327889372355
type value
euclidean_ap
77.42096495861037
type value
euclidean_f1
71.02706903250873
type value
euclidean_precision
69.75324344950394
type value
euclidean_recall
72.34828496042216
type value
manhattan_accuracy
87.31000774870358
type value
manhattan_ap
77.38930750711619
type value
manhattan_f1
71.07935314027831
type value
manhattan_precision
67.70957726295677
type value
manhattan_recall
74.80211081794195
type value
max_accuracy
87.327889372355
type value
max_ap
77.42096495861037
type value
max_f1
71.07935314027831
task dataset metrics
type
PairClassification
type name config split revision
mteb/twitterurlcorpus-pairclassification
MTEB TwitterURLCorpus
default
test
8b6510b0b1fa4e4c4f879467980e9be563ec1cdf
type value
cos_sim_accuracy
89.58939729110878
type value
cos_sim_ap
87.17594155025475
type value
cos_sim_f1
79.21146953405018
type value
cos_sim_precision
76.8918527109307
type value
cos_sim_recall
81.67539267015707
type value
dot_accuracy
89.58939729110878
type value
dot_ap
87.17593963273593
type value
dot_f1
79.21146953405018
type value
dot_precision
76.8918527109307
type value
dot_recall
81.67539267015707
type value
euclidean_accuracy
89.58939729110878
type value
euclidean_ap
87.17592466925834
type value
euclidean_f1
79.21146953405018
type value
euclidean_precision
76.8918527109307
type value
euclidean_recall
81.67539267015707
type value
manhattan_accuracy
89.62626615438352
type value
manhattan_ap
87.16589873161546
type value
manhattan_f1
79.25143598295348
type value
manhattan_precision
76.39494177323712
type value
manhattan_recall
82.32984293193716
type value
max_accuracy
89.62626615438352
type value
max_ap
87.17594155025475
type value
max_f1
79.25143598295348

hkunlp/instructor-large

We introduce Instructor👨‍🏫, an instruction-finetuned text embedding model that can generate text embeddings tailored to any task (e.g., classification, retrieval, clustering, text evaluation) and domain (e.g., science, finance) simply by providing a task instruction, without any further finetuning. Instructor👨‍🏫 achieves state-of-the-art results on 70 diverse embedding tasks (see the MTEB leaderboard)! The model is easy to use with our customized sentence-transformer library. For more details, check out our paper and project page!

Quick start


Installation

pip install InstructorEmbedding

Compute your customized embeddings

Then you can use the model like this to calculate domain-specific and task-aware embeddings:

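from InstructorEmbedding import INSTRUCTOR

# Load the pretrained model (the weights are downloaded on first use).
model = INSTRUCTOR('hkunlp/instructor-large')
sentence = "3D ActionSLAM: wearable person tracking in multi-floor environments"
instruction = "Represent the Science title:"
# Each input is an [instruction, text] pair; encode() takes a list of such pairs.
embeddings = model.encode([[instruction, sentence]])
print(embeddings)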
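
Because `encode` is given a list of `[instruction, text]` pairs, several texts can share one instruction. The snippet below is a minimal sketch of building that list; `pair_with_instruction` is a hypothetical helper (not part of the InstructorEmbedding package), and the second title is a made-up placeholder.

```python
from typing import List, Sequence


def pair_with_instruction(instruction: str, texts: Sequence[str]) -> List[List[str]]:
    """Build the [[instruction, text], ...] list in the shape passed to encode() above.

    Hypothetical helper for illustration; not part of InstructorEmbedding.
    """
    return [[instruction, text] for text in texts]


titles = [
    "3D ActionSLAM: wearable person tracking in multi-floor environments",
    "A placeholder second title",  # made-up text for illustration
]
pairs = pair_with_instruction("Represent the Science title:", titles)
print(pairs)
# With the model loaded as in the Quick start: embeddings = model.encode(pairs)
```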

Use cases


Calculate embeddings for your customized texts

If you want to calculate customized embeddings for specific sentences, you may follow the unified template to write instructions:

                          Represent the domain text_type for task_objective:

  • domain is optional, and it specifies the domain of the text, e.g., science, finance, medicine, etc.
  • text_type is required, and it specifies the encoding unit, e.g., sentence, document, paragraph, etc.
  • task_objective is optional, and it specifies the objective of embedding, e.g., retrieve a document, classify the sentence, etc.
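
The template can also be assembled programmatically. The following is a minimal sketch under stated assumptions: `build_instruction` is a hypothetical helper (not part of the InstructorEmbedding package), and the output ends with a colon and no trailing space, as in the Quick start example (the later examples add a trailing space).

```python
from typing import Optional


def build_instruction(text_type: str,
                      domain: Optional[str] = None,
                      task_objective: Optional[str] = None) -> str:
    """Assemble 'Represent the [domain] text_type [for task_objective]:'.

    Hypothetical helper for illustration; not part of InstructorEmbedding.
    """
    if not text_type:
        raise ValueError("text_type is required")
    parts = ["Represent the"]
    if domain:  # optional: e.g. "Science", "Financial"
        parts.append(domain)
    parts.append(text_type)  # required: e.g. "sentence", "document"
    if task_objective:  # optional: e.g. "retrieval", "clustering"
        parts.append(f"for {task_objective}")
    return " ".join(parts) + ":"


print(build_instruction("title", domain="Science"))
print(build_instruction("document", domain="Wikipedia", task_objective="retrieval"))
```

Both calls reproduce instruction strings that appear elsewhere in this README, which is a quick way to check that a custom instruction follows the template.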

Calculate Sentence similarities

You can further use the model to compute similarities between two groups of sentences, with customized embeddings.

from sklearn.metrics.pairwise import cosine_similarity

# Assumes `model` is the INSTRUCTOR instance loaded in the Quick start.
sentences_a = [['Represent the Science sentence: ','Parton energy loss in QCD matter'],
               ['Represent the Financial statement: ','The Federal Reserve on Wednesday raised its benchmark interest rate.']]
sentences_b = [['Represent the Science sentence: ','The Chiral Phase Transition in Dissipative Dynamics'],
               ['Represent the Financial statement: ','The funds rose less than 0.5 per cent on Friday']]
embeddings_a = model.encode(sentences_a)
embeddings_b = model.encode(sentences_b)
# similarities[i][j] is the cosine similarity between sentences_a[i] and sentences_b[j].
similarities = cosine_similarity(embeddings_a, embeddings_b)
print(similarities)
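
To make the similarity matrix concrete, here is a small standard-library sketch of the quantity `cosine_similarity` computes: the dot product of two vectors divided by the product of their lengths. The function names and the toy 3-dimensional vectors are illustrative assumptions, not real model embeddings, and the sketch assumes no vector is all zeros.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def cosine_matrix(a: Sequence[Sequence[float]],
                  b: Sequence[Sequence[float]]) -> List[List[float]]:
    """Entry [i][j] compares row i of `a` with row j of `b`."""
    return [[cosine(u, v) for v in b] for u in a]


# Toy vectors standing in for embeddings (made-up values).
toy_a = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
toy_b = [[3.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
toy_similarities = cosine_matrix(toy_a, toy_b)
print(toy_similarities)
```

Vectors pointing in the same direction score 1 regardless of length, and orthogonal vectors score 0.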

Information Retrieval

You can also use customized embeddings for information retrieval.

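import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Assumes `model` is the INSTRUCTOR instance loaded in the Quick start.
# The query and the documents each get their own instruction.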
query  = [['Represent the Wikipedia question for retrieving supporting documents: ','where is the food stored in a yam plant']]
corpus = [['Represent the Wikipedia document for retrieval: ','Capitalism has been dominant in the Western world since the end of feudalism, but most feel[who?] that the term "mixed economies" more precisely describes most contemporary economies, due to their containing both private-owned and state-owned enterprises. In capitalism, prices determine the demand-supply scale. For example, higher demand for certain goods and services lead to higher prices and lower demand for certain goods lead to lower prices.'],
          ['Represent the Wikipedia document for retrieval: ',"The disparate impact theory is especially controversial under the Fair Housing Act because the Act regulates many activities relating to housing, insurance, and mortgage loans—and some scholars have argued that the theory's use under the Fair Housing Act, combined with extensions of the Community Reinvestment Act, contributed to rise of sub-prime lending and the crash of the U.S. housing market and ensuing global economic recession"],
          ['Represent the Wikipedia document for retrieval: ','Disparate impact in United States labor law refers to practices in employment, housing, and other areas that adversely affect one group of people of a protected characteristic more than another, even though rules applied by employers or landlords are formally neutral. Although the protected classes vary by statute, most federal civil rights laws protect based on race, color, religion, national origin, and sex as protected traits, and some laws include disability status and other traits as well.']]
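query_embeddings = model.encode(query)
corpus_embeddings = model.encode(corpus)
# similarities has shape (1, len(corpus)): one row of scores for the single query.
similarities = cosine_similarity(query_embeddings, corpus_embeddings)
# With a single query, argmax over the matrix is the index of the best-matching document.
retrieved_doc_id = np.argmax(similarities)
print(retrieved_doc_id)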
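
The example above returns only the single best document. When a ranked list is more useful, the scores can be sorted instead. This is a minimal sketch: `top_k` is a hypothetical helper and the scores are made-up values, not output from the model.

```python
from typing import List, Sequence, Tuple


def top_k(scores: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Return (doc_id, score) pairs for the k highest scores, best first."""
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy similarity scores for one query against three documents (made-up values).
toy_scores = [0.12, 0.47, 0.31]
print(top_k(toy_scores, k=2))
```

With the real similarity matrix, the row for the first query would be passed in, e.g. `top_k(similarities[0], k=2)`.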

Clustering

Use customized embeddings to cluster texts into groups.

import sklearn.cluster
sentences = [['Represent the Medicine sentence for clustering: ','Dynamical Scalar Degree of Freedom in Horava-Lifshitz Gravity'],
             ['Represent the Medicine sentence for clustering: ','Comparison of Atmospheric Neutrino Flux Calculations at Low Energies'],
             ['Represent the Medicine sentence for clustering: ','Fermion Bags in the Massive Gross-Neveu Model'],
             ['Represent the Medicine sentence for clustering: ',"QCD corrections to Associated t-tbar-H production at the Tevatron"],
             ['Represent the Medicine sentence for clustering: ','A New Analysis of the R Measurements: Resonance Parameters of the Higher Vector States of Charmonium']]
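# Assumes `model` is the INSTRUCTOR instance loaded in the Quick start.
embeddings = model.encode(sentences)
# Fit mini-batch k-means with two clusters.
clustering_model = sklearn.cluster.MiniBatchKMeans(n_clusters=2)
clustering_model.fit(embeddings)
# labels_ holds one cluster index per input sentence, in input order.
cluster_assignment = clustering_model.labels_
print(cluster_assignment)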
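
The printed labels are easier to read once they are mapped back to the texts. A minimal sketch follows; `group_by_cluster` is a hypothetical helper, and the texts and labels are placeholders rather than real clustering output.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def group_by_cluster(texts: Sequence[str], labels: Sequence[int]) -> Dict[int, List[str]]:
    """Collect texts under their cluster label, preserving input order."""
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    groups: Dict[int, List[str]] = defaultdict(list)
    for text, label in zip(texts, labels):
        groups[int(label)].append(text)  # int() also accepts NumPy integer labels
    return dict(groups)


# Placeholder texts and labels for illustration.
toy_texts = ["title A", "title B", "title C"]
toy_labels = [0, 1, 0]
print(group_by_cluster(toy_texts, toy_labels))
```

With the variables from the example above, this could be called as `group_by_cluster([pair[1] for pair in sentences], cluster_assignment)`.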
