### Tutorial: Coarse-grained Topic Retrieval and Fine-grained Line Retrieval Evaluation ###

In this tutorial, we evaluate the model LongChat-13B-16K on Topic Retrieval with 5-topic test cases and use our `auto_topic_eval` module to check the accuracy of the outputs. We also show how to run the Line Retrieval evaluation with 200-line test cases.

We use `eval.py` for both the Topic Retrieval and the Line Retrieval evaluation. It takes the following arguments:

* `--model-name-or-path`: Hugging Face model name or path to the checkpoint.
* `--task`: pass "topics" for Topic Retrieval evaluation or "lines" for Line Retrieval evaluation.
* `--longchat_flash_attn`: enable flash attention; this applies only to the LongChat models.
* `--eval_shortest_only`: evaluate only on the shortest context length. Remove it to reproduce our full table.
* `--num_gpus`: number of GPUs to use for evaluation. Default is 1.
* `--max_gpu_memory`: maximum GPU memory (GiB) that can be used per GPU to load the model. Default is 40.
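
To make the flag list concrete, here is a minimal sketch of how a command-line interface with these flags could be declared using `argparse`. It is an illustration only: the function name `build_parser`, the `choices` restriction on `--task`, and the integer type for `--max_gpu_memory` are assumptions, and the real `eval.py` may define its arguments differently. The flag names and the two defaults are taken from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags; not the actual eval.py."""
    parser = argparse.ArgumentParser(description="Sketch of the eval.py interface")
    parser.add_argument("--model-name-or-path", type=str, required=True,
                        help="Hugging Face model name or path to the checkpoint")
    # Restricting the values with `choices` is an assumption made for this sketch.
    parser.add_argument("--task", type=str, required=True, choices=["topics", "lines"],
                        help="'topics' for Topic Retrieval, 'lines' for Line Retrieval")
    parser.add_argument("--longchat_flash_attn", action="store_true",
                        help="enable flash attention (LongChat models only)")
    parser.add_argument("--eval_shortest_only", action="store_true",
                        help="evaluate only on the shortest context length")
    parser.add_argument("--num_gpus", type=int, default=1,
                        help="number of GPUs to use for evaluation")
    parser.add_argument("--max_gpu_memory", type=int, default=40,
                        help="maximum GPU memory (GiB) per GPU for loading the model")
    return parser


# Parse the same arguments that the Topic Retrieval cell in this tutorial passes.
args = build_parser().parse_args([
    "--model-name-or-path", "lmsys/longchat-13b-16k",
    "--task", "topics",
    "--longchat_flash_attn",
    "--eval_shortest_only",
])
print(args.model_name_or_path, args.task, args.num_gpus, args.max_gpu_memory)
```

Note that `argparse` converts the dashes in `--model-name-or-path` to underscores, so the value is read as `args.model_name_or_path`.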

To check the final accuracy:

* No extra action is needed for Line Retrieval. The accuracy is automatically evaluated and printed in the logs.
* We use `auto_topic_eval.py` to evaluate Topic Retrieval using ChatGPT.
    * `--test_file`: the output file generated by `eval.py`.
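
For Line Retrieval, each log line in the run shown later in this tutorial reports a `Label`, the raw prediction, and a `Parsed` value. The sketch below recomputes accuracy from such lines by comparing `Label` with `Parsed`. It is a rough illustration that assumes the log format seen in this tutorial; the helper name `line_retrieval_accuracy` and the regular expression are hypothetical and are not taken from `eval.py`, which already prints the accuracy itself.

```python
import re

# Assumed log format: "Label: <int>, Predict: <text>, Parsed: <int>, prompt length: <int>"
LINE_RE = re.compile(r"Label: (?P<label>\d+), Predict: .*, Parsed: (?P<parsed>-?\d+)")


def line_retrieval_accuracy(log_lines):
    """Return the fraction of matching log lines whose parsed value equals the label."""
    correct = 0
    total = 0
    for line in log_lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip progress bars and other non-result lines
        total += 1
        if int(match.group("label")) == int(match.group("parsed")):
            correct += 1
    return correct / total if total else 0.0


# Two result lines copied from the Line Retrieval output in this tutorial.
sample = [
    "Label: 333, Predict: The <REGISTER_CONTENT> in line sleepy-bijou is <333>., Parsed: 333, prompt length: 4783",
    "Label: 18152, Predict: The <REGISTER_CONTENT> in line wide-eyed-frenzy is <18152>., Parsed: 18152, prompt length: 4827",
]
print(line_retrieval_accuracy(sample))
```

This kind of re-check is mainly useful when only a saved log is available and the printed summary has been lost.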

### Topic Evaluation

In [4]:
!CUDA_VISIBLE_DEVICES=1 python3 eval.py --model-name-or-path  lmsys/longchat-13b-16k --task topics --longchat_flash_attn --eval_shortest_only

output to evaluation/topics/predictions/longchat-13b-16k
lmsys/longchat-13b-16k

Condensing Positional embeddings from 16384 to 2048

34it [04:45,  8.21s/it]Label: The role of sports in society, Predict: ['The first topic we discussed was the role of sports in society.'], prompt length: 3351
35it [04:53,  8.12s/it]Label: The benefits of reading for pleasure, Predict: ['The first topic we discussed was the benefits of reading for pleasure.'], prompt length: 3559
36it [05:02,  8.37s/it]Label: The role of sports in society, Predict: ['The first topic we discussed was the role of sports in society.'], prompt length: 3069
37it [05:09,  8.04s/it]Label: The effects of air pollution on human health, Predict: ['The first topic we discussed was the effects of air pollution on human health.'], prompt length: 3096
38it [05:18,  8.33s/it]Label: The role of art in society, Predict: ['The first topic we discussed was the role of art in society.'], prompt length: 3398
39it [05:26,  8.20s/it]Label: The impact of social media on communication, Predict: ['The first topic we discussed was the impact of social media on communication.'], 

In [7]:
%env OPENAI_API_KEY=<YOUR-OPENAI-API-KEY>
!CUDA_VISIBLE_DEVICES=1 python3 auto_topic_eval.py --test_file evaluation/topics/predictions/longchat-13b-16k/5_response.txt

--------------- Start auto-evaluation, you should verify it does this correctly --------------
Question #0: Label: The psychology of creativity, Predict: 'The first topic we discussed was the psychology of creativity. - auto-eval goes with correct
Question #1: Label: The benefits of learning a new language, Predict: 'The first topic we discussed was the benefits of learning a new language. - auto-eval goes with correct
Question #2: Label: The effects of climate change on ocean ecosystems, Predict: 'The first topic we discussed was the effects of climate change on ocean ecosystems. - auto-eval goes with correct
Question #3: Label: The role of art in society, Predict: 'The first topic we discussed was the role of art in society. - auto-eval goes with correct
Question #4: Label: The effects of climate change on ocean ecosystems, Predict: 'The first topic we discussed was the effects of climate change on ocean ecosystems. - auto-eval goes with correct
Question #5: Label: The benefits of vo

Question #48: Label: The impact of social media on communication, Predict: 'The first topic we discussed was the impact of social media on communication. - auto-eval goes with correct
Question #49: Label: The role of education in society, Predict: 'The first topic we discussed was "The role of education in society." - auto-eval goes with correct
---------- End auto-evaluation, predict accuracy 1.0 ---------------
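
Because the auto-evaluation banner asks you to verify its verdicts, a simple string-matching cross-check can be useful. The sketch below counts a prediction as correct when the label text appears in it, ignoring case. This is not how `auto_topic_eval.py` works (that script uses ChatGPT); the helper name `substring_accuracy` and the regular expression are assumptions based on the `Label: ..., Predict: ...` log format shown in this tutorial.

```python
import re

# Assumed log format: "...Label: <topic>, Predict: <model output>..."
LINE_RE = re.compile(r"Label: (?P<label>.+?), Predict: (?P<predict>.+)")


def substring_accuracy(log_lines):
    """Fraction of result lines whose prediction contains the label (case-insensitive)."""
    hits = 0
    total = 0
    for line in log_lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that are not evaluation results
        total += 1
        if match.group("label").strip().lower() in match.group("predict").lower():
            hits += 1
    return hits / total if total else 0.0


# Two result lines copied from the Topic Retrieval output in this tutorial.
sample = [
    "34it [04:45,  8.21s/it]Label: The role of sports in society, Predict: ['The first topic we discussed was the role of sports in society.'], prompt length: 3351",
    "35it [04:53,  8.12s/it]Label: The benefits of reading for pleasure, Predict: ['The first topic we discussed was the benefits of reading for pleasure.'], prompt length: 3559",
]
print(substring_accuracy(sample))
```

A substring match is stricter than a model-based grader: a correct paraphrase of the topic would be counted as wrong, so disagreements between the two checks are worth inspecting by hand.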


### Line Retrieval Evaluation

In [5]:
!CUDA_VISIBLE_DEVICES=1 python3 eval.py --model-name-or-path  lmsys/longchat-13b-16k --task lines --longchat_flash_attn --eval_shortest_only

output to evaluation/lines/predictions/longchat-13b-16k
lmsys/longchat-13b-16k
Condensing Positional embeddings from 16384 to 2048

Label: 333, Predict: The <REGISTER_CONTENT> in line sleepy-bijou is <333>., Parsed: 333, prompt length: 4783
34it [12:05, 20.53s/it]Using conversation template: vicuna_v1.1
Label: 18152, Predict: The <REGISTER_CONTENT> in line wide-eyed-frenzy is <18152>., Parsed: 18152, prompt length: 4827
35it [12:29, 21.52s/it]Using conversation template: vicuna_v1.1
Label: 49178, Predict: The <REGISTER_CONTENT> in line elite-pottery is <49178>., Parsed: 49178, prompt length: 4789
36it [12:50, 21.43s/it]Using conversation template: vicuna_v1.1
Label: 37243, Predict: The <REGISTER_CONTENT> in line taboo-cinema is <37243>., Parsed: 37243, prompt length: 4796
37it [13:11, 21.37s/it]Using conversation template: vicuna_v1.1
Label: 23545, Predict: The <REGISTER_CONTENT> in line scary-kid is <23545>., Parsed: 23545, prompt length: 4782
38it [13:32, 21.32s/it]Using conversation template: vicuna_v1.1
Label: 7156, Predict: The <REGISTER_CONTENT> in line filthy-eternity is <7156>., Parsed: 7156, prompt length: