In [None]:
%pip install openicl
# Restart the kernel after the installation is completed

# 2. Using Different Language Models with OpenICL

In this chapter, we show how to use OpenICL for in-context learning (ICL) with different language models, mainly [GPT-2](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf), [FLAN-T5](https://arxiv.org/abs/2109.01652), [XGLM](https://arxiv.org/abs/2112.10668), OpenAI's [GPT-3](https://arxiv.org/abs/2005.14165) API, and the [OPT-175B](https://arxiv.org/abs/2205.01068) API.

## 2-1 Huggingface Library's Models

In this section, we take GPT-2, FLAN-T5, and XGLM as examples to show how to use models from the [Hugging Face library](https://huggingface.co/models) with OpenICL. In general, you only need to pass the corresponding model name to the `model_name` parameter when declaring an `Inferencer`; the examples below show this for specific models.

### 2-1-1 GPT-2

This example also appears in [tutorial1](https://github.com/Shark-NLP/OpenICL/blob/main/examples/tutorials/openicl_tutorial1_getting_started.ipynb). This time, however, we set `batch_size` for both `TopkRetriever` and `PPLInferencer` to speed things up. Note that the two `batch_size` values can differ (here `8` and `6`). This is because each component receives its full input at the start (the complete dataset for retrieval, and the retrieval results for the entire test set for inference) rather than being fed data batch by batch, so each one can split its input independently.

In [None]:
from openicl import DatasetReader, PromptTemplate, TopkRetriever, PPLInferencer

# Define a DatasetReader, loading dataset from huggingface.
data = DatasetReader('gpt3mix/sst2', input_columns=['text'], output_column='label')

# SST-2 Template Example
tp_dict = {
     0: '</E>Positive Movie Review: </text>',
     1: '</E>Negative Movie Review: </text>' 
}
template = PromptTemplate(tp_dict, {'text' : '</text>'}, ice_token='</E>')

# TopK Retriever
retriever = TopkRetriever(data, ice_num=2, batch_size=8)

# Define an Inferencer
inferencer = PPLInferencer(model_name='gpt2', batch_size=6)

# Inference
predictions = inferencer.inference(retriever, ice_template=template, output_json_filename='gpt2_sst2')
print(predictions)
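
The reason the two `batch_size` values in the example above can differ can be illustrated with a minimal, library-free sketch. The helpers `batched`, `retrieve_stage`, and `infer_stage` are hypothetical stand-ins, not OpenICL functions; they only show that each stage receives its whole input first and then slices it with its own batch size.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def retrieve_stage(test_set: List[str], batch_size: int) -> List[str]:
    # Hypothetical stand-in for retrieval: the stage sees the whole test set
    # and batches it internally.
    results = []
    for batch in batched(test_set, batch_size):
        results.extend(f"ice-for-{x}" for x in batch)
    return results


def infer_stage(prompts: List[str], batch_size: int) -> List[str]:
    # Hypothetical stand-in for inference: it receives all retrieval results
    # at once and re-batches them with its own batch size.
    outputs = []
    for batch in batched(prompts, batch_size):
        outputs.extend(p.upper() for p in batch)
    return outputs


test_set = [f"x{i}" for i in range(20)]
retrieved = retrieve_stage(test_set, batch_size=8)
predictions = infer_stage(retrieved, batch_size=6)

# Batch sizes each stage actually iterates over.
retrieval_batch_sizes = [len(b) for b in batched(test_set, 8)]
inference_batch_sizes = [len(b) for b in batched(retrieved, 6)]
```

Because the hand-off between the stages is the complete list of results, neither stage needs to know how the other one batches its work.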

### 2-1-2 XGLM

XGLM is a good choice for machine translation. When using XGLM, however, we **do not recommend** setting `batch_size` in `GenInferencer`. When calling the `model.generate` method of the [Hugging Face transformers library](https://huggingface.co/docs/transformers/index), padding is required to feed multiple inputs at once, and in our tests we found that padding affects XGLM's generation. The code for evaluating the ICL performance of XGLM (7.5B) on the WMT16 (de-en) dataset with the direct inference strategy is as follows:

In [None]:
from openicl import DatasetReader, PromptTemplate, BM25Retriever, GenInferencer
from datasets import load_dataset

# Loading dataset from huggingface 
dataset = load_dataset('wmt16', name='de-en')

# Data Preprocessing
dataset = dataset.map(lambda example: example['translation']).remove_columns('translation')

# Define a DatasetReader, selecting 5 pieces of data randomly.
data = DatasetReader(dataset, input_columns='de', output_column='en', ds_size=5)

# WMT16 de->en Template Example (German source as input, English as output)
template = PromptTemplate('</E></de> = </en>', {'en' : '</en>', 'de' : '</de>'}, ice_token='</E>')

# BM25 Retriever
retriever = BM25Retriever(data, ice_num=1, index_split='validation', test_split='test')

# Define an Inferencer
inferencer = GenInferencer(model_name='facebook/xglm-7.5B')

# Inference
predictions = inferencer.inference(retriever, ice_template=template, output_json_filename='xglm_wmt')
print(predictions)
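
The padding issue mentioned before the XGLM example can be made concrete with a small, library-free sketch. The `pad_batch` helper and the pad id `0` are hypothetical and chosen only for illustration (in practice the tokenizer does the padding, and padding on the left is an assumption here). The sketch shows that a batch of unequal-length inputs must contain pad tokens, whereas a single input per call contains none.

```python
from typing import List, Tuple

PAD_ID = 0  # hypothetical pad token id, for illustration only


def pad_batch(
    sequences: List[List[int]], pad_id: int = PAD_ID
) -> Tuple[List[List[int]], List[List[int]]]:
    """Left-pad token id sequences to a common length and build attention masks."""
    max_len = max(len(seq) for seq in sequences)
    padded, masks = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        padded.append([pad_id] * n_pad + seq)
        masks.append([0] * n_pad + [1] * len(seq))
    return padded, masks


# Two inputs of different lengths: the shorter one receives pad tokens.
batch_ids, batch_mask = pad_batch([[11, 12, 13, 14], [21, 22]])

# One input per call (no batch_size set): nothing is padded.
single_ids, single_mask = pad_batch([[21, 22]])
```

Leaving `batch_size` unset corresponds to the second case, at the cost of slower inference.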

### 2-1-3 FLAN-T5

In this section, we will use FLAN-T5 with OpenICL to reproduce the results in the figure below:

<div align="center">
<img src="https://s1.ax1x.com/2023/03/10/ppuZnQP.png"  width=300px />
<p>(figure in <a href="https://arxiv.org/abs/2109.01652">Finetuned Language Models Are Zero-Shot Learners</a>)</p>
</div>

In [3]:
from openicl import DatasetReader, PromptTemplate, TopkRetriever, GenInferencer

# Define a DatasetReader, loading dataset from huggingface and selecting 10 pieces of data randomly.
data = DatasetReader('snli', input_columns=['premise', 'hypothesis'], output_column='label', ds_size=10)

# SNLI Template
ice_tp_dict = {
    0: 'Premise:</premise>\nHypothesis:</hypothesis>\nDoes the premise entail the hypothesis?\n-yes',
    1: 'Premise:</premise>\nHypothesis:</hypothesis>\nDoes the premise entail the hypothesis?\n-It is not possible to tell',
    2: 'Premise:</premise>\nHypothesis:</hypothesis>\nDoes the premise entail the hypothesis?\n-no'
}
ice_template = PromptTemplate(ice_tp_dict, column_token_map={'premise' : '</premise>', 'hypothesis' : '</hypothesis>'})

prompt_tp_str = '</E>Premise:</premise>\nHypothesis:</hypothesis>\nDoes the premise entail the hypothesis?\nOPTIONS:\n-yes -It is not possible to tell -no'
prompt_template = PromptTemplate(prompt_tp_str, column_token_map={'premise' : '</premise>', 'hypothesis' : '</hypothesis>'}, ice_token='</E>')

# TopK Retriever
retriever = TopkRetriever(data, ice_num=2, index_split='train', test_split='test')

# Define an Inferencer
inferencer = GenInferencer(model_name='google/flan-t5-small', max_model_token_num=1000)

# Inference
predictions = inferencer.inference(retriever, ice_template=ice_template, prompt_template=prompt_template, output_json_filename='flan-t5-small')
print(predictions)

['yes', 'yes', 'yes', 'yes', 'yes', 'no', 'it is not possible to tell', 'yes', 'yes', 'it is not possible to tell']
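
To compare generations like these with the integer labels in SNLI, the answer strings need to be mapped back to label ids. The following is a minimal sketch of one way to do that; the `to_label` helper is hypothetical (not part of OpenICL), the answer strings are taken from `ice_tp_dict` above, and matching case-insensitively is an assumption based on the lowercased output shown above.

```python
# Answer strings taken from the in-context example template above
# (label id -> answer text).
LABEL_TO_ANSWER = {
    0: 'yes',
    1: 'It is not possible to tell',
    2: 'no',
}

# Compare case-insensitively, since the generations shown above are lowercased.
ANSWER_TO_LABEL = {text.lower(): label for label, text in LABEL_TO_ANSWER.items()}


def to_label(prediction: str) -> int:
    """Return the label id for a generated answer, or -1 if it matches no option."""
    return ANSWER_TO_LABEL.get(prediction.strip().lower(), -1)


predictions = ['yes', 'yes', 'yes', 'yes', 'yes', 'no',
               'it is not possible to tell', 'yes', 'yes',
               'it is not possible to tell']
label_ids = [to_label(p) for p in predictions]
```

Returning `-1` for unmatched strings keeps free-form generations that fall outside the three options visible instead of silently assigning them a class.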





## 2-2 Using API-based Models