Large Language Models as Zero-Shot Conversational Recommenders

This is the evaluation data and Large Language Models (LLMs) results from our CIKM'23 paper:

Large Language Models as Zero-Shot Conversational Recommenders, Zhankui He*, Zhouhang Xie*, Rahul Jha, Harald Steck, Dawen Liang, Yesu Feng, Bodhisattwa Majumder, Nathan Kallus, Julian McAuley, Conference on Information and Knowledge Management, CIKM'23. * denotes equal contribution.

Please check the arxiv version of this paper, where we will update more detailed information than the CIKM'23 version. This is a work from UCSD-Netflix collaboration.

Please contact Zhankui He if you have any questions, thanks!

Dataset

Disclaimer

⚠️ Please note that conversations processed from Reddit raw data may include content that is not entirely conducive to a positive experience (e.g., toxic speech). Exercise caution and discretion when utilizing this information.

Testing Data for Our Paper

We uploaded the test data in data/ folder, which can be used to form the prompt to query different LLMs. The inspired and redial datasets are adapted from the data provided by CRSLab, where we added some additional data fields like is_user.

File Name	Description	Example
`entity2id.json`	The mapping from the movie names (in `DBPedia` or `IMDB` format) to item ids.	`{"<http://dbpedia.org/resource/Hoffa_(film)>": 0}`
`item_ids.json`	A list of all the item ids.	`[0, 2049, 16388, 12292, 6, 4109, ...]`
`test.json`	Conversational context (in `input` field) and target recommendation item (in `rec` field)	`{"input": "System: How did you like Hustlers? It definitely has the drama aspect...\n User: I liked it ..." "rec": [9722]}`
`test_p2.json`	Similar to `test.json`, but we use historical mentioned movies only. P2 stands for "Placeholder 2"	`{"input": "System: Hustlers\n User:" "rec": [9722]}`
`test_p3.json`	Similar to `test.json`, but we use historical conversational text without mentioned movies. P3 stands for "Placeholder 3"	`{"input": "System: How did you like ? It definitely has the drama aspect...\n User: I liked it ..." "rec": [9722]}`
`test_p4.json`	Similar to `test.json`, but we use historical conversational text with randomly mentioned movies. P4 stands for "Placeholder 4"	`{"input": "System: How did you like Titanic? It definitely has the drama aspect...\n User: I liked it ..." "rec": [9722]}`
`test_raw.json`	Raw data file similar to the files provided in `CRSLab`	`{"context": ["", ... "the last movie i saw in the theater was Hustlers . I generally like comedy, drama and documentaries"], "resp": "How did you like Hustlers? ...", "rec": [9722], "entity": [9722, 15748], "prev_entity": [15748, 17158, 8683, 8881, 16785, 9722], "dialog_id": "test_0", "turn_id": "test_4", "is_user": 0, "entity_name": ["drama", "Hustlers"]}`

We provide the training, validation and testing data of these three datasets in huggingface datasets hub, please check inspired_cikm, redial_cikm and reddit_cikm.

Complete Version `Reddit-Movie` Dataset

We upload the raw (Reddit-Movie-raw) and processed data of one-year (Reddit-Movie-small-V1) and ten-year (Reddit-Movie-large-V1) conversational recommendation in movie domain from Reddit to huggingface datasets hub. This dataset is processed from the reddit dump on pushshift.io and only for research use.

Data Name	Data Size	Time Range
`Reddit-Movie-raw`	2.81GB	January 2012 - December 2022
`Reddit-Movie-small-V1`	510MB	January 2022 - December 2022
`Reddit-Movie-large-V1`	2.35GB	January 2012 - December 2022

NOTE: Different from the previous conversational recommendation datasets from crowdsourcing, our Reddit-Movie dataset is constructed by mining web data. Therefore, it is noisy and needs efforts for data cleaning such as named entity recognition and entity linking. Thus we use V1 to highlight that this processed version is the first verion. Welcome to contribute to cleaner processed versions (such as V2) in the future!

LLMs Results

Generated Results

Here text means the file of generated text and extracted predicted movies; r* denotes the results with different post-processing as below:

Result Type	OOV Items Filtered ?	Seen Items Filtered ?
`r1`	❌	❌
`r2`	❌	✅
`r3`	✅	❌
`r4`	✅	✅

"OOV Items" means the out-of-vocabulary items, i.e., the items that are not legal in the pre-defined candidate set. We can filter the generated OOV items out or not. Also, "Seen Items" means the items mentioned in the current conversation already, which is related to the "Repeated Items Can Be Shortcuts" finding in our paper.

We provide the results on inspired, redial and reddit datasets:

LLMs on `inspired`	General	Placeholder 2	Placeholder 3	Placeholder 4
`GPT-4`	text, r1, r2, r3, r4	text, r1, r2, r3, r4	text, r1, r2, r3, r4	text, r1, r2, r3, r4
`GPT-3.5-turbo`	text, r1, r2, r3, r4	text, r1, r2, r3, r4	text, r1, r2, r3, r4	text, r1, r2, r3, r4
`Vicuna-13B`	text, r1, r2, r3, r4	text, r1, r2, r3, r4	text, r1, r2, r3, r4	text, r1, r2, r3, r4
`BAIZE-V2`	text, r1, r2, r3, r4	--	--	--

LLMs on `redial`	General	Placeholder 2	Placeholder 3	Placeholder 4
`GPT-4`	text, r1, r2, r3, r4	text, r1, r2, r3, r4	text, r1, r2, r3, r4	text, r1, r2, r3, r4
`GPT-3.5-turbo`	text, r1, r2, r3, r4	text, r1, r2, r3, r4	text, r1, r2, r3, r4	text, r1, r2, r3, r4
`Vicuna-13B`	text, r1, r2, r3, r4	text, r1, r2, r3, r4	text, r1, r2, r3, r4	text, r1, r2, r3, r4
`BAIZE-V2`	text, r1, r2, r3, r4	--	--	--

LLMs on `reddit`	General	Placeholder 2	Placeholder 3	Placeholder 4
`GPT-4`	text, r1, r2, r3, r4	text, r1, r2, r3, r4	text, r1, r2, r3, r4	text, r1, r2, r3, r4
`GPT-3.5-turbo`	text, r1, r2, r3, r4	text, r1, r2, r3, r4	text, r1, r2, r3, r4	text, r1, r2, r3, r4
`Vicuna-13B`	text, r1, r2, r3, r4	text, r1, r2, r3, r4	text, r1, r2, r3, r4	text, r1, r2, r3, r4
`BAIZE-V2`	text, r1, r2, r3, r4	--	--	--

Generating Scripts

We provide the scripts so that you are able to generate the results by yourselves.

For OpenAI LLMs (`GPT-4` and `GPT-3.5-turbo`)

Install related dependencies

pip install openai
pip install jsonargparse
pip install tqdm

Create config YAML file. Here an example from src/gpt-3.5-turbo/general/inspired_config.yaml

cat src/gpt-3.5-turbo/general/inspired_config.yaml

from_json: ../data/inspired/test.jsonl
to_json: gpt-3.5-turbo/general/inspired_test.jsonl
prompt: "Pretend you are a movie recommender system. \n I will give you a conversation between a user and you (a recommender system). Based on the conversation, you reply me with 20 recommendations without extra sentences.\n Here is the conversation: {}"
model: gpt-3.5-turbo
temperature: 0.0
max_tokens: 512
n_threads: 10
n_print: 100 # print the progress after n samples
n_samples: -1 # how many samples in `from_json` to query, set -1 means all samples

Create a ${YOUR_DIR}/config.yaml similar to src/gpt-3.5-turbo/general/inspired_config.yaml with your arguments, then try:
```
cd src
OPENAI_API_KEY=sk-... OPENAI_ORG=org-... python openai.py --config ${YOUR_DIR}/config.yaml 
```

Pose-process your results in ${YOUR_DIR} as:

DIR=... # e.g., gpt-3.5-turbo/general/
DATA=... # e.g., inspired
OPENAI_API_KEY=sk-...
OPENAI_ORG=org-...

OPENAI_API_KEY=${OPENAI_API_KEY} OPENAI_ORG=${OPENAI_ORG} python tools/post_fix.py \
    --from_json ${YOUR_DIR}/${DATA}_test.jsonl \
    --prompt_config ${YOUR_DIR}/${DATA}_config.yaml

# if this extraction is not special
cp gpt-3.5-turbo/general/extract.py ${YOUR_DIR}
python ${YOUR_DIR}/extract.py --dataset ${DATA}

python tools/evaluate.py \
    --from_json ${YOUR_DIR}/intermediate/${DATA}/extracted.jsonl

For Local LLMs (`BAIZE-V2` and `Vicuna`)

Install related dependencies

pip install transformers
pip install jsonargparse
pip install tqdm
pip install fschat

We create ${YOUR_DIR}/config.yaml file similar to the openai config file, such as:

cat src/baize/general/inspired_config.yaml

from_json: ../data/inspired/test.jsonl
to_json: baize/general/inspired_test.json
pretrained_model_name_or_path: ../../llms/baize_13b/models--project-baize--baize-v2-13b/snapshots/983e06d987a05584de7d251e8177945dc98600cf
temperature: 0.0 
max_tokens: 512 
n_print: 100 
prompt: "Pretend you are a movie recommender system. \n I will give you a conversation between a user and you (a recommender system). Based on the conversation, you reply me with 20 recommendations without extra sentences.\n Here is the conversation: {}"

We launch the model similarily and then post-process the results in the same way, where we can leave the OPENAI_API_KEY and OPENAI_ORG blank.
```
cd src
python localmodels.py --config ${YOUR_DIR}/config.yaml 
```

More Generated Results?

For more model results like UniCRS and other baselines, we are organizing our results and developing a conversational recommender system toolkit to build those models in a convenient way. Stay tuned!

Resources Summary

We share the resource created in this project:

Type	Link	Note
Code	This repo	LLMs scripts and results.
Data	Reddit-Movie-raw	The raw data from movie recommendation related conversations on Reddit.
Data	Reddit-Movie-small-V1	One-year (2022) version of Reddit-Movie CRS data, with V1 processing.
Data	Reddit-Movie-large-V1	Ten-year (2012-2022) version of Reddit-Movie CRS data, with V1 processing.
Data	ChatGPT Annotations	ChatGPT annotations for movie NER. [COMING SOON!]
Model	Move-Extractor-T5	T5 Model fine-tuned on ChatGPT annotations to extract movie names from raw text. [COMING SOON!]
Code	Reddit Processing	Python scripts to process Reddit raw text to CRS data. [COMING SOON!]

Citation

Please cite our paper if you are using our shared resources. Thanks!

@inproceedings{he23large,
  title = Large language models as zero-shot conversational recommenders",
  author = "Zhankui He and Zhouhang Xie and Rahul Jha and Harald Steck and Dawen Liang and Yesu Feng and Bodhisattwa Majumder and Nathan Kallus and Julian McAuley",
  year = "2023",
  booktitle = "CIKM"
}

And our original reddit dataset is from pushshift.io, so please cite it as well:

@inproceedings{baumgartner2020pushshift,
  title={The pushshift reddit dataset},
  author={Baumgartner, Jason and Zannettou, Savvas and Keegan, Brian and Squire, Megan and Blackburn, Jeremy},
  booktitle={Proceedings of the international AAAI conference on web and social media},
  volume={14},
  pages={830--839},
  year={2020}
}

Contact Information

Please contact Zhankui He if you have any questions or suggestions. Thanks!

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
data		data
img		img
src		src
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

data

data

img

img

src

src

.gitignore

.gitignore

README.md

README.md

Repository files navigation

Large Language Models as Zero-Shot Conversational Recommenders

Dataset

Disclaimer

Testing Data for Our Paper

Complete Version `Reddit-Movie` Dataset

LLMs Results

Generated Results

Generating Scripts

For OpenAI LLMs (`GPT-4` and `GPT-3.5-turbo`)

For Local LLMs (`BAIZE-V2` and `Vicuna`)

More Generated Results?

Resources Summary

Citation

Contact Information

About

Releases

Packages

Languages

AaronHeee/LLMs-as-Zero-Shot-Conversational-RecSys

Folders and files

Latest commit

History

Repository files navigation

Large Language Models as Zero-Shot Conversational Recommenders

Dataset

Disclaimer

Testing Data for Our Paper

Complete Version Reddit-Movie Dataset

LLMs Results

Generated Results

Generating Scripts

For OpenAI LLMs (GPT-4 and GPT-3.5-turbo)

For Local LLMs (BAIZE-V2 and Vicuna)

More Generated Results?

Resources Summary

Citation

Contact Information

About

Resources

Stars

Watchers

Forks

Languages

Complete Version `Reddit-Movie` Dataset

For OpenAI LLMs (`GPT-4` and `GPT-3.5-turbo`)

For Local LLMs (`BAIZE-V2` and `Vicuna`)