This repository has been archived by the owner on Jan 19, 2024. It is now read-only.

coveooss/fantastic-embeddings-sigir-2020

Fantastic Embeddings and How to Align Them: Zero-Shot Inference in a Multi-Shop Scenario

Public Data Release 1.1.0

Overview

This repo contains the description of the data released together with our SIGIR eCom 2020 paper Fantastic Embeddings and How to Align Them: Zero-Shot Inference in a Multi-Shop Scenario.

Data Download

The dataset is available for research and educational purposes at this page. To obtain the dataset, you are required to fill out a form with information about you and your institution, and to agree to the Terms And Conditions for fair usage of the data.

For convenience, the Terms And Conditions are also included in plain-text format in this repo: usage of the data implies acceptance of these Terms And Conditions.

Data Structure

The dataset is provided in five files inside a zip archive:

  • a json file, structured as a list of lists. Each inner list contains a cross-shop session, that is, a shopping session that starts on one shop and ends on the other (Shop A to Shop B, or vice versa). Items in each list are ordered chronologically and share the syntax SHOP1_SKU41: a hashed identifier of the shop, followed by _ and a hashed identifier of the product the shopper interacted with. A sample json file is provided in this repo: the cross-shop session ["SHOP1_SKU21", "SHOP1_SKU32", "SHOP2_SKU13"] means that an anonymous shopper interacted with products 21 and 32 on the first shop, then browsed to the second shop and interacted with product 13. Please remember that each shop has a different identifier policy, which makes the alignment problem interesting. The cross-shop dataset contains a total of 12 259 sessions;
  • two json files, labelled original_vectors, one for each shop: they map product identifiers (hashed in the same way as in the cross-shop dataset) to the product embeddings trained separately for each shop (a previous release included a pickle file; versions >= 1.1.0 are recommended);
  • two json files, labelled aligned_vectors, one for each shop: they contain a map between product identifiers and related product embeddings, after the alignment proposed in the paper.
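
As a rough illustration of the session format described above, the following Python sketch splits each item into its shop and product parts and groups a session into consecutive per-shop segments. It is only a sketch under stated assumptions: the helper names (split_item, split_session) are hypothetical, the hashed shop identifier is assumed to contain no underscore, and the toy vector map assumes that keys in the vector files match the full item string, which should be checked against the actual files.

```python
import json


def split_item(item):
    """Split an item such as 'SHOP1_SKU21' into (shop, product).

    Assumption: the hashed shop identifier contains no underscore,
    so the first underscore separates shop from product.
    """
    shop, _, product = item.partition("_")
    return shop, product


def split_session(session):
    """Group a chronological session into consecutive per-shop segments."""
    segments = []
    for item in session:
        shop, _ = split_item(item)
        if segments and segments[-1][0] == shop:
            # Same shop as the previous item: extend the current segment.
            segments[-1][1].append(item)
        else:
            # The shopper moved to a different shop: start a new segment.
            segments.append((shop, [item]))
    return segments


# Inline sample mirroring the example session above. With the real data,
# the list of sessions would instead come from json.load() on the
# downloaded cross-shop file.
sessions = json.loads('[["SHOP1_SKU21", "SHOP1_SKU32", "SHOP2_SKU13"]]')

# Toy stand-in for one of the vector maps (values are made up).
# Assumption: keys use the same full string as the session items.
toy_vectors = {"SHOP2_SKU13": [0.1, 0.2, 0.3]}

for session in sessions:
    for shop, items in split_session(session):
        known = [item for item in items if item in toy_vectors]
        print(shop, items, "items with a vector:", known)
```

Grouping by consecutive shop rather than by shop name keeps the chronological order intact, so the point where the shopper switches shops is simply the boundary between two segments.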

We refer the reader to the original work for an extended explanation of the alignment problem. Usage of this data implies acceptance of the Terms And Conditions as set forth on the download page.

Contacts

For questions about the paper or the dataset, please reach out to Jacopo Tagliabue.

Acknowledgments

The original paper is a collaboration between industry and academia, over a dataset kindly provided by Coveo. The authors of the paper are Federico Bianchi, Jacopo Tagliabue, Bingqing Yu, Luca Bigon, and Ciro Greco.

The authors wish to thank Richard Tessier and Coveo's legal team for supporting our research and believing in this data sharing initiative.

How to Cite our Work

If you make use of this dataset, please cite our work:

@inproceedings{BianchiSIGIReCom2020,
  title = {Fantastic Embeddings and How to Align Them: Zero-Shot Inference in a Multi-Shop Scenario},
  author = {Bianchi, Federico and Tagliabue, Jacopo and Yu, Bingqing and Bigon, Luca and Greco, Ciro},
  url = {https://arxiv.org/abs/2007.14906},
  booktitle = {Proceedings of the SIGIR 2020 eCom workshop, July 2020, Virtual Event, published at http://ceur-ws.org (to appear)},
  year = {2020}
}
