Glorf/recipenlg


RecipeNLG: A Cooking Recipes Dataset for Semi-Structured Text Generation

This is an archive of the code used to produce the dataset and results presented in our INLG 2020 paper, RecipeNLG: A Cooking Recipes Dataset for Semi-Structured Text Generation.

What's exciting about it?

The dataset we publish contains 2,231,142 cooking recipes (over 2 million). It is processed more carefully and provides more samples than any other dataset in the area.

Where is the dataset?

The dataset can be downloaded from our project website: recipenlg.cs.put.poznan.pl.
NOTE: The dataset contains all the data we gathered, including recipes taken from other datasets. To access only the recipes we gathered ourselves (free of extraction errors such as "12" in place of "1/2"), filter the dataset for source=Gathered. This yields approximately 1.6M recipes of higher quality.
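Filtering for source=Gathered can be done with a few lines of standard-library Python. This is a minimal sketch, assuming the download is a CSV file with a header row that includes a `source` column; the file names in the usage note are hypothetical. It streams the file row by row instead of loading all ~2.2M recipes into memory at once.

```python
import csv
from typing import Dict, Iterable, Iterator


def iter_gathered(rows: Iterable[Dict[str, str]], source_field: str = "source") -> Iterator[Dict[str, str]]:
    """Yield only the rows whose source column equals "Gathered"."""
    for row in rows:
        if row.get(source_field) == "Gathered":
            yield row


def filter_gathered_csv(in_path: str, out_path: str) -> int:
    """Stream in_path, write the Gathered rows to out_path, and return how many were kept."""
    with open(in_path, newline="", encoding="utf-8") as src, \
         open(out_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        # Reuse the input header so the output has the same columns in the same order
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames or [])
        writer.writeheader()
        count = 0
        for row in iter_gathered(reader):
            writer.writerow(row)
            count += 1
    return count
```

For example, `filter_gathered_csv("full_dataset.csv", "gathered_only.csv")` (hypothetical paths) writes the filtered subset and returns its size. If memory is not a concern, the pandas equivalent is `df[df["source"] == "Gathered"]`.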

I've used the dataset in my research. How do I cite you?

Use the following BibTeX entry:

```bibtex
@inproceedings{bien-etal-2020-recipenlg,
    title = "{R}ecipe{NLG}: A Cooking Recipes Dataset for Semi-Structured Text Generation",
    author = "Bie{\'n}, Micha{\l}  and
      Gilski, Micha{\l}  and
      Maciejewska, Martyna  and
      Taisner, Wojciech  and
      Wisniewski, Dawid  and
      Lawrynowicz, Agnieszka",
    booktitle = "Proceedings of the 13th International Conference on Natural Language Generation",
    month = dec,
    year = "2020",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.inlg-1.4",
    pages = "22--28",
}
```

Where are your models?

The PyTorch model is available on the Hugging Face model hub as mbien/recipenlg, so you can load it into your own code as follows:

```python
# Load the tokenizer and the causal language model from the Hugging Face Hub
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("mbien/recipenlg")
model = AutoModelForCausalLM.from_pretrained("mbien/recipenlg")
```
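Once loaded, the model can be conditioned on a list of ingredients and asked to continue the recipe. The sketch below assumes the model is a causal (GPT-2-style) language model and that its input format uses control tokens named `<RECIPE_START>`, `<INPUT_START>`, `<NEXT_INPUT>` and `<INPUT_END>`, as we understand the format described in the paper; check these names against `tokenizer.additional_special_tokens` before relying on them. The sampling settings are illustrative only, not the ones used for the paper's results.

```python
from typing import List

# ASSUMPTION: control-token names taken from our reading of the RecipeNLG input format;
# verify them against tokenizer.additional_special_tokens.
RECIPE_START = "<RECIPE_START>"
INPUT_START = "<INPUT_START>"
NEXT_INPUT = "<NEXT_INPUT>"
INPUT_END = "<INPUT_END>"


def build_prompt(ingredients: List[str]) -> str:
    """Encode the conditioning ingredients as a prompt; the model continues after INPUT_END."""
    cleaned = [item.strip() for item in ingredients if item.strip()]
    joined = f" {NEXT_INPUT} ".join(cleaned)
    return f"{RECIPE_START} {INPUT_START} {joined} {INPUT_END}"


def generate_recipe(ingredients: List[str], max_new_tokens: int = 256) -> str:
    """Sample one recipe continuation from mbien/recipenlg (downloads the model on first use)."""
    # Imported inside the function so build_prompt works without transformers installed
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("mbien/recipenlg")
    model = AutoModelForCausalLM.from_pretrained("mbien/recipenlg")
    inputs = tokenizer(build_prompt(ingredients), return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,  # illustrative sampling settings, not the authors' configuration
        top_k=50,
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Keep special tokens so the generated sections can be split on their markers
    return tokenizer.decode(output_ids[0], skip_special_tokens=False)
```

For example, `generate_recipe(["flour", "eggs", "milk"])` returns the prompt followed by the sampled continuation; because sampling is enabled, each call produces a different recipe.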

You can also try recipe generation interactively on our website (link above).
The spaCy NER model is available in the ner directory.
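A saved spaCy pipeline is loaded from disk with `spacy.load`, after which its `doc.ents` give the recognized entities. This is a minimal sketch, assuming the model in the ner directory is a standard saved spaCy pipeline; the default path is hypothetical, and the entity label names depend on how the model was trained. Since the model dates from 2020, it may require the older spaCy version it was trained with.

```python
from pathlib import Path
from typing import Any, Callable, List, Tuple


def load_ner_model(model_dir: str = "ner") -> Any:
    """Load a spaCy pipeline from a local directory (hypothetical path; point it at the saved model)."""
    # spaCy is imported here so extract_entities can be reused with any callable pipeline
    import spacy

    return spacy.load(Path(model_dir))


def extract_entities(nlp: Callable[[str], Any], text: str) -> List[Tuple[str, str]]:
    """Run the pipeline on one ingredient line and return (entity text, label) pairs."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]
```

Typical use is `extract_entities(load_ner_model(...), "2 cups all-purpose flour")`, which lists whatever spans the model tags in that line.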

Could you explain X and Y?

Yes, sure! If you feel some information is missing from our paper, please check our thesis first, which is much more detailed. If you have further questions, feel free to open a GitHub issue; we will respond as quickly as we can!

How to run the code?

We worked on the project interactively, and our core result is a new dataset. That's why the repo is a set of loosely connected Python files and Jupyter notebooks rather than a self-contained, runnable solution. However, if you feel that a part crucial for reproduction is missing, or you would like to make the experience smoother, send us a feature request or, preferably, a pull request.

About

Set of scripts and notebooks used to produce the results reported in the RecipeNLG paper.
