Google Research Datasets
- 1.1k followers
- Mountain View, CA
- http://research.google
Repositories
- cultural_familiarity_annotations Public
The dataset consists of AI-generated stories and accompanying human ratings of their cultural fluency and relevance.
- common-crawl-domain-names Public
Corpus of domain names scraped from Common Crawl and manually annotated to add word boundaries (e.g. "commoncrawl" to "common crawl").
- rag_conflicts Public
CONFLICTS is a QA dataset annotated with knowledge conflict types. Each instance comprises a query, a set of retrieved relevant passages, a conflict type label, and, for certain conflict types, the ground-truth answer.
- wit-retrieval Public
- Amplify_SSA Public
An annotated dataset of 8,091 adversarial queries in seven Sub-Saharan African languages.
- artydiqa Public
ArTyDi-QA is a dataset for Question Answering (QA) and Question Generation (QG) in Modern Standard Arabic (MSA), adapted from TyDiQA. It features extractive QA where models find answer spans or identify unanswerable questions, and a QG task involving formulating questions from context and answer pairs.
- screen_qa Public
The ScreenQA dataset was introduced in the paper "ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots". It contains ~86K question-answer pairs collected by human annotators for ~35K screenshots from Rico. It is intended for training and evaluating models that understand screen content via question answering.