
MichiganNLP/AtlasNLP

AtlasNLP Website

This repository hosts the public website for AtlasNLP, a country-aware atlas of dataset representation in NLP.

Website: https://anonymous.4open.science/w/AtlasNLP-6D06/

AtlasNLP maps NLP datasets by the countries and populations they represent, the locations where datasets are produced, and the NLP tasks they cover. The project is designed to make geographic gaps in NLP dataset representation more visible and to support more transparent, country-aware dataset documentation and evaluation.

About AtlasNLP

NLP datasets are often organized by language, task, or benchmark, but these groupings do not always reveal which countries or populations a dataset represents. AtlasNLP addresses this gap by organizing datasets around country-level metadata.

The resource includes:

  • AtlasNLP-Core: a large-scale ACL-derived collection of over 18,000 NLP datasets constructed through automated extraction and validation.
  • AtlasNLP-Gold: a human-curated reference set used for validation and expanded coverage of underrepresented regions.
  • Country-aware metadata: content countries, producer countries, task categories, languages, modality, synthetic status, and related dataset properties.
  • Interactive visualizations: maps, country-task coverage, language concentration, producer-content relationships, and dataset summary statistics.
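
To make the metadata fields above concrete, here is a minimal Python sketch of how one dataset record and two simple coverage counts could look. The class `DatasetRecord`, its field names, the helper functions, and the toy records are all hypothetical and chosen for illustration only; they are not the actual AtlasNLP schema, code, or data.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical record; field names are illustrative, not the AtlasNLP schema."""
    name: str
    content_countries: list[str] = field(default_factory=list)   # countries represented in the data
    producer_countries: list[str] = field(default_factory=list)  # countries where the dataset was produced
    tasks: list[str] = field(default_factory=list)
    languages: list[str] = field(default_factory=list)
    modality: str = "text"
    synthetic: bool = False


def content_country_counts(records):
    """Count how many datasets represent each content country."""
    counts = Counter()
    for record in records:
        # set() so a country listed twice in one record is counted once
        counts.update(set(record.content_countries))
    return counts


def producer_content_pairs(records):
    """Count (producer country, content country) pairs across datasets."""
    counts = Counter()
    for record in records:
        for producer in set(record.producer_countries):
            for content in set(record.content_countries):
                counts[(producer, content)] += 1
    return counts


# Made-up toy records, not real AtlasNLP entries.
records = [
    DatasetRecord("toy-qa", ["KE", "NG"], ["US"], ["question answering"], ["sw", "en"]),
    DatasetRecord("toy-ner", ["NG"], ["NG"], ["named entity recognition"], ["yo"]),
]

coverage = content_country_counts(records)
pairs = producer_content_pairs(records)
```

Keeping content countries and producer countries as separate fields is what allows the second count: it distinguishes a dataset about a country from a dataset made in that country.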

About

AtlasNLP is a resource that maps dataset representation by country and shows that coverage, production, and task diversity are deeply uneven across the global NLP ecosystem.
