Skip to content

Add CSV identifier validation with comprehensive error reporting#1437

Open
LoicOuth wants to merge 3 commits intoPokeAPI:masterfrom
LoicOuth:test/validate-ascii-resource-names
Open

Add CSV identifier validation with comprehensive error reporting#1437
LoicOuth wants to merge 3 commits intoPokeAPI:masterfrom
LoicOuth:test/validate-ascii-resource-names

Conversation

@LoicOuth
Copy link
Contributor

@LoicOuth LoicOuth commented Mar 12, 2026

Add CSV identifier validation to enforce ASCII slug format

Issue #1436

Context

The API currently returns 400 Bad Request when accessing resources with Unicode characters or special characters in their identifiers (e.g., /api/v2/item/kofu's-wallet). After discussion in #1436, we agreed that resource identifiers should be treated as URL-safe ASCII slugs for consistency across the API.

Approach

Instead of modifying the API to accept Unicode characters, this PR adds preventive validation at the data source level (CSV files). This approach:

  • Validates early: Catches invalid identifiers before data is loaded into the database
  • Prevents future issues: CI will block PRs with invalid identifiers
  • Maintains consistency: Ensures all resources follow the same naming convention
  • Simple to maintain: No complex regex or Unicode handling in API code

Implementation

Added CSVResourceNameValidationTestCase in pokemon_v2/test_models.py:

  • Validates 50 CSV files with identifier columns
  • Enforces pattern: ^[a-z0-9-]+$ (lowercase letters, numbers, hyphens only)
  • Reports both missing files and invalid identifiers
  • Runs automatically in CircleCI via make test

Test Results

Expected behavior: This test will fail on master, which is intentional. It identifies 16 invalid identifiers that need normalization : check PR #1438

Question for Maintainers

⚠️ Empty identifiers: Currently, the test skips empty identifiers (e.g., location_areas.csv has many rows with empty identifier fields).

Should empty identifiers be:

  • Option A: Allowed (current behavior - skip validation)
  • Option B: Rejected (fail the test if identifier is empty)

Example from location_areas.csv:

id,location_id,game_index,identifier
1,1,1,
2,2,2,
3,3,3,

- Validates 50 CSV files with identifier columns
- Reports missing files and invalid identifiers together
- Found 16 invalid identifiers across items, locations, and move_meta_categories
@LoicOuth LoicOuth force-pushed the test/validate-ascii-resource-names branch from a47d2e3 to 61568ec Compare March 12, 2026 09:44
Copy link
Contributor

@jemarq04 jemarq04 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests are good and well documented - I've left a couple comments. I think it's likely best to wait until the fixes in #1438 are merged before we approve this, to make sure that these new tests would pass with the fixes.

Comment on lines +33 to +83
("abilities.csv", "identifier"),
("berry_firmness.csv", "identifier"),
("conquest_episodes.csv", "identifier"),
("conquest_kingdoms.csv", "identifier"),
("conquest_move_displacements.csv", "identifier"),
("conquest_move_ranges.csv", "identifier"),
("conquest_stats.csv", "identifier"),
("conquest_warrior_archetypes.csv", "identifier"),
("conquest_warrior_skills.csv", "identifier"),
("conquest_warrior_stats.csv", "identifier"),
("conquest_warriors.csv", "identifier"),
("contest_types.csv", "identifier"),
("egg_groups.csv", "identifier"),
("encounter_conditions.csv", "identifier"),
("encounter_condition_values.csv", "identifier"),
("encounter_methods.csv", "identifier"),
("evolution_triggers.csv", "identifier"),
("genders.csv", "identifier"),
("generations.csv", "identifier"),
("growth_rates.csv", "identifier"),
("items.csv", "identifier"),
("item_categories.csv", "identifier"),
("item_flags.csv", "identifier"),
("item_fling_effects.csv", "identifier"),
("item_pockets.csv", "identifier"),
("languages.csv", "identifier"),
("locations.csv", "identifier"),
("location_areas.csv", "identifier"),
("moves.csv", "identifier"),
("move_battle_styles.csv", "identifier"),
("move_damage_classes.csv", "identifier"),
("move_flags.csv", "identifier"),
("move_meta_ailments.csv", "identifier"),
("move_meta_categories.csv", "identifier"),
("move_targets.csv", "identifier"),
("natures.csv", "identifier"),
("pal_park_areas.csv", "identifier"),
("pokeathlon_stats.csv", "identifier"),
("pokedexes.csv", "identifier"),
("pokemon.csv", "identifier"),
("pokemon_colors.csv", "identifier"),
("pokemon_forms.csv", "identifier"),
("pokemon_habitats.csv", "identifier"),
("pokemon_move_methods.csv", "identifier"),
("pokemon_shapes.csv", "identifier"),
("pokemon_species.csv", "identifier"),
("regions.csv", "identifier"),
("stats.csv", "identifier"),
("types.csv", "identifier"),
("versions.csv", "identifier"),
("version_groups.csv", "identifier"),
Copy link
Contributor

@jemarq04 jemarq04 Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These all have the same name for the column: identifier. Maybe just have this be a list of strings (the base file names) and update things to match?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, maybe it's better. When I started create the test, I didn't know if all the name had the same column name. That's why I did it this way. I will make the modification.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change is done ;)


def get_csv_path(self, filename):
"""Get the absolute path to a CSV file in data/v2/csv/"""
from django.conf import settings
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why import in this function every time it's called instead of at the top of the file?

Also, considering the function can be put in one line and still be easily read I think maybe we can just retrieve the path directly in the test method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback! You're absolutely right about both points. For the import, I apologize - I should have placed it at the top of the file. I will make the changes as you suggested ;).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change is done ;)

Copy link
Contributor

@jemarq04 jemarq04 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks! Like I mentioned before, I think we'll wait to merge this until #1438 is fully merged, but this will be a good test for future contributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants