Add CSV identifier validation with comprehensive error reporting #1437
LoicOuth wants to merge 3 commits into PokeAPI:master
- Validates 50 CSV files with identifier columns
- Reports missing files and invalid identifiers together
- Found 16 invalid identifiers across items, locations, and move_meta_categories
a47d2e3 to 61568ec
jemarq04 left a comment
The tests are good and well documented - I've left a couple comments. I think it's likely best to wait until the fixes in #1438 are merged before we approve this, to make sure that these new tests would pass with the fixes.
pokemon_v2/test_models.py
Outdated
    ("abilities.csv", "identifier"),
    ("berry_firmness.csv", "identifier"),
    ("conquest_episodes.csv", "identifier"),
    ("conquest_kingdoms.csv", "identifier"),
    ("conquest_move_displacements.csv", "identifier"),
    ("conquest_move_ranges.csv", "identifier"),
    ("conquest_stats.csv", "identifier"),
    ("conquest_warrior_archetypes.csv", "identifier"),
    ("conquest_warrior_skills.csv", "identifier"),
    ("conquest_warrior_stats.csv", "identifier"),
    ("conquest_warriors.csv", "identifier"),
    ("contest_types.csv", "identifier"),
    ("egg_groups.csv", "identifier"),
    ("encounter_conditions.csv", "identifier"),
    ("encounter_condition_values.csv", "identifier"),
    ("encounter_methods.csv", "identifier"),
    ("evolution_triggers.csv", "identifier"),
    ("genders.csv", "identifier"),
    ("generations.csv", "identifier"),
    ("growth_rates.csv", "identifier"),
    ("items.csv", "identifier"),
    ("item_categories.csv", "identifier"),
    ("item_flags.csv", "identifier"),
    ("item_fling_effects.csv", "identifier"),
    ("item_pockets.csv", "identifier"),
    ("languages.csv", "identifier"),
    ("locations.csv", "identifier"),
    ("location_areas.csv", "identifier"),
    ("moves.csv", "identifier"),
    ("move_battle_styles.csv", "identifier"),
    ("move_damage_classes.csv", "identifier"),
    ("move_flags.csv", "identifier"),
    ("move_meta_ailments.csv", "identifier"),
    ("move_meta_categories.csv", "identifier"),
    ("move_targets.csv", "identifier"),
    ("natures.csv", "identifier"),
    ("pal_park_areas.csv", "identifier"),
    ("pokeathlon_stats.csv", "identifier"),
    ("pokedexes.csv", "identifier"),
    ("pokemon.csv", "identifier"),
    ("pokemon_colors.csv", "identifier"),
    ("pokemon_forms.csv", "identifier"),
    ("pokemon_habitats.csv", "identifier"),
    ("pokemon_move_methods.csv", "identifier"),
    ("pokemon_shapes.csv", "identifier"),
    ("pokemon_species.csv", "identifier"),
    ("regions.csv", "identifier"),
    ("stats.csv", "identifier"),
    ("types.csv", "identifier"),
    ("versions.csv", "identifier"),
    ("version_groups.csv", "identifier"),
These all have the same name for the column: identifier. Maybe just have this be a list of strings (the base file names) and update things to match?
Yeah, that's probably better. When I started writing the test, I didn't know whether all the files used the same column name. That's why I did it this way. I will make the change.
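For reference, the suggested change could look roughly like the sketch below. The constant and helper names (`IDENTIFIER_COLUMN`, `CSV_BASE_NAMES`, `csv_filenames`) are hypothetical and not taken from the PR, and the list is shortened for illustration.

```python
# Assumption: every validated file uses the same column name, so it is
# stored once instead of being repeated in each tuple.
IDENTIFIER_COLUMN = "identifier"

# Hypothetical, shortened list of base file names (no ".csv" suffix).
CSV_BASE_NAMES = [
    "abilities",
    "berry_firmness",
    "items",
    "locations",
    "move_meta_categories",
]


def csv_filenames(base_names):
    """Turn base names into CSV file names, e.g. "items" -> "items.csv"."""
    return [f"{name}.csv" for name in base_names]
```

With this shape, adding a file to the check means adding one string, and the column name only changes in one place.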
pokemon_v2/test_models.py
Outdated
    def get_csv_path(self, filename):
        """Get the absolute path to a CSV file in data/v2/csv/"""
        from django.conf import settings
Why import in this function every time it's called instead of at the top of the file?
Also, considering the function can be put in one line and still be easily read I think maybe we can just retrieve the path directly in the test method.
Thanks for the feedback! You're absolutely right about both points. For the import, I apologize - I should have placed it at the top of the file. I will make the changes as you suggested ;).
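A minimal sketch of building the path inline, without a helper method. `BASE_DIR` here is a hypothetical stand-in for `django.conf.settings.BASE_DIR` (assuming the project defines it), which would be imported once at the top of the test module; the `data/v2/csv` location comes from the docstring above.

```python
import os

# Assumption: placeholder for settings.BASE_DIR; the real value would come
# from "from django.conf import settings" at module level.
BASE_DIR = os.path.join(os.sep, "srv", "pokeapi")

# Hypothetical subset of the files under test.
filenames = ["items.csv", "locations.csv"]

# The whole helper collapses into one expression per file.
paths = [os.path.join(BASE_DIR, "data", "v2", "csv", name) for name in filenames]
```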
Add CSV identifier validation to enforce ASCII slug format
Issue #1436
Context
The API currently returns 400 Bad Request when accessing resources with Unicode characters or special characters in their identifiers (e.g., /api/v2/item/kofu's-wallet). After discussion in #1436, we agreed that resource identifiers should be treated as URL-safe ASCII slugs for consistency across the API.
Approach
Instead of modifying the API to accept Unicode characters, this PR adds preventive validation at the data source level (CSV files). This approach:
Implementation
Added CSVResourceNameValidationTestCase in pokemon_v2/test_models.py:
^[a-z0-9-]+$ (lowercase letters, numbers, hyphens only)
make test
Test Results
Expected behavior: This test will fail on master, which is intentional. It identifies 16 invalid identifiers that need normalization; see PR #1438.
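The check described above can be sketched as follows. This is not the PR's actual test code: the function name `find_identifier_problems` and its parameters are hypothetical, and only the pattern and the "report missing files and invalid identifiers together" behavior come from the description.

```python
import csv
import os
import re

# Pattern from the PR description: lowercase letters, numbers, hyphens only.
SLUG_PATTERN = re.compile(r"^[a-z0-9-]+$")


def find_identifier_problems(csv_dir, filenames, column="identifier"):
    """Collect every problem instead of stopping at the first one.

    Returns (missing_files, invalid), where invalid is a list of
    (filename, row_number, value) tuples.
    """
    missing_files = []
    invalid = []
    for filename in filenames:
        path = os.path.join(csv_dir, filename)
        if not os.path.isfile(path):
            missing_files.append(filename)
            continue
        with open(path, newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            # Rows are counted from 2 because row 1 is the header.
            for row_number, row in enumerate(reader, start=2):
                value = row.get(column) or ""
                # fullmatch avoids "$" accepting a trailing newline.
                if not SLUG_PATTERN.fullmatch(value):
                    invalid.append((filename, row_number, value))
    return missing_files, invalid
```

A test could then assert that both returned lists are empty and print them in a single failure message. Note that an empty value does not match the pattern, so this sketch reports it as invalid.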
Question for Maintainers
location_areas.csv has many rows with empty identifier fields).
Should empty identifiers be:
Example from location_areas.csv: