Adding some more Gen8 Data #5
Comments
Just a clarification: I do not intend to overwrite any records from CSV edits already made in this fork, only to add records or fill in missing values. The talk of more comprehensive scraping in the future applies to CSVs that have not yet been touched by this fork, but that I'd have some data for in the pull.
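The "add records or fill in missing values, never overwrite" rule above could be sketched roughly as follows. This is only an illustration, not the script used in the pull: the function name `merge_rows`, the choice of `id` as the matching key, and the sample rows (including the deliberately wrong scraped height of 99) are all assumptions made for the example.

```python
import csv
import io


def merge_rows(existing, scraped, key="id"):
    """Add new records and fill empty cells; never overwrite a non-empty value.

    `existing` and `scraped` are lists of dicts, as produced by csv.DictReader.
    `key` is the (assumed) column used to match records.
    """
    merged = {row[key]: dict(row) for row in existing}
    for row in scraped:
        target = merged.get(row[key])
        if target is None:
            # Record not present yet: add it as a new row.
            merged[row[key]] = dict(row)
            continue
        for column, value in row.items():
            # Only fill cells that are currently empty.
            if target.get(column, "") == "" and value != "":
                target[column] = value
    return list(merged.values())


# Hypothetical sample data standing in for a CSV already edited in the fork.
existing_csv = "id,identifier,height\n1,bulbasaur,7\n2,ivysaur,\n"
existing = list(csv.DictReader(io.StringIO(existing_csv)))

# Hypothetical scraped rows: one conflicting value, one gap filler, one new record.
scraped = [
    {"id": "1", "identifier": "bulbasaur", "height": "99"},
    {"id": "2", "identifier": "ivysaur", "height": "10"},
    {"id": "810", "identifier": "grookey", "height": "3"},
]

result = merge_rows(existing, scraped)
for row in result:
    print(row)
```

With this approach the existing height for `bulbasaur` is kept, the empty height for `ivysaur` is filled, and `grookey` is appended as a new record.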
Hi, to answer the first question: we will just go incrementally. So Generation 8 has the

Mid point: We would really like to have the files filled as much as possible. Ideally a script for a CSV file, so we can re-run it again and there are no conflicts because we are not running two scripts on the very same file. - At Pokeapi, if there is a

Second point: yes. Data goes to

Final point: up to you. In the end, we are interested in updated CSV files with data coming from a trustworthy source. I was thinking of going with simple scripts since it's easier to understand what's going on.
stale, closing
Started new thread following this
Brief work write-up
Hello again @Naramsim. By the way, I'm new to the world of open-source contribution, so bear with me if I ever ask silly questions!

First I'd like to ask a general question about new data.

Some CSVs/tables follow a general policy of assigning some IDs starting at 0 and others starting at 10000, for example `pokemon` and `pokemon_forms` for which the `is_default` column is false. I'd keep this trend, but I was wondering whether a similar idea should be applied to the IDs of things that are added in this fork, i.e. making new records in `pokedexes` start after some large number (10000) if they are newly added (10026 instead of 26 for the `galar` pokedex, following 25 for `updated-poni`). Or maybe some other large number, because this conveys newness rather than alternativeness.

Also, just FYI, my scraping does not rely on visiting any individual Pokémon pages on Bulbapedia, only the various list pages with specific info associated with each mon (in a given gen or other category). That's why (if you check the write-up) many of the columns for some tables are empty (e.g. my `pokemon`). Is that okay? Of course, a more comprehensive scraping approach than mine in the future would be able to fix all those NULLs :)

Secondly, I wanted to ask about the file structure / repos. For this fork, if doing CSV edits only, would I only modify files under `pokedex/data/csv/` (without touching the scraping directories that also have CSV files)? Also, how does the PokeAPI/pokeapi repo use a copy of the above-mentioned directory from this fork? Does one just make sure identical CSV pulls are accepted simultaneously to both repos?

My action plan atm is to rebuild the DB using the CSVs from this fork, re-run my scraping code, and see what, if any, duplicates/contradictions it makes. Then update the logic for detecting existing records so it complies with this fork. Then finally work on dumping tables to CSV.
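As a rough sketch of the "see what duplicates/contradictions it makes" step in the plan above, scraped rows could be sorted into new records, exact duplicates, and contradictions before anything is written. The function name `compare_rows`, matching on the `identifier` column, and the sample rows (including the conflicting `id` for `national`) are assumptions for illustration only; the real detection logic may need to match on other columns.

```python
def compare_rows(existing, scraped, key="identifier"):
    """Classify scraped rows against existing rows.

    Returns (new, duplicates, conflicts):
      new        - scraped rows whose key is absent from `existing`
      duplicates - keys whose scraped values agree with the existing row
      conflicts  - (key, {column: (existing_value, scraped_value)}) pairs
    Empty cells on either side are treated as "unknown", not as a conflict.
    """
    by_key = {row[key]: row for row in existing}
    new, duplicates, conflicts = [], [], []
    for row in scraped:
        current = by_key.get(row[key])
        if current is None:
            new.append(row)
            continue
        differing = {
            column: (current[column], value)
            for column, value in row.items()
            if column in current
            and current[column] != ""
            and value != ""
            and current[column] != value
        }
        if differing:
            conflicts.append((row[key], differing))
        else:
            duplicates.append(row[key])
    return new, duplicates, conflicts


# Hypothetical rows standing in for the fork's pokedexes CSV.
existing = [
    {"id": "1", "identifier": "national"},
    {"id": "25", "identifier": "updated-poni"},
]

# Hypothetical scraper output: a contradiction, a duplicate, and a new record.
scraped = [
    {"id": "2", "identifier": "national"},
    {"id": "25", "identifier": "updated-poni"},
    {"id": "26", "identifier": "galar"},
]

new, duplicates, conflicts = compare_rows(existing, scraped)
print("new:", new)
print("duplicates:", duplicates)
print("conflicts:", conflicts)
```

Only the rows in `new` would be candidates for insertion; anything in `conflicts` would need a manual look, since existing values in the fork are not meant to be overwritten.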