Extract more data from practicalplants.json for the companionship algorithm #20
Comments
Could you prioritize which properties are most important for the algorithm, so that I can start with those?
Yes, these would be useful:
For these, it might be possible to parse something from the free-form text, but that is likely difficult:
The rest look either useless for the algorithm or are missing from most crops.
I have checked: family has 283 unique values, and genus has over 1000. What is your view on how to show these in the UI? Should I just try to extract all of these values in this ticket, and then open another ticket about how they would be shown in the UI?
I think a dropdown is OK to get started; it can be fixed or improved in other issues and PRs. It is also fine to split this into two PRs: one for getting Note that we currently use
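As a rough illustration of how the unique family or genus values could be collected for a dropdown, here is a minimal sketch. The sample records and the helper names `uniqueValues` and `capitalize` are hypothetical and are not taken from the project; the real records would come from practicalplants.json.

```javascript
// Hypothetical sample records; the real data comes from practicalplants.json.
const crops = [
  { family: 'Lamiaceae', genus: 'Rosmarinus' },
  { family: 'lamiaceae', genus: 'Salvia' },
  { family: 'Rosaceae', genus: 'Malus' },
  { genus: 'Unknown' }, // family missing on purpose
];

// Collect the distinct values of one property, ignoring case,
// surrounding whitespace, and records where the property is missing.
function uniqueValues(records, property) {
  const values = new Set();
  for (const record of records) {
    const value = record[property];
    if (typeof value === 'string' && value.trim() !== '') {
      values.add(value.trim().toLowerCase());
    }
  }
  return [...values].sort();
}

// Uppercase the first letter for display in the UI.
function capitalize(value) {
  return value.charAt(0).toUpperCase() + value.slice(1);
}

const familyOptions = uniqueValues(crops, 'family').map(capitalize);
console.log(familyOptions);
```

Normalizing before deduplication keeps values that differ only in letter case from appearing twice in the dropdown.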
Fertility has these unique values: Should this be converted to have just two possible values, "self fertile" and "self sterile"? Fertility would then be an array property, and objects that have "self fertile, self sterile" as their fertility value would get both values in that array. This is the same approach as the preparsing for pollinator values.
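The conversion proposed above could look roughly like the following sketch. It is an illustration only, not the project's actual code; the names `FERTILITY_VALUES` and `parseFertility` are assumptions, and it assumes the raw value is a comma-separated string.

```javascript
// The two values the comment proposes to keep.
const FERTILITY_VALUES = ['self fertile', 'self sterile'];

// Turn a raw, comma-separated fertility string into an array that
// contains only the known values. Anything else is dropped.
function parseFertility(raw) {
  if (typeof raw !== 'string') {
    return [];
  }
  const parts = raw.split(',').map((part) => part.trim().toLowerCase());
  return FERTILITY_VALUES.filter((value) => parts.includes(value));
}

console.log(parseFertility('self fertile, self sterile'));
console.log(parseFertility('Self sterile'));
console.log(parseFertility(undefined));
```

A crop listed as both self fertile and self sterile keeps both entries, which preserves the partial self-incompatibility case instead of forcing a single value.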
Exactly, there are plants that have some opposition to self-fertilization but not total opposition.
Actually, fertility values are already implemented.
After running migrate and deleting the locally indexed db as described in issue #73, it looks like all Genus and Family values are shown properly in the UI with an uppercase first letter. Edit issue Object properties extraction
There could be three properties, edibleParts, medicinalParts and materialParts, and these would all be arrays containing objects of the same shape, each with the properties part and use. If you look at the document for Rosmarinus officinalis using the practicalplants MediaWiki API, you can see that the property edible part and use is not a single object but an array, and this seems to be the case for other crops as well. I created #83 for fixing this. It looks like the properties edible parts and edible uses are redundant; if needed there could be utility functions
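To illustrate why separate parts and uses properties would be redundant, here is a minimal sketch of utility functions that derive them from an array of part/use objects. The function names `partsOf` and `usesOf` and the sample entries are hypothetical; they are not the actual Rosmarinus officinalis data.

```javascript
// Hypothetical edibleParts array: objects with the properties part and use.
const edibleParts = [
  { part: 'leaves', use: 'condiment' },
  { part: 'leaves', use: 'tea' },
  { part: 'flowers', use: 'condiment' },
];

// Distinct parts, in order of first appearance.
function partsOf(entries) {
  return [...new Set(entries.map((entry) => entry.part))];
}

// Distinct uses, in order of first appearance.
function usesOf(entries) {
  return [...new Set(entries.map((entry) => entry.use))];
}

console.log(partsOf(edibleParts));
console.log(usesOf(edibleParts));
```

The same two helpers would work unchanged for medicinalParts and materialParts, since all three arrays share the same object shape.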
I would assume that in that case this issue could be closed or blocked, and another ticket specifying how to extract these values could be created; that issue could be done after issue #83 is solved. As you mentioned, I have checked and it is an array in the original extract, but in the JSON file it is an object, so the extraction depends on #83. Rest of properties Here are statistics about unique records:
About these textual properties, I haven't analyzed these completely yet, but especially Some notes for the other properties:
It is likely that the textual properties need several iterations before the most useful properties and sets of values for them are found.
Let's keep this issue open until there are more specific issues that cover all properties that we want to extract. These properties are already covered:
These properties are not yet covered by smaller issues:
Fertility extraction was already implemented. OK, sure. The issue can stay open.
Sorry, I opened #95 for extracting data from textual properties.
The function readCrops() in db/practicalplants.js selects useful properties from raw practicalplants.org data and normalizes their content to a format that is easier to handle for the companionship algorithm.
There are still quite a few properties that are not selected and normalized by readCrops(). It would be useful to have more data available for the creation of new goodness functions for the companionship algorithm.
Look for "TODO" in db/practicalplants.js.