Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve text to ingredient parsing #277

Merged
merged 3 commits into from
Dec 26, 2020

Conversation

aarondoet
Copy link
Contributor

The previous implementation of parsing ingredients was very simple. I now wrote a parser
that I would consider good. It takes care of several edge cases and notations.

  • Supports fraction unicode (½, ¼, ⅜, ...)
  • Supports notations like and 1 1/2
  • Supports unit directly after the amount without space inbetween (2g, 2½g)
  • Supports notes (5g onion (cubed) -> amount: 5, unit: g, ingredient: onion, note: cubed)
  • Supports notes (5g onion, cubed -> amount: 5, unit: g, ingredient: onion, note: cubed)
  • Does not break when both commas and brackets exist

The previous implementation of parsing ingredients was very simple. I now wrote a parser
that I would consider good. It takes care of several edge cases and notations.

- Supports fraction unicode (½, ¼, ⅜, ...)
- Supports notations like `1½` and `1 1/2`
- Supports unit directly after the amount without space inbetween (`2g`, `2½g`)
- Supports notes (`5g onion (cubed)` -> amount: 5, unit: g, ingredient: onion, note: cubed)
- Supports notes (`5g onion, cubed` -> amount: 5, unit: g, ingredient: onion, note: cubed)
- Does not break when both commas and brackets exist
@aarondoet
Copy link
Contributor Author

I also wrote a test for the parse function how I'd expect it to work. It includes some edge cases: https://anonfiles.com/recepc11p3/testIngredientParser_py

Feel free to look over it and say if you would expect any of them to give a different result. I also included cases in which it gets detected wrong that I think is impossible to fix, so even in the expectations I wrote there are examples that you would not want like that in real life but that I don't think can be improved.

There is just one case in which I did not know what to do and that is when only a vague amount is given like 2-3.

@aarondoet
Copy link
Contributor Author

aarondoet commented Dec 21, 2020

I'd recommend testing if I did not break anything else, but I tried importing recipes from 3 or 4 different websites and they all seemed to still work as expected. I did not notice any difference (besides notes getting detected too of course)

What I tested importing (and also used to base lots of edge cases on):

@aarondoet
Copy link
Contributor Author

i just tested what it gives with something like 2-3 as amount and it seems that it just takes the first number
grafik

@aarondoet
Copy link
Contributor Author

has issue with multiple ingredients being in one entry separated by comma, but that is something not used a lot and usually only for stuff like salt and pepper
grafik

@vabene1111
Copy link
Collaborator

This looks really really good, thank you very much! For some reason GitHub did not run the normal test suite for this PR (only the CodQL checks) but there should be some tests verifying if the importer works. Have you run those tests ?

I suspect, since the tests are pretty basic/bad that they wont run successfully as you changed the structure of where stuff goes (the note recognizing is awesome by the way)

i guess i will have to re-write the import tests to be more useful than simply checking if the length of the result is correct but you part looks really good.

If you want you can add the test suite you already wrote to the automatic tests included in this repository, otherwise i will do this.

Will probably take a while for me to get this merged as i would do it together with the rewrite of the normal tests but i am looking forward to getting this included!

@aarondoet
Copy link
Contributor Author

aarondoet commented Dec 22, 2020

I have no experience at all with automated tests (not yet, that's something I always wanted to take a look at but somehow I never did) but I'll give my best to figure out how to do that

@vabene1111
Copy link
Collaborator

you dont need to, i can do it when merging :) but if you want feel free to give it a try. I heard for years how awesome test driven development and all that fancy stuff is and nerver actually used it.

With this project i did my first tiny steps towards some decent test coverage and i must admit that i really liked it, it actually helped my find a few edge cases that i would otherwise not have found.

The python file you linked above is already pretty much all you need. Basically you open the test directoy, find the file where the other parser related tests are and add another function to it that tests your new parts. Its basically just python code where you add a few assert statements which then get checked.

If you need any help feel free to ask and if you don't feel like writing the tests dont worry, i can do so when i have some more time again.

@aarondoet
Copy link
Contributor Author

I guess I'll just leave that up to you.
One thing I just noted: I am using empty strings as default values. Idk if that is true but my feeling is that you used None as default value. If that is an issue just let me know, I think I could easily change that.

@vabene1111
Copy link
Collaborator

i am honestly not sure any more. I have changed that too many times to support empty fields on some things that i previously considered required that i will have to look into this.

There is also an issue related to this i guess #275

i will have to take a deep look into this and then i will fix it in both cases

@vabene1111 vabene1111 merged commit 566eea1 into TandoorRecipes:develop Dec 26, 2020
@vabene1111
Copy link
Collaborator

vabene1111 commented Dec 26, 2020

if you are interested, i added your test in 840f5ec and it was super easy as you already did basically everything that needed to be done. I just changed your if check to an assertNotEqual and everything worked as expected.

I might add some more checks than only the non equal one

The other issue will be looked at independently

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants