-
Notifications
You must be signed in to change notification settings - Fork 22
[ENH] add folder and data for tests #107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
214c0bc to
6f90142
Compare
|
Hi @KristijanArmeni I walked the debugger and found this issue: The error It looks like your input files are tab-separated, but the We should certainly note this in the test documentation.
Edit: we did in fact already express an intention to allow In any case though, your input data in The column |
|
Thank you @soul-codes, good catch, i would have missed that. That's an easy fix then, but: does the input schema have to be fixed? In #106 I was trying to implement an update such that the hashtag analyzer could work with two input data schemas (see below). That's because I encountered the two schemas in the datasets I've been analyzing. It would good to be flexible to some degree, though I'm sure input data will inevitably come in lots of different schemas. If we have to choose then I'd think 1) would be more common. Two assumed input data schemas:
|
We can do this, but it would require us to break from the existing "contracts" between analyzers and the application. [Edit:] to be clear, we cannot do this right now. A "proper" way to do this would be to have a pre-processing step where we can perform the destructive conversion from I would recommend that we reduce the scope of this PR by only allowing the more prevalent kind of data (i.e. choose one). I know that sucks, but for the PR's scope (having data for testing), it's more important that the existing behavior is tested than us introducing a new behavior (being able to handle a different schema). Let's keep the pre-processing in the back of our mind though. Whatever cannot be accomplished right now, feel free to open an issue and we can think about how to make it work as a generic contract for all analyzers. |
|
Thanks @soul-codes ! I agree, I was coming to the same conclusion myself, i.e. choose the more prevalent schema and expect users to conform in cases when it is different. I think Schema 1 above seems a reasonable default expectation to me (especially if it is coherent with the ngrams test). I won't get to it today, but can give it a shot over the week, after Tuesday or so. (if you feel like ironing it out, welcome to!) |
This is a WIP PR for adding tests for the hashtag analyzer following the recipe outlined of the test_example_base.py.
It's currently failing (see below). Input and help welcome while I troubleshoot this (@soul-codes ).
Things added
hashtag_test_input.csvhashtag_test_output.jsonFailing
Right now running
pytestin the root folder locally on this branch,test_hashtag_analyzer.pywill lead to the following error stack originatingtest_primary_analyzer:Error stack
If I put a break point on l. 39 in test_primary_analyzer, like so:
Show breakpoint
I see that in test_primary_analyzer,
inputevaluates to this:But
input_pathevaluates to this: