Skip to content
This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

compare namespace in counter and min_count #3644

Merged
merged 7 commits into from
Mar 31, 2021

Conversation

nadgeri14
Copy link
Contributor

fix for #2292

@nadgeri14
Copy link
Contributor Author

@matt-gardner Can you please review the PR?
Thanks :)

@matt-gardner
Copy link
Contributor

Two issues:

  1. There are some failing tests - can you look at the build log and figure out what tests are failing, and why, and fix them?
  2. Can you add a test to vocabulary_test.py that will make sure this implementation is correct? The logic is simple enough, but a test like that makes sure that we don't have regressions later.

@nadgeri14
Copy link
Contributor Author

@matt-gardner The build error is while executing:
Unit Tests (pytest) in allennlp/common/testing/model_test_case.py:33

If I am not mistaken for example if we have

counter = defaultdict(<function Vocabulary.from_instances.<locals>.<lambda> at 0x7f3cd59e0510>, {'source_tokens': defaultdict(<c...: 1, 'secret': 1, 'sneaky': 1, 'retain': 1, 'composure': 1, 'ecstatic': 1, 'not': 1, 'disruption': 1, 'bursting': 1})})

min_count = {'tokens': 1}
It should throw an error saying tokens is not present in counter, is my assmuption right?

@matt-gardner
Copy link
Contributor

Yes, that's what's happening. You need to figure out what's wrong in the code that's causing that error, then fix it, and put the fix in this PR. Probably easiest if you run the failing test locally.

@brendan-ai2
Copy link
Contributor

@nadgeri14, it would be great to have this fix in if you still have the time to work on it!

@nadgeri14
Copy link
Contributor Author

@brendan-ai2 @matt-gardner Hey sorry for lately being inactive, I have exams, as a result, I could not attend it. I will try to close the PR by the end of next week.

@schmmd
Copy link
Member

schmmd commented Feb 14, 2020

@nadgeri14 no worries--thanks for the contribution!

@schmmd
Copy link
Member

schmmd commented Apr 24, 2020

@nadgeri14 should we leave this open? Do you still expect to get to this?

@schmmd schmmd changed the base branch from master to main December 23, 2020 18:48
dirkgr
dirkgr previously approved these changes Mar 30, 2021
Copy link
Member

@dirkgr dirkgr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a minor comment on the changelog. Feel free to merge!

CHANGELOG.md Outdated
@@ -12,6 +12,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Allow the order of examples in the task cards to be specified explicitly
- `histogram_interval` parameter is now deprecated in `TensorboardWriter`, please use `distribution_interval` instead.
- Memory usage is not logged in tensorboard during training now. `ConsoleLoggerCallback` should be used instead.
- If you use the `min_count` parameter of the Vocabulary, but you specify a namespace that does not exist, the vocabulary creation should raise a `ConfigurationError`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just change the wording here. It no longer "should". Now it "does"!

dirkgr
dirkgr previously approved these changes Mar 30, 2021
@ArjunSubramonian ArjunSubramonian merged commit bf8e71e into allenai:main Mar 31, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Vocabulary min_count should error when using a nonexistent namespace
8 participants