-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add GeometricMeanRarity #7
Conversation
The |
src/openrarity/scoring/base.py
Outdated
@dataclass | ||
class BaseRarityFormula: | ||
|
||
formula_name: str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest to 1) create enum to provide explicit formula names 2) why we need formula id if we have a name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- we don't know what the names will be, a priori
- gp
src/openrarity/scoring/base.py
Outdated
def score_token(token: Token) -> float: | ||
raise NotImplementedError | ||
|
||
# base aggregate scorers: can override more efficient methods |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as discussed offline , can we create an efficient computation abstraction ( type alias should work it out) to perform batch-based computations in NumPy format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gp, we will create another class to batch computations in a future PR
-- equivalent to the nth power of "statistical rarity" | ||
''' | ||
|
||
def score_token(self, token: Token) -> float: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as a rule of thumb we should strike for 90% test coverage in the library - quality is very important to keep consistent computation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will write tests in future PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok , makes sense
@@ -0,0 +1,296 @@ | |||
[[package]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why we need poetry lock again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice progress ! some re-factoring needed and good to go ( assuming we will write more unit tests for the calculation logic)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for now , we need to refactor class level documentation and add unit-tests to make sure the computation is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, good progress.
TODO: override aggregate calculations with vectorized methods