WIP: add query baseline query classifier and boilerplate in pipeline #1083
@tholor please let me know how to proceed with this.
outgoing_edges = 2
query_vectorizer = pickle.load(
    urllib.request.urlopen(
        "https://raw.githubusercontent.com/shahrukhx01/ocr-test/main/query_vectorizer.pickle"
I think the deepset team hosts these models on their S3 (@tholor WDYT?).
Also, it would be better to pass the model in via the constructor.
Yep, I agree. I will upload it to our s3.
Done. You can find it at https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier/model.pickle
Also added a tiny readme: https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier/readme.txt
@lalitpagaria thanks for your feedback.
@tholor could you please also upload the TF-IDF vectorizer pickle used for feature extraction to S3? https://raw.githubusercontent.com/shahrukhx01/ocr-test/main/query_vectorizer.pickle
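For reference, here is a minimal sketch of how a pickled artifact could be fetched from a URL using only the standard library. The helper name `load_pickle_from_url` and its `timeout` parameter are hypothetical and not part of haystack. Unpickling can execute arbitrary code, so this should only ever be pointed at a host you trust.

```python
import pickle
import urllib.request

# URL of the hosted classifier, as posted earlier in this thread.
MODEL_URL = (
    "https://ext-models-haystack.s3.eu-central-1.amazonaws.com/"
    "gradboost_query_classifier/model.pickle"
)


def load_pickle_from_url(url: str, timeout: float = 30.0):
    """Download a pickled object from `url` and deserialize it.

    Hypothetical helper for illustration. Unpickling can run arbitrary
    code, so only use it with sources you trust.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Read the whole payload first, then deserialize it in memory.
        return pickle.loads(response.read())
```

Calling `load_pickle_from_url(MODEL_URL)` from a constructor or factory, rather than at class-definition time, would keep the network access out of module import.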
@@ -593,6 +595,29 @@ def run(self, **kwargs):
        return kwargs, "output_1"


class QueryClassifier:
I think it should be a subclass of BaseComponent.
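A rough sketch of what that could look like, with the vectorizer and model also injected through the constructor. The `BaseComponent` below is a stand-in defined only so the snippet runs on its own, not haystack's actual class. The scikit-learn style `transform`/`predict` interfaces, the `query` keyword, and the label-to-edge mapping are all assumptions.

```python
from typing import Any, Dict, Tuple


class BaseComponent:
    """Stand-in for haystack's BaseComponent so the sketch is self-contained."""

    outgoing_edges: int = 1

    def run(self, **kwargs: Any) -> Tuple[Dict[str, Any], str]:
        raise NotImplementedError


class QueryClassifier(BaseComponent):
    """Routes a query to one of two pipeline branches."""

    outgoing_edges = 2

    def __init__(self, vectorizer: Any, model: Any) -> None:
        # Assumed interfaces: vectorizer.transform(list_of_str) and
        # model.predict(features) returning one label per query.
        self.vectorizer = vectorizer
        self.model = model

    def run(self, **kwargs: Any) -> Tuple[Dict[str, Any], str]:
        features = self.vectorizer.transform([kwargs["query"]])
        label = self.model.predict(features)[0]
        # Assumption: a truthy label selects branch 1, anything else branch 2.
        edge = "output_1" if label else "output_2"
        return kwargs, edge


# Lightweight fakes, for illustration only: they count words and treat
# queries longer than three words as the "truthy" class.
class FakeVectorizer:
    def transform(self, queries):
        return [[len(query.split())] for query in queries]


class FakeModel:
    def predict(self, features):
        return [row[0] > 3 for row in features]


classifier = QueryClassifier(vectorizer=FakeVectorizer(), model=FakeModel())
```

Because nothing is loaded at class-definition time, the node can be unit-tested with fakes like these, and the real pickles can be supplied by whoever builds the pipeline.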
@shahrukhx01 Thank you for the PR.
Yes, I think the deepset team can host these models on their S3.
In this case think
Proposed changes:
Status (please check what you already did):
Discussion Points:
I have added the baseline model, however,
Issue: Linked Issue
PS:
This is my first PR here, so please let me know about any contribution guidelines I might have missed. I would also appreciate any instruction manuals for contributors that would help me get started with the codebase quickly, since I'd like to contribute actively to haystack in general. Thanks!