Add Stack Overflow ingest #181
How did you test the ingestion?
LGTM
We need to do the following before merging this change into main:
- Ingest the data from the Stack Overflow APIs and make sure the data in the archive is also covered by the API ingestion (the archive is a dump of the Stack Overflow data).
- Make sure that historical Stack Overflow data is ingested (very old but still relevant posts or comments).
- After running the ingestion DAG, deploy this change to the dev backend and ask @vatsrahul1001 to test it. The goal is that performance should not degrade.
If that's the case, I'm unsure whether we should remove the archive. Since we already have the archive data, shouldn't we just ingest that data and use the API only for data created after it?
We will create a new Weaviate class and do a fresh ingest. We don't need to remove the archive data from the old database.
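The "archive first, then API for newer data" approach discussed above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `build_incremental_url` and `merge_posts` are hypothetical helpers, the `airflow` tag and the archive cutoff date are assumed example values, and each post is assumed to be a dict with `question_id` and `creation_date` keys. The query parameters (`site`, `tagged`, `fromdate`, `sort`, `order`, `page`, `pagesize`) are those of the public Stack Exchange API 2.3 `/questions` endpoint.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Public Stack Exchange API endpoint for listing questions.
API_BASE = "https://api.stackexchange.com/2.3/questions"


def build_incremental_url(archive_cutoff: datetime, tag: str, page: int = 1) -> str:
    """Build a URL for questions created after the (assumed) archive cutoff."""
    params = {
        "site": "stackoverflow",
        "tagged": tag,  # hypothetical tag filter, e.g. "airflow"
        "fromdate": int(archive_cutoff.timestamp()),  # Unix epoch seconds
        "sort": "creation",
        "order": "asc",
        "page": page,
        "pagesize": 100,  # the API's maximum page size
    }
    return f"{API_BASE}?{urlencode(params)}"


def merge_posts(archive_posts: list[dict], api_posts: list[dict]) -> list[dict]:
    """Combine archive and API posts, keeping one record per question_id.

    API records overwrite archive records with the same id, on the assumption
    that the API copy is fresher (for example, edited or with new answers).
    The result is sorted by creation_date, oldest first.
    """
    merged = {post["question_id"]: post for post in archive_posts}
    for post in api_posts:
        merged[post["question_id"]] = post
    return sorted(merged.values(), key=lambda p: p["creation_date"])
```

Keying the merge on `question_id` keeps a fresh ingest into a new Weaviate class free of duplicates even if the API window overlaps the archive.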
Updated from ee0c2cc to 7057444, then from 7057444 to bb1494c.
@sunank200 Now that #194 is done, should we re-review this one?
47933b5: refactor and make it consistent with the existing archive logic
closes: #126