Add stack overflow ingest #181

Merged: 11 commits merged on Dec 19, 2023

Conversation

Lee-W (Collaborator) commented Nov 29, 2023

Refactors 47933b5 to make it consistent with the existing archive logic.

closes: #126


cloudflare-pages bot commented Nov 29, 2023

Deploying with Cloudflare Pages

Latest commit: 8e5e26d
Status: ✅  Deploy successful!
Preview URL: https://a2f2bd4e.ask-astro.pages.dev
Branch Preview URL: https://add-stack-overflow-ingest.ask-astro.pages.dev


sunank200 (Collaborator) left a comment


How did you test the ingestion?

pankajastro (Collaborator) left a comment


LGTM

sunank200 (Collaborator) left a comment


Before merging this change into main, we need to do the following:

  1. Ingest the data from the StackOverflow APIs and make sure that the data in the archive is also ingested from the APIs (since the archive is a dump of the StackOverflow data).
  2. Make sure that historical StackOverflow data is ingested (e.g., very old but still relevant posts or comments).
  3. After running the ingestion DAG, deploy this change to the dev backend and ask @vatsrahul1001 to test it. The goal is to confirm that performance does not degrade.

Lee-W (Collaborator, Author) commented Dec 1, 2023

> Ingest the data from StackOverflow APIs and make sure the data from the archive is ingested as well from the APIs(as it is the dump of the Slackoverflow data.)

If that's the case, I'm not sure whether we should remove the archive. Since we already have the archive data, shouldn't we just ingest it and use the API only for data added after that?

sunank200 (Collaborator) commented

> > Ingest the data from StackOverflow APIs and make sure the data from the archive is ingested as well from the APIs(as it is the dump of the Slackoverflow data.)

> If that's the case, I'm unsure whether we should remove the archive. As we already have the archive data, shouldn't we just ingest that data, and use the API to parse data after that?

We will create a new Weaviate class and do a fresh ingest. We don't need to remove the archive data from the old database.

Lee-W force-pushed the add-stack-overflow-ingest branch 4 times, most recently from ee0c2cc to 7057444, on December 11, 2023 at 23:51.
Lee-W (Collaborator, Author) commented Dec 19, 2023

@sunank200 Now that #194 is done, should we re-review this one?

Lee-W merged commit 14ad2bf into main on Dec 19, 2023.
7 checks passed.
Lee-W deleted the add-stack-overflow-ingest branch on December 19, 2023 at 09:15.
Lee-W mentioned this pull request on Dec 19, 2023.
Labels: None yet
Projects: None yet
Development: successfully merging this pull request may close the issue "Add ingest for stack overflow".

4 participants