Move tutorial datasets to new S3 bucket #79

Closed
julian-risch opened this issue Nov 25, 2022 · 1 comment
Labels
enhancement New feature or request

Comments

@julian-risch
Member

With the new S3 bucket https://core-engineering.s3.eu-central-1.amazonaws.com/public/ and its public folder, we should move and possibly also rename all datasets used in the tutorials.

There are individual copies of some datasets for each tutorial to facilitate telemetry. We need to decide on a naming scheme. I would be okay with a number as a suffix, just as we have done until now, but maybe we can come up with an alternative? The downside of a number is that it might not stay in sync with the order of the tutorials on our website or with the separation into beginner/intermediate/advanced tutorials.

This is how it's currently done: https://github.com/deepset-ai/haystack/blob/ddeaf2c98c157af1e26c637bcb563c6ea52fdcb7/haystack/telemetry.py#L187
"https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt1.zip": "1",
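For illustration, the quoted line reads like one entry in a mapping from dataset URL to tutorial number. A minimal sketch of such a lookup follows, assuming that structure; the names `DATASET_URL_TO_TUTORIAL_ID` and `tutorial_id_for` are hypothetical and not taken from `haystack/telemetry.py`, and only the single entry quoted above is included.

```python
from typing import Optional

# Hypothetical name. The one entry below is quoted from this issue; the
# actual mapping in haystack/telemetry.py may be named and shaped differently.
DATASET_URL_TO_TUTORIAL_ID = {
    "https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt1.zip": "1",
}


def tutorial_id_for(url: str) -> Optional[str]:
    """Return the tutorial ID recorded for a dataset URL, or None if unknown."""
    return DATASET_URL_TO_TUTORIAL_ID.get(url)
```

With a lookup of this shape, moving or renaming a dataset means updating the key, and any URL that is missing from the mapping simply yields no tutorial ID.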

What do you think? @brandenchan @bilgeyucel

Changes are needed in Haystack to make sure telemetry continues working. There is an issue for that in Haystack here: deepset-ai/haystack#3634

@julian-risch added the bug and enhancement labels and removed the bug label on Nov 25, 2022
@bilgeyucel
Contributor

We use Hugging Face datasets now.
