This repository has been archived by the owner on Jan 15, 2024. It is now read-only.
This seems to be caused by intermittent downtime of the clic.cimec.unitn.it server.
While we are likely allowed to redistribute the MEN dataset (it is not 100% clear whether the permissive license applies to the dataset itself or only to models trained on it) and could thereby control the source server, we can't redistribute most other datasets.
Therefore I suggest not cleaning the cached datasets downloaded from external hosts between subsequent test runs. To make sure we catch URLs that go down permanently, we would have to set up a separate CI job that periodically runs the complete test suite after cleaning the cached datasets.
I don't have access to the CI configuration, but I suppose this is feasible. If there is no objection I will implement the cache of external datasets now to make sure we won't experience any such intermittent test failures in future. Then in a follow-up we can set up the scheduled test without cache. For that I either need access to the CI configuration or someone (@szha ?) would have to enable it.
Please let me know if you have any alternative suggestions
Well, is caching allowed for datasets that we are not allowed to redistribute? I suggest we only use datasets whose license situation is clear. Which datasets used in the embedding evaluation have a license that permits free redistribution?
By cache I mean that the CI server doesn't re-download the dataset on every run. This means that even if the authors' web server goes down intermittently, the tests would still pass.
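To make the idea concrete, here is a minimal sketch of such a download cache. It is not the toolkit's actual implementation; the names `cached_download`, `cache_dir`, and `fetch` are hypothetical and chosen only for illustration. The file is downloaded on the first call and reused on later calls, so a temporary outage of the remote host does not affect runs that already have the file.

```python
import hashlib
import os
import urllib.request


def _fetch_url(url):
    """Download `url` and return its content as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def cached_download(url, cache_dir, fetch=None):
    """Return a local path for `url`, downloading only if it is not cached.

    `fetch` is a hypothetical hook (url -> bytes) so that tests can
    replace the real network call; it defaults to `_fetch_url`.
    """
    if fetch is None:
        fetch = _fetch_url
    os.makedirs(cache_dir, exist_ok=True)

    # Derive a stable file name from the URL.
    name = hashlib.sha1(url.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, name)

    # Cache hit: the remote server is not contacted at all.
    if os.path.exists(path):
        return path

    # Cache miss: write to a temporary file first so an interrupted
    # download never leaves a truncated file under the final name.
    data = fetch(url)
    tmp_path = path + ".part"
    with open(tmp_path, "wb") as f:
        f.write(data)
    os.replace(tmp_path, path)
    return path
```

A scheduled job that checks for permanently dead URLs would simply run the same tests against an empty `cache_dir`.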
Redistribution would allow us to upload the datasets to S3 and replace the links to the authors' web server with a link to our S3 bucket. S3 is unlikely to go down, so our tests wouldn't be flaky in the first place. Unfortunately, 90% of the datasets for Embedding Eval, as well as CoNLL (and possibly others), do not allow redistribution.
However we plan to contact the dataset owners once the toolkit is released and ask for special redistribution permission. This would save them data transfer costs caused by users of our toolkit, so they may agree.
Closing this now as #58 and #62 should have solved the issue.
You may need to make a code-change to rerun the tests.
This error caused the CI for this PR to fail: #55
The detailed error msg is as below: