This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

"Matrix Factorization" R test failing due to unavailable HTTP server. #9332

Closed
KellenSunderland opened this issue Jan 6, 2018 · 3 comments


@KellenSunderland

KellenSunderland commented Jan 6, 2018

Description

All PRs are currently failing because an R test attempts to download a dataset from a web server that is unavailable.

The unavailable file is http://files.grouplens.org/datasets/movielens/ml-100k.zip.
The affected test: "Matrix Factorization"

Environment info (Required)

CI

Notes

Disabled in PR #9333

@szha Would you be able to tag this one as a flaky test?

@KellenSunderland

It looks like this dataset has been posted again. It's a 4.9 MB file, and the licensing and redistribution details are included in an attached README. The sections that seem relevant to me are:

Neither the University of Minnesota nor any of the researchers
involved can guarantee the correctness of the data, its suitability
for any particular purpose, or the validity of results based on the
use of the data set.  The data set may be used for any research
purposes under the following conditions:

     * The user may not state or imply any endorsement from the
       University of Minnesota or the GroupLens Research Group.

     * The user must acknowledge the use of the data set in
       publications resulting from the use of the data set
       (see below for citation information).

     * The user may not redistribute the data without separate
       permission.

     * The user may not use this information for any commercial or
       revenue-bearing purposes without first obtaining permission
       from a faculty member of the GroupLens Research Project at the
       University of Minnesota.

Given these terms, it remains unclear to me whether we can cache this data somewhere in our CI setup. If no one objects, I'd recommend contacting the authors.

@KellenSunderland

Another option, given the simplicity of the dataset (movie ratings), would be to generate a dataset locally that follows the same format.
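To make the idea concrete, here is a minimal sketch of what local generation could look like. It is written in Python purely for illustration (the actual test is in R), and it assumes the test reads the `u.data` file from the archive, which is tab-separated with the columns user id, item id, rating (1 to 5) and timestamp. The function names, default sizes and timestamp range are all hypothetical.

```python
import random


def generate_ratings(num_users=50, num_items=40, num_ratings=500, seed=0):
    """Return synthetic (user_id, item_id, rating, timestamp) rows.

    IDs are 1-based and each (user, item) pair appears at most once.
    The sizes and the timestamp range are arbitrary illustration values.
    """
    rng = random.Random(seed)  # fixed seed keeps CI runs reproducible
    all_pairs = [(u, i)
                 for u in range(1, num_users + 1)
                 for i in range(1, num_items + 1)]
    pairs = rng.sample(all_pairs, num_ratings)
    return [(u, i, rng.randint(1, 5), rng.randint(880000000, 890000000))
            for u, i in pairs]


def to_tsv(rows):
    """Serialize rows as tab-separated lines, one rating per line."""
    return "".join("\t".join(str(v) for v in row) + "\n" for row in rows)
```

Writing the result of `to_tsv(generate_ratings())` to a local file would give the test something to read without any network access, and without redistributing the original data.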

@KellenSunderland

This test never actually calls predict or expect_equal, unlike, for example, the 'MNIST' test. This further convinces me that we don't need real data here and can probably use generated data. I also get the impression that we can narrow the scope of this test so that it focuses on a single 'unit under test'.
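As a rough illustration of a narrower test that runs on generated data and actually asserts something, here is a sketch in plain Python. It is not the MXNet R test and does not use the MXNet API: the factorization is a hand-written SGD loop, and every name and hyperparameter in it is an assumption chosen for the example.

```python
import random


def make_low_rank_ratings(num_users=20, num_items=15, rank=2, seed=0):
    """Build dense synthetic ratings from random rank-`rank` factors."""
    rng = random.Random(seed)
    U = [[rng.uniform(0.5, 1.5) for _ in range(rank)] for _ in range(num_users)]
    V = [[rng.uniform(0.5, 1.5) for _ in range(rank)] for _ in range(num_items)]
    return [(u, i, sum(a * b for a, b in zip(U[u], V[i])))
            for u in range(num_users) for i in range(num_items)]


def train_mf(ratings, num_users, num_items, rank=2, lr=0.05, epochs=50, seed=1):
    """Fit user/item factors with plain SGD; return them plus per-epoch MSE."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(num_users)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(num_items)]
    history = []
    for _ in range(epochs):
        total = 0.0
        for u, i, r in ratings:
            err = r - sum(p * q for p, q in zip(P[u], Q[i]))
            total += err * err
            for k in range(rank):
                pu, qi = P[u][k], Q[i][k]  # use pre-update values for both steps
                P[u][k] += lr * err * qi
                Q[i][k] += lr * err * pu
        history.append(total / len(ratings))
    return P, Q, history


def test_training_reduces_error():
    """The single property under test: training lowers the squared error."""
    ratings = make_low_rank_ratings()
    _, _, history = train_mf(ratings, num_users=20, num_items=15)
    assert history[-1] < history[0]
```

The point of the sketch is the shape of the test rather than the model: one generated input, one unit under test, and one explicit assertion.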
