"Matrix Factorization" R test failing due to unavailable HTTP server. #9332

KellenSunderland · 2018-01-06T22:02:13Z

Description

All PRs are currently failing because an R test attempts to download a dataset from a web server that is unavailable.

The unavailable file is http://files.grouplens.org/datasets/movielens/ml-100k.zip.
The affected test: "Matrix Factorization"

Environment info (Required)

CI

Notes

Disabled in PR #9333

@szha Would you be able to tag this one as a flaky test?

KellenSunderland · 2018-01-09T09:31:05Z

Looks like this dataset has been posted again. It's a 4.9 MB file. The licensing and redistribution details are included in an attached README. The sections that seem relevant to me are:

Neither the University of Minnesota nor any of the researchers
involved can guarantee the correctness of the data, its suitability
for any particular purpose, or the validity of results based on the
use of the data set.  The data set may be used for any research
purposes under the following conditions:

     * The user may not state or imply any endorsement from the
       University of Minnesota or the GroupLens Research Group.

     * The user must acknowledge the use of the data set in
       publications resulting from the use of the data set
       (see below for citation information).

     * The user may not redistribute the data without separate
       permission.

     * The user may not use this information for any commercial or
       revenue-bearing purposes without first obtaining permission
       from a faculty member of the GroupLens Research Project at the
       University of Minnesota.

Given these terms, it is still unclear to me whether we can cache this data somewhere in our CI setup. If no one objects, I'd recommend contacting the authors.

KellenSunderland · 2018-01-09T09:32:43Z

Another option, given the simplicity of the dataset (movie ratings), would be to generate a dataset locally that follows the same format.
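To make the idea of a locally generated dataset concrete, here is a minimal sketch in Python (not the R test itself). It assumes the ratings file has the tab-separated layout user id, item id, rating, timestamp, as in the `u.data` file of ml-100k. The function names, default sizes, and timestamp range are hypothetical and chosen only for illustration.

```python
import random


def generate_ratings(num_users=50, num_items=40, num_ratings=500, seed=0):
    """Return synthetic (user_id, item_id, rating, timestamp) rows.

    Assumed layout: 1-based integer ids, integer ratings from 1 to 5,
    and a Unix timestamp. The values are random, not real MovieLens data.
    """
    if num_ratings > num_users * num_items:
        raise ValueError("more ratings requested than distinct user/item pairs")
    rng = random.Random(seed)  # seeded so every CI run sees the same data
    # Sample distinct (user, item) pairs so that no pair is rated twice.
    pair_ids = rng.sample(range(num_users * num_items), num_ratings)
    rows = []
    for pair_id in pair_ids:
        user_id = pair_id // num_items + 1
        item_id = pair_id % num_items + 1
        rating = rng.randint(1, 5)
        timestamp = 880000000 + rng.randrange(10000000)  # arbitrary range
        rows.append((user_id, item_id, rating, timestamp))
    return rows


def write_u_data(rows, path):
    """Write the rows as tab-separated text, one rating per line."""
    with open(path, "w", newline="\n") as handle:
        for row in rows:
            handle.write("\t".join(str(value) for value in row) + "\n")
```

Calling `write_u_data(generate_ratings(), "u.data")` would produce a small file that a test could read in place of the download, with no network access and no redistribution of the original data.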

KellenSunderland · 2018-01-09T09:53:53Z

This test never actually calls predict or expect_equal, unlike, for example, the 'MNIST' test. That further convinces me that we don't need real data here and can probably use generated data. I also get the impression that we can narrow the scope of this test so that it focuses on a single 'unit under test'.
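As a rough illustration of what a narrower test with an explicit assertion could look like, the sketch below trains a tiny matrix factorization on generated ratings and checks one property: the training error goes down. It is plain Python with a hand-written SGD loop, not the MXNet R API, and every name and hyperparameter in it is a hypothetical choice made for the example.

```python
import random


def make_ratings(num_users, num_items, num_ratings, seed):
    """Generate random (user, item, rating) triples with 0-based ids."""
    rng = random.Random(seed)
    pair_ids = rng.sample(range(num_users * num_items), num_ratings)
    return [(p // num_items, p % num_items, float(rng.randint(1, 5)))
            for p in pair_ids]


def mean_squared_error(ratings, user_factors, item_factors):
    """Average squared difference between ratings and factor dot products."""
    total = 0.0
    for user, item, rating in ratings:
        pred = sum(a * b for a, b in zip(user_factors[user], item_factors[item]))
        total += (rating - pred) ** 2
    return total / len(ratings)


def train(ratings, num_users, num_items, rank=4, epochs=30, lr=0.02, seed=0):
    """Fit the factors with plain SGD; return the MSE before and after."""
    rng = random.Random(seed)
    user_factors = [[rng.uniform(0.0, 0.5) for _ in range(rank)]
                    for _ in range(num_users)]
    item_factors = [[rng.uniform(0.0, 0.5) for _ in range(rank)]
                    for _ in range(num_items)]
    before = mean_squared_error(ratings, user_factors, item_factors)
    for _ in range(epochs):
        for user, item, rating in ratings:
            p, q = user_factors[user], item_factors[item]
            err = rating - sum(a * b for a, b in zip(p, q))
            for k in range(rank):
                p_k = p[k]  # keep the old value for the item update
                p[k] += lr * err * q[k]
                q[k] += lr * err * p_k
    after = mean_squared_error(ratings, user_factors, item_factors)
    return before, after
```

A test built on this would assert only that `after < before` for a fixed seed. That keeps the unit under test to the training step and removes the dependency on downloaded data.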

KellenSunderland mentioned this issue Jan 7, 2018

Temporarily disable test with failing http connection #9333

Merged

szha added Test Flaky labels Jan 8, 2018

KellenSunderland mentioned this issue Jan 11, 2018

Revert "Temporarily disable test with failing http connection (#9333)" #9379

Merged

szha mentioned this issue Jan 13, 2018

Flaky Tests Tracking Issue #9412

Closed

jeremiedb mentioned this issue Jan 27, 2018

Fix flaky test R #9598

Merged

szha closed this as completed Jan 31, 2018
