Low ranking accuracy of the example with MovieLens20M? #24

Closed
saulvargas opened this Issue May 20, 2016 · 17 comments

saulvargas commented May 20, 2016

Hi,

I've been playing around today with DSSTNE, with the goal of running the MovieLens20M example and comparing the NN in the example with some state-of-the-art CF algorithms that I have implemented here: https://github.com/RankSys/RankSys. From my evaluation (which is by no means exhaustive or perfect), the example provided by DSSTNE does not seem to be competitive with state-of-the-art CF algorithms.

To summarise, I downloaded the original MovieLens 20M dataset and performed a random 80%-20% partition. I transformed the training subset to the DSSTNE format, the only difference being that I do not include the timestamps from the dataset, but 1's for all movies (is this actually very important?). I generated recommendations with my CF algorithms (popularity, user-based and matrix factorisation) and, following the steps in the example, the predictions of DSSTNE. Finally, I evaluated the performance on the test subset using precision at cutoff 10.
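For reference, the conversion described above (replacing timestamps with 1's) could be sketched roughly like this; the helper names and the exact tab/colon-separated line layout are my assumptions about the DSSTNE example's input format, not code from this thread:

```python
import csv
from collections import defaultdict

def to_dsstne_lines(rating_rows):
    """Group (userId, movieId) pairs by user and emit one line per user,
    assuming DSSTNE's text format of userId<TAB>movieId,value:movieId,value:...
    Every movie gets the constant value 1 in place of its timestamp."""
    user_movies = defaultdict(list)
    for user_id, movie_id in rating_rows:
        user_movies[user_id].append(movie_id)
    return [user_id + "\t" + ":".join(m + ",1" for m in movies)
            for user_id, movies in user_movies.items()]

def convert_ratings_csv(src_path, dst_path):
    """Read MovieLens ratings.csv (userId,movieId,rating,timestamp) and
    write a DSSTNE-style input file, dropping ratings and timestamps."""
    with open(src_path) as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        lines = to_dsstne_lines((row[0], row[1]) for row in reader)
    with open(dst_path, "w") as out:
        out.write("\n".join(lines) + "\n")
```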

These are the results; the configuration provided in your example does not seem to work very well:
pop 0.10974162112149495
ub 0.24097987334078072
mf 0.25135912784469483
dsstne 0.056956854920365056
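For clarity, precision at cutoff 10 as used in this comparison can be computed with a small helper like the following (a minimal sketch; the function names are illustrative and not taken from RankSys):

```python
def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k recommended items that appear in the
    user's test set (the 20% held-out interactions)."""
    top_k = recommended[:k]
    return sum(1 for item in top_k if item in relevant) / k

def mean_precision_at_k(recs_by_user, test_by_user, k=10):
    """Average precision@k over all users that have test interactions."""
    users = list(test_by_user)
    return sum(precision_at_k(recs_by_user.get(u, []), test_by_user[u], k)
               for u in users) / len(users)
```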

I am no expert in ANNs, so I cannot easily figure out whether I should modify the parameters in the config.json provided in the example to make it work better. Have you compared the performance of the example with similar CF algorithms? If so, could you please share some results/insights?

Cheers
Saúl

scottlegrand (Contributor) commented May 20, 2016

Hey Saul,
Nice work. I suspect the problem here is one of open-sourcing the software without open-sourcing the secret-sauce networks actually in use. The network supplied is not one (as far as I know) that Amazon uses for recommendations, but rather a simple demo of a network one can build with DSSTNE.

One could say similar things of TensorFlow. Google has open-sourced its very nice framework, but clearly not all their networks nor all their implementations of its underlying engine.

Scott


saulvargas commented May 20, 2016

Hi Scott,

Thanks for your contribution.

Please let me clarify: I am not asking for the exact configuration Amazon uses in their systems. I believe it would suffice to provide one that performs well enough on a public dataset such as MovieLens 20M, using, for instance, configurations found in papers such as this one: http://dl.acm.org/citation.cfm?id=2835837.

It is awesome that Amazon releases code like this; I am just kindly requesting a little bit of guidance on how to make the provided example work.

Best wishes
Saúl

scottlegrand (Contributor) commented May 20, 2016

So one easy first step would be to add denoising to the example network, no?

Second, if you're willing, that's a very cool paper (I saw something like it at KDD 2015), and I'd love to help you implement it in DSSTNE if you're interested in doing so. I suspect doing so would either demonstrate the flexibility of DSSTNE or help provide additional API hooks to provide such flexibility. Interested?

Scott


scottlegrand (Contributor) commented May 20, 2016

P.S. Here's how to do that first step:

"Denoising" : {
    "p" : 0.3
},


rgeorgej (Contributor) commented May 23, 2016

@saulvargas, would it be possible to share the scripts you used to generate your train and test datasets?

saulvargas commented May 23, 2016

Hi @rgeorgej ,

Sure! I can prepare a simple repository with everything required to reproduce the experiment I performed, although it might take me a couple of days... I'll let you know.

Cheers
Saúl

scottlegrand (Contributor) commented May 30, 2016

Hey Saul, I'd love to make a benchmark out of this for both training speed and predictive performance. Any progress here?

saulvargas commented May 31, 2016

Hi,

I've been working in my spare time on a script and some Java code to fully reproduce the experiment I performed two weeks ago. It is still a work in progress, as I have yet to include the steps for DSSTNE, but meanwhile you can take a look here: https://github.com/saulvargas/dsstne-comparison/

Basically, executing run.sh downloads the original MovieLens20M dataset, performs a random 80/20 split for training and test, generates some CF baselines with RankSys and then evaluates the precision@10 of these baselines.

Cheers
Saúl

saulvargas commented Jun 4, 2016

Hi @rgeorgej

Sorry it took so long, but all the code required to reproduce the experiments I conducted is now at https://github.com/saulvargas/dsstne-comparison/, including training DSSTNE with my 80/20% split of the MovieLens20M data. I hope it helps. Just execute the steps in run.sh (the ub recommender might take a while to train).

Cheers
Saúl

rgeorgej (Contributor) commented Jun 6, 2016

Thanks @saulvargas for all the help; we will take it from here and work on getting you a decent config with good offline performance on the MovieLens data.

scottlegrand (Contributor) commented Jun 20, 2016

So with a fairly simple autoencoder applied to an 80/20 split of the dataset, I get a precision of 8.75% at 10. That's a far cry from your best efforts, but that's out of the gate, so let's see where we can take it.

scottlegrand (Contributor) commented Jun 20, 2016

So I was splitting 80/20 on users, while you were splitting on movie views. I'll have that data for you by tomorrow. I got P@10 to 9.3% for that partitioning on the second try, though, so I suspect there's lots of headroom for improvement.
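The two partitioning schemes being compared here hold out different things, which matters a lot for the resulting numbers. A minimal sketch of the difference (function names and details are illustrative, not from the actual benchmark code):

```python
import random

def split_by_users(interactions, test_frac=0.2, seed=0):
    """Hold out test_frac of the users entirely: all of a held-out
    user's interactions go to the test side."""
    rng = random.Random(seed)
    users = sorted({u for u, _ in interactions})
    test_users = set(rng.sample(users, int(len(users) * test_frac)))
    train = [(u, m) for u, m in interactions if u not in test_users]
    test = [(u, m) for u, m in interactions if u in test_users]
    return train, test

def split_by_interactions(interactions, test_frac=0.2, seed=0):
    """Hold out test_frac of the individual (user, movie) events at
    random; every user can appear on both sides of the split."""
    rng = random.Random(seed)
    shuffled = list(interactions)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_frac))
    return shuffled[:cut], shuffled[cut:]
```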

scottlegrand (Contributor) commented Jun 21, 2016

With this fix: 32.7% P@10, 48.4% P@1. I will post to GitHub tonight after work, but here's the first submission, incorporating input denoising and a sparseness penalty in the hidden layer:

{
    "Version" : 0.8,
    "Name" : "MovieLens Benchmark #1",
    "Kind" : "FeedForward",

    "SparsenessPenalty" : {
        "p" : 0.5,
        "beta" : 2.0
    },

    "ShuffleIndices" : false,

    "Denoising" : {
        "p" : 0.4
    },

    "ScaledMarginalCrossEntropy" : {
        "oneTarget" : 1.0,
        "zeroTarget" : 0.0,
        "oneScale" : 30.0,
        "zeroScale" : 1.0
    },

    "Layers" : [
        { "Name" : "Input0", "Kind" : "Input", "N" : "auto", "DataSet" : "input", "Sparse" : true },
        { "Name" : "Hidden", "Kind" : "Hidden", "Type" : "FullyConnected", "Source" : [ "Input0" ], "N" : 256, "Activation" : "Sigmoid", "Sparse" : true },
        { "Name" : "Output", "Kind" : "Output", "Type" : "FullyConnected", "Source" : [ "Hidden" ], "DataSet" : "output", "N" : "auto", "Activation" : "Sigmoid", "Sparse" : true }
    ],

    "ErrorFunction" : "ScaledMarginalCrossEntropy"
}

scottlegrand (Contributor) commented Jun 22, 2016

Round 2: MAP@10 of 41.1% and P@10 of 35.3%. Network supplied below:
{
    "Version" : 0.8,
    "Name" : "MovieLens Benchmark #2",
    "Kind" : "FeedForward",

    "ShuffleIndices" : false,

    "ScaledMarginalCrossEntropy" : {
        "oneTarget" : 1.0,
        "zeroTarget" : 0.0,
        "oneScale" : 1.0,
        "zeroScale" : 1.0
    },

    "Layers" : [
        { "Name" : "Input", "Kind" : "Input", "N" : "auto", "DataSet" : "input", "Sparse" : true },
        { "Name" : "Hidden1", "Kind" : "Hidden", "Type" : "FullyConnected", "Source" : "Input", "N" : 1536, "Activation" : "Relu", "Sparse" : false, "pDropout" : 0.5, "WeightInit" : { "Scheme" : "Gaussian", "Scale" : 0.01 } },
        { "Name" : "Hidden2", "Kind" : "Hidden", "Type" : "FullyConnected", "Source" : [ "Hidden1" ], "N" : 1536, "Activation" : "Relu", "Sparse" : false, "pDropout" : 0.5, "WeightInit" : { "Scheme" : "Gaussian", "Scale" : 0.01 } },
        { "Name" : "Output", "Kind" : "Output", "Type" : "FullyConnected", "DataSet" : "output", "N" : "auto", "Activation" : "Sigmoid", "Sparse" : true, "WeightInit" : { "Scheme" : "Gaussian", "Scale" : 0.01, "Bias" : -10.2 } }
    ],

    "ErrorFunction" : "ScaledMarginalCrossEntropy"
}

hadi-ds commented Jun 29, 2016

Hi there,

I have a related question about the MovieLens example and the choice of ErrorFunction.

First, I wonder whether, during the input/output data generation stage (generateNetCDF ...), the timestamps (how long users watch a movie) are actually recorded in the gl_input.nc file and used as an implicit measure of user-movie affinity, rather than being replaced by a label '1' indicating whether the user has watched the movie or not.

If the former is the case, I think a regression-type error function such as L2 should be used in this autoencoder; 'ScaledMarginalCrossEntropy' is relevant to a classification setting (like the alternative scenario I mentioned above).

thanks,

saulvargas commented Aug 26, 2016

Hi all,

Sorry it took me so long to get back to you with this.

Unfortunately, I have not been successful at obtaining decent ranking accuracy with either of the two latest configurations that @scottlegrand kindly provided. They basically perform as badly as the original one under my evaluation methodology. I suspect we may be applying different evaluation protocols and, therefore, the provided configurations may not be adequate for the one I am interested in.

If I find some time, I will try to learn enough about ANNs to understand how to come up with a configuration that results in high ranking accuracy for my setup. In the meantime, I think we can close this issue.

For your reference I am sharing the data I generated: https://www.dropbox.com/s/krk8mkzynn9igqv/dsstne-comparison-data.zip?dl=0

The code is already here: https://github.com/saulvargas/dsstne-comparison/

saulvargas closed this Aug 26, 2016

VedAustin commented Dec 1, 2016

I was wondering whether anyone has run this on the 100K dataset and evaluated its accuracy in terms of MSE. Are these benchmarks from DSSTNE publicly available?
