Questions about Dataset #66

yuasaonrails · 2016-09-06T09:30:55Z

Hi,
I was playing with the sample data, and now I have 3 questions.

Q1. How do I make a dataset with multiple feature values?
Currently, each feature has only one feature value. Is it possible for a feature to have multiple values? If so, how can I do that?

Q2. Manually changing all timestamps to 1 gives me a different result.
ml20m-all is a dataset of userId and movieId pairs, each with a timestamp.

userId movieId,timestamp: movieId,timestamp: movieId,timestamp…

In issue #21, Mr. Rejith said “Currently no movie features are taken. Currently only 1/0 signals are supported from the wrapper script even though the Engine supports analog signals.”
So I changed all timestamps in ml20m-all to 1, and ran DSSTNE with the modified data.
e.g. 2,1112486027:29,1112484676:32,1112484819 becomes 2,1:29,1:32,1
I thought the results would be the same, but they were not.
I am guessing that DSSTNE treats the feature value as a continuous value. Is this right? Then why did DSSTNE give me a different result?
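For anyone who wants to reproduce the modification described above, here is a minimal sketch of the rewrite step. It assumes each line is a user id, then whitespace, then colon-separated `movieId,timestamp` pairs, as in the format shown earlier. The function name `set_values_to_one` is made up for this example and is not part of DSSTNE.

```python
import re


def set_values_to_one(line: str) -> str:
    """Rewrite 'user<ws>item,value:item,value' so that every value becomes 1."""
    stripped = line.rstrip("\n")
    match = re.match(r"(\S+)(\s+)(.*)$", stripped)
    if match is None:
        # Blank or malformed line: return it unchanged.
        return stripped
    user, separator, rest = match.groups()
    # Keep only the item id of each 'item,value' pair; skip empty chunks.
    items = [pair.split(",", 1)[0].strip() for pair in rest.split(":") if pair.strip()]
    return user + separator + ":".join(f"{item},1" for item in items)


# The pairs are taken from the example above; the user id 1 is an assumption.
print(set_values_to_one("1\t2,1112486027:29,1112484676:32,1112484819"))
```

Applying this function to every line of a copy of ml20m-all should give a file in the same layout with all values set to 1.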

Q3. Does DSSTNE support digital inputs?
In issue #11, Mr. Rejith said “DSSTNE Engine supports analog inputs but we have not exposed it in the wrapper . if the Rating comes it could be viewed as an analog signals”
Analog inputs such as ratings are continuous values, so I wondered whether DSSTNE also supports digital inputs such as a category id, which is a discrete value.

DSSTNE is wonderful. I feel like it has so much potential.
But I couldn't figure out how to use it well, and I couldn't find detailed documentation online.

Thank you,
yuasa

rgeorgej · 2016-09-13T17:23:03Z

Answers to the Questions
A1. Right now the utilities that we provide do not support it. You can write another wrapper where each dataset is a locally connected network with all the feature values. It's a bit of a hassle and I wouldn't recommend it, but that is the only way right now.

A2. Can you explain the difference in the results? Are the floating-point values different, or are the recommendations completely different? Can you paste both results?

A3. The DSSTNE engine supports it. When you create the dataset in NetCDF format, you need to add the analog value. https://github.com/amznlabs/amazon-dsstne/blob/master/src/amazon/dsstne/utils/NetCDFhelper.cpp#L264 is the wrapper that you should call.
I can work on exposing it in the wrapper.

tristanpenman · 2016-09-13T17:35:28Z

As far as Q2 is concerned, you may find that your results differ between runs due to DSSTNE's de-noising feature. When you use the example config.json file linked in the docs (https://s3-us-west-2.amazonaws.com/amazon-dsstne-samples/configs/config.json) the de-noising value is set to 0.2.

Can you try setting this value to 0 and re-running your tests?

If you continue to see differences in the output, we can dig into the issue further.
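A small sketch of how the de-noising value could be changed programmatically rather than by hand. The key path `["Denoising", "p"]` is an assumption about the layout of the sample config.json and should be checked against your own file. `set_nested` is a hypothetical helper, not part of DSSTNE.

```python
import json


def set_nested(config: dict, path: list, value) -> dict:
    """Return a copy of config with the value at the nested key path replaced."""
    updated = json.loads(json.dumps(config))  # deep copy via a JSON round-trip
    node = updated
    for key in path[:-1]:
        node = node.setdefault(key, {})
    node[path[-1]] = value
    return updated


# Hypothetical excerpt of a config; the real file contains many more keys.
sample = {"Denoising": {"p": 0.2}}
print(json.dumps(set_nested(sample, ["Denoising", "p"], 0.0)))
```

To use it on the real file, load config.json with `json.load`, apply `set_nested`, write the result back with `json.dump`, and then retrain before running predict again.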

yuasaonrails · 2016-09-14T12:44:10Z

@rgeorgej

Thanks for your reply. It’s really helpful.

Re:A1
I understand that it is possible but difficult. I am not going to try it, since you don't recommend it.

Re:A2
Here are the results.

What I did: I changed all timestamps in ml20m-all to 1, and ran DSSTNE.
config.json is from https://s3-us-west-2.amazonaws.com/amazon-dsstne-samples/configs/config.json, and I didn't edit it.
The commands are as follows:
generateNetCDF -d gl_input -i ml20m-all -o gl_input.nc -f features_input -s samples_input -c
generateNetCDF -d gl_output -i ml20m-all -o gl_output.nc -f features_output -s samples_input -c
wget https://s3-us-west-2.amazonaws.com/amazon-dsstne-samples/configs/config.json
train -c config.json -i gl_input.nc -o gl_output.nc -n gl.nc -b 256 -e 10
predict -b 1024 -d gl -i features_input -o features_output -k 10 -n gl.nc -f ml20m-all -s recs -r ml20m-all

The difference is that the result for the modified ml20m-all contains movie_ids the user has already watched.
The -f (filter) option of predict does not seem to work with the modified ml20m-all.

Result of original ml20m-all
1 2571,0.905:1206,0.864:1210,0.852:1270,0.794:1274,0.668:592,0.662:6874,0.653:1197,0.631:5618,0.612:3793,0.611:
2 1200,0.464:1240,0.460:1097,0.361:1127,0.288:32,0.265:593,0.251:2571,0.239:2628,0.237:1198,0.211:780,0.201:
3 2716,0.878:1580,0.874:1527,0.811:2021,0.800:750,0.798:1371,0.775:1387,0.705:3471,0.690:2174,0.679:1320,0.671:
4 500,0.561:597,0.532:457,0.526:587,0.414:780,0.371:592,0.370:442,0.355:344,0.328:539,0.317:364,0.275:
5 356,0.963:1,0.878:539,0.841:597,0.811:527,0.735:357,0.653:586,0.653:592,0.616:34,0.570:733,0.559:
6 95,0.611:786,0.544:5,0.537:32,0.525:36,0.397:104,0.391:376,0.384:25,0.379:608,0.363:784,0.350:
7 2724,0.791:1569,0.761:4022,0.743:2571,0.742:3623,0.741:2706,0.699:2763,0.696:4246,0.689:1961,0.683:1584,0.674:
8 586,0.735:318,0.679:420,0.639:410,0.638:440,0.612:225,0.602:34,0.590:474,0.575:160,0.564:300,0.516:
9 2858,0.468:2762,0.296:3578,0.271:2694,0.206:593,0.178:3273,0.175:2712,0.173:2572,0.173:3005,0.170:2541,0.169:
10 1193,0.441:110,0.354:1291,0.303:593,0.272:1213,0.250:1234,0.249:1270,0.245:1036,0.237:318,0.235:1225,0.223:

Result of modified ml20m-all
1 1214,0.968:1196,0.966:4993,0.962:260,0.957:1258,0.947:5952,0.947:1200,0.943:296,0.936:1198,0.918:2571,0.905:
2 1210,0.743:260,0.677:1196,0.624:1214,0.597:589,0.495:1200,0.464:1240,0.460:480,0.442:1270,0.437:1097,0.361:
3 1196,0.990:260,0.986:1210,0.983:1214,0.977:1200,0.974:1240,0.974:1270,0.967:589,0.957:541,0.957:1374,0.956:
4 480,0.892:589,0.819:356,0.784:377,0.700:500,0.561:597,0.532:457,0.526:586,0.477:367,0.462:587,0.414:
5 356,0.963:260,0.958:480,0.958:780,0.932:364,0.904:457,0.886:1,0.878:500,0.875:588,0.851:539,0.841:
6 780,0.937:736,0.869:648,0.835:1,0.741:1073,0.713:62,0.693:141,0.677:733,0.636:95,0.611:260,0.577:
7 1580,0.954:1721,0.954:2396,0.950:480,0.931:1097,0.928:597,0.922:1270,0.918:3408,0.904:2628,0.887:1210,0.874:
8 480,0.976:592,0.974:457,0.974:356,0.968:380,0.964:589,0.955:590,0.955:377,0.953:165,0.938:153,0.927:
9 2706,0.491:2858,0.468:2710,0.444:2683,0.424:2959,0.331:2762,0.296:3578,0.271:2694,0.206:858,0.184:593,0.178:
10 858,0.866:260,0.828:1198,0.790:1196,0.767:1210,0.678:1221,0.669:2028,0.491:912,0.446:1193,0.441:527,0.412:

For example:
1 2571,0.905:1206,0.864:1210,0.852:1270,0.794:1274,0.668:592,0.662:6874,0.653:1197,0.631:5618,0.612:3793,0.611:
1 1214,0.968:1196,0.966:4993,0.962:260,0.957:1258,0.947:5952,0.947:1200,0.943:296,0.936:1198,0.918:2571,0.905:
The floating-point value for movie_id 2571 is the same: 0.905 in both results. But the latter result contains movie ids that the user has already watched. I've checked that in ml20m-all.
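The manual check described above can be automated with a short script. This is only a sketch: it assumes both the input file and the recs file use the `id<whitespace>item,value:item,value:` layout shown in this thread. The helper names are invented for the example.

```python
def parse_items(line: str) -> tuple:
    """Split 'id<ws>item,value:item,value:' into (id, [item ids])."""
    parts = line.split(None, 1)
    if len(parts) < 2:
        return (parts[0] if parts else "", [])
    items = [p.split(",", 1)[0].strip() for p in parts[1].split(":") if p.strip()]
    return (parts[0], items)


def watched_in_recs(history_line: str, recs_line: str) -> list:
    """Return recommended item ids that also appear in the user's history."""
    history_user, watched = parse_items(history_line)
    recs_user, recommended = parse_items(recs_line)
    if history_user != recs_user:
        raise ValueError("lines belong to different users")
    seen = set(watched)
    return [item for item in recommended if item in seen]


# The history line is a made-up stand-in; only movie 1214 is confirmed in this thread.
history = "1\t1214,1:2,1"
recs = "1 1214,0.968:1196,0.966:4993,0.962:260,0.957:1258,0.947:"
print(watched_in_recs(history, recs))
```

An empty list for every user would mean the filter removed all previously watched items. Any non-empty list points at the problem reported here.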

Re:A3
I am confused, and I would like your clarification. My question was whether or not DSSTNE supports digital inputs.
Your answer was that DSSTNE supports it. Then you said “you need to add the analog value”.
From the source code you mentioned, I guess it is supported. Could you clarify that, please?
And please do work on exposing it in the wrapper. I believe everyone would love it.

Thanks,
yuasa

yuasaonrails · 2016-09-14T12:46:41Z

@tristanpenman
Yes, I can!

I changed the de-noising value to 0 in config.json, and I got a different result.
However, even after editing config.json, I still got watched movie ids in the result for the modified ml20m-all.
I didn't change any of the commands from the example page.

original ml20m-all
1 1210,0.909:2571,0.900:1206,0.882:1270,0.807:1274,0.713:1197,0.690:6874,0.687:3793,0.680:5618,0.641:1199,0.638:
2 1200,0.497:1240,0.488:1097,0.391:1127,0.332:2628,0.280:2571,0.272:32,0.263:1374,0.227:780,0.218:1198,0.212:
3 1580,0.861:2716,0.845:750,0.841:2021,0.790:1371,0.781:1527,0.772:1387,0.734:3471,0.721:608,0.704:1136,0.689:
4 500,0.642:597,0.614:457,0.526:587,0.481:780,0.450:442,0.427:539,0.371:592,0.334:344,0.330:485,0.321:
5 356,0.965:539,0.862:1,0.846:597,0.827:527,0.746:357,0.686:586,0.680:733,0.615:592,0.571:34,0.566:
6 95,0.617:5,0.550:786,0.545:32,0.509:104,0.400:36,0.390:376,0.383:25,0.368:784,0.347:805,0.346:
7 2724,0.791:4022,0.754:3623,0.752:1569,0.747:2571,0.710:2763,0.705:4246,0.699:2706,0.693:3418,0.677:1240,0.672:
8 586,0.755:318,0.651:410,0.632:420,0.630:34,0.627:440,0.598:225,0.597:474,0.576:160,0.559:300,0.518:
9 2858,0.533:2762,0.322:3578,0.286:2694,0.228:2572,0.191:593,0.188:2712,0.185:2541,0.185:3005,0.175:2716,0.172:
10 1193,0.458:110,0.357:1291,0.349:1234,0.271:593,0.269:919,0.255:1270,0.252:1213,0.250:1036,0.242:1225,0.238:

modified ml20m-all
1 1214,0.982:1196,0.982:260,0.979:4993,0.977:1200,0.970:5952,0.966:1258,0.959:1198,0.947:296,0.944:541,0.933:
2 1210,0.787:260,0.692:1196,0.670:1214,0.612:589,0.565:480,0.516:1200,0.497:1270,0.490:1240,0.488:1097,0.391:
3 1196,0.992:260,0.989:1214,0.981:1210,0.981:1200,0.974:1270,0.974:1240,0.973:589,0.966:541,0.966:1374,0.956:
4 480,0.932:589,0.864:356,0.836:377,0.774:500,0.642:597,0.614:586,0.559:457,0.526:367,0.524:587,0.481:
5 260,0.975:356,0.965:480,0.964:780,0.946:457,0.902:364,0.900:500,0.885:1210,0.872:539,0.862:648,0.854:
6 780,0.942:736,0.877:648,0.843:1,0.780:1073,0.723:62,0.709:141,0.683:733,0.638:95,0.617:260,0.583:
7 2396,0.948:1721,0.947:1580,0.934:480,0.931:597,0.929:1097,0.924:1270,0.911:3408,0.909:2671,0.874:356,0.871:
8 480,0.984:457,0.978:356,0.977:592,0.976:380,0.968:589,0.963:377,0.963:590,0.959:165,0.947:153,0.937:
9 2706,0.555:2858,0.533:2683,0.478:2710,0.475:2959,0.395:2762,0.322:3578,0.286:2694,0.228:858,0.198:2572,0.191:
10 858,0.904:260,0.891:1198,0.852:1196,0.829:1210,0.768:1221,0.708:912,0.515:2028,0.505:1193,0.458:527,0.446:

head -1 ml20m-all | grep 1214
I saw that user_id 1 watched movie_id 1214.

Can you dig into this further?

Also, what do you mean by “you may find that your results differ between runs due to DSSTNE's de-noising feature”? I get the same result every time I run with the same config.json.

Thanks,
yuasa

tristanpenman · 2016-09-14T15:15:19Z

@yuasa, did you retrain the model after changing config.json, or did you just re-run 'predict'?

And when I say that your results may differ, I meant that re-running 'predict' with a de-noising value > 0 may give you different recommendations.

yuasaonrails · 2016-09-15T12:13:10Z

@tristanpenman

I retrained.
I removed gl.nc, ran 'train' because I thought train used config.json (since I saw -c config.json in train -c config.json -i gl_input.nc -o gl_output.nc -n gl.nc -b 256 -e 10), and ran 'predict'.

And yes, I got different recommendations.

tristanpenman · 2016-09-15T16:38:02Z

@yuasaonrails, thanks for taking the time to do all the extra debugging. We're going to work on some features / enhancements to better support analog and digital inputs. I've created issue #69 as a starting point. You can watch that ticket via Github notifications to track progress.

In the meantime, I suggest closing this ticket. It has been referenced in issue #69, and can always be reopened in the future.

yuasaonrails · 2016-09-16T00:55:08Z

@tristanpenman
Thank you too.
It was my pleasure to help you and DSSTNE users around the world.
I can't wait for the wrapper!

Now, I am closing this issue.
Thanks again for your hard work on DSSTNE.

beeva-enriqueotero · 2016-11-16T14:41:37Z

Hello @yuasaonrails

I realized that the problem you hit is caused by the predict -f filter, specifically by a bug/hack in Filters.cpp.

So items are not filtered for low numeric values (<= 10.0), and already viewed items can be recommended.

Otherwise, apart from this (important!) filter issue, the behaviour is the same with timestamps or ratings. All these values are ignored with the default 'indicator' type.

Regards, and thanks to DSSTNE team for sharing your work!

yuasaonrails · 2017-04-05T01:35:45Z

@beeva-enriqueotero
Hello.

Thanks for your help.
What do you mean by "The behaviour is the same with timestamps or ratings. All these values are ignored with default 'indicator' type."

beeva-enriqueotero · 2017-04-05T09:44:50Z

Hello @yuasaonrails

I mean that only implicit feedback ("indicator" type) is implemented, so any real or integer value, whether a timestamp or a rating, is ignored. The only exception is the predict -f filter issue I referred to in my previous comment.
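To illustrate what "ignored" means here, the sketch below reduces a line to the set of item ids it mentions and drops every attached value. This is not DSSTNE code, only a model of indicator-style (present/absent) encoding under the input layout discussed in this thread. `indicator_items` is a made-up name.

```python
def indicator_items(line: str) -> frozenset:
    """Return the set of item ids on a line, discarding every attached value."""
    parts = line.split(None, 1)
    if len(parts) < 2:
        return frozenset()
    return frozenset(
        pair.split(",", 1)[0].strip() for pair in parts[1].split(":") if pair.strip()
    )


# The pairs are from the example earlier in this issue; the user id 1 is an assumption.
with_timestamps = "1\t2,1112486027:29,1112484676:32,1112484819"
with_ones = "1\t2,1:29,1:32,1"

# Both lines reduce to the same indicator set, so the values make no difference.
print(indicator_items(with_timestamps) == indicator_items(with_ones))  # True
```

Under this model the original and the modified ml20m-all look identical to training, which is consistent with the scores for shared items matching in the results posted above.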

There is an open issue asking for enhancement to correctly handle "analog" type: #69

Regards

yuasaonrails · 2017-04-05T12:09:36Z

@beeva-enriqueotero

Thank you very much for the clarification.
I understand.

Your help is greatly appreciated

rgeorgej assigned mohanasudhan Sep 7, 2016

rgeorgej added the help wanted label Sep 7, 2016

tristanpenman mentioned this issue Sep 15, 2016

Correctly handle analog and digital inputs in utility/wrapper application #69

Open

yuasaonrails closed this as completed Sep 16, 2016

yuasaonrails unassigned mohanasudhan Apr 5, 2017

beeva-enriqueotero mentioned this issue Nov 20, 2017

Does DSSTNE work on categorical string features/ binary features? #139

Open
