This repository has been archived by the owner on Aug 15, 2020. It is now read-only.

Questions about Dataset #66

Closed
yuasaonrails opened this issue Sep 6, 2016 · 12 comments

Comments

@yuasaonrails

Hi,
I was playing with the sample data, and now I have 3 questions.

Q1. How do I make a dataset with multiple feature values?
Currently, each feature has only one feature value. Is it possible for a feature to have multiple values? If so, how can I do that?

Q2. Manually changing all timestamps to 1 gives me a different result.
ml20m-all is the dataset of userId and movieId pairs with timestamps, in the following format:

userId movieId,timestamp: movieId,timestamp: movieId,timestamp…

On Issue#21, Mr.Rejith said “Currently no movie features are taken. Currently only 1/0 signals are supported from the wrapper script even though the Engine supports analog signals.”
So I changed all timestamps in ml20m-all to 1 and ran DSSTNE with the modified data.
e.g. 2,1112486027:29,1112484676:32,1112484819 became 2,1:29,1:32,1
I thought the results would be the same, but they were not.
I am guessing that DSSTNE treats the feature value as a continuous value. Is this right? Then why did DSSTNE give me a different result?
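
The rewrite described in Q2 could be scripted roughly as follows. This is a minimal sketch, not part of DSSTNE: it assumes each line is a user ID, a tab, and then colon-separated `movieId,timestamp` pairs, and the function name `set_values_to_one` and the user ID in the sample are illustrative.

```python
def set_values_to_one(line: str) -> str:
    """Replace the value in every 'id,value' pair with 1, keeping the user ID."""
    stripped = line.rstrip("\n")
    # ASSUMPTION: the user ID and the pair list are separated by a tab.
    user_id, sep, pairs = stripped.partition("\t")
    if not sep:
        return stripped  # no tab found: leave the line unchanged
    rewritten = [
        pair.split(",")[0] + ",1"
        for pair in pairs.split(":")
        if pair  # skip the empty piece left by a trailing ':'
    ]
    return user_id + "\t" + ":".join(rewritten)


if __name__ == "__main__":
    # The pairs come from the example in the question; the user ID is made up.
    sample = "1\t2,1112486027:29,1112484676:32,1112484819"
    print(set_values_to_one(sample))
```

Applying the function to every line of ml20m-all and writing the output to a new file would produce the modified dataset.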

Q3. Does DSSTNE support digital inputs?
On Issue#11, Mr.Rejith said “DSSTNE Engine supports analog inputs but we have not exposed it in the wrapper . if the Rating comes it could be viewed as an analog signals”
Analog inputs such as ratings are continuous values, so I wondered whether DSSTNE supports digital inputs such as a category ID, which is a discrete value.

DSSTNE is wonderful. I feel like it has so much potential.
But I couldn’t figure out how to use it well, and I couldn’t find detailed documentation online.

Thank you,
yuasa

@rgeorgej
Contributor

Answers to the Questions
A1. Right now, the utilities that we provide do not support it. You can write another wrapper where each dataset is a locally connected network with all the feature values. It's a bit of a hassle and I wouldn't recommend it, but that is the only way right now.

A2. Can you explain the difference in the results? Are the floating-point values different, or are the recommendations completely different? Can you paste both results?

A3. The DSSTNE engine supports it. When you create the dataset in NetCDF format, you need to add the analog value. https://github.com/amznlabs/amazon-dsstne/blob/master/src/amazon/dsstne/utils/NetCDFhelper.cpp#L264 is the wrapper that you should call.
I can work on exposing it in the wrapper.

@tristanpenman
Contributor

As far as Q2 is concerned, you may find that your results differ between runs due to DSSTNE's de-noising feature. When you use the example config.json file linked in the docs (https://s3-us-west-2.amazonaws.com/amazon-dsstne-samples/configs/config.json) the de-noising value is set to 0.2.

Can you try setting this value to 0 and re-running your tests?

If you continue to see differences in the output, we can dig into the issue further.
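
One way to make that change without editing the file by hand is sketched below. The key layout is an assumption: the sketch supposes the de-noising value is stored as `{"Denoising": {"p": 0.2}}`, which should be checked against the actual config.json. The helper name `disable_denoising` is illustrative.

```python
import json


def disable_denoising(config: dict) -> dict:
    """Return a copy of the config with the de-noising value set to 0."""
    updated = json.loads(json.dumps(config))  # deep copy via a JSON round trip
    # ASSUMPTION: the value lives under "Denoising" -> "p"; verify in config.json.
    updated.setdefault("Denoising", {})["p"] = 0.0
    return updated


if __name__ == "__main__":
    # Trimmed, hypothetical stand-in for the downloaded config.json.
    example = {"Denoising": {"p": 0.2}}
    print(json.dumps(disable_denoising(example), indent=4))
```

Because config.json is passed to train, the model presumably has to be retrained with the edited file before running predict again.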

@yuasaonrails
Author

@rgeorgej

Thanks for your reply. It’s really helpful.

Re:A1
I understand that it is possible but difficult. I am not going to try it, since you don’t recommend it.

Re:A2
Here are the results.

What I did: I changed all timestamps in ml20m-all to 1 and ran DSSTNE.
config.json is from https://s3-us-west-2.amazonaws.com/amazon-dsstne-samples/configs/config.json, and I didn’t edit it.
The commands are as follows:
generateNetCDF -d gl_input -i ml20m-all -o gl_input.nc -f features_input -s samples_input -c
generateNetCDF -d gl_output -i ml20m-all -o gl_output.nc -f features_output -s samples_input -c
wget https://s3-us-west-2.amazonaws.com/amazon-dsstne-samples/configs/config.json
train -c config.json -i gl_input.nc -o gl_output.nc -n gl.nc -b 256 -e 10
predict -b 1024 -d gl -i features_input -o features_output -k 10 -n gl.nc -f ml20m-all -s recs -r ml20m-all

The difference between the results is that the result for the modified ml20m-all contains movie IDs the user has already watched.
The -f option (filter out) of predict does not seem to work with the modified ml20m-all.

Result of original ml20m-all
1 2571,0.905:1206,0.864:1210,0.852:1270,0.794:1274,0.668:592,0.662:6874,0.653:1197,0.631:5618,0.612:3793,0.611:
2 1200,0.464:1240,0.460:1097,0.361:1127,0.288:32,0.265:593,0.251:2571,0.239:2628,0.237:1198,0.211:780,0.201:
3 2716,0.878:1580,0.874:1527,0.811:2021,0.800:750,0.798:1371,0.775:1387,0.705:3471,0.690:2174,0.679:1320,0.671:
4 500,0.561:597,0.532:457,0.526:587,0.414:780,0.371:592,0.370:442,0.355:344,0.328:539,0.317:364,0.275:
5 356,0.963:1,0.878:539,0.841:597,0.811:527,0.735:357,0.653:586,0.653:592,0.616:34,0.570:733,0.559:
6 95,0.611:786,0.544:5,0.537:32,0.525:36,0.397:104,0.391:376,0.384:25,0.379:608,0.363:784,0.350:
7 2724,0.791:1569,0.761:4022,0.743:2571,0.742:3623,0.741:2706,0.699:2763,0.696:4246,0.689:1961,0.683:1584,0.674:
8 586,0.735:318,0.679:420,0.639:410,0.638:440,0.612:225,0.602:34,0.590:474,0.575:160,0.564:300,0.516:
9 2858,0.468:2762,0.296:3578,0.271:2694,0.206:593,0.178:3273,0.175:2712,0.173:2572,0.173:3005,0.170:2541,0.169:
10 1193,0.441:110,0.354:1291,0.303:593,0.272:1213,0.250:1234,0.249:1270,0.245:1036,0.237:318,0.235:1225,0.223:

Result of modified ml20m-all
1 1214,0.968:1196,0.966:4993,0.962:260,0.957:1258,0.947:5952,0.947:1200,0.943:296,0.936:1198,0.918:2571,0.905:
2 1210,0.743:260,0.677:1196,0.624:1214,0.597:589,0.495:1200,0.464:1240,0.460:480,0.442:1270,0.437:1097,0.361:
3 1196,0.990:260,0.986:1210,0.983:1214,0.977:1200,0.974:1240,0.974:1270,0.967:589,0.957:541,0.957:1374,0.956:
4 480,0.892:589,0.819:356,0.784:377,0.700:500,0.561:597,0.532:457,0.526:586,0.477:367,0.462:587,0.414:
5 356,0.963:260,0.958:480,0.958:780,0.932:364,0.904:457,0.886:1,0.878:500,0.875:588,0.851:539,0.841:
6 780,0.937:736,0.869:648,0.835:1,0.741:1073,0.713:62,0.693:141,0.677:733,0.636:95,0.611:260,0.577:
7 1580,0.954:1721,0.954:2396,0.950:480,0.931:1097,0.928:597,0.922:1270,0.918:3408,0.904:2628,0.887:1210,0.874:
8 480,0.976:592,0.974:457,0.974:356,0.968:380,0.964:589,0.955:590,0.955:377,0.953:165,0.938:153,0.927:
9 2706,0.491:2858,0.468:2710,0.444:2683,0.424:2959,0.331:2762,0.296:3578,0.271:2694,0.206:858,0.184:593,0.178:
10 858,0.866:260,0.828:1198,0.790:1196,0.767:1210,0.678:1221,0.669:2028,0.491:912,0.446:1193,0.441:527,0.412:

For example:
1 2571,0.905:1206,0.864:1210,0.852:1270,0.794:1274,0.668:592,0.662:6874,0.653:1197,0.631:5618,0.612:3793,0.611:
1 1214,0.968:1196,0.966:4993,0.962:260,0.957:1258,0.947:5952,0.947:1200,0.943:296,0.936:1198,0.918:2571,0.905:
The floating-point value is the same for movie_id 2571: it is 0.905 in both results. But the latter result contains movie IDs that the user has already watched. I checked this against ml20m-all.
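
A check like this can be automated instead of inspected by eye. The following is a rough sketch, assuming both the input file and the recs file use the `userId<TAB>id,value:id,value:` layout shown in this thread. The function names are illustrative, and the sample history line is hypothetical apart from movie 1214.

```python
def parse_pairs(line: str) -> tuple[str, list[str]]:
    """Split 'userId<TAB>id,value:id,value:...' into the user ID and item IDs."""
    user_id, _, rest = line.rstrip("\n").partition("\t")
    items = [pair.split(",")[0] for pair in rest.split(":") if pair]
    return user_id, items


def watched_in_recs(history_lines, rec_lines) -> dict[str, list[str]]:
    """Map each user ID to the recommended items that also appear in their history."""
    history = {}
    for line in history_lines:
        user, items = parse_pairs(line)
        history[user] = set(items)
    overlaps = {}
    for line in rec_lines:
        user, recs = parse_pairs(line)
        seen = [item for item in recs if item in history.get(user, set())]
        if seen:
            overlaps[user] = seen
    return overlaps


if __name__ == "__main__":
    history = ["1\t1214,1:2,1"]  # hypothetical watch history for user 1
    recs = ["1\t1214,0.968:1196,0.966:4993,0.962:"]  # start of a recs line from above
    print(watched_in_recs(history, recs))
```

An empty result would mean the filter removed every watched item. Any entries are recommendations that should have been filtered out.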

Re:A3
I am confused and would like a clarification. My question was whether DSSTNE supports digital inputs.
Your answer was that DSSTNE supports it, but then you said “you need to add the analog value”.
From the source code you mentioned, I guess it is supported. Could you clarify that, please?
And please do work on exposing it in the wrapper. I believe everyone would love it.

Thanks,
yuasa

@yuasaonrails
Author

@tristanpenman
Yes, I can!

I changed the de-noising value to 0 in config.json, and I got a different result.
However, even after editing config.json, I still got watched movie IDs in the result for the modified ml20m-all.
I didn’t change any commands from the example page.

original ml20m-all
1 1210,0.909:2571,0.900:1206,0.882:1270,0.807:1274,0.713:1197,0.690:6874,0.687:3793,0.680:5618,0.641:1199,0.638:
2 1200,0.497:1240,0.488:1097,0.391:1127,0.332:2628,0.280:2571,0.272:32,0.263:1374,0.227:780,0.218:1198,0.212:
3 1580,0.861:2716,0.845:750,0.841:2021,0.790:1371,0.781:1527,0.772:1387,0.734:3471,0.721:608,0.704:1136,0.689:
4 500,0.642:597,0.614:457,0.526:587,0.481:780,0.450:442,0.427:539,0.371:592,0.334:344,0.330:485,0.321:
5 356,0.965:539,0.862:1,0.846:597,0.827:527,0.746:357,0.686:586,0.680:733,0.615:592,0.571:34,0.566:
6 95,0.617:5,0.550:786,0.545:32,0.509:104,0.400:36,0.390:376,0.383:25,0.368:784,0.347:805,0.346:
7 2724,0.791:4022,0.754:3623,0.752:1569,0.747:2571,0.710:2763,0.705:4246,0.699:2706,0.693:3418,0.677:1240,0.672:
8 586,0.755:318,0.651:410,0.632:420,0.630:34,0.627:440,0.598:225,0.597:474,0.576:160,0.559:300,0.518:
9 2858,0.533:2762,0.322:3578,0.286:2694,0.228:2572,0.191:593,0.188:2712,0.185:2541,0.185:3005,0.175:2716,0.172:
10 1193,0.458:110,0.357:1291,0.349:1234,0.271:593,0.269:919,0.255:1270,0.252:1213,0.250:1036,0.242:1225,0.238:

modified ml20m-all
1 1214,0.982:1196,0.982:260,0.979:4993,0.977:1200,0.970:5952,0.966:1258,0.959:1198,0.947:296,0.944:541,0.933:
2 1210,0.787:260,0.692:1196,0.670:1214,0.612:589,0.565:480,0.516:1200,0.497:1270,0.490:1240,0.488:1097,0.391:
3 1196,0.992:260,0.989:1214,0.981:1210,0.981:1200,0.974:1270,0.974:1240,0.973:589,0.966:541,0.966:1374,0.956:
4 480,0.932:589,0.864:356,0.836:377,0.774:500,0.642:597,0.614:586,0.559:457,0.526:367,0.524:587,0.481:
5 260,0.975:356,0.965:480,0.964:780,0.946:457,0.902:364,0.900:500,0.885:1210,0.872:539,0.862:648,0.854:
6 780,0.942:736,0.877:648,0.843:1,0.780:1073,0.723:62,0.709:141,0.683:733,0.638:95,0.617:260,0.583:
7 2396,0.948:1721,0.947:1580,0.934:480,0.931:597,0.929:1097,0.924:1270,0.911:3408,0.909:2671,0.874:356,0.871:
8 480,0.984:457,0.978:356,0.977:592,0.976:380,0.968:589,0.963:377,0.963:590,0.959:165,0.947:153,0.937:
9 2706,0.555:2858,0.533:2683,0.478:2710,0.475:2959,0.395:2762,0.322:3578,0.286:2694,0.228:858,0.198:2572,0.191:
10 858,0.904:260,0.891:1198,0.852:1196,0.829:1210,0.768:1221,0.708:912,0.515:2028,0.505:1193,0.458:527,0.446:

head -1 ml20m-all | grep 1214
With this command, I saw that user_id 1 watched movie_id 1214.

Can you dig into this further?

Also, what do you mean by “you may find that your results differ between runs due to DSSTNE's de-noising feature”? I get the same result every time I run with the same config.json.

Thanks,
yuasa

@tristanpenman
Contributor

@yuasa, did you retrain the model after changing config.json, or did you just re-run 'predict'?

And when I said that your results may differ, I meant that re-running 'predict' with a de-noising value > 0 may give you different recommendations.

@yuasaonrails
Author

@tristanpenman

I retrained.
I removed gl.nc, ran 'train' because I thought train used config.json (since I saw -c config.json in train -c config.json -i gl_input.nc -o gl_output.nc -n gl.nc -b 256 -e 10), and ran 'predict'.

And yes, I got different recommendations.

@tristanpenman
Contributor

@yuasaonrails, thanks for taking the time to do all the extra debugging. We're going to work on some features / enhancements to better support analog and digital inputs. I've created issue #69 as a starting point. You can watch that ticket via Github notifications to track progress.

In the meantime, I suggest closing this ticket. It has been referenced in issue #69, and can always be reopened in the future.

@yuasaonrails
Author

@tristanpenman
Thank you, too.
It was my pleasure to help you and DSSTNE users around the world.
I can't wait for the wrapper!

Now, I am closing this issue.
Thanks again for your hard work on DSSTNE.

@beeva-enriqueotero

Hello @yuasaonrails

I realized that the problem you hit is caused by the predict -f filter, specifically by a bug/hack in Filters.cpp.

As a result, items with low numeric values (<= 10.0) are not filtered, so already-viewed items can be recommended.
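
The effect can be illustrated with a toy model. To be clear, this is not the code in Filters.cpp. It is a sketch built only from the description above, assuming that an item enters the filter only when its value is greater than 10.0. All names are hypothetical.

```python
THRESHOLD = 10.0  # taken from the "<= 10.0" in the description above


def build_filter(pairs, threshold=THRESHOLD):
    """Toy model: only items whose value exceeds the threshold are filtered."""
    # ASSUMPTION: this mimics the described behaviour, not the real implementation.
    return {item for item, value in pairs if value > threshold}


def apply_filter(recs, filtered):
    """Drop recommended items that are in the filter set."""
    return [item for item in recs if item not in filtered]


if __name__ == "__main__":
    recs = ["1214", "2571"]
    with_timestamps = [("1214", 1112486027.0)]  # hypothetical history entry
    with_ones = [("1214", 1.0)]  # same entry after rewriting values to 1

    print(apply_filter(recs, build_filter(with_timestamps)))
    print(apply_filter(recs, build_filter(with_ones)))
```

Under this model the first call drops item 1214 and the second keeps it, which matches the watched items showing up once every timestamp was replaced by 1.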

Otherwise, apart from this (important!) filter issue, the behaviour is the same with timestamps or ratings. All these values are ignored with default 'indicator' type.

Regards, and thanks to DSSTNE team for sharing your work!

@yuasaonrails
Author

@beeva-enriqueotero
Hello.

Thanks for your help.
What do you mean by "The behaviour is the same with timestamps or ratings. All these values are ignored with default 'indicator' type."?

@beeva-enriqueotero

Hello @yuasaonrails

I mean that only implicit feedback ("indicator" type) is implemented, so any real or integer value, whether a timestamp or a rating, is ignored. The only exception is the predict -f filter issue I referred to in my previous comment.
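
In other words, the input is reduced to a present/absent signal. A minimal sketch of that idea follows, with the hypothetical helper `to_indicator` standing in for whatever DSSTNE does internally:

```python
def to_indicator(pairs):
    """Keep only which items are present; the attached value is discarded."""
    # ASSUMPTION: a present item is encoded as 1.0, an absent one is simply missing.
    return {item: 1.0 for item, _value in pairs}


if __name__ == "__main__":
    with_timestamps = [("2", 1112486027), ("29", 1112484676), ("32", 1112484819)]
    with_ones = [("2", 1), ("29", 1), ("32", 1)]
    # Both encodings are identical, so training sees the same input either way.
    print(to_indicator(with_timestamps) == to_indicator(with_ones))
```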

There is an open issue asking for an enhancement to correctly handle the "analog" type: #69

Regards

@yuasaonrails
Author

@beeva-enriqueotero

Thank you very much for the clarification.
I understand.

Your help is greatly appreciated.
