How to see Affinity/score for each recommendations ? #607

Closed
DilipKumar3 opened this issue Jan 30, 2023 · 5 comments
Labels
question Further information is requested status/needs-triage

Comments

@DilipKumar3

❓ Questions & Help

I would like to see the affinity score for each recommendation after prediction. How can I view the scores in tabular form?


@rnyak rnyak added the question Further information is requested label Jan 30, 2023
@DilipKumar3
Author

@rnyak I'm trying to build a sequential model on my synthetic data using Transformers4Rec, as an end-to-end pipeline.

In addition to the question above, I'd like to know how to map the recommendations back to their original categorical values, and likewise how to map session_id back to its original categorical values.

@rnyak
Contributor

rnyak commented Jan 30, 2023

For the mapping between the original item_id and the encoded item_id, you can use the unique.item_id parquet file in the categories folder, which is generated automatically when you run the NVT workflow.fit(...).

Please see this ticket for an example: #359. You can read the discussion there.

Basically, if you run this:

import cudf
category_item_id = cudf.read_parquet('/workspace/kdd/categories/unique.item_id.parquet').reset_index().rename(columns={"index": "encoded_item_id"})

You will get the encoded item_id column and the original ids in the same cudf dataframe. From there, the mapping is a simple pandas or cudf function you can write yourself.

Btw, are you using start_index=1 in Categorify? If yes, shift the encoded_item_id column of the category_item_id df by +1. If no, you can use the category_item_id df values as they are.
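To make the mapping step concrete, here is a minimal sketch. It uses pandas rather than cudf so it runs without a GPU, and it replaces the real categories parquet file with a small hypothetical frame. The item ids, the `recs` frame, and the `START_INDEX` name are all made up for illustration, and the real file may contain additional rows or columns.

```python
import pandas as pd

# Hypothetical stand-in for the categories parquet file: the row position is
# the encoded id and the column holds the original item ids.
unique_items = pd.DataFrame({"item_id": ["sku_a", "sku_b", "sku_c"]})

category_item_id = unique_items.reset_index().rename(
    columns={"index": "encoded_item_id"}
)

# Assumption: Categorify was run with start_index=1, so the encoded ids are
# shifted by one. Set this to 0 if start_index was not used.
START_INDEX = 1
category_item_id["encoded_item_id"] += START_INDEX

# Lookup Series: encoded id -> original id.
encoded_to_original = category_item_id.set_index("encoded_item_id")["item_id"]

# Hypothetical recommended (encoded) item ids for two sessions.
recs = pd.DataFrame({"session": [0, 0, 1], "encoded_item_id": [2, 3, 1]})
recs["item_id"] = recs["encoded_item_id"].map(encoded_to_original)
print(recs)
```

Encoded ids that are missing from the lookup come back as NaN, which is a quick way to spot an off-by-one in the shift.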

@DilipKumar3
Author

Thanks for the code, @rnyak.

[screenshot of the DataFrame]
This is the DataFrame that I'm passing to the prediction. It has 13860 sessions (in my case, customer_id).

prediction.predictions is an array of length 13856. Why is there a difference? (I'm passing this after trimming the users with 1 interaction.)
When I check the unique values of label in the prediction, it's only 3308. Why doesn't that match 13856?
Based on my understanding, each record in prediction.predictions belongs to one session (in my case, customer_id).

Please correct me if my understanding is wrong.

@rnyak
Contributor

rnyak commented Feb 1, 2023

The predictions are not generated based on the number of sessions; they are generated based on the unique item_id values in the item_id column of your custom train dataset. So, in your raw train set, how many unique items do you have? And what do you see in your schema.pbtxt file?

predictions is a 2-dimensional array. The first dimension is the number of rows in the test set you are predicting on, and the second dimension is the size of your unique item catalog + 1. For a given session (meaning each row in your transformed test set), you get one score per unique item in your train set, plus one.
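Given that shape, one way to view the scores in tabular form is to take the top-k columns of each row and flatten them into a DataFrame. The sketch below is illustrative only: `top_k_table` is a hypothetical helper, the toy array stands in for prediction.predictions, and treating the column index as the encoded item id is an assumption to verify against your own schema.

```python
import numpy as np
import pandas as pd


def top_k_table(scores, k):
    """Return one row per (session row, rank) with the column index and its score."""
    scores = np.asarray(scores)
    # Column indices of the k highest scores in each row, best first.
    top_idx = np.argsort(-scores, axis=1)[:, :k]
    top_scores = np.take_along_axis(scores, top_idx, axis=1)
    n_rows = scores.shape[0]
    return pd.DataFrame(
        {
            "row": np.repeat(np.arange(n_rows), k),
            "rank": np.tile(np.arange(1, k + 1), n_rows),
            # Assumption: the column index is the encoded item id.
            "encoded_item_id": top_idx.ravel(),
            "score": top_scores.ravel(),
        }
    )


# Toy scores: 2 session rows, 4 columns (3 unique items + 1).
toy_scores = np.array([[0.0, 0.1, 0.7, 0.2],
                       [0.0, 0.6, 0.1, 0.3]])
table = top_k_table(toy_scores, k=2)
print(table)
```

The "row" column refers to the position in the transformed test set, so joining it back to a session or customer id requires keeping that id column in the same row order.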

Regarding "when I'm checking the unique values of label in prediction it's only 3308": I'm not sure I understand. What do you mean by label? Can you please explain?

@rnyak
Contributor

rnyak commented Mar 7, 2023

@DilipKumar3 I am closing this ticket due to low activity. If you have further questions, please reopen the ticket.

@rnyak rnyak closed this as completed Mar 7, 2023