Add recommendation system implementation with new API #10535
Conversation
mov_combined_features = get_mov_combined_features()

inference = layers.cos_sim(X=usr_combined_features, Y=mov_combined_features)
scale_infer = layers.scale(x=inference, scale=5.0)
In the new example, there is a new pattern like this. Should we follow that pattern here?
def inference_program():
    ...
    return prediction

def train_program():
    prediction = inference_program()
    ...
    return avg_cost
Yeah, good point. Modified the code.
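For illustration, the split pattern can be sketched with plain-Python stand-ins. The arithmetic below is only a placeholder; the real functions build Fluid layer graphs rather than compute values directly:

```python
# Plain-Python sketch of the inference_program / train_program split.
# The feature/label arithmetic is hypothetical; the real code builds
# Fluid programs and returns their outputs.
def inference_program(features):
    # combine features into a single prediction (placeholder logic)
    prediction = sum(features) / len(features)
    return prediction

def train_program(features, label):
    # the train program reuses the inference program's prediction
    prediction = inference_program(features)
    avg_cost = (prediction - label) ** 2  # squared-error placeholder cost
    return avg_cost
```

The point of the pattern is that training composes inference plus a cost, so inference never has to be re-declared for serving.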
return mov_combined_features


def train_network():
Let's change it to train_program
done
return avg_cost, scale_infer


def inference_network():
Let's change this to inference_program
Done
paddle.dataset.movielens.test(), batch_size=BATCH_SIZE)

def event_handler(event):
    if isinstance(event, fluid.EndIteration):
This is now renamed to fluid.EndEpochEvent.
Done
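A minimal sketch of the event-dispatch pattern under discussion: the trainer emits typed events and the handler filters them with isinstance. The class definitions and loop below are stand-ins, though EndEpochEvent is the name the review settled on:

```python
# Stand-in event classes; only the isinstance-dispatch pattern is the point.
class EndStepEvent:
    def __init__(self, epoch, step):
        self.epoch, self.step = epoch, step

class EndEpochEvent:
    def __init__(self, epoch):
        self.epoch = epoch

handled_epochs = []

def event_handler(event):
    # react only at epoch boundaries, as in the chapter's handler
    if isinstance(event, EndEpochEvent):
        handled_epochs.append(event.epoch)

# simulate a tiny training loop emitting both event types
for epoch in range(3):
    event_handler(EndStepEvent(epoch, step=0))
    event_handler(EndEpochEvent(epoch))
```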
def event_handler(event):
    if isinstance(event, fluid.EndIteration):
        if (event.batch_id % 10) == 0:
This is now renamed to event.epoch.
Done
train_reader,
EPOCH_NUM,
event_handler=event_handler,
data_feed_handler=partial(func_feed, feeding_map))
Trainer takes in reader and feed_order. I am not sure if it will support data_feed_handler later.
I think you might have missed what I wrote in the description of the PR. I basically wanted to point out that the way data is fed here (https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/tests/book/test_recommender_system.py#L231) differs from the other book chapters, hence I think we need to introduce the concept of a data_feed_handler. What do you think?
Ah, sorry about missing the PR description. I think we need to discuss with ReYung and JiaYi so the train function can support it.
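To make the reader/feed_order contract concrete, here is a toy stand-in, not the Fluid Trainer itself: it only shows how feed_order maps positional reader fields to named inputs (the real Trainer does this through a DataFeeder):

```python
class ToyTrainer:
    """Toy stand-in for the Trainer API discussed above. It pairs each
    positional field of every reader sample with the name declared in
    feed_order; all names below are illustrative."""
    def __init__(self, feed_order):
        self.feed_order = feed_order

    def train(self, reader, num_epochs):
        fed = []
        for _ in range(num_epochs):
            for sample in reader():
                # map positional reader output to named feeds
                fed.append(dict(zip(self.feed_order, sample)))
        return fed
```

A hypothetical data_feed_handler would intercept each sample just before the zip, which is exactly the hook being debated in this thread.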
if (event.batch_id % 10) == 0:
    avg_cost = trainer.test(reader=test_reader)

print('BatchID {0:04}, Loss {1:2.2}'.format(event.batch_id + 1,
Change to event.epoch.
Done
@jacquesqiao: Can you please take a look? I modified the train() and test() methods to allow for a data_feed_handler attribute.
@@ -301,7 +313,8 @@ def _train_by_any_executor(self, event_handler, exe, num_epochs, reader):
begin_event = BeginStepEvent(epoch_id, step_id)
event_handler(begin_event)
if begin_event.fetch_metrics:
    metrics = exe.run(feed=data,
    metrics = exe.run(feed=data_feed_handler(data)
@jacquesqiao: I think this way of handling data with data_feed_handler is probably incorrect, right? We would need data from the reader, rather than from feeder.decorate_reader, right?
I think the answer to the above question is yes. So do you have any recommendation on how we could handle this data_feed_handler with the decorate_reader? I guess without the decorate_reader we can't do parallel fetching, but with it present, it is tricky to incorporate the data_feed_handler.
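The tension can be illustrated with a toy batching decorator. This is a rough analogy only, not the Fluid decorate_reader API: once a reader is decorated into feed-ready form, the per-sample data a data_feed_handler would want is no longer directly visible to the training loop:

```python
# Illustrative decorator: turns a per-sample reader into a batched one.
# After decoration the loop sees batches, so per-sample handling must
# happen before this wrapping. Not the real Fluid API.
def decorate_reader(reader, batch_size):
    def batched():
        batch = []
        for sample in reader():
            batch.append(sample)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:  # flush the final partial batch
            yield batch
    return batched

raw_reader = lambda: iter([1, 2, 3, 4, 5])
batches = list(decorate_reader(raw_reader, batch_size=2)())
```

Here batches is [[1, 2], [3, 4], [5]]: the original samples survive only inside batch structure, which is why hooking a per-sample handler in after decoration is awkward.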
return [avg_cost, scale_infer]


def func_feed(feeding, place, data):
Maybe we can write a new reader on top of the default train reader? Instead of using a function handle, we can process the data in the new reader.
I support this approach. Letting the Trainer handle the training process keeps it compact and simple. I also feel it will be easier for users to understand the flow of Fluid programming this way.
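A minimal sketch of the wrapping-reader idea, assuming a hypothetical (user, title, score) sample layout; the length-based preprocessing is only illustrative:

```python
def preprocessed_reader(base_reader):
    # Wrap the default train reader: preprocess each sample here instead
    # of passing a function handle into the Trainer. The field layout and
    # the derived feature below are hypothetical.
    def reader():
        for user, title, score in base_reader():
            yield (user, len(title), score)  # e.g. derive a fixed-size feature
    return reader
```

Because the result is itself an ordinary reader, it can still be batched or decorated downstream, so the Trainer's interface stays unchanged.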
I am working on this and it should be done by the end of Wed (May 23)
https://github.com/daming-lu/Paddle/tree/recommend_sys
Sid is on label_semantics, and Nicky on machine_translation, which might also need a new reader.
Closing this PR (see #10894).
Fix #10423

In this chapter, the way input data is fed differs from the other chapters, hence I had to create a data_feed_handler attribute in trainer.train().

EDIT: The code has been modified based on the recent update to the trainer API (#10674). As above, we add a data_feed_handler attribute to train().