
Add recommendation system implementation with new API #10535

Closed
wants to merge 8 commits into from

Conversation

sidgoyal78
Contributor

@sidgoyal78 sidgoyal78 commented May 9, 2018

Fix #10423

In this chapter, input data is fed a bit differently than in the other chapters, hence I had to create a data_feed_handler attribute in trainer.train().

EDIT: The code is modified based on the recent update to trainer API (#10674). As above, we add a data_feed_handler attribute to train().

mov_combined_features = get_mov_combined_features()

inference = layers.cos_sim(X=usr_combined_features, Y=mov_combined_features)
scale_infer = layers.scale(x=inference, scale=5.0)
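For readers without a Paddle environment, a minimal framework-free sketch of what these two layers compute: cosine similarity between the user and movie feature vectors, scaled by 5.0 to map the [-1, 1] similarity onto a rating-like range. The vector values and helper names below are illustrative, not part of the PR.

```python
import math

def cos_sim(x, y):
    # Cosine similarity: dot(x, y) / (|x| * |y|), what layers.cos_sim computes.
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def scale(value, factor=5.0):
    # layers.scale multiplies its input by a constant factor.
    return value * factor

usr_features = [1.0, 0.0, 1.0]
mov_features = [1.0, 0.0, 1.0]
inference = cos_sim(usr_features, mov_features)
scale_infer = scale(inference, 5.0)  # identical vectors -> similarity 1.0 -> 5.0
```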
Contributor

In the new example, there is a new pattern like this. Should we follow that pattern here?

def inference_program():
    ...
    return prediction

def train_program():
    prediction = inference_program()
    ...
    return avg_cost
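A framework-free sketch of the composition pattern being suggested: train_program reuses inference_program instead of duplicating the network definition, then adds the cost on top. The placeholder values stand in for the fluid layers the real chapter would build.

```python
def inference_program():
    # In the real chapter this builds the fluid network and returns
    # the prediction variable; a placeholder value stands in here.
    prediction = 0.5
    return prediction

def train_program():
    # Reuse inference_program rather than redefining the network,
    # then compute the cost against the label.
    prediction = inference_program()
    label = 1.0
    avg_cost = (prediction - label) ** 2  # stand-in for a real cost layer
    return avg_cost
```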

Contributor Author

Yeah, good point. Modified the code.

return mov_combined_features


def train_network():
Contributor

Let's change it to train_program

Contributor Author

done

return avg_cost, scale_infer


def inference_network():
Contributor

Let's change this to inference_program

Contributor Author

Done

paddle.dataset.movielens.test(), batch_size=BATCH_SIZE)

def event_handler(event):
    if isinstance(event, fluid.EndIteration):
Contributor

This is now renamed to fluid.EndEpochEvent.

Contributor Author

Done


def event_handler(event):
    if isinstance(event, fluid.EndIteration):
        if (event.batch_id % 10) == 0:
Contributor

This is now renamed to event.epoch.

Contributor Author

Done
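As the review comments above note, the event classes were renamed in the new Trainer API (fluid.EndEpochEvent, with an epoch field). The isinstance-dispatch idiom used by event handlers can be sketched without Paddle; the EndEpochEvent class below is a stand-in for the real fluid class, and the return values are purely illustrative.

```python
class EndEpochEvent:
    # Stand-in for the trainer's end-of-epoch event; the real class
    # lives in paddle.fluid and carries more fields.
    def __init__(self, epoch):
        self.epoch = epoch

def event_handler(event):
    # Dispatch on the event type; act only every 10th epoch.
    if isinstance(event, EndEpochEvent):
        if event.epoch % 10 == 0:
            return "test at epoch %d" % event.epoch
    return None
```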

train_reader,
EPOCH_NUM,
event_handler=event_handler,
data_feed_handler=partial(func_feed, feeding_map))
Contributor

Trainer takes in reader and feed_order. I am not sure if it will support data_feed_handler later.

Contributor Author

I think you might have missed what I wrote in the PR description. I basically wanted to point out that the way data is fed (https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/tests/book/test_recommender_system.py#L231) is a bit different here compared to the other book chapters, hence I think there is a need for the concept of a data_feed_handler. What do you think?

Contributor

Ah, sorry about missing the PR description. I think we need to discuss with ReYung and JiaYi so that the train function can support it.
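The data_feed_handler=partial(func_feed, feeding_map) argument in the train() call above pre-binds the feeding map so the trainer can invoke the handler with just the batch data. A simplified framework-free sketch (the real func_feed in this PR also takes a place argument, dropped here; the column names and batch values are hypothetical):

```python
from functools import partial

def func_feed(feeding, data):
    # Simplified feed handler: map each input-variable name to its
    # column of the batch, using the feeding dict (name -> column index).
    return {name: [row[idx] for row in data] for name, idx in feeding.items()}

feeding_map = {"user_id": 0, "movie_id": 1, "score": 2}
data_feed_handler = partial(func_feed, feeding_map)

batch = [(1, 7, 4.0), (2, 9, 3.5)]
feed = data_feed_handler(batch)  # columns keyed by input-variable name
```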

if (event.batch_id % 10) == 0:
    avg_cost = trainer.test(reader=test_reader)

    print('BatchID {0:04}, Loss {1:2.2}'.format(event.batch_id + 1,
Contributor

change to event.epoch

Contributor Author

Done

Contributor Author
@sidgoyal78 sidgoyal78 left a comment

@jacquesqiao : Can you please take a look? I modified the train() and test() methods to allow for a data_feed_handler attribute.

@@ -301,7 +313,8 @@ def _train_by_any_executor(self, event_handler, exe, num_epochs, reader):
                 begin_event = BeginStepEvent(epoch_id, step_id)
                 event_handler(begin_event)
                 if begin_event.fetch_metrics:
-                    metrics = exe.run(feed=data,
+                    metrics = exe.run(feed=data_feed_handler(data),
Contributor Author
@sidgoyal78 sidgoyal78 May 18, 2018

@jacquesqiao : I think this way of handling data with data_feed_handler is probably incorrect, right? We would need data from the reader, rather than from feeder.decorate_reader, right?

Contributor Author
@sidgoyal78 sidgoyal78 May 18, 2018

I think the answer to the above question is yes.

So do you have any recommendation on how we could handle this data_feed_handler with the decorate_reader? I guess without the decorate_reader we can't do parallel fetching, but with it present, it is tricky to incorporate the data_feed_handler.

return [avg_cost, scale_infer]


def func_feed(feeding, place, data):
Member
@jacquesqiao jacquesqiao May 22, 2018

Maybe we can write a new reader on top of the default train reader? Rather than using a function handle, we could process the data in the new reader.
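The suggestion above can be sketched with plain generators: a new reader wraps the default one and does the per-sample processing itself, so the Trainer never needs a data_feed_handler. The sample data, base_reader, and the score normalization are hypothetical stand-ins for the real movielens reader and preprocessing.

```python
def base_reader():
    # Stand-in for the default train reader (e.g. paddle.dataset.movielens.train()):
    # yields raw (user_id, movie_id, score) samples.
    for sample in [(1, 7, 4.0), (2, 9, 3.5)]:
        yield sample

def preprocess_reader(reader):
    # A new reader layered on top of the default one; the per-sample
    # processing happens here instead of in a data_feed_handler.
    def new_reader():
        for user_id, movie_id, score in reader():
            yield (user_id, movie_id, score / 5.0)  # e.g. normalize scores
    return new_reader

train_reader = preprocess_reader(base_reader)
```

This keeps the Trainer interface unchanged: it still consumes a plain reader, and any chapter-specific data handling stays inside the reader layer.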

Contributor

I support this approach. Let the Trainer handle the train process to keep it compact and simple. I also feel that, this way, it will be easier for the user to understand the flow of fluid programming.

Contributor
@daming-lu daming-lu May 23, 2018

I am working on this and it should be done by the end of Wed (May 23)

https://github.com/daming-lu/Paddle/tree/recommend_sys

Sid is on label_semantics, and Nicky on machine_translation, which might also need a new reader.

@sidgoyal78
Contributor Author

Closing this PR (see #10894).

@sidgoyal78 sidgoyal78 closed this May 24, 2018