Paddle V4 API - Word to Vec #10214

Closed
helinwang opened this issue Apr 25, 2018 · 7 comments


helinwang commented Apr 25, 2018

API design: #10152

import paddle.fluid as fluid
import paddle

IS_SPARSE = True
BATCH_SIZE = 32
NUM_EPOCHS = 3
N = 5


class Word2Vec(fluid.Program):
  EMBED_SIZE = 32
  HIDDEN_SIZE = 256

  def __init__(self, dict_size, is_sparse):
      self.dict_size = dict_size
      self.is_sparse = is_sparse

  def predict(self, first_word, second_word, third_word, forth_word):
      words = [first_word, second_word, third_word, forth_word]
      embeds = []
      for word in words:
        # every embedding will share the same parameter
        with fluid.var_scope("shared_embedding"):
            embed = fluid.layers.embedding(
                input=word,
                size=[self.dict_size, self.EMBED_SIZE],
                dtype='float32',
                is_sparse=self.is_sparse)
            embeds.append(embed)

      concat_embed = fluid.layers.concat(
          input=embeds, axis=1)
      hidden1 = fluid.layers.fc(input=concat_embed,
                                size=self.HIDDEN_SIZE,
                                act='sigmoid')
      predict_word = fluid.layers.fc(input=hidden1,
                                     size=self.dict_size,
                                     act='softmax')
      return predict_word

  @network("firstw", "secondw", "thirdw", "forthw", "nextw")
  def train_step(self):
      first_word = fluid.layers.data(name='firstw', shape=[1], dtype='int64')
      second_word = fluid.layers.data(name='secondw', shape=[1], dtype='int64')
      third_word = fluid.layers.data(name='thirdw', shape=[1], dtype='int64')
      forth_word = fluid.layers.data(name='forthw', shape=[1], dtype='int64')
      next_word = fluid.layers.data(name='nextw', shape=[1], dtype='int64')
      with fluid.var_scope("predict"):
         predict_word = self.predict(first_word, second_word, third_word, forth_word)
      cost = fluid.layers.cross_entropy(input=predict_word, label=next_word)
      avg_cost = fluid.layers.mean(cost)
      sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.001)
      sgd_optimizer.minimize(avg_cost)
      return avg_cost

  @network("firstw", "secondw", "thirdw", "forthw")
  def infer(self):
      first_word = fluid.layers.data(name='firstw', shape=[1], dtype='int64')
      second_word = fluid.layers.data(name='secondw', shape=[1], dtype='int64')
      third_word = fluid.layers.data(name='thirdw', shape=[1], dtype='int64')
      forth_word = fluid.layers.data(name='forthw', shape=[1], dtype='int64')
      with fluid.var_scope("predict"):
          predict_word = self.predict(first_word, second_word, third_word, forth_word)
      return predict_word

# training and inference begin here
word_dict = paddle.dataset.imikolov.build_dict()
word2vec = Word2Vec(len(word_dict), IS_SPARSE).Compile()
train_reader = paddle.batch(
    paddle.dataset.imikolov.train(word_dict, N), BATCH_SIZE)

for epoch_id in range(NUM_EPOCHS):
    for data in train_reader():
        avg_cost = word2vec.train_step(data[0], data[1], data[2], data[3], data[4])

next_word = word2vec.infer(1,2,3,4)

# save inference parameters
fluid.save_parameters(word2vec.infer, "./infer_params")

# load inference parameters example
# `Compile` initializes the parameters, but they will be overwritten by `fluid.load_parameters`.
word2vec_new = Word2Vec(len(word_dict), IS_SPARSE).Compile()
fluid.load_parameters(word2vec_new.infer, "./infer_params")
next_word = word2vec_new.infer(1,2,3,4)

wangkuiyi commented Apr 26, 2018

Thanks for this design!

I understand that this design proposes a way to define Fluid functions and call them from Python. In particular, the definition of a Fluid function depends not only on Fluid blocks, but also on the Python decorator @network.

However, given that Fluid functions might be called from host languages other than Python (C++, Java, Go), we'd prefer Fluid function definitions implemented only in Fluid.

The reason Fluid functions might be called from other host languages is the way we do inference -- the Fluid inference engine could take the form of a gRPC server in C++, Java, or Go, an Objective-C program built for ARM, etc.; in any case, it needs to be able to call the Fluid function describing the inference process.

I am drafting the proposal of function definition and function invocation in Fluid: #10244


helinwang commented Apr 26, 2018

However, given that Fluid functions might be called from other host languages (C++, Java, Go) other than Python, we'd prefer Fluid function definitions implemented only in Fluid.

Agreed: we need to store function signatures in the Fluid program desc, just like the exported symbols of a shared library. That way a program desc can be called from other languages.
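
A minimal sketch of what that exported-function metadata could look like (purely illustrative; these field names are hypothetical and not part of any existing Fluid proto):

# Hypothetical sketch: per-function metadata that Compile could store in
# the program desc, so any host language can look up a function by name.
exported_functions = {
    "train_step": {
        "inputs": ["firstw", "secondw", "thirdw", "forthw", "nextw"],
        "outputs": ["avg_cost"],
        "block_id": 1,  # entry block of this function in the program desc
    },
    "infer": {
        "inputs": ["firstw", "secondw", "thirdw", "forthw"],
        "outputs": ["predict_word"],
        "block_id": 2,
    },
}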

In particular, the definition of Fluid functions depends not only on Fluid blocks, but also Python decorator @network.

To me, @network is just front-end syntax: it marks the function below as a to-be-compiled function and declares its inputs. The Python Compile function can then save this information into the program desc.
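
A rough sketch of how such a decorator could record that information (illustrative only; the attribute names below are made up, and the real Compile would do more than just read them):

import functools

def network(*input_names):
    # Hypothetical decorator: tags a method as a compilable Fluid
    # function and records the names of its data-layer inputs, so
    # that Compile can later write the signature into the program desc.
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(self, *args, **kwargs):
            return fn(self, *args, **kwargs)
        wrapper._is_network = True
        wrapper._input_names = input_names
        return wrapper
    return decorate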

I think that Fluid functions might be called from other host languages is due to the way we do inference -- the form of the Fluid inference engine could be a gRPC server in C++, Java, or Go, an Objective-C program built for ARM, etc; anyway, they need to be able to call the Fluid function describing the inference process.

Agreed.

Superjomn commented

To my understanding, the user-facing syntax has two layers: the underlying layer implements the basic but flexible elements that can cover 100% of use cases, trivial but complete, much like TensorFlow; the upper layer adds encapsulation to make the 80% most frequent use cases easier, like Keras and other wrappers.

I wonder whether this proposal is a mix of these two layers, or just the underlying layer?


Superjomn commented Apr 27, 2018

def predict(self):
    # every embedding will share the same parameter
    with fluid.var_scope("shared_embedding"):
        embed_first = fluid.layers.embedding(
            input=first_word,
            size=[self.dict_size, EMBED_SIZE],
            dtype='float32',
            is_sparse=self.is_sparse)
    with fluid.var_scope("shared_embedding"):
        embed_second = fluid.layers.embedding(
            input=second_word,
            size=[self.dict_size, EMBED_SIZE],
            dtype='float32',
            is_sparse=self.is_sparse)
    with fluid.var_scope("shared_embedding"):
        embed_third = fluid.layers.embedding(
            input=third_word,
            size=[self.dict_size, EMBED_SIZE],
            dtype='float32',
            is_sparse=self.is_sparse)
    with fluid.var_scope("shared_embedding"):
        embed_forth = fluid.layers.embedding(
            input=forth_word,
            size=[self.dict_size, EMBED_SIZE],
            dtype='float32',
            is_sparse=self.is_sparse)

can be rewritten to

def predict(self):
    embeds = []
    words = [first_word, second_word, third_word, forth_word]
    # every embedding will share the same parameter
    with fluid.var_scope("shared_embedding"):
        for word in words:
            embed = fluid.layers.embedding(
                input=word,
                size=[self.dict_size, EMBED_SIZE],
                dtype='float32',
                is_sparse=self.is_sparse)
            embeds.append(embed)

also

@network("firstw", "secondw", "thirdw", "forthw")
  def infer(self):
      first_word = fluid.layers.data(name='firstw', shape=[1], dtype='int64')
      second_word = fluid.layers.data(name='secondw', shape=[1], dtype='int64')
      third_word = fluid.layers.data(name='thirdw', shape=[1], dtype='int64')
      forth_word = fluid.layers.data(name='forthw', shape=[1], dtype='int64')

to

word_names = 'firstw secondw thirdw forthw nextw'.split()
@network(*word_names)
def train_step(self):
    words = [fluid.layers.data(name=n, shape=[1], dtype='int64') for n in word_names]


helinwang commented Apr 27, 2018

Thanks @Superjomn ! That is a very good question!

Sorry, maybe the title is a little misleading: this issue mainly tries to address how to do cross-language invocation (e.g., Python calling a Fluid program). That is done primarily with the @network decorator and the Compile function. It also addresses scope sharing: the compiled functions share a single scope (e.g., word2vec.train_step and word2vec.infer share the same scope).

The question you raise is very important; however, I don't think it's the focus of this issue. The example "network construction" code is taken directly from our Fluid examples. We need to answer this question in another discussion for Fluid as a whole. Sorry, perhaps my title is too general.


helinwang commented Apr 27, 2018

@Superjomn thanks!!! The first code change actually should be:

def predict(self):
    embeds = []
    words = [first_word, second_word, third_word, forth_word]
    for word in words:
        # every embedding will share the same parameter
        with fluid.var_scope("shared_embedding"):
            embed = fluid.layers.embedding(
                input=word,
                size=[self.dict_size, EMBED_SIZE],
                dtype='float32',
                is_sparse=self.is_sparse)
            embeds.append(embed)

Entering with fluid.var_scope("shared_embedding"): resets the unique-name generator for that scope. So it needs to be entered on every iteration; otherwise the four embeddings would receive distinct generated parameter names and would not share the same parameter.
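
A toy mock of those semantics (not Fluid's actual implementation; it only illustrates why re-entering the scope makes the generated parameter names collide, which is what produces sharing):

from contextlib import contextmanager

# Mock of the proposed var_scope semantics: entering a scope resets
# that scope's unique-name counter.
_counters = {}

@contextmanager
def var_scope(name):
    _counters[name] = 0  # reset on every entry
    yield

def unique_name(scope):
    n = _counters[scope]
    _counters[scope] += 1
    return '%s/w_%d' % (scope, n)

# Scope entered inside the loop: every embedding gets the same
# parameter name, hence one shared parameter.
for _ in range(4):
    with var_scope('shared_embedding'):
        print(unique_name('shared_embedding'))  # shared_embedding/w_0, four times

# Scope entered once outside the loop: names diverge, so no sharing.
with var_scope('shared_embedding'):
    for _ in range(4):
        print(unique_name('shared_embedding'))  # w_0, w_1, w_2, w_3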

Changed.

For the second one, I prefer keeping it plain and simple for illustration purposes :)

shanyi15 commented

Hello, this issue has not been updated in the past month, so we will close it today for the sake of other users' experience. If you still need to follow up after it is closed, please feel free to reopen it and we will get back to you within 24 hours. We apologize for any inconvenience caused by the closure, and thank you for your support of PaddlePaddle!
