Freeze a model to serve within API #10

Open
PauloQuerido opened this issue Sep 11, 2018 · 15 comments

@PauloQuerido
Contributor

PauloQuerido commented Sep 11, 2018

Hi.

I successfully tested a Portuguese corpus I prepared and trained on (changing the tokenization line in utils.py to for word in word_tokenize(sentence, language='portuguese'):).

I'd like to have a frozen model in a single .pb file in order to serve it within an API. I tried several approaches, like this one: https://blog.metaflow.fr/tensorflow-how-to-freeze-a-model-and-serve-it-with-a-python-api-d4f3596b3adc

But without success.

Would you consider providing a method to export a saved model? Or point me in the right direction?

Thanks!

@dongjun-Lee
Owner

Hi. I think what you want to do is implemented in test.py. When you run train.py, the model is saved at every epoch. test.py loads the last saved model and creates a summary of valid.article.filter.txt. If you have further questions, feel free to ask. Thank you!
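
(For readers following along, the restore pattern described here is the standard TF 1.x checkpoint loading reproduced later in this thread; a minimal sketch, assuming the Model graph has already been built and that train.py wrote its checkpoints to ./saved_model/:)

import tensorflow as tf

# Sketch only: the Model graph must be constructed first so that
# tf.global_variables() is populated; './saved_model/' matches the
# checkpoint directory used elsewhere in this thread.
sess = tf.Session()
saver = tf.train.Saver(tf.global_variables())
checkpoint = tf.train.get_checkpoint_state('./saved_model/')
saver.restore(sess, checkpoint.model_checkpoint_path)  # restores the last saved epoch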

@gifflarn

@PauloQuerido I too am trying to get a frozen graph to work. I got the .pb file from the link you posted, using its freeze_graph function with output_node_names=decoder/decoder/transpose_1.
I am now stuck on actually using the frozen graph: importing it gives me "You must feed a value for tensor" errors for Placeholder_2 and Placeholder_3, which are tensors used in training (I think). It's strange, because in test.py running model.prediction with only three fed tensors works, but once frozen the model won't accept only those three.
If you manage to progress further than this, please let me know.
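
(For anyone reproducing this step: a minimal sketch of the freezing using the TF 1.x graph-freezing API, assuming a session that already holds the restored variables; the output file name is arbitrary.)

import tensorflow as tf

# Minimal freeze sketch (TensorFlow 1.x). Assumes `sess` already holds the
# graph with the latest checkpoint restored into its variables.
output_node_names = ['decoder/decoder/transpose_1']
frozen_graph_def = tf.graph_util.convert_variables_to_constants(
    sess, sess.graph_def, output_node_names)

with tf.gfile.GFile('frozen_model.pb', 'wb') as f:  # file name is arbitrary
    f.write(frozen_graph_def.SerializeToString())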

@gifflarn

@dongjun-Lee Do you possibly have any insight on why
model.decoder_input: batch_decoder_input,
model.decoder_len: batch_decoder_len,
model.decoder_target: batch_decoder_output
are needed in a frozen graph but not during a regular test.py session?

@dongjun-Lee
Owner

@gifflarn I'm sorry, but I'm not familiar with frozen graphs. I'll look into it soon.

@gogasca
Contributor

gogasca commented Sep 20, 2018

@gifflarn I tried to use the following code to extract the output node names:

[n.name for n in tf.get_default_graph().as_graph_def().node]

This is my code to freeze the Graph:

https://gist.github.com/gogasca/ac743e3664c3e9cb668e9666c9e7b025

I'm unable to restore the .pb and generate predictions.

While test.py handles reading a file in a local environment, what I want to do is offer an API. Has anyone had any luck restoring the .pb?

@gifflarn

@gogasca isn't using every node in the graph as an output node counterproductive?
I am able to restore the .pb with dummy values for Placeholder_2 and Placeholder_3, but that gives me bad results. So decoder_input and decoder_len obviously have some impact on testing as well. However, running test.py never feeds these tensors. I'm a bit confused by this.

@gogasca
Contributor

gogasca commented Sep 20, 2018

@gifflarn It's possible that I don't really need to list all the output nodes; I'm just testing. I will continue working on it today. How did you freeze the model to .pb?

The only difference I see between train and test is the way run is executed and the parameters passed.

Train

train_feed_dict = {
    model.batch_size: len(batch_x),
    model.X: batch_x,
    model.X_len: batch_x_len,
    model.decoder_input: batch_decoder_input,
    model.decoder_len: batch_decoder_len,
    model.decoder_target: batch_decoder_output
}

_, step, loss = sess.run([model.update, model.global_step, model.loss],
                         feed_dict=train_feed_dict)

Test

valid_feed_dict = {
    model.batch_size: len(batch_x),
    model.X: batch_x,
    model.X_len: batch_x_len,
}

prediction = sess.run(model.prediction,
                      feed_dict=valid_feed_dict)

@gifflarn

gifflarn commented Sep 21, 2018

@gogasca From my understanding you only specify the last layer(s) of the graph as output nodes, 'freezing' everything between the input and output nodes. I only specified decoder/decoder/transpose_1 as the output node. I hoped I could get it to work like this, without success:
output = graph.get_tensor_by_name('prefix/decoder/decoder/transpose_1:0')
input1 = graph.get_tensor_by_name('prefix/batch_size:0')
input2 = graph.get_tensor_by_name('prefix/Placeholder:0')
input3 = graph.get_tensor_by_name('prefix/Placeholder_1:0')

prediction = self.sess.run(output, feed_dict={
    input1: len(batch),
    input2: batch,
    input3: batch_x_len})
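
(The loading side that this snippet assumes is not shown above; roughly, it would be the standard import_graph_def pattern, where name='prefix' is what produces the prefix/... tensor names. A sketch, with an assumed file name:)

import tensorflow as tf

# Sketch of loading the frozen .pb that the snippet above pulls tensors from.
with tf.gfile.GFile('frozen_model.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    # name='prefix' yields the 'prefix/...' tensor names used above
    tf.import_graph_def(graph_def, name='prefix')

sess = tf.Session(graph=graph)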

@gogasca
Contributor

gogasca commented Sep 23, 2018

@gifflarn
I changed my script to use the SavedModelBuilder and now I can export and read the .pb successfully, but I'm still facing issues similar to the ones you described before:

Export to PB
https://gist.github.com/gogasca/305c14dea2ad342f163d3865e8576acd

Serving using .PB file
https://gist.github.com/gogasca/7d11b9cbb7f600fb3f4ecc026fa40929

Based on:
https://towardsdatascience.com/deploy-tensorflow-models-9813b5a705d5
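
(For readers without access to the gists, the core of a SavedModelBuilder export in TF 1.x looks roughly like this; the export directory and the omission of an explicit signature_def_map are simplifications, not necessarily what the gist does.)

import tensorflow as tf

# Sketch of a SavedModelBuilder export (TF 1.x). Assumes `sess` holds the
# restored model; './export/1' is a placeholder path.
builder = tf.saved_model.builder.SavedModelBuilder('./export/1')
builder.add_meta_graph_and_variables(
    sess, [tf.saved_model.tag_constants.SERVING])
builder.save()

# Later, in the serving process:
# sess = tf.Session(graph=tf.Graph())
# tf.saved_model.loader.load(sess, [tf.saved_model.tag_constants.SERVING], './export/1')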

I get this error when I run the second script in the gist:

valid_feed_dict = {
    batch_size: len(batch_x),
    X: batch_x,
    X_len: batch_x_len,
}
prediction = sess.run(transpose, feed_dict=valid_feed_dict)

Error

InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'Placeholder_2' with dtype int32 and shape [?,15]
	 [[Node: Placeholder_2 = Placeholder[dtype=DT_INT32, shape=[?,15], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Any other suggestions?

I modified utils.py to read text instead of a file, and also replaced map and lambdas with list comprehensions to improve readability.

def build_prediction(text, word_dict, article_max_len):
    if not text:
        raise ValueError('Empty text')
    # Tokenize each text, map words to ids (unknown words fall back to <unk>),
    # truncate to article_max_len, then pad with <padding> up to that length.
    x = [word_tokenize(d) for d in _get_text_list(text)]
    x = [[word_dict.get(w, word_dict['<unk>']) for w in d] for d in x]
    x = [d[:article_max_len] for d in x]
    return [d + (article_max_len - len(d)) * [word_dict['<padding>']] for d in x]

@gogasca
Contributor

gogasca commented Sep 24, 2018

I did a slight modification of test.py and now I'm able to do API requests.

This is just a workaround, as I haven't solved the export to .pb issue.
Please take a look at the code below:

  1. Use the build_prediction function above.
  2. Replace map/lambdas with list comprehensions for readability.
"""Use a pre-train model."""

import tensorflow as tf
import pickle

from model import Model
from utils import build_dict, build_prediction, batch_iter

with open('args.pickle', 'rb') as f:
    args = pickle.load(f)

print('Loading dictionary...')
word_dict, reversed_dict, article_max_len, summary_max_len = build_dict('test',
                                                                        args.toy)
print('Loading validation dataset...')

sess = tf.Session()
print('Loading saved model...')
model = Model(reversed_dict, article_max_len, summary_max_len, args,
              forward_only=True)
saver = tf.train.Saver(tf.global_variables())
checkpoint = tf.train.get_checkpoint_state('./saved_model/')
initialize = tf.global_variables_initializer()
sess.run(initialize)
saver.restore(sess, checkpoint.model_checkpoint_path)


def summarize(text):
    """

    Args:
        text: (List) A Text array. Example ['This is a long text']

    """
    valid_x = build_prediction(text, word_dict, article_max_len)
    valid_x_len = [len([y for y in x if y != 0]) for x in valid_x]
    batches = batch_iter(valid_x, [0] * len(valid_x), args.batch_size, 1)
    for batch_x, _ in batches:
        batch_x_len = [len([y for y in x if y != 0]) for x in batch_x]
        valid_feed_dict = {
            model.batch_size: len(batch_x),
            model.X: batch_x,
            model.X_len: batch_x_len,
        }
        prediction = sess.run(model.prediction, feed_dict=valid_feed_dict)
        prediction_output = [[reversed_dict[y] for y in x] for x in
                             prediction[:, 0, :]]
        for line in prediction_output:
            summary = []
            for word in line:
                if word == '</s>':
                    break
                if word not in summary:
                    summary.append(word)
            return ' '.join(summary)

This is the Flask server:

"""Server"""

import summarizer
from flask import Flask, request, Response, json

app = Flask(__name__)


@app.route('/')
def index():
    return Response('TensorFlow text summarizer')


@app.route('/summary', methods=['POST'])
def process_text():
    """Process text."""
    try:
        if request.is_json:
            content = request.json
            text = content.get('text')
            summary = summarizer.summarize([text])
            if summary:
                return app.response_class(
                    response=json.dumps(summary),
                    status=200,
                    mimetype='application/json'
                )
        return app.response_class(
            response=json.dumps('No JSON content found'),
            status=400,
            mimetype='application/json'
        )
    except Exception as exception:
        print('POST /summary error: %s' % exception)
        return app.response_class(
            response=json.dumps(str(exception)),
            status=500,
            mimetype='application/json'
        )


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8081, debug=True)

API Request:

curl -H "Content-type: application/json" -X POST http://127.0.0.1:8081/summary -d '{"text": "australian foreign minister alexander downer called wednesday for the reform of the un security council and expressed support for brazil, india , japan and an african country to join the council ."}'

@gifflarn

gifflarn commented Sep 24, 2018

I too got the script to answer API calls, but it was very slow (~2 s per sentence), which is why I am trying to freeze it. Could you time your solution? Maybe freezing the graph is not necessary.

And to your earlier comment: I did feed dummy values to Placeholder_2 and Placeholder_3. This yielded decimal values which could not be looked up in the dictionary, and if I floored the decimals, I got really weird results.
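
(Concretely, "dummy values" could be something as simple as zero-filled arrays of the expected shapes; the sketch below guesses the shapes from the InvalidArgumentError quoted earlier and is a workaround, not a fix.)

import numpy as np

# Stand-in values for the decoder placeholders a frozen graph still demands
# (Placeholder_2 is [?, 15] int32 per the error; the length shape is a guess).
dummy_decoder_input = np.zeros((len(batch_x), 15), dtype=np.int32)
dummy_decoder_len = np.zeros((len(batch_x),), dtype=np.int32)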

@gogasca
Contributor

gogasca commented Sep 24, 2018

It's now running on a Mac Pro, ~16 GB RAM / Intel Core i7.
It's taking ~1 s per request, which is not ideal. I would be happy with < 500 ms per request.

curl -o /dev/null -s -w 'Total: %{time_total}\n' -H "Content-type: application/json" -X POST http://127.0.0.1:8081/summary -d '{"text": "australian foreign minister alexander downer called wednesday for the reform of the un security council and expressed support for brazil, india , japan and an african country to join the council ."}'
Total: 1.094210

@gifflarn

gifflarn commented Oct 2, 2018

@gogasca Did you progress any further on this? I've put this project on the back burner for now, but I really want it to work, so if you have any ideas, I'm willing to try.

@gogasca
Contributor

gogasca commented Oct 5, 2018

@gifflarn I'm resuming this project today; I need to present some results within the next two weeks. I'll post updates on my progress.

@gifflarn

@gogasca I had the idea of rewriting model.py and putting the decoder placeholders behind a forward_only check, so they are not created at all during the testing phase. Do you think that might help? See the sketch below.
Of course, that would mean retraining the model.
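
(A sketch of what that guard might look like in model.py; the placeholder names come from the feed dicts earlier in this thread, and the shapes are illustrative assumptions. The condition is if not forward_only, so the placeholders exist only in the training graph.)

# Illustrative only: create the decoder placeholders solely for training;
# names and shapes are assumptions based on the feed dicts in this thread.
if not forward_only:
    self.decoder_input = tf.placeholder(tf.int32, [None, summary_max_len])
    self.decoder_len = tf.placeholder(tf.int32, [None])
    self.decoder_target = tf.placeholder(tf.int32, [None, summary_max_len])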
