
Constraints for image captioning module #472

Merged

merged 12 commits into awslabs:master on Jul 26, 2018

Conversation

lorisbaz
Contributor

@lorisbaz lorisbaz commented Jul 10, 2018

Adding

  • Constraint input from plain file.
  • Image Captioning now supports constrained decoding.
  • Image Captioning: zero padding of features now allows input features of different shape for each image.
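As an editor's sketch of the zero-padding idea described in the last bullet (the helper name and shapes here are illustrative assumptions, not this PR's actual code), features of differing shape can be padded with zeros up to a common target shape:

```python
import numpy as np

def zero_pad_features(features, target_shape):
    """Pad each feature array with trailing zeros up to target_shape (illustrative sketch)."""
    padded = []
    for f in features:
        diff = np.subtract(target_shape, f.shape)
        # pad format: ((before_1, after_1), ..., (before_N, after_N))
        pad_width = [(0, d) for d in diff]
        padded.append(np.pad(f, pad_width, 'constant', constant_values=0))
    return padded

# two features of different shape, padded to a common (4, 3) shape
feats = [np.ones((2, 3)), np.ones((4, 1))]
padded = zero_pad_features(feats, (4, 3))
```

This lets a batch mix images whose extracted feature maps differ in size.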

Pull Request Checklist

  • Changes are complete (if posting work-in-progress code, prefix your pull request title with '[WIP]' until you can check this box).
  • Unit tests pass (pytest)
  • System tests pass (pytest test/system)
  • Passed code style checking (./style-check.sh)
  • You have considered writing a test
  • Updated major/minor version in sockeye/__init__.py. Major version bump if this is a backwards incompatible change.
  • Updated CHANGELOG.md

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

lorisbaz and others added 4 commits May 11, 2018 10:08
Merge from upstream (original) repository
Merge from awslabs/sockeye
Merge from awslabs/sockeye
…aptioning module. Zero padding of features enables to have input features of different shape for each image.
@lorisbaz lorisbaz changed the title [WIP] Constraints for image captioning module Constraints for image captioning module Jul 10, 2018
Contributor

@fhieber fhieber left a comment


nice feature for image captioning!

CHANGELOG.md Outdated
@@ -10,6 +10,12 @@ Note that Sockeye has checks in place to not translate with an old model that wa

Each version section may have have subsections for: _Added_, _Changed_, _Removed_, _Deprecated_, and _Fixed_.

## [1.18.34]
### Added
- Constrains input from plain file.
Contributor


This description isn't clear to me: what constrains the input?

Contributor Author

@lorisbaz lorisbaz Jul 10, 2018


I can update that description. The only way to input constraints before was via json file.

Now you can use a plain file. Each line contains a list of space-separated strings (constraints) corresponding to an image (or source sentence). This file is parallel to the one specified in --input.
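As a sketch of the two input styles under discussion (the field names follow Sockeye's JSON input convention; the plain-file reading is illustrative, not the PR's code):

```python
import json

# JSON input: one object per line, constraints carried alongside the source.
json_line = '{"text": "images/cow.jpg", "constraints": ["on the grass"]}'
obj = json.loads(json_line)

# Proposed plain-file variant: a file parallel to --input where each line
# holds space-separated constraint tokens for the corresponding image.
plain_line = "cow grass"
constraints = plain_line.split()  # note: this yields single-token constraints only
```

The `split()` on whitespace is what limits the plain format to single-word constraints, which is the point fhieber raises next.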

Contributor


so are these then only single-word constraints? How about several multi-word constraints (which is supported by JSON)?

Contributor Author


Oh I see. How about having them separated by some special character (e.g., comma or tab)?

Contributor Author


E.g.: blue sky, cow, grass, ...

Contributor


Then we would need to rewrite the entire parsing logic of separating tokens. You can see how complicated this already is with the factored strings. In my opinion, inputs with constraints and/or factors are inherently structured data, not just plain strings. JSON provides a good format for such use cases, whereas plain-file parsing is quite error-prone.

I don't want to enforce my opinion on this though. Maybe others can chime in on this discussion? We could also separate the constrained decoding for image captioning feature from this API change to unblock this PR.

Contributor Author


I understand your point. However, having the target sentence and constraints coupled (as in the JSON) seems inflexible, because we might want to run experiments with different constraints. In image captioning we can have a target file and multiple constraint files (one for each experiment we want to run).

The parsing logic is simple: if json is not provided, we search for the constraint file; otherwise we use the json file as before. Is this too complex?

Contributor


You mean "source sentence", right?

Creating different constraints for an input sentence is already easy to do. You can use the sockeye.lexical_constraints CLI to produce inputs with arbitrary constraints and pass/pipe them directly to sockeye.translate / sockeye.captioner with the --json-input flag.

I think we should hold off on changing the overall API / behavior of the API/translate CLI with this PR. This PR should concentrate on enabling constraints for image captioning.

Contributor Author


Yes. I meant source sentence.

lexical_constraints expects a sentence as input + constraints. Would it work with the image's relative path?

I wanted to avoid using it, because in my case I have to first generate the constraints, then the file with the source sentence (image path) + constraints, then apply lexical_constraints, and only then can I perform inference. It is a bit of a tricky sequence of steps.

I can revert the changes and remove the option --input-constraints, even though I think it might be useful.

Contributor


I don't know how your captioner currently handles the content of the 'text' field in the JSON object. If it uses it as a relative image path, this should already work.

In the end it's up to you how you define the API for the image captioning use case, but I vote for keeping the general translation API (sockeye.translate) as is by reverting the changes in the files outside of sockeye/image_captioning/.

CHANGELOG.md Outdated
### Added
- Constrains input from plain file.
- Contraints can be used for image captioning module.
- Zero padding of features enables to have input features of different shape for each image.
Contributor


Could you rephrase to make clear that this refers to image captioning? Something like: "Image Captioning: zero padding of features now allows input features of different shape for each image."

Contributor Author


Ok.

CHANGELOG.md Outdated
## [1.18.34]
### Added
- Constrains input from plain file.
- Contraints can be used for image captioning module.
Contributor


maybe: Image Captioning now supports constrained decoding?

Contributor Author


Ok.

@@ -1079,6 +1079,12 @@ def add_inference_args(params):
"Optionally, a list of factors can be provided: "
"{'text': 'some input string', 'factors': ['C C C', 'X X X']}.")

decode_params.add_argument('--input-constraints',
Contributor


afaik, this introduces an inconsistency with input from a file or stdin. I think we intentionally restricted constrained decoding to json input to avoid parsing multiple files and having to add tons of checks whether they have the same number of lines as the input file / stdin.
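The bookkeeping alluded to here might look like this minimal check (a hypothetical helper, assuming both inputs are files rather than stdin):

```python
import os
import tempfile

def check_parallel(input_path, constraints_path):
    """Raise if the constraints file does not have one line per input line."""
    with open(input_path) as f:
        n_inputs = sum(1 for _ in f)
    with open(constraints_path) as f:
        n_constraints = sum(1 for _ in f)
    if n_inputs != n_constraints:
        raise ValueError("Expected {} constraint lines, got {}".format(
            n_inputs, n_constraints))

# quick self-check with throwaway parallel files
with tempfile.TemporaryDirectory() as tmp:
    inp = os.path.join(tmp, "input.txt")
    cons = os.path.join(tmp, "constraints.txt")
    with open(inp, "w") as f:
        f.write("images/cow.jpg\nimages/sky.jpg\n")
    with open(cons, "w") as f:
        f.write("cow grass\nblue sky\n")
    check_parallel(inp, cons)  # equal line counts: no exception raised
    ok = True
```

Every extra parallel file multiplies such checks, which is the maintenance cost being weighed against the JSON-only input.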

Contributor Author


To me it makes sense to be flexible here. You can still use the old way with json.

@@ -109,6 +109,10 @@ def main():
params = arguments.ConfigArgumentParser(description='Image Captioning CLI')
arguments_image.add_image_caption_cli_args(params)
args = params.parse_args()
run_captining(args)
Contributor


typo: captioning

Contributor Author


Oops, I'll correct it.

raise ValueError("Provided target shape must be bigger then the original "
"shape. (provided: {}, original {})".format(len(tshape), len(fshape)))
diff_shape = np.subtract(tshape, fshape)
if np.any(diff_shape<0):
Contributor


pep8: spacing

Contributor Author


I know. For some reason, my brain wants compact lines 😄

diff_shape = [[0, d] for d in diff_shape] # pad format: ((before_1, after_1), ... (before_N, after_N))
p = np.pad(f, diff_shape, 'constant', constant_values=0)
pad_feat.append(p)
return pad_feat
Contributor


missing newline

Contributor Author


🆗

def make_input_from_multiple_strings(sentence_id: int, strings: List[str]) -> TranslatorInput:
def make_input_from_multiple_strings(sentence_id: int,
strings: List[str],
constraints: str = None) -> TranslatorInput:
Contributor


type: Optional[str] if it can be None

Contributor Author


Right.

"""
Returns a TranslatorInput object from multiple strings, where the first element corresponds to the surface tokens
and the remaining elements to additional factors. All strings must parse into token sequences of the same length.

:param sentence_id: An integer id.
:param strings: A list of strings representing a factored input sequence.
:param constraints: A string with constraints.
Contributor


"Optional string with constraints" ?

Contributor Author


Sounds good.

@@ -166,6 +174,7 @@ def read_and_translate(translator: inference.Translator,
:param input_file: Optional path to file which will be translated line-by-line if included, if none use stdin.
:param input_factors: Optional list of paths to files that contain source factors.
:param input_is_json: Whether the input is in json format.
:param input_constraints: Optional path to file which will contain constrains for each source sentence.
Contributor


typo: constraints

Contributor Author


🆗

@lorisbaz
Contributor Author

@fhieber I reverted the --input-constraints option and covered the other small issues listed above.

Contributor

@fhieber fhieber left a comment


Looks good to me! Thanks for iterating and for separating the two changes from each other!

Unless it's urgent to get this merged, I'd suggest waiting for approval from @mjpost, who knows the whole constraints business best.

@@ -754,7 +754,8 @@ def make_input_from_factored_string(sentence_id: int,
return TranslatorInput(sentence_id=sentence_id, tokens=tokens, factors=factors)


def make_input_from_multiple_strings(sentence_id: int, strings: List[str]) -> TranslatorInput:
def make_input_from_multiple_strings(sentence_id: int,
Contributor


unnecessary change

Contributor Author


Agree

@@ -311,7 +311,7 @@ def test_failed_make_input_from_valid_json_string(text, text_key, factors, facto
@pytest.mark.parametrize("strings",
[
["a b c"],
["a b c", "f1 f2 f3", "f3 f3 f3"]
["a b c", "f1 f2 f3", "f3 f3 f3"],
Contributor


probably not an issue, but adding the comma could be reverted.

Contributor Author


It was just because I removed the constraints manually. Fixing it.

feature_shape = feature.shape
if len(feature_shape) < len(target_shape): # add extra dimensions
for i in range(len(target_shape)-len(feature_shape)):
feature = np.expand_dims(feature, axis=len(feature.shape)+1)
Contributor


tiny nit: spacing :D

Contributor Author


Arggggg ☠️ 😄

@lorisbaz
Contributor Author

Not urgent. But let's not forget about it 😸

@fhieber
Contributor

fhieber commented Jul 11, 2018

I won't forget :)

@mjpost
Contributor

mjpost commented Jul 11, 2018

Looks good to me. Two thoughts:

  • We could consider using a keyword other than "tokens" in TranslatorInput for the path to the image file. There's no need to overload it, and a separate keyword could be clearer.
  • Can you add notes to the tutorials to list this feature, for both lexical constraints and image captioning?

@mjpost
Contributor

mjpost commented Jul 11, 2018

As an additional thought, note that the lexical constraints documentation provides a command-line module for generating the JSON object from tab-delimited data on STDIN of the form:

input sentence TAB constraint1 TAB another constraint TAB yet another constraint

If you ignored my thought #1, this would work transparently with image captioning (with field 1 as the path).
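The tab-delimited form above maps to the JSON input in a straightforward way. A hedged sketch (the "text" and "constraints" field names follow Sockeye's JSON input convention; the function itself is illustrative, not the module's actual code):

```python
import json

def tsv_to_json(line):
    """Convert 'source TAB constraint1 TAB constraint2 ...' to a JSON input line."""
    fields = line.rstrip("\n").split("\t")
    return json.dumps({"text": fields[0], "constraints": fields[1:]})

# with field 1 as an image path, the same conversion serves image captioning
line = "images/cow.jpg\tblue sky\ton the grass"
json_line = tsv_to_json(line)
```

Because constraints are whole tab-separated fields, multi-word constraints survive intact, unlike the space-separated plain-file format discussed earlier.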

@lorisbaz
Contributor Author

I removed the option to have a plain file. We will use JSON files as you and Felix suggested.

I will add some notes about lexical constraints in the image captioning tutorial.

@lorisbaz
Contributor Author

Hey @mjpost, I added the instructions. Can you have a look and approve if it is ok?
Thanks.

@mjpost
Contributor

mjpost commented Jul 26, 2018

LGTM. You'll have to remerge master once more since we just fixed the pylint issues that caused Travis to fail.

@lorisbaz
Contributor Author

Done

@tdomhan tdomhan merged commit e6cd190 into awslabs:master Jul 26, 2018