New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Constraints for image captioning module #472
Conversation
Merge from upstream (original) repository
Merge from awslabs/sockeye
Merge from awslabs/sockeye
…aptioning module. Zero padding of features enables to have input features of different shape for each image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice feature for image captioning!
CHANGELOG.md
Outdated
@@ -10,6 +10,12 @@ Note that Sockeye has checks in place to not translate with an old model that wa | |||
|
|||
Each version section may have have subsections for: _Added_, _Changed_, _Removed_, _Deprecated_, and _Fixed_. | |||
|
|||
## [1.18.34] | |||
### Added | |||
- Constrains input from plain file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This description isn't clear to me: what constrains the input?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can update that description. The only way to input constraints before was via json file.
Now you can use a plain file. In each line, you have a list of space-separated strings (constraints) which correspond to an image (or source sentence). This is a parallel file to the one specified in --input
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so are these then only single-word constraints? How about several multi-word constraints (which is supported by JSON)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see. How about having them separated by some special character (e.g., comma or tab)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E.g.: blue sky, cow, grass, ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we would need to rewrite the entire parsing logic of separating tokens. You can see how complicated this already is with the factored strings. In my opinion, inputs with constraints and/or factors is inherently structured data and not just plain strings. JSON provides a good format for such use cases, whereas plain file parsing is quite error-prone.
I don't want to enforce my opinion on this though. Maybe others can chime in on this discussion? We could also separate the constrained decoding for image captioning feature from this API change to unblock this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand your point. However, it seems that having the target sentence and constraints coupled (as in the JSON) is not flexible, because we might want to run experiments with different constraints. In image captioning we can have a target file and multiple constraints file (one for each experiment we want to run).
The parsing logic is simple: if json is not provided, we search for the constraint file; otherwise we use the json file as before. Is this too complex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean "source sentence", right?
Creating different constraints for an input sentence is already easy to do. You can use the sockeye.lexical_constraints CLI to produce inputs with arbitrary constraints and pass/pipe them directly to sockeye.translate / sockeye.captioner with the --json-input
flag.
I think we should hold off on changing the overall API / behavior of the API/translate CLI with this PR. This PR should concentrate on enabling constraints for image captioning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I meant source sentence.
lexical_constraints expects a sentence as input + constraints. Would it work with the image relative path?
I wanted to avoid to use it, because in my case I have to first generate the constraints, then the file with source sentence (image path) + constraints, then apply lexical_constraints and now I can perform inference. It is a bit tricky sequence of steps.
I can revert the changes and remove the option --input-constraints
, even though I think it might be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know how your captioner currently handles the content of the 'text' field in the JSON object. If it uses it as a relative image path this should already work.
In the end its up to you how you define the API for the image captioning use case, but I vote for keeping the general translation API (sockeye.translate) as is by reverting the changes in the files out side of sockeye/image_captioning/
.
CHANGELOG.md
Outdated
### Added | ||
- Constrains input from plain file. | ||
- Contraints can be used for image captioning module. | ||
- Zero padding of features enables to have input features of different shape for each image. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you rephrase to make clear that this refers to image captioning? Something like: "Image Captioning: zero padding of features now allows input features of different shape for each image."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
CHANGELOG.md
Outdated
## [1.18.34] | ||
### Added | ||
- Constrains input from plain file. | ||
- Contraints can be used for image captioning module. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe: Image Captioning now supports constrained decoding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
sockeye/arguments.py
Outdated
@@ -1079,6 +1079,12 @@ def add_inference_args(params): | |||
"Optionally, a list of factors can be provided: " | |||
"{'text': 'some input string', 'factors': ['C C C', 'X X X']}.") | |||
|
|||
decode_params.add_argument('--input-constraints', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
afaik, this introduces an inconsistency with input from a file or stdin. I think we intentionally restricted constrained decoding to json input to avoid parsing multiple files and having to add tons of checks whether they have the same number of lines as the input file / stdin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me it makes sense to be flexible here. You can still use the old way with json.
@@ -109,6 +109,10 @@ def main(): | |||
params = arguments.ConfigArgumentParser(description='Image Captioning CLI') | |||
arguments_image.add_image_caption_cli_args(params) | |||
args = params.parse_args() | |||
run_captining(args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: captioning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ops, I'll correct it
sockeye/image_captioning/utils.py
Outdated
raise ValueError("Provided target shape must be bigger then the original " | ||
"shape. (provided: {}, original {})".format(len(tshape), len(fshape))) | ||
diff_shape = np.subtract(tshape, fshape) | ||
if np.any(diff_shape<0): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pep8: spacing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know. For some reason, my brain wants compact lines 😄
sockeye/image_captioning/utils.py
Outdated
diff_shape = [[0, d] for d in diff_shape] # pad format: ((before_1, after_1), ... (before_N, after_N)) | ||
p = np.pad(f, diff_shape, 'constant', constant_values=0) | ||
pad_feat.append(p) | ||
return pad_feat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing newline
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🆗
sockeye/inference.py
Outdated
def make_input_from_multiple_strings(sentence_id: int, strings: List[str]) -> TranslatorInput: | ||
def make_input_from_multiple_strings(sentence_id: int, | ||
strings: List[str], | ||
constraints: str = None) -> TranslatorInput: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type: Optional[str] if it can be None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right.
sockeye/inference.py
Outdated
""" | ||
Returns a TranslatorInput object from multiple strings, where the first element corresponds to the surface tokens | ||
and the remaining elements to additional factors. All strings must parse into token sequences of the same length. | ||
|
||
:param sentence_id: An integer id. | ||
:param strings: A list of strings representing a factored input sequence. | ||
:param constraints: A string with constraints. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Optional string with constraints" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
sockeye/translate.py
Outdated
@@ -166,6 +174,7 @@ def read_and_translate(translator: inference.Translator, | |||
:param input_file: Optional path to file which will be translated line-by-line if included, if none use stdin. | |||
:param input_factors: Optional list of paths to files that contain source factors. | |||
:param input_is_json: Whether the input is in json format. | |||
:param input_constraints: Optional path to file which will contain constrains for each source sentence. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: constraints
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🆗
@fhieber I reverted the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Thanks for iterating and for separating the two changes from each other!
Unless its urgent to get merged, I'd suggest waiting for approval from @mjpost who knows the whole constraints business best.
sockeye/inference.py
Outdated
@@ -754,7 +754,8 @@ def make_input_from_factored_string(sentence_id: int, | |||
return TranslatorInput(sentence_id=sentence_id, tokens=tokens, factors=factors) | |||
|
|||
|
|||
def make_input_from_multiple_strings(sentence_id: int, strings: List[str]) -> TranslatorInput: | |||
def make_input_from_multiple_strings(sentence_id: int, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unnecessary change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree
test/unit/test_inference.py
Outdated
@@ -311,7 +311,7 @@ def test_failed_make_input_from_valid_json_string(text, text_key, factors, facto | |||
@pytest.mark.parametrize("strings", | |||
[ | |||
["a b c"], | |||
["a b c", "f1 f2 f3", "f3 f3 f3"] | |||
["a b c", "f1 f2 f3", "f3 f3 f3"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably not an issue, but adding the comma could be reverted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was just because I remove the constraints manually. Fixing it.
sockeye/image_captioning/utils.py
Outdated
feature_shape = feature.shape | ||
if len(feature_shape) < len(target_shape): # add extra dimensions | ||
for i in range(len(target_shape)-len(feature_shape)): | ||
feature = np.expand_dims(feature, axis=len(feature.shape)+1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tiny nit: spacing :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arggggg ☠️ 😄
Not urgent. But let's not forget about it 😸 |
I won't forget :) |
Looks good to me. Two thoughts:
|
As an additional thought, note that the lexical constraints documentation provides a command-line module for generating the JSON object from tab-delimited data on STDIN of the form:
If you ignored my thought #1, this would work transparently with image captioning (with field 1 as the path). |
I removed the option to have a plain file. We will use JSON files as you and Felix suggested. I will add some notes about lexical constraints in the image captioning tutorial. |
Hey @mjpost, I added the instructions. Can you have a look and approve if it is ok? |
LGTM. You'll have to remerge master once more since we just fixed the pylint issues that caused Travis to fail. |
Done |
Adding
Constrains input from plain file.Pull Request Checklist
until you can check this box.
pytest
)pytest test/system
)./style-check.sh
)sockeye/__init__.py
. Major version bump if this is a backwards incompatible change.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.