Smart merge of po files #55

Closed
nijel opened this issue Jul 2, 2012 · 22 comments
Labels
enhancement Adding or requesting a new feature.

@nijel
Member

nijel commented Jul 2, 2012

It should be possible to write a smarter po file merger for Git, one that actually uses the content of the files and pushes the translations onto the new template. It might indeed be complex, but it should be doable.

@jan-hudec

I'd point here to the comment I just made on issue #98. I am currently working on a 3-way merge for translate toolkit. It really does not have much to do with Weblate. I only use .po, but I am writing it so that it should be extensible to any other format supported by translate toolkit (e.g. xliff).

@quinox
Contributor

quinox commented Oct 27, 2015

I ran into a conflict today where the only difference was the po header timestamp.

I haven't tested it yet, but looking around I found this gist, which points to git-whistles. The heavy lifting in git-whistles is done by msgmerge/msgcat/msguniq; the driver itself doesn't do any .po parsing (source code / explanation).
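
A conflict of the kind described above, where only the header timestamps differ, can be detected before anyone resolves it by hand. The following is a minimal sketch, not part of git-whistles or Weblate; the function names are made up for illustration, and it assumes each date header sits on its own line, as gettext tools normally write it.

```shell
#!/bin/sh
# Print a PO file without the two volatile date header lines.
strip_po_dates() {
    grep -v -e '^"POT-Creation-Date: ' -e '^"PO-Revision-Date: ' "$1"
}

# po_same_ignoring_dates A.po B.po
# Exit 0 when the files are identical apart from the date headers,
# 1 when they differ elsewhere, 2 when a temp file cannot be created.
po_same_ignoring_dates() {
    tmp_a=$(mktemp) || return 2
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 2; }
    strip_po_dates "$1" > "$tmp_a"
    strip_po_dates "$2" > "$tmp_b"
    cmp -s "$tmp_a" "$tmp_b"
    status=$?
    rm -f "$tmp_a" "$tmp_b"
    return "$status"
}
```

For example, `po_same_ignoring_dates ours.po theirs.po && echo "timestamp-only difference"`.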

@nijel
Member Author

nijel commented Oct 29, 2015

There is already one po merge driver in Weblate sources: https://docs.weblate.org/en/latest/admin/continuous.html#updating-repositories

@nijel
Member Author

nijel commented May 19, 2017

Just to document the current state:

  • Weblate currently tries to use git-merge-gettext-po as a merge driver
  • However, because that script is shipped under examples, Weblate doesn't find it on pip-installed systems; this should be fixed
  • Maybe it would be better to base the merge driver on something more clever, for example https://github.com/jan-hudec/podiffutils

@stanislav-brabec
Contributor

Well, even msgcat alone (from the gettext tools) is capable of performing the merge. But the merge itself is not sufficient: most probably, the source of the conflict is a pot file change combined with a corresponding po file edit.
We want:

msgcat --use-first from_weblate.po from_upstream.po -o intermediate.po
msgmerge --previous --lang=my_language intermediate.po new_pot_file.pot -o merged.po
rm intermediate.po

Here is a problem: Weblate knows where the pot file is. Git does not. gettextize/autopoint knows it as well (because it creates the infrastructure). ⇒ It would be nice to integrate this feature into the gettext tools, and additionally to add it to a tool that generates a merge script (which needs access to the Weblate component description).

Also note that such a generic script requires the pot file to be merged before all po files.

Note that it would be better to replace --use-first with --use-latest (using PO-Revision-Date), but that is not implemented in gettext yet.
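
The two commands above can be wrapped so that the intermediate file is always cleaned up, even when one step fails. This is only a sketch under assumptions: the function name and argument order are invented here, the --lang option is left out for brevity, and the caller still has to know where the pot file lives, which is exactly the problem described above.

```shell
#!/bin/sh
# merge_po_with_template WEBLATE_PO UPSTREAM_PO POT OUTPUT_PO
# Hypothetical wrapper around msgcat + msgmerge from the gettext tools.
merge_po_with_template() {
    if [ "$#" -ne 4 ]; then
        echo "usage: merge_po_with_template WEBLATE_PO UPSTREAM_PO POT OUTPUT_PO" >&2
        return 2
    fi
    intermediate=$(mktemp) || return 1
    # Step 1: combine both PO files; for duplicate messages the first file wins.
    # Step 2: update the combined catalog against the new template.
    if msgcat --use-first "$1" "$2" -o "$intermediate" &&
       msgmerge --previous "$intermediate" "$3" -o "$4"; then
        status=0
    else
        status=1
    fi
    # Remove the intermediate file whether or not the merge succeeded.
    rm -f "$intermediate"
    return "$status"
}
```

Usage might look like `merge_po_with_template from_weblate.po from_upstream.po new_pot_file.pot merged.po`.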

@jan-hudec

> Here is a problem: Weblate knows where the pot file is. Git does not.

And I believe it shouldn't be relevant. A merge is a merge, and the best approach is to do a 3-way merge. I have prototyped a 3-way merge for PO files in https://github.com/jan-hudec/podiffutils. I used it on a project for some time (before changing jobs last summer) and it worked. It would deserve integrating into translate-toolkit, but it would need to be polished a bit first.

> Also note that such a generic script requires the pot file to be merged before all po files.

Not if you do a 3-way merge. Also, the only way to merge the pot is a 3-way merge, and the algorithm is the same for pot and po, because a pot is just a po with no translations filled in.
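
To make the 3-way idea concrete, here is the general decision rule applied to a single value, for example the msgstr of one message as seen in the common ancestor, in our version, and in their version. This is a generic illustration and is not taken from podiffutils; the function name is made up.

```shell
#!/bin/sh
# three_way_pick BASE OURS THEIRS
# Prints the merged value and returns 0, or returns 1 on a real conflict.
three_way_pick() {
    if [ "$2" = "$3" ]; then
        printf '%s\n' "$2"      # both sides agree (or neither side changed)
    elif [ "$1" = "$2" ]; then
        printf '%s\n' "$3"      # only "theirs" changed
    elif [ "$1" = "$3" ]; then
        printf '%s\n' "$2"      # only "ours" changed
    else
        return 1                # both sides changed, in different ways
    fi
}
```

Because an untranslated entry is simply an empty value, the same rule covers a pot file: `three_way_pick "" "Ahoj" ""` keeps the translation that only one side added.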

@nijel
Member Author

nijel commented Oct 11, 2018

@stanislav-brabec This is what we were discussing yesterday.

@Jibec
Contributor

Jibec commented May 14, 2019

Hello, thanks to your documentation and my lack of patience, I wrote this script: https://pagure.io/fedora-docs/translations-scripts/blob/master/f/solve_weblate_merge_failures.sh

Only the pot file path is specific to the way the localizations are stored: https://pagure.io/fedora-docs/translations-scripts/blob/master/f/solve_weblate_merge_failures.sh#_43

If I understand correctly, Weblate should be able to provide the upstream repository to this script (and the full path of the local Git repo for Weblate's own copy; yes, you can git clone /any/local/path).

As we already have the "new_base", exposed by the component API, I assume it should be possible to automate this?

@eighthave
Contributor

In my experience, one of the biggest ongoing sources of git merge conflicts is the pair of gettext fields POT-Creation-Date: and PO-Revision-Date:. So I would not like to see those fields be required by Weblate, e.g. --use-latest. IMHO, I think they should be stripped if possible.

@stanislav-brabec
Contributor

> So I would not like to see those fields be required by Weblate, e.g. --use-latest. IMHO, I think they should be stripped if possible.

I disagree. These fields are mandatory in the po file format, and they are among the most important fields for human review, e.g. for checking the maintenance status or checking that po files are in sync with the pot.

@eighthave
Contributor

eighthave commented Mar 24, 2020 via email

@stanislav-brabec
Contributor

> it would be worth it to me to disable the features that require POT-Creation-Date and PO-Revision-Date so that there will be fewer merge conflicts.

  1. Again, these fields are mandatory parts of the file format definition (established 25 years ago). Third-party tools can reject these po files or fail, and even applications and websites can fail (any runtime tool can check this header, and many really do when they display translation credits). You would need to initiate a negotiation to change the file format definition, and then request modification of everything that depends on it: tools, software, and even websites. It would take ~10 years to complete.

  2. It would make a fast translation status check impossible. That is why the above will be rejected by GNU/FSF.
    Example: gettext libc '' | grep ^PO-Revision-Date: lets you check the glibc translation age at runtime. There is no alternative for this feature.

  3. Merge conflicts are caused by a diff/patch merge algorithm that is inferior for po files. This bug addresses it in the right way. Once we provide the correct merge algorithm, rejects will rarely occur.
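
The runtime check in point 2 reads the header through gettext. The same field can also be read straight from a po file on disk. This is a small sketch with a made-up function name; it assumes the header line is written on a single line in the usual `"PO-Revision-Date: ...\n"` form.

```shell
#!/bin/sh
# po_revision_date FILE.po
# Prints the value of the PO-Revision-Date header (first match only).
po_revision_date() {
    sed -n 's/^"PO-Revision-Date: \(.*\)\\n"$/\1/p' "$1" | head -n 1
}
```

For example, `po_revision_date cs.po` prints just the date string, which a review script can then compare against the pot file's POT-Creation-Date.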

@eighthave
Contributor

eighthave commented Mar 24, 2020 via email

@stanislav-brabec
Contributor

> If there is a standard way to manage git merge algorithms for specific file types

Git has support for custom merge strategies. https://git-scm.com/docs/git-merge#_merge_strategies
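
For a specific file type, Git also lets a repository route matching paths to a custom merge driver, declared in the Git config and selected in .gitattributes. Below is a minimal sketch of installing one. The driver name po3way and the function name are made up for illustration and are not what Weblate ships; the driver command is whatever merge script you choose to use.

```shell
#!/bin/sh
# install_po_merge_driver REPO_DIR DRIVER_COMMAND
# "po3way" is a hypothetical driver name used only in this example.
install_po_merge_driver() {
    git -C "$1" config merge.po3way.name "3-way merge for gettext PO files" || return 1
    # %O = common ancestor, %A = our version (the driver writes its result here),
    # %B = their version
    git -C "$1" config merge.po3way.driver "$2 %O %A %B" || return 1
    # Route PO and POT files in this repository to the driver.
    printf '%s\n' '*.po merge=po3way' '*.pot merge=po3way' >> "$1/.gitattributes"
}
```

Usage might look like `install_po_merge_driver . /usr/local/bin/my-po-merge`. Git expects the driver to leave the merged result in the file passed as %A and to exit with a non-zero status when it could not merge cleanly.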

@eighthave
Contributor

eighthave commented Mar 24, 2020 via email

@stanislav-brabec
Contributor

> right, I mean that if it is something that can be included when someone does apt-get install weblate or something like that. I think it is important that whatever the solution is, it is easy to deploy and transparent to use.

It is not related to Weblate. But yes, ideally Weblate should install it automatically into hosted Git repositories.

It is easy to do such a three-way merge manually:

msgcat --use-first local-old.po remote.po -o temporary.po # or the proposed --use-latest option, once gettext implements it
msgmerge --previous temporary.po local-new.pot -o local-new.po
rm temporary.po

Writing a generic three-way merge command that works with any translation layout is more complicated.

@cpa-level-it

cpa-level-it commented Apr 9, 2020

> > it would be worth it to me to disable the features that require POT-Creation-Date and PO-Revision-Date so that there will be fewer merge conflicts.
>
> 1. Again, these fields are mandatory parts of the file format definition (established 25 years ago). Third-party tools can reject these po files or fail, and even applications and websites can fail (any runtime tool can check this header, and many really do when they display translation credits). You would need to initiate a negotiation to change the file format definition, and then request modification of everything that depends on it: tools, software, and even websites. It would take ~10 years to complete.
> 2. It would make a fast translation status check impossible. That is why the above will be rejected by GNU/FSF.
>    Example: gettext libc '' | grep ^PO-Revision-Date: lets you check the glibc translation age at runtime. There is no alternative for this feature.
> 3. Merge conflicts are caused by a diff/patch merge algorithm that is inferior for po files. This bug addresses it in the right way. Once we provide the correct merge algorithm, rejects will rarely occur.

While I agree it's a standard, I still think that:

  • For most users who use only Weblate, those fields are useless, as the files are in Git.
  • Most "newbie" users will encounter this merge problem and won't want to dig deeper into Git to handle the merge.

I say let end users make their own decision on whether they want these fields in the file or not,
as long as this doesn't break other functions inside Weblate, of course.

@mikkorantalainen

Here's a more generic 3-way merge for .po files: https://stackoverflow.com/a/68799310/334451

@nijel nijel self-assigned this Sep 8, 2021
@nijel nijel added this to the 4.8.1 milestone Sep 8, 2021
nijel added a commit that referenced this issue Sep 8, 2021
@nijel nijel closed this as completed in e1ac67d Sep 8, 2021
File format support automation moved this from TODO to Done Sep 8, 2021
@github-actions

github-actions bot commented Sep 8, 2021

Thank you for your report; the issue you have reported has just been fixed.

  • In case you see a problem with the fix, please comment on this issue.
  • In case you see a similar problem, please open a separate issue.
  • If you are happy with the outcome, don’t hesitate to support Weblate by making a donation.

@burner1024
Contributor

@nijel so is this script used automatically by Weblate, or does it need additional configuration? I can't make it out from the comments, and it's not mentioned in the documentation as far as I can see.

Install with git config...
Note that you also need file .gitattributes with following lines...
This merge driver is now automatically installed for all Weblate internal...

nijel added a commit that referenced this issue Jun 1, 2023
@nijel
Member Author

nijel commented Jun 1, 2023

Please avoid posting the same question twice, #9328 already has the answer.

@burner1024
Contributor

I asked here first. Since there was no response for a few days, I assumed you didn't see the notification.
Didn't mean to bother, sorry.

9 participants