csware edited this page May 5, 2012 · 8 revisions


Git for Windows Unicode Support

As of version 1.7.10, Git for Windows supports Unicode. Most importantly, this means that Git repositories with non-ASCII file names can now be shared seamlessly between Git for Windows and other Git flavors (i.e. Git on Linux/Mac, Cygwin-Git and JGit / EGit).

Unfortunately, it also means that users of previous Git for Windows versions need to update their Git settings, and probably need to migrate their Git repositories, too.

Known Issues

Git for Windows

  • MSYS programs don't fully support Unicode yet, e.g.
    • bash doesn't let you type non-ASCII characters
    • ls converts non-ASCII characters to '?' when printing to the console (redirecting to a file or another program works, though)
  • Tcl only supports BMP (Basic Multilingual Plane, i.e. Unicode characters \u0000 - \uffff), therefore gitk and git-gui currently don't support e.g. CJK Extensions B - D.

Other Git tools

  • TortoiseGit supports Unicode starting with version 1.7.9 ( http://code.google.com/p/tortoisegit/downloads/ ).
  • GitExtensions needs to be configured to use UTF-8 ("Settings" dialog, "Global settings" tab, "Files encoding" and "GitExtensions encoding")

Editor

If you want to use a custom text editor to enter commit messages or to edit config files (instead of the vim/gvim installed with Git for Windows), find one that supports Unix line breaks (LF only) and can save UTF-8 without a BOM. Windows notepad.exe is a bad choice for this reason.
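To check what an editor actually wrote, a small shell sketch like the following may help. It looks for the UTF-8 BOM (the bytes EF BB BF at the start of the file) and for carriage returns. The function name check_editor_output is hypothetical and not part of Git; the sketch assumes a POSIX-like shell with head, od, tr and grep, such as the MSYS bash shipped with Git for Windows.

```shell
# Sketch only: check_editor_output is a hypothetical helper, not a Git command.
check_editor_output() {
    file=$1
    # The UTF-8 BOM is the three bytes EF BB BF at the very start of the file.
    first3=$(head -c 3 "$file" | od -An -tx1 | tr -d ' \n')
    if [ "$first3" = "efbbbf" ]; then
        echo "BOM"
    else
        echo "no BOM"
    fi
    # Windows line breaks are CR LF; Unix line breaks are LF only.
    cr=$(printf '\r')
    if grep -q "$cr" "$file"; then
        echo "CR found"
    else
        echo "LF only"
    fi
}
```

Running check_editor_output on a saved commit message or config file should print "no BOM" and "LF only" if the editor is suitable.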

Settings

Windows settings

Console font (per user)

The default console font does not support Unicode. Change the console font to a TrueType font such as Lucida Console or Consolas. The setup program can do this automatically, but only for the installing user.

Git settings

These can be set per user (with the --global option) or per repository; repository settings take precedence.

Disable quoted file names

By default, git will print non-ASCII file names in quoted octal notation, i.e. "\nnn\nnn...". This can be disabled with

 git config [--global] core.quotepath off
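To illustrate what the quoted notation encodes: each \nnn is one byte of the file name written in octal, so a UTF-8 "ä" (bytes C3 A4) appears as \303\244. The sketch below decodes such a name in the shell. unquote_git_path is a hypothetical helper, not a Git command, and it only handles octal escapes and the surrounding quotes, not every escape Git may emit.

```shell
# Sketch only: unquote_git_path is a hypothetical helper, not part of Git.
unquote_git_path() {
    name=$1
    # Strip the surrounding double quotes that Git adds to quoted names.
    name=${name#\"}
    name=${name%\"}
    # Escape '%' so printf treats it literally, then let printf expand \nnn.
    fmt=$(printf '%s' "$name" | sed 's/%/%%/g')
    printf "$fmt\n"
}
```

For example, unquote_git_path '"\303\244.txt"' should print ä.txt on a UTF-8 capable terminal.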

Disable commit message transcoding

Previous versions of Git for Windows required setting i18n.logoutputencoding to your Windows system's default OEM encoding for proper console output of non-ASCII commit messages. This is no longer necessary. Remove this setting or set it to 'utf-8':

 git config [--global] --unset i18n.logoutputencoding

The i18n.commitencoding setting should also be removed or set to 'utf-8' to support non-ASCII commit messages on the command line (git commit -m "..." from cmd.exe; MSYS bash won't let you enter non-ASCII characters):

 git config [--global] --unset i18n.commitencoding

Disable SVN file name transcoding

If you're using git-svn, reencoding SVN file names is no longer necessary (SVN also stores file names in UTF-8):

 git config [--global] --unset svn.pathnameencoding
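To see which of the three encoding settings mentioned in this section are still set before unsetting them, a sketch along these lines could be used. report_obsolete_encoding_settings is a hypothetical helper name; it relies only on git config --get, which exits with a non-zero status when a key is not set.

```shell
# Sketch only: report_obsolete_encoding_settings is a hypothetical helper.
report_obsolete_encoding_settings() {
    # Any arguments (for example --global) are passed through to git config.
    for key in i18n.logoutputencoding i18n.commitencoding svn.pathnameencoding; do
        # git config --get exits non-zero when the key is not set.
        if value=$(git config "$@" --get "$key"); then
            echo "$key=$value"
        fi
    done
}
```

Call it without arguments inside a repository to see the effective values, or with --global to inspect only the per-user settings. An empty result means nothing needs to be unset; a value of utf-8 for the i18n settings is also fine.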

Migrating old Git for Windows repositories

This is only relevant if you used non-ASCII file names with non-Unicode Git for Windows versions.

Previous Git for Windows versions stored file names in the default encoding of the originating Windows system, making these repositories incompatible with other Windows language-versions and other Git versions (including Cygwin-Git and JGit / EGit on Windows).

The Unicode-enabled Git for Windows stores file names encoded as UTF-8.
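The difference can be made visible at the byte level. The following sketch assumes the originating system used the Western European code page CP1252 (an assumption for illustration; your system may have used a different one) and that iconv is available. to_hex is a hypothetical helper.

```shell
# Sketch only: to_hex is a hypothetical helper that prints stdin as hex digits.
to_hex() {
    od -An -tx1 | tr -d ' \n'
    echo
}

# "ä" as a single CP1252 byte (octal 344 = hex e4), as an old version stored it
printf '\344' | to_hex                              # prints e4
# the same character transcoded to UTF-8 (two bytes), as stored now
printf '\344' | iconv -f CP1252 -t UTF-8 | to_hex   # prints c3a4
```

A tool that expects UTF-8 sees the lone byte e4 as invalid, which is why such file names show up mangled.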

Checking if a repository contains non-ASCII file names

The recodetree check command scans the entire history of a git repository and prints all non-ASCII file names. If the output is empty, no migration is necessary.

Note: the recodetree script doesn't work with quoted file names, so disable quoting first: git config [--global] core.quotepath off
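If the recodetree script is not at hand, a rough approximation of the check can be sketched with plain Git commands. This is not the recodetree script itself: both function names are hypothetical, the list is built from the paths git log reports for each commit (so paths that only appear in merge commits may be missed), and "non-ASCII" is approximated as any byte outside the printable ASCII range.

```shell
# Sketch only: both helpers are hypothetical and not part of Git or recodetree.

# Keep only input lines that contain a byte outside printable ASCII.
filter_non_ascii() {
    # In the C locale, [^ -~] matches any byte outside 0x20..0x7E.
    LC_ALL=C grep '[^ -~]'
}

# Print every path touched by any commit reachable from any ref, once each.
list_history_paths() {
    git -c core.quotepath=off log --all --name-only --pretty=format: |
        LC_ALL=C sort -u
}
```

Inside a repository, list_history_paths | filter_non_ascii prints the candidate file names; no output suggests that no migration is needed.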

Migration with previous Git for Windows version available

The simplest way to convert old repositories is by keeping an old Git for Windows version around (e.g. installed in C:\git1.7.9):

  1. With the old Git for Windows version: Check out a completely clean state of the working copy (so git status reports nothing, not even untracked files):
    /c/git1.7.9/bin/git clean -f && /c/git1.7.9/bin/git reset --hard
  2. With the new Git for Windows version: git status should now report non-ASCII file names as untracked (with correct file names) and, in most cases, also as deleted (with mangled file names).
  3. Replace file names in the staging area with the current state of the working copy:
    git rm -rf --cached \* && git add --all
  4. git status should now report all non-ASCII file names as renamed only.
  5. Commit the changes:
    git commit -m "UTF-8 conversion"
  6. Repeat for every branch of interest
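The steps above can be combined into a small shell function for the current branch. This is a sketch under assumptions: stage_utf8_names is a hypothetical name, the path to the old git executable is passed as the first argument, and plain git on the PATH is the new Unicode-enabled version. It stops after staging so that the status output can be reviewed before committing.

```shell
# Sketch only: stage_utf8_names is a hypothetical helper.
# $1 = path to the old git executable (an assumption, adjust to your install)
stage_utf8_names() {
    old_git=$1
    # Step 1: clean working copy, written by the old version.
    "$old_git" clean -f || return 1
    "$old_git" reset --hard || return 1
    # Step 3: rebuild the staging area from the working copy (new version).
    git rm -rf --cached '*' || return 1
    git add --all || return 1
    # Step 4: show the result; the commit itself is left to the user.
    git status --short
}
```

For example, stage_utf8_names /c/git1.7.9/bin/git followed, after review, by git commit -m "UTF-8 conversion". Each command runs only if the previous one succeeded.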

Migration without previous Git for Windows available

Manually

This requires renaming all non-ASCII file names manually.

  1. Check out a clean state of the working copy:
    git clean -f && git reset --hard
  2. git status will report non-ASCII file names as untracked (mostly with mangled names).
  3. Fix the mangled file names in the working copy manually.
  4. Replace file names in the staging area with the current state of the working copy:
    git rm -rf --cached \* && git add --all
  5. git status should now report all non-ASCII file names as renamed only.
  6. Commit the changes:
    git commit -m "UTF-8 conversion"
  7. Repeat for every branch of interest

Using the recodetree script (experimental)

This requires iconv.exe on the path.

  1. Replace file names in the staging area with the transcoded names of the HEAD commit:
    recodetree head
  2. Reset the working copy to the state of the staging area:
    git clean -f && git checkout-index -af
  3. git status should now report all non-ASCII file names as renamed only.
  4. Commit the changes:
    git commit -m "UTF-8 conversion"
  5. Repeat for every branch of interest

Convert config files

Git config files with non-ASCII content need to be converted to UTF-8, for example your name in %HOME%/.gitconfig, or non-ASCII file names in .gitattributes / .gitignore / .gitmodules files.
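One way to do the conversion is with iconv, as in the following sketch. convert_to_utf8 is a hypothetical helper; the source encoding must be supplied by you (CP1252 is only an example), and the function keeps a .bak copy next to the file. Run it only once per file, since converting an already converted file would garble the non-ASCII characters.

```shell
# Sketch only: convert_to_utf8 is a hypothetical helper.
# $1 = file to convert, $2 = its current encoding (for example CP1252)
convert_to_utf8() {
    file=$1
    from=$2
    # Keep a backup of the original bytes.
    cp "$file" "$file.bak" || return 1
    # Write to a temporary file first so a failed conversion leaves the original intact.
    iconv -f "$from" -t UTF-8 "$file.bak" > "$file.tmp" || { rm -f "$file.tmp"; return 1; }
    mv "$file.tmp" "$file"
}
```

For example, convert_to_utf8 "$HOME/.gitconfig" CP1252, assuming the file was written on a system whose default encoding was CP1252.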

Migrating the entire history (experimental)

The recodetree history command can be used to convert the entire history of the repository (requires iconv.exe). Beware that rewriting history changes all the object hashes in the repository, which has severe implications for other users if the repository is published (see "RECOVERING FROM UPSTREAM REBASE" in git help rebase). The recodetree history script currently does not convert config files such as .gitattributes / .gitignore / .gitmodules.