Not backing up filenames with ANSI encoding #4753

tom1422 · 2022-06-19T14:51:02Z

I have searched open and closed issues for duplicates.
I have searched the forum for related topics.

Environment info

Duplicati version: 2.0.6.3_beta_2021-06-17
Operating system: Linux
Backend: Docker

Description

Certain files whose names use "ANSI" (an unofficial name) or Windows-1252 character encoding fail to back up. These files were created on older versions of Windows and don't show up over SMB, but Linux sees them over NFS.
These are the warnings received:

[Warning-Duplicati.Library.Main.Operation.Backup.FileEnumerationProcess-FileAccessError]: Error reported while accessing file: /source/xxxx�xxxx.xxx
[Warning-Duplicati.Library.Main.Operation.Backup.FileEnumerationProcess-PathProcessingError]: Failed to process path: /source/xxxx�xxxx.xxx
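The "�" in these warnings is most likely U+FFFD, the Unicode replacement character, which a UTF-8 decoder substitutes for bytes it cannot decode. The real names are redacted above, so this Python sketch uses a hypothetical name, café.txt, and assumes the é was stored as the single Windows-1252 byte 0xE9:

```python
# Hypothetical name "café.txt" as an old Windows system using the
# Windows-1252 ("ANSI") code page would store it: é is the single byte 0xE9.
raw_name = "café.txt".encode("cp1252")
print(raw_name)    # b'caf\xe9.txt'

# The same name in UTF-8, as Linux and NFSv4 normally expect: é is 2 bytes.
utf8_name = "café.txt".encode("utf-8")
print(utf8_name)   # b'caf\xc3\xa9.txt'

# Strict UTF-8 decoding rejects the lone 0xE9 byte...
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
print(strict_ok)   # False

# ...and lenient decoding replaces it with U+FFFD, displayed as "�".
shown = raw_name.decode("utf-8", errors="replace")
print(shown)       # caf�.txt
```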

Steps to reproduce

Create a file on an older version of Windows (not tried yet, but one that uses Windows-1252 encoding for filenames)
Run the backup
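If no old Windows machine is at hand, a file with such a name can probably be created directly on Linux, since most Linux filesystems (ext4, for example) store names as raw bytes. A minimal Python sketch; the helper make_cp1252_file and the scratch directory are hypothetical stand-ins for the real backup source:

```python
import os
import tempfile

def make_cp1252_file(directory: str, name: str = "café.txt") -> bytes:
    """Hypothetical helper: create a small file whose on-disk name is `name`
    encoded as Windows-1252, as an old Windows system would have written it.
    Returns the raw (bytes) path."""
    raw_path = os.path.join(os.fsencode(directory), name.encode("cp1252"))
    with open(raw_path, "wb") as f:
        f.write(b"test\n")
    return raw_path

source = tempfile.mkdtemp()  # scratch stand-in for the backup source folder
path = make_cp1252_file(source)
print(os.listdir(os.fsencode(source)))  # [b'caf\xe9.txt']
```

Pointing a test backup at that directory should then show whether the same FileAccessError/PathProcessingError warnings appear.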

Actual result:
The files are not backed up
Expected result:
Backs up files regardless of filename encoding

Screenshots

Debug log


ts678 · 2022-06-26T18:56:40Z

Backs up files regardless of filename encoding

How is it supposed to interpret an unknown (or potentially several unknown) 8-bit character sets, e.g. for display purposes?
Code pages have given way to Unicode. What are these files on? Can you convert the names before sharing over a network?
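Converting the names could be scripted. A minimal Python sketch, assuming every name that is not valid UTF-8 was written as Windows-1252 (a name from a different code page would be converted into the wrong characters) and that it runs on Linux, where os.walk given a bytes path returns raw byte names. convert_names is a hypothetical helper, not part of any existing tool:

```python
import os
import tempfile

def convert_names(root: bytes, dry_run: bool = True) -> list:
    """Hypothetical helper: find names under `root` that are not valid UTF-8,
    reinterpret them as Windows-1252 and (unless dry_run) rename them to the
    UTF-8 form. Returns a list of (old_path, new_path) byte pairs."""
    renames = []
    # Bottom-up walk: a directory's contents are handled before the directory
    # itself is renamed, so the paths built here stay valid.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8")
                continue  # already valid UTF-8, leave it alone
            except UnicodeDecodeError:
                pass
            try:
                new_name = name.decode("cp1252").encode("utf-8")
            except UnicodeDecodeError:
                continue  # bytes undefined in Windows-1252: not that code page
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(dirpath, new_name)
            if os.path.lexists(new_path):
                continue  # never overwrite an existing entry
            if not dry_run:
                os.rename(old_path, new_path)
            renames.append((old_path, new_path))
    return renames

# Demo on a scratch directory, not on real data
demo = os.fsencode(tempfile.mkdtemp())
open(os.path.join(demo, b"caf\xe9.txt"), "wb").close()
planned = convert_names(demo)              # dry run: nothing renamed yet
done = convert_names(demo, dry_run=False)  # performs the rename
print(os.listdir(demo))                    # [b'caf\xc3\xa9.txt']
```

Running it with dry_run=True first and reviewing the returned pairs is safer than renaming straight away on the NAS.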

tom1422 · 2022-06-26T19:52:35Z

Yes, I can convert the names manually, as I'm no longer using those old Windows machines (some old files got backed up at some point and happened to have these characters in them). The files themselves are stored on my NAS's filesystem. Only the special characters fail to display properly (which is fine, since there is no reliable way to tell which character set they use, but the plain letters are universal).
The only issue is that these files are not backed up and this odd error is reported. I'm not skilled in C# or knowledgeable about string or filename handling, but my assumption is that a filename is just handled as a list of bytes throughout the program, so the program should not care which bytes are in it. I'm more concerned about why Duplicati won't back up these files (e.g. is it a problem with a library Duplicati uses?), because I think it should back them up regardless.

ts678 · 2022-06-27T00:59:55Z

my assumption is that the filename is just handled as a list of bytes

I'm not a C# developer, but I think this is incorrect. Microsoft's "Character encoding in .NET" article says that a string is made of 16-bit Unicode (UTF-16) code units. Windows has supported Unicode since Windows NT in 1993, although its encoding changed from UCS-2 to UTF-16 with Windows 2000; that differs from the UTF-8 encoding that NFS (at least NFSv4) expects. In both cases, though, what is encoded is Unicode. Decoding an encoded form relies on certain format rules, which an arbitrary sequence of 8-bit bytes will eventually violate, causing errors.
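A short Python sketch of why treating names as decoded text rather than bytes can lose information. It assumes, without having checked Duplicati's code, that the runtime decodes filename bytes as UTF-8 and replaces invalid bytes with U+FFFD; the name is hypothetical. Python's own surrogateescape error handler (PEP 383), which Python uses for filenames on POSIX, is shown as a contrast, since it keeps the original bytes recoverable:

```python
raw = b"caf\xe9.txt"  # hypothetical Windows-1252 name as stored on the NAS

# Lossy route: decode with replacement, then encode again for a system call.
lossy = raw.decode("utf-8", errors="replace")
back = lossy.encode("utf-8")
print(back)         # b'caf\xef\xbf\xbd.txt'  (U+FFFD is 3 bytes in UTF-8)
print(back == raw)  # False: the OS would be asked for a different name

# Lossless route (PEP 383): each undecodable byte becomes a lone surrogate
# (0xE9 -> U+DCE9) that encodes back to exactly the original byte.
escaped = raw.decode("utf-8", errors="surrogateescape")
print(escaped.encode("utf-8", errors="surrogateescape") == raw)  # True
```

A program that kept names as raw bytes, or used a lossless scheme like this, could still open such a file even when it cannot display the name correctly.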

Try getting a more detailed message by watching About --> Show log --> Live --> Warning. You might need to click the warning line to see the details.

If it helps, your second (weirder-looking) warning is a little easier to trace in the code. It looks like it received a string that its query choked on.
(Actually, I now notice that this warning can be raised in two other places in the file, but you can use this one as an example of the problem.)

duplicati/Duplicati/Library/Main/Operation/Backup/FileEnumerationProcess.cs

Lines 248 to 263 in d716051

private static bool AttributeFilter(string path, FileAttributes attributes, Snapshots.ISnapshotService snapshot, Library.Utility.IFilter sourcefilter, Options.HardlinkStrategy hardlinkPolicy, Options.SymlinkStrategy symlinkPolicy, Dictionary<string, string> hardlinkmap, FileAttributes fileAttributes, Duplicati.Library.Utility.IFilter enumeratefilter, string[] ignorenames, Queue<string> mixinqueue)
{
    // Step 1, exclude block devices
    try
    {
        if (snapshot.IsBlockDevice(path))
        {
            Logging.Log.WriteVerboseMessage(FILTER_LOGTAG, "ExcludingBlockDevice", "Excluding block device: {0}", path);
            return false;
        }
    }
    catch (Exception ex)
    {
        // Any exception raised while checking the path is reported as
        // PathProcessingError, and the path is filtered out of the backup
        Logging.Log.WriteWarningMessage(FILTER_LOGTAG, "PathProcessingError", ex, "Failed to process path: {0}", path);
        return false;
    }
    // (excerpt ends at line 263; the method continues in the source file)
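Because the catch above handles every Exception, any failure while checking the path, not only a genuine block-device problem, ends in the PathProcessingError warning and the path being filtered out. This Python sketch mirrors that pattern to show one way the reported warning could arise. attribute_filter is a hypothetical stand-in for Step 1 of AttributeFilter, os.stat stands in for snapshot.IsBlockDevice (whose internals are not shown here), and the sketch assumes the path string was produced by decoding the raw name as UTF-8 with invalid bytes replaced by U+FFFD. It is an illustration, not a diagnosis:

```python
import logging
import os
import stat
import tempfile

log = logging.getLogger("filter")

def attribute_filter(path: str) -> bool:
    """Hypothetical stand-in for Step 1 of the C# AttributeFilter."""
    try:
        # Step 1, exclude block devices (os.stat stands in for IsBlockDevice)
        if stat.S_ISBLK(os.stat(path).st_mode):
            log.info("Excluding block device: %s", path)
            return False
    except Exception as ex:
        # Any failure, e.g. FileNotFoundError, filters the path out, as in the C# code
        log.warning("Failed to process path: %r (%s)", path, ex)
        return False
    return True  # the real method continues with further checks

# Create a file whose on-disk name is Windows-1252 bytes, not UTF-8
tmp = os.fsencode(tempfile.mkdtemp())
raw = os.path.join(tmp, b"caf\xe9.txt")
with open(raw, "wb"):
    pass

lossy = raw.decode("utf-8", errors="replace")  # 0xE9 becomes U+FFFD
skipped = attribute_filter(lossy)            # False: no file has that name
kept = attribute_filter(os.fsdecode(raw))    # True: surrogateescape keeps the bytes
print(skipped, kept)  # False True
```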

duplicatibot · 2023-11-01T20:06:05Z

This issue has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/database-recration-not-really-starting/16948/36
