Commit 758e56a
Update decode_octal_escapes to support utf-8 multi-byte
Fix UTF-8 decoding of ZFS octal escape sequences in file paths
ZFS encodes special characters in paths using octal sequences (e.g., \0040
for space). Multi-byte UTF-8 characters like ’ (U+2019) are encoded as
multiple consecutive sequences (\0342\0200\0231).
The previous implementation decoded each octal sequence individually, which
broke multi-byte UTF-8 characters and caused FileNotFoundError when accessing
files whose names contain characters such as curly quotes or em dashes.
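To illustrate the failure mode described above, here is a minimal sketch of per-sequence decoding. The function name `naive_decode` and the sample path are hypothetical and are not taken from the project's code; the sketch only assumes escapes of the form `\0NNN`, as in the examples above.

```python
import re

# Hypothetical reconstruction of per-sequence decoding (not the project's code).
def naive_decode(path):
    # Every \0NNN escape is turned into one character on its own,
    # so the three bytes of a multi-byte UTF-8 character are never joined.
    return re.sub(r"\\0([0-3][0-7]{2})", lambda m: chr(int(m.group(1), 8)), path)

encoded = "it\\0342\\0200\\0231s\\0040here.txt"
broken = naive_decode(encoded)

# The space is recovered, but U+2019 comes out as three separate
# code points (U+00E2, U+0080, U+0099) instead of one character.
print(repr(broken))
```

A path decoded this way no longer matches the name on disk, which is consistent with the FileNotFoundError mentioned in the commit message.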
Updated decode_octal_escapes() to:
- Buffer consecutive octal sequences before decoding
- Decode complete UTF-8 byte sequences together
- Handle invalid sequences with latin-1 fallback

1 parent 4842c13
1 file changed
Lines changed: 29 additions & 10 deletions
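
The approach listed in the commit message (buffer consecutive escapes, decode them together, fall back to latin-1) could look roughly like the sketch below. It is not the committed implementation: the name `decode_octal_escapes_sketch`, the regular expressions, and the choice to apply the latin-1 fallback to a whole run at once are assumptions made for illustration.

```python
import re

# One or more consecutive \0NNN escapes, treated as a single run of bytes.
_OCTAL_RUN = re.compile(r"(?:\\0[0-3][0-7]{2})+")
_OCTAL_ONE = re.compile(r"\\0([0-3][0-7]{2})")

def decode_octal_escapes_sketch(path):
    """Decode \\0NNN escapes, joining adjacent escapes before decoding."""
    def decode_run(match):
        # Buffer every byte of the run before decoding anything.
        raw = bytes(int(digits, 8) for digits in _OCTAL_ONE.findall(match.group(0)))
        try:
            # Complete UTF-8 sequences decode to the intended characters.
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            # Assumption: an invalid run falls back to latin-1 as a whole,
            # which maps each byte to one character and cannot fail.
            return raw.decode("latin-1")
    return _OCTAL_RUN.sub(decode_run, path)

decoded = decode_octal_escapes_sketch("it\\0342\\0200\\0231s\\0040here.txt")
print(decoded)
```

Matching whole runs with one pattern keeps the buffering implicit: text between runs is left untouched, and each run is decoded exactly once.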
Diff region: original lines 1091-1115, new lines 1091-1134.