FileSystemObject Unicode filepath truncations #26
Comments
|
I tested your path on my computer and the return value is correct. You could also use a Lua pattern:
_,_,strParent = string.find('C:\\Root\\ĀĒĪŌŪ Unicode\\Folder','(.+)\\')
assert(strParent==[[C:\Root\ĀĒĪŌŪ Unicode]])
And for a quoted path that does not exist:
strParent = fso:GetParentFolderName([["C:\not exist directory\not exist file"]])
assert(strParent=="\"C:\\not exist directory")
-- strange path, truncated quotation? |
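Since `\` and `/` are single ASCII bytes that never occur inside a multi-byte UTF-8 sequence, the pattern idea above can be turned into a small byte-safe helper that avoids FSO entirely. This is a minimal sketch: `parent_folder` is a hypothetical name, and it does not reproduce every FSO rule (for example, it returns `C:` rather than `C:\` for `C:\Root`).

```lua
-- Hypothetical pure-Lua stand-in for fso:GetParentFolderName.
-- Works on raw bytes, so UTF-8 paths are never truncated.
local function parent_folder(path)
  -- Drop trailing separators so "C:\\Root\\Folder\\" behaves like "C:\\Root\\Folder".
  local trimmed = path:gsub("[\\/]+$", "")
  -- Greedy match keeps everything before the last separator.
  local parent = trimmed:match("^(.*)[\\/][^\\/]*$")
  return parent or ""
end

-- "\196\128\196\146" are the UTF-8 bytes of ĀĒ, written as escapes so the
-- example does not depend on the encoding of the script file.
print(parent_folder("C:\\Root\\\196\128\196\146 Unicode\\Folder"))
```

It should run unchanged on Lua 5.1 and 5.3, because it only uses the standard string library.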
|
Yes, as I said, everything is OK in Lua 5.3 but is faulty in Lua 5.1. |
|
Sorry, I missed the '5.1'.
assert(#"C:\\Root\\ĀĒĪŌŪ Unicode"==26 and #'ĀĒĪŌŪ'==10)--2 bytes per character
strParent = fso:GetParentFolderName("C:\\Root\\ĀĒĪŌŪ Unicode\\Folder")
assert(strParent==[[C:\Root\ĀĒĪŌŪ Un]] and #strParent==26-5)
----
assert(#'C:\\Root\\啊啊啊啊啊 Unicode'==31 and #'啊啊啊啊啊'==15)--3 bytes per character
strParent = fso:GetParentFolderName("C:\\Root\\啊啊啊啊啊 Unicode\\Folder")
assert(strParent=='C:\\Root\\啊啊啊啊\229' and #strParent==31-10)
----
strParent = fso:GetParentFolderName("C:\\Root\\ĀĒĪŌŪ Unicode \\Folder")--cheat by appending 1 byte character
assert(strParent==[[C:\Root\ĀĒĪŌŪ Unicode]])
I have only just started learning about FSO, and only from your post~ |
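The figures in the asserts above fit one simple model: the returned string is the correct result cut to as many bytes as it has characters. The sketch below only reproduces that arithmetic in plain Lua. It is an assumption drawn from these two test cases, not a confirmed description of what LuaCOM does internally, and `utf8_len` and `simulate_truncation` are hypothetical names.

```lua
-- Count UTF-8 code points: every byte that is not a continuation byte
-- (0x80..0xBF) starts a new code point.
local function utf8_len(s)
  local n = 0
  for i = 1, #s do
    local b = s:byte(i)
    if b < 0x80 or b >= 0xC0 then n = n + 1 end
  end
  return n
end

-- Hypothetical model of the faulty result: the correct string cut to
-- as many bytes as it has code points.
local function simulate_truncation(expected)
  return expected:sub(1, utf8_len(expected))
end

local macrons = "\196\128\196\146\196\170\197\140\197\170" -- ĀĒĪŌŪ as UTF-8 bytes
local expected = "C:\\Root\\" .. macrons .. " Unicode"
print(#expected, utf8_len(expected))           -- byte count, code point count
print(#simulate_truncation(expected))          -- length of the modelled result
```

Under this model the three-byte case loses ten bytes and ends in a lone lead byte, which is what the `\229` in the second assert shows.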
|
Hi @tatewise, I found a workaround, but I'm not sure whether it has limitations, as I don't know much about code points.
string='C:\\Root\\ĀĒĪŌŪ Unicode\\Folder'
print(string)
print(string.byte(string,1,#string))
map={[196]=1,[146]=2,[170]=3,[197]=4,[140]=5,[128]=6,
196,146,170,197,140,128
}
tem_str_bytes={}
index=1
while index<=#string do
byte=string.byte(string,index)
byte=map[byte] or byte
assert(byte<=128,byte)
table.insert(tem_str_bytes,string.char(byte))
index=index+1
end
tem_str=table.concat(tem_str_bytes)
print(tem_str)
tem_strParent = fso:GetParentFolderName(tem_str)
print(string.byte(tem_strParent,1,#tem_strParent))
print(tem_strParent)
tem_str_bytes={}
index=1
while index<=#tem_strParent do
byte=string.byte(tem_strParent,index)
byte=map[byte] or byte
table.insert(tem_str_bytes,string.char(byte))
index=index+1
end
strParent=table.concat(tem_str_bytes)
print(strParent==[[C:\Root\ĀĒĪŌŪ Unicode]])
print output: |
|
There are probably many workarounds just for |
|
OK, wish you good luck~ |
|
Unfortunately, |
|
I need to be able to support all Unicode UTF-8 code points and not just a subset. |
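One way to lift the six-byte limit of the substitution workaround is to escape every byte from 128 upwards into ASCII before the call and to reverse the escaping afterwards. This is a rough sketch with hypothetical helper names, and it assumes the FSO method only manipulates the string (as `GetParentFolderName` appears to). Methods that touch the file system, such as `GetFolder` or `GetFile`, would look for the escaped name and fail.

```lua
-- Replace "%" and every non-ASCII byte with "%XX" so the string is pure ASCII.
local function escape_non_ascii(s)
  return (s:gsub("[%%\128-\255]", function(c)
    return string.format("%%%02X", c:byte())
  end))
end

-- Reverse the escaping: turn each "%XX" back into the original byte.
local function unescape_non_ascii(s)
  return (s:gsub("%%(%x%x)", function(hex)
    return string.char(tonumber(hex, 16))
  end))
end

local path = "C:\\Root\\\196\128 Unicode\\Folder"   -- \196\128 is Ā in UTF-8
local escaped = escape_non_ascii(path)
print(escaped)
-- With LuaCOM available, the call would be wrapped like this (not run here):
--   strParent = unescape_non_ascii(fso:GetParentFolderName(escaped))
print(unescape_non_ascii(escaped) == path)
```

Path separators are left untouched, so the folder structure of the escaped string matches the original.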
|
Oh, then try iconv, which converts between various encodings. There is a Lua binding that works on Windows, but it needs to be compiled (which I'm not familiar with), so I haven't tried it. Or you could convert from UTF-8 to the local encoding:
lfs=require'lfs'
a=lfs.attributes('\168\161 \168\165 \168\169 \168\173 \168\177')
-- ANSI (using local encode when beyond ASCII?): Ā Ē Ī Ō Ū, equal the mess code in the picture above
assert(a)
See this, which mentions PowerShell and iconv (the command-line tool). It covers file conversion (I used Save As above); maybe string conversion too? I have probably made many misstatements about UTF, Unicode, and character sets, as I lack the relevant knowledge so far... |
|
Hi, there is another solution, utf8_filenames.lua. |
|
Unfortunately, that is NOT a general solution for arbitrary UTF-8 symbols because as its comments say: |
|
Yes, I see, very limited 😂 |
|
Hi, I have built and tested lua-iconv (based on [libiconv - GNU Project - Free Software Foundation (FSF)](http://www.gnu.org/software/libiconv/)) on Windows 10 with Lua 5.3. It works fine; you could give it a try~ |
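For readers who have not used lua-iconv, a conversion might look roughly like the sketch below. It assumes the module is installed under the name `iconv` and exposes `iconv.new(to, from)` and `cd:iconv(str)`, which is how I understand its API. The wrapper name `convert` is hypothetical, and the encoding names depend on the underlying libiconv build.

```lua
-- Load lua-iconv if it is available; fall back gracefully if not.
local ok, iconv = pcall(require, "iconv")

-- Hypothetical wrapper: convert string s from encoding `from` to encoding `to`.
-- Returns the converted string, or nil plus a message on failure.
local function convert(s, to, from)
  if not ok then
    return nil, "lua-iconv is not installed"
  end
  local cd = iconv.new(to, from)      -- conversion descriptor
  if not cd then
    return nil, "unsupported conversion " .. from .. " -> " .. to
  end
  local out, err = cd:iconv(s)        -- err is an error code on failure
  if err then
    return nil, "conversion failed (code " .. tostring(err) .. ")"
  end
  return out
end

-- Example: UTF-8 bytes of Ā converted to UTF-16LE.
local out, msg = convert("\196\128", "UTF-16LE", "UTF-8")
print(out and #out or msg)
```

Whether the converted string survives the trip through LuaCOM on Lua 5.1 is a separate question that this sketch does not answer.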
|
We ran into a similar problem: customers created file paths consisting of UTF-16 characters on a Windows machine. You can even get the path into a UTF-8 string, which Lua can handle with no problem. Our solution was to write a Lua library. One function is like: |


This involves using LuaCOM 1.3 and Lua 5.1 with Microsoft FileSystemObject running in Windows 10.
If file paths include Unicode code-points in UTF-8 format then some methods return truncated file paths. e.g.
That should return `C:\Root\ĀĒĪŌŪ Unicode` but actually returns `C:\Root\ĀĒĪŌŪ Un`, truncated by 5 bytes.
It is always truncated by the number of multi-byte UTF-8 code points.
Similar problems affect other methods such as `fso:GetFolder(...)` and `fso:GetFile(...)` regarding file path names.
When the same script is used with Lua 5.3 and Windows 10 everything works correctly.
Unfortunately, I am forced to use a precompiled Lua 5.1 application.
As a check, I ran similar FileSystemObject methods in Windows PowerShell on the same PC and that worked correctly.
Another user has the same symptoms on a different PC with Lua 5.1 and Windows 11.
Is there any workaround for this problem?