Ruby require fails when the path has special characters #265

Open
mohits opened this issue Mar 21, 2022 · 10 comments

Comments

@mohits
Collaborator

mohits commented Mar 21, 2022

What problems are you experiencing?

If the path contains special characters and you run a Ruby script that calls require_relative on a file under that path, the file fails to load. It's almost certainly something to do with encoding on the Windows console.

It failed for me with:

  • Windows Terminal
  • Windows cmd.exe with both Active code page: 437 and Active code page: 65001

This ticket is based on an issue on rails at rails/rails#29087

Steps to reproduce

Create a folder called Test Ø and in it have 2 files:

1.rb

# encoding: UTF-8
require_relative "2.rb"

puts 'success'

2.rb

puts 'in the file'

You should see an error like this:

$ ruby 1.rb
1.rb:2:in `require_relative': cannot load such file -- D:/projects/blog/_posts-trials/rails/Test ?/2.rb (LoadError)
        from 1.rb:2:in `<main>'
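The steps above can be scripted. The following is a minimal sketch, assuming a throwaway temporary directory rather than the D: path from the report; the helper names build_repro and run_repro are hypothetical. On a system where the problem does not occur, the child process prints both lines and exits successfully.

```ruby
require "tmpdir"
require "fileutils"
require "open3"
require "rbconfig"

# Hypothetical helper: creates "<root>/Test Ø" containing 1.rb and 2.rb
# with the contents from the report, and returns the path to 1.rb.
def build_repro(root)
  dir = File.join(root, "Test Ø")
  FileUtils.mkdir_p(dir)
  File.write(File.join(dir, "1.rb"), <<~RUBY)
    # encoding: UTF-8
    require_relative "2.rb"

    puts 'success'
  RUBY
  File.write(File.join(dir, "2.rb"), "puts 'in the file'\n")
  File.join(dir, "1.rb")
end

# Hypothetical helper: runs the script with the current interpreter from
# inside its own directory, as in the report.
# Returns [combined stdout and stderr, true/false for the exit status].
def run_repro(script)
  output, status = Open3.capture2e(
    RbConfig.ruby, File.basename(script), chdir: File.dirname(script)
  )
  [output, status.success?]
end

if $PROGRAM_NAME == __FILE__
  Dir.mktmpdir do |root|
    output, ok = run_repro(build_repro(root))
    puts output
    puts(ok ? "require_relative worked" : "require_relative failed")
  end
end
```

Running it under different chcp settings should make it easier to compare results between machines, since the directory is created by Ruby itself rather than by hand.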

What's the output from ridk version?


ruby:
  path: C:/Ruby30-x64
  version: 3.0.3
  platform: x64-mingw32
ruby_installer:
  package_version: 3.0.3-1
  git_commit: 981867a
msys2:
  path: C:\Ruby30-x64\msys64
cc: gcc (Rev2, Built by MSYS2 project) 11.2.0
sh: GNU bash, version 5.1.8(1)-release (x86_64-pc-msys)
os: Microsoft Windows [Version 10.0.19044.1586]

@fxn

fxn commented Mar 21, 2022

I have run this script in that directory:

p __dir__.encoding
p Dir.pwd.encoding
puts
p __ENCODING__
p ''.encoding
p Encoding.default_external
p Encoding.default_internal

and the output is

#<Encoding:IBM437>
#<Encoding:Windows-1252>

#<Encoding:UTF-8>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
nil

This is Windows 10 running in Parallels Desktop.

I suspect that just shows my ignorance of how file name encoding works in Windows, and of what a Ruby program can assume when reading file or directory names.
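For comparing machines, it may help to gather these values in one labelled report. This is only a sketch: the helper name encoding_report is hypothetical, and the filesystem and locale entries are additions that the script above does not print. Encoding.find accepts "filesystem" and "locale" as special names.

```ruby
# Hypothetical helper: collects the encodings discussed here into one hash.
def encoding_report
  {
    dir:              __dir__&.encoding,         # nil when there is no source file (e.g. ruby -e)
    pwd:              Dir.pwd.encoding,
    source:           __ENCODING__,
    string_literal:   "".encoding,
    default_external: Encoding.default_external,
    default_internal: Encoding.default_internal, # nil unless configured
    filesystem:       Encoding.find("filesystem"),
    locale:           Encoding.find("locale"),
  }
end

# One "name value" line per entry, values shown with #inspect.
encoding_report.each { |name, value| puts format("%-17s %p", name, value) }
```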

@mohits
Collaborator Author

mohits commented Mar 21, 2022

Thanks for posting this here @fxn - I opened the issue here so that we can get closer to finding the correct place to fix this :) since the issue is clearly to do with Ruby + Windows, and not Rails.

This is what I get with codepage 65001 (UTF-8)

#<Encoding:UTF-8>
#<Encoding:UTF-8>

#<Encoding:UTF-8>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
nil

and with codepage 437

#<Encoding:IBM437>
#<Encoding:UTF-8>

#<Encoding:UTF-8>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
nil

Can you please check your codepage by doing chcp on the command line?

@fxn

fxn commented Mar 21, 2022

@mohits it says 437.

@fxn

fxn commented Mar 21, 2022

If I execute chcp 65001, the output is:

#<Encoding:UTF-8>
#<Encoding:Windows-1252>

#<Encoding:UTF-8>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
nil

Almost!

@mohits
Collaborator Author

mohits commented Mar 22, 2022

Hi @fxn - Yes, I think 437 (OEM - United States) is the most common on English Windows. I think we need someone with a better understanding of locales on Windows to look at this issue.

Unsurprisingly, my simple test works on JRuby: it successfully requires the file. Also, when your script is run with JRuby, its output matches the output for chcp 65001, even on a console that is CP-437.

$ jruby xfn.rb
#<Encoding:UTF-8>
#<Encoding:Windows-1252>

#<Encoding:UTF-8>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
nil

@fxn

fxn commented Apr 2, 2022

Today, trying more carefully, I could not reproduce it.

The file system in my machine is in Windows-1252. I created a directory called à using the file explorer to make sure the encoding is honored. Inside that directory I created this test file and a dummy bar.rb:

puts Encoding.find('filesystem')
p Dir.pwd.bytes[-1]
require_relative "bar"

This works, and the output is

Windows-1252
224

If you check the codes in Windows-1252, you'll see 224 is, indeed, à.

@mohits Can you reproduce using these steps? Maybe the directory was created with UTF-8 bytes for a non-UTF-8 file system?

@fxn

fxn commented Apr 2, 2022

However, ø belongs to Windows-1252 (code 248) and the same script prints the expected byte, but fails to perform the require_relative.

This is interesting, because both à and ø are non-ASCII, so I would expect them to succeed or fail in the same way.

@mohits what happens in your machine with à?

@mohits
Collaborator Author

mohits commented Apr 2, 2022

hi @fxn - I am a bit confused now with the results I am seeing but I have progress to report (kind of..)

[1] I created this path:

$ cd
D:\projects\blog\_posts-trials\rails\Test-à

[2] I ran your code:

$ chcp
Active code page: 437

$ ruby 1.rb
UTF-8
160

On my system, it shows the encoding as UTF-8. I did a chcp 1252 and ran the same code, and it gave the same result. This is where it gets interesting: I went to the folder with Test Ø in the name and ran the code again (still with CP-1252), and it ran successfully.

UTF-8
152

[3] I forced it to change to CP-437 again by doing chcp 437 and it failed but I got this:

UTF-8
152
1.rb:4:in `require_relative': cannot load such file -- D:/projects/blog/_posts-trials/rails/Rails Server Test ?/2.rb (LoadError)
        from 1.rb:4:in `<main>'

It read the character properly (as 152) but failed on the require_relative.

[4] On the other hand, with CP-437, I ran it in the path with Test-à and it worked.

$ ruby 1.rb
UTF-8
160

So, to summarise:

  • CP-1252: Both paths worked. Got back UTF-8 and {160, 152} for the byte.
  • CP-437: Both paths returned UTF-8 and {160, 152} for the byte. Path with Ø failed to require_relative.

I found this online: http://zuga.net/articles/text-ascii-vs-cp-1252-vs-cp-437/ that compares the code pages side by side.

CP-1252 is an 8-bit character encoding based on ASCII (identical up to code point 127). This is the default codepage for graphical applications under Windows.
CP-437 is an 8-bit character encoding based on ASCII (identical up to code point 127). This is the default codepage for console applications under Windows.

In this comparison, CP-1252 has à at 224 and ø at 248 (the uppercase Ø used in the folder name is at 216).
CP-437 has à at 133 but does not have Ø at all.
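The code-page lookup can also be done from Ruby itself. A minimal sketch (the helper name bytes_in is hypothetical) that asks Ruby's transcoder for the byte of each character in Windows-1252 and IBM437, which is Ruby's name for CP-437:

```ruby
# Hypothetical helper: returns the bytes of +char+ in +encoding+, or nil
# when the character has no representation in that encoding.
def bytes_in(char, encoding)
  char.encode(encoding).bytes
rescue Encoding::UndefinedConversionError
  nil
end

%w[à ø Ø].each do |char|
  %w[Windows-1252 IBM437].each do |encoding|
    bytes = bytes_in(char, encoding)
    puts "#{char} in #{encoding}: #{bytes ? bytes.inspect : 'not representable'}"
  end
end

# Converting with undef: :replace substitutes "?" for characters that the
# target encoding lacks.
replaced = "Test Ø".encode("IBM437", undef: :replace)
puts replaced.bytes.last.chr
```

The final line prints ?, the same placeholder that shows up in the LoadError message. Whether the failing path goes through a lossy conversion like this one is an assumption; nothing in this thread confirms it.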

@fxn

fxn commented Apr 2, 2022

@mohits Which Ruby version is that?

I discovered by testing related things in Zeitwerk that in Ruby 3.0 the file system encoding is assumed (unsure if the verb is correct) to be UTF-8. This issue in Redmine seems relevant.

@mohits
Collaborator Author

mohits commented Apr 2, 2022

@fxn - my bad. I should have included the ruby version: 3.0.3.

More information then:

Ruby version | Folder | Code page | Filesystem encoding | Last byte of Dir.pwd | Result

Ruby 3.0.3   | Test Ø | CP-437    | UTF-8               | 152                  | Fails to require
Ruby 2.7.4   | Test Ø | CP-437    | Windows-1252        | 216                  | Fails to require
Ruby 2.6.8   | Test Ø | CP-437    | Windows-1252        | 216                  | Fails to require

Ruby 3.0.3   | Test Ø | CP-1252   | UTF-8               | 152                  | require_relative works
Ruby 2.7.4   | Test Ø | CP-1252   | Windows-1252        | 216                  | require_relative works
Ruby 2.6.8   | Test Ø | CP-1252   | Windows-1252        | 216                  | require_relative works

Ruby 3.0.3   | Test-à | CP-437    | UTF-8               | 160                  | require_relative works
Ruby 2.7.4   | Test-à | CP-437    | Windows-1252        | 224                  | require_relative works
Ruby 2.6.8   | Test-à | CP-437    | Windows-1252        | 224                  | require_relative works

Ruby 3.0.3   | Test-à | CP-1252   | UTF-8               | 160                  | require_relative works
Ruby 2.7.4   | Test-à | CP-1252   | Windows-1252        | 224                  | require_relative works
Ruby 2.6.8   | Test-à | CP-1252   | Windows-1252        | 224                  | require_relative works

Yes, the issue on Redmine does seem relevant and might explain the result we see for the character code and encoding... but it appears that require_relative uses some other encoding for the file path/ name?
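The byte values above are consistent with the same directory name being reported in two encodings: Windows-1252 under Ruby 2.6 and 2.7, and UTF-8 under Ruby 3.0. A minimal sketch that reproduces the numbers without touching the file system (the helper name last_byte is hypothetical, and the folder names stand in for the full Dir.pwd):

```ruby
# Hypothetical helper mirroring `Dir.pwd.bytes[-1]` for a path that is
# first converted to +encoding+.
def last_byte(path, encoding)
  path.encode(encoding).bytes[-1]
end

["Test Ø", "Test-à"].each do |name|
  utf8   = last_byte(name, "UTF-8")
  cp1252 = last_byte(name, "Windows-1252")
  puts "#{name}: UTF-8 => #{utf8}, Windows-1252 => #{cp1252}"
end
```

This only accounts for the encoding and byte columns. It does not explain why require_relative fails for Ø under CP-437.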


2 participants