Case sensitive filesystems don't work? #14

Closed
MrWorldwideMkII opened this issue Apr 4, 2022 · 8 comments

Comments

@MrWorldwideMkII

Hi,

I ran the crates download on a Windows machine, and some of the crates were downloaded into paths with uppercase letters.
For example, aead was downloaded into crates/Ae/ad/aead/aead-x.y.z.crate (notice the Ae).

I then transferred the crates to a network drive on an offline network, and I now serve them from that drive using a Linux machine.

When serving from one machine with the drive mounted using cifs, everything seems to work fine, but when serving from another machine that has the drive mounted as nfs, all crates that have uppercase letters in their prefix seem to break (return 404).

Any ideas or help would be appreciated.

@MrWorldwideMkII
Author

MrWorldwideMkII commented Apr 4, 2022

I ran a small script to rename all the top-level folders in crates/ to lowercase, which seems to have resolved some of the problems.
The same thing seems to have happened with prefixes that start with numbers, like crates/3/F/fnv/fnv-x.y.z.crate.

After renaming all directories to lowercase all problems have been fixed.

@MrWorldwideMkII
Author

I seem to have figured out the problem. Some crates' names start with a capital letter, and these crates seem to be downloaded first, which causes the prefix paths to be created with uppercase letters as well. Other crates with lowercase names and the same prefix are then downloaded into the already existing directories (Windows paths are case-insensitive), causing the problems.
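
The collision described above can be sketched in a few lines of Python. This is only an illustration, assuming the prefix layout implied by the paths in this thread (first two characters, then the next two, with 3/<first character> for three-letter names, and 1 or 2 for shorter names). The function names and the crate names other than aead and fnv are hypothetical.

```python
from collections import defaultdict


def crate_prefix(name: str) -> str:
    """Return the directory prefix for a crate, preserving the name's case."""
    if not name:
        raise ValueError("crate name must not be empty")
    if len(name) == 1:
        return "1"
    if len(name) == 2:
        return "2"
    if len(name) == 3:
        return f"3/{name[0]}"
    return f"{name[:2]}/{name[2:4]}"


def colliding_prefixes(names):
    """Map each lowercased prefix to the case variants that share it."""
    groups = defaultdict(set)
    for name in names:
        prefix = crate_prefix(name)
        groups[prefix.lower()].add(prefix)
    # Keep only prefixes that appear with more than one spelling.
    return {key: sorted(variants)
            for key, variants in groups.items()
            if len(variants) > 1}


# "Aead-Demo" and "Fnv" are made-up names used only to trigger a collision.
names = ["aead", "Aead-Demo", "fnv", "Fnv", "serde"]
print(colliding_prefixes(names))
# {'ae/ad': ['Ae/ad', 'ae/ad'], '3/f': ['3/F', '3/f']}
```

On a case-insensitive filesystem, each group of variants ends up in a single directory whose spelling is whichever one was created first.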

Renaming all prefixes to lowercase then breaks the downloading of the uppercase named crates.

@drmikehenry
Owner

Thanks for submitting this issue. I haven't documented your use case (sharing a single mirror with both case-sensitive and case-insensitive mounts), but it is supported, with caveats.

TL;DR: If you write the crates using a case-insensitive mount, you must access the crates case-insensitively thereafter. If you want to use the same underlying storage with both case-sensitive and case-insensitive mounts, you must always use the case-sensitive mount when writing to the storage.

The easiest way to recover is to start over from the original crates.tar.gz, this time using the Linux host with a case-sensitive mount. A case-insensitive mount of that filesystem may then be used for read-only serving if desired.

If you no longer have the original crates.tar.gz file, you may recreate it from your offline mirror as follows:

  • Use the case-insensitive mount to access the share.

  • Start in the mirror directory.

  • Create crates.tar.gz for the entire mirror:

       romt crate --start 0 --end master pack

The above steps should work even after you've renamed your prefix directories to be all lowercase, as long as you access the mirror case-insensitively. This is because Romt uses the crates.io-index Git repository to determine each crate's official name (including case), and it retains the correct case in the paths within the crates.tar.gz tarball (even if the on-disk crates/ tree uses a different case).

Unconditionally forcing the crate prefix to lowercase runs into a couple of difficulties. The main problem is when using nginx or some other server instead of romt serve. Rust tooling requests packages with URLs of the form /crates/SomePackage/SomePackage-1.2.3.crate. The README shows how to use nginx rewrite rules to calculate the prefix from the package name, but the technique doesn't allow for converting the prefix to lowercase; it just blindly extracts the first characters of the package name and uses them in the prefix. So for a mixed-case name like SomePackage, the nginx rules can't create the prefix so/me/ in lowercase.

For this reason, Romt uses case-sensitive prefixes that match the case of the package itself. This works on case-sensitive filesystems as expected, but also works if the crates have been written to a case-insensitive filesystem. In the latter case, directories that would be distinct on a case-sensitive filesystem (such as so/ and So/) do merge together, but because subsequent accesses are case-insensitive, the crates are found correctly and may be served without correcting the case in the URL.

The other issue is one of backward compatibility with the way Romt has historically worked. Forcing crate prefixes to lowercase would cause failures for non-lowercase crate prefixes for users with existing mirrors.

A couple of years ago I requested that Cargo add the ability to use lowercase prefixes for crate downloads to someday make this issue easier: rust-lang/cargo#8267

The change was accepted, so if older Rust tooling need not be supported, in the future it would be possible to have Rust tooling use crate URLs with lowercase-only prefixes using the new {lowerprefix} marker; however, Romt currently doesn't have a way to take advantage of this {lowerprefix} feature.
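
As a rough illustration of the difference between the two markers, the Python sketch below expands a download URL template the way a client might. The markers {crate}, {version}, {prefix} and {lowerprefix} come from Cargo's registry configuration; the mirror host name, the expand_dl helper, and the SomePackage crate are hypothetical, and this is not how Romt or Cargo implement the expansion.

```python
def crate_prefix(name: str) -> str:
    """Directory prefix for a crate name, preserving case."""
    if len(name) <= 2:
        return str(len(name))
    if len(name) == 3:
        return f"3/{name[0]}"
    return f"{name[:2]}/{name[2:4]}"


def expand_dl(template: str, crate: str, version: str) -> str:
    """Fill in the download URL markers for one crate version."""
    prefix = crate_prefix(crate)
    return (
        template
        .replace("{crate}", crate)
        .replace("{version}", version)
        .replace("{lowerprefix}", prefix.lower())
        .replace("{prefix}", prefix)
    )


# Hypothetical mirror URLs; only the markers are real Cargo syntax.
mixed = "http://mirror.example/crates/{prefix}/{crate}/{crate}-{version}.crate"
lower = "http://mirror.example/crates/{lowerprefix}/{crate}/{crate}-{version}.crate"

print(expand_dl(mixed, "SomePackage", "1.2.3"))
print(expand_dl(lower, "SomePackage", "1.2.3"))
```

With {prefix} the path contains So/me, and with {lowerprefix} it contains so/me, so a server behind the second template only ever needs lowercase prefix directories.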

I want to consider all of the above aspects more fully before deciding what changes to make (though the changes will certainly at least include additions to the README to explain the current limitations).

@MrWorldwideMkII
Author

Hmm, I see.

The nginx problem can probably be solved using Perl.
The backwards compatibility thing can be "solved" by incrementing the major version (or minor since romt is still in 0.x).

In any case, repackaging and re-extracting solves my problem, though I recommend adding the case-sensitivity information to the README.

@drmikehenry
Owner

While considering the changes I want to make, I discovered that Windows case-insensitive shares of case-sensitive filesystems don't work as I'd originally thought.

Consider creating the following tree using Linux on a case-sensitive file system:

mkdir /m/tmp/rust
cd /m/tmp/rust
mkdir directory DIRECTORY
touch directory/crate1 DIRECTORY/CRATE2
tree

with output:

├── directory
│   └── crate1
└── DIRECTORY
    └── CRATE2

2 directories, 2 files

Now on a Windows machine accessing this same share via Samba (as m:\tmp\rust), we can see both directory names (directory and DIRECTORY):

M:\tmp\rust>dir /b
directory
DIRECTORY

I'd thought this behavior carried over to contents within the subdirectories, but Windows is unable to see the contents of one of these directories, e.g.:

M:\tmp\rust>dir /s /b
M:\tmp\rust\directory
M:\tmp\rust\DIRECTORY
M:\tmp\rust\directory\crate1
M:\tmp\rust\DIRECTORY\crate1

Note how crate1 is found twice (once with each spelling of directory/DIRECTORY), whereas CRATE2 can't be accessed, e.g.:

M:\tmp\rust>copy DIRECTORY\CRATE2
The system cannot find the file specified.

Romt therefore currently doesn't support using the same tree of crates in both case-sensitive and case-insensitive mode simultaneously. Reading and writing exclusively in one mode or the other works. I've reopened this issue pending a fix.
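
To check whether an existing tree is affected, a small scan run from the case-sensitive side can list sibling names that differ only by case. The following Python sketch is one way to do that; the function names and the crates path are assumptions for illustration, and it is not part of Romt.

```python
import os
from collections import defaultdict


def case_collisions(names):
    """Return groups of names that are equal when case is ignored."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


def scan_tree(root):
    """Yield (directory, colliding names) for each directory under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        for group in case_collisions(dirnames + filenames):
            yield dirpath, group


# "crates" is an assumed location for the mirror's crate tree.
for dirpath, group in scan_tree("crates"):
    print(f"{dirpath}: {', '.join(group)}")
```

Any directory reported here (such as one holding both directory and DIRECTORY) would be ambiguous when the same tree is viewed through a case-insensitive mount.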

@drmikehenry drmikehenry reopened this Apr 16, 2022
@MrWorldwideMkII
Author

From my experiments, on the same network drive, mounted on two Linux systems:

  1. Fedora using NFS (case-sensitive)
  2. Ubuntu using CIFS (case-insensitive)

When creating directories with the same name but different casing from the Fedora machine:
aa, aA, Aa, AA appear as aa, AA~1, AA~2, AA~3 on the Ubuntu machine.

The numbers seem to correspond to the directories' order of creation rather than something to do with the casing.

Anyway, I'm currently planning to serve the crates just from the Fedora machine, so it works for me, but I agree that something should be figured out.

drmikehenry added a commit that referenced this issue Apr 27, 2022
This avoids problems when using a crate mirror with both case-sensitive
and case-insensitive filesystems simultaneously; see
#14.  See README.rst for
details.
@drmikehenry
Owner

Romt-0.4.0 now can use lowercase crate prefixes in addition to mixed-case prefixes. Lowercase is the default. Crate archives continue to use mixed-case prefixes for interoperability with older Romt, though new Romt can now use either prefix format.

See the README.rst section on "Upgrading from Romt versions before 0.4.0" for more details.

@drmikehenry
Owner

Closing with the assumption that Romt-0.4.0 fixes this issue.
