Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PoC: add RepositoryPathBuf #1921

Draft
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

cruessler
Copy link
Contributor

This is a PoC that adds RepositoryPathBuf in gix_object::tree, for initial use in gix_object::tree::Entry. It is intended to make sure I start making the right changes in the right places.

This PR adds RepositoryPathBuf and uses it for gix_object::tree::Entry::filename. I adapted other places in the code until I got just check passing.

Copy link
Member

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for getting this started.

Besides my notes, one key aspect of it is that there is a &RepositoryPath that only exists as reference, similar to PathBuf/&Path. An example for this is gix-ref::FileName|&FileNameRef, it needs some unsafe to do the conversion.

Finally, there must be an std::str::from_utf8_unchecked() equivalent for APIs that only have a single component, for example, or do their own validation naturally as part of the parsing.

This leads me to an interesting observation: gix_object::Tree::entries actually only has a RepositoryPathComponent, which might be a type we want explicitly just to say it's definitely not having slashes in there.

And with all this as a start, I think it will come together naturally.

}
}

impl std::ops::Deref for RepositoryPathPuf {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea would be to avoid having easy conversions to BString and treat it as distinct type with its own PathBuf-like API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely! This was just a shortcut to get this first addition to compile with the least amount of changes possible (I should have been more clear about the context and the status of this initial commit).

A more general question: is the idea to have both the new types backed by inner: BString (and inner: &BStr) or rather by inner: Vec<u8> (and inner: &[u8])? If the latter, we would need to implement all the necessary functions that are currently provided by BString (and &BStr) ourselves, or am I missing something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The path of least resistance (and the preferred one) should be to let it be backed by BString|&BStr. Maybe this will also be a great preparation for the time when std::…::ByteString can be used, when only the backing needs to be adjusted as most code is using RepositoryPathComponent|RepositoryPathBuf and friends.

}
}

impl From<BString> for RepositoryPathPuf {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be no API that converts to this type without checking the input - paths now have to be relative.

@@ -249,14 +249,57 @@ impl Ord for EntryRef<'_> {
}
}

/// TODO
/// Keep in mind that the path separator always is `/`, independent of the platform.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And on top of that, it's a relative path with only Normal path components. Also it's always represented as bunch of bytes.

@cruessler
Copy link
Contributor Author

I removed Deref for BString and did a couple of follow-up changes to make the code compile.

Is this what you had in mind?

Next, I want to convert the From implementations into TryFrom ones.

@Byron
Copy link
Member

Byron commented Apr 11, 2025

Apologies for the late response, and thanks for pushing this forward.

I removed Deref for BString and did a couple of follow-up changes to make the code compile.

  • I added a method RepositoryPathBuf::to_bstring for use in places where surrounding code expects a BString. Is this a method that we would only use temporarily, so the goal would be to remove it once all the relevant places have been converted to RepositoryPathBuf?

Yes, that seems like the way to go. I didn't check the code and maybe the documentation of to_bstring() says it all, but I'd even go as far as to name it to_bstring_do_not_use() to make clear what it ought to be.

That seems like something one could expect for a bundle-of-bytes path type.

Is this what you had in mind?

Now that I see this I wonder if it should be a bundle of bytes, and as such can deref to bytes so it can be used similarly as Git would do in C.

Next, I want to convert the From implementations into TryFrom ones.

That sounds good!

I didn't look at the code yet, but would do so after your next session. If that feels a little unsafe, please let me know and I will get to it earlier.

@cruessler cruessler force-pushed the introduce-repository-path branch from 2b8f377 to 574d003 Compare April 12, 2025 11:19
@cruessler
Copy link
Contributor Author

Thanks for the feedback! I rebased the PR and made a few more changes. I mainly replaced the From implementations by TryFrom ones. I think I’ve completed all items on my TODO list now, so I’ll wait for your feedback before making other changes.

  • object::tree::iter::lookup_entry::utils returns Option, but uses try_into(). Does it need error handling?
  • There’s now a couple of try_into().expect("TODO") in tests. The tests don’t fail, so it appears these calls could also be replaced by try_into().unwrap() or a more descriptive expect.
  • There’s From<tree::EntryRef<'_>> for tree::Entry in gix-object/src/object/convert.rs that now needs to call try_into(). I got it to compile using try_into().expect("TODO"), but this does not feel like it is supposed to be the final state. Or can we expect try_into to never fail in this method?
  • I needed to add a dependency on gix-fs in gix-object for it to import gix_fs::stack::to_normal_path_components which I wanted to use in the TryFrom implementations.
  • Changing inner: BString to inner: Vec<u8> seems possible. We would need to change the Display implementation so that it converts the underlying bytes to characters, otherwise a couple of tests start failing.

@Byron
Copy link
Member

Byron commented Apr 12, 2025

Thanks so much for continuing this adventure with me :)!

While starting to work on looking at the PR more thoroughly, I realised that there is another thing to consider: Even though users of the API should be able to leverage the type-system, gitoxide itself still should be able to represent all states that Git can read on the lower levels. For instance, gix fsck can see all kinds of garbage and should be able to report it. If it would insta-fail because validation is in the wrong spots, that command couldn't exist.

Once again, I kept it quick, letting your questions drive me.

Thanks for the feedback! I rebased the PR and made a few more changes. I mainly replaced the From implementations by TryFrom ones. I think I’ve completed all items on my TODO list now, so I’ll wait for your feedback before making other changes.

  • object::tree::iter::lookup_entry::utils returns Option, but uses try_into(). Does it need error handling?

It's for tests and and it's related to filename - please see below - so, I'd skip that.

  • There’s now a couple of try_into().expect("TODO") in tests. The tests don’t fail, so it appears these calls could also be replaced by try_into().unwrap() or a more descriptive expect.

I think that's also related to filename in all the cases I saw.

  • There’s From<tree::EntryRef<'_>> for tree::Entry in gix-object/src/object/convert.rs that now needs to call try_into(). I got it to compile using try_into().expect("TODO"), but this does not feel like it is supposed to be the final state. Or can we expect try_into to never fail in this method?

I think filename in Entry* is out of scope. It's not a path, it's a filename, technically a single, slash-free component of a byte path.
One day this could be more strongly typed, but that can definitely happen another day.

  • I needed to add a dependency on gix-fs in gix-object for it to import gix_fs::stack::to_normal_path_components which I wanted to use in the TryFrom implementations.

I always thought that RepositoryPath* is so general that it should be in a lower-level/less-depended-on crate even, gix-path came to mind. Does that solve the issue?

  • Changing inner: BString to inner: Vec<u8> seems possible. We would need to change the Display implementation so that it converts the underlying bytes to characters, otherwise a couple of tests start failing.

I think it's OK to keep it as BString, knowing that one day it would be ByteString.

On another note, I'd turn to_bstring_do_not_use() into into_bstrIng_do_not_use(self)->BString, that way the clone() must be outside, which is preferable here.

And here it comes: Since RepositoryPathBuf is a bag of bytes, it should probably deref to &[u8] to easily plug into existing code. However, code that meddles with / should be reviewed as from there one might be able to offer more convenient APIs, sings like push_component or something like that.
It's notable that \ are perfectly valid within such a RepositoryPaths, they just aren't path-separators. The type is meant to be mostly an indicator of what's needed with some type-safety added.

In a way, ToNormalPathComponents is already the trait that makes inputting such paths most flexible. In case of RepositoryPath* it's actually infallible, something that's not representable by the trate yet and that is a pain to do.

So (at least) one question remains: where would the user get these paths? Entry::filename it is not, so one obvious place comes to mind: It's the gix_index::State. There, however, it should be made so that it can still return unvalidated paths, and validated ones. As far as I know it doesn't validate the paths it reads, which is the way to go, but the user should have a hard(er) time to retrieve such unvalidated paths, and an easy time to get validated RepositoryPath*s.

So in short, I think what should be done is to leave filename alone for now, move RepositoryPathBuf into gix-path, and then see where RepositoryPathBuf can actually be returned, gix-index comes to mind.

Does that make any sense?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants