Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AssetPath source parse fix #11543

Merged
merged 12 commits into from Jan 26, 2024
Merged

Conversation

thepackett
Copy link
Contributor

@thepackett thepackett commented Jan 26, 2024

Objective

Fixes #11533

When AssetPaths are created from a string type, they are parsed into an AssetSource, a Path, and a Label.
The current method of parsing has some unnecessary quirks:

  • The presence of a : character is assumed to be the start of an asset source indicator.
    • This is not necessarily true. There are valid uses of a : character in an asset path, for example an http source's port such as localhost:80.
  • If there are multiple instances of ://, the last one is assumed to be the asset source deliminator.
    • This has some unexpected behavior. Even in a fully formed path, such as http://localhost:80, the : between localhost and 80 is assumed to be the start of an asset source, causing an error since it does not form the full sequence ://.
  • The very first asset label delimiter # is used to determine the asset label, and all further parsing and validation stops there.

Solution

Changes the AssetPath's parse_internal method to be more permissive.

  • Only the exact sequence :// is taken to be the asset source delimiter, and only the first one if there are multiple.
  • Only the last # is taken to be the asset label delimiter.
  • The asset label delimiter # cannot be contained in the asset source.
  • The asset source delimiter :// cannot be contained in the asset label.

The ParseAssetPathError enum and the doc tests have been updated to reflect these new conditions.

@thepackett thepackett changed the title AssetPath source fix AssetPath source parse fix Jan 26, 2024
@alice-i-cecile alice-i-cecile added C-Bug An unexpected or incorrect behavior A-Assets Load files from disk to use for things like images, models, and sounds labels Jan 26, 2024
@alice-i-cecile
Copy link
Member

As a consequence, it is no longer possible to detect a malformed asset source deliminator, and so the corresponding error was removed.

It looks like the error type was not actually removed? I see a removed test but it sounded like you meant to remove an Error enum variant completely.

@rparrett
Copy link
Contributor

diff --git a/crates/bevy_asset/src/path.rs b/crates/bevy_asset/src/path.rs
index 689913169..9b4b68bb6 100644
--- a/crates/bevy_asset/src/path.rs
+++ b/crates/bevy_asset/src/path.rs
@@ -135,7 +135,7 @@ impl<'a> AssetPath<'a> {
         let mut label_range = None;
         while let Some((index, char)) = chars.next() {
             match char {
-                ':' => {
+                ':' if source_range.is_none() => {
                     let (_, char) = chars
                         .next()
                         .ok_or(ParseAssetPathError::InvalidSourceSyntax)?;

Seems like something like this could fix it, and avoid scanning the path twice.

@thepackett
Copy link
Contributor Author

As a consequence, it is no longer possible to detect a malformed asset source deliminator, and so the corresponding error was removed.

It looks like the error type was not actually removed? I see a removed test but it sounded like you meant to remove an Error enum variant completely.

My bad, you're correct. I've removed the error variant now.

Copy link
Member

@alice-i-cecile alice-i-cecile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't be opposed to heavy commenting of this entire function, but your test case clearly demonstrates both the problem and that your fix works.

@thepackett
Copy link
Contributor Author

diff --git a/crates/bevy_asset/src/path.rs b/crates/bevy_asset/src/path.rs
index 689913169..9b4b68bb6 100644
--- a/crates/bevy_asset/src/path.rs
+++ b/crates/bevy_asset/src/path.rs
@@ -135,7 +135,7 @@ impl<'a> AssetPath<'a> {
         let mut label_range = None;
         while let Some((index, char)) = chars.next() {
             match char {
-                ':' => {
+                ':' if source_range.is_none() => {
                     let (_, char) = chars
                         .next()
                         .ok_or(ParseAssetPathError::InvalidSourceSyntax)?;

Seems like something like this could fix it, and avoid scanning the path twice.

This still doesn't fix the issue of localhost:80 being an invalid AssetPath. This parse function is called on AssetPath creation, so if you were using something like an http:// asset source as the default, then localhost:80 ought to be a valid path, but it wouldn't be.

In any case, is using find and rfind really a significant performance hit?
They only look for the first instance of a substring, so they could early out once they find it.
I tried looking into the standard library implementation to see if they do, although I couldn't make much sense of it since it is heavily generic.

@rparrett
Copy link
Contributor

rparrett commented Jan 26, 2024

This still doesn't fix the issue of localhost:80 being an invalid AssetPath.

Does it not? The test you added passes with that change. (If not, another test would be nice)

In any case, is using find and rfind really a significant performance hit?

This code isn't likely to be in a hot path, so probably not, but most asset paths at the moment use the default source with no label.

@thepackett
Copy link
Contributor Author

This still doesn't fix the issue of localhost:80 being an invalid AssetPath.

Does it not? The test you added passes with that change. (If not, another test would be nice)

I'll add another test to show that case.

In any case, is using find and rfind really a significant performance hit?

This code isn't likely to be in a hot path, so probably not, but most asset paths at the moment use the default source with no label.

Good point, you are right that if an AssetPath uses the default source with no label then the &str will be scanned twice. I could modify the previous algorithm so that it's always one pass. The only reason I opted not to was because it would quite messy.

I wouldn't be opposed to heavy commenting of this entire function

Maybe with some good comments though, it wouldn't matter too much if the function implementation itself is a bit messy.

@rparrett
Copy link
Contributor

rparrett commented Jan 26, 2024

I could modify the previous algorithm so that it's always one pass.

I thought it was worth discussing, because the single-passness was touted here: #9885 when that code was added.

I'm not personally sure how much it matters. The error being removed doesn't seem super useful, and I'm not sure that the optimization would make a difference in any real-world uses.

Either way, this seems like a high priority bug to fix, so I'm happy as long as the test coverage is there for future refactoring.

@alice-i-cecile
Copy link
Member

Yeah, I'm much more concerned with correctness and tests than performance here. We can refactor later if it shows up.

@BD103
Copy link
Member

BD103 commented Jan 26, 2024

Does this branch also close #11271?

@thepackett
Copy link
Contributor Author

Does this branch also close #11271?

Not quite, but it does help. This PR will allow the path portion of the AssetPath to contain anything and will no longer error when it sees a : character. However, interpreting the path is the job of the AssetReader, and the default file AssetReader interprets all paths as relative.

The options for an absolute path would either be to:

  1. Modify the file AssetReader to check the provided path for an absolute path indicator like C:.
  2. Add a new absolute file AssetReader which uses the provided paths as absolute paths.

@thepackett
Copy link
Contributor Author

thepackett commented Jan 26, 2024

I changed it back to a single pass algorithm. It's a bit messy, but hopefully the comments I've added are enough to clarify the behavior.

I've also discovered a couple new error cases:

  • # in source such as #insource://a/b.test
  • :// in label such as source://a/b.test#://inlabel

To account for these two error cases, I've forbidden # in sources and :// in labels. I've added appropriate error types, as well as tests for these cases.

@alice-i-cecile
Copy link
Member

Remember to update the PR description with the final set of changes :)

Copy link
Contributor

@NthTensor NthTensor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to accept both empty strings "" and pathless strings "source://#label". Is this intended behavior?

@thepackett
Copy link
Contributor Author

This seems to accept both empty strings "" and pathless strings "source://#label". Is this intended behavior?

This is consistent with the prior behavior. The Path portion of the AssetPath doesn't really have any restrictions on it, and it is up to the AssetReader to interpret it. For example, with the default file AssetReader, an empty path "" would be interpreted as the assets folder directory and could be used with the AssetServer's load_folder method.

@alice-i-cecile alice-i-cecile added the S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it label Jan 26, 2024
@alice-i-cecile alice-i-cecile added this pull request to the merge queue Jan 26, 2024
Merged via the queue into bevyengine:main with commit 182c21d Jan 26, 2024
23 checks passed
tjamaan pushed a commit to tjamaan/bevy that referenced this pull request Feb 6, 2024
# Objective


Fixes bevyengine#11533 


When `AssetPath`s are created from a string type, they are parsed into
an `AssetSource`, a `Path`, and a `Label`.
The current method of parsing has some unnecessary quirks:

- The presence of a `:` character is assumed to be the start of an asset
source indicator.
- This is not necessarily true. There are valid uses of a `:` character
in an asset path, for example an http source's port such as
`localhost:80`.
- If there are multiple instances of `://`, the last one is assumed to
be the asset source deliminator.
- This has some unexpected behavior. Even in a fully formed path, such
as `http://localhost:80`, the `:` between `localhost` and `80` is
assumed to be the start of an asset source, causing an error since it
does not form the full sequence `://`.


## Solution
Changes the `AssetPath`'s `parse_internal` method to be more permissive.
- Only the exact sequence `://` is taken to be the asset source
deliminator, and only the first one if there are multiple.
- As a consequence, it is no longer possible to detect a malformed asset
source deliminator, and so the corresponding error was removed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Assets Load files from disk to use for things like images, models, and sounds C-Bug An unexpected or incorrect behavior S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Trouble loading assets from localhost while doing development
6 participants