Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow loading Smithy files from jars with urlencoded characters in paths #850

Merged
merged 9 commits into from
Mar 13, 2023

Conversation

kubukoz
Copy link
Member

@kubukoz kubukoz commented Mar 4, 2023

What's the problem

Models aren't loaded when the dependency jar has URLEncoded characters in its filesystem path

Why is this a problem?

  • Dynver uses + as a version separator by default, which Coursier will urlencode before putting it into the cache directory
  • Repositories that require credentials may end up with the user@hostname part in the filesystem path, and @ is, again, URLEncoded there.

(if you ask me, it's kinda weird that Coursier urlencodes everything in the paths - e.g. sbt's publishLocal puts things in ~/.ivy2/local/org.polyvariant/test-library-core_2.13/0.0.1+123-SNAPSHOT)

How to reproduce

  1. I published a library with two versions (identical contents):
org.polyvariant:test-library-core_2.13:0.0.1-SNAPSHOT
org.polyvariant:test-library-core_2.13:0.0.1+123-SNAPSHOT

This is publicly available on sonatype s01 snapshots.

Both versions are identical and contain the following structure:

META-INF/MANIFEST.MF
META-INF/smithy/manifest
META-INF/smithy/testlibrary.smithy

The manifest lists the testlibrary.smithy file, which contains the following:

$version: "2"

namespace testlibrary

string MyString
  1. Then I'm trying to load the models using the default smithy4s facilities. See the test case attached to this PR.

What I think is wrong

Here's where Coursier puts the problematic jar:

$COURSIER_CACHE_LOCATION/v1/https/s01.oss.sonatype.org/content/repositories/snapshots/org/polyvariant/test-library-core_2.13/0.0.1%2B123-SNAPSHOT/test-library-core_2.13-0.0.1%2B123-SNAPSHOT.jar

Here's how we're trying to load it in ModelLoader:

val deps = resolveDependencies(dependencies, localJars, repositories)
val modelsInJars = deps.flatMap { files =>
val manifestUrl =
ModelDiscovery.createSmithyJarManifestUrl(files.getAbsolutePath())
try { ModelDiscovery.findModels(manifestUrl).asScala }
catch {
case _: ModelManifestException => Seq.empty
}
}

The findModels call here throws a ModelManifestException with an underlying cause of:

java.nio.file.NoSuchFileException: $COURSIER_CACHE_LOCATION/v1/https/s01.oss.sonatype.org/content/repositories/snapshots/org/polyvariant/test-library-core_2.13/0.0.1+123-SNAPSHOT/test-library-core_2.13-0.0.1+123-SNAPSHOT.jar

You can see that here there are no URLEncoding escapes: 0.0.1+123-SNAPSHOT. The actual file path contains 0.0.1%2B123-SNAPSHOT, though, so clearly the file being opened is the wrong one... which seems wrong on the Smithy library side.

Switching to URLClassLoader-based discovery seems to be fine - I tried this approach in #846.

@kubukoz
Copy link
Member Author

kubukoz commented Mar 4, 2023

Here's a scala-cli snippet you can run as a worksheet, which showcases the difference without involving smithy4s:

//> using scala "2.13.10"
//> using lib "software.amazon.smithy:smithy-model:1.28.0"
//> using lib "io.get-coursier::coursier:2.1.0-RC6"
import software.amazon.smithy.model.Model
import coursier.util.Task
import coursier.cache.FileCache
import coursier._
import software.amazon.smithy.model.loader.ModelDiscovery
import scala.util.Try
import java.net.URLClassLoader
import scala.concurrent.duration.Duration
import java.nio.file.Paths

val withSpecialCharacters = true

val lib =
  if (withSpecialCharacters)
    dep"org.polyvariant:test-library-core_2.13:0.0.1+123-SNAPSHOT"
  else
    dep"org.polyvariant:test-library-core_2.13:0.0.1-SNAPSHOT"

val f =
  Fetch[Task](FileCache().withTtl(Duration.Zero))
    .addDependencies(lib)
    .addRepositories(mvn"https://s01.oss.sonatype.org/content/repositories/snapshots")
    .run()
    .head

f.getAbsolutePath()

val manUrl = ModelDiscovery.createSmithyJarManifestUrl(
  f.getAbsolutePath()
)

val r = Try(ModelDiscovery.findModels(manUrl))

// false
r.isSuccess

// ok
ModelDiscovery.findModels(new URLClassLoader(Array(f.toURI().toURL()), null))
Worksheet output
//> using scala "2.13.10"
//> using lib "software.amazon.smithy:smithy-model:1.28.0"
//> using lib "io.get-coursier::coursier:2.1.0-RC6"
import software.amazon.smithy.model.Model
import coursier.util.Task
import coursier.cache.FileCache
import coursier._
import software.amazon.smithy.model.loader.ModelDiscovery
import scala.util.Try
import java.net.URLClassLoader
import scala.concurrent.duration.Duration
import java.nio.file.Paths

val withSpecialCharacters = true
// withSpecialCharacters: Boolean = true

val lib =
  if (withSpecialCharacters)
    dep"org.polyvariant:test-library-core_2.13:0.0.1+123-SNAPSHOT"
  else
    dep"org.polyvariant:test-library-core_2.13:0.0.1-SNAPSHOT"
// lib: Dependency = Dependency(
//   module = Module(
//     organization = Organization(value = "org.polyvariant"),
//     name = ModuleName(value = "test-library-core_2.13"),
//     attributes = Map()
//   ),
//   version = "0.0.1+123-SNAPSHOT",
//   configuration = Configuration(value = ""),
//   minimizedExclusions = MinimizedExclusions(data = ExcludeNone),
//   publication = Publication(
//     name = "",
//     type = Type(value = ""),
//     ext = Extension(value = ""),
//     classifier = Classifier(value = "")
//   ),
//   optional = false,
//   transitive = true
// )

val f =
  Fetch[Task](FileCache().withTtl(Duration.Zero))
    .addDependencies(lib)
    .addRepositories(mvn"https://s01.oss.sonatype.org/content/repositories/snapshots")
    .run()
    .head
// f: java.io.File = /Users/kubukoz/Library/Caches/Coursier/v1/https/s01.oss.sonatype.org/content/repositories/snapshots/org/polyvariant/test-library-core_2.13/0.0.1%2B123-SNAPSHOT/test-library-core_2.13-0.0.1%2B123-SNAPSHOT.jar

f.getAbsolutePath()
// res0: String = "/Users/kubukoz/Library/Caches/Coursier/v1/https/s01.oss.sonatype.org/content/repositories/snapshots/org/polyvariant/test-library-core_2.13/0.0.1%2B123-SNAPSHOT/test-library-core_2.13-0.0.1%2B123-SNAPSHOT.jar"

val manUrl = ModelDiscovery.createSmithyJarManifestUrl(
  f.getAbsolutePath()
)
// manUrl: java.net.URL = jar:file:/Users/kubukoz/Library/Caches/Coursier/v1/https/s01.oss.sonatype.org/content/repositories/snapshots/org/polyvariant/test-library-core_2.13/0.0.1%2B123-SNAPSHOT/test-library-core_2.13-0.0.1%2B123-SNAPSHOT.jar!/META-INF/smithy/manifest

val r = Try(ModelDiscovery.findModels(manUrl))
// r: Try[java.util.List[java.net.URL]] = Failure(
//   exception = software.amazon.smithy.model.loader.ModelManifestException: Error parsing Smithy model manifest from jar:file:/Users/kubukoz/Library/Caches/Coursier/v1/https/s01.oss.sonatype.org/content/repositories/snapshots/org/polyvariant/test-library-core_2.13/0.0.1%2B123-SNAPSHOT/test-library-core_2.13-0.0.1%2B123-SNAPSHOT.jar!/META-INF/smithy/manifest
// )

// false
r.isSuccess
// res1: Boolean = false

// ok
ModelDiscovery.findModels(new URLClassLoader(Array(f.toURI().toURL()), null))
// res2: java.util.List[java.net.URL] = [jar:file:/Users/kubukoz/Library/Caches/Coursier/v1/https/s01.oss.sonatype.org/content/repositories/snapshots/org/polyvariant/test-library-core_2.13/0.0.1%252B123-SNAPSHOT/test-library-core_2.13-0.0.1%252B123-SNAPSHOT.jar!/META-INF/smithy/testlibrary.smithy]

@Baccata
Copy link
Contributor

Baccata commented Mar 6, 2023

I'm not sure I want to re-introduce a URLClassLoader-based solution considering we moved away from it (although I acknowledge that I don't remember exactly what caused me to move away from it).

I think that new URLClassLoader(deps, null) and new URLClassLoader(deps) (what we were using before) are equivalent, but looking at the source code I'm not clear if it is the case. Assuming they are equivalent, I'd be inclined to trust my past self on the decision to not use URLClassLoader anymore, and attempt to fix by attempting to load the models from the URL-encoded path by applying a preliminary check to verify whether the non-URL-encoded path exists.

@kubukoz
Copy link
Member Author

kubukoz commented Mar 6, 2023

I think that new URLClassLoader(deps, null) and new URLClassLoader(deps) (...) are equivalent

That seems to be the case indeed. Somehow I thought we were passing a parent classloader, but we only did so for validatorClassLoader (as we do now).

Hard to say what was the reason to move away from that if we don't have a test case.


attempt to fix by attempting to load the models from the URL-encoded path by applying a preliminary check to verify whether the non-URL-encoded path exists.

so are you saying the behavior of findModels(URL) is correct? I'm not sure how an extra check would help things.

Another thing that seems to work:

val manUrl = ModelDiscovery.createSmithyJarManifestUrl(
-  f.getAbsolutePath()
+  f.toURI().toURL().getPath
)

@plokhotnyuk
Copy link
Contributor

plokhotnyuk commented Mar 6, 2023

AFAIK toURL() is a side effecting routine that does blocking IP resolution under the hood.

@Baccata
Copy link
Contributor

Baccata commented Mar 6, 2023

AFAIK toURL() is a side effecting routine that does blocking IP resolution under the hood.

Totally, but the URL in question are using the file:// scheme. I'm certainly hoping that there's no dns lookup when that's the case.

@Baccata
Copy link
Contributor

Baccata commented Mar 7, 2023

Another thing that seems to work:

@kubukoz I think we should roll with that 👍

Hard to say what was the reason to move away from that if we don't have a test case.

Agreed, but this stuff tends to be hard to reproduce in automated tests, so sometimes corners get cut (I'm more comfortable with cutting corners for the build-time than the runtime)

Sometimes I think the situation would be better if we were running the codegen in a forked JVM, but that would lose some compile-time guarantees that let us verify that build-plugins pass the correct information to the codegen. This has the consequence of the codegen running on the same JVM as SBT/mill, which may be impacted by their respective classloaders... sometimes I think that anything involving classloaders is side-effecty and impacted by the current state of the runtime, even if theoretically you're instantiating a classloader in isolation ...

@kubukoz
Copy link
Member Author

kubukoz commented Mar 7, 2023

Fair enough. @Baccata are you fine with the setup I used here (with the polyvariant library) or should I do something else to test it?

Do we need the scripted test or is it OK if we just test ModelLoader directly?

@kubukoz kubukoz changed the title Reproduce issue with Smithy files in jars with urlencoded paths Allow loading Smithy files from jars with urlencoded paths Mar 7, 2023
@Baccata
Copy link
Contributor

Baccata commented Mar 8, 2023

I'm happy with the setup 👍 I think it's fine to ditch the scripted test.

Comment on lines 51 to 59
val modelsInJars = deps.flatMap { file =>
Using.resource(
// Note: On JDK13+, the second parameter is redundant.
FileSystems.newFileSystem(file.toPath(), null)
) { jarFS =>
val p = jarFS.getPath("META-INF", "smithy", "manifest")

if (Files.exists(p)) {
try ModelDiscovery.findModels(p.toUri().toURL()).asScala.toList
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some trust in this approach, as it avoids dealing with any String-based path/uri manipulation, unlike ModelDiscovery.createSmithyJarManifestUrl which takes a String.

Copy link
Contributor

@daddykotex daddykotex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love what I see, thanks for the fix

@Baccata Baccata merged commit cb36889 into series/0.17 Mar 13, 2023
@Baccata Baccata deleted the urlencoded-dependency-paths branch March 13, 2023 15:13
@kubukoz kubukoz changed the title Allow loading Smithy files from jars with urlencoded paths Allow loading Smithy files from jars with urlencoded characters in paths Mar 13, 2023
@kubukoz kubukoz mentioned this pull request Mar 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants