Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add util methods to parse proto files with dependencies #3834

Merged
merged 2 commits into from
Oct 25, 2023

Conversation

franz1981
Copy link
Contributor

@franz1981 franz1981 commented Oct 17, 2023

This is to open a discussion about a new public API to parse a file and its dependencies.

In the current state I don't like (at all) the other public API I've exposed for #3819

ie

        Descriptors.FileDescriptor mainProtoFd = FileDescriptorUtils.parseProtoFileWithDependencies(mainProto,
                mainProtoFileName,
                Map.of("mypackage0/producerId.proto", importedProto1, "mypackage2/version.proto", importedProto2));

@apicurio-bot
Copy link

apicurio-bot bot commented Oct 17, 2023

Thank you for creating a pull request!

Pinging @andreaTP to respond or triage.

gr8routdoors pushed a commit to gr8routdoors/apicurio-registry that referenced this pull request Oct 17, 2023
Caching of artifacts with no version (e.g. latest) did not work because reindex would use the new key that has the artifact version that was found in the lookup.  This code changes the default behavior to index both the artifact with its version and the latest/null version.  It also exposes a configuration property (`apicurio.registry.cache-latest`) where this behavior can be disabled for use cases where caching of latest is not desired.  See Apicurio#3824 for details.
@franz1981 franz1981 marked this pull request as ready for review October 18, 2023 10:02
@franz1981
Copy link
Contributor Author

@carlesarnal I've missed few tests (on purpose) because I need first to understand which parts to change/improve (and maybe deduplicate, cause i've introduced a couple of wrappers for concept that are maybe already defined elsewhere)

Re the comment on the APi I would use in Hyperfoil at #3834 (comment), I've replaced it with something as simple as passing files, which is instead the parse method version used by Apicurio.

@franz1981
Copy link
Contributor Author

franz1981 commented Oct 18, 2023

This PR doesn't just include the change to enable parsing protos + deps but it add a fix too to the ability to link protos which belong to different packages.
let me know if i need to split this in 2s

@franz1981 franz1981 force-pushed the batch_dep_solver branch 2 times, most recently from b23b57e to 50b7217 Compare October 18, 2023 12:34
carlesarnal pushed a commit that referenced this pull request Oct 20, 2023
…a updates (#3839)

* feat(schema-cache): ERCache.configureFaultTolerantRefresh #3807 (#3823)

Adds the notion of a fault tolerance in refresh for production environments where it's better to use a stale cache value than die when a cache entry refresh fails.  See issue #3807 for details.

Co-authored-by: Devon Berry <devon.berry@riotgames.com>

* fix(schema-resolver): caching of latest artifacts #3834

Caching of artifacts with no version (e.g. latest) did not work because reindex would use the new key that has the artifact version that was found in the lookup.  This code changes the default behavior to index both the artifact with its version and the latest/null version.  It also exposes a configuration property (`apicurio.registry.cache-latest`) where this behavior can be disabled for use cases where caching of latest is not desired.  See #3824 for details.

---------

Co-authored-by: Devon Berry <devon.berry@riotgames.com>
@franz1981
Copy link
Contributor Author

franz1981 commented Oct 20, 2023

@carlesarnal

I've just noticed that loading the schema is keep on loading imports which are not used and it seems is due to

    private static FileSystem getFileSystem() throws IOException {
        final FakeFileSystem inMemoryFileSystem = new FakeFileSystem();
        inMemoryFileSystem.setWorkingDirectory(okio.Path.get("/"));
        inMemoryFileSystem.setAllowSymlinks(true);

        final ClassLoader classLoader = ProtobufSchemaLoader.class.getClassLoader();

        createDirectory(GOOGLE_API_PATH.split("/"), inMemoryFileSystem);
        loadProtoFiles(inMemoryFileSystem, classLoader, GOOGLE_API_PROTOS, GOOGLE_API_PATH);

        createDirectory(GOOGLE_WELLKNOWN_PATH.split("/"), inMemoryFileSystem);
        loadProtoFiles(inMemoryFileSystem, classLoader, GOOGLE_WELLKNOWN_PROTOS, GOOGLE_WELLKNOWN_PATH);

        createDirectory(METADATA_PATH.split("/"), inMemoryFileSystem);
        loadProtoFiles(inMemoryFileSystem, classLoader, Collections.singleton(METADATA_PROTO), METADATA_PATH);

        createDirectory(DECIMAL_PATH.split("/"), inMemoryFileSystem);
        loadProtoFiles(inMemoryFileSystem, classLoader, Collections.singleton(DECIMAL_PROTO), DECIMAL_PATH);

        return inMemoryFileSystem;
    }

which is placing dependencies which maybe won't be required to resolve the specific proto file (nor its dependencies).

This could be done in a follow-up PR but ideally we would like to re-use the transitive resolution of dependencies mechanism of this PR
to filter the effectively used types in the previous java snipped. detecting which ones are really necessary to resolve a given proto file (which I suppose is part of the imports section on it - or in its dependencies); given that each proto is solved by re-creating a new file system, is still not optimal (and this is another follow-up PR, actually), but it would reduce the files we place into it which will be loaded by https://github.com/square/wire/blob/d58f02b4072403a16ff891e26ec8e926869e0444/wire-schema/src/commonMain/kotlin/com/squareup/wire/schema/internal/CommonSchemaLoader.kt#L120.

@franz1981
Copy link
Contributor Author

In addition I've checked how ProtobugDirectoryParser was resolving the dependencies of the failed test (now fixed ie autoRegisterProtoWithReferences)

Before this PR:

PARSE mode.proto schema deps = [] dependencies []
PARSE table_notification_type.proto schema deps = [mode.proto] dependencies [mode.proto]
PARSE table_info.proto schema deps = [mode.proto, table_notification_type.proto] dependencies [mode.proto, table_notification_type.proto]
PARSE table_notification.proto schema deps = [table_info.proto, mode.proto, table_notification_type.proto] dependencies [table_info.proto, mode.proto, table_notification_type.proto]

With this PR:

PARSE mode.proto schema deps = [] dependencies = []
PARSE table_info.proto schema deps = [mode.proto] dependencies = [mode.proto]
PARSE table_notification_type.proto schema deps = [] dependencies = []
PARSE table_notification.proto schema deps = [table_notification_type.proto, mode.proto, table_info.proto] dependencies = [table_info.proto, table_notification_type.proto]

And I've noticed that before this PR we always add ALL the base dependencies Fds, while this PR strip it out in order to use just the required one (which is a win).

@carlesarnal
Copy link
Member

This PR doesn't just include the change to enable parsing protos + deps but it add a fix too to the ability to link protos which belong to different packages. let me know if i need to split this in 2s

No need to split, it's fine as it is.

@@ -93,6 +93,10 @@ private static void loadProtoFiles(FakeFileSystem inMemoryFileSystem, ClassLoade
for (String proto : protos) {
//Loads the proto file resource files.
final InputStream inputStream = classLoader.getResourceAsStream(protoPath + proto);
// TODO tmp fix to run tests
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just remember to remove this :)

@carlesarnal
Copy link
Member

@carlesarnal

I've just noticed that loading the schema is keep on loading imports which are not used and it seems is due to

    private static FileSystem getFileSystem() throws IOException {
        final FakeFileSystem inMemoryFileSystem = new FakeFileSystem();
        inMemoryFileSystem.setWorkingDirectory(okio.Path.get("/"));
        inMemoryFileSystem.setAllowSymlinks(true);

        final ClassLoader classLoader = ProtobufSchemaLoader.class.getClassLoader();

        createDirectory(GOOGLE_API_PATH.split("/"), inMemoryFileSystem);
        loadProtoFiles(inMemoryFileSystem, classLoader, GOOGLE_API_PROTOS, GOOGLE_API_PATH);

        createDirectory(GOOGLE_WELLKNOWN_PATH.split("/"), inMemoryFileSystem);
        loadProtoFiles(inMemoryFileSystem, classLoader, GOOGLE_WELLKNOWN_PROTOS, GOOGLE_WELLKNOWN_PATH);

        createDirectory(METADATA_PATH.split("/"), inMemoryFileSystem);
        loadProtoFiles(inMemoryFileSystem, classLoader, Collections.singleton(METADATA_PROTO), METADATA_PATH);

        createDirectory(DECIMAL_PATH.split("/"), inMemoryFileSystem);
        loadProtoFiles(inMemoryFileSystem, classLoader, Collections.singleton(DECIMAL_PROTO), DECIMAL_PATH);

        return inMemoryFileSystem;
    }

which is placing dependencies which maybe won't be required to resolve the specific proto file (nor its dependencies).

This could be done in a follow-up PR but ideally we would like to re-use the transitive resolution of dependencies mechanism of this PR to filter the effectively used types in the previous java snipped. detecting which ones are really necessary to resolve a given proto file (which I suppose is part of the imports section on it - or in its dependencies); given that each proto is solved by re-creating a new file system, is still not optimal (and this is another follow-up PR, actually), but it would reduce the files we place into it which will be loaded by https://github.com/square/wire/blob/d58f02b4072403a16ff891e26ec8e926869e0444/wire-schema/src/commonMain/kotlin/com/squareup/wire/schema/internal/CommonSchemaLoader.kt#L120.

Yes, this was done just to be sure we have all the well-known types available when resolving the main proto file in case any of them are being used. Obviously resolving the actually used ones and just adding them is way better than the current solution.

@carlesarnal
Copy link
Member

In addition I've checked how ProtobugDirectoryParser was resolving the dependencies of the failed test (now fixed ie autoRegisterProtoWithReferences)

Before this PR:

PARSE mode.proto schema deps = [] dependencies []
PARSE table_notification_type.proto schema deps = [mode.proto] dependencies [mode.proto]
PARSE table_info.proto schema deps = [mode.proto, table_notification_type.proto] dependencies [mode.proto, table_notification_type.proto]
PARSE table_notification.proto schema deps = [table_info.proto, mode.proto, table_notification_type.proto] dependencies [table_info.proto, mode.proto, table_notification_type.proto]

With this PR:

PARSE mode.proto schema deps = [] dependencies = []
PARSE table_info.proto schema deps = [mode.proto] dependencies = [mode.proto]
PARSE table_notification_type.proto schema deps = [] dependencies = []
PARSE table_notification.proto schema deps = [table_notification_type.proto, mode.proto, table_info.proto] dependencies = [table_info.proto, table_notification_type.proto]

And I've noticed that before this PR we always add ALL the base dependencies Fds, while this PR strip it out in order to use just the required one (which is a win).

Same as above, the old parser was just making anything "parsable" available. This is obviously a win (just as you said :))

carlesarnal pushed a commit that referenced this pull request Oct 23, 2023
Caching of artifacts with no version (e.g. latest) did not work because reindex would use the new key that has the artifact version that was found in the lookup.  This code changes the default behavior to index both the artifact with its version and the latest/null version.  It also exposes a configuration property (`apicurio.registry.cache-latest`) where this behavior can be disabled for use cases where caching of latest is not desired.  See #3824 for details.

Co-authored-by: Devon Berry <devon.berry@riotgames.com>
@franz1981 franz1981 force-pushed the batch_dep_solver branch 2 times, most recently from 4f0b335 to 63545d8 Compare October 23, 2023 10:42
@carlesarnal carlesarnal merged commit 9c54710 into Apicurio:main Oct 25, 2023
20 checks passed
@carlesarnal carlesarnal added the backport-2.x Backport fix to Registry v2 label Nov 3, 2023
@carlesarnal carlesarnal self-assigned this Nov 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-2.x Backport fix to Registry v2
Projects
Status: Released
Development

Successfully merging this pull request may close these issues.

None yet

2 participants