Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

serialization: provide deserialize-info from toplevel modules #59

Merged
merged 2 commits into from
Apr 7, 2024

Conversation

Bogdanp
Copy link
Contributor

@Bogdanp Bogdanp commented Apr 6, 2024

This change makes it so that the following program runs successfully after being converted to an executable using raco exe:

#lang racket/base

(require gregor
         racket/serialize)

(deserialize (serialize (current-time)))
(deserialize (serialize (today)))
(deserialize (serialize (now/moment)))

As it stands, the deserialization process fails to look up the deserialization-info submodules in an exe. On failure, that process then has a fallback that looks at the toplevel module where the struct is declared to get the info. So, this change fixes the issue by providing the info directly from the toplevel modules in addition to the deserialization-info submodules.

See racket/racket#4967 for more details.

This change makes it so that the following program runs successfully
after being converted to an executable using `raco exe`:

    #lang racket/base

    (require gregor
             racket/serialize)

    (deserialize (serialize (current-time)))
    (deserialize (serialize (today)))
    (deserialize (serialize (now/moment)))

As it stands, the deserialization process fails to look up the
`deserialization-info` submodules in an exe. On failure, that process
then has a fallback that looks at the toplevel module where the struct
is declared to get the info. So, this change fixes the issue by
providing the info directly from the toplevel modules in addition to
the `deserialization-info` submodules.

See racket/racket#4967 for more details.
@rfindler
Copy link

rfindler commented Apr 6, 2024

I don't actually know how all this works but if I'm reading the linked issue correctly, is the problem simply that there is no depenency from the gregor top-level module to the submodule that has the provide of the deserialize information and thus raco exe doesn't add the submodule?

If that's the case, would it work to make a module instead of a module+ and put a bunch of stuff into that submodule, but then have the top-level module reprovide it? That way, the top-level module's exports wouldn't have to change.

@Bogdanp
Copy link
Contributor Author

Bogdanp commented Apr 6, 2024

I think moving the struct into a module submodule and reproviding that would probably resolve the problem for the same reason this change solves it. But, simply adding an explicit dependency (eg. requiring it or adding a define-runtime-module-path-index from somewhere else) on the submodule isn't enough to resolve the issue. The problem is that, in an exe, the submodule is embedded with a path like #%embedded:gregor/private/datetime>:(deserialize-info), but the racket/serialize library tries to look up the submodule by doing something like

(module-path-index-resolve
 (module-path-index-join
  '(submod "." deserialize-info)
  (module-path-index-join
   '(lib "gregor/private/datetime.rkt") #f)))

which, in a regular module resolves to

#<resolved-module-path:(submod "/Users/bogdan/sandbox/gregor/gregor-lib/gregor/private/datetime.rkt" deserialize-info)>

and can be dynamic-required, but in an exe resolves to

#<resolved-module-path:(submod '#%embedded:gregor/private/datetime: deserialize-info)>

which is different from the embedded path above, so the dynamic-require fails. Maybe the embedding process could do a better job of recording submodule paths such that they can be resolved at runtime correctly, but I haven't dug into it. Even if that improvement can be made to the embedding process, that still leaves the problem of needing an explicit dependency on the submodule.

@rfindler
Copy link

rfindler commented Apr 6, 2024

Okay! Thank you!

@LiberalArtist
Copy link

IIUC, the same problem will affect lots of other serializable structures, I think including those using serializable-struct. It seems like it would be better to figure out how to make this work out-of-the-box than to have to make manual changes in many places.

@Bogdanp
Copy link
Contributor Author

Bogdanp commented Apr 6, 2024

Uses of serializable-struct are not affected because the serializable-struct macro expands to some code that includes a use of (runtime-require (submod "." deserialize-info)), and that seems to get the submodule recorded in the exe's module lookup table in such a way that its path can be resolved at runtime. I'll update this PR to use runtime-require instead of reproviding the bindings from the toplevel, since that seems better overall.

@Bogdanp
Copy link
Contributor Author

Bogdanp commented Apr 6, 2024

Digging a little deeper into the embedding code, I think this section of code in the embedded exe module resolver is what makes the runtime-require work, because it specifically recognizes the kind of relative dependency from the parent (base) module to the child that the runtime-require form introduces (and that explains why an external dependency on the module, like I was trying here, wouldn't work).

@97jaz
Copy link
Owner

97jaz commented Apr 7, 2024

I know very little about the implementation of serialization in Racket, so maybe I'm off-base here, but this really sounds like a problem with the serialization code. Is that not the case?

@Bogdanp
Copy link
Contributor Author

Bogdanp commented Apr 7, 2024

I don't think the serialization code can do much better than it currently does here. The issue is that, fundamentally, to deserialize a piece of serialized data, a module has to be looked up at runtime to discover the deserialization info. In an exe, the submodule by which the deserialize info is looked up is not going to be resolvable unless there is a dependency on it as recorded by something like runtime-require. In fact, without a runtime-require, the submodule won't even be included in the generated executable's module resolution table (unless user code does records a dependency on it), because nothing will statically depend on it.

I think this approach is in line with how other dependencies get recorded for runtime and for executable distribution (i.e. lazy-require, define-runtime-path and co.), and with how serializable-struct itself works, so it seems fine to me. Ideally, we would just use serializable-struct instead of worrying about any of this, but I don't think that's an option here since these structs have custom serialization behavior compared to what serializable-struct would generate.

@97jaz
Copy link
Owner

97jaz commented Apr 7, 2024

Ah, okay. I don't really understand why locating deserialization info should require looking up a module at runtime. When I wrote the serialization code, I was following examples (though I'm no longer sure which examples, since the ones I see now in the reference look different and don't use a submodule). But given that it does need to look up a module at runtime, I agree with you. Thanks for the patch!

@97jaz 97jaz merged commit f56215d into 97jaz:master Apr 7, 2024
@LiberalArtist
Copy link

Along those lines, runtime-require seems much better than the manual solution, and I think it would be worth mentioning in the serialization docs. I also wondered if something could be done more automatically in racket/serialize by using compiler/cm-accomplice, but it might end up causing more confusion to try to handle special cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants