New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RProtoBuf cannot deserialize certain invalid R objects #28

Closed
jefshe opened this Issue Jul 27, 2017 · 7 comments

Comments

Projects
None yet
2 participants
@jefshe
Contributor

jefshe commented Jul 27, 2017

R version 3.3.2 (2016-10-31)
RProtoBuf_0.4.9

When using this library to serialize and deserialize certain lm objects:

x <- factor(c("a","b","c","a","c"))
y <- 1:5
z<-lm(x ~ y)
serialized <- RProtoBuf::serialize_pb(object=z, connection=NULL)
RProtoBuf::unserialize_pb(serialized)

I get the following error:

Error in attributes(xobj) <- attrib : adding class "factor" to an invalid object

This is because lm is creating an invalid r object (a factor class with non integer values):

z$residuals
Error in as.character.factor(x) : malformed factor`

Although these calls to lm are invalid, It would be nice for the library to be able to support this since both read and readRDS both do. It would make sense for the library to be able to recreate the state of the object regardless of validity

I'm presuming this will most likely require part of the deserialization code to be written in C?

@eddelbuettel

This comment has been minimized.

Show comment
Hide comment
@eddelbuettel

eddelbuettel Jul 27, 2017

Owner

PRs are always welcome :)

Owner

eddelbuettel commented Jul 27, 2017

PRs are always welcome :)

@jefshe

This comment has been minimized.

Show comment
Hide comment
@jefshe

jefshe Jul 27, 2017

Contributor

Any suggestions on where this might fit into the C source code?

Contributor

jefshe commented Jul 27, 2017

Any suggestions on where this might fit into the C source code?

@eddelbuettel

This comment has been minimized.

Show comment
Hide comment
@eddelbuettel

eddelbuettel Jul 27, 2017

Owner

So the bug seems to be the lack of support for factor types as a plain lm() on numeric data serializes just fine:

> y <- rnorm(10)
> x <- runif(10)
> fit <- lm(y ~ x)
> serialized <- RProtoBuf::serialize_pb(object=fit, connection=NULL)
> 

Factors are somewhere between tricky (to get levels rights) and simple (really just an integer vector). We also don't have them in Rcpp itself which may be part of the problem.

Owner

eddelbuettel commented Jul 27, 2017

So the bug seems to be the lack of support for factor types as a plain lm() on numeric data serializes just fine:

> y <- rnorm(10)
> x <- runif(10)
> fit <- lm(y ~ x)
> serialized <- RProtoBuf::serialize_pb(object=fit, connection=NULL)
> 

Factors are somewhere between tricky (to get levels rights) and simple (really just an integer vector). We also don't have them in Rcpp itself which may be part of the problem.

@jefshe

This comment has been minimized.

Show comment
Hide comment
@jefshe

jefshe Jul 28, 2017

Contributor

I think it is more to do with the deserialization side. unrexp sets the attributes of the deserialized object within R itself which prevents the object from being recreated.

I was able to get the above example to deserialize properly by bypassing r's checking of factor types and setting the class attribute directly.

   unsafe_set_class <- cfunction(c(vec = "ANY",klass="ANY"), '
            PROTECT(vec);
            PROTECT(klass);
            SEXP t = R_NilValue;
            if(TYPEOF(vec) == CHARSXP)
            	error("cannot set attribute on a CHARSXP");
            /* this does no allocation */
            for (SEXP s = ATTRIB(vec); s != R_NilValue; s = CDR(s)) {
            	if (TAG(s) == R_ClassSymbol) {
            	    SETCAR(s, klass);
                    return R_NilValue;
                }
                t = s;
            }

            SEXP s = CONS(klass, R_NilValue);
            SET_TAG(s, R_ClassSymbol);
            if (ATTRIB(vec) == R_NilValue) SET_ATTRIB(vec, s); else SETCDR(t, s);
            SET_OBJECT(vec, 1);
            UNPROTECT(2);
            return R_NilValue;
        ')

   unsafe_set_class(xobj,as.character(attrib["class"]))

It seems lm must do something similar when setting the attributes of the object. Could this be a viable workaround?

Contributor

jefshe commented Jul 28, 2017

I think it is more to do with the deserialization side. unrexp sets the attributes of the deserialized object within R itself which prevents the object from being recreated.

I was able to get the above example to deserialize properly by bypassing r's checking of factor types and setting the class attribute directly.

   unsafe_set_class <- cfunction(c(vec = "ANY",klass="ANY"), '
            PROTECT(vec);
            PROTECT(klass);
            SEXP t = R_NilValue;
            if(TYPEOF(vec) == CHARSXP)
            	error("cannot set attribute on a CHARSXP");
            /* this does no allocation */
            for (SEXP s = ATTRIB(vec); s != R_NilValue; s = CDR(s)) {
            	if (TAG(s) == R_ClassSymbol) {
            	    SETCAR(s, klass);
                    return R_NilValue;
                }
                t = s;
            }

            SEXP s = CONS(klass, R_NilValue);
            SET_TAG(s, R_ClassSymbol);
            if (ATTRIB(vec) == R_NilValue) SET_ATTRIB(vec, s); else SETCDR(t, s);
            SET_OBJECT(vec, 1);
            UNPROTECT(2);
            return R_NilValue;
        ')

   unsafe_set_class(xobj,as.character(attrib["class"]))

It seems lm must do something similar when setting the attributes of the object. Could this be a viable workaround?

@eddelbuettel

This comment has been minimized.

Show comment
Hide comment
@eddelbuettel

eddelbuettel Jul 28, 2017

Owner

Dunno.if.

I also thought about this a little during the day and factors are also of limited interest to Protocol Buffers when transferring between R and C++/Java/Python/... none of which have factor types.

Of course, as a workaround you could always serialize to raw using digest() and then pass the raw vector via ProtoBuf.

Owner

eddelbuettel commented Jul 28, 2017

Dunno.if.

I also thought about this a little during the day and factors are also of limited interest to Protocol Buffers when transferring between R and C++/Java/Python/... none of which have factor types.

Of course, as a workaround you could always serialize to raw using digest() and then pass the raw vector via ProtoBuf.

@jefshe

This comment has been minimized.

Show comment
Hide comment
@jefshe

jefshe Aug 1, 2017

Contributor

We still would like the deserializer to not crash in these circumstances as the object was created by native R functions.

Would you accept a pull request which specifically handles this case on the deserializer side which skips setting the class attribute if the values are invalid?

Contributor

jefshe commented Aug 1, 2017

We still would like the deserializer to not crash in these circumstances as the object was created by native R functions.

Would you accept a pull request which specifically handles this case on the deserializer side which skips setting the class attribute if the values are invalid?

@eddelbuettel

This comment has been minimized.

Show comment
Hide comment
@eddelbuettel

eddelbuettel Aug 13, 2017

Owner

Thanks very much for your PR #29 which fixes this.

As it happens, CRAN just asked for another small fix I will make shortly so this will go out "shortly" in a fresh 0.4.10 release. Thanks again, this was very helpful.

Owner

eddelbuettel commented Aug 13, 2017

Thanks very much for your PR #29 which fixes this.

As it happens, CRAN just asked for another small fix I will make shortly so this will go out "shortly" in a fresh 0.4.10 release. Thanks again, this was very helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment