Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-7775: Don't let safe code arbitrarily transmute readers and writers #6256

Closed
wants to merge 4 commits into from

Conversation

Marwes
Copy link

@Marwes Marwes commented Jan 22, 2020

This code were quite clearly unsafe (which seems to be known, judging
from the comment about it). As the functions appeared to callable
indirectly from other public functions I changed these to just panic
instead to keep the API.

@Marwes Marwes changed the title fix: Don't let safe code arbitrarily transmute readers and writers [Rust] fix: Don't let safe code arbitrarily transmute readers and writers Jan 22, 2020
@github-actions
Copy link

Thanks for opening a pull request!

Could you open an issue for this pull request on JIRA?
https://issues.apache.org/jira/browse/ARROW

Then could you also rename pull request title in the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

See also:

@Marwes
Copy link
Author

Marwes commented Jan 22, 2020

The rust code could really use some auditing of the unsafe code, a quick glance tells me that there are both unnecessary unsafe and undefined behaviour in the crate.

unsafe impl<T> Send for RawPtrBox<T> {}
unsafe impl<T> Sync for RawPtrBox<T> {}

Packing in T that is not Send or Sync lets it be sent across threads (might be prevented by some type wrapping RawBox perhaps, still risky to have the impls as is).

mem::transmute(self.raw_values.get().offset(self.data.offset() as isize))

Undefined behaviour if T::Native has a greater alignment than u8 (1).

Undefined behaviour if the caller passes parameters that are out of bounds.

impl flatbuffers::EndianScalar for MetadataVersion {
#[inline]
fn to_little_endian(self) -> Self {
let n = i16::to_le(self as i16);
let p = &n as *const i16 as *const MetadataVersion;
unsafe { *p }
}
#[inline]
fn from_little_endian(self) -> Self {
let n = i16::from_le(self as i16);
let p = &n as *const i16 as *const MetadataVersion;
unsafe { *p }
}
}

This transmute to and from values which aren't specified in the enum definition. repr(i16) etc does not help since rust will still let you pattern match exhaustively on the enum so it must still only have valid (specifed) values.

There are most likely more, those are just the first things I saw after grepping for unsafe

$ rg unsafe | wc -l
176

@nevi-me
Copy link
Contributor

nevi-me commented Jan 24, 2020

Hi @sunchao @liurenjie1024 @sadikovi, pulling you in the loop as you're more familiar with the parquet codebase

Copy link
Contributor

@sadikovi sadikovi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch! Could you state the motivation for the change? Was there an outage or did your library/project panic when using the API? Are you changing the code because you don't want to use unsafe?

I think the change needs a discussion, there are a few things that stood out:

  • having column reader/writer imports in data_type.rs - data types are used everywhere, I would prefer not having dependency on column reader/writer there if possible.
  • I would not want to panic, and instead update API to return an option so users can resolve None cases themselves.
  • I don't remember if we have unit tests for this method, would be good to add.

See the inline comments.

make_type!(
BoolType,
Type::BOOLEAN,
BoolColumnReader,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about this. We generate column readers and writers when making types?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It generates the unpacking functions needed to unpack the enum. I don't know of another, better way of doing this so 🤷‍♂

Copy link
Contributor

@sadikovi sadikovi Jan 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this is just an odd place to put them.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marwes@eb1ba90 is the only alternative I know which I would argue is a lot more verbose for no gain.

ColumnReader::ByteArrayColumnReader(r) => unsafe { mem::transmute(r) },
ColumnReader::FixedLenByteArrayColumnReader(r) => unsafe { mem::transmute(r) },
}
T::get_column_reader(col_reader).expect("Column type mismatch")
Copy link
Contributor

@sadikovi sadikovi Jan 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your fix does not change the outcome - we can still fail by passing the wrong type, except we would directly panic now. The original function implementation is actually "safe" despite using unsafe - there is a strict list of physical types that are allowed to be used.

Actually, I would probably return an Option<ColumnReaderImpl> instead. Same for writing.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your fix does not change the outcome - we can still fail by passing the wrong type, except we would directly panic now.

Sure, but that is much better than undefined behaviour.

The original function implementation is actually "safe" despite using unsafe - there is a strict list of physical types that are allowed to be used.

That is not enough to make it safe. To transmute a ColumnReader<A> into a ColumnReader<B> the representations of A and B must be identical. A and B are both empty structs in this case

pub struct $name {}
which may be enough to make Rust lay them out in the same way but I wouldn't rely on it without any confirmation from the reference.

In any case it can be implemented with safe code which also has the added benefit of actually panicking if you do the wrong thing instead of continuing silently.

Actually, I would probably return an Option instead. Same for writing.

That would force callers to unwrap instead which seems redundant. Propagating the error seems unnecessarily complicated as well as it should just be a programming error (should not be visible to users of the crate).

Copy link
Contributor

@sadikovi sadikovi Jan 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We never transmute ColumnReader<A> into ColumnReader<B>. We transmute ColumnReader_A into ColumnReaderImpl<A>.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but mistakes happen and now it will panic instead of causing undefined behaviour.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • there is no "we never do X" in a public interface.
  • the code should immediately panic if someone uses it in the wrong way. This should not be an Option because it is a programing error.
  • failing as undefined bahavior is worse than failing by panic. (why are even having this discussion in rust...)
  • changing the return type is a backward incompatible change... WTF?

@@ -23,6 +23,8 @@ use std::mem;
use byteorder::{BigEndian, ByteOrder};

use crate::basic::Type;
use crate::column::reader::{ColumnReader, ColumnReaderImpl};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could result in a circular dependency, we should not be doing that, or we should at least factor out the interfaces and structs there.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust handles circular, inter crate dependencies fine 🤷‍♂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It still does not mean we should be doing it.

@Marwes
Copy link
Author

Marwes commented Jan 25, 2020

Thanks for the patch! Could you state the motivation for the change?

Transmuting is wildly unsafe, it should only be done if absolutely necessary (it is not, see this safe implementation).

Was there an outage or did your library/project panic when using the API?

Not, but with this it has a chance to crash instead of causing undefined behavior which is vastly preferred.

Are you changing the code because you don't want to use unsafe?

Yes.

having column reader/writer imports in data_type.rs - data types are used everywhere, I would prefer not having dependency on column reader/writer there if possible.

It could be changed, I guess, to have these methods on their own trait. But I believe that will just make that new trait need need to be propagate to all/most of the placed that DataType is used (so T: DataType + ExtracteReaderWriter instead of plain T: DataType) 🤷‍♂ .

I would not want to panic, and instead update API to return an option so users can resolve None cases themselves.

It is always possible to explicitly match on the enum if one wants that but sure it could be changed, I just wanted to avoid breaking the API.

I don't remember if we have unit tests for this method, would be good to add.

The methods are generated by a macro and there is really just one way to implement the method so any test would just be identical to the implementation which seems redundant. The methods are used throughout the rest of the crate anyway so they are implicitly used.

@sadikovi
Copy link
Contributor

sadikovi commented Jan 25, 2020

I am not convinced, to be honest. I understand the motivation to get rid of transmute and agree that we should remove unsafe and transmute, but I don't agree with the proposed approach, I think there is a better way to handle this case. For example, having a method on ColumnReader enum and check the type compatibility on (type, reader) pair - you would know exactly what user is trying to do so we can assert on it and fail early, similar to what you did in this patch.

I would offer a new "safe" API that would return a Result/Option and convert the old one to unwrap the result and report a meaningful error message. I would also try finding (if possible) the case when the current method with transmute fails and come up with the solution that fixes it.

Again, this is just my opinion and suggestions. I would like to hear what you and other reviewers think.

ColumnWriter::ByteArrayColumnWriter(r) => unsafe { mem::transmute(r) },
ColumnWriter::FixedLenByteArrayColumnWriter(r) => unsafe { mem::transmute(r) },
}
T::get_column_writer(col_writer).expect("Column type mismatch")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you improve the error message, it is just too generic? Would be good to have something like Failed to convert column writer into a typed column writer for <data type> type or something similar. Thanks!

@Marwes
Copy link
Author

Marwes commented Jan 27, 2020

For example, having a method on ColumnReader enum and check the type compatibility on (type, reader) pair - you would know exactly what user is trying to do so we can assert on it and fail early, similar to what you did in this patch.

I might be missing something, but if I understand it that would still need unsafe. This approach avoids unsafe, without any overhead (the type check is practically identical).

From what I can gather, the DataType is practically sealed (only implemented by the crate, not by users) so having extra methods shouldn't cause issues for users anyway.

@jorgecarleitao
Copy link
Member

jorgecarleitao commented Feb 5, 2020

LGTM

The fact that @Marwes had to break module dependencies to make it safe hints that either there may be design problems in the code base, or there is some lack of clarity of what modules can depend on others or not. Regardless, tighter coupling is IMO less important than an unsafe public interface. So, yes, there is work to further reduce dependencies, but IMO it can be left to a future PR specifically for that.

I would also consider panics as better than undefined behavior, and, AFAI can tell, this patch does not break backward compatibility, which I am pleasantly surprised. This PR also reduces code by leveraging macros.

Finally, IMO Result is not the solution here since, as @Marwes points out, the execution only reaches that arm due to a programing error, which IMO is correctly handled through a panic. If at a later stage we agree that this should be an Result, then this is still fine: we can still open a new issue to make a backward-incompatible change. I just do not see why we should tie these things together on the same PR.


As a user, I decide if I use this lib or just use the c++/java/etc counterparts. The main trade-off between others and rust is the safety provided by the compiler in both memory management, undefined behavior.

Currently, public functions to read and write data, e.g. pub fn get_typed_column_writer, are documented as:

/// NOTE: the caller MUST guarantee that the actual enum value for `col_writer` matches
/// the type `T`. Otherwise, disastrous consequence could happen.

which IMO puts parquet-rs' interface on the same safety level as its c++ counterpart. Which is a way of saying:

hey, this library:

  • is using a less prevalent programming language
  • has less features than its counter-parts
  • has the same inherent risks to your code base as the other languages

so, as you see, there is no tradeoff here: just use c++/java/etc instead

Then, while going through this issue here, I read:

Could you state the motivation for the change?

Isn't it obvious? providing an unsafe public interface on a language whose main differentiator is being safe is a major reason to not rely on this library in the first place. Since this is the de-facto lib for handling parquet, this is a likely reason to not use rust or parquet in the first place.

Finally, please consider that how the feedback on this PR was delivered likely drove @Marwes away from further contributing to this project. Specifically,

  • their (good judgment call) to not make API-breaking changes was dismissed and even fought against
  • some comments were flat-out unhelpful, e.g. "Well, this is just an odd place to put them", "It still does not mean we should be doing it.".

regardless of their specific track-record in rust projects (which in this particular case is quite significant), we may want to re-evaluate how we approach PRs...

@sunchao
Copy link
Member

sunchao commented Feb 5, 2020

Sorry for chiming in late. Overall I like the PR and agree that we should reduce unsafe usage as much as possible. One minor comment is that (as @sadikovi above), whether we can find a way to separate the get_column_reader, get_column_writer APIs from the DataType trait (e.g., is it possible to define another trait which is not public and visible to outside this crate?).

Also @Marwes , could you create a JIRA for this? we can't merge the PR without a JIRA.

}
T::get_column_reader(col_reader).unwrap_or_else(|| {
panic!(
"Failed to convert column reader into a typed column reader for `{}` typ",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replace typ with type? same for others.

@Marwes
Copy link
Author

Marwes commented Feb 5, 2020

e.g., is it possible to define another trait which is not public and visible to outside this crate

Marwes@eb1ba90 There could be another trait, but it still has to be public since the functions that call these functions/methods are public.

From how DataType is used however, it looks like it should only be implemented in the crate itself, so it could be sealed and any methods on it could be hidden.

@Marwes Marwes changed the title [Rust] fix: Don't let safe code arbitrarily transmute readers and writers ARROW-7775: [Rust] fix: Don't let safe code arbitrarily transmute readers and writers Feb 5, 2020
@sunchao
Copy link
Member

sunchao commented Feb 5, 2020

Can we define a private module in the data_type.rs and put the public traits for reader/writer inside it? then we can have the DataType inherit these traits. I think this may hide the APIs from external crates. Let me know what you think.

@Marwes
Copy link
Author

Marwes commented Feb 6, 2020

That would work, but docs would still show

trait DataType : ExtractReaderWriter { }

If the methods should be hidden from users then #[doc(hidden)] on the methods would do trick more efficticely

@Marwes Marwes closed this Feb 6, 2020
@Marwes Marwes reopened this Feb 6, 2020
@sadikovi
Copy link
Contributor

sadikovi commented Feb 6, 2020

@jorgecarleitao I would argue the problem comes from the language limitation of not having proper inheritance rather than the project design limitation, otherwise, we could have used generics and cast. However, we do have a few things that could have been done differently and much better. Correct me if I am wrong, but the approach in PR still throws an error at runtime, there are no compile time checks.

My main point is providing a safe API vs not having unsafe, you can have safe API while still keeping unsafe (there are examples in std lib implementation as well). I thought this could be done simpler just by checking the physical type if they match -> transmute, otherwise throw an error or panic. If the author decided to go with remove unsafe approach then that is great too. Either way, it would be good to explain what approaches were considered and why we need to solve it this way in the PR description.

By the way, I still think it would be a good idea to add a test case or two.

@sunchao
Copy link
Member

sunchao commented Feb 7, 2020

I think @sadikovi 's approach should also work here. Something like:

impl ColumnReader {
    fn get_type(&self) -> Type {
        match self {
            ColumnReader::BoolColumnReader(_) => Type::BOOLEAN,
            ...
            _ => unreachable!("Impossible"),
        }
    }
}
...
pub fn get_typed_column_reader<T: DataType>(
    col_reader: ColumnReader,
) -> ColumnReaderImpl<T> {
    if col_reader.get_type() != T::get_physical_type() {
        panic!("type doesn't match!")
    }
   ...
}

while it doesn't eliminate the unsafe usage, the gun is in developers' hand now and only they can shoot their own foot. The API is safe in this regard I guess?

With that said, I'm fine with either approach and I do like to see this PR being merged. :)

Copy link
Contributor

@sadikovi sadikovi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline. LGTM.

@sunchao
Copy link
Member

sunchao commented Feb 9, 2020

@Marwes the build has failed. Could you rebase so I can merge this?

Markus Westerlind added 4 commits February 10, 2020 09:32
This code were quite clearly unsafe (which seems to be known, judging
from the comment about it). As the functions appeared to callable
indirectly from other public functions I changed these to just panic
instead.
@sunchao sunchao changed the title ARROW-7775: [Rust] fix: Don't let safe code arbitrarily transmute readers and writers ARROW-7775: Don't let safe code arbitrarily transmute readers and writers Feb 15, 2020
@sunchao sunchao closed this in f41b863 Feb 15, 2020
@sunchao
Copy link
Member

sunchao commented Feb 15, 2020

Merged. Thanks @Marwes .

BTW: I still think it is a good idea to hide these reader/writer getters from public though. Perhaps we can fix that in a follow up PR.

@Marwes Marwes deleted the unsafe_rust branch February 16, 2020 00:32
nealrichardson pushed a commit that referenced this pull request Apr 14, 2020
This removes or corrects many instances of unsafe code in the rust crates. This is by no means a complete fix and some of the fixes do not entirely fix the issues with the particular `unsafe` (see comments), but in all instances it should put the code in a better place than before.

Based on #6256

Closes #6395 from Marwes/unsafe_rust_2

Authored-by: Markus Westerlind <markus.westerlind@distilnetworks.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants