Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(err): remove unwrap from util.rs #260

Merged
merged 1 commit into from
Dec 12, 2022
Merged

Conversation

dan-da
Copy link

@dan-da dan-da commented Dec 6, 2022

addresses #247. (more to come)

This PR focuses on kindelia_core/src/util.rs.

util.rs

  • u8s_to_u128s() now returns a Result instead of unwrap()
  • get_time() and get_time_micro now returns UNIX_EPOCH (0u128) if system time is somehow before UNIX_EPOCH. rather than panic.

runtime/mod.rs

  • now calls unwrap on result of u8s_to_u128s(). This is temporary until error handling is refactored in runtime.

note: regarding the change to get_time() and get_time_micro(). This change means that if a node's SystemTime is before Jan 1, 1970 00:00:00, then it will be interpreted as 1970-01-01 by a caller using the returned u128. I don't believe this should be a serious (security) issue because the p2p protocol should/must be verifying that blocks and messages between nodes are recent/current (or operating without time at all, eg using causal ordering, lamport timestamps, etc).

It would certainly be preferable to return a Result for these fn, but will require a lot of changes to node.rs and related code. Another option is to keep the panic() behavior in get_time() until the necessary node refactor can occur.

@dan-da
Copy link
Author

dan-da commented Dec 6, 2022

Another approach would be to use the chrono crate, eg:

chrono::offset::Utc::now().timestamp_micros()

There is no need to unwrap here. However chrono timestamps use i64. This makes sense because time 0 is at the unix epoch and anything before is a negative number.

So now we get to the real heart of the matter: kindelia is representing timestamps as u128. So times before 1970 cannot be simply represented -- unless we do some tricky bitshifts or something.

Options would appear to be:

  • use chronos and move to a signed integer (i64 or i128)
  • round any negative values to 0 (as per this draft PR)
  • propagate errors up the call stack and log as necessary. (a bit messy)
  • keep existing unwrap() panics

My preference would be to move to a signed integer as it seems cleanest for representing time losslessly. I believe this would be a change to the block structure and p2p protocol, so it's not something I will do without consensus.

note: i128 and u128 seem like overkill, given this comment for timestamp_micros() which returns an i64.

Note that this does reduce the number of years that can be represented from ~584 Billion to ~584 Thousand. (If this is a problem, please file an issue to let me know what domain needs microsecond precision over millennia, I’m curious.)

So we could save block/network space with i64.

@dan-da dan-da marked this pull request as ready for review December 7, 2022 17:08
@racs4 racs4 self-requested a review December 12, 2022 16:54
@racs4
Copy link
Contributor

racs4 commented Dec 12, 2022

It would certainly be preferable to return a Result for these fn (get_time and get_time_micro)

In this specific case I think differently than you. I think a panic would be best for the get_time functions. These functions are used to return the current time, a date before 1970 should never be considered, and returning a Result for that would just take the trouble of always having to check elsewhere, to return this same error.

The Rust book says a panic should be returned when:

The bad state is something that is unexpected, as opposed to something that will likely happen occasionally.

Your code after this point needs to rely on not being in this bad state, rather than checking for the problem at every step.

I think this case intersects these two topics.

@racs4
Copy link
Contributor

racs4 commented Dec 12, 2022

i128 and u128 seem like overkill, given this comment for timestamp_micros() which returns an i64.

Yes, this is an internal discussion. I don't think signed numbers should be used, for the reason explained in the message above, that we don't expect to have dates before 1970 (and not even dates before the first day of the net launch), but I agree that 128 seems like overkill.

I think you could open a discussion in an issue on this topic, with these two time problems, regarding typing (size and sign) and what to do with unexpected dates (which I suggested to keep the behavior of returning panic in this PR).

@dan-da
Copy link
Author

dan-da commented Dec 12, 2022

Our difference in philosophy perhaps come from my experience working on a team with a zero unwrap policy. It is stricter yes, but when everyone adheres to it for all situations, imho the overall code quality becomes higher, because all errors either get handled appropriately or propagated up to a place where they can be logged in a standard way and/or cause the program to exit in a controlled way.

Calling unwrap() as the current code does causes the program to exit suddenly with no chance to log anything and has potential to leave data in an inconsistent state. Perhaps a user temporarily sets their system clock to 1949 for a minute and a week later is surprised to find their k-chan instance hasn't been running and with no clear indicator why. (btw, this also brings up the matter of proper logging/tracing, which I hope we can adopt instead of eprintln).

Here are some articles that discuss reasons not to panic. (one two three)

edit: it can be argued that a strict "no unwrap" policy correctly forces us here to consider that our represention of time as u128 is "wrong" as it cannot represent all possible states on a user's machne. So a correct fix then is to change the data representation.

That said, these are debatable philosophical differences and I am certainly willing to defer to the team's wishes here.

If I understand correctly, you prefer a panic to the change in this PR of rounding times before epoch to 0? (at least until such time as i64/u64/u128 is resolved)

Perhaps a simple way forward would be to keep the two existing get_time() functions but mark them deprecated, and add try_get_time() that returns Result. New code can use those and existing callers can be retrofitted if/when possible.

@racs4
Copy link
Contributor

racs4 commented Dec 12, 2022

I understand what you're talking about. This is my first year working with Rust, so I don't have that experience from past work with this subject like you do. As you say this would be "debatable philosophical differences", which would be better discussed in an issue about it.

Perhaps a simple way forward would be to keep the two existing get_time() functions but mark them deprecated, and add try_get_time() that returns Result. New code can use those and existing callers can be retrofitted if/when possible.

I believe this is the best thing for now so that I can approve and merge this PR.

@dan-da
Copy link
Author

dan-da commented Dec 12, 2022

No worries. Within the wider rust community, I believe it is unusual to find crates that never unwrap/panic, so I definitely hold a minority view. That said, I consider kindelia to be similar to writing aircraft control software... it needs to be a step above the rest in terms of reliability because failures can have very serious impacts once people have their finances/businesses riding on it. So that's where I'm coming from.

anyway, I will make the suggested change.

util.rs
* u8s_to_u128s() now returns a Result instead of unwrap()
* deprecates (by comment)  get_time() and get_time_micro() which
  can panic.
* change unwrap to expect in get_time() and get_time_micro()
* change visibility to pub(crate) for get_time and get_time_micro
* add try_get_time() and try_get_time_micro() that return Result
* add EpochError to describe error when system time precedes epoch

runtime/mod.rs
* now calls unwrap on result of u8s_to_u128s().  This is temporary
until error handling is refactored in runtime.
@dan-da dan-da marked this pull request as draft December 12, 2022 21:43
@dan-da dan-da marked this pull request as ready for review December 12, 2022 21:43
@racs4 racs4 merged commit 775d153 into Kindelia:dev Dec 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants