Upgrade GlacierScript to python3 #61

Merged
merged 23 commits into from
Jan 14, 2020

bitcoinhodler
Collaborator

@bitcoinhodler bitcoinhodler commented Feb 14, 2019

GlacierScript was unnecessarily using an old version of Python (2.7.12), while Python 3.5.2 is available as python3.

I successfully ran all the developer tests on my own live boot of Ubuntu 16.04.1 (same as the protocol specifies).

Let's bring GlacierScript into modern times with Python 3.

As usual, I recommend that reviewers go one commit at a time, as the commit messages explain both what & why for each change.

This reverts commit 971523d.

It's not clear why the original developers switched to this older
version, but it's incompatible with python3. I'm switching to this
modern version of base58.py and plan to update it further next.

It appears this base58.py comes from:
https://github.com/keis/base58/releases/tag/v0.2.3

Otherwise any change to the Python interpreter can change the order of
the outputs in the transaction, causing the developer tests to fail.

This changes some of the generated transactions, reversing the order
of the two outputs.

Otherwise create-withdrawal-data.fails.test would randomly change the
order of the arguments passed to `createrawtransaction` (which are
part of the error message in the *.golden file).

(I simply added parentheses around the arguments.)

I started by running `2to3 -w glacierscript.py` which took care of
many of the details. Then I had to fix by hand several places where
byte strings and Unicode strings were mixed.

pipes.quote() has been deprecated since python 2.7
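Python 3.3 and later expose the same function publicly as `shlex.quote()`. The sketch below shows it quoting an argument list for display; `format_command` is a hypothetical helper for illustration, not necessarily how glacierscript.py uses the function.

```python
import shlex


def format_command(cmd_list):
    """Join an argument list into one string that a POSIX shell would split
    back into the same arguments. Hypothetical helper for illustration."""
    # shlex.quote() leaves arguments made of "safe" characters untouched and
    # wraps anything containing spaces, quotes, braces, etc. in single quotes.
    return " ".join(shlex.quote(arg) for arg in cmd_list)
```

Arguments made only of safe characters pass through unchanged, and `shlex.split()` on the result gives back the original list.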
PEP 8 is the Python style guide.

Note: base58 is in a separate paragraph since it's not a standard
library module like the others are.
@bitcoinhodler
Collaborator Author

For some reason, probably having to do with all the rebases I did on this branch before creating this PR, GitHub is listing the commits out of order. The correct order is:

1a806ba Revert "Swapping base 58 implementation for Gavin Andresen’s"
62def07 Update base58.py to latest released version
1181e75 Update link to base58.py
ea3f95c Add copyright notice to base58.py
60e189b Use OrderedDict for addresses dict, for stability
7904667 Use OrderedDict for stability
ac421ec Use modern, python3-ish print statements
dff8ba6 Switch to python3
6a01c4f Update for python3
5b5d821 Use modern python3 module
bb0a435 Alphabetize module imports, as recommended by PEP 8
d335153 Remove unneeded module import

base58.py
alphabet = b'123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'


if bytes == str: # python2

can we use the sys package instead to check the python version? https://stackoverflow.com/questions/9079036/how-do-i-detect-the-python-version-at-runtime

edit: if this is copied from elsewhere we should have a clear link to it, preferably in the file itself.
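For comparison, here are both checks side by side as a minimal sketch. `byte_values` is a hypothetical helper (not from base58.py) showing the kind of bytes/str difference such a check exists to paper over.

```python
import sys

# The check used in base58.py: on Python 2, `bytes` is merely an alias of `str`.
PY2_BY_TYPES = bytes == str

# The reviewer's suggestion: ask the interpreter for its major version directly.
PY2_BY_VERSION = sys.version_info[0] == 2


def byte_values(data):
    """Return the integer value of each byte in `data` on either major version.

    Hypothetical helper for illustration; not taken from base58.py.
    """
    if PY2_BY_VERSION:
        # Python 2: iterating a str yields 1-character strings.
        return [ord(c) for c in data]
    # Python 3: iterating bytes already yields ints.
    return list(data)
```

Both flags agree on a standard interpreter. `sys.version_info` states the intent more plainly, while the type check tests the exact behavior the code depends on.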

Collaborator Author

The source of this code is explained in the commit message for 1a806ba, and 62def07, and in commit 1181e75 the link was added to the glacierscript.py source.


# Copyright (c) 2015 David Keijser

Is this now a combination of the old script and https://github.com/trezor/trezor-core/blob/master/src/trezor/crypto/base58.py ?

0 BTC going back to cold storage address 2NGPJX8kzdRpAQJuZWMpCBth1umjQFHeFcz
0.19996310 BTC going to destination address mxBQD1QAYpwiudaCJdRhE9QSW9cokafJ99

I don't understand why this should change with a python version bump?

Collaborator Author

This is the same issue from 60e189b, the ordering of the outputs changed for some of the withdrawals. It was effectively random before; now it is consistent and predictable.

"txid": "cc0871827f83c927d79ca8d50c52a72fcaf4223f1de7f92931bfd70f7d44e3b3",
"hash": "cc0871827f83c927d79ca8d50c52a72fcaf4223f1de7f92931bfd70f7d44e3b3",
"txid": "31906179a4f3d983f62cecd9e8fcd4bc3df4969e3d5ec85547b3fecbdb53c51c",
"hash": "31906179a4f3d983f62cecd9e8fcd4bc3df4969e3d5ec85547b3fecbdb53c51c",

why did all of these fixtures change? Should they not be compatible with the script after the version bump?

Collaborator Author

This is explained in the commit message for 60e189b. The order of the transaction outputs was dependent on the ordering of a python dict, which is not specified by the language, and effectively random. It may change when the python interpreter version changes. To alleviate this, I switched to using an OrderedDict, and now the ordering of the outputs is consistent.

@gracenoah gracenoah Feb 23, 2019

got it. How did we not have tests fail from the unordered python dict before? My understanding is that python 2.7 has the same behavior, please let me know if I'm missing something

Collaborator Author

More than you ever wanted to know about Python dict ordering

In 2.7 it was stable but unspecified. Every run would be the same. The Python3 version we have (3.5.2) uses the random hash seed explained in the SO answer linked above. Every time we run the script, the ordering is randomized, which makes reproducible output impossible. We need a stable ordering both for the developer tests and so that the two quarantined laptops will match each other. Hence OrderedDict.

In Python 3.7 the dict ordering is specified, and we wouldn't need OrderedDict anymore.
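A minimal sketch of why OrderedDict fixes this, assuming the destinations mapping is serialized with `json.dumps()` before being handed to `createrawtransaction` (as `create_unsigned_transaction` does in the traceback later in this thread). `outputs_json` and the address labels are hypothetical.

```python
import json
from collections import OrderedDict


def outputs_json(pairs):
    """Serialize (address, amount) pairs as a JSON object whose key order
    always matches the order the pairs were supplied in.
    Hypothetical helper; the address labels in the tests are made up."""
    destinations = OrderedDict()
    for address, amount in pairs:
        destinations[address] = amount
    # json.dumps() walks the mapping in iteration order, and an OrderedDict
    # iterates in insertion order on every Python version and every run,
    # regardless of the hash seed.
    return json.dumps(destinations)
```

With a plain dict on Python 3.5, the same input could serialize in a different order from one run to the next, which is what made the golden files unstable.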

@bitcoinhodler
Collaborator Author

Thank you for the review @gracenoah. While I believe all your questions were already answered in the individual commit messages, I've also addressed your comments above.

@gracenoah gracenoah left a comment

LGTM, thanks for the explanation

@jazarija
Contributor

jazarija commented Aug 2, 2019

Following are some comments about the pull request

commit: 62def07

  • (l. 20-31) If our aim is to switch to python3, then why keep the if-then-else check for python2?

Generally speaking I am a bit skeptical of changing the base58 encoding file to this new one. Are we sure the sourced file is coming from a reliable place? Note also how the source file has a cluttered main function that accepts command line arguments etc... Looks like we need to discuss this more.

commit dff8ba6

  • l407 Why is the call to list() required here?

commit bb0a435

Do we have a convention for when to use "from X import Y" versus "import X", and then referencing either Y or X.Y? It seems like we are mixing two styles with no consistent basis.

@bitcoinhodler
Collaborator Author

bitcoinhodler commented Aug 4, 2019

You can make comments directly on the source code lines during a review, like @gracenoah did above. That would make it easier for me to reply to each individually.

Re 62def07, the old base58.py did not support python3, so I updated it to a later version from a "reliable" open-source repo. I didn't want to make any edits to that file, only copy it verbatim. It has some unneeded stuff, yes, but I figured it was better to use it as-is than to make edits which would require further scrutiny.

Re dff8ba6, the python 2to3 conversion program made this change, and it's important since the code modifies the dict inside the loop. If I remove it, python complains:

RuntimeError: OrderedDict mutated during iteration
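On Python 2, `keys()` returned a fresh list, so deleting entries inside the loop was safe; on Python 3 it returns a live view of the dict, hence the `list()` that 2to3 added. A minimal sketch with hypothetical helper names, assuming zero amounts are the entries being dropped:

```python
from collections import OrderedDict


def drop_zero_outputs(destinations):
    """Delete every entry whose amount is zero, in place (hypothetical helper)."""
    # list(...) takes a snapshot of the keys first, so deleting entries
    # below does not mutate the object the loop is iterating over.
    for address in list(destinations.keys()):
        if destinations[address] == 0:
            del destinations[address]
    return destinations


def drop_zero_outputs_without_list(destinations):
    """The same loop without the snapshot. On Python 3 this raises
    RuntimeError once an entry is deleted and iteration moves on."""
    for address in destinations.keys():
        if destinations[address] == 0:
            del destinations[address]
    return destinations
```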

Re bb0a435: I'm not aware of any style guidelines for imports like this (though I haven't gone looking; pylint et al. have no checks for this either). Nor do I think any consistency is needed, since it's clear what's being imported in each case. (But from foo import * should be discouraged.) I tend to import the entire module if I'm going to use several things from that module, or import just one thing if that's all I need.

Thank you for the detailed review.

@jazarija
Contributor

jazarija commented Aug 5, 2019

I see. Regarding the list remark, a more direct way would perhaps be to change the for loop and if statement to

destinations = OrderedDict({key:val for key in destinations if destinations[key] != 0})

As far as the reliability of the base58 code, how was that determined? I am a bit skeptical to just include code from another repo unless we have an objective measure of its reliability.

Otherwise, I'd much rather see that we take the code, refactor it, include it as our internal library, remove whatever is not needed and actually properly review what was created.

@bitcoinhodler
Collaborator Author

destinations = OrderedDict({key:val for key in destinations if destinations[key] != 0})

I tried the following:

 destinations = OrderedDict({key:val for key, val in destinations.items() if val != '0'})

The problem is, this creates a new dict, then passes that dict into OrderedDict. In python 3.5.2, dict ordering is random, which creates problems. (See discussion above.) That's why I switched to OrderedDict in the first place.

For base58.py, the repo I sourced this from is the same repo the original Glacier developers sourced their original base58.py file from. Later it was changed to a different, older version; unfortunately the git commit history does not record any rationale for this change. The repo is a widely used python base58 implementation. What kind of objective measure of reliability is there besides review?

But I take your point, so I pushed some changes to a new branch in my fork. Do you think I should include those changes here? The file is greatly simplified now.
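To give a sense of how small a Python 3-only encoder can be, here is a minimal sketch of base58 encoding (encode only, no checksum). It illustrates the standard algorithm; it is not the code in the simplified branch mentioned above and should not stand in for a reviewed implementation.

```python
# Bitcoin's base58 alphabet (no 0, O, I or l), the same one used in base58.py.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    """Encode `data` as base58. Python 3 only."""
    # Each leading zero byte is represented by a leading '1'.
    n_pad = len(data) - len(data.lstrip(b"\x00"))
    # Treat the bytes as one big-endian integer and repeatedly divide by 58,
    # collecting remainders as digits (least significant first).
    num = int.from_bytes(data, "big")
    digits = []
    while num > 0:
        num, rem = divmod(num, 58)
        digits.append(ALPHABET[rem])
    return "1" * n_pad + "".join(reversed(digits))
```

Because Python 3 distinguishes `bytes` from `str`, no version check or iteration shims are needed.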

@jazarija
Contributor

jazarija commented Aug 6, 2019

Gotcha, missed the ordering part. In that case I think you can do

destinations = OrderedDict( (key, val) for key, val in destinations.items() if val != '0')

to achieve the required ordering.
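The difference between the two suggestions is what OrderedDict receives: a dict comprehension builds a plain dict first (whose order can vary from run to run on Python 3.5), while a generator of pairs feeds the entries through in their original order. A minimal sketch, with a hypothetical function name and string amounts as in the `val != '0'` test above:

```python
from collections import OrderedDict


def nonzero_destinations(destinations):
    """Return a new OrderedDict without zero-amount entries, keeping order.

    Hypothetical helper; amounts are strings, mirroring `val != '0'`.
    """
    # A generator of (key, value) pairs hands each pair to OrderedDict in
    # the source order; no intermediate plain dict is ever built.
    return OrderedDict(
        (key, val) for key, val in destinations.items() if val != "0"
    )
```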

If by including them here you mean in this PR, then I'd say yes. Personally I like this structuring much better this way. We can then take a more in depth look into the code and see if all fits.

@bitcoinhodler
Collaborator Author

bitcoinhodler commented Sep 22, 2019

Can you paste the last line of t/create-withdrawal-data.fails.out? That seems to be the line that is mismatching and it is expected to look like so:

subprocess.CalledProcessError: Command '['bitcoin-cli', '-testnet', '-rpcport=18340', '-datadir=bitcoin-test-data', 'createrawtransaction', '[{"txid": "e0e9bb25fb873c4caccdc8ab743c4350310031f2cc077bb90c3f495458860157", "vout": 1}]', '{"2N93du8YobdgsHyu3qgBvSyhGUT52utMNeA": 0, "myP4xdJNwAW9iMakvCjnozg814ewgn8apx": 0}']' returned non-zero exit status 5

(Note: any rpcport should match the regexp in the last line of t/create-withdrawal-data.fails.golden.re.)

@bitcoinhodler
Collaborator Author

Also I noticed that in case of a failure like this one, the reported line number was incorrect (off by one), so I just pushed a change to fix that.

@jlopp
Member

jlopp commented Sep 23, 2019

Enter fee rate.
Satoshis per vbyte: Traceback (most recent call last):
  File "../../glacierscript.py", line 864, in <module>
    withdraw_interactive()
  File "../../glacierscript.py", line 747, in withdraw_interactive
    source_address, keys, addresses, redeem_script, txs)
  File "../../glacierscript.py", line 479, in get_fee_interactive
    source_address, destinations, redeem_script, input_txs)
  File "../../glacierscript.py", line 415, in create_unsigned_transaction
    json.dumps(destinations)).strip()
  File "../../glacierscript.py", line 129, in bitcoin_cli_checkoutput
    if retcode != 0: raise subprocess.CalledProcessError(retcode, cmd_list, output=output)
subprocess.CalledProcessError: Command '['bitcoin-cli', '-testnet', '-rpcport=18340', '-datadir=bitcoin-test-data', 'createrawtransaction', '[{"txid": "e0e9bb25fb873c4caccdc8ab743c4350310031f2cc077bb90c3f495458860157", "vout": 1}]', '{"2N93du8YobdgsHyu3qgBvSyhGUT52utMNeA": 0, "myP4xdJNwAW9iMakvCjnozg814ewgn8apx": 0}']' returned non-zero exit status 5.

Somewhere between python 3.5.3 and python 3.7.4, they added a period
to the end of this error message.

Using a regexp like I did here will allow this test to pass on either
version.
@bitcoinhodler
Collaborator Author

Looks like the difference is in the very last byte, a period at the end of the error message from python. The quarantined laptops with Ubuntu 16.04.1 have python 3.5.2 so I've been testing with 3.5.3 which I already had installed locally. I tried 3.7.4 and I saw the same failure.

I pushed a change to optionally allow the period at the end of the line and now the tests pass for me with both 3.5.3 and 3.7.4.
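A minimal sketch of that approach. The pattern is hypothetical; the real one in t/create-withdrawal-data.fails.golden.re may be written differently. It accepts any rpcport with `\d+` and treats the trailing period as optional with `\.?`.

```python
import re

# Hypothetical pattern modeled on the expected last line of the .out file.
# \d+ accepts any rpcport; \.? makes the trailing period optional.
LAST_LINE_RE = re.compile(
    r"^subprocess\.CalledProcessError: Command '\['bitcoin-cli', '-testnet', "
    r"'-rpcport=\d+', .*\]' returned non-zero exit status 5\.?$"
)


def last_line_matches(line):
    """True if `line` matches without the period (as on Python 3.5.3)
    or with it (as on Python 3.7.4)."""
    return LAST_LINE_RE.match(line.rstrip("\n")) is not None
```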

@bitcoinhodler
Copy link
Collaborator Author

Also, I have a change in another branch that improves this cryptic error message, but it's probably not appropriate to include in this PR which is only about the python version upgrade.

Member

@jlopp jlopp left a comment

tACK

@jlopp
Member

jlopp commented Oct 2, 2019

@bitcoinhodler please resolve the conflict by removing the import of random library

I'll also note that I tried running tests with that conflict resolved and got a failure:

cd testrun/create-deposit-data.test && ../../t/create-deposit-data.run 18333  2>&1 > ../../t/create-deposit-data.out
Traceback (most recent call last):
  File "../../glacierscript.py", line 858, in <module>
    deposit_interactive(args.m, args.n, args.dice, args.rng)
  File "../../glacierscript.py", line 649, in deposit_interactive
    addresses = [get_address_for_wif_privkey(key) for key in keys]
  File "../../glacierscript.py", line 649, in <listcomp>
    addresses = [get_address_for_wif_privkey(key) for key in keys]
  File "../../glacierscript.py", line 335, in get_address_for_wif_privkey
    label = hash_sha256(privkey)
  File "../../glacierscript.py", line 54, in hash_sha256
    m.update(s)
TypeError: Unicode-objects must be encoded before hashing
make: *** [Makefile:75: t/create-deposit-data.test] Error 1

Member

@jlopp jlopp left a comment

Needs conflicts fixed and retested

@bitcoinhodler
Collaborator Author

What's the policy on rebasing? I could rebase my entire change onto master, and keep a clean history, which seems to be generally preferred among Git intelligentsia. But that will make review more difficult since (to be safe) you'll have to go over my entire change again.

Alternatively I could merge and patch up the two minor issues as part of the merge commit.

@bitcoinhodler
Collaborator Author

I went with the latter policy and merged in the latest master into this branch.

Alternatively, I created a python3-rebased branch that I could force-push over this one if we wanted to be cleaner about the history.

I realized we were always calling encode('ascii') on every parameter
passed into these functions. So why not do that inside the function?
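A minimal sketch of that refactoring, based on the TypeError in the traceback above: `hashlib` rejects `str`, so the function encodes it before calling `update()`. Returning `hexdigest()` and also accepting `bytes` are assumptions, not necessarily what glacierscript.py does.

```python
import hashlib


def hash_sha256(s):
    """Return the hex SHA-256 digest of `s`.

    Callers may pass str; the function does the encode('ascii') itself.
    The hex return value and bytes pass-through are illustrative assumptions.
    """
    if isinstance(s, str):
        s = s.encode("ascii")  # hashlib only accepts bytes-like objects
    m = hashlib.sha256()
    m.update(s)
    return m.hexdigest()
```

Encoding in one place means no caller can forget it, which is the point of the commit.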
@jlopp jlopp self-requested a review October 5, 2019 19:50
@privacybuilder

What is needed to move this PR along?

@jlopp
Member

jlopp commented Dec 18, 2019

One more review would be nice if @jacoblyles @jazarija @gracenoah are available.

Contributor

@jazarija jazarija left a comment

Do we need to use encode here, given that hash_sha256 already does that?

@bitcoinhodler
Collaborator Author

Do we need to use encode here, given that hash_sha256 already does that?

I'm not sure how you left these comments on github, but it's not showing me what lines/changes you're referring to. My latest change on this branch (b75c400) took out a couple of unnecessary encodes. Does that address your concern?

@bitcoinhodler
Collaborator Author

One more review would be nice if @jacoblyles @jazarija @gracenoah are available.

And while you're all here, how about a review on #76 and #73 that build atop this one.

I opened those before this one was merged so that people could review all of them at once, in an attempt to accelerate the pace of reviews.

I've got 6 more PRs ready for review after that, and then we get to PSBT, which I have working already.

@privacybuilder

@bitcoinhodler - if you are looking for testers for the new combined PRs? Do you have them all merged in a single branch? Is there an update to the PDF that goes with those? How can I help? :D

@bitcoinhodler
Collaborator Author

@bitcoinhodler - if you are looking for testers for the new combined PRs? Do you have them all merged in a single branch? Is there an update to the PDF that goes with those? How can I help? :D

I'm looking for reviewers, mainly. Read through my commits and make sure I haven't broken anything, or introduced any new weaknesses or backdoors (intentional or not).

Each of my three open PRs builds upon the previous (though they are out of order). Review this one first, then #76, then #73. The branch for #73 includes all the commits from all three PRs.

For these three PRs, and the next six I have ready to submit, no PDF updates are needed. It's not until we get to PSBT that the process changes at all, and that will no doubt be documented separately (as an optional appendix, perhaps) until it's gone through a public review process.

@jacoblyles
Member

jacoblyles commented Jan 14, 2020

I ran the tests and reviewed the changes and comments. Thanks @bitcoinhodler for doing this and responding to all the inquiries on the thread. LGTM

@jlopp jlopp merged commit b75c400 into GlacierProtocol:master Jan 14, 2020
@bitcoinhodler bitcoinhodler deleted the python3 branch January 15, 2020 00:35
7 participants