Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

DevDocs corrections part 1 #589

Merged
merged 9 commits into from Oct 3, 2014

Conversation

Projects
None yet
3 participants
Contributor

luke-jr commented Oct 1, 2014

Started reviewing the developer docs, finding lots of things to correct. I'll do more later, but please review this batch first so I know what, if anything, needs to be fixed moving forward - I'm not entirely sure on the markdownish format used.

@harding harding and 1 other commented on an outdated diff Oct 1, 2014

_includes/guide_block_chain.md
stores the hash of the previous block's header, chaining the blocks
together. This ensures a transaction cannot be modified without
modifying the block that records it and all following blocks.
Transactions are also chained together. Bitcoin wallet software gives
-the impression that satoshis are sent from and to addresses, but
-bitcoins really move from transaction to transaction. Each standard
-transaction spends the satoshis previously spent in one or more earlier
+the impression that bitcoins are sent from and to wallets, but
+bitcoins really move from transaction to transaction. Each
+transaction spends the bitcoins previously received in one or more earlier
@harding

harding Oct 1, 2014

Contributor

We settled on using "satoshi" as the term of generic value. See the style guide and the issue where we discussed this

@luke-jr

luke-jr Oct 1, 2014

Contributor

That sounds fair for cases where units are relevant, but that isn't the case here... In this context, it is just unnecessarily confusing to specify any unit.

@harding

harding Oct 1, 2014

Contributor

Sure, but a bitcoin is also a specific unit of value. In the discussion linked to above we tried to find a term for generic value which had no unwanted connotations, but we failed. In the end, we decided that we'd either have to use "bitcoin" or "satoshi", and so we (basically) flipped a coin and "satoshi" won.

I'd prefer to stick with the previous decision rather than update all the text and illustrations.

@luke-jr

luke-jr Oct 1, 2014

Contributor

Ok.

@saivann saivann commented on the diff Oct 1, 2014

_includes/guide_block_chain.md
@@ -20,56 +20,56 @@ A [block][]{:#term-block}{:.term} of one or more new transactions
is collected into the transaction data part of a block.
Copies of each transaction are hashed, and the hashes are then paired,
hashed, paired again, and hashed again until a single hash remains, the
-[Merkle root][]{:#term-merkle-root}{:.term} of a Merkle tree.
+[merkle root][]{:#term-merkle-root}{:.term} of a merkle tree.
@saivann

saivann Oct 1, 2014

Contributor

Actually AFAIK, Merkle tree should be capitalized (the concept is named after Ralph Merkle).

@luke-jr

luke-jr Oct 1, 2014

Contributor

Common nouns are lowercase in English, unless the first word of a sentence. Many things are named after people, but those common nouns don't get exceptions from this rule.

@saivann

saivann Oct 1, 2014

Contributor

That's good with me then, although I didn't check if all occurences were replaced.

@harding harding and 1 other commented on an outdated diff Oct 1, 2014

_includes/guide_block_chain.md
-Outputs are not the same as Bitcoin addresses. You can use the same
-address in multiple transactions, but you can only use each output once.
+Outputs are not the same as Bitcoin addresses.
+Once bitcoins are sent, they remain associated with the output until later spent, while the association with a given address ends the moment it is received.
@harding

harding Oct 1, 2014

Contributor

These two sentences were written before we had a Transactions section, and I think we explain everything better there, so I suggest we simply delete lines 45 and 46.

@saivann

saivann Oct 1, 2014

Contributor

I tend to agree, this one sounds a bit confusing to me.

@harding harding and 1 other commented on an outdated diff Oct 1, 2014

_includes/guide_block_chain.md
transaction's inputs and outputs is given as a [transaction fee][]{:#term-transaction-fee}{:.term} to
the Bitcoin [miner][]{:#term-miner}{:.term} who creates the block containing that transaction.
-For example, in the illustration above, each transaction spends 10,000 satoshis
-fewer than it receives from its combined inputs, effectively paying a 10,000
-satoshi transaction fee.
+For example, in the illustration above, each transaction spends a value of 10,000 (0.0001 BTC)
@harding

harding Oct 1, 2014

Contributor

What is the reason for adding "a value of" here? The original wording seems more direct and both wordings seem equally unambiguous. (Note: I'd like to replace that big ugly illustration at some point, so there will be opportunity for rephrasing later.)

@luke-jr

luke-jr Oct 1, 2014

Contributor

Avoiding "satoshis". Considering the style guide you pointed to, it does make sense to go back to "satoshis" in this one context.

@harding harding and 2 others commented on an outdated diff Oct 1, 2014

_includes/guide_block_chain.md
{% endautocrossref %}
-### Proof Of Work
+### Proof of Work
@harding

harding Oct 1, 2014

Contributor

Although it's not currently mentioned in the style guide (I'll update it), we use Microsoft Technical Style for headings and subheads, in which the first letter of each word is capitalized unless it's a proper noun that starts lowercased (e.g. iPhone). The same style is also used for illustration captions and abbreviation expansions.

@luke-jr

luke-jr Oct 1, 2014

Contributor

Why not use standard English? :|

@harding

harding Oct 1, 2014

Contributor

Both forms are standard English, just as you can use or not use a comma before the final conjunction in list (one, two, and three). Here's a page which describes different capitialization styles (we use number 4): http://www.quickanddirtytips.com/education/grammar/capitalizing-titles?page=all

All the text and images use the style described in the Microsoft Style Guide ( https://en.wikipedia.org/wiki/Microsoft_Manual_of_Style_for_Technical_Publications ) for the simple reason that it's the style I learned early in my career. I don't really care what style we use, but I'd prefer not to update everything for so minor an issue.

@saivann

saivann Oct 1, 2014

Contributor

I think this issue is not a big deal, let's keep what is already used everywhere.

@harding harding and 1 other commented on an outdated diff Oct 1, 2014

_includes/guide_block_chain.md
that a given hash attempt will generate a number below the [target][]{:#term-target}{:.term}
-threshold. Bitcoin itself does not track probabilities but instead
-simply assumes that the lower it makes the target threshold, the more
-hash attempts, on average, will need to be tried.
+threshold.
+Bitcoin assumes a linear probability that the lower it makes the target threshold, the more hash attempts, on average, will need to be tried.
@harding

harding Oct 1, 2014

Contributor

Suggestions:

  • s/a linear/in a linear/ (seems like the better usage)
  • s/, on average,/ (on average)/ (I try to avoid comma-based parenthetical when other comma-based sentence divisions are being used)
@luke-jr

luke-jr Oct 2, 2014

Contributor

"in a linear" makes no sense grammatically there...

@harding harding and 1 other commented on an outdated diff Oct 1, 2014

_includes/guide_block_chain.md
New blocks will only be added to the block chain if their hash is at
-least as challenging as a [difficulty][]{:#term-difficulty}{:.term} value expected by the peer-to-peer
-network. Every 2,016 blocks, the network uses timestamps stored in each
+least as challenging as a [difficulty][]{:#term-difficulty}{:.term} value expected by the current best block chain.
@harding

harding Oct 1, 2014

Contributor

This is a minor issue, but I feel the block chain is an inert thing which is operated on by peers, so it would be incorrect to say the block chain expects something but correct to say that peers or the p2p network expect that thing.

@luke-jr

luke-jr Oct 1, 2014

Contributor

The full nodes only enforce the preset rules. There is nothing inherently connected to the peer-to-peer network here.

@harding

harding Oct 1, 2014

Contributor

Then how about saying, "difficulty value expected by full nodes on the network"?

@luke-jr

luke-jr Oct 1, 2014

Contributor

I don't think we need to turn every case of "consensus rules" into "expected by full nodes on the network"... This is so fundamental to the system.

@harding

harding Oct 2, 2014

Contributor

Then how about, "difficulty value expected by the network consensus". As stated in my original comment, my (minor) issue is with treating the block chain as an animate object that has expectations. To me, the block chain is just a collection of data, like the log files on my computer.

@luke-jr

luke-jr Oct 2, 2014

Contributor

Part of that data is the required target for new blocks. But "difficulty value expected by the consensus protocol" would work too.

@harding

harding Oct 2, 2014

Contributor

That sounds good to me. Thanks.

@harding harding commented on an outdated diff Oct 1, 2014

_includes/guide_block_chain.md
block header to calculate the number of seconds elapsed between generation
of the first and last of those last 2,016 blocks. The ideal value is
-1,209,600 seconds (two weeks).
+1,209,600 seconds (two weeks), and the difficulty will be adjusted to compensate for any difference in reality.
@harding

harding Oct 1, 2014

Contributor

I suggest undoing this change as the immediate following bullets explain in more detail what happens.

@harding harding and 1 other commented on an outdated diff Oct 1, 2014

_includes/guide_block_chain.md
@@ -125,7 +124,7 @@ propagate a modified block as the entire Bitcoin network expended
between the time the original block was created and the present time.
Only if you acquired a majority of the network's hashing power
could you reliably execute such a [51 percent attack][]{:#term-51-attack}{:.term} against
-transaction history.
+transaction history (although it should be noted that even less than 50% of the hashing power still has a good chance of performing such attacks).
@harding

harding Oct 1, 2014

Contributor

Minor style suggestion: change "transaction history (although [...] attacks)." to "transaction history. (Although [...] attacks.)"

@luke-jr

luke-jr Oct 1, 2014

Contributor

That suggestion is improper English...

@harding

harding Oct 1, 2014

Contributor

It seems like an independent clause to me. Although, it should be noted, that it would be better if "it should be noted" was placed in parenthetical commas.

@harding harding and 2 others commented on an outdated diff Oct 1, 2014

_includes/guide_block_chain.md
Eventually a miner produces another block which attaches to only one of
the competing simultaneously-mined blocks. This makes that side of
-the fork longer than the other side. Assuming a fork only contains valid
-blocks, normal peers always follow the longest fork (the most difficult chain
-to recreate) and throw away ([orphan][]{:#term-orphan}{:.term}) blocks belonging to shorter forks.
+the fork stronger than the other side.
+Assuming a fork only contains valid
+blocks, normal peers always follow the the most difficult chain
+to recreate and throw away ([stale blocks][]{:#term-stale-block}{:.term}) belonging to shorter forks.
@harding

harding Oct 1, 2014

Contributor

A few comments:

  • I don't mind using the term "stale blocks", but we need to at least introduce the commonly-used term "orphan" (verb) for people that search for that term. I suggest: "throw away (orphan) stale blocks"
  • The parenthesis around "(stale blocks)" should be dropped so the sentence reads, "throw away stale blocks"
@luke-jr

luke-jr Oct 1, 2014

Contributor

"Orphan blocks" are another thing entirely (blocks with no known parent block).

@harding

harding Oct 1, 2014

Contributor

Perhaps, but many other people use "orphan" and "orphan block" to refer to blocks no longer on the best chain. It would be nice if people expecting those terms could search the page and be taken to the location where we explain what term we use.

@saivann

saivann Oct 1, 2014

Contributor

Pieter gave a good explanation about the overlapping vocabulary here, I agree this case is a bit messy;
http://bitcoin.stackexchange.com/a/5869

Maybe this is another signal that a technical glossary would indeed be useful, as recently suggested on the mailing list: https://groups.google.com/forum/#!topic/bitcoin-documentation/s_mL1syxYoQ

@harding harding and 2 others commented on an outdated diff Oct 1, 2014

_includes/guide_block_chain.md
@@ -180,34 +181,34 @@ are usually referenced by the SHA256(SHA256()) hash of their header.
{% autocrossref %}
Every block must include one or more transactions. The first one of these
-transactions must be a coinbase transaction which should collect and
-spend the block reward and any transaction fees paid by transactions included in this block.
+transactions must be a generation transaction which should collect and
@harding

harding Oct 1, 2014

Contributor

We originally used the term "generation transaction" but @mikehearn suggested we use "coinbase transaction". I suggest we make sure both terms are introduced to help people searching based on a particular term but that we continue to use "coinbase" unless a discussion is held to change it.

@luke-jr

luke-jr Oct 1, 2014

Contributor

The "coinbase" refers to the scriptSig in the generation transaction. I am probably at fault for using "coinbasetxn" in GBT, but I think it's best to use clearer terminology, even if we have a ("coinbase transaction") in quotes once to avoid confusion.

@harding

harding Oct 2, 2014

Contributor

I think I prefer "generation transaction" and none of our images use the term "coinbase transaction" (I hate updating images because they create binary bloat in the repository), so I have no problems changing the term in the docs---except that two expert developers have each recommended against the term used by the other developer.

I suggest we revert the generation transaction changes in this pull and open a new pull that globally s/coinbase transaction/generation transaction/ (where appropriate with respect to code). Then we can get this pull merged and try to solicit additional opinions from experts about terms on the new pull.

@saivann

saivann Oct 2, 2014

Contributor

"coinbase transaction" is 3.5 times more used according to Google than "generation transaction" and merging the current pull request would make us inconsistently use both terms, so indeed I think we should revert this and open a seperate pull req so this doesn't block all other useful changes (if @luke-jr doesn't have time, I or @harding could revert this before merging).

@harding harding and 1 other commented on an outdated diff Oct 1, 2014

_includes/guide_block_chain.md
txid with one other txid and then hashing them together. If there are
an odd number of txids, the txid without a partner is hashed with a
copy of itself.
The resulting hashes themselves are each paired with one other hash and
hashed together. Any hash without a partner is hashed with itself. The
-process repeats until only one hash remains, the Merkle root.
+process repeats until only one hash remains, the merkle root.
+Since it is impractical for multiple transactions to have the same txid, a merkle link comprised of two identical hashes is always a single-child link.
@harding

harding Oct 1, 2014

Contributor

I don't understand what this sentence is saying, although I'm guessing it has something to do with BIP30. Could you please clarify?

@luke-jr

luke-jr Oct 1, 2014

Contributor

Actually, CVE-2012-2459. Without this clarification, it is possible to have the same merkle root for multiple different merkle trees - one where the last element exists once, and one where there are two of the same element.

@harding

harding Oct 2, 2014

Contributor

Thanks for the link! Do you think we could more simply say? "Duplicate txids are not permitted."

Or perhaps, "Duplicated txids are not permitted anywhere in the block chain, with the exception of two cases described by BIP30."

This seems to address the problems described by both BIP30 and the linked CVE.

@luke-jr

luke-jr Oct 2, 2014

Contributor

BIP 30 is technically obsolete at this point. Actually checking for duplicate txids would have too much overhead. It's better to simply state the fact that it isn't practical to have them.

@harding

harding Oct 2, 2014

Contributor

My problem is that I'm having a hard time parsing the sentence, so I'm looking for a way to rephrase. Maybe: "Note: two identical txids cannot be paired together because this would produce the same hash as a single txid being paired with a copy of itself. Since it is impractical to have separate transactions with identical txids, this does not impose a burden on honest programs but can prevent certain attacks."

@luke-jr

luke-jr Oct 2, 2014

Contributor

Does the new one address your concerns?

@harding harding commented on the diff Oct 1, 2014

_includes/guide_intro.md
@@ -5,8 +5,8 @@ Bitcoin and start building Bitcoin-based applications. To make the best use of
this documentation, you may want to install the current version of Bitcoin
Core, either from [source][core git] or from a [pre-compiled executable][core executable].
-Questions about Bitcoin development are best sent to the Bitcoin [Forum][forum
-tech support] and [IRC channels][]. Errors or suggestions related to
+Questions about Bitcoin development are best asked in the Bitcoin [IRC channels][].
@harding

harding Oct 1, 2014

Contributor

I like what you did in example_intro.md by changing "Bitcoin Forum" to "BitcoinTalk Forum". Perhaps we can do something similar here instead of dropping the reference to the forum altogether. (Although I agree that the forum has its warts, it still seems useful enough to deserve a highly-placed mention.)

@luke-jr

luke-jr Oct 1, 2014

Contributor

BitcoinTalk is pretty much entirely abandoned for development purposes.

@harding

harding Oct 1, 2014

Contributor

How about we link instead to the Bitcoin.org Community page? https://bitcoin.org/en/community

For example, the sentence could say: "If you have questions, you can contact other developers through some of the resources listed on the Community page."

@saivann

saivann Oct 1, 2014

Contributor

Mmh, to be honest, I think most links on the Community page aren't really suitable for developers. IMO renaming "Bitcoin Forum" to "BitcoinTalk" would be fine, BitcoinTalk isn't used much for development, but that still looks like an useful place to ask technical questions.

@luke-jr

luke-jr Oct 1, 2014

Contributor

I think you're likely to get the wrong answers if you ask technical questions on a place where technical people don't read...

@saivann

saivann Oct 2, 2014

Contributor

@luke-jr To your experience, the people commenting on the "Development and Technical" board, although not involved with Bitcoin Core, don't have a good technical understanding of Bitcoin? In any case, I have no strong opinion on this.

@luke-jr

luke-jr Oct 2, 2014

Contributor

I don't regularly read the board anymore, so I honestly can't state for certain. The few times I've seen posts there in the last few months, that did appear to be the case, however.

@harding

harding Oct 2, 2014

Contributor

It seems to me that the appropriate thing to do is to allow this change and then, at some point in the future, create a pull adding a section to the docs (or some other page on the site) describing additional places devs can find help, such as the StackExchange.

@saivann

saivann Oct 2, 2014

Contributor

@harding Agreed!

Contributor

harding commented Oct 1, 2014

@luke-jr I just finished a first-pass review, and in general I like the changes. Thanks!

I think the autocrossref links need to be updated and there are a few minor grammar changes I'd like to make, so I'll try to send a pull request to your branch later tonight. Thanks again!

Contributor

saivann commented Oct 1, 2014

The public key in commit d5babd is correct.

@harding harding commented on the diff Oct 2, 2014

_includes/guide_block_chain.md
-In the example given above, you will almost certainly produce a
-successful hash on your first try. You can even estimate the probability
+In the example given above, you will produce a successful hash on average every other try.
@harding

harding Oct 2, 2014

Contributor

This is incorrect. If the max is 2255 and the target is anything less than 2255, then you should get less than 2**255 practically all the time. I suggest restoring the original phrasing of lines 94 and 95.

@luke-jr

luke-jr Oct 2, 2014

Contributor

The max is 2^256-1, so 2^255 is approximately half that.

@harding

harding Oct 2, 2014

Contributor

I thought so, but you changed line 90 to say, "the maximum possible hash value is 2**255". Maybe that's the line which should be reverted.

@luke-jr

luke-jr Oct 2, 2014

Contributor

Ah yes, I misread that as "maximum accepted hash value". Will fix.

Contributor

luke-jr commented Oct 2, 2014

Conclusions thus far are applied to the branch.

@harding harding and 1 other commented on an outdated diff Oct 2, 2014

_includes/guide_block_chain.md
txid with one other txid and then hashing them together. If there are
an odd number of txids, the txid without a partner is hashed with a
copy of itself.
The resulting hashes themselves are each paired with one other hash and
hashed together. Any hash without a partner is hashed with itself. The
-process repeats until only one hash remains, the Merkle root.
+process repeats until only one hash remains, the merkle root.
+
+Since it is impractical for multiple transactions to have the same txid, a merkle link comprised of two identical hashes is always a single-child link.
+When a block has failed validation, however, one ought to check if it is because of a duplicate transaction that may trigger a merkle root collision (and therefore block hash collision) before caching that failure.
+Failure to do so may result in security bugs such as CVE-2012-2459.
@harding

harding Oct 2, 2014

Contributor

I like the additional detail, but I have some concerns:

  • The phrasing of the first sentence hasn't changed, so I still have a hard time parsing it.
  • A paragraph or other extended description here is suboptimal as the next paragraph starts with "For example". The example is of merkle tree creation, not checking for block hash collision.

My suggestions:

  • Move the new paragraph down to line 225 (in the original numbering) after the {% autocrossref %}.
  • Rephrase. Here's a modified version of the text I suggested earlier: "Note: two identical txids cannot be paired together because this would produce the same hash as a single txid being paired with a copy of itself. Since it is impractical to have separate transactions with identical txids, this does not impose a burden on honest programs but can prevent bugs such as CVE-2012-2459 from damaging the network."
@luke-jr

luke-jr Oct 2, 2014

Contributor

A B C D E F E F would also create a problem.

Contributor

luke-jr commented Oct 2, 2014

How's this?

Contributor

saivann commented Oct 2, 2014

@luke-jr I just sent you a pull req to fix a broken reference link. I'll take a look at everything else tomorrow, thanks again!

Contributor

luke-jr commented Oct 2, 2014

@saivann Merged

@harding harding commented on an outdated diff Oct 2, 2014

_includes/guide_block_chain.md
Eventually a miner produces another block which attaches to only one of
the competing simultaneously-mined blocks. This makes that side of
-the fork longer than the other side. Assuming a fork only contains valid
-blocks, normal peers always follow the longest fork (the most difficult chain
-to recreate) and throw away ([orphan][]{:#term-orphan}{:.term}) blocks belonging to shorter forks.
+the fork stronger than the other side.
+Assuming a fork only contains valid
+blocks, normal peers always follow the the most difficult chain
+to recreate and throw away ([stale blocks][stale block]{:#term-stale-block}{:.term}) belonging to shorter forks.
@harding

harding Oct 2, 2014

Contributor

Parenthesis need to be removed from this line. For searchers, I think it would also be nice if this sentence was followed by something like: "(Stale blocks are also sometimes called orphans, but that term is also used for blocks without a known parent block.)"

@harding harding commented on the diff Oct 2, 2014

_includes/guide_block_chain.md
other transactions. If the five transactions in this block were all at
the maximum size, downloading the entire block would require over
500,000 bytes---but downloading three hashes plus the block header
requires only 140 bytes.
+Note: If identical txids are found within the same block, there is a possibility that the merkle tree may collide with a block with some or all duplicates removed due to how unbalanced merkle trees are implemented (duplicating the lone hash).
+Since it is impractical to have separate transactions with identical txids, this does not impose a burden on honest software, but must be checked if the invalid status of a block is to be cached;
+otherwise, a valid block with the duplicates eliminated could have the same merkle root and block hash, but be rejected by the cached invalid outcome, resulting in security bugs such as CVE-2012-2459.
+
@harding

harding Oct 2, 2014

Contributor

Thanks! ACK this version of the pararaph.

Contributor

harding commented Oct 2, 2014

I just re-reviewed everything and added just three content-change suggestions, so I think we're close to mergability.

@luke-jr you seem to be revising all sentences inside parenthesis, but when I do a web search for are sentences allowed in parenthesis, I don't see anyone arguing against them. (Almost all of the articles are about period location relative to the close-parens.) I suggest we address this issue before you continue editing to avoid future debate.

Contributor

saivann commented Oct 2, 2014

Except for the last comments from me and @harding, this LGTM.

harding added a commit to harding/bitcoin.org that referenced this pull request Oct 3, 2014

@harding harding referenced this pull request in luke-jr/bitcoin.org Oct 3, 2014

Merged

Small Changes To Prepare Pull #589 For Merge #2

luke-jr added a commit to luke-jr/bitcoin.org that referenced this pull request Oct 3, 2014

Merge pull request #2 from harding/final-changes
Small Changes To Prepare Pull #589 For Merge
Contributor

luke-jr commented Oct 3, 2014

Merged in changes

Contributor

harding commented Oct 3, 2014

@luke-jr thanks!

I just completed a final check and this LGTM. @saivann, since you verified the GPG pubkey, I suggest you manage this merge.

@saivann saivann merged commit f85c631 into bitcoin-dot-org:master Oct 3, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment