
feat: create square builder class #1659

Merged
merged 15 commits into from
Apr 28, 2023
Conversation

cmwaters (Contributor)

Ref: #1214

This creates a builder struct for assembling squares from a (prioritised) list of transactions.

It also includes two functions, Construct and Reconstruct, as listed in ADR20.

@MSevey requested a review from a team, April 26, 2023 14:09
Comment on lines 211 to 217
func txsToBytes(txs coretypes.Txs) [][]byte {
e := make([][]byte, len(txs))
for i, tx := range txs {
e[i] = []byte(tx)
}
return e
}
Contributor Author

note: this was moved out of the shares package because no one was using it. IIRC there's a similar function in core we could just use

//
// Note that the padding would actually belong to the namespace of the transaction before it, but
// this makes no difference to the total share size.
maxPadding: shares.SubTreeWidth(shareSize) - 1,
Contributor Author

For the reviewer: Please make sure I have this assumption correct

Collaborator

[question][blocking] what is the relationship between subtree width and this rule?

Blobs start at an index that is equal to a multiple of the blob length divided by MaxSquareSize rounded up.

Source: https://github.com/celestiaorg/celestia-app/blob/main/docs/architecture/adr-013-non-interactive-default-rules-for-zero-padding.md

Related: should we extract a function that calculates the max padding for a blob and unit test it in isolation? Then the unit tests can use the table here to verify max padding matches what we expect: https://github.com/celestiaorg/celestia-app/blob/main/docs/architecture/assets/worst-case-padding-in-blob-size-range.png

Contributor Author

Blobs start at an index that is equal to a multiple of the blob length divided by MaxSquareSize rounded up.

Where do you see this in ADR013? I can't find it

Contributor Author

Now we use SubtreeRootThreshold to work out the height in the tree of the highest merkle mountain and thus the width of the leaves
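As a rough, self-contained sketch of that relationship (the threshold value 64 and the helper names here are placeholders, not the actual appconsts/shares definitions): divide the share count by SubtreeRootThreshold, round up, then round up to a power of two to get the width of the widest subtree root.

```go
package main

import "fmt"

// subtreeRootThreshold is a placeholder value; the real constant lives in
// celestia-app's appconsts package.
const subtreeRootThreshold = 64

// roundUpPowerOfTwo returns the smallest power of two >= v.
func roundUpPowerOfTwo(v int) int {
	k := 1
	for k < v {
		k *= 2
	}
	return k
}

// subTreeWidth sketches how SubtreeRootThreshold bounds the width of the
// widest subtree root (the highest merkle mountain): divide the share count
// by the threshold, round up, then round up to a power of two.
func subTreeWidth(shareCount int) int {
	s := shareCount / subtreeRootThreshold
	if shareCount%subtreeRootThreshold != 0 {
		s++
	}
	return roundUpPowerOfTwo(s)
}

func main() {
	fmt.Println(subTreeWidth(1))   // 1
	fmt.Println(subTreeWidth(64))  // 1
	fmt.Println(subTreeWidth(65))  // 2
	fmt.Println(subTreeWidth(300)) // 8
}
```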

Member

shares.SubTreeWidth(shareSize) - 1 is correct, and @rootulp's idea of having a function to make this explicit is a good one.

created #1660

Collaborator

Where do you see this in ADR013? I can't find it

Sorry I should've linked directly to the line:

Blobs start at an index that is equal to a multiple of the blob length divided by `MaxSquareSize` rounded up.

Comment on lines -19 to -36
func isPowerOf2(v uint64) bool {
return v&(v-1) == 0 && v != 0
}

func BlobsFromProto(blobs []core.Blob) ([]coretypes.Blob, error) {
result := make([]coretypes.Blob, len(blobs))
for i, blob := range blobs {
if blob.ShareVersion > math.MaxUint8 {
return nil, fmt.Errorf("share version %d is too large to be a uint8", blob.ShareVersion)
}
result[i] = coretypes.Blob{
NamespaceID: blob.NamespaceId,
Data: blob.Data,
ShareVersion: uint8(blob.ShareVersion),
}
}
return result, nil
}
Contributor Author

Note: these functions are either not used (and probably shouldn't be) or redundant

Comment on lines +85 to +94
// AvailableBytesFromSparseShares returns the maximum amount of bytes that could fit in `n` sparse shares
func AvailableBytesFromSparseShares(n int) int {
if n <= 0 {
return 0
}
if n == 1 {
return appconsts.FirstSparseShareContentSize
}
return (n-1)*appconsts.ContinuationSparseShareContentSize + appconsts.FirstSparseShareContentSize
}
Contributor Author

Something that the node team might want to use
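For illustration, a self-contained version with placeholder constants (the real values live in appconsts and depend on the share format; the first sparse share carries slightly less payload because it also holds the sequence length):

```go
package main

import "fmt"

// Placeholder values: the real constants live in celestia-app's appconsts
// package and are tied to the share layout.
const (
	firstSparseShareContentSize        = 478
	continuationSparseShareContentSize = 482
)

// availableBytesFromSparseShares returns the maximum number of bytes that
// could fit in n sparse shares: the first share's capacity plus the
// continuation capacity of each remaining share.
func availableBytesFromSparseShares(n int) int {
	if n <= 0 {
		return 0
	}
	return (n-1)*continuationSparseShareContentSize + firstSparseShareContentSize
}

func main() {
	fmt.Println(availableBytesFromSparseShares(1)) // 478
	fmt.Println(availableBytesFromSparseShares(3)) // 478 + 2*482 = 1442
}
```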

Base automatically changed from cal/square-1 to main April 26, 2023 15:06
Comment on lines +23 to +26
// here we keep track of the actual data to go in a square
txs [][]byte
pfbs []*coretypes.IndexWrapper
blobs []*element
Collaborator

[informational][no change needed] this godoc comment will only apply to the first field

(screenshots of the rendered godoc)

Comment on lines +215 to +216
// Imagine if, according to the share commitment rules, a transaction took up 11 shares
// and had the merkle mountain tree commitment of 4,4,2,1. The first part of the share commitment
Collaborator

I'm having a tough time relating this example to the ascii diagram above b/c it discusses a blob that occupies 11 shares but there are only 8 possible share indexes in the ASCII diagram

Contributor Author

The 8 possible share indexes are a snapshot of the tree. It's used to demonstrate that if the index of the last transaction ended at 1 (inclusive), and the SubtreeWidth() was 4, then the next one would start at index 4 and we would have 2 shares of padding.
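A minimal sketch of that padding arithmetic (hypothetical helper, not the builder's actual code): a blob whose subtree width is w must start at the next multiple of w, so any shares between the cursor and that index become padding.

```go
package main

import "fmt"

// paddingBefore returns the number of padding shares inserted before a blob:
// the gap between the current cursor and the next multiple of subtreeWidth.
func paddingBefore(cursor, subtreeWidth int) int {
	remainder := cursor % subtreeWidth
	if remainder == 0 {
		return 0
	}
	return subtreeWidth - remainder
}

func main() {
	// Last transaction ended at index 1 (inclusive), so the cursor is 2.
	// With a subtree width of 4 the next blob starts at index 4: 2 shares of padding.
	fmt.Println(paddingBefore(2, 4)) // 2
	// Worst case is subtreeWidth-1 shares of padding.
	fmt.Println(paddingBefore(5, 4)) // 3
}
```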

Member

I also think this example is outdated given the implementation of ADR013

Contributor Author

I copied this tree across from somewhere else just so I could explain why we have padding and why that formula gives us the maximum possible padding


@rootulp added this to the Mainnet milestone Apr 26, 2023

codecov-commenter commented Apr 26, 2023

Codecov Report

Merging #1659 (0dd43ef) into main (4c73824) will increase coverage by 1.11%.
The diff coverage is 73.84%.

@@            Coverage Diff             @@
##             main    #1659      +/-   ##
==========================================
+ Coverage   51.56%   52.67%   +1.11%     
==========================================
  Files          95       97       +2     
  Lines        5954     6185     +231     
==========================================
+ Hits         3070     3258     +188     
- Misses       2570     2598      +28     
- Partials      314      329      +15     
| Impacted Files | Coverage | Δ |
|---|---|---|
| pkg/namespace/namespace.go | 46.05% <0.00%> | -3.95% ⬇️ |
| pkg/shares/share_splitting.go | 64.65% <0.00%> | ø |
| pkg/shares/utils.go | 57.69% <11.11%> | +7.69% ⬆️ |
| pkg/square/square.go | 71.42% <71.42%> | ø |
| pkg/square/builder.go | 86.00% <86.00%> | ø |
| pkg/da/data_availability_header.go | 78.89% <100.00%> | -0.20% ⬇️ |

evan-forbes previously approved these changes Apr 26, 2023

@evan-forbes (Member) left a comment:

nice work!! overall this is a really big improvement

mostly non-blocking naming things

Comment on lines +41 to +45
// Reconstruct takes a list of ordered transactions and reconstructs a square, validating that
// all PFBs are ordered after regular transactions and that the transactions don't collectively
// exceed the maxSquareSize. Note that this function does not check the underlying validity of
// the transactions.
func Reconstruct(txs [][]byte, maxSquareSize int) (Square, error) {
Member

[not blocking]
I can see this naming scheme becoming confusing, since "reconstructing the square" is already a term we use to describe the process of collecting enough erasure encoded shares to rebuild the EDS. (for example, search the word reconstruct in celestia-node)

I understand using the word reconstruct here, but perhaps we could use a synonym for construct? build and rebuild?
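The ordering invariant in Reconstruct's doc comment (all PFBs after regular transactions) could be checked in a single pass; a sketch, where isPFB is a hypothetical predicate standing in for however the real code detects a PFB:

```go
package main

import "fmt"

// isPFB is a hypothetical predicate; the real code would attempt to
// unmarshal the bytes as an IndexWrapper / PFB transaction.
func isPFB(tx []byte) bool {
	return len(tx) > 0 && tx[0] == 'P'
}

// checkOrdering verifies that no regular transaction appears after a PFB,
// mirroring the invariant described in Reconstruct's doc comment.
func checkOrdering(txs [][]byte) error {
	seenPFB := false
	for i, tx := range txs {
		if isPFB(tx) {
			seenPFB = true
		} else if seenPFB {
			return fmt.Errorf("tx %d: regular transaction after a PFB", i)
		}
	}
	return nil
}

func main() {
	ok := [][]byte{[]byte("tx1"), []byte("Pfb1")}
	bad := [][]byte{[]byte("Pfb1"), []byte("tx1")}
	fmt.Println(checkOrdering(ok) == nil)  // true
	fmt.Println(checkOrdering(bad) == nil) // false
}
```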



blob core.Blob
pfbIndex int
blobIndex int
shareSize int
Member

[not blocking naming thing]
is it possible to use a different word besides shareSize that describes what this is? This is the number of shares used, correct? shareSize makes me think of the size of a share in bytes.

Comment on lines +71 to +72
size := iw.Size()
pfbShareDiff := c.pfbCounter.Add(size)
Member

nice, this is a significant improvement over the previous approach:

// maxWrappedTxOverhead calculates the maximum amount of overhead introduced by
// wrapping a transaction with a shares index
//
// TODO: make more efficient by only generating these numbers once or something
// similar. This function alone can take up to 5ms.
func maxIndexWrapperOverhead(squareSize uint64) int {
maxTxLen := squareSize * squareSize * appconsts.ContinuationCompactShareContentSize
wtx, err := coretypes.MarshalIndexWrapper(
make([]byte, maxTxLen),
uint32(squareSize*squareSize),
)
if err != nil {
panic(err)
}
return len(wtx) - int(maxTxLen)
}
// maxIndexOverhead calculates the maximum amount of overhead in bytes that
// could occur by adding an index to an IndexWrapper.
func maxIndexOverhead(squareSize uint64) int {
maxShareIndex := squareSize * squareSize
maxIndexLen := binary.PutUvarint(make([]byte, binary.MaxVarintLen32), maxShareIndex)
wtx, err := coretypes.MarshalIndexWrapper(make([]byte, 1), uint32(maxShareIndex))
if err != nil {
panic(err)
}
wtx2, err := coretypes.MarshalIndexWrapper(make([]byte, 1), uint32(maxShareIndex), uint32(maxShareIndex-1))
if err != nil {
panic(err)
}
return len(wtx2) - len(wtx) + maxIndexLen
}

@evan-forbes (Member)

I'm working on some changes locally for refactoring and limiting the block size, and I think this change pairs very well with it.

Co-authored-by: Rootul P <rootulp@gmail.com>
@cmwaters (Contributor Author)

Thought I'd post some relatively crude results comparing the square size estimation algorithm before and after (each PFB is 512 bytes):

| PFBs | Square size before | Square size after |
|---:|---:|---:|
| 1 | 4 | 2 |
| 2 | 4 | 4 |
| 4 | 8 | 4 |
| 8 | 8 | 8 |
| 16 | 16 | 8 |
| 32 | 16 | 16 |
| 64 | 32 | 16 |
| 128 | 32 | 32 |
| 256 | 64 | 32 |
| 512 | 64 | 64 |
| 1024 | 128 | 64 |
| 2048 | 128 | 128 |
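The before/after sizes follow the usual power-of-two rounding; a minimal sketch of fitting n shares into the smallest power-of-two square (illustrative only, not the PR's actual estimation logic):

```go
package main

import "fmt"

// minSquareSize returns the smallest power-of-two edge length whose square
// holds at least shareCount shares.
func minSquareSize(shareCount int) int {
	size := 1
	for size*size < shareCount {
		size *= 2
	}
	return size
}

func main() {
	fmt.Println(minSquareSize(1))  // 1
	fmt.Println(minSquareSize(5))  // 4
	fmt.Println(minSquareSize(17)) // 8
}
```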

@cmwaters merged commit a4aec0d into main Apr 28, 2023
27 checks passed
@cmwaters deleted the cal/square-2 branch April 28, 2023 08:09