[R4R] #476 write order id to file for large expire message #478
cryptom-dev merged 1 commit into develop from
Conversation
app/pub/publisher_kafka.go
go pClient.UpdatePrometheusMetrics()
}

if _, err := os.Stat(publisher.essentialLogPath); os.IsNotExist(err) {
There is a function EnsureDir in github.com/tendermint/tendermint/libs/common; maybe we can reuse it.
app/pub/publisher_kafka.go
Logger.Error("failed to publish", "topic", topic, "msg", avroMessage.String(), "err", err)
filePath := fmt.Sprintf("%s/%d_%s.log", publisher.essentialLogPath, height, tpe.String())
toWrite := []byte(avroMessage.EssentialMsg())
if err := ioutil.WriteFile(filePath, toWrite, 0644); err != nil {
can we skip if len(toWrite) is zero?
ioutil.WriteFile will open and close file each time, can we always hold a Writer?
ioutil.WriteFile will open and close file each time, can we always hold a Writer?
Each file here is a different file, with height and message type in the filename, and as I said we should hit this code path very rarely (I hope once per quarter...).
Oh, I see.
2. Under the ext3 filesystem there is no particular limit on the number of files in a single directory; it is bounded by the inode count of the filesystem it lives on.
I tested on the ext3 filesystem of RHEL5u5: touching 1 million files in one directory was no problem, but it will certainly be limited by the inode count of the filesystem.
Each block will create several files, which may reach the system limit in the long run.
Each block will create several files, which may reach the system limit in the long run.
We will delete once there are files for a block; this is a devops rule.
app/pub/msgs.go
if _, ok := stat[order.Status]; !ok {
stat[order.Status] = order.OrderId
} else {
stat[order.Status] += fmt.Sprintf("\n%s", order.OrderId)
Continuous Sprintf may be slow. There should be another solution.
Continuous Sprintf may be slow. There should be another solution.
use strings.Builder instead
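A minimal sketch of the suggestion, with simplified stand-in types for the real message structs: accumulate order ids per status with strings.Builder instead of repeated fmt.Sprintf concatenation, which re-allocates the whole string on every append:

```go
package main

import (
	"fmt"
	"strings"
)

// order is a simplified stand-in for the real OrderUpdate struct.
type order struct {
	Status  string
	OrderId string
}

// groupByStatus joins order ids per status with newlines, using one
// strings.Builder per status instead of string += fmt.Sprintf(...).
func groupByStatus(orders []order) map[string]string {
	builders := make(map[string]*strings.Builder)
	for _, o := range orders {
		b, ok := builders[o.Status]
		if !ok {
			b = &strings.Builder{}
			builders[o.Status] = b
			b.WriteString(o.OrderId) // first id: no leading newline
			continue
		}
		b.WriteString("\n")
		b.WriteString(o.OrderId)
	}
	stat := make(map[string]string, len(builders))
	for status, b := range builders {
		stat[status] = b.String()
	}
	return stat
}

func main() {
	stat := groupByStatus([]order{
		{"Expired", "o-1"}, {"Expired", "o-2"}, {"Canceled", "o-3"},
	})
	fmt.Println(stat["Expired"]) // o-1 and o-2, newline-separated
}
```

strings.Builder grows an internal buffer amortized, so n appends cost O(n) total instead of the O(n²) of repeated string concatenation.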
app/pub/msgs.go
}

func (msg Transfers) EssentialMsg() string {
// deliberately not implemented
Why is transfer not implemented?
I would suggest we implement every message type, even trade. If not, it would dump nothing anyway.
Why is transfer not implemented?
I would suggest we implement every message type, even trade. If not, it would dump nothing anyway.
for transfer:
- qs doesn't subscribe to its topic (risk control is relying on that)
- risk control can recover from explorer-indexed transfers (pull mode)
- we don't have a unique representation of a transfer like an order-id (we didn't save the txhash in the message)
for trade:
the problem is the same as point 3 above (trade id is only generated during publication, not persisted anywhere). If we keep qty, price, sid, bid for a trade, it would be too much; in this case maybe we should recover from a local publisher?
I think whether to make this PR a more generic solution depends on how we use it. My initial understanding is that we only need this to cope with the expire block (no trades, a huge number of expired orders, happens rarely).
If we also want to cover large normal-block publication failures, I still think having a full node with a local publisher enabled is safer.
You are fine with only dumping orders/proposals/accounts, as discussed offline.
[[constraint]]
name = "github.com/Shopify/sarama"
version = "1.17.0"
@erhenglu complained that the current Kafka version 1.1.0 disconnects frequently. We want to upgrade Kafka to 2.1.0 in the new QA and PROD.
When will that happen? Do we have to pause the site when we upgrade?
When will that happen? Do we have to pause the site when we upgrade?
A rolling upgrade of Kafka won't pause the site (but due to leader re-election, there will be a 4-6s height delay while rolling-bouncing Kafka, as I tested in QA).
// 3. we don't have a unique representation of transfer like order-id (we didn't save txhash in message)
//
// for trade:
// the problem is same with above point 3, (trade id is only generated during publication, not persisted anywhere).
Can I confirm the trade id is deterministic across all Witness publishers?
app/pub/publisher_kafka.go
kafkaMsg := publisher.prepareMessage(topic, strconv.FormatInt(height, 10), timestamp, tpe, msg)
if partition, offset, err := publisher.publishWithRetry(kafkaMsg, topic); err == nil {
Logger.Info("published", "topic", topic, "msg", avroMessage.String(), "offset", offset, "partition", partition)
if essMsg, ok := avroMessage.(EssMsg); ok {
please drop this.
thx..
please drop this.
fixed
pub.IsLive {
if height >= app.publicationConfig.FromHeightInclusive {
app.publish(tradesToPublish, &proposals, blockFee, ctx, height, blockTime.Unix())
app.publish(tradesToPublish, &proposals, blockFee, ctx, height, blockTime.UnixNano())
Need to confirm downstreams are good.
Need to confirm downstreams are good.
@erhenglu wants a pause on qs during the next upgrade to change the DB schema.
Make sure Jack or Erheng modifies the db timestamp column to persist the correct date format.
Make sure Zongjiang's ws trade timestamp handling is changed from multiplying by 1000 to dividing by 10^6.
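A minimal illustration of why the downstream conversion flips: the diff switches the published block time from blockTime.Unix() (seconds) to blockTime.UnixNano() (nanoseconds), so a consumer that used to multiply by 1000 to get milliseconds must now divide by 10^6 instead. The function names here are illustrative, not from the codebase:

```go
package main

import (
	"fmt"
	"time"
)

// toMillisOld converts the previously published second-precision timestamp
// to milliseconds (the old ws trade behavior: multiply by 1000).
func toMillisOld(publishedSeconds int64) int64 { return publishedSeconds * 1000 }

// toMillisNew converts the newly published nanosecond timestamp to
// milliseconds (the new behavior: divide by 10^6).
func toMillisNew(publishedNanos int64) int64 { return publishedNanos / 1000000 }

func main() {
	// a block time with 500ms of sub-second precision
	blockTime := time.Unix(1546300800, 500*int64(time.Millisecond))
	fmt.Println(toMillisOld(blockTime.Unix()))     // second precision only
	fmt.Println(toMillisNew(blockTime.UnixNano())) // sub-second precision preserved
}
```

The switch also means sub-second precision survives the trip, which is why the db timestamp column needs a schema change to hold it.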
Description
write order id to file for large expire message
Rationale
#476
wiki update : https://github.com/binance-chain/docs-internal/wiki/qa_runbook#publication-error
Example
will keep a file at
/Users/zhaocong/.bnbchaind_publisher/data/essential/3087_ExecutionResults.log
Duplicated order ids are because the trade-id on each order is different (this log was captured on a non-breathe block)
3087_Accounts.log
Changes
Preflight checks
- build passed (make build)
- tests passed (make test)
- integration tests passed (make integration_test)
Already reviewed by
...
Related issues
#476