
[R4R] #476 write order id to file for large expire message#478

Merged
cryptom-dev merged 1 commit into develop from largeexpire on Mar 29, 2019
Conversation

@cryptom-dev
Contributor

@cryptom-dev cryptom-dev commented Mar 6, 2019

Make sure Jack or Erheng modifies the db timestamp column to persist the correct date format

Make sure Zongjiang's ws trade timestamp is changed from multiplying by 1000 to dividing by 10^6

Description

Write order ids to file for large expire messages

Rationale

#476

wiki update : https://github.com/binance-chain/docs-internal/wiki/qa_runbook#publication-error

Example

will keep a file at /Users/zhaocong/.bnbchaind_publisher/data/essential/3087_ExecutionResults.log

Duplicated order ids appear because the trade-id on the order is different (this log was captured on a non-breathe block)

height:3087
orders:
0:
D01E89F25E05DD00382DDAF4541E754B7DAEB9BE-1318
5:
E00476F64BA73E5FE3E2E7D20D1C64706B2F7915-1299
D01E89F25E05DD00382DDAF4541E754B7DAEB9BE-1318
E00476F64BA73E5FE3E2E7D20D1C64706B2F7915-1298
D01E89F25E05DD00382DDAF4541E754B7DAEB9BE-1318
E00476F64BA73E5FE3E2E7D20D1C64706B2F7915-1297
D01E89F25E05DD00382DDAF4541E754B7DAEB9BE-1318
D01E89F25E05DD00382DDAF4541E754B7DAEB9BE-1318
4:
E00476F64BA73E5FE3E2E7D20D1C64706B2F7915-1138

proposals:

3087_Accounts.log

height:3087
bnb1uqz8dajt5ul9lclzulfq68rywp4j77g46gn3q5
bnb16q0gnuj7qhwsqwpdmt69g8n4fd76awd7p8hgcu

Changes

  1. dump large expire messages to file and send an empty msg to keep qs running
  2. upgrade the kafka sarama lib
  3. change the publish acknowledge level to local (we won't be blocked during the kafka upgrade)

Preflight checks

  • build passed (make build)
  • tests passed (make test)
  • integration tests passed (make integration_test)
  • manual transaction test passed (cli invoke)

Already reviewed by

...

Related issues

#476

@cryptom-dev cryptom-dev self-assigned this Mar 6, 2019
@cryptom-dev cryptom-dev changed the title #476 write order id to file for large expire message [R4R] #476 write order id to file for large expire message Mar 6, 2019
@cryptom-dev cryptom-dev changed the base branch from master to develop March 6, 2019 05:11
go pClient.UpdatePrometheusMetrics()
}

if _, err := os.Stat(publisher.essentialLogPath); os.IsNotExist(err) {
Contributor

There is a function EnsureDir in github.com/tendermint/tendermint/libs/common; maybe we can reuse it.

Logger.Error("failed to publish", "topic", topic, "msg", avroMessage.String(), "err", err)
filePath := fmt.Sprintf("%s/%d_%s.log", publisher.essentialLogPath, height, tpe.String())
toWrite := []byte(avroMessage.EssentialMsg())
if err := ioutil.WriteFile(filePath, toWrite, 0644); err != nil {
Contributor

Can we skip the write if len(toWrite) is zero?

Contributor

ioutil.WriteFile opens and closes the file each time; can we always hold a Writer?

Contributor Author

ioutil.WriteFile opens and closes the file each time; can we always hold a Writer?

Each file here is a different file, with the height and message type in the filename, and as I said we should hit this code path very rarely (I hope once per quarter..)

Contributor

@unclezoro unclezoro Mar 6, 2019

Oh, I see.

2. Under the ext3 filesystem there is no particular limit on the number of files in a single directory; it is bounded by the inode count of the filesystem.
I tested on an ext3 filesystem under RHEL5u5: touching 1,000,000 files in one directory was fine, but it will certainly be limited by the filesystem's inode count.

Each block will create several files, which may reach the filesystem limit in the long run.

Contributor Author

Each block will create several files, which may reach the filesystem limit in the long run.

We will delete the files for a block once they appear; this is a devops rule.

app/pub/msgs.go Outdated
if _, ok := stat[order.Status]; !ok {
stat[order.Status] = order.OrderId
} else {
stat[order.Status] += fmt.Sprintf("\n%s", order.OrderId)
Contributor

Continuous Sprintf may be slow; there should be another solution.

Contributor Author

Continuous Sprintf may be slow; there should be another solution.

Use strings.Builder instead.

app/pub/msgs.go Outdated
}

func (msg Transfers) EssentialMsg() string {
// deliberately not implemented
Contributor

Why is transfer not implemented?

I would suggest we implement every message type, even trade. If not, it would dump nothing anyway.

Contributor Author

Why is transfer not implemented?

I would suggest we implement every message type, even trade. If not, it would dump nothing anyway.

for transfer:

  1. qs doesn't subscribe to its topic (risk control relies on it)
  2. risk control can recover from explorer-indexed transfers (pull mode)
  3. we don't have a unique representation of a transfer like order-id (we didn't save the txhash in the message)

for trade:
the problem is the same as point 3 above (trade id is only generated during publication, not persisted anywhere). If we kept qty, price, sid, and bid for a trade, it would be too much; in that case maybe we should recover from a local publisher?

I think whether to make this PR a more generic solution depends on how we use it. My initial understanding is that we only need this to cope with expire blocks (no trades, a huge number of expired orders, rarely happens).

If we also want to cover large normal block publication failures, I still think having a full node with a local publisher enabled is safer.

Contributor Author

You are fine with only dumping orders/proposals/accounts, as discussed offline

@cryptom-dev cryptom-dev force-pushed the largeexpire branch 5 times, most recently from af80d98 to 0af58a9 Compare March 11, 2019 11:30

[[constraint]]
name = "github.com/Shopify/sarama"
version = "1.17.0"
Contributor Author

@erhenglu complained that the current kafka version 1.1.0 disconnects frequently. We want to upgrade kafka to 2.1.0 in the new QA and PROD

Contributor

When will that happen? Do we have to pause the site when we upgrade?

Contributor Author

When will that happen? Do we have to pause the site when we upgrade?

A rolling kafka upgrade won't pause the site (but due to leader re-election there will be a 4-6s height delay during the rolling bounce of kafka, as I tested in QA)



// 3. we don't have a unique representation of transfer like order-id (we didn't save txhash in message)
//
// for trade:
// the problem is same with above point 3, (trade id is only generated during publication, not persisted anywhere).
Contributor

Can I confirm that the trade id is deterministic across all Witness publishers?

kafkaMsg := publisher.prepareMessage(topic, strconv.FormatInt(height, 10), timestamp, tpe, msg)
if partition, offset, err := publisher.publishWithRetry(kafkaMsg, topic); err == nil {
Logger.Info("published", "topic", topic, "msg", avroMessage.String(), "offset", offset, "partition", partition)
if essMsg, ok := avroMessage.(EssMsg); ok {
Contributor

please drop this.

Contributor Author

please drop this.

thx..

Contributor Author

please drop this.

fixed

pub.IsLive {
if height >= app.publicationConfig.FromHeightInclusive {
app.publish(tradesToPublish, &proposals, blockFee, ctx, height, blockTime.Unix())
app.publish(tradesToPublish, &proposals, blockFee, ctx, height, blockTime.UnixNano())
Contributor Author

Need to confirm downstreams are good

Contributor Author

Need to confirm downstreams are good

@erhenglu wants a pause on qs during the next upgrade to change the DB schema

@cryptom-dev cryptom-dev changed the title [R4R] #476 write order id to file for large expire message [DO NOT MERGE] #476 write order id to file for large expire message Mar 20, 2019
@cryptom-dev cryptom-dev changed the title [DO NOT MERGE] #476 write order id to file for large expire message [R4R] #476 write order id to file for large expire message Mar 29, 2019
@cryptom-dev cryptom-dev merged commit cd0bd3c into develop Mar 29, 2019
@unclezoro unclezoro deleted the largeexpire branch May 10, 2022 06:09