Zergpool invalid job_id, various algos #193

Closed
JayDDee opened this issue Jul 2, 2019 · 12 comments

Comments

@JayDDee
Owner

JayDDee commented Jul 2, 2019

There seems to be a problem with shares being rejected for an invalid job id when mining various
algos. I have seen it on the yespower, yescrypt and argon2d algos, but only at zergpool.

The pattern seems to be a series of rejects until a new job is received. A race condition
is suspected. One race that is guaranteed to lose is if a new job crosses paths with a share
submission from the old job.

miner submit share ------------------>
<------------------ pool send new job
<------------------ pool rejects share invalid job id
miner receives rejection

The race condition may be exacerbated by high latency, around 140 ms for me at this pool.
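
To make the timing of that race concrete, here is a minimal sketch. It assumes the network delay is symmetric and that both timestamps are on the same clock; the function name and the 70 ms one-way figure (half of the roughly 140 ms reported above, if that number is a round trip) are illustrative assumptions, not taken from the miner or the pool.

```python
def share_outcome(submit_at, job_change_at, one_way_latency):
    """Classify a share the way the pool would see it.

    submit_at:        time the miner sends the share (seconds)
    job_change_at:    time the pool switches to a new job (same clock)
    one_way_latency:  assumed symmetric one-way network delay (seconds)
    """
    arrives_at_pool = submit_at + one_way_latency
    notify_reaches_miner = job_change_at + one_way_latency
    if arrives_at_pool < job_change_at:
        # The share reached the pool while the old job was still valid.
        return "accepted"
    if submit_at < notify_reaches_miner:
        # Share and new-job notice crossed on the wire: an unavoidable loss.
        return "rejected (crossed in flight)"
    # The miner already had the new job but still submitted for the old one.
    return "rejected (miner was slow)"


# Hypothetical timings, one-way latency assumed to be 70 ms.
print(share_outcome(10.00, 10.20, 0.07))
print(share_outcome(10.15, 10.20, 0.07))
print(share_outcome(10.30, 10.20, 0.07))
```

Under these assumptions the window in which a share is lost no matter what the miner does is twice the one-way latency, so higher latency widens it but cannot by itself explain a long series of rejects.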

This issue is opened to study the problem and determine whether the miner is slow to detect new
work, the pool is sending invalid jobs, or latency is the root cause.

@JayDDee
Owner Author

JayDDee commented Jul 2, 2019

Some data collected with 2 PCs. Rejects at startup on PC 2.

PC 1:

[2019-07-02 02:50:53] yespowerr16 block 423820, network diff 0.007
[2019-07-02 02:51:03] Share submitted.
[2019-07-02 02:51:03] Accepted, diff 6.78e-05, 29.529 secs, A/R/B: 376/4/1.
[2019-07-02 02:51:03] Miner 756.85 H/s, Share 2464.09 H/s, Latency 148 ms.
[2019-07-02 02:51:03] Height 423820, Block share 1.03904%.
[2019-07-02 02:51:26] yespowerr16 block 423821, network diff 0.006

PC 2:

[2019-07-02 02:51:25] Stratum difficulty set to 0.3
[2019-07-02 02:51:25] yespowerr16 block 423820, job 49d, network diff 0.0065
[2019-07-02 02:51:34] Share 1 submitted by thread 5, job 49d.
[2019-07-02 02:51:34] Rejected, diff 8.9e-06, 9.829 secs, A/R/B: 0/1/0.
[2019-07-02 02:51:34] reject reason: Invalid job id.
[2019-07-02 02:51:58] Share 2 submitted by thread 5, job 49d.
[2019-07-02 02:51:58] Rejected, diff 2.67e-05, 23.661 secs, A/R/B: 0/2/0.
[2019-07-02 02:51:58] reject reason: Invalid job id.
[2019-07-02 02:52:02] Share 3 submitted by thread 11, job 49d.
[2019-07-02 02:52:03] Rejected, diff 1.05e-05, 4.901 secs, A/R/B: 0/3/0.
[2019-07-02 02:52:03] reject reason: Invalid job id.
[2019-07-02 02:52:16] Share 4 submitted by thread 6, job 49d.
[2019-07-02 02:52:16] Rejected, diff 1.71e-05, 13.752 secs, A/R/B: 0/4/0.
[2019-07-02 02:52:16] reject reason: Invalid job id.
[2019-07-02 02:52:21] Share 5 submitted by thread 13, job 49d.
[2019-07-02 02:52:21] Rejected, diff 9.78e-06, 4.559 secs, A/R/B: 0/5/0.

@JayDDee
Owner Author

JayDDee commented Jul 2, 2019

Analysis of the stratum code suggests the miner never received the message for new block 423821.
Had it received the message it would have signalled the miner threads to abort their current
work and refresh with the new block data.
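
For readers unfamiliar with that path, the following is a rough Python sketch of what handling a new-job message involves. It is not cpuminer-opt's code (which is C). It only assumes the common Stratum v1 layout of `mining.notify`, where `params[0]` is the job id and `params[8]` is the clean_jobs flag; the names `restart_work`, `current_job` and `handle_stratum_line` are made up for illustration.

```python
import json
import threading

# Set when new work arrives; worker threads would check this flag and
# abandon the job they are hashing. (Hypothetical name.)
restart_work = threading.Event()
current_job = {"job_id": None, "clean_jobs": False}


def handle_stratum_line(line):
    """Process one JSON line from the pool; return True if it carried a new job."""
    msg = json.loads(line)
    if msg.get("method") != "mining.notify":
        return False
    params = msg["params"]
    job_id = params[0]             # job id, e.g. "49d" in the logs above
    clean_jobs = bool(params[8])   # pool asks the miner to drop old work
    if job_id == current_job["job_id"]:
        return False
    current_job["job_id"] = job_id
    current_job["clean_jobs"] = clean_jobs
    restart_work.set()             # signal miner threads to refresh their work
    return True
```

If the line carrying `mining.notify` never arrives, nothing in this path runs, so the miner threads keep hashing and submitting against the old job id.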

That it didn't is evidence the message was never received. It is interesting to note the timing.
The message for block 423820 on PC 2 was one second before PC 1 received the message for
block 423821. The timing may not be coincidental and may suggest a networking problem.
The high latency may also be a networking problem. Congestion could be responsible for both.

The data also show the scenario described in the OP did not occur because PC 1 received the
new block several seconds before PC 2 sent the stale share. PC 2 continued to submit stale
shares until I stopped the miner, having never received the new block that PC 1 had received.

@JayDDee
Owner Author

JayDDee commented Jul 2, 2019

Another session showing that block 424023 was never received.

[2019-07-02 09:34:41] Starting Stratum on stratum+tcp://yespowerr16.mine.zergpool.com:6534
[2019-07-02 09:34:41] 16 miner threads started, using 'yespowerr16' algorithm.
[2019-07-02 09:34:42] Stratum difficulty set to 0.3
[2019-07-02 09:34:42] yespowerr16 block 424021, job 6b8, network diff 0.0146
[2019-07-02 09:34:45] Share 1 submitted by thread 3, job 6b8.
[2019-07-02 09:34:45] Accepted, diff 1.23e-05, 3.918 secs, A/R/B: 1/0/0.
[2019-07-02 09:34:45] Miner 63.03 H/s, Share 3373.44 H/s, Latency 153 ms.
[2019-07-02 09:34:45] Height 424021, job 6b8, 0.08436% block share.
[2019-07-02 09:34:45] - - - - - - - - - - - - - - - - - - - - - - - - - - -
[2019-07-02 09:35:09] Share 2 submitted by thread 3, job 6b8.
[2019-07-02 09:35:09] Rejected, diff 1.37e-05, 24.013 secs, A/R/B: 1/1/0.
[2019-07-02 09:35:09] reject reason: Invalid job id.
[2019-07-02 09:35:18] Share 3 submitted by thread 5, job 6b8.
[2019-07-02 09:35:18] Rejected, diff 1.24e-05, 9.406 secs, A/R/B: 1/2/0.
[2019-07-02 09:35:18] reject reason: Invalid job id.
[2019-07-02 09:35:22] Share 4 submitted by thread 3, job 6b8.
[2019-07-02 09:35:22] Rejected, diff 5.27e-06, 3.625 secs, A/R/B: 1/3/0.
[2019-07-02 09:35:22] reject reason: Invalid job id.
[2019-07-02 09:35:33] Share 5 submitted by thread 0, job 6b8.
[2019-07-02 09:35:33] Rejected, diff 5.47e-06, 10.842 secs, A/R/B: 1/4/0.
[2019-07-02 09:35:33] reject reason: Invalid job id.
[2019-07-02 09:35:39] Share 6 submitted by thread 0, job 6b8.
[2019-07-02 09:35:39] Rejected, diff 3.41e-05, 6.530 secs, A/R/B: 1/5/0.
[2019-07-02 09:35:39] reject reason: Invalid job id.
[2019-07-02 09:35:43] Share 7 submitted by thread 0, job 6b8.
[2019-07-02 09:35:43] Rejected, diff 6.69e-06, 3.569 secs, A/R/B: 1/6/0.
[2019-07-02 09:35:43] reject reason: Invalid job id.
[2019-07-02 09:35:46] yespowerr16 block 424023, job 6bb, network diff 0.0130
[2019-07-02 09:35:57] Share 8 submitted by thread 7, job 6bb.
[2019-07-02 09:35:57] Accepted, diff 6.85e-06, 14.276 secs, A/R/B: 2/6/0.
[2019-07-02 09:35:57] Miner 1000.22 H/s, Share 515.00 H/s, Latency 153 ms.
[2019-07-02 09:35:57] Height 424023, job 6bb, 0.05255% block share.
[2019-07-02 09:35:57] - - - - - - - - - - - - - - - - - - - - - - - - - - -
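
A small script can tally results per job id from logs like the one above. This is only a sketch: the regular expressions are written against the log lines shown in this thread, and it assumes each Accepted/Rejected line answers the most recent "Share N submitted" line, i.e. only one share is in flight at a time.

```python
import re
from collections import Counter

SUBMIT = re.compile(r"Share (\d+) submitted by thread (\d+), job (\w+)\.")
RESULT = re.compile(r"\] (Accepted|Rejected),")


def tally_by_job(log_lines):
    """Return {job_id: Counter({'Accepted': n, 'Rejected': m})}."""
    tallies = {}
    pending_job = None
    for line in log_lines:
        m = SUBMIT.search(line)
        if m:
            pending_job = m.group(3)   # remember which job this share used
            continue
        m = RESULT.search(line)
        if m and pending_job is not None:
            tallies.setdefault(pending_job, Counter())[m.group(1)] += 1
            pending_job = None
    return tallies


# A few lines copied from the session above.
sample = [
    "[2019-07-02 09:34:45] Share 1 submitted by thread 3, job 6b8.",
    "[2019-07-02 09:34:45] Accepted, diff 1.23e-05, 3.918 secs, A/R/B: 1/0/0.",
    "[2019-07-02 09:35:09] Share 2 submitted by thread 3, job 6b8.",
    "[2019-07-02 09:35:09] Rejected, diff 1.37e-05, 24.013 secs, A/R/B: 1/1/0.",
    "[2019-07-02 09:35:57] Share 8 submitted by thread 7, job 6bb.",
    "[2019-07-02 09:35:57] Accepted, diff 6.85e-06, 14.276 secs, A/R/B: 2/6/0.",
]
for job, counts in tally_by_job(sample).items():
    print(job, dict(counts))
```

A job id that shows an accept followed only by rejects, as 6b8 does here, is the signature of a job that went stale without the miner being told.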

@pinpins

pinpins commented Jul 2, 2019

Hi,

I was checking the problem. Is it for CPU algos only?

Apart from that, I also made a change for yespowerr16 so it has more bandwidth.

@JayDDee
Owner Author

JayDDee commented Jul 3, 2019

It seems to be only for CPU algos, I have seen it on yespower and argon2d, where most users
use cpuminer-opt. I don't know if anyone has seen it using other miners. I'll do some more
testing to see if things have changed.

I also noticed latency is quite high; maybe that's a factor. That could be a geographical issue, I'm in NA.

Edit: just ran a quick test, got a stale job right off the start...
Slight correction: given the time difference between job 5a8 and the rejected share, it's
likely I didn't get the notice for any jobs between 5a8 and 5ab. It is likely 5a8 was valid
at the time it was sent.

[2019-07-03 08:45:48] Starting Stratum on stratum+tcp://yespowerr16.mine.zergpool.com:6534
[2019-07-03 08:45:48] 16 miner threads started, using 'yespowerr16' algorithm.
[2019-07-03 08:45:48] Stratum difficulty set to 1
[2019-07-03 08:45:48] yespowerr16 block 424704, job 5a8, network diff 0.0101
[2019-07-03 08:47:02] Share 1 submitted by thread 9, job 5a8.
[2019-07-03 08:47:02] Rejected, diff 3.11e-05, 74.649 secs, A/R/B: 0/1/0.
[2019-07-03 08:47:02] reject reason: Invalid job id.
[2019-07-03 08:47:15] New job 5ab.
[2019-07-03 08:47:34] Share 2 submitted by thread 9, job 5ab.
[2019-07-03 08:47:34] Accepted, diff 0.000152, 31.672 secs, A/R/B: 1/1/0.
[2019-07-03 08:47:34] Miner 1017.09 H/s, Share 5143.75 H/s, Latency 258 ms.
[2019-07-03 08:47:34] Height 424704, job 5ab, 1.50786% block share.
[2019-07-03 08:47:34] - - - - - - - - - - - - - - - - - - - - - - - - - - -

@JayDDee
Owner Author

JayDDee commented Jul 4, 2019

I've looked at ways to make the miner more aggressive in looking for a new job, but nothing will help
if the new job is never received.

At this point I think I have taken it as far as I can. I have found nothing in the miner that causes
the problem and no way for the miner to mitigate it. I believe the problem originates at the pool
and must be solved there. I will help with any troubleshooting or testing that may be required.

The problem defined as I see it:

  1. Pool sends new job.

  2. Some miners don't receive new job, others do.

  3. Miners who fail to receive new job continue to submit shares using old job.

  4. Miners continue to submit stale shares until they receive notice for the next new job.
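
The four steps can be reproduced with a toy model. This is a sketch under simplifying assumptions: the pool accepts shares only for its current job, the miner submits exactly one share per job round, and a lost notice is modelled as a boolean. `Pool` and `run` are hypothetical names, not part of any miner or pool code.

```python
class Pool:
    """Toy pool that only accepts shares for its current job."""

    def __init__(self):
        self.job = 0

    def new_job(self):
        self.job += 1
        return self.job

    def submit(self, job):
        return "Accepted" if job == self.job else "Invalid job id"


def run(delivered):
    """delivered[i] says whether the i-th job notice reached the miner.

    The miner submits one share after each round; the pool's verdicts
    are returned in order.
    """
    pool = Pool()
    miner_job = None
    verdicts = []
    for got_notice in delivered:
        job = pool.new_job()               # step 1: pool sends new job
        if got_notice:
            miner_job = job                # otherwise step 2: notice is lost
        verdicts.append(pool.submit(miner_job))  # steps 3 and 4
    return verdicts


print(run([True, False, False, True]))
```

With the second and third notices lost, the miner is rejected twice and recovers only when the fourth notice arrives, which matches the pattern in the logs.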

Observations:

The job notice is not late and there is no timing issue; the notice for the job is never received
by some miners.

Running 2 miners side by side shows one failed to receive a job notice while the other received it.

The pool appears to be sending the job notices but is either failing to send to all miners or
the notice is getting lost.

Possibly unrelated observations:

Zergpool has unusually high latency from my location, around 140 ms.

Sometimes there is a long delay before receiving the first job at startup. This delay may
be the same problem, the first notice was not received and the delay was waiting for the
next job.

@pinpins

pinpins commented Jul 5, 2019

Weird, the stratum implementation does not differ much from that of any other algo. Usually such problems are related to a certain port getting spammed because its starting diff is too low. In this case, though, 1 is quite a good starting diff for yespowerr16.

Nevertheless, I have changed the network route for yespowerR16. Tell me if there is any difference; then we could blame one of the DDoS service providers.

@JayDDee
Owner Author

JayDDee commented Jul 5, 2019

Just ran a test, first share was rejected. The next job took 2 minutes and the job_id jumped by 4
suggesting I missed 3 jobs.
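
The missed-job estimate is simple arithmetic on the job ids. The sketch below assumes the pool's job ids are hexadecimal counters that increase by one per job; that is inferred from the logs in this thread and not confirmed by the pool. `missed_jobs` is a hypothetical helper.

```python
def missed_jobs(previous_job_id, next_job_id):
    """Count job ids skipped between two notices, assuming hex counters."""
    gap = int(next_job_id, 16) - int(previous_job_id, 16)
    return max(gap - 1, 0)


# Job ids from the earlier sessions in this thread.
print(missed_jobs("5a8", "5ab"))
print(missed_jobs("6b8", "6bb"))
```

A result of zero means the two notices were consecutive; anything higher is the number of notices that never arrived between them.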

It seems to be only the yescrypt and yespower algos. Other CPU algos (argon2d*, m7m)
are ok.

@JayDDee
Owner Author

JayDDee commented Jul 5, 2019

lyra2z330 also has the stale share problem.

@JayDDee
Owner Author

JayDDee commented Jul 7, 2019

Around 30 minutes ago (15:00 NY time) lyra2v3 started spitting out invalid job id rejects, 42 so far,
using ccminer.

The good news is it definitely proves it's not an issue with cpuminer-opt.

The bad news is the problem is spreading to other algos.

@JayDDee
Owner Author

JayDDee commented Jul 7, 2019

About 40 minutes later it has cleared up on lyra2v3. It clearly looks like a targeted attack
on the pool.

There's no need to keep this issue open any longer.

@JayDDee JayDDee closed this as completed Jul 7, 2019
@pinpins

pinpins commented Jul 7, 2019

Yes, I have just finished fighting off another DDoS, so I can confirm your observation.
