
Share stats for solo mining #246

Closed
JayDDee opened this issue Feb 23, 2020 · 54 comments

Comments


JayDDee commented Feb 23, 2020

This is yet another follow up to #244 & #245 to support share statistics for solo mining.

For the most part the share statistics feature works for solo mining with the limitation that "share"
related data has no meaning when solo mining. A share essentially represents a solved
block so all "share" related statistics will actually be block statistics.

Some of this info may be redundant with explicit block level stats or the share stats may be
all zero.

Though acceptable, this is not ideal. An attempt will be made to suppress irrelevant info in
the logs.

The only significant work item is to implement a new block log for solo mining. The existing
block log is generated in stratum code that is not used when solo mining.

The new block log for solo mining will have a different format than its stratum cousin and
will display different info. There is no stratum diff and no need for share estimates.
This will result in a shorter log than the stratum version.

A proposed format:

[algo]: [URL] block [work->height], diff [net_diff]
TTF @ [refhashrate] [time], net hashrate [net_hash]

All fields have the same meaning as the block-related fields of the stratum new block log
and will be displayed in the same format.

This log will be output for both getwork and GBT excluding longpoll. Longpoll will be implemented
when it can be tested.


JayDDee commented Feb 23, 2020

The summary report will need to be implemented for solo mining as it is currently generated in
stratum code. It appears it will need to be done by one of the mining threads so care will be
required to avoid race conditions.

No changes are anticipated to the report fields or format.


JayDDee commented Feb 23, 2020

Another problem was discovered during testing of v3.12.4: the miner will silently discard a share
it believes is stale without reporting it. This is because the actual RPC submission is done by
the workio thread instead of the miner thread. The miner thread reports the share submitted
because it sent the request to the workio thread. But the workio thread discarded the share
and never actually submitted it, hence no result.


JayDDee commented Feb 23, 2020

The next release will have the new block and summary logs implemented for getwork.
Minor tweaking will follow to clean up any invalid share stats, address the silent stales,
and any other improvements that become apparent.

Summary log was implemented in workio_get_work which is called by the workio thread only
for getwork. This avoids having the miner threads do it and tripping over each other.

New block log is implemented in get_upstream_work for the same reasons.

The format of the summary log is unchanged for now, the new block log is slightly modified
from the stratum version due to lack of stratum diff and job ids. Unlike the stratum version the
network hash rate is actually obtained directly from the wallet instead of being calculated by
counting new blocks over time.

[algo] [URL] block [height], diff [net_diff]
Miner TTF @ [miner_ref_hashrate] [time], net TTF @ [net_hash_rate] [time]
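
For reference, the net diff, net hashrate and height in this log come from the wallet's getmininginfo
RPC reply. A minimal sketch of parsing that reply with jansson (the field names are the standard
getmininginfo ones and the helper is hypothetical, not the actual cpuminer-opt parser):

   #include <jansson.h>
   #include <stdbool.h>

   // Hypothetical helper: pull diff, net hashrate and height out of a
   // parsed getmininginfo result object.
   static bool parse_mininginfo( json_t *res, double *net_diff,
                                 double *net_hashrate, int *height )
   {
      json_t *d = json_object_get( res, "difficulty" );
      json_t *h = json_object_get( res, "networkhashps" );
      json_t *b = json_object_get( res, "blocks" );
      if ( !d || !h || !b ) return false;
      *net_diff     = json_number_value( d );
      *net_hashrate = json_number_value( h );
      *height       = (int) json_integer_value( b );
      return true;
   }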


JayDDee commented Feb 23, 2020

cpuminer-opt-3.12.4.2 is released with initial implementation of block and summary logs for getwork.
Refinements to stats will be made in subsequent releases to ensure correctness, suppress
irrelevant info and avoid redundancy.

User feedback is welcome.


JayDDee commented Feb 23, 2020

One of the follow up issues is the silent discard of stale blocks. The first problem is that a log is only
produced if debug is enabled. That will be changed.

Another problem is there is no mechanism to communicate the error back from
submit_upstream_work of the workio thread to the miner thread that found the block.

These 2 problems combined create the silent stale scenario. Always producing the discard
log will close the loop.

Another more philosophical question is whether the block should be discarded or submitted
anyway. The penalty is trivial, just a likely futile test of the block by the wallet. However there is
also the benefit of properly recording the stale block to be included in the stats.

There are actually 2 tests affected: the stale work test which was seen in recent testing,
and a stale block test (block already solved). Each will be evaluated separately to decide whether
the error should be ignored and the block submitted anyway.

I'm leaning toward submitting just in case it might be accepted. After all the work to
find a block it seems a waste to summarily discard it. It also makes the stats more complete.

That's 2 reasons to submit. Comments welcome.

Edit:

Correction: stratum performs the stale work test but not the stale block test because stratum
doesn't support mininginfo.

There's a third reason, [stratum has no similar tests]. The tests by getwork are 1, a
comparison of g_work (global) and work (local), and 2, an RPC query for mining info to
check the solved block is still current. They are done by the workio thread immediately before
submitting the block to the wallet.
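
Roughly, those two pre-submit tests amount to something like this (an illustrative sketch with
assumed data layout, not the actual cpuminer-opt code):

   #include <stdbool.h>
   #include <stdint.h>
   #include <string.h>

   // 1. Stale work test: has the global work template changed since this
   //    share was found? Compare the first 76 bytes of the 80 byte block
   //    header, i.e. everything except the nonce.
   static bool work_is_stale( const uint32_t *work_data, const uint32_t *g_work_data )
   {
      return memcmp( work_data, g_work_data, 76 ) != 0;
   }

   // 2. Stale block test: compare the height the share was built on with
   //    the current height returned by the mininginfo RPC call.
   static bool block_is_stale( int work_height, int current_height )
   {
      return current_height > work_height;
   }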

On its surface this seems like the right thing to do, and in most cases it would be.
Preventing errors is good defensive coding and given the existence of a race condition,
the last minute test reduces the size of the window of exposure. It all sounds good.

As previously stated allowing the likely stale block to be submitted makes the stats record the
event and include it in performance reports.

I'm not quite ready to make the final decision but I haven't found a practical reason to
perform the tests.


JayDDee commented Feb 24, 2020

Just after stating stratum didn't perform the last minute stale test, I did some testing and found
out otherwise.

Scryptn2 at zergpool has a chronic stale share problem. The logs show a share submitted
for the old job seconds after the new job report.

Stratum receives the job, signals the miner threads to abort their current work and get new
work, and logs the event. There is some latency before the miner threads react; they will
finish their current hash, possibly a valid one, and submit it, then check the abort flag.

The worst case latency is the time it takes for one thread to calculate one hash. A slow algo like
scryptn2 has an unusually long hash time but it is still under a second per thread. Submitting
several seconds late can't be explained.

Another interesting observation was made. The silent discard explains why the share stats get
out of sync when mining scryptn2. The submit count gets incremented but the reply count
does not, implying an unreplied share.

Disabling the silent discard would help keep the stats in sync as well as provide a more
accurate stale share count.

Unfortunately it doesn't help with the underlying chronic stale share problem, although the
slow hash rate is highly suspected of being a factor.


JayDDee commented Feb 24, 2020

The chronic stale share problem with scryptn2 is now understood.

The scrypt code does up to 24 way hashing therefore the time to calculate one hash is the same
as for 24. This results in a hash time that could be several seconds.

This is the result of the extremely slow hashing of scryptn2 combined with the design of the
scrypt hashing code.
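
As an illustrative example (made-up numbers): if one 24-way scrypt pass takes 3 seconds, the
thread still averages 8 hashes per second, but a nonce found in that pass cannot be submitted,
and the abort flag is not checked, until the full 3 seconds are up. That latency, not the
throughput, is what makes the shares late.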

I intend no further follow up on the scryptn2 stale share problem.


JayDDee commented Feb 25, 2020

cpuminer-opt-3.12.4.3 is released. The silent discarding of suspected stale shares has been
disabled for both getwork and stratum.

It was found to cause share count mismatches in share stats that resulted in invalid data in
the logs and inaccurate performance measurements.

Submitting the suspected stale shares and letting the server reject them ensures the counters
remain synchronized and stale shares are properly accounted for in performance statistics.

One more release is possible before this issue is closed to allow for final tweaking after user
feedback.


YetAnotherRussian commented Feb 25, 2020

cpuminer-opt 3.12.4.3
...
Starting miner with AVX2 AES...
[2020-02-25 10:59:49] 24 CPU cores available, 12 miner threads selected.
[2020-02-25 10:59:49] Extranonce subscribe: YES
[2020-02-25 10:59:49] 12 miner threads started, using 'anime' algorithm.
[2020-02-25 10:59:50] Current block is 6264277
[2020-02-25 10:59:50] Switching to getwork, gbt version 112
[2020-02-25 10:59:52] anime 127.0.0.1:25555 block 6264277, diff 98.25
[2020-02-25 11:00:02] anime 127.0.0.1:25555 block 6264277, diff 98.25
Miner TTF @ 3778.68 kh/s 1d07h, net TTF @ 63.65 Mh/s 1h50m
[2020-02-25 11:00:09] anime block 6264278, diff 98.25, net 62.74 MH/s
[2020-02-25 11:00:09] anime 127.0.0.1:25555 block 6264278, diff 98.25
Miner TTF @ 3767.10 kh/s 1d07h, net TTF @ 62.74 Mh/s 1h52m
[2020-02-25 11:00:16] anime block 6264279, diff 98.25, net 62.81 MH/s
[2020-02-25 11:00:16] anime 127.0.0.1:25555 block 6264279, diff 98.25
Miner TTF @ 3770.18 kh/s 1d07h, net TTF @ 62.81 Mh/s 1h51m
[2020-02-25 11:00:23] anime block 6264280, diff 98.25, net 62.93 MH/s
[2020-02-25 11:00:23] anime 127.0.0.1:25555 block 6264280, diff 98.25
Miner TTF @ 3772.33 kh/s 1d07h, net TTF @ 62.93 Mh/s 1h51m
[2020-02-25 11:00:30] anime 127.0.0.1:25555 block 6264280, diff 98.25
Miner TTF @ 3772.61 kh/s 1d07h, net TTF @ 62.93 Mh/s 1h51m
[2020-02-25 11:00:37] anime 127.0.0.1:25555 block 6264280, diff 98.25
Miner TTF @ 3770.68 kh/s 1d07h, net TTF @ 62.93 Mh/s 1h51m
[2020-02-25 11:00:44] anime 127.0.0.1:25555 block 6264280, diff 98.25
Miner TTF @ 3772.97 kh/s 1d07h, net TTF @ 62.93 Mh/s 1h51m
[2020-02-25 11:00:51] anime 127.0.0.1:25555 block 6264280, diff 98.25

1) diff seems to be incorrect (and NET TTF of course, too):
1d solo/2h network - these are 100% incorrect with such a small block target time
2) do we really need hostname and IP over there?
3) As seen over here:
[2020-02-25 11:12:34] 2 submitted by thread 3, lane 2
[2020-02-25 11:12:34] Hash[7:0]: 00000002 1f50715e ed03d2e0 4cd1e186 8a73febf 30b45c19 f3909e2a 35dd877f
[2020-02-25 11:12:34] Targ[7:0]: 00000002 9b060000 00000000 00000000 00000000 00000000 00000000 00000000
[2020-02-25 11:12:35] 2 Accepted 1 S0 R1 B0, 470.748 sec (1081ms)
Diff 0.47118 (0.48), Block 0

we try to show block height of a found block. I think that info should be removed, as there is no info on block height at that moment.
4) 5min hashrates are always nulls:
[2020-02-25 11:46:10] Periodic Report 5m05s 46m20s
Share rate 0.00/min 0.09/min
Hash rate 0.00h/s 1245.41kh/s (3759.11kh/s)
Lost hash rate 0.00h/s 0.00h/s
Submitted 0 4
Accepted 0 2
Rejected 0 2
5) "Lost hash rate" - this is not applicable to solo mining and should be removed, I think. Solo is a lottery (even TTF is a rough estimate). Better is to replace "Share rate" -> "Block rate" in case of !stratum


JayDDee commented Feb 25, 2020

  1. I was already thinking of moving that to the summary log.

  2. lost hash rate is the hashrate equivalent of rejected and stale shares, which isn't included in
    the effective hash rate (what is actually earned). When comparing performance the statistical mean
    is:
    ref HR == effective HR + lost HR

Of course if there are no rejects the lost hashrate should be zero and therefore not displayed.
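
As an illustrative example (made-up numbers, not from this thread): at a 1000 kh/s reference rate
with 10% of shares rejected or stale, the effective hash rate should converge to about 900 kh/s and
the lost hash rate to about 100 kh/s over a long enough sample.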

I see a couple of other issues:

Duplicate new block logs, it looks like each thread reports it, easy fix.
It looks like the TTF is reversed between network block TTF and your miner's block TTF.
Can you confirm the numbers are otherwise correct?

If the data isn't correct in the block log the error will likely propagate to the share and summary
logs as well, so I'll focus on the block log first before worrying too much about the others.


YetAnotherRussian commented Feb 25, 2020

It looks like the TTF is reversed between network block TTF and your miner's block TTF.
Can you confirm the numbers are otherwise correct?

Nope. Real diff level and block times are shown in another miner and @ pool side. No way it's a day or an hour. There're some protocol dump logs in another issue, as you remember, maybe they could make the real diff level clearer.

Duplicate new block logs, it looks like each thread reports it, easy fix.

Yes, that is present in linux, too.


JayDDee commented Feb 25, 2020

In the new block log the diff and height should be correct. If not we have a big problem.
The miner hashrate used to calculate your TTF should also be correct after a few samples.

The net hashrate is probably messed up. It's a global but I treat it like a local and overwrite it when
it's scaled for display. Corrupting the net hashrate will have a trickle effect on all
info that depends on it.

That limits the value of future data collection until it's fixed. It's an easy fix if you're up to it.
Let me know if not, I can push out a release with the fix.

Define a local net_hr and use it instead of net_hashrate:

cpu-miner.c:1427

  if ( miner_hr )
  {
     double net_hr = net_hashrate;  // NEW
     char net_hr_units[4] = {0};
     char miner_hr_units[4] = {0};
     char net_ttf[32];
     char miner_ttf[32];

     sprintf_et( net_ttf, net_diff * diff_to_hash / net_hr );
     sprintf_et( miner_ttf, net_diff * diff_to_hash / miner_hr );
     scale_hash_for_display ( &miner_hr, miner_hr_units );
     scale_hash_for_display ( &net_hr, net_hr_units );
     applog2(LOG_INFO, "Miner TTF @ %.2f %sh/s %s, net TTF @ %.2f %sh/s %s",
                         miner_hr, miner_hr_units, miner_ttf,
                         net_hr, net_hr_units, net_ttf );
  }


JayDDee commented Feb 25, 2020

There is an interesting quirk in the stats for solo mining. The target diff and net diff should be the
same but in the stats they are sourced differently.

The net_diff is provided by RPC mininginfo, but the target diff is calculated by the miner from
the target hash. It makes for an interesting comparison.

Regarding point 4: 5 min hashrate always zero. This is the effective hash rate based on submitted
shares. If no shares were submitted the hashrate is zero. It's correct.

I think I've gone as far as possible with the available data. I'll wait a while longer to see if
any more is forthcoming, then I'll release a new version with a few fixes.


JayDDee commented Feb 25, 2020

There seems to be an issue with reporting stale shares, they get reported as rejected instead
of stale.


JayDDee commented Feb 25, 2020

v3.12.4.4 has some fixes to getwork stats. Please test.


JayDDee commented Feb 25, 2020

One other note is that every "Accepted" should be a "BLOCK SOLVED" when solo mining.
The reason it isn't is because the share diff is so low, it should be >= net diff. The share ratio
should also be >= 100%. It's weird that it's the same as the diff.

@YetAnotherRussian

v3.12.4.4 has some fixes to getwork stats. Please test.

I'm on it.


JayDDee commented Feb 27, 2020

I just noticed share diff is incorrect in the share result log for stratum. It should always
be >= targetdiff. This will also affect the share ratio. Test results of these fields will be invalid
until fixed. Share diff should not affect any other stats. This appears to be a day1 bug with share
stats.

Edit: I have a fix for the incorrect share diff ready to go. I'll wait for your test report first in case
you find something else that needs fixing.

I hope the next release is the last test release before declaring victory over this issue.

@YetAnotherRussian


Seems that there are still some reports of getting a job for the same block height


JayDDee commented Feb 28, 2020

Just released v3.12.4.5 with some more fixes. I'll take a look at your logs from 3.12.4.4.

Edit: Repeated block logs for the same block. I think I know what that is, probably just new work,
not a new block. Should be easy to fix.

What's your opinion of new work? Should it be logged like stratum logs new jobs?
Or is it too much noise? It seems a little too frequent, I'll wait to see what the problem is.

How much are the TTF estimates off by? How long does it usually take for you to get a block?
How often does the network issue a new block?

@YetAnotherRussian

I'm able to solve pretty fast. I've just seen your new release, and already have new logs from it:

log_9.txt

There's something here:
[2020-02-28 11:05:45] 1 Submit diff 0.2612, block 6273249
[2020-02-28 11:05:46] Block 6273249 already solved, current block 124651581
[2020-02-28 11:05:47] 1 Accepted 1 S0 R0 B0, 717.115 sec (2043ms)
Diff 0.2612 (0.22), Block 0

That was not expected :D


YetAnotherRussian commented Feb 28, 2020

What's your opinion of new work? Should it be logged like stratum logs new jobs?
Or is it too much noise? It seems a little too frequent, I'll wait to see what the problem is.

I think we should log new block number only once.

How much are the TTF estimates off by? How long does it usually take for you to get a block?

My real estimate is 4-5 blocks in 30min @ ~75Mh/s nethash

How often does the network issue a new block?

Explorer #1: https://animeco.in/explorer
Network: 67.61 MH/s
Difficulty: 118.48
Avg. Block Time: 30.68 seconds
Explorer #2: https://miningbase.tk/explorer/ANI

@YetAnotherRussian

Sorry, it seems that cpuminer-opt displays correct diff, e.g. "New block 6273277, diff 118.48", only the NET TTF and miner TTF values are incorrect.


JayDDee commented Feb 28, 2020

That's good data. It looks like a false positive for the stale block test, probably because of invalid
current block height.

I'll double check the TTF calculations, the miner TTF should be the same as stratum, the
net TTF is reversed.

For getwork the net hashrate is provided via RPC and the TTF is calculated.

For stratum the net TTF is calculated by counting the new blocks over time and converting
to a hash rate. I have not been able to verify the net hashrate independently. I suspect
it's not correct.

However, the share TTF and block TTF for stratum are correct so it's just a matter of doing
the same thing for getwork.
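
For reference, the TTF arithmetic in both cases reduces to expected hashes divided by hash rate.
A minimal sketch (the helper name and the 2^32 scaling constant are assumptions for illustration;
the real code uses its own diff_to_hash constant as in the snippet posted earlier):

   // Illustrative: expected time to find (in seconds) at a given hashrate.
   // Assumes btc-style difficulty where diff 1 is about 2^32 hashes.
   static const double hashes_per_diff = 4294967296.0;   // 2^32

   static double expected_ttf( double diff, double hashrate )
   {
      if ( hashrate <= 0. ) return 0.;
      return diff * hashes_per_diff / hashrate;
   }

   // Miner TTF uses the miner's reference hashrate; net TTF uses the network
   // hashrate from mininginfo (getwork) or the block-counting estimate (stratum).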

Block is still reported as "Accepted" not "BLOCK SOLVED", need to look into that too.

There's a lot more to look at. Let me know of other interesting stuff you find.


YetAnotherRussian commented Feb 28, 2020

And this is the real and correct TTF (NET TTF):

[screenshot: block explorer network TTF estimate]

So, the miner TTF in case of 4Mh/s should be around 8min or something. This confirms my stats (4-5 blocks in 30min @ ~75Mh/s nethash - written above).

Let me know of other interesting stuff you find.

Yep, I will. Currently switched to v3.12.4.5.


JayDDee commented Feb 28, 2020

About logging new block, part of the issue is distinguishing a new block from simply new work.
It should just say "New work" like stratum's "New job".

Usually when solo mining the output is pretty sparse and the new work logs can provide
a heartbeat to reassure the user that things are working. In this case the coin has a high block
emission rate so I shouldn't make the decision based on 1 coin. A coin with a 5 minute block
rate would be more appropriate as a guide.


YetAnotherRussian commented Feb 28, 2020

the new work logs can provide
a heartbeat to reassure the user that things are working

Previously, it was a per-thread hashrate. But yes, thread count is increasing every year, so it is totally unacceptable to use such an output as a heartbeat in case of Threadripper 3990X or similar systems... Too many new lines.

It should just say "New work" like stratum's "New job".

Seems to be a good solution.

In this case the coin has a high block
emission rate so I shouldn't make the decision based on 1 coin. A coin with a 5 minute block
rate would be more appropriate as a guide.

Definitely we should not base this on one coin. But the current tendency is to reduce block times (1-3 minutes or less). The goal is to have fast transactions while maintaining a reasonable count of block confirmations. So about 5 minutes... this is too much, and may become an edge case sometime soon. I recommend taking 2 minutes as a reference point.


JayDDee commented Feb 28, 2020

I think the issue is the scan time. You're seeing the heartbeat every 5 seconds, the scantime
for getwork. Stratum gets a new job about once every minute, the scantime for stratum.

There are currently no checks of the new work to see if it's actually different. I can add a check
and that should reduce the heartbeat frequency. Some new block logs will become new work
logs and some might disappear.
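
A rough sketch of such a check (illustrative only; the struct and field names are stand-ins, not the
actual cpuminer-opt work struct):

   #include <stdint.h>
   #include <string.h>

   // Minimal stand-in for the miner's work record, for illustration.
   struct work_hdr
   {
      int      height;      // block height this work was built on
      uint32_t data[20];    // 80 byte block header
   };

   enum work_change { WORK_SAME, WORK_NEW, WORK_NEW_BLOCK };

   // Classify freshly polled work: a new block, new work for the same
   // block, or nothing new at all (in which case nothing is logged).
   static enum work_change classify_new_work( const struct work_hdr *new_w,
                                              const struct work_hdr *old_w )
   {
      if ( new_w->height != old_w->height ) return WORK_NEW_BLOCK;
      if ( memcmp( new_w->data, old_w->data, 76 ) ) return WORK_NEW;
      return WORK_SAME;
   }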

It's becoming clear another test release will be required to test the next batch of fixes.

I have a fix for invalid block height. It should fix the false positive stale block pre-test.

I'm stuck with the TTF problem, the code is identical to how it's done for stratum. This could
be a tricky one.

Unstuck.

I found a difference with the stratum share TTF. It uses targetdiff which is the stratum diff
adjusted by a target factor hard coded for each algo. It's adjusted transparently for stratum
share TTF but missed on net TTF or getwork block TTF. It's a bit speculative and will
definitely need your testing to confirm.

Edit: Using the target factor didn't work for stratum net TTF, it was ridiculously low.
So it wouldn't work for getwork either. I have another possible fix for getwork using the
target in the getwork data (target and targetdiff are discussed in #251) instead of the net_diff.
It shouldn't make a difference theoretically but it might be worth a try.

That's 3 new fixes so far and I haven't looked at your attachments yet.


JayDDee commented Feb 28, 2020

I see the end for this issue. Its main focus was on share stats specific to getwork, which
includes one new log and verifying data is correct in existing logs.

There are 3 remaining issues:

  1. Repeated new block logs. The next release will have a fix so it works like stratum.
    Any remaining concerns with verbosity should be tracked by a separate issue because it also
    includes stratum, block emission rates, new work/job rates, and scan time. A throttling mechanism
    may be required for certain low diff coins with very short block times.

  2. The stale block pre-submit test is using invalid block height data obtained from the
    mininginfo RPC call. A new debug log is introduced to display mininginfo when -D option
    is used. If the problem persists the test will be disabled as it is of little value.

  3. TTF estimates are incorrect for getwork. For stratum the share TTF is correct but the
    net TTF has not been verified. TTF is still a work in progress and not specific to getwork.
    It probably deserves its own issue for tracking.

A new release will be available in a few hours. It will likely be the last test release to deal with
getwork specific stats. A final tweak and version bump may come after that to officially close
this chapter.


JayDDee commented Feb 28, 2020

FYI. I think I just saw the idle problem with ccminer and powershell. I've only recently started using
powershell and never saw the problem before with cmd.exe. I think it's a Windows/powershell issue.
It isn't just the CPU that idles, the GPU does as well.


JayDDee commented Feb 28, 2020

v3.12.4.6 is released with a couple of fixes and a couple of new debug logs. Please test with -D.

The 3 areas of focus:

  1. New blocks. There should be no repeated new block logs. Some debug info will be displayed
    to help me understand what data changes when new work arrives for the same block.
    I'm thinking specifically about ntime, it could be useful to include that in the new block and
    new work report, maybe for stratum too. It may also lead to a more efficient test where only
    ntime is checked instead of the entire 80 byte block header.

1b. Also with the new block log is the TTF problem. This may be due to conflicting data
between network difficulty and target, described in point 3 below. I've added both network diff and
target diff to the new block log for comparison to see which is correct. That info will also
be useful for point 3.

  2. Share submission. Not really a fix but more debug info to help understand the stale block
    false positive. The false positive implies that mininginfo may not be working correctly.
    If a solution to the stale block test isn't apparent with the debug info the test will be disabled.
    If problems with mining info are confirmed it will be tracked by a separate issue.

Edit: Ignore the following, the results will be invalid due to a bug in calculating share difficulty.

3. Share result. There seems to be some confusion about the target to solve a block. This
is usually determined by the network difficulty but blocks with lower difficulty are being submitted
and accepted. The miner's hash test uses the 256 bit target to do a direct comparison with
the 256 bit hash. The share diff will now be calculated using the target diff instead of network diff.
A successful test is if the accepted blocks are reported as "BLOCK SOLVED" and the share ratio
is > 1. There isn't much else to try if that doesn't work.

Looking forward to your test report, we're getting close to the end.


JayDDee commented Mar 1, 2020

Here's another test suggestion for the new block, new work logs.

The default scan time is 5 seconds. In your last test new block messages were displayed
every 7 seconds regularly. It is likely it was displayed every poll, whether or not there was
anything new. V3.12.4.6 should address that part and should only log when there is new work.
Any poll with nothing new will not display a log. You can confirm the logging is correct by
reducing --scantime to guarantee polling faster than new work. The log frequency should not
change to match the scan rate, it should match the actual new work rate.


JayDDee commented Mar 1, 2020

I've made a lot of progress with sharediff & targetdiff and I think I have it all working for the
next release.

I've also given up with the stale block test. It's redundant with the stale work test.

Everything should work now. Well almost everything. I'm still not 100% confident with the
network difficulty and network hashrate.

With getwork the network hashrate and network difficulty are both provided via RPC mininginfo.
The reliability of mininginfo has not been verified.

For Stratum only the network difficulty is provided. The network hash rate is calculated from the
diff and the miner counting blocks over time. It should get more accurate as session time increases
as long as there are no large changes in network diff. But this hasn't been verified.
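
That estimate boils down to something like the sketch below (illustrative only, assuming btc-style
difficulty where one unit of diff is roughly 2^32 hashes):

   // Illustrative: estimate the network hashrate from the network diff and
   // the average interval between new blocks seen during the session.
   static double est_net_hashrate( double net_diff, double session_secs,
                                   int blocks_seen )
   {
      if ( blocks_seen < 1 || session_secs <= 0. ) return 0.;
      double avg_block_time = session_secs / blocks_seen;
      return net_diff * 4294967296.0 / avg_block_time;   // 2^32 hashes per diff
   }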

I have been able to verify share diff is correct and share targeting is precise so all share
related statistics should be accurate.


JayDDee commented Mar 1, 2020

cpuminer-opt-3.12.5 is released. This one should do it, everything should work correctly.

Please report any problems.

In addition to testing for any previously identified problems I have one specific request:
Run a test with -D to get mininginfo output and post the log. Hopefully it will include solved
blocks. It would also be nice, in a strange way, to have a few stale blocks to confirm their
legitimacy, although excessive stale blocks is a problem in itself.

You can even double check the math is correct for hashrate and TTF estimates if you want.

There may be a final tweak but if stats are working properly for getwork this issue can finally
be closed.


YetAnotherRussian commented Mar 2, 2020

I'm on it. Net TTF and miner TTF seem to be OK now.

Will use v3.12.5 with -D and output to a file.

UPD: here it is:
3_12_5_solo_log.txt

Well, I do not see any issues (except no "BLOCK SOLVED" or stale info, and a small race in affinity logging in the very beginning). The lack of proper nethash rounding (should be …) is not an issue, as it is used only with -D I guess.

This

[2020-03-02 10:14:27] 8 Submit diff 5.9678, block 6281908
[2020-03-02 10:14:28] 8 Accepted 6 S0 R2 B0, 447.622 sec (1081ms)
Diff 5.9678 (0.0886), Block 0
[2020-03-02 10:14:33] Mining info: diff 67.353, net_hashrate 39096708.000000, height 6281908

block was not rewarded, I don't know why. Anyway, that should not be a cpuminer-opt issue.


JayDDee commented Mar 2, 2020

The last summary log reported 10 accepted, how many were rewarded?

The -D did its job and confirmed mininginfo was correct.

The first thing I noticed is the net diff and targetdiff are different. That explains why "Accepted"
instead of "BLOCK SOLVED". Share diff must be >= net diff for BLOCK SOLVED.
But I don't understand why they are different and why blocks were rewarded with diff < netdiff.


JayDDee commented Mar 2, 2020

I found the problem with zero block number in share_result log.

The bigger question, I'm not sure if it's a problem or a misunderstanding, is the different values
of net diff and target diff. It is the cause of other symptoms like Accepted instead of BLOCK
SOLVED.

Net_diff is provided via RPC mininginfo and is supposed to be the minimum diff to solve a block.

Target diff is calculated from the 256 bit hash target in the work struct and represents the
minimum diff for a share.

When pool mining they are expected to be different, but when solo mining there is no share so the
target diff should be what is needed to solve a block.

That's the theory. The data show otherwise.

"Shares" were submitted that passed the target test, were accepted and resulted in a block
reward. That means the target diff was good enough for a block.

Your session was 136 minutes and you submitted 15 blocks, for a TTF of around 9 mins.
The effective hash rate shows you significantly underperformed.

The miner TTF is based on the target diff and estimated around 4m45s. This is consistent
with your actual TTF of 9 mins while underperforming.

I can find no inconsistencies in the data, everything indicates target diff is used by the server.

If target diff is what is required to solve a block it should be the same value as net diff,
but it isn't.

Why are they different? If target diff is the diff to find a block what is net diff?

I can make it work without answers to those questions by just ignoring net diff and using
target diff as I would use net diff. But I don't like doing that, I prefer to understand.

I can do some more math, verifying net hash rate and TTF looking for discrepancies.

I'm curious about the rejects, none were reported as stale. But the reject at 9:51:00 was
clearly stale. I'll need to follow up on that too.

I don't see a performance issue that would suggest an invisible problem. The effective hash rate
and the submit rate are consistent with each other, and compared with the reference hashrate are
consistent with bad luck. More testing would confirm that it is a luck issue and there was no
bias. -D is no longer necessary.

If you have any insights or find anything else worth reporting, please do so.

That's 3 items so far:

  1. fix share result log block height always zero
  2. Investigate why net diff & target diff are different when solo mining
  3. Investigate rejected/stale shares


JayDDee commented Mar 3, 2020

Something is not right. The block explorer for Anime says it's at block 4,539,011 but your logs
say 6281789. Are you really mining Anime? I need the right block explorer to verify the data.

https://ca.advfn.com/crypto/Animecoin-ANI


JayDDee commented Mar 3, 2020

I tracked the block numbers reported in the logs and everything looks ok. New blocks
are around 20 seconds with an occasional new work in between.

What is weird is the polling rate. Getwork requests occur every 7 seconds, same as previous
versions, but the scan time is 5 seconds. I don't see that as a big problem but the 2 seconds
is precise and unexplained.

Can you reduce the scantime to 4 seconds (--scantime 4) to see if that changes the timing of the
logs? I want to confirm the poll time is an offset of the scantime so I know whether to look for an
explanation for 2 seconds or the full 7 seconds.


JayDDee commented Mar 3, 2020

A summary of the rejects.

9:46:26 new block 845, targetdiff .2631
9:46:27 submit block 845, diff .80023
9:46:28 rejected block 845
9:46:33 new work (not a new block, still 845)
9:47:01 new block 846

Both the block and diff were good so it wasn't stale or low diff.

9:50:38 new block 853
9:50:57 submit block 853
9:50:59 new block 854
9:51:00 rejected

This was clearly a stale share, a new block was received between submitting the nonce and
getting the response. But it wasn't reported as stale.

10:49:40 new block 961, target .28873
10:50:52 submit block 961 diff .93631
10:50:53 rejected 961
10:50:57 new block 962

Not stale, not low diff

10:59:56 new block 971, diff .28873
10:59:57 submit block 971, diff .289
10:59:58 rejected 971
11:00:03 new work (not a new block)
11:00:24 new block 972

This was not stale but could be low diff. The share diff is very close to the target, it could be a math
error. The actual test is done on 256 bit integers, the diff is a double precision floating point.
An int256 is around 77 decimal digits, a double has about 15-16 significant decimal digits, and the math
to convert is truncated to 64 bits or about 19 decimal digits. The precision should still be pretty good
but I recall from school there are many sources of error and some can be magnified by orders
of magnitude as they propagate. It's possible the share was higher than the target (lower is better)
but when converted to diff (higher is better) the share looks valid.

11:22:06 new block 19
11:22:09 submit block 19
11:22:10 rejected block 19
11:22:13 new work
11:23:37 new block 20

Not stale not low diff.

In general it's looking pretty good. The new blocks and new work are reported correctly and the
scan time, other than the mysterious extra 2 seconds, is working well, only 1 real stale block.

But there are some items to follow up:

  1. Stale block reported rejected.

I may be able to find a workaround to determine if the share was
stale if I compare the block submitted with the current block. That will shift the window of
uncertainty. Instead of stales falsely reported as rejects, rejects might be reported as stale.
But it's a much smaller window and lower probability.

  2. The unexplained rejects.

We can't explain them if we don't know the reason. Unfortunately
getwork doesn't give a reject reason so it's up to the miner to try to figure it out. Once stale
is excluded low diff is the next suspect. If the miner knows it's low diff it will display the raw
hash and target for a direct comparison. But there was no reason so no debug data.
I can change that to display the hash for any reject. (coding done).

It would be a waste of time to retest the hash using the same test procedure so it has to be
manually verified. The hash test may be innacurate which leads to the next item.

  3. Accuracy of hash test.

I've expressed my concerns with the accuracy of the hash test. The test previously had a margin
of error built in. The 256 bit target had the lower 192 bits zeroed to make the target lower
(more difficult). This would cover up any systemic error in the actual hash test but could result
in discarding good shares.

I use the same logic as with stale shares. Don't pre-judge your work, submit it and let the server
deal with it. There's potential gain with nothing to lose.

Ideally the test should be precise and accurate, but that might not be possible. I can take
another look.

I might be able to create a mechanism like I did for stales where I set a flag that the share
might be stale. If it's ultimately rejected I assume it was in fact stale.

For share diff I could set an error tolerance. If the share is within the margin of error I set
a flag. If the share is rejected and the marginal diff flag is set I assume it was in fact low diff.

This will require some tuning and a better understanding of the exact error as well as coordination
with the stale flag. I will likely open a new issue for this.
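
A rough sketch of that flag-and-reclassify idea (illustrative only; the struct and counter names are
assumptions, not existing cpuminer-opt code):

   #include <stdbool.h>

   // Flags recorded when the share is submitted.
   struct share_flags
   {
      bool maybe_stale;     // newer work/block arrived before submission
      bool marginal_diff;   // share diff within the error margin of the target
   };

   // On a reject reply, reclassify using the flags instead of counting a
   // generic reject.
   static void classify_reject( const struct share_flags *f, unsigned *stale_cnt,
                                unsigned *lowdiff_cnt, unsigned *reject_cnt )
   {
      if      ( f->maybe_stale   ) (*stale_cnt)++;
      else if ( f->marginal_diff ) (*lowdiff_cnt)++;
      else                         (*reject_cnt)++;
   }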

The 2 other items are awaiting more info to explain the block height confusion and to
modify the scantime to see if it affects the 7 second getwork poll.


JayDDee commented Mar 3, 2020

Follow up to the accuracy of the hash test.

The test itself is 100% accurate with 100% precision because it does a 256 bit
integer comparison between the hash and the target.
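
For reference, that comparison looks something like the sketch below, in the spirit of the usual
fulltest() style check: both values are treated as eight 32-bit words, least significant word first, and
compared from the most significant word down (illustrative, not the exact cpuminer-opt code):

   #include <stdbool.h>
   #include <stdint.h>

   // true if hash <= target, i.e. the share/block is valid.
   static bool hash_meets_target( const uint32_t hash[8], const uint32_t target[8] )
   {
      for ( int i = 7; i >= 0; i-- )
      {
         if ( hash[i] < target[i] ) return true;    // clearly below the target
         if ( hash[i] > target[i] ) return false;   // clearly above the target
      }
      return true;   // exactly equal to the target
   }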

This makes it unlikely that reject 4 was low difficulty. A precise test with unconverted data
should give a precise result. It may be a coincidence the share was so close to the target.
But it's too soon to draw any conclusions yet.

More testing is required to see if there are any other rejects with share diff close to target diff.

I'm abandoning trying to detect marginal shares, it might be trying to be too clever.


YetAnotherRussian commented Mar 3, 2020

@JayDDee There're 2 block explorers I've mentioned above (https://animeco.in/explorer and https://miningbase.tk/explorer/ANI). This is to check the block height to compare with my logs. I don't know where the one you shared was taken from (it seems to be outdated).

I'm going to provide a new (longer period) session with maxing-out the debug info. This will require making a ZIP archive btw.

Can you reduce the scantime to 4 seconds (--scantime 4) to see if that changes the timing of the
logs?

Will add to CLI.

The last summary log reported 10 accepted, how many were rewarded?

Got rewards for 9 out of 10. When I have the logging done, I'll share the reward log as well (so it may be compared to the log via timeline). Thanks.


JayDDee commented Mar 3, 2020

Debug is only needed for a short test time to measure the poll time with different --scantime.
Once you see if the poll time was affected the test is complete and -D isn't needed anymore.
The diff and block numbers and ntime are all displayed in normal logs and tell the whole story.

I prefer you do the long test without debug, it just makes more logs to search through.


YetAnotherRussian commented Mar 3, 2020

So, you have no interest in --protocol-dump to see the received & sent info itself? I'll skip this parameter then.

@YetAnotherRussian

Got three blocks in a row (almost), all got rewarded. So, I'll post it anyway.
3_12_5_solo_log_new.txt


JayDDee commented Mar 3, 2020

Correct, I didn't mention protocol dump earlier because I thought you found it useful.
As far as debug is concerned it did its job in v3.12.5. I got the data I was looking for.

I looked at the explorer and the data looks good. The block height, net diff, net hash rate,
block time are all in agreement assuming hash rate and diff rose a little since your test.

You can verify the TTF etc live.

Edit:
The poll time in your new test is 6 seconds, reducing the scantime had a direct effect.
Now I have to search the code for 2 seconds.

The block explorer confirmed the net diff but it isn't used to set the target. I still don't
understand that.


YetAnotherRussian commented Mar 3, 2020

Some info may be found here:

https://github.com/Animecointeam/Animecoin/blob/master/src/rpc/mining.cpp

Block creation method, consensus settings and chain params are split between several header files.

I can also find several reject reasons over there:
return "duplicate";
return "duplicate-invalid";
return "duplicate-inconclusive";
return "inconclusive";

Checks and POW part itself:
https://github.com/Animecointeam/Animecoin/blob/master/src/pow.cpp

@YetAnotherRussian

Big log:
logs.zip
There is also a log with rewards & their timeline inside.


JayDDee commented Mar 3, 2020

First impression of the logs:

Effective hash rate is too low. It quickly converges to around 1800 kh/s but the ref rate is 3780 kh/s.
However the share rate is precisely correct, the miner TTF is 6 mins and the share rate is .1/min.
The net TTF is also correct with a mean of 29.8 secs over the session.

I've never noticed a hashrate problem with stratum mining so it may be a getwork issue.
If you have an opportunity could you try anime in a pool to see if it's a coin issue or getwork
issue. If you have other wallets lying around you could try a small solo test. There would be no need to wait for a block, just long enough for the effective hash rate to show convergence.

The reject rate isn't alarmingly high, once the stales are factored out it should be even lower.
A quick scan of the rejects found none that were suspected of being low diff. Troubleshooting
rejects is more difficult because the reason is not provided in the reject message. I don't think I'll
be pursuing it any further.

I have fixes ready for block 0 in the share result log and better detection of stale shares
but I'm going to pore over the logs looking for anything else that seems off and to follow up
on the other identified issues before releasing.


JayDDee commented Mar 4, 2020

I checked the effective hash rate calculation. It's the same code as stratum uses and it reports
the correct rate.

Ignore the following, there is a real problem of low difficulty shares. The low effective hash
rate must have a different cause.

start ignore

One possibility is the target is too low and shares that would be accepted may be discarded.
This is the opposite of low difficulty shares and it is silent with no rejects or other error messages.
You can test for this by forcing the targetdiff lower with a cli option.

If you add -m 0.9 it will reduce the target diff by 10%. If you mine with this setting you may submit
low diff shares that will be rejected. This will not hurt performance, the miner will just be submitting
more shares. It's the rate of low diff rejects and the effective hash rate that will determine if
the target was set correctly.

If the target diff is reduced by 10% you should expect 10% rejects for low diff. The effective hash
rate will remain the same and the lost hash rate will increase.

If there are no rejects, or less than 10%, it means the lower diff shares are being accepted.
You would then see the effective hash rate increase.

This kind of test can be run for a long time, or for as long as it takes to draw a conclusion.
It can be done with any multiplier. Over a long enough time you could focus precisely on the
exact acceptable target diff and how close it is to 100%.

Don't go higher than -m 1.0, you start to lose performance.

If you do it please report your results.

end ignore

Edit: I have reason to suspect the target diff is actually too low, which accounts for some
of your rejects.

R1: diff .35401, target .31496
R2: diff 1.6286, target .34659
R3: diff .38527, target .36654
R4: stale
R5: diff .52946, target .40335

A32: diff .42385, target .28476

A32 is the lowest diff share accepted, R1, R3 are the 2 shares with the lowest diff and both were
rejected.

R5 has a higher diff than the lowest accepted, but also has a higher target.

This is all evidence of incorrect targeting, specifically targeting too low, causing low diff rejects.
I recommend continuing testing without changing the target diff and noting the diff and target
of all rejects as well as the lowest accepted shares for a particular target. The ratio should
converge to the actual acceptable target.

The ratio is important because the target changes. From the samples available, 1.31 is rejected,
1.48 is accepted. I have not seen this problem mining other coins with stratum. It's either
a getwork issue or a coin issue.


JayDDee commented Mar 5, 2020

I tested target_to_diff and diff_to_target and the results were interesting.

diff_to_target is accurate to 50 bits or around 15 decimal digits.

target_to_diff is accurate to 4 decimal digits.

The lower precision of target_to_diff affects mostly stats not actual mining. The target as received
from the server is used to test the 256 bit raw hash.
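
For illustration, this is roughly what a target-to-diff conversion in doubles looks like and where
the precision goes: only the high-order words of the 256 bit target survive the 53 bit double
mantissa. The helper below is hypothetical, not the actual cpuminer-opt target_to_diff:

   #include <stdint.h>

   // Approximate difficulty from a 256 bit little-endian target (eight
   // 32-bit words), using only the upper 128 bits.
   static double target_to_diff_est( const uint32_t target[8] )
   {
      double t = 0.;
      for ( int i = 7; i >= 4; i-- )       // fold the top 4 words into a double
         t = t * 4294967296.0 + target[i];
      if ( t == 0. ) return 0.;
      // diff 1 target is 0xffff * 2^208; relative to the 128 bits kept above
      // that is 0xffff * 2^80.
      return 65535.0 * 1.2089258196146292e24 / t;   // 2^80 ~= 1.2089e24
   }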

getwork gets the target directly from the server, no conversion is necessary.

With conversion not a factor in getwork mining it's still a mystery how you are experiencing
targeting that appears to be both too low and too high.

I need to analyze the logs in more detail to see if there's a pattern.


JayDDee commented Mar 6, 2020

cpuminer-opt-3.12.6 is released. It contains 2 fixes for this issue:

  1. Block number always zero in getwork share result log.
  2. Stale shares in getwork reported as rejected.

There are a few other enhancements to the logs not directly related to this issue but may help
troubleshooting.

Two outstanding problems are not resolved:

  1. Low effective hash rate. The effective hash rate is significantly below the reference hashrate
    but the share submission rate is exactly as estimated. That inconsistency can't be explained.
    The same code is used for stratum and it is correct. I have no leads to follow up; no more
    investigation is planned without new information.

  2. Apparent low difficulty shares. Some shares were rejected that appeared to be low difficulty.
    Fixes will improve log reporting of rejects and may provide new information for this problem.
    If that is the case a new issue should be opened to follow up.

I intend to close this issue if the fixes are confirmed and there are no regressions.
The main goal of this issue was to implement stats for getwork. That has been accomplished;
anything else is an unrelated problem only discovered because of the improved logs.


JayDDee commented Mar 6, 2020

Here's a test I did on stratum to verify the target.
Use -D to enable hash & target logs for every share.

Do a control run and wait for a block to be submitted. Accepted or not, the hash and target will be displayed. It reads like a 256 bit hex integer that is truncated. A valid share is one where the hash
is less than or equal to the target. If it's greater than the target it's considered low difficulty.

Take note of the target value, it is the target set by the server and this is the value to be verified.
Once the reference target is known run a test with -m 0.5. This will result in submitting lower diff
hashes. As blocks are submitted note the highest hash accepted and the lowest hash rejected.
If the target is correct it should converge to the reference diff from the control test. The point of
convergence is the correct target as hashes above it are rejected and hashes below it are accepted.

If it's off by a significant amount it will cause low diff rejects (target too high) or performance loss
(target too low). You seem to have symptoms of both so the results should be interesting.


JayDDee commented Mar 7, 2020

Please let me know if you plan on doing any more testing, I'll wait for your results.

Otherwise I'll close the issue because I have no plans to pursue this any longer without new data.
