New versions not working on Haswell CPUs #6

Open
NiteMoves opened this issue Oct 27, 2016 · 12 comments

Comments

@NiteMoves

commented Oct 27, 2016

Just a follow-up to what we discussed in TCEC chat.

The BMI2 and POPCNT builds work fine on Broadwell, but not on a Haswell CPU.
The POPCNT build works fine on Ivy Bridge.

Windows 10 on both Broadwell/Haswell machines. Broadwell is a laptop CPU, Haswell is desktop.

@basil00

Owner

commented Oct 27, 2016

Thanks for the report. Just to clarify: is LazyGull.exe working on Windows 10 at all, or not at all on Windows 10?

@NiteMoves

Author

commented Oct 27, 2016

Lazygull.exe works on the W10 Ivy Bridge machine.

@NiteMoves

Author

commented Dec 22, 2017

The problem appears to be a memory access error:

Version=1
EventType=APPCRASH
EventTime=131573322253748933
ReportType=2
Consent=1
UploadTime=131573322254940427
ReportStatus=268435456
ReportIdentifier=16d33475-23c7-425f-99d0-85ef66171330
IntegratorReportIdentifier=c0484487-674e-477a-b1f7-2c430e5e727e
Wow64Host=34404
NsAppName=lazygull.exe
AppSessionGuid=00004edc-0001-0003-e9db-864e3d71d301
TargetAppId=W:0006367dae34e26b8080cd105058a490da2c0000ffff!00009866fab4679673fdc91f5962e9bdf792cdde5a8d!lazygull.exe
TargetAppVer=1970//01//01:00:00:00!3faf6!lazygull.exe
BootId=4294967295
TargetAsId=9017
Response.BucketId=13b31e7fd06acf8d249545d98205fb8d
Response.BucketTable=4
Response.LegacyBucketId=1483168452780096397
Response.type=4
Sig[0].Name=Application Name
Sig[0].Value=lazygull.exe
Sig[1].Name=Application Version
Sig[1].Value=0.0.0.0
Sig[2].Name=Application Timestamp
Sig[2].Value=00000000
Sig[3].Name=Fault Module Name
Sig[3].Value=msvcrt.dll
Sig[4].Name=Fault Module Version
Sig[4].Value=7.0.16299.15
Sig[5].Name=Fault Module Timestamp
Sig[5].Value=20688290
Sig[6].Name=Exception Code
Sig[6].Value=c0000005
Sig[7].Name=Exception Offset
Sig[7].Value=000000000005bb10
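
For readers unfamiliar with the `Exception Code` field: `c0000005` is the NTSTATUS value for an access violation, which is why the report is read as a memory access error. A minimal sketch of that lookup follows. The function name `exception_name` and the short table are illustrative assumptions, not part of LazyGull or of any Windows API.

```c
#include <stdlib.h>

/* Map the hex string from a crash report's "Exception Code" field to an
 * NTSTATUS name. Only a few well-known codes are listed; this table is an
 * illustration, not a complete list. */
const char *exception_name(const char *hex_code)
{
    unsigned long code = strtoul(hex_code, NULL, 16);

    switch (code) {
    case 0xC0000005UL: return "STATUS_ACCESS_VIOLATION";
    case 0xC000001DUL: return "STATUS_ILLEGAL_INSTRUCTION";
    case 0xC00000FDUL: return "STATUS_STACK_OVERFLOW";
    default:           return "unknown";
    }
}
```

Note that an unsupported CPU instruction would normally surface as `STATUS_ILLEGAL_INSTRUCTION` rather than an access violation, so the code in this report points more toward a bad memory access than toward a missing instruction set, although that alone does not identify the cause.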

@basil00

Owner

commented Dec 24, 2017

Yep, that's not good. Might be hard to track this down, and I don't work on this project anymore. Is there a way to reliably reproduce the crash?

@NiteMoves

Author

commented Dec 24, 2017

It crashes as soon as I run the exe, so very reliably. I have had different people run the exe and it doesn't crash for everyone, but it crashes every time on the TCEC server and my workstation. Other people have mentioned it will crash if they set hash over some threshold (maybe 4GB?), but I haven't been able to test that.

I know you aren't working on this anymore, but if you can get this fixed soon, I will put this version into TCEC season 11.

@basil00

Owner

commented Dec 24, 2017

Hmmmm, OK let me try a few tests on different machines.

@basil00

Owner

commented Dec 26, 2017

I wasn't able to reproduce the crash so far. I can try some different machines next week.

@d3vv

commented Mar 2, 2018

It also crashes (segfault) on a Core i7-980X on Windows 10.
I can create crash dumps with the -g option: https://helgeklein.com/support/creating-an-application-crash-dump/

@basil00

Owner

commented Mar 3, 2018

I was never able to reproduce the bug. But I think it is likely because of the funky memory mapping idea I tried to port from Linux to Windows. Basically:

  • Global data is shared between processes using globals declared here.
  • The globals are just fixed addresses specified in the Makefile here.
  • The actual data is a shared mapping created here.

Doing it this way speeds up the code slightly by removing one level of indirection. Not sure if it translates to any measurable difference in playing strength...

Probably the constant addresses conflict with other objects on certain CPUs, causing the crash. That is my guess; it could be some completely unrelated issue...
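
The failure mode guessed at here can be sketched as follows: a request for a specific address only succeeds if that range happens to be free in the process. This is a minimal illustration, not LazyGull's actual mapping code. The helper name `map_shared_at` is hypothetical, and the mapping is unnamed, whereas the real code uses named objects.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#endif

/* Hypothetical helper: try to map `size` bytes of shared memory at `wanted`.
 * Returns the address actually obtained, or NULL on failure. */
void *map_shared_at(void *wanted, size_t size)
{
#ifdef _WIN32
    HANDLE h = CreateFileMappingA(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE,
                                  (DWORD)((uint64_t)size >> 32),
                                  (DWORD)((uint64_t)size & 0xFFFFFFFFu), NULL);
    if (h == NULL)
        return NULL;
    /* MapViewOfFileEx returns NULL if the range at `wanted` is not free. */
    void *p = MapViewOfFileEx(h, FILE_MAP_ALL_ACCESS, 0, 0, size, wanted);
    CloseHandle(h); /* the view keeps the mapping object alive */
    return p;
#else
    /* Without MAP_FIXED the address is only a hint; the kernel may pick
     * a different one if the range is already occupied. */
    void *p = mmap(wanted, size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
#endif
}
```

Calling `map_shared_at` twice with the same `wanted` address shows the problem: the second call cannot return that address, because the first mapping already occupies it. Code that dereferences the fixed address without checking the result would then touch memory it does not own.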

@d3vv

commented Mar 3, 2018

Yes, it is all about the memory-mapped files:

$ ./LazyGull.exe

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
LazyGull (180302-Windows-x86_64)
error: failed to remove object "Local\LazyGull_19280_INFO_10"

@d3vv

commented Mar 3, 2018

I have one question: how did you calculate these addresses?

-Wl,--defsym=INFO=0x51010000 \
-Wl,--defsym=SETTINGS=0x51000000 \
-Wl,--defsym=SHARED=0x51020000 \
-Wl,--defsym=DATA=0x50000000 \
-Wl,--defsym=PAWNHASH=0x54000000 \
-Wl,--defsym=PVHASH=0x58000000

Why do they differ from macOS and Linux?

And why the following:

#ifndef WINDOWS
extern GEntry HASH[];
#else
#define HASH ((GEntry *)0x8000000)
#endif
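
As a purely arithmetic aid for the constants quoted above, the sketch below computes how much address space lies between each base address and the next higher one. The region sizes themselves are not given in this thread, and the names `Region`, `REGIONS` and `room_after` are hypothetical. Whether any of these gaps relates to the crash is not established here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { const char *name; uint64_t base; } Region;

/* Base addresses exactly as quoted in the thread (Windows build). */
static const Region REGIONS[] = {
    { "HASH",     0x08000000u },
    { "DATA",     0x50000000u },
    { "SETTINGS", 0x51000000u },
    { "INFO",     0x51010000u },
    { "SHARED",   0x51020000u },
    { "PAWNHASH", 0x54000000u },
    { "PVHASH",   0x58000000u },
};
static const size_t NREGIONS = sizeof REGIONS / sizeof REGIONS[0];

/* Bytes between `name`'s base and the next higher base in the table.
 * Returns 0 if `name` is unknown or is the highest region. */
uint64_t room_after(const char *name)
{
    const Region *self = NULL;
    for (size_t i = 0; i < NREGIONS; i++)
        if (strcmp(REGIONS[i].name, name) == 0)
            self = &REGIONS[i];
    if (self == NULL)
        return 0;

    uint64_t best = 0;
    for (size_t i = 0; i < NREGIONS; i++) {
        if (REGIONS[i].base > self->base) {
            uint64_t gap = REGIONS[i].base - self->base;
            if (best == 0 || gap < best)
                best = gap;
        }
    }
    return best;
}
```

With these constants, `room_after("HASH")` is `0x48000000` (1152 MiB) and `room_after("SETTINGS")` is `0x10000` (64 KiB). A region larger than its gap would run into the next fixed address.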

@basil00

Owner

commented Mar 5, 2018

I can't remember why, but my guess is that there was a linker bug.

To "fix" the problem, it should not be too difficult to convert back to using global pointers, e.g.:

GThreadInfo *INFO;
GSettings   *SETTINGS;
GSharedInfo *SHARED;
...etc.

Then use non-fixed addresses when creating the shared mappings.
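
A minimal sketch of that conversion for one of the globals, assuming an unnamed mapping and a placeholder struct (the real `GSettings` layout and the named-object scheme are not shown in this thread). The helper names `map_shared_anywhere` and `init_settings` are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#endif

/* Placeholder layout: the real GSettings fields are not shown here. */
typedef struct { int placeholder; } GSettings;

/* A global pointer instead of a linker-defined fixed address. */
GSettings *SETTINGS = NULL;

/* Map `size` bytes of shared memory wherever the OS finds room. */
static void *map_shared_anywhere(size_t size)
{
#ifdef _WIN32
    HANDLE h = CreateFileMappingA(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE,
                                  (DWORD)((unsigned long long)size >> 32),
                                  (DWORD)((unsigned long long)size & 0xFFFFFFFFu),
                                  NULL);
    if (h == NULL)
        return NULL;
    /* No base address is requested, so the OS picks a free range. */
    void *p = MapViewOfFile(h, FILE_MAP_ALL_ACCESS, 0, 0, size);
    CloseHandle(h); /* the view keeps the mapping object alive */
    return p;
#else
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
#endif
}

/* Returns 1 on success, 0 if the mapping could not be created. */
int init_settings(void)
{
    SETTINGS = (GSettings *)map_shared_anywhere(sizeof *SETTINGS);
    return SETTINGS != NULL;
}
```

The cost is the extra pointer load mentioned earlier in the thread. In exchange, the mapping no longer depends on a particular address range being free. To share the region between separate processes on Windows, the mapping would still need a name or an inherited handle, which this sketch leaves out.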
