Merge from head fork #1

ChinChangYang · 2018-08-25T13:24:28Z

No description provided.

Pull request #1386.

* Extend GTP to add support for displaying winrates and variations from LZ while LZ is thinking.
* Use UCI format for lz-analyze and lz-genmove-analyze.
* Don't sort gtp lz-analyze output because it is not thread-safe.

Pull request #1388.

For discussion see pull request #1412.

More in line with UCI, cleaner, easier to parse, smaller code.

Don't hardcode the clang version in the Makefile.

Regression from #1388. Fixes issue #1424.

Send leelaz version embedded in the URL used to ask for a new job. Pull request #1430.

* Fix split in net_to_model.
* Add soft placement of variables.
* Fixes Windows issues.

Pull request #1443.

* Updated Mutex implementation to use TTS (test-and-test-and-set) instead of TS (test-and-set).
* Explicitly relax memory order (no behavior change, it's the default) and attempt TS before the TTS loop (improves performance in low contention locks).

Pull request #1432.

Pull request #1439.

See discussion in issue #1425. Pull request #1478.

The Alpha (Go) Zero outputs use TanH nonlinearities, not sigmoids. The code comments and variable naming refer to an earlier version that used sigmoids and that is confusing people. See issue #1484.

Pull request #1513.

* Create Debian package with cpack. We can create a Debian leelaz package by running "make package", which uses cpack.
* Find leelaz if ./leelaz does not exist. If leelaz is installed at /usr/bin, then autogtp should find it as leelaz instead of ./leelaz.
* Generate package dependency list. Use dpkg-shlibdeps to generate a better package dependency list.
* Use git tags as version strings.

Pull request #1445.

* Look for symmetrical position in cache.
* Disable NNCache symmetry in self-play, to increase randomness from rotational asymmetry.
* Only check symmetry in opening. Refactor TimeControl. Only check for symmetries in the NNCache when we are in the opening (fast moving zone). Refactor TimeControl to take the boardsize out.
* Change bench to asymmetric position. Avoids rotation symmetry speedups, as they are not typical.
* Rename rotation to symmetry, limit to early opening. Be consistent and don't call symmetries rotations. Limit the symmetry lookups to the period up to halfway through the opening (which is the first 30 moves on 19 x 19).

Based on pull request #1275, but without keeping the rotation array in every board instance. Pull request #1421.
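As a rough illustration of what a symmetry-aware cache lookup has to enumerate, the sketch below maps a board coordinate to its image under each of the 8 symmetries of a square board. The function names and the bit encoding of the symmetry index are assumptions made for this example and are not taken from the pull request.

```cpp
#include <cassert>
#include <set>
#include <utility>

// Hypothetical helper: map (x, y) to its image under one of the 8
// symmetries of a square board. Bit 2 swaps the axes, bit 1 mirrors x,
// bit 0 mirrors y. This encoding is an assumption for illustration.
inline std::pair<int, int> apply_symmetry(int x, int y, int symmetry,
                                          int boardsize) {
    assert(symmetry >= 0 && symmetry < 8);
    if (symmetry & 4) std::swap(x, y);
    if (symmetry & 2) x = boardsize - 1 - x;
    if (symmetry & 1) y = boardsize - 1 - y;
    return {x, y};
}

// Count how many distinct vertices a point maps to across all 8 symmetries.
inline int count_distinct_images(int x, int y, int boardsize) {
    std::set<std::pair<int, int>> images;
    for (int s = 0; s < 8; ++s) {
        images.insert(apply_symmetry(x, y, s, boardsize));
    }
    return static_cast<int>(images.size());
}
```

A cache that wants to reuse an evaluation for a symmetrical position would, under these assumptions, hash the position once per symmetry index and probe the cache with each hash. Symmetry index 0 is the identity, which matches the idea of a constant for the unchanged symmetry.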

Pull request #1522.

See issue #1416. Pull request #1497.

* Remove unused 'BIG' constant.
* Capture "N/A" vertex value in constant.

Pull request #1528.

Pull request #1529.

Pull request #1538.

Added q+Enter instructions. Pull request #1542.

Fix Validation checking if binary exists on Windows. Pull request #1544.

Pull request #1548.

Update the TODO list.

Pull request #1560.

Pull request #1580.

Pull request #1577.

Pull request #1605.

The real update operation should be the computation of the gradient rather than the assignment of it. Pull request #1614. Fixes issue #1502.

* Remove thread_local variables for OpenCL subsystem (this is to allow many different OpenCL implementations to exist concurrently).
* OpenCLScheduler: task queue cleanup.
* Change static Network methods to instance methods and replace them with a global Network instance.
* All weights moved from Network.cpp static variables to class Network.
* NNCache is now a member variable of Network, not a global.
* Network filename now comes from external call, not a global variable.
* Removed global g_network object; instead it is a member of the UCTSearch class.
* UCTNode is now a static member variable of GTP (instead of a static of a function).
* Rename ThreadData to OpenCLContext (it's no longer a thread-specific structure).

Pull request #1558.

Silence clang warning. Pull request #1644.

Pull request #1638.

* Winograd F(4x4, 3x3) for CPU.
* Winograd F(4x4, 3x3) for OpenCL.
* OpenCL batching support.

Pull request #1643.

The 256 channel network exceeds 1% error in the tuner, but the network output seems accurate enough during play. Fixes #1645. Pull request #1647.

Keep a single "network" global in GTP, owned by a unique_ptr and move things around when needed. Pull request #1650.

* OpenCL half precision is now a command-line option, with support compiled in by default. This converts the OpenCL code into a gigantic template library.
* Update Network self-check.
  - Final output is used for self-check.
  - Criterion is 20% error, while ignoring values smaller than 1/361.
  - Throws exception when three out of the last ten checks fail.

Pull request #1649.
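The self-check rules above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the names are hypothetical, and two details are assumptions, namely that an entry is skipped only when both values are below 1/361 and that the 20% error is measured relative to the larger of the two values.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <deque>
#include <stdexcept>
#include <vector>

// Hypothetical comparison: true when all significant outputs agree within 20%.
inline bool outputs_match(const std::vector<float>& reference,
                          const std::vector<float>& candidate) {
    constexpr float kMinValue = 1.0f / 361.0f;  // smaller values are ignored
    constexpr float kMaxRelError = 0.20f;       // 20% error criterion
    assert(reference.size() == candidate.size());
    for (std::size_t i = 0; i < reference.size(); ++i) {
        const float larger = std::max(std::fabs(reference[i]),
                                      std::fabs(candidate[i]));
        if (larger < kMinValue) continue;  // assumption: skip if both are tiny
        if (std::fabs(reference[i] - candidate[i]) > kMaxRelError * larger) {
            return false;
        }
    }
    return true;
}

// Hypothetical history: throws once 3 of the last 10 checks have failed.
class SelfCheckHistory {
public:
    void record(bool passed) {
        m_results.push_back(passed);
        if (m_results.size() > 10) m_results.pop_front();
        const auto failures = std::count(m_results.begin(), m_results.end(), false);
        if (failures >= 3) throw std::runtime_error("self-check failed too often");
    }
private:
    std::deque<bool> m_results;
};
```

Keeping a sliding window rather than failing on the first mismatch tolerates an occasional outlier while still catching a consistently broken backend.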

Slight style edits of code and comments.

Modernize some parts of SGFTree's style.

This is integrated into the main build now. Pull request #1655.

Fix a bug where an SGF file/string could not contain 2 consecutive moves of the same color. Fixes issue #1469. Pull request #1654.

Use the preprocessor defines to make a single kernel support both single precision and half precision storage. Pull request #1661.

Pull request #1660.

Pull request #1664.

Pull request #1671.

Implemented NN eval fp16/fp32 autodetect. Runs both precisions for 1 second, and if fp16 is faster than fp32 by more than 5%, fp16 is used. Removes --use-half and replaces it with a --precision [auto|single|half] option, default auto. Pull request #1657.
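The decision rule described here can be sketched as a small function. The names and the idea of passing in measured throughput figures are assumptions for illustration; the actual benchmarking code is not shown in this pull request summary.

```cpp
#include <cassert>

enum class Precision { Single, Half };

// Hypothetical decision rule: pick half precision only when its measured
// throughput beats single precision by more than 5%.
inline Precision choose_precision(double fp32_evals_per_sec,
                                  double fp16_evals_per_sec) {
    assert(fp32_evals_per_sec > 0.0);
    constexpr double kRequiredSpeedup = 1.05;
    return fp16_evals_per_sec > fp32_evals_per_sec * kRequiredSpeedup
               ? Precision::Half
               : Precision::Single;
}
```

Requiring a margin instead of a plain "faster" comparison keeps measurement noise from a short benchmark from flipping the choice toward the less precise format.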

Added resign analysis option to search for the highest resign threshold that should be set. Pull request #1606.

Use half precision computation on cards that support it. Pull request #1672.

- On OpenCLScheduler, don't use condvars, which tend to be slow because of thread sleep/wake.
- Instead, use spinlocks and just have enough contexts to avoid sleeping.
- Allow more threads than the CPU physically has. This is required in many multi-GPU setups with low core counts (e.g., quad-core non-hyperthread with 2 GPUs).

Pull request #1669.

The previous method is too strict for fp16 compute. Since the lower precision of fp16 is still good enough to play at the same strength as fp32, relax the self check. Pull request #1698.

* Fix error calculation (missing batch_size divider).
* Better error reporting when no working configuration could be found.
* Change reference data to have less rounding errors with half precision.
* Replace BLAS reference SGEMM with custom code that gives transposed output like the OpenCL SGEMM.

Pull request #1710.

Should save a tiny bit of memory. Pull request #1716.

Fall back to single precision net when half precision is broken, at least when detection mode is auto. Pull request #1726.

Pull request #1721.

Some OpenCL buffers were allocated too big. Tested with oclgrind that the new sizes are correct. Pull request #1727.

Use smaller precision to store the weights to decrease the file size. See discussion in issue #1733. Pull request #1736.

* Network initialization restructuring.
  - Create one net at a time when doing fp16/fp32 autodetect. Saves some GPU memory.
  - Create an internal lambda which initializes the nets.
  - Use std::copy to copy vectors to reduce runtime.
* zeropad_U: loop reordering for performance optimization. Plus other optimizations for zero-copying initialization.

Pull request #1750.

Minor fixes to incorrect comments, and reduce some excessively long lines.

* Changed Validation and Game to support multiple GTP commands at start up but left the Validation options untouched.
* Separated engine options (as positional arguments) from match options. Replaced time settings option with ability to specify any GTP commands.
* Added --gtp-command options using the existing option parser. Also changed default binary options from -p 1600 to -v 3200.
* Each binary argument has to be preceded by "--".
* Changes to use Engine Objects.
* Exits on failed GTP command. Added printing of GTP commands in gameStart() so users can see what commands are actually sent to each engine.

Pull request #1652.

* Don't refer to stone locations as "squares".
* Use "vertex" for those in the "letterbox" representation.
* Otherwise, mostly use "intersection".
* Also, capture all possible moves (i.e. including pass) in its own explicit constant.
* Clean up network constants.

Pull request #1723.

godmoves and others added 30 commits May 14, 2018 11:36

Add multi GPU training support.

648b230

Pull request #1386.

Extend GTP to support real time search info.

237b578

* Extend GTP to add support for displaying winrates and variations from LZ while LZ is thinking.
* Use UCI format for lz-analyze and lz-genmove-analyze.
* Don't sort gtp lz-analyze output because it is not thread-safe.

Pull request #1388.

Remove virtual loss from eval for live stats.

6e847e1

For discussion see pull request #1412.

Make analysis output use one move per line.

9fd7542

More in line with UCI, cleaner, easier to parse, smaller code.

Remove versioned clang from Makefile.

1b64435

Don't hardcode the clang version in the Makefile.

Fix varargs usage.

62ddf58

Regression from #1388. Fixes issue #1424.

AutoGTP: send leelaz version to server.

c822e5e

Send leelaz version embedded in the URL used to ask for a new job. Pull request #1430.

Multi GPU: fix split and variable placement.

8751123

* Fix split in net_to_model.
* Add soft placement of variables.
* Fixes Windows issues.

Pull request #1443.

Mutex optimization.

0300531

* Updated Mutex implementation to use TTS (test-and-test-and-set) instead of TS (test-and-set).
* Explicitly relax memory order (no behavior change, it's the default) and attempt TS before the TTS loop (improves performance in low contention locks).

Pull request #1432.
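For readers unfamiliar with the pattern, a test-and-test-and-set spinlock that tries a plain test-and-set first might look like the sketch below. The class name and the specific memory orders (acquire on the exchange, release on unlock, relaxed on the read-only spin) are choices made for this example and are not claimed to match the pull request.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical spinlock illustrating TS-then-TTS locking.
class SpinMutex {
public:
    void lock() {
        // Fast path: a single test-and-set succeeds when the lock is free.
        while (m_locked.exchange(true, std::memory_order_acquire)) {
            // Slow path (TTS): spin on a read-only load so waiting threads
            // do not keep invalidating the cache line with writes.
            while (m_locked.load(std::memory_order_relaxed)) {
            }
        }
    }

    // Returns true if the lock was acquired without waiting.
    bool try_lock() {
        return !m_locked.exchange(true, std::memory_order_acquire);
    }

    void unlock() { m_locked.store(false, std::memory_order_release); }

private:
    std::atomic<bool> m_locked{false};
};
```

Trying the exchange before entering the read loop is what helps uncontended locks: the common case costs one atomic operation instead of a load followed by an exchange.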

Update leela-zero.vcxproj for VS2015.

8daa7e7

Pull request #1439.

Add order to analysis data.

893a078

See discussion in issue #1425. Pull request #1478.

Fix misleading comments & naming.

5f8b14b

The Alpha (Go) Zero outputs use TanH nonlinearities, not sigmoids. The code comments and variable naming refer to an earlier version that used sigmoids and that is confusing people. See issue #1484.
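The distinction matters when converting the value output into a winrate. The sketch below uses hypothetical function names and assumes the raw value-head output is passed through tanh and then rescaled from [-1, 1] to [0, 1]. It also shows the identity (1 + tanh(x)) / 2 = sigmoid(2x), which may explain how the two namings were easy to mix up.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: squash the raw value output with tanh, then map the
// result from [-1, 1] to a winrate in [0, 1].
inline float winrate_from_value_head(float raw_output) {
    assert(std::isfinite(raw_output));
    return (1.0f + std::tanh(raw_output)) / 2.0f;
}

// Logistic sigmoid, included only to illustrate the identity above.
inline float sigmoid(float x) {
    return 1.0f / (1.0f + std::exp(-x));
}
```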

Add Lizzie and LeelaSabaki to README.

4c9b41c

Pull request #1513.

Symmetry calculation cleanup.

e919e1c

Pull request #1522.

Non-pruning (simple) time management.

d362ee8

See issue #1416. Pull request #1497.

Clean up some constants.

06759c1

* Remove unused 'BIG' constant.
* Capture "N/A" vertex value in constant.

Pull request #1528.

Duplicate line removal.

bcfdadb

Pull request #1529.

Script for converting minigo weights.

54e130e

Pull request #1538.

Update README.md.

eaf7707

Added q+Enter instructions. Pull request #1542.

Fix Validation checking on Windows.

1f2f3c5

Fix Validation checking if binary exists on Windows. Pull request #1544.

Constant for the unchanged symmetry index.

4531693

Pull request #1548.

Update README.md.

91031bf

Update the TODO list.

Removed unused class KeyPress.

0c23ac6

Pull request #1560.

Allow 3 AutoGTP quitting conditions.

9480689

Pull request #1580.

More draw handling.

2b37c69

Pull request #1577.

Suppress upstream warnings in Makefile.

b695170

Pull request #1605.

Fix TF update operations.

d0fd3e9

The real update operation should be the computation of the gradient rather than the assignment of it. Pull request #1614. Fixes issue #1502.

TFiFiE and others added 29 commits July 24, 2018 00:23

Give ForwardPipe a virtual destructor.

eba651f

Silence clang warning. Pull request #1644.
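The warning in question typically appears when a class with virtual functions is deleted through a base pointer without a virtual destructor. A minimal sketch, using hypothetical class names rather than the real ForwardPipe interface:

```cpp
#include <cassert>
#include <memory>

// Hypothetical polymorphic base. The virtual destructor makes deletion
// through a base pointer run the derived destructor as well.
class Pipe {
public:
    virtual ~Pipe() = default;
    virtual int forward(int input) = 0;
};

// Hypothetical implementation that records when it is destroyed.
class CountingPipe : public Pipe {
public:
    explicit CountingPipe(int& destroyed) : m_destroyed(destroyed) {}
    ~CountingPipe() override { ++m_destroyed; }
    int forward(int input) override { return input * 2; }

private:
    int& m_destroyed;
};

// Create a derived object, own it through the base type, and report how
// many times the derived destructor ran.
inline int destroy_through_base() {
    int destroyed = 0;
    {
        std::unique_ptr<Pipe> pipe = std::make_unique<CountingPipe>(destroyed);
        assert(pipe->forward(21) == 42);
    }
    return destroyed;
}
```

Without the virtual destructor in the base class, destroying the object through `std::unique_ptr<Pipe>` would be undefined behavior.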

Replace if-else chain with switch statement.

6333b66

Pull request #1638.

Use Winograd F(4x4, 3x3).

7cfbb72

* Winograd F(4x4, 3x3) for CPU.
* Winograd F(4x4, 3x3) for OpenCL.
* OpenCL batching support.

Pull request #1643.

Increase error budget in tuner.

8c65b6c

The 256 channel network exceeds 1% error in the tuner, but the network output seems accurate enough during play. Fixes #1645. Pull request #1647.

Get rid of more "network" globals and pointers.

90d4ff2

Keep a single "network" global in GTP, owned by a unique_ptr and move things around when needed. Pull request #1650.

Minor code cleanups.

d814e0f

Slight style edits of code and comments.

Clean up SGFTree style.

7524a2f

Modernize some parts of SGFTree's style.

Remove separate USE_HALF build from CI.

4b20a12

This is integrated into the main build now. Pull request #1655.

Don't assume alternating colors in SGF.

d5bf982

Fix a bug where an SGF file/string could not contain 2 consecutive moves of the same color. Fixes issue #1469. Pull request #1654.

Remove separate half precision kernel.

6f8c873

Use the preprocessor defines to make a single kernel support both single precision and half precision storage. Pull request #1661.

Compress duplicate evaluation code.

fefa8c6

Pull request #1660.

Consistent header guard naming.

2ff9539

Pull request #1664.

Replace macros with proper constants.

acf1a3f

Pull request #1671.

Resign analysis: search for the highest resign threshold.

e23009a

Added resign analysis option to search for the highest resign threshold that should be set. Pull request #1606.

Half precision compute support.

5de99db

Use half precision computation on cards that support it. Pull request #1672.

Use L2-norm in self check.

488de43

The previous method is too strict for fp16 compute. Since the lower precision of fp16 is still good enough to play at the same strength as fp32, relax the self check. Pull request #1698.
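One way to compare two output vectors with an L2 norm is a relative error over the whole vector, which is less sensitive to a single small entry than an element-wise check. The sketch below is an assumption about the general shape of such a check; the function name is hypothetical and no acceptance threshold from the pull request is implied.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical metric: ||reference - candidate||_2 / ||reference||_2.
inline double relative_l2_error(const std::vector<float>& reference,
                                const std::vector<float>& candidate) {
    assert(reference.size() == candidate.size());
    double diff_sq = 0.0;
    double ref_sq = 0.0;
    for (std::size_t i = 0; i < reference.size(); ++i) {
        const double ref = reference[i];
        const double diff = ref - candidate[i];
        diff_sq += diff * diff;
        ref_sq += ref * ref;
    }
    if (ref_sq == 0.0) {
        // An all-zero reference only matches an all-zero candidate.
        return diff_sq == 0.0 ? 0.0 : std::numeric_limits<double>::infinity();
    }
    return std::sqrt(diff_sq / ref_sq);
}
```

A caller would compare the returned value against a tolerance of its own choosing.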

Change policy vector to array.

87c95c4

Should save a tiny bit of memory. Pull request #1716.

Fall back to single precision net on breakage.

e72496d

Fall back to single precision net when half precision is broken, at least when detection mode is auto. Pull request #1726.

AutoGTP: use compressed weights networks.

681229a

Pull request #1721.

Fix OpenCL buffer sizes.

07c908e

Some OpenCL buffers were allocated too big. Tested with oclgrind that the new sizes are correct. Pull request #1727.

Script for quantizing weights.

f85a685

Use smaller precision to store the weights to decrease the file size. See discussion in issue #1733. Pull request #1736.

Fix comments, code style.

7e889c7

Minor fixes to incorrect comments, and reduce some excessively long lines.

ChinChangYang merged commit 3987f9e into ChinChangYang:next Aug 25, 2018
