ChessData

PGN mirror. Expect duplicates, dirty data, errors, GM draws, etc. The data will probably need to be post-processed, filtered, and deduplicated.
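As a rough illustration of the deduplication step, here is a minimal Python sketch. It is not taken from this repository's scripts. It assumes every game starts with an `[Event ...]` tag and treats two games as duplicates when their movetext matches after comments, move numbers, and whitespace are stripped. The helper names (`split_games`, `movetext_key`, `dedupe`) are hypothetical.

```python
import hashlib
import re


def split_games(lines):
    """Yield one game at a time, assuming each game begins with an [Event ...] tag."""
    current = []
    for line in lines:
        if line.startswith("[Event ") and current:
            text = "".join(current)
            if text.strip():  # skip leading blank lines before the first game
                yield text
            current = []
        current.append(line)
    text = "".join(current)
    if text.strip():
        yield text


def movetext_key(game):
    """Hash the moves only, ignoring tag lines, {comments}, move numbers and spacing."""
    body = "\n".join(l for l in game.splitlines() if not l.startswith("["))
    body = re.sub(r"\{[^}]*\}", " ", body)  # drop {brace comments}
    body = re.sub(r"\d+\.+", " ", body)     # drop "1." and "1..." move numbers
    normalized = " ".join(body.split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def dedupe(games):
    """Keep the first occurrence of each distinct movetext."""
    seen = set()
    unique = []
    for game in games:
        key = movetext_key(game)
        if key not in seen:
            seen.add(key)
            unique.append(game)
    return unique


pgn = '''[Event "A"]
[Result "1-0"]

1. e4 e5 2. Nf3 Nc6 1-0

[Event "B"]
[Result "1-0"]

1.e4 e5 2.Nf3 Nc6 1-0

[Event "C"]
[Result "0-1"]

1. d4 d5 0-1
'''

games = list(split_games(pgn.splitlines(keepends=True)))
unique = dedupe(games)
print(len(games), len(unique))
```

This sketch does not handle variations in parentheses, NAGs, or games that differ only by a truncated ending, so real data would likely need a stricter normalization.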

In the news:

Command-line tools can be 235x faster than your Hadoop cluster

The first thing to do is get a lot of game data. This proved more difficult than I thought it would be, but after some looking around online I found a git repository on GitHub from rozim that had plenty of games. I used this to compile a set of 3.46GB of data, which is about twice what Tom used in his test. The next step is to get all that data into our pipeline.
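To give a feel for what streaming this data through a pipeline can look like, here is a minimal Python sketch. It assumes the goal is to tally game outcomes from the `[Result "..."]` tag, reading line by line so memory use stays flat. It is a Python analogue, not the command-line pipeline the quoted article describes, and the function names and the `root` directory argument are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# Matches a PGN tag line such as: [Result "1/2-1/2"]
RESULT_TAG = re.compile(r'^\[Result "([^"]*)"\]')


def count_results(lines):
    """Tally the value of every Result tag in an iterable of lines."""
    counts = Counter()
    for line in lines:
        match = RESULT_TAG.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def count_results_in_tree(root):
    """Stream every *.pgn file under root (a hypothetical local checkout path)."""
    total = Counter()
    for path in Path(root).rglob("*.pgn"):
        # errors="replace" tolerates the mixed encodings found in old PGN files
        with open(path, encoding="utf-8", errors="replace") as handle:
            total.update(count_results(handle))
    return total


sample = [
    '[Event "x"]\n',
    '[Result "1-0"]\n',
    '1. e4 1-0\n',
    '[Result "1/2-1/2"]\n',
    '[Result "1-0"]\n',
]
counts = count_results(sample)
print(counts)
```

Only tag lines are counted, so a result token that appears at the end of the movetext does not inflate the tally.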

Folders and files (747 commits):

Analysis
Britbase
Bundesliga
ChessNostalgia.com
ChessOk.com
Chessopolis.com
Code
Convekta
Corus
Filtered
Game
Headers
Kingbase
Npollock
Old
PgnDownloads
PgnMentor
Player
PolyGlot
Positions
RebelSite
Release/2022-10-05
Tournament
Twic
WorldChampionships
filter
.gitconfig
.gitignore
README.md
all-2400.sh
all-2600.sh
pe.sh
pe2400.sh
pe2600.sh
pe3000.sh
pgn-cleanup.sh
pgn-tmp.txt
requirements.txt
tags2400.txt
tags2600.txt
tags3000.txt
xsplit.sh

rozim/ChessData
