MiloszKrajewski/SilesiaCorpus

Silesia Compression Corpus

The Silesia corpus is a set of files with different characteristics, used to test compression algorithms.

It was once available at http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia, but that page has recently been inaccessible.

| Size (bytes) | File | Description |
|---:|---|---|
| 10,192,446 | dickens | English novels, ASCII plain text |
| 51,220,480 | mozilla | Program, UNIX executables and others, tar |
| 9,970,564 | mr | 3-D MRI image, DICOM |
| 33,553,445 | nci | Chemical database, text |
| 6,152,192 | ooffice | Windows DLL |
| 10,085,684 | osdb | Database, synthetic data, binary |
| 6,627,202 | reymont | Polish text, uncompressed PDF |
| 21,606,400 | samba | Source code and graphics, tar |
| 7,251,944 | sao | Database, star catalog, binary |
| 41,458,703 | webster | English dictionary, HTML |
| 8,474,240 | x-ray | 16 bit grayscale, DICOM |
| 5,345,280 | xml | XML files, text, tar |
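
As a quick sanity check, the sizes listed above can be compared against a local copy of the files. The following is only a minimal sketch: the helper names `check_sizes` and `zlib_ratio` and the directory name `silesia` are assumptions made for illustration, not part of this repository, and zlib stands in for whatever compressor you actually want to test.

```python
import zlib
from pathlib import Path

# Sizes in bytes, copied from the table above.
EXPECTED_SIZES = {
    "dickens": 10_192_446,
    "mozilla": 51_220_480,
    "mr": 9_970_564,
    "nci": 33_553_445,
    "ooffice": 6_152_192,
    "osdb": 10_085_684,
    "reymont": 6_627_202,
    "samba": 21_606_400,
    "sao": 7_251_944,
    "webster": 41_458_703,
    "x-ray": 8_474_240,
    "xml": 5_345_280,
}


def check_sizes(corpus_dir):
    """Return {name: (expected, actual)} for files that are missing or whose
    size differs from the table; actual is None when the file is missing."""
    problems = {}
    for name, expected in EXPECTED_SIZES.items():
        path = Path(corpus_dir) / name
        actual = path.stat().st_size if path.is_file() else None
        if actual != expected:
            problems[name] = (expected, actual)
    return problems


def zlib_ratio(data, level=6):
    """Compressed size divided by original size (lower is better)."""
    if not data:
        return 1.0
    return len(zlib.compress(data, level)) / len(data)


if __name__ == "__main__":
    corpus_dir = Path("silesia")  # hypothetical location of the unpacked files
    for name, (expected, actual) in check_sizes(corpus_dir).items():
        print(f"{name}: expected {expected} bytes, found {actual}")
    for name in EXPECTED_SIZES:
        path = corpus_dir / name
        if path.is_file():
            print(f"{name}: zlib ratio {zlib_ratio(path.read_bytes()):.3f}")
```

An empty result from `check_sizes` means every file is present with the listed size. It compares sizes only, so it does not detect corrupted content of the same length.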
