Andrew Krizhanovsky edited this page Nov 4, 2018 · 12 revisions

A series of SQL scripts for working with the MySQL database wcorpus.

You can experiment with a small version of this database, which contains texts by three writers.

Number of authors: 2264.

use wcorpus;
SELECT COUNT(*) FROM authors;

IDs of authors with parsed texts:

select distinct author_id from texts,sentences where sentences.text_id=texts.id and included=1;
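
The comma-style join above can also be written with an explicit JOIN, which keeps the join condition separate from the filter. This is only a sketch: it assumes that `included` is a column of `texts`, as the text-counting queries on this page suggest, and the aliases `t` and `s` are introduced here for readability.

```sql
-- Equivalent form of the query above, using an explicit INNER JOIN.
-- Assumption: `included` belongs to `texts`, not to `sentences`.
SELECT DISTINCT t.author_id
FROM texts AS t
JOIN sentences AS s ON s.text_id = t.id
WHERE t.included = 1;
```

Qualifying `included` with the table alias also avoids an ambiguity error if both tables ever had a column with that name.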

Three authors have parsed texts; their author_id values are 423, 62 and 298 (Fyodor Dostoevsky, Leo Tolstoy and Anton Chekhov).

Number of texts written by these three authors: 2651.

select count(*) from texts where author_id in (62, 298, 423);

Number of parsed texts (included into the research): 2635.

select count(*) from texts where author_id in (62, 298, 423) and included=1;
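
The two totals above can be broken down per author in a single pass. The following is a minimal sketch that uses only the columns already seen on this page (`author_id`, `included`); the output aliases `total_texts` and `parsed_texts` are introduced here for illustration. It relies on MySQL evaluating a comparison to 1 or 0, so `SUM(included = 1)` counts the matching rows.

```sql
-- Per-author totals: all texts and texts included in the research.
-- MySQL-specific: (included = 1) evaluates to 1 or 0, so SUM() counts matches.
SELECT author_id,
       COUNT(*)          AS total_texts,
       SUM(included = 1) AS parsed_texts
FROM texts
WHERE author_id IN (62, 298, 423)
GROUP BY author_id
ORDER BY author_id;
```

Summed over the three rows, the two columns should match the overall counts reported above.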

Number of parsed sentences: 332860.

select count(*) from sentences;

select count(*) from wordforms;

215262

select count(*) from lemmas;

76314

select count(*) from synsets;

42

select count(*) from sentence_wordform;

4316440
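
The individual row counts can also be collected into one result set, which is convenient when comparing the small and the full versions of the database. This sketch uses only the table names that appear on this page; the column aliases `table_name` and `row_count` are chosen here for illustration.

```sql
-- One row per table, each with its row count.
SELECT 'sentences' AS table_name, COUNT(*) AS row_count FROM sentences
UNION ALL
SELECT 'wordforms', COUNT(*) FROM wordforms
UNION ALL
SELECT 'lemmas', COUNT(*) FROM lemmas
UNION ALL
SELECT 'synsets', COUNT(*) FROM synsets
UNION ALL
SELECT 'sentence_wordform', COUNT(*) FROM sentence_wordform;
```

`UNION ALL` is used rather than `UNION` because no duplicate elimination is needed, so every table keeps its own row.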

Number of manually tagged (marked) sentences: 1519.

select count(*) from lemma_sentence_synset;

Machine-readable database schema

The structure (tables and relations) of the WCorpus database (database layout):

[Figure: WCorpus database]
