SQL
Andrew Krizhanovsky edited this page Nov 4, 2018 · 12 revisions
A series of SQL scripts for working with the MySQL database wcorpus.
You can experiment with a small version of this database, which contains texts by three writers.
Number of authors: 2264.
use wcorpus;
SELECT COUNT(*) FROM authors;
IDs of authors with parsed texts:
select distinct author_id from texts,sentences where sentences.text_id=texts.id and included=1;
Three authors have parsed texts, with author_id 423, 62 and 298 (Fyodor Dostoevsky, Leo Tolstoy and Anton Chekhov).
Number of texts written by these three authors: 2651.
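To check which ID belongs to which writer, the matching rows can be read from the authors table. This is a minimal sketch that assumes authors has a key column named id (by analogy with texts.id); that column name is not confirmed on this page.

```sql
USE wcorpus;

-- Assumption: authors.id is the key that texts.author_id refers to.
-- SELECT * is used so that no other column names need to be guessed.
SELECT *
FROM authors
WHERE id IN (62, 298, 423);
```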
select count(*) from texts where author_id in (62, 298, 423);
Number of parsed texts (included into the research): 2635.
select count(*) from texts where author_id in (62, 298, 423) and included=1;
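The two totals above can also be broken down per author in a single query. The sketch below uses only the texts columns already referenced on this page (author_id, included) and relies on MySQL evaluating a comparison to 1 or 0; the aliases total_texts and parsed_texts are illustrative names.

```sql
-- Per-author count of all texts and of texts included in the research.
-- In MySQL, (included = 1) evaluates to 1 or 0, so SUM() counts the matches.
SELECT author_id,
       COUNT(*)          AS total_texts,
       SUM(included = 1) AS parsed_texts
FROM texts
WHERE author_id IN (62, 298, 423)
GROUP BY author_id;
```

Summed over the three rows, the two columns should match the totals reported by the two queries above.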
Number of parsed sentences: 332860.
select count(*) from sentences;
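To see how the parsed sentences are distributed among the authors, the sentences table can be joined to texts in the same way as in the author ID query above. This is a sketch using only the columns already shown (sentences.text_id, texts.id, texts.author_id, texts.included); the alias sentence_count is an illustrative name.

```sql
-- Number of parsed sentences per author, for texts included in the research.
SELECT t.author_id,
       COUNT(*) AS sentence_count
FROM texts AS t
JOIN sentences AS s ON s.text_id = t.id
WHERE t.included = 1
GROUP BY t.author_id
ORDER BY sentence_count DESC;
```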
Number of wordforms: 215262.
select count(*) from wordforms;
Number of lemmas: 76314.
select count(*) from lemmas;
Number of synsets: 42.
select count(*) from synsets;
Number of rows in sentence_wordform: 4316440.
select count(*) from sentence_wordform;
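The per-table counts above can also be collected into one result set, which is convenient when re-checking the numbers after reloading the database. A minimal sketch with UNION ALL follows; the column aliases table_name and row_count are illustrative.

```sql
-- One row per table with its row count.
SELECT 'wordforms' AS table_name, COUNT(*) AS row_count FROM wordforms
UNION ALL
SELECT 'lemmas', COUNT(*) FROM lemmas
UNION ALL
SELECT 'synsets', COUNT(*) FROM synsets
UNION ALL
SELECT 'sentence_wordform', COUNT(*) FROM sentence_wordform;
```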
Number of manually tagged (marked) sentences: 1519.
select count(*) from lemma_sentence_synset;
The structure (tables and relations) of the WCorpus database (database layout):
- Krizhanovsky A., Kirillov A., Krizhanovskaya N. (2018): WCorpus mysql database with texts of 3 writers. Figshare. https://doi.org/10.6084/m9.figshare.5938150.v1