# Natural Language Processing

1. What is NLP?
    * A variety of areas in which linguistics meets mathematics and/or computer science
    * Example of interacting with a speech bot:
        - Speech-to-text: creating letters/words/sentences from continuous sounds
        - Natural language understanding: mapping the sentences to some semantic representation
        - Dialog systems: determining appropriate response given semantic representation
        - Information extraction/retrieval: collecting any data needed to make the response
        - Natural language generation: generating words and sentences in a target language
        - Text-to-speech: creating continuous sounds from letters/words/sentences
    * Locally, we tend to do *Information Extraction*, which is essentially converting free-text to structured data
2. Terminology
    * Corpus (pl. corpora): dataset, consisting of documents/notes
    * Document/note: the smallest self-contained unit of analysis (e.g., a comment, post, blog, newspaper article, clinical note, etc.)
    * Token: smallest word or punctuation (_don't_ can have two tokens: _do_ and _n't_)
    * Lemma/stem: canonical form of word (map _hurt_ and _hurts_ to the same word)
3. nltk
    * A popular library, but also very useful to _learn_ basic NLP
    * http://www.nltk.org/book/
    * POS-tagging
    * Syntactic analysis
    * Spell-correction/edit distance
    * Reliance on models, so extra downloads required (or train model yourself)
4. Sentiment analysis example
    * Based on terms, can we learn about people's attitudes, emotions, or perceptions
5. Spam/ham classification
    * Classic example using ML in NLP

## Sentiment Analysis

Sentiment analysis is concerned about learning the subjective information about actors in a particular setting. This is often concerned with detecting 'positive' vs 'neutral' vs 'negative' comments, though we will consider an example of the use of male and female terms in a particular document. 

In [1]:
import nltk
# sentence splitting/tokenization, etc require models, and these must be downloaded separately
nltk.download()  # or download from http://www.nltk.org/nltk_data/

showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml
showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml


True

In [2]:
with open(r'data\grandmothers-love.txt') as fh:
    text = fh.read()

In [9]:
sents = nltk.sent_tokenize(text)
tokens = [nltk.word_tokenize(s) for s in sents]
pos_toks = [nltk.pos_tag(tok) for tok in tokens]
pos_toks[0]

[('On', 'IN'),
 ('the', 'DT'),
 ('first', 'JJ'),
 ('day', 'NN'),
 ('of', 'IN'),
 ('second', 'JJ'),
 ('grade', 'NN'),
 ('at', 'IN'),
 ('my', 'PRP$'),
 ('elementary', 'JJ'),
 ('school', 'NN'),
 ('in', 'IN'),
 ('Kentucky', 'NNP'),
 (',', ','),
 ('my', 'PRP$'),
 ('teacher', 'NN'),
 ('asked', 'VBD'),
 ('me', 'PRP'),
 ('whether', 'IN'),
 ('I', 'PRP'),
 ('knew', 'VBD'),
 ('of', 'IN'),
 ('a', 'DT'),
 ('woman', 'NN'),
 ('she', 'PRP'),
 ('remembered', 'VBD'),
 ('from', 'IN'),
 ('way', 'NN'),
 ('back', 'RB'),
 ('who', 'WP'),
 ('had', 'VBD'),
 ('the', 'DT'),
 ('same', 'JJ'),
 ('last', 'JJ'),
 ('name', 'NN'),
 ('as', 'IN'),
 ('mine', 'NN'),
 ('.', '.')]

In [20]:
# chunk: find noun phrases
# first, we need to define a grammar
grammar = r'''
NPBAR:
    {<NN.*|JJ>*<NN.*>}
NP:
    {<NPBAR>}
    {<NPBAR><IN><NPBAR>}
'''
chunker = nltk.RegexpParser(grammar)
try:
    tree = chunker.parse(pos_toks[0])
except LookupError:  # if you don't have ghostscript installed
    pass
print(str(tree))

(S
  On/IN
  the/DT
  (NP (NPBAR first/JJ day/NN))
  of/IN
  (NP (NPBAR second/JJ grade/NN))
  at/IN
  my/PRP$
  (NP (NPBAR elementary/JJ school/NN))
  in/IN
  (NP (NPBAR Kentucky/NNP))
  ,/,
  my/PRP$
  (NP (NPBAR teacher/NN))
  asked/VBD
  me/PRP
  whether/IN
  I/PRP
  knew/VBD
  of/IN
  a/DT
  (NP (NPBAR woman/NN))
  she/PRP
  remembered/VBD
  from/IN
  (NP (NPBAR way/NN))
  back/RB
  who/WP
  had/VBD
  the/DT
  (NP (NPBAR same/JJ last/JJ name/NN))
  as/IN
  (NP (NPBAR mine/NN))
  ./.)


In [27]:
# get the nouns out
for s in tree.subtrees():
    if s.label() == 'NP':
        print(s)
        print(s.leaves())
        print(' '.join(x[0] for x in s.leaves()))

(NP (NPBAR first/JJ day/NN))
[('first', 'JJ'), ('day', 'NN')]
first day
(NP (NPBAR second/JJ grade/NN))
[('second', 'JJ'), ('grade', 'NN')]
second grade
(NP (NPBAR elementary/JJ school/NN))
[('elementary', 'JJ'), ('school', 'NN')]
elementary school
(NP (NPBAR Kentucky/NNP))
[('Kentucky', 'NNP')]
Kentucky
(NP (NPBAR teacher/NN))
[('teacher', 'NN')]
teacher
(NP (NPBAR woman/NN))
[('woman', 'NN')]
woman
(NP (NPBAR way/NN))
[('way', 'NN')]
way
(NP (NPBAR same/JJ last/JJ name/NN))
[('same', 'JJ'), ('last', 'JJ'), ('name', 'NN')]
same last name
(NP (NPBAR mine/NN))
[('mine', 'NN')]
mine


In [118]:
stopwords = nltk.corpus.stopwords.words('english')

## Topic/Sentiment Analysis

In [119]:
from enum import Enum

class Category(Enum):
    C1 = 0
    C2 = 1
    BOTH = 2
    UNKNOWN = 3

In [120]:
MALE_TERMS = {
    'man', 'men', 'mans', 'mens', 'guy', 'guys', 'him', 'he', 'hes', 'his',
    'boy', 'boys', 'boyfriend', 'boyfriends', 'brother', 'brothers', 'dad', 'dads',
    'father', 'fathers', 'spokesman', 'spokesmen', 'spokesmans', 'gentleman',
    'gentlemans', 'gentlemen', 'gentlemens', 'groom', 'grooms', 'son', 'sons', 
    'grandson', 'grandsons', 'grandpa', 'grandpas', 'grandfather', 'grandfathers',
    'fiance', 'fiances', 'male', 'males', 'king', 'kings', 'mr', 'himself',
    'sir', 'sirs', 'nephew', 'nephews', 'uncle', 'uncles', 'husband', 'husbands',
    'waiter', 'waiters', 'widower', 'widowers', 'prince', 'princes', 'priest',
    'priests', 'chairman', 'chairmen', 'chairmans', 'chairmens'
}
FEMALE_TERMS = {
    'woman', 'women', 'womans', 'womens', 'gal', 'gals', 'her', 'she', 'shes', 'hers',
    'girl', 'girls', 'girlfriend', 'girlfriends', 'sister', 'sisters', 'mom', 'moms',
    'mother', 'mothers', 'spokeswoman', 'spokeswomen', 'spokeswomans', 'lady', 'ladys'
    'ladies', 'bride', 'brides', 'daughter', 'daughters', 'miss',
    'granddaughter', 'granddaughters', 'grandma', 'grandmas', 'grandmother', 'grandmothers',
    'fiancee', 'fiancees', 'female', 'females', 'queen', 'queens', 'mrs', 'herself',
    'niece', 'nieces', 'aunt', 'aunt', 'wife', 'wifes', 'wives',
    'waitress', 'waitresses', 'widow', 'widows', 'princess', 'princesses', 'priestess',
    'priestesses', 'chairwoman', 'chairwomans', 'chairwomen', 'chairwomens'
}

In [126]:
# remove punctuation
from string import punctuation
TABLE = str.maketrans('', '', punctuation + '’—“')

In [122]:
def get_sentence(text):
    for sent in nltk.sent_tokenize(text):
        yield {word.lower().translate(TABLE) for word in nltk.word_tokenize(sent) if word.translate(TABLE).strip()}
            
def parse_category(texts, cat1, cat2, label1, label2):
    all_sents = Counter()
    all_words = Counter()
    terms = defaultdict(lambda: Counter())
    for title, text in texts:
        sents, words = count_category(get_sentence(text), cat1, cat2, terms)
        all_sents += sents
        all_words += words
        total = sum(words.values())
        for cat, count in words.items():
            percent = count * 100 / total
            num_sents = sents[cat]
            if cat == Category.C1:
                label = label1
            elif cat == Category.C2:
                label = label2
            else:
                label = cat
            print(title)
            print(f'{percent:0.2f}% {label} ({num_sents} sentences)')
    for cat in terms:
        if cat == Category.C1:
            label = label1
        elif cat == Category.C2:
            label = label2
        else:
            label = cat
        print(f'{label}\n\t', end='')
        for term in terms[cat].most_common(20):
            print(f'{term[0]}', end=',')
        print()

In [123]:
from collections import Counter, defaultdict

def count_category(sentences, cat1, cat2, terms):
    sents = Counter()
    words = Counter()
    for sent in sentences:
        cat = judge_category(sent, cat1, cat2)
        sents[cat] += 1  # number of sentences with this category
        words[cat] += len(sent)  # number of words in this category
        terms[cat].update({s for s in sent if s not in stopwords})  # word counts in this category
    return words, sents

In [124]:
def judge_category(words, cat1, cat2):
    count1 = len(cat1 & words)
    count2 = len(cat2 & words)
    if count1 > 0 and count2 > 0:
        return Category.BOTH
    elif count1 > 0:
        return Category.C1
    elif count2 > 0:
        return Category.C2
    else:
        return Category.UNKNOWN

In [127]:
texts = []
with open(r'data\grandmothers-love.txt') as fh:
    texts.append(('grandmothers-love', fh.read()))

parse_category(texts, MALE_TERMS, FEMALE_TERMS, 'male', 'female')

grandmothers-love
6.67% male (98 sentences)
grandmothers-love
35.56% female (468 sentences)
grandmothers-love
55.56% Category.UNKNOWN (457 sentences)
grandmothers-love
2.22% Category.BOTH (7 sentences)
male
	community,history,beason,shaped,views,foundation,face,socialjustice,today,tyrone,times,issues,serves,seattle,columnist,new,parents,early,card,attended,
female
	knew,woman,teacher,used,black,way,back,name,grandmother,white,mama,big,might,day,first,said,two,pride,told,house,
Category.UNKNOWN
	people,justice,history,eyes,turn,teacher,mean,harm,life,better,living,born,country,keep,toward,see,opportunity,time,write,like,
Category.BOTH
	big,dad,mama,mother,


In [136]:
docs = iter((fileid, nltk.corpus.reuters.raw(fileid)) for fileid in nltk.corpus.reuters.fileids())
parse_category(docs, MALE_TERMS, FEMALE_TERMS, 'male', 'female')

test/14826
83.87% Category.UNKNOWN (565 sentences)
test/14826
16.13% male (99 sentences)
test/14828
100.00% Category.UNKNOWN (92 sentences)
test/14829
100.00% Category.UNKNOWN (150 sentences)
test/14832
100.00% Category.UNKNOWN (132 sentences)
test/14833
100.00% Category.UNKNOWN (144 sentences)
test/14839
100.00% Category.UNKNOWN (179 sentences)
test/14840
26.32% male (131 sentences)
test/14840
73.68% Category.UNKNOWN (230 sentences)
test/14841
100.00% Category.UNKNOWN (48 sentences)
test/14842
100.00% Category.UNKNOWN (106 sentences)
test/14843
82.86% Category.UNKNOWN (504 sentences)
test/14843
17.14% male (124 sentences)
test/14844
60.00% Category.UNKNOWN (56 sentences)
test/14844
40.00% male (37 sentences)
test/14849
33.33% male (79 sentences)
test/14849
66.67% Category.UNKNOWN (141 sentences)
test/14852
100.00% Category.UNKNOWN (236 sentences)
test/14854
100.00% Category.UNKNOWN (86 sentences)
test/14858
86.36% Category.UNKNOWN (349 sentences)
test/14858
13.64% male (68 sentences)


66.67% Category.UNKNOWN (367 sentences)
test/15149
23.33% male (117 sentences)
test/15149
10.00% female (40 sentences)
test/15152
100.00% Category.UNKNOWN (20 sentences)
test/15153
100.00% Category.UNKNOWN (18 sentences)
test/15154
56.25% Category.UNKNOWN (218 sentences)
test/15154
43.75% male (151 sentences)
test/15156
100.00% Category.UNKNOWN (127 sentences)
test/15157
100.00% Category.UNKNOWN (18 sentences)
test/15161
66.67% Category.UNKNOWN (59 sentences)
test/15161
33.33% male (15 sentences)
test/15162
100.00% Category.UNKNOWN (81 sentences)
test/15171
60.00% male (72 sentences)
test/15171
40.00% Category.UNKNOWN (26 sentences)
test/15172
100.00% Category.UNKNOWN (12 sentences)
test/15175
100.00% Category.UNKNOWN (24 sentences)
test/15179
100.00% male (116 sentences)
test/15180
100.00% Category.UNKNOWN (30 sentences)
test/15185
100.00% Category.UNKNOWN (9 sentences)
test/15188
100.00% Category.UNKNOWN (28 sentences)
test/15189
100.00% Category.UNKNOWN (101 sentences)
test/15190
10

20.00% male (18 sentences)
test/15547
100.00% Category.UNKNOWN (69 sentences)
test/15548
100.00% Category.UNKNOWN (22 sentences)
test/15549
93.33% Category.UNKNOWN (703 sentences)
test/15549
6.67% male (45 sentences)
test/15550
100.00% Category.UNKNOWN (62 sentences)
test/15551
100.00% Category.UNKNOWN (418 sentences)
test/15552
75.00% Category.UNKNOWN (215 sentences)
test/15552
25.00% male (65 sentences)
test/15553
100.00% Category.UNKNOWN (21 sentences)
test/15556
100.00% Category.UNKNOWN (565 sentences)
test/15558
100.00% Category.UNKNOWN (50 sentences)
test/15559
100.00% Category.UNKNOWN (25 sentences)
test/15560
100.00% Category.UNKNOWN (38 sentences)
test/15561
100.00% Category.UNKNOWN (43 sentences)
test/15562
66.67% male (45 sentences)
test/15562
33.33% Category.UNKNOWN (25 sentences)
test/15563
100.00% Category.UNKNOWN (34 sentences)
test/15565
100.00% Category.UNKNOWN (205 sentences)
test/15566
100.00% Category.UNKNOWN (10 sentences)
test/15567
100.00% Category.UNKNOWN (254 s

test/15894
12.50% male (25 sentences)
test/15895
100.00% Category.UNKNOWN (24 sentences)
test/15896
100.00% Category.UNKNOWN (33 sentences)
test/15897
100.00% Category.UNKNOWN (20 sentences)
test/15898
100.00% Category.UNKNOWN (37 sentences)
test/15899
100.00% Category.UNKNOWN (40 sentences)
test/15900
100.00% Category.UNKNOWN (52 sentences)
test/15901
100.00% Category.UNKNOWN (19 sentences)
test/15902
100.00% Category.UNKNOWN (24 sentences)
test/15903
100.00% Category.UNKNOWN (24 sentences)
test/15904
100.00% Category.UNKNOWN (24 sentences)
test/15906
100.00% Category.UNKNOWN (476 sentences)
test/15908
100.00% Category.UNKNOWN (38 sentences)
test/15909
100.00% Category.UNKNOWN (48 sentences)
test/15910
100.00% Category.UNKNOWN (148 sentences)
test/15911
100.00% Category.UNKNOWN (96 sentences)
test/15912
100.00% Category.UNKNOWN (102 sentences)
test/15913
75.00% male (64 sentences)
test/15913
25.00% Category.UNKNOWN (22 sentences)
test/15914
100.00% Category.UNKNOWN (186 sentences)
tes

test/16233
85.71% Category.UNKNOWN (103 sentences)
test/16233
14.29% male (26 sentences)
test/16234
100.00% Category.UNKNOWN (37 sentences)
test/16236
100.00% Category.UNKNOWN (69 sentences)
test/16238
100.00% Category.UNKNOWN (294 sentences)
test/16241
100.00% Category.UNKNOWN (152 sentences)
test/16243
100.00% Category.UNKNOWN (30 sentences)
test/16244
100.00% Category.UNKNOWN (34 sentences)
test/16246
100.00% Category.UNKNOWN (51 sentences)
test/16247
100.00% Category.UNKNOWN (50 sentences)
test/16248
87.50% Category.UNKNOWN (139 sentences)
test/16248
12.50% male (21 sentences)
test/16250
100.00% Category.UNKNOWN (100 sentences)
test/16251
100.00% Category.UNKNOWN (83 sentences)
test/16252
66.67% Category.UNKNOWN (94 sentences)
test/16252
33.33% male (56 sentences)
test/16255
100.00% Category.UNKNOWN (47 sentences)
test/16256
100.00% Category.UNKNOWN (178 sentences)
test/16257
100.00% Category.UNKNOWN (231 sentences)
test/16258
100.00% Category.UNKNOWN (75 sentences)
test/16260
100.

100.00% Category.UNKNOWN (70 sentences)
test/16886
100.00% Category.UNKNOWN (28 sentences)
test/16888
100.00% Category.UNKNOWN (57 sentences)
test/16890
100.00% Category.UNKNOWN (29 sentences)
test/16893
100.00% Category.UNKNOWN (133 sentences)
test/16897
100.00% Category.UNKNOWN (88 sentences)
test/16903
50.00% Category.UNKNOWN (26 sentences)
test/16903
50.00% male (50 sentences)
test/16908
100.00% Category.UNKNOWN (41 sentences)
test/16909
100.00% Category.UNKNOWN (57 sentences)
test/16910
100.00% Category.UNKNOWN (32 sentences)
test/16911
100.00% Category.UNKNOWN (23 sentences)
test/16912
100.00% Category.UNKNOWN (19 sentences)
test/16913
100.00% Category.UNKNOWN (47 sentences)
test/16916
100.00% Category.UNKNOWN (28 sentences)
test/16917
100.00% Category.UNKNOWN (33 sentences)
test/16918
100.00% Category.UNKNOWN (19 sentences)
test/16921
100.00% Category.UNKNOWN (15 sentences)
test/16922
100.00% Category.UNKNOWN (36 sentences)
test/16923
100.00% Category.UNKNOWN (54 sentences)
test

test/17900
100.00% Category.UNKNOWN (58 sentences)
test/17901
100.00% Category.UNKNOWN (113 sentences)
test/17906
86.67% Category.UNKNOWN (288 sentences)
test/17906
13.33% male (34 sentences)
test/17907
62.50% Category.UNKNOWN (114 sentences)
test/17907
37.50% male (50 sentences)
test/17911
16.67% male (67 sentences)
test/17911
83.33% Category.UNKNOWN (226 sentences)
test/17913
100.00% Category.UNKNOWN (158 sentences)
test/17915
44.83% male (310 sentences)
test/17915
51.72% Category.UNKNOWN (307 sentences)
test/17915
3.45% female (23 sentences)
test/17916
100.00% Category.UNKNOWN (29 sentences)
test/17922
100.00% Category.UNKNOWN (69 sentences)
test/17923
84.38% Category.UNKNOWN (492 sentences)
test/17923
15.62% male (88 sentences)
test/17924
100.00% Category.UNKNOWN (95 sentences)
test/17925
91.67% Category.UNKNOWN (247 sentences)
test/17925
8.33% male (24 sentences)
test/17926
100.00% Category.UNKNOWN (95 sentences)
test/17927
100.00% Category.UNKNOWN (116 sentences)
test/17929
77.78

test/18769
100.00% Category.UNKNOWN (63 sentences)
test/18773
100.00% Category.UNKNOWN (50 sentences)
test/18774
71.43% Category.UNKNOWN (136 sentences)
test/18774
28.57% male (41 sentences)
test/18778
100.00% Category.UNKNOWN (93 sentences)
test/18779
100.00% Category.UNKNOWN (1 sentences)
test/18781
50.00% Category.UNKNOWN (151 sentences)
test/18781
50.00% male (195 sentences)
test/18782
100.00% Category.UNKNOWN (84 sentences)
test/18783
50.00% male (35 sentences)
test/18783
50.00% Category.UNKNOWN (24 sentences)
test/18789
100.00% Category.UNKNOWN (11 sentences)
test/18795
100.00% Category.UNKNOWN (64 sentences)
test/18798
100.00% Category.UNKNOWN (161 sentences)
test/18807
100.00% Category.UNKNOWN (36 sentences)
test/18810
100.00% Category.UNKNOWN (51 sentences)
test/18811
100.00% Category.UNKNOWN (112 sentences)
test/18816
100.00% Category.UNKNOWN (30 sentences)
test/18824
100.00% Category.UNKNOWN (10 sentences)
test/18828
100.00% Category.UNKNOWN (52 sentences)
test/18830
100.00%

100.00% Category.UNKNOWN (73 sentences)
test/19501
100.00% Category.UNKNOWN (13 sentences)
test/19505
36.36% male (92 sentences)
test/19505
63.64% Category.UNKNOWN (153 sentences)
test/19506
100.00% Category.UNKNOWN (95 sentences)
test/19507
100.00% Category.UNKNOWN (164 sentences)
test/19509
11.76% male (46 sentences)
test/19509
88.24% Category.UNKNOWN (361 sentences)
test/19510
100.00% Category.UNKNOWN (201 sentences)
test/19511
90.00% Category.UNKNOWN (233 sentences)
test/19511
10.00% male (18 sentences)
test/19512
100.00% Category.UNKNOWN (124 sentences)
test/19513
100.00% Category.UNKNOWN (12 sentences)
test/19515
100.00% Category.UNKNOWN (33 sentences)
test/19529
100.00% Category.UNKNOWN (56 sentences)
test/19534
100.00% Category.UNKNOWN (166 sentences)
test/19537
100.00% Category.UNKNOWN (12 sentences)
test/19539
100.00% Category.UNKNOWN (251 sentences)
test/19540
84.62% Category.UNKNOWN (210 sentences)
test/19540
7.69% female (30 sentences)
test/19540
7.69% male (22 sentences)


100.00% Category.UNKNOWN (22 sentences)
test/20389
75.00% Category.UNKNOWN (149 sentences)
test/20389
25.00% male (45 sentences)
test/20391
100.00% Category.UNKNOWN (35 sentences)
test/20392
100.00% Category.UNKNOWN (90 sentences)
test/20393
100.00% Category.UNKNOWN (107 sentences)
test/20395
100.00% Category.UNKNOWN (43 sentences)
test/20396
63.64% male (129 sentences)
test/20396
36.36% Category.UNKNOWN (70 sentences)
test/20398
100.00% Category.UNKNOWN (81 sentences)
test/20400
100.00% Category.UNKNOWN (34 sentences)
test/20401
100.00% Category.UNKNOWN (41 sentences)
test/20406
100.00% Category.UNKNOWN (101 sentences)
test/20407
100.00% Category.UNKNOWN (32 sentences)
test/20409
81.82% Category.UNKNOWN (156 sentences)
test/20409
18.18% male (48 sentences)
test/20412
100.00% Category.UNKNOWN (38 sentences)
test/20414
100.00% Category.UNKNOWN (30 sentences)
test/20415
100.00% Category.UNKNOWN (29 sentences)
test/20419
100.00% Category.UNKNOWN (49 sentences)
test/20420
100.00% Category.

test/21193
100.00% Category.UNKNOWN (78 sentences)
test/21194
100.00% Category.UNKNOWN (35 sentences)
test/21195
100.00% Category.UNKNOWN (33 sentences)
test/21196
100.00% Category.UNKNOWN (28 sentences)
test/21197
60.00% Category.UNKNOWN (58 sentences)
test/21197
40.00% male (63 sentences)
test/21199
100.00% Category.UNKNOWN (32 sentences)
test/21201
77.78% Category.UNKNOWN (126 sentences)
test/21201
22.22% male (48 sentences)
test/21202
85.71% Category.UNKNOWN (159 sentences)
test/21202
14.29% male (16 sentences)
test/21203
100.00% Category.UNKNOWN (26 sentences)
test/21205
100.00% Category.UNKNOWN (33 sentences)
test/21206
100.00% Category.UNKNOWN (23 sentences)
test/21207
100.00% Category.UNKNOWN (12 sentences)
test/21208
100.00% Category.UNKNOWN (27 sentences)
test/21209
100.00% Category.UNKNOWN (41 sentences)
test/21210
100.00% Category.UNKNOWN (137 sentences)
test/21211
100.00% Category.UNKNOWN (22 sentences)
test/21212
100.00% Category.UNKNOWN (1 sentences)
test/21214
100.00% C

training/10079
100.00% Category.UNKNOWN (13 sentences)
training/1008
100.00% Category.UNKNOWN (83 sentences)
training/10080
90.91% Category.UNKNOWN (550 sentences)
training/10080
9.09% male (109 sentences)
training/10081
57.14% Category.UNKNOWN (81 sentences)
training/10081
42.86% male (66 sentences)
training/10083
100.00% Category.UNKNOWN (42 sentences)
training/10085
100.00% Category.UNKNOWN (24 sentences)
training/10086
100.00% Category.UNKNOWN (36 sentences)
training/10088
100.00% Category.UNKNOWN (42 sentences)
training/10089
50.00% Category.UNKNOWN (39 sentences)
training/10089
50.00% male (13 sentences)
training/1009
100.00% Category.UNKNOWN (18 sentences)
training/10090
100.00% Category.UNKNOWN (49 sentences)
training/10091
52.63% Category.UNKNOWN (182 sentences)
training/10091
47.37% male (237 sentences)
training/10092
100.00% Category.UNKNOWN (19 sentences)
training/10094
100.00% Category.UNKNOWN (45 sentences)
training/10095
85.71% Category.UNKNOWN (119 sentences)
training/1

training/1065
100.00% Category.UNKNOWN (19 sentences)
training/10650
100.00% Category.UNKNOWN (11 sentences)
training/10651
100.00% Category.UNKNOWN (11 sentences)
training/10652
100.00% Category.UNKNOWN (8 sentences)
training/10653
100.00% Category.UNKNOWN (12 sentences)
training/10654
100.00% Category.UNKNOWN (77 sentences)
training/10657
100.00% male (22 sentences)
training/10658
100.00% Category.UNKNOWN (11 sentences)
training/10659
83.33% male (102 sentences)
training/10659
16.67% Category.UNKNOWN (19 sentences)
training/1066
100.00% Category.UNKNOWN (19 sentences)
training/10660
66.67% Category.UNKNOWN (59 sentences)
training/10660
33.33% male (29 sentences)
training/10661
25.00% Category.UNKNOWN (51 sentences)
training/10661
75.00% male (134 sentences)
training/10662
60.00% Category.UNKNOWN (64 sentences)
training/10662
40.00% male (40 sentences)
training/10665
70.00% Category.UNKNOWN (142 sentences)
training/10665
30.00% male (41 sentences)
training/10669
100.00% Category.UNKNO

75.00% Category.UNKNOWN (76 sentences)
training/11210
25.00% male (18 sentences)
training/11211
25.00% male (39 sentences)
training/11211
75.00% Category.UNKNOWN (44 sentences)
training/11212
100.00% Category.UNKNOWN (85 sentences)
training/11213
100.00% Category.UNKNOWN (554 sentences)
training/11216
100.00% Category.UNKNOWN (92 sentences)
training/1122
100.00% Category.UNKNOWN (26 sentences)
training/11220
100.00% Category.UNKNOWN (110 sentences)
training/11221
100.00% Category.UNKNOWN (48 sentences)
training/11222
66.67% female (109 sentences)
training/11222
16.67% Category.UNKNOWN (19 sentences)
training/11222
16.67% Category.BOTH (32 sentences)
training/11224
94.55% Category.UNKNOWN (1128 sentences)
training/11224
5.45% male (71 sentences)
training/11225
9.09% male (82 sentences)
training/11225
90.91% Category.UNKNOWN (551 sentences)
training/11227
100.00% Category.UNKNOWN (51 sentences)
training/11228
100.00% Category.UNKNOWN (69 sentences)
training/11229
100.00% Category.UNKNOWN

30.00% male (43 sentences)
training/11578
100.00% Category.UNKNOWN (20 sentences)
training/11579
100.00% Category.UNKNOWN (57 sentences)
training/1158
100.00% Category.UNKNOWN (35 sentences)
training/11580
37.50% Category.UNKNOWN (81 sentences)
training/11580
12.50% male (44 sentences)
training/11580
50.00% female (80 sentences)
training/11584
100.00% Category.UNKNOWN (10 sentences)
training/11587
87.50% Category.UNKNOWN (154 sentences)
training/11587
12.50% male (33 sentences)
training/11588
100.00% Category.UNKNOWN (40 sentences)
training/1159
100.00% Category.UNKNOWN (131 sentences)
training/11597
100.00% Category.UNKNOWN (76 sentences)
training/11598
100.00% Category.UNKNOWN (31 sentences)
training/11599
100.00% Category.UNKNOWN (177 sentences)
training/1160
100.00% Category.UNKNOWN (53 sentences)
training/11602
100.00% Category.UNKNOWN (176 sentences)
training/11607
100.00% Category.UNKNOWN (11 sentences)
training/11608
100.00% Category.UNKNOWN (12 sentences)
training/11609
100.00

training/12139
100.00% Category.UNKNOWN (14 sentences)
training/12140
100.00% Category.UNKNOWN (10 sentences)
training/12145
75.00% male (149 sentences)
training/12145
25.00% Category.UNKNOWN (33 sentences)
training/12146
85.71% Category.UNKNOWN (278 sentences)
training/12146
14.29% male (66 sentences)
training/12147
100.00% Category.UNKNOWN (66 sentences)
training/12149
100.00% Category.UNKNOWN (41 sentences)
training/1215
100.00% Category.UNKNOWN (163 sentences)
training/12150
100.00% Category.UNKNOWN (38 sentences)
training/12152
100.00% Category.UNKNOWN (119 sentences)
training/12153
100.00% Category.UNKNOWN (34 sentences)
training/12154
100.00% Category.UNKNOWN (32 sentences)
training/12155
100.00% Category.UNKNOWN (23 sentences)
training/12156
70.00% Category.UNKNOWN (205 sentences)
training/12156
30.00% male (72 sentences)
training/12159
100.00% Category.UNKNOWN (105 sentences)
training/1216
100.00% Category.UNKNOWN (45 sentences)
training/12160
100.00% Category.UNKNOWN (534 sen

100.00% Category.UNKNOWN (10 sentences)
training/12559
100.00% Category.UNKNOWN (70 sentences)
training/12562
100.00% Category.UNKNOWN (52 sentences)
training/12563
62.50% Category.UNKNOWN (96 sentences)
training/12563
37.50% male (55 sentences)
training/12564
100.00% Category.UNKNOWN (91 sentences)
training/12569
100.00% Category.UNKNOWN (31 sentences)
training/1257
100.00% Category.UNKNOWN (108 sentences)
training/12570
100.00% Category.UNKNOWN (32 sentences)
training/12571
100.00% Category.UNKNOWN (29 sentences)
training/12572
100.00% Category.UNKNOWN (33 sentences)
training/12573
100.00% Category.UNKNOWN (89 sentences)
training/12574
100.00% Category.UNKNOWN (156 sentences)
training/12576
66.67% male (99 sentences)
training/12576
33.33% Category.UNKNOWN (52 sentences)
training/12579
100.00% Category.UNKNOWN (51 sentences)
training/1258
100.00% Category.UNKNOWN (77 sentences)
training/12580
100.00% Category.UNKNOWN (19 sentences)
training/12583
100.00% Category.UNKNOWN (47 sentences

training/13294
100.00% Category.UNKNOWN (74 sentences)
training/133
100.00% Category.UNKNOWN (1 sentences)
training/1331
100.00% Category.UNKNOWN (13 sentences)
training/13313
100.00% Category.UNKNOWN (170 sentences)
training/13315
100.00% Category.UNKNOWN (57 sentences)
training/13317
100.00% Category.UNKNOWN (12 sentences)
training/1332
75.00% male (67 sentences)
training/1332
25.00% Category.UNKNOWN (13 sentences)
training/13320
100.00% Category.UNKNOWN (167 sentences)
training/13321
100.00% Category.UNKNOWN (84 sentences)
training/13333
100.00% Category.UNKNOWN (100 sentences)
training/13336
100.00% Category.UNKNOWN (13 sentences)
training/1335
100.00% Category.UNKNOWN (88 sentences)
training/13380
100.00% Category.UNKNOWN (93 sentences)
training/13382
100.00% Category.UNKNOWN (93 sentences)
training/1339
100.00% Category.UNKNOWN (80 sentences)
training/13393
100.00% Category.UNKNOWN (72 sentences)
training/13398
100.00% Category.UNKNOWN (10 sentences)
training/134
100.00% Category

100.00% Category.UNKNOWN (183 sentences)
training/1577
100.00% Category.UNKNOWN (97 sentences)
training/1579
91.30% Category.UNKNOWN (553 sentences)
training/1579
8.70% male (59 sentences)
training/1581
100.00% male (93 sentences)
training/1582
100.00% Category.UNKNOWN (448 sentences)
training/1584
100.00% Category.UNKNOWN (54 sentences)
training/1585
100.00% Category.UNKNOWN (60 sentences)
training/1586
100.00% Category.UNKNOWN (21 sentences)
training/1588
100.00% male (82 sentences)
training/1590
42.86% Category.UNKNOWN (55 sentences)
training/1590
57.14% male (131 sentences)
training/1598
75.00% Category.UNKNOWN (71 sentences)
training/1598
25.00% male (30 sentences)
training/160
100.00% Category.UNKNOWN (47 sentences)
training/1601
100.00% Category.UNKNOWN (43 sentences)
training/1602
100.00% Category.UNKNOWN (88 sentences)
training/1606
100.00% Category.UNKNOWN (75 sentences)
training/1607
100.00% Category.UNKNOWN (81 sentences)
training/1608
100.00% Category.UNKNOWN (30 sentences

training/2095
75.00% Category.UNKNOWN (94 sentences)
training/2095
25.00% male (18 sentences)
training/2097
71.43% Category.UNKNOWN (75 sentences)
training/2097
28.57% male (37 sentences)
training/2098
100.00% Category.UNKNOWN (10 sentences)
training/2099
100.00% Category.UNKNOWN (44 sentences)
training/210
100.00% Category.UNKNOWN (50 sentences)
training/2101
100.00% Category.UNKNOWN (26 sentences)
training/2102
100.00% Category.UNKNOWN (28 sentences)
training/2107
100.00% Category.UNKNOWN (70 sentences)
training/2108
77.78% Category.UNKNOWN (153 sentences)
training/2108
22.22% male (40 sentences)
training/2109
100.00% Category.UNKNOWN (25 sentences)
training/211
100.00% Category.UNKNOWN (80 sentences)
training/2110
100.00% Category.UNKNOWN (231 sentences)
training/2112
100.00% Category.UNKNOWN (54 sentences)
training/2114
100.00% Category.UNKNOWN (81 sentences)
training/2115
33.33% Category.UNKNOWN (141 sentences)
training/2115
66.67% male (299 sentences)
training/2116
100.00% Catego

training/2593
100.00% Category.UNKNOWN (18 sentences)
training/2594
100.00% Category.UNKNOWN (131 sentences)
training/2595
100.00% Category.UNKNOWN (9 sentences)
training/2596
46.15% male (140 sentences)
training/2596
53.85% Category.UNKNOWN (134 sentences)
training/2599
100.00% Category.UNKNOWN (9 sentences)
training/260
33.33% male (44 sentences)
training/260
66.67% Category.UNKNOWN (84 sentences)
training/2601
100.00% Category.UNKNOWN (99 sentences)
training/2602
100.00% Category.UNKNOWN (39 sentences)
training/2604
100.00% Category.UNKNOWN (39 sentences)
training/2606
75.00% Category.UNKNOWN (62 sentences)
training/2606
25.00% male (21 sentences)
training/2608
100.00% Category.UNKNOWN (19 sentences)
training/2609
100.00% Category.UNKNOWN (82 sentences)
training/2611
100.00% Category.UNKNOWN (49 sentences)
training/2613
100.00% Category.UNKNOWN (31 sentences)
training/2617
66.67% Category.UNKNOWN (58 sentences)
training/2617
33.33% male (24 sentences)
training/2618
87.18% Category.U

training/3028
30.00% male (64 sentences)
training/303
100.00% Category.UNKNOWN (203 sentences)
training/3031
58.33% Category.UNKNOWN (190 sentences)
training/3031
41.67% male (105 sentences)
training/3034
27.27% male (68 sentences)
training/3034
72.73% Category.UNKNOWN (203 sentences)
training/3035
100.00% Category.UNKNOWN (10 sentences)
training/3036
100.00% Category.UNKNOWN (97 sentences)
training/3037
100.00% Category.UNKNOWN (13 sentences)
training/3038
100.00% Category.UNKNOWN (82 sentences)
training/3039
100.00% Category.UNKNOWN (136 sentences)
training/304
100.00% Category.UNKNOWN (169 sentences)
training/3040
100.00% Category.UNKNOWN (39 sentences)
training/3041
100.00% Category.UNKNOWN (10 sentences)
training/3042
100.00% Category.UNKNOWN (50 sentences)
training/3043
100.00% Category.UNKNOWN (75 sentences)
training/3044
100.00% Category.UNKNOWN (81 sentences)
training/3046
100.00% Category.UNKNOWN (64 sentences)
training/3047
100.00% Category.UNKNOWN (92 sentences)
training/30

18.75% male (129 sentences)
training/3535
95.00% Category.UNKNOWN (457 sentences)
training/3535
5.00% male (29 sentences)
training/3538
100.00% Category.UNKNOWN (13 sentences)
training/3539
88.89% Category.UNKNOWN (174 sentences)
training/3539
11.11% male (16 sentences)
training/354
92.86% Category.UNKNOWN (230 sentences)
training/354
7.14% male (15 sentences)
training/3540
57.14% Category.UNKNOWN (285 sentences)
training/3540
42.86% male (166 sentences)
training/3541
100.00% Category.UNKNOWN (123 sentences)
training/3542
90.00% Category.UNKNOWN (228 sentences)
training/3542
10.00% male (34 sentences)
training/3543
100.00% Category.UNKNOWN (78 sentences)
training/3545
100.00% Category.UNKNOWN (59 sentences)
training/3547
57.14% Category.UNKNOWN (68 sentences)
training/3547
42.86% male (69 sentences)
training/3553
57.14% Category.UNKNOWN (106 sentences)
training/3553
42.86% male (45 sentences)
training/3554
100.00% Category.UNKNOWN (110 sentences)
training/3556
77.78% Category.UNKNOWN (

100.00% Category.UNKNOWN (43 sentences)
training/3959
100.00% Category.UNKNOWN (11 sentences)
training/396
100.00% Category.UNKNOWN (16 sentences)
training/3960
75.00% Category.UNKNOWN (44 sentences)
training/3960
25.00% male (7 sentences)
training/3961
100.00% Category.UNKNOWN (55 sentences)
training/3964
100.00% Category.UNKNOWN (8 sentences)
training/3968
100.00% Category.UNKNOWN (18 sentences)
training/3970
100.00% Category.UNKNOWN (70 sentences)
training/3971
100.00% Category.UNKNOWN (25 sentences)
training/3973
33.33% male (31 sentences)
training/3973
66.67% Category.UNKNOWN (44 sentences)
training/3976
100.00% Category.UNKNOWN (11 sentences)
training/3977
100.00% Category.UNKNOWN (34 sentences)
training/3979
91.67% Category.UNKNOWN (226 sentences)
training/3979
8.33% male (15 sentences)
training/398
100.00% Category.UNKNOWN (10 sentences)
training/3980
100.00% Category.UNKNOWN (97 sentences)
training/3981
100.00% Category.UNKNOWN (495 sentences)
training/3982
100.00% Category.UN

57.14% Category.UNKNOWN (99 sentences)
training/4514
20.00% male (33 sentences)
training/4514
80.00% Category.UNKNOWN (79 sentences)
training/4515
100.00% Category.UNKNOWN (13 sentences)
training/4516
100.00% Category.UNKNOWN (11 sentences)
training/4517
100.00% Category.UNKNOWN (41 sentences)
training/4518
50.00% male (163 sentences)
training/4518
50.00% Category.UNKNOWN (170 sentences)
training/4519
100.00% Category.UNKNOWN (65 sentences)
training/4521
100.00% Category.UNKNOWN (21 sentences)
training/4523
100.00% Category.UNKNOWN (38 sentences)
training/4524
44.44% Category.UNKNOWN (131 sentences)
training/4524
55.56% male (73 sentences)
training/4525
80.00% Category.UNKNOWN (86 sentences)
training/4525
20.00% male (18 sentences)
training/4532
100.00% Category.UNKNOWN (16 sentences)
training/4533
100.00% Category.UNKNOWN (16 sentences)
training/4534
100.00% Category.UNKNOWN (18 sentences)
training/4536
100.00% Category.UNKNOWN (75 sentences)
training/4537
100.00% Category.UNKNOWN (53

training/5138
100.00% male (13 sentences)
training/5139
15.79% male (56 sentences)
training/5139
84.21% Category.UNKNOWN (339 sentences)
training/514
100.00% Category.UNKNOWN (18 sentences)
training/5141
100.00% Category.UNKNOWN (57 sentences)
training/5142
100.00% Category.UNKNOWN (35 sentences)
training/5145
90.91% Category.UNKNOWN (158 sentences)
training/5145
9.09% male (30 sentences)
training/5146
100.00% Category.UNKNOWN (59 sentences)
training/5148
100.00% Category.UNKNOWN (121 sentences)
training/5149
50.00% Category.UNKNOWN (74 sentences)
training/5149
50.00% male (74 sentences)
training/5150
100.00% Category.UNKNOWN (286 sentences)
training/5152
80.00% male (74 sentences)
training/5152
20.00% Category.UNKNOWN (16 sentences)
training/5153
100.00% Category.UNKNOWN (148 sentences)
training/5154
100.00% Category.UNKNOWN (151 sentences)
training/5156
90.91% Category.UNKNOWN (216 sentences)
training/5156
9.09% male (25 sentences)
training/516
100.00% Category.UNKNOWN (123 sentences

training/5453
55.56% Category.UNKNOWN (103 sentences)
training/5453
44.44% male (70 sentences)
training/5454
100.00% Category.UNKNOWN (107 sentences)
training/5455
40.00% male (44 sentences)
training/5455
60.00% Category.UNKNOWN (50 sentences)
training/5456
16.67% female (39 sentences)
training/5456
50.00% Category.UNKNOWN (80 sentences)
training/5456
33.33% male (56 sentences)
training/5458
46.15% male (167 sentences)
training/5458
53.85% Category.UNKNOWN (154 sentences)
training/546
100.00% Category.UNKNOWN (18 sentences)
training/5460
100.00% Category.UNKNOWN (14 sentences)
training/5461
100.00% Category.UNKNOWN (25 sentences)
training/5464
100.00% Category.UNKNOWN (31 sentences)
training/5465
33.33% male (39 sentences)
training/5465
66.67% Category.UNKNOWN (71 sentences)
training/5467
94.74% Category.UNKNOWN (404 sentences)
training/5467
5.26% male (19 sentences)
training/5469
100.00% Category.UNKNOWN (36 sentences)
training/547
100.00% Category.UNKNOWN (82 sentences)
training/5470

training/6044
100.00% Category.UNKNOWN (8 sentences)
training/6046
50.00% male (25 sentences)
training/6046
50.00% Category.UNKNOWN (23 sentences)
training/6047
100.00% Category.UNKNOWN (150 sentences)
training/6048
100.00% Category.UNKNOWN (25 sentences)
training/6051
100.00% Category.UNKNOWN (31 sentences)
training/6054
40.00% male (63 sentences)
training/6054
60.00% Category.UNKNOWN (43 sentences)
training/6055
100.00% Category.UNKNOWN (18 sentences)
training/6056
50.00% Category.UNKNOWN (41 sentences)
training/6056
50.00% male (46 sentences)
training/6058
85.71% Category.UNKNOWN (110 sentences)
training/6058
14.29% male (32 sentences)
training/6059
100.00% Category.UNKNOWN (72 sentences)
training/6060
69.23% Category.UNKNOWN (229 sentences)
training/6060
15.38% female (32 sentences)
training/6060
15.38% male (49 sentences)
training/6062
53.85% Category.UNKNOWN (138 sentences)
training/6062
46.15% male (106 sentences)
training/6063
100.00% Category.UNKNOWN (125 sentences)
training/6

100.00% Category.UNKNOWN (65 sentences)
training/6672
100.00% Category.UNKNOWN (66 sentences)
training/6673
100.00% Category.UNKNOWN (18 sentences)
training/6674
100.00% Category.UNKNOWN (57 sentences)
training/6675
100.00% Category.UNKNOWN (12 sentences)
training/6677
100.00% Category.UNKNOWN (85 sentences)
training/6679
100.00% Category.UNKNOWN (35 sentences)
training/6680
100.00% Category.UNKNOWN (33 sentences)
training/6681
100.00% Category.UNKNOWN (40 sentences)
training/6682
100.00% Category.UNKNOWN (58 sentences)
training/6683
100.00% Category.UNKNOWN (19 sentences)
training/6684
100.00% Category.UNKNOWN (16 sentences)
training/6685
100.00% Category.UNKNOWN (21 sentences)
training/6686
100.00% Category.UNKNOWN (18 sentences)
training/6687
100.00% Category.UNKNOWN (36 sentences)
training/6689
100.00% Category.UNKNOWN (284 sentences)
training/6694
100.00% Category.UNKNOWN (63 sentences)
training/6695
100.00% Category.UNKNOWN (33 sentences)
training/6696
100.00% Category.UNKNOWN (1

85.71% Category.UNKNOWN (140 sentences)
training/7043
100.00% Category.UNKNOWN (305 sentences)
training/7045
100.00% Category.UNKNOWN (73 sentences)
training/7046
100.00% Category.UNKNOWN (48 sentences)
training/7047
100.00% Category.UNKNOWN (88 sentences)
training/7048
100.00% Category.UNKNOWN (40 sentences)
training/7049
100.00% Category.UNKNOWN (57 sentences)
training/7052
100.00% Category.UNKNOWN (15 sentences)
training/7057
100.00% Category.UNKNOWN (16 sentences)
training/7058
100.00% Category.UNKNOWN (145 sentences)
training/706
100.00% Category.UNKNOWN (129 sentences)
training/7060
100.00% Category.UNKNOWN (87 sentences)
training/7061
100.00% Category.UNKNOWN (161 sentences)
training/7062
95.65% Category.UNKNOWN (550 sentences)
training/7062
4.35% male (25 sentences)
training/7063
100.00% Category.UNKNOWN (35 sentences)
training/7064
100.00% Category.UNKNOWN (29 sentences)
training/7065
100.00% Category.UNKNOWN (72 sentences)
training/7066
72.73% Category.UNKNOWN (201 sentences)

training/7628
95.83% Category.UNKNOWN (487 sentences)
training/7628
4.17% male (20 sentences)
training/7629
100.00% Category.UNKNOWN (241 sentences)
training/7631
100.00% Category.UNKNOWN (32 sentences)
training/7632
69.23% Category.UNKNOWN (160 sentences)
training/7632
30.77% male (91 sentences)
training/7633
42.86% male (69 sentences)
training/7633
57.14% Category.UNKNOWN (77 sentences)
training/7634
100.00% Category.UNKNOWN (39 sentences)
training/7635
100.00% Category.UNKNOWN (142 sentences)
training/7636
100.00% Category.UNKNOWN (28 sentences)
training/7637
100.00% Category.UNKNOWN (11 sentences)
training/7638
100.00% Category.UNKNOWN (11 sentences)
training/7639
100.00% Category.UNKNOWN (92 sentences)
training/764
100.00% Category.UNKNOWN (39 sentences)
training/7640
33.33% male (30 sentences)
training/7640
66.67% Category.UNKNOWN (49 sentences)
training/7641
100.00% Category.UNKNOWN (12 sentences)
training/7642
100.00% Category.UNKNOWN (157 sentences)
training/7643
85.71% Catego

training/8109
41.67% male (122 sentences)
training/811
100.00% Category.UNKNOWN (39 sentences)
training/8111
100.00% Category.UNKNOWN (165 sentences)
training/8112
100.00% Category.UNKNOWN (39 sentences)
training/8113
62.50% Category.UNKNOWN (107 sentences)
training/8113
37.50% male (68 sentences)
training/8115
96.55% Category.UNKNOWN (570 sentences)
training/8115
3.45% male (27 sentences)
training/8117
76.92% male (226 sentences)
training/8117
23.08% Category.UNKNOWN (90 sentences)
training/8119
66.67% Category.UNKNOWN (52 sentences)
training/8119
33.33% male (24 sentences)
training/8120
100.00% Category.UNKNOWN (14 sentences)
training/8123
100.00% Category.UNKNOWN (14 sentences)
training/8125
100.00% Category.UNKNOWN (70 sentences)
training/8126
100.00% Category.UNKNOWN (50 sentences)
training/8130
91.67% Category.UNKNOWN (237 sentences)
training/8130
8.33% male (22 sentences)
training/8131
30.00% Category.UNKNOWN (80 sentences)
training/8131
70.00% male (163 sentences)
training/8132

60.00% Category.UNKNOWN (165 sentences)
training/8597
40.00% male (154 sentences)
training/8598
72.73% Category.UNKNOWN (226 sentences)
training/8598
27.27% male (93 sentences)
training/8599
50.00% Category.UNKNOWN (116 sentences)
training/8599
50.00% male (123 sentences)
training/86
100.00% Category.UNKNOWN (17 sentences)
training/8600
58.33% Category.UNKNOWN (139 sentences)
training/8600
41.67% male (113 sentences)
training/8602
71.43% Category.UNKNOWN (155 sentences)
training/8602
28.57% male (59 sentences)
training/8603
75.00% Category.UNKNOWN (79 sentences)
training/8603
25.00% male (23 sentences)
training/8604
100.00% Category.UNKNOWN (132 sentences)
training/8605
100.00% Category.UNKNOWN (9 sentences)
training/8606
70.00% Category.UNKNOWN (177 sentences)
training/8606
30.00% male (66 sentences)
training/8607
85.71% Category.UNKNOWN (138 sentences)
training/8607
14.29% male (17 sentences)
training/8608
44.44% male (101 sentences)
training/8608
55.56% Category.UNKNOWN (104 sentenc

training/9030
100.00% Category.UNKNOWN (62 sentences)
training/9031
100.00% Category.UNKNOWN (100 sentences)
training/9032
100.00% Category.UNKNOWN (300 sentences)
training/9033
100.00% Category.UNKNOWN (68 sentences)
training/9034
100.00% Category.UNKNOWN (46 sentences)
training/9036
100.00% Category.UNKNOWN (156 sentences)
training/9039
100.00% Category.UNKNOWN (13 sentences)
training/904
100.00% Category.UNKNOWN (61 sentences)
training/9040
100.00% Category.UNKNOWN (29 sentences)
training/9041
100.00% Category.UNKNOWN (9 sentences)
training/9044
100.00% Category.UNKNOWN (21 sentences)
training/9045
100.00% Category.UNKNOWN (56 sentences)
training/9047
78.26% Category.UNKNOWN (341 sentences)
training/9047
21.74% male (100 sentences)
training/9048
57.14% Category.UNKNOWN (83 sentences)
training/9048
42.86% male (93 sentences)
training/9049
100.00% Category.UNKNOWN (28 sentences)
training/9051
100.00% Category.UNKNOWN (114 sentences)
training/9053
100.00% Category.UNKNOWN (60 sentences

37.50% Category.UNKNOWN (33 sentences)
training/9425
100.00% Category.UNKNOWN (31 sentences)
training/9426
100.00% Category.UNKNOWN (10 sentences)
training/9427
100.00% Category.UNKNOWN (51 sentences)
training/9428
100.00% Category.UNKNOWN (20 sentences)
training/9429
100.00% Category.UNKNOWN (47 sentences)
training/943
100.00% Category.UNKNOWN (25 sentences)
training/9431
100.00% Category.UNKNOWN (43 sentences)
training/9432
100.00% Category.UNKNOWN (68 sentences)
training/9433
100.00% Category.UNKNOWN (91 sentences)
training/9434
100.00% Category.UNKNOWN (63 sentences)
training/9435
100.00% Category.UNKNOWN (10 sentences)
training/9436
60.00% Category.UNKNOWN (239 sentences)
training/9436
40.00% male (288 sentences)
training/9437
100.00% Category.UNKNOWN (180 sentences)
training/9438
100.00% Category.UNKNOWN (26 sentences)
training/944
100.00% Category.UNKNOWN (27 sentences)
training/9441
100.00% Category.UNKNOWN (33 sentences)
training/9443
100.00% Category.UNKNOWN (123 sentences)
t