# Texts 4 to 6 Vocabulary Item Extraction Evaluation

This notebook uses the TextScorer class (which in turn uses the TextItems class) to extract vocabulary items from three sample texts (Texts 4 to 6). We then analyse the success of the extraction by checking how many of the teacher-selected target items were extracted for each text.

#### Create instances of the TextScorer class to extract vocab items and their teacher scores

In [1]:
from tool.TextScorer import TextScorer
from tool.TeacherScores import TeacherScores

In [2]:
# For each text: extract the vocabulary items, then add the teacher scores to its master table
text4 = TextScorer('files/sample_texts/text4.txt')
text4_with_teacherscores = TeacherScores('text4').add_teacher_scores(text4.master_table)

text5 = TextScorer('files/sample_texts/text5.txt')
text5_with_teacherscores = TeacherScores('text5').add_teacher_scores(text5.master_table)

text6 = TextScorer('files/sample_texts/text6.txt')
text6_with_teacherscores = TeacherScores('text6').add_teacher_scores(text6.master_table)

In [3]:
# Create dictionaries of the target vocabulary items (as extracted by teachers) for each text
target_vocab_items_text4 = TeacherScores('text4').teacher_scores_dict
target_vocab_items_text5 = TeacherScores('text5').teacher_scores_dict
target_vocab_items_text6 = TeacherScores('text6').teacher_scores_dict
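
Before comparing against the tool's output, it can help to know how the targets split between single words and multi-word items. The sketch below assumes, as the outputs later in this notebook suggest, that multi-word targets are stored as tuple keys and single words as plain strings. The function name `count_target_types` and the toy dictionary (including its placeholder score values) are hypothetical and are not part of the tool.

```python
from collections import Counter


def count_target_types(teacher_dict):
    """Count single-word (str) and multi-word (tuple) target items."""
    counts = Counter(
        'multi-word' if isinstance(key, tuple) else 'single-word'
        for key in teacher_dict
    )
    return dict(counts)


# Toy data for illustration only; the values are placeholder scores
toy_targets = {'scale': 3, ('focus', 'on'): 2, ('payment', 'plan'): 1}
print(count_target_types(toy_targets))
```

In the notebook itself, the same call could be made on `target_vocab_items_text4` and the other two dictionaries, provided their keys follow the assumed convention.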

In [4]:
# Helper to evaluate whether all teacher target items have been extracted
def evaluate_found(tool, teacherdict):
    count_total = 0
    count_found = 0
    count_notfound = 0
    count_notfound_single = 0
    not_found = []
    for key in teacherdict:
        count_total += 1
        if key in tool:
            count_found += 1
        else:
            count_notfound += 1
            not_found.append(key)
            # Multi-word items are tuples; anything else is a single word
            if not isinstance(key, tuple):
                count_notfound_single += 1
    # Percentage of target items found, rounded to the nearest whole number
    percent_found = round((count_found / count_total) * 100)
    print(count_total, 'target vocab items.', count_found, 'found, (', percent_found, '%).',
          count_notfound, 'unfound, of which', count_notfound_single, 'is/are single word(s). Unfound items: \n', not_found)

#### Evaluate the success of the extraction for each text by comparing its extracted vocabulary items with the teacher targets

In [5]:
print('Text 4')
evaluate_found(text4_with_teacherscores['word_in_text'].tolist(), target_vocab_items_text4)

Text 4
52 target vocab items. 36 found, ( 69 %). 16 unfound, of which 2 is/are single word(s). Unfound items: 
 [('credit', 'card', 'debt'), ('eat', 'the', 'fees'), ('expand', 'the', 'pie'), ('focus', 'on'), ('land', 'a', 'client'), ('marketable', 'skills'), ('master', 'class'), ('on', 'your', 'own', 'time'), ('payment', 'plan'), ('product', 'coaches'), 'productizing', ('put', 'in', 'the', 'work'), 'scale', ('side', 'business'), ('walk', 'you', 'through'), ('work', 'one', 'on', 'one')]


In [6]:
print('Text 5')
evaluate_found(text5_with_teacherscores['word_in_text'].tolist(), target_vocab_items_text5)

Text 5
43 target vocab items. 36 found, ( 84 %). 7 unfound, of which 0 is/are single word(s). Unfound items: 
 [('apply', 'for'), ('cover', 'costs'), ('force', 'to'), ('illegal', 'activities'), ('out', 'of', 'fear'), ('set', 'restrictions'), ('throw', 'into', 'disarray')]


In [7]:
print('Text 6')
evaluate_found(text6_with_teacherscores['word_in_text'].tolist(), target_vocab_items_text6)

Text 6
46 target vocab items. 35 found, ( 76 %). 11 unfound, of which 1 is/are single word(s). Unfound items: 
 [('(across', 'the)', 'globe'), ('(parental)', 'leave', '(benefits)'), ('(purpose)', 'driven'), ('address', '[verb]'), ('committed', 'to'), ('date', 'back', 'to'), 'impactful', ('leading', 'news'), ('make', 'headlines'), ('set', '(goals)'), ('track', '(progress)')]
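
Several of the unfound Text 6 targets carry teacher annotations, such as parentheses around some words or a bracketed tag like `[verb]`, so an exact comparison against the extracted items cannot match them. The following is a minimal sketch of one way to account for this, not the tool's own method. It assumes (the notebook does not confirm this) that parenthesised words are optional context and that bracketed tokens are tags. The names `normalise_target` and `summarise_recall` and the toy data are hypothetical.

```python
import re


def normalise_target(item):
    """Strip teacher annotations from a target item.

    Assumption (not confirmed by the notebook): "(...)" marks optional
    context words and "[...]" marks a tag such as a part of speech.
    """
    words = item if isinstance(item, tuple) else (item,)
    # Join first so that a bracket spanning two words, e.g. '(across', 'the)',
    # is removed as one unit
    text = re.sub(r'\([^)]*\)|\[[^\]]*\]', ' ', ' '.join(words))
    core = tuple(text.split())
    if not core:
        return item  # nothing left after stripping: keep the original item
    return core[0] if len(core) == 1 else core


def summarise_recall(extracted, targets, normalise=None):
    """Return counts and the missed targets instead of printing them."""
    extracted_set = set(extracted)
    missed = []
    for target in targets:
        key = normalise(target) if normalise else target
        if key not in extracted_set:
            missed.append(target)
    total = len(targets)
    found = total - len(missed)
    percent = round(100 * found / total) if total else 0
    return {'total': total, 'found': found, 'percent_found': percent, 'missed': missed}


# Toy data for illustration only (not the notebook's real extraction results)
toy_extracted = ['globe', 'leave', ('date', 'back', 'to')]
toy_targets = {('(across', 'the)', 'globe'): 1, ('address', '[verb]'): 1, 'impactful': 1}

print(summarise_recall(toy_extracted, toy_targets))
print(summarise_recall(toy_extracted, toy_targets, normalise=normalise_target))
```

Reducing a target to its core word makes matching more lenient: a target such as `('(parental)', 'leave', '(benefits)')` would count as found whenever `'leave'` is extracted, even in an unrelated sense. Results obtained this way are best reported alongside the exact-match figures rather than in place of them.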


#### Consider the unique extracted vocabulary items for each text

In [8]:
print('Text 4 Extracted Items:', set(text4_with_teacherscores['word_in_text'].tolist()), '\n')
print('Text 5 Extracted Items:', set(text5_with_teacherscores['word_in_text'].tolist()), '\n')
print('Text 6 Extracted Items:', set(text6_with_teacherscores['word_in_text'].tolist()), '\n')

Text 4 Extracted Items: {('break', 'down'), 'day', 'new', 'focus', 'ridiculous', 'no-brainer', 'read', 'them', 'work', 'success', 'button', 'my', 'zero', 'everyone', 'who', ('on', 'the', 'side'), 'credit-card', 'ensure', 'delighted', 'plans', 'eat', 'focuses', ('i', 'know'), 'keep', 'now', 'first', 'take', ('know', 'what'), 'it', 'shows', ('check', 'out'), 'that', 'want', 'membership', 'ban', 'follow-ups', 'but', 'finance', 'save', 'the', ('cut', 'back'), 'mornings', 'material', 'book', 'me', 'thing', 'skills', 'your', 'complete', 'clients', 'students', 'days', 'period', 'they', 'system', 'what', 'three', 'master', 'common', 'works', 'yet', 'or', ('scale', 'up'), 'each', 'offer', ('what', 'if'), 'client', 'busy', 'page', 'click', 'idea', ('on', 'the', 'fence'), 'spending', 'crafted', 'money', 'consulting', 'would', 'for', ('data', 'points'), 'hours', 'worry', 'walk', ('there', 'be'), 'got', 'ask', 'guide', 'job', 'pay', 'from', ('lie', 'around'), 'step-by-step', 'hi', 'techniques', 'to