# Describe: Creating 'cloze' exercises with Cicero

## Devise



## Plan

As always, let's plan out our work before we start writing Python code. We will use the following steps to create our cloze exercises:

**Pseudocode for Ciceronian 'cloze' exercises**

- Load our library of Latin texts, keeping only those by Cicero
- Create a list of sentences from which we can draw our exercises, keeping them at a certain length (~10-25 words)
- Pick a sentence at random
- Pick a word at random to mask
- Create a set of multiple-choice answers, i.e. three random words in addition to the removed word
- Ask user for input to test whether the removed word can be correctly identified

## Code

In [1]:
# Preliminary imports
from natsort import natsorted
from pprint import pprint
from time import sleep

As always, let's set up our corpus reader and pull out the texts we want to describe.

In [28]:
# PC 1: Load our library of Latin texts, keeping only those by Cicero
# (to keep loading times manageable, we filter further to the De finibus files)

from cltkreaders.lat import LatinTesseraeCorpusReader

T = LatinTesseraeCorpusReader()

cicero = natsorted([fileid for fileid in T.fileids() if 'de_finibus' in fileid])
pprint(cicero[:10])

['cicero.de_finibus_bonorum_et_malorum.part.1.tess',
 'cicero.de_finibus_bonorum_et_malorum.part.2.tess',
 'cicero.de_finibus_bonorum_et_malorum.part.3.tess',
 'cicero.de_finibus_bonorum_et_malorum.part.4.tess',
 'cicero.de_finibus_bonorum_et_malorum.part.5.tess']


In [37]:
# PC 2a: Create a list of sentences from which we can draw our exercises

sents = list(T.sents(fileids=cicero))

In [38]:
# Show example sentences

for i, sent in enumerate(sents[:10], 1):
    print(f'{i}: {sent}')

1: Non eram nescius, Brute, cum, quae summis ingeniis exquisitaque doctrina philosophi Graeco sermone tractavissent, ea Latinis litteris mandaremus, fore ut hic noster labor in varias reprehensiones incurreret.
2: nam quibusdam, et iis quidem non admodum indoctis, totum hoc displicet philosophari.
3: quidam autem non tam id reprehendunt, si remissius agatur, sed tantum studium tamque multam operam ponendam in eo non arbitrantur.
4: erunt etiam, et ii quidem eruditi Graecis litteris, contemnentes Latinas, qui se dicant in Graecis legendis operam malle consumere.
5: postremo aliquos futuros suspicor, qui me ad alias litteras vocent, genus hoc scribendi, etsi sit elegans, personae tamen et dignitatis esse negent.
6: contra quos omnis dicendum breviter existimo.
7: Quamquam philosophiae quidem vituperatoribus satis responsum est eo libro, quo a nobis philosophia defensa et collaudata est, cum esset accusata et vituperata ab Hortensio.
8: qui liber cum et tibi probatus videretur et iis, quo

In [39]:
print(len(sents))

2512


In [40]:
# PC 2b: Keep only sentences of moderate length; len() counts tokens, punctuation included
sents = [sent.as_doc() for sent in sents if 10 < len(sent) < 25]
for i, sent in enumerate(sents[:10], 1):
    print(f'{i}: {sent}')

1: nam quibusdam, et iis quidem non admodum indoctis, totum hoc displicet philosophari. 
2: quidam autem non tam id reprehendunt, si remissius agatur, sed tantum studium tamque multam operam ponendam in eo non arbitrantur. 
3: erunt etiam, et ii quidem eruditi Graecis litteris, contemnentes Latinas, qui se dicant in Graecis legendis operam malle consumere. 
4: sive enim ad sapientiam perveniri potest, non paranda nobis solum ea, sed fruenda etiam sapientia est; 
5: Iis igitur est difficilius satis facere, qui se Latina scripta dicunt contemnere. 
6: Synephebos ego, inquit, potius Caecilii aut Andriam Terentii quam utramque Menandri legam? 
7: 'ferreum scriptorem', verum, opinor, scriptorem tamen, ut legendus sit. 
8: rudem enim esse omnino in nostris poe+tis aut inertissimae segnitiae est aut fastidii delicatissimi. 
9: mihi quidem nulli satis eruditi videntur, quibus nostra ignota sunt. 
10: nam si dicent ab illis has res esse tractatas, ne ipsos quidem Graecos est cur tam multos lega

Loading these sentences into memory takes 15 seconds on my machine. Loading all of the sentences from the Cicero files would take even longer. When we find ourselves in a situation like this, it can often be a huge timesaver to write computation-intensive results to disk for quick retrieval later. Here is an example of "pickling" the sentences we just loaded so that they can be read back from disk rather than reprocessed.

In [41]:
import pickle

# The sentences are pickled as spaCy Doc objects (not strings); later cells rely on token attributes
with open('../data/cicero-sents.pickle', 'wb') as f:
    pickle.dump(sents, f)

In [50]:
with open('../data/cicero-sents.pickle', 'rb') as f:
    sents = pickle.load(f)
sents[0]

nam quibusdam, et iis quidem non admodum indoctis, totum hoc displicet philosophari. 
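
The save-then-load pattern above can be wrapped in a small helper so that the expensive step only runs when no cached file exists yet. This is a minimal sketch, not part of the original notebook: the name `load_or_compute` and its `compute` argument are hypothetical, and it assumes that whatever `compute()` returns can be pickled.

```python
import pickle
from pathlib import Path


def load_or_compute(path, compute):
    """Return the pickled object at `path`, computing and saving it first if needed."""
    path = Path(path)
    if path.exists():
        # Cache hit: read the stored result instead of recomputing it
        with path.open('rb') as f:
            return pickle.load(f)
    # Cache miss: run the expensive step once, then store the result
    result = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open('wb') as f:
        pickle.dump(result, f)
    return result
```

In this notebook, the call might look like `sents = load_or_compute('../data/cicero-sents.pickle', build_sents)`, where `build_sents` is a hypothetical function wrapping the sentence-loading and filtering cells above. Remember to delete the pickle file whenever the filtering criteria change, otherwise the stale version will be loaded.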

In [56]:
# PC 3: Pick a sentence at random

import random
random.seed(42)

exercise = random.choice(sents)
exercise

A primo, ut opinor, animantium ortu petitur origo summi boni. 

In [59]:
# PC 4: Pick a word at random to mask

In [61]:
for i, token in enumerate(exercise):
    print(f'{i}: {token}')

0: A
1: primo
2: ,
3: ut
4: opinor
5: ,
6: animantium
7: ortu
8: petitur
9: origo
10: summi
11: boni
12: .


In [63]:
for token in exercise:
    print(f'{token.i}: {token.text}')

0: A
1: primo
2: ,
3: ut
4: opinor
5: ,
6: animantium
7: ortu
8: petitur
9: origo
10: summi
11: boni
12: .


In [66]:
for token in exercise:
    print(f'{token.i}: {token.is_alpha}')

0: True
1: True
2: False
3: True
4: True
5: False
6: True
7: True
8: True
9: True
10: True
11: True
12: False


In [67]:
remove_options = [token.i for token in exercise if token.is_alpha]

In [125]:
random.seed(1)
remove_choice = random.choice(remove_options)
remove_choice

3

In [126]:
cloze = ' '.join([token.text if token.i != remove_choice else '_____' for token in exercise])

In [127]:
cloze

'A primo , _____ opinor , animantium ortu petitur origo summi boni .'
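
Because the tokens are joined with single spaces, the cloze string has a stray space before each comma and full stop. One possible cleanup, sketched here on a plain string, is a regular expression that removes whitespace before common closing punctuation. The helper name `tidy_cloze` is hypothetical, and the sketch assumes only the punctuation marks listed in the pattern; quotation marks and brackets would need extra handling.

```python
import re


def tidy_cloze(text):
    """Remove the space that ' '.join() leaves before closing punctuation."""
    return re.sub(r'\s+([,.;:?!])', r'\1', text)


cloze_example = 'A primo , _____ opinor , animantium ortu petitur origo summi boni .'
print(tidy_cloze(cloze_example))
# A primo, _____ opinor, animantium ortu petitur origo summi boni.
```

An alternative, if you would rather stay with the token objects, would be to rebuild the sentence from each token's original trailing whitespace instead of joining on spaces.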

In [80]:
answer = exercise[remove_choice].text
answer

'ut'

In [82]:
# PC 5: Create a set of multiple-choice answers, i.e. three random words in addition to the removed word

In [93]:
vocab = set([word.text for sent in sents for word in sent])

In [101]:
random.seed(42)
wrong_answers = random.sample(list(vocab - {answer}), 3)
wrong_answers

['futura', 'mercede', 'Principio']
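
Two details of this step are worth keeping in mind. The vocabulary is built from every token, so punctuation marks can be drawn as wrong answers. Also, Python sets of strings have no stable order from one session to the next, so `random.seed` alone may not reproduce the same sample. The sketch below is one way to address both, using plain strings; the helper `pick_distractors` and the toy vocabulary are hypothetical, and `str.isalpha` stands in for the token-level `is_alpha` check used earlier.

```python
import random


def pick_distractors(vocab, answer, k=3, rng=random):
    """Sample k alphabetic words, other than the answer, in a reproducible way."""
    # Sorting gives the candidates a fixed order, so a seeded generator
    # returns the same sample on every run
    candidates = sorted(
        word for word in vocab
        if word.isalpha() and word.lower() != answer.lower()
    )
    return rng.sample(candidates, k)


toy_vocab = {'ut', 'Ut', ',', '.', 'origo', 'summi', 'boni', 'ortu'}
print(pick_distractors(toy_vocab, 'ut', rng=random.Random(42)))
```

Passing a dedicated `random.Random` instance keeps the sampling independent of any other code that touches the global random state.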

In [105]:
# PC 6: Ask user for input to test whether the removed word can be correctly identified

quiz = {cloze: [answer] + wrong_answers}
pprint(quiz)

{'A primo , _____ opinor , animantium ortu petitur origo summi boni .': ['ut',
                                                                         'futura',
                                                                         'mercede',
                                                                         'Principio']}


In [113]:
for question, alternatives in quiz.items():
    correct_answer = alternatives[0]
    for alternative in sorted(alternatives):
        print(f"  - {alternative}")
    print()
    
    answer = input(f"{question}? ")
    if answer == correct_answer:
        print("Correct!")
    else:
        print(f"Incorrect! The answer is {correct_answer}")

  - Principio
  - futura
  - mercede
  - ut

Correct!
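
The check `answer == correct_answer` is strict: a trailing space or a capital letter typed by the user counts as a wrong answer. If that is stricter than you want, a more forgiving comparison could look like the following sketch. The function name `is_correct` is hypothetical, and whether case should be ignored is a design choice rather than a rule.

```python
def is_correct(response, correct_answer):
    """Compare a typed response with the expected word, ignoring case and outer spaces."""
    return response.strip().casefold() == correct_answer.strip().casefold()


print(is_correct(' Ut ', 'ut'))     # True
print(is_correct('futura', 'ut'))   # False
```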


In [134]:
def create_cloze_qa_bank(sents, vocab, n=10):
    """Build n cloze questions, each mapping a masked sentence to [answer, three wrong answers]."""
    sents = random.sample(sents, n)
    cloze_qa_bank = {}
    for sent in sents:
        remove_options = [token.i for token in sent if token.is_alpha]
        remove_choice = random.choice(remove_options)
        cloze = ' '.join([token.text if token.i != remove_choice else '_____' for token in sent])
        answer = sent[remove_choice].text
        wrong_answers = random.sample(list(vocab - {answer}), 3)
        cloze_qa_bank[cloze] = [answer] + wrong_answers
    return cloze_qa_bank

In [136]:
quiz = create_cloze_qa_bank(sents, vocab, n=10)

for question, alternatives in quiz.items():
    correct_answer = alternatives[0]
    for alternative in sorted(alternatives):
        print(f"  - {alternative}")
    print()
    
    answer = input(f"{question}? ")
    if answer == correct_answer:
        print("Correct!")
    else:
        print(f"Incorrect! The answer is {correct_answer}")
    print()

  - angoribus
  - malitias
  - mediocrem
  - quid

Correct!
  - Democritum
  - curet
  - patronusne
  - profecto

Correct!
  - Postumius
  - in
  - novi
  - velles

Correct!
  - aliis
  - laudem
  - posui
  - tota

Incorrect! The answer is posui
  - ageremus
  - illa
  - interest
  - multitudinem

Correct!
  - comem
  - fortitudinis
  - nec
  - re

Correct!
  - accessit
  - egens
  - in
  - meminisse

Correct!
  - describit
  - pertinenti
  - quoniam
  - saxum

Correct!
  - effectum
  - profuso
  - quo
  - sese

Correct!
  - aeternum
  - delectantur
  - enim
  - mori

Incorrect! The answer is enim
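
The loop above reports each result but does not keep a running total. A possible extension, sketched below, wraps the loop in a function that counts correct answers and prints a final score. The name `run_quiz` and the `ask` and `show` parameters are hypothetical; they default to `input` and `print` and exist only so the function can be exercised without a keyboard.

```python
def run_quiz(quiz, ask=input, show=print):
    """Ask every question in `quiz` and return the number answered correctly."""
    score = 0
    for question, alternatives in quiz.items():
        correct_answer = alternatives[0]  # by construction, the first item is the answer
        for alternative in sorted(alternatives):
            show(f"  - {alternative}")
        response = ask(f"{question}? ")
        if response.strip() == correct_answer:
            score += 1
            show("Correct!")
        else:
            show(f"Incorrect! The answer is {correct_answer}")
    show(f"Score: {score}/{len(quiz)}")
    return score
```

Calling `run_quiz(quiz)` in the notebook behaves like the cell above, with a score line added at the end.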


TK

## Explore

### Next steps

- ***Change author***: It is becoming a pattern! But that is because this is where exploration lies for us, at least in the early stages. Experiment with sentences from authors other than Cicero, or with works of Cicero that we have not yet looked at.
- ***Change objective***: Try inserting a random word into a sentence and seeing whether the user can identify the errant addition. Try scrambling the letters of one or more words (all?) in a sentence. Get the part of speech of masked words and ask the user for Mad Libs-style insertions. This is a Deform experiment, so feel free to manipulate the text in any way you see fit.
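
As a starting point for the letter-scrambling idea, here is a minimal sketch that works on a plain list of token strings rather than on the corpus objects used above. The function names and the choice to leave words of three letters or fewer untouched are assumptions made for illustration.

```python
import random


def scramble_word(word, rng=random):
    """Return the letters of `word` in a shuffled order."""
    letters = list(word)
    rng.shuffle(letters)
    return ''.join(letters)


def scramble_sentence(tokens, rng=random):
    """Scramble alphabetic tokens longer than three letters; leave the rest alone."""
    return [
        scramble_word(tok, rng) if tok.isalpha() and len(tok) > 3 else tok
        for tok in tokens
    ]


tokens = ['A', 'primo', ',', 'ut', 'opinor', ',', 'animantium', 'ortu', '.']
print(' '.join(scramble_sentence(tokens, random.Random(0))))
```

A shuffle can occasionally return the original spelling, especially for short words, so an exercise built on this may want to reshuffle until the result differs.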

### For the future

- ***Consider the multiple choice***: Right now we are inserting random words from Cicero's vocabulary into the multiple choice. There are more principled ways of going about this process though. It may already have been clear from previous examples that 

## Further Reading
- André, J. 1949. *Étude sur les termes de couleur dans la langue latine*. Paris: Librairie C. Klincksieck.
- Bramble, J.C. 1983. “Lucan.” In Kenney, E.J. ed. *Cambridge History Latin Literature: The Early Principate*. Cambridge: Cambridge University Press. 533–57.
- Bradley, M. 2009. *Colour and Meaning in Ancient Rome*. Cambridge Classical Studies. Cambridge: Cambridge University Press.
- Gries, S.T. 2016. *Quantitative Corpus Linguistics with R: A Practical Introduction*. 2nd ed. New York: Routledge. doi:10.4324/9781315746210. Spec. Ch. 2.2 on "frequency lists".