# Flashcards
### Building a list of words to learn before reading a text

Tokenize the text, that is, build a list of the base words (lemmas) used in the text.

In [26]:
# text = open("/Users/danjane/cltk_data/latin/text/latin_text_latin_library/falcone.txt", "r").read()
text = '''
Regia Solis erat sublimibus alta columnis, 
clara micante auro flammasque imitante pyropo, 
cuius ebur nitidum fastigia summa tegebat, 
argenti bifores radiabant lumine valvae. 
'''

import nltk_outils
tokens = nltk_outils.tokenize(text)
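The internals of `nltk_outils.tokenize` are not shown in this notebook; its output further down contains lemma labels such as `sum1` and `-que`, so it clearly does more than split on whitespace. As a rough illustration only, the idea of "reduce each word to a base form" could be sketched as follows. The function `toy_tokenize` and the lookup table `TOY_LEMMAS` are hypothetical and stand in for a real lemmatizer.

```python
import re

# Hypothetical lookup table (illustration only): inflected form -> base form.
# A real lemmatizer covers the whole language; this one knows three words.
TOY_LEMMAS = {
    "erat": "sum",
    "sublimibus": "sublimis",
    "columnis": "columna",
}


def toy_tokenize(text):
    """Lowercase the text, keep runs of letters, map known forms to a base form."""
    words = re.findall(r"[a-z]+", text.lower())
    return [TOY_LEMMAS.get(word, word) for word in words]


print(toy_tokenize("Regia Solis erat sublimibus alta columnis,"))
```

Words missing from the table are passed through unchanged, which is why a full lemmatizer is needed for real texts.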


Identify the words that are used more often than usual in the text.

In [27]:
import collections
token_counts = collections.Counter(tokens)

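# Load the corpus-wide token list (one token per line) and count each token
with open("all_tokens.csv", "r") as corpus_file:
    all_tokens = [line.strip() for line in corpus_file]
all_token_counts = collections.Counter(all_tokens)

# Weight = occurrences in this text / occurrences in the whole corpus.
# Tokens that never appear in the corpus get the maximum weight of 1.
token_weights = {}
for token, count in token_counts.items():
    if token in all_token_counts:
        token_weights[token] = count / all_token_counts[token]
    else:
        token_weights[token] = 1.

# Sort by weight, then reverse so the rarest words come first
tokens_sorted = sorted(token_weights.items(), key=lambda kv: kv[1])
tokens_sorted = tokens_sorted[::-1]
print(tokens_sorted)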


[('biforis', 0.2), ('pyropus', 0.1), ('valva', 0.025), ('ebur', 0.021739130434782608), ('nitidus', 0.014285714285714285), ('radio', 0.010869565217391304), ('sublimis', 0.007042253521126761), ('mico', 0.006578947368421052), ('fastigium', 0.00641025641025641), ('flammo', 0.0030581039755351682), ('columna', 0.0026666666666666666), ('imito', 0.0024330900243309003), ('argentum', 0.0019880715705765406), ('tego', 0.0011862396204033216), ('summum', 0.0011695906432748538), ('regius', 0.0011682242990654205), ('lumen', 0.0010822510822510823), ('clarus', 0.0009337068160597573), ('aurum', 0.0006788866259334691), ('alo', 0.0005076142131979696), ('solus1', 0.00035587188612099647), ('sum1', 2.9896259977876768e-05), ('-que', 2.962962962962963e-05), ('qui1', 1.3497644661006655e-05)]
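To make the weighting concrete, here is a minimal, self-contained sketch of the same ratio on toy data. The helper name `relative_weights` and the toy corpus counts are assumptions chosen for illustration; they are not the real corpus figures, although a weight of 0.2 for `biforis` is consistent with one occurrence in the text against five in the corpus.

```python
import collections


def relative_weights(text_tokens, corpus_tokens):
    """Return (token, weight) pairs, rarest-in-corpus first.

    weight = count in the text / count in the corpus,
    or 1.0 when the token is absent from the corpus.
    """
    text_counts = collections.Counter(text_tokens)
    corpus_counts = collections.Counter(corpus_tokens)
    weights = {}
    for token, count in text_counts.items():
        if token in corpus_counts:
            weights[token] = count / corpus_counts[token]
        else:
            weights[token] = 1.0
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


# Toy data (made up): each word occurs once in the text.
toy_text = ["biforis", "sum1", "valva"]
toy_corpus = ["biforis"] * 5 + ["sum1"] * 100 + ["valva"] * 40

print(relative_weights(toy_text, toy_corpus))
```

Very common words such as `sum1` end up at the bottom of the list, so the top of the list is a reasonable candidate set for flashcards.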


In [28]:
tokens_to_translate = [token for token, weight in tokens_sorted]

Translate the tokens

In [29]:
import unidecode

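with open("../FlashCards/plaintext-lewis-short/lewis-short.txt", "r") as dictionary_file:
    dictionary_lines = dictionary_file.readlines()

# Map each headword (diacritics stripped) to the rest of its dictionary entry
lewis_short = {}
for line in dictionary_lines:
    parts = line.split(',', 1)
    if len(parts) > 1:
        lewis_short[unidecode.unidecode(parts[0])] = parts[1]

# Work through the words we want, trying to match their meanings
num_flashcards = 0
for wanted in tokens_to_translate:
    if wanted in lewis_short:
        num_flashcards += 1
        # The literal 'flipcard' separates the term from its definition
        print(wanted + 'flipcard%s' % (lewis_short[wanted]))
    # else:
        # print(wanted + ': not found')

    # Stop once more than 5 flashcards have been printed
    if num_flashcards > 5:
        break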
        

biforisflipcard e (bĭfŏrus, a, um, Vitr. 4, 6 fin.), adj. bis-foris.  Having two doors or folding-doors: valvae, Ov. M. 2, 4: fenestrae, id. P. 3, 3, 5.— Having two openings or holes, double: via (narium), App. Dogm. Plat. 1.—Poet.: ubi assuetis biforem dat tibia cantum, Verg. A. 9, 618 (bisonum, imparem, Serv.); Aus. Cent. Nupt. 27: biforis tumultus, Stat. Th. 4, 668.

pyropusflipcard i, m., = πυρωπός (fire-colored), a metallic mixture, gold - bronze, bronze, Plin. 34, 8, 20, § 94; Lucr. 2, 803; Prop. 4 (5), 10, 21; Ov. M. 2, 2.

valvaflipcard ae, f., the leaf of a door, a folding-door; sing. rare, Pompon. ap. Non. 19, 23; Petr. 96; Sen. Herc. Fur. 999.—Mostly plur.: valvae, ārum, the leaves, folds, or valves of a door, a folding-door, Cic. Div. 1, 34, 74; Cic. Verr. 2, 4, 43, § 94; 2, 4, 56, § 124; Caes. B. C. 3, 105; Juv. 4, 63; Prop. 4 (5), 8, 51; Ov. M. 1, 172; 2, 4; Hor. S. 2, 6, 112; Plin. Ep. 2, 17, 5; 5, 6, 19; 5, 6, 38.

eburflipcard ŏris (cf. Quint. 1, 6, 22 sq.), n. Copt. e

Visit https://www.cram.com/flashcards/latin-revision-10000445 for an example.

### A list of all the tokens
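The lookup step above can be sketched in a self-contained form. This is only an illustration under stated assumptions: it strips diacritics with the standard-library `unicodedata` module instead of the third-party `unidecode` package used in the notebook, the helper names are hypothetical, and the sample lines are heavily shortened stand-ins for real Lewis & Short entries.

```python
import unicodedata


def strip_diacritics(word):
    """Remove combining marks (breves, macrons, ...) from a word."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_dictionary(lines):
    """Map each plain headword to the rest of its line (text after the first comma)."""
    entries = {}
    for line in lines:
        headword, separator, definition = line.partition(",")
        if separator:  # skip lines that contain no comma at all
            entries[strip_diacritics(headword.strip())] = definition.strip()
    return entries


def make_cards(wanted_tokens, entries, limit=5):
    """Collect up to `limit` (term, definition) pairs, in the order requested."""
    cards = []
    for token in wanted_tokens:
        if token in entries:
            cards.append((token, entries[token]))
        if len(cards) >= limit:
            break
    return cards


# Shortened, illustrative entries (not the full dictionary text).
sample_lines = [
    "valva, ae, f., the leaf of a door, a folding-door",
    "ĕbur, ŏris, n., ivory",
    "a line without a comma",
]

entries = build_dictionary(sample_lines)
cards = make_cards(["pyropus", "valva", "ebur"], entries)
for term, definition in cards:
    print(f"{term}\t{definition}")
```

Returning the cards as pairs, rather than printing them with a fixed separator, leaves the choice of delimiter to whatever import format the flashcard tool expects.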

In [2]:
import os

import nltk_outils

dir_texts = "/Users/danjane/cltk_data/latin/text/latin_text_latin_library/"
# Keep only the plain-text files of the corpus
files = [name for name in os.listdir(dir_texts) if name.endswith(".txt")]

# Tokenize every file and accumulate the tokens in one long list
all_tokens = []
for name in files:
    print(name)
    with open(os.path.join(dir_texts, name), "r") as text_file:
        text = text_file.read()
    tokens = nltk_outils.tokenize(text)
    all_tokens += tokens


epistaustras.txt
notitia2.txt
asserius.txt
sall.2.txt
12tables.txt
sidonius3.txt
levis.txt
abbofloracensis.txt
priapea.txt
quum.txt
petrarch.rom.txt
obsequens.txt
marcellinus1.txt
newton.scholium.txt
varro.ll10.txt
sall.phil.txt
sall.lep.txt
sidonius2.txt
valeriusflaccus1.txt
notitia1.txt
pliny.ep8.txt
sedulius4.txt
valeriusflaccus3.txt
sall.1.txt
poggio.txt
henrysettimello.txt
petroniusfrag.txt
appverg.catalepton.txt
wmconchesdogma.txt
xanten.txt
andronicus.txt
kalila.txt
seneca.fragmenta.txt
fragmentumlaurentianum.txt
marcellinus2.txt
marbodus.txt
fabe.txt
dares1.txt
histapoll.txt
sidonius1.txt
claud.inscr.txt
valeriusflaccus2.txt
sedulius5.txt
pliny.ep9.txt
balbus.txt
marullo.txt
janus2.txt
inscriptions.txt
pliny.ep10.txt
appvergculex.txt
sedulius1.txt
valeriusflaccus6.txt
sidonius5.txt
letabundus.txt
valmax8.txt
victor.caes.txt
valmax9.txt
jfkhonor.txt
forsett2.txt
solinus1a.txt
arbroath.txt
prosperus.rufinum.txt
catalogueliberien.txt
sidonius4.txt
andreasbergoma.txt
valeriusflaccu

In [5]:
filepath = "all_tokens.csv"

with open(filepath, 'w') as file_handler:
    for token in all_tokens:
        file_handler.write("{}\n".format(token))

In [12]:
import collections

all_token_counts = collections.Counter(all_tokens)
print(all_token_counts)

Counter({'qui1': 74087, '-que': 33750, 'sum1': 33449, 'ille': 21254, 'ipse': 17218, 'dico2': 16610, 'facio': 14425, 'possum': 14049, 'edo1': 12769, 'hic': 12137, 'habeo': 11960, 'omne': 11405, 'alius2': 10919, 'is': 10694, 'idem': 10437, 'video': 10081, 'magnus': 9895, 'verus': 9744, 'etiam': 8102, 'noster': 7674, 'autem': 7664, 'do': 6731, 'enim': 6646, 'tu': 6635, 'quis1': 6616, 'homo': 6562, 'atque': 6474, 'venio': 6459, 'multus': 6429, 'sui': 6269, 'bonus': 6240, 'primus': 5990, 'quia': 5973, 'tamen': 5905, 'tempus': 5837, 'filius': 5795, 'tantus': 5752, 'fero': 5620, 'inter': 5304, 'suo': 5248, 'quidam': 5187, 'debeo': 5040, 'ego': 4993, 'nullus': 4901, 'pater': 4782, 'ago': 4765, 'modus': 4711, 'pars': 4701, 'ecclesia': 4687, 'nitor1': 4677, 'ito': 4664, 'dominus': 4592, 'tuus': 4570, 'causa': 4476, 'sino': 4398, 'post': 4378, 'volo1': 4287, 'sancio': 4229, 'deus': 4201, 'duco': 4174, 'pario2': 4173, 'rego': 4137, 'nunc': 4119, 'terra': 4109, 'dies': 4100, 'urbs': 3981, 'totus1':
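Printing the whole counter produces a very long line. A small sketch of inspecting only the top entries with `Counter.most_common` follows; the counter here is rebuilt by hand from the first four counts shown in the output above, and the name `top_counts` is just for this example.

```python
import collections

# First four entries copied from the output above.
top_counts = collections.Counter(
    {"qui1": 74087, "-que": 33750, "sum1": 33449, "ille": 21254}
)

# most_common(n) returns the n most frequent (token, count) pairs.
for token, count in top_counts.most_common(3):
    print(f"{token}\t{count}")
```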