Commit

Use the corpus in unicode
Fantomas42 committed Feb 3, 2015
1 parent 09a68a2 commit 2feff54
Showing 2 changed files with 7 additions and 6 deletions.
2 changes: 1 addition & 1 deletion mots_vides/tests/corpus/french_solution.txt
@@ -1 +1 @@
-XX journée commence. XX X’habille XXXXX XX XXXX XXXX XX prenant XXX café. Chemise blanche repassée XX veille XXX XXX même. XXX cravate XXXXX XXXX XXX jours. XX XXX costume noir XX XXXX Sam Montiel, XXXX chic XX XXXX branché. Chaussures cuir noir. XXXXX XX aime faire remarquer : "XXXX êtes XXXX XXXX XXX chaussures, XXXX XXXX XXXXX lit. XXXXX XX faut XX bonnes chaussures XX XXX bonne literie!". XX météo X annoncée XX ciel bleu XX XXX températures XX XXXXXX XX XX normale saisonnière. X'XXXXX XX XXXX beau mois XX mai XXX S’annoncait.
+XX journée commence. XX X’habille XXXXX XX XXXX XXXX XX prenant XXX café. Chemise blanche repassée XX veille XXX XXX XXXX. XXX cravate XXXXX XXXX XXX jours. XX XXX costume noir XX XXXX Sam Montiel, XXXX chic XX XXXX branché. Chaussures cuir noir. XXXXX XX aime faire remarquer : "XXXX êtes XXXX XXXX XXX chaussures, XXXX XXXX XXXXX lit. XXXXX XX faut XX bonnes chaussures XX XXX bonne literie!". XX météo X annoncée XX ciel bleu XX XXX températures XX XXXXXX XX XX normale saisonnière. X'XXXXX XX XXXX beau mois XX mai XXX X’annoncait.
11 changes: 6 additions & 5 deletions mots_vides/tests/stop_words.py
@@ -183,12 +183,13 @@ class StopWordRebaseFunctionalTestCase(TestCase):
     def test_stop_word_rebase_functional(self):
         current_dir = os.path.dirname(__file__)
         file_name = os.path.join(current_dir, 'corpus', 'french.txt')
-        file_content = '\n'.join(open(file_name).readlines())
-        file_name_solution = os.path.join(current_dir,
-                                          'corpus', 'french_solution.txt')
-        file_content_solution = '\n'.join(open(file_name_solution).readlines())
+        file_content = '\n'.join(open(file_name).readlines()).decode('utf-8')
+        solution_name = os.path.join(current_dir,
+                                     'corpus', 'french_solution.txt')
+        solution_content = '\n'.join(open(solution_name).readlines()
+                                     ).decode('utf-8')

         factory = StopWordFactory()
         stop_words = factory.get_stop_words('fr')
         file_content_rebased = stop_words.rebase(file_content)
-        self.assertEqual(file_content_rebased, file_content_solution)
+        self.assertEqual(file_content_rebased, solution_content)
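
Note on the decoding step: calling `.decode('utf-8')` on the result of `open(...).readlines()` relies on Python 2 byte strings. As a minimal sketch only, and not part of this commit, the same "read the corpus as Unicode" idea can be written with `io.open`, which behaves the same on Python 2 and 3. The helper name `read_unicode` and the sample text are hypothetical, introduced here for illustration.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def read_unicode(path):
    """Return the file's content as Unicode text, decoded from UTF-8.

    Hypothetical helper, not part of mots_vides.
    """
    with io.open(path, encoding='utf-8') as handle:
        return handle.read()


# Self-contained demonstration using a temporary file with made-up content.
sample = u'La journée commence.\nUn café.\n'
fd, tmp_path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
try:
    with io.open(tmp_path, 'w', encoding='utf-8') as handle:
        handle.write(sample)
    content = read_unicode(tmp_path)
finally:
    os.remove(tmp_path)
```

Unlike `'\n'.join(open(...).readlines())`, `read()` returns the text without inserting an extra newline between lines, so both the corpus and its solution would need to be read the same way for the comparison in the test to stay meaningful.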
