
Let's practice in French! 🔴 ⚪ 🔵

This is a simple app for practicing the French language, based on a pretrained model for the fill-mask task.

Table of Contents

📥 Installation

  • to be done asap

🚀 Usage

Screenshot: example of the FRENCH EXERCISES app on Windows 11 with light mode and the 'blue' theme.

FRENCH_EXERCISES_APP.mp4

Video example of how to practice with the FRENCH EXERCISES app! 🎈

💡 Idea

The idea behind this project is to exploit an existing pretrained model for the fill-mask task in order to learn a language better. In particular, given a text, or even better a collection of texts, it is possible to extract a set of sentences and turn them into a "complete the sentence with the missing word" test. For each sentence, one word is masked at random and the user is offered a list of candidate words/answers. One of them is the right one; the remaining ones are the most likely alternatives returned by a pretrained model for the French language.
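As an illustration of this idea, here is a minimal sketch of how such a question could be built. It assumes the Hugging Face transformers fill-mask pipeline and the moussaKam/mbarthez checkpoint; the function name build_question and the exact checkpoint are illustrative assumptions, not the app's actual API.

```python
# Minimal sketch (not the app's actual code): build one "complete the sentence"
# question from a single sentence, using a French fill-mask model via transformers.
import random

from transformers import pipeline

# Assumed checkpoint; any French fill-mask model would behave similarly.
fill_mask = pipeline("fill-mask", model="moussaKam/mbarthez")


def build_question(sentence: str, n_choices: int = 4):
    """Mask one random word and return (masked sentence, shuffled choices, answer)."""
    words = sentence.split()
    idx = random.randrange(len(words))
    answer = words[idx]
    words[idx] = fill_mask.tokenizer.mask_token  # "<mask>" for BART-like tokenizers
    masked_sentence = " ".join(words)

    # The model's most likely fillers for the mask become the distractors.
    predictions = fill_mask(masked_sentence, top_k=n_choices)
    distractors = [p["token_str"].strip() for p in predictions
                   if p["token_str"].strip() != answer]
    choices = distractors[: n_choices - 1] + [answer]
    random.shuffle(choices)
    return masked_sentence, choices, answer


masked, choices, answer = build_question("Le chat dort sur le canapé.")
print(masked, choices, answer)
```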

✨ Features and Technologies


The relevant features and technologies exploited in this project are:

  • Package configuration with pyproject.toml built with hatch
  • Code formatting and linting with ruff, and black
  • pre-commit configuration file
  • CI-CD Pipelines with GitHub Actions
  • Basic pytest set-up for unit tests
  • Auto-generated docs with mkdocs and mkdocs-material
  • mBARThez as pretrained model for the fill-mask task
  • CustomTkinter as a python UI-library based on Tkinter

🎯 Fill-Mask task


Masked language modeling is the task of masking some of the words in a sentence and predicting which words should replace those masks. I decided to exploit a model trained for this kind of task, in particular one trained for the French language.
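To make the task concrete, here is a hedged sketch of a single fill-mask query, again assuming the transformers fill-mask pipeline and the moussaKam/mbarthez checkpoint:

```python
# Hedged sketch: one fill-mask query against a French pretrained model.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="moussaKam/mbarthez")  # assumed checkpoint

# "<mask>" marks the word the model should predict.
for prediction in fill_mask("Paris est la <mask> de la France.", top_k=5):
    print(f'{prediction["token_str"]!r}: {prediction["score"]:.3f}')
```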

In the next section, the chosen model, mBARThez, is described in more detail.

🎨 Model description


The mBARThez model is a descendant of BART, a transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder. BART is pretrained by corrupting text with an arbitrary noising function and learning to reconstruct the original text. BARThez follows the same recipe: it is pretrained by learning to reconstruct a corrupted input sentence, using a corpus of 66GB of raw French text, so both its encoder and its decoder are pretrained. In addition to BARThez, which is pretrained from scratch, the multilingual BART (mBART) was further pretrained on the same French corpus, which boosted its performance in both discriminative and generative tasks. This French-adapted version is called mBARThez, and it is the one exploited in this project.
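As a small illustration of the encoder-decoder structure described above (assuming the moussaKam/mbarthez checkpoint on the Hugging Face hub), the model can be loaded and its two halves inspected:

```python
# Sketch: load mBARThez and inspect its encoder-decoder structure.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("moussaKam/mbarthez")  # assumed checkpoint
model = AutoModelForSeq2SeqLM.from_pretrained("moussaKam/mbarthez")

config = model.config
print(config.encoder_layers, config.decoder_layers, config.d_model)  # layer counts, hidden size
print(type(model.get_encoder()).__name__)  # bidirectional (BERT-like) encoder
print(type(model.get_decoder()).__name__)  # autoregressive (GPT-like) decoder
```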

For more details on the model, check the following sources: the paper and the GitHub repository.

📨 Contact

Author: Giulia Gualtieri
Email: giulia.gualtieri@mail.polimi.it

About me: Data Scientist 🔬 with a strong passion for Machine Learning 💻 and Artificial Intelligence 💭 applications. As you can read, I'm also passionate about the French language and culture. Languages spoken: English, Italian, French.