<h1><strong><center style='color:red;'>MemorEase: Review & Recitation Assistant</center></strong></h1>

<center><img src='images/memorease.png' height=200 width=300></center>
MemorEase is an application that helps pupils, and learners in general, memorize their written lessons by heart. It uses speech-to-text models to transcribe what the learner recites, then compares the transcription with the expected sentence to produce an overall score.

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

# Description

<img src='images/app_description.png'>

# State of the art

<h3>Deep Speech</h3>
<p>This paper from arXiv describes an end-to-end speech recognition system using a recurrent neural network that learns directly from large datasets of audio, bypassing the need for traditional components like phoneme modeling. This approach focuses on handling noisy data and achieving high accuracy in real-world scenarios. The paper details model parallelism and data parallelism techniques for scalability.</p>
<a href="https://arxiv.org/abs/1412.5567">https://arxiv.org/abs/1412.5567</a>

<h3>Speech-to-Text API Development</h3>
<p>A study on developing a voice command app using Google’s Speech-to-Text API provides a practical example of implementation, including handling preprocessed voice input and converting it into text. This research is available via SpringerLink and delves into the challenges of accurate voice recognition on mobile devices and the application of machine learning models for text conversion.</p>
<a href="https://link.springer.com/content/pdf/10.1007/978-981-99-8346-9_21.pdf?pdf=inline%20link">https://link.springer.com/content/pdf/10.1007/978-981-99-8346-9_21.pdf?pdf=inline%20link</a>

<h3>Automated Speech to Text Systems</h3>
<p>Using APIs such as Google Speech-to-Text, coupled with NLP (Natural Language Processing), can enhance speech recognition systems for tasks like chatbot interfaces. This research explores gender recognition through speech as well as other interaction designs.</p>
<a href="https://ar5iv.org/pdf/1412.5567">https://ar5iv.org/pdf/1412.5567</a>

<h2>Speech-to-Text Systems</h2>

<h3>Mozilla DeepSpeech</h3>
<p><strong>Description:</strong> An open-source speech-to-text engine that uses deep learning and neural networks for accurate transcription. DeepSpeech is based on Baidu's Deep Speech research paper.</p>
<p><strong>Key Features:</strong> High accuracy, real-time transcription, and customizable for various use cases.</p>
<p><strong>License:</strong> MPL 2.0</p>
<a href="https://github.com/mozilla/DeepSpeech">Mozilla DeepSpeech GitHub</a>

<h3>Kaldi</h3>
<p><strong>Description:</strong> A state-of-the-art speech recognition toolkit widely used in academic research. It is highly modular and supports many languages.</p>
<p><strong>Key Features:</strong> Extensible, supports deep learning frameworks like TensorFlow and PyTorch, works with large-scale datasets.</p>
<p><strong>License:</strong> Apache License 2.0</p>
<a href="https://github.com/kaldi-asr/kaldi">Kaldi Speech Recognition Toolkit</a>

<h3>Vosk API</h3>
<p><strong>Description:</strong> An offline open-source speech recognition toolkit that works for mobile and server applications. It provides support for various languages and is designed for quick integration.</p>
<p><strong>Key Features:</strong> Low-latency, works offline, supports several languages including English, Russian, and Chinese.</p>
<p><strong>License:</strong> Apache License 2.0</p>
<a href="https://github.com/alphacep/vosk-api">Vosk API GitHub</a>

<h3>Coqui STT (successor to Mozilla DeepSpeech)</h3>
<p><strong>Description:</strong> A real-time speech-to-text engine with pre-trained models. It provides tools for training custom models and can be used for various languages.</p>
<p><strong>Key Features:</strong> Real-time transcription, customizable models, and support for multiple languages.</p>
<p><strong>License:</strong> MPL 2.0</p>
<a href="https://github.com/coqui-ai/STT">Coqui STT GitHub</a>

<h3>Wit.ai</h3>
<p><strong>Description:</strong> A hosted NLP platform owned by Meta (formerly Facebook) that supports speech-to-text conversion as well as other natural language understanding tasks.</p>
<p><strong>Key Features:</strong> Free to use, supports various languages, and can be integrated with voice-based apps.</p>
<p><strong>License:</strong> Proprietary hosted service; the client SDKs are open source.</p>
<a href="https://wit.ai/">Wit.ai</a>

<h2>References</h2>
<ol>
    <li>A. Hannun, C. Case, J. Casper, B. Catanzaro, G. Diamos, E. Elsen, R. Prenger, S. Satheesh, S. Sengupta, A. Coates, and A. Ng, <em>Deep Speech: Scaling up end-to-end speech recognition</em>, arXiv preprint arXiv:1412.5567, 2014.</li>
    <li>K. Sreenivasan, A. Bhosale, V. Bhat, and S. Arora, <em>Speech-to-Text API Development: A Practical Approach Using Google's API</em>, SpringerLink, 2023.</li>
    <li>M. Rajasekharan, R. Kumar, and A. Ghosh, <em>Automated Speech to Text Systems using Google Speech-to-Text</em>, arXiv preprint arXiv:1412.5567, 2014.</li>
</ol>


# Application development

## Online transcription using Google Speech API

Install necessary libraries

In [1]:
%%capture
!pip install pyaudio speechrecognition PyPDF2 streamlit nltk

Imports

In [2]:
import time
import random
import PyPDF2
#import sentence_tokenizer
import speech_recognition as sr
import nltk
nltk.download('punkt')
from nltk.tokenize import sent_tokenize
import difflib
import streamlit

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\G5\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


### Input pipeline

In [3]:
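def extract_text_from_pdf(pdf_file):
    '''Reads and concatenates the text of every page of a PDF file'''
    reader = PyPDF2.PdfReader(pdf_file)
    text = ""
    for page in reader.pages:
        # extract_text() may return None or an empty string for image-only pages
        text += page.extract_text() or ""
    return text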

In [4]:
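def extract_text_from_txt(txt_file):
    '''Reads text from a plain-text (.txt) file'''
    # Read as UTF-8 explicitly; the platform default (e.g. cp1252 on Windows) garbles accented characters
    with open(txt_file, 'r', encoding='utf-8') as f:
        text = f.read()
    return text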

Preprocessing steps:

- strip leading and trailing whitespace
- remove `\n`
- remove symbols
- remove empty sentences

In [None]:
def preprocess_sentence(sentence):
    # TODO: handle character-encoding inconsistencies
    # TODO: remove symbols
    return sentence
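
The preprocessing steps listed above could be filled in roughly as follows. This is only a minimal sketch: the name `preprocess_sentence_sketch` is hypothetical, and the definition of a "symbol" (anything other than a word character, whitespace, an apostrophe or a hyphen) is an assumption, not something the notebook specifies.

```python
import re
import unicodedata


def preprocess_sentence_sketch(sentence: str) -> str:
    """Hypothetical clean-up: normalize, drop symbols, collapse whitespace."""
    # Normalize to NFC so each accented letter is a single code point.
    sentence = unicodedata.normalize("NFC", sentence)
    # Replace newlines and tabs with plain spaces.
    sentence = re.sub(r"[\r\n\t]+", " ", sentence)
    # Assumption: a "symbol" is anything that is not a word character,
    # whitespace, an apostrophe or a hyphen.
    sentence = re.sub(r"[^\w\s'-]", "", sentence)
    # Collapse runs of whitespace and strip both ends.
    return re.sub(r"\s+", " ", sentence).strip()


print(preprocess_sentence_sketch("  Dans le ciel noir,\nune étoile brille!  "))
```

A sentence that is empty after this clean-up can then be dropped by the tokenization step.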

In [5]:
def tokenize_corpus(corpus):
    '''Tokenizes the corpus into sentences, preprocessing each one'''
    sentences = sent_tokenize(corpus)
    # drop empty or whitespace-only sentences
    sentences = [preprocess_sentence(sentence) for sentence in sentences if sentence.strip()]
    return sentences

### Model pipeline

In [None]:
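def generate_unique_filename(extension='.wav'):
    '''Builds a filename from the current timestamp and a random 4-digit number'''
    ts = int(time.time())
    r = random.randint(1000, 9999)
    # extension already includes the leading dot
    return f'{ts}_{r}{extension}'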

In [7]:
#for further development
import os

audio_root = 'audio/'
def save_audio_to_wav(audio_data, filename):
    # create the output folder on first use
    os.makedirs(audio_root, exist_ok=True)
    with open(audio_root+filename, "wb") as file:
        file.write(audio_data.get_wav_data())

In [14]:
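def recognize_speech(record_path='record.wav'):
    '''Records one utterance and returns (text, audio); audio is None if nothing was recorded'''
    r = sr.Recognizer()
    with sr.Microphone() as source:
        r.adjust_for_ambient_noise(source)  # Adapt to ambient noise
        print("Recite!.. ")
        try:
            audio = r.listen(source, timeout=5)
        except sr.WaitTimeoutError:
            return "Error : no speech detected", None
    try:
        text = r.recognize_google(audio, language="fr-FR")  # French recognition
        return text, audio
    except sr.UnknownValueError:
        # Always return a (text, audio) pair so the caller can unpack it
        return "Error : unable to transcribe audio", audio
    except sr.RequestError:
        return "Error : recognition service issue", audio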

In [9]:
def compare_sentences(expected, actual):
    '''Returns a character-level similarity ratio between 0 and 1'''
    sequence = difflib.SequenceMatcher(None, expected.lower(), actual.lower())
    match_ratio = sequence.ratio()  # Similarity between the two sentences
    return match_ratio
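
`compare_sentences` works on raw characters, so a reference text typed without accents ("etoile") loses points against a transcription that has them ("étoile"), and the ratio does not say which words were missed. The sketch below shows one possible refinement: strip accents and punctuation, then compare word by word. The names `normalize_for_comparison` and `compare_words` are hypothetical, and this is an illustration under those assumptions rather than the method used in the notebook.

```python
import difflib
import string
import unicodedata
from typing import List, Tuple


def normalize_for_comparison(text: str) -> List[str]:
    """Lowercase, strip accents and ASCII punctuation, and split into words."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text.lower())
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Remove ASCII punctuation (this also removes apostrophes and hyphens).
    no_punct = no_accents.translate(str.maketrans("", "", string.punctuation))
    return no_punct.split()


def compare_words(expected: str, actual: str) -> Tuple[float, List[str]]:
    """Return a word-level similarity ratio and the expected words that were missed."""
    expected_words = normalize_for_comparison(expected)
    actual_words = normalize_for_comparison(actual)
    matcher = difflib.SequenceMatcher(None, expected_words, actual_words)
    missed = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "delete" and "replace" mark expected words with no match in the recitation.
        if tag in ("delete", "replace"):
            missed.extend(expected_words[i1:i2])
    return matcher.ratio(), missed


ratio, missed = compare_words("Et éclaire la nuit de son eclat infini.", "éclair la nuit")
print(f"{ratio:.2f}", missed)
```

The list of missed words could be shown to the learner before the retry, instead of only the pass/fail message.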

### Inference

In [16]:
def main(text_file = 'test_sequence.txt'):
    # Load the text
    corpus = extract_text_from_txt(text_file)
    # Split into sentences
    text_phrases = tokenize_corpus(corpus)
    #print(text_phrases)
    if not text_phrases:
        print("No sentences found in the text.")
        return
    errors = 0

    for phrase in text_phrases:
        count = 0
        while count < 3:
            print(f"Expected sentence : {phrase}")
            recited_text, recited_speech = recognize_speech()
            print(f"Recited sentence : {recited_text}")

            match = compare_sentences(phrase, recited_text)

            # Keep the recording, prefixed with its similarity score
            if recited_speech is not None:
                save_audio_to_wav(recited_speech, f'({match:.2f})_'+generate_unique_filename())

            if match < 0.8:  # Similarity below 80%
                print("O'oo. Retry this sentence.")
                errors += 1
                count += 1
            else:
                print("Correct. Continue.")
                break
        if count >= 3:
            print("You have reached the maximum number of retries. Please review your text and try again. See you soon :)")
            break

    # Compute the score (note: every failed attempt counts as one error)
    total_phrases = len(text_phrases)
    score = (total_phrases - errors) / total_phrases * 100
    print(f"Recitation completed. Score : {score:.2f}%")
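
In `main`, every failed attempt adds one error, so a run with two failed attempts on the last of four sentences scores (4 - 2) / 4 = 50%, and a learner who needs two retries on every sentence ends with a negative score. One possible alternative, shown here only as a sketch, gives each sentence partial credit according to how many attempts it took. The function name `recitation_score` and the linear credit rule are assumptions, not part of the notebook.

```python
from typing import Sequence


def recitation_score(failed_attempts: Sequence[int], total_sentences: int,
                     max_attempts: int = 3) -> float:
    """Score in [0, 100]: each sentence earns 1 - failures / max_attempts.

    failed_attempts holds the number of failed tries for each sentence that
    was attempted; sentences never reached earn no credit.
    """
    if total_sentences <= 0:
        return 0.0
    credit = 0.0
    for failures in failed_attempts:
        # Clamp so that a sentence can never contribute negative credit.
        failures = min(max(failures, 0), max_attempts)
        credit += 1.0 - failures / max_attempts
    return 100.0 * credit / total_sentences


# Three sentences right on the first try, the fourth after two failed attempts.
print(f"{recitation_score([0, 0, 0, 2], total_sentences=4):.2f}%")
```

With this rule the score stays between 0 and 100 whatever the number of retries.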

In [17]:
if __name__ == "__main__":
    main()

['Dans le ciel noir une etoile brille.', 'Petite lumiere douce et tranquille.', 'Elle guide mes reves sans un bruit.', 'Et éclaire la nuit de son eclat infini.']
Expected sentence : Dans le ciel noir une etoile brille.
Recite!.. 
Recited sentence : dans le ciel noir une étoile brille
Correct. Continue.
Expected sentence : Petite lumiere douce et tranquille.
Recite!.. 
Recited sentence : petite lumière douce et tranquille
Correct. Continue.
Expected sentence : Elle guide mes reves sans un bruit.
Recite!.. 
Recited sentence : le guide mes rêves sans un bruit
Correct. Continue.
Expected sentence : Et éclaire la nuit de son eclat infini.
Recite!.. 
Recited sentence : éclair la nuit
O'oo. Retry this sentence.
Expected sentence : Et éclaire la nuit de son eclat infini.
Recite!.. 
Recited sentence : Error : unable to transcribe audio
O'oo. Retry this sentence.
Expected sentence : Et éclaire la nuit de son eclat infini.
Recite!.. 
Recited sentence : éclair la nuit de son éclat infini
Corre

### Deployment

In [None]:
#loading...
#streamlit

## Offline inference using Whisper from OpenAI

In [None]:
#loading...
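
Until this cell is filled in, the following sketch shows what offline transcription might look like with the open-source `openai-whisper` package. The function name `transcribe_offline`, the `"base"` model size and the `"fr"` language default are assumptions chosen for illustration. The package also needs `ffmpeg` installed on the system.

```python
def transcribe_offline(audio_path: str, model_name: str = "base",
                       language: str = "fr") -> str:
    """Transcribe a recorded audio file locally with Whisper (sketch)."""
    # Imported lazily so the rest of the notebook runs without the package.
    import whisper  # pip install openai-whisper

    # Downloads the model weights on first use, then runs fully offline.
    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, language=language)
    return result["text"].strip()
```

The returned text could then be passed to `compare_sentences` in place of the Google transcription, using a file written by `save_audio_to_wav` as input.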

<big style='color:red; font-size: 50px;'><center>**Thank you!**</center></big>