# NLP VI: Conversational Bots and Voice Recognition

This notebook covers the essential knowledge and practical skills needed to build your own simple conversational bot. It then moves on to voice recognition and speech synthesis, which open the door to more dynamic, voice-based interactions with a bot.


## Introduction to Basic Chat Functionality

We'll start by creating a simple chatbot with NLTK's `nltk.chat` module. It provides a `Chat` class for building simple chatbots that use regular-expression pattern matching to answer the user's sentences with replies chosen from predefined templates.

In [None]:
from nltk.chat.util import Chat, reflections

In [None]:
# Replace NLTK's default (English) reflections with a Spanish mapping
reflections = {"yo soy": "tú eres"}

Let us define a tuple containing a set of pattern-response pairs for a simple chatbot.

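Each pair holds a regular expression and a tuple of candidate replies. The bot picks one of the replies at random, and `%1` is replaced with the text captured by the first group of the pattern. This gives the chatbot a varied, though limited, range of responses depending on the user's input.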

In [None]:
pairs = (
  (r'Necesito (.*)',
   ("Por qué necesitas %1?",
    "Realmente te ayudaría %1?",
    "Estás seguro de que necesitas %1?")),

  (r'salir',
   ("Gracias por hablar conmigo.",
    "Adiós.",
    "Gracias, que tengas un buen día!")),

  (r'(.*)',
   ("Por favor, cuéntame más detalles.",
    "Cambiemos de tema... cuéntame qué tal tu familia.",
    "Podrías ser más específico?",
    "Por qué dices %1?",
    "Entiendo.",
    "Muy interesante.",
    "%1.",
    "Ya veo.  Y eso qué representa para ti?",
    "Cómo te hace sentir eso?",
    "Cómo te sientes cuando dices eso?"))
)

Let us now initialize a new instance of the `Chat` class, part of the NLTK chat utility. The `pairs` argument is the tuple of predefined patterns and responses for the chatbot to use. The `reflections` argument is a dictionary that maps input text to output text, which lets the chatbot swap phrases such as "I am" for "you are" (here, "yo soy" for "tú eres") when it echoes part of the user's sentence back.

In [None]:
chatbot = Chat(pairs, reflections)

We now create a function that starts the chatbot's loop: it waits for user input and responds based on the pattern-response pairs defined above. Reflections are applied to the text captured by a pattern when that text is inserted into the reply, not to the whole input before pattern matching.

In [None]:
def testChat():
    print("Bot de prueba\n","-"*62)
    print('escribe "salir" para terminar la conversación.')
    print('='*62)
    print("Hola.  ¿Qué tal estás?")

    chatbot.converse(quit='salir')

In [None]:
testChat()

Now interact with the chatbot. Its mechanism is based on pattern matching: it waits for user input, then searches its predefined pairs, in order, for the first pattern that matches the input. Once it finds a match, it builds a response from one of the replies in that pair.
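To make the mechanism concrete, here is a minimal sketch of the same idea using only the standard library. It is not NLTK's actual implementation: the names `PAIRS`, `REFLECTIONS`, `reflect` and `respond` are hypothetical, and the reflection is done word by word for simplicity.

```python
import random
import re

# Hypothetical miniature of a pattern-matching bot; not NLTK's actual code.
REFLECTIONS = {"yo": "tú", "mi": "tu", "soy": "eres"}

PAIRS = [
    (re.compile(r"necesito (.*)", re.IGNORECASE), ["¿Por qué necesitas %1?"]),
    (re.compile(r"(.*)", re.IGNORECASE), ["Cuéntame más."]),
]


def reflect(fragment):
    """Swap first-person words for second-person ones, word by word."""
    words = fragment.lower().split()
    return " ".join(REFLECTIONS.get(word, word) for word in words)


def respond(message):
    """Return a reply built from the first pattern that matches the message."""
    for pattern, replies in PAIRS:
        match = pattern.match(message)
        if match:
            reply = random.choice(replies)
            # Reflections are applied to the captured text, not to the whole input.
            return reply.replace("%1", reflect(match.group(1)))
    return None


print(respond("Necesito mi libro"))  # ¿Por qué necesitas tu libro?
```

The catch-all pattern `(.*)` goes last because the first matching pair wins; placed earlier, it would hide every pattern after it.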

The `chatbot.converse()` method does not echo the user's input in its output. To show both the user's and the bot's messages, you can modify `testChat()` so that it reads the input itself and calls `chatbot.respond()` directly.

**Exercise:** Update the previous version of the chatbot code so that the output resembles a natural conversation between the user and the bot, like this one:

<div style="text-align:center;">
<img src="Images/ChatBot.png" width="300">
</div>

**Hint:** ChatGPT is your friend, optimize your time.

In [None]:
# Type your code here:


Ok... your thoughts are correct... 

It seems like a crazy conversation...

## Speech Recognition and Speech Synthesis

Speech recognition converts spoken language into text (a string), whereas speech synthesis does the reverse: it turns text into spoken language (audio).

Speech recognition can start either from a pre-recorded audio file or directly from a device's microphone. Once the sound is captured, it is digitized (converted into digital data). Classical systems then decode this digital data with hidden Markov models (HMMs) to produce the text output.

This decoding is a complex task: the audio is sliced into fragments of about 10 milliseconds, each fragment is mapped to a phoneme, and the most likely word sequence is then determined. Because these operations are computationally intensive, cloud-based services are often used for them. Building such models from scratch is technically possible, but the effort, resources, and time required make it impractical for most projects.
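The slicing step can be illustrated with a few lines of plain Python. This is only a sketch of the framing idea: the function name `split_into_frames`, the 16 kHz sample rate and the silent test signal are assumptions made for illustration. Real front ends typically use overlapping windows and compute spectral features for each frame.

```python
def split_into_frames(samples, sample_rate, frame_ms=10):
    """Split a sequence of audio samples into consecutive fixed-length frames."""
    frame_length = sample_rate * frame_ms // 1000  # samples per frame
    if frame_length <= 0:
        raise ValueError("frame_ms is too short for this sample rate")
    # Drop the trailing partial frame, if any, so every frame has the same length.
    usable = len(samples) - len(samples) % frame_length
    return [samples[i:i + frame_length] for i in range(0, usable, frame_length)]


# One second of silence at 16 kHz (both values are assumptions for illustration).
samples = [0] * 16000
frames = split_into_frames(samples, sample_rate=16000)
print(len(frames), len(frames[0]))  # 100 160
```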


### Requirements

To follow this notebook you will need a microphone on your computer and the following libraries installed:

* SpeechRecognition
* PyAudio
* pyttsx3 (used later for speech synthesis)

If you don't have them installed, here is how to install them.

#### SpeechRecognition

1. Open an "Anaconda Prompt" console.
2. Run the command: `pip install SpeechRecognition`

#### PyAudio

1. Open an "Anaconda Prompt" console.
2. Run the command: `conda install pyaudio`
3. Then run: `conda install -c anaconda portaudio`
4. Finally, install pyttsx3, which is used later for speech synthesis: `pip install pyttsx3`

### Speech recognition

First we will see how to recognise speech and convert it to text.

#### Importing libraries

In [None]:
import speech_recognition as sr

### Creating the speech recognizer

To instantiate the speech recognition engine, all we need to do is call `sr.Recognizer()`. This creates the recognizer object, which we then use to perform the speech-to-text operations.

In [None]:
# Initialize recognizer
instance = sr.Recognizer()

### Selecting the Audio Source

As mentioned earlier, you have two options when it comes to selecting the source of the audio: it can either come from a WAV file or be captured directly from your device's microphone. Let's explore both methods.

#### Audio from File

Obtaining audio from a file is straightforward. All you need to do is create an `AudioFile('FILE_PATH')` object, which the subsequent steps of the speech recognition process then use as the audio source.

In [None]:
# Load audio file
audio_file_path = 'Files/example.wav'

file = sr.AudioFile(audio_file_path)
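Before transcribing, it can help to confirm that the file is a readable PCM WAV and to check how long it is. The following is a small sketch using the standard-library `wave` module; the helper name `wav_info` is hypothetical and is not part of SpeechRecognition.

```python
import wave


def wav_info(path):
    """Return basic properties of a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate,
        }


# Example call with the path used in this notebook:
# info = wav_info('Files/example.wav')
# print(info["duration_s"])
```

Knowing the duration makes it easier to choose sensible `offset` and `duration` values when recording from the file.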

Here is a short example:

In [None]:
# Read up to the first 20 seconds of the audio file
with sr.AudioFile(audio_file_path) as source:
    audio_data = instance.record(source, offset=0, duration=20)

# Perform speech recognition with the Google API (requires an internet connection)
text = instance.recognize_google(audio_data, language='es-ES')

print(text)

#### Capturing Audio from the Microphone

To capture audio from a device's microphone, we use the `Microphone` class. Creating an instance sets up the microphone as an audio source, so that we can record speech and pass it on to speech recognition or other audio analysis tasks.

In [None]:
# In case you get an error when executing the Microphone() command, uncomment these lines
# import sys
# !{sys.executable} -m pip install pyaudio

In [None]:
mic = sr.Microphone()

By default, `Microphone()` uses the system's default microphone, but we can select any other input device installed on the system. Where the system exposes it as an input, this can even be the audio output, which we could use to transcribe a meeting, for example.

To see the list of microphones, call `list_microphone_names()`.

In [None]:
sr.Microphone.list_microphone_names()

To select any other microphone in the list, pass its position in the list as `device_index` to `Microphone()`.

For instance, `Microphone(device_index=12)` selects the entry at index 12 of the list (counting from 0), in this case "Microphone (USB Microphone)".
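Since device indices differ from one machine to another, it can be more robust to look the index up by name. Here is a small sketch; the helper name `find_device_index` and the device list are made up for illustration.

```python
def find_device_index(names, keyword):
    """Return the index of the first device whose name contains keyword, or None."""
    keyword = keyword.lower()
    for index, name in enumerate(names):
        # Some entries may be None when the backend cannot report a name.
        if name and keyword in name.lower():
            return index
    return None


# Made-up device list for illustration; on a real system it would come from
# sr.Microphone.list_microphone_names().
names = ["Microsoft Sound Mapper - Input", "Microphone (USB Microphone)", "Speakers"]
print(find_device_index(names, "usb"))  # 1
```

The returned value can then be passed as `device_index` to `Microphone()`, after checking that it is not `None`.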

#### Create the audio fragment

Once we have the input, either from an audio file or from the microphone, we use it as the source. This is also the step where we compensate for background noise with `adjust_for_ambient_noise()`, which listens to the source briefly to calibrate the recognizer's energy threshold.

In [None]:
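def mic_conversion():
    """Record one phrase from the microphone and return its transcription."""
    with mic as source:
        # Calibrate the energy threshold to the ambient noise
        instance.adjust_for_ambient_noise(source)

        # Record until the speaker pauses
        audio = instance.listen(source)

    # Transcribe using the Google API; show_all=True returns the full response
    transcript = instance.recognize_google(audio, language='es-ES', show_all=True)
    print(transcript)

    # An empty result means that nothing was recognized
    if not transcript:
        return None
    # Return the text of the first alternative
    return transcript['alternative'][0]['transcript']


def audio_conversion():
    """Read the audio file and return its transcription."""
    with file as source:
        # Note: this calibration consumes the beginning of the file
        instance.adjust_for_ambient_noise(source)

        # Read the rest of the file
        audio = instance.record(source)

    transcript = instance.recognize_google(audio, language='es-ES', show_all=True)
    print(transcript)

    if not transcript:
        return None
    return transcript['alternative'][0]['transcript']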

We are now going to test the functions we have created.

#### Microphone function

After running the cell, wait about a second before you start speaking, since the ambient-noise calibration uses that time; otherwise the beginning of the audio will be cut off. The function stops recording automatically once you stop speaking.

In [None]:
mic_conversion()

#### Audio processing function

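Now we will do the same with the audio file function. It will not recognise the beginning of the audio, because `adjust_for_ambient_noise()` consumes the first second of the file for calibration, so the transcription will not be quite correct.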

The text contained in the audio is as follows:

`La Policía Nacional ha finalizado hoy la implantación de esta nueva versión del DNI y desde hoy únicamente se expedirá este DNI Europeo para todos los ciudadanos españoles.`

In [None]:
audio_conversion()

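As you can see, the microphone transcription came out well, while the file transcription did not. Both functions use the same recognizer and the same Google service, so the difference comes from the audio that reaches it rather than from the model.

It's not uncommon for the same pipeline to perform differently on different inputs. Here, a likely cause is that `adjust_for_ambient_noise()` consumes the first second of the file, so the opening words are never sent for transcription. Recording quality and background noise can also play a part.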

## Speech synthesis

Now we will see how to do the reverse: convert text into speech.

### Importing libraries

If the library is not installed yet, use the command: `pip install pyttsx3==2.6`.

In [None]:
import pyttsx3

### Synthesis engine

The first thing we have to do is create the synthesis engine. To do this, we just call `pyttsx3.init()`.

In [None]:
engine = pyttsx3.init()

### Engine configuration

Once the engine has been created, it is time to configure it. In our case the configuration is static: we set the language and the speaking rate once and do not need to modify them afterwards.

In [None]:
#Speed setting
engine.setProperty('rate', 140)

#Language settings
engine.setProperty('voice', 'spanish')
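Whether the name `'spanish'` is accepted depends on the speech driver that pyttsx3 uses on your platform. On some systems the `voice` property expects the id of one of the installed voices. The sketch below shows one way to look such an id up; the helper name `pick_voice_id` is hypothetical, the stand-in voice objects are made up, and the keyword that identifies a Spanish voice varies by platform.

```python
from types import SimpleNamespace


def pick_voice_id(voices, keyword):
    """Return the id of the first voice whose name or id mentions keyword, or None."""
    keyword = keyword.lower()
    for voice in voices:
        searchable = f"{voice.name} {voice.id}".lower()
        if keyword in searchable:
            return voice.id
    return None


# Stand-in voice objects for illustration; real ones come from
# engine.getProperty('voices') and expose the same .id and .name attributes.
voices = [
    SimpleNamespace(id="voice.en", name="English (US)"),
    SimpleNamespace(id="voice.es", name="Spanish (Spain)"),
]
print(pick_voice_id(voices, "spanish"))  # voice.es
```

With a real engine, the call would be `pick_voice_id(engine.getProperty('voices'), 'spanish')`, and a result other than `None` can be passed to `engine.setProperty('voice', ...)`.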

### Talking function

Now we just need to create a function to make our engine talk.

In [None]:
def habla(texto):
    engine.say(texto)
    engine.runAndWait()

### Testing the function

Now all that remains is to test the function we have created and listen to how the text is processed. We will use the same text to compare the result.

In [None]:
texto = 'La Policía Nacional ha finalizado hoy la implantación de esta nueva versión del DNI y desde hoy únicamente se expedirá este DNI Europeo para todos los ciudadanos españoles.'

In [None]:
habla(texto)