- 2022 Project Artificial Intelligence Group D
- Written by: Paul, Remy, Karine, Pierre
- Team:
- Created on: 2022-05-05
- Last updated on: 2022-10-05
In the school, during project time when the students work in autonomy, they are expected to speak english, but they are often speaking french.
We are building an intelligent device that can recognize if the people are speaking french or english.
It will turn on a red Led when people are speaking french.
Summary of the problem (from the perspective of the user), the context, suggested solution, and the stakeholders.
Terms | Definitions |
---|---|
AI | Artificial Intelligence : the theory and development of computer systems able to perform tasks normally requiring human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages |
DL | Deep Learning : It was created by drawing inspiration from the neural networks found in the human brain, which means that deep learning is made up of a large number of layers of interconnected artificial neurons, hence the name "deep learning". |
Reasons why the problem is worth solving Origin of the problem How the problem affects users and company goals Past efforts made to solve the solution and why they were not effective How the product relates to team goals, OKRs How the solution fits into the overall product roadmap and strategy How the solution fits into the technical strategy
Product requirements in the form of user stories Technical requirements
The device will not speak.
The device will not translate french to english.
Product and technical requirements that will be disregarded
Conditions and resources that need to be present and accessible for the solution to work as described.
Current solution description Pros and cons of the current solution
External components that the solution will interact with and that it will alter Dependencies of the current solution Pros and cons of the proposed solution Data Model / Schema Changes Schema definitions New data models Modified data models Data validation methods Business Logic API changes Pseudocode Flowcharts Error states Failure scenarios Conditions that lead to errors and failures Limitations Presentation Layer User requirements UX changes UI changes Wireframes with descriptions Links to UI/UX designer’s work Mobile concerns Web concerns UI states Error handling Other questions to answer How will the solution scale? What are the limitations of the solution? How will it recover in the event of a failure? How will it cope with future requirements?
- Python
- Version
3.9.X
- Why ? Python is an interpreted and dynamic programming language which is ideal to work AI.
- Version
- Tensorflow
- Version
2.8.X
- Why ? Tensorflow is a library for machine learning.
- Version
- NumPy
- Version
1.22.X
- Why ?
NumPy is a library written in
C
with the purpose of better efficiency, rather than using python default list-type we'll usenumpy.array
.
- Version
- Matplotlib
- Version
3.5.X
- Why ? Matplotlib is a library for plotting graphs, we'll use this library especialy for rendering spectrograms of the sound.
- Version
Explanations of how the tests will make sure user requirements are met Unit tests Integrations tests QA
Deployment architecture Deployment environments Phased roll-out plan e.g. using feature flags Plan outlining how to communicate changes to the users, for example, with release notes
Short summary statement for each alternative solution
Pros and cons for each alternative
Reasons why each solution couldn’t work
Ways in which alternatives were inferior to the proposed solution
Migration plan to next best alternative in case the proposed solution falls through
To see this project through we will use the method which use the Fast Fourier Transform(signal processing algorithm), it consist to analyze the waves of the audio and determine the similarities between the english dataset and the english spoken in room.
We go to use this methods because it's more accurate and faster in the processing of data by the artificial intelligence than the next method that I am going to present you.
We could have chosen to build our project with the speech to text method.
This method consists in recovering the pronounced words and sentences and to analyze them to make the distinction between the two language(English/French).
This method use the library pyttsx3.
What is the cost to run the solution per day? What does it cost to roll it out?
The potential threats are that the conversations could be recorded and make it public.
We don't want to spy the conversations.
They will be mitigated in erasing the database that keep track of the conversations.
What are the potential threats? How will they be mitigated? How will the solution affect the security of other components, services, and systems?
Does the solution follow local laws and legal policies on data privacy? How does the solution protect users’ data privacy? What are some of the tradeoffs between personalization and privacy in the solution?
How accessible is the solution? What tools will you use to evaluate its accessibility?
The model can't recognize the language speaking because of the accent, the noise or multiple speakers.
What risks are being undertaken with this solution? Are there risks that once taken can’t be walked back? What is the cost-benefit analysis of taking these risks?
-> Security impact -> Performance impact Cost impact Impact on other components and services
List of metrics to capture Tools to capture and measure metrics Tests to verify metrics
List of specific, measurable, and time-bound tasks Resources needed to finish each task Time estimates for how long each task needs to be completed
-> Link to the github project
Our priority is to build a dataset of English speaking with multiple accent, genders, phonation, pitch, loudness and rate.
We go to build something similar for the French speaking.
On the management side we need to set our priorities as soon as possible, categorize our tasks by urgency, the first tasks will be the most important.
We also need to categorize them by the impact that the tasks will have on the rest of the project.
Make a dataset
Train the machine learning model
Deploy it to the Microcontroller
Control it with a webpage.
Dated checkpoints when significant chunks of work will have been completed
Metrics to indicate the passing of the milestone
d. Future work
Build an English speaker accent recognition
List of tasks that will be completed in the future