Software Design Document

ÇANKAYA UNIVERSITY FACULTY OF ENGINEERING COMPUTER ENGINEERING DEPARTMENT

Version 1.1

CENG 407

Innovative System Design and Development I

Development of a Karaoke Application for Language Learning

Mehmet Ali Bekereci, Ege Naz ALARSLAN, Tolga KARAMAN

{c1311009, c1311002, c1111029}@student.cankaya.edu.tr

March 13, 2018

1. Introduction

1.1 Purpose

This software design document describes the architecture and system design of the project titled “Development of a Karaoke Application for Language Learning”. The target audience of the project is children at the pre-kindergarten or elementary school level, including children in nursery schools. The application gives children an opportunity to learn English in an educational yet fun way by using the key elements of a karaoke game. Our aim is to create a joyful and educational environment that helps children understand and learn English and teaches them how to pronounce English words.

The purpose of the “Development of a Karaoke Application for Language Learning” project is to design the “Karaoke for Kids” application, which includes many songs, lyrics and a score table, for schools and families who want to give children an opportunity to learn and understand English at a level appropriate for their age. The application consists of three main parts: displaying animations and lyrics on the screen, recording the user’s voice and processing that sound, and displaying the score table.

In the first part, the user selects a song from a playlist that they have chosen and managed. The application then opens the karaoke panel, where the user listens to the song and sees its lyrics with an appealing animation displayed in sync with the song.

In the second part, the user sings the selected song into a microphone while the lyrics are shown on the screen, and the application records the user’s voice and starts examining it. The examination consists of several steps: subtracting the background from the sound, recognizing the speech in the sound with real-time speech recognition, translating the speech into text, and storing that text. After the examination, the application calculates the success rate as the similarity between the original song and the user’s voice and sends the details to the score table. The user can sing only one song at a time and can see the score table once the examination is complete.

In the third part, the user can see the details of the success rate: the percentage of similarity, the number of words that match the original song, which song was sung, which user sang it, and the date.

Our application is designed to be used with headphones, a screen, a microphone, speakers, a keyboard and a mouse. A high-quality microphone may be used to increase reliability, but the application will run regardless of the microphone type. To aid understanding, this SDD contains various diagrams, such as the UML diagram of the project, an activity diagram and a block diagram.

1.2 Scope

This document includes a complete explanation of the design of “Development of a Karaoke Application for Language Learning”.

The aim of our project is to build a platform that lets kids learn a language in a fun environment through a karaoke application, so that they can have fun while learning English. To calculate the score for each song, we compare the original lyrics with the text generated from the recorded sound in real time. For this purpose, we decided to use the Sphinx4 libraries. Although the Sphinx libraries can recognize speech, they cannot provide certain time-specific data, such as the frequency or the start time of the sound or song. Therefore, to handle the sound processing successfully and to calculate and use valid data for a specific time, we decided to use the Java Speech libraries together with the Sphinx libraries.

1.3 Glossary

Term	Definition
Speech Recognition	Capability of an electronic device to understand spoken words.
Speech Engine	Software that gives your computer the ability to play back text in a spoken voice. [4]
Hidden Markov Model	Statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (i.e. hidden) states. [3]
LPC	Tool used mostly in audio signal processing and speech processing for representing the spectral envelope of a digital signal of speech in compressed form, using the information of a linear predictive model. [1]
Open-Source Software	Software whose source code is made available under a license in which the copyright holder grants the rights to study, change and distribute it.
Sun Microsystems, Inc	Former American manufacturer of computer workstations, servers, and software. In 2010 the company was purchased by Oracle Corporation, a leading provider of database management systems. [2]
SDD	Software Design Document
Family Users	Participants in the family and home environment.

References

[1] Deng, L., & O'Shaughnessy, D. (2003). 2.2 Analysis Based on Linear Predictive Coding. Marcel Dekker CRC Press. Retrieved December 12, 2017, from https://books.google.com.tr/books?id=136wRmFT_t8C&hl=tr&source=gbs_navlinks_s
[2] Hall, M. (2014, September 20). Sun Microsystems, Inc. | American Company | Britannica.com. Retrieved December 13, 2017, from Encyclopædia Britannica: https://www.britannica.com/topic/Sun-Microsystems-Inc
[3] Jurafsky, D., & Martin, J. H. (2014). Hidden Markov Models. In D. Jurafsky, & J. H. Martin, Speech and Language Processing (Vol. 3). London: Pearson. Retrieved December 13, 2017, from https://web.stanford.edu/~jurafsky/slp3/9.pdf
[4] Microsoft ©. (2017). Microsoft © Developer Network. Retrieved December 12, 2017, from Speech engines: https://msdn.microsoft.com/en-us/library/gg650238.aspx

2. Design Overview

2.1 Software Development Methodology Used

We decided to use Scrum, one of the agile project management methodologies, for our project because the success rate of agile methods is higher than that of the waterfall method [1]. In this methodology, the main work is divided into four sprints, each completed within 30 days on average. Our project is based on integrating libraries and APIs. For this reason, we expect Scrum to reduce the problems that could occur during their integration.

2.2 Technology Used

CMU Sphinx, also called Sphinx, is the name for a group of open-source speech recognition systems and components developed at Carnegie Mellon University with support from Sun Microsystems. It includes the speech recognizers Sphinx 2 through Sphinx 4 and an acoustic model trainer called SphinxTrain. Sphinx 4 is a speech recognition engine that provides a more flexible framework, uses discrete Hidden Markov Models and LPC-derived parameters, and is designed and built entirely in the Java programming language. Sphinx is developed on an open-source platform and is freely available to researchers, developers and commercial organizations at any stage of development. The main goals of this development include configuration management, speaker adaptation and a brand-new acoustic model trainer. According to the test results in the Penn State University report [4], the word error rate (WER) for small vocabularies is 9.8% [4], which is a reliable result compared to other speech engines. That is why we decided to use this engine in our project.

The application will be built and programmed in Java™. Java is a high-level, object-oriented programming language and computing platform, released by Sun Microsystems in 1995, that runs cross-platform on laptops, datacenters, servers, cell phones and so on. [3][5] We chose Java because Sphinx 4 is written entirely in Java and because of Java's powerful graphics libraries.

2.3 Architectural Design

2.3.1 Karaoke System Design

Figure 1 Sequence Diagram

The components of the Karaoke project are shown in Figure 1. All designed systems are shown on the block diagram. The system has X subsystems.

2.3.2 Architecture Design of Karaoke

2.3.2.1 Class Diagram

Figure 2 Class Diagram of Karaoke for User

Figure 2 shows the connections between the systems within the application. The MasterMain class is the application's main system and includes the other main components; it is responsible for running and linking the main components within the system. The Karaoke class is responsible for displaying and choosing songs and for deciding which user will sing a song. The SpeechRec class is the main class for examining the sound and calculating the success rate. It is responsible for displaying the flowing lyric animation, recording sound, performing the speech-to-text operation, comparing the original and recorded files, and calculating the success rate for the song. The ScoreSelect class is responsible for displaying users in a table and selecting them for the family user. The ScoreTable class is responsible for displaying the score history of a specific user. The UserList class is responsible for creating, updating or deleting another singer in the system.

2.3.2.2 Account Management

Summary: The account management system is used by all types of users and the admin. For schools, the teacher can add, update or delete the class roster. For family users, parents can update their personal information and add, update or delete another family user. Login and logout are available for both user types. The admin can log in, log out, create a singer account, and update or delete any user in the account management system.

Actor: Admin, User

Precondition: User must run the application.

Basic Sequence:

The user must log in to the application by entering his/her email or username and password.
The user can update their personal information by selecting the Update Personal Information link from the family user’s menu.
The admin can create, update or delete a user account through the “Add User” and “Delete User” links in the admin panel by clicking the create, update or delete buttons.
Any user can exit the system by clicking the “Logout” button on the menu.

Exception: The application may fail to connect to the database.

Post Condition: None

Priority: Low

2.3.2.3 Karaoke Panel

Summary: This system is used by all types of users, including the admin. The admin can add, delete or update songs in the system and edit their lyrics. The user can select a song from the playlist, which consists of the songs in the system, and start playing it. After that, the system calculates the success rate based on the similarity between the recorded file and the original file.

Actor: Admin, User

Precondition: Users must log in to the system. The admin must add a song together with its lyrics. Users must select Karaoke Panel in the menu and select a song. Before playing a song, users must choose which user will sing it.

Basic Sequence:

The admin can add, delete or update a song, including its lyrics, by selecting the “Add Song” or “Delete Song” link in the admin menu and clicking the add, delete or update button.
Users can select any user from the “Singers” choicebox in the Karaoke Panel.
Users can select a song from the playlist, which consists of the songs in the system, by clicking the “Song” choicebox in the panel, and can then start playing the song.
Users can sing the song into the microphone along with the original song, which is audible through the headphones.
Users can see the lyrics on the screen as a flowing animation in sync with the original song, which is audible through the speaker.

Exception: The microphone may fail or may not be connected to the computer properly. The application may fail to connect to the database that contains the songs.

Post Condition: The success rate for each song must be updated in the score table.

Priority: High

2.3.2.4 Score Table

Summary: This system is used by users. After the Karaoke Panel is used, each user receives, for each song, a success rate and the number of correctly pronounced words, based on the similarity between the words in the original file and those in the recorded file. These values are displayed in the table.

Actor: Users

Precondition: The user must have run the Karaoke Panel at least once before opening the score table.

Basic Sequence:

The user can see the success rate, together with the song, date and score information, for each user by selecting the “Singer” choicebox under the “Score Table” link in the application.

Exception: The application may fail to connect to the database that holds the list of scores.

Post Condition: None.

Priority: Medium
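
The score table described above can be illustrated with a minimal in-memory sketch. It assumes one entry per sung song that holds the singer, song, date, number of matched words and success rate, and it filters the history by the singer chosen in the “Singer” choicebox. The names ScoreEntry, ScoreHistory and forSinger are hypothetical and are not taken from the class diagram; the actual application reads the scores from the database.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** One row of the score table (hypothetical field names). */
class ScoreEntry {
    final String singer;
    final String song;
    final LocalDate date;
    final int matchedWords;
    final double successRate; // percentage in the range 0..100

    ScoreEntry(String singer, String song, LocalDate date,
               int matchedWords, double successRate) {
        this.singer = singer;
        this.song = song;
        this.date = date;
        this.matchedWords = matchedWords;
        this.successRate = successRate;
    }
}

/** In-memory stand-in for the score list kept in the database. */
class ScoreHistory {
    private final List<ScoreEntry> entries = new ArrayList<>();

    void add(ScoreEntry entry) {
        entries.add(entry);
    }

    /** Returns the entries of one singer, newest first. */
    List<ScoreEntry> forSinger(String singer) {
        List<ScoreEntry> result = new ArrayList<>();
        for (ScoreEntry entry : entries) {
            if (entry.singer.equals(singer)) {
                result.add(entry);
            }
        }
        result.sort(Comparator.comparing((ScoreEntry e) -> e.date).reversed());
        return result;
    }
}
```

Sorting newest first is a design choice of this sketch; the document does not specify the order in which scores are listed.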

2.3.3 Activity Diagram

Figure 3 Activity Diagram

2.3.4 Database Diagram

Figure 4 Database Diagram

2.3.5 User Interfaces

The GUI is designed for ideal use of the system by its actors. There are 2 user entry points in this design: User and Admin.

The User entry has 4 subsystems: Main Menu, Karaoke Panel, Classes and Score Table. The main menu is the start page of the program. On this screen, the user can log in, view how to use the program, and terminate the program. The user can access the Karaoke Panel after logging in to the system. If the user who is singing is registered in the system, he/she is selected from the registration list. This panel lets the user listen to the song, see the lyrics in sync with it, and record the sung song. These operations are enriched with an animation. The user can view the past scores of each singer by clicking the 'Score Table' button. The user can also add, update or delete another singer in the system.

Admin is the other login type. The admin login gives access to 2 important subsystems, Account and Songs. The Account section has 3 different operations, which include adding users and deleting users. The admin can create a new user record by pressing the 'Add User' button. For each user created, the system automatically generates a new id. Likewise, pressing the 'Delete User' button deletes the user information, and the deleted id never becomes available again. The Songs system belongs to the Admin entry mainly because it performs the separation of the song's backing track (the instrumental background) and vocals. The system does this after the 'Add Song' button is pressed. At the end of the process, the 'Add Lyrics' button is pressed and a text file can be added.
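
The id rule stated above (a new id for every created user, and a deleted id that never becomes available again) can be sketched with a counter that only moves forward. This is a minimal in-memory illustration under that assumption; the class name SingerRegistry and its methods are hypothetical, and the actual application would keep this state in the database.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** In-memory sketch of user creation, update and deletion with ids that are never reused. */
class SingerRegistry {
    private final Map<Integer, String> singers = new LinkedHashMap<>();
    private int nextId = 1; // only ever increases, so deleted ids are never handed out again

    /** Creates a user and returns the automatically generated id. */
    int add(String name) {
        int id = nextId++;
        singers.put(id, name);
        return id;
    }

    /** Renames an existing user; returns false if the id is unknown. */
    boolean update(int id, String newName) {
        if (!singers.containsKey(id)) {
            return false;
        }
        singers.put(id, newName);
        return true;
    }

    /** Deletes a user; returns false if the id is unknown. */
    boolean delete(int id) {
        return singers.remove(id) != null;
    }

    /** Returns the user's name, or null if the id is unknown or deleted. */
    String nameOf(int id) {
        return singers.get(id);
    }
}
```

With a relational database, an auto-incrementing primary key typically gives the same behaviour, although that depends on the database chosen.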

2.3.5.1 View Score Table

This system calculates the percentage of correctly matched lyrics. After the correctness of the words is calculated against the text file added with the 'Add Lyrics' button in the Admin interface, the results are processed in the 'Score Table'.
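
One possible way to compute the percentage of correctly matched lyrics is sketched below. It is not necessarily the method used in the application: it assumes that both texts are lowercased and stripped of punctuation, and that words count as matched when they appear in the same order in both texts (a longest common subsequence over words). The class name LyricScore and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Sketch of a word-level comparison between original lyrics and recognized text. */
class LyricScore {

    /** Lowercases the text, drops punctuation and splits it into words. */
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ENGLISH).split("[^a-z0-9']+")) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    /** Number of words that match in order (longest common subsequence). */
    static int matchedWords(String original, String recognized) {
        List<String> a = words(original);
        List<String> b = words(recognized);
        int[][] lcs = new int[a.size() + 1][b.size() + 1];
        for (int i = 1; i <= a.size(); i++) {
            for (int j = 1; j <= b.size(); j++) {
                if (a.get(i - 1).equals(b.get(j - 1))) {
                    lcs[i][j] = lcs[i - 1][j - 1] + 1;
                } else {
                    lcs[i][j] = Math.max(lcs[i - 1][j], lcs[i][j - 1]);
                }
            }
        }
        return lcs[a.size()][b.size()];
    }

    /** Matched words as a percentage of the words in the original lyrics. */
    static double successRate(String original, String recognized) {
        int total = words(original).size();
        return total == 0 ? 0.0 : 100.0 * matchedWords(original, recognized) / total;
    }
}
```

For example, comparing the original line "Twinkle, twinkle, little star" with the recognized text "twinkle little car" gives 2 matched words out of 4, a success rate of 50%.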

2.3.5.2 Sound Processing

All the functional requirements of the Karaoke Program are realized in this system. The Sound Processing system has 4 subsystems: Background Subtraction, Record, Speech-To-Text and Comparison. As a result of the 'Add Song' operation on the Admin side, the background is separated from the vocals. When the karaoke program starts, the sound from the microphone is recorded automatically. Converting the recorded sound to text takes about twice the duration of the song itself. The converted text is then compared with the text that contains the original words.
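
The Record subsystem could be sketched with the standard javax.sound.sampled API as follows. This is a sketch under assumptions, not the application's actual recorder: the class name MicrophoneRecorder is hypothetical, and the audio format (16 kHz, 16-bit, mono, little-endian PCM) is assumed because it is the format commonly used with the default Sphinx-4 English models.

```java
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.TargetDataLine;

/** Sketch of recording microphone input to a WAV file for later recognition. */
class MicrophoneRecorder {

    /** Assumed format: 16 kHz, 16-bit, mono, signed, little-endian PCM. */
    static AudioFormat speechFormat() {
        return new AudioFormat(16000f, 16, 1, true, false);
    }

    /** Size in bytes of the raw audio for a recording of the given length. */
    static long expectedBytes(AudioFormat format, double seconds) {
        return Math.round(format.getFrameRate() * seconds) * format.getFrameSize();
    }

    /** Records from the default microphone for the given time and writes a WAV file. */
    static void record(File target, long millis)
            throws LineUnavailableException, IOException, InterruptedException {
        AudioFormat format = speechFormat();
        TargetDataLine line = AudioSystem.getTargetDataLine(format);
        line.open(format);
        line.start();

        // Closing the line from a second thread ends the blocking write below.
        Thread stopper = new Thread(() -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            line.stop();
            line.close();
        });
        stopper.start();

        try (AudioInputStream input = new AudioInputStream(line)) {
            AudioSystem.write(input, AudioFileFormat.Type.WAVE, target);
        }
        stopper.join();
    }
}
```

In the application, the recording length would follow the duration of the selected song. If the microphone has failed or is not connected, opening the line throws an exception, which corresponds to the microphone exception listed for the Karaoke Panel.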

2.3.5.3 Animation Design

In the animation system, the lyrics are displayed synchronously with the music.
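
One way to keep the lyrics in sync with the music is to store each lyric line with its start time and look up the line that belongs to the current playback position. The sketch below assumes that such start times are available for every song, which the document does not state. The class name LyricTimeline and its methods are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch of a lookup from playback position to the lyric line to display. */
class LyricTimeline {
    // Start time in milliseconds mapped to lyric line, kept sorted by start time.
    private final TreeMap<Long, String> lines = new TreeMap<>();

    void addLine(long startMillis, String text) {
        lines.put(startMillis, text);
    }

    /** Returns the line that started most recently, or "" before the first line. */
    String lineAt(long positionMillis) {
        Map.Entry<Long, String> entry = lines.floorEntry(positionMillis);
        return entry == null ? "" : entry.getValue();
    }
}
```

The animation loop would call lineAt with the current position of the audio player on every frame and redraw the text only when the returned line changes.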

2.3.5.4 Song Processing

As a result, the user sings over the backing track and the sung words are recorded. The Sphinx library is used to turn the voice into words. Adding the Sphinx library is simple, as shown in the following screenshot.

Figure 5 Adding Sphinx library to the build path [2]

Sphinx-4 has 3 high-level recognition interfaces: LiveSpeechRecognizer, StreamSpeechRecognizer and SpeechAligner. Sphinx normally transcribes speech live, but in our case we run speech recognition on the previously recorded song. After the speech-to-text operation, the words are written to a separate text file. At the end of the processing, the words, saved as a string, are compared with the words in the original text. The result of the comparison is scored and added to the user's past score list.

References

[1] Atmaca, G. (2014, September 04). Koç Sistem. Retrieved December 12, 2017, from AGILE YAKLAŞIMI VE SCRUM YÖNTEMİ | Koç Sistem: https://www.kocsistem.com.tr/agle-yaklasm-ve-scrum-yontem/
[2] CMUSphinx. (2017, October 13). CMUSphinx Open Source Speech Recognition. Retrieved December 14, 2017, from Building an application with sphinx4 - CMUSphinx Open Source Speech Recognition: https://cmusphinx.github.io/wiki/tutorialsphinx4/
[3] Java. (2017, December 30). Java. Retrieved December 30, 2017, from What is Java technology and why do I need it?: https://www.java.com/en/download/faq/whatis_java.xml
[4] Lamere, P., Kwok, P., Gouvêa, E., Raj, B., Singh, R., Walker, W., . . . Wolf, P. (2003). THE CMU SPHINX-4 SPEECH RECOGNITION SYSTEM. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, Vol. 1. Retrieved December 15, 2017, from https://pdfs.semanticscholar.org/5064/c602c3a57f4e6f1e4c8f8fb137384c5d41a7.pdf
[5] Oracle. (2017, September 18). Oracle Java Documentation. Retrieved December 16, 2017, from About the Java Technology (The Java™ Tutorials) >Getting Started > The Java Technology Phenomenon: https://docs.oracle.com/javase/tutorial/getStarted/intro/definition.html
