In 2015, I developed real-time pathological speech detection software using C++ and the Microsoft Foundation Class Library (MFC). The program provides real-time speech recording, voice activity detection (also called speech end-point detection), pathological speech detection, and import of external audio data.
![](https://private-user-images.githubusercontent.com/112595759/323070211-911ba534-e683-456a-823f-71d0f329a0f3.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjIwMDkwNjMsIm5iZiI6MTcyMjAwODc2MywicGF0aCI6Ii8xMTI1OTU3NTkvMzIzMDcwMjExLTkxMWJhNTM0LWU2ODMtNDU2YS04MjNmLTcxZDBmMzI5YTBmMy5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzI2JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyNlQxNTQ2MDNaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0xMDE2ZGQ2NDYwNTlmZWU0ODlhNmFhOGJhOTdjOWNmNWNkNTgyYTRhMDM1ZGVhMmFkYzZjYTcxMTkwNmE3NjY4JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.nIC6hqwszmlnb39e_YeALD2lF9lluFN5pKrV2PoZxnA)
This software detects speech endpoints using energy and zero-crossing rate, computes various speech features, including jitter, shimmer, and higher-order statistics, and detects pathological speech with a pre-trained decision tree model (a simple machine learning model!). It achieved an accuracy of 83.11% in detecting pathological speech.
The details of the software were published in a peer-reviewed paper in a Korean journal in 2015.
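
As a rough illustration of the jitter and shimmer features mentioned above, here is a minimal sketch. It assumes the common "local" definitions (mean absolute difference between consecutive pitch periods or peak amplitudes, divided by their mean); the variant used in the actual software is not specified here. The function names `localPerturbation`, `localJitter`, and `localShimmer` are hypothetical, and the pitch periods and peak amplitudes are assumed to have been extracted beforehand.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Relative cycle-to-cycle perturbation: mean(|x[i] - x[i-1]|) / mean(x).
// Assumption: this is the "local" jitter/shimmer definition, not necessarily
// the one used in the original software.
double localPerturbation(const std::vector<double>& x) {
    if (x.size() < 2) return 0.0;  // not enough cycles to compare
    double diffSum = 0.0;
    double sum = x[0];
    for (std::size_t i = 1; i < x.size(); ++i) {
        diffSum += std::fabs(x[i] - x[i - 1]);
        sum += x[i];
    }
    const double meanDiff = diffSum / static_cast<double>(x.size() - 1);
    const double mean = sum / static_cast<double>(x.size());
    assert(mean > 0.0 && "periods or peak amplitudes must be positive");
    return meanDiff / mean;
}

// Jitter: perturbation of consecutive pitch periods (in seconds).
double localJitter(const std::vector<double>& periods) {
    return localPerturbation(periods);
}

// Shimmer: perturbation of consecutive per-cycle peak amplitudes.
double localShimmer(const std::vector<double>& peakAmplitudes) {
    return localPerturbation(peakAmplitudes);
}
```

Both values are ratios, so a perfectly periodic voice with constant amplitude gives 0 for each, and larger values indicate a less stable voice.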
For now, this GitHub repository provides the C++ code (compatible with MFC) only for the real-time speech recording and speech end-point detection parts. If you have any questions, please contact me at husky.jihye.moon@gmail.com.
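
To give an idea of how energy/zero-crossing-rate end-point detection works, here is a minimal sketch. It is not the repository code: the names `FrameFeatures`, `analyzeFrame`, and `detectEndpoints`, the non-overlapping framing, and all threshold parameters are assumptions for illustration. A frame counts as speech if its energy is high (voiced sounds), or if its energy is moderate and its zero-crossing rate is high (unvoiced sounds such as fricatives).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct FrameFeatures {
    double energy;  // mean squared amplitude of the frame
    double zcr;     // fraction of adjacent sample pairs that change sign
};

// Computes energy and zero-crossing rate for x[start, start + len).
FrameFeatures analyzeFrame(const std::vector<double>& x,
                           std::size_t start, std::size_t len) {
    assert(len > 1 && start + len <= x.size());
    double energy = 0.0;
    std::size_t crossings = 0;
    for (std::size_t i = start; i < start + len; ++i) {
        energy += x[i] * x[i];
        if (i > start && ((x[i] >= 0.0) != (x[i - 1] >= 0.0))) ++crossings;
    }
    return {energy / static_cast<double>(len),
            static_cast<double>(crossings) / static_cast<double>(len - 1)};
}

// Returns {firstSample, lastSampleExclusive} of the detected speech region,
// or {0, 0} if no frame is classified as speech.
// Assumption: thresholds are supplied by the caller (e.g. estimated from
// leading background noise); the values are not taken from the original code.
std::pair<std::size_t, std::size_t> detectEndpoints(
        const std::vector<double>& x, std::size_t frameLen,
        double energyThresh, double energyFloor, double zcrThresh) {
    assert(frameLen > 1);
    bool found = false;
    std::size_t first = 0;
    std::size_t last = 0;
    for (std::size_t s = 0; s + frameLen <= x.size(); s += frameLen) {
        const FrameFeatures f = analyzeFrame(x, s, frameLen);
        const bool voiced = f.energy > energyThresh;
        const bool unvoiced = f.energy > energyFloor && f.zcr > zcrThresh;
        if (voiced || unvoiced) {
            if (!found) { first = s; found = true; }
            last = s + frameLen;  // extend the region to the end of this frame
        }
    }
    return {first, last};
}
```

In a real-time setting the same per-frame decision would be applied to each recorded buffer as it arrives, usually with some hangover frames so that short pauses do not split an utterance.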
As additional voice activity detection algorithms, I also implemented three methods based on the Autocorrelation Function (ACF), the Average Magnitude Difference Function (AMDF), and Higher Order Differential Energy Operators (HODEO), respectively. Results for these voice activity detection methods are displayed below.
![](https://private-user-images.githubusercontent.com/112595759/322982503-1e817bba-94b9-4870-9005-9931b887fae2.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjIwMDkwNjMsIm5iZiI6MTcyMjAwODc2MywicGF0aCI6Ii8xMTI1OTU3NTkvMzIyOTgyNTAzLTFlODE3YmJhLTk0YjktNDg3MC05MDA1LTk5MzFiODg3ZmFlMi5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzI2JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyNlQxNTQ2MDNaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0zMWU2YWRlY2EyMTcwOTMxMDQ1Yjc0MmFlOWRmYjcyMjY1MWRiNWJhY2E4OGRlYzE2M2JiZmIzODM0YjY2YWVjJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.1VSLlaDUPvpwWVj84F4tggKaRLBnGw51URHAxmbtlaE)
ACF-, AMDF-, and HODEO-based voice activity detection code is available at Link!
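
For readers unfamiliar with the ACF and AMDF mentioned above, here is a minimal sketch of the two functions using their textbook definitions. It is not the linked code: the names `autocorrelation`, `amdf`, and `bestAmdfLag` are hypothetical, and how the original implementation turns these values into a speech/non-speech decision is not described here. The general idea is that a voiced frame is periodic, so the ACF shows a strong peak and the AMDF a deep valley at the pitch lag.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Autocorrelation at a given lag: sum of x[n] * x[n + lag].
double autocorrelation(const std::vector<double>& x, std::size_t lag) {
    assert(lag < x.size());
    double sum = 0.0;
    for (std::size_t n = 0; n + lag < x.size(); ++n) sum += x[n] * x[n + lag];
    return sum;
}

// Average magnitude difference at a given lag: mean of |x[n] - x[n + lag]|.
double amdf(const std::vector<double>& x, std::size_t lag) {
    assert(lag < x.size());
    double sum = 0.0;
    for (std::size_t n = 0; n + lag < x.size(); ++n) {
        sum += std::fabs(x[n] - x[n + lag]);
    }
    return sum / static_cast<double>(x.size() - lag);
}

// Lag in [minLag, maxLag] with the smallest AMDF value (the deepest valley).
// For a periodic frame this is a candidate pitch period in samples.
std::size_t bestAmdfLag(const std::vector<double>& x,
                        std::size_t minLag, std::size_t maxLag) {
    assert(minLag >= 1 && minLag <= maxLag && maxLag < x.size());
    std::size_t best = minLag;
    double bestVal = amdf(x, minLag);
    for (std::size_t lag = minLag + 1; lag <= maxLag; ++lag) {
        const double v = amdf(x, lag);
        if (v < bestVal) { bestVal = v; best = lag; }
    }
    return best;
}
```

One simple way to use these for voice activity detection, stated here only as an assumption, is to compare the depth of the AMDF valley (or the height of the normalized ACF peak) against a threshold per frame.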