<a href="https://colab.research.google.com/github/MK316/workshops/blob/main/20230126_yonsei/ILIS_139th_Part2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🌱 **Topic: Understanding and Using Spoken Language Data in the Digital Age [2]**
(Leveraging the Potential of Spoken Data in the AI-Powered Digital Age)

**Note:** In the menu, go to 'Runtime > Change runtime type' and select GPU as the hardware accelerator. This is recommended for the ASR (Automated Speech Recognition) part.

In [None]:
#@markdown 💠 Information in the Digital Age (Slides #9-12)
from IPython.display import Image

pages = "9" #@param ["9", "10", "11", "12"]
# Slides are stored as slides.009.jpeg ... slides.012.jpeg in the workshop repo
base_url = 'https://github.com/MK316/workshops/raw/main/20230126_yonsei/slides/'
add = base_url + 'slides.%03d.jpeg' % int(pages)

Image(url=add, width=800, height=500)

# 📙 [4] Speech API (Application Programming Interface)

## A. **Text-to-Speech (TTS):** e.g., {gTTS}

Examples of TTS engines:
* Google Text-to-Speech
* Amazon Polly
* IBM Watson Text to Speech
* Microsoft Azure Cognitive Services Text-to-Speech
* DeepMind WaveNet (the model behind the WaveNet voices in Google Cloud Text-to-Speech)
* Festival TTS
* Flite TTS
* eSpeak TTS

### **TTS in commercial fields:**

+ Entertainment and media (audiobooks, video games) and marketing
+ Customer service, **navigation (GPS) systems**, **home applications & smart home devices**
+ **Screen readers** for the visually impaired

### **TTS in educational fields**

TTS can enhance learning and make it more accessible in:

+ e-learning and online learning
+ language learning
+ assistive technology
+ audiobooks
+ tutoring
+ interactive classes

### My example: self-learning tool

In [7]:
%%capture
#@markdown 🚩 Making functions: etts('text'), ktts('text')

!pip install gTTS
from gtts import gTTS
from IPython.display import Audio

def etts(text):
  """Synthesize English speech (US accent), save it as E-audio.mp3, and return a player."""
  gtts_object = gTTS(text=text, lang="en", tld="us", slow=False)
  gtts_object.save("E-audio.mp3")
  return Audio("E-audio.mp3")


def ktts(text):
  """Synthesize Korean speech, save it as K-audio.mp3, and return a player."""
  gtts_object = gTTS(text=text, lang="ko", slow=False)
  gtts_object.save("K-audio.mp3")
  return Audio("K-audio.mp3")

# Usage (uncomment to try):
# txt = input('Type text to create audio: ')
# etts(txt)    # or ktts(txt) for Korean

Sample article: The New York Times (September 2, 2022) - [article link](https://raw.githubusercontent.com/MK316/workshops/main/20230126_yonsei/data/article_22.md)

... This year, the Colorado State Fair’s annual art competition gave out prizes in all the usual categories: painting, quilting, sculpture.  But one entrant, Jason M. Allen of Pueblo West, Colo., didn’t make his entry with a brush or a lump of clay. He created it with Midjourney, an artificial intelligence program that turns lines of text into hyper-realistic graphics.  Mr. Allen’s work, “Théâtre D’opéra Spatial,” took home the blue ribbon in the fair’s contest for emerging digital artists — making it one of the first A.I.-generated pieces to win such a prize, and setting off a fierce backlash from artists who accused him of, essentially, cheating...

In [9]:
#@markdown Create an audio file from your text using {gTTS}:

language = "en" #@param ["en","ko"]

mytext = input("Type text to create an audio:  ")

if language == "en":
  etts(mytext)
  myaudio = "E-audio.mp3"
elif language == "ko":
  ktts(mytext)
  myaudio = "K-audio.mp3"

print("="*50)
print("Your text: %s"%mytext)
print("="*50)
Audio(myaudio)

Type text to create an audio:  This year, the Colorado State Fair’s annual art competition gave out prizes in all the usual categories: painting, quilting, sculpture. But one entrant, Jason M. Allen of Pueblo West, Colo., didn’t make his entry with a brush or a lump of clay. He created it with Midjourney, an artificial intelligence program that turns lines of text into hyper-realistic graphics. Mr. Allen’s work, “Théâtre D’opéra Spatial,” took home the blue ribbon in the fair’s contest for emerging digital artists — making it one of the first A.I.-generated pieces to win such a prize, and setting off a fierce backlash from artists who accused him of, essentially, cheating
Your text: This year, the Colorado State Fair’s annual art competition gave out prizes in all the usual categories: painting, quilting, sculpture. But one entrant, Jason M. Allen of Pueblo West, Colo., didn’t make his entry with a brush or a lump of clay. He created it with Midjourney, an artificial intelligence progr

## B. **Speech-to-Text (STT):** e.g., Automated Speech Recognition tool {Whisper}

1. [Whisper announcement (OpenAI blog)](https://openai.com/blog/whisper/)  
2. [Whisper GitHub repository](https://github.com/openai/whisper), which includes a graph of WER (Word Error Rate) by language (lower is better)
> e.g., Spanish: 3.0, English: 4.3, Japanese: 5.3, Korean: 14.3, Chinese: 14.7
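The numbers above are WER scores. WER is the word-level Levenshtein (edit) distance between a reference transcript and the ASR output, divided by the number of reference words: WER = (S + D + I) / N, where S, D, and I count substituted, deleted, and inserted words. Below is a minimal standard-library sketch of that computation. The function name `word_error_rate` is a hypothetical helper, not part of Whisper, and it only lowercases and splits on whitespace, so punctuation and number formatting still count as errors.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / number of reference words.

    Minimal sketch: text is only lowercased and split on whitespace.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dist[i][j] = minimum number of word edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            same = ref[i - 1] == hyp[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                        # deletion
                dist[i][j - 1] + 1,                        # insertion
                dist[i - 1][j - 1] + (0 if same else 1),   # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


reference = "My number is three two zero five"
hypothesis = "My number is 3205"
# 1 substitution + 3 deletions over 7 reference words
print(round(word_error_rate(reference, hypothesis), 3))  # 0.571
```

The example shows why text normalization matters: the hypothesis is a correct transcription, yet writing the number as digits costs 4 of 7 words. For real evaluations, an established package such as `jiwer` combined with consistent normalization is the safer choice.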

Examples of STT engines:
* Google Cloud Speech-to-Text
* Amazon Transcribe
* IBM Watson Speech to Text
* Microsoft Azure Cognitive Services Speech-to-Text
* CMU Sphinx
* Kaldi
* Baidu Speech Recognition



---


## 🌀 ASR (Automated Speech Recognition) test


---



In [18]:
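%%capture
#@markdown 📌 (Run this once only) Install {whisper} and load the "base" model
!pip install git+https://github.com/openai/whisper.git

import whisper

# Load the multilingual "base" model (weights are downloaded on first use)
model = whisper.load_model('base')

# Example call (file name from the original workshop material):
# result = model.transcribe("man_lainbow.wav", language="en", fp16=False)
# print(result["text"])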

🌻Sample text (to copy and paste): 

+ The WER calculation is based on a measurement called the “Levenshtein distance.” The Levenshtein distance is a measurement of the differences between two “strings.”
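+ 안녕하십니까? 연세대학교 언어정보연구원입니다. 저희 연구원에서는 오는 1월 26일 (목요일) 오후 2시에 경상국립대학교 영어교육과 교수님을 모시고 '디지털시대의 음성언어자료 이해 및 활용(Leveraging the Potential of Spoken Data in the Digital Age)'라는 주제로 제139회 학술발표회를 개최합니다.  
  (English: "Hello, this is the Institute of Language and Information Studies at Yonsei University. On Thursday, January 26, at 2 p.m., our institute will hold its 139th academic presentation on the topic 'Leveraging the Potential of Spoken Data in the Digital Age,' with a professor from the Department of English Education at Gyeongsang National University as our speaker.")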
+ My number is three two zero five.

In [12]:
#@markdown ▶️ (1/2) Create an audio file from your text using {gTTS}:

language = "ko" #@param ["en","ko"]

mytext = input("Type text to create an audio:  ")

if language == "en":
  etts(mytext)
  myaudio = "E-audio.mp3"
elif language == "ko":
  ktts(mytext)
  myaudio = "K-audio.mp3"

print("="*50)
print("Your text: %s"%mytext)
print("="*50)
Audio(myaudio)

Type text to create an audio:  안녕하십니까? 연세대학교 언어정보연구원입니다. 
Your text: 안녕하십니까? 연세대학교 언어정보연구원입니다. 


🌻Extra🌻 [audio sample](https://github.com/MK316/workshops/blob/133af556f3f28930a4bf59ea5f6962465c3515fd/20230126_yonsei/data/mynumber.wav): "My number is 3205." (human speech)

_Note_: Download the audio file and upload it to Colab to test ASR. (Rename the file to something simpler before uploading it.)

In [19]:
#@markdown ▶️ (2/2) Speech-to-text (type the audio file name shown in the left panel)

ASR_lang = "en" #@param ["en","ko"]
#@markdown - Note: You can also upload your own audio file and type its name in the prompt.
myfile = input("Type file name: e.g., K-audio.mp3 or E-audio.mp3  ")
filepath = "/content/" + myfile   # Colab's default working directory
# Use the same path for transcription and playback; fp16=False decodes in FP32
# (FP16 is not supported on CPU, so this avoids the warning when no GPU is available)
result = model.transcribe(filepath, language=ASR_lang, fp16=False)
print(result["text"])
Audio(filepath)

Type file name: e.g., K-audio.mp3 or E-audio.mp3  sample2.wav
 My number is 3205.
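Whisper wrote the spoken number as digits ("3205"), while a reference transcript may spell it out, as in the sample text "My number is three two zero five." Before scoring ASR output against a reference, both sides are usually normalized. The sketch below is a hypothetical normalizer (`normalize_transcript` and its rules are assumptions, not Whisper's own normalizer, which lives in the repository's `whisper/normalizers` package and is far more complete). It lowercases, replaces punctuation with spaces, maps single-digit number words to numerals, and joins consecutive digits. It does not handle words like "twenty" or "hundred", and it would also merge two separately spoken numbers.

```python
import re

# Hypothetical mapping: only single-digit number words are handled
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def normalize_transcript(text: str) -> str:
    """Lowercase, drop punctuation, map digit words to numerals, and merge digit runs."""
    tokens = re.sub(r"[^\w\s]", " ", text.lower()).split()
    out = []
    for tok in tokens:
        tok = DIGIT_WORDS.get(tok, tok)
        if tok.isdigit() and out and out[-1].isdigit():
            out[-1] += tok  # join "3 2 0 5" into "3205"
        else:
            out.append(tok)
    return " ".join(out)


reference = "My number is three two zero five."
hypothesis = " My number is 3205."   # Whisper output from the cell above
print(normalize_transcript(reference))   # my number is 3205
print(normalize_transcript(reference) == normalize_transcript(hypothesis))  # True
```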


### 🌻Extra🌻 

My example: research on "Pronunciation Assessment Using ASR" (in progress)

In [None]:
#@markdown Assessment result sample: English data
url = "https://raw.githubusercontent.com/MK316/workspace/main/ASR01/results/SETR_result_0110.csv"

import pandas as pd
df = pd.read_csv(url)

from tabulate import tabulate
print(tabulate(df, headers = 'keys'))

In [None]:
#@markdown Assessment result sample: Korean-accented English data
url = "https://raw.githubusercontent.com/MK316/workspace/main/ASR01/results/SKTR_result_m.csv"
df1 = pd.read_csv(url)
df1.head()

In [None]:
#@markdown View the full table (optional)
from tabulate import tabulate
print(tabulate(df1, headers = 'keys'))

# 📙 Closing: Q & As

In [25]:
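#@markdown Thanks for listening.
# Korean part: "Thank you for listening. If you have any questions at all, please feel free to ask."
ktts("지금까지 들어주셔서 감사합니다. 질문 있으시면 어떤 것이라도 편하게 질문해 주시기 바랍니다. Thank you for listening. Feel free to ask questions if you have any.")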


* mirankim@gnu.ac.kr  
* MK316.github.io (webpage)



---


The End
