In [1]:
audio_en = "test_en.m4a"
audio_ja = "test_ja.m4a"

In [2]:
import whisper
import IPython



## SOTA Automatic Speech Recognition

To use `whisper`, you only need three lines of code.

* first line to `import` the library...

* second line to load a `whisper` model...

* and a third line to call `transcribe`

Here are two audio samples we will use for this introduction:

* `test_en.m4a`
* `test_ja.m4a`

In [3]:
IPython.display.Audio(audio_en)

The first audio file is me reading aloud the beginning of _Leviathan_ by Paul Auster.

> Six days ago, a man blew himself up by the side of a road in northern Wisconsin. There were no witnesses, but it appears that he was sitting on the grass next to his parked car when the bomb he was building accidentally went off. According to the forensic reports that have just been published, the man was killed instantly. His body burst into dozens of small pieces, and fragments of his corpse were found as far as fifty feet away from the site of the explosion. As of today (July 4, 1990), no one seems to have any idea who the dead man was. The FBI, working along with the local police and agents from the Bureau of Alcohol, Tobacco and Firearms, began their investigation by looking into the car, a seven-year-old blue Dodge with Illinois license plates, but they quickly learned that it had been stolen - filched in broad daylight from a Joliet parking lot on June 12.

In [4]:
IPython.display.Audio(audio_ja)

The second audio file is the opening paragraph of ノルウェイの森 by 村上春樹.

> 僕は三十七歳で、そのときボーイング７４７のシートに座っていた。その巨大な飛行機はぶ厚い雨雲をくぐり抜けて降下し、ハンブルク空港に着陸しようとしているところだった。十一月の冷ややかな雨が大地に暗く染め、雨合羽を着た整備工たちや、のっぺりとした空港ビルの上に立った旗や、BMWの広告板やそんな何もかもをフランドル派の陰うつな絵の背景のように見せていた。やれやれ、またドイツか、と僕は思った。

By default, `transcribe` detects the language spoken in the audio and transcribes the speech in that language. The detected language code is returned under the `language` key of the result.

Let's run plain transcription with four of the model sizes (`tiny`, `base`, `medium`, and `large`) against both the English and the Japanese audio samples.
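
The cells that follow repeat the same load-and-transcribe steps once per size. As a rough sketch, the same comparison could be written as a single loop. The helper name `compare_sizes` and its `load_model` parameter are hypothetical names introduced here for illustration; only `whisper.load_model` and `model.transcribe` come from the notebook itself.

```python
from typing import Callable, Dict, Iterable, Optional


def compare_sizes(
    audio_path: str,
    sizes: Iterable[str] = ("tiny", "base", "medium", "large"),
    load_model: Optional[Callable[[str], object]] = None,
) -> Dict[str, dict]:
    """Transcribe one audio file with several model sizes.

    Returns a dict mapping each size name to the result of `transcribe`.
    """
    if load_model is None:
        # Imported lazily so the helper can also be tried with a stand-in loader.
        import whisper
        load_model = whisper.load_model

    results = {}
    for size in sizes:
        model = load_model(size)  # only one model is referenced at a time
        results[size] = model.transcribe(audio_path)
    return results
```

Calling `compare_sizes(audio_en)` and then printing `out['text']` and `out['language']` for each entry should give the same kind of output as the individual cells, assuming the model weights can be downloaded and fit in memory.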

In [5]:
model = whisper.load_model("tiny")

In [6]:
out = model.transcribe(audio_en)

print(out['text'])
print(out['language'])

 Six days ago, a man blew himself up by the side of a road in northern Wisconsin. There were no witnesses, but it appears that he was sitting on the grass next to his parked car when the bomb he was building accidentally went off. According to the forensic reports that have just been published, the man was killed instantly. His body burst into dozens of small pieces, and infragments of his corpse were found as far as 50 feet away from the set of the explosion. As of today, July 4, 1990, no one seems to have any idea who the dead man was. The FBI, working along with a local police and agents from the Bureau of Alcohol, Tobacco and Firearms, began their investigation by looking into the car. A seven-year-old blew dodge with Illinois license plates, but they quickly learned that it had been stolen, Philched in broad daylight from a July parking lot on June 12.
en


In [7]:
out = model.transcribe(audio_ja)

print(out['text'])
print(out['language'])

僕は37歳でその時ボウイング7世のシートに増えていた その巨大な飛行機はブワツイヤマングモークブリ抜けて効果しハンブルククークーに着力しようとしているところだった11月のヒュアヤカナーメンが大地をクラクソメアロンガッパを来た成立方達やノッペリとした空港ビルの上にいたった方やBMWの高校具合やそんな何もかもフランドルハロのユーツがへの背景のように見せていたやりやれまたといすかと僕は思った
ja


----

In [8]:
model = whisper.load_model("base")

In [9]:
out = model.transcribe(audio_en)

print(out['text'])
print(out['language'])

 Six days ago, a man blew himself up by the sight of a road in northern Wisconsin. There were no witnesses, but it appears that he was sitting on the grass next to his parked car when the bomb he was building accidentally went off. According to the forensic reports that have just been published, the man was killed instantly. His body burst into dozens of small pieces, and the fragments of his corpse were found as far as 50 feet away from the sight of the explosion. As of today, July 4, 1990, no one seems to have any idea who the dead man was. The FBI, working along with the local police and agents from the Bureau of Alcohol, Tobacco and Firearms, began their investigation by looking into the car, a seven-year-old blue dodge with Illinois license plates, but they quickly learned that it had been stolen, built and broad daylight from a Juliet parking lot on June 12.
en


In [10]:
out = model.transcribe(audio_ja)

print(out['text'])
print(out['language'])

僕は37歳でその時 ボーイング747のシートに座っていたその巨大な飛行機は部圧や漫画も 送り抜けて効かしハンブル空港に着陸しようとしているところだった11月の日焼かな雨が大地を暮らくそめ雨がっぱを着た成り行たちや 乗っ込りとした空港ビルの上に立った果たやBMタビルの航空版や そんな何もかもをフランドルハの言うつな絵の背景のように見せていたやりやりまたドイスカーと僕は思った
ja


----

In [11]:
model = whisper.load_model("medium")

In [12]:
out = model.transcribe(audio_en)

print(out['text'])
print(out['language'])

 Six days ago, a man blew himself up by the side of a road in northern Wisconsin. There were no witnesses, but it appears that he was sitting on the grass next to his parked car when the bomb he was building accidentally went off. According to the forensic reports that have just been published, the man was killed instantly. His body burst into dozens of small pieces, and fragments of his corpse were found as far as 50 feet away from the site of the explosion. As of today, July 4, 1990, no one seems to have any idea who the dead man was. The FBI, working along with the local police and agents from the Bureau of Alcohol, Tobacco, and Firearms, began their investigation by looking into the car, a seven-year-old blue Dodge with Illinois license plates, but they quickly learned that it had been stolen, filched in broad daylight from a Joliet parking lot on June 12.
en


In [13]:
out = model.transcribe(audio_ja)

print(out['text'])
print(out['language'])

僕は37歳で、その時ボーイング747のシートに座っていた。その巨大な飛行機は分厚い雨雲をくぐり抜けて降下し、ハンブルク空港に着陸しようとしているところだった。11月の冷やかな雨が大地を暗く染め、雨がっぱ起きた整備校たちや、のっぺりとした空港ビルの上に立った旗や、BMWの広告版や、そんな何もかもをフランドル派の憂鬱な絵の背景のように見せていた。やれやれまたドイツカーと僕は思った。
ja


----

In [14]:
model = whisper.load_model("large")

In [15]:
out = model.transcribe(audio_en)

print(out['text'])
print(out['language'])

 Six days ago, a man blew himself up by the side of a road in northern Wisconsin. There were no witnesses, but it appears that he was sitting on the grass next to his parked car when the bomb he was building accidentally went off. According to the forensic reports that have just been published, the man was killed instantly. His body burst into dozens of small pieces, and fragments of his corpse were found as far as 50 feet away from the site of the explosion. As of today, July 4, 1990, no one seems to have any idea who the dead man was. The FBI, working along with the local police and agents from the Bureau of Alcohol, Tobacco and Firearms, began their investigation by looking into the car, a seven-year-old blue Dodge with Illinois license plates. But they quickly learned that it had been stolen, filched in broad daylight from a Joliet parking lot on June 12.
en


In [16]:
out = model.transcribe(audio_ja)

print(out['text'])
print(out['language'])

僕は37歳で、その時ボーイング747のシートに座っていた。その巨大な飛行機は、分厚い雨雲をくぐり抜けて降下し、ハンブルク空港に着陸しようとしているところだった。11月の冷ややかな雨が大地を暗く染め、雨がっぱを着た整理工たちや、のっぺりとした空港ビルの上に立った旗や、BMWの広告板や、そんな何もかもをフランドル派の憂鬱な絵の背景のように見せていた。やれやれ、またドイツか、と僕は思った。
ja


----
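
Reading the four transcripts side by side shows the quality improving with model size. To put a rough number on that, one option is an edit-distance error rate against the reference passage: word error rate for English, character error rate for Japanese (which has no spaces between words). This is a minimal sketch, not an official Whisper metric; the function names are made up for this example, and no text normalization is applied, so differences such as `fifty` versus `50` or missing punctuation count as errors.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.lower().split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.lower().split()) / len(ref_words)


def char_error_rate(reference: str, hypothesis: str) -> float:
    """Character-level edit distance, ignoring whitespace."""
    ref_chars = "".join(reference.split())
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, "".join(hypothesis.split())) / len(ref_chars)
```

With the reference passages stored in strings, `word_error_rate(reference_en, out['text'])` and `char_error_rate(reference_ja, out['text'])` would give one score per model size; `reference_en` and `reference_ja` are placeholder names for the quoted passages.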

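Whisper can also translate _at the same time it is processing the input_. Its documented translation task (`task='translate'`) produces English output only. The cells below try something different: they pass a `language` argument that does not match the spoken language, which forces the decoder to write in that language. This is not the documented way to translate, so treat the results as an experiment.

Keeping the `large` model, let's see how it fares going English $\rightarrow$ Japanese.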

In [17]:
out = model.transcribe(audio_en,
                       language='ja')

print(out['text'])
print(out['language'])

6日前、一人が北ウィスコンシンの道路に自殺された。見たことがなかったが、彼は車の隣に座っていたのに、爆弾を撃ち落とした。今、発表された報告によると、一人はすぐ殺された。体は、数十分の小さな部分に割り、体の部分は、爆弾の目の50歳ほど遠くに見えた。今日の4月1990年、誰も知らない人がいなかった。FBIは、アルコール、タバコ、火傷の警察と同じ場所で、車を見ると、7歳のブルー・ダッジのイラノイの許可券が見つかっていた。でも、彼らはすぐに知らなかった。彼らは、12月12日に、ジョリアットの停車場から、火傷の警察が見つかっていた。
ja


Now the reverse direction, Japanese $\rightarrow$ English:

In [18]:
out = model.transcribe(audio_ja,
                       language='en')

print(out['text'])
print(out['language'])

 I was 37 years old and sat on a Boeing 747 seat. The huge airplane was about to land at Hamburg Airport after passing through a thick cloud of rain. In December, the freezing rain Goddess created a I thought, oh dear, what am I going to do?
en
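
For the Japanese $\rightarrow$ English direction, Whisper also offers a dedicated translation task, selected with `task="translate"`. The sketch below wraps that call; the wrapper name `translate_to_english` and its `source_language` parameter are hypothetical, and the notebook did not run this variant, so no output is shown.

```python
from typing import Optional


def translate_to_english(model, audio_path: str,
                         source_language: Optional[str] = None) -> dict:
    """Run the translation task on an already loaded whisper model.

    `source_language` is an optional hint for the language spoken in the
    audio (for example "ja"); when left as None the language is detected.
    """
    return model.transcribe(
        audio_path,
        task="translate",          # ask for English output
        language=source_language,  # spoken language, not the target language
    )
```

Usage would look like `out = translate_to_english(model, audio_ja)` followed by `print(out['text'])`.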


----

In [20]:
out = model.transcribe(audio_en)

out

{'text': ' Six days ago, a man blew himself up by the side of a road in northern Wisconsin. There were no witnesses, but it appears that he was sitting on the grass next to his parked car when the bomb he was building accidentally went off. According to the forensic reports that have just been published, the man was killed instantly. His body burst into dozens of small pieces, and fragments of his corpse were found as far as 50 feet away from the site of the explosion. As of today, July 4, 1990, no one seems to have any idea who the dead man was. The FBI, working along with the local police and agents from the Bureau of Alcohol, Tobacco and Firearms, began their investigation by looking into the car, a seven-year-old blue Dodge with Illinois license plates. But they quickly learned that it had been stolen, filched in broad daylight from a Joliet parking lot on June 12.',
 'segments': [{'id': 0,
   'seek': 0,
   'start': 0.0,
   'end': 6.0,
   'text': ' Six days ago, a man blew himself 
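
The result dictionary carries more than the full text: each entry in `segments` has `start` and `end` times in seconds along with its own `text`. A small sketch of turning those into timestamped lines follows; `format_segments` is a name introduced here, and the sample dictionary in the usage note is made-up data that only mirrors the keys visible in the output above.

```python
def format_segments(result: dict) -> list:
    """Return one '[start -> end] text' line per segment of a transcribe result."""
    lines = []
    for seg in result["segments"]:
        # Segment text usually begins with a space, so strip it before formatting.
        lines.append(f"[{seg['start']:.2f} -> {seg['end']:.2f}] {seg['text'].strip()}")
    return lines
```

For example, `format_segments({'segments': [{'start': 0.0, 'end': 6.0, 'text': ' hello world'}]})` returns `['[0.00 -> 6.00] hello world']`, and `print("\n".join(format_segments(out)))` would list every segment of the transcription above.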

----

In [None]:
!whisper --help