# Tutorial: Automatic Speech Recognition with ReazonSpeech

In this tutorial, we perform Japanese speech recognition using ReazonSpeech v2.0.

(Note: For faster inference, choose a GPU instance under 'Runtime > Change runtime type'.)
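To confirm that the runtime actually has a GPU attached, a quick check like the following can help. This is a minimal sketch that assumes an NVIDIA runtime where the `nvidia-smi` tool is on the `PATH`; the helper name `gpu_visible` is our own and is not part of ReazonSpeech.

```python
import shutil
import subprocess


def gpu_visible():
    """Return True if nvidia-smi is found and exits successfully.

    Assumption: an NVIDIA GPU runtime exposes the nvidia-smi command.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        result = subprocess.run([exe], capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


print("GPU visible:", gpu_visible())
```

If this prints `False`, recognition should still run on the CPU, only more slowly.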

## Set up ReazonSpeech

First, install the ReazonSpeech Python package:

In [None]:
!apt-get install -y libsndfile1 ffmpeg
!git clone https://github.com/reazon-research/reazonspeech
!pip install --no-warn-conflicts reazonspeech/pkg/nemo-asr

## Download an audio file

Next, download an example audio file:

In [None]:
!curl -O https://research.reazon.jp/_static/demo.mp3

from IPython.display import Audio, display
display(Audio("demo.mp3"))

## Perform speech recognition

Now that the setup is complete, we can perform Japanese speech recognition.

The following Python code shows how:

In [None]:
from reazonspeech.nemo.asr import transcribe, audio_from_path, load_model

# Download ReazonSpeech model from Hugging Face
model = load_model()

In [None]:
# Perform speech recognition
audio = audio_from_path("demo.mp3")
ret = transcribe(model, audio)

# Output
print("\n## Result")
print(ret.text)

If Japanese text appears on the last line of the output, the recognition succeeded.

## More on speech recognition results

The recognition result also includes timestamps (start and end, in seconds) for each segment:

In [None]:
for seg in ret.segments:
  print("%5.2f %5.2f %s" % (seg.start_seconds, seg.end_seconds, seg.text))
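One possible use of the segment timestamps is generating subtitles. The sketch below renders segments in the SRT subtitle format. It relies only on the three fields printed above (`start_seconds`, `end_seconds`, `text`); `DemoSegment` is a hypothetical stand-in so the example runs without a model, and `srt_timestamp` and `segments_to_srt` are our own helper names, not part of ReazonSpeech.

```python
from dataclasses import dataclass


@dataclass
class DemoSegment:
    """Hypothetical stand-in with the same three fields used above."""
    start_seconds: float
    end_seconds: float
    text: str


def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return "%02d:%02d:%02d,%03d" % (hours, minutes, secs, millis)


def segments_to_srt(segments):
    """Render segments as numbered SRT entries separated by blank lines."""
    entries = []
    for index, seg in enumerate(segments, start=1):
        entries.append("%d\n%s --> %s\n%s\n" % (
            index,
            srt_timestamp(seg.start_seconds),
            srt_timestamp(seg.end_seconds),
            seg.text,
        ))
    return "\n".join(entries)


# Made-up sample data, not real recognition output
demo = [
    DemoSegment(0.0, 2.5, "こんにちは"),
    DemoSegment(2.5, 65.25, "ありがとうございます"),
]
print(segments_to_srt(demo))
```

With a real result, passing `ret.segments` in place of `demo` should work as long as each segment exposes the same three fields.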

Subword timestamps are available too:

In [None]:
for word in ret.subwords[1:10]:
  print("%5.2f %s" % (word.seconds, word.token))
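Subword timestamps can also be used to split a transcript at pauses. The following is a minimal sketch that uses only the two fields printed above (`seconds` and `token`). `DemoSubword`, `split_on_pauses`, and the `max_gap` threshold are illustrative names of our own, and the sketch assumes that concatenating tokens yields readable text. Because each subword carries a single timestamp, the gap measured here is between consecutive timestamps, which is only a rough proxy for silence.

```python
from collections import namedtuple

# Hypothetical stand-in exposing the two fields printed above
DemoSubword = namedtuple("DemoSubword", ["seconds", "token"])


def split_on_pauses(subwords, max_gap=1.0):
    """Group tokens, starting a new group whenever two consecutive
    timestamps are more than max_gap seconds apart."""
    groups = []
    previous = None
    for sub in subwords:
        if previous is None or sub.seconds - previous > max_gap:
            groups.append([])
        groups[-1].append(sub.token)
        previous = sub.seconds
    return ["".join(tokens) for tokens in groups]


# Made-up sample data, not real recognition output
demo = [
    DemoSubword(0.10, "今日"),
    DemoSubword(0.40, "は"),
    DemoSubword(0.70, "晴れ"),
    DemoSubword(2.50, "明日"),
    DemoSubword(2.80, "は"),
    DemoSubword(3.10, "雨"),
]
print(split_on_pauses(demo))
```

For the sample data, the 1.8-second gap before "明日" starts a second group. With a real result, `ret.subwords` could be passed in place of `demo`, and `max_gap` would likely need tuning for the speaker's pace.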