# EasyMMS notebook example to transcribe long files

[EasyMMS](https://github.com/abdeladim-s/easymms) is a simple Python package that makes it easy to use [Meta's Massively Multilingual Speech (MMS) project](https://github.com/facebookresearch/fairseq/tree/main/examples/mms).


# Dependencies

In [None]:
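import os

# Command-line audio tools
!apt install -y ffmpeg
!apt install -y sox
# Nightly torchaudio build for CUDA 11.8
!pip install -U --pre torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118
# Install fairseq from source in editable mode
!git clone https://github.com/pytorch/fairseq
os.chdir('fairseq')
!pip install -e .
# Put the current directory (the fairseq checkout) on PYTHONPATH
os.environ["PYTHONPATH"] = "."
!pip install git+https://github.com/abdeladim-s/easymms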

Reading package lists... Done
Building dependency tree       
Reading state information... Done
ffmpeg is already the newest version (7:4.2.7-0ubuntu0.1).
0 upgraded, 0 newly installed, 0 to remove and 34 not upgraded.
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  libopencore-amrnb0 libopencore-amrwb0 libsox-fmt-alsa libsox-fmt-base
  libsox3
Suggested packages:
  libsox-fmt-all
The following NEW packages will be installed:
  libopencore-amrnb0 libopencore-amrwb0 libsox-fmt-alsa libsox-fmt-base
  libsox3 sox
0 upgraded, 6 newly installed, 0 to remove and 34 not upgraded.
Need to get 513 kB of archives.
After this operation, 1,564 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu focal/universe amd64 libopencore-amrnb0 amd64 0.1.5-1 [94.8 kB]
Get:2 http://archive.ubuntu.com/ubuntu focal/universe amd64 libopencore-amrwb0 amd64 0.1.5-1 [49.1 kB]
Get:3 htt
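
Before moving on, it can help to confirm that the command-line tools installed above are actually on the `PATH`. The following is a minimal sketch using only the standard library; the helper name `missing_tools` is made up for this illustration and is not part of EasyMMS.

```python
import shutil

def missing_tools(tools=("ffmpeg", "sox")):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

missing = missing_tools()
if missing:
    print("Missing command-line tools:", ", ".join(missing))
else:
    print("ffmpeg and sox are available")
```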

In [None]:
# @title Download Model { display-mode: "form" }

model = 'mms1b_fl102' #@param ["mms1b_fl102", "mms1b_l1107", "mms1b_all"] {allow-input: true}

if model == "mms1b_fl102": 
  !wget -P ./models 'https://dl.fbaipublicfiles.com/mms/asr/mms1b_fl102.pt'

elif model == "mms1b_l1107":
  !wget -P ./models 'https://dl.fbaipublicfiles.com/mms/asr/mms1b_l1107.pt'

elif model == "mms1b_all":
  !wget -P ./models 'https://dl.fbaipublicfiles.com/mms/asr/mms1b_all.pt'

--2023-05-31 16:58:58--  https://dl.fbaipublicfiles.com/mms/asr/mms1b_fl102.pt
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 52.84.162.119, 52.84.162.51, 52.84.162.20, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|52.84.162.119|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4851043301 (4.5G) [binary/octet-stream]
Saving to: ‘./models/mms1b_fl102.pt’


2023-05-31 16:59:22 (195 MB/s) - ‘./models/mms1b_fl102.pt’ saved [4851043301/4851043301]
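
Because the form field allows free text input, a mistyped model name would match none of the branches above and nothing would be downloaded. As a sketch of one way to guard against that, the three URLs can be built from the shared pattern and the name validated first. The helper names `checkpoint_url` and `checkpoint_path` are hypothetical, not part of EasyMMS.

```python
import os

# Checkpoint names offered by the form cell above.
KNOWN_MODELS = ("mms1b_fl102", "mms1b_l1107", "mms1b_all")
BASE_URL = "https://dl.fbaipublicfiles.com/mms/asr"

def checkpoint_url(name):
    """Build the download URL for a known checkpoint name."""
    if name not in KNOWN_MODELS:
        raise ValueError(f"Unknown model {name!r}; expected one of {KNOWN_MODELS}")
    return f"{BASE_URL}/{name}.pt"

def checkpoint_path(name, models_dir="./models"):
    """Local path that `wget -P ./models` writes the checkpoint to."""
    return f"{models_dir}/{name}.pt"

url = checkpoint_url("mms1b_fl102")
path = checkpoint_path("mms1b_fl102")
print(url)
print(path, "exists" if os.path.exists(path) else "has not been downloaded yet")
```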



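# Upload Media files

Upload the audio file to `/content` before running the next cells; the commands below expect it at `/content/fbvideo.wav`.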

# Split the media file into equal 30-second segments

In [None]:
!mkdir -p /content/parts
!ffmpeg -i /content/fbvideo.wav -f segment -segment_time 30 -c copy /content/parts/output%02d.wav

ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora -
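
To sanity-check the split, the number of files in `/content/parts` can be compared with the count implied by the source duration. The sketch below assumes the input is a PCM WAV file that Python's `wave` module can read; the function names are illustrative. Note that with `-c copy` ffmpeg does not re-encode, so segment boundaries are only approximately 30 seconds apart.

```python
import math
import wave

def wav_duration_seconds(path):
    """Duration of a PCM WAV file, computed from its header."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()

def expected_segments(duration_seconds, segment_seconds=30):
    """Number of chunks produced when cutting every `segment_seconds`."""
    return math.ceil(duration_seconds / segment_seconds)

def segment_command(src, out_dir, segment_seconds=30):
    """The ffmpeg call from the cell above, as an argument list for subprocess.run."""
    return [
        "ffmpeg", "-i", src,
        "-f", "segment", "-segment_time", str(segment_seconds),
        "-c", "copy", f"{out_dir}/output%02d.wav",
    ]

# A hypothetical 10-minute (600 s) recording would be cut into 20 segments.
print(expected_segments(600))
print(" ".join(segment_command("/content/fbvideo.wav", "/content/parts")))
```

Once the file has been uploaded, `expected_segments(wav_duration_seconds('/content/fbvideo.wav'))` gives the count to compare against.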

In [None]:
import os
f = os.listdir('/content/parts')
files = ['/content/parts/' + file for file in sorted(f) if not file.startswith('.')]
files

['/content/parts/output00.wav',
 '/content/parts/output01.wav',
 '/content/parts/output02.wav',
 '/content/parts/output03.wav',
 '/content/parts/output04.wav',
 '/content/parts/output05.wav',
 '/content/parts/output06.wav',
 '/content/parts/output07.wav',
 '/content/parts/output08.wav',
 '/content/parts/output09.wav',
 '/content/parts/output10.wav',
 '/content/parts/output11.wav',
 '/content/parts/output12.wav',
 '/content/parts/output13.wav',
 '/content/parts/output14.wav',
 '/content/parts/output15.wav',
 '/content/parts/output16.wav',
 '/content/parts/output17.wav',
 '/content/parts/output18.wav',
 '/content/parts/output19.wav']
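
Plain `sorted` orders names lexicographically, which matches the numeric order only while every name has the same number of digits. With the `%02d` pattern, a recording long enough to pass 99 segments would produce three-digit names that sort out of place. A natural-sort key is one way around this; the sketch below assumes all names follow the same `output<number>.wav` pattern, and `natural_key` is a hypothetical helper.

```python
import re

def natural_key(name):
    """Split a name into text and integer parts so 'output100' sorts after 'output11'."""
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", name)]

names = ["output100.wav", "output11.wav", "output02.wav", "output10.wav"]
print(sorted(names))                   # lexicographic order
print(sorted(names, key=natural_key))  # numeric order
```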

# ASR Model inference

In [None]:
from easymms.models.asr import ASRModel

# Load the checkpoint downloaded above
asr = ASRModel(model=f'./models/{model}.pt')

# Transcribe every segment; 'spa' is the language code for Spanish
transcriptions = asr.transcribe(files, lang='spa', align=False)
for file, transcription in zip(files, transcriptions):
    print(f">>> file {file}")
    print(transcription)

INFO:easymms.models.asr:Preparing file /content/parts/output00.wav
INFO:easymms.models.asr:Preparing file /content/parts/output01.wav
INFO:easymms.models.asr:Preparing file /content/parts/output02.wav
INFO:easymms.models.asr:Preparing file /content/parts/output03.wav
INFO:easymms.models.asr:Preparing file /content/parts/output04.wav
INFO:easymms.models.asr:Preparing file /content/parts/output05.wav
INFO:easymms.models.asr:Preparing file /content/parts/output06.wav
INFO:easymms.models.asr:Preparing file /content/parts/output07.wav
INFO:easymms.models.asr:Preparing file /content/parts/output08.wav
INFO:easymms.models.asr:Preparing file /content/parts/output09.wav
INFO:easymms.models.asr:Preparing file /content/parts/output10.wav
INFO:easymms.models.asr:Preparing file /content/parts/output11.wav
INFO:easymms.models.asr:Preparing file /content/parts/output12.wav
INFO:easymms.models.asr:Preparing file /content/parts/output13.wav
INFO:easymms.models.asr:Preparing file /content/parts/output14

>>> file /content/parts/output00.wav
utaondurasy umara castro para intervenir y poner orden ya de una vez por todas en las cásles veo 
>>> file /content/parts/output01.wav
o de la gestión se ha perdido seguridad en las cárceles de onduras por supuesto es objetivo número 1 que es en las cárceles que las cárceles cumplan dentro de las planificaciones el objetivo para los que fueron creadas en primer lugar ser lugares donde se cumplen las penas de personas que hayan cometido delitos y en segundo lugar que se vuelvan aí dentro
>>> file /content/parts/output02.wav
y quiero decirle que tenemos gracias a la intervención también internacional modelos múltiples que podríamos tomar de elecciones positivas que se marcan en cárceles de ee uiu mégico y también de centro américa nosotros estamos en la capacidad de generar nuestro  acordo a nuestro propio diagnóstico lo que la presidenta ha establecido quiero que quede claro aquí al mundo es respectos de leo 
>>> file /content/parts/output03.wav
esct
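
Since every segment is roughly 30 seconds long and the results come back in the same order as `files`, each transcription can be tagged with the approximate start time of its segment. This is a sketch under those assumptions; `format_timestamp` and `timestamped` are illustrative helpers, and the placeholder strings stand in for real model output.

```python
def format_timestamp(seconds):
    """Render a number of seconds as HH:MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

def timestamped(transcriptions, segment_seconds=30):
    """Pair each transcription with the approximate start time of its segment."""
    return [
        (format_timestamp(index * segment_seconds), text.strip())
        for index, text in enumerate(transcriptions)
    ]

# Placeholder strings stand in for the list returned by asr.transcribe().
example = ["first segment ", "second segment", "third segment "]
for start, text in timestamped(example):
    print(f"[{start}] {text}")
```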

In [None]:
res = ''.join(transcriptions)
res

'utaondurasy umara castro para intervenir y poner orden ya de una vez por todas en las cásles veo o de la gestión se ha perdido seguridad en las cárceles de onduras por supuesto es objetivo número 1 que es en las cárceles que las cárceles cumplan dentro de las planificaciones el objetivo para los que fueron creadas en primer lugar ser lugares donde se cumplen las penas de personas que hayan cometido delitos y en segundo lugar que se vuelvan aí dentroy quiero decirle que tenemos gracias a la intervención también internacional modelos múltiples que podríamos tomar de elecciones positivas que se marcan en cárceles de ee uiu mégico y también de centro américa nosotros estamos en la capacidad de generar nuestro  acordo a nuestro propio diagnóstico lo que la presidenta ha establecido quiero que quede claro aquí al mundo es respectos de leo esctiones que voy hacer es revisar si los fondos que han sido destinados al instituto nacional penitenciario que son alrededor de unos 75 00l millones de 
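
Joining with an empty string only works when every segment happens to end in a space. In the output above, the segment ending in "dentro" and the one starting with "y quiero" are fused into "dentroy". A minimal sketch of a more forgiving join follows; `join_segments` is a hypothetical helper. It normalizes whitespace only, so a word that was cut in half at a segment boundary stays split.

```python
def join_segments(segments):
    """Join segment texts with single spaces, collapsing stray whitespace."""
    return " ".join(" ".join(segment.split()) for segment in segments if segment.strip())

# Placeholder strings shaped like the outputs above (some end with a space, some do not).
example = ["las cásles veo ", "se vuelvan aí dentro", "y quiero decirle"]
print(join_segments(example))
```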