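# Google Colab OpenVINO Pretrained Model Sound Classification Example

This notebook uses OpenVINO and the pretrained aclnet model for sound classification; the model can distinguish the 50 sound classes of the public ESC-50 dataset.  
Compiled by Jack, OmniXRI (歐尼克斯實境互動工作室), 2022.05.24  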

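#1. Install the Intel OpenVINO Toolkit
Install OpenVINO with apt; the version installed here is 2021.4.752.  
The default install path is /opt/intel/openvino_2021.4.752, and the installer also creates a shorter shortcut, /opt/intel/openvino_2021, which the later steps use.  
To install a different version, list the available packages with the following command.  
!apt-cache search intel-openvino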

In [1]:
# Show the current working directory
!pwd
# Fetch the OpenVINO 2021 public key
!wget https://apt.repos.intel.com/openvino/2021/GPG-PUB-KEY-INTEL-OPENVINO-2021 
# Add the OpenVINO public key to the system keyring
!apt-key add GPG-PUB-KEY-INTEL-OPENVINO-2021 
# Create the apt source list file
!touch /etc/apt/sources.list.d/intel-openvino-2021.list
# Append the repository entry to the source list
!echo "deb https://apt.repos.intel.com/openvino/2021 all main" >> /etc/apt/sources.list.d/intel-openvino-2021.list
# Update the package index
!apt update
# Install OpenVINO on the virtual machine
!apt install intel-openvino-dev-ubuntu18-2021.4.752
# List the install directory to confirm
!ls /opt/intel

/content
--2022-05-24 15:13:43--  https://apt.repos.intel.com/openvino/2021/GPG-PUB-KEY-INTEL-OPENVINO-2021
Resolving apt.repos.intel.com (apt.repos.intel.com)... 23.66.220.116, 2600:1408:c400:397::4b23, 2600:1408:c400:392::4b23
Connecting to apt.repos.intel.com (apt.repos.intel.com)|23.66.220.116|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 939 [binary/octet-stream]
Saving to: ‘GPG-PUB-KEY-INTEL-OPENVINO-2021’


2022-05-24 15:13:44 (113 MB/s) - ‘GPG-PUB-KEY-INTEL-OPENVINO-2021’ saved [939/939]

OK
Get:1 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/ InRelease [3,626 B]
Get:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease [1,581 B]
Get:3 https://apt.repos.intel.com/openvino/2021 all InRelease [5,659 B]
Hit:4 http://ppa.launchpad.net/c2d4u.team/c2d4u4.0+/ubuntu bionic InRelease
Hit:5 http://archive.ubuntu.com/ubuntu bionic InRelease
Get:6 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]
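
Once the installation finishes, the shortcut directory can also be checked from Python. The following is a minimal sketch that assumes the default install location; `describe_install` is a hypothetical helper written for this notebook, not part of OpenVINO.

```python
import os

def describe_install(link_path="/opt/intel/openvino_2021"):
    """Return (exists, resolved_path) for the OpenVINO shortcut directory."""
    # os.path.realpath follows symbolic links, so the result names the
    # versioned directory that the shortcut points to (if it is a link).
    resolved = os.path.realpath(link_path)
    return os.path.isdir(resolved), resolved

exists, resolved = describe_install()
print("installed:", exists, "->", resolved)
```

If the install succeeded, the resolved path should name the versioned directory rather than the shortcut.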

#2. Download the Model

The available sound classification models are:  
aclnet  
aclnet-int8  

This example uses --name aclnet (change the model name as needed).

In [2]:
!source /opt/intel/openvino_2021/bin/setupvars.sh && \
python3 /opt/intel/openvino_2021/deployment_tools/tools/model_downloader/downloader.py --name aclnet

!ls public/

[setupvars.sh] OpenVINO environment initialized
################|| Downloading aclnet ||################

... 100%, 10709 KB, 17935 KB/s, 0 seconds passed

aclnet


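#3. Convert the Model

Intel pretrained models ship with IR files (xml, bin) and need no conversion.
Public pretrained models must be converted to IR files; the converter determines this automatically.
The --name parameter gives the name of the model to convert.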

In [3]:
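# The public pretrained model uses ONNX, so the ONNX module must be installed separately
!pip3 install ONNX

# Download and install test-generator to help check for runtime errors
!pip3 install test-generator==0.1.1

# Run the environment setup script, then convert the downloaded model to IR (xml & bin) files
!source /opt/intel/openvino_2021/bin/setupvars.sh && \
python3 /opt/intel/openvino_2021/deployment_tools/tools/model_downloader/converter.py \
--name aclnet

# Check that conversion produced IR files (xml, bin) in the FP16 and FP32 precisions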
!ls public/aclnet
!ls public/aclnet/FP32

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting ONNX
  Downloading onnx-1.11.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (12.8 MB)
     |████████████████████████████████| 12.8 MB 24.6 MB/s 
Installing collected packages: ONNX
Successfully installed ONNX-1.11.0
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting test-generator==0.1.1
  Downloading test_generator-0.1.1-py2.py3-none-any.whl (5.5 kB)
Installing collected packages: test-generator
Successfully installed test-generator-0.1.1
[setupvars.sh] OpenVINO environment initialized
Conversion command: /usr/bin/python3 -m mo --framework=onnx --data_type=FP16 --output_dir=/content/public/aclnet/FP16 --model_name=aclnet '--input_shape=[1,1,1,16000]' --input=input --output=output --input_model=/content/public/aclnet/aclnet_des_53.onnx

Model Optimizer arguments:
Common parameters:
	- Path to the Input M
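
The converter writes one subfolder per precision, each holding an xml/bin pair. As a quick sanity check, the sketch below walks a model directory and reports which precisions have a complete pair. `list_ir_files` is a hypothetical helper, and the `public/aclnet` path assumes the notebook's working directory is unchanged.

```python
from pathlib import Path

def list_ir_files(model_dir):
    """Map each precision subfolder (e.g. FP16, FP32) to its IR file pair."""
    found = {}
    for sub in sorted(Path(model_dir).iterdir()):
        if not sub.is_dir():
            continue
        for xml in sorted(sub.glob("*.xml")):
            bin_file = xml.with_suffix(".bin")
            # An IR model is usable only when both files are present.
            if bin_file.exists():
                found[sub.name] = (xml.name, bin_file.name)
    return found

model_dir = Path("public/aclnet")
if model_dir.is_dir():
    print(list_ir_files(model_dir))
```

A precision folder that holds only one of the two files is left out of the result, which makes an interrupted conversion easy to spot.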

To see which models the demo program supports, you can print its models.lst file. (This step is optional.)

In [4]:
# List the supported model names
!cat /opt/intel/openvino_2021/inference_engine/demos/sound_classification_demo/python/models.lst

# This file can be used with the --list option of the model downloader.
aclnet
aclnet-int8


#4. Prepare the Test Audio File  

Test samples are available on GitHub (frog 1-17970-A-4_Flog.wav, rooster 1-27724-A-1_Rooster.wav, dog 1-30344-A-0_Dog.wav).  
Suppose you want the frog sample in the repository's Dataset folder: https://github.com/OmniXRI/NTUST_EdgeAI_2022/blob/main/Ch7_Implementations/Dataset/1-17970-A-4_Flog.wav  
The URL must be modified before the file can be downloaded: change github.com to raw.githubusercontent.com and replace /blob/main with master, keeping the rest of the path.  
 https://raw.githubusercontent.com/OmniXRI/NTUST_EdgeAI_2022/master/Ch7_Implementations/Dataset/1-17970-A-4_Flog.wav  
The other two samples can be downloaded the same way.  
Source: the public ESC-50 dataset https://github.com/karolpiczak/ESC-50  
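
The URL rewrite described above can be automated. The sketch below is one possible way to do it, assuming the usual `/<owner>/<repo>/blob/<branch>/<path>` layout of GitHub page URLs; `github_blob_to_raw` is a hypothetical helper, and its optional `branch` argument reproduces the `master` substitution used in this notebook.

```python
from urllib.parse import urlsplit, urlunsplit

def github_blob_to_raw(url, branch=None):
    """Convert a github.com 'blob' page URL into a raw.githubusercontent.com URL."""
    parts = urlsplit(url)
    # Path layout: /<owner>/<repo>/blob/<branch>/<path inside the repo...>
    segments = parts.path.strip("/").split("/")
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError("not a github.com blob URL: " + url)
    owner, repo, _, url_branch, *file_path = segments
    # Keep the branch from the URL unless the caller overrides it.
    raw_path = "/".join([owner, repo, branch or url_branch, *file_path])
    return urlunsplit(("https", "raw.githubusercontent.com", "/" + raw_path, "", ""))

page = ("https://github.com/OmniXRI/NTUST_EdgeAI_2022/blob/main/"
        "Ch7_Implementations/Dataset/1-17970-A-4_Flog.wav")
print(github_blob_to_raw(page, branch="master"))
```

The sketch does not handle branch names that contain a slash, since those cannot be separated from the file path by splitting alone.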

In [5]:
# Download the test audio file
!wget -N https://raw.githubusercontent.com/OmniXRI/NTUST_EdgeAI_2022/master/Ch7_Implementations/Dataset/1-17970-A-4_Flog.wav
!ls *.wav

--2022-05-24 15:15:31--  https://raw.githubusercontent.com/OmniXRI/NTUST_EdgeAI_2022/master/Ch7_Implementations/Dataset/1-17970-A-4_Flog.wav
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 441044 (431K) [audio/wav]
Saving to: ‘1-17970-A-4_Flog.wav’


Last-modified header missing -- time-stamps turned off.
2022-05-24 15:15:31 (78.9 MB/s) - ‘1-17970-A-4_Flog.wav’ saved [441044/441044]

1-17970-A-4_Flog.wav


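#5. Run Inference

Sound classification demo program: sound_classification_demo.py  

Input parameters:  
-i input audio file (*.wav)  
-m model path (*.xml)  
--labels label file (*.txt); this example uses aclnet_53cl.txt, which ships with OpenVINO  
-d inference device (only CPU is available in Colab)  
--sample_rate 16000 audio sample rate (default 16 kHz)  

The demo automatically splits the input audio into one-second clips, classifies each clip, and prints the results.  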
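
The one-second splitting can be pictured with a short sketch. It assumes the audio is already a flat sequence of samples at 16 kHz, which matches the 16000-sample input shape shown in the conversion command. `split_into_clips` is a hypothetical helper, and dropping a trailing partial clip is an assumption, not necessarily what sound_classification_demo.py does.

```python
def split_into_clips(samples, sample_rate=16000, clip_seconds=1.0):
    """Yield (start_sec, end_sec, clip) for consecutive fixed-length clips."""
    clip_len = int(sample_rate * clip_seconds)
    # Assumption: a trailing partial clip is dropped.
    for start in range(0, len(samples) - clip_len + 1, clip_len):
        end = start + clip_len
        yield start / sample_rate, end / sample_rate, samples[start:end]

# Five seconds of silence at 16 kHz gives five clips of 16000 samples each.
audio = [0.0] * (5 * 16000)
for start_sec, end_sec, clip in split_into_clips(audio):
    print(f"[{start_sec:.2f}-{end_sec:.2f}] {len(clip)} samples")
```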

In [6]:
# Run sound classification inference
!source /opt/intel/openvino_2021/bin/setupvars.sh && \
python3 \
/opt/intel/openvino_2021/deployment_tools/inference_engine/demos/sound_classification_demo/python/sound_classification_demo.py \
-i 1-17970-A-4_Flog.wav \
-m public/aclnet/FP32/aclnet.xml \
--labels /opt/intel/openvino_2021/deployment_tools/open_model_zoo/data/dataset_classes/aclnet_53cl.txt \
--sample_rate 16000

[setupvars.sh] OpenVINO environment initialized
[ INFO ] Creating Inference Engine
[ INFO ] Loading model public/aclnet/FP32/aclnet.xml
[ INFO ] Loading model to the plugin
[ INFO ] Preparing input
[ INFO ] Starting inference
[ INFO ] [0.00-1.00] - 100.00% Frog
[ INFO ] [1.00-2.00] - 99.97% Frog
[ INFO ] [2.00-3.00] - 100.00% Frog
[ INFO ] [3.00-4.00] - 74.30% Dog
[ INFO ] [4.00-5.00] - 100.00% Frog
[ INFO ] Average infer time - 42.6 ms per clip
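
The per-clip lines in this log can be summarized programmatically. The sketch below parses lines in the format shown above and takes a simple majority vote over the clips; the voting rule and both function names are illustrative assumptions, not part of the demo.

```python
import re
from collections import Counter

LINE_RE = re.compile(
    r"\[\s*INFO\s*\]\s*\[(\d+\.\d+)-(\d+\.\d+)\]\s*-\s*(\d+(?:\.\d+)?)%\s+(.+)"
)

def parse_predictions(log_text):
    """Extract (start, end, confidence_percent, label) from the demo's log lines."""
    results = []
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            start, end, conf, label = match.groups()
            results.append((float(start), float(end), float(conf), label.strip()))
    return results

def majority_label(predictions):
    """Return the label predicted for the largest number of clips."""
    counts = Counter(label for _, _, _, label in predictions)
    return counts.most_common(1)[0][0]

log = """[ INFO ] [0.00-1.00] - 100.00% Frog
[ INFO ] [1.00-2.00] - 99.97% Frog
[ INFO ] [2.00-3.00] - 100.00% Frog
[ INFO ] [3.00-4.00] - 74.30% Dog
[ INFO ] [4.00-5.00] - 100.00% Frog
[ INFO ] Average infer time - 42.6 ms per clip"""
print(majority_label(parse_predictions(log)))  # Frog
```

Four of the five clips are classified as Frog, so the single Dog result for the 3.00-4.00 clip does not change the overall label.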


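The label file is listed below. It contains 53 classes; the last three are additions to the 50 classes of the public ESC-50 dataset. (This step can be skipped.)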

In [7]:
!cat /opt/intel/openvino_2021/deployment_tools/open_model_zoo/data/dataset_classes/aclnet_53cl.txt

Dog
Rooster
Pig
Cow
Frog
Cat
Hen
Insects (flying)
Sheep
Crow
Rain
Sea waves
Crackling fire
Crickets
Chirping birds
Water drops
Wind
Pouring water
Toilet flush
Thunderstorm
Crying baby
Sneezing
Clapping
Breathing
Coughing
Footsteps
Laughing
Brushing teeth
Snoring
Drinking sipping
Door knock
Mouse click
Keyboard typing
Door wood creaks
Can opening
Washing machine
Vacuum cleaner
Clock alarm
Clock tick
Glass breaking
Helicopter
Chainsaw
Siren
Car horn
Engine
Train
Church bells
Airplane
Fireworks
Hand saw
Gunshot
Crowd
Speech
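
The position of each label in this file is its class index, counting from zero. ESC-50 names its clips `{FOLD}-{CLIP_ID}-{TAKE}-{TARGET}.wav`, so the expected class of a test file can be read from its name. The sketch below assumes that naming scheme plus the extra suffix (such as `_Flog`) used by the samples in this notebook; `load_labels` and `esc50_target` are hypothetical helpers.

```python
import re

def load_labels(path):
    """Read one class name per line; the line position is the class index."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]

def esc50_target(filename):
    """Return the class index encoded in an ESC-50 style file name."""
    # The target number is the fourth dash-separated field; anything after
    # it (a suffix or the extension) is ignored.
    match = re.match(r"\d+-\d+-[A-Za-z]+-(\d+)", filename)
    if match is None:
        raise ValueError("not an ESC-50 style name: " + filename)
    return int(match.group(1))

print(esc50_target("1-17970-A-4_Flog.wav"))  # 4
```

With the labels loaded from aclnet_53cl.txt, index 4 is Frog, index 1 is Rooster, and index 0 is Dog, which matches the three sample files.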
