Merge pull request google-ai-edge#55 from googlesamples/py_audio_classification

Adding python API example for audio classification
arttupii · Feb 14, 2023 · 3490d99
2 parents e3f4dd1 + be9e820
commit 3490d99
Showing 1 changed file with 228 additions and 0 deletions.
diff --git a/examples/audio_classifier/python/audio_classification.ipynb b/examples/audio_classifier/python/audio_classification.ipynb
@@ -0,0 +1,228 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "h2q27gKz1H20"
+      },
+      "source": [
+        "##### Copyright 2023 The MediaPipe Authors. All Rights Reserved."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "TUfAcER1oUS6",
+        "cellView": "form"
+      },
+      "outputs": [],
+      "source": [
+        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
+        "# you may not use this file except in compliance with the License.\n",
+        "# You may obtain a copy of the License at\n",
+        "#\n",
+        "# https://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing, software\n",
+        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
+        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
+        "# See the License for the specific language governing permissions and\n",
+        "# limitations under the License."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "L_cQX8dWu4Dv"
+      },
+      "source": [
+        "# Audio Classification with MediaPipe Tasks\n",
+        "\n",
+        "In this notebook you will use the MediaPipe Tasks API to classify audio."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "O6PN9FvIx614"
+      },
+      "source": [
+        "## Preparation\n",
+        "First, install the dependencies this sample needs.\n",
+        "\n",
+        "Note:\n",
+        "\n",
+        "\n",
+        "\n",
+        "*   *If you see an error about `flatbuffers` incompatibility, it's fine to ignore it. MediaPipe requires a newer version of flatbuffers (v2), which is incompatible with the older version of TensorFlow (v2.9) currently preinstalled on Colab.*\n",
+        "*   *If you install MediaPipe outside of Colab, you only need to run `pip install mediapipe`. It isn't necessary to explicitly install `flatbuffers`.*\n",
+        "\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "gxbHBsF-8Y_l"
+      },
+      "outputs": [],
+      "source": [
+        "!pip install -q flatbuffers==2.0.0\n",
+        "!pip install -q mediapipe==0.9.1"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "a49D7h4TVmru"
+      },
+      "source": [
+        "\n",
+        "Next, download an off-the-shelf model for audio classification. This example uses the YAMNet model, which is designed to classify audio in 0.975-second segments, but MediaPipe Tasks also works with other models, including your own custom ones."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 2,
+      "metadata": {
+        "id": "OMjuVQiDYJKF"
+      },
+      "outputs": [],
+      "source": [
+        "!wget -O classifier.tflite -q https://storage.googleapis.com/mediapipe-assets/yamnet_audio_classifier_with_metadata.tflite"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "Iy4r2_ePylIa"
+      },
+      "source": [
+        "## Performing Audio Classification\n",
+        "Now that you have the necessary dependencies, it's time to start classifying some audio! While there are a variety of ways to retrieve audio clips, this example will download a `.wav` file of someone speaking."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "import urllib.request\n",
+        "\n",
+        "audio_file_name = 'speech_16000_hz_mono.wav'\n",
+        "url = f'https://storage.googleapis.com/mediapipe-assets/{audio_file_name}'\n",
+        "urllib.request.urlretrieve(url, audio_file_name)"
+      ],
+      "metadata": {
+        "id": "o1WYweJRa8RQ"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "To confirm that the file downloaded correctly, display a playback widget."
+      ],
+      "metadata": {
+        "id": "o895gEJ7btdO"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "from IPython.display import Audio, display\n",
+        "\n",
+        "file_name = 'speech_16000_hz_mono.wav'\n",
+        "display(Audio(file_name, autoplay=False))"
+      ],
+      "metadata": {
+        "id": "UmvWwoIhatOK"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Once everything looks good, you can start performing inference. You will start by creating the options that are necessary for associating your model with the Audio Classifier, as well as some other customizations.\n",
+        "\n",
+        "Next, you will create your classifier, read the sample rate and audio data from the downloaded file, and have the clip segmented into smaller clips (0.975 seconds each, in this case) before classifying them.\n",
+        "\n",
+        "Finally, you will step through the results in increments of 975 milliseconds (the length of each clip) to display the top classification for each."
+      ],
+      "metadata": {
+        "id": "XM4RazrUdTs6"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "import numpy as np\n",
+        "\n",
+        "from mediapipe.tasks import python\n",
+        "from mediapipe.tasks.python.components import processors\n",
+        "from mediapipe.tasks.python.components import containers\n",
+        "from mediapipe.tasks.python import audio\n",
+        "from scipy.io import wavfile\n",
+        "\n",
+        "# Customize and associate model for Classifier\n",
+        "base_options = python.BaseOptions(model_asset_path='classifier.tflite')\n",
+        "options = audio.AudioClassifierOptions(\n",
+        "    base_options=base_options, max_results=4)\n",
+        "\n",
+        "# Create classifier, segment audio clips, and classify\n",
+        "with audio.AudioClassifier.create_from_options(options) as classifier:\n",
+        "  sample_rate, wav_data = wavfile.read(audio_file_name)\n",
+        "  audio_clip = containers.AudioData.create_from_array(\n",
+        "      wav_data.astype(float) / np.iinfo(np.int16).max, sample_rate)\n",
+        "  classification_result_list = classifier.classify(audio_clip)\n",
+        "\n",
+        "  assert len(classification_result_list) == 5\n",
+        "\n",
+        "  # Iterate through the first four clips to display classifications\n",
+        "  for idx, timestamp in enumerate([0, 975, 1950, 2925]):\n",
+        "    classification_result = classification_result_list[idx]\n",
+        "    top_category = classification_result.classifications[0].categories[0]\n",
+        "    print(f'Timestamp {timestamp}: {top_category.category_name} ({top_category.score:.2f})')"
+      ],
+      "metadata": {
+        "id": "WPO6rvNJTkPd"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [],
+      "metadata": {
+        "id": "2y50E0ZNbe7j"
+      },
+      "execution_count": null,
+      "outputs": []
+    }
+  ],
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "display_name": "Python 3 (ipykernel)",
+      "language": "python",
+      "name": "python3"
+    },
+    "language_info": {
+      "codemirror_mode": {
+        "name": "ipython",
+        "version": 3
+      },
+      "file_extension": ".py",
+      "mimetype": "text/x-python",
+      "name": "python",
+      "nbconvert_exporter": "python",
+      "pygments_lexer": "ipython3",
+      "version": "3.7.13"
+    }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 0
+}
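
The notebook hardcodes the clip timestamps `[0, 975, 1950, 2925]`. As a rough illustration of where those numbers come from, the sketch below derives window start times from a sample count and sample rate. The function name `segment_timestamps_ms` and the 70,000-sample clip length are hypothetical, and counting a trailing partial window as its own segment is an assumption of this sketch, not confirmed MediaPipe behavior.

```python
def segment_timestamps_ms(num_samples, sample_rate, window_seconds=0.975):
    """Return the start time, in milliseconds, of each fixed-length window."""
    # 0.975 s at 16 kHz corresponds to 15,600 samples per window.
    window_samples = round(window_seconds * sample_rate)
    if window_samples <= 0:
        raise ValueError("window must cover at least one sample")
    # Ceiling division: a trailing partial window still counts as one segment
    # (an assumption of this sketch, not confirmed MediaPipe behavior).
    num_windows = -(-num_samples // window_samples)
    return [i * window_samples * 1000 // sample_rate for i in range(num_windows)]


# A hypothetical 70,000-sample clip at 16 kHz (about 4.4 seconds).
print(segment_timestamps_ms(70_000, 16_000))
```

Under these assumptions the call prints `[0, 975, 1950, 2925, 3900]`: five windows, which would be consistent with the notebook asserting five results while printing only the first four.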