From e8382572409d5cd4729735807a9622541ec22cb2 Mon Sep 17 00:00:00 2001
From: Alexander Piskun
Date: Fri, 15 Sep 2023 13:08:00 +0300
Subject: [PATCH] Talk Bot App with Python Transformers [docs]

Signed-off-by: Alexander Piskun
---
 .github/workflows/analysis-coverage.yml |  12 +--
 docs/NextcloudTalkBotTransformers.rst   | 129 ++++++++++++++++++++++++
 docs/index.rst                          |   1 +
 docs/reference/ExApp.rst                |   5 +
 4 files changed, 141 insertions(+), 6 deletions(-)
 create mode 100644 docs/NextcloudTalkBotTransformers.rst

diff --git a/.github/workflows/analysis-coverage.yml b/.github/workflows/analysis-coverage.yml
index 23d3b9f1..758bfce2 100644
--- a/.github/workflows/analysis-coverage.yml
+++ b/.github/workflows/analysis-coverage.yml
@@ -173,7 +173,7 @@ jobs:
           if-no-files-found: error

       - name: Upload report to Codecov
-        uses: codecov/codecov-action@v4
+        uses: codecov/codecov-action@v3
         with:
           token: ${{ secrets.CODECOV_TOKEN }}
           name: coverage_maria_${{ matrix.nextcloud }}_${{ matrix.python }}_${{ matrix.php-version }}
@@ -330,7 +330,7 @@ jobs:
           if-no-files-found: error

       - name: Upload report to Codecov
-        uses: codecov/codecov-action@v4
+        uses: codecov/codecov-action@v3
         with:
           token: ${{ secrets.CODECOV_TOKEN }}
           name: coverage_pgsql_${{ matrix.nextcloud }}_${{ matrix.python }}_${{ matrix.php-version }}
@@ -453,7 +453,7 @@ jobs:
           if-no-files-found: error

       - name: Upload report to Codecov
-        uses: codecov/codecov-action@v4
+        uses: codecov/codecov-action@v3
         with:
           token: ${{ secrets.CODECOV_TOKEN }}
           name: coverage_oci_${{ matrix.nextcloud }}_${{ matrix.python }}_${{ matrix.php-version }}
@@ -589,7 +589,7 @@ jobs:
           if-no-files-found: error

       - name: Upload report to Codecov
-        uses: codecov/codecov-action@v4
+        uses: codecov/codecov-action@v3
         with:
           token: ${{ secrets.CODECOV_TOKEN }}
           name: coverage_maria_latest
@@ -716,7 +716,7 @@ jobs:
           if-no-files-found: error

       - name: Upload report to Codecov
-        uses: codecov/codecov-action@v4
+        uses: codecov/codecov-action@v3
         with:
           token: ${{ secrets.CODECOV_TOKEN }}
           name: coverage_pgsql_latest
@@ -861,7 +861,7 @@ jobs:
           if-no-files-found: error

       - name: Upload report to Codecov
-        uses: codecov/codecov-action@v4
+        uses: codecov/codecov-action@v3
         with:
           token: ${{ secrets.CODECOV_TOKEN }}
           name: coverage_sqlite_${{ matrix.nextcloud }}
diff --git a/docs/NextcloudTalkBotTransformers.rst b/docs/NextcloudTalkBotTransformers.rst
new file mode 100644
index 00000000..7936097e
--- /dev/null
+++ b/docs/NextcloudTalkBotTransformers.rst
@@ -0,0 +1,129 @@
+Talk Bot App with Transformers
+==============================
+
+`Transformers provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio.`
+
+In this article, we'll demonstrate how straightforward it is to leverage the extensive capabilities
+of the `Transformers `_ library in your Nextcloud application.
+
+Specifically, we'll cover:
+
+* Setting the cache path for the Transformers library
+* Downloading AI models during the application initialization step
+* Receiving messages from Nextcloud Talk Chat and sending them to a language model
+* Sending the language model's reply back to the Nextcloud Talk Chat
+
+Packaging the Application
+"""""""""""""""""""""""""
+
+Firstly, let's touch upon the somewhat mundane topic of application packaging.
+
+For this example, we've chosen Debian as the base image because it simplifies the installation of required Python packages.
+
+.. code-block::
+
+    FROM python:3.11-bookworm
+
+
+While Alpine might be a better choice in some situations, that's not the focus of this example.
+
+.. note:: The selection of a suitable base image for an application is a complex topic that merits its own in-depth discussion.
+
+Requirements
+""""""""""""
+
+.. literalinclude:: ../examples/as_app/talk_bot_ai/requirements.txt
+
+We opt for the latest version of the Transformers library.
+Because the example was developed on a Mac, we ended up using Torchvision.
+
+`If you're working solely with Nvidia, you're free to use TensorFlow instead of PyTorch.`
+
+Next, we integrate the latest version of `nc_py_api` to minimize code redundancy and focus on the application's logic.
+
+Model Downloading
+"""""""""""""""""
+
+**When Should We Download the Language Model?**
+
+Although the example uses the smallest model available, weighing in at 300 megabytes, it's common knowledge that larger language models can be substantially bigger.
+Downloading such models should not begin when a processing request is already received.
+
+So we have two options:
+
+* Heartbeat
+* enabled_handler
+
+This can't be accomplished in the **app on/off handler** as Nextcloud expects an immediate response regarding the app's operability.
+
+Thus, we place the model downloading within the Heartbeat:
+
+.. code-block::
+
+    # Thread that performs model download.
+    def download_models():
+        pipeline("text2text-generation", model=MODEL_NAME)  # this will download model
+
+
+    def heartbeat_handler() -> str:
+        global MODEL_INIT_THREAD
+        print("heartbeat_handler: called")  # for debug
+        # if it is the first heartbeat, then start background thread to download a model
+        if MODEL_INIT_THREAD is None:
+            MODEL_INIT_THREAD = Thread(target=download_models)
+            MODEL_INIT_THREAD.start()
+            print("heartbeat_handler: started initialization thread")  # for debug
+        # if thread is finished then we will have "ok" in response, and AppAPI will consider that program is ready.
+        r = "init" if MODEL_INIT_THREAD.is_alive() else "ok"
+        print(f"heartbeat_handler: result={r}")  # for debug
+        return r
+
+
+    @APP.on_event("startup")
+    def initialization():
+        # Provide our custom **heartbeat_handler** to set_handlers
+        set_handlers(APP, enabled_handler, heartbeat_handler)
+
+
+.. note:: While this may not be the most ideal way to download models, it remains a viable method.
+    In the future, a more efficient wrapper for model downloading is planned to make the process even more convenient.
+
+Model Storage
+"""""""""""""
+
+By default, models will be downloaded to a directory that's removed when updating the app.
+To persistently store the models even after updates, add the following line to your code:
+
+.. code-block::
+
+    from nc_py_api.ex_app import persist_transformers_cache  # noqa # isort:skip
+
+This will set the ``TRANSFORMERS_CACHE`` environment variable to point to the application's persistent storage.
+This import **must be** at the top, before importing any code that imports the ``transformers`` library.
+
+That is all: ``transformers`` will automatically download all
+models you use to the **Application Persistent Storage**, and AppAPI will keep them between updates.
+
+Working with Language Models
+""""""""""""""""""""""""""""
+
+Finally, we arrive at the core aspect of the application, where we interact with the **Language Model**:
+
+.. code-block::
+
+    def ai_talk_bot_process_request(message: talk_bot.TalkBotMessage):
+        # Process only messages starting with **@ai**
+        r = re.search(r"@ai\s(.*)", message.object_content["message"], re.IGNORECASE)
+        if r is None:
+            return
+        model = pipeline("text2text-generation", model=MODEL_NAME)
+        # Pass all text after **@ai** to the Language model.
+        response_text = model(r.group(1), max_length=64, do_sample=True)[0]["generated_text"]
+        AI_BOT.send_message(response_text, message)
+
+
+Simply put, the AI logic is just two lines of code when using Transformers, which is incredibly efficient and cool.
+
+Messages from the AI model are then sent back to Talk Chat as you would expect from a typical chatbot.
+
+That's it for now! Stay tuned: this is merely the start of an exciting journey into the integration of AI and chat functionality in Nextcloud.
diff --git a/docs/index.rst b/docs/index.rst
index 01167cda..0f82a3e5 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -32,6 +32,7 @@ Have a great time with Python and Nextcloud!
     MoreAPIs
     NextcloudApp
     NextcloudTalkBot
+    NextcloudTalkBotTransformers
     NextcloudSysApp
     Options
     reference/index.rst
diff --git a/docs/reference/ExApp.rst b/docs/reference/ExApp.rst
index cfa622f1..d62e723f 100644
--- a/docs/reference/ExApp.rst
+++ b/docs/reference/ExApp.rst
@@ -12,6 +12,11 @@ Constants
 .. autoclass:: LogLvl
     :members:

+Special functions
+-----------------
+
+.. autofunction:: persistent_storage
+
 User Interface(UI)
 ------------------