# **Text Generation WebUI**

This is a Google Colab notebook for running Oobabooga's [Text Generation WebUI](https://github.com/oobabooga/text-generation-webui), a [Gradio](https://www.gradio.app/)-powered **User Interface** (UI) that can be used to run various **Large Language Models** (LLMs). For more information, [read the TextGen WebUI docs](https://github.com/oobabooga/text-generation-webui/tree/main/docs#readme).

----

**[Click here for a simple version of this notebook](https://colab.research.google.com/drive/1ztRHfwON9zCeaEiaKPWXIfCDmSYwfzu_).**

----

<font color="pink">**To run this notebook, scroll down and configure the settings you want, then click the play button on the left side of the cell. Then wait a few minutes for the links to appear at the bottom.**</font>

----

### 🖥️ Models
There is a massive variety of language models to choose from. Visit [TheBloke's HuggingFace page](https://huggingface.co/TheBloke), the [LocalLLaMA Wiki](https://www.reddit.com/r/LocalLLaMA/wiki/models), and the [LLM Model List](https://github.com/underlines/awesome-marketing-datascience/blob/master/llm-model-list.md) for models and other information.

* Performance rankings of models can be found [here](https://github.com/underlines/awesome-marketing-datascience/blob/master/llm-tools.md#benchmarking).

#### 🚫 Colab limitations

Larger models (13B+) and full-precision models (not 8bit/4bit) require heavier GPU resources, which means loading and response times will be slower, **and Colab will likely disconnect you sooner**. If you get disconnected for hitting your free GPU limit, just be patient and come back in a few hours. *Do not try to circumvent these limits or you risk getting banned from Colab.*

Most models at 30B and above will not run in Colab at all unless they are quantized to reduce their size first. Even then, don't expect a blazing-fast response time.
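
As a rough illustration of why quantization matters here, the sketch below estimates the size of the weights alone from the parameter count and the bits stored per weight. It deliberately ignores quantization overhead, activations, and the context cache, so real usage will be higher. The comparison point of roughly 15 GiB of VRAM on a free Colab GPU is an assumption, not a figure from this notebook.

```python
def weight_size_gib(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Assumed VRAM budget for a free Colab GPU (hypothetical figure for illustration).
ASSUMED_VRAM_GIB = 15

for params in (7, 13, 30):
    for bits in (16, 8, 4):
        size = weight_size_gib(params, bits)
        verdict = "may fit" if size < ASSUMED_VRAM_GIB else "too large"
        print(f"{params}B at {bits}-bit: ~{size:.1f} GiB ({verdict})")
```

Under these assumptions, a 13B model only drops comfortably below the budget once it is quantized to 8-bit or 4-bit, and a 30B model only comes close at 4-bit.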

**About 4K / 8K context models:**
  * Larger contexts require more computing power. **Colab may not have enough resources for running models in full 4K or 8K context**, so if you encounter a CUDA or VRAM error, lower the context size to 2048.
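
To give a feel for how context size translates into VRAM, here is a minimal sketch that estimates the size of the attention key/value cache. The default shapes (40 layers, hidden size 5120, 16-bit cache values) are assumptions that match a Llama-2-13B-style model. Other architectures and loaders will differ, and the cache comes on top of the weights themselves.

```python
def kv_cache_gib(n_ctx: int, n_layers: int = 40, hidden_size: int = 5120,
                 bytes_per_value: int = 2) -> float:
    """Approximate key/value cache size in GiB for a given context length.

    Each layer stores one key tensor and one value tensor,
    each of shape (n_ctx, hidden_size).
    """
    total_bytes = 2 * n_layers * n_ctx * hidden_size * bytes_per_value
    return total_bytes / 2**30


for n_ctx in (2048, 4096, 8192):
    print(f"context {n_ctx}: ~{kv_cache_gib(n_ctx):.2f} GiB of cache")
```

Because the cache grows linearly with the context length, going from 2048 to 8192 tokens quadruples it, which is why dropping back to 2048 is the first thing to try after an out-of-memory error.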

----
### 🤖 Characters
You can use the following websites to create and share characters compatible with this web UI:

* **[Character Card Creator](https://avakson.github.io/character-editor/)**: Most interfaces use character cards, which are images that have extra data inside them that define the character.

* **[Character Hub](https://chub.ai)**: A place to share and download character cards. The site is SFW by default but has an NSFW toggle.

* **[PygmalionAI Discord](https://discord.gg/pygmalionai)**
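
For the curious, the sketch below shows one way the "extra data inside an image" can be read back out. It assumes the common convention of storing the character as base64-encoded JSON in a PNG `tEXt` chunk with the keyword `chara`. That convention is an assumption here, not something this notebook specifies, and individual sites may use other layouts. The `fake_png` at the end is a made-up, image-less file that exists only to exercise the parser.

```python
import base64
import json
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(data: bytes) -> dict:
    """Return {keyword: text} for every tEXt chunk in a PNG byte string."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    chunks = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk is: 4-byte length, 4-byte type, body, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            keyword, _, text = body.partition(b"\x00")
            chunks[keyword.decode("latin-1")] = text.decode("latin-1")
        if ctype == b"IEND":
            break
        pos += 8 + length + 4
    return chunks


def read_character_card(data: bytes) -> dict:
    """Decode the card JSON, assuming a base64 payload in a 'chara' chunk."""
    payload = read_png_text_chunks(data).get("chara")
    if payload is None:
        raise ValueError("no 'chara' chunk found")
    return json.loads(base64.b64decode(payload))


def _chunk(ctype: bytes, body: bytes) -> bytes:
    """Build a PNG chunk (helper used only for the demo below)."""
    crc = zlib.crc32(ctype + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)


# Hypothetical card and a minimal stand-in file for demonstration.
card = {"name": "Example", "description": "A made-up character."}
encoded = base64.b64encode(json.dumps(card).encode("utf-8"))
fake_png = PNG_SIGNATURE + _chunk(b"tEXt", b"chara\x00" + encoded) + _chunk(b"IEND", b"")

print(read_character_card(fake_png)["name"])
```

With a real card you would pass the bytes of the downloaded file instead, for example `read_character_card(open("card.png", "rb").read())`, where `card.png` is a placeholder name.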

----

### 🏆 Credit
All credit for this notebook goes to the hard work of these people who are much, much smarter than I am.

* [Oobabooga's original colab](https://colab.research.google.com/github/oobabooga/AI-Notebooks/blob/main/Colab-TextGen-GPU.ipynb) - by [/u/oobabooga4](https://old.reddit.com/u/oobabooga4)
* [ImBlank's ultimate colab](https://colab.research.google.com/drive/18L3akiVE8Y6KKjd8TdPlvadTsQAqXh73) - by [/u/Imblank2](https://old.reddit.com/u/Imblank2)
* [FHSenpai's one-click colab](https://colab.research.google.com/drive/1glB99Snng4JmxKiFjTisM0lFPYS5XvvZ?usp=sharing) - by [/u/FHSenpai](https://old.reddit.com/u/FHSenpai)
* [ManuDash5's superHOT colab](https://colab.research.google.com/github/ManuDash5/Textgen_webui_NEW/blob/main/TEXT_GEN_WEBUI_8K.ipynb) - by [/u/ManuDashOficial](https://old.reddit.com/u/ManuDashOficial)

----

### About SillyTavern

[SillyTavern](https://github.com/SillyTavern/SillyTavern) is an optional feature-rich interface you can use to interact with Ooba, instead of Gradio. For more information, read the [SillyTavern docs](https://docs.sillytavern.app/).

Installing ST costs nothing, takes about 5 minutes (the same time you spend waiting for this Colab to finish loading), and uses less than half a GB of disk space (without extensions). **Even absolute potato PCs can run SillyTavern locally.**

Try it out for yourself:

* [Installation Guide for Windows](https://docs.sillytavern.app/installation/windows/)

* [Installation Guide for Linux/MacOS](https://docs.sillytavern.app/installation/linuxmacos/)

* [Installation Guide for Android](https://docs.sillytavern.app/installation/android-(termux)/)

* For iOS users, there is no native ST support (yet), but if you have a PC, you can host ST there, and then access it from your iPhone's web browser - [see this guide to remote hosting](https://docs.sillytavern.app/usage/remoteconnections/).

#### Installing SillyTavern Extras

[Refer to this guide.](https://docs.sillytavern.app/extras/installation/#installation-methods)

If you have any questions, feel free to ask the helpful people at [/r/SillyTavernAI](https://www.reddit.com/r/SillyTavernAI/)!

In [None]:
#@title 🎵 Run Silent Audio Player { display-mode: "form" }

#@markdown 👇 Press play on the audio player that appears below. This will keep the Colab tab alive and prevent Google from disconnecting you for inactivity.
%%html
<audio src="https://oobabooga.github.io/silence.m4a" controls>

In [None]:
#@title ##**🚀 Start TextGen WebUI**

import sys, os, base64, subprocess, json, shutil, requests, time, pathlib, multiprocessing
from IPython.display import clear_output, display, HTML
from IPython.utils import capture
from google.colab import files, drive
from PIL import Image

#PARAMS
#@markdown 👈 Configure the settings below, then press this button to start the installation process. The links and any further instructions will appear at the bottom after a few minutes.

#@markdown ----
#@markdown ####**🤗 Download Model**

#@markdown You have two options for downloading models:
#@markdown * **OPTION 1**: Enter any [HuggingFace model repo](https://huggingface.co/models) below in `<Organization>/<model>` format. You can also add `:<branch>` to the end of the name to download a specific branch. A model is provided by default. Click the small arrow on the right side of the input field to view more models.
model_repo_download = "TheBloke/Mythalion-13B-GPTQ" #@param ["TheBloke/MythoMax-L2-13B-GPTQ", "TheBloke/Mythalion-13B-GPTQ", "TheBloke/Huginn-13B-v4.5-GPTQ", "TheBloke/Llama-2-13B-GPTQ", "TheBloke/Llama-2-13B-chat-GPTQ", "TheBloke/CodeLlama-13B-GPTQ", "TheBloke/CodeLlama-13B-Python-GPTQ", "TheBloke/CodeLlama-13B-Instruct-GPTQ", "TheBloke/WizardCoder-Python-13B-V1.0-GPTQ", "TheBloke/WizardMath-13B-V1.0-GPTQ", "TheBloke/Spring-Dragon-GPTQ", "Blackroot/FrankensteinsMonster-13B-GPTQ", "TheBloke/MythoMax-L2-Kimiko-v2-13B-GPTQ", "TheBloke/MLewdBoros-L2-13B-GPTQ", "TheBloke/Asclepius-13B-GPTQ", "Blackroot/Hermes-Kimiko-13B-gptq", "TheBloke/UndiMix-v2-13B-GPTQ", "TheBloke/Luban-13B-GPTQ", "TheBloke/LoKuS-13B-GPTQ", "TheBloke/Speechless-Llama2-13B-GPTQ", "TheBloke/Speechless-Llama2-Hermes-Orca-Platypus-WizardLM-13B-GPTQ", "TheBloke/Chronos-Beluga-v2-13B-GPTQ", "TheBloke/ReMM-SLERP-L2-13B-GPTQ", "TheBloke/ReMM-v2-L2-13B-GPTQ", "TheBloke/Airochronos-L2-13B-GPTQ", "TheBloke/Nous-Hermes-Llama2-GPTQ", "TheBloke/llama-2-13B-Guanaco-QLoRA-GPTQ", "TheBloke/Guanaco-3B-Uncensored-v2-GPTQ", "TheBloke/MythoLogic-L2-13B-GPTQ", "TheBloke/MythoBoros-13B-GPTQ", "TheBloke/airoboros-l2-13b-gpt4-m2.0-GPTQ", "TheBloke/airoboros-l2-13b-gpt4-2.0-GPTQ", "TheBloke/Airoboros-L2-13B-2.1-GPTQ", "TheBloke/Spicyboros-13B-2.2-GPTQ", "TheBloke/Llama-2-13B-Ensemble-v5-GPTQ", "TheBloke/13B-BlueMethod-GPTQ", "TheBloke/Kimiko-v2-13B-GPTQ", "TheBloke/Carl-Llama-2-13B-GPTQ", "TheBloke/Stheno-L2-13B-GPTQ", "TheBloke/Stheno-Inverted-L2-13B-GPTQ", "TheBloke/Redmond-Puffin-13B-GPTQ", "TheBloke/OpenOrca-Platypus2-13B-GPTQ", "Austism/chronos-hermes-13b-v2-GPTQ", "TheBloke/Wizard-Vicuna-13B-Uncensored-GPTQ", "TheBloke/WizardLM-1.0-Uncensored-Llama2-13B-GPTQ", "TheBloke/vicuna-13B-v1.5-GPTQ", "TheBloke/orca_mini_v3_13B-GPTQ", "TheBloke/Scarlett-13B-GPTQ", "TheBloke/Samantha-1.11-13B-GPTQ", "TheBloke/Pygmalion-2-13B-GPTQ", "TehVenom/Metharme-13b-4bit-GPTQ", "TheBloke/Pygmalion-2-13B-SuperCOT-GPTQ", 
"TheBloke/Nous-Hermes-Llama-2-7B-GPTQ", "TheBloke/Llama-2-7B-GPTQ", "TheBloke/Llama-2-7b-Chat-GPTQ", "TheBloke/Zarablend-L2-7B-GPTQ", "TheBloke/llama-2-7B-Guanaco-QLoRA-GPTQ", "TheBloke/MythoLogic-Mini-7B-GPTQ", "TheBloke/Dolphin-Llama2-7B-GPTQ", "TheBloke/Airoboros-L2-7B-2.1-GPTQ", "TheBloke/Luna-AI-Llama2-Uncensored-GPTQ", "TheBloke/llama2_7b_chat_uncensored-GPTQ", "TheBloke/Llama-2-7b-Chat-GPTQ", "TheBloke/Kimiko-7B-GPTQ", "TheBloke/Wizard-Vicuna-7B-Uncensored-GPTQ", "TheBloke/WizardLM-Uncensored-Falcon-7B-GPTQ", "TheBloke/WizardLM-7B-V1.0-Uncensored-GPTQ", "TheBloke/Spicyboros-7B-2.2-GPTQ", "TheBloke/vicuna-7B-v1.5-GPTQ", "TheBloke/orca_mini_v2_7B-GPTQ", "TheBloke/LosslessMegaCoder-Llama2-7B-Mini-GPTQ", "TheBloke/koala-7B-GPTQ", "TheBloke/Scarlett-7B-GPTQ", "TheBloke/Samantha-7B-GPTQ", "TheBloke/Pygmalion-2-7B-GPTQ", "TehVenom/Metharme-7b-4bit-GPTQ-Safetensors"] {allow-input: true}
#@markdown * **OPTION 2**: For GGUF models, you can paste a direct link to any .gguf file below. (Oobabooga no longer supports GGML.) These use less GPU resources but are generally slower than GPTQ models. Click the small arrow on the right side of the input field to view some examples.
single_file_download = "" #@param ["https://huggingface.co/TheBloke/Huginn-13B-v4-GGUF/resolve/main/huginn-13b-v4.Q4_K_M.gguf", "https://huggingface.co/TheBloke/Airoboros-L2-13B-2.1-GGUF/resolve/main/airoboros-l2-13b-2.1.Q4_K_M.gguf"] {allow-input: true}
#@markdown If both options are selected, you'll be asked to choose which one to download.

#@markdown ----
#@markdown ####**⚙️ Launcher Settings**
launch_arguments = "" #@param [""] {allow-input: true}
#@markdown > <font color="gray">You can manually type any [command-line flags](https://github.com/oobabooga/text-generation-webui/blob/main/README.md#basic-settings) or [extensions](https://github.com/oobabooga/text-generation-webui/blob/main/docs/Extensions.md) here if you don't want to tick a bunch of boxes below. (Don't do both, or it will break.)
save_to_google_drive = "off" #@param ["off", "chatlogs and characters", "chatlogs, characters, and models"]
#@markdown > <font color="gray">If activated, saves your data to [Google Drive](https://drive.google.com) automatically, so that they will persist across sessions.</font> Remember that chat models are very large, and free Google Drive only provides 15GB.
verbose = False #@param {type:"boolean"}
#@markdown > <font color="gray">Print character prompts and responses to the console log.
multi_user = False #@param {type:"boolean"}
#@markdown > <font color="gray">Multi-user mode. Chat histories are not saved or automatically loaded.
api = True #@param {type:"boolean"}
#@markdown > <font color="gray">Get the Oobabooga public API. This is required to run Oobabooga on alternative UIs, such as SillyTavern.

#@markdown ----
#@markdown ###**⭐ Extensions**
superbooga = False #@param {type:"boolean"}
#@markdown > <font color="gray">Based on [superbig](https://github.com/kaiokendev/superbig) by kaiokendev. An extension that uses ChromaDB to create an arbitrarily large pseudocontext, taking text files, URLs, or pasted text as input.
google_translate = False #@param {type:"boolean"}
#@markdown > <font color="gray">Activates translation extension, allowing you to communicate with the bot in a different language using [Google Translate](https://translate.google.com).
long_replies = False #@param {type:"boolean"}
#@markdown > <font color="gray">Forces bot replies to be longer. It works by banning the `\n` character until a specified minimum number of tokens have been generated, forcing the bot to keep talking for as long as you want.
character_bias = False #@param {type:"boolean"}
#@markdown > <font color="gray">An extension that adds a user-defined, hidden string at the beginning of the bot's reply with the goal of biasing the rest of the response.
silero_tts = False #@param {type:"boolean"}
#@markdown > <font color="gray">Text-to-speech extension using [Silero](https://github.com/snakers4/silero-models). When used in chat mode, it replaces the responses with an audio widget. There are 118 voices available (`en_0` to `en_117`), which can be set in the "Extensions" tab of the interface. You can find samples here: [Silero samples](https://oobabooga.github.io/silero-samples/).
elevenlabs_tts = False #@param {type:"boolean"}
#@markdown > <font color="gray">Text-to-speech extension using the [ElevenLabs](https://beta.elevenlabs.io/) API. You need an API key to use it.
whisper_stt = False #@param {type:"boolean"}
#@markdown > <font color="gray">Speech-to-text extension using [Whisper](https://github.com/openai/whisper). Allows you to enter your inputs in chat mode using your microphone.
send_pictures = False #@param {type:"boolean"}
#@markdown > <font color="gray">Adds a menu for sending pictures to the bot, which are automatically captioned using [BLIP](https://github.com/salesforce/BLIP).
gallery = False #@param {type:"boolean"}
#@markdown > <font color="gray">Creates a gallery with the chat characters and their pictures.
sd_api_pictures = False #@param {type:"boolean"}
#@markdown > <font color="gray">Allows you to request pictures from the bot in chat mode, which will be generated using AUTOMATIC1111's SD API. See examples [here](https://github.com/oobabooga/text-generation-webui/pull/309). Note: You'll need an available instance of AUTOMATIC1111's webui running with an `--api` flag.
openai = False #@param {type:"boolean"}
#@markdown > <font color="gray">Creates an API that mimics the OpenAI API and can be used as a drop-in replacement.
ngrok = False #@param {type:"boolean"}
#@markdown > <font color="gray">Allows you to access the web UI remotely using the [ngrok](https://ngrok.com/) reverse tunnel service (free). It's an alternative to the built-in Gradio `--share` feature.

#@markdown ----
#@markdown ###**⚙️ Advanced Settings**
settings_file = " " #@param ["https://raw.githubusercontent.com/pcrii/Philo-Colab-Collection/main/settings-colab-template.json"] {allow-input:true}
#@markdown > <font color="gray">Load the default interface settings from a raw text file. Click the arrow to see an example.
perplexity_colors = False #@param {type:"boolean"}
#@markdown > <font color="gray">Colors each token in the output text by its associated probability, as derived from the model logits.
xformers = False #@param {type:"boolean"}
#@markdown > <font color="gray">Use [xformers](https://github.com/facebookresearch/xformers)' memory efficient attention. This should increase your tokens/s.
deepspeed = False #@param {type:"boolean"}
#@markdown > <font color="gray">An alternative way of reducing the GPU memory usage. Enable the use of [DeepSpeed ZeRO-3](https://deepspeed.readthedocs.io/en/latest/zero3.html) for inference via the Transformers integration.
auto_devices = False #@param {type:"boolean"}
#@markdown > <font color="gray">Automatically split the model across the available GPU and CPU.
cpu = False #@param {type:"boolean"}
#@markdown > <font color="gray">Use the CPU to generate text. Warning: Extremely slow.
no_cache = False #@param {type:"boolean"}
#@markdown > <font color="gray">Set `use_cache` to False while generating text. This reduces the VRAM usage a bit with a performance cost.
precision = "default" #@param ["default", "8bit", "4bit"]
#@markdown > <font color="gray">For older models that aren't quantized, you can toggle this setting to load them with 8-bit or 4-bit precision using [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), greatly reducing the GPU memory usage, at a small accuracy cost. You can leave this setting alone if the model you're using has GPTQ, 128, x-bit, or GGML in the name.

#@markdown These settings are for older quantized models that have no `quantize_config.json` file. Usually you'll want wbits 4 and groupsize 128.
wbits = "default" #@param ["default", "2", "3", "4", "8"]
groupsize = "default" #@param ["default", "32", "64", "128"]


trust_remote_code = True
share = True
run_web_ui = True
model = ""

# Stop program if both model input fields are empty
# You WILL download the model, and you will be happy
if not model_repo_download.strip() and not single_file_download.strip():
  print("\033[92m\n\n######################################################\n\nNo model selected! Please select a model above, then run this cell again.\n\n######################################################\n\n\033[0m")
  sys.exit()

#Install ooba
def install_ooba():
  global launch_arguments
  if os.path.exists(repo_dir):
    %cd {repo_dir}
    !git pull
  else:
    !git clone https://github.com/oobabooga/text-generation-webui.git
  if ("chatlogs and characters" in save_to_google_drive):
    if not os.path.exists(f"{base_drive_dir}/oobabooga-data"):
      os.mkdir(f"{base_drive_dir}/oobabooga-data")
    if not os.path.exists(f"{base_drive_dir}/oobabooga-data/logs"):
      os.mkdir(f"{base_drive_dir}/oobabooga-data/logs")
    if not os.path.exists(f"{base_drive_dir}/oobabooga-data/characters"):
      shutil.move("text-generation-webui/characters", f"{base_drive_dir}/oobabooga-data/characters")
    else:
      !rm -r "text-generation-webui/characters"

    !ln -s "$base_drive_dir/oobabooga-data/logs" "text-generation-webui/logs"
    !ln -s "$base_drive_dir/oobabooga-data/characters" "text-generation-webui/characters"
  else:
    !mkdir text-generation-webui/logs
  !ln -s text-generation-webui/logs .
  !ln -s text-generation-webui/characters .
  !ln -s text-generation-webui/models .
  %rm -r sample_data
  %cd {repo_dir}
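  if settings_file.strip():
    # Download the custom settings file; launch_arguments is still a plain string at this point
    !wget {settings_file} -O settings-template.yaml
    launch_arguments = f"{launch_arguments} --settings settings-template.yaml".strip()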
  !pip install -r requirements.txt | grep -v 'already satisfied'
  print(f"\033[1;32;1m\nIf you see a warning about packages, just ignore it. There is no need to restart the runtime.\n\033[0;37;0m")
  # Install extension req
  if (deepspeed) or ('deepspeed' in launch_arguments):
    !pip install -U mpi4py | grep -v 'already satisfied'
    !pip install -U deepspeed | grep -v 'already satisfied'
  if (xformers) or ('xformers' in launch_arguments):
    !pip install xformers | grep -v 'already satisfied'
  if (api) or ('api' in launch_arguments):
    !pip install -r extensions/api/requirements.txt | grep -v 'already satisfied'
  if (google_translate) or ('google_translate' in launch_arguments):
    !pip install -r extensions/google_translate/requirements.txt | grep -v 'already satisfied'
  if (superbooga) or ('superbooga' in launch_arguments):
    !pip install -r extensions/superbooga/requirements.txt | grep -v 'already satisfied'
  if (silero_tts) or ('silero_tts' in launch_arguments):
    !pip install -r extensions/silero_tts/requirements.txt | grep -v 'already satisfied'
  if (elevenlabs_tts) or ('elevenlabs_tts' in launch_arguments):
    !pip install -r extensions/elevenlabs_tts/requirements.txt | grep -v 'already satisfied'
  if (whisper_stt) or ('whisper_stt' in launch_arguments):
    !pip install -r extensions/whisper_stt/requirements.txt | grep -v 'already satisfied'
  if (openai) or ('openai' in launch_arguments):
    !pip install -r extensions/openai/requirements.txt | grep -v 'already satisfied'
  if (ngrok) or ('ngrok' in launch_arguments):
    !pip install -r extensions/ngrok/requirements.txt | grep -v 'already satisfied'


#Mount Google Drive
if not ("off" in save_to_google_drive):
  drive.mount('/content/drive')
  base_drive_dir = "/content/drive/MyDrive/"
  if ("chatlogs and characters" in save_to_google_drive):
    # Only logs and characters live on Drive; the repo and models stay on the Colab disk
    repo_dir = '/content/text-generation-webui'
    model_dir = '/content/text-generation-webui/models'
    %cd /content
    install_ooba()
  if ("chatlogs, characters, and models" in save_to_google_drive):
    # The whole repo, including downloaded models, lives on Drive
    repo_dir = '/content/drive/MyDrive/text-generation-webui'
    model_dir = '/content/drive/MyDrive/text-generation-webui/models'
    %cd /content/drive/MyDrive
    install_ooba()
else:
  %cd /content
  repo_dir = '/content/text-generation-webui'
  model_dir = '/content/text-generation-webui/models'
  install_ooba()

# Repo download
def repo_download():
  global model
  model = model_repo_download
  %cd {repo_dir}
  !python download-model.py {model}
  model = model.replace('/', '_')

if (model_repo_download) and not (single_file_download):
  clear_output(wait = False)
  repo_download()

# Single file download
def single_download():
  global model
  def get_filename_from_url(single_file_download):
    return os.path.basename(single_file_download)
  model = get_filename_from_url(single_file_download)
  %cd {model_dir}
  !apt-get install -y aria2
  !aria2c -x 16 -s 16 -o {model} {single_file_download}
  %cd {repo_dir}
  # GGUF: reinstall llama-cpp-python with cuBLAS for GPU acceleration support
  !pip uninstall -y llama-cpp-python
  !CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir

if (single_file_download) and not (model_repo_download):
  clear_output(wait = False)
  single_download()

# If both input fields are filled
if (model_repo_download and single_file_download):
  clear_output(wait = False)
  valid_input = False
  while not valid_input:
    choice = input("\n\n######################################################\n\nYou have selected two models at once. Choose one to download.\n\nIn the box below, type 1 for the repo download, or type 2 for the single file download. Then press Enter.\n\n").strip()
    # Compare as text so that non-numeric input is rejected instead of raising ValueError
    if choice == "1":
      valid_input = True
      repo_download()
    elif choice == "2":
      valid_input = True
      single_download()
    else:
      print("\nInvalid input.\n")

# Arguments
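# Keep any flags typed into the launch_arguments field (stored as a single entry)
launch_arguments = {launch_arguments.strip()} if launch_arguments.strip() else set()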
if ('GPTQ' in model or 'gptq' in model): launch_arguments.add('--loader exllama')
if ('.gguf' in model or 'GGUF' in model or 'gguf' in model): launch_arguments.add('--threads 16 --n-gpu-layers 20 --loader ctransformers')
if not ('default' in wbits): launch_arguments.add(f'--wbits {wbits}')
if not ('default' in groupsize): launch_arguments.add(f'--groupsize {groupsize}')
if ('8bit' in precision): launch_arguments.add('--load-in-8bit')
if ('4bit' in precision): launch_arguments.add('--load-in-4bit')
if trust_remote_code: launch_arguments.add('--trust-remote-code')
if multi_user: launch_arguments.add('--multi-user')
if verbose: launch_arguments.add('--verbose')
if no_cache: launch_arguments.add('--no-cache')
if xformers: launch_arguments.add('--xformers')
if deepspeed: launch_arguments.add('--deepspeed')
if api: launch_arguments.add('--api --public-api')
if share: launch_arguments.add('--share')
if auto_devices: launch_arguments.add('--auto-devices')
if cpu: launch_arguments.add('--cpu')

#Extension toggles
active_extensions = []
if long_replies: active_extensions.append('long_replies')
if send_pictures: active_extensions.append('send_pictures')
if character_bias: active_extensions.append('character_bias')
if google_translate: active_extensions.append('google_translate')
if superbooga: active_extensions.append('superbooga')
if silero_tts: active_extensions.append('silero_tts')
if elevenlabs_tts: active_extensions.append('elevenlabs_tts')
if whisper_stt: active_extensions.append('whisper_stt')
if gallery: active_extensions.append('gallery')
if openai: active_extensions.append('openai')
if sd_api_pictures: active_extensions.append('sd_api_pictures')
if ngrok: active_extensions.append('ngrok')
if perplexity_colors: active_extensions.append('perplexity_colors')

# If any extensions are selected:
# Append the --extensions flag and all selected extensions
if len(active_extensions) > 0:
  launch_arguments.add(f'--extensions {" ".join(active_extensions)}')

clear_output(wait = True)

# Run WebUI
print(f"\033[1;32;1m\n######################################################\n\nThe model should load in about a minute. To enter TextGen, click on the link below that ends with gradio.live.\n\nFor SillyTavern users, copy the \"non-streaming URL\" (ends with \"/api\") and paste it into the \"Blocking API URL\" in the API settings.\n\n######################################################\n\033[0;37;0m")
# Use the DeepSpeed launcher when the --deepspeed flag is among the arguments
launcher = "deepspeed --num_gpus=1" if any('--deepspeed' in arg for arg in launch_arguments) else "python"
cmd = f"{launcher} server.py --model {model} {' '.join(launch_arguments)}"
print(cmd)
!$cmd

/content
Cloning into 'text-generation-webui'...
remote: Enumerating objects: 12857, done.
remote: Counting objects: 100% (72/72), done.
remote: Compressing objects: 100% (57/57), done.
remote: Total 12857 (delta 55), reused 22 (delta 15), pack-reused 12785
Receiving objects: 100% (12857/12857), 23.95 MiB | 33.78 MiB/s, done.
Resolving deltas: 100% (8777/8777), done.
/content/text-generation-webui
Ignoring exllamav2: markers 'platform_system != "Darwin" and platform_machine != "x86_64"' don't match your environment
Collecting git+https://github.com/huggingface/transformers@211f93aab95d1c683494e61c3cf8ff10e1f5d6b7 (from -r requirements.txt (line 27))
  Cloning https://github.com/huggingface/transformers (to revision 211f93aab95d1c683494e61c3cf8ff10e1f5d6b7) to /tmp/pip-req-build-a8lfbm21
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/transformers /tmp/pip-req-build-a8lfbm21
  Running command git rev-parse -q --verify 'sha^211f93aab95d1c