Xwin-LM Demo (created by satyoshi.com)

**Choose "A100 GPU" runtime on Colab Pro/Pro+**

## Use Xwin-LM from Python

In [1]:
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0


In [2]:
!pip install "transformers>=4.32.0" "optimum>=1.12.0"
!pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/  # cu118 must match the CUDA version installed on Colab (11.8 above)

Looking in indexes: https://pypi.org/simple, https://huggingface.github.io/autogptq-index/whl/cu118/
Collecting auto-gptq
  Downloading https://huggingface.github.io/autogptq-index/whl/cu118/auto-gptq/auto_gptq-0.4.2%2Bcu118-cp310-cp310-linux_x86_64.whl (1.9 MB)
Collecting accelerate>=0.19.0 (from auto-gptq)
  Downloading accelerate-0.23.0-py3-none-any.whl (258 kB)
Collecting rouge (from auto-gptq)
  Downloading rouge-1.0.1-py3-none-any.whl (13 kB)
Collecting peft (from auto-gptq)
  Downloading peft-0.5.0-py3-none-any.whl (85 kB)
Installing collected packages: rouge, accelerate, peft, auto-gptq
Successfully installed accelera
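
The `cu118` segment of the wheel index URL has to match the CUDA release reported by `nvcc --version` above. A minimal sketch of deriving it, assuming the index keeps the `whl/cuXYZ/` layout seen in the install command (the helper names `cuda_tag` and `autogptq_index_url` are hypothetical, and other CUDA tags may not be published):

```python
import re

def cuda_tag(nvcc_output: str) -> str:
    """Turn 'release 11.8' in nvcc output into a wheel tag such as 'cu118'."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no CUDA release found in nvcc output")
    major, minor = match.groups()
    return f"cu{major}{minor}"

def autogptq_index_url(tag: str) -> str:
    # Assumption: same URL layout as the cu118 index used above.
    return f"https://huggingface.github.io/autogptq-index/whl/{tag}/"

sample = "Cuda compilation tools, release 11.8, V11.8.89"
print(autogptq_index_url(cuda_tag(sample)))
```

If the tag differs from `cu118`, check that the corresponding index actually exists before installing.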

In [3]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained(
    "TheBloke/Xwin-LM-70B-V0.1-GPTQ",
    use_fast=True
)

model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Xwin-LM-70B-V0.1-GPTQ",
    device_map="auto",
    trust_remote_code=False,
    revision="main"
)

Downloading (…)okenizer_config.json:   0%|          | 0.00/748 [00:00<?, ?B/s]

Downloading tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/438 [00:00<?, ?B/s]

Downloading (…)lve/main/config.json:   0%|          | 0.00/972 [00:00<?, ?B/s]

Downloading model.safetensors:   0%|          | 0.00/35.3G [00:00<?, ?B/s]

Downloading (…)neration_config.json:   0%|          | 0.00/183 [00:00<?, ?B/s]

In [4]:
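def run(prompt: str, temperature=0.5, top_p=0.95, top_k=40) -> str:
    """Wrap `prompt` in an instruction template, sample a reply, and return only the new text."""
    # Build the template on one line so the cell's indentation does not leak into the prompt.
    prompt = f"### Instruction:\n{prompt}\n\n### Response:\n"
    with torch.no_grad():
        token_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
        output_ids = model.generate(
            token_ids.to(model.device),
            temperature=temperature,
            do_sample=True,
            top_p=top_p,
            top_k=top_k,
            max_new_tokens=256,  # replies longer than this are cut off
        )
    # generate() returns prompt + continuation; decode only the continuation.
    return tokenizer.decode(output_ids.tolist()[0][token_ids.size(1) :], skip_special_tokens=True)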
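
For a decoder-only model such as this one, `model.generate` returns the prompt tokens followed by the newly generated ones, which is why `run` slices the output at `token_ids.size(1)` before decoding. A toy sketch of that slicing, using plain lists instead of tensors (the token values and the helper name `strip_prompt` are made up for illustration):

```python
def strip_prompt(output_ids: list[int], prompt_len: int) -> list[int]:
    """Keep only the tokens generated after the prompt."""
    return output_ids[prompt_len:]

prompt_ids = [1, 835, 2799]               # hypothetical prompt token IDs
output_ids = prompt_ids + [450, 1234, 2]  # the output echoes the prompt, then continues
new_tokens = strip_prompt(output_ids, len(prompt_ids))
print(new_tokens)
```

Without the slice, the decoded string would begin with the instruction template itself.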

In [5]:
print(run("What are the main arguments that Ludwig Wittgenstein made in his book \"Philosophical Investigations\"?"))

1. Language as a tool for action: Wittgenstein emphasized that language is not just a vehicle for describing the world, but also a tool for doing things in the world, such as making promises, giving orders, or asking questions.
2. Meaning as use: He argued that the meaning of a word is determined by how it is used in a particular context, rather than by a fixed definition or a correspondence to a specific object or concept.
4. Family resemblance: Wittgenstein suggested that the meaning of a word is not defined by a single essence or property, but by a "family resemblance" among various related uses. This idea challenges the traditional notion of defining words by necessary and sufficient conditions.
5. Private language: Wittgenstein discussed the concept of a private language, which is a language used by only one person, with no possibility of input or correction from others


## Use Xwin-LM from Open-Interpreter

In [6]:
# https://github.com/googlecolab/colabtools/issues/3409#issuecomment-1446281277
import locale
locale.getpreferredencoding = lambda: "UTF-8"

In [7]:
!sudo apt install -y build-essential cmake python3 python3-pip python-is-python3 \
    && CUDA_PATH=/usr/local/cuda FORCE_CMAKE=1 CMAKE_ARGS='-DLLAMA_CUBLAS=on' \
    pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir -vv

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
build-essential is already the newest version (12.9ubuntu3).
cmake is already the newest version (3.22.1-1ubuntu1.22.04.1).
python3 is already the newest version (3.10.6-1~22.04).
python3 set to manually installed.
The following additional packages will be installed:
  python3-setuptools python3-wheel
Suggested packages:
  python-setuptools-doc
The following NEW packages will be installed:
  python-is-python3 python3-pip python3-setuptools python3-wheel
0 upgraded, 4 newly installed, 0 to remove and 18 not upgraded.
Need to get 1,680 kB of archives.
After this operation, 8,978 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy/main amd64 python-is-python3 all 3.9.2-2 [2,788 B]
Get:2 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 python3-setuptools all 59.6.0-1.2ubuntu0.22.04.1 [339 kB]
Get:3 http://archive.ubuntu.com/ubuntu jammy-updates/univer

In [1]:
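import os
# `!export` runs in a throwaway subshell, so update the notebook's own environment instead
os.environ["PATH"] += os.pathsep + os.path.expanduser("~/.local/bin")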

In [9]:
!git clone https://github.com/BladeTransformerLLC/open-interpreter.git # a modified fork of v0.1.4 is required

Cloning into 'open-interpreter'...
remote: Enumerating objects: 23321, done.
remote: Counting objects: 100% (1279/1279), done.
remote: Compressing objects: 100% (598/598), done.
remote: Total 23321 (delta 722), reused 813 (delta 675), pack-reused 22042
Receiving objects: 100% (23321/23321), 86.75 MiB | 15.25 MiB/s, done.
Resolving deltas: 100% (4059/4059), done.


In [2]:
!pip install ./open-interpreter

Processing ./open-interpreter
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: open-interpreter
  Building wheel for open-interpreter (pyproject.toml) ... done
  Created wheel for open-interpreter: filename=open_interpreter-0.1.4-py3-none-any.whl size=35839 sha256=21c6a5e35a4901c2b912e84948ef7d70fac20bd316dff1a977878e9749a8f720
  Stored in directory: /root/.cache/pip/wheels/a2/2e/3b/675087a2ac2373335cf578c8c29f9a60097cfcaf325bc8bc4b
Successfully built open-interpreter
Installing collected packages: open-interpreter
  Attempting uninstall: open-interpreter
    Found existing installation: open-interpreter 0.1.4
    Uninstalling open-interpreter-0.1.4:
      Successfully uninstalled open-interpreter-0.1.4
Successfully installed open-interpreter-0.1.4


In [2]:
!pip install colab-xterm
%load_ext colabxterm



In [3]:
import matplotlib
%matplotlib inline

### Run Open-Interpreter in xterm

1.   Type: `interpreter --model TheBloke/Xwin-LM-70B-V0.1-GGUF -y`
2.   Choose: `See More` -> `xwin-lm-70b-v0.1.Q2_K.gguf` (Size: 27.3 GB)
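
The Q2_K file listed above is 27.3 GB, so it may be worth confirming that the runtime has enough free disk before starting the download. A minimal sketch using only the standard library; the helper name `has_room` and the 2 GB safety margin are assumptions, not part of Open-Interpreter:

```python
import shutil

def has_room(path: str, needed_gb: float, margin_gb: float = 2.0) -> bool:
    """Return True if `path` has at least needed_gb + margin_gb free (decimal GB)."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= needed_gb + margin_gb

# 27.3 GB is the size listed for xwin-lm-70b-v0.1.Q2_K.gguf above.
print(has_room("/", 27.3))
```

If this prints `False`, free up space or pick a smaller file before launching the interpreter.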

In [18]:
%xterm  # slow!

Launching Xterm...
