#**1: Install All the Required Packages**

In [15]:
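!pip install huggingface_hub
# 0.1.78 is pinned because it still loads GGML (.bin) files; later releases expect GGUF.
# For GPU offload, the build typically needs CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 set before pip.
!pip install llama-cpp-python==0.1.78
!pip install numpy==1.23.4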



In [1]:
model_name_or_path = "TheBloke/Llama-2-13B-chat-GGML"
model_basename = "llama-2-13b-chat.ggmlv3.q5_1.bin" # GGML v3 weights in .bin format, q5_1 (5-bit) quantization

#**2: Import All the Required Libraries**

In [2]:
from huggingface_hub import hf_hub_download

In [3]:
from llama_cpp import Llama

#**3: Download the Model**

In [4]:
model_path = hf_hub_download(repo_id = model_name_or_path, filename = model_basename)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


llama-2-13b-chat.ggmlv3.q5_1.bin:   0%|          | 0.00/9.76G [00:00<?, ?B/s]

In [5]:
model_path

'/root/.cache/huggingface/hub/models--TheBloke--Llama-2-13B-chat-GGML/snapshots/3140827b4dfcb6b562cd87ee3d7f07109b014dd0/llama-2-13b-chat.ggmlv3.q5_1.bin'
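The path returned by `hf_hub_download` points into the local Hugging Face cache, and its directory names encode the repository, the snapshot revision, and the file name. The following is a minimal sketch of reading those pieces back out of the path shown above. The helper name `describe_cached_file` is hypothetical, and the parsing assumes the `models--<owner>--<name>/snapshots/<revision>/<file>` layout visible in this output; a repository name that itself contains `--` would not be split correctly.

```python
from pathlib import PurePosixPath


def describe_cached_file(path: str) -> dict:
    """Split a Hugging Face cache path into repo id, revision and file name.

    Hypothetical helper: it relies on the directory layout seen in the
    output above, not on any huggingface_hub API.
    """
    parts = PurePosixPath(path).parts
    # The repository directory looks like "models--<owner>--<name>".
    repo_dir = next(p for p in parts if p.startswith("models--"))
    snapshot_index = parts.index("snapshots")
    return {
        "repo_id": "/".join(repo_dir.split("--")[1:]),
        "revision": parts[snapshot_index + 1],
        "filename": "/".join(parts[snapshot_index + 2:]),
    }


example_path = (
    "/root/.cache/huggingface/hub/models--TheBloke--Llama-2-13B-chat-GGML/"
    "snapshots/3140827b4dfcb6b562cd87ee3d7f07109b014dd0/"
    "llama-2-13b-chat.ggmlv3.q5_1.bin"
)
info = describe_cached_file(example_path)
print(info)
```

Because the download is cached, calling `hf_hub_download` again with the same arguments should normally return this path without fetching the 9.76 GB file a second time.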

#**4: Loading the Model**

In [6]:
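# Load the model. n_gpu_layers only has an effect if llama-cpp-python was built with GPU support.
lcpp_llm = None  # drop any previously loaded model so its memory can be released on re-run
lcpp_llm = Llama(
    model_path=model_path,
    n_threads=2,      # number of CPU threads to use
    n_batch=512,      # prompt tokens processed per batch; should be between 1 and n_ctx
    n_gpu_layers=32,  # layers to offload to the GPU; tune to the model and the available VRAM
)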

AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
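The line printed while loading lists the features the llama.cpp library was compiled with. As a rough sketch, it can be parsed into a dictionary to check individual flags. The helper name `parse_system_info` is hypothetical, and the interpretation is an assumption: `BLAS = 0` suggests the library was built without a BLAS or cuBLAS backend, in which case `n_gpu_layers` would most likely be ignored and inference would run on the CPU.

```python
def parse_system_info(line: str) -> dict:
    """Turn 'NAME = 0 | NAME = 1 | ...' into {'NAME': 0, ...}."""
    flags = {}
    for field in line.split("|"):
        if "=" not in field:
            continue  # skip the empty piece after the trailing '|'
        name, value = field.split("=", 1)
        flags[name.strip()] = int(value.strip())
    return flags


# The system-info line copied from the output above.
system_info = (
    "AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | "
    "FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | "
    "WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | "
)
flags = parse_system_info(system_info)

if flags.get("BLAS") == 0:
    print("BLAS = 0: this build probably has no GPU/BLAS backend.")
```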


In [7]:
# Check how many layers were requested for GPU offload
lcpp_llm.params.n_gpu_layers

32

#**5: Create a Prompt Template**

In [8]:
prompt = "Write a linear regression code"
prompt_template = f'''SYSTEM: You are a helpful, respectful and honest assistant. Always answer as helpfully.

USER: {prompt}

ASSISTANT:
'''

In [9]:
prompt_template

'SYSTEM: You are a helpful, respectful and honest assistant. Always answer as helpfully.\n\nUSER: Write a linear regression code\n\nASSISTANT:\n'
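To reuse the same template for other questions, the f-string can be wrapped in a small function. This is only a sketch: `build_prompt` and `SYSTEM_MESSAGE` are names introduced here for illustration, and the system message is copied verbatim from the cell above.

```python
# System message copied verbatim from the template above.
SYSTEM_MESSAGE = (
    "You are a helpful, respectful and honest assistant. "
    "Always answer as helpfully."
)


def build_prompt(user_message: str, system_message: str = SYSTEM_MESSAGE) -> str:
    """Fill the SYSTEM / USER / ASSISTANT template with a user message."""
    return f"SYSTEM: {system_message}\n\nUSER: {user_message}\n\nASSISTANT:\n"


single_turn_prompt = build_prompt("Write a linear regression code")
print(single_turn_prompt)
```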

#**6: Generating the Response**

In [10]:
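response = lcpp_llm(
    prompt=prompt_template,
    max_tokens=256,  # upper bound on the number of generated tokens
    temperature=0.5, top_p=0.95, top_k=150,  # sampling settings
    repeat_penalty=1.2,  # values above 1 discourage repetition
    echo=True,  # return the prompt together with the completion
)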
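The `temperature`, `top_k` and `top_p` arguments shape the distribution from which each next token is drawn. The sketch below shows the general idea on a toy set of logits using NumPy. It is a generic illustration, not llama.cpp's implementation: the function name `filter_distribution` is hypothetical, and the order and details of the filtering steps in llama.cpp differ.

```python
import numpy as np


def filter_distribution(logits, temperature=0.5, top_k=150, top_p=0.95):
    """Return token probabilities after temperature, top-k and top-p filtering."""
    # Temperature below 1 sharpens the distribution, above 1 flattens it.
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    # Top-k: keep only the k most probable tokens.
    order = np.argsort(probs)[::-1][:top_k]

    # Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    keep = order[:cutoff]

    # Zero out everything else and renormalize.
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()


toy_logits = [2.0, 1.0, 0.5, -1.0]
filtered = filter_distribution(toy_logits, temperature=0.5, top_k=3, top_p=0.9)
print(filtered)
```

With these toy values, the two most probable tokens already cover more than 90% of the probability mass, so the other two are excluded from sampling.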

In [11]:
print(response)

{'id': 'cmpl-2ebe08ae-d513-44e4-b9ba-3b26fc85cd9d', 'object': 'text_completion', 'created': 1711499326, 'model': '/root/.cache/huggingface/hub/models--TheBloke--Llama-2-13B-chat-GGML/snapshots/3140827b4dfcb6b562cd87ee3d7f07109b014dd0/llama-2-13b-chat.ggmlv3.q5_1.bin', 'choices': [{'text': "SYSTEM: You are a helpful, respectful and honest assistant. Always answer as helpfully.\n\nUSER: Write a linear regression code\n\nASSISTANT:\n\nI'd be happy to help you with that! However, I want to make sure we have the same understanding of what linear regression is before we dive into the code. Linear regression is a statistical method used to model the relationship between a dependent variable (Y) and one or more independent variables (X). It helps us understand how changes in X affect Y.\n\nAre you looking for a specific type of linear regression, such as simple linear regression or multiple linear regression? Additionally, do you have any data in mind that we can use to illustrate the code? Kn

In [12]:
print(response["choices"][0]["text"])

SYSTEM: You are a helpful, respectful and honest assistant. Always answer as helpfully.

USER: Write a linear regression code

ASSISTANT:

I'd be happy to help you with that! However, I want to make sure we have the same understanding of what linear regression is before we dive into the code. Linear regression is a statistical method used to model the relationship between a dependent variable (Y) and one or more independent variables (X). It helps us understand how changes in X affect Y.

Are you looking for a specific type of linear regression, such as simple linear regression or multiple linear regression? Additionally, do you have any data in mind that we can use to illustrate the code? Knowing this information will help me provide more tailored guidance.
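Because the call used `echo = True`, the returned text starts with the prompt itself. If only the model's answer is wanted, the prompt can be stripped from the front, as in the sketch below (passing `echo = False` to the call is the other option). The helper name `completion_only` is hypothetical, and the sample dictionary is a shortened stand-in that mimics the shape of the response printed above rather than real model output.

```python
def completion_only(response: dict, prompt: str) -> str:
    """Return the generated text without the echoed prompt."""
    text = response["choices"][0]["text"]
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()


# Shortened stand-in with the same structure as the printed response.
sample_prompt = "SYSTEM: Be helpful.\n\nUSER: Say hello\n\nASSISTANT:\n"
sample_response = {
    "object": "text_completion",
    "choices": [{"text": sample_prompt + "\nHello! How can I help?"}],
}
answer = completion_only(sample_response, sample_prompt)
print(answer)
```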


In [13]:
# This call sends only the new sentence: no prompt template and no earlier turns are included.
response = lcpp_llm(
    prompt="now write python code for Linear Regression",
    max_tokens=256,
    temperature=0.5, top_p=0.95, top_k=150,
    repeat_penalty=1.2,
    echo=True,
)

Llama.generate: prefix-match hit
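The follow-up call above passes a bare sentence, so the model receives neither the system message nor the previous exchange. The `prefix-match hit` log line appears to indicate that some already-evaluated prompt tokens were reused from the cache, which is not the same as conversational memory. One way to carry context forward is to rebuild the full prompt from the earlier turns on every call. The sketch below assumes the same SYSTEM / USER / ASSISTANT layout as the template in section 5; `build_chat_prompt` and the sample history are hypothetical.

```python
def build_chat_prompt(history, new_user_message, system_message):
    """Assemble a multi-turn prompt from (user, assistant) pairs plus a new message."""
    prompt = f"SYSTEM: {system_message}\n\n"
    for user_message, assistant_reply in history:
        prompt += f"USER: {user_message}\n\nASSISTANT:\n{assistant_reply.strip()}\n\n"
    # The final ASSISTANT: tag is left open for the model to complete.
    prompt += f"USER: {new_user_message}\n\nASSISTANT:\n"
    return prompt


# Hypothetical history; the assistant reply is abbreviated for illustration.
history = [
    ("Write a linear regression code", "Are you looking for simple or multiple linear regression?"),
]
chat_prompt = build_chat_prompt(
    history,
    "now write python code for Linear Regression",
    "You are a helpful, respectful and honest assistant. Always answer as helpfully.",
)
print(chat_prompt)
```

The resulting string can then be passed as `prompt` in place of the bare sentence. The total length still has to fit within the model's context window.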


In [14]:
print(response["choices"][0]["text"])

now write python code for Linear Regression using scikit-learn library

Here is an example of how you can use the `LinearRegression` class from scikit-learn to train a linear regression model and make predictions:
```
from sklearn.linear_model import LinearRegression
import numpy as np

# Generate some sample data
X = np.random.rand(100, 10)
y = X @ np.random.rand(100, 1) + np.random.randn(100, 1) * 0.1

# Train a linear regression model on the data
reg = LinearRegression().fit(X, y)

# Make predictions on some new data
new_X = np.random.rand(5, 10)
predictions = reg.predict(new_X)
```
This code generates some sample data `X` and `y`, where `X` is a matrix of features and `y` is a vector of targets. It then trains a linear regression model on the data using the `LinearRegression()` class from scikit-learn, and makes predictions on some new data
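The snippet above is model output and is shown unedited. As written, it would not run: `X` has shape `(100, 10)` and is multiplied by an array of shape `(100, 1)`, so the matrix product raises a `ValueError`. The sketch below shows one possible correction of the data-generation step, using a weight vector of shape `(10, 1)`. To stay self-contained it fits the model with NumPy's `np.linalg.lstsq` as a stand-in for scikit-learn's `LinearRegression`; the variable names `true_w`, `X_design` and `coef` are introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample data: 100 rows, 10 features, and one weight per feature.
X = rng.random((100, 10))
true_w = rng.random((10, 1))
y = X @ true_w + rng.standard_normal((100, 1)) * 0.1

# The shapes used in the generated snippet do not line up.
try:
    X @ rng.random((100, 1))
    original_shapes_work = True
except ValueError:
    original_shapes_work = False

# Ordinary least squares with an intercept column appended to X.
X_design = np.hstack([X, np.ones((X.shape[0], 1))])
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# Predictions for five new rows.
new_X = rng.random((5, 10))
predictions = np.hstack([new_X, np.ones((5, 1))]) @ coef
print(predictions.shape)
```

With scikit-learn installed, `LinearRegression().fit(X, y)` followed by `predict(new_X)` would play the same role once `y` is generated with correctly shaped weights.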
