## Step 1: Install all the required packages

In [None]:
# Build llama-cpp-python with GPU (cuBLAS) support
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose

# For downloading the models
!pip install huggingface_hub

## Step 2: Import all the required libraries

In [3]:
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

## Step 3: Download the Models

In [None]:
model_name_or_path = "TheBloke/Llama-2-13B-chat-GGML"
model_basename = "llama-2-13b-chat.ggmlv3.q5_1.bin"  # GGML model file in .bin format
model_path = hf_hub_download(repo_id=model_name_or_path, filename=model_basename)

## Step 4: Load the Model

In [5]:
# Overrides the path from Step 3 with a locally stored 7B chat model (q8_0)
model_path = r"D:/AI_CTS/Llama2/llama2_projects/llama2_quantized_models/7B_chat/llama-2-7b-chat.ggmlv3.q8_0.bin"

# Load the model with GPU offloading
lcpp_llm = None
lcpp_llm = Llama(
    model_path=model_path,
    n_threads=2,      # number of CPU threads to use
    n_batch=512,      # between 1 and n_ctx; size it to the VRAM available on your GPU
    n_gpu_layers=32,  # layers to offload to the GPU; adjust for your model and VRAM
)

AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
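
The line above is the system-info string printed when the model loads. `BLAS = 0` suggests this build was compiled without a BLAS backend such as cuBLAS, in which case `n_gpu_layers` would probably have no effect and inference would run on the CPU. A minimal sketch for reading those flags programmatically follows; `parse_system_info` is a hypothetical helper, not part of llama-cpp-python, and the string is copied from the output above.

```python
def parse_system_info(line: str) -> dict:
    """Parse 'NAME = 0 | NAME = 1 | ...' into a {name: value} dict."""
    flags = {}
    for part in line.split("|"):
        name, sep, value = part.partition("=")
        if not sep:  # skip the empty segment after the trailing '|'
            continue
        flags[name.strip()] = int(value.strip())
    return flags


# String copied from the load-time output above.
system_info = (
    "AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | "
    "FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | "
    "WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | "
)

flags = parse_system_info(system_info)
if flags.get("BLAS") == 0:
    print("BLAS = 0: this build probably cannot offload layers to the GPU.")
```

If the flag is 0, reinstalling with the `CMAKE_ARGS` shown in Step 1 and checking the verbose build log is a reasonable first thing to try.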


In [None]:
# Check how many layers were requested for GPU offload
lcpp_llm.params.n_gpu_layers

## Step 5: Create a Prompt Template

In [6]:
prompt = "Write a linear regression in python"
prompt_template=f'''SYSTEM: You are a helpful, respectful and honest assistant. Always answer as helpfully.

USER: {prompt}

ASSISTANT:
'''
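
To reuse the same layout for other questions, the template can be wrapped in a small function. This is a sketch only: `build_prompt`, `DEFAULT_SYSTEM`, and `rebuilt_template` are hypothetical names, and the SYSTEM/USER/ASSISTANT layout simply mirrors the f-string above.

```python
# System message copied verbatim from the template above.
DEFAULT_SYSTEM = (
    "You are a helpful, respectful and honest assistant. "
    "Always answer as helpfully."
)


def build_prompt(user_message: str, system_message: str = DEFAULT_SYSTEM) -> str:
    """Return a prompt in the SYSTEM / USER / ASSISTANT layout used above."""
    return (
        f"SYSTEM: {system_message}\n\n"
        f"USER: {user_message}\n\n"
        "ASSISTANT:\n"
    )


rebuilt_template = build_prompt("Write a linear regression in python")
print(rebuilt_template)
```

Called with the same question, this yields the same string as `prompt_template`, so it can be passed to the model unchanged.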

## Step 6: Generate the Response

In [7]:
response = lcpp_llm(
    prompt=prompt_template,
    max_tokens=256, temperature=0.5, top_p=0.95,
    repeat_penalty=1.2, top_k=150,
    echo=True,  # include the prompt in the returned text
)

In [8]:
print(response)

{'id': 'cmpl-38e99302-09d7-4340-8f26-d291b41daa42', 'object': 'text_completion', 'created': 1690950311, 'model': 'D:/AI_CTS/Llama2/llama2_projects/llama2_quantized_models/7B_chat/llama-2-7b-chat.ggmlv3.q8_0.bin', 'choices': [{'text': 'SYSTEM: You are a helpful, respectful and honest assistant. Always answer as helpfully.\n\nUSER: Write a linear regression in python\n\nASSISTANT:\nOf course! Here is an example of how to write a linear regression model in Python using scikit-learn library:\n```\nfrom sklearn.linear_model import LinearRegression\nimport numpy as np\n\n# Generate some sample data for demonstration purposes\nX = np.random.rand(100, 5)\ny = np.random.rand(100)\n\n# Create a linear regression model and fit the data\nregressor = LinearRegression()\nregressor.fit(X, y)\n```\nIn this example, we first import the necessary libraries: `sklearn` for scikit-learn library and `numpy` (np) for numerical computations. Then, we generate some sample data using `rand()` function from nump
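
The returned dictionary follows an OpenAI-style completion layout, and the printout above is cut off. A sketch for checking whether generation stopped because it reached `max_tokens` is shown here. It assumes each entry in `choices` carries a `finish_reason` field (`"length"` when the token limit was hit, `"stop"` otherwise); `hit_token_limit` and `truncated_sample` are hypothetical and only mimic that layout.

```python
def hit_token_limit(response: dict) -> bool:
    """Return True if the first choice ended because max_tokens was reached."""
    choice = response["choices"][0]
    return choice.get("finish_reason") == "length"


# Hypothetical stand-in for a real response; not actual model output metadata.
truncated_sample = {
    "object": "text_completion",
    "choices": [
        {"text": "Please note that this is just an example cod",
         "index": 0,
         "finish_reason": "length"}
    ],
}

if hit_token_limit(truncated_sample):
    print("Output was cut off; consider raising max_tokens.")
```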

In [9]:
print(response["choices"][0]["text"])

SYSTEM: You are a helpful, respectful and honest assistant. Always answer as helpfully.

USER: Write a linear regression in python

ASSISTANT:
Of course! Here is an example of how to write a linear regression model in Python using scikit-learn library:
```
from sklearn.linear_model import LinearRegression
import numpy as np

# Generate some sample data for demonstration purposes
X = np.random.rand(100, 5)
y = np.random.rand(100)

# Create a linear regression model and fit the data
regressor = LinearRegression()
regressor.fit(X, y)
```
In this example, we first import the necessary libraries: `sklearn` for scikit-learn library and `numpy` (np) for numerical computations. Then, we generate some sample data using `rand()` function from numpy. The data has 100 observations and 5 features.
Next, we create a linear regression model using `LinearRegression()` class from sklearn.linear_model module and fit the data to the model using `fit()` method.
Please note that this is just an example cod
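
Because the call uses `echo=True`, the printed text starts with the prompt itself. A minimal sketch for keeping only the model's answer follows, assuming the echoed prompt comes back verbatim as a prefix; `strip_echoed_prompt`, `echo_prompt`, and `echo_sample` are hypothetical stand-ins for the real prompt and response.

```python
def strip_echoed_prompt(response: dict, prompt: str) -> str:
    """Return the completion text with the echoed prompt prefix removed."""
    text = response["choices"][0]["text"]
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()


# Hypothetical stand-ins that mimic the response layout shown above.
echo_prompt = "USER: Say hi\n\nASSISTANT:\n"
echo_sample = {"choices": [{"text": echo_prompt + "Hi there!"}]}

answer = strip_echoed_prompt(echo_sample, echo_prompt)
print(answer)
```

With the real objects this would be `strip_echoed_prompt(response, prompt_template)`. Passing `echo=False` in the generation call avoids the need for it.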