# LLM Examples
This notebook showcases some examples of LLMs and how to use them.

Some of the boilerplate code is hidden in the "src" directory - go have a look!

In [3]:
!git clone https://github.com/SamHollings/llm_examples.git

Cloning into 'llm_examples'...
remote: Enumerating objects: 62, done.
remote: Counting objects: 100% (62/62), done.
remote: Compressing objects: 100% (52/52), done.
remote: Total 62 (delta 13), reused 49 (delta 6), pack-reused 0
Unpacking objects: 100% (62/62), 69.01 KiB | 2.88 MiB/s, done.


In [4]:
!pip install -r llm_examples/requirements.txt

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/, https://download.pytorch.org/whl/cu117
Collecting pytest-html (from -r llm_examples/requirements.txt (line 12))
  Downloading pytest_html-3.2.0-py3-none-any.whl (16 kB)
Collecting pathlib2 (from -r llm_examples/requirements.txt (line 18))
  Downloading pathlib2-2.3.7.post1-py2.py3-none-any.whl (18 kB)
Collecting jupyter (from -r llm_examples/requirements.txt (line 20))
  Downloading jupyter-1.0.0-py2.py3-none-any.whl (2.7 kB)
Collecting transformers (from -r llm_examples/requirements.txt (line 21))
  Downloading transformers-4.29.2-py3-none-any.whl (7.1 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m7.1/7.1 MB[0m [31m44.2 MB/s[0m eta [36m0:00:00[0m
Collecting accelerate (from -r llm_examples/requirements.txt (line 23))
  Downloading accelerate-0.19.0-py3-none-any.whl (219 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m219.1/219.1 kB[0m [31m2

In [5]:
from transformers import pipeline, AutoModel, AutoTokenizer
import torch
import datasets

## Test GPU enabled

In [6]:
torch.cuda.is_available()

True

If the above shows `False`, creating a CUDA tensor will raise a more informative error message (for example, that a non-CUDA build of PyTorch is installed).

In [7]:
a = torch.cuda.FloatTensor()  # legacy constructor; raises an error if CUDA cannot be used
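
As a rough sketch, the check above could be expanded into a small diagnostic that separates "PyTorch was not built with CUDA" from "PyTorch was built with CUDA but cannot see a GPU". The helper name `describe_torch_cuda` is an assumption made for illustration and is not part of the notebook or the repository; `torch.version.cuda`, `torch.cuda.is_available()` and `torch.cuda.get_device_name()` are standard PyTorch attributes.

```python
def describe_torch_cuda():
    """Return a one-line summary of the PyTorch/GPU setup (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"

    if torch.cuda.is_available():
        return f"GPU available: {torch.cuda.get_device_name(0)}"

    # torch.version.cuda is None when PyTorch was not built with CUDA support
    if torch.version.cuda is None:
        return f"PyTorch {torch.__version__} was not built with CUDA support"

    return (
        f"PyTorch {torch.__version__} was built for CUDA {torch.version.cuda}, "
        "but no usable GPU or driver was found"
    )


print(describe_torch_cuda())
```

In Colab, the last case usually points at the runtime type rather than the installed packages.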

## Dolly

This section follows the instructions at:

- https://github.com/databrickslabs/dolly
- https://huggingface.co/databricks/dolly-v2-12b#dolly-v2-12b-model-card

In [8]:
# get and save the model
# load the model
# use the model on a string
# use the model on a dataframe
# use the model on a spark dataframe

In [9]:
model_name = "databricks/dolly-v2-3b"

The function below loads the model from Hugging Face, unless a local directory `model/<model_name>` already exists, in which case it loads the model from that directory instead.

In [14]:
import os

def get_dolly_model(model_name='databricks/dolly-v2-3b'):
  # Prefer a local copy under model/<model_name> if one exists
  local_model_name = f"model/{model_name}"
  if os.path.isdir(local_model_name):
    model_name = local_model_name

  instruct_pipeline = pipeline(
    model=model_name,  # 3b, 7b or 12b variants
    torch_dtype=torch.float16,  # or torch.bfloat16
    trust_remote_code=True,  # Dolly ships its own pipeline code
    device_map="auto",
    # model_kwargs={'load_in_8bit': True},
  )
  return instruct_pipeline
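
The local-directory fallback in the function above can also be written as a small standalone function, which makes it easy to check without downloading a model. This is only a sketch: the name `resolve_model_source` and the `local_root` parameter are assumptions for illustration, and the layout `model/<org>/<model>` is taken from the `f"model/{model_name}"` path used above.

```python
from pathlib import Path


def resolve_model_source(model_name, local_root="model"):
    """Return a local directory for model_name if one exists, else the hub ID.

    Hypothetical helper mirroring the check inside get_dolly_model.
    """
    local_dir = Path(local_root) / model_name
    # A saved copy is assumed to live at <local_root>/<org>/<model>
    if local_dir.is_dir():
        return str(local_dir)
    return model_name


# With no local copy present this prints the Hugging Face model ID unchanged
print(resolve_model_source("databricks/dolly-v2-3b", local_root="no_such_dir"))
```

One way to populate that directory might be `instruct_pipeline.save_pretrained(f"model/{model_name}")`. The notebook does not show its save step, so treat that call as an assumption about how the local copy is created.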

In [15]:
instruct_pipeline = get_dolly_model()

Downloading (…)lve/main/config.json:   0%|          | 0.00/819 [00:00<?, ?B/s]

Downloading (…)instruct_pipeline.py:   0%|          | 0.00/9.16k [00:00<?, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/databricks/dolly-v2-3b:
- instruct_pipeline.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


Downloading pytorch_model.bin:   0%|          | 0.00/5.68G [00:00<?, ?B/s]

Downloading (…)okenizer_config.json:   0%|          | 0.00/450 [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/2.11M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/228 [00:00<?, ?B/s]
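
The warning in the output above suggests pinning a revision so that the remote `instruct_pipeline.py` cannot change between runs. A minimal sketch of how that might look is below. `build_pipeline_kwargs` is a hypothetical helper, and `"main"` is only a placeholder value: a specific commit hash from the model repository is what would actually pin the code. The `revision` keyword itself is an existing argument of `transformers.pipeline`.

```python
def build_pipeline_kwargs(model_name, revision=None):
    """Collect keyword arguments for transformers.pipeline (hypothetical helper)."""
    kwargs = {
        "model": model_name,
        "trust_remote_code": True,
        "device_map": "auto",
    }
    if revision is not None:
        # revision can be a branch name, a tag, or a commit hash on the Hub
        kwargs["revision"] = revision
    return kwargs


# "main" is a placeholder; replace it with a commit hash to pin the files
kwargs = build_pipeline_kwargs("databricks/dolly-v2-3b", revision="main")
print(kwargs)
# In the notebook this would be passed on as:
# pipeline(torch_dtype=torch.float16, **kwargs)
```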

Here we apply the model to some simple challenges: generating words and generating haikus. Note that this isn't the most efficient way of using the GPU, because the prompts are processed sequentially, so we need to look into using datasets, or Spark DataFrames and distributed inference.

In [16]:
import pandas as pd
import numpy as np
import string

# Ten random letters (sampled with replacement) to use as inputs
df = pd.DataFrame(np.random.choice(list(string.ascii_letters), size=10, replace=True), columns=['Input'])

def generate_word(x):
  return instruct_pipeline(f'Generate a word starting with "{x}". Return only this word.')[0]['generated_text']

def generate_haiku(x):
  return instruct_pipeline(f'Generate a haiku starting with "{x}". Return only this haiku.')[0]['generated_text']

# Each apply calls the model once per row, one prompt at a time
df['Word'] = df["Input"].apply(generate_word)
df['Haiku'] = df["Word"].apply(generate_haiku)
df



| | Input | Word | Haiku |
|---|---|---|---|
| 0 | A | algorithm | algorithm baby need glasses\n algorithm baby n... |
| 1 | D | ddle | Ring-ding-ding\nI hear the bell but cannot see... |
| 2 | V | virus | Infectious virus\nWith deliberation started |
| 3 | E | example | Example.\nCurled like a whirlpool in the duck ... |
| 4 | X | esktop | IacondaNet urn:isomorphic>> ArchitectureDeskTop |
| 5 | c | collision | Breaking glass\ncreates a collision. |
| 6 | Z | Zendom | Zendom\nA ghost town in outer space\n\nStill i... |
| 7 | P | practicing | Practice and I become stronger\nPractice and I... |
| 8 | Z | Zermelo-Fraenkel-dice | For all our surnames in the universe,\nThere's... |
| 9 | b | ball | You throw a ball; I catch a ball. |
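
The `apply` calls above send one prompt at a time. Transformers pipelines generally also accept a list of prompts together with a `batch_size` argument, so a first step towards better GPU use might look like the sketch below. It assumes that calling the pipeline on a list returns one result per prompt, each shaped like `[{'generated_text': ...}]` as in the single-prompt calls; whether Dolly's custom pipeline batches correctly is not verified here. `generate_batched` and `fake_pipeline` are hypothetical names, and the fake pipeline exists only so the sketch runs without a GPU.

```python
def generate_batched(pipe, prompts, batch_size=4):
    """Run `pipe` on all prompts at once and return one generated string per prompt."""
    # batch_size is forwarded so the pipeline can group prompts on the GPU
    results = pipe(list(prompts), batch_size=batch_size)
    return [result[0]["generated_text"] for result in results]


def fake_pipeline(prompts, batch_size=1):
    """Stand-in for instruct_pipeline that simply echoes each prompt."""
    return [[{"generated_text": f"echo: {p}"}] for p in prompts]


letters = ["A", "b"]
prompts = [f'Generate a word starting with "{x}". Return only this word.' for x in letters]
print(generate_batched(fake_pipeline, prompts, batch_size=2))
```

In the notebook, the equivalent call would be something like `df['Word'] = generate_batched(instruct_pipeline, prompts)`, with `prompts` built from `df['Input']`.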


In [10]:
instruct_pipeline("Explain to me the difference between nuclear fission and fusion.")

[{'generated_text': 'Fission creates one atom and a fragment of an atom. Fusion creates many atoms and a lot of energy as well.'}]

In [52]:
print(instruct_pipeline("Explain the history of the united kingdom")[0]['generated_text'])



The history of the United Kingdom can be divided into three phases: 
First Phase - 1558-1753 - During this time the English Crown governed England alone.
Second Phase - 1753-1801 - The British Empire was unified through the Act of Union 1707 which unified the realms of England, Scotland and Wales into the Kingdom of Great Britain. 
Third Phase - 1801-present - the Act of Union with Ireland abolished.
