# LLM Examples
This notebook showcases some examples of LLMs and how to use them.

Some of the boilerplate code lives in the `src` directory; go have a look!

In [5]:
!git clone https://github.com/SamHollings/llm_examples.git

Cloning into 'llm_examples'...
remote: Enumerating objects: 67, done.
remote: Counting objects: 100% (67/67), done.
remote: Compressing objects: 100% (57/57), done.
remote: Total 67 (delta 16), reused 51 (delta 6), pack-reused 0
Unpacking objects: 100% (67/67), 77.94 KiB | 3.71 MiB/s, done.


In [6]:
!pip install -r llm_examples/requirements.txt

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/, https://download.pytorch.org/whl/cu117
Collecting pytest-html (from -r llm_examples/requirements.txt (line 12))
  Downloading pytest_html-3.2.0-py3-none-any.whl (16 kB)
Collecting pathlib2 (from -r llm_examples/requirements.txt (line 18))
  Downloading pathlib2-2.3.7.post1-py2.py3-none-any.whl (18 kB)
Collecting jupyter (from -r llm_examples/requirements.txt (line 20))
  Downloading jupyter-1.0.0-py2.py3-none-any.whl (2.7 kB)
Collecting transformers (from -r llm_examples/requirements.txt (line 21))
  Downloading transformers-4.29.2-py3-none-any.whl (7.1 MB)
Collecting accelerate (from -r llm_examples/requirements.txt (line 23))
  Downloading accelerate-0.19.0-py3-none-any.whl (219 kB)

In [7]:
from transformers import pipeline, AutoModel, AutoTokenizer
import torch
import datasets

## Test that the GPU is enabled

In [8]:
torch.cuda.is_available()

True

If the above shows `False`, creating a CUDA tensor will raise a more informative error message (for example, that a non-CUDA-enabled build of PyTorch is installed).
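As a gentler alternative to raising an error, the check can fall back to the CPU and report the likely reason CUDA is unavailable. This is a minimal sketch assuming a recent PyTorch; `pick_device` is a hypothetical helper, not something from this repo's `src` directory. It relies on `torch.version.cuda`, which is `None` for CPU-only PyTorch builds.

```python
import torch


def pick_device() -> str:
    """Return "cuda" if a CUDA GPU is usable, otherwise "cpu" (printing the likely reason)."""
    if torch.cuda.is_available():
        return "cuda"
    if torch.version.cuda is None:
        # CPU-only wheel: a CUDA-enabled PyTorch build must be installed to use the GPU
        print("PyTorch was built without CUDA support.")
    else:
        # CUDA-enabled build, but no GPU is visible (e.g. no GPU runtime attached, driver issue)
        print(f"PyTorch was built for CUDA {torch.version.cuda}, but no GPU is visible.")
    return "cpu"


device = pick_device()
print(device)
```

The returned string can then be passed wherever a device is expected, e.g. `torch.zeros(1, device=device)`.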

In [9]:
a = torch.zeros(1, device="cuda")  # raises a descriptive error if CUDA is unusable

## Dolly

These examples follow the instructions at:

- https://github.com/databrickslabs/dolly
- https://huggingface.co/databricks/dolly-v2-12b#dolly-v2-12b-model-card

In [10]:
# get and save the model
# load the model
# use the model on a string
# use the model on a dataframe
# use the model on a spark dataframe

In [11]:
model_name = "databricks/dolly-v2-3b"

The function below loads the model from Hugging Face, unless a local directory `model/<model_name>` exists, in which case it loads the model from there instead.
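The path-resolution step on its own can be sketched as a small pure-Python function, which makes the local-directory fallback easy to test without downloading anything. `resolve_model_path` and its `local_root` parameter are hypothetical names; the `model` default mirrors the `model/` prefix used by `get_dolly_model`.

```python
import os
import tempfile


def resolve_model_path(model_name: str, local_root: str = "model") -> str:
    """Prefer a local copy at <local_root>/<model_name>; otherwise return the Hub name."""
    local_path = os.path.join(local_root, model_name)
    return local_path if os.path.isdir(local_path) else model_name


# Demo with a throwaway directory standing in for ./model
with tempfile.TemporaryDirectory() as root:
    # No local copy yet, so the Hugging Face Hub name is returned unchanged
    print(resolve_model_path("databricks/dolly-v2-3b", local_root=root))
    os.makedirs(os.path.join(root, "databricks", "dolly-v2-3b"))
    # Now the local directory exists and is preferred
    print(resolve_model_path("databricks/dolly-v2-3b", local_root=root))
```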

In [12]:
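def get_dolly_model(model_name="databricks/dolly-v2-3b"):
    """Load a Dolly instruction-following pipeline, preferring a local copy under model/."""
    import os

    local_model_name = f"model/{model_name}"
    if os.path.isdir(local_model_name):
        model_name = local_model_name

    instruct_pipeline = pipeline(
        model=model_name,            # dolly-v2-3b, -7b or -12b
        torch_dtype=torch.float16,   # the model card uses torch.bfloat16
        trust_remote_code=True,      # Dolly ships its own pipeline code (instruct_pipeline.py)
        device_map="auto",           # let accelerate place the weights on the available device(s)
        # model_kwargs={"load_in_8bit": True},  # optional 8-bit loading (needs bitsandbytes)
    )
    return instruct_pipeline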

In [13]:
instruct_pipeline = get_dolly_model(model_name)

A new version of the following files was downloaded from https://huggingface.co/databricks/dolly-v2-3b:
- instruct_pipeline.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.



Here we apply the model to some simple challenges: generating words, haikus, city names, city descriptions and one-sentence summaries. Note that this isn't the most efficient way of using the GPU, because the prompts are run sequentially, one at a time. To speed this up we need to look into batching with `datasets`, or Spark dataframes and distributed inference.
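Hugging Face pipelines can also take a list of inputs (optionally with a `batch_size` argument), which lets the GPU process several prompts per forward pass. The sketch below shows only the batching bookkeeping in plain Python; `chunks`, `run_batched` and `fake_model` are hypothetical names, and `fake_model` stands in for the real model so the example runs anywhere.

```python
from typing import Callable, Iterator, List, Sequence


def chunks(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_batched(prompts: Sequence[str],
                generate_batch: Callable[[List[str]], List[str]],
                batch_size: int = 8) -> List[str]:
    """Send prompts to `generate_batch` in chunks, keeping the original order."""
    results: List[str] = []
    for chunk in chunks(prompts, batch_size):
        outputs = generate_batch(list(chunk))
        if len(outputs) != len(chunk):
            raise ValueError("generate_batch must return one output per prompt")
        results.extend(outputs)
    return results


def fake_model(batch: List[str]) -> List[str]:
    """Stand-in for the LLM: echoes each prompt in upper case."""
    return [p.upper() for p in batch]


prompts = [f'Generate a word starting with "{c}".' for c in "abcde"]
print(run_batched(prompts, fake_model, batch_size=2))
```

To plug in the real model, `generate_batch` could wrap the pipeline, e.g. `lambda batch: [out[0]["generated_text"] for out in instruct_pipeline(batch, batch_size=len(batch))]`. This assumes the pipeline returns one list of candidates per prompt for list inputs, as standard `transformers` text-generation pipelines do; check Dolly's custom pipeline before relying on it.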

In [15]:
import string

import numpy as np
import pandas as pd

# Ten random ASCII letters (sampled with replacement) used as seeds for the prompts
df = pd.DataFrame(
    np.random.choice(list(string.ascii_letters), size=10, replace=True),
    columns=["Input"],
)


def ask(prompt):
    """Run a single prompt through the Dolly pipeline and return the generated text."""
    return instruct_pipeline(prompt)[0]["generated_text"]


generate_word = lambda x: ask(f'Generate a word starting with "{x}". Return only this word.')
generate_haiku = lambda x: ask(f'Generate a haiku starting with "{x}". Return only this haiku.')
generate_cityname = lambda x: ask(f'Generate a city name starting with "{x}". Return only this name.')
generate_citydesc = lambda x: ask(f'Write a paragraph about the city called "{x}"')
summarise_input = lambda x: ask(f'Summarise the following into one sentence: "{x}"')

# Each column is built row by row from an earlier one (one pipeline call per row)
df["Word"] = df["Input"].apply(generate_word)
df["Haiku"] = df["Word"].apply(generate_haiku)
df["City"] = df["Input"].apply(generate_cityname)
df["CityDesc"] = df["City"].apply(generate_citydesc)
df["CityDesc_Summary"] = df["CityDesc"].apply(summarise_input)
df



| | Input | Word | Haiku | City | CityDesc | CityDesc_Summary |
|---|---|---|---|---|---|---|
| 0 | T | tesseract | Lens flare, tesseract | Toronto | Toronto is a major city in Canada's southern O... | Toronto is a major city in Canada's southern O... |
| 1 | z | zombie | Zombie\nI am not a zombie\nI am a living human... | Zarsot | Zarsot is a city located in the heart of the M... | Zarsot is a city located in the heart of the M... |
| 2 | C | circumstellar | Circumstellar\nBeginnings are oft fraught with... | Seattle | Seattle, Washington, is a city in the United S... | Seattle is located in the state of Washington,... |
| 3 | T | terminator | Terminator flies alone\nThrough the barren was... | Toronto | Toronto, Canada, is an economic and cultural h... | Toronto is an economic and cultural hub in the... |
| 4 | j | jog | By the waters of Maiden Lane\nI am lovingly sw... | Jakarta | Jakarta is the capital of Indonesia and one of... | Jakarta is a city in Indonesia with a populati... |
| 5 | a | acadeem | Silent disciples of Adeimantus\nWhispering anc... | - Alexandria\n- Aachen\n- Amsterdam\n- Apollon... | Alexandria, Aachen, Amsterdam, Apollonia, Athe... | These are all places I have been to, but not a... |
| 6 | Y | Yes | A yes and a silence followed\nAnswer one last ... | Yekaterinburg | Yekaterinburg is a city located in the U.S.S.R... | Yekaterinburg is a city located in the U.S.S.R... |
| 7 | P | platform | With me,\nJust like a platform.\nYou look so b... | Providence | Providence is a modern city in the western par... | Providence is a modern city in the western par... |
| 8 | X | xater | Skeletal Memory\nWeave a tale of bones\nSiftin... | Xenos | Xenos is a technologically advanced city locat... | Xenos is a city located in the near future on ... |
| 9 | C | Cursor | Cursor. \nI'm here by your side.\nYou're the r... | Rio de Janeiro | Rio de Janeiro, called Rio or Rio de Janeiro i... | Rio de Janeiro, called Rio or Rio de Janeiro i... |
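The table shows the model doesn't always follow the instruction: rows 2 and 9 ask for a city starting with "C" and get "Seattle" and "Rio de Janeiro", and row 5 returns a bulleted list instead of a single name. A minimal post-processing check could catch these cases. The sketch below uses hypothetical helpers (`first_answer`, `starts_with_letter`) that keep only the first non-empty line, strip list markers and surrounding punctuation, and flag answers that don't start with the requested letter.

```python
import re


def first_answer(text: str) -> str:
    """Keep the first non-empty line, minus list markers, quotes and trailing punctuation."""
    for line in text.splitlines():
        line = line.strip()
        if line:
            line = re.sub(r"^[-*\d.)\s]+", "", line)  # drop leading "- ", "1. ", etc.
            return line.strip(" \"'.,;:")
    return ""


def starts_with_letter(answer: str, letter: str) -> bool:
    """Case-insensitive check that the answer begins with the requested letter."""
    return answer[:1].lower() == letter.lower()


# A few (Input, City) pairs taken from the table above
rows = [("C", "Seattle"), ("C", "Rio de Janeiro"),
        ("a", "- Alexandria\n- Aachen\n- Amsterdam"), ("j", "Jakarta")]
for letter, raw in rows:
    name = first_answer(raw)
    print(letter, name, starts_with_letter(name, letter))
```

Applied to the dataframe, something like `df["City_ok"] = [starts_with_letter(first_answer(c), i) for i, c in zip(df["Input"], df["City"])]` would add a boolean column flagging the rows worth re-prompting.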


In [None]:
instruct_pipeline("Explain to me the difference between nuclear fission and fusion.")

[{'generated_text': 'Fission creates one atom and a fragment of an atom. Fusion creates many atoms and a lot of energy as well.'}]

In [None]:
print(instruct_pipeline("Explain the history of the united kingdom")[0]['generated_text'])



The history of the United Kingdom can be divided into three phases: 
First Phase - 1558-1753 - During this time the English Crown governed England alone.
Second Phase - 1753-1801 - The British Empire was unified through the Act of Union 1707 which unified the realms of England, Scotland and Wales into the Kingdom of Great Britain. 
Third Phase - 1801-present - the Act of Union with Ireland abolished.
