<a href="https://colab.research.google.com/github/adammuhtar/llm-notebooks/blob/main/notebooks/dolly-v2-3b.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Dolly 2.0 🐑🧱: An Open-Source Instruction-Tuned Large Language Model**

The development of large language models (LLMs) has been nothing short of remarkable, revolutionising the field of natural language processing (NLP) and potentially moving humanity slightly closer towards building an artificial general intelligence. With the advent of large pre-trained models such as GPT-3 and GPT-4, these models are becoming increasingly sophisticated, with the latest models leveraging billions of parameters and demonstrating impressive language processing capabilities. Equally exciting is the growing trend towards making these models more accessible to researchers and developers alike, with many pre-trained models becoming freely available.

[**Dolly 2.0**](https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm)

On 12 April 2023, [Databricks](https://www.databricks.com) introduced the Dolly 2.0 language model, which Databricks describes as the first open-source, instruction-following LLM fine-tuned on a human-generated instruction dataset licensed for both research and commercial use. Two key components set this model apart:
1. It is fine-tuned on ~15k instruction/response records (available here: [databricks-dolly-15k](https://github.com/databrickslabs/dolly/tree/master/data)) generated by Databricks employees in capability domains from the InstructGPT paper, including brainstorming, classification, closed QA, open QA, generation, information extraction and summarisation. Because [databricks-dolly-15k](https://github.com/databrickslabs/dolly/tree/master/data) was sourced from over 5,000 Databricks employees, the dataset itself is likely to be fairly diverse.
2. The model is not fine-tuned from Meta's [LLaMA](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/) models but from an open-source alternative, EleutherAI's Pythia models. Pythia was trained on fewer tokens than [LLaMA](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/) (300 billion tokens as opposed to 1 trillion tokens), but because Pythia is released under a permissive licence, whereas LLaMA's weights were released for non-commercial research use only, building on Pythia opens up LLM use cases beyond research.

This notebook explores how to load and run Dolly 2.0 for causal language modelling (text generation).

[**Pythia**](https://huggingface.co/EleutherAI/pythia-6.9b)

Pythia is a suite of decoder-only autoregressive language models ranging from 70M to 12B parameters developed by [EleutherAI](https://www.eleuther.ai). It combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers. Pythia is designed to enable controlled scientific research on openly accessible and transparently trained LLMs.

---

*References*:
* Conover, M., Hayes, M., Mathur, A., Meng, X., Xie, J., Wan, J., Shah, S., Ghodsi, A., Wendell, P., Zaharia, M., and Xin, R. (2023, April 12). Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM. *Databricks Blog*. https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm

---

## **Table of Contents**

* [1. Notebook setup](#section-1)
* [2. Load `dolly-v2-3b` model and tokeniser](#section-2)
* [3. Generating text](#section-3)

## 1. Notebook setup <a name="section-1"></a>

This notebook is run using Google Colaboratory (Colab) - Colab is Google's implementation of [Jupyter Notebooks](https://jupyter.org/). This notebook has the following packages installed:
* `python==3.9.16`
* `accelerate>=0.12.0`
* `torch==2.0.0+cu118`
* `transformers[torch]==4.25.1`

The `accelerate` and `transformers` libraries must be installed manually in the Colab environment (with `pip install`, run as a shell command), following the guidance in the [Dolly 2.0 Hugging Face](https://huggingface.co/databricks/dolly-v2-3b) model card.

This Colab notebook runs [`dolly-v2-3b`](https://huggingface.co/databricks/dolly-v2-3b), a 2.8 billion parameter causal language model derived from EleutherAI's Pythia-2.8b. Running it requires a hardware-accelerated (GPU) runtime, which also provides more RAM; the GPU should at least match the Tesla T4 (16 GB GDDR6, 320 GB/s memory bandwidth), which is available for free in Google Colab.

Replicating this notebook for the larger Dolly 2.0 models (e.g. [`dolly-v2-7b`](https://huggingface.co/databricks/dolly-v2-7b) or [`dolly-v2-12b`](https://huggingface.co/databricks/dolly-v2-12b)) on Colab will likely require Colab Pro, with access to larger GPUs such as the A100 Tensor Core GPU.
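As a rough back-of-the-envelope check on these hardware requirements, the weights alone need about (number of parameters) × (bytes per parameter). The parameter counts below are approximations assumed from the Pythia base model sizes (2.8B, 6.9B and 12B), and the estimate ignores activations, the KV cache and framework overhead, so treat the results as lower bounds rather than exact figures:

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Parameter counts are approximate (assumed from the Pythia base model sizes).
PARAM_COUNTS = {
    "databricks/dolly-v2-3b": 2.8e9,
    "databricks/dolly-v2-7b": 6.9e9,
    "databricks/dolly-v2-12b": 12e9,
}

# Storage size of one parameter in common dtypes
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_memory_gb(n_params: float, dtype: str = "bfloat16") -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for name, n_params in PARAM_COUNTS.items():
    print(
        f"{name}: ~{weight_memory_gb(n_params, 'float32'):.1f} GB in float32, "
        f"~{weight_memory_gb(n_params):.1f} GB in bfloat16"
    )
```

On these numbers the 3B model needs roughly 5.6 GB in bfloat16, the 7B model roughly 13.8 GB (very little headroom on a 16 GB T4 once runtime overhead is added) and the 12B model roughly 24 GB, which cannot fit on a T4 at all.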

In [None]:
# Query GPU device status/details
!nvidia-smi

Sat Apr 15 16:15:59 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   54C    P8    10W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
# Check the runtime's IP address details, in case there are restrictions on running non-local servers
!curl ipinfo.io

{
  "ip": "35.225.116.169",
  "hostname": "169.116.225.35.bc.googleusercontent.com",
  "city": "Council Bluffs",
  "region": "Iowa",
  "country": "US",
  "loc": "41.2324,-95.8751",
  "org": "AS396982 Google LLC",
  "postal": "51501",
  "timezone": "America/Chicago",
  "readme": "https://ipinfo.io/missingauth"
}

In [None]:
# Quote the version specifiers so the shell does not treat ">" as output redirection
!pip install "accelerate>=0.12.0" "transformers[torch]==4.25.1" --quiet
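Since the pinned versions matter for the downloaded `instruct_pipeline.py`, it can be worth confirming what actually got installed. A minimal sketch using the standard library's `importlib.metadata` (this helper is not part of the original notebook; `installed_versions` is a hypothetical name):

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_versions(packages: list[str]) -> dict[str, Optional[str]]:
    """Return the installed version of each package, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


print(installed_versions(["accelerate", "transformers", "torch"]))
```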

In [None]:
!wget https://huggingface.co/databricks/dolly-v2-7b/raw/main/instruct_pipeline.py

--2023-04-15 17:05:51--  https://huggingface.co/databricks/dolly-v2-7b/raw/main/instruct_pipeline.py
Resolving huggingface.co (huggingface.co)... 18.172.170.36, 18.172.170.70, 18.172.170.14, ...
Connecting to huggingface.co (huggingface.co)|18.172.170.36|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7154 (7.0K) [text/plain]
Saving to: ‘instruct_pipeline.py.1’


2023-04-15 17:05:51 (1.37 GB/s) - ‘instruct_pipeline.py.1’ saved [7154/7154]



In [None]:
# Standard library imports
import textwrap

# Third-party imports
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Local application imports
from instruct_pipeline import InstructionTextGenerationPipeline

In [None]:
# Check available GPUs for computation
if torch.cuda.is_available():
    num_gpus = torch.cuda.device_count()
    # Print details of all available GPUs
    for i in range(num_gpus):
        print(f"Device details: {torch.cuda.get_device_properties(i)}")
    # Get the currently active GPU device and print its name
    active_gpu = torch.cuda.current_device()
    print(f"Currently active GPU device: {torch.cuda.get_device_name(active_gpu)}")
else:
    print("No GPU devices found.")

Device details: _CudaDeviceProperties(name='Tesla T4', major=7, minor=5, total_memory=15101MB, multi_processor_count=40)
Currently active GPU device: Tesla T4


## 2. Load [`dolly-v2-3b`](https://huggingface.co/databricks/dolly-v2-3b) model and tokeniser <a name="section-2"></a>

We load the Dolly 2.0 3B tokeniser and model:

* `tokeniser` is created using `AutoTokenizer` from the `transformers` library and loaded with the pre-trained Dolly 2.0 tokeniser from the `databricks/dolly-v2-3b` model checkpoint. The `padding_side` argument is set to `"left"` so that padding tokens are added to the left of each input sequence; for a decoder-only model this keeps the real prompt tokens at the end of the sequence, where generation continues.
* `model` is created using `AutoModelForCausalLM` from the `transformers` library and loaded with the pre-trained Dolly 2.0 model from the `databricks/dolly-v2-3b` checkpoint. The `torch_dtype` argument is set to `torch.bfloat16`, a reduced-precision 16-bit floating-point format that halves the memory footprint of the weights relative to 32-bit floats and can speed up computation on supported hardware. `device_map` is set to `"auto"` so that `accelerate` decides where to place the model's weights, using the available GPU(s) first and falling back to the CPU if GPU memory runs out.
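To see why left padding matters for a decoder-only model, here is a toy, pure-Python sketch (not the tokeniser's actual implementation; the token ids and the pad id `0` are made up, and `pad_batch` is a hypothetical helper). When prompts of different lengths are batched, the model continues generating from the last position of each row, so with left padding that position always holds a real prompt token:

```python
def pad_batch(seqs: list[list[int]], pad_id: int, side: str = "left") -> list[list[int]]:
    """Pad token-id sequences to a common length on the given side."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    max_len = max(len(s) for s in seqs)
    padded = []
    for s in seqs:
        pads = [pad_id] * (max_len - len(s))
        padded.append(pads + s if side == "left" else s + pads)
    return padded


# Hypothetical token ids for two prompts of different lengths; 0 is an assumed pad id
batch = [[101, 102, 103], [201]]
left = pad_batch(batch, pad_id=0, side="left")
right = pad_batch(batch, pad_id=0, side="right")

# Generation continues from the last position of each row
print([row[-1] for row in left])   # [103, 201]: both rows end on a real token
print([row[-1] for row in right])  # [103, 0]: the short prompt ends on padding
```

In `transformers`, the attention mask additionally tells the model to ignore padding positions; the padding side only decides where those positions sit.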

In [None]:
# Choose which Dolly 2 model to run
dolly2_model = "databricks/dolly-v2-3b" #@param ["databricks/dolly-v2-3b", "databricks/dolly-v2-7b", "databricks/dolly-v2-12b"]

# Load tokeniser and model; move model to specified device
tokeniser = AutoTokenizer.from_pretrained(dolly2_model, padding_side="left")
model = AutoModelForCausalLM.from_pretrained(
    dolly2_model,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Instantiate the instruction-following text generation pipeline used to generate responses
generate_text = InstructionTextGenerationPipeline(model=model, tokenizer=tokeniser)
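`InstructionTextGenerationPipeline` wraps each raw prompt in an instruction template before generation and then keeps only the response. The sketch below reproduces that idea in plain Python. The template strings are my reading of Dolly's `instruct_pipeline.py` and should be checked against the downloaded file; as I understand it, the real pipeline also locates the response and end markers at the token level (they are special tokens in Dolly's tokeniser), so this string-based version is a simplification, and `build_prompt`/`extract_response` are hypothetical helpers:

```python
# Assumed template, based on Dolly's instruct_pipeline.py; verify against the downloaded file
INTRO_BLURB = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)
INSTRUCTION_KEY = "### Instruction:"
RESPONSE_KEY = "### Response:"
END_KEY = "### End"


def build_prompt(instruction: str) -> str:
    """Wrap a raw instruction in the Dolly-style prompt template."""
    return f"{INTRO_BLURB}\n{INSTRUCTION_KEY}\n{instruction}\n{RESPONSE_KEY}\n"


def extract_response(generated: str) -> str:
    """Keep only the text after the response key and before the (optional) end key.

    Returns an empty string if the response key is not present.
    """
    _, _, after = generated.partition(RESPONSE_KEY)
    response, _, _ = after.partition(END_KEY)
    return response.strip()


prompt = build_prompt("What are Newton's three laws of motion?")
print(prompt)
# Hypothetical model continuation appended to the prompt
print(extract_response(prompt + "An object at rest stays at rest...\n### End"))
```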

In [None]:
# Define function for human-readable outputs
def dolly_speak(n: int = 3, width_length: int = 100) -> None:
    """
    Prompts the user for an instruction and prints n sample responses from Dolly,
    each wrapped to a fixed line width.

    Arguments:
        * n (`int`): The number of sample responses to generate from Dolly
        * width_length (`int`): The maximum line width used when wrapping each response
    """
    input_prompt = input("Prompt: ")
    print("=" * width_length)
    for i in range(n):
        print(f"Response {i+1}:\n")
        # Each call samples a fresh response; wrap it for display
        print("\n".join(textwrap.wrap(generate_text(input_prompt), width=width_length)))
        print("-" * width_length)
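Calling `generate_text` repeatedly on the same prompt gives different responses (as the tests below show) because generation is stochastic. As far as I can tell, `InstructionTextGenerationPipeline` samples by default (`do_sample=True`) with nucleus (`top_p`) filtering; check the downloaded `instruct_pipeline.py` for the exact defaults. The sketch below is a generic, pure-Python illustration of nucleus sampling over a hypothetical next-token distribution, not the `transformers` implementation, which operates on logits tensors:

```python
import random


def top_p_sample(probs: dict[str, float], top_p: float, rng: random.Random) -> str:
    """Nucleus sampling: draw from the smallest set of highest-probability
    tokens whose cumulative probability reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus = []
    cumulative = 0.0
    for token, p in ranked:
        nucleus.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    tokens, weights = zip(*nucleus)
    # random.choices treats weights as relative, which renormalises within the nucleus
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-token probabilities after "An object at rest stays at"
next_token_probs = {"rest": 0.70, "peace": 0.15, "home": 0.10, "banana": 0.05}
rng = random.Random(42)
samples = [top_p_sample(next_token_probs, top_p=0.9, rng=rng) for _ in range(10)]
print(samples)  # varies with the seed; "banana" never appears with top_p=0.9
```

Lower `top_p` values shrink the nucleus and make responses more conservative; higher values admit rarer tokens, which is one source of the "off-the-rails" creativity seen in the tests.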

## 3. Generating text <a name="section-3"></a>

In this section of the notebook, we work through examples of various tasks to see how well [`dolly-v2-3b`](https://huggingface.co/databricks/dolly-v2-3b) performs. These are not meant to be comprehensive or robust tests, just anecdotal examples of localised prompts. As mentioned in the model card, the model struggles with syntactically complex prompts, programming problems, mathematical operations, factual errors, dates and times, open-ended question answering, hallucination, enumerating lists of specific length, stylistic mimicry, having a sense of humor, etc. Compared with other publicly available LLMs such as [Alpaca](https://crfm.stanford.edu/2023/03/13/alpaca.html), [`dolly-v2`](https://huggingface.co/databricks/dolly-v2-3b) is expected to perform somewhat worse. Dolly 2.0's base model, EleutherAI's Pythia, was trained on far fewer tokens than the base LLaMA models (300 billion tokens as opposed to 1 trillion tokens).

That said, for a small model it fares better than I expected: it shows some understanding of the user's prompt, gets at least *some* facts right, makes some attempt at summarisation and shows some degree of creativity (though at times that creativity comes from going completely off the rails!). It is still fun to play around with.

### Test 1: Open Q&A I

> **Prompt: What are Newton's three laws of motion?**

In [None]:
dolly_speak()

Prompt: What are Newton's three laws of motion?
Response 1:

Newton's three laws of motion are: 1. An object at rest stays at rest, and an object in motion
remains in motion unless acted upon by an external force. 2. An object's momentum is equal to the
product of its mass and velocity. 3. A body remains in constant motion unless acted upon by an
external force.
----------------------------------------------------------------------------------------------------
Response 2:

Newton's three laws of motion are: 1. An object at rest stays at rest, and an object in motion
remains in motion unless acted upon by an outside force. 2. An object obstructing a direct line of
motion causes an equal and opposite reaction with an equivalent magnitude to offset the motion. 3.
For every action, there is an equal and opposite reaction.
----------------------------------------------------------------------------------------------------
Response 3:

Newton's three laws of motion are: 1. An object at rest

### Test 2: Open Q&A II

> **Prompt: What is the Pythagoras Theorem?**

In [None]:
dolly_speak()

Prompt: What is the Pythagoras Theorem?
Response 1:

The Pythagoras Theorem is a theorem in mathematics, named after the Greek mathematician Pythagoras.
It states that in a right-angled triangle, the sum of the squares of the two sides in the triangle
that are perpendicular to the side connected to the pivot (the third side) is the square of the
third side.  This result is important in linear algebra, plane geometry, and projective geometry.
----------------------------------------------------------------------------------------------------
Response 2:

The Pythagoras Theorem is named after Pythagoras of Samos who lived in the 6th and 5th centuries
BCE. It states that in a right triangle with right angles at the side, the sum of the squares of the
opposite side and the hypotenuse is equal to the sum of the squares of the two sides of the right
angle.  Example:    hypotenuse =BC  opposite side =AD   side of right angle =AC   sum of the squares
= (BC² + AC²) - (AD²) = (AC² + BC²) = (AC² 

### Test 3: Text Summarisation

> **Prompt: Summarise this text: Spectacular photographs of a Canary Wharf skyscraper being completed have been rediscovered more than 30 years after they were taken.
Amid strong winds, Tony Brien sat in a wooden box suspended by a crane some 250m above east London to capture the images of One Canada Square. Previously believed to have been lost, the photographs were uncovered during a search of Mr Brien's archives. He said he was "completely staggered" to find the photos from November 1990. Mr Brien said he had worked on the Canary Wharf project before when he was approached to photograph the "topping out" of One Canada Square. "I said 'fine', not realising quite at that time that the only way I could really do it was to go up in a bucket or a crate from the ground which was attached to a crane," he explained.**

Source: BBC. (2023, April 15). Canary Wharf: Spectacular photos of skyscraper rediscovered. *BBC News*. https://www.bbc.co.uk/news/uk-england-london-65274803

In [None]:
dolly_speak()

Prompt: Summarise this text: Spectacular photographs of a Canary Wharf skyscraper being completed have been rediscovered more than 30 years after they were taken. Amid strong winds, Tony Brien sat in a wooden box suspended by a crane some 250m above east London to capture the images of One Canada Square. Previously believed to have been lost, the photographs were uncovered during a search of Mr Brien's archives. He said he was "completely staggered" to find the photos from November 1990. Mr Brien said he had worked on the Canary Wharf project before when he was approached to photograph the "topping out" of One Canada Square. "I said 'fine', not realising quite at that time that the only way I could really do it was to go up in a bucket or a crate from the ground which was attached to a crane," he explained.
Response 1:

Tony Brien was a photographer from London, England, who was tasked with photographing a skyscraper
being constructed on London's East End, called Canary Wharf.   One Ca

### Test 4: Brainstorming

> **Prompt: What are some fun activities a family can do along the River Thames?**

In [None]:
dolly_speak()

Prompt: What are some fun activities a family can do along the River Thames?
Response 1:

Boating, canoeing, kayaking, paddleboating, canyoning, swimming, water-skiing, windsurfing, sailing,
rowing, jet-skiing, cave exploration, archery
----------------------------------------------------------------------------------------------------
Response 2:

Enjoy exploring the scenery and animals that can be found along the River Thames.  A popular
activity is a boat ride on the Thames Clipper (http://www.thamesclippers.com/).  While on the
Thames, you can enjoy coffee and pastries while waiting for your boat.
----------------------------------------------------------------------------------------------------
Response 3:

You can hire canoes and float down the river in a group. Fishing is popular, especially for Bass.
You can also go on bike rides, or go for walks. As it is so close to London you can easily do this
from home as well.
-------------------------------------------------------------

### Test 5: Creative Writing I

> **Prompt: Write a short story about Buzz Lightyear's adventure to get flowers for Jessie**

In [None]:
dolly_speak()

Prompt: Write a short story about Buzz Lightyear's adventure to get flowers for Jessie
Response 1:

Buzz Lightyear was excited to help Jessie organise her flowers, but he was surprised to find that
the shopkeepers were not prepared to deliver the flowers themselves and would need a transport unit.
Fortunately, Jessie had included a container ofepisodic hitchhikers for Buzz to deliver. Buzz set
off immediately, planning to use the units transport modes to find a location nearby that would be
able to deliver the flowers. Buzz soon found a small, scenic planet with lots of greenery and began
his delivery, but to his dismay, the planet seemed to be populated only by large animals and plant-
life rather than humans, and the deliveries did not go unscathed. But Buzz persevered and delivered
the flowers to Jessie with the assistance of the animals and plants. Jessie was very pleased with
Buzz's efforts and thanked him, and the two of them went on to enjoy a pleasant day together.
------------

### Test 6: Creative Writing II

> **Prompt: Write a short murder mystery about John Lennon and the Backstreet Boys**

In [None]:
dolly_speak()

Prompt: Write a short murder mystery about John Lennon and the Backstreet Boys
Response 1:

It was a dark and stormy night in April 1980 and Michael, Boogie, Macca and Sean were chilling at
the Starlight Roasters when they heard a commotion outside.  Lennon had just arrived at the coffee
shop and was berating the Backstreet Boys for wearing their hair in pig tails.  "Get your hair cut
or I will kick your asses," he thundered, slamming out the door.  The guys worried that the infamous
song writer had showed up just to kick their butts. So, they made a plan.  If Lennon showed up they
were going to give him a little lecture on why the Backstreet Boys wore their hair in pig tails. So,
they hid a pile of books in the closet and when Lennon came back, they would be waiting.  As Lennon
came in, Michael went over and whispered in his ear:  "Are those your boys?"  Lennon nodded in
reply, furiously texting someone on his phone.  "Are you going to kick our asses again?"  Lennon
nodded in reply.  

### Test 7: Creative Writing III

> **Prompt: Write a short story about the Presidency of Alexander Hamilton in an alternate reality where he had become the US president**

In [None]:
dolly_speak()

Prompt: Write a short story about the Presidency of Alexander Hamilton in an alternate reality where he had become the US president
Response 1:

Alexander Hamilton was a Founding Father, Member of Congress, Director of the Federalist Party, and
the 8th President of the United States. During his time in office, the US economy continued to grow,
and he enacted a national program of economic prosperity, increased employment, and expanded freedom
for the American people. His second term saw the start of the US Revolutionary War, which
successfully overthrew the corrupt, aristocratic British government, and he became the world's first
President on October 4, 1788.
----------------------------------------------------------------------------------------------------
Response 2:

Rebecca Jugg is a high school senior who has lived her entire life in the 30th century. Her passion
is the classic movie, 2001: A Space Odyssey, which was released in 1968. In a flashback to 1986,
20-year-old Rebecca d