Original Dolly: https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html

My Dolly: https://huggingface.co/Bandika/Dollyka

Cleaned Alpaca Dataset: https://github.com/gururise/AlpacaDataCleaned

In [1]:
# If colab gets crazy:
import locale
def getpreferredencoding(do_setlocale = True):
    return "UTF-8"
locale.getpreferredencoding = getpreferredencoding


In [2]:
# Install necessary dependencies
!pip -q install datasets loralib sentencepiece
!pip -q install git+https://github.com/huggingface/transformers
!pip -q install git+https://github.com/huggingface/peft.git
!pip -q install bitsandbytes
!pip -q install gradio
!pip uninstall -y typing-extensions
!pip install typing-extensions==4.5.0
!pip install flask-cors


[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m493.7/493.7 kB[0m [31m10.0 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.3/1.3 MB[0m [31m62.8 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m115.3/115.3 kB[0m [31m15.9 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m134.8/134.8 kB[0m [31m17.2 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m302.0/302.0 kB[0m [31m33.1 MB/s[0m eta [36m0:00:00[0m
[?25h  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m3.8/3.8 MB[0m [31m41.9 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.3/1.3 MB[0m [31m76.4 MB/s

In [3]:
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

tokenizer.pad_token = tokenizer.eos_token
tokenizer.pad_token_id = tokenizer.eos_token_id


Downloading (…)okenizer_config.json:   0%|          | 0.00/619 [00:00<?, ?B/s]

Downloading (…)olve/main/vocab.json:   0%|          | 0.00/798k [00:00<?, ?B/s]

Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/1.37M [00:00<?, ?B/s]

Downloading (…)in/added_tokens.json:   0%|          | 0.00/4.04k [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/357 [00:00<?, ?B/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [4]:
from peft import PeftModel
from transformers import AutoTokenizer, GPTJForCausalLM, GenerationConfig

model = GPTJForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",
    load_in_8bit=True,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, "Bandika/Dollyka")


Downloading (…)lve/main/config.json:   0%|          | 0.00/930 [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/24.2G [00:00<?, ?B/s]

Downloading (…)/adapter_config.json:   0%|          | 0.00/486 [00:00<?, ?B/s]

Downloading adapter_model.bin:   0%|          | 0.00/7.38M [00:00<?, ?B/s]

In [33]:
import re

def dolly_generate(instruction):
    PROMPT = f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.
    ### Instruction:
    {instruction}
    ### Response:"""

    inputs = tokenizer(
        PROMPT,
        return_tensors="pt",
    )
    input_ids = inputs["input_ids"].cuda()

    generation_config = GenerationConfig(
        temperature=0.6,
        top_p=0.95,
        repetition_penalty=1.2,
    )

    print("Generating...")
    generation_output = model.generate(
        input_ids=input_ids,
        generation_config=generation_config,
        return_dict_in_generate=True,
        output_scores=True,
        max_new_tokens=128,
        pad_token_id=0,
        eos_token_id=50256
    )

    # Initialize an empty string to hold the final response.
    final_response = ""

    for s in generation_output.sequences:
        decoded = tokenizer.decode(s)
        print(decoded)
        # Isolate the "Response" from the rest of the prompt.
        response = decoded.split('### Response:')[1].strip() if '### Response:' in decoded else decoded
        # Clean up the response.
        response = re.sub(r'\s+', ' ', response).strip()  # Replace multiple whitespace with single space.
        # Append the cleaned response to the final_response string.
        final_response += response + " "

    # Remove trailing spaces from the final response.
    final_response = final_response.strip()

    return final_response


In [34]:
%%time
PROMPT = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
Write a python program that calculates Fibonacci sequences.
### Response:"""

dolly_generate(PROMPT)


Generating...
Below is an instruction that describes a task. Write a response that appropriately completes the request.
    ### Instruction:
    Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
Write a python program that calculates Fibonacci sequences.
### Response:
    ### Response: Here's my solution to your problem, which uses Python 3 and its built-in functions for calculating fibonacci numbers:

    # Define two variables called `a` and `b`, both of type integer (i.e., whole number). These will be used as initial values for our sequence.
    a = 0
    b = 1

    # Create a function named `fib()`. This function takes one argument - the value we want to calculate in this case it would be 2. The return value should also be stored inside variable `c`.
  
CPU times: user 28 s, sys: 157 ms, total: 28.1 s
Wall time: 27.9 s


''

In [7]:
%%time
PROMPT = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
What is the difference between Hungary and Austria?
### Response:"""

dolly_generate(PROMPT)


Generating...




Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
What is the difference between Hungary and Austria?
### Response:
The two countries are located in Central Europe, with both being part of the European Union (EU). They share many similarities such as their language, culture, history, economy, etc., but there are also some differences between them. For example, while they have similar cultures, Hungarian cuisine tends to be more spicy than Austrian food due to its proximity to India. In addition, Austrians tend to speak German whereas Hungarians prefer speaking Hungarian. The main difference between these two nations lies within their political systems; while both are parliamentary democracies, Hungary has been governed by Fidesz since 1989, whereas Austria was ruled by the Social Democratic Party
CPU times: user 27.9 s, sys: 0 ns, total: 27.9 s
Wall time: 27.8 s


In [35]:
def generate_text(prompt):
  response = dolly_generate(prompt)
  return response


In [36]:
import gradio
from flask_cors import CORS

iface = gradio.Interface(
    fn=generate_text,
    inputs="text",
    outputs="text",
    live=False,
    theme='Bandika/agyrem'
)




In [37]:
app = iface.launch(share=True)


Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://d765867764746dadee.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
