<a href="https://colab.research.google.com/github/datjandra/PhireBlast/blob/main/PhireBlast.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
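# Install transformers with its PyTorch extras to pull in the torch dependencies,
# then swap the release for the development build from GitHub.
!pip install 'transformers[torch]'
!pip uninstall -y transformers
!pip install git+https://github.com/huggingface/transformers
!pip install gradio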

In [None]:
import os
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import gradio as gr

# Allocate new tensors on the GPU by default; this notebook assumes a CUDA runtime.
torch.set_default_device("cuda")

def predict(name, gender, age, conditions):
  PERSIST_DIR = "./storage"
  # Bind both names up front so the cleanup in `finally` cannot fail on an
  # unassigned variable when loading raises.
  model = None
  tokenizer = None
  try:
    if not os.path.exists(PERSIST_DIR):
      # First call: download phi-2 and keep a local copy for later calls.
      model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", low_cpu_mem_usage=True, trust_remote_code=True)
      model.save_pretrained(PERSIST_DIR)
    else:
      model = AutoModelForCausalLM.from_pretrained(PERSIST_DIR, torch_dtype="auto")

    tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)
    prompt = "Instruct: Sample data in FHIR JSON format of {age} year old {gender} patient named {name} with {conditions}.\nOutput:\n"
    prompt = prompt.format(age=age, gender=gender, name=name, conditions=conditions)
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

    model.to("cuda")
    # max_length counts the prompt tokens as well as the generated ones.
    outputs = model.generate(**inputs, max_length=256)
    # The decoded text contains the prompt followed by the model's completion.
    text = tokenizer.batch_decode(outputs)[0]
    return text
  finally:
    # Drop the local references so the model can be garbage-collected.
    del model
    del tokenizer

demo = gr.Blocks()
with demo:
  gr.Markdown("<div class='pull-left'><img width='100' src='https://raw.githubusercontent.com/datjandra/PhireBlast/main/phireblast.png'></div><h3>PhireBlast</h3>")
  with gr.Row():
    name = gr.Textbox(label="Name")
    gender = gr.Dropdown(["male", "female"], label="Gender", value="female")
  with gr.Row():
    age = gr.Textbox(label="Age")
    conditions = gr.Textbox(label="Conditions")

  output = gr.Textbox(label="Data", lines=10)
  submit_button = gr.Button("Submit")
  submit_button.click(predict, inputs=[name, gender, age, conditions], outputs=output)

demo.launch(debug=True)

Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
Running on public URL: https://22c64f92b2d385e93a.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)


A new version of the following files was downloaded from https://huggingface.co/microsoft/phi-2:
- configuration_phi.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


A new version of the following files was downloaded from https://huggingface.co/microsoft/phi-2:
- modeling_phi.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
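
The `predict` function returns the full decoded text, which repeats the `Instruct: ... Output:` prompt before the generated FHIR JSON and may end with a truncated object or an end-of-text token. A minimal sketch of post-processing that text is shown here. The helper names `build_prompt` and `extract_first_json` are hypothetical and not part of the notebook, and the sketch assumes the model emits a JSON object somewhere after the `Output:` marker, which is not guaranteed.

```python
import json

def build_prompt(name, gender, age, conditions):
    """Fill the same Instruct/Output template that predict() uses."""
    return (
        "Instruct: Sample data in FHIR JSON format of "
        f"{age} year old {gender} patient named {name} with {conditions}."
        "\nOutput:\n"
    )

def extract_first_json(generated, marker="Output:"):
    """Return the first parseable JSON object after `marker`, or None."""
    # The decoded text repeats the prompt, so start searching after the marker.
    start = generated.find(marker)
    start = 0 if start == -1 else start + len(marker)
    decoder = json.JSONDecoder()
    pos = generated.find("{", start)
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at pos and ignores
            # whatever follows it (for example an end-of-text token).
            obj, _end = decoder.raw_decode(generated, pos)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            pos = generated.find("{", pos + 1)
    return None
```

Returning `None` when nothing parses lets the caller fall back to showing the raw text, which matters here because `max_length=256` can cut the output off in the middle of an object.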


In [None]:
# clean up memory for model reuse
%reset -f
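
`%reset -f` clears every name in the interactive namespace without asking for confirmation. Where keeping the rest of the session is preferable, a narrower cleanup can be sketched as below. The helper name `release_memory` is hypothetical, and the sketch assumes all references to the model have already been dropped; it only asks Python and PyTorch to hand back what is no longer in use.

```python
import gc

def release_memory():
    """Collect unreachable objects and, when CUDA is available, empty PyTorch's cache."""
    collected = gc.collect()  # number of unreachable objects found
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; the garbage collection above is all we can do.
        return collected
    if torch.cuda.is_available():
        # Release cached GPU blocks that no live tensor is using.
        torch.cuda.empty_cache()
    return collected
```

Calling `release_memory()` after a request finishes is one option; whether it frees enough GPU memory to reload the model depends on what else still holds tensor references.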