# Download a Fine-Tuned Model
This notebook demonstrates how to download a fine-tuned model that you've created using LLM Engine and upload it to the Hugging Face Hub!

**This notebook is an extension of the previous fine-tuning notebook on ScienceQA.**

# Packages Required
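For this demo, we'll be using the `scale-llm-engine` package, the `datasets` package for downloading our fine-tuning dataset, `transformers`, and `huggingface_hub` for uploading our model to the Hugging Face Hub. The data preparation and download cells also import `smart_open`, `pandas`, and `requests`.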


In [None]:
!pip install scale-llm-engine
!pip install transformers
!pip install datasets
!pip install smart_open pandas requests

# Data Preparation
Let's load the dataset with the Hugging Face `datasets` library and view its features.

In [None]:
from datasets import load_dataset
from smart_open import smart_open
import pandas as pd

dataset = load_dataset('derek-thomas/ScienceQA')
dataset['train'].features

Now, let's convert the dataset into the format LLM Engine expects: a CSV file with `prompt` and `response` columns.

In [None]:
choice_prefixes = [chr(ord('A') + i) for i in range(26)] # A-Z
def format_options(options, choice_prefixes):
    return ' '.join([f'({c}) {o}' for c, o in zip(choice_prefixes, options)])

def format_prompt(r, choice_prefixes):
    options = format_options(r['choices'], choice_prefixes)
    return f'''Context: {r["hint"]}\nQuestion: {r["question"]}\nOptions:{options}\nAnswer:'''

def format_label(r, choice_prefixes):
    return choice_prefixes[r['answer']]

def convert_dataset(ds):
    prompts = [format_prompt(i, choice_prefixes) for i in ds if i['hint'] != '']
    labels = [format_label(i, choice_prefixes) for i in ds if i['hint'] != '']
    df = pd.DataFrame.from_dict({'prompt': prompts, 'response': labels})
    return df

save_to_s3 = False
df_train = convert_dataset(dataset['train'])
if save_to_s3:
    train_url = 's3://...'
    val_url = 's3://...'
    with smart_open(train_url, 'wb') as f:
        df_train.to_csv(f)

    df_val = convert_dataset(dataset['validation'])
    with smart_open(val_url, 'wb') as f:
        df_val.to_csv(f)
else:
    # Gists of the already processed datasets
    train_url = 'https://gist.githubusercontent.com/jihan-yin/43f19a86d35bf22fa3551d2806e478ec/raw/91416c09f09d3fca974f81d1f766dd4cadb29789/scienceqa_train.csv'
    val_url = 'https://gist.githubusercontent.com/jihan-yin/43f19a86d35bf22fa3551d2806e478ec/raw/91416c09f09d3fca974f81d1f766dd4cadb29789/scienceqa_val.csv'

df_train
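To make the target format concrete, here is a minimal sketch that runs the same formatting logic on a single row. The row itself is made up for illustration and does not come from ScienceQA; the two helper functions are copied from the cell above so that the snippet runs on its own.

```python
# Same helpers as in the cell above, repeated so this snippet is self-contained.
choice_prefixes = [chr(ord('A') + i) for i in range(26)]  # A-Z

def format_options(options, choice_prefixes):
    return ' '.join(f'({c}) {o}' for c, o in zip(choice_prefixes, options))

def format_prompt(r, choice_prefixes):
    options = format_options(r['choices'], choice_prefixes)
    return f'Context: {r["hint"]}\nQuestion: {r["question"]}\nOptions:{options}\nAnswer:'

# Hypothetical row with the fields the formatting code reads.
sample_row = {
    'hint': 'A magnet attracts iron.',
    'question': 'Which object will the magnet attract?',
    'choices': ['a wooden spoon', 'an iron nail'],
    'answer': 1,  # index into 'choices'
}

sample_prompt = format_prompt(sample_row, choice_prefixes)
sample_label = choice_prefixes[sample_row['answer']]

print(sample_prompt)
print('response:', sample_label)  # response: B
```

The `prompt` column holds the full text up to and including `Answer:`, and the `response` column holds only the letter of the correct option.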

# Fine-tune
Now, we can fine-tune the model using LLM Engine.

In [None]:
import os
os.environ['SCALE_API_KEY'] = 'xxx'

from llmengine import FineTune

response = FineTune.create(
    model="llama-2-7b",
    training_file=train_url,
    validation_file=val_url,
    hyperparameters={
        'lr':2e-4,
    },
    suffix='science-qa-llama'
)
run_id = response.id
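Before submitting a job like the one above, it can help to confirm that the training file really has the `prompt` and `response` columns. The following is a minimal sketch using only the standard library; `check_finetune_csv` and `REQUIRED_COLUMNS` are hypothetical names introduced here, not part of LLM Engine, and the sample CSV text is made up.

```python
import csv
import io

# Column names taken from the data preparation step; the helper is illustrative only.
REQUIRED_COLUMNS = {"prompt", "response"}

def check_finetune_csv(csv_text):
    """Return the number of data rows, or raise ValueError if the CSV looks malformed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Missing columns: {sorted(missing)}")
    n_rows = 0
    for row in reader:
        if not row["prompt"] or not row["response"]:
            raise ValueError(f"Empty prompt or response in data row {n_rows + 1}")
        n_rows += 1
    return n_rows

# Made-up CSV with one row; the quoted prompt spans several lines.
sample_csv = (
    "prompt,response\n"
    '"Context: c\nQuestion: q\nOptions:(A) x (B) y\nAnswer:",B\n'
)
print(check_finetune_csv(sample_csv))  # 1
```

Extra columns, such as the index column that `DataFrame.to_csv` writes by default, pass this check unchanged.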

We can poll the job status once a minute until the job succeeds.

In [None]:
import time

while True:
    job_status = FineTune.get(run_id).status
    print(job_status)
    if job_status == 'SUCCESS':
        break
    time.sleep(60)

fine_tuned_model = FineTune.get(run_id).fine_tuned_model
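The loop above only exits on `'SUCCESS'`, so a job that fails or is cancelled would keep it polling forever. A minimal sketch of a more defensive wait follows. The helper name `wait_for_status` is hypothetical, and the failure status strings `'FAILURE'` and `'CANCELLED'` are assumptions that should be checked against the LLM Engine documentation.

```python
import time

def wait_for_status(get_status, success="SUCCESS",
                    failure_states=("FAILURE", "CANCELLED"),  # assumed status names
                    poll_seconds=60, max_polls=None, sleep=time.sleep):
    """Poll get_status() until it reports success; raise on failure or after max_polls."""
    polls = 0
    while True:
        status = get_status()
        print(status)
        if status == success:
            return status
        if status in failure_states:
            raise RuntimeError(f"Job ended with status {status}")
        polls += 1
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"Job still {status} after {polls} polls")
        sleep(poll_seconds)

# Usage in this notebook (not executed here, since it needs a real job):
# wait_for_status(lambda: FineTune.get(run_id).status)
```

Passing `sleep` and `get_status` as arguments keeps the helper easy to try out with a fake status sequence before pointing it at a real job.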

# Downloading our Fine-Tuned Model
Let's download the weights for the new fine-tuned model using LLM Engine.

In [None]:
from llmengine import Model

# Reuse the model name fetched after the job completed
response = Model.download(fine_tuned_model, download_format="hugging_face")
print(response.urls)

We now have a dictionary mapping filenames to URLs that point to the file(s) where our fine-tuned model lives. We can download these files either synchronously or asynchronously; the helper below downloads them synchronously, one file at a time.

In [None]:
import os
import requests

def download_files(url_dict, directory):
    """
    Download files from given URLs to specified directory.
    
    Parameters:
    - url_dict: Dictionary of {file_name: url} pairs.
    - directory: Directory to save the files.
    """
    # Create the target directory if it does not exist yet
    os.makedirs(directory, exist_ok=True)
    
    for file_name, url in url_dict.items():
        response = requests.get(url, stream=True)
        response.raise_for_status()  # Raise an exception for HTTP errors
        file_path = os.path.join(directory, file_name)
        
        with open(file_path, 'wb') as file:
            for chunk in response.iter_content(chunk_size=8192):
                file.write(chunk)

    

In [None]:
output_directory = "YOUR_MODEL_DIR"
download_files(response.urls, output_directory) 
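For the asynchronous route mentioned earlier, one option is to fetch several files at once with a thread pool. This is a sketch under assumptions rather than the notebook's own method: `fetch_to_path` and `download_files_concurrently` are hypothetical helpers, and the default fetcher uses `urllib` from the standard library instead of `requests`.

```python
import os
import shutil
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def fetch_to_path(url, file_path):
    """Stream one URL to disk using only the standard library."""
    with urllib.request.urlopen(url) as response, open(file_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return file_path

def download_files_concurrently(url_dict, directory, fetch=fetch_to_path, max_workers=4):
    """Download {file_name: url} pairs in parallel; return {file_name: local_path}."""
    os.makedirs(directory, exist_ok=True)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            name: pool.submit(fetch, url, os.path.join(directory, name))
            for name, url in url_dict.items()
        }
        # result() re-raises any exception that occurred in the worker thread.
        return {name: future.result() for name, future in futures.items()}

# Usage in this notebook (not executed here, since it needs real URLs):
# download_files_concurrently(response.urls, output_directory)
```

Threads suit this case because the work is dominated by network I/O; `max_workers` caps the number of simultaneous downloads.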

Lastly, we can upload our downloaded model to the Hugging Face Hub.

In [None]:
!pip install huggingface-hub

In [None]:
import os
from huggingface_hub import Repository

HF_USERNAME = "YOUR_HUGGINGFACE_USERNAME"
HF_TOKEN = "YOUR_HUGGINGFACE_TOKEN"

def upload_to_huggingface(directory, model_name, token):
    """
    Upload files from a directory to the Hugging Face Hub as a new model.

    Parameters:
    - directory: Directory containing the files to be uploaded.
    - model_name: Name of the new model.
    - token: Your Hugging Face authentication token.
    """
    
    # Create a repository with the given name
    repo = Repository(directory, clone_from=f"{HF_USERNAME}/{model_name}", use_auth_token=token)
    
    # Commit and push files
    repo.push_to_hub()

model_name = "my-new-model"
    
upload_to_huggingface(output_directory, model_name, HF_TOKEN)
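`Repository` is a git-based helper that has been deprecated in more recent `huggingface_hub` releases, and it expects to clone into an empty folder or an existing git checkout, which may not hold for a directory that already contains downloaded weights. As an alternative, here is a minimal sketch using the HTTP-based `HfApi` client. The helper names `build_repo_id` and `upload_folder_to_hub` are hypothetical and introduced only for illustration.

```python
def build_repo_id(username, model_name):
    """Join a Hub namespace and a model name into a repo id such as 'user/model'."""
    return f"{username}/{model_name}"

def upload_folder_to_hub(directory, username, model_name, token):
    """Create the model repo if needed and upload every file in `directory`."""
    # Imported lazily so the helper can be defined before huggingface_hub is installed.
    from huggingface_hub import HfApi

    api = HfApi(token=token)
    repo_id = build_repo_id(username, model_name)
    # exist_ok=True makes this a no-op when the repository already exists.
    api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)
    # Upload the whole folder to the repository.
    api.upload_folder(folder_path=directory, repo_id=repo_id, repo_type="model")
    return repo_id

# Usage in this notebook (not executed here, since it needs a real token):
# upload_folder_to_hub(output_directory, HF_USERNAME, model_name, HF_TOKEN)
```

This route does not require a local git checkout, so it can be pointed directly at the directory the weights were downloaded into.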