# Supplementary Notebook for "Training Your Own Small Language Model from Scratch"

This IPython notebook supplements [the main notebook](slm_pretraining_sft.ipynb) and is required to complete its steps. Refer to [slm_pretraining_sft.ipynb](slm_pretraining_sft.ipynb) for details about the environment setup, including the Docker container.


## Step 4: Launch a Megatron GPT Server

During Step 4 in [slm_pretraining_sft.ipynb](slm_pretraining_sft.ipynb), you need to launch a text generation server (using [megatron_gpt_eval.py](https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/megatron_gpt_eval.py)) to get generations from the SLM that you trained.

Run the following command to launch a Megatron GPT server. Make sure to use the correct .nemo filepath.

In [None]:
# Step 4
MODEL_FILE = "results/megatron_gpt/checkpoints/megatron_gpt.nemo"

!python /opt/NeMo/examples/nlp/language_modeling/megatron_gpt_eval.py \
  gpt_model_file={MODEL_FILE} \
  server=True \
  port=55555
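
A wrong path is easiest to catch before the server starts loading the model. The following is a minimal sketch of such a check, not part of NeMo: `check_nemo_file` is a hypothetical helper, and it assumes that a .nemo file is a tar archive containing `model_config.yaml`, as the conversion commands in Step 5-1 suggest.

```python
import os
import tarfile


def check_nemo_file(path: str) -> list:
    """Return the archive member names, or raise if the file looks wrong.

    Hypothetical helper: assumes a .nemo file is a tar archive that
    contains model_config.yaml.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No such .nemo file: {path}")
    if not tarfile.is_tarfile(path):
        raise ValueError(f"Not a tar archive: {path}")
    with tarfile.open(path) as archive:
        names = archive.getnames()
    # Member names may or may not carry a leading "./", so compare basenames.
    if not any(os.path.basename(n) == "model_config.yaml" for n in names):
        raise ValueError(f"model_config.yaml not found in {path}")
    return names


# Example usage (uncomment in the notebook):
# check_nemo_file("results/megatron_gpt/checkpoints/megatron_gpt.nemo")
```

If the call raises, fix `MODEL_FILE` before launching the server.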

After launching the server, you can get generations with Python code like the following:

```python
import json
import requests


class MegatronGPTEvalClient:
    """Minimal client for the text generation server launched above."""

    def __init__(self,
                 batch_size: int = 1,
                 port_num: int = 55555,
                 headers=None):
        self.batch_size = batch_size
        self.port_num = port_num
        # Avoid a mutable default argument; fall back to JSON headers.
        self.headers = headers or {"Content-Type": "application/json"}

    def generate(self, text: str) -> str:
        # The prompt is repeated batch_size times; only the first result is used.
        data = {"sentences": [text] * self.batch_size,
                "tokens_to_generate": 32,
                "temperature": 1.0,
                "add_BOS": True,
                "top_k": 0,
                "top_p": 0.9,
                "greedy": False,
                "all_probs": False,
                "repetition_penalty": 1.2,
                "min_tokens_to_generate": 2}
        resp = requests.put('http://localhost:{}/generate'.format(self.port_num),
                            data=json.dumps(data),
                            headers=self.headers)
        resp.raise_for_status()  # Fail early on an HTTP error status
        sentences = resp.json()['sentences']
        generation = sentences[0]
        # Strip the leading prompt so that only the generated text is returned.
        return generation[len(text):].lstrip()


client = MegatronGPTEvalClient()
client.generate("How are you?")
```
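
Requests sent before the server is listening fail with a connection error. The sketch below polls the port before the client is used. `wait_for_port` is a hypothetical helper, and the timeout and polling interval are arbitrary assumptions. An open port only shows that something is listening on it, not that generation will succeed.

```python
import socket
import time


def wait_for_port(port: int = 55555, host: str = "localhost",
                  timeout: float = 600.0, interval: float = 5.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout elapses.

    Hypothetical helper; returns True once a connection is accepted and
    False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Open and immediately close a connection as a liveness probe.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example usage (uncomment once the server cell is running):
# if wait_for_port(55555):
#     print(MegatronGPTEvalClient().generate("How are you?"))
```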

## Step 5: Launch a Megatron GPT Server after Fine-Tuning (Advanced)



### Step 5-1: .nemo File Conversion

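This conversion step is required because NeMo-Aligner's script creates a slightly different .nemo checkpoint, which is not compatible with [megatron_gpt_eval.py](https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/megatron_gpt_eval.py). In particular, the `target` value in its `model_config.yaml` points to NeMo-Aligner's `GPTSFTModel` class, and the commands in this step change it to NeMo's `MegatronGPTModel`.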


In [None]:
# Confirm that the .nemo file exists
!ls results/megatron_gpt_sft/checkpoints/megatron_gpt_sft.nemo

Run the following commands to convert the .nemo file into a new, compatible .nemo checkpoint.

In [None]:
# Untar the .nemo checkpoint
!cd results/megatron_gpt_sft/checkpoints && mkdir -p megatron_gpt_sft_fixed && tar -xf megatron_gpt_sft.nemo -C megatron_gpt_sft_fixed

# Update the target value
!cd results/megatron_gpt_sft/checkpoints/megatron_gpt_sft_fixed && sed -i 's|target: nemo_aligner.models.nlp.gpt.gpt_sft_model.GPTSFTModel|target: nemo.collections.nlp.models.language_modeling.megatron_gpt_model.MegatronGPTModel|' model_config.yaml

# Tar the files to create a new .nemo
!cd results/megatron_gpt_sft/checkpoints/megatron_gpt_sft_fixed && tar -cf ../megatron_gpt_sft_fixed.nemo ./*

In [None]:
# Confirm that the new .nemo file exists
!ls results/megatron_gpt_sft/checkpoints/megatron_gpt_sft_fixed.nemo
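
The untar, edit, and re-tar sequence can also be expressed in Python when shell tools are unavailable. This is a minimal sketch under the same assumptions as the shell commands (an uncompressed tar archive with `model_config.yaml` at its root). `convert_nemo` is a hypothetical helper, not a NeMo or NeMo-Aligner API.

```python
import os
import tarfile
import tempfile

OLD_TARGET = "nemo_aligner.models.nlp.gpt.gpt_sft_model.GPTSFTModel"
NEW_TARGET = ("nemo.collections.nlp.models.language_modeling."
              "megatron_gpt_model.MegatronGPTModel")


def convert_nemo(src: str, dst: str) -> None:
    """Copy the src archive to dst, rewriting the target in model_config.yaml."""
    with tempfile.TemporaryDirectory() as workdir:
        # Untar the .nemo checkpoint (only do this for archives you trust).
        with tarfile.open(src) as archive:
            archive.extractall(workdir)

        # Update the target value, mirroring the sed substitution.
        config_path = os.path.join(workdir, "model_config.yaml")
        with open(config_path, encoding="utf-8") as f:
            config = f.read()
        config = config.replace(f"target: {OLD_TARGET}", f"target: {NEW_TARGET}")
        with open(config_path, "w", encoding="utf-8") as f:
            f.write(config)

        # Tar every top-level entry into a new .nemo under a "./" prefix.
        with tarfile.open(dst, "w") as archive:
            for name in sorted(os.listdir(workdir)):
                archive.add(os.path.join(workdir, name), arcname=f"./{name}")


# Example usage (uncomment in the notebook):
# ckpt_dir = "results/megatron_gpt_sft/checkpoints"
# convert_nemo(f"{ckpt_dir}/megatron_gpt_sft.nemo",
#              f"{ckpt_dir}/megatron_gpt_sft_fixed.nemo")
```

Unlike the shell version, this sketch works in a temporary directory, so it leaves no `megatron_gpt_sft_fixed` folder behind.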

### Step 5-2: Launch a Megatron GPT Server

The newly created .nemo checkpoint is compatible with [megatron_gpt_eval.py](https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/megatron_gpt_eval.py), so we can launch a text generation server in the same way as for the pre-trained checkpoint in Step 4.

In [None]:
MODEL_FILE = "results/megatron_gpt_sft/checkpoints/megatron_gpt_sft_fixed.nemo"

!python /opt/NeMo/examples/nlp/language_modeling/megatron_gpt_eval.py \
  gpt_model_file={MODEL_FILE} \
  server=True \
  port=55555