## Evaluating Trained Models on the [BabyLM Eval Pipeline](https://github.com/babylm/evaluation-pipeline-2023)

In [1]:
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = 'GPT-9.0M-2L-4H-516C-1024I'
checkpoint = 'checkpoint-6624'

tokenizer = AutoTokenizer.from_pretrained('10k-tok')
model = AutoModelForCausalLM.from_pretrained(f'results/models/{model_name}/{checkpoint}')



##### Setup
1. Create a new environment for the pipeline, since its many external dependencies may interfere with your development environment. Follow the installation instructions at [BabyLM/evaluation-pipeline](https://github.com/babylm/evaluation-pipeline). These are some of the issues I ran into:
    - If the pinned torch version does not match your CUDA drivers, remove the version specification in `setup.py`.
    - On some (older) machines, `git-lfs` does not ship with `git`, and you may need to install it manually.
    - My Docker container running Ubuntu did not have `gcc` installed either, which is needed to compile one of the dependencies (`sklearn`); install it with `sudo apt install gcc`.
    - They use `python==3.10.12` in their [demo notebook](https://colab.research.google.com/drive/1HX2D3wztO81tKcqCeV_ecRcEUseBVuTc?usp=sharing); this does not work with my CUDA installation.
        Specifically, there is a [bug](https://discuss.pytorch.org/t/issues-on-using-nn-dataparallel-with-python-3-10-and-pytorch-1-11/146745/13) that requires changing `collections` to `collections.abc` on line `51` of `lib/python3.10/site-packages/torch/cuda/nccl.py`.

2. Unzip the dataset: `unzip filter_data.zip`. 


```sh
# create env 
conda create -n babylm python --solver=libmamba # libmamba will save you years of waiting
conda activate babylm 

cd evaluation-pipeline # you need to make sure you've fetched this submodule with git
pip install -e '.[dev]'
pip install torch 

unzip filter_data.zip
```
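
Most of the issues listed above come down to a system tool that is missing. A minimal preflight sketch, assuming the tools can be found on `PATH` under the names `conda`, `gcc`, `git-lfs` and `unzip`; the helper name `missing_tools` is made up for illustration and is not part of the pipeline:

```python
import shutil

# Tools mentioned in the setup notes above; extend the list as needed.
REQUIRED_TOOLS = ['conda', 'gcc', 'git-lfs', 'unzip']

def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

if __name__ == '__main__':
    missing = missing_tools()
    if missing:
        print('Missing tools:', ', '.join(missing))
    else:
        print('All required tools found.')
```

Running this before `pip install -e '.[dev]'` is cheaper than discovering a missing compiler halfway through the install.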

##### Running
I explain how to run the pipeline below, but I strongly recommend writing (or converting to) a `.py`/`.sh` script instead if you are running a batch of these. Notebooks can time out after a while, or, you know, you might want to close your computer and touch some grass while this stuff is running. See [delftblue-guide.md](./delftblue-guide.md) for more info.

```sh 
# copy over the tokenizer (hf doesn't seem to follow symlinks, but it's only 0.5 MB)
cp 10k-tok/* results/models/{model_name}

python babylm_eval.py 'path/to/model_and_tokenizer' decoder # encoder for BERT
./finetune_all_tasks.sh 'path/to/model_and_tokenizer' 
python collect_results.py 'path/to/model_and_tokenizer'
``` 
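
For a batch of models, the three commands above can be wrapped in a small standalone script. This is only a sketch under a few assumptions: it runs inside the activated `babylm` environment, the tokenizer files have already been copied into each model directory, and decoder models can be recognised by `GPT` in their directory name. The function names are hypothetical.

```python
import subprocess
from pathlib import Path

def model_type_for(model_name: str) -> str:
    """Assumption: decoder models have 'GPT' in their directory name."""
    return 'decoder' if 'GPT' in model_name else 'encoder'

def eval_commands(model_path: str) -> list[list[str]]:
    """The three pipeline steps from above, as argument lists."""
    model_type = model_type_for(Path(model_path).name)
    return [
        ['python', 'babylm_eval.py', model_path, model_type],
        ['./finetune_all_tasks.sh', model_path],
        ['python', 'collect_results.py', model_path],
    ]

def run_batch(models_dir: str, pipeline_dir: str = 'evaluation-pipeline') -> None:
    """Evaluate every model directory under `models_dir`, one after another."""
    for model_dir in sorted(Path(models_dir).iterdir()):
        if not model_dir.is_dir():
            continue
        # absolute path, because the commands run from inside pipeline_dir
        for command in eval_commands(str(model_dir.resolve())):
            # check=True stops the batch at the first failing step
            subprocess.run(command, cwd=pipeline_dir, check=True)

# Usage:
# run_batch('results/models')
```

Running the steps sequentially with `check=True` keeps a failed evaluation from silently feeding into `collect_results.py`.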

In [10]:
import os, shlex, subprocess

def do_eval(model_path: str, tok_path=os.path.abspath('10k-tok')):
    """Start the BabyLM evaluation for one model in the background."""

    # absolute path, because the commands below run from evaluation-pipeline/
    model_path = os.path.abspath(model_path.rstrip('/'))
    model_name = os.path.basename(model_path)
    model_type = 'decoder' if 'GPT' in model_name else 'encoder'

    # copy tokenizer configs to model directory for BabyLM
    subprocess.check_call(
        f'cp {shlex.quote(tok_path)}/* {shlex.quote(model_path)}', shell=True)

    print(f'''
        Evaluating \033[1m{model_name}\033[0m 
        with tokenizer at {tok_path}
    ''')

    os.makedirs('logs', exist_ok=True)
    log_file = os.path.join('logs', f'{model_name}.log')

    # `conda activate` only works in a non-interactive shell after sourcing conda's hook
    commands = f'''
        source "$(conda info --base)/etc/profile.d/conda.sh"
        conda activate babylm
        cd evaluation-pipeline

        python babylm_eval.py {shlex.quote(model_path)} {model_type}
        ./finetune_all_tasks.sh {shlex.quote(model_path)}
        python collect_results.py {shlex.quote(model_path)}
    '''

    # run in bash without blocking the notebook; stdout and stderr both go to the log
    with open(log_file, 'w') as log:
        subprocess.Popen(['/bin/bash', '-c', commands], stdin=subprocess.DEVNULL,
                         stdout=log, stderr=subprocess.STDOUT)


In [11]:
os.getcwd()

'/home/jovyan/work/code/common'

In [12]:
do_eval('/home/jovyan/work/code/common/results/models/GPT-9.0M-2L-4H-516C-1024I/')


        Evaluating [1mGPT-9.0M-2L-4H-516C-1024I[0m 
        with tokenizer at /home/jovyan/work/code/common/10k-tok
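
Since `do_eval` returns immediately and the pipeline keeps running in the background, the log file is the only place to see progress. A small sketch for peeking at the end of it from the notebook; the helper name `tail` is made up here, and the path in the usage line assumes the `logs/<model_name>.log` naming used by `do_eval`:

```python
from collections import deque

def tail(log_file: str, n: int = 20) -> list[str]:
    """Return the last `n` lines of `log_file`, without trailing newlines."""
    with open(log_file, errors='replace') as f:
        # deque with maxlen keeps only the final n lines while streaming the file
        return [line.rstrip('\n') for line in deque(f, maxlen=n)]

# Usage:
# print('\n'.join(tail('logs/GPT-9.0M-2L-4H-516C-1024I.log')))
```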
    


#### Push to HuggingFace


In [None]:
model.push_to_hub(model_name)
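
The cell above uploads only the model weights and config. Because the tokenizer is loaded from a separate directory (`10k-tok`), it may be worth pushing it to the same repository so the Hub copy can be loaded on its own. A hedged sketch: the wrapper name `push_model_and_tokenizer` is hypothetical, and it assumes you are already authenticated (for example via `huggingface-cli login`) and that both objects expose `push_to_hub`, as `transformers` models and tokenizers do.

```python
def push_model_and_tokenizer(model, tokenizer, repo_id: str) -> None:
    """Push both objects to the same Hub repository."""
    model.push_to_hub(repo_id)
    tokenizer.push_to_hub(repo_id)

# Usage, with the objects loaded in the first cell:
# push_model_and_tokenizer(model, tokenizer, model_name)
```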