# laser-dolphin-mixtral-2x7b-dpo example

To recreate laser-dolphin-mixtral-2x7b-dpo, you have two options:

1. Merge two models and then apply the laser process to the merged product.

2. Laser two separate models first and then merge them.


The original laser-dolphin-mixtral-2x7b-dpo model uses the first method. Instructions for both methods are provided for completeness.


#### General notes

+ **Colab Compatibility**: This script fails in Colab with a shell script error related to the `-f 5` flag (`lm_eval: error: unrecognized arguments: -f 5`). It runs correctly in local environments, including Jupyter and VS Code.

+ **Scalability for Multiple Experts**: Method 2 is scalable for any number of experts, limited only by available compute resources. Simply add more models with appropriate positive prompts.

+ **Handling VRAM Limitations**: For environments with VRAM constraints, consider using the `load_in_4bit` flag:

```bash
lm_eval --model hf \
    --model_args pretrained=$MODEL_NAME,trust_remote_code=True,load_in_4bit=True \
    --tasks mmlu -f 5 \
    --device cuda:0 \
    --batch_size 1
```

**Note:** This is not recommended for optimal results, but it should allow the script to run.
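
As a rough sketch, the command above can be assembled in Python so that `load_in_4bit` is toggled per run. The helper name `build_lm_eval_command` and its parameters are hypothetical and not part of laserRMT or lm_eval; the flags themselves are copied from the command above. The snippet only builds and prints the command, it does not run it.

```python
import shlex


def build_lm_eval_command(model_name, load_in_4bit=False, device="cuda:0", batch_size=1):
    """Return the lm_eval invocation from the notes above as an argv list.

    Hypothetical helper for illustration; not part of laserRMT or lm_eval.
    """
    model_args = [f"pretrained={model_name}", "trust_remote_code=True"]
    if load_in_4bit:
        # Only for VRAM-constrained environments; the notes above say this
        # is not recommended for optimal results.
        model_args.append("load_in_4bit=True")
    return [
        "lm_eval", "--model", "hf",
        "--model_args", ",".join(model_args),
        "--tasks", "mmlu", "-f", "5",
        "--device", device,
        "--batch_size", str(batch_size),
    ]


command = build_lm_eval_command("macadeliccc/dolphin-mixtral-2x7b", load_in_4bit=True)
# shlex.join renders the argv list as a single shell-safe string.
print(shlex.join(command))
```

Keeping the arguments as a list avoids quoting problems if the command is later passed to `subprocess.run`.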

+ **Repository Setup**:

```python
!git clone https://github.com/cognitivecomputations/laserRMT.git
%cd laserRMT
!pip install -r requirements.txt
```

# Method 1

I made a dolphin-mixtral-2x7b that has not been lasered for this demo. The original repo is `macadeliccc/laser-dolphin-mixtral-2x7b-dpo`, and there is also a 4x7b variant.

+ **Execution commands:**

```python
!chmod +x ./script_lm_eval.sh
!./script_lm_eval.sh "macadeliccc/dolphin-mixtral-2x7b"
```

The output of Method 1 is the lasered version of dolphin-mixtral-2x7b.

The original "pre-lasered" version of the model is available [here](https://huggingface.co/macadeliccc/laser-dolphin-mixtral-2x7b-dpo)

# Method 2

This method involves steps that extend beyond the scope of the laserRMT repository. Only essential information and resources are included for brevity.

+ **Step 1:** Select your two models, using the context of the base model as your reference. Do not merge models with vastly different contexts using this method.
+ **Step 2:** Laser each model individually.
+ **Step 3:** Merge the models using the mixtral branch of mergekit.
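
A minimal sketch of the check in Step 1, assuming "context" refers to the `max_position_embeddings` value in each model's Hugging Face `config.json`. The helper names and the `max_ratio` threshold are hypothetical; neither laserRMT nor mergekit defines what counts as "vastly different", so adjust the threshold to your own judgment.

```python
import json
from pathlib import Path


def read_context_length(model_dir):
    """Read max_position_embeddings from a locally downloaded model's config.json."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    return config["max_position_embeddings"]


def contexts_compatible(base_length, expert_lengths, max_ratio=2.0):
    """Return True if every expert's context length is within max_ratio of the base.

    max_ratio is a hypothetical threshold chosen for illustration only.
    """
    for length in expert_lengths:
        ratio = max(base_length, length) / min(base_length, length)
        if ratio > max_ratio:
            return False
    return True


# Illustrative numbers only, not values read from any specific model.
print(contexts_compatible(32768, [32768, 32768]))  # True
print(contexts_compatible(32768, [4096]))          # False
```

With local copies of the models, `read_context_length` can supply the values passed to `contexts_compatible`.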

```python
!./script_lm_eval.sh "cognitivecomputations/dolphin-2.1-mistral-7b"
```

Run these cells separately. This process will likely take several hours.

```python
!./script_lm_eval.sh "teknium/OpenHermes-2.5-Mistral-7B"
```

This is where the notebook diverges from the scope of laserRMT.

Once both models have been lasered successfully, you are ready to begin the merge process.


```python
%cd ..
!git clone --branch mixtral https://github.com/cg123/mergekit.git
%cd mergekit
!pip install -r requirements.txt
```

Create your `config.yml` file.

More information on this process is available in the [mergekit MoE documentation](https://github.com/cg123/mergekit/blob/4de2a3310eb135363d6588e92f2ba5fb20893361/moe.md).

+ **Example config**:

```yaml
base_model: cognitivecomputations/dolphin-2.6-mistral-7b-dpo
gate_mode: hidden
dtype: bfloat16
experts:
  - source_model: teknium/OpenHermes-2.5-Mistral-7B
    positive_prompts:
      - "instruction"
      - "solutions"
      - "chat"
      - "questions"
      - "comprehension"

  - source_model: cognitivecomputations/dolphin-2.6-mistral-7b-dpo
    positive_prompts:
      - "mathematics"
      - "optimization"
      - "code"
      - "step-by-step"
      - "science"
```

Place your `config.yml` file in the `mergekit/examples` directory, or anywhere else you prefer.
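
Because every expert entry in the config has the same shape, the file can be generated for any number of experts. The sketch below is one hedged way to do that using only the standard library. `build_moe_config` is a hypothetical helper, not part of mergekit, and it emits only the fields shown in the example config.

```python
import json


def build_moe_config(base_model, experts, gate_mode="hidden", dtype="bfloat16"):
    """Render a config shaped like the example above as YAML text.

    experts is a list of (source_model, positive_prompts) pairs.
    Hypothetical helper for illustration; not part of mergekit.
    """
    lines = [
        f"base_model: {base_model}",
        f"gate_mode: {gate_mode}",
        f"dtype: {dtype}",
        "experts:",
    ]
    for source_model, positive_prompts in experts:
        lines.append(f"  - source_model: {source_model}")
        lines.append("    positive_prompts:")
        for prompt in positive_prompts:
            # json.dumps yields a double-quoted string, which YAML also
            # accepts for ordinary text prompts.
            lines.append(f"      - {json.dumps(prompt)}")
    return "\n".join(lines) + "\n"


config_text = build_moe_config(
    "cognitivecomputations/dolphin-2.6-mistral-7b-dpo",
    [
        ("teknium/OpenHermes-2.5-Mistral-7B", ["instruction", "chat"]),
        ("cognitivecomputations/dolphin-2.6-mistral-7b-dpo", ["mathematics", "code"]),
    ],
)
print(config_text)
```

To add a third or fourth expert, append another `(source_model, positive_prompts)` pair to the list, then write `config_text` to your config file.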

+ **Run the command**:

```bash
mergekit-moe examples/your-config.yml ./your-output-directory
```

The result of this method is a 2x7b Mixtral model that consists of two lasered Mistral models.

If you wish to evaluate the model afterwards, you can use my [colab](https://colab.research.google.com/drive/1FpwgsGzCR4tORTxAwUxpN3PcP22En2xk#scrollTo=cGJR1zWJsYXG), which provides an evaluation script you can use if you decide to upload the model to Hugging Face. The script also works with safetensors, but you will need to replicate it locally for that.




### Resources 

+ [mergekit](https://github.com/cg123/mergekit/tree/mixtral)
+ [cognitivecomputations/dolphin-2.6-mistral-7b-dpo](https://huggingface.co/cognitivecomputations/dolphin-2.6-mistral-7b-dpo)
+ [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B)