Model Interaction Code

Create virtual environment

  • Set up the virtual environment and install the dependencies
    conda create -n msqa python=3.10
    conda activate msqa
    git clone https://github.com/ModelInteraction/MSQA.git
    conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch
    pip install -r requirements.txt
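  • (Optional) A quick check, outside the repository scripts, that the CUDA build of PyTorch installed correctly:
    # Run inside the msqa environment; expects PyTorch 1.12.0 built against CUDA 11.3.
    import torch
    print(torch.__version__)          # 1.12.0
    print(torch.version.cuda)         # 11.3
    print(torch.cuda.is_available())  # True on a machine with a working GPU driver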

Process Azure documentation for pretraining

  • Go to pretrain_azure_doc/ and run the commands below to download the Azure documentation used for pretraining
    chmod +x clone_repos.sh
    ./clone_repos.sh
  • Extract and rename the markdown files and save them to pretrain_azure_doc/data/
    python save_azure.py
  • Split the markdown files into JSON files capped at a maximum token length for pretraining, and save them to pretrain_azure_doc/azure_json_output/ (illustrated by the sketch after this list)
    python process_azure.py
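  • The sketch below illustrates the splitting step; the tokenizer choice, chunk size, output file name, and record schema are assumptions, and process_azure.py defines the actual logic.
    # Illustrative sketch only: split markdown files into token-limited JSON records.
    import json, pathlib
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in tokenizer (assumption)
    max_tokens = 512                                   # matches --model_max_length used later

    records = []
    for md_file in pathlib.Path("pretrain_azure_doc/data").glob("*.md"):
        ids = tokenizer(md_file.read_text(encoding="utf-8"))["input_ids"]
        for start in range(0, len(ids), max_tokens):
            records.append({"text": tokenizer.decode(ids[start:start + max_tokens]),
                            "source": md_file.name})

    out_dir = pathlib.Path("pretrain_azure_doc/azure_json_output")
    out_dir.mkdir(exist_ok=True)
    with open(out_dir / "azure_pretrain.json", "w", encoding="utf-8") as f:  # hypothetical file name
        json.dump(records, f, ensure_ascii=False, indent=2)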

Process MSQA data

Note: we only include sample MSQA data here; the full dataset will be released when the paper is accepted.

  • Go to msqa_process/
  • Post-process the MSQA data collected from the Microsoft Q&A forum
    python post_process.py
  • Split the data into train and test JSON files, saved to msqa_process/data/MSQA_train.json and msqa_process/data/MSQA_test.json, respectively (a rough sketch of the split follows this list)
    python split.py
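  • The sketch below illustrates the split; split.py defines the actual ratio, input file, and record structure, so those shown here are assumptions.
    # Illustrative 90/10 train/test split; input path and split ratio are assumptions.
    import json, random

    with open("msqa_process/data/MSQA_processed.json", encoding="utf-8") as f:  # hypothetical file name
        records = json.load(f)

    random.seed(42)
    random.shuffle(records)
    cut = int(0.9 * len(records))

    with open("msqa_process/data/MSQA_train.json", "w", encoding="utf-8") as f:
        json.dump(records[:cut], f, ensure_ascii=False, indent=2)
    with open("msqa_process/data/MSQA_test.json", "w", encoding="utf-8") as f:
        json.dump(records[cut:], f, ensure_ascii=False, indent=2)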

Pretrain and finetune

  • Go to train/

  • Pretrain on the Azure documentation with DeepSpeed using the following command

    deepspeed train.py \
    --model_name_or_path {YOUR_MODEL_PATH} \
    --data_path {AZURE_JSON_PATH} \
    --output_dir {PRETRAIN_MODEL_SAVE_PATH} \
    --num_train_epochs 8 \
    --model_max_length 512 \
    --per_device_train_batch_size 64 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 500 \
    --save_total_limit 4 \
    --learning_rate 2e-5 \
    --warmup_steps 2 \
    --logging_steps 2 \
    --lr_scheduler_type "cosine" \
    --report_to "tensorboard" \
    --gradient_checkpointing True \
    --deepspeed configs/deepspeed_config.json \
    --fp16 True

    where {AZURE_JSON_PATH} is the path to the processed Azure documentation JSON files, i.e. pretrain_azure_doc/azure_json_output/

  • Finetune on the MSQA training data previously saved to msqa_process/data/MSQA_train.json

    deepspeed train.py \
    --model_name_or_path {PRETRAIN_MODEL_SAVE_PATH} \
    --data_path {MSQA_TRAIN_JSON_PATH} \
    --output_dir {FINETUNE_MODEL_SAVE_PATH} \
    --num_train_epochs 5 \
    --model_max_length 512 \
    --per_device_train_batch_size 64 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 20 \
    --save_total_limit 3 \
    --learning_rate 1e-5 \
    --warmup_steps 2 \
    --logging_steps 2 \
    --lr_scheduler_type "cosine" \
    --report_to "tensorboard" \
    --gradient_checkpointing True \
    --deepspeed configs/deepspeed_config.json \
    --fp16 True

Inference with the finetuned model

  • Generate domain knowledge with the finetuned model using the following command (a minimal generation sketch follows the command)
    python inference.py \
    --base_model={FINETUNE_MODEL_SAVE_PATH} \
    --infer_ids_path={QUESTION_ID_TO_INFERENCE} \
    --save_path={RESULT_SAVE_PATH} \
    --batch_size=1 \
    --max_new_tokens=512 \
    --num_beams=4
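  • For intuition, the core of the generation step looks roughly like the sketch below; the real inference.py also handles batching, question-ID loading, and prompt formatting, and the example question is made up.
    # Minimal beam-search generation sketch with Hugging Face Transformers.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_path = "{FINETUNE_MODEL_SAVE_PATH}"  # same checkpoint passed to inference.py
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16).cuda()

    prompt = "How do I rotate the access keys of an Azure storage account?"  # example question
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512, num_beams=4)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))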

Result generation and evaluation

  • Once the domain-specific model outputs its response to a question, we run LLM generation using either our domain knowledge or the chunks returned by retrieval-based methods.
  • Save your OpenAI API key in keybook.py; the LLM endpoint function is in llm_components.py.
  • Standard metrics, including BLEU, ROUGE-1/2/L, METEOR, BERT-Score, and SIM, are defined in eval_metrics.py.
  • Our proposed metrics:
    • CAR is computed by the is_no_answer function in eval_metrics.py.
    • KHR is defined in KHR.py; keywords must be extracted beforehand with keyword_extract.py (see the sketch after this list).
    • LLM-based metrics are defined in llm_eval.py.
  • result_generation.py contains all the prompts used to generate baseline results, given either domain knowledge from our model or chunks from retrieval-based methods.
  • score_conflict.py performs the conflict analysis for the LLM-based metric, and conflict_stat_plot.py visualizes it.
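  • For intuition, a keyword-hit-rate style score can be computed as the fraction of extracted keywords that appear in the generated answer; the sketch below is illustrative only, and the exact formulation in KHR.py may differ.
    # Illustrative keyword-hit-rate style score (not the exact definition in KHR.py).
    def keyword_hit_rate(keywords, answer):
        answer = answer.lower()
        hits = sum(1 for kw in keywords if kw.lower() in answer)
        return hits / len(keywords) if keywords else 0.0

    print(keyword_hit_rate(["managed identity", "key vault"],
                           "Use a managed identity to read secrets from Key Vault."))  # prints 1.0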

Human evaluation UI

We also include a UI for human evaluators.

  • Go to ui/
  • Set up the Python virtual environment
    conda create -n humaneval python=3.10
    conda activate humaneval
    pip install -r requirements.txt
  • Put the data to be evaluated in ui/human_eval_data/
  • Prepare the data
    python preprocess_human_eval_data.py
  • Run the UI
    streamlit run qa_preference.py

Human evaluation analysis

Go to human_annotation/

  • Put the .csv files from each human evaluator in human_annotation/data/
  • Process the human evaluation data
    python annotation_process.py
    
  • Output statistics and plot the results with annotation_stats.py
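  • For reference, a minimal sketch of aggregating the annotator CSVs; the "preference" column name is an assumption, and annotation_process.py defines the real schema and statistics.
    # Illustrative aggregation of human-evaluation CSVs; the column name is hypothetical.
    import pathlib
    import pandas as pd

    frames = [pd.read_csv(p) for p in pathlib.Path("human_annotation/data").glob("*.csv")]
    votes = pd.concat(frames, ignore_index=True)
    print(votes["preference"].value_counts(normalize=True))  # share of votes per option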
