LLaGA: Large Language and Graph Assistant

Runjin Chen, Tong Zhao, Ajay Jaiswal, Neil Shah, and Zhangyang Wang* (*Correspondence)

VITA Lab@University of Texas at Austin, Snap Inc.

Step 1: Environment Preparation

# create a new environment
conda create -n llaga python=3.10
conda activate llaga

# install pytorch. Modify the command to align with your own CUDA version.
pip3 install torch --index-url https://download.pytorch.org/whl/cu118

# install related libraries
pip install -r requirements.txt

# install flash-attn
pip install flash-attn --no-build-isolation

# install pyg
pip install torch_geometric
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.1.0+cu118.html
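
The wheel index in the last command encodes both a PyTorch version and a CUDA tag, so it should match the PyTorch build installed earlier. A minimal sketch for deriving that URL, assuming the `torch-<version>+<cuda>.html` pattern shown above holds for your versions; `pyg_wheel_index` is a hypothetical helper, not part of this repository:

```python
def pyg_wheel_index(torch_version: str, cuda_tag: str) -> str:
    """Build a PyG wheel index URL for a torch version and CUDA tag.

    Assumes the URL pattern used in the install command above.
    """
    # Drop a local build suffix such as "+cu118" from strings like "2.1.0+cu118".
    base_version = torch_version.split("+")[0]
    return f"https://data.pyg.org/whl/torch-{base_version}+{cuda_tag}.html"


if __name__ == "__main__":
    # With torch installed, torch.__version__ could be passed instead of the literal.
    print(pyg_wheel_index("2.1.0+cu118", "cu118"))
```

Pass the resulting URL to `pip install ... -f <url>` in place of the hard-coded one if your PyTorch or CUDA version differs.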

Step 2: Data Preparation

Download our datasets from Box (link prediction data updated on 4/11/2024), then move the processed data to ./dataset so that the repository looks like this:

.
├── README.md
├── __init__.py
├── doc
├── dataset
│   ├── ogbn-arxiv
│   │   ├── sampled_2_10_test.jsonl
│   │   ├── sampled_2_10_train.jsonl
│   │   ├── edge_sampled_2_10_only_test.jsonl
│   │   ├── edge_sampled_2_10_only_train.jsonl
│   │   ├── processed_data.pt
│   │   ├── roberta_x.pt
│   │   ├── sbert_x.pt
│   │   ├── simteg_*_x.pt
│   ├── ogbn-products
│   │   ├── sampled_2_10_test.jsonl
│   │   ├── sampled_2_10_train.jsonl
│   │   ├── edge_sampled_2_10_only_test.jsonl
│   │   ├── edge_sampled_2_10_only_train.jsonl
│   │   ├── processed_data.pt
│   │   ├── roberta_x.pt
│   │   ├── sbert_x.pt
│   │   ├── simteg_*_x.pt
│   ├── pubmed
│   │   ├── sampled_2_10_test.jsonl
│   │   ├── sampled_2_10_train.jsonl
│   │   ├── edge_sampled_2_10_only_test.jsonl
│   │   ├── edge_sampled_2_10_only_train.jsonl
│   │   ├── processed_data.pt
│   │   ├── roberta_x.pt
│   │   ├── sbert_x.pt
│   │   ├── simteg_*_x.pt
│   ├── cora
│   │   ├── sampled_2_10_test.jsonl
│   │   ├── sampled_2_10_train.jsonl
│   │   ├── edge_sampled_2_10_only_test.jsonl
│   │   ├── edge_sampled_2_10_only_train.jsonl
│   │   ├── processed_data.pt
│   │   ├── roberta_x.pt
│   │   ├── sbert_x.pt
│   │   ├── simteg_*_x.pt
│   ├── laplacian_2_10.pt
│   ├── laplacian_2_20.pt
│   ├── laplacian_2_5.pt
├── eval
├── model
├── requirements.txt
├── scripts
├── train
├── utils
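
Before training, it can help to confirm that the downloaded data matches the tree above. The following is a rough sketch that only checks file names taken from that tree; `missing_dataset_files` is a hypothetical helper, not part of this repository, and it treats `simteg_*_x.pt` as a glob that needs at least one match:

```python
from pathlib import Path

# Names copied from the directory tree above.
DATASETS = ["ogbn-arxiv", "ogbn-products", "pubmed", "cora"]
REQUIRED_FILES = [
    "sampled_2_10_test.jsonl",
    "sampled_2_10_train.jsonl",
    "edge_sampled_2_10_only_test.jsonl",
    "edge_sampled_2_10_only_train.jsonl",
    "processed_data.pt",
    "roberta_x.pt",
    "sbert_x.pt",
]
SHARED_FILES = ["laplacian_2_10.pt", "laplacian_2_20.pt", "laplacian_2_5.pt"]


def missing_dataset_files(root):
    """Return relative paths of expected files that are absent under root."""
    root = Path(root)
    missing = []
    for name in DATASETS:
        folder = root / name
        for filename in REQUIRED_FILES:
            if not (folder / filename).is_file():
                missing.append(f"{name}/{filename}")
        # Assumption: any file matching simteg_*_x.pt satisfies this entry.
        if not (folder.is_dir() and any(folder.glob("simteg_*_x.pt"))):
            missing.append(f"{name}/simteg_*_x.pt")
    for filename in SHARED_FILES:
        if not (root / filename).is_file():
            missing.append(filename)
    return missing


if __name__ == "__main__":
    for path in missing_dataset_files("./dataset"):
        print("missing:", path)
```

An empty result means every file listed in the tree was found; it does not verify file contents.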

Step 3: Training

To execute the training process, you can run either ./scripts/train.sh or ./scripts/train_deepspeed.sh. The usage instructions are as follows:

# Arguments
# $1 = model type, e.g. vicuna, vicuna_4hop
# $2 = training task. Use nc/lp/nd for a single task. To combine multiple tasks, join the abbreviations with '-', e.g. nc-lp
# $3 = dataset: arxiv/products/pubmed/cora. To combine multiple datasets, join the abbreviations with '-', and append '.n' to repeat a dataset n times, e.g. arxiv-products-pubmed-cora.3 uses arxiv+products+pubmed+cora, with cora repeated 3 times
# $4 = batch size, default: 16
# $5 = embedding, e.g. simteg, sbert, roberta

# training on a single GPU
CUDA_VISIBLE_DEVICES=0 ./scripts/train.sh vicuna nc-lp arxiv-products-pubmed-cora.3 16 simteg

# training on multiple GPUs
./scripts/train_deepspeed.sh vicuna nc-lp arxiv-products-pubmed-cora.3 4 simteg
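
To make the `-` and `.n` conventions for the task and dataset arguments concrete, here is a small sketch of how such strings could be interpreted. It follows the description above only; `parse_tasks` and `parse_dataset_spec` are hypothetical names, and the training scripts may parse these arguments differently:

```python
from typing import List, Tuple

KNOWN_TASKS = {"nc", "lp", "nd"}
KNOWN_DATASETS = {"arxiv", "products", "pubmed", "cora"}


def parse_tasks(spec: str) -> List[str]:
    """Split a task string such as 'nc-lp' into ['nc', 'lp']."""
    tasks = spec.split("-")
    unknown = [t for t in tasks if t not in KNOWN_TASKS]
    if unknown:
        raise ValueError(f"unknown task(s): {unknown}")
    return tasks


def parse_dataset_spec(spec: str) -> List[Tuple[str, int]]:
    """Split a dataset string into (name, repeat) pairs.

    'arxiv-products-pubmed-cora.3' yields cora with a repeat count of 3
    and every other dataset with a repeat count of 1.
    """
    parsed = []
    for token in spec.split("-"):
        name, sep, count = token.partition(".")
        if name not in KNOWN_DATASETS:
            raise ValueError(f"unknown dataset: {name}")
        parsed.append((name, int(count) if sep else 1))
    return parsed


if __name__ == "__main__":
    print(parse_tasks("nc-lp"))
    print(parse_dataset_spec("arxiv-products-pubmed-cora.3"))
```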

We have also uploaded four general projectors to Hugging Face:

| Setting | Template | Repo |
| --- | --- | --- |
| General Model | HO | Runjin/llaga-vicuna-7b-simteg-HO-general_model-2-layer-mlp-projector |
| General Model | ND | Runjin/llaga-vicuna-7b-simteg-ND-general_model-2-layer-mlp-projector |
| Classification Expert | HO | Runjin/llaga-vicuna-7b-simteg-HO-classification_expert-linear-projector |
| Classification Expert | ND | Runjin/llaga-vicuna-7b-simteg-ND-classification_expert-linear-projector |
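
If you switch between these projectors in scripts, a lookup keyed by setting and template avoids retyping the repo names. This is an illustrative sketch; `projector_repo` and the key spellings (`general_model`, `classification_expert`) are assumptions chosen to mirror the repo names above:

```python
# Repo IDs copied from the list of uploaded projectors above.
PROJECTOR_REPOS = {
    ("general_model", "HO"): "Runjin/llaga-vicuna-7b-simteg-HO-general_model-2-layer-mlp-projector",
    ("general_model", "ND"): "Runjin/llaga-vicuna-7b-simteg-ND-general_model-2-layer-mlp-projector",
    ("classification_expert", "HO"): "Runjin/llaga-vicuna-7b-simteg-HO-classification_expert-linear-projector",
    ("classification_expert", "ND"): "Runjin/llaga-vicuna-7b-simteg-ND-classification_expert-linear-projector",
}


def projector_repo(setting: str, template: str) -> str:
    """Return the Hugging Face repo ID for a (setting, template) pair."""
    key = (setting.lower(), template.upper())
    if key not in PROJECTOR_REPOS:
        raise KeyError(f"no uploaded projector for {key}")
    return PROJECTOR_REPOS[key]


if __name__ == "__main__":
    print(projector_repo("general_model", "ND"))
```

The returned string can serve as `model_path` during evaluation, which accepts either a local path or a Hugging Face repo. For a local copy, `huggingface_hub.snapshot_download(repo_id=...)` is one way to fetch it.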

Step 4: Evaluation

You can evaluate LLaGA with the command:

model_path="/path/to/projector" # local path or Hugging Face repo
model_base="lmsys/vicuna-7b-v1.5-16k" # or meta-llama/Llama-2-7b-hf
mode="v1" # use "llaga_llama_2" for Llama and "v1" for others
dataset="arxiv" # test dataset
task="nc" # test task
emb="simteg" # embedding type
use_hop=2 # 2 for ND and 4 for HO
sample_size=10
template="ND" # ND or HO
output_path="/path/to/output"

python eval/eval_pretrain.py \
--model_path ${model_path} \
--model_base ${model_base} \
--conv_mode ${mode} \
--dataset ${dataset} \
--pretrained_embedding_type ${emb} \
--use_hop ${use_hop} \
--sample_neighbor_size ${sample_size} \
--answers_file ${output_path} \
--task ${task} \
--cache_dir ../../checkpoint \
--template ${template}
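
Because `use_hop` depends on the template and `conv_mode` depends on the base model, the two are easy to set inconsistently by hand. The sketch below assembles the same command from Python and derives both values. `build_eval_command` is a hypothetical helper, and guessing the conversation mode from the substring "llama" in the base model name is an assumption made for illustration:

```python
import shlex

# From the comments above: 2 hops for the ND template, 4 for HO.
HOPS_BY_TEMPLATE = {"ND": 2, "HO": 4}


def build_eval_command(model_path, model_base, dataset, task, template,
                       output_path, emb="simteg", sample_size=10,
                       cache_dir="../../checkpoint"):
    """Return the eval/eval_pretrain.py invocation as an argument list."""
    use_hop = HOPS_BY_TEMPLATE[template]
    # Assumption: a base model whose name contains "llama" is a Llama model.
    conv_mode = "llaga_llama_2" if "llama" in model_base.lower() else "v1"
    return [
        "python", "eval/eval_pretrain.py",
        "--model_path", model_path,
        "--model_base", model_base,
        "--conv_mode", conv_mode,
        "--dataset", dataset,
        "--pretrained_embedding_type", emb,
        "--use_hop", str(use_hop),
        "--sample_neighbor_size", str(sample_size),
        "--answers_file", output_path,
        "--task", task,
        "--cache_dir", cache_dir,
        "--template", template,
    ]


if __name__ == "__main__":
    cmd = build_eval_command("/path/to/projector", "lmsys/vicuna-7b-v1.5-16k",
                             "arxiv", "nc", "ND", "/path/to/output")
    # Print a shell-ready line; subprocess.run(cmd, check=True) would execute it.
    print(shlex.join(cmd))
```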

To evaluate your predicted results, please run:

python eval/eval_res.py --dataset ${dataset} --task ${task} --res_path ${output_path}
