diff --git a/README.md b/README.md
index 398ec41..03a3e79 100644
--- a/README.md
+++ b/README.md
@@ -19,6 +19,9 @@
 - [LLaVA-Grounding Weights](#llava-grounding-weights)
 - [Demo](#demo)
 - [Training data](#training-data)
+  - [Flickr30k](#flickr30k)
+  - [COCO](#coco)
+  - [LLaVA](#llava)
 - [Training](#training)
 - [Citation](#citation)
 
@@ -82,9 +85,13 @@ data
 │   ├── llava_instruct_150k.json
 │   ├── llava_instruct_150k_visual_prompt.json
-```
-
+#### Flickr30k
+Please refer to [MDETR's pre-processed Flickr30k data](https://github.com/ashkamath/mdetr/blob/main/.github/flickr.md).
+#### COCO
+Please download the COCO train2014 and train2017 images, along with the panoptic and semantic segmentation annotations. The remaining annotations can be downloaded [here]().
+#### LLaVA
+The processed annotations can be downloaded [here]().
 ### Training Stage 1
 ```shell
@@ -108,4 +115,11 @@ If you find LLaVA-Grounding useful for your research and applications, please ci
 year={2023},
 booktitle={arXiv}
 }
+
+@misc{liu2023llava,
+  title={Visual Instruction Tuning},
+  author={Liu, Haotian and Li, Chunyuan and Wu, Qingyang and Lee, Yong Jae},
+  publisher={arXiv:2304.08485},
+  year={2023}
+}
 ```
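
For reference, a minimal sketch of the COCO download step added in the second hunk, assuming the official `images.cocodataset.org` mirrors; the target directory (`data/coco`) and the unzip step are assumptions, since the PR does not pin exact paths or filenames:

```shell
# Sketch only: fetch the COCO files named in the new "#### COCO" section.
# Target layout (data/coco) is an assumption based on the README's data tree.
mkdir -p data/coco && cd data/coco

# Train images (train2014 and train2017)
wget http://images.cocodataset.org/zips/train2014.zip
wget http://images.cocodataset.org/zips/train2017.zip

# Panoptic and semantic ("stuff") segmentation annotations
wget http://images.cocodataset.org/annotations/panoptic_annotations_trainval2017.zip
wget http://images.cocodataset.org/annotations/stuff_annotations_trainval2017.zip

# Unpack everything in place
for f in *.zip; do unzip "$f"; done
```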