20,000 image caption data items covering diverse scenes, including natural scenes, urban street scenes, exhibitions, home environments and other scenes. The images were shot with different brands of cameras, across multiple time periods and from multiple shooting angles. The captions are written in English and describe the main scene in each image, usually covering both the foreground and the background.
For more details, please refer to the link: https://www.nexdata.ai/datasets/llm/1283?source=Github
Data size: 10,000 images
Collecting environment: natural scenes, urban street scenes, shopping mall scenes, exhibitions, home environments, displays and other scenes
Device: various brands of cameras
Diversity: multiple scenes, multiple time periods, multiple shooting angles
Format: images are .jpg, captions are .txt
Language: English, Chinese
Text length: 30 to 60 words in principle, usually 3 to 5 sentences
Description content: the main scene in the image, usually including foreground and background description
Accuracy: at least 97% of images are correctly labeled
License: Commercial License
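
The format and text-length figures above suggest a simple sanity check once the data is on disk. The sketch below is not part of the dataset's tooling. It assumes each .jpg sits next to a .txt caption with the same file stem and that captions are UTF-8 encoded, neither of which is stated here. The names pair_captions and outside_word_range are hypothetical. Splitting on whitespace is only a meaningful word count for English captions.

```python
from pathlib import Path


def pair_captions(root):
    """Return {stem: (image_path, caption_text)} for each .jpg with a matching .txt.

    Assumption: image and caption share a file stem and a directory.
    """
    pairs = {}
    for image_path in sorted(Path(root).rglob("*.jpg")):
        caption_path = image_path.with_suffix(".txt")
        if caption_path.is_file():
            # Assumption: captions are stored as UTF-8 text.
            text = caption_path.read_text(encoding="utf-8").strip()
            pairs[image_path.stem] = (image_path, text)
    return pairs


def outside_word_range(pairs, low=30, high=60):
    """List stems whose whitespace-split word count falls outside [low, high]."""
    return [
        stem
        for stem, (_, text) in pairs.items()
        if not low <= len(text.split()) <= high
    ]
```

Calling outside_word_range(pair_captions("path/to/data")) returns the stems of captions that fall outside the stated range. Since the range is described as holding only in principle, a flagged caption is a candidate for review rather than an error.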