Release v0.7.2 · EvolvingLMMs-Lab/lmms-eval

What's Changed

fix(qwen2_5_vl): pass video metadata to processor for correct temporal encoding by @kcz358 in #1269
Fix missing VideoReader imports in FALCONBench and LongVideoBench by @akawincent in #1258
Feat: add model support for penguinvl by @taintaintainu in #1257
feat: add PushUpBench video repetition counting benchmark by @anonymous-atom in #1262
Add VSI-SUPER benchmark by @akawincent in #1267
add Qwen 3.5 chat model and example by @ArdalanM in #1264
refactor: consolidate Qwen3-VL and Qwen3.5 into unified base class by @Luodian in #1270
Add CambriansVSR/VSC/VSCStreaming model integrations by @akawincent in #1268
[Task] Update sitebench to report sub-category scores by @oscarqjh in #1282
[Task] Report sub-category scores for 3DSRBench and Viewspatial by @oscarqjh in #1285
fix: handle internvl_hf video-only inputs and enable frame sampling by @akawincent in #1279
Add trust_remote_code param for Hugging Face models by @sablin39 in #1280
feat: add Video-MME-v2 benchmark task by @mwxely in #1289
Fix the incompatibility issue caused by top_p=0 when running inference with vLLM (#1265) by @akawincent in #1277
fix: preserve HME100k prediction case in OCRBench scoring by @akawincent in #1278
feat: add process_results_use_image and video metadata dict support in task API by @Luodian in #1275
feat: add COVER and WM-aBench video understanding benchmarks by @Luodian in #1273
feat: add MMBench static evaluation mode (no OpenAI API needed) by @Luodian in #1276
fix: improve evaluation logic across 10+ existing benchmarks by @Luodian in #1274
feat: add video holmes and perceptioncomp by @ngquangtrung57 in #1296
Add UniG2U benchmark task with model support by @nssmd in #1297
chore: Delete gitignore scripts directory by @kcz358 in #1299
Add WISE Benchmark Task by @Purshow in #1301
feat: add LiteLLM as AI gateway backend by @RheagalFire in #1302
feat: Support FastVideo for Video Generation Models by @pufanyi in #1303
fix: release accelerator model refs during cleanup by @xk-huang in #1321
fix(api/task): handle None generation responses in process_results by @dankit in #1311
fix: preserve OpenAI max_new_tokens by @Genmin in #1318
feat: add VideoNet benchmark by @yadavta in #1308
feat: add ReVSI evaluation by @eamonn-zh in #1307
feat: add TimeLens benchmark by @kcz358 in #1323
feat: add HD-EPIC VQA benchmark (CVPR 2025) by @aliazani in #1316
style: format HD-EPIC files by @aliazani in #1324
feat: add JumpScore evaluation task by @mathCrazyy in #1329
fix(jumpscore): align message format and video lookup by @mathCrazyy in #1330
fix: handle text-only tasks and generate failure in llava_hf and instructblip by @Abhishek8108 in #1328
feat: Add Spatial-DISE benchmark task by @shinmohuang in #1327
fix: preserve per-request vLLM sampling params by @Travor278 in #1326
feat: add LLaVA-OneVision2 chat model wrapper by @yiyexy in #1337
fix(llava_onevision2): forward static images to image_processor by @yiyexy in #1344
fix(mmmu_pro_vision): apply post_prompt to vision-only split by @ts-kim in #1336
feat: add EgoTaskQA task (MCQ variant) by @njb-nvidia in #1338
feat: add EgoPlan-Bench2 task by @njb-nvidia in #1339
feat: add MetaVQA task by @njb-nvidia in #1340
fix: update llava_onevision2 checkpoint repo path by @yiyexy with @Copilot in #1345
feat: add Open-X VQA task by @njb-nvidia in #1346
feat: add SAT task by @njb-nvidia in #1348
feat: add CrossPoint-Bench task by @njb-nvidia in #1349
feat: add RoboSpatial task by @njb-nvidia in #1347
fix: add acc metric and fix data path for Video-MME-v2 by @EliYuan30 in #1351
feat(llava_onevision2): add codec sub-mode (use_codec, codec_*) by @yiyexy in #1352
feat: add Physical AI Understanding task by @njb-nvidia in #1353
feat: add CRPE-Relation task by @njb-nvidia in #1354
feat: add OmniSpatial task by @njb-nvidia in #1357
feat: add MVP-Mini (minimal_video_pairs mini split) by @njb-nvidia in #1356
feat: add HoliSafe task by @youngwanLEE in #1358
feat(ovobench, chat): run OVO-Bench on chat models via multi-round by @kcz358 in #1359
Fix PointBench image/question misalignment and add binary metric by @njb-nvidia in #1360
feat(visfactor): add VisFactor benchmark task by @anxiangsir in #1362
fix: pass list outputs through for generate_until_multi_round by @kcz358 in #1364
feat(vstat): add VSTAT benchmark task by @pinzhihuang in #1363
feat: add ExtremeWhenBench (hour-scale natural-language temporal grounding) by @min1321 in #1367
[ICLR 2026] XmodBench. New MCQ benchmark + omni-LLM interleave wrappers by @XingruiWang in #1365
feat: add Bedrock and local vLLM providers for llm_judge by @ShownX in #1298
feat(openai): add pass_video_url and enable_thinking_kwarg for vLLM-served video tasks by @min1321 in #1366
fix: add ChartQAPro utils and MMT/ScreenSpot fixes by @kcz358 in #1369

New Contributors

@anonymous-atom made their first contribution in #1262
@sablin39 made their first contribution in #1280
@nssmd made their first contribution in #1297
@Purshow made their first contribution in #1301
@RheagalFire made their first contribution in #1302
@xk-huang made their first contribution in #1321
@dankit made their first contribution in #1311
@Genmin made their first contribution in #1318
@yadavta made their first contribution in #1308
@eamonn-zh made their first contribution in #1307
@aliazani made their first contribution in #1316
@Abhishek8108 made their first contribution in #1328
@shinmohuang made their first contribution in #1327
@Travor278 made their first contribution in #1326
@yiyexy made their first contribution in #1337
@ts-kim made their first contribution in #1336
@njb-nvidia made their first contribution in #1338
@Copilot made their first contribution in #1345 (co-authored with @yiyexy)
@EliYuan30 made their first contribution in #1351
@youngwanLEE made their first contribution in #1358
@anxiangsir made their first contribution in #1362
@pinzhihuang made their first contribution in #1363
@min1321 made their first contribution in #1367
@XingruiWang made their first contribution in #1365
@ShownX made their first contribution in #1298

Full Changelog: v0.7.1...v0.7.2
