What's Changed
- fix(qwen2_5_vl): pass video metadata to processor for correct temporal encoding by @kcz358 in #1269
- Fix missing VideoReader imports in FALCONBench and LongVideoBench by @akawincent in #1258
- Feat: add model support for penguinvl by @taintaintainu in #1257
- feat: add PushUpBench video repetition counting benchmark by @anonymous-atom in #1262
- Add VSI-SUPER benchmark by @akawincent in #1267
- add Qwen 3.5 chat model and example by @ArdalanM in #1264
- refactor: consolidate Qwen3-VL and Qwen3.5 into unified base class by @Luodian in #1270
- Add CambriansVSR/VSC/VSCStreaming model integrations by @akawincent in #1268
- [Task] Updated sitebench to report sub category score by @oscarqjh in #1282
- [Task] Report sub category score for 3DSRBench and Viewspatial by @oscarqjh in #1285
- fix: handle internvl_hf video-only inputs and enable frame sampling by @akawincent in #1279
- Add trust_remote_code param for huggingface model. by @sablin39 in #1280
- feat: add Video-MME-v2 benchmark task by @mwxely in #1289
- Fix the incompatibility issue caused by top_p=0 when using vllm to inference (#1265) by @akawincent in #1277
- fix: preserve HME100k prediction case in OCRBench scoring by @akawincent in #1278
- feat: add process_results_use_image and video metadata dict support in task API by @Luodian in #1275
- feat: add COVER and WM-aBench video understanding benchmarks by @Luodian in #1273
- feat: add MMBench static evaluation mode (no OpenAI API needed) by @Luodian in #1276
- fix: improve evaluation logic across 10+ existing benchmarks by @Luodian in #1274
- feat: add video holmes and perceptioncomp by @ngquangtrung57 in #1296
- Add UniG2U benchmark task with model support by @nssmd in #1297
- chore: Delete gitignore scripts directory by @kcz358 in #1299
- Add WISE Benchmark Task by @Purshow in #1301
- feat: add LiteLLM as AI gateway backend by @RheagalFire in #1302
- feat: Support FastVideo for Video Generation Models by @pufanyi in #1303
- fix: release accelerator model refs during cleanup by @xk-huang in #1321
- fix(api/task): handle None generation responses in process_results by @dankit in #1311
- fix: preserve OpenAI max_new_tokens by @Genmin in #1318
- feat: add VideoNet benchmark by @yadavta in #1308
- feat: add ReVSI evaluation by @eamonn-zh in #1307
- feat: add TimeLens benchmark by @kcz358 in #1323
- feat: add HD-EPIC VQA benchmark (CVPR 2025) by @aliazani in #1316
- style: format HD-EPIC files by @aliazani in #1324
- feat: add JumpScore evaluation task by @mathCrazyy in #1329
- fix(jumpscore): align message format and video lookup by @mathCrazyy in #1330
- fix: handle text-only tasks and generate failure in llava_hf and instructblip by @Abhishek8108 in #1328
- feat: Add Spatial-DISE benchmark task by @shinmohuang in #1327
- fix: preserve per-request vLLM sampling params by @Travor278 in #1326
- feat: add LLaVA-OneVision2 chat model wrapper by @yiyexy in #1337
- fix(llava_onevision2): forward static images to image_processor by @yiyexy in #1344
- fix(mmmu_pro_vision): apply post_prompt to vision-only split by @ts-kim in #1336
- feat: add EgoTaskQA task (MCQ variant) by @njb-nvidia in #1338
- feat: add EgoPlan-Bench2 task by @njb-nvidia in #1339
- feat: add MetaVQA task by @njb-nvidia in #1340
- fix: update llava_onevision2 checkpoint repo path by @yiyexy with @Copilot in #1345
- feat: add Open-X VQA task by @njb-nvidia in #1346
- feat: add SAT task by @njb-nvidia in #1348
- feat: add CrossPoint-Bench task by @njb-nvidia in #1349
- feat: add RoboSpatial task by @njb-nvidia in #1347
- fix: add acc metric and fix data path for Video-MME-v2 by @EliYuan30 in #1351
- feat(llava_onevision2): add codec sub-mode (use_codec, codec_*) by @yiyexy in #1352
- feat: add Physical AI Understanding task by @njb-nvidia in #1353
- feat: add CRPE-Relation task by @njb-nvidia in #1354
- feat: add OmniSpatial task by @njb-nvidia in #1357
- feat: add MVP-Mini (minimal_video_pairs mini split) by @njb-nvidia in #1356
- feat: add HoliSafe task by @youngwanLEE in #1358
- feat(ovobench, chat): run OVO-Bench on chat models via multi-round by @kcz358 in #1359
- Fix PointBench image/question misalignment and add binary metric by @njb-nvidia in #1360
- feat(visfactor): add VisFactor benchmark task by @anxiangsir in #1362
- fix: pass list outputs through for generate_until_multi_round by @kcz358 in #1364
- feat(vstat): add VSTAT benchmark task by @pinzhihuang in #1363
- feat: add ExtremeWhenBench (hour-scale natural-language temporal grounding) by @min1321 in #1367
- [ICLR 2026] XmodBench. New MCQ benchmark + omni-LLM interleave wrappers by @XingruiWang in #1365
- feat: add Bedrock and local vLLM providers for llm_judge by @ShownX in #1298
- feat(openai): add pass_video_url and enable_thinking_kwarg for vLLM-served video tasks by @min1321 in #1366
- fix: add ChartQAPro utils and MMT/ScreenSpot fixes by @kcz358 in #1369
New Contributors
- @anonymous-atom made their first contribution in #1262
- @sablin39 made their first contribution in #1280
- @nssmd made their first contribution in #1297
- @Purshow made their first contribution in #1301
- @RheagalFire made their first contribution in #1302
- @xk-huang made their first contribution in #1321
- @dankit made their first contribution in #1311
- @Genmin made their first contribution in #1318
- @yadavta made their first contribution in #1308
- @eamonn-zh made their first contribution in #1307
- @aliazani made their first contribution in #1316
- @Abhishek8108 made their first contribution in #1328
- @shinmohuang made their first contribution in #1327
- @Travor278 made their first contribution in #1326
- @yiyexy made their first contribution in #1337
- @ts-kim made their first contribution in #1336
- @njb-nvidia made their first contribution in #1338
- @yiyexy with @Copilot made their first contribution in #1345
- @EliYuan30 made their first contribution in #1351
- @youngwanLEE made their first contribution in #1358
- @anxiangsir made their first contribution in #1362
- @pinzhihuang made their first contribution in #1363
- @min1321 made their first contribution in #1367
- @XingruiWang made their first contribution in #1365
- @ShownX made their first contribution in #1298
Full Changelog: v0.7.1...v0.7.2