# Strategy Validation

The strategy to be validated consists of several steps:

1. Structured query selection: In the first step, papers are selected based on a structured search query. The goal is to identify papers that reference datasets within a specific research domain.

2. Title and abstract screening with Gemini: Titles and abstracts are then screened using Gemini to determine which papers meet predefined inclusion and exclusion criteria. At this stage, datasets are also extracted for the first time, provided they are explicitly mentioned.

3. Full-text screening with Gemini: Finally, for papers that were not excluded and have no dataset associated yet, the full text is screened to extract any remaining relevant dataset information.
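The hand-off between steps 2 and 3 can be sketched as a simple filter. This is a minimal illustration, assuming the screening results are stored in columns named `Include` and `Data Source` (as in the evaluation cell later in this notebook); the helper name `needs_full_text` and the sample rows are hypothetical.

```python
import pandas as pd


def needs_full_text(df: pd.DataFrame) -> pd.DataFrame:
    """Return papers that were not excluded and have no dataset associated yet."""
    not_excluded = df["Include"].isin(["yes", "maybe"])
    no_dataset = df["Data Source"].fillna("").str.strip() == ""
    return df[not_excluded & no_dataset]


# Hypothetical screening results, for illustration only.
screened = pd.DataFrame(
    {
        "DOI": ["10.0000/a", "10.0000/b", "10.0000/c", "10.0000/d"],
        "Include": ["yes", "no", "maybe", "yes"],
        "Data Source": ["ExampleSet", "", "", None],
    }
)

full_text_queue = needs_full_text(screened)
print(full_text_queue["DOI"].tolist())
```

Papers with a dataset already extracted in step 2 skip the full-text stage, which keeps the more expensive screening limited to the rows that still need it.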

## Step 1: Structured Query Selection

The following queries are used to select papers for the validation study:

### IEEE Xplore

("All Metadata": "computer vision")

AND ("All Metadata": "action quality assessment" OR AQA)

AND ("All Metadata": dataset OR "data set" OR database OR "data catalogue" OR "data repository" OR "data sharing" OR "open data")

Filters to add:

Year: 2013-2024

Document Type: Article + Conference Paper

-> A language filter does not exist


### Scopus

TITLE-ABS-KEY ("computer vision")

AND TITLE-ABS-KEY ("action quality assessment" OR AQA)

AND TITLE-ABS-KEY (dataset OR "data set" OR database OR "data catalogue" OR "data repository" OR "data sharing" OR "open data")

AND PUBYEAR > 2012 AND PUBYEAR < 2025

AND (LIMIT-TO(DOCTYPE,"ar") OR LIMIT-TO(DOCTYPE,"cp"))

AND (LIMIT-TO(LANGUAGE,"English"))


### Web of Science

TS=("computer vision")

AND TS=("action quality assessment" OR AQA)

AND TS=(dataset OR "data set" OR database OR "data catalogue" OR "data repository" OR "data sharing" OR "open data")

AND PY=(2013-2024)

AND DT=(Article OR Proceedings Paper)

AND LA=(English)
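The three queries share the same three term groups and differ only in field syntax. The sketch below assembles the core of each query from one shared definition, which may help keep the databases in sync when terms change. The names `WRAPPERS` and `build_query` are hypothetical, and the year, document type, and language filters are left out because each database expresses them differently.

```python
# Term groups shared by all three databases (taken from the queries above).
DOMAIN = ['"computer vision"']
TOPIC = ['"action quality assessment"', "AQA"]
DATA = [
    "dataset", '"data set"', "database", '"data catalogue"',
    '"data repository"', '"data sharing"', '"open data"',
]

# How each database wraps a group of OR-ed terms in its search field.
WRAPPERS = {
    "ieee": lambda terms: '("All Metadata": ' + " OR ".join(terms) + ")",
    "scopus": lambda terms: "TITLE-ABS-KEY (" + " OR ".join(terms) + ")",
    "wos": lambda terms: "TS=(" + " OR ".join(terms) + ")",
}


def build_query(database: str) -> str:
    """Join the three wrapped term groups with AND for the given database."""
    wrap = WRAPPERS[database]
    return " AND ".join(wrap(group) for group in (DOMAIN, TOPIC, DATA))


for name in WRAPPERS:
    print(name, "->", build_query(name))
```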

In [1]:
import os
from collections import deque
from dotenv import load_dotenv
from google import genai
import pandas as pd
from pydantic import BaseModel, Field
import time
from typing import Annotated, Literal

folder_path = 'data/validation/'

ieee_file = folder_path + 'ieee_validation.csv'
scopus_file = folder_path + 'scopus_validation.csv'
wos_file = folder_path + 'wos_validation.xls'

# IEEE Xplore

In [2]:
df_ieee = pd.read_csv(ieee_file)
df_ieee

Unnamed: 0,Document Title,Authors,Author Affiliations,Publication Title,Date Added To Xplore,Publication Year,Volume,Issue,Start Page,End Page,...,Mesh_Terms,Article Citation Count,Patent Citation Count,Reference Count,License,Online Date,Issue Date,Meeting Date,Publisher,Document Identifier
0,FineDiving: A Fine-grained Dataset for Procedu...,J. Xu; Y. Rao; X. Yu; G. Chen; J. Zhou; J. Lu,"Department of Automation, Tsinghua University,...",2022 IEEE/CVF Conference on Computer Vision an...,27 Sep 2022,2022,,,2939,2948,...,,85,,50,IEEE,27 Sep 2022,,,IEEE,IEEE Conferences
1,LOGO: A Long-Form Video Dataset for Group Acti...,S. Zhang; W. Dai; S. Wang; X. Shen; J. Lu; J. ...,"Shenzhen International Graduate School, Tsingh...",2023 IEEE/CVF Conference on Computer Vision an...,22 Aug 2023,2023,,,2405,2414,...,,31,,58,IEEE,22 Aug 2023,,,IEEE,IEEE Conferences
2,PECoP: Parameter Efficient Continual Pretraini...,A. Dadashzadeh; S. Duan; A. Whone; M. Mirmehdi,"School of Computer Science, University of Bris...",2024 IEEE/CVF Winter Conference on Application...,9 Apr 2024,2024,,,42,52,...,,17,,51,IEEE,9 Apr 2024,,,IEEE,IEEE Conferences
3,FineParser: A Fine-Grained Spatio-Temporal Act...,J. Xu; S. Yin; G. Zhao; Z. Wang; Y. Peng,"School of Intelligence Science and Technology,...",2024 IEEE/CVF Conference on Computer Vision an...,16 Sep 2024,2024,,,14628,14637,...,,16,,42,IEEE,16 Sep 2024,,,IEEE,IEEE Conferences
4,A Survey of Video-based Action Quality Assessment,S. Wang; D. Yang; P. Zhai; Q. Yu; T. Suo; Z. S...,"Institute of AI & Robotics, Fudan University, ...",2021 International Conference on Networking Sy...,19 Apr 2022,2021,,,1,9,...,,16,,61,IEEE,19 Apr 2022,,,IEEE,IEEE Conferences
5,Action Quality Assessment Across Multiple Actions,P. Parmar; B. Morris,"University of Nevada, Las Vegas; University of...",2019 IEEE Winter Conference on Applications of...,7 Mar 2019,2019,,,1468,1476,...,,109,,28,IEEE,7 Mar 2019,,,IEEE,IEEE Conferences
6,Group-aware Contrastive Regression for Action ...,X. Yu; Y. Rao; W. Zhao; J. Lu; J. Zhou,"Department of Automation, Tsinghua University,...",2021 IEEE/CVF International Conference on Comp...,28 Feb 2022,2021,,,7899,7908,...,,85,,42,IEEE,28 Feb 2022,,,IEEE,IEEE Conferences
7,Tai Chi Action Quality Assessment and Visual A...,J. Li; H. Hu; Q. Xing; X. Wang; J. Li; Y. Shen,"School of Sports Engineering, Beijing Sports U...",2022 IEEE 24th International Workshop on Multi...,22 Nov 2022,2022,,,1,6,...,,6,,26,IEEE,22 Nov 2022,,,IEEE,IEEE Conferences
8,Uncertainty-Aware Score Distribution Learning ...,Y. Tang; Z. Ni; J. Zhou; D. Zhang; J. Lu; Y. W...,"Department of Automation, Tsinghua University,...",2020 IEEE/CVF Conference on Computer Vision an...,5 Aug 2020,2020,,,9836,9845,...,,120,,44,IEEE,5 Aug 2020,,,IEEE,IEEE Conferences
9,What and How Well You Performed? A Multitask L...,P. Parmar; B. T. Morris,"University of Nevada, Las Vegas; University of...",2019 IEEE/CVF Conference on Computer Vision an...,9 Jan 2020,2019,,,304,313,...,,146,,33,IEEE,9 Jan 2020,,,IEEE,IEEE Conferences


In [3]:
df_ieee = df_ieee[['DOI', 'Document Title', 'Authors', 'Publication Year', 'Abstract', 'Article Citation Count']]

df_ieee = df_ieee.rename(columns={
    'Document Title': 'Title',
    'Publication Year': 'Year',
    'Article Citation Count': 'Citations'
})

df_ieee

Unnamed: 0,DOI,Title,Authors,Year,Abstract,Citations
0,10.1109/CVPR52688.2022.00296,FineDiving: A Fine-grained Dataset for Procedu...,J. Xu; Y. Rao; X. Yu; G. Chen; J. Zhou; J. Lu,2022,Most existing action quality assessment method...,85
1,10.1109/CVPR52729.2023.00238,LOGO: A Long-Form Video Dataset for Group Acti...,S. Zhang; W. Dai; S. Wang; X. Shen; J. Lu; J. ...,2023,Action quality assessment (AQA) has become an ...,31
2,10.1109/WACV57701.2024.00012,PECoP: Parameter Efficient Continual Pretraini...,A. Dadashzadeh; S. Duan; A. Whone; M. Mirmehdi,2024,The limited availability of labelled data in A...,17
3,10.1109/CVPR52733.2024.01386,FineParser: A Fine-Grained Spatio-Temporal Act...,J. Xu; S. Yin; G. Zhao; Z. Wang; Y. Peng,2024,Existing action quality assessment (AQA) metho...,16
4,10.1109/INSAI54028.2021.00029,A Survey of Video-based Action Quality Assessment,S. Wang; D. Yang; P. Zhai; Q. Yu; T. Suo; Z. S...,2021,Human action recognition and analysis have gre...,16
5,10.1109/WACV.2019.00161,Action Quality Assessment Across Multiple Actions,P. Parmar; B. Morris,2019,Can learning to measure the quality of an acti...,109
6,10.1109/ICCV48922.2021.00782,Group-aware Contrastive Regression for Action ...,X. Yu; Y. Rao; W. Zhao; J. Lu; J. Zhou,2021,Assessing action quality is challenging due to...,85
7,10.1109/MMSP55362.2022.9949464,Tai Chi Action Quality Assessment and Visual A...,J. Li; H. Hu; Q. Xing; X. Wang; J. Li; Y. Shen,2022,Visual-based human action analysis is an impor...,6
8,10.1109/CVPR42600.2020.00986,Uncertainty-Aware Score Distribution Learning ...,Y. Tang; Z. Ni; J. Zhou; D. Zhang; J. Lu; Y. W...,2020,Assessing action quality from videos has attra...,120
9,10.1109/CVPR.2019.00039,What and How Well You Performed? A Multitask L...,P. Parmar; B. T. Morris,2019,Can performance on the task of action quality ...,146


In [4]:
df_ieee = df_ieee.dropna(subset=['DOI'])
df_ieee = df_ieee.dropna(subset=['Citations'])
df_ieee

Unnamed: 0,DOI,Title,Authors,Year,Abstract,Citations
0,10.1109/CVPR52688.2022.00296,FineDiving: A Fine-grained Dataset for Procedu...,J. Xu; Y. Rao; X. Yu; G. Chen; J. Zhou; J. Lu,2022,Most existing action quality assessment method...,85
1,10.1109/CVPR52729.2023.00238,LOGO: A Long-Form Video Dataset for Group Acti...,S. Zhang; W. Dai; S. Wang; X. Shen; J. Lu; J. ...,2023,Action quality assessment (AQA) has become an ...,31
2,10.1109/WACV57701.2024.00012,PECoP: Parameter Efficient Continual Pretraini...,A. Dadashzadeh; S. Duan; A. Whone; M. Mirmehdi,2024,The limited availability of labelled data in A...,17
3,10.1109/CVPR52733.2024.01386,FineParser: A Fine-Grained Spatio-Temporal Act...,J. Xu; S. Yin; G. Zhao; Z. Wang; Y. Peng,2024,Existing action quality assessment (AQA) metho...,16
4,10.1109/INSAI54028.2021.00029,A Survey of Video-based Action Quality Assessment,S. Wang; D. Yang; P. Zhai; Q. Yu; T. Suo; Z. S...,2021,Human action recognition and analysis have gre...,16
5,10.1109/WACV.2019.00161,Action Quality Assessment Across Multiple Actions,P. Parmar; B. Morris,2019,Can learning to measure the quality of an acti...,109
6,10.1109/ICCV48922.2021.00782,Group-aware Contrastive Regression for Action ...,X. Yu; Y. Rao; W. Zhao; J. Lu; J. Zhou,2021,Assessing action quality is challenging due to...,85
7,10.1109/MMSP55362.2022.9949464,Tai Chi Action Quality Assessment and Visual A...,J. Li; H. Hu; Q. Xing; X. Wang; J. Li; Y. Shen,2022,Visual-based human action analysis is an impor...,6
8,10.1109/CVPR42600.2020.00986,Uncertainty-Aware Score Distribution Learning ...,Y. Tang; Z. Ni; J. Zhou; D. Zhang; J. Lu; Y. W...,2020,Assessing action quality from videos has attra...,120
9,10.1109/CVPR.2019.00039,What and How Well You Performed? A Multitask L...,P. Parmar; B. T. Morris,2019,Can performance on the task of action quality ...,146


### Total of 17 papers identified, against 50 in the original paper
### *Without the 'Computer Vision' (domain) filter: 50 papers in total

# Scopus

In [5]:
df_scopus = pd.read_csv(scopus_file)
df_scopus

Unnamed: 0,Authors,Author full names,Author(s) ID,Title,Year,Source title,Volume,Issue,Art. No.,Page start,...,ISSN,ISBN,CODEN,PubMed ID,Language of Original Document,Document Type,Publication Stage,Open Access,Source,EID
0,"A., Dadashzadeh, Amirhossein; S., Duan, Shucha...","Dadashzadeh, Amirhossein (57213067843); Duan, ...",57213067843; 58722692300; 6602648827; 7004105162,PECoP: Parameter Efficient Continual Pretraini...,2024,,,,,42.0,...,,9798350318920,,,English,Conference paper,Final,,Scopus,2-s2.0-85191968585
1,"W., Wang, Wei; H., Wang, Hongyu; Y., Hao, Ying...","Wang, Wei (57192615575); Wang, Hongyu (2203706...",57192615575; 22037060600; 57199703948; 5921058...,Action Quality Assessment with Multi-scale Tem...,2024,,,,,247.0,...,,9798350361445,,,English,Conference paper,Final,,Scopus,2-s2.0-85197889824
2,"P., Lian, Puxiang; Z., Shao, Zhigang","Lian, Puxiang (58696312500); Shao, Zhigang (35...",58696312500; 35574802700,Improving action quality assessment with acros...,2023,Applied Intelligence,53,24.0,,30443.0,...,15737497; 0924669X,9780511611445; 9780521884280,APITE,,English,Article,Final,,Scopus,2-s2.0-85176769140
3,"K., Zhou, Kanglei; Y., Ma, Yue; H.P., Shum, Hu...","Zhou, Kanglei (57205674291); Ma, Yue (57218566...",57205674291; 57218566782; 25032239300; 7401735847,Hierarchical Graph Convolutional Networks for ...,2023,IEEE Transactions on Circuits and Systems for ...,33,12.0,3281413,7749.0,...,10518215,,ITCTE,,English,Article,Final,All Open Access; Green Accepted Open Access; G...,Scopus,2-s2.0-85161002529
4,"N., Hao, Ning; S., Ruan, Sihan; Y., Song, Yihe...","Hao, Ning (57207816180); Ruan, Sihan (57226663...",57207816180; 57226663703; 57221792710; 5731641...,The Establishment of a precise intelligent eva...,2023,Heliyon,9,11.0,e21361,,...,24058440,,,,English,Article,Final,All Open Access; Gold Open Access; Green Final...,Scopus,2-s2.0-85174462280
5,"Y., Liu, Yanchao; X., Cheng, Xina; T., Ikenaga...","Liu, Yanchao (58262316100); Cheng, Xina (56621...",58262316100; 56621799800; 8882572600,A Figure Skating Jumping Dataset for Replay-Gu...,2023,,,,,2437.0,...,,9798400701085,,,English,Conference paper,Final,,Scopus,2-s2.0-85179548737
6,"K., Zhou, Kanglei; R., Cai, Ruizhi; Y., Ma, Yu...","Zhou, Kanglei (57205674291); Cai, Ruizhi (5812...",57205674291; 58128062700; 57218566782; 5812364...,A Video-Based Augmented Reality System for Hum...,2023,IEEE Transactions on Visualization and Compute...,29,5.0,,2456.0,...,10772626,,ITVGE,37027743.0,English,Article,Final,,Scopus,2-s2.0-85149403925
7,"H., Zhou, Haoyang; T., Hou, Teng; J., Li, Jitao","Zhou, Haoyang (58542159800); Hou, Teng (587598...",58542159800; 58759809500; 58759781700,Prior Knowledge-guided Hierarchical Action Qua...,2023,Journal of Physics: Conference Series,2632,1.0,012027,,...,17426588; 17426596,9788394593742; 9781628905861,,,English,Conference paper,Final,All Open Access; Gold Open Access,Scopus,2-s2.0-85179582633
8,"G.A., Kumie, Gedamu Alemu; Y., Ji, Yanli; Y., ...","Kumie, Gedamu Alemu (57221749042); Ji, Yanli (...",57221749042; 36677523000; 57222954946; 5700203...,Fine-Grained Spatio-Temporal Parsing Network f...,2023,IEEE Transactions on Image Processing,32,,,6386.0,...,10577149,,IIPRE,37963006.0,English,Article,Final,,Scopus,2-s2.0-85177080222
9,"M., Chariar, Mukundan; S., Rao, Shreyas; A., I...","Chariar, Mukundan (58629585600); Rao, Shreyas ...",58629585600; 58629795200; 58628759000; 5721364...,AI Trainer: Autoencoder Based Approach for Squ...,2023,IEEE Access,11,,,107135.0,...,21693536,,,,English,Article,Final,All Open Access; Gold Open Access,Scopus,2-s2.0-85173050326


In [6]:
df_scopus = df_scopus[['DOI', 'Title', 'Authors', 'Year', 'Abstract', 'Cited by']]

df_scopus = df_scopus.rename(columns={
    'Cited by': 'Citations'
})

df_scopus

Unnamed: 0,DOI,Title,Authors,Year,Abstract,Citations
0,10.1109/WACV57701.2024.00012,PECoP: Parameter Efficient Continual Pretraini...,"A., Dadashzadeh, Amirhossein; S., Duan, Shucha...",2024,The limited availability of labelled data in A...,15
1,10.1109/ICAACE61206.2024.10548995,Action Quality Assessment with Multi-scale Tem...,"W., Wang, Wei; H., Wang, Hongyu; Y., Hao, Ying...",2024,Action quality assessment is a more challengin...,0
2,10.1007/s10489-023-05166-3,Improving action quality assessment with acros...,"P., Lian, Puxiang; Z., Shao, Zhigang",2023,Action quality assessment is a significant res...,4
3,10.1109/TCSVT.2023.3281413,Hierarchical Graph Convolutional Networks for ...,"K., Zhou, Kanglei; Y., Ma, Yue; H.P., Shum, Hu...",2023,Action quality assessment (AQA) automatically ...,41
4,10.1016/j.heliyon.2023.e21361,The Establishment of a precise intelligent eva...,"N., Hao, Ning; S., Ruan, Sihan; Y., Song, Yihe...",2023,The introduction of action quality assessment ...,6
5,10.1145/3581783.3613774,A Figure Skating Jumping Dataset for Replay-Gu...,"Y., Liu, Yanchao; X., Cheng, Xina; T., Ikenaga...",2023,"In competitive sports, judges often scrutinize...",8
6,10.1109/TVCG.2023.3247092,A Video-Based Augmented Reality System for Hum...,"K., Zhou, Kanglei; R., Cai, Ruizhi; Y., Ma, Yu...",2023,As the most common idiopathic inflammatory myo...,16
7,10.1088/1742-6596/2632/1/012027,Prior Knowledge-guided Hierarchical Action Qua...,"H., Zhou, Haoyang; T., Hou, Teng; J., Li, Jitao",2023,"Recently, there has been a growing interest in...",1
8,10.1109/TIP.2023.3331212,Fine-Grained Spatio-Temporal Parsing Network f...,"G.A., Kumie, Gedamu Alemu; Y., Ji, Yanli; Y., ...",2023,Action Quality Assessment (AQA) plays an impor...,13
9,10.1109/ACCESS.2023.3316009,AI Trainer: Autoencoder Based Approach for Squ...,"M., Chariar, Mukundan; S., Rao, Shreyas; A., I...",2023,Artificial intelligence and computer vision ha...,18


In [7]:
df_scopus = df_scopus.dropna(subset=['DOI'])
df_scopus = df_scopus.loc[df_scopus['Citations'] > 0]
df_scopus

Unnamed: 0,DOI,Title,Authors,Year,Abstract,Citations
0,10.1109/WACV57701.2024.00012,PECoP: Parameter Efficient Continual Pretraini...,"A., Dadashzadeh, Amirhossein; S., Duan, Shucha...",2024,The limited availability of labelled data in A...,15
2,10.1007/s10489-023-05166-3,Improving action quality assessment with acros...,"P., Lian, Puxiang; Z., Shao, Zhigang",2023,Action quality assessment is a significant res...,4
3,10.1109/TCSVT.2023.3281413,Hierarchical Graph Convolutional Networks for ...,"K., Zhou, Kanglei; Y., Ma, Yue; H.P., Shum, Hu...",2023,Action quality assessment (AQA) automatically ...,41
4,10.1016/j.heliyon.2023.e21361,The Establishment of a precise intelligent eva...,"N., Hao, Ning; S., Ruan, Sihan; Y., Song, Yihe...",2023,The introduction of action quality assessment ...,6
5,10.1145/3581783.3613774,A Figure Skating Jumping Dataset for Replay-Gu...,"Y., Liu, Yanchao; X., Cheng, Xina; T., Ikenaga...",2023,"In competitive sports, judges often scrutinize...",8
6,10.1109/TVCG.2023.3247092,A Video-Based Augmented Reality System for Hum...,"K., Zhou, Kanglei; R., Cai, Ruizhi; Y., Ma, Yu...",2023,As the most common idiopathic inflammatory myo...,16
7,10.1088/1742-6596/2632/1/012027,Prior Knowledge-guided Hierarchical Action Qua...,"H., Zhou, Haoyang; T., Hou, Teng; J., Li, Jitao",2023,"Recently, there has been a growing interest in...",1
8,10.1109/TIP.2023.3331212,Fine-Grained Spatio-Temporal Parsing Network f...,"G.A., Kumie, Gedamu Alemu; Y., Ji, Yanli; Y., ...",2023,Action Quality Assessment (AQA) plays an impor...,13
9,10.1109/ACCESS.2023.3316009,AI Trainer: Autoencoder Based Approach for Squ...,"M., Chariar, Mukundan; S., Rao, Shreyas; A., I...",2023,Artificial intelligence and computer vision ha...,18
10,10.1155/2023/3649217,A Novel Model for Intelligent Pull-Ups Test Ba...,"G., Liu, Guozhong; J., Wang, Jian; Z., Zhang, ...",2023,Applying computer vision and machine learning ...,8


### Total of 31 papers identified, against 119 in the original paper
### *Without the 'Computer Vision' (domain) filter: 103 papers in total

# Web of Science

In [8]:
df_wos = pd.read_excel(wos_file)
df_wos

Unnamed: 0,Publication Type,Authors,Book Authors,Group Authors,Book Group Authors,Researcher Ids,ORCIDs,Book Editors,Author - Arabic,Grant Principal Investigator,...,Copyright,Degree Name,Institution Address,Institution,Dissertation and Thesis Subjects,Author Keywords,Indexed Date,UT (Unique ID),Pubmed Id,Unnamed: 78
0,J,"Zhang, Boyu; Chen, Jiayuan; Xu, Yinfei; Zhang,...",,,,,,,,,...,,,,,,,2024-02-24,WOS:001077141300004,,
1,J,"Ren, Yuhao; Zhang, Bochao; Chen, Jing; Guo, Li...",,,,,,,,,...,,,,,,,2022-10-19,WOS:000866697300001,,
2,J,"Nagai, Takasuke; Takeda, Shoichiro; Suzuki, Sa...",,,,,,,,,...,,,,,,,2024-07-22,WOS:001269147100001,,
3,J,"Ke, Xiao; Xu, Huangbiao; Lin, Xiaofeng; Guo, W...",,,,,,,,,...,,,,,,,2024-05-17,WOS:001218574700001,,
4,J,"Zhou, Kanglei; Cai, Ruizhi; Ma, Yue; Tan, Qing...",,,,,,,,,...,,,,,,,2023-12-02,WOS:000966597900001,37027743.0,
5,J,"Liu, Jiang; Wang, Huasheng; Stawarz, Katarzyna...",,,,,,,,,...,,,,,,,2024-11-30,WOS:001363004900001,,
6,J,"Gedamu, Kumie; Ji, Yanli; Yang, Yang; Shao, Ji...",,,,,,,,,...,,,,,,,2024-11-11,WOS:001342519900003,39374293.0,
7,J,"Zhou, Kanglei; Ma, Yue; Shum, Hubert P. H.; Li...",,,,,,,,,...,,,,,,,2024-03-02,WOS:001121618300065,,
8,J,"Li, Ming-Zhe; Zhang, Hong-Bo; Dong, Li-Jia; Le...",,,,,,,,,...,,,,,,,2022-11-09,WOS:000874388900001,,
9,J,"Lei, Qing; Li, Huiying; Zhang, Hongbo; Du, Jix...",,,,,,,,,...,,,,,,,2023-06-24,WOS:001005565000002,,


In [9]:
df_wos = df_wos[['DOI', 'Article Title', 'Authors', 'Publication Year', 'Abstract', 'Times Cited, All Databases']]

df_wos = df_wos.rename(columns={
    'Article Title': 'Title',
    'Publication Year': 'Year',
    'Times Cited, All Databases': 'Citations'
})

df_wos

Unnamed: 0,DOI,Title,Authors,Year,Abstract,Citations
0,10.1007/s00521-023-09068-w,Auto-encoding score distribution regression fo...,"Zhang, Boyu; Chen, Jiayuan; Xu, Yinfei; Zhang,...",2023,Assessing the quality of actions in videos is ...,18
1,10.3390/electronics11193051,An Efficient Motion Registration Method Based ...,"Ren, Yuhao; Zhang, Bochao; Chen, Jing; Guo, Li...",2022,Action quality assessment (AQA) is an importan...,0
2,10.1109/ACCESS.2024.3423462,MMW-AQA: Multimodal In-the-Wild Dataset for Ac...,"Nagai, Takasuke; Takeda, Shoichiro; Suzuki, Sa...",2024,Action quality assessment (AQA) is a task for ...,0
3,10.1016/j.ins.2024.120347,Two-path target-aware contrastive regression f...,"Ke, Xiao; Xu, Huangbiao; Lin, Xiaofeng; Guo, W...",2024,Action quality assessment (AQA) is a challengi...,6
4,10.1109/TVCG.2023.3247092,A Video-Based Augmented Reality System for Hum...,"Zhou, Kanglei; Cai, Ruizhi; Ma, Yue; Tan, Qing...",2023,As the most common idiopathic inflammatory myo...,18
5,10.1016/j.eswa.2024.125642,Vision-based human action quality assessment: ...,"Liu, Jiang; Wang, Huasheng; Stawarz, Katarzyna...",2025,"Human Action Quality Assessment (AQA), which a...",2
6,10.1109/TIP.2024.3468870,Self-Supervised Sub-Action Parsing Network for...,"Gedamu, Kumie; Ji, Yanli; Yang, Yang; Shao, Ji...",2024,Semi-supervised Action Quality Assessment (AQA...,1
7,10.1109/TCSVT.2023.3281413,Hierarchical Graph Convolutional Networks for ...,"Zhou, Kanglei; Ma, Yue; Shum, Hubert P. H.; Li...",2023,Action quality assessment (AQA) automatically ...,37
8,10.1007/s40747-022-00892-6,Gaussian guided frame sequence encoder network...,"Li, Ming-Zhe; Zhang, Hong-Bo; Dong, Li-Jia; Le...",2023,Can a computer evaluate an athlete's performan...,6
9,10.1007/s10489-023-04613-5,Multi-skeleton structures graph convolutional ...,"Lei, Qing; Li, Huiying; Zhang, Hongbo; Du, Jix...",2023,In most existing action quality assessment (AQ...,7


In [10]:
df_wos = df_wos.dropna(subset=['DOI', 'Abstract'])
df_wos = df_wos.loc[df_wos['Year'] != 2025]
df_wos = df_wos.loc[df_wos['Citations'] > 0]
df_wos

Unnamed: 0,DOI,Title,Authors,Year,Abstract,Citations
0,10.1007/s00521-023-09068-w,Auto-encoding score distribution regression fo...,"Zhang, Boyu; Chen, Jiayuan; Xu, Yinfei; Zhang,...",2023,Assessing the quality of actions in videos is ...,18
3,10.1016/j.ins.2024.120347,Two-path target-aware contrastive regression f...,"Ke, Xiao; Xu, Huangbiao; Lin, Xiaofeng; Guo, W...",2024,Action quality assessment (AQA) is a challengi...,6
4,10.1109/TVCG.2023.3247092,A Video-Based Augmented Reality System for Hum...,"Zhou, Kanglei; Cai, Ruizhi; Ma, Yue; Tan, Qing...",2023,As the most common idiopathic inflammatory myo...,18
6,10.1109/TIP.2024.3468870,Self-Supervised Sub-Action Parsing Network for...,"Gedamu, Kumie; Ji, Yanli; Yang, Yang; Shao, Ji...",2024,Semi-supervised Action Quality Assessment (AQA...,1
7,10.1109/TCSVT.2023.3281413,Hierarchical Graph Convolutional Networks for ...,"Zhou, Kanglei; Ma, Yue; Shum, Hubert P. H.; Li...",2023,Action quality assessment (AQA) automatically ...,37
8,10.1007/s40747-022-00892-6,Gaussian guided frame sequence encoder network...,"Li, Ming-Zhe; Zhang, Hong-Bo; Dong, Li-Jia; Le...",2023,Can a computer evaluate an athlete's performan...,6
9,10.1007/s10489-023-04613-5,Multi-skeleton structures graph convolutional ...,"Lei, Qing; Li, Huiying; Zhang, Hongbo; Du, Jix...",2023,In most existing action quality assessment (AQ...,7
10,10.1109/TCSVT.2022.3143549,Semi-Supervised Action Quality Assessment With...,"Zhang, Shao-Jie; Pan, Jia-Hui; Gao, Jibin; Zhe...",2022,Action Quality Assessment aims to evaluate how...,25
11,10.1109/TIP.2023.3331212,Fine-Grained Spatio-Temporal Parsing Network f...,"Gedamu, Kumie; Ji, Yanli; Yang, Yang; Shao, Ji...",2023,Action Quality Assessment (AQA) plays an impor...,13
12,10.1007/s10489-024-05349-6,Assessing action quality with semantic-sequenc...,"Huang, Feng; Li, Jianjun",2024,Action Quality Assessment (AQA) is a critical ...,4


### Total of 25 papers identified, against 61 in the original paper
### *Without the 'Computer Vision' (domain) filter: 45 papers in total
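Each export names the same fields differently, so the three select-and-rename cells above follow one pattern. A possible way to express it once is a mapping per source, as sketched here. The column names come from the cells above; `COLUMN_MAPS`, `harmonise`, and the one-row sample are hypothetical.

```python
import pandas as pd

# Source column -> common column, one mapping per export format.
COLUMN_MAPS = {
    "ieee": {
        "DOI": "DOI", "Document Title": "Title", "Authors": "Authors",
        "Publication Year": "Year", "Abstract": "Abstract",
        "Article Citation Count": "Citations",
    },
    "scopus": {
        "DOI": "DOI", "Title": "Title", "Authors": "Authors",
        "Year": "Year", "Abstract": "Abstract", "Cited by": "Citations",
    },
    "wos": {
        "DOI": "DOI", "Article Title": "Title", "Authors": "Authors",
        "Publication Year": "Year", "Abstract": "Abstract",
        "Times Cited, All Databases": "Citations",
    },
}


def harmonise(df: pd.DataFrame, source: str) -> pd.DataFrame:
    """Keep only the mapped columns and rename them to the common schema."""
    mapping = COLUMN_MAPS[source]
    return df[list(mapping)].rename(columns=mapping)


# Hypothetical one-row Web of Science export with an extra column.
raw = pd.DataFrame({
    "DOI": ["10.0000/x"], "Article Title": ["Example"], "Authors": ["A. Author"],
    "Publication Year": [2023], "Abstract": ["Example abstract"],
    "Times Cited, All Databases": [3], "Publisher": ["Example Press"],
})
clean = harmonise(raw, "wos")
print(list(clean.columns))
```

A missing column raises a `KeyError` at selection time, which surfaces a changed export format early instead of producing a silently incomplete table.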

# Final Dataset

In [11]:
df_all = pd.concat([df_ieee, df_scopus, df_wos], ignore_index=True)
df_all

Unnamed: 0,DOI,Title,Authors,Year,Abstract,Citations
0,10.1109/CVPR52688.2022.00296,FineDiving: A Fine-grained Dataset for Procedu...,J. Xu; Y. Rao; X. Yu; G. Chen; J. Zhou; J. Lu,2022,Most existing action quality assessment method...,85
1,10.1109/CVPR52729.2023.00238,LOGO: A Long-Form Video Dataset for Group Acti...,S. Zhang; W. Dai; S. Wang; X. Shen; J. Lu; J. ...,2023,Action quality assessment (AQA) has become an ...,31
2,10.1109/WACV57701.2024.00012,PECoP: Parameter Efficient Continual Pretraini...,A. Dadashzadeh; S. Duan; A. Whone; M. Mirmehdi,2024,The limited availability of labelled data in A...,17
3,10.1109/CVPR52733.2024.01386,FineParser: A Fine-Grained Spatio-Temporal Act...,J. Xu; S. Yin; G. Zhao; Z. Wang; Y. Peng,2024,Existing action quality assessment (AQA) metho...,16
4,10.1109/INSAI54028.2021.00029,A Survey of Video-based Action Quality Assessment,S. Wang; D. Yang; P. Zhai; Q. Yu; T. Suo; Z. S...,2021,Human action recognition and analysis have gre...,16
...,...,...,...,...,...,...
68,10.3390/s19194129,A Survey of Vision-Based Human Action Evaluati...,"Lei, Qing; Du, Ji-Xiang; Zhang, Hong-Bo; Ye, S...",2019,The fields of human activity analysis have rec...,80
69,10.3390/electronics9040568,Learning Effective Skeletal Representations on...,"Lei, Qing; Zhang, Hong-Bo; Du, Ji-Xiang; Hsiao...",2020,"In this paper, we propose an integrated action...",15
70,10.1007/s11263-022-01695-5,Automatic Modelling for Interactive Action Ass...,"Gao, Jibin; Pan, Jia-Hui; Zhang, Shao-Jie; Zhe...",2023,"Action assessment, the task of visually assess...",11
71,10.1109/TMM.2023.3328180,Learning Semantics-Guided Representations for ...,"Du, Zexing; He, Di; Wang, Xue; Wang, Qing",2024,This paper explores semantic-aware representat...,9


In [12]:
df_all = df_all.drop_duplicates(subset=['DOI'])
df_all = df_all.drop(columns=['Citations'])
df_all = df_all.reset_index(drop=True)
df_all

Unnamed: 0,DOI,Title,Authors,Year,Abstract
0,10.1109/CVPR52688.2022.00296,FineDiving: A Fine-grained Dataset for Procedu...,J. Xu; Y. Rao; X. Yu; G. Chen; J. Zhou; J. Lu,2022,Most existing action quality assessment method...
1,10.1109/CVPR52729.2023.00238,LOGO: A Long-Form Video Dataset for Group Acti...,S. Zhang; W. Dai; S. Wang; X. Shen; J. Lu; J. ...,2023,Action quality assessment (AQA) has become an ...
2,10.1109/WACV57701.2024.00012,PECoP: Parameter Efficient Continual Pretraini...,A. Dadashzadeh; S. Duan; A. Whone; M. Mirmehdi,2024,The limited availability of labelled data in A...
3,10.1109/CVPR52733.2024.01386,FineParser: A Fine-Grained Spatio-Temporal Act...,J. Xu; S. Yin; G. Zhao; Z. Wang; Y. Peng,2024,Existing action quality assessment (AQA) metho...
4,10.1109/INSAI54028.2021.00029,A Survey of Video-based Action Quality Assessment,S. Wang; D. Yang; P. Zhai; Q. Yu; T. Suo; Z. S...,2021,Human action recognition and analysis have gre...
5,10.1109/WACV.2019.00161,Action Quality Assessment Across Multiple Actions,P. Parmar; B. Morris,2019,Can learning to measure the quality of an acti...
6,10.1109/ICCV48922.2021.00782,Group-aware Contrastive Regression for Action ...,X. Yu; Y. Rao; W. Zhao; J. Lu; J. Zhou,2021,Assessing action quality is challenging due to...
7,10.1109/MMSP55362.2022.9949464,Tai Chi Action Quality Assessment and Visual A...,J. Li; H. Hu; Q. Xing; X. Wang; J. Li; Y. Shen,2022,Visual-based human action analysis is an impor...
8,10.1109/CVPR42600.2020.00986,Uncertainty-Aware Score Distribution Learning ...,Y. Tang; Z. Ni; J. Zhou; D. Zhang; J. Lu; Y. W...,2020,Assessing action quality from videos has attra...
9,10.1109/CVPR.2019.00039,What and How Well You Performed? A Multitask L...,P. Parmar; B. T. Morris,2019,Can performance on the task of action quality ...
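DOIs are case-insensitive, while `drop_duplicates(subset=['DOI'])` compares the raw strings, so two exports that spell the same DOI differently would both survive. Whether that happens in these exports is not known from the notebook; the sketch below only shows how a normalised key could guard against it. The helper `normalise_doi`, the `doi_key` column, and the toy rows are hypothetical.

```python
import pandas as pd


def normalise_doi(doi: str) -> str:
    """Lower-case a DOI and strip common resolver prefixes."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


# Toy rows: the first two differ only in letter case.
toy = pd.DataFrame({
    "DOI": [
        "10.1109/TIP.2023.3331212",
        "10.1109/tip.2023.3331212",
        "https://doi.org/10.1109/TCSVT.2023.3281413",
    ],
    "Title": ["Paper A", "Paper A", "Paper B"],
})

toy["doi_key"] = toy["DOI"].map(normalise_doi)
deduped = (
    toy.drop_duplicates(subset=["doi_key"])
       .drop(columns=["doi_key"])
       .reset_index(drop=True)
)
print(len(toy), "->", len(deduped))
```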


### Total of 55 papers identified, against 91 in the original paper
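The totals reported in this notebook can be put side by side as simple ratios. This is only arithmetic on the counts stated above; it compares numbers of papers and says nothing about whether the same papers were retrieved. The dictionary layout and labels are illustrative.

```python
# (papers identified in this validation, papers reported in the original paper)
counts = {
    "IEEE Xplore": (17, 50),
    "Scopus": (31, 119),
    "Web of Science": (25, 61),
    "Final dataset": (55, 91),
}

ratios = {name: found / original for name, (found, original) in counts.items()}

for name, ratio in ratios.items():
    found, original = counts[name]
    print(f"{name}: {found}/{original} = {ratio:.1%}")
```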

# Abstract Evaluation with Gemini

In [13]:
load_dotenv()

MODEL = os.getenv("MODEL")
API_KEY = os.getenv("GEMINI_API_KEY")

client = genai.Client(api_key=API_KEY)

Both GOOGLE_API_KEY and GEMINI_API_KEY are set. Using GOOGLE_API_KEY.


In [None]:
class BatchReviewItem(BaseModel):
    id: int
    # Adjacent string literals are concatenated as-is, so each piece ends with an explicit newline.
    decision: Annotated[Literal["yes", "no", "maybe"], Field(
        description="Decision about including the paper:\n"
                    "- 'yes': There is clear evidence that the paper uses, shares, or discusses datasets, and it fully meets ALL inclusion criteria while NOT violating ANY exclusion criteria.\n"
                    "- 'no': There is clear evidence that the paper should be excluded because it fails to meet all inclusion criteria or it meets at least one exclusion criterion.\n"
                    "- 'maybe': The information is insufficient or ambiguous; checking the full text is recommended.\n"
                    "If unsure, or if other information (such as the full text) needs to be checked, prefer 'maybe' over 'no'.")]
    reason: Annotated[str, Field(description="Short explanation for the decision")]
    data_source: Annotated[str, Field(
        description="Official and recognized name of the data source used in the paper if the dataset is explicitly mentioned.\n"
                    "Guidelines:\n"
                    "- The dataset must be a real, identifiable resource (dataset, database, or data collection); check online if necessary.\n"
                    "- If only a vague or generic description is provided (e.g., 'agricultural data' or 'crop dataset' or 'generated data'), leave the field empty. However, if the description seems generic but is cited as a specific dataset, include it.\n"
                    "- Use only the proper name of the resource, without adding descriptors about the type of data source (e.g., use 'PlantVillage' instead of 'PlantVillage dataset').\n"
                    "- Whenever possible, format the name with its recognized abbreviation, but do not force it.\n"
                    "- Always capitalize the first letter of the name.\n"
                    "- If multiple data sources are mentioned, insert all of them in the same string separated by ';' (semicolon).\n"
                    "- If the decision is 'no', always leave it empty.")]

#"Whenever possible, format the name as 'Full Name (ABBREVIATION)'; if not feasible, use either the full name or the abbreviation alone."


class BatchReview(BaseModel):
    items: list[BatchReviewItem]

fallback = BatchReviewItem(id=-1, decision="maybe", reason="No response", data_source="")


def is_open_data_relevant_batch(df_chunk: pd.DataFrame, client, model=None, max_retries=3):
    if model is None:
        model = MODEL if MODEL else "gemini-2.5-flash-lite"

    # Prepare record strings for prompt
    records = []
    for idx, row in df_chunk.iterrows():
        title = "" if pd.isna(row.get("Title")) else str(row.get("Title"))
        abstract = "" if pd.isna(row.get("Abstract")) else str(row.get("Abstract"))
        records.append(f"- id: {int(idx)}\n  Title: {title}\n  Abstract: {abstract}")

    # Build prompt with inclusion/exclusion criteria and instructions
    prompt = (
            "Evaluate each record based on the following Inclusion and Exclusion Criteria and return ONLY JSON that matches the provided schema.\n\n"
            "Inclusion Criteria:\n"
            "(a) The abstract indicates that the study is related to computer science.\n"
            "Exclusion Criteria:\n"
            "(a) The abstract states that the article presents results of surveys or review.\n"
            "(b) The article is not related to using vision-based methods.\n"
            "(c) The article is not related to human AQA.\n\n"
            "ALL the requirements should be met.\n\n"
            "Output Instructions:\n"
            "- For each provided record, return an object with: id, decision ('yes'|'no'|'maybe'), reason, data_source.\n"
            "- If title or abstract are missing or empty use decision='no' and reason='Title or abstract is missing', data_source=''.\n\n"
            "Records to evaluate:\n"
            + "\n".join(records)
    )

    global _sent_calls_count
    global _completed_evaluations_count
    backoff = 10

    # Error management with exponential backoff
    for _ in range(max_retries):
        _rate_limit_block_until_allowed()
        try:
            _sent_calls_count += 1
            resp = client.models.generate_content(
                model=model,
                contents=prompt,
                config={
                    "response_mime_type": "application/json",
                    "response_schema": BatchReview,
                    "temperature": 0.0
                },
            )
            # Update the running total; initialise it if the counter was never defined
            try:
                _completed_evaluations_count += len(df_chunk)
            except NameError:
                _completed_evaluations_count = len(df_chunk)
            print(f"[GEMINI] Completed {_completed_evaluations_count} total evaluations.")
            return resp.parsed.items
        except Exception as e:
            print(f"[attempt {_sent_calls_count}] Gemini error: {e}")
            time.sleep(backoff)
            backoff *= 2
    return []


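def _rate_limit_block_until_allowed():
    # Block until the number of calls in the sliding window is below the maximum
    while True:
        now = time.monotonic()
        # Drop timestamps that have left the window
        while _rate_calls_log and (now - _rate_calls_log[0]) >= _RATE_WINDOW_SEC:
            _rate_calls_log.popleft()
        if len(_rate_calls_log) < _RATE_MAX_CALLS:
            _rate_calls_log.append(now)
            return
        # Sleep until the oldest call expires instead of spinning the CPU
        time.sleep(_RATE_WINDOW_SEC - (now - _rate_calls_log[0]))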


# Rate limiter: max 2 requests every 60 seconds
_RATE_WINDOW_SEC = 60.0
_RATE_MAX_CALLS = 2
_rate_calls_log = deque()

batch_size = 20  # number of papers per batch
df_evaluated = df_all.copy()

_sent_calls_count = 0
_completed_evaluations_count = 0

# Process dataframe in batches
for start in range(0, len(df_evaluated), batch_size):
    chunk = df_evaluated.iloc[start:start + batch_size]
    items = is_open_data_relevant_batch(chunk, client, model=MODEL)
    mapping = {int(it.id): it for it in items} if items else {}

    decisions = []
    reasons = []
    sources = []
    for idx in chunk.index:
        it = mapping.get(int(idx))
        if it is None:
            decisions.append(fallback.decision)
            reasons.append(fallback.reason)
            sources.append(fallback.data_source)
        else:
            decisions.append(it.decision)
            reasons.append(it.reason)
            sources.append(it.data_source)

    # Update dataframe with results
    df_evaluated.loc[chunk.index, 'Include'] = decisions
    df_evaluated.loc[chunk.index, 'Reason'] = reasons
    df_evaluated.loc[chunk.index, 'Data Source'] = sources

df_evaluated[['DOI', 'Include', 'Reason', 'Data Source']]

[attempt 1] Gemini error: 503 UNAVAILABLE. {'error': {'code': 503, 'message': 'The model is overloaded. Please try again later.', 'status': 'UNAVAILABLE'}}
[attempt 2] Gemini error: 503 UNAVAILABLE. {'error': {'code': 503, 'message': 'The model is overloaded. Please try again later.', 'status': 'UNAVAILABLE'}}
[attempt 3] Gemini error: 503 UNAVAILABLE. {'error': {'code': 503, 'message': 'The model is overloaded. Please try again later.', 'status': 'UNAVAILABLE'}}
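The limiter used in the cell above keeps a log of call timestamps and admits a new call only when fewer than `_RATE_MAX_CALLS` of them fall within the last `_RATE_WINDOW_SEC` seconds. The following is a self-contained sketch of the same sliding-window idea, not the notebook's own code: the class `SlidingWindowLimiter` and the `FakeClock` helper are hypothetical, and the clock and sleep functions are injectable so the behaviour can be checked without actually waiting.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any window of window_sec seconds."""

    def __init__(self, max_calls, window_sec, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.window_sec = window_sec
        self._clock = clock
        self._sleep = sleep
        self._log = deque()  # timestamps of admitted calls, oldest first

    def acquire(self):
        while True:
            now = self._clock()
            # Forget calls that have left the window.
            while self._log and now - self._log[0] >= self.window_sec:
                self._log.popleft()
            if len(self._log) < self.max_calls:
                self._log.append(now)
                return
            # Wait until the oldest call leaves the window.
            self._sleep(self.window_sec - (now - self._log[0]))


class FakeClock:
    """Test double: sleeping just advances the simulated time."""

    def __init__(self):
        self.now = 0.0
        self.slept = []

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.slept.append(seconds)
        self.now += seconds


fake = FakeClock()
limiter = SlidingWindowLimiter(2, 60.0, clock=fake.time, sleep=fake.sleep)
for _ in range(3):
    limiter.acquire()
print(fake.slept)  # the third call had to wait for the window to clear
```

With the notebook's settings (2 calls per 60 seconds), the third request in a burst waits one full window before it is sent.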


In [17]:
df_evaluated.to_csv("data/validation/df_evaluated_validation.csv", index=False)
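Every batch that still fails after all retries receives the fallback item (`decision="maybe"`, `reason="No response"`), so the saved file can mix genuine "maybe" decisions with rows the model never answered. A minimal sketch for telling the two apart before re-running is shown below. It assumes the `Reason` column and the fallback text defined in the screening cell; the sample rows are hypothetical.

```python
import pandas as pd

# Must match the reason of the fallback item defined in the screening cell.
FALLBACK_REASON = "No response"

# Hypothetical evaluated rows, for illustration only.
evaluated = pd.DataFrame({
    "DOI": ["10.0000/a", "10.0000/b", "10.0000/c"],
    "Include": ["yes", "maybe", "maybe"],
    "Reason": ["Uses a named dataset", "No response", "Abstract is ambiguous"],
})

# Rows the model never answered versus rows with a real decision.
unanswered = evaluated[evaluated["Reason"] == FALLBACK_REASON]
answered = evaluated.drop(unanswered.index)

print(f"{len(unanswered)} row(s) to re-submit, {len(answered)} with a decision")
```

Only the `unanswered` rows would need to be sent through the batch screening again, which avoids spending rate-limited calls on papers that already have a decision.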