OS-Sentinel


🛠️ Usage

📦 Installation

  1. Clone this repository and set up the AndroidWorld environment. Even if AndroidWorld is already installed, you may still need to install the extra packages listed in requirements.txt:

    git clone https://github.com/OS-Copilot/OS-Sentinel
    cd OS-Sentinel
    # install AndroidWorld
    # requirements.txt contains packages not included by AndroidWorld
    pip install -r requirements.txt
  2. Install Node.js and Appium:

    wget -O install_nvm.sh https://raw.githubusercontent.com/nvm-sh/nvm/v0.35.2/install.sh
    bash install_nvm.sh
    nvm install v18.12.1
    npm install -g appium appium-doctor
    npm install wd
    appium driver install uiautomator2
  3. Run root.py, which configures the MobileSafetyBench environment automatically:

    conda activate android
    python root.py

    You can then run the MobileSafetyBench script (msb.py) in the AndroidWorld environment.
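Before launching the scripts, it can help to confirm that the tools from step 2 are actually on PATH. The following is a minimal sketch of such a check, not part of the repository; the function name `find_missing_tools` and the tool list are assumptions chosen for illustration.

```python
import shutil

# Tools installed in step 2; adjust the tuple to match your own setup.
REQUIRED_TOOLS = ("node", "npm", "appium")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required tools found.")
```

If nvm was installed in the current shell session, you may need to open a new shell before node and npm become visible.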

Note

The OPENAI_API_KEY environment variable is required when calling an external VLM; OPENAI_BASE_URL is optional.
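One way to fail early when the key is missing is to read both variables up front. This is a sketch only: `load_vlm_config` is a hypothetical helper, not a function provided by OS-Sentinel, and how the repository itself reads these variables is not shown here.

```python
import os


def load_vlm_config(environ=None):
    """Read VLM settings: OPENAI_API_KEY is required, OPENAI_BASE_URL is optional."""
    environ = os.environ if environ is None else environ
    api_key = environ.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY must be set before calling an external VLM")
    # base_url stays None when OPENAI_BASE_URL is not set.
    return {"api_key": api_key, "base_url": environ.get("OPENAI_BASE_URL")}
```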

🔀 Modes

  1. step: checks the safety of a single-step action with rule-based and VLM-based checks;

    timestep_new, in_danger = env.record(action)
  2. record: records trajectories of actions proposed by the mobile agent.

    timestep_new = env.record(action)

    This method fixes the system state before each action. Call env.record("terminate()") at the end; otherwise the last action is not recorded.
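The need for a final terminate() call in record mode can be illustrated with a toy loop. `FakeEnv` below is a hypothetical stand-in, not the real environment class: it assumes an action is committed only when the next call arrives, which is one plausible reading of the behaviour described above. The action strings are placeholders as well.

```python
class FakeEnv:
    """Hypothetical stand-in for the real environment, for illustration only."""

    def __init__(self):
        self.pending = None   # action still waiting to be committed
        self.recorded = []    # actions that have been fully recorded

    def record(self, action):
        # Assumption: an action is committed only once the next call arrives,
        # so the final "terminate()" call flushes the last real action.
        if self.pending is not None:
            self.recorded.append(self.pending)
        self.pending = None if action == "terminate()" else action
        return {"step": len(self.recorded)}


def record_trajectory(env, actions):
    """Record every action, then flush the last one with terminate()."""
    for action in actions:
        env.record(action)
    return env.record("terminate()")


if __name__ == "__main__":
    env = FakeEnv()
    record_trajectory(env, ["open_app", "tap_button"])
    print(env.recorded)
```

Wrapping the loop in a helper like `record_trajectory` makes it harder to forget the closing call.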

📏 Benchmark

  1. Download our trajectory data from OS-Copilot/MobileRisk;

  2. Extract the zip files and run the eval script:

    unzip '*.zip'
    python pipeline/eval.py

    Don't forget to fill in _API_KEY.

    • pipeline/eval.py is for typical VLM evaluation;
    • pipeline/eval_llm.py is for text-only LLM evaluation;
    • pipeline/tag.py is for risk tag evaluation of VLM;
    • pipeline/cons.py is for trajectories recorded via a mobile agent rather than our hand-made ones;
  3. Run pipeline/multi_method_consistency.py after result.json is ready.
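When switching between the evaluation scripts listed above, a small lookup can keep the choice explicit. This is a sketch under assumptions: the script paths come from the list above, while the dictionary keys and the helper `build_eval_command` are hypothetical names introduced here, and no command-line arguments are assumed for any script.

```python
import subprocess
import sys

# Script paths come from the list above; the keys are hypothetical labels.
EVAL_SCRIPTS = {
    "vlm": "pipeline/eval.py",        # typical VLM evaluation
    "llm": "pipeline/eval_llm.py",    # text-only LLM evaluation
    "tag": "pipeline/tag.py",         # risk tag evaluation of VLM
    "recorded": "pipeline/cons.py",   # agent-recorded trajectories
}


def build_eval_command(kind):
    """Return the command (as an argument list) for the chosen evaluation kind."""
    if kind not in EVAL_SCRIPTS:
        raise ValueError(f"unknown evaluation kind: {kind!r}")
    return [sys.executable, EVAL_SCRIPTS[kind]]


if __name__ == "__main__":
    # Run from the repository root after extracting the trajectory data.
    subprocess.run(build_eval_command("vlm"), check=True)
```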

📋 Citation

@article{sun2025ossentinel,
  title={OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows},
  author={Qiushi Sun and Mukai Li and Zhoumianze Liu and Zhihui Xie and Fangzhi Xu and Zhangyue Yin and Kanzhi Cheng and Zehao Li and Zichen Ding and Qi Liu and Zhiyong Wu and Zhuosheng Zhang and Ben Kao and Lingpeng Kong},
  journal={arXiv preprint arXiv:2510.24411},
  year={2025}
}
