Midscene Python

Midscene Python is an AI-based automation framework that supports UI automation operations on Web and Android platforms.

Overview

Midscene Python provides comprehensive UI automation capabilities with the following core features:

Natural Language Driven: Describe automation tasks using natural language
Multi-platform Support: Supports Web (Selenium/Playwright) and Android (ADB)
AI Model Integration: Supports multiple vision-language models such as GPT-4V, Qwen2.5-VL, and Gemini
Visual Debugging: Provides detailed execution reports and debugging information
Caching Mechanism: Intelligent caching to improve execution efficiency

Project Architecture

midscene-python/
├── midscene/                    # Core framework
│   ├── core/                    # Core framework
│   │   ├── agent/              # Agent system
│   │   ├── insight/            # AI inference engine
│   │   ├── ai_model/           # AI model integration
│   │   ├── yaml/               # YAML script executor
│   │   └── types.py            # Core type definitions
│   ├── web/                     # Web integration
│   │   ├── selenium/           # Selenium integration
│   │   ├── playwright/         # Playwright integration
│   │   └── bridge/             # Bridge mode
│   ├── android/                 # Android integration
│   │   ├── device.py           # Device management
│   │   └── agent.py            # Android Agent
│   ├── cli/                     # Command line tools
│   ├── mcp/                     # MCP protocol support
│   ├── shared/                 # Shared utilities
│   └── visualizer/             # Visual reports
├── examples/                   # Example code
├── tests/                      # Test cases
└── docs/                       # Documentation

Tech Stack

Python 3.9+: Core runtime environment
Pydantic: Data validation and serialization
Selenium/Playwright: Web automation
OpenCV/Pillow: Image processing
HTTPX/AIOHTTP: HTTP client
Typer: CLI framework
Loguru: Logging

Quick Start

Installation

pip install midscene-python

Basic Usage

from midscene import Agent
from midscene.web import SeleniumWebPage

# Create a Web Agent
with SeleniumWebPage.create() as page:
    agent = Agent(page)
    
    # Perform automation operations using natural language
    await agent.ai_action("Click the login button")
    await agent.ai_action("Enter username 'test@example.com'")
    await agent.ai_action("Enter password 'password123'")
    await agent.ai_action("Click the submit button")
    
    # Data extraction
    user_info = await agent.ai_extract("Extract user personal information")
    
    # Assertion verification
    await agent.ai_assert("Page displays welcome message")

Key Features

🤖 AI-Driven Automation

Describe operations using natural language, and AI automatically understands and executes:

await agent.ai_action("Enter 'Python tutorial' in the search box and search")

🔍 Intelligent Element Location

Supports multiple location strategies and automatically selects the optimal solution:

element = await agent.ai_locate("Login button")

📊 Data Extraction

Extract structured data from the page:

products = await agent.ai_extract({
    "products": [
        {"name": "Product Name", "price": "Price", "rating": "Rating"}
    ]
})

✅ Intelligent Assertions

AI understands page state and performs intelligent assertions:

await agent.ai_assert("User has successfully logged in")

📝 Credits

Thanks to Midscene Project: https://github.com/web-infra-dev/midscene for inspiration and technical references

License

MIT License

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
.github/workflows		.github/workflows
docs		docs
examples		examples
midscene		midscene
scripts		scripts
tests		tests
wiki		wiki
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
README.zh.md		README.zh.md
midscene.yml		midscene.yml
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Midscene Python

Overview

Project Architecture

Tech Stack

Quick Start

Installation

Basic Usage

Key Features

🤖 AI-Driven Automation

🔍 Intelligent Element Location

📊 Data Extraction

✅ Intelligent Assertions

📝 Credits

License

About

Uh oh!

Releases 2

Packages

Languages

License

Python51888/Midscene-Python

Folders and files

Latest commit

History

Repository files navigation

Midscene Python

Overview

Project Architecture

Tech Stack

Quick Start

Installation

Basic Usage

Key Features

🤖 AI-Driven Automation

🔍 Intelligent Element Location

📊 Data Extraction

✅ Intelligent Assertions

📝 Credits

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Languages

Packages