cqsh01/Survey-response-script

Survey Response Script

Simulate realistic multi-persona survey responses on Google Forms using LLM-powered virtual personas.

Project Overview

This project builds a library of virtual persona profiles and uses a Large Language Model (LLM) to simulate real users filling out Google Forms, each with a distinct identity (gender, age, occupation, nationality, personality, etc.).

Use cases:

- Survey design testing and validation
- Pre-filling user research data
- Generating diverse datasets for academic research
- Social science research assistance

Project Structure

    Survey-response-script/
    ├── main.py                          # CLI entry point
    ├── config.yaml                      # Global configuration
    ├── requirements.txt                 # Python dependencies
    │
    ├── personas/                        # Virtual persona module
    │   ├── __init__.py
    │   ├── persona_generator.py         # Randomly generate personas
    │   ├── persona_manager.py           # Manage / filter / sample personas
    │   └── persona_store.json           # Pre-built persona library (optional)
    │
    ├── survey/                          # Survey parsing module
    │   ├── __init__.py
    │   ├── google_form_parser.py        # Parse Google Form structure from HTML
    │   ├── question_types.py            # Question type definitions
    │   └── survey_loader.py             # Load survey from URL
    │
    ├── llm/                             # LLM answer engine
    │   ├── __init__.py
    │   ├── llm_client.py                # Unified LLM interface (Qwen / OpenAI)
    │   ├── prompt_builder.py            # Build role-play prompts
    │   └── answer_parser.py             # Parse structured LLM responses
    │
    ├── submitter/                       # Form submission module
    │   ├── __init__.py
    │   ├── google_form_submitter.py     # Submit via Playwright browser automation
    │   ├── rate_limiter.py              # Rate control between submissions
    │   └── session_manager.py           # HTTP session management
    │
    ├── results/                         # Results module
    │   ├── __init__.py
    │   ├── response_collector.py        # Collect response records
    │   ├── exporter.py                  # Export to CSV / JSON
    │   └── analyzer.py                  # Statistical analysis
    │
    └── output/                          # Generated response files (auto-created)

Quick Start

  1. Install dependencies

    git clone https://github.com/your-repo/Survey-response-script.git
    cd Survey-response-script
    pip install -r requirements.txt
    playwright install chromium

  2. Configure config.yaml

    llm:
      provider: qwen                  # qwen | openai
      model: qwen-plus                # See: https://help.aliyun.com/model-studio/getting-started/models
      api_key: sk-YOUR_DASHSCOPE_KEY  # Aliyun DashScope API key

    survey:
      form_url: "https://docs.google.com/forms/d/xxx/viewform"

    run:
      respondents: 50                 # Number of simulated respondents
      allow_repeat: true              # Allow reuse of personas
      delay_range: [2, 8]             # Random delay between submissions (seconds)

    export:
      format: csv                     # csv | json
      output_dir: ./output

- To use Qwen: get your API key from the Aliyun Bailian Console.
- To use OpenAI instead: set provider: openai and use your OpenAI API key.
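As a rough illustration of how these settings might be consumed, the sketch below validates a dictionary shaped like config.yaml and draws one random delay from delay_range. The names validate_config and pick_delay are hypothetical and not taken from the repository. The dictionary is written inline so the sketch runs without PyYAML; in practice the same structure would presumably come from yaml.safe_load.

```python
import random

# Hypothetical helpers; the names are illustrative, not the repository's API.
ALLOWED_PROVIDERS = {"qwen", "openai"}
ALLOWED_FORMATS = {"csv", "json"}


def validate_config(cfg: dict) -> dict:
    """Check the fields shown in config.yaml and return the same dict."""
    provider = cfg["llm"]["provider"]
    if provider not in ALLOWED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")

    fmt = cfg["export"]["format"]
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported export format: {fmt!r}")

    low, high = cfg["run"]["delay_range"]
    if low < 0 or low > high:
        raise ValueError("delay_range must be [min, max] with 0 <= min <= max")

    if cfg["run"]["respondents"] < 1:
        raise ValueError("respondents must be at least 1")
    return cfg


def pick_delay(cfg: dict, rng: random.Random) -> float:
    """Draw one delay in seconds, uniformly from delay_range."""
    low, high = cfg["run"]["delay_range"]
    return rng.uniform(low, high)


# Same shape as the config.yaml example, written inline for the sketch.
config = validate_config({
    "llm": {"provider": "qwen", "model": "qwen-plus",
            "api_key": "sk-YOUR_DASHSCOPE_KEY"},
    "survey": {"form_url": "https://docs.google.com/forms/d/xxx/viewform"},
    "run": {"respondents": 50, "allow_repeat": True, "delay_range": [2, 8]},
    "export": {"format": "csv", "output_dir": "./output"},
})
delay = pick_delay(config, random.Random(0))
```

Failing early on a bad provider or an inverted delay_range avoids discovering the problem halfway through a batch of submissions.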

  3. Run

    # Use default config
    python main.py

    # Specify count and form URL
    python main.py --url "https://docs.google.com/forms/d/xxx/viewform" --count 50

    # Filter personas (e.g. Chinese females only)
    python main.py --count 20 --filter '{"nationality": "Chinese", "gender": "female"}'

    # Dry-run: generate answers without submitting
    python main.py --dry-run

    # See all options
    python main.py --help
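The options above could be parsed along the following lines. This is a minimal sketch using argparse from the standard library. Only the flag names (--url, --count, --filter, --dry-run) come from the commands shown; build_parser, matches_filter, and the sample personas are hypothetical, and the real main.py may differ.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used in the commands above."""
    parser = argparse.ArgumentParser(description="Simulate survey responses")
    parser.add_argument("--url", help="Google Form URL (overrides config.yaml)")
    parser.add_argument("--count", type=int, help="number of simulated respondents")
    # The filter arrives as a JSON object string and is decoded into a dict.
    parser.add_argument("--filter", type=json.loads, default={},
                        help="persona filter as a JSON object")
    parser.add_argument("--dry-run", action="store_true",
                        help="generate answers without submitting")
    return parser


def matches_filter(persona: dict, criteria: dict) -> bool:
    """True when the persona has every key/value pair in criteria."""
    return all(persona.get(key) == value for key, value in criteria.items())


# Illustrative personas; the field names are assumptions based on the overview.
personas = [
    {"name": "A", "nationality": "Chinese", "gender": "female", "age": 29},
    {"name": "B", "nationality": "Chinese", "gender": "male", "age": 41},
    {"name": "C", "nationality": "German", "gender": "female", "age": 35},
]

args = build_parser().parse_args(
    ["--count", "20", "--filter", '{"nationality": "Chinese", "gender": "female"}']
)
selected = [p for p in personas if matches_filter(p, args.filter)]
print([p["name"] for p in selected])
```

An empty filter matches every persona, so omitting --filter samples from the whole library.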

  4. View Results

Results are saved in ./output/:

- responses_YYYYMMDD_HHMMSS.csv: all responses as a table
- responses_YYYYMMDD_HHMMSS.json: full structured data
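The file naming pattern maps directly onto a strftime format. The sketch below uses only the standard library to write a list of response records to a timestamped CSV. The function export_csv and the record fields are hypothetical; the repository's exporter.py may well use pandas instead.

```python
import csv
import tempfile
from datetime import datetime
from pathlib import Path


def export_csv(records: list[dict], output_dir: str, now: datetime) -> Path:
    """Write records to <output_dir>/responses_YYYYMMDD_HHMMSS.csv."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the directory if missing
    path = out_dir / f"responses_{now.strftime('%Y%m%d_%H%M%S')}.csv"

    # Use the union of keys so rows with missing answers still fit the header.
    fieldnames: list[str] = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)  # missing keys are written as empty cells
    return path


# Illustrative records; real rows would hold one column per form question.
records = [
    {"persona": "A", "Q1": "Agree", "Q2": "4"},
    {"persona": "B", "Q1": "Disagree"},
]
out_path = export_csv(records, tempfile.mkdtemp(), datetime(2024, 1, 2, 3, 4, 5))
print(out_path.name)
```

Passing the timestamp in as an argument, rather than calling datetime.now() inside the function, keeps the file name predictable in tests.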

How It Works

1. Parse: Fetches the Google Form URL and extracts all questions, options, and entry IDs from FB_PUBLIC_LOAD_DATA_.
2. Persona: Randomly samples virtual personas (or generates them on the fly if no store file exists).
3. LLM: Sends each persona profile plus all questions to the LLM, which answers as that persona.
4. Submit: Uses Playwright (a real Chromium browser) to:
   - Navigate to the form
   - Click "Next" to reveal questions
   - Click radio buttons, checkboxes, and fill text fields using exact aria-label selectors
   - Click Submit
5. Export: Saves all responses to CSV/JSON in ./output/.

Supported Question Types

    Type              Description
    multiple_choice   Single select (radio button)
    checkboxes        Multi-select (checkbox)
    linear_scale      Rating scale (1–5 / 1–10)
    short_answer      Single-line text
    paragraph         Multi-line text
    dropdown          Dropdown select
    date              Date picker
    time              Time picker

Persona Profile Example

Tech Stack

    Library       Purpose
    playwright    Browser automation for form submission
    openai        LLM API client (Qwen / OpenAI compatible)
    requests      HTTP for fetching form HTML
    rich          CLI progress display
    PyYAML        Config file parsing
    pandas        Data export (optional)

Important Notes

- This tool is for research and testing purposes only. Do not use it to interfere with real surveys.
- Confirm the target form allows anonymous submissions before running.
- Set a reasonable delay_range to avoid rate limiting (recommended: [3, 10] for large batches).
- The form must be publicly accessible (no Google sign-in required).

Future Improvements

Here are suggested directions for evolving this project:

- Proxy IP pool: Rotate IP addresses to simulate geographically diverse respondents and avoid rate limits.
- Multi-platform support: Extend to Typeform, SurveyMonkey, and Wenjuanxing (问卷星) using the same persona/LLM pipeline.
- Web UI dashboard: Add a Gradio or Streamlit interface for non-technical users to configure runs and view live results.
- Answer diversity scoring: Detect when responses are too similar across personas and inject variation prompts.
- Cross-question consistency validation: Verify that a single persona's answers are logically consistent (e.g. age matches occupation level).
- Multilingual answer generation: Generate answers in the persona's native language based on the language field.
- Persona store expansion: Build a richer 500+ persona library with more diverse demographics and cultural backgrounds.
- Headless browser pool: Run multiple Playwright instances in parallel to speed up large batch submissions.
- Response verification: After each submission, verify via the form's response count API that the response was actually recorded.
- Form structure caching: Cache parsed form structure so repeated runs don't re-fetch the form HTML.

License

MIT

This is the final version reflecting the actual working implementation. The key updates from the original are:

- Qwen as the default LLM provider (not OpenAI)
- Playwright browser automation as the submission method (not raw HTTP POST)
- playwright install chromium added to setup steps
- Accurate description of how submission actually works
- Concrete future improvement suggestions based on real issues encountered

Working features:

- Qwen LLM integration via the DashScope API (OpenAI-compatible)
- Google Form parsing via FB_PUBLIC_LOAD_DATA_
- Playwright browser automation for reliable form submission
- Correct handling of all question types: radio, checkbox, linear scale, short answer, paragraph
- Linear scale auto-detection by finding ["1","2","3","4","5"] groups in the radio list
- Checkbox matching against exact option text
- CSV export of all responses

Top future improvements to prioritize:

- Parallel submissions: biggest speed gain for batches of 50+
- Response verification: confirm each submission was actually recorded
- Answer diversity scoring: prevent all personas giving identical answers
- Proxy rotation: for large-scale use without triggering Google rate limits
- Web UI: make it usable by non-developers on your research team
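To make the form parsing feature concrete, here is a minimal sketch of pulling FB_PUBLIC_LOAD_DATA_ out of form HTML with a regular expression and the json module. The variable name comes from the feature list above, but the nested index positions (item list at data[1][1], type code at item[3], entry ID at item[4][0][0]) and the type-code mapping are assumptions about an undocumented structure that Google may change. The sample HTML is synthetic, and extract_questions is a hypothetical name rather than the repository's function.

```python
import json
import re

# Assumed mapping from numeric type codes to the type names used in this README.
TYPE_NAMES = {0: "short_answer", 1: "paragraph", 2: "multiple_choice",
              3: "dropdown", 4: "checkboxes", 5: "linear_scale",
              9: "date", 10: "time"}

# Capture the JSON array assigned to FB_PUBLIC_LOAD_DATA_ inside a script tag.
LOAD_DATA_RE = re.compile(
    r"FB_PUBLIC_LOAD_DATA_\s*=\s*(\[.*?\])\s*;\s*</script>", re.DOTALL
)


def extract_questions(html: str) -> list[dict]:
    """Return one dict per answerable question found in the form HTML."""
    match = LOAD_DATA_RE.search(html)
    if match is None:
        raise ValueError("FB_PUBLIC_LOAD_DATA_ not found in page")
    data = json.loads(match.group(1))

    questions = []
    for item in data[1][1]:  # assumed location of the item list
        fields = item[4] if len(item) > 4 else None
        if not fields:
            continue  # items without an input (e.g. section headers)
        field = fields[0]
        questions.append({
            "title": item[1],
            "type": TYPE_NAMES.get(item[3], "unknown"),
            "entry_id": field[0],
            "options": [opt[0] for opt in (field[1] or [])],
            "required": bool(field[2]),
        })
    return questions


# Synthetic page fragment that follows the assumed layout.
SAMPLE_HTML = """<script>var FB_PUBLIC_LOAD_DATA_ = [null, ["desc", [
  [101, "How old are you?", null, 0, [[111, null, 1]]],
  [102, "Favorite season", null, 2, [[222, [["Spring"], ["Autumn"]], 1]]],
  [103, "About you", null, 8, null]
]]];</script>"""

questions = extract_questions(SAMPLE_HTML)
for question in questions:
    print(question["entry_id"], question["type"], question["options"])
```

Because the layout is not a published API, a parser like this should fail loudly on unexpected shapes instead of guessing, which is why the sketch raises when the variable is missing.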

About

The tool for the EELC2013 survey (UM).
