A zero-code text analysis toolkit for social science researchers
English | Chinese documentation
SocialSciKit is an open-source Python toolkit that enables social science researchers to perform text analysis without writing a single line of code. It provides a Gradio-based web interface with full bilingual support (English / Chinese).
Three core modules:
- QuantiKit — End-to-end text classification pipeline (method recommendation → annotation → prompt/fine-tuning classification → evaluation → export)
- QualiKit — End-to-end qualitative coding pipeline (upload → de-identification → research framework → LLM coding with evidence grounding → human review → export)
- Toolbox — Standalone research methods tools: Inter-Coder Reliability (ICR) calculator, Multi-LLM Consensus Coding, and Methods Section Generator
Additional features:
- Visualization Dashboard — academic-style matplotlib charts (confusion matrix heatmaps, per-class P/R/F1 bars, confidence histograms, progress donuts, theme distribution) embedded throughout both pipelines
- Evidence Highlighting — LLM codings include a verbatim `evidence_span` from the source text; the review UI highlights the supporting quote inline in the original document
- Project Save & Restore — serialize the entire research project state (data, annotations, sessions, coding results) to a single JSON file; resume work later from the Home tab
- Zero-code web UI — Gradio 4.44+ with full EN/ZH language switching at runtime
- Installation
- Quick Start
- QuantiKit: Text Classification
- QualiKit: Qualitative Coding
- Toolbox: Research Methods Tools
- Project Save & Restore
- Supported LLM Backends
- Example Datasets
- Project Structure
- Key References
- Citation
- Development
- License & Disclaimer
- Author
- Python 3.9 or higher
- pip (Python package manager)
```
pip install socialscikit
```

Or install from source:

```
git clone https://github.com/Baron-Sun/socialscikit.git
cd socialscikit
pip install -e .
```

| Package | Version | Purpose |
|---|---|---|
| `gradio` | ≥ 4.0 | Web UI framework |
| `pandas` | ≥ 2.0 | Data manipulation |
| `openpyxl` | any | Excel read/write |
| `spacy` | ≥ 3.7 | NLP pipeline (tokenization, NER) |
| `transformers` | ≥ 4.40 | Fine-tuning (RoBERTa / XLM-R) |
| `datasets` | any | HuggingFace dataset handling |
| `openai` | ≥ 1.0 | OpenAI API client |
| `anthropic` | ≥ 0.25 | Anthropic API client |
| `scikit-learn` | any | Evaluation metrics |
| `scipy` | any | Statistical computation |
| `bertopic` | any | Topic modeling |
| `presidio-analyzer` | any | PII detection engine |
| `presidio-anonymizer` | any | PII anonymization |
| `langdetect` | any | Language detection |
| `tiktoken` | any | Token counting |
| `httpx` | any | Ollama HTTP client |
| `rich` | any | CLI formatting |
For best de-identification performance, download at least one spaCy model:
```
# English
python -m spacy download en_core_web_sm
# Chinese
python -m spacy download zh_core_web_sm
```

Launch the web app:

```
socialscikit launch
# or simply:
socialscikit
# Opens at http://127.0.0.1:7860
```

Launch a single module:

```
# QuantiKit only
socialscikit quantikit --port 7860
# QualiKit only
socialscikit qualikit --port 7861
```

| Flag | Description | Default |
|---|---|---|
| `--port` | Server port number | 7860 / 7861 |
| `--share` | Create a public Gradio link | False |
The default UI language is English. Use the Language toggle at the top of the page to switch to Chinese. All labels, buttons, and instructions update in real time.
QuantiKit guides you through the full text classification workflow in 6 steps.
- Supported formats: CSV, Excel (.xlsx/.xls), JSON, JSONL
- Upload your data file, then map the `text` and `label` columns
- Automatic data validation: detects missing values, empty strings, encoding issues
- One-click fix: auto-repair common data quality issues
- Diagnostic report: label distribution, text length statistics, duplicate detection
- Method recommender: analyzes your data characteristics (size, class count, imbalance ratio, text length) and recommends the optimal classification approach — zero-shot, few-shot, or fine-tuning — with literature citations
- Budget recommender: estimates "how many labels do you need?" using power-law learning curve fitting, with 80% confidence intervals and marginal return curves
- Cold-start mode: priors from CSS benchmark datasets (HatEval, SemEval, MFTC)
- Empirical mode: fits `f1 = a * n^b + c` on your labeled subset
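The empirical fit can be sketched in plain Python, assuming nothing about the toolkit's internals: grid-search the F1 asymptote `c`, then solve the remaining log-linear regression in closed form (`fit_power_law` and the synthetic data below are illustrative, not SocialSciKit's estimator):

```python
import math

def fit_power_law(ns, f1s):
    """Fit f1 = a * n^b + c (a < 0, b < 0, asymptote c) by
    grid-searching c and log-linear least squares for a and b."""
    top = max(f1s)
    best = None
    for i in range(201):
        c = top + 0.001 + 0.3 * i / 200  # candidate asymptote above the data
        xs = [math.log(n) for n in ns]
        ys = [math.log(c - f) for f in f1s]  # c - f1 = (-a) * n^b
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
            / sum((x - mx) ** 2 for x in xs)
        a = -math.exp(my - b * mx)
        sse = sum((a * n ** b + c - f) ** 2 for n, f in zip(ns, f1s))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    return best[1:]

# synthetic learning curve with true parameters a=-0.9, b=-0.5, c=0.85
ns = [50, 100, 200, 400, 800]
f1s = [0.85 - 0.9 * n ** -0.5 for n in ns]
a, b, c = fit_power_law(ns, f1s)
```

Given the fitted curve, "labels needed for a target F1" follows by inverting it: `n = ((target - c) / a) ** (1 / b)`.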
- Built-in annotation UI — no need for external tools
- Label each text sample, with skip, undo, flag for review support
- Real-time progress donut chart — visual progress tracker updates after every action
- Export annotated data as CSV, merge with original dataset
Three sub-approaches available in parallel tabs:
| Sub-tab | Method | When to use |
|---|---|---|
| Prompt Classification | Zero/few-shot via LLM API | Small datasets (< 200 labeled) |
| Fine-tuning | Local transformer fine-tuning | Medium datasets (200+), no API cost |
| API Fine-tuning | OpenAI fine-tuning API | Large datasets, best performance |
Prompt Classification features:
- Prompt Designer: task description + class definitions + positive/negative examples → auto-generates a structured prompt
- Prompt Optimizer: generates 3 variants using APE (Automatic Prompt Engineering), evaluates each on a test split
- One-click batch classification on the full dataset
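A structured prompt of this shape can be assembled mechanically. Here is a minimal sketch; the function name, field layout, and output instruction are assumptions, not the toolkit's actual template:

```python
def build_prompt(task, class_defs, examples=()):
    """Assemble a zero/few-shot classification prompt from task metadata."""
    lines = [task, "", "Classes:"]
    for name, definition in class_defs.items():
        lines.append(f"- {name}: {definition}")
    if examples:
        lines += ["", "Examples:"]
        for text, label in examples:
            lines.append(f'Text: "{text}" -> {label}')
    lines += ["", "Answer with the class name only."]
    return "\n".join(lines)

prompt = build_prompt(
    "Classify the sentiment of each product review.",
    {"positive": "expresses satisfaction", "negative": "expresses dissatisfaction"},
    [("Arrived quickly, works great.", "positive")],
)
```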
Full visualization dashboard:
- Metric summary cards (HTML) — Accuracy, Macro-F1, Weighted-F1, Cohen's Kappa, total/correct counts
- Confusion matrix heatmap — row-normalized, annotated with counts + percentages
- Per-class metrics bar chart — Precision / Recall / F1 grouped bars per class
- Collapsible full text report below charts
- Download classification results as CSV (original text + predicted labels + confidence)
- Pipeline log export — JSON metadata usable by the Toolbox Methods Generator
- Save project — persist all research state (data, predictions, annotation session) to a single JSON file
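The summary-card metrics are standard quantities; for intuition, here is a self-contained sketch of macro-F1 and Cohen's Kappa in plain Python (the toolkit's own evaluator relies on scikit-learn):

```python
from collections import Counter

def cohens_kappa(y_true, y_pred):
    # observed agreement corrected for agreement expected by chance
    n = len(y_true)
    po = sum(t == p for t, p in zip(y_true, y_pred)) / n
    ct, cp = Counter(y_true), Counter(y_pred)
    pe = sum(ct[k] * cp.get(k, 0) for k in ct) / (n * n)
    return (po - pe) / (1 - pe)

def macro_f1(y_true, y_pred):
    # unweighted mean of per-class F1 scores
    scores = []
    for cls in set(y_true) | set(y_pred):
        tp = sum(t == p == cls for t, p in zip(y_true, y_pred))
        fp = sum(p == cls and t != cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)

gold = ["pos", "pos", "neg", "neg", "neu", "pos"]
pred = ["pos", "neg", "neg", "neg", "neu", "pos"]
```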
QualiKit supports the full qualitative coding workflow for interview transcripts, focus group data, and open-ended survey responses.
- Supported formats: plain text (.txt)
- Automatic speaker detection and segmentation (by paragraph or by speaker turn)
- Configurable context window (number of surrounding sentences to include)
- Preview segmented results in a table before proceeding
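Speaker-turn segmentation can be sketched with a simple pattern for `Speaker:` lines. This is an illustrative sketch only; the toolkit's `segmenter.py` handles more cases (context windows, paragraph mode):

```python
import re

SPEAKER_LINE = re.compile(r"^([A-Za-z][\w ]{0,30}):\s*(.*)$")

def segment_by_speaker(transcript):
    """Group transcript lines into speaker turns."""
    segments, current = [], None
    for raw in transcript.splitlines():
        line = raw.strip()
        m = SPEAKER_LINE.match(line)
        if m:  # a new speaker turn begins
            if current:
                segments.append(current)
            current = {"speaker": m.group(1), "text": m.group(2)}
        elif line and current:  # continuation of the current turn
            current["text"] += " " + line
    if current:
        segments.append(current)
    return segments

demo = """Interviewer: How do you book appointments?
P1: Mostly through the app.
It took a while to learn.
Interviewer: What was hardest?"""
```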
- Automatic PII detection: person names, email addresses, phone numbers, Chinese ID card numbers
- Chinese-aware NER: detects Chinese names with title/honorific patterns
- English NER via spaCy and Presidio
- Replacement strategies: pseudonym, redact (`[REDACTED]`), or tag-based (`[PERSON_1]`)
- Per-item review: accept, reject, or edit each detected PII replacement individually
- Bulk actions: accept all, accept high-confidence only (≥ 0.90), or apply all accepted to the text
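The tag-based replacement strategy can be sketched with numbered placeholders. The regex patterns below are deliberately crude illustrations; the toolkit's real detection uses spaCy NER and Presidio:

```python
import re

# Illustrative patterns only; real PII detection needs NER, not just regex
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def tag_pii(text):
    """Replace each PII match with a numbered tag like [EMAIL_1]."""
    counts = {}
    for label, pattern in PII_PATTERNS.items():
        def replace(match, label=label):
            counts[label] = counts.get(label, 0) + 1
            return f"[{label}_{counts[label]}]"
        text = pattern.sub(replace, text)
    return text
```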
- Define your Research Questions (RQs) and Sub-themes using an interactive editable table
- Add/remove rows dynamically
- LLM-powered sub-theme suggestion: connect to an LLM backend, and it analyzes your transcript to suggest relevant sub-themes per RQ
- Confirm framework before proceeding to coding
- Batch coding: LLM reads each segment and assigns RQ + sub-theme labels with confidence scores
- Evidence grounding: the LLM prompt requires a verbatim `evidence_span` — the exact phrase or sentence from the source text that supports the coding decision
- Supports OpenAI, Anthropic, and Ollama backends
- Results displayed with segment text, assigned codes, confidence levels, and evidence spans
- Review coding results in a table sorted by confidence
- Evidence highlighting: when you select an item, the original text is shown with the LLM's `evidence_span` highlighted in green, so you can verify the coding decision at a glance; if the exact quote isn't found, a fallback "Evidence" block displays the cited text
- Visualization dashboard (collapsible accordion):
- Review progress donut — accepted / edited / rejected / pending counts
- Confidence histogram — low/medium/high tier shading + median marker
- Theme distribution — horizontal bar chart of RQ frequencies
- Per-item actions: accept, reject, or edit (reassign RQ/sub-theme)
- Bulk accept by confidence threshold
- Manual coding: select a segment, preview its content, and manually assign RQ + sub-theme labels
- Cascading dropdown: sub-theme choices automatically filter based on selected RQ
- Export reviewed coding results as structured Excel file
- Pipeline log export — JSON metadata usable by the Toolbox Methods Generator
- Save project — persist the entire coding session (segments, RQs, review state, evidence spans) to a single JSON file
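The evidence-highlighting behaviour amounts to a substring search with the documented fallback. A minimal sketch (not the toolkit's actual renderer; the HTML markup is an assumption):

```python
import html

def render_evidence(source, span):
    """Wrap the evidence span in a <mark> tag, falling back to a
    quoted block when the span is not found verbatim in the source."""
    idx = source.find(span)
    if idx == -1:
        return f"<blockquote>Evidence: {html.escape(span)}</blockquote>"
    return (html.escape(source[:idx])
            + f"<mark>{html.escape(span)}</mark>"
            + html.escape(source[idx + len(span):]))
```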
The Toolbox provides standalone research utilities that work independently or in combination with QuantiKit / QualiKit.
Compute inter-coder reliability for 2 or more coders with automatic metric selection:
| Scenario | Metric |
|---|---|
| 2 coders, single-label | Cohen's Kappa + Krippendorff's Alpha + per-category agreement |
| 3+ coders, single-label | Krippendorff's Alpha + pairwise Cohen's Kappa |
| 2 coders, multi-label | Jaccard index (pairwise) |
| 3+ coders, multi-label | Average pairwise Jaccard |
- Upload a CSV with coder columns, select which columns to compare
- Interpretation follows the Landis & Koch (1977) scale
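For the multi-label scenarios in the table above, average pairwise Jaccard agreement is straightforward to sketch (illustrative only, not the toolkit's implementation):

```python
def mean_jaccard(codes_a, codes_b):
    # per-item Jaccard index between two coders' label sets, averaged
    scores = []
    for a, b in zip(codes_a, codes_b):
        union = set(a) | set(b)
        scores.append(len(set(a) & set(b)) / len(union) if union else 1.0)
    return sum(scores) / len(scores)

coder_a = [["health"], ["health", "cost"], ["access"]]
coder_b = [["health"], ["cost"], ["access", "cost"]]
```

For three or more coders, the same function would be applied to each coder pair and the results averaged.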
Multi-LLM majority-vote coding for qualitative data:
- Configure 2–5 LLM backends (OpenAI, Anthropic, Ollama) with independent models
- Each LLM codes every text segment; final label is determined by majority vote
- Agreement statistics across LLMs are reported automatically
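The majority-vote step can be sketched as follows (a minimal illustration; the toolkit's `consensus.py` also handles ties and per-LLM metadata):

```python
from collections import Counter

def majority_vote(labels_per_llm):
    """labels_per_llm: one label list per LLM, aligned by segment.
    Returns (winning_label, agreement_rate) for each segment."""
    results = []
    for votes in zip(*labels_per_llm):
        label, count = Counter(votes).most_common(1)[0]
        results.append((label, count / len(votes)))
    return results

votes = [
    ["trust", "cost", "trust"],    # LLM 1
    ["trust", "cost", "access"],   # LLM 2
    ["trust", "access", "trust"],  # LLM 3
]
```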
Auto-generate a methods section paragraph (English + Chinese) for your paper:
- From pipeline log: QuantiKit and QualiKit can export a pipeline log (JSON) capturing all metadata (sample size, model, metrics, themes, etc.). Import the log and generate a ready-to-use methods paragraph.
- Manual input: Fill in metadata fields manually if you prefer not to use the pipeline log.
Long research projects rarely finish in one session. SocialSciKit serializes the full state of your work — loaded DataFrames, annotation sessions (including cursor and history), extraction review sessions, research questions, de-identification results — into a single JSON file:
- Save: at the end of any pipeline, click "Save Project" in Step 6 to download a `.json` archive
- Restore: return to the Home tab, expand "Load Saved Project", upload the JSON file, and all state is restored across both pipelines
- Tagged-union serialization: complex types (`pd.DataFrame`, `AnnotationSession`, `ExtractionReviewSession`, `ResearchQuestion`, `ExtractionResult`, enums) round-trip losslessly; elapsed-time counters are preserved via monotonic time offsets
- Version-aware: project files include a `__project_version__` field so future readers can migrate old archives
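The tagged-union idea is simple: wrap each non-JSON type in a dict carrying its type name, so loading can reconstruct the original object. A minimal sketch with Python's `json` hooks (the class and its fields are illustrative stand-ins, not SocialSciKit's actual schema):

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class AnnotationSession:  # stand-in for the toolkit's session object
    cursor: int
    history: list

REGISTRY = {"AnnotationSession": AnnotationSession}

def encode(obj):
    # tag non-JSON types with their class name so they round-trip
    name = type(obj).__name__
    if name in REGISTRY:
        return {"__type__": name, "value": asdict(obj)}
    raise TypeError(f"cannot serialize {name}")

def decode(d):
    if d.get("__type__") in REGISTRY:
        return REGISTRY[d["__type__"]](**d["value"])
    return d

state = {"session": AnnotationSession(cursor=3, history=["a", "b"])}
blob = json.dumps(state, default=encode)        # save
restored = json.loads(blob, object_hook=decode)  # restore
```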
| Backend | Example Models | Use Case |
|---|---|---|
| OpenAI | `gpt-4o`, `gpt-4o-mini`, `gpt-4.1`, `gpt-4.1-mini`, `gpt-4.1-nano` | Classification, coding, prompt optimization |
| Anthropic | `claude-sonnet-4-20250514`, `claude-haiku-4-5-20251001` | Classification, coding, prompt optimization |
| Ollama | `llama3`, `mistral`, `qwen2.5` | Local inference, no API key needed |
To use Ollama, install it from ollama.com and pull a model:
```
ollama pull llama3
```

The `examples/` directory contains ready-to-use sample data:
| File | Module | Description |
|---|---|---|
| `sentiment_example.csv` | QuantiKit | 50 Chinese product/service reviews with 3 sentiment labels |
| `policy_example.csv` | QuantiKit | 40 Chinese policy text excerpts with 8 policy-instrument labels |
| `interview_example.txt` | QualiKit | Single-person community healthcare interview transcript |
| `interview_focus_group.txt` | QualiKit | 4-person focus group on elderly digital service experiences |
| `icr_example.csv` | Toolbox | 20 policy texts coded by 3 coders (A/B/C) for ICR calculation |
| `consensus_example.csv` | Toolbox | 15 interview segments for multi-LLM consensus coding |
| `methods_log_quantikit.json` | Toolbox | Sample QuantiKit pipeline log for methods generation |
| `methods_log_qualikit.json` | Toolbox | Sample QualiKit pipeline log for methods generation |
- Launch: `socialscikit launch` → click QuantiKit tab
- Upload `examples/sentiment_example.csv`
- Map columns: text → `text`, label → `label`
- Go to Step 2 → click Recommend to see method suggestion
- Go to Step 4 → select LLM backend → enter labels
- Click Generate Prompt → Run Classification
- Go to Step 5 → evaluate against gold labels
- Go to Step 6 → export results
- Launch: `socialscikit launch` → click QualiKit tab
- Upload `examples/interview_focus_group.txt`
- Step 1: select "Speaker turn" segmentation → click Segment
- Step 2: run de-identification → review and accept/reject each PII replacement
- Step 3: define RQs and sub-themes → optionally use LLM to suggest sub-themes
- Step 4: select LLM backend → run batch coding
- Step 5: review results, bulk accept high-confidence codes, manually fix low-confidence ones
- Step 6: export to Excel
socialscikit/
├── core/ # Shared infrastructure
│ ├── data_loader.py # Multi-format data reader (CSV/Excel/JSON/txt)
│ ├── data_validator.py # Schema validation + auto-fix
│ ├── data_diagnostics.py # Data quality diagnostics report
│ ├── llm_client.py # Unified LLM client (OpenAI/Anthropic/Ollama)
│ ├── icr.py # Inter-coder reliability (Kappa/Alpha/Jaccard)
│ ├── methods_writer.py # Methods section generator (EN/ZH templates)
│ ├── charts.py # Academic-style matplotlib charts (viz dashboard)
│ ├── project_io.py # Project state serialization (save/restore)
│ └── templates/ # Template files for download
│
├── quantikit/ # Text classification module
│ ├── feature_extractor.py # Dataset feature extraction
│ ├── method_recommender.py # Rule-based method recommendation (with citations)
│ ├── budget_recommender.py # Annotation budget estimation
│ ├── prompt_optimizer.py # APE-based prompt generation & optimization
│ ├── prompt_classifier.py # Zero/few-shot LLM classification
│ ├── annotator.py # Built-in annotation interface
│ ├── classifier.py # Transformer fine-tuning pipeline
│ ├── api_finetuner.py # OpenAI fine-tuning API wrapper
│ └── evaluator.py # Accuracy / F1 / Kappa / confusion matrix
│
├── qualikit/ # Qualitative coding module
│ ├── segmenter.py # Text segmentation (paragraph / speaker turn)
│ ├── segment_extractor.py # Segment-level extraction
│ ├── deidentifier.py # PII detection (Chinese + English)
│ ├── deident_reviewer.py # De-identification interactive review
│ ├── theme_definer.py # Theme definition + LLM suggestion
│ ├── theme_reviewer.py # Theme review & overlap detection
│ ├── coder.py # LLM batch coding
│ ├── confidence_ranker.py # Confidence scoring & ranking
│ ├── coding_reviewer.py # Human-in-the-loop coding review
│ ├── extraction_reviewer.py # Extraction result review
│ ├── consensus.py # Multi-LLM consensus coding (majority vote)
│ └── exporter.py # Excel / Markdown export
│
├── ui/ # Gradio web interface
│ ├── main_app.py # Unified app (Home + QuantiKit + QualiKit + Toolbox)
│ ├── quantikit_app.py # QuantiKit UI callbacks
│ ├── qualikit_app.py # QualiKit UI callbacks
│ ├── toolbox_app.py # Toolbox UI callbacks (ICR/Consensus/Methods)
│ └── i18n.py # Internationalization (EN / ZH)
│
├── cli.py # Command-line entry point
│
examples/ # Sample datasets
tests/ # Test suite (676 tests)
promo/ # Promotional posters + HTML sources
pyproject.toml # Package metadata & dependencies
CITATION.cff # Citation metadata
The method recommendation engine and workflow design are grounded in the following computational social science literature:
- Sun, B., Chang, C., Ang, Y. Y., Mu, R., Xu, Y. & Zhang, Z. (2026). Creation of the Chinese Adaptive Policy Communication Corpus. ACL 2026.
- Carlson, K. et al. (2026). The use of LLMs to annotate data in management research. Strategic Management Journal.
- Chae, Y. & Davidson, T. (2025). Large Language Models for text classification. Sociological Methods & Research.
- Do, S., Ollion, E. & Shen, R. (2024). The augmented social scientist. Sociological Methods & Research, 53(3).
- Dunivin, Z. O. (2024). Scalable qualitative coding with LLMs. arXiv:2401.15170.
- Montgomery, J. M. et al. (2024). Improving probabilistic models in text classification via active learning. APSR.
- Than, N. et al. (2025). Updating 'The Future of Coding'. Sociological Methods & Research.
- Ziems, C. et al. (2024). Can LLMs transform computational social science? Computational Linguistics, 50(1).
- Zhou, Y. et al. (2023). Large Language Models are human-level prompt engineers. ICLR 2023.
If you use SocialSciKit in your research, please cite:
```bibtex
@inproceedings{sun2026creation,
  title     = {Creation of the {Chinese} Adaptive Policy Communication Corpus},
  author    = {Sun, Bolun and Chang, Charles and Ang, Yuen Yuen and Mu, Ruotong and Xu, Yuchen and Zhang, Zhengxin},
  booktitle = {Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2026)},
  year      = {2026}
}
```

```
# Clone the repository
git clone https://github.com/Baron-Sun/socialscikit.git
cd socialscikit

# Install in editable mode with dev dependencies
pip install -e ".[dev]"

# Run the full test suite
pytest tests/ -v

# Code style check
ruff check .

# Smoke-test the UI from Python
python -c "from socialscikit.ui.main_app import create_app; create_app().launch()"
```

License: MIT
Disclaimer:
- De-identification module: Automatic PII detection is a preliminary processing tool. Manual review is mandatory before IRB submission. This tool does not guarantee complete removal of all identifying information.
- LLM classification / coding: Results should be treated as research assistance. Critical research conclusions require human validation.
- Budget recommendation: Based on statistical estimation. Actual requirements may vary depending on task complexity and data characteristics.
Bolun Sun (孙伯伦)
Ph.D. Student, Kellogg School of Management, Northwestern University
Research interests: Computational Social Science, NLP, Human-Centered AI
Email: bolun.sun@kellogg.northwestern.edu | Web: baron-sun.github.io
This project is actively maintained and updated. Contributions, suggestions, and feedback are very welcome! Feel free to open an issue or submit a pull request.