A zero-code text analysis toolkit for social science researchers
English | Chinese documentation
SocialSciKit is an open-source Python toolkit that enables social science researchers to perform text analysis without writing a single line of code. It provides a Gradio-based web interface with full bilingual support (English / Chinese).
Three core modules:
- QuantiKit — End-to-end text classification pipeline (method recommendation → annotation → prompt/fine-tuning classification → evaluation → export)
- QualiKit — End-to-end qualitative coding pipeline (upload → de-identification → research framework → LLM coding with evidence grounding → human review → export)
- Toolbox — Standalone research methods tools: Inter-Coder Reliability (ICR) calculator, Multi-LLM Consensus Coding, and Methods Section Generator
Additional features:
- Visualization Dashboard — academic-style matplotlib charts (confusion matrix heatmaps, per-class P/R/F1 bars, confidence histograms, progress donuts, theme distribution) embedded throughout both pipelines
- Evidence Highlighting — LLM codings include a verbatim `evidence_span` from the source text; the review UI highlights the supporting quote inline in the original document
- Project Save & Restore — serialize the entire research project state (data, annotations, sessions, coding results) to a single JSON file; resume work later from the Home tab
- Zero-code web UI — Gradio 4.44+ with full EN/ZH language switching at runtime
- Installation
- Quick Start
- QuantiKit: Text Classification
- QualiKit: Qualitative Coding
- Toolbox: Research Methods Tools
- Project Save & Restore
- Supported LLM Backends
- Example Datasets
- Project Structure
- Key References
- Citation
- Development
- License & Disclaimer
- Author
- Python 3.9 or higher
- pip (Python package manager)
```
pip install socialscikit
```

Or install from source:

```
git clone https://github.com/Baron-Sun/socialscikit.git
cd socialscikit
pip install -e .
```

| Package | Version | Purpose |
|---|---|---|
| `gradio` | ≥ 4.0 | Web UI framework |
| `pandas` | ≥ 2.0 | Data manipulation |
| `openpyxl` | any | Excel read/write |
| `spacy` | ≥ 3.7 | NLP pipeline (tokenization, NER) |
| `transformers` | ≥ 4.40 | Fine-tuning (RoBERTa / XLM-R) |
| `datasets` | any | HuggingFace dataset handling |
| `openai` | ≥ 1.0 | OpenAI API client |
| `anthropic` | ≥ 0.25 | Anthropic API client |
| `scikit-learn` | any | Evaluation metrics |
| `scipy` | any | Statistical computation |
| `bertopic` | any | Topic modeling |
| `presidio-analyzer` | any | PII detection engine |
| `presidio-anonymizer` | any | PII anonymization |
| `langdetect` | any | Language detection |
| `tiktoken` | any | Token counting |
| `httpx` | any | Ollama HTTP client |
| `rich` | any | CLI formatting |
For best de-identification performance, download at least one spaCy model:
```
# English
python -m spacy download en_core_web_sm
# Chinese
python -m spacy download zh_core_web_sm
```

Launch the web app:

```
socialscikit launch
# or simply:
socialscikit
# Opens at http://127.0.0.1:7860
```

Launch a single module:

```
# QuantiKit only
socialscikit quantikit --port 7860
# QualiKit only
socialscikit qualikit --port 7861
```

| Flag | Description | Default |
|---|---|---|
| `--port` | Server port number | 7860 / 7861 |
| `--share` | Create a public Gradio link | False |
The default UI language is English. Use the Language toggle at the top of the page to switch to Chinese. All labels, buttons, and instructions update in real time.
QuantiKit guides you through the full text classification workflow in 6 steps.
- Supported formats: CSV, Excel (.xlsx/.xls), JSON, JSONL
- Upload your data file, then map the `text` and `label` columns
- Automatic data validation: detects missing values, empty strings, encoding issues
- One-click fix: auto-repair common data quality issues
- Diagnostic report: label distribution, text length statistics, duplicate detection
- Method recommender: analyzes your data characteristics (size, class count, imbalance ratio, text length) and recommends the optimal classification approach — zero-shot, few-shot, or fine-tuning — with literature citations
- Budget recommender: estimates "how many labels do you need?" using power-law learning curve fitting, with 80% confidence intervals and marginal return curves
- Cold-start mode: priors from CSS benchmark datasets (HatEval, SemEval, MFTC)
- Empirical mode: fits `f1 = a * n^b + c` on your labeled subset
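The empirical fit can be sketched in plain Python, assuming nothing about the toolkit's internals: grid-search the F1 asymptote `c`, then solve the remaining log-linear regression in closed form (`fit_power_law` and the synthetic data below are illustrative, not SocialSciKit's estimator):

```python
import math

def fit_power_law(ns, f1s):
    """Fit f1 = a * n^b + c (a < 0, b < 0, asymptote c) by
    grid-searching c and log-linear least squares for a and b."""
    top = max(f1s)
    best = None
    for i in range(201):
        c = top + 0.001 + 0.3 * i / 200  # candidate asymptote above the data
        xs = [math.log(n) for n in ns]
        ys = [math.log(c - f) for f in f1s]  # c - f1 = (-a) * n^b
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
            / sum((x - mx) ** 2 for x in xs)
        a = -math.exp(my - b * mx)
        sse = sum((a * n ** b + c - f) ** 2 for n, f in zip(ns, f1s))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    return best[1:]

# synthetic learning curve with true parameters a=-0.9, b=-0.5, c=0.85
ns = [50, 100, 200, 400, 800]
f1s = [0.85 - 0.9 * n ** -0.5 for n in ns]
a, b, c = fit_power_law(ns, f1s)
```

Given the fitted curve, "labels needed for a target F1" follows by inverting it: `n = ((target - c) / a) ** (1 / b)`.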
- Built-in annotation UI — no need for external tools
- Label each text sample, with skip, undo, flag for review support
- Real-time progress donut chart — visual progress tracker updates after every action
- Export annotated data as CSV, merge with original dataset
Three sub-approaches available in parallel tabs:
| Sub-tab | Method | When to use |
|---|---|---|
| Prompt Classification | Zero/few-shot via LLM API | Small datasets (< 200 labeled) |
| Fine-tuning | Local transformer fine-tuning | Medium datasets (200+), no API cost |
| API Fine-tuning | OpenAI fine-tuning API | Large datasets, best performance |
Prompt Classification features:
- Prompt Designer: task description + class definitions + positive/negative examples → auto-generates a structured prompt
- Prompt Optimizer: generates 3 variants using APE (Automatic Prompt Engineering), evaluates each on a test split
- One-click batch classification on the full dataset
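A structured prompt of this shape can be assembled mechanically. Here is a minimal sketch; the function name, field layout, and output instruction are assumptions, not the toolkit's actual template:

```python
def build_prompt(task, class_defs, examples=()):
    """Assemble a zero/few-shot classification prompt from task metadata."""
    lines = [task, "", "Classes:"]
    for name, definition in class_defs.items():
        lines.append(f"- {name}: {definition}")
    if examples:
        lines += ["", "Examples:"]
        for text, label in examples:
            lines.append(f'Text: "{text}" -> {label}')
    lines += ["", "Answer with the class name only."]
    return "\n".join(lines)

prompt = build_prompt(
    "Classify the sentiment of each product review.",
    {"positive": "expresses satisfaction", "negative": "expresses dissatisfaction"},
    [("Arrived quickly, works great.", "positive")],
)
```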
Full visualization dashboard:
- Metric summary cards (HTML) — Accuracy, Macro-F1, Weighted-F1, Cohen's Kappa, total/correct counts
- Confusion matrix heatmap — row-normalized, annotated with counts + percentages
- Per-class metrics bar chart — Precision / Recall / F1 grouped bars per class
- Collapsible full text report below charts
- Download classification results as CSV (original text + predicted labels + confidence)
- Pipeline log export — JSON metadata usable by the Toolbox Methods Generator
- Save project — persist all research state (data, predictions, annotation session) to a single JSON file
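The summary-card metrics are standard quantities; for intuition, here is a self-contained sketch of macro-F1 and Cohen's Kappa in plain Python (the toolkit's own evaluator relies on scikit-learn):

```python
from collections import Counter

def cohens_kappa(y_true, y_pred):
    # observed agreement corrected for agreement expected by chance
    n = len(y_true)
    po = sum(t == p for t, p in zip(y_true, y_pred)) / n
    ct, cp = Counter(y_true), Counter(y_pred)
    pe = sum(ct[k] * cp.get(k, 0) for k in ct) / (n * n)
    return (po - pe) / (1 - pe)

def macro_f1(y_true, y_pred):
    # unweighted mean of per-class F1 scores
    scores = []
    for cls in set(y_true) | set(y_pred):
        tp = sum(t == p == cls for t, p in zip(y_true, y_pred))
        fp = sum(p == cls and t != cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)

gold = ["pos", "pos", "neg", "neg", "neu", "pos"]
pred = ["pos", "neg", "neg", "neg", "neu", "pos"]
```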
QualiKit supports the full qualitative coding workflow for interview transcripts, focus group data, and open-ended survey responses.
- Supported formats: plain text (.txt)
- Automatic speaker detection and segmentation (by paragraph or by speaker turn)
- Configurable context window (number of surrounding sentences to include)
- Preview segmented results in a table before proceeding
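Speaker-turn segmentation can be sketched with a simple pattern for `Speaker:` lines. This is an illustrative sketch only; the toolkit's `segmenter.py` handles more cases (context windows, paragraph mode):

```python
import re

SPEAKER_LINE = re.compile(r"^([A-Za-z][\w ]{0,30}):\s*(.*)$")

def segment_by_speaker(transcript):
    """Group transcript lines into speaker turns."""
    segments, current = [], None
    for raw in transcript.splitlines():
        line = raw.strip()
        m = SPEAKER_LINE.match(line)
        if m:  # a new speaker turn begins
            if current:
                segments.append(current)
            current = {"speaker": m.group(1), "text": m.group(2)}
        elif line and current:  # continuation of the current turn
            current["text"] += " " + line
    if current:
        segments.append(current)
    return segments

demo = """Interviewer: How do you book appointments?
P1: Mostly through the app.
It took a while to learn.
Interviewer: What was hardest?"""
```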
- Automatic PII detection: person names, email addresses, phone numbers, Chinese ID card numbers
- Chinese-aware NER: detects Chinese names with title/honorific patterns
- English NER via spaCy and Presidio
- Replacement strategies: pseudonym, redact (`[REDACTED]`), or tag-based (`[PERSON_1]`)
- Per-item review: accept, reject, or edit each detected PII replacement individually
- Bulk actions: accept all, accept high-confidence only (≥ 0.90), or apply all accepted to the text
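The tag-based replacement strategy can be sketched with numbered placeholders. The regex patterns below are deliberately crude illustrations; the toolkit's real detection uses spaCy NER and Presidio:

```python
import re

# Illustrative patterns only; real PII detection needs NER, not just regex
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def tag_pii(text):
    """Replace each PII match with a numbered tag like [EMAIL_1]."""
    counts = {}
    for label, pattern in PII_PATTERNS.items():
        def replace(match, label=label):
            counts[label] = counts.get(label, 0) + 1
            return f"[{label}_{counts[label]}]"
        text = pattern.sub(replace, text)
    return text
```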
- Define your Research Questions (RQs) and Sub-themes using an interactive editable table
- Add/remove rows dynamically
- LLM-powered sub-theme suggestion: connect to an LLM backend, and it analyzes your transcript to suggest relevant sub-themes per RQ
- Confirm framework before proceeding to coding
- Batch coding: LLM reads each segment and assigns RQ + sub-theme labels with confidence scores
- Evidence grounding: the LLM prompt requires a verbatim `evidence_span` — the exact phrase or sentence from the source text that supports the coding decision
- Supports OpenAI, Anthropic, and Ollama backends
- Results displayed with segment text, assigned codes, confidence levels, and evidence spans
- Review coding results in a table sorted by confidence
- Evidence highlighting: when you select an item, the original text is shown with the LLM's `evidence_span` highlighted in green, so you can verify the coding decision at a glance; if the exact quote isn't found, a fallback "Evidence" block displays the cited text
- Visualization dashboard (collapsible accordion):
- Review progress donut — accepted / edited / rejected / pending counts
- Confidence histogram — low/medium/high tier shading + median marker
- Theme distribution — horizontal bar chart of RQ frequencies
- Per-item actions: accept, reject, or edit (reassign RQ/sub-theme)
- Bulk accept by confidence threshold
- Manual coding: select a segment, preview its content, and manually assign RQ + sub-theme labels
- Cascading dropdown: sub-theme choices automatically filter based on selected RQ
- Export reviewed coding results as structured Excel file
- Pipeline log export — JSON metadata usable by the Toolbox Methods Generator
- Save project — persist the entire coding session (segments, RQs, review state, evidence spans) to a single JSON file
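The evidence-highlighting behaviour amounts to a substring search with the documented fallback. A minimal sketch (not the toolkit's actual renderer; the HTML markup is an assumption):

```python
import html

def render_evidence(source, span):
    """Wrap the evidence span in a <mark> tag, falling back to a
    quoted block when the span is not found verbatim in the source."""
    idx = source.find(span)
    if idx == -1:
        return f"<blockquote>Evidence: {html.escape(span)}</blockquote>"
    return (html.escape(source[:idx])
            + f"<mark>{html.escape(span)}</mark>"
            + html.escape(source[idx + len(span):]))
```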
The Toolbox provides standalone research utilities that work independently or in combination with QuantiKit / QualiKit.
Compute inter-coder reliability for 2 or more coders with automatic metric selection:
| Scenario | Metric |
|---|---|
| 2 coders, single-label | Cohen's Kappa + Krippendorff's Alpha + per-category agreement |
| 3+ coders, single-label | Krippendorff's Alpha + pairwise Cohen's Kappa |
| 2 coders, multi-label | Jaccard index (pairwise) |
| 3+ coders, multi-label | Average pairwise Jaccard |
- Upload a CSV with coder columns, select which columns to compare
- Interpretation follows the Landis & Koch (1977) scale
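For the multi-label scenarios in the table above, average pairwise Jaccard agreement is straightforward to sketch (illustrative only, not the toolkit's implementation):

```python
def mean_jaccard(codes_a, codes_b):
    # per-item Jaccard index between two coders' label sets, averaged
    scores = []
    for a, b in zip(codes_a, codes_b):
        union = set(a) | set(b)
        scores.append(len(set(a) & set(b)) / len(union) if union else 1.0)
    return sum(scores) / len(scores)

coder_a = [["health"], ["health", "cost"], ["access"]]
coder_b = [["health"], ["cost"], ["access", "cost"]]
```

For three or more coders, the same function would be applied to each coder pair and the results averaged.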
Multi-LLM majority-vote coding for qualitative data:
- Configure 2–5 LLM backends (OpenAI, Anthropic, Ollama) with independent models
- Each LLM codes every text segment; final label is determined by majority vote
- Agreement statistics across LLMs are reported automatically
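The majority-vote step can be sketched as follows (a minimal illustration; the toolkit's `consensus.py` also handles ties and per-LLM metadata):

```python
from collections import Counter

def majority_vote(labels_per_llm):
    """labels_per_llm: one label list per LLM, aligned by segment.
    Returns (winning_label, agreement_rate) for each segment."""
    results = []
    for votes in zip(*labels_per_llm):
        label, count = Counter(votes).most_common(1)[0]
        results.append((label, count / len(votes)))
    return results

votes = [
    ["trust", "cost", "trust"],    # LLM 1
    ["trust", "cost", "access"],   # LLM 2
    ["trust", "access", "trust"],  # LLM 3
]
```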
Auto-generate a methods section paragraph (English + Chinese) for your paper:
- From pipeline log: QuantiKit and QualiKit can export a pipeline log (JSON) capturing all metadata (sample size, model, metrics, themes, etc.). Import the log and generate a ready-to-use methods paragraph.
- Manual input: Fill in metadata fields manually if you prefer not to use the pipeline log.
Long research projects rarely finish in one session. SocialSciKit serializes the full state of your work — loaded DataFrames, annotation sessions (including cursor and history), extraction review sessions, research questions, de-identification results — into a single JSON file:
- Save: at the end of any pipeline, click "Save Project" in Step 6 to download a `.json` archive
- Restore: return to the Home tab, expand "Load Saved Project", upload the JSON file, and all state is restored across both pipelines
- Tagged-union serialization: complex types (`pd.DataFrame`, `AnnotationSession`, `ExtractionReviewSession`, `ResearchQuestion`, `ExtractionResult`, enums) round-trip losslessly; elapsed-time counters are preserved via monotonic time offsets
- Version-aware: project files include a `__project_version__` field so future readers can migrate old archives
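The tagged-union idea is simple: wrap each non-JSON type in a dict carrying its type name, so loading can reconstruct the original object. A minimal sketch with Python's `json` hooks (the class and its fields are illustrative stand-ins, not SocialSciKit's actual schema):

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class AnnotationSession:  # stand-in for the toolkit's session object
    cursor: int
    history: list

REGISTRY = {"AnnotationSession": AnnotationSession}

def encode(obj):
    # tag non-JSON types with their class name so they round-trip
    name = type(obj).__name__
    if name in REGISTRY:
        return {"__type__": name, "value": asdict(obj)}
    raise TypeError(f"cannot serialize {name}")

def decode(d):
    if d.get("__type__") in REGISTRY:
        return REGISTRY[d["__type__"]](**d["value"])
    return d

state = {"session": AnnotationSession(cursor=3, history=["a", "b"])}
blob = json.dumps(state, default=encode)        # save
restored = json.loads(blob, object_hook=decode)  # restore
```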
| Backend | Example Models | Use Case |
|---|---|---|
| OpenAI | `gpt-4o`, `gpt-4o-mini`, `gpt-4.1`, `gpt-4.1-mini`, `gpt-4.1-nano` | Classification, coding, prompt optimization |
| Anthropic | `claude-sonnet-4-20250514`, `claude-haiku-4-5-20251001` | Classification, coding, prompt optimization |
| Ollama | `llama3`, `mistral`, `qwen2.5` | Local inference, no API key needed |
To use Ollama, install it from ollama.com and pull a model:
```
ollama pull llama3
```

The `examples/` directory contains ready-to-use sample data:
| File | Module | Description |
|---|---|---|
| `sentiment_example.csv` | QuantiKit | 50 Chinese product/service reviews with 3 sentiment labels |
| `policy_example.csv` | QuantiKit | 40 Chinese policy text excerpts with 8 policy-instrument labels |
| `interview_example.txt` | QualiKit | Single-person community healthcare interview transcript |
| `interview_focus_group.txt` | QualiKit | 4-person focus group on elderly digital service experiences |
| `icr_example.csv` | Toolbox | 20 policy texts coded by 3 coders (A/B/C) for ICR calculation |
| `consensus_example.csv` | Toolbox | 15 interview segments for multi-LLM consensus coding |
| `methods_log_quantikit.json` | Toolbox | Sample QuantiKit pipeline log for methods generation |
| `methods_log_qualikit.json` | Toolbox | Sample QualiKit pipeline log for methods generation |
- Launch: `socialscikit launch` → click QuantiKit tab
- Upload `examples/sentiment_example.csv`
- Map columns: text → `text`, label → `label`
- Go to Step 2 → click Recommend to see method suggestion
- Go to Step 4 → select LLM backend → enter labels
- Click Generate Prompt → Run Classification
- Go to Step 5 → evaluate against gold labels
- Go to Step 6 → export results
- Launch: `socialscikit launch` → click QualiKit tab
- Upload `examples/interview_focus_group.txt`
- Step 1: select "Speaker turn" segmentation → click Segment
- Step 2: run de-identification → review and accept/reject each PII replacement
- Step 3: define RQs and sub-themes → optionally use LLM to suggest sub-themes
- Step 4: select LLM backend → run batch coding
- Step 5: review results, bulk accept high-confidence codes, manually fix low-confidence ones
- Step 6: export to Excel
socialscikit/
├── core/ # Shared infrastructure
│ ├── data_loader.py # Multi-format data reader (CSV/Excel/JSON/txt)
│ ├── data_validator.py # Schema validation + auto-fix
│ ├── data_diagnostics.py # Data quality diagnostics report
│ ├── llm_client.py # Unified LLM client (OpenAI/Anthropic/Ollama)
│ ├── icr.py # Inter-coder reliability (Kappa/Alpha/Jaccard)
│ ├── methods_writer.py # Methods section generator (EN/ZH templates)
│ ├── charts.py # Academic-style matplotlib charts (viz dashboard)
│ ├── project_io.py # Project state serialization (save/restore)
│ └── templates/ # Template files for download
│
├── quantikit/ # Text classification module
│ ├── feature_extractor.py # Dataset feature extraction
│ ├── method_recommender.py # Rule-based method recommendation (with citations)
│ ├── budget_recommender.py # Annotation budget estimation
│ ├── prompt_optimizer.py # APE-based prompt generation & optimization
│ ├── prompt_classifier.py # Zero/few-shot LLM classification
│ ├── annotator.py # Built-in annotation interface
│ ├── classifier.py # Transformer fine-tuning pipeline
│ ├── api_finetuner.py # OpenAI fine-tuning API wrapper
│ └── evaluator.py # Accuracy / F1 / Kappa / confusion matrix
│
├── qualikit/ # Qualitative coding module
│ ├── segmenter.py # Text segmentation (paragraph / speaker turn)
│ ├── segment_extractor.py # Segment-level extraction
│ ├── deidentifier.py # PII detection (Chinese + English)
│ ├── deident_reviewer.py # De-identification interactive review
│ ├── theme_definer.py # Theme definition + LLM suggestion
│ ├── theme_reviewer.py # Theme review & overlap detection
│ ├── coder.py # LLM batch coding
│ ├── confidence_ranker.py # Confidence scoring & ranking
│ ├── coding_reviewer.py # Human-in-the-loop coding review
│ ├── extraction_reviewer.py # Extraction result review
│ ├── consensus.py # Multi-LLM consensus coding (majority vote)
│ └── exporter.py # Excel / Markdown export
│
├── ui/ # Gradio web interface
│ ├── main_app.py # Unified app (Home + QuantiKit + QualiKit + Toolbox)
│ ├── quantikit_app.py # QuantiKit UI callbacks
│ ├── qualikit_app.py # QualiKit UI callbacks
│ ├── toolbox_app.py # Toolbox UI callbacks (ICR/Consensus/Methods)
│ └── i18n.py # Internationalization (EN / ZH)
│
├── cli.py # Command-line entry point
│
examples/ # Sample datasets
tests/ # Test suite (676 tests)
promo/ # Promotional posters + HTML sources
pyproject.toml # Package metadata & dependencies
CITATION.cff # Citation metadata
The method recommendation engine and workflow design are grounded in the following computational social science literature:
- Sun, B., Chang, C., Ang, Y. Y., Mu, R., Xu, Y. & Zhang, Z. (2026). Creation of the Chinese Adaptive Policy Communication Corpus. ACL 2026.
- Carlson, K. et al. (2026). The use of LLMs to annotate data in management research. Strategic Management Journal.
- Chae, Y. & Davidson, T. (2025). Large Language Models for text classification. Sociological Methods & Research.
- Do, S., Ollion, E. & Shen, R. (2024). The augmented social scientist. Sociological Methods & Research, 53(3).
- Dunivin, Z. O. (2024). Scalable qualitative coding with LLMs. arXiv:2401.15170.
- Montgomery, J. M. et al. (2024). Improving probabilistic models in text classification via active learning. APSR.
- Than, N. et al. (2025). Updating 'The Future of Coding'. Sociological Methods & Research.
- Ziems, C. et al. (2024). Can LLMs transform computational social science? Computational Linguistics, 50(1).
- Zhou, Y. et al. (2023). Large Language Models are human-level prompt engineers. ICLR 2023.
If you use SocialSciKit in your research, please cite:
```bibtex
@inproceedings{sun2026creation,
  title     = {Creation of the {Chinese} Adaptive Policy Communication Corpus},
  author    = {Sun, Bolun and Chang, Charles and Ang, Yuen Yuen and Mu, Ruotong and Xu, Yuchen and Zhang, Zhengxin},
  booktitle = {Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2026)},
  year      = {2026}
}
```

```
# Clone the repository
git clone https://github.com/Baron-Sun/socialscikit.git
cd socialscikit

# Install in editable mode with dev dependencies
pip install -e ".[dev]"

# Run the full test suite
pytest tests/ -v

# Code style check
ruff check .

# Smoke-test the UI from Python
python -c "from socialscikit.ui.main_app import create_app; create_app().launch()"
```

License: MIT
Disclaimer:
- De-identification module: Automatic PII detection is a preliminary processing tool. Manual review is mandatory before IRB submission. This tool does not guarantee complete removal of all identifying information.
- LLM classification / coding: Results should be treated as research assistance. Critical research conclusions require human validation.
- Budget recommendation: Based on statistical estimation. Actual requirements may vary depending on task complexity and data characteristics.
Bolun Sun (孙伯伦)
Ph.D. Student, Kellogg School of Management, Northwestern University
Research interests: Computational Social Science, NLP, Human-Centered AI
Email: bolun.sun@kellogg.northwestern.edu | Web: baron-sun.github.io
This project is actively maintained and updated. Contributions, suggestions, and feedback are very welcome! Feel free to open an issue or submit a pull request.