SocialSciKit

A zero-code text analysis toolkit for social science researchers


English | 中文文档 (Chinese documentation)


What is SocialSciKit?

SocialSciKit is an open-source Python toolkit that enables social science researchers to perform text analysis without writing a single line of code. It provides a Gradio-based web interface with full bilingual support (English / Chinese).

Three core modules:

  • QuantiKit — End-to-end text classification pipeline (method recommendation → annotation → prompt/fine-tuning classification → evaluation → export)
  • QualiKit — End-to-end qualitative coding pipeline (upload → de-identification → research framework → LLM coding with evidence grounding → human review → export)
  • Toolbox — Standalone research methods tools: Inter-Coder Reliability (ICR) calculator, Multi-LLM Consensus Coding, and Methods Section Generator

Highlights

  • Visualization Dashboard — academic-style matplotlib charts (confusion matrix heatmaps, per-class P/R/F1 bars, confidence histograms, progress donuts, theme distribution) embedded throughout both pipelines
  • Evidence Highlighting — LLM codings include a verbatim evidence_span from the source text; the review UI highlights the supporting quote inline in the original document
  • Project Save & Restore — serialize the entire research project state (data, annotations, sessions, coding results) to a single JSON file; resume work later from the Home tab
  • Zero-code web UI — Gradio 4.44+ with full EN/ZH language switching at runtime

Installation

Requirements

  • Python 3.9 or higher
  • pip (Python package manager)

Option A: Install from PyPI

pip install socialscikit

Option B: Install from source

git clone https://github.com/Baron-Sun/socialscikit.git
cd socialscikit
pip install -e .

Core Dependencies

Package Version Purpose
gradio ≥ 4.0 Web UI framework
pandas ≥ 2.0 Data manipulation
openpyxl any Excel read/write
spacy ≥ 3.7 NLP pipeline (tokenization, NER)
transformers ≥ 4.40 Fine-tuning (RoBERTa / XLM-R)
datasets any HuggingFace dataset handling
openai ≥ 1.0 OpenAI API client
anthropic ≥ 0.25 Anthropic API client
scikit-learn any Evaluation metrics
scipy any Statistical computation
bertopic any Topic modeling
presidio-analyzer any PII detection engine
presidio-anonymizer any PII anonymization
langdetect any Language detection
tiktoken any Token counting
httpx any Ollama HTTP client
rich any CLI formatting

Optional: spaCy language models

For best de-identification performance, download at least one spaCy model:

# English
python -m spacy download en_core_web_sm

# Chinese
python -m spacy download zh_core_web_sm

Quick Start

Launch the unified app (recommended)

socialscikit launch
# or simply:
socialscikit
# Opens at http://127.0.0.1:7860

Launch individual modules

# QuantiKit only
socialscikit quantikit --port 7860

# QualiKit only
socialscikit qualikit --port 7861

CLI Options

Flag Description Default
--port Server port number 7860 / 7861
--share Create a public Gradio link False

First-time language switch

The default UI language is English. Use the Language toggle at the top of the page to switch to Chinese. All labels, buttons, and instructions update in real time.


QuantiKit: Text Classification

QuantiKit guides you through the full text classification workflow in 6 steps.

Step 1 · Data Upload

  • Supported formats: CSV, Excel (.xlsx/.xls), JSON, JSONL
  • Upload your data file, then map the text and label columns
  • Automatic data validation: detects missing values, empty strings, encoding issues
  • One-click fix: auto-repair common data quality issues
  • Diagnostic report: label distribution, text length statistics, duplicate detection

Step 2 · Recommendation

  • Method recommender: analyzes your data characteristics (size, class count, imbalance ratio, text length) and recommends the optimal classification approach — zero-shot, few-shot, or fine-tuning — with literature citations
  • Budget recommender: estimates "how many labels do you need?" using power-law learning curve fitting, with 80% confidence intervals and marginal return curves
    • Cold-start mode: priors from CSS benchmark datasets (HatEval, SemEval, MFTC)
    • Empirical mode: fits f1 = a * n^b + c on your labeled subset
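The empirical mode's curve fit can be sketched with `scipy.optimize.curve_fit`. This is an illustrative sketch, not the package's actual implementation; the function names and starting values are assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    # Power-law learning curve: F1 as a function of labeled sample size n
    return a * np.power(n, b) + c

def fit_learning_curve(sizes, f1_scores):
    """Fit f1 = a * n^b + c to observed (sample size, F1) pairs."""
    params, _ = curve_fit(
        power_law, sizes, f1_scores,
        p0=(-1.0, -0.5, 0.9),   # start from a saturating curve with asymptote near 0.9
        maxfev=10000,
    )
    return params  # (a, b, c)

# Synthetic example: F1 improves with diminishing returns as n grows
sizes = np.array([50, 100, 200, 400, 800])
f1 = power_law(sizes, -2.0, -0.6, 0.85)
a, b, c = fit_learning_curve(sizes, f1)
```

The asymptote `c` is the ceiling F1 the fit predicts; the marginal-return curve is the derivative `a * b * n^(b-1)`, which is what makes the "how many labels do you need?" estimate possible.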

Step 3 · Annotation

  • Built-in annotation UI — no need for external tools
  • Label each text sample, with skip, undo, flag for review support
  • Real-time progress donut chart — visual progress tracker updates after every action
  • Export annotated data as CSV, merge with original dataset

Step 4 · Classification

Three sub-approaches available in parallel tabs:

Sub-tab Method When to use
Prompt Classification Zero/few-shot via LLM API Small datasets (< 200 labeled)
Fine-tuning Local transformer fine-tuning Medium datasets (200+), no API cost
API Fine-tuning OpenAI fine-tuning API Large datasets, best performance

Prompt Classification features:

  • Prompt Designer: task description + class definitions + positive/negative examples → auto-generates a structured prompt
  • Prompt Optimizer: generates 3 variants using APE (Automatic Prompt Engineering), evaluates each on a test split
  • One-click batch classification on the full dataset
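The structured-prompt idea behind the Prompt Designer can be sketched as follows. The function name and exact prompt layout here are illustrative, not the toolkit's API.

```python
def build_classification_prompt(task, class_defs, examples, text):
    """Assemble a structured zero/few-shot classification prompt.

    task:       one-sentence task description
    class_defs: {label: definition}
    examples:   list of (text, label) few-shot pairs (may be empty)
    text:       the item to classify
    """
    lines = [f"Task: {task}", "", "Classes:"]
    for label, definition in class_defs.items():
        lines.append(f"- {label}: {definition}")
    if examples:
        lines += ["", "Examples:"]
        for ex_text, ex_label in examples:
            lines.append(f'Text: "{ex_text}" -> {ex_label}')
    lines += ["", "Classify the following text. Respond with one label only.",
              f'Text: "{text}"']
    return "\n".join(lines)

prompt = build_classification_prompt(
    task="Classify product reviews by sentiment.",
    class_defs={"positive": "expresses satisfaction",
                "negative": "expresses dissatisfaction"},
    examples=[("Great service!", "positive")],
    text="The delivery was late and the box was damaged.",
)
```

Leaving `examples` empty yields a zero-shot prompt; adding labeled pairs turns the same template into a few-shot prompt, which is why the two modes share one designer.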

Step 5 · Evaluation

Full visualization dashboard:

  • Metric summary cards (HTML) — Accuracy, Macro-F1, Weighted-F1, Cohen's Kappa, total/correct counts
  • Confusion matrix heatmap — row-normalized, annotated with counts + percentages
  • Per-class metrics bar chart — Precision / Recall / F1 grouped bars per class
  • Collapsible full text report below charts
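The dashboard's summary metrics map directly onto scikit-learn. A minimal sketch (the `evaluate` helper is illustrative, not the toolkit's API):

```python
from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             confusion_matrix, f1_score)

def evaluate(y_true, y_pred, labels):
    """Compute the summary metrics shown on the Step 5 dashboard."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "macro_f1": f1_score(y_true, y_pred, average="macro", labels=labels),
        "weighted_f1": f1_score(y_true, y_pred, average="weighted", labels=labels),
        "kappa": cohen_kappa_score(y_true, y_pred),
        # Row-normalized confusion matrix, as in the heatmap
        "confusion": confusion_matrix(y_true, y_pred, labels=labels,
                                      normalize="true"),
    }

gold = ["pos", "neg", "pos", "neu", "neg", "pos"]
pred = ["pos", "neg", "neg", "neu", "neg", "pos"]
metrics = evaluate(gold, pred, labels=["pos", "neu", "neg"])
```

Cohen's Kappa is reported alongside accuracy because it discounts agreement expected by chance, which matters on imbalanced label sets.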

Step 6 · Export

  • Download classification results as CSV (original text + predicted labels + confidence)
  • Pipeline log export — JSON metadata usable by the Toolbox Methods Generator
  • Save project — persist all research state (data, predictions, annotation session) to a single JSON file

QualiKit: Qualitative Coding

QualiKit supports the full qualitative coding workflow for interview transcripts, focus group data, and open-ended survey responses.

Step 1 · Upload & Segment

  • Supported formats: plain text (.txt)
  • Automatic speaker detection and segmentation (by paragraph or by speaker turn)
  • Configurable context window (number of surrounding sentences to include)
  • Preview segmented results in a table before proceeding
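Speaker-turn segmentation can be sketched with a line-oriented regex pass. This is an assumption about the approach, not the module's actual code; the real segmenter also handles paragraph mode and context windows.

```python
import re

# A turn starts with 'Name:' at the beginning of a line
SPEAKER_RE = re.compile(r"^([A-Z][\w .\-]{0,30}):\s*(.*)$")

def segment_by_speaker(transcript):
    """Split a transcript into (speaker, utterance) turns.

    Lines matching 'Name: text' start a new turn; other non-empty
    lines are treated as continuations of the current turn.
    """
    turns = []
    for line in transcript.splitlines():
        m = SPEAKER_RE.match(line.strip())
        if m:
            turns.append((m.group(1), m.group(2)))
        elif line.strip() and turns:
            speaker, text = turns[-1]
            turns[-1] = (speaker, text + " " + line.strip())
    return turns

sample = """Moderator: How do you use the health app?
P1: I check it every morning.
It reminds me to take medication.
P2: I mostly ignore the notifications."""
turns = segment_by_speaker(sample)
```

Note how the continuation line without a `Name:` prefix is folded into P1's turn rather than starting a new segment.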

Step 2 · De-identification

  • Automatic PII detection: person names, email addresses, phone numbers, Chinese ID card numbers
  • Chinese-aware NER: detects Chinese names with title/honorific patterns
  • English NER via spaCy and Presidio
  • Replacement strategies: pseudonym, redact ([REDACTED]), or tag-based ([PERSON_1])
  • Per-item review: accept, reject, or edit each detected PII replacement individually
  • Bulk actions: accept all, accept high-confidence only (≥ 0.90), or apply all accepted to the text
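The tag-based replacement strategy can be sketched with numbered placeholders. This regex-only sketch covers structured PII (emails, phone numbers); the real module additionally uses spaCy NER and Presidio for names, which regexes alone cannot catch.

```python
import re

# Illustrative patterns only; the actual detectors are spaCy/Presidio-based
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3,4}[-.\s]?\d{4}\b"),
}

def tag_pii(text):
    """Replace detected PII with numbered tags like [EMAIL_1]."""
    counters = {}
    def make_repl(kind):
        def _sub(match):
            counters[kind] = counters.get(kind, 0) + 1
            return f"[{kind}_{counters[kind]}]"
        return _sub
    for kind, pattern in PATTERNS.items():
        text = pattern.sub(make_repl(kind), text)
    return text

clean = tag_pii("Contact Li Wei at li.wei@example.com or 138-0013-8000.")
# The name "Li Wei" survives here: detecting it requires NER, not regex —
# which is exactly why manual review remains mandatory.
```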

Step 3 · Research Framework

  • Define your Research Questions (RQs) and Sub-themes using an interactive editable table
  • Add/remove rows dynamically
  • LLM-powered sub-theme suggestion: connect to an LLM backend, and it analyzes your transcript to suggest relevant sub-themes per RQ
  • Confirm framework before proceeding to coding

Step 4 · LLM Coding

  • Batch coding: LLM reads each segment and assigns RQ + sub-theme labels with confidence scores
  • Evidence grounding: the LLM prompt requires a verbatim evidence_span — the exact phrase or sentence from the source text that supports the coding decision
  • Supports OpenAI, Anthropic, and Ollama backends
  • Results displayed with segment text, assigned codes, confidence levels, and evidence spans

Step 5 · Review

  • Review coding results in a table sorted by confidence
  • Evidence highlighting: when you select an item, the original text is shown with the LLM's evidence_span highlighted in green, so you can verify the coding decision at a glance; if the exact quote isn't found, a fallback "Evidence" block displays the cited text
  • Visualization dashboard (collapsible accordion):
    • Review progress donut — accepted / edited / rejected / pending counts
    • Confidence histogram — low/medium/high tier shading + median marker
    • Theme distribution — horizontal bar chart of RQ frequencies
  • Per-item actions: accept, reject, or edit (reassign RQ/sub-theme)
  • Bulk accept by confidence threshold
  • Manual coding: select a segment, preview its content, and manually assign RQ + sub-theme labels
  • Cascading dropdown: sub-theme choices automatically filter based on selected RQ
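The evidence-highlighting behavior described above can be sketched with an exact-substring search. Function name and the mark styling are illustrative assumptions:

```python
import html

def highlight_evidence(document, evidence_span):
    """Wrap the LLM's verbatim evidence span in a <mark> tag.

    Returns None when the quote is not found verbatim, in which case
    the UI falls back to a separate "Evidence" block instead.
    """
    idx = document.find(evidence_span)
    if idx == -1:
        return None
    before = html.escape(document[:idx])
    quote = html.escape(evidence_span)
    after = html.escape(document[idx + len(evidence_span):])
    return f'{before}<mark style="background:#c8f7c5">{quote}</mark>{after}'

doc = "I felt the clinic staff really listened to me during visits."
highlighted = highlight_evidence(doc, "really listened to me")
missing = highlight_evidence(doc, "a quote the LLM invented")
```

The exact-match requirement is the point: if the model paraphrased instead of quoting, the span fails to match, and the fallback makes that hallucination risk visible to the reviewer.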

Step 6 · Export

  • Export reviewed coding results as structured Excel file
  • Pipeline log export — JSON metadata usable by the Toolbox Methods Generator
  • Save project — persist the entire coding session (segments, RQs, review state, evidence spans) to a single JSON file

Toolbox: Research Methods Tools

The Toolbox provides standalone research utilities that work independently or in combination with QuantiKit / QualiKit.

ICR Calculator

Compute inter-coder reliability for 2 or more coders with automatic metric selection:

Scenario Metric
2 coders, single-label Cohen's Kappa + Krippendorff's Alpha + per-category agreement
3+ coders, single-label Krippendorff's Alpha + pairwise Cohen's Kappa
2 coders, multi-label Jaccard index (pairwise)
3+ coders, multi-label Average pairwise Jaccard
  • Upload a CSV with coder columns, select which columns to compare
  • Interpretation follows the Landis & Koch (1977) scale
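For the 3+ coder, single-label case, pairwise Cohen's Kappa can be sketched with scikit-learn. The `pairwise_kappa` helper is illustrative, not the toolkit's API:

```python
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

def pairwise_kappa(codings):
    """Pairwise Cohen's Kappa for 2+ coders on single-label data.

    codings: {coder_name: [label, ...]} with items in the same order
    for every coder.
    """
    results = {}
    for (name_a, a), (name_b, b) in combinations(codings.items(), 2):
        results[(name_a, name_b)] = cohen_kappa_score(a, b)
    return results

# Three coders, five items, categorical policy-instrument labels
codings = {
    "A": ["reg", "reg", "sub", "info", "reg"],
    "B": ["reg", "sub", "sub", "info", "reg"],
    "C": ["reg", "reg", "sub", "info", "info"],
}
kappas = pairwise_kappa(codings)
```

Each pairwise value can then be read off the Landis & Koch (1977) scale (e.g. 0.61-0.80 "substantial").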

Consensus Coding

Multi-LLM majority-vote coding for qualitative data:

  • Configure 2–5 LLM backends (OpenAI, Anthropic, Ollama) with independent models
  • Each LLM codes every text segment; final label is determined by majority vote
  • Agreement statistics across LLMs are reported automatically
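The majority-vote step can be sketched with `collections.Counter`. Tie handling here (returning `None` to flag the item for human review) is an assumption about reasonable behavior, not necessarily what the module does:

```python
from collections import Counter

def majority_vote(labels):
    """Return (majority label, vote share); ties return (None, share)."""
    counts = Counter(labels).most_common()
    share = counts[0][1] / len(labels)
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None, share  # tie: flag for human review
    return counts[0][0], share

# Three backends coding one segment
label, share = majority_vote(["access", "access", "cost"])
```

The per-segment vote shares, aggregated over the dataset, are what the reported agreement statistics summarize.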

Methods Section Generator

Auto-generate a methods section paragraph (English + Chinese) for your paper:

  • From pipeline log: QuantiKit and QualiKit can export a pipeline log (JSON) capturing all metadata (sample size, model, metrics, themes, etc.). Import the log and generate a ready-to-use methods paragraph.
  • Manual input: Fill in metadata fields manually if you prefer not to use the pipeline log.

Project Save & Restore

Long research projects rarely finish in one session. SocialSciKit serializes the full state of your work — loaded DataFrames, annotation sessions (including cursor and history), extraction review sessions, research questions, de-identification results — into a single JSON file:

  • Save: at the end of any pipeline, click "Save Project" in Step 6 to download a .json archive
  • Restore: return to the Home tab, expand "Load Saved Project", upload the JSON file, and all state is restored across both pipelines
  • Tagged-union serialization: complex types (pd.DataFrame, AnnotationSession, ExtractionReviewSession, ResearchQuestion, ExtractionResult, enums) round-trip losslessly; elapsed-time counters are preserved via monotonic time offsets
  • Version-aware: project files include a __project_version__ field so future readers can migrate old archives
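The tagged-union pattern can be sketched with stdlib `json` on a simplified stand-in. The class and field names below are illustrative (the real `project_io` additionally covers DataFrames and session objects), but the round-trip mechanism is the same: tag each complex value with its type on save, dispatch on the tag on load.

```python
import json
from dataclasses import dataclass, asdict
from enum import Enum

class ReviewStatus(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"

@dataclass
class ResearchQuestion:
    rq_id: str
    text: str
    status: ReviewStatus

def encode(obj):
    """Tag each complex value with its type so it can be rebuilt on load."""
    if isinstance(obj, ReviewStatus):
        return {"__type__": "ReviewStatus", "value": obj.value}
    if isinstance(obj, ResearchQuestion):
        d = asdict(obj)
        d["status"] = encode(obj.status)
        return {"__type__": "ResearchQuestion", "value": d}
    raise TypeError(f"unsupported type: {type(obj)}")

def decode(d):
    if isinstance(d, dict) and d.get("__type__") == "ReviewStatus":
        return ReviewStatus(d["value"])
    if isinstance(d, dict) and d.get("__type__") == "ResearchQuestion":
        v = d["value"]
        return ResearchQuestion(v["rq_id"], v["text"], decode(v["status"]))
    return d

rq = ResearchQuestion("RQ1", "How do users adopt the app?", ReviewStatus.ACCEPTED)
blob = json.dumps(encode(rq))          # what lands in the project file
restored = decode(json.loads(blob))    # what you get back on restore
```

Because every value carries its own `__type__` tag, a future reader can inspect the archive and migrate old shapes, which is what the `__project_version__` field supports.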

Supported LLM Backends

Backend Example Models Use Case
OpenAI gpt-4o, gpt-4o-mini, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano Classification, coding, prompt optimization
Anthropic claude-sonnet-4-20250514, claude-haiku-4-5-20251001 Classification, coding, prompt optimization
Ollama llama3, mistral, qwen2.5 Local inference, no API key needed

To use Ollama, install it from ollama.com and pull a model:

ollama pull llama3

Example Datasets

The examples/ directory contains ready-to-use sample data:

File Module Description
sentiment_example.csv QuantiKit 50 Chinese product/service reviews with 3 sentiment labels
policy_example.csv QuantiKit 40 Chinese policy text excerpts with 8 policy-instrument labels
interview_example.txt QualiKit Single-person community healthcare interview transcript
interview_focus_group.txt QualiKit 4-person focus group on elderly digital service experiences
icr_example.csv Toolbox 20 policy texts coded by 3 coders (A/B/C) for ICR calculation
consensus_example.csv Toolbox 15 interview segments for multi-LLM consensus coding
methods_log_quantikit.json Toolbox Sample QuantiKit pipeline log for methods generation
methods_log_qualikit.json Toolbox Sample QualiKit pipeline log for methods generation

Cookbook: Sentiment Classification (QuantiKit)

  1. Launch: socialscikit launch → click QuantiKit tab
  2. Upload examples/sentiment_example.csv
  3. Map columns: text → text, label → label
  4. Go to Step 2 → click Recommend to see method suggestion
  5. Go to Step 4 → select LLM backend → enter labels
  6. Click Generate Prompt → Run Classification
  7. Go to Step 5 → evaluate against gold labels
  8. Go to Step 6 → export results

Cookbook: Focus Group Coding (QualiKit)

  1. Launch: socialscikit launch → click QualiKit tab
  2. Upload examples/interview_focus_group.txt
  3. Step 1: select "Speaker turn" segmentation → click Segment
  4. Step 2: run de-identification → review and accept/reject each PII replacement
  5. Step 3: define RQs and sub-themes → optionally use LLM to suggest sub-themes
  6. Step 4: select LLM backend → run batch coding
  7. Step 5: review results, bulk accept high-confidence codes, manually fix low-confidence ones
  8. Step 6: export to Excel

Project Structure

socialscikit/
├── core/                         # Shared infrastructure
│   ├── data_loader.py            # Multi-format data reader (CSV/Excel/JSON/txt)
│   ├── data_validator.py         # Schema validation + auto-fix
│   ├── data_diagnostics.py       # Data quality diagnostics report
│   ├── llm_client.py             # Unified LLM client (OpenAI/Anthropic/Ollama)
│   ├── icr.py                    # Inter-coder reliability (Kappa/Alpha/Jaccard)
│   ├── methods_writer.py         # Methods section generator (EN/ZH templates)
│   ├── charts.py                 # Academic-style matplotlib charts (viz dashboard)
│   ├── project_io.py             # Project state serialization (save/restore)
│   └── templates/                # Template files for download
│
├── quantikit/                    # Text classification module
│   ├── feature_extractor.py      # Dataset feature extraction
│   ├── method_recommender.py     # Rule-based method recommendation (with citations)
│   ├── budget_recommender.py     # Annotation budget estimation
│   ├── prompt_optimizer.py       # APE-based prompt generation & optimization
│   ├── prompt_classifier.py      # Zero/few-shot LLM classification
│   ├── annotator.py              # Built-in annotation interface
│   ├── classifier.py             # Transformer fine-tuning pipeline
│   ├── api_finetuner.py          # OpenAI fine-tuning API wrapper
│   └── evaluator.py              # Accuracy / F1 / Kappa / confusion matrix
│
├── qualikit/                     # Qualitative coding module
│   ├── segmenter.py              # Text segmentation (paragraph / speaker turn)
│   ├── segment_extractor.py      # Segment-level extraction
│   ├── deidentifier.py           # PII detection (Chinese + English)
│   ├── deident_reviewer.py       # De-identification interactive review
│   ├── theme_definer.py          # Theme definition + LLM suggestion
│   ├── theme_reviewer.py         # Theme review & overlap detection
│   ├── coder.py                  # LLM batch coding
│   ├── confidence_ranker.py      # Confidence scoring & ranking
│   ├── coding_reviewer.py        # Human-in-the-loop coding review
│   ├── extraction_reviewer.py    # Extraction result review
│   ├── consensus.py              # Multi-LLM consensus coding (majority vote)
│   └── exporter.py               # Excel / Markdown export
│
├── ui/                           # Gradio web interface
│   ├── main_app.py               # Unified app (Home + QuantiKit + QualiKit + Toolbox)
│   ├── quantikit_app.py          # QuantiKit UI callbacks
│   ├── qualikit_app.py           # QualiKit UI callbacks
│   ├── toolbox_app.py            # Toolbox UI callbacks (ICR/Consensus/Methods)
│   └── i18n.py                   # Internationalization (EN / ZH)
│
├── cli.py                        # Command-line entry point
│
examples/                         # Sample datasets
tests/                            # Test suite (676 tests)
promo/                            # Promotional posters + HTML sources
pyproject.toml                    # Package metadata & dependencies
CITATION.cff                      # Citation metadata

Key References

The method recommendation engine and workflow design are grounded in the following computational social science literature:

  • Sun, B., Chang, C., Ang, Y. Y., Mu, R., Xu, Y. & Zhang, Z. (2026). Creation of the Chinese Adaptive Policy Communication Corpus. ACL 2026.
  • Carlson, K. et al. (2026). The use of LLMs to annotate data in management research. Strategic Management Journal.
  • Chae, Y. & Davidson, T. (2025). Large Language Models for text classification. Sociological Methods & Research.
  • Do, S., Ollion, E. & Shen, R. (2024). The augmented social scientist. Sociological Methods & Research, 53(3).
  • Dunivin, Z. O. (2024). Scalable qualitative coding with LLMs. arXiv:2401.15170.
  • Montgomery, J. M. et al. (2024). Improving probabilistic models in text classification via active learning. APSR.
  • Than, N. et al. (2025). Updating 'The Future of Coding'. Sociological Methods & Research.
  • Ziems, C. et al. (2024). Can LLMs transform computational social science? Computational Linguistics, 50(1).
  • Zhou, Y. et al. (2023). Large Language Models are human-level prompt engineers. ICLR 2023.

Citation

If you use SocialSciKit in your research, please cite:

@inproceedings{sun2026creation,
  title     = {Creation of the {Chinese} Adaptive Policy Communication Corpus},
  author    = {Sun, Bolun and Chang, Charles and Ang, Yuen Yuen and Mu, Ruotong and Xu, Yuchen and Zhang, Zhengxin},
  booktitle = {Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2026)},
  year      = {2026}
}

Development

# Clone the repository
git clone https://github.com/Baron-Sun/socialscikit.git
cd socialscikit

# Install in editable mode with dev dependencies
pip install -e ".[dev]"

# Run the full test suite
pytest tests/ -v

# Code style check
ruff check .

Running the app in development mode

python -c "from socialscikit.ui.main_app import create_app; create_app().launch()"

License & Disclaimer

License: MIT

Disclaimer:

  • De-identification module: Automatic PII detection is a preliminary processing tool. Manual review is mandatory before IRB submission. This tool does not guarantee complete removal of all identifying information.
  • LLM classification / coding: Results should be treated as research assistance. Critical research conclusions require human validation.
  • Budget recommendation: Based on statistical estimation. Actual requirements may vary depending on task complexity and data characteristics.

Author

Bolun Sun (孙伯伦)

Ph.D. Student, Kellogg School of Management, Northwestern University

Research interests: Computational Social Science, NLP, Human-Centered AI

Email: bolun.sun@kellogg.northwestern.edu | Web: baron-sun.github.io


Contributing

This project is actively maintained and updated. Contributions, suggestions, and feedback are very welcome! Feel free to open an issue or submit a pull request.