Chatbot Manipulation Analysis

This repository contains analysis scripts for studying manipulation tactics in AI chatbot conversations. The analysis examines various manipulation techniques, persuasion strategies, and inter-annotator agreement on conversational data.

Overview

The project analyzes chatbot conversations from multiple perspectives:

Manipulation Detection: Identifying and categorizing manipulation tactics (peer pressure, gaslighting, guilt-tripping, etc.)
Persuasion Analysis: Examining persuasion strategies and their effectiveness
Inter-Annotator Agreement: Measuring consistency between human annotators
Predictive Modeling: Machine learning models to classify manipulative content

Data Structure

The analysis uses three primary data files located in the data/ directory:

conversations.json: Raw conversation data with metadata about prompts, models, and conversation types
survey_responses.json: Human annotations rating manipulation tactics on various dimensions
all_data.json: Combined dataset with conversation content and ratings

Installation

Clone this repository
Install Python dependencies:

pip install -r requirements.txt

(Optional) For NLP-specific analyses, download required NLTK data:

python -c "import nltk; nltk.download('punkt')"

Analysis Modules

Plots and Visualizations (`analysis/plots/`)

firestore_analysis.py: Main analysis pipeline generating correlation matrices, confusion matrices, and manipulation tactic heatmaps
manipulation_scores_plot.py: Visualizes distribution of manipulation scores
persuasion_helpful_stacked_by_model.py: Compares persuasion effectiveness across different AI models

Inter-Annotator Agreement (`analysis/inter_annotator_agreement/`)

iaa_analysis.py: Calculates agreement metrics (Cohen's Kappa, Fleiss' Kappa, etc.)
krippendorff_alpha_calculator.py: Computes Krippendorff's Alpha as a measure of inter-annotator reliability
disagreement_analyzer.py: Identifies and analyzes cases where annotators disagree
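For reference, Cohen's Kappa compares the observed agreement p_o between two annotators with the agreement p_e expected by chance from each annotator's label frequencies: kappa = (p_o - p_e) / (1 - p_e). The snippet below is a minimal pure-Python sketch of that standard formula, plus a helper that lists the items the annotators disagree on. It is not the implementation in iaa_analysis.py or disagreement_analyzer.py, and the input format (two equal-length lists of per-conversation labels) is an assumption.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators who labelled the same items in the same order."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies, summed over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for every item; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


def disagreements(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> List[int]:
    """Indices of items where the two annotators gave different labels."""
    return [i for i, (a, b) in enumerate(zip(rater_a, rater_b)) if a != b]
```

For labels [1, 1, 0, 0, 1, 0] and [1, 0, 0, 0, 1, 1], p_o = 4/6 and p_e = 0.5, so kappa = 1/3, and items 1 and 5 are the disagreements.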

Predictive Modeling (`analysis/predictive_modelling/`)

bert_bilstm.py: BERT-based BiLSTM model for manipulation classification
zero_shot_analysis.py: Zero-shot classification using large language models
fold_creator.py: Creates cross-validation folds for model evaluation
aggregate_zero_shot_results.py: Aggregates results from multiple zero-shot experiments
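A common way to build cross-validation folds for an imbalanced classification task is stratified k-fold splitting, which keeps each label's share roughly constant across folds. The following is a hypothetical sketch of that technique; the strategy fold_creator.py actually uses is not documented here, and scikit-learn's StratifiedKFold provides a well-tested equivalent. The function names and the seed parameter are illustrative assumptions.

```python
import random
from collections import defaultdict
from typing import Hashable, Iterator, List, Sequence, Tuple


def stratified_folds(labels: Sequence[Hashable], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Split item indices into k folds, keeping each label's proportion roughly equal per fold."""
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    # Sort labels for a deterministic order regardless of dict insertion order.
    for label in sorted(by_label, key=repr):
        indices = by_label[label]
        rng.shuffle(indices)
        # Deal indices round-robin and carry the rotation across labels so fold
        # sizes stay balanced even when class counts are not multiples of k.
        for j, idx in enumerate(indices):
            folds[(offset + j) % k].append(idx)
        offset += len(indices)
    return [sorted(f) for f in folds]


def train_test_splits(folds: List[List[int]]) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices), using each fold once as the test set."""
    for i, test in enumerate(folds):
        train = sorted(idx for j, f in enumerate(folds) if j != i for idx in f)
        yield train, test
```

With ten items labelled 0, five labelled 1, and k = 5, each fold receives two items of label 0 and one of label 1.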

Usage

Running the Main Analysis Pipeline

cd analysis/plots
python firestore_analysis.py

This generates:

Manipulation tactics heatmaps
Correlation analysis between manipulation types
Confusion matrices for classification accuracy
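The correlation analysis measures how annotator ratings for different tactics co-vary. Below is a minimal sketch of a pairwise Pearson correlation matrix, assuming ratings are available as one numeric list per tactic with items in the same order. The tactic keys are hypothetical, and firestore_analysis.py may compute correlations differently (for example with pandas' DataFrame.corr, or with Spearman's rank correlation, which is often preferred for ordinal ratings).

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length rating vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # a constant rating column has no defined correlation
    return cov / (sx * sy)


def correlation_matrix(ratings: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Pairwise Pearson correlations between tactic rating columns, as a nested dict."""
    names = list(ratings)
    return {a: {b: pearson(ratings[a], ratings[b]) for b in names} for a in names}
```

The nested dictionary maps directly onto the rows and columns of a heatmap.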

Inter-Annotator Agreement Analysis

cd analysis/inter_annotator_agreement
python iaa_analysis.py

Predictive Model Training

cd analysis/predictive_modelling
python bert_bilstm.py

Key Findings

The analysis categorizes manipulation tactics into several dimensions:

Peer Pressure: Using social conformity to influence decisions
Reciprocity Pressure: Creating obligation through perceived favors
Gaslighting: Undermining the user's perception of reality
Guilt-Tripping: Inducing guilt to drive behavior
Emotional Blackmail: Threatening emotional consequences
Fear Enhancement: Amplifying anxieties to motivate action
Negging: Undermining confidence to increase compliance
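Scripts that work with these dimensions need a consistent mapping between tactic names and the rating fields in the annotation data. The sketch below shows one possible structure. The field keys, the assumption of a numeric Likert-style rating, and the threshold of 3 are all illustrative assumptions, not values taken from survey_responses.json.

```python
from typing import Dict, List

# Hypothetical mapping from display names to rating keys; the real keys in
# survey_responses.json may differ.
TACTICS: Dict[str, str] = {
    "Peer Pressure": "peer_pressure",
    "Reciprocity Pressure": "reciprocity_pressure",
    "Gaslighting": "gaslighting",
    "Guilt-Tripping": "guilt_tripping",
    "Emotional Blackmail": "emotional_blackmail",
    "Fear Enhancement": "fear_enhancement",
    "Negging": "negging",
}


def flagged_tactics(rating: Dict[str, float], threshold: float = 3.0) -> List[str]:
    """Return tactics whose (assumed numeric) score meets the (assumed) threshold."""
    # Missing keys are treated as a score of 0, i.e. the tactic was not observed.
    return [name for name, key in TACTICS.items() if rating.get(key, 0.0) >= threshold]
```

For example, flagged_tactics({"gaslighting": 4, "negging": 2, "fear_enhancement": 3}) returns ["Gaslighting", "Fear Enhancement"].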

Output Files

Analysis scripts generate various output files:

PDF/PNG plots: Visualizations of manipulation patterns and correlations
CSV files: Prediction results and aggregated metrics
JSON files: Structured analysis results and model outputs
Log files: Detailed execution logs for debugging

Data Loading

All scripts read from the local JSON data files in the data/ directory. The shared data loader module (analysis/shared_data_loader.py) handles loading consistently across the analysis scripts and replaces the previous Google Cloud Firestore integration.
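A shared loader of this kind can be as small as a cached JSON reader. The sketch below is an assumption about how such a module might look, not the contents of analysis/shared_data_loader.py; the function name load_json, the data_dir parameter, and the assumption that scripts run from the repository root are all hypothetical.

```python
import json
from functools import lru_cache
from pathlib import Path
from typing import Any

# Assumed location, relative to the repository root.
DATA_DIR = Path("data")
DATA_FILES = ("conversations.json", "survey_responses.json", "all_data.json")


@lru_cache(maxsize=None)
def load_json(name: str, data_dir: str = str(DATA_DIR)) -> Any:
    """Read one data file once per process; later calls reuse the parsed result."""
    if name not in DATA_FILES:
        raise ValueError(f"unknown data file: {name}")
    path = Path(data_dir) / name
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

Because lru_cache returns the same parsed object on every call, callers should copy the data before mutating it.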

Contributing

When adding new analysis scripts:

Use the shared data loader from analysis/shared_data_loader.py
Follow the existing code structure for consistency
Document key findings and outputs in code comments

License

See LICENSE file for details.

Citation

If you use this analysis in your research, please cite the associated paper (details to be added).
