This repository contains analysis scripts for studying manipulation tactics in AI chatbot conversations. The analysis examines various manipulation techniques, persuasion strategies, and inter-annotator agreement on conversational data.
The project analyzes chatbot conversations from multiple perspectives:
- Manipulation Detection: Identifying and categorizing manipulation tactics (peer pressure, gaslighting, guilt-tripping, etc.)
- Persuasion Analysis: Examining persuasion strategies and their effectiveness
- Inter-Annotator Agreement: Measuring consistency between human annotators
- Predictive Modeling: Machine learning models to classify manipulative content
The analysis uses three primary data files located in the data/ directory:
- conversations.json: Raw conversation data with metadata about prompts, models, and conversation types
- survey_responses.json: Human annotations rating manipulation tactics on various dimensions
- all_data.json: Combined dataset with conversation content and ratings
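As a rough illustration of working with these files, the sketch below reads a JSON file with the standard library and reports its top-level shape. It assumes only that each file is valid JSON; the record schema is not documented here, so no field names are assumed. `load_json` and `describe` are hypothetical helper names, not part of the repository.

```python
import json
from pathlib import Path

DATA_DIR = Path("data")  # directory named in this README
DATA_FILES = ["conversations.json", "survey_responses.json", "all_data.json"]


def load_json(path):
    """Read one JSON file and return the parsed object."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def describe(obj):
    """Summarize the top-level container without assuming a schema."""
    if isinstance(obj, (list, dict)):
        return f"{type(obj).__name__} with {len(obj)} top-level entries"
    return type(obj).__name__


if __name__ == "__main__":
    for name in DATA_FILES:
        path = DATA_DIR / name
        if path.exists():
            print(f"{name}: {describe(load_json(path))}")
        else:
            print(f"{name}: not found at {path}")
```

Run from the repository root, this prints one summary line per data file, which is a quick way to check the files are in place before running the analysis scripts.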
- Clone this repository
- Install Python dependencies:
pip install -r requirements.txt
- (Optional) For NLP-specific analyses, download required NLTK data:
python -c "import nltk; nltk.download('punkt')"

Plotting and visualization scripts:
- firestore_analysis.py: Main analysis pipeline generating correlation matrices, confusion matrices, and manipulation tactic heatmaps
- manipulation_scores_plot.py: Visualizes the distribution of manipulation scores
- persuasion_helpful_stacked_by_model.py: Compares persuasion effectiveness across different AI models

Inter-annotator agreement scripts:
- iaa_analysis.py: Calculates agreement metrics (Cohen's Kappa, Fleiss' Kappa, etc.)
- krippendorff_alpha_calculator.py: Computes Krippendorff's Alpha for reliability
- disagreement_analyzer.py: Identifies and analyzes cases where annotators disagree
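
For readers unfamiliar with the agreement metrics named above, here is a minimal, standard-library sketch of Cohen's Kappa for two annotators. It is not taken from iaa_analysis.py; the function name and the toy labels are illustrative assumptions.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label rates.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(
        (counts_a[label] / n) * (counts_b[label] / n)
        for label in set(counts_a) | set(counts_b)
    )
    if p_expected == 1.0:
        # Both annotators used one identical label throughout; Kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy example (made-up labels): 1 = manipulative, 0 = not manipulative.
annotator_1 = [1, 1, 0, 0]
annotator_2 = [1, 0, 0, 0]
print(cohen_kappa(annotator_1, annotator_2))  # 0.5
```

In the toy example the annotators agree on 3 of 4 items (0.75) while chance agreement is 0.5, giving (0.75 - 0.5) / (1 - 0.5) = 0.5.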

Predictive modeling scripts:
- bert_bilstm.py: BERT-based BiLSTM model for manipulation classification
- zero_shot_analysis.py: Zero-shot classification using large language models
- fold_creator.py: Creates cross-validation folds for model evaluation
- aggregate_zero_shot_results.py: Aggregates results from multiple zero-shot experiments
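
To illustrate what creating cross-validation folds involves, the sketch below assigns item indices to k folds so that each label is spread as evenly as possible (a simple stratified split). It is an illustration under stated assumptions, not the logic of fold_creator.py; `make_stratified_folds`, the seed, and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def make_stratified_folds(labels, k=5, seed=0):
    """Return k lists of item indices with labels spread evenly across folds."""
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_label, key=str):
        indices = by_label[label]
        rng.shuffle(indices)
        # Deal indices round-robin, continuing where the last label stopped
        # so that fold sizes stay balanced overall.
        for index in indices:
            folds[position % k].append(index)
            position += 1
    return folds


# Toy labels (made up): 6 manipulative (1) and 4 non-manipulative (0) items.
toy_labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
for fold_id, test_indices in enumerate(make_stratified_folds(toy_labels, k=2)):
    print(fold_id, sorted(test_indices))
```

Each fold then serves once as the held-out set while the remaining folds are used for training.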
cd analysis/plots
python firestore_analysis.py

This generates:
- Manipulation tactics heatmaps
- Correlation analysis between manipulation types
- Confusion matrices for classification accuracy
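
The correlation analysis listed above can be pictured as computing a pairwise Pearson correlation between per-conversation scores for each tactic. The sketch below does this with the standard library only; the score values are invented, and firestore_analysis.py may compute its matrices differently.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for constant scores
    return cov / sqrt(var_x * var_y)


def correlation_matrix(scores):
    """Map each (tactic, tactic) pair to its Pearson correlation."""
    names = list(scores)
    return {(a, b): pearson(scores[a], scores[b]) for a in names for b in names}


# Invented scores for four conversations (not real data).
toy_scores = {
    "Gaslighting": [1, 2, 3, 4],
    "Guilt-Tripping": [2, 4, 6, 8],
    "Negging": [4, 3, 2, 1],
}
matrix = correlation_matrix(toy_scores)
print(round(matrix[("Gaslighting", "Guilt-Tripping")], 3))  # 1.0
print(round(matrix[("Gaslighting", "Negging")], 3))  # -1.0
```

A dictionary keyed by tactic pairs keeps the example dependency-free; a heatmap would plot the same values on a grid.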
cd analysis/inter_annotator_agreement
python iaa_analysis.py

cd analysis/predictive_modelling
python bert_bilstm.py

The analysis categorizes manipulation tactics into several dimensions:
- Peer Pressure: Using social conformity to influence decisions
- Reciprocity Pressure: Creating obligation through perceived favors
- Gaslighting: Undermining the user's perception of reality
- Guilt-Tripping: Inducing guilt to drive behavior
- Emotional Blackmail: Threatening emotional consequences
- Fear Enhancement: Amplifying anxieties to motivate action
- Negging: Undermining confidence to increase compliance
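
One way to work with these dimensions in code is to fix the tactic names in one place and average annotator ratings per tactic. The sketch below assumes, purely for illustration, that each annotation is a dict mapping tactic names to numeric ratings; the real layout of survey_responses.json may differ, and `mean_rating_per_tactic` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean

# Tactic names as listed in this README.
TACTICS = (
    "Peer Pressure",
    "Reciprocity Pressure",
    "Gaslighting",
    "Guilt-Tripping",
    "Emotional Blackmail",
    "Fear Enhancement",
    "Negging",
)


def mean_rating_per_tactic(annotations):
    """Average the ratings given to each known tactic across annotations."""
    ratings = defaultdict(list)
    for annotation in annotations:
        for tactic, rating in annotation.items():
            if tactic in TACTICS:  # ignore keys that are not tactic names
                ratings[tactic].append(rating)
    return {tactic: mean(values) for tactic, values in ratings.items()}


# Hypothetical annotations with made-up ratings.
toy_annotations = [
    {"Gaslighting": 5, "Negging": 2},
    {"Gaslighting": 3, "Negging": 4, "Peer Pressure": 1},
]
print(mean_rating_per_tactic(toy_annotations))
```

Tactics that no annotation mentions are simply absent from the result rather than reported as zero.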
Analysis scripts generate various output files:
- PDF/PNG plots: Visualizations of manipulation patterns and correlations
- CSV files: Prediction results and aggregated metrics
- JSON files: Structured analysis results and model outputs
- Log files: Detailed execution logs for debugging
All scripts have been updated to use local JSON data files. The shared data loader module (analysis/shared_data_loader.py) handles data loading consistently across all analysis scripts, replacing the previous Google Cloud Firestore integration.
When adding new analysis scripts:
- Use the shared data loader from analysis/shared_data_loader.py
- Follow the existing code structure for consistency
- Document key findings and outputs in code comments
See LICENSE file for details.
If you use this analysis in your research, please cite the associated paper (details to be added).