# MAPPING AFRICA'S CONFLICT RELATIONSHIPS

**An exploration of actor-to-actor conflict dynamics across the African continent (1997–2014) using ACLED Dyadic Data**

## Business Understanding  
In Africa’s conflict zones, it’s not just about what happened- it’s about **who keeps coming back to fight whom**, and **where things are heating up**.  
That’s the part most datasets skip. But the ACLED Dyadic data? That’s where the real signal lives.

We’re not here to count bullets. We’re here to **map relationships**, **track escalations**, and surface the patterns that matter- before things spiral.  
This project leans into that gap- turning actor-to-actor conflict data into something **actionable** for NGOs, peacebuilders, analysts, and anyone serious about understanding violence from the inside out.

## Project Overview  
We’re breaking down nearly two decades of conflict (1997–2014): **who fought whom**, **where**, and **how it played out**, to answer the questions that lead to better decisions:

- What dyads keep reappearing?
- Who’s triggering the worst violence?
- Which areas are consistently volatile?
- How do these relationships evolve?

From mapping conflict webs to scoring high-risk actor pairs, the goal is simple:  
**Give the right people the right lens before the next crisis hits.**

## Deliverables

- **Cleaned + enriched dataset** (dyad ID, conflict region, actor normalization, year breakdown)
- **Conflict dyad explorer**- who fights whom, how often, and with what impact
- **Escalation curves** for the most volatile dyads
- **Hotspot heatmaps** and regional breakdowns
- **Network graphs** showing actor relationships and central nodes
- **Dyadic Risk Score**- composite risk index based on frequency, intensity, and recency
- **Notebooks + visuals + ready-to-use summaries** for stakeholders

## Success Metrics

- Top 10 riskiest dyads identified and profiled  
- Escalation trends clearly visualized for key actor pairs  
- Accurate hotspot detection by region and year  
- Reusable code and clean outputs for policy teams or analysts  
- Project structured for future ACLED updates or country-focused expansions  

> This isn’t just a dataset. It’s a lens. One that tells us not just what happened- **but who’s likely to make it happen again.**
> Powered by data. Grounded in people.

# 3. Data Cleaning & Preprocessing
- Handle missing values
- Rename ambiguous columns
- Create new features: dyad name, year, month, actor interaction type
- Normalize actor names (optional)
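The feature step above can be sketched as follows. This is a minimal illustration, assuming the lowercase column names used later in this notebook (`actor1`, `actor2`, `event_date`); the toy rows, the helper name `add_dyad_features`, and the `dyad` / `dyad_directed` columns are hypothetical and not part of the ACLED file.

```python
import pandas as pd

# Toy rows standing in for the real data (illustrative only)
toy = pd.DataFrame({
    "actor1": ["Group A", "Group B", "Group A"],
    "actor2": ["Group B", "Group A", None],
    "event_date": pd.to_datetime(["2001-03-04", "2001-07-19", "2002-01-02"]),
})

def add_dyad_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with dyad labels plus year and month columns."""
    out = df.copy()
    a1 = out["actor1"].str.strip()
    a2 = out["actor2"].fillna("No recorded actor").str.strip()
    # Directed label keeps the order in which the two actors were coded
    out["dyad_directed"] = a1 + " -> " + a2
    # Undirected label sorts the names so A-vs-B and B-vs-A share one ID
    out["dyad"] = [" | ".join(sorted(pair)) for pair in zip(a1, a2)]
    out["year"] = out["event_date"].dt.year
    out["month"] = out["event_date"].dt.month
    return out

features = add_dyad_features(toy)
print(features[["dyad_directed", "dyad", "year", "month"]])
```

Whether a dyad should be directed or undirected depends on the question: recurrence counts usually want the undirected label, while "who initiates" questions need the directed one.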

# 4. Exploratory Data Analysis (EDA)
## A. Univariate
- Most common Actor1 / Actor2
- Top countries, event types, interactions

## B. Bivariate
- Fatalities by dyad
- Dyad frequency vs. fatalities
- Temporal trend per country / actor
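One way the bivariate summaries could be computed is sketched below. It assumes a `dyad` column has already been derived (that column name is an assumption, as are the toy rows); only `year` and `fatalities` come from the dataset itself.

```python
import pandas as pd

# Toy rows (illustrative only); one fatality count is deliberately missing
toy = pd.DataFrame({
    "dyad": ["A | B", "A | B", "A | C", "A | B", "A | C"],
    "year": [2001, 2001, 2001, 2002, 2002],
    "fatalities": [3.0, None, 10.0, 0.0, 2.0],
})

# Frequency vs. fatalities per dyad; 'reported' counts non-null fatality values
dyad_summary = (
    toy.groupby("dyad")
       .agg(events=("fatalities", "size"),
            reported=("fatalities", "count"),
            total_fatalities=("fatalities", "sum"))
       .sort_values("events", ascending=False)
)

# Events per dyad per year: the raw material for a temporal trend
yearly = toy.groupby(["dyad", "year"]).size().unstack(fill_value=0)

print(dyad_summary)
print(yearly)
```

Keeping `events` and `reported` side by side makes it visible when a dyad's fatality total rests on only a fraction of its events.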

## C. Geospatial
- Choropleth: conflicts by country
- Heatmap: event locations
- Regional focus maps

# 5. Network Analysis
- Construct directed graph of actors
- Degree, centrality, clustering
- Visualize with NetworkX or Plotly
- Highlight conflict communities
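Before reaching for a graph library, the directed graph can be prototyped as a weighted edge table in pandas. The sketch below is only that, a sketch: the toy rows are made up, and treating `actor1 -> actor2` as the edge direction is an assumption about how the graph will be defined.

```python
import pandas as pd

# Toy rows (illustrative only)
toy = pd.DataFrame({
    "actor1": ["A", "A", "B", "C", "A"],
    "actor2": ["B", "B", "A", "A", None],
})

# Each event with a recorded second actor is one directed edge;
# repeated pairs are collapsed into a single edge with a weight
edges = (
    toy.dropna(subset=["actor2"])
       .groupby(["actor1", "actor2"])
       .size()
       .reset_index(name="weight")
)

# Weighted out-degree (events initiated) and in-degree (events received)
out_degree = edges.groupby("actor1")["weight"].sum().rename("out")
in_degree = edges.groupby("actor2")["weight"].sum().rename("in")
degree = pd.concat([out_degree, in_degree], axis=1).fillna(0).astype(int)
degree["total"] = degree["out"] + degree["in"]

print(edges)
print(degree.sort_values("total", ascending=False))
```

An edge table in this shape is also what NetworkX's `from_pandas_edgelist` expects, so the same frame could feed centrality and community detection later.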

# 6. Modeling & Scoring
## A. Fatality Classifier (LogReg / Tree)
- Predict high-fatality dyadic events

## B. Dyad Risk Scoring
- Create a composite score per dyad
- Rank and visualize
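A composite score built from frequency, intensity, and recency could look like the sketch below. Everything specific here is an assumption for illustration: the `dyad` column, the equal weights, the log transform on fatalities, min-max scaling, and the helper names `min_max` and `dyad_risk_score`. The real score may well weight or scale things differently.

```python
import numpy as np
import pandas as pd

# Toy rows (illustrative only)
toy = pd.DataFrame({
    "dyad": ["A | B", "A | B", "A | B", "A | C", "C | D"],
    "year": [1999, 2005, 2013, 2014, 1998],
    "fatalities": [5.0, 0.0, 20.0, 1.0, 100.0],
})

def min_max(s: pd.Series) -> pd.Series:
    """Scale a series to [0, 1]; a constant series maps to all zeros."""
    span = s.max() - s.min()
    return (s - s.min()) / span if span > 0 else s * 0.0

def dyad_risk_score(df: pd.DataFrame, weights=(1/3, 1/3, 1/3)) -> pd.DataFrame:
    per_dyad = df.groupby("dyad").agg(
        frequency=("year", "size"),              # how often the pair meets
        total_fatalities=("fatalities", "sum"),  # how deadly it has been
        last_year=("year", "max"),               # how recently it was active
    )
    w_freq, w_int, w_rec = weights  # placeholder weights, not tuned
    per_dyad["risk_score"] = (
        w_freq * min_max(per_dyad["frequency"])
        # log1p dampens the heavy right tail of fatality totals
        + w_int * min_max(np.log1p(per_dyad["total_fatalities"]))
        + w_rec * min_max(per_dyad["last_year"])
    )
    return per_dyad.sort_values("risk_score", ascending=False)

scores = dyad_risk_score(toy)
print(scores)
```

Because each component is scaled to [0, 1] and the weights sum to 1, the score itself stays in [0, 1], which keeps rankings comparable if the weights are revisited.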

## C. Temporal Prediction (Optional)
- Predict future dyadic recurrence or escalation

# 7. Insights & Dashboards
- Top 10 riskiest dyads
- Timeline of escalation by actor
- Region-wise conflict summaries
- Downloadable actor profiles

# 8. Deliverables
- Cleaned dataset
- Visuals + charts
- Notebook (.ipynb)
- PDF/HTML report
- GitHub repo with README

# 9. Conclusion & Next Steps
- Insights recap
- Policy relevance
- Future data integrations (refugees, elections, natural resources)

## INITIAL DATA EXPLORATION (IDE)

Every dataset tells a story- but before I dive into any narratives, I'll flip through the table of contents. This phase is about getting comfortable with the data: seeing what’s there, what’s missing, and what might surprise me later if I don’t pay attention now.

#### What's happening:
- Importing key libraries like 'pandas', 'numpy', 'seaborn', 'matplotlib', and 'plotly'- the usual suspects for slicing, dicing and visualizing data.
- Previewing the first few rows to get a feel for the dataset’s structure, naming conventions, and early red flags (no one likes nasty surprises 30 cells in).
- Checking the shape of the data because whether it's 500 rows or 50,000 completely changes the game.
- Getting metadata (column types and non-null counts).
- Getting basic statistics for both numerical and categorical columns.

This might not be the flashiest part of the workflow, but it’s where trust is built- between me and the dataset. And as I’ve learned from previous projects, a few extra minutes spent here can save hours of confusion down the road.

Exploration done right is part instinct, part structure- this is BOTH!

In [69]:
# Mathematical computation and data manipulation libraries
import numpy as np
import pandas as pd

# Data visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.figure_factory as ff
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Modeling and ML libraries
from sklearn.preprocessing import LabelEncoder

# Load the data
conflict_df = pd.read_excel('ACLED Dyadic Relationships.xlsx')

# Preview first 5
conflict_df.head()





| | GWNO | EVENT_ID_CNTY | EVENT_ID_NO_CNTY | EVENT_DATE | YEAR | TIME_PRECISION | EVENT_TYPE | ACTOR1 | ALLY_ACTOR_1 | INTER1 | ... | ADMIN1 | ADMIN2 | ADMIN3 | LOCATION | LATITUDE | LONGITUDE | GEO_PRECIS | SOURCE | NOTES | FATALITIES |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 615 | 1ALG | 1 | 1997-01-02 | 1997 | 1 | Violence against civilians | GIA: Armed Islamic Group | | 2 | ... | Blida | Blida | | Blida | 36.4686 | 2.8289 | 1 | www.algeria-watch.org | 4 January: 16 citizens were murdered in the vi... | 16.0 |
| 1 | 615 | 2ALG | 2 | 1997-01-03 | 1997 | 1 | Violence against civilians | GIA: Armed Islamic Group | | 2 | ... | Tipaza | Douaouda | | Douaouda | 36.6725 | 2.7894 | 1 | www.algeria-watch.org | 5 January: Massacre of 18 citizens in the Oliv... | 18.0 |
| 2 | 615 | 3ALG | 3 | 1997-01-04 | 1997 | 1 | Violence against civilians | GIA: Armed Islamic Group | | 2 | ... | Tipaza | Hadjout | | Hadjout | 36.5139 | 2.4178 | 1 | www.algeria-watch.org | 6 January: 23 citizens were horribly mutilated... | 23.0 |
| 3 | 615 | 4ALG | 4 | 1997-01-05 | 1997 | 1 | Remote violence | GIA: Armed Islamic Group | | 2 | ... | Alger | Bouzareah | | Algiers | 36.766 | 3.05 | 1 | www.algeria-watch.org | 7 January: Explosion of a bomb in the Didouche... | 20.0 |
| 4 | 615 | 5ALG | 5 | 1997-01-09 | 1997 | 1 | Violence against civilians | GIA: Armed Islamic Group | | 2 | ... | Alger | Ouled Chebel | | Ouled Chebel | 36.5994 | 2.9944 | 1 | www.algeria-watch.org | 11 January: 5 citizens massacred in Ouled Cheb... | 5.0 |


In [70]:
# Check how many rows and columns I am working with
print(f'The dataset has {conflict_df.shape[0]} rows and {conflict_df.shape[1]} columns')

# Check column names to inform on standardisation needs
print('\nColumn Names:\n', conflict_df.columns)

The dataset has 99548 rows and 25 columns

Column Names:
 Index(['GWNO', 'EVENT_ID_CNTY', 'EVENT_ID_NO_CNTY', 'EVENT_DATE', 'YEAR',
       'TIME_PRECISION', 'EVENT_TYPE', 'ACTOR1', 'ALLY_ACTOR_1', 'INTER1',
       'ACTOR2', 'ALLY_ACTOR_2', 'INTER2', 'INTERACTION', 'COUNTRY', 'ADMIN1',
       'ADMIN2', 'ADMIN3', 'LOCATION', 'LATITUDE', 'LONGITUDE', 'GEO_PRECIS',
       'SOURCE', 'NOTES', 'FATALITIES'],
      dtype='object')


In [71]:
# Standardise column names
conflict_df.columns = (conflict_df.columns.str.strip().str.lower())

# Preview changes
conflict_df.sample(3)

| | gwno | event_id_cnty | event_id_no_cnty | event_date | year | time_precision | event_type | actor1 | ally_actor_1 | inter1 | ... | admin1 | admin2 | admin3 | location | latitude | longitude | geo_precis | source | notes | fatalities |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3545 | 540 | 635ANG | 3544 | 1998-12-06 | 1998 | 1 | Non-violent transfer of territory | Military Forces of Democratic Republic of Cong... | | 8 | ... | Bié | Catabola | Katabola | Catabola | -12.15 | 17.2833 | 1 | 24 Horas | Unita assisted by Banyamulenge rebels (NOT DRC... | 0.0 |
| 15551 | 490 | 2741DRC | 15552 | 2005-09-02 | 2005 | 1 | Battle-No change of territory | Military Forces of Democratic Republic of Cong... | | 1 | ... | Sud-Kivu | Sud-Kivu | Uvira | Lemera | -3.035 | 28.98 | 1 | Radio Bukavu 5 Sep 05 | Deserters rally with FDLR and attack FARDC, al... | 0.0 |
| 26612 | 651 | 4835EGY | 26613 | 2014-05-09 | 2014 | 1 | Riots/Protests | Protesters (Egypt) | Muslim Brotherhood | 6 | ... | Alexandria | Muntazah | | Sidi Beshr | 31.255441 | 29.983228 | 1 | Aswat Masriya | Police forces dispersed a pro-Muslim Brotherho... | 0.0 |


In [72]:
# Get metadata
conflict_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 99548 entries, 0 to 99547
Data columns (total 25 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   gwno              99548 non-null  int64         
 1   event_id_cnty     99548 non-null  object        
 2   event_id_no_cnty  99548 non-null  int64         
 3   event_date        99548 non-null  datetime64[ns]
 4   year              99548 non-null  int64         
 5   time_precision    99548 non-null  int64         
 6   event_type        99548 non-null  object        
 7   actor1            99548 non-null  object        
 8   ally_actor_1      14384 non-null  object        
 9   inter1            99548 non-null  int64         
 10  actor2            77440 non-null  object        
 11  ally_actor_2      8594 non-null   object        
 12  inter2            99548 non-null  int64         
 13  interaction       99548 non-null  int64         
 14  country           9954

In [73]:
# Get basic statistical info of numerical variables
conflict_df.describe()

| | gwno | event_id_no_cnty | event_date | year | time_precision | inter1 | inter2 | interaction | latitude | longitude | geo_precis | fatalities |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 99548.0 | 99548.0 | 99548 | 99548.0 | 99548.0 | 99548.0 | 99548.0 | 99548.0 | 99548.0 | 99548.0 | 99548.0 | 95070.0 |
| mean | 531.333096 | 49774.5 | 2008-06-09 20:01:05.383533312 | 2007.951199 | 1.172259 | 3.414825 | 3.157864 | 30.286746 | 4.711653 | 23.942672 | 1.275535 | 6.208815 |
| min | 230.0 | 1.0 | 1997-01-01 00:00:00 | 1997.0 | 1.0 | 1.0 | 0.0 | 10.0 | -34.71011 | -17.47389 | 1.0 | 0.0 |
| 25% | 490.0 | 24887.75 | 2003-02-04 00:00:00 | 2003.0 | 1.0 | 2.0 | 1.0 | 13.0 | -1.466667 | 13.20841 | 1.0 | 0.0 |
| 50% | 520.0 | 49774.5 | 2010-07-26 00:00:00 | 2010.0 | 1.0 | 3.0 | 2.0 | 27.0 | 4.31887 | 29.2833 | 1.0 | 0.0 |
| 75% | 560.0 | 74661.25 | 2013-06-23 00:00:00 | 2013.0 | 1.0 | 5.0 | 7.0 | 38.0 | 11.01667 | 34.0 | 1.0 | 1.0 |
| max | 651.0 | 99548.0 | 2014-12-31 00:00:00 | 2014.0 | 3.0 | 8.0 | 8.0 | 88.0 | 37.274423 | 51.2668 | 3.0 | 25000.0 |
| std | 61.163793 | 28737.176636 | | 5.678059 | 0.500496 | 2.121612 | 2.814157 | 17.573653 | 15.365206 | 16.858715 | 0.544663 | 100.121537 |


In [74]:
# Get basic statistical info of categorical variables
conflict_df.describe(include = 'O').T

| | count | unique | top | freq |
|---|---|---|---|---|
| event_id_cnty | 99548 | 99548 | 1ALG | 1 |
| event_type | 99548 | 9 | Battle-No change of territory | 30131 |
| actor1 | 99548 | 2627 | Unidentified Armed Group (Somalia) | 4301 |
| ally_actor_1 | 14384 | 2060 | Muslim Brotherhood | 1344 |
| actor2 | 77440 | 2245 | Civilians (Somalia) | 4165 |
| ally_actor_2 | 8594 | 1488 | MDC: Movement for Democratic Change | 904 |
| country | 99548 | 50 | Somalia | 15150 |
| admin1 | 99548 | 651 | Banaadir | 4752 |
| admin2 | 96169 | 3522 | Mogadisho | 4752 |
| admin3 | 48168 | 2854 | Harare City Council | 1686 |


In [75]:
# Check for duplicates
print("Duplicates:", conflict_df.duplicated().sum())

# Check for nulls and get their percentage to advise on the best imputing or dropping criteria
null_counts = conflict_df.isna().sum()
null_percent = (null_counts / len(conflict_df)) * 100
null_summary = pd.DataFrame({'Null Count': null_counts, 'Null %': null_percent.round(2)})

print("\nNull Values Summary:\n", null_summary)

Duplicates: 0

Null Values Summary:
                   Null Count  Null %
gwno                       0    0.00
event_id_cnty              0    0.00
event_id_no_cnty           0    0.00
event_date                 0    0.00
year                       0    0.00
time_precision             0    0.00
event_type                 0    0.00
actor1                     0    0.00
ally_actor_1           85164   85.55
inter1                     0    0.00
actor2                 22108   22.21
ally_actor_2           90954   91.37
inter2                     0    0.00
interaction                0    0.00
country                    0    0.00
admin1                     0    0.00
admin2                  3379    3.39
admin3                 51380   51.61
location                   0    0.00
latitude                   0    0.00
longitude                  0    0.00
geo_precis                 0    0.00
source                   187    0.19
notes                  10929   10.98
fatalities              4478    4.50


# DATA UNDERSTANDING

Here's what we’re working with: a robust dataset of **99,548 conflict event records** spread across **25 columns**- a goldmine of information with just enough chaos to make it interesting.

At first glance, it’s clear the data isn't just big- it's rich. We're talking detailed temporal, geographic, and actor-based breakdowns for every recorded incident. A few key highlights:

- **Temporal coverage:** Events range from **1997 to 2014**, with timestamp precision reaching down to the day level (event_date) and granularity flagged via time_precision.
- **Spatial detail:** Latitude and longitude have no nulls, and we’ve got hierarchical administrative geographies (admin1, admin2, admin3)- though admin3 is a bit flaky with over **51K nulls**.
- **Actors & Interactions:** actor1 and actor2 describe the primary participants in each event. Their allies? Less reliable. ally_actor_1 and ally_actor_2 are sparsely populated- **missing in over 85%** and **91%** of records respectively. Still, when they show up, they say a lot (*Muslim Brotherhood*, *MDC*, etc).
- **Event context:** event_type spans **9 unique categories**, led by *"Battle-No change of territory"* (30,131 events). interaction codes encode the dance between actors- civilian vs. armed group, military vs. rebel, etc.
- **Location hot zones:** *Banaadir* (admin1), *Mogadisho* (admin2), and *Harare City Council* (admin3) are the most frequent values at their administrative levels- unsurprising, but still worth confirming with visuals.
- **Fatalities:** Always the grim but vital metric. Recorded for most events (**95% coverage**), with counts ranging from **0 to 25,000**- yep, that’s not a typo. Definitely a candidate for outlier scrutiny.

### Missingness Breakdown

| Column       | Nulls      | % Missing |
|--------------|------------|-----------|
| ally_actor_1 | 85,164     | 85.6%     |
| actor2       | 22,108     | 22.2%     |
| ally_actor_2 | 90,954     | 91.4%     |
| admin2       | 3,379      | 3.4%      |
| admin3       | 51,380     | 51.6%     |
| source       | 187        | 0.2%      |
| notes        | 10,929     | 11.0%     |
| fatalities   | 4,478      | 4.5%      |

Not perfect, but nothing we can’t work with.

### Numeric Columns – Descriptive Stats

The quantitative side is clean, consistent, and complete for most features- no duplicates, tight types, and column ranges that check out:

- gwno, event_id_no_cnty, interaction - all standard coded IDs and classifications.
- latitude/longitude - geographically grounded, no junk entries.
- fatalities - right-skewed, heavy-tailed. The kind of variable that begs for log-transformation or robust binning, depending on how we use it.
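Both options mentioned for fatalities, a log transform and binning, can be sketched briefly. The values below are made up to mimic the skew (mostly zeros, one extreme count), and the band edges and labels are illustrative assumptions rather than a recommended scheme.

```python
import numpy as np
import pandas as pd

# Toy values mimicking the skew: many zeros, a missing value, one extreme count
fatalities = pd.Series([0, 0, 1, 4, 12, 250, np.nan, 25000], name="fatalities")

# log1p handles zeros (log1p(0) == 0) and leaves missing values as NaN
log_fatalities = np.log1p(fatalities)

# Ordered bands; the edges are assumptions, chosen only for illustration
bins = [-np.inf, 0, 9, 99, np.inf]
labels = ["none", "1-9", "10-99", "100+"]
fatality_band = pd.cut(fatalities, bins=bins, labels=labels)

print(pd.DataFrame({"fatalities": fatalities,
                    "log1p": log_fatalities.round(2),
                    "band": fatality_band}))
```

The log transform suits models that take fatalities as a numeric input, while bands are easier to read in charts and can double as a classification target.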

From this snapshot, it’s clear the dataset is structured enough for serious modeling, but gritty enough to require smart preprocessing. Next up: let’s see what it’s trying to say.

# DATA CLEANING AND PREPROCESSING

## 1. HANDLING MISSING VALUES

Before diving into modeling or deeper analysis, we’ve got to clean house.

While the dataset is fairly complete, several columns come with missing values, and not all gaps are created equal. ally_actor_1 and ally_actor_2 are missing in over 85% of rows, which raises questions about how much weight they can carry. Other gaps range from light (fatalities at 4.5%, notes at 11%) to heavy (admin3 at 51.6%), and each may be imputed, dropped, or otherwise handled based on context and downstream needs.

In this section, we'll:

- Quantify the extent of missing data across all columns
- Decide whether to **drop**, **impute**, or **ignore** based on:
  - Proportion of missing data
  - Importance of the feature
  - Modeling objectives

It’s not just about filling in blanks- it’s about making **informed trade-offs** that preserve data integrity while prepping it for analysis.
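The proportion-based part of that decision can be turned into a small triage helper. The sketch below is an assumption-heavy starting point: the function name `triage_missing`, the thresholds, and the suggestion labels are all hypothetical, and feature importance and modeling objectives still need a human call.

```python
import pandas as pd

def triage_missing(df: pd.DataFrame,
                   drop_above: float = 0.95,
                   flag_above: float = 0.20) -> pd.DataFrame:
    """Suggest a first-pass treatment per column from its share of nulls.

    The thresholds are placeholders; they only encode the 'proportion of
    missing data' criterion, not importance or modeling goals.
    """
    share = df.isna().mean()

    def decide(p: float) -> str:
        if p == 0:
            return "complete"
        if p > drop_above:
            return "consider dropping"
        if p > flag_above:
            return "label absence + flag"
        return "impute or leave"

    return pd.DataFrame({"missing_share": share.round(4),
                         "suggestion": share.map(decide)})

# Toy frame (illustrative only)
toy = pd.DataFrame({
    "country": ["X"] * 10,
    "source": ["s"] * 9 + [None],
    "actor2": ["B"] * 7 + [None] * 3,
    "empty_col": [None] * 10,
})

triage = triage_missing(toy)
print(triage)
```

The output is a suggestion table, not a verdict: a sparsely populated column can still be worth keeping when its recorded values carry relational signal.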

## 1.1 ally_actor_1

In [None]:
# Strip whitespace to ensure accurate counting
conflict_df['ally_actor_1'] = conflict_df['ally_actor_1'].str.strip()

# Check for nulls
null_count = conflict_df['ally_actor_1'].isna().sum()
print(f"Null values in 'ally_actor_1': {null_count}")

# Count unique non-null values
unique_count = conflict_df['ally_actor_1'].nunique(dropna = True)
print(f"Unique (non-null) allies in 'ally_actor_1': {unique_count}")

# Show top 15 most common allies (cleaned)
print("\nTop 15 Recorded Allies of Actor 1:\n")
print(conflict_df['ally_actor_1'].value_counts(dropna = False).head(15))

Null values in 'ally_actor_1': 85164
Unique (non-null) allies in 'ally_actor_1': 2013

Top 15 Recorded Allies of Actor 1:

ally_actor_1
NaN                                                              85164
Muslim Brotherhood                                                1344
AFRC: Armed Forces Revolutionary Council                           740
Students (Egypt)                                                   384
Military Forces of Somalia (2012-)                                 328
ZANU-PF: Zimbabwe African National Union-Patriotic Front           254
ZNLWVA: Zimbabwe National Liberation War Veterans Association      242
AMISOM: African Union Mission in Somalia (2007-)                   237
Military Forces of Rwanda (1994-)                                  162
COSATU: Congress of South African Trade Unions                     155
Police Forces of Zimbabwe (1987-)                                  136
Police Forces of Egypt (2011-)                                     126
Students (Za

### Context First: What Do Allies *Mean* Here?

In ACLED dyadic data, 'ally_actor_1' isn’t just filler- it signals **affiliations**: power amplifiers, ideological alignments, or proxy forces. It gives context to the **who** and *why* behind the fight.

But here’s the kicker: it’s **sparse**- missing in over **85%** of records. Which means:

- Sometimes, it’s **truly unknown**.
- More often, it’s **just not coded**, even when real-world alliances *did* exist.
- Yet when it *is* recorded, it reveals **deep structural ties**- like ZANU-PF & war veterans, AMISOM backing local forces, or student groups fueling civil resistance.

So, how do we handle this field?  
It’s not cosmetic- it’s **strategic**. Because in an analysis like this- where **relationships *are* the data**, not just attributes- we can’t afford to flatten the network.

### The Challenge

ally_actor_1 is 85% missing. But when it's there, it’s gold- it tells us who’s moving in packs, who’s propping up whom, and how certain actors may not act alone.

If we ignore this field or impute blindly, we lose signal.

### The Decision

We’re not going to guess. That’s a losing game.  
Instead, we’re going to **preserve what’s known** and **make the absence speak**.

- 'null' ≠ “no ally” - it often means **not recorded**, not “acted alone”.
- But the absence itself is informative: does this actor typically show up *with backup*, or do they tend to act solo?
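The question of whether an actor typically shows up with backup can be checked with a per-actor ally rate. This is a rough sketch on made-up rows; `has_ally_actor_1` matches the flag name created in this notebook, while `ally_profile` and its column names are hypothetical.

```python
import pandas as pd

# Toy rows (illustrative only); None means no ally was recorded
toy = pd.DataFrame({
    "actor1": ["A", "A", "A", "A", "B", "B"],
    "ally_actor_1": ["X", None, "X", "Y", None, None],
})

# 1 when an ally was recorded for the event, 0 otherwise
toy["has_ally_actor_1"] = toy["ally_actor_1"].notna().astype(int)

# Per actor: number of events and share of them with a recorded ally
ally_profile = (
    toy.groupby("actor1")["has_ally_actor_1"]
       .agg(events="size", share_with_ally="mean")
       .sort_values("share_with_ally", ascending=False)
)

print(ally_profile)
```

A low share should be read cautiously: since a null often means "not recorded", it describes reporting as much as behaviour.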

### The Strategy

We keep it clean, useful, and ready for network-based analysis:

- **Fill nulls** with a clear, honest placeholder: 'No recorded ally'
- **Create a binary flag**: 'has_ally_actor_1' → separates events **with a recorded ally** from events **without one** (which, as noted above, is not proof the actor acted alone)
- **Group rare allies** as 'Other ally' to reduce noise but preserve known networks

In [None]:
# Fill missing values with a clear label to preserve data structure
conflict_df['ally_actor_1'] = conflict_df['ally_actor_1'].fillna('No recorded ally')

# Strip any leading/trailing whitespace to avoid duplicate-looking values
conflict_df['ally_actor_1'] = conflict_df['ally_actor_1'].str.strip()

# Flag whether a recorded ally is present (1) or not (0)
conflict_df['has_ally_actor_1'] = (conflict_df['ally_actor_1'] != 'No recorded ally').astype(int)

# Group the 20 most frequent labels and bucket the rest into 'Other ally'
# (the 'No recorded ally' placeholder is the most frequent label, so 19 named allies are kept)
top_allies1 = conflict_df['ally_actor_1'].value_counts().head(20).index
conflict_df['ally_actor_1_grouped'] = conflict_df['ally_actor_1'].apply(
    lambda x: x if x in top_allies1 or x == 'No recorded ally' else 'Other ally'
)

# Preview 

# Check for any remaining nulls
print("Null values in 'ally_actor_1':", conflict_df['ally_actor_1'].isna().sum())

# Display total number of unique ally labels
print("\nUnique allies in 'ally_actor_1':", conflict_df['ally_actor_1'].nunique())

# Show top 15 most common allies
print("\nTop 15 Allies of Actor 1:\n")
print(conflict_df['ally_actor_1'].value_counts().head(15))

# Show sample rows with relevant columns for verification
print("\nSample of updated ally_actor_1 columns:\n")
conflict_df[['actor1', 'ally_actor_1', 'has_ally_actor_1', 'ally_actor_1_grouped']].sample(10, random_state = 42)

Null values in 'ally_actor_1': 0

Unique allies in 'ally_actor_1': 2014

Top 15 Allies of Actor 1:

ally_actor_1
No recorded ally                                                 85164
Muslim Brotherhood                                                1344
AFRC: Armed Forces Revolutionary Council                           740
Students (Egypt)                                                   384
Military Forces of Somalia (2012-)                                 328
ZANU-PF: Zimbabwe African National Union-Patriotic Front           254
ZNLWVA: Zimbabwe National Liberation War Veterans Association      242
AMISOM: African Union Mission in Somalia (2007-)                   237
Military Forces of Rwanda (1994-)                                  162
COSATU: Congress of South African Trade Unions                     155
Police Forces of Zimbabwe (1987-)                                  136
Police Forces of Egypt (2011-)                                     126
Students (Zambia)                  

| | actor1 | ally_actor_1 | has_ally_actor_1 | ally_actor_1_grouped |
|---|---|---|---|---|
| 1318 | Police Forces of Algeria (1999-) | No recorded ally | 0 | No recorded ally |
| 97086 | ZANU-PF: Zimbabwe African National Union-Patri... | No recorded ally | 0 | No recorded ally |
| 10154 | Military Forces of Central African Republic (2... | No recorded ally | 0 | No recorded ally |
| 35578 | Protesters (Kenya) | No recorded ally | 0 | No recorded ally |
| 13325 | RCD-K: Rally for Congolese Democracy (Kisangani) | No recorded ally | 0 | No recorded ally |
| 62961 | HI: Hizbul Islam | No recorded ally | 0 | No recorded ally |
| 67177 | Military Forces of Kenya (2002-2013) | No recorded ally | 0 | No recorded ally |
| 82955 | Murle Ethnic Militia (Sudan) | No recorded ally | 0 | No recorded ally |
| 63650 | Military Forces of Somalia (2004-2012) | AMISOM: African Union Mission in Somalia (2007-) | 1 | AMISOM: African Union Mission in Somalia (2007-) |
| 28296 | Military Forces of Ethiopia (1991-) | Police Forces of Ethiopia (1991-) | 1 | Other ally |


## 1.2 actor2

In [None]:
# Strip whitespace to ensure accurate counting
conflict_df['actor2'] = conflict_df['actor2'].str.strip()

# Check for nulls
null_count = conflict_df['actor2'].isna().sum()
print(f"Null values in 'actor2': {null_count}")

# Count unique non-null values
unique_count = conflict_df['actor2'].nunique(dropna = True)
print(f"Unique (non-null) values in 'actor2': {unique_count}")

# Show top 15 most common actor2 entities (cleaned)
print("\nTop 15 Recorded Actor 2 Entities:\n")
print(conflict_df['actor2'].value_counts(dropna = False).head(15))

Null values in 'actor2': 22108
Unique (non-null) values in 'actor2': 2217

Top 15 Recorded Actor 2 Entities:

actor2
NaN                                                           22108
Civilians (Somalia)                                            4165
Civilians (Zimbabwe)                                           3945
Civilians (Democratic Republic of Congo)                       2746
Al Shabaab                                                     2650
UNITA: National Union for the Total Independence of Angola     2461
Civilians (Nigeria)                                            2439
Civilians (Sudan)                                              2141
Civilians (Sierra Leone)                                       1981
Unidentified Armed Group (Somalia)                             1923
LRA: Lord's Resistance Army                                    1751
Military Forces of Somalia (2004-2012)                         1517
Civilians (Kenya)                                              140

### Understanding Missing Values in actor2  
**Grounded in ACLED’s Coding Framework**

In ACLED's dyadic data structure, every event is coded with up to **two actors**:  
- actor1: the initiator or primary aggressor  
- actor2: the opponent, target, or secondary party (if any)

Now, in our dataset, over **22,000 records** (22%) are missing actor2. But according to ACLED’s own documentation, this isn’t a glitch- it’s by design.

> “Events are coded with as much detail as is available; when an actor is unknown, unidentified, or the event only involves one side (e.g., protests, attacks on civilians), the second actor field is left blank.” - *ACLED Methodology*

### What Missing actor2 Really Means

So, we’re not just dealing with dirty data - we’re looking at a **coded absence**. Here’s what it could mean in ACLED’s terms:

- **Unopposed action** → No counter-party (e.g., forced evictions, one-sided violence)
- **Unknown target** → E.g., bombings or abductions with no reported victim group
- **Symbolic/strategic events** → Raids, looting, or property destruction not tied to a known adversary
- **Crowd-led unrest** → Protests, riots, or demonstrations with no clear ‘opponent’

Also worth noting: where actor2 *is* recorded, it’s dominated by **civilians** - a clear pattern that aligns with ACLED’s heavy inclusion of civilian-targeted violence across African states.
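Both claims here, that missing actor2 clusters in one-sided event types and that recorded actor2 values are dominated by civilians, are easy to check directly. The sketch below uses made-up rows; the `startswith("Civilians")` test relies on the "Civilians (Country)" naming pattern visible in the value counts above, and is an assumption about how consistently that pattern holds.

```python
import pandas as pd

# Toy rows (illustrative only); None means actor2 was left blank
toy = pd.DataFrame({
    "event_type": ["Riots/Protests", "Riots/Protests", "Riots/Protests",
                   "Violence against civilians",
                   "Battle-No change of territory"],
    "actor2": [None, None, "Police Forces (toy)", "Civilians (toy)", "Group B"],
})

# Share of events per event type where actor2 is missing
missing_by_type = (
    toy["actor2"].isna()
       .groupby(toy["event_type"])
       .mean()
       .sort_values(ascending=False)
)

# Among recorded actor2 values, the share that are civilian labels
recorded = toy["actor2"].dropna()
civilian_share = recorded.str.startswith("Civilians").mean()

print(missing_by_type)
print(f"Civilian share of recorded actor2: {civilian_share:.2f}")
```

If the missing share turned out to be spread evenly across event types on the real data, the "coded absence" reading would deserve a second look.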

### So How Should We Handle It?

#### What we *won’t* do:
- Drop these records - they’re **intentionally included by ACLED**
- Impute with the mode (‘Civilians’) - that risks fabricating dyads and corrupting network structure

#### What we’ll do instead:
Use ACLED’s design principle to guide an honest, analysis-ready imputation:

In [None]:
# Fill missing values with a clear, consistent label
conflict_df['actor2'] = conflict_df['actor2'].fillna('No recorded actor')

# Remove leading/trailing whitespace (avoids false uniqueness)
conflict_df['actor2'] = conflict_df['actor2'].str.strip()

# Create a binary flag to indicate presence of a recorded actor2
conflict_df['has_actor2'] = (conflict_df['actor2'] != 'No recorded actor').astype(int)

# Preview changes

# Check for any remaining nulls
print("Null values in 'actor2':", conflict_df['actor2'].isna().sum())

# Number of unique values
print("\nUnique 'actor2' values:", conflict_df['actor2'].nunique())

# Top 15 most common actor2 entities
print("\nTop 15 Actor 2 Entities:\n")
print(conflict_df['actor2'].value_counts().head(15))

# Sample preview of the key columns involved
print("\nSample of updated actor2 columns:\n")
conflict_df[['actor2', 'has_actor2']].sample(10, random_state = 42)

Null values in 'actor2': 0

Unique 'actor2' values: 2218

Top 15 Actor 2 Entities:

actor2
No recorded actor                                             22108
Civilians (Somalia)                                            4165
Civilians (Zimbabwe)                                           3945
Civilians (Democratic Republic of Congo)                       2746
Al Shabaab                                                     2650
UNITA: National Union for the Total Independence of Angola     2461
Civilians (Nigeria)                                            2439
Civilians (Sudan)                                              2141
Civilians (Sierra Leone)                                       1981
Unidentified Armed Group (Somalia)                             1923
LRA: Lord's Resistance Army                                    1751
Military Forces of Somalia (2004-2012)                         1517
Civilians (Kenya)                                              1406
Military Forces of Sudan 

| | actor2 | has_actor2 |
|---|---|---|
| 1318 | AQIM: Al Qaeda in the Islamic Maghreb | 1 |
| 97086 | Civilians (Zimbabwe) | 1 |
| 10154 | Civilians (Central African Republic) | 1 |
| 35578 | Police Forces of Kenya (2002-2013) | 1 |
| 13325 | No recorded actor | 0 |
| 62961 | Al Shabaab | 1 |
| 67177 | Al Shabaab | 1 |
| 82955 | Civilians (Sudan) | 1 |
| 63650 | Civilians (Somalia) | 1 |
| 28296 | Civilians (Eritrea) | 1 |


## 1.3 ally_actor_2

In [80]:
# Strip whitespace to ensure accurate counting
conflict_df['ally_actor_2'] = conflict_df['ally_actor_2'].str.strip()

# Check for nulls
null_count = conflict_df['ally_actor_2'].isna().sum()
print(f"Null values in 'ally_actor_2': {null_count}")

# Count unique non-null values
unique_count = conflict_df['ally_actor_2'].nunique(dropna = True)
print(f"Unique (non-null) allies in 'ally_actor_2': {unique_count}")

# Show top 15 most common allies (cleaned)
print("\nTop 15 Recorded Allies of Actor 2:\n")
print(conflict_df['ally_actor_2'].value_counts(dropna = False).head(15))

Null values in 'ally_actor_2': 90954
Unique (non-null) allies in 'ally_actor_2': 1469

Top 15 Recorded Allies of Actor 2:

ally_actor_2
NaN                                                           90954
MDC: Movement for Democratic Change                             917
AMISOM: African Union Mission in Somalia (2007-)                352
AFRC: Armed Forces Revolutionary Council                        293
Government of Somalia (2012-)                                   219
MDC-T: Movement for Democratic Change (Tsvangirai Faction)      216
Police Forces of Egypt (2011-)                                  147
Christian Group (Nigeria)                                       134
BRSC: Shura Council of Benghazi Revolutionaries                  94
ZANU-PF: Zimbabwe African National Union-Patriotic Front         89
Muslim Brotherhood                                               87
Civilians (International)                                        79
Military Forces of Rwanda (1994-)               

In [81]:
# Count null values in ally_actor_2
print("\nNull Values counts:", conflict_df['ally_actor_2'].isna().sum())

# Display unique values
print("\nUnique Allies 2:\n", conflict_df['ally_actor_2'].unique())

# Strip whitespace
conflict_df['ally_actor_2'] = conflict_df['ally_actor_2'].str.strip()

# Check who is in here
print("\nActor 2:\n", conflict_df['ally_actor_2'].value_counts().head(15))


Null Values counts: 90954

Unique Allies 2:
 [nan 'LIDD: The Islamic League for Preaching and Holy Struggle'
 'GSPC: Salafist Group for Call and Combat' ...
 'Journalists (Zimbabwe); Street Traders (Zimbabwe)' 'Transform Zimbabwe'
 "MRT: Masvingo Residents' Trust"]

Actor 2:
 ally_actor_2
MDC: Movement for Democratic Change                           917
AMISOM: African Union Mission in Somalia (2007-)              352
AFRC: Armed Forces Revolutionary Council                      293
Government of Somalia (2012-)                                 219
MDC-T: Movement for Democratic Change (Tsvangirai Faction)    216
Police Forces of Egypt (2011-)                                147
Christian Group (Nigeria)                                     134
BRSC: Shura Council of Benghazi Revolutionaries                94
ZANU-PF: Zimbabwe African National Union-Patriotic Front       89
Muslim Brotherhood                                             87
Civilians (International)                        

### What’s Up With ally_actor_2?

Let’s talk about the elephant in the dyad - 'ally_actor_2'.

This field has over **90,000 missing entries**, which means it's missing in **91%** of the dataset. But again, this isn’t broken data - it’s intentional. And ACLED’s documentation gives us the context:

> “Allies are coded only if explicitly mentioned as participating in the event. If no alliance or co-action is reported, the field is left blank.” - *ACLED Methodology*

So, ally_actor_2 isn’t a required field - it’s **bonus intelligence**. When it shows up, it means a secondary affiliation or support structure was clearly recorded for actor2. When it’s blank, it usually just means there was **no reported ally** - not that something’s missing.

### What Do We Actually See?

Where 'ally_actor_2' *is* recorded, it’s powerful stuff:

- **Political coalitions**: MDC & MDC-T in Zimbabwe, PDP in Nigeria  
- **Peacekeeping and international actors**: AMISOM, refugees, IDPs  
- **Religious groups, militias, and civic alliances**: Muslim Brotherhood, Masvingo Residents’ Trust, journalists and street traders

These aren’t just side mentions - they’re part of the **conflict structure**. Whether it’s coordination, reinforcement, or just ideological alignment, these allies change the meaning and interpretation of the event.

### What We’ll Do

As with ally_actor_1, we’re not here to fill in fantasy allies. We're keeping it real - and **explicitly surfacing the absence** to preserve meaning.

In [82]:
# Fill missing values in the 'ally_actor_2' column with a placeholder
conflict_df['ally_actor_2'] = conflict_df['ally_actor_2'].fillna('No recorded ally')

# Create a binary flag indicating whether an event involved a recorded ally for actor2
# 1 if there is an ally, 0 if it's 'No recorded ally'
conflict_df['has_ally_actor_2'] = (conflict_df['ally_actor_2'] != 'No recorded ally').astype(int)

# Identify the 20 most frequent labels in the 'ally_actor_2' column
# (this includes the 'No recorded ally' placeholder, so 19 named allies are preserved individually)
top_allies2 = conflict_df['ally_actor_2'].value_counts().head(20).index

# Group allies:
# - Keep the labels in top_allies2 as-is
# - Keep 'No recorded ally' as-is (to preserve the distinction)
# - All other entries are labeled as 'Other ally' to reduce category fragmentation
conflict_df['ally_actor_2_grouped'] = conflict_df['ally_actor_2'].apply(
    lambda x: x if x in top_allies2 or x == 'No recorded ally' else 'Other ally'
)

# Preview changes
print("Value counts for 'ally_actor_2_grouped':\n")
print(conflict_df['ally_actor_2_grouped'].value_counts(dropna = False))

print("\nSample of updated columns:\n")
conflict_df[['actor2', 'ally_actor_2', 'has_ally_actor_2', 'ally_actor_2_grouped']].sample(10, random_state = 42)

Value counts for 'ally_actor_2_grouped':

ally_actor_2_grouped
No recorded ally                                              90954
Other ally                                                     5398
MDC: Movement for Democratic Change                             917
AMISOM: African Union Mission in Somalia (2007-)                352
AFRC: Armed Forces Revolutionary Council                        293
Government of Somalia (2012-)                                   219
MDC-T: Movement for Democratic Change (Tsvangirai Faction)      216
Police Forces of Egypt (2011-)                                  147
Christian Group (Nigeria)                                       134
BRSC: Shura Council of Benghazi Revolutionaries                  94
ZANU-PF: Zimbabwe African National Union-Patriotic Front         89
Muslim Brotherhood                                               87
Civilians (International)                                        79
Military Forces of Rwanda (1994-)                    

| | actor2 | ally_actor_2 | has_ally_actor_2 | ally_actor_2_grouped |
|---|---|---|---|---|
| 1318 | AQIM: Al Qaeda in the Islamic Maghreb | No recorded ally | 0 | No recorded ally |
| 97086 | Civilians (Zimbabwe) | No recorded ally | 0 | No recorded ally |
| 10154 | Civilians (Central African Republic) | No recorded ally | 0 | No recorded ally |
| 35578 | Police Forces of Kenya (2002-2013) | No recorded ally | 0 | No recorded ally |
| 13325 | No recorded actor | No recorded ally | 0 | No recorded ally |
| 62961 | Al Shabaab | No recorded ally | 0 | No recorded ally |
| 67177 | Al Shabaab | No recorded ally | 0 | No recorded ally |
| 82955 | Civilians (Sudan) | No recorded ally | 0 | No recorded ally |
| 63650 | Civilians (Somalia) | AMISOM: African Union Mission in Somalia (2007-) | 1 | AMISOM: African Union Mission in Somalia (2007-) |
| 28296 | Civilians (Eritrea) | No recorded ally | 0 | No recorded ally |
