# Conflict Model Prediction Analysis

**Objective:**
Develop a prediction model to analyze political violence and predict future events, trends, and fatality risks in Africa based on historical data.

**Questions a successful model could answer**:

1) Where are future political violence events likely to occur?

*Predict regions/countries most prone to future incidents.*

2) Which events are likely to have the highest fatality rates?

*Forecast event types with high mortality, enabling early interventions.*

3) What are the most frequent sub-event types linked to disorder types?

*Identify patterns linking specific sub-event types to disorder outcomes.*

4) Which actors are most involved in escalating violence?

*Highlight key actors contributing to increased violent activities over time.*

5) How do geographic and temporal trends correlate with event severity?

*Analyze the impact of location and time on event escalation and fatalities.*

A successful model could inform strategic planning and conflict prevention by government agencies and humanitarian organizations.




In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

In [2]:
# Load the csv dataset
conflict_df = pd.read_csv("/content/Africa_1997-2024_Aug23.csv")

# Print the first 5 rows
print(conflict_df.head())

# Check datatypes and missing values

print(conflict_df.info())

  event_id_cnty  event_date  year  time_precision       disorder_type  \
0       ANG4104  2024-08-23  2024               1  Political violence   
1      BFO12464  2024-08-23  2024               1  Political violence   
2      BFO12471  2024-08-23  2024               1  Political violence   
3      BFO12472  2024-08-23  2024               1  Political violence   
4      CAO14533  2024-08-23  2024               1  Political violence   

  event_type sub_event_type  \
0      Riots   Mob violence   
1    Battles    Armed clash   
2    Battles    Armed clash   
3    Battles    Armed clash   
4    Battles    Armed clash   

                                              actor1  \
0                                   Rioters (Angola)   
1       JNIM: Group for Support of Islam and Muslims   
2       JNIM: Group for Support of Islam and Muslims   
3       JNIM: Group for Support of Islam and Muslims   
4  Islamic State (West Africa) and/or Boko Haram ...   

              assoc_actor_1  inter1  

In [3]:
conflict_df.head()

Unnamed: 0,event_id_cnty,event_date,year,time_precision,disorder_type,event_type,sub_event_type,actor1,assoc_actor_1,inter1,...,location,latitude,longitude,geo_precision,source,source_scale,notes,fatalities,tags,timestamp
0,ANG4104,2024-08-23,2024,1,Political violence,Riots,Mob violence,Rioters (Angola),Vigilante Group (Angola),5,...,Luanda,-8.8383,13.2344,1,Ango Noticias; Correio da Kianda; Novo Journal,National,"On 23 August 2024, a mob assaulted a police of...",1,crowd size=no report,1724714023
1,BFO12464,2024-08-23,2024,1,Political violence,Battles,Armed clash,JNIM: Group for Support of Islam and Muslims,,2,...,Niempourou,12.6018,-3.2158,2,Signal,New media,"On 23 August 2024, JNIM ambushed a patrol of s...",0,,1724714023
2,BFO12471,2024-08-23,2024,1,Political violence,Battles,Armed clash,JNIM: Group for Support of Islam and Muslims,,2,...,Djibo,14.0875,-1.6418,1,Al Zallaqa,New media,"On 23 August 2024, JNIM claimed to have killed...",3,,1724714023
3,BFO12472,2024-08-23,2024,1,Political violence,Battles,Armed clash,JNIM: Group for Support of Islam and Muslims,,2,...,Diougo,11.2472,0.1221,1,Facebook; Whatsapp,New media,"On 23 August 2024, JNIM militants attacked vol...",10,,1724714023
4,CAO14533,2024-08-23,2024,1,Political violence,Battles,Armed clash,Islamic State (West Africa) and/or Boko Haram ...,,2,...,Moskota,10.9508,13.8671,2,Humanity Purpose,New media,"On 23 August 2024, ISWAP or Boko Haram militan...",0,,1724714031


In [4]:
# Feature selection
# Keep region, country, year, fatalities, latitude, longitude, actor1, interaction, disorder_type, event_type

conflict_pred = conflict_df[["region", "country","year", "fatalities", "latitude", "longitude", "actor1",
                              "interaction", "disorder_type", "event_type"]]

conflict_pred.head()



Unnamed: 0,region,country,year,fatalities,latitude,longitude,actor1,interaction,disorder_type,event_type
0,Middle Africa,Angola,2024,1,-8.8383,13.2344,Rioters (Angola),15,Political violence,Riots
1,Western Africa,Burkina Faso,2024,0,12.6018,-3.2158,JNIM: Group for Support of Islam and Muslims,12,Political violence,Battles
2,Western Africa,Burkina Faso,2024,3,14.0875,-1.6418,JNIM: Group for Support of Islam and Muslims,24,Political violence,Battles
3,Western Africa,Burkina Faso,2024,10,11.2472,0.1221,JNIM: Group for Support of Islam and Muslims,24,Political violence,Battles
4,Middle Africa,Cameroon,2024,0,10.9508,13.8671,Islamic State (West Africa) and/or Boko Haram ...,24,Political violence,Battles


In [6]:
conflict_pred["event_type"].unique()

array(['Riots', 'Battles', 'Strategic developments',
       'Violence against civilians', 'Protests',
       'Explosions/Remote violence'], dtype=object)
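
The event types listed above are later converted to integers with `LabelEncoder`. The sketch below illustrates how that encoder orders its codes: it sorts the labels rather than keeping the order in which `unique()` returns them. The names `event_types`, `encoder`, `class_names`, and `code_by_name` are illustrative and not part of the notebook.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Event types in the order returned by unique() above
event_types = np.array(['Riots', 'Battles', 'Strategic developments',
                        'Violence against civilians', 'Protests',
                        'Explosions/Remote violence'])

encoder = LabelEncoder()
codes = encoder.fit_transform(event_types)

# classes_ holds the labels in sorted order, so classes_[i] is the name for code i
class_names = [str(name) for name in encoder.classes_]

# Map each original label to the integer code it received
code_by_name = {str(name): int(code) for name, code in zip(event_types, codes)}
```

Reading the names from `classes_` keeps any per-class report aligned with the encoded labels, instead of relying on a hand-written list.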

In [7]:
conflict_pred.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 381997 entries, 0 to 381996
Data columns (total 9 columns):
 #   Column         Non-Null Count   Dtype  
---  ------         --------------   -----  
 0   region         381997 non-null  object 
 1   country        381997 non-null  object 
 2   year           381997 non-null  int64  
 3   fatalities     381997 non-null  int64  
 4   latitude       381997 non-null  float64
 5   longitude      381997 non-null  float64
 6   interaction    381997 non-null  int64  
 7   disorder_type  381997 non-null  object 
 8   event_type     381997 non-null  object 
dtypes: float64(2), int64(3), object(4)
memory usage: 26.2+ MB


In [5]:
from sklearn.preprocessing import LabelEncoder

# Work on an explicit copy so the assignments below do not trigger
# pandas' SettingWithCopyWarning
conflict_pred = conflict_pred.copy()

# Encode each categorical column with its own LabelEncoder.
# LabelEncoder assigns integer codes in sorted (alphabetical) label order.
categorical_cols = ['event_type', 'actor1', 'disorder_type', 'region', 'country']
encoders = {}
for col in categorical_cols:
    encoders[col] = LabelEncoder()
    conflict_pred[col] = encoders[col].fit_transform(conflict_pred[col])

In [6]:
# Split the data into training and test sets

X = conflict_pred.drop("event_type", axis=1)
y = conflict_pred["event_type"]

# Split the data into 80% training and 20% test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Check the shape of the training and testing sets
print("Training set shape:", X_train.shape, y_train.shape)
print("Testing set shape:", X_test.shape, y_test.shape)

Training set shape: (305597, 9) (305597,)
Testing set shape: (76400, 9) (76400,)
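
The event types are not equally frequent, so a purely random split can shift the class proportions between the training and test sets. As a minimal sketch, assuming synthetic stand-in data (`X_demo`, `y_demo` are hypothetical, not the conflict dataset), the `stratify` argument of `train_test_split` keeps the class shares the same in both parts:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Hypothetical stand-in data: 1,000 rows, 3 features, imbalanced 3-class target
X_demo = rng.normal(size=(1000, 3))
y_demo = np.repeat([0, 1, 2], [700, 200, 100])

# stratify=y_demo preserves the class proportions in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Share of each class in the training and test sets
train_share = np.bincount(y_tr) / len(y_tr)
test_share = np.bincount(y_te) / len(y_te)
```

In the notebook, the equivalent change would be passing `stratify=y` to the existing call; whether it changes the results here has not been checked.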


The pipeline below chains feature scaling and logistic regression into a single estimator. This guarantees that the scaler is fitted on the training data only and that the same transformation is then applied to the test data. During cross-validation it prevents data leakage, because the scaler is refitted inside each fold. It also keeps the code shorter and easier to maintain, and lets the scaling and model parameters be tuned together.
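
To illustrate the cross-validation and joint-tuning points, here is a minimal sketch on synthetic data. `X_demo`, `y_demo`, `demo_pipeline`, and the parameter grid are assumptions made for the example; the notebook itself does not run a grid search.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data; the notebook would use X_train and y_train
X_demo, y_demo = make_classification(
    n_samples=600, n_features=8, n_informative=5, n_classes=3, random_state=0
)

demo_pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# make_pipeline names each step after its lowercased class name, so step
# parameters are addressed as "<step name>__<parameter>"
param_grid = {
    "standardscaler__with_mean": [True, False],
    "logisticregression__C": [0.1, 1.0, 10.0],
}

# In every fold the whole pipeline is refitted, so the scaler only ever
# sees that fold's training portion
search = GridSearchCV(demo_pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_demo, y_demo)

best_params = search.best_params_
best_score = search.best_score_
```

Because the scaler lives inside the estimator passed to `GridSearchCV`, no statistics from a validation fold reach the scaling step.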

In [7]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Standardize the features and apply Logistic Regression using a pipeline
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Class names in the order LabelEncoder assigned codes 0-5 (sorted alphabetically)
class_names = ['Battles', 'Explosions/Remote violence', 'Protests', 'Riots', 'Strategic developments', 'Violence against civilians']

# Fit the pipeline to the training data
pipeline.fit(X_train, y_train)

# Make predictions on the test set
y_pred = pipeline.predict(X_test)

# Evaluate the model's performance
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
# Print classification report with class names
print("Classification Report:\n", classification_report(y_test, y_pred, target_names=class_names))

Accuracy: 0.7107329842931938
Classification Report:
                             precision    recall  f1-score   support

                   Battles       0.62      0.80      0.70     19526
Explosions/Remote violence       0.38      0.02      0.03      5648
                  Protests       0.85      0.92      0.89     17914
                     Riots       0.64      0.22      0.33      7981
    Strategic developments       1.00      1.00      1.00      6827
Violence against civilians       0.62      0.73      0.67     18504

                  accuracy                           0.71     76400
                 macro avg       0.68      0.61      0.60     76400
              weighted avg       0.69      0.71      0.67     76400



**Testing whether polynomial features improve the model**

In [8]:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegression

# Initialize PolynomialFeatures and Logistic Regression
poly = PolynomialFeatures(interaction_only=True, include_bias=False)
logreg_3 = LogisticRegression()  # Initialize the model (default max_iter=100)

# Transform the training data
X_train_poly = poly.fit_transform(X_train)

# Class names in the order LabelEncoder assigned codes 0-5 (sorted alphabetically)
class_names = ['Battles', 'Explosions/Remote violence', 'Protests', 'Riots', 'Strategic developments', 'Violence against civilians']

# Fit the Logistic Regression model on the transformed data
logreg_3.fit(X_train_poly, y_train)

# Transform the test data and make predictions
X_test_poly = poly.transform(X_test) # Transform test data the same way
y_pred = logreg_3.predict(X_test_poly) # Use the model to predict

# Evaluate the model's performance
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Print classification report with class names
print("Classification Report:\n", classification_report(y_test, y_pred, target_names=class_names))

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


Accuracy: 0.5574214659685864
Classification Report:
                             precision    recall  f1-score   support

                   Battles       0.50      0.80      0.62     19526
Explosions/Remote violence       0.24      0.01      0.02      5648
                  Protests       0.68      0.88      0.77     17914
                     Riots       0.33      0.13      0.18      7981
    Strategic developments       0.00      0.00      0.00      6827
Violence against civilians       0.54      0.54      0.54     18504

                  accuracy                           0.56     76400
                 macro avg       0.38      0.39      0.36     76400
              weighted avg       0.47      0.56      0.49     76400


**Evaluation of the model improvement alternatives**

Summary of method comparison:

*   **Class imbalance handling (SMOTE):** moderate improvement on underrepresented classes (e.g., class 1). Recall for minority classes improved, but overall accuracy dropped slightly because precision on the majority classes fell.
*   **Polynomial features:** in the run above, adding interaction terms lowered overall accuracy from 0.71 to 0.56. The model was fitted on unscaled features and stopped at the iteration limit before converging, so the drop reflects the failed optimization as much as the features themselves. Minority classes stayed poorly predicted (recall of 0.01 for class 1), which shows that polynomial features alone do not resolve class imbalance.
*   **Combination of both methods (SMOTE + polynomial features):** expected to give the best balance between overall accuracy and class-level performance, with SMOTE improving recall for minority classes and the interaction terms adding model capacity, provided the features are scaled and the solver is allowed to converge.

Recommendations:

*   **Use a combination:** apply both class imbalance handling (SMOTE) and polynomial features to target overall accuracy and minority class detection together. This is likely the most robust and balanced option.
*   **Further techniques:** explore ensemble methods (e.g., Random Forest) or hyperparameter tuning to improve performance across all classes.
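
The combination discussed above can be sketched as a single pipeline. This is a hedged illustration on synthetic data, not the notebook's method: SMOTE lives in the separate imbalanced-learn package, so the sketch swaps in scikit-learn's `class_weight="balanced"` option as a stand-in for imbalance handling. All names (`X_demo`, `y_demo`, `balanced_poly`, `per_class_recall`) are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Hypothetical imbalanced stand-in data (roughly 80% / 15% / 5% per class)
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, weights=[0.8, 0.15, 0.05],
    random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Scale first, then add pairwise interaction terms, then fit a model whose
# loss reweights classes inversely to their frequency (not SMOTE)
balanced_poly = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
    LogisticRegression(class_weight="balanced", max_iter=2000),
)
balanced_poly.fit(X_tr, y_tr)

# Per-class recall exposes minority-class behaviour that accuracy hides
per_class_recall = recall_score(y_te, balanced_poly.predict(X_te), average=None)
```

Scaling before the interaction step and raising `max_iter` address the convergence warning seen in the polynomial run; whether this improves results on the conflict data would need to be measured.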

In [9]:
from sklearn.ensemble import RandomForestClassifier

# Initialize and train a Random Forest model
rf_model = RandomForestClassifier()

# Class names in the order LabelEncoder assigned codes 0-5 (sorted alphabetically)
class_names = ['Battles', 'Explosions/Remote violence', 'Protests', 'Riots', 'Strategic developments', 'Violence against civilians']

#Fit the model on the training set
rf_model.fit(X_train, y_train)

# Predict on the test set
y_pred_rf = rf_model.predict(X_test)

# Evaluate the Random Forest model
print("Accuracy:", accuracy_score(y_test, y_pred_rf))

# Print classification report with class names
print("Classification Report:\n", classification_report(y_test, y_pred_rf, target_names=class_names))

Accuracy: 0.9503403141361256
Classification Report:
                             precision    recall  f1-score   support

                   Battles       0.92      0.95      0.94     19526
Explosions/Remote violence       0.71      0.57      0.63      5648
                  Protests       1.00      1.00      1.00     17914
                     Riots       1.00      1.00      1.00      7981
    Strategic developments       1.00      1.00      1.00      6827
Violence against civilians       0.96      0.98      0.97     18504

                  accuracy                           0.95     76400
                 macro avg       0.93      0.92      0.92     76400
              weighted avg       0.95      0.95      0.95     76400

