# [POLSCI 5] Civil Wars

Estimated Time: 30-40 Minutes <br>
Created by: Lauren Hom, Ravi Singhal

Today we will be examining a dataset (i.e. a table) on civil wars around the world from 1946-2013. The data, which contains information on various types of wars, comes from the Uppsala Conflict Data Program (UCDP). We will be exploring what rows and columns represent in the dataset as well as analyzing some trends in the types of conflicts over time.


### Table of Contents 
* [Jupyter Introduction ](#0) <br>
* [The Dataset](#1)<br>
    * [Rows](#1a)<br>
    * [Columns](#1b)<br>
* [Civil Wars](#2)<br>
* [Visualize the Data](#3)<br>
* [Data Science Opportunities at UC Berkeley](#4)<br>
* [Bibliography](#5)


# Jupyter Introduction <a id='0'></a>

This webpage is a Jupyter Notebook. We will use this notebook to analyze the Uppsala Conflict Data Program (UCDP) Conflict Termination Dataset. Jupyter Notebooks are composed of both regular text and code cells. Code cells have a gray background. To run a code cell, click the cell and press `Shift + Enter`, or hit the `▶| Run` button in the toolbar at the top. An example of a code cell is below. Try running it. If everything works properly, "Success!" should be printed under the cell.

In [1]:
# This cell sets up the notebook. Just run this cell.
from datascience import *
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

print("Success!")

Success!


# The Dataset <a id='1'></a>

The dataset we will look at contains information about conflicts around the world from 1946-2013. A **dataset** is a table with rows and columns that contain values. Here is the description of the dataset by its author, Joakim Kreutz.

> "Armed conflict is defined by Uppsala Conflict Data Program (UCDP) \[as\] a contested incompatibility that concerns government and/or territory where the use of armed force between two parties, of which at least one is the government of a state, results in at least 25 battle-related deaths in a calender-year."

> "A conflict episode, thus, is defined as the a continuous period of active conflict years in the UCDP-PRIO armed conflict dataset. A conflict episode ends when an active year is followed by a year in which there are fewer than 25 battle-related deaths."


<br>If you want to read more about the dataset, click [here](https://www.pcr.uu.se/research/ucdp/).

<br>Here are the first 5 rows of the dataset. You can scroll horizontally when hovering over the table to see the whole dataset. There are 2741 rows and 33 columns.

In [2]:
raw = Table.read_table('ucdp-term-dyadic-2015.csv') # read in the dataset
raw.show(5)

ConflictId,DyadId,DyadEp,Year,Location,SideA,SideA2nd,SideB,SideBID,SideB2nd,Incompatibility,TerritoryName,IntensityLevel,TypeOfConflict,Type2,StartDate,StartPrec,StartDate2,StartPrec2,Dyadterm,EpEndDate,EpEndPrec,Outcome_early,Outcome,CfireDate,PeAgDate,GWNoA,GWNoA2nd,GWNoB,GWNoB2nd,GWNoLoc,Region,Version
1-191,1,101,1992,Algeria,Government of Algeria,,AIS,1389,,2,,1,3,3,1985-08-27,1,1992-03-10,5,0,,,,,,,615,,,,615,4,2.0-2015
1-191,1,101,1993,Algeria,Government of Algeria,,AIS,1389,,2,,1,3,3,1985-08-27,1,1992-03-10,5,0,,,,,,,615,,,,615,4,2.0-2015
1-191,1,101,1994,Algeria,Government of Algeria,,AIS,1389,,2,,2,3,3,1985-08-27,1,1992-03-10,5,0,,,,,,,615,,,,615,4,2.0-2015
1-191,1,101,1995,Algeria,Government of Algeria,,AIS,1389,,2,,1,3,3,1985-08-27,1,1992-03-10,5,0,,,,,,,615,,,,615,4,2.0-2015
1-191,1,101,1996,Algeria,Government of Algeria,,AIS,1389,,2,,1,3,3,1985-08-27,1,1992-03-10,5,0,,,,,,,615,,,,615,4,2.0-2015


As you can see, there are many columns in the dataset. We have cleaned the dataset by removing some columns and making the values more readable. The cleaned dataset has 18 columns.

In [1]:
wars = Table.read_table('cleanedWars.csv') # read in the dataset
wars.set_format('Year', formats.FunctionFormatter(lambda x: x)) # fix format of year column
wars.show(5)


Note that while the number of **columns** has changed in the cleaned dataset, the number of **rows** has remained the same.

Let's look at the dataset more closely.

## Rows <a id='1a'></a>


First, let's examine what a *row* is. Here is the first row of the dataset.

In [None]:
wars.take(0) # take the first row

This row gives us information about a specific conflict happening in Algeria in the year 1992. One thing to note is that the `IntensityLevel` of the conflict during this year is Minor. As described in the Intensity Level section below, a Minor conflict is one with between 25 and 999 battle-related deaths in a given year.

In [None]:
wars.take(2) # take the third row

In comparison, by looking at the third row of the dataset, we can see that the same conflict has now changed to have an `IntensityLevel` of War in the year 1994. We know that the first and third rows correspond to the same conflict because the `ConflictId`s are the same. Each row of this dataset describes one year of one conflict.
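To make the "one row per conflict-year" idea concrete, here is a minimal sketch in plain Python (not the `datascience` library used in this notebook). It uses four columns copied from the five raw rows shown earlier. The mapping of the raw codes 1 and 2 to "Minor" and "War" is an assumption inferred from the cleaned rows described above, and the variable names are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Four columns copied from the five raw rows displayed earlier in this notebook.
raw_rows = """ConflictId,Year,Location,IntensityLevel
1-191,1992,Algeria,1
1-191,1993,Algeria,1
1-191,1994,Algeria,2
1-191,1995,Algeria,1
1-191,1996,Algeria,1
"""

# Assumption: raw code 1 is "Minor" and 2 is "War", matching the cleaned table.
intensity_labels = {"1": "Minor", "2": "War"}

# Collect the (year, intensity) pairs that belong to each conflict.
years_by_conflict = defaultdict(list)
for row in csv.DictReader(io.StringIO(raw_rows)):
    label = intensity_labels[row["IntensityLevel"]]
    years_by_conflict[row["ConflictId"]].append((int(row["Year"]), label))

for conflict_id, conflict_years in years_by_conflict.items():
    print(conflict_id, conflict_years)
```

All five rows share one `ConflictId`, so they collapse into a single conflict with five yearly entries.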

## Columns <a id='1b'></a>


Next, we will look at some of the columns in the dataset, specifically columns that allow us to better understand the information we are seeing.

### Location
To understand what a *column* is, let's look at a few examples. First, we shall examine the `Location` column. According to the author of the dataset, 
> "Location is defined as the government side of a conflict, and should not be interpreted as the geographical location of the conflict."

To get an idea of what kinds of values are in the `Location` column, here is the number of occurrences of each of the 10 most common locations.

In [4]:
locations = wars.group('Location').sort('count', descending=True) # group by location and sort in descending order
locations

Location,count
Myanmar (Burma),314
India,178
Ethiopia,136
Israel,123
Philippines,121
Afghanistan,120
Colombia,93
Iraq,80
Angola,71
Sudan,70


Some of the values in the `Location` column of the table contain hyphens ('-'). In these cases the hyphen is part of the name, so the value is still a single location.

Let's look at some of the examples in the table.

In [5]:
# Locations whose names contain a hyphen
locations.where('Location', are.containing('-'))

Location,count
Bosnia-Herzegovina,15
Guinea-Bissau,13


We can see there are two locations whose names contain a hyphen ('-'): Bosnia-Herzegovina and Guinea-Bissau.
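The `are.containing('-')` filter used above behaves like a plain substring test. A rough equivalent in plain Python, using location names that appear in the two output tables above (the list and variable names are only for illustration):

```python
# Location names taken from the two output tables above.
location_names = [
    "Myanmar (Burma)", "India", "Ethiopia", "Israel", "Philippines",
    "Bosnia-Herzegovina", "Guinea-Bissau",
]

# Keep only the names that contain a hyphen, as are.containing('-') does.
hyphenated = [name for name in location_names if "-" in name]
print(hyphenated)  # ['Bosnia-Herzegovina', 'Guinea-Bissau']
```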


### Region
Now let's look at the `Region` column. Again, first we'll look at the number of war occurrences of each region.

In [None]:
regions = wars.group('Region').sort('count', descending=True) # count occurrences of each region
regions

Above, we can see that the first row, for the 'Asia' region, has the highest `count`, which means that Asia has the largest number of war occurrences of any region.

In case you were curious, the cell below shows the row for the conflict that took place in Europe, Middle East, Asia, and Americas. This marks the beginning of the Iraq War in 2003. For more info, click [here](https://en.wikipedia.org/wiki/Iraq_War).

In [None]:
wars.where('Region', 'Europe, Middle East, Asia, Americas')

### Intensity Level
Lastly, we'll look at the `IntensityLevel` column. 
>The intensity variable is coded in two categories:
>1. Minor: between 25 and 999 battle-related deaths in a given year.
>2. War: at least 1,000 battle-related deaths in a given year.

Here is the number of war occurrences at each intensity level, shown in the `count` column.

In [None]:
wars.group('IntensityLevel') # group by IntensityLevel

As you can see above, there are more "Minor" occurrences (2121) than "War" occurrences (620).
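Raw counts can be easier to compare as shares of the total. A small sketch using only the two counts quoted above (the variable names are illustrative):

```python
# Counts quoted in the text above.
counts = {"Minor": 2121, "War": 620}

total = sum(counts.values())  # 2741, the number of rows in the dataset

# Express each count as a percentage of all rows, rounded to one decimal.
shares = {level: round(100 * n / total, 1) for level, n in counts.items()}
print(total, shares)
```

Roughly three out of every four rows describe a year of Minor-intensity conflict.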

Let's break the Intensity Level down by region. First here are the counts of each `IntensityLevel` in each region.

In [None]:
wars.group(['Region', 'IntensityLevel']).sort('count', descending=True).show() # group by Region and IntensityLevel

We can see that each region appears twice in the `Region` column, once for each type of intensity level. For example, in the first row we can see that Asia had 860 wars with a "Minor" intensity level and in the fourth row, we can see that Asia had 281 wars with a "War" intensity level. 
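Grouping by two columns amounts to counting each distinct (Region, IntensityLevel) pair. The sketch below mimics that in plain Python with `collections.Counter`. The handful of rows is invented purely for illustration and does not come from the dataset.

```python
from collections import Counter

# Hypothetical (Region, IntensityLevel) pairs, NOT real rows from the dataset.
toy_rows = [
    ("Asia", "Minor"), ("Asia", "Minor"), ("Asia", "War"),
    ("Africa", "Minor"), ("Africa", "War"), ("Africa", "War"),
    ("Asia", "Minor"),
]

# Count every distinct pair, then list pairs from most to least common.
pair_counts = Counter(toy_rows)
for (region, level), count in pair_counts.most_common():
    print(region, level, count)
```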

Let's look specifically at the Middle East. Here are the counts of every type of conflict in each Middle East location.

In [None]:
# filter by region, then group by location and intensity level
m_east = wars.where('Region', 'Middle East').group(['Region', 'Location', 'IntensityLevel']).sort(
    'count', descending=True)
m_east

We can see that the greatest number of 'Minor' occurrences happened in Israel (121 in total), while the greatest number of 'War' occurrences happened in Iraq (21 in total). The two rows of the table above that contain this information are shown below.

In [None]:
m_east.take(0) # first row of the table above

In [None]:
m_east.take(4) # fifth row of the table above

Lastly, here are the Middle East rows with a 'War' intensity level, still sorted from most to fewest occurrences.

In [None]:
m_east.where('IntensityLevel', are.containing('War')) # filter to just show war rows

Again, we can see that the location with the greatest number of occurrences at the 'War' intensity level was Iraq (21 in total), followed by Yemen (North Yemen) and interstate war occurrences between Iran and Iraq.
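Both `sort(..., descending=True)` and `take` appear several times above, and `take` counts rows starting from zero. A plain-Python sketch of the same two steps, using five location counts copied from the earlier `Location` output (the variable names are illustrative):

```python
# Five (Location, count) pairs copied from the location counts shown earlier.
location_counts = [
    ("Israel", 123), ("India", 178), ("Philippines", 121),
    ("Myanmar (Burma)", 314), ("Ethiopia", 136),
]

# Sort by count, largest first, like sort('count', descending=True).
ranked = sorted(location_counts, key=lambda pair: pair[1], reverse=True)

first_row = ranked[0]  # position 0 is the first row, like take(0)
fifth_row = ranked[4]  # position 4 is the fifth row, like take(4)
print(first_row, fifth_row)
```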

# Civil Wars <a id='2'></a>
In this section, we will do further analysis on civil war occurrences. Here is the original dataset again.

In [None]:
wars.show(5)

Let's look specifically at civil wars. To do so, we will filter the dataset to select conflicts that are only internal (civil wars).

In [None]:
civil_wars = wars.where('TypeOfConflict', 'Internal armed conflict') # remove all non civil war rows
civil_wars.show(5)

Notice the change in the number of rows when we remove all the rows that do not correspond to civil wars.

In [None]:
print("The original dataset has {} rows. \n\
After removing all non civil war rows, the dataset has {} rows left.".format(wars.num_rows, civil_wars.num_rows))

First, let's find which years had the most civil wars. In case you're curious about the wars in a particular year, here is a [list of civil wars](https://en.wikipedia.org/wiki/List_of_civil_wars).

In [None]:
years = civil_wars.group('Year').sort('count', descending=True) # count number of occurrences of each year
years.set_format('Year', formats.FunctionFormatter(lambda x: x)) # fix format of year column
years

Let's make a bar graph of the `IntensityLevel` for a particular year. Replace the `...` in the first line of the code cell below with a year.

In [None]:
year = ... # replace the ... with a year, for example 1991

# filter data by chosen year and group by intensity level
one_year = civil_wars.where('Year', year).group('IntensityLevel').sort('IntensityLevel')
plt.bar(one_year.column('IntensityLevel'), one_year.column('count')) # create bar graph
plt.xticks((0, 1), ('Minor', 'War')) # set x axis labels to be Minor and War
plt.ylabel("Count") # set y axis label
plt.title('Number of Conflicts of Each Intensity Level in {}'.format(year)) # set title of graph
plt.show()

You should be able to see two bars in the plot above, one for the "Minor" and one for the "War" intensity levels, for the year you chose.

Next, we're going to examine the `Outcome` of the civil wars. Here are all the possible values in the `Outcome` column.

In [None]:
list(set(civil_wars.column('Outcome')))

The nan value is a little confusing. Let's replace it with something more informative. In the code cell below, replace the `...` with an informative value. Remember, in this dataset, nan means the conflict did not terminate in this year.

After running the cell, scroll the table all the way to the right to see the new column we created, `Termination Type`.

In [None]:
# Replace the ... with a new value. leave the quotes, only replacing the ...
# The new line should look something like this
# new_value = 'some text'
new_value = '...' 

new_column = np.where(civil_wars.column('Outcome') == 'nan', new_value, civil_wars.column('Outcome'))

with_termination = civil_wars.with_column('Termination Type', new_column)
with_termination.show(5)

Now we can see all of the Termination Types in the dataset, where the "nan" values are replaced by your informative value.

In [None]:
with_termination.group('Termination Type')

A summary of what the remaining Termination Types mean is below:

- **Actor ceases to exist** -- War activity continues with at least one party no longer existing or becoming a different type of party. For states, this means that the state became part of another country or a central government was no longer obvious. For rebel organizations, this means that the organization changed its name along with altering its alliances or territorial goals.
- **Ceasefire** -- Fighting stops, but not necessarily with any resolution.
- **Low activity (less than 25 battle-deaths)** -- War activity may continue, but fatality rates are below the fatality level defined by the authors of the dataset.
- **Peace agreement** -- An agreement signed by opposing sides to formally end the war.
- **Victory for Side A/Government Side** -- The government side of the war wins and the rebel side loses.
- **Victory for Side B/Rebel Side** -- The rebel side wins and the government side loses.
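One caveat about the `nan` replacement step earlier in this section: the comparison `== 'nan'` only matches if the missing outcomes are stored as the text `'nan'`. If they were stored as floating-point NaN values instead (an assumption worth checking, since it depends on how the file was read), that comparison would never match. Below is a minimal plain-Python sketch that handles both cases. The helper name `fill_missing` and the sample values are hypothetical, and `'some text'` stands in for whatever value you chose.

```python
import math

def fill_missing(values, new_value):
    """Replace the text 'nan' or a float NaN with new_value (illustrative helper)."""
    filled = []
    for value in values:
        is_float_nan = isinstance(value, float) and math.isnan(value)
        filled.append(new_value if value == "nan" or is_float_nan else value)
    return filled

# Hypothetical sample mixing both kinds of missing value.
sample = ["Ceasefire", "nan", float("nan"), "Peace agreement"]
print(fill_missing(sample, "some text"))
```

If the replacement cell appears to change nothing, checking the type of one missing entry is a quick way to tell which case applies.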

# Visualize the Data <a id='3'></a>

Here is a bar graph of the outcomes of all the civil wars in the dataset.

In [None]:
termination_count = with_termination.group('Termination Type') # count number of each termination type
termination_count.bar('Termination Type') # generate bar graph
plt.xticks(rotation=90) # make x axis labels vertical
plt.title("Number of Each Termination Type") # title graph
plt.show()

We can see that the most common category by far is conflicts that did not terminate, labeled with the description you chose in the previous section. The second most common termination type is Low activity.

Here is the same graph, but zoomed in so we can focus on the last 6 bars. Notice the change in the y axis.

In [None]:
termination_count.bar('Termination Type') # generate bar graph
plt.xticks(rotation=90) # make x axis labels vertical
plt.title("Number of Each Termination Type") # title graph
plt.ylim(0, 250) # show graph from y = 0 to 250
plt.show()

Now let's visualize how the number of civil wars varies over time. This line graph shows the number of civil wars in each year. You'll notice the spike in the number of wars during the 1990s.

In [None]:
civil_trend = civil_wars.group('Year') # count number of civil wars per year

civil_trend.plot('Year') # generate line graph
plt.title("Number of Civil Wars over Time") # title graph
plt.show()

We can see that the number of civil wars trended upward between 1946 and 1990, with the highest numbers occurring in the 1990s, particularly 1990 and 1991. The increase is partly attributed to the end of the Cold War in 1991, a period when many civil wars were occurring in places previously under colonial rule. At the same time, 1990 and 1991 were also years in which many civil wars ended, thus contributing to the spike in the 1990s.

Some links relating to trends in the graph are below:
- For a list of civil wars in the 1990s, click [here](https://en.wikipedia.org/wiki/List_of_civil_wars) and scroll to the "Modern (1800-1945)" and "1945-2000" sections.
- For more information on the influence of the Cold War, click [here](https://en.wikipedia.org/wiki/Civil_war) and scroll to the "Effect of the Cold War" section.
- For more information on the increasing trends in civil wars, click [here](http://documents.worldbank.org/curated/en/908361468779415791/310436360_20050007005532/additional/multi0page.pdf).

We hope you enjoyed this notebook and learned a few things! In case you wish to explore the dataset more on your own, here is the link to download the dataset: https://ucdp.uu.se/downloads/#d5.

# Data Science Opportunities at UC Berkeley <a id='4'></a>

If you are interested in data science, we offer several courses and even a major / minor here at UC Berkeley. Some great courses to start with are Data 8 and some 2-unit connector courses like LEGALST 88 and Demography 88. For the full list of courses we offer, click [here](https://data.berkeley.edu/academics/undergraduate-programs/data-science-offerings).

# Bibliography <a id='5'></a>
* Kreutz, Joakim, 2010. How and When Armed Conflicts End: Introducing the UCDP Conflict Termination Dataset. *Journal of Peace Research* 47(2): 243-250.