# [POLSCI 5] Civil Wars

Estimated Time: 30-40 Minutes <br>
Created by: Lauren Hom, Ravi Singhal

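In this notebook, we will explore the UCDP Conflict Termination Dataset, with a focus on civil wars: how often they occur, how intense they are, and how they end.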

### Table of Contents 
1 - [Jupyter Introduction](#1) <br>
2 - [The Dataset](#2)<br>
3 - [Civil Wars](#3)<br>
4 - [Visualize the Data](#4)<br>
5 - [Bibliography](#5)

# Jupyter Introduction <a id='1'></a>

This webpage is a Jupyter Notebook. We will use this notebook to analyze the UCDP Conflict Termination Dataset. Jupyter Notebooks are composed of both regular text and code cells. Code cells have a gray background. To run a code cell, click the cell and either press `Shift + Enter` or hit the `▶| Run` button in the toolbar at the top. An example of a code cell is below. Try running it. If everything works properly, "Success!" should be printed under the cell.

In [None]:
# This cell sets up the notebook. Just run this cell.
from datascience import *
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

print("Success!")

# The Dataset <a id='2'></a>

This dataset contains information about conflicts around the world from 1946 to 2013. Here is the description of the dataset by its author, Joakim Kreutz.

> "Armed conflict is defined by Uppsala Conflict Data Program (UCDP) as a contested incompatibility that concerns government and/or territory where the use of armed force between two parties, of which at least one is the government of a state, results in at least 25 battle-related deaths in a calendar-year."

> "A conflict episode, thus, is defined as a continuous period of active conflict years in the UCDP-PRIO armed conflict dataset. A conflict episode ends when an active year is followed by a year in which there are fewer than 25 battle-related deaths."

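To make the episode definition concrete, here is a minimal sketch that groups consecutive active years into episodes. The function name `conflict_episodes` and the death counts in `toy_deaths` are made up for illustration and do not come from the dataset. The sketch also assumes that a year missing from the input ends an episode.

```python
def conflict_episodes(deaths_by_year, threshold=25):
    """Group consecutive active years (>= threshold deaths) into (start, end) episodes."""
    episodes = []
    current = []  # years in the episode currently being built
    for year in sorted(deaths_by_year):
        if deaths_by_year[year] >= threshold:
            # A gap in the years also closes the running episode (assumption).
            if current and year != current[-1] + 1:
                episodes.append((current[0], current[-1]))
                current = []
            current.append(year)
        elif current:
            # An inactive year (fewer than `threshold` deaths) ends the episode.
            episodes.append((current[0], current[-1]))
            current = []
    if current:
        episodes.append((current[0], current[-1]))
    return episodes

# Hypothetical battle-related deaths per year for one conflict (not real data).
toy_deaths = {1990: 40, 1991: 300, 1992: 10, 1993: 0, 1994: 1200, 1995: 30}
print(conflict_episodes(toy_deaths))  # [(1990, 1991), (1994, 1995)]
```
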

<br>If you want to read more about the dataset, click [here](https://www.pcr.uu.se/research/ucdp/).

<br>Here are the first 5 rows of the dataset. You can scroll horizontally when hovering over the table to see the whole dataset. There are 2741 rows and 33 columns.

In [None]:
raw = Table.read_table('ucdp-term-dyadic-2015.csv') # read in the dataset
raw.show(5)

As you can see, there are quite a few columns and many missing values (nan) in the dataset. We have cleaned the dataset by removing some columns and making the values more readable. The cleaned dataset has 18 columns.

In [None]:
wars = Table.read_table('cleanedWars.csv') # read in the dataset
wars.set_format('Year', formats.FunctionFormatter(lambda x: x)) # fix format of year column
wars.show(5)

## Rows
Let's dive in! <br><br>

First, let's examine what a *row* is. Here is the first row of the dataset.

In [None]:
wars.take(0) # take the first row

This row gives us information about a specific conflict happening in Algeria in the year 1992. One thing to note is that the `IntensityLevel` of the conflict during this year is Minor.

In [None]:
wars.take(2) # take the third row

In comparison, the third row of the dataset shows that by 1994 the same conflict had escalated to an `IntensityLevel` of War. Each row of this dataset describes one year of one conflict.

## Columns
### Location
To understand what a *column* is, let's look at a few examples. First, we shall examine the `Location` column. According to the author of the dataset, 
> "Location is defined as the government side of a conflict, and should not be interpreted as the geographical location of the conflict."

To get an idea of what kinds of values are in the `Location` column, here are the 10 most common locations and the number of times each occurs.

In [None]:
locations = wars.group('Location').sort('count', descending=True) # group by location and sort in descending order
locations

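If the group-then-sort step feels opaque, the same idea can be sketched with Python's built-in `collections.Counter`. The location names in `toy_locations` are placeholders, not values from the dataset.

```python
from collections import Counter

# Hypothetical Location values, one per conflict-year row (not real data).
toy_locations = ['Alpha', 'Beta', 'Alpha', 'Gamma', 'Alpha', 'Beta']

# Count the rows for each location, then list them from most to least common.
location_counts = Counter(toy_locations).most_common()
print(location_counts)  # [('Alpha', 3), ('Beta', 2), ('Gamma', 1)]
```
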
The author also tells us,
> The string is split in two ways, hyphen (‘-‘) splits the different sides in an interstate war, and comma (‘,’) splits different countries fighting together on the same side.

Let's look at these two cases.

In [None]:
# Locations with the two sides of an interstate war
locations.where('Location', are.containing('-'))

In [None]:
# Locations with countries that are fighting on the same side
locations.where('Location', are.containing(','))

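The splitting rule quoted above can be sketched as a small helper. The name `parse_location` and the country names are hypothetical. The sketch also assumes the sides are separated by a hyphen with a space on each side, so that hyphenated country names stay intact; check a few real values before relying on that.

```python
def parse_location(location):
    """Split a Location string into sides, and each side into its countries."""
    sides = location.split(' - ')  # assumption: ' - ' separates opposing sides
    return [[country.strip() for country in side.split(',')] for side in sides]

# Two allies on one side of an interstate war, one country on the other.
print(parse_location('Alpha, Beta - Gamma'))  # [['Alpha', 'Beta'], ['Gamma']]

# A single government side, as in an internal conflict.
print(parse_location('Delta'))  # [['Delta']]
```
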
### Region
Now let's look at the `Region` column. Again, first we'll look at the number of occurrences of each region.

In [None]:
regions = wars.group('Region').sort('count', descending=True) # count occurrences of each region
regions

In case you were curious, here is the row whose region is listed as Europe, Middle East, Asia, Americas. This marks the beginning of the Iraq War in 2003. For more info, click [here](https://en.wikipedia.org/wiki/Iraq_War).

In [None]:
wars.where('Region', 'Europe, Middle East, Asia, Americas')

### Intensity Level
Lastly, we'll look at the `IntensityLevel` column. 
>The intensity variable is coded in two categories:
>1. Minor: between 25 and 999 battle-related deaths in a given year.
>2. War: at least 1,000 battle-related deaths in a given year.

Here are the counts of each type.

In [None]:
wars.group('IntensityLevel') # group by IntensityLevel

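The coding rule above amounts to two thresholds. Here is a minimal sketch of it; the function name `intensity_level` is made up for illustration, and the dataset itself already stores the label rather than the death count.

```python
def intensity_level(battle_deaths):
    """Map a yearly battle-related death count to the intensity label."""
    if battle_deaths >= 1000:
        return 'War'
    if battle_deaths >= 25:
        return 'Minor'
    return None  # fewer than 25 deaths: not an active conflict year

print(intensity_level(30))    # Minor
print(intensity_level(999))   # Minor
print(intensity_level(1000))  # War
```
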
Let's break it down by region. First, here are the counts of each `IntensityLevel` in each region.

In [None]:
wars.group(['Region', 'IntensityLevel']).sort('count', descending=True).show() # group by Region and IntensityLevel

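Grouping by two columns counts each distinct pair of values. As a rough plain-Python sketch of the same idea, using made-up rows in `toy_rows` rather than the real data:

```python
from collections import Counter

# Hypothetical conflict-year rows (not real data).
toy_rows = [
    {'Region': 'Asia', 'IntensityLevel': 'Minor'},
    {'Region': 'Asia', 'IntensityLevel': 'War'},
    {'Region': 'Asia', 'IntensityLevel': 'Minor'},
    {'Region': 'Africa', 'IntensityLevel': 'Minor'},
]

# Count each (Region, IntensityLevel) pair, most common first.
pair_counts = Counter((row['Region'], row['IntensityLevel']) for row in toy_rows)
for (region, level), count in pair_counts.most_common():
    print(region, level, count)
```
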
Let's look specifically at the Middle East. Here are the counts of each type of conflict in each Middle East location.

In [None]:
# filter by region, then group by location and intensity level
m_east = wars.where('Region', 'Middle East').group(['Location', 'IntensityLevel']).sort('count', descending=True)
m_east

Lastly, here are the Middle East locations with the most wars.

In [None]:
m_east.where('IntensityLevel', are.containing('War')) # filter to just show war rows

# Civil Wars <a id='3'></a>
Here is the original dataset again.

In [None]:
wars.show(5)

Let's look specifically at civil wars.

In [None]:
civil_wars = wars.where('TypeOfConflict', 'Internal armed conflict') # keep only the civil war rows
civil_wars.show(5)

Notice the change in the number of rows when we remove all the rows that do not correspond to civil wars.

In [None]:
print("The original dataset has {} rows. \n\
After removing all non civil war rows, the dataset has {} rows left.".format(wars.num_rows, civil_wars.num_rows))

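The filter above keeps a row only when its `TypeOfConflict` matches the given string exactly. A minimal sketch of that behavior on made-up rows follows; the label `'Interstate armed conflict'` and the locations are assumptions for illustration, so check the real column for the labels it actually uses.

```python
# Hypothetical rows (not real data); only the first label is taken from the notebook.
toy_rows = [
    {'Location': 'Alpha', 'TypeOfConflict': 'Internal armed conflict'},
    {'Location': 'Beta - Gamma', 'TypeOfConflict': 'Interstate armed conflict'},
    {'Location': 'Delta', 'TypeOfConflict': 'Internal armed conflict'},
]

# Keep only rows whose type matches exactly; every other label is dropped.
toy_civil = [row for row in toy_rows
             if row['TypeOfConflict'] == 'Internal armed conflict']

print(len(toy_rows), len(toy_civil))  # 3 2
```
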
First, let's find which years had the most civil wars. In case you're curious about the wars in a particular year, here is a [list of civil wars](https://en.wikipedia.org/wiki/List_of_civil_wars).

In [None]:
years = civil_wars.group('Year').sort('count', descending=True) # count number of occurrences of each year
years.set_format('Year', formats.FunctionFormatter(lambda x: x)) # fix format of year column
years

Let's make a bar graph of the `IntensityLevel` for a particular year. Replace the "..." in the first line of the code cell below with a year.

In [None]:
year = ... # replace the ... with a year, for example 1991

# filter data by chosen year and group by intensity level
one_year = civil_wars.where('Year', year).group('IntensityLevel').sort('IntensityLevel')
plt.bar(one_year.column('IntensityLevel'), one_year.column('count')) # create bar graph
plt.xticks((0, 1), ('Minor', 'War')) # set x axis labels to be Minor and War
plt.ylabel("Count") # set y axis label
plt.title('Number of Conflicts of Each Intensity Level in {}'.format(year)) # set title of graph
plt.show()

Next, we're going to examine the `Outcome` of the civil wars. Here are all the possible values in the `Outcome` column.

In [None]:
list(set(civil_wars.column('Outcome')))

The nan value is a little confusing. Let's replace it with something more informative. In the code cell below, replace the ... with an informative value. Remember, in this dataset, nan means the conflict did not terminate in this year.

After running the cell, scroll the table all the way to the right to see the new column we created, `Termination Type`.

In [None]:
# Replace the ... with a new value. Leave the quotes and replace only the ...
# The new line should look something like this
# new_value = 'some text'
new_value = '...' 

new_column = np.where(civil_wars.column('Outcome') == 'nan', new_value, civil_wars.column('Outcome'))

with_termination = civil_wars.with_column('Termination Type', new_column)
with_termination.show(5)

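One thing to watch for: the comparison with `'nan'` only matches if the missing values are stored as the text `'nan'`. If they are stored as floating-point NaN instead, an equality test will not find them. The sketch below handles both cases on a made-up list; the helper `is_missing`, the label `'PLACEHOLDER'`, and the outcome values are all hypothetical.

```python
import math

def is_missing(value):
    """True for the string 'nan' or a floating-point NaN."""
    if isinstance(value, float):
        return math.isnan(value)
    return value == 'nan'

# Hypothetical Outcome values mixing both kinds of missing marker (not real data).
toy_outcomes = ['Peace agreement', float('nan'), 'nan', 'Ceasefire']

placeholder = 'PLACEHOLDER'  # stand-in for whatever label you choose
toy_termination = [placeholder if is_missing(v) else v for v in toy_outcomes]
print(toy_termination)
```

If the `Termination Type` column still shows nan after you run the cell above, this difference is a likely cause.
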
# Visualize the Data <a id='4'></a>

Here is a bar graph of the outcomes of all the civil wars in the dataset.

In [None]:
termination_count = with_termination.group('Termination Type') # count number of each termination type
termination_count.bar('Termination Type') # generate bar graph
plt.xticks(rotation=90) # make x axis labels vertical
plt.title("Number of Each Termination Type") # title graph
plt.show()

Here is the same graph, but zoomed in so we can focus on the last 6 bars. Notice the change in the y axis.

In [None]:
termination_count.bar('Termination Type') # generate bar graph
plt.xticks(rotation=90) # make x axis labels vertical
plt.title("Number of Each Termination Type") # title graph
plt.ylim(0, 250) # show graph from y = 0 to 250
plt.show()

Now let's visualize how the number of civil wars varies over time. This line graph shows the number of civil wars in each year. You'll notice the spike in the number of wars during the 1990s.

In [None]:
civil_trend = civil_wars.group('Year') # count number of civil wars per year

civil_trend.plot('Year') # generate line graph
plt.title("Number of Civil Wars over Time") # title graph
plt.show()

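Grouping by year only produces a count for years that appear in the data, so a year with no rows would be absent rather than zero. As a rough sketch of how to fill such gaps before plotting, using made-up years in `toy_years`:

```python
from collections import Counter

# Hypothetical Year values, one per conflict-year row (not real data).
toy_years = [1990, 1990, 1991, 1993, 1993, 1993]

counts = Counter(toy_years)  # a Counter returns 0 for years it never saw

# Build a chronological series that includes years with no rows.
series = [(year, counts[year])
          for year in range(min(toy_years), max(toy_years) + 1)]
print(series)  # [(1990, 2), (1991, 1), (1992, 0), (1993, 3)]
```
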
We hope you enjoyed this notebook and learned a few things! In case you wish to explore the dataset more on your own, here is the link to download the dataset: https://ucdp.uu.se/downloads/#d5.

# Bibliography <a id='5'></a>
* Kreutz, Joakim, 2010. How and When Armed Conflicts End: Introducing the UCDP Conflict Termination Dataset. *Journal of Peace Research* 47(2): 243-250.