<img src="/static/base/images/logo.png?v=641991992878ee24c6f3826e81054a0f" alt="Jupyter Notebook">
<h1 style="text-align: center">Notebook 4 - Analysis of ACLED data set</h1>

<h3>Prerequisites</h3>

- You must have Python 3 installed on your system (<a href="https://www.python.org/downloads/">Download</a>)
- You must have Jupyter installed on your system (<a href="https://jupyter.org/install">Download</a>)
- Some knowledge of Python may be required

<h3>Explanation of Notebook 4</h3>

In this notebook, you will carry out a number of analyses on an ACLED (Armed Conflict Location and Event Data Project) data set (<a href="https://www.acleddata.com/curated-data-files/">Source</a>).<br>
This analysis could be done in Python, R, or both; here we will use Python.

You have data for four regions: Africa, Asia, Europe, and the Middle East.<br>
All of them are up to date as of 6 July 2019.

The data was initially in xlsx (Excel) format but was converted to csv format.

The analysis should produce charts showing the following:
+ Conflicts per country over time
+ Typical lengths of conflict in a country or across the continent
+ Correlations with weather data (mean temperature and rainfall)

You can analyse all four regions, starting with Africa; once one of them has been analysed, the rest follow in much the same way.

<h3>Getting started</h3>

To get started with this notebook, you will need to first install the matplotlib package.<br>
You can do this using pip in Command Prompt (Windows) or the Terminal (macOS/Linux):<br>
+ <code>pip install matplotlib</code>

Otherwise, run the cell below.

In [None]:
%pip install matplotlib

First, make sure that the notebook's kernel is set to Python, as the code cells will only run under a Python kernel. Once it is, try running the following cell.

In [None]:
import csv #Import the csv package

#Open the csv file (the csv module recommends newline='') and assign it to a variable ('csv_file')
with open('data/Africa_1997-2019_Jul06.csv', 'r', encoding="ISO-8859-1", newline='') as csv_file:
    csv_reader = csv.reader(csv_file) #Read the file using the .reader function of the csv package

    next(csv_reader) #Skip the header row to reduce issues with parsing
    csvList = list(csv_reader) #Convert the remaining rows to a list

This imports the African ACLED data and converts it into a list that we can analyse. You will notice that the encoding is set to "ISO-8859-1"; this is because other encodings produce errors when reading this file.

You can access each row by its index, and within that row, each attribute/column by its own index; an example is shown below.

In [None]:
csvList[0][3]

This should display the event date of the first event (1 January 1997), because it is the value of the fourth column (index 3) in the first row.

<h3>Conflicts over years</h3>

The first task is to view the conflicts per country in the continent over time. If you open the CSV file, you will see that the countries are in the 17th column (<code>COUNTRY</code>), which is index 16. Since we are looking at conflicts over time, we also need the year column (<code>YEAR</code>), the 5th column, which is index 4. We do not need the type of each conflict here; we only count the number of conflicts.

+ <code>COUNTRY</code> - Index 16
+ <code>YEAR</code> - Index 4
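Counting columns by hand is error-prone. As a small sketch that is not part of the original notebook, the indexes can instead be looked up by name from the header row. The sample below is a hypothetical, shortened header, so its indexes differ from the real file's; `columnIndexes` is a helper name introduced here for illustration.

```python
import csv
import io


def columnIndexes(header, names):
    """Return {column name: index} for the requested names, looked up in the header row."""
    return {name: header.index(name) for name in names}


# Hypothetical, shortened sample: the real ACLED file has many more columns,
# so the indexes found here differ from the real ones (COUNTRY is index 16 there)
sample = io.StringIO(
    "EVENT_DATE,YEAR,EVENT_TYPE,COUNTRY\n"
    "01-Jan-97,1997,Riots,Zambia\n"
)

reader = csv.reader(sample)
header = next(reader)  # The first row holds the column names
cols = columnIndexes(header, ["COUNTRY", "YEAR"])

row = next(reader)
print(cols)                                     # {'COUNTRY': 3, 'YEAR': 1}
print(row[cols["COUNTRY"]], row[cols["YEAR"]])  # Zambia 1997
```

In the notebook, you could keep the header with `header = next(csv_reader)` before converting the rest of the file to a list, then write `row[cols["COUNTRY"]]` instead of `row[16]`.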

The best way to visualise this is a line graph for each country showing its number of conflicts over the years.
To do this, we must do two things: gather all the countries, and count the number of conflicts in each year for each of them.
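These two steps can also be sketched with `collections.Counter`, which counts every (country, year) pair in a single pass and does not depend on the rows being grouped by country. The sample rows are hypothetical; in the notebook they would come from `[(row[16], row[4]) for row in csvList]`.

```python
from collections import Counter

# Hypothetical sample rows: only the two columns we need, (country, year)
rows = [
    ("Zambia", "1997"), ("Zambia", "1997"), ("Zambia", "1998"),
    ("Benin", "1998"), ("Benin", "1998"),
]

# Count how many events fall on each (country, year) pair
pairCounts = Counter(rows)

# Regroup into {country: {year: count}}, with years in chronological order
byCountry = {}
for (country, year), count in sorted(pairCounts.items()):
    byCountry.setdefault(country, {})[year] = count

print(byCountry)  # {'Benin': {'1998': 2}, 'Zambia': {'1997': 2, '1998': 1}}
```

Sorting the pairs first means each country's years are inserted in order, so the resulting dictionaries can be plotted directly as a line graph.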

This logic is easy to get lost in, so the complete function is provided in the cell below.

In [None]:
import matplotlib.pyplot as plt

def conflictYears():
    everything = {} #Maps each country to a dictionary of {year: number of conflicts}

    for row in csvList: #Go through every event (row) in the data
        country = row[16] #Store the country
        year = row[4] #Store the year

        yearCounts = everything.setdefault(country, {}) #Get this country's year counts (created empty the first time)
        yearCounts[year] = yearCounts.get(year, 0) + 1 #Count this event towards its year

    #Sort each country's years so that the line graph runs in chronological order
    for country in everything:
        everything[country] = dict(sorted(everything[country].items()))

    return everything

The function above maps each country to a dictionary of years, each holding the number of conflicts recorded in that year.<br>
Now we can finally plot this as a graph.

In [None]:
plt.figure(figsize=(20,5)) #Change the size of the chart to fit everything

#Change this to any country in Africa to see the results for each year
country = "Zambia"

#Gets the conflicts over the years
yearData = conflictYears()

#Convert both to lists so that plot() receives ordinary sequences
plt.plot(list(yearData[country].keys()), list(yearData[country].values()))

#Set the labels
plt.xlabel('Year')
plt.ylabel('Conflict count')
plt.title('Conflicts per year in ' + country)

#Show the plot
plt.show()

Running the above cell draws a line chart of the number of conflicts per year in Zambia. Note that a country's line only includes years in which at least one event was recorded, so the latest year may be missing for some countries due to lack of data.

Try repeating this on the Europe data set ("data/Europe_2018-2019_Jul06.csv") in the following cell.<br>
Please name the function "conflictYears2" so that it does not clash with the Africa version.

<b>Double click for the solution</b>

<!--
import csv
import matplotlib.pyplot as plt

#Load the Europe data into its own list so that the Africa data (csvList) is not overwritten
with open('data/Europe_2018-2019_Jul06.csv', 'r', encoding="ISO-8859-1", newline='') as csv_file:
    csv_reader = csv.reader(csv_file) #Read the file
    next(csv_reader) #Skip the header row
    europeList = list(csv_reader) #Convert the remaining rows to a list

def conflictYears2():
    everything = {} #Maps each country to a dictionary of {year: number of conflicts}

    for row in europeList: #Go through every event (row) in the Europe data
        country = row[16] #Store the country (assumes the same column layout as the Africa file)
        year = row[4] #Store the year

        yearCounts = everything.setdefault(country, {}) #Get this country's year counts
        yearCounts[year] = yearCounts.get(year, 0) + 1 #Count this event towards its year

    #Sort each country's years so that the line graph runs in chronological order
    for country in everything:
        everything[country] = dict(sorted(everything[country].items()))

    return everything

plt.figure(figsize=(20,5)) #Change the size of the chart to fit everything

#Change this to any country in Europe to see the results for each year
country = "Ukraine"

#Gets the conflicts over the years
yearData2 = conflictYears2()

#Convert both to lists so that plot() receives ordinary sequences
plt.plot(list(yearData2[country].keys()), list(yearData2[country].values()))

#Set the labels
plt.xlabel('Year')
plt.ylabel('Conflict count')
plt.title('Conflicts per year in ' + country)

#Show the plot
plt.show()
-->

You will notice that the Europe data set only covers 2018 and 2019, so the chart spans a much shorter date range.

<h3>Average conflict lengths per country</h3>

Next, we find the typical length of conflicts in each country (or across the continent as a whole) by calculating the average conflict length.

This time, however, the approach is slightly different: there is no column for the duration of a conflict, so we estimate it as the number of days between each conflict and the next conflict in the same country. If more than one conflict happens on the same day, the difference counts as one day.
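A compact sketch of this calculation using only the standard library. It groups the dates by country and sorts them before taking the differences, so it does not rely on the file's row order. `averageGaps` is a hypothetical name, and the dates are assumed to use the `%d-%b-%y` layout (for example `01-Jan-97`) that `calculateDates` relies on.

```python
from datetime import datetime


def averageGaps(events):
    """events: iterable of (country, date string) pairs, dates like '01-Jan-97'.
    Returns {country: average number of days between consecutive events}."""
    datesByCountry = {}
    for country, date in events:
        datesByCountry.setdefault(country, []).append(datetime.strptime(date, "%d-%b-%y"))

    averages = {}
    for country, dates in datesByCountry.items():
        dates.sort()  # Put each country's events in chronological order
        # Days between each event and the next; events on the same day count as 1 day
        gaps = [max((later - earlier).days, 1) for earlier, later in zip(dates, dates[1:])]
        if gaps:  # A country with a single event has no gap to average
            averages[country] = sum(gaps) / len(gaps)
    return averages


# Hypothetical sample events
sample = [
    ("Benin", "01-Jan-97"), ("Benin", "01-Jan-97"), ("Benin", "11-Jan-97"),
    ("Zambia", "05-Mar-98"), ("Zambia", "01-Mar-98"),
]
print(averageGaps(sample))  # {'Benin': 5.5, 'Zambia': 4.0}
```

For Benin, the two same-day events give a gap of 1 day and the next gap is 10 days, so the average is 5.5.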

For this, we need the date of each row (<code>EVENT_DATE</code>), which is the 4th column (index 3), and the country column (<code>COUNTRY</code>), the 17th column (index 16), to group the rows and calculate each country's average.

+ <code>EVENT_DATE</code> - Index 3
+ <code>COUNTRY</code> - Index 16

Check the cell below.

In [None]:
from datetime import datetime

def calculateDates(date, nextDate):
    d1 = datetime.strptime(date, "%d-%b-%y") #Parses the date text (e.g. "01-Jan-97") into a datetime
    d2 = datetime.strptime(nextDate, "%d-%b-%y") #Parses the next date

    difference = abs((d2 - d1).days) #Gets the number of whole days between the two

    if (difference == 0): #If both events happened on the same day
        return 1 #Count it as 1 day

    return difference #Return the difference

This creates a function that accepts the current row's date and the next row's date so that it can calculate the difference between the two.<br>It returns 1 if the difference is 0, because the minimum length of a conflict is taken to be 1 day.

We will use this in the next function.

In [None]:
def conflictLengths():
    gaps = {} #Maps each country to the day differences between its consecutive events

    #Pair every row with the row after it (zip stops at the last pair, so there is no IndexError)
    for row, nextRow in zip(csvList, csvList[1:]):
        country = row[16] #Country of the current row
        nextCountry = nextRow[16] #Country of the next row

        if (country == nextCountry): #Only compare dates within the same country
            difference = calculateDates(row[3], nextRow[3]) #Gets the difference in days
            gaps.setdefault(country, []).append(difference) #Store it under the country

    #Average the differences for each country (a country with a single event has no gaps, so it is left out)
    dictionary = {country: sum(days) / len(days) for country, days in gaps.items()}

    return dictionary

This function is similar to the one that counts conflicts per year, but it only works with two columns, <code>EVENT_DATE</code> and <code>COUNTRY</code>, so it is shorter. It calculates the difference in days between each row and the next row of the same country, then takes the mean of those differences for each country.

The resulting dictionary maps each country (the key) to its average conflict length in days (the value).<br>
You can get the result in the following cell.

In [None]:
#Gets the average conflict length for each country
lengthData = conflictLengths()

print(lengthData)

All of the average lengths are listed above in a dictionary, each under its respective country. However, the mean values have many decimal places, so we should round each of them to the nearest whole number. How would you do this?<br>
__You must complete this step before moving to the next__

<b>Double click for the solution</b>

<!--for country in lengthData:
    lengthData[country] = round(lengthData[country])
-->

Now we can use the "lengthData" variable to generate charts of each country's average. There are multiple ways of doing this, but be careful: there are 49 countries, so displaying all of them is hard in some types of chart.

A good way of displaying this data would be in a bar chart.
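One optional tweak, not used in the cells below: sorting the countries by their average before plotting makes the tallest and shortest bars easy to spot. `sortedByValue` is a hypothetical helper, and the values are made up.

```python
def sortedByValue(data, descending=False):
    """Return (keys, values) lists ordered by value, ready to pass to ax.bar()."""
    pairs = sorted(data.items(), key=lambda item: item[1], reverse=descending)
    keys = [key for key, _ in pairs]
    values = [value for _, value in pairs]
    return keys, values


# Hypothetical averages in days, not real results
example = {"CountryA": 9, "CountryB": 2, "CountryC": 40}
countries, averages = sortedByValue(example)
print(countries)  # ['CountryB', 'CountryA', 'CountryC']
print(averages)   # [2, 9, 40]
```

With real data this could be used as `ax.bar(*sortedByValue(lengthData))`; note that the bar positions then follow the sorted order rather than the dictionary's order.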

In [None]:
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker #Needed to set the interval

#Bar chart using countries and their average conflict days
fig, ax = plt.subplots(1,1, figsize=(20,10)) #Set the figure size so that the graph isn't too small
ax.bar(lengthData.keys(),lengthData.values())

#Set the intervals for the y axis data 
ax.yaxis.set_major_locator(ticker.MultipleLocator(5)) #Interval of 5

#Labels
plt.xlabel("Countries")
plt.ylabel("Average lengths of conflicts (Days)")

#Rotate 90 degrees
plt.xticks(rotation=90)

#Add the grid
ax.grid()

#Show the plot
plt.show()

Running the cell above displays a bar chart. The x axis's tick labels (the countries) are rotated 90 degrees anti-clockwise using the "rotation" parameter of matplotlib's "xticks" function; there are so many countries that, without the rotation, the labels would overlap. We also imported matplotlib's "ticker" module, whose "MultipleLocator" sets the y axis tick interval to 5 days.

A grid is also included because there are many countries, and without it, it would be hard to read off each country's average.

So far we have only displayed the data without drawing anything out of the graphs, so this time we will highlight something in the visualisation; check the cell below.

In [None]:
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker #Needed to set the interval
import matplotlib.patches as ptch #Needed to get the rectangle properties

#Bar chart using countries and their average conflict days
fig, ax = plt.subplots(1,1, figsize=(20,10)) #Set the figure size so that the graph isn't too small
barChart = ax.bar(lengthData.keys(),lengthData.values())

lowest = 0 #Initial index
lowH = barChart[lowest].get_height() #Gets the height of the initial bar

for i, bar in enumerate(barChart): #Goes through each bar
    currH = bar.get_height() #Gets the current bar's height
    if (currH < lowH): #If it's lower than the lowest so far
        lowest = i #Set the new lowest index
        lowH = currH #Set the new lowest height

#Set the colour of the lowest bar to light green
barChart[lowest].set_color('#2eff00')

#Set the intervals for the y axis data 
ax.yaxis.set_major_locator(ticker.MultipleLocator(5)) #Interval of 5

#Labels
plt.xlabel("Countries")
plt.ylabel("Average lengths of conflicts (Days)")

#Rotate 90 degrees
plt.xticks(rotation=90)

#Add the grid
ax.grid()

#Show the plot
plt.show()

Running the cell above finds the country with the lowest average conflict length, using the height of each bar, and colours it light green; with the data used when this notebook was written, this is Tunisia.<br>
Based on this data, Tunisia has the shortest conflicts. This does not mean that it is safe: because the lengths are measured as the gap between one conflict and the next, a short average also means that Tunisia has the most frequent conflicts.
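As a cross-check on the bar-height loop, the same country can be found directly from the dictionary with Python's built-in `min()` and a key function. This is a sketch with hypothetical values; in the notebook you would pass `lengthData` instead.

```python
# Hypothetical averages in days; in the notebook you would use lengthData instead
averages = {"CountryA": 12, "CountryB": 3, "CountryC": 27}

# Key with the smallest value (on a tie, the first one in the dictionary wins, like the loop)
lowestCountry = min(averages, key=averages.get)
print(lowestCountry, averages[lowestCountry])  # CountryB 3

# Position of that country among the keys, i.e. the index of its bar when the
# keys and values are passed to ax.bar() in dictionary order
lowestIndex = list(averages).index(lowestCountry)
print(lowestIndex)  # 1
```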

Try the same in the cell below, but colour the country with the highest average red; it should colour Botswana.

<b>Double click for the solution</b>

<!--
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker #Needed to set the interval
import matplotlib.patches as ptch #Needed to get the rectangle properties

#Bar chart using countries and their average conflict days
fig, ax = plt.subplots(1,1, figsize=(20,10)) #Set the figure size so that the graph isn't too small
barChart = ax.bar(lengthData.keys(),lengthData.values())

highest = 0 #Initial index
highH = barChart[highest].get_height() #Gets the height of the initial bar

for i, bar in enumerate(barChart): #Goes through each bar
    currH = bar.get_height() #Gets the current bar's height
    if (currH > highH): #If it's higher than the highest so far
        highest = i #Set the new highest index
        highH = currH #Set the new highest height

#Set the colour of the highest bar to red
barChart[highest].set_color('red')

#Set the intervals for the y axis data 
ax.yaxis.set_major_locator(ticker.MultipleLocator(5)) #Interval of 5

#Labels
plt.xlabel("Countries")
plt.ylabel("Average lengths of conflicts (Days)")

#Rotate 90 degrees
plt.xticks(rotation=90)

#Add the grid
ax.grid()

#Show the plot
plt.show()
-->

<h3>Correlations in weather data</h3>

Now that we have visualised both the conflicts over time and the average conflict lengths for each country in Africa, we have some experience in developing algorithms and displaying data. The next step is to look for correlations between the weather and the type of conflict that usually occurs.

The problem is that, just as with the conflict lengths, there is no column for the weather in the data at all, so we must use a different source of data for the weather.

Since we concluded that Tunisia has the most frequent conflicts, we want to test the climate data on a country with less frequent conflicts, but not so few that we would have too little data. For this, we will use Benin: there is a large enough gap between the start of each conflict that we can correlate them with the climate data.

Note that we correlate the weather with the start dates of the conflicts rather than the days during them, because our previous data counted every day as a day of conflict when measuring lengths. This means we need a new algorithm that detects the start dates of conflicts and counts each date only once, even if several conflicts happen on the same day; in other words, we must ignore duplicates.
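The duplicate rule can be sketched as a small filter that keeps only the first row for each date. This version remembers every date it has seen, so, unlike the functions later in this notebook, which compare each row only with the row before it, it also drops repeats that are not next to each other. `firstEventPerDate` is a hypothetical name and the sample rows are made up.

```python
def firstEventPerDate(conflicts, dateIndex=3):
    """Keep only the first row for each date, dropping later rows on the same day."""
    seenDates = set()
    firstRows = []
    for row in conflicts:
        date = row[dateIndex]
        if date not in seenDates:  # First event seen on this date
            seenDates.add(date)
            firstRows.append(row)
    return firstRows


# Hypothetical rows; only index 3 (date), 6 (type) and 7 (sub type) are filled in
rows = [
    ["", "", "", "20190101", "", "", "Riots", "Mob violence"],
    ["", "", "", "20190101", "", "", "Protests", "Peaceful protest"],
    ["", "", "", "20190105", "", "", "Riots", "Mob violence"],
]
print([row[7] for row in firstEventPerDate(rows)])  # ['Mob violence', 'Mob violence']
```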

To save effort, the climate data is already included in the project folder as "BeninClimate.csv". To correlate the climate data with the conflict data, the dates in both must match, but the ACLED dates are in a different format and there are too many rows to reformat them by hand. A new CSV file containing only Benin's conflict data, with its dates in the climate data's format, has therefore been created and saved as "BeninConflicts.csv".

The format for dates in both files is <code>yyyymmdd</code> (e.g. <code>20190526</code> for 26 May 2019).
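Converting ACLED dates into this format might be done along these lines. This is a sketch under the assumption that the ACLED dates use the `%d-%b-%y` layout (for example `26-May-19`) that `calculateDates` relies on; `toYearMoDa` is a hypothetical helper name.

```python
from datetime import datetime


def toYearMoDa(acledDate):
    """Convert an ACLED-style date such as '26-May-19' to the climate format '20190526'.

    Assumes the '%d-%b-%y' layout used by calculateDates() for the ACLED file."""
    return datetime.strptime(acledDate, "%d-%b-%y").strftime("%Y%m%d")


print(toYearMoDa("26-May-19"))  # 20190526
print(toYearMoDa("01-Jan-97"))  # 19970101
```

`%b` matches month abbreviations for the current locale, which is English under Python's default C locale.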

The climate data contains the date, mean temperature, and precipitation, each in their respective columns:

<code>YEARMODA</code> - Index 0<br>
<code>TEMP</code> - Index 1<br>
<code>PRCP</code> - Index 2

The conflict data that we need is the event date, event type and sub event type:

<code>EVENT_DATE</code> - Index 3<br>
<code>EVENT_TYPE</code> - Index 6<br>
<code>SUB_EVENT_TYPE</code> - Index 7

Now that we know which columns we will work with, we will first import Benin's conflict data and convert it to a list.

In [None]:
import csv #Import the csv package

with open('data/BeninConflicts.csv', 'r', encoding="utf-8", newline='') as bConflictCsv: #Open the csv file and assign it to a variable ('bConflictCsv')
    bConflictReader = csv.reader(bConflictCsv) #Read the file using the .reader function of the csv package

    next(bConflictReader) #Skip the header row to reduce issues with parsing
    bConflictList = list(bConflictReader) #Convert the remaining rows to a list

Now that it's imported, if you run the cell below, it will give you the first row of the csv file.

In [None]:
bConflictList[0]

We will then import the climate data and also convert it to a list as shown below.

In [None]:
with open('data/BeninClimate.csv', 'r', newline='') as bClimateCsv: #Open the csv file and assign it to a variable ('bClimateCsv')
    bClimateReader = csv.reader(bClimateCsv) #Read the file using the .reader function of the csv package

    next(bClimateReader) #Skip the header row to reduce issues with parsing
    bClimateList = list(bClimateReader) #Convert the remaining rows to a list

If you run the cell below, it'll show you the first row.

In [None]:
bClimateList[0]

Unlike the conflict data, the climate data only has 3 columns; the only thing the two files have in common is the date.

Now we can build our algorithm. Since we only have the temperature and the precipitation to correlate with the conflicts, there are several things we could look at; we will start by showing the average temperature for each conflict sub type.

For this, we will use two csv files at the same time.

In [None]:
def getTemperature(conflict, climateList):
    
    #Go through each climate record
    for climate in climateList:
        if climate[0] == conflict[3]: #If the climate record's date matches the conflict's date
            return float(climate[1]) #Return the temperature of the climate
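`getTemperature` loops over every climate record for each conflict, which gets slow when both lists are long. A sketch of an alternative, assuming each YEARMODA date appears only once in the climate file: build a dictionary keyed by date once, then look each conflict up directly. `buildClimateLookup` and `getTemperatureFast` are hypothetical names, not part of the notebook.

```python
def buildClimateLookup(climateList):
    """Map each YEARMODA date string to its climate row (the last row wins if a date repeats)."""
    return {climate[0]: climate for climate in climateList}


def getTemperatureFast(conflict, climateByDate):
    """Like getTemperature(), but uses the prebuilt dictionary instead of a full scan."""
    climate = climateByDate.get(conflict[3])  # None if there is no record for that date
    return float(climate[1]) if climate is not None else None


# Hypothetical climate rows: [YEARMODA, TEMP, PRCP]
climateRows = [["20190526", "84.2", "0.00G"], ["20190527", "83.1", "0.12G"]]
lookup = buildClimateLookup(climateRows)

conflict = ["", "", "", "20190527"]  # Only index 3 (the date) matters here
print(getTemperatureFast(conflict, lookup))                  # 83.1
print(getTemperatureFast(["", "", "", "20000101"], lookup))  # None
```

In the notebook, the dictionary would be built once with `buildClimateLookup(bClimateList)` and reused for every conflict.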

In [None]:
def conflictClimate():

    typesOfConflict = [] #To store the types of conflicts (sub event types)

    combined = {} #To store the conflict types and their temperature means

    #Get the types of conflicts
    for conflict in bConflictList:
        confType = conflict[7]

        #If the conflict type isn't already entered
        if (confType not in typesOfConflict):
            typesOfConflict.append(confType) #Append the conflict type

    #Get the temperature mean for each conflict type
    for typeC in typesOfConflict:

        currTemps = [] #Temperatures on the days this type of conflict occurred

        #Enumerate through each conflict
        for i, conflict in enumerate(bConflictList):

            #Skip "duplicate"/concurrent conflicts that share a date with the previous row
            #(i > 0 stops the first row being compared with the last row through index -1)
            if (i > 0 and bConflictList[i-1][3] == conflict[3]):
                continue

            if (conflict[7] == typeC): #If the conflict matches the current conflict type
                currTemp = getTemperature(conflict, bClimateList) #Get the temperature by matching the conflict to the climate list
                if (currTemp is not None): #Skip days with no climate record instead of counting them as 0.0
                    currTemps.append(currTemp)

        #Calculate the temperature mean of the current type (only if at least one day had a climate record)
        if (len(currTemps) > 0):
            combined[typeC] = sum(currTemps) / len(currTemps)

    #Return the dictionary
    return combined

confClim = conflictClimate()
confClim

When you run the above cell, you will see every conflict sub type with the mean temperature on the days it occurred.<br>

Knowing this, how would you alter it to get the temperature means for the main event types instead of the sub types?<br>
Try it in the cell below, using the above cell as a basis, and name the function "conflictClimate2" to avoid clashing.

<b>Double click for the solution</b>

<!--
def conflictClimate2():

    typesOfConflict = [] #To store the types of conflicts (main event types)

    combined = {} #To store the conflict types and their temperature means

    #Get the types of conflicts
    for conflict in bConflictList:
        confType = conflict[6]

        #If the conflict type isn't already entered
        if (confType not in typesOfConflict):
            typesOfConflict.append(confType) #Append the conflict type

    #Get the temperature mean for each conflict type
    for typeC in typesOfConflict:

        currTemps = [] #Temperatures on the days this type of conflict occurred

        #Enumerate through each conflict
        for i, conflict in enumerate(bConflictList):

            #Skip "duplicate"/concurrent conflicts that share a date with the previous row
            #(i > 0 stops the first row being compared with the last row through index -1)
            if (i > 0 and bConflictList[i-1][3] == conflict[3]):
                continue

            if (conflict[6] == typeC): #If the conflict matches the current conflict type
                currTemp = getTemperature(conflict, bClimateList) #Get the temperature by matching the conflict to the climate list
                if (currTemp is not None): #Skip days with no climate record instead of counting them as 0.0
                    currTemps.append(currTemp)

        #Calculate the temperature mean of the current type (only if at least one day had a climate record)
        if (len(currTemps) > 0):
            combined[typeC] = sum(currTemps) / len(currTemps)

    #Return the dictionary
    return combined

confClim2 = conflictClimate2()
confClim2
-->

We will represent the sub type/temperature correlations using a bar chart.

In [None]:
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker #Needed to set the interval
import matplotlib.patches as ptch #Needed to get the rectangle properties

#Bar chart using conflict sub types and their average temperatures
fig, ax = plt.subplots(1,1, figsize=(20,10)) #Set the figure size so that the graph isn't too small
barChart = ax.bar(list(confClim.keys()), list(confClim.values()))

lowest = 0 #Initial index
lowH = barChart[lowest].get_height() #Gets the height of the initial bar

for i, bar in enumerate(barChart): #Goes through each bar
    currH = bar.get_height() #Gets the current bar's height
    if (currH < lowH): #If it's lower than the lowest so far
        lowest = i #Set the new lowest index
        lowH = currH #Set the new lowest height

#Set the colour of the lowest bar to orange
barChart[lowest].set_color('orange')

#Set the intervals for the y axis data
ax.yaxis.set_major_locator(ticker.MultipleLocator(5)) #Interval of 5

#Labels
plt.xlabel("Conflict sub types")
plt.ylabel("Average temperature (Celsius)")

#Rotate the labels 45 degrees
plt.xticks(rotation=45)

#Add the grid
ax.grid()

#Show the plot
plt.show()

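As you can see from the bar chart above, when this notebook was written the mob violence sub type had the lowest average temperature, suggesting that in Benin, conflicts on cooler days are more likely to be related to mob violence. Treat this as a tentative observation: it compares simple averages and does not show that temperature causes the difference.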

However, so far we have only considered temperature; we must also take rainfall into account. The process is very similar, as shown below.

In [None]:
import string

def getRainfall(conflict, climateList):

    #Go through each climate record
    for climate in climateList:
        if climate[0] == conflict[3]: #If the climate record's date matches the conflict's date
            rainfall = float(climate[2].strip().rstrip(string.ascii_letters)) #Remove any trailing flag letter, then parse
            if (rainfall == 99.99): #In GSOD data, 99.99 marks a missing precipitation value
                return None
            return rainfall #Return the rainfall of the climate

This is almost identical to the function for getting the temperature, except that it reads the rainfall column. Values in this column often end with a letter (a data flag in the GSOD format), so any trailing letter is removed before the value is parsed, to prevent errors.
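A standalone sketch of the parsing step, assuming BeninClimate.csv keeps the conventions of the GSOD data set it comes from: a PRCP value may end in a single flag letter, and 99.99 marks a missing value. Stripping letters, rather than always dropping the last character, avoids turning a value without a flag, such as 1.25, into 1.2. `parseRainfall` is a hypothetical helper name.

```python
import string

MISSING_PRCP = 99.99  # GSOD's marker for a missing precipitation value


def parseRainfall(value):
    """Parse a GSOD-style PRCP value such as '0.12G' into a float.

    Any trailing flag letter is removed, and the missing-value marker becomes None."""
    number = float(value.strip().rstrip(string.ascii_letters))
    return None if number == MISSING_PRCP else number


print(parseRainfall("0.12G"))  # 0.12
print(parseRainfall("1.25"))   # 1.25
print(parseRainfall("99.99"))  # None
```

Returning `None` for missing values lets the caller skip those days instead of letting 99.99 inflate the averages.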

In [None]:
def conflictClimate2():

    typesOfConflict = [] #To store the types of conflicts (sub event types)

    combined = {} #To store the conflict types and their rainfall means

    #Get the types of conflicts
    for conflict in bConflictList:
        confType = conflict[7]

        #If the conflict type isn't already entered
        if (confType not in typesOfConflict):
            typesOfConflict.append(confType) #Append the conflict type

    #Get the rainfall mean for each conflict type
    for typeC in typesOfConflict:

        currPRCPs = [] #Rainfall on the days this type of conflict occurred

        #Enumerate through each conflict
        for i, conflict in enumerate(bConflictList):

            #Skip "duplicate"/concurrent conflicts that share a date with the previous row
            #(i > 0 stops the first row being compared with the last row through index -1)
            if (i > 0 and bConflictList[i-1][3] == conflict[3]):
                continue

            if (conflict[7] == typeC): #If the conflict matches the current conflict type
                currPRCP = getRainfall(conflict, bClimateList) #Get the rainfall by matching the conflict to the climate list
                if (currPRCP is not None): #Skip days with no usable rainfall value instead of counting them as 0.0
                    currPRCPs.append(currPRCP)

        #Calculate the rainfall mean of the current type (only if at least one day had a value)
        if (len(currPRCPs) > 0):
            combined[typeC] = sum(currPRCPs) / len(currPRCPs)

    #Return the dictionary
    return combined

confClim2 = conflictClimate2()
confClim2

The same process is used here; only the variable names differ, to fit the context and to avoid clashing with the previous function's variables.
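Since the temperature and rainfall functions differ only in which climate value they average (and the exercise version only in the type column), the shared logic could be factored into one function. This is a sketch, not the notebook's code: `meanByType`, `climateByDate` and `valueOf` are hypothetical names, and days without usable climate data are left out rather than counted as 0.

```python
def meanByType(conflicts, climateByDate, typeIndex, valueOf):
    """Average a climate value over the dates on which each conflict type occurred.

    conflicts     - conflict rows, with the yyyymmdd date at index 3
    climateByDate - {yyyymmdd date: climate row}
    typeIndex     - 6 for EVENT_TYPE, 7 for SUB_EVENT_TYPE
    valueOf       - function turning a climate row into a number (or None if missing)
    """
    valuesByType = {}
    for i, conflict in enumerate(conflicts):
        # Skip rows that share a date with the previous row, as the notebook's functions do
        if i > 0 and conflicts[i - 1][3] == conflict[3]:
            continue
        climate = climateByDate.get(conflict[3])
        value = valueOf(climate) if climate is not None else None
        if value is not None:  # Leave out days without usable climate data
            valuesByType.setdefault(conflict[typeIndex], []).append(value)
    return {t: sum(v) / len(v) for t, v in valuesByType.items()}


def makeRow(date, subType):
    """Hypothetical conflict row: only the date (index 3) and sub type (index 7) are filled in."""
    return ["", "", "", date, "", "", "", subType]


climateByDate = {
    "20190101": ["20190101", "80.0", "0.10G"],
    "20190102": ["20190102", "90.0", "0.30G"],
}
rows = [
    makeRow("20190101", "Mob violence"),
    makeRow("20190101", "Attack"),  # Same date as the previous row, so it is skipped
    makeRow("20190102", "Attack"),
    makeRow("20190103", "Attack"),  # No climate record for this date, so it is left out
]

temps = meanByType(rows, climateByDate, 7, lambda climate: float(climate[1]))
rain = meanByType(rows, climateByDate, 7, lambda climate: float(climate[2].rstrip("ABCDEFGHI")))
print(temps)  # {'Mob violence': 80.0, 'Attack': 90.0}
print(rain)   # {'Mob violence': 0.1, 'Attack': 0.3}
```

In the notebook, `climateByDate` could be built with `{row[0]: row for row in bClimateList}`, and passing 6 instead of 7 as `typeIndex` gives the main event types.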

In [None]:
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker #Needed to set the interval
import matplotlib.patches as ptch #Needed to get the rectangle properties

#Bar chart using conflict sub types and their average rainfall
fig, ax = plt.subplots(1,1, figsize=(20,10)) #Set the figure size so that the graph isn't too small
barChart = ax.bar(list(confClim2.keys()), list(confClim2.values()))

highest = 0 #Initial index
highH = barChart[highest].get_height() #Gets the height of the initial bar

for i, bar in enumerate(barChart): #Goes through each bar
    currH = bar.get_height() #Gets the current bar's height
    if (currH > highH): #If it's higher than the highest so far
        highest = i #Set the new highest index
        highH = currH #Set the new highest height

#Set the colour of the highest bar to blue
barChart[highest].set_color('blue')

#Labels
plt.xlabel("Conflict sub types")
plt.ylabel("Average rainfall (inches)")

#Rotate the labels 45 degrees
plt.xticks(rotation=45)

#Add the grid
ax.grid()

#Show the plot
plt.show()

By running the cell above, you will get a graph similar to the temperature graph, but this time for rainfall. It shows a different result: every type has fairly low values except one, abduction/forced disappearance.

From this, we might tentatively conclude that in Benin, abductions/forced disappearances are more likely than other conflict types to happen on rainier days. As with temperature, this is a comparison of averages rather than proof of a causal link.

<h3>The end</h3>

This concludes notebook 4. You should now be more familiar with data visualisation: not only have you displayed data, but you have also explored the resulting graphs to draw your own conclusions about correlations within the data.

<h3>Bibliography</h3>

+ <a href="#Getting-started">ACLED data set</a> by Raleigh, Clionadh, Andrew Linke, Håvard Hegre and Joakim Karlsen. (2010). “Introducing ACLED-Armed Conflict Location and Event Data.” Journal of Peace Research 47(5) 651-660. - Retrieved 10th of July, 2019, from <a href="https://www.acleddata.com/curated-data-files/">https://www.acleddata.com/curated-data-files/</a>.
+ <a href="#Correlations-in-weather-data">Benin climate data</a> by U.S. Air Force Climatology Center - Retrieved 10th of July, 2019, from <a href="https://www7.ncdc.noaa.gov/CDO/cdoselect.cmd?datasetabbv=GSOD&countryabbv=&georegionabbv=AFR&resolution=40">https://www7.ncdc.noaa.gov/CDO/cdoselect.cmd?datasetabbv=GSOD&countryabbv=&georegionabbv=AFR&resolution=40</a>.