# Exploring the World Historical Battle Database
Access to this database was granted by its creator and curator, [Dr. Shuhei Kitamura of Osaka University](https://osf.io/j357k). It's important we acknowledge his generosity. 

Canada's involvement in world conflicts is an important part of Canadian history, which makes it a natural topic for applying data science to social studies. This database lets us explore battles from throughout human history and across the world.

In [1]:
import pandas as pd
import numpy as np
import plotly.express as px
import warnings
import math
import pycountry_convert as pc
from geopy.geocoders import Nominatim

In [2]:
# Read in the data
data = pd.read_excel('data/whbd_v11-2.xlsx')
data = data[data['year'].notna()]
data['year'] = data['year'].map(lambda x: int(x))
data.sort_values('year',inplace=True)
data

Unnamed: 0,uid,bid,lr,bname,year,year_end,war,bell,mult_sides,win,...,air,multiple,entire,unknown,ongoing,plan,nopage,wd_url,wp_url,casualties
142,143,53,l,Battle of Zhuolu,-2500,,,Yanhuang tribe,,win,...,,,,,,,,http://www.wikidata.org/entity/Q1064923,https://en.wikipedia.org/wiki/Battle_of_Zhuolu,
143,144,53,r,Battle of Zhuolu,-2500,,,Jiuli tribes,,loss,...,,,,,,,,http://www.wikidata.org/entity/Q1064923,https://en.wikipedia.org/wiki/Battle_of_Zhuolu,
19448,19449,6850,l,Battle of Banquan,-2500,,,Shennong (tribe),,loss,...,,,,,,,,http://www.wikidata.org/entity/Q755758,https://en.wikipedia.org/wiki/Battle_of_Banquan,
19449,19450,6850,r,Battle of Banquan,-2500,,,Youxiong (tribe),,win,...,,,,,,,,http://www.wikidata.org/entity/Q755758,https://en.wikipedia.org/wiki/Battle_of_Banquan,
21481,21482,7590,r,Battle of Uruk,-2271,,,Sumerian provinces (?),,loss,...,,,,,,,,http://www.wikidata.org/entity/Q3309009,https://en.wikipedia.org/wiki/Battle_of_Uruk,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17506,17507,6213,l,Battle of Baghuz Fawqani,2019,,Deir ez-Zor offensive,United Kingdom,,win,...,,,,,,,,http://www.wikidata.org/entity/Q61843818,https://en.wikipedia.org/wiki/Battle_of_Baghuz...,
17505,17506,6213,l,Battle of Baghuz Fawqani,2019,,Deir ez-Zor offensive,France,,win,...,,,,,,,,http://www.wikidata.org/entity/Q61843818,https://en.wikipedia.org/wiki/Battle_of_Baghuz...,
17504,17505,6213,l,Battle of Baghuz Fawqani,2019,,Deir ez-Zor offensive,United States,,win,...,,,,,,,,http://www.wikidata.org/entity/Q61843818,https://en.wikipedia.org/wiki/Battle_of_Baghuz...,
17510,17511,6213,r,Battle of Baghuz Fawqani,2019,,Deir ez-Zor offensive,Wilayat al-Sham,,loss,...,,,,,,,,http://www.wikidata.org/entity/Q61843818,https://en.wikipedia.org/wiki/Battle_of_Baghuz...,


In [3]:
# See what the columns contain
data.columns

Index(['uid', 'bid', 'lr', 'bname', 'year', 'year_end', 'war', 'bell',
       'mult_sides', 'win', 'uk', 'fr', 'de', 'sp', 'sw', 'tr', 'at', 'ru',
       'nl', 'it', 'pt', 'dk', 'habsburg', 'hre', 'lat', 'lng', 'locn',
       'naval', 'river', 'lake', 'air', 'multiple', 'entire', 'unknown',
       'ongoing', 'plan', 'nopage', 'wd_url', 'wp_url', 'casualties'],
      dtype='object')

In [4]:
# Filter to only look at the battles that specify Canada as a combatant. Drop unused columns
candata = data[data['bell']=='Canada'].drop(['uk', 'fr', 'de', 'sp', 'sw', 'tr', 'at', 
                                             'ru', 'nl', 'it', 'pt', 'dk', 'habsburg', 'hre',
                                             'naval', 'river', 'lake', 'air', 'multiple', 'entire',
                                             'ongoing', 'plan', 'nopage', 'unknown'], axis=1)
display(candata)

Unnamed: 0,uid,bid,lr,bname,year,year_end,war,bell,mult_sides,win,lat,lng,locn,wd_url,wp_url,casualties
2929,2930,986,r,Battle of Trout River,1870,,Fenian raids,Canada,,win,45.087337,-74.173851,"Huntingdon, Quebec, Canada",http://www.wikidata.org/entity/Q16822756,https://en.wikipedia.org/wiki/Battle_of_Trout_...,4.0
21394,21395,7558,r,Battle of Fish Creek,1885,,North-West Rebellion,Canada,,loss,52.601944,-105.947220,,http://www.wikidata.org/entity/Q2890611,https://en.wikipedia.org/wiki/Battle_of_Fish_C...,79.0
21359,21360,7544,r,Battle of Frenchman's Butte,1885,,North-West Rebellion,Canada,,loss,53.627224,-109.575836,,http://www.wikidata.org/entity/Q2888576,https://en.wikipedia.org/wiki/Battle_of_French...,
21407,21408,7560,r,Battle of Duck Lake,1885,,North-West Rebellion,Canada,,loss,52.816509,-106.232727,,http://www.wikidata.org/entity/Q2890920,https://en.wikipedia.org/wiki/Battle_of_Duck_Lake,32.0
13132,13133,4613,r,Battle of Fort Pitt,1885,,North-West Rebellion,Canada,,loss,53.650180,-109.751540,"Frenchman Butte No. 501, near Frenchman Butte,...",http://www.wikidata.org/entity/Q4871050,https://en.wikipedia.org/wiki/Battle_of_Fort_Pitt,6.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
22441,22442,10016,l,,2013,2014.0,Northern Mali conflict,Canada,,win,,,Mali,http://www.wikidata.org/entity/Q16627878,https://en.wikipedia.org/wiki/Battle_of_Dayet_...,
3359,3360,1129,l,Battle of Baiji,2014,2015.0,Iraqi Civil War (2014–2017),Canada,,win,34.933333,43.483333,"Baiji, Saladin Governorate, Iraq",http://www.wikidata.org/entity/Q17621839,https://en.wikipedia.org/wiki/Battle_of_Baiji_...,4846.0
3791,3792,1275,l,Battle of Ramadi,2014,2015.0,Anbar offensive,Canada,,loss,33.416667,43.300000,"Ramadi, Anbar Governorate, Iraq",http://www.wikidata.org/entity/Q18639122,https://en.wikipedia.org/wiki/Battle_of_Ramadi...,
5014,5015,1682,l,Battle of Ramadi,2015,2016.0,Iraqi Civil War (2014–2017),Canada,,win,33.416667,43.300000,"Ramadi, Anbar Governorate, Iraq",http://www.wikidata.org/entity/Q21685887,https://en.wikipedia.org/wiki/Battle_of_Ramadi...,3250.0


We can narrow the data to rows that have a battle name and list the battles that include Canada as a participant:

In [5]:
candata = candata[candata['bname'].notna()]

list(candata['bname'].sort_values().unique())

['Action of 26 April 1944',
 'Allied invasion of Italy',
 'Battle for Caen',
 'Battle for the Kapelsche Veer',
 'Battle of Amiens',
 'Battle of Anzio',
 'Battle of Arghandab',
 'Battle of Arras',
 'Battle of Attu',
 'Battle of Baiji',
 'Battle of Batoche',
 'Battle of Bubiyan',
 'Battle of Cambrai',
 'Battle of Chambois',
 'Battle of Chuam-ni',
 'Battle of Cut Knife',
 'Battle of Diamond Hill',
 'Battle of Drocourt-Quéant Line',
 'Battle of Duck Lake',
 'Battle of Dunkirk',
 'Battle of Elands River',
 'Battle of Festubert',
 'Battle of Fish Creek',
 'Battle of Flers–Courcelette',
 'Battle of Fort Pitt',
 "Battle of Frenchman's Butte",
 'Battle of Givenchy',
 'Battle of Groningen',
 'Battle of Hill 60',
 'Battle of Hill 70',
 'Battle of Hong Kong',
 'Battle of Inchon',
 'Battle of Kapyong',
 "Battle of Kitcheners' Wood",
 'Battle of Le Mesnil-Patry',
 'Battle of Le Transloy',
 'Battle of Leliefontein',
 'Battle of Loon Lake',
 'Battle of Maehwa-san',
 'Battle of Messines',
 'Battle of M

We can plot the battles on a map using their latitude and longitude coordinates.

You can drag to move around the map and zoom in or out for more detail. Hovering over a data point shows the name of the battle, as well as the war in which it was fought.

In [6]:
fig = px.scatter_geo(candata, lat='lat', lon='lng', 
               hover_name='bname', 
               hover_data=['year'],
               color= 'war',
               title='Battles participated in by post-Confederation Canada')

fig.update_layout(showlegend=False)
fig.show()

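Let's animate through the years to see the wars that Canada has participated in. So that earlier battles stay on the map in later frames, each battle row is copied into every later year in the data.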

In [7]:
warnings.filterwarnings("ignore")
years = list(candata['year'].unique())

animate_candata = pd.DataFrame(candata)

def animation_years(row):
    global animate_candata
   
    df = pd.DataFrame(columns=animate_candata.columns)
    index = years.index(row['year'])
    for i in years[index+1:]:
        row['year'] = i  # assign by label; row[4] relied on deprecated positional indexing
        df.loc[len(df.index)] = row
    
    animate_candata = pd.concat([animate_candata,df],ignore_index=True)

for i in range(len(candata.index)):
    animation_years(candata.iloc[i])
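
The loop above grows the DataFrame one row at a time, which gets slow as the number of battles and years increases. As a minimal sketch of the same idea, a cross join can pair every battle with every year and then keep only the years at or after the battle. The toy `battles` frame, the `cumulative_frames` helper, and the `frame_year` column are illustrative names that are not part of the notebook, and `how='cross'` assumes pandas 1.2 or newer.

```python
import pandas as pd

# Hypothetical miniature stand-in for candata (three battles, three distinct years)
battles = pd.DataFrame({
    'bname': ['Battle of Trout River', 'Battle of Duck Lake', 'Battle of Baiji'],
    'year': [1870, 1885, 2014],
})

def cumulative_frames(df, year_col='year'):
    """Repeat each row in its own year and in every later year present in df."""
    frame_years = pd.DataFrame({'frame_year': sorted(df[year_col].unique())})
    # Pair every battle with every candidate frame year
    pairs = df.merge(frame_years, how='cross')
    # Keep a battle only in frames at or after the year it was fought
    pairs = pairs[pairs['frame_year'] >= pairs[year_col]]
    return pairs.sort_values(['frame_year', year_col]).reset_index(drop=True)

frames = cumulative_frames(battles)
print(frames)
```

Passing `animation_frame='frame_year'` to `px.scatter_geo` would then play the same cumulative animation while leaving the original `year` column intact for hover text.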



In [8]:
def rank_rows(df):
    
    df['sort_rank'] = 0
    curr_rank = 1
    for b in battles:
        temp = df.loc[df['bname'] == b]
        for i,rows in temp.iterrows():
            
            if df.loc[i, 'sort_rank'] == 0:
                # .loc avoids chained assignment, which may not write back to df
                df.loc[i, 'sort_rank'] = curr_rank
                curr_rank += 1
    return df        
            

    
animate_candata.sort_values(['year'],inplace=True)            
battles = list(animate_candata.sort_values('year')['bname'].unique())
animate_candata = rank_rows(animate_candata)

animate_candata.sort_values('sort_rank',inplace=True)
animate_candata

Unnamed: 0,uid,bid,lr,bname,year,year_end,war,bell,mult_sides,win,lat,lng,locn,wd_url,wp_url,casualties,sort_rank
0,2930,986,r,Battle of Trout River,1870,,Fenian raids,Canada,,win,45.087337,-74.173851,"Huntingdon, Quebec, Canada",http://www.wikidata.org/entity/Q16822756,https://en.wikipedia.org/wiki/Battle_of_Trout_...,4.0,1
108,2930,986,r,Battle of Trout River,1885,,Fenian raids,Canada,,win,45.087337,-74.173851,"Huntingdon, Quebec, Canada",http://www.wikidata.org/entity/Q16822756,https://en.wikipedia.org/wiki/Battle_of_Trout_...,4.0,2
109,2930,986,r,Battle of Trout River,1900,,Fenian raids,Canada,,win,45.087337,-74.173851,"Huntingdon, Quebec, Canada",http://www.wikidata.org/entity/Q16822756,https://en.wikipedia.org/wiki/Battle_of_Trout_...,4.0,3
110,2930,986,r,Battle of Trout River,1914,,Fenian raids,Canada,,win,45.087337,-74.173851,"Huntingdon, Quebec, Canada",http://www.wikidata.org/entity/Q16822756,https://en.wikipedia.org/wiki/Battle_of_Trout_...,4.0,4
111,2930,986,r,Battle of Trout River,1915,,Fenian raids,Canada,,win,45.087337,-74.173851,"Huntingdon, Quebec, Canada",http://www.wikidata.org/entity/Q16822756,https://en.wikipedia.org/wiki/Battle_of_Trout_...,4.0,5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1903,5015,1682,l,Battle of Ramadi,2016,2016.0,Iraqi Civil War (2014–2017),Canada,,win,33.416667,43.3,"Ramadi, Anbar Governorate, Iraq",http://www.wikidata.org/entity/Q21685887,https://en.wikipedia.org/wiki/Battle_of_Ramadi...,3250.0,1900
104,3360,1129,l,Battle of Baiji,2014,2015.0,Iraqi Civil War (2014–2017),Canada,,win,34.933333,43.483333,"Baiji, Saladin Governorate, Iraq",http://www.wikidata.org/entity/Q17621839,https://en.wikipedia.org/wiki/Battle_of_Baiji_...,4846.0,1901
1899,3360,1129,l,Battle of Baiji,2015,2015.0,Iraqi Civil War (2014–2017),Canada,,win,34.933333,43.483333,"Baiji, Saladin Governorate, Iraq",http://www.wikidata.org/entity/Q17621839,https://en.wikipedia.org/wiki/Battle_of_Baiji_...,4846.0,1902
1900,3360,1129,l,Battle of Baiji,2016,2015.0,Iraqi Civil War (2014–2017),Canada,,win,34.933333,43.483333,"Baiji, Saladin Governorate, Iraq",http://www.wikidata.org/entity/Q17621839,https://en.wikipedia.org/wiki/Battle_of_Baiji_...,4846.0,1903


By clicking the "play" button at the bottom, we can watch the wars that Canada has fought over the years.

In [9]:
px.scatter_geo(animate_candata, lat='lat', lon='lng', 
               height=800, hover_name='bname', 
               animation_frame= 'year',
               animation_group='war',
               title='Battles participated in by post-Confederation Canada')

Now let's add in the number of casualties of each battle that Canada has fought in. We will change the size of each bubble to match the number of casualties each battle had. Larger bubbles will indicate battles with more casualties.

In [10]:
casualties_can = candata[candata['casualties'].notna()]

px.scatter_geo(casualties_can, lat='lat', lon='lng', 
               height=800, hover_name='bname', 
               hover_data=['war', 'year'],
               size = 'casualties',
               color='casualties',
               title='Casualties of battles participated in by post-Confederation Canada')

We can also create a bar graph that shows the bloodiest wars that Canada has been a part of.

In [11]:
can_bloodiest_war = candata.groupby('war',as_index=False)['casualties'].sum()
can_bloodiest_war = can_bloodiest_war[can_bloodiest_war['casualties'] > 0]
can_bloodiest_war.sort_values('casualties',inplace=True)
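
The `> 0` filter above is needed because of how pandas sums missing values: a group whose casualties are all `NaN` sums to 0 rather than `NaN`. The sketch below shows that behaviour on made-up numbers (the war labels and counts are placeholders, not values from the database), along with the `min_count` argument as one possible way to keep "unknown" distinct from "zero".

```python
import pandas as pd

# Placeholder data: 'War B' has no recorded casualties at all
toy = pd.DataFrame({
    'war': ['War A', 'War A', 'War B'],
    'casualties': [10.0, 5.0, float('nan')],
})

# Default behaviour: NaN values are skipped, so an all-NaN group sums to 0
default_totals = toy.groupby('war', as_index=False)['casualties'].sum()

# min_count=1 requires at least one real value, otherwise the total stays NaN
strict_totals = toy.groupby('war', as_index=False)['casualties'].sum(min_count=1)

print(default_totals)
print(strict_totals)
```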

In [12]:
# can_bloodiest_war is sorted ascending, so reverse it and keep the first ten rows
top_10_bloodiest_can_wars = can_bloodiest_war[::-1][:10]

fig = px.bar(top_10_bloodiest_can_wars,x='war',y='casualties',title="Canada's Bloodiest Wars")

fig.show()
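
Picking the largest rows out of an ascending sort with slice notation is easy to get subtly wrong, because the `stop` in a negative-step slice is an index counted from the front, not a number of items. A small sketch with a plain list (the values are arbitrary stand-ins for sorted casualty totals) shows the difference; DataFrame row slicing follows the same positional rules.

```python
# Fifteen values already sorted ascending, standing in for casualty totals
ascending = list(range(15))

# Walks backwards from the end and stops before index 10
tail_past_index_10 = ascending[:10:-1]

# Reverses first, then keeps the first ten items
top_ten = ascending[::-1][:10]

print(tail_past_index_10)
print(top_ten)
```

With pandas, `df.nlargest(10, 'casualties')` expresses the same selection without relying on a prior sort.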

We can also look at which continents Canada has fought the most in.

In [19]:
def FindContinent(location):
   address = location.raw['address']
   country = address.get('country','')
   change = {'Palestinian Territory': 'Palestine'}
   
   if country in change:
      country = change[country]
      
   country_alpha2 = pc.country_name_to_country_alpha2(country)
   continent_code = pc.country_alpha2_to_continent_code(country_alpha2)
   continent = pc.convert_continent_code_to_continent_name(continent_code)
   return continent

locator = Nominatim(user_agent='battles')

battles_continent = data[(data['lat'].notna()) & (data['lng'].notna())]

true_locations = []
for i,row in battles_continent.iterrows():
   true_locations.append(locator.reverse(str(row['lat']) + "," + str(row['lng']),language='en'))
   #country = address.get('country','')

battles_continent['True Location'] = true_locations
battles_continent = battles_continent[battles_continent['True Location'].notna()]
battles_continent['continent'] = battles_continent['True Location'].map(lambda x: FindContinent(x))
battles_continent.to_csv('historical_battles.csv',index=False)

GeocoderUnavailable: HTTPSConnectionPool(host='nominatim.openstreetmap.org', port=443): Max retries exceeded with url: /reverse?lat=41.8819&lon=12.6972&format=json&accept-language=en&addressdetails=1 (Caused by ReadTimeoutError("HTTPSConnectionPool(host='nominatim.openstreetmap.org', port=443): Read timed out. (read timeout=1)"))
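
The traceback above reports a read timeout of 1, which is easy to hit when thousands of coordinates are reverse-geocoded in a row. One hedged way to make the loop more forgiving is to wrap each lookup in a small retry helper with exponential backoff. Everything below is an illustrative sketch: `call_with_retries` and `flaky_reverse` are hypothetical names, and the stub stands in for `locator.reverse` so the example runs without network access.

```python
import time

def call_with_retries(func, *args, retries=3, base_delay=1.0,
                      retry_on=(Exception,), sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs); wait and retry on failure, returning None if all attempts fail."""
    for attempt in range(retries):
        try:
            return func(*args, **kwargs)
        except retry_on:
            if attempt == retries - 1:
                return None
            # Back off base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
            sleep(base_delay * 2 ** attempt)
    return None

# Stub geocoder (an assumption for the demo): fails twice, then succeeds
calls = {'count': 0}

def flaky_reverse(query):
    calls['count'] += 1
    if calls['count'] < 3:
        raise TimeoutError('simulated read timeout')
    return {'address': {'country': 'Italy'}}

delays = []  # record the waits instead of actually sleeping
result = call_with_retries(flaky_reverse, '41.8819,12.6972', sleep=delays.append)
print(result, delays)
```

In the notebook this could be called as `call_with_retries(locator.reverse, query, language='en', retry_on=(GeocoderServiceError,))`, with `GeocoderServiceError` imported from `geopy.exc`. Because a failed lookup returns `None`, the existing `notna()` filter on `True Location` would drop it. geopy's `Nominatim` constructor also accepts a `timeout` argument, which may be the simpler first thing to try.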

In [None]:
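# can_battles_continent is not defined in an earlier cell; this assumes it is the Canadian subset of battles_continent
can_battles_continent = battles_continent[battles_continent['bell'] == 'Canada']
# size() counts every row; count() on 'locn' would skip battles with a missing location name
continent_grouped = can_battles_continent.groupby('continent').size().reset_index(name='Number of Battles Fought')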
continent_grouped.sort_values('Number of Battles Fought',ascending=False,inplace=True)
fig = px.bar(continent_grouped,x='continent',y='Number of Battles Fought',title='Number of Battles fought by Canada in different Continents')

fig.show()

These charts only consider battles that happened after Confederation, because the sovereign state of Canada did not exist before then. We can also look at battles that happened in geographic Canada by keeping only latitude and longitude values in that region:

In [None]:
# Southernmost point of Canada is Lake Erie, ON, at 41°40' N; easternmost is Cape Spear, NL, at 52°37' W
# We also need to exclude a single WWII Pacific battle that happened off the coast of Alaska that didn't involve Canada
NA_data = data[(data['lat']>41.6) & 
               (data['lng']<-52.6) & 
               (data['lng']>-160)]

# There are also many wars in this subset that don't feature Canada, so we list them here to remove:
remove = ['American Revolutionary War',
          'Sioux Wars',
          "Red Cloud's War",
          'Dakota War of 1862',
          'Russo-Tlingit War',
          'Great Sioux War of 1876',
          'Powder River Expedition',
          'American Civil War',
          'Yellowstone Expedition of 1873',
          'Nez Perce War',
          'Comanche Campaign',
          'Boston campaign',
          'Modoc War',
          'American Revolution',
          "King Philip's War",
          'Black Hawk War',
          'Colorado War',
          'American Indian Wars',
          'Forage War',
          "Coeur d'Alene War",
          'Yakima War',
          'Philadelphia campaign',
          'Ghost Dance War']

# Remove wars listed above, as well as battles without a specific war (that all happened in the USA)
NA_data = NA_data[(~NA_data['war'].isin(remove)) & (~NA_data['war'].isnull())]
          
          
          
fig = px.scatter_geo(NA_data, lat='lat', lon='lng', 
               hover_name='bname', 
               color='war',
               hover_data=['year'], 
               fitbounds='locations',
               title='Historical battles fought in present-day Canada')

fig.update_layout(showlegend=False)
fig.show()
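
The bounding box in the comments is given in degrees and minutes, while the filter uses rounded decimal degrees. As a quick check of that conversion, here is a small sketch; `dms_to_decimal` and `in_canada_box` are hypothetical helpers, and the western limit of -160 is taken from the filter above.

```python
def dms_to_decimal(degrees, minutes=0.0, seconds=0.0, hemisphere='N'):
    """Convert degrees/minutes/seconds to signed decimal degrees (S and W are negative)."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ('S', 'W') else value

SOUTH_LIMIT = dms_to_decimal(41, 40, hemisphere='N')  # 41°40' N
EAST_LIMIT = dms_to_decimal(52, 37, hemisphere='W')   # 52°37' W
WEST_LIMIT = -160.0                                   # cutoff used in the filter above

def in_canada_box(lat, lng):
    """True when a point lies north of the southern limit and between the west and east limits."""
    return lat > SOUTH_LIMIT and WEST_LIMIT < lng < EAST_LIMIT

print(round(SOUTH_LIMIT, 3), round(EAST_LIMIT, 3))
print(in_canada_box(45.087337, -74.173851))  # Battle of Trout River coordinates
print(in_canada_box(34.933333, 43.483333))   # Battle of Baiji coordinates
```

The notebook's thresholds of 41.6 and -52.6 are slightly looser than the exact values (about 41.667 and -52.617), so no point inside the exact box is lost. The box is still only a rough rectangle that takes in much of the northern United States, which is why the list of wars to remove is needed.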

In [None]:
list(NA_data['bname'].sort_values().unique())

As before, we can also bring in casualties and scale the bubble sizes to match the number of casualties.

In [None]:
NA_casualties = NA_data[NA_data['casualties'].notna()]

px.scatter_geo(NA_casualties, lat='lat', lon='lng', 
               height=800, hover_name='bname', 
               size = 'casualties',
               size_max = 30,
               color = 'casualties',
               hover_data=['war','year'], 
               fitbounds='locations',
               title='Historical battles fought in present-day Canada')

In [None]:
fig = px.scatter_mapbox(NA_data, lat="lat", lon="lng", hover_name="bname", hover_data=['year'],
                        color_discrete_sequence=["fuchsia"],zoom=2.5)
fig.update_layout(
    mapbox_style="white-bg",
    mapbox_layers=[
        {
            "below": 'traces',
            "sourcetype": "raster",
            "sourceattribution": "United States Geological Survey",
            "source": [
                "https://basemap.nationalmap.gov/arcgis/rest/services/USGSImageryOnly/MapServer/tile/{z}/{y}/{x}"
            ]
        }
      ])
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

## Next Steps

Though the data is spotty for some battles, it could be interesting to bring in the number (or rate) of casualties for each battle and size the markers in proportion to it. It's only an estimate of the importance of the battle, but it's a good first step.

We could also use the plotting function to animate the conflicts throughout Canadian history.

Lastly, we could focus on WWI or WWII data and look at the advance of the Allied gains in Europe, highlighting battles that Canada was a major part of.