# Exploring the World Historical Battle Database
Access to this database was granted by its creator and curator, [Dr. Shuhei Kitamura of Osaka University](https://osf.io/j357k). It's important we acknowledge his generosity. 

Canada's involvement in world conflicts is an important part of Canadian history, and a natural place to apply data science to social studies. This database lets us explore battles from throughout human history and across the world.

![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2Fcurriculum-notebooks&branch=master&subPath=SocialStudies/HansardAnalysis/hansard-analysis.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/callysto/curriculum-notebooks/master/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/></a>

In [None]:
import pandas as pd
import numpy as np
import plotly.express as px
import warnings
import pycountry_convert as pc
import requests

In [3]:
!pip install pycountry_convert


In [None]:
# Read in the data
data = pd.read_excel('https://raw.githubusercontent.com/callysto/data-files/main/SocialStudies/HistoricalBattles/whbd_v11.xlsx')
data2 = pd.read_csv('https://raw.githubusercontent.com/callysto/data-files/main/SocialStudies/HistoricalBattles/historical_battles.csv')
data = data.merge(data2,how='left')
data = data[data['year'].notna()].copy()  # copy so the column assignment below is unambiguous
data['year'] = data['year'].astype(int)
data.sort_values('year',inplace=True)
data
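
When `merge` is called without an `on` argument, pandas joins on every column name the two tables share, and `how='left'` keeps all rows from the first table. The sketch below illustrates this on two made-up tables (`toy_left` and `toy_right` are hypothetical and not part of the database); the `indicator=True` option adds a `_merge` column showing which rows found a match.

```python
import pandas as pd

# Made-up tables for illustration only; these are not rows from the database
toy_left = pd.DataFrame({'bname': ['A', 'B'], 'year': [1900, 1910]})
toy_right = pd.DataFrame({'bname': ['A'], 'year': [1900], 'continent': ['Europe']})

# No `on` argument: the join uses the shared columns ('bname' and 'year')
toy_merged = toy_left.merge(toy_right, how='left', indicator=True)

# Rows without a match keep their left-hand values and get NaN on the right
print(toy_merged)
```

Checking the `_merge` column on the real data is one way to see how many battles have no matching row in the second file.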

In [None]:
# See what the columns contain
data.columns

### CANADA

In [None]:
# Filter to only look at the battles that specify Canada as a combatant. Drop unused columns
country_data = data[data['bell']=='Canada'].drop(['uk', 'fr', 'de', 'sp', 'sw', 'tr', 'at', 
                                             'ru', 'nl', 'it', 'pt', 'dk', 'habsburg', 'hre',
                                             'naval', 'river', 'lake', 'air', 'multiple', 'entire',
                                             'ongoing', 'plan', 'nopage', 'unknown'], axis=1)

In [None]:
fig = px.scatter_geo(country_data, lat='lat', lon='lng', 
               hover_name='bname', 
               hover_data=['year'],
               color= 'war',
               title='Battles participated in by post-Confederation Canada')

fig.update_layout(showlegend=False)
fig.show()

In [None]:
casualties_con = country_data[country_data['casualties'].notna()]

px.scatter_geo(casualties_con, lat='lat', lon='lng', 
               height=800, hover_name='bname', 
               hover_data=['war', 'year'],
               size = 'casualties',
               color='casualties',
               title='Casualties of battles participated in by Canada')

In [None]:
con_bloodiest_war = country_data.groupby('war',as_index=False)['casualties'].sum()
con_bloodiest_war = con_bloodiest_war[con_bloodiest_war['casualties'] > 0]
con_bloodiest_war.sort_values('casualties',inplace=True)

In [None]:
# The frame is sorted in ascending order, so reverse it and keep the first ten rows
top_10_bloodiest_con_wars = con_bloodiest_war[::-1][:10]

fig = px.bar(top_10_bloodiest_con_wars,x='war',y='casualties',title= "Canada's Bloodiest Wars")

fig.show()
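
Picking the largest rows out of an ascending sort is easy to get wrong with slice notation, because a reversed slice with a stop index does not mean "the last ten". The sketch below uses made-up totals (`toy_wars` is hypothetical) to compare such a slice with `nlargest`, which is one alternative way to get the top rows directly.

```python
import pandas as pd

# Made-up casualty totals, for illustration only (not taken from the database)
toy_wars = pd.DataFrame({
    'war': [f'War {i}' for i in range(15)],
    'casualties': [100 * (i + 1) for i in range(15)],
})
toy_sorted = toy_wars.sort_values('casualties')  # ascending order

# Reversed slice with a stop index: starts at the last row and stops before position 10
reversed_slice = toy_sorted.iloc[:10:-1]

# nlargest returns the ten rows with the highest values, largest first
top_10 = toy_wars.nlargest(10, 'casualties')

print(len(reversed_slice), len(top_10))
```

With 15 rows, the reversed slice keeps only positions 14 down to 11, while `nlargest` keeps ten rows.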

In [None]:
# Southernmost point of Canada is Lake Erie, ON, at 41°40' N; easternmost is Cape Spear, NL, at 52°37' W
# We also need to exclude a single WWII Pacific battle that happened off the coast of Alaska that didn't involve Canada
NA_data = data[(data['lat']>41.6) & 
               (data['lng']<-52.6) & 
               (data['lng']>-160)]

# There are also many wars in this subset that don't feature Canada, so we list them here to remove:
remove = ['American Revolutionary War',
          'Sioux Wars',
          "Red Cloud's War",
          'Dakota War of 1862',
          'Russo-Tlingit War',
          'Great Sioux War of 1876',
          'Powder River Expedition',
          'American Civil War',
          'Yellowstone Expedition of 1873',
          'Nez Perce War',
          'Comanche Campaign',
          'Boston campaign',
          'Modoc War',
          'American Revolution',
          "King Philip's War",
          'Black Hawk War',
          'Colorado War',
          'American Indian Wars',
          'Forage War',
          "Coeur d'Alene War",
          'Yakima War',
          'Philadelphia campaign',
          'Ghost Dance War']

# Remove wars listed above, as well as battles without a specific war (all of which took place in the USA)
NA_data = NA_data[(~NA_data['war'].isin(remove)) & (~NA_data['war'].isnull())]

fig = px.scatter_geo(NA_data, lat='lat', lon='lng', 
               hover_name='bname', 
               color='war',
               hover_data=['year'], 
               fitbounds='locations',
               title='Historical battles fought in present-day Canada')

fig.update_layout(showlegend=False)
fig.show()
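
The filtering above combines a latitude/longitude bounding box with an exclusion list. A minimal sketch of the same idea on made-up coordinates follows; the `toy` table and the `excluded_wars` list are hypothetical and exist only to show how the boolean masks combine.

```python
import pandas as pd

# Hypothetical battles with made-up coordinates, for illustration only
toy = pd.DataFrame({
    'bname': ['A', 'B', 'C', 'D'],
    'war': ['War X', 'War Y', None, 'War X'],
    'lat': [45.0, 30.0, 50.0, 60.0],
    'lng': [-75.0, -90.0, -100.0, -170.0],
})
excluded_wars = ['War Y']

# Bounding box: north of 41.6 N and between 160 W and 52.6 W
in_box = (toy['lat'] > 41.6) & (toy['lng'] < -52.6) & (toy['lng'] > -160)

# Keep rows inside the box that name a war and whose war is not excluded
keep = in_box & toy['war'].notna() & ~toy['war'].isin(excluded_wars)
toy_subset = toy[keep]

print(toy_subset['bname'].tolist())
```

Each condition is a boolean Series, so `&` and `~` combine them row by row; the parentheses around each comparison are required because of operator precedence.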

In [None]:
NA_casualties = NA_data[NA_data['casualties'].notna()]

px.scatter_geo(NA_casualties, lat='lat', lon='lng', 
               height=800, hover_name='bname', 
               size = 'casualties',
               size_max = 30,
               color = 'casualties',
               hover_data=['war','year'], 
               fitbounds='locations',
               title='Casualties of historical battles fought in present-day Canada')

In [None]:
fig = px.scatter_mapbox(NA_data, lat="lat", lon="lng", hover_name="bname", hover_data=['year'],
                        color_discrete_sequence=["fuchsia"],zoom=2.5)
fig.update_layout(
    mapbox_style="white-bg",
    mapbox_layers=[
        {
            "below": 'traces',
            "sourcetype": "raster",
            "sourceattribution": "United States Geological Survey",
            "source": [
                "https://basemap.nationalmap.gov/arcgis/rest/services/USGSImageryOnly/MapServer/tile/{z}/{y}/{x}"
            ]
        }
      ])
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

### WORLD

In [None]:
country_of_interest = 'Russia' #you can change this to a country you are interested in

# Filter to only look at the battles that specify the country of interest as a combatant. Drop unused columns
country_of_interest_data = data[data['bell']==country_of_interest].drop(['uk', 'fr', 'de', 'sp', 'sw', 'tr', 'at', 
                                             'ru', 'nl', 'it', 'pt', 'dk', 'habsburg', 'hre',
                                             'naval', 'river', 'lake', 'air', 'multiple', 'entire',
                                             'ongoing', 'plan', 'nopage', 'unknown'], axis=1)
display(country_of_interest_data)

We can plot the battles on a map using their latitude and longitude coordinates.

You can drag to move around the map and zoom in or out for a closer look. Hovering over a data point shows the name of the battle, its year, and the war in which it was fought.

In [None]:
country_of_interest_data = country_of_interest_data[country_of_interest_data['bname'].notna()]

fig = px.scatter_geo(country_of_interest_data, lat='lat', lon='lng', 
               hover_name='bname', 
               hover_data=['year'],
               color= 'war',
               title='Battles participated in by ' + country_of_interest)

fig.update_layout(showlegend=False)
fig.show()

Let's animate through the years to see the battles that the chosen country has participated in.

In [None]:
warnings.filterwarnings("ignore")
years = list(country_of_interest_data['year'].unique())

animate_country_data = pd.DataFrame(country_of_interest_data)

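def animation_years(row):
    # Repeat one battle in every later year so it stays on the map once it has appeared
    global animate_country_data

    row = row.copy()  # work on a copy so the source row is left untouched
    df = pd.DataFrame(columns=animate_country_data.columns)
    index = years.index(row['year'])
    for i in years[index+1:]:
        row['year'] = i  # set the frame year by column label rather than by position
        df.loc[len(df.index)] = row

    animate_country_data = pd.concat([animate_country_data,df],ignore_index=True)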

for i in range(len(country_of_interest_data.index)):
    animation_years(country_of_interest_data.iloc[i])

animate_country_data
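
The idea behind the cell above is that an animation frame for a given year should show every battle up to and including that year. A minimal sketch of that idea on made-up data follows. It is not the notebook's own method: it builds one block of rows per frame and stores the frame in a separate, hypothetical `frame_year` column instead of overwriting `year`.

```python
import pandas as pd

# Made-up battles, for illustration only
toy_battles = pd.DataFrame({
    'bname': ['First', 'Second', 'Third'],
    'year': [1900, 1905, 1905],
})
all_years = sorted(toy_battles['year'].unique())

frames = []
for frame_year in all_years:
    # Every battle fought in or before this frame's year is visible in the frame
    seen = toy_battles[toy_battles['year'] <= frame_year].copy()
    seen['frame_year'] = frame_year
    frames.append(seen)

cumulative = pd.concat(frames, ignore_index=True)
print(cumulative)
```

A table shaped like `cumulative` could then be passed to a plotting call with `animation_frame='frame_year'`, keeping the original `year` available for hover text.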

In [None]:
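def rank_rows(df):
    # Number the rows so that each battle's rows are grouped together,
    # with battles taken in the order of the global `battles` list
    df['sort_rank'] = 0
    curr_rank = 1
    for b in battles:
        temp = df.loc[df['bname'] == b]
        for i in temp.index:
            if df.loc[i, 'sort_rank'] == 0:
                # .loc writes to df directly; chained indexing may not
                df.loc[i, 'sort_rank'] = curr_rank
                curr_rank += 1
    return df
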
animate_country_data.sort_values(['year'],inplace=True)            
battles = list(animate_country_data.sort_values('year')['bname'].unique())
animate_country_data = rank_rows(animate_country_data)

animate_country_data.sort_values('sort_rank',inplace=True)
animate_country_data

By clicking the "play" button at the bottom, we can watch the battles that the chosen country has fought over the years.

In [None]:
px.scatter_geo(animate_country_data, lat='lat', lon='lng', 
               height=800, hover_name='bname', 
               animation_frame= 'year',
               animation_group='war',
               title='Battles participated in by ' + country_of_interest)

We can also compare how many battles a set of countries has fought on each continent.

In [None]:
countries = ['Canada','Japan','India','Spain'] #you can change and add/remove countries in this list

# Copy the filtered rows so that adding a column does not touch `data`
battles_continent = data[data['continent'].notna()].copy()
battles_continent['interest'] = battles_continent['bell'].isin(countries)
battles_continent = battles_continent[battles_continent['interest']]
battles_continent = battles_continent.rename(columns={'bell':'Country'})
battles_continent

In [None]:
continent_grouped = battles_continent.groupby(['continent','Country'])['locn'].count().reset_index(name='Number of Battles Fought')
continent_grouped.sort_values('Number of Battles Fought',ascending=False,inplace=True)
fig = px.bar(continent_grouped,x='Country',y='Number of Battles Fought',color='continent',title='Number of battles fought by each country on each continent')
fig.show()
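
The grouping above counts values in the `locn` column, and `count()` skips missing values, so a battle without a recorded location would not be counted. Whether the database has such rows is not checked here; the sketch below only shows the difference between `count()` and `size()` on a made-up table (`toy` is hypothetical).

```python
import pandas as pd

# Made-up rows for illustration only; one location is deliberately missing
toy = pd.DataFrame({
    'Country': ['Canada', 'Canada', 'Canada', 'Japan', 'Japan'],
    'continent': ['Europe', 'Europe', 'Asia', 'Asia', 'Asia'],
    'locn': ['x', None, 'y', 'z', 'w'],
})

# count() tallies non-missing values of 'locn' in each group
counted = toy.groupby(['continent', 'Country'])['locn'].count().reset_index(name='n')

# size() tallies rows in each group, whether or not 'locn' is present
sized = toy.groupby(['continent', 'Country']).size().reset_index(name='n')

print(counted)
print(sized)
```

If every battle should be counted regardless of missing fields, `size()` is the safer choice.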