![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2Fcurriculum-notebooks&branch=master&subPath=SocialStudies/HistoricalBattles/historical-battles.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/callysto/curriculum-notebooks/master/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/></a>

# Exploring the World Historical Battle Database

<img src="images/battle1.png" width="250"/>

Access to this database was granted by its creator and curator, [Dr. Shuhei Kitamura of Osaka University](https://osf.io/j357k). It's important we acknowledge his generosity. 

Canada's involvement in world conflicts is an important part of Canadian history, which makes it a natural topic for applying data science to social studies. This database allows us to explore battles from throughout human history and across the world.

*Curriculum Connections*

- [Historical Battles outside and within Canada](https://education.alberta.ca/media/160209/program-of-study-grade-10.pdf) - Alberta Curriculum

    - Grade 7: Canada: Origins, Histories and Movement of People
    - Grade 8: Canada: Worldviews in Conflict
    - High School Social Studies: Living in a Globalizing World 
    - High School Social Studies: Canada: Worldviews in Conflict
    - Military Studies & Canada and War - World History Course



*Investigating Questions*

- How can data science techniques be applied to analyze Canada's involvement in world conflicts throughout history?
- How does studying battles and conflicts from different time periods and regions contribute to our understanding of human history as a whole?
- How does studying conflicts from around the world promote cross-cultural understanding and contribute to a global perspective on historical events?
- What are the potential limitations and challenges in applying data science to analyze and interpret historical data, particularly in the context of Canada's involvement in world conflicts?



### Import the Data
The code below will import the Python programming libraries we need to gather and organize the data to answer our questions. `▶Run` the code cell below.

In [None]:
import pandas as pd
import numpy as np
import plotly.express as px
import warnings
import pycountry_convert as pc
import requests

`▶Run` the code cell below to load the historical battle data into a dataframe

Note: This data takes some time to load 

In [None]:
# Read in the data
data = pd.read_excel('https://raw.githubusercontent.com/callysto/data-files/main/SocialStudies/HistoricalBattles/whbd_v11.xlsx')
data2 = pd.read_csv('https://raw.githubusercontent.com/callysto/data-files/main/SocialStudies/HistoricalBattles/historical_battles.csv')
data = data.merge(data2,how='left')
data = data[data['year'].notna()]
data['year'] = data['year'].map(lambda x: int(x))
data.sort_values('year',inplace=True)
data
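
The merge and clean-up steps in the cell above can be tried out on a much smaller scale. The sketch below uses two invented toy tables (the names `battles` and `extra` and all of their values are made up for illustration and are not taken from the database). It shows that a left merge keeps every row of the first table, and that rows without a year are dropped before the year is stored as an integer.

```python
import pandas as pd

# Toy stand-ins for the two files; every value here is invented
battles = pd.DataFrame({
    'bname': ['Battle A', 'Battle B', 'Battle C', 'Battle D'],
    'year': [1812, 1917, 1944, None],
})
extra = pd.DataFrame({
    'bname': ['Battle A', 'Battle B'],
    'casualties': [100, 2500],
})

# With no `on` argument, merge joins on the columns both tables share ('bname').
# how='left' keeps all rows of `battles`, filling unmatched values with NaN.
merged = battles.merge(extra, how='left')

# Drop rows without a year, then store the year as an integer
merged = merged[merged['year'].notna()].copy()
merged['year'] = merged['year'].astype(int)
merged = merged.sort_values('year')

print(merged)
```

Battle C stays in the result even though it has no match in `extra`; its `casualties` value is simply missing.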

In [None]:
# Listing all 42 columns in the data
data.columns
for col in data.columns:
    print(col)

There are a lot of columns in this dataframe. Let's drop the columns that we won't be using so the dataframe will be easier to read and understand.

In [None]:
#Drop unused columns
data.drop(['uk', 'fr', 'de', 'sp', 'sw', 'tr', 'at', 
            'ru', 'nl', 'it', 'pt', 'dk', 'habsburg', 'hre',
            'naval', 'river', 'lake', 'air', 'multiple', 'entire',
            'ongoing', 'plan', 'nopage', 'unknown'], axis=1,inplace=True)

for col in data.columns:
    print(col)

_Column Description_
- **bname:** Name of a battle.
- **lr:** Which side a belligerent belongs to. For example, if more than one belligerent is marked with an "l" (which means that it appears on the "left" side in the summary box on the Wikipedia page) in a battle, it means that these belligerents fought on the same side in that battle. For battles with more than two sides, see `multsides` below.
- **year, year_end:** Start and end years of the battle, according to the Gregorian calendar. If the battle did not end in the same year, `year_end` reports the end year of the battle. Otherwise, `year_end` gives a missing value.
- **war:** Name of a war that a battle belongs to (if any). This variable is incomplete and some information is missing. However, it is left in the dataset for reference.
- **bell:** Name of a belligerent. In some cases, the belligerent name can be a name of a coalition or group of (city-)states (e.g., United Nations, CJTF–OIR, Arcadian League in ancient Greece) without indicating any particular (city-)states involved in the battle. A belligerent name ending in "(?)" means that a summary box in the English Wikipedia page is not available. Wikipedia pages in other languages are used as alternative sources of information.
- **multsides:** Indicator of a battle in which belligerents are divided into more than two sides. For such battles, because of the data structure of the WHBD, `bell` gives a missing value. This means that all the GP variables for such battles also take the value of zero. If you want to use such battles in your analysis, you need to modify the GP variables accordingly. Fortunately, such battles are not very common.
- **win:** Result of a battle. This takes the same value for all belligerents who fought on the same side (see `lr` above).
- **lat, lng:** Geo-coordinates of a battle
- **locn:** Name of a place where a battle took place
- **wd_url:** WikiData link for the battle
- **wp_url:** Wikipedia link for the battle
- **casualties:** Number of deaths in the battle
- **True Location:** The geographical location of the battle
- **continent:** The continent the battle was fought on

## CANADA 🇨🇦

The code below will arrange the data cleanly so that we can analyze it. This is a quality control step for our data and involves examining the data to detect anything odd with the data (e.g. structure, missing values), fixing the oddities, and checking if the fixes worked.
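
As a minimal sketch of the examine, fix, and check steps described above, the example below works on a small invented table (the name `toy` and its values are assumptions for illustration only). It counts missing values per column, removes rows without a belligerent, and then confirms that the fix worked.

```python
import pandas as pd

# Invented data with a few deliberate gaps
toy = pd.DataFrame({
    'bname': ['Battle A', 'Battle B', 'Battle C'],
    'bell': ['Canada', 'Canada', None],
    'casualties': [100.0, None, 40.0],
})

# 1. Examine: column types and the number of missing values in each column
print(toy.dtypes)
missing = toy.isna().sum()
print(missing)

# 2. Fix: keep only the rows that name a belligerent
cleaned = toy.dropna(subset=['bell'])

# 3. Check: no missing belligerents should remain
print(cleaned['bell'].notna().all())
```

Missing casualties are left in place here, because a battle can still be mapped without a casualty count.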

First, we will look at the battles that specify Canada as a combatant.

In [None]:
# Filter to only look at the battles that specify Canada as a combatant.
country_data = data[data['bell']=='Canada']
country_data
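
The filter above keeps only rows whose `bell` value is exactly `'Canada'`. The sketch below, on invented data, compares that exact match with a substring match. The row `'Canada (?)'` is a made-up illustration of the "(?)" naming pattern mentioned in the column description; whether such a row exists in the real database is an assumption that has not been checked here.

```python
import pandas as pd

# Invented belligerent names, including a missing value
toy = pd.DataFrame({
    'bell': ['Canada', 'Canada (?)', 'United Kingdom', None],
})

# Exact match: only rows equal to 'Canada'
exact = toy[toy['bell'] == 'Canada']

# Substring match: any name containing 'Canada'.
# na=False treats missing names as "no match"; regex=False searches for plain text.
partial = toy[toy['bell'].str.contains('Canada', na=False, regex=False)]

print(len(exact), len(partial))
```

A substring match can also pick up unrelated names that happen to contain the search text, so it is worth inspecting the matched names before relying on it.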

### Visualizations of battles where Canada is listed as a combatant, and their casualties

Zoom in, then hover over a marker to see the name of the battle and more information. The bigger the "bubble", the more casualties.

In [None]:
fig = px.scatter_geo(country_data, lat='lat', lon='lng', 
               height=500,
               hover_name='bname', 
               hover_data=['year'],
               color= 'war',
               title='Battles participated in by post-Confederation Canada')

fig.update_layout(showlegend=False)
fig.show()

In [None]:
casualties_con = country_data[country_data['casualties'].notna()]

px.scatter_geo(casualties_con, lat='lat', lon='lng', 
               height=500, hover_name='bname', 
               hover_data=['war', 'year'],
               size = 'casualties',
               color='casualties',
               title='Casualties of battles participated in by Canada')

`▶Run` the code cell below to calculate the casualties for Canada's bloodiest wars.

In [None]:
con_bloodiest_war = country_data.groupby('war',as_index=False)['casualties'].sum()
con_bloodiest_war = con_bloodiest_war[con_bloodiest_war['casualties'] > 0]
con_bloodiest_war.sort_values('casualties',inplace=True)
con_bloodiest_war
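
To see why the `> 0` filter is needed, here is a small sketch on invented data (the wars and numbers are made up). When every casualty value for a war is missing, `sum()` returns 0 for that war rather than a missing value, so filtering on `> 0` removes wars with no recorded casualties. The call to `nlargest` is just one possible way to pick the largest totals.

```python
import pandas as pd

# Invented data: 'War Z' has no recorded casualties at all
toy = pd.DataFrame({
    'war': ['War X', 'War X', 'War Y', 'War Z', 'War Z'],
    'casualties': [100, 250, 40, None, None],
})

# Total casualties per war; missing values are skipped, so 'War Z' sums to 0
totals = toy.groupby('war', as_index=False)['casualties'].sum()
print(totals)

# Keep only wars with recorded casualties
nonzero = totals[totals['casualties'] > 0]

# Pick the two largest totals, largest first
top = nonzero.nlargest(2, 'casualties')
print(top)
```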

Visualization of the top 10 bloodiest wars for Canada

In [None]:
# Take the 10 wars with the highest casualty totals, largest first
top_10_bloodiest_con_wars = con_bloodiest_war.nlargest(10, 'casualties')

fig = px.bar(top_10_bloodiest_con_wars,x='war',y='casualties',height=500,title= "Canada's Bloodiest Wars")

fig.show()

### Visualizations of historical battles fought in present-day Canada and their casualties

The scatter plot on a geographic map displays the battles, where each battle is represented by a marker at its corresponding latitude and longitude coordinates. The markers are color-coded based on the war in which the battle took place. Hovering over a marker reveals additional information such as the battle's name and the corresponding year.

In [None]:
# Southernmost point of Canada is Lake Erie, ON, at 41°40' N; easternmost is Cape Spear, NL, at 52°37' W
# We also need to exclude a single WWII Pacific battle that happened off the coast of Alaska that didn't involve Canada
NA_data = data[(data['lat']>41.6) & 
               (data['lng']<-52.6) & 
               (data['lng']>-160)]

# There are also many wars in this subset that don't feature Canada, so we can list them here to remove:
remove = ['American Revolutionary War',
          'Sioux Wars',
          "Red Cloud's War",
          'Dakota War of 1862',
          'Russo-Tlingit War',
          'Great Sioux War of 1876',
          'Powder River Expedition',
          'American Civil War',
          'Yellowstone Expedition of 1873',
          'Nez Perce War',
          'Comanche Campaign',
          'Boston campaign',
          'Modoc War',
          'American Revolution',
          "King Philip's War",
          'Black Hawk War',
          'Colorado War',
          'American Indian Wars',
          'Forage War',
          "Coeur d'Alene War",
          'Yakima War',
          'Philadelphia campaign',
          'Ghost Dance War']

# Remove wars listed above, as well as battles without a specific war (that all happened in the USA)
NA_data = NA_data[(~NA_data['war'].isin(remove)) & (~NA_data['war'].isnull())]
          
          
          
fig = px.scatter_geo(NA_data, lat='lat', lon='lng',
               height= 500, 
               hover_name='bname', 
               color='war',
               hover_data=['year'], 
               fitbounds='locations',
               title='Historical battles fought in present-day Canada')

fig.update_layout(showlegend=False)
fig.show()
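
The filtering in the cell above has two parts: a latitude and longitude "bounding box", followed by an exclusion list. The sketch below repeats both steps on invented battles (all names and coordinates are made up) so the effect of each condition is easy to follow.

```python
import pandas as pd

# Invented battles: only the first one should survive both filters
toy = pd.DataFrame({
    'bname': ['North battle', 'South battle', 'East battle',
              'Excluded-war battle', 'No-war battle'],
    'lat': [46.8, 35.0, 48.9, 45.0, 44.0],
    'lng': [-71.2, -80.0, 2.3, -75.0, -70.0],
    'war': ['War X', 'War X', 'War Y', 'War Q', None],
})

# Step 1: keep points inside the box (north of 41.6, between -160 and -52.6)
in_box = toy[(toy['lat'] > 41.6) &
             (toy['lng'] < -52.6) &
             (toy['lng'] > -160)]

# Step 2: drop battles from listed wars, and battles with no war recorded
remove = ['War Q']
kept = in_box[~in_box['war'].isin(remove) & in_box['war'].notna()]

print(kept['bname'].tolist())
```

A bounding box is only an approximation of a country's borders, which is why the second step is needed to remove battles that fall inside the box but belong to other conflicts.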

In [None]:
NA_casualties = NA_data[NA_data['casualties'].notna()]

px.scatter_geo(NA_casualties, lat='lat', lon='lng', 
               height=500, hover_name='bname', 
               size = 'casualties',
               size_max = 30,
               color = 'casualties',
               hover_data=['war','year'], 
               fitbounds='locations',
               title='Historical battles fought in present-day Canada')

### A scatter plot map of battles fought in Canada
This visualization combines geographical data with interactivity, allowing users to explore specific battle locations and their details against the landscape shown on the map.

In [None]:
fig = px.scatter_mapbox(NA_data, lat="lat", lon="lng", hover_name="bname", hover_data=['year'],height=500,
                        color_discrete_sequence=["fuchsia"],zoom=2.5)
fig.update_layout(
    mapbox_style="white-bg",
    mapbox_layers=[
        {
            "below": 'traces',
            "sourcetype": "raster",
            "sourceattribution": "United States Geological Survey",
            "source": [
                "https://basemap.nationalmap.gov/arcgis/rest/services/USGSImageryOnly/MapServer/tile/{z}/{y}/{x}"
            ]
        }
      ])
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

Questions: 

1. Is there a correlation between the location of battles and the wars in which post-Confederation Canada was involved?
1. What is the overall geographical distribution of the battles in which post-Confederation Canada participated?
1. How does the map representation enhance the understanding or analysis of the data compared to other visualization techniques?

## WORLD 🌎

Analyze other countries' military engagements: 

You can analyze the battles in which **your chosen country** was involved as a combatant. This includes studying the historical context, examining the locations and dates of the battles, and identifying the rivals.

### Data of the battles involving the country of interest


In [None]:
country_of_interest = 'Russia' #you can change this to a country you are interested in

# Filter to only look at the battles that specify the country of interest as a combatant.
country_of_interest_data = data[data['bell']==country_of_interest]
display(country_of_interest_data)

### Visualization of the battles involving the country of interest

We can plot the battles on a map using their latitude and longitude coordinates.

You can drag to move around the map, zoom in and out to get more clarity. Hovering over each data point lists the name of the battle, as well as the war in which the battle was fought.

In [None]:
country_of_interest_data = country_of_interest_data[country_of_interest_data['bname'].notna()]

fig = px.scatter_geo(country_of_interest_data, lat='lat', lon='lng', 
               height=500,
               hover_name='bname', 
               hover_data=['year'],
               color= 'war',
               title='Battles participated in by ' + country_of_interest)

fig.update_layout(showlegend=False)
fig.show()

### Animated Visualization that showcases the wars fought by the country over the years

This visualization represents the battles in which a specific country of interest participated over time.

`▶Run` the next two code cells below to load animation

In [None]:
warnings.filterwarnings("ignore")
years = list(country_of_interest_data['year'].unique())

animate_country_data = pd.DataFrame(country_of_interest_data)

def animation_years(row):
    global animate_country_data
   
    df = pd.DataFrame(columns=animate_country_data.columns)
    index = years.index(row['year'])
    for i in years[index+1:]:
        row['year'] = i  # set the year by name so the battle also appears in this later frame
        df.loc[len(df.index)] = row
    
    animate_country_data = pd.concat([animate_country_data,df],ignore_index=True)

for i in range(len(country_of_interest_data.index)):
    animation_years(country_of_interest_data.iloc[i])

animate_country_data
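
The loop above appears to copy each battle into every later year, so that a battle stays visible in the animation once it has happened. As a hedged alternative sketch (not the notebook's own method), the same idea can be expressed by pairing every battle with every year and keeping the pairs where the frame year is at or after the battle year. The data and the column name `frame_year` are invented for illustration, and `how='cross'` assumes pandas 1.2 or later.

```python
import pandas as pd

# Invented battles, one per year
toy = pd.DataFrame({
    'bname': ['Battle A', 'Battle B', 'Battle C'],
    'year': [1812, 1917, 1944],
})

# One row per animation frame (hypothetical column name)
frames = pd.DataFrame({'frame_year': sorted(toy['year'].unique())})

# Pair every battle with every frame, then keep frames at or after the battle
cumulative = toy.merge(frames, how='cross')
cumulative = cumulative[cumulative['frame_year'] >= cumulative['year']]

# Number of battles visible in each frame grows over time
counts = cumulative.groupby('frame_year')['bname'].count()
print(counts)
```

Unlike the loop, this sketch keeps the original `year` and stores the frame in a separate column, so `frame_year` would be the column to animate on.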

In [None]:
def rank_rows(df):
    
    df['sort_rank'] = 0
    curr_rank = 1
    for b in battles:
        temp = df.loc[df['bname'] == b]
        for i,rows in temp.iterrows():  
            # Use .loc for a single, explicit assignment instead of chained indexing
            if df.loc[i, 'sort_rank'] == 0:
                df.loc[i, 'sort_rank'] = curr_rank
                curr_rank += 1
    return df        
            

    
animate_country_data.sort_values(['year'],inplace=True)            
battles = list(animate_country_data.sort_values('year')['bname'].unique())
animate_country_data = rank_rows(animate_country_data)

animate_country_data.sort_values('sort_rank',inplace=True)
animate_country_data

By clicking on the "play" button at the bottom, we can watch the battles that the country of interest has fought throughout the years.

In [None]:
px.scatter_geo(animate_country_data, lat='lat', lon='lng', 
               height=500, hover_name='bname', 
               animation_frame= 'year',
               animation_group='war',
               title='Battles participated in by ' + country_of_interest)

Questions: 
1. How does the country of interest's participation in battles compare to its neighboring countries or regions?
1. How does the number of battles vary across different years for the country of interest?

### Visualization of battles involving specific countries

The code cell below filters the data to keep only the battles involving the countries of interest. This filtered dataset can then be used for analysis, visualization, or other tasks related to those countries.

We can also look at which continents Canada has fought the most in.

**You can change and add/remove countries in this list**

In [None]:
countries = ['Canada','Japan','India','Spain'] #change and add/remove countries in this list

# Keep rows with a known continent; .copy() avoids modifying a view of `data`
battles_continent = data[data['continent'].notna()].copy()
# Flag the rows whose belligerent is one of the chosen countries
battles_continent['interest'] = battles_continent['bell'].isin(countries)
battles_continent = battles_continent[battles_continent['interest']]
battles_continent = battles_continent.rename(columns={'bell':'Country'})
battles_continent

In [None]:
continent_grouped = battles_continent.groupby(['continent','Country'])['locn'].count().reset_index(name='Number of Battles Fought')
continent_grouped.sort_values('Number of Battles Fought',ascending=False,inplace=True)
fig = px.bar(continent_grouped,x='Country',y='Number of Battles Fought',color='continent',height=500,title='Number of Battles fought by each country in different Continents')
fig.show()
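
One detail of the counting step is worth a small illustration. In pandas, `count()` on a column counts only the non-missing values, so a battle with no recorded place name in `locn` is not counted, while `size()` counts every row. The sketch below uses invented data to show the difference.

```python
import pandas as pd

# Invented data: one of Japan's battles has no place name
toy = pd.DataFrame({
    'bell': ['Canada', 'Canada', 'Japan', 'Japan', 'France', 'Canada'],
    'continent': ['Europe', 'Europe', 'Asia', 'Asia', 'Europe', None],
    'locn': ['Place 1', 'Place 2', 'Place 3', None, 'Place 5', 'Place 6'],
})
countries = ['Canada', 'Japan']

# Keep chosen countries with a known continent
subset = toy[toy['continent'].notna() & toy['bell'].isin(countries)]

# count() skips the missing place name
counts = (subset.groupby(['continent', 'bell'])['locn']
                .count()
                .reset_index(name='Number of Battles Fought'))
print(counts)

# size() counts every row, including the one with a missing place name
sizes = subset.groupby(['continent', 'bell']).size()
print(sizes)
```

Which of the two is more appropriate depends on whether battles without a place name should be included in the totals.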

Questions: 
1. Are there any notable differences in the number of battles fought between countries within the same continent?
1. Is there a correlation between a country's geographical location (continent) and the number of battles fought?
1. Is there any correlation between Canada's geographic proximity to a specific country and its number of battles fought within that country?

# Conclusion

Want more data science resources for battles fought by Canada? Check out [The cost of the First World War](https://www.callysto.ca/2020/11/11/data-visualization-of-the-week-the-cost-of-the-first-world-war/).

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)