<h1>Identified Australian Wars and Resistance</h1>
(c) Bill Pascoe and Kaine Usher, 2025

For important information on how to understand this notebook, see the Introduction <a href="AWR_Introduction.ipynb">AWR_Introduction.ipynb</a>.

This notebook reads in a file of adjusted data on Australian Wars and Resistance prior to the 1930s, including clustering results, and produces maps of:

- stages within wars
- wars
- regions within which wars occurred
- periods within which regions were at war

This file is the result of careful observation of results from spatio-temporal distance-based and k-nearest neighbour clustering methods. These methods use massacres to identify periods of intense conflict that can be regarded as a 'war'. However, many other factors are relevant to wars, so this method can be expected to be roughly accurate but to require some adjustment.
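
The sketch below illustrates what spatio-temporal distance-based clustering means. Two sites are linked when they fall within both a spatial threshold and a temporal threshold of each other, and each connected group of linked sites forms a cluster. This is a simplified, hypothetical stand-in written only for explanation. It is not the geojikuu implementation used to produce the clusters in this dataset. The thresholds and example sites are invented placeholders.

```python
import math
from datetime import date

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def st_threshold_clusters(sites, max_km, max_days):
    """Label sites (lat, lon, date) so that sites within max_km AND max_days of
    each other share a cluster; clusters are connected groups (single linkage)."""
    parent = list(range(len(sites)))

    def find(i):
        # Union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(sites)):
        for j in range(i + 1, len(sites)):
            lat1, lon1, d1 = sites[i]
            lat2, lon2, d2 = sites[j]
            if (haversine_km(lat1, lon1, lat2, lon2) <= max_km
                    and abs((d2 - d1).days) <= max_days):
                parent[find(i)] = find(j)  # merge the two clusters

    # Renumber cluster roots as 0, 1, 2... in order of first appearance
    labels, out = {}, []
    for i in range(len(sites)):
        root = find(i)
        labels.setdefault(root, len(labels))
        out.append(labels[root])
    return out

# Invented example sites: three close in space and time, one far away and decades later
sites = [
    (-31.0, 151.0, date(1838, 6, 1)),
    (-31.1, 151.1, date(1838, 9, 1)),
    (-31.2, 151.0, date(1839, 1, 1)),
    (-20.0, 140.0, date(1868, 1, 1)),
]
print(st_threshold_clusters(sites, max_km=100, max_days=365))  # [0, 0, 0, 1]
```

Raising or lowering the thresholds merges or splits clusters. This is why, as described below, clusters identified at different thresholds suited different regions.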

Each site was reviewed by Dr Bill Pascoe, who has worked with the massacre data for 8 years, to check whether:

- the clusters distinguished by automated methods made sense as wars in light of his knowledge of historical sources and transmitted Indigenous knowledge
- each site belonged in the cluster it had been assigned to

If not, appropriate changes were made. The clustering method proved generally effective in identifying wars with only slight adjustments and corrections needed.

For example, different thresholds produce differently sized clusters in different parts of the country, so clusters identified at different thresholds were selected for different regions. In some cases, sites were outliers and were removed from a cluster. For example, a massacre occurring a decade or more after nearby sites would have skewed the cluster, or war, to a very late end date and would not have appropriately represented the period of greatest intensity of conflict in that region, though that massacre remains an important part of history. In other cases, the clustering could not take into account factors like the Great Dividing Range separating peoples and spheres of activity. In yet other cases, massacres did involve people and events across such divides, because the range was used strategically and tactically.
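
A small screening helper shows how a late outlier shifts a cluster's end date. This is a hypothetical aid, not the procedure used here, since sites were reviewed manually. It flags sites that come after a gap of more than a chosen number of years in a cluster's chronology, then compares the end date with and without them. The 10-year default and the dates are placeholders.

```python
from datetime import date

def flag_late_outliers(dates, max_gap_years=10):
    """Return indices of sites that come after the first gap longer than
    max_gap_years in chronological order (a hypothetical screening rule)."""
    order = sorted(range(len(dates)), key=lambda i: dates[i])
    flagged = []
    in_tail = False
    for prev, cur in zip(order, order[1:]):
        # Once a long gap is found, every later site belongs to the trailing tail
        if (dates[cur] - dates[prev]).days / 365.25 > max_gap_years:
            in_tail = True
        if in_tail:
            flagged.append(cur)
    return flagged

# Invented example: three sites within a year, then one about 13 years later
dates = [date(1838, 6, 1), date(1838, 9, 1), date(1839, 1, 1), date(1852, 5, 1)]
flagged = flag_late_outliers(dates)                                 # [3]
end_all = max(dates)                                                # 1852-05-01
end_kept = max(d for i, d in enumerate(dates) if i not in flagged)  # 1839-01-01
```

A flagged site is only a candidate for review. Leaving it out of a cluster's date range does not remove it from the history.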

This reviewed and adjusted data contains groupings at several scales:

- Stages: These are stages within a war and are important for understanding the ebb and flow of the war, such as a war starting in one region and ending in another.
- War: These can be understood as distinct wars, for example involving the same peoples, in the same region over a distinct period of time. Wars are often closely related to each other, but to understand them as connected we must first understand them as distinct.
- Regions: These wars happened within broader regions, such as the Kimberley, or the Gulf Country, or the East Coast.
- Periods: There are distinct overall periods or phases (early, south, north and late).
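
Every site carries a label at each scale, so the groupings are meant to nest: each stage within one war, and each war within one region. Assuming that is the intended structure, a quick pandas sketch can list any finer group that has been assigned to more than one coarser group. This check is not part of the notebook. The column names follow the notebook's code ("WarStage", "War", "Region"), and the rows are invented.

```python
import pandas as pd

def nesting_violations(df, child, parent):
    """Return the child groups that appear under more than one parent group."""
    parents_per_child = df.groupby(child)[parent].nunique()
    return sorted(parents_per_child[parents_per_child > 1].index)

# Invented rows for illustration only
sites = pd.DataFrame({
    "WarStage": ["Stage 1a", "Stage 1b", "Stage 2a", "Stage 2a"],
    "War": ["War 1", "War 1", "War 2", "War 2"],
    "Region": ["Region X", "Region X", "Region X", "Region Y"],
})

stage_issues = nesting_violations(sites, "WarStage", "War")  # []
war_issues = nesting_violations(sites, "War", "Region")      # ['War 2']
```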

The names used for each of these groupings are only temporary working terms related to the location they are in. The naming of these wars needs to be considered with the involvement of Aboriginal and Torres Strait Islander people. 

At this early stage, while there are no clear and fixed boundaries, massacres indicate the extent of open mortal violence across country, and a minimal start and end date for each war. Massacres are only a part of the story. It is expected that the extent of wars will be adjusted in response to further information, and that other events already are, or will come to be, regarded as signalling the 'start' or 'end' of war.

These points may be debated for years or decades to come, but it is important to have identified a kernel of each war (indicated by these clusters of extreme violence in the form of massacres) as a basis from which to improve our knowledge of history. 

<h3>Parameter Selection</h3>

In [None]:
# Enter file path of dataset:
file_path = "MassacresInAustralianWars.csv"


In [None]:
# this setting enables displaying output multiple times in one cell, such as when looping, instead of just the last call.
from IPython.core.interactiveshell import InteractiveShell  
InteractiveShell.ast_node_interactivity = "all" 

<h3>Prepare Data</h3>

In [None]:
import pandas as pd
df_initial = pd.read_csv(file_path)

#df = df_initial.drop(["Narrative", "Sources", "Group", "Linkback"], axis=1)
df = df_initial.filter(["ghap_id", "title", "description", "latitude", "longitude", "datestart", "dateend", "linkback", "Victims", "VictimsDead", "Attackers", "AttackersDead", "MassacreGroup", "KNN1", "War", "Stage", "Region", "Period"], axis=1)

df["ghap_id"] = df["ghap_id"].astype(str)

from geojikuu.preprocessing.projection import MGA2020Projector
mga_2020_projector = MGA2020Projector("wgs84")
results = mga_2020_projector.project(list(zip(df["latitude"], df["longitude"])))
df["mga_2020"] = results["mga2020_coordinates"]
unit_conversion = results["unit_conversion"]

from geojikuu.preprocessing.conversion_tools import DateConvertor
date_convertor = DateConvertor(date_format_in="%Y-%m-%d", date_format_out="%Y-%m-%d")
df['date_converted'] = df['datestart'].apply(date_convertor.date_to_days)
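
The date conversion turns each start date into a number of days, so time can be compared as a distance alongside space. This notebook does not show which reference date geojikuu's date_to_days counts from. The standard-library sketch below therefore assumes a fixed epoch (date.toordinal, counting from 1 January of year 1). For distance-based comparisons, only differences between dates matter, so the choice of epoch does not affect them.

```python
from datetime import datetime

def date_to_days(text, fmt="%Y-%m-%d"):
    """Convert a date string to a day count from a fixed epoch (hypothetical stand-in)."""
    return datetime.strptime(text, fmt).date().toordinal()

gap = date_to_days("1838-09-01") - date_to_days("1838-06-01")  # 92 days
```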


In [None]:
import geopandas


<h3>Output</h3>

<h4>Methods For Cluster Summary</h4>

In [None]:
# This should be refactored. You'd want to just form the cluster once, then get all these properties off the one object
# but it works, and right now getting the results urgently is more important than perfect code.

def getConvexHull(bycol, id, polygononly):
    ## Select the sites whose value in column `bycol` equals `id` (e.g. every site in one war)
    cluster = df_initial[df_initial[bycol] == id]

    # temporarily use geopandas to create a 'geometry' from the coordinates in this cluster so we can call the convexhull method on it
    gdf = geopandas.GeoDataFrame(
        cluster, geometry=geopandas.points_from_xy(cluster.longitude, cluster.latitude), crs="EPSG:4326"
    )
    chull = gdf.geometry.union_all().convex_hull

    # One site (or several at the same spot) gives a Point; two sites, or sites on a straight line, give a LineString.
    # When polygononly is True, return None for these so that only areas that can be shaded are mapped.
    if polygononly and chull.geom_type != "Polygon":
        return None
    return chull

def getCentroid(bycol, id, polygononly):
    ## Select the sites whose value in column `bycol` equals `id`
    ## (polygononly is accepted for symmetry with getConvexHull but is not used here)
    cluster = df_initial[df_initial[bycol] == id]

    # temporarily use geopandas to create a 'geometry' from the coordinates in this cluster so we can take its centroid
    gdf = geopandas.GeoDataFrame(
        cluster, geometry=geopandas.points_from_xy(cluster.longitude, cluster.latitude), crs="EPSG:4326"
    )
    return gdf.geometry.union_all().centroid

def getStart(bycol, id):
    sortedC = df_initial[df_initial[bycol] == id]
    sortedC = sortedC.sort_values(by=["datestart"], ascending=True)
    return sortedC['datestart'].values[:1][0]

def getEnd(bycol, id):
    sortedC = df_initial[df_initial[bycol] == id]
    sortedC = sortedC.sort_values(by=["dateend"], ascending=False)
    return sortedC['dateend'].values[:1][0]

def getCount(bycol, id):
    countthis = df_initial[df_initial[bycol] == id]
    return len(countthis.index)
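
getConvexHull only returns a hull for groups that can form a polygon. The convex hull of one site, or of several sites at the same spot, is a point. The hull of two sites, or of sites on a straight line, is a line. Neither has an area to shade on the map. The pure-Python sketch below shows how the number of hull vertices determines the geometry type. It uses Andrew's monotone chain algorithm in place of shapely's convex_hull, so it is an illustration only.

```python
def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order, first vertex not repeated."""
    pts = sorted(set(points))  # duplicates collapse, as co-located sites do
    if len(pts) <= 1:
        return pts

    def cross(o, a, b):
        # > 0 for a left turn o->a->b, 0 if collinear, < 0 for a right turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def hull_kind(points):
    """Name the geometry type a hull with this many vertices corresponds to."""
    n = len(convex_hull(points))
    return {0: "empty", 1: "Point", 2: "LineString"}.get(n, "Polygon")

# (lon, lat) pairs, the order shapely uses; the values are invented
print(hull_kind([(150.0, -31.0)]))                                   # Point
print(hull_kind([(150.0, -31.0), (150.5, -31.2)]))                   # LineString
print(hull_kind([(150.0, -31.0), (150.5, -31.5), (151.0, -32.0)]))   # LineString (collinear)
print(hull_kind([(150.0, -31.0), (150.5, -31.0), (150.0, -31.5)]))   # Polygon
```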


<h3>Visualisation</h3>

In [None]:
import random
import folium
from folium import plugins

def flipLatLng(ll) :
    return (ll[1],ll[0])

def showMap(df_initial, summary, keyname, fillpolygon):
    
    map_center = [df_initial['latitude'].mean(), df_initial['longitude'].mean()]
    mapc = folium.Map(location=map_center, zoom_start=4)
    
    folium.TileLayer(
        tiles = 'https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{z}/{y}/{x}',
        attr = 'Esri',
        name = 'Esri Satellite',
        overlay = False,
        control = True
        ).add_to(mapc)

    # Random hex colour built from the digits 1-E, so each channel stays between 11 and EE
    def random_color():
        red = ''.join([random.choice('123456789ABCDE') for _ in range(2)])
        green = ''.join([random.choice('123456789ABCDE') for _ in range(2)])
        blue =  ''.join([random.choice('123456789ABCDE') for _ in range(2)])
        return "#" + red + green + blue

    cluster_colors = {cluster: random_color() for cluster in df_initial[keyname].unique()}

    # Add polygons

    if fillpolygon : 
        popacity = 0.4
    else :
        popacity = 0
    
    for _, row in summary.iterrows():
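        # Only polygons can be drawn as shaded areas: skip groups with no hull,
        # or whose hull is a Point or LineString (e.g. stages with one or two sites)
        hull = row["wkt"]
        if hull is None or hull.geom_type != "Polygon":
            continue

        # Shapely/GeoPandas give coordinates as (lon, lat), but folium and Leaflet expect (lat, lon), so flip them
        locpoly = list(map(flipLatLng, list(hull.exterior.coords)))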
        
        folium.Polygon(
            locations=locpoly,
            color='#EEEEEE', #cluster_colors[row['War']],
            weight=1,
            opacity=1,
            line_join='round',
            fill_color=cluster_colors[row[keyname]],
            fill_opacity=popacity,
            fill=True,
            popup=f"<b>Title:</b> {row[keyname]}<br><br>"
                  f"<b>Count:</b> {row['count']}<br><br>"
                  f"<b>Centroid:</b> {row['centroid']}<br><br>"
                  f"<b>Earliest massacre:</b> {row['startdate']}<br><br>"
                  f"<b>Latest massacre:</b> {row['enddate']}<br><br>",
             #     f"<b>Temporal Midpoint:</b> {row['temporal_midpoint']}<br><br>"
             #     f"<b>Spatial Midpoint:</b> {row['spatial_midpoint']}<br><br>",
            tooltip="Cluster details",
        ).add_to(mapc)

    # "ghap_id", "title", "datestart", "dateend", "linkback", "Victims", "VictimsDead", "Attackers", "AttackersDead", "count", "mbr", "earliest_date", "latest_date", "temporal_midpoint", "spatial_midpoint", "lat_mid", "lon_mid"
    

    # Add points colour coded in clusters
    for _, row in df_initial.iterrows():
        folium.CircleMarker(
            location=(row['latitude'], row['longitude']),
            radius=5,
            color=cluster_colors[row[keyname]],
            fill=True,
            fill_color=cluster_colors[row[keyname]],
            fill_opacity=1,
            popup=f"<b>Site:</b> {row['title']}<br><br>"
                  f"<b>Lat:</b> {row['latitude']}<br><br>"
                  f"<b>Lon:</b> {row['longitude']}<br><br>"
                  f"<b>Date:</b> {row['datestart']}<br><br>"
                  f"<b>Victims Dead:</b> {row['VictimsDead']}<br><br>"
                  f"<b>Attackers Dead:</b> {row['AttackersDead']}<br><br>"
                  f"<b>Assigned Cluster:</b> {row['KNN1']}<br>"
                  f"<b>Link:</b> <a href='{row['linkback']}' target='_blank'>{row['linkback']}</a><br>"
        ).add_to(mapc)
    folium.plugins.Fullscreen(
        position="topright",
        title="Expand me",
        title_cancel="Exit me",
        force_separate_button=True,
    ).add_to(mapc)
    return mapc
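
random_color assigns new colours on every run, so the same war can change colour between maps. If stable colours are wanted, one option is to derive each colour from a hash of the group name while keeping the same 1-E digit range. This is an assumption, not part of the notebook, and stable_color is a hypothetical name.

```python
import hashlib

DIGITS = "123456789ABCDE"  # same digit set that random_color draws from

def stable_color(key):
    """Derive a repeatable '#RRGGBB' colour from a group name."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    # Map each of the first six digest bytes onto one hex digit from DIGITS
    return "#" + "".join(DIGITS[b % len(DIGITS)] for b in digest[:6])
```

Inside showMap this could replace the dictionary comprehension: `cluster_colors = {cluster: stable_color(cluster) for cluster in df_initial[keyname].unique()}`. Different names can still occasionally hash to similar colours.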



<h2>Results</h2>

<h3>Wars</h3>

In [None]:
# Go through the spreadsheet and build a summary row for each unique war:
# its convex hull, number of sites, centroid, and earliest and latest dates

warTitles = df_initial['War'].unique()

allWars = []

for war in warTitles:
    allWars.append(
        {
            'War': war,
            'wkt': getConvexHull("War", war, True),
            'count': getCount("War", war),
            'centroid': getCentroid("War", war, True),
            'startdate': getStart("War", war),
            'enddate': getEnd("War", war)
        }
    )

df_war = pd.DataFrame(allWars)

showMap(df_initial, df_war, "War", True)




<h3>Stages</h3>

In [None]:
# Go through the spreadsheet and build a summary row for each unique war stage

stageTitles = df_initial['WarStage'].unique()

allStages = []

for stage in stageTitles:

    allStages.append(
        {
            'WarStage': stage,
            'wkt': getConvexHull("WarStage", stage, False),
            'count': getCount("WarStage", stage),
            'centroid': getCentroid("WarStage", stage, False),
            'startdate': getStart("WarStage", stage),
            'enddate': getEnd("WarStage", stage)
        }
    )

df_stage = pd.DataFrame(allStages)

#showMap(df_initial, df_stage, "WarStage", True)


<h3>Regions</h3>

In [None]:
# Go through the spreadsheet and build a summary row for each unique region

regionTitles = df_initial['Region'].unique()
#display(regionTitles)

allRegions = []

for region in regionTitles:

    allRegions.append(
        {
            'Region': region,
            'wkt': getConvexHull("Region", region, True),
            'count': getCount("Region", region),
            'centroid': getCentroid("Region", region, True),
            'startdate': getStart("Region", region),
            'enddate': getEnd("Region", region)
        }
    )

df_region = pd.DataFrame(allRegions)

showMap(df_initial, df_region, "Region", True)


<h3>Periods</h3>

In [None]:
# Go through the spreadsheet and build a summary row for each unique period

periodTitles = df_initial['Period'].unique()
display(periodTitles)

allPeriods = []

for period in periodTitles:

    allPeriods.append(
        {
            'Period': period,
            'wkt': getConvexHull("Period", period, True),
            'count': getCount("Period", period),
            'centroid': getCentroid("Period", period, True),
            'startdate': getStart("Period", period),
            'enddate': getEnd("Period", period)
        }
    )

df_period = pd.DataFrame(allPeriods)

showMap(df_initial, df_period, "Period", True)
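
The four results cells repeat the same loop, and the comment above the summary methods notes that they should be refactored. The sketch below shows one way to compute the count and date range for any grouping column in a single pass, using pandas named aggregation. Hulls and centroids would still come from getConvexHull and getCentroid. summarise_by is a hypothetical name, and the dates are assumed to be ISO 'YYYY-MM-DD' strings, matching the format given to DateConvertor above.

```python
import pandas as pd

def summarise_by(df, bycol):
    """One row per group in bycol: site count, earliest start and latest end date.
    ISO 'YYYY-MM-DD' strings sort chronologically, so min/max give the date range."""
    return (
        df.groupby(bycol, sort=False)  # sort=False keeps groups in order of first appearance
        .agg(
            count=("datestart", "size"),
            startdate=("datestart", "min"),
            enddate=("dateend", "max"),
        )
        .reset_index()
    )

# Invented rows for illustration only
sites = pd.DataFrame({
    "War": ["War 1", "War 1", "War 2"],
    "datestart": ["1838-06-01", "1839-01-01", "1868-01-01"],
    "dateend": ["1838-06-01", "1839-01-02", "1868-01-01"],
})
summary = summarise_by(sites, "War")
```

For example, `summarise_by(df_initial, "Region")` should give the same count, startdate and enddate values as df_region, provided no group labels are missing. Hulls can then be added with `[getConvexHull("Region", r, True) for r in summary["Region"]]`.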

<h3>Output to file</h3>

In [None]:
# Write each summary table to CSV, GeoJSON and KML.
# The hulls are in WGS84 longitude/latitude, so set crs="EPSG:4326" explicitly.
outputs = {
    "Wars": df_war,
    "Stages": df_stage,
    "Regions": df_region,
    "Periods": df_period,
}

for name, summary in outputs.items():
    summary.to_csv(f"{name}Out.csv", index=False)
    gdf = geopandas.GeoDataFrame(summary, geometry="wkt", crs="EPSG:4326")
    gdf.to_file(f"{name}Out.json", driver="GeoJSON")
    gdf.to_file(f"{name}Out.kml", driver="KML")

<b>NOTE: The GeoDataFrames are created with crs="EPSG:4326" because the hull coordinates are WGS84 longitude/latitude. Without a CRS, writing the files produces "UserWarning: 'crs' was not provided.". The files are still output in that case, but without projection information.</b>
