# Case study: The use of camels in warfare

Imagine you are a historian working with the Seshat Global History Databank who wants to explore the use of camels in warfare, with a particular interest in the millennium from 500CE to 1500CE.

A natural first step is to load all of the data that Seshat holds for the camel variable and explore which polities (and how many) are recorded as "present" for the camels field, compared with "absent" and "unknown". Note that a value of "unknown" means the polity has been explicitly coded as unknown for this variable, not that the data is missing.
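The difference between a coded "unknown" and genuinely missing data can be checked with a quick tally. The sketch below uses a small made-up table rather than Seshat data: the polity names are hypothetical, and the `camel` column simply mirrors the field name used later in this notebook.

```python
import pandas as pd

# Toy data (hypothetical polities, not taken from Seshat)
toy_df = pd.DataFrame({
    "name": ["polity_a", "polity_b", "polity_c", "polity_d", "polity_e"],
    "camel": ["present", "absent", "unknown", "present", None],
})

# dropna=False keeps missing entries as their own category,
# so a coded "unknown" is not confused with a missing record
counts = toy_df["camel"].value_counts(dropna=False)
print(counts)

# Count the two cases separately
coded_unknown = int((toy_df["camel"] == "unknown").sum())
missing = int(toy_df["camel"].isna().sum())
```

Here one row is coded "unknown" and one row has no value at all; the two are counted separately.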

In [None]:
from seshat_api import SeshatAPI, get_variable_classes
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
client = SeshatAPI(base_url="https://seshatdata.com/api")
# client = SeshatAPI(base_url="https://seshat-db.com/api")

In [None]:
from seshat_api.wf import Camels
camels = Camels(client)
camels_df = pd.DataFrame(camels.get_all())

In [None]:
# Extract the polities column to a new dataframe
polities_with_camels_df = pd.DataFrame(camels_df['polity'].tolist())

# Add the camel column to the new dataframe
polities_with_camels_df['camel'] = camels_df['camel']

polities_with_camels_df.sample(5)

In [None]:
# Get the number of polities active in the year 500CE where camel == 'present'
len(polities_with_camels_df[
    (polities_with_camels_df['start_year'] <= 500) &
    (polities_with_camels_df['end_year'] >= 500) &
    (polities_with_camels_df['camel'] == 'present')
])

In [None]:
# Get the range of years where data on camels exists
# Note: the range() function in Python generates numbers up to, but not including, the stop value
years = range(int(polities_with_camels_df['start_year'].min()),
              int(polities_with_camels_df['end_year'].max()) + 1)
years

In [None]:
# Let's say we're only interested in the years 500CE to 1500CE
# Note: the range() function in Python generates numbers up to, but not including, the stop value
years = range(500, 1501)
years

In [None]:
# Get a list of all the possible values for this variable (camel) which we know is an absent/present variable
absent_present_values = list(polities_with_camels_df['camel'].unique())
absent_present_values

In [None]:
# Create a zero-filled DataFrame (one row per year, one column per camel value) to hold the counts
frequency_df = pd.DataFrame(0, index=years, columns=absent_present_values)
print(frequency_df.sample(5))  # Show a random sample of 5 rows

In [None]:
# Iterate over each year and count the occurrences of each camel value
# Count the number of rows that match the filter criteria and assign it to the frequency DataFrame
for year in years:
    for val in absent_present_values:
        frequency_df.loc[year, val] = len(polities_with_camels_df[
            (polities_with_camels_df['start_year'] <= year) &
            (polities_with_camels_df['end_year'] >= year) &
            (polities_with_camels_df['camel'] == val)
        ])
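
The nested loop above filters the DataFrame once per year and value, which can become slow over long time spans. As a possible alternative, the same counts can be computed with NumPy broadcasting. This is a minimal sketch on made-up data: the helper `count_active_by_year` and the `toy_` names are hypothetical, and it assumes the same `start_year` and `end_year` columns as above.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical polities, not taken from Seshat)
toy_df = pd.DataFrame({
    "start_year": [400, 600, 900, 450],
    "end_year": [700, 1200, 1600, 550],
    "camel": ["present", "present", "absent", "unknown"],
})
toy_years = np.arange(500, 1501)  # 500 to 1500 inclusive

def count_active_by_year(df, years, value_col):
    """Count rows active in each year, with one column per recorded value."""
    start = df["start_year"].to_numpy()[:, None]  # shape (n_polities, 1)
    end = df["end_year"].to_numpy()[:, None]
    # Boolean matrix of shape (n_polities, n_years): is polity i active in year j?
    active = (start <= years) & (years <= end)
    counts = {
        val: active[(df[value_col] == val).to_numpy()].sum(axis=0)
        for val in df[value_col].unique()
    }
    return pd.DataFrame(counts, index=years)

toy_frequency_df = count_active_by_year(toy_df, toy_years, "camel")
```

The trade-off is memory: the intermediate boolean matrix has one entry per polity per year, which is fine for a thousand years and a few hundred polities but worth keeping in mind for much larger ranges.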

In [None]:
print(frequency_df.sample(5))  # Show a random sample of 5 rows

In [None]:
# View the frequencies for the specific year 500
frequency_df.loc[500]

Now let's plot the whole 500CE to 1500CE period to see how many polities are recorded as "absent", "present" or "unknown" in each year:

In [None]:
# Create a new figure to plot the data
plt.figure(figsize=(13, 7))

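# Map each value to a fixed color so that colors do not depend on the order returned by unique()
# (this color choice is an assumption; values not listed here fall back to matplotlib's default color cycle)
colors = {'present': 'green', 'absent': 'red', 'unknown': 'orange'}

# Iterate over each absent/present value and plot the data
for val in absent_present_values:
    plt.plot(frequency_df.index, frequency_df[val], color=colors.get(val))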

plt.xlabel('Year')
plt.ylabel('Polities')
plt.title('Polities using Camels in warfare: 500CE to 1500CE')

# Ensure y-axis ticks are whole numbers
plt.yticks(range(int(frequency_df.values.max()) + 1))

# Ensure x-axis ticks go right to the edge of the plot
plt.xlim(frequency_df.index.min(), frequency_df.index.max())

# Add legend to the plot
plt.legend([val.capitalize() if val != 'unknown' else 'Coded Unknown' for val in absent_present_values])

# Display the plot
plt.show()

# Suggested task

Imagine you are now exploring the use of different metals for military purposes. The Seshat database has data for `Steel`, `Iron`, `Copper` and `Bronze`. Choose an era of interest and see if you can make a plot showing the number of polities having each of these metals recorded as "present" over time.
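
One possible starting point for this task, sketched on made-up data rather than real Seshat records. It assumes each metal's table has already been reshaped like `polities_with_camels_df`, with `start_year`, `end_year` and a value column. The helper `present_counts`, the `toy_tables` dictionary and the column names `steel` and `iron` are hypothetical, so check the actual field names returned by the API.

```python
import pandas as pd

def present_counts(df, value_col, years):
    """Number of rows active in each year whose value_col is 'present'."""
    present = df[df[value_col] == "present"]
    return pd.Series(
        [int(((present["start_year"] <= y) & (present["end_year"] >= y)).sum())
         for y in years],
        index=list(years),
    )

# Toy tables (hypothetical values, not taken from Seshat)
toy_tables = {
    "steel": pd.DataFrame({"start_year": [800], "end_year": [1200],
                           "steel": ["present"]}),
    "iron": pd.DataFrame({"start_year": [400, 900], "end_year": [1000, 1500],
                          "iron": ["present", "present"]}),
}

era = range(500, 1501)  # 500CE to 1500CE inclusive

# One column per metal, one row per year
metals_df = pd.DataFrame({metal: present_counts(table, metal, era)
                          for metal, table in toy_tables.items()})
```

With one column per metal, calling `metals_df.plot()` would draw one line per metal over the chosen era.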