# Deaths of Roman Emperors, 26 BC-395 AD, Analyzed
#### A Collection of Findings on How Century and Accession Type Relate to Cause of Death


Name: Clayton Cotter-Wasmund

Student Number: 100824037

Course: Scientific Data Analysis

## Context

#### Background
This dataset is a collection of Roman emperors from 26 BC to 395 AD, with various important facts and dates about their lives, including:
- Names
- Dates of Birth
- Dates of Death
- Reign Beginnings
- Reign Ends
- Cause of Death
- Accession
- and others...

It was retrieved from [Zonination's GitHub repository](https://github.com/zonination/emperors/) after being found on [Kaggle.com](https://www.kaggle.com/datasets/lberder/roman-emperors-from-26-bc-to-395-ad).

#### Questions

1. How did **accession type** (rise to power) coincide with **cause of death**?
    - Presumably most usurpers were assassinated/executed.
    - Presumably emperors by birthright died in wars or of natural causes.

2. How does an emperor's **century of reign** relate to their **cause of death**?
    - Presumably very few deaths in the earlier years of the empire, increasing in quantity and ferocity with time.
        - The Western Roman Empire falls in 476 AD, 81 years after this dataset ends.
        
3. How does the **length of reign** relate to **cause of death**?
    - Assumedly the longer the reign, the more likely a natural death is, and vice versa.
    - Assumedly longer reigns in the earlier empire.

#### Thoughts
I chose these three questions because they relate closely to each other and are relevant enough to support both individual and connected conclusions. Other questions I considered were 'Which emperors had others keep their namesake?' and 'Does the age of the emperor have any effect on the length of their reign?'. But many of these questions were affected far too heavily by unrelated factors, so any conclusions drawn from them, while interesting, would lack a meaningful relationship.
  

In [159]:
#Library Importing

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

In [160]:
#Dataset Importing

#Initializing Raw DataFrame
df = pd.read_csv('https://raw.githubusercontent.com/zonination/emperors/master/emperors.csv', encoding='latin-1')

In [161]:
#Preliminary Cleaning

#Removal of Unnecessary Data
df = df.drop(['verif.who', 'notes', 'index', 'birth.cty', 'birth.prv', 'dynasty', 'era', 'name.full', 'killer'], axis=1)

#Renaming for Readability
df.columns = ['Name', 'DOB', 'DOD', 'Accession','Reign Start','Reign End', 'Cause of Death']

In [162]:
#Check of Dataframe to Ensure Proper Import
df

,Name,DOB,DOD,Accession,Reign Start,Reign End,Cause of Death
0,Augustus,0062-09-23,0014-08-19,Birthright,0026-01-16,0014-08-19,Assassination
1,Tiberius,0041-11-16,0037-03-16,Birthright,0014-09-18,0037-03-16,Assassination
2,Caligula,0012-08-31,0041-01-24,Birthright,0037-03-18,0041-01-24,Assassination
3,Claudius,0009-08-01,0054-10-13,Birthright,0041-01-25,0054-10-13,Assassination
4,Nero,0037-12-15,0068-06-09,Birthright,0054-10-13,0068-06-09,Suicide
...,...,...,...,...,...,...,...
63,Valentinian I,0321-07-03,0375-11-17,Election,0364-02-26,0375-11-17,Natural Causes
64,Valens,0328-01-01,0378-08-09,Birthright,0364-03-28,0378-08-09,Died in Battle
65,Gratian,0359-04-18,0383-08-25,Birthright,0367-08-04,0383-08-25,Assassination
66,Valentinian II,0371-01-01,0392-05-15,Birthright,0375-11-17,0392-05-15,Suicide


## Imported Dataset

The first step in importing .csv files into Jupyter is to import the pandas library. Pandas is a data structures library that I use primarily for its dataframe initialization and manipulation features. Its *read_csv()* function reads a .csv file and initializes a dataframe from it automatically. I also import matplotlib, a plotting library I will explain further when it is used.

After this I clean the dataset of columns that aren't related to my questions. This is done using the *.drop()* function, which removes the columns (as specified by *axis=1*) whose names are listed within its *[ ]*. Finally, the last step in initializing the dataframe is to improve its readability and typeability. To do this I assign a list of shorter, clearer names to the dataframe's *.columns* attribute.
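
The drop-and-rename step described above can also be sketched with an explicit name mapping, which avoids depending on column order. This is a minimal illustration on a hypothetical two-row frame: `index` and `notes` are column names taken from the drop list in this notebook, while the raw names `name`, `rise` and `cause` are assumptions about the CSV header and should be checked against the file.

```python
import pandas as pd

# Hypothetical sample; raw column names other than 'index' and 'notes'
# are assumed, not confirmed from the CSV.
raw = pd.DataFrame({
    "index": [1, 2],
    "name": ["Augustus", "Tiberius"],
    "notes": ["", ""],
    "rise": ["Birthright", "Birthright"],
    "cause": ["Assassination", "Assassination"],
})

# drop() returns a new frame, and rename() maps old names to new ones
# explicitly, so reordering the columns cannot mislabel them.
cleaned = (
    raw.drop(columns=["index", "notes"])
       .rename(columns={"name": "Name", "rise": "Accession", "cause": "Cause of Death"})
)

print(list(cleaned.columns))
```

Assigning a list to *.columns*, as done in this notebook, works equally well as long as the list order matches the remaining columns exactly.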



In [163]:
#Creating a Dataframe with Dates Usable for Plots

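#To easily track my steps I create a new dataframe based on the initialized & cleaned one.
#.copy() keeps df untouched; plain assignment would only give the same dataframe a second name.
dfReigns = df.copy()

#Cannot use pandas' built-in datetime system since it can't represent dates this early.
#Remove MM/DD from Reigns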
dfReigns['Reign Start'] = df["Reign Start"].str[:-6]
dfReigns['Reign End'] = df["Reign End"].str[:-6]

In [164]:
#Create Century Column from Reign End Year
#The second digit of the zero-padded year ('0235' -> '2') identifies the century
dfReigns['Century'] = dfReigns['Reign End'].str[1:-2]
dfReigns = dfReigns.replace({'Century': {'0': '1st','1': '2nd','2': '3rd','3': '4th'}})

In [165]:
#Convert to Ints
dfReigns['Reign Start'] = dfReigns['Reign Start'].astype(int)
dfReigns['Reign End'] = dfReigns['Reign End'].astype(int)

#Augustus rose to power in 26 BC so make his start negative
dfReigns['Reign Start'] = dfReigns['Reign Start'].replace([26], -26)

In [166]:
#Reign Length Calculation
dfReigns['Reign Length'] = dfReigns['Reign End'] - dfReigns['Reign Start']

## Fault-Tolerant Cleaning

When working with data that you are changing, it's important to keep a history of how the data was modified, so I build a new dataframe that will be used for plotting. The data must be in a form interpretable by both pandas and matplotlib. Because these dates are far earlier than the earliest date pandas' default nanosecond timestamps can represent (the year 1677), I cut them down to their year. Since the reigns were stored as strings, I can slice off the parts I don't need (month and day). First I create a *Century* column by slicing the *Reign End* column and mapping the result to the century in which each reign ended. I then convert the years to integers and keep Augustus' data correct by representing his BC start year as a negative integer. Finally, I create a reign length column by subtracting the starting reign year from the ending reign year.
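
As a compact illustration of the cleaning steps above, here is a minimal sketch on three rows copied from the imported table. It is not the notebook's exact method: the century is computed with integer arithmetic instead of string slicing, and Augustus is selected by name rather than by the value 26; both are substitutions made for this example.

```python
import pandas as pd

# Three rows taken from the imported dataset shown earlier.
sample = pd.DataFrame({
    "Name": ["Augustus", "Tiberius", "Valens"],
    "Reign Start": ["0026-01-16", "0014-09-18", "0364-03-28"],
    "Reign End": ["0014-08-19", "0037-03-16", "0378-08-09"],
})

# Work on a copy so `sample` keeps the raw date strings.
reigns = sample.copy()

# Keep only the leading four-digit year, then convert to integers.
for col in ["Reign Start", "Reign End"]:
    reigns[col] = reigns[col].str[:4].astype(int)

# Augustus' start year is BC, so store it as a negative number.
reigns.loc[reigns["Name"] == "Augustus", "Reign Start"] *= -1

# Century in which the reign ended: years 1-100 -> 1, 101-200 -> 2, ...
reigns["Century"] = (reigns["Reign End"] - 1) // 100 + 1

# Reign length is end year minus start year.
reigns["Reign Length"] = reigns["Reign End"] - reigns["Reign Start"]

print(reigns)
```

The arithmetic century differs from the string slice only for reigns ending exactly on a year divisible by 100, which the slice would place in the following century.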

In [168]:
#Plot Creation (Quantity and Type of Death by Century)

#Grouping Relevant Data
dfCentByCause = dfReigns.groupby(['Century', 'Cause of Death']).size().unstack()

#Plot Creation, Stacked for Readability
cbcPlot = dfCentByCause.plot.bar(stacked = True, cmap='Pastel1')

#Plot Adjustment
cbcPlot.legend(loc='upper left', bbox_to_anchor=(0.01, 1), shadow=True)
cbcPlot.set_title("Cause of Death by Century")
cbcPlot.set_ylabel('Number of Deaths')

Text(0, 0.5, 'Number of Deaths')

In [169]:
#Table Creation(Total Number of Deaths By Century)
dfCentByCauseTable = dfReigns.groupby(['Century', 'Cause of Death']).size().unstack().fillna(0)
dfCentByCauseTable['Total Deaths'] = dfCentByCauseTable.sum(axis=1)

#Drop all columns not total death or century
dfCentByCauseTable = dfCentByCauseTable['Total Deaths']
dfCentByCauseTable = pd.DataFrame(dfCentByCauseTable)

pd.set_option('display.precision', 0)
dfCentByCauseTable


Century,Total Deaths
1st,12
2nd,8
3rd,28
4th,20


In [170]:
#Reign Length During the Third Century
dfThird = dfReigns
dfThird = dfThird[dfThird['Century'] == '3rd']
print(len(dfThird.index))
dfThird = dfThird.drop(['DOB', 'DOD', 'Accession', 'Cause of Death', 'Century'], axis=1)
dfThirdLR = dfThird[dfThird['Reign Length'] >= 10]
print(dfThirdLR['Reign Length'].sum())
dfThirdLR

28
65


,Name,Reign Start,Reign End,Reign Length
20,Septimus Severus,193,211,18
21,Caracalla,198,217,19
25,Severus Alexander,222,235,13
38,Gallienus,253,268,15


In [171]:
#Reign Length During the First Century
dfFirst = dfReigns
dfFirst = dfFirst[dfFirst['Century'] == '1st']
print(len(dfFirst.index))
dfFirst = dfFirst.drop(['DOB', 'DOD', 'Accession', 'Cause of Death', 'Century'], axis=1)
dfFirstSR = dfFirst[dfFirst['Reign Length'] >= 10]
print(dfFirst.sum())
dfFirstSR

#These 6 reigns total 115 years; 89 of them fall within the 1st century (Augustus' first 26 years were BC)

12
Name            AugustusTiberiusCaligulaClaudiusNeroGalbaOthoV...
Reign Start                                                   651
Reign End                                                     775
Reign Length                                                  124
dtype: object


,Name,Reign Start,Reign End,Reign Length
0,Augustus,-26,14,40
1,Tiberius,14,37,23
3,Claudius,41,54,13
4,Nero,54,68,14
8,Vespasian,69,79,10
10,Domitian,81,96,15


In [172]:
#Plot Creation (% of Cause of Death by Accession Type)

#Grouping Relevant Data
dfAccByCause = dfReigns.groupby(['Accession', 'Cause of Death']).size().unstack()

#Divide each accession type's cause-of-death counts by that accession type's total deaths.
dfAccByCause = dfAccByCause.div(dfAccByCause.sum(axis=1), axis=0)

#Drop Election and Purchase: only 1 of each happened, so no conclusions can be drawn from them
dfAccByCause = dfAccByCause.drop(['Election', 'Purchase'])

#Improve Aesthetics and Readability
dfAccByCause.index = ['Apt. by Army','Apt. by Emperor','Apt. Praet. Guard','Apt. by Senate','Birthright','Seized Power']

#Plot Adjustment
abcPlot = dfAccByCause.plot.bar(cmap='Pastel1')
abcPlot.legend(loc='upper right', bbox_to_anchor=(1, 1), ncol=3, shadow=True)
abcPlot.set_ylim([0,1])
abcPlot.set_xlabel("Accession Type")
abcPlot.set_ylabel("Proportion of Deaths")
abcPlot.set_title('Cause of Death by Accession Type')

Text(0.5, 1.0, 'Cause of Death by Accession Type')
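
The row normalisation used for the accession plot can be illustrated on a small table. The counts below are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the dataset.

```python
import pandas as pd

# Hypothetical counts of deaths per accession type (rows) and cause (columns).
counts = pd.DataFrame(
    {"Assassination": [3, 1], "Natural Causes": [1, 3]},
    index=["Birthright", "Seized Power"],
)

# Divide each row by its own total so every accession type sums to 1.
shares = counts.div(counts.sum(axis=1), axis=0)

print(shares)
```

Normalising by row makes accession types with very different numbers of emperors comparable, at the cost of hiding how many emperors stand behind each bar.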

In [173]:
dfLenByCent = dfReigns.groupby(['Reign Length', 'Cause of Death']).size().unstack().fillna(0)

#.copy() gives an independent empty frame with the same columns, avoiding SettingWithCopyWarning
dfLenByCent2 = dfLenByCent.iloc[0:0].copy()

#Rows are selected by position, so these slices depend on which reign lengths occur in the dataset
dfLenByCent2.loc['0'] = dfLenByCent.iloc[0].values
dfLenByCent2.loc['1-5'] = dfLenByCent.iloc[1:6].values.sum(axis=0)
dfLenByCent2.loc['6-10'] = dfLenByCent.iloc[6:10].values.sum(axis=0)
dfLenByCent2.loc['11-15'] = dfLenByCent.iloc[10:14].values.sum(axis=0)
dfLenByCent2.loc['16-20'] = dfLenByCent.iloc[14:18].values.sum(axis=0)
dfLenByCent2.loc['21-25'] = dfLenByCent.iloc[18:21].values.sum(axis=0)
dfLenByCent2.loc['26-30'] = [0,0,0,0,0,0,0]
dfLenByCent2.loc['31-35'] = dfLenByCent.iloc[21:22].values.sum(axis=0)
#The last row is the longest reign (40 years); a single row needs no sum
dfLenByCent2.loc['36-40'] = dfLenByCent.iloc[22].values

#learned this can be done with the 'nth' groupby function in lecture 21


In [174]:
#Plot Creation (Length of Reign Compared to Cause of Death)

lbcPlot = dfLenByCent2.plot.bar(stacked = True, cmap='Pastel1')


#Plot Adjustment
lbcPlot.legend(loc='upper right', bbox_to_anchor=(1.4, 1), shadow=True)
lbcPlot.set_xlabel("Length of Reign (Years)")
lbcPlot.set_ylabel("Number of Deaths")
lbcPlot.set_title('Length of Reign by Cause of Death')

Text(0.5, 1.0, 'Length of Reign by Cause of Death')

## Plot Creation

Plots were generated with matplotlib, a customizable plotting library used primarily for its easy manipulation of graphs, which is why it was imported in the first code cell. To keep each plot's data relevant and accurate, and to keep the analysis fault-tolerant, a new dataframe was created for each plot.

All graphs follow a standard process of:

- Grouping using the *.groupby()* function for the necessary table parameters.
- Plotting as a bar graph using the *.plot.bar()* function.
- Adjusting using matplotlib's highly customizable plotting library.
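
The grouping step in the process above can be sketched on a few rows. The five rows below are taken from the imported table (Caligula, Claudius, Nero, Valentinian I, Valentinian II), and `fill_value=0` is used here so that missing combinations show as 0 rather than NaN.

```python
import pandas as pd

# Century and cause of death for five emperors from the imported table.
sample = pd.DataFrame({
    "Century": ["1st", "1st", "1st", "4th", "4th"],
    "Cause of Death": ["Assassination", "Assassination", "Suicide",
                       "Natural Causes", "Suicide"],
})

# Count rows per (Century, Cause) pair, then pivot causes into columns.
counts = (
    sample.groupby(["Century", "Cause of Death"])
          .size()
          .unstack(fill_value=0)
)

print(counts)
```

A table in this shape can be passed straight to *.plot.bar(stacked=True)*, with one bar per row and one coloured segment per column.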

1. **Quantity and Cause of Emperor Death by Century**
    
    - Grouping of 'Century' and 'Cause of Death' columns.
    - Creation of stacked-bar-plot and assignment of the built in colourmap *Pastel1*.
    - Adjustment of Legend.
    - Creation of graph title and y-label subtitle.
    - Creation of table that takes sum of all deaths every century.
    - Creation of table that checks reign lengths >= 10 yrs in first century.
    
    
2. **Death by Accession Type**

    - Grouping of 'Accession' and 'Cause of Death' columns.
    - Dividing all 'Cause of Death' values by the sum of the total value for each accession type.
        - Done to improve presentability of the graph.
        - Graph now displays proportions of a whole instead of raw counts.
    - Dropping of 'Election' and 'Purchase' accession types as only 1 of each happened (outlier data).
    - Shortened accession strings to improve readability when graphed.
    - Creation of bar plot and assignment of the built in colourmap *Pastel1*.
    - Adjustment of Legend.
    - Changing the graph's y-axis range to run from 0-1 (0%-100%).
    - Creation of title and both labels.


3. **Length of Reign by Cause of Death**

    - Grouping of 'Reign Length' and 'Cause of Death' columns (missing combinations are NaN, so they are changed to 0).
    - Creation of a second dataframe as an empty copy of the original, with all rows sliced out.
    - Manually slicing rows by position and summing them into the new dataframe, sorted in intervals of 5 (with the exception of 0).
        - Because of how pandas' and numpy's sum() functions interacted with dataframes in for-loops, I was unable to sum the values by index value efficiently, so I sliced the index positions manually instead.
    - Creation of stacked-bar-plot and assignment of the built in colourmap *Pastel1*.
    - Adjustment of Legend.
    - Creation of title and both labels.
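
The interval grouping described for the third plot can also be done by value rather than by row position. The following is a minimal sketch using `pd.cut`, not the method used in this notebook; the six rows are reign lengths and causes of death that appear in the tables and discussion of this report, and the bin edges are chosen to reproduce the same labels.

```python
import pandas as pd

# Reign lengths and causes for Caligula, Claudius, Nero, Septimus Severus,
# Tiberius and Augustus, as reported elsewhere in this notebook.
sample = pd.DataFrame({
    "Reign Length": [4, 13, 14, 18, 23, 40],
    "Cause of Death": ["Assassination", "Assassination", "Suicide",
                       "Natural Causes", "Assassination", "Assassination"],
})

# Right-closed bins: (-1, 0] is the '0' bin, (0, 5] is '1-5', and so on.
edges = [-1, 0, 5, 10, 15, 20, 25, 30, 35, 40]
labels = ["0", "1-5", "6-10", "11-15", "16-20", "21-25", "26-30", "31-35", "36-40"]

# Convert the bins to plain strings so grouping behaves like any text column.
sample["Reign Bin"] = pd.cut(sample["Reign Length"], bins=edges, labels=labels).astype(str)

# Count per (bin, cause), pivot, and reinsert empty bins in label order.
binned = (
    sample.groupby(["Reign Bin", "Cause of Death"])
          .size()
          .unstack(fill_value=0)
          .reindex(labels, fill_value=0)
)

print(binned)
```

Because the bins are defined by value, the result does not depend on which reign lengths happen to be present, and empty intervals such as 26-30 are filled in automatically.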

## Discussion 

Before starting the project, I initially assumed that overall deaths would go up over time, simply due to the tumultuous latter half of the Roman Empire.

**Figure 1: Table of Total Deaths by Century**

In [175]:
dfCentByCauseTable

Century,Total Deaths
1st,12
2nd,8
3rd,28
4th,20


Although the overall quantity of deaths did rise with time, I was quite shocked to see the third century carrying more deaths than the fourth, and surpassing all other centuries by a significant amount. While I was correct in my initial assumption, there appears to be something else going on.

After realizing that significantly more deaths occurred in the third century, I began to wonder why. To gather more data I grouped all emperors who reigned for ten years or more. What I found was staggering: despite twenty-eight emperors ruling in the third century, only four ruled for ten years or more, and those four accounted for sixty-five of the one hundred years of that century's rule.

**Figure 2: Table of Reigns Longer Than 10 Years During the Third Century**

In [176]:
print("Number of Emperors in Third Century: " + str(len(dfThird.index)))
print("Sum of Reigns Lasting more than ten years in Third Century: " + str(dfThirdLR['Reign Length'].sum()))
dfThirdLR

Number of Emperors in Third Century: 28
Sum of Reigns Lasting more than ten years in Third Century: 65


,Name,Reign Start,Reign End,Reign Length
20,Septimus Severus,193,211,18
21,Caracalla,198,217,19
25,Severus Alexander,222,235,13
38,Gallienus,253,268,15


As the table shows, the vast majority of those years fall at the start of the third century, ending in 235, and stability never quite recovered afterwards. History shows that political and military instability often destroy empires, yet here the start of the decline occurs more than 200 years before the fall of the Western Roman Empire in 476. During the Roman Empire, and the Republic before it, [civil wars](https://en.wikipedia.org/wiki/Fall_of_the_Western_Roman_Empire#313%E2%80%93376:_Civil_and_foreign_wars) were common. This was especially true of the third century, a period commonly referred to as the [Crisis of the Third Century](https://en.wikipedia.org/wiki/Crisis_of_the_Third_Century).

**Figure 3: Bar Plot of Causes of Death By Century**

In [177]:
#Re-display the axes created earlier
cbcPlot

<AxesSubplot:title={'center':'Cause of Death by Century'}, xlabel='Century', ylabel='Number of Deaths'>

![Cause of Death by Century plot](./images/CauseByCenturyPlot.png)

While assassinations were always somewhat common in the Roman Empire, the third century contains significantly more. Many emperors were assassinated in coups, by their own military, or by political opponents. Many of those who weren't met their fate on the battlefield. Although many emperors died of natural causes in this time period, they often didn't reign long and were considered temporary leaders until others could be found. Overall, the third century definitely stands out as a defining period in the Roman Empire's existence, but not in a good way, and it likely sowed the seeds of the nation's fall.

**Figure 4: Bar Plot of Cause of Death by Length of Reign**
![Reign Length by Cause of Death plot](./images/ReignLengthByCausePlot.png)

Something of note is that during the third century, of the four emperors who reigned ten years or more, only the first, Septimus Severus, died of natural causes. The other three were all assassinated, again highlighting the incredible instability of that time period. As this plot shows, assassination remained a fairly common way for emperors to die until they had reigned more than fifteen years, at which point it becomes comparatively uncommon. It's important to note that this graph is right-skewed, in large part due to the crisis of the third century. Many emperors died of natural causes very early in their reigns simply because there weren't many people deemed reliable enough to run the empire. Most of these emperors' children were still young children or teenagers at the time of their deaths, so seniority would often take precedence within families and dynasties.

**Figure 5: Bar Plot of Causes of Death By Accession Type**
![Cause of Death by Accession plot](./images/CauseByAccessionPlot.png)

Intriguingly, the probability of an emperor being assassinated was much greater if they were installed by a military power. This may seem obvious, but it's important to note that in the Roman Republic and many other nations before the modern day, military endorsement was considered extremely important to political power, so this graph seems to indicate the opposite of the historical norm. Those appointed by other emperors often lived long enough to die of natural causes, but again, this statistic is inflated by the third century's instability and the age of its emperors. What is incredibly interesting is that many emperors who seized power from others died of natural causes, far more than those who ruled by birthright and almost as many as those appointed by other emperors. They were also assassinated at staggeringly low rates. Amusingly for an empire, one of the most stable accession types was appointment by the Senate.

#### Final Thoughts

Of the questions I posed, the vast majority of my assumptions were proven false. Most usurpers ruled until their deaths, in complete opposition to modern examples, and while birthright wasn't the least stable accession type, almost 50% of all emperors by birthright were still assassinated, about the same number as died of natural causes and in battle combined.

While deaths did increase with the age of the empire and the political instability that followed, the increase was far from linear. Almost as many emperors were assassinated in the first century as died in total in the second. The third century, and the crisis it brought, increased both the quantity and the variety of deaths to an unprecedented level that never quite subsided.

Longer reigns did coincide with higher rates of natural/self-inflicted death; however, this is only true for reigns of more than fifteen years. Until that point in an emperor's reign, deaths were still mainly assassinations, even among those who ruled for over a decade. Shockingly, the most likely cause of death for emperors who ruled between eleven and fifteen years was assassination, and by a large margin. There were, however, longer reigns in the earlier empire, with eighty-nine years of the first century ruled by just six people. This includes Augustus, the first emperor, only fourteen years of whose forty-year reign fell within the first century.
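
The eighty-nine-year figure can be checked with a short calculation. This sketch uses the six first-century reigns of ten years or more from the table earlier in this notebook, and follows the notebook's convention of storing 26 BC as -26 and treating year 0 as the start of the first century.

```python
import pandas as pd

# The six first-century reigns of ten years or more, as tabulated earlier.
first = pd.DataFrame({
    "Name": ["Augustus", "Tiberius", "Claudius", "Nero", "Vespasian", "Domitian"],
    "Reign Start": [-26, 14, 41, 54, 69, 81],
    "Reign End": [14, 37, 54, 68, 79, 96],
})

# Total years reigned, including Augustus' years before the first century.
total_years = (first["Reign End"] - first["Reign Start"]).sum()

# Clip negative start years to 0 so only first-century years are counted.
years_in_century = (first["Reign End"] - first["Reign Start"].clip(lower=0)).sum()

print(total_years, years_in_century)  # 115 89
```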

## References

- [GitHub.com](https://github.com/zonination/emperors/)
- [Kaggle.com](https://www.kaggle.com/datasets/lberder/roman-emperors-from-26-bc-to-395-ad)
- [Matplotlib](https://matplotlib.org/)
- [Pandas](https://pandas.pydata.org/)
- Wikipedia:
    - [Crisis of the Third Century](https://en.wikipedia.org/wiki/Crisis_of_the_Third_Century)
    - [Fall of the Western Roman Empire (Civil and Foreign wars)](https://en.wikipedia.org/wiki/Fall_of_the_Western_Roman_Empire#313%E2%80%93376:_Civil_and_foreign_wars)