# Women And Children First (Code-free)

## Introduction

The goal of this short project is to **assess whether survival rates on the _Titanic_ obeyed the unwritten rule "women and children first."** To do that, I put together a more comprehensive dataset than you'd normally find on most ML websites and the like; see the [data wrangling](https://github.com/NicolaBagala/portfolio/blob/master/titanic/titanic_wrangling.ipynb) part of this project for more information on the data sources. Note that this is the code-free version of the project; if you want to see the code, you can do so [here](https://github.com/NicolaBagala/portfolio/blob/master/titanic/titanic_analysis.ipynb).

The fact that victims of the _Titanic_ were mostly men, both in absolute and percentage terms, is nothing new, but I was curious to see whether this still held when accounting for factors like age, affiliation (i.e., passenger or crew member), class, etc. That's what we'll do in this project.

## The dataset

The dataset contains information about 2165 people in total, which is just short of the estimated number of people on board ([2,224](https://en.wikipedia.org/wiki/Titanic) according to Wikipedia). Of these, 1304 were passengers and 861 were crew. The dataset is organised into columns as follows:

- `NAME`, `AGE`, and `SEX` are just standard demographic information. 
- `AGE_BRACKET` is a broader categorisation of each person's age: `0-9`, `10-19`, `20-29`, etc.
- `AFFILIATION` is either `Crew` or `Passengers`, and it is used exclusively to be able to quickly separate the two sets of people.
- `GROUP` applies to crew members only, and it identifies the eight broader groups to which a crew member could belong: Officers, Deck, Engineering, Victualling, Restaurant, Postal clerks, Guarantee group, and Orchestra. For passengers, this column has value `Not applicable`.
- `POSITION` identifies the specific job of each crew member. For passengers, this column has value `Not applicable`.
- `CLASS` can be `First`, `Second`, or `Third` in the case of passengers and of the Guarantee group crew members, who for some reason were given passenger accommodations. For other crew members, this column has value `Not applicable`.
- `SURVIVED` is `True` for people who survived the disaster, and `False` for those who didn't make it.
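
As a rough illustration of how `AGE_BRACKET` relates to `AGE`, the sketch below bins an age into ten-year brackets. The function name `age_bracket` is hypothetical, and the project's notebook may well derive the column differently.

```python
def age_bracket(age: float) -> str:
    """Return a ten-year bracket label such as '20-29' for a given age."""
    if age < 0:
        raise ValueError("age must be non-negative")
    lower = int(age // 10) * 10  # e.g. 27 -> 20, 0.5 -> 0
    return f"{lower}-{lower + 9}"


print(age_bracket(27))   # 20-29
print(age_bracket(0.5))  # 0-9
```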

In some instances, we're going to compare actual survival and death rates with partly randomised ones, obtained by randomly choosing 1462 people as victims and the rest as survivors. (As we'll see later on, 1462 is the number of actual victims in the dataset.) The overall randomised death and survival rates will be identical to the originals, but all other possible influencing factors will be eliminated. In other words, we'll know there are 1462 victims whose age, sex, affiliation, etc., played no role in their fate. For the rest of the analysis, we'll refer to the ensemble of randomly chosen survivors and victims as the *shuffled Titanic*; the *shuffled crew*, the *shuffled passengers*, etc., will refer to randomly chosen survivors/victims among the crew, the passengers, etc. We'll often use the abbreviation "SHF" to refer to shuffled groups of people.
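
The shuffling idea can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the notebook's actual code: the function name `shuffle_survival`, the list-of-dicts layout, and the `"Male"`/`"Female"` values are all hypothetical, and the toy records stand in for the real dataset.

```python
import random


def shuffle_survival(people, n_victims, seed=None):
    """Return a copy of `people` with SURVIVED reassigned at random.

    Exactly `n_victims` records get SURVIVED=False; all others get True.
    """
    rng = random.Random(seed)
    victim_idx = set(rng.sample(range(len(people)), n_victims))
    return [
        {**person, "SURVIVED": i not in victim_idx}
        for i, person in enumerate(people)
    ]


# Toy stand-in for the real dataset: 2165 records, sex assigned arbitrarily.
people = [
    {"SEX": "Female" if i % 4 == 0 else "Male", "SURVIVED": True}
    for i in range(2165)
]
shuffled = shuffle_survival(people, n_victims=1462, seed=42)
n_dead = sum(not p["SURVIVED"] for p in shuffled)
print(n_dead, round(100 * n_dead / len(shuffled), 1))  # 1462 67.5
```

Because the victims are drawn without looking at any column, the overall death rate is preserved exactly while every other attribute becomes irrelevant to the outcome.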

## Broad overview

Before analysing death and survival rates of crew and passengers, let's familiarise ourselves a bit with both groups on a broader level.

On average, both crew members and passengers of the _Titanic_ were fairly young; the mean age on board was just above 30, and most of the people were between 20 and 40, especially among the crew. If age-wise the two groups are quite different, they're even more so sex-wise.

![](figures/fig_1.png)

_`Figure 1. Age and sex distribution of the Titanic's crew and passengers.`_

Overall, most people on the ship were male. Women constituted only a very small fraction of the crew, and only about 36% of the passengers.

![](figures/fig_2.png)

_`Figure 2. Sex distribution of the Titanic's passengers.`_

Below is a bar chart visualising the composition of the different crew groups by sex. The victualling and engineering people constituted the vast majority of the crew; most women belonged to the victualling crew, and only very few of them were part of the restaurant crew.

![](figures/fig_3.png)

_`Figure 3. The Titanic's crew, by group and sex.`_

With the exception of the nine people in the Guarantee group, crew members had accommodations that differed from the passengers' and weren't divided by class in the same way. For both sexes, the majority of passengers had third-class accommodation (unsurprising, given ticket prices), but the shares of men and women in third class differ significantly: about 60% and 45% respectively. The share of female passengers in second class was slightly higher than that of males, and their share in first class was about 11 percentage points higher.

![](figures/fig_4.png)

_`Figure 4. Passengers of the Titanic, by sex and class.`_

## Analysis of the survivors

Let's move on to survival and death rates, to see if and how they vary when accounting for different factors. Let's start with the larger picture: **Overall, 67.5% of the people aboard the ship died—that is 1462 people.**

![](figures/fig_5.png)

_`Figure 5. Survival and death rates on the whole ship.`_

**Amongst the dead, around 91% were men, while only about 9% were women.** The split among survivors is instead nearly 50-50 between males and females. Put another way, if you picked a _Titanic_ survivor at random, they would be almost equally likely to be a man or a woman; if you picked a random victim, the person would almost certainly be male.

![](figures/fig_6.png)

_`Figure 6. Percentages of males and females among victims and survivors. Real and shuffled Titanic.`_

Since men were more numerous than women, if survival and death were down to chance alone you would expect to see more men than women among both victims and survivors, in roughly the same percentages as they appear in the sample: 77% for men and 22% for women, which is what happens on the shuffled _Titanic_. On the real _Titanic_, we see instead that **the actual rates are extremely skewed in favour of women.**

Another question is what percentage of men and women survived; equivalently, if you picked a random man (or woman) from all the people who were on board, what are the odds that he (or she) was a survivor? The data shows that **a randomly chosen woman has a survival chance of just above 73%, whereas a randomly chosen man has only around a 20.5% chance of being a survivor.** On the shuffled _Titanic_, men and women die and survive in roughly the same percentages, with more victims than survivors in both groups because the overall death rate is fixed at the real, very high value.
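
A minimal sketch of how such per-group survival percentages could be computed is shown below. The helper name `survival_rate_by` and the list-of-dicts layout are assumptions made for illustration, and the eight records are made up rather than taken from the dataset.

```python
from collections import defaultdict


def survival_rate_by(records, key):
    """Map each value of `key` to the percentage of records with SURVIVED=True."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for rec in records:
        totals[rec[key]] += 1
        survivors[rec[key]] += rec["SURVIVED"]  # True counts as 1
    return {group: 100 * survivors[group] / totals[group] for group in totals}


# Tiny made-up sample, NOT the real data.
sample = [
    {"SEX": "Female", "SURVIVED": True},
    {"SEX": "Female", "SURVIVED": True},
    {"SEX": "Female", "SURVIVED": False},
    {"SEX": "Female", "SURVIVED": True},
    {"SEX": "Male", "SURVIVED": False},
    {"SEX": "Male", "SURVIVED": False},
    {"SEX": "Male", "SURVIVED": True},
    {"SEX": "Male", "SURVIVED": False},
]
rates = survival_rate_by(sample, "SEX")
print(rates)  # {'Female': 75.0, 'Male': 25.0}
```

The same helper works for any other column (for instance `AFFILIATION` or `CLASS`), and running it on a shuffled copy of the data gives the baseline to compare against.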

![](figures/fig_7.png)

_`Figure 7. Percentages of victims and survivors among males and females. Real and shuffled Titanic.`_

Let's see whether the percentage of survivors for males and females changes when age is taken into account. The histogram below shows all people on the _Titanic_ categorised by age group, together with the survivors and shuffled survivors for each bracket. (Note that the green percentages only refer to the real _Titanic_; the green bars and the red line are absolute counts in this and similar charts.) We see that, with few exceptions, survivors make up less than 50% of each bracket; the numbers of shuffled survivors for each age bracket are often close to those actually observed, suggesting that age might not have been a very important factor in determining survival.

![](figures/fig_8.png)

_`Figure 8. Titanic survivors by age bracket. Real and shuffled Titanic.`_

The situation changes quite dramatically when sex is taken into account: **the percentage of female survivors is always significantly above 50%, no matter the age bracket; for males, only two brackets have a percentage of survivors higher than 50%,** while all others are between 0 and 22 percent. While the difference in size of each age bracket between the sexes is significant, the pattern is perfectly consistent; for each shared age bracket, female survival rates are moderately to much higher than male survival rates. Additionally, if we compare the real _Titanic_ to the shuffled _Titanic_, we see that on the latter fewer females survived in each age bracket, sometimes far fewer; the opposite is true for males in nearly every age bracket.

![](figures/fig_9.png)

_`Figure 9. Survivors of both the real and shuffled Titanic, by sex and age bracket.`_

So far, we've looked at all the people on board, regardless of whether they were crew or passengers. This might skew the results significantly if, for whatever reason, crew members were more (or less) likely to survive than passengers; so, let's look at how the picture changes when affiliation is taken into account.

### Crew survival

Let's start by answering the question: **Were crew members more or less likely to survive, compared to passengers?** In percentage, crew members were fewer than passengers both among victims and survivors; in other words, both a random victim and a random survivor are less likely to have been a crew member. Note however that, if being crew or passenger made no difference in terms of survival chances, you'd expect to see the same crew-passenger split among victims and survivors as you do on the whole ship, i.e. about 40-60, while we see that the situation is slightly better for passengers.

![](figures/fig_10.png)

_`Figure 10. Percentages of passengers and crew among victims and survivors.`_

Indeed, **a random crew member has almost an 80% chance of being a victim, against about 60% for a random passenger.** Since most crew members were men, this suggests that at least some of the imbalance seen in the survival rates of men and women may not depend on the unwritten rule "women and children first", but rather on a possible "passengers first" rule. It's likely that many crew members, such as officers or deck staff, were left behind as they supervised the evacuation of the ship. Others still, like engineers, were deep in the belly of the ship, and escaping was harder for them. (Not to mention that they tried to keep the electrical systems going almost until the very end.)

![](figures/fig_11.png)

_`Figure 11. Survival and death rate among crew and passengers.`_

Since women were a negligible part of the crew, seeing what percentage of survivors and victims amongst the crew were women wouldn't be very informative; it was perforce a very small percentage in both cases. It's going to be more interesting to see what percentage of men and women of the crew survived or died.

![](figures/fig_12.png)

_`Figure 12. Survival and death rates on both the real and shuffled Titanic, by sex.`_

From the charts above, we can see that the survival chances of male and female crew members are comparable to those of people on the ship as a whole, or at least they are for males.

- **Male crew members have a death rate of about 78%;** across the whole ship, the same rate for males is 79%.
- **Female crew members have a death rate of 13%;** across the whole ship, the same rate for females is about 26.5%.

Put another way, **a randomly chosen male on the _Titanic_ has a very high chance of being a victim to begin with; if we know that he was crew, this changes virtually nothing.** This isn't true for females, who are half as likely to be amongst the victims if they're crew, although both likelihoods are much lower than they are for males. Since only 23 women in total were part of the crew, it's not impossible that their high survival rate was due to mere chance; however, on the shuffled _Titanic_, where _everything_ was up to mere chance, we see that both males and females died and survived in very similar percentages.
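
To put a rough number on "mere chance", one could ask how likely it is that a randomly drawn group of 23 people would contain so few victims if the 1462 victims were spread at random over all 2165 people. The sketch below uses the hypergeometric distribution for this. It is an illustrative calculation rather than part of the original analysis: the function name is hypothetical, and the figure of 3 deaths among female crew is an assumption inferred from the 13% death rate quoted above.

```python
from math import comb


def prob_at_most_k_victims(group_size, k, total=2165, victims=1462):
    """Hypergeometric P(at most k victims in a random group of `group_size`)."""
    survivors = total - victims
    ways = sum(
        comb(victims, j) * comb(survivors, group_size - j)
        for j in range(k + 1)
    )
    return ways / comb(total, group_size)


# 23 female crew; 3 deaths is an assumption inferred from the ~13% death rate.
p = prob_at_most_k_victims(group_size=23, k=3)
print(f"{p:.2e}")
```

Under these assumptions the probability comes out far below one in a thousand, which is in line with what the shuffled _Titanic_ shows: chance alone rarely produces a group this small with a survival rate this high.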

As we've seen in `Figure 3`, female crew were only part of the Restaurant or Victualling crew. Twenty-one of them were in Victualling, while the remaining two women were in the Restaurant crew. Since nearly all female crew survived, we automatically know that nearly all the female Victualling crew survived too, despite the fact that, as we can see below, Victualling was one of the groups with the highest death rates. (As a side note, both women in the Restaurant group survived.)

![](figures/fig_13.png)

_`Figure 13. Survival and death rates of the crew on the real Titanic, by crew group.`_

With a group size of merely 21 individuals, random chance *might* have caused the high survival rate of Victualling women, but it's hard not to notice that all female groups we've seen so far have had very high survival rates, regardless of size; by contrast, Postal clerks, the Orchestra, and the Guarantee group (small all-male groups with fewer than 10 people each) all had a 100% death rate.

Below is the number of survivors among crew members, illustrated by age bracket.

![](figures/fig_14.png)

_`Figure 14. Survival of the crew by age bracket. Real and shuffled Titanic.`_

As we can see below, **female crew survived in much higher percentages than male crew in every bracket.** While female age brackets have far fewer people than male age brackets, the situation on the shuffled _Titanic_ is nonetheless very different. There's a noticeable reduction in female survivors, and small-to-modest increases in the number of male survivors per bracket.

![](figures/fig_15.png)

_`Figure 15. Crew survival and death rates by sex and age bracket. Real and shuffled Titanic. (Note that the y-axis is different for each chart.)`_

Before moving on to analysing the passengers' survival rates, it would be interesting to take a small detour to see which jobs among the crew were the most dangerous—that is, those with the highest death rates. To make sure the results are meaningful, we'll only consider positions occupied by at least 10 crew members.

![](figures/fig_16.png)

_`Figure 16. Jobs on the Titanic by absolute number of deaths. Only jobs with at least 10 people considered. Totals shown for comparison.`_

The chart above shows the absolute number of deaths instead of percentages, because given the significant difference between the sizes of the groups, percentages could be misleading. For example, 100% of waiters on the _Titanic_ died, but there were fewer than 20 of them in total and it surely wasn't a more dangerous job than that of engineers. With such a small number of waiters, random chance may well have been the most important factor that determined the fate of these unlucky people. By contrast, "only" about 72% of the Firemen/Stokers died, but as they were much more numerous, it's reasonable to conclude that so many of them died not because of chance, but because of their job's inherent risk in the event of the ship sinking. Indeed, they were part of the engineering team, who due to their physical location on the ship, as well as their desperate attempts to keep the systems of the ship going for as long as possible, arguably had it worse than anybody else on board. (Incidentally, note that "Stewardess", a job obviously done only by women, is at the bottom of the list, both in absolute and relative terms.)
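
The filtering and ranking described here can be sketched as follows. This is a simplified illustration, not the notebook's code: the function name `deadliest_positions` is hypothetical, and the toy crew below uses made-up counts and position labels.

```python
from collections import Counter


def deadliest_positions(crew, min_size=10):
    """Rank positions with at least `min_size` members by number of deaths."""
    sizes = Counter(rec["POSITION"] for rec in crew)
    deaths = Counter(rec["POSITION"] for rec in crew if not rec["SURVIVED"])
    ranked = [
        (pos, deaths[pos], 100 * deaths[pos] / size)
        for pos, size in sizes.items()
        if size >= min_size
    ]
    # Sort by absolute death count, highest first.
    return sorted(ranked, key=lambda row: row[1], reverse=True)


# Made-up toy crew, NOT the real figures.
crew = (
    [{"POSITION": "Fireman/Stoker", "SURVIVED": False}] * 18
    + [{"POSITION": "Fireman/Stoker", "SURVIVED": True}] * 7
    + [{"POSITION": "Waiter", "SURVIVED": False}] * 12
    + [{"POSITION": "Lookout", "SURVIVED": True}] * 6
)
ranking = deadliest_positions(crew)
for position, n_dead, pct in ranking:
    print(position, n_dead, round(pct))
# Fireman/Stoker 18 72
# Waiter 12 100
```

In this toy example the smaller group has the higher death percentage but the lower death count, which is exactly why the ranking uses counts and keeps the percentage only as a secondary column.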

### Passenger survival

The charts below show the death and survival rates for male and female passengers aboard the real and the shuffled _Titanic_, and they tell the same story as the charts illustrating sex-dependent survival rates for crew members and for the whole ship: males were vastly more likely to die than females. Once again, the situation on the shuffled _Titanic_ is far more balanced.

![](figures/fig_17.png)

_`Figure 17. Passenger survival and death rates by sex. Real and shuffled Titanic.`_

It's worth asking whether passengers of different classes had different survival rates, and the answer is a predictable "yes".

![](figures/fig_18.png)

_`Figure 18. Passenger survival and death rates, by class.`_

The chart leaves no doubt that class and death rates on the _Titanic_ were correlated: the lower the class, the higher the death rate, and obviously vice versa for the survival rate. There may be many reasons for this, for example the location of cabins of different classes, possible discrimination by the crew, or the strings that higher-class passengers might have been able to pull to secure a place on a lifeboat more easily. However, once more this trend changes significantly when sex is accounted for.

![](figures/fig_19.png)

_`Figure 19. Survival and death rates by sex and class. Real (top) and shuffled (bottom) Titanic.`_

The pattern "high class, low death" is not only preserved but even more evident when looking at female passengers only: **death rates for female passengers decrease almost four-fold with each step from lower to higher class.** However, death rates among males are virtually identical (slightly above 80%) for second- and third-class passengers; first-class male passengers had it only slightly better, with a death rate of about 65%. A randomly chosen third-class female passenger of the _Titanic_ has a roughly even chance of being a survivor, about 15 percentage points higher than that of a randomly chosen first-class male passenger. **A higher-class ticket always came with significantly better survival chances for female passengers; not so for males.** On the shuffled _Titanic_, class privilege is just not a thing; female and male passengers survive and die in very similar percentages across all classes.

Age is another factor that could impact survival: older passengers might have had a harder time fleeing or finding their way to the lifeboats, and the same is true for very young children, who most likely could not escape without the assistance of an adult. We can check that by plotting passenger survival rates by age bracket.

![](figures/fig_20.png)

_`Figure 20. Passenger and survived passenger count on the Titanic, by age bracket. The red line shows the survivor count on the shuffled Titanic.`_

Survival rates are actually higher amongst the youngest and oldest, though their small number, especially for the latter cohort, is likely an important factor. It's not so strange, however, that the youngest cohort had such a high survival rate, as their parents must surely have tried their hardest to ensure they wouldn't die. Survival rates are nonetheless fairly low amongst people between 60 and 79, for reasons that may include self-sacrifice or greater difficulty in escaping, but on which we can ultimately only speculate. Note, however, that just as the real survival rates by bracket were very close to the shuffled ones for the whole ship in `Figure 8`, they're fairly close for passengers only as well, again suggesting that the role of age might have been limited.

Once again, **splitting the above histogram by sex reveals very different trends among males and females;** among male passengers, only children aged 0-9 and men aged 80-89 had exceptionally high survival rates; among female passengers, survival rates were always very high, 64% at a minimum. While these two distributions have a different number of age brackets of rather different sizes compared to the same distributions for the crew, the situation is again very similar: very low survival rates for (most) males, very high survival rates for all females. The situation is much more balanced on the shuffled _Titanic_, which, let's not forget, nonetheless has the same overall death rate as the real _Titanic_.

![](figures/fig_21.png)

_`Figure 21. Passenger count and survived passenger count by sex and age bracket. The red line shows the survivor count on the shuffled Titanic.`_

What we've seen so far highlights quite clearly the different survival rates for men and women, but it's not very clear at this point whether more children or more adults lost their lives on the _Titanic_. A glance at the charts above is enough to tell that more adults died in absolute terms, but let's try to quantify this more precisely: let's define "adult" as anyone aged at least 18 and "child" as anyone younger than 18. The table below shows the four categories sorted by their percentage of survivors. **Female adults have the highest percentage,** followed by female children and male children. Male adults come in last, with fewer than 21% surviving. The `TOTAL` column shows the total count of people in each category.

![](figures/table_1.png)

_`Table 1. Total count and percentage of survivors/dead among adults and children of both sexes.`_
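
The adult/child split behind a table like this can be sketched as below, using the threshold of 18 stated above. The function names `category` and `survival_table` are hypothetical, the records are made up, and the real notebook may build the table differently.

```python
def category(rec):
    """Label a record as e.g. 'Female adult' or 'Male child' (adult = 18 or older)."""
    stage = "adult" if rec["AGE"] >= 18 else "child"
    return f'{rec["SEX"]} {stage}'


def survival_table(records):
    """Return (category, total, % survivors) rows, highest survival first."""
    stats = {}
    for rec in records:
        cat = category(rec)
        total, survived = stats.get(cat, (0, 0))
        stats[cat] = (total + 1, survived + rec["SURVIVED"])  # True counts as 1
    rows = [(cat, total, 100 * surv / total) for cat, (total, surv) in stats.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Made-up toy records, NOT the real data.
records = [
    {"SEX": "Female", "AGE": 30, "SURVIVED": True},
    {"SEX": "Female", "AGE": 18, "SURVIVED": True},
    {"SEX": "Female", "AGE": 17, "SURVIVED": True},
    {"SEX": "Female", "AGE": 5, "SURVIVED": False},
    {"SEX": "Male", "AGE": 10, "SURVIVED": True},
    {"SEX": "Male", "AGE": 3, "SURVIVED": False},
    {"SEX": "Male", "AGE": 16, "SURVIVED": False},
    {"SEX": "Male", "AGE": 1, "SURVIVED": False},
    {"SEX": "Male", "AGE": 40, "SURVIVED": False},
    {"SEX": "Male", "AGE": 18, "SURVIVED": False},
]
table = survival_table(records)
for row in table:
    print(row)
```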

## Conclusions

We've analysed the survival rates for different groups of people aboard the _Titanic_, and compared some of them to those observed on the "shuffled _Titanic_"—that is, an imaginary ship that had the exact same overall death and survival rates, but whose victims and survivors were chosen entirely randomly.

The analysis leaves little doubt that **women and children did indeed come first:** no matter how you look at it, women had much lower death rates than men, irrespective of affiliation, age, or class. The number of women on the _Titanic_ was very small compared to that of men, and thus, at least for some particularly small subsets of women, random chance might have determined their high survival rates. This, however, happened systematically for each and every group of women considered, while it hardly ever happened for any group of men, no matter how small. Additionally, on the shuffled _Titanic_, women and men died in very similar percentages, and no bias in favour of female (or male) survival was observed, irrespective of group size. (It should be noted that, on the shuffled _Titanic_, any possible influencing factor was eliminated, not just sex.) We also saw that, on the real _Titanic_, female adults had the highest survival rates, followed by female children, male children, and finally adult males.

Given the above, it's very unlikely that the imbalance in survival rates observed on the _Titanic_ was due to mere chance; "women and children first" was definitely at play, whether intentionally or by unconscious bias. Naturally, it is also possible that some other factor might have been at play—women might have a stronger survival instinct, for example, or be more resilient to the conditions everyone was exposed to during the tragedy—but this is mere speculation and not something we can infer from the data.