# A Simulated Dataset of a Real-World Phenomenon

## Introduction

The real-world phenomenon that I will be looking at is the relationship between obesity and increased mortality. Obesity grades 2 and 3 (BMI ≥ 35) are associated with an approximate 30% increase in all-cause mortality (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4855514/).

This dataset can be used to predict the number of people in a population who are obese and who therefore face an increase in all-cause mortality.



## Investigation
Variables to look at:
Sex (What is the proportion of males to females?)
Weight (Average weight by sex and the standard deviation)
Height (Average height and standard deviation)
BMI 
BMI Category 
Risk of all-cause mortality

Average weight and height differ between males and females, so the sex of each simulated person needs to be decided before weight and height.

We'll start by importing some libraries.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

### Sex ratio

The numbers of males and females in Ireland by age group are:
Aged 15 to 24: 301,114 males and 292,055 females (1.03 m/f)
Aged 25 to 54: 1,087,587 males and 1,077,383 females (1.01 m/f)
Aged 55 to 64: 261,650 males and 260,737 females (1.00 m/f)

(Reference - https://www.indexmundi.com/ireland/demographics_profile.html)

In [2]:
# Number of males and females in Ireland aged 15 to 64 years old

males = 301114 + 1087587 + 261650
females = 292055 + 1077383 + 260737

total = males + females

print(f"The number of males aged 15 to 64 in Ireland is {males}")
print(f"The number of females aged 15 to 64 in Ireland is {females}")
print(f"The total number of males and females aged 15 to 64 in Ireland is {total}")
      

The number of males aged 15 to 64 in Ireland is 1650351
The number of females aged 15 to 64 in Ireland is 1630175
The total number of males and females aged 15 to 64 in Ireland is 3280526


In [3]:
# Proportions of males and females in Ireland aged 15 to 64

ratioOfMales = males / total
ratioOfFemales = females / total

print(ratioOfMales)
print(ratioOfFemales)

0.5030751166123969
0.4969248833876031


The figures below for weight and height are for 18 to 64 year olds. Here the assumption is made that the ratio of males to females in Ireland among 18 to 64 year olds is the same as among 15 to 64 year olds. However, as can be seen from the ratios above, the ratio of males to females decreases as age increases (1.03 for 15 to 24 year olds, 1.01 for 25 to 54 year olds and 1.00 for 55 to 64 year olds), so leaving out 15 to 17 year olds would be expected to lower the male proportion slightly.

We can now create a random sample of 1000 people with the same ratio of males to females as was found above.

In [4]:
# Sampling sex labels using the proportions of males and females

sex = ["male", "female"]

sampleSize = 1000

sexArray = np.random.choice(sex, sampleSize, p=[ratioOfMales, ratioOfFemales])

np.unique(sexArray, return_counts=True)

(array(['female', 'male'],
       dtype='<U6'), array([499, 501], dtype=int64))
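Because `np.random.choice` draws from NumPy's global random state, the counts above will change every time the cell is run. A minimal sketch of a reproducible version follows, assuming a seeded `Generator` is acceptable here. The seed value `42` and the names `rng`, `p_male` and `sex_sample` are illustrative and not part of the original notebook.

```python
import numpy as np

# Population counts for Ireland, ages 15 to 64 (same figures as above)
males = 301114 + 1087587 + 261650
females = 292055 + 1077383 + 260737
p_male = males / (males + females)

# A seeded generator returns the same sample on every run
rng = np.random.default_rng(42)  # 42 is an arbitrary, illustrative seed
sex_sample = rng.choice(["male", "female"], size=1000, p=[p_male, 1 - p_male])

# Count how many of each label were drawn
labels, counts = np.unique(sex_sample, return_counts=True)
print(dict(zip(labels, counts)))
```

Changing the seed gives a different but equally valid sample; fixing it simply makes the rest of the notebook repeatable.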

### Weight

The distribution of human body weight is roughly bell-shaped (http://www.usablestats.com/lessons/normal).

Average adult (20 to 69 years old) male weight is 76.7 kg with a standard deviation of 12.1 kg, and average female weight is 61.5 kg with a standard deviation of 11.1 kg (https://jech.bmj.com/content/jech/40/4/319.full.pdf).

These figures were collected in 1981 in Canada.

The weighted average adult weight for men and women is therefore:

In [5]:
# Mean weights and standard deviations (kg) from the Canadian study

maleWeight, femaleWeight = 76.7, 61.5

maleWeightSD, femaleWeightSD = 12.1, 11.1

# Calculating the weighted average
averageWeight = (maleWeight * ratioOfMales) + (femaleWeight * ratioOfFemales)
print(averageWeight)

69.14674177250843


#### Making some assumptions

The average adult weight in Europe is 70.8 kg (Reference - https://en.wikipedia.org/wiki/Human_body_weight ). As this is close to the weighted average weight from the Canadian study above, I'll assume that the Canadian figures will be similar to the real values for adult Europeans.

As no figures could be found for the standard deviation of male and female weights in Europe, I'll assume that the standard deviations will be similar to those in the Canadian study above.

#### Limitations to these assumptions

The Canadian figures date from 1981 and will not reflect any later increase in obesity and anorexia (need citation)

Possible differences in weight distributions between different countries (give examples - use extremes)

As the average European weight is slightly heavier than the weighted average weight found in the Canadian study above, a factor will be devised and then applied to the male and female weights and standard deviations.

In [6]:
# Dividing the European average weight by the weighted average Canadian weight
weightFactor = 70.8 / averageWeight
print(f"{weightFactor:.4f}")

1.0239


Applying this factor to the Canadian weights and standard deviations:

In [7]:
# Multiplying adult male and female weights and standard deviations by the weight factor

maleWeight = maleWeight * weightFactor
femaleWeight = femaleWeight * weightFactor
maleWeightSD = maleWeightSD * weightFactor
femaleWeightSD = femaleWeightSD * weightFactor

print(f"Adjusted European average male weight is {maleWeight:.1f}")
print(f"Adjusted European average female weight is {femaleWeight:.1f}")
print(f"Adjusted European male standard deviation is {maleWeightSD:.1f}")
print(f"Adjusted European female standard deviation is {femaleWeightSD:.1f}")

Adjusted European average male weight is 78.5
Adjusted European average female weight is 63.0
Adjusted European male standard deviation is 12.4
Adjusted European female standard deviation is 11.4
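
With a mean and standard deviation for each sex, a weight can be drawn for every person in the sample from a normal distribution that depends on that person's sex. The sketch below shows one way this might be done, assuming weight is normally distributed within each sex. The function name `sample_by_sex`, the seed and the equal-probability sex array are illustrative, and the parameter values passed in are the unadjusted Canadian figures, which can be swapped for the adjusted ones.

```python
import numpy as np

def sample_by_sex(sex_array, male_mean, male_sd, female_mean, female_sd, rng):
    """Draw one normally distributed value per person using sex-specific mean and SD."""
    is_male = sex_array == "male"
    means = np.where(is_male, male_mean, female_mean)
    sds = np.where(is_male, male_sd, female_sd)
    # rng.normal accepts arrays for loc and scale, giving one draw per element
    return rng.normal(loc=means, scale=sds)

rng = np.random.default_rng(0)  # illustrative seed
sex_array = rng.choice(["male", "female"], size=1000)  # equal odds, for illustration only

# Unadjusted Canadian means and SDs in kg (assumption: swap in adjusted values as needed)
weights = sample_by_sex(sex_array, 76.7, 12.1, 61.5, 11.1, rng)

print(weights[sex_array == "male"].mean())
print(weights[sex_array == "female"].mean())
```

The same function could be reused for height by passing the height means and standard deviations instead.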


### Height

Like human bodyweight, height is also roughly bell-shaped (http://www.usablestats.com/lessons/normal)

From ( https://jech.bmj.com/content/jech/40/4/319.full.pdf ) the average male height is 174.1 cm with a standard deviation of 7.0 cm and the average female height is 160.7 cm with a standard deviation of 6.6 cm.

The average height in Ireland is 177 cm for males and 163 cm for females (this, however, is for 20 to 49 year olds).

Limitation - the average height for Irish adults is given for 20 to 49 year olds, whereas in the Canadian study it was for 20 to 69 year olds. Adults lose height as they get older. Up to 1 cm can be lost for every decade after 40 years of age (https://medlineplus.gov/ency/article/003998.htm ).

No information could be found on the standard deviation of heights for Irish adults.

I'll make the assumption that heights for Irish adults from 20 to 69 years of age are similar to those found in the Canadian study and that the distribution, and therefore the standard deviations, are the same.



In [8]:
# Setting height variables

maleHeight, femaleHeight = 174.1, 160.7

maleHeightSD, femaleHeightSD = 7.0, 6.6

Percentage in Ireland with obesity

https://www.safefood.eu/SafeFood/media/SafeFoodLibrary/Documents/Professional/Nutrition/Adult-and-children-obesity-trends-ROI.pdf
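
To compare a simulated sample with published obesity figures, BMI has to be derived from weight and height (weight in kg divided by height in metres squared) and then binned into categories. What follows is a minimal sketch under several stated assumptions: the standard WHO adult cut-offs are used, sex is drawn with equal odds, the unadjusted Canadian means and standard deviations are used, and weight and height are drawn independently. In reality weight and height are correlated, so this simplification will distort the spread of BMI. The column names, seed and the `relative_risk` placeholder are illustrative only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # illustrative seed
n = 1000
sex = rng.choice(["male", "female"], size=n)  # equal odds, for illustration only
is_male = sex == "male"

# Canadian study means and SDs by sex; weight and height drawn independently (assumption)
weight_kg = rng.normal(np.where(is_male, 76.7, 61.5), np.where(is_male, 12.1, 11.1))
height_cm = rng.normal(np.where(is_male, 174.1, 160.7), np.where(is_male, 7.0, 6.6))

df = pd.DataFrame({"sex": sex, "weight_kg": weight_kg, "height_cm": height_cm})
df["bmi"] = df["weight_kg"] / (df["height_cm"] / 100) ** 2  # kg / m^2

# WHO adult categories; right=False makes each bin [lower, upper)
bins = [0, 18.5, 25, 30, 35, 40, np.inf]
labels = ["underweight", "normal", "overweight",
          "obese grade 1", "obese grade 2", "obese grade 3"]
df["bmi_category"] = pd.cut(df["bmi"], bins=bins, labels=labels, right=False)

# Placeholder risk column: 1.3 for BMI >= 35 (the ~30% increase cited in the
# introduction); 1.0 elsewhere is a simplification, not a finding
df["relative_risk"] = np.where(df["bmi"] >= 35, 1.3, 1.0)

# Share of the sample that is obese (BMI >= 30) and grade 2 or 3 (BMI >= 35)
print((df["bmi"] >= 30).mean(), (df["bmi"] >= 35).mean())
```

The printed shares can then be set against the published Irish obesity percentages to judge how realistic the simulated inputs are.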



## Simulating the Dataset

## Conclusion