# A Simulated Dataset of a Real-World Phenomenon

## Introduction

The real-world phenomenon that I will be looking at is the relationship between obesity and increased mortality. Obesity grades 2 and 3 (BMI ≥ 35) are associated with an approximately 30% increase in all-cause mortality (Reference: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4855514/).

This dataset can be used to estimate the number of people in a population who are obese and therefore at increased risk of all-cause mortality.



## Investigation
Variables to look at:

- Sex (What is the proportion of males to females?)
- Weight (Average weight by sex and the standard deviation)
- Height (Average height and standard deviation)
- BMI
- BMI Category
- Risk of all-cause mortality
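
The BMI and BMI Category variables can be derived from weight and height. The snippet below is a minimal sketch rather than the final implementation: it assumes weight in kilograms and height in metres, uses the standard WHO adult cut-offs, and the names `bmi`, `bmiBins`, `bmiLabels` and the three example rows are made up for illustration.

```python
import numpy as np
import pandas as pd


def bmi(weightKg, heightM):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weightKg / heightM ** 2


# WHO adult BMI cut-offs. Intervals are closed on the left, e.g. [35, 40).
bmiBins = [0, 18.5, 25, 30, 35, 40, np.inf]
bmiLabels = ["underweight", "normal", "overweight",
             "obese grade 1", "obese grade 2", "obese grade 3"]

# Three illustrative people (hypothetical values, not taken from any study).
example = pd.DataFrame({"weight_kg": [55.0, 70.0, 120.0],
                        "height_m": [1.80, 1.75, 1.70]})
example["bmi"] = bmi(example["weight_kg"], example["height_m"])
example["bmi_category"] = pd.cut(example["bmi"], bins=bmiBins,
                                 labels=bmiLabels, right=False)
print(example)
```

Using `right=False` places a BMI of exactly 35 in grade 2, which matches the BMI ≥ 35 threshold quoted in the introduction.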

Average weight and height differ between males and females, so sex is simulated first and the other variables are then generated separately for each sex.

We'll start by importing some libraries:

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

### Sex ratio

Worldwide, there are 102 males for every 100 females (Reference: https://en.wikipedia.org/wiki/Human_sex_ratio).

In [2]:
# Number of males and females.

population = 2000

sex = ["male", "female"]

ratioOfMales = 102/202
ratioOfFemales = 100/202

sexArray = np.random.choice(sex, population, p=[ratioOfMales, ratioOfFemales])

np.unique(sexArray, return_counts=True)

(array(['female', 'male'],
       dtype='<U6'), array([ 964, 1036], dtype=int64))
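
The counts above will change every time the cell is run, because no random seed is set. As a possible variant (not the approach used in the cell above), NumPy's `Generator` API can be seeded so that the same split is reproduced on each run. The seed value of 42 is an arbitrary choice.

```python
import numpy as np

# Seeded generator: the same seed always produces the same draws.
rng = np.random.default_rng(seed=42)  # 42 is an arbitrary, illustrative seed

population = 2000
sex = ["male", "female"]
ratioOfMales = 102 / 202
ratioOfFemales = 100 / 202

sexArray = rng.choice(sex, size=population, p=[ratioOfMales, ratioOfFemales])

# Count how many of each label were drawn.
labels, counts = np.unique(sexArray, return_counts=True)
print(dict(zip(labels, counts)))
```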

### Weight

Human bodyweight is roughly bell-shaped (http://www.usablestats.com/lessons/normal)

The average weight of adult males (20 to 69 years old) is 76.7 kg with a standard deviation of 12.1 kg, and the average weight of adult females is 61.5 kg with a standard deviation of 11.1 kg (Reference: https://jech.bmj.com/content/jech/40/4/319.full.pdf).

These figures were collected in 1981 in Canada. 

The average adult weight for men and women combined, weighted by the sex ratio above, is therefore:

In [3]:
# Setting some variables

maleWeight, femaleWeight = 76.7, 61.5

maleSD, femaleSD = 12.1, 11.1


# Calculating the weighted average
averageWeight = (maleWeight * ratioOfMales) + (femaleWeight * ratioOfFemales)
print(averageWeight)

69.17524752475248


#### Making some assumptions

The average adult weight in Europe is 70.8 kg (Reference: https://en.wikipedia.org/wiki/Human_body_weight). As this is close to the weighted average weight from the Canadian study above, I'll assume that the Canadian figures are similar to the real values for adult Europeans.

As no figures could be found for the standard deviation of male and female weights in Europe, I'll assume that the standard deviations are similar to those in the Canadian study above.

#### Limitations to these assumptions

Increases in obesity and anorexia since the Canadian data were collected (citation needed) may have changed the shape of the weight distribution.

Weight distributions may also differ between countries (examples of the extremes to be added).

As the average European weight is slightly heavier than the weighted average weight found in the Canadian study above, a scaling factor will be calculated and then applied to the male and female weights and standard deviations.

In [4]:
# Dividing European weight by weighted average Canadian weight
weightFactor = 70.8/69.18
print(weightFactor)

1.0234171725932348


Applying this factor to the Canadian weights and standard deviations:

In [5]:
# Multiplying adult male and female weights and standard deviations by weight factor

maleWeight = maleWeight * weightFactor
femaleWeight = femaleWeight * weightFactor
maleSD = maleSD * weightFactor
femaleSD = femaleSD * weightFactor

print(f"Adjusted European average male weight is {maleWeight:.1f}")
print(f"Adjusted European average female weight is {femaleWeight:.1f}")
print(f"Adjusted European male standard deviation is {maleSD:.1f}")
print(f"Adjusted European female standard deviation is {femaleSD:.1f}")

Adjusted European average male weight is 78.5
Adjusted European average female weight is 62.9
Adjusted European male standard deviation is 12.4
Adjusted European female standard deviation is 11.4
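
With the adjusted means and standard deviations, a weight can be drawn for each person according to their sex. The following is a sketch under the assumption that weight is normally distributed within each sex. It hard-codes the rounded figures printed above, regenerates a stand-in `sexArray` so that it runs on its own, and the names `rng`, `isMale` and `weightArray` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility only

# Adjusted figures printed above (kg), rounded to one decimal place.
maleWeight, femaleWeight = 78.5, 62.9
maleSD, femaleSD = 12.4, 11.4

# Stand-in for the sex array generated in the sex ratio section.
sexArray = rng.choice(["male", "female"], size=2000, p=[102 / 202, 100 / 202])

# Draw a candidate weight from each distribution, then keep the one
# that matches each person's sex.
isMale = sexArray == "male"
weightArray = np.where(
    isMale,
    rng.normal(maleWeight, maleSD, size=sexArray.size),
    rng.normal(femaleWeight, femaleSD, size=sexArray.size),
)

print(f"Simulated mean male weight: {weightArray[isMale].mean():.1f} kg")
print(f"Simulated mean female weight: {weightArray[~isMale].mean():.1f} kg")
```

A normal distribution is symmetric and unbounded, so it can occasionally produce implausibly low weights. Whether that matters depends on how the tails of the simulated data are used.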


### Height

Like human bodyweight, height is also roughly bell-shaped (http://www.usablestats.com/lessons/normal)

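Average male height is 70 inches with a standard deviation of 4 inches, and average female height is 65 inches with a standard deviation of 3.5 inches.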
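
Because BMI is defined in kilograms and metres, the height figures in inches need converting before they can be combined with the weights. A minimal sketch of the conversion follows. The variable names are illustrative, and 0.0254 m per inch is the exact definition of the inch.

```python
INCH_TO_M = 0.0254  # one inch is exactly 0.0254 metres

# Height figures quoted above, in inches.
maleHeightIn, maleHeightSDIn = 70, 4
femaleHeightIn, femaleHeightSDIn = 65, 3.5

# The standard deviation scales by the same factor as the mean.
maleHeight, maleHeightSD = maleHeightIn * INCH_TO_M, maleHeightSDIn * INCH_TO_M
femaleHeight, femaleHeightSD = femaleHeightIn * INCH_TO_M, femaleHeightSDIn * INCH_TO_M

print(f"Male height: {maleHeight:.3f} m (SD {maleHeightSD:.3f} m)")
print(f"Female height: {femaleHeight:.3f} m (SD {femaleHeightSD:.3f} m)")
```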



## Simulating the Dataset
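
One possible way to bring the variables from the Investigation section together is sketched below. It is not a finished simulation: it assumes weight and height are drawn independently within each sex (in reality they are correlated), it uses the rounded adjusted weights and the height figures converted from inches, and the column names, the `params` dictionary and the seed are illustrative choices.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)  # arbitrary seed
population = 2000

# (mean, SD) per sex: weights in kg, heights converted from inches to metres.
params = {
    "male": {"weight": (78.5, 12.4), "height": (70 * 0.0254, 4 * 0.0254)},
    "female": {"weight": (62.9, 11.4), "height": (65 * 0.0254, 3.5 * 0.0254)},
}

sexArray = rng.choice(["male", "female"], size=population, p=[102 / 202, 100 / 202])
df = pd.DataFrame({"sex": sexArray})

# Fill in weight and height one sex at a time.
# Assumption: weight and height are sampled independently of each other.
for name, p in params.items():
    mask = df["sex"] == name
    n = mask.sum()
    df.loc[mask, "weight_kg"] = rng.normal(*p["weight"], size=n)
    df.loc[mask, "height_m"] = rng.normal(*p["height"], size=n)

df["bmi"] = df["weight_kg"] / df["height_m"] ** 2
df["high_risk"] = df["bmi"] >= 35  # obesity grades 2 and 3

print(df.head())
print(f"Share with BMI >= 35: {df['high_risk'].mean():.1%}")
```

The `high_risk` column only flags the group for which the introduction quotes the increase in all-cause mortality. It does not model mortality itself.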

## Conclusion