# BMI INDEX IN IRELAND
## AUTHOR: ANTE DUJIC

## INTRODUCTION

This notebook builds, step by step, a simulated dataset describing the relationship between BMI and alcohol consumption in Ireland. The data is modelled and synthesised using the *numpy.random* package in Python.

Body mass index (BMI) is a value derived from the mass (weight) and height of a person. It is defined as body mass divided by the square of body height and is expressed in kg/m², with mass in kilograms and height in metres. [1] It is widely used to assess whether individuals are underweight, overweight, or obese. [2]

Alcohol consumption is often linked to higher body weight: drinking more than seven times per week has been associated with an increased risk of weight gain and of developing overweight and obesity. [3] Average BMI also rises with the level of alcohol consumption in both men and women. [4]

The goal of this project is to create a dataset that reflects real life, with realistic distributions and relationships between variables. The variables that will be generated are:
1. ID
2. Gender
3. Age
4. Height
5. Weight
6. BMI
7. Alcohol Consumption
8. BMI Classification
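
The BMI definition from the introduction (mass in kilograms divided by the square of height in metres) can be written as a small helper. The classification function below is a sketch that assumes the standard WHO adult cut-offs (underweight below 18.5, normal 18.5 to under 25, overweight 25 to under 30, obese 30 and above); the notebook has not yet fixed its own categories for the *BMI Classification* variable, and the function names `bmi` and `bmi_class` are illustrative.

```python
import numpy as np


def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2: mass divided by height squared.

    Accepts scalars or array-likes, so it also works on whole columns.
    """
    weight_kg = np.asarray(weight_kg, dtype=float)
    height_m = np.asarray(height_m, dtype=float)
    return weight_kg / height_m ** 2


def bmi_class(value):
    """Map one BMI value to a category (assumed WHO adult cut-offs)."""
    if value < 18.5:
        return "Underweight"
    if value < 25:
        return "Normal"
    if value < 30:
        return "Overweight"
    return "Obese"


# Example: 70 kg at 1.75 m gives 70 / 3.0625 = 22.857... kg/m^2.
value = float(bmi(70, 1.75))
print(round(value, 1), bmi_class(value))  # 22.9 Normal
```

Because `bmi` works element-wise on arrays, it can later be applied directly to the generated weight and height columns.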


In [16]:
# Libraries.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import names  # third-party package (PyPI: names) for random first and last names

In [19]:
# Seeded random number generator (seed=1 makes the results reproducible).
rng = np.random.default_rng(seed=1)
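
Seeding the generator is what makes the simulated dataset reproducible: two generators created with the same seed return the same sequence of draws. The check below is a minimal sketch using only the standard NumPy `Generator` API; the names `rng_a`, `rng_b` and `rng_c` are illustrative.

```python
import numpy as np

# Two independent generators with the same seed as the notebook (seed=1).
rng_a = np.random.default_rng(seed=1)
rng_b = np.random.default_rng(seed=1)

draws_a = rng_a.random(5)
draws_b = rng_b.random(5)

# Same seed, same sequence of draws.
print(np.array_equal(draws_a, draws_b))  # True

# A different seed gives a different sequence.
rng_c = np.random.default_rng(seed=2)
print(np.array_equal(draws_a, rng_c.random(5)))  # False
```

Every call advances the generator's state, so re-running cells or running them out of order changes all later draws. Re-creating `rng` before generating the full dataset keeps the results consistent.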

### 1. ID

This variable identifies each person in the final dataset. Two approaches were considered for this project:
- Creating a list of numbered persons
- Creating a list of random full names

Both are shown below.

In [12]:
# Creating a list of numbered persons ("Person 1" to "Person 10").
person = [f"Person {i}" for i in range(1, 11)]
person

['Person 1',
 'Person 2',
 'Person 3',
 'Person 4',
 'Person 5',
 'Person 6',
 'Person 7',
 'Person 8',
 'Person 9',
 'Person 10']

In [14]:
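# Creating a list of names. [1]
# The names package does not draw from the seeded rng above,
# so this output is not reproduced by seed=1.
for _ in range(10):
    print(names.get_full_name())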

Johnnie Cochran
Kari Moore
Thomas Hale
Corrine Stoner
Randall Thibodeaux
Kimberly Ralls
David Wade
Brittany Ackerman
Freddy Stokes
Willa Wogan


### 2. GENDER

Gender is a categorical variable [5]; in this dataset it takes two values, *male* and *female*. According to the latest World Bank data (2020), females make up 50.4% of the Irish population [6], so males make up 49.6%. The data is generated with the *choice* method of the NumPy random generator (`rng.choice`), which accepts a probability for each value. For the sake of this project, these shares are approximated as 51% female and 49% male.

In [28]:
# Generating gender with set probability.
gender_choice = ["Female", "Male"]
gender = rng.choice(gender_choice, p=[0.51, 0.49], size=10)
gender

array(['Female', 'Male', 'Female', 'Male', 'Male', 'Male', 'Male', 'Male',
       'Male', 'Female'], dtype='<U6')
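
With only 10 draws the sample shares can sit far from the target probabilities (here 3 of the 10 people are female). A quick way to check that `rng.choice` honours `p` is to draw a large sample and compare the observed shares. This is a sketch: the sample size of 100,000 is an arbitrary choice for illustration, and a fresh generator is created so the notebook's own `rng` state is not consumed.

```python
import numpy as np

# Fresh generator so the notebook's rng is left untouched.
check_rng = np.random.default_rng(seed=1)
gender_choice = ["Female", "Male"]

# Large sample (arbitrary size) to compare observed shares with p.
sample = check_rng.choice(gender_choice, p=[0.51, 0.49], size=100_000)
values, counts = np.unique(sample, return_counts=True)

# Observed share of each category, as plain Python types.
shares = {str(v): float(c) / sample.size for v, c in zip(values, counts)}
print(shares)  # both shares should be close to 0.51 and 0.49
```

The larger the sample, the closer the observed shares get to the target probabilities, which is worth keeping in mind when choosing the final size of the dataset.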

## REFERENCES

### MAIN

- [1] https://en.wikipedia.org/wiki/Body_mass_index
- [2] https://www.hindawi.com/journals/tswj/2012/849018/
- [3] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4338356/
- [4] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6484200/
- [5] https://builtin.com/data-science/data-types-statistics
- [6] https://data.worldbank.org/indicator/SP.POP.TOTL.FE.ZS?locations=IE

### CODE

- [1] https://moonbooks.org/Articles/How-to-generate-random-names-first-and-last-names-with-python-/