**Project 2019: Programming for Data Analysis.**

The tasks to be carried out in this project are:

1. Choose a real-world phenomenon that can be measured and for which you could
collect at least one-hundred data points across at least four different variables.
2. Investigate the types of variables involved, their likely distributions, and their
relationships with each other.
3. Synthesise/simulate a data set as closely matching their properties as possible.
4.  Detail your research and implement the simulation in a Jupyter notebook – the
data set itself can simply be displayed in an output cell within the notebook.

This project is based on the real-world phenomenon of lifestyle factors that affect life expectancy. Improved rates of survival from heart disease and cancer among older people in Ireland are leading to increased life expectancy. However, lifestyle issues that may jeopardise these advances are:

1. Smoking
2. Drinking
3. Inactivity
4. Obesity

This project uses a Python script to calculate lifestyle risk. Such a script could be used to interpret data collected during a survey. The sections below discuss how each factor increases the risk of developing disease.

**Smoking**

Tobacco use is the leading cause of preventable death in Ireland, with almost 6,000 smokers dying each year from tobacco-related diseases.

Current Smoking Prevalence
• Males 24%, Females 21%
• Highest among those aged 25-34 years
• 864,000 current smokers in Ireland
• 1,050,000 ex-smokers in Ireland

**Drinking**

The Healthy Ireland Survey 2015 found that:

• Men drink more frequently than women: 60% of men who drink do so at least weekly, compared with 46% of women.
• Men across all age groups drink more frequently than women. However, the difference is smallest amongst those aged 15 to 24 (men: 42%, women: 36%) and 35 to 44 (men: 54%, women: 47%).
• Those drinking alcohol consume on average 5.6 standard drinks on a typical drinking occasion. The average is higher for men (7.2) than women (3.9).
• Three-quarters (75%) of men aged 15 to 24 who drink consume six or more standard drinks on a typical drinking occasion.
• Whilst prevalence of binge drinking declines with age, the extent of this decline is more substantial for women than men. Over 1 in 3 men aged 65 and over who drink do so at this level on a typical drinking occasion (34%), whereas fewer than 1 in 10 (9%) of women aged 65 and over who drink do so in this way.


Weekly alcohol consumption is simulated with a uniform distribution between zero and twenty standard units. Individuals who consumed more than the maximum recommended weekly number of standard units were considered to be heavy drinkers. Individuals who consumed the maximum weekly units or fewer were considered moderate drinkers. The simulation code uses limits of 17 units for men and 11 units for women.
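
The classification described above can be sketched as follows. This is a minimal illustration rather than the project code itself: the seed, the `WEEKLY_LIMIT` dictionary and the helper name `classify_drinker` are assumptions made for the example, while the limits of 17 and 11 units are the ones used in the notebook's `drinker_type` function.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # illustrative seed, chosen only for reproducibility

# Weekly limits in standard units, taken from the notebook's drinker_type function
WEEKLY_LIMIT = {'Male': 17, 'Female': 11}

def classify_drinker(gender, units_per_week):
    """Return 'Moderate' at or below the weekly limit, otherwise 'Heavy'."""
    return 'Moderate' if units_per_week <= WEEKLY_LIMIT[gender] else 'Heavy'

# Draw 1000 weekly consumption values uniformly between 0 and 20 units
units = rng.uniform(0, 20, size=1000)

# Classify every draw as if it belonged to a man
male_labels = [classify_drinker('Male', u) for u in units]
heavy_share = male_labels.count('Heavy') / len(male_labels)
print(f"Share of simulated men classed as heavy drinkers: {heavy_share:.2f}")
```

Because the draws are uniform on 0 to 20, roughly 3 in 20 simulated men should fall above the 17-unit limit, and roughly 9 in 20 simulated women above the 11-unit limit.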

**Physical activity**

Investigators found that in 2016 more than a quarter of the global population - 1.4 billion people - were insufficiently active.

In Ireland, almost four out of 10 women are not active enough, while the same is true for three in 10 men.

**Obesity**

Body Mass Index (BMI) is an excellent indicator of whether a person is overweight or underweight. It is calculated by dividing an individual's weight in kilograms by the square of their height in metres.

The average height of a man in Ireland is 177cm

The average height of a woman in Ireland is 163cm

The average weight of a man in Ireland is 80.7 kilograms and 72% of men are overweight

The average weight of a woman in Ireland is 69 kilograms and 52% of women are overweight

Heights and weights are simulated by drawing from normal distributions centred on these mean values.
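
As a rough sketch of this sampling step, the snippet below draws heights from normal distributions centred on the averages quoted above. The standard deviations (0.05 m for men, 0.04 m for women) are the values used in the notebook code; the seed and the use of NumPy's `default_rng` generator are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # illustrative seed, chosen only for reproducibility

# Mean heights in metres from the text; standard deviations as used in the notebook code
male_heights = rng.normal(loc=1.77, scale=0.05, size=1000)
female_heights = rng.normal(loc=1.63, scale=0.04, size=1000)

# With 1000 draws the sample means should sit very close to the chosen means
print(f"Simulated mean male height:   {male_heights.mean():.3f} m")
print(f"Simulated mean female height: {female_heights.mean():.3f} m")
```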

Body Mass Categories are defined as follows:

• Underweight: <18.5
• Normal: 18.5 - 25
• Overweight: 25 - 30
• Obese: >30

Underweight, overweight and obese individuals were considered to have a higher lifestyle risk than individuals of a normal weight.
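
To make the BMI calculation and the categories concrete, here is a small worked sketch applied to the average heights and weights quoted above. The function names `bmi` and `bmi_category` are hypothetical, and the handling of values that fall exactly on a boundary (25 counted as overweight, 30 counted as overweight) is an assumption, since the category list leaves those cases open.

```python
def bmi(weight_kg, height_m):
    """Body Mass Index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2

def bmi_category(value):
    """Map a BMI value to a category (boundary handling is an assumption)."""
    if value < 18.5:
        return 'Underweight'
    if value < 25:
        return 'Normal'
    if value <= 30:
        return 'Overweight'
    return 'Obese'

# Averages quoted in the text: 80.7 kg / 1.77 m for men, 69 kg / 1.63 m for women
average_man = bmi(80.7, 1.77)
average_woman = bmi(69, 1.63)

print(f"Average man:   BMI {average_man:.2f} ({bmi_category(average_man)})")
print(f"Average woman: BMI {average_woman:.2f} ({bmi_category(average_woman)})")
```

Both averages give a BMI just under 26, so a person of exactly average height and weight already falls in the overweight band.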

In [4]:
import pandas as pd
import numpy as np

In [5]:
#Create genders object
genders = ['Male', 'Female']

#Create the smoker object
smokers = ['Yes', 'No']

#Dataframe with column headings
df = pd.DataFrame(columns=['Genders', 'Height', 'Weight', 'Alcohol units/ wk', 'Exercise', 'BMI', 'BMI Range',
                           'Alcohol Consumption','Smoker', 'Life Style Risk'])

#Genders column with men and women
df['Genders'] = np.random.choice(genders, 1000, p=[0.48, 0.52])

#List of average weekly minutes of exercise taken
ave_minutes_exercise_week = [25, 50, 100, 125, 150, 175, 200]

#Dataframe with minutes of exercise according to actual probabilities inferred from HSE publications
df['Exercise'] = np.random.choice(ave_minutes_exercise_week, 1000, p=[0.05, 0.05, 0.1, 0.1, 0.5, 0.1, 0.1])

#function to create Male and Female heights (in metres) according to a normal distribution
def what_height(gender):
    if gender == 'Male':
        return np.random.normal(1.77, 0.05)
    elif gender == 'Female':
        return np.random.normal(1.63, 0.04)

#function to create Male and Female weights (in kilograms) according to a normal distribution
def what_weight(gender):
    if gender == 'Male':
        return np.random.normal(80.7, 0.74)
    elif gender == 'Female':
        return np.random.normal(69, 0.71)

#function that assigns smoker or non-smoker
def what_smoker(gender):
    if gender == 'Male':
        return np.random.choice(smokers, p=[0.211, 0.789])
    if gender == 'Female':
        return np.random.choice(smokers, p=[0.166, 0.834])

#function that assigns weekly units of standard drinks
def what_alcohol(gender):
    if gender == 'Male':
        return np.random.uniform(0, 20)
    if gender == 'Female':
        return np.random.uniform(0, 20)

#function that converts alcohol consumption into a categorical value of moderate or heavy drinker
#(the maximum weekly units or fewer counts as moderate: 17 for men, 11 for women)
def drinker_type(row):
    if row['Genders'] == 'Male' and row['Alcohol units/ wk'] <= 17:
        return 'Moderate'
    if row['Genders'] == 'Female' and row['Alcohol units/ wk'] <= 11:
        return 'Moderate'
    else:
        return 'Heavy'

#function that assigns lifestyle risk based on the output of the other functions
def lifestyle_risk(row):
    #Smokers are always assigned the highest risk category
    if row['Smoker'] == 'Yes':
        return 'Extremely High'
    #Risk for non-smokers taking 150 minutes of exercise a week or less,
    #keyed by (BMI Range, Alcohol Consumption)
    risk_table = {
        ('Underweight', 'Moderate'): 'Medium',
        ('Underweight', 'Heavy'): 'High',
        ('Normal', 'Moderate'): 'Medium',
        ('Normal', 'Heavy'): 'High',
        ('Overweight', 'Moderate'): 'High',
        ('Overweight', 'Heavy'): 'Very High',
        ('Obese', 'Moderate'): 'High',
        ('Obese', 'Heavy'): 'Very High',
    }
    if row['Smoker'] == 'No' and row['Exercise'] < 151:
        return risk_table.get((row['BMI Range'], row['Alcohol Consumption']), 'Normal')
    #Non-smokers taking more than 150 minutes of exercise a week
    return 'Normal'


#Run the functions in the DataFrame
df['Height'] = df['Genders'].apply(what_height)
df['Weight'] = df['Genders'].apply(what_weight)
df['BMI'] = df['Weight']/df['Height']**2
df['Smoker'] = df['Genders'].apply(what_smoker)
df['Alcohol units/ wk'] = df['Genders'].apply(what_alcohol)
#Bin edges follow the BMI categories defined above (each pd.cut interval includes its right edge)
df['BMI Range'] = pd.cut(df['BMI'], [-np.inf, 18.5, 25, 30, np.inf],
                         labels=["Underweight", "Normal", "Overweight", "Obese"]
                        ).astype(str)
df['Alcohol Consumption'] = df.apply(drinker_type, axis=1)
df['Life Style Risk'] = df.apply(lifestyle_risk, axis=1)

In [6]:
#Display the DataFrame, rounding continuous variables to two decimal places
df.round({'Height': 2, 'Weight': 2, 'BMI': 2, 'Alcohol units/ wk': 1})

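|   | Genders | Height | Weight | Alcohol units/ wk | Exercise | BMI | BMI Range | Alcohol Consumption | Smoker | Life Style Risk |
|---|---------|--------|--------|-------------------|----------|-----|-----------|---------------------|--------|-----------------|
| 0 | Male | 1.76 | 81.09 | 8.3 | 150 | 26.16 | Overweight | Moderate | No | High |
| 1 | Male | 1.80 | 80.92 | 13.2 | 150 | 25.01 | Overweight | Moderate | No | High |
| 2 | Male | 1.78 | 81.43 | 15.6 | 150 | 25.70 | Overweight | Moderate | No | High |
| 3 | Female | 1.63 | 69.49 | 9.6 | 125 | 26.31 | Overweight | Moderate | No | High |
| 4 | Female | 1.63 | 69.01 | 7.4 | 150 | 26.03 | Overweight | Moderate | No | High |
| 5 | Female | 1.66 | 69.37 | 7.4 | 150 | 25.19 | Overweight | Moderate | No | High |
| 6 | Male | 1.75 | 80.65 | 0.8 | 200 | 26.44 | Overweight | Moderate | No | Normal |
| 7 | Female | 1.60 | 70.98 | 10.6 | 100 | 27.71 | Overweight | Moderate | No | High |
| 8 | Female | 1.56 | 69.33 | 6.0 | 100 | 28.32 | Overweight | Moderate | No | High |
| 9 | Female | 1.58 | 68.44 | 14.1 | 100 | 27.31 | Overweight | Heavy | Yes | Extremely High |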
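
One way to check a simulated data set against expectations is to tally the risk categories, overall and by gender. The sketch below does this for the ten rows displayed above only, re-entered by hand so that the example is self-contained. The variable names `sample`, `risk_counts` and `risk_by_gender` are illustrative, and the same two calls could be applied to the full `df`.

```python
import pandas as pd

# The ten rows displayed above, reduced to the two columns needed for the tally
sample = pd.DataFrame({
    'Genders': ['Male', 'Male', 'Male', 'Female', 'Female',
                'Female', 'Male', 'Female', 'Female', 'Female'],
    'Life Style Risk': ['High', 'High', 'High', 'High', 'High',
                        'High', 'Normal', 'High', 'High', 'Extremely High'],
})

# Number of individuals in each risk category
risk_counts = sample['Life Style Risk'].value_counts()

# Risk categories broken down by gender
risk_by_gender = pd.crosstab(sample['Genders'], sample['Life Style Risk'])

print(risk_counts)
print(risk_by_gender)
```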
