# Factor Analysis Results - Choosing Independent Variables

Factor Analysis was performed on five domains extracted from the dataset:
* Demographics
* Context 
* Independent Living
* Activities
* Health

**Factor Analysis** is a method for grouping similar variables into components, with the goal of simplifying the data. We used **Principal Components Analysis** (PCA), which extracts components that are uncorrelated linear combinations of the variables. The first component explains the largest share of the variance, the second component explains the second-largest share, and so on.

Usually we do not keep all components to describe the dataset, only a subset. The **explained variance** of a component describes how much of the total variance that component contributes. We performed the factor analysis using PCA, estimating the model from the correlation matrix of the input fields. Keeping all components reproduces 100% of the variance of the original dataset. To keep only the subset of components that describes most of the variance, we can use one of the following two criteria:
* Eigenvalue cutoff value
* Maximum number of components

We decided to keep the components with an **eigenvalue > 1.0**.
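The selection rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not the tool used for the analysis: the function name `pca_on_correlation`, its parameters and the synthetic data are assumptions made for the example.

```python
import numpy as np

def pca_on_correlation(X, eigenvalue_cutoff=1.0):
    """PCA on the correlation matrix of X (rows = respondents, columns = variables).

    Returns all eigenvalues (descending), the share of variance each one
    explains, and the eigenvectors of the components above the cutoff.
    """
    R = np.corrcoef(X, rowvar=False)          # p x p correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)      # symmetric input, ascending order
    order = np.argsort(eigvals)[::-1]         # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()       # eigenvalues of R sum to p
    keep = eigvals > eigenvalue_cutoff        # eigenvalue > 1.0 rule
    return eigvals, explained, eigvecs[:, keep]

# Synthetic stand-in data (NOT the study dataset): two hidden factors,
# each driving three noisy observed variables.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
X = np.repeat(latent, 3, axis=1) + rng.normal(scale=0.5, size=(500, 6))

eigvals, explained, kept_vectors = pca_on_correlation(X)
print("eigenvalues:", np.round(eigvals, 2))
print("cumulative variance:", np.round(np.cumsum(explained), 2))
print("components kept:", kept_vectors.shape[1])
```

Because the correlation matrix standardizes every variable to unit variance, an eigenvalue above 1.0 means the component explains more variance than any single original variable, which is the usual motivation for this cutoff.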

| Domain            | Number of all components |Number of returned components |
|-------------------|--------------------------|------------------------------|
| Context           | 70                       |20                            |
| Health            | 63                       |20                            |
| Independent Living| 14                       |3                             |
| Activities        | 8                        |3                             |

## Context

![Context-PercentOfVarianceOfPCAComponents.jpg](attachment:Context-PercentOfVarianceOfPCAComponents.jpg)

PCA results based on the correlation matrix of the input fields show that the first five components have the following properties:
* **Component 1**: Describes the adult's social life, meaning which siblings or other people he/she communicates with or shares a household with.
* **Component 2**: Describes whether the adult uses assistive accessories (e.g. glasses, hearing device, walking stick, orthopedic shoes, ...).
* **Component 3**: Describes whether the adult engages in spiritual practices (prayer, meditation, yoga, celebrating holidays, ...).
* **Component 4**: Describes the equipment of the apartment (hot water, central heating, bath/shower, toilet, kitchen, balcony).
* **Component 5**: Describes whether the apartment has special assistive equipment (e.g. bathing seat, toilet with wall handles, wall-mounted washbasin, adapted bed, lift, stairs with handrails, ramp).
* **...**

![Context_FactorAnalysis_PCA_Correlation.jpg](attachment:Context_FactorAnalysis_PCA_Correlation.jpg)

## Health

![Health-PercentOfVarianceOfPCAComponents.jpg](attachment:Health-PercentOfVarianceOfPCAComponents.jpg)

PCA results based on the correlation matrix of the input fields show that the first five components have the following properties:
* **Component 1**: Describes how the adult is feeling (forgetting things, feeling lonely, feeling anxious, restless, nostalgic, feeling everything is pointless, fear of death, view on life so far).
* **Component 2**: Describes the adult's vital functions (heart, pressure, teeth, digestion, breathing, urinary tract, moving, skin, sight, hearing, balance, memory, sleeping, mental health, brain, overall health).
* **Component 3**: Describes whether the adult is a member of any organization (professional society, cultural society, sport society, Karitas, Red Cross, firefighting society, political party, pensioners' association, third life university).
* **Component 4**: Describes a negative correlation with exercising, recreation and entertainment.
* **Component 5**: Describes a positive correlation with socializing with family.
* **...**

![Health_FactorAnalysis_PCA_Correlation.jpg](attachment:Health_FactorAnalysis_PCA_Correlation.jpg)

## Independent Living

![IndepLiving-PercentOfVarianceOfPCAComponents.jpg](attachment:IndepLiving-PercentOfVarianceOfPCAComponents.jpg)

PCA results based on the correlation matrix of the input fields show that the first 3 components have the following properties:
* **Component 1**: Describes whether the adult can take care of basic activities alone: dressing, personal hygiene, lavatory, eating and getting out of bed.
* **Component 2**: Describes how the adult moves around, e.g. by walking without or with accessories, or with the help of another person.
* **Component 3**: Describes whether the person needs the help of another person.

![IndependentLiving_FactorAnalysis_PCA_Correlation.jpg](attachment:IndependentLiving_FactorAnalysis_PCA_Correlation.jpg)

## Activities

![Activities-PercentOfVarianceOfPCAComponents.jpg](attachment:Activities-PercentOfVarianceOfPCAComponents.jpg)

PCA results based on the correlation matrix of the input fields show that the first 3 components have the following properties:
* **Component 1**: Describes whether the person regularly exercises in nature (walking, running, gardening), takes care of nutrition and, in addition, consciously takes care of his/her health.
* **Component 2**: Describes whether the person regularly exercises indoors and does sports, excluding gardening or other physical work.
* **Component 3**: Describes whether the person regularly does sports but does not consciously take care of his/her health (e.g. does not take care of nutrition).

![Activities_FactorAnalysis_PCA_Correlation.jpg](attachment:Activities_FactorAnalysis_PCA_Correlation.jpg)

# Choosing Dependent Variables

When choosing the dependent variable for predicting one's independent living, we can choose between:
* Known indexes (e.g. Katz Index)
* The result of the annotation application

The **Katz Index** is a widely used instrument for assessing the functional status of the elderly; its result shows how independent they are in performing activities of daily living. It measures functional independence in basic activities of daily living, evaluating six functions:
* Bathing
* Dressing 
* Toileting 
* Transferring
* Continence
* Feeding

Each function is scored yes (1) or no (0). A score of 6 indicates full function, 4 indicates moderate impairment, and 2 or less indicates severe functional impairment.

The table below lists potential variables from the dataset, where:
* Variables in grey were determined by the Anton Trstenjak Institute as part of the Independent Living domain
* The other variables were added by us as variables that can be linked to one of the Katz basic functions above

![ListOfPossibleKatzInputVariables.jpg](attachment:ListOfPossibleKatzInputVariables.jpg)

# Annotator Application

One option for creating the dependent variable is an annotation application that visualizes the important variables indicating one's level of independence. Based on the visualizations for each adult, domain experts can annotate that adult's level of independence.

Options for building the annotation application:
* **Simple annotation application**: Each Katz Index area is described by one variable, based on which the gerontologist decides on the person's independence.
* **Advanced annotation application**: Each Katz Index area is described by multiple variables, and gerontologists decide on independence using a Likert scale.

## Simple Annotation Application Example

![Simple%20annotation%20application%20example.jpg](attachment:Simple%20annotation%20application%20example.jpg)

## Advanced Annotation Application Example

![Complex%20annotation%20application%20example.jpg](attachment:Complex%20annotation%20application%20example.jpg)

# Index of Independent Living Modeling

Once we have chosen the independent and dependent variables, we can create a predictive model that predicts the independence of an adult:
* **Independent variables** are chosen based on the results of the Factor Analysis
* **Dependent variable**, the index of independent living, is chosen either from existing indexes or as a result of the annotation application


## Calculation of Independence Score based on Katz Index

Below is the result of a simple predictive model in which the dependent variable was calculated from existing variables in the dataset. The Katz Index was calculated from 5 inputs (one is missing), so the score interpretation was adjusted accordingly.

Variables used to calculate Katz index:
* **Bathing**: V113b - Alone: How do you take care of personal hygiene?
* **Dressing**: V113a - Alone: How do you take care of dressing?
* **Toileting**: V113c - Alone: How do you use bathroom and toilet?
* **Transferring**: V113f - Alone: How do you get out of bed?
* **Feeding**: V113e - Alone: How do you eat?

The score was calculated as follows (because we left out the continence parameter, we lowered the score interpretation table by 1 point):
* Score 5 (instead of 6): Full function
* Scores 4,3,2 (instead of 5,4,3): Moderate impairment
* Scores 1,0 (instead of 2,1,0): Severe impairment
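The adjusted scoring rule above can be sketched as follows. This is a minimal illustration under one stated assumption: each V113 variable is coded 1 when the respondent performs the activity alone and 0 otherwise. The function names and the sample respondent are hypothetical.

```python
# Survey variables used for the five available Katz functions
# (continence is left out, so the maximum score is 5 instead of 6).
KATZ_VARIABLES = {
    "Bathing": "V113b",
    "Dressing": "V113a",
    "Toileting": "V113c",
    "Transferring": "V113f",
    "Feeding": "V113e",
}

def independence_score(record):
    """Sum the five items; assumes each is coded 1 (alone) or 0 (not alone)."""
    values = [record[var] for var in KATZ_VARIABLES.values()]
    if any(v not in (0, 1) for v in values):
        raise ValueError("each item must be coded 0 or 1")
    return sum(values)

def interpret(score):
    """Map the adjusted 0-5 score to the impairment categories."""
    if score == 5:
        return "Full Function"
    if score >= 2:
        return "Moderate Impairment"
    return "Severe Impairment"

# Hypothetical respondent who needs help only with bathing.
respondent = {"V113a": 1, "V113b": 0, "V113c": 1, "V113e": 1, "V113f": 1}
score = independence_score(respondent)
print(score, interpret(score))
```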

Table showing distribution of calculated Independence Score:

| Value               | Percentage | Count |
|---------------------|------------|-------|
| Full Function       | 96.20%     | 986   |
| Moderate Impairment | 3.22%      | 33    |
| Severe Impairment   | 0.59%      | 6     |
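As a quick consistency check, the percentages in the table can be recomputed from the counts. The snippet below is only a sketch; the variable names are chosen for illustration.

```python
# Counts taken from the distribution table above.
counts = {
    "Full Function": 986,
    "Moderate Impairment": 33,
    "Severe Impairment": 6,
}

total = sum(counts.values())
percentages = {label: round(100 * n / total, 2) for label, n in counts.items()}

for label, pct in percentages.items():
    print(f"{label}: {pct:.2f}% ({counts[label]} of {total})")
```

The three categories add up to 1025 respondents, and almost all of them fall into the Full Function category.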


## Example of predictive model using C5.0 decision tree technique

![CHAID_DecisionTree_IndepScore_Model.jpg](attachment:CHAID_DecisionTree_IndepScore_Model.jpg)
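The heading above names C5.0, a decision tree family that selects splits with an entropy-based criterion. The sketch below shows only that underlying idea, information gain for a single candidate split, and is not the model from the figure. The data are made up for illustration and the function names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Entropy reduction obtained by splitting the labels on a categorical feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Made-up example: does "needs help of another person" separate the classes?
labels     = ["Full", "Full", "Full", "Moderate", "Moderate", "Severe"]
needs_help = [0, 0, 0, 1, 1, 1]
lives_alone = [0, 1, 0, 1, 0, 1]

print("gain(needs_help): ", round(information_gain(needs_help, labels), 3))
print("gain(lives_alone):", round(information_gain(lives_alone, labels), 3))
```

A tree learner of this kind would prefer the feature with the larger gain at each node; C5.0 itself adds refinements such as gain ratio and pruning that are not shown here.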