# Introduction to Microsim: Person Factory and Person 

In [None]:
import os

microsimDir = "/Users/deligkaris.1/OneDrive - The Ohio State University Wexner Medical Center/MICROSIM/CODE/microsim"
os.chdir(microsimDir)

## Person Factory

The [person factory](https://github.com/jburke5/microsim/blob/master/microsim/person_factory.py#L23) imports person-level data, organizes it, completes the information when important variables are not available in the data set, and returns a Microsim Person object based on that data. 

In this example, we show how the person factory creates an NHANES person object. To create an NHANES person, we first need to read the data file that contains the NHANES information for everyone who was surveyed. The person factory uses this survey data to create an NHANES person.

In [2]:
from microsim.population_factory import PopulationFactory

nhanesDf = PopulationFactory.get_nhanesDf()

We can use the head function to take a look at the first 5 rows of the NHANES dataframe. There are a lot of columns and Pandas does not show all of them.

In [3]:
nhanesDf.head()

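|  | name | index | WTINT2YR | a1c | age | antiHypertensiveCount | bmi | diedBy2015 | gender | hdl | ... | alcoholPerWeek | completedInterview | missingSBP | htn3 | htn4 | meanSBP3 | jnc8 | raceEthnicity | smokingStatus | education |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 102641.406474 | 5.1 | 22.0 | 0.0 | 23.3 | 0 | 1.0 | 41.0 | ... | 0 | 1 | 0 | 0 | 0 | 110.666667 | 0 | 3 | 0 | 3 |
| 1 | 1 | 3 | 127351.373299 | 4.9 | 44.0 | 0.0 | 23.2 | 0 | 2.0 | 28.0 | ... | 3 | 1 | 0 | 0 | 0 | 118.0 | 0 | 3 | 0 | 4 |
| 2 | 2 | 8 | 14391.77847 | 5.4 | 21.0 | 1.0 | 20.1 | 0 | 1.0 | 43.0 | ... | 3 | 1 | 0 | 0 | 0 | 124.666667 | 0 | 5 | 0 | 3 |
| 3 | 3 | 11 | 26960.774346 | 5.6 | 43.0 | 0.0 | 33.3 | 0 | 2.0 | 73.0 | ... | 3 | 1 | 0 | 0 | 0 | 102.0 | 0 | 4 | 2 | 3 |
| 4 | 4 | 13 | 24912.668432 | 5.0 | 80.0 | 1.0 | 33.9 | 1 | 1.0 | 54.0 | ... | 2 | 1 | 0 | 1 | 1 | 97.0 | 0 | 3 | 0 | 5 |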


Let's take a look at the first row of the NHANES dataframe. It shows all the information about the person who was surveyed for NHANES and whose data are stored in the first row. This person was surveyed in 2011 and was 22 years old. WTINT2YR is the weight we can use during sampling to create a US-representative population. Note that this is just a row of a dataframe: it holds information, but it is not a Microsim person object.

In [17]:
x = nhanesDf.iloc[0]
x

name                                   0
index                                  0
WTINT2YR                   102641.406474
a1c                                  5.1
age                                 22.0
antiHypertensiveCount                0.0
bmi                                 23.3
diedBy2015                             0
gender                               1.0
hdl                                 41.0
ldl                                110.0
monthsToDeath                       61.0
monthsToDeath2                      61.0
otherLipidLowering                   0.0
selfReportCurrentHtnMed                0
selfReportHtn                          0
selfReportMI                           0
selfReportMIAge                      NaN
selfReportStroke                       0
selfReportStrokeAge                  NaN
creatinine                          0.91
statin                               0.0
timeInUS                             NaN
totChol                            168.0
trig            
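
The role of WTINT2YR as a sampling weight can be illustrated with a minimal sketch. It uses plain Python and only the five weights printed by head() above. It is not Microsim's own sampling code, and the sample size and seed are arbitrary choices made for this illustration.

```python
import random
from collections import Counter

# The `name` and WTINT2YR values of the five rows printed by head() above.
names = [0, 1, 2, 3, 4]
weights = [102641.406474, 127351.373299, 14391.77847, 26960.774346, 24912.668432]

# Draw 10,000 rows with replacement, with probability proportional to the weight.
rng = random.Random(0)  # fixed seed, chosen arbitrarily, for reproducibility
sample = rng.choices(names, weights=weights, k=10_000)

# Rows with a larger survey weight are drawn more often.
counts = Counter(sample)
for name in names:
    print(name, counts[name])
```

Sampling with replacement matters here: a row with a large weight stands for many people in the US population, so it should be able to appear many times in the simulated population.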

Now that we have this information about a single person, we can use it to create a Microsim Person object. The dataframe row and the Microsim Person contain the same information (e.g., age is 22 years in both), but programmatically they are two very different objects.

In [19]:
from microsim.person_factory import PersonFactory

person = PersonFactory.get_nhanes_person(x)

We can check the types of the first dataframe row and of the person object to confirm that they differ. The dataframe row is a Pandas Series, because selecting a single row of a Pandas dataframe returns a Series. The person object is a Microsim Person.

In [23]:
type(x)

pandas.core.series.Series

In [24]:
type(person)

microsim.person.Person

Let's take a look at the person we created. Age is 22 years as expected and all other variables are consistent with what we saw above on the dataframe row. For example, if you compare sbp, dbp, bmi their values are the same in the person object and the dataframe row.

If you look more carefully, you may notice that some variables that are present in the person object are not present in the dataframe row, for example afib, or pvd. The NHANES survey did not include that information but we do need these variables to be present in a Microsim person in order to carry out our projects. The person factory uses initialization models in order to make predictions for whether that Person object has afib/pvd or not.

In [6]:
person

Person(name = 0 index = None  raceEthnicity=3 education=3 gender=1 smokingStatus=0 modality=ct age=22.0 sbp=110.7 dbp=74.7 a1c=5.1 hdl=41.0 ldl=110.0 trig=84.0 totChol=168.0 bmi=23.3 anyPhysicalActivity=0.0 waist=81.0 alcoholPerWeek=0.0 creatinine=0.9 afib=0.0 pvd=0.0 statin=False antiHypertensiveCount=0.0)
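
The idea of completing missing variables can be sketched in a few lines. The example below is only an illustration under stated assumptions: ToyPerson, toy_afib_probability and toy_person_factory are hypothetical names, and the probability rule is invented for the example. Microsim's actual initialization models are separate models with their own coefficients.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyPerson:
    """Hypothetical stand-in for a Microsim Person (not the real class)."""
    age: float
    bmi: float
    afib: bool


def toy_afib_probability(age):
    """Invented rule for illustration only: the probability rises with age."""
    return min(0.01 + 0.001 * max(age - 40.0, 0.0), 1.0)


def toy_person_factory(row, rng):
    """Build a ToyPerson from a dict-like row, filling in afib if it is absent."""
    if "afib" in row:
        afib = bool(row["afib"])
    else:
        # The survey did not record afib, so draw it from the (toy) model.
        afib = rng.random() < toy_afib_probability(row["age"])
    return ToyPerson(age=row["age"], bmi=row["bmi"], afib=afib)


rng = random.Random(0)  # arbitrary seed for reproducibility
row = {"age": 22.0, "bmi": 23.3}  # values from the first NHANES row; no afib key
toyPerson = toy_person_factory(row, rng)
print(toyPerson)
```

The input row never gains an afib entry. Only the object returned by the factory carries the completed information, which mirrors the distinction between the dataframe row and the person object above.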

This marks the end of the section on person factories. We presented some information about the NHANES person factory, but the idea is similar for other person types, e.g., Kaiser.

## Person

### State

As described in the [Microsim wiki page](https://github.com/jburke5/microsim/wiki/Person#person-object-state), the state of a person object includes its risk factors, treatments, outcomes and treatment strategies. Now, we will take a closer look at those for our person object.

A person's risk factors are divided into static and dynamic risk factors. Our person's static risk factors are shown below. Note that this list only names the static risk factors the person has; it does not show their values.

In [8]:
person._staticRiskFactors

['raceEthnicity', 'education', 'gender', 'smokingStatus', 'modality']

We can access the values of those static risk factors as shown below. Our person is male and is a high school graduate.

In [13]:
person._gender

<NHANESGender.MALE: 1>

In [14]:
person._education

<Education.HIGHSCHOOLGRADUATE: 3>

We can also list all the dynamic risk factors our person has; age is one of them.

In [9]:
person._dynamicRiskFactors

['age',
 'sbp',
 'dbp',
 'a1c',
 'hdl',
 'ldl',
 'trig',
 'totChol',
 'bmi',
 'anyPhysicalActivity',
 'waist',
 'alcoholPerWeek',
 'creatinine',
 'afib',
 'pvd']

In contrast with static risk factors, each of which is stored as a single value, dynamic risk factors change over time, which is why they are stored in lists. For example, the age dynamic risk factor is a list of all the ages of this person. Because our person was created just now and we have not made any predictions about its future state, the age list contains only a single number, the current age.

In [15]:
person._age

[22.0]

The same is true for the sbp risk factor.

In [16]:
person._sbp

[110.66666666666667]
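
The difference in how static and dynamic risk factors are stored can be sketched with a hypothetical toy class. ToyPersonState is not the Microsim class, and advance_age is an invented helper; the sketch only shows that a static risk factor is a single value while a dynamic one is a list that grows by one entry per simulated year.

```python
class ToyPersonState:
    """Hypothetical illustration of scalar versus list-valued attributes."""

    def __init__(self, gender, age):
        self._gender = gender  # static: one value for the whole simulation
        self._age = [age]      # dynamic: one entry per simulated year

    def advance_age(self):
        """Append next year's age; earlier entries are kept as history."""
        self._age.append(self._age[-1] + 1)


toyState = ToyPersonState(gender=1, age=22.0)
print(toyState._age)      # [22.0]
toyState.advance_age()
print(toyState._age)      # [22.0, 23.0]
print(toyState._age[-1])  # 23.0, the current age
```

Keeping the full list rather than overwriting a single value preserves the history of each risk factor, and the last element is always the current value.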

In a similar way, we can see the default treatments for our person.

In [11]:
person._defaultTreatments

['statin', 'antiHypertensiveCount']

In [20]:
person._statin

[False]

Treatment strategies are stored in a dictionary. Currently, the only treatment strategy available in Microsim is for blood pressure, and our person does not have one applied.

In [12]:
person._treatmentStrategies

{'bp': {'status': None}}

The last thing that completes the state of our person is outcomes. Outcomes are stored in a dictionary, and each key is an outcome type. 

In [21]:
person._outcomes

{<OutcomeType.WMH: 'wmh'>: [],
 <OutcomeType.COGNITION: 'cognition'>: [],
 <OutcomeType.CI: 'ci'>: [],
 <OutcomeType.CARDIOVASCULAR: 'cv'>: [],
 <OutcomeType.STROKE: 'stroke'>: [],
 <OutcomeType.MI: 'mi'>: [],
 <OutcomeType.NONCARDIOVASCULAR: 'noncv'>: [],
 <OutcomeType.DEMENTIA: 'dementia'>: [],
 <OutcomeType.DEATH: 'death'>: [],
 <OutcomeType.QUALITYADJUSTED_LIFE_YEARS: 'qalys'>: []}

If we want to access all outcomes of a particular outcome type, e.g., dementia, we can use that outcome type as the key. In this case, our person has no dementia outcome, which is why the list of dementia outcomes is empty. We use a list for each outcome type because, in general, there can be more than one outcome of a specific outcome type. For example, a Microsim person can have more than one stroke in their lifetime (but only one such outcome is stored in the person for each year in the simulation).

In [22]:
from microsim.outcome import OutcomeType

person._outcomes[OutcomeType.DEMENTIA]

[]
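
The dictionary-of-lists layout can be sketched with the standard library alone. ToyOutcomeType is a hypothetical two-member enum, not microsim.outcome.OutcomeType, and the (age, description) tuples are invented placeholders for whatever a real outcome object stores.

```python
from enum import Enum


class ToyOutcomeType(Enum):
    """Hypothetical subset of outcome types (not microsim.outcome.OutcomeType)."""
    STROKE = "stroke"
    DEMENTIA = "dementia"


# One list per outcome type, all empty for a newly created person.
outcomes = {outcomeType: [] for outcomeType in ToyOutcomeType}

# Record two strokes at different ages; each entry is an (age, description)
# pair invented for this example.
outcomes[ToyOutcomeType.STROKE].append((70.0, "first stroke"))
outcomes[ToyOutcomeType.STROKE].append((75.0, "second stroke"))

print(len(outcomes[ToyOutcomeType.STROKE]))     # 2
print(outcomes[ToyOutcomeType.DEMENTIA])        # []
print(bool(outcomes[ToyOutcomeType.DEMENTIA]))  # False: no dementia outcome so far
```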

### Functions

[Functions of person objects](https://github.com/jburke5/microsim/wiki/Person#person-object-functions) help us either obtain information about the state of the object or predict the future state of the object. I will start with two examples of functions that provide information about the state of our person.

Person objects provide ways to probe the state of the object. For example, _mi tells us whether the person has ever had an MI outcome. Note that it is accessed without parentheses, like an attribute. We could have looked up the MI outcome type and inspected all outcomes of that type manually to get the same answer, but _mi gets us there more easily.

In [29]:
person._mi

False

Another example is the has_diabetes() function, which lets us easily find out whether our person has diabetes. Note that the response we get from the has_diabetes() function is currently False, but as we make predictions for our person for subsequent years, this response may at some point become True.

In [30]:
person.has_diabetes()

False
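
Both kinds of query can be sketched on a hypothetical toy class. The names _mi and has_diabetes mirror the notebook, but the logic is assumed rather than taken from Microsim: in particular, the rule that any recorded A1c of 6.5 or higher counts as diabetes is the common clinical cutoff and is used here only as an assumption.

```python
class ToyPersonQueries:
    """Hypothetical class; attribute names mirror the notebook, logic is assumed."""

    def __init__(self, a1c):
        self._a1c = [a1c]            # dynamic risk factor history
        self._outcomes = {"mi": []}  # plain string key instead of OutcomeType

    @property
    def _mi(self):
        """True if at least one MI outcome has been recorded."""
        return len(self._outcomes["mi"]) > 0

    def has_diabetes(self):
        """Assumed rule: any recorded A1c of 6.5 or higher counts as diabetes."""
        return any(a1c >= 6.5 for a1c in self._a1c)


toyQueries = ToyPersonQueries(a1c=5.1)
print(toyQueries._mi)             # False
print(toyQueries.has_diabetes())  # False

# A later, higher A1c value flips the answer.
toyQueries._a1c.append(6.8)
print(toyQueries.has_diabetes())  # True
```

Because both queries are computed from the stored state each time they are used, their answers update automatically as the simulation appends new risk factor values and outcomes.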

Person objects also have functions that make predictions for the future state of the person. However, those functions contain only general instructions (e.g., how to use a linear regression model to make a prediction) and not the specific information (e.g., the model coefficients). So, in order to use a person function to make a prediction, we need to provide that function with the specific information.

As an example, we will predict the values of the dynamic risk factors of our person one year from the present. We need to get the specific information for making those predictions from the CohortDynamicRiskFactorModelRepository and then provide it to the advance_risk_factors function. We can check that the type of our model repository is CohortDynamicRiskFactorModelRepository.

In [37]:
from microsim.cohort_risk_model_repository import CohortDynamicRiskFactorModelRepository

drfRepository = CohortDynamicRiskFactorModelRepository()

type(drfRepository)

microsim.cohort_risk_model_repository.CohortDynamicRiskFactorModelRepository

drfRepository holds a dictionary in which the keys are dynamic risk factors, e.g., afib, and the values are the models with the coefficients.

In [38]:
drfRepository._repository

{'afib': <microsim.afib_model.AFibIncidenceModel at 0x16c1d3070>,
 'pvd': <microsim.pvd_model.PVDIncidenceModel at 0x175248b50>,
 'age': <microsim.age_model.AgeModel at 0x16c187160>,
 'alcoholPerWeek': <microsim.cohort_risk_model_repository.AlcoholCategoryModel at 0x16c1d3f40>,
 'hdl': <microsim.statsmodel_linear_risk_factor_model.StatsModelLinearRiskFactorModel at 0x16c15b9a0>,
 'bmi': <microsim.statsmodel_linear_risk_factor_model.StatsModelLinearRiskFactorModel at 0x16c1d3f10>,
 'totChol': <microsim.statsmodel_linear_risk_factor_model.StatsModelLinearRiskFactorModel at 0x16c1d3310>,
 'trig': <microsim.statsmodel_linear_risk_factor_model.StatsModelLinearRiskFactorModel at 0x16c1bbbe0>,
 'a1c': <microsim.statsmodel_linear_risk_factor_model.StatsModelLinearRiskFactorModel at 0x16c1d7b20>,
 'ldl': <microsim.statsmodel_linear_risk_factor_model.StatsModelLinearRiskFactorModel at 0x16c1da910>,
 'waist': <microsim.statsmodel_linear_risk_factor_model.StatsModelLinearRiskFactorModel at 0x16c1d

Now, we can use the person advance_risk_factors function to make those predictions. Note that we expect the dynamic risk factors of our person to change, but everything else should stay the same, as we are not making any predictions for treatments, treatment strategies or outcomes.

In [34]:
# Reuse the repository created above instead of constructing a new one.
person.advance_risk_factors(drfRepository)

person

For example, let's look at the age attribute of our person. The age list now has two elements: the first is the age of our person when the person was first created (22), and the second is the age of our person one year later (23).

In [39]:
person._age

[22.0, 23.0]

Something similar happens to other dynamic risk factors.

In [40]:
person._sbp

[110.66666666666667, 112.16993908783287]

But default treatments should not change as no predictions were made for those.

In [43]:
person._statin

[False]

Outcomes should also stay the same.

In [44]:
person._outcomes

{<OutcomeType.WMH: 'wmh'>: [],
 <OutcomeType.COGNITION: 'cognition'>: [],
 <OutcomeType.CI: 'ci'>: [],
 <OutcomeType.CARDIOVASCULAR: 'cv'>: [],
 <OutcomeType.STROKE: 'stroke'>: [],
 <OutcomeType.MI: 'mi'>: [],
 <OutcomeType.NONCARDIOVASCULAR: 'noncv'>: [],
 <OutcomeType.DEMENTIA: 'dementia'>: [],
 <OutcomeType.DEATH: 'death'>: [],
 <OutcomeType.QUALITYADJUSTED_LIFE_YEARS: 'qalys'>: []}

### Models

Microsim [Person objects do not include any models](https://github.com/jburke5/microsim/wiki/Person#person-object-models). This is why we had to create a dynamic risk factor model repository and provide the models to our person in order to make the predictions.
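
This separation between the person and the models can be sketched as follows. Everything in the example is hypothetical: the class names, the estimate_next method, the plain dictionary used as a repository, and the linear coefficients are all invented for illustration and are not Microsim's API or fitted values.

```python
class ToyAgeModel:
    """Hypothetical model: next year's age is the current age plus one."""

    def estimate_next(self, person):
        return person._age[-1] + 1


class ToyLinearSbpModel:
    """Hypothetical linear model; the coefficients live here, not in the person."""

    def __init__(self, intercept, ageCoefficient):
        self.intercept = intercept
        self.ageCoefficient = ageCoefficient

    def estimate_next(self, person):
        return self.intercept + self.ageCoefficient * person._age[-1]


class ToyPersonAdvancing:
    """Hypothetical person: knows how to advance, but owns no models."""

    def __init__(self, age, sbp):
        self._dynamicRiskFactors = ["age", "sbp"]
        self._age = [age]
        self._sbp = [sbp]

    def advance_risk_factors(self, repository):
        """Generic step: ask each supplied model for next year's value."""
        # Compute all new values first so every model sees the same current state.
        nextValues = {rf: repository[rf].estimate_next(self)
                      for rf in self._dynamicRiskFactors}
        for rf, value in nextValues.items():
            getattr(self, "_" + rf).append(value)


# A plain dict stands in for the model repository (made-up coefficients).
repository = {"age": ToyAgeModel(), "sbp": ToyLinearSbpModel(100.0, 0.5)}
toyAdvancing = ToyPersonAdvancing(age=22.0, sbp=110.0)
toyAdvancing.advance_risk_factors(repository)
print(toyAdvancing._age)  # [22.0, 23.0]
print(toyAdvancing._sbp)  # [110.0, 111.0]
```

One benefit of this design is that the same person can be advanced with a different repository, for example one fitted on a different cohort, without changing the person class.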