# Introduction to Microsim: Population Factory and Population

In [1]:
import os

microsimDir = "/Users/deligkaris.1/OneDrive - The Ohio State University Wexner Medical Center/MICROSIM/CODE/microsim"
os.chdir(microsimDir)

## Population Factory

The Population Factory gives us an easy way to obtain a Population. Creating a population requires two major parts, each of which includes several subparts, and the Population Factory takes care of both for us. In this section, I will go over the two major parts of a Microsim Population object. The NHANES population is used as an example, but any other population type should work in a similar way.

When we use the Population Factory we can check that the object we get is indeed a Microsim Population.

In [10]:
from microsim.population_factory import PopulationFactory

pop = PopulationFactory.get_nhanes_population(n=10, year=2017, nhanesWeights=True)

type(pop)

microsim.population.Population

In summary, the Population Factory prepares all the person objects that will make up the people of the population and initializes all the models the population needs to know.

## Population

A population object, like a person object, can be thought of as consisting of three things: its state, its functions, and its models. We will start with its state.

### State

The state of the population is determined by the state of its people.

In [12]:
pop._people

55482    Person(name = 55482 index = 0  raceEthnicity=4...
53505    Person(name = 53505 index = 1  raceEthnicity=5...
55183    Person(name = 55183 index = 2  raceEthnicity=3...
53518    Person(name = 53518 index = 3  raceEthnicity=3...
53973    Person(name = 53973 index = 4  raceEthnicity=3...
55394    Person(name = 55394 index = 5  raceEthnicity=3...
56697    Person(name = 56697 index = 6  raceEthnicity=3...
55714    Person(name = 55714 index = 7  raceEthnicity=3...
54207    Person(name = 54207 index = 8  raceEthnicity=3...
56143    Person(name = 56143 index = 9  raceEthnicity=1...
dtype: object

The people attribute is a Pandas Series object. Pandas Series objects are 1-dimensional, so a single index is used to access the elements.
In this case, each element of the people Series is a Microsim Person.

In [15]:
type(pop._people)

pandas.core.series.Series

We can get each person by position using the iloc accessor of the Series. Position 0 gets us the first of the 10 people.

In [17]:
person = pop._people.iloc[0]
person

Person(name = 55482 index = 0  raceEthnicity=4 education=5 gender=2 smokingStatus=0 modality=ct age=51.0 sbp=118.7 dbp=88.7 a1c=5.3 hdl=64.0 ldl=115.0 trig=28.0 totChol=178.0 bmi=19.9 anyPhysicalActivity=1.0 waist=72.9 alcoholPerWeek=2.0 creatinine=0.6 afib=0.0 pvd=0.0 statin=False antiHypertensiveCount=0.0)

We can verify that the first element of the people Series is indeed a Microsim Person.

In [18]:
type(person)

microsim.person.Person

Each of the 10 person objects is independent of the others. However, because all 10 were obtained by sampling with replacement from the NHANES dataset for that year, there is always a chance that two Microsim Person objects originated from the same NHANES data frame row. When that is the case, the two persons have the same name. Our person has a name of 55482, and if we come across another person with the same name, they were both created from the same NHANES data frame row.

In [19]:
person._name

55482

The person index however is unique to each person of the population. There should not be another person with index 0.

In [20]:
person._index

0
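The difference between a person's name and index can be illustrated with a small standalone sketch. It does not use Microsim or NHANES data: the row names, the fixed seed, and the dictionary layout are all hypothetical and serve only to show how sampling with replacement can repeat a name while the index stays unique.

```python
import random
from collections import Counter

# Hypothetical stand-ins for NHANES row identifiers (not real NHANES data).
row_names = [101, 102, 103, 104, 105]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Sampling with replacement: the same row can be drawn more than once.
sampled_names = rng.choices(row_names, k=10)

# Each sampled person gets a unique position-based index, similar to person._index.
people = [{"name": name, "index": i} for i, name in enumerate(sampled_names)]

# Names drawn more than once correspond to people built from the same row.
name_counts = Counter(p["name"] for p in people)
repeated_names = {name: count for name, count in name_counts.items() if count > 1}

indices = [p["index"] for p in people]
print("repeated names:", repeated_names)
print("indices unique:", len(set(indices)) == len(indices))
```

Because this toy example draws 10 people from only 5 rows, at least one name is guaranteed to repeat. With a dataset as large as NHANES, repeats are less likely but still possible.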

A Microsim Person that is part of a population is similar to what we explored in the Person notebook. For example, our person has an age list with a single element because we have just created the population.

In [21]:
person._age

[51.0]

### Models

All models of a population are stored in its _modelRepository. The population model repository is a dictionary.

In [14]:
pop._modelRepository

{'dynamicRiskFactors': <microsim.cohort_risk_model_repository.CohortDynamicRiskFactorModelRepository at 0x15778e580>,
 'defaultTreatments': <microsim.cohort_risk_model_repository.CohortDefaultTreatmentModelRepository at 0x17beae7f0>,
 'outcomes': <microsim.outcome_model_repository.OutcomeModelRepository at 0x17beaf580>,
 'staticRiskFactors': <microsim.cohort_risk_model_repository.CohortStaticRiskFactorModelRepository at 0x17bebe3a0>}

In [23]:
type(pop._modelRepository)

dict

The population needs to know the models for 1) dynamic risk factors, 2) default treatments, 3) outcomes, and 4) static risk factors. This is why there are four entries in the model repository dictionary, one for each model repository type needed.

In [25]:
len(pop._modelRepository.keys())

4

We can access the model repository for the dynamic risk factors by using that key. What we get is a dynamic risk factor model repository.

In [28]:
pop._modelRepository['dynamicRiskFactors']

<microsim.cohort_risk_model_repository.CohortDynamicRiskFactorModelRepository at 0x15778e580>

We can go one level deeper and see what this includes. It includes a different model for each dynamic risk factor. For example, for afib, it includes an afib incidence model.

In [29]:
pop._modelRepository['dynamicRiskFactors']._repository

{'afib': <microsim.afib_model.AFibIncidenceModel at 0x15778e6d0>,
 'pvd': <microsim.pvd_model.PVDIncidenceModel at 0x15778e9a0>,
 'age': <microsim.age_model.AgeModel at 0x17e855e80>,
 'alcoholPerWeek': <microsim.cohort_risk_model_repository.AlcoholCategoryModel at 0x177b04cd0>,
 'hdl': <microsim.statsmodel_linear_risk_factor_model.StatsModelLinearRiskFactorModel at 0x177b04ee0>,
 'bmi': <microsim.statsmodel_linear_risk_factor_model.StatsModelLinearRiskFactorModel at 0x177b04130>,
 'totChol': <microsim.statsmodel_linear_risk_factor_model.StatsModelLinearRiskFactorModel at 0x17e8558e0>,
 'trig': <microsim.statsmodel_linear_risk_factor_model.StatsModelLinearRiskFactorModel at 0x16a8f7df0>,
 'a1c': <microsim.statsmodel_linear_risk_factor_model.StatsModelLinearRiskFactorModel at 0x16a8fc280>,
 'ldl': <microsim.statsmodel_linear_risk_factor_model.StatsModelLinearRiskFactorModel at 0x16a88a730>,
 'waist': <microsim.statsmodel_linear_risk_factor_model.StatsModelLinearRiskFactorModel at 0x177b1

With this model repository a population can predict the future of its people under the US standard of care, meaning, without any interventions.
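The two-level lookup described in this section (repository type first, then risk factor) can be sketched with plain Python objects. ToyModel and ToyRepository are hypothetical stand-ins, not Microsim classes, and the three repositories whose contents are not shown above are left empty here rather than guessed.

```python
class ToyModel:
    """Hypothetical stand-in for a model; not a Microsim class."""

    def __init__(self, label):
        self.label = label


class ToyRepository:
    """Hypothetical stand-in for a model repository; not a Microsim class."""

    def __init__(self, models):
        # Maps a risk factor name to its model, mirroring the _repository attribute.
        self._repository = models


# Top level: a dictionary with one entry per repository type.
model_repository = {
    "dynamicRiskFactors": ToyRepository(
        {"afib": ToyModel("afib incidence"), "age": ToyModel("age")}
    ),
    "defaultTreatments": ToyRepository({}),  # contents not shown in this notebook
    "outcomes": ToyRepository({}),           # contents not shown in this notebook
    "staticRiskFactors": ToyRepository({}),  # contents not shown in this notebook
}

# Second level: pick the repository type, then the risk factor.
afib_model = model_repository["dynamicRiskFactors"]._repository["afib"]
print(afib_model.label)

for repo_type, repo in model_repository.items():
    print(repo_type, sorted(repo._repository))
```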

### Functions

The population includes functions that allow it to predict the future of its people. For example, the advance function makes predictions for all risk factors, default treatments, and outcomes. However, the first year this is done, only the outcomes are predicted, because the population has risk factors and default treatments from the NHANES data set but not outcomes. This is why all outcome lists for all people in the population are empty.

In [33]:
person._outcomes

{<OutcomeType.WMH: 'wmh'>: [],
 <OutcomeType.COGNITION: 'cognition'>: [],
 <OutcomeType.CI: 'ci'>: [],
 <OutcomeType.CARDIOVASCULAR: 'cv'>: [],
 <OutcomeType.STROKE: 'stroke'>: [],
 <OutcomeType.MI: 'mi'>: [],
 <OutcomeType.NONCARDIOVASCULAR: 'noncv'>: [],
 <OutcomeType.DEMENTIA: 'dementia'>: [],
 <OutcomeType.DEATH: 'death'>: [],
 <OutcomeType.QUALITYADJUSTED_LIFE_YEARS: 'qalys'>: []}

Let's predict the outcomes for this first year.

In [34]:
pop.advance(1)

Now we can see that we actually made predictions for the outcomes of our person.

In [35]:
person._outcomes

{<OutcomeType.WMH: 'wmh'>: [(51.0,
   WMH Outcome: OutcomeType.WMH, fatal: False, sbi: False, wmh: False,
                      wmhSeverityUnknown: False, wmhSeverity: WMHSeverity.NO)],
 <OutcomeType.COGNITION: 'cognition'>: [(51.0,
   Outcome type: OutcomeType.COGNITION, fatal: False, priorToSim: False)],
 <OutcomeType.CI: 'ci'>: [],
 <OutcomeType.CARDIOVASCULAR: 'cv'>: [],
 <OutcomeType.STROKE: 'stroke'>: [],
 <OutcomeType.MI: 'mi'>: [],
 <OutcomeType.NONCARDIOVASCULAR: 'noncv'>: [],
 <OutcomeType.DEMENTIA: 'dementia'>: [],
 <OutcomeType.DEATH: 'death'>: [],
 <OutcomeType.QUALITYADJUSTED_LIFE_YEARS: 'qalys'>: [(51.0,
   Outcome type: OutcomeType.QUALITYADJUSTED_LIFE_YEARS, fatal: False, priorToSim: False)]}

We can also verify that we have not changed the risk factors yet.

In [36]:
person._age

[51.0]

But if we advance again, the population will make predictions for risk factors, default treatments, and outcomes.

In [37]:
pop.advance(1)

In [38]:
person._age

[51.0, 52.0]
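The order of updates seen in the two advance calls can be mimicked with a toy sketch. ToyPerson and ToyPopulation are hypothetical classes written only for illustration; they are not how Microsim implements advance, and the outcome label is a placeholder rather than a real prediction.

```python
class ToyPerson:
    """Hypothetical stand-in for a person; not the Microsim Person class."""

    def __init__(self, age):
        self.age = [age]    # one entry per simulated year
        self.outcomes = []  # (age, outcome label) pairs


class ToyPopulation:
    """Hypothetical stand-in that mimics the order of updates described above."""

    def __init__(self, people):
        self.people = people
        self.years_advanced = 0

    def advance(self, years):
        for _ in range(years):
            for person in self.people:
                # Baseline risk factors already exist for the first year,
                # so only later years append new risk factor values.
                if self.years_advanced > 0:
                    person.age.append(person.age[-1] + 1)
                # Outcomes are recorded every year, at the current age.
                person.outcomes.append((person.age[-1], "placeholder outcome"))
            self.years_advanced += 1


toy_pop = ToyPopulation([ToyPerson(51.0)])

toy_pop.advance(1)
ages_after_first = list(toy_pop.people[0].age)

toy_pop.advance(1)
ages_after_second = list(toy_pop.people[0].age)

print(ages_after_first)   # [51.0]
print(ages_after_second)  # [51.0, 52.0]
```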

The population includes functions that allow it to report information about its state. For example, the get_outcome_count function allows us to easily see how many people in the population have a particular outcome type, e.g., MI.

In [39]:
from microsim.outcome import OutcomeType

pop.get_outcome_count(OutcomeType.MI)

0

As another example of a reporting function, we can use the has_ci function of the population, which returns a list. This list reports whether each person object in our population has cognitive impairment.

In [40]:
pop.has_ci()

[False, False, False, False, False, False, False, False, False, False]
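A list of booleans like the one above is easy to summarize with plain Python. The sketch below uses a hypothetical list of flags rather than the actual output of has_ci, and it assumes the list follows the order of the people in the population.

```python
# Hypothetical per-person flags, shaped like the list returned by pop.has_ci().
ci_flags = [False, False, True, False, True]

# True counts as 1 and False as 0, so sum gives the number of flagged people.
n_with_ci = sum(ci_flags)
share_with_ci = n_with_ci / len(ci_flags)

# Positions of the flagged people, assuming the list follows the order of the people.
positions_with_ci = [i for i, flag in enumerate(ci_flags) if flag]

print(n_with_ci, share_with_ci, positions_with_ci)  # 2 0.4 [2, 4]
```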