# Data Container

With access to both [configuration](./configuration.ipynb) and [data files](./file-loader.ipynb), we can now create the custom data model needed for the analysis.

The key information is how children of different ages transition between different types of care. To capture this, we group both ages and placements into the broader categories defined by the configuration.
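To make the grouping step concrete, here is a minimal pandas sketch. It is not the `csdmpy` implementation: the bracket edges, the labels for the younger brackets, and the placement-code mapping are all assumptions chosen to be consistent with the sample output on this page, and the helper names `bin_age` and `group_placement` are hypothetical. In practice these definitions come from `Config`.

```python
import pandas as pd

# Hypothetical bracket edges in years; the real ones come from the configuration.
AGE_EDGES = [0, 1, 5, 10, 16, float("inf")]
# "10 to 16" and "16 to 18+" appear in the sample output; the other labels are assumed.
AGE_LABELS = ["Birth to 1", "1 to 5", "5 to 10", "10 to 16", "16 to 18+"]

# Illustrative mapping from placement codes to broader groups (assumed, not exhaustive).
PLACEMENT_GROUPS = {
    "U1": "Foster",
    "U4": "Foster",
    "U5": "Foster",
    "K2": "Resi",
    "H5": "Supported",
}


def bin_age(age: pd.Series) -> pd.Series:
    """Assign each age to a bracket; right=False makes brackets [lower, upper)."""
    return pd.cut(age, bins=AGE_EDGES, labels=AGE_LABELS, right=False)


def group_placement(place: pd.Series) -> pd.Series:
    """Map placement codes to groups; unmapped codes fall back to 'Other'."""
    return place.map(PLACEMENT_GROUPS).fillna("Other")


sample = pd.DataFrame({"age": [16.87, 14.95, 11.94], "PLACE": ["H5", "U4", "R2"]})
sample["age_bin"] = bin_age(sample["age"])
sample["placement_type"] = group_placement(sample["PLACE"])
```

Using half-open brackets means a child aged exactly 16 falls into the older bracket, which is one reasonable convention; the configured brackets may define their boundaries differently.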

We also enrich the model with the sequence of placements: for each placement, we record the placement type the child was in immediately before and immediately after it.
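One way to derive "before" and "after" columns like these is a per-child shift over episodes ordered by start date. The sketch below is an illustration under simplifying assumptions, not the library's code: the helper name `add_neighbouring_placements` is hypothetical, and it treats consecutive episodes as adjacent without checking for gaps between one episode's `DEC` and the next episode's `DECOM`.

```python
import pandas as pd

NOT_IN_CARE = "Not in care"  # label as it appears in the sample output


def add_neighbouring_placements(episodes: pd.DataFrame) -> pd.DataFrame:
    """Add placement_type_before / placement_type_after for each episode.

    Simplification: any gap between consecutive episodes is ignored.
    """
    df = episodes.copy()
    df["DECOM"] = pd.to_datetime(df["DECOM"])
    df = df.sort_values(["CHILD", "DECOM"]).reset_index(drop=True)

    by_child = df.groupby("CHILD")["placement_type"]
    # shift(1) looks at the previous episode of the same child, shift(-1) at the next.
    df["placement_type_before"] = by_child.shift(1).fillna(NOT_IN_CARE)
    df["placement_type_after"] = by_child.shift(-1).fillna(NOT_IN_CARE)
    return df


episodes = pd.DataFrame(
    {
        "CHILD": [994928, 290, 994928, 290, 994928],
        "DECOM": ["2020-06-17", "2016-11-19", "2017-01-11", "2018-10-01", "2021-11-25"],
        "placement_type": ["Foster", "Foster", "Other", "Foster", "Resi"],
    }
)
enriched = add_neighbouring_placements(episodes)
```

Grouping before shifting keeps one child's last placement from leaking into the next child's first row, and the first and last episodes of each child fall back to the "Not in care" label.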

In [1]:
from csdmpy import Config, DemandModellingDataContainer, fs_datastore

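# Combine the configuration and the data files into a single data container
config = Config()
datastore = fs_datastore("sample://v1.zip")
dc = DemandModellingDataContainer(datastore, config)

# Show the original episode columns alongside the derived age and placement columns
df_ev = dc.enriched_view
df_ev[[
    'CHILD', 'DOB', 'DECOM', 'DEC', 'PLACE',
    'age', 'end_age', 'age_bin', 'end_age_bin',
    'placement_type_before', 'placement_type', 'placement_type_after',
]]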

| | CHILD | DOB | DECOM | DEC | PLACE | age | end_age | age_bin | end_age_bin | placement_type_before | placement_type | placement_type_after |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 69 | 91 | 1999-10-19 | 2016-09-01 | 2017-10-13 | H5 | 16.871098 | 17.985434 | 16 to 18+ | 16 to 18+ | Not in care | Supported | Not in care |
| 360 | 290 | 2000-11-13 | 2016-11-19 | 2018-10-01 | U5 | 16.016866 | 17.881393 | 16 to 18+ | 16 to 18+ | Not in care | Foster | Foster |
| 361 | 290 | 2000-11-13 | 2018-10-01 | 2018-11-08 | U5 | 17.881393 | 17.985434 | 16 to 18+ | 16 to 18+ | Foster | Foster | Not in care |
| 507 | 519 | 2002-03-26 | 2017-03-09 | 2017-09-14 | U4 | 14.954550 | 15.472018 | 10 to 16 | 10 to 16 | Not in care | Foster | Foster |
| 508 | 519 | 2002-03-26 | 2017-09-14 | 2018-02-19 | U5 | 15.472018 | 15.904611 | 10 to 16 | 10 to 16 | Foster | Foster | Foster |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 820 | 994520 | 1999-12-18 | 2015-10-24 | 2017-12-12 | U2 | 15.849852 | 17.985434 | 10 to 16 | 16 to 18+ | Not in care | Foster | Not in care |
| 75 | 994928 | 2005-02-01 | 2017-01-11 | 2020-06-17 | R2 | 11.942832 | 15.373453 | 10 to 16 | 10 to 16 | Not in care | Other | Foster |
| 55 | 994928 | 2005-02-01 | 2020-06-17 | 2021-11-25 | U1 | 15.373453 | 16.813602 | 10 to 16 | 16 to 18+ | Other | Foster | Resi |
| 56 | 994928 | 2005-02-01 | 2021-11-25 | NaT | K2 | 16.813602 | | 16 to 18+ | | Foster | Resi | Not in care |


The categorical columns hold members of the configured Enum classes, so we can filter on those members directly:

In [2]:
df_ev[df_ev.age_bin == config.AgeBrackets.TEN_TO_SIXTEEN]['age'].describe()

count    1801.000000
mean       12.856966
std         1.692576
min        10.004381
25%        11.411675
50%        12.720403
75%        14.278283
max        15.997700
Name: age, dtype: float64

In [3]:
df_ev['placement_type'].value_counts()

Foster       1958
Resi          320
Other         318
Supported     231
Name: placement_type, dtype: int64

In [4]:
df_ev['placement_type'].apply(repr).value_counts()

<PlacementCategories.FOSTERING: Foster>       1958
<PlacementCategories.RESIDENTIAL: Resi>        320
<PlacementCategories.OTHER: Other>             318
<PlacementCategories.SUPPORTED: Supported>     231
Name: placement_type, dtype: int64
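The two outputs above differ only in how each Enum member is rendered: `value_counts()` shows the short label, while `repr` shows the full member. The sketch below shows one way an Enum could produce that behaviour. It is a hypothetical stand-in that reuses the member names and labels from the output, not the class that `csdmpy` actually generates from the configuration.

```python
from enum import Enum

import pandas as pd


class PlacementCategories(Enum):
    """Hypothetical stand-in for the configured placement Enum."""

    FOSTERING = "Foster"
    RESIDENTIAL = "Resi"
    SUPPORTED = "Supported"
    OTHER = "Other"

    def __str__(self) -> str:
        # Short label, used when the member is displayed as text.
        return self.value

    def __repr__(self) -> str:
        # Full form, matching the style of the repr output shown above.
        return f"<{type(self).__name__}.{self.name}: {self.value}>"


placements = pd.Series(
    [
        PlacementCategories.FOSTERING,
        PlacementCategories.FOSTERING,
        PlacementCategories.RESIDENTIAL,
    ]
)

# Comparing against a member filters without relying on string labels.
foster_only = placements[placements == PlacementCategories.FOSTERING]
```

Filtering on Enum members rather than on strings such as `'Foster'` means a renamed label in the configuration does not silently break a comparison.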