# Advanced Metrics

We highly recommend completing the casesExample notebook before starting this one. If you already have a good grasp of how to evaluate Metrics, feel free to continue.

In this notebook, we'll see how the C3 Metrics system can be extended on the fly to include new metrics defined in terms of the old. We'll also explore how options such as `interval` and aggregation functions such as `AVG` and `MIN` change the output.

This notebook can be run on any tag containing an image of the Covid-19 Datalake.

We've developed this notebook to be more accessible to DTI researchers so they can learn about Metrics without having to learn details surrounding provisioning and data integration. This should give you an accurate flavor of the capabilities of C3 Metrics which you can immediately apply to your own datasets once you master the provisioning and data integration aspects of C3.

## Setup

Here we load the required modules and, if needed, establish a connection to a running C3 session.

In [1]:
import pandas as pd

In [None]:
try:
    # Check whether the c3 object is defined
    c3
except NameError:
    # Connect to a c3 cluster and create the c3 object
    from c3python import get_c3
    c3 = get_c3('<vanity_url>', '<tenant>', '<tag>')

In [4]:
try:
    c3
except NameError:
    from c3python import get_c3
    c3 = get_c3('https://dti-mkrafczyk.c3dti.ai', 'dti', 'mkrafczyk')


## Simple vs Compound Metrics

Metrics in C3 fall into two categories: `SimpleMetric`s and `CompoundMetric`s. A Simple Metric tells the C3 system how to evaluate a given metric on a given Type. Each SimpleMetric therefore carries information defining a source type, a 'path' showing how to navigate to the data inside that type, and an expression defining what the metric is.

Compound Metrics are an easy way to create new metrics in terms of already existing metrics. Perhaps perversely, they're easier to use than SimpleMetrics, because you don't have to define a specific source type for a given Compound Metric. At evaluation time, C3 checks whether all the SimpleMetrics the Compound Metric requires are available, and fails with an error if any are missing.
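As a rough mental model of that availability check, here is a minimal sketch in plain Python. It is not C3's implementation: the registry, the helper names `referenced_metrics` and `check_compound`, and the `Population` metric are hypothetical, and the regular expression would also pick up function names in a real expression. The metric names in the registry are borrowed from the JHU metrics listed later in this notebook.

```python
import re

# Toy registry of SimpleMetric names assumed to exist on a source type.
AVAILABLE = {"ConfirmedCasesJHU", "ConfirmedDeathsJHU", "ConfirmedRecoveriesJHU"}

def referenced_metrics(expression):
    """Return every identifier that appears in a compound expression."""
    return set(re.findall(r"[A-Za-z_]\w*", expression))

def check_compound(expression, available=AVAILABLE):
    """Raise ValueError if the expression uses a metric that is not available."""
    missing = sorted(referenced_metrics(expression) - set(available))
    if missing:
        raise ValueError("missing metrics: " + ", ".join(missing))
    return True

# Both metrics are in the registry, so the check passes.
check_compound("ConfirmedDeathsJHU / ConfirmedCasesJHU")

# 'Population' is a hypothetical name that is not in the registry.
try:
    check_compound("ConfirmedDeathsJHU / Population")
except ValueError as err:
    print(err)
```

The point of the sketch is only that a Compound Metric names its inputs and leaves the source type to them, which is why a missing input surfaces at evaluation time rather than at definition time.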

Normally, `SimpleMetric`s and `CompoundMetric`s are defined as part of a C3 package you provision, but new metrics can also be defined on the fly. Once defined, you can use the `evalMetricsWithMetadata` method to evaluate such metrics.
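The sketch below shows one way to organise the inputs for such an evaluation as plain dictionaries, so it runs without a C3 session. Everything in it is an assumption to verify with `help(c3.OutbreakLocation.evalMetricsWithMetadata)`: the helper names, the metric name `CaseFatalityRate`, the dates, the field names (`ids`, `expressions`, `start`, `end`, `interval`), and the call shape shown in the trailing comment.

```python
def compound_metric_fields(name, expression):
    """Fields for a hypothetical on-the-fly CompoundMetric."""
    return {"id": name, "name": name, "expression": expression}

def eval_spec_fields(ids, expressions, start, end, interval="DAY"):
    """Fields for an evaluation spec (field names are assumptions)."""
    return {
        "ids": list(ids),
        "expressions": list(expressions),
        "start": start,
        "end": end,
        "interval": interval,
    }

# Hypothetical metric defined in terms of two existing metrics.
cfr = compound_metric_fields(
    "CaseFatalityRate", "ConfirmedDeathsJHU / ConfirmedCasesJHU"
)
spec = eval_spec_fields(
    ids=["Champaign_Illinois_UnitedStates"],
    expressions=[cfr["name"]],
    start="2020-03-01",   # illustrative dates
    end="2020-07-01",
)
print(spec)

# With a live session, the ASSUMED call shape would be along these lines:
#   metric = c3.CompoundMetric(**cfr)
#   result = c3.OutbreakLocation.evalMetricsWithMetadata(
#       spec=c3.EvalMetricsSpec(**spec), overrideMetrics=[metric])
```

Keeping the fields in dictionaries makes it easy to change `interval` or the list of expressions and re-run the evaluation.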

## Check what's already defined

The C3 type `MetricEvaluatable` defines the C3 Metrics framework. Usefully, it includes the function `listMetrics`. Let's have a look at that now on the OutbreakLocation type.

In [5]:
outbreaklocation_metrics = pd.DataFrame(c3.OutbreakLocation.listMetrics().toJson())
outbreaklocation_metrics

index,type,name,expression,meta,id,version,srcType,path,tsDecl,unit,cache
0,SimpleMetric,ARITreatment_PercentUnder5,interpolate(sum(sum(normalized.data.quantity))...,"{'type': 'Meta', 'tenantTagId': 7, 'tenant': '...",ARITreatment_PercentUnder5_OutbreakLocation,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",pointMeasurements.(measurementType == 'SH.STA....,,,
1,SimpleMetric,ATMs,interpolate(sum(sum(normalized.data.quantity))...,"{'type': 'Meta', 'tenantTagId': 7, 'tenant': '...",ATMs_OutbreakLocation,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",pointMeasurements.(measurementType == 'FB.ATM....,,,
2,SimpleMetric,AgeDependencyRatio,interpolate(sum(sum(normalized.data.quantity))...,"{'type': 'Meta', 'tenantTagId': 7, 'tenant': '...",AgeDependencyRatio_OutbreakLocation,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",pointMeasurements.(measurementType == 'SP.POP....,,,
3,SimpleMetric,AgeDependencyRatio_Old,interpolate(sum(sum(normalized.data.quantity))...,"{'type': 'Meta', 'tenantTagId': 7, 'tenant': '...",AgeDependencyRatio_Old_OutbreakLocation,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",pointMeasurements.(measurementType == 'SP.POP....,,,
4,SimpleMetric,AgeDependencyRatio_Young,interpolate(sum(sum(normalized.data.quantity))...,"{'type': 'Meta', 'tenantTagId': 7, 'tenant': '...",AgeDependencyRatio_Young_OutbreakLocation,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",pointMeasurements.(measurementType == 'SP.POP....,,,
...,...,...,...,...,...,...,...,...,...,...,...
2754,SimpleMetric,WomenMarriedby15,interpolate(sum(sum(normalized.data.quantity))...,"{'type': 'Meta', 'tenantTagId': 7, 'tenant': '...",WomenMarriedby15_OutbreakLocation,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",pointMeasurements.(measurementType == 'SP.M15....,,,
2755,SimpleMetric,WomenMarriedby18,interpolate(sum(sum(normalized.data.quantity))...,"{'type': 'Meta', 'tenantTagId': 7, 'tenant': '...",WomenMarriedby18_OutbreakLocation,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",pointMeasurements.(measurementType == 'SP.M18....,,,
2756,SimpleMetric,newHIVInfection,interpolate(sum(sum(normalized.data.quantity))...,"{'type': 'Meta', 'tenantTagId': 7, 'tenant': '...",newHIVInfection_OutbreakLocation,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",pointMeasurements.(measurementType == 'SH.HIV....,,,
2757,SimpleMetric,newHIVInfections_0_14,interpolate(sum(sum(normalized.data.quantity))...,"{'type': 'Meta', 'tenantTagId': 7, 'tenant': '...",newHIVInfections_0_14_OutbreakLocation,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",pointMeasurements.(measurementType == 'SH.HIV....,,,


At the time of writing, there are ~2800 metrics, which is a lot! Let's evaluate a few. We focus on a couple of interesting metrics here, but feel free to explore the list above and try others yourself.
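With this many metrics, a small keyword search helps when exploring. The sketch below works on a plain list of records; the helper name `find_metrics` is hypothetical, and the sample records contain only names copied from the table above. With a live session, the records would presumably be the result of `c3.OutbreakLocation.listMetrics().toJson()` from the earlier cell.

```python
def find_metrics(records, keyword):
    """Return the sorted names of metrics whose name contains keyword (case-insensitive)."""
    keyword = keyword.lower()
    return sorted(r["name"] for r in records if keyword in r["name"].lower())

# A few metric names taken from the listMetrics output above.
sample = [
    {"name": "ATMs"},
    {"name": "newHIVInfection"},
    {"name": "AgeDependencyRatio"},
    {"name": "newHIVInfections_0_14"},
]

print(find_metrics(sample, "hiv"))
# ['newHIVInfection', 'newHIVInfections_0_14']
```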

In [6]:
pd.DataFrame(c3.SimpleMetric.fetch({'filter': 'srcType.typeName == "OutbreakLocation"'}).objs.toJson())

index,type,name,expression,meta,id,version,srcType,path,tsDecl,unit,cache
0,SimpleMetric,ARITreatment_PercentUnder5,interpolate(sum(sum(normalized.data.quantity))...,"{'type': 'Meta', 'tenantTagId': 9, 'tenant': '...",ARITreatment_PercentUnder5_OutbreakLocation,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",pointMeasurements.(measurementType == 'SH.STA....,,,
1,SimpleMetric,ATMs,interpolate(sum(sum(normalized.data.quantity))...,"{'type': 'Meta', 'tenantTagId': 9, 'tenant': '...",ATMs_OutbreakLocation,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",pointMeasurements.(measurementType == 'FB.ATM....,,,
2,SimpleMetric,AgeDependencyRatio_Old,interpolate(sum(sum(normalized.data.quantity))...,"{'type': 'Meta', 'tenantTagId': 9, 'tenant': '...",AgeDependencyRatio_Old_OutbreakLocation,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",pointMeasurements.(measurementType == 'SP.POP....,,,
3,SimpleMetric,AgeDependencyRatio,interpolate(sum(sum(normalized.data.quantity))...,"{'type': 'Meta', 'tenantTagId': 9, 'tenant': '...",AgeDependencyRatio_OutbreakLocation,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",pointMeasurements.(measurementType == 'SP.POP....,,,
4,SimpleMetric,AgeDependencyRatio_Young,interpolate(sum(sum(normalized.data.quantity))...,"{'type': 'Meta', 'tenantTagId': 9, 'tenant': '...",AgeDependencyRatio_Young_OutbreakLocation,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",pointMeasurements.(measurementType == 'SP.POP....,,,
...,...,...,...,...,...,...,...,...,...,...,...
1995,SimpleMetric,PlaceIQ_DeviceExposure_Education4_Adjusted,interpolate(sum(sum(normalized.data.quantity))...,"{'type': 'Meta', 'tenantTagId': 9, 'tenant': '...",PlaceIQ_DeviceExposure_Education4_Adjusted_Out...,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",pointMeasurements.(measurementType == 'adjuste...,,,
1996,SimpleMetric,PlaceIQ_DeviceExposure_Education4,interpolate(sum(sum(normalized.data.quantity))...,"{'type': 'Meta', 'tenantTagId': 9, 'tenant': '...",PlaceIQ_DeviceExposure_Education4_OutbreakLoca...,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",pointMeasurements.(measurementType == 'device ...,,,
1997,SimpleMetric,PlaceIQ_DeviceExposure_Income1_Adjusted,interpolate(sum(sum(normalized.data.quantity))...,"{'type': 'Meta', 'tenantTagId': 9, 'tenant': '...",PlaceIQ_DeviceExposure_Income1_Adjusted_Outbre...,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",pointMeasurements.(measurementType == 'adjuste...,,,
1998,SimpleMetric,PlaceIQ_DeviceExposure_Income1,interpolate(sum(sum(normalized.data.quantity))...,"{'type': 'Meta', 'tenantTagId': 9, 'tenant': '...",PlaceIQ_DeviceExposure_Income1_OutbreakLocation,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",pointMeasurements.(measurementType == 'device ...,,,


In [7]:
pd.DataFrame(c3.SimpleMetric.fetch({'filter': '(srcType.typeName == "OutbreakLocation") && (contains(name,"HIV"))'}).objs.toJson())

index,type,name,expression,meta,id,version,srcType,path
0,SimpleMetric,AntiretroviralTherapyCoverage_PercentwithHIV,interpolate(sum(sum(normalized.data.quantity))...,"{'type': 'Meta', 'tenantTagId': 9, 'tenant': '...",AntiretroviralTherapyCoverage_PercentwithHIV_O...,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",pointMeasurements.(measurementType == 'SH.HIV....
1,SimpleMetric,HIVCases_15_49,interpolate(sum(sum(normalized.data.quantity))...,"{'type': 'Meta', 'tenantTagId': 9, 'tenant': '...",HIVCases_15_49_OutbreakLocation,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",pointMeasurements.(measurementType == 'SH.HIV....
2,SimpleMetric,HIVPercent_15_49,interpolate(sum(sum(normalized.data.quantity))...,"{'type': 'Meta', 'tenantTagId': 9, 'tenant': '...",HIVPercent_15_49_OutbreakLocation,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",pointMeasurements.(measurementType == 'SH.DYN....
3,SimpleMetric,HIV_0_14,interpolate(sum(sum(normalized.data.quantity))...,"{'type': 'Meta', 'tenantTagId': 9, 'tenant': '...",HIV_0_14_OutbreakLocation,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",pointMeasurements.(measurementType == 'SH.HIV....
4,SimpleMetric,HIV_15andOver_Female,interpolate(sum(sum(normalized.data.quantity))...,"{'type': 'Meta', 'tenantTagId': 9, 'tenant': '...",HIV_15andOver_Female_OutbreakLocation,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",pointMeasurements.(measurementType == 'SH.DYN....
5,SimpleMetric,HIV_Females_Percent_15_24,interpolate(sum(sum(normalized.data.quantity))...,"{'type': 'Meta', 'tenantTagId': 9, 'tenant': '...",HIV_Females_Percent_15_24_OutbreakLocation,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",pointMeasurements.(measurementType == 'SH.HIV....
6,SimpleMetric,HIV_Males_Precent_15_24,interpolate(sum(sum(normalized.data.quantity))...,"{'type': 'Meta', 'tenantTagId': 9, 'tenant': '...",HIV_Males_Precent_15_24_OutbreakLocation,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",pointMeasurements.(measurementType == 'SH.HIV....
7,SimpleMetric,PMTCT_AntiretroviralTherapyCoverage_PercentPre...,interpolate(sum(sum(normalized.data.quantity))...,"{'type': 'Meta', 'tenantTagId': 9, 'tenant': '...",PMTCT_AntiretroviralTherapyCoverage_PercentPre...,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",pointMeasurements.(measurementType == 'SH.HIV....
8,SimpleMetric,newHIVInfection,interpolate(sum(sum(normalized.data.quantity))...,"{'type': 'Meta', 'tenantTagId': 9, 'tenant': '...",newHIVInfection_OutbreakLocation,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",pointMeasurements.(measurementType == 'SH.HIV....
9,SimpleMetric,newHIVInfections_0_14,interpolate(sum(sum(normalized.data.quantity))...,"{'type': 'Meta', 'tenantTagId': 9, 'tenant': '...",newHIVInfections_0_14_OutbreakLocation,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",pointMeasurements.(measurementType == 'SH.HIV....
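The filter strings passed to `c3.SimpleMetric.fetch` in the cells above follow a simple pattern, so they can be assembled by a small helper. This is only a convenience sketch: `metric_filter` and `_quote` are hypothetical names, and the helper covers just the two clause forms used in this notebook.

```python
def _quote(value):
    """Wrap a value in double quotes; embedded double quotes are out of scope here."""
    if '"' in value:
        raise ValueError("double quotes are not supported in this sketch")
    return f'"{value}"'

def metric_filter(src_type=None, name_contains=None):
    """Build a fetch filter string from optional source type and name keyword."""
    clauses = []
    if src_type is not None:
        clauses.append(f"srcType.typeName == {_quote(src_type)}")
    if name_contains is not None:
        clauses.append(f"contains(name, {_quote(name_contains)})")
    if len(clauses) > 1:
        # Parenthesise each clause before joining, as in the cell above.
        return " && ".join(f"({clause})" for clause in clauses)
    return "".join(clauses)

print(metric_filter("OutbreakLocation", "HIV"))
# (srcType.typeName == "OutbreakLocation") && (contains(name, "HIV"))
```

The resulting string would be passed as `{'filter': metric_filter(...)}`, exactly as the literal strings are passed in the cells above.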


In [17]:
pd.DataFrame(c3.SimpleMetric.fetch({'filter': 'contains(name, "JHU")'}).objs.toJson())

index,type,name,expression,meta,id,version,srcType,path,tsDecl
0,SimpleMetric,CaseFatalityRatioJHU,interpolate(sum(sum(normalized.data.quantity))...,"{'type': 'Meta', 'tenantTagId': 9, 'tenant': '...",CaseFatalityRatioJHU_OutbreakLocation,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",denormDescendants.to.pointMeasurements.(measur...,
1,SimpleMetric,ConfirmedCasesJHURaw,,"{'type': 'Meta', 'tenantTagId': 9, 'tenant': '...",ConfirmedCasesJHURaw_OutbreakLocation,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",denormDescendants.to.pointMeasurements.(measur...,"{'type': 'TSDecl', 'data': 'data', 'treatment'..."
2,SimpleMetric,ConfirmedCasesJHU,interpolate(sum(sum(normalized.data.quantity))...,"{'type': 'Meta', 'tenantTagId': 9, 'tenant': '...",ConfirmedCasesJHU_OutbreakLocation,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",denormDescendants.to.pointMeasurements.(measur...,
3,SimpleMetric,ConfirmedDeathsJHURaw,,"{'type': 'Meta', 'tenantTagId': 9, 'tenant': '...",ConfirmedDeathsJHURaw_OutbreakLocation,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",denormDescendants.to.pointMeasurements.(measur...,"{'type': 'TSDecl', 'data': 'data', 'treatment'..."
4,SimpleMetric,ConfirmedDeathsJHU,interpolate(sum(sum(normalized.data.quantity))...,"{'type': 'Meta', 'tenantTagId': 9, 'tenant': '...",ConfirmedDeathsJHU_OutbreakLocation,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",denormDescendants.to.pointMeasurements.(measur...,
5,SimpleMetric,ConfirmedRecoveriesJHURaw,,"{'type': 'Meta', 'tenantTagId': 9, 'tenant': '...",ConfirmedRecoveriesJHURaw_OutbreakLocation,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",denormDescendants.to.pointMeasurements.(measur...,"{'type': 'TSDecl', 'data': 'data', 'treatment'..."
6,SimpleMetric,ConfirmedRecoveriesJHU,interpolate(sum(sum(normalized.data.quantity))...,"{'type': 'Meta', 'tenantTagId': 9, 'tenant': '...",ConfirmedRecoveriesJHU_OutbreakLocation,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",denormDescendants.to.pointMeasurements.(measur...,
7,SimpleMetric,IncidenceRateJHU,interpolate(sum(sum(normalized.data.quantity))...,"{'type': 'Meta', 'tenantTagId': 9, 'tenant': '...",IncidenceRateJHU_OutbreakLocation,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",denormDescendants.to.pointMeasurements.(measur...,
8,SimpleMetric,JHU_ConfirmedCasesInterpolated,,"{'type': 'Meta', 'tenantTagId': 9, 'tenant': '...",JHU_ConfirmedCasesInterpolated_OutbreakLocation,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",aggregateMeasurements.(measurementType == 'nor...,"{'type': 'TSDecl', 'data': 'data', 'treatment'..."
9,SimpleMetric,JHU_ConfirmedCases,,"{'type': 'Meta', 'tenantTagId': 9, 'tenant': '...",JHU_ConfirmedCases_OutbreakLocation,1,"{'type': 'TypeRef', 'typeName': 'OutbreakLocat...",aggregateMeasurements.(measurementType == 'con...,"{'type': 'TSDecl', 'data': 'data', 'treatment'..."


In [22]:
jhu_metric = c3.SimpleMetric.get('JHU_ConfirmedCases_OutbreakLocation')

In [23]:
jhu_metric

c3.SimpleMetric(
 name='JHU_ConfirmedCases',
 meta=c3.Meta(
        tenantTagId=9,
        tenant='dti',
        tag='mkrafczyk',
        created=datetime.datetime(2020, 7, 22, 0, 2, 25, tzinfo=datetime.timezone.utc),
        createdBy='provisioner',
        updated=datetime.datetime(2020, 7, 22, 0, 2, 25, tzinfo=datetime.timezone.utc),
        updatedBy='provisioner',
        timestamp=datetime.datetime(2020, 7, 22, 0, 2, 25, tzinfo=datetime.timezone.utc),
        fetchInclude='[]',
        fetchType='SimpleMetric'),
 id='JHU_ConfirmedCases_OutbreakLocation',
 version=1,
 srcType=c3.TypeRef(typeName='OutbreakLocation'),
 path="aggregateMeasurements.(measurementType == 'confirmed' && origin == "
       "'Johns Hopkins University')",
 tsDecl=c3.TSDecl(
          data='data',
          treatment='PREVIOUS',
          start='start',
          value='value'))
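The `tsDecl` above declares `treatment='PREVIOUS'`, and the same keyword appears in the `interpolate(...)` expressions of other metrics in this notebook. A plausible reading, stated here as an assumption rather than as documented C3 behaviour, is that a time slot with no observation takes the most recent observed value. The sketch below illustrates that reading on a plain list, with `None` marking a missing slot; `fill_previous` is a hypothetical helper.

```python
def fill_previous(series):
    """Replace each None with the last observed value before it.

    Leading None values stay None because nothing has been observed yet.
    """
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled

print(fill_previous([None, 5, None, None, 7, None]))
# [None, 5, 5, 5, 7, 7]
```

For cumulative counts such as confirmed cases, carrying the last value forward is a natural choice, since the count does not drop back to zero on days without a report.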

In [33]:
c3.PointPhysicalMeasurementSeries.fetch({'filter': 'contains(asset,"Champaign_Illinois_UnitedStates") && measurementType == "SH.HIV.INCD.TL" && origin == "World Bank"'})

c3.FetchResult<PointPhysicalMeasurementSeries>(count=0, hasMore=False)

In [28]:
c3.AggregateOutbreakLocationMeasurementSeries.fetch({'filter': 'measurementType == "confirmed" && origin == "Johns Hopkins University"'})

c3.FetchResult<AggregateOutbreakLocationMeasurementSeries>(
 count=0,
 hasMore=False)

In [25]:
help(c3.AggregateOutbreakLocationMeasurementSeries)

In [8]:
newHIVInfectionMetric = c3.SimpleMetric.get('newHIVInfection_OutbreakLocation')

In [9]:
help(c3.ExpressionEngineFunction)

Supported values for AggregationFunction are SUM,AVG,MIN,MAX,MEDIAN,VARIANCE,STDDEV.

Supported values for aggFunc are SUM,AVG,MIN,MAX,MEDIAN,VARIANCE,STDDEV.

Supported values for aggFunc are SUM,AVG,MIN,MAX,MEDIAN.
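To get a feel for how the choice of aggregation function changes the output when data is rolled up to a coarser interval, here is a toy illustration in plain Python. It does not model C3's engine: `aggregate` is a hypothetical helper, the buckets are fixed-size slices instead of calendar intervals, and only five of the listed functions are covered.

```python
import statistics

# Map the aggregation names listed above to plain Python functions.
AGG_FUNCS = {
    "SUM": sum,
    "AVG": statistics.mean,
    "MIN": min,
    "MAX": max,
    "MEDIAN": statistics.median,
}

def aggregate(values, bucket_size, agg_func):
    """Aggregate consecutive buckets of bucket_size values with the named function."""
    func = AGG_FUNCS[agg_func]
    return [
        func(values[i:i + bucket_size])
        for i in range(0, len(values), bucket_size)
    ]

# Six "daily" values rolled up into two three-day buckets.
daily = [1, 3, 2, 6, 4, 8]
for name in ("SUM", "AVG", "MIN", "MAX", "MEDIAN"):
    print(name, aggregate(daily, 3, name))
```

The same six values give `[6, 18]` under `SUM` but `[1, 4]` under `MIN`, so the aggregation function should match the meaning of the metric: sums for counts of new events, averages or extrema for rates and levels.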


In [10]:
newHIVInfectionMetric.expression

"interpolate(sum(sum(normalized.data.quantity)), 'PREVIOUS', 'MISSING')"

In [11]:
newHIVInfectionMetric.path

"pointMeasurements.(measurementType == 'SH.HIV.INCD.TL' && origin == 'World Bank')"

In [14]:
c3.PointPhysicalMeasurementSeries.fetch({'filter': 'measurementType == "SH.HIV.INCD.TL" && origin == "World Bank" && exists(earliest)'})

c3.FetchResult<PointPhysicalMeasurementSeries>(count=0, hasMore=False)

In [12]:
countPointsMetric = c3.SimpleMetric(name='newHIVInfectionCount',
                                    expression='',
                                    srcType='OutbreakLocation',
                                    path=newHIVInfectionMetric.path)

In [33]:
help(c3.PointPhysicalMeasurementSeries)

In [None]:
countMetric = c3.SimpleMetric(name='HIVCount',
                              expression='sum(count(data.quantity))',
                              srcType='OutbreakLocation',
                              path=newHIVInfectionMetric.path)

In [36]:
help(c3.OutbreakLocation)

In [34]:
print(c3.OutbreakLocation.fetch({'filter': 'id == "Champaign_Illinois_UnitedStates"',
                                 'include': 'pointMeasurements'}))

c3.FetchResult<OutbreakLocation>(
 objs=c3.Arry<OutbreakLocation>([c3.OutbreakLocation(
         typeIdent='EP_LOC',
         id='Champaign_Illinois_UnitedStates',
         meta=c3.Meta(
                fetchInclude='[{pointMeasurements:[id]},id,version,typeIdent]',
                fetchType='OutbreakLocation'),
         version=5505036,
         pointMeasurements=c3.Arry<PointPhysicalMeasurementSeries>([c3.PointPhysicalMeasurementSeries(
                              typeIdent='MS:BPMS:PPMS',
                              id='Champaign_Illinois_UnitedStates_ActiveListingCount',
                              meta=c3.Meta(
                                     fetchInclude='[id,asset,version,typeIdent]',
                                     fetchType='PointPhysicalMeasurementSeries'),
                              version=1,
                              asset=c3.PhysicalAsset(
                                      id='Champaign_Illinois_UnitedStates')),
                             c3.P

In [35]:
c3.OutbreakLocation.fetch()

c3.FetchResult<OutbreakLocation>(
 objs=c3.Arry<OutbreakLocation>([c3.OutbreakLocation(
         typeIdent='EP_LOC',
         id='-_AndhraPradesh_India',
         name='-',
         meta=c3.Meta(
                tenantTagId=7,
                tenant='covid',
                tag='dev',
                created=datetime.datetime(2020, 7, 18, 0, 1, tzinfo=datetime.timezone.utc),
                createdBy='dataloader',
                updated=datetime.datetime(2020, 7, 18, 0, 1, tzinfo=datetime.timezone.utc),
                updatedBy='dataloader',
                timestamp=datetime.datetime(2020, 7, 20, 7, 54, 41, tzinfo=datetime.timezone.utc),
                sourceFile='CanonicalIndianLineListRecordHistoric.csv',
                fetchInclude='[]',
                fetchType='OutbreakLocation'),
         version=65537,
         location=c3.PhysicalAssetLocation(
                    timestamp=datetime.datetime(2020, 4, 5, 0, 0, tzinfo=datetime.timezone.utc),
                    value=c3.Loc