# Mapping HER2 opportunity in APACs

This notebook maps the HER2 opportunity in APACs.

## Objectives

1. Identify and catalog procedures associated with HER2.
2. Search procedures in APAC-AQ and APAC-AM.
3. Create metrics for:
    - Total number of APACs
    - Total number of patients (CNS)
    - Total value of APACs

## Results

No HER2 procedures were found in APAC-AM.

In APAC-AQ:

| Metric | Value |
| --- | --- |
| Total number of APACs | 121,692 |
| Total number of patients | 18,916 |
| Total value of APACs | R$ 80,899,500.20 |

CIDs *(as a sanity check)*:

| cid10_main_descricao | n |
| --- | --- |
| Neoplasia maligna da mama | 121,692 |

In [28]:
# --- Imports ---
import pandas as pd

## 1. Identify and catalog procedures associated with HER2

In [2]:
# Load the SIGTAP procedure catalog (semicolon-separated file)
procedures = pd.read_csv('../data/external/procedimentos-sigtap.csv', sep=';')
procedures.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5560 entries, 0 to 5559
Data columns (total 2 columns):
 #   Column                            Non-Null Count  Dtype 
---  ------                            --------------  ----- 
 0   procedimento_principal            5560 non-null   int64 
 1   procedimento_principal_descricao  5560 non-null   object
dtypes: int64(1), object(1)
memory usage: 87.0+ KB


In [13]:
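# Search for the literal, case-sensitive string "HER-2" in the procedure descriptions
her2_procedures = procedures[procedures['procedimento_principal_descricao'].str.contains('HER-2', regex=False)]
print(her2_procedures)

# Keep only the procedure codes; this list is used to filter the APAC files below
her2_procedures = her2_procedures['procedimento_principal'].to_list()
print(her2_procedures)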

      procedimento_principal  \
671                202100049   
1859               304020419   
1860               304020427   
1861               304020435   
1862               304020443   
1906               304040185   
1934               304050261   
1935               304050270   
1936               304050288   
1937               304050296   
1938               304050300   
1939               304050318   

                       procedimento_principal_descricao  
671                 QUANTIFICACAO/AMPLIFICACAO DO HER-2  
1859  POLIQUIMIOTERAPIA DO CARCINOMA DE MAMA HER-2 P...  
1860  MONOQUIMIOTERAPIA DO CARCINOMA DE MAMA HER-2 P...  
1861  POLIQUIMIOTERAPIA COM DUPLO ANTI HER-2 DO CARC...  
1862  QUIMIOTERAPIA COM DUPLO ANTI-HER-2 DO CARCINOM...  
1906  POLIQUIMIOTERAPIA DO CARCINOMA DE MAMA HER-2 P...  
1934  POLIQUIMIOTERAPIA DO CARCINOMA DE MAMA HER-2 P...  
1935  POLIQUIMIOTERAPIA DO CARCINOMA DE MAMA HER-2 P...  
1936  POLIQUIMIOTERAPIA DO CARCINOMA DE MAMA HER-2 P...  
193
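
The search above matches only the exact string `HER-2`. A minimal sketch of a more tolerant search follows, assuming the catalog could also spell the marker as `HER2` or `HER 2` or contain missing descriptions (neither is confirmed for this file). The helper name `find_her2_procedures` and the toy catalog are hypothetical and use made-up codes.

```python
import pandas as pd


def find_her2_procedures(
    procedures: pd.DataFrame,
    description_col: str = "procedimento_principal_descricao",
) -> pd.DataFrame:
    """Return the rows whose description mentions HER2, HER-2 or HER 2 (any case)."""
    # Optional whitespace or hyphen between "HER" and "2".
    # na=False treats missing descriptions as non-matches instead of propagating NaN.
    mask = procedures[description_col].str.contains(
        r"HER[\s-]?2", case=False, regex=True, na=False
    )
    return procedures[mask]


# Toy catalog for illustration only (codes are made up, not real SIGTAP entries).
toy_catalog = pd.DataFrame(
    {
        "procedimento_principal": [1, 2, 3, 4],
        "procedimento_principal_descricao": [
            "QUANTIFICACAO/AMPLIFICACAO DO HER-2",
            "QUIMIOTERAPIA HER2 POSITIVO",
            "HEMODIALISE",
            None,
        ],
    }
)

her2_codes = find_her2_procedures(toy_catalog)["procedimento_principal"].to_list()
print(her2_codes)
```

On the real catalog, comparing the result of this pattern with the literal `HER-2` search would show whether any spelling variants were missed.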

## 2. Search procedures in APAC-AQ and APAC-AM

### APAC-AQ

In [6]:
apac_aq = pd.read_csv('../data/processed/apac-quimio-2022.csv')

In [7]:
apac_aq.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3985483 entries, 0 to 3985482
Data columns (total 83 columns):
 #   Column                          Dtype  
---  ------                          -----  
 0   data_movimento                  int64  
 1   tipo_gestao                     object 
 2   codigo_gestao                   int64  
 3   estabelecimento_id              int64  
 4   numero_apac                     int64  
 5   data_competencia                int64  
 6   procedimento_principal          float64
 7   valor_aprovado_total            float64
 8   municipio_estabelecimento       int64  
 9   tipo_estabelecimento            int64  
 10  tipo_prestador                  int64  
 11  modalidade_estabelecimento      object 
 12  cnpj_estabelecimento            int64  
 13  cnpj_mantenedora                int64  
 14  cns_paciente                    object 
 15  codigo_idade                    int64  
 16  idade                           int64  
 17  sexo                       

In [9]:
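# Filter by HER-2 procedure codes; .copy() makes the subset independent of the full table
apac_aq_her2 = apac_aq[apac_aq['procedimento_principal'].isin(her2_procedures)].copy()
apac_aq_her2.info()

# Free the memory held by the full table
del apac_aq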

<class 'pandas.core.frame.DataFrame'>
Index: 121692 entries, 6 to 3985481
Data columns (total 83 columns):
 #   Column                          Non-Null Count   Dtype  
---  ------                          --------------   -----  
 0   data_movimento                  121692 non-null  int64  
 1   tipo_gestao                     121692 non-null  object 
 2   codigo_gestao                   121692 non-null  int64  
 3   estabelecimento_id              121692 non-null  int64  
 4   numero_apac                     121692 non-null  int64  
 5   data_competencia                121692 non-null  int64  
 6   procedimento_principal          121692 non-null  float64
 7   valor_aprovado_total            121692 non-null  float64
 8   municipio_estabelecimento       121692 non-null  int64  
 9   tipo_estabelecimento            121692 non-null  int64  
 10  tipo_prestador                  121692 non-null  int64  
 11  modalidade_estabelecimento      121692 non-null  object 
 12  cnpj_estabelecimento
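
The `info()` output shows `procedimento_principal` stored as `float64` in the APAC file, while the SIGTAP catalog stores it as `int64`. `isin` compares by value, and the 121,692 matches suggest the filter works as is. The sketch below shows one way to make the comparison explicit by converting the codes to pandas' nullable integer type first. The helper `normalize_codes` is hypothetical, and the toy frame mixes two codes from the catalog output with one made-up code.

```python
import pandas as pd


def normalize_codes(codes: pd.Series) -> pd.Series:
    """Convert procedure codes to nullable integers, keeping missing values as <NA>."""
    # errors="coerce" turns unparseable values into NaN before the integer cast.
    return pd.to_numeric(codes, errors="coerce").astype("Int64")


# Toy APAC extract: the None forces float64, as in the real file.
toy_apac = pd.DataFrame(
    {"procedimento_principal": [304020419.0, None, 304050261.0, 101010010.0]}
)
toy_her2_codes = [304020419, 304050261]  # two codes taken from the catalog output

toy_apac["procedimento_principal"] = normalize_codes(toy_apac["procedimento_principal"])
matches = toy_apac[toy_apac["procedimento_principal"].isin(toy_her2_codes)]
print(matches["procedimento_principal"].to_list())
```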

### APAC-AM

In [10]:
apac_am = pd.read_csv('../data/processed/apac-am-2022.csv')
apac_am.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27162604 entries, 0 to 27162603
Data columns (total 27 columns):
 #   Column                            Dtype  
---  ------                            -----  
 0   Unnamed: 0                        int64  
 1   numero_apac                       int64  
 2   procedimento_principal            float64
 3   valor_total_apac                  float64
 4   codigo_uf_municipio               int64  
 5   cns_paciente                      object 
 6   idade                             int64  
 7   sexo                              object 
 8   raca_cor                          int64  
 9   motivo_saida_permanencia          int64  
 10  data_ocorrencia                   float64
 11  cid_principal                     object 
 12  peso                              int64  
 13  altura                            int64  
 14  indicador_transplante             object 
 15  quantidade_transplantes           int64  
 16  indicador_gestante                

In [11]:
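# Filter by HER-2 procedure codes
apac_am_her2 = apac_am[apac_am['procedimento_principal'].isin(her2_procedures)].copy()
# Print explicitly: a bare expression is only displayed when it is the last line of the cell
print(apac_am_her2.shape)

# Free the memory held by the full table
del apac_am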
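
The APAC-AM file has about 27 million rows, and the notebook loads it in full before filtering and deleting it. A possible alternative, sketched here under the assumption that only the filtered rows are needed, is to stream the CSV in chunks and keep just the matches. The helper `filter_csv_by_codes` and its parameters are hypothetical; the demo reads a tiny in-memory CSV rather than the real file.

```python
import io

import pandas as pd


def filter_csv_by_codes(path_or_buffer, codes, column="procedimento_principal",
                        chunksize=1_000_000, **read_csv_kwargs):
    """Stream a CSV in chunks and keep only rows whose `column` value is in `codes`."""
    kept = []
    for chunk in pd.read_csv(path_or_buffer, chunksize=chunksize, **read_csv_kwargs):
        kept.append(chunk[chunk[column].isin(codes)])
    if not kept:
        # No chunks were read at all (e.g. an empty file).
        return pd.DataFrame()
    # ignore_index=True renumbers rows, so the original row positions are not preserved.
    return pd.concat(kept, ignore_index=True)


# Demo on an in-memory CSV (101010010 is a made-up code).
toy_csv = io.StringIO(
    "numero_apac,procedimento_principal\n"
    "1,304020419\n"
    "2,101010010\n"
    "3,304050261\n"
)
result = filter_csv_by_codes(toy_csv, [304020419, 304050261], chunksize=2)
print(result["numero_apac"].to_list())
```

With this approach peak memory is bounded by the chunk size plus the matching rows, at the cost of losing the original index.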

## 3. Create metrics

- Total number of APACs
- Total number of patients (CNS)
- Total value of APACs

In [27]:
# Each row of the filtered table is counted as one APAC
print(f"Total number of APACs: {apac_aq_her2.shape[0]:,.0f}")

# Distinct patients, identified by their CNS number
print(f"Total number of patients: {apac_aq_her2['cns_paciente'].nunique():,.0f}")

# Sum of the approved value over all HER-2 APACs
print(f"Total value of APACs: R$ {apac_aq_her2['valor_aprovado_total'].sum():,.2f}")

print(f"\nCIDs\n{apac_aq_her2.value_counts('cid10_main_descricao')}\n")

Total number of APACs: 121,692
Total number of patients: 18,916
Total value of APACs: R$ 80,899,500.20

CIDs
cid10_main_descricao
Neoplasia maligna da mama    121692
Name: count, dtype: int64
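
The three metrics can also be gathered in one reusable function, which makes it easier to apply the same calculation to another table. This is a minimal sketch: the function name `summarize_apacs` is hypothetical, the default column names are the ones used in this notebook, and the toy data is made up. Like the cell above, it counts every row as one APAC.

```python
import pandas as pd


def summarize_apacs(df: pd.DataFrame,
                    patient_col: str = "cns_paciente",
                    value_col: str = "valor_aprovado_total") -> dict:
    """Return the row count, distinct patient count and summed value of an APAC table."""
    return {
        "total_apacs": len(df),                          # one row counted as one APAC
        "total_patients": int(df[patient_col].nunique()),  # distinct CNS numbers, NaN excluded
        "total_value": float(df[value_col].sum()),
    }


# Toy table for illustration only.
toy_apacs = pd.DataFrame(
    {
        "cns_paciente": ["A", "A", "B"],
        "valor_aprovado_total": [100.0, 50.5, 200.0],
    }
)

metrics = summarize_apacs(toy_apacs)
print(f"Total number of APACs: {metrics['total_apacs']:,}")
print(f"Total number of patients: {metrics['total_patients']:,}")
print(f"Total value of APACs: R$ {metrics['total_value']:,.2f}")
```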

