# Cohort features

In [1]:
# Libraries
suppressPackageStartupMessages(library(tidyverse))

# Scripts for data analysis
source("summaries.R")

# Global options
options(warn = -1)

# Data
load("df.Rdata")
glimpse(DATA)

Observations: 461
Variables: 53
$ patient_id            <int> 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3,…
$ registry_id           <dbl> 19870070301, 19870070301, 19870070301, 19870070…
$ label_id              <fct> OO, OO, OO, OO, OO, OO, OO, JJ, JJ, JJ, I, I, I…
$ array_id              <int> 931, 931, 931, 935, 931, 931, 931, 935, 935, 93…
$ cores_number          <int> 3, 3, 3, 1, 3, 3, 3, 2, 2, 1, 6, 6, 6, 6, 6, 6,…
$ sp_id                 <fct> 05-S-4662, 05-S-4662, 05-S-4662, 05-S-9869, 07-…
$ turb_sequence         <int> 3, 3, 3, 4, 5, 5, 5, 3, 3, 1, 0, 0, 0, 0, 0, 0,…
$ histo_code            <ord> High-grade, High-grade, High-grade, Normal, Low…
$ pt_stage              <ord> Ta, Ta, Ta, NA, Ta, Ta, Ta, Tis, Tis, NA, T1, T…
$ recurrence            <fct> Yes, Yes, Yes, Yes, No, No, No, No, No, Yes, Ye…
$ progression_grade     <fct> No, No, No, No, No, No, No, No, No, Yes, No, No…
$ progression_stage     <fct> No, No, No, No, No, No, No, No, No, No, Yes, Ye…
$ progression_any   

Number of patients

In [2]:
n_distinct(DATA$patient_id)

Number of TMA spots

In [3]:
nrow(DATA)

## Clinical and outcome features
This section covers features observed at the patient level: clinical features and outcomes.

In [4]:
# Data wrangling: Summarizing clinical data by patients
CLINICAL <- DATA %>% 
    group_by(patient_id) %>% 
    summarize(
        sex = factor(unique(sex)),
        age = unique(age_dx)
    )
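
The `unique()` calls inside `summarize()` assume that each patient has exactly one value of `sex` and `age_dx` across all TMA spots. If that did not hold, the call would either fail or return more than one row for that patient, depending on the dplyr version. A minimal sketch of a check for this assumption, using a toy data frame (`toy`, `inconsistent` and all values are made up for illustration and are not part of the cohort data):

```r
library(dplyr)

# Toy spot-level data (hypothetical values, not from the cohort)
toy <- tibble(
    patient_id = c(1, 1, 2, 2, 3),
    sex = c("Male", "Male", "Female", "Female", "Male"),
    age_dx = c(60, 60, 71, 72, 55)  # patient 2 has conflicting ages
)

# Count distinct values per patient; more than one means inconsistent data
inconsistent <- toy %>%
    group_by(patient_id) %>%
    summarize(
        n_sex = n_distinct(sex),
        n_age = n_distinct(age_dx)
    ) %>%
    filter(n_sex > 1 | n_age > 1)

inconsistent
```

An empty result would indicate that collapsing with `unique()` is safe for these columns.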

### Clinical features
#### Sex

In [5]:
CLINICAL %>% summary_fct_x(sex)

# A tibble: 2 x 3
  Levels     N  Freq
  <fct>  <int> <dbl>
1 Male      41  67.2
2 Female    20  32.8
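
The helper `summary_fct_x()` comes from `summaries.R`, which is not shown here. Judging only from the columns in the output above (`Levels`, `N`, `Freq`), a table of this shape could be produced roughly as follows. This is a sketch under that assumption: `sketch_fct_summary()` is a hypothetical stand-in, and the real helper may order levels or round differently.

```r
library(dplyr)

# Hypothetical stand-in; the real summary_fct_x() lives in summaries.R
sketch_fct_summary <- function(data, x) {
    data %>%
        count({{ x }}, name = "N") %>%                 # one row per level (NA included)
        mutate(Freq = round(100 * N / sum(N), 1)) %>%  # percentage of all rows
        rename(Levels = {{ x }})
}

# Toy data (made-up values)
toy <- tibble(sex = factor(c("Male", "Male", "Female", NA)))
res <- sketch_fct_summary(toy, sex)
res
```

In this sketch missing values get their own row and count toward the denominator, which matches how `<NA>` appears alongside percentages in the tables of this notebook.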


#### Age, in years

In [6]:
CLINICAL %>% summary_num_x(age)

# A tibble: 1 x 8
      N  Mean    SD Median   IQR   Min   Max Missing
  <int> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>   <int>
1    61  67.9  9.81     68    13    47    89       0
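
As with the factor summary, `summary_num_x()` is defined in `summaries.R` and is not shown. A table with the same columns as the output above could be computed along these lines. `sketch_num_summary()` is a hypothetical stand-in, and it assumes that `N` counts non-missing values, which the output does not confirm.

```r
library(dplyr)

# Hypothetical stand-in; the real summary_num_x() lives in summaries.R
sketch_num_summary <- function(data, x) {
    data %>%
        summarize(
            N = sum(!is.na({{ x }})),  # assumption: N excludes missing values
            Mean = mean({{ x }}, na.rm = TRUE),
            SD = sd({{ x }}, na.rm = TRUE),
            Median = median({{ x }}, na.rm = TRUE),
            IQR = stats::IQR({{ x }}, na.rm = TRUE),
            Min = min({{ x }}, na.rm = TRUE),
            Max = max({{ x }}, na.rm = TRUE),
            Missing = sum(is.na({{ x }}))
        )
}

# Toy data (made-up values)
toy <- tibble(age = c(1, 2, 3, 4, NA))
res <- sketch_num_summary(toy, age)
res
```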


### Outcome features
To describe tumor recurrence and progression at the patient level, we counted a patient as positive for an event if that event occurred at least once during follow-up. For example, a patient with tumor recurrence had _at least_ one episode of tumor recurrence. Overall mortality refers to all patients who died, regardless of the cause of death.

In [7]:
# Data wrangling
OUTCOME <- DATA %>% 
    group_by(patient_id) %>% 
    summarize(
        fu_mo = unique(fu_mo),
        recurrence = factor(unique(rec_any_time)),
        progression_stage = factor(unique(pT_upstage_any_time)),
        progression_grade = factor(unique(grade_prog_any_time)),
        death = factor(unique(death))
    )
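
The columns used above (`rec_any_time` and similar) already hold the patient-level flags. To illustrate the "at least one positive event" rule itself, the sketch below derives such a flag from a spot-level column with `any()`. The toy data, the column name `recurrence_any` and the choice to treat missing values as non-events are assumptions for illustration, not the way the cohort variables were necessarily built.

```r
library(dplyr)

# Toy spot-level data (made-up values)
toy <- tibble(
    patient_id = c(1, 1, 1, 2, 2, 3),
    recurrence = c("No", "Yes", "No", "No", "No", "Yes")
)

# A patient is positive if any of their spots records the event
any_event <- toy %>%
    group_by(patient_id) %>%
    summarize(
        # na.rm = TRUE: missing values alone never make a patient positive
        recurrence_any = any(recurrence == "Yes", na.rm = TRUE)
    )

any_event
```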

#### Follow-up, in months

In [8]:
OUTCOME %>% summary_num_x(fu_mo)

# A tibble: 1 x 8
      N  Mean    SD Median   IQR   Min   Max Missing
  <int> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>   <int>
1    61  42.6  46.5   39.1  35.3  2.13  275.       0


#### Recurrence (at any time)

In [9]:
OUTCOME %>% summary_fct_x(recurrence)

# A tibble: 2 x 3
  Levels                    N  Freq
  <fct>                 <int> <dbl>
1 With tumor recurrence    52  85.2
2 No tumor recurrence       9  14.8


#### Stage progression (at any time)

In [10]:
OUTCOME %>% summary_fct_x(progression_stage)

# A tibble: 2 x 3
  Levels                     N  Freq
  <fct>                  <int> <dbl>
1 No stage progression      55  90.2
2 With stage progression     6   9.8


#### Grade progression (at any time)

In [11]:
OUTCOME %>% summary_fct_x(progression_grade)

# A tibble: 2 x 3
  Levels                     N  Freq
  <fct>                  <int> <dbl>
1 No grade progression      56  91.8
2 With grade progression     5   8.2


#### Overall mortality

In [12]:
OUTCOME %>% summary_fct_x(death)

# A tibble: 3 x 3
  Levels     N  Freq
  <fct>  <int> <dbl>
1 Alive     49  80.3
2 Dead       6   9.8
3 <NA>       6   9.8


## Pathologic features
Pathologic features were evaluated at the TMA level, i.e., spot by spot. For the histologic diagnosis, "Nontumor" includes normal urothelium and papillary hyperplasia, "CIS" includes carcinoma in situ and dysplasia, and "LG" and "HG" mean low-grade and high-grade noninvasive papillary urothelial carcinoma, respectively. Levels not listed in the recoding, such as "Invasive", are kept unchanged.

In [13]:
DX <- DATA %>% 
    mutate(
        dx = fct_collapse(histo_code,
            "Nontumor" = c("Normal", "Papillary hyperplasia"),
            "CIS" = c("CIS", "Dysplasia"),
            "LG" = "Low-grade",
            "HG" = "High-grade"
        ),
        dx = fct_relevel(dx,
            c("Nontumor", "CIS", "LG", "HG")
        ),
        pt_stage = fct_relevel(
            pt_stage,
            c("Tis", "Ta")
        ),
        host_response = fct_relevel(
            host_response,
            c("No inflammatory cells", "Rare inflammatory cells", "Lymphoid aggregates", "Intense inflammation")
        )
    )
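
To make the recoding step easier to follow, here is a small self-contained example of `fct_collapse()` followed by `fct_relevel()` on a toy factor. The vector `histo` is made up for illustration; only the level names mirror the ones used above.

```r
library(forcats)

# Toy factor (made-up values); default levels are in alphabetical order
histo <- factor(c(
    "Normal", "Papillary hyperplasia", "CIS", "Dysplasia",
    "Low-grade", "High-grade", "Invasive"
))

# Merge several original levels into broader diagnostic groups;
# levels that are not mentioned ("Invasive") are left as they are
dx <- fct_collapse(histo,
    "Nontumor" = c("Normal", "Papillary hyperplasia"),
    "CIS" = c("CIS", "Dysplasia"),
    "LG" = "Low-grade",
    "HG" = "High-grade"
)

# Move the listed levels to the front, in this order;
# remaining levels keep their relative order after them
dx <- fct_relevel(dx, "Nontumor", "CIS", "LG", "HG")

levels(dx)
```

Because `fct_relevel()` only moves the levels it is given, "Invasive" ends up last without being named explicitly.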

### Histologic diagnosis

In [14]:
DX %>% summary_fct_x(dx)

# A tibble: 5 x 3
  Levels       N  Freq
  <ord>    <int> <dbl>
1 Nontumor    48  10.4
2 CIS         21   4.6
3 LG         116  25.2
4 HG         168  36.4
5 Invasive   108  23.4


### pT stage

In [15]:
DX %>% summary_fct_x(pt_stage)

# A tibble: 5 x 3
  Levels     N  Freq
  <ord>  <int> <dbl>
1 Tis       18   3.9
2 Ta       217  47.1
3 T1       161  34.9
4 T2        21   4.6
5 <NA>      44   9.5


### Host response

In [16]:
DX %>% summary_fct_x(host_response)

# A tibble: 5 x 3
  Levels                      N  Freq
  <ord>                   <int> <dbl>
1 No inflammatory cells     173  37.5
2 Rare inflammatory cells   163  35.4
3 Lymphoid aggregates        56  12.1
4 Intense inflammation        4   0.9
5 <NA>                       65  14.1
