# Code snippets for loading survival datasets into R

## `lung`

Sourced from the R `survival` package.

**Shape:**
167 observations of 9 variables.

Survival in patients with advanced lung cancer from the North Central Cancer Treatment Group. Performance scores rate how well the patient can perform usual daily activities.

**References:**
- Loprinzi CL. et al. (1994). Prospective evaluation of prognostic variables from patient-completed questionnaires. North Central Cancer Treatment Group. Journal of Clinical Oncology. 12(3):601-7.

In [23]:
lung_df <- read.csv("lung_dataset.csv", header = TRUE)
lung_df <- lung_df[complete.cases(lung_df),]

# Surv(time, status)
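
The `Surv(time, status)` hint above names the outcome columns. As a minimal sketch of how they might be used, the block below builds the response and fits a Kaplan-Meier curve. It assumes the `survival` package is installed and uses the package's own `lung` data as a stand-in for `lung_dataset.csv`, whose exact columns are not shown here; `lung_cc`, `y`, and `km_fit` are illustrative names.

```r
library(survival)

# Stand-in for lung_df: the package's copy of the data, reduced to complete cases
lung_cc <- survival::lung[complete.cases(survival::lung), ]

# In the package data, status is coded 1 = censored, 2 = dead;
# Surv() accepts this 1/2 coding and treats 2 as the event
y <- Surv(lung_cc$time, lung_cc$status)

# Kaplan-Meier estimate of overall survival (no covariates)
km_fit <- survfit(Surv(time, status) ~ 1, data = lung_cc)
print(km_fit)
```

If the CSV recodes `status` to 0/1, the same call should work unchanged, since `Surv()` also accepts that coding.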

## `udca`

Sourced from the R `survival` package.

**Shape:**
169 observations of 6 variables.

Data from a trial of ursodeoxycholic acid (UDCA) in patients with primary biliary cirrhosis (PBC).

**References:**
- T. M. Therneau and P. M. Grambsch. (2000). Modeling survival data: extending the Cox model. Springer.
- K. D. Lindor et al. (1994). Ursodeoxycholic acid in the treatment of primary biliary cirrhosis. Gastroenterology, 106:1284-1290

In [24]:
udca_df <- read.csv("udca_dataset.csv", header = TRUE)
udca_df <- udca_df[complete.cases(udca_df),]

# Surv(time, status)

## `diabetic`

Sourced from the R `survival` package.

**Shape:**
394 observations of 8 variables.

Partial results from a trial of laser coagulation for the treatment of diabetic retinopathy.

**References:**
- Huster, Brookmeyer and Self, Biometrics, 1989.
- American Journal of Ophthalmology, 1976, 81:4, pp 383-396

In [25]:
diabetic_df <- read.csv("diabetic_dataset.csv", header = TRUE)
diabetic_df <- diabetic_df[complete.cases(diabetic_df),]

# Surv(time, status)

## `flchain`

Sourced from the R `survival` package.

**Shape:**
1962 observations of 11 variables.

This is a stratified random sample containing 1/2 of the subjects from a study of the relationship between serum free light chain (FLC) and mortality. 

**References:**
- A Dispenzieri et al. (2012). Use of nonclonal serum immunoglobulin free light chains to predict overall survival in the general population, Mayo Clinic Proceedings 87:512-523.
- R Kyle et al. (2006). Prevalence of monoclonal gammopathy of undetermined significance, New England J Medicine 354:1362-1369.

In [26]:
flchain_df <- read.csv("flchain_dataset.csv", header = TRUE)
flchain_df <- flchain_df[complete.cases(flchain_df),]

# Surv(futime, death)
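
One thing that may be worth checking for this dataset: in the `survival` package's `flchain`, the `chapter` column (cause of death) is recorded only for subjects who died, so `complete.cases()` can discard censored subjects. The sketch below compares the death indicator before and after filtering. It assumes the CSV mirrors the package data, which is not confirmed here; `fl`, `fl_cc`, and `fl_alt` are illustrative names.

```r
library(survival)

fl <- survival::flchain

# Same filter as the loading cell, applied to the package copy
fl_cc <- fl[complete.cases(fl), ]

# Compare row counts and the death indicator before and after filtering
print(c(all_rows = nrow(fl), complete_rows = nrow(fl_cc)))
print(table(fl$death))
print(table(fl_cc$death))

# One possible alternative: ignore `chapter` when deciding which rows are complete
keep <- complete.cases(fl[, setdiff(names(fl), "chapter")])
fl_alt <- fl[keep, ]
print(table(fl_alt$death))
```

If the filtered table shows few or no censored subjects, excluding `chapter` from the completeness check (or dropping that column) is one way to retain them.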

## `rotterdam`

Sourced from the R `survival` package.

**Shape:**
2982 observations of 11 variables.

This dataset includes 2982 primary breast cancer patients whose records were included in the Rotterdam tumor bank.

**References:**
- Patrick Royston and Douglas Altman. (2013). External validation of a Cox prognostic model: principles and methods. BMC Medical Research Methodology, 13:33

In [27]:
rotterdam_df <- read.csv("rotterdam_dataset.csv", header = TRUE)
rotterdam_df <- rotterdam_df[complete.cases(rotterdam_df),]

# Surv(dtime, death)

## `patient`

Sourced from the R `pammtools` package.

**Shape:**
1985 observations of 9 variables.

A data set containing the survival time (or time to hospital release), along with other covariates. The full data is available [here](https://github.com/adibender/elra-biostats).


In [28]:
patient_df <- read.csv("patient_dataset.csv", header = TRUE)
patient_df <- patient_df[complete.cases(patient_df),]

# Surv(Survdays, PatientDied)

## `pbc`

Sourced from the R `randomForestSRC` package.

**Shape:**
276 observations of 19 variables.

This data is from the Mayo Clinic trial in PBC conducted between 1974 and 1984. A total of 424 PBC patients, referred to Mayo Clinic during that ten-year interval, met eligibility criteria for the randomized placebo controlled trial of the drug D-penicillamine.

**References:**
- T Therneau and P Grambsch. (2000). Modeling Survival Data: Extending the Cox Model, Springer-Verlag, New York. ISBN: 0-387-98784-3.

In [29]:
pbc_df <- read.csv("pbc_dataset.csv", header = TRUE)
pbc_df <- pbc_df[complete.cases(pbc_df),]

# Surv(days, status)

## `ttm`

Sourced from the R `randomForestSRC` package.

**Shape:**
551 observations of 47 variables. 

Number of days before a movie grosses $1M USD. These data are a somewhat biased random sample of 551 movies released between 2015 and 2018.


In [30]:
ttm_df <- read.csv("ttm_dataset.csv", header = TRUE)
ttm_df <- ttm_df[complete.cases(ttm_df),]

# Surv(time, event)
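
Every cell above repeats the same two steps: read a CSV, then keep only complete rows. As a sketch of how that pattern could be factored out, the hypothetical helper `load_complete()` below (not part of any package) wraps both steps. It is demonstrated on a small temporary file so that it runs without the real CSVs.

```r
# Hypothetical helper: read a CSV and keep only rows with no missing values
load_complete <- function(path) {
  df <- read.csv(path, header = TRUE)
  df[complete.cases(df), , drop = FALSE]
}

# Demo on a throwaway file; the third row has a missing time and is dropped
tmp <- tempfile(fileext = ".csv")
write.csv(
  data.frame(time = c(5, 8, NA), event = c(1, 0, 1)),
  tmp,
  row.names = FALSE
)
demo_df <- load_complete(tmp)
print(dim(demo_df))
```

With the real files in the working directory, a call such as `ttm_df <- load_complete("ttm_dataset.csv")` would replace the two-line cell.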