# Analyzing the Impact of Ascites and D-Penicillamine on Mortality in Primary Biliary Cirrhosis Patients: A Statistical Inference Study
#### Authors: Hrishi Logani, Rithika Nair, Yuexiang Ni, Yuxi Zhang (Group 43)

## Introduction

Primary biliary cirrhosis (PBC) is a chronic disease in which the liver's bile ducts are gradually damaged and destroyed, and it is a prominent contributor to liver-related illness and death. Its prognosis depends on multiple factors that matter for effective treatment (Prince et al., 2002). Advanced PBC is often complicated by ascites, an accumulation of fluid in the abdominal cavity (Purohit, 2015). This project examines how ascites affects the prognosis of patients receiving D-penicillamine, a drug whose effectiveness remains uncertain (Purohit, 2015). A clearer picture of this relationship could help inform how such patients are treated. We therefore ask the following question:

#### For Primary Biliary Cirrhosis patients treated with D-penicillamine, is the presence of ascites associated with a difference in mortality?

To investigate this, we will compare mortality rates across the following categorical variables. The parameter of interest is a difference in proportions, and its variability will be measured by the standard error of that difference.

* Type of treatment (D-penicillamine or Placebo)
* Presence of Ascites

The analysis compares proportions because both "Drug" and "Ascites" take only two values. The standard error quantifies how much the sample statistic is expected to vary around the population proportion from one sample to the next. The goal is to assess the impact of "Ascites" on the prognosis of PBC patients treated with "D-penicillamine," using the dataset described below.
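As a sketch of the quantities involved, let $\hat{p}_Y$ and $\hat{p}_N$ denote the sample proportions of deaths among D-penicillamine patients with and without ascites, and let $n_Y$ and $n_N$ be the corresponding group sizes. These symbols are our own notation and do not appear in the dataset. The point estimate and its standard error would then take the usual two-proportion form:

$$
\hat{d} = \hat{p}_Y - \hat{p}_N, \qquad
\mathrm{SE}(\hat{d}) = \sqrt{\frac{\hat{p}_Y\,(1-\hat{p}_Y)}{n_Y} + \frac{\hat{p}_N\,(1-\hat{p}_N)}{n_N}}
$$

This form assumes the two groups are independent samples and that each is large enough for the normal approximation to be reasonable.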

The dataset used is the __[Cirrhosis Patient Survival Prediction](https://archive.ics.uci.edu/dataset/878/cirrhosis+patient+survival+prediction+dataset-1)__ dataset from the UCI Machine Learning Repository, containing information on 418 PBC patients. Each patient's record includes a unique identifier ("ID"), the number of days between registration and the earlier of death, transplantation, or the study analysis time ("N_Days"), patient status ("Status"), drug type ("Drug"), age ("Age"), sex ("Sex"), and various clinical parameters.

## Methods & Results

### Loading Necessary Libraries

In [1]:
library(tidyverse)
library(broom)
library(repr)
library(digest)
library(infer)
library(gridExtra)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.4      ✔ purrr   1.0.1 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.3.0      ✔ stringr 1.5.1 
✔ readr   2.1.4      ✔ forcats 1.0.0 
“package ‘stringr’ was built under R version 4.2.3”
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Attaching package: ‘gridExtra’

The following object is masked from ‘package:dplyr’:

    combine

### Downloading Dataset from the Web

Downloading the dataset from a public GitHub repository and reading it into a data frame.

In [2]:
# URL containing the dataset
url <- "https://raw.githubusercontent.com/Rithika-Nair/STAT-201-Final_Project/main/cirrhosis.csv"

# Download cirrhosis.csv from the url
download.file(url, "cirrhosis.csv")

# Store the dataset into variable
cirr_data <- read.csv("cirrhosis.csv")

# Display the first few rows of the dataset
head(cirr_data)

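| | ID | N_Days | Status | Drug | Age | Sex | Ascites | Hepatomegaly | Spiders | Edema | Bilirubin | Cholesterol | Albumin | Copper | Alk_Phos | SGOT | Tryglicerides | Platelets | Prothrombin | Stage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<int>` | `<int>` | `<chr>` | `<chr>` | `<int>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<int>` | `<dbl>` | `<int>` | `<dbl>` | `<dbl>` | `<int>` | `<int>` | `<dbl>` | `<int>` |
| 1 | 1 | 400 | D | D-penicillamine | 21464 | F | Y | Y | Y | Y | 14.5 | 261 | 2.6 | 156 | 1718.0 | 137.95 | 172 | 190.0 | 12.2 | 4 |
| 2 | 2 | 4500 | C | D-penicillamine | 20617 | F | N | Y | Y | N | 1.1 | 302 | 4.14 | 54 | 7394.8 | 113.52 | 88 | 221.0 | 10.6 | 3 |
| 3 | 3 | 1012 | D | D-penicillamine | 25594 | M | N | N | N | S | 1.4 | 176 | 3.48 | 210 | 516.0 | 96.1 | 55 | 151.0 | 12.0 | 4 |
| 4 | 4 | 1925 | D | D-penicillamine | 19994 | F | N | Y | Y | S | 1.8 | 244 | 2.54 | 64 | 6121.8 | 60.63 | 92 | 183.0 | 10.3 | 4 |
| 5 | 5 | 1504 | CL | Placebo | 13918 | F | N | Y | Y | N | 3.4 | 279 | 3.53 | 143 | 671.0 | 113.15 | 72 | 136.0 | 10.9 | 3 |
| 6 | 6 | 2503 | D | Placebo | 24201 | F | N | Y | N | N | 0.8 | 248 | 3.98 | 50 | 944.0 | 93.0 | 63 | NA | 11.0 | 3 |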


### Data Cleaning and Preprocessing

Selecting columns relevant to our analysis: `Status`, `Drug`, `Ascites`.

In [3]:
cirrhosis <- cirr_data %>%
        select(Status, Drug, Ascites)

head(cirrhosis)
nrow(cirrhosis)

| | Status | Drug | Ascites |
|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` |
| 1 | D | D-penicillamine | Y |
| 2 | C | D-penicillamine | N |
| 3 | D | D-penicillamine | N |
| 4 | D | D-penicillamine | N |
| 5 | CL | Placebo | N |
| 6 | D | Placebo | N |


Checking for missing values in each of the columns and removing the rows that contain missing values.

In [4]:
# Printing the number of missing values in each column
print(sum(is.na(cirrhosis$Status)))
print(sum(is.na(cirrhosis$Drug)))
print(sum(is.na(cirrhosis$Ascites)))

[1] 0
[1] 106
[1] 106
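The three separate checks can also be written as a single call. The following is a minimal sketch on a small made-up data frame; `toy` is hypothetical and only stands in for `cirrhosis`:

```r
# Hypothetical toy data standing in for `cirrhosis`; values are illustrative only.
toy <- data.frame(
  Status  = c("D", "C", "CL", "D"),
  Drug    = c("D-penicillamine", NA, "Placebo", NA),
  Ascites = c("Y", NA, "N", NA),
  stringsAsFactors = FALSE
)

# is.na() on a data frame returns a logical matrix;
# colSums() then counts the TRUE entries in each column.
na_counts <- colSums(is.na(toy))
print(na_counts)

# Number of rows where Drug and Ascites are missing together.
both_missing <- sum(is.na(toy$Drug) & is.na(toy$Ascites))
print(both_missing)
```

Applied to the real data, comparing the joint count with the per-column counts would show whether the 106 missing `Drug` values and the 106 missing `Ascites` values fall in the same rows.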


In [5]:
# Filtering out the rows containing NA values in either Drug or Ascites columns
cirrhosis_filtered <- cirrhosis %>%
        filter(!is.na(Drug)) %>%
        filter(!is.na(Ascites))

head(cirrhosis_filtered)
nrow(cirrhosis_filtered)

| | Status | Drug | Ascites |
|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` |
| 1 | D | D-penicillamine | Y |
| 2 | C | D-penicillamine | N |
| 3 | D | D-penicillamine | N |
| 4 | D | D-penicillamine | N |
| 5 | CL | Placebo | N |
| 6 | D | Placebo | N |
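With the missing rows removed, the quantity of interest is the proportion of deaths in each ascites group among D-penicillamine patients. The sketch below runs on a made-up data frame: `toy_filtered`, the `died` column, and `death_props` are all hypothetical names. It also assumes that only `Status == "D"` counts as a death, with `C` and `CL` treated as survivors. That coding is our assumption and would need to be checked against the dataset documentation.

```r
library(dplyr)

# Hypothetical toy data standing in for `cirrhosis_filtered`; values are illustrative only.
toy_filtered <- data.frame(
  Status  = c("D", "D", "C", "CL", "D", "C", "C", "D"),
  Drug    = c(rep("D-penicillamine", 6), rep("Placebo", 2)),
  Ascites = c("Y", "Y", "Y", "N", "N", "N", "N", "Y")
)

death_props <- toy_filtered %>%
  filter(Drug == "D-penicillamine") %>%   # keep the treatment arm of interest
  mutate(died = Status == "D") %>%        # assumption: only "D" is a death
  group_by(Ascites) %>%
  summarise(n = n(), prop_died = mean(died), .groups = "drop")

print(death_props)

# Observed difference in proportions (ascites present minus ascites absent).
obs_diff <- death_props$prop_died[death_props$Ascites == "Y"] -
  death_props$prop_died[death_props$Ascites == "N"]
print(obs_diff)
```

The same pipeline could be pointed at `cirrhosis_filtered` once the death coding has been confirmed; the toy numbers here carry no clinical meaning.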


## References


Prince, M., Chetwynd, A., Newman, W., Metcalf, J. V., & James, O. F. W. (2002). Survival and symptom progression in a geographically based cohort of patients with primary biliary cirrhosis: Follow-up for up to 28 years. Gastroenterology, 123(4), 1044–1051. __https://doi.org/10.1053/gast.2002.36027__

Purohit, T. (2015). Primary biliary cirrhosis: Pathophysiology, clinical presentation and therapy. World Journal of Hepatology, 7(7), 926. __https://doi.org/10.4254/wjh.v7.i7.926__

Dickson, E., Grambsch, P., Fleming, T., Fisher, L., & Langworthy, A. (2023). Cirrhosis Patient Survival Prediction. UCI Machine Learning Repository. __https://doi.org/10.24432/C5R02G__