# **CIRRHOSIS**
Cirrhosis is a chronic liver disease marked by degeneration of cells, inflammation, and fibrous thickening of tissue. It is typically a result of alcoholism or hepatitis.

# Study Design
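A total of 424 patients with primary biliary cirrhosis (PBC), referred to the Mayo Clinic between 1974 and 1984, met the eligibility criteria for the **randomized placebo-controlled trial of the drug D-penicillamine**. Of these, 312 took part in the trial; the dataset also includes 106 further patients for whom only basic measurements were recorded, which accounts for the 418 rows in the file.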

***Variable Information***

1. ID: unique identifier
2. N_Days: number of days between registration and the earliest of death, transplantation, or study analysis time in July 1986
3. Status: status of the patient, C (censored), CL (censored due to liver transplant), or D (death)
4. Drug: type of drug, D-penicillamine or placebo
5. Age: age in [days]
6. Sex: M (male) or F (female)
7. Ascites: presence of ascites N (No) or Y (Yes)
8. Hepatomegaly: presence of hepatomegaly N (No) or Y (Yes)
9. Spiders: presence of spiders N (No) or Y (Yes)
10. Edema: presence of edema N (no edema and no diuretic therapy for edema), S (edema present without diuretics, or edema resolved by diuretics), or Y (edema despite diuretic therapy)
11. Bilirubin: serum bilirubin in [mg/dl]
12. Cholesterol: serum cholesterol in [mg/dl]
13. Albumin: albumin in [gm/dl]
14. Copper: urine copper in [ug/day]
15. Alk_Phos: alkaline phosphatase in [U/liter]
16. SGOT: SGOT in [U/ml]
17. Tryglicerides: triglycerides in [mg/dl] (the column name is misspelled this way in the dataset)
18. Platelets: platelets per cubic [ml/1000]
19. Prothrombin: prothrombin time in seconds [s]
20. Stage: histologic stage of disease (1, 2, 3, or 4)
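
Because `Age` is recorded in days, it can be easier to read after conversion to years. The sketch below uses the ages of the first five patients in the dataset; the divisor 365.25 is an assumed convention (it averages in leap years), and `age_years` is a name introduced here for illustration only.

```python
import pandas as pd

# Age values (in days) of the first five patients in the dataset.
age_days = pd.Series([21464, 20617, 25594, 19994, 13918], name="Age")

# Assumption: 365.25 days per year, rounded to one decimal place.
age_years = (age_days / 365.25).round(1)
print(age_years.tolist())
```

The same expression could be applied to the full `Age` column once the data is loaded.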

***Acknowledgement***

This dataset is taken from Kaggle.com.

**Cirrhosis Prediction Dataset**

https://www.kaggle.com/datasets/fedesoriano/cirrhosis-prediction-dataset

In [92]:
import pandas as pd

In [93]:
from google.colab import files
uploaded = files.upload()
df = pd.read_csv("cirrhosis.csv")
df.head()

Saving cirrhosis.csv to cirrhosis (3).csv


Unnamed: 0,ID,N_Days,Status,Drug,Age,Sex,Ascites,Hepatomegaly,Spiders,Edema,Bilirubin,Cholesterol,Albumin,Copper,Alk_Phos,SGOT,Tryglicerides,Platelets,Prothrombin,Stage
0,1,400,D,D-penicillamine,21464,F,Y,Y,Y,Y,14.5,261.0,2.6,156.0,1718.0,137.95,172.0,190.0,12.2,4.0
1,2,4500,C,D-penicillamine,20617,F,N,Y,Y,N,1.1,302.0,4.14,54.0,7394.8,113.52,88.0,221.0,10.6,3.0
2,3,1012,D,D-penicillamine,25594,M,N,N,N,S,1.4,176.0,3.48,210.0,516.0,96.1,55.0,151.0,12.0,4.0
3,4,1925,D,D-penicillamine,19994,F,N,Y,Y,S,1.8,244.0,2.54,64.0,6121.8,60.63,92.0,183.0,10.3,4.0
4,5,1504,CL,Placebo,13918,F,N,Y,Y,N,3.4,279.0,3.53,143.0,671.0,113.15,72.0,136.0,10.9,3.0


In [94]:
df.tail()

Unnamed: 0,ID,N_Days,Status,Drug,Age,Sex,Ascites,Hepatomegaly,Spiders,Edema,Bilirubin,Cholesterol,Albumin,Copper,Alk_Phos,SGOT,Tryglicerides,Platelets,Prothrombin,Stage
413,414,681,D,,24472,F,,,,N,1.2,,2.96,,,,,174.0,10.9,3.0
414,415,1103,C,,14245,F,,,,N,0.9,,3.83,,,,,180.0,11.2,4.0
415,416,1055,C,,20819,F,,,,N,1.6,,3.42,,,,,143.0,9.9,3.0
416,417,691,C,,21185,F,,,,N,0.8,,3.75,,,,,269.0,10.4,3.0
417,418,976,C,,19358,F,,,,N,0.7,,3.29,,,,,350.0,10.6,4.0


In [95]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 418 entries, 0 to 417
Data columns (total 20 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   ID             418 non-null    int64  
 1   N_Days         418 non-null    int64  
 2   Status         418 non-null    object 
 3   Drug           312 non-null    object 
 4   Age            418 non-null    int64  
 5   Sex            418 non-null    object 
 6   Ascites        312 non-null    object 
 7   Hepatomegaly   312 non-null    object 
 8   Spiders        312 non-null    object 
 9   Edema          418 non-null    object 
 10  Bilirubin      418 non-null    float64
 11  Cholesterol    284 non-null    float64
 12  Albumin        418 non-null    float64
 13  Copper         310 non-null    float64
 14  Alk_Phos       312 non-null    float64
 15  SGOT           312 non-null    float64
 16  Tryglicerides  282 non-null    float64
 17  Platelets      407 non-null    float64
 18  Prothrombi

In [96]:
df.shape

(418, 20)
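
The `Non-Null Count` column of `df.info()` shows that several columns hold fewer than 418 values. A compact way to see the gaps directly is to count missing cells per column. The sketch below runs on a small hypothetical frame (`sample`) that mimics two complete rows and two incomplete rows; it is an illustration, not the notebook's own data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: two complete rows, two rows with gaps.
sample = pd.DataFrame({
    "Drug": ["D-penicillamine", "Placebo", np.nan, np.nan],
    "Bilirubin": [14.5, 3.4, 1.2, 0.9],
    "Cholesterol": [261.0, 279.0, np.nan, np.nan],
})

missing = sample.isna().sum()          # number of missing cells per column
share = sample.isna().mean().round(2)  # fraction of rows missing per column
print(pd.DataFrame({"missing": missing, "share": share}))
```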

In [97]:
df.describe()

Unnamed: 0,ID,N_Days,Age,Bilirubin,Cholesterol,Albumin,Copper,Alk_Phos,SGOT,Tryglicerides,Platelets,Prothrombin,Stage
count,418.0,418.0,418.0,418.0,284.0,418.0,310.0,312.0,312.0,282.0,407.0,416.0,412.0
mean,209.5,1917.782297,18533.351675,3.220813,369.510563,3.49744,97.648387,1982.655769,122.556346,124.702128,257.02457,10.731731,3.024272
std,120.810458,1104.672992,3815.845055,4.407506,231.944545,0.424972,85.61392,2140.388824,56.699525,65.148639,98.325585,1.022,0.882042
min,1.0,41.0,9598.0,0.3,120.0,1.96,4.0,289.0,26.35,33.0,62.0,9.0,1.0
25%,105.25,1092.75,15644.5,0.8,249.5,3.2425,41.25,871.5,80.6,84.25,188.5,10.0,2.0
50%,209.5,1730.0,18628.0,1.4,309.5,3.53,73.0,1259.0,114.7,108.0,251.0,10.6,3.0
75%,313.75,2613.5,21272.5,3.4,400.0,3.77,123.0,1980.0,151.9,151.0,318.0,11.1,4.0
max,418.0,4795.0,28650.0,28.0,1775.0,4.64,588.0,13862.4,457.25,598.0,721.0,18.0,4.0


In [98]:
df.nunique()

ID               418
N_Days           399
Status             3
Drug               2
Age              344
Sex                2
Ascites            2
Hepatomegaly       2
Spiders            2
Edema              3
Bilirubin         98
Cholesterol      201
Albumin          154
Copper           158
Alk_Phos         295
SGOT             179
Tryglicerides    146
Platelets        243
Prothrombin       48
Stage              4
dtype: int64

In [99]:
df.columns

Index(['ID', 'N_Days', 'Status', 'Drug', 'Age', 'Sex', 'Ascites',
       'Hepatomegaly', 'Spiders', 'Edema', 'Bilirubin', 'Cholesterol',
       'Albumin', 'Copper', 'Alk_Phos', 'SGOT', 'Tryglicerides', 'Platelets',
       'Prothrombin', 'Stage'],
      dtype='object')

In [100]:
df.corr()

Unnamed: 0,ID,N_Days,Age,Bilirubin,Cholesterol,Albumin,Copper,Alk_Phos,SGOT,Tryglicerides,Platelets,Prothrombin,Stage
ID,1.0,-0.354305,0.037136,-0.062154,0.032897,-0.128924,-0.098663,-0.352856,-0.012097,-0.0341,-0.076699,-0.19193,-0.033757
N_Days,-0.354305,1.0,-0.125934,-0.403953,-0.138236,0.430829,-0.364809,0.149269,-0.225492,-0.153,0.151361,-0.11147,-0.366193
Age,0.037136,-0.125934,1.0,0.002362,-0.15762,-0.18235,0.061549,-0.047247,-0.149869,0.022065,-0.148201,0.11376,0.189083
Bilirubin,-0.062154,-0.403953,0.002362,1.0,0.397129,-0.314177,0.456918,0.116984,0.44173,0.436748,-0.013435,0.314894,0.200731
Cholesterol,0.032897,-0.138236,-0.15762,0.397129,1.0,-0.069733,0.126115,0.149473,0.353246,0.27683,0.19171,-0.030811,0.011164
Albumin,-0.128924,0.430829,-0.18235,-0.314177,-0.069733,1.0,-0.264771,-0.101456,-0.220047,-0.103417,0.158659,-0.200592,-0.305296
Copper,-0.098663,-0.364809,0.061549,0.456918,0.126115,-0.264771,1.0,0.187357,0.293829,0.279852,-0.064403,0.218224,0.2694
Alk_Phos,-0.352856,0.149269,-0.047247,0.116984,0.149473,-0.101456,0.187357,1.0,0.112217,0.180082,0.143733,0.089384,0.041273
SGOT,-0.012097,-0.225492,-0.149869,0.44173,0.353246,-0.220047,0.293829,0.112217,1.0,0.126119,-0.120147,0.112174,0.164945
Tryglicerides,-0.0341,-0.153,0.022065,0.436748,0.27683,-0.103417,0.279852,0.180082,0.126119,1.0,0.103212,0.020122,0.123899
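
The correlation table above covers only the numeric columns. In recent pandas releases (2.0 and later, as far as I know), `DataFrame.corr` no longer drops text columns on its own, so selecting the numeric columns first is the safer pattern. A minimal sketch on the first five rows of four columns:

```python
import pandas as pd

# First five rows of four columns from the dataset.
sample = pd.DataFrame({
    "Status": ["D", "C", "D", "D", "CL"],
    "N_Days": [400, 4500, 1012, 1925, 1504],
    "Bilirubin": [14.5, 1.1, 1.4, 1.8, 3.4],
    "Albumin": [2.6, 4.14, 3.48, 2.54, 3.53],
})

# Keep only numeric columns, then compute pairwise Pearson correlations.
numeric = sample.select_dtypes(include="number")
corr = numeric.corr()
print(corr.round(3))
```

With only five rows the coefficients are not meaningful; the point is the column selection.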


In [101]:
print(df.duplicated())

0      False
1      False
2      False
3      False
4      False
       ...  
413    False
414    False
415    False
416    False
417    False
Length: 418, dtype: bool
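
Printing the full boolean Series hides most rows behind `...`. Summing it gives a single count, which is easier to check. The sketch below uses a hypothetical frame with one deliberately repeated row; the names `n_full_dupes` and `n_id_dupes` are introduced here for illustration.

```python
import pandas as pd

# Hypothetical toy frame; the last row repeats the third on purpose.
sample = pd.DataFrame({
    "ID": [1, 2, 3, 3],
    "N_Days": [400, 4500, 1012, 1012],
})

n_full_dupes = int(sample.duplicated().sum())           # fully identical rows
n_id_dupes = int(sample.duplicated(subset="ID").sum())  # repeated IDs only
print(n_full_dupes, n_id_dupes)
```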


In [102]:
df.T
#Transposes rows and columns.

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,408,409,410,411,412,413,414,415,416,417
ID,1,2,3,4,5,6,7,8,9,10,...,409,410,411,412,413,414,415,416,417,418
N_Days,400,4500,1012,1925,1504,2503,1832,2466,2400,51,...,1067,1072,1119,1097,989,681,1103,1055,691,976
Status,D,C,D,D,CL,D,C,D,D,D,...,C,C,C,C,C,D,C,C,C,C
Drug,D-penicillamine,D-penicillamine,D-penicillamine,D-penicillamine,Placebo,Placebo,Placebo,Placebo,D-penicillamine,Placebo,...,,,,,,,,,,
Age,21464,20617,25594,19994,13918,24201,20284,19379,15526,25772,...,15706,14245,18628,24472,12784,24472,14245,20819,21185,19358
Sex,F,F,M,F,F,F,F,F,F,F,...,F,F,F,F,F,F,F,F,F,F
Ascites,Y,N,N,N,N,N,N,N,N,Y,...,,,,,,,,,,
Hepatomegaly,Y,Y,N,Y,Y,Y,Y,N,N,N,...,,,,,,,,,,
Spiders,Y,Y,N,Y,Y,N,N,N,Y,Y,...,,,,,,,,,,
Edema,Y,N,S,S,N,N,N,N,N,Y,...,N,N,N,N,N,N,N,N,N,N


In [103]:
df['Drug'].value_counts()

D-penicillamine    158
Placebo            154
Name: Drug, dtype: int64
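
Note that `value_counts` is a method and must be called with parentheses to return counts. By default it skips missing values; passing `dropna=False` adds a row for them. A small sketch on a hypothetical Series:

```python
import numpy as np
import pandas as pd

# Hypothetical toy Series with three missing entries.
drug = pd.Series(
    ["D-penicillamine", "D-penicillamine", "Placebo", np.nan, np.nan, np.nan],
    name="Drug",
)

counts = drug.value_counts()                  # NaN excluded by default
counts_all = drug.value_counts(dropna=False)  # NaN gets its own row
print(counts_all)
```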

In [104]:
df.drop('Age',inplace = True, axis = 1)
df.head()

Unnamed: 0,ID,N_Days,Status,Drug,Sex,Ascites,Hepatomegaly,Spiders,Edema,Bilirubin,Cholesterol,Albumin,Copper,Alk_Phos,SGOT,Tryglicerides,Platelets,Prothrombin,Stage
0,1,400,D,D-penicillamine,F,Y,Y,Y,Y,14.5,261.0,2.6,156.0,1718.0,137.95,172.0,190.0,12.2,4.0
1,2,4500,C,D-penicillamine,F,N,Y,Y,N,1.1,302.0,4.14,54.0,7394.8,113.52,88.0,221.0,10.6,3.0
2,3,1012,D,D-penicillamine,M,N,N,N,S,1.4,176.0,3.48,210.0,516.0,96.1,55.0,151.0,12.0,4.0
3,4,1925,D,D-penicillamine,F,N,Y,Y,S,1.8,244.0,2.54,64.0,6121.8,60.63,92.0,183.0,10.3,4.0
4,5,1504,CL,Placebo,F,N,Y,Y,N,3.4,279.0,3.53,143.0,671.0,113.15,72.0,136.0,10.9,3.0


In [105]:
new_df1 = df.copy()
new_df1.tail()

Unnamed: 0,ID,N_Days,Status,Drug,Sex,Ascites,Hepatomegaly,Spiders,Edema,Bilirubin,Cholesterol,Albumin,Copper,Alk_Phos,SGOT,Tryglicerides,Platelets,Prothrombin,Stage
413,414,681,D,,F,,,,N,1.2,,2.96,,,,,174.0,10.9,3.0
414,415,1103,C,,F,,,,N,0.9,,3.83,,,,,180.0,11.2,4.0
415,416,1055,C,,F,,,,N,1.6,,3.42,,,,,143.0,9.9,3.0
416,417,691,C,,F,,,,N,0.8,,3.75,,,,,269.0,10.4,3.0
417,418,976,C,,F,,,,N,0.7,,3.29,,,,,350.0,10.6,4.0


In [106]:
new_df1.fillna(99, inplace = True)
new_df1.tail()

Unnamed: 0,ID,N_Days,Status,Drug,Sex,Ascites,Hepatomegaly,Spiders,Edema,Bilirubin,Cholesterol,Albumin,Copper,Alk_Phos,SGOT,Tryglicerides,Platelets,Prothrombin,Stage
413,414,681,D,99,F,99,99,99,N,1.2,99.0,2.96,99.0,99.0,99.0,99.0,174.0,10.9,3.0
414,415,1103,C,99,F,99,99,99,N,0.9,99.0,3.83,99.0,99.0,99.0,99.0,180.0,11.2,4.0
415,416,1055,C,99,F,99,99,99,N,1.6,99.0,3.42,99.0,99.0,99.0,99.0,143.0,9.9,3.0
416,417,691,C,99,F,99,99,99,N,0.8,99.0,3.75,99.0,99.0,99.0,99.0,269.0,10.4,3.0
417,418,976,C,99,F,99,99,99,N,0.7,99.0,3.29,99.0,99.0,99.0,99.0,350.0,10.6,4.0


In [107]:
new_df1.isna().sum()

ID               0
N_Days           0
Status           0
Drug             0
Sex              0
Ascites          0
Hepatomegaly     0
Spiders          0
Edema            0
Bilirubin        0
Cholesterol      0
Albumin          0
Copper           0
Alk_Phos         0
SGOT             0
Tryglicerides    0
Platelets        0
Prothrombin      0
Stage            0
dtype: int64
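
Filling every gap with 99 removes the NaNs, but 99 then looks like a real measurement in numeric columns. One possible alternative, shown here only as a sketch and not as the approach used in this notebook, is to fill numeric columns with the column median and text columns with an explicit label. The label `"Unrecorded"` and the frame `sample` are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one incomplete row.
sample = pd.DataFrame({
    "Drug": ["D-penicillamine", "Placebo", np.nan],
    "Cholesterol": [261.0, 279.0, np.nan],
})

filled = sample.copy()
# Numeric columns: fill with the column median (an assumed, common choice).
num_cols = filled.select_dtypes(include="number").columns
filled[num_cols] = filled[num_cols].fillna(filled[num_cols].median())
# Remaining text columns: use an explicit label instead of a number.
other_cols = filled.columns.difference(num_cols)
filled[other_cols] = filled[other_cols].fillna("Unrecorded")
print(filled)
```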

In [108]:
new_df1['Status'] = pd.factorize(new_df1.Status)[0]
new_df1['Drug'] = pd.factorize(new_df1.Drug)[0]
new_df1['Sex'] = pd.factorize(new_df1.Sex)[0]
new_df1['Ascites'] = pd.factorize(new_df1.Ascites)[0]
new_df1['Hepatomegaly'] = pd.factorize(new_df1.Hepatomegaly)[0]
new_df1['Spiders'] = pd.factorize(new_df1.Spiders)[0]
new_df1['Edema'] = pd.factorize(new_df1.Edema)[0]
print(new_df1)
# 0 = female
# 1 = male

      ID  N_Days  Status  Drug  Sex  Ascites  Hepatomegaly  Spiders  Edema  \
0      1     400       0     0    0        0             0        0      0   
1      2    4500       1     0    0        1             0        0      1   
2      3    1012       0     0    1        1             1        1      2   
3      4    1925       0     0    0        1             0        0      2   
4      5    1504       2     1    0        1             0        0      1   
..   ...     ...     ...   ...  ...      ...           ...      ...    ...   
413  414     681       0     2    0        2             2        2      1   
414  415    1103       1     2    0        2             2        2      1   
415  416    1055       1     2    0        2             2        2      1   
416  417     691       1     2    0        2             2        2      1   
417  418     976       1     2    0        2             2        2      1   

     Bilirubin  Cholesterol  Albumin  Copper  Alk_Phos    SGOT 
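
`pd.factorize` returns two things: the integer codes and the unique labels they stand for. Keeping the second element makes it possible to read the mapping off directly instead of inferring it. A minimal sketch using the `Status` values of the first five rows (`mapping` is a name introduced here):

```python
import pandas as pd

# Status values of the first five rows of the dataset.
status = pd.Series(["D", "C", "D", "D", "CL"], name="Status")

codes, uniques = pd.factorize(status)
# Position in `uniques` is the code assigned to that label.
mapping = {code: label for code, label in enumerate(uniques)}
print(mapping)
```

Codes are assigned in order of first appearance, which is why `D` (the first row's status) receives code 0.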

In [110]:
# Note: astype(int) truncates, so decimals are lost (e.g. Albumin 3.53 becomes 3).
new_df1['Bilirubin'] = new_df1['Bilirubin'].astype(int)
new_df1['Cholesterol'] = new_df1['Cholesterol'].astype(int)
new_df1['Albumin'] = new_df1['Albumin'].astype(int)
new_df1['Copper'] = new_df1['Copper'].astype(int)
new_df1['Alk_Phos'] = new_df1['Alk_Phos'].astype(int)
new_df1['SGOT'] = new_df1['SGOT'].astype(int)
new_df1['Tryglicerides'] = new_df1['Tryglicerides'].astype(int)
new_df1['Platelets'] = new_df1['Platelets'].astype(int)
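# Note: the lowercase key adds a new 'prothrombin' column;
# the original 'Prothrombin' column stays float64.
new_df1['prothrombin'] = new_df1['Prothrombin'].astype(int)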
new_df1['Stage'] = new_df1['Stage'].astype(int)
new_df1.dtypes

ID                 int64
N_Days             int64
Status             int64
Drug               int64
Sex                int64
Ascites            int64
Hepatomegaly       int64
Spiders            int64
Edema              int64
Bilirubin          int64
Cholesterol        int64
Albumin            int64
Copper             int64
Alk_Phos           int64
SGOT               int64
Tryglicerides      int64
Platelets          int64
Prothrombin      float64
Stage              int64
prothrombin        int64
dtype: object
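
Casting with `astype(int)` drops the fractional part rather than rounding. The sketch below contrasts truncation with rounding on the first five `Albumin` values; whether either is appropriate for lab measurements is a separate question, and the names `truncated` and `rounded` are introduced here for illustration.

```python
import pandas as pd

# Albumin values of the first five rows of the dataset.
albumin = pd.Series([2.6, 4.14, 3.48, 2.54, 3.53], name="Albumin")

truncated = albumin.astype(int)        # drops the fractional part
rounded = albumin.round().astype(int)  # nearest integer instead
print(pd.DataFrame({"raw": albumin, "truncated": truncated, "rounded": rounded}))
```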

In [112]:
new_df1.Status.value_counts()
# 0 = D (death)
# 1 = C (censored)
# 2 = CL (censored due to liver transplant)


1    232
0    161
2     25
Name: Status, dtype: int64

In [113]:
new_df1.Sex.value_counts()
# 0 = female
# 1 = male

0    374
1     44
Name: Sex, dtype: int64

In [114]:
new_df1.Drug.value_counts()
# 0 = D-penicillamine
# 1 = placebo
# 2 = unrecorded (NaN filled with 99)

0    158
1    154
2    106
Name: Drug, dtype: int64

In [115]:
new_df1.Ascites.value_counts()
# 0= yes
# 1= no
# 2 =unrecorded

1    288
2    106
0     24
Name: Ascites, dtype: int64

In [116]:
new_df1.Hepatomegaly.value_counts()
# 0 = yes
# 1 = no
# 2 = unrecorded

0    160
1    152
2    106
Name: Hepatomegaly, dtype: int64

In [117]:
new_df1.Spiders.value_counts()
# 0 = yes
# 1 = no
# 2 = unrecorded

1    222
2    106
0     90
Name: Spiders, dtype: int64

In [118]:
new_df1.Edema.value_counts()
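# 0 = Y (edema despite diuretic therapy)
# 1 = N (no edema and no diuretic therapy for edema)
# 2 = S (edema present without diuretics, or resolved by diuretics)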

1    354
2     44
0     20
Name: Edema, dtype: int64

In [119]:
new_df1.max()
# max() returns the largest value in each column.
# The 99.0 for Prothrombin and Stage is the fillna placeholder, not a measurement.

ID                 418.0
N_Days            4795.0
Status               2.0
Drug                 2.0
Sex                  1.0
Ascites              2.0
Hepatomegaly         2.0
Spiders              2.0
Edema                2.0
Bilirubin           28.0
Cholesterol       1775.0
Albumin              4.0
Copper             588.0
Alk_Phos         13862.0
SGOT               457.0
Tryglicerides      598.0
Platelets          721.0
Prothrombin         99.0
Stage               99.0
prothrombin         99.0
dtype: float64

## **Conclusion**

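The dataset contains 418 patient records. Of these, 158 patients received D-penicillamine, 154 received placebo, and 106 have no recorded drug assignment. Edema was absent in 354 patients, 44 were coded S (edema present without diuretics, or resolved by diuretics), and 20 had edema despite diuretic therapy. The counts above do not by themselves show any link between the drug and edema.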
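
Any statement that relates the drug to edema would need the two columns to be tabulated together. A cross-tabulation is one simple way to do that; the sketch below uses only the first five rows of the dataset, so it illustrates the call rather than a result.

```python
import pandas as pd

# Drug and Edema values of the first five rows of the dataset.
sample = pd.DataFrame({
    "Drug": ["D-penicillamine", "D-penicillamine", "D-penicillamine",
             "D-penicillamine", "Placebo"],
    "Edema": ["Y", "N", "S", "S", "N"],
})

# Rows: drug group; columns: edema category; cells: patient counts.
table = pd.crosstab(sample["Drug"], sample["Edema"])
print(table)
```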