# About the dataset:

kidney_stone_data

Source: https://www.kaggle.com/datasets/utkarshxy/kidney-stone-data/data

## Introduction
In 1986, a group of urologists in London published a research paper in The British Medical Journal that compared the effectiveness of two different methods to remove kidney stones. Treatment A was open surgery (invasive), and treatment B was percutaneous nephrolithotomy (less invasive).

When they looked at the results from all 700 patients, treatment B had a higher success rate. However, when they looked separately at the subgroups of patients with small and large kidney stones, treatment A had a better success rate in each subgroup.

Simpson's paradox occurs when a trend that appears in each subgroup disappears or reverses when the subgroups are combined.
This project works with the medical data from that 1986 paper in The British Medical Journal, in which the effectiveness of two kidney stone removal treatments (A: open surgery, B: percutaneous nephrolithotomy) was compared.

The goal is to use multiple logistic regression and visualize the model output to help doctors determine whether there is a difference between the two treatments. While not required, some knowledge of inferential statistics will help.
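
As a rough illustration of the logistic regression idea mentioned above, the sketch below fits `success ~ treatment + stone_size` by Newton-Raphson using only NumPy. It assumes the per-group counts reported in the 1986 study rather than reading `kidney_stone_data.csv`; the names `groups` and `fit_logistic` are hypothetical helpers introduced for this example, and a dedicated statistics package would normally be used instead.

```python
import numpy as np

# Assumption: grouped counts as reported in the 1986 study (not read from the CSV).
# Columns: treatment_B (1 = B), stone_small (1 = small), successes, patients
groups = np.array([
    [0, 0, 192, 263],  # treatment A, large stones
    [0, 1,  81,  87],  # treatment A, small stones
    [1, 0,  55,  80],  # treatment B, large stones
    [1, 1, 234, 270],  # treatment B, small stones
], dtype=float)


def fit_logistic(groups, n_iter=25, tol=1e-10):
    """Fit success ~ treatment_B + stone_small on grouped binomial data."""
    # Design matrix: intercept, treatment indicator, stone size indicator
    X = np.column_stack([np.ones(len(groups)), groups[:, 0], groups[:, 1]])
    successes, totals = groups[:, 2], groups[:, 3]
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))           # fitted success probabilities
        gradient = X.T @ (successes - totals * p)      # score vector
        weights = totals * p * (1.0 - p)               # binomial variances
        hessian = X.T @ (X * weights[:, None])         # observed information
        step = np.linalg.solve(hessian, gradient)      # Newton-Raphson update
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


intercept, coef_treatment_b, coef_stone_small = fit_logistic(groups)
print(f"intercept:        {intercept:.3f}")
print(f"treatment B:      {coef_treatment_b:.3f}")
print(f"small stone size: {coef_stone_small:.3f}")
```

The coefficients are on the log-odds scale. A negative coefficient for treatment B would indicate lower odds of success than treatment A once stone size is held fixed, which is the stratified comparison that the overall success rates hide.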

## Content
The data contains three columns: treatment (A or B), stone_size (large or small) and success (0 = Failure or 1 = Success).

In [None]:
#Install the library
!pip install scipy



In [None]:
#Import all the libraries
import pandas as pd
from scipy.stats import ttest_ind, f_oneway, chi2_contingency

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
#Load the dataset (read_csv already returns a DataFrame)
df = pd.read_csv("kidney_stone_data.csv")

In [None]:
df.head(10)

Unnamed: 0,treatment,stone_size,success
0,B,large,1
1,A,large,1
2,A,large,0
3,A,large,1
4,A,large,1
5,B,large,1
6,A,small,1
7,B,large,1
8,B,small,1
9,A,large,1


In [None]:
df.tail(10)

Unnamed: 0,treatment,stone_size,success
690,B,small,1
691,A,small,1
692,A,large,1
693,A,large,1
694,B,large,0
695,B,small,0
696,B,small,1
697,B,small,1
698,A,large,1
699,A,small,1


In [None]:
df.shape

(700, 3)

In [None]:
df.dtypes

treatment     object
stone_size    object
success        int64
dtype: object

In [None]:
df.describe()

Unnamed: 0,success
count,700.0
mean,0.802857
std,0.398126
min,0.0
25%,1.0
50%,1.0
75%,1.0
max,1.0
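
The overall mean of about 0.80 does not show how success differs by treatment or stone size. The sketch below computes success rates overall and within each stone size group, which is where Simpson's paradox becomes visible. It assumes the per-group counts reported in the 1986 study instead of the loaded `df`; `counts` and `success_rate` are hypothetical names used only here.

```python
import pandas as pd

# Assumption: successes and patients per group as reported in the 1986 study.
counts = pd.DataFrame({
    "treatment":  ["A", "A", "B", "B"],
    "stone_size": ["large", "small", "large", "small"],
    "successes":  [192, 81, 55, 234],
    "patients":   [263, 87, 80, 270],
})


def success_rate(table, by):
    """Return successes / patients after summing counts within each group."""
    totals = table.groupby(by)[["successes", "patients"]].sum()
    return totals["successes"] / totals["patients"]


overall = success_rate(counts, "treatment")
by_size = success_rate(counts, ["stone_size", "treatment"])

print(overall.round(3))
print(by_size.round(3))
```

On the row-level data loaded in this notebook, the equivalent calls would be `df.groupby("treatment")["success"].mean()` and `df.groupby(["stone_size", "treatment"])["success"].mean()`.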


# T-Test

- H0 (Null Hypothesis): There is no significant difference in the effectiveness/success rate of Treatment A and Treatment B.

- H1 (Alternate Hypothesis): There is a significant difference in the effectiveness/success rate of Treatment A and Treatment B.

Note: If the p-value is less than 0.05, reject H0. Otherwise, fail to reject H0 (a large p-value does not prove that H0 is true).

In [None]:
#Success outcomes (0/1) for Treatment A and Treatment B:
success_A = df[df["treatment"] == "A"]["success"]
success_B = df[df["treatment"] == "B"]["success"]

#Perform independent two-sample t-test
t_result = ttest_ind(success_A, success_B)
print(t_result)

TtestResult(statistic=-1.5204003013436962, pvalue=0.12886323855136153, df=698.0)
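
Because `success` is binary, a chi-square test of independence on the 2x2 table of treatment by outcome is a common alternative to the t-test. The sketch below is one way to run it, assuming the overall counts reported in the 1986 study (273 of 350 successes for A, 289 of 350 for B) rather than the loaded `df`; `observed_2x2` is a hypothetical name.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Assumption: overall counts as reported in the 1986 study.
# Rows: treatment A, treatment B. Columns: failure, success.
observed_2x2 = np.array([
    [77, 273],
    [61, 289],
])

# correction=False disables the Yates continuity correction applied to 2x2 tables
chi2, p_value, dof, expected = chi2_contingency(observed_2x2, correction=False)

print(f"chi2 = {chi2:.4f}, dof = {dof}, p-value = {p_value:.4f}")
```

With these counts the p-value stays above 0.05, so this test leads to the same decision as the t-test: the overall difference between the treatments is not statistically significant.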


# ANOVA
- H0: There is no significant difference among the group means.
- H1: At least one group has a significantly different mean than the other(s).

In [None]:
#Success outcomes (0/1) for small and large stones:
success_small = df[df["stone_size"] == "small"]["success"]
success_large = df[df["stone_size"] == "large"]["success"]

#Perform one-way ANOVA
f_result = f_oneway(success_small, success_large)
print(f_result)

F_onewayResult(statistic=30.264409926452775, pvalue=5.2953760011433365e-08)
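
With only two groups, one-way ANOVA is equivalent to the independent two-sample t-test: the F statistic equals the squared t statistic and the p-values coincide. The sketch below checks this, assuming success counts by stone size taken from the 1986 study (315 of 357 for small stones, 247 of 343 for large stones) instead of the loaded `df`.

```python
import numpy as np
from scipy.stats import f_oneway, ttest_ind

# Assumption: outcomes rebuilt from the counts reported in the 1986 study.
success_small = np.repeat([1, 0], [315, 42])   # 357 patients with small stones
success_large = np.repeat([1, 0], [247, 96])   # 343 patients with large stones

f_stat, f_p = f_oneway(success_small, success_large)
t_stat, t_p = ttest_ind(success_small, success_large)

print(f"F = {f_stat:.4f}, t^2 = {t_stat ** 2:.4f}")
print(f"ANOVA p-value = {f_p:.3e}, t-test p-value = {t_p:.3e}")
```

ANOVA adds information beyond the t-test only when three or more groups are compared.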


# Chi Square
- H0: Treatment success is independent of the combination of treatment type and stone size.
- H1: Treatment success depends on the combination of treatment type and stone size.

In [None]:
chi = pd.crosstab([df["treatment"], df["stone_size"]], [df["success"]])

result = chi2_contingency(chi)
print(result)

Chi2ContingencyResult(statistic=31.51349743413072, pvalue=6.626702248891721e-07, dof=3, expected_freq=array([[ 51.84857143, 211.15142857],
       [ 17.15142857,  69.84857143],
       [ 15.77142857,  64.22857143],
       [ 53.22857143, 216.77142857]]))
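
The `expected_freq` array and `dof` in the result can be reproduced by hand: each expected count is (row total × column total) / grand total, and the degrees of freedom are (rows − 1) × (columns − 1). The sketch below does this, assuming the observed counts reported in the 1986 study in the same row order as the crosstab (A large, A small, B large, B small); `observed` is a hypothetical name.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Assumption: observed counts as reported in the 1986 study.
# Rows: (A, large), (A, small), (B, large), (B, small). Columns: failure, success.
observed = np.array([
    [71, 192],
    [ 6,  81],
    [25,  55],
    [36, 234],
])

row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)

# Expected count under independence: row total * column total / grand total
expected = np.outer(row_totals, col_totals) / observed.sum()

# Pearson chi-square statistic and its degrees of freedom
statistic = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Cross-check against SciPy
scipy_stat, scipy_p, scipy_dof, scipy_expected = chi2_contingency(observed)

print(expected.round(4))
print(f"statistic = {statistic:.4f}, dof = {dof}, p-value = {scipy_p:.3e}")
```

A significant result here only says that success rates differ across the four treatment and stone size groups. It does not by itself say which treatment is better within a given stone size.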
