# About the dataset
Kidney Stone Method Prediction| Beginners
https://www.kaggle.com/datasets/utkarshxy/kidney-stone-data/data

Introduction
In 1986, a group of urologists in London published a research paper in The British Medical Journal that compared the effectiveness of two different methods to remove kidney stones. Treatment A was open surgery (invasive), and treatment B was percutaneous nephrolithotomy (less invasive).

When they looked at the results from 700 patients, treatment B had a higher success rate. However, when they looked separately at the subgroups of patients with small and large kidney stones, treatment A had a better success rate in each subgroup.

Simpson's paradox occurs when a trend that appears in every subgroup disappears or reverses when the subgroups are combined.
This project uses the medical data published in 1986 in "The British Medical Journal", in which the effectiveness of two kidney stone removal treatments (A - open surgery and B - percutaneous nephrolithotomy) was compared.
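The reversal can be checked with a few lines of arithmetic. The sketch below uses the subgroup counts commonly cited for the 1986 study (successes and patients per treatment and stone size). These counts are an assumption here, since the notebook only loads the CSV, although they agree with the group sizes implied by the chi-square output later in the notebook. The helper `success_rate` is a hypothetical name used for illustration.

```python
# Subgroup counts as commonly cited for the 1986 study (assumed, not read from the CSV):
# (successes, patients) per treatment and stone size
counts = {
    ("A", "small"): (81, 87),
    ("A", "large"): (192, 263),
    ("B", "small"): (234, 270),
    ("B", "large"): (55, 80),
}

def success_rate(treatment, size=None):
    """Success rate for a treatment, optionally restricted to one stone size."""
    rows = [v for (t, s), v in counts.items()
            if t == treatment and (size is None or s == size)]
    successes = sum(r[0] for r in rows)
    patients = sum(r[1] for r in rows)
    return successes / patients

# Compare A and B within each stone size, then overall
for size in ("small", "large"):
    print(size, round(success_rate("A", size), 3), round(success_rate("B", size), 3))
print("overall", round(success_rate("A"), 3), round(success_rate("B"), 3))
```

Under these counts, treatment A has the higher rate within each stone size, while treatment B has the higher rate overall, because A was used far more often on large stones, where both treatments succeed less often.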

The goal is to use multiple logistic regression and visualize the model output to help doctors determine whether there is a difference between the two treatments. While not required, some knowledge of inferential statistics will also help.

Content
The data contains three columns: treatment (A or B), stone_size (large or small) and success (0 = Failure or 1 = Success).

In [None]:
# Install SciPy, which provides the hypothesis tests used below
!pip install scipy



In [None]:
# Installing these libraries is not necessary on Google Colab
# !pip install pandas


### Import all the libraries ###
import pandas as pd
from scipy.stats import ttest_ind, f_oneway, chi2_contingency

In [None]:
#load the data
data = pd.read_csv("kidney_stone_data.csv")
df = pd.DataFrame(data)  # read_csv already returns a DataFrame, so this wrapper is optional

In [None]:
data.head(2)

Unnamed: 0,treatment,stone_size,success
0,B,large,1
1,A,large,1


In [None]:
df.tail(3)

Unnamed: 0,treatment,stone_size,success
697,B,small,1
698,A,large,1
699,A,small,1


In [None]:
df.shape

(700, 3)

In [None]:
df.dtypes

treatment     object
stone_size    object
success        int64
dtype: object

In [None]:
df.describe()

Unnamed: 0,success
count,700.0
mean,0.802857
std,0.398126
min,0.0
25%,1.0
50%,1.0
75%,1.0
max,1.0


## T-Test

A t-test is mostly used with numerical data.
Ho - Null: There is no significant difference between the success rates of Treatment A and Treatment B.

H1 - Alternate: There is a significant difference between the success rates of Treatment A and Treatment B.

If the p-value is less than 0.05, the Ho hypothesis should be rejected.
<br>
If the p-value is more than 0.05, we fail to reject the Ho hypothesis.

We are performing an independent 2-group t-test on one particular column - success
<br>

There are 3 types of t-test:
1) 1-sample t-test
We only have 1 sample (we focus on only 1 variable, in our case the success rate).
Here we compare the mean value of the sample with the population mean.
<br>
If the sample mean is close to the population mean, the sample represents the population well.
If there is a large difference between the sample mean and the population mean, a different sampling method should be considered.

2) independent 2-group t-test
Two independent groups are compared on one numerical variable, for example the success of Treatment A patients versus Treatment B patients.

3) paired sample t-test
Two measurements are taken on the same group.
Example: each student has 2 test scores: one test was taken before tuition and the other after tuition.
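The paired case can be worked by hand, because it reduces to a 1-sample t-test on the per-student differences. The sketch below uses only the standard library, and the scores are hypothetical values made up for illustration.

```python
import math
import statistics

# Hypothetical test scores for five students (illustration only)
before = [60, 65, 70, 58, 72]
after = [66, 70, 71, 63, 80]

# A paired t-test works on the per-student differences
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
se = statistics.stdev(diffs) / math.sqrt(n)   # standard error of the mean difference
t_stat = mean_diff / se                        # compared against a t distribution with n - 1 df

print(f"mean difference = {mean_diff}, t = {t_stat:.3f}, df = {n - 1}")
```

With SciPy installed, `scipy.stats.ttest_rel(after, before)` should report the same statistic together with a p-value.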

In [None]:
#Success rate of Treatment A and Treatment B
success_A = df[df['treatment'] == 'A']['success']
success_B = df[df['treatment'] == 'B']['success']

#Perform Independent Sample t-test
t_stat = ttest_ind(success_A, success_B)
print(t_stat)



TtestResult(statistic=-1.5204003013436962, pvalue=0.12886323855136153, df=698.0)
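To see where this statistic comes from, the pooled-variance formula that `ttest_ind` uses by default can be applied to the group counts alone. The counts below (273 of 350 successes for A, 289 of 350 for B) are the commonly cited totals for this study and are an assumption here, not values read from the DataFrame.

```python
import math

# Assumed group counts (commonly cited for this study): successes, patients
succ_a, n_a = 273, 350
succ_b, n_b = 289, 350

p_a, p_b = succ_a / n_a, succ_b / n_b

# Sample variance (ddof=1) of a 0/1 variable with mean p is n/(n-1) * p * (1-p)
var_a = n_a / (n_a - 1) * p_a * (1 - p_a)
var_b = n_b / (n_b - 1) * p_b * (1 - p_b)

# Pooled variance and standard error of the difference in means
df = n_a + n_b - 2
pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

t_stat = (p_a - p_b) / se
print(f"t = {t_stat:.4f}, df = {df}")
```

Under these assumed counts the result agrees with the statistic printed above. Since the p-value of about 0.129 is above 0.05, we fail to reject Ho on the combined data.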


# ANOVA
ANOVA is used when we compare the means of three or more sample groups. With only two groups, as here, it is equivalent to the independent t-test (F = t²).

In [None]:
# We use the ANOVA test to compare the mean success rate across groups
# Success rate of small and large stones
success_small = df[df['stone_size'] == 'small']['success']
success_large = df[df['stone_size'] == 'large']['success']

anova = f_oneway(success_small, success_large)
print(anova)

F_onewayResult(statistic=30.264409926452775, pvalue=5.2953760011433365e-08)
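The F statistic is the ratio of between-group to within-group variance. The sketch below computes it from group counts in plain Python. The counts (315 of 357 successes for small stones, 247 of 343 for large stones) are derived from the commonly cited study figures and are an assumption here.

```python
# Assumed counts per stone size (commonly cited for this study): successes, patients
groups = {"small": (315, 357), "large": (247, 343)}

total_succ = sum(s for s, _ in groups.values())
total_n = sum(n for _, n in groups.values())
grand_mean = total_succ / total_n

# Between-group sum of squares: group size times squared distance from the grand mean
ss_between = sum(n * (s / n - grand_mean) ** 2 for s, n in groups.values())
# Within-group sum of squares for a 0/1 variable with mean p: n * p * (1 - p)
ss_within = sum(n * (s / n) * (1 - s / n) for s, n in groups.values())

df_between = len(groups) - 1
df_within = total_n - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)
print(f"F = {f_stat:.3f}")
```

Under these assumed counts the value agrees with the `f_oneway` output above. The very small p-value indicates that stone size is strongly related to success.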


# Chi Square
A chi-square test is used when the data set contains categorical (string/object) values. It tests whether two categorical variables are independent of each other.

In [None]:
chi = pd.crosstab([df['treatment'], df['stone_size']], df['success'])

result = chi2_contingency(chi)
print(result)

Chi2ContingencyResult(statistic=31.51349743413072, pvalue=6.626702248891721e-07, dof=3, expected_freq=array([[ 51.84857143, 211.15142857],
       [ 17.15142857,  69.84857143],
       [ 15.77142857,  64.22857143],
       [ 53.22857143, 216.77142857]]))
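The `expected_freq` array holds the counts we would expect if success were independent of the treatment and stone-size group. A minimal sketch of the computation in plain Python follows. The observed counts are assumed from the commonly cited study figures, with rows in the same order as the crosstab (A/large, A/small, B/large, B/small) and columns ordered as failures then successes.

```python
# Observed counts [failures, successes]; assumed from the commonly cited study figures
observed = {
    ("A", "large"): [71, 192],
    ("A", "small"): [6, 81],
    ("B", "large"): [25, 55],
    ("B", "small"): [36, 234],
}

row_totals = {k: sum(v) for k, v in observed.items()}
col_totals = [sum(v[j] for v in observed.values()) for j in range(2)]
grand_total = sum(col_totals)

# Expected count under independence: row total * column total / grand total
expected = {k: [row_totals[k] * c / grand_total for c in col_totals] for k in observed}

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells
chi2 = sum((o - e) ** 2 / e
           for k in observed
           for o, e in zip(observed[k], expected[k]))
dof = (len(observed) - 1) * (len(col_totals) - 1)
print(f"chi2 = {chi2:.3f}, dof = {dof}")
```

Under these assumed counts, the statistic and the expected frequencies agree with the `chi2_contingency` output above.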


In [None]:
#### DATA STRUCTURE #######################
#To print the sum of 2 numbers
a=10
b=20
print("The Sum:", (a+b))

The Sum: 30


# Operators in Python

### Arithmetic Operator

In [None]:
# Addition
x = 5 + 8
print(x)

# Subtraction
x = 8 - 5
print(x)

# Multiplication
x = 8*5
print(x)

# Division
x = 16/3
print(x)

# Modulus
x = 4 % 2
print(x)

# Floor Division
x = 5//2
print(x)

13
3
40
5.333333333333333
0
2
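Floor division and modulus are two halves of the same operation, which the following short sketch illustrates. The variable names are chosen for illustration only.

```python
a, b = 16, 3

quotient = a // b      # floor division: whole number of times b fits into a
remainder = a % b      # modulus: what is left over

# The two results always recombine to the original number
print(quotient * b + remainder == a)

# divmod returns both at once
print(divmod(a, b))

# With a negative operand, // rounds down (toward negative infinity), not toward zero
print(-16 // 3, -16 % 3)
```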


### Comparison Operator

In [None]:
#### Equal to (==)
5 == 4

False

In [None]:
#### Not Equal to (!=)
4 != 4

False

In [None]:
#### Greater than (>)
5 > 2

True

In [None]:
#### Less than (<)
4 < 7

True

In [None]:
#### Greater than or equal to (>=)
5 >= 3

True

In [None]:
#### Less than or equal to (<=)
5 <= 3

False

### Assignment Operator

In [None]:
# Assign (= )
x = 5
print(x)

5


In [None]:
# Add and assign
x += 2
print(x)

7


In [None]:
# Subtract and assign
x -= 3
print(x)

4


In [None]:
# Multiply and assign
x *= 4
print(x)

16


In [None]:
# Divide and assign
x /= 2
print(x)

8.0


In [None]:
# Floor division and assign
x //= 4
print(x)

2.0


In [None]:
# Modulus and assign
x %= 4
print(x)

2.0


### Logical Operators

In [None]:
# and
(4>5) and (3<6)

False

In [None]:
# or
(4>5) or (3<6)

True

In [None]:
# not
not(4>5)

True

## Data structure
A data structure is a way of organising and storing data so that it can be accessed and worked with efficiently.

### Built-in data structures
Python has several built-in data structures.
In the ordered ones (list, tuple, string), the position of elements starts from 0.
Suppose we want to select the first 7 elements: for this we write the range 0:7, which covers positions 0 to 6.
The end position is always excluded, so 0:6 would leave out the 7th element.
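A short sketch of zero-based positions and the excluded end of a range, using a hypothetical list of letters:

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

first = letters[0]          # positions start at 0
first_seven = letters[0:7]  # positions 0..6; position 7 is excluded
print(first, first_seven, len(first_seven))
```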


### 1. List
A list is a data structure that:
1. holds a sequence of elements in **ordered** format
2. is **mutable** - once it's defined it can be changed at any point of time
3. can contain **elements of different data types**
4. is always created with **square brackets []**

In [None]:
# creating list
my_list = [1,2,3, 'a', 'b', 'c']

In [None]:
# access element by its position
my_list[5]

'c'

In [None]:
# doing modification of the specific element
my_list[2] = 5
print(my_list)

[1, 2, 5, 'a', 'b', 'c']


In [None]:
# append function - append the element to the end
my_list.append('cat')
print(my_list)

[1, 2, 5, 'a', 'b', 'c', 'cat']


In [None]:
# remove function - remove the first occurrence of the element
my_list.remove('a')
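The cell above prints nothing, so here is a small self-contained sketch that shows the list after `remove`, together with two other common list methods, `insert` and `pop`. The list is re-created so the example does not depend on earlier cells.

```python
my_list = [1, 2, 5, 'a', 'b', 'c', 'cat']

my_list.remove('a')          # removes the first occurrence of 'a'
my_list.insert(0, 'start')   # inserts an element at a given position
last = my_list.pop()         # removes and returns the last element

print(my_list, last)
```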

### 2. Tuple
A tuple is a data structure that:
1. holds a sequence of elements in **ordered** format
2. is **immutable** - once it's defined it cannot be changed; no modification of any kind is possible
   ideal for storing data that should not be modified, such as database records
3. can contain **elements of different data types**
4. is always created with **parentheses ()**

In [None]:
# create a tuple:
my_tuple = (1, 2, 3, 'a', 'b', 'c')

In [None]:
my_tuple[5]

'c'

In [None]:
# a tuple cannot be changed - item assignment raises a TypeError
my_tuple[2] = 5
print(my_tuple)

TypeError: 'tuple' object does not support item assignment

In [None]:
# append function - a tuple has no append method, so this raises an error
my_tuple.append('cat')

AttributeError: 'tuple' object has no attribute 'append'


In [None]:
# remove function - a tuple cannot be modified, so it has no remove method
my_tuple.remove('a')

AttributeError: 'tuple' object has no attribute 'remove'
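Because a tuple cannot be modified in place, "changing" it means building a new tuple from the old one. A minimal sketch, with variable names chosen for illustration:

```python
my_tuple = (1, 2, 3, 'a', 'b', 'c')

# Concatenation creates a new tuple; the original stays untouched
extended = my_tuple + ('cat',)
replaced = my_tuple[:2] + (5,) + my_tuple[3:]

# Unpacking assigns elements to variables by position
first, second, *rest = my_tuple
print(extended, replaced, first, second, rest)
```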

### 3. Dictionary
A dictionary is a data structure that:
1. stores **key-value pairs**
2. keeps its items in **insertion order** (since Python 3.7), but items are accessed by key, not by position
3. is **mutable** - once it's defined it can be changed at any point of time
4. is **indexed by keys**; the keys must be **immutable** and **unique**, while values can be repeated
   <br>an ID card is a real-life example of a dictionary: each field name (key) maps to one value
5. is created with **curly brackets {}**

In [None]:
# Create Dictionary
my_dict = {'name':"Ryan",
           'age': 36,
           'gender': 'Male'
          }

In [None]:
# Access a value by its key
my_dict['name']

'Ryan'

In [None]:
# Modify a value by its key
my_dict['age'] = 25

In [None]:
print(my_dict)

{'name': 'Ryan', 'age': 25, 'gender': 'Male'}


In [None]:
# add a key-value pair by redefining the whole dictionary
my_dict = {'name':"Ryan",
           'age': 36,
           'gender': 'Male',
           'marital status':"married"
          }

In [None]:
print(my_dict)

{'name': 'Ryan', 'age': 36, 'gender': 'Male', 'marital status': 'married'}
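Redefining the dictionary works, but a new pair can also be added in place by assigning to a new key. The sketch below shows that, plus `.get` for safe lookups and `.items()` for iteration. The `'city'` key is a hypothetical example of a missing key.

```python
my_dict = {'name': 'Ryan', 'age': 36, 'gender': 'Male'}

my_dict['marital status'] = 'married'   # assigning to a new key adds the pair in place
city = my_dict.get('city', 'unknown')   # .get returns a default instead of raising KeyError

# Iterate over key-value pairs in insertion order
for key, value in my_dict.items():
    print(key, '->', value)
```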


### 4. Set

A set is a data structure used to store multiple items in a single variable. It:
1. holds **unordered unique** elements
2. is **mutable** - once it's defined it can be changed at any point of time, but the **elements** themselves must be **immutable**
3. can be created with **curly brackets {}**


In [None]:
## creating the set
my_set={1,2,3, 'a', 'b'}

In [None]:
## adding the element to the set
my_set.add('c')

In [None]:
print(my_set)

{1, 2, 3, 'c', 'b', 'a'}


In [None]:
## removing the element from the set
my_set.remove('c')

In [None]:
print(my_set)

{1, 2, 3, 'b', 'a'}
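Uniqueness and the usual set operations can be sketched as follows. The sets `numbers` and `evens` are hypothetical, and the printed order of a set may differ between runs.

```python
numbers = {1, 2, 2, 3, 3, 3}   # duplicates are dropped
evens = {2, 4, 6}

print(numbers)
print(numbers | evens)   # union
print(numbers & evens)   # intersection
print(numbers - evens)   # difference
print(2 in numbers)      # membership test

numbers.discard(99)      # unlike remove(), discard() does not raise if the element is absent
```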


### 5. String
A string is a data structure that:
1. holds a **collection of** letters, words or other **characters**
2. is **immutable** - once it's defined it cannot be changed
3. is created with single (') or double (") quotes


In [None]:
# Creating string
string = 'Hello'

In [None]:
# Accessing element in the string
string[0]

'H'

In [None]:
# Performing slicing - accessing elements in a specific range
string[0:5]
## here we are accessing all 5 elements in the string, positions 0 to 4;
## position 5 (the end of the range) is not included

'Hello'
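Immutability means that item assignment fails, while methods and concatenation return a new string. A minimal sketch, with variable names chosen for illustration:

```python
string = 'Hello'

try:
    string[0] = 'J'            # strings do not support item assignment
except TypeError as err:
    print('TypeError:', err)

# Methods and concatenation build a new string; the original is unchanged
jello = 'J' + string[1:]
upper = string.upper()
print(string, jello, upper)
```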

### 6. Queues
A queue is a simple data structure that allows us to store and retrieve data sequentially.
It is based on the FIFO rule: first in, first out.
It is available from the `queue` module of the Python standard library.
<br>
https://www.simplilearn.com/tutorials/python-tutorial/queue-in-python

In [None]:
## the queue module is part of the Python standard library, so no installation is needed
## (queuelib is a different package and is not used here)



In [None]:
# creating Queue
from queue import Queue
q = Queue(maxsize = 3)

In [None]:
# print q size
print(q.qsize())

0
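The FIFO rule can be seen by putting a few items into the queue and taking them out again. This sketch uses hypothetical item names. Note that `put` on a full queue and `get` on an empty queue block by default.

```python
from queue import Queue

q = Queue(maxsize=3)
for item in ['first', 'second', 'third']:
    q.put(item)            # add to the back of the queue

print(q.qsize(), q.full())

served = [q.get() for _ in range(3)]   # items come out in insertion order
print(served, q.empty())
```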


### 7. Arrays
An array is a container that stores a collection of items at contiguous memory locations.

1. It is an ordered collection of elements in which every value has **the same data type**, such as integer, float or character.

2. It is mutable - we can update the array at any point in time: add an element, delete an element, or replace the value at a particular index.

https://www.scaler.com/topics/array-in-python/
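The same-type rule is what separates an array from a list. A minimal sketch using the standard `array` module follows. The exact wording of the error message varies between Python versions, so only the exception type is relied on here.

```python
import array as arr

int_arr = arr.array('i', [20, 35, 55])   # 'i' = signed integers

try:
    int_arr.append(1.5)                  # a float does not fit an integer array
except TypeError as err:
    print('TypeError:', err)

mixed = [20, 35.5, 'text']               # a list accepts mixed types
print(int_arr.typecode, int_arr.itemsize, len(mixed))
```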

In [None]:
# Creating array #############################

# For creating an array in Python, we need to import the array module.
import array as arr

# After importing, the array module we just have to use the array function
# which is used to create the arrays in Python

my_arr = arr.array('d', [20, 35, 55, 65])

print(my_arr)


array('d', [20.0, 35.0, 55.0, 65.0])


In [None]:
# Adding Element to Array #############################
# We can use 2 different methods to add elements to an existing array.
# 1. .append(): adds a single element at the end of the existing array.
# 2. .extend(): adds all elements of an iterable at the end of the existing array

# use of append function to add the element
my_arr.append(77)
print('After use of append(), updated array is:', my_arr)

# use of extend function to add the list of element
my_arr.extend([1, 2, 3, 4])
print('After use of extend(), updated array is:', my_arr)

After use of append(), updated array is: array('d', [20.0, 35.0, 55.0, 65.0, 77.0])
After use of extend(), updated array is: array('d', [20.0, 35.0, 55.0, 65.0, 77.0, 1.0, 2.0, 3.0, 4.0])


In [None]:
# Accessing Elements from Array in Python #############################

my_arr_2element = my_arr[2]

print('Element at 2nd index of array is: ', my_arr_2element)


Element at 2nd index of array is:  55.0


In [None]:
# Removing Elements from Array #############################
# We can use 2 methods to remove elements from the array.
# 1. .pop() - removes and returns the element at the last index of the array
# 2. .remove() - takes an element
# as a parameter and removes the first occurrence of that element.

my_arr.remove(55)

print('This shows that element 55 is removed from the array. Updated Array is: ', my_arr)


my_arr.pop()

print('This shows that element which is at the last index of array removed. Updated Array is: ', my_arr)


This shows that element 55 is removed from the array. Updated Array is:  array('d', [20.0, 35.0, 65.0, 77.0, 1.0, 2.0, 3.0, 4.0])
This shows that element which is at the last index of array removed. Updated Array is:  array('d', [20.0, 35.0, 65.0, 77.0, 1.0, 2.0, 3.0])


In [None]:
# Searching Elements from Array in Python #############################
# .index() which returns the index of the first occurrence of that element.

import array as arr

myArr = arr.array('i', [20, 35, 55, 65])

indexOfSearchedElement = myArr.index(35)

print('index of element 35 in array is: ', indexOfSearchedElement)


index of element 35 in array is:  1


In [None]:
# Updating Elements in Array in Python #############################
# re-assign a new value
import array as arr

myArr = arr.array('i', [20, 35, 55, 65])

print('array before updating the value: ', myArr)

myArr[2] = 100

print('array after updating the 2nd index of the array: ',myArr)



array before updating the value:  array('i', [20, 35, 55, 65])
array after updating the 2nd index of the array:  array('i', [20, 35, 100, 65])


In [None]:
# Slicing (shown here on lists; array.array objects slice the same way) #############################
my_arr = [1, 2, 3, 4, 5]
sliced_arr = my_arr[1:4]  # Extract elements from index 1 to 3 (index 4 is excluded)
print(sliced_arr)

my_arr2 = [1, 2, 3, 4, 5]
sliced_arr2 = my_arr2[::2]  # Extract elements at even indices (0, 2, 4)
print(sliced_arr2)



[2, 3, 4]
[1, 3, 5]


In [None]:
# Counting Elements in an Array (shown here on a list; len() works on arrays too)
my_array = [1, 2, 3, 4, 5]
count = len(my_array)
print(count)

5
