#### **UTI Diagnosis Prediction Model**

Cameron Presley
cameron@cameron-presley.com

**Overview**

This notebook explores different ML modeling approaches to predict the presence of a Urinary Tract Infection (UTI) from patients' urinalysis results. Features include patient age and gender; urine characteristics such as color, transparency, pH, and specific gravity; and the concentration of glucose in the urine sample. Other features include White Blood Cell (WBC) and Red Blood Cell (RBC) counts, epithelial cells, mucous threads, amorphous urates, and the absence or presence of bacteria.

**Content**

1. Import libraries and other important formatting as appropriate
2. Exploratory Data Analysis (EDA) and observations - Univariate, Bivariate, Multivariate Analysis
3. Data cleaning and pre-processing (as required)
4. Model building and tuning
5. Model performance and evaluation
6. Observations and insights
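
As a rough illustration of the model building and evaluation steps listed above, the sketch below runs a scaled logistic regression on synthetic data. It is only a minimal outline under stated assumptions: the data comes from `make_classification` (not the urinalysis dataset), the 80/20 stratified split and `max_iter=1000` are arbitrary choices, and the models built later in this notebook may differ.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 500 rows, 6 numeric features, binary target.
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=4, n_redundant=0, random_state=42
)

# Hold out 20% of the rows for testing, keeping the class ratio the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling lives inside the pipeline, so the scaler is fitted on training data only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Class predictions and positive-class probabilities for the held-out rows.
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]

print(classification_report(y_test, y_pred, zero_division=0))
print("ROC AUC:", roc_auc_score(y_test, y_prob))
```

Wrapping the scaler and classifier in one pipeline is a design choice that keeps test-set statistics out of the fitted scaler, which matters when the same object is later passed to cross-validation helpers.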



**Data Source**

This dataset was collected from a local clinic in Northern Mindanao, Philippines, between April 2020 and January 2023, in fulfillment of the researchers' capstone, "Optimizing UTI Diagnosis with Machine Learning and Artificial Neural Network for Reducing Misdiagnoses" by Agdeppa et al. (2023). The research paper can be requested from: kristianrogeragdeppa@gmail.com

https://www.kaggle.com/datasets/avarice02/urinalysis-test-results

### 1. Import libraries and format settings

In [6]:
# suppress all warnings (note: this also hides deprecation notices)
import warnings
warnings.filterwarnings('ignore')

# import libraries for working with data: arrays, linear algebra, plotting, etc.
import numpy as np
import matplotlib.pyplot as plt 
import pandas as pd 
import seaborn as sns   # for plotting and visualizing data

# import libraries for machine learning
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.metrics import precision_recall_curve


# import libraries for deep learning
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.models import load_model
from tensorflow.keras import regularizers
from tensorflow.keras import initializers
from tensorflow.keras import activations
from tensorflow.keras import optimizers
from tensorflow.keras import losses
from tensorflow.keras import metrics

# import libraries for evaluating the model
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import r2_score
from sklearn.metrics import mean_squared_log_error
from sklearn.metrics import explained_variance_score
from sklearn.metrics import max_error
from sklearn.metrics import mean_poisson_deviance
from sklearn.metrics import mean_gamma_deviance
from sklearn.metrics import mean_tweedie_deviance



In [9]:
data = pd.read_csv('/Users/cameronpresley/DS_Portfolio/bio_informatics_ml_models/uti_prediction_model/urinalysis_tests.csv')
df = data.copy()
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1436 entries, 0 to 1435
Data columns (total 16 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Unnamed: 0        1436 non-null   int64  
 1   Age               1436 non-null   float64
 2   Gender            1436 non-null   object 
 3   Color             1435 non-null   object 
 4   Transparency      1436 non-null   object 
 5   Glucose           1436 non-null   object 
 6   Protein           1436 non-null   object 
 7   pH                1436 non-null   float64
 8   Specific Gravity  1436 non-null   float64
 9   WBC               1436 non-null   object 
 10  RBC               1436 non-null   object 
 11  Epithelial Cells  1436 non-null   object 
 12  Mucous Threads    1436 non-null   object 
 13  Amorphous Urates  1436 non-null   object 
 14  Bacteria          1436 non-null   object 
 15  Diagnosis         1436 non-null   object 
dtypes: float64(3), int64(1), object(12)
memory