# Classifier Model to Predict PONV
> Detecting Post Operative Nausea and Vomiting before it happens

- toc: true 
- badges: true
- comments: true
- categories: [jupyter]

## Background

"I thought you just finished your first project? What're you stressing about?" my wife asked at dinner. Little did she know that I felt unbelievable pressure to come up with a meaningful topic since I had selected a trivial one for the first of my four bootcamp projects. I tried to play it cool though and casually threw the ball back in her court. "Yeah, I'm thinking ahead to my next project. Is there anything at work where predictive analytics might help?"  

My wife, who works in the medical field, started listing off the common medical classification problems you typically see on Kaggle. I was on a different wavelength though. Being on a Freakonomics podcast kick, I wanted to solve a problem that was under the radar: something on a smaller scale that people overlooked but that, if solved, could result in substantial change for the better.

My next question would lead to my project 3 topic. **"Hey, you seem to talk with your patients a lot about nausea. What's that all about?"**

**Post Operative Nausea and Vomiting (PONV)** is the phenomenon of nausea, vomiting, or retching experienced by a patient in the postanesthesia care unit (PACU) or within 24 hours following a surgical procedure. The average incidence of PONV after general anesthesia is about 30% across all post-surgical patients but up to 80% in high-risk patients {% fn 1 %}. Getting a little motion sickness after an operation may not sound like a big deal, but according to surveys, patients would rather be in pain than deal with nausea. To complicate things further, physicians and hospitals are nowadays reimbursed by insurance companies based on patient satisfaction surveys, so when a patient gets PONV, it means not only additional resources spent helping the patient recover but also a hit to the hospital's bottom line.

![](https://prccustomresearch.com/wp-content/uploads/2019/Blog_Images/GettyImages-157418952-1200x627.jpg)

Anesthesiologists are the doctors typically tasked to deal with PONV and many of them use the simplified scoring system in the chart below. Surely there's a way to use more data and make this a more precise scoring system if it's that important.

![](../images/PONV/apfel_score.jpg)
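
For readers who can't see the chart, here is a minimal sketch of the simplified Apfel score as it is commonly published: one point for each of four risk factors, with an approximate PONV risk attached to each total. The function and argument names are made up for illustration, and the percentages are the commonly cited values, assumed here to match the chart above.

```python
# Approximate PONV risk for 0-4 Apfel risk factors (commonly cited values).
APFEL_RISK = {0: 0.10, 1: 0.21, 2: 0.39, 3: 0.61, 4: 0.79}


def apfel_score(
    female: bool,
    non_smoker: bool,
    history_ponv_or_motion_sickness: bool,
    postop_opioids: bool,
) -> int:
    """Count how many of the four simplified Apfel risk factors are present."""
    return sum([female, non_smoker, history_ponv_or_motion_sickness, postop_opioids])


def apfel_risk(score: int) -> float:
    """Look up the approximate PONV risk for a given score."""
    return APFEL_RISK[score]


score = apfel_score(
    female=True,
    non_smoker=True,
    history_ponv_or_motion_sickness=False,
    postop_opioids=True,
)
print(score, apfel_risk(score))  # 3 0.61
```

Four yes/no questions and a lookup table: that simplicity is exactly what any richer model has to compete with.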

## Did I just perform ETL?

### Step 1: Clean the data

I would've loved to do more healthcare-related data science projects, but the primary hurdle was access to data. Fortunately for this subject, I was able to find some raw data that was made available with a research paper written by a physician at the University of Sao Paulo {% fn 2 %}. While I was grateful to have the data, it wasn't the cleanest. A lot of information was missing and some of it didn't make any sense. Example:

**Patient 2**  
**Smoker?:** No  
**Last cigarette?** 2 weeks ago  

I had to make judgment calls in cases like this. Here, I assumed that if someone made an error, it was more likely in ticking the wrong smoker box than in accidentally writing that their last cigarette was 2 weeks ago, so I treated the patient as a smoker.
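
A judgment call like the one above can be written down as an explicit rule so it gets applied the same way to every row. This is only a sketch: the field values (`"Yes"`/`"No"`, free-text last-cigarette answers) and the set of "no answer" tokens are assumptions, not the actual encoding used in the dataset.

```python
from typing import Optional

# Assumed tokens that mean "no last-cigarette answer was given".
NO_ANSWER = {"", "never", "n/a"}


def _is_blank(value: Optional[str]) -> bool:
    return value is None or value.strip().lower() in NO_ANSWER


def reconcile_smoker(smoker: Optional[str], last_cigarette: Optional[str]) -> Optional[bool]:
    """Return a cleaned smoker flag.

    A filled-in last-cigarette answer is trusted over the yes/no field,
    because a wrong tick seems likelier than an invented date.
    """
    if not _is_blank(last_cigarette):
        return True
    if smoker is None or not smoker.strip():
        return None  # genuinely missing; leave for later handling
    return smoker.strip().lower() in {"yes", "y"}


# "Patient 2" from the example above: marked non-smoker, last cigarette 2 weeks ago.
print(reconcile_smoker("No", "2 weeks ago"))  # True
```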

### Step 2: Put it in the cloud

One of the requirements for the assignment was to set up a SQL database on an AWS EC2 machine. While I had gotten somewhat comfortable running queries, creating a table with over 60 features as my first table ever was a bit challenging. Of course I had a near meltdown when it finally ran and the output came out like this:
![](../images/IMG_2747.JPG)

Word to the wise: use a SQL client. It turned out that's just what things look like on the command line sometimes.
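
Another way to take some of the pain out of a 60-plus-column table is to generate the `CREATE TABLE` statement from a mapping of column names to types instead of typing it by hand. The sketch below uses Python's built-in `sqlite3` as a stand-in for the database on EC2, and the table name and columns are hypothetical, not the ones from the actual project.

```python
import sqlite3

# Hypothetical subset of columns; the real table had more than 60 features.
COLUMNS = {
    "patient_id": "INTEGER PRIMARY KEY",
    "age": "INTEGER",
    "female": "INTEGER",
    "smoker": "INTEGER",
    "postop_opioids": "INTEGER",
    "ponv": "INTEGER",
}


def create_table_sql(table: str, columns: dict) -> str:
    """Build a CREATE TABLE statement from a {column_name: sql_type} mapping."""
    column_defs = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return f"CREATE TABLE {table} (\n    {column_defs}\n);"


conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
conn.execute(create_table_sql("patients", COLUMNS))
conn.execute(
    "INSERT INTO patients (age, female, smoker, postop_opioids, ponv) VALUES (?, ?, ?, ?, ?)",
    (54, 1, 0, 1, 1),
)

# Selecting a few columns at a time keeps the output readable in a terminal.
rows = conn.execute("SELECT patient_id, age, ponv FROM patients").fetchall()
print(rows)  # [(1, 54, 1)]
```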

### Step 3: Never tell me the odds... from a tree

Getting a true probability out of the model was important, so my model choices came down to either Naive Bayes or Logistic Regression.
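
As a reminder of why logistic regression fits that requirement: it passes a weighted sum of the features through the sigmoid function, so the output is a number between 0 and 1 that can be read as a probability. The sketch below shows only that step; the feature names, coefficients, and intercept are made up for illustration and were not fitted to the PONV data.

```python
import math


def predict_proba(features: dict, coefficients: dict, intercept: float) -> float:
    """Logistic regression prediction: sigmoid of a weighted sum of features."""
    log_odds = intercept + sum(coefficients[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-log_odds))


# Made-up numbers purely for illustration; not fitted to any real data.
COEFFICIENTS = {"female": 0.8, "non_smoker": 0.6, "postop_opioids": 0.7}
INTERCEPT = -2.0

patient = {"female": 1, "non_smoker": 1, "postop_opioids": 0}
probability = predict_proba(patient, COEFFICIENTS, INTERCEPT)
print(f"Predicted PONV probability: {probability:.2f}")
```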

In terms of metrics, I first optimized for ROC AUC and then tuned the classification threshold to get the best F1. It was important to minimize both false positives and false negatives, but quantifying which of the two mattered more was difficult.

- **False-Positives**: On the surface, it might be easy to say that having more false positives is the lesser of two evils. But aggressive treatment means giving anti-nausea drugs to patients who don't need them, and those drugs come with side effects. There's a value to patient health, and trying to quantify it is rarely straightforward.  
- **False-Negatives**: If we're only looking 10 feet ahead of us, figuratively, there are a lot more short-term consequences to not handling a high-risk PONV patient appropriately.
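
Tuning for F1 amounts to sweeping the probability cutoff and keeping the one that balances the two error types best. Here is a minimal pure-Python sketch of that idea on toy numbers; the labels and probabilities are invented, and a real project would more likely lean on a library such as scikit-learn.

```python
def f1_at_threshold(y_true, y_prob, threshold):
    """F1 score when every probability >= threshold is called positive."""
    tp = fp = fn = 0
    for truth, prob in zip(y_true, y_prob):
        predicted = prob >= threshold
        if predicted and truth:
            tp += 1
        elif predicted and not truth:
            fp += 1
        elif truth:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_f1_threshold(y_true, y_prob):
    """Try each distinct predicted probability as the cutoff; keep the best F1."""
    candidates = sorted(set(y_prob))
    return max(candidates, key=lambda t: f1_at_threshold(y_true, y_prob, t))


# Toy data: 1 = patient had PONV, 0 = patient did not.
y_true = [0, 0, 1, 0, 1, 1]
y_prob = [0.1, 0.3, 0.35, 0.6, 0.7, 0.9]

threshold = best_f1_threshold(y_true, y_prob)
print(threshold)  # 0.35
```

If false negatives were judged costlier than false positives, the same loop could maximize an F-beta score with beta greater than 1 instead of plain F1.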

## Incremental Gain

Although my model performed better on the test population, the improvement was incremental, and I would probably have to do some hypothesis testing before I could explicitly say that the model is a superior substitute.

![](../images/test_scores.png)
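
One possible way to check whether a gap like this is more than noise is a paired bootstrap: resample the test patients with replacement many times and see how often the model still beats the baseline. The sketch below compares accuracy for simplicity (the same idea works for other metrics); the function name and toy predictions are hypothetical, and this is not a test that was actually run for this project.

```python
import random


def bootstrap_accuracy_gain(y_true, model_pred, baseline_pred, n_resamples=2000, seed=0):
    """Approximate 95% percentile interval for accuracy(model) - accuracy(baseline)."""
    rng = random.Random(seed)
    n = len(y_true)
    # Per-patient gain: +1 if only the model is right, -1 if only the baseline is, else 0.
    gains = [int(m == t) - int(b == t) for t, m, b in zip(y_true, model_pred, baseline_pred)]
    diffs = []
    for _ in range(n_resamples):
        sample = [gains[rng.randrange(n)] for _ in range(n)]
        diffs.append(sum(sample) / n)
    diffs.sort()
    lower = diffs[int(0.025 * n_resamples)]
    upper = diffs[int(0.975 * n_resamples) - 1]
    return lower, upper


# Toy example with invented predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
model_pred = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]
apfel_pred = [1, 0, 0, 0, 0, 1, 1, 0, 1, 1]

low, high = bootstrap_accuracy_gain(y_true, model_pred, apfel_pred)
print(f"95% interval for accuracy gain: [{low:.2f}, {high:.2f}]")
```

If the interval comfortably excludes zero, the gain is unlikely to be a fluke of which patients landed in the test set; if it straddles zero, the two methods are hard to tell apart on this data.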

The bigger issue though is that the Apfel test is so much simpler and yet it performed just as well. Unless the model can be improved to the point that it is substantively better than something like the Apfel scoring method, it wouldn't make sense for physicians to input values for nearly 5x the features.

![](../images/Streamlit_PONV.png "A snapshot of the app I made via Streamlit")

### Thoughts or Inputs?

If you have thoughts on how I might change my approach to the PONV problem, I'm all ears!

{{ 'Choi SU. Is postoperative nausea and vomiting still the big "little" problem?. Korean J Anesthesiol. 2016;69(1):1-2. doi:10.4097/kjae.2016.69.1.1' | fndetail: 1 }}  
{{ 'Guimaraes, Gabriel (2018), "PONV risk factors in onchological surgery", Mendeley Data, v1. http://dx.doi.org/10.17632/gsnj8vmgm2.1' | fndetail: 2 }}