In [27]:
import datetime

import numpy as np
import pandas as pd
import pyodbc
import sklearn.utils  # provides shuffle(), used when building the splits

import matplotlib.pyplot as plt

## Fetch data from database
When using the synthetic dataset, the data is loaded from (and stored to) CSV so that the methodology can be shown.  
When using the real dataset, the data is fetched from a SQL server.  
   
The dataset contains 2M journal entries for Swedish patients.   
The goal is to detect and potentially predict a specific case of adverse events, namely _fall injuries_.  
For 2016, the journal texts have been annotated by a domain expert as either a fall injury (label `1`) or not a fall injury (label `0`).  
There are a total of **172,749** annotated journal entries for 2016. Of those, **302** are confirmed fall injuries.  

In the original dataset there are 5508 unique patients; in the synthetic one there are 6258.  
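
To make the class imbalance concrete, the figures quoted above can be turned into a prevalence and a negatives-per-positive ratio. This is only a small arithmetic sketch; it assumes that the annotated count reads as 172,749 entries and that every annotated entry that is not a confirmed fall injury counts as a negative.

```python
# Figures quoted above for the annotated 2016 journal entries.
n_annotated = 172_749      # assumption: "172.749" is a thousands-separated count
n_fall_injuries = 302

# Share of annotated entries that are confirmed fall injuries.
prevalence = n_fall_injuries / n_annotated

# How many non-fall entries exist for every fall-injury entry.
negatives_per_positive = (n_annotated - n_fall_injuries) / n_fall_injuries

print(f"Fall-injury prevalence: {prevalence:.4%}")
print(f"Negatives per positive: {negatives_per_positive:.0f}")
```

With well under one percent positives, plain accuracy says little about a model here, which is why the later cells undersample the negative class.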
   
### The different datasets
There are two different datasets created for the project.  
1. **Dataset for finding previous fall injuries**     
This is the first dataset covered in the notebook.    
The goal for that dataset is to find out whether fall injuries can be identified from the free text of the journal entries alone and, if so, which words the model relies on.
2. **Dataset for predicting future fall injuries**    
This dataset is the second one covered in the notebook and presented in the final cells.    
The goal for that dataset is to use the journal text, or potentially more features from the _electronic health record_ (EHR), to predict, based only on previous information, which patients are at risk of a fall injury.




In [10]:
"""
db = pyodbc.connect('Driver={SQL Server};'
                   	'Server=LTCIDD;'
                   	'Database=RH_SMS;'
                   	'Trusted_Connection=yes;')
query = 'exec dbo.usp_GetNotes {}, {}'.format(1,1)
chunk = pd.read_sql(query, db)
chunk.shape
d = chunk
"""

pass # remove me if querying from the SQL Server

In [20]:
# Load the synthetic dataset
chunk = pd.read_csv('../data/synthethetic_data_medical.csv')
chunk.head()

Unnamed: 0,omv_pk,Patient_ID,Inpatient_Admissiondatetime,Inpatient_Departure,omvantDT,sokkod,omvtext_concat,Class_2016,Patient_Gender,Patient_Age
0,2714_known,1,2010-12-28 09:00:00.000,2011-01-03 15:00:00.000,2010-12-28 17:00:00.000,known,"Cool book, fast shipping, great condition!*****",,F,77
1,2714_closest,1,2010-12-28 09:00:00.000,2011-01-03 15:00:00.000,2010-12-29 04:00:00.000,closest,"thoughtful, informative, must read more than once",,F,77
2,2714_pending,1,2010-12-28 09:00:00.000,2011-01-03 15:00:00.000,2010-12-29 18:00:00.000,pending,GREAT! Arrived within a few days...brand new!...,,F,77
3,2714_beneficial,1,2010-12-28 09:00:00.000,2011-01-03 15:00:00.000,2010-12-30 07:00:00.000,beneficial,A couple of months ago I decided to run a test...,,F,77
4,2714_sagem,1,2010-12-28 09:00:00.000,2011-01-03 15:00:00.000,2010-12-30 12:00:00.000,sagem,This is a &#34;time honored&#34; version. Exc...,,F,77


### Count number of journal entries in total
- Counts the total number of journal entries
- Also counts the number of entries per class, i.e. how many entries were or were not labelled as a fall injury.

In [34]:
len(chunk) # Number of patient journals in total

2351348

In [65]:
chunk['Class_2016'].value_counts()

0.0    301277
1.0    302   
Name: Class_2016, dtype: int64

# Dataset for finding previous fall injuries
## Create New Columns
New columns are created for faster searching, filtering, etc.
- Insert the `year` column 
- Insert an ID column, `id`, that is unique for each journal entry rather than for each patient

In [83]:
dd = chunk.copy(deep=True)
col_name = "Inpatient_Admissiondatetime"

# Position for the new column: two places after the admission datetime column.
# This has to be computed before the insert below uses it.
insert_idx = [idx+2 for idx, col in enumerate(dd.columns) if col == col_name][0]

# Insert the new columns
years = [str(y).split('-')[0] for y in dd[col_name].values]
dd.insert(insert_idx, "year", years, True)
dd.insert(0, "id", list(dd.index), True)

df_2017 = dd[dd.year == "2017"]
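
The same two columns can also be derived without a Python-level loop. The sketch below is an illustration on a toy frame, not the notebook's own code: the `toy` frame and its three timestamps are made up, and the year is taken as the text before the first `-`, which also works for the non-zero-padded months that appear in the data.

```python
import pandas as pd

# Hypothetical toy frame with the same timestamp layout as the real column.
toy = pd.DataFrame({
    "Inpatient_Admissiondatetime": [
        "2010-12-28 09:00:00.000",
        "2016-6-30 15:00:00.000",   # month without zero padding
        "2017-12-20 09:00:00.000",
    ]
})

# Vectorized string split: the first field before "-" is the year (kept as a string).
toy["year"] = toy["Inpatient_Admissiondatetime"].str.split("-").str[0]

# One id per journal entry, taken from the row index.
toy["id"] = toy.index

# Filtering by year then works exactly as in the cell above.
toy_2017 = toy[toy.year == "2017"]
print(toy_2017)
```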

## Create a Train-Validation split of the dataset

The dataset is cleaned of NaN and missing values.   
The journals with and without fall injuries are separated. This ensures that the train-test (or rather train-validation) split uses the same ratio for both labels.  
The journal entries without fall injuries are undersampled to better balance the distribution of labels in the dataset.   

#### Test set
We use the term `train-test` split, even though we extract and use it as a training and validation set.   
The reason is that the TRUE test set is withheld and based on manually verified journal texts from 2017.
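
As a compact illustration of the idea described above, the sketch below undersamples the negative class and then splits each label separately so both splits keep the same label ratio. It is a simplified stand-in, not the notebook's function: the helper name `split_per_label`, its parameters, and the toy data are assumptions, and the length-based filtering of negatives used in the notebook is left out.

```python
import pandas as pd


def split_per_label(df, label_col="label", neg_per_pos=5, train_frac=0.8, seed=42):
    """Undersample negatives, then split each label separately (hypothetical helper)."""
    pos = df[df[label_col] == 1]
    neg = df[df[label_col] == 0]

    # Undersample: keep at most `neg_per_pos` negatives per positive example.
    n_neg = min(len(neg), neg_per_pos * len(pos))
    neg = neg.sample(n=n_neg, random_state=seed)

    train_parts, valid_parts = [], []
    for part in (pos, neg):
        part = part.sample(frac=1.0, random_state=seed)  # shuffle within the label
        cut = int(len(part) * train_frac)
        train_parts.append(part.iloc[:cut])
        valid_parts.append(part.iloc[cut:])

    # Shuffle again so positives and negatives are interleaved.
    train = pd.concat(train_parts).sample(frac=1.0, random_state=seed).reset_index(drop=True)
    valid = pd.concat(valid_parts).sample(frac=1.0, random_state=seed).reset_index(drop=True)
    return train, valid


# Toy data: 10 positives and 100 negatives.
toy = pd.DataFrame({
    "text": [f"note {i}" for i in range(110)],
    "label": [1] * 10 + [0] * 100,
})
train, valid = split_per_label(toy)
print(train["label"].value_counts())
print(valid["label"].value_counts())
```

Fixing `random_state` for the undersampling step keeps the saved CSV files reproducible between runs.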

In [42]:
COLUMNS = ['Patient_ID', 'omvtext_concat', 'Class_2016', 'year', 'id', 'omvantDT', 'Inpatient_Admissiondatetime', 'Patient_Gender', 'Patient_Age']
dd = dd[COLUMNS]


#Patient_Gender	Patient_Age
# Only works for the 2016 data
def get_fall_injuries_by_label(data):
    """Split the journal entries into those that were and were not a fall injury."""
    data = data[COLUMNS].dropna()
    fall_inj     = data[data['Class_2016'] == 1.0].reset_index(drop=True)
    not_fall_inj = data[data['Class_2016'] == 0.0].reset_index(drop=True)
    return fall_inj, not_fall_inj
    

def shuffle_dataset(data, seed=42):
    return sklearn.utils.shuffle(data, random_state=seed).reset_index(drop=True)


def drop_categories(df):
    """Remove the original columns once their content exists under the new names."""
    df = df.drop(columns=['omvtext_concat', 'Class_2016'])
    return df


def rename_and_drop_categories(df):
    """Copy the text and label into `text` and `label` columns, then drop the originals."""
    
    df["text"] = df["omvtext_concat"]
    df["label"] = df["Class_2016"]
    return drop_categories(df)
    

def sort_based_on_journal_text_length(df):
    """Return the DataFrame sorted by journal text length in ascending order (longest last)."""
    s = df.text.str.len().sort_values().index
    sorted_df = df.reindex(s)  # use the argument, not the global `neg`
    sorted_df = sorted_df.reset_index(drop=True)
    return sorted_df


def train_test_split_by_labels(pos, neg, split_ratio=0.8):
    """ Create train-test split.
    
    args:
        pos: DataFrame of all positive journal entries (fall injuries)
        neg: DataFrame of all negative journal entries (not fall injuries)
        split_ratio: fraction of pos and neg examples in training set
    """
    pos = rename_and_drop_categories(pos)
    neg = rename_and_drop_categories(neg)
    
    # Keep the 100,000 longest journal texts (by character count),
    # then undersample the negative examples to 5 per positive example.
    # After training the model on different splits,
    # ... this seemed to be the largest ratio possible before 
    # ... precision and recall dropped considerably.
    sorted_neg = sort_based_on_journal_text_length(neg)[-100_000:]
    neg = sorted_neg.sample(n = 5*len(pos))
    neg = neg.reset_index(drop=True)
    
    # Split into train-valid set (later valid set split into valid and test when training)
    pos_train_fraction = int(len(pos) * split_ratio)
    neg_train_fraction = int(len(neg) * split_ratio)
    pos_train = pos.loc[:pos_train_fraction-1]
    pos_test  = pos.loc[pos_train_fraction:]
    neg_train = neg.loc[:neg_train_fraction-1]
    neg_test  = neg.loc[neg_train_fraction:]
    
    # Shuffle the dataset.
    # The original columns were already dropped by rename_and_drop_categories above,
    # so dropping them again here would raise a KeyError.
    train = shuffle_dataset(pd.concat([pos_train, neg_train]))
    test  = shuffle_dataset(pd.concat([pos_test, neg_test])) 
    
    return train, test

In [24]:
pos, neg = get_fall_injuries_by_label(dd)

In [40]:
# For our test purposes only. The real test set was provided by Region Halland
def create_test_set_by_year(df, year: str):
    return df[df.year == year].copy(deep=True)
    
da = rename_and_drop_categories(dd)
test_set_2017 = create_test_set_by_year(da, "2017")

test_set_2017.to_csv("../data/test_2017.csv", index=False)
test_set_2017.head()

Unnamed: 0,Patient_ID,year,id,omvantDT,Inpatient_Admissiondatetime,Patient_Gender,Patient_Age,text,label
119,10,2017,119,2017-12-21 02:00:00.000,2017-12-20 09:00:00.000,F,76,"first version. The Man, the Mystery & message ...",
120,10,2017,120,2017-12-21 19:00:00.000,2017-12-20 09:00:00.000,F,76,This book is one of the most provocative histo...,
121,10,2017,121,2017-12-22 07:00:00.000,2017-12-20 09:00:00.000,F,76,"I was hoping for something of more quality, bu...",
122,10,2017,122,2017-12-22 12:00:00.000,2017-12-20 09:00:00.000,F,76,I'm so happy I purchased this book. I passed t...,
123,10,2017,123,2017-12-22 13:00:00.000,2017-12-20 09:00:00.000,F,76,"This biography had as much color, intrigue and...",


# Store all filtered values to a train and test CSV 
### Limit the number of samples in the train and test set


In [43]:
train, valid = train_test_split_by_labels(pos, neg, 0.8)

# Save training and testing set
train.to_csv("../data/rh_train.csv", index=False)
valid.to_csv("../data/rh_valid.csv", index=False)

In [44]:
# Save the data from 2016 as a separate dataset
# (a split ratio of 1 puts every sampled entry in the first returned frame)
data_for_2016, _ = train_test_split_by_labels(pos, neg, 1)

# Save the full 2016 dataset
data_for_2016.to_csv("../data/rh_all_2016_data.csv", index=False)

In [46]:
pd.read_csv("../data/rh_train.csv")

Unnamed: 0,Patient_ID,year,id,omvantDT,Inpatient_Admissiondatetime,Patient_Gender,Patient_Age,text,label
0,43916,2016,1433995,2016-07-27 23:00:00.000,2016-6-30 15:00:00.000,M,63,The first parts of the book are great. The aut...,0.0
1,34491,2016,447902,2016-11-30 14:00:00.000,2016-11-28 06:00:00.000,M,96,This book must have written for me because it ...,0.0
2,22365,2012,290726,2012-05-26 03:00:00.000,2012-5-25 05:00:00.000,M,85,From my 6 year old &#34;this story is SAD!&#34...,1.0
3,21747,2016,282919,2016-05-06 09:00:00.000,2016-4-28 10:00:00.000,F,76,"If you have only one cookbook, this must be th...",0.0
4,41253,2017,971110,2017-06-24 22:00:00.000,2017-6-10 19:00:00.000,M,62,"Super basic, not worth the time. Not even for ...",1.0
...,...,...,...,...,...,...,...,...,...
1444,44794,2016,1718627,2016-12-13 05:00:00.000,2016-11-19 03:00:00.000,M,86,I went to high school (briefly) with the autho...,0.0
1445,45188,2016,1896447,2016-12-01 14:00:00.000,2016-6-11 02:00:00.000,F,88,A typical NT Wright approach to a topic such a...,0.0
1446,45408,2016,1998008,2016-10-09 07:00:00.000,2016-5-14 01:00:00.000,M,80,Excellent book. This was so well written for b...,0.0
1447,41029,2016,931642,2016-07-15 16:00:00.000,2016-5-20 14:00:00.000,M,80,"This math book offers a creative, simple, and ...",0.0


## Show content from the CSVs

In [57]:
pd.set_option('display.width', 5000)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)  # None means no truncation; -1 is no longer accepted by recent pandas
pd.set_option('display.expand_frame_repr', False)


def view_all_patient_journal_notes(df, patient_id, show_only_labeled=True):
    if show_only_labeled:
        df = df.dropna()
    patient_notes = df["Patient_ID"] == patient_id
    return df[patient_notes][['Patient_ID', 'omvtext_concat', 'Class_2016']]


def view_patient_journal_note(df, journal_id, show_only_labeled=True):
    if show_only_labeled:
        df = df.dropna()
    patient_notes = df["id"] == journal_id
    return df[patient_notes][['id', 'omvtext_concat', 'Class_2016']]



In [60]:
view_all_patient_journal_notes(dd, patient_id=43916)

Unnamed: 0,Patient_ID,omvtext_concat,Class_2016
1433920,43916,"Disclosure - free sample provided.<br /><br />This is the third book in the series that we have received. We have loved every single one of these fun sticker and activity books. The Princess Grace Sticker and Activity Book was just as much fun as the other's. These are books loaded with stickers, and fun activities to keep your child entertained. They are great for traveling, or great for a quiet activity at home.<br /><br />This book is about Princess Grace, but it also includes the other princesses too. Joy, Faith, Hope, and Charity, are featured in this book. This book includes beautifully illustrated color pictures that my daughter really enjoyed. The book has activities like finding the stickers to complete the picture. The book also has a word search that has your child look for Grace's favorite things. Their are also pages to color. My daughter loved the coloring page with Grace and the kitties. The book has a page where your child gets to draw their own vase full of beautiful flowers.<br /><br />These are just some of the great activities that you will find in the Princess Grace Sticker and Activity book.<br />Disclosure of Material Connection: I received this book free from the publisher through the BookLook Bloggers book review bloggers program. I was not required to write a positive review. The opinions I have expressed are my own. I am disclosing this in accordance with the Federal Trade Commission’s 16 CFR, Part 255 “Guides Concerning the Use of Endorsements and Testimonials in Advertising.”",0.0
1433921,43916,Perfect condition no odd smells Thank you,0.0
1433922,43916,"Nicely packaged. It's about 50/50 with good verses not so great illustrations. In my opinion some of the postcards are boring and not colorful, but that's just my opinion. There are many illustrations in here that are great!",0.0
1433923,43916,Ms. Scranton's book shed some seriously needed light on the deep dark secret world of quick books add-on integration. I was always very intimidated by the prospect of making all this stuff work together. Ms. Scranton's book make it seem very possible that I could actually pull it off. That is not even the best part. As a result of reading the book I contacted Ms. Scranton for some extra advice. She is brilliant! Her understanding and knowledge of quick books and its add-ons is remarkable. All I can say is read this book and call her. You won't be sorry.,0.0
1433924,43916,"I bought this book as a present for my Australian wife. I always told her that the old translation we had by Khairallah was missing many verses and tried to translate it for her myself. When I saw this book on Amazon, I was so happy to read: &#34;complete&#34; and &#34;accurate&#34;. So I bought it and was so happy to see that they included all of Gibran's original illustrations in high quality. Also, the translation is SPOT ON and beautifully written! I found myself comparing the Arabic in the back of the book to the corresponding English verse. I think this deserves to be acknowledged as a great work in English poetry by itself and not just as a fantastic translation. I highly, highly recommend it.",0.0
...,...,...,...
1434012,43916,"When you study Malcolm X, and follow the tracks of his life, and his dedication to the truth, he was a definite threat to the Nation of Islam, and the United States government. He truly wanted to solve the race issue in America, which he connected to the World. Since his death, the current batch of &#34;civil rights&#34; leaders have done nothing like he did. That to me is the true test of them. They have become soft and comfortable, and are not willing to put themselves out for anyone,much less another Black person, no matter what they say.As Malcolm said &#34;when they drop those dollars on you, your soul goes.&#34; A good book to have which shows what kind of &#34;democracy&#34; America is, and what happens if you speak out about America.",0.0
1434013,43916,"This book was a more detailed, entertaining, blunt version of &#34;the rules.&#34; Provided a lot of relevant and useful information. I read through it at least twice a year. Loved it.",0.0
1434014,43916,I found this book very informative and very useful. It will remain on my library shelf for future reference. Well written and concise.,0.0
1434015,43916,"Excellent introductory Book. After reading this book, you will understand mainframe basics with ease.",0.0


In [62]:
view_patient_journal_note(dd, journal_id=1433995)

Unnamed: 0,id,omvtext_concat,Class_2016
1433995,1433995,"The first parts of the book are great. The authors combine marketing with neuroscience very well. Where other books mostly describe what works, this book also explains why the brain reacts like this. Unfortunately, the last part of the book was more general about how brands are perceived by consumers. If you are a marketer, you will not learn much from this. Another minor point is that it is repetitive. Not only in what they discuss, but also the examples. Some examples are multiple times described in the exact same detail (for example: Colruyt, Coca-Cola, Nike).<br /><br />In sum: great book which learns marketers more about why the brain works the way it does. Only slightly repetitive and the last part is about general marketing knowledge. I would recommend this book for the first parts.",0.0


### Show all entries containing a specific word
From inspecting the data and visualizing the words the model found important, we can list the entries that contain a given word.   
The word 'ramla' (Swedish for "fallen/falling") was very important in identifying fall injuries. 

In [84]:
# Find rows whose text contains the string (case-insensitive); na=False skips missing texts
chunk[chunk['omvtext_concat'].str.contains("fall", case=False, na=False)].head()

Unnamed: 0,omv_pk,Patient_ID,Inpatient_Admissiondatetime,Inpatient_Departure,omvantDT,sokkod,omvtext_concat,Class_2016,Patient_Gender,Patient_Age
84,2706_automation,6,2014-8-17 18:00:00.000,2014-08-25 17:00:00.000,2014-08-23 21:00:00.000,automation,"Pros<br />Artwork is stunningly beautiful, light, positive, colorful, one of the best whimsical fantasy deck out in the market. The images feels other worldly sci fi like I dropped into the most glorious dream<br />No dark scary imaginary<br />Beautiful fairies and mermaids in the cups. All grouped together by animal/fantasy and colours.<br />Its somewhat divers with people which is a joy to see ( a few men, Asians and Africans) yeah!<br />I have found this deck great for spiritual reads, and path work.<br />Has a quite voice that is profoundly clear with its message, beautiful readings!<br />Close to the RWS but original it's own self and vibe<br />Very intuitive deck<br /><br />Cons<br />Flimsy and card stock. It's greatest down fall in my opinion!<br />I want a deluxe version with better card stock. Shadowscape deck is the thinnest card stock I own and second most used professionl deck.<br />Somewhat strays from the RWS so it's a very intuitive deck so depends on what you prefer or like. (Could be a benefit)<br /><br />As the books is quite poetic she has a tendency to ramble about on about nothing, her book lacks depth.<br /><br />Stephanie Pui Mon Law is an incredible fantasy artist. I would prefer higher quality materials as you never know when the deck may stop printing in my opinion. And to me it's an insult to the artist material. Please do a deluxe version!<br />My deck already has a bend to it and looks scratched. I look after all my decks. Paper the thinnest I've ever owned and the silver boarders can scratch off so be careful with this. The artwork is finely detailed and extremely beautiful I can understand that people find this deck difficult to see and even with my good eye sight. I have a tendency to stare a lot longer then other decks. I wish this came in a deluxe slightly bigger size lets pressure the artist he he!",,F,88
108,9819_tunes,9,2010-3-10 20:00:00.000,2010-03-16 22:00:00.000,2010-03-11 06:00:00.000,tunes,"Wait no, it's done. What about Kadesh? Did he survive? Does Horeb find Jayden? Will Jayden be reunited with either sister? I have so many questions. At first i was upset with the ending until i learned it's the first book of the series. I will definitely be reading the second book, I can't wait to read what happens next.<br /><br />The story follows Jayden, a desert girl from Mesopotamian. She was betrothed to her cousin and future king of her tribe Horeb (a selfish and power hungry man). Life with Horeb would bring Jayden and her family wealth and power but she does not love or even like Horeb. She falls in love with the generous and kind hearted stranger Kadesh, and realizes that she has to somehow get out of her betrothal to Horeb (an impossible task). I can understand why her father wouldn't break the wedding contract to Horeb, but he was acting out of desperation and did not have his daughter's best interest at heart.<br /><br />Jayden loses everything - her family, her tribe and her home. She goes through so much but still remains strong and tries to hold her family together.<br /><br />The author developed the characters well and was very descriptive without being boring, I like her writing style. The story was heartbreaking and at times upsetting but sprinkled with hope. I really enjoyed this book, I got hooked immediately and couldn't put it down.",,F,88
190,44940_harry,15,2017-10-4 07:00:00.000,2017-10-09 13:00:00.000,2017-10-07 01:00:00.000,harry,"The market is just saturated with step-brother romances right now and I have read quite a few of them. Maybe it's a fascination with the topic (I once was someone's step-sister) or the fact that although it seems to be a forbidden thing, it really is just a matter of two people living in close proximity (not by their choice) when their parents happen to marry, and hormones just take over.<br />I really enjoy Sabrina Paige's writing, having read almost all of her series so far, and with TOOL, she did not disappoint me (having already tackled the subject matter in Prick) and it is her way of telling a story with multiple characters full of different personalities, that sets hers apart from the rest.<br />Gaige &#34;almost&#34; seemed like he would be such a hard man to like with no redeeming qualities whatsoever when we first meet him and with his history of racing, wild partying and women galore, it is no wonder Delaney just wants to keep her distance. But sometimes attraction is too hard to resist and when these two are thrown together for work and forced to see another side of each other...they can't resist the temptation. A jealous rival and co-worker, some less than stellar examples of motherly love and colorful friends make up some of the issues they must face and also provide some interesting conflicts and comic relief.<br />I liked this book so much that I read it straight through because unlike some books, I really really wanted them to get an HEA with a realistic outcome and not just a bunch of wham-bam hot sex with no discussion on the morality issues or society's reaction to their coupling.<br />Recommend for romance readers who like some spicy with their sweet.<br />I received a free copy of this book for an honest review.<br /><br />***** 4.5 ***** &#34;falling for the one you can't resist&#34; stars",,M,71
238,46626_immigrants,19,2017-8-15 06:00:00.000,2017-08-23 09:00:00.000,2017-08-18 02:00:00.000,immigrants,"This is an honest, transparent, and interesting story that escapes the rhetoric of either side and helps the reader see the obvious flaws in both our draconian laws regarding marijuana possession/use as well the pitfalls of legalization. Like the author, I also weigh in on the side of legalization, knowing full well as he so thoughtfully reminds us, that marijuana is not entirely harmless. &#34;Marijuana Nation&#34; really is a must read if you are at all interested in understanding the finer points of a drug surrounded by mythology.",,M,82
358,16980_tunes,30,2016-1-3 16:00:00.000,2016-01-06 10:00:00.000,2016-01-06 01:00:00.000,tunes,This book is a great read! It's something you want don't want to put down once you start reading it. You'll fall in love with these characters right away. I'd definitely recommend this book to a friend (and already told both my roommates they need to read it).,0.0,M,83
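
A plain substring search also matches longer words that merely contain the search term. The sketch below contrasts it with a whole-word search using a regular-expression word boundary. The `notes` series is made-up example text, and whether whole-word matching is the better choice for the Swedish journal texts is an open question rather than something the notebook establishes.

```python
import pandas as pd

# Hypothetical example texts, including a missing value.
notes = pd.Series([
    "It's greatest down fall in my opinion!",
    "the pitfalls of legalization",
    "She falls in love",
    "Perfect condition",
    None,
])

# Substring match: hits "fall", "pitfalls" and "falls" alike.
substring_hits = notes.str.contains("fall", case=False, na=False)

# Whole-word match: \b marks a word boundary, so only the standalone word "fall" hits.
whole_word_hits = notes.str.contains(r"\bfall\b", case=False, na=False, regex=True)

print(pd.DataFrame({
    "text": notes,
    "substring": substring_hits,
    "whole_word": whole_word_hits,
}))
```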


In [85]:
# Now for creating the other, second, dataset

# Dataset for predicting future fall injuries

When predicting whether a patient will have a fall injury in the future, we do the following:
1. Group the journal entries by patient.
2. Use all of the journal text, except for the very last entry, as the input text. If a fall injury occurred, include only the entries up to that point.
3. Predict, based on all previous data about the patient (journal text), whether the next entry will be a fall injury or not.
4. To capture how long the patient has stayed, there are several options. We chose to create a separate `time` column that holds the total number of hours the patient has stayed in the hospital.  
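
The steps above can be sketched on a toy frame as follows. This is a minimal illustration under assumptions: the column names `text` and `label` and the toy rows are made up, the rows are assumed to be in chronological order within each patient, and the sketch follows the written description by leaving the outcome entry itself out of the input text.

```python
import pandas as pd

# Hypothetical toy data: patient 1 never falls, patient 2 falls at the second entry.
toy = pd.DataFrame({
    "Patient_ID": [1, 1, 1, 2, 2, 2, 2],
    "text": ["a", "b", "c", "d", "e", "f", "g"],
    "label": [0, 0, 0, 0, 1, 0, 1],
})

rows = []
for patient_id, group in toy.groupby("Patient_ID"):
    labels = group["label"].tolist()

    # The outcome entry is the first fall if there is one, else the last entry.
    outcome_idx = labels.index(1) if 1 in labels else len(labels) - 1

    rows.append({
        "id": patient_id,
        # Only entries strictly before the outcome entry are used as input text.
        "text": " ".join(group["text"].iloc[:outcome_idx]),
        "label": labels[outcome_idx],
    })

per_patient = pd.DataFrame(rows)
print(per_patient)
```

Keeping the outcome entry out of the text avoids handing the model the very note that describes the fall.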

#### Known limitations with this approach:
1. Only journal entries up to the first fall injury are included, although patients who have fallen once may well be more likely to fall again. Those later falls are not counted here. In the original dataset, including only the first fall injury per patient reduced the 302 recorded fall injuries in 2016 to 244.
2. Since a patient may fall very early in their hospital stay, and we then include only those early journal entries, the number of journal entries per patient can be highly imbalanced when predicting future fall injuries.
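
The effect of the first limitation can be measured by comparing the total number of fall entries with the number of distinct patients who have at least one. The sketch below uses made-up toy data; only the column names `Patient_ID` and `Class_2016` come from the notebook.

```python
import pandas as pd

# Hypothetical toy data: patient 2 falls twice, patient 3 never falls.
toy = pd.DataFrame({
    "Patient_ID": [1, 1, 2, 2, 2, 3],
    "Class_2016": [0.0, 1.0, 1.0, 0.0, 1.0, 0.0],
})

is_fall = toy["Class_2016"] == 1.0

# Every labelled fall entry counts here.
total_falls = int(is_fall.sum())

# Keeping only the first fall per patient leaves one positive per patient.
patients_with_fall = int(toy.loc[is_fall, "Patient_ID"].nunique())

# Falls that disappear when only the first one per patient is kept.
repeat_falls_dropped = total_falls - patients_with_fall

print(total_falls, patients_with_fall, repeat_falls_dropped)
```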

In [79]:
# original 191007 
# with annotated time 173055, and loses 2 positive examples
# Also, patients are still there after a fall injury, so we remove the text after the fall and only predict the first fall

# Takes us down to 244 injuries and 5508 unique patients. The reduction from 300 to 244 is because some patients fall more than once
ddd = dd[(dd.Class_2016 == 1.0) | (dd.Class_2016 ==0.0)].dropna()
ddd = ddd.groupby(by=["Patient_ID"])

text_list = []
pat_id_list = []
time_list = []
label_list = []
pat_gender_list = []
pat_age_list = []

In [80]:
# For every patient and the journal entries
for pat_id, entry in ddd:
    
    # Walk through the patient's labelled journal entries in the order they appear
    # and collect the labels up to and including the first fall injury
    # (or all of them if the patient never fell).
    patient_journals_cummulative = []
    journal_entries_2016 = entry.Class_2016.values
    for label_ in journal_entries_2016:
        label_ = int(label_)
        patient_journals_cummulative.append(label_)
        # Stop at the first fall, so any later falls by the same patient are lost
        if label_ == 1:
            break
            
    # Store the label for the patient:
    # 1 if the collected entries end in a fall injury, 0 otherwise.
    label_list.append(patient_journals_cummulative[-1])
        
    # Index of the last collected entry (the fall injury note, if there was one)
    idx_last_entry = len(patient_journals_cummulative)-1
    pat_id_list.append(pat_id)
    # Note: this slice includes the entry at idx_last_entry itself;
    # slicing with [:idx_last_entry] instead would leave that entry out.
    text_list.append(" ".join(entry.omvtext_concat.values[:idx_last_entry+1]))
    
    # Set age and gender
    pat_gender_list.append(0 if entry.Patient_Gender.values[-1] == "F" else 1)
    pat_age_list.append(int(entry.Patient_Age.values[-1]))

    # Get how many hours in total the patient spent in the hospital
    last_journal_time = entry.omvantDT.values[idx_last_entry]
    last_journal_time = str(last_journal_time).split(".")[0]
    last_journal_time = datetime.datetime.strptime(last_journal_time, '%Y-%m-%d %H:%M:%S')

    init_journal_time = entry.Inpatient_Admissiondatetime.values[idx_last_entry]
    init_journal_time = init_journal_time.split(".")[0]
    init_journal_time = datetime.datetime.strptime(init_journal_time, '%Y-%m-%d %H:%M:%S')
    time_hours = int((last_journal_time - init_journal_time).total_seconds() // 3600)
    
    # Known issue: one of the computed times is -1
    time_list.append(time_hours)

    
# Collect the per-patient lists into a new DataFrame
ddf = pd.DataFrame()
ddf['id'] = pat_id_list
ddf['text'] = text_list
ddf['time'] = time_list
ddf['label'] = label_list
ddf['gender'] = pat_gender_list
ddf['age'] = pat_age_list

ddf.to_csv("../data/df_to_predict_fall_injuries.csv", index=False)
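
The hour computation from the loop can be isolated into a small helper to make its behavior easier to inspect. The function name `hours_between` and the example timestamps are assumptions for illustration. The second call shows one way a value of -1 can arise: floor division rounds toward negative infinity, so a note stamped less than an hour before the admission time yields -1 rather than 0. Whether that is what happens in the real data is not confirmed here.

```python
from datetime import datetime


def hours_between(start, end, fmt="%Y-%m-%d %H:%M:%S.%f"):
    """Whole hours from `start` to `end`, floored (hypothetical helper)."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 3600)


# Admission on 30 June at 15:00, note written on 27 July at 23:00.
stay_hours = hours_between("2016-6-30 15:00:00.000", "2016-07-27 23:00:00.000")

# A note stamped 30 minutes before the admission time floors to -1.
early_note_hours = hours_between("2016-01-01 10:30:00.000", "2016-01-01 10:00:00.000")

print(stay_hours, early_note_hours)
```

If negative durations are not meaningful for the model, clamping with `max(0, hours)` would be one possible way to handle them.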

In [82]:
ddf.head()

Unnamed: 0,id,text,time,label,gender,age
0,21,"Perfect! Looked and looked for this book. Engaging and super for close reading for details with kids. This is a great start for primary student animal reports. Bought 4 different packs (numbers, colors & shapes, Alphabet and first words) for grandchild. Will give to her as little gifts to have for travel and something we can play together. As good as it gets. Simple. Clear. Works. Clearly does not have any idea what he is writing about. Seems to be focused on grapes in an almost fetishistic manner, and has no idea what a hat looks like. Kendall Uyeji is a new mind in the world of modern writing, which causes fluctuations of trust to make the reader feel bad. Levin does a wonderful job of introducing you to the characters and how their personal lives provide angst and motivation to do their difficult jobs. The interplay between the characters provides a wonderful web to the horrific mystery they are called on to solve. The book is realistic, fun, thought provoking and moves at a wonderful pace. I felt like I knew the characters and places they lived and worked. The most touching part of the story remains between the main Character Preuss and his special needs child Toby. The tenderness, understanding and deep parental love seems to provide Pruess a deep centering in his difficult chaotic life. I so enjoyed how Levin takes the reader on a journey of what everyday life is like--the ugliness, the petty bickering, the frustrations, the hopes, the joys and the willingness to sacrifice. A wonderful book! a quite story line of people holding hands. its about 100 b/w photographs of people holding hands. a number of them will put happy tears in your eyes. very refreshing it goes back when you are young forward when you get older and when you are still in between life. again very refreshing.",67,0,1,67
1,22,"Good variety Easy to hard. Stolz (1997, 2010) is the reason why I am doing my dissertation on adversity quotient. The CORE dimensions as well as the elements adversity quotient (AQ) are invaluable to anyone who is alive and works.",23,0,0,84
2,24,"great way to teach kids classical music through story telling Good message. A bit simplistic, but a fun read for sales people. It can be a great training tool. I haven't had time to read it, but I look forward to it. Arrived with no problems. Thank you. Perhaps I can say more later. Great gift for the F1 fan. All of these thing are Christmas presents. I cannot tell you much of aything about how well yhey were received. The dentist loved the book and was looking forward to using it for her classes. Whayt I need to know Last year I picked up THREAT VECTOR on a whim. It was the first Clancy book I'd read in ages. It was also the first novel I'd read that was written by Mark Greaney. I absolutely loved it. After reading THREAT VECTOR I went and bought all of Greaney's solo Gray Man novels. I read the four Gray Man books over the course of a month. I decided to wait until COMMAND AUTHORITY came out in mass market paperback before reading it -- I have a nice hardback for the collection but I hate lugging around those thick hardcovers when I do most of my reading on the bus. I picked up another copy of COMMAND AUTHORITY when it was released in paperback and devoured it in a week. COMMAND AUTHORITY continues the standard of excellence I have come to expect of Greaney's writing. It comes across as a very timely read given the events in Ukraine over the past year. COMMAND AUTHORITY is fast paced and well plotted. I felt that the inclusion of the Jack Ryan Sr. murder investigation subplot 30 years in the past was a nice touch. The Russian invasion of Ukraine and everything leading up to it is thrilling. It took a hundred or so pages for me to really get into the book but once I was hooked it was impossible to put down.<br /><br />I know there are a still a lot of people complaining about Greaney writing under the Clancy name now that Tom is dead but it is also apparent that there are a lot more people that support Greaney and will continue to do so. 
I believe that Greaney has done an honor to the Clancy franchise and I hope he keeps writing about Jack Ryan Sr/Jr for a long time to come. Good book with decent coverage of functional programming in F# and the functional capabilities added to C#. The book is well written, and easy to understand. The book teaches dog training using positive reinforcement, and compassion. The instructions are clear and easy to follow. very good and accurate 90% of the times, worst reading Great book for making me think about unique business ideas. Really good info about the value of building a following. A fun story balanced with solid how-to material. What a wonderful book! Fabulous characters, beautiful illustrations and an incredible story full of great lessons! A book I must have close by.",131,0,1,72
3,28,"This is a fun read on a fun topic by an interesting author. Thanks to Mr. Dykes for the clear translation! Great book, easy read and very enjoyable story line interesting read on how big parma manipulates us to spend alot of money so they have huge profits Speedy delivery with no mistakes. The book is fabulous! The cover was pretty worn around the edges, but I'd much rather buy used. Thanks This the LXX and official Orthodox Greek New Testament together. It's very hard to come by this, and for this price you can't beat it.<br /><br />I have one significant gripe: the font. Otherwise this would be a full 5-star. It may not be this way for others, but I find the font very, very difficult to read. There are smaller fonts that are much easier to read (I have a pocket GNT with a very small font I can read). I can only read this in the brightest of lights. Obviously, this is not true for other reviewers.<br /><br />Overall, though, this is a steal. This is a great book that keeps the ideas simple and easy to consume. A lot of great tips, pointers, reminders and actionable ideas anyone can incorporate!",89,0,0,56
4,29,"one of my mainstay devotionals This book is really fun. I have to keep it hidden do the older grand kids won't see it and want to color themselves. My one improvement suggestion is to perforate the pages so they can be easily removed. We take them out one at a time, put something underneath (because our markers bleed through), and color away. Thanks, Walter, for many hours of beautiful fun. Takes awhile to put all his messages together about compassion and people. Second read is necessary but it is very thought provoking and good. Timely delivery! Just as advertised. Samuel Renihan has done a great service to the church: he has taken what has been misunderstood and neglected (the confessional statement &#34;God...without passions&#34;), and has ably and graciously explained it. In what is a most enjoyable and clear treatment of the doctrine of divine impassibility, Samuel Renihan has shown its exegetical and theological foundation. As well, he generously sprinkles quotes from older writers which demonstrate the doctrine's historical attestation in the church. Finally, there is a practical and pastoral section that shows what ought to be obvious: those who neglect theology proper have removed a vital element that is necessary for genuine Christian living. In my estimation, the most important aspect of this book, however, is the demonstration of the theological method involved at arriving at the biblical doctrine. May God use this work to further inform His people concerning this most important, and most blessed truth. Heled me get through a difficult break up.... Delivered on time what a good book.tapping and pain relief go hand in hand well done nick This book is sweet from cover to cover. We read it to our 15-month-old son a few times each week and it is clear he likes both the repetition and the simple but expressive pictures. Lovely. I love this book! Olivia is a great author and I cannot wait until her next one. I love Hattie Green. 
I couldn't put the book down. I am recommending this book to all my friends and family!!! This book is a great follow up to Unbroken. A more personal account, written by Louis. Dr. Wolfe baldly lays out what food is for us, especially its purpose as fuel and the many herbs and spices that also play a vital function in healing and caring for this amazing enterprise we call our body. I could never test all she claims, but i do know she is a straight shooter, and I am now looking at the food I eat with new fascination and interest. Excellent series I would recommend to anyone JUST THE INFO I WAS LOOKING FOR.",117,0,0,76
