Problem Statement
The recent Covid-19 pandemic has raised alarms about one of the most overlooked areas: healthcare management. Healthcare management has many data science use cases, but patient length of stay (LOS) is a critical parameter to observe and predict if one wants to improve the efficiency of healthcare management in a hospital.
This parameter helps hospitals identify, at the time of admission, patients at high LOS risk (patients who will stay longer). Once these patients are identified, their treatment plans can be optimized to minimize LOS and lower the chance of staff and visitor infection. Prior knowledge of LOS can also aid in logistics such as room and bed allocation planning.
Suppose you have been hired as a Data Scientist at HealthMan, a not-for-profit organization dedicated to managing the functioning of hospitals in a professional and optimal manner.
The task is to accurately predict the Length of Stay for each patient on a case-by-case basis so that hospitals can use this information for optimal resource allocation and better functioning. The length of stay is divided into 11 classes, ranging from 0-10 days to more than 100 days.
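Because the 11 classes are ordered ranges, an integer encoding of the target should keep that order. The sketch below assumes the labels are spelled "0-10", "11-20", ..., "91-100" and "More than 100 Days" (check them against train_data.csv). With that spelling, scikit-learn's LabelEncoder, which sorts classes alphabetically, happens to assign indices 0 to 10 in increasing order of stay, because the digit-leading ranges sort correctly among themselves and digits sort before "M".

```python
from sklearn.preprocessing import LabelEncoder

# Assumed spelling of the 11 Stay classes; verify against train_data.csv
STAY_LABELS = [
    "0-10", "11-20", "21-30", "31-40", "41-50",
    "51-60", "61-70", "71-80", "81-90", "91-100",
    "More than 100 Days",
]

encoder = LabelEncoder()
encoder.fit(STAY_LABELS)

# LabelEncoder stores its classes in sorted (alphabetical) order
print(list(encoder.classes_) == STAY_LABELS)  # True
# Shortest range -> 0, "91-100" -> 9, longest stay -> 10
print(encoder.transform(["0-10", "91-100", "More than 100 Days"]).tolist())  # [0, 9, 10]
```

If the real labels were spelled differently (for example without a leading zero range), the alphabetical order could break, so it is worth checking `encoder.classes_` before treating the encoded target as ordinal.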

Data Description
train_data.csv – File containing features related to the patient and hospital, plus the Length of Stay, for each case
train_data_dictionary.csv – File describing the features in the train file

Test Set
test_data.csv – File containing features related to the patient and hospital. The Length of Stay must be predicted for each case_id

Sample Submission (sample_sub.csv):

case_id: Unique id for each case

Stay: Predicted Length of Stay class for the patient in each case_id of the test data
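A submission therefore has one row per test case with exactly these two columns. As a minimal sketch, the hypothetical helper `check_submission` below validates a DataFrame against that format before it is written to disk; the toy `case_id` values and the list of allowed labels are illustrative assumptions, not competition data.

```python
import pandas as pd


def check_submission(sub: pd.DataFrame, test_case_ids, allowed_labels) -> None:
    """Hypothetical helper: raise ValueError if `sub` does not match the submission format."""
    # Exactly the two expected columns, in order
    if list(sub.columns) != ["case_id", "Stay"]:
        raise ValueError(f"expected columns ['case_id', 'Stay'], got {list(sub.columns)}")
    # One prediction per case
    if sub["case_id"].duplicated().any():
        raise ValueError("case_id values must be unique")
    # Every test case is covered, and nothing else
    if set(sub["case_id"]) != set(test_case_ids):
        raise ValueError("submission must cover exactly the test case_ids")
    # Predictions must be valid Stay class labels
    unknown = set(sub["Stay"]) - set(allowed_labels)
    if unknown:
        raise ValueError(f"unknown Stay labels: {sorted(unknown)}")


# Toy example with made-up case_ids and labels
sub = pd.DataFrame({"case_id": [1, 2, 3], "Stay": ["0-10", "21-30", "More than 100 Days"]})
check_submission(sub, test_case_ids=[1, 2, 3],
                 allowed_labels=["0-10", "21-30", "More than 100 Days"])
print("submission format OK")
```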

Evaluation Metric
The evaluation metric for this hackathon is 100 * accuracy, i.e. the percentage of test cases whose Stay class is predicted exactly.
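The score can be reproduced locally with scikit-learn's `accuracy_score`. The function name `competition_score` and the toy labels below are illustrative assumptions.

```python
from sklearn.metrics import accuracy_score


def competition_score(y_true, y_pred) -> float:
    """100 * accuracy: percentage of cases whose Stay class is predicted exactly."""
    return 100 * accuracy_score(y_true, y_pred)


# Toy example: 3 of the 4 predictions match exactly
y_true = ["0-10", "11-20", "21-30", "More than 100 Days"]
y_pred = ["0-10", "11-20", "31-40", "More than 100 Days"]
print(competition_score(y_true, y_pred))  # 75.0
```

Note that accuracy gives no partial credit: predicting "31-40" for a "21-30" case costs as much as predicting "More than 100 Days", even though the classes are ordered.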

Acknowledgements
More details can be found on the website of Analytics Vidhya, which conducted the hackathon:
https://datahack.analyticsvidhya.com/contest/janatahack-healthcare-analytics-ii/#ProblemStatement

In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from xgboost import XGBRegressor

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

sample_sub_filepath = "../input/av-healthcare-analytics-ii/healthcare/sample_sub.csv"
train_data_dictionary_filepath = "../input/av-healthcare-analytics-ii/healthcare/train_data_dictionary.csv"
train_data_filepath = "../input/av-healthcare-analytics-ii/healthcare/train_data.csv"
test_data_filepath = "../input/av-healthcare-analytics-ii/healthcare/test_data.csv"

sample_sub = pd.read_csv(sample_sub_filepath)
train_data_dictionary = pd.read_csv(train_data_dictionary_filepath)
train_data = pd.read_csv(train_data_filepath)
test_data = pd.read_csv(test_data_filepath)

# Target: the Stay class of each case in the training data
train_y = train_data.Stay

# Keep only the features used by the model (val_X holds the unlabeled test set)
features = ["Type of Admission", "Severity of Illness", "Age"]
train_X = train_data[features]
val_X = test_data[features]

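# Label encoding (X)
# Columns whose categories are encoded as integers.
# Note: LabelEncoder orders categories alphabetically, which may not match
# the clinical order of "Severity of Illness".
label_encoder_features = ["Severity of Illness", "Age"]
# Make copies to preserve the original data
label_train_X = train_X[label_encoder_features].copy()
label_val_X = val_X[label_encoder_features].copy()
# Fit a separate encoder per column on the training data, then apply it to the test data
for col in label_encoder_features:
    col_encoder = LabelEncoder()
    label_train_X[col] = col_encoder.fit_transform(label_train_X[col])
    label_val_X[col] = col_encoder.transform(label_val_X[col])

# Label encoding (y): kept in `label_encoder` so predictions can be decoded later.
# Assuming the Stay labels are written "0-10", "11-20", ..., the alphabetical
# order LabelEncoder uses matches the natural order of the classes.
label_encoder = LabelEncoder()
label_train_y = label_encoder.fit_transform(train_y)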

# One hot encoding (X)
one_hot_encoder_features = ["Type of Admission"]
# sparse_output=False returns a dense array (scikit-learn >= 1.2; older versions used sparse=False)
one_hot_encoder = OneHotEncoder(sparse_output=False)
# Keep the original row index so the concatenation below aligns rows correctly
one_hot_train_X = pd.DataFrame(
    one_hot_encoder.fit_transform(train_X[one_hot_encoder_features]),
    columns=one_hot_encoder.get_feature_names_out(),
    index=train_X.index,
)
one_hot_val_X = pd.DataFrame(
    one_hot_encoder.transform(val_X[one_hot_encoder_features]),
    columns=one_hot_encoder.get_feature_names_out(),
    index=val_X.index,
)

# Concatenate the label-encoded and one-hot-encoded columns
concat_train_X = pd.concat([label_train_X, one_hot_train_X], axis=1)
concat_val_X = pd.concat([label_val_X, one_hot_val_X], axis=1)

# Fit an XGBoost regressor on the integer-encoded Stay classes
XGB_model = XGBRegressor(n_estimators=500)
XGB_model.fit(concat_train_X, label_train_y)

# Round the regression output to the nearest class index and clip it to the
# valid range (rounding can fall outside it), then undo the label encoding
n_classes = len(label_encoder.classes_)
predictions = XGB_model.predict(concat_val_X).round().clip(0, n_classes - 1)
predictions = label_encoder.inverse_transform(predictions.astype(int))

# Save predictions to file
output = pd.DataFrame( {"case_id" : test_data["case_id"],
                        "Stay": predictions})
output.to_csv('submission.csv', index=False)

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/av-healthcare-analytics-ii/healthcare/sample_sub.csv
/kaggle/input/av-healthcare-analytics-ii/healthcare/train_data_dictionary.csv
/kaggle/input/av-healthcare-analytics-ii/healthcare/train_data.csv
/kaggle/input/av-healthcare-analytics-ii/healthcare/test_data.csv
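Two details of the encoding step are easy to get wrong. LabelEncoder orders categories alphabetically, so for Severity of Illness it would give Extreme < Minor < Moderate rather than the clinical order; and rounding a regressor's output can produce an index outside the valid class range, which makes `inverse_transform` raise an error. The sketch below, on made-up rows, swaps in scikit-learn's OrdinalEncoder with an explicit category order (assuming the severity values are spelled "Minor", "Moderate" and "Extreme") and clips rounded predictions before decoding them.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Made-up rows; the severity spellings are an assumption about the dataset
X = pd.DataFrame({"Severity of Illness": ["Minor", "Extreme", "Moderate", "Minor"]})

# Explicit order instead of alphabetical: Minor -> 0, Moderate -> 1, Extreme -> 2
severity_encoder = OrdinalEncoder(categories=[["Minor", "Moderate", "Extreme"]])
X_encoded = severity_encoder.fit_transform(X)
print(X_encoded.ravel())  # [0. 2. 1. 0.]

# Decoding regression output back to class labels (three toy classes)
stay_encoder = LabelEncoder().fit(["0-10", "11-20", "21-30"])
raw_predictions = np.array([-0.7, 0.4, 1.6, 2.9])  # e.g. output of a regressor
n_classes = len(stay_encoder.classes_)

# round() alone would give -1 and 3 here, which are not valid class indices
class_idx = raw_predictions.round().clip(0, n_classes - 1).astype(int)
print(stay_encoder.inverse_transform(class_idx))  # ['0-10' '0-10' '21-30' '21-30']
```

Clipping maps out-of-range predictions to the nearest extreme class, which is a reasonable default for an ordinal target; a multi-class classifier would avoid the issue entirely, at the cost of ignoring the class order.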
