
# Hospital Data Analysis

**Goal:** Analyze hospital patient data to understand patterns and predict patient length of stay.  
**Tools:** Python, pandas, scikit-learn, matplotlib

## Import Libraries & Upload Dataset

In [None]:

import pandas as pd
import matplotlib.pyplot as plt


from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error


from google.colab import files


In [None]:
uploaded = files.upload()


## Load Dataset & Inspect

In [None]:
# Read the uploaded CSV into a DataFrame (pandas is already imported above).
df = pd.read_csv("LengthOfStay.csv")


In [None]:
df.info()
df.isnull().sum()


In [None]:

# Forward-fill missing values; fillna(method='ffill') is deprecated in recent pandas.
df = df.ffill()
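
Forward-fill copies the last valid value downward, so the result depends on row order, and a gap in the first row has nothing to copy from. A minimal sketch on a made-up frame (the `pulse` and `bmi` columns and their values are hypothetical, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values) with a gap in the first row and gaps in the middle.
toy = pd.DataFrame({
    "pulse": [np.nan, 72.0, np.nan, 80.0],
    "bmi": [28.1, np.nan, np.nan, 30.4],
})

# Each NaN takes the most recent non-missing value above it.
filled = toy.ffill()

# The leading NaN in "pulse" has no earlier value, so it is still missing.
remaining = filled.isnull().sum()
print(filled)
print(remaining)
```

Re-running `isnull().sum()` after filling is a cheap way to confirm that nothing was left behind.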


In [None]:

df['vdate'] = pd.to_datetime(df['vdate'])
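
The parsed `vdate` column is not used by the model in this notebook. Should it become useful later, one option is to derive calendar features from it. A minimal sketch on made-up dates; the `%m/%d/%Y` format and the `visit_month` / `visit_dayofweek` names are assumptions, not something the dataset confirms:

```python
import pandas as pd

# Toy dates (hypothetical); the real column may use a different format.
toy = pd.DataFrame({"vdate": ["8/29/2012", "5/26/2012", "9/22/2012"]})

# An explicit format avoids ambiguous day/month parsing.
toy["vdate"] = pd.to_datetime(toy["vdate"], format="%m/%d/%Y")

# Calendar features derived from the datetime column.
toy["visit_month"] = toy["vdate"].dt.month
toy["visit_dayofweek"] = toy["vdate"].dt.dayofweek  # Monday = 0, Sunday = 6
print(toy)
```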


## Feature Preparation

In [None]:

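# One-hot encode the categorical column; drop_first keeps a single gender_M indicator.
# Assumes the other three columns are already numeric 0/1 flags.
features = pd.get_dummies(df[['gender', 'dialysisrenalendstage', 'asthma', 'irondef']], drop_first=True)

# rcount uses '5+' as its top category; cap it at 5 so the column becomes numeric.
df['rcount'] = df['rcount'].replace('5+', '5').astype(float)

# Target: length of stay. Assumption: the CSV stores it in a column named
# 'lengthofstay'; adjust the name if your file differs.
target = df['lengthofstay']

# Recent pandas versions return booleans from get_dummies; cast to 0/1 integers.
features['gender_M'] = features['gender_M'].astype(int)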
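
To see what `drop_first=True` does, here is a small sketch on a made-up frame (values are hypothetical). Passing `dtype=int` is one way to get 0/1 columns directly instead of casting afterwards:

```python
import pandas as pd

# Toy frame (hypothetical values): one text column and one 0/1 flag.
toy = pd.DataFrame({
    "gender": ["F", "M", "M", "F"],
    "asthma": [0, 1, 0, 0],
})

# Only the text column is encoded; drop_first removes gender_F,
# so gender_M == 0 implicitly means "F".
encoded = pd.get_dummies(toy, drop_first=True, dtype=int)
print(encoded)
```

Dropping the first level avoids two perfectly correlated indicator columns, which matters for a linear model with an intercept.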


In [None]:
print(features.head())
print(features.dtypes)



In [None]:
print(df['rcount'].unique())


## Train-Test Split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=42)
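
As a quick illustration of what `test_size=0.2` and `random_state=42` do, the sketch below splits a tiny made-up array (not the hospital data) twice and checks that the split is the same both times:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 rows, 2 columns (hypothetical values).
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# 20% of 10 rows -> 2 test rows, 8 training rows.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# The same random_state reproduces the same split.
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X, y, test_size=0.2, random_state=42)

print(X_tr.shape, X_te.shape)
print(np.array_equal(X_te, X_te2))
```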


## Train Linear Regression

In [None]:
model = LinearRegression()
model.fit(X_train, y_train)
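
After fitting, the coefficients show how much each feature shifts the prediction. The sketch below uses synthetic, noise-free data (not the hospital data) with known effects, so the recovered coefficients can be checked by eye; the effect sizes 3.0, 2.0 and 0.5 are made up for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic 0/1 features (hypothetical, for illustration only).
X = pd.DataFrame({
    "asthma": rng.integers(0, 2, size=200),
    "gender_M": rng.integers(0, 2, size=200),
})

# Noise-free target: baseline 3.0, +2.0 for asthma, +0.5 for gender_M.
y = 3.0 + 2.0 * X["asthma"] + 0.5 * X["gender_M"]

toy_model = LinearRegression().fit(X, y)

# Pair each coefficient with its column name for readability.
coefs = pd.Series(toy_model.coef_, index=X.columns)
print(coefs)
print(toy_model.intercept_)
```

The same `pd.Series(model.coef_, index=features.columns)` pattern should work on the notebook's fitted `model`.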


## Predict & Evaluate

In [None]:
predictions = model.predict(X_test)

mse = mean_squared_error(y_test, predictions)
print(f"Mean Squared Error: {mse:.2f}")
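
MSE alone is hard to judge because it is in squared units and has no reference point. A minimal sketch, on made-up numbers, of three common companions: RMSE (same units as the target), R², and the MSE of a baseline that always predicts the mean:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Toy values (hypothetical), standing in for y_test and predictions.
y_true = np.array([1.0, 3.0, 4.0, 8.0])
y_pred = np.array([2.0, 3.0, 5.0, 6.0])

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)            # back in the units of the target
r2 = r2_score(y_true, y_pred)  # 1 - mse / baseline_mse

# Baseline: always predict the mean of the observed values.
baseline = np.full_like(y_true, y_true.mean())
baseline_mse = mean_squared_error(y_true, baseline)

print(f"MSE: {mse:.2f}  RMSE: {rmse:.2f}  R^2: {r2:.2f}")
print(f"Baseline MSE: {baseline_mse:.2f}")
```

A model is only useful if its MSE is clearly below the baseline MSE, which is equivalent to R² being well above zero.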


## Visualize Results

In [None]:
plt.scatter(y_test, predictions, alpha=0.5)
plt.plot([y_test.min(), y_test.max()],
         [y_test.min(), y_test.max()],
         color='red', linewidth=2)
plt.xlabel("Actual Length of Stay")
plt.ylabel("Predicted Length of Stay")
plt.title("Actual vs Predicted Length of Stay")
plt.show()


Visual Interpretation:

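Most points lie close to the diagonal (y = x), the line of perfect prediction, which suggests the model predicts length of stay reasonably well. Deviations from the line reflect variation in length of stay that the four input features (gender, asthma, dialysis/end-stage renal status, and iron deficiency) do not explain.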
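
One thing to keep in mind when reading this kind of plot: a linear model fed only four binary features can produce at most 2^4 = 16 distinct predictions, so the points tend to form horizontal bands rather than a cloud. The sketch below illustrates this on synthetic data; the coefficients and noise level are made up:

```python
import itertools
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# All 16 combinations of four 0/1 flags, sampled 500 times.
combos = np.array(list(itertools.product([0, 1], repeat=4)))
X = combos[rng.integers(0, 16, size=500)]

# Synthetic target (hypothetical effects) with continuous noise.
y = 2.0 + X @ np.array([0.5, 1.5, 1.0, 0.8]) + rng.normal(0.0, 1.0, size=500)

toy_model = LinearRegression().fit(X, y)

# Round to merge values that differ only by floating-point noise.
n_distinct = len(np.unique(np.round(toy_model.predict(X), 8)))
print(n_distinct)
```

Counting distinct predictions on the real test set, for example with `np.unique(predictions)`, is a quick way to check whether the same effect applies here.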

## Mini Conclusion

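Linear regression on gender, dialysis status, asthma, and iron deficiency provides a reasonable baseline for predicting patient length of stay.

Predictions align fairly well with actual values, as judged by the mean squared error and the actual-vs-predicted plot.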

