## BloodPulse: Anticipating Blood Donation Trends.
In the ever-evolving landscape of healthcare optimization, a new initiative emerges from weeks of dedicated exploration and learning. Rooted in the rich soil of historical donor data, this project sets out to build a predictive model that forecasts whether a donor will give blood in March 2007. The quest is clear: to improve blood bank resource allocation, reduce shortages, and sharpen outreach strategies.

As the protagonist in this narrative of healthcare innovation, you, the visionary data scientist, stand at the forefront of this mission. Armed with a deep understanding of the intricacies surrounding blood donation patterns, your role is pivotal. The objective is not merely to predict donor participation but to reshape the landscape of blood bank operations, making a tangible impact on the lives that depend on this critical resource.

Much like a guardian of community health, you navigate through extensive datasets, blending quantitative acumen with qualitative insight. Your analytical skills become the guiding compass for healthcare professionals, uncovering patterns and correlations that might have eluded traditional approaches. In collaboration with blood bank experts, you craft a predictive model that transcends conventional methods, offering foresight into donor behaviors.

This model becomes a beacon of hope for blood banks, not only predicting participation but allowing for proactive resource planning and targeted outreach. Your commitment extends beyond lines of code; you are a maestro orchestrating a symphony of blood donation data, transforming it into a harmonious melody of insights that resonates through the blood donation centers.

Your work is not just about algorithms; it's about making a real impact on the lives of those who rely on the generosity of blood donors. Your predictive model stands as a guardian, vigilant in its mission to ensure that blood banks are well-equipped, shortages are minimized, and outreach efforts are optimized for maximum impact.

In the saga of healthcare transformation, you, once again, emerge as the unsung hero. Your dedication to unraveling the complexities of blood donor participation through predictive modeling contributes not only to the success of the project but also to a narrative of improved healthcare, where the power of data-driven insights becomes a formidable ally in ensuring a stable and efficient blood supply for those in need.


In [83]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression

## Module 1
### Task 1: Unveiling the Bloodline Chronicles.
In the realm of healthcare foresight, a new chapter begins as 'training_data.csv' is summoned into existence. This code isn't just a file read; it's a journey into the bloodline chronicles, where columns like 'MonLastDon' and 'TotVolDon' weave a narrative of donor patterns. With each renamed column, the code becomes an architect, constructing a foundation for a predictive model. In these lines, we embark on a quest to understand the intricacies of blood donation, where data becomes the key to unlocking insights that may reshape the landscape of donor participation.

In [84]:
train_df = pd.read_csv("training_data.csv")
train_df.columns = ["MonLastDon","NoDon","TotVolDon","MonFirstDon","DonMar2007"]
train_df

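| | MonLastDon | NoDon | TotVolDon | MonFirstDon | DonMar2007 |
|---|---|---|---|---|---|
| 0 | 2 | 50 | 12500 | 98 | 1 |
| 1 | 0 | 13 | 3250 | 28 | 1 |
| 2 | 1 | 16 | 4000 | 35 | 1 |
| 3 | 2 | 20 | 5000 | 45 | 1 |
| 4 | 1 | 24 | 6000 | 77 | 0 |
| ... | ... | ... | ... | ... | ... |
| 571 | 23 | 1 | 250 | 23 | 0 |
| 572 | 16 | 3 | 750 | 86 | 0 |
| 573 | 21 | 2 | 500 | 52 | 0 |
| 574 | 39 | 1 | 250 | 39 | 0 |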

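The loading step can be sketched as a small helper. This is a minimal sketch, not the notebook's own code: the helper name `load_training_data` is hypothetical, and the header names `c1` to `c5` in the inline sample are placeholders, because the raw header of `training_data.csv` is not shown. The sample rows are the first five rows displayed above.

```python
import io
import pandas as pd

# Column names used by the notebook, in file order.
TRAIN_COLUMNS = ["MonLastDon", "NoDon", "TotVolDon", "MonFirstDon", "DonMar2007"]

def load_training_data(source):
    """Read the training CSV and apply the notebook's short column names.

    Hypothetical helper: checks the column count first so that a file with
    an unexpected layout fails with an explicit message.
    """
    df = pd.read_csv(source)
    if df.shape[1] != len(TRAIN_COLUMNS):
        raise ValueError(f"expected {len(TRAIN_COLUMNS)} columns, got {df.shape[1]}")
    df.columns = TRAIN_COLUMNS
    return df

# Inline stand-in for training_data.csv; the header names c1..c5 are assumptions.
sample_csv = io.StringIO(
    "c1,c2,c3,c4,c5\n"
    "2,50,12500,98,1\n"
    "0,13,3250,28,1\n"
    "1,16,4000,35,1\n"
    "2,20,5000,45,1\n"
    "1,24,6000,77,0\n"
)
sample_train_df = load_training_data(sample_csv)
print(sample_train_df.dtypes)
```

Because the rename is positional, it only labels the data correctly when the file's columns arrive in the expected order; the count check catches a wrong number of columns but not a reordering.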

### Task 2: The Prelude to Predictive Symphony.
In the orchestration of healthcare foresight, 'test_data.csv' emerges as the prelude, setting the stage for a predictive symphony. This code isn't just a data import; it's the overture to a narrative where each entry becomes a note, and every variable a protagonist in the tale of blood donor prediction. With each line executed, the code transforms raw data into the protagonist's script, foreshadowing the predictive journey that awaits. In these lines, the test data becomes a canvas, ready to be painted with insights that may redefine the landscape of blood bank operations.

In [85]:
test_df = pd.read_csv("test_data.csv")
test_df

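| | MonLastDon | NoDon | MonFirstDon | AveDonPerPeriod |
|---|---|---|---|---|
| 0 | 2 | 12 | 52 | 0.692308 |
| 1 | 21 | 7 | 38 | 0.552632 |
| 2 | 4 | 1 | 4 | 0.750000 |
| 3 | 11 | 11 | 38 | 0.868421 |
| 4 | 4 | 12 | 34 | 1.058824 |
| ... | ... | ... | ... | ... |
| 111 | 11 | 9 | 33 | 0.818182 |
| 112 | 16 | 6 | 40 | 0.450000 |
| 113 | 16 | 3 | 19 | 0.473684 |
| 114 | 8 | 15 | 77 | 0.584416 |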

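The two files do not carry the same columns. The sketch below lists the differences, using only the column names visible in the two outputs above; the variable names are illustrative.

```python
# Column names as displayed for the training and test frames.
train_columns = ["MonLastDon", "NoDon", "TotVolDon", "MonFirstDon", "DonMar2007"]
test_columns = ["MonLastDon", "NoDon", "MonFirstDon", "AveDonPerPeriod"]

# Preserve the original column order rather than using unordered sets.
shared = [c for c in train_columns if c in test_columns]
only_in_train = [c for c in train_columns if c not in test_columns]
only_in_test = [c for c in test_columns if c not in train_columns]

print("shared:", shared)
print("only in train:", only_in_train)
print("only in test:", only_in_test)
```

The test file has no target column and already contains `AveDonPerPeriod`, while the training file has `TotVolDon` instead.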

### Task 3: Untangling Threads in the Bloodline Tapestry.
In the intricate tapestry of blood donor data, 'no_of_train_duplicates' steps forward as the detective, counting the rows of 'training_data.csv' that exactly repeat an earlier row. This code isn't merely counting duplications; it's a quest for clarity, ensuring the integrity of our dataset. With each duplicated entry counted, the code becomes a guardian, preserving the purity of the bloodline narrative. In these lines, we embark on a journey to sift through the intricacies, where duplications are not just numbers but potential distortions in the symphony of insights we aim to unravel.


In [86]:
no_of_train_duplicates = train_df.duplicated().sum()
no_of_train_duplicates

153
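
The count above relies on the default `keep="first"`, which flags only the repeats and not the first occurrence. The toy frame below (made-up values) shows how the count changes with `keep`, and what `drop_duplicates` would leave behind. The notebook counts duplicates but does not drop them; since the displayed columns include no donor identifier, identical rows may well be different donors, so dropping them is a judgment call.

```python
import pandas as pd

# Toy data: rows 0, 1 and 3 are identical.
toy = pd.DataFrame({"MonLastDon": [2, 2, 4, 2], "NoDon": [1, 1, 3, 1]})

# keep="first" (the default) flags every repeat after the first occurrence.
repeat_count = int(toy.duplicated().sum())
# keep=False flags every row that has at least one identical twin.
rows_involved = int(toy.duplicated(keep=False).sum())
# drop_duplicates keeps one copy of each distinct row.
deduplicated = toy.drop_duplicates()

print(repeat_count, rows_involved, len(deduplicated))  # 2 3 2
```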

### Task 4: Illuminating Gaps in the Blood Donation Mosaic.
As we journey through the mosaic of blood donation data, 'no_of_train_na' becomes the torchbearer, illuminating gaps within 'training_data.csv.' This code isn't just a numeric summary; it's an artist's palette, pointing to areas where the canvas of donor insights is incomplete. With the null values counted column by column, the code transforms into a brushstroke, guiding us to fill the voids and complete the picture of donor behaviors. In these lines, we embark on a creative journey where null values aren't gaps but opportunities to enhance the richness of our predictive narrative.

In [87]:
no_of_train_na = train_df.isnull().sum()
no_of_train_na

MonLastDon     0
NoDon          0
TotVolDon      0
MonFirstDon    0
DonMar2007     0
dtype: int64
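
`isnull().sum()` returns one count per column. A minimal sketch on made-up data shows two related summaries that are sometimes useful alongside it: the total number of missing cells and the number of rows with at least one gap.

```python
import numpy as np
import pandas as pd

# Toy frame with two missing cells in different rows.
toy = pd.DataFrame(
    {
        "MonLastDon": [2.0, np.nan, 4.0],
        "NoDon": [1.0, 3.0, np.nan],
        "MonFirstDon": [5.0, 6.0, 7.0],
    }
)

per_column = toy.isnull().sum()                           # missing cells per column
total_missing = int(per_column.sum())                     # missing cells overall
rows_with_missing = int(toy.isnull().any(axis=1).sum())   # rows with any gap

print(per_column.to_dict(), total_missing, rows_with_missing)
```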

### Task 5: Echoes in the Uncharted Blood Donor Territory.
In the symphony of blood donation exploration, 'no_of_test_duplicates' takes the stage, counting echoes within the uncharted territory of 'test_data.csv.' This code isn't just a duplication check; it's a sonar navigating unexplored waters, where echoes of redundancy may disrupt the harmony of predictive insights. With each duplicated entry counted, the code becomes a vigilant explorer, ensuring the integrity of our test dataset. In these lines, we unveil echoes in the unknown, allowing us to tread carefully and navigate the uncharted blood donor frontiers with a dataset that resonates with precision.

In [88]:
no_of_test_duplicates = test_df.duplicated().sum()
no_of_test_duplicates

0

### Task 6: Unveiling Blank Spaces in the Blood Donation Canvas.
In the grand canvas of blood donation insights, 'no_of_test_na' emerges as a torchbearer, illuminating the blank spaces within 'test_data.csv.' This code isn't just a numeric summary; it's an artist's palette pointing to areas where the masterpiece of donor behaviors is incomplete. With the null values counted column by column, the code transforms into a brushstroke, guiding us to fill the voids and complete the painting of predictive insights. In these lines, we embark on a creative journey where null values aren't gaps but opportunities to enhance the richness of our predictive narrative.

In [89]:
no_of_test_na = test_df.isnull().sum()
no_of_test_na

MonLastDon         0
NoDon              0
MonFirstDon        0
AveDonPerPeriod    0
dtype: int64
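
The four checks in this module follow the same pattern for both files, so they could be gathered into one report. The helper `quality_report` below is a hypothetical convenience, not part of the notebook, and the two toy frames are made up.

```python
import pandas as pd

def quality_report(frames):
    """Summarise row count, repeated rows and missing cells for each named frame."""
    records = {
        name: {
            "rows": len(df),
            "duplicate_rows": int(df.duplicated().sum()),
            "missing_cells": int(df.isnull().sum().sum()),
        }
        for name, df in frames.items()
    }
    # One row per frame, one column per check.
    return pd.DataFrame.from_dict(records, orient="index")

toy_train = pd.DataFrame({"MonLastDon": [2, 2, 4], "NoDon": [1, 1, 3]})
toy_test = pd.DataFrame({"MonLastDon": [2, None], "NoDon": [12, 7]})

report = quality_report({"train": toy_train, "test": toy_test})
print(report)
```

In the notebook the same call would take `{"train": train_df, "test": test_df}`.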

## Module 2
### Task 1: Harmonizing Blood Donation Metrics.
In the symphony of blood donation metrics, a new melody emerges as 'ratio_totno' takes center stage, calculating the harmony between total volume and number of donations in 'training_data.csv.' This code isn't just about ratios; it's a conductor orchestrating a composition where each entry becomes a note in the melody of donor patterns. With each calculated ratio, the code transforms into a maestro, revealing insights into the nuanced relationship between volume and frequency of donations. In these lines, we delve into the intricacies of blood donation, where ratios become a key to deciphering the symphony of donor behaviors.

In [90]:
train_df["ratio_totno"] = train_df["TotVolDon"] / train_df["NoDon"]
train_df

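| | MonLastDon | NoDon | TotVolDon | MonFirstDon | DonMar2007 | ratio_totno |
|---|---|---|---|---|---|---|
| 0 | 2 | 50 | 12500 | 98 | 1 | 250.0 |
| 1 | 0 | 13 | 3250 | 28 | 1 | 250.0 |
| 2 | 1 | 16 | 4000 | 35 | 1 | 250.0 |
| 3 | 2 | 20 | 5000 | 45 | 1 | 250.0 |
| 4 | 1 | 24 | 6000 | 77 | 0 | 250.0 |
| ... | ... | ... | ... | ... | ... | ... |
| 571 | 23 | 1 | 250 | 23 | 0 | 250.0 |
| 572 | 16 | 3 | 750 | 86 | 0 | 250.0 |
| 573 | 21 | 2 | 500 | 52 | 0 | 250.0 |
| 574 | 39 | 1 | 250 | 39 | 0 | 250.0 |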

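Every row displayed above has a ratio of 250.0, so in those rows `TotVolDon` is simply 250 times `NoDon`. If that holds for the whole frame (which the output does not show), `ratio_totno` is a constant column and cannot help a model tell donors apart. A minimal sketch of the check, using only the displayed rows:

```python
import pandas as pd

# The nine rows displayed in the notebook output, not the full 576-row frame.
shown = pd.DataFrame(
    {
        "NoDon": [50, 13, 16, 20, 24, 1, 3, 2, 1],
        "TotVolDon": [12500, 3250, 4000, 5000, 6000, 250, 750, 500, 250],
    }
)
shown["ratio_totno"] = shown["TotVolDon"] / shown["NoDon"]

# A column with a single distinct value is constant.
distinct_ratios = shown["ratio_totno"].nunique()
is_constant = distinct_ratios == 1

print(distinct_ratios, is_constant)  # 1 True
```

Running `train_df["ratio_totno"].nunique()` in the notebook would settle the question for the full data.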

### Task 2: Sculpting Precision in Blood Donation Insights.
In the evolution of blood donation insights, a transformative act unfolds as 'TotVolDon' is sculpted away from 'training_data.csv.' This code isn't just about dropping a column; it's a chisel refining the intricacies of our dataset. With each line executed, the code becomes an artisan, shaping a narrative where the total volume of donations no longer plays a role in our predictive journey. In these lines, we witness not just a deletion but a refinement, where the precision of our model is honed by strategic sculpting of the dataset.

In [91]:
train_df.drop("TotVolDon",axis=1,inplace=True)
train_df

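| | MonLastDon | NoDon | MonFirstDon | DonMar2007 | ratio_totno |
|---|---|---|---|---|---|
| 0 | 2 | 50 | 98 | 1 | 250.0 |
| 1 | 0 | 13 | 28 | 1 | 250.0 |
| 2 | 1 | 16 | 35 | 1 | 250.0 |
| 3 | 2 | 20 | 45 | 1 | 250.0 |
| 4 | 1 | 24 | 77 | 0 | 250.0 |
| ... | ... | ... | ... | ... | ... |
| 571 | 23 | 1 | 23 | 0 | 250.0 |
| 572 | 16 | 3 | 86 | 0 | 250.0 |
| 573 | 21 | 2 | 52 | 0 | 250.0 |
| 574 | 39 | 1 | 39 | 0 | 250.0 |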

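An in-place drop raises a `KeyError` if the cell is run a second time, because the column is already gone. One possible alternative, sketched on a two-row toy frame, returns a new frame and tolerates a missing column:

```python
import pandas as pd

frame = pd.DataFrame(
    {"NoDon": [50, 13], "TotVolDon": [12500, 3250], "MonFirstDon": [98, 28]}
)

# drop() returns a new frame; errors="ignore" makes a repeated drop a no-op.
trimmed = frame.drop(columns=["TotVolDon"], errors="ignore")
trimmed_again = trimmed.drop(columns=["TotVolDon"], errors="ignore")

print(list(trimmed_again.columns))  # ['NoDon', 'MonFirstDon']
```

The original `frame` keeps its `TotVolDon` column, which is convenient while exploring but uses a little more memory than the in-place form.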

### Task 3: Crafting the Symphony of Blood Donation Features.
In the orchestration of predictive insights, a symphony is crafted as features are refined in 'training_data.csv.' This code isn't just about splitting data; it's a composition where the target 'DonMar2007' is set aside as 'y', and 'NoDon' and 'MonFirstDon' are combined into 'AveDonPerPeriod', the average number of donations per three-month period since the first donation. With each calculated average, the code becomes a conductor orchestrating the harmonious blend of features. In these lines, the dataset metamorphoses into a symphony of insights, and the 80/20 train-test split becomes the stage for our predictive model to unfold.

In [92]:
lastcoltarget = train_df["DonMar2007"].copy()
train_df.drop("DonMar2007",axis=1,inplace=True)
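# One period = 3 months, so this is the number of periods since the first donation
no_period_first_donation = train_df["MonFirstDon"] / 3
# Average number of donations per period over the donor's history
avg_don_per_period = train_df["NoDon"] / no_period_first_donation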
train_df.insert(loc=3,column="AveDonPerPeriod",value=avg_don_per_period)
X = train_df.copy()
y = lastcoltarget.copy()

In [94]:
X

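| | MonLastDon | NoDon | MonFirstDon | AveDonPerPeriod | ratio_totno |
|---|---|---|---|---|---|
| 0 | 2 | 50 | 98 | 1.530612 | 250.0 |
| 1 | 0 | 13 | 28 | 1.392857 | 250.0 |
| 2 | 1 | 16 | 35 | 1.371429 | 250.0 |
| 3 | 2 | 20 | 45 | 1.333333 | 250.0 |
| 4 | 1 | 24 | 77 | 0.935065 | 250.0 |
| ... | ... | ... | ... | ... | ... |
| 571 | 23 | 1 | 23 | 0.130435 | 250.0 |
| 572 | 16 | 3 | 86 | 0.104651 | 250.0 |
| 573 | 21 | 2 | 52 | 0.115385 | 250.0 |
| 574 | 39 | 1 | 39 | 0.076923 | 250.0 |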

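The feature is `NoDon / (MonFirstDon / 3)`; for row 0 that is 50 / (98 / 3), about 1.530612, which matches the table. A minimal sketch of the same calculation as a reusable function follows. The function name, the `months_per_period` parameter, and the handling of a zero-month history are assumptions added for illustration; unlike the notebook, it appends the column at the end instead of inserting it at position 3.

```python
import numpy as np
import pandas as pd

MONTHS_PER_PERIOD = 3  # the notebook treats one period as three months

def add_ave_don_per_period(df, months_per_period=MONTHS_PER_PERIOD):
    """Return a copy of df with AveDonPerPeriod = NoDon / (MonFirstDon / months_per_period)."""
    out = df.copy()
    periods = out["MonFirstDon"] / months_per_period
    # Assumption: a zero-month history is treated as missing instead of dividing by zero.
    periods = periods.replace(0, np.nan)
    out["AveDonPerPeriod"] = out["NoDon"] / periods
    return out

# Three of the rows displayed above (index 0, 1 and 574).
rows = pd.DataFrame({"NoDon": [50, 13, 1], "MonFirstDon": [98, 28, 39]})
featured = add_ave_don_per_period(rows)
print(featured["AveDonPerPeriod"].round(6).tolist())
```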

In [95]:
y

0      1
1      1
2      1
3      1
4      0
      ..
571    0
572    0
573    0
574    0
575    0
Name: DonMar2007, Length: 576, dtype: int64

In [96]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
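
With 576 rows and `test_size=0.2`, this split should hold out 116 rows and train on 460. The notebook does not stratify the split; an optional variant, sketched below on synthetic data, passes `stratify` so that both parts keep roughly the same share of positive labels. The feature values and the 25% positive share are made up for the demonstration and say nothing about the real data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the notebook's row count; all values are invented.
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(
    {"MonLastDon": rng.integers(0, 40, 576), "NoDon": rng.integers(1, 50, 576)}
)
y_demo = pd.Series(
    np.r_[np.ones(144, dtype=int), np.zeros(432, dtype=int)], name="DonMar2007"
)

# stratify=y_demo keeps the class proportions similar in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(X_tr.shape, X_te.shape)
print(round(y_tr.mean(), 3), round(y_te.mean(), 3))
```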

## Module 3
### Task 1: Precision Unveiled: The Logistic Symphony.
In the realm of predictive prowess, a logistic symphony unfolds as 'LogisticRegression' takes the stage. This code isn't just about fitting a model; it's a performance where the maestro, 'logistic_model,' is fitted on the training split and then predicts the held-out 20%. With each prediction, the code becomes a storyteller, and 'accuracy_score' reports the share of held-out rows it labels correctly, rounded to two decimals. In these lines, we witness not just an accuracy score but the culmination of a predictive journey where every variable played its role in the symphony.

In [98]:
logistic_model = LogisticRegression()
logistic_model.fit(X_train,y_train)
y_pred = logistic_model.predict(X_test)
model_accuracy = round(accuracy_score(y_test,y_pred),2)
model_accuracy

0.76
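
An accuracy figure is easier to judge next to a trivial baseline that always predicts the most frequent class. The notebook does not report the class balance, so whether 0.76 beats such a baseline cannot be read from the output. The sketch below shows the comparison on synthetic, imbalanced data using scikit-learn's `DummyClassifier`; the data and variable names are invented for illustration.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data; not the blood donation dataset.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(500, 3))
y_demo = (X_demo[:, 0] + rng.normal(scale=1.5, size=500) > 1.2).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Baseline: always predict the majority class seen in training.
baseline = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
model = LogisticRegression().fit(X_tr, y_tr)

baseline_accuracy = accuracy_score(y_te, baseline.predict(X_te))
model_accuracy_demo = accuracy_score(y_te, model.predict(X_te))
print(f"baseline={baseline_accuracy:.2f} model={model_accuracy_demo:.2f}")
```

In the notebook, the same baseline could be fitted on `X_train, y_train` and scored on `X_test, y_test`.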

### Task 2: The Prediction Ensemble: Reality vs. Forecast.
In the climactic finale of our predictive symphony, 'logistic_model' takes center stage once more. The code below does not score 'test_data.csv'; it places the held-out labels 'y_test' beside the predictions 'y_pred' in 'predictions_df', an ensemble performance where reality meets forecast. With each entry, the code becomes a storyteller, unraveling the harmony between actual and predicted outcomes, and the closing cells restate the 76% accuracy and 24% error rate, first from 'model_accuracy' and then directly from 'predictions_df'. In these lines, we witness not just data frames but a narrative of how our predictive model fares when faced with the reality of blood donation behavior.

In [99]:
# Pair each held-out label with its prediction; y_test's index becomes the row index
predictions_df = pd.DataFrame({"Actual": y_test, "Predicted": y_pred})

In [100]:
predictions_df

| | Actual | Predicted |
|---|---|---|
| 234 | 0 | 0 |
| 118 | 0 | 0 |
| 346 | 0 | 0 |
| 498 | 0 | 0 |
| 402 | 1 | 0 |
| ... | ... | ... |
| 75 | 1 | 0 |
| 355 | 1 | 0 |
| 244 | 0 | 0 |
| 272 | 0 | 0 |

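In the rows displayed above every prediction is 0, and the three actual donors (rows 402, 75 and 355) are all missed. A confusion matrix makes that kind of pattern visible where a single accuracy figure hides it. The sketch below uses only the nine displayed rows, so its counts describe that excerpt and not the full hold-out set.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Only the nine rows displayed above, not the full hold-out set.
shown = pd.DataFrame(
    {
        "Actual": [0, 0, 0, 0, 1, 1, 1, 0, 0],
        "Predicted": [0, 0, 0, 0, 0, 0, 0, 0, 0],
    },
    index=[234, 118, 346, 498, 402, 75, 355, 244, 272],
)

# Rows are actual classes, columns are predicted classes, both ordered [0, 1].
matrix = confusion_matrix(shown["Actual"], shown["Predicted"], labels=[0, 1])
tn, fp, fn, tp = matrix.ravel()

print(matrix)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")  # TN=6 FP=0 FN=3 TP=0
```

In the notebook, `confusion_matrix(y_test, y_pred, labels=[0, 1])` would give the counts for all held-out rows.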

In [106]:
print((model_accuracy * 100),"%")

76.0 %


In [107]:
print(((1 - model_accuracy) * 100),"%")

24.0 %


In [116]:
print(round((predictions_df["Actual"] == predictions_df["Predicted"]).sum() * 100 / len(predictions_df),0),"%")

76.0 %


In [117]:
print(round((predictions_df["Actual"] != predictions_df["Predicted"]).sum() * 100 / len(predictions_df),0),"%")

24.0 %
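
The four cells above compute the same two numbers in two ways: accuracy is the share of rows where the label and the prediction match, and the error rate is its complement. A compact sketch of that relationship on made-up labels; the helper name `accuracy_and_error` is hypothetical.

```python
import pandas as pd

def accuracy_and_error(actual, predicted):
    """Return (accuracy, error_rate) as fractions; the two always sum to 1."""
    # Compare by position, ignoring any index the inputs may carry.
    actual = pd.Series(actual).reset_index(drop=True)
    predicted = pd.Series(predicted).reset_index(drop=True)
    accuracy = float((actual == predicted).mean())
    return accuracy, 1.0 - accuracy

acc, err = accuracy_and_error([0, 0, 1, 1], [0, 0, 0, 1])
print(f"accuracy: {acc:.0%}, error: {err:.0%}")  # accuracy: 75%, error: 25%
```

The `.0%` format specifier multiplies by 100 and appends the percent sign, which avoids the manual `* 100` and the separate `"%"` argument.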
