### Blood Donation Dataset

In [1]:
# Load the library used for this project
import ucimlrepo

In [2]:
from ucimlrepo import fetch_ucirepo

In [4]:
# Fetch the dataset by unique identifier
blood = fetch_ucirepo(id=176)

In [5]:
# The library has already converted the dataset to a Pandas DataFrame
X = blood.data.features
y = blood.data.target

In [6]:
# Display the metadata of the dataset
print(blood.metadata)

{'uci_id': 176, 'name': 'Blood Transfusion Service Center', 'repository_url': 'https://archive.ics.uci.edu/dataset/176/blood+transfusion+service+center', 'data_url': 'https://archive.ics.uci.edu/static/public/176/data.csv', 'abstract': 'Data taken from the Blood Transfusion Service Center in Hsin-Chu City in Taiwan -- this is a classification problem. ', 'area': 'Business', 'tasks': ['Classification'], 'characteristics': ['Multivariate'], 'num_instances': 748, 'num_features': 4, 'feature_types': ['Real'], 'demographics': [], 'target_col': ['Donated_Blood'], 'index_col': None, 'has_missing_values': 'no', 'missing_values_symbol': None, 'year_of_dataset_creation': 2008, 'last_updated': 'Sat Mar 16 2024', 'dataset_doi': '10.24432/C5GS39', 'creators': ['I-Cheng Yeh'], 'intro_paper': {'title': 'Knowledge discovery on RFM model using Bernoulli sequence', 'authors': 'I. Yeh, K. Yang, Tao-Ming Ting', 'published_in': 'Expert systems with applications', 'year': 2009, 'url': 'https://www.semantics

In [7]:
# Variable information
print(blood.variables)

            name     role     type demographic  \
0        Recency  Feature  Integer        None   
1      Frequency  Feature  Integer        None   
2       Monetary  Feature  Integer        None   
3           Time  Feature  Integer        None   
4  Donated_Blood   Target   Binary        None   

                                         description units missing_values  
0                         months since last donation  None             no  
1                          total number of donations  None             no  
2                        total blood donated in c.c.  None             no  
3                        months since first donation  None             no  
4  whether he/she donated blood in March 2007 (1 ...  None             no  


In [8]:
# Display the full dataset
print(blood.data.original)

     Recency  Frequency  Monetary  Time  Donated_Blood
0          2         50     12500    98              1
1          0         13      3250    28              1
2          1         16      4000    35              1
3          2         20      5000    45              1
4          1         24      6000    77              0
..       ...        ...       ...   ...            ...
743       23          2       500    38              0
744       21          2       500    52              0
745       23          3       750    62              0
746       39          1       250    39              0
747       72          1       250    72              0

[748 rows x 5 columns]


In [9]:
# Display the header row of the dataset
print(blood.data.headers)

Index(['Recency', 'Frequency', 'Monetary', 'Time', 'Donated_Blood'], dtype='object')


In [10]:
# Name of the dataset
print(blood.metadata.name)

Blood Transfusion Service Center


In [11]:
# Get the unique identifier for the blood donation dataset
print(blood.metadata.uci_id)

176


In [12]:
# Get the subject area of the blood donation dataset
print(blood.metadata.area)

Business


In [13]:
# Get the characteristics of the blood donation dataset
print(blood.metadata.characteristics)

['Multivariate']


In [14]:
# Get the number of instances in the dataset
print(blood.metadata.num_instances)

748


In [15]:
# Get the number of features in the dataset
print(blood.metadata.num_features)

4


In [16]:
# Get the target variable
print(blood.metadata.target_col)

['Donated_Blood']


In [17]:
# Find out if there are any missing values
print(blood.metadata.has_missing_values)

no


In [18]:
# Year the dataset was created
print(blood.metadata.year_of_dataset_creation)

2008


In [19]:
# Research paper accompanying this dataset
print(blood.metadata.intro_paper)

{'title': 'Knowledge discovery on RFM model using Bernoulli sequence', 'authors': 'I. Yeh, K. Yang, Tao-Ming Ting', 'published_in': 'Expert systems with applications', 'year': 2009, 'url': 'https://www.semanticscholar.org/paper/c8dc14c5758d3d2fdd28bdb9c833f88b939d8dff', 'doi': '10.1016/j.eswa.2008.07.018'}


In [20]:
# URL to get CSV file
print(blood.metadata.data_url)

https://archive.ics.uci.edu/static/public/176/data.csv


In [21]:
# Any other facets of the data
print(blood.metadata.additional_info.summary)

To demonstrate the RFMTC marketing model (a modified version of RFM), this study adopted the donor database of Blood Transfusion Service Center in Hsin-Chu City in Taiwan. The center passes their blood transfusion service bus to one university in Hsin-Chu City to gather blood donated about every three months. To build a FRMTC model, we selected 748 donors at random from the donor database. These 748 donor data, each one included R (Recency - months since last donation), F (Frequency - total number of donation), M (Monetary - total blood donated in c.c.), T (Time - months since first donation), and a binary variable representing whether he/she donated blood in March 2007 (1 stand for donating blood; 0 stands for not donating blood). 


### Split dataset

In [22]:
from sklearn.model_selection import train_test_split

In [None]:
# prompt: split the blood data objects above into a train & test split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

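The last cell produces an error. On closer inspection, it traces back to where I unpacked the `X` and `y` variables: `y` holds a `NoneType` object. The cell reads `blood.data.target`, whereas the attribute documented by `ucimlrepo` is `blood.data.targets` (plural), and the library apparently returns `None` for a name it does not hold instead of raising an error.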
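
The `None` value can be reproduced without the library. The sketch below assumes, based on the behaviour observed in this notebook rather than on the library's source, that the data object acts like a dict whose missing attributes resolve to `None`. `DotDict` and `check_split_inputs` are hypothetical names introduced for illustration; `targets` (plural) is the attribute name used in the `ucimlrepo` documentation.

```python
class DotDict(dict):
    """Hypothetical stand-in: attribute access falls back to dict.get."""
    __getattr__ = dict.get


def check_split_inputs(X, y):
    """Raise a clear error before splitting if either input is missing."""
    if X is None or y is None:
        raise ValueError("X and y must both be set before splitting")


# Toy data: one row from the dataset, keyed like the attributes discussed above.
data = DotDict(features=[[2, 50, 12500, 98]], targets=[1])

print(data.target)   # missing key, so the lookup yields None
print(data.targets)  # existing key, so the stored list is returned

try:
    check_split_inputs(data.features, data.target)
except ValueError as err:
    print(f"caught: {err}")
```

A guard like this turns a silent `None` into an explicit message at the point where the variables are unpacked, instead of a less obvious failure later in the notebook.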

One fix is to read `blood.data.targets` instead. Alternatively, I can fetch the data directly from the UCI ML Repository and select the target column myself, instead of using the predefined DataFrames.
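
A minimal sketch of the second route, selecting the target column by name from a combined DataFrame and then splitting. To stay runnable offline, it rebuilds only the ten rows printed by `blood.data.original` above rather than all 748; the `random_state=42` argument is my own addition for reproducibility and is not part of the original cell.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# First five and last five rows as printed by blood.data.original above.
# The full dataset has 748 rows; this subset is for illustration only.
rows = [
    (2, 50, 12500, 98, 1),
    (0, 13, 3250, 28, 1),
    (1, 16, 4000, 35, 1),
    (2, 20, 5000, 45, 1),
    (1, 24, 6000, 77, 0),
    (23, 2, 500, 38, 0),
    (21, 2, 500, 52, 0),
    (23, 3, 750, 62, 0),
    (39, 1, 250, 39, 0),
    (72, 1, 250, 72, 0),
]
columns = ["Recency", "Frequency", "Monetary", "Time", "Donated_Blood"]
df = pd.DataFrame(rows, columns=columns)

# Separate the features from the target by column name.
target_col = "Donated_Blood"
X = df.drop(columns=[target_col])
y = df[target_col]

# Hold out 25% of the rows for testing (random_state is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
print(X_train.shape, X_test.shape)
```

On the real data, `df` could instead be `blood.data.original` or the result of `pd.read_csv(blood.metadata.data_url)`; both should carry the same five columns shown earlier.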

---