# Table of Contents

0.1 Import Libraries

0.2 Import Data: all_donors_clean.pkl and census_2013-22_clean.pkl

0.3 Explore Original Dataframes

0.4 Prep for merge

0.5 Outer Merge - census and living donors -> dfm

0.6 Outer Merge - dfm and deceased donors -> dfm

0.7 Derive total_donors column

0.8 Export Merged Dataframe: donors-census.pkl



### 0.1 Import Libraries

In [1]:
# Import libraries
import pandas as pd
import numpy as np
import os

### 0.2 Import Data: all_donors_clean.pkl and census_2013-22_clean.pkl


In [2]:
# Identify the file pathway to data files
path = r'C:\Users\CJ\Documents\_CJ-Stuff\Career Foundry\Data Immersion\Ach 6 - Adv Analytics and Dashboard\Donate Life Project'

In [3]:
# Import the cleaned donors data
donors = pd.read_pickle(os.path.join(path, '02 Data', 'Prepared Data', 'all_donors_clean.pkl'))

In [4]:
# Import the cleaned census data
census = pd.read_pickle(os.path.join(path, '02 Data', 'Prepared Data', 'census_2013-22_clean.pkl'))

### 0.3 Explore Original Dataframes

In [5]:
donors.shape

(76160, 7)

In [6]:
census.shape

(56000, 6)

Both shapes match what we expected from the previously exported files.

In [7]:
donors.head()

Unnamed: 0,number_donors,donor_type,year,state,age_group,gender,ethnicity
0,8,Living,2022,Alabama,18-34 Years,Male,White (Non-Hispanic)
5440,1,Living,2022,Alabama,18-34 Years,Male,Black (Non-Hispanic)
10880,0,Living,2022,Alabama,18-34 Years,Male,Hispanic/Latino
16320,0,Living,2022,Alabama,18-34 Years,Male,Asian (Non-Hispanic)
21760,0,Living,2022,Alabama,18-34 Years,Male,American Indian/Alaska Native (Non-Hispanic)


In [8]:
donors.tail()

Unnamed: 0,number_donors,donor_type,year,state,age_group,gender,ethnicity
54399,0,Deceased,2013,Wyoming,50-64 Years,Female,Hispanic/Latino
59839,0,Deceased,2013,Wyoming,50-64 Years,Female,Asian (Non-Hispanic)
65279,1,Deceased,2013,Wyoming,50-64 Years,Female,American Indian/Alaska Native (Non-Hispanic)
70719,0,Deceased,2013,Wyoming,50-64 Years,Female,Pacific Islander (Non-Hispanic)
76159,0,Deceased,2013,Wyoming,50-64 Years,Female,Multiracial (Non-Hispanic)


In [9]:
census.head()

Unnamed: 0,population,year,state,age_group,gender,ethnicity
0,96.0,2013,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic)
5600,123.0,2014,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic)
11200,78.0,2015,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic)
16800,85.0,2016,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic)
22400,95.0,2017,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic)


In [10]:
census.tail()

Unnamed: 0,population,year,state,age_group,gender,ethnicity
33599,45781.0,2018,Wyoming,65+,Female,White (Non-Hispanic)
39199,47316.0,2019,Wyoming,65+,Female,White (Non-Hispanic)
44799,47690.0,2020,Wyoming,65+,Female,White (Non-Hispanic)
50399,49349.0,2021,Wyoming,65+,Female,White (Non-Hispanic)
55999,50885.0,2022,Wyoming,65+,Female,White (Non-Hispanic)


Data for both dataframes appears as expected.

In [11]:
donors.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 76160 entries, 0 to 76159
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype   
---  ------         --------------  -----   
 0   number_donors  76160 non-null  int16   
 1   donor_type     76160 non-null  category
 2   year           76160 non-null  int16   
 3   state          76160 non-null  category
 4   age_group      76160 non-null  category
 5   gender         76160 non-null  category
 6   ethnicity      76160 non-null  category
dtypes: category(5), int16(2)
memory usage: 1.2 MB


In [12]:
census.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 56000 entries, 0 to 55999
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   population  56000 non-null  float64 
 1   year        56000 non-null  int16   
 2   state       56000 non-null  category
 3   age_group   56000 non-null  category
 4   gender      56000 non-null  category
 5   ethnicity   56000 non-null  category
dtypes: category(4), float64(1), int16(1)
memory usage: 1.2 MB


All of the column names were imported correctly, and the data types of the columns shared by both dataframes (year, state, age_group, gender, ethnicity) match.
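The dtype comparison can also be done programmatically rather than by reading the two `info()` outputs. The following is a minimal sketch on toy stand-ins; `donors_demo`, `census_demo`, `shared_cols` and `dtype_check` are hypothetical names, not part of the notebook.

```python
import pandas as pd

# Toy stand-ins for the real dataframes (hypothetical values)
donors_demo = pd.DataFrame({
    'number_donors': pd.Series([8, 1], dtype='int16'),
    'year': pd.Series([2022, 2022], dtype='int16'),
    'state': pd.Series(['Alabama', 'Alabama'], dtype='category'),
})
census_demo = pd.DataFrame({
    'population': [96.0, 123.0],
    'year': pd.Series([2013, 2014], dtype='int16'),
    'state': pd.Series(['Alabama', 'Alabama'], dtype='category'),
})

# Columns present in both dataframes
shared_cols = [c for c in donors_demo.columns if c in census_demo.columns]

# Side-by-side dtype names, with a flag for whether they agree
dtype_check = pd.DataFrame({
    'donors': donors_demo[shared_cols].dtypes.astype(str),
    'census': census_demo[shared_cols].dtypes.astype(str),
})
dtype_check['match'] = dtype_check['donors'] == dtype_check['census']
print(dtype_check)
```

Any row where `match` is False would point to a column worth converting before the merge.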

### 0.4 Prep for merge

Steps to prepare for this merge:

1 ) Split the donors dataframe into two: living donors and deceased donors.

2 ) Create a key column in each of the three dataframes (living, deceased, census) by concatenating year-state-age_group-gender-ethnicity.
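As a rough illustration of the key described in step 2, the sketch below builds it with a small helper. `make_key`, `KEY_COLS` and `demo` are hypothetical names introduced here; the notebook itself builds the key inline in each cell.

```python
import pandas as pd

# Columns that make up the merge key, in order
KEY_COLS = ['year', 'state', 'age_group', 'gender', 'ethnicity']

def make_key(df, cols=KEY_COLS):
    """Join the given columns with '-' and strip spaces (hypothetical helper)."""
    parts = [df[c].astype(str) for c in cols]
    key = parts[0]
    for part in parts[1:]:
        key = key + '-' + part
    # regex=False treats ' ' as a literal character
    return key.str.replace(' ', '', regex=False)

# Toy rows shaped like the donors data
demo = pd.DataFrame({
    'donor_type': ['Living', 'Deceased'],
    'year': [2022, 2013],
    'state': ['Alabama', 'Wyoming'],
    'age_group': ['18-34 Years', '50-64 Years'],
    'gender': ['Male', 'Female'],
    'ethnicity': ['White (Non-Hispanic)', 'Hispanic/Latino'],
})
demo['key'] = make_key(demo)
print(demo['key'].tolist())
```

Using one helper for all three dataframes keeps the key construction identical across them, which matters because the merge only matches on exact string equality.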

In [13]:
# Confirming how many living and deceased donor rows are in the original dataframe.
donors['donor_type'].value_counts()

Deceased    49476
Living      26684
Name: donor_type, dtype: int64

In [14]:
# Creating a dataframe just for the living donors
living_d = donors[donors['donor_type']=='Living']
living_d.shape

(26684, 7)

In [15]:
# Confirming the df
living_d['donor_type'].value_counts()

Living      26684
Deceased        0
Name: donor_type, dtype: int64

In [16]:
# Creating a dataframe just for the deceased donors
deceased_d = donors[donors['donor_type']=='Deceased']
deceased_d.shape

(49476, 7)

In [17]:
# Confirming the df
deceased_d['donor_type'].value_counts()

Deceased    49476
Living          0
Name: donor_type, dtype: int64

Each of these new subsets looks as expected.

#### Creating key columns of year-state-age_group-gender-ethnicity

In [18]:
# Create key
living_d['key'] = (living_d['year'].astype(str)+'-'+living_d['state'].astype(str)+'-'+living_d['age_group'].astype(str)+'-'+living_d['gender'].astype(str)+'-'+living_d['ethnicity'].astype(str)).str.replace(" ", "")

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  living_d['key'] = (living_d['year'].astype(str)+'-'+living_d['state'].astype(str)+'-'+living_d['age_group'].astype(str)+'-'+living_d['gender'].astype(str)+'-'+living_d['ethnicity'].astype(str)).str.replace(" ", "")
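The warning above appears because `living_d` was created by filtering `donors`, so pandas cannot tell whether the new column is meant for the subset or the original. One common way to avoid it, sketched here on toy data with the hypothetical names `donors_demo` and `living_demo`, is to take an explicit `.copy()` when creating the subset.

```python
import pandas as pd

# Toy stand-in for the donors dataframe (hypothetical values)
donors_demo = pd.DataFrame({
    'number_donors': [8, 2],
    'donor_type': ['Living', 'Deceased'],
    'year': [2022, 2022],
    'state': ['Alabama', 'Alabama'],
})

# .copy() makes the subset an independent dataframe, so assigning a new
# column to it is unambiguous and does not trigger the warning
living_demo = donors_demo[donors_demo['donor_type'] == 'Living'].copy()
living_demo['key'] = living_demo['year'].astype(str) + '-' + living_demo['state']

print(living_demo)
print('key' in donors_demo.columns)  # the original dataframe is untouched
```

The key column is still created on the subset either way; the copy only makes the intent explicit.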


In [19]:
# Confirm key looks generally correct
living_d.head()

Unnamed: 0,number_donors,donor_type,year,state,age_group,gender,ethnicity,key
0,8,Living,2022,Alabama,18-34 Years,Male,White (Non-Hispanic),2022-Alabama-18-34Years-Male-White(Non-Hispanic)
5440,1,Living,2022,Alabama,18-34 Years,Male,Black (Non-Hispanic),2022-Alabama-18-34Years-Male-Black(Non-Hispanic)
10880,0,Living,2022,Alabama,18-34 Years,Male,Hispanic/Latino,2022-Alabama-18-34Years-Male-Hispanic/Latino
16320,0,Living,2022,Alabama,18-34 Years,Male,Asian (Non-Hispanic),2022-Alabama-18-34Years-Male-Asian(Non-Hispanic)
21760,0,Living,2022,Alabama,18-34 Years,Male,American Indian/Alaska Native (Non-Hispanic),2022-Alabama-18-34Years-Male-AmericanIndian/Al...


In [20]:
living_d.tail()

Unnamed: 0,number_donors,donor_type,year,state,age_group,gender,ethnicity,key
50865,0,Living,2013,Wyoming,50-64 Years,Female,Hispanic/Latino,2013-Wyoming-50-64Years-Female-Hispanic/Latino
56305,0,Living,2013,Wyoming,50-64 Years,Female,Asian (Non-Hispanic),2013-Wyoming-50-64Years-Female-Asian(Non-Hispa...
61745,0,Living,2013,Wyoming,50-64 Years,Female,American Indian/Alaska Native (Non-Hispanic),2013-Wyoming-50-64Years-Female-AmericanIndian/...
67185,0,Living,2013,Wyoming,50-64 Years,Female,Pacific Islander (Non-Hispanic),2013-Wyoming-50-64Years-Female-PacificIslander...
72625,0,Living,2013,Wyoming,50-64 Years,Female,Multiracial (Non-Hispanic),2013-Wyoming-50-64Years-Female-Multiracial(Non...


In [21]:
deceased_d['key'] = (deceased_d['year'].astype(str)+'-'+deceased_d['state'].astype(str).str.replace(" ", "")+'-'+deceased_d['age_group'].astype(str)+'-'+deceased_d['gender'].astype(str)+'-'+deceased_d['ethnicity'].astype(str)).str.replace(" ", "")

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  deceased_d['key'] = (deceased_d['year'].astype(str)+'-'+deceased_d['state'].astype(str).str.replace(" ", "")+'-'+deceased_d['age_group'].astype(str)+'-'+deceased_d['gender'].astype(str)+'-'+deceased_d['ethnicity'].astype(str)).str.replace(" ", "")


In [22]:
deceased_d.head()

Unnamed: 0,number_donors,donor_type,year,state,age_group,gender,ethnicity,key
1906,2,Deceased,2022,Alabama,< 1 Year,Male,White (Non-Hispanic),2022-Alabama-<1Year-Male-White(Non-Hispanic)
7346,0,Deceased,2022,Alabama,< 1 Year,Male,Black (Non-Hispanic),2022-Alabama-<1Year-Male-Black(Non-Hispanic)
12786,0,Deceased,2022,Alabama,< 1 Year,Male,Hispanic/Latino,2022-Alabama-<1Year-Male-Hispanic/Latino
18226,0,Deceased,2022,Alabama,< 1 Year,Male,Asian (Non-Hispanic),2022-Alabama-<1Year-Male-Asian(Non-Hispanic)
23666,0,Deceased,2022,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2022-Alabama-<1Year-Male-AmericanIndian/Alaska...


In [23]:
deceased_d.tail()

Unnamed: 0,number_donors,donor_type,year,state,age_group,gender,ethnicity,key
54399,0,Deceased,2013,Wyoming,50-64 Years,Female,Hispanic/Latino,2013-Wyoming-50-64Years-Female-Hispanic/Latino
59839,0,Deceased,2013,Wyoming,50-64 Years,Female,Asian (Non-Hispanic),2013-Wyoming-50-64Years-Female-Asian(Non-Hispa...
65279,1,Deceased,2013,Wyoming,50-64 Years,Female,American Indian/Alaska Native (Non-Hispanic),2013-Wyoming-50-64Years-Female-AmericanIndian/...
70719,0,Deceased,2013,Wyoming,50-64 Years,Female,Pacific Islander (Non-Hispanic),2013-Wyoming-50-64Years-Female-PacificIslander...
76159,0,Deceased,2013,Wyoming,50-64 Years,Female,Multiracial (Non-Hispanic),2013-Wyoming-50-64Years-Female-Multiracial(Non...


In [24]:
census['key'] = (census['year'].astype(str)+'-'+census['state'].astype(str)+'-'+census['age_group'].astype(str)+'-'+census['gender'].astype(str)+'-'+census['ethnicity'].astype(str)).str.replace(" ", "")

In [25]:
census.head()

Unnamed: 0,population,year,state,age_group,gender,ethnicity,key
0,96.0,2013,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2013-Alabama-<1Year-Male-AmericanIndian/Alaska...
5600,123.0,2014,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2014-Alabama-<1Year-Male-AmericanIndian/Alaska...
11200,78.0,2015,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2015-Alabama-<1Year-Male-AmericanIndian/Alaska...
16800,85.0,2016,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2016-Alabama-<1Year-Male-AmericanIndian/Alaska...
22400,95.0,2017,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2017-Alabama-<1Year-Male-AmericanIndian/Alaska...


In [26]:
census.tail()

Unnamed: 0,population,year,state,age_group,gender,ethnicity,key
33599,45781.0,2018,Wyoming,65+,Female,White (Non-Hispanic),2018-Wyoming-65+-Female-White(Non-Hispanic)
39199,47316.0,2019,Wyoming,65+,Female,White (Non-Hispanic),2019-Wyoming-65+-Female-White(Non-Hispanic)
44799,47690.0,2020,Wyoming,65+,Female,White (Non-Hispanic),2020-Wyoming-65+-Female-White(Non-Hispanic)
50399,49349.0,2021,Wyoming,65+,Female,White (Non-Hispanic),2021-Wyoming-65+-Female-White(Non-Hispanic)
55999,50885.0,2022,Wyoming,65+,Female,White (Non-Hispanic),2022-Wyoming-65+-Female-White(Non-Hispanic)


All of the keys appear to have been generated correctly.
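Looking at `head()` and `tail()` shows the format of the keys but not whether each key occurs only once, which the merges that follow depend on. A minimal sketch of a uniqueness check, using a toy dataframe (`census_demo` and `dupes` are hypothetical names) that deliberately contains one repeated key:

```python
import pandas as pd

# Toy key column with one deliberate duplicate (hypothetical values)
census_demo = pd.DataFrame({
    'key': [
        '2013-Alabama-<1Year-Male-Hispanic/Latino',
        '2014-Alabama-<1Year-Male-Hispanic/Latino',
        '2013-Alabama-<1Year-Male-Hispanic/Latino',
    ],
})

# True only if every key appears exactly once
print(census_demo['key'].is_unique)

# Show every row whose key is shared with another row
dupes = census_demo[census_demo['key'].duplicated(keep=False)]
print(dupes)
```

On the real dataframes, `is_unique` would be expected to return True for `census`, `living_d` and `deceased_d` if the key fully identifies a demographic slice.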

### 0.5 Outer Merge of census and living dataframes into dfm

In [27]:
# Revisiting the shapes of the two dfs to be merged
census.shape

(56000, 7)

In [28]:
living_d.shape

(26684, 8)

In [29]:
# Merging only the needed columns
dfm = census.merge(living_d[['key', 'number_donors']], on = ['key'], how = 'outer', indicator = True)

#### Confirm merge

In [30]:
dfm.shape

(56000, 9)

In [31]:
dfm.head()

Unnamed: 0,population,year,state,age_group,gender,ethnicity,key,number_donors,_merge
0,96.0,2013,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2013-Alabama-<1Year-Male-AmericanIndian/Alaska...,,left_only
1,123.0,2014,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2014-Alabama-<1Year-Male-AmericanIndian/Alaska...,,left_only
2,78.0,2015,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2015-Alabama-<1Year-Male-AmericanIndian/Alaska...,,left_only
3,85.0,2016,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2016-Alabama-<1Year-Male-AmericanIndian/Alaska...,,left_only
4,95.0,2017,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2017-Alabama-<1Year-Male-AmericanIndian/Alaska...,,left_only


In [32]:
dfm['_merge'].value_counts()

left_only     29316
both          26684
right_only        0
Name: _merge, dtype: int64

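All numbers are as expected: every one of the 26,684 living-donor rows matched a census key (both), no donor rows were left unmatched (right_only = 0), and the merged dataframe kept the 56,000 rows of the census data.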
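The same checks can be expressed in code so that a problem raises an error instead of relying on a visual read of the counts. The sketch below uses the `validate` argument of `DataFrame.merge` on toy data; `census_demo`, `living_demo` and `merged_demo` are hypothetical names.

```python
import pandas as pd

# Toy stand-ins: three census slices, two of which have living donors
census_demo = pd.DataFrame({'key': ['a', 'b', 'c'],
                            'population': [96.0, 123.0, 78.0]})
living_demo = pd.DataFrame({'key': ['b', 'c'],
                            'number_donors': [8, 1]})

# validate='one_to_one' raises a MergeError if a key repeats on either side
merged_demo = census_demo.merge(
    living_demo, on='key', how='outer',
    indicator=True, validate='one_to_one',
)

# The outer merge should not add rows beyond the census slices
assert len(merged_demo) == len(census_demo)
# No donor row should be missing a census match
assert (merged_demo['_merge'] == 'right_only').sum() == 0

print(merged_demo['_merge'].value_counts())
```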

#### Rename number_donors to living_donors

In [33]:
dfm.rename(columns ={'number_donors': 'living_donors'}, inplace = True)

In [34]:
# Confirm column is renamed
dfm.head()

Unnamed: 0,population,year,state,age_group,gender,ethnicity,key,living_donors,_merge
0,96.0,2013,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2013-Alabama-<1Year-Male-AmericanIndian/Alaska...,,left_only
1,123.0,2014,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2014-Alabama-<1Year-Male-AmericanIndian/Alaska...,,left_only
2,78.0,2015,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2015-Alabama-<1Year-Male-AmericanIndian/Alaska...,,left_only
3,85.0,2016,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2016-Alabama-<1Year-Male-AmericanIndian/Alaska...,,left_only
4,95.0,2017,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2017-Alabama-<1Year-Male-AmericanIndian/Alaska...,,left_only


#### Remove merge flag

In [35]:
dfm = dfm.drop(columns='_merge')

In [36]:
# Confirming the correct column was dropped
dfm.shape

(56000, 8)

In [37]:
dfm.head()

Unnamed: 0,population,year,state,age_group,gender,ethnicity,key,living_donors
0,96.0,2013,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2013-Alabama-<1Year-Male-AmericanIndian/Alaska...,
1,123.0,2014,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2014-Alabama-<1Year-Male-AmericanIndian/Alaska...,
2,78.0,2015,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2015-Alabama-<1Year-Male-AmericanIndian/Alaska...,
3,85.0,2016,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2016-Alabama-<1Year-Male-AmericanIndian/Alaska...,
4,95.0,2017,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2017-Alabama-<1Year-Male-AmericanIndian/Alaska...,


### 0.6 Outer Merge of dfm and deceased dataframes

In [38]:
# Revisiting the shapes of the two dfs to be merged
deceased_d.shape

(49476, 8)

In [39]:
dfm.shape

(56000, 8)

In [40]:
# Merging only the needed columns
dfm = dfm.merge(deceased_d[['key', 'number_donors']], on = ['key'], how = 'outer', indicator = True)

#### Confirm merge

In [41]:
dfm.shape

(56000, 10)

In [42]:
dfm.head()

Unnamed: 0,population,year,state,age_group,gender,ethnicity,key,living_donors,number_donors,_merge
0,96.0,2013,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2013-Alabama-<1Year-Male-AmericanIndian/Alaska...,,0.0,both
1,123.0,2014,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2014-Alabama-<1Year-Male-AmericanIndian/Alaska...,,,left_only
2,78.0,2015,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2015-Alabama-<1Year-Male-AmericanIndian/Alaska...,,,left_only
3,85.0,2016,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2016-Alabama-<1Year-Male-AmericanIndian/Alaska...,,0.0,both
4,95.0,2017,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2017-Alabama-<1Year-Male-AmericanIndian/Alaska...,,0.0,both


In [43]:
dfm['_merge'].value_counts()

both          49476
left_only      6524
right_only        0
Name: _merge, dtype: int64

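All numbers are as expected: all 49,476 deceased-donor rows matched a key in dfm (both), none were left unmatched (right_only = 0), and the row count is still 56,000.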

#### Renaming number_donors to deceased_donors

In [44]:
dfm.rename(columns ={'number_donors': 'deceased_donors'}, inplace = True)

In [45]:
# Confirming correct column was renamed
dfm.head()

Unnamed: 0,population,year,state,age_group,gender,ethnicity,key,living_donors,deceased_donors,_merge
0,96.0,2013,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2013-Alabama-<1Year-Male-AmericanIndian/Alaska...,,0.0,both
1,123.0,2014,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2014-Alabama-<1Year-Male-AmericanIndian/Alaska...,,,left_only
2,78.0,2015,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2015-Alabama-<1Year-Male-AmericanIndian/Alaska...,,,left_only
3,85.0,2016,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2016-Alabama-<1Year-Male-AmericanIndian/Alaska...,,0.0,both
4,95.0,2017,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2017-Alabama-<1Year-Male-AmericanIndian/Alaska...,,0.0,both


#### Remove merge flag

In [46]:
dfm = dfm.drop(columns='_merge')

In [47]:
dfm.shape

(56000, 9)

In [48]:
dfm.head()

Unnamed: 0,population,year,state,age_group,gender,ethnicity,key,living_donors,deceased_donors
0,96.0,2013,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2013-Alabama-<1Year-Male-AmericanIndian/Alaska...,,0.0
1,123.0,2014,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2014-Alabama-<1Year-Male-AmericanIndian/Alaska...,,
2,78.0,2015,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2015-Alabama-<1Year-Male-AmericanIndian/Alaska...,,
3,85.0,2016,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2016-Alabama-<1Year-Male-AmericanIndian/Alaska...,,0.0
4,95.0,2017,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2017-Alabama-<1Year-Male-AmericanIndian/Alaska...,,0.0


In [49]:
dfm.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 56000 entries, 0 to 55999
Data columns (total 9 columns):
 #   Column           Non-Null Count  Dtype   
---  ------           --------------  -----   
 0   population       56000 non-null  float64 
 1   year             56000 non-null  int16   
 2   state            56000 non-null  category
 3   age_group        56000 non-null  category
 4   gender           56000 non-null  category
 5   ethnicity        56000 non-null  category
 6   key              56000 non-null  object  
 7   living_donors    26684 non-null  float64 
 8   deceased_donors  49476 non-null  float64 
dtypes: category(4), float64(3), int16(1), object(1)
memory usage: 2.5+ MB


#### Replace NaNs with zero

When a particular demographic (state-gender-age_group-ethnicity) had no donors in a given year, the state did not report any data for it. These instances are currently represented by a null (NaN); the cells below change them to zeros for clarity.
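A minimal sketch of this step on toy data is shown here, using `fillna` on both donor columns at once. The cast to `int16` is an optional extra and an assumption on my part (the notebook keeps these columns as floats); `dfm_demo` and `donor_cols` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the merged dataframe (hypothetical values)
dfm_demo = pd.DataFrame({
    'living_donors': [np.nan, 8.0, np.nan],
    'deceased_donors': [0.0, np.nan, 2.0],
})

donor_cols = ['living_donors', 'deceased_donors']

# Replace missing counts with zero, then (optionally) store them as integers,
# which is only possible once no NaN values remain
dfm_demo[donor_cols] = dfm_demo[donor_cols].fillna(0).astype('int16')

print(dfm_demo)
print(dfm_demo.isnull().sum())
```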

In [50]:
dfm['living_donors'] = dfm['living_donors'].replace(np.nan, 0)

In [51]:
dfm['deceased_donors'] = dfm['deceased_donors'].replace(np.nan, 0)

In [52]:
dfm.head()

Unnamed: 0,population,year,state,age_group,gender,ethnicity,key,living_donors,deceased_donors
0,96.0,2013,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2013-Alabama-<1Year-Male-AmericanIndian/Alaska...,0.0,0.0
1,123.0,2014,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2014-Alabama-<1Year-Male-AmericanIndian/Alaska...,0.0,0.0
2,78.0,2015,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2015-Alabama-<1Year-Male-AmericanIndian/Alaska...,0.0,0.0
3,85.0,2016,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2016-Alabama-<1Year-Male-AmericanIndian/Alaska...,0.0,0.0
4,95.0,2017,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2017-Alabama-<1Year-Male-AmericanIndian/Alaska...,0.0,0.0


In [53]:
dfm.isnull().sum()

population         0
year               0
state              0
age_group          0
gender             0
ethnicity          0
key                0
living_donors      0
deceased_donors    0
dtype: int64

All null values have been changed to zeros.

### 0.7 Derive total_donors column

In [54]:
# Adding a column for total_donors for each demographic slice
dfm['total_donors'] = dfm['living_donors'] + dfm['deceased_donors']

In [55]:
# Confirming the look of the new column
dfm.shape

(56000, 10)

In [56]:
dfm.head()

Unnamed: 0,population,year,state,age_group,gender,ethnicity,key,living_donors,deceased_donors,total_donors
0,96.0,2013,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2013-Alabama-<1Year-Male-AmericanIndian/Alaska...,0.0,0.0,0.0
1,123.0,2014,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2014-Alabama-<1Year-Male-AmericanIndian/Alaska...,0.0,0.0,0.0
2,78.0,2015,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2015-Alabama-<1Year-Male-AmericanIndian/Alaska...,0.0,0.0,0.0
3,85.0,2016,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2016-Alabama-<1Year-Male-AmericanIndian/Alaska...,0.0,0.0,0.0
4,95.0,2017,Alabama,< 1 Year,Male,American Indian/Alaska Native (Non-Hispanic),2017-Alabama-<1Year-Male-AmericanIndian/Alaska...,0.0,0.0,0.0


In [57]:
# Confirming it contains the correct data.

In [58]:
dfm['living_donors'].sum() + dfm['deceased_donors'].sum()

169541.0

In [59]:
dfm['total_donors'].sum()

169541.0
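Because total_donors is defined as the sum of the two donor columns, the two totals above agree by construction. A further check, sketched here on toy data and assuming the original donors dataframe is still in memory, compares the merged total against the sum of number_donors in the source, which would reveal any donor counts lost or duplicated during the merges. `donors_demo` and `dfm_demo` are hypothetical names.

```python
import pandas as pd

# Toy stand-in for the original donors dataframe (hypothetical values)
donors_demo = pd.DataFrame({
    'key': ['a', 'b', 'a'],
    'donor_type': ['Living', 'Living', 'Deceased'],
    'number_donors': [8, 1, 2],
})

# Toy stand-in for the merged dataframe after NaNs were replaced with zero
dfm_demo = pd.DataFrame({
    'key': ['a', 'b', 'c'],
    'living_donors': [8.0, 1.0, 0.0],
    'deceased_donors': [2.0, 0.0, 0.0],
})
dfm_demo['total_donors'] = dfm_demo['living_donors'] + dfm_demo['deceased_donors']

# Total donors before and after the merge should be identical
source_total = donors_demo['number_donors'].sum()
merged_total = dfm_demo['total_donors'].sum()
assert source_total == merged_total

print(source_total, merged_total)
```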

### 0.8 Export Merged Dataframe: donors-census.pkl

In [63]:
# Confirming final shape and data types
dfm.shape

(56000, 10)

In [64]:
dfm.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 56000 entries, 0 to 55999
Data columns (total 10 columns):
 #   Column           Non-Null Count  Dtype   
---  ------           --------------  -----   
 0   population       56000 non-null  float64 
 1   year             56000 non-null  int16   
 2   state            56000 non-null  category
 3   age_group        56000 non-null  category
 4   gender           56000 non-null  category
 5   ethnicity        56000 non-null  category
 6   key              56000 non-null  object  
 7   living_donors    56000 non-null  float64 
 8   deceased_donors  56000 non-null  float64 
 9   total_donors     56000 non-null  float64 
dtypes: category(4), float64(4), int16(1), object(1)
memory usage: 2.9+ MB


In [65]:
# Export df as a pickle file for future analysis in Python
dfm.to_pickle(os.path.join(path, '02 Data','Prepared Data', 'donors-census.pkl'))

In [66]:
# Export a copy of the df as .csv that can be opened in Excel
dfm.to_csv(os.path.join(path, '02 Data','Prepared Data', 'donors-census.csv'), index = False)
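The two export formats behave differently when read back: a pickle preserves the dataframe's dtypes, while a CSV stores only text, so category and int16 columns come back as more general types. The sketch below illustrates this on a toy dataframe written to a temporary folder; `dfm_demo`, `from_pkl`, `from_csv` and the file names are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for the merged dataframe (hypothetical values)
dfm_demo = pd.DataFrame({
    'year': pd.Series([2013, 2014], dtype='int16'),
    'state': pd.Series(['Alabama', 'Wyoming'], dtype='category'),
    'total_donors': [0.0, 3.0],
})

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = os.path.join(tmp, 'demo.pkl')
    csv_path = os.path.join(tmp, 'demo.csv')
    dfm_demo.to_pickle(pkl_path)
    dfm_demo.to_csv(csv_path, index=False)
    from_pkl = pd.read_pickle(pkl_path)
    from_csv = pd.read_csv(csv_path)

# The pickle round trip reproduces values and dtypes exactly
pd.testing.assert_frame_equal(from_pkl, dfm_demo)

# The CSV round trip keeps the values but not the compact dtypes
print(from_pkl.dtypes)
print(from_csv.dtypes)
```

This is why the pickle is the better input for further work in Python, with the CSV kept for viewing in Excel.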