<a href="https://colab.research.google.com/github/carolinehagood/covid-project/blob/main/Combining_data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [40]:
import pandas as pd

# Rename the date columns in the deaths data and drop the leftover index
# column from the vaccine data so the two DataFrames line up
vaccine_df = pd.read_csv('clean_vaccine_data.csv')
deaths_df = pd.read_csv('Covid_deaths_cleaned.csv')
deaths_df.rename(columns={'Start Date': 'Start_Date', 'End Date': 'End_Date'}, inplace=True)
vaccine_df.drop(columns=['Unnamed: 0'], inplace=True)
vaccine_df.head()

   Start_Date    End_Date  Total_Doses  Week
0  2020-12-13  2020-12-19      1139003  51.0
1  2020-12-20  2020-12-26      1878155  52.0
2  2020-12-27  2021-01-02      3105966  53.0
3  2021-01-03  2021-01-09      5613572   1.0
4  2021-01-10  2021-01-16      7179128   2.0
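The `Unnamed: 0` column typically appears when a CSV was written with pandas' default `to_csv(index=True)`, which saves the old row index as an unlabeled first column. A minimal sketch, assuming `clean_vaccine_data.csv` was produced that way (the inline CSV is a hypothetical two-row stand-in copied from the output above, not the real file): reading with `index_col=0` turns that column back into the index, so no drop step is needed.

```python
import io
import pandas as pd

# Hypothetical stand-in for clean_vaccine_data.csv: the empty first header
# cell is what to_csv writes when the index is saved.
csv_text = """,Start_Date,End_Date,Total_Doses,Week
0,2020-12-13,2020-12-19,1139003,51.0
1,2020-12-20,2020-12-26,1878155,52.0
"""

# Default read: the saved index comes back as a column named 'Unnamed: 0'.
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.columns.tolist())

# index_col=0 uses that first column as the index instead of a data column.
vaccine_df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(vaccine_df.columns.tolist())
```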


In [41]:
deaths_df.head()

   Start_Date    End_Date       Year  Week  COVID-19 Deaths
0  12/29/2019  01/04/2020  2019/2020   1.0              0.0
1  01/05/2020  01/11/2020       2020   2.0              1.0
2  01/12/2020  01/18/2020       2020   3.0              2.0
3  01/19/2020  01/25/2020       2020   4.0              3.0
4  01/26/2020  02/01/2020       2020   5.0              0.0
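The two files store dates differently: the deaths data uses MM/DD/YYYY strings, the vaccine data ISO YYYY-MM-DD strings. That is why both sides need converting to datetime before merging on them. A small sketch with sample start dates written in each file's style (hypothetical values, not read from the files); the explicit `format` arguments are an optional safeguard, not something the notebook uses.

```python
import pandas as pd

# Sample start dates for two weeks, written the way each file stores them.
deaths_dates = pd.Series(['12/13/2020', '12/20/2020'])
vaccine_dates = pd.Series(['2020-12-13', '2020-12-20'])

# As raw strings the columns never match, so a merge on them would find nothing.
print((deaths_dates == vaccine_dates).any())  # False

# An explicit format makes the month/day order unambiguous rather than inferred.
deaths_parsed = pd.to_datetime(deaths_dates, format='%m/%d/%Y')
vaccine_parsed = pd.to_datetime(vaccine_dates, format='%Y-%m-%d')

# After conversion both hold the same datetime64 values and compare equal.
print((deaths_parsed == vaccine_parsed).all())  # True
```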


In [42]:
import pandas as pd

# Convert date columns to datetime if they aren't already
deaths_df['Start_Date'] = pd.to_datetime(deaths_df['Start_Date'])
deaths_df['End_Date'] = pd.to_datetime(deaths_df['End_Date'])
vaccine_df['Start_Date'] = pd.to_datetime(vaccine_df['Start_Date'])
vaccine_df['End_Date'] = pd.to_datetime(vaccine_df['End_Date'])

# Merge the DataFrames on 'Start_Date' and 'End_Date'; a left merge keeps
# every week of the deaths data, matched or not
merged_df = pd.merge(deaths_df, vaccine_df[['Start_Date', 'End_Date', 'Total_Doses']],
                     on=['Start_Date', 'End_Date'], how='left')

# Weeks before vaccinations began have no vaccine row, so the left merge leaves
# NaN in 'Total_Doses'; replace it with 0. Assign the result back rather than
# calling fillna(..., inplace=True) on the column selection, which pandas warns
# will stop working in pandas 3.0.
merged_df['Total_Doses'] = merged_df['Total_Doses'].fillna(0)

# Display the rows around the first week of vaccinations (row 50)
print(merged_df.iloc[49:56])


   Start_Date   End_Date       Year  Week  COVID-19 Deaths  Total_Doses
49 2020-12-06 2020-12-12       2020  50.0          20940.0          0.0
50 2020-12-13 2020-12-19       2020  51.0          22342.0    1139003.0
51 2020-12-20 2020-12-26       2020  52.0          23399.0    1878155.0
52 2020-12-27 2021-01-02  2020/2021  53.0          24895.0    3105966.0
53 2021-01-03 2021-01-09       2021   1.0          26028.0    5613572.0
54 2021-01-10 2021-01-16       2021   2.0          25743.0    7179128.0
55 2021-01-17 2021-01-23       2021   3.0          23734.0    8371239.0
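A small way to check the merge, using two hypothetical toy frames built from values in the output above rather than the real files. `indicator=True` adds a `_merge` column recording where each row matched, which makes it easy to confirm that the pre-vaccine weeks are the only unmatched rows, and an outer merge shows whether any vaccine week failed to line up with a deaths week. The `fillna` line assigns the result back to the column instead of using `inplace=True` on the column selection.

```python
import pandas as pd

# Toy frames (values from the output above); 2020-12-06 has no vaccine row.
deaths = pd.DataFrame({
    'Start_Date': pd.to_datetime(['2020-12-06', '2020-12-13']),
    'End_Date': pd.to_datetime(['2020-12-12', '2020-12-19']),
    'COVID-19 Deaths': [20940.0, 22342.0],
})
vaccines = pd.DataFrame({
    'Start_Date': pd.to_datetime(['2020-12-13']),
    'End_Date': pd.to_datetime(['2020-12-19']),
    'Total_Doses': [1139003],
})

# indicator=True adds a '_merge' column: 'left_only' means no vaccine row matched.
merged = pd.merge(deaths, vaccines, on=['Start_Date', 'End_Date'],
                  how='left', indicator=True)
print(merged['_merge'].tolist())  # ['left_only', 'both']

# Assign back instead of chaining fillna(..., inplace=True) on the column.
merged['Total_Doses'] = merged['Total_Doses'].fillna(0)
print(merged['Total_Doses'].tolist())  # [0.0, 1139003.0]

# An outer merge exposes vaccine weeks with no matching deaths week ('right_only').
check = pd.merge(deaths, vaccines, on=['Start_Date', 'End_Date'],
                 how='outer', indicator=True)
print((check['_merge'] == 'right_only').sum())  # 0
```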


In [46]:
# Add a column with the running (cumulative) total of doses administered
merged_df['Doses_Cumulative'] = merged_df['Total_Doses'].cumsum()
print(merged_df.iloc[49:56])


   Start_Date   End_Date       Year  Week  COVID-19 Deaths  Total_Doses  \
49 2020-12-06 2020-12-12       2020  50.0          20940.0          0.0   
50 2020-12-13 2020-12-19       2020  51.0          22342.0    1139003.0   
51 2020-12-20 2020-12-26       2020  52.0          23399.0    1878155.0   
52 2020-12-27 2021-01-02  2020/2021  53.0          24895.0    3105966.0   
53 2021-01-03 2021-01-09       2021   1.0          26028.0    5613572.0   
54 2021-01-10 2021-01-16       2021   2.0          25743.0    7179128.0   
55 2021-01-17 2021-01-23       2021   3.0          23734.0    8371239.0   

    Doses_Cumulative  
49               0.0  
50         1139003.0  
51         3017158.0  
52         6123124.0  
53        11736696.0  
54        18915824.0  
55        27287063.0  
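`cumsum()` turns the weekly `Total_Doses` into a running total, where each row is that week's doses plus all earlier weeks. The sketch below reproduces rows 49-55 from the weekly values printed above (copied by hand, not read from the file). It also shows one way to find the first vaccination week by label with `idxmax`, as an alternative to the hard-coded `iloc[49:56]`. Any week missing from the vaccine file was filled with 0, so it adds nothing to the running total.

```python
import pandas as pd

# Weekly doses for rows 49-55 of merged_df, copied from the output above.
doses = pd.Series([0.0, 1139003.0, 1878155.0, 3105966.0,
                   5613572.0, 7179128.0, 8371239.0], index=range(49, 56))

# Each cumulative value is the current week's doses plus all previous weeks.
cumulative = doses.cumsum()
print(cumulative.tolist())

# idxmax on a boolean Series returns the label of the first True value,
# i.e. the first week with nonzero doses.
first_week = doses.ne(0).idxmax()
print(first_week)  # 50
```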


In [47]:
merged_df.to_csv('merged_data.csv', index=False)
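CSV does not store column types, so when `merged_data.csv` is read back later the date columns will come back as strings by default. A short sketch, using a hypothetical one-row frame and an in-memory buffer in place of the real file, showing that `parse_dates` restores the datetime type on reload.

```python
import io
import pandas as pd

# Hypothetical one-row frame with a datetime column, like merged_df.
df = pd.DataFrame({
    'Start_Date': pd.to_datetime(['2020-12-13']),
    'Total_Doses': [1139003.0],
})

# Write to an in-memory buffer with the same call shape as the notebook.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Default read: Start_Date comes back as plain strings.
buffer.seek(0)
plain = pd.read_csv(buffer)

# parse_dates converts the named column back to datetime64 while reading.
buffer.seek(0)
typed = pd.read_csv(buffer, parse_dates=['Start_Date'])
print(plain['Start_Date'].dtype, typed['Start_Date'].dtype)
```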