Requirements

1. Input the data
2. Fix the Relationship Start field so that it has a Date data type
3. Create a field for today's date (14/02/2024)
4. To count the number of Valentine's Days, we need to think a little more creatively than using a simple datediff function to count the number of years. A couple of potential routes could be:
    - Scaffolding the data so there is a row per day, filtering to Valentine's Days and counting the number of rows
    - A logical calculation that takes into consideration whether the Couple's Relationship Start date is before or after Valentine's Day
5. To prepare to join onto the Gift ideas dataset, make the Year field match the data type of the Number of Valentines field
6. Join the 2 datasets together
7. Remove unnecessary fields
8. Output the data

In [2]:
import pandas as pd

1. Input the data

In [4]:
couples_df = pd.read_excel("Valentine's Preppin' Data.xlsx", sheet_name='Couples')

In [5]:
couples_df

Unnamed: 0,Couple,Relationship Start
0,The Loves,"January 15, 2021"
1,The Roses,"October 8, 1964"
2,The Harts,"May 28, 2018"
3,The Darlings,"December 3, 2017"
4,The Doves,"August 21, 1994"
5,The Archers,"February 12, 2020"
6,The Potters,"November 4, 2015"
7,The Bakers,"April 17, 1989"
8,The Gardeners,"June 9, 1974"
9,The Lovelaces,"September 30, 2009"


In [6]:
gifts_df = pd.read_excel("Valentine's Preppin' Data.xlsx", sheet_name='Gifts')

In [7]:
gifts_df

Unnamed: 0,Year,Gift
0,1st,Paper
1,2nd,Cotton
2,3rd,Leather
3,4th,Fruit/Flowers
4,5th,Wood
5,6th,Iron
6,7th,Copper/Wool
7,8th,Bronze
8,9th,Pottery
9,10th,Aluminium/Tin


3. Create a field for today's date (14/02/2024)

In [9]:
couples_df["Todays' Date"] = pd.to_datetime('14/02/2024', dayfirst=True)

In [10]:
couples_df

Unnamed: 0,Couple,Relationship Start,Todays' Date
0,The Loves,"January 15, 2021",2024-02-14
1,The Roses,"October 8, 1964",2024-02-14
2,The Harts,"May 28, 2018",2024-02-14
3,The Darlings,"December 3, 2017",2024-02-14
4,The Doves,"August 21, 1994",2024-02-14
5,The Archers,"February 12, 2020",2024-02-14
6,The Potters,"November 4, 2015",2024-02-14
7,The Bakers,"April 17, 1989",2024-02-14
8,The Gardeners,"June 9, 1974",2024-02-14
9,The Lovelaces,"September 30, 2009",2024-02-14


2. Fix the Relationship Start field so that it has a Date data type
4. To count the number of Valentine's Days, we need to think a little more creatively than using a simple datediff function to count the number of years. A couple of potential routes could be:
    - Scaffolding the data so there is a row per day, filtering to Valentine's Days and counting the number of rows
    - A logical calculation that takes into consideration whether the Couple's Relationship Start date is before or after Valentine's Day

In [12]:
couples_df['Relationship Start'] = pd.to_datetime(couples_df['Relationship Start'], format= '%B %d, %Y')

In [13]:
couples_df

Unnamed: 0,Couple,Relationship Start,Todays' Date
0,The Loves,2021-01-15,2024-02-14
1,The Roses,1964-10-08,2024-02-14
2,The Harts,2018-05-28,2024-02-14
3,The Darlings,2017-12-03,2024-02-14
4,The Doves,1994-08-21,2024-02-14
5,The Archers,2020-02-12,2024-02-14
6,The Potters,2015-11-04,2024-02-14
7,The Bakers,1989-04-17,2024-02-14
8,The Gardeners,1974-06-09,2024-02-14
9,The Lovelaces,2009-09-30,2024-02-14


In [14]:
couples_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 3 columns):
 #   Column              Non-Null Count  Dtype         
---  ------              --------------  -----         
 0   Couple              10 non-null     object        
 1   Relationship Start  10 non-null     datetime64[ns]
 2   Todays' Date        10 non-null     datetime64[ns]
dtypes: datetime64[ns](2), object(1)
memory usage: 372.0+ bytes
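
As a side note on step 2, the format string `'%B %d, %Y'` expects a full month name, a day, and a four-digit year. The sketch below uses made-up sample values rather than the workbook, and the names `raw`, `parsed`, and `bad` are illustrative only. It shows one possible way to surface strings that do not fit the pattern instead of letting the conversion raise.

```python
import pandas as pd

# Illustrative values only; the last one is deliberately malformed
raw = pd.Series(["January 15, 2021", "October 8, 1964", "not a date"])

# errors="coerce" turns unparseable strings into NaT instead of raising
parsed = pd.to_datetime(raw, format="%B %d, %Y", errors="coerce")

# Rows that failed to parse can then be inspected before moving on
bad = raw[parsed.isna()]
print(parsed)
print(bad)
```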


In [15]:
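# Scaffold: build one row per day, per couple, from the start date to today (inclusive)
all_dates = []

for _, row in couples_df.iterrows():
    date_range = pd.date_range(start=row['Relationship Start'], end=row["Todays' Date"], freq='D')

    # The scalar 'Couple' value is broadcast across every date in the range
    couple_dates = pd.DataFrame({
        'Couple': row['Couple'],
        'date': date_range
    })
    all_dates.append(couple_dates)

# Stack the per-couple frames into a single scaffold
result = pd.concat(all_dates, ignore_index=True)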

In [16]:
result

Unnamed: 0,Couple,date
0,The Loves,2021-01-15
1,The Loves,2021-01-16
2,The Loves,2021-01-17
3,The Loves,2021-01-18
4,The Loves,2021-01-19
...,...,...
78534,The Lovelaces,2024-02-10
78535,The Lovelaces,2024-02-11
78536,The Lovelaces,2024-02-12
78537,The Lovelaces,2024-02-13


In [17]:
# Keep only the Valentine's Days (14 February) from the scaffold
result = result[(result['date'].dt.month == 2) & (result['date'].dt.day == 14)]

In [18]:
result

Unnamed: 0,Couple,date
30,The Loves,2021-02-14
395,The Loves,2022-02-14
760,The Loves,2023-02-14
1125,The Loves,2024-02-14
1255,The Roses,1965-02-14
...,...,...
77077,The Lovelaces,2020-02-14
77443,The Lovelaces,2021-02-14
77808,The Lovelaces,2022-02-14
78173,The Lovelaces,2023-02-14
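
The daily scaffold produces tens of thousands of rows, only to discard all but the 14 February ones. A lighter variant, sketched here as an assumption rather than the notebook's method, scaffolds one row per candidate Valentine's Day instead, using a cross join. The frames `couples`, `valentines`, and `pairs` are hypothetical names, and the three couples are a sample copied from the data above.

```python
import pandas as pd

# Sample of the couples data, already parsed to dates
couples = pd.DataFrame({
    "Couple": ["The Loves", "The Roses", "The Archers"],
    "Relationship Start": pd.to_datetime(["2021-01-15", "1964-10-08", "2020-02-12"]),
})
today = pd.Timestamp("2024-02-14")

# One candidate Valentine's Day per year, from the earliest start year to today's year
first_year = int(couples["Relationship Start"].dt.year.min())
valentines = pd.DataFrame({
    "Valentine": pd.to_datetime([f"{year}-02-14" for year in range(first_year, today.year + 1)])
})

# Cross join every couple with every candidate date (how="cross" needs pandas 1.2+)
pairs = couples.merge(valentines, how="cross")

# Keep only the Valentine's Days that fall within the relationship so far
in_relationship = (pairs["Valentine"] >= pairs["Relationship Start"]) & (pairs["Valentine"] <= today)
counts = (
    pairs[in_relationship]
    .groupby("Couple")
    .size()
    .reset_index(name="Number of Valentines")
)
print(counts)
```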


In [19]:
result = result.copy()

In [20]:
df = result.groupby('Couple')['date'].count().reset_index(name='Number of Valentines')

In [21]:
df

Unnamed: 0,Couple,Number of Valentines
0,The Archers,5
1,The Bakers,35
2,The Darlings,7
3,The Doves,30
4,The Gardeners,50
5,The Harts,6
6,The Lovelaces,15
7,The Loves,4
8,The Potters,9
9,The Roses,60
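
The counts above come from the scaffolding route. The second route listed in step 4, a logical calculation, can be sketched without generating any extra rows: take the inclusive span of calendar years, then subtract one if the relationship started after 14 February of its first year, and one more if today falls before 14 February. This is a minimal sketch on a sample of the couples; the names `started_after` and `today_before` are illustrative.

```python
import pandas as pd

# Sample of the couples data, already parsed to dates
couples = pd.DataFrame({
    "Couple": ["The Loves", "The Roses", "The Archers"],
    "Relationship Start": pd.to_datetime(["2021-01-15", "1964-10-08", "2020-02-12"]),
})
today = pd.Timestamp("2024-02-14")

start = couples["Relationship Start"]

# True where the couple missed Valentine's Day in the year they got together
started_after = (start.dt.month > 2) | ((start.dt.month == 2) & (start.dt.day > 14))

# True if this year's Valentine's Day has not happened yet
today_before = (today.month, today.day) < (2, 14)

couples["Number of Valentines"] = (
    today.year - start.dt.year + 1      # calendar years touched by the relationship
    - started_after.astype(int)         # first year's Valentine's Day was missed
    - int(today_before)                 # this year's Valentine's Day is still to come
)
print(couples)
```

On these three couples the calculation agrees with the scaffolded counts (4, 60, and 5).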


5. To prepare to join onto the Gift ideas dataset, make the Year field match the data type of the Number of Valentines field


In [23]:
gifts_df

Unnamed: 0,Year,Gift
0,1st,Paper
1,2nd,Cotton
2,3rd,Leather
3,4th,Fruit/Flowers
4,5th,Wood
5,6th,Iron
6,7th,Copper/Wool
7,8th,Bronze
8,9th,Pottery
9,10th,Aluminium/Tin


In [24]:
# Strip the ordinal suffix (st, nd, rd, th) from the end of each value, leaving just the digits
gifts_df['Year'] = gifts_df['Year'].str.replace(r'(st|nd|rd|th)$', '', regex=True)

In [25]:
gifts_df

Unnamed: 0,Year,Gift
0,1,Paper
1,2,Cotton
2,3,Leather
3,4,Fruit/Flowers
4,5,Wood
5,6,Iron
6,7,Copper/Wool
7,8,Bronze
8,9,Pottery
9,10,Aluminium/Tin
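
An alternative to removing the suffixes is to pull out the leading digits directly and cast them in the same step. The sketch below is an assumption about how that could look, built on a small sample of the gifts data with a hypothetical frame name `gifts`.

```python
import pandas as pd

# Sample of the gifts data with ordinal strings in Year
gifts = pd.DataFrame({
    "Year": ["1st", "2nd", "3rd", "4th", "10th"],
    "Gift": ["Paper", "Cotton", "Leather", "Fruit/Flowers", "Aluminium/Tin"],
})

# Capture the leading digits and cast to int so Year matches the count column's type
gifts["Year"] = gifts["Year"].str.extract(r"^(\d+)", expand=False).astype(int)
print(gifts.dtypes)
```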


6. Join the 2 datasets together


In [27]:
# Finish step 5: cast Year to int so it matches the Number of Valentines data type
gifts_df['Year'] = gifts_df['Year'].astype(int)

In [28]:
df = df.merge(right=gifts_df, how='left', left_on='Number of Valentines', right_on='Year')

In [29]:
df

Unnamed: 0,Couple,Number of Valentines,Year,Gift
0,The Archers,5,5,Wood
1,The Bakers,35,35,Coral
2,The Darlings,7,7,Copper/Wool
3,The Doves,30,30,Pearl
4,The Gardeners,50,50,Gold
5,The Harts,6,6,Iron
6,The Lovelaces,15,15,Crystal
7,The Loves,4,4,Fruit/Flowers
8,The Potters,9,9,Pottery
9,The Roses,60,60,Diamond
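
Every couple found a gift here, but a left join silently leaves `NaN` where there is no match. One way to guard against that, sketched under the assumption that each Year appears at most once in the gifts table, is to pass `validate` and `indicator` to `merge`. The frames below are small illustrative samples, and "The Example" is a made-up couple added to show an unmatched row.

```python
import pandas as pd

# Illustrative counts; "The Example" is hypothetical and has no matching gift
counts = pd.DataFrame({
    "Couple": ["The Archers", "The Loves", "The Example"],
    "Number of Valentines": [5, 4, 99],
})
gifts = pd.DataFrame({"Year": [4, 5], "Gift": ["Fruit/Flowers", "Wood"]})

joined = counts.merge(
    gifts,
    how="left",
    left_on="Number of Valentines",
    right_on="Year",
    validate="many_to_one",  # raises if a Year is duplicated in gifts
    indicator=True,          # adds a _merge column describing each row's match
)

# Rows present only on the left side had no gift to join to
unmatched = joined[joined["_merge"] == "left_only"]
print(unmatched[["Couple", "Number of Valentines"]])
```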


7. Remove unnecessary fields
8. Output the data

In [31]:
df = df[['Couple','Number of Valentines','Gift']]

In [32]:
df

Unnamed: 0,Couple,Number of Valentines,Gift
0,The Archers,5,Wood
1,The Bakers,35,Coral
2,The Darlings,7,Copper/Wool
3,The Doves,30,Pearl
4,The Gardeners,50,Gold
5,The Harts,6,Iron
6,The Lovelaces,15,Crystal
7,The Loves,4,Fruit/Flowers
8,The Potters,9,Pottery
9,The Roses,60,Diamond


In [33]:
df.to_csv('week_07_output.csv', index=False)