<a href="https://colab.research.google.com/github/Eddychege/datascienceprojects/blob/main/MQLs_QI_to_Sign_UPs_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import pandas as pd

In [4]:
df = pd.read_csv('/content/MQL Q1.csv')

In [5]:
print(df.head())
df.info()  # info() prints its summary itself and returns None, so it is not wrapped in print()
print(df.shape)
print(df.columns)

     ` q1aqa1111111111` First Name Second Name                        Email  \
0  NaN        22/1/2025  \nJoseph        Maina    jmwambugu34@gmail.com\n\n   
1  NaN        22/1/2025     Esther     Wanjiru   estyerwanjiru100@gmail.com   
2  NaN        22/1/2025  Fredrick      Micheni  \nfredrickmicheni@gmail.com   
3    1        22/1/2025      Mercy         Jnr        \nmchemjor4@gmail.com   
4    2        22/1/2025      Susan       Hiuhu     \nsusanhiuhu82@gmail.com   

   Phone Number                                Insurer  \
0  2.547818e+11                                    All   
1  2.547178e+11                            APA Medical   
2  2.541145e+11                                    APA   
3  2.547262e+11                      Both PA and Motor   
4  2.547900e+11  ICEA APA Jubilee Britam\nPA Insurance   

                                  Marketing Comments Sales Exec  \
0  Are you a licensed independent insurance agent...     Yvonne   
1  Are you a licensed independent insuranc
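In the `df.head()` output above, `Phone Number` is displayed as `2.547818e+11`. pandas reads a numeric column as `float64` when it contains at least one missing value, which shows the numbers in scientific notation and would drop any leading zeros. If phone numbers are ever needed for matching, a minimal sketch of reading them as text follows. The two-row CSV is made up, and choosing pandas' nullable `"string"` dtype is an assumption, not something the original notebook does.

```python
import io

import pandas as pd

# Hypothetical two-row excerpt shaped like the MQL export; Esther's phone is missing
csv_text = "First Name,Phone Number\nJoseph,254781800000\nEsther,\n"

# Default parsing: the missing value forces the whole column to float64
default = pd.read_csv(io.StringIO(csv_text))

# Reading the column with the nullable string dtype keeps the digits exactly as written
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"Phone Number": "string"})

print(default["Phone Number"].dtype)
print(as_text["Phone Number"].tolist())
```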

In [6]:
df1 = pd.read_csv('/content/1st_transaction_(forecasted_this_month)_listing (2).csv')

In [7]:
print(df1.head())
print(df1.shape)
df1.info()  # prints its own summary; wrapping it in print() would add a stray "None"
print(df1.columns)

                          Name                       Agency    KYC Mobile  \
0              ERICK  KIPYEGON              ERICK  KIPYEGON     724076521   
1                 BEVERL HONUR                 BEVERL HONUR  254745449685   
2            ANN WAMBUI NGANGA          ANN WAMBUI NG'ANG'A  254112564877   
3  BORN AGAIN INSURANCE AGENCY  BORN AGAIN INSURANCE AGENCY  254723818670   
4         ELVIS AGESA MALIKISI         ELVIS AGESA MALIKISI  254727419101   

                     KYC E-Mail   Status Created Date  Signup Date  \
0        kendagorkips@gmail.com  Sign-up  21-MAR-2025  21-MAR-2025   
1    mbayakhasayabev2@gmail.com  Sign-up  08-JUL-2024  08-JUL-2024   
2   annwambuinganga42@gmail.com  Sign-up  13-JUN-2024  13-JUN-2024   
3  bornagaininsurance@gmail.com  Sign-up  26-JUL-2023  26-JUL-2023   
4          elvisagera@gmail.com  Sign-up  20-JAN-2023  20-JAN-2023   

     Sync Date  Days In Pipeline   Channel    Reffered By  \
0          NaN                66   Pending            N

In [8]:
df2 = pd.read_csv('/content/1st_transaction_(no_forecast_this_month)_listing.csv')

In [9]:
print(df2.head())
print(df2.shape)
df2.info()  # prints its own summary; wrapping it in print() would add a stray "None"
print(df2.columns)

                              Name                           Agency  \
0                      GEORGE KEYA    DIAMOND MARK INSURANCE AGENCY   
1  MID ALLIANCE INSURANCE AGENCIES  MID ALLIANCE INSURANCE AGENCIES   
2                       LEAH MUNGA                              NaN   
3             GEORGE ODHIAMBO OUMA             GEORGE ODHIAMBO OUMA   
4                     RAHAB MUKAMI                     RAHAB MUKAMI   

     KYC Mobile                      KYC E-Mail   Status Created Date  \
0     722803690  diamondmarkinsurance@yahoo.com  Sign-up  13-JUN-2024   
1  254722804746           gitaujamesk@gmail.com  Sign-up  31-JAN-2025   
2           NaN                             NaN  Sign-up          NaN   
3  254726173612              oushouma@gmail.com  Sign-up  06-FEB-2025   
4  254725813778          rahab.mukami@yahoo.com  Sign-up  04-JUN-2024   

   Signup Date    Sync Date  Days In Pipeline  Channel Reffered By  \
0  13-JUN-2024          NaN               345  Pending         N

In [10]:


# --- Data Cleaning and Standardization ---

# MQL Data (df)
# Rename the two unlabeled leading columns of the export: '`' appears to hold a row ID
# and 'q1aqa1111111111`' holds the MQL date (see the df.head() output above)
df.rename(columns={
    '`': 'MQL_Internal_ID',
    'q1aqa1111111111`': 'MQL_Date'
}, inplace=True)

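# Clean 'Email': strip whitespace/newlines and lowercase.
# Using .str directly (not .astype(str)) keeps missing emails as NaN, so the dropna()
# below removes them instead of keeping them as the literal string 'nan'.
df['Email'] = df['Email'].str.strip().str.replace('\n', '', regex=False).str.lower()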

# Convert 'MQL_Date' to datetime objects (assuming DD/MM/YYYY format)
# 'errors=coerce' will turn unparseable dates into NaT (Not a Time)
df['MQL_Date'] = pd.to_datetime(df['MQL_Date'], dayfirst=True, errors='coerce')

# Drop MQLs with missing essential data (Email or MQL_Date)
df.dropna(subset=['MQL_Date', 'Email'], inplace=True)


# Sign-up Data (df1, df2)
# Concatenate the two sign-up dataframes into one
consolidated_signup_df = pd.concat([df1, df2], ignore_index=True)

# Clean 'KYC E-Mail': strip whitespace/newlines and lowercase, keeping missing emails as NaN
consolidated_signup_df['KYC E-Mail'] = consolidated_signup_df['KYC E-Mail'].str.strip().str.replace('\n', '', regex=False).str.lower()

# Convert 'Signup Date' to datetime objects (assuming DD-MON-YYYY format)
consolidated_signup_df['Signup Date'] = pd.to_datetime(consolidated_signup_df['Signup Date'], format='%d-%b-%Y', errors='coerce')

# Drop sign-ups with missing essential data (KYC E-Mail or Signup Date)
consolidated_signup_df.dropna(subset=['Signup Date', 'KYC E-Mail'], inplace=True)

# Deduplicate sign-ups: keep the earliest Signup Date for each unique email
consolidated_signup_df.sort_values(by='Signup Date', inplace=True)
consolidated_signup_df.drop_duplicates(subset='KYC E-Mail', keep='first', inplace=True)


# --- Data Merging/Linking ---

# Perform a left merge: Link MQLs to their sign-up records.
# All MQLs are kept, and sign-up info is added if a match exists.
mql_signup_analysis_df = df.merge(
    consolidated_signup_df[['KYC E-Mail', 'Signup Date']],
    left_on='Email',
    right_on='KYC E-Mail',
    how='left'
)

# Flag MQLs whose email matches any sign-up record (the sign-up may predate the MQL date)
mql_signup_analysis_df['Is_Signed_Up'] = mql_signup_analysis_df['Signup Date'].notna()


# --- Conversion Rate Calculation ---

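total_mqls = len(df)  # MQL rows after cleaning; df is not deduplicated by email yet, so a repeat lead counts once per MQL date
converted_mqls = mql_signup_analysis_df['Is_Signed_Up'].sum()  # MQL rows with any matched sign-up, including sign-ups dated before the MQL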

conversion_rate_signup = (converted_mqls / total_mqls) * 100 if total_mqls > 0 else 0

print(f"--- MQL to Sign-up Conversion Analysis (Overall) ---")
print(f"Total MQLs: {total_mqls}")
print(f"MQLs Converted to Sign-up: {converted_mqls}")
print(f"**Conversion Rate (MQL to Sign-up): {conversion_rate_signup:.2f}%**\n")


# --- Aging Analysis (Time to Convert) ---

# Filter for MQLs that successfully converted and have a logical signup date (not before MQL date)
converted_leads_for_aging = mql_signup_analysis_df[
    mql_signup_analysis_df['Is_Signed_Up'] &
    (mql_signup_analysis_df['Signup Date'] >= mql_signup_analysis_df['MQL_Date'])
].copy() # .copy() to avoid SettingWithCopyWarning

if not converted_leads_for_aging.empty:
    # Calculate the difference in days
    converted_leads_for_aging['Aging_Days'] = (converted_leads_for_aging['Signup Date'] - converted_leads_for_aging['MQL_Date']).dt.days

    average_aging_days = converted_leads_for_aging['Aging_Days'].mean()
    median_aging_days = converted_leads_for_aging['Aging_Days'].median()
    min_aging_days = converted_leads_for_aging['Aging_Days'].min()
    max_aging_days = converted_leads_for_aging['Aging_Days'].max()

    print(f"--- Time to Sign-up (Aging) for Converted MQLs ---")
    print(f"Average: {average_aging_days:.2f} days")
    print(f"Median: {median_aging_days:.2f} days")
    print(f"Minimum: {min_aging_days} days")
    print(f"Maximum: {max_aging_days} days")
else:
    print("No converted MQLs found with a logical signup date to calculate aging.")

--- MQL to Sign-up Conversion Analysis (Overall) ---
Total MQLs: 115
MQLs Converted to Sign-up: 5
**Conversion Rate (MQL to Sign-up): 4.35%**

--- Time to Sign-up (Aging) for Converted MQLs ---
Average: 27.75 days
Median: 28.00 days
Minimum: 1 days
Maximum: 54 days
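Converting with `.astype(str)` before calling the string methods turns a missing email into the literal text `'nan'`: `dropna(subset=['Email'])` then no longer removes it, and a `'nan'` MQL email could even match a `'nan'` sign-up email in the merge. A minimal sketch of a normalizer that keeps missing values as `NaN` follows. The helper name `normalize_email` and the sample values are hypothetical, and removing all internal whitespace (rather than only newlines, as the notebook does) is a design assumption based on emails never containing spaces.

```python
import numpy as np
import pandas as pd


def normalize_email(series: pd.Series) -> pd.Series:
    """Hypothetical helper: lowercase emails and remove all whitespace, keeping missing values as NaN."""
    # The .str accessor skips NaN, so missing emails are not turned into the text 'nan'
    cleaned = series.str.replace(r"\s+", "", regex=True).str.lower()
    # A cell that held only whitespace becomes '', which is treated as missing too
    return cleaned.replace("", np.nan)


raw = pd.Series(["\nJoseph.M@Example.com\n\n", "  Esther@Example.com", np.nan, "   "])

naive = raw.astype(str).str.strip().str.lower()  # the pattern that converts NaN to 'nan'
clean = normalize_email(raw)

print((naive == "nan").sum(), "value(s) became the string 'nan' with astype(str)")
print(clean.tolist())
print(clean.isna().sum(), "missing value(s) preserved for dropna()")
```

With the missing values kept as `NaN`, the existing `dropna(subset=[...])` calls do the filtering they were written for.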


In [11]:
# Assuming the previous code snippet has been executed and 'converted_leads_for_aging' DataFrame exists

# Display the relevant columns for each converted MQL
if not converted_leads_for_aging.empty:
    print("\n--- Details of Each Converted MQL (MQL to Sign-up) ---")
    # Select and display the specific columns requested
    print(converted_leads_for_aging[['Email', 'MQL_Date', 'Signup Date', 'Aging_Days']].to_string(index=False))
else:
    print("\nNo MQLs converted to Sign-up (with logical dates) to display individual details.")



--- Details of Each Converted MQL (MQL to Sign-up) ---
                 Email   MQL_Date Signup Date  Aging_Days
kendagorkips@gmail.com 2025-01-26  2025-03-21          54
    oushouma@gmail.com 2025-02-02  2025-02-06           4
    jwmugo@outlook.com 2025-02-16  2025-04-09          52
kendagorkips@gmail.com 2025-03-20  2025-03-21           1
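In the overall figures, 5 MQL rows match a sign-up, but only 4 pass the on-or-after check used for aging, and those 4 rows cover 3 distinct emails (`kendagorkips@gmail.com` appears with two MQL dates). The headline 4.35% therefore mixes MQL rows with people and includes a sign-up that predates its MQL. A minimal sketch on made-up data of computing the rate per unique email, with the date condition applied in one place; keeping each lead's earliest MQL date is an assumption (the later In [19] cell makes the same choice).

```python
import pandas as pd

# Toy MQL rows (hypothetical): one lead appears twice, one signed up before becoming an MQL
mqls = pd.DataFrame({
    "Email": ["a@example.com", "a@example.com", "b@example.com", "c@example.com", "d@example.com"],
    "MQL_Date": pd.to_datetime(["2025-01-26", "2025-03-20", "2025-02-02", "2025-02-10", "2025-02-11"]),
})
signups = pd.DataFrame({
    "KYC E-Mail": ["a@example.com", "b@example.com", "c@example.com"],
    "Signup Date": pd.to_datetime(["2025-03-21", "2025-02-06", "2024-06-13"]),
})

# Keep each lead's earliest MQL date so every person is counted once
first_mql = mqls.sort_values("MQL_Date").drop_duplicates("Email", keep="first")

merged = first_mql.merge(signups, left_on="Email", right_on="KYC E-Mail", how="left")

# A conversion only counts if the sign-up happened on or after the MQL date (NaT compares as False)
merged["Converted"] = merged["Signup Date"].notna() & (merged["Signup Date"] >= merged["MQL_Date"])

rate = merged["Converted"].mean() * 100
print(f"{merged['Converted'].sum()} of {len(merged)} unique MQLs converted ({rate:.2f}%)")
```

Using the earliest MQL date gives the longest aging for repeat leads; using the latest would shorten it, so the choice should match how the team defines "time to sign-up".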


In [12]:
# Assuming the previous code snippet has been executed and 'converted_leads_for_aging' DataFrame exists

# Display the relevant columns for each converted MQL, including names
if not converted_leads_for_aging.empty:
    print("\n--- Details of Each Converted MQL (MQL to Sign-up) with Names ---")
    # Select and display the specific columns, now including 'First Name' and 'Second Name'
    print(converted_leads_for_aging[['First Name', 'Second Name', 'Email', 'MQL_Date', 'Signup Date', 'Aging_Days']].to_string(index=False))
else:
    print("\nNo MQLs converted to Sign-up (with logical dates) to display individual details.")


--- Details of Each Converted MQL (MQL to Sign-up) with Names ---
First Name Second Name                  Email   MQL_Date Signup Date  Aging_Days
     Erick    Kipyegon kendagorkips@gmail.com 2025-01-26  2025-03-21          54
    George    Odhiambo     oushouma@gmail.com 2025-02-02  2025-02-06           4
      Jane        Mugo     jwmugo@outlook.com 2025-02-16  2025-04-09          52
     Erick    Kipyegon kendagorkips@gmail.com 2025-03-20  2025-03-21           1
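Both date conversions in the MQL-to-Sign-up cell use `errors='coerce'`, so they fail silently: any value that does not match becomes `NaT`, and the following `dropna()` discards the row without a trace. A minimal sketch of counting coerced values first, on made-up inputs; pinning `format='%d/%m/%Y'` for the MQL dates is an assumption standing in for the notebook's `dayfirst=True`.

```python
import pandas as pd

# Hypothetical raw values in the two formats the notebook assumes
mql_raw = pd.Series(["22/1/2025", "26/01/2025", "not a date", None])
signup_raw = pd.Series(["21-MAR-2025", "08-JUL-2024", "2024-07-08"])

# Explicit formats: day-first MQL dates and DD-MON-YYYY sign-up dates (upper-case month names parse)
mql_dates = pd.to_datetime(mql_raw, format="%d/%m/%Y", errors="coerce")
signup_dates = pd.to_datetime(signup_raw, format="%d-%b-%Y", errors="coerce")

# errors="coerce" fails silently, so count NaT values before any dropna() removes the rows
print("MQL dates that failed to parse:", int(mql_dates.isna().sum()))
print("Sign-up dates that failed to parse:", int(signup_dates.isna().sum()))
```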


In [13]:
df3 = pd.read_csv('/content/2nd_transaction_(forecasted_this_month)_listing (1).csv')
df4 = pd.read_csv('/content/2nd_transaction_(no_forecast_this_month)_listing (1).csv')

In [14]:
print(df3.head())
print(df3.shape)
df3.info()  # prints its own summary; wrapping it in print() would add a stray "None"
print(df3.columns)

                      Name                     Agency    KYC Mobile  \
0            JOSEPH KITEME            BIMA KIT AGENCY  254713051186   
1          MILICENT BITUTU     MILICENT BITUTU AGENCY  254723093271   
2  PUFFIN INSURANCE AGENCY    PUFFIN INSURANCE AGENCY  254738935178   
3        FAITH NJERI MAINA          FAITH NJERI MAINA     723545288   
4             FRANK  KEIYO  GUARDIAN INSURANCE AGENCY  254722609935   

      Telephone                 KYC E-Mail             Register Email  \
0     713051186       Bima.kit@outlook.com       Bima.kit@outlook.com   
1     723093271   milicentbitutu@gmail.com   milicentbitutu@gmail.com   
2     738935178     wamokomunene@gmail.com     wamokomunene@gmail.com   
3     723545288         fei.njeri@gmail.co         fei.njeri@gmail.co   
4  254722609935  agency.guardian@gmail.com  agency.guardian@gmail.com   

       Status Signup (Date) Production (Date)  Conversion TAT (Day's) Type  \
0  Transactor   19-APR-2024       19-APR-2024           

In [15]:
print(df4.head())
print(df4.shape)
df4.info()  # prints its own summary; wrapping it in print() would add a stray "None"
print(df4.columns)

                                Name                             Agency  \
0          RISKWISE INSURANCE AGENCY          RISKWISE INSURANCE AGENCY   
1            THESYM INSURANCE AGENCY            THESYM INSURANCE AGENCY   
2           INNOVEX INSURANCE AGENCY           INNOVEX INSURANCE AGENCY   
3           KANGWAE INSURANCE AGENCY           KANGWAE INSURANCE AGENCY   
4  PARADIGM INSURANCE AGENCY LIMITED  PARADIGM INSURANCE AGENCY LIMITED   

     KYC Mobile     Telephone                          KYC E-Mail  \
0  254722281314     722998455         riskwiseinsurance@gmail.com   
1  254723362077  254723362077             mwakatheresia@gmail.com   
2  254723122796  254723122796          admin@innovexinsurance.com   
3  254723544073  254723544073  kangwae.insurance.agency@gmail.com   
4  254115038535     720672796              andrewonanda@yahoo.com   

                       Register Email      Status Signup (Date)  \
0         riskwiseinsurance@gmail.com  Transactor   24-FEB-2023   


In [16]:

import numpy as np

# --- Re-use Cleaned MQL Data (df) from the MQL-to-Sign-up cell ---
# Note: that cell cleaned df but did NOT deduplicate it by email, so a lead with several
# MQL dates is counted once per date below. To rebuild df independently (with email
# deduplication), uncomment the following lines:
# df = pd.read_csv('/content/MQL Q1.csv')
# df.rename(columns={'`': 'MQL_Internal_ID', 'q1aqa1111111111`': 'MQL_Date'}, inplace=True)
# df['Email'] = df['Email'].str.strip().str.replace('\n', '', regex=False).str.lower()
# df['MQL_Date'] = pd.to_datetime(df['MQL_Date'], dayfirst=True, errors='coerce')
# df.dropna(subset=['MQL_Date', 'Email'], inplace=True)
# df.sort_values(by='MQL_Date', inplace=True)
# df.drop_duplicates(subset='Email', keep='first', inplace=True)
# print(f"MQLs (unique by email) for NWAT analysis: {len(df)}\n")


# --- 1. NWAT Data Cleaning & Standardization (df3, df4) ---
print("Step 1: Cleaning and preparing NWAT data...")

# Concatenate the two NWAT dataframes
consolidated_nwat_df = pd.concat([df3, df4], ignore_index=True)

# Clean 'KYC E-Mail': strip whitespace/newlines and lowercase, keeping missing emails as NaN
consolidated_nwat_df['KYC E-Mail'] = consolidated_nwat_df['KYC E-Mail'].str.strip().str.replace('\n', '', regex=False).str.lower()

# Convert 'Production (Date)' to datetime objects (assuming DD-MON-YYYY format)
consolidated_nwat_df['Production (Date)'] = pd.to_datetime(consolidated_nwat_df['Production (Date)'], format='%d-%b-%Y', errors='coerce')

# Drop NWATs with missing essential data (KYC E-Mail or Production (Date))
initial_nwat_rows = len(consolidated_nwat_df)
consolidated_nwat_df.dropna(subset=['Production (Date)', 'KYC E-Mail'], inplace=True)
cleaned_nwat_rows = len(consolidated_nwat_df)
print(f"NWATs initially: {initial_nwat_rows}, after dropping rows with missing essential data: {cleaned_nwat_rows}")

# Deduplicate NWATs: keep the earliest 'Production (Date)' for each unique email
# This identifies the *first transaction* per user.
consolidated_nwat_df.sort_values(by='Production (Date)', inplace=True)
consolidated_nwat_df.drop_duplicates(subset='KYC E-Mail', keep='first', inplace=True)
deduplicated_nwat_count = len(consolidated_nwat_df)
print(f"NWATs after deduplication by email (kept earliest production date): {deduplicated_nwat_count}")

print(f"Cleaned NWAT Data Head (First Transaction per Email):\n{consolidated_nwat_df[['KYC E-Mail', 'Production (Date)']].head()}\n")


# --- 2. Data Merging/Linking (MQLs to NWATs) ---
print("Step 2: Merging MQLs with NWATs...")

# Perform a left merge: keep every MQL row and add NWAT information if available
mql_nwat_analysis_df = df.merge(
    consolidated_nwat_df[['KYC E-Mail', 'Production (Date)']], # Only bring necessary columns from NWATs
    left_on='Email',
    right_on='KYC E-Mail',
    how='left'
)

# Rename the merged 'Production (Date)' column for clarity in the merged DataFrame
mql_nwat_analysis_df.rename(columns={'Production (Date)': 'NWAT_Date'}, inplace=True)

# Identify MQLs who completed an NWAT
mql_nwat_analysis_df['Is_NWAT'] = mql_nwat_analysis_df['NWAT_Date'].notna()

print(f"Merged MQL-NWAT data head:\n{mql_nwat_analysis_df[['Email', 'MQL_Date', 'NWAT_Date', 'Is_NWAT']].head()}\n")


# --- 3. MQL to NWAT Conversion Rate Calculation ---
print("Step 3: Calculating MQL to NWAT Conversion Rate...")

total_mqls_base = len(df)  # cleaned MQL rows (unique by email only if df was deduplicated first)

# Filter for MQLs who performed an NWAT AND the NWAT occurred ON or AFTER their MQL date
converted_mqls_to_nwat = mql_nwat_analysis_df[
    mql_nwat_analysis_df['Is_NWAT'] &
    (mql_nwat_analysis_df['NWAT_Date'] >= mql_nwat_analysis_df['MQL_Date'])
].copy()  # .copy() so adding 'Aging_Days_NWAT' below does not trigger SettingWithCopyWarning
num_converted_to_nwat = len(converted_mqls_to_nwat)

conversion_rate_nwat = (num_converted_to_nwat / total_mqls_base) * 100 if total_mqls_base > 0 else 0

print(f"--- MQL to NWAT Conversion Analysis (Overall) ---")
print(f"Total MQLs: {total_mqls_base}")
print(f"MQLs Converted to NWAT (first transaction ON or AFTER MQL date): {num_converted_to_nwat}")
print(f"**Conversion Rate (MQL to NWAT): {conversion_rate_nwat:.2f}%**\n")


# --- 4. Aging Analysis (Time to NWAT Conversion) ---
print("Step 4: Performing Time Analysis (Aging) for NWATs...")

if not converted_mqls_to_nwat.empty:
    converted_mqls_to_nwat['Aging_Days_NWAT'] = (converted_mqls_to_nwat['NWAT_Date'] - converted_mqls_to_nwat['MQL_Date']).dt.days

    average_aging_days_nwat = converted_mqls_to_nwat['Aging_Days_NWAT'].mean()
    median_aging_days_nwat = converted_mqls_to_nwat['Aging_Days_NWAT'].median()
    min_aging_days_nwat = converted_mqls_to_nwat['Aging_Days_NWAT'].min()
    max_aging_days_nwat = converted_mqls_to_nwat['Aging_Days_NWAT'].max()

    print(f"--- Time to First Transaction (NWAT Aging) for Converted MQLs ---")
    print(f"Average: {average_aging_days_nwat:.2f} days")
    print(f"Median: {median_aging_days_nwat:.2f} days")
    print(f"Minimum: {min_aging_days_nwat} days")
    print(f"Maximum: {max_aging_days_nwat} days")

    # Display individual NWAT conversion details including names and aging
    print("\n--- Details of Each Converted MQL (MQL to NWAT) ---")
    display_columns = ['First Name', 'Second Name', 'Email', 'MQL_Date', 'NWAT_Date', 'Aging_Days_NWAT']
    # Check if 'First Name' and 'Second Name' exist before displaying, as df might have been loaded differently
    available_cols = [col for col in display_columns if col in converted_mqls_to_nwat.columns]
    print(converted_mqls_to_nwat[available_cols].to_string(index=False))

else:
    print("No MQLs converted to NWAT (first transaction on or after MQL date) to calculate aging or display details.")

print("\nMQL to NWAT analysis complete.")

Step 1: Cleaning and preparing NWAT data...
NWATs initially: 22, after dropping rows with missing essential data: 22
NWATs after deduplication by email (kept earliest production date): 22
Cleaned NWAT Data Head (First Transaction per Email):
                            KYC E-Mail Production (Date)
8          riskwiseinsurance@gmail.com        2023-05-27
9              mwakatheresia@gmail.com        2023-07-20
10          admin@innovexinsurance.com        2024-02-20
11  kangwae.insurance.agency@gmail.com        2024-03-22
0                 bima.kit@outlook.com        2024-04-19

Step 2: Merging MQLs with NWATs...
Merged MQL-NWAT data head:
                        Email   MQL_Date NWAT_Date  Is_NWAT
0       jmwambugu34@gmail.com 2025-01-22       NaT    False
1  estyerwanjiru100@gmail.com 2025-01-22       NaT    False
2   fredrickmicheni@gmail.com 2025-01-22       NaT    False
3         mchemjor4@gmail.com 2025-01-22       NaT    False
4      susanhiuhu82@gmail.com 2025-01-22       NaT   
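The NWAT cell above keeps each email's earliest `Production (Date)` with `sort_values` plus `drop_duplicates(keep='first')`, then flags matches with `notna()`. A minimal sketch on toy data of an equivalent `groupby(...).min()` and of `merge(..., indicator=True)`, which labels each MQL `both` or `left_only`. That separates "email never appears in the transaction listings" from "transaction predates the MQL", which is the distinction the Hezron Masinde checks below work out by hand. All rows are made up.

```python
import pandas as pd

mqls = pd.DataFrame({
    "Email": ["a@example.com", "b@example.com", "c@example.com"],
    "MQL_Date": pd.to_datetime(["2025-01-22", "2025-02-20", "2025-03-01"]),
})
# Hypothetical transactions: two rows for one email, one email unrelated to any MQL
nwat = pd.DataFrame({
    "KYC E-Mail": ["b@example.com", "b@example.com", "z@example.com"],
    "Production (Date)": pd.to_datetime(["2025-04-02", "2025-03-05", "2024-01-10"]),
})

# First transaction per email: groupby + min matches sort_values + drop_duplicates(keep="first")
first_txn = (
    nwat.groupby("KYC E-Mail", as_index=False)["Production (Date)"].min()
    .rename(columns={"Production (Date)": "NWAT_Date"})
)

# indicator=True adds a '_merge' column saying whether each MQL email was found at all
merged = mqls.merge(first_txn, left_on="Email", right_on="KYC E-Mail", how="left", indicator=True)

counts = merged["_merge"].value_counts()
unmatched = merged.loc[merged["_merge"] == "left_only", "Email"].tolist()
print(counts.to_dict())
print("MQL emails absent from the transaction listings:", unmatched)
```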

In [17]:
# Assuming 'converted_mqls_to_nwat' DataFrame exists from the previous analysis

print("\n--- Searching for Hezron Masinde in Converted MQLs to NWAT ---")

# Option 1: Search by Name (case-insensitive for robustness)
# Convert names to lowercase for robust searching
search_name_lower_first = 'hezron'
search_name_lower_second = 'masinde'

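# Filter on both name columns: first AND last name must match (OR would also return
# anyone else named Hezron or anyone else with the surname Masinde)
hezron_by_name = converted_mqls_to_nwat[
    (converted_mqls_to_nwat['First Name'].astype(str).str.strip().str.lower() == search_name_lower_first) &
    (converted_mqls_to_nwat['Second Name'].astype(str).str.strip().str.lower() == search_name_lower_second)
]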

if not hezron_by_name.empty:
    print(f"\nFound Hezron Masinde (by name) in converted NWATs:\n{hezron_by_name[['First Name', 'Second Name', 'Email', 'MQL_Date', 'NWAT_Date', 'Aging_Days_NWAT']].to_string(index=False)}")
else:
    print(f"\n'Hezron Masinde' (by name) not found in converted NWATs.")


# Option 2: Search by a specific email (if you know it)
# Replace the placeholder below with Hezron Masinde's actual email address
search_email = 'some_email@example.com' # <--- REPLACE WITH HEZRON'S ACTUAL EMAIL
hezron_by_email = converted_mqls_to_nwat[
    converted_mqls_to_nwat['Email'] == search_email.lower() # Ensure consistency with our lowercasing
]

if not hezron_by_email.empty:
    print(f"\nFound Hezron (by email: {search_email}) in converted NWATs:\n{hezron_by_email[['First Name', 'Second Name', 'Email', 'MQL_Date', 'NWAT_Date', 'Aging_Days_NWAT']].to_string(index=False)}")
else:
    print(f"\n'{search_email}' not found in converted NWATs.")

print("\n--- End of Search ---")


--- Searching for Hezron Masinde in Converted MQLs to NWAT ---

'Hezron Masinde' (by name) not found in converted NWATs.

'some_email@example.com' not found in converted NWATs.

--- End of Search ---
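A name search has to require both names at once; joining the two conditions with `|` returns every other person sharing either the first or the last name. A minimal sketch of a strict, case- and whitespace-insensitive match next to the looser OR version, on made-up rows; the helper name `name_mask` is hypothetical.

```python
import pandas as pd

# Hypothetical converted-MQL rows
people = pd.DataFrame({
    "First Name": [" Jane", "Jane", "Mary", None],
    "Second Name": ["Doe", "Otieno", "Doe", "Doe"],
})


def name_mask(frame: pd.DataFrame, first: str, second: str) -> pd.Series:
    """Hypothetical helper: True only where BOTH names match, ignoring case and surrounding spaces."""
    f = frame["First Name"].str.strip().str.lower()
    s = frame["Second Name"].str.strip().str.lower()
    # Missing names compare as False, so such rows never match
    return (f == first.lower()) & (s == second.lower())


strict = people[name_mask(people, "Jane", "Doe")]

# The OR version also returns every other "Jane" and every other "Doe"
first_clean = people["First Name"].str.strip().str.lower()
second_clean = people["Second Name"].str.strip().str.lower()
loose = people[(first_clean == "jane") | (second_clean == "doe")]

print(strict.index.tolist())
print(loose.index.tolist())
```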


In [18]:
# Assuming 'df' (cleaned MQLs), 'df3' and 'df4' (raw NWATs) are loaded.
# We'll use the *raw* df3/df4 for initial checks, and then consider their cleaned versions.

search_first_name = 'hezron'
search_second_name = 'masinde'

# --- Diagnostic Step 1: Check in MQL Data (df) ---
print("\n--- Searching for Hezron Masinde in MQL (df) Data ---")
# Use the cleaned 'Email' and 'MQL_Date' for consistency with our analysis
hezron_in_mql = df[
    (df['First Name'].astype(str).str.lower() == search_first_name) &
    (df['Second Name'].astype(str).str.lower() == search_second_name)
].copy() # Use .copy() to avoid SettingWithCopyWarning

if not hezron_in_mql.empty:
    print("Found in MQL data:")
    print(hezron_in_mql[['First Name', 'Second Name', 'Email', 'MQL_Date']].to_string(index=False))
    hezron_mql_email = hezron_in_mql['Email'].iloc[0]
    hezron_mql_date = hezron_in_mql['MQL_Date'].min()  # earliest of his MQL dates
    print(f"Hezron's MQL Email: {hezron_mql_email}")
    print(f"Hezron's MQL Date: {hezron_mql_date}")
else:
    print("Hezron Masinde NOT found in MQL data (df).")
    hezron_mql_email = None
    hezron_mql_date = None


# --- Diagnostic Step 2: Check in Raw NWAT Data (df3 and df4) ---
print("\n--- Searching for Hezron Masinde in Raw NWAT (df3, df4) Data ---")
# Combine raw NWAT data for searching
raw_nwat_combined = pd.concat([df3, df4], ignore_index=True)

# Normalize email and date in helper columns used only for this search (missing emails stay NaN)
raw_nwat_combined['KYC E-Mail_cleaned'] = raw_nwat_combined['KYC E-Mail'].str.strip().str.replace('\n', '', regex=False).str.lower()
raw_nwat_combined['Production (Date)_dt'] = pd.to_datetime(raw_nwat_combined['Production (Date)'], format='%d-%b-%Y', errors='coerce')


# Search by cleaned email, if found in MQLs
if hezron_mql_email:
    print(f"Searching raw NWAT data for email: {hezron_mql_email}")
    hezron_in_nwat_by_email = raw_nwat_combined[
        raw_nwat_combined['KYC E-Mail_cleaned'] == hezron_mql_email
    ].copy()

    if not hezron_in_nwat_by_email.empty:
        print("Found in raw NWAT data (by email):")
        # Display earliest production date for this email from raw NWATs
        earliest_nwat_date = hezron_in_nwat_by_email.sort_values(by='Production (Date)_dt').iloc[0]
        print(earliest_nwat_date[['Name', 'KYC E-Mail', 'Production (Date)']].to_string())
        print(f"Earliest NWAT Production Date: {earliest_nwat_date['Production (Date)_dt']}")
        hezron_nwat_date = earliest_nwat_date['Production (Date)_dt']
    else:
        print(f"Email '{hezron_mql_email}' NOT found in raw NWAT data (df3/df4).")
        hezron_nwat_date = None
else:
    print("Cannot search raw NWAT data by email as Hezron's email not found in MQLs.")
    # Fallback to name search if MQL email not found (less reliable)
    print(f"Searching raw NWAT data by name: {search_first_name} {search_second_name}")
    hezron_in_nwat_by_name_fallback = raw_nwat_combined[
        (raw_nwat_combined['Name'].astype(str).str.lower().str.contains(search_first_name)) &
        (raw_nwat_combined['Name'].astype(str).str.lower().str.contains(search_second_name))
    ].copy()
    if not hezron_in_nwat_by_name_fallback.empty:
        print("Found in raw NWAT data (by name fallback):")
        print(hezron_in_nwat_by_name_fallback[['Name', 'KYC E-Mail', 'Production (Date)']].to_string(index=False))
        # Take the email from here for further checks
        hezron_mql_email = hezron_in_nwat_by_name_fallback['KYC E-Mail_cleaned'].iloc[0]
        hezron_nwat_date = hezron_in_nwat_by_name_fallback['Production (Date)_dt'].min()
    else:
        print("Hezron Masinde NOT found in raw NWAT data (df3/df4) by name either.")
        hezron_nwat_date = None


# --- Diagnostic Step 3: Compare Dates (if both found) ---
print("\n--- Comparing MQL and NWAT Dates ---")
if pd.notna(hezron_mql_date) and pd.notna(hezron_nwat_date):  # also rejects NaT from unparseable dates
    print(f"Hezron's MQL Date: {hezron_mql_date}")
    print(f"Hezron's Earliest NWAT Date: {hezron_nwat_date}")
    if hezron_nwat_date >= hezron_mql_date:
        print("Result: NWAT Date is ON or AFTER MQL Date. This record should be counted.")
    else:
        print("Result: NWAT Date is BEFORE MQL Date. This record would be EXCLUDED from MQL-driven NWAT conversion.")
else:
    print("Cannot compare dates: MQL or NWAT record for Hezron not fully found.")

print("\n--- End of Diagnostic Search ---")


--- Searching for Hezron Masinde in MQL (df) Data ---
Found in MQL data:
First Name Second Name                    Email   MQL_Date
    Hezron     Masinde masindehezron6@gmail.com 2025-02-20
    Hezron     Masinde masindehezron6@gmail.com 2025-03-06
    Hezron     Masinde masindehezron6@gmail.com 2025-03-20
Hezron's MQL Email: masindehezron6@gmail.com
Hezron's MQL Date: 2025-02-20 00:00:00

--- Searching for Hezron Masinde in Raw NWAT (df3, df4) Data ---
Searching raw NWAT data for email: masindehezron6@gmail.com
Email 'masindehezron6@gmail.com' NOT found in raw NWAT data (df3/df4).

--- Comparing MQL and NWAT Dates ---
Cannot compare dates: MQL or NWAT record for Hezron not fully found.

--- End of Diagnostic Search ---
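The diagnostic found Hezron Masinde's MQL email (`masindehezron6@gmail.com`) nowhere in df3/df4, and its name fallback only runs when the MQL lookup itself fails; the next cell then hard-codes a different address. A minimal sketch of surfacing such near-miss records for manual review by requiring both name tokens in either the `Name` column or the email's local part. The helper name and rows are made up, and substring matching will give false positives for common names, so the results still need a human check.

```python
import pandas as pd

# Hypothetical transaction-listing rows; Jane Doe appears under emails that differ from her MQL one
nwat = pd.DataFrame({
    "Name": ["JANE DOE", "JOSEPH KITEME", None],
    "KYC E-Mail": ["janedoe@example.com", "bima.kit@example.com", "doe.jane2@example.com"],
})


def candidate_rows(frame: pd.DataFrame, first: str, second: str) -> pd.DataFrame:
    """Hypothetical helper: rows whose Name or email local part contains both name tokens."""
    tokens = [first.lower(), second.lower()]

    def has_all_tokens(text: str) -> bool:
        return all(token in text for token in tokens)

    name = frame["Name"].fillna("").str.lower()
    local_part = frame["KYC E-Mail"].fillna("").str.lower().str.split("@").str[0]
    return frame[name.map(has_all_tokens) | local_part.map(has_all_tokens)]


result = candidate_rows(nwat, "Jane", "Doe")["KYC E-Mail"].tolist()
print(result)
```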


In [19]:
import pandas as pd
import numpy as np

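# --- Re-apply MQL Data Cleaning, now with Deduplication by Email (df) ---
# Note: this re-uses the df already cleaned above (it does not re-read the CSV). The rename
# and cleaning steps are idempotent; this pass adds the email deduplication before the manual correction.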

# MQL Data (df)
df.rename(columns={
    '`': 'MQL_Internal_ID',
    'q1aqa1111111111`': 'MQL_Date'
}, inplace=True)

df['Email'] = df['Email'].str.strip().str.replace('\n', '', regex=False).str.lower()  # .str keeps missing emails as NaN
df['MQL_Date'] = pd.to_datetime(df['MQL_Date'], dayfirst=True, errors='coerce')
df.dropna(subset=['MQL_Date', 'Email'], inplace=True)

df.sort_values(by='MQL_Date', inplace=True)
df.drop_duplicates(subset='Email', keep='first', inplace=True)
print(f"MQLs (unique by email) after initial cleaning: {len(df)}")

# --- CRITICAL FIX: Manually correct Hezron Masinde's email in MQL (df) data ---
# Replace the old email with the correct one found in NWAT data
old_email = 'masindehezron6@gmail.com'
correct_email_in_nwat = 'hezronmasinde@gmail.com'

if old_email in df['Email'].values:
    df.loc[df['Email'] == old_email, 'Email'] = correct_email_in_nwat
    print(f"Corrected email for '{old_email}' to '{correct_email_in_nwat}' in MQL data (df).")
else:
    print(f"Warning: Old MQL email '{old_email}' not found for correction. Perhaps already corrected or MQL not present.")

print(f"MQLs (unique by email) after potential email correction: {len(df)}\n")


# --- NWAT Data Cleaning & Standardization (df3, df4) (Same as before) ---
print("Step 1: Cleaning and preparing NWAT data...")

consolidated_nwat_df = pd.concat([df3, df4], ignore_index=True)
consolidated_nwat_df['KYC E-Mail'] = consolidated_nwat_df['KYC E-Mail'].str.strip().str.replace('\n', '', regex=False).str.lower()  # missing emails stay NaN
consolidated_nwat_df['Production (Date)'] = pd.to_datetime(consolidated_nwat_df['Production (Date)'], format='%d-%b-%Y', errors='coerce')
consolidated_nwat_df.dropna(subset=['Production (Date)', 'KYC E-Mail'], inplace=True)
consolidated_nwat_df.sort_values(by='Production (Date)', inplace=True)
consolidated_nwat_df.drop_duplicates(subset='KYC E-Mail', keep='first', inplace=True)
print(f"Cleaned NWAT Data Head (First Transaction per Email):\n{consolidated_nwat_df[['KYC E-Mail', 'Production (Date)']].head()}\n")


# --- Data Merging/Linking (MQLs to NWATs) (Same as before) ---
print("Step 2: Merging MQLs with NWATs...")
mql_nwat_analysis_df = df.merge(
    consolidated_nwat_df[['KYC E-Mail', 'Production (Date)']],
    left_on='Email',
    right_on='KYC E-Mail',
    how='left'
)
mql_nwat_analysis_df.rename(columns={'Production (Date)': 'NWAT_Date'}, inplace=True)
mql_nwat_analysis_df['Is_NWAT'] = mql_nwat_analysis_df['NWAT_Date'].notna()
print(f"Merged MQL-NWAT data head:\n{mql_nwat_analysis_df[['Email', 'MQL_Date', 'NWAT_Date', 'Is_NWAT']].head()}\n")


# --- MQL to NWAT Conversion Rate Calculation (Same as before) ---
print("Step 3: Calculating MQL to NWAT Conversion Rate...")
total_mqls_base = len(df)
converted_mqls_to_nwat = mql_nwat_analysis_df[
    mql_nwat_analysis_df['Is_NWAT'] &
    (mql_nwat_analysis_df['NWAT_Date'] >= mql_nwat_analysis_df['MQL_Date'])
].copy()  # .copy() so adding 'Aging_Days_NWAT' below does not trigger SettingWithCopyWarning
num_converted_to_nwat = len(converted_mqls_to_nwat)

conversion_rate_nwat = (num_converted_to_nwat / total_mqls_base) * 100 if total_mqls_base > 0 else 0

print(f"--- MQL to NWAT Conversion Analysis (Overall) ---")
print(f"Total Unique MQLs: {total_mqls_base}")
print(f"Unique MQLs Converted to NWAT (first transaction ON or AFTER MQL date): {num_converted_to_nwat}")
print(f"**Conversion Rate (MQL to NWAT): {conversion_rate_nwat:.2f}%**\n")


# --- Aging Analysis (Time to NWAT Conversion) (Same as before) ---
print("Step 4: Performing Time Analysis (Aging) for NWATs...")

if not converted_mqls_to_nwat.empty:
    converted_mqls_to_nwat['Aging_Days_NWAT'] = (converted_mqls_to_nwat['NWAT_Date'] - converted_mqls_to_nwat['MQL_Date']).dt.days

    average_aging_days_nwat = converted_mqls_to_nwat['Aging_Days_NWAT'].mean()
    median_aging_days_nwat = converted_mqls_to_nwat['Aging_Days_NWAT'].median()
    min_aging_days_nwat = converted_mqls_to_nwat['Aging_Days_NWAT'].min()
    max_aging_days_nwat = converted_mqls_to_nwat['Aging_Days_NWAT'].max()

    print(f"--- Time to First Transaction (NWAT Aging) for Converted MQLs ---")
    print(f"Average: {average_aging_days_nwat:.2f} days")
    print(f"Median: {median_aging_days_nwat:.2f} days")
    print(f"Minimum: {min_aging_days_nwat} days")
    print(f"Maximum: {max_aging_days_nwat} days")

    # Display individual NWAT conversion details including names and aging
    print("\n--- Details of Each Converted MQL (MQL to NWAT) ---")
    display_columns = ['First Name', 'Second Name', 'Email', 'MQL_Date', 'NWAT_Date', 'Aging_Days_NWAT']
    available_cols = [col for col in display_columns if col in converted_mqls_to_nwat.columns]
    print(converted_mqls_to_nwat[available_cols].to_string(index=False))

else:
    print("No MQLs converted to NWAT (first transaction on or after MQL date) to calculate aging or display details.")

print("\nMQL to NWAT analysis complete.")

MQLs (unique by email) after initial cleaning: 110
Corrected email for 'masindehezron6@gmail.com' to 'hezronmasinde@gmail.com' in MQL data (df).
MQLs (unique by email) after potential email correction: 110

Step 1: Cleaning and preparing NWAT data...
Cleaned NWAT Data Head (First Transaction per Email):
                            KYC E-Mail Production (Date)
8          riskwiseinsurance@gmail.com        2023-05-27
9              mwakatheresia@gmail.com        2023-07-20
10          admin@innovexinsurance.com        2024-02-20
11  kangwae.insurance.agency@gmail.com        2024-03-22
0                 bima.kit@outlook.com        2024-04-19

Step 2: Merging MQLs with NWATs...
Merged MQL-NWAT data head:
                        Email   MQL_Date NWAT_Date  Is_NWAT
0       jmwambugu34@gmail.com 2025-01-22       NaT    False
1  estyerwanjiru100@gmail.com 2025-01-22       NaT    False
2   fredrickmicheni@gmail.com 2025-01-22       NaT    False
3         mchemjor4@gmail.com 2025-01-22       NaT

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  converted_mqls_to_nwat['Aging_Days_NWAT'] = (converted_mqls_to_nwat['NWAT_Date'] - converted_mqls_to_nwat['MQL_Date']).dt.days
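The warning above is raised when `Aging_Days_NWAT` is added to `converted_mqls_to_nwat`, a boolean-filtered slice of `mql_nwat_analysis_df`; calling `.copy()` on the filtered frame avoids it. Separately, a single hard-coded email fix does not scale if more mismatches turn up. A minimal sketch of both ideas on made-up data, assuming a hypothetical, hand-maintained `EMAIL_CORRECTIONS` dictionary.

```python
import pandas as pd

# Hypothetical correction table: MQL email -> email used in the transaction listings
EMAIL_CORRECTIONS = {"old.address@example.com": "new.address@example.com"}

mqls = pd.DataFrame({
    "Email": ["old.address@example.com", "other@example.com"],
    "MQL_Date": pd.to_datetime(["2025-02-20", "2025-02-02"]),
})
nwat = pd.DataFrame({
    "KYC E-Mail": ["new.address@example.com"],
    "NWAT_Date": pd.to_datetime(["2025-03-10"]),
})

# Apply every known correction in one step; emails not in the table are left unchanged
mqls["Email"] = mqls["Email"].replace(EMAIL_CORRECTIONS)

merged = mqls.merge(nwat, left_on="Email", right_on="KYC E-Mail", how="left")

# .copy() gives the filtered rows their own data, so adding a column raises no SettingWithCopyWarning
converted = merged[merged["NWAT_Date"] >= merged["MQL_Date"]].copy()
converted["Aging_Days_NWAT"] = (converted["NWAT_Date"] - converted["MQL_Date"]).dt.days

print(converted[["Email", "Aging_Days_NWAT"]].to_string(index=False))
```

Keeping corrections in one table also documents every manual override in a single place for later review.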


In [None]:
df5 = pd.read_csv('')  # file path not yet specified; this cell has not been run