# Growth Data Analyst Exercise: Full report
#### By Umar Butt, Friday 9-NOV-2024

# Summary

This analysis focused on evaluating verification success ratios across three test groups using SMS and WhatsApp as verification methods. Group A (SMS-only) demonstrated the lowest success ratio (0.8732), while Groups B and C, which offered both methods, achieved higher ratios, suggesting that providing users with multiple verification options enhances success. Group C (WhatsApp > SMS) had the highest overall success ratio at 0.9282, with SMS achieving the highest individual success ratio within this group (0.9535).

The findings show that SMS generally outperforms WhatsApp in verification success, particularly when offered as a secondary option. User preference tends to align with the first method offered, as shown by the higher WhatsApp uptake in Group C and higher SMS uptake in Group B. A key insight is that a two-method setup increases the likelihood of successful verification compared to a single-method approach.

Further analysis could integrate cost data by region, user demographics, and connectivity to optimise method selection for different user segments. This would allow for more targeted and cost-effective verification strategies. In conclusion, a dual-method approach that prioritises SMS where feasible appears to be a robust strategy for improving verification success, given that SMS achieved the higher success ratio in both dual-method groups.





# 1. Which screen would you suggest we proceed with?

To determine the optimal screen, we analysed `verification` success ratio across `methods` and `groups`, aiming to select the screen configuration with the highest success ratio for future use.

- 1.a. Verification success ratio by group and method:
    - This query identifies the combination of group and method with the highest verification success ratio.



In [38]:
import pandas as pd
from database_utils import DatabaseConnector as dc
from decouple import config

# Initialize the DatabaseConnector instance
db_connector = dc()

# Load the database credentials (assuming 'config' pulls from your .env or credentials file)
db_credentials = db_connector.read_db_creds(config('credentials_env'))  # Update if needed

# Initialize the database engine
engine = db_connector.init_db_engine(db_credentials)

# Check if the engine is initialized
if engine is None:
    print("Failed to initialize the database engine.")
else:
    # Now you can execute your SQL query and load it into a Pandas DataFrame
    query_1a = '''
    SELECT 
        v.group, 
        v.method, 
        ROUND(AVG(v.verified), 4) AS verification_success_ratio
    FROM 
        verification AS v
    GROUP BY 
        v.group, v.method
    ORDER BY 
        verification_success_ratio DESC;
    '''
    
    # Use pd.read_sql_query to directly execute the query and return the result as a DataFrame
    result_df = pd.read_sql_query(query_1a, engine)

    # Display the DataFrame (first few rows)
    display(result_df.head())


Unnamed: 0,group,method,verification_success_ratio
0,C,Sms,0.9535
1,B,Sms,0.9298
2,B,Whatsapp,0.9145
3,C,Whatsapp,0.914
4,A,Sms,0.8732
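
`AVG(v.verified)` works as a success ratio because `verified` is a 0/1 flag, so its mean is the share of rows equal to 1. A minimal pandas sketch of the same aggregation, run on a handful of hypothetical rows (the `toy` frame is invented for illustration and is not the real `verification` table):

```python
import pandas as pd

# HYPOTHETICAL toy rows; the real `verification` table is not reproduced here.
toy = pd.DataFrame({
    "group":    ["A", "A", "B", "B", "C", "C", "C", "C"],
    "method":   ["Sms", "Sms", "Sms", "Whatsapp", "Sms", "Whatsapp", "Whatsapp", "Whatsapp"],
    "verified": [1, 0, 1, 1, 1, 1, 0, 1],
})

# The mean of a 0/1 flag is the share of successful rows, i.e. the success ratio.
toy_ratio = (
    toy.groupby(["group", "method"], as_index=False)["verified"]
       .mean()
       .rename(columns={"verified": "verification_success_ratio"})
       .sort_values("verification_success_ratio", ascending=False)
       .reset_index(drop=True)
)
print(toy_ratio)
```

The pandas version is handy for spot-checking the SQL result on a small extract before running it against the full table.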


- 1.b. Verification success ratio by group:


In [37]:
# Define the new SQL query
query_1b = '''
SELECT 
    v.group, 
    ROUND(AVG(v.verified), 4) AS verification_success_ratio,
    ROUND(COUNT(*) * 100 / SUM(COUNT(*)) OVER (), 2) AS group_representation_percentage
FROM 
    verification AS v
GROUP BY 
    v.group
ORDER BY 
    verification_success_ratio DESC;
'''

# Execute the new query and load the result into a DataFrame
result_df2 = pd.read_sql_query(query_1b, engine)

# Display the new DataFrame (first few rows)
display(result_df2.head())


Unnamed: 0,group,verification_success_ratio,group_representation_percentage
0,C,0.9282,33.21
1,B,0.9281,32.15
2,A,0.8732,34.64



Insights:

- `Highest Success ratio`: Group C (WhatsApp > SMS) demonstrates the highest success ratio (0.9282), followed closely by Group B (SMS > WhatsApp) at 0.9281, a margin of only 0.0001.
- `Lowest Success ratio`: Group A (SMS-only) has the lowest success ratio of 0.8732, indicating that including an alternative verification method may enhance effectiveness.
- `Representation`: Distribution among test groups is fairly balanced: Group A makes up 34.64% of total participants, Group B 32.15%, and Group C 33.21%.
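
One way to gauge whether these gaps exceed sampling noise is a two-proportion z-test. The sketch below is only indicative: the group sizes are assumed, reconstructed from the total row count in the data (16,750 + 1,674 = 18,424) and each group's rounded percentage share, and every row is treated as an independent attempt. The helper `two_proportion_z` is a hypothetical name, not part of the original analysis.

```python
from math import sqrt, erfc

def two_proportion_z(p1, n1, p2, n2):
    """Return (z, two-sided p-value) for H0: both groups share one success ratio."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return z, p_value

# ASSUMPTION: group sizes are approximated as total rows times the rounded share.
total = 16750 + 1674
n_a, n_b, n_c = (round(total * share / 100) for share in (34.64, 32.15, 33.21))

z_cb, p_cb = two_proportion_z(0.9282, n_c, 0.9281, n_b)  # Group C vs Group B
z_ca, p_ca = two_proportion_z(0.9282, n_c, 0.8732, n_a)  # Group C vs Group A
print(f"C vs B: z={z_cb:.2f}, p={p_cb:.3f}")
print(f"C vs A: z={z_ca:.2f}, p={p_ca:.3g}")
```

Under these assumptions, the gap between Groups C and B looks well within sampling noise, while the gap to Group A does not. A proper test would use the exact per-group counts from the database.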



# 2. Which success metrics did you consider and why?

To evaluate each method’s performance, I analysed success ratios by `method` within each `group`, especially for Groups B and C, where users had access to both SMS and WhatsApp. This query retrieves essential metrics, including `success ratio` and `usage distribution`.



In [40]:
query_2 = '''
SELECT 
    v.group, 
    v.method, 
    ROUND(AVG(v.verified), 4) AS verification_success_ratio,
    ROUND(COUNT(*) / SUM(COUNT(*)) OVER (), 2) AS overall_method_representation_ratio ,
    ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (PARTITION BY v.group), 2) AS method_distribution_within_group_percent 
FROM 
    verification AS v
GROUP BY 
    v.group, v.method
ORDER BY 
    verification_success_ratio DESC;
'''

# Execute the query and store results in a DataFrame
result_df3 = pd.read_sql_query(query_2, engine)

# Display the DataFrame with the new query result
display(result_df3.head())

Unnamed: 0,group,method,verification_success_ratio,overall_method_representation_ratio,method_distribution_within_group_percent
0,C,Sms,0.9535,0.12,36.17
1,B,Sms,0.9298,0.29,88.74
2,B,Whatsapp,0.9145,0.04,11.26
3,C,Whatsapp,0.914,0.21,63.83
4,A,Sms,0.8732,0.35,100.0



Insights:

- Group A (SMS-only): Success ratio of 0.8732, with SMS used exclusively.
- Group B (SMS > WhatsApp):
    - SMS has a 0.9298 success ratio, representing 88.74% of group interactions.
    - WhatsApp has a slightly lower success ratio of 0.9145, representing 11.26% of the group.
- Group C (WhatsApp > SMS):
    - WhatsApp is used more frequently (63.83%) but has a lower success ratio (0.9140).
    - SMS, though less used (36.17%), has the highest success ratio across all methods at 0.9535.

Key Observations:

- Dual-Method Effectiveness: 
    - Groups B and C, both of which offer SMS and WhatsApp, show higher success ratios than Group A, suggesting that providing a choice of methods may increase verification success.
- Higher Success with SMS: 
    - Within Groups B and C, SMS consistently outperforms WhatsApp, especially in Group C, where SMS achieves a 0.9535 success ratio, indicating it may be a more reliable option in some cases.
- First Method Preference: 
    - In groups with two methods, the first method is often preferred: Group C (WhatsApp > SMS) favors WhatsApp at 63.83%, while Group B (SMS > WhatsApp) heavily favors SMS at 88.74%. WhatsApp's lower success ratio in Group C (0.9140, against 0.9535 for SMS), where it is the prioritised method, suggests that SMS could be a better first choice.
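
As a consistency check, each dual-method group's overall ratio should equal the share-weighted average of its per-method ratios. The sketch below recomputes it from the rounded figures in the `query_2` output; `blended_ratio` is a hypothetical helper name, and small differences from the reported group ratios are expected because the inputs are already rounded.

```python
# Values copied from the query_2 output: (success ratio, % share within group).
by_method = {
    "B": {"Sms": (0.9298, 88.74), "Whatsapp": (0.9145, 11.26)},
    "C": {"Sms": (0.9535, 36.17), "Whatsapp": (0.9140, 63.83)},
}

def blended_ratio(methods):
    """Share-weighted average of the per-method success ratios."""
    return sum(ratio * share for ratio, share in methods.values()) / 100

blended = {group: round(blended_ratio(m), 4) for group, m in by_method.items()}
print(blended)
```

The recomputed values land within rounding distance of the group-level ratios reported in section 1 (0.9281 for Group B and 0.9282 for Group C), which suggests the two queries agree.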



# 3. Would you incorporate additional data if you could?

Additional data on `method costs` and `regional distribution` could improve cost-effectiveness and performance insights, allowing for refined recommendations on method allocation by region.




In [41]:
query_3 = '''
SELECT 
    p.country, 
    v.method,
    COUNT(v."userID") AS user_count,
    ROUND(
        CASE 
            WHEN v.method = 'Whatsapp' THEN COUNT(v."userID") * c.whatsapp_usd::numeric
            WHEN v.method = 'Sms' THEN COUNT(v."userID") * c.sms_usd::numeric
        END, 
        2
    ) AS total_cost,
    ROUND(AVG(v.verified), 4) AS verification_success_ratio
FROM 
    verification AS v
JOIN 
    profiles AS p 
ON 
    v."userID" = p."userID"
JOIN 
    costs AS c 
ON 
    p.country = c.country
GROUP BY 
    p.country, v.method, c.whatsapp_usd, c.sms_usd
ORDER BY 
    total_cost DESC;
'''

# Execute the query and store results in a DataFrame
result_df4 = pd.read_sql_query(query_3, engine)  # separate name so the query_2 result is not overwritten

# Display the DataFrame with the new query result
display(result_df4)


Unnamed: 0,country,method,user_count,total_cost,verification_success_ratio
0,PK,Sms,1096,197.28,0.8823
1,MA,Sms,1511,181.32,0.8776
2,ID,Sms,711,149.48,0.8650
3,EG,Sms,804,102.98,0.8682
4,DZ,Sms,539,70.07,0.8998
...,...,...,...,...,...
265,RE,Whatsapp,1,0.04,0.0000
266,TD,Whatsapp,1,0.04,1.0000
267,BO,Whatsapp,1,0.04,1.0000
268,TG,Whatsapp,1,0.04,1.0000


Insights:

- Cost vs Success Rate: 
    - The query reveals total costs per `method` by `country`, along with success ratios, highlighting whether higher costs correlate with higher success rates. If SMS consistently delivers higher success but incurs more cost than WhatsApp, prioritising SMS in regions where it has the highest performance may be justified.
- Cost-Effectiveness by Region: 
    - Additional demographic or regional data could further fine-tune optimal verification methods by balancing cost with success.
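
To compare cost with success on a single scale, one option is the cost per successful verification: total spend divided by the estimated number of successes. The sketch below applies this to the five rows visible at the top of the `query_3` output. It assumes each counted row corresponds to one billed message, and `cost_per_success` is a hypothetical helper rather than part of the original analysis.

```python
# Rows copied from the visible part of the query_3 output:
# (country, method, user_count, total_cost in USD, verification_success_ratio).
rows = [
    ("PK", "Sms", 1096, 197.28, 0.8823),
    ("MA", "Sms", 1511, 181.32, 0.8776),
    ("ID", "Sms", 711, 149.48, 0.8650),
    ("EG", "Sms", 804, 102.98, 0.8682),
    ("DZ", "Sms", 539, 70.07, 0.8998),
]

def cost_per_success(user_count, total_cost, success_ratio):
    """Total spend divided by the estimated number of successful verifications."""
    successes = user_count * success_ratio
    return total_cost / successes if successes else float("inf")

# Cheapest cost per success first.
ranked = sorted(
    ((country, method, round(cost_per_success(n, cost, ratio), 4))
     for country, method, n, cost, ratio in rows),
    key=lambda r: r[2],
)
for country, method, cps in ranked:
    print(country, method, cps)
```

Ranking by this metric rather than by total cost avoids penalising countries simply for having more users.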

Summary:

- Performance by Group: 
    - `Group C (WhatsApp > SMS)` has the `highest overall success rate`. SMS in Group C achieves the highest individual success rate across all methods. `Groups B and C`, which offer two verification methods, `outperform Group A (SMS-only)`.
- Method Performance: 
    - `SMS` generally shows a `higher success ratio` than WhatsApp, especially in groups where SMS is offered as a secondary option.
- Value of Additional Data: 
    - Incorporating cost and regional demographic data would help tailor verification methods for each region, balancing success rates with cost for more efficient outcomes.
    
This approach enables a comprehensive decision based on performance, user preference, and cost-effectiveness.





# 4. Why do you think your chosen screen performed best?

Several key factors likely contributed to the strong performance of `Group C’s screen`:

- Higher Verification Success Ratio:
    - Group C achieved the highest overall verification success ratio (0.9282), with SMS in this group reaching a notably high ratio of 0.9535. This indicates that having access to multiple verification methods, especially when SMS is available, may significantly boost the overall success rate.

- Choice of Verification Methods:
    - Offering `both SMS and WhatsApp` seems to enhance user experience and verification success, as Groups B and C both showed higher ratios compared to Group A (which only had SMS). This suggests that providing a `choice can help accommodate different user preferences` and situations, `improving overall outcomes.`

- Effectiveness of SMS as a Secondary Option:
    - In Group C, where WhatsApp was the primary option, users who chose SMS still achieved the highest success ratio among all tested methods (0.9535). This highlights that, while WhatsApp might be appealing to some users, SMS remains a highly reliable verification method and can be crucial when available as a backup.

- Preferred Usage of Primary Method:
    - The analysis also revealed that `users tend to favor the first method` presented. In Group C (WhatsApp > SMS), WhatsApp was preferred by 63.83% of users, while in `Group B (SMS > WhatsApp), SMS was preferred by 88.74%`. This demonstrates that the `order in which methods are offered may influence the user's choice`, with the first option often being the more popular. However, even when not the first choice, SMS still showed higher success ratios, suggesting it is the more dependable method.




# 5. What additional analysis could you perform if more data were available?

If additional data were available, several further analyses could be conducted to gain deeper insights into the verification process:

- Cost-Benefit Analysis by Region:

    - Incorporating data on the cost of each method (SMS and WhatsApp) by region would enable a cost-benefit analysis, optimising the selection of methods based on cost efficiency. For example, if SMS incurs a higher cost but results in significantly better success rates in certain regions, we could prioritise SMS where the cost is justified by higher success.

- Demographic Insights (e.g., Age, Device Type):

    - With demographic data such as user age and device details (e.g., iOS vs. Android, model, and device age), we could assess whether specific demographics or devices show a preference for, or higher success rates with, either SMS or WhatsApp. This could inform targeted verification strategies for different user groups, enhancing both cost efficiency and user experience. Additionally, data on network types and geographical areas could help identify regions with poor coverage or service, highlighting areas where service reliability needs improvement.

- Time-of-Day or Day-of-Week Analysis:

    - If timestamp data were available for each verification attempt, success rates could be analysed by time of day or day of the week to reveal trends that could optimise verification timing. For instance, certain methods may be more effective during working hours or on weekends, enabling us to tailor method availability based on usage patterns.

- Impact of Network and Internet Connectivity:
    - Information on users' connectivity quality (e.g., mobile network strength or Wi-Fi availability) could provide insights into why certain users may prefer SMS over WhatsApp or vice versa. This would be particularly valuable in regions with inconsistent internet reliability, where SMS might serve as a better fallback.

- User Feedback on Method Preference:

    - Gathering user feedback on their experience with each verification method would provide qualitative insights, helping to understand user satisfaction. This is especially useful in cases where verification fails, allowing for adjustments that align with user expectations and improve overall success rates.

- Analysis of Method Switching Behavior:
    - Given the study's limitations, a key area for further analysis would be examining user behavior when switching between verification methods. If users have the option to switch methods after an initial failure, it could improve overall success rates. This is not currently captured in the available data, so understanding switching behavior would help form a more complete picture of user preferences and success rates, especially in cases where the first method fails.
    
By combining these additional analyses with the quantitative success metrics, the verification strategy could be refined to achieve greater efficiency, satisfaction, and cost-effectiveness.
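
As an illustration of the time-based analysis listed above, the sketch below splits success ratios by weekday versus weekend. The data is entirely hypothetical: the current table has no timestamp column, and `attempted_at` is an invented name used only to show the shape of the calculation.

```python
import pandas as pd

# HYPOTHETICAL data: `attempted_at` does not exist in the current dataset.
attempts = pd.DataFrame({
    "method": ["Sms", "Sms", "Whatsapp", "Whatsapp", "Sms", "Whatsapp"],
    "verified": [1, 0, 1, 1, 1, 0],
    "attempted_at": pd.to_datetime([
        "2024-11-04 09:15", "2024-11-04 21:40", "2024-11-05 10:05",
        "2024-11-09 14:30", "2024-11-10 11:00", "2024-11-10 23:10",
    ]),
})

# dayofweek runs Monday=0 ... Sunday=6, so 5 and 6 mark the weekend.
attempts["is_weekend"] = attempts["attempted_at"].dt.dayofweek >= 5

by_period = (
    attempts.groupby(["method", "is_weekend"])["verified"]
            .agg(success_ratio="mean", attempts="count")
            .reset_index()
)
print(by_period)
```

The same pattern would work for hour-of-day buckets by grouping on `attempted_at.dt.hour` instead, provided each bucket holds enough attempts for the ratio to be meaningful.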

If more time were available, I would focus on investigating why Group A (SMS-only) had a noticeably lower verification success ratio (0.8732) than the other groups, despite the sample representation being relatively balanced. Across all groups, the data contains 16,750 successes (verified = 1) and 1,674 failures (verified = 0), an overall success ratio of about 0.909, so Group A sits below the overall average. Its failures could stem from multiple factors such as issues with the verification method, network connectivity, user errors, or the lack of a fallback option. A more detailed analysis of these factors (methods, timing, user behavior, etc.) would help uncover the root cause of Group A's higher failure rate.




In [45]:
# SQL query for further analysis: overall counts of successful and failed verifications

query_5 = '''
SELECT verified, 
       COUNT(*) AS verification_count
FROM verification
GROUP BY verified
ORDER BY verified DESC;
'''

# Execute the query and store the result in a DataFrame
result_df5 = pd.read_sql_query(query_5, engine)  # separate name so the query_1a result is not overwritten

# Display the result
display(result_df5)


Unnamed: 0,verified,verification_count
0,1,16750
1,0,1674
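
A natural next step would be to split these counts by group, so that Group A's failures can be compared directly with those of Groups B and C. The sketch below runs such a query against an in-memory SQLite table filled with hypothetical rows. It is a stand-in for the report's database, and `query_by_group` is an illustrative name; the column `group` is quoted because it is a reserved word.

```python
import sqlite3

# In-memory SQLite stand-in with HYPOTHETICAL rows; the report's database is not used here.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE verification ("group" TEXT, method TEXT, verified INTEGER)')
conn.executemany(
    "INSERT INTO verification VALUES (?, ?, ?)",
    [("A", "Sms", 1), ("A", "Sms", 0), ("A", "Sms", 0),
     ("B", "Sms", 1), ("B", "Whatsapp", 1),
     ("C", "Whatsapp", 1), ("C", "Sms", 0)],
)

# Successes are the sum of the 0/1 flag; failures are the remaining rows.
query_by_group = '''
SELECT "group",
       SUM(verified)            AS successes,
       COUNT(*) - SUM(verified) AS failures
FROM verification
GROUP BY "group"
ORDER BY "group";
'''
breakdown = conn.execute(query_by_group).fetchall()
conn.close()
print(breakdown)
```

Adding `method` to the `SELECT` and `GROUP BY` clauses would extend the breakdown to each group and method combination.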
