# Research with Gemini

This notebook tests Google's Gemini API.

Credentials are located in config.yaml in the ai-research root folder.

## Setup

In [7]:
# Imports
from chiefai.ai import make_gemini_request
from chiefai.db import query
import polars as pl

# Notebook formatting
from IPython.display import display, HTML, Markdown

In [2]:
data = query("SELECT * FROM web_visit_results LIMIT 5")
data.head(3)

id,core_client,project_id,user_id,session_id,unique_key,postal_code,region,dma,dma_code,city,country,browser,device,device_type,search_engine,medium,source,platform,platform_version,bounce,session_date_time,ip_address,pages,search_terms,language,latitude,longitude,organization,referrer,session_length,created_at,updated_at,core_product,last_access,content,session
i64,str,i64,str,str,str,str,str,str,i64,str,str,str,str,str,str,str,str,str,str,str,datetime[μs],str,list[struct[2]],str,str,f64,f64,str,str,str,datetime[μs],datetime[μs],str,datetime[μs],null,null
99735578,"""OPOS""",82884616,"""50182511673882""","""3288761085059530754""","""828846165018251167388232887610…",,,,0,,"""USA""","""Unknown""",,"""Desktop/laptop""",,"""paidsocial""","""snapchat""","""Unknown""","""Unknown""","""true""",2024-01-03 13:11:13,"""34.123.204.87""","[{""collections/vaginal-health"",0}]",,"""Unknown""",37.751,-97.822,"""Google""","""Direct""","""0""",2024-01-03 13:13:00.337214,2024-01-03 13:13:00.337784,"""""",2024-01-03 13:11:13,,
99735579,"""OPOS""",82884616,"""78135251450464""","""5120671839057608705""","""828846167813525145046451206718…","""32163""","""Florida""","""Orlando, FL""",534,"""The Villages""","""USA""","""Safari""","""Apple iPhone""","""Mobile""",,"""paidsocial""","""ig""","""iOS""","""iOS 17.1""","""true""",2024-01-03 13:11:13,"""68.205.39.5""","[{""collections/vaginal-health"",0}]",,"""English (United States)""",28.9265,-81.9928,"""Spectrum""","""instagram.com""","""0""",2024-01-03 13:13:00.337214,2024-01-03 13:13:00.337784,"""""",2024-01-03 13:11:13,,
99735580,"""OPOS""",82884616,"""103390995781948""","""6775832299565744285""","""828846161033909957819486775832…",,,,0,,"""USA""","""Chrome""","""Generic Android""","""Mobile""",,,"""Direct""","""Android""","""Android 9.0""","""true""",2024-01-03 13:11:13,"""147.160.184.123""","[{""tools/recurring/login"",0}]",,"""English (United States)""",37.751,-97.822,"""Unknown""","""Direct""","""0""",2024-01-03 13:13:00.337214,2024-01-03 13:13:00.337784,"""""",2024-01-03 13:11:13,,


## Assess AI Approach to Data Analysis

We'll use the make_gemini_request function from chiefai.ai (imported above) to send data to the model.

The approach is to compute summary statistics from the database, programmatically render them as text, and embed that text in the prompt we send to the AI model.

First, we'll compute order counts and revenue statistics by traffic source and ask the AI model to assess the results.


In [26]:
orders_query = """
    SELECT
        source,
        revenue::NUMERIC
    FROM web_event_results
    WHERE 
        LOWER(event_name) = 'order'
        AND event_date_time >= NOW() - INTERVAL '3 days' 
        AND core_client = 'OPOS'
        AND core_product = 'MENO'
    LIMIT 1000
"""
orders = query(orders_query)
orders = orders.with_columns(
    pl.col("revenue").cast(pl.Float64).alias("revenue")
)
print(orders.head(3))

shape: (3, 2)
┌─────────────────────┬─────────┐
│ source              ┆ revenue │
│ ---                 ┆ ---     │
│ str                 ┆ f64     │
╞═════════════════════╪═════════╡
│ portal.afterpay.com ┆ 180.08  │
│ instagram.com       ┆ 38.73   │
│ google              ┆ 45.47   │
└─────────────────────┴─────────┘


In [30]:
orders_agg = orders.group_by(
    "source"
).agg([
    pl.count("revenue").alias("source_count"),
    pl.median("revenue").alias("median_revenue"),
    pl.mean("revenue").alias("mean_revenue")
]).filter(
    pl.col("source_count") >= 30
)
orders_agg.head()

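shape: (5, 4)
┌───────────────┬──────────────┬────────────────┬──────────────┐
│ source        ┆ source_count ┆ median_revenue ┆ mean_revenue │
│ ---           ┆ ---          ┆ ---            ┆ ---          │
│ str           ┆ u32          ┆ f64            ┆ f64          │
╞═══════════════╪══════════════╪════════════════╪══════════════╡
│ instagram.com ┆ 81           ┆ 32.18          ┆ 43.182222    │
│ Direct        ┆ 193          ┆ 34.43          ┆ 50.141088    │
│ google        ┆ 321          ┆ 31.3           ┆ 40.197726    │
│ fb            ┆ 148          ┆ 36.98          ┆ 44.272973    │
│ Klaviyo       ┆ 50           ┆ 33.325         ┆ 46.5504      │
└───────────────┴──────────────┴────────────────┴──────────────┘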


In [36]:
prompt_data = ""

for row in orders_agg.iter_rows(named=True):
    source = row['source']
    count = row['source_count']
    median = round(row['median_revenue'], 2)
    mean = round(row['mean_revenue'], 2)
    
    # Generate a descriptive paragraph for this source
    source_description = f"""
        Source: {source}
        There were {count} orders from this source. 
        The median revenue per order was ${median:.2f}.
        The average (mean) revenue per order was ${mean:.2f}.
    """
    
    # Comparison of mean to median (currently disabled; uncomment to include it)
    # if mean > median:
    #     difference = round(mean - median, 2)
    #     source_description += f"The mean is ${difference:.2f} higher than the median, suggesting some high-value outlier orders.\n"
    # elif median > mean:
    #     difference = round(median - mean, 2)
    #     source_description += f"The median is ${difference:.2f} higher than the mean, suggesting some low-value outlier orders.\n"
    # else:
    #     source_description += f"The mean and median are identical, suggesting a symmetrical distribution of order values.\n"
        
    # Add a separator between sources
    source_description += "-" * 40 + "\n"
    
    # Add this source's description to the overall text
    prompt_data += source_description

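# Remove the final separator and add a summary.
# str.rstrip() treats its argument as a set of characters, not a suffix,
# so use str.removesuffix() (Python 3.9+) to drop exactly one separator.
prompt_data = prompt_data.removesuffix("-" * 40 + "\n")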
prompt_data += f"\n\nSummary: Analyzed revenue statistics for {len(orders_agg)} different traffic sources."

print("Generated the following descriptive text:")
print(prompt_data)

Generated the following descriptive text:

        Source: instagram.com
        There were 81 orders from this source. 
        The median revenue per order was $32.18.
        The average (mean) revenue per order was $43.18.
    ----------------------------------------

        Source: Direct
        There were 193 orders from this source. 
        The median revenue per order was $34.43.
        The average (mean) revenue per order was $50.14.
    ----------------------------------------

        Source: google
        There were 321 orders from this source. 
        The median revenue per order was $31.30.
        The average (mean) revenue per order was $40.20.
    ----------------------------------------

        Source: fb
        There were 148 orders from this source. 
        The median revenue per order was $36.98.
        The average (mean) revenue per order was $44.27.
    ----------------------------------------

        Source: Klaviyo
        There were 50 orders from t
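
The generated text above carries the indentation of the notebook source into the prompt, and the separator is added and then stripped again. A minimal sketch of an alternative, assuming rows shaped like those from orders_agg.iter_rows(named=True): textwrap.dedent removes the shared indentation, and str.join places separators only between sources. The helper names format_source and build_prompt_data are hypothetical and not part of chiefai; the two sample rows are copied from the aggregate table above.

```python
from textwrap import dedent

SEPARATOR = "-" * 40


def format_source(row: dict) -> str:
    """Render one aggregated row as a short, unindented paragraph."""
    # dedent() strips the indentation shared by every line of the template.
    return dedent(f"""\
        Source: {row['source']}
        There were {row['source_count']} orders from this source.
        The median revenue per order was ${row['median_revenue']:.2f}.
        The average (mean) revenue per order was ${row['mean_revenue']:.2f}.""")


def build_prompt_data(rows: list) -> str:
    """Join per-source paragraphs with separators and append a summary line."""
    blocks = [format_source(row) for row in rows]
    # join() puts the separator between blocks only, so nothing needs stripping.
    body = f"\n{SEPARATOR}\n".join(blocks)
    summary = f"Summary: Analyzed revenue statistics for {len(rows)} different traffic sources."
    return f"{body}\n\n{summary}"


# Two rows copied from the aggregate table, for illustration only.
sample_rows = [
    {"source": "instagram.com", "source_count": 81,
     "median_revenue": 32.18, "mean_revenue": 43.182222},
    {"source": "Direct", "source_count": 193,
     "median_revenue": 34.43, "mean_revenue": 50.141088},
]

print(build_prompt_data(sample_rows))
```

In the notebook this could be called as build_prompt_data(list(orders_agg.iter_rows(named=True))), assuming the column names stay as they are.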

In [39]:
# Test AI function
prompt = f"""
I'm sharing revenue data from our e-commerce store, broken down by traffic source. The data includes order counts, median revenue, and mean revenue for each source.

[ANALYTICS DATA]
{prompt_data}

Based on this information, please provide:

1. A summary of our overall traffic and revenue patterns
2. An analysis of which traffic sources are most valuable in terms of:
   - Total revenue contribution
   - Revenue per order
   - Potential for growth based on order volume

3. Insights about any outliers or unusual patterns in the data

4. Strategic recommendations for:
   - Which traffic sources we should invest more in
   - Which sources might need optimization
   - Any suggested A/B tests or experiments

5. How our traffic source performance compares to typical e-commerce benchmarks

Please format your analysis with clear sections and include both tactical and strategic insights. Feel free to note any additional data that would help refine this analysis further.
"""

In [40]:
response = make_gemini_request(request_text=prompt)
display(Markdown(response))

Okay, here's an analysis of your e-commerce revenue data, broken down by traffic source, with recommendations for optimization and strategic investment.

**1. Overall Traffic and Revenue Patterns Summary**

*   **Dominant Sources:** Google and Direct traffic are the dominant sources, accounting for the highest order volume (321 and 193 orders respectively).
*   **Consistent Median Revenue:**  The median revenue per order is relatively consistent across all sources, hovering around the $31-$37 range. This suggests a stable core product offering that resonates across different customer segments.
*   **Mean vs. Median Discrepancy:** The mean revenue per order is consistently higher than the median across all sources. This indicates that a portion of orders within each source are significantly higher in value, pulling the average up.  This is a common pattern in e-commerce, showing that some customers spend considerably more than others.
*   **Revenue Skew:** Due to the consistent presence of higher-value customers, a small subset of users likely contributes disproportionately to total revenue across all sources. This highlights the importance of identifying and targeting high-value customers.

**2. Traffic Source Valuation**

*   **Total Revenue Contribution (Estimated - based on mean):**

    *   Google: 321 orders * $40.20/order = ~$12,904
    *   Direct: 193 orders * $50.14/order = ~$9,677
    *   fb: 148 orders * $44.27/order = ~$6,552
    *   instagram.com: 81 orders * $43.18/order = ~$3,498
    *   Klaviyo: 50 orders * $46.55/order = ~$2,328
    *   applovin: 33 orders * $41.30/order = ~$1,363

    *   **Conclusion:** Google is the top revenue driver by a significant margin, followed by Direct, and then Facebook.

*   **Revenue Per Order (Mean):**

    *   Direct: $50.14
    *   Klaviyo: $46.55
    *   fb: $44.27
    *   instagram.com: $43.18
    *   applovin: $41.30
    *   Google: $40.20

    *   **Conclusion:** Direct traffic and Klaviyo (likely email marketing) show the highest average revenue per order, indicating higher customer value from these channels.

*   **Potential for Growth (based on order volume):**

    *   Google: Highest volume, indicating the largest potential for growth with optimization.
    *   Direct:  High volume, but "Direct" traffic can be misleading.  It may include untracked campaigns, returning customers with saved bookmarks, or dark social shares. Requires further investigation.
    *   fb: Significant volume, but lower RPO than direct/klaviyo suggests there is potential to improve this.
    *   Instagram: Solid volume, could be improved.
    *   Klaviyo: Good RPO, but lower volume suggests the need for strategic investment to increase list size and engagement.
    *   applovin: Lowest volume, might not be worth significant investment unless ROI can be substantially increased.

**3. Outliers and Unusual Patterns**

*   **Direct Traffic Anomaly:** The high mean revenue per order from "Direct" traffic, combined with its substantial volume, is an outlier. It's critical to understand *why* Direct traffic performs so well. Is it loyal customers who spend more? Is it the result of offline marketing efforts?
*   **Google's Volume, Lower RPO:** While Google drives the most orders, its lower mean revenue per order compared to Direct and Klaviyo suggests potential for optimization. Perhaps the Google traffic converts at a high rate but mostly consists of customers who are purchasing entry-level products.
*   **Klaviyo's High RPO, Low Volume:** The high RPO for Klaviyo is encouraging. It signifies that email marketing efforts are targeting a highly valuable segment of your customer base. However, the low volume implies a need for list growth and optimized email strategies.
*   **Variance Across Sources:** The differences in average revenue per order (RPO) across channels indicate different customer acquisition costs (CAC), customer lifetime values (LTV), or product preferences.

**4. Strategic Recommendations**

*   **Investment Priorities:**

    *   **Google:**  Continue investing, but focus on optimizing for higher-value customers. Consider segmenting your Google Ads campaigns to target customers with higher purchase intent or lifetime value. Explore audience targeting (e.g., demographic, affinity, in-market) and adjust bidding strategies accordingly.
    *   **Klaviyo:** Invest in growing your email list. Implement strategies like:
        *   Lead magnets (e.g., discounts, exclusive content) on your website.
        *   Optimize your website for email capture.
        *   Run targeted email campaigns based on customer behavior (e.g., abandoned cart, post-purchase).
        *   Focus on engaging existing subscribers with personalized content and promotions to maximize RPO.
    *   **Direct:**  Before investing more, thoroughly investigate the sources of your "Direct" traffic. Implement UTM tracking codes on all marketing campaigns to accurately attribute conversions. Consider using a tool to identify dark social shares.
    *   **Facebook:** A solid contributor with good volume, but lower RPO than direct/klaviyo suggests there is potential to improve this.

*   **Optimization Opportunities:**

    *   **Google:**
        *   **Keyword Refinement:** Analyze search terms bringing in traffic and refine your keyword targeting to focus on high-intent keywords.
        *   **Ad Copy Optimization:**  Test different ad copy to highlight higher-value products or special offers.
        *   **Landing Page Optimization:**  Ensure landing pages are optimized for conversion, with clear calls to action and relevant product recommendations.
    *   **fb:**
        *   **Audience Refinement:**  Optimize your ad targeting to reach a more affluent audience or customers with a higher propensity to spend.
        *   **Creative Testing:**  Experiment with different ad formats (e.g., video, carousel) and creative elements (e.g., images, headlines) to improve click-through rates and conversion rates.
    *   **Applovin:** Closely monitor ROI. If it's not significantly improving with optimization, consider reallocating resources to higher-performing channels.

*   **A/B Tests and Experiments:**

    *   **Landing Page Optimization:**  Test different landing page layouts, calls to action, and product displays to improve conversion rates for all traffic sources, especially Google.
    *   **Pricing Strategies:**  Experiment with different pricing tiers or bundle offers to encourage higher average order values.
    *   **Checkout Process:** Simplify the checkout process to reduce cart abandonment rates, especially for mobile users.
    *   **Email Marketing:** Test different email subject lines, content, and send times to improve open rates, click-through rates, and conversions for Klaviyo.
    *   **Personalization:** A/B test personalized product recommendations based on browsing history or purchase behavior to increase RPO.

**5. Comparison to E-commerce Benchmarks**

*   **Traffic Source Mix:** The relative importance of Google, Direct, and Facebook as traffic sources aligns with typical e-commerce benchmarks. However, the specific percentages may vary depending on your industry, target audience, and marketing strategy.
*   **Average Order Value (AOV):**  Calculate your overall AOV by averaging the mean revenue per order across all sources, weighted by order volume. Compare this to industry benchmarks for your product category. Websites like Statista, IRP Commerce, and Littledata provide e-commerce benchmarks for AOV, conversion rates, and traffic source distribution.
*   **Conversion Rates:**  Ideally, you'd want to track conversion rates (percentage of visitors who place an order) for each traffic source. Compare these conversion rates to industry benchmarks to identify areas for improvement.
*   **Customer Acquisition Cost (CAC):**  Track the cost of acquiring a customer through each traffic source. This will help you determine the ROI of your marketing investments.

**Additional Data Needed:**

*   **Customer Lifetime Value (CLTV):**  Understanding the CLTV for each traffic source is crucial for making informed investment decisions.  Klaviyo users often have higher lifetime value.
*   **Cost Per Acquisition (CPA) or Cost Per Order (CPO):**  Knowing how much it costs to acquire a customer or generate an order through each source allows for better ROI calculation.
*   **Demographic and Psychographic Data:**  Understanding the demographics and psychographics of your customers from each traffic source can help you tailor your marketing messages and product offerings.
*   **Detailed Direct Traffic Breakdown:** Identify where "direct" users are actually coming from (e.g., bookmarks, email campaigns, social shares). Use UTM parameters to properly track these sources.
*   **Product Category Performance by Source:** Understanding which product categories are most popular within each traffic source can help you optimize product merchandising and targeting.

By combining the data you've provided with this additional information and by implementing the recommended strategies, you can optimize your traffic sources, increase revenue, and improve your overall e-commerce performance.
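
The estimated totals in section 2 of the response are simple count × mean products, so they can be checked independently of the model. A minimal sketch, assuming the five rows displayed in the aggregate table (the applovin row is left out because it does not appear in that displayed output); estimated_revenue and weighted_aov are hypothetical helper names. The second function computes the volume-weighted average order value that the response suggests calculating in section 5.

```python
# (source, order count, mean revenue per order), copied from the aggregate table.
SOURCES = [
    ("instagram.com", 81, 43.182222),
    ("Direct", 193, 50.141088),
    ("google", 321, 40.197726),
    ("fb", 148, 44.272973),
    ("Klaviyo", 50, 46.5504),
]


def estimated_revenue(sources):
    """Estimate total revenue per source as order count times mean revenue."""
    return {name: count * mean for name, count, mean in sources}


def weighted_aov(sources):
    """Average order value across sources, weighted by order volume."""
    total_orders = sum(count for _, count, _ in sources)
    total_revenue = sum(count * mean for _, count, mean in sources)
    return total_revenue / total_orders


revenue = estimated_revenue(SOURCES)

# Print sources from highest to lowest estimated revenue.
for name, total in sorted(revenue.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<15} ${total:,.2f}")

print(f"Volume-weighted AOV: ${weighted_aov(SOURCES):.2f}")
```

For these five sources the recomputed totals agree with the model's figures to within about a dollar; the small differences come from the model multiplying by means rounded to two decimals.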
