# Introduction to Ideal Customer Behavior Analysis

## Objective

The primary objective of this analysis is to unravel the defining characteristics and behaviors of the 'ideal customer' within our digital wallet application. By identifying patterns in usage, acquisition, and retention, we aim to:

- Understand the geographic and transactional profile of our most valuable customers.
- Discover how and when these customers engage with our platform.
- Determine which marketing channels are most effective in acquiring and retaining these customers.

## Purpose

The insights garnered from this analysis will be instrumental in:

- **Strategizing Marketing Efforts**: Aligning marketing initiatives with proven acquisition channels and times.
- **Enhancing Product Development**: Tailoring app features and services to meet the needs of our ideal customers.
- **Optimizing User Experience**: Streamlining the customer journey to boost satisfaction and engagement.
- **Expanding Customer Base**: Applying successful characteristics of ideal customers to broader segments.

## Approach

We will dissect our user data across multiple dimensions, including:

- Transaction frequency and volume
- Acquisition channels and their effectiveness
- Seasonal trends in user acquisition and activity
- Geographical distribution of customer acquisition

Through this multifaceted analysis, we endeavor to craft targeted strategies that will not only attract more ideal customers but also elevate the overall user base, driving growth and ensuring the longevity of our digital wallet service.



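```python
# Importing necessary libraries for data analysis and visualization
import pandas as pd
import plotly.express as px
import geopandas as gpd
# folium and branca are not used in the cells shown here; they are presumably needed by the map code in map_code.txt
import folium
import branca
from branca.element import Template, MacroElement
```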

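```python
# Loading transaction and geospatial data
data = pd.read_csv('txns_data.csv')
# Absolute local path to the state-boundary shapefile (2022_00_ENT); adjust it to your own machine
mexico = gpd.read_file('C:/Users/52222/OneDrive/Documentos/Data Projects/Personal_projects/retail/2022_00_ENT/2022_00_ENT.shp')
```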

```python
# Correcting encoding issues in the 'NOMGEO' column of the Mexico geospatial data
mexico['NOMGEO'] = mexico['NOMGEO'].replace({
    'Ciudad de MÃ©xico': 'Ciudad de México',
    'MÃ©xico': 'México',
    'MichoacÃ¡n de Ocampo': 'Michoacán de Ocampo',
    'Nuevo LeÃ³n': 'Nuevo León',
    'QuerÃ©taro': 'Querétaro',
    'San Luis PotosÃ\xad': 'San Luis Potosí',
    'YucatÃ¡n': 'Yucatán',
})

# Converting date columns to pandas datetime objects
data['first_txn_date'] = pd.to_datetime(data['first_txn_date'])
data['txn_date'] = pd.to_datetime(data['txn_date'])

# Creating new columns representing the month of the transaction
data['first_txn_month'] = data['first_txn_date'].dt.to_period('M')
data['txn_month'] = data['txn_date'].dt.to_period('M')

# Labeling each transaction: 'New customer' if it falls in the customer's first transaction
# month, 'Repeated customer' otherwise (vectorized instead of a row-wise apply)
is_first_month = data['first_txn_month'] == data['txn_month']
data['user_status'] = is_first_month.map({True: 'New customer', False: 'Repeated customer'})
```
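The replacement map above fixes seven state names by hand. The pattern it corrects (`é` displayed as `Ã©`) is what UTF-8 text looks like after it has been decoded as Latin-1, so a more general repair reverses that decoding. The sketch below assumes that cause. Strings that are already correct fail the round trip and come back unchanged. A correct name whose Latin-1 bytes happened to form valid UTF-8 would be altered, but that should not happen for these state names. The helper name `fix_mojibake` is hypothetical.

```python
import pandas as pd


def fix_mojibake(text):
    """Undo UTF-8 text that was mistakenly decoded as Latin-1 (e.g. 'MÃ©xico' -> 'México')."""
    if not isinstance(text, str):
        return text  # leave NaN and other non-strings untouched
    try:
        return text.encode('latin-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Already-correct text (e.g. 'México') fails one of the two steps; keep it as is
        return text


# Illustrative names; in the notebook the input would be mexico['NOMGEO']
names = pd.Series(['Ciudad de MÃ©xico', 'San Luis PotosÃ\xad', 'YucatÃ¡n', 'Campeche', 'México'])
fixed = names.map(fix_mojibake)
print(fixed.tolist())
```

Applied to the notebook, this would presumably be `mexico['NOMGEO'] = mexico['NOMGEO'].map(fix_mojibake)`, which also covers any garbled names the manual map missed.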

# Quick summary

```python
# Key metrics
unique_customers = data['customerID'].nunique()
total_tpv = data['tpv'].sum()
total_txns = data['txn_ID'].nunique()

# Printing the calculated metrics
print(f'Unique customers: {unique_customers:,}')
print(f'Total TPV: ${round(total_tpv):,}')
print(f'Total transactions: {total_txns:,}')
```

```
Unique customers: 4,455
Total TPV: $31,966,559
Total transactions: 56,419
```
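A few per-unit ratios follow directly from these three totals and give a sense of scale for the sections below. The sketch hard-codes the printed values, and TPV was already rounded, so the results approximate what the same ratios computed on `data` would return.

```python
# Totals printed by the cell above (TPV was rounded to whole units before printing)
unique_customers = 4_455
total_tpv = 31_966_559
total_txns = 56_419

avg_ticket = total_tpv / total_txns                 # average TPV per transaction
tpv_per_customer = total_tpv / unique_customers     # average 2023 TPV per customer
txns_per_customer = total_txns / unique_customers   # average transactions per customer

print(f'Average ticket: ${avg_ticket:,.2f}')
print(f'TPV per customer: ${tpv_per_customer:,.2f}')
print(f'Transactions per customer: {txns_per_customer:.2f}')
```

The average of about 12.66 transactions per customer is above the median of 10 reported in the box plot section. That gap fits a right-skewed distribution in which a minority of heavy users pulls the mean up.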


# Analyzing customers

```python
# Unique customers per month, split into new vs. repeated customers
monthly_customers = data.groupby(['txn_month', 'user_status'])['customerID'].nunique().reset_index()
monthly_customers['txn_month'] = monthly_customers['txn_month'].astype(str)

colors = {'New customer': '#2C3538', 'Repeated customer': '#5C7278'}

fig = px.bar(monthly_customers,
             x='txn_month',
             y='customerID',
             color='user_status',
             barmode='group',
             color_discrete_map=colors,
             labels={'txn_month': 'Month', 'customerID': 'Count of unique customers', 'user_status': 'User status'},
             title='Customer acquisition and retention throughout 2023')

fig.update_layout(
    plot_bgcolor='white',
    paper_bgcolor='white',
    title_x=0.5
)

fig.show()
```

### Insight 1: Strong Growth in Customer Retention Over New Acquisitions

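Over 2023, the number of repeated customers grows steadily each month, while the number of new customers stays relatively stable or declines slightly. Repeated customers rose from 871 in February to 2,715 in December. Part of this growth is structural. A customer counts as a 'Repeated customer' in any month after their first transaction month, so the pool of customers who can be repeated grows as new users are acquired. The trend is consistent with effective retention, but confirming it requires dividing repeated customers by the size of that pool.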
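One way to separate real retention gains from a growing pool is a monthly retention rate: repeated customers active in a month, divided by all customers acquired in earlier months. The sketch below runs on a small hypothetical `data` frame with the notebook's column names. It assumes `first_txn_month` is constant for each customer, and only customers that appear in the data enter the pool.

```python
import pandas as pd


def to_month(values):
    """Convert date strings to monthly periods, as the notebook does with .dt.to_period('M')."""
    return pd.to_datetime(pd.Series(values)).dt.to_period('M')


# Toy transactions (hypothetical, one row per transaction) using the notebook's column names
data = pd.DataFrame({
    'customerID': ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd'],
    'first_txn_month': to_month(['2023-01', '2023-01', '2023-01', '2023-01',
                                 '2023-01', '2023-02', '2023-02', '2023-03']),
    'txn_month': to_month(['2023-01', '2023-02', '2023-03', '2023-01',
                           '2023-03', '2023-02', '2023-03', '2023-03']),
})

# Acquisition month per customer (assumed constant across that customer's rows)
first_month = data.drop_duplicates('customerID').set_index('customerID')['first_txn_month']

rows = []
for month in sorted(data['txn_month'].unique()):
    eligible = int((first_month < month).sum())  # customers acquired before this month
    repeated = data.loc[(data['txn_month'] == month) & (data['first_txn_month'] < month),
                        'customerID'].nunique()  # of those, how many transacted this month
    rows.append({'txn_month': str(month), 'eligible': eligible, 'repeated': repeated,
                 'retention_rate': repeated / eligible if eligible else float('nan')})

retention = pd.DataFrame(rows)
print(retention)
```

The first month has no earlier acquisitions, so its rate is undefined (NaN) rather than zero.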

### Insight 2: Seasonal Trends in New Customer Acquisition

New customer acquisition peaks in certain months, such as January, March, and July. This hints at seasonal marketing effects or external factors influencing sign-ups. For example, the spike to 526 new customers in July might indicate a successful campaign or seasonal promotion.

# Analyzing TPV

```python
# Monthly TPV by transaction type
monthly_tpv = data.groupby(['txn_month', 'txn_type'])['tpv'].sum().reset_index()
monthly_tpv['txn_month'] = monthly_tpv['txn_month'].astype(str)

colors = {'Pay_at_Store': '#004F80', 'Ecommerce': '#D49947', 'Bill_Payments': '#80CEFF'}

fig = px.bar(monthly_tpv,
             x='txn_month',
             y='tpv',
             color='txn_type',
             color_discrete_map=colors,
             labels={'txn_month': 'Month', 'tpv': 'TPV', 'txn_type': 'Transaction type'},
             title='TPV throughout 2023')

fig.update_layout(
    plot_bgcolor='white',
    paper_bgcolor='white',
    title_x=0.5
)

fig.show()
```

### Insight 1: Dominance of In-Store Payments

Throughout 2023, payments made in-store (`Pay_at_Store`) consistently account for a significant portion of the Total Payment Volume (TPV). This trend underscores the importance of physical retail locations in the digital wallet ecosystem, despite the presence of e-commerce and bill payment options.
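"A significant portion" can be made precise by normalizing each month's TPV into shares that sum to 100%. The sketch uses hypothetical values shaped like `monthly_tpv`. On the chart itself, passing `barnorm='percent'` to `px.bar` should give the same view.

```python
import pandas as pd

# Toy monthly TPV by transaction type (hypothetical values), shaped like `monthly_tpv`
monthly_tpv = pd.DataFrame({
    'txn_month': ['2023-01'] * 3 + ['2023-02'] * 3,
    'txn_type': ['Pay_at_Store', 'Ecommerce', 'Bill_Payments'] * 2,
    'tpv': [600.0, 250.0, 150.0, 500.0, 300.0, 200.0],
})

# Share of each type in that month's total TPV
monthly_tpv['tpv_share'] = (
    monthly_tpv['tpv'] / monthly_tpv.groupby('txn_month')['tpv'].transform('sum')
)

# One row per month, one column per transaction type
share_table = monthly_tpv.pivot(index='txn_month', columns='txn_type', values='tpv_share')
print(share_table.round(2))
```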

### Insight 2: The Ebb and Flow of Bill Payments

The TPV associated with bill payments (`Bill_Payments`) exhibits a fluctuating pattern, with notable dips in May and October. This could indicate seasonal variability in bill-related expenses or changes in consumer payment habits during specific periods of the year.

# Analyzing transactions

```python
# Monthly count of unique transactions by transaction type
monthly_txns = data.groupby(['txn_month', 'txn_type'])['txn_ID'].nunique().reset_index()
monthly_txns['txn_month'] = monthly_txns['txn_month'].astype(str)

colors = {'Pay_at_Store': '#004F80', 'Ecommerce': '#D49947', 'Bill_Payments': '#80CEFF'}

fig = px.bar(monthly_txns,
             x='txn_month',
             y='txn_ID',
             color='txn_type',
             color_discrete_map=colors,
             labels={'txn_month': 'Month', 'txn_ID': 'Total Transactions', 'txn_type': 'Transaction type'},
             title='Transactions throughout 2023')

fig.update_layout(
    plot_bgcolor='white',
    paper_bgcolor='white',
    title_x=0.5
)

fig.show()
```

### Insight 1: Consistent Performance Across Transaction Types

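In 2023, each transaction type (`Bill_Payments`, `Ecommerce`, `Pay_at_Store`) shows a stable monthly transaction count, and no single category consistently dominates. This suggests the wallet serves a balanced mix of consumer use cases. The TPV chart tells a different story: in-store payments carry a larger share of volume there. Together, the two charts imply that in-store payments have a higher average ticket than the other types.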
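The average ticket (TPV divided by transaction count) connects the two charts. Similar transaction counts paired with unequal TPV imply different ticket sizes. Here is a minimal sketch on hypothetical transactions that use the notebook's column names:

```python
import pandas as pd

# Toy transactions (hypothetical): one row per transaction with its type and TPV
data = pd.DataFrame({
    'txn_ID': [1, 2, 3, 4, 5, 6],
    'txn_type': ['Pay_at_Store', 'Pay_at_Store', 'Ecommerce', 'Ecommerce',
                 'Bill_Payments', 'Bill_Payments'],
    'tpv': [900.0, 700.0, 300.0, 200.0, 150.0, 250.0],
})

# Average ticket = total TPV / number of unique transactions, per transaction type
by_type = data.groupby('txn_type').agg(tpv=('tpv', 'sum'), txns=('txn_ID', 'nunique'))
by_type['avg_ticket'] = by_type['tpv'] / by_type['txns']
print(by_type.sort_values('avg_ticket', ascending=False))
```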

### Insight 2: Slight Variability in Ecommerce Transactions

While the overall trend shows consistency, Ecommerce transactions display slight variability with a notable dip in June and a peak in October. This could reflect changes in consumer online shopping behavior or promotional periods that drive more online sales.


# What is the behavior of our ideal customer?

```python
customer_txns = data.groupby('customerID')['txn_ID'].count().reset_index()

fig = px.box(customer_txns,
             x='txn_ID',
             orientation='h',
             title='Annual User Behavior Distribution')

fig.update_layout(
    yaxis_title='',
    xaxis_title='Number of Transactions',
    plot_bgcolor='white',
    paper_bgcolor='white',
    title_x=0.5
)

fig.show()
```

### Insight 1: Profiling the Ideal Digital Wallet User

The median user makes 10 transactions a year, and the third quartile sits at 19. We define the ideal band as the range between these two values, 10 to 19 transactions per year. That band covers roughly the upper-middle quarter of the distribution: these users are clearly active but well below the outlier range. Moving lower-frequency users into this 10-19 bracket could build a more robust and committed customer segment, and strategies designed for that purpose could yield significant engagement gains by widening our base of consistently active users.
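The band boundaries come from the box plot's summary statistics, which can also be computed directly with pandas. The sketch uses hypothetical counts in place of `customer_txns['txn_ID']` and applies the usual 1.5 × IQR rule for the fence. Quartile conventions differ slightly between pandas (linear interpolation by default) and Plotly, so on real data the two may disagree by a fraction of a transaction.

```python
import pandas as pd

# Toy per-customer transaction counts (hypothetical), standing in for customer_txns['txn_ID']
txn_counts = pd.Series([1, 2, 3, 5, 7, 9, 10, 12, 15, 19, 22, 45, 80])

q1, median, q3 = txn_counts.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1
upper_bound = q3 + 1.5 * iqr
# Like a box-plot whisker, the fence is the largest observation that does not exceed the bound
upper_fence = txn_counts[txn_counts <= upper_bound].max()

# Ideal band as defined in the text: from the median up to the third quartile, inclusive
in_band = txn_counts.between(median, q3)
print(f'Q1={q1}, median={median}, Q3={q3}, upper bound={upper_bound}, upper fence={upper_fence}')
print(f'Share in ideal band: {in_band.mean():.2%}')
```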

### Insight 2: Aspiring Towards Highly Engaged Customers

The upper fence of the box plot sits at 40 transactions, so users above this frequency are statistical outliers with exceptional engagement. Before we treat them as a model for our 'ideal customers', we need to verify that this activity is legitimate and not the result of abusive or fraudulent behavior. Once validated, these highly active customers represent the peak of engagement we can aspire to. Studying their habits can reveal the features and services that resonate most with our most active users. That understanding can then inform targeted marketing strategies and loyalty programs that nudge the broader user base toward similar activity, while preserving the integrity and security of transactions on our platform.
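A first pass at the verification step could rank users above the fence by simple activity signals, such as the number of distinct days they transacted on and their busiest single day. Reviewers can then start with the most unusual accounts. These signals are a hypothetical heuristic for prioritizing manual review, not a fraud test. The sketch uses toy data and a toy threshold in place of the fence of 40.

```python
import pandas as pd

# Toy transactions (hypothetical) with the notebook's column names
data = pd.DataFrame({
    'customerID': ['x'] * 6 + ['y'] * 3,
    'txn_ID': range(1, 10),
    'txn_date': pd.to_datetime(['2023-03-01'] * 5 + ['2023-04-02']
                               + ['2023-01-05', '2023-02-05', '2023-03-05']),
})
# If txn_date carries a time component, apply .dt.normalize() first so days group correctly

UPPER_FENCE = 2  # toy threshold; the notebook's box plot puts the fence at 40

txns_per_customer = data.groupby('customerID')['txn_ID'].nunique()
heavy_users = txns_per_customer[txns_per_customer > UPPER_FENCE].index

# Review signals for each heavy user: total transactions, active days, busiest single day
heavy = data[data['customerID'].isin(heavy_users)]
review = heavy.groupby('customerID').agg(
    txns=('txn_ID', 'nunique'),
    active_days=('txn_date', 'nunique'),
)
daily_counts = heavy.groupby(['customerID', 'txn_date'])['txn_ID'].nunique()
review['max_txns_in_a_day'] = daily_counts.groupby(level='customerID').max()
print(review.sort_values('max_txns_in_a_day', ascending=False))
```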

# How many ideal customers do we have?

```python
ideal_customers = customer_txns[(customer_txns['txn_ID'] >= 10) & (customer_txns['txn_ID'] <= 19)]
data_ideal_customers = data[data['customerID'].isin(ideal_customers['customerID'])]
ideal_customers_unique = data_ideal_customers['customerID'].nunique()
print(f'We got {ideal_customers_unique:,} ideal customers ({round(ideal_customers_unique/unique_customers*100,2)}% of our database)')
```

```
We got 1,230 ideal customers (27.61% of our database)
```


### Insight: Proportion of Ideal Customers Within the User Base

We've identified that 1,230 of our customers fall into the ideal engagement bracket of 10 to 19 transactions a year, 27.61% of our total customer base. The band runs from the median to the third quartile, so we would expect about a quarter of users in it. The share comes out slightly higher because customers with exactly 10 or 19 transactions are included at both edges. This gives us a sizable, clearly defined behavioral segment to target with engagement strategies. Understanding and catering to the needs and preferences of these users could foster a more active and loyal customer base and raise overall platform engagement.

# How did we acquire them?

```python
data_ideal_customers_unique = data_ideal_customers.drop_duplicates(subset='customerID')
acquisition_ideal_customers = data_ideal_customers_unique.groupby('acquisition_channel')['customerID'].count().sort_values(ascending=False).reset_index()

fig = px.bar(acquisition_ideal_customers,
             x='customerID',
             y='acquisition_channel',
             orientation='h',
             labels={'acquisition_channel': 'Acquisition Channel', 'customerID': 'Number of Customers'},
             title='Customer acquisition')

fig.update_layout(
    plot_bgcolor='white',
    paper_bgcolor='white',
    title_x=0.5
)

fig.show()
```

### Insight: Diverse Channels for Ideal Customer Acquisition

Our ideal customers come from a well-diversified set of acquisition channels. Organic acquisition leads slightly with 419 customers, closely followed by social media (412) and store-based acquisition (399). This balanced distribution suggests that our ideal customers do not depend on a single acquisition strategy. Raw counts alone, however, do not show which channel is most effective at producing ideal customers. Each channel's count needs to be compared with the total number of customers it acquired before we shift investment between channels.
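Channel effectiveness can be compared through the share of each channel's acquired customers that end up in the ideal band, instead of through raw counts. The sketch assumes a table with one row per customer and a boolean `is_ideal` flag. In the notebook, such a table could presumably be built from `data.drop_duplicates(subset='customerID')` combined with `isin(ideal_customers['customerID'])`. Channel labels and values are hypothetical.

```python
import pandas as pd

# Toy customer table (hypothetical): one row per customer with channel and an ideal flag
customers = pd.DataFrame({
    'customerID': range(1, 11),
    'acquisition_channel': ['Organic'] * 4 + ['Social media'] * 4 + ['Store'] * 2,
    'is_ideal': [True, False, False, False, True, True, False, False, True, False],
})

channel_summary = customers.groupby('acquisition_channel').agg(
    customers=('customerID', 'nunique'),
    ideal_customers=('is_ideal', 'sum'),
)
# Share of each channel's customers that land in the 10-19 transaction band
channel_summary['ideal_rate'] = channel_summary['ideal_customers'] / channel_summary['customers']
print(channel_summary.sort_values('ideal_rate', ascending=False))
```

In this toy data, Organic has the most customers but the lowest ideal rate. Raw counts and rates can rank channels differently.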


# When did we acquire them?

```python
ideal_customer_acquisition = data_ideal_customers_unique.groupby('first_txn_month')['customerID'].count().reset_index()
ideal_customer_acquisition['Percentage'] = (ideal_customer_acquisition['customerID'] / ideal_customer_acquisition['customerID'].sum()) * 100

ideal_customer_acquisition['first_txn_month'] = ideal_customer_acquisition['first_txn_month'].astype(str)

fig = px.bar(ideal_customer_acquisition,
             x='first_txn_month',
             y='Percentage',
             labels={'first_txn_month': 'Month'},
             title='Monthly Distribution of Ideal User Acquisition')

fig.update_layout(
    plot_bgcolor='white',
    paper_bgcolor='white',
    title_x=0.5
)

fig.show()
```

### Insight: Peak Acquisition Periods for Ideal Customers

Acquisition of ideal customers peaked in spring. March, April, and May 2023 recorded the highest percentages and together account for 68.21% of our ideal-customer acquisitions. This may reflect seasonal marketing campaigns or user behavior aligned with financial cycles such as tax returns or holiday spending. Part of the pattern is likely mechanical, though. Customers acquired late in the year have had fewer months to reach 10 transactions by the end of 2023, so they are less likely to qualify as ideal regardless of acquisition quality. Identifying the real drivers behind the spring surge can help us replicate it in future periods.
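To account for unequal exposure, each acquisition cohort's ideal rate can be shown next to the number of months it was observed. The sketch assumes the data window ends in December 2023 and uses a hypothetical customer table with a boolean `is_ideal` flag:

```python
import pandas as pd

# Toy customer table (hypothetical): acquisition month and whether the customer is ideal
customers = pd.DataFrame({
    'customerID': range(1, 9),
    'first_txn_month': pd.to_datetime(['2023-03-01', '2023-03-01', '2023-03-01', '2023-07-01',
                                       '2023-07-01', '2023-11-01', '2023-11-01', '2023-11-01']
                                      ).to_period('M'),
    'is_ideal': [True, True, False, True, False, False, False, False],
})

END_OF_WINDOW = pd.Period('2023-12', freq='M')  # assumed last month covered by the data

cohorts = customers.groupby('first_txn_month').agg(
    acquired=('customerID', 'nunique'),
    ideal=('is_ideal', 'sum'),
)
cohorts['ideal_rate'] = cohorts['ideal'] / cohorts['acquired']
# Months each cohort could transact within the window, counting the acquisition month itself
cohorts['months_observed'] = [(END_OF_WINDOW - month).n + 1 for month in cohorts.index]
print(cohorts)
```

A cohort with a low ideal rate and few observed months (November in the toy data) may simply not have had time to qualify.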


# From where did we acquire them?

---
**Note on Viewing the Geospatial Map:**

To view the detailed geospatial map that visualizes the distribution of ideal customers across different states, please download the accompanying `.txt` file named `map_code.txt` along with this Jupyter Notebook.

Once you have both files, simply:

1. Open this Jupyter Notebook.
2. Navigate to the cell commented `# Geospatial Distribution of Ideal Customers` in the section 'From where did we acquire them?'.
3. Copy the Python code from `map_code.txt`.
4. Paste the code into a new cell in the notebook at the specified section.
5. Run the cell to generate and view the interactive map.

This approach ensures the notebook remains lightweight for GitHub hosting, while still providing you with the full capabilities to explore the geospatial data visualization.

---

```python
# Geospatial Distribution of Ideal Customers
customers_x_states = data_ideal_customers.groupby('location')['customerID'].nunique().reset_index()
customers_x_states_ordered = customers_x_states.sort_values(by='customerID', ascending=False)
customers_x_states_ordered
```

| State (`location`) | Ideal customers (`customerID`) |
|---|---|
| Campeche | 53 |
| Nuevo León | 48 |
| Quintana Roo | 47 |
| Guerrero | 47 |
| Guanajuato | 46 |
| Hidalgo | 46 |
| Durango | 45 |
| Baja California Sur | 44 |
| Chihuahua | 42 |
| Baja California | 41 |
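The map code lives in `map_code.txt` and is not shown here, so how it joins this table to the shapefile is an assumption. If it joins `location` to `NOMGEO`, any state spelled differently in the two sources would silently drop out of the map. An outer merge with `indicator=True` lists those mismatches. The sketch uses toy frames, and the short `Michoacán` spelling on the transaction side is hypothetical.

```python
import pandas as pd

# Toy inputs (hypothetical): state counts from the transactions and state names from the shapefile
customers_x_states = pd.DataFrame({
    'location': ['Campeche', 'Nuevo León', 'Michoacán'],
    'customerID': [5, 4, 3],
})
shapes = pd.DataFrame({'NOMGEO': ['Campeche', 'Nuevo León', 'Michoacán de Ocampo']})

# An outer merge with indicator=True flags names that exist on only one side
check = shapes.merge(customers_x_states, left_on='NOMGEO', right_on='location',
                     how='outer', indicator=True)
unmatched = check[check['_merge'] != 'both']
print(unmatched[['NOMGEO', 'location', '_merge']])
```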


### Insight: Geographic Trends in Ideal Customer Acquisition

Our spatial analysis shows that ideal customers are not spread evenly across the country. Campeche leads with 53, followed by Nuevo León (48) and Quintana Roo and Guerrero (47 each). The differences are modest, however. The top ten states range from 41 to 53 ideal customers, and these raw counts are not adjusted for each state's population or total customer base. Even so, the results point to the value of regional strategies and suggest that localized marketing efforts could be effective. By understanding regional preferences and behaviors, we can tailor our engagement strategies to high-performing areas and try to replicate their success in regions with lower numbers.

# First Conclusions

### Conclusion 1: Ideal Customer Engagement Range
Ideal customers actively engage with the platform, making between 10 and 19 transactions a year. This frequency marks a segment that is consistently engaged without the extreme activity that could suggest anomalous behavior. These customers make up 27.61% of the total user base. They form a strong core of our application's users and could be the focus of targeted engagement and retention strategies.

### Conclusion 2: Acquisition Channels and Timing
Ideal customers are acquired through a diverse mix of channels, with organic acquisition slightly in the lead. The spring months, particularly March to May, produced the most ideal customers. Customers acquired earlier in the year, however, also had more time to reach 10 transactions before the end of 2023. This finding could inform future marketing campaigns: we could increase efforts during these months or analyze what makes these periods so conducive to acquiring ideal customers.

### Conclusion 3: Regional Preferences in Customer Acquisition
Ideal customers are unevenly distributed across states, and some states show higher counts than others. This regional variance highlights the importance of understanding local market dynamics and can guide localized marketing efforts. Tailoring marketing strategies to regional preferences and behaviors could help replicate the success of high-performing areas in other regions.

## Next Steps for Deepening User Analysis

To further refine our understanding of the ideal customer and to enhance the effectiveness of our customer acquisition and retention strategies, the following analytical steps are recommended:

### Segment-Specific Behavior Analysis

#### **Action Item**: Deep Dive into High-Performing Segments
- We should examine the transaction behaviors, preferred payment types, and frequency of use within the top-performing segments to understand the drivers of high engagement.
- **Rationale**: Tailoring the app experience based on segment-specific preferences can increase customer satisfaction and transaction frequency.

### Seasonal and Temporal Patterns

#### **Action Item**: Analyze Seasonal Trends and Timing
- We need to conduct a time-series analysis to identify patterns in customer acquisition and activity throughout the year.
- **Rationale**: Understanding the temporal dynamics can help in planning marketing campaigns and feature rollouts to coincide with periods of high user activity.

### Geo-Spatial Expansion Opportunities

#### **Action Item**: Evaluate Underpenetrated Markets
- We must perform a market penetration analysis on regions with lower acquisition numbers to identify potential barriers or opportunities for growth.
- **Rationale**: Developing strategies to increase market penetration in these areas could lead to an expanded user base.
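One common penetration measure is customers per 100,000 inhabitants, which corrects raw state counts for population size. The sketch assumes a separate population table per state (for example, from census data) that this notebook does not load. All values are hypothetical.

```python
import pandas as pd

# Hypothetical inputs: customers per state and a separately sourced population table
customers_by_state = pd.DataFrame({'location': ['Campeche', 'Guerrero'], 'customers': [60, 70]})
population = pd.DataFrame({'location': ['Campeche', 'Guerrero'], 'population': [900_000, 3_500_000]})

penetration = customers_by_state.merge(population, on='location', how='left')
# Customers per 100,000 inhabitants
penetration['per_100k'] = penetration['customers'] * 100_000 / penetration['population']
print(penetration.sort_values('per_100k'))
```

In this toy example, the state with fewer customers has the higher penetration, which is the kind of reversal that raw counts hide.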

### Product Usage Patterns

#### **Action Item**: Profile Usage Across Services
- We need to analyze the usage patterns of different in-app services (like bill payments, in-store payments, and e-commerce transactions) among the ideal customer segment.
- **Rationale**: Insights can inform product development and promotional offers that cater to the most valued services.
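A per-customer usage mix shows which services ideal customers combine. The sketch applies `pd.crosstab` with `normalize='index'` to hypothetical rows shaped like `data_ideal_customers`:

```python
import pandas as pd

# Toy transactions of ideal customers (hypothetical), shaped like `data_ideal_customers`
data_ideal_customers = pd.DataFrame({
    'customerID': ['a', 'a', 'a', 'a', 'b', 'b'],
    'txn_type': ['Pay_at_Store', 'Pay_at_Store', 'Ecommerce', 'Bill_Payments',
                 'Ecommerce', 'Ecommerce'],
})

# Share of each customer's transactions by type (each row sums to 1)
usage_mix = pd.crosstab(data_ideal_customers['customerID'],
                        data_ideal_customers['txn_type'], normalize='index')

# Number of distinct services each customer uses at all
services_used = (usage_mix > 0).sum(axis=1)
print(usage_mix.round(2))
print(services_used)
```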

### Customer Journey Mapping

#### **Action Item**: Map the Customer Journey
- We must create a detailed customer journey map from acquisition to retention, identifying key touchpoints that influence customer behavior.
- **Rationale**: This can highlight areas for improving user experience and identify moments where customers are most receptive to engagement.

### Predictive Modeling for Customer Lifetime Value (CLV)

#### **Action Item**: Implement Predictive Analytics
- We should use machine learning models to predict the Customer Lifetime Value of different segments, especially focusing on the ideal customer profile.
- **Rationale**: Predictive insights can help in proactively designing personalized retention strategies and optimizing resource allocation for maximum ROI.
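Before training a model, we can compute a historical baseline per customer as a reference that any predictive CLV estimate should beat. The features below (total TPV, transaction count, active months, TPV per active month) are an illustrative choice, not an established CLV method, and the data is hypothetical:

```python
import pandas as pd

# Toy transactions (hypothetical) with the notebook's column names
data = pd.DataFrame({
    'customerID': ['a', 'a', 'a', 'b', 'b'],
    'txn_month': pd.to_datetime(['2023-01-10', '2023-02-03', '2023-02-20',
                                 '2023-06-01', '2023-06-15']).to_period('M'),
    'tpv': [100.0, 50.0, 150.0, 400.0, 200.0],
})

clv_features = data.groupby('customerID').agg(
    total_tpv=('tpv', 'sum'),
    txns=('tpv', 'count'),
    active_months=('txn_month', 'nunique'),
)
# Average TPV per active month: a simple historical baseline for later model comparison
clv_features['tpv_per_active_month'] = clv_features['total_tpv'] / clv_features['active_months']
print(clv_features)
```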
