**This data was cleaned, analyzed and visualized using Microsoft Power BI.**

# **Data validation**

## **Dataset Overview:**
  - Initial Size: 8 columns and 15,000 rows
  - Final Size: 8 columns and 14,998 rows

## **Column-Specific Cleaning Steps:**

1.	**Week:**
     - No missing values.
     - 6 distinct values representing weeks 1-6 post-launch.
     - No changes made.

2.	**Sales_method**:
     - Initially had 5 distinct values due to 2 typos ("em +call" and "email").
     - Corrected the typos to "Email + Call" and "Email".
     - No missing values.

3.	**Customer_id:**
      - No missing values or duplicates.
      - 15,000 distinct values, indicating one ID per customer.
      - No changes made.

4.  **nb_sold:**
     - 10 distinct numeric values ranging from 7-16.
     - No missing values.
     - No changes made.

5.  **Revenue:**
     - Rounded to 2 decimal places.
     - 1,074 missing values (7% of the dataset).
     - Replaced missing values with the median value of 89.5. The median was chosen because it is less sensitive to outliers than the mean, so it gives a more robust estimate of central tendency for imputation, which matters when more than 5% of the values are missing.
     - Converted the column to numeric format.

6.  **Years_as_Customer:**
     - Numeric column with 42 distinct values.
     - Identified and removed 2 extreme values (63 and 47) that exceeded the company’s existence (founded in 1984, with a maximum valid value of 40 years).
     - After cleaning, 40 distinct values remained.

7.  **Nb_site_visit:**
     - Numeric column with 27 distinct values ranging from 12-41.
     - No missing values.
     - No changes made.

8. **State:**
    - 50 distinct states, all values in character format.
    - No missing values.
    - No changes made.
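
The cleaning in this report was done in Power BI. Purely as an illustration, the three steps that changed the data (fixing the `Sales_method` typos, imputing missing `Revenue` with the median, and dropping `Years_as_Customer` values above 40) could be sketched in Python as follows. The lowercase column names, the sample rows, and the choice to compute the median before removing rows are assumptions, not taken from the original workflow.

```python
from statistics import median

# Maps a normalized label (lowercase, no spaces) to its corrected form.
METHOD_LABELS = {
    "email": "Email",
    "call": "Call",
    "email+call": "Email + Call",
    "em+call": "Email + Call",  # typo reported in the dataset
}

def clean(rows, max_years=40):
    """Return cleaned copies of rows; max_years=40 follows the 1984 founding date."""
    # Assumption: the median is computed from the observed (non-missing) revenues.
    observed = [r["revenue"] for r in rows if r["revenue"] is not None]
    fill = median(observed)
    cleaned = []
    for r in rows:
        if r["years_as_customer"] > max_years:
            continue  # tenure longer than the company has existed
        key = "".join(r["sales_method"].lower().split())
        revenue = r["revenue"] if r["revenue"] is not None else fill
        cleaned.append({
            "sales_method": METHOD_LABELS[key],
            "revenue": round(revenue, 2),
            "years_as_customer": r["years_as_customer"],
        })
    return cleaned

# Hypothetical sample rows, for illustration only.
sample = [
    {"sales_method": "Email", "revenue": 90.0, "years_as_customer": 3},
    {"sales_method": "em +call", "revenue": None, "years_as_customer": 1},
    {"sales_method": "email", "revenue": 89.0, "years_as_customer": 63},
    {"sales_method": "Call", "revenue": 50.0, "years_as_customer": 10},
]
cleaned_sample = clean(sample)
print(cleaned_sample)
```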



# **Exploratory Analysis**

## **Number of Customers for each approach**

The countplot below shows the number of customers for each sales method. The Email method has the most customers, with 7,465. Next, the Call method comes in with 4,961 customers. Finally, the Email + Call method has the fewest, with 2,572 customers.

![Number of Customers for each method](image1.png)




## **Overall Revenue Distribution**

The graph below shows the overall revenue distribution of the dataset. Revenue values are plotted along the x-axis, with the frequency of occurrences on the y-axis. The most striking feature is the concentration of revenue values around a central point. Most of the data clusters between approximately 80 and 100 in revenue, with a particularly sharp peak around 85. A large portion of the revenue observations therefore falls within this range, making it a critical area of focus.

The median revenue, marked at 89.50, aligns closely with this peak, suggesting that the median is a reliable measure of central tendency for this dataset. The mean revenue, slightly higher at 93.62, indicates a slight right skew in the data, meaning that while most revenues hover around the median, there are some higher values that pull the mean upwards. This skewness hints at the presence of a few significant outliers on the higher end of the revenue spectrum.
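
The mean-versus-median comparison used above can be checked with a few lines of Python. This is a minimal sketch on made-up revenue values (not the report's data), and a mean above the median is only a rough indicator of right skew, not a formal test.

```python
from statistics import mean, median

# Illustrative revenues only: most values sit near 90,
# with a few large sales in the upper tail.
revenues = [82.0, 85.0, 88.0, 89.5, 91.0, 95.0, 150.0, 185.0]

avg = mean(revenues)
mid = median(revenues)

# A mean above the median is a common (not infallible) sign of right skew.
if avg > mid:
    skew_hint = "right"
elif avg < mid:
    skew_hint = "left"
else:
    skew_hint = "none"

print(f"mean={avg:.2f} median={mid:.2f} skew hint: {skew_hint}")
```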

Looking closer, there are noticeable spikes at higher revenue values, particularly around 150, 180, and 190. These spikes could suggest specific revenue brackets that occur more frequently. On the flip side, there are also some lower revenue observations, especially between 30 and 50, which might represent smaller transactions that, while less frequent, still contribute to the overall revenue distribution.

Overall, this distribution tells a story of consistency, with most sales centered around a median value, yet it also reveals the presence of higher-value transactions that add complexity to the revenue landscape. 

![Overall Revenue Distribution](image2.png)


## **Revenue Distribution for each Method**

The box plot below provides a clear comparison of revenue distribution across the three sales methods: Call, Email, and Email + Call.

Starting with the **Call method**, we see that it generates the lowest and most consistent revenue. Most of the sales fall between 45 USD and 55 USD, with a median just under 50 USD. This indicates that phone calls might be more effective for smaller, straightforward transactions. There are a few outliers above 75 USD, showing that while it’s possible to achieve higher sales through calls, these instances are rare.

The **Email method** shows slightly higher revenue on average, with most transactions ranging between 90 USD and 100 USD. The median revenue here is around 85 USD, which is consistent with the overall revenue pattern we observed earlier. There are outliers on both ends, but they’re not as extreme, suggesting that while email can handle a variety of transactions, it mostly results in mid-range sales.

The **Email + Call method** stands out for its wider range of revenue. The interquartile range stretches from around 150 USD to 190 USD, and the median is slightly above 182 USD. This method also captures higher-value sales, as evidenced by outliers that extend further, especially on the higher end. This suggests that combining email with a follow-up call can effectively drive more significant sales, appealing to a broader range of customers, including those making larger purchases.

In summary, the Call method seems best suited for smaller, more routine transactions. The Email method handles a slightly broader range, fitting well for mid-range sales. The Email + Call method, however, appears to be the most effective in driving higher revenue, making it a strong candidate for maximizing overall sales.

![box plot showing a Revenue Distribution for each method](image3.png)
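
For readers who want to reproduce the quantities a box plot displays, here is a minimal sketch in Python. It assumes the conventional 1.5 × IQR rule for flagging outliers and the "inclusive" quartile method, which may differ from what Power BI uses, and the sample values are invented for illustration.

```python
from statistics import quantiles

def box_stats(values):
    """Quartiles and 1.5 * IQR outliers, as drawn in a typical box plot."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]
    return {"q1": q1, "median": q2, "q3": q3, "outliers": outliers}

# Hypothetical Call revenues: tightly clustered, with one unusually large sale.
call_revenue = [45, 47, 48, 49, 50, 52, 55, 78]
call_stats = box_stats(call_revenue)
print(call_stats)
```

Running the same function once per sales method gives the numbers needed to compare spread and outliers across the three groups.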


## **Difference in Revenue Over Time for each of the methods**

This bar chart provides a clear view of how revenue evolved over six weeks for each of the three sales methods: Call, Email, and Email + Call. 

Starting with the Email + Call method, it consistently outperformed the other methods throughout the six weeks. From 124 USD in Week 1, revenue rose every week except Week 3, when it dipped slightly, and peaked at 205 USD in Week 6. This rise suggests that combining emails with follow-up calls became increasingly effective as time went on. The sharp increase between Weeks 5 and 6 indicates that customers likely responded well to this personalized approach, leading to higher engagement and more significant sales.

The Email method also showed positive results, though not as strong as the combined method. It started at about 88 USD in Week 1 and gradually increased, reaching 128 USD by Week 6. Growth appeared to plateau by Week 3, picked up in Week 4, held constant in Week 5, and then rose slightly in Week 6. This pattern suggests that emails are somewhat effective, but less so than the combined method.

In contrast, the Call method showed the lowest revenue across all weeks, starting at 37 USD in Week 1 and gradually increasing to 67 USD by Week 6. While there was some growth, especially a noticeable increase in Week 6, the overall revenue remained significantly lower than the other methods.

Overall, there was a clear difference in revenue trends across the three methods. The Email + Call method consistently generated the highest revenue and showed continuous growth, making it the most effective approach over time. The Email method performed well. Meanwhile, the Call method generated the least revenue, indicating that phone calls alone might not be sufficient to drive substantial sales.



![Revenue over Time for each method](image4.png)
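
As an illustration of how a chart like this can be derived, the sketch below groups sales by week and method and averages the revenue. It assumes the bars show mean revenue per sale, which the report does not state explicitly, and the records are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def mean_revenue_by_week(sales):
    """Average revenue for each (week, method) pair."""
    groups = defaultdict(list)
    for week, method, revenue in sales:
        groups[(week, method)].append(revenue)
    return {key: round(mean(vals), 2) for key, vals in sorted(groups.items())}

# Hypothetical (week, sales_method, revenue) records, for illustration only.
sales = [
    (1, "Call", 35.0), (1, "Call", 39.0), (1, "Email", 88.0),
    (6, "Call", 67.0), (6, "Email", 126.0), (6, "Email", 130.0),
]
weekly = mean_revenue_by_week(sales)
for (week, method), value in weekly.items():
    print(f"week {week} {method}: {value}")
```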




## **Difference in Revenue Over Years as Customer for each of the methods**

The line graph below shows how revenue varies with how long customers have been with Pens and Printers, segmented by the three sales methods: Call, Email, and Email + Call.

Starting with the Email + Call method (orange line), it’s clear that this approach consistently generates the highest revenue across almost all customer tenure groups. The revenue stays strong, generally between 150 USD and 200 USD, though there are some ups and downs along the way. However, there is a noticeable decline towards the end, especially with customers who have been with the company for around 33 years.

The Email method (dark blue line) shows a steady revenue pattern across different customer age groups, hovering around the 100 USD mark. This consistency indicates that while Email alone might not drive as high revenue as the combined approach, it’s still a reliable method for maintaining steady sales.

On the other hand, the Call method (light blue line) consistently generates the lowest revenue among the three methods, staying around 50 USD for most customer age groups. There are minor ups and downs, but overall, the trend remains fairly flat. This suggests that relying on phone calls alone might not be the most effective way to drive sales, particularly as customers become more established with the company.

In conclusion, this graph highlights that the Email + Call method is the most effective in generating higher revenue, especially with customers who have been with the company for up to 30 years. The Email method provides a stable, reliable revenue stream across all customer age groups, making it a solid choice for consistent engagement. Meanwhile, the Call method, while less effective overall, might still have its place in specific contexts or with certain customer segments, but it’s clear that calls alone are not enough to drive substantial sales, especially with more established customers.


![Chart of Revenue over years as customer for each method](image5.png)


## **Distribution of Customers over time.**

The histogram below shows that Pens and Printers has successfully attracted a large number of new customers, particularly those who have been with the company for less than two years, likely due to recent marketing efforts or product launches. However, there is a noticeable decline in customer numbers as the years increase, with significantly fewer customers remaining after 10 years, and even fewer beyond 20 years. This pattern highlights the typical challenges of customer retention over time. Despite this, the company does have a small but loyal group of long-term customers who have stayed for over 20 years, representing a valuable segment that should be nurtured. The findings emphasize the importance of continuing strong customer acquisition efforts while also focusing on improving retention strategies to maintain and grow the customer base, balancing both to ensure sustained long-term growth.

![distribution Of Years as Customers](image6.png)




## **Sales Method to choose.**

Based on the data, I would recommend we keep using the **Email + Call method**. It consistently brought in the highest revenue across the board, which shows it is really effective.

The only downside is that it takes more time and effort from the team, but the results suggest it is worth it: this approach really seems to connect with the customers and drive sales.
 
This method also captures higher-value sales, as evidenced by outliers in the boxplot that extend further, especially on the higher end. This suggests that combining **email with a follow-up call** can effectively drive more significant sales, appealing to a broader range of customers, including those making larger purchases.


## **Business Metric to Monitor**

Given that the business goal is to identify the most effective sales approach for the new product line in response to changing consumer behavior, I recommend we focus on tracking **the percentage of average sales revenue generated by each sales method**.

Although the **Email + Call** method has consistently brought in the highest revenue, the Email and Call methods have shown more stability with less variability. This suggests that while the combined approach is powerful, it's important to monitor the performance of all methods to determine what works best over time.

By tracking **the percentage of average sales revenue across all methods**, we can better understand which strategies are most effective in different contexts and for different customer segments. This metric will allow us to spot trends and patterns, helping us make more informed decisions about how to optimize the sales strategies moving forward.

Based on the data from the past six weeks, the percentage of average sales revenue for each method is as follows:

Email + Call: **53.98%**

Email: **30.51%**

Call: **15.51%**

These figures highlight the effectiveness of **the Email + Call method**, but also the importance of continuing to track and evaluate all methods to ensure we're using the best approach for our customers.
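
One plausible way to compute this metric is sketched below in Python. It assumes that "percentage of average sales revenue" means each method's mean revenue divided by the sum of the three means, an interpretation that is consistent with the figures above summing to 100% but is not spelled out in the analysis. The input averages are hypothetical, not the report's values.

```python
def share_of_average_revenue(avg_by_method):
    """Each method's mean revenue as a percentage of the sum of all means."""
    total = sum(avg_by_method.values())
    return {m: round(100 * avg / total, 2) for m, avg in avg_by_method.items()}

# Hypothetical mean revenue per sale for each method.
averages = {"Email + Call": 180.0, "Email": 100.0, "Call": 50.0}
shares = share_of_average_revenue(averages)
for method, pct in shares.items():
    print(f"{method}: {pct}%")
```

Recomputing these shares on a fixed schedule (for example weekly) would make shifts between methods visible as soon as they appear.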









## **Recommendation**

I recommend implementing the following to improve overall business performance:

1. Track the percentage of average sales revenue for each method over time to understand how the effectiveness of each strategy evolves. As customer behaviors and market conditions change, it’s important to regularly review the effectiveness of each sales method.

2. Focus on the Email + Call strategy for key customer segments and important campaigns.

3. Utilize the Email method for customer segments where the additional effort of a call may not significantly increase sales. This method is reliable and less resource-intensive, making it suitable for maintaining steady sales.
 
4. Develop and implement strategies aimed at improving long-term customer retention. This could include loyalty programs, personalized offers, or targeted communications that address the specific needs of customers who have been with the company for a longer time.
 
5. Consider investing in tools or research that provide deeper insights into customer behavior and preferences, particularly for long-term customers.
 
6. As the company implements these strategies, it’s important to track progress and celebrate successes, both small and large.
 
7. Finally, be proactive in anticipating future challenges.