# Data Analyst Professional Practical Exam Submission


The dataset initially includes **15000** records with **8** columns each, before undergoing cleaning and validation. I have checked each column against the criteria specified in the dataset table.

- **week**: contains the weeks since product launch. Values are integers between 1 and 6. No cleaning is needed.

- **Sales_method**: The sales method should only contain one of three distinct categories: 'Email', 'Call', and 'Email + Call'. 'em + call' and 'email' are replaced by 'Email + Call' and 'Email' respectively.

- **customer_id**: unique identifiers for customers. No cleaning is needed.

- **nb_sold**: the number of products sold. Values are integers between 7 and 16. No cleaning is needed.

- **revenue**: numeric values, with **1074** missing values. Non-missing values are rounded to 2 decimal places, as recommended.

- **years_as_customer**: the number of years a customer has been buying from the company, which was established in 1984. Since the data runs through 2022, no value should exceed 38, yet **5** records have values above 38.

- **nb_site_visits**: integers without missing values, same as the description. No cleaning is needed.

- **state**: 50 states in the United States, same as the description. No cleaning is needed.

After the data validation, the dataset contains **15000** entries with **8** columns with **1074** missing values in the **revenue** column.
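
The two checks above that involve a rule (label normalization for the sales method and the tenure bound of 2022 − 1984 = 38 years) can be sketched as follows. This is a minimal illustration rather than the code used for the analysis: the helper names `clean_sales_method` and `tenure_is_plausible` are hypothetical, and the mapping covers only the two label variants named in this report.

```python
# Hypothetical helpers; only the label variants named in the report are mapped.
SALES_METHOD_FIXES = {
    "em + call": "Email + Call",
    "email": "Email",
}
VALID_METHODS = {"Email", "Call", "Email + Call"}

COMPANY_FOUNDED = 1984
DATA_YEAR = 2022
MAX_TENURE = DATA_YEAR - COMPANY_FOUNDED  # 38 years


def clean_sales_method(raw: str) -> str:
    """Map a raw label to one of the three valid categories."""
    label = raw.strip()
    label = SALES_METHOD_FIXES.get(label, label)
    if label not in VALID_METHODS:
        raise ValueError(f"unexpected sales method: {raw!r}")
    return label


def tenure_is_plausible(years_as_customer: int) -> bool:
    """A customer cannot have been buying for longer than the company has existed."""
    return 0 <= years_as_customer <= MAX_TENURE
```

Raising on unknown labels, instead of passing them through, makes any new spelling variant visible the next time the data is refreshed.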

# How many customers were there for each approach?
- The **'Email'** method (green bar) reached 7466 customers, approximately **50%** of the total.
- The **'Call'** method (blue bar) reached 4962 customers, about **33%**.
- The combined **'Email + Call'** method (orange bar) reached 2572 customers, roughly **17%** of the total.
![fig01](fig01.PNG)
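
The rounded percentages follow directly from the customer counts. As a quick arithmetic check, using only the counts quoted above (the variable names are illustrative):

```python
# Customer counts per sales method, as reported above.
customers = {"Email": 7466, "Call": 4962, "Email + Call": 2572}

total_customers = sum(customers.values())

# Share of all customers reached by each method, in percent.
customer_share = {
    method: 100 * count / total_customers for method, count in customers.items()
}

for method, share in customer_share.items():
    print(f"{method}: {share:.1f}%")
```

The three counts add up to the 15000 records in the dataset, so every customer is assigned to exactly one method.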


# What does the spread of the revenue look like overall?
From the descriptive statistics and box plot, we can conclude that:
- The revenue data is **not evenly distributed** and likely skewed, with the **majority** of the sales transactions generating revenues between approximately USD **52** and USD **107**. 
- The median lies below the midpoint between the minimum and maximum values, which suggests a **skew** with most revenues concentrated at the lower end of the range.
- The data presents a fairly **wide range**, indicating variability in the revenue generated from sales transactions.
![fig02](fig02.PNG)


# What does the spread of the revenue look like for each method?

This box plot illustrates the revenue distribution across three sales methods. 

- The **'Call'** method accounted for 4,781 sales, with revenues ranging from USD **32.54** to USD **71.36** (32.54 is also the overall minimum revenue) and a median of USD **49.07**.
- The **'Email'** method had 6,922 sales, with a wider revenue spread from USD **78.83** to USD **148.97** and a median of USD **95.58** (close to the overall median revenue).
- The combined **'Email + Call'** method had 2,223 sales and generated the highest revenues, ranging from USD **122.11** to USD **238.32** (238.32 is also the overall maximum revenue), with a median of USD **184.74**.

The analysis indicates that the combined **'Email + Call'** method yielded the **highest revenue per sale**, followed by the **'Email'** method, while the **'Call'** method generated the lowest.
![fig03](fig03.PNG)
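
The sales counts in this section are lower than the customer counts reported earlier. Assuming that a "sale" here means a record with a non-missing revenue value (an interpretation, not something stated in the data description), the gap per method should add up to the 1074 missing revenue values found during validation. A short check using only the figures in this report:

```python
# Customers reached per method (from the customer-count section).
customers = {"Email": 7466, "Call": 4962, "Email + Call": 2572}

# Records with a revenue value per method (from this section).
sales_with_revenue = {"Email": 6922, "Call": 4781, "Email + Call": 2223}

# Records without a revenue value, per method and in total.
missing_by_method = {
    method: customers[method] - sales_with_revenue[method] for method in customers
}
total_missing = sum(missing_by_method.values())

print(missing_by_method)
print(total_missing)
```

Under that assumption the totals reconcile, which also shows how the missing revenue values are distributed across the three methods.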


# Was there any difference in revenue over time for each of the methods?
The next visualization shows the evolution of revenue from the three sales methods over six weeks. 
- The **'Email'** method had the highest revenue initially but **declined over time**.
- The **'Call'** method’s revenue remained **relatively stable**, with slight increases.
- The **'Email + Call'** method showed a **consistent upward trend** and surpassed the other methods in weeks five and six.
![fig04](fig04.PNG)


# Definition of a Metric for the Business to Monitor
Based on the findings so far, I recommend using the **Percentage of Revenue per Interaction** as our key metric.
$$\text{Percentage of Revenue per Interaction for a Sales Method} = \frac{\text{Total Revenue of a Sales Method}}{\text{Sum of Total Revenues for All Sales Methods}}\times100$$
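
The formula can be sketched as a small function. The function name `revenue_share` is hypothetical, and the totals in the example are made-up round numbers chosen to keep the arithmetic easy to follow; they are not values from the dataset.

```python
def revenue_share(total_revenue_by_method):
    """Return each method's percentage of the combined revenue of all methods."""
    grand_total = sum(total_revenue_by_method.values())
    if grand_total <= 0:
        raise ValueError("total revenue across methods must be positive")
    return {
        method: 100 * revenue / grand_total
        for method, revenue in total_revenue_by_method.items()
    }


# Illustrative totals only (NOT taken from the dataset).
example_totals = {"Email": 400.0, "Call": 100.0, "Email + Call": 500.0}
example_shares = revenue_share(example_totals)
print(example_shares)
```

Because every method's revenue is divided by the same grand total, the resulting percentages always sum to 100.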

## Estimate the Initial Value(s) for the Metric Based on the Current Data
Based on our data, it appears that the sales method of "**Email**" accounts for the largest portion of revenue per interaction, making up approximately **41.83%** of the total. The "**Email + Call**" method follows closely, accounting for about **36.87%** of the revenue per interaction. Lastly, the "**Call**" method contributes the least, comprising around **21.30%** of the revenue per interaction.

![fig05](fig05.PNG)

## How Should the Business Monitor What They Want to Achieve?
To effectively monitor this metric, the business should:

- Regularly record and update data on customer interactions and revenues associated with each sales method.
- Periodically calculate the Percentage of Revenue per Interaction for each method using the formula.
- Analyze trends over time: is the Percentage of Revenue per Interaction increasing, decreasing, or remaining constant for each sales method?
- Compare the Percentage of Revenue per Interaction against other metrics, such as customer satisfaction, to determine if any adjustments to the sales methods are needed.
- Consider external factors that might affect the Percentage of Revenue per Interaction, such as market conditions or changes in customer preferences.
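
One way to carry out the periodic calculation is to recompute the metric per week and compare the values over time. The sketch below assumes each record is a `(week, sales_method, revenue)` tuple and that missing revenue is represented as `None`; the function name and the toy records are illustrative and not taken from the dataset.

```python
from collections import defaultdict


def weekly_revenue_share(records):
    """Return {week: {method: percentage of that week's total revenue}}."""
    totals = defaultdict(lambda: defaultdict(float))
    for week, method, revenue in records:
        if revenue is None:  # skip records with missing revenue
            continue
        totals[week][method] += revenue

    shares = {}
    for week, by_method in sorted(totals.items()):
        week_total = sum(by_method.values())
        shares[week] = {
            method: 100 * revenue / week_total
            for method, revenue in by_method.items()
        }
    return shares


# Toy records for illustration only (NOT taken from the dataset).
toy_records = [
    (1, "Email", 60.0),
    (1, "Call", 20.0),
    (1, "Email + Call", 20.0),
    (2, "Email", 50.0),
    (2, "Call", None),
    (2, "Email + Call", 50.0),
]
shares = weekly_revenue_share(toy_records)
print(shares)
```

Skipping records with missing revenue is one possible choice; imputing them instead would change the resulting shares, so the handling should be kept consistent from one period to the next.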



# Summary:

- The dataset consists of sales data over multiple weeks, detailing the sales method used (Call, Email, Email + Call), number of items sold, revenue, customer tenure, and other details.

- The **'Email'** method was **used the most**, followed by 'Call' and then 'Email + Call'.

- Initially, the **'Email'** method generated the highest revenue, but this revenue **trend decreased over time**.

- The **'Email + Call'** method showed **consistent growth** in revenue over time.

- The **'Call'** method remained somewhat **stable in revenue** but didn't stand out in any specific metric.

- **Percentage of Revenue per Interaction** was highest for the 'Email' method, indicating it as the most time-efficient. 'Email + Call' had the second-highest percentage, while the 'Call' method was the least efficient.



# Recommendations:

- To bolster revenue growth, the business ought to concentrate its resources on the **'Email + Call'** approach, given its demonstrated track record of **consistent escalation**.

- For **efficient time management**, the business should opt for the **'Email'** method, as it enables rapid interactions and facilitates reaching a larger customer base in a shorter span.

- It’s prudent to reassess the **'Call'** method. The method could be reserved for special instances that necessitate a personal touch or engagement with **high-value clients**.

- Maintain vigilant monitoring of sales method performance metrics and **adjust strategies according to recent data**. Pay attention to customer feedback to align methods with their preferences.

- Perform a **cost analysis** for each sales method to ascertain that the revenues generated adequately offset the associated expenses.

By implementing these recommendations, the business can aim for a balance between efficient time management and revenue growth, while also ensuring a high quality of customer relationships.
