<h1><center>Territory Sales Performance Analysis Test</center></h1>

## Overview
This test consists of four questions designed to evaluate your ability to analyze pharmaceutical sales data using Python and Pandas. You will work with territory-level sales and attainment data to identify high-performing districts and territories within different regions. Your task is to apply **grouping, sorting, ranking, and aggregation techniques** to derive meaningful insights.

## Instructions
- **Libraries:** Ensure you have imported all necessary libraries (e.g., `pandas`).
- **Variable Naming:** Store your final output in the specified variable name (e.g., `answer1`, `answer2`, etc.).
- **Testing:** Validate your results by ensuring they meet the expected output format and logic.
- **Code Quality:** Write clean, readable, and well-commented code.
- **Formatting:** Ensure that the final output DataFrames match the expected format provided in each question.

## Questions  

1. **Compute Territory-Level Sales and Attainment**  
2. **Identify Districts with the Highest Territory-Level Average Attainment in Each Region**  
3. **Identify the Top Three Performing Territories in Each Region Based on Attainment**  
4. **Compute the Sales Contribution of Each District in Its Region**  

<b><center>Good luck with your test!</center></b>


In [2]:
# Importing libraries
import pandas as pd
import numpy as np
import seaborn as sns

In [3]:
# Reading relevant data 
sales = pd.read_csv("additional_files/sales_data.csv")
zipterr = pd.read_csv("additional_files/zip_to_territory.csv")
terrgoals = pd.read_csv("additional_files/territory_targets.csv")

In [4]:
# Let's look at our data 
sales.head()

Unnamed: 0,Prescriberno,Zipcode,State,Specialty,Productcode,TRx_01,TRx_02,TRx_03,TRx_04,TRx_05,TRx_06,TRx_07,TRx_08,TRx_09,TRx_10,TRx_11,TRx_12
0,5,62024,IL,01OBG,44,0,0,0,0,0,0,0,0,0,0,0,0
1,5,62024,IL,01OBG,46,0,0,0,0,0,0,0,0,0,0,0,0
2,5,62024,IL,01OBG,1,0,3,0,0,0,0,0,0,0,0,0,0
3,119,35206,AL,01OPH,2,0,0,0,0,0,0,0,0,0,0,2,0
4,119,35206,AL,01OPH,2,0,0,0,0,0,0,0,0,2,0,0,0


In [5]:
# Let's look at our data 
zipterr.head()

Unnamed: 0,Zipcode,Terrcode
0,35206,T50101
1,35243,T50101
2,36608,T50101
3,35235,T50101
4,36701,T50101


In [None]:
# Let's look at our data 
terrgoals.head()

## **Q1 - Compute Territory-Level Sales and Attainment**  

### **Business Context**  
In pharmaceutical sales, sales representatives are assigned to specific territories, and each territory has predefined sales targets. Understanding **territory-level sales performance** is critical for evaluating business effectiveness and identifying areas requiring additional support.  

### **Problem Statement**  
You are provided with **three datasets** containing **prescription sales data, zip-to-territory mapping, and territory-level targets**. Your task is to compute **territory-level total sales** and derive the **attainment ratio**, which is the proportion of sales achieved relative to the target.  

---

### **Available Datasets**  

#### **1) Zip-Level Prescription Sales Data** (`sales_data.csv`)  
This dataset contains **prescription transaction (TRx) data** at the **zip code level**, along with prescriber information and product codes. Sales data is recorded monthly (`TRx_01` to `TRx_12`), representing the total prescriptions written for each product.  

| Prescriberno | Zipcode | State | Specialty | Productcode | TRx_01 | TRx_02 | TRx_03 | TRx_04 | TRx_05 | TRx_06 | TRx_07 | TRx_08 | TRx_09 | TRx_10 | TRx_11 | TRx_12 |  
|-------------|---------|-------|-----------|-------------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|  
| 0000005     | 62024   | IL    | 01OBG     | 44          | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      |  
| 0000005     | 62024   | IL    | 01OBG     | 46          | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      |  
| 0000005     | 62024   | IL    | 01OBG     | 1           | 0      | 3      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      |  
| 0000119     | 35206   | AL    | 01OPH     | 2           | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 2      | 0      |  
| 0000119     | 35206   | AL    | 01OPH     | 2           | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 2      | 0      | 0      | 0      |  

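Each row records monthly prescription counts for a **prescriber-product combination** in a given **zip code**. As the sample rows show, the same combination can appear in more than one row, so totals should be summed rather than looked up.  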
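The monthly columns can be totalled row by row before any roll-up. The snippet below is a minimal sketch on made-up rows (only three months are kept for brevity, and the name `toy_sales` is an assumption for illustration, not part of the test data).

```python
import pandas as pd

# Illustrative rows only; values are loosely modelled on the sample table.
toy_sales = pd.DataFrame({
    "Prescriberno": [5, 5, 119],
    "Zipcode": [62024, 62024, 35206],
    "TRx_01": [0, 0, 0],
    "TRx_02": [0, 3, 0],
    "TRx_03": [0, 0, 2],
})

# Pick up the monthly columns by prefix instead of hard-coding each name.
month_cols = [c for c in toy_sales.columns if c.startswith("TRx_")]

# Row-wise sum: total prescriptions per row across the available months.
toy_sales["Total_TRx"] = toy_sales[month_cols].sum(axis=1)

# Roll the row totals up to the zip-code level.
zip_totals = toy_sales.groupby("Zipcode", as_index=False)["Total_TRx"].sum()
print(zip_totals)
```

Selecting columns by prefix keeps the code working if the real file carries a different number of monthly columns.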

---

#### **2) Zip-to-Territory Mapping** (`zip_to_territory.csv`)  
This dataset maps each **zip code** to a **territory code**. Since sales data is at the **zip level**, this mapping helps aggregate sales to the **territory level**.  

| Zipcode | Terrcode |  
|---------|---------|  
| 35206   | T50101  |  
| 35243   | T50101  |  
| 36608   | T50101  |  
| 35235   | T50101  |  
| 36701   | T50101  |  
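When joining sales to this mapping, it can help to confirm that each zip code maps to a single territory and to see which zip codes have no territory at all. The sketch below uses invented frames (`toy_sales`, `toy_map`, and the zip code `99999` are assumptions) together with the `validate` and `indicator` options of `DataFrame.merge`.

```python
import pandas as pd

# Hypothetical sales rows; zip 99999 has no entry in the mapping.
toy_sales = pd.DataFrame({
    "Zipcode": [35206, 35206, 35243, 99999],
    "Total_TRx": [2, 2, 5, 1],
})
toy_map = pd.DataFrame({
    "Zipcode": [35206, 35243],
    "Terrcode": ["T50101", "T50101"],
})

# validate="many_to_one" raises if a zip code appears twice in the mapping,
# which would otherwise silently duplicate sales rows.
merged = toy_sales.merge(
    toy_map, on="Zipcode", how="left", validate="many_to_one", indicator=True
)

# Zip codes present in sales but missing from the mapping.
unmapped = merged.loc[merged["_merge"] == "left_only", "Zipcode"].tolist()

# Territory-level totals, ignoring unmapped rows.
terr_sales = (
    merged.dropna(subset=["Terrcode"])
          .groupby("Terrcode", as_index=False)["Total_TRx"].sum()
)
print(unmapped)
print(terr_sales)
```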

---

#### **3) Territory-Level Targets** (`territory_targets.csv`)  
This dataset provides **territory-level sales targets**, which represent expected sales for each territory.  

| Regcode | Distcode | Terrcode | Level     | Targets |  
|---------|---------|----------|-----------|---------|  
| T50000  | T50100  | T50101   | Territory | 1174    |  
| T50000  | T50100  | T50102   | Territory | 1008    |  
| T50000  | T50100  | T50103   | Territory | 4818    |  

Each **territory** has an assigned sales **target** under a regional and district structure.  

---

### **Task Requirements**  
1. **Compute the total sales** (`TRx_01` to `TRx_12`) for each **territory**.  
2. **Calculate the attainment** for each territory, defined as `Sales / Targets`, **rounded to four decimal places**.
3. **Return the final output** as a DataFrame with the columns shown in the expected output below, sorted by `Terrcode` in ascending order.
4. Store this final answer in a variable `answer1`.

---

### **Expected Output Format (with dummy values)**  

| Regcode | Distcode | Terrcode | Level     | Targets |  Sales |  Attainment |
|---------|---------|----------|-----------|---------|---------|  -----------|
| T50000  | T50100  | T50101   | Territory | 1174    |  1050   |  0.8943     |
| T50000  | T50100  | T50102   | Territory | 1008    |  980    |  0.9722     |
| T50000  | T50100  | T50103   | Territory | 4818    |  4320   |  0.8966     |
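Before submitting, a quick self-check of the output shape can catch formatting slips. The helper below is a hypothetical sketch (the name `check_answer1_format` and the `demo` frame are assumptions, and this is not the grader's actual logic); it only checks column order, sorting by `Terrcode`, and four-decimal rounding.

```python
import pandas as pd

EXPECTED_COLUMNS = [
    "Regcode", "Distcode", "Terrcode", "Level", "Targets", "Sales", "Attainment",
]

def check_answer1_format(df):
    """Return a list of formatting problems; an empty list means none were found."""
    problems = []
    if list(df.columns) != EXPECTED_COLUMNS:
        problems.append("columns differ from the expected output")
    if "Terrcode" in df.columns and not df["Terrcode"].is_monotonic_increasing:
        problems.append("rows are not sorted by Terrcode")
    if "Attainment" in df.columns and not df["Attainment"].round(4).equals(df["Attainment"]):
        problems.append("Attainment is not rounded to four decimals")
    return problems

# Made-up frame in the expected layout.
demo = pd.DataFrame({
    "Regcode": ["T50000", "T50000"],
    "Distcode": ["T50100", "T50100"],
    "Terrcode": ["T50101", "T50102"],
    "Level": ["Territory", "Territory"],
    "Targets": [1000, 800],
    "Sales": [1250, 600],
})
demo["Attainment"] = (demo["Sales"] / demo["Targets"]).round(4)

problems = check_answer1_format(demo)
print(problems)
```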


### Please Write your answer here 

Delete the `raise NotImplementedError()` when you start writing your code.

In [6]:
# Monthly prescription columns: TRx_01 ... TRx_12
months = [f'TRx_{i:02d}' for i in range(1, 13)]
# Total prescriptions per row across all twelve months
sales['Total_TRx'] = sales[months].sum(axis=1)

# Attach a territory code to each sales row via its zip code
sales_terr = pd.merge(sales, zipterr, on='Zipcode', how='left')

# Roll sales up to the territory level
territory_sales = sales_terr.groupby('Terrcode')['Total_TRx'].sum().reset_index()
territory_sales.rename(columns={'Total_TRx':'Sales'}, inplace=True)
# Join territory sales onto the targets and compute attainment
answer1 = pd.merge(terrgoals, territory_sales, on='Terrcode', how='left')
answer1['Attainment'] = (answer1['Sales'] / answer1['Targets']).round(4)
answer1 = answer1.sort_values('Terrcode').reset_index(drop=True)

answer1


Unnamed: 0,Regcode,Distcode,Terrcode,Level,Targets,Sales,Attainment
0,T30000,T30100,T30101,Territory,2100,2625,1.25
1,T30000,T30100,T30102,Territory,2294,2225,0.9699
2,T30000,T30100,T30103,Territory,613,558,0.9103
3,T30000,T30200,T30201,Territory,1091,1593,1.4601
4,T30000,T30200,T30202,Territory,60,83,1.3833
5,T30000,T30200,T30203,Territory,277,412,1.4874
6,T30000,T30200,T30204,Territory,2138,1839,0.8601
7,T40000,T40100,T40101,Territory,1040,978,0.9404
8,T40000,T40100,T40102,Territory,758,1114,1.4697
9,T40000,T40100,T40103,Territory,589,560,0.9508


---
### Make sure to write your answer for Q1 above this line
 -  Feel free to add as many code cells above this line as you wish    
 -  Make sure to save your answer in the variable named 'answer1'
 -  If the variable `answer1` is not found, your answer will be rejected by the system

----

## **Q2 - Identify Districts with the Highest Territory-Level Average Attainment in Each Region**  

### **Business Context**  
In pharmaceutical sales, territories are grouped into districts, and districts are further organized under regions. To optimize sales strategies, it is essential to **identify districts within each region that are outperforming others** based on their **territory-level attainment**. This helps leadership allocate resources effectively and replicate successful strategies across underperforming districts.  

### **Problem Statement**  
Using the **territory-level sales and attainment** data computed previously, determine the **district in each region with the highest average territory-level attainment**.  

---

### **Data to be used**  
The starting dataset is the **territory-level sales and attainment** table generated previously.  
Note: the dataset below shows dummy values.

| Regcode | Distcode | Terrcode | Level     | Targets | Sales | Attainment |  
|---------|---------|----------|-----------|---------|---------|------------|  
| T50000  | T50100  | T50101   | Territory | 1174    | 1050    | 0.8943     |  
| T50000  | T50100  | T50102   | Territory | 1008    | 980     | 0.9722     |  
| T50000  | T50100  | T50103   | Territory | 4818    | 4320    | 0.8966     |  
| T50000  | T50200  | T50201   | Territory | 2000    | 1980    | 0.9900     |  
| T50000  | T50200  | T50202   | Territory | 1500    | 1450    | 0.9667     |  
| T50001  | T50200  | T50203   | Territory | 1800    | 1760    | 0.9778     |  
| T50001  | T50200  | T50204   | Territory | 1400    | 1360    | 0.9714     |  
| ...  | ...  | ...   | ... | ...    | ...    | ...     |  

---

### **Example Calculation**  

For **Region `T50000`**, the average territory-level attainment for each district is computed as follows:  

- **District `T50100`**  
  - `(0.8943 + 0.9722 + 0.8966) / 3 = 0.9210`  
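- **District `T50200`** (only `T50201` and `T50202` belong to Region `T50000`; `T50203` and `T50204` are listed under Region `T50001`)  
  - `(0.9900 + 0.9667) / 2 ≈ 0.9784`  

Since `T50200` has the higher average attainment of the two districts within **T50000**, it is selected.  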

Repeat the same computation for the other regions and select the district with the highest average attainment in each.  
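The per-region selection can be sketched as a two-step group-by: average attainment per district, then the best district per region. The codes `R1`, `R2`, `D1` to `D4` and all values below are invented for illustration.

```python
import pandas as pd

# Hypothetical territory-level attainment values.
toy = pd.DataFrame({
    "Regcode":    ["R1", "R1", "R1", "R1", "R2", "R2", "R2"],
    "Distcode":   ["D1", "D1", "D2", "D2", "D3", "D4", "D4"],
    "Attainment": [0.90, 1.10, 1.20, 1.00, 0.80, 0.95, 0.85],
})

# Step 1: average territory attainment per (region, district).
district_avg = toy.groupby(["Regcode", "Distcode"], as_index=False)["Attainment"].mean()

# Step 2: within each region, find the row label of the highest average.
best_idx = district_avg.groupby("Regcode")["Attainment"].idxmax()

# Step 3: keep those rows, sorted by region.
best = district_avg.loc[best_idx].sort_values("Regcode").reset_index(drop=True)
print(best)
```

Note that `idxmax` returns only the first row when two districts tie on the average; whether ties need special handling is not stated in the question.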

---

### **Expected Output Format (with dummy values)**  

**Return the final output** as a DataFrame with the columns shown below, sorted by `Regcode` in ascending order.  


| Regcode | Distcode | Attainment |  
|---------|---------|---------------|  
| T30000  | T30100  | ...        |  
| T40000  | T40200  | ...        |  
| T50000  | T50100  | ...        |  

Please store the final output in a variable named `answer2`.

### Please Write your answer here 

Delete the `raise NotImplementedError()` when you start writing your code.

In [7]:
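# Average territory-level attainment per district within each region
district_avg = answer1.groupby(['Regcode', 'Distcode'])['Attainment'].mean().reset_index()
# Keep the district with the highest average in each region
answer2 = district_avg.loc[district_avg.groupby('Regcode')['Attainment'].idxmax()]
answer2 = answer2.sort_values('Regcode').reset_index(drop=True)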
answer2


Unnamed: 0,Regcode,Distcode,Attainment
0,T30000,T30200,1.297725
1,T40000,T40200,1.334725
2,T50000,T50100,1.089767


---
### Make sure to write your answer for Q2 above this line
 -  Feel free to add as many code cells above this line as you wish    
 -  Make sure to save your answer in the variable named 'answer2'
 -  If the variable `answer2` is not found, your answer will be rejected by the system

----

## **Q3 - Identify the Top Three Performing Territories in Each Region Based on Attainment**  

### **Business Context**  
Pharmaceutical sales performance is monitored at multiple levels, including **territories, districts, and regions**. Identifying the **top-performing territories within each region** helps sales leadership recognize successful strategies, reallocate resources efficiently, and address market gaps.  

### **Problem Statement**  
Using the **territory-level sales and attainment** numbers derived previously, determine the **top three territories** with the highest attainment within each region.  

---

### **Data to be Used**  
The starting dataset is the **territory-level sales and attainment** table generated previously.  

| Regcode | Distcode | Terrcode | Level     | Targets | Sales | Attainment |  
|---------|---------|----------|-----------|---------|---------|------------|  
| T50000  | T50100  | T50101   | Territory | 1174    | 1050    | 0.8943     |  
| T50000  | T50100  | T50102   | Territory | 1008    | 980     | 0.9722     |  
| T50000  | T50100  | T50103   | Territory | 4818    | 4320    | 0.8966     |  
| T50000  | T50200  | T50201   | Territory | 2000    | 1980    | 0.9900     |  
| T50000  | T50200  | T50202   | Territory | 1500    | 1450    | 0.9667     |  
| T50001  | T50200  | T50203   | Territory | 1800    | 1760    | 0.9778     |  
| T50001  | T50200  | T50204   | Territory | 1400    | 1360    | 0.9714     |  
| ...     | ...     | ...      | ...       | ...     | ...     | ...        |  

---

### **Example Calculation**  

For **Region `T50000`**, the top three performing territories based on attainment are:  

1. **T50201** → `0.9900`  
2. **T50102** → `0.9722`  
3. **T50202** → `0.9667`  

Repeat the same computation for the other regions, selecting the three territories with the highest attainment in each.  
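Ranking within groups is sensitive to ties, so it is worth seeing how the chosen method behaves. The sketch below uses invented territory codes and values; with `method="first"`, tied territories are ranked in row order, so each region yields at most three rows.

```python
import pandas as pd

# Hypothetical data: territories A and C tie on attainment in region R1.
toy = pd.DataFrame({
    "Regcode":    ["R1", "R1", "R1", "R1", "R2", "R2"],
    "Terrcode":   ["A", "B", "C", "D", "E", "F"],
    "Attainment": [0.95, 1.10, 0.95, 1.02, 0.80, 0.90],
})

# Rank within each region, highest attainment first; ties go to the earlier row.
toy["Rank"] = (
    toy.groupby("Regcode")["Attainment"]
       .rank(method="first", ascending=False)
       .astype(int)
)

# Keep the top three per region and order the result.
top3 = (
    toy[toy["Rank"] <= 3]
       .sort_values(["Regcode", "Rank"])
       .reset_index(drop=True)
)
print(top3)
```

A region with fewer than three territories (here `R2`) simply returns the rows it has. Using `method="dense"` or `method="min"` instead would let tied territories share a rank, which can return more than three rows per region.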

---

### **Expected Output Format (with dummy values)**  
**Return the final output** as a DataFrame with the following columns, sorted by `Regcode` in ascending order and `Attainment` in descending order within each region.  

| Regcode | Terrcode | Attainment | Rank |  
|---------|---------|------------|------|  
| T50000  | T50201  | 0.9900     | 1    |  
| T50000  | T50102  | 0.9722     | 2    |  
| T50000  | T50202  | 0.9667     | 3    |  
| T50001  | T50203  | 0.9778     | 1    |  
| T50001  | T50204  | 0.9714     | 2    |  
| T50001  | ...     | ...        | 3    |  

Please store the final output in a variable named `answer3`.

### Please Write your answer here 

Delete the `raise NotImplementedError()` when you start writing your code.

In [8]:
answer3 = answer1.copy()
# Rank territories within each region; method='first' breaks ties by row order
answer3['Rank'] = answer3.groupby('Regcode')['Attainment'] \
                         .rank(method='first', ascending=False)

# Keep the top three territories per region
answer3 = answer3[answer3['Rank'] <= 3]

# Select the output columns, sort, and store Rank as an integer to match the expected format
answer3 = answer3[['Regcode', 'Terrcode', 'Attainment', 'Rank']] \
            .sort_values(['Regcode', 'Attainment'], ascending=[True, False]) \
            .reset_index(drop=True) \
            .astype({'Rank': int})

answer3


Unnamed: 0,Regcode,Terrcode,Attainment,Rank
0,T30000,T30203,1.4874,1
1,T30000,T30201,1.4601,2
2,T30000,T30202,1.3833,3
3,T40000,T40201,1.4898,1
4,T40000,T40102,1.4697,2
5,T40000,T40204,1.4387,3
6,T50000,T50103,1.23,1
7,T50000,T50102,1.1696,2
8,T50000,T50201,1.0901,3


---
### Make sure to write your answer for Q3 above this line
 -  Feel free to add as many code cells above this line as you wish    
 -  Make sure to save your answer in the variable named 'answer3'
 -  If the variable `answer3` is not found, your answer will be rejected by the system

----

## **Q4 - Compute the sales contribution of each district in its Region**  

### **Business Context**  
In pharmaceutical sales, understanding how each **district contributes to the total regional sales** helps leadership **assess performance, allocate resources effectively, and optimize future sales strategies**. By identifying high-contributing districts, sales teams can learn from successful models and address underperformance where necessary.  

### **Problem Statement**  
Using the **territory-level sales** DataFrame created previously, determine **each district's total sales** and its **contribution to the total regional sales**.  

---  

### **Getting District level sales**  
You can begin with the previously created **territory-level sales** DataFrame (shown below for reference). Territory-level sales can be rolled up to their corresponding districts (using the `Distcode` column) to get district-level sales.

| Regcode | Distcode | Terrcode | Level     | Targets | Sales |  
|---------|---------|----------|-----------|---------|---------|
| T50000  | T50100  | T50101   | Territory | 1174    | 1050    |
| T50000  | T50100  | T50102   | Territory | 1008    | 980     |
| T50000  | T50100  | T50103   | Territory | 4818    | 4320    |
| T50000  | T50200  | T50201   | Territory | 2000    | 1980    |
| T50000  | T50200  | T50202   | Territory | 1500    | 1450    |
| T50001  | T50200  | T50203   | Territory | 1800    | 1760    |
| T50001  | T50200  | T50204   | Territory | 1400    | 1360    |
| ...     | ...     | ...      | ...       | ...     | ...     |

---  

### **What needs to be done**  

For each **district within a region**, compute its **total sales** and determine how much it contributes to the overall regional sales. The goal is to understand **which districts drive the majority of sales** within their respective regions. The final results can be used to highlight **districts with the highest contribution percentages**, allowing leadership to recognize key performing areas.  
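One way to sketch this roll-up is with `groupby(...).transform("sum")`, which attaches the regional total to every district row without a separate merge. The frame below reuses the dummy sales figures from the reference table; the variable names `terr` and `dist` are assumptions for illustration.

```python
import pandas as pd

# Dummy territory-level sales, taken from the reference table above.
terr = pd.DataFrame({
    "Regcode":  ["T50000"] * 5 + ["T50001"] * 2,
    "Distcode": ["T50100"] * 3 + ["T50200"] * 4,
    "Sales":    [1050, 980, 4320, 1980, 1450, 1760, 1360],
})

# District-level sales within each region.
dist = terr.groupby(["Regcode", "Distcode"], as_index=False)["Sales"].sum()

# Regional total broadcast back onto each district row.
dist["Reg_Total_Sales"] = dist.groupby("Regcode")["Sales"].transform("sum")

# Share of the regional total, rounded to four decimals.
dist["Dist_Contribution"] = (dist["Sales"] / dist["Reg_Total_Sales"]).round(4)

dist = dist.sort_values(
    ["Regcode", "Dist_Contribution"], ascending=[True, False]
).reset_index(drop=True)
print(dist)
```

A useful sanity check is that the contributions within each region sum to roughly 1 (up to rounding).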

---  

### **Expected output format (with dummy values)**  
**Return the final output** as a DataFrame with the following columns, sorted by `Regcode` in ascending order and `Dist_Contribution` in descending order within each region.  

| Regcode | Distcode | Sales  | Reg_Total_Sales | Dist_Contribution |  
|---------|---------|--------|-----------------|-------------|  
| T50000  | T50100  | 6350   | 9780            | 0.6493      |  
| T50000  | T50200  | 3430   | 9780            | 0.3507      |  
| T50001  | T50300  | ...    | ...             | ...         |  
| T50001  | T50400  | ...    | ...             | ...         |  


Please store the final output in a variable named `answer4`.

### Please Write your answer here 

Delete the `raise NotImplementedError()` when you start writing your code.

In [10]:
# Q4: Compute the sales contribution of each district in its Region

# Step 1: Aggregate territory-level sales to district level
district_sales = answer1.groupby(['Regcode', 'Distcode'])['Sales'].sum().reset_index()

# Step 2: Compute total sales per region
region_sales = district_sales.groupby('Regcode')['Sales'].sum().reset_index(name='Reg_Total_Sales')

# Step 3: Merge district sales with regional totals
answer4 = pd.merge(district_sales, region_sales, on='Regcode', how='left')

# Step 4: Compute district contribution to regional sales
answer4['Dist_Contribution'] = (answer4['Sales'] / answer4['Reg_Total_Sales']).round(4)

# Step 5: Sort by Regcode ascending, contribution descending
answer4 = answer4.sort_values(['Regcode', 'Dist_Contribution'], ascending=[True, False]).reset_index(drop=True)

# Display final answer
answer4


Unnamed: 0,Regcode,Distcode,Sales,Reg_Total_Sales,Dist_Contribution
0,T30000,T30100,5408,9335,0.5793
1,T30000,T30200,3927,9335,0.4207
2,T40000,T40200,9126,11778,0.7748
3,T40000,T40100,2652,11778,0.2252
4,T50000,T50100,8126,15691,0.5179
5,T50000,T50200,7565,15691,0.4821


---
### Make sure to write your answer for Q4 above this line
 -  Feel free to add as many code cells above this line as you wish    
 -  Make sure to save your answer in the variable named 'answer4'
 -  If the variable `answer4` is not found, your answer will be rejected by the system

---

<h2><center> Completing your test </center></h2>

- Once you have completed your test and answered all the questions:
  
- You can open the guide using the button located on the far right of your screen.

- The button will look like this image:  
  ![](additional_files/End_guide_button.png)

- Clicking on this button will allow you to mark the test as complete using the button shown below:  
  ![](additional_files/completion_button.png)

- Please click on **Mark as Completed** to end and submit your test.

- Thank you!
