Here is a Python test tailored for a Senior Data Scientist role. It assesses intermediate to advanced skills in data manipulation, statistical analysis, machine learning, and real-world business problem-solving.

---

### Test Outline

1. **Data Wrangling and Manipulation**:
   - Cleaning, transforming, and summarizing data.
   
2. **Exploratory Data Analysis (EDA)**:
   - Investigating relationships, distributions, and trends in the data.
   
3. **Customer Segmentation**:
   - Retail case study: RFM analysis and K-Means clustering of customers.
   
4. **Machine Learning Modeling**:
   - Building and evaluating predictive models for store-level sales.
   
5. **Optimization Problem**:
   - Solving a business optimization problem.

### Dataset
Assume we have a retail transaction dataset with columns like `transaction_id`, `customer_id`, `date`, `product_id`, `store_id`, `quantity`, `price`, and `total_amount`.

---

## Instructions

Provide clear explanations and code where necessary.

---

### Problem 1: Data Wrangling and Cleaning (30 points)
1. Load the dataset `transactions.csv`.
2. Identify and handle missing values in the `quantity` and `price` columns. If both are missing in a row, drop the row; otherwise, fill missing `quantity` values with the column median and missing `price` values with the column mean.
3. Check for and remove any duplicate transactions.
4. Create a new column `day_of_week` that captures the day of the week for each transaction date.
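The cleaning steps above can be sketched with pandas as follows. This is a minimal sketch, not a reference solution: `clean_transactions` is a hypothetical helper name, and it assumes a "duplicate transaction" means a fully identical row (a `transaction_id` may legitimately repeat across line items for different products).

```python
import pandas as pd


def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the Problem 1 cleaning rules to a raw transactions frame."""
    # Drop rows where both quantity and price are missing.
    both_missing = df["quantity"].isna() & df["price"].isna()
    df = df.loc[~both_missing].copy()

    # Impute the remaining gaps: median for quantity, mean for price.
    df["quantity"] = df["quantity"].fillna(df["quantity"].median())
    df["price"] = df["price"].fillna(df["price"].mean())

    # Assumption: a duplicate is a fully identical row, so multi-product
    # transactions that share a transaction_id are kept.
    df = df.drop_duplicates()

    # Parse dates and derive the weekday name (e.g. "Monday").
    df["date"] = pd.to_datetime(df["date"])
    df["day_of_week"] = df["date"].dt.day_name()
    return df.reset_index(drop=True)
```

In practice the input would come from `pd.read_csv("transactions.csv")`. Note that imputed rows may no longer satisfy `total_amount == quantity * price`, which is worth mentioning in the write-up.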

### Problem 2: Exploratory Data Analysis (EDA) (20 points)
1. Analyze and visualize the distribution of total sales (using `total_amount`) per store. Which stores have the highest and lowest sales?
2. Explore the relationship between `quantity` and `total_amount`. Provide a scatter plot and a brief interpretation.
3. Identify any seasonal patterns in sales by plotting monthly sales trends. Are there months with significantly higher or lower sales?
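The aggregations behind these EDA questions might look like the sketch below. The helper names are hypothetical, and the code computes only the numbers; plotting is left to the candidate's tool of choice.

```python
import pandas as pd


def store_sales(df: pd.DataFrame) -> pd.Series:
    """Total sales per store, sorted from highest to lowest."""
    return df.groupby("store_id")["total_amount"].sum().sort_values(ascending=False)


def monthly_sales(df: pd.DataFrame) -> pd.Series:
    """Total sales per calendar month, indexed by pandas Period (e.g. 2024-01)."""
    months = pd.to_datetime(df["date"]).dt.to_period("M")
    return df.groupby(months)["total_amount"].sum().sort_index()


def quantity_amount_corr(df: pd.DataFrame) -> float:
    """Pearson correlation between quantity and total_amount."""
    return df["quantity"].corr(df["total_amount"])
```

With matplotlib installed, pandas can draw the requested charts directly, for example `store_sales(df).plot(kind="bar")`, `df.plot.scatter(x="quantity", y="total_amount")`, and `monthly_sales(df).plot()`. The first and last entries of `store_sales(df)` answer the highest/lowest store question.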

### Problem 3: Customer Segmentation (20 points)
1. Using the columns `customer_id`, `total_amount`, and `date`, perform RFM (Recency, Frequency, Monetary) analysis to segment customers into distinct groups.
2. Cluster the customers into four segments based on the RFM metrics using the K-Means algorithm.
3. Provide a brief interpretation of each segment based on their spending behavior and frequency.
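One way to approach the RFM and K-Means steps is sketched below, assuming scikit-learn is available. Because the task lists only `customer_id`, `total_amount`, and `date`, this sketch counts Frequency as the number of distinct purchase dates; the log transform and the "day after the last transaction" snapshot date are also assumptions, and `rfm_table` / `segment_customers` are hypothetical names.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def rfm_table(df: pd.DataFrame, snapshot_date=None) -> pd.DataFrame:
    """Recency (days), Frequency (distinct purchase dates), Monetary (total spend)."""
    dates = pd.to_datetime(df["date"])
    if snapshot_date is None:
        # Assumption: measure recency from the day after the last transaction.
        snapshot_date = dates.max() + pd.Timedelta(days=1)
    grouped = df.assign(date=dates).groupby("customer_id")
    return pd.DataFrame({
        "recency": (snapshot_date - grouped["date"].max()).dt.days,
        "frequency": grouped["date"].nunique(),
        "monetary": grouped["total_amount"].sum(),
    })


def segment_customers(rfm: pd.DataFrame, n_clusters: int = 4, random_state: int = 0) -> pd.DataFrame:
    """Add a K-Means 'segment' label computed on standardized RFM features."""
    # Log-transform the usually right-skewed F and M (assumes non-negative spend).
    features = np.column_stack([
        rfm["recency"],
        np.log1p(rfm["frequency"]),
        np.log1p(rfm["monetary"]),
    ])
    scaled = StandardScaler().fit_transform(features)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    out = rfm.copy()
    out["segment"] = km.fit_predict(scaled)
    return out
```

For the interpretation step, a per-segment profile such as `segments.groupby("segment")[["recency", "frequency", "monetary"]].mean()` shows which cluster holds recent, frequent, high-spending customers and which holds lapsed ones.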

### Problem 4: Machine Learning Modeling (25 points)
1. Build a predictive model to forecast the total sales for each store in the next month. Use the columns `store_id`, `date`, `quantity`, `price`, and `total_amount` for your model.
2. Preprocess the data by encoding categorical variables, scaling numerical variables, and performing any other necessary transformations.
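3. Train at least two regression models (e.g., Linear Regression and Gradient Boosting). Evaluate them with cross-validation, using a time-ordered scheme (such as expanding-window splits) so that no fold trains on data from after its test period, and select the best one based on RMSE.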
4. Briefly explain any feature engineering steps and evaluate the model's performance.
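A possible baseline for this problem is sketched below, under several assumptions: sales are aggregated to one row per store and month, features are lagged values only (the same month's quantity and price would effectively reveal the target, since `total_amount` is built from them), `store_id` is one-hot encoded, and scikit-learn's `TimeSeriesSplit` supplies the expanding-window folds. The feature names and helper functions are hypothetical.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_monthly_features(df: pd.DataFrame) -> pd.DataFrame:
    """One row per (store, month) with lagged sales, units, and price features."""
    monthly = (
        df.assign(month=pd.to_datetime(df["date"]).dt.to_period("M"))
        .groupby(["store_id", "month"], as_index=False)
        .agg(sales=("total_amount", "sum"), units=("quantity", "sum"),
             avg_price=("price", "mean"))
        .sort_values(["month", "store_id"])  # chronological row order for CV
    )
    by_store = monthly.groupby("store_id")
    # Lags look only at earlier months of the same store, avoiding target leakage.
    monthly["sales_lag1"] = by_store["sales"].shift(1)
    monthly["sales_lag2"] = by_store["sales"].shift(2)
    monthly["units_lag1"] = by_store["units"].shift(1)
    monthly["price_lag1"] = by_store["avg_price"].shift(1)
    monthly["month_of_year"] = monthly["month"].dt.month  # seasonality signal
    monthly = pd.get_dummies(monthly, columns=["store_id"], prefix="store", dtype=float)
    return monthly.dropna().reset_index(drop=True)


def compare_models(monthly: pd.DataFrame, n_splits: int = 3) -> dict:
    """Mean cross-validated RMSE for each candidate model."""
    # Drop the target and same-month aggregates that would leak it.
    X = monthly.drop(columns=["month", "sales", "units", "avg_price"])
    y = monthly["sales"]
    cv = TimeSeriesSplit(n_splits=n_splits)  # each fold trains on earlier rows
    models = {
        "linear_regression": make_pipeline(StandardScaler(), LinearRegression()),
        "gradient_boosting": GradientBoostingRegressor(random_state=0),
    }
    return {
        name: -cross_val_score(model, X, y, cv=cv,
                               scoring="neg_root_mean_squared_error").mean()
        for name, model in models.items()
    }
```

The model with the lower mean RMSE would then be refit on all rows, and next month's forecast produced from a feature row per store built from its two most recent months. Because rows are ordered by month and then store, a fold boundary can fall inside a month; a stricter candidate might split on whole months instead.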

### Problem 5: Optimization Problem - Stock Allocation (25 points)
1. Imagine a retailer has a limited stock of a particular product. Given the transaction data, create a function that optimizes the stock allocation across stores to maximize overall sales.
   - Inputs: `store_id`, `expected_demand`, `current_stock`, `stock_limit`.
   - Output: Optimal stock allocation for each store.
2. The function should minimize stockouts in high-demand stores while adhering to the `stock_limit`.
3. Write a brief summary explaining how your function works and any assumptions made.
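A simple greedy baseline for this allocation problem is sketched below. It assumes `stock_limit` is the total number of units available to distribute, that every unit sells at the same price, and that demand is in whole units; `allocate_stock` is a hypothetical name. Under those assumptions each unit that fills a store's shortfall adds exactly one unit of sales, so filling shortfalls until stock runs out maximizes total sales, and visiting stores in descending order of demand protects the high-demand stores first.

```python
def allocate_stock(stores, stock_limit):
    """Greedy allocation of a limited stock across stores.

    stores: list of dicts with keys "store_id", "expected_demand", "current_stock".
    stock_limit: total units available to allocate (assumption).
    Returns a dict mapping store_id to the number of units allocated.
    """
    # Units each store would be short without new stock (never negative).
    shortfall = {
        s["store_id"]: max(0, s["expected_demand"] - s["current_stock"]) for s in stores
    }
    allocation = {s["store_id"]: 0 for s in stores}
    remaining = stock_limit

    # Serve the highest-demand stores first so they are least likely to stock out.
    for s in sorted(stores, key=lambda s: s["expected_demand"], reverse=True):
        if remaining <= 0:
            break
        units = min(shortfall[s["store_id"]], remaining)
        allocation[s["store_id"]] = units
        remaining -= units
    return allocation
```

If prices or margins differ by store, sorting by price (or margin) instead of demand keeps the same greedy structure; with extra constraints such as per-store capacity, a linear-programming formulation becomes the more natural choice.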

---

### Expected Output

Your submission should include:
- Cleaned and processed dataset.
- Data visualizations for EDA.
- RFM analysis and customer segmentation results.
- Machine learning model evaluation results with chosen model explanations.
- Code for the optimization function and test outputs.

---

This test gives a comprehensive view of a candidate's abilities in data manipulation, statistical analysis, machine learning, and optimization, all of which are essential for a Senior Data Scientist role.