# Week 8 Minor Assignment: Statistical Business Testing

## Assignment Overview
Apply statistical testing techniques to real business scenarios using the Olist e-commerce dataset. You will conduct hypothesis tests, interpret results, and provide business recommendations.

## Learning Objectives
By completing this assignment, you will:
- Apply appropriate statistical tests to business questions
- Interpret p-values and effect sizes in a business context
- Correct for multiple comparisons when running several tests
- Communicate statistical findings to business stakeholders
- Make data-driven business recommendations

## Submission Requirements
- Complete all sections with your code and analysis
- Provide business interpretations for all statistical tests
- Include at least 3 specific business recommendations
- Submit the executed notebook as a .ipynb file with all outputs visible, along with the .pdf export listed under Submission Guidelines

## Section 1: Environment Setup (10 points)

Set up your environment and establish a database connection.

In [None]:
# TODO: Import necessary libraries
# Include: pandas, numpy, matplotlib, seaborn, scipy.stats, statsmodels
# Also include database connectivity libraries

# Your code here


In [None]:
# TODO: Set up database connection
# Use environment variables for credentials
# Test your connection

# Your code here
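
One way to keep credentials out of the notebook is to assemble the connection URL from environment variables. The sketch below uses only the standard library. The variable names (`OLIST_DB_USER` and so on), the default port, and the PostgreSQL URL scheme are assumptions for illustration; substitute whatever your course environment defines.

```python
import os
from urllib.parse import quote_plus


def build_db_url(env=None):
    """Assemble a database URL from environment variables.

    The OLIST_DB_* names are hypothetical; rename them to match your setup.
    """
    env = os.environ if env is None else env
    required = ["OLIST_DB_USER", "OLIST_DB_PASSWORD", "OLIST_DB_HOST", "OLIST_DB_NAME"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail early with a clear message instead of a cryptic driver error
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")

    # Percent-encode credentials so characters like '@' do not break the URL
    user = quote_plus(env["OLIST_DB_USER"])
    password = quote_plus(env["OLIST_DB_PASSWORD"])
    host = env["OLIST_DB_HOST"]
    port = env.get("OLIST_DB_PORT", "5432")  # assumed default port
    name = env["OLIST_DB_NAME"]
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"


# Demonstration with a stand-in mapping rather than the real environment
demo_env = {
    "OLIST_DB_USER": "analyst",
    "OLIST_DB_PASSWORD": "p@ss",
    "OLIST_DB_HOST": "localhost",
    "OLIST_DB_NAME": "olist",
}
demo_url = build_db_url(demo_env)
```

The resulting string can be passed to a connector of your choice, for example `sqlalchemy.create_engine`, and a trivial query such as `SELECT 1` is usually enough to confirm the connection works.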


## Section 2: Regional Customer Satisfaction Analysis (25 points)

**Business Question**: Does customer satisfaction vary significantly across Brazilian regions?

**Your Tasks**:
1. Load customer satisfaction data by region
2. Perform an appropriate statistical test
3. Calculate an effect size
4. Provide business interpretation and recommendations

In [None]:
# TODO: Write SQL query to load regional satisfaction data
# Include: customer region, review scores, order characteristics
# Map states to regions (Southeast, South, Northeast, etc.)

query = """
-- Your SQL query here
-- Hint: Join customers, orders, reviews, and map states to regions
"""

# Your code here
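
The state-to-region mapping can also be done in Python after the query returns. The sketch below groups the 26 states and the Federal District into Brazil's five official macro-regions. The helper name `region_for` is hypothetical, and the English region labels are one possible choice.

```python
# Brazil's five macro-regions and their two-letter state codes
REGION_STATES = {
    "North": ["AC", "AP", "AM", "PA", "RO", "RR", "TO"],
    "Northeast": ["AL", "BA", "CE", "MA", "PB", "PE", "PI", "RN", "SE"],
    "Central-West": ["DF", "GO", "MT", "MS"],
    "Southeast": ["ES", "MG", "RJ", "SP"],
    "South": ["PR", "RS", "SC"],
}

# Invert to a flat lookup: state code -> region name
STATE_TO_REGION = {
    state: region
    for region, states in REGION_STATES.items()
    for state in states
}


def region_for(state_code):
    """Return the region for a state code, or 'Unknown' if unrecognized."""
    return STATE_TO_REGION.get(state_code.strip().upper(), "Unknown")
```

With a pandas DataFrame, the same lookup could be applied with something like `df["region"] = df["customer_state"].map(STATE_TO_REGION)`, assuming the state column is named `customer_state` in your query result.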


In [None]:
# TODO: Perform statistical analysis
# 1. Check assumptions (normality, equal variance)
# 2. Choose appropriate test (ANOVA or Kruskal-Wallis)
# 3. Calculate effect size
# 4. If significant, perform post-hoc tests

# Your code here
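
As a rough illustration of the four steps above, the sketch below runs the assumption checks, a Kruskal-Wallis test, and an epsilon-squared effect size. The data are synthetic stand-ins generated with NumPy, not Olist results, and the choice of Kruskal-Wallis here is an assumption: review scores are ordinal (1 to 5), so the normality assumption behind ANOVA often fails, but you should confirm this on your own data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
scores = [1, 2, 3, 4, 5]

# Synthetic review scores per region (illustration only, NOT real Olist data)
groups = {
    "Southeast": rng.choice(scores, size=300, p=[0.10, 0.05, 0.10, 0.20, 0.55]),
    "South": rng.choice(scores, size=200, p=[0.10, 0.05, 0.10, 0.20, 0.55]),
    "Northeast": rng.choice(scores, size=150, p=[0.20, 0.08, 0.12, 0.20, 0.40]),
}
samples = list(groups.values())

# 1. Assumption checks: Shapiro-Wilk (normality) and Levene (equal variance)
shapiro_p = {name: stats.shapiro(values)[1] for name, values in groups.items()}
levene_stat, levene_p = stats.levene(*samples)

# 2. Rank-based omnibus test across all regions
h_stat, kw_p = stats.kruskal(*samples)

# 3. Effect size: epsilon-squared = H / (n - 1), ranging from 0 to 1
n_total = sum(len(values) for values in samples)
epsilon_sq = h_stat / (n_total - 1)

print(f"Levene p = {levene_p:.4f}")
print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {kw_p:.4f}, epsilon^2 = {epsilon_sq:.4f}")
```

If the omnibus test is significant, step 4 would follow with pairwise comparisons (for example `scipy.stats.mannwhitneyu` on each pair of regions) and a multiple-comparison correction applied to the resulting p-values.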


### Business Interpretation (Section 2)

**TODO**: Write your business interpretation here:
- What do the statistical results mean for the business?
- Which regions show higher/lower satisfaction?
- What actions should the company take based on these findings?

*Your interpretation here (3-5 sentences)*

## Section 3: Product Category Performance Testing (25 points)

**Business Question**: Do certain product categories have significantly different delivery times?

**Your Tasks**:
1. Load delivery time data for top 5 product categories
2. Test for significant differences in delivery times
3. Identify categories with fastest/slowest delivery
4. Recommend operational improvements

In [None]:
# TODO: Load delivery time data by product category
# Calculate delivery days as the difference between the purchase and customer delivery timestamps
# Exclude orders that have not been delivered (missing delivery timestamp)
# Focus on top 5 categories by order volume

# Your code here
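
The two calculations described above can be sketched with the standard library alone. The timestamp layout (`YYYY-MM-DD HH:MM:SS`) and both helper names are assumptions for illustration; adjust them to match the columns your query returns.

```python
from collections import Counter
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout


def delivery_days(purchased_at, delivered_at):
    """Return elapsed days from purchase to delivery, or None if undelivered."""
    if not purchased_at or not delivered_at:
        return None
    start = datetime.strptime(purchased_at, TIMESTAMP_FORMAT)
    end = datetime.strptime(delivered_at, TIMESTAMP_FORMAT)
    # Fractional days keep more information than whole-day rounding
    return (end - start).total_seconds() / 86400


def top_categories(categories, n=5):
    """Return the n most frequent category labels, most frequent first."""
    return [name for name, _ in Counter(categories).most_common(n)]


# Made-up values to show the behaviour
example_days = delivery_days("2018-01-01 10:00:00", "2018-01-08 22:00:00")
example_top = top_categories(
    ["toys", "toys", "toys", "books", "books", "garden"], n=2
)
print(example_days, example_top)
```

In pandas, the equivalent would be to convert both columns with `pd.to_datetime`, subtract them, and divide `.dt.total_seconds()` by 86400.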


In [None]:
# TODO: Perform statistical analysis on delivery times
# 1. Visualize delivery time distributions by category
# 2. Test for significant differences
# 3. Calculate practical effect sizes
# 4. Identify best and worst performing categories

# Your code here
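
Comparing five categories pairwise produces ten tests, so the raw p-values need a multiple-comparison correction. The sketch below implements the Holm step-down adjustment in plain Python to show the mechanics; the function name `holm_adjust` is hypothetical and the input p-values are made up.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted values never decrease along the ranking
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


raw_p = [0.01, 0.04, 0.03]  # made-up p-values for illustration
adjusted_p = holm_adjust(raw_p)
print(adjusted_p)
```

In practice, `statsmodels.stats.multitest.multipletests` with `method="holm"` offers the same adjustment, and comparing its output against a hand-rolled version is a useful sanity check.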


### Business Interpretation (Section 3)

**TODO**: Write your business interpretation here:
- Which categories have significantly different delivery times?
- What might explain these differences?
- What operational changes would you recommend?

*Your interpretation here (3-5 sentences)*

## Section 4: A/B Testing Simulation (25 points)

**Business Scenario**: The company wants to test whether a new payment method recommendation system increases average order value.

**Your Tasks**:
1. Simulate A/B test data (control vs. treatment groups)
2. Test for a significant difference in order values
3. Calculate confidence intervals
4. Estimate business impact of the change

In [None]:
# TODO: Load order value data and simulate A/B test
# 1. Load recent order data
# 2. Randomly assign orders to control (A) and treatment (B) groups
# 3. Simulate a 5% improvement in treatment group order values
# 4. Add realistic noise to the simulation

# Your code here
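
A minimal sketch of the simulation steps, assuming NumPy is available. The log-normal order values are a stand-in for the data you load from the database, and the noise level (a standard deviation of 2 percentage points around the 5% uplift) is an arbitrary assumption you should justify or change.

```python
import numpy as np

rng = np.random.default_rng(2024)  # fixed seed for reproducibility

# 1. Stand-in for real order values; replace with values loaded from the database
order_values = rng.lognormal(mean=4.5, sigma=0.8, size=2000)

# 2. Random 50/50 assignment: exactly half of the orders go to treatment
n_orders = len(order_values)
is_treatment = rng.permutation(n_orders) < n_orders // 2

# 3-4. Apply a 5% average uplift with per-order noise to the treatment group
uplift = rng.normal(loc=1.05, scale=0.02, size=int(is_treatment.sum()))
simulated = order_values.copy()
simulated[is_treatment] *= uplift

control = simulated[~is_treatment]
treatment = simulated[is_treatment]
print(f"control mean = {control.mean():.2f}, treatment mean = {treatment.mean():.2f}")
```

Because assignment is random, the two group means can differ by chance even before the uplift is applied, which is exactly the variability the subsequent hypothesis test has to account for.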


In [None]:
# TODO: Analyze A/B test results
# 1. Perform a two-sample t-test (consider Welch's version if group variances differ)
# 2. Calculate confidence intervals for the difference
# 3. Calculate effect size (Cohen's d)
# 4. Estimate monthly/annual revenue impact

# Your code here
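
The analysis steps can be sketched as follows, assuming NumPy and SciPy. This version uses Welch's t-test (`equal_var=False`), which is one variant of the two-sample t-test and does not assume equal variances. The data are synthetic, and `monthly_orders` is a hypothetical figure used only to show the revenue arithmetic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Synthetic stand-ins for the two groups (illustration only)
control = rng.lognormal(4.5, 0.8, 1000)
treatment = rng.lognormal(4.5, 0.8, 1000) * 1.05

# 1. Welch's two-sample t-test
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)

# 2. 95% confidence interval for the difference in means (Welch-Satterthwaite df)
diff = treatment.mean() - control.mean()
var_t = treatment.var(ddof=1) / len(treatment)
var_c = control.var(ddof=1) / len(control)
se = np.sqrt(var_t + var_c)
dof = (var_t + var_c) ** 2 / (
    var_t**2 / (len(treatment) - 1) + var_c**2 / (len(control) - 1)
)
t_crit = stats.t.ppf(0.975, dof)
ci_low, ci_high = diff - t_crit * se, diff + t_crit * se

# 3. Cohen's d using the pooled standard deviation
pooled_sd = np.sqrt(
    ((len(treatment) - 1) * treatment.var(ddof=1)
     + (len(control) - 1) * control.var(ddof=1))
    / (len(treatment) + len(control) - 2)
)
cohens_d = diff / pooled_sd

# 4. Revenue impact under a hypothetical order volume
monthly_orders = 10_000  # hypothetical; use the real monthly order count
monthly_impact = diff * monthly_orders
annual_impact = monthly_impact * 12

print(f"t = {t_stat:.2f}, p = {p_value:.4f}, d = {cohens_d:.3f}")
print(f"95% CI for difference: [{ci_low:.2f}, {ci_high:.2f}]")
```

Applying the same multiplication to `ci_low` and `ci_high` gives a revenue range rather than a single number, which is usually a more honest summary for stakeholders than the point estimate alone.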


### Business Interpretation (Section 4)

**TODO**: Write your business interpretation here:
- Is the difference statistically significant?
- What is the estimated revenue impact?
- Would you recommend implementing the new system?
- What additional considerations should be made?

*Your interpretation here (4-6 sentences)*

## Section 5: Customer Behavior Analysis (15 points)

**Business Question**: Is there a relationship between customer location (urban vs. non-urban) and purchasing behavior?

**Your Tasks**:
1. Classify customers as urban vs. non-urban based on state
2. Compare average order values between groups
3. Test for an association between location and high-value purchases (state the threshold you use to define "high-value")
4. Provide marketing recommendations

In [None]:
# TODO: Analyze urban vs. non-urban customer behavior
# 1. Load customer and order data
# 2. Classify states as urban (SP, RJ) vs. non-urban (others); treat this as a coarse state-level proxy
# 3. Compare order values and purchasing patterns
# 4. Perform appropriate statistical tests

# Your code here
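
For the association test, a chi-square test of independence on a 2x2 contingency table is one reasonable option. The counts below are hypothetical, and "high-value" is assumed to mean an order above some threshold you choose (for example the 75th percentile of order value).

```python
import numpy as np
from scipy import stats

# Rows: urban, non-urban. Columns: high-value, not high-value.
# Hypothetical counts for illustration only.
table = np.array([
    [320, 1680],
    [210, 1790],
])

# SciPy applies Yates' continuity correction by default for 2x2 tables
chi2, p_value, dof, expected = stats.chi2_contingency(table)

# Effect sizes: Cramer's V (0 to 1) and the odds ratio
n = table.sum()
cramers_v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))
odds_ratio = (table[0, 0] * table[1, 1]) / (table[0, 1] * table[1, 0])

print(f"chi2 = {chi2:.2f}, p = {p_value:.4g}, V = {cramers_v:.3f}, OR = {odds_ratio:.2f}")
```

Comparing average order values between the two groups is a separate question and would typically use a two-sample t-test or a Mann-Whitney U test, depending on how skewed the order values are.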


### Business Interpretation (Section 5)

**TODO**: Write your business interpretation here:
- Do urban and non-urban customers behave differently?
- What marketing strategies would you recommend for each group?

*Your interpretation here (2-3 sentences)*

## Section 6: Executive Summary and Business Recommendations

**TODO**: Provide a comprehensive executive summary of your findings and recommendations.

### Key Findings Summary
*Summarize the main statistical findings from all sections (3-4 bullet points)*

### Business Recommendations
*Provide at least 3 specific, actionable business recommendations based on your analysis*

1. **Recommendation 1**: *Your recommendation here*
   - *Rationale based on your statistical analysis*
   - *Expected business impact*

2. **Recommendation 2**: *Your recommendation here*
   - *Rationale based on your statistical analysis*
   - *Expected business impact*

3. **Recommendation 3**: *Your recommendation here*
   - *Rationale based on your statistical analysis*
   - *Expected business impact*

### Statistical Methodology Notes
*Briefly explain the statistical methods you used and why they were appropriate for each business question*

### Limitations and Future Analysis
*Identify any limitations in your analysis and suggest areas for future investigation*

## Grading Rubric

| Component | Points | Criteria |
|-----------|--------|-----------|
| Environment Setup | 10 | Proper imports, database connection, code organization |
| Regional Satisfaction Analysis | 25 | Correct statistical test, interpretation, business insights |
| Category Delivery Analysis | 25 | Appropriate methodology, clear visualizations, actionable findings |
| A/B Testing Simulation | 25 | Proper experimental design, statistical analysis, business impact |
| Customer Behavior Analysis | 15 | Correct classification, statistical testing, marketing insights |
| **Total** | **100** | |

### Additional Evaluation Criteria
- **Code Quality**: Clean, well-commented code with appropriate error handling
- **Statistical Rigor**: Proper test selection, assumption checking, multiple comparison correction
- **Business Relevance**: Clear connection between statistical results and business implications
- **Communication**: Clear explanations that non-technical stakeholders could understand

### Submission Guidelines
1. Ensure all code cells have been executed and outputs are visible
2. Include your name and date at the top of the notebook
3. Save as both .ipynb and .pdf formats
4. Submit by [Assignment Due Date]

**Note**: Use a significance level of α = 0.05 and check each test's assumptions before applying it. When assumptions are violated, use a non-parametric alternative or a suitable transformation.