## 🛒 Analytics Task: Store-Level Sales Prediction for OC&Grocery

### 📌 Context
OC&Grocery is a **new client** undergoing a **5-year strategy refresh**. As part of this transformation, they have asked our team at OC&C to help **analyse store-level performance drivers** to support better decision-making.

### 🎯 Objective
The goal is to **predict annual sales at the store level** using provided data on store attributes. These predictions will be used to:

- Improve the **efficiency of the store portfolio**
- Support **strategic expansion** beyond the current store footprint
- Help OC&Grocery **reduce food waste** and enhance **customer satisfaction**

### 📂 Provided Data
You have access to two datasets:
- **Store Data:** Contains various attributes about each store (e.g., location, size, format)
- **Sales Data:** Provides the **annual sales figures** for each store

> ⚠️ *Not all store attributes will be relevant; part of your task is to determine which features matter most.*
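
Since the attributes and the target live in separate files, the two tables need to be joined before any modelling. The sketch below shows one way to do that with `DataFrame.merge`. The toy tables and the column names `store_id`, `size_sqm`, `format` and `annual_sales` are assumptions made for illustration; check the real files for the actual key and column names.

```python
import pandas as pd

# Hypothetical toy tables; the real column names and values may differ.
store_df = pd.DataFrame({
    "store_id": [1, 2, 3, 4],
    "size_sqm": [850, 1200, 400, 950],
    "format": ["supermarket", "hypermarket", "convenience", "supermarket"],
})
sales_df = pd.DataFrame({
    "store_id": [1, 2, 3],
    "annual_sales": [4.1e6, 7.8e6, 1.2e6],
})

# Left join keeps every store; validate raises if the key is duplicated
# on either side, and indicator adds a "_merge" column showing the match status.
merged = store_df.merge(
    sales_df, on="store_id", how="left", validate="one_to_one", indicator=True
)

# Stores with a sales figure can be used for training;
# stores without one are candidates for prediction.
labelled = merged[merged["_merge"] == "both"].drop(columns="_merge")
unlabelled = merged[merged["_merge"] == "left_only"].drop(columns="_merge")

print(f"{len(labelled)} stores with sales, {len(unlabelled)} without")
```

A left join is used here on the assumption that some stores may have no sales figure; if every store has one, an inner join gives the same result.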

### 🧠 Your Task
1. **Explore and clean** the datasets.
2. **Select relevant features** from the store data.
3. **Train a predictive model** that estimates annual sales for a subset of stores.
4. **Evaluate the model's performance** and ensure it is suitable for strategic recommendations.

> 📝 **Reminder:** Focus on balancing your time effectively between **Analytics** and **Strategy**. Avoid over-optimizing the model; spend your time wisely.
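
As a rough illustration of steps 2 to 4, the sketch below trains a random forest on synthetic data and reports hold-out metrics and feature importances. Everything about the data is an assumption: the columns `size_sqm`, `format`, `noise_attr` and `annual_sales`, and the formula used to generate sales, are invented for the example and say nothing about the real datasets. It is a minimal baseline, not a recommended final model.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the merged store and sales table (hypothetical columns).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "size_sqm": rng.uniform(200, 2000, n),
    "format": rng.choice(["convenience", "supermarket"], n),
    "noise_attr": rng.normal(size=n),  # deliberately irrelevant attribute
})
df["annual_sales"] = (
    3000 * df["size_sqm"]
    + 500_000 * (df["format"] == "supermarket")
    + rng.normal(0, 100_000, n)
)

# One-hot encode the categorical column so the model receives numeric inputs.
X = pd.get_dummies(df.drop(columns="annual_sales"), columns=["format"], dtype=float)
y = df["annual_sales"]

# Hold out 20% of stores to estimate performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

pred = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, pred))
r2 = r2_score(y_test, pred)

# Impurity-based importances give a first indication of which features matter.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)

print(f"RMSE: {rmse:,.0f}  R^2: {r2:.3f}")
print(importances)
```

RMSE is in the same units as sales, which makes it easier to discuss with the client than MSE. Impurity-based importances can be misleading for correlated or high-cardinality features, so treat the ranking as a starting point for feature selection rather than a conclusion.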


In [1]:
# Step 1: Install any required packages (if needed)
# Uncomment if the packages don't seem to be installed
# !pip install pandas scikit-learn matplotlib seaborn

# Step 2: Import standard data science libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score

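# Step 3: Upload files from your local machine (Google Colab only)
# Make sure you have downloaded them from the Google Drive first!
try:
    from google.colab import files
    uploaded = files.upload()
except ImportError:
    # Not running in Colab: place the CSV files next to this notebook instead
    pass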

# Step 4: Load your CSV files into Pandas DataFrames
# Replace 'store_data.csv' and 'sales_data.csv' with your actual filenames
store_df = pd.read_csv('store_data.csv')
sales_df = pd.read_csv('sales_data.csv')

# Step 5: Quick look at the data
print("Store data:")
display(store_df.head())

print("\nSales data:")
display(sales_df.head())



In [None]:
# Your code here!