# 📈 Elasticity Project: Model Summary

This model focuses on **Price Elasticity of Demand (PED)** and its effect on total revenue. It allows users to explore how changes in price influence quantity demanded and overall sales performance.

## ✅ **Key Components:**

1. **Price Elasticity of Demand (PED) Calculation**

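   The elasticity is calculated using the midpoint (arc) formula, which measures each percentage change against the average of the two values. This gives the same elasticity whether price moves from \( P_1 \) to \( P_2 \) or back: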

   $$
   E_d = \frac{\frac{Q_2 - Q_1}{(Q_2 + Q_1)/2}}{\frac{P_2 - P_1}{(P_2 + P_1)/2}}
   $$

   Where:
   - \( Q_1 \), \( Q_2 \) = Original and new quantity demanded.
   - \( P_1 \), \( P_2 \) = Original and new price.

2. **Elasticity Classification**

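   The model classifies elasticity by its magnitude (for a normal good \( E_d \) is negative, because price and quantity move in opposite directions):
   - **Elastic** if \( |E_d| > 1 \)
   - **Inelastic** if \( |E_d| < 1 \)
   - **Unitary Elastic** if \( |E_d| = 1 \)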

3. **Revenue Impact Calculation**

   We calculate **Total Revenue (TR)** before and after the price change:

   $$
   TR_1 = P_1 \times Q_1
   $$

   $$
   TR_2 = P_2 \times Q_2
   $$

   The **change in revenue** is expressed as:

   $$
   \Delta TR = TR_2 - TR_1
   $$

4. **Visualizations**

   - **Demand Curve Plot:**
     Shows the demand curve and the movement along it as the user changes the price.
   - **Revenue Comparison:**
     Displays side-by-side revenue before and after the price change.

5. **User Inputs (via Sliders):**
   - Initial price (\( P_1 \))
   - Initial quantity (\( Q_1 \))
   - % change in price (\( \%\Delta P \))

6. **Output:**
   - New price & quantity estimates.
   - Elasticity classification (with interpretation).
   - Revenue before & after (with impact summary).
   - Interactive graph updates in real-time.
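The three calculations above (midpoint elasticity, classification and revenue change) can be sketched in plain Python. This is a minimal sketch, not the project's implementation, and the function names are hypothetical. Classification uses `abs(E_d)` with a small tolerance, so that floating-point noise near 1 still counts as unitary.

```python
import math


def midpoint_elasticity(p1: float, q1: float, p2: float, q2: float) -> float:
    """Price elasticity of demand via the midpoint (arc) formula."""
    pct_change_q = (q2 - q1) / ((q2 + q1) / 2)
    pct_change_p = (p2 - p1) / ((p2 + p1) / 2)
    return pct_change_q / pct_change_p


def classify_elasticity(e_d: float, tol: float = 1e-9) -> str:
    """Classify by magnitude, since E_d is negative for a normal good."""
    magnitude = abs(e_d)
    if math.isclose(magnitude, 1.0, abs_tol=tol):
        return "Unitary Elastic"
    return "Elastic" if magnitude > 1 else "Inelastic"


def revenue_change(p1: float, q1: float, p2: float, q2: float) -> tuple:
    """Return (TR_1, TR_2, delta TR)."""
    tr1 = p1 * q1
    tr2 = p2 * q2
    return tr1, tr2, tr2 - tr1


# Example: price rises from 10 to 12 and quantity falls from 100 to 80
e_d = midpoint_elasticity(10, 100, 12, 80)
print(round(e_d, 3), classify_elasticity(e_d))  # -1.222 Elastic
print(revenue_change(10, 100, 12, 80))          # (1000, 960, -40)
```

Because demand is elastic in this example, the price increase lowers total revenue. The sketch does not guard against \( P_1 = P_2 \) (no price change), where elasticity is undefined.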


In [1]:
import pandas as pd
import numpy as np
import joblib
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import PolynomialFeatures


In [2]:
processed_data = pd.read_csv('../data/processed/processed_data.csv')

## ✅ Check data is clean

In [3]:
processed_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 843482 entries, 0 to 843481
Data columns (total 9 columns):
 #   Column         Non-Null Count   Dtype 
---  ------         --------------   ----- 
 0   Date           843482 non-null  object
 1   Store          843482 non-null  int64 
 2   DayOfWeek      843482 non-null  int64 
 3   Sales          843482 non-null  int64 
 4   Customers      843482 non-null  int64 
 5   Open           843482 non-null  int64 
 6   Promo          843482 non-null  int64 
 7   StateHoliday   843482 non-null  int64 
 8   SchoolHoliday  843482 non-null  int64 
dtypes: int64(8), object(1)
memory usage: 57.9+ MB


In [4]:
processed_data.head()


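|   | Date       | Store | DayOfWeek | Sales | Customers | Open | Promo | StateHoliday | SchoolHoliday |
|---|------------|-------|-----------|-------|-----------|------|-------|--------------|---------------|
| 0 | 2015-07-31 | 1     | 5         | 5263  | 555       | 1    | 1     | 0            | 1             |
| 1 | 2015-07-31 | 2     | 5         | 6064  | 625       | 1    | 1     | 0            | 1             |
| 2 | 2015-07-31 | 3     | 5         | 8314  | 821       | 1    | 1     | 0            | 1             |
| 3 | 2015-07-31 | 4     | 5         | 13995 | 1498      | 1    | 1     | 0            | 1             |
| 4 | 2015-07-31 | 5     | 5         | 4822  | 559       | 1    | 1     | 0            | 1             |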


## 🔥 First elasticity-style insight: Promo effect
- We can directly model the effect of `Promo` (binary: 0/1) on `Sales`. This tells you how much more (or less) you sell when running a promo vs. not running one.
- Even a simple OLS regression gives you the coefficient for `Promo`, which acts as a proxy elasticity for how responsive sales are to promotions.

## 💡 Let’s draft the steps:


### 1️⃣ Convert Date as before:

In [5]:
processed_data['Date'] = pd.to_datetime(processed_data['Date'])
processed_data['Month'] = processed_data['Date'].dt.month
processed_data['Year'] = processed_data['Date'].dt.year
processed_data['WeekOfYear'] = processed_data['Date'].dt.isocalendar().week


### 2️⃣ Filter to open stores only (because closed = 0 sales):

In [6]:
data_open = processed_data[processed_data['Open'] == 1]


### 3️⃣ Set up features & target:

In [7]:
features = ['Promo', 'StateHoliday', 'SchoolHoliday', 'DayOfWeek', 'Month', 'Year']
X = data_open[features]
y = data_open['Sales']


### 4️⃣ Linear regression:

In [8]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

lr = LinearRegression()
lr.fit(X_train, y_train)

print("Train R^2:", lr.score(X_train, y_train))
print("Test R^2:", lr.score(X_test, y_test))


Train R^2: 0.1502372392327186
Test R^2: 0.1487795746920464


### 5️⃣ Elasticity-like insight: Promo effect
After training, check the coefficients:

In [9]:
coef_table = pd.DataFrame({
    'Feature': X.columns,
    'Coefficient': lr.coef_
})
print(coef_table)


         Feature   Coefficient
0          Promo  2.158871e+03
1   StateHoliday  4.831691e-13
2  SchoolHoliday  7.052238e+01
3      DayOfWeek -1.367336e+02
4          Month  8.163864e+01
5           Year  2.024603e+02


## 🔍 Analysis of the model

### 🟢 Promo: +2158.87
💥 BOOM—this is your headline stat.

✅ On average, when a promo is running, sales increase by about 2,159 units compared to days when there’s no promo.

📈 This is your "promotion elasticity proxy"—while it’s not a percentage change (since we don’t have price), it tells you how sensitive sales are to the presence of a promotion.
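Elasticities are usually read as percentages, so it can help to express the absolute `Promo` effect as a percentage. One option is to divide it by baseline sales on non-promo days. This is a minimal sketch that assumes that baseline definition. `promo_lift_pct` is a hypothetical helper, and the toy numbers are illustrative, not project results.

```python
import pandas as pd


def promo_lift_pct(promo_coef: float, df: pd.DataFrame,
                   promo_col: str = "Promo", target: str = "Sales") -> float:
    """Express an absolute promo effect as a % of mean sales on non-promo days."""
    baseline = df.loc[df[promo_col] == 0, target].mean()
    return 100 * promo_coef / baseline


# Toy data for illustration only: non-promo baseline = (5000 + 7000) / 2 = 6000
toy = pd.DataFrame({"Promo": [0, 0, 1, 1], "Sales": [5000, 7000, 8000, 9000]})
print(f"{promo_lift_pct(2158.87, toy):.1f}% lift")  # 36.0% lift
```

In this notebook the call would presumably be `promo_lift_pct(lr.coef_[0], data_open)`, assuming `Promo` stays the first feature.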



### 🟠 StateHoliday: ~ 0 (4.8e-13)
That’s super tiny—basically no effect.

This tells us:

🏖️ Whether it’s a state holiday or not doesn’t seem to impact sales much in your data.

Do we know if this column had real variation (were there holidays at all?), or was it sparse? Worth checking with:

In [10]:
print(processed_data['StateHoliday'].value_counts())


StateHoliday
0    843482
Name: count, dtype: int64


## This tells us:

- ✅ 100% of your data points (843,482 rows) have StateHoliday = 0.
- ❌ No actual state holidays are present.

### 💡 Why did the model give us that tiny coefficient (~4.8e-13)?
- Because the `StateHoliday` feature is constant (it never changes), it gives the model no signal at all.

- In a linear regression with an intercept, a constant column is perfectly collinear with the intercept, so its effect cannot be separated from the baseline. Scikit-learn's least-squares solver effectively assigns it a coefficient of zero. The 4.8e-13 is floating-point round-off, not a real effect.
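A small synthetic check of this behavior, using made-up data rather than the project's: a column that is always 0 gets a coefficient of zero up to round-off, while a genuinely varying feature is recovered.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 1_000
promo = rng.integers(0, 2, size=n)   # varies: 0/1
state_holiday = np.zeros(n)          # constant, like StateHoliday in this dataset
sales = 5000 + 2000 * promo + rng.normal(0, 100, size=n)

X = np.column_stack([promo, state_holiday])
model = LinearRegression().fit(X, sales)
print(model.coef_)  # Promo close to 2000, constant column approximately 0
```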

### ✅ Next Steps?
- Remove StateHoliday from the feature list going forward because:
    - It’s useless here (no variation = no predictive power).
    - It adds computation without benefit (tree-based models, for example, will never split on a feature that has only one value).
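Constant columns can be found programmatically rather than by eye. Below is a minimal sketch using `nunique`. `drop_constant_columns` is a hypothetical helper name.

```python
import pandas as pd


def drop_constant_columns(df: pd.DataFrame) -> tuple[pd.DataFrame, list[str]]:
    """Drop columns with at most one distinct value (NaN counted as a value)."""
    constant = [col for col in df.columns if df[col].nunique(dropna=False) <= 1]
    return df.drop(columns=constant), constant


toy = pd.DataFrame({"Promo": [0, 1, 1], "StateHoliday": [0, 0, 0], "Open": [1, 1, 1]})
reduced, dropped = drop_constant_columns(toy)
print(dropped)                   # ['StateHoliday', 'Open']
print(reduced.columns.tolist())  # ['Promo']
```

Applied to `data_open`, this check would also flag `Open`, which is always 1 after filtering to open stores.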

### 🟡 SchoolHoliday: +70.5
This one's interesting:

- When there’s a school holiday, sales increase by ~71 units on average.
- Not a massive effect, but it’s positive.

✅ This makes intuitive sense—families might shop more when kids are out of school.

### 🔵 DayOfWeek: -136.7
This one tells us that as the day of the week increases (likely Monday=1 up to Sunday=7):

- Sales drop about 137 units per day going later in the week.
- It’s linear here, so it might not fully capture patterns like weekend spikes—this could be better handled later with dummy variables (categoricals).
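The dummy-variable alternative can be sketched with `pd.get_dummies`. With `drop_first=True`, Monday (1) becomes the reference day, so each day's coefficient reads as the difference from Monday and the dummies are not collinear with the intercept. The rows below are toy data only.

```python
import pandas as pd

toy = pd.DataFrame({"DayOfWeek": [1, 2, 3, 4, 5, 6, 7],
                    "Promo": [1, 1, 1, 1, 1, 0, 0]})

# One 0/1 column per day except the reference day (Monday=1)
encoded = pd.get_dummies(toy, columns=["DayOfWeek"], prefix="DOW",
                         drop_first=True, dtype=int)
print(encoded.columns.tolist())
# ['Promo', 'DOW_2', 'DOW_3', 'DOW_4', 'DOW_5', 'DOW_6', 'DOW_7']
```

When train and test are encoded separately, the test columns can be aligned to the training ones with `X_test.reindex(columns=X_train.columns, fill_value=0)`, so both sets carry the same dummies.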

### 🟣 Month: +81.6
Each later month in the year is associated with ~82 units more in sales.

- This may reflect seasonality trends (e.g., Q4 increases), but it’s a pretty small per-month bump.

### 🟤 Year: +202.5
Each year forward (i.e., one calendar year later) is associated with ~202 extra sales units.

- This suggests an upward trend year over year (maybe business growth, inflation, or other market factors).

## 🚦 What’s the Big Takeaway?

| 📊 **Feature**      | 💥 **Interpretation**                                                                                      |
|---------------------|----------------------------------------------------------------------------------------------------------|
| **Promo**           | 🔥 **Major impact: +2159 sales boost.** This is your *main elasticity-like driver.*                       |
| **StateHoliday**    | 💤 **No real effect.**                                                                                   |
| **SchoolHoliday**   | 👍 Small positive bump (~71 units).                                                                      |
| **DayOfWeek**       | 📉 Sales **decline by ~137 units** later in the week (might hint at a weekend lull—worth deeper analysis). |
| **Month**           | 📈 Slight positive trend across months (~82 units increase per month).                                    |
| **Year**            | 🚀 Solid +200 unit boost per year—suggests business growth or other long-term upward trend.               |


**Note:** The `StateHoliday` feature was removed from further modeling because the dataset contains no actual state holidays (100% of rows have `StateHoliday = 0`), making it a constant feature with no predictive value.


In [11]:
features = ['Promo', 'SchoolHoliday', 'DayOfWeek', 'Month', 'Year']


### 2️⃣ 🔄 Re-split your data:
Let’s keep things clean:

In [12]:
X = data_open[features]
y = data_open['Sales']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


### 3️⃣ 🚀 Refit the model:

In [32]:

# Fit model
lr = LinearRegression()
lr.fit(X_train, y_train)

# Predict on test set
y_pred = lr.predict(X_test)

# Calculate MSE and RMSE
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)

# Print metrics
print("Train R^2:", lr.score(X_train, y_train))
print("Test R^2:", lr.score(X_test, y_test))
print("Test MSE:", mse)
print("Test RMSE:", rmse)



Train R^2: 0.8197562708621939
Test R^2: 0.8176245322442448
Test MSE: 1751865.9184369168
Test RMSE: 1323.5807185196213


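**Note:** The execution count `In [32]` shows that this cell was re-run after the Random Forest cells further down had redefined `X_train` and `X_test` (10 columns, including `Customers` and `StoreEncoded`). The R² of ~0.82 printed above therefore reflects that larger feature set. The five-feature model defined here scores R² ≈ 0.15, as the `In [15]` output below confirms.

### 4️⃣ 🧐 Get the updated coefficients: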

In [14]:
coef_table = pd.DataFrame({
    'Feature': X.columns,
    'Coefficient': lr.coef_
})
print(coef_table)


         Feature  Coefficient
0          Promo  2158.870979
1  SchoolHoliday    70.522376
2      DayOfWeek  -136.733631
3          Month    81.638638
4           Year   202.460311


### ✅ What are the coefficients telling us?

| **Feature**       | **Coefficient** | **What it means**                                                                                           |
|-------------------|-----------------|------------------------------------------------------------------------------------------------------------|
| **Promo**         | 2158.87         | ➔ When `Promo` changes from 0 → 1 (no promo → promo), **sales increase by ~2159 units.**                    |
| **SchoolHoliday** | 70.52           | ➔ When `SchoolHoliday` changes from 0 → 1, **sales increase by ~71 units.**                                 |
| **DayOfWeek**     | -136.73         | ➔ For each *increment* in `DayOfWeek` (e.g., Monday=1 → Tuesday=2), **sales decrease by ~137 units.**       |
| **Month**         | 81.64           | ➔ For each *increment* in `Month` (e.g., January=1 → February=2), **sales increase by ~82 units.**          |
| **Year**          | 202.46          | ➔ For each *increment* in `Year` (i.e., one year later), **sales increase by ~202 units.**                                  |


### 🔄 Model Update: Removed `StateHoliday`

We re-ran the model after removing the `StateHoliday` feature (constant = 0). The updated model shows:

- ✅ Identical `Promo` effect (+2158.87 units).
- ✅ All other coefficients unchanged (to the displayed precision).
- ✅ Train/test R² unchanged (0.150 / 0.149), confirming `StateHoliday` had no predictive value.


In [15]:
print("Train R^2:", lr.score(X_train, y_train))
print("Test R^2:", lr.score(X_test, y_test))


Train R^2: 0.15023723923271926
Test R^2: 0.14877957469204728


### **Model Performance:**

- **Train R²:** 0.15
- **Test R²:** 0.15

This indicates the model explains ~15% of the variance in sales. While this is relatively low, it reflects the noisy nature of sales data and the limited feature set (no price data, no detailed store/product information). The model successfully captures general patterns (e.g., the strong positive effect of promotions) but is not suitable for high-precision forecasting in its current form.

#### **Next steps:**
- Introduce categorical encoding for `DayOfWeek` and `Month`.
- Add `Store` as a feature.
- Explore non-linear models (e.g., RandomForest).
- Investigate feature interactions (e.g., `Promo * DayOfWeek`).


### Check to see how many stores are included in the store data

In [16]:
print(data_open['Store'].value_counts())
print(data_open['Store'].nunique())


Store
562    918
85     918
423    918
262    918
682    918
      ... 
909    607
100    606
744    605
348    597
644    592
Name: count, Length: 1115, dtype: int64
1115


### The data shows there are 1,115 unique store IDs

### Why Move from Linear Regression to RandomForest?

The initial linear regression model provided useful directional insights but yielded low explanatory power (R² ~0.15). This is expected because linear regression assumes purely linear relationships between features and sales. However, real-world retail sales are influenced by complex, non-linear patterns—such as store-specific behavior, varying promo effectiveness, and seasonal effects.

Key reasons for adopting RandomForest:

- **Non-linear modeling:** RandomForest captures complex, non-linear relationships automatically, without requiring manual feature engineering (e.g., interaction terms between Promo and Store).
- **Better handling of categorical variables:** While linear regression requires one-hot encoding (adding 1,100+ dummy variables for `Store`), RandomForest efficiently handles categorical labels through simple label encoding.
- **Automatic interaction learning:** RandomForest naturally identifies important interactions, such as certain stores being more sensitive to promotions on specific days.
- **Improved predictive power:** Tree-based models typically yield higher R² in noisy retail datasets, offering more accurate predictions even with the same data.

For these reasons, RandomForest was selected as the next modeling step to improve performance and capture deeper patterns in the sales data.


### 📝 Feature Engineering and Model Improvement Plan
**Background:**<br>
Our initial linear regression model achieved an R² of ~0.15, indicating poor explanatory power. This low score suggested that key sources of variance were missing from the model or that linear assumptions were too restrictive.

**To address this, we decided to:**

- Introduce additional predictive features

- Move from a linear model to a non-linear model (Random Forest Regressor) to capture complex interactions.

#### Why add Store as a feature?
- Each store likely has its own baseline sales pattern due to:
    - Location differences
    - Customer demographics
    - Local promotions
    - Competition

👉 Therefore, ignoring the `Store` variable omits a major source of variance in sales.

*However:*
- Store is a categorical variable with 1,115 unique values
- Using one-hot encoding would add 1,115 new columns → inefficient and risks overfitting
- Tree-based models like Random Forest cannot handle string categories natively

### Chosen Encoding: Target (Mean) Encoding
We will encode Store by replacing each store ID with the average sales for that store in the training data.

#### ✅ Advantages:
- Captures store-specific sales level
- Keeps dimensionality low (single numeric feature)
- Easy interpretation
- Compatible with tree-based models

#### ⚠️ Risk:
- Data leakage: using target values to encode the same rows can cause overfitting

**To mitigate leakage**, we would ideally calculate mean sales only on the training fold during cross-validation. This lets us:
- ✅ Avoid data leakage from target encoding
- ✅ Preserve an unbiased validation/test set

**The solution is:**
- → Use out-of-fold mean encoding during training
- → Or, at minimum, calculate store means only on the training set and apply those mappings to the test set.
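The out-of-fold option can be sketched with scikit-learn's `KFold`. Each training row is encoded with store means computed from the other folds only, so its own `Sales` value never feeds its encoding. This also matters for `cross_val_score`: a `StoreEncoded` column computed on the whole training set lets each validation fold see its own targets. This is a minimal sketch. `oof_target_encode` is a hypothetical helper, and the global-mean fallback mirrors the approach used for unseen stores in the test set.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold


def oof_target_encode(train: pd.DataFrame, col: str, target: str,
                      n_splits: int = 5, seed: int = 42) -> pd.Series:
    """Out-of-fold mean encoding of `col` using `target` (training data only)."""
    encoded = pd.Series(np.nan, index=train.index, dtype=float)
    global_mean = train[target].mean()
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fit_idx, enc_idx in kf.split(train):
        # Means come only from the other folds, never from the rows being encoded
        fold_means = train.iloc[fit_idx].groupby(col)[target].mean()
        encoded.iloc[enc_idx] = train[col].iloc[enc_idx].map(fold_means).to_numpy()
    # Categories absent from the other folds fall back to the global mean
    return encoded.fillna(global_mean)


toy = pd.DataFrame({"Store": [1, 1, 1, 2, 2, 2], "Sales": [10, 20, 30, 100, 200, 300]})
print(oof_target_encode(toy, "Store", "Sales", n_splits=3))
```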

#### 📐 Why is this necessary?
If we calculate store means on all data:
- The encoding would “peek” at test data sales
- Random Forest would be fitting on target information baked into features
- Validation metrics would be overly optimistic

### 📝 Approach (no leakage):
We split data before encoding:

In [17]:
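# split BEFORE encoding; .copy() gives independent DataFrames so that adding
# columns below does not trigger pandas' SettingWithCopyWarning
train_data, test_data = train_test_split(data_open, test_size=0.2, random_state=42)
train_data, test_data = train_data.copy(), test_data.copy()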

# calculate store means only on training set
store_means = train_data.groupby('Store')['Sales'].mean()

# map means to both train and test
train_data['StoreEncoded'] = train_data['Store'].map(store_means)
test_data['StoreEncoded'] = test_data['Store'].map(store_means)

# for stores in test not seen in train → fill with global mean
global_mean = train_data['Sales'].mean()
test_data['StoreEncoded'] = test_data['StoreEncoded'].fillna(global_mean)


### Explanation:
- We split the data into train and test sets before encoding.
- We calculate the store means on the training set and apply them to both train and test sets.
- If a store is not present in the training set, we fill it with the global mean.
- 👉 The random_state parameter controls the random number generator that shuffles your data before splitting it.
    - If you don’t specify random_state, every time you run the code, you’d get a different split of train and test data.
    - But by setting random_state=42 (or any number), you’re telling Python:
        - "*Hey, I want the split to be random—but I want it to be the same random split every time I run this code.*"
    - ✅ This is key for reproducibility:
        - You’ll get the exact same train/test sets every time you run the script.
        - Someone else running your code (with the same random_state) will get the same results.
    - 🧐 Why the number 42?
        - The number 42 is just a popular in-joke from The Hitchhiker’s Guide to the Galaxy, where 42 is famously "the answer to life, the universe, and everything."
        - It could be any integer. What matters is reusing the same value so the split is reproducible.
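Here is a tiny demonstration of this point, using a toy array rather than the project data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

data = np.arange(20)  # stand-in for the rows of data_open

# Same random_state -> same shuffle -> same split on every run
a_train, a_test = train_test_split(data, test_size=0.2, random_state=42)
b_train, b_test = train_test_split(data, test_size=0.2, random_state=42)
print(np.array_equal(a_test, b_test))  # True

# A different seed typically gives a different (equally valid) split
c_train, c_test = train_test_split(data, test_size=0.2, random_state=7)
print(sorted(a_test.tolist()), sorted(c_test.tolist()))
```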

### ✅ Now we’re clean:

- StoreEncoded in train is based only on train sales
- StoreEncoded in test does not “peek” at its own target
- Unseen stores → fallback to global mean

### 🚨 Avoiding Target Leakage in Store Encoding
Target (mean) encoding introduces risk of data leakage when the target variable is used to encode both training and validation/test rows.

👉 Therefore, we calculate Store mean sales only on the training set and map those values to the test set.

If a store in the test set wasn’t seen in training, we fill it with the global mean sales from the training set.

### 🚦 Quick pre-modeling readiness checklist:
✅ 1. No missing values in the features you're feeding to the model?<br>
- Make sure no features (columns) still have NaNs or missing values.
- You already filled `StoreEncoded` for the test data, but check the other columns too.
- 📝 If you have missing values elsewhere, you might need to impute (mean/median) or drop those rows/columns.

In [18]:
print(train_data.isnull().sum())


Date             0
Store            0
DayOfWeek        0
Sales            0
Customers        0
Open             0
Promo            0
StateHoliday     0
SchoolHoliday    0
Month            0
Year             0
WeekOfYear       0
StoreEncoded     0
dtype: int64


✅ 2. All features numeric?

- Random Forests can’t handle raw categorical data → so check if any other columns are still categorical (like strings or objects).
- If any features are object or category types, you’ll need to encode them (one-hot encoding or another target encoding).

In [19]:
print(train_data.dtypes)


Date             datetime64[ns]
Store                     int64
DayOfWeek                 int64
Sales                     int64
Customers                 int64
Open                      int64
Promo                     int64
StateHoliday              int64
SchoolHoliday             int64
Month                     int32
Year                      int32
WeekOfYear               UInt32
StoreEncoded            float64
dtype: object


#### Results
- Date has datetime64 dtype which cannot be used in modeling
- Date should either be transformed or dropped

#### Random Forest can’t handle datetime columns directly. You’ve already extracted:

✅ Month<br>
✅ Year<br>
✅ WeekOfYear<br>
✅ DayOfWeek<br>

🎯 The data has effectively already decomposed Date into its useful features.

➡️ Therefore, `Date` can safely be dropped for modeling (along with the raw `Store` ID, which is now represented by `StoreEncoded`):

✅ 3. Do you have your X and y separated?

Need to define your features (X) and target (y):
- ⚠️ If test set doesn’t have Sales yet (because you’re predicting), then you won’t have y_test.

In [20]:
X_train = train_data.drop(['Sales', 'Date', 'Store'], axis=1)
y_train = train_data['Sales']

X_test = test_data.drop(['Sales', 'Date', 'Store'], axis=1)  # drop same cols
y_test = test_data['Sales']  # only if Sales exists in test set


✅ 4. Scaling/normalization:

Good news: Random Forest doesn’t require scaling or normalization.

✔️ You can skip StandardScaler or MinMaxScaler → that’s a win.

✅ 5. Any weird outliers or illogical values?

This step’s optional, but sometimes useful → a quick histogram or describe() check to see if there are crazy extreme values that might throw off the model.
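A minimal sketch of such a check combines `describe()` with a count of values outside Tukey's fences (1.5 × IQR beyond the quartiles). The 1.5 multiplier is a common convention, not a project rule. `iqr_outlier_counts` is a hypothetical helper, and the toy column is made up.

```python
import pandas as pd


def iqr_outlier_counts(df: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] for each numeric column."""
    numeric = df.select_dtypes("number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    outside = (numeric < q1 - k * iqr) | (numeric > q3 + k * iqr)
    return outside.sum()


toy = pd.DataFrame({"Customers": [500, 520, 540, 560, 580, 5000]})
print(toy.describe())
print(iqr_outlier_counts(toy))
```

In the notebook, the same call on `X_train` would give a per-feature count to review alongside `X_train.describe()`.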

In [21]:
print(X_train.shape)


(674785, 10)


In [31]:


# instantiate the model
# rf = RandomForestRegressor(random_state=42)
rf = RandomForestRegressor(n_estimators=10, random_state=42)


# fit the model
rf.fit(X_train, y_train)

# predict
predict = rf.predict(X_test)

# calculate metrics
mse = mean_squared_error(y_test, predict)
r2 = r2_score(y_test, predict)

# print results
print(f'MSE: {mse:.2f}')
print(f"RMSE: {np.sqrt(mse):.2f}")
print(f"R^2: {r2:.4f}")


MSE: 443952.70
RMSE: 666.30
R^2: 0.9538


### Metrics:
Test-set results for `RandomForestRegressor(n_estimators=N, random_state=42)`:

| **n_estimators** | **Time to execute** | **MSE**   | **RMSE** | **R^2** |
|------------------|---------------------|-----------|----------|---------|
| 10               | 1 min 49 sec        | 443952.70 | 666.30   | 0.9538  |
| 20               | 1 min 49 sec        | 422443.78 | 649.96   | 0.9560  |
| 30               | 1 min 49 sec        | 409439.60 | 639.87   | 0.9574  |
| 50               | 3.5 min             | 401733.90 | 633.82   | 0.9582  |
| 100              | 6.5 min             | 396158.47 | 629.41   | 0.9588  |



In [23]:
print(f"X_train shape: {X_train.shape}")
print(f"X_test shape: {X_test.shape}")
print(f"y_train shape: {y_train.shape}")
print(f"y_test shape: {y_test.shape}")


X_train shape: (674785, 10)
X_test shape: (168697, 10)
y_train shape: (674785,)
y_test shape: (168697,)


In [24]:
print("Sample predictions:", predict[:5])
print("Actual values:", y_test.iloc[:5])


Sample predictions: [7833.8  5800.04 9690.82 9788.06 8364.96]
Actual values: 23193     7560
98710     6286
188103    8475
653632    9507
499190    8081
Name: Sales, dtype: int64


In [25]:

cv_r2 = cross_val_score(rf, X_train, y_train, cv=5, scoring='r2')
print(f"Cross-validated R²: {cv_r2.mean():.4f}")


Cross-validated R²: 0.9555


#### Discussion:
- Linear regression (calendar/promo features only): R^2 ≈ 0.15
- Linear Regression with Polynomial Features (degree 2): R^2 = 0.8514
- Random Forest with 5 trees: R^2 = 0.9456
- Random Forest with 10 trees: R^2 = 0.9538
- Random Forest with 20 trees: R^2 = 0.9560
- Random Forest with 30 trees: R^2 = 0.9574
- Random Forest with 50 trees: R^2 = 0.9582
- Random Forest with 100 trees: R^2 = 0.9588

What does this mean?
- Linear regression on the calendar and promo features alone is not a good model for this data.
- The polynomial and Random Forest models were trained on the expanded feature set, which adds `StoreEncoded` and `Customers`. The jump from 0.15 to 0.85 therefore reflects the new features as well as the model change.
- On that same expanded feature set, plain linear regression already scores about 0.82 (the re-run `In [32]` output), so the polynomial terms add only a modest gain.
- Random Forest clearly outperforms both linear variants on the same features (≈0.95 vs. 0.82 to 0.85). This points to non-linear patterns and interactions that flexible, non-parametric methods capture better.
- R² converges quickly: even 5 trees reach 0.9456. This indicates strong predictor-target relationships that small ensembles already learn well.
- Going from 50 to 100 trees adds only 0.0006 to R² while nearly doubling run time (3.5 min → 6.5 min).
- For production, 50 trees is the recommended setting. Compared with a 5 to 20 tree forest, the extra trees improve prediction stability and reduce sensitivity to sampling variance at an acceptable compute cost.
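The tree-count comparison above doesn't have to refit each forest from scratch. With scikit-learn's `warm_start=True`, one forest can grow incrementally: raising `n_estimators` and calling `fit` again trains only the additional trees. Below is a minimal sketch on synthetic data from `make_regression` (not the project data), so the scores it prints are illustrative only. `n_jobs=-1` simply uses all CPU cores.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=8, noise=10.0, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

rf = RandomForestRegressor(warm_start=True, random_state=42, n_jobs=-1)
scores = {}
for n in [10, 20, 30, 50, 100]:
    rf.set_params(n_estimators=n)  # only the newly added trees are trained
    rf.fit(X_tr, y_tr)
    scores[n] = r2_score(y_te, rf.predict(X_te))

for n, r2 in scores.items():
    print(f"{n:>3} trees: R^2 = {r2:.4f}")
```

Because the trees are shared across steps, the whole sweep costs about as much as fitting the largest forest once.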

In [30]:



# Transform training data
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X_train)

# Fit model
lr = LinearRegression()
lr.fit(X_poly, y_train)

# Predict on test data
X_test_poly = poly.transform(X_test)
y_pred = lr.predict(X_test_poly)

# Calculate metrics
r2_poly = lr.score(X_test_poly, y_test)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)

# Print results
print(f'Poly R²: {r2_poly:.4f}')
print(f'MSE: {mse:.4f}')
print(f'RMSE: {rmse:.4f}')



Poly R²: 0.8514
MSE: 1427158.7358
RMSE: 1194.6375


## Summary of Model Performance

With 50 trees, the Random Forest predicts daily sales with a test RMSE of about 634 units and explains about 95.8% of the variance in sales (R² = 0.9582). Increasing the tree count past 50 yields minimal gains (R² = 0.9588 at 100 trees) but nearly doubles compute time (3.5 min → 6.5 min).

## Save Model

In [27]:
# save model
joblib.dump((rf, X_train.columns.tolist()), '../data/trained/random_forest_model_with_features.pkl')



['../data/trained/random_forest_model_with_features.pkl']
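The model is saved as a `(model, feature_names)` tuple. Loading it back should therefore unpack both parts and reorder new data to the saved column order before predicting. The self-contained sketch below round-trips a small stand-in model through a temporary file. The file name and toy data are illustrative; in this notebook the path would be `../data/trained/random_forest_model_with_features.pkl`.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Stand-in training data and model (illustrative only)
X = pd.DataFrame({"Promo": [0, 1, 0, 1], "DayOfWeek": [1, 2, 3, 4]})
y = pd.Series([5000, 7000, 5200, 7100])
model = RandomForestRegressor(n_estimators=5, random_state=42).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rf_with_features.pkl")
    joblib.dump((model, X.columns.tolist()), path)

    # Unpack the model and the feature order it was trained on
    loaded_model, feature_names = joblib.load(path)

# New rows may arrive with columns in a different order; select by saved names
new_rows = pd.DataFrame({"DayOfWeek": [5], "Promo": [1]})
preds = loaded_model.predict(new_rows[feature_names])
print(feature_names, preds)
```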

In [28]:
# Save test features and labels to CSV for visualization notebook
X_test.to_csv('../data/trained/X_test.csv', index=False)
y_test.to_csv('../data/trained/y_test.csv', index=False)
