# Combining Predictive Techniques

## Data Given

* StoreSalesData.csv - This file contains sales by product category for all existing stores for 2012, 2013, and 2014.
* StoreInformation.csv - This file contains location data for each of the stores.
* StoreDemographicData.csv - This file contains demographic data for the areas surrounding each of the existing stores and locations for new stores.

Load Packages

In [None]:
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import calinski_harabasz_score, silhouette_score, davies_bouldin_score
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

from statsmodels.tsa.exponential_smoothing.ets import ETSModel
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tools.eval_measures import rmse
from statsmodels.graphics import tsaplots

import matplotlib.pyplot as plt
# plt.style.use('seaborn-whitegrid')
plt.rcParams['figure.figsize'] = [11, 7]

Load Data

In [None]:
# Load Stores Sales
stores_sales_data = pd.read_csv('storesalesdata.csv')
# Bad data: 29-Feb-2014 is not a valid date, so drop those rows
# stores_sales_data = stores_sales_data.query('Date != "2014 02 29"')
# Convert the Date variable to a datetime object
# stores_sales_data = stores_sales_data.assign(Date = pd.to_datetime(stores_sales_data['Date']))

stores_sales_data.head(3)

In [None]:
# Load Store Information
store_information_data = pd.read_csv('storeinformation.csv')
store_information_data.head(3)

In [None]:
# Load Store Demographic Data
store_demographic_data = pd.read_csv('storedemographicdata.csv')
store_demographic_data.head(3)

## Task 1: Store Format (segments) for Existing Stores

To remedy the product surplus and shortages, the company wants to introduce different store formats. Each store format will have a different product selection in order to better match local demand. The actual building sizes will not change, just the product selection and internal layouts.

* Determine the optimal number of store formats based on sales data.
    - Sum sales data by StoreID and Year
    - Use percentage sales per category per store for clustering (category sales as a percentage of total store sales).
    - Use only 2015 sales data.
    - Use a K-means clustering model.

* Segment the 85 current stores into the different store formats.
* Use the StoreSalesData.csv and StoreInformation.csv files.
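
The "percentage sales per category per store" step listed above can be sketched on a toy table. The store IDs, the two categories and all sales figures below are made up for illustration; the real data has nine categories.

```python
import pandas as pd

# Hypothetical toy sales for two stores (values are illustrative only)
toy_sales = pd.DataFrame({
    'Store': ['S0001', 'S0001', 'S0002', 'S0002'],
    'Year': [2015, 2015, 2015, 2015],
    'Dairy': [100.0, 300.0, 50.0, 50.0],
    'Produce': [300.0, 300.0, 150.0, 150.0],
})
categories = ['Dairy', 'Produce']

# Sum sales by store and year
yearly = toy_sales.groupby(['Store', 'Year'], as_index=False)[categories].sum()
# Divide each category by the store's total so that every row sums to 1
shares = yearly[categories].div(yearly[categories].sum(axis=1), axis=0)
toy_shares = pd.concat([yearly[['Store', 'Year']], shares], axis=1)
print(toy_shares)
```

Clustering on shares rather than raw sales keeps large stores from dominating the distance calculation purely because of their size.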

## Task 1 Submission
1. What is the optimal number of store formats? How did you arrive at that number?
2. How many stores fall into each store format?
3. Based on the results of the clustering model, what is one way that the clusters differ from one another?
4. Please provide a map created in Tableau that shows the location of the existing stores, uses color to show cluster, and size to show total sales. Make sure to include a legend! Feel free to simply copy and paste the map into the submission template.

In [None]:
# Aggregate sum of sales by Store and Year
filtered_columns = ['Dry_Grocery', 'Dairy', 'Frozen_Food', 'Meat', 'Produce', 'Floral', 'Deli', 'Bakery', 'General_Merchandise']
filtered_stores_data =  stores_sales_data.groupby(['Store', 'Year'], as_index=False)[filtered_columns].sum()
# Add Store Total Sales per year
filtered_stores_data = filtered_stores_data.assign(Total_Sales = filtered_stores_data[filtered_columns].sum(axis=1))
# Calculate percentage sales per category per store
filtered_stores_data[filtered_columns] = filtered_stores_data[filtered_columns].div(filtered_stores_data['Total_Sales'], axis=0)

# Filter 2015 data
filtered_stores_sales_2015_data = filtered_stores_data.query('Year == 2015')

print('\nFiltered and Aggregated 2015 Store Data')
filtered_stores_sales_2015_data.head()


In [None]:
# Find the best number of clusters

# Fit K-means repeatedly for each candidate cluster count and return
# pandas describe() statistics of the chosen score metric
def cluster_number_test(raw_data, score_metric, number_test, cluster_number_list):
    df_list = []
    # Scale once; both the clustering and the scoring use the scaled data
    scaled_data = MinMaxScaler().fit_transform(raw_data)

    for i in cluster_number_list:
        score_list = []
        for _ in range(number_test):
            kmeans = KMeans(n_clusters=i)
            kmeans.fit(scaled_data)
            score = score_metric(scaled_data, kmeans.labels_)
            score_list.append(score)

        temp_df = pd.DataFrame(score_list)
        df_list.append(temp_df)

    column_names = [f'Cluster {i}' for i in cluster_number_list]
    _df = pd.concat(df_list, axis=1)
    _df.columns = column_names
    return _df.describe().round(2)


In [None]:
# Inputs for the cluster-count test
raw_data = filtered_stores_sales_2015_data[filtered_columns]
test_count = 100
possible_clusters = range(2,8)


In [None]:
# Run Test
c_h_score = cluster_number_test(raw_data, calinski_harabasz_score, test_count, possible_clusters)
sil_score = cluster_number_test(raw_data, silhouette_score, test_count, possible_clusters)
d_b_score = cluster_number_test(raw_data, davies_bouldin_score, test_count, possible_clusters)

In [None]:
# Metric: Calinski Harabasz Score - Higher is better
print(f'Run Calinski Harabasz Score Test {test_count} times - Higher is better')
print(c_h_score, '\n')
# Metric: Silhouette Score - Higher is better
print(f'Run Silhouette Score Test {test_count} times - Higher is better')
print(sil_score, '\n')
# Metric: Davies Bouldin Score - Smaller is better
print(f'Run Davies Bouldin Score Test {test_count} times - Smaller is better')
print(d_b_score)

In [None]:
# From the tests above, what is the best number of clusters?

##### Score Results #####
# Calinski Harabasz Score: 2 clusters
# Silhouette Score: 3 clusters
# Davies Bouldin Score: 3 clusters

best_number_cluster = 3
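
As a sanity check on the selection logic, the same "score each candidate k and keep the best" idea can be sketched on synthetic data where the true number of groups is known. The blob centers, sample size and seeds below are assumptions chosen for illustration, not values from the store data.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with three well-separated groups (illustrative only)
toy_X, _ = make_blobs(
    n_samples=90,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=3,
)

# Score each candidate cluster count with the silhouette metric
toy_scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=3).fit_predict(toy_X)
    toy_scores[k] = silhouette_score(toy_X, labels)

# Higher silhouette is better, so keep the k with the maximum score
best_k = max(toy_scores, key=toy_scores.get)
print(toy_scores)
print('Best k:', best_k)
```

On real data the metrics can disagree, as they do above, so the final choice is a judgment call rather than a single maximum.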

In [None]:
# Fit the final K-means model with the chosen number of clusters
kmeans = KMeans(n_clusters = best_number_cluster)
# Scale data
scaled_data = MinMaxScaler().fit_transform(raw_data)
kmeans.fit(scaled_data)
# Add cluster label to data
filtered_stores_sales_2015_data = filtered_stores_sales_2015_data.assign(Segment = kmeans.labels_)

# check numbers of stores in each Segment
print('\nNumber of stores in Segment')
print(filtered_stores_sales_2015_data['Segment'].value_counts())

In [None]:
# Cluster interpretation: use the centroids
columns_name = ['Dry_Grocery', 'Dairy', 'Frozen_Food', 'Meat', 'Produce', 'Floral', 'Deli', 'Bakery', 'General_Merchandise']
print('\nCluster centroids (min-max scaled category shares) per store format')
pd.DataFrame(kmeans.cluster_centers_, columns=columns_name).round(4)
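
The centroids printed above live in min-max scaled space, which makes them hard to read as sales shares. One possible way to map them back, assuming the fitted scaler object is kept around (the cell above discards it), is `inverse_transform`. The toy shares and names below are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# Hypothetical category shares for four stores and two categories
toy_shares = np.array([
    [0.40, 0.60],
    [0.42, 0.58],
    [0.25, 0.75],
    [0.27, 0.73],
])

# Keep the fitted scaler so centroids can be mapped back later
toy_scaler = MinMaxScaler().fit(toy_shares)
toy_kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
toy_kmeans.fit(toy_scaler.transform(toy_shares))

# Centroids expressed as category shares instead of scaled values
centroids_in_shares = toy_scaler.inverse_transform(toy_kmeans.cluster_centers_)
print(centroids_in_shares.round(4))
```

Because min-max scaling is linear, each back-transformed centroid equals the mean share of the stores in that cluster.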

In [None]:
# Merge the clustered 2015 store sales with the store information
filtered_stores_sales_2015_merged_information_data = filtered_stores_sales_2015_data.merge(store_information_data, how='inner', on='Store')
filtered_stores_sales_2015_merged_information_data.head(2)

In [None]:
# TODO
# Plot Clusters
# Scatter Markers Customizing: Size, Color, Shape, Transparency

# filtered_stores_sales_2015_merged_information_data.to_csv('filtered_stores_sales_2015_merged_information_data.csv')
# use above data to plot cluster map in Tableau

## Task 2: Store Format for New Stores

The grocery store chain has 10 new stores opening up at the beginning of the year. The company wants to determine which store format each of the new stores should have. However, we don’t have sales data for these new stores yet, so we’ll have to determine the format using each of the new store’s demographic data.

You’ve been asked to:

* Develop a model that predicts which segment a store falls into based on the demographic and socioeconomic characteristics of the population that resides in the area around each new store.
* Use a 20% validation sample with Random Seed = 3 when creating samples with which to compare the accuracy of the models. Make sure to compare a decision tree, forest, and boosted model.
* Use the model to predict the best store format for each of the 10 new stores.
* Use the StoreDemographicData.csv file, which contains the information for the area around each store.

Note: In a real world scenario, you could use PCA to reduce the number of predictor variables. However, there is no need to do so in this project. You can leave all predictor variables in the model.
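
For reference, the PCA option mentioned in the note could look roughly like the sketch below. It runs on synthetic data shaped like the demographic table (85 rows, 44 columns); the latent-factor construction, the 95% variance threshold and the use of `StandardScaler` are assumptions made for illustration and are not used elsewhere in this notebook.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)

# Hypothetical stand-in for 85 stores x 44 correlated demographic predictors
latent = rng.normal(size=(85, 5))
mixing = rng.normal(size=(5, 44))
toy_X = latent @ mixing + 0.05 * rng.normal(size=(85, 44))

# Standardize first so no single predictor dominates the components
toy_X_scaled = StandardScaler().fit_transform(toy_X)

# A float n_components keeps just enough components to explain that share of variance
pca = PCA(n_components=0.95)
toy_X_reduced = pca.fit_transform(toy_X_scaled)

print('Reduced shape:', toy_X_reduced.shape)
print('Explained variance:', pca.explained_variance_ratio_.sum().round(3))
```

The reduced matrix would then replace the 44 raw predictors as classifier input, at the cost of less interpretable feature importances.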


## Task 2 Submission
* What methodology did you use to predict the best store format for the new stores? Why did you choose that methodology?
* What are the three most important variables that help explain the relationship between demographic indicators and store formats? Please include a visualization.
* What format does each of the 10 new stores fall into? Please provide a data table.


In [None]:
# 44 variables in demographic store data
store_demographic_columns = ['Age0to9', 'Age10to17', 'Age18to24', 'Age25to29', 'Age30to39', 'Age40to49', 'Age50to64', 'Age65Plus', 'EdLTHS', 'EdHSGrad', 'EdSomeCol', 'EdAssociate', 'EdBachelor', 'EdMaster', 'EdProfSchl', 'EdDoctorate', 'HHSz1Per', 'HHSz2Per', 'HHSz3Per', 'HHSz4Per', 'HHSz5PlusPer', 'HHIncU25K', 'HHInc25Kto50K', 'HHInc50Kto75K',
       'HHInc75Kto100K', 'HHInc100Kto150K', 'HHInc150Kto250K', 'HHInc250KPlus', 'PopAsian', 'PopBlack', 'PopHispanic', 'PopMulti', 'PopNativeAmer', 'PopOther', 'PopPacIsl', 'PopWhite', 'HVal0to100K', 'HVal100Kto200K', 'HVal200Kto300K', 'HVal300Kto400K', 'HVal400Kto500K', 'HVal500Kto750K', 'HVal750KPlus', 'PopDens']

# join demographic data with store information
store_info_columns = ['Store', 'Type']
store_demographic_with_info_data = store_demographic_data.merge(store_information_data[store_info_columns], on='Store')

#  filter existing store
store_demographic_with_info_data_existing = store_demographic_with_info_data.query('Type == "Existing"')
#  filter new store
store_demographic_with_info_data_new = store_demographic_with_info_data.query('Type == "New"')

# merge segment to existing store
store_filtered_columns = ['Store', 'Segment']
_temp_df = filtered_stores_sales_2015_merged_information_data[store_filtered_columns]
store_demographic_with_info_data_existing = store_demographic_with_info_data_existing.merge(_temp_df, on='Store')

# Prepare X and y for training
y = store_demographic_with_info_data_existing['Segment']
X = store_demographic_with_info_data_existing.drop(columns='Segment')
# Split train and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, random_state=3)
# Column Transformer
column_transformer = ColumnTransformer([('numerical', MinMaxScaler(), store_demographic_columns)])
column_transformer.fit(X_train)

In [None]:
# Classifier Score on N run test
# Transform Train Data
X_train_transformed = column_transformer.transform(X_train)
# Transform Test Data
X_test_transformed = column_transformer.transform(X_test)

def classifier_test_score(estimator, X_train=X_train_transformed, y_train=y_train, X_test=X_test_transformed, y_test=y_test, cv=100):
    _score_list = []
    for _ in range(cv):
        _cls = estimator().fit(X_train, y_train)
        _score = _cls.score(X_test, y_test)
        _score_list.append(_score)
    return np.array(_score_list)

In [None]:
# Decision Tree Classifier 
decision_tree_result = classifier_test_score(DecisionTreeClassifier)
# Random Forest Classifier
random_forest_result = classifier_test_score(RandomForestClassifier)
# Gradient Boosting Classifier
gradient_boosting = classifier_test_score(GradientBoostingClassifier)

In [None]:
# Model Score
print('Model Accuracy Score in Validation Data')
print(f'Decision Tree Average Accuracy Score in Validation Data: {decision_tree_result.mean().round(2)}')
print(f'Random Forest Average Accuracy Score in Validation Data: {random_forest_result.mean().round(2)}')
print(f'Gradient Boosting Average Accuracy Score in Validation Data: {gradient_boosting.mean().round(2)}')

In [None]:
# Plot Feature Importance: Decision Tree
# Score
decision_tree_model = DecisionTreeClassifier().fit(X_train_transformed, y_train)
print(f'Decision Tree Score in Validation Data: {decision_tree_model.score(X_test_transformed, y_test):.2f}')
# Feature Importances
feature_imp_dt = pd.Series(decision_tree_model.feature_importances_, index=store_demographic_columns).sort_values()
print('Top 3 features:', list(feature_imp_dt[-3:].index)[::-1])
# Plot
feature_imp_dt.plot(kind='barh', title='Decision Tree: Feature Importances')

In [None]:
# Plot Feature Importance: Random Forest
# Score
random_forest_model = RandomForestClassifier().fit(X_train_transformed, y_train)
print(f'Random Forest Score in Validation Data: {random_forest_model.score(X_test_transformed, y_test):.2f}')
# Feature Importances
feature_imp_rf = pd.Series(random_forest_model.feature_importances_, index=store_demographic_columns).sort_values()
print('Top 3 features:', list(feature_imp_rf[-3:].index)[::-1])
# Plot
feature_imp_rf.plot(kind='barh', title='Random Forest: Feature Importances')

In [None]:
# Plot Feature Importance: Gradient Boosting
gradient_boosting_model = GradientBoostingClassifier().fit(X_train_transformed, y_train)
print(f'Gradient Boosting Score in Validation Data: {gradient_boosting_model.score(X_test_transformed, y_test):.2f}')
# Feature Importances
feature_imp_gb = pd.Series(gradient_boosting_model.feature_importances_, index=store_demographic_columns).sort_values()
print('Top 3 features:', list(feature_imp_gb[-3:].index)[::-1])
# Plot
feature_imp_gb.plot(kind='barh', title='Gradient Boosting: Feature Importances')

In [None]:
# Function: To predict New Store Segment
# Input Data
new_data = store_demographic_with_info_data_new
transformed_data = column_transformer.transform(new_data)

def predict_new_store_segment(model, new_store_data=new_data, transformed_data=transformed_data):
    pred = model.predict(transformed_data)
    temp_df = new_store_data.assign(Segment = pred)
    
    return temp_df

In [None]:
# Predict Segment for New Store
segment_result_dt = predict_new_store_segment(decision_tree_model)
segment_result_rf = predict_new_store_segment(random_forest_model)
segment_result_gb = predict_new_store_segment(gradient_boosting_model)

In [None]:
# Merge the results for visualization
columns_merge = ['Store', 'Segment']
temp_merge_df = segment_result_dt[['Store', 'Type', 'Segment']].merge(segment_result_rf[columns_merge], on='Store', suffixes=('_Tree', '_Forest'))
temp_merge_df.merge(segment_result_gb[columns_merge], on='Store').rename(columns={'Segment': 'Segment_Boost'})


In [None]:
columns_store_form_report = ['Store', 'Segment']
print('\nThe Segment for New Stores')
segment_result_gb[columns_store_form_report]

## Task 3: Forecasting
Fresh produce has a short life span, and due to increasing costs, the company wants to have an accurate monthly sales forecast.

You’ve been asked to prepare a monthly forecast for produce sales for the full year of 2016 for both existing and new stores.

Note: Use a 6 month holdout sample for the TS Compare tool (this is because we do not have that much data so using a 12 month holdout would remove too much of the data)

## Task 3 Submission
1. What type of ETS or ARIMA model did you use for each forecast? Use ETS(a,m,n) or ARIMA(ar, i, ma) notation. How did you come to that decision?


2. Please provide a table of your forecasts for existing and new stores. Also, provide visualization of your forecasts that includes historical data, existing stores forecasts, and new stores forecasts.
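
To connect the ETS(error, trend, seasonal) notation in question 1 with the keyword values that `ETSModel` accepts (`'add'`, `'mul'` or `None`), a small translation helper can be sketched. The helper name and the lookup table are hypothetical and not part of statsmodels.

```python
# Hypothetical lookup from ETS letters to ETSModel keyword values
ETS_CODES = {'A': 'add', 'M': 'mul', 'N': None}


def ets_notation_to_kwargs(notation):
    """Turn a string such as 'ETS(A,M,N)' into error/trend/seasonal values."""
    text = notation.strip().upper()
    if not (text.startswith('ETS(') and text.endswith(')')):
        raise ValueError(f'Not ETS notation: {notation!r}')

    letters = [part.strip() for part in text[4:-1].split(',')]
    if len(letters) != 3 or any(letter not in ETS_CODES for letter in letters):
        raise ValueError(f'Expected three letters from A, M, N: {notation!r}')

    error, trend, seasonal = (ETS_CODES[letter] for letter in letters)
    if error is None:
        # The error component is always additive or multiplicative
        raise ValueError('The error component cannot be N')
    return {'error': error, 'trend': trend, 'seasonal': seasonal}


print(ets_notation_to_kwargs('ETS(A,M,N)'))
```

The returned dictionary could be unpacked into the model constructor, for example `ETSModel(train, seasonal_periods=12, **kwargs)`; damping is a separate `damped_trend` flag and is not encoded in the three letters.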


In [None]:
# Store and segment pairs
# Variables to work with
columns_store_segment = ['Store', 'Segment']
# Existing stores
existing_store_segment = filtered_stores_sales_2015_data[columns_store_segment]
# New stores
new_store_segment = segment_result_gb[columns_store_segment]
# Join Store Sales with Segment
stores_sales_with_segment_data = stores_sales_data.merge(existing_store_segment, on='Store')
stores_sales_with_segment_data.head(3)


In [None]:
# Aggregate monthly Produce sales for the existing-store forecast
tmp_df = stores_sales_with_segment_data.groupby(['Year', 'Month'])['Produce'].sum().reset_index(name='Monthly_Sales')
# Convert the Year and Month columns to a DatetimeIndex
tmp_date = tmp_df['Year'].astype(str) + '-' + tmp_df['Month'].astype(str)
tmp_df = tmp_df.assign(Date = pd.to_datetime(tmp_date))
existing_store_monthly_sales_data = tmp_df.set_index('Date', drop=True)['Monthly_Sales']
existing_store_monthly_sales_data.index.freq ='MS'
print('\nMonthly Produce Sales Data for Existing Stores')
existing_store_monthly_sales_data.head(3)

In [None]:
# Aggregate monthly Produce sales for the new-store forecast
# Step 1: total Produce sales per store, month and segment
tmp_df = stores_sales_with_segment_data.groupby(['Store', 'Year', 'Month', 'Segment'])['Produce'].sum().reset_index(name='Monthly_Sales')
# Step 2: average monthly sales per store within each segment
tmp_df = tmp_df.groupby(['Year', 'Month', 'Segment'])['Monthly_Sales'].mean().reset_index(name='Avg_Monthly_Sales')
# Convert the Year and Month columns to a DatetimeIndex
tmp_date = tmp_df['Year'].astype(str) + '-' + tmp_df['Month'].astype(str)
segment_store_monthly_sales_data = tmp_df.assign(Date = pd.to_datetime(tmp_date))
# Sales per Segment
# Segment 0
segment_0_store_monthly_sales_data = segment_store_monthly_sales_data.query('Segment == 0').set_index('Date', drop=True)['Avg_Monthly_Sales']
segment_0_store_monthly_sales_data.index.freq ='MS'
print('\nMonthly Produce Sales Data for segment 0 Stores')
print(segment_0_store_monthly_sales_data.head(2))
# Segment 1
segment_1_store_monthly_sales_data = segment_store_monthly_sales_data.query('Segment == 1').set_index('Date', drop=True)['Avg_Monthly_Sales']
segment_1_store_monthly_sales_data.index.freq ='MS'
print('\nMonthly Produce Sales Data for segment 1 Stores')
print(segment_1_store_monthly_sales_data.head(2))
# Segment 2
segment_2_store_monthly_sales_data = segment_store_monthly_sales_data.query('Segment == 2').set_index('Date', drop=True)['Avg_Monthly_Sales']
segment_2_store_monthly_sales_data.index.freq ='MS'
print('\nMonthly Produce Sales Data for segment 2 Stores')
print(segment_2_store_monthly_sales_data.head(2))

In [None]:
# Plot Aggregated Monthly Produce Sales
fig, axs = plt.subplots(2, 1, figsize=(12, 9))
axs[0].plot(existing_store_monthly_sales_data, label='Existing Stores')
axs[0].set_title('Monthly Produce Sales for Existing Stores')
axs[0].legend()

axs[1].plot(segment_0_store_monthly_sales_data, label='Segment 0 Stores')
axs[1].plot(segment_1_store_monthly_sales_data, label='Segment 1 Stores')
axs[1].plot(segment_2_store_monthly_sales_data, label='Segment 2 Stores')
axs[1].set_title('Average Monthly Produce Sales per Store by Segment')
axs[1].legend()

### Monthly Produce Sales and Segment Sales follow a similar pattern  
Let's find the best model for total monthly sales, then use the same model specification to fit the segment sales.

In [None]:
# Time Series Train Test Data
def time_series_train_test_split(df_with_date_index_data, holdout_size=6):
    train = df_with_date_index_data.iloc[:-holdout_size]
    test = df_with_date_index_data.iloc[-holdout_size:]
    return train, test
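
A minimal sketch of how such a holdout split is scored, using a made-up monthly series and a naive "repeat the last training value" forecast. Both the series and the naive forecast are assumptions for illustration; the notebook itself scores ETS and ARIMA forecasts.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series: 24 months with values 0, 1, ..., 23
toy_index = pd.date_range('2014-01-01', periods=24, freq='MS')
toy_series = pd.Series(np.arange(24, dtype=float), index=toy_index)

# Hold out the last 6 months, as in the task description
holdout_size = 6
toy_train = toy_series.iloc[:-holdout_size]
toy_test = toy_series.iloc[-holdout_size:]

# Naive forecast: repeat the last training value over the holdout period
naive_forecast = pd.Series(toy_train.iloc[-1], index=toy_test.index)

# Root mean squared error on the holdout months
toy_rmse = float(np.sqrt(((toy_test - naive_forecast) ** 2).mean()))
print(len(toy_train), len(toy_test), round(toy_rmse, 3))
```

Any candidate model is scored the same way: fit on the training part only, forecast as many steps as the holdout has, and compare against the held-out values.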

In [None]:
# Grid Search ETS Model
# Search Parameters: error, trend, damped_trend, seasonal, seasonal_periods

# helper: Format ETS to return single capital letter
def helper_ets_format(value):
    if not value:
        return 'N'
    return value[0].upper()

# Initial values for Error, Trend and Seasonal
e = ['add', 'mul']
t = ['add', 'mul']
s = ['add', 'mul', None]

def ets_grid_search(train, test, errors=e, trends=t, is_damped=(True, False), seasonals=s):
    tmp_result = []
    for error in errors:
        for damped in is_damped:
            # A damped trend needs a trend component, so "no trend" is only tried undamped
            trend_options = list(trends) if damped else list(trends) + [None]
            for trend in trend_options:
                for seasonal in seasonals:
                    ets_model = ETSModel(train, error=error, trend=trend, damped_trend=damped, seasonal=seasonal, seasonal_periods=12)
                    ets_model_fit = ets_model.fit(disp=0)
                    # Score the forecast over the holdout period
                    rmse_result = rmse(test, ets_model_fit.forecast(len(test))).round()
                    ets_value = f'ETS({helper_ets_format(error)},{helper_ets_format(trend)},{helper_ets_format(seasonal)})'
                    tmp_parameters = [ets_value, damped, rmse_result]
                    tmp_result.append(tmp_parameters)
    tmp_df = pd.DataFrame(tmp_result, columns=['Model', 'Damped_Trend', 'RMSE'])
    tmp_df = tmp_df.sort_values(by='RMSE')
    return tmp_df.assign(RMSE = lambda x: x.RMSE.map('{:,}'.format))

In [None]:
# Run ETS Grid Search
train, test = time_series_train_test_split(existing_store_monthly_sales_data)
# ets_gridsearch_result = ets_grid_search(train, test)

In [None]:
# Best ETS Model
print('ETS Top 5 Best Model')
# ets_gridsearch_result.head()

In [None]:
# Grid Search ARIMA Model
# Search Parameters: order, trend
possible_p_q = range(13)
possible_d = range(3)  # pmdarima's ndiffs with the ADF, KPSS and PP tests gave a maximum d of 2
possible_trend = ['n', 'c', 't', 'ct']

def arima_grid_search(train, test, ps=possible_p_q, ds=possible_d, qs=possible_p_q, trends=possible_trend):
    tmp_result = []
    for d in ds:
        for p in ps:
            for q in qs:
                for trend in trends:
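                    try:
                        model = ARIMA(train, order=(p,d,q), trend=trend)
                        # ARIMA.fit() in statsmodels.tsa.arima.model takes no `disp` argument
                        model_fit = model.fit()
                        rmse_result = rmse(test, model_fit.forecast(6)).round()
                    except Exception:
                        # Invalid or non-converging specifications are ranked last
                        rmse_result = float('inf')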

                    arima_value = f'ARIMA{p,d,q}'
                    tmp_parameters = [arima_value, trend, rmse_result]
                    tmp_result.append(tmp_parameters)

    tmp_df = pd.DataFrame(tmp_result, columns=['Model', 'Trend', 'RMSE'])
    tmp_df = tmp_df.sort_values(by='RMSE')
    return tmp_df.assign(RMSE = lambda x: x.RMSE.map('{:,}'.format))

In [None]:
# Run ARIMA Grid Search
train, test = time_series_train_test_split(existing_store_monthly_sales_data)
# arima_gridsearch_result = arima_grid_search(train, test)

In [None]:
# Best ARIMA Model
print('ARIMA Top 5 Best Model')
# arima_gridsearch_result.head()

In [None]:
# Model Selection:
# ETS Model: ETS(A,M,M) with damped trend
ets_model = ETSModel(train, error='add', trend='mul', damped_trend=True, seasonal='mul', seasonal_periods=12)
ets_model_fit = ets_model.fit(disp=0)
ets_forecast = ets_model_fit.forecast(6).round(2)

# ARIMA Model: ARIMA(0,0,2) with a constant and linear time trend ('ct')
arima_model = ARIMA(train, order=(0,0,2), trend='ct')
arima_model_fit = arima_model.fit()
arima_forecast = arima_model_fit.forecast(6).round(2)

# Plot Models Forecast
fig, axs = plt.subplots(1, 1, figsize=(12, 6))
# Plot Actual Sales
axs.plot(existing_store_monthly_sales_data, label='Actual Sales')
# Plot ETS forecast Sales
axs.plot(ets_forecast, label='ETS(A,M,M) Forecast')
# Plot ARIMA forecast Sales
axs.plot(arima_forecast, label='ARIMA(0,0,2) Forecast')
# add title and legend
axs.set_title('Monthly Produce Sales for Existing Stores')
axs.legend()

In [None]:
# Forecast 2016 Sales for Existing Stores
# Train the model with all data
existing_stores_forecast = ARIMA(existing_store_monthly_sales_data, order=(0,0,2), trend='ct').fit().forecast(12).round()
existing_stores_forecast

In [None]:
# Forecast 2016 Sales for New Stores
# Train the model with all data in segment
segment_0_stores_forecast = ARIMA(segment_0_store_monthly_sales_data, order=(0,0,2), trend='ct').fit().forecast(12).round()
segment_1_stores_forecast = ARIMA(segment_1_store_monthly_sales_data, order=(0,0,2), trend='ct').fit().forecast(12).round()
segment_2_stores_forecast = ARIMA(segment_2_store_monthly_sales_data, order=(0,0,2), trend='ct').fit().forecast(12).round()
# Combine segments
segment_stores_forecast = pd.concat([segment_0_stores_forecast, segment_1_stores_forecast, segment_2_stores_forecast], axis=1)
segment_stores_forecast.columns = ['Segment_0', 'Segment_1', 'Segment_2']
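# Count the new stores assigned to each segment by the selected classifier
best_cluster = segment_result_gb
# Reindex so a segment with no new stores counts as 0 instead of raising a KeyError
segment_counts = best_cluster['Segment'].value_counts().reindex([0, 1, 2], fill_value=0)
# Add total sales per month: per-store segment forecast times the number of new stores in that segment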
segment_stores_forecast = segment_stores_forecast.assign(Monthly_Sales = lambda x: x.Segment_0*segment_counts[0]+x.Segment_1*segment_counts[1]+x.Segment_2*segment_counts[2])
segment_stores_forecast['Monthly_Sales']
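
The arithmetic behind the new-store total can be checked on a small worked example: each segment's per-store forecast is multiplied by the number of new stores in that segment, and the products are summed per month. All numbers and names below are hypothetical.

```python
import pandas as pd

# Hypothetical per-store forecasts for three months and three segments
toy_index = pd.date_range('2016-01-01', periods=3, freq='MS')
toy_segment_forecast = pd.DataFrame({
    'Segment_0': [100.0, 110.0, 120.0],
    'Segment_1': [200.0, 210.0, 220.0],
    'Segment_2': [300.0, 310.0, 320.0],
}, index=toy_index)

# Hypothetical predicted segments for five new stores (none in segment 1)
toy_new_store_segments = pd.Series([0, 0, 2, 2, 2])

# Count stores per segment; reindex so a missing segment counts as 0
toy_counts = toy_new_store_segments.value_counts().reindex([0, 1, 2], fill_value=0)
toy_counts.index = toy_segment_forecast.columns

# Weighted sum across segments: January = 100*2 + 200*0 + 300*3
toy_total = toy_segment_forecast.mul(toy_counts, axis=1).sum(axis=1)
print(toy_total)
```

The `reindex` step matters when the classifier assigns no new store to one of the segments, because plain label lookup on the counts would fail in that case.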