# **Project-2**

In this project, you will analyze and predict weekly sales for a retail chain. The dataset includes weekly sales data for *45* store locations over a *143-week* period. Build a machine learning model (**regression**) that predicts weekly sales values using the train and test datasets provided.

**Dataset Details:**

*Store*: Store number

*Week*: 1 through 143

*Temperature*: Weekly outside temperature

*Holiday*: Yes for holiday week, No for non-holiday week

*CPI*: The Consumer Price Index

*Fuel Price*: Price per gallon

*Unemployment*: Unemployment rate

*WeeklySales*: Total sales amount


**Dataset Locations and Names:**
Canvas -> Modules -> Week 5 -> Datasets -> "trainSales.csv" and "testSales.csv".

Download the .ipynb file and save it as FirstName_LastName_Project2.ipynb. Please submit (upload) your source code to Canvas.

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

sales = pd.read_csv("trainSales.csv")

In [None]:
from sklearn.model_selection import StratifiedShuffleSplit

split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=50)

for train_index, test_index in split.split(sales, sales[""]):
    strat_train_set = sales.iloc[train_index]
    strat_test_set = sales.iloc[test_index]
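
The stratification column in the cell above is left blank. As a minimal sketch of how `StratifiedShuffleSplit` behaves, the example below stratifies on a categorical column of a small synthetic frame. The `demo` frame, its values, and the choice of `Holiday` as the stratification column are assumptions made for illustration only; they are not taken from the assignment data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

# Synthetic stand-in for the training data; names and values are assumptions.
rng = np.random.default_rng(0)
n_rows = 200
demo = pd.DataFrame({
    "Store": rng.integers(1, 46, size=n_rows),
    # 10% holiday weeks, built deterministically so both classes are present.
    "Holiday": ["Yes"] * 20 + ["No"] * 180,
    "WeeklySales": rng.normal(1_000_000, 200_000, size=n_rows),
})

split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=50)

# Stratifying on the (assumed) "Holiday" column keeps the Yes/No ratio
# the same in the train and test subsets.
for train_index, test_index in split.split(demo, demo["Holiday"]):
    demo_train = demo.iloc[train_index]
    demo_test = demo.iloc[test_index]

print(demo_train["Holiday"].value_counts(normalize=True))
print(demo_test["Holiday"].value_counts(normalize=True))
```

Stratification needs a discrete label with at least two rows per class, so a continuous column such as the sales target would have to be binned first (for example with `pd.cut`) before it could be used this way.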

In [None]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, MinMaxScaler, OneHotEncoder
from sklearn.impute import SimpleImputer

imputer = SimpleImputer(strategy="median")

numeric_std_pipeline = Pipeline([('imputer', SimpleImputer(strategy='median')),
                                 ('stdscaler', StandardScaler())])

numeric_minmax_pipeline = Pipeline([('imputer', SimpleImputer(strategy='median')),
                                    ('minmaxscaler', MinMaxScaler())])

cat_pipeline = Pipeline([('onehot', OneHotEncoder())])
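
To make the effect of each preprocessing pipeline concrete, here is a small sketch that applies equivalent pipelines to toy inputs. The toy arrays and the variable names (`toy_numeric`, `toy_category`, `std_pipeline`, and so on) are hypothetical and exist only for this illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder, StandardScaler

# Toy inputs (made up): one numeric column with a missing value,
# and one categorical column.
toy_numeric = np.array([[1.0], [np.nan], [3.0]])
toy_category = pd.DataFrame({"Holiday": ["Yes", "No", "Yes"]})

std_pipeline = Pipeline([("imputer", SimpleImputer(strategy="median")),
                         ("stdscaler", StandardScaler())])
minmax_pipeline = Pipeline([("imputer", SimpleImputer(strategy="median")),
                            ("minmaxscaler", MinMaxScaler())])
onehot_pipeline = Pipeline([("onehot", OneHotEncoder())])

# The missing value is first replaced by the column median (2.0),
# then the column is scaled.
std_out = std_pipeline.fit_transform(toy_numeric)        # zero mean, unit variance
minmax_out = minmax_pipeline.fit_transform(toy_numeric)  # rescaled to [0, 1]

# OneHotEncoder returns a sparse matrix by default; categories are sorted,
# so the columns correspond to "No" and "Yes" in that order.
onehot_out = onehot_pipeline.fit_transform(toy_category)

print(std_out.ravel())
print(minmax_out.ravel())
print(onehot_out.toarray())
```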

In [None]:
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor as RFR

full_transformer = ColumnTransformer([('numeric_stdpreprocessing', numeric_std_pipeline, num_std_attribs),
                                ('numeric_minmaxpreprocessing', numeric_minmax_pipeline, num_maxmin_attribs),
                                 ('cat_preprocessing', cat_pipeline, cat_attribs)
                                ])

p1_full_pipeline = Pipeline([('all_column_transformation', full_transformer),
                        ('linear_regression', LinearRegression())
                      ])



p2_full_pipeline = Pipeline([('all_column_transformation', full_transformer),
               ("RFR_model", RFR())
               ])
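
The `ColumnTransformer` above depends on three attribute lists that must be defined before the cell runs. The sketch below shows one possible way to group columns and fit such a pipeline end to end on synthetic data. The column names (including `Fuel_Price` with an underscore), the grouping of columns into scalers, and all data values are assumptions; the real CSV headers and the appropriate grouping may differ.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder, StandardScaler

# Hypothetical column groupings; the real CSV headers may differ.
num_std_attribs = ["Temperature", "CPI"]
num_maxmin_attribs = ["Fuel_Price", "Unemployment"]
cat_attribs = ["Holiday"]

# Synthetic data standing in for the training set (all values are made up).
rng = np.random.default_rng(0)
n_rows = 100
demo = pd.DataFrame({
    "Temperature": rng.normal(60, 15, n_rows),
    "CPI": rng.normal(170, 30, n_rows),
    "Fuel_Price": rng.uniform(2.5, 4.5, n_rows),
    "Unemployment": rng.uniform(4, 12, n_rows),
    "Holiday": ["Yes" if i % 10 == 0 else "No" for i in range(n_rows)],
})
demo_labels = 1_000_000 + 2_000 * demo["Temperature"] + rng.normal(0, 10_000, n_rows)

# Each entry is (name, transformer, list of columns it applies to).
demo_transformer = ColumnTransformer([
    ("numeric_std", Pipeline([("imputer", SimpleImputer(strategy="median")),
                              ("stdscaler", StandardScaler())]), num_std_attribs),
    ("numeric_minmax", Pipeline([("imputer", SimpleImputer(strategy="median")),
                                 ("minmaxscaler", MinMaxScaler())]), num_maxmin_attribs),
    ("cat", Pipeline([("onehot", OneHotEncoder(handle_unknown="ignore"))]), cat_attribs),
])

demo_pipeline = Pipeline([("all_column_transformation", demo_transformer),
                          ("linear_regression", LinearRegression())])
demo_pipeline.fit(demo, demo_labels)
demo_preds = demo_pipeline.predict(demo)

# 2 standardized + 2 min-max scaled + 2 one-hot columns = 6 features.
transformed = demo_pipeline.named_steps["all_column_transformation"].transform(demo)
print(transformed.shape)
print(demo_preds[:3])
```

Passing `handle_unknown="ignore"` to `OneHotEncoder` is an optional choice made here so that a category unseen during fitting does not raise an error at prediction time.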

In [None]:
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import r2_score

def fit_and_print(p, train_set, train_labels, test_set, test_labels):
  # Fit the pipeline on the training data, then report MAE and R2.
  p.fit(train_set, train_labels)
  train_preds = p.predict(train_set)
  test_preds = p.predict(test_set)
  # scikit-learn metrics expect (y_true, y_pred); the order matters for r2_score.
  print("Training Error: " + str(mean_absolute_error(train_labels, train_preds)))
  print("Test Error: " + str(mean_absolute_error(test_labels, test_preds)))
  print("R2 score: " + str(r2_score(test_labels, test_preds)))
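
scikit-learn's regression metrics take their arguments as `(y_true, y_pred)`. Mean absolute error gives the same value either way, but R2 does not, because it normalizes by the variance of whichever array is passed first. The short sketch below illustrates this with made-up numbers; the arrays and variable names are hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Made-up targets and predictions, for illustration only.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# MAE is symmetric: swapping the arguments does not change the result.
mae_correct = mean_absolute_error(y_true, y_pred)
mae_swapped = mean_absolute_error(y_pred, y_true)

# R2 is not symmetric: the first argument's variance is the denominator.
r2_correct = r2_score(y_true, y_pred)
r2_swapped = r2_score(y_pred, y_true)

print("MAE (true, pred):", mae_correct, "| MAE (pred, true):", mae_swapped)
print("R2  (true, pred):", r2_correct, "| R2  (pred, true):", r2_swapped)
```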

In [None]:
fit_and_print(p1_full_pipeline, train, train_labels, test, test_labels)

In [None]:
fit_and_print(p2_full_pipeline, train, train_labels, test, test_labels)