# Basic Python Analysis and Time Series with RMSSE
1. [Objective and Evaluation of Performance](#objective)
2. [Datasets](#datasets)
3. [EDA: Calendar](#calendar)
4. [EDA: Sales](#sales)

---
## Objective and Evaluation of Performance <a id="objective"></a>

**Competition:** M5 Forecasting - Accuracy https://www.kaggle.com/c/m5-forecasting-accuracy/overview

**Objective:** Estimate the unit sales of Walmart retail goods

**Description:** The competition aims to identify the method(s) that produce the most accurate point forecasts for each of its **42,840 time series**. Participants provide 28-day-ahead (4 weeks) point forecasts (PFs) for every series.

* 3,049 Products
* 3 Categories: Hobbies, Foods, and Household
* 7 Departments
* 10 Stores
* 3 States: CA, TX, and WI

**Evaluation:** Weighted Root Mean Squared Scaled Error (WRMSSE). Its building block, the Root Mean Squared Scaled Error (RMSSE), is a variant of the well-known Mean Absolute Scaled Error (MASE). The RMSSE is first computed for each series separately over the forecasting horizon, and the per-series values are then combined into a weighted average across all series. A lower WRMSSE score is better.

![image.png](attachment:image.png)

![image.png](attachment:image.png)
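
As a rough illustration of the metric described above, the sketch below computes the RMSSE of a single series and a weighted average over several series. It assumes the usual definition, in which the mean squared forecast error is scaled by the mean squared one-step naive error on the training history. The function names `rmsse` and `wrmsse` and the toy numbers are made up for illustration, the weights are taken as given (assumed to sum to 1), and details such as trimming leading zeros from the history are left out.

```python
import numpy as np

def rmsse(train, actual, forecast):
    """RMSSE of one series (hypothetical helper, simplified)."""
    train = np.asarray(train, dtype=float)
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    # Scale: mean squared error of the one-step naive forecast on the history
    scale = np.mean(np.diff(train) ** 2)
    # Mean squared error of the forecast over the horizon
    mse = np.mean((actual - forecast) ** 2)
    return float(np.sqrt(mse / scale))

def wrmsse(rmsse_values, weights):
    """Weighted sum of per-series RMSSE values; weights are assumed to sum to 1."""
    rmsse_values = np.asarray(rmsse_values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * rmsse_values))

# Toy example: history 1, 2, 3, 4; two-day horizon, forecast off by one unit each day
score = rmsse(train=[1, 2, 3, 4], actual=[5, 6], forecast=[4, 5])
print(score)  # 1.0
print(wrmsse([score, 0.0], weights=[0.25, 0.75]))  # 0.25
```

A score of 1.0 means the forecast is as accurate, in squared-error terms, as the naive one-step forecast was on the history.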

---
## Datasets <a id="datasets"></a>

1. calendar
2. sales
3. sample_submission
4. sell_prices

---
## Calendar <a id="calendar"></a>
Contains information about the dates on which the products are sold, from 2011-01-29 to mid-2016.
*	date: The date in a “y-m-d” format.
*	wm_yr_wk: The id of the week the date belongs to.
*	weekday: The type of the day (Saturday, Sunday, …, Friday).
*	wday: The id of the weekday, starting from Saturday.
*	month: The month of the date.
*	year: The year of the date.
*	event_name_1: If the date includes an event, the name of this event.
*	event_type_1: If the date includes an event, the type of this event.
*	event_name_2: If the date includes a second event, the name of this event.
*	event_type_2: If the date includes a second event, the type of this event.
*	snap_CA, snap_TX, and snap_WI: A binary variable (0 or 1) indicating whether the stores of CA, TX or WI allow SNAP purchases on the examined date. 1 indicates that SNAP purchases are allowed.

SNAP = Supplemental Nutrition Assistance Program
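
The `wday` convention described above (Saturday is day 1) differs from the pandas default, where Monday is 0. The sketch below shows one way to reproduce it from a date, which can serve as a sanity check on the calendar file. The helper name `wday_from_date` is hypothetical, and the Saturday=1 through Friday=7 numbering is an assumption based on the column description.

```python
import pandas as pd

def wday_from_date(dates):
    """Return weekday ids with Saturday=1, Sunday=2, ..., Friday=7 (assumed convention)."""
    dates = pd.to_datetime(pd.Series(dates))
    # pandas uses Monday=0 ... Sunday=6; shift so that Saturday maps to 1
    return (dates.dt.dayofweek + 2) % 7 + 1

# 2011-01-29 is a Saturday, 2011-01-30 a Sunday, 2011-02-04 a Friday
example = wday_from_date(["2011-01-29", "2011-01-30", "2011-02-04"])
print(example.tolist())  # [1, 2, 7]
```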

In [None]:
# Import calendar dataset and show first lines
import pandas as pd
calend = pd.read_csv('/kaggle/input/m5-forecasting-accuracy/calendar.csv')
calend.head()

In [None]:
# Number of rows and columns
calend.shape

In [None]:
# Number of days per year, from 2011-01-29 to mid-2016
calend.year.value_counts()

In [None]:
# Events, Holidays
calend.event_name_1.value_counts()

In [None]:
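# Types of the first event
calend.event_type_1.value_counts()

In [None]:
# Names of the second event, for dates with two events
calend.event_name_2.value_counts()

In [None]:
# Types of the second event
calend.event_type_2.value_counts()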

---
## Sales <a id="sales"></a>

Contains the historical daily unit sales data per product and store.
*	item_id: The id of the product.
*	dept_id: The id of the department the product belongs to.
*	cat_id: The id of the category the product belongs to.
*	store_id: The id of the store where the product is sold.
*	state_id: The State where the store is located.
*	d_1, d_2, …, d_i, …, d_1941: The number of units sold on day i, starting from 2011-01-29. The file sales_train_validation.csv loaded below covers d_1 to d_1913.
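
Because each day is a separate column, it is often convenient to reshape the table into one row per series and day. The following sketch does this on a toy frame and derives the calendar date from the day index, relying only on the statement above that `d_1` corresponds to 2011-01-29. The names `to_long`, `START_DATE`, `units`, and the toy data are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

START_DATE = pd.Timestamp("2011-01-29")  # date of d_1, per the column description

def to_long(sales_wide):
    """Reshape wide d_1..d_n columns into one row per series and day."""
    # Every column that is not a day column is treated as an identifier
    id_cols = [c for c in sales_wide.columns if not c.startswith("d_")]
    long_df = sales_wide.melt(id_vars=id_cols, var_name="d", value_name="units")
    # "d_17" -> 17, then day i maps to START_DATE + (i - 1) days
    day_index = long_df["d"].str[2:].astype(int)
    long_df["date"] = START_DATE + pd.to_timedelta(day_index - 1, unit="D")
    return long_df

# Toy stand-in for the real sales table
toy = pd.DataFrame({
    "item_id": ["A", "B"],
    "store_id": ["CA_1", "CA_1"],
    "d_1": [0, 3],
    "d_2": [1, 0],
})
long_df = to_long(toy)
print(long_df)
```

On the full dataset the long table has tens of millions of rows, so it may be worth reshaping only a subset of items or days at a time.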

In [None]:
import pandas as pd
sales = pd.read_csv('/kaggle/input/m5-forecasting-accuracy/sales_train_validation.csv')
sales.head()

In [None]:
sales.shape

In [None]:
# Number of item-store series per state
sales.state_id.value_counts()

In [None]:
# Number of item-store series per category
sales.cat_id.value_counts()

In [None]:
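# Number of item-store series per department
sales.dept_id.value_counts()

In [None]:
# Number of item-store series per store
sales.store_id.value_counts()

In [None]:
# Summary statistics of the daily unit sales columns
sales.describe()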

In [None]:
# Plot max sales per day

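One possible way to fill in the cell above is sketched here, reading "max sales per day" as the largest number of units sold by any single item-store series on each day (an assumption, since the cell is empty). The helper name `max_units_per_day` and the toy frame are hypothetical.

```python
import pandas as pd

def max_units_per_day(sales_wide):
    """For each day column, return the largest unit count over all series."""
    day_cols = [c for c in sales_wide.columns if c.startswith("d_")]
    return sales_wide[day_cols].max(axis=0)

# Toy stand-in for the real sales table
toy_sales = pd.DataFrame({"item_id": ["A", "B"], "d_1": [0, 3], "d_2": [1, 0]})
daily_max = max_units_per_day(toy_sales)
print(daily_max.tolist())  # [3, 1]
```

With the real `sales` frame, `max_units_per_day(sales).plot()` would draw the resulting series, assuming matplotlib is installed.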

---
## Sample Submission <a id="sample"></a>