# Preparing Time Series Data

## 1.0 Import Libraries

In [1]:
import numpy as np
import pandas as pd

<br />

---

## 2.0 Preview the Sales Data

### 2.1 Load and Preview the First Five Rows

In [22]:
df = pd.read_csv('sales.csv', parse_dates=['date'])

In [23]:
df.head()

| | date | store_id | cat_id | sales |
|---|---|---|---|---|
| 0 | 2011-01-29 | TX_1 | FOODS | 3950.35 |
| 1 | 2011-01-30 | TX_1 | FOODS | 3844.97 |
| 2 | 2011-01-31 | TX_1 | FOODS | 2888.03 |
| 3 | 2011-02-01 | TX_1 | FOODS | 3631.28 |
| 4 | 2011-02-02 | TX_1 | FOODS | 3072.18 |


In [24]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 58230 entries, 0 to 58229
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype         
---  ------    --------------  -----         
 0   date      58230 non-null  datetime64[ns]
 1   store_id  58230 non-null  object        
 2   cat_id    58230 non-null  object        
 3   sales     58230 non-null  float64       
dtypes: datetime64[ns](1), float64(1), object(2)
memory usage: 1.8+ MB


<br />

### 2.2 Number of Unique Values of `store_id` and `cat_id`

In [7]:
df[['store_id', 'cat_id']].nunique()

store_id    10
cat_id       3
dtype: int64

<br />

### 2.3 Date Range of the `sales` Data

In [29]:
start_date = df['date'].min()
end_date = df['date'].max()

In [30]:
('Start Date', start_date), ('End Date', end_date)

(('Start Date', Timestamp('2011-01-29 00:00:00')),
 ('End Date', Timestamp('2016-05-22 00:00:00')))

The `sales` data runs from 29 Jan 2011 to 22 May 2016.
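As a quick sanity check, the length of this range can be computed directly from the two timestamps. The sketch below hard-codes the dates shown above instead of reading `sales.csv`, and `n_days` is an illustrative name that does not appear in the original notebook.

```python
import pandas as pd

# Dates copied from the output above (not read from sales.csv)
start_date = pd.Timestamp('2011-01-29')
end_date = pd.Timestamp('2016-05-22')

# Inclusive number of calendar days in the range
n_days = (end_date - start_date).days + 1
print(n_days)  # 1941
```

With 10 stores and 3 categories, a complete daily panel would hold at most 1941 × 30 = 58,230 rows, which matches the row count reported by `df.info()`.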

<br />

---

## 3.0 Process the Data

### 3.1 Duplicated Dates Within a Store and Category Group

In [25]:
df[df.duplicated(['date', 'store_id', 'cat_id'])]

| | date | store_id | cat_id | sales |
|---|---|---|---|---|


The result is empty, so there are no duplicated dates within any store and category group.
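By default, `DataFrame.duplicated` flags only the second and later occurrences of a key, so the first copy of a duplicated row is not shown. The sketch below uses a small made-up frame (`toy` is hypothetical and not part of the sales data) to illustrate the difference between the default and `keep=False`.

```python
import pandas as pd

# Hypothetical toy data: the first two rows share the same key
toy = pd.DataFrame({
    'date': pd.to_datetime(['2011-01-29', '2011-01-29', '2011-01-30']),
    'store_id': ['TX_1', 'TX_1', 'TX_1'],
    'cat_id': ['FOODS', 'FOODS', 'FOODS'],
    'sales': [10.0, 12.0, 9.0],
})
keys = ['date', 'store_id', 'cat_id']

# Default keep='first': only the repeated occurrence is flagged
later_copies = toy[toy.duplicated(keys)]

# keep=False: every row that belongs to a duplicated key is flagged
all_copies = toy[toy.duplicated(keys, keep=False)]

print(len(later_copies), len(all_copies))  # 1 2
```

Either form returns an empty frame when no key is repeated, so the conclusion above holds regardless of the `keep` setting.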

<br />

### 3.2 Check for Missing Dates

In [41]:
daily_df = pd.date_range(start=start_date, end=end_date).to_frame(name='daily_date')

In [46]:
pd.merge(daily_df, df, left_index=True, right_on='date', how='left').isnull().sum()

daily_date    0
date          0
store_id      0
cat_id        0
sales         0
dtype: int64

No calendar date between the start and end dates is missing from the `sales` data as a whole.
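The merge above only shows that each calendar date appears at least once somewhere in the data; a date missing from a single store and category group would go unnoticed. One possible per-group check, sketched here on made-up data (`toy`, `full_range`, `counts` and `incomplete` are illustrative names, not from the original notebook), is to count the distinct dates in each group and compare against the full range.

```python
import pandas as pd

# Hypothetical toy data: TX_1 has no row for 2011-01-30
toy = pd.DataFrame({
    'date': pd.to_datetime(['2011-01-29', '2011-01-31',
                            '2011-01-29', '2011-01-30', '2011-01-31']),
    'store_id': ['TX_1', 'TX_1', 'TX_2', 'TX_2', 'TX_2'],
    'cat_id': ['FOODS'] * 5,
    'sales': [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Every calendar day between the first and last observed date
full_range = pd.date_range(toy['date'].min(), toy['date'].max(), freq='D')

# Distinct dates per (store_id, cat_id) group
counts = toy.groupby(['store_id', 'cat_id'])['date'].nunique()

# Groups with fewer distinct dates than the full range have gaps
incomplete = counts[counts < len(full_range)]
print(incomplete)

# List the exact dates each incomplete group is missing
for (store, cat) in incomplete.index:
    mask = (toy['store_id'] == store) & (toy['cat_id'] == cat)
    present = pd.DatetimeIndex(toy.loc[mask, 'date'])
    print(store, cat, full_range.difference(present).tolist())
```

On the real data, replacing `toy` with `df` should leave `incomplete` empty if every group is complete.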

<br />

### 3.3 Check Outliers with the IQR Method