# Getting Data Ready

Forecasting supports a variety of applications and business use cases. For example, retailers forecast product sales to decide how much stock each location needs; manufacturers estimate the number of parts required at their factories to optimize their supply chain; businesses estimate their flexible workforce needs; utilities forecast electricity consumption to run an efficient energy network; and enterprises estimate their cloud infrastructure needs.

<img src="https://amazon-forecast-samples.s3-us-west-2.amazonaws.com/common/images/forecast_overview_steps.png" width="98%">

In this notebook we walk through the first steps, outlined in the left box above.


## Table Of Contents
* Step 1: [Setup Amazon Forecast](#setup)
* Step 2: [Prepare the Datasets](#DataPrep)
* Step 3: [Create the Dataset Group and Dataset](#DataSet)
* Step 4: [Create the Target Time Series Data Import Job](#DataImport)
* [Next Steps](#nextSteps)

For more information about the APIs, see the [documentation](https://docs.aws.amazon.com/forecast/latest/dg/what-is-forecast.html).

## Step 1: Setup Amazon Forecast<a class="anchor" id="setup"></a>

This section sets up the permissions and relevant endpoints.

In [1]:
!pip install boto3 --upgrade

Collecting boto3
  Downloading boto3-1.21.9-py3-none-any.whl (132 kB)
     |████████████████████████████████| 132 kB 7.9 MB/s            
Collecting botocore<1.25.0,>=1.24.9
  Downloading botocore-1.24.9-py3-none-any.whl (8.6 MB)
     |████████████████████████████████| 8.6 MB 64.8 MB/s            
Installing collected packages: botocore, boto3
  Attempting uninstall: botocore
    Found existing installation: botocore 1.23.46
    Uninstalling botocore-1.23.46:
      Successfully uninstalled botocore-1.23.46
  Attempting uninstall: boto3
    Found existing installation: boto3 1.20.46
    Uninstalling boto3-1.20.46:
      Successfully uninstalled boto3-1.20.46
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
awscli 1.22.46 requires botocore==1.23.46, but you have botocore 1.24.9 which is incompatible.
aiobotocore 1.3.0 requires botocore<1.20.50,>=1.20.49

In [2]:
import sys
import os
import pandas as pd

# importing forecast notebook utility from notebooks/common directory
sys.path.insert( 0, os.path.abspath("./common") )
import util

%reload_ext autoreload
import boto3
import s3fs

In [3]:
# What is your forecast horizon, counted in the time units you've selected?
# e.g. if you're forecasting in months, how many months out do you want a forecast?
FORECAST_LENGTH = 8

# What is your forecast time unit granularity?
# Choices are: ^Y|M|W|D|H|30min|15min|10min|5min|1min$ 
DATASET_FREQUENCY = "W"
TIMESTAMP_FORMAT = "yyyy-MM-dd"

# What name do you want to give this project?  
# We will use this same name for your Forecast Dataset Group name.
PROJECT = 'm5_sku_prediction_2m'
DATA_VERSION = 4
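
Before moving on, it can help to sanity-check these settings. The sketch below is a minimal example, not part of the sample utilities: it checks a frequency against the choices listed in the comment above and checks that a timestamp string parses under the chosen format. The names `check_frequency`, `check_timestamp`, and the mapping from the `yyyy-MM-dd` style pattern to a Python `strptime` pattern are assumptions made for illustration.

```python
from datetime import datetime

# Frequencies listed in the configuration comment above.
ALLOWED_FREQUENCIES = ("Y", "M", "W", "D", "H", "30min", "15min", "10min", "5min", "1min")

# Assumed mapping from Forecast-style timestamp patterns to Python strptime patterns.
STRPTIME_FORMATS = {
    "yyyy-MM-dd": "%Y-%m-%d",
    "yyyy-MM-dd HH:mm:ss": "%Y-%m-%d %H:%M:%S",
}


def check_frequency(frequency):
    """Return True if the frequency is one of the listed choices."""
    return frequency in ALLOWED_FREQUENCIES


def check_timestamp(value, timestamp_format):
    """Return True if `value` parses under the given Forecast-style format."""
    pattern = STRPTIME_FORMATS[timestamp_format]
    try:
        datetime.strptime(value, pattern)
    except ValueError:
        return False
    return True


print(check_frequency("W"))
print(check_timestamp("2011-01-29", "yyyy-MM-dd"))
print(check_timestamp("29/01/2011", "yyyy-MM-dd"))
```

A plain membership test is used rather than the regular expression from the comment, because alternation in that pattern binds more loosely than the `^` and `$` anchors.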

Configure the S3 bucket name and region name for this lesson.

- The cell below uses the default SageMaker bucket for your account. To use a different bucket, create it in S3 first and assign its name to `bucket_name`.
- The region is read from the current boto3 session. You can use any region in which the service is available.

In [4]:
import boto3 
import sagemaker 

session = boto3.session.Session()
region = session.region_name
client = boto3.client("sts")
account_id = client.get_caller_identity()["Account"]

bucket_name = sagemaker.session.Session().default_bucket()

In [5]:
# Connect API session
session = boto3.Session(region_name=region) 
forecast = session.client(service_name='forecast') 
forecastquery = session.client(service_name='forecastquery')

<b>Create IAM Role for Forecast</b> <br>
Like many AWS services, Forecast will need to assume an IAM role in order to interact with your S3 resources securely. In the sample notebooks, we use the get_or_create_iam_role() utility function to create an IAM role. Please refer to "notebooks/common/util/fcst_utils.py" for implementation.
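
To make "assume an IAM role" concrete, here is a minimal sketch of the kind of trust policy such a role carries. It is an illustration only and not necessarily what `get_or_create_iam_role()` does; the helper name `forecast_trust_policy` is hypothetical. The role would also need permissions to read your S3 data, which are not shown here.

```python
import json


def forecast_trust_policy():
    """Build a trust policy that lets the Amazon Forecast service assume a role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "forecast.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Serialized form, as it would be supplied when creating a role.
policy_json = json.dumps(forecast_trust_policy(), indent=2)
print(policy_json)
```

A document like this is what boto3's `iam.create_role` accepts as its `AssumeRolePolicyDocument` argument.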

In [6]:
# Create the role to provide to Amazon Forecast.
role_name = "ForecastNotebookRole"
print(f"Creating Role {role_name} ...")
role_arn = util.get_or_create_iam_role( role_name = role_name )

# echo user inputs without account
print(f"Success! Created role arn = {role_arn.split('/')[1]}")

Creating Role ForecastNotebookRole ...
Created arn:aws:iam::080438298673:role/ForecastNotebookRole
Attaching policies...
Waiting for a minute to allow IAM role policy attachment to propagate
Done.
Success! Created role arn = ForecastNotebookRole


## Step 2: Prepare the Datasets<a class="anchor" id="DataPrep"></a>

For this exercise, we use retail demand data for food items sold in stores across several states (the file and S3 key names below carry an `m5` prefix). The data has already been processed into `processed_demands.csv`.

To begin, use Pandas to read the CSV and to show a sample of the data.

In [7]:

tts_file = "./processed_demands.csv"
df = pd.read_csv(tts_file, header=0)
df

Unnamed: 0,item_id,timestamp,demand,store_id,state_id
0,FOODS_1_001,2011-01-29,3,CA_1,CA
1,FOODS_1_005,2011-01-29,3,CA_1,CA
2,FOODS_1_011,2011-01-29,2,CA_1,CA
3,FOODS_1_013,2011-01-29,2,CA_1,CA
4,FOODS_1_016,2011-01-29,4,CA_1,CA
...,...,...,...,...,...
2066732,FOODS_2_078,2016-04-24,3,WI_3,WI
2066733,FOODS_2_081,2016-04-24,3,WI_3,WI
2066734,FOODS_2_082,2016-04-24,1,WI_3,WI
2066735,FOODS_2_083,2016-04-24,1,WI_3,WI


Notice that the output above includes the 3 pieces of information Amazon Forecast requires:

1. The timestamp (`timestamp`)
1. A value (`demand`)
1. An item ID (`item_id`)

These 3 must always be present to generate a forecast with Amazon Forecast. More columns can be added, as with `store_id` and `state_id` here.
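
A quick way to confirm the required columns are present is to inspect the CSV header. The following is a small sketch using only the standard library; the function name `missing_columns` and the inline sample are made up for illustration, with values copied from the first rows shown above.

```python
import csv
import io

# The three required pieces of information, using this notebook's column names.
REQUIRED_COLUMNS = ("item_id", "timestamp", "demand")


def missing_columns(csv_text, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [name for name in required if name not in header]


# A tiny sample shaped like the output above.
sample = (
    "item_id,timestamp,demand,store_id,state_id\n"
    "FOODS_1_001,2011-01-29,3,CA_1,CA\n"
    "FOODS_1_005,2011-01-29,3,CA_1,CA\n"
)

print(missing_columns(sample))
print(missing_columns("item_id,timestamp\nFOODS_1_001,2011-01-29\n"))
```

An empty list means nothing is missing; the second call reports `demand` as absent.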

The rows shown above run from 2011-01-29 to 2016-04-24. The file is uploaded as-is in the cells below.

You may notice a variable named `df`. This is a popular convention for a Pandas dataframe object, which is similar to a table in a database. You can learn more here: https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html


At this time the data is ready to be sent to S3 where Forecast will use it later. The following cells will upload the data to S3.
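
The upload cells build an object key from the project name and data version, and later cells turn that key into an S3 URI. A minimal sketch of that naming scheme follows; `build_key`, `s3_uri`, and the bucket name `example-bucket` are hypothetical and used only for illustration.

```python
def build_key(project, data_version, file_name, prefix="m5"):
    """Compose an object key of the form <prefix>/<project>_<version>/<file>."""
    return f"{prefix}/{project}_{data_version}/{file_name}"


def s3_uri(bucket, key):
    """Compose an S3 URI from a bucket name and an object key."""
    return f"s3://{bucket}/{key.lstrip('/')}"


key = build_key("m5_sku_prediction_2m", 4, "m5-demand-time-train.csv")
print(key)
print(s3_uri("example-bucket", key))  # "example-bucket" is a placeholder name
```

Keeping the version in the key means each data revision lands under its own prefix instead of overwriting the previous upload.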

In [8]:
tt_key=f"m5/{PROJECT}_{DATA_VERSION}/m5-demand-time-train.csv"

boto3.Session().resource('s3').Bucket(bucket_name).Object(tt_key).upload_file(tts_file)

### Prepare Meta Data 

In [9]:
meta_file = './item_meta.csv'

meta_df = pd.read_csv(meta_file, header=0)

In [10]:
meta_df.head()

Unnamed: 0,item_id,dept_id,cat_id
0,FOODS_1_001,FOODS_1,FOODS
1,FOODS_1_002,FOODS_1,FOODS
2,FOODS_1_003,FOODS_1,FOODS
3,FOODS_1_004,FOODS_1,FOODS
4,FOODS_1_005,FOODS_1,FOODS


In [11]:
meta_key=f"m5/{PROJECT}_{DATA_VERSION}/m5-item-meta.csv"
boto3.Session().resource('s3').Bucket(bucket_name).Object(meta_key).upload_file(meta_file)

### Prepare Related Time Series 

In [12]:
# rts_file = './related_ts.csv'
rts_file = './related_ts.csv'
rts_df = pd.read_csv(rts_file, header=0)


In [13]:
rts_df

Unnamed: 0,item_id,timestamp,store_id,state_id,event_type_1,event_type_2,snap_CA,snap_TX,snap_WI,sell_price
0,FOODS_1_001,2011-01-29,CA_1,CA,unknown,unknown,0,0,0,2.00
1,FOODS_1_002,2011-01-29,CA_1,CA,unknown,unknown,0,0,0,7.88
2,FOODS_1_003,2011-01-29,CA_1,CA,unknown,unknown,0,0,0,2.88
3,FOODS_1_004,2011-01-29,CA_1,CA,unknown,unknown,0,0,0,
4,FOODS_1_005,2011-01-29,CA_1,CA,unknown,unknown,0,0,0,2.94
...,...,...,...,...,...,...,...,...,...,...
5738995,FOODS_2_080,2016-04-24,WI_3,WI,unknown,unknown,0,0,0,2.18
5738996,FOODS_2_081,2016-04-24,WI_3,WI,unknown,unknown,0,0,0,2.38
5738997,FOODS_2_082,2016-04-24,WI_3,WI,unknown,unknown,0,0,0,5.98
5738998,FOODS_2_083,2016-04-24,WI_3,WI,unknown,unknown,0,0,0,6.72


In [14]:
rts_key=f"m5/{PROJECT}_{DATA_VERSION}/m5-rts.csv"
boto3.Session().resource('s3').Bucket(bucket_name).Object(rts_key).upload_file(rts_file)

## Step 3: Create the Dataset Group and Dataset <a class="anchor" id="DataSet"></a>

In Amazon Forecast, a dataset is a collection of one or more files containing data relevant to a forecasting task. A dataset must conform to a schema provided by Amazon Forecast. Because data files are imported without headers, it is important to define a schema for your data.

More details about `Domain` and dataset type can be found in the [documentation](https://docs.aws.amazon.com/forecast/latest/dg/howitworks-domains-ds-types.html). For this example, we are using the `RETAIL` domain (the value passed to `create_dataset_group` and `create_dataset` below), whose target time series uses the attributes `item_id`, `timestamp`, and `demand`.
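
Because the schema is what tells Forecast which column is which, the order of the schema attributes should match the order of the columns in the file. The sketch below compares the two position by position. It is an illustrative helper, not part of the Forecast API; `schema_mismatches` and `example_schema` are names invented here.

```python
from itertools import zip_longest


def schema_mismatches(schema, csv_columns):
    """Return (position, schema_name, csv_name) for every position that differs."""
    names = [attr["AttributeName"] for attr in schema["Attributes"]]
    pairs = zip_longest(names, csv_columns)
    return [(i, s, c) for i, (s, c) in enumerate(pairs) if s != c]


# A shortened schema in the same shape as the ones defined in this notebook.
example_schema = {
    "Attributes": [
        {"AttributeName": "item_id", "AttributeType": "string"},
        {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        {"AttributeName": "demand", "AttributeType": "float"},
    ]
}

print(schema_mismatches(example_schema, ["item_id", "timestamp", "demand"]))
print(schema_mismatches(example_schema, ["timestamp", "item_id", "demand"]))
```

An empty list means the orders agree. A missing or extra column shows up as a pair with `None` on one side.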


The dataset names and settings below depend on the choices you made in Step 1:
<ol>
    <li><b>How many time units do you want to forecast?</b> For example, if your time unit is Hour and you want to forecast 1 week ahead, that is 24*7 = 168 hours, so answer = 168. </li>
    <li><b>What is the time granularity for your data?</b> For example, if your time unit is Hour, answer = "H". </li>
    <li><b>What name do you want to give this project (Dataset Group name)?</b> Using the same name for your files and for your Forecast DatasetGroup sets you up for reproducibility. </li>
    </ol>
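
The horizon arithmetic in the first item can be sketched as follows. This is a minimal illustration; the helper `horizon_steps` and the `STEP_LENGTH` table are assumptions made here, and the variable-length units "Y" and "M" are left out on purpose.

```python
from datetime import timedelta

# Fixed-length frequencies only; "Y" and "M" vary in length and are omitted.
STEP_LENGTH = {
    "W": timedelta(weeks=1),
    "D": timedelta(days=1),
    "H": timedelta(hours=1),
    "30min": timedelta(minutes=30),
    "15min": timedelta(minutes=15),
    "10min": timedelta(minutes=10),
    "5min": timedelta(minutes=5),
    "1min": timedelta(minutes=1),
}


def horizon_steps(span, frequency):
    """Return how many whole time units of `frequency` fit in `span`."""
    steps, remainder = divmod(span, STEP_LENGTH[frequency])
    if remainder:
        raise ValueError(f"{span} is not a whole number of '{frequency}' units")
    return steps


print(horizon_steps(timedelta(weeks=1), "H"))  # one week ahead at hourly data
print(horizon_steps(timedelta(weeks=8), "W"))  # eight weeks ahead at weekly data
```

The first call reproduces the 24*7 = 168 example, and the second matches a weekly frequency with a horizon of 8.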

### Create the Dataset Group

In this task, we define a container name or Dataset Group name, which will be used to keep track of Dataset import files, schema, and all Forecast results which go together.
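
Since the same name is reused for the Dataset Group, the datasets, and the import jobs, it is worth checking it once up front. The sketch below assumes the naming rule we understand the Forecast API reference to give for resource names (starts with a letter, then letters, digits, or underscores, at most 63 characters); treat both the pattern and the limit as assumptions to confirm against the current documentation. `is_valid_name` is a hypothetical helper.

```python
import re

# Assumed naming rule: a letter first, then letters, digits, or underscores.
NAME_PATTERN = re.compile(r"[a-zA-Z][a-zA-Z0-9_]*")
MAX_NAME_LENGTH = 63  # assumed upper bound on name length


def is_valid_name(name):
    """Return True if `name` satisfies the assumed naming rule."""
    return len(name) <= MAX_NAME_LENGTH and NAME_PATTERN.fullmatch(name) is not None


print(is_valid_name("m5_sku_prediction_2m_4"))
print(is_valid_name("m5-sku-prediction"))
print(is_valid_name("4_m5"))
```

Under this rule, hyphens and a leading digit are rejected, which is why the project name in this notebook uses underscores.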


In [15]:
dataset_group = f"{PROJECT}_{DATA_VERSION}"
print(f"Dataset Group Name = {dataset_group}")

Dataset Group Name = m5_sku_prediction_2m_4


In [16]:
dataset_arns = []
create_dataset_group_response = \
    forecast.create_dataset_group(Domain="RETAIL",
                                  DatasetGroupName=dataset_group,
                                  DatasetArns=dataset_arns)

In [17]:
dataset_group_arn = create_dataset_group_response['DatasetGroupArn']

In [18]:
dataset_group_arn

'arn:aws:forecast:us-west-2:080438298673:dataset-group/m5_sku_prediction_2m_4'

In [19]:
forecast.describe_dataset_group(DatasetGroupArn=dataset_group_arn)

{'DatasetGroupName': 'm5_sku_prediction_2m_4',
 'DatasetGroupArn': 'arn:aws:forecast:us-west-2:080438298673:dataset-group/m5_sku_prediction_2m_4',
 'DatasetArns': [],
 'Domain': 'RETAIL',
 'Status': 'ACTIVE',
 'CreationTime': datetime.datetime(2022, 3, 1, 13, 25, 13, 19000, tzinfo=tzlocal()),
 'LastModificationTime': datetime.datetime(2022, 3, 1, 13, 25, 13, 19000, tzinfo=tzlocal()),
 'ResponseMetadata': {'RequestId': 'b32bd28f-dc34-4200-8704-255b3160d0ae',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'content-type': 'application/x-amz-json-1.1',
   'date': 'Tue, 01 Mar 2022 13:25:13 GMT',
   'x-amzn-requestid': 'b32bd28f-dc34-4200-8704-255b3160d0ae',
   'content-length': '267',
   'connection': 'keep-alive'},
  'RetryAttempts': 0}}

### Create the Schema

In [20]:
# Column order in the CSV: item_id, timestamp, demand, store_id, state_id
ts_schema ={
   "Attributes":[
      {
         "AttributeName":"item_id",
         "AttributeType":"string"
      },
      {
         "AttributeName":"timestamp",
         "AttributeType":"timestamp"
      },
      {
         "AttributeName":"demand",
         "AttributeType":"float"
      },
      {
         "AttributeName":"store_id",
         "AttributeType":"string"
      },
      {
         "AttributeName":"state_id",
         "AttributeType":"string"
      }

   ]
}

### Create the Dataset

In [21]:
ts_dataset_name = f"{PROJECT}_{DATA_VERSION}_tts"
print(ts_dataset_name)

m5_sku_prediction_2m_4_tts


In [22]:
response = \
forecast.create_dataset(Domain="RETAIL",
                        DatasetType='TARGET_TIME_SERIES',
                        DatasetName=ts_dataset_name,
                        DataFrequency=DATASET_FREQUENCY,
                        Schema=ts_schema
                       )

In [23]:
ts_dataset_arn = response['DatasetArn']

In [24]:
forecast.describe_dataset(DatasetArn=ts_dataset_arn)

{'DatasetArn': 'arn:aws:forecast:us-west-2:080438298673:dataset/m5_sku_prediction_2m_4_tts',
 'DatasetName': 'm5_sku_prediction_2m_4_tts',
 'Domain': 'RETAIL',
 'DatasetType': 'TARGET_TIME_SERIES',
 'DataFrequency': 'W',
 'Schema': {'Attributes': [{'AttributeName': 'item_id',
    'AttributeType': 'string'},
   {'AttributeName': 'timestamp', 'AttributeType': 'timestamp'},
   {'AttributeName': 'demand', 'AttributeType': 'float'},
   {'AttributeName': 'store_id', 'AttributeType': 'string'},
   {'AttributeName': 'state_id', 'AttributeType': 'string'}]},
 'EncryptionConfig': {},
 'Status': 'ACTIVE',
 'CreationTime': datetime.datetime(2022, 3, 1, 13, 25, 13, 188000, tzinfo=tzlocal()),
 'LastModificationTime': datetime.datetime(2022, 3, 1, 13, 25, 13, 188000, tzinfo=tzlocal()),
 'ResponseMetadata': {'RequestId': 'eb042f52-6f56-4b5d-b4fb-7770efbc9039',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'content-type': 'application/x-amz-json-1.1',
   'date': 'Tue, 01 Mar 2022 13:25:12 GMT',
   'x-amzn

### Create Meta Schema 

In [25]:
# Column order in the CSV: item_id, dept_id, cat_id
meta_schema ={
   "Attributes":[
      {
         "AttributeName":"item_id",
         "AttributeType":"string"
      }, 
      
      {
         "AttributeName":"dept_id",
         "AttributeType":"string"
      },
      {
         "AttributeName":"cat_id",
         "AttributeType":"string"
      }
   ]
}

In [26]:
meta_dataset_name = f"{PROJECT}_{DATA_VERSION}_mt"
print(meta_dataset_name)

m5_sku_prediction_2m_4_mt


In [27]:
response = \
forecast.create_dataset(Domain="RETAIL",
                        DatasetType='ITEM_METADATA',
                        DatasetName=meta_dataset_name,
                        Schema=meta_schema
                       )

In [28]:
meta_dataset_arn = response['DatasetArn']

In [29]:
forecast.describe_dataset(DatasetArn = meta_dataset_arn)

{'DatasetArn': 'arn:aws:forecast:us-west-2:080438298673:dataset/m5_sku_prediction_2m_4_mt',
 'DatasetName': 'm5_sku_prediction_2m_4_mt',
 'Domain': 'RETAIL',
 'DatasetType': 'ITEM_METADATA',
 'Schema': {'Attributes': [{'AttributeName': 'item_id',
    'AttributeType': 'string'},
   {'AttributeName': 'dept_id', 'AttributeType': 'string'},
   {'AttributeName': 'cat_id', 'AttributeType': 'string'}]},
 'EncryptionConfig': {},
 'Status': 'ACTIVE',
 'CreationTime': datetime.datetime(2022, 3, 1, 13, 25, 13, 374000, tzinfo=tzlocal()),
 'LastModificationTime': datetime.datetime(2022, 3, 1, 13, 25, 13, 374000, tzinfo=tzlocal()),
 'ResponseMetadata': {'RequestId': '0fcd5500-d9d0-4353-a96e-3f1ca50dd2f3',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'content-type': 'application/x-amz-json-1.1',
   'date': 'Tue, 01 Mar 2022 13:25:13 GMT',
   'x-amzn-requestid': '0fcd5500-d9d0-4353-a96e-3f1ca50dd2f3',
   'content-length': '476',
   'connection': 'keep-alive'},
  'RetryAttempts': 0}}

### Create Related Time Series Schema 

In [30]:

# Column order in the CSV:
# item_id, timestamp, store_id, state_id, event_type_1, event_type_2, snap_CA, snap_TX, snap_WI, sell_price

rts_schema ={
    "Attributes": [
        {
             "AttributeName":"item_id",
             "AttributeType":"string"
        },
        {
             "AttributeName":"timestamp",
             "AttributeType":"timestamp"
        },
        {
            "AttributeName": "store_id",
            "AttributeType": "string"
        },
        {
            "AttributeName": "state_id",
            "AttributeType": "string"
        },
        
        {
            "AttributeName": "event_type_1",
            "AttributeType": "string"
        },
        {
            "AttributeName": "event_type_2",
            "AttributeType": "string"
        },
        {
            "AttributeName": "snap_CA",
            "AttributeType": "string"
        },
        {
            "AttributeName": "snap_TX",
            "AttributeType": "string"
        },
        {
            "AttributeName": "snap_WI",
            "AttributeType": "string"
        },
        {
            "AttributeName": "sell_price",
            "AttributeType": "float"
        },
#         {
#             "AttributeName": "rolling_mean_t1",
#             "AttributeType": "float"
#         },
#         {
#             "AttributeName": "rolling_mean_t2",
#             "AttributeType": "float"
#         },
#         {
#             "AttributeName": "rolling_mean_t4",
#             "AttributeType": "float"
#         },
#         {
#             "AttributeName": "rolling_mean_t12",
#             "AttributeType": "float"
#         },
#         {
#             "AttributeName": "rolling_mean_t24",
#             "AttributeType": "float"
#         }
    ]
}


In [31]:
rts_dataset_name = f"{PROJECT}_{DATA_VERSION}_rts"
print(rts_dataset_name)

m5_sku_prediction_2m_4_rts


In [32]:
response = \
forecast.create_dataset(Domain="RETAIL",
                        DatasetType='RELATED_TIME_SERIES',
                        DatasetName=rts_dataset_name,
                        DataFrequency=DATASET_FREQUENCY,
                        Schema=rts_schema
                       )

In [33]:
rts_dataset_arn = response['DatasetArn']

In [34]:
forecast.describe_dataset(DatasetArn = rts_dataset_arn)

{'DatasetArn': 'arn:aws:forecast:us-west-2:080438298673:dataset/m5_sku_prediction_2m_4_rts',
 'DatasetName': 'm5_sku_prediction_2m_4_rts',
 'Domain': 'RETAIL',
 'DatasetType': 'RELATED_TIME_SERIES',
 'DataFrequency': 'W',
 'Schema': {'Attributes': [{'AttributeName': 'item_id',
    'AttributeType': 'string'},
   {'AttributeName': 'timestamp', 'AttributeType': 'timestamp'},
   {'AttributeName': 'store_id', 'AttributeType': 'string'},
   {'AttributeName': 'state_id', 'AttributeType': 'string'},
   {'AttributeName': 'event_type_1', 'AttributeType': 'string'},
   {'AttributeName': 'event_type_2', 'AttributeType': 'string'},
   {'AttributeName': 'snap_CA', 'AttributeType': 'string'},
   {'AttributeName': 'snap_TX', 'AttributeType': 'string'},
   {'AttributeName': 'snap_WI', 'AttributeType': 'string'},
   {'AttributeName': 'sell_price', 'AttributeType': 'float'}]},
 'EncryptionConfig': {},
 'Status': 'ACTIVE',
 'CreationTime': datetime.datetime(2022, 3, 1, 13, 25, 13, 505000, tzinfo=tzlocal()

### Update the dataset group with the datasets we created
You can have multiple datasets under the same dataset group. Update it with the datasets we created before.

In [35]:
dataset_arns = []
dataset_arns.append(ts_dataset_arn)
dataset_arns.append(rts_dataset_arn)
dataset_arns.append(meta_dataset_arn)
forecast.update_dataset_group(DatasetGroupArn=dataset_group_arn, DatasetArns=dataset_arns)

{'ResponseMetadata': {'RequestId': '4766f62e-f557-4e4c-a767-ab7d0291123a',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'content-type': 'application/x-amz-json-1.1',
   'date': 'Tue, 01 Mar 2022 13:25:13 GMT',
   'x-amzn-requestid': '4766f62e-f557-4e4c-a767-ab7d0291123a',
   'content-length': '2',
   'connection': 'keep-alive'},
  'RetryAttempts': 0}}

## Step 4: Create a Target Time Series Dataset Import Job <a class="anchor" id="DataImport"></a>


Now that Forecast knows how to interpret the CSV we are providing, the next step is to import the data from S3 into Amazon Forecast.

In [36]:
# Recall path to your data
ts_s3_data_path = "s3://"+bucket_name+"/"+tt_key
print(f"S3 URI for your data file = {ts_s3_data_path}")

S3 URI for your data file = s3://sagemaker-us-west-2-080438298673/m5/m5_sku_prediction_2m_4/m5-demand-time-train.csv


In [37]:
ts_dataset_import_job_response = \
    forecast.create_dataset_import_job(DatasetImportJobName=dataset_group,
                                       DatasetArn=ts_dataset_arn,
                                       DataSource= {
                                         "S3Config" : {
                                             "Path": ts_s3_data_path,
                                             "RoleArn": role_arn
                                         } 
                                       },
                                       TimestampFormat=TIMESTAMP_FORMAT)

In [38]:
ts_dataset_import_job_arn=ts_dataset_import_job_response['DatasetImportJobArn']
ts_dataset_import_job_arn

'arn:aws:forecast:us-west-2:080438298673:dataset-import-job/m5_sku_prediction_2m_4_tts/m5_sku_prediction_2m_4'

Check the status of the dataset import job. When the status changes from **CREATE_IN_PROGRESS** to **ACTIVE**, we can continue to the next steps. Depending on the data size, this typically takes 5 to 10 minutes.
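
The waiting logic can be sketched as a simple polling loop. This is an illustration under stated assumptions, not the implementation of `util.wait`: the function name `wait_for_active`, its parameters, and the stand-in describe function are all invented here so the example runs without AWS access.

```python
import time


def wait_for_active(describe, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll describe() until Status is ACTIVE (True) or a failure state (False)."""
    for _ in range(max_polls):
        status = describe()["Status"]
        if status == "ACTIVE":
            return True
        if status.endswith("FAILED"):
            return False
        sleep(poll_seconds)
    # Gave up after max_polls attempts without reaching a final state.
    return False


# Stand-in for forecast.describe_dataset_import_job, replaying a fixed sequence.
statuses = iter(["CREATE_PENDING", "CREATE_IN_PROGRESS", "ACTIVE"])
result = wait_for_active(lambda: {"Status": next(statuses)}, sleep=lambda seconds: None)
print(result)
```

Passing the describe call in as a function keeps the loop independent of any one resource type, so the same helper could watch a dataset import job or any other resource that reports a `Status` field.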

In [39]:
status = util.wait(lambda: forecast.describe_dataset_import_job(DatasetImportJobArn=ts_dataset_import_job_arn))
assert status

CREATE_PENDING ..
CREATE_IN_PROGRESS ..........................................
ACTIVE 


In [40]:
forecast.describe_dataset_import_job(DatasetImportJobArn=ts_dataset_import_job_arn)

{'DatasetImportJobName': 'm5_sku_prediction_2m_4',
 'DatasetImportJobArn': 'arn:aws:forecast:us-west-2:080438298673:dataset-import-job/m5_sku_prediction_2m_4_tts/m5_sku_prediction_2m_4',
 'DatasetArn': 'arn:aws:forecast:us-west-2:080438298673:dataset/m5_sku_prediction_2m_4_tts',
 'TimestampFormat': 'yyyy-MM-dd',
 'UseGeolocationForTimeZone': False,
 'DataSource': {'S3Config': {'Path': 's3://sagemaker-us-west-2-080438298673/m5/m5_sku_prediction_2m_4/m5-demand-time-train.csv',
   'RoleArn': 'arn:aws:iam::080438298673:role/ForecastNotebookRole'}},
 'FieldStatistics': {'demand': {'Count': 2066737,
   'CountDistinct': 122,
   'CountNull': 0,
   'CountNan': 0,
   'Min': '1.0',
   'Max': '166.0',
   'Avg': 3.2724187934894475,
   'Stddev': 4.081777847905018,
   'CountLong': 2066737,
   'CountDistinctLong': 122,
   'CountNullLong': 0,
   'CountNanLong': 0},
  'item_id': {'Count': 2066737,
   'CountDistinct': 300,
   'CountNull': 0,
   'CountLong': 2066737,
   'CountDistinctLong': 300,
   'Count

## Step 5: Create an Item Metadata Dataset Import Job <a class="anchor" id="DataImport"></a>


With the item metadata schema defined, the next step is to import the item metadata file from S3 into Amazon Forecast.

In [41]:
# Recall path to your data
meta_s3_data_path = "s3://"+bucket_name+"/"+meta_key
print(f"S3 URI for your data file = {meta_s3_data_path}")

S3 URI for your data file = s3://sagemaker-us-west-2-080438298673/m5/m5_sku_prediction_2m_4/m5-item-meta.csv


In [42]:
meta_dataset_import_job_response = \
    forecast.create_dataset_import_job(DatasetImportJobName=dataset_group,
                                       DatasetArn=meta_dataset_arn,
                                       DataSource= {
                                         "S3Config" : {
                                             "Path": meta_s3_data_path,
                                             "RoleArn": role_arn
                                         },  
                                       })

In [43]:
meta_dataset_arn

'arn:aws:forecast:us-west-2:080438298673:dataset/m5_sku_prediction_2m_4_mt'

In [44]:
meta_dataset_import_job_arn=meta_dataset_import_job_response['DatasetImportJobArn']
meta_dataset_import_job_arn

'arn:aws:forecast:us-west-2:080438298673:dataset-import-job/m5_sku_prediction_2m_4_mt/m5_sku_prediction_2m_4'

In [45]:
dataset_group

'm5_sku_prediction_2m_4'

In [46]:
status = util.wait(lambda: forecast.describe_dataset_import_job(DatasetImportJobArn=meta_dataset_import_job_arn))
assert status

CREATE_PENDING 
CREATE_IN_PROGRESS ...........
ACTIVE 


## Step 6: Create a Related Time Series Dataset Import Job <a class="anchor" id="DataImport"></a>


With the related time series schema defined, the next step is to import the related time series file from S3 into Amazon Forecast.

In [47]:
# Recall path to your data
rts_s3_data_path = "s3://"+bucket_name+"/"+rts_key
print(f"S3 URI for your data file = {rts_s3_data_path}")

S3 URI for your data file = s3://sagemaker-us-west-2-080438298673/m5/m5_sku_prediction_2m_4/m5-rts.csv


In [48]:
rts_dataset_import_job_response = \
    forecast.create_dataset_import_job(DatasetImportJobName=dataset_group,
                                       DatasetArn=rts_dataset_arn,
                                       DataSource= {
                                         "S3Config" : {
                                             "Path": rts_s3_data_path,
                                             "RoleArn": role_arn
                                         } 
                                       }, 
                                       TimestampFormat=TIMESTAMP_FORMAT)

In [49]:
rts_s3_data_path

's3://sagemaker-us-west-2-080438298673/m5/m5_sku_prediction_2m_4/m5-rts.csv'

In [50]:
rts_dataset_import_job_arn=rts_dataset_import_job_response['DatasetImportJobArn']
rts_dataset_import_job_arn

'arn:aws:forecast:us-west-2:080438298673:dataset-import-job/m5_sku_prediction_2m_4_rts/m5_sku_prediction_2m_4'

In [51]:
status = util.wait(lambda: forecast.describe_dataset_import_job(DatasetImportJobArn=rts_dataset_import_job_arn))
assert status

CREATE_PENDING .
CREATE_IN_PROGRESS .......................
ACTIVE 


## Next Steps<a class="anchor" id="nextSteps"></a>

At this point you have successfully imported your data into Amazon Forecast, and it is time to build your first model in the next notebook. To continue, execute the cell below to store important variables where the next notebook can use them, then open `2.Building_Your_Predictor.ipynb`.

In [52]:
# Now save your choices for the next notebook 
# %store item_id
%store PROJECT
%store DATA_VERSION
%store FORECAST_LENGTH
%store DATASET_FREQUENCY
%store TIMESTAMP_FORMAT
%store ts_dataset_import_job_arn
%store ts_dataset_arn
%store dataset_group_arn
%store role_arn
%store bucket_name
%store region
%store tt_key

Stored 'PROJECT' (str)
Stored 'DATA_VERSION' (int)
Stored 'FORECAST_LENGTH' (int)
Stored 'DATASET_FREQUENCY' (str)
Stored 'TIMESTAMP_FORMAT' (str)
Stored 'ts_dataset_import_job_arn' (str)
Stored 'ts_dataset_arn' (str)
Stored 'dataset_group_arn' (str)
Stored 'role_arn' (str)
Stored 'bucket_name' (str)
Stored 'region' (str)
Stored 'tt_key' (str)
