In [1]:
library(reticulate)
sagemaker = import('sagemaker')

bucket = sagemaker$Session()$default_bucket()
prefix = "sagemaker/demo-r-byo"

role = sagemaker$get_execution_role()

In [2]:
role

In [17]:
session = sagemaker$Session()

In [18]:
algorithm_name = "rmars"

_Note: Although we could do preliminary data transformations in the notebook, we'll avoid doing so, instead choosing to do those transformations inside the container.  This is not typically the best practice for model efficiency, but provides some benefits in terms of flexibility._

In [19]:
boto3_r = import('boto3')

In [20]:
region = boto3_r$Session()$region_name
account = boto3_r$client('sts')$get_caller_identity()$Account

Now we'll create an estimator using the [SageMaker Python SDK](https://github.com/aws/sagemaker-python-sdk).  This allows us to specify:
- The training container image in ECR
- The IAM role that controls permissions for accessing the S3 data and executing SageMaker functions
- Number and type of training instances
- S3 path for model artifacts to be output to
- Any hyperparameters that we want to have the same value across all training jobs during tuning
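
As a minimal sketch of how two of these settings are composed, the snippet below builds the ECR image URI and the S3 output path as plain strings. The account ID, region, and bucket name are placeholder values chosen for illustration; in this notebook they come from `boto3` and the SageMaker session.

```r
library(stringr)

# Placeholder values for illustration only (assumptions, not real resources)
account <- "123456789012"
region <- "eu-west-2"
bucket <- "my-example-bucket"
prefix <- "sagemaker/demo-r-byo"
algorithm_name <- "rmars"

# ECR image URIs have the form <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>
image_uri <- str_glue("{account}.dkr.ecr.{region}.amazonaws.com/{algorithm_name}:latest")

# Location under which the training job writes its model artifacts
output_path <- str_glue("s3://{bucket}/{prefix}/output")

print(image_uri)
print(output_path)
```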

In [21]:
library(tidyverse)

## Data

In [22]:
# Load Airly sensor data (sensor ids = 7201, 7599, 7803; installation ids = 41414, 41816, 42022)
# for the period 15-08-2021 to 22-11-2021 (NO2, T and RH signals only)
data_file <- 'data/data_airly.csv'
data_airly_all <- read_csv(file=data_file)#, col_types = cols("d", "d", "T", "c", "d", "d", "d"))
head(data_airly_all)

New names:
* `` -> ...1

Rows: 342156 Columns: 7

── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (1): status
dbl  (5): ...1, id, humidity, no2, temperature
dttm (1): date

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.



...1,id,date,status,humidity,no2,temperature
<dbl>,<dbl>,<dttm>,<chr>,<dbl>,<dbl>,<dbl>
0,42022,2021-08-15 00:00:00,RAW,71.82511,297.1198,24.94795
1,42022,2021-08-15 00:05:00,RAW,72.20479,297.1906,24.84013
2,42022,2021-08-15 00:10:00,RAW,72.43551,297.3429,24.74854
3,42022,2021-08-15 00:15:00,RAW,72.64743,297.4239,24.66797
4,42022,2021-08-15 00:20:00,RAW,72.76068,297.6219,24.64623
5,42022,2021-08-15 00:25:00,RAW,73.05018,297.7695,24.57725


In [23]:
# Load collocated reference sensor (balcony analyser T200) data for the same period
data_file <- 'data/data_ref-LONDON.csv'
data_ref_all <- read_csv(file=data_file) %>% 
    mutate(Time = lubridate::force_tz(Time, "Europe/London")) %>%   # set time zone (for downloaded data) to pc local tzone
    mutate(date = lubridate::with_tz(Time, "UTC")) # convert time zone to UTC to match airly data
head(data_ref_all)

Rows: 28512 Columns: 4

── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl  (3): NO, NO2, NOX
dttm (1): Time

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.



Time,NO,NO2,NOX,date
<dttm>,<dbl>,<dbl>,<dbl>,<dttm>
2021-08-15 01:00:00,4.43,21.3,25.8,2021-08-15 00:00:00
2021-08-15 01:05:00,3.14,20.8,24.0,2021-08-15 00:05:00
2021-08-15 01:10:00,2.85,20.6,23.4,2021-08-15 00:10:00
2021-08-15 01:15:00,3.6,20.8,24.4,2021-08-15 00:15:00
2021-08-15 01:20:00,4.04,21.2,25.2,2021-08-15 00:20:00
2021-08-15 01:25:00,4.53,22.0,26.5,2021-08-15 00:25:00
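
The two `lubridate` calls in the cell above do different things, which is easy to miss. A minimal sketch on a single timestamp (a toy value matching the first row of the output): `force_tz` keeps the clock reading and changes the instant, while `with_tz` keeps the instant and changes the clock reading.

```r
library(lubridate)

# read_csv parses timestamps as UTC by default, although this clock reading is London local time
t_parsed <- ymd_hms("2021-08-15 01:00:00", tz = "UTC")

# Reinterpret the same clock reading as Europe/London (BST in August, i.e. UTC+1)
t_london <- force_tz(t_parsed, "Europe/London")

# Express that instant in UTC so it lines up with the Airly timestamps
t_utc <- with_tz(t_london, "UTC")

print(format(t_utc, "%Y-%m-%d %H:%M:%S"))
```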


In [24]:
ID = 42022

In [25]:
data_airly_id <- data_airly_all %>% 
    filter(id==ID) %>% 
    select(date, status, no2) %>% 
    pivot_wider(names_from = status, values_from = c(no2))
head(data_airly_id)

date,RAW,AUX,PPB,FINAL
<dttm>,<dbl>,<dbl>,<dbl>,<dbl>
2021-08-15 00:00:00,297.1198,291.0501,26.0064,47.91067
2021-08-15 00:05:00,297.1906,290.9707,26.64045,49.09779
2021-08-15 00:10:00,297.3429,290.9539,27.30686,50.34218
2021-08-15 00:15:00,297.4239,290.9917,27.46796,50.65224
2021-08-15 00:20:00,297.6219,291.08,27.83964,51.33843
2021-08-15 00:25:00,297.7695,291.0375,28.59472,52.74053
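
The reshaping step above turns one row per (timestamp, status) pair into one row per timestamp with a column per status. A minimal sketch on a toy table (values invented for illustration) shows the effect of `pivot_wider`:

```r
library(dplyr)
library(tidyr)

# Toy long-format table: each timestamp appears once per status
toy_long <- tibble(
  date = rep(c("t1", "t2"), each = 2),
  status = rep(c("RAW", "FINAL"), times = 2),
  no2 = c(297.1, 47.9, 297.2, 49.1)
)

# One row per timestamp, one column per status value
toy_wide <- toy_long %>%
  pivot_wider(names_from = status, values_from = no2)

print(toy_wide)
```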


In [26]:
data_airly_TRH <- data_airly_all %>% 
    filter(id==ID) %>% 
    filter(status=="RAW") %>% 
    select(date, temperature, humidity)
head(data_airly_TRH)

date,temperature,humidity
<dttm>,<dbl>,<dbl>
2021-08-15 00:00:00,24.94795,71.82511
2021-08-15 00:05:00,24.84013,72.20479
2021-08-15 00:10:00,24.74854,72.43551
2021-08-15 00:15:00,24.66797,72.64743
2021-08-15 00:20:00,24.64623,72.76068
2021-08-15 00:25:00,24.57725,73.05018


In [28]:
install.packages('openair')
library(openair)

also installing the dependencies ‘jpeg’, ‘latticeExtra’, ‘mapproj’


Updating HTML index of packages in '.Library'

Making 'packages.html' ...
 done


Attaching package: ‘openair’


The following object is masked from ‘package:reticulate’:

    import




In [29]:
data_base <- data_airly_id %>% 
    openair::timeAverage(avg.time = "60 min", statistic = "min", start.date = "2021-08-15 00:00:00")

In [30]:
data_base_TRH <- data_airly_TRH %>% 
    openair::timeAverage(avg.time = "60 min", statistic = "mean", start.date = "2021-08-15 00:00:00")
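
The two `timeAverage` calls above aggregate the 5-minute readings into hourly values (the minimum for the gas signal, the mean for temperature and humidity). The sketch below shows roughly the same idea with `dplyr` and `lubridate` on a toy series. It is an illustration of hourly binning, not openair's implementation, and it ignores options such as data-capture thresholds.

```r
library(dplyr)
library(lubridate)

# Toy 5-minute series covering two hours, with values 1..24 (illustrative only)
times <- ymd_hms("2021-08-15 00:00:00", tz = "UTC") + minutes(seq(0, 115, by = 5))
toy <- tibble(date = times, value = seq_along(times))

# Label each reading with the start of its hour, then summarise per hour
hourly <- toy %>%
  mutate(date = floor_date(date, "hour")) %>%
  group_by(date) %>%
  summarise(mean_value = mean(value), min_value = min(value), .groups = "drop")

print(hourly)
```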

In [31]:
data_base_comb <- cbind(data_base[,1:2], data_base_TRH[,2]) %>% 
    na.exclude()
head(data_base_comb)

Unnamed: 0_level_0,date,RAW,temperature
Unnamed: 0_level_1,<dttm>,<dbl>,<dbl>
1,2021-08-15 00:00:00,297.1198,24.60835
2,2021-08-15 01:00:00,296.3621,23.92445
3,2021-08-15 02:00:00,295.7771,23.37839
4,2021-08-15 03:00:00,295.4487,22.81762
5,2021-08-15 04:00:00,295.5031,22.99725
6,2021-08-15 05:00:00,294.5177,24.34073


In [32]:
base_train <- data_base_comb %>% 
    select(-date)

In [None]:
write_csv(base_train, "data/base_train.csv")

In [33]:
data_airly_idTRH <- merge(data_airly_id, data_airly_TRH)
head(data_airly_idTRH)

Unnamed: 0_level_0,date,RAW,AUX,PPB,FINAL,temperature,humidity
Unnamed: 0_level_1,<dttm>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,2021-08-15 00:00:00,297.1198,291.0501,26.0064,47.91067,24.94795,71.82511
2,2021-08-15 00:05:00,297.1906,290.9707,26.64045,49.09779,24.84013,72.20479
3,2021-08-15 00:10:00,297.3429,290.9539,27.30686,50.34218,24.74854,72.43551
4,2021-08-15 00:15:00,297.4239,290.9917,27.46796,50.65224,24.66797,72.64743
5,2021-08-15 00:20:00,297.6219,291.08,27.83964,51.33843,24.64623,72.76068
6,2021-08-15 00:25:00,297.7695,291.0375,28.59472,52.74053,24.57725,73.05018


In [34]:
# Split the data chronologically: the first 31 days (~30%) for training, the rest (~70%) for testing
split_date <- as.POSIXct("2021-09-15 00:00:00", tz = "UTC")

data_airly_train <- data_airly_idTRH %>%
  filter(date < split_date)

data_airly_test <- data_airly_idTRH %>%
  filter(date >= split_date)

data_ref_train <- data_ref_all %>%
  filter(date < split_date)

data_ref_test <- data_ref_all %>%
  filter(date >= split_date)

In [36]:
head(data_airly_test)

Unnamed: 0_level_0,date,RAW,AUX,PPB,FINAL,temperature,humidity
Unnamed: 0_level_1,<dttm>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,2021-09-15 00:00:00,297.0186,290.4469,30.05517,48.12218,17.34458,84.18478
2,2021-09-15 00:05:00,296.9539,290.4402,29.83899,47.843,17.30262,84.36512
3,2021-09-15 00:10:00,296.9006,290.423,29.70262,47.66598,17.26921,84.36981
4,2021-09-15 00:15:00,297.4361,290.4441,31.75936,50.37513,17.22725,84.76824
5,2021-09-15 00:20:00,297.4461,290.399,31.98491,50.67905,17.1681,85.36808
6,2021-09-15 00:25:00,297.3139,290.4098,31.43061,49.95518,17.1462,85.35519


In [37]:
data_airly_train_1hr <- data_airly_train %>% 
    openair::timeAverage(avg.time = "60 min", statistic = "mean", start.date = "2021-08-15 00:00:00")

data_ref_train_1hr <- data_ref_train %>% 
    openair::timeAverage(avg.time = "60 min", statistic = "mean", start.date = "2021-08-15 00:00:00")

In [38]:
data_train_comb <- inner_join(data_airly_train_1hr, data_ref_train_1hr, by=c("date"="date"))

In [39]:
head(data_train_comb)

date,RAW,AUX,PPB,FINAL,temperature,humidity,Time,NO,NO2,NOX
<dttm>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dttm>,<dbl>,<dbl>,<dbl>
2021-08-15 00:00:00,297.4983,291.0624,27.46018,50.64405,24.60835,72.8412,2021-08-15 01:27:30,4.676667,21.55833,26.225
2021-08-15 01:00:00,296.7222,290.968,25.08844,46.37331,23.92445,74.40823,2021-08-15 02:27:30,4.335833,17.8,22.15
2021-08-15 02:00:00,296.2973,290.9715,23.58629,43.66986,23.37839,75.36103,2021-08-15 03:27:30,4.755,15.29167,20.05
2021-08-15 03:00:00,295.6056,290.8266,21.67852,40.20739,22.81762,75.84851,2021-08-15 04:27:30,3.353333,11.87417,15.225
2021-08-15 04:00:00,295.7697,291.0592,21.3027,39.47796,22.99725,75.73416,2021-08-15 05:27:30,2.855,10.91667,13.775
2021-08-15 05:00:00,295.5761,291.3231,19.16184,35.35252,24.34073,73.23513,2021-08-15 06:27:30,4.393333,13.33333,17.73333


In [41]:
gas_train <- data_train_comb %>% 
    select(NO2, RAW, temperature) %>% 
    na.exclude()

In [42]:
write_csv(gas_train, "data/gas_train.csv")

In [48]:
data_airly_test_1hr <- data_airly_test %>% 
    openair::timeAverage(avg.time = "60 min", statistic = "mean", start.date = "2021-09-15 00:00:00")

data_ref_test_1hr <- data_ref_test %>% 
    openair::timeAverage(avg.time = "60 min", statistic = "mean", start.date = "2021-09-15 00:00:00")

In [49]:
data_test_comb <- inner_join(data_airly_test_1hr, data_ref_test_1hr, by=c("date"="date"))

In [50]:
head(data_test_comb)

date,RAW,AUX,PPB,FINAL,temperature,humidity,Time,NO,NO2,NOX
<dttm>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dttm>,<dbl>,<dbl>,<dbl>
2021-09-15 00:00:00,297.1858,290.4173,30.91114,49.27974,17.10584,85.24315,2021-09-15 01:27:30,6.069167,20.91667,26.98333
2021-09-15 01:00:00,296.6088,290.4185,28.79548,46.56564,16.60335,86.91472,2021-09-15 02:27:30,5.256667,19.25,24.525
2021-09-15 02:00:00,295.9173,290.3378,26.48576,43.57857,16.1131,86.60823,2021-09-15 03:27:30,9.43,16.95,26.36667
2021-09-15 03:00:00,295.9334,290.265,26.94795,44.24168,15.67477,86.84233,2021-09-15 04:27:30,7.5525,16.06667,23.64167
2021-09-15 04:00:00,296.0547,290.2665,27.51941,45.03515,15.39307,87.85701,2021-09-15 05:27:30,20.241667,16.39167,36.60833
2021-09-15 05:00:00,296.4984,290.4345,28.72853,46.63902,15.37075,88.66242,2021-09-15 06:27:30,29.433333,17.25,46.7


In [53]:
gas_test <- data_test_comb %>%
    na.exclude()

In [54]:
write_csv(gas_test, "data/gas_test.csv")

## Upload

In [46]:
session$upload_data(path="data/base_train.csv",
                                      bucket=bucket,
                                      key_prefix=str_glue(prefix, ID, "train", "base", .sep="/"))

In [47]:
session$upload_data(path="data/gas_train.csv",
                                      bucket=bucket,
                                      key_prefix=str_glue(prefix, ID, "train", "gas", .sep="/"))
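
To make the resulting S3 layout explicit, here is a minimal sketch of the key prefixes that the two `str_glue(..., .sep="/")` calls produce, and of how both sit under the single `train` prefix used later as the training channel. Only strings are built here; nothing is uploaded.

```r
library(stringr)

prefix <- "sagemaker/demo-r-byo"
ID <- 42022

# Unnamed arguments are joined with .sep before glue interpolation
base_prefix <- str_glue(prefix, ID, "train", "base", .sep = "/")
gas_prefix <- str_glue(prefix, ID, "train", "gas", .sep = "/")

# Prefix of the training channel; both uploads live beneath it
train_prefix <- str_glue("{prefix}/{ID}/train")

print(base_prefix)
print(gas_prefix)
print(startsWith(c(base_prefix, gas_prefix), train_prefix))
```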

## Train

In [9]:
estimator = sagemaker$estimator$Estimator(
    image_uri=str_glue("{account}.dkr.ecr.{region}.amazonaws.com/rmars:latest"),
    role=role,
    instance_count=1L,
    instance_type="ml.m4.xlarge",
    output_path=str_glue("s3://{bucket}/{prefix}/output"),
    sagemaker_session=session,
    hyperparameters=list('target' = 'RAW',
                       'degree'= 2)
)  # Setting constant hyperparameter

# target defaults to "RAW"; see mars.R, where this is set.

Once we've defined our estimator we can specify the hyperparameters that we'd like to tune and their possible values.  We have three different types of hyperparameters.
- Categorical parameters need to take one value from a discrete set.  We define this by passing the list of possible values to `CategoricalParameter(list)`
- Continuous parameters can take any real number value between the minimum and maximum value, defined by `ContinuousParameter(min, max)`
- Integer parameters can take any integer value between the minimum and maximum value, defined by `IntegerParameter(min, max)`

*Note, if possible, it's almost always best to specify a value as the least restrictive type.  For example, tuning `thresh` as a continuous value between 0.01 and 0.2 is likely to yield a better result than tuning as a categorical parameter with possible values of 0.01, 0.1, 0.15, or 0.2.*
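
A toy illustration of the note above, under loud assumptions: the loss function below is invented (its minimum is placed at `thresh = 0.07`) and a dense grid stands in for a continuous range. It is not the MARS model and not SageMaker's search strategy; it only shows why a coarse categorical set can miss a good value that a continuous range can reach.

```r
# Invented objective with its minimum at thresh = 0.07 (illustration only)
toy_loss <- function(thresh) (thresh - 0.07)^2

# The four categorical candidates from the note
categorical_values <- c(0.01, 0.1, 0.15, 0.2)

# Dense grid as a stand-in for a continuous range between 0.01 and 0.2
continuous_grid <- seq(0.01, 0.2, length.out = 200)

best_categorical <- categorical_values[which.min(toy_loss(categorical_values))]
best_continuous <- continuous_grid[which.min(toy_loss(continuous_grid))]

print(c(categorical = best_categorical, continuous = best_continuous))
print(c(categorical = toy_loss(best_categorical), continuous = toy_loss(best_continuous)))
```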

In [10]:
# to set the degree as a varying HP to tune, use: 'degree': IntegerParameter(1, 3) and remove it from the Estimator

hyperparameter_ranges = list(
    "thresh" = sagemaker$parameter$ContinuousParameter(0.001, 0.01),
    "prune" = sagemaker$parameter$CategoricalParameter(c("TRUE", "FALSE"))
)

Next we'll specify the objective metric that we'd like to tune and its definition.  This metric is output by a `print` statement in our `mars.R` file.  It's critical that the format aligns with the regular expression (Regex) we then specify to extract that metric from the CloudWatch logs of our training job.

In [11]:
objective_metric_name = 'mse'
metric_definitions = list(list('Name'= 'mse',
                               'Regex'= 'mse: ([0-9\\.]+)'))
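
A quick way to check the pattern before launching any jobs is to try it on a sample log line. The sketch below applies the same regular expression with base R; the log lines are made-up examples, since the exact text printed by `mars.R` is not shown here.

```r
pattern <- "mse: ([0-9\\.]+)"

# Extract the first capture group as a number, or NA when the line does not match
extract_metric <- function(line, pattern) {
  m <- regmatches(line, regexec(pattern, line))[[1]]
  as.numeric(m[2])
}

# Hypothetical log lines for illustration
matching_line <- "mse: 12.345"
other_line <- "MSE = 12.345"

print(extract_metric(matching_line, pattern))
print(extract_metric(other_line, pattern))
```

A line that differs only in case or punctuation yields `NA` here, which mirrors the situation where the tuning job would find no metric value in the logs.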

Now, we'll create a `HyperparameterTuner` object, which we pass:
- The estimator we created above
- Our hyperparameter ranges
- Objective metric name and definition
- Whether we should maximize or minimize our objective metric (defaults to 'Maximize')
- Number of training jobs to run in total and how many training jobs should be run simultaneously.  More parallel jobs will finish tuning sooner, but may sacrifice accuracy.  We recommend you set the parallel jobs value to less than 10% of the total number of training jobs (we'll set it higher just for this example to keep it short).

In [12]:
tuner = sagemaker$tuner$HyperparameterTuner(
    estimator,
    objective_metric_name,
    hyperparameter_ranges,
    metric_definitions,
    objective_type='Minimize',
    max_jobs=9L,
    max_parallel_jobs=3L)

# Fit!

And finally, we can start our hyperparameter tuning job by calling `.fit()` and passing in the S3 paths to our train (and val) datasets (folders).

*Note, typically for hyperparameter tuning, we'd want to specify both a training and a validation dataset and optimize the objective metric on the validation dataset.  However, because our dataset is very small, we'll skip the step of splitting it into training and validation sets.  In practice, skipping this step could lead to a model that overfits our training data and does not generalize well.*
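
For reference, a minimal sketch of how such a split could look for time-ordered data: keep the earliest 80% of rows for fitting and hold out the most recent 20% for validation. The toy data frame and the 80/20 ratio are assumptions for illustration, and the training container would also need to be written to read a second channel, which is not shown in this notebook.

```r
# Toy stand-in for a time-ordered training table (illustrative values only)
toy_train <- data.frame(
  NO2 = seq(10, 28, by = 2),
  RAW = seq(295, 304),
  temperature = seq(15, 24)
)

# Chronological split: no shuffling, so the validation rows are the most recent ones
n <- nrow(toy_train)
n_fit <- floor(0.8 * n)
fit_part <- toy_train[seq_len(n_fit), ]
val_part <- toy_train[(n_fit + 1):n, ]

print(c(fit_rows = nrow(fit_part), validation_rows = nrow(val_part)))
```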

In [13]:
tuner$fit(list('train'=str_glue("s3://{bucket}/{prefix}/{ID}/train")), wait=FALSE)

In [14]:
status = boto3_r$client("sagemaker")$describe_hyper_parameter_tuning_job(
    HyperParameterTuningJobName=tuner$latest_tuning_job$job_name
    )$HyperParameterTuningJobStatus

In [15]:
status

In [None]:
# while (!(status %in% c("Completed", "Failed", "Stopped"))) {
    
#     status = boto3_r$client("sagemaker")$describe_hyper_parameter_tuning_job(
#     HyperParameterTuningJobName=tuner$latest_tuning_job$job_name
#     )$HyperParameterTuningJobStatus

#     completed = boto3_r$client("sagemaker")$describe_hyper_parameter_tuning_job(
#     HyperParameterTuningJobName=tuner$latest_tuning_job$job_name)$TrainingJobStatusCounters$Completed

#     prog = boto3_r$client("sagemaker")$describe_hyper_parameter_tuning_job(
#     HyperParameterTuningJobName=tuner$latest_tuning_job$job_name)$TrainingJobStatusCounters$InProgress
    
#     print(str_glue("{status}, Completed Jobs: {completed}, In Progress Jobs: {prog}"))
        
#     Sys.sleep(30)
# }
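
The polling idea in the cell above can also be wrapped in a small helper that stops on any terminal state and gives up after a fixed number of polls. This is a sketch: `wait_for_status` and `fake_status` are hypothetical names, and a stub replaces the real API call so the example runs without AWS access.

```r
# Poll a status function until it reports a terminal state.
# `get_status` is any zero-argument function returning a status string.
wait_for_status <- function(get_status,
                            terminal = c("Completed", "Failed", "Stopped"),
                            interval = 30,
                            max_polls = 240) {
  for (i in seq_len(max_polls)) {
    status <- get_status()
    message("Poll ", i, ": ", status)
    if (status %in% terminal) {
      return(status)
    }
    Sys.sleep(interval)
  }
  stop("Gave up waiting after ", max_polls, " polls")
}

# Stand-in for the real API call: reports "InProgress" twice, then "Completed"
fake_status <- local({
  calls <- 0
  function() {
    calls <<- calls + 1
    if (calls < 3) "InProgress" else "Completed"
  }
})

final_status <- wait_for_status(fake_status, interval = 0)
print(final_status)
```

In the notebook, `get_status` could be a function that wraps the `describe_hyper_parameter_tuning_job` call and returns its `HyperParameterTuningJobStatus` field.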

Wait until the HPO job is complete, and then run the following cell:

In [None]:
boto3_r$client("sagemaker")$describe_hyper_parameter_tuning_job(
    HyperParameterTuningJobName=tuner$latest_tuning_job$job_name
)$BestTrainingJob

---

## HPO Analysis

Now that we've started our hyperparameter tuning job, it will run in the background and we can close this notebook.  Once finished, we can use the [HPO Analysis notebook](https://github.com/awslabs/amazon-sagemaker-examples/tree/master/hyperparameter_tuning/analyze_results/HPO_Analyze_TuningJob_Results.ipynb) to determine which set of hyperparameters worked best.

For more detail on Amazon SageMaker's Hyperparameter Tuning, please refer to the AWS documentation. 

---
## Host

Hosting the model we just tuned takes three steps in Amazon SageMaker.  First, we define the model we want to host, pointing the service to the model artifact our training job just wrote to S3.

We will use the results of the HPO for this purpose, retrieving the best training job with the `describe_hyper_parameter_tuning_job` method.

In [None]:
best_training = boto3_r$client("sagemaker")$describe_hyper_parameter_tuning_job(
    HyperParameterTuningJobName=tuner$latest_tuning_job$job_name
)$BestTrainingJob

In [None]:
# Get the best training job and the S3 location of its model file
best_model_s3 = boto3_r$client("sagemaker")$describe_training_job(
    TrainingJobName=best_training$TrainingJobName
)$ModelArtifacts$S3ModelArtifacts
best_model_s3

In [None]:
# import time

r_job = str_glue("demo-r-byo", ID, format(Sys.time(), '%H-%M-%S'), .sep='-')

In [None]:
r_hosting_container = list(
    "Image"=str_glue("{account}.dkr.ecr.{region}.amazonaws.com/{algorithm_name}:latest"),
    "ModelDataUrl"=best_model_s3
)

create_model_response = boto3_r$client("sagemaker")$create_model(
    ModelName=r_job, ExecutionRoleArn=role, PrimaryContainer=r_hosting_container
)

print(create_model_response$ModelArn)

Next, let's create an endpoint configuration, passing in the model we just registered.  In this case, we'll use a single ml.t2.medium instance.

In [None]:
r_endpoint_config = str_glue("demo-r-byo-config", ID, format(Sys.time(), '%H-%M-%S'), .sep='-')
print(r_endpoint_config)

# ProductionVariants is a list of dicts in Python: an unnamed R list of named lists via reticulate
create_endpoint_config_response = boto3_r$client("sagemaker")$create_endpoint_config(
    EndpointConfigName=r_endpoint_config,
    ProductionVariants=list(
        list(
            "InstanceType"="ml.t2.medium",
            "InitialInstanceCount"=1L,
            "ModelName"=r_job,
            "VariantName"="AllTraffic"
        )
    )
)

print(str_glue("Endpoint Config Arn: {create_endpoint_config_response$EndpointConfigArn}"))

Finally, we'll create the endpoint using our endpoint configuration from the last step.

In [None]:
# Timestamp (UTC) to make the endpoint name unique
r_endpoint = paste0("demo-r-endpoint-", format(Sys.time(), "%Y%m%d%H%M", tz = "UTC"))
print(r_endpoint)

sm_client = boto3_r$client("sagemaker")
create_endpoint_response = sm_client$create_endpoint(
    EndpointName=r_endpoint, EndpointConfigName=r_endpoint_config
)
print(create_endpoint_response$EndpointArn)

resp = sm_client$describe_endpoint(EndpointName=r_endpoint)
status = resp$EndpointStatus
print(str_glue("Status: {status}"))

# Block until the endpoint is in service, then report its final state
tryCatch(
    sm_client$get_waiter("endpoint_in_service")$wait(EndpointName=r_endpoint),
    finally = {
        resp = sm_client$describe_endpoint(EndpointName=r_endpoint)
        status = resp$EndpointStatus
        print(str_glue("Arn: {resp$EndpointArn}"))
        print(str_glue("Status: {status}"))

        if (status != "InService") {
            stop("Endpoint creation did not succeed")
        }
    }
)

---
## Predict
To confirm our endpoint is working properly, let's try to invoke it.

_Note: The payload we're passing in the request is a CSV string with a header record, followed by multiple new lines.  It also contains text columns, which the serving code converts to the set of indicator variables needed for our model predictions.  Again, this is not a best practice for highly optimized code, however, it showcases the flexibility of bringing your own algorithm._
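
Since the payload is plain CSV text, it helps to see how the same numbers can be laid out in different ways. The sketch below builds three layouts with base R from a few toy temperature values. Which layout the serving code in the container expects is not shown in this notebook, so treat the choice as an assumption to verify against the serving script.

```r
# Toy temperature values (illustrative)
temperatures <- c(24.9, 24.8, 24.7)

# One record with three comma-separated fields
single_row <- paste(temperatures, collapse = ",")

# Three records with one field each, separated by new lines
one_per_line <- paste(temperatures, collapse = "\n")

# The same records preceded by a header line
with_header <- paste(c("temperature", temperatures), collapse = "\n")

cat(single_row, "\n\n")
cat(one_per_line, "\n\n")
cat(with_header, "\n")
```

A flat numeric vector and a column of single-value rows describe different inputs to a model (one observation with many features versus many observations with one feature), so it is worth printing the serialized payload before invoking the endpoint.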

In [None]:
json <- reticulate::import("json")
runtime = boto3_r$Session()$client("runtime.sagemaker")

In [None]:
r_endpoint = 'demo-r-endpoint-202112032200'

In [None]:
csv_serializer = sagemaker$serializers$CSVSerializer(content_type='text/csv')

In [None]:
# there is a limit of max 500 samples at a time for invoking endpoints (?)
payload = data_airly_idTRH$temperature[1:10]
my_payload_as_csv = csv_serializer$serialize(payload) # Payload (aka, data) for inference.

In [None]:
my_payload_as_csv

In [None]:
response = runtime$invoke_endpoint(EndpointName=r_endpoint, 
                                   ContentType="text/csv", 
                                   Body=my_payload_as_csv)

In [None]:
result = json$loads(response$Body$read()$decode())
display(result)

In [None]:
payload

We can see the result is a CSV of predictions for our target variable.  Let's compare them to the actuals to see how our model did.

In [None]:
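# Scatter plot of actual vs. predicted values with a dashed 1:1 reference line.
# Assumes `base_test` holds the test data and `result[[1]]` is a comma-separated string of predictions.
preds <- as.numeric(strsplit(result[[1]], ",")[[1]])

ggplot(tibble(actual = base_test$RAW, prediction = preds), aes(x = actual, y = prediction)) +
    geom_point(alpha = 0.4, size = 2) +
    geom_abline(slope = 1, intercept = 0, linetype = "dashed", colour = "green") +
    labs(x = "Actual", y = "Prediction")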

In [None]:
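# Observed RAW signal against temperature (points) with the model predictions overlaid (red line).
# Assumes `base_test` has a column named " temperature" (with the leading space) and that
# `result[[1]]` is a comma-separated string of predictions.
preds <- as.numeric(strsplit(result[[1]], ",")[[1]])
plot_data <- tibble(temperature = base_test[[" temperature"]],
                    actual = base_test$RAW,
                    prediction = preds)

ggplot(plot_data, aes(x = temperature)) +
    geom_point(aes(y = actual), alpha = 0.4, size = 2) +
    geom_line(aes(y = prediction), colour = "red") +
    labs(x = "Temperature", y = "RAW")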
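
Alongside the plots, a single number can summarize the comparison. The sketch below parses a comma-separated prediction string and computes the mean squared error against the actual values, the same kind of metric the tuning job minimizes. The prediction string and actual values are toy data; the real response format depends on the serving code.

```r
# Toy data for illustration: a CSV string of predictions and the matching actual values
pred_csv <- "1.0,2.0,4.0"
actual <- c(1, 2, 3)

# Parse the CSV string into a numeric vector
preds <- as.numeric(strsplit(pred_csv, ",")[[1]])
stopifnot(length(preds) == length(actual))

# Mean squared error and its square root
mse <- mean((actual - preds)^2)
rmse <- sqrt(mse)

print(c(mse = mse, rmse = rmse))
```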

### (Optional) Clean-up

If you're ready to be done with this notebook, please run the cell below.  This will remove the hosted endpoint you created and avoid any charges from a stray instance being left on.

In [None]:
# boto3_r$client("sagemaker")$delete_endpoint(EndpointName=r_endpoint)

## deploy

In [None]:
mars_pred= tuner$deploy(initial_instance_count = 1L, instance_type = 'ml.t2.medium')

## predict

In [None]:
# cols = names(test)[names(test)== <target>]
# preds = predict(mars_pred, data_airly_idTRH$temperature)

In [None]:
mars_pred

In [136]:
# Alternatively you can call the predict method in mars_pred class
# mars_pred$serializer = csv_serializer
# mars_pred$deserializer = csv_deserializer
pred = mars_pred$predict(data_airly_idTRH$temperature[1:10])

ERROR: Error in py_call_impl(callable, dots$args, dots$keywords): ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from primary with message "{"error":"500 - Internal server error"}". See https://eu-west-2.console.aws.amazon.com/cloudwatch/home?region=eu-west-2#logEventViewer:group=/aws/sagemaker/Endpoints/rmars-211203-2206-007-5a1de485 in account 870953422121 for more information.

Detailed traceback:
  File "/home/ec2-user/anaconda3/envs/R/lib/python3.9/site-packages/sagemaker/predictor.py", line 161, in predict
    response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
  File "/home/ec2-user/anaconda3/envs/R/lib/python3.9/site-packages/botocore/client.py", line 391, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/ec2-user/anaconda3/envs/R/lib/python3.9/site-packages/botocore/client.py", line 719, in _make_api_call
    raise error_class(parsed_response, operation_name)



In [None]:
mars_pred$serializer

In [None]:
mars_pred$predict(my_payload_as_csv)

In [139]:
data_airly_idTRH$temperature[1:10]

In [None]:
my_payload_as_csv

In [None]:
mars_pred$predict(my_payload_as_csv)

In [140]:
write_csv(data_airly_idTRH %>% select(temperature), "data/base_test2.csv")

In [None]:
my_payload_as_csv

In [141]:
csv_serializer$serialize(data_airly_idTRH$temperature[1:10])

In [142]:
# there is a limit of max 500 samples at a time for invoking endpoints (?)
payload = data_airly_idTRH %>% select(temperature)
my_payload_as_csv = csv_serializer$serialize(payload) # Payload (aka, data) for inference.
my_payload_as_csv

ERROR: Error in py_call_impl(callable, dots$args, dots$keywords): KeyError: 0

Detailed traceback:
  File "/home/ec2-user/anaconda3/envs/R/lib/python3.9/site-packages/sagemaker/serializers.py", line 112, in serialize
    has_multiple_rows = len(data) > 0 and self._is_sequence_like(data[0])
  File "/home/ec2-user/anaconda3/envs/R/lib/python3.9/site-packages/pandas/core/frame.py", line 3458, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/ec2-user/anaconda3/envs/R/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err



In [149]:
head((data_airly_idTRH %>% select(temperature))[[1]])

In [143]:
mars_pred

<sagemaker.predictor.Predictor>

In [144]:
mars_pred$serializer = csv_serializer

In [150]:
pred = mars_pred$predict((data_airly_idTRH %>% select(temperature))[[1]])

ERROR: Error in py_call_impl(callable, dots$args, dots$keywords): ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from primary with message "{"error":"500 - Internal server error"}". See https://eu-west-2.console.aws.amazon.com/cloudwatch/home?region=eu-west-2#logEventViewer:group=/aws/sagemaker/Endpoints/rmars-211203-2206-007-5a1de485 in account 870953422121 for more information.

Detailed traceback:
  File "/home/ec2-user/anaconda3/envs/R/lib/python3.9/site-packages/sagemaker/predictor.py", line 161, in predict
    response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
  File "/home/ec2-user/anaconda3/envs/R/lib/python3.9/site-packages/botocore/client.py", line 391, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/ec2-user/anaconda3/envs/R/lib/python3.9/site-packages/botocore/client.py", line 719, in _make_api_call
    raise error_class(parsed_response, operation_name)

