# AWS Spot Price Predictions

## AWS Spot Instances

Amazon EC2 Spot Instances let you take advantage of unused EC2 capacity in the AWS cloud. Spot Instances are available at up to a 90% discount compared to On-Demand prices.

According to Amazon, Spot Instances are suited to stateless, fault-tolerant, or flexible workloads. The idea is to offer machines at much lower prices with no guarantee of how long a machine will remain available; AWS gives only a two-minute interruption notice before reclaiming the machine.


## Goals
The idea is to use the discounts Spot Instances provide to minimize the cost of cloud workloads. To achieve that, we want to predict both price changes and upcoming interruptions, giving the workload enough time to gracefully hand over resources and safely shut down the machine with minimal loss of progress.

So the two main goals are:
- Provide predictions for Spot price changes
- Provide predictions for upcoming interruptions


## Data Fetching

For the NN model we needed as much data as possible. Some of the data was provided to us (`spotData.csv`) and some we tried to fetch ourselves.

We created a script, `simple_history_log.py`, that fetches as much data as possible through the AWS API, using `boto3` (the AWS SDK for Python). The script fetches data for different regions concurrently, covering the last 3 months (that's all the API can provide), from most recent to oldest. Since the script takes a long time to run, we added a mechanism that saves progress periodically and lets the script continue where it left off.
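The save-and-resume idea can be sketched as follows. This is a minimal illustration, not the actual contents of `simple_history_log.py`: the function names, the JSON checkpoint format, and the `fetch_region` callable are all assumptions made for the example, and progress is saved only when a region finishes (a real script would likely also save inside the per-region loop).

```python
import json
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def load_checkpoint(path):
    """Return the saved per-region progress, or an empty dict on the first run."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def save_checkpoint(path, progress):
    """Write to a temp file and rename it, so a crash never leaves a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(progress, f)
    os.replace(tmp_path, path)


def fetch_all_regions(regions, fetch_region, checkpoint_path):
    """Run fetch_region(region, resume_state) for every region in parallel.

    fetch_region is a hypothetical callable that fetches one region serially
    and returns a JSON-serializable resume state for that region.
    """
    progress = load_checkpoint(checkpoint_path)
    with ThreadPoolExecutor(max_workers=max(1, len(regions))) as pool:
        futures = {
            region: pool.submit(fetch_region, region, progress.get(region))
            for region in regions
        }
        for region, future in futures.items():
            progress[region] = future.result()
            save_checkpoint(checkpoint_path, progress)  # persist after each region
    return progress
```

Running `fetch_all_regions` a second time with the same checkpoint path passes each region its previously saved state, which is what lets the fetch continue instead of starting over.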

### The Fetching Process using `boto3`
Since there is naturally a lot of data in a span of 3 months, the API returns a single "page" (chunk) of data per request. The response contains a token (`NextToken`), which we must pass in the next request to get the next chunk of data. Thus, fetching for each region has to be done serially.
Each response, as returned by `boto3`, is a Python dictionary in the following format:
```python
{
    'NextToken': '<base64_token>',
    'ResponseMetadata': {...},  # We don't really care about that
    'SpotPriceHistory': [
        # Timestamped entries. Each entry is for a specific instance type, OS and
        # availability zone. It contains the timestamp and the price.
        {
            'AvailabilityZone': 'eu-north-1b',
            'InstanceType': 'r5n.12xlarge',
            'ProductDescription': 'Windows',
            'SpotPrice': '2.928700',
            'Timestamp': datetime.datetime(2022, 3, 9, 20, 35, 8, tzinfo=tzutc())
        },
        ...
    ]
}
```
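The token-based paging described above can be sketched as a small generator. It is an illustration rather than the code in `simple_history_log.py`: the function name `iter_spot_prices` is made up for the example, and `client` is assumed to behave like `boto3.client("ec2", region_name=...)`, whose `describe_spot_price_history` call accepts `StartTime`, `EndTime` and `NextToken`.

```python
def iter_spot_prices(client, start_time, end_time, **filters):
    """Yield every SpotPriceHistory entry between start_time and end_time.

    client: an object with a describe_spot_price_history(**kwargs) method,
    e.g. boto3.client("ec2", region_name="eu-north-1").
    filters: extra request parameters, e.g. InstanceTypes=["r5n.12xlarge"].
    """
    kwargs = dict(StartTime=start_time, EndTime=end_time, **filters)
    while True:
        response = client.describe_spot_price_history(**kwargs)
        yield from response.get("SpotPriceHistory", [])
        token = response.get("NextToken")
        if not token:  # an empty or missing token means this was the last page
            break
        kwargs["NextToken"] = token  # ask for the next page on the next request
```

`boto3` also offers a built-in paginator via `client.get_paginator("describe_spot_price_history")`; the explicit loop is shown here only because it makes the role of the token visible.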

It is also important to note that the exact meaning of the data we are fetching is not 100% clear. Sometimes there are big gaps between data points, timestamp-wise (for example, we might not see any data for an instance between Feb 10th and Feb 12th).
At first we thought the API only provides a data point when the price changes, but then we noticed that there are sometimes consecutive data points with very small time gaps (sometimes seconds) and identical prices.

We couldn't find any definite explanation for the phenomenon but here are a few theories that might be worth exploring:
1. The big gaps mean the machine was not available at all as a Spot instance at that time
2. The prices were changing, but by such minor amounts that the precision isn't fine enough to show it (i.e. the backend sends a data point because it detected a change, but after server-side rounding the data point is identical to the last one)

We found contradicting examples for each of those, but we might be missing something.
For example, if theory 1 is correct, we expected to see a data point with a very recent timestamp for any currently available Spot machine, but we didn't always see that: at times the latest data point was relatively old, yet it matched the current, unchanged price.
We didn't explore theory 2 enough to say.

It's also unclear whether a data point is provided when the machine isn't available at all. In addition, we don't know the precision of this data or whether it "catches" all of the changes.
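One way to quantify the two oddities (long gaps, and consecutive points with identical prices) is a small helper like the one below. It is only a sketch for exploring the data: the function name and the one-day default threshold are arbitrary choices for the example, and the input is assumed to be the `SpotPriceHistory` entries of a single availability zone, instance type and OS.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal


def summarize_series(entries, gap_threshold=timedelta(days=1)):
    """Find long gaps and repeated prices in one instance's price series.

    entries: dicts with 'Timestamp' (datetime) and 'SpotPrice' (str).
    Returns (gaps, repeats):
      gaps    - (previous_timestamp, current_timestamp) pairs further apart
                than gap_threshold
      repeats - (current_timestamp, time_since_previous) for points whose
                price equals the previous point's price
    """
    ordered = sorted(entries, key=lambda e: e["Timestamp"])
    gaps, repeats = [], []
    for prev, cur in zip(ordered, ordered[1:]):
        delta = cur["Timestamp"] - prev["Timestamp"]
        if delta > gap_threshold:
            gaps.append((prev["Timestamp"], cur["Timestamp"]))
        if Decimal(cur["SpotPrice"]) == Decimal(prev["SpotPrice"]):
            repeats.append((cur["Timestamp"], delta))
    return gaps, repeats
```

Very short deltas in `repeats` are the cases that contradict the "one data point per price change" assumption, while entries in `gaps` are the candidates for theory 1.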

We continued with those unknowns in mind.

## Directions

### NN

### Static Analysis




