# Cost Calculator

This notebook walks you through counting the metrics in your dataset and then using that count to estimate your costs for Lookout for Metrics.

**Note:** This is an estimate. Dimension values that do not appear in your historical dataset may show up later, and each new value increases the number of metrics and therefore your total cost. Use this as a guide.

This notebook can be executed in your environment by deploying the `getting_started` resources, then browsing back to this folder inside a SageMaker Notebook Instance.

Next, upload your historical data into this folder. This walkthrough prices a CSV file named `historical.csv` that has been included here.

Follow along with the notebook as is first. Once you understand the process, update the filename to match your uploaded content and follow along again to complete the pricing exercise.

In [11]:
import pandas as pd

In [12]:
CSV_FILENAME = "historical.csv"

After updating the filename above to reflect your content, run the cell below to see a sample of your data:

In [13]:
data = pd.read_csv(CSV_FILENAME)
data.sample(5)

,platform,marketplace,timestamp,views,revenue
21697,pc_web,es,2021-02-13 01:00:00,103,30.9
18444,pc_web,jp,2021-02-06 14:00:00,246,73.8
44008,mobile_web,jp,2021-03-29 07:00:00,310,93.0
118388,mobile_web,es,2021-08-23 21:00:00,470,141.0
51989,mobile_app,us,2021-04-14 03:00:00,326,97.8


In the output above, we see that `timestamp` is our timestamp field, so we can now read the file again, this time parsing the dates and using that column as the index.

In [14]:
data = pd.read_csv(CSV_FILENAME, parse_dates=True, index_col='timestamp')
data.sample(5)

timestamp,platform,marketplace,views,revenue
2021-01-08 01:00:00,mobile_app,es,142,42.6
2021-06-29 05:00:00,mobile_app,fr,239,71.7
2021-06-30 01:00:00,mobile_web,it,610,183.0
2021-07-11 09:00:00,mobile_app,de,420,126.0
2021-06-19 12:00:00,mobile_app,de,700,210.0
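As a minimal sketch of what the second `read_csv` call changes, the example below reads a few rows twice from an in-memory string. The rows are copied from the sample output above, and the column layout is an assumption about how `historical.csv` is organized. With `index_col` set, `parse_dates=True` asks pandas to parse the index as dates, which should give a `DatetimeIndex` instead of a column of plain strings.

```python
import io

import pandas as pd

# A few rows copied from the sample output; the column order is an assumption.
csv_text = """platform,marketplace,timestamp,views,revenue
pc_web,es,2021-02-13 01:00:00,103,30.9
pc_web,jp,2021-02-06 14:00:00,246,73.8
mobile_web,jp,2021-03-29 07:00:00,310,93.0
"""

# Plain read: timestamp stays an ordinary column and is not parsed as dates.
plain = pd.read_csv(io.StringIO(csv_text))

# Indexed read: timestamp becomes the index and is parsed as dates.
indexed = pd.read_csv(io.StringIO(csv_text), parse_dates=True, index_col="timestamp")

print(pd.api.types.is_datetime64_any_dtype(plain["timestamp"]))
print(type(indexed.index).__name__)
print(list(indexed.columns))
```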


This dataset has two kinds of columns:

Numerical:
* `views`
* `revenue`

Categorical:
* `platform`
* `marketplace`

In the parlance of Lookout for Metrics, `platform` and `marketplace` are our dimensions (called domains in this notebook) and `views` and `revenue` are our measures. Each combination of domain values, paired with each measure, counts as one metric, so the number of distinct values in each domain drives most of the total and the number of measure columns multiplies it. The basic calculation for the total number of metrics is then:

```
(distinct_values(domain1) * distinct_values(domain2)) * number_of_measure_columns
```
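To make the formula concrete, here is a small sketch that uses only the platform and marketplace values visible in the sampled rows above. These are not necessarily all the values in `historical.csv`, so the count here can differ from the one computed on the full file. The helper `count_metrics` and the placeholder marketplace `"xx"` are hypothetical names introduced for illustration. The second half shows how a single previously unseen value changes the count.

```python
from math import prod

# Illustrative values only: those visible in the sampled rows, which may be
# a subset of what the full file contains.
distinct_values = {
    "platform": {"pc_web", "mobile_web", "mobile_app"},
    "marketplace": {"es", "jp", "us", "fr", "it", "de"},
}
measure_count = 2  # views and revenue


def count_metrics(values_by_domain, measures):
    """Multiply the distinct-value counts of every domain, then the measures."""
    return prod(len(values) for values in values_by_domain.values()) * measures


baseline = count_metrics(distinct_values, measure_count)
print(baseline)  # 3 * 6 * 2 = 36

# A hypothetical new marketplace appears after the historical data was collected.
distinct_values["marketplace"].add("xx")
grown = count_metrics(distinct_values, measure_count)
print(grown - baseline)  # 3 platforms * 2 measures = 6 additional metrics
```

Because the counts are multiplied, one new value in a domain adds as many metrics as the product of all the other counts.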

In the cell below we first state the number of measure columns, followed by the list of domains that we wish to monitor in our dataset:

In [15]:
number_of_measure_columns = 2
list_of_domains = ["platform", "marketplace"]

The cell below defines a function that takes our data, the list of domains, and the number of measure columns, and returns the total number of metrics. Run it as is; the cell after it calls the function and shows the value:

In [16]:
def generate_unique_metrics(input_data, domain_list, number_of_measures):
    """
    Estimate the number of distinct metrics in a dataset.

    Multiplies the number of unique values in each domain column together,
    then multiplies that product by the number of measure columns.
    """
    # Start at 1 so the running product works from the first domain onward:
    metrics = 1
    for item in domain_list:
        # Count the distinct values observed in this domain column
        unique_values = input_data[item].nunique()
        metrics = metrics * unique_values
    # Now combine the number of measures:
    metrics = metrics * number_of_measures
    return metrics
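The product of distinct values assumes that every combination of domain values can occur. As a hedged sketch, the snippet below compares that product with the number of combinations that actually appear in a dataset. The toy DataFrame is made up for illustration, and whether you should budget for the product or for the observed count is a judgment call that this notebook does not settle.

```python
import pandas as pd

# Hypothetical toy data; the column names mirror historical.csv.
toy = pd.DataFrame({
    "platform": ["pc_web", "pc_web", "mobile_app", "mobile_app"],
    "marketplace": ["es", "jp", "es", "es"],
    "views": [1, 2, 3, 4],
    "revenue": [0.3, 0.6, 0.9, 1.2],
})
domains = ["platform", "marketplace"]
measures = 2

# Product of distinct values per domain: assumes every combination can occur.
upper_bound = measures
for column in domains:
    upper_bound *= toy[column].nunique()

# Combinations that are actually present in the rows.
observed_combinations = len(toy[domains].drop_duplicates())
observed = observed_combinations * measures

print(upper_bound)
print(observed)
```

Here the toy data has no `mobile_app` and `jp` rows, so the observed count is lower than the product.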

In [17]:
number_of_metrics = generate_unique_metrics(input_data=data, domain_list=list_of_domains, number_of_measures=number_of_measure_columns)
number_of_metrics

42

In [18]:
number_of_metrics = 3503

The function returned 42 unique metrics for the sample data. The cell above then overrides `number_of_metrics` with 3503, a value large enough to span two pricing tiers. The next step is to determine the price; you can learn more about pricing here: https://aws.amazon.com/lookout-for-metrics/pricing/ . The cell below contains a function that takes the total metric count and prints the estimated monthly price in USD.

In [19]:
def generate_pricing(number_of_metrics):
    """Print the estimated monthly USD cost for the given number of metrics."""
    assert number_of_metrics >= 0
    # (tier floor, monthly price per metric above that floor), highest tier first
    price_tiers = [
        (50000, 0.05),
        (20000, 0.10),
        (5000, 0.25),
        (1000, 0.50),
        (0, 0.75),
    ]
    price = 0
    n = number_of_metrics
    for bottom_number_of_metrics, cost_per_metric in price_tiers:
        if n > bottom_number_of_metrics:
            # Charge only the metrics above this tier's floor at this tier's rate
            cost_for_this_tier = (n - bottom_number_of_metrics) * cost_per_metric
            price += cost_for_this_tier
            # Whatever is left is priced by the lower tiers
            n = bottom_number_of_metrics
    print(f"The total cost monthly for this workload of: {number_of_metrics} metrics is: ${price:.2f}")

In [20]:
generate_pricing(number_of_metrics)

The total cost monthly for this workload of: 3503 metrics is: $2001.50
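To check the tiered total by hand, the sketch below repeats the same calculation but returns the per-tier breakdown instead of printing a single line. The tier values are copied from `generate_pricing`; the function name `pricing_breakdown` is a hypothetical helper introduced here, not part of the original notebook.

```python
# Tier floors and monthly per-metric prices, copied from generate_pricing.
PRICE_TIERS = [
    (50000, 0.05),
    (20000, 0.10),
    (5000, 0.25),
    (1000, 0.50),
    (0, 0.75),
]


def pricing_breakdown(number_of_metrics):
    """Return (total, rows); each row is (tier_floor, metrics_in_tier, cost)."""
    if number_of_metrics < 0:
        raise ValueError("number_of_metrics must be non-negative")
    rows = []
    remaining = number_of_metrics
    for tier_floor, cost_per_metric in PRICE_TIERS:
        if remaining > tier_floor:
            metrics_in_tier = remaining - tier_floor
            rows.append((tier_floor, metrics_in_tier, metrics_in_tier * cost_per_metric))
            remaining = tier_floor
    total = sum(cost for _, _, cost in rows)
    return total, rows


total, rows = pricing_breakdown(3503)
for tier_floor, metrics_in_tier, cost in rows:
    print(f"above {tier_floor:>5}: {metrics_in_tier:>5} metrics -> ${cost:.2f}")
print(f"total: ${total:.2f}")
```

For 3503 metrics, the 2503 metrics above the 1000 floor cost 2503 × 0.50 = 1251.50 and the first 1000 cost 1000 × 0.75 = 750.00, which adds up to the 2001.50 reported above. The 42 metrics found in the sample data would all fall in the lowest tier, at 42 × 0.75 = 31.50.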
