# Cloudydap Cost Analysis

In [1]:
import numpy as np
import pandas as pd
from aws_price_list import AWSOffersIndex
from bokeh.charts import (output_notebook, output_file, show, 
                          Scatter, Histogram, TimeSeries, Donut, Step, Bar)
from bokeh.plotting import figure, ColumnDataSource
from bokeh.models import Range1d, HoverTool, ResizeTool


output_notebook()

In [2]:
def percent_change(before, after):
    """Return the percent change from `before` to `after`, dropping undefined (NaN) entries."""
    return 100 * (after/before - 1).dropna(how='all')
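To make the helper's behavior concrete, here is a minimal sketch on made-up numbers. It restates the function with a plain `dropna()`, which should behave the same on a Series. The `before` and `after` values are illustrative assumptions, not Cloudydap data.

```python
import pandas as pd


def percent_change(before, after):
    # Same arithmetic as the notebook helper: relative change in percent,
    # with entries that cannot be compared (NaN) removed.
    return 100 * (after / before - 1).dropna()


# Hypothetical values; 'z' exists only in `after`, so it has no baseline.
before = pd.Series({'x': 10.0, 'y': 4.0})
after = pd.Series({'x': 5.0, 'y': 6.0, 'z': 1.0})

change = percent_change(before, after)
print(change)
```

Negative values mean the quantity decreased relative to `before`. Labels missing from either input are dropped rather than reported as NaN.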

# Input Cloudydap Cost Data

In [3]:
r = pd.read_csv('../../logs/cloudydap_costs.csv')

Check that there is a column named "Arch", which indicates that these are indeed Cloudydap cost data:

In [4]:
if 'Arch' not in r:
    raise RuntimeError('Missing "Arch" column')

Check for null values in the `Arch` column:

In [5]:
if r.Arch.isnull().any():
    raise ValueError('Null values detected in the "Arch" column')
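The two checks above could also be bundled into one reusable function. The sketch below is only an illustration: the name `validate_arch_column` and the tiny data frames are assumptions, not part of the Cloudydap code.

```python
import pandas as pd


def validate_arch_column(df, column='Arch'):
    # Hypothetical helper combining the two sanity checks from the notebook.
    if column not in df:
        raise RuntimeError(f'Missing "{column}" column')
    if df[column].isnull().any():
        raise ValueError(f'Null values detected in the "{column}" column')
    return df


# Made-up frames: one valid, one with a missing architecture label.
good = pd.DataFrame({'Arch': ['A1', 'A2'], 'cost': [1.0, 2.0]})
bad = pd.DataFrame({'Arch': ['A1', None], 'cost': [1.0, 2.0]})

validate_arch_column(good)  # passes silently
try:
    validate_arch_column(bad)
except ValueError as err:
    message = str(err)
    print(message)
```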

What do we have?

In [6]:
r.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 553 entries, 0 to 552
Data columns (total 74 columns):
identity/LineItemId                    553 non-null object
identity/TimeInterval                  553 non-null object
bill/InvoiceId                         0 non-null float64
bill/BillingEntity                     553 non-null object
bill/BillType                          553 non-null object
bill/PayerAccountId                    553 non-null int64
bill/BillingPeriodStartDate            553 non-null object
bill/BillingPeriodEndDate              553 non-null object
lineItem/UsageAccountId                553 non-null int64
lineItem/LineItemType                  553 non-null object
lineItem/UsageStartDate                553 non-null object
lineItem/UsageEndDate                  553 non-null object
lineItem/ProductCode                   553 non-null object
lineItem/UsageType                     553 non-null object
lineItem/Operation                     553 non-null object
lineItem/Avai

## Analysis

Breakdown of the entries per architecture (and what their identifiers are):

In [7]:
r.Arch.value_counts().sort_index()

A1    437
A2     55
A3     61
Name: Arch, dtype: int64

Time span for each architecture's cost data:

In [8]:
grp = r.groupby('Arch')

Architecture \#1:

In [9]:
grp.get_group('A1')['lineItem/UsageStartDate'].min()

'2017-02-23T10:00:00+00:00'

In [10]:
grp.get_group('A1')['lineItem/UsageEndDate'].max()

'2017-02-24T05:00:00+00:00'

Architecture \#2:

In [11]:
grp.get_group('A2')['lineItem/UsageStartDate'].min()

'2017-02-24T10:00:00+00:00'

In [12]:
grp.get_group('A2')['lineItem/UsageEndDate'].max()

'2017-02-24T12:00:00+00:00'

Architecture \#3:

In [13]:
grp.get_group('A3')['lineItem/UsageStartDate'].min()

'2017-02-25T10:00:00+00:00'

In [14]:
grp.get_group('A3')['lineItem/UsageEndDate'].max()

'2017-02-25T12:00:00+00:00'
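The six lookups above can probably be collapsed into a single aggregation. The sketch below assumes the same column names as the cost data but runs on a small made-up frame, and converts the timestamps to timezone-aware datetimes first so that the span can be expressed in hours.

```python
import pandas as pd

# Made-up sample rows using the notebook's column names.
sample = pd.DataFrame({
    'Arch': ['A1', 'A1', 'A2'],
    'lineItem/UsageStartDate': ['2017-02-23T10:00:00+00:00',
                                '2017-02-23T11:00:00+00:00',
                                '2017-02-24T10:00:00+00:00'],
    'lineItem/UsageEndDate': ['2017-02-23T11:00:00+00:00',
                              '2017-02-23T12:00:00+00:00',
                              '2017-02-24T11:00:00+00:00'],
})

# Parse the ISO 8601 strings into UTC datetimes.
for col in ('lineItem/UsageStartDate', 'lineItem/UsageEndDate'):
    sample[col] = pd.to_datetime(sample[col], utc=True)

# Earliest start and latest end per architecture.
span = sample.groupby('Arch').agg(
    start=('lineItem/UsageStartDate', 'min'),
    end=('lineItem/UsageEndDate', 'max'),
)
span['hours'] = (span['end'] - span['start']) / pd.Timedelta(hours=1)
print(span)
```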

### Cost

Total cost per architecture in US$:

In [15]:
tot_cost = grp['lineItem/BlendedCost'].sum()
tot_cost

Arch
A1    14.027309
A2     2.232738
A3     2.264586
Name: lineItem/BlendedCost, dtype: float64

In [16]:
f = Donut(tot_cost, title='Total AWS Cost per Architecture')
show(f)

Each architecture's cost as a percentage of the combined total:

In [17]:
100 * tot_cost/tot_cost.sum()

Arch
A1    75.722469
A2    12.052804
A3    12.224727
Name: lineItem/BlendedCost, dtype: float64

Cost percentage change when compared to Architecture \#1:

In [18]:
percent_change(tot_cost['A1'], tot_cost)

Arch
A1     0.000000
A2   -84.082923
A3   -83.855879
Name: lineItem/BlendedCost, dtype: float64
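As a cross-check of the arithmetic, the share and percent-change figures can be recomputed from the three totals printed above. This sketch hard-codes those rounded totals, so tiny differences in the last digits are to be expected.

```python
import pandas as pd

# Totals copied from the output above (US$, rounded to six decimals).
tot_cost = pd.Series({'A1': 14.027309, 'A2': 2.232738, 'A3': 2.264586},
                     name='lineItem/BlendedCost')

# Share of the combined cost: 100 * cost / sum of all costs.
share = 100 * tot_cost / tot_cost.sum()

# Change relative to Architecture #1: 100 * (cost / cost_A1 - 1).
change_vs_a1 = 100 * (tot_cost / tot_cost['A1'] - 1)

print(share.round(2))
print(change_vs_a1.round(2))
```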

Group the data by architecture, AWS service, usage type, and operation:

In [19]:
grp = r.groupby(['Arch', 'lineItem/ProductCode', 'lineItem/UsageType', 'lineItem/Operation'])

#### Blended Costs

Blended cost totals per the above groups:

In [20]:
blend_cost = grp['lineItem/BlendedCost'].sum()
blend_cost

Arch  lineItem/ProductCode  lineItem/UsageType           lineItem/Operation
A1    AmazonEC2             BoxUsage:m4.xlarge           RunInstances          1.268500e+01
                            DataTransfer-In-Bytes        RunInstances          0.000000e+00
                            DataTransfer-Out-Bytes       RunInstances          0.000000e+00
                            DataTransfer-Regional-Bytes  PublicIP-In           0.000000e+00
                                                         PublicIP-Out          8.723100e-04
                            EBS:SnapshotUsage            CreateSnapshot        6.504439e-02
                            EBS:VolumeUsage.gp2          CreateVolume-Gp2      8.214284e-01
                            EBSOptimized:m4.xlarge       Hourly                0.000000e+00
      AmazonS3              DataTransfer-Out-Bytes       GetObject             0.000000e+00
                            Requests-Tier1               ListBucketVersions    2.751000e-02
    

Remove the zero costs for easier calculations later:

In [21]:
blend_cost = blend_cost[blend_cost != 0]

In [22]:
f = Donut(blend_cost['A1']['AmazonEC2'], title='Cloudydap AWS Cost Breakdown for Architecture #1',
          plot_height=400, plot_width=400)
show(f)

##### Blended Cost Percent Changes between Two Architectures

Cost percentage change from Arch. \#1 to Arch. \#2:

In [23]:
percent_change(blend_cost['A1'], blend_cost['A2'])

lineItem/ProductCode  lineItem/UsageType           lineItem/Operation
AmazonEC2             BoxUsage:m4.xlarge           RunInstances         -86.440678
                      DataTransfer-Regional-Bytes  PublicIP-Out          52.058328
                      EBS:VolumeUsage.gp2          CreateVolume-Gp2     -86.231884
AmazonS3              Requests-Tier1               ListBucketVersions   -90.676118
                      Requests-Tier2               GetObject             88.502714
                                                   HeadBucket           -90.372671
                      TimedStorage-ByteHrs         StandardStorage        0.003222
                      USE1-USW2-AWS-Out-Bytes      GetObject            -50.921642
                                                   HeadBucket           -90.229885
                                                   ListBucketVersions   -90.645253
Name: lineItem/BlendedCost, dtype: float64

Cost percentage change from Arch. \#1 to Arch. \#3:

In [24]:
percent_change(blend_cost['A1'], blend_cost['A3'])

lineItem/ProductCode  lineItem/UsageType           lineItem/Operation
AmazonEC2             BoxUsage:m4.xlarge           RunInstances         -86.440678
                      DataTransfer-Regional-Bytes  PublicIP-Out          52.043425
                      EBS:VolumeUsage.gp2          CreateVolume-Gp2     -82.608696
AmazonS3              Requests-Tier1               ListBucketVersions   -88.131589
                      Requests-Tier2               GetObject             75.513725
                                                   HeadBucket           -90.838509
                      TimedStorage-ByteHrs         StandardStorage        0.009088
                      USE1-USW2-AWS-Out-Bytes      GetObject            -54.420944
                                                   HeadBucket           -90.804598
                                                   ListBucketVersions   -88.103912
Name: lineItem/BlendedCost, dtype: float64

Cost percentage change from Arch. \#2 to Arch. \#3:

In [25]:
percent_change(blend_cost['A2'], blend_cost['A3'])

lineItem/ProductCode  lineItem/UsageType           lineItem/Operation
AmazonEC2             BoxUsage:m4.xlarge           RunInstances           0.000000
                      DataTransfer-Regional-Bytes  PublicIP-Out          -0.009801
                      EBS:VolumeUsage.gp2          CreateVolume-Gp2      26.315789
AmazonS3              Requests-Tier1               ListBucketVersions    27.290448
                      Requests-Tier2               GetObject             -6.890612
                                                   HeadBucket            -4.838710
                      TimedStorage-ByteHrs         StandardStorage        0.005865
                      USE1-USW2-AWS-Out-Bytes      GetObject             -7.130031
                                                   HeadBucket            -5.882353
                                                   ListBucketVersions    27.166324
Name: lineItem/BlendedCost, dtype: float64
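The comparisons above rely on pandas aligning the two per-architecture series on their remaining index levels. A line item billed in only one architecture (for example `EBS:SnapshotUsage`, which appears only under A1) has no counterpart and drops out. The sketch below illustrates this with a made-up, simplified index; the values are assumptions chosen for easy arithmetic.

```python
import pandas as pd

# Made-up costs with a reduced index (architecture, service, operation).
idx = pd.MultiIndex.from_tuples(
    [('A1', 'AmazonEC2', 'CreateSnapshot'),
     ('A1', 'AmazonEC2', 'RunInstances'),
     ('A2', 'AmazonEC2', 'RunInstances')],
    names=['Arch', 'lineItem/ProductCode', 'lineItem/Operation'])
cost = pd.Series([1.0, 8.0, 2.0], index=idx)

# Selecting one architecture drops the 'Arch' level; division then aligns
# the two series on (service, operation). Unmatched rows become NaN.
ratio = cost['A2'] / cost['A1']
change = 100 * (ratio - 1).dropna()
print(change)
```

Only `RunInstances` survives here, because `CreateSnapshot` has no A2 value to compare against.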

##### Blended Cost as Percentage of the Total per Architecture

Architecture \#1:

In [26]:
blend_cost['A1']/blend_cost['A1'].sum() * 100

lineItem/ProductCode  lineItem/UsageType           lineItem/Operation
AmazonEC2             BoxUsage:m4.xlarge           RunInstances          90.430744
                      DataTransfer-Regional-Bytes  PublicIP-Out           0.006219
                      EBS:SnapshotUsage            CreateSnapshot         0.463698
                      EBS:VolumeUsage.gp2          CreateVolume-Gp2       5.855923
AmazonS3              Requests-Tier1               ListBucketVersions     0.196117
                                                   PutObject              0.000356
                      Requests-Tier2               GetObject              0.056202
                                                   HeadBucket             0.001836
                                                   ReadACL                0.000011
                      TimedStorage-ByteHrs         StandardStorage        2.685975
                      USE1-USW2-AWS-Out-Bytes      GetObject              0.001941
                 

Architecture \#2:

In [27]:
blend_cost['A2']/blend_cost['A2'].sum() * 100

lineItem/ProductCode  lineItem/UsageType           lineItem/Operation
AmazonEC2             BoxUsage:m4.xlarge           RunInstances          77.035474
                      DataTransfer-Regional-Bytes  PublicIP-Out           0.059408
                      EBS:VolumeUsage.gp2          CreateVolume-Gp2       5.065316
AmazonS3              Requests-Tier1               ListBucketVersions     0.114881
                      Requests-Tier2               GetObject              0.665586
                                                   HeadBucket             0.001111
                      TimedStorage-ByteHrs         StandardStorage       16.875347
                      USE1-USW2-AWS-Out-Bytes      GetObject              0.005986
                                                   HeadBucket             0.000023
                                                   ListBucketVersions     0.176867
Name: lineItem/BlendedCost, dtype: float64

Architecture \#3:

In [28]:
blend_cost['A3']/blend_cost['A3'].sum() * 100

lineItem/ProductCode  lineItem/UsageType           lineItem/Operation
AmazonEC2             BoxUsage:m4.xlarge           RunInstances          75.952080
                      DataTransfer-Regional-Bytes  PublicIP-In            0.058567
                                                   PublicIP-Out           0.058567
                      EBS:VolumeUsage.gp2          CreateVolume-Gp2       6.308311
AmazonS3              Requests-Tier1               ListBucketVersions     0.144176
                      Requests-Tier2               GetObject              0.611008
                                                   HeadBucket             0.001042
                      TimedStorage-ByteHrs         StandardStorage       16.638995
                      USE1-USW2-AWS-Out-Bytes      GetObject              0.005481
                                                   HeadBucket             0.000021
                                                   ListBucketVersions     0.221752
Name: lineItem/Bl
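The three per-architecture cells above repeat the same expression. One possible alternative is to normalize every architecture in a single pass with a grouped `transform`. The sketch below uses made-up costs and a reduced index, so the numbers are illustrative only.

```python
import pandas as pd

# Made-up costs indexed by (architecture, service).
idx = pd.MultiIndex.from_tuples(
    [('A1', 'AmazonEC2'), ('A1', 'AmazonS3'),
     ('A2', 'AmazonEC2'), ('A2', 'AmazonS3')],
    names=['Arch', 'lineItem/ProductCode'])
cost = pd.Series([9.0, 1.0, 3.0, 1.0], index=idx)

# transform('sum') broadcasts each architecture's total back onto its rows,
# so the division yields every line item's share within its architecture.
share = 100 * cost / cost.groupby(level='Arch').transform('sum')
print(share)
```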

### Usage Amount

In [29]:
use_amt = grp['lineItem/UsageAmount'].sum()
use_amt = use_amt[use_amt != 0]
use_amt

Arch  lineItem/ProductCode  lineItem/UsageType           lineItem/Operation
A1    AmazonEC2             BoxUsage:m4.xlarge           RunInstances          5.900000e+01
                            DataTransfer-In-Bytes        RunInstances          6.271791e-01
                            DataTransfer-Out-Bytes       RunInstances          3.444954e-02
                            DataTransfer-Regional-Bytes  PublicIP-In           8.722682e-02
                                                         PublicIP-Out          8.723084e-02
                            EBS:SnapshotUsage            CreateSnapshot        1.300888e+00
                            EBS:VolumeUsage.gp2          CreateVolume-Gp2      8.214286e+00
                            EBSOptimized:m4.xlarge       Hourly                5.900000e+01
      AmazonS3              DataTransfer-Out-Bytes       GetObject             3.551022e-02
                            Requests-Tier1               ListBucketVersions    5.502000e+03
    

Isolate the number of Hyrax S3 `GetObject` requests:

In [30]:
use_amt.loc[pd.IndexSlice[:, ['AmazonS3'], ['Requests-Tier2'], ['GetObject']]]

Arch  lineItem/ProductCode  lineItem/UsageType  lineItem/Operation
A1    AmazonS3              Requests-Tier2      GetObject             19709.0
A2    AmazonS3              Requests-Tier2      GetObject             37152.0
A3    AmazonS3              Requests-Tier2      GetObject             34592.0
Name: lineItem/UsageAmount, dtype: float64

The change in the number of S3 requests between the architectures (A1 → A2 → A3):

In [31]:
use_amt.loc[pd.IndexSlice[:, ['AmazonS3'], ['Requests-Tier2'], ['GetObject']]].diff()

Arch  lineItem/ProductCode  lineItem/UsageType  lineItem/Operation
A1    AmazonS3              Requests-Tier2      GetObject                 NaN
A2    AmazonS3              Requests-Tier2      GetObject             17443.0
A3    AmazonS3              Requests-Tier2      GetObject             -2560.0
Name: lineItem/UsageAmount, dtype: float64
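The absolute differences can also be expressed as relative changes. The sketch below hard-codes the three request counts printed above and divides each step by the preceding architecture's count. The results should agree with the `Requests-Tier2` / `GetObject` cost changes reported earlier, since the cost of these requests scales with their number.

```python
import pandas as pd

# GetObject request counts copied from the output above.
requests = pd.Series({'A1': 19709.0, 'A2': 37152.0, 'A3': 34592.0})

# Absolute step between consecutive architectures (A1 -> A2 -> A3).
step = requests.diff()

# Relative step in percent: difference divided by the previous count.
step_pct = 100 * step / requests.shift()
print(step_pct.round(2))
```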