# Retrieving and Inserting AWS Historical Spot Price Data


### Author: Luis Mendoza Montero

**Workflow:**

- Select the instance family and the region
- Retrieve the price history from the cloud for the **.2xlarge** instance size
- Insert the original data into our database

**Regions available for each instance family**

* C4:
   - ap-northeast-1
   - ap-northeast-2
   - ap-south-1
   - ap-southeast-1
   - ap-southeast-2
   - ca-central-1
   - eu-central-1
   - eu-west-1
   - eu-west-2
   - sa-east-1
   - us-east-1
   - us-east-2
   - us-west-1
   - us-west-2
   
* C5:
   - ap-northeast-2
   - ap-south-1
   - ap-southeast-1
   - ap-southeast-2
   - ca-central-1
   - eu-central-1
   - eu-west-1
   - eu-west-2
   - sa-east-1
   - us-east-1
   - us-east-2
   - us-west-1
   - us-west-2
   
* D2:
   - ap-northeast-1
   - ap-northeast-2
   - ap-south-1
   - ap-southeast-1
   - ap-southeast-2
   - ca-central-1
   - eu-central-1
   - us-east-1
   - us-east-2
   - us-west-1
   - us-west-2
   
* F1:
   - eu-west-1
   - us-east-1
   - us-west-2
   
* H1:
   - eu-west-1
   - us-east-1
   - us-east-2
   - us-west-2
   
* I3:
   - ap-northeast-1
   - ap-northeast-2
   - ap-south-1
   - ap-southeast-1
   - ap-southeast-2
   - ca-central-1
   - eu-central-1
   - eu-west-1
   - eu-west-2
   - sa-east-1
   - us-east-1
   - us-east-2
   - us-west-1
   - us-west-2
   
* M4:
   - ap-northeast-1
   - ap-northeast-2
   - ap-south-1
   - ap-southeast-1
   - ap-southeast-2
   - ca-central-1
   - eu-central-1
   - eu-west-1
   - eu-west-2
   - sa-east-1
   - us-east-2
   
* M5:
   - ap-northeast-2
   - ap-south-1
   - ap-southeast-1
   - ap-southeast-2
   - ca-central-1
   - eu-central-1
   - eu-west-1
   - eu-west-2
   - us-east-1
   - us-east-2
   - us-west-1
   - us-west-2
   
* P3:
   - ap-northeast-1
   - ap-northeast-2
   - eu-west-1
   - us-east-1
   - us-east-2
   - us-west-2
 
* R4:
   - ap-northeast-1
   - ap-northeast-2
   - ap-south-1
   - ap-southeast-1
   - ca-central-1
   - eu-central-1
   - eu-west-1
   - eu-west-2
   - sa-east-1
   - us-east-1
   - us-east-2
   - us-west-1
   - us-west-2
   
* T2:
   - ap-northeast-1
   - ap-northeast-2
   - ap-south-1
   - ap-southeast-1
   - ap-southeast-2
   - ca-central-1
   - eu-central-1
   - eu-west-1
   - sa-east-1
   - us-east-1
   - us-east-2
   - us-west-1
   - us-west-2
   
* X1e:
   - ap-northeast-1
   - eu-west-1
   - us-east-1
   - us-west-2


**Not all regions are available for every family.**
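
The family-to-region table above can also be kept in code, so that an unsupported combination is caught before any request is sent. The following is a minimal sketch: `REGIONS_BY_FAMILY` and `is_available` are hypothetical names, and only three of the families listed above are included for brevity.

```python
# Excerpt of the table above (hypothetical helper; extend it with the other families)
REGIONS_BY_FAMILY = {
    'F1': ['eu-west-1', 'us-east-1', 'us-west-2'],
    'H1': ['eu-west-1', 'us-east-1', 'us-east-2', 'us-west-2'],
    'X1e': ['ap-northeast-1', 'eu-west-1', 'us-east-1', 'us-west-2'],
}

def is_available(family, region):
    """Return True if the region is listed for the given family."""
    return region in REGIONS_BY_FAMILY.get(family, [])

print(is_available('X1e', 'us-west-2'))  # True
print(is_available('F1', 'sa-east-1'))   # False
```

A check like this could run right after the family and region are chosen, before connecting to AWS.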

### Specify the instance size

In [1]:
instancia = '2xlarge'

### Specify the instance family

In [2]:
family = 'X1e'

### Specify the region

In [3]:
region = 'us-west-2'

**----------------------------------------------------------------------------------------------------------------------------**

### Compute the date range

In [4]:
from datetime import datetime, timedelta

# Timestamp format used for the API request
date_format = "%Y-%m-%dT%H:%M:%S"
number_of_days = 90

# Start date (90 days before today)
today = datetime.now()  # current date and time
start = (today - timedelta(days=number_of_days)).strftime(date_format)

# End date (now)
end = today.strftime(date_format)

# Instance type to query, e.g. 'x1e.2xlarge'
fam = family.lower()
instance_types = [fam + '.' + instancia]

print("Downloading history from " + start + " to " + end)


Downloading history from 2018-07-06T22:22:47 to 2018-10-04T22:22:47
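
The same window can be computed by a small function that takes the reference time as a parameter, which makes the arithmetic easy to check against a fixed date. This is only a sketch; `history_window` and `DATE_FORMAT` are hypothetical names that do not appear in the notebook.

```python
from datetime import datetime, timedelta

DATE_FORMAT = "%Y-%m-%dT%H:%M:%S"

def history_window(now, days=90):
    """Return (start, end) strings covering the `days` before `now`."""
    window_start = now - timedelta(days=days)
    return window_start.strftime(DATE_FORMAT), now.strftime(DATE_FORMAT)

# Reference time taken from the output shown above
window = history_window(datetime(2018, 10, 4, 22, 22, 47))
print(window)  # ('2018-07-06T22:22:47', '2018-10-04T22:22:47')
```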


### Loading libraries and connecting to EC2

In [5]:
import sys
import boto.ec2
import pandas as pd
import matplotlib.pyplot as plt

%pylab inline

# Open an EC2 connection to the selected region
ec2 = boto.ec2.connect_to_region(region)


Populating the interactive namespace from numpy and matplotlib


In [6]:
#
# Process the response and convert it into a DataFrame
#

rows = []
for instance in instance_types:
    sys.stdout.write("*** Processing instance: " + instance + " ***\n")
    sys.stdout.flush()
    next_token = None
    while True:
        # Each call returns one page of results plus a token for the next page
        prices = ec2.get_spot_price_history(start_time=start, end_time=end, instance_type=instance,
                                            next_token=next_token)
        for price in prices:
            rows.append({'InstanceType': price.instance_type,
                         'AvailabilityZone': price.availability_zone,
                         'SpotPrice': price.price,
                         'Timestamp': price.timestamp,
                         'Description': price.product_description})
        next_token = prices.next_token
        if not next_token:  # an empty token means there are no more pages
            break
        sys.stdout.write(".")
        sys.stdout.flush()

    sys.stdout.write("\n")

df = pd.DataFrame(rows)
df = df.set_index(pd.to_datetime(df['Timestamp']))


*** Processing instance: x1e.2xlarge ***
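
The loop above follows the usual token-based pagination pattern: request a page, keep its items, and repeat with the returned token until the token comes back empty. The sketch below isolates that pattern so it can be run without AWS credentials. `fetch_all`, `fake_fetch` and `FAKE_PAGES` are hypothetical stand-ins, not part of boto.

```python
def fetch_all(fetch_page):
    """Collect the items from every page of a token-paginated API."""
    items = []
    token = None
    while True:
        page, token = fetch_page(token)
        items.extend(page)
        if not token:  # an empty string or None marks the last page
            break
    return items

# Stand-in for the real API call: three fake pages keyed by token
FAKE_PAGES = {
    None: ([1, 2], 'a'),
    'a': ([3, 4], 'b'),
    'b': ([5], ''),
}

def fake_fetch(token):
    return FAKE_PAGES[token]

print(fetch_all(fake_fetch))  # [1, 2, 3, 4, 5]
```

Testing for a falsy token rather than comparing against `''` also covers the case where the token is `None`.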



### Export the data to a CSV file

In [7]:
# Save the DataFrame as CSV so it can be loaded into the database
df.to_csv('datos.csv', encoding='utf-8', index=False)

### Insert into the database

In [8]:
import sqlite3
import csv

# Table name: family + region, with hyphens replaced by underscores (e.g. X1e_us_west_2)
table = family + "_" + region.replace('-', '_')

# Connect to the database; it is created automatically if it does not exist
conexion = sqlite3.connect("BBDD.db")

# A cursor is needed to run SQL statements
cursor = conexion.cursor()
cursor.execute("CREATE TABLE IF NOT EXISTS " + table + " (AvailabilityZone VARCHAR(100), Description VARCHAR(100), InstanceType VARCHAR(100), SpotPrice REAL, Timestamp DATETIME, PRIMARY KEY (AvailabilityZone, InstanceType, Timestamp, SpotPrice, Description))")

# Fill the table from the CSV; DictReader consumes the header row and selects columns by name
with open('datos.csv', 'r') as f:
    reader = csv.DictReader(f)
    for row in reader:
        cursor.execute("INSERT OR IGNORE INTO " + table + " VALUES (?, ?, ?, ?, ?)",
                       (row['AvailabilityZone'], row['Description'], row['InstanceType'],
                        row['SpotPrice'], row['Timestamp']))

# Save the changes with a single commit
conexion.commit()

# Always close the connection
conexion.close()

print("Data downloaded and inserted successfully")

Data downloaded and inserted successfully
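
Once a table is filled, it can be queried with plain SQL. The sketch below uses an in-memory database and made-up sample rows (the prices are illustrative, not real data), so it runs without `BBDD.db`. The table uses the same column names as the one created above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE X1e_us_west_2 (AvailabilityZone TEXT, Description TEXT, "
            "InstanceType TEXT, SpotPrice REAL, Timestamp TEXT)")

# Made-up sample rows, for illustration only
sample = [
    ('us-west-2a', 'Linux/UNIX', 'x1e.2xlarge', 0.50, '2018-10-01T00:00:00'),
    ('us-west-2a', 'Linux/UNIX', 'x1e.2xlarge', 0.75, '2018-10-02T00:00:00'),
    ('us-west-2b', 'Linux/UNIX', 'x1e.2xlarge', 0.40, '2018-10-01T00:00:00'),
]
cur.executemany("INSERT INTO X1e_us_west_2 VALUES (?, ?, ?, ?, ?)", sample)
conn.commit()

# Average spot price per availability zone
cur.execute("SELECT AvailabilityZone, AVG(SpotPrice) FROM X1e_us_west_2 "
            "GROUP BY AvailabilityZone ORDER BY AvailabilityZone")
averages = cur.fetchall()
conn.close()

for zone, avg_price in averages:
    print(zone, avg_price)
```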
