# Description

Create the ETL process that transfers data from its raw format into a working relational database.

The database used here is a MySQL database running in a Docker container.

To create and initialize the database:
- docker pull mysql:latest
- docker run --name=mysql_test --env="MYSQL_ROOT_PASSWORD=test" -p 3306:3306 -d mysql:latest

To open a MySQL shell inside the running container:
- docker exec -it mysql_test mysql -uroot -p

Inside the database:
- CREATE DATABASE test_db;
- CREATE USER 'newuser'@'%' IDENTIFIED BY 'newpassword';
- GRANT ALL PRIVILEGES ON test_db.* to 'newuser'@'%';

Reference
https://medium.com/swlh/how-to-connect-to-mysql-docker-from-python-application-on-macos-mojave-32c7834e5afa

Run from a Docker image:
- to build: docker build . -t raw_to_db
- to run: docker run -it -e EXECUTION_ID=444444 -e DB_HOST=docker.for.mac.host.internal raw_to_db

Note: change the DB_HOST environment variable when not running on macOS.
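The `-e` options above reach the Python code as environment variables. A minimal sketch of reading them in one place, assuming `localhost` as the fallback host; the `Settings` class and `load_settings` helper are hypothetical names, not part of this project:

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    db_host: str
    execution_id: Optional[str]  # None when EXECUTION_ID was not passed


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read DB_HOST and EXECUTION_ID from an environment mapping (hypothetical helper)."""
    env = os.environ if env is None else env
    return Settings(
        db_host=env.get("DB_HOST") or "localhost",  # assumed fallback when unset or empty
        execution_id=env.get("EXECUTION_ID"),
    )


settings = load_settings()
print(f"the host is: {settings.db_host}")
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to try without touching the real environment.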

# Import libraries and define functions and paths

## Libraries

In [42]:
import pandas as pd
import datetime
import os

## Functions

## Paths

In [54]:
path_raw_data_folder = '../data/raw/'

# Read Raw data

In [55]:
# load the raw CSV file
data = pd.read_csv(os.path.join(path_raw_data_folder, 'iris.csv'))
# stamp every row with the load time
data['timestamp'] = datetime.datetime.now()
data.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species,timestamp
0,5.1,3.5,1.4,0.2,setosa,2020-07-18 11:35:01.085137
1,4.9,3.0,1.4,0.2,setosa,2020-07-18 11:35:01.085137
2,4.7,3.2,1.3,0.2,setosa,2020-07-18 11:35:01.085137
3,4.6,3.1,1.5,0.2,setosa,2020-07-18 11:35:01.085137
4,5.0,3.6,1.4,0.2,setosa,2020-07-18 11:35:01.085137


# Connect SQL DB and transfer data

## Define connection parameters

In [77]:
print(f'the host is:{os.environ["DB_HOST"]}')

the host is:monster


In [73]:
# fall back to localhost when DB_HOST is not set
if "DB_HOST" not in os.environ:
    os.environ["DB_HOST"] = "localhost"

In [69]:
import sqlalchemy as db

# specify database configurations
config = {
    'host': os.environ['DB_HOST'],
    'port': 3306,
    'user': 'mendes',
    'password': 'test',
    'database': 'test_db'
}
db_user = config.get('user')
db_pwd = config.get('password')
db_host = config.get('host')
db_port = config.get('port')
db_name = config.get('database')
# specify connection string
connection_str = f'mysql+pymysql://{db_user}:{db_pwd}@{db_host}:{db_port}/{db_name}'
# connect to database
engine = db.create_engine(connection_str)
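A connection string built with an f-string breaks when the user name or password contains characters such as `@` or `/`, because they are read as URL delimiters. A minimal sketch of one way to guard against that, using `urllib.parse.quote_plus` from the standard library; the helper name `build_connection_str` and the example password are made up for illustration:

```python
from urllib.parse import quote_plus


def build_connection_str(config: dict) -> str:
    """Build a mysql+pymysql URL, percent-encoding user and password (hypothetical helper)."""
    user = quote_plus(config["user"])
    password = quote_plus(config["password"])
    return (
        f"mysql+pymysql://{user}:{password}"
        f"@{config['host']}:{config['port']}/{config['database']}"
    )


# Illustrative values only; 'p@ss/word' is a made-up password.
example = {
    "host": "localhost",
    "port": 3306,
    "user": "newuser",
    "password": "p@ss/word",
    "database": "test_db",
}
print(build_connection_str(example))
```

The resulting string can be passed to `create_engine` in the same way as `connection_str` above.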

In [72]:
print(os.environ['DB_HOST'])

localhost


## Create or append to table

In [79]:
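# engine.begin() commits on success and rolls back on error
with engine.begin() as conn:
    # pull the metadata of the existing table
    metadata = db.MetaData()
    metadata.reflect(bind=conn, only=['test_table'])
    test_table = metadata.tables['test_table']

    # append the rows to 'iris', creating the table if it does not exist
    data.to_sql('iris', conn, if_exists='append')

print('worked')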

AttributeError: 'Connection' object has no attribute 'fetch'
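The "create or append" behaviour can be tried without a MySQL server. The sketch below uses the standard-library `sqlite3` module as a stand-in for the real database and plain SQL instead of `to_sql`, so it only illustrates the semantics; the `append_rows` helper and the sample rows are assumptions made for this example:

```python
import sqlite3


def append_rows(conn, rows):
    """Create the table on first use, then append rows (hypothetical helper)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS iris ("
        "sepal_length REAL, sepal_width REAL, "
        "petal_length REAL, petal_width REAL, species TEXT)"
    )
    conn.executemany("INSERT INTO iris VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    # return the total number of rows now stored in the table
    return conn.execute("SELECT COUNT(*) FROM iris").fetchone()[0]


conn = sqlite3.connect(":memory:")  # throwaway in-memory database
rows = [(5.1, 3.5, 1.4, 0.2, "setosa"), (4.9, 3.0, 1.4, 0.2, "setosa")]
first_count = append_rows(conn, rows)   # table is created, 2 rows inserted
second_count = append_rows(conn, rows)  # table exists, 2 more rows appended
print(first_count, second_count)
```

Appending never replaces existing rows, so running the load twice stores the same data twice.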

In [71]:
config

{'host': 'localhost',
 'port': 3306,
 'user': 'mendes',
 'password': 'test',
 'database': 'test_db'}