## How to use locally

This notebook reads raw data from the given input directory and writes the transformed data to the given output directory.

1. Clone or download the repo to a local folder.
2. Make sure you have a working Python environment with the necessary packages installed.
3. Run the notebook.
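
Before running the notebook, a quick check of the environment can save a failed run. The sketch below is not part of the repo: the helper name `find_missing_packages` is hypothetical, and the package list is an assumption (`pandas` is imported by this notebook; `openpyxl` is assumed because pandas commonly relies on it to read `.xlsx` files). Adjust the list to match the repo's actual requirements.

```python
import importlib.util

# Assumed package list, not taken from the repo's requirements:
# pandas is imported by this notebook; openpyxl is a guess for reading .xlsx files.
REQUIRED_PACKAGES = ["pandas", "openpyxl"]


def find_missing_packages(packages):
    """Return the names of top-level packages that cannot be found in this environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


missing = find_missing_packages(REQUIRED_PACKAGES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages are available.")
```

If anything is reported as missing, install it into the active environment before step 3.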


## Imports

In [None]:
import os
import helpers.transform as transform
import helpers.create_config as create_config
import pandas as pd

## Input and Output Directories

In [None]:
# INPUT DATA LOCATION
INPUT_DATA_FOLDER = "data/raw"

# DTDL FOLDER
ENTITY_PATH = "../../Appendix/Entities/"

# CONFIG LOCATION
CONFIG_FOLDER = "config"

# OUTPUT DATA LOCATION
OUTPUT_DATA_FOLDER = "data/transformed"

In [None]:
EXCEL_MAPPING_FILE = f"{CONFIG_FOLDER}/conceptual_mapping.xlsx"
OUTPUT_MAPPING_FOLDER = f"{OUTPUT_DATA_FOLDER}/mapping"
CONFIG_FILE = f"{CONFIG_FOLDER}/config.json"


# CREATE THE OUTPUT AND CONFIG FOLDERS IF THEY DO NOT EXIST
# exist_ok=True makes the call a no-op when the folder is already there
os.makedirs(OUTPUT_MAPPING_FOLDER, exist_ok=True)
os.makedirs(CONFIG_FOLDER, exist_ok=True)

## Creating Configuration File

In [None]:
config_dictionary = create_config.create_entity_mappings_config(EXCEL_MAPPING_FILE)
config_dictionary = create_config.create_relationship_mappings_config(EXCEL_MAPPING_FILE, config_dictionary)
create_config.write_config_json(config_dictionary, CONFIG_FILE)

### Extracting Variables from Config

In [None]:
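# typ="series" returns a pandas Series keyed by the top-level JSON keys
config_series = pd.read_json(CONFIG_FILE, typ="series")
ENTITY_LIST = config_series['entity_list']
RELATIONSHIP_LIST = config_series['relationship_list']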
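
If the transform step fails later with a confusing error, it can help to confirm that the generated config actually contains the two keys read above. The following is a minimal sketch using only the standard `json` module; the helper name `load_config_lists` is hypothetical, and it assumes the config file is a JSON object with top-level `entity_list` and `relationship_list` keys, as the cell above implies.

```python
import json


def load_config_lists(config_path):
    """Load the config JSON and return (entity_list, relationship_list).

    Assumes the file holds a JSON object with both keys at the top level.
    """
    with open(config_path, encoding="utf-8") as handle:
        config = json.load(handle)

    # Fail early with a clear message if an expected key is absent.
    missing = [key for key in ("entity_list", "relationship_list") if key not in config]
    if missing:
        raise KeyError(f"Config file is missing keys: {missing}")

    return config["entity_list"], config["relationship_list"]
```

In the notebook this could be called as `load_config_lists(CONFIG_FILE)` once the config file has been written.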

## Transform the raw data

In [None]:
transform.cache_entity_metadata(ENTITY_LIST, ENTITY_PATH)
transform.write_entity_data(INPUT_DATA_FOLDER, OUTPUT_DATA_FOLDER, OUTPUT_MAPPING_FOLDER, ENTITY_LIST, RELATIONSHIP_LIST)
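
After the transform has run, one way to confirm that something was written is to list the files under the output folder. This is a sketch only: the helper name `list_output_files` is hypothetical, and it makes no assumption about which file names or formats `transform.write_entity_data` produces.

```python
from pathlib import Path


def list_output_files(output_folder):
    """Return the sorted relative paths (POSIX style) of all files under output_folder."""
    root = Path(output_folder)
    if not root.is_dir():
        # Nothing to list if the folder has not been created yet.
        return []
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())
```

In the notebook, `list_output_files(OUTPUT_DATA_FOLDER)` would return the files written under `data/transformed`, including those in the `mapping` subfolder.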