Skip to content

Mirantis/dataops-dataeng

Repository files navigation

Data Engineering - dataeng

Image

Status Checks

Greetings Labeler

The dataeng project is dedicated to storing all data interactions across an organization as IaC. It provides a Command Line Interface to interact witht the various software tools that are used on a daily basis. It also provides an anlaysis perspective, and offers a wide variety of tooling in terms of analyzing overall operational status of the entire organization for all products.

Repository Structure

This repository is loosely organized into 11 main categories:

  1. .assets

    • An assets directory housing images and assets used throughout the directory (EACH FILE HAS AN MD5SUM TO VERIFY FILE INTEGRITY)
  2. .docker

    • A configuration and secrets directory for all things docker
  3. .github

    • A configuration and secrets directory for all things github
  4. .kube

    • A configuration and secrets directory for all things kubernetes
  5. .secrets

    • A configuration and secrets directory for all things secret
  6. Analyses

    • The Analyses directory that houses a variety of analyses across multiple toolings
  7. Applications

    • The Applications directory houses any and all affiliated applications to the housing repository
  8. Automation

    • The Automation directory will contain automation and tooling for deploying and utilizing the repository
  9. Charts

    • The Charts directory will container Helm Charts related to this repository
  10. Infrastructure

    • The Infrastrucutre directory contains information relate to Ansible and Terraform for continuously deployed infrastructure
  11. Lakes

    • The Lakes Directory will house the information related to any and all data lakes within the organization
  12. Pipelines

    • This will contain any pipelines related to the overall repisitory and is tooling dependent
  13. Queries

    • The Queries directory contains SQL Like queries that are stored for IaC purposes
  14. Warehouse

    • The Warehouse directory contains information on how to interact with the Data Warehouse for the respective organization
  15. dataengctl

    • The dataengctl houses the binary that allows you to work with a variety of tooling at the organization's disposal

Navigating Repository

The above showcases the directories that are contained within the repository. All directories labeled with . in front, such as .secrets are meant to be ignored, and nothing should ever be commited inside of these directories. The main directories are each provided with a README.md that tells you how you should interact with the repository in that particular directory.

Getting Started with dataenctl

To get started with dataengctl, you should create a configuration file wihtin your home directory. It should be named ~/.dataeng and should be a hidden directory. In this directory you should create a file called config.yaml, which contains the secrets and credentials needed to work with dataengctl. The config should look something like this:

jiraConfig:
  token: YOURJIRATOKEN
  url: YOURJIRAURL
  username: YOURUSERNAME
salesForceConfig:
  url: YOURSALESFORCEURL
  username: YOURSALESFORCEUSERNAME
  password: YOURSALESFORCEPASSWORD
  token: YOURSALESFORCETOKEN
  clientID: YOURSALESFORCEORGCLIENTID
  apiVersion: YOURSALESFORCEDEFUALTAPI

Installing dataengctl

To install dataengctl you need to make sure you have two things installed

  1. You Want jq installed on your local host
  2. You want go installed on your local host

Working with dataengctl

To work with dataengctl you can use the makefile that is housed within the dataengctl directory, or you can build it directly in that directory. All Binary files are ignored by default within that directory.

  1. Build the dataengctl binary
    • go build -o dataengctl
  2. Afterwards run the binary specifying it with --help
 rbarrett@MacBook-Pro-2  ~/Git/dataeng   DATAENG-19 ●  . --help                                                      1 ↵  10130  11:38:37
gets you data from different sources

Usage:
  dataengctl [command]

Available Commands:
  analyze     Analyze something
  completion  generate the autocompletion script for the specified shell
  help        Help about any command
  jira        Interact with jira
  salesforce  Interact with salesforce

Flags:
      --config-file string   path to config file
      --debug                specify debug level
  -h, --help                 help for dataengctl

Use "dataengctl [command] --help" for more information about a command.

As a result, you can see that there are several commands that are available

    1. analyze
    1. completion
    1. help
    1. jira
    1. salesforce

To interact with Jira and Salesforce or anything that is using the config.yaml mappings, you will need to specify the command as follows:

    1. Intreracting with analyze command specifying the config path
. analyze --issue-type Escalation --project-key FIELD --config-file ${HOME}/.dataeng/config.yaml | jq "."
    1. Interacting with jira command specifying the config path
. jira issue list --issue-type Escalation --project-key FIELD --config-file ${HOME}/.dataeng/config.yaml | jq ".pri"
    1. Interacting with salesforce command specifying the config path
. salesforce query "SELECT+name+from+Account" --config-file ${HOME}/.dataeng/config.yaml | jq "."

Without jq installed on your machine the default JSON output will not look pretty. All output is default to JSON format.