|
| 1 | +--- |
| 2 | +layout: post |
| 3 | +title: Select Star |
| 4 | +date: 2023-01-03 09:52 +0000 |
| 5 | +comments: false |
| 6 | +categories: [Accelerators] |
| 7 | +tags: [go, test] |
| 8 | +description: select-star - a CLI tool written in Golang for. |
| 9 | +excerpt: Last year (2022) I started learning [Go](https://go.dev/). And how do you do that easily? Well … start building something with it! |
| 10 | +--- |
| 11 | + |
| 12 | +Last year (2022) I started learning [Go](https://go.dev/). And how do you do that easily? Well … start building something with it... |
| 13 | + |
| 14 | +## The Challenge |
| 15 | + |
| 16 | +In my projects I encountered an interesting use case: a team needed to test the transformation of data from a source database to a target database. |
| 17 | + |
| 18 | +You guessed it: this is a case of ETL (extract, transform and load), where a sequential process extracts data from source systems, transforms the information into a consistent data type, then loads the data into a single repository. |
| 19 | + |
| 20 | +As you can see, the first challenge is to design this program so testers can define their own queries easily (without touching the codebase of the tool). |
| 21 | + |
| 22 | +So what functionality should the tool have, if I were the tester writing or running the queries daily? |
| 23 | + |
| 24 | +Functionalities: |
| 25 | + |
| 26 | +* the tool should easily accept source and target database(s) |
| 27 | +* the queries are executed on the source and target databases and the results are compared one by one |
| 28 | +* differences are highlighted and displayed in an … HTML report |
| 29 | +* testers can run any query they want (or all of them) |
| 30 | +* testers can run queries on different sources and targets |
| 31 | + |
| 32 | +Those are ok for now, but what quality attributes should I focus on? |
| 33 | +* Usability: the tool should be easy to configure. Testers should be able to write new queries easily and define new source & target databases on the go. |
| 34 | +* Portability: the tool should run on different operating systems (Mac, Linux, Windows). |
| 35 | +* Maintainability: the tool should be easily modified, improved or adapted to new requirements. |
| 36 | +* Performance: the tool should execute the queries and compute the comparison quickly. The tester should not spend time waiting for results. |
| 37 | + |
| 38 | +## The Solution |
| 39 | + |
| 40 | +Behold `select-star`: a CLI (command-line interface) tool written in Go that processes queries defined in YAML files. |
| 41 | + |
| 42 | + |
| 43 | +By default, the tool looks for a `select-star.yml` configuration file (in the current folder or your user's home folder) where you can define the source and target DBs along with the SQL queries to execute. |
| 44 | + |
| 45 | +The results returned by the queries will be compared row by row and the differences will be saved by default in a timestamped HTML report. |
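
To make the row-by-row comparison concrete, here is a minimal sketch in Go. It is not the actual `select-star` code: the `Row` and `Diff` types and the `CompareRows` function are hypothetical names chosen for illustration, and it assumes both result sets arrive in the same order.

```go
package main

import "fmt"

// Row is one result row keyed by column name (hypothetical type, not from select-star).
type Row map[string]string

// Diff records one mismatching cell (hypothetical type).
type Diff struct {
	RowIndex int
	Column   string
	Source   string
	Target   string
}

// CompareRows walks both result sets in order and reports every differing cell.
// A row that exists on only one side is reported once, with Column set to "<row>".
func CompareRows(source, target []Row, columns []string) []Diff {
	var diffs []Diff
	n := len(source)
	if len(target) > n {
		n = len(target)
	}
	for i := 0; i < n; i++ {
		if i >= len(source) {
			diffs = append(diffs, Diff{RowIndex: i, Column: "<row>", Source: "<missing>", Target: "<present>"})
			continue
		}
		if i >= len(target) {
			diffs = append(diffs, Diff{RowIndex: i, Column: "<row>", Source: "<present>", Target: "<missing>"})
			continue
		}
		for _, col := range columns {
			if source[i][col] != target[i][col] {
				diffs = append(diffs, Diff{RowIndex: i, Column: col, Source: source[i][col], Target: target[i][col]})
			}
		}
	}
	return diffs
}

func main() {
	source := []Row{{"id": "1", "name": "Ana"}, {"id": "2", "name": "Bob"}}
	target := []Row{{"id": "1", "name": "Ana"}, {"id": "2", "name": "Rob"}}
	for _, d := range CompareRows(source, target, []string{"id", "name"}) {
		fmt.Printf("row %d, column %q: source=%q target=%q\n", d.RowIndex, d.Column, d.Source, d.Target)
	}
}
```

A list of `Diff` values like this is the kind of data an HTML report could be rendered from.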
| 46 | + |
| 47 | +Everything is controlled by YAML configuration file(s) where you define the source DBs, target DBs, queries, and variables used. |
| 48 | + |
| 49 | +You can also define mappings or correlations between a source object and a target object (e.g. column A in the source may be renamed A_up in the target). |
| 50 | + |
| 51 | +These mappings will be used when we run the queries. |
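
As an illustration of how such a mapping could be applied during comparison, here is a small Go sketch. The `Mapping` type and the `RowsMatch` function are assumptions made for this example, not the tool's real API; it reuses the `A` to `A_up` rename mentioned above.

```go
package main

import "fmt"

// Mapping translates a source column name to its target column name
// (hypothetical type, not from select-star).
type Mapping map[string]string

// TargetColumn returns the mapped name, or the source name itself when no mapping exists.
func (m Mapping) TargetColumn(source string) string {
	if t, ok := m[source]; ok {
		return t
	}
	return source
}

// RowsMatch checks that every source column has the same value in the target row,
// looking each column up under its mapped target name.
func RowsMatch(m Mapping, source, target map[string]string) bool {
	for col, val := range source {
		got, ok := target[m.TargetColumn(col)]
		if !ok || got != val {
			return false
		}
	}
	return true
}

func main() {
	m := Mapping{"A": "A_up"}
	source := map[string]string{"A": "42", "B": "x"}
	target := map[string]string{"A_up": "42", "B": "x"}
	fmt.Println(RowsMatch(m, source, target)) // column A is found as A_up in the target
}
```

Falling back to the source name for unmapped columns means only the renamed columns have to be listed.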
| 52 | + |
| 53 | +### Architecture |
| 54 | + |
| 55 | +* `ping` - tests the connection to the source and target DBs defined in the YAML configuration file in use |
| 56 | +* `check --query` - executes a single query, identified by name, from the YAML configuration file in use |
| 57 | +* `check --all` - executes all queries defined in the YAML configuration file in use |
| 58 | +* `config` - use a different YAML configuration file |
| 59 | +* `mapping` - use a custom mapping defined in the YAML configuration file |
| 60 | + |
| 61 | +## The Results |
| 62 | +`select-star` uses [goroutines](https://gobyexample.com/goroutines) to process millions of rows in seconds (e.g. 5 million rows processed in ~34 seconds). |
| 63 | + |
| 64 | +Processed = queries executed on the source DB and on the target DB, rows compared, and differences highlighted in a timestamped HTML report! |
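
The post does not show how the goroutines are organised, so the following is only a sketch of one plausible shape: fetch source and target concurrently, then compare chunks of rows in parallel. `FetchBoth`, `CountDiffs`, and `fetchFunc` are hypothetical names, and the fetch functions stand in for real database queries.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchFunc stands in for running one SQL query against one database (assumption).
type fetchFunc func() []string

// FetchBoth runs the source and target fetches concurrently and waits for both.
func FetchBoth(source, target fetchFunc) (src, tgt []string) {
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); src = source() }()
	go func() { defer wg.Done(); tgt = target() }()
	wg.Wait()
	return src, tgt
}

// CountDiffs splits the rows into chunks, compares each chunk in its own
// goroutine, and sums the mismatches. Rows present on only one side count as differences.
func CountDiffs(src, tgt []string, workers int) int {
	n := len(src)
	if len(tgt) < n {
		n = len(tgt)
	}
	extra := len(src) + len(tgt) - 2*n
	if workers < 1 {
		workers = 1
	}
	chunk := (n + workers - 1) / workers
	counts := make([]int, workers) // one slot per worker, so no locking is needed
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		start := w * chunk
		end := start + chunk
		if end > n {
			end = n
		}
		if start >= end {
			continue
		}
		wg.Add(1)
		go func(w, start, end int) {
			defer wg.Done()
			for i := start; i < end; i++ {
				if src[i] != tgt[i] {
					counts[w]++
				}
			}
		}(w, start, end)
	}
	wg.Wait()
	total := extra
	for _, c := range counts {
		total += c
	}
	return total
}

func main() {
	src, tgt := FetchBoth(
		func() []string { return []string{"1|Ana", "2|Bob", "3|Cid"} },
		func() []string { return []string{"1|Ana", "2|Rob", "3|Cid"} },
	)
	fmt.Println("differences:", CountDiffs(src, tgt, 2))
}
```

Giving each worker its own slot in `counts` avoids a mutex; the `WaitGroup` guarantees all writes are finished before the totals are summed.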