Table order (#127)
evgeniy-r committed Nov 7, 2021
1 parent 30e4fbf commit a829e29
Showing 6 changed files with 103 additions and 14 deletions.
6 changes: 5 additions & 1 deletion CHANGELOG.md
Expand Up @@ -6,14 +6,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]
### 🚀 Added
- Extend template transformer with shared templates and macros [#122](https://github.com/datanymizer/datanymizer/pull/122)
- Add the configurable table order [#127](https://github.com/datanymizer/datanymizer/pull/127)
([@evgeniy-r](https://github.com/evgeniy-r))
- Extend template transformer with shared templates and macros
[#122](https://github.com/datanymizer/datanymizer/pull/122) ([@akirill0v](https://github.com/akirill0v))
- Testing the demo [#114](https://github.com/datanymizer/datanymizer/pull/114)
([@evgeniy-r](https://github.com/evgeniy-r))
- Add support for `postgresql` scheme [#115](https://github.com/datanymizer/datanymizer/pull/115)
([@mgrachev](https://github.com/mgrachev))

### ⚙️ Changed
- Remove arch-specific argument in Demo [#121](https://github.com/datanymizer/datanymizer/pull/121)
([@akirill0v](https://github.com/akirill0v))
- Change edition to 2021 [#113](https://github.com/datanymizer/datanymizer/pull/113)
([@mgrachev](https://github.com/mgrachev))

Expand Down
2 changes: 2 additions & 0 deletions datanymizer_dumper/src/lib.rs
Expand Up @@ -123,6 +123,8 @@ pub trait Table<T>: Sized + Send + Clone + Eq + Hash {
fn get_name(&self) -> String;
/// Returns table name with schema or other prefix, based on database type
fn get_full_name(&self) -> String;
/// Returns possible table names (e.g. full and short)
fn get_names(&self) -> Vec<String>;
/// Get table columns
fn get_columns(&self) -> Vec<Self::Column>;
/// Get columns names (needed in the future for SQL)
Expand Down
48 changes: 46 additions & 2 deletions datanymizer_dumper/src/postgres/dumper.rs
Expand Up @@ -123,7 +123,7 @@ impl PgDumper {
self.dump_writer.write_all(table.query_from().as_bytes())?;
self.dump_writer.write_all(b"\n")?;

let cfg = settings.find_table(&[table.get_full_name().as_str(), table.get_name().as_str()]);
let cfg = settings.find_table(&table.get_names());

self.init_progress_bar(table.count_of_query_to(cfg), &table.get_full_name());

Expand Down Expand Up @@ -179,6 +179,13 @@ impl PgDumper {

Ok(())
}

fn sort_tables(tables: &mut Vec<(<Self as Dumper>::Table, i32)>, order: &[String]) {
tables.sort_by_cached_key(|(tbl, weight)| {
let position = order.iter().position(|i| tbl.get_names().contains(i));
(position, -weight)
});
}
}

impl Dumper for PgDumper {
Expand All @@ -197,8 +204,13 @@ impl Dumper for PgDumper {
let settings = self.settings();
self.write_log("Start dumping data".into())?;
self.debug("Fetch tables metadata...".into());

let mut tables = self.schema_inspector().ordered_tables(connection);
tables.sort_by(|a, b| b.1.cmp(&a.1));
Self::sort_tables(
&mut tables,
settings.table_order.as_ref().unwrap_or(&vec![]),
);

let all_tables_count = tables.len();

let mut query_wrapper =
Expand Down Expand Up @@ -290,4 +302,36 @@ mod tests {
]
);
}

#[test]
fn sort_tables() {
let order = vec!["table2".to_string(), "public.table1".to_string()];

let mut tables = vec![
(PgTable::new("table1".to_string(), "public".to_string()), 0),
(PgTable::new("table2".to_string(), "public".to_string()), 1),
(PgTable::new("table3".to_string(), "public".to_string()), 2),
(PgTable::new("table4".to_string(), "public".to_string()), 3),
(PgTable::new("table1".to_string(), "other".to_string()), 4),
(PgTable::new("table2".to_string(), "other".to_string()), 5),
];

PgDumper::sort_tables(&mut tables, &order);

let ordered_names: Vec<_> = tables
.iter()
.map(|(t, w)| (t.get_full_name(), *w))
.collect();
assert_eq!(
ordered_names,
vec![
("other.table1".to_string(), 4),
("public.table4".to_string(), 3),
("public.table3".to_string(), 2),
("other.table2".to_string(), 5),
("public.table2".to_string(), 1),
("public.table1".to_string(), 0),
]
)
}
}
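
The sort key in `sort_tables` relies on the ordering of `Option`: `None` compares less than any `Some(_)`, so tables missing from `order` come first, and ties are broken by descending weight through the negated value. The following is a minimal, self-contained sketch of that behaviour. `SimpleTable` and `example_order` are hypothetical stand-ins introduced here for illustration, not types from the crate.

```rust
/// Hypothetical stand-in for `PgTable`: only the schema and table name matter here.
#[derive(Clone, Debug)]
struct SimpleTable {
    schema: String,
    name: String,
}

impl SimpleTable {
    fn new(schema: &str, name: &str) -> Self {
        Self {
            schema: schema.to_string(),
            name: name.to_string(),
        }
    }

    fn full_name(&self) -> String {
        format!("{}.{}", self.schema, self.name)
    }

    /// Mirrors `get_names`: the full name first, then the short name.
    fn names(&self) -> Vec<String> {
        vec![self.full_name(), self.name.clone()]
    }
}

/// Same key as in the commit: (position in `order`, negated weight).
/// `None < Some(_)` for `Option`, so unlisted tables sort first.
fn sort_tables(tables: &mut Vec<(SimpleTable, i32)>, order: &[String]) {
    tables.sort_by_cached_key(|(tbl, weight)| {
        let position = order.iter().position(|i| tbl.names().contains(i));
        (position, -weight)
    });
}

/// Sorts a small sample and returns the resulting full names.
fn example_order() -> Vec<String> {
    let order = vec!["table2".to_string(), "public.table1".to_string()];
    let mut tables = vec![
        (SimpleTable::new("public", "table1"), 0),
        (SimpleTable::new("public", "table3"), 2),
        (SimpleTable::new("other", "table2"), 5),
        (SimpleTable::new("public", "table2"), 1),
    ];
    sort_tables(&mut tables, &order);
    tables.iter().map(|(t, _)| t.full_name()).collect()
}
```

Note that a short name in `order` (here `table2`) matches tables in every schema, while a full name (`public.table1`) matches only one table.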
4 changes: 4 additions & 0 deletions datanymizer_dumper/src/postgres/table.rs
Expand Up @@ -46,6 +46,10 @@ impl Table<Type> for PgTable {
format!("{}.{}", self.schemaname, self.tablename)
}

fn get_names(&self) -> Vec<String> {
vec![self.get_full_name(), self.get_name()]
}

fn get_columns(&self) -> Vec<Self::Column> {
self.columns.clone()
}
Expand Down
7 changes: 5 additions & 2 deletions datanymizer_engine/src/settings/mod.rs
Expand Up @@ -45,6 +45,9 @@ pub struct Settings {
/// Tables list with transformation rules
pub tables: Tables,

/// Table order. All tables not listed are dumped at the beginning
pub table_order: Option<Vec<String>>,

/// Default transformers configuration
#[serde(default)]
pub default: TransformerDefaults,
Expand Down Expand Up @@ -96,9 +99,9 @@ impl Settings {
self.tables.iter().find(|t| t.name == name)
}

pub fn find_table(&self, names: &[&str]) -> Option<&Table> {
pub fn find_table<T: AsRef<str>>(&self, names: &[T]) -> Option<&Table> {
for name in names {
let table = self.get_table(name);
let table = self.get_table(name.as_ref());
if table.is_some() {
return table;
}
Expand Down
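
Making `find_table` generic over `T: AsRef<str>` lets callers pass either owned `String`s (as returned by `get_names`) or borrowed `&str`s without converting first. Below is a minimal sketch of that idea. The pared-down `Table` and `Settings` types and the `example_lookup` helper are assumptions for illustration, and the lookup uses `find_map` where the commit uses an explicit loop.

```rust
/// Hypothetical, pared-down versions of the engine's `Table` and `Settings`.
struct Table {
    name: String,
}

struct Settings {
    tables: Vec<Table>,
}

impl Settings {
    fn get_table(&self, name: &str) -> Option<&Table> {
        self.tables.iter().find(|t| t.name == name)
    }

    /// Returns the first configured table matching any of `names`, tried in order.
    fn find_table<T: AsRef<str>>(&self, names: &[T]) -> Option<&Table> {
        names.iter().find_map(|name| self.get_table(name.as_ref()))
    }
}

/// Looks up the same table with owned and with borrowed names.
fn example_lookup() -> (Option<String>, Option<String>) {
    let settings = Settings {
        tables: vec![Table {
            name: "table1".to_string(),
        }],
    };

    // Owned strings, as a `get_names`-style method would return them.
    let owned: Vec<String> = vec!["public.table1".to_string(), "table1".to_string()];
    let by_owned = settings.find_table(&owned).map(|t| t.name.clone());

    // Borrowed string slices, as at the previous call site.
    let by_borrowed = settings
        .find_table(&["public.table1", "table1"])
        .map(|t| t.name.clone());

    (by_owned, by_borrowed)
}
```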
50 changes: 41 additions & 9 deletions docs/config.md
Expand Up @@ -172,12 +172,13 @@ globals:

The config file contains the following sections:

| Section | Mandatory | YAML type | Description
|--- |--- |--- |---
| [tables](#tables) | yes | list | A list of anonymized tables
| [default](#default) | no | dictionary | Default values for different anonymization rules
| [filter](#filter) | no | dictionary | A filter for tables schema and data (what to skip when dumping)
| [globals](#globals) | no | dictionary | Some global values (they are available in anonymization templates)
| Section | Mandatory | YAML type | Description
|--- |--- |--- |---
| [tables](#tables) | yes | list | A list of anonymized tables
| [table_order](#table_order) | no | list | The order in which tables are dumped
| [default](#default) | no | dictionary | Default values for different anonymization rules
| [filter](#filter) | no | dictionary | A filter for tables schema and data (what to skip when dumping)
| [globals](#globals) | no | dictionary | Some global values (they are available in anonymization templates)

## tables

Expand Down Expand Up @@ -490,7 +491,7 @@ The order of column processing will be as follows:

_You only need the `rule_order` section when using the `template` transformer with the `final` special template variable._

For additional information please refer to the [template](#template) transformer documentation.
For additional information please refer to the [template](transformers.md#template) transformer documentation.

#### query

Expand Down Expand Up @@ -533,6 +534,37 @@ You can use the `dump_condition`, `transform_condition` and `limit` options in a

If you don't need data from a particular table at all, please refer to the [filter](#filter) section.

## table_order

A list of tables to dump in the specified order, after all tables that are not in the list.
The order of the unlisted tables depends on foreign keys.
Table names can be given with or without the schema (e.g. `table1` or `public.table1`).

Look at this configuration example:

```yaml
tables:
- name: "table1"
rules: {}
- name: "table2"
rules: {}
- name: "table3"
rules: {}
table_order:
- "table1"
- "table2"
```

The order of table dumping will be as follows:

1. `table3`
2. `table1`
3. `table2`

You may need this section when using the built-in key-value store in the `template` transformer for sharing data between
tables.

For additional information please refer to the [template](transformers.md#template) transformer documentation.

## default

| Section | Mandatory | YAML type | Description
Expand Down Expand Up @@ -614,7 +646,7 @@ filter:
If you need only a subset of the data, please refer to the [query](#query) section.

## templates
You can specify some templates in config to reuse them in your [template](#template) rules.
You can specify some templates in config to reuse them in your [template](transformers.md#template) rules.
There are different kinds of templates:

- `raw` templates are named templates that can be imported or included by name in your field template; you can use macros to extend complex templates.
Expand All @@ -641,7 +673,7 @@ templates:

## globals

You can specify global variables available in all [template](#template) rules.
You can specify global variables available in all [template](transformers.md#template) rules.

```yaml
tables:
Expand Down
