# Schema Model

A model to allow development and evolution of database queries generated on the fly.

The first version of the JsonSqlClient simplifies the generation of JSON-producing nested queries using reflection. The critical information needed for this is the relationship between two tables, identified by a pair of linked columns.

If Primary and Foreign keys exist, these are sufficient to define the link, but in many cases foreign keys are implied, not explicit. Also, some fields may behave like keys although they are not defined as such. Expressing the link "manually" overcomes this limitation.
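
A minimal sketch of what expressing a link "manually" could look like, assuming fully qualified `schema.table.column` names. The `Link` record, its `IsImplied` flag and the `dbo.Organizations` table are hypothetical names used only for illustration; they are not part of JsonSqlClient.

```csharp
using System;
using System.Collections.Generic;

var links = new List<Link>
{
    // Explicit FK, as it could be discovered from database metadata.
    new Link("rpt.Reports.ReportMethodId", "rpt.ReportMethods.ReportMethodId", IsImplied: false),
    // Implied link, declared by hand because no FK constraint exists (hypothetical target).
    new Link("rpt.Reports.OrganizationId", "dbo.Organizations.OrganizationId", IsImplied: true),
};

foreach (var link in links)
{
    var kind = link.IsImplied ? "implied" : "explicit";
    Console.WriteLine($"{link.FromColumn} --> {link.ToColumn} ({kind})");
}

// Hypothetical: a directional link from a referencing column to the column it points at.
public record Link(string FromColumn, string ToColumn, bool IsImplied);
```

Both kinds of link share one shape, so a generator would not need to know whether a link came from metadata or from inspection.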

Developing a model that can be used to generate queries is a multi-step process, and some of the steps may occur long after the early ones. This allows the interpretation of the model to be refined over time.

- Initially, automated analysis can determine the schema, table and column names and Primary Keys.
- Establishing the Foreign Keys provides an initial set of link definitions.
- At this stage, inspection is needed to add:
  - implied keys (candidates for links)
  - columns eligible as global parameters (for example user keys, org ids that occur across multiple tables)
  - columns suitable for filtering on 
  - columns which contain sensitive information
  - columns which contain zipped, compressed, encrypted or otherwise formatted mixed information
  - columns which are obsolete or redundant
  - identity columns: these might need special handling
  - columns which contain sensitive information but should be hashed to allow comparison
  - columns suitable for ordering
  
The model should be serializable and deserializable at all times, because we need to be able to inject the latest version or revert to a former one without massive deployment overhead. In a utility, it might also be nice to be able to import a model into the environment for pre-release testing.



```mermaid
  graph TD
  r((reflection))
  i((Inspection))
  k[/PK & FK/]
  c[/Column Map/]
  f[/Filters/]
  s[/Sorters/]
  p[/global parameters/]
  h[/special handling/]
r --> k
r --> c
r --> |id*| s
i --> f
i --> s
i --> p
i --> h

```
\* The query generator can limit the replies to the top N results: it orders by id descending so as to include the most recently added rows.
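
The footnote above can be sketched as a small query builder. This is an illustration only: `BuildTopQuery` is a hypothetical helper, and the `TOP` syntax assumes SQL Server, which the source does not state.

```csharp
using System;

// Hypothetical helper: builds a query returning the most recently added rows first.
// Assumes SQL Server syntax (TOP) and that idColumn increases as rows are added.
string BuildTopQuery(string schema, string table, string idColumn, int top)
{
    if (top <= 0) throw new ArgumentOutOfRangeException(nameof(top));
    return $"SELECT TOP ({top}) * FROM [{schema}].[{table}] ORDER BY [{idColumn}] DESC";
}

Console.WriteLine(BuildTopQuery("rpt", "Reports", "ReportId", 10));
// prints: SELECT TOP (10) * FROM [rpt].[Reports] ORDER BY [ReportId] DESC
```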

## Chronological tactics

- Start with automation to map the Primary and Foreign key relationships (links), if these are largely implemented
- Augment the automation-generated column map by identifying implied links
- Identify well-known parameter candidates as parameters (perhaps treat as a pseudo-table?)
- Overlay Filters and Sorters
- Identify Special handling cases
- Determine how to handle the most critical Special Handling cases
- Modify the generator progressively to use this information.




## Structural tactics

The Schema - Table - Column hierarchy is easily modeled and JSON-serializable.
Most of the qualifications (identity, filter, sorter, special handling, parameter) occur at the column level.
The main exception is explicit and implied keys, which are directional links between two columns.

We could consider indexed Schema, Table and Column lists (arrays), with the ability to identify linked pairs in a fairly compact link table.

Similarly, parameter candidates could identify 1-n occurrences with an index list ...

The simplest structure would repeat schema and table names in a flat list of columns with attributes. A variation could use a lookup list for the schema and table names but hide the details. Also, although FK -> PK relationships are n -> 1, each of the n is associated with a single column, so if the traversable link is stored at the site of the foreign reference it is effectively 1 -> 1.
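
To make the indexed variant concrete, here is a rough sketch in which names are stored once and a link table pairs column indexes. The layout and the `FullName` helper are assumptions for illustration, not a settled design.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical compact layout: names are stored once, columns refer to them by index.
var schemas = new List<string> { "rpt" };
var tables = new List<(int Schema, string Name)> { (0, "Reports"), (0, "ReportMethods") };
var columns = new List<(int Table, string Name)>
{
    (0, "ReportId"),        // column 0
    (0, "ReportMethodId"),  // column 1
    (1, "ReportMethodId"),  // column 2
};

// Link table: each entry pairs a referencing column index with the referenced column index.
var links = new List<(int Source, int Target)> { (1, 2) };

// Rebuilds the fully qualified name from the index lists.
string FullName(int columnIndex)
{
    var column = columns[columnIndex];
    var table = tables[column.Table];
    return $"{schemas[table.Schema]}.{table.Name}.{column.Name}";
}

foreach (var (source, target) in links)
{
    Console.WriteLine($"{FullName(source)} --> {FullName(target)}");
}
// prints: rpt.Reports.ReportMethodId --> rpt.ReportMethods.ReportMethodId
```

The link is stored once per referencing column, which matches the observation that each stored link is effectively 1 -> 1.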

Traditionally, we think of Schema as being navigated like
<pre>
  Schema --> Table --> Column --> Properties ...
</pre>
A reversal of the navigation might be more efficient:
<pre>
  Property Value --> Column --> Table --> Schema --> Links --> Link Columns --> Link Table --> Link Schema
</pre>
Or equally, we might consider every table entry to be fully qualified:
<pre>
  Property Value --> Schema.Table.Column --> Links --> Schema.Table.Column
</pre>
This looks more efficient to search, since each entry can be found by its full name, and it is readily JSON-serialized.

The links would identify the PK and the (implied or explicit) FK by their Schema.Table.Column names.
Since the column data is really just one long list, we could store each reference on the FK column, as each FK points to exactly one PK. This would require a PK reference for every such candidate, so we might need to identify some targets as Parameters to simplify references.

How we handle this depends on whether there is a meaningful behavioral difference between FKs that point to a PK (N --> 1) and implied references that might connect outside of the realm (e.g. user ids, org ids, external references to other scopes).

Clearly, outside references that do not exist as PKs in the system under analysis need to be parameters: user keys, org ids, etc.
These would typically be injected by the user or perhaps by a microservice call. So good.

Other implied keys would be treated like foreign keys if they point to effective PKs, otherwise parameter references. ✅
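
The rule above can be sketched as a lookup against the set of known effective PKs. The `Classify` function and the contents of the set are hypothetical, chosen to match the examples used later in this notebook.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical: identities of columns known to behave as primary keys.
var effectivePrimaryKeys = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    "rpt.Reports.ReportId",
    "rpt.ReportMethods.ReportMethodId",
};

// An implied key is treated as a foreign key when its target is an effective PK
// in the system under analysis; otherwise it is treated as a parameter reference.
string Classify(string targetIdentity) =>
    effectivePrimaryKeys.Contains(targetIdentity) ? "ForeignKey" : "Parameter";

Console.WriteLine(Classify("rpt.ReportMethods.ReportMethodId")); // prints: ForeignKey
Console.WriteLine(Classify("Global.OrganizationId"));            // prints: Parameter
```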


<pre>
Schema.Table.Column.A !FK --> Schema.Table.Column.B  
                      :: properties

Schema.Table.Column.B !PK

Schema.Table.Column.C !ImpliedFK --> Parameter.D IMPLIED LINK TO PARAMETER

Parameter.ParameterUID :: properties
</pre>
We have to accommodate multiple-field PKs. One way to do this is to use 0 to denote a column that is not part of a PK, 1 to denote an identity/PK field, 2 to denote that the column is part of a 2-field PK, and so on.
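
As a rough illustration of the 0 / 1 / n convention, the sketch below rebuilds each table's key from flat column rows and checks that the declared size matches the number of flagged columns. The `rpt.ReportTags` table and the tuple layout are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative rows: (schema.table, column, inPrimaryKey) using the 0 / 1 / n convention.
var columns = new List<(string Table, string Column, int InPrimaryKey)>
{
    ("rpt.Reports", "ReportId", 1),
    ("rpt.Reports", "ReportTitle", 0),
    ("rpt.ReportTags", "ReportId", 2),   // hypothetical table with a 2-field PK
    ("rpt.ReportTags", "TagId", 2),
    ("rpt.ReportTags", "Notes", 0),
};

// Rebuild each table's key; the declared size n should match the number of flagged columns.
var keys = columns
    .Where(c => c.InPrimaryKey > 0)
    .GroupBy(c => c.Table)
    .ToDictionary(
        g => g.Key,
        g => (Columns: g.Select(c => c.Column).ToList(),
              IsConsistent: g.All(c => c.InPrimaryKey == g.Count())));

foreach (var (table, key) in keys)
{
    var names = string.Join(", ", key.Columns);
    Console.WriteLine($"{table}: ({names}) consistent={key.IsConsistent}");
}
```

One limitation of this encoding is that it records the size of the key but not the position of each column within it.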

## Possible implementation

- [x] Parameters are fields without a column
- [x] Columns are addressed via their enclosing Field
- [x] OutboundFieldLink is populated for an implied or explicit FK: it can reference fields or columns, no problem.
- [x] Could easily use a dictionary for fast searching

Better would be to make everything a Field, and have Column as a nullable property. Thus partial databases would have some unreachable columns present as fields.

```mermaid
classDiagram

Field o-- Column

class Field{
    +Unique_Identity
    +DataType
    +CanSort
    +CanFilter
    +HandleAs
    +HandleWith
    +Column
}

class Column{

    +ColumnName
    +TableName
    +SchemaName
    +PrimaryKey
    +OutboundFieldLink
}
Field o-- HandleAs
class HandleAs{
   Visible,
   Truncated,
   Zipped,
   Json,
   Hashed,
   Formatted,
   Special
}
```


Also possible to simplify:

```mermaid
classDiagram

Field o-- Column

class Field{
    +UniqueName
    +DataType
    +CanSort
    +CanFilter
    +Action
    +Data
    +Column
}
class Column{

    +ColumnName
    +TableName
    +SchemaName
    +PrimaryKey
    +OutboundFieldLink
}
Field o-- Action
class Action{
    <<enum>>
   Show,
   Hide,
   Truncate,
   Hash,
   Format,
   Process
}
```
Process would cover decryption, unzipping, partial redaction, whatever is needed. The data for the action comes from the Data property.
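
A minimal sketch of how a generator might apply an action to a text value, assuming .NET 5 or later. The `Apply` function is hypothetical, as is the use of `data[0]` as the truncation length; `Format` and `Process` are left unhandled because their design is still open.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical dispatcher: applies a field's Action to a raw text value.
// Treating data[0] as the truncation length is an assumption.
string Apply(ActionType action, string value, object[] data = null)
{
    switch (action)
    {
        case ActionType.Show:
            return value;
        case ActionType.Hide:
            return null;
        case ActionType.Truncate:
            var max = (int)data[0];
            return value.Length <= max ? value : value.Substring(0, max);
        case ActionType.Hash:
            // SHA-256 lets values be compared without exposing them.
            return Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(value)));
        default:
            // Format and Process need handlers that are not designed yet.
            throw new NotSupportedException($"No handler for {action}.");
    }
}

Console.WriteLine(Apply(ActionType.Truncate, "Quarterly report", new object[] { 9 })); // prints: Quarterly
Console.WriteLine(Apply(ActionType.Hash, "secret") == Apply(ActionType.Hash, "secret")); // prints: True

// Repeated from the diagram so that the sketch is self-contained.
public enum ActionType { Show, Hide, Truncate, Hash, Format, Process }
```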

In [None]:
// A first attempt at the model
public enum ActionType
{
    Show,
    Hide,
    Truncate,
    Hash,
    Format,
    Process
}
public class Field
{
   public string FieldIdentity { get; set; }
   public string DataType { get; set; }
   public bool CanSort { get; set; }
   public bool CanFilter { get; set; }
   public ActionType Action { get; set; }
   public object[] Data {get; set; }
   public Column Column {get; set; }
   public static Field FromBool(string identity)
   {
      var field = new Field(identity);
      field.DataType = "bool";
      return field;
   }
   public static Field FromDate(string identity)
   {
      var field = new Field(identity);
      field.DataType = "datetime";
      return field;
   }
   public static Field FromNumber(string identity)
   {
      var field = new Field(identity);
      field.DataType = "int";
      return field;
   }
   public static Field FromText(string identity)
   {
      return new Field(identity);
   }
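   public static Field FromFK(string identity, string reference, string dataType = "long")
   {
       var field = new Field(identity, dataType);
       // Column is null when the identity is not a three-part schema.table.column name.
       if (field.Column != null)
       {
           field.Column.OutboundFieldIdentity = reference;
       }
       return field;
   }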
   public static Field FromPK(string identity, int inPrimary = 1, string dataType = "long")
   {
        var field = new Field(identity, dataType);
        if (identity.Split('.').Length == 3)
        {
           field.Column.InPrimaryKey = inPrimary;
        }
        return field;
   }
   private Field(string identity, string dataType = "string", ActionType action = ActionType.Show)
   {
      FieldIdentity = identity;
      DataType = dataType;
      Action = action;
      var x = identity.Split('.');
      if (x.Length == 3)
      {
        Column = new Column(x);
      }
   }
}
public class Column
{
    public string ColumnName { get; set; } 
    public string TableName { get; set; }
    public string SchemaName { get; set; }
    public int InPrimaryKey { get; set; } // 0 if not, 1 if only, n if part of a n-field PK
    public string OutboundFieldIdentity { get; set; }
    public Column(string[] identity) : this(identity[0], identity[1], identity[2]) { }
    public Column(string schemaName, string tableName, string columnName, int inPrimaryKey = 0, string outboundFieldIdentity = null)
    {
        SchemaName = schemaName;
        TableName = tableName;
        ColumnName = columnName;
        InPrimaryKey = inPrimaryKey;
        OutboundFieldIdentity = outboundFieldIdentity;
    }
}


In [None]:
// let's set up a couple of examples
List<Field> fields = new List<Field>();

fields.AddRange( new List<Field>{
     Field.FromPK("Global.OrganizationId"),
     Field.FromPK("rpt.Reports.ReportId"),
     Field.FromFK("rpt.Reports.ReportMethodId", "rpt.ReportMethods.ReportMethodId"),
     Field.FromFK("rpt.Reports.ReportInstanceId", "rpt.ReportInstances.ReportInstanceId"),
     Field.FromFK("rpt.Reports.OrganizationId","Global.OrganizationId" ),
     Field.FromBool("rpt.Reports.Enabled"),
     Field.FromBool("rpt.Reports.IsScheduledInApp"),
     Field.FromText("rpt.Reports.Format"),
     Field.FromText("rpt.Reports.ReportType"),
     Field.FromText("rpt.Reports.TextQualifier"),
     Field.FromText("rpt.Reports.ReportTitle"),
     Field.FromText("rpt.Reports.FileNameTemplate"),
     Field.FromText("rpt.Reports.Parameters"),
     Field.FromText("rpt.Reports.TransmitterConfiguration"),
     Field.FromDate("rpt.Reports.DateModified"),
});

var Model = new SortedDictionary<string, Field>();
foreach (var f in fields)
{
    Model.Add(f.FieldIdentity, f);
}
Model.Display();


In [None]:
// modify an entry
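// Keys are case-sensitive, so the lookup must match the casing used when the field was added.
var x = Model["rpt.Reports.OrganizationId"];
x.CanFilter = true;
x.CanSort = true;
x.Display();
// A second lookup returns the same instance, so the changes are visible here too.
var y = Model["rpt.Reports.OrganizationId"];
y.Display();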
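
Lookups in a `SortedDictionary<string, Field>` are case-sensitive by default. If identifiers may arrive with inconsistent casing, one option is to pass a case-insensitive comparer when the dictionary is created. The sketch below uses a `bool` as a stand-in for `Field` to stay self-contained.

```csharp
using System;
using System.Collections.Generic;

// Sketch: a comparer that ignores case makes lookups tolerant of identifier casing.
var model = new SortedDictionary<string, bool>(StringComparer.OrdinalIgnoreCase)
{
    ["rpt.Reports.OrganizationId"] = true,
};

Console.WriteLine(model.ContainsKey("rpt.reports.organizationid")); // prints: True
```

Whether this is appropriate depends on whether the database itself treats identifiers as case-insensitive.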

In [None]:

using System.Text.Json;

var json = JsonSerializer.Serialize(Model);
json.Display();
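
Serialization is only half of the requirement: the model also has to be restored. The sketch below round-trips a dictionary through `System.Text.Json`. `FieldDto` is a hypothetical stand-in with a public parameterless constructor; the `Field` class above exposes only a private constructor, so it may need a public or `[JsonConstructor]`-annotated constructor before it can be deserialized.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

var model = new SortedDictionary<string, FieldDto>
{
    ["rpt.Reports.Enabled"] = new FieldDto
    {
        FieldIdentity = "rpt.Reports.Enabled",
        DataType = "bool",
        CanFilter = true,
    },
};

// Serialize, then restore, to confirm that nothing is lost in the round trip.
string json = JsonSerializer.Serialize(model, new JsonSerializerOptions { WriteIndented = true });
var restored = JsonSerializer.Deserialize<SortedDictionary<string, FieldDto>>(json);

Console.WriteLine(restored["rpt.Reports.Enabled"].CanFilter); // prints: True

// Hypothetical stand-in for Field with a public parameterless constructor.
public class FieldDto
{
    public string FieldIdentity { get; set; }
    public string DataType { get; set; }
    public bool CanFilter { get; set; }
}
```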






## Discovery?

These classes look like they will allow the model to be populated from data returned by reflection, and will certainly allow us to find and modify specific entries quickly, especially if we are using a dictionary (or a sorted dictionary).

These classes have been added to the JsonSqlClient project and unit tests added for them.

Next moves are:

- Generate the base model from a sql query [ref](../Sql/Discovery-pk-fk.sql)
- Add ability to insert implied links
- Add ability to add Parameters/External links
- Make decisions about defaults for Sorters and Filters - and do these need to be bytes rather than bools?
- Devise a good SOLID solution for Special Handling
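
On the bytes-versus-bools question, one possible reading is to pack the column qualifications into a single flags value. The `ColumnTraits` enum below is a hypothetical sketch of that idea, with illustrative names and values; it is not a decision.

```csharp
using System;

// Combine qualifications with bitwise OR; test them with HasFlag.
var traits = ColumnTraits.CanFilter | ColumnTraits.CanSort;

Console.WriteLine(traits.HasFlag(ColumnTraits.CanSort));   // prints: True
Console.WriteLine(traits.HasFlag(ColumnTraits.Sensitive)); // prints: False
Console.WriteLine((byte)traits);                           // prints: 3

// Hypothetical flags enum backed by a single byte.
[Flags]
public enum ColumnTraits : byte
{
    None      = 0,
    CanFilter = 1 << 0,
    CanSort   = 1 << 1,
    Sensitive = 1 << 2,
    Obsolete  = 1 << 3,
}
```

A flags value serializes as one number, which keeps the JSON compact, at the cost of being less readable than separate `CanSort` and `CanFilter` properties.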

During this process, once initially created, the model should be serializable and backed up at all its levels of evolution.

[The project continues here](.\R30-database-model-serialization.ipynb)