
DataScheme

akovanev edited this page Dec 5, 2022 · 19 revisions


root points to the entry definition, i.e. the definition from which generation starts.

definitions lists the object definitions.

Definition

name the object name.

properties the list of the object's properties.
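The following minimal scheme is a sketch of how root, definitions, name and properties fit together. The definition names (Root, Customer) and the property names are hypothetical. Every attribute it uses also appears in the full example further down this page.

```json
{
  "root": "Root",
  "definitions": [
    {
      "name": "Root",
      "properties": [
        {
          "name": "customers",
          "type": "array",
          "pattern": "Customer",
          "minLength": 2,
          "maxLength": 5
        }
      ]
    },
    {
      "name": "Customer",
      "properties": [
        {
          "name": "name",
          "type": "string",
          "pattern": "abcdefghijklmnopqrstuvwxyz",
          "minLength": 5,
          "maxLength": 10
        }
      ]
    }
  ]
}
```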

Properties

type the property type.

Primitive types:

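  • string a string built from the characters given in pattern.
  • set one value from a list of values.
  • guid a GUID.
  • bool True or False.
  • int an integer.
  • double a double-precision floating-point number.
  • decimal a decimal number.
  • datetime a date and time.
  • phone a phone number built from a template.
  • email an email address.
  • ipV4 an IPv4 address.
  • bytearray an array of bytes.
  • composite a string composed by appending a list of substrings.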
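Here is a sketch of several primitive-type entries in a definition's properties array. The property names are hypothetical. It also assumes that guid, bool, email and ipV4 need no attributes beyond name and type. The int and double entries use minValue, maxValue and pattern as this page describes them.

```json
[
  { "name": "id", "type": "guid" },
  { "name": "isActive", "type": "bool" },
  { "name": "attempts", "type": "int", "minValue": 1, "maxValue": 10 },
  { "name": "price", "type": "double", "pattern": "0.00", "minValue": 0.0, "maxValue": 1999.99 },
  { "name": "email", "type": "email" },
  { "name": "ipAddress", "type": "ipV4" }
]
```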

Complex types:

  • object an object described by a definition.
  • array an array of objects or primitive values.
  • file a file of values that is converted to the set primitive type.
  • resource an embedded file of values that is converted to the set primitive type.
  • calc a calculated property. Using it requires a custom generator that specifies the property behaviour, derives from CalcGeneratorBase, and is added to the GeneratorFactory.
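Below is a sketch of complex-type properties, with hypothetical property names. It assumes the following:

  • definitions named Address and Sku exist in the same scheme;
  • a calc property needs no attributes beyond name and type;
  • a matching generator derived from CalcGeneratorBase has been added to the GeneratorFactory.

```json
[
  { "name": "address", "type": "object", "pattern": "Address" },
  { "name": "skus", "type": "array", "pattern": "Sku", "minLength": 1, "maxLength": 3 },
  { "name": "scores", "type": "array", "pattern": "int", "maxLength": 5 },
  { "name": "fullName", "type": "calc" }
]
```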

pattern

  • for string, defines all allowed characters, e.g. "abcdefghABCFEDGH". Spaces are added on top of these.
  • for phone, every # is replaced with a random digit.
  • for set, lists all possible values, separated by commas by default.
  • for double, decimal, guid and datetime, specifies the output format, e.g. "0.00", "yyyy-MM-dd", etc.
  • for object, points to the definition name.
  • for array, points to the definition name if the elements are objects, and to a primitive type otherwise.
  • for file, specifies the path to an existing file. Values are separated by commas by default.
  • for resource, specifies the type name in the assembly and the path to the resource.
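This sketch shows pattern for several types, with hypothetical property names. The file path firstnames.txt comes from the attribute example later on this page and must point to an existing file. The datetime bounds are written in the same format as pattern, following the lastUpdated property in the JSON example.

```json
[
  { "name": "code", "type": "string", "pattern": "ABCDEF0123456789" },
  { "name": "phone", "type": "phone", "pattern": "+45 ## ## ## ##" },
  { "name": "status", "type": "set", "pattern": "new,active,closed" },
  { "name": "lastUpdated", "type": "datetime", "pattern": "yyyy-MM-dd", "minValue": "2019-10-20", "maxValue": "2020-01-01" },
  { "name": "firstName", "type": "file", "pattern": "firstnames.txt" }
]
```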

subpattern is a primitive type pattern used only within arrays of primitive types.

{
  "name": "prices",
  "type": "array",      // <- must be array
  "pattern": "double",  // <- the pattern specifies the type of the array elements
  "subpattern": "0.00"  // <- the output format of each double element
}

sequenceSeparator the value separator for set and file (a comma by default).
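In this sketch, sequenceSeparator replaces the default comma with a semicolon. The property names and the file cities.txt are hypothetical.

```json
[
  { "name": "color", "type": "set", "pattern": "red;green;blue", "sequenceSeparator": ";" },
  { "name": "city", "type": "file", "pattern": "cities.txt", "sequenceSeparator": ";" }
]
```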

minLength the minimum length of a string, or the minimum size of an array.

maxLength the maximum length of a string, or the maximum size of an array.

minSpaceCount the minimum number of spaces in a string.

maxSpaceCount the maximum number of spaces in a string.

minValue the minimum value for int, double and datetime.

maxValue the maximum value for int, double and datetime.
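Here is a sketch that combines length, space and value bounds, with hypothetical names. Applying minValue and maxValue to the elements of an int array is an assumption. It mirrors the TestAnswers property in the attribute example, where DgRange and DgLength are placed on an int[].

```json
[
  {
    "name": "title",
    "type": "string",
    "pattern": "abcdefghijklmnopqrstuvwxyz",
    "minLength": 15,
    "maxLength": 50,
    "minSpaceCount": 1,
    "maxSpaceCount": 3
  },
  { "name": "year", "type": "int", "minValue": 1, "maxValue": 5 },
  { "name": "testAnswers", "type": "array", "pattern": "int", "minValue": 1, "maxValue": 5, "minLength": 1, "maxLength": 5 }
]
```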

failure configures invalid data that appears with the specified probabilities:

  • nullable the probability that null appears.
  • custom the probability that the custom invalid value (see customFailure) appears.
  • range the probability that the value falls outside the [minValue, maxValue] interval. For string, this means the length falls outside the [minLength, maxLength] interval.

customFailure specifies the value that will appear for the custom failure case.
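This sketch combines failure with customFailure and mirrors the EncodedSolution property from the attribute example later on this page. The property name is hypothetical. With these settings a value is expected to be:

  • null with probability 0.1;
  • the custom value "####-####-####" with probability 0.1;
  • a string whose length falls outside 15 to 50 with probability 0.05.

```json
{
  "name": "encodedSolution",
  "type": "string",
  "pattern": "abcdefghijklmnopqrstuvwxyz0123456789",
  "minLength": 15,
  "maxLength": 50,
  "failure": { "nullable": 0.1, "custom": 0.1, "range": 0.05 },
  "customFailure": "####-####-####"
}
```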

JSON file

Let's suppose we have a couple of models.

using System;
using System.Collections.Generic;

class Product
{
    public string Name { get; set; }
    public DateTime LastUpdated { get; set; }
    public List<Sku> Skus { get; set; }
}

class Sku
{
    public string Name { get; set; }
    public decimal Price { get; set; }
}

Product.Name, like Sku.Name, should contain only the characters [a-z0-9], with a length between 10 and 20 and between 10 and 50 respectively. LastUpdated and Price also have defined ranges. All properties may receive invalid data, each with its own probability.

The JSON file may look like the one below.

{
  "root": "Root",
  "definitions": [
    {
      "name": "Root",
      "properties": [
        {
          "name": "count",
          "type": "set",
          "pattern": "100"
        },
        {
          "name": "products",
          "type": "array",
          "pattern": "Product",
          "minLength": 100,
          "maxLength": 100
        }
      ]
    },
    {
      "name": "Product",
      "properties": [
        {
          "name": "name",
          "type": "string",
          "pattern": "abcdefghijklmnopqrstuvwxyz0123456789",
          "minLength": 10,
          "maxLength": 20,
          "minSpaceCount": 1,
          "maxSpaceCount": 2,
          "failure": {
            "nullable": 0.1,
            "custom": 0.1,
            "range": 0.05
          }
        },
        {
          "name": "lastUpdated",
          "type": "datetime",
          "pattern": "dd/MM/yy",
          "minValue": "20/10/19",
          "maxValue": "01/01/20",
          "failure": {
            "nullable": 0.1,
            "custom": 0.2,
            "range": 0.1
          }
        },
        {
          "name": "skus",
          "type": "array",
          "pattern": "Sku",
          "maxLength": 3
        }
      ]
    },
    {
      "name": "Sku",
      "properties": [
        {
          "name": "name",
          "type": "string",
          "pattern": "abcdefghijklmnopqrstuvwxyz0123456789",
          "minLength": 10,
          "maxLength": 50,
          "minSpaceCount": 1,
          "maxSpaceCount": 4,
          "failure": {
            "nullable": 0.25
          }
        },
        {
          "name": "price",
          "type": "double",
          "pattern": "0.00",
          "minValue": 0.0,
          "maxValue": 1999.99,
          "failure": {
            "nullable": 0.1,
            "custom": 0.05,
            "range": 0.15
          }
        }
      ]
    }
  ]
}

The root value "Root" is the name of the entry definition. Every definition consists of a list of properties that may point to other definitions. There should be no circular references, though.

Some attributes, such as name and type, are mandatory. For arrays, objects, and some other types, pattern is required as well. More information can be found here.

// Read the scheme from a JSON file
var dg = new DG();
DataScheme scheme = dg.GetFromFile("data.json");

// Generate the data and save it to a file
string output = dg.GenerateJson(scheme);
dg.SaveToFile("data.out.json", output);

DataGenerator.Attributes

Just a piece of code: the same kind of rules expressed with attributes on the model classes.

public class DgStudentCollection
{
    [DgCalc]
    public int? Count { get; set; }

    [DgLength(Min = 100, Max = 100)]
    public List<DgStudent>? Students { get; set; }
}

public class DgStudent
{
    [DgFailure(NullProbability = 0.2)]
    public Guid Id { get; set; }

    [DgSource("firstnames.txt")]
    [DgFailure(NullProbability = 0.1)]
    public string? FirstName { get; set; }

    [DgSource(ResourceType.LastNames, true)]
    [DgFailure(NullProbability = 0.1)]
    public string? LastName { get; set; }

    [DgCalc] //supposed to be calculated
    public string? FullName { get; set; }
    
    [DgGenerator(StudentGeneratorFactory.UintGenerator)]
    [DgRange(Max = 5)]
    public int Year { get; set; }
    
    public DgAddress? CompanyAddress { get; set; }

    [DgName("test_variant")]
    public Variant Variant { get; set; }

    [DgName("test_answers")]
    [DgRange(Min = 1, Max = 5)]
    [DgLength(Max = 5)]
    public int[]? TestAnswers { get; set; }

    [DgName("encoded_solution")]
    [DgPattern(StringGenerator.AbcNum)]
    [DgLength(Min = 15, Max = 50)]
    [DgSpacesCount(Min = 1, Max = 3)]
    [DgFailure(
        NullProbability = 0.1,
        CustomProbability = 0.1,
        OutOfRangeProbability = 0.05)]
    [DgCustomFailure("####-####-####")]
    public string? EncodedSolution { get; set; }

    [DgName("last_updated")]
    [DgPattern("dd/MM/yy")]
    [DgRange(Min = "20/10/19", Max = "01/01/20")]
    [DgFailure(
        NullProbability = 0.2,
        CustomProbability = 0.2,
        OutOfRangeProbability = 0.1)]
    public DateTime? LastUpdated { get; set; }

    public List<DgSubject>? Subjects { get; set; }

    [DgPattern("##.##")]
    [DgRange(Min = 9.50, Max = 99.50)]
    public decimal Discount { get; set; }
    
    [DgLength(Min = 4, Max = 16)]
    [DgFailure(NullProbability = 0.1)]
    public byte[]? Signature { get; set; }
}

public class DgAddress
{
    [DgSource(ResourceType.Companies, true)]
    [DgFailure(NullProbability = 0.1)]
    public string? Company { get; set; }
        
    [DgGenerator(TemplateType.Phone)]
    [DgPattern("+45 ## ## ## ##;+420 ### ### ###")]
    [DgFailure(NullProbability = 0.05)]
    public string? Phone { get; set; }

    [DgGenerator(TemplateType.Email)]
    [DgFailure(NullProbability = 0.1)]
    public string? Email { get; set; }
    
    [DgSource(ResourceType.Addresses, true)]
    [DgFailure(NullProbability = 0.2)]
    public string? AddressLine { get; set; } 
    
    [DgSource(ResourceType.Cities, true)]
    [DgFailure(NullProbability = 0.1)]
    public string? City { get; set; } 
    
    [DgSource(ResourceType.Countries, true)]
    [DgFailure(NullProbability = 0.15)]
    public string? Country { get; set; } 

    [DgGenerator(TemplateType.IpV4)]
    [DgFailure(NullProbability = 0.1)]
    public string? IpAddress { get; set; }
}

// ...
var dg = new DG(
    new StudentGeneratorFactory(), 
    new DataSchemeMapperConfig { UseCamelCase = true });

DataScheme scheme = dg.GetFromType<DgStudentCollection>();

string jsonData = dg.GenerateJson(scheme);

Mapper Profile

ForType<T>() - registers the type in the profile. All nested types (e.g. the types of object and collection properties) should be registered as well.

Ignore(Expression<Func<TType, TProp>> expression) - excludes the property from data generation.

Property(Expression<Func<TType, TProp>> expression) - selects the property whose generation rules are set up next. If a property is neither ignored nor set up in the profile, the defaults are applied to it.

Assign(Expression<Func<TType, object>> expression) - assigns a value to the property, computed from the object's data.

Just a piece of code again: a similar configuration expressed as a mapper profile.

public class StudentsTestProfile : DgProfileBase
{
    public StudentsTestProfile()
    {
        ForType<StudentCollection>()
            .Property(c => c.Count).IsCalc()
            .Property(c => c.Students).Length(100, 100);
                
        ForType<Student>()
            .Ignore(s => s.HasWarnings).Ignore(s => s.IsValid)
            .Ignore(s => s.ParsingErrors).Ignore(s => s.ParsingWarnings)
            .Property(s => s.Id).Failure(nullable: 0.2)
            .Property(s => s.FirstName).FromFile("firstnames.txt").Failure(nullable: 0.1)
            .Property(s => s.LastName).FromResource(ResourceType.LastNames).Failure(nullable: 0.1)
            .Property(s => s.FullName).Assign(s => $"{s.FirstName} {s.LastName}")
            .Property(s => s.Year).UseGenerator(StudentGeneratorFactory.UintGenerator).Range(5)
            .Property(s => s.Variant).HasJsonName("test_variant")
            .Property(s => s.TestAnswers).HasJsonName("test_answers").Length(5).Range(1, 5)
            .Property(s => s.EncodedSolution).HasJsonName("encoded_solution")
                .Pattern(StringGenerator.AbcNum).Length(15, 50).Spaces(1,3)
                .Failure(0.1, 0.1, 0.05, "####-####-####" )
            .Property(s => s.LastUpdated).HasJsonName("last_updated").Pattern("dd/MM/yy")
                .Range("20/10/19","01/01/20").Failure(0.2, 0.2, 0.1)
            .Property(s => s.Subjects).Length(4)
            .Property(s => s.Discount).Pattern("##.##").Range(9.50, 99.50)
            .Property(s => s.Signature).Length(4, 16).Failure(nullable: 0.1);

        ForType<Address>()
            .Property(s => s.Company).FromResource(ResourceType.Companies).Failure(nullable: 0.1)
            .Property(s => s.Phone).UseGenerator(TemplateType.Phone)
                .Pattern("+45 ## ## ## ##;+420 ### ### ###")
                .Failure(nullable: 0.05)
            .Property(s => s.Email).UseGenerator(TemplateType.Email).Failure(nullable: 0.1)
            .Property(s => s.AddressLine).FromResource(ResourceType.Addresses).Failure(nullable: 0.25)
            .Property(s => s.City).FromResource(ResourceType.Cities).Failure(nullable: 0.1)
            .Property(s => s.Country).FromResource(ResourceType.Countries).Failure(nullable: 0.15)
            .Property(s => s.IpAddress).UseGenerator(TemplateType.IpV4).Failure(nullable: 0.1);
        
        ForType<Subject>()
            .Property(s => s.EncodedDescription).HasJsonName("encoded_description")
                .UseGenerator(TemplateType.CompositeString)
                .Pattern($"{StringGenerator.AbcUpper}{{2,5}}-{{1}}{StringGenerator.Num}{{3}}")
            .Property(s => s.Attempts).Range(1, 10)
            .Property(s => s.TotalPrices).HasJsonName("total_prices").SubTypePattern("0.00")
                .Range(0, 125.0).Length(2).Failure(0.15, 0.2, 0.05, "####");
    }
}

//...
var dg = new DG(
    new StudentGeneratorFactory(),
    new DataSchemeMapperConfig { UseCamelCase = true },
    new FileReadConfig { UseCache = true });

string jsonData = dg.GenerateJson<StudentCollection>(new StudentsTestProfile());