# Verification Suite:
`verificationSuite.yaml` is an essential file for running the DataQuality process. It defines every parameter directly related to the verifications that will be performed.


### vsYaml: 
Assume `verificationSuite.yaml` is an existing file built with the required format. Its path is passed through the `vsYaml` key:

```python
dqView_simple = {"viewName" : "foo",
                 "inputData": "foo",
                 "infraYaml": "foo",
                 "vsYaml": "yamlFiles/quick_tutorial/a_check/verificationSuite.yaml"}
```
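Because the process depends on `vsYaml` pointing to a properly built file, a quick sanity check of the dictionary can catch mistakes early. The sketch below is only an illustration: `find_dq_view_problems` is a hypothetical helper, and treating all four keys as required is an assumption based on the example above, not something the library is known to enforce.

```python
from pathlib import PurePosixPath

# Keys taken from the dqView example; treating all of them as required is an assumption.
REQUIRED_KEYS = ("viewName", "inputData", "infraYaml", "vsYaml")


def find_dq_view_problems(dq_view):
    """Return a list of human-readable problems found in a dqView dict."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in dq_view]
    vs_yaml = dq_view.get("vsYaml")
    # Only the file extension is checked here, not whether the file exists.
    if vs_yaml is not None and PurePosixPath(vs_yaml).suffix not in (".yaml", ".yml"):
        problems.append(f"vsYaml does not point to a .yaml file: {vs_yaml}")
    return problems


dqView_simple = {"viewName": "foo",
                 "inputData": "foo",
                 "infraYaml": "foo",
                 "vsYaml": "yamlFiles/quick_tutorial/a_check/verificationSuite.yaml"}

print(find_dq_view_problems(dqView_simple))  # prints []
```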


### VerificationSuite .yaml pattern

#### AnomalyDetection
The .yaml file must follow this format:

```yaml
Check: {
    level: Error, 
    description: 'Description by Yaml'
}


ResultKey: { # [Optional]
    key_tags: {
        tag1: x,
        tag2: y,
        tag3: z
    } 
}

AnomalyDetection: [
    addAnomalyCheck: {
        analyzer: Maximum("i1c_lim_pre_ap_preventivo"),
        strategy: 'AbsoluteChangeStrategy(maxRateIncrease=25.0, maxRateDecrease= -25.0)'
    },
    addAnomalyCheck: {
        analyzer: Maximum("i1c_renda_final"),
        strategy: 'AbsoluteChangeStrategy(maxRateIncrease=100.0, maxRateDecrease= -50.0)'
    },
    addAnomalyCheck: {
        analyzer: Minimum("i1c_renda_final"),
        strategy: 'RelativeRateOfChangeStrategy(maxRateIncrease=1.5, maxRateDecrease= -1.5)'
    },
    addAnomalyCheck: {
        analyzer: Mean("i1c_renda_final"),
        strategy: 'RelativeRateOfChangeStrategy(maxRateIncrease=1.2, maxRateDecrease= -1.2)'
    }
]

```

* All anomaly detection results are tied to a __Check__ object, which defines a severity level and a short description (`pydeequ.checks.Check`).

* __ResultKey__, like Check, is a native PyDeequ object (`pydeequ.repository.ResultKey`). This parameter lets you attach tags to the verifications.

* __AnomalyDetection__ defines a list of `addAnomalyCheck` entries, each of which combines a specific strategy with a specific analyzer.
    * This is where the types of anomaly detection used in the DataQuality process are defined.
    * A list of the available anomaly detection strategies can be found in the official PyDeequ documentation (https://github.com/awslabs/python-deequ/blob/master/docs/anomaly_detection.md) and in the appendix of this document.
    * A list of the available analyzer classes can be found in the official PyDeequ documentation (https://github.com/awslabs/python-deequ/blob/master/docs/analyzers.md) and in the appendix of this document.

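Each `addAnomalyCheck` entry stores its analyzer and strategy as call-like strings. This document does not show how the process turns those strings into PyDeequ objects, so the following is only a sketch of one possible approach using the standard `ast` module. `parse_call` is a hypothetical helper, not part of PyDeequ or of this project.

```python
import ast


def parse_call(text):
    """Split a string such as 'Name(a, k=1.0)' into (name, args, kwargs)."""
    node = ast.parse(text.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError(f"not a simple call expression: {text!r}")
    # literal_eval accepts only literals, so non-literal arguments raise
    # ValueError instead of being executed.
    args = [ast.literal_eval(arg) for arg in node.args]
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return node.func.id, args, kwargs


print(parse_call('Maximum("i1c_renda_final")'))
# ('Maximum', ['i1c_renda_final'], {})
print(parse_call("AbsoluteChangeStrategy(maxRateIncrease=100.0, maxRateDecrease= -50.0)"))
# ('AbsoluteChangeStrategy', [], {'maxRateIncrease': 100.0, 'maxRateDecrease': -50.0})
```

The returned name could then be looked up among the classes listed in the appendix and called with the parsed arguments. Nothing from the YAML file is evaluated as code.
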
### Appendix:

#### Anomaly Detection 
From PyDeequ (https://github.com/awslabs/python-deequ/blob/master/docs/anomaly_detection.md)<br>
Here are the current supported functionalities of Anomaly Detection. 

| Class               | Method                                          | Status |
|---------------------|-------------------------------------------------|:------:|
| RelativeRateOfChangeStrategy | RelativeRateOfChangeStrategy(maxRateDecrease, maxRateIncrease, order) | Done |
| AbsoluteChangeStrategy  | AbsoluteChangeStrategy(maxRateDecrease, maxRateIncrease, order) | Done |
| SimpleThresholdStrategy | SimpleThresholdStrategy(lowerBound, upperBound) | Done |
| OnlineNormalStrategy | OnlineNormalStrategy(lowerDeviationFactor, upperDeviationFactor, ignoreStartPercentage, ignoreAnomalies) | Done |
| BatchNormalStrategy | BatchNormalStrategy(lowerDeviationFactor, upperDeviationFactor, includeInterval) | Done |
| MetricInterval(Enum) | ['Daily','Monthly'] | Done |
| SeriesSeasonality(Enum) | ['Weekly','Yearly'] | Done |
| HoltWinters | HoltWinters(metricsInterval, seasonality) | Done |

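To give an intuition for the two strategies used in the YAML example, here is a simplified pure-Python illustration of first-order change checks. It is not PyDeequ's implementation: it assumes `order=1`, assumes that `AbsoluteChangeStrategy` compares the difference between consecutive values and that `RelativeRateOfChangeStrategy` compares their ratio, and uses made-up metric values. The function names are hypothetical.

```python
def absolute_change_anomalies(values, maxRateIncrease=None, maxRateDecrease=None):
    """Indices whose difference from the previous value falls outside the bounds."""
    flagged = []
    for i in range(1, len(values)):
        change = values[i] - values[i - 1]
        too_high = maxRateIncrease is not None and change > maxRateIncrease
        too_low = maxRateDecrease is not None and change < maxRateDecrease
        if too_high or too_low:
            flagged.append(i)
    return flagged


def relative_rate_anomalies(values, maxRateIncrease=None, maxRateDecrease=None):
    """Indices whose ratio to the previous value falls outside the bounds."""
    flagged = []
    for i in range(1, len(values)):
        if values[i - 1] == 0:
            continue  # assumption of this sketch: skip undefined ratios
        ratio = values[i] / values[i - 1]
        too_high = maxRateIncrease is not None and ratio > maxRateIncrease
        too_low = maxRateDecrease is not None and ratio < maxRateDecrease
        if too_high or too_low:
            flagged.append(i)
    return flagged


# Made-up history of a metric such as Maximum("i1c_renda_final"), one value per run.
history = [1000.0, 1050.0, 1200.0, 1180.0, 1100.0]

print(absolute_change_anomalies(history, maxRateIncrease=100.0, maxRateDecrease=-50.0))
# [2, 4]
print(relative_rate_anomalies(history, maxRateIncrease=1.1, maxRateDecrease=0.95))
# [2, 4]
```

The parameter names mirror the ones in the table above rather than Python's usual snake_case. If this reading holds, the bounds of the relative strategy are ratios, so a lower bound below zero would never be reached by a metric that stays positive.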

#### Analyzers
From PyDeequ (https://github.com/awslabs/python-deequ/blob/master/docs/analyzers.md)<br>
Here are the current supported functionalities of Analyzers. 

| Class               | Method                                          | Status |
|---------------------|-------------------------------------------------|:------:|
| AnalysisRunner      | onData(DataFrame)                               | Done   |
| AnalysisRunBuilder  | addAnalyzer(analyzer)                           | Done   |
|                     | run()                                           | Done   |
|                     | useRepository(repository)                       | Done   |
|                     | saveOrAppendResult(resultKey)                   | Done   |
| ApproxCountDistinct | ApproxCountDistinct(column)                     | Done   |
| ApproxQuantile      | ApproxQuantile(column, quantile, relativeError) | Done       |
| ApproxQuantiles     | ApproxQuantiles(column, quantiles)           |  Done      |
| Completeness          | Completeness(column)          |      Done     |
| Compliance | Compliance(instance, predicate) | Done|
| Correlation | Correlation(column1, column2) | Done| 
| CountDistinct | CountDistinct(columns) | Done| 
| Datatype | Datatype(column) | Done| 
| Distinctness | Distinctness(columns) | Done| 
| Entropy | Entropy(column) | Done| 
| Histogram | Histogram(column, binningUdf, maxDetailBins) | Done|
| KLLParameters | KLLParameters(spark_session, sketchSize, shrinkingFactor, numberOfBuckets) | Done|
| KLLSketch | KLLSketch(column, kllParameters) | Done | 
| Histogram_maxBins | Histogram_maxBins(column, binningUdf, maxDetailBins) | Done | 
| Maximum | Maximum(column) | Done| 
| MaxLength | MaxLength(column) | Done| 
| Mean | Mean(column) | Done| 
| Minimum | Minimum(column) | Done| 
| MinLength | MinLength(column) | Done| 
| MutualInformation | MutualInformation(columns) | Done| 
| StandardDeviation | StandardDeviation(column) | Done| 
| Sum | Sum(column) | Done| 
| Uniqueness | Uniqueness(columns) | Done| 
| UniqueValueRatio | UniqueValueRatio(columns) | Done|
| AnalyzerContext | successMetricsAsDataFrame(spark_session, analyzerContext) | Done |
|   | successMetricsAsJson(spark_session, analyzerContext) | Done |