# Validation notebook

This notebook is executed by Databricks Workflows, as defined in `resources/notebook_validation_job.yml`. It checks that the summary table contains valid results.

## DataFrame assert
Compare the results from the test data set against expected values generated with simpler logic. This approach is more dynamic, because the expected values are recomputed from the source data, but it puts more logic into the test.

In [None]:
from pyspark.testing.utils import assertDataFrameEqual

# Counts taken from the table under test.
result_counts = spark.sql("""
        SELECT count(distinct artist_name) artists, count(1) rows
        FROM sandbox.assetbundle_tutorial_dev.top_artists_by_year_copy
        """)

# Expected counts recomputed from the source table with simpler logic.
# The CTE needs no ORDER BY, since ordering does not affect the counts.
expected_counts = spark.sql("""
        WITH source_agg AS (
            SELECT artist_name, total_number_of_songs, year
            FROM sandbox.tutorial.top_artists_by_year
            WHERE year >= 1990
        )
        SELECT count(distinct artist_name) artists, count(1) rows
        FROM source_agg
        """)

assertDataFrameEqual(result_counts, expected_counts)

In [None]:
result_counts.show()

## Simple assert
Use this option if the counts stay consistent in the test environment.

In [None]:
from pyspark.sql import Row

result = spark.sql("""
        SELECT count(1) rows
        FROM main.datakickstart_dev.trip_summary
        """).first()

# Option 1
assert result.rows == 11921

# Option 2
expected_counts = Row(rows=11921)
assert result == expected_counts
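
A bare `assert` reports only that the comparison failed. The sketch below wraps the count check in a helper that puts the table name and both values into the failure message, which may make a failed workflow run easier to read. The helper name `assert_row_count` and its parameters are hypothetical and not part of the original notebook. The actual count is passed in as a plain integer so the example runs without Spark.

```python
def assert_row_count(actual: int, expected: int, table: str) -> None:
    """Raise AssertionError with a descriptive message when counts differ.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    if actual != expected:
        raise AssertionError(
            f"{table}: expected {expected} rows, found {actual}"
        )


# In the notebook, the first argument would be `result.rows`.
assert_row_count(11921, 11921, "main.datakickstart_dev.trip_summary")
```

On a mismatch, the message names the table and shows both counts instead of an empty `AssertionError`.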

In [None]:
print("No errors detected")
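
When the checks are plain `assert` statements, the run stops at the first failure. As a minimal sketch, assuming you would rather see every failed check in a single run, the checks can be collected and reported together. The names `run_checks` and `make_count_check` are hypothetical, and the counts are placeholder integers standing in for values that would come from `spark.sql(...)`.

```python
from typing import Callable, Dict, List


def run_checks(checks: Dict[str, Callable[[], None]]) -> List[str]:
    """Run every check and return one message per failed check."""
    failures = []
    for name, check in checks.items():
        try:
            check()
        except AssertionError as exc:
            failures.append(f"{name}: {exc}")
    return failures


def make_count_check(actual: int, expected: int) -> Callable[[], None]:
    """Build a check that asserts two counts are equal."""
    def check() -> None:
        assert actual == expected, f"expected {expected}, found {actual}"
    return check


# Placeholder counts; in the notebook these would be query results.
failures = run_checks({
    "artists": make_count_check(3, 3),
    "rows": make_count_check(11921, 11921),
})

# Raise once, after all checks have run, so the task still fails on any error.
if failures:
    raise AssertionError("\n".join(failures))
print("No errors detected")
```

Raising a single `AssertionError` at the end keeps the behavior of failing the run while listing every check that did not pass.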