The difficult part of working with PySpark is validating the code with proper test execution. This repo aids PySpark testing with the help of the chispa testing framework. It includes a basic example that can be expanded or modified to fit the needs of the business problem.
Code utilities to test PySpark code
A step-by-step way of executing the test scripts in Colab. Poetry is used as the packaging utility.
Configure Poetry for use in Colab
Provide the required details to configure Poetry
Use chispa for DataFrame equality checks in Spark. Two DataFrames can be compared directly, like this:
    expected_df = spark.createDataFrame(expected_data, ["name", "clean_name"])
    assert_df_equality(actual_df, expected_df, underline_cells=True)
The test run produces a summary of the test execution, along with details on any failed tests.
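As a rough illustration of that summary, the sketch below assumes pytest is the test runner. It writes a throwaway test file with one passing and one failing test (both invented for this example) and runs it with the `-ra` flag, which asks pytest to add a short summary line for every test that did not pass.

```python
import tempfile
import textwrap
from pathlib import Path

import pytest

# Throwaway tests used only to demonstrate the report; not part of this repo.
SAMPLE_TESTS = textwrap.dedent(
    """
    def test_passes():
        assert 1 + 1 == 2

    def test_fails():
        assert "jose" == "jo&&se"
    """
)


def run_sample_suite() -> int:
    """Run the sample tests in a temporary directory and return pytest's exit code."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        test_file = Path(tmp_dir) / "test_sample.py"
        test_file.write_text(SAMPLE_TESTS)
        # -ra: short summary for all non-passing tests.
        # -p no:cacheprovider: avoid writing a .pytest_cache directory.
        return int(pytest.main(["-ra", "-p", "no:cacheprovider", str(test_file)]))


exit_code = run_sample_suite()
```

A non-zero exit code signals that at least one test failed; the failure details appear in the printed report above the summary section.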