deepavasanthkumar/pysparktesting

A difficult part of working with PySpark is validating the code with proper test execution. This repo helps with testing PySpark code using the chispa testing framework. It includes a basic example that can be expanded or modified to fit the business problem at hand.

pyspark testing using colab and chispa testing framework

Code utilities to test pyspark code

A step-by-step way of executing test scripts in Colab. Poetry is used as the packaging utility.

configure poetry in colab

Configure Poetry for use in Colab.

Give the required details when prompted to configure Poetry.

Write the test scripts

Use chispa for DataFrame equality checks in Spark. It lets us compare two DataFrames directly.


Execute and evaluate the results

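A DataFrame comparison looks like this:

expected_df = spark.createDataFrame(expected_data, ["name", "clean_name"])
assert_df_equality(actual_df, expected_df, underline_cells=True)

Passing underline_cells=True underlines the mismatching cells in the error output.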

The run gives a summary of the test execution, along with details on any failed tests.
