Skip to content

Snowflake

Herminio Vazquez edited this page Feb 11, 2024 · 8 revisions

Worksheets

Working inside Snowflake, is the activity of running data quality checks using Snowflake Python Worksheets.

Requirements

  • stage under data schema in Snowflake
  • cuallee.whl the wheel distribution from PiPy index with the latest version of cuallee
  • wheel_loader.py a python script developed by the Snowflake Labs
  • anaconda dependencies added to your worksheet
  • warehouse where to run the python worksheet

Steps

  1. Make sure that you enabled and accept the Anaconda Python Packages terms and conditions under Admin > Billing & Terms
  2. Select a schema in your Snowflake instance and create a stage, it does not matter if it is internal or external. Let's call it DEMO_STAGE
  3. Proceed to the PiPy index and proceed to download the built distribution of cuallee. At the time of this writing the file available is: cuallee-0.8.5-py3-none-any.whl
  4. Upload your .whl file into the DEMO_STAGE either via the cli or through the UI
  5. Proceed to download the wheel_loader.py available here
  6. Upload your wheel_loader.py file into the DEMO_STAGE either via the cli or through the UI
  7. Create a new worksheet using the + sign in Snowflake Worksheets and select Python Worksheet
  8. In the top right corner of your worksheet, don't forget to select the warehouse to be used to execute this worksheet
  9. In the top left corner of your worksheet, select the database schema that contains the DEMO_STAGE
  10. Next to the schema selection, and the settings drop down menu, press on the packages drop-down menu
  11. 2 tabs will be available: Anaconda Packages and Stage Packages
  12. In the Anaconda Packages add the following library dependencies required by cuallee:
    1. colorama==0.4.6
    2. pandas==1.5.3
    3. pygments==2.15.1
    4. requests==2.31.0
    5. toolz==0.12.0
    6. snowflake-snowpark-python==1.11.1
  13. In the Stage Packages add the following library dependencies to use cuallee:
    1. @demo_stage/cuallee-0.8.5-py3-none-any.whl
    2. @demo_stage/wheel_loader.py
  14. After completing the package setup for both Anaconda and Stage, the added libraries should appear under the bottom of the drop-down inside the Installed Packages
  15. At this point you are ready to go! below a snippet to test the use of cuallee inside Snowflake
# cuallee 
# checks inside snowflake demo

import snowflake.snowpark as snowpark
import wheel_loader

def main(session: snowpark.Session): 
    
    # Your code goes here, inside the "main" handler.
    wheel_loader.load('cuallee-0.8.5-py3-none-any.whl')
    from cuallee import Check, CheckLevel, Control
    check = Check(CheckLevel.WARNING, "Custom", session=session)
    
    tableName = 'snowflake_sample_data.tpch_sf100.lineitem'
    dataframe = session.table(tableName)
    check.is_greater_than("L_QUANTITY", 2)
    check.is_legit("L_COMMENT")

    # Return value will appear in the Results tab.
    return Control.completeness(dataframe, session=session).union(check.validate(dataframe))

Notebooks

Requirements

  • pip install cuallee
  • pip install cuallee[snowpark] or pip install snowflake-snowpark-python
  • Set environment variables to start a session
    • SF_ACCOUNT obtained by clicking into the bottom left part of your snowflake account and selecting Copy account url
    • Then remove the https:// part and also the snowflakecomputing.com part of the URL
    • It should end up in something like this: SF_ACCOUNT=1234567.region-name.cloud
    • SF_USER your snowflake username
    • SF_PASSWORD your snowflake password
    • SF_ROLE your snowflake role i.e. ACCOUNTADMIN
    • SF_WAREHOUSE your designated warehouse for running data quality checks i.e. COMPUTE_WH
    • SF_DATABASE your database selection for running checks i.e. SNOWFLAKE_SAMPLE_DATA

Demo

Clone this wiki locally