# Snowpark for Python

This is a simple example of getting Snowpark up and running.<br>

To work with Snowflake/Snowpark from Python, install the following packages:

- snowflake-connector-python
- snowflake-snowpark-python
- snowflake-sqlalchemy

In [None]:
# imports
import pandas as pd
from snowflake.snowpark import Session
from snowflake.snowpark.functions import (  # example funcs to import & use
    col, when, lit, explode, split,
    replace, substring, charindex, array_agg, object_construct_keep_null,
)

Create a dictionary containing your Snowflake connection settings. I usually keep these in an environment file or config file that is not committed to version control, which keeps passwords out of the code.
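As a minimal sketch of keeping credentials out of the notebook, the settings could be read from environment variables instead. The helper name `load_snowflake_config` and the `SNOWFLAKE_*` variable names are assumptions made for illustration, not something Snowpark defines.

```python
import os


def load_snowflake_config(env=os.environ):
    """Build a connection dict from environment variables.

    The SNOWFLAKE_* variable names are a hypothetical naming convention.
    """
    keys = ["account", "user", "password", "role", "warehouse", "database", "schema"]
    config = {}
    missing = []
    for key in keys:
        var_name = f"SNOWFLAKE_{key.upper()}"
        value = env.get(var_name)
        if value:
            config[key] = value
        else:
            missing.append(var_name)
    # fail early with a clear message rather than at connection time
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return config
```

The returned dictionary has the same keys as a hand-written one, so it could be passed to `Session.builder.configs(...)` in its place.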

In [None]:
snowConn = {
    'account': '',
    'user': 'my_username',
    'password': 'my_password',
    'role': 'my_de_role',
    'warehouse': 'my_warehouse',
    'database': 'my_database',
    'schema': 'my_schema'
    #,'authenticator': 'externalbrowser' # This may be needed if you connect to Snowflake UI via SSO
}

try:
    # this will create the `snowpark` object that refers to your remote snowflake compute
    snowpark = Session.builder.configs(snowConn).create()
    print("Snowpark is now available!")
except Exception as e:
    print(e) 

As a really basic example, assume you have a local pandas data frame, read from a CSV file. You can load this pandas df into a Snowpark df and then write it to Snowflake!

In [None]:
# this reads a fake CSV file of data
file = "fake_data.csv"
local_data = pd.read_csv(file)

# now push the pandas DF to a snowpark DF
snow_df = snowpark.create_dataframe(local_data)

# maybe we want to add a new column, a simple literal string for instance
# this could be "source file", so we know where the original data came from
snow_df_new = snow_df.with_column('SOURCE_FILE', lit(str(file)))

We can also create views of these dataframes. A temp view lives only for the current session (until the Snowpark session terminates), and it allows Snowflake SQL to be run against the data frame.

In [None]:
# create temp view - call it MY_DATA
snow_df_new.create_or_replace_temp_view("MY_DATA")

# now we can build another dataframe, that aggregates data, but this time, use SQL!
aggQuery = """
SELECT
    CLASS,
    COUNT(NAME) AS NO_OF_STUDENTS,
    SUM(SCORE_100) AS TOTAL_SCORE_CLASS
FROM MY_DATA
GROUP BY CLASS
ORDER BY CLASS
"""
agg_data = snowpark.sql(aggQuery)

We now have a dataframe that is an aggregate of our data!<br>
Maybe we want to save that to a table in Snowflake. To do this, we could use:

In [None]:
# write data to snowflake table
agg_data.write.mode("overwrite").save_as_table("PUBLIC.AGGREGATED_DATA_TEST")

Once that completes, you can query the table with Snowpark again! For example:

In [None]:
# display class A only
agg_class_a = snowpark.sql("SELECT * FROM PUBLIC.AGGREGATED_DATA_TEST WHERE CLASS = 'A'")

# we can use the to_pandas() method to bring the DF results back to local memory & display via pandas
agg_class_a.to_pandas()

Let's now close the session.

In [None]:
# close snowpark session 
snowpark.close()

You can do far more complex workloads here, and the syntax is very close to PySpark (the Python API for Spark). That is super helpful, as PySpark currently has a better-established online community.

Docs can be found at: https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/latest/index 