## Brief History of Snowpark APIs

1. PySpark DataFrames were the inspiration for the Snowpark DataFrame API.

2. Pandas inspired both the Pandas API on Spark and the pandas on Snowflake API.

3. MLlib, the machine learning library in PySpark, inspired Snowpark ML.

4. One critical difference between PySpark and Snowpark is the naming convention:

    PySpark uses camelCase (e.g. `groupBy`), while Snowpark uses snake_case (e.g. `group_by`).
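To make the naming difference concrete, here is a minimal sketch of a helper that rewrites a camelCase method name into snake_case. The function `camel_to_snake` is hypothetical (it is not part of PySpark or Snowpark), and a converted name is not guaranteed to exist in Snowpark, so treat it only as an aid when reading or porting code.

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert a camelCase identifier to snake_case (illustrative helper only)."""
    # Insert an underscore before any capital letter that follows a
    # lowercase letter or digit, then lowercase the whole name.
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


# Names that are already lowercase pass through unchanged.
for pyspark_name in ("groupBy", "withColumn", "select"):
    print(pyspark_name, "->", camel_to_snake(pyspark_name))
```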

    

## Snowpark Architecture

Behind the scenes, Snowpark translates operations on Snowpark DataFrames into SQL queries and executes them in Snowflake, where they are processed in parallel. It also applies some internal optimizations that help reduce the overall cost.

![Snowflake architecture](https://miro.medium.com/v2/resize:fit:1100/format:webp/0*HAwmY9pOb5GSAIAh)
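The idea of turning DataFrame operations into SQL can be illustrated with a toy model. The sketch below is not Snowpark's implementation: `LazyFrame` and its methods are hypothetical names invented for this example. It only shows the general pattern, in which each method call records a step and SQL text is produced when it is finally requested.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class LazyFrame:
    """Toy stand-in for a lazily evaluated DataFrame (not the Snowpark class)."""

    table: str
    predicates: Tuple[str, ...] = ()
    group_cols: Tuple[str, ...] = ()
    order_cols: Tuple[str, ...] = ()

    def filter(self, predicate: str) -> "LazyFrame":
        # Record the predicate; nothing is executed yet.
        return replace(self, predicates=self.predicates + (predicate,))

    def group_by_count(self, *cols: str) -> "LazyFrame":
        # Record the grouping columns for a COUNT(*) aggregation.
        return replace(self, group_cols=cols)

    def sort(self, *cols: str) -> "LazyFrame":
        # Record the ordering columns.
        return replace(self, order_cols=cols)

    def to_sql(self) -> str:
        # Assemble the recorded steps into a single SQL statement.
        if self.group_cols:
            select_list = ", ".join(self.group_cols + ("COUNT(*) AS tot_customers",))
        else:
            select_list = "*"
        parts = [f"SELECT {select_list}", f"FROM {self.table}"]
        if self.predicates:
            parts.append("WHERE " + " AND ".join(self.predicates))
        if self.group_cols:
            parts.append("GROUP BY " + ", ".join(self.group_cols))
        if self.order_cols:
            parts.append("ORDER BY " + ", ".join(self.order_cols))
        return "\n".join(parts)


base = LazyFrame("snowflake_sample_data.tpch_sf1.customer")
frame = base.filter("c_acctbal > 0").group_by_count("c_nationkey").sort("c_nationkey")
generated_sql = frame.to_sql()
print(generated_sql)
```

Because every method returns a new frozen object, `base` is left unchanged and can be reused to build other queries, which is the same style of chaining that real DataFrame APIs encourage.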

In [None]:
SELECT
    n_name AS country,
    COUNT(c.*) AS tot_customers
FROM snowflake_sample_data.tpch_sf1.nation n
JOIN snowflake_sample_data.tpch_sf1.customer c
    ON n_nationkey = c_nationkey
WHERE
    n_name IN ('CANADA', 'BRAZIL', 'CHINA')
GROUP BY n_name
ORDER BY n_name
    

We use the `snowflake.snowpark` library and its `get_active_session` function to connect to Snowflake.

**Note:** If you are using **Streamlit** or **Snowflake Notebooks**, Snowflake automatically creates a new **session**, so you do not need to specify one explicitly.

For more information, refer to the [Session documentation](https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/latest/snowpark/api/snowflake.snowpark.Session#snowflake.snowpark.Session).


For the purposes of this lecture, we will get a session explicitly.
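Outside Snowflake there is no active session to pick up, so one has to be built from connection parameters. The following is a minimal sketch, assuming the credentials live in environment variables; the variable names (`SNOWFLAKE_ACCOUNT` and so on) and both helper functions are hypothetical choices made for this example. Only `Session.builder.configs(...).create()` is the Snowpark call.

```python
import os
from typing import Dict, Mapping

# Hypothetical environment variable names, chosen for this sketch only.
ENV_TO_PARAM = {
    "SNOWFLAKE_ACCOUNT": "account",
    "SNOWFLAKE_USER": "user",
    "SNOWFLAKE_PASSWORD": "password",
    "SNOWFLAKE_ROLE": "role",
    "SNOWFLAKE_WAREHOUSE": "warehouse",
    "SNOWFLAKE_DATABASE": "database",
}
REQUIRED_PARAMS = ("account", "user", "password")


def connection_parameters(env: Mapping[str, str]) -> Dict[str, str]:
    """Collect the connection parameters that are set (and non-empty) in env."""
    params = {param: env[name] for name, param in ENV_TO_PARAM.items() if env.get(name)}
    missing = [param for param in REQUIRED_PARAMS if param not in params]
    if missing:
        raise ValueError(f"Missing connection parameters: {', '.join(missing)}")
    return params


def create_session(env: Mapping[str, str] = os.environ):
    """Build a Snowpark session from environment variables."""
    # Imported lazily so connection_parameters() works without Snowpark installed.
    from snowflake.snowpark import Session

    return Session.builder.configs(connection_parameters(env)).create()
```

Keeping credentials out of the notebook avoids committing them by accident; calling `create_session()` then returns a session that can be used in the same way as the one returned by `get_active_session()`.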

In [None]:
#Connecting to Snowflake:
import pandas as pd
from snowflake.snowpark.context import get_active_session

#Getting active_session:
Session = get_active_session()


# For an external (outside Snowflake) Python connection:

# import logging
# import snowflake.connector

# try:
#     conn = snowflake.connector.connect(
#         user='<USERNAME>',
#         password='<PWD>',
#         account='<ACCOUNT_IDENTIFIER>',
#         role='<ACCOUNT_ROLE_NAME>',
#         database='<DATABASE_NAME>'
#     )
#     cur = conn.cursor()
# except Exception as e:
#     logging.error(f"Failed to connect to Snowflake: {e}")
#     raise


In [None]:
#Executing SQL Queries:
import streamlit as st

query = """
SELECT
    n_name AS country,
    COUNT(c.*) AS tot_customers
FROM snowflake_sample_data.tpch_sf1.nation n
JOIN snowflake_sample_data.tpch_sf1.customer c
    ON n_nationkey = c_nationkey
WHERE
    n_name IN ('CANADA', 'BRAZIL', 'CHINA')
GROUP BY n_name
ORDER BY n_name
    
"""
# Using the Snowflake connector:
# res = cur.execute(query)

# Using pandas (pass the connection, not the cursor):
# df = pd.read_sql(query, conn)


#Using Snowpark:
df = Session.sql(query)



st.dataframe(df)

In [None]:
# Start recording query history (queries run from here on are captured):
qh =  Session.query_history()


In [None]:
# Filtering on DataFrames (show() triggers execution, so the query is recorded):
df[df['COUNTRY'].isin('CANADA','BRAZIL','CHINA')].show()

In [None]:
# Checking Generated Queries:
qh.queries