# DataFrame Selections and Actions

Course Outline:

1. Outline Snowpark Architecture

    1. Lazy Evaluation

    2. Key Objects: Snowpark DataFrames

2. Enhance Performance in Snowpark Applications: Synchronous versus Asynchronous Calls

3. Apply operations for filtering and transforming data:

    1. Columns

    2. Data type casting

    3. Rows and data extraction from Row objects
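
Lazy evaluation means a Snowpark DataFrame describes a query; it is not a copy of the data. Transformations such as `select()`, `filter()` or `with_column()` only build up the SQL, and nothing runs in Snowflake until an action such as `collect()`, `show()` or `count()` is called. The pure-Python sketch below imitates that behavior with a hypothetical `LazyFrame` class, which is not part of Snowpark's API. Each transformation returns a new frame that records one step, and only `collect()` runs the recorded steps. Here they run against an in-memory list instead of Snowflake.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, int]
Step = Callable[[List[Record]], List[Record]]


class LazyFrame:
    """Hypothetical stand-in for a lazily evaluated DataFrame (not Snowpark's API)."""

    def __init__(self, source: List[Record], plan: Optional[List[Step]] = None):
        self._source = source
        self._plan: List[Step] = plan or []  # pending transformations, in order
        self.executions = 0  # how many times this frame's plan has actually run

    def select(self, *names: str) -> "LazyFrame":
        # Transformation: record a projection and return a NEW frame; nothing runs yet.
        def step(rows: List[Record]) -> List[Record]:
            return [{n: r[n] for n in names} for r in rows]
        return LazyFrame(self._source, self._plan + [step])

    def filter(self, predicate: Callable[[Record], bool]) -> "LazyFrame":
        # Transformation: record a row filter and return a NEW frame.
        def step(rows: List[Record]) -> List[Record]:
            return [r for r in rows if predicate(r)]
        return LazyFrame(self._source, self._plan + [step])

    def collect(self) -> List[Record]:
        # Action: only here is the recorded plan executed against the data.
        self.executions += 1
        rows = self._source
        for step in self._plan:
            rows = step(rows)
        return rows


data = [{"A": 1, "B": 2, "C": 3, "D": 4}, {"A": 5, "B": 6, "C": 7, "D": 8}]
base = LazyFrame(data)
query = base.filter(lambda r: r["A"] > 1).select("C")
print(query.executions)  # 0: building the query ran nothing
result = query.collect()
print(result)  # [{'C': 7}]
```

Because every transformation returns a new object, the sketch also mirrors how Snowpark methods such as `drop()` or `with_column()` leave the original DataFrame unchanged.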


For more information, see the following links:
1. [Working with DataFrames in Snowpark Python](https://docs.snowflake.com/en/developer-guide/snowpark/python/working-with-dataframes#specifying-columns-and-expressions)

2. [snowflake.snowpark.DataFrame.select](https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/latest/snowpark/api/snowflake.snowpark.DataFrame.select)

In [None]:
# Get the active Snowpark session (available when running inside a Snowflake notebook):

from snowflake.snowpark.context import get_active_session

session = get_active_session()

In [None]:
# Create a sample DataFrame; without a schema the columns get default names (_1, _2, ...):
df = session.create_dataframe([[1, 2, 3, 4], [5, 6, 7, 8]])
df

In [None]:
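# DataFrame with column headers, passed through the schema argument:

df = session.create_dataframe(
    [[1, 2, 3, 4], [5, 6, 7, 8]],
    schema=["a", "b", "c", "d"]
)
df

# Equivalent: name the columns afterwards with to_df()
df = session.create_dataframe(
    [[1, 2, 3, 4], [5, 6, 7, 8]]).to_df("a", "b", "c", "d")

df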


In [None]:
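# Inspect the DataFrame's metadata and summary statistics.
# print() is used because a notebook cell only displays its last expression.
print(df.schema)
print(df.columns)
print(df.queries)
res = df.describe()  # a new DataFrame of summary statistics
res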

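1. `df.schema`: returns the schema as a `StructType`, i.e. a list of `StructField` entries that give each column's name, data type, and nullability.

2. `df.columns`: returns the column names as a list of strings. Unquoted names are stored in uppercase, so here it returns `['A', 'B', 'C', 'D']`.

3. `df.queries`: returns a dict whose `"queries"` key lists the SQL statements Snowpark will run to evaluate this DataFrame, and whose `"post_actions"` key lists any cleanup statements. It is not a history of queries already executed in the notebook.

4. `df.describe()`: returns a new DataFrame of summary statistics (count, mean, standard deviation, min, and max) for the numeric and string columns.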
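As a rough mental model of what `describe()` reports, the hypothetical helper `describe_like` below computes the same five statistics in plain Python for the sample data `[[1, 2, 3, 4], [5, 6, 7, 8]]`. It assumes the sample standard deviation (dividing by n - 1), which is what Snowflake's `STDDEV` computes. It does not reproduce the exact layout or number formatting of the Snowpark result.

```python
import statistics
from typing import Dict, List


def describe_like(columns: List[str], rows: List[List[float]]) -> Dict[str, Dict[str, float]]:
    """Hypothetical helper: the five statistics df.describe() reports, per column."""
    summary: Dict[str, Dict[str, float]] = {}
    for i, name in enumerate(columns):
        values = [row[i] for row in rows]
        summary[name] = {
            "count": len(values),
            "mean": statistics.mean(values),
            "stddev": statistics.stdev(values),  # sample standard deviation (divides by n - 1)
            "min": min(values),
            "max": max(values),
        }
    return summary


stats = describe_like(["A", "B", "C", "D"], [[1, 2, 3, 4], [5, 6, 7, 8]])
for name, values in stats.items():
    print(name, values)
```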

In [None]:
# Action: execute the query and print the first rows (10 by default) as a text table
df.show()

In [None]:
# Action: execute the query and return all rows as a list of Row objects
df.collect()

In [None]:
# Convert the first Row into a dict mapping column names to values
df.collect()[0].as_dict()

In [None]:
print(f"Total rows: {df.count()}")  # count() is an action that runs a COUNT query
print(f"Occurrences of 1 in the first row: {df.collect()[0].count(1)}")  # Row.count is tuple.count
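
A Snowpark `Row` is a subclass of Python's `tuple`. That is why `count(1)` works in the cell above: it is the inherited `tuple.count` method, and it counts how many fields equal `1`. The sketch below uses a standard-library `namedtuple` purely as an analogy, not Snowpark's `Row` class. A list of hand-built rows stands in for `df.collect()`, and `_asdict()` plays the role of `Row.as_dict()`.

```python
from collections import namedtuple

# Analogy only: a namedtuple, like Snowpark's Row, is a tuple with named fields.
Row = namedtuple("Row", ["A", "B", "C", "D"])

rows = [Row(1, 2, 3, 4), Row(5, 6, 7, 8)]  # stand-in for df.collect()

first = rows[0]
print(f"Total rows: {len(rows)}")                               # like df.count()
print(f"Occurrences of 1 in the first row: {first.count(1)}")   # plain tuple.count
print(dict(first._asdict()))                                    # like first.as_dict()
print(first.C, first[2])                                        # by name or by position
```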

In [None]:
# Select specific columns: five equivalent ways to reference column c
from snowflake.snowpark.functions import col

df.select("c", col("c"), df["c"], df.c, df.col("c"))


In [None]:
# Column aliases: alias(), as_() and name() are equivalent
df2 = df.select("c", col("c").alias("c2"), df["c"].as_("c3"), df.c.name("c4"))
print(df2.queries["queries"][0])  # the generated SQL, including the AS clauses
df2
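
The exact SQL printed from `df2.queries` is generated by Snowpark and varies by library version, so it is not reproduced here. The hypothetical `build_select` function below sketches only the underlying idea: the three aliasing styles all produce the same `AS` clause, and unquoted names are uppercased and double-quoted. `my_table` is a placeholder, and real Snowpark output differs in detail.

```python
from typing import List, Optional, Tuple


def quote_ident(name: str) -> str:
    """Hypothetical helper: uppercase an unquoted identifier and wrap it in double quotes."""
    return '"' + name.upper() + '"'


def build_select(items: List[Tuple[str, Optional[str]]], source: str) -> str:
    """Hypothetical sketch: turn (column, alias) pairs into a SELECT statement."""
    parts = []
    for column, alias in items:
        expr = quote_ident(column)
        if alias is not None:
            # alias(), as_() and name() all describe the same thing: an AS clause
            expr += " AS " + quote_ident(alias)
        parts.append(expr)
    return "SELECT " + ", ".join(parts) + " FROM " + source


sql = build_select([("c", None), ("c", "c2"), ("c", "c3"), ("c", "c4")], "my_table")
print(sql)  # SELECT "C", "C" AS "C2", "C" AS "C3", "C" AS "C4" FROM my_table
```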

In [None]:
# Cast column c to a string type; astype() is an alias of cast()
from snowflake.snowpark.types import StringType

df.select(
    df.c.cast(StringType()).alias("c1"),
    df.c.astype(StringType()).alias("c2")
)

In [None]:
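# Using SQL expressions: select_expr() takes SQL strings directly
from snowflake.snowpark.functions import sql_expr

df2 = df.select_expr("a + 2", "cast(b as string)")
df2

# The same expressions through sql_expr(), this time with explicit column names
df2 = df.select(
    sql_expr("a + 2").as_("ex1"),
    sql_expr("cast(b as string)").as_("ex2")
)
df2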

In [None]:
# Drop columns b and d; this returns a new DataFrame and leaves df unchanged
df.drop("b","d")

In [None]:
# Rename columns with a dict mapping old names (str or Column) to new names
df.rename({col("a"): "a2", "b": "b2"})

In [None]:
# Add new columns and rename an existing one
from snowflake.snowpark.functions import lit

(
    df.with_column("e", df.a + df.b)                       # e = a + b
    .with_columns(["f", "g"], [lit("11"), lit("12")])      # two literal string columns
    .with_column_renamed("a", "a2")                        # rename a to a2
)

In [None]:
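# Asynchronous job: collect_nowait() submits the query and returns an AsyncJob at once
job = df2.collect_nowait()
job.query  # the SQL text of the submitted query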

In [None]:
# result() blocks until the query finishes and returns a list of Row objects
rows = job.result()
rows

In [None]:
# to_df() builds a DataFrame from the job's result
result_df = job.to_df()
result_df

In [None]:
job.is_done()  # True once the query has finished
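
`collect_nowait()` submits the query and hands back an `AsyncJob` right away, so the notebook can do other work while Snowflake runs it. `is_done()` polls the status, and `result()` blocks until the rows arrive. The same non-blocking pattern can be sketched locally with Python's standard `concurrent.futures`: `submit()` stands in for `collect_nowait()`, `done()` for `is_done()`, and `result()` for `result()`. The `slow_query` function is hypothetical and only simulates a query with `time.sleep`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_query(seconds: float) -> list:
    """Hypothetical stand-in for a query that takes a while to run."""
    time.sleep(seconds)
    return [(1, 2), (3, 4)]


with ThreadPoolExecutor(max_workers=1) as pool:
    job = pool.submit(slow_query, 0.5)  # returns at once, like collect_nowait()
    done_at_start = job.done()         # normally False: the "query" is still sleeping
    # ... other notebook work could run here while the query executes ...
    rows = job.result()                # blocks until finished, like AsyncJob.result()
    done_at_end = job.done()           # True once the result is available

print(done_at_start, done_at_end, rows)
```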

In [None]:
# (Try to) cancel an AsyncJob: SYSTEM$WAIT(3) keeps the query running for about 3 seconds
job = session.sql("select SYSTEM$WAIT(3)").collect_nowait()
job.cancel()
job.is_done()

In [None]:
job.is_done()  # check again: the job may take a moment to report as done after cancel()
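
`cancel()` asks Snowflake to stop the query, but a cancel is a request rather than an instant state change. This is why the cells above check `is_done()` more than once. Python's `concurrent.futures` has a comparable rule that is easy to try locally: a job that has not started can be cancelled, but `Future.cancel()` returns `False` once the job is running. The sketch below demonstrates only that analogy, using a hypothetical `blocking_task`. Snowflake cancels the query on the server side, which this local example does not model.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

started = threading.Event()
release = threading.Event()


def blocking_task() -> str:
    """Hypothetical long-running task that waits until `release` is set."""
    started.set()   # signal that the task is now running
    release.wait()
    return "finished"


with ThreadPoolExecutor(max_workers=1) as pool:
    running = pool.submit(blocking_task)
    started.wait()                        # make sure the first task is really running
    queued = pool.submit(blocking_task)   # waits behind it for the single worker
    queued_cancelled = queued.cancel()    # True: it had not started yet
    running_cancelled = running.cancel()  # False: it is already running
    release.set()                         # let the running task complete
    running_result = running.result()

print(queued_cancelled, running_cancelled, running_result)  # True False finished
```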