# Agenda for Jupyter
<img src="https://jupyter.org/assets/main-logo.svg" style="float:right;height:240pt"/>

1. Configure Your Settings
1. Connect to Snowflake
1. Query Snowflake Data
1. Parametrizing Queries
1. Stat Plan Python Routines
1. Manipulating Pandas DataFrames


## Configure Your Settings

Navigate to **📁 / aeconf / aeuser.txt**

**Lines 18-22 are the same for everyone:**

```
## Snowflake
## ----------------
SFAUTHENTICATOR='https://verisk.okta.com'
SFACCOUNT='verisk.us-east-1.privatelink'
SFWAREHOUSE='IPO_ANALYTICS_WH'
```

**Replace the placeholder values below with your own:**

```
SFDATABASE='YOUR_GROUP_DATABASE'
SFSCHEMA='YOUR_SCHEMA_CREATED_EARLIER'
SFUSER='YOUR-i-NUMBER'
SFROLE='YOUR_ROLE'
```
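
The in-house package reads these settings later when it opens the connection. How it parses the file is not shown here, so the sketch below only illustrates the `KEY='VALUE'` format. `parse_settings` is a hypothetical helper, not part of any package used in this notebook.

```python
def parse_settings(text):
    """Parse KEY='VALUE' lines into a dict, skipping blanks and '#' comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines such as '## Snowflake'
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and the quotes around the value
        settings[key.strip()] = value.strip().strip("'\"")
    return settings


sample = """
## Snowflake
## ----------------
SFWAREHOUSE='IPO_ANALYTICS_WH'
SFDATABASE='YOUR_GROUP_DATABASE'
SFUSER='YOUR-i-NUMBER'
"""

settings = parse_settings(sample)
print(settings["SFWAREHOUSE"])
```

Comment lines are skipped, so the `## Snowflake` header never ends up as a setting.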

## Connect to Snowflake

In [None]:
# System packages imports
import os
import pandas as pd
from getpass import getpass

In [None]:
# Import our in-house package to connect to Snowflake

In [None]:
# Prompt for your password, since we did not save it in aeuser.txt
# Reference: https://docs.python.org/3.7/library/getpass.html
os.environ[__________] = getpass(prompt='Password: ', stream=None)

In [None]:
# This will read the aeuser.txt settings that we configured earlier,
# and use those settings to initialize a connection to the Snowflake database
# Reference: https://docs.snowflake.com/en/user-guide/python-connector.html
sf_conn = sf.get_connect()

In [None]:
# This enables sql_magic in your notebook.
# sql_magic is a Jupyter magic for writing SQL against relational databases.
# Query results are saved directly to a pandas DataFrame.
# Reference: https://github.com/pivotal-legacy/sql_magic
%reload_ext sql_magic
%config SQL.conn_name = __________ # we defined sf_conn in the cell above

## Query Snowflake Data

How many records are in our dataset?

In [None]:
%%read_sql
SELECT count(1) as cnt
FROM IPO_DEAL_DB.Q1_2020_TRAINING.BLDG_CONT_PREM_SUBSET

What's the aggregate premium amount and policy count per (constr_mtrl, st, inc_yr) grouping?

A unique policy is assumed to be one distinct combination of (tl_grp, co, prem_rec_id, inc_yr) values.
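
To make the "unique policy" idea concrete, here is a small pandas sketch on made-up records. The data values are invented, and the grouping is simplified to (st, inc_yr) to keep the table short. It is not the SQL for the exercise, only an illustration of counting distinct key combinations per group.

```python
import pandas as pd

# Toy records: all values are made up for illustration only.
records = pd.DataFrame({
    "st":          ["NY", "NY", "NY", "NJ"],
    "inc_yr":      ["2014", "2014", "2014", "2015"],
    "tl_grp":      ["A", "A", "B", "A"],
    "co":          ["001", "001", "001", "002"],
    "prem_rec_id": ["10", "10", "11", "12"],
    "prem_amt":    [100.0, 50.0, 200.0, 75.0],
})

policy_key = ["tl_grp", "co", "prem_rec_id", "inc_yr"]

# Rows 0 and 1 share the same key, so together they count as one policy.
unique_policies = records.drop_duplicates(subset=policy_key)
pol_cnt = unique_policies.groupby(["st", "inc_yr"]).size()

# Premium is summed over every record, duplicates of the key included.
prem_amt = records.groupby(["st", "inc_yr"])["prem_amt"].sum()

print(pol_cnt)
print(prem_amt)
```

Note that the policy count and the premium total are computed differently: the count uses de-duplicated keys, while the sum uses all rows.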

In [None]:
%%read_sql -d ___________
SELECT 
  constr_mtrl,
  st,
  inc_yr,
  ___________ AS prem_amt, -- aggregate premium amounts
  ___________ AS expo, -- aggregate exposure
  COUNT(___________) AS pol_cnt -- count of unique policies
FROM 
  ______________________________
WHERE
  (inc_yr::INTEGER BETWEEN ____ AND ____) AND -- filter to records with inception year between 2014 and 2018
  closeout_date ________ AND -- filter to closeout as of 20192
  bldg_cont_cov_id __________ AND -- filter to 'bldg' coverage records
  liab_prop_cov_id __________ AND -- filter to 'prop' and 'comb' coverage records
  terror_cov_cd ___________ AND -- exclude certain terror coverages '7' and '8'
  asgnd_sev ___________ -- only severity of 0 or 1 has data quality worth processing
GROUP BY 
  constr_mtrl, st, inc_yr
ORDER BY 
  constr_mtrl, st, inc_yr

In [None]:
df_input_selections  # contains results of our last query

### Quick Note on Magic Commands: `%` and `%%`

In [None]:
%%read_sql 
-- Description of my code
/*
Must use SQL-style comments because %%read_sql tells Jupyter to treat this whole cell as SQL

nothing is allowed above %%read_sql

and no Python is allowed in this cell
*/

In [None]:
print('This passes!')

%read_sql SELECT COUNT(1) FROM IPO_DEAL_DB.Q1_2020_TRAINING.BLDG_CONT_PREM_SUBSET;

## Parametrizing Queries: Using Python Variables in SQL
The previous query has its filter values typed directly into the SQL, so every new filter condition means editing the query by hand. We can do better by keeping those values in Python variables.
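
The `{name}` placeholders used in the cells of this section are filled in from Python variables of the same name. A rough plain-Python analogy is `str.format`. In the sketch below the table name is hypothetical and the years are example values.

```python
# Filter values live in Python variables instead of being typed into the SQL.
inc_start_yr = 2014
inc_end_yr = 2018
my_table_name = "example_policy_summary"  # hypothetical name, for illustration

query_template = (
    "SELECT COUNT(1) AS cnt FROM {my_table_name} "
    "WHERE inc_yr::INTEGER BETWEEN {inc_start_yr} AND {inc_end_yr}"
)

# str.format swaps each {name} for the matching value.
query = query_template.format(
    my_table_name=my_table_name,
    inc_start_yr=inc_start_yr,
    inc_end_yr=inc_end_yr,
)
print(query)
```

Because the substitution is plain text, changing a variable and re-running the cell is enough to change the query. Only substitute values you trust.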

In [None]:
# Let's define some variables in Python

In [None]:
%%read_sql
CREATE OR REPLACE TEMP TABLE ____________ AS (
    SELECT 
      inc_yr, st, constr_mtrl, 
      SUM(prem_amt) AS prem_amt,
      SUM(expo) AS expo,
      COUNT(DISTINCT(tl_grp||co||prem_rec_id||inc_yr)) AS pol_cnt
    FROM 
      __________________
    WHERE
      inc_yr::INTEGER BETWEEN ___________  AND ___________ AND
      closeout_date <= ____________ AND
      lower(bldg_cont_cov_id) = 'bldg' AND
      lower(liab_prop_cov_id) IN ('prop', 'comb') AND
      terror_cov_cd NOT IN ('7', '8') AND
      asgnd_sev IN ('0', '1')
    GROUP BY 
      constr_mtrl, st, inc_yr
    ORDER BY 
      constr_mtrl, st, inc_yr
);

In [None]:
%%read_sql

SELECT * FROM {my_table_name};

In [None]:
%read_sql SELECT COUNT(1) as cnt FROM {my_table_name};

## Stat Plan Python Routines

<a href="https://epm.verisk.com/confluence/pages/viewpage.action?pageId=177871408">Actuarial Routines Documentation</a>


### Quick Note on Dictionaries

In [None]:
my_dict = {}
print(type(my_dict)) # what type of Python object is 'my_dict'?
print(my_dict) # print the entire dictionary

In [None]:
# What are the dictionary keys?

In [None]:
# What are the dictionary values?

In [None]:
# Print each (key, value) item in the dictionary, line by line:
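
As a sketch of the three lookups above, here is a hypothetical dictionary (the keys and values are made up) with its keys, values, and items:

```python
# Hypothetical dictionary; keys and values are illustrative only.
example_dict = {"st": "NY", "inc_yr": 2014, "pol_cnt": 3}

keys = list(example_dict.keys())      # the names on the left of each colon
values = list(example_dict.values())  # the data on the right of each colon

# items() yields (key, value) pairs in insertion order.
for key, value in example_dict.items():
    print(f"{key}: {value}")
```

Wrapping `keys()` and `values()` in `list(...)` is optional; it just makes the result easy to print and index.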

### Adjust Exposures by Term

In [None]:
# Import the Python class for earnings from the statistical plan common library, spcommon
from spcommon.adj_expo_by_term import AdjExpoByTerm

print(AdjExpoByTerm.__doc__)  # show documentation for this routine
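
`__doc__` works on any Python class or function that has a docstring. The toy class below is hypothetical and unrelated to `spcommon`; it only shows where the printed text comes from.

```python
class ExampleRoutine:
    """Toy routine: doubles a number. Stands in for a documented class."""

    def run(self, x):
        """Return x multiplied by two."""
        return 2 * x


# __doc__ holds the docstring written directly under the class or function.
print(ExampleRoutine.__doc__)
print(ExampleRoutine.run.__doc__)
```

`help(ExampleRoutine)` shows the same docstrings together with the method signatures.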

In [None]:
input_param = {
    
}

In [None]:
input_param

In [None]:
# initialize the AdjExpoByTerm procedure

In [None]:
# get documentation on the AdjExpoByTerm procedure

In [None]:
# shows the SQL underlying the procedure, without actually executing it

In [None]:
# executes the SQL and saves the result as a variable in Python

In [None]:
# print the table name on Snowflake containing the results

In [None]:
%%read_sql output_2

SELECT 
  constr_mtrl,
  st,
  inc_yr,
  SUM(expo) as expo, -- aggregate exposure
  SUM(expo_adjusted) as expo_adjusted, -- aggregate adjusted exposure
  SUM(prem_amt) AS prem_amt, -- aggregate premium amounts
  COUNT(DISTINCT(tl_grp||co||prem_rec_id||inc_yr)) AS pol_cnt -- unique policies
FROM 
  _______________
WHERE
  (inc_yr::INTEGER BETWEEN {inc_start_yr} AND {inc_end_yr}) AND -- filter to records with inception year between 2014 and 2018
  closeout_date <= {closeout_date} AND -- filter to closeout as of 20192
  bldg_cont_cov_id = 'bldg' AND -- filter to 'bldg' coverage records
  liab_prop_cov_id IN ('prop', 'comb') AND -- filter to 'prop' and 'comb' coverage records
  terror_cov_cd NOT IN ('7', '8') AND -- exclude terror coverages '7' and '8'
  asgnd_sev IN ('0', '1') -- only severity of 0 or 1 has data quality worth processing
GROUP BY 
  constr_mtrl, st, inc_yr
ORDER BY 
  constr_mtrl, st, inc_yr;

## Manipulating Pandas DataFrames

In [None]:
# Let's get some basic info about our dataframe

In [None]:
# Let's do some standard calculations on all our numerical columns
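
For reference, a minimal sketch of the two inspection calls on a made-up DataFrame (the column names mirror the uppercase style of the query output, the values are invented):

```python
import pandas as pd

df = pd.DataFrame({
    "CONSTR_MTRL": ["1", "2", "1", "3"],
    "ST": ["NY", "NY", "NJ", "NJ"],
    "PREM_AMT": [100.0, 250.0, 75.0, 300.0],
    "POL_CNT": [2, 5, 1, 4],
})

df.info()                # column names, dtypes, and non-null counts
summary = df.describe()  # count, mean, std, min, quartiles, max
print(summary)
```

By default `describe()` only summarizes numeric columns, so the text columns `CONSTR_MTRL` and `ST` are left out of `summary`.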

In [None]:
# Let's select the first 3 columns of the first 13 rows

In [None]:
# Let's select records 11-20, and limit our output to just 2 columns: CONSTR_MTRL, EXPO_ADJUSTED

In [None]:
# Let's select records 26-30, and limit our output to all columns between CONSTR_MTRL and EXPO_ADJUSTED
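
The selections above come down to two indexers: `iloc` (by position) and `loc` (by label). A small sketch on invented data, with a much shorter frame than the real output:

```python
import pandas as pd

df = pd.DataFrame({
    "CONSTR_MTRL": ["1", "2", "3", "4", "5"],
    "ST": ["NY", "NJ", "CT", "PA", "MA"],
    "EXPO": [10.0, 20.0, 30.0, 40.0, 50.0],
    "EXPO_ADJUSTED": [9.0, 18.0, 27.0, 36.0, 45.0],
    "POL_CNT": [1, 2, 3, 4, 5],
})

# iloc is position-based: zero-indexed, and the end of a slice is excluded.
first_rows = df.iloc[:3, :2]  # first 3 rows, first 2 columns

# loc is label-based: both ends of a slice are included.
two_cols = df.loc[1:3, ["CONSTR_MTRL", "EXPO_ADJUSTED"]]
col_range = df.loc[1:3, "CONSTR_MTRL":"EXPO_ADJUSTED"]

print(first_rows)
print(two_cols)
print(col_range)
```

With the default integer index, `df.loc[1:3]` returns three rows (labels 1, 2, 3) while `df.iloc[1:3]` returns two (positions 1 and 2), which is worth keeping in mind when a task says "records 11-20".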

In [None]:
# Which records have a policy count greater than 2000?

In [None]:
# Display our data sorted by premium amount in ascending order
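
Filtering and sorting follow the same pattern on any DataFrame. A sketch with invented values:

```python
import pandas as pd

df = pd.DataFrame({
    "ST": ["NY", "NJ", "CT", "PA"],
    "PREM_AMT": [300.0, 100.0, 250.0, 50.0],
    "POL_CNT": [2500, 1200, 2100, 800],
})

# A boolean mask keeps only the rows where the condition is True.
large = df[df["POL_CNT"] > 2000]

# sort_values returns a new frame; ascending=True is the default.
by_premium = df.sort_values("PREM_AMT", ascending=True)

print(large)
print(by_premium)
```

Neither call modifies `df` itself; assign the result to a name if you want to keep it.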

In [None]:
# Pivot our data: 
# for each state and for each inception year, 
# what is the aggregate premium amount, adjusted exposure, and policy count?
pivot1 = pd.pivot_table(output_2,
                        ___________,
                        ___________,
                        ___________,
                        margins = True)
pivot1

In [None]:
# Pivot our data: 
# for each state and for each inception year, 
# what is the aggregate premium amount, adjusted exposure, and policy count per construction material type?
pivot2 = pd.pivot_table(output_2,
                        ___________,
                        ___________,
                        ___________,
                        ___________,
                        margins = True)
pivot2
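
For orientation, here is `pd.pivot_table` on a tiny made-up frame, with every argument passed by keyword. The column choices are illustrative, not the answers to the exercise above.

```python
import pandas as pd

df = pd.DataFrame({
    "ST": ["NY", "NY", "NJ", "NJ"],
    "INC_YR": ["2014", "2015", "2014", "2015"],
    "PREM_AMT": [100.0, 150.0, 200.0, 50.0],
})

example_pivot = pd.pivot_table(
    df,
    values="PREM_AMT",  # what to aggregate
    index="ST",         # one row per state
    columns="INC_YR",   # one column per inception year
    aggfunc="sum",      # the default would be 'mean'
    margins=True,       # adds an 'All' row and column of totals
)
print(example_pivot)
```

Since `aggfunc` defaults to the mean, it needs to be set explicitly whenever the question asks for totals.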

### Extracting Data to CSV
<a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html">Pandas Write Data Frame to CSV</a>

In [None]:
pivot2.to_csv('extract_pivot.csv', sep=',', na_rep='NA')
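
To see what `sep` and `na_rep` do without creating a file, the sketch below writes a made-up frame to an in-memory buffer. `index=False` is an extra choice made here to drop the row labels.

```python
import io

import pandas as pd

df = pd.DataFrame({"ST": ["NY", "NJ"], "PREM_AMT": [100.0, None]})

# Writing to a buffer instead of a path makes the output easy to inspect.
buffer = io.StringIO()
df.to_csv(buffer, sep=",", na_rep="NA", index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

The missing premium for NJ is written as the text `NA` rather than as an empty field.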

## Cleanup

In [None]:
sf.close_connect()  # closes the Snowflake connection

## References
- [Greenplum SQL vs Snowflake SQL](https://epm.verisk.com/confluence/display/ISUIAE/GPDB+SQL+vs+SFDB+SQL)
- [Actuarial Routines Documentation](https://epm.verisk.com/confluence/pages/viewpage.action?pageId=177871408)
- [Pandas Write Data Frame to CSV](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html)
- [Pandas DataFrame Documentation](https://pandas.pydata.org/pandas-docs/stable/reference/frame.html)