# Advanced-notebooks template (aka quick hits)
Author: DS McStatsy \
Version Date: [date]


#### Submission Process:
1. Create a branch in accelerators_dev/advanced_experimentation and build out your notebook.
2. Ensure the checklist below is satisfied before submitting.
3. Create a pull request to the accelerators_dev/advanced_experimentation folder.
4. Tag @cmccann11, @joao, and @y2ee201 for review.
5. The first review is a code review. After that, the docs team does a copy review.


#### Checklist for an accelerator - style
1. Ensure the authors and date are at the top of the notebook (or in the README), along with the DataRobot logo and the integration partner logo (e.g., AWS) if applicable.
2. Ensure you have or address:
    - A clear title that identifies major integrations if present (e.g., Huggingface, Kubeflow, AWS)
    - What is the problem?
    - Why is it a problem for data scientists/our users?
    - How can we solve it?
    - A few examples to support your statement, covering WHEN and WHY you would use this approach
    - What is the benefit of doing this?
    - A list of what readers will learn (the steps to achieve the HOW)
    - Each major section of code should map to an item on the list of what readers will learn.
3. For each header in the body, use the same wording as the corresponding numbered item at the top.
4. Ensure there is markdown explaining WHY each section of code is coming up.
5. Ensure comments in each code block indicate WHAT is happening. Assume your reader understands Python but is a beginner with our Python SDK and does not know our API well.

#### Checklist for code
1. If importing data, use a standard method from this folder or from the end-to-end notebooks.
2. Use `project.analyze_and_model`.
3. Use `dr.Client()` to authenticate with your API endpoint and token; we use this for tracking in Kibana.
4. Indicate your version of the `datarobot` package, and include a requirements.txt if using multiple packages.
5. The notebook should run without error.
6. Minimize hardcoded variables; the notebook should run as-is, but readers should also be able to adapt it easily to their own data.
7. Be aware of how other notebooks perform common tasks. We are early and will standardize, but we don't want 15 different ways of getting the top model from the leaderboard.
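
For item 7, one way to keep top-model selection consistent across notebooks is a single shared helper. The sketch below is a hypothetical `top_model` function, not part of the DataRobot SDK. It assumes each model object exposes a `metrics` dictionary keyed by metric name and then by partition, which is the shape expected from models returned by `project.get_models()`. The stand-in objects exist only so the example runs without a DataRobot connection.

```python
from types import SimpleNamespace


def top_model(models, metric, partition="validation", higher_is_better=True):
    """Return the model with the best score for `metric` on `partition`.

    Assumption: each model has a `metrics` dict shaped like
    {"AUC": {"validation": 0.81, "crossValidation": None, ...}}.
    Models without a score for the requested metric/partition are skipped.
    """
    scored = []
    for model in models:
        metric_scores = (model.metrics or {}).get(metric) or {}
        score = metric_scores.get(partition)
        if score is not None:
            scored.append((score, model))
    if not scored:
        raise ValueError(f"No model has a {partition} score for {metric}")
    pick = max if higher_is_better else min
    return pick(scored, key=lambda pair: pair[0])[1]


# Stand-in objects for illustration only; in a notebook you would pass
# the list of models from your project's leaderboard instead.
leaderboard = [
    SimpleNamespace(id="a", metrics={"AUC": {"validation": 0.81}}),
    SimpleNamespace(id="b", metrics={"AUC": {"validation": 0.86}}),
    SimpleNamespace(id="c", metrics={"AUC": {"validation": None}}),
]
best = top_model(leaderboard, metric="AUC")
print(best.id)
```

For error metrics such as RMSE, pass `higher_is_better=False` so the lowest score wins.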

Some good examples:  
- https://github.com/datarobot-community/ai-accelerators/blob/main/end-to-end/Databricks_End_To_End.ipynb
- https://github.com/datarobot-community/ai-accelerators/blob/main/end-to-end/GCP%20DataRobot%20End%20To%20End.ipynb

### ************** Notebook below ********************** 

### Clear title
what the notebook covers  
when you would use it (or not)  
why it matters  
how - To tackle X, we will do Y

### What you will learn
1. XYZ (eg load data)
2. XYZ (eg compute custom thing)
3. XYZ (eg evaluate different targets...)
N. XYZ

## 1. XYZ (eg load data) - example below from Brent
### Setup

#### Import libraries

The first cell of the notebook imports the necessary packages and sets up the connection to the DataRobot platform. You can also provide optional values to use an existing project and deployment; if they are omitted, a new Autopilot session is kicked off and a new deployment is created using DataRobot's recommended model.

In [None]:
import datarobot as dr
from io import StringIO
import pandas as pd
from py4j.java_gateway import java_import
from pyspark.sql import DataFrame
from pyspark.sql.functions import col
import requests
import time

api_key = "" # Get this from the Developer Tools page in the DataRobot UI
endpoint = "https://app.datarobot.com/" # This should be the URL you use to access the DataRobot UI

dr.Client(endpoint="%sapi/v2" % (endpoint), token=api_key)

# Set these to empty strings to create a new project and/or deployment
project_id = ""
deployment_id = ""
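
The string formatting in the cell above only yields a valid API URL when `endpoint` ends with a slash. A minimal sketch of a more forgiving approach is shown below; `build_api_endpoint` is a hypothetical helper name used for illustration, not part of the DataRobot SDK.

```python
def build_api_endpoint(base_url, api_path="api/v2"):
    """Join the UI base URL and the API path with exactly one slash."""
    return f"{base_url.rstrip('/')}/{api_path.lstrip('/')}"


# Both forms of the base URL produce the same API endpoint
with_slash = build_api_endpoint("https://app.datarobot.com/")
without_slash = build_api_endpoint("https://app.datarobot.com")
print(with_slash)
print(without_slash)
```

The result can then be passed as the `endpoint` argument to `dr.Client`, so readers who paste a URL without the trailing slash still get a working connection.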

#### Connect to DataRobot

In [None]:
dr.Client()
# The `config_path` should only be specified if the config file is not in the default location described in the API Quickstart guide
# dr.Client(config_path="path-to-drconfig.yaml")
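
As an alternative to a config file or a token pasted into the notebook, credentials can be read from environment variables. The sketch below uses a hypothetical helper, `resolve_credentials`; the variable names `DATAROBOT_API_TOKEN` and `DATAROBOT_ENDPOINT` and the default endpoint are assumptions, so adjust them to match your environment.

```python
import os


def resolve_credentials(env=None):
    """Return (endpoint, token) from a mapping of environment variables.

    Assumption: the token lives in DATAROBOT_API_TOKEN and the endpoint in
    DATAROBOT_ENDPOINT; the endpoint falls back to the public cloud URL.
    """
    env = os.environ if env is None else env
    token = env.get("DATAROBOT_API_TOKEN", "")
    endpoint = env.get("DATAROBOT_ENDPOINT", "https://app.datarobot.com/api/v2")
    if not token:
        raise RuntimeError("Set DATAROBOT_API_TOKEN before running this notebook")
    return endpoint, token


# Illustration with an explicit mapping; call resolve_credentials() with
# no arguments to read from the real environment.
example_env = {"DATAROBOT_API_TOKEN": "my-token"}
endpoint, token = resolve_credentials(example_env)
print(endpoint)

# The values can then be passed to the client:
# dr.Client(endpoint=endpoint, token=token)
```

Failing early with a clear message keeps the notebook from running further with an empty token.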

### Import data
Here you'll pull in some data to work with. If a data table is available, you can provide the input table name, destination table name, and target feature in this cell. If no training table is provided, the cell loads the sample wine-quality dataset provided by Databricks. This is also where any necessary data preparation would occur before sending the dataset to DataRobot. Note that DataRobot does not currently ingest Spark dataframes directly, so the dataframe needs to be converted to a pandas DataFrame prior to upload.

In [None]:
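# Leave these empty to fall back to the Databricks sample dataset
training_table = ""
scoring_table = ""
target = ""

# `spark`, `sql`, and `display` are provided by the Databricks notebook environment
if training_table == "":
    # No table provided: load the sample white wine quality CSV from DBFS
    scoring_table = "white_wine_scored"
    target = "quality"
    input_df = spark.read.option("header", True).option("delimiter", ";").csv("dbfs:/databricks-datasets/wine-quality/winequality-white.csv")
    # Replace spaces in column names with underscores
    input_df = input_df.select([col(column).alias(column.replace(" ", "_")) for column in input_df.columns])
else:
    # Read the full training table through Spark SQL
    input_df = sql("select * from %s" % (training_table))

# Convert the Spark dataframe to pandas before uploading to DataRobot
df = input_df.toPandas()
display(input_df)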
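
The column renaming in the cell above can also be factored into a small, reusable function that is easy to test outside Spark. The sketch below is a hypothetical helper, `sanitize_column_names`, that applies the same space-to-underscore rule and additionally guards against two columns collapsing to the same name; the sample column names are for illustration only.

```python
def sanitize_column_names(columns, replacement="_"):
    """Trim whitespace and replace spaces so names are safe to use as features."""
    cleaned = [name.strip().replace(" ", replacement) for name in columns]
    # Fail loudly if two different raw names map to the same cleaned name
    if len(set(cleaned)) != len(cleaned):
        raise ValueError("Sanitized column names are not unique")
    return cleaned


raw_columns = ["fixed acidity", "volatile acidity", "quality"]
clean_columns = sanitize_column_names(raw_columns)
print(clean_columns)
```

With a pandas DataFrame, the result can be assigned back with `df.columns = sanitize_column_names(df.columns)`.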

## 2. XYZ (eg compute custom thing)

In order to explore multiple target framings, we need to apply a transform on....etc

In [None]:
magic = df.apply(lambda x: x.super_magic)