# Tutorial
A. This tutorial will show you how to:
    1. Create jobs
    2. Run instances of a job
    3. Run a job on a schedule
    4. View data in SQL directly
B. Requirements
    1. Using the linear regression calculated in the Magics tutorial, develop Spark code in a Spark context that
        - takes the joined output table generated in the Magics tutorial (D3S_Training_%Name%_PerthMaxTemps)
        - filters the table to a particular day
        - adds a new column to the table named PerthEstimate_MaxTemp, which stores the result of the regression
        - calculates the day to filter on as
            - days = floor((current datetime minus a fixed recent start datetime, in minutes) / 5)
            - day = datetime(2014,1,1) + timedelta(days=days)
    2. Create a table definition to store the output of the code above in a SQL store
    3. Take the script developed above and create a job with D3S_Training_%Name%_PerthMaxTemps as the input table and the new table definition as the output
    4. Run a single instance of the job
    5. Run the job on a 5-minute schedule. This causes a new day's data to be written out every 5 minutes
    6. View the resulting data in the SQL data store
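
The day calculation in requirement 1 can be tried out in plain Python before any Spark code is written. The following is a minimal sketch, not part of the tutorial's job script: the function name `estimated_day` and its `interval_minutes` parameter are made up for illustration, while the two reference datetimes are the ones used in the cells later in this tutorial.

```python
import datetime
import math


def estimated_day(now, schedule_start, data_start, interval_minutes=5):
    """Map elapsed wall-clock time to a day in the data set.

    Every `interval_minutes` that pass after `schedule_start` advance the
    result by one calendar day from `data_start`.
    """
    elapsed_minutes = (now - schedule_start).total_seconds() / 60
    days = math.floor(elapsed_minutes / interval_minutes)
    return data_start + datetime.timedelta(days=days)


# Reference points taken from the cells later in this tutorial.
schedule_start = datetime.datetime(2019, 10, 29, 2, 45)
data_start = datetime.datetime(2014, 1, 1)

# 12 minutes after the start: two full 5-minute intervals have elapsed.
now = schedule_start + datetime.timedelta(minutes=12)
print(estimated_day(now, schedule_start, data_start).date())  # 2014-01-03
```

Passing `now` in as an argument, rather than reading the clock inside the function, keeps the calculation easy to check for specific times.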

## Load the required libraries

In [None]:
from neuro_python.neuro_compute import spark_manager as spm
from neuro_python.neuro_data import schema_manager as sm
from neuro_python.neuro_data import sql_commands as sc

## Start the default cluster and ensure it's running

In [None]:
spm.start_cluster()

In [None]:
spm.list_clusters()

## Create a context once the cluster is in the running state

In [None]:
spm.create_context('TrainingContext')

## 1. Build the job script and test it using magics
Change the data store name (DataLakeName) and the table name to match yours

In [None]:
%%spark_import_table
import_table('df_PerthMaxTemps','DataLakeName','D3S_Training_Lee_PerthMaxTemps')

In [None]:
%%spark_sql
select *, (Airport_MaxTemp * 0.668860401751 + Gosnells_MaxTemp * 0.120407996316 + 
           Swanbourne_MaxTemp * 0.145763508482 + Hillarys_MaxTemp * 0.0559960642181 + 0.14003979078607637) as PerthEstimate_MaxTemp
from df_PerthMaxTemps
where
    Year = 2014
    and Month = 1
    and Day = 1
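
To sanity-check a value returned by the query above, the same linear expression can be evaluated in plain Python. This is only a sketch: the helper name `perth_estimate` and the sample readings are hypothetical, and only the coefficients and intercept are copied from the query.

```python
# Coefficients and intercept copied from the query above.
COEFFICIENTS = {
    "Airport_MaxTemp": 0.668860401751,
    "Gosnells_MaxTemp": 0.120407996316,
    "Swanbourne_MaxTemp": 0.145763508482,
    "Hillarys_MaxTemp": 0.0559960642181,
}
INTERCEPT = 0.14003979078607637


def perth_estimate(row):
    """Return the regression estimate for one row given as a dict."""
    return sum(row[name] * coef for name, coef in COEFFICIENTS.items()) + INTERCEPT


# Hypothetical readings, not taken from the real table.
sample = {
    "Airport_MaxTemp": 30.0,
    "Gosnells_MaxTemp": 30.0,
    "Swanbourne_MaxTemp": 30.0,
    "Hillarys_MaxTemp": 30.0,
}
print(round(perth_estimate(sample), 4))  # 29.8709
```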

In [None]:
%%spark
year = 2014
month = 1
day = 1
df_PerthMaxTemps.createOrReplaceTempView('df_PerthMaxTemps')
df_Estimate = spark.sql("""
select *, (Airport_MaxTemp * 0.668860401751 + Gosnells_MaxTemp * 0.120407996316 + 
           Swanbourne_MaxTemp * 0.145763508482 + Hillarys_MaxTemp * 0.0559960642181 + 0.14003979078607637) as PerthEstimate_MaxTemp
from df_PerthMaxTemps
where
    Year = %s
    and Month = %s
    and Day = %s"""%(year, month, day))

In [None]:
%spark_pandas -df df_Estimate.limit(10)

In [None]:
%%spark
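import datetime
import math
# Current time in UTC (naive datetime, to match startDateTime below)
current_date=datetime.datetime.utcnow()
# Fixed reference time from which 5-minute intervals are counted
startDateTime = datetime.datetime(2019,10,29,2,45)
# First day of data to process
dataStartDateTime = datetime.datetime(2014,1,1)
# Number of whole 5-minute intervals elapsed since startDateTime
diff = (current_date - startDateTime).total_seconds()
iterations = math.floor(diff/(60*5))
# Each elapsed interval advances the processed day by one
estimatedDay = dataStartDateTime + datetime.timedelta(days=iterations)
year = estimatedDay.year
month = estimatedDay.month
day = estimatedDay.day
# Expose the DataFrame to Spark SQL under the name used in the query
df_PerthMaxTemps.createOrReplaceTempView('df_PerthMaxTemps')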
df_Estimate = spark.sql("""
select *, (Airport_MaxTemp * 0.668860401751 + Gosnells_MaxTemp * 0.120407996316 + 
           Swanbourne_MaxTemp * 0.145763508482 + Hillarys_MaxTemp * 0.0559960642181 + 0.14003979078607637) as PerthEstimate_MaxTemp
from df_PerthMaxTemps
where
    Year = %s
    and Month = %s
    and Day = %s"""%(year, month, day))

In [None]:
%spark_pandas -df df_Estimate.limit(10)

## 2. Create output table in a SQL store
Change the data store name to match yours

In [None]:
cols=[sm.column_definition('Year','Int'),
     sm.column_definition('Month','Int'),
     sm.column_definition('Day','Int'),
     sm.column_definition('Airport_MaxTemp','Double'),
     sm.column_definition('Gosnells_MaxTemp','Double'),
     sm.column_definition('Swanbourne_MaxTemp','Double'),
     sm.column_definition('Perth_MaxTemp','Double'),
     sm.column_definition('Hillarys_MaxTemp','Double'),
     sm.column_definition('PerthEstimate_MaxTemp','Double')]
table_def=sm.table_definition(cols,'Processed',file_type='delta')
sm.create_table('SqlStoreName','D3S_Training_Lee_PerthMaxTempsEstimate',table_def)

## 3. Submit the job

In [None]:
?spm.submit_job

In [None]:
pyscript = '''
import datetime
import math
current_date=datetime.datetime.utcnow()
startDateTime = datetime.datetime(2019,10,29,2,45)
dataStartDateTime = datetime.datetime(2014,1,1)
diff = (current_date - startDateTime).total_seconds()
iterations = math.floor(diff/(60*5))
estimatedDay = dataStartDateTime + datetime.timedelta(days=iterations)
year = estimatedDay.year
month = estimatedDay.month
day = estimatedDay.day
df_PerthMaxTemps.createOrReplaceTempView('df_PerthMaxTemps')
df_Estimate = spark.sql("""
select *, (Airport_MaxTemp * 0.668860401751 + Gosnells_MaxTemp * 0.120407996316 + 
           Swanbourne_MaxTemp * 0.145763508482 + Hillarys_MaxTemp * 0.0559960642181 + 0.14003979078607637) as PerthEstimate_MaxTemp
from df_PerthMaxTemps
where
    Year = %s
    and Month = %s
    and Day = %s"""%(year, month, day))
'''
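
Because the job script is passed around as a string, a typo in it would otherwise only surface when a run fails. One possible local check, sketched below, is to confirm the string at least parses as Python before submitting it. The helper `script_parses` and the two miniature scripts are hypothetical stand-ins; the check covers syntax only, so names that exist only on the cluster (such as `spark` or the imported table) are not validated.

```python
import ast


def script_parses(source):
    """Return True if `source` is syntactically valid Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


# Hypothetical miniature scripts standing in for the real job script.
good_script = '''
import math
iterations = math.floor(12 / 5)
'''
bad_script = '''
iterations = math.floor(12 / 5
'''

print(script_parses(good_script))  # True
print(script_parses(bad_script))   # False
```

With the real script, the call would be `script_parses(pyscript)`.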

In [None]:
itable=spm.import_table('df_PerthMaxTemps','DataLakeName','D3S_Training_Lee_PerthMaxTemps')

In [None]:
etable=spm.export_table('df_Estimate','SqlStoreName','D3S_Training_Lee_PerthMaxTempsEstimate')

In [None]:
job = spm.submit_job('D3S_Training_Lee_EstimatePerthMaxTemp',pyscript,import_tables=[itable],export_tables=[etable])

In [None]:
job

## See your job in the list of jobs and inspect the detail

In [None]:
spm.list_jobs()

In [None]:
spm.get_job_details(job['JobId'])

## 4. Run a single instance of the job

In [None]:
?spm.run_job

In [None]:
run=spm.run_job(job['JobId'],'Test')

#### View the status of the run

In [None]:
spm.list_runs(job['JobId'],run_id=run['RunId'])

## 5. Run the job on a schedule
The schedule is given as a crontab-style cron expression. https://crontab.guru/ can be used to build and check one.
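
As a rough illustration of how a standard five-field cron expression such as `*/5 * * * *` reads, the sketch below names the fields and expands the minute field. The helpers `describe_cron` and `minutes_matched` are hypothetical and handle only the `*` and `*/N` forms; this assumes the scheduler follows the usual crontab field order and is not a description of how the platform parses the expression.

```python
FIELD_NAMES = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression):
    """Split a five-field cron expression into named fields."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields, got %d" % len(fields))
    return dict(zip(FIELD_NAMES, fields))


def minutes_matched(minute_field):
    """Expand a minute field of the form '*' or '*/N' into matching minutes."""
    if minute_field == "*":
        return list(range(60))
    if minute_field.startswith("*/"):
        step = int(minute_field[2:])
        return list(range(0, 60, step))
    raise ValueError("only '*' and '*/N' are handled in this sketch")


fields = describe_cron("*/5 * * * *")
print(fields["minute"])                   # */5
print(minutes_matched(fields["minute"]))  # every fifth minute of the hour
```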

In [None]:
?spm.run_schedule

In [None]:
schedule=spm.run_schedule(job['JobId'],'TestSchedule','*/5 * * * *')

#### View the status of runs triggered by the schedule

In [None]:
spm.list_runs(job['JobId'],schedule_id=schedule['ScheduleId'])

## 6. View the results in the SQL table
Change the data store name to match yours

In [None]:
%%sql -sn SqlStoreName
select top 100 *
from D3S_Training_Lee_PerthMaxTempsEstimate