# ANOVOS - Datetime
This notebook lists the functions in the "datetime" module of the ANOVOS package and shows how to invoke each one.
- [Timestamp and Epoch Conversion](#Timestamp-and-Epoch-Conversion)
- [Timezone Conversion](#Timezone-Conversion)
- [Timestamp and String Conversion](#Timestamp-and-String-Conversion)
- [Dateformat Conversion](#Dateformat-Conversion)
- [Time Units Extraction](#Time-Units-Extraction)
- [Time Difference](#Time-Difference)
- [Time Elapsed](#Time-Elapsed)
- [Adding Time Units](#Adding-Time-Units)
- [Timestamp Comparison](#Timestamp-Comparison)
- [Aggregator](#Aggregator)
- [Window Aggregation](#Window-Aggregation)
- [Lagged Timeseries](#Lagged-Timeseries)
- [Start / End of Month / Year / Quarter](#Start-/-End-of-Month-/-Year-/-Quarter)
- [Binary features](#Binary-features)
    - Is start/end of month/year/quarter or not
    - Is first half of the year/selected hours/leap year/weekend or not

**Setting Spark Session**

In [1]:
import pandas as pd
import pyspark
import os
from pyspark.sql import functions as F
from pyspark.sql import types as T
from pyspark.sql.window import Window

In [2]:
#set run type variable
run_type = "local" # "local", "emr", "databricks", "ak8s"

In [4]:
#For run_type Azure Kubernetes, run the following block 
from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession

if run_type == "ak8s":
    fs_path="<insert conf spark.hadoop.fs master url here> ex: spark.hadoop.fs.azure.sas.<container>.<account_name>.blob.core.windows.net"
    auth_key="<insert value of sas_token here>"
    master_url="<insert kubernetes master url path here> ex: k8s://"
    docker_image="<insert name docker image here>"
    kubernetes_namespace ="<insert kubernetes namespace here>"

    # Create Spark config for our Kubernetes based cluster manager
    sparkConf = SparkConf()
    sparkConf.setMaster(master_url)
    sparkConf.setAppName("Anovos_pipeline")
    sparkConf.set("spark.submit.deployMode","client")
    sparkConf.set("spark.kubernetes.container.image", docker_image)
    sparkConf.set("spark.kubernetes.namespace", kubernetes_namespace)
    sparkConf.set("spark.executor.instances", "4")
    sparkConf.set("spark.executor.cores", "4")
    sparkConf.set("spark.executor.memory", "16g")
    sparkConf.set("spark.kubernetes.pyspark.pythonVersion", "3")
    sparkConf.set("spark.kubernetes.authenticate.driver.serviceAccountName", "spark")
    sparkConf.set(fs_path,auth_key)
    sparkConf.set("spark.kubernetes.authenticate.serviceAccountName", "spark")
    sparkConf.set("spark.jars.packages", "org.apache.hadoop:hadoop-azure:3.2.0,com.microsoft.azure:azure-storage:8.6.3,io.github.histogrammar:histogrammar_2.12:1.0.20,io.github.histogrammar:histogrammar-sparksql_2.12:1.0.20,org.apache.spark:spark-avro_2.12:3.2.1")

    # Initialize the Spark session; this is the step that actually
    # creates the worker nodes.
    spark = SparkSession.builder.config(conf=sparkConf).getOrCreate()

# For other run types, import the shared Spark session from anovos.shared.
else:
    from anovos.shared.spark import *
    auth_key = "NA"

sc = spark.sparkContext
sc.setLogLevel('ERROR')

# Check the timezone
print('Spark Timezone:', spark.conf.get("spark.sql.session.timeZone"))

Spark Timezone: GMT


### Read Input Data

In [5]:
from anovos.data_ingest.data_ingest import read_dataset
df = read_dataset(spark, file_path='../data/datetime_dataset/dataset2.csv', file_type="csv",
                  file_configs={"header": "True", "delimiter": "," , "inferSchema": "True"})
df.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy
0,1,2015-02-11 06:48:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1


In [6]:
df.printSchema()

root
 |-- id: integer (nullable = true)
 |-- time1: timestamp (nullable = true)
 |-- time2: string (nullable = true)
 |-- unix: integer (nullable = true)
 |-- Temperature: double (nullable = true)
 |-- Humidity: double (nullable = true)
 |-- Light: double (nullable = true)
 |-- CO2: double (nullable = true)
 |-- HumidityRatio: double (nullable = true)
 |-- Occupancy: integer (nullable = true)



# Timestamp and Epoch Conversion

## Timestamp to Unix
- API specification of function **timestamp_to_unix** can be found <a href="https://docs.anovos.ai/api/data_transformer/datetime.html">here</a>

In [7]:
from anovos.data_transformer.datetime import timestamp_to_unix

In [8]:
# Example 1: result in second + input column in local timezone + append the new column
odf = timestamp_to_unix(spark, df, 'time1', output_mode='append')
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time1_unix
0,1,2015-02-11 06:48:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,1423637280
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,1423637340
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,1423637400
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,1423637460
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,1423637460


In [9]:
# Example 2: result in millisecond + input column in local timezone + replace the original column
odf = timestamp_to_unix(spark, df, 'time1', precision="ms")
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy
0,1,1423637280000,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1
1,2,1423637340000,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1
2,3,1423637400000,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1
3,4,1423637460000,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1
4,5,1423637460000,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1


In [10]:
# Example 3: result in second + input column in utc + append the new column
odf = timestamp_to_unix(spark, df, 'time1', tz='utc', output_mode='append')
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time1_unix
0,1,2015-02-11 06:48:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,1423637280
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,1423637340
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,1423637400
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,1423637460
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,1423637460
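
As a cross-check outside Spark, the same epoch arithmetic can be sketched with Python's standard library. This only illustrates the conversion and is not how `timestamp_to_unix` is implemented; the helper name `to_unix` is hypothetical, and the timestamp is assumed to be in UTC because the session timezone printed earlier is GMT.

```python
from datetime import datetime, timezone

def to_unix(ts: datetime, precision: str = "s") -> int:
    """Hypothetical helper: epoch value of a timezone-aware datetime."""
    seconds = int(ts.timestamp())
    return seconds * 1000 if precision == "ms" else seconds

# First row of the sample data, interpreted as UTC (an assumption).
first = datetime(2015, 2, 11, 6, 48, 0, tzinfo=timezone.utc)
print(to_unix(first))        # epoch seconds
print(to_unix(first, "ms"))  # epoch milliseconds
```

The two printed values should agree with `time1_unix` in Example 1 and the replaced `time1` column in Example 2 for the first row.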


## Unix to Timestamp
- API specification of function **unix_to_timestamp** can be found <a href="https://docs.anovos.ai/api/data_transformer/datetime.html">here</a>

In [11]:
from anovos.data_transformer.datetime import unix_to_timestamp

In [12]:
# Example 1: input column in second & local timezone + append the new column
odf = unix_to_timestamp(spark, df, 'unix', output_mode='append')
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,unix_ts
0,1,2015-02-11 06:48:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,2015-02-11 06:48:00
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,2015-02-11 06:49:00
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,2015-02-11 06:50:00
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,2015-02-11 06:51:00
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,2015-02-11 06:51:00


In [13]:
# Example 2: input column in millisecond & local timezone + replace the original column
df2 = df.withColumn('unix_ms', F.col('unix')*F.lit(1000.0))
odf = unix_to_timestamp(spark, df2, 'unix_ms', precision="ms")
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,unix_ms
0,1,2015-02-11 06:48:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,2015-02-11 06:48:00
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,2015-02-11 06:49:00
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,2015-02-11 06:50:00
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,2015-02-11 06:51:00
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,2015-02-11 06:51:00


In [14]:
# Example 3: input column in second & UTC + append the new column
odf = unix_to_timestamp(spark, df, 'unix', tz='utc', output_mode='append')
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,unix_ts
0,1,2015-02-11 06:48:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,2015-02-11 06:48:00
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,2015-02-11 06:49:00
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,2015-02-11 06:50:00
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,2015-02-11 06:51:00
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,2015-02-11 06:51:00
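
The reverse direction can be sketched the same way in plain Python. The helper `from_unix` is hypothetical and only mirrors the idea of the `precision` argument (millisecond inputs are divided by 1000 before conversion); it is not the ANOVOS implementation.

```python
from datetime import datetime, timezone

def from_unix(value: float, precision: str = "s") -> datetime:
    """Hypothetical helper: UTC datetime for an epoch value in seconds or milliseconds."""
    seconds = value / 1000 if precision == "ms" else value
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

print(from_unix(1423637280))             # epoch seconds from the first row
print(from_unix(1423637280000.0, "ms"))  # the same instant in milliseconds
```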


# Timezone Conversion
- API specification of function **timezone_conversion** can be found <a href="https://docs.anovos.ai/api/data_transformer/datetime.html">here</a>

In [15]:
from anovos.data_transformer.datetime import timezone_conversion

In [16]:
# Example 1: local to UTC + append the new column
odf = timezone_conversion(spark, df, 'time1', given_tz='local', output_tz='UTC',output_mode='append')
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time1_tzconverted
0,1,2015-02-11 06:48:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,2015-02-11 06:48:00
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,2015-02-11 06:49:00
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,2015-02-11 06:50:00
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,2015-02-11 06:51:00
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,2015-02-11 06:51:00


In [17]:
# Example 2: UTC to local + replace the original column
odf = timezone_conversion(spark, df, 'time1', given_tz='UTC', output_tz='local')
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy
0,1,2015-02-11 06:48:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1
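
In both outputs above the converted column equals the input, which is consistent with the session timezone printed earlier being GMT. To see a conversion that actually shifts the clock time, here is a standard-library sketch using a fixed UTC+08:00 offset. The offset is an arbitrary choice for illustration, not something this notebook configures.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical target timezone: a fixed offset of +08:00.
utc_plus_8 = timezone(timedelta(hours=8))

ts_utc = datetime(2015, 2, 11, 6, 48, tzinfo=timezone.utc)
ts_shifted = ts_utc.astimezone(utc_plus_8)

# The wall-clock reading changes, but both objects denote the same instant.
print(ts_shifted.strftime("%Y-%m-%d %H:%M:%S"))
print(ts_shifted == ts_utc)
```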


# Timestamp and String Conversion

## String to Timestamp
- API specification of function **string_to_timestamp** can be found <a href="https://docs.anovos.ai/api/data_transformer/datetime.html">here</a>

In [18]:
from anovos.data_transformer.datetime import string_to_timestamp

In [19]:
# Example 1: output timestamp + append the new column
odf = string_to_timestamp(spark, df, 'time2', input_format="%d/%m/%y %H:%M", output_type="ts",output_mode="append")
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time2_ts
0,1,2015-02-11 06:48:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,2015-02-11 14:48:00
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,2015-02-11 14:49:00
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,2015-02-11 14:50:00
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,2015-02-11 14:51:00
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,2015-02-11 14:51:00


In [20]:
# Example 2: output date + replace the original column
odf = string_to_timestamp(spark, df, 'time2', input_format="%d/%m/%y %H:%M", output_type="dt")
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy
0,1,2015-02-11 06:48:00,2015-02-11,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1
1,2,2015-02-11 06:49:00,2015-02-11,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1
2,3,2015-02-11 06:50:00,2015-02-11,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1
3,4,2015-02-11 06:51:00,2015-02-11,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1
4,5,2015-02-11 06:51:00,2015-02-11,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1


## Timestamp to String
- API specification of function **timestamp_to_string** can be found <a href="https://docs.anovos.ai/api/data_transformer/datetime.html">here</a>

In [21]:
from anovos.data_transformer.datetime import timestamp_to_string

In [22]:
# Example 1: output format: %Y/%d/%m %H:%M:%S + append the new column
odf = timestamp_to_string(spark, df, 'time1', output_format="%Y/%d/%m %H:%M:%S",output_mode="append")
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time1_str
0,1,2015-02-11 06:48:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,2015/11/02 06:48:00
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,2015/11/02 06:49:00
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,2015/11/02 06:50:00
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,2015/11/02 06:51:00
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,2015/11/02 06:51:00


In [23]:
# Example 2: output format: %Y/%d/%m + replace the original column
odf = timestamp_to_string(spark, df, 'time1', output_format="%Y/%d/%m")
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy
0,1,2015/11/02,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1
1,2,2015/11/02,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1
2,3,2015/11/02,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1
3,4,2015/11/02,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1
4,5,2015/11/02,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1


In [24]:
# Example 3: output format: %Y + replace the original column
odf = timestamp_to_string(spark, df, 'time1', output_format="%Y")
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy
0,1,2015,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1
1,2,2015,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1
2,3,2015,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1
3,4,2015,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1
4,5,2015,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1
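
The format strings in these examples look like Python `strftime` directives. Assuming they are interpreted that way, the sketch below shows why `%Y/%d/%m` renders 11 February as `2015/11/02`: the day directive comes before the month directive.

```python
from datetime import datetime

ts = datetime(2015, 2, 11, 6, 48, 0)

# Day before month, as in Example 1 above.
day_first = ts.strftime("%Y/%d/%m %H:%M:%S")
# Month before day, for comparison.
month_first = ts.strftime("%Y/%m/%d %H:%M:%S")

print(day_first)
print(month_first)
```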


# Dateformat Conversion
- API specification of function **dateformat_conversion** can be found <a href="https://docs.anovos.ai/api/data_transformer/datetime.html">here</a>

In [25]:
from anovos.data_transformer.datetime import dateformat_conversion

In [26]:
# Example 1: to default output format %Y-%m-%d %H:%M:%S + append the new column
odf = dateformat_conversion(spark, df, 'time2', input_format="%d/%m/%y %H:%M", output_mode="append")
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time2_ts
0,1,2015-02-11 06:48:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,2015-02-11 14:48:00
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,2015-02-11 14:49:00
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,2015-02-11 14:50:00
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,2015-02-11 14:51:00
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,2015-02-11 14:51:00


In [27]:
# Example 2: to %Y/%m/%d + replace the original column
odf = dateformat_conversion(spark, df, 'time2', input_format="%d/%m/%y %H:%M", output_format="%Y/%m/%d")
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy
0,1,2015-02-11 06:48:00,2015/02/11,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1
1,2,2015-02-11 06:49:00,2015/02/11,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1
2,3,2015-02-11 06:50:00,2015/02/11,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1
3,4,2015-02-11 06:51:00,2015/02/11,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1
4,5,2015-02-11 06:51:00,2015/02/11,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1
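
Conceptually, a date format conversion is a parse followed by a re-format. A minimal plain-Python sketch of that idea follows; `convert_format` is a hypothetical helper, and its default output format is taken from the comment in Example 1 rather than from the ANOVOS source.

```python
from datetime import datetime

def convert_format(value: str, input_format: str,
                   output_format: str = "%Y-%m-%d %H:%M:%S") -> str:
    """Hypothetical helper: parse with one format, render with another."""
    return datetime.strptime(value, input_format).strftime(output_format)

raw = "11/2/15 14:48"  # first value of the time2 column
print(convert_format(raw, "%d/%m/%y %H:%M"))
print(convert_format(raw, "%d/%m/%y %H:%M", "%Y/%m/%d"))
```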


# Time Units Extraction
- API specification of function **timeUnits_extraction** can be found <a href="https://docs.anovos.ai/api/data_transformer/datetime.html">here</a>

In [28]:
from anovos.data_transformer.datetime import timeUnits_extraction

In [29]:
# Example 1: Extract all units + append new columns
odf = timeUnits_extraction(df, 'time1', 'all')
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time1_hour,time1_minute,time1_second,time1_dayofmonth,time1_dayofweek,time1_dayofyear,time1_weekofyear,time1_month,time1_quarter,time1_year
0,1,2015-02-11 06:48:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,6,48,0,11,4,42,7,2,1,2015
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,6,49,0,11,4,42,7,2,1,2015
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,6,50,0,11,4,42,7,2,1,2015
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,6,51,0,11,4,42,7,2,1,2015
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,6,51,0,11,4,42,7,2,1,2015


In [30]:
# Example 2: Extract selected units + append new columns
odf = timeUnits_extraction(df, 'time1', ['dayofmonth', 'weekofyear', 'quarter'])
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time1_dayofmonth,time1_weekofyear,time1_quarter
0,1,2015-02-11 06:48:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,11,7,1
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,11,7,1
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,11,7,1
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,11,7,1
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,11,7,1


In [31]:
# Example 3: Extract selected units + pass units as string + replace the original column
odf = timeUnits_extraction(df, 'time1', 'dayofmonth|weekofyear', output_mode='replace')
odf.limit(5).toPandas()

Unnamed: 0,id,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time1_dayofmonth,time1_weekofyear
0,1,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,11,7
1,2,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,11,7
2,3,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,11,7
3,4,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,11,7
4,5,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,11,7
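
To make the meaning of each extracted unit concrete, here is a standard-library sketch that reproduces the values shown for the first row. The conventions used (day of week numbered from Sunday = 1, ISO 8601 week numbers) are assumptions chosen to match the output above, and `extract_units` is a hypothetical helper.

```python
from datetime import datetime

def extract_units(ts: datetime) -> dict:
    """Hypothetical helper returning the same unit names as the columns above."""
    return {
        "hour": ts.hour,
        "minute": ts.minute,
        "second": ts.second,
        "dayofmonth": ts.day,
        "dayofweek": ts.isoweekday() % 7 + 1,  # 1 = Sunday ... 7 = Saturday
        "dayofyear": ts.timetuple().tm_yday,
        "weekofyear": ts.isocalendar()[1],     # ISO 8601 week number
        "month": ts.month,
        "quarter": (ts.month - 1) // 3 + 1,
        "year": ts.year,
    }

units = extract_units(datetime(2015, 2, 11, 6, 48, 0))
print(units)
```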


# Time Difference
- API specification of function **time_diff** can be found <a href="https://docs.anovos.ai/api/data_transformer/datetime.html">here</a>

In [32]:
from anovos.data_transformer.datetime import time_diff

In [33]:
# Example 1: output difference in hour + append the new column
df2 = df.withColumn('time3', (F.col('time1') + F.expr('Interval '+ str(1) + ' hours')))

odf = time_diff(df2, 'time1', 'time3', unit='hour')
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time3,time1_time3_hourdiff
0,1,2015-02-11 06:48:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,2015-02-11 07:48:00,1.0
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,2015-02-11 07:49:00,1.0
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,2015-02-11 07:50:00,1.0
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,2015-02-11 07:51:00,1.0
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,2015-02-11 07:51:00,1.0


In [34]:
# Example 2: output difference in second + replace the original column
df2 = df.withColumn('time3', (F.col('time1') + F.expr('Interval '+ str(1) + ' hours')))

odf = time_diff(df2, 'time1', 'time3', unit='second', output_mode="replace")
odf.limit(5).toPandas()

Unnamed: 0,id,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time1_time3_seconddiff
0,1,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,3600.0
1,2,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,3600.0
2,3,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,3600.0
3,4,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,3600.0
4,5,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,3600.0
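
The arithmetic behind these results can be sketched in plain Python: take the difference in seconds and divide by the length of the requested unit. The helper `diff_in` and its unit table are hypothetical and cover only fixed-length units.

```python
from datetime import datetime, timedelta

# Seconds per unit; only fixed-length units are covered in this sketch.
SECONDS_PER_UNIT = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}

def diff_in(ts1: datetime, ts2: datetime, unit: str = "hour") -> float:
    """Hypothetical helper: ts2 - ts1 expressed in the given unit."""
    return (ts2 - ts1).total_seconds() / SECONDS_PER_UNIT[unit]

time1 = datetime(2015, 2, 11, 6, 48)
time3 = time1 + timedelta(hours=1)  # same offset as the time3 column above

print(diff_in(time1, time3, "hour"))
print(diff_in(time1, time3, "second"))
```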


# Time Elapsed
- API specification of function **time_elapsed** can be found <a href="https://docs.anovos.ai/api/data_transformer/datetime.html">here</a>

In [35]:
from anovos.data_transformer.datetime import time_elapsed

In [36]:
# Example 1: output difference in day + append the new column
odf = time_elapsed(df, 'time1', unit='day')
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time1_daydiff
0,1,2015-02-11 06:48:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,2847.085424
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,2847.084729
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,2847.084035
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,2847.08334
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,2847.08334


In [37]:
# Example 2: output difference in year + replace the original column

odf = time_elapsed(df, 'time1', unit='year', output_mode="replace")
odf.limit(5).toPandas()

Unnamed: 0,id,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time1_yeardiff
0,1,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,7.800234
1,2,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,7.800232
2,3,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,7.80023
3,4,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,7.800228
4,5,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,7.800228


# Adding Time Units
- API specification of function **adding_timeUnits** can be found <a href="https://docs.anovos.ai/api/data_transformer/datetime.html">here</a>

In [38]:
from anovos.data_transformer.datetime import adding_timeUnits

In [39]:
# Example 1: minus 2 years + append the new column
odf = adding_timeUnits(df, 'time1', unit='years', unit_value=-2)
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time1_adjusted
0,1,2015-02-11 06:48:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,2013-02-11 06:48:00
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,2013-02-11 06:49:00
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,2013-02-11 06:50:00
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,2013-02-11 06:51:00
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,2013-02-11 06:51:00


In [40]:
# Example 2: plus 30 seconds + replace the original column
odf = adding_timeUnits(df, 'time1', unit='seconds', unit_value=30, output_mode="replace")
odf.limit(5).toPandas()

Unnamed: 0,id,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time1_adjusted
0,1,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,2015-02-11 06:48:30
1,2,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,2015-02-11 06:49:30
2,3,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,2015-02-11 06:50:30
3,4,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,2015-02-11 06:51:30
4,5,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,2015-02-11 06:51:30
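
For comparison, here is how the same adjustments might look with Python's standard library. Fixed-length units map directly onto `timedelta`, while years need calendar arithmetic. The helper `add_units` is hypothetical, and its handling of 29 February (falling back to 28 February) is an assumption of this sketch, not a statement about how `adding_timeUnits` behaves.

```python
from datetime import datetime, timedelta

def add_units(ts: datetime, unit: str, value: int) -> datetime:
    """Hypothetical helper: shift ts by `value` units (negative values subtract)."""
    if unit == "years":
        try:
            return ts.replace(year=ts.year + value)
        except ValueError:
            # 29 Feb has no counterpart in a non-leap year; fall back to 28 Feb.
            return ts.replace(year=ts.year + value, day=28)
    # timedelta accepts weeks, days, hours, minutes and seconds as keywords.
    return ts + timedelta(**{unit: value})

ts = datetime(2015, 2, 11, 6, 48, 0)
print(add_units(ts, "years", -2))
print(add_units(ts, "seconds", 30))
```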


# Timestamp Comparison
- API specification of function **timestamp_comparison** can be found <a href="https://docs.anovos.ai/api/data_transformer/datetime.html">here</a>

In [41]:
from anovos.data_transformer.datetime import timestamp_comparison

In [42]:
# Example 1: use the default comparison_format + append the new column
odf = timestamp_comparison(spark, df, "time1", comparison_type="less_than", comparison_value="2015-02-11 06:50:00")
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time1_compared
0,1,2015-02-11 06:48:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,1
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,1
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,0
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,0
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,0


In [43]:
# Example 2: use nondefault comparison_format + append the new column
odf = timestamp_comparison(spark, df, "time1", comparison_type="greaterThan_equalTo", 
                           comparison_value="2015/02/11 06:50:00", comparison_format="%Y/%m/%d %H:%M:%S")
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time1_compared
0,1,2015-02-11 06:48:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,0
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,0
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,1
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,1
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,1


In [44]:
# Example 3: use nondefault comparison_format + replace the original column
odf = timestamp_comparison(spark, df, "time1", comparison_type="greater_than", 
                           comparison_value="2015/02/11 06:50:00", comparison_format="%Y/%m/%d %H:%M:%S",
                           output_mode="replace")
odf.limit(5).toPandas()

Unnamed: 0,id,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time1_compared
0,1,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,0
1,2,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,0
2,3,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,0
3,4,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,1
4,5,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,1
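
The comparison logic can be illustrated with a small plain-Python sketch: parse the comparison value with the given format, then apply the operator named by `comparison_type` and return 1 or 0. Only the three comparison types used in the examples above are mapped. The helper `compare` is hypothetical, and its default format is inferred from Example 1, not read from the ANOVOS source.

```python
import operator
from datetime import datetime

# Only the comparison types that appear in the examples above.
COMPARATORS = {
    "less_than": operator.lt,
    "greater_than": operator.gt,
    "greaterThan_equalTo": operator.ge,
}

def compare(ts: datetime, comparison_type: str, comparison_value: str,
            comparison_format: str = "%Y-%m-%d %H:%M:%S") -> int:
    """Hypothetical helper: 1 if the comparison holds, else 0."""
    threshold = datetime.strptime(comparison_value, comparison_format)
    return int(COMPARATORS[comparison_type](ts, threshold))

# time1 values of the five sample rows.
times = [datetime(2015, 2, 11, 6, m) for m in (48, 49, 50, 51, 51)]
less = [compare(t, "less_than", "2015-02-11 06:50:00") for t in times]
at_least = [compare(t, "greaterThan_equalTo", "2015/02/11 06:50:00",
                    "%Y/%m/%d %H:%M:%S") for t in times]
print(less, at_least)
```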


# Aggregator
- API specification of function **aggregator** can be found <a href="https://docs.anovos.ai/api/data_transformer/datetime.html">here</a>

In [45]:
from anovos.data_transformer.datetime import aggregator

In [46]:
# Example 1: aggregate by date
odf = aggregator(spark, df, ['Temperature', 'Humidity'], list_of_aggs=['min', 'max'], time_col='time1', 
                 granularity_format="%Y-%m-%d")
odf.limit(5).toPandas()

Unnamed: 0,time1,Temperature_min,Temperature_max,Humidity_min,Humidity_max
0,2015-02-15,20.05,23.29,24.39,32.9
1,2015-02-12,20.5,24.39,21.865,28.89
2,2015-02-14,19.633333,20.926667,31.133333,37.5
3,2015-02-16,19.89,22.0,24.29,30.675
4,2015-02-18,20.7,21.0,26.745,28.1


In [47]:
# Example 2: aggregate by day of week (%w) + pass columns and aggregations as strings
odf = aggregator(spark, df, 'Light|CO2', list_of_aggs='mean|median', time_col='time1', 
                 granularity_format="%w")
odf.limit(5).toPandas()

Unnamed: 0,time1,Light_mean,Light_median,CO2_mean,CO2_median
0,3,87.574715,0.0,671.932389,571.666667
1,0,52.439838,0.0,715.600399,684.0
2,5,222.474039,0.0,545.164109,514.0
3,6,18.279005,0.0,530.127477,524.0
4,1,176.2375,0.0,820.286956,769.5


# Window Aggregation
- API specification of function **window_aggregator** can be found <a href="https://docs.anovos.ai/api/data_transformer/datetime.html">here</a>

In [48]:
from anovos.data_transformer.datetime import window_aggregator

In [49]:
# Example 1: order by time1 + expanding window
odf = window_aggregator(df, ['Temperature', 'Light'], ['min', 'max'], order_col='time1', window_type='expanding')
odf.orderBy('id').limit(10).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,Temperature_min,Temperature_max,Light_min,Light_max
0,1,2015-02-11 06:48:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,21.76,21.76,437.333333,437.333333
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,21.76,21.79,437.333333,437.333333
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,21.76,21.79,434.0,437.333333
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,21.76,21.79,434.0,439.0
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,21.76,21.79,434.0,439.0
5,6,2015-02-11 06:53:00,11/2/15 14:53,1423637580,21.76,31.26,437.333333,1014.333333,0.005042,1,21.76,21.79,434.0,439.0
6,7,2015-02-11 06:54:00,11/2/15 14:54,1423637640,21.79,31.1975,434.0,1018.5,0.005041,1,21.76,21.79,434.0,439.0
7,8,2015-02-11 06:55:00,11/2/15 14:55,1423637700,21.79,31.393333,437.333333,1018.666667,0.005073,1,21.76,21.79,434.0,439.0
8,9,2015-02-11 06:55:00,11/2/15 14:55,1423637700,21.79,31.3175,434.0,1022.0,0.00506,1,21.76,21.79,434.0,439.0
9,10,2015-02-11 06:57:00,11/2/15 14:57,1423637820,21.79,31.463333,437.333333,1027.333333,0.005084,1,21.76,21.79,434.0,439.0
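
As a rough illustration of what an expanding window computes, the sketch below reproduces the `Light_min` and `Light_max` values of the first four rows with the Python standard library. It assumes the rows are already ordered by `time1` and is not the ANOVOS implementation.

```python
from itertools import accumulate

# First four Light readings from the table above, already ordered by time1.
light = [437.333333, 437.333333, 434.0, 439.0]

# An expanding window at row i covers rows 0..i,
# so each aggregate is a running minimum or maximum.
light_min = list(accumulate(light, min))
light_max = list(accumulate(light, max))

for value, lo, hi in zip(light, light_min, light_max):
    print(value, lo, hi)
```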


In [50]:
# Example 2: order by time1 + rolling window with window_size=5
odf = window_aggregator(df, 'Humidity|id', 'mean|sum', order_col='time1', window_type='rolling', window_size=5)
odf.orderBy('id').limit(10).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,Humidity_mean,Humidity_sum,id_mean,id_sum
0,1,2015-02-11 06:48:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,31.133333,31.133333,1.0,1
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,31.066667,62.133333,1.5,3
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,31.085278,93.255833,2.0,6
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,31.094583,124.378333,2.5,10
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,31.102333,155.511667,3.0,15
5,6,2015-02-11 06:53:00,11/2/15 14:53,1423637580,21.76,31.26,437.333333,1014.333333,0.005042,1,31.128611,186.771667,3.5,21
6,7,2015-02-11 06:54:00,11/2/15 14:54,1423637640,21.79,31.1975,434.0,1018.5,0.005041,1,31.139306,186.835833,4.5,27
7,8,2015-02-11 06:55:00,11/2/15 14:55,1423637700,21.79,31.393333,437.333333,1018.666667,0.005073,1,31.204861,187.229167,5.5,33
8,9,2015-02-11 06:55:00,11/2/15 14:55,1423637700,21.79,31.3175,434.0,1022.0,0.00506,1,31.237361,187.424167,6.5,39
9,10,2015-02-11 06:57:00,11/2/15 14:57,1423637820,21.79,31.463333,437.333333,1027.333333,0.005084,1,31.294167,187.765,7.5,45
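
In the output above, `id_sum` is 21 for id 6 (1 through 6) and 27 for id 7 (2 through 7), which suggests that each window spans the current row plus the `window_size` preceding rows. That reading is inferred from the displayed values rather than from the API specification. A minimal standard-library sketch under that assumption (the helper name `rolling` is hypothetical):

```python
from statistics import mean


def rolling(values, window_size, agg):
    """Apply `agg` to the current value plus up to `window_size` preceding values."""
    return [
        agg(values[max(0, i - window_size): i + 1])
        for i in range(len(values))
    ]


ids = list(range(1, 11))  # the id column of the first ten rows
id_sum = rolling(ids, 5, sum)
id_mean = rolling(ids, 5, mean)
print(id_sum)
print(id_mean)
```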


In [51]:
# Example 3: order by time1 + rolling window with window_size=5 + partition by Occupancy
odf = window_aggregator(df, 'Humidity|id', 'mean|sum', order_col='time1', window_type='rolling', window_size=5,
                        partition_col='Occupancy')
odf.where(F.col('Occupancy')==0).limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,Humidity_mean,Humidity_sum,id_mean,id_sum
0,37,2015-02-11 07:23:00,11/2/15 15:23,1423639380,21.89,31.55,436.5,1047.0,0.00513,0,31.55,31.55,37.0,37
1,38,2015-02-11 07:24:00,11/2/15 15:24,1423639440,21.89,31.36,434.0,1031.0,0.005099,0,31.455,62.91,37.5,75
2,39,2015-02-11 07:26:00,11/2/15 15:26,1423639560,21.89,31.125,432.75,977.5,0.00506,0,31.345,94.035,38.0,114
3,218,2015-02-11 10:24:00,11/2/15 18:24,1423650240,21.7,28.566667,0.0,582.0,0.004587,0,30.650417,122.601667,83.0,332
4,219,2015-02-11 10:26:00,11/2/15 18:26,1423650360,21.7,28.76,0.0,578.0,0.004618,0,30.272333,151.361667,110.2,551
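
With `partition_col` set, each window only sees earlier rows from the same partition. The sketch below mimics that for `id_sum` using plain Python. It assumes the same window shape as the displayed values suggest (current row plus `window_size` preceding rows) and takes its `(Occupancy, id)` pairs from the tables in this notebook; the helper name `partitioned_rolling_sum` is hypothetical.

```python
from collections import defaultdict


def partitioned_rolling_sum(rows, window_size):
    """rows: (partition_key, value) pairs already ordered by time."""
    history = defaultdict(list)  # values seen so far, per partition
    out = []
    for key, value in rows:
        history[key].append(value)
        # Current value plus up to `window_size` earlier values of the same key.
        out.append(sum(history[key][-(window_size + 1):]))
    return out


# ids 9 and 10 have Occupancy 1; ids 37, 38, 39, 218 and 219 have Occupancy 0.
rows = [(1, 9), (1, 10), (0, 37), (0, 38), (0, 39), (0, 218), (0, 219)]
sums = partitioned_rolling_sum(rows, window_size=5)
print(sums)
```

The last five values line up with the `id_sum` column shown for `Occupancy == 0`.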


# Lagged Timeseries
- API specification of function **lagged_ts** can be found <a href="https://docs.anovos.ai/api/data_transformer/datetime.html">here</a>

In [52]:
from anovos.data_transformer.datetime import lagged_ts

In [53]:
# Example 1: generate the lag column 
odf = lagged_ts(df, 'time1', lag=2, output_type='ts')
odf.orderBy('id').limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time1_lag2
0,1,2015-02-11 06:48:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,NaT
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,NaT
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,2015-02-11 06:48:00
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,2015-02-11 06:49:00
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,2015-02-11 06:50:00


In [54]:
# Example 2: generate the lag column and the time difference column
odf = lagged_ts(df, 'time1', lag=2, output_type='ts_diff')
odf.orderBy('id').limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time1_time1_lag2_daydiff
0,1,2015-02-11 06:48:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,0.001389
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,0.001389
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,0.000694


In [55]:
# Example 3: generate the lag column and the time difference column in minutes
odf = lagged_ts(df, 'time1', lag=2, output_type='ts_diff', tsdiff_unit='minutes')
odf.orderBy('id').limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time1_time1_lag2_minutediff
0,1,2015-02-11 06:48:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,2.0
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,2.0
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,1.0
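
The difference columns in the last two examples can be checked by hand. The sketch below recomputes them for the first five `time1` values with the Python standard library: each timestamp is compared with the one two rows earlier, and the first two rows have no such row. The helper name `lagged_diff` and its `unit_seconds` parameter are hypothetical and not part of ANOVOS.

```python
from datetime import datetime


def lagged_diff(timestamps, lag, unit_seconds):
    """Difference to the timestamp `lag` rows earlier, in units of `unit_seconds`."""
    diffs = []
    for i, ts in enumerate(timestamps):
        if i < lag:
            diffs.append(None)  # no row `lag` positions back
        else:
            delta = ts - timestamps[i - lag]
            diffs.append(delta.total_seconds() / unit_seconds)
    return diffs


fmt = "%Y-%m-%d %H:%M:%S"
time1 = [
    datetime.strptime(s, fmt)
    for s in [
        "2015-02-11 06:48:00",
        "2015-02-11 06:49:00",
        "2015-02-11 06:50:00",
        "2015-02-11 06:51:00",
        "2015-02-11 06:51:00",
    ]
]

minute_diff = lagged_diff(time1, lag=2, unit_seconds=60)
day_diff = [
    None if d is None else round(d, 6)
    for d in lagged_diff(time1, lag=2, unit_seconds=86400)
]
print(minute_diff)
print(day_diff)
```

Two minutes is 2 / 1440 of a day, which is why the default day-based column shows values near 0.001389.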


In [56]:
# Example 4: generate the lag column and the time difference column in minutes + partition by Occupancy
odf = lagged_ts(df, 'time1', lag=2, output_type='ts_diff', tsdiff_unit='minutes', partition_col='Occupancy')
odf.where(F.col('Occupancy')==0).limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time1_time1_lag2_minutediff
0,37,2015-02-11 07:23:00,11/2/15 15:23,1423639380,21.89,31.55,436.5,1047.0,0.00513,0,
1,38,2015-02-11 07:24:00,11/2/15 15:24,1423639440,21.89,31.36,434.0,1031.0,0.005099,0,
2,39,2015-02-11 07:26:00,11/2/15 15:26,1423639560,21.89,31.125,432.75,977.5,0.00506,0,3.0
3,218,2015-02-11 10:24:00,11/2/15 18:24,1423650240,21.7,28.566667,0.0,582.0,0.004587,0,180.0
4,219,2015-02-11 10:26:00,11/2/15 18:26,1423650360,21.7,28.76,0.0,578.0,0.004618,0,180.0


# Start / End of Month / Year / Quarter
- By default, each function appends a new column (for example `time1_monthStart`). Pass `output_mode="replace"` to overwrite the original column instead.

## Start of Month
- API specification of function **start_of_month** can be found <a href="https://docs.anovos.ai/api/data_transformer/datetime.html">here</a>

In [57]:
from anovos.data_transformer.datetime import start_of_month

odf = start_of_month(df, 'time1')
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time1_monthStart
0,1,2015-02-11 06:48:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,2015-02-01
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,2015-02-01
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,2015-02-01
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,2015-02-01
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,2015-02-01


## End of Month
- API specification of function **end_of_month** can be found <a href="https://docs.anovos.ai/api/data_transformer/datetime.html">here</a>

In [58]:
from anovos.data_transformer.datetime import end_of_month

odf = end_of_month(df, 'time1')
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time1_monthEnd
0,1,2015-02-11 06:48:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,2015-02-28
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,2015-02-28
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,2015-02-28
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,2015-02-28
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,2015-02-28


## Start of Year
- API specification of function **start_of_year** can be found <a href="https://docs.anovos.ai/api/data_transformer/datetime.html">here</a>

In [59]:
from anovos.data_transformer.datetime import start_of_year

odf = start_of_year(df, 'time1')
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time1_yearStart
0,1,2015-02-11 06:48:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,2015-01-01
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,2015-01-01
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,2015-01-01
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,2015-01-01
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,2015-01-01


## End of Year
- API specification of function **end_of_year** can be found <a href="https://docs.anovos.ai/api/data_transformer/datetime.html">here</a>

In [60]:
from anovos.data_transformer.datetime import end_of_year

odf = end_of_year(df, 'time1')
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time1_yearEnd
0,1,2015-02-11 06:48:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,2015-12-31
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,2015-12-31
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,2015-12-31
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,2015-12-31
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,2015-12-31


## Start of Quarter
- API specification of function **start_of_quarter** can be found <a href="https://docs.anovos.ai/api/data_transformer/datetime.html">here</a>

In [61]:
from anovos.data_transformer.datetime import start_of_quarter

odf = start_of_quarter(df, 'time1')
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time1_quarterStart
0,1,2015-02-11 06:48:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,2015-01-01
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,2015-01-01
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,2015-01-01
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,2015-01-01
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,2015-01-01


## End of Quarter
- API specification of function **end_of_quarter** can be found <a href="https://docs.anovos.ai/api/data_transformer/datetime.html">here</a>

In [62]:
from anovos.data_transformer.datetime import end_of_quarter

odf = end_of_quarter(df, 'time1')
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time1_quarterEnd
0,1,2015-02-11 06:48:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,2015-03-31
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,2015-03-31
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,2015-03-31
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,2015-03-31
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,2015-03-31
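
For reference, the six boundary dates shown in this section can be derived with the Python standard library alone. The sketch below is an illustration of the calendar arithmetic, not the ANOVOS implementation; the helper names are hypothetical.

```python
import calendar
from datetime import date


def month_bounds(d):
    """First and last day of the month containing `d`."""
    last_day = calendar.monthrange(d.year, d.month)[1]
    return d.replace(day=1), d.replace(day=last_day)


def quarter_bounds(d):
    """First and last day of the calendar quarter containing `d`."""
    first_month = 3 * ((d.month - 1) // 3) + 1
    last_month = first_month + 2
    last_day = calendar.monthrange(d.year, last_month)[1]
    return date(d.year, first_month, 1), date(d.year, last_month, last_day)


def year_bounds(d):
    """First and last day of the year containing `d`."""
    return date(d.year, 1, 1), date(d.year, 12, 31)


d = date(2015, 2, 11)
month_start, month_end = month_bounds(d)
quarter_start, quarter_end = quarter_bounds(d)
year_start, year_end = year_bounds(d)
print(month_start, month_end, quarter_start, quarter_end, year_start, year_end)
```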


# Binary features

## Is Month Start
- API specification of function **is_monthStart** can be found <a href="https://docs.anovos.ai/api/data_transformer/datetime.html">here</a>

In [63]:
from anovos.data_transformer.datetime import is_monthStart
df2 = df.withColumn('time1', F.col('time1').cast('string')).replace({'2015-02-11 06:48:00': '2015-02-01 00:00:00'})

odf = is_monthStart(df2, 'time1')
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time1_ismonthStart
0,1,2015-02-01 00:00:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,1
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,0
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,0
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,0
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,0


## Is Month End
- API specification of function **is_monthEnd** can be found <a href="https://docs.anovos.ai/api/data_transformer/datetime.html">here</a>

In [64]:
from anovos.data_transformer.datetime import is_monthEnd
df2 = df.withColumn('time1', F.col('time1').cast('string')).replace({'2015-02-11 06:48:00': '2015-02-28 00:00:00'})

odf = is_monthEnd(df2, 'time1')
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time1_ismonthEnd
0,1,2015-02-28 00:00:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,1
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,0
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,0
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,0
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,0


## Is Year Start
- API specification of function **is_yearStart** can be found <a href="https://docs.anovos.ai/api/data_transformer/datetime.html">here</a>

In [65]:
from anovos.data_transformer.datetime import is_yearStart
df2 = df.withColumn('time1', F.col('time1').cast('string')).replace({'2015-02-11 06:48:00': '2015-01-01 00:00:00'})

odf = is_yearStart(df2, 'time1')
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time1_isyearStart
0,1,2015-01-01 00:00:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,1
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,0
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,0
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,0
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,0


## Is Year End
- API specification of function **is_yearEnd** can be found <a href="https://docs.anovos.ai/api/data_transformer/datetime.html">here</a>

In [66]:
from anovos.data_transformer.datetime import is_yearEnd
df2 = df.withColumn('time1', F.col('time1').cast('string')).replace({'2015-02-11 06:48:00': '2015-12-31 00:00:00'})

odf = is_yearEnd(df2, 'time1')
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time1_isyearEnd
0,1,2015-12-31 00:00:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,1
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,0
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,0
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,0
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,0


## Is Quarter Start
- API specification of function **is_quarterStart** can be found <a href="https://docs.anovos.ai/api/data_transformer/datetime.html">here</a>

In [67]:
from anovos.data_transformer.datetime import is_quarterStart
df2 = df.withColumn('time1', F.col('time1').cast('string')).replace({'2015-02-11 06:48:00': '2015-04-01 00:00:00'})

odf = is_quarterStart(df2, 'time1')
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time1_isquarterStart
0,1,2015-04-01 00:00:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,1
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,0
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,0
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,0
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,0


## Is Quarter End
- API specification of function **is_quarterEnd** can be found <a href="https://docs.anovos.ai/api/data_transformer/datetime.html">here</a>

In [68]:
from anovos.data_transformer.datetime import is_quarterEnd
df2 = df.withColumn('time1', F.col('time1').cast('string')).replace({'2015-02-11 06:48:00': '2015-03-31 00:00:00'})

odf = is_quarterEnd(df2, 'time1')
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time1_isquarterEnd
0,1,2015-03-31 00:00:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,1
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,0
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,0
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,0
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,0
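
The six start/end flags above follow one pattern: a row is flagged when its date falls on the first or last day of the period. A minimal standard-library sketch of that logic, using the replacement dates from the examples (the helper name `boundary_flags` and its dictionary keys are hypothetical, and this is not the ANOVOS implementation):

```python
import calendar
from datetime import datetime


def boundary_flags(ts):
    """Return 0/1 flags for month, quarter and year boundaries."""
    last_day = calendar.monthrange(ts.year, ts.month)[1]
    month_start = ts.day == 1
    month_end = ts.day == last_day
    return {
        "is_month_start": int(month_start),
        "is_month_end": int(month_end),
        "is_quarter_start": int(month_start and ts.month in (1, 4, 7, 10)),
        "is_quarter_end": int(month_end and ts.month in (3, 6, 9, 12)),
        "is_year_start": int(month_start and ts.month == 1),
        "is_year_end": int(month_end and ts.month == 12),
    }


april_first = boundary_flags(datetime(2015, 4, 1))
march_last = boundary_flags(datetime(2015, 3, 31))
ordinary = boundary_flags(datetime(2015, 2, 11, 6, 49))
print(april_first)
print(march_last)
print(ordinary)
```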


## Is First Half of the Year
- API specification of function **is_yearFirstHalf** can be found <a href="https://docs.anovos.ai/api/data_transformer/datetime.html">here</a>

In [69]:
from anovos.data_transformer.datetime import is_yearFirstHalf
df2 = df.withColumn('time1', F.col('time1').cast('string')).replace({'2015-02-11 06:48:00': '2015-12-01 00:00:00'})

odf = is_yearFirstHalf(df2, 'time1')
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time1_isFirstHalf
0,1,2015-12-01 00:00:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,0
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,1
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,1
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,1
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,1


## Is Selected Hour
- API specification of function **is_selectedHour** can be found <a href="https://docs.anovos.ai/api/data_transformer/datetime.html">here</a>

In [70]:
from anovos.data_transformer.datetime import is_selectedHour
df2 = df.withColumn('time1', F.col('time1').cast('string')).replace({'2015-02-11 06:48:00': '2015-02-01 03:00:00'})

odf = is_selectedHour(df2, 'time1', 6, 7)
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time1_isselectedHour
0,1,2015-02-01 03:00:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,0
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,1
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,1
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,1
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,1


## Is Leap Year
- API specification of function **is_leapYear** can be found <a href="https://docs.anovos.ai/api/data_transformer/datetime.html">here</a>

In [71]:
from anovos.data_transformer.datetime import is_leapYear
df2 = df.withColumn('time1', F.col('time1').cast('string')).replace({'2015-02-11 06:48:00': '2016-02-01 00:00:00'})

odf = is_leapYear(df2, 'time1')
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time1_isleapYear
0,1,2016-02-01 00:00:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,1
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,0
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,0
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,0
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,0


## Is Weekend
- API specification of function **is_weekend** can be found <a href="https://docs.anovos.ai/api/data_transformer/datetime.html">here</a>

In [72]:
from anovos.data_transformer.datetime import is_weekend
df2 = df.withColumn('time1', F.col('time1').cast('string')).replace({'2015-02-11 06:48:00': '2015-02-01 00:00:00'})

odf = is_weekend(df2, 'time1')
odf.limit(5).toPandas()

Unnamed: 0,id,time1,time2,unix,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy,time1_isweekend
0,1,2015-02-01 00:00:00,11/2/15 14:48,1423637280,21.76,31.133333,437.333333,1029.666667,0.005021,1,1
1,2,2015-02-11 06:49:00,11/2/15 14:49,1423637340,21.79,31.0,437.333333,1000.0,0.005009,1,0
2,3,2015-02-11 06:50:00,11/2/15 14:50,1423637400,21.7675,31.1225,434.0,1003.75,0.005022,1,0
3,4,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.7675,31.1225,439.0,1009.5,0.005022,1,0
4,5,2015-02-11 06:51:00,11/2/15 14:51,1423637460,21.79,31.133333,437.333333,1005.666667,0.00503,1,0
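
The last four flags can be sanity-checked with the Python standard library. The sketch below is an illustration under two assumptions: the hour range is treated as inclusive at both ends, and the weekend is Saturday and Sunday. The helper names are hypothetical and this is not the ANOVOS implementation.

```python
import calendar
from datetime import datetime


def is_first_half(ts):
    """1 if the month is January through June."""
    return int(ts.month <= 6)


def is_selected_hour(ts, start_hour, end_hour):
    """1 if the hour lies in [start_hour, end_hour] (inclusive range is an assumption)."""
    return int(start_hour <= ts.hour <= end_hour)


def is_leap_year(ts):
    """1 if the year is a leap year."""
    return int(calendar.isleap(ts.year))


def is_weekend(ts):
    """1 for Saturday or Sunday (weekday() returns 5 and 6 for those days)."""
    return int(ts.weekday() >= 5)


row = datetime(2015, 2, 11, 6, 49)  # an unmodified row: a Wednesday morning
flags = (
    is_first_half(row),
    is_selected_hour(row, 6, 7),
    is_leap_year(row),
    is_weekend(row),
)
print(flags)
```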
