# Datetime Tutorial
This tutorial shows how to prepare temporal data for use with Lux. For Lux to display temporal fields appropriately, the column must be converted to Pandas [`datetime`](https://docs.python.org/3/library/datetime.html) objects.

In [2]:
import pandas as pd
import lux

## Converting Strings to Datetime objects
To convert a column containing dates or times into [`datetime`](https://docs.python.org/3/library/datetime.html) objects, we use [`pd.to_datetime`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html).

### Summary: 

```
pd.to_datetime(['2020-01-01', '2020-01-15', '2020-02-01'],format="%Y-%m-%d")
```
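The call above can be extended in two common ways. The following is a minimal sketch, assuming a recent Pandas version: `errors="coerce"` turns unparseable entries into `NaT` instead of raising, and the `format` string can be changed to match other layouts. The sample strings are made up for illustration.

```python
import pandas as pd

# Made-up sample input; the last entry is deliberately not a valid date.
raw = ["2020-01-01", "2020-01-15", "not a date"]

# errors="coerce" replaces values that do not match the format with NaT.
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")
print(parsed)

# A day-first layout needs a matching format string.
european = pd.to_datetime(["01/02/2020"], format="%d/%m/%Y")
print(european[0].month)  # 2, because the string is read as 1 February 2020
```

Passing an explicit `format` avoids ambiguity between day-first and month-first strings, which is why the sketch does not rely on automatic inference.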

### Example:

As an example, a dataframe might contain a `date` attribute as strings of dates:

In [3]:
df = pd.DataFrame({'date': ['2020-01-01', '2020-02-01', '2020-03-01', '2020-04-01'], 'value': [10.5,15.2,20.3,25.2]})
df


These are the data types that Pandas detects, as reported by [`dtypes`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dtypes.html):

In [4]:
df.dtypes

date      object
value    float64
dtype: object

Since `date` is detected as an object type in Pandas, in Lux the `date` field is recognized as a `nominal` field, instead of a `temporal` field:

In [5]:
df.dataType

{'quantitative': ['value'], 'ordinal': [], 'nominal': ['date'], 'temporal': []}

The detected type affects the generated views: nominal fields are displayed as bar charts, whereas temporal fields are plotted as time series line charts.

In [6]:
from lux.view.View import View
view = View(["date","value"])
view.load(df)

LuxWidget(current_view={'config': {'view': {'continuousWidth': 400, 'continuousHeight': 300}, 'mark': {'toolti…

<View  (x: MEAN(value), y: date) mark: bar, score: 0.0 >

To fix this, we can convert the `date` column to datetime values:

In [7]:
df['date'] = pd.to_datetime(df['date'],format="%Y-%m-%d")

In [8]:
df['date']

0   2020-01-01
1   2020-02-01
2   2020-03-01
3   2020-04-01
Name: date, dtype: datetime64[ns]
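Besides inspecting the printed `dtype`, the conversion can be checked programmatically. The sketch below uses `pandas.api.types.is_datetime64_any_dtype` on a separate, hypothetical `sample` frame so that it does not overwrite the tutorial's `df`.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Hypothetical two-row frame mirroring the tutorial's example columns.
sample = pd.DataFrame({"date": ["2020-01-01", "2020-02-01"], "value": [10.5, 15.2]})

# Before conversion the column holds plain strings.
before = is_datetime64_any_dtype(sample["date"])

sample["date"] = pd.to_datetime(sample["date"], format="%Y-%m-%d")

# After conversion the column has a datetime64 dtype.
after = is_datetime64_any_dtype(sample["date"])
print(before, after)  # False True
```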

After changing the Pandas data type to datetime, we see that the `date` field is recognized as a temporal field in Lux.

In [9]:
df.dataType

{'quantitative': ['value'], 'ordinal': [], 'nominal': [], 'temporal': ['date']}

In [10]:
view.load(df)

LuxWidget(current_view={'config': {'view': {'continuousWidth': 400, 'continuousHeight': 300}}, 'data': {'name'…

<View  (x: date, y: MEAN(value)) mark: line, score: 0.0 >

## Advanced Date Manipulation
You might have noticed that every date in our example dataset falls on the first of the month. In such cases, we may want to show only the year and month instead of the full date. Here, we look at how to handle these cases.

Below we look at an example stocks dataset that also has a `date` field, with each row representing data for the first of a month.

In [11]:
df = pd.read_csv("./data/stocks.csv")

df.dtypes

symbol     object
date       object
price     float64
dtype: object

In [12]:
view = View(["date","price"])
view.load(df)

LuxWidget(current_view={'config': {'view': {'continuousWidth': 400, 'continuousHeight': 300}, 'mark': {'toolti…

<View  (x: MEAN(price), y: date) mark: bar, score: 0.0 >

If we only want Lux to output the month and the year, we can convert the column to a [`PeriodIndex`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.PeriodIndex.html) using [`to_period`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.to_period.html). The `freq` argument specifies the granularity of the output. In this case, we use 'M' for monthly. You can find more about how to specify time periods [here](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#dateoffset-objects).

In [13]:
df["date"] = pd.DatetimeIndex(df["date"]).to_period(freq='M')
df


In [14]:
view.load(df)

LuxWidget(current_view={'config': {'view': {'continuousWidth': 400, 'continuousHeight': 300}}, 'data': {'name'…

<View  (x: date, y: MEAN(price)) mark: line, score: 0.0 >

In [15]:
df['date']

0      2000-01
1      2000-02
2      2000-03
3      2000-04
4      2000-05
        ...   
555    2009-11
556    2009-12
557    2010-01
558    2010-02
559    2010-03
Name: date, Length: 560, dtype: period[M]
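The monthly conversion can be reproduced without the CSV file. This is a small sketch on a hypothetical `monthly` frame with placeholder prices; it also shows the `.dt.to_period` accessor, which should give the same result when the column is already a datetime Series.

```python
import pandas as pd

# Hypothetical frame; the prices are placeholders, not values from stocks.csv.
monthly = pd.DataFrame({
    "date": ["2000-01-01", "2000-02-01", "2000-03-01"],
    "price": [1.0, 2.0, 3.0],
})

# Route used in the tutorial: wrap the column in a DatetimeIndex, then convert.
via_index = pd.DatetimeIndex(monthly["date"]).to_period(freq="M")

# Alternative route: parse to datetime first, then use the .dt accessor.
via_accessor = pd.to_datetime(monthly["date"]).dt.to_period("M")

print(list(via_index.astype(str)))  # ['2000-01', '2000-02', '2000-03']
print(via_accessor.dtype)           # period[M]
```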

## Querying With Datetime Fields
The string representation seen in the dataframe can be used to filter for certain dates.
For example, in the above `stocks` dataset, we converted the date column to a `PeriodIndex`. Now the string representation only shows the granularity we want to see.

We can use that string representation to filter the dataframe in Pandas:

In [16]:
df[df['date'] == '2008-11']


We can also use the same string representation for specifying a query in Lux.

In [17]:
view = View(["date=2008-11","price","symbol"])
view.load(df)

LuxWidget(current_view={'config': {'view': {'continuousWidth': 400, 'continuousHeight': 300}, 'mark': {'toolti…

<View  (x: MEAN(price), y: symbol -- [date=2008-11]) mark: bar, score: 0.0 >
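The Pandas filter from earlier in this section can also be written with an explicit `pd.Period`, which makes the intended monthly granularity visible in the code. The sketch below runs on a hypothetical `quotes` frame with placeholder symbols and prices, not on the stocks data.

```python
import pandas as pd

# Hypothetical frame with placeholder values, already stored at monthly granularity.
quotes = pd.DataFrame({
    "symbol": ["AAA", "BBB", "AAA", "BBB"],
    "date": pd.PeriodIndex(["2008-10", "2008-10", "2008-11", "2008-11"], freq="M"),
    "price": [1.0, 2.0, 3.0, 4.0],
})

# Compare the period column against an explicit monthly Period.
target = pd.Period("2008-11", freq="M")
november = quotes[quotes["date"] == target]
print(november)
```

Comparing against a `pd.Period` avoids relying on string parsing inside the comparison, at the cost of a slightly longer expression.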