## unit test

In [13]:
from jnoteworkflow.data import get_Fremont_data
import pandas as pd

In [3]:
data = get_Fremont_data()

let's create a test that examines the output of get_Fremont_data and makes sure it conforms to what we expect!

In [10]:
# make sure that the final loaded data has these column names/variables
assert all(data.columns == ['West', 'East', 'Total'])

In [16]:
# make sure the index is a pandas DatetimeIndex
assert isinstance(data.index, pd.DatetimeIndex)

we can put these tests into a function

In [18]:
from jnoteworkflow.data import get_Fremont_data
import pandas as pd
def test_Fremont_data():
    data = get_Fremont_data()
    assert all(data.columns == ['West', 'East', 'Total'])
    assert isinstance(data.index, pd.DatetimeIndex)

In [19]:
test_Fremont_data()

if it runs without errors, we know the function behaves as expected! But generally speaking you do not want to run this manually. Instead, there are unit test frameworks in Python that let you run these test cases automatically!
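To see what a failing check looks like without loading the real data, here is a minimal sketch that applies the same two assertions to a small stand-in DataFrame. The helper name check_frame and the sample values are hypothetical; only the column names and the index type come from the tests above.

```python
import pandas as pd


def check_frame(data):
    """Hypothetical helper: the same two checks as test_Fremont_data."""
    # Comparing plain lists also fails cleanly when the number of columns differs.
    assert list(data.columns) == ['West', 'East', 'Total']
    assert isinstance(data.index, pd.DatetimeIndex)


# Stand-in data with made-up counts (not the real Fremont numbers).
index = pd.to_datetime(['2012-10-03 00:00:00', '2012-10-03 01:00:00'])
good = pd.DataFrame({'West': [4, 4], 'East': [9, 6], 'Total': [13, 10]},
                    index=index)
check_frame(good)  # passes silently

# Same data, but with the index left as strings (as if dates were never parsed).
bad = good.copy()
bad.index = bad.index.astype(str)
try:
    check_frame(bad)
    caught_failure = False
except AssertionError:
    caught_failure = True

print('string index rejected:', caught_failure)
```

The second frame fails the isinstance check, which is exactly the kind of regression the test is meant to catch.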

### pytest
https://docs.pytest.org/en/latest/

1. make a directory: jnoteworkflow/tests
2. touch jnoteworkflow/tests/test_data.py
3. save the above function cell into test_data.py
4. invoke pytest by: python -m pytest jnoteworkflow # this runs every test found in the jnoteworkflow folder

### speed up parsing process

in the previous code, parsing the datetime strings took a long time.

In [20]:
from jnoteworkflow.data import get_Fremont_data
import pandas as pd

In [26]:
# without parse_dates, the index dtype is just object (strings) rather than datetime
data = pd.read_csv('Fremont.csv', index_col='Date')
data.index.dtype

dtype('O')

In [24]:
# but parsing the dates with parse_dates=True takes a lot of time:
data = pd.read_csv('Fremont.csv', index_col='Date', parse_dates=True)

In [27]:
# alternatively, we could convert the string index after loading
pd.to_datetime(data.index)
# this does the same work as parse_dates=True, so it gives no speed-up

DatetimeIndex(['2012-10-03 00:00:00', '2012-10-03 01:00:00',
               '2012-10-03 02:00:00', '2012-10-03 03:00:00',
               '2012-10-03 04:00:00', '2012-10-03 05:00:00',
               '2012-10-03 06:00:00', '2012-10-03 07:00:00',
               '2012-10-03 08:00:00', '2012-10-03 09:00:00',
               ...
               '2018-04-30 14:00:00', '2018-04-30 15:00:00',
               '2018-04-30 16:00:00', '2018-04-30 17:00:00',
               '2018-04-30 18:00:00', '2018-04-30 19:00:00',
               '2018-04-30 20:00:00', '2018-04-30 21:00:00',
               '2018-04-30 22:00:00', '2018-04-30 23:00:00'],
              dtype='datetime64[ns]', name='Date', length=48864, freq=None)

but if we specify the format we may gain some speed up:

http://strftime.org/


In [28]:
# use %I (12-hour clock) with %p; with %H the AM/PM marker is ignored
pd.to_datetime(data.index, format='%m/%d/%Y %I:%M:%S %p')

DatetimeIndex(['2012-10-03 00:00:00', '2012-10-03 01:00:00',
               '2012-10-03 02:00:00', '2012-10-03 03:00:00',
               '2012-10-03 04:00:00', '2012-10-03 05:00:00',
               '2012-10-03 06:00:00', '2012-10-03 07:00:00',
               '2012-10-03 08:00:00', '2012-10-03 09:00:00',
               ...
               '2018-04-30 14:00:00', '2018-04-30 15:00:00',
               '2018-04-30 16:00:00', '2018-04-30 17:00:00',
               '2018-04-30 18:00:00', '2018-04-30 19:00:00',
               '2018-04-30 20:00:00', '2018-04-30 21:00:00',
               '2018-04-30 22:00:00', '2018-04-30 23:00:00'],
              dtype='datetime64[ns]', name='Date', length=48864, freq=None)
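
a note on the hour directive: in Python's strptime, %p (AM/PM) only affects the parsed hour when the hour is read with %I (12-hour clock); with %H the marker is matched but ignored. A minimal sketch with the standard library, assuming the raw strings in the file look like '10/03/2012 01:00:00 PM' (the sample strings here are made up to match the format string):

```python
from datetime import datetime

afternoon = '10/03/2012 01:00:00 PM'
midnight = '10/03/2012 12:00:00 AM'

# %H reads the hour as a 24-hour value, so the AM/PM marker changes nothing.
pm_with_H = datetime.strptime(afternoon, '%m/%d/%Y %H:%M:%S %p')
am_with_H = datetime.strptime(midnight, '%m/%d/%Y %H:%M:%S %p')

# %I reads the hour on a 12-hour clock, so %p shifts it as expected.
pm_with_I = datetime.strptime(afternoon, '%m/%d/%Y %I:%M:%S %p')
am_with_I = datetime.strptime(midnight, '%m/%d/%Y %I:%M:%S %p')

print(pm_with_H.hour, pm_with_I.hour)  # 1 13
print(am_with_H.hour, am_with_I.hour)  # 12 0
```

So an index parsed with %H shows midnight as 12:00 and afternoon hours as morning hours, while %I gives the intended 24-hour timestamps.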

which is way faster!
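
to put a number on the difference, the two calls can be timed side by side. This is a sketch on synthetic strings rather than on Fremont.csv; the 20000 hourly timestamps, the timed helper, and the 12-hour format string are assumptions for illustration. How large the gap is depends on the pandas version and the machine, so no timings are quoted here.

```python
import time

import numpy as np
import pandas as pd

FMT = '%m/%d/%Y %I:%M:%S %p'

# Synthetic hourly timestamps rendered as strings in the assumed file format.
stamps = pd.Timestamp('2012-10-03') + pd.to_timedelta(np.arange(20000), unit='h')
strings = pd.Index(stamps.strftime(FMT))


def timed(func):
    """Hypothetical helper: run func once and return (result, seconds)."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


inferred, t_inferred = timed(lambda: pd.to_datetime(strings))
explicit, t_explicit = timed(lambda: pd.to_datetime(strings, format=FMT))

print(f'no format: {t_inferred:.3f}s   explicit format: {t_explicit:.3f}s')

# Both routes should produce the same timestamps.
assert explicit.equals(inferred)
```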

just to be on the safe side:

In [30]:
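try:
    # fast path: explicit format (%I pairs with %p for 12-hour AM/PM times)
    data.index = pd.to_datetime(data.index, format='%m/%d/%Y %I:%M:%S %p')
except (TypeError, ValueError):
    # fall back to the slower, general parser if the format does not match
    data.index = pd.to_datetime(data.index)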

this final part substantially reduces the running time of the code.
it can be copied into the original get_Fremont_data function, and after re-running the tests the improvement in performance is obvious.
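
one way to package the fallback so that it can be dropped into a loader is a small helper. This is a sketch only: the name parse_index and the constant FREMONT_FORMAT are hypothetical and not part of jnoteworkflow, and catching ValueError in addition to TypeError is an assumption about how pandas reports a format mismatch.

```python
import pandas as pd

# Assumed layout of the raw date strings (12-hour clock with AM/PM).
FREMONT_FORMAT = '%m/%d/%Y %I:%M:%S %p'


def parse_index(index, fmt=FREMONT_FORMAT):
    """Hypothetical helper: try the fast explicit format, else parse generally."""
    try:
        return pd.to_datetime(index, format=fmt)
    except (TypeError, ValueError):
        # The strings did not match fmt, so let pandas work out the format.
        return pd.to_datetime(index)


# Strings in the expected format take the fast path.
fast = parse_index(pd.Index(['10/03/2012 12:00:00 AM', '10/03/2012 01:00:00 PM']))

# Strings in another layout fall back to the general parser.
fallback = parse_index(pd.Index(['2012-10-03 00:00:00', '2012-10-03 13:00:00']))

print(fast.equals(fallback))
```

Inside a loader this would be used as data.index = parse_index(data.index), after which test_Fremont_data should still pass.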