# Creating a rule-based expert system for a virtual life coach

The virtual life coach (VLC) is an application that gives structure to the lives of students with ADHD by planning out their days. This notebook develops the algorithm that does the planning.
The prototype of the VLC uses a rule-based expert system. A non-machine-learning approach was chosen because too little data is available. Either way, the application needs a core decision maker as its foundation; machine learning and other features can be added later.

The available data covers a single study day, so the algorithm also plans a study day for the student. When more data becomes available, for instance enough days to form weeks, the algorithm could be tuned to make better decisions and to plan weekend days.

In [191]:
# importing libraries
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import datetime as dt
from datetime import timedelta
from datetime import datetime
import numpy as np

## Data import & cleaning

The data is retrieved from the respondents' Google Spreadsheet files. To use it, the relevant cells are selected, a new dataframe is created and the data is cleaned. Every row stands for half an hour, so that time is handled consistently throughout the prototype.

In [192]:
df = pd.read_csv('dayplanner-n-2.csv')
df = df.rename({'Unnamed: 2': 'N_actual_old', 'DAY PLANNING':'time'}, axis=1)
df = df[3:51].reset_index()
df = df[['time', 'N_actual_old']]
df.head()

Unnamed: 0,time,N_actual_old
0,12:00:00 AM,sleep
1,12:30:00 AM,sleep
2,1:00:00 AM,sleep
3,1:30:00 AM,sleep
4,2:00:00 AM,sleep


In [193]:
# collapse free-text labels into their activity category
# (df.loc[mask, column] avoids chained assignment on a possible copy)
for category in ['health', 'me', 'study', 'routine', 'social', 'work']:
    df.loc[df['N_actual_old'].str.contains(category), 'N_actual_old'] = category
df['time'] = pd.to_datetime(df['time'])
df.head()

Unnamed: 0,time,N_actual_old
0,2022-01-22 00:00:00,sleep
1,2022-01-22 00:30:00,sleep
2,2022-01-22 01:00:00,sleep
3,2022-01-22 01:30:00,sleep
4,2022-01-22 02:00:00,sleep


## Data analysis
Here I explore the data of the student's day. First I check how many hours the student spent on each activity category. I use this data later to make decisions for the upcoming day.

In [194]:
df_cats = pd.Series(['sleep', 'study', 'routine', 'me', 'social', 'health', 'distracted'], name='categories')

df_hours_spend = df['N_actual_old'].value_counts()
df_hours_spend = (df_hours_spend / 2)
df_hours_spend['sleep'] = (24 - (df_hours_spend.sum()-df_hours_spend['sleep']))
# name the index and the values explicitly, so the result does not depend on how value_counts() labels them
df_hours_spend = df_hours_spend.rename_axis('categories').reset_index(name='hours')

df_hours_spend = pd.merge(df_cats, df_hours_spend, on="categories", how='outer').fillna(0)
df_hours_spend

Unnamed: 0,categories,hours
0,sleep,8.5
1,study,7.5
2,routine,4.0
3,me,2.5
4,social,0.0
5,health,1.0
6,distracted,0.5


In [195]:
df_hours_spend.set_index('categories')

categories,hours
sleep,8.5
study,7.5
routine,4.0
me,2.5
social,0.0
health,1.0
distracted,0.5


To find the student's maximum study time, I look at the old day and count the longest run of consecutive half-hour blocks spent studying.

In [196]:
df_hours_studying = df["N_actual_old"] == 'study'
b = df_hours_studying.cumsum()
b = b.sub(b.mask(df_hours_studying).ffill().fillna(0)).astype(int)
studying_max = b.max()
studying_max

6
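
The `cumsum` / `mask` / `ffill` combination above is compact but not obvious. As a minimal sketch on a hypothetical toy series (the values and the names `is_study`, `running`, `baseline` and `run_length` are made up for illustration), each intermediate step looks like this:

```python
import pandas as pd

# Hypothetical toy day: True marks a half-hour block spent studying.
is_study = pd.Series([False, True, True, False, True, True, True, False])

# Running total of study blocks seen so far: 0, 1, 2, 2, 3, 4, 5, 5
running = is_study.cumsum()

# Keep the running total only on non-study rows, carry it forward across the
# study rows, and let a run at the very start count from zero.
baseline = running.mask(is_study).ffill().fillna(0)

# Running total minus baseline = length of the current study run at each row.
run_length = running.sub(baseline).astype(int)

longest_run = run_length.max()
print(run_length.tolist())  # [0, 1, 2, 0, 1, 2, 3, 0]
print(longest_run)          # 3
```

The result is counted in rows, so a value of 6 in the notebook corresponds to three consecutive hours of studying.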

## Algorithm prototyping
To create the algorithm I distinguish three periods of the day: morning, afternoon and evening. Each period can then be filled in afterwards with its own set of if statements.

First I create the variables that vary per person. In the application, this data would normally be retrieved from the user.

In [212]:
wake_up_time = pd.to_datetime('08:00:00')
bed_time = pd.to_datetime('23:30:00')
m_routine_time = timedelta(hours=1)
e_routine_time = timedelta(hours=1)
lunch_time = pd.to_datetime('13:00:00')
lunch_time_hours = timedelta(hours=1)
dinner_time = pd.to_datetime('19:00:00')
dinner_time_hours = timedelta(hours=1)
desired_studytime = 14  # counted in half-hour rows, i.e. 7 hours

Second, I fill in all activities that have a fixed time of day. The three periods are then automatically the gaps left between them.

In [213]:
# wake up time and bed time
df.loc[df['time'] < wake_up_time, 'N_actual_new'] = 'sleep'
df.loc[df['time'] >= bed_time, 'N_actual_new'] = 'sleep'

# morning and evening routine
actual_mroutine_time = wake_up_time + m_routine_time  
df.loc[(df['time'] < actual_mroutine_time) & (df['time'] >= wake_up_time), 'N_actual_new'] = 'routine'

actual_eroutine_time = bed_time - e_routine_time
df.loc[(df['time'] >= actual_eroutine_time) & (df['time'] < bed_time), 'N_actual_new'] = 'routine'

# lunch and dinner
df.loc[(df['time'] >= lunch_time) & (df['time'] < lunch_time + lunch_time_hours), 'N_actual_new'] = 'routine'
df.loc[(df['time'] >= dinner_time) & (df['time'] < dinner_time + dinner_time_hours), 'N_actual_new'] = 'routine'
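
The fixed anchors can be tried out without the spreadsheet export. The following is a minimal sketch on a synthetic half-hour grid; `day`, `plan` and `fill_block` are hypothetical names that do not appear in the notebook, and the times are example values.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's dataframe: one row per half hour.
day = pd.DataFrame({'time': pd.date_range('2022-01-22 00:00', periods=48, freq='30min')})
day['plan'] = None


def fill_block(frame, start, duration, label):
    """Label every half-hour row in the interval [start, start + duration)."""
    in_block = (frame['time'] >= start) & (frame['time'] < start + duration)
    frame.loc[in_block, 'plan'] = label


wake_up = pd.Timestamp('2022-01-22 08:00')
lunch = pd.Timestamp('2022-01-22 13:00')
one_hour = pd.Timedelta(hours=1)

fill_block(day, wake_up, one_hour, 'routine')  # morning routine
fill_block(day, lunch, one_hour, 'routine')    # lunch

# The rows between the two anchors are still empty: that gap is the morning.
morning = day[(day['time'] >= wake_up + one_hour) & (day['time'] < lunch)]
print(len(morning))  # 8 half-hour rows, from 09:00 up to 12:30
```

Using half-open intervals (start included, end excluded) keeps adjacent blocks from claiming the same row.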

### Morning
The morning is mostly for studying. According to the therapists, the most important thing is that the student starts studying right away for at least an hour, to kick off the day well. If the student has only 2 hours until lunch, both hours are scheduled for study; otherwise there is a break in the middle. If the student has 3 hours or more until lunch, the first study block is one hour (according to the therapists, this sets the bar low) and the student is rewarded with 'me time' afterwards. If there are more than 4 hours, 'me time' is scheduled until lunch after two blocks of studying.
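
Separately from the dataframe, the morning rules can be sketched as a standalone function that takes the number of free half-hour slots and returns one label per slot. The thresholds are taken from the notebook's if statements; the function name `plan_morning` is hypothetical.

```python
def plan_morning(n_slots):
    """Return one label per half-hour slot between morning routine and lunch."""
    plan = ['study'] * n_slots
    if n_slots in (5, 6):
        plan[2] = 'me'                     # half-hour break after the first hour
    elif n_slots in (7, 8):
        plan[2:4] = ['me', 'me']           # one-hour break after the first hour
    elif n_slots > 8:
        plan[8:] = ['me'] * (n_slots - 8)  # free time from the fifth hour until lunch
    return plan


print(plan_morning(6))
# ['study', 'study', 'me', 'study', 'study', 'study']
```

Writing the rules this way makes it easy to check every branch without having to change the wake-up or lunch time.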

In [214]:
# get the correct time block cells
index_morning = df.index[(df['time'] >= actual_mroutine_time) & (df['time'] < lunch_time)].tolist()

# calculate how many half-hours (rows) lie between morning routine and lunch
time_in_morning = len(index_morning)

# if statements
# every case starts from an all-study morning, then breaks are carved out
# (df.loc avoids the chained-assignment warning that df[col].iloc[...] = ... raises)
df.loc[index_morning, 'N_actual_new'] = 'study'

if time_in_morning in (5, 6):
    # one hour of study, then a half-hour break
    df.loc[index_morning[2], 'N_actual_new'] = 'me'

elif time_in_morning in (7, 8):
    # one hour of study, then a one-hour break
    df.loc[index_morning[2:4], 'N_actual_new'] = 'me'

elif time_in_morning > 8:
    # more than four hours: 'me' time from the fifth hour until lunch
    df.loc[index_morning[8:], 'N_actual_new'] = 'me'


### Afternoon
This is normally the longest period of the day. Therefore, it is important to know how much time the student has already spent studying and how much time is left until dinner.

A while loop runs as long as there are still half-hour blocks to fill. If the student still has study time left to schedule, it fills in the longest study period followed by a break. It repeats this until the desired study time is reached or until dinner time.
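
The same loop can be sketched on plain lists, which makes the stopping conditions easier to follow. This is an illustration under assumptions, not the notebook's implementation: `plan_afternoon` and `break_len` are hypothetical names, and all quantities are counted in half-hour slots.

```python
def plan_afternoon(n_slots, studied_so_far, desired, max_block, break_len=2):
    """Return one label per free half-hour slot between lunch and dinner."""
    plan = []
    while len(plan) < n_slots:
        free = n_slots - len(plan)
        still_wanted = desired - studied_so_far
        if still_wanted <= 0:
            # study goal reached: the rest of the afternoon is free time
            plan += ['me'] * free
        elif free <= still_wanted:
            # not enough time left to overshoot the goal: study until dinner
            plan += ['study'] * free
            studied_so_far += free
        else:
            # longest study block the student managed before, then a break
            block = min(max_block, free)
            plan += ['study'] * block
            studied_so_far += block
            plan += ['me'] * min(break_len, n_slots - len(plan))
    return plan


# 10 free slots, 6 slots already studied in the morning, goal of 14, longest run 6
print(plan_afternoon(10, 6, 14, 6))
# ['study', 'study', 'study', 'study', 'study', 'study', 'me', 'me', 'study', 'study']
```

Because every pass through the loop appends at least one slot (as long as `break_len` is positive), the loop always terminates.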

In [215]:
# get the correct time block cells
actual_lunch_time = lunch_time + lunch_time_hours  
index_afternoon = df.index[(df['time'] >= actual_lunch_time) & (df['time'] < dinner_time)].tolist()

# calculate how many half-hours (rows) lie between lunch and dinner
time_in_afternoon = len(index_afternoon)

def remaining_hours(index):
    hours_left = df["N_actual_new"][index].isnull()
    hl = hours_left.cumsum()
    hl = hl.sub(hl.mask(hours_left).ffill().fillna(0)).astype(int)
    hours_left = hl.max()
    return hours_left

hours_to_go = remaining_hours(index_afternoon)

while hours_to_go != 0:

    # half-hour rows of study scheduled so far, and how many are still wanted
    studying_sofar = (df["N_actual_new"] == 'study').sum()
    study_period = desired_studytime - studying_sofar

    # the afternoon rows that are still empty
    filled = len(index_afternoon) - remaining_hours(index_afternoon)
    remaining_index_afternoon = index_afternoon[filled:]

    if studying_sofar < desired_studytime:

        if hours_to_go <= study_period:
            # the remaining time does not exceed the goal: study until dinner
            df.loc[remaining_index_afternoon, 'N_actual_new'] = 'study'
            break
        else:
            # longest study run from the old day, followed by a one-hour break
            df.loc[remaining_index_afternoon[:studying_max], 'N_actual_new'] = 'study'
            df.loc[remaining_index_afternoon[studying_max:studying_max+2], 'N_actual_new'] = 'me'
            hours_to_go = remaining_hours(index_afternoon)
    else:
        # study goal reached: only the still-empty rows become 'me' time,
        # so study blocks that were already scheduled are not overwritten
        df.loc[remaining_index_afternoon, 'N_actual_new'] = 'me'
        break


### Evening
How to spend the evening is the hardest to decide with the small amount of data available. The algorithm looks at the previous day to see how socially active the student was. If the student spent no more time on social activities than on time for themselves, social time is scheduled until the evening routine. Otherwise 'me' time is scheduled.

In [216]:
# get the correct time block cells
actual_dinner_time = dinner_time + dinner_time_hours  
index_evening = df.index[(df['time'] >= actual_dinner_time) & (df['time'] < actual_eroutine_time)].tolist()

# if statements
old_social = df_hours_spend[df_hours_spend["categories"] == "social"].hours.item()
old_me = df_hours_spend[df_hours_spend["categories"] == "me"].hours.item()

if old_social <= old_me:
    df.loc[index_evening, 'N_actual_new'] = 'social'
else:
    df.loc[index_evening, 'N_actual_new'] = 'me'

## Final planning

In [217]:
df

Unnamed: 0,time,N_actual_old,N_actual_new
0,2022-01-22 00:00:00,sleep,sleep
1,2022-01-22 00:30:00,sleep,sleep
2,2022-01-22 01:00:00,sleep,sleep
3,2022-01-22 01:30:00,sleep,sleep
4,2022-01-22 02:00:00,sleep,sleep
5,2022-01-22 02:30:00,sleep,sleep
6,2022-01-22 03:00:00,sleep,sleep
7,2022-01-22 03:30:00,sleep,sleep
8,2022-01-22 04:00:00,sleep,sleep
9,2022-01-22 04:30:00,sleep,sleep
