## Advising Foot Traffic Data

This notebook processes the foot traffic data from each advising location. The data shows how many students call, email, or walk into the office throughout the year. It is cleaned here in Python before being loaded into a Power BI report. Real-time streaming is not necessary because the Director and VP do not review the report that frequently. Instead, the data is processed in monthly batches, or more often on request.

<div class="alert" style="background-color:#ffa590;"><strong>Note:</strong> 
Before you download the CSVs, convert all of the *date* columns to MM/DD/YY format so that the to_datetime() method can parse them correctly.
</div>
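As a minimal sketch of why the MM/DD/YY conversion matters, the example below parses a small, invented column with `pd.to_datetime` and an explicit `format`. The sample values and the `errors='coerce'` choice are assumptions for illustration only; the actual parsing happens inside `new_dashboard_cleaned_dates`, whose internals are not shown in this notebook.

```python
import pandas as pd

# Invented sample of a date column as it might be exported from a sign-in sheet.
raw_dates = pd.Series(['01/26/22', '02/03/22', '2022-02-04', ''])

# With an explicit format, anything that is not MM/DD/YY becomes NaT
# instead of being guessed at or raising an error.
parsed = pd.to_datetime(raw_dates, format='%m/%d/%y', errors='coerce')

# Rows that failed to parse can be listed and corrected in the CSV.
bad_rows = raw_dates[parsed.isna()]

print(parsed)
print(bad_rows)
```

Listing the rows that failed to parse makes it easier to spot a sheet that was downloaded before its date columns were converted.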

In [1]:
import pandas as pd
import numpy as np
import re
import sklearn

import os
from pathlib import Path
import sys

import warnings
warnings.filterwarnings('ignore')

# Path to code folder
CODE_FOLDER = Path('code')
CODE_FOLDER.mkdir(exist_ok=True)
sys.path.extend([f"./{CODE_FOLDER}"])

# Path to data
DATA_PATH = Path.cwd()/'Data'

# Path to saved files
COMBINED_LOCATIONS = 'Files/mashup.csv'
DASHBOARD_SETUP = 'Files/Dashboard Setup.csv'

## New Dashboard Setup

In [2]:
from utilities import retrieve_and_open_csv_files
from processing import new_dashboard_cleaned_dates, dashboard_setup

#Start setting up the dashboard with the new sign-in sheet setup. (1.26.22)
dash = retrieve_and_open_csv_files(DATA_PATH, keyword='Advising')

In [3]:
# Clean dates, set up unique ID, clean up location data
dash = new_dashboard_cleaned_dates(dash)

In [4]:
#Setup 'Type of Student' column
stype = dashboard_setup(dash, ['CURRENT STUDENT', 'NEW STUDENT',
                               'RETURNING STUDENT', 
                               'WORKFORCE', 'VETERAN', 'ATHLETE'], 'Type of Student')

reason = dashboard_setup(dash, ['ENROLL', 'ADD/DROP', 'QUESTIONS', 'MAJOR CHANGE', 'DEGREE CHECK'], 'Reason For Visit')

In [5]:
#Add new columns to the dataframe
dash['TYPE OF STUDENT'], dash['REASON FOR VISIT'] = stype, reason

In [6]:
#Convert the x's in the HIGH SCHOOL column to 'HIGH SCHOOL' because the
#program written above was creating duplicates. The student workers,
#advising staff, and front office manager are inconsistent about when they
#mark a student only as high school and when they mark a student as
#"current" or "new" *and* high school. This caused the original program to
#create duplicate indices. To track both the 'current' and 'high school'
#attributes, I created this separate column.
dash['MOD HIGH SCHOOL'] = ['HIGH SCHOOL' if i in ('x', 'X') else '' for i in dash['HIGH SCHOOL']]
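Because staff mark this column inconsistently, a slightly more tolerant version of the same idea can be sketched as follows. It assumes, hypothetically, that marks may also arrive with stray spaces or as blank or missing cells; the sample data is invented and the original comparison against `'x'` and `'X'` is otherwise unchanged.

```python
import pandas as pd

# Invented sample: the kinds of marks a HIGH SCHOOL column might contain.
sample = pd.DataFrame({'HIGH SCHOOL': ['x', 'X', ' x ', None, '']})

# Normalize to a stripped, upper-case string, then flag anything equal to 'X'.
marks = sample['HIGH SCHOOL'].fillna('').astype(str).str.strip().str.upper()
sample['MOD HIGH SCHOOL'] = marks.eq('X').map({True: 'HIGH SCHOOL', False: ''})

print(sample)
```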


In [7]:
#As in the older mashup workflow, we need to create columns for month
#and day. Below is that process.
month_dict = {1: 'Jan', 2: 'Feb', 3: 'Mar', 4: 'Apr', 5: 'May', 6: 'Jun',
              7: 'Jul', 8: 'Aug', 9: 'Sep', 10: 'Oct', 11: 'Nov', 12: 'Dec'}

#Comprehension for Month
mon = [month_dict.get(i) for i in list(dash['DATE'].dt.month)]

#Create dictionary for days of the week
day_of_week = {0: 'Mon', 1:'Tues', 2:'Wed', 3:'Thur', 4:'Fri', 5:'Sat', 6:'Sun'}

#Comprehension for Days
dow = [day_of_week.get(i) for i in list(dash['DATE'].dt.weekday)]

#Combine new columns with old dash df
dash['MONTH'], dash['DAY'] = mon, dow
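The same lookups can also be written with `Series.map`, which skips the intermediate lists. This is only a sketch on invented dates; the dictionaries repeat the abbreviations used in the cell above so the example runs on its own.

```python
import pandas as pd

# Invented sample dates; the real DATE column comes from the sign-in sheets.
df = pd.DataFrame({'DATE': pd.to_datetime(['2022-01-26', '2022-02-05'])})

month_dict = {1: 'Jan', 2: 'Feb', 3: 'Mar', 4: 'Apr', 5: 'May', 6: 'Jun',
              7: 'Jul', 8: 'Aug', 9: 'Sep', 10: 'Oct', 11: 'Nov', 12: 'Dec'}
day_of_week = {0: 'Mon', 1: 'Tues', 2: 'Wed', 3: 'Thur', 4: 'Fri', 5: 'Sat', 6: 'Sun'}

# Map the numeric month and weekday (Monday = 0) to their abbreviations.
df['MONTH'] = df['DATE'].dt.month.map(month_dict)
df['DAY'] = df['DATE'].dt.weekday.map(day_of_week)

print(df)
```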

In [8]:
%%capture --no-display 

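#Copy the selection so the column assignments below modify an independent
#dataframe rather than a slice of dash.
final_dash = dash[['ID', 'DATE', 'NAME', 'TIME RANGE', 'MONTH', 'DAY', 'LOCATION',
                   'APPT', 'DISTANCE', 'TYPE OF STUDENT', 'REASON FOR VISIT', 
                   'MOD HIGH SCHOOL', 'ADVISOR SIGN', 'ADV TIME']].copy()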

#The final step is to create a column that helps us filter advisors by 
#employment type (i.e. 'Emp Type')
sign = {'KL':'28hr', 'KZ':'28hr', 'CJ':'40hr', 'SLB':'28hr','SV':'28hr',
         'BG':'40hr', 'AKP':'28hr', 'JW':'40hr', 'AS':'40hr', 'SP':'40hr',
         'SB':'20hr', 'DS':'Boss', 'AP':'40hr', 'SS': 'Not Sure', 'MZ':'Adm',
         'JD':'Adm', 'MP':'Adm', 'BM':'40hr', 'CS':'Not Sure', 'RM':'40hr',
         'SH':'40hr', 'KLA':'28hr', 'GR':'Adj', 'DR':'40hr', 'TB':'28hr', 
         'JEC':'28hr', 'KB':'28hr', 'KA':'28hr', '':'N/A'}

#When a cell is left blank, pandas records it as NaN (a float), which
#throws an error when calling .upper() in the comprehension. The code below
#deals with that problem by filling all of the NaNs with a blank string.
final_dash['ADVISOR SIGN'] = final_dash['ADVISOR SIGN'].fillna('')

s = [i.upper() for i in final_dash['ADVISOR SIGN']]

final_dash['ADVISOR SIGN'] = s

#Link dictionary we created to the advisor signatures and create new 
#columns with this new attribute
s2 = [sign.get(i) for i in final_dash['ADVISOR SIGN']]

final_dash['Emp Type'] = s2
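One thing to watch with `dict.get` is that initials missing from the dictionary come back as `None`. Below is a minimal sketch of surfacing those cases. It uses a shortened lookup, invented sample signatures, and an `'Unknown'` label, all of which are assumptions for illustration rather than part of the original setup.

```python
import pandas as pd

# Shortened lookup for illustration; the full dictionary is defined in the cell above.
sign = {'KL': '28hr', 'CJ': '40hr', '': 'N/A'}

# Invented sample signatures, including a blank cell and unknown initials.
df = pd.DataFrame({'ADVISOR SIGN': ['kl', 'CJ', None, 'zz']})

# Fill blanks and upper-case the initials.
df['ADVISOR SIGN'] = df['ADVISOR SIGN'].fillna('').str.upper()

# Initials not found in the lookup map to NaN; label them explicitly.
df['Emp Type'] = df['ADVISOR SIGN'].map(sign).fillna('Unknown')

# List the initials that still need an entry in the lookup.
unmapped = sorted(set(df.loc[df['Emp Type'] == 'Unknown', 'ADVISOR SIGN']))
print(unmapped)
```

Reviewing `unmapped` after each monthly batch would show when a new advisor has started signing the sheet.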

In [9]:
#Export dashboard to csv (to_csv returns None when given a path, so there
#is nothing to assign)
final_dash.to_csv(DASHBOARD_SETUP, index = False)

### `Deprecated` The Code Below Is For The Old Dashboard Setup

In the first iteration of the sign-in sheets, each site had some freedom in how it used the sheet and what it included. The new version standardized the column headings, their order, and how they are used (in theory). I say in theory because people inevitably enter data with different intentions and sometimes with different markings. One challenge in the office is that student workers come and go frequently, which makes training and standards difficult to maintain. I do not train new staff; that responsibility falls to the Office Manager. Only in extreme circumstances do I step in to instruct new staff on the sign-in sheet.

In [None]:
from processing import (
    modify_df, 
    create_dataframe, 
    clean_location_data, 
    dashboard_setup
)

boa = retrieve_and_open_csv_files(DATA_PATH, keyword = 'BOA')
boe = retrieve_and_open_csv_files(DATA_PATH, keyword = 'BOE')
bsc = retrieve_and_open_csv_files(DATA_PATH, keyword = 'BSC')
bom = retrieve_and_open_csv_files(DATA_PATH, keyword = 'BOM')

In [None]:
#Apply the program to each of the sign-in sheets. 
boa2 = modify_df(boa, "BOA")
boe2 = modify_df(boe, "BOE")
bsc2 = modify_df(bsc, "BSC")
bom2 = modify_df(bom, "BOM")

# Combine locations
all_locations = (pd.concat([boa2, boe2, bsc2, bom2])
                   .reset_index(drop = True)
                )

# Clean data
all_locations = clean_location_data(all_locations, COMBINED_LOCATIONS)

**Number Seen by Each Advisor**

In [None]:
create_dataframe(all_locations, "ADVISOR SIGN")

**Distance Students Seen, by Modality**

In [None]:
distance = create_dataframe(all_locations, "DISTANCE")

ls = []

for i in list(distance['DISTANCE']):
    if i not in ['Phone', 'Email', 'Central Adv']:
        ls.append('Appt or In-Person')
    else:
        ls.append(i)

distance['DISTANCE'] = ls

distance
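The bucketing loop above can also be expressed with `Series.where`, which keeps values that meet a condition and replaces the rest. The sketch below runs on an invented column, since the exact output of `create_dataframe` is not shown here.

```python
import pandas as pd

# Invented sample of DISTANCE values, including a missing entry.
distance = pd.DataFrame({'DISTANCE': ['Phone', 'Email', 'Central Adv', 'Zoom', None]})

keep = ['Phone', 'Email', 'Central Adv']

# Keep the three named modalities; everything else becomes 'Appt or In-Person'.
distance['DISTANCE'] = distance['DISTANCE'].where(
    distance['DISTANCE'].isin(keep), 'Appt or In-Person'
)

print(distance)
```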

**Appointment Type**

In [None]:
appt = create_dataframe(all_locations, 'APPT')

ap = []

for i in list(appt['APPT']):
    if i in ['Phone', 'Zoom', 'In Person']:
        ap.append(i)
    else:
        ap.append('Walk In')

appt['APPT'] = ap

appt

**Data by Site: How Many Students Assisted by Location**

In [None]:
create_dataframe(all_locations, "LOCATION")

mask1 = all_locations['LOCATION'].isin(['BSC', 'BOM'])

appt_types = all_locations[mask1].reset_index()

pd.DataFrame(appt_types.groupby('DISTANCE')['index'].count())\
  .reset_index()\
  .rename(columns = {'index':'Num Of Students'})

**Day of Week Students Arrive**

In [None]:
dow = create_dataframe(all_locations, 'DAY')
day_of_week = {'Mon':0 , 'Tues':1, 'Wed':2, 'Thur':3, 'Fri':4, 'Sat':5, 'Sun':6}
filt = [day_of_week.get(i) for i in dow['DAY']]
dow['FILTER'] = filt
dow = dow.sort_values('FILTER').drop('FILTER', axis = 1).reset_index(drop = True)
dow
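An alternative to the temporary `FILTER` column is an ordered categorical, which lets `sort_values` follow weekday order directly. This is a sketch on invented counts; the `'Num Of Students'` column name and the numbers are assumptions, not the output of `create_dataframe`.

```python
import pandas as pd

# Invented counts per day, deliberately out of weekday order.
dow = pd.DataFrame({'DAY': ['Wed', 'Mon', 'Fri', 'Tues'],
                    'Num Of Students': [12, 30, 8, 21]})

order = ['Mon', 'Tues', 'Wed', 'Thur', 'Fri', 'Sat', 'Sun']

# An ordered categorical sorts by position in `order` rather than alphabetically.
dow['DAY'] = pd.Categorical(dow['DAY'], categories=order, ordered=True)
dow = dow.sort_values('DAY').reset_index(drop=True)

print(dow)
```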