This notebook covers feature engineering, preprocessing, and modeling.
I have kept enough of the cleaning code for the notebook to be followed on its own. For the step-by-step cleaning process or the EDA, please refer to 1_cleaning_and_EDA.ipynb.


#### Feature Engineering
* Split Test and Train
* Define New Features
    - Count Features
    - Feature by Feature
    - Total Admission Fees
    - Traveled Feature
* Add Features to Datasets

#### Preprocessing and Modeling
* One Hot Encode
* Encode Labels
* Random Forest


In [1]:
# Import basic libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report


In [2]:
# Importing data
stay_df = pd.read_csv('data/train_data.csv')
stay_test_df = pd.read_csv('data/test_data.csv')
stay_df.head()

Unnamed: 0,case_id,Hospital_code,Hospital_type_code,City_Code_Hospital,Hospital_region_code,Available Extra Rooms in Hospital,Department,Ward_Type,Ward_Facility_Code,Bed Grade,patientid,City_Code_Patient,Type of Admission,Severity of Illness,Visitors with Patient,Age,Admission_Deposit,Stay
0,1,8,c,3,Z,3,radiotherapy,R,F,2.0,31397,7.0,Emergency,Extreme,2,51-60,4911.0,0-10
1,2,2,c,5,Z,2,radiotherapy,S,F,2.0,31397,7.0,Trauma,Extreme,2,51-60,5954.0,41-50
2,3,10,e,1,X,2,anesthesia,S,E,2.0,31397,7.0,Trauma,Extreme,2,51-60,4745.0,31-40
3,4,26,b,2,Y,2,radiotherapy,R,D,2.0,31397,7.0,Trauma,Extreme,2,51-60,7272.0,41-50
4,5,26,b,2,Y,2,radiotherapy,S,D,2.0,31397,7.0,Trauma,Extreme,2,51-60,5558.0,41-50


In [3]:
print(stay_df.shape, stay_test_df.shape)
stay_df.Stay.unique()

(318438, 18) (137057, 17)


array(['0-10', '41-50', '31-40', '11-20', '51-60', '21-30', '71-80',
       'More than 100 Days', '81-90', '61-70', '91-100'], dtype=object)

In [4]:
stay_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 318438 entries, 0 to 318437
Data columns (total 18 columns):
 #   Column                             Non-Null Count   Dtype  
---  ------                             --------------   -----  
 0   case_id                            318438 non-null  int64  
 1   Hospital_code                      318438 non-null  int64  
 2   Hospital_type_code                 318438 non-null  object 
 3   City_Code_Hospital                 318438 non-null  int64  
 4   Hospital_region_code               318438 non-null  object 
 5   Available Extra Rooms in Hospital  318438 non-null  int64  
 6   Department                         318438 non-null  object 
 7   Ward_Type                          318438 non-null  object 
 8   Ward_Facility_Code                 318438 non-null  object 
 9   Bed Grade                          318325 non-null  float64
 10  patientid                          318438 non-null  int64  
 11  City_Code_Patient                  3139

In [5]:
stay_df.describe()[['Available Extra Rooms in Hospital', 'Visitors with Patient', 'Admission_Deposit']]

Unnamed: 0,Available Extra Rooms in Hospital,Visitors with Patient,Admission_Deposit
count,318438.0,318438.0,318438.0
mean,3.197627,3.284099,4880.749392
std,1.168171,1.764061,1086.776254
min,0.0,0.0,1800.0
25%,2.0,2.0,4186.0
50%,3.0,3.0,4741.0
75%,4.0,4.0,5409.0
max,24.0,32.0,11008.0


In [6]:
stay_df.isna().sum()

case_id                                 0
Hospital_code                           0
Hospital_type_code                      0
City_Code_Hospital                      0
Hospital_region_code                    0
Available Extra Rooms in Hospital       0
Department                              0
Ward_Type                               0
Ward_Facility_Code                      0
Bed Grade                             113
patientid                               0
City_Code_Patient                    4532
Type of Admission                       0
Severity of Illness                     0
Visitors with Patient                   0
Age                                     0
Admission_Deposit                       0
Stay                                    0
dtype: int64

In [7]:
stay_test_df.isna().sum()

case_id                                 0
Hospital_code                           0
Hospital_type_code                      0
City_Code_Hospital                      0
Hospital_region_code                    0
Available Extra Rooms in Hospital       0
Department                              0
Ward_Type                               0
Ward_Facility_Code                      0
Bed Grade                              35
patientid                               0
City_Code_Patient                    2157
Type of Admission                       0
Severity of Illness                     0
Visitors with Patient                   0
Age                                     0
Admission_Deposit                       0
dtype: int64

In [8]:
# Clean up column names first by replacing spaces with underscores

stay_df.columns = stay_df.columns.str.replace(' ', '_')

stay_test_df.columns = stay_test_df.columns.str.replace(' ', '_')

stay_df.columns, stay_test_df.columns

(Index(['case_id', 'Hospital_code', 'Hospital_type_code', 'City_Code_Hospital',
        'Hospital_region_code', 'Available_Extra_Rooms_in_Hospital',
        'Department', 'Ward_Type', 'Ward_Facility_Code', 'Bed_Grade',
        'patientid', 'City_Code_Patient', 'Type_of_Admission',
        'Severity_of_Illness', 'Visitors_with_Patient', 'Age',
        'Admission_Deposit', 'Stay'],
       dtype='object'),
 Index(['case_id', 'Hospital_code', 'Hospital_type_code', 'City_Code_Hospital',
        'Hospital_region_code', 'Available_Extra_Rooms_in_Hospital',
        'Department', 'Ward_Type', 'Ward_Facility_Code', 'Bed_Grade',
        'patientid', 'City_Code_Patient', 'Type_of_Admission',
        'Severity_of_Illness', 'Visitors_with_Patient', 'Age',
        'Admission_Deposit'],
       dtype='object'))

In [9]:
# Fill missing Bed_Grade with the most common grade in the train data

bed_grade_mode = stay_df.Bed_Grade.mode()[0]

# assign the result back instead of calling fillna(inplace=True) on a column
stay_df['Bed_Grade'] = stay_df.Bed_Grade.fillna(bed_grade_mode).astype('int')
stay_test_df['Bed_Grade'] = stay_test_df.Bed_Grade.fillna(bed_grade_mode).astype('int')

In [10]:
# Fill missing City_Code_Patient with the City_Code_Hospital of the sample

stay_df.loc[stay_df.City_Code_Patient.isnull(), 'City_Code_Patient'] = stay_df.City_Code_Hospital.loc[stay_df.City_Code_Patient.isnull()]

stay_test_df.loc[stay_test_df.City_Code_Patient.isnull(), 'City_Code_Patient'] = stay_test_df.City_Code_Hospital.loc[stay_test_df.City_Code_Patient.isnull()]

stay_df.City_Code_Patient = stay_df.City_Code_Patient.astype('int')
stay_test_df.City_Code_Patient = stay_test_df.City_Code_Patient.astype('int')

In [11]:
stay_df.case_id.nunique() == len(stay_df)

True

In [12]:
original_columns = stay_df.columns
original_columns

Index(['case_id', 'Hospital_code', 'Hospital_type_code', 'City_Code_Hospital',
       'Hospital_region_code', 'Available_Extra_Rooms_in_Hospital',
       'Department', 'Ward_Type', 'Ward_Facility_Code', 'Bed_Grade',
       'patientid', 'City_Code_Patient', 'Type_of_Admission',
       'Severity_of_Illness', 'Visitors_with_Patient', 'Age',
       'Admission_Deposit', 'Stay'],
      dtype='object')

### Split Data

We split the data first so that the engineered count features are computed from the train split only and do not leak information from the held-out rows.

In [13]:
X = stay_df.drop(['Stay', 'case_id'], axis=1)
y = stay_df['Stay']



X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=86)

final_case_ids = stay_test_df['case_id']
stay_test_df = stay_test_df.drop('case_id', axis=1)

## Feature engineering

Three kinds of features are created:

Count features: for each of 'Visitors_with_Patient', 'Age', 'Hospital_code', 'Hospital_type_code', 'City_Code_Hospital', 'Hospital_region_code', 'Department', 'Ward_Type', 'Ward_Facility_Code', add the number of train samples that share the sample's value.

Feature by feature counts: for each of 'Hospital_code', 'Hospital_type_code', 'City_Code_Hospital', 'Hospital_region_code', 'Department', 'Ward_Type', 'Ward_Facility_Code',
add a count for each of 'Available_Extra_Rooms_in_Hospital', 'Bed_Grade', 'Visitors_with_Patient'.

Patient traveled: a feature for whether the patient traveled to get to the hospital.

### Count Features

For each feature, add a column holding the number of train set samples that share the sample's value for that feature.


In [14]:
features_tocount = ['Visitors_with_Patient', 'Age', 'Hospital_code', 'Hospital_type_code', 'City_Code_Hospital', 'Hospital_region_code', 'Department', 'Ward_Type', 'Ward_Facility_Code'] 

def feature_totals(features, df):
    """Takes a list of features and a DataFrame. For each feature, adds a
    column with the number of train set samples sharing the row's value."""
    for feature in features:
        num_col = len(df.columns)
        new_col = '_'.join(['Total', feature, 'Count'])
        # counts come from X_train only; values unseen in train get 0
        total = X_train.groupby(feature).size().to_dict()
        df[new_col] = df[feature].map(total).fillna(0)
        assert len(df.columns) == (num_col + 1), 'column did not add'
    return df
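
The count feature can be illustrated on a toy example. This is a minimal pure-Python sketch rather than the notebook's pandas implementation: `count_encode` and the sample ages are hypothetical, and `collections.Counter` stands in for `groupby(...).size()`.

```python
from collections import Counter

def count_encode(train_values, values):
    """Replace each value with how often it occurs in train_values.

    Values never seen in the train data get a count of 0, mirroring
    the fillna(0) step in feature_totals.
    """
    train_counts = Counter(train_values)
    return [train_counts.get(v, 0) for v in values]

# hypothetical Age values for a train split and a held-out split
train_age = ['51-60', '51-60', '31-40', '71-80']
test_age = ['51-60', '91-100']

train_encoded = count_encode(train_age, train_age)
test_encoded = count_encode(train_age, test_age)
print(train_encoded)  # [2, 2, 1, 1]
print(test_encoded)   # [2, 0]
```

Because the counts always come from the train values, the held-out rows never contribute to their own feature.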


### Feature by Feature

Add counts of how often a value of one feature occurs together with a value of another feature in the train set:
the count of rooms available, bed grade, or visitors by the location, type, or department of the hospital.

In [15]:
# define the lists of features to pass to the two_feature_countby function
hosp_descriptor = ['Hospital_code', 'Hospital_type_code', 'City_Code_Hospital', 'Hospital_region_code', 'Department', 'Ward_Type', 'Ward_Facility_Code']

hosp_features = ['Available_Extra_Rooms_in_Hospital', 'Bed_Grade', 'Visitors_with_Patient']

In [16]:
def two_feature_countby(to_count, by_name, df):
    """Takes two lists of features and a DataFrame. For every pair of one
    feature from each list, adds a column with the number of train set
    samples that share the row's values for both features."""

    for count_of in to_count:
        for count_by in by_name:
            num_col = len(df.columns)
            new_col = '_'.join([count_by, 'by', count_of])
            count = X_train.groupby([count_of, count_by]).size().reset_index()
            count.columns = [count_of, count_by, new_col]

            # left merge keeps every row of df; pairs unseen in train get 0
            df = df.merge(count, how='left', on=[count_of, count_by])
            df[new_col] = df[new_col].fillna(0)
            assert len(df.columns) == (num_col + 1), 'column did not add'

    return df

### Patient Traveled

If the patient traveled to get to the hospital it may increase the time needed to recover enough to get home. 

Create a binary feature where 1 represents that the patient lives in a different region than the hospital.

In [17]:
# City_Code_Hospital values are split into one list per region.
# If a patient's City_Code_Patient is not in the list that matches the
# sample's Hospital_region_code, add 1; otherwise leave 0.

z_region = X_train[X_train.Hospital_region_code == 'Z']['City_Code_Hospital'].unique()
x_region = X_train[X_train.Hospital_region_code == 'X']['City_Code_Hospital'].unique()
y_region = X_train[X_train.Hospital_region_code == 'Y']['City_Code_Hospital'].unique()


def define_region_list(hosp_region):
    """Takes in a sample's region and returns the list of city codes in that region"""
    if hosp_region == 'Z':
        return z_region
    elif hosp_region == 'X':
        return x_region
    elif hosp_region == 'Y':
        return y_region
    # fail loudly on an unknown region instead of silently returning None
    raise ValueError(f'region not matched: {hosp_region}')
    



def patiend_travel(df):
    """Takes in a DataFrame, returns a list of 1s and 0s for whether the patient traveled or not"""
    traveled = []
    locations = zip(df['Hospital_region_code'], df['City_Code_Patient'])
    for hosp_region, patient_home in locations:

        region_list = define_region_list(hosp_region)

        # 1 if the patient's city is outside the hospital's region, else 0
        if patient_home not in region_list:
            traveled.append(1)
        else:
            traveled.append(0)
    return traveled
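
The travel rule can also be expressed as a lookup from each region to the set of hospital city codes seen in that region. The sketch below is pure Python on made-up rows; `build_region_cities` and `patient_traveled` are hypothetical names for illustration, not functions from this notebook.

```python
from collections import defaultdict

def build_region_cities(train_rows):
    """Map each hospital region code to the set of hospital city codes seen in it."""
    region_cities = defaultdict(set)
    for region, hospital_city in train_rows:
        region_cities[region].add(hospital_city)
    return dict(region_cities)

def patient_traveled(region_cities, rows):
    """Return 1 when the patient's city code is not among the region's hospital cities."""
    flags = []
    for region, patient_city in rows:
        if region not in region_cities:
            raise ValueError(f'region not matched: {region}')
        flags.append(0 if patient_city in region_cities[region] else 1)
    return flags

# made-up (Hospital_region_code, City_Code_Hospital) pairs from a train split
train_rows = [('Z', 3), ('Z', 5), ('X', 1), ('Y', 2)]
region_cities = build_region_cities(train_rows)

# made-up (Hospital_region_code, City_Code_Patient) pairs to flag
flags = patient_traveled(region_cities, [('Z', 5), ('Z', 7), ('X', 1), ('Y', 9)])
print(flags)  # [0, 1, 0, 1]
```

A dictionary of sets keeps the membership test constant-time per row and makes an unknown region an explicit error.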

### Apply Functions to Dataframes 
Each new feature is defined in its own function, and one wrapper function runs them all, so the changes are applied to each DataFrame in a single call.

In [18]:
def add_features(df):
    """Takes in a DataFrame, runs all feature functions, and returns the DataFrame"""
    samples = len(df)
    
    df = feature_totals(features_tocount, df)
    assert len(df) == samples, 'feature_totals missing samples'
    
    df = two_feature_countby(hosp_descriptor, hosp_features, df)
    #assert len(df) == samples, 'two_feature_countby missing samples'
    
    df['Patient_traveled'] = patiend_travel(df)
    #assert len(df) == samples, 'patiend_travel missing samples'
    
    return df

In [19]:
# add new features to all X datasets
X_train = add_features(X_train)
X_test = add_features(X_test)
final_X_test = add_features(stay_test_df)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['_'.join(['Total', feature, 'Count'])] = df[feature].map(total)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return super().fillna(


In [20]:
# Save list of new feature names and check all are added
new_features = set(X_train.columns) - set(original_columns )
len(new_features)

31

In [21]:
# Check new features; newer pandas versions do not accept a set as an indexer
X_train[list(new_features)].head()

Unnamed: 0,Total_Visitors_with_Patient_Count,Total_Hospital_code_Count,Visitors_with_Patient_by_Department,Bed_Grade_by_Ward_Facility_Code,Visitors_with_Patient_by_Hospital_code,Visitors_with_Patient_by_Hospital_type_code,Bed_Grade_by_City_Code_Hospital,Visitors_with_Patient_by_Hospital_region_code,Visitors_with_Patient_by_City_Code_Hospital,Total_City_Code_Hospital_Count,...,Bed_Grade_by_Hospital_type_code,Bed_Grade_by_Department,Available_Extra_Rooms_in_Hospital_by_Ward_Type,Total_Hospital_type_code_Count,Available_Extra_Rooms_in_Hospital_by_Ward_Facility_Code,Available_Extra_Rooms_in_Hospital_by_Hospital_type_code,Total_Age_Count,Available_Extra_Rooms_in_Hospital_by_Department,Visitors_with_Patient_by_Ward_Type,Total_Hospital_region_code_Count
0,6953,24804,5567,15199,776,1628,15199,2401,1036,38869,...,19986,76163,34668,51778,10418,14997,26867,55358,3794,91893
1,103651,24804,78220,13208,11076,22502,13208,38930,16746,38869,...,17990,62493,33259,51778,12002,15572,36259,57491,25667,91893
2,14153,5350,10889,7161,301,1822,7969,2575,1169,23661,...,11779,62493,34668,34477,5893,9480,47844,55358,6620,47067
3,103651,6140,9601,3971,2984,15990,4475,21461,11179,23661,...,6620,3915,2254,34477,441,685,5856,528,29789,47067
4,103651,24804,12653,7411,11076,22502,7411,38930,16746,38869,...,10155,6154,15247,51778,10418,14997,47844,5625,29789,91893


In [22]:
# Categorical features
X_train.select_dtypes(exclude=[np.number]).info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 238828 entries, 0 to 238827
Data columns (total 8 columns):
 #   Column                Non-Null Count   Dtype 
---  ------                --------------   ----- 
 0   Hospital_type_code    238828 non-null  object
 1   Hospital_region_code  238828 non-null  object
 2   Department            238828 non-null  object
 3   Ward_Type             238828 non-null  object
 4   Ward_Facility_Code    238828 non-null  object
 5   Type_of_Admission     238828 non-null  object
 6   Severity_of_Illness   238828 non-null  object
 7   Age                   238828 non-null  object
dtypes: object(8)
memory usage: 16.4+ MB


In [23]:
X_train.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 238828 entries, 0 to 238827
Data columns (total 47 columns):
 #   Column                                                     Non-Null Count   Dtype  
---  ------                                                     --------------   -----  
 0   Hospital_code                                              238828 non-null  int64  
 1   Hospital_type_code                                         238828 non-null  object 
 2   City_Code_Hospital                                         238828 non-null  int64  
 3   Hospital_region_code                                       238828 non-null  object 
 4   Available_Extra_Rooms_in_Hospital                          238828 non-null  int64  
 5   Department                                                 238828 non-null  object 
 6   Ward_Type                                                  238828 non-null  object 
 7   Ward_Facility_Code                                         238828 non-null  object 

## Preprocessing and Modeling

Since the original dataset only had 3 quantitative features, Random Forest is a good model to start with. It requires little preprocessing and works well with encoded categorical features. Each split thresholds a single feature, so rescaling a feature does not change the splits and scaling shouldn't affect the model. If we pursue other models, we will pipeline the standard scaler with a list of numeric features.

For consistency with any other models we try, we one hot encode the categorical features. The target variable also needs to be encoded for the classification.
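
If a scale-sensitive model is tried later, the scaling step mentioned above could look like the following. This is a sketch under assumptions: the toy frame and the `numeric_features` list are made up, and `ColumnTransformer` is one way to restrict `StandardScaler` to the numeric columns. The notebook itself does not scale anything.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# toy frame standing in for X_train (values are made up)
toy = pd.DataFrame({
    'Admission_Deposit': [4911.0, 5954.0, 4745.0, 7272.0],
    'Visitors_with_Patient': [2, 2, 4, 3],
    'Ward_Type_R': [1, 0, 0, 1],
})

# hypothetical list of columns to scale; dummy columns are left alone
numeric_features = ['Admission_Deposit', 'Visitors_with_Patient']

scale_numeric = ColumnTransformer(
    transformers=[('scale', StandardScaler(), numeric_features)],
    remainder='passthrough',  # keep the remaining columns unchanged
)

# scaled columns come first, passthrough columns follow
scaled = scale_numeric.fit_transform(toy)
print(scaled.shape)
```

`scale_numeric` could then serve as the first step of a `Pipeline`, ahead of the estimator, so the scaler is fit on the train split only.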


### One Hot Encode


In [24]:
# the eight object-dtype columns; get_dummies uses each column name as the prefix
categorical_features = ['Hospital_type_code', 'Hospital_region_code', 'Department', 'Ward_Type',
                        'Ward_Facility_Code', 'Type_of_Admission', 'Severity_of_Illness', 'Age']

X_train = pd.get_dummies(X_train, columns=categorical_features)
X_test = pd.get_dummies(X_test, columns=categorical_features)
final_X_test = pd.get_dummies(final_X_test, columns=categorical_features)
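
Encoding each split separately only yields matching columns when every category appears in every split. That held here, as the matching shapes further down show, but it is not guaranteed. A hedged sketch of one way to align a split to the train columns with `DataFrame.reindex`, on made-up data:

```python
import pandas as pd

# made-up splits; 'T' never appears in train and 'S', 'Q' never appear in test
train = pd.DataFrame({'Ward_Type': ['R', 'S', 'Q']})
test = pd.DataFrame({'Ward_Type': ['R', 'T']})

train_enc = pd.get_dummies(train, columns=['Ward_Type'])
test_enc = pd.get_dummies(test, columns=['Ward_Type'])

# keep exactly the train columns: missing ones are filled with 0,
# columns unknown to the model (Ward_Type_T) are dropped
test_aligned = test_enc.reindex(columns=train_enc.columns, fill_value=0)
print(list(test_aligned.columns))
```

The same call applied to `X_test` and `final_X_test` with `X_train.columns` would guard the model against a column mismatch at prediction time.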

### Encode Labels

In [25]:
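label = LabelEncoder()
label.fit(y_train)
# keep the encoded targets in their own variables; the models below are fit on
# the original string labels, which scikit-learn classifiers also accept
y_train_enc = label.transform(y_train)
y_test_enc = label.transform(y_test)
y_test.shape, X_test.shape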

((79610,), (79610, 82))

In [26]:
X_train.shape, y_train.shape

((238828, 82), (238828,))

In [27]:
#pd.options.display.max_rows = 100


### Random Forest Model

In [28]:
RF = RandomForestClassifier(random_state=86)


RF.fit(X_train, y_train)
rf_pred = RF.predict(X_test)


In [29]:
print(classification_report(y_test, rf_pred))

                    precision    recall  f1-score   support

              0-10       0.29      0.17      0.22      5912
             11-20       0.38      0.44      0.41     19349
             21-30       0.42      0.53      0.47     22041
             31-40       0.33      0.27      0.30     13734
             41-50       0.08      0.02      0.04      2933
             51-60       0.41      0.45      0.43      8752
             61-70       0.15      0.03      0.05       721
             71-80       0.29      0.12      0.17      2526
             81-90       0.41      0.29      0.34      1220
            91-100       0.38      0.10      0.16       719
More than 100 Days       0.56      0.49      0.52      1703

          accuracy                           0.38     79610
         macro avg       0.34      0.27      0.28     79610
      weighted avg       0.37      0.38      0.37     79610



### Tune Hyperparameters

Randomized search lets us sample from a large range of parameters while fitting fewer models than a grid search would.

In [30]:
RF.get_params()

{'bootstrap': True,
 'ccp_alpha': 0.0,
 'class_weight': None,
 'criterion': 'gini',
 'max_depth': None,
 'max_features': 'auto',
 'max_leaf_nodes': None,
 'max_samples': None,
 'min_impurity_decrease': 0.0,
 'min_impurity_split': None,
 'min_samples_leaf': 1,
 'min_samples_split': 2,
 'min_weight_fraction_leaf': 0.0,
 'n_estimators': 100,
 'n_jobs': None,
 'oob_score': False,
 'random_state': 86,
 'verbose': 0,
 'warm_start': False}

In [31]:
# 'sqrt' is what 'auto' means for a classifier; newer scikit-learn versions no longer accept 'auto'
RFC_params = {'n_estimators': [200, 400, 600, 800, 1000, 1200],
              'max_features': ['sqrt', 'log2'],
              'max_depth': [10, 20, 30, 40, 50, 60, 70,  None],
              'min_samples_split': [2, 5, 10, 15, 20],
              'min_samples_leaf': [1, 2, 5, 10, 15]
             }
RF = RandomForestClassifier(random_state=86)
RFGS = RandomizedSearchCV(RF, RFC_params, random_state=86)

In [None]:
RFGS.fit(X_train, y_train)
RFGS.best_params_
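
To put the saving in numbers, the size of the full grid can be compared with what the randomized search fits. The sketch below counts the candidate values per parameter in `RFC_params`; `n_iter = 10` and `cv = 5` are assumed to be the scikit-learn defaults in effect, since the search above sets neither.

```python
from math import prod

# number of candidate values per parameter in RFC_params
n_options = {
    'n_estimators': 6,
    'max_features': 2,
    'max_depth': 8,
    'min_samples_split': 5,
    'min_samples_leaf': 5,
}

n_combinations = prod(n_options.values())

# assumed scikit-learn defaults, not set explicitly in the search above
n_iter = 10  # parameter settings sampled by RandomizedSearchCV
cv = 5       # cross-validation folds

grid_fits = n_combinations * cv
random_fits = n_iter * cv
print(n_combinations, grid_fits, random_fits)  # 2400 12000 50
```

Under those assumptions the randomized search fits 50 models where an exhaustive grid search would fit 12000.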

In [32]:
# 'sqrt' is what 'auto' means for a classifier
RF2 = RandomForestClassifier(random_state=86, n_jobs=2, n_estimators=1000,
                             min_samples_split=5, min_samples_leaf=10,
                             max_features='sqrt', max_depth=40)

RF2.fit(X_train, y_train)
rf_pred2 = RF2.predict(X_test)
print(classification_report(y_test, rf_pred2))



                    precision    recall  f1-score   support

              0-10       0.45      0.11      0.18      5912
             11-20       0.41      0.50      0.45     19349
             21-30       0.42      0.66      0.52     22041
             31-40       0.42      0.24      0.30     13734
             41-50       0.00      0.00      0.00      2933
             51-60       0.41      0.50      0.45      8752
             61-70       0.00      0.00      0.00       721
             71-80       0.43      0.01      0.03      2526
             81-90       0.41      0.22      0.28      1220
            91-100       0.36      0.01      0.01       719
More than 100 Days       0.54      0.45      0.49      1703

          accuracy                           0.42     79610
         macro avg       0.35      0.25      0.25     79610
      weighted avg       0.40      0.42      0.38     79610





In [33]:

print('Train Accuracy:', RF2.score(X_train, y_train), ', Test Accuracy', RF2.score(X_test, y_test))

Train Accuracy: 0.5292009312140955 , Test Accuracy 0.4210275091068961


The model is within expectations based on the hackathon leaderboard. We noted in our EDA that some of the classes would be difficult for a model to predict, and the Random Forest model gets essentially none of the 41-50 and 61-70 stays right (precision and recall of 0.00) and scores low on the 91-100 class. The classes More than 100 Days, 0-10, and 71-80 are predicted with higher precision, but recall for 0-10 and 71-80 is very low (0.11 and 0.01), so most stays in those classes are missed. The model's score on the train set indicates it is not learning enough from the data to make strong predictions on the 10-day range classes.

The goal is for hospitals to have an idea of how long to plan care for a patient. If the model's prediction is only one range off, we may still have a good indication within 20 days.

In [34]:
# check if predictions are within one class range off
stay_range = {'0-10':['0-10', '11-20'], '11-20': ['0-10', '11-20', '21-30'], 
              '21-30':['11-20', '21-30', '31-40'], '31-40':['21-30', '31-40', '41-50'], 
              '41-50':['31-40', '41-50', '51-60'], '51-60':['41-50', '51-60', '61-70'], 
              '61-70':['51-60', '61-70', '71-80'], '71-80':['61-70', '71-80', '81-90'],
              '81-90':['71-80', '81-90', '91-100'], '91-100':['81-90', '91-100', 'More than 100 Days'], 
              'More than 100 Days':['91-100', 'More than 100 Days']}

# count predictions that fall outside the ranges listed for the actual class
# NOTE: this scores rf_pred (the first model); swap in rf_pred2 to score the tuned model
over20off = 0

ys = zip(y_test, rf_pred)

for ay, py in ys:
    if py not in stay_range[ay]:
        over20off += 1

print(((len(y_test) - over20off) / len(y_test)) * 100, '% of the predictions are within 20 days of the actual length of hospitalization')

69.65456600929531 % of the predictions are within 20 days of the actual length of hospitalization
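
The same check can be written without spelling out the neighbours of every class, by comparing positions in an ordered list of the classes. A minimal sketch on made-up labels; `STAY_ORDER`, `RANK`, and `within_k_classes` are hypothetical names, not part of the notebook.

```python
# the eleven Stay classes in increasing order
STAY_ORDER = ['0-10', '11-20', '21-30', '31-40', '41-50', '51-60',
              '61-70', '71-80', '81-90', '91-100', 'More than 100 Days']
RANK = {label: position for position, label in enumerate(STAY_ORDER)}

def within_k_classes(y_true, y_pred, k=1):
    """Share of predictions at most k classes away from the actual class."""
    hits = sum(abs(RANK[actual] - RANK[predicted]) <= k
               for actual, predicted in zip(y_true, y_pred))
    return hits / len(y_true)

# made-up labels for illustration
actual = ['0-10', '21-30', '91-100', '41-50']
predicted = ['11-20', '51-60', 'More than 100 Days', '41-50']
print(within_k_classes(actual, predicted))       # 0.75
print(within_k_classes(actual, predicted, k=0))  # 0.25
```

With `k=0` the function reduces to plain accuracy, and larger `k` values widen the tolerated window one class at a time.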


In [35]:
# make predictions on final test set
rf_pred2_final = RF2.predict(final_X_test)



In [36]:
# prepare final predictions to be submitted
rf_submission = pd.DataFrame({'case_id': final_case_ids, 'Stay': rf_pred2_final})


rf_submission

Unnamed: 0,case_id,Stay
0,318439,0-10
1,318440,51-60
2,318441,21-30
3,318442,21-30
4,318443,51-60
...,...,...
137052,455491,21-30
137053,455492,0-10
137054,455493,21-30
137055,455494,11-20


In [None]:
#rf_submission.to_csv('data/rf_submission.csv', index =False)

This submission receives a public accuracy score of 42.0013863047682
and a private accuracy score of 41.7048701890922.
If the contest were still going, it would rank 266 of 19638, placing in the top 1.5% of all recorded submissions.

While the model's accuracy on 10-day ranges has room for improvement, a 20-day error is a decent ballpark. Hospitals can still plan differently for a stay in an 11-30 day window than for one in a 71-90 day window.