# Feature Engineering Model Selection and Tuning Project

#### Developed by: Olabode James

## Preliminaries 

### Project Context: 

Predict concrete strength using the data available in the file concrete_data.xls. Apply
feature engineering and model tuning to obtain an R2 score between 80% and 95%.

### Import Libraries

In [None]:
import numpy as np # linear algebra
import pandas as pd 


#Visualization Components
import seaborn as sns
import matplotlib.pyplot as plt # matplotlib.pyplot plots data
sns.set(color_codes=True) # adds a nice background to the graphs
# In order to enable plotting graphs in Jupyter notebook
%matplotlib inline

#Cross validation and data split
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score, cross_val_predict

#ML Models For use 
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor

#Features Selection
from mlxtend.feature_selection import SequentialFeatureSelector as sfs
from mlxtend.plotting import plot_sequential_feature_selection as plot_sfs

#Regression metrics
from sklearn import metrics
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error, explained_variance_score
from sklearn import linear_model

#Pipeline
from sklearn.pipeline import make_pipeline


#Hyperparameter Tuning
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV

#Standard maths
import math

#Features preProcessing
from scipy.stats import zscore
from sklearn.preprocessing import PolynomialFeatures
from sklearn.preprocessing import RobustScaler
from sklearn.preprocessing import StandardScaler

#Handling Statistical Components
import statsmodels.api as sm

#Need to handle warnings
import warnings
warnings.filterwarnings('ignore')

In [None]:
#pip install mlxtend

#### Load and Preview Data

In [None]:
concData = pd.read_csv('concrete.csv')

In [None]:
concData.head(10)

In [None]:
concData.shape

In [None]:
concData.info()

In [None]:
concData.describe()

In [None]:
concData.isnull().any()

In [None]:
#There are no null values - implies we have relatively clean data

In [None]:
#Check that every entry in each column is a real number
concData.applymap(np.isreal).all()

In [None]:
#Also, all entries are real numbers, which implies we have relatively clean data.

# Project Deliverables

## Task 1 - Exploratory data quality report

### 1. Univariate analysis – data types and description of the independent attributes which should include (name, meaning, range of values observed, central values (mean and median), standard deviation and quartiles, analysis of the body of distributions / tails, missing values, outliers) (10 Marks)

In [None]:
concData.nunique()

In [None]:
concData.describe().transpose()

In [None]:
#Further confirmation of check for missing values - null 
round(concData.isna().sum()*100/concData.shape[0],2)

In [None]:
#There are no missing values

In [None]:
#Explore the datatypes 
concData.info()

In [None]:
plt.figure(figsize=(15,10))
pos = 1
for i in concData.columns:
    plt.subplot(3, 3, pos)
    sns.boxplot(concData[i])
    pos += 1 

INSIGHT: There are outliers. Rather than handling them separately here, we will apply RobustScaler later in the data handling, before fitting the ML estimators.
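
As a rough illustration of what the box plots flag, the sketch below applies the 1.5 x IQR whisker rule to a small synthetic array. The helper name `iqr_outlier_mask` and the sample values are assumptions made for this example; they are not part of the notebook or the concrete dataset.

```python
import numpy as np

def iqr_outlier_mask(values, whisker=1.5):
    """Return a boolean mask marking points outside the box-plot whiskers."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - whisker * iqr, q3 + whisker * iqr
    return (values < lower) | (values > upper)

# Synthetic illustration (not the concrete data): one value sits far from the rest
sample = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 100.0])
mask = iqr_outlier_mask(sample)
print(sample[mask])
```

The same helper could be applied column by column to count flagged points, which may help decide whether scaling alone is enough or whether some columns need separate treatment.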

In [None]:
plt.figure(figsize=(15,10))
posHist = 1
for i in concData.columns:
    plt.subplot(3, 3, posHist)
    plt.hist(concData[i])
    plt.xlabel(i)
    posHist += 1 

### 2. Bi-variate analysis between the predictor variables and between the predictor variables and target column. Comment on your findings in terms of their relationship and degree of relation if any. Visualize the analysis using boxplots and pair plots, histograms or density curves. (10 marks)

In [None]:
concData.corr()

In [None]:
#Check for multi-collinearity between features
# Lets check for highly correlated variables
cor= concData.corr()
cor.loc[:,:] = np.tril(cor,k=-1)
cor=cor.stack()
cor[(cor > 0.8) | (cor< -0.8)]

INSIGHT: Empty series - implies we will not need to worry about multicollinearity between feature variables
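
A pairwise correlation threshold can miss collinearity that involves more than two features. A complementary check is the variance inflation factor (VIF), which equals the diagonal of the inverse correlation matrix. The sketch below uses synthetic data in which one column is nearly the sum of two others while every pairwise correlation stays below 0.8; the helper name `vif_from_correlation` and the data are assumptions made for illustration.

```python
import numpy as np

def vif_from_correlation(data):
    """Variance inflation factors: the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Synthetic columns (not the concrete data)
rng = np.random.default_rng(42)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + b + rng.normal(scale=0.1, size=500)  # nearly a linear combination of a and b

independent = np.column_stack([a, b])
collinear = np.column_stack([a, b, c])

print(vif_from_correlation(independent))  # values close to 1
print(vif_from_correlation(collinear))    # values far above 1
```

A common rule of thumb treats VIF values above roughly 5 to 10 as a sign of problematic collinearity, though the cut-off is a convention rather than a strict rule.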

In [None]:
sns.pairplot(concData, palette="husl", diag_kind='kde')

### 3. Feature Engineering techniques (10 marks)

In [None]:
#Let's do the train-validation-test split before refinement.
#The goal is to explore feature engineering for any extra performance improvement
#while keeping two versions of the dataset, taking care not to use test data for validation.

# independent variables
X = concData.drop(['strength'], axis=1)

# the dependent variable
y = concData[['strength']]

# Split X and y into training and test set in 70:30 ratio
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30, random_state=42)

In [None]:
# Build Lin Reg  to use in feature selection
linR = LinearRegression()

In [None]:
# Building feature selection process using - SequentialFeatureSelector, to determine feature relevances
sfs1 = sfs(linR, k_features=5, forward=True, scoring='r2', cv=5)

In [None]:
# Perform sequential forward selection (SFS)
sfs1 = sfs1.fit(X_train.values, y_train.values)

In [None]:
sfs1.get_metric_dict()

In [None]:
plt.figure(figsize=(15,10))
fig = plot_sfs(sfs1.get_metric_dict())

plt.title('Sequential Forward Selection (w. R^2)')
plt.grid()
plt.show()

In [None]:
# Which features have high predictive relevance
columnList = list(X_train.columns)
feat_cols = list(sfs1.k_feature_idx_)
subsetColumnList = [columnList[i] for i in feat_cols] 
print(subsetColumnList)

In [None]:
#For comparison, we check performance difference on selected features and entire features set to see what 
#insights we can draw
ml_model = 'Linear Regression'
features_used = 'selected'
linR = LinearRegression()
linR.fit(X_train[subsetColumnList], y_train)

In [None]:
y_train_pred = linR.predict(X_train[subsetColumnList])
train_score = linR.score(X_train[subsetColumnList], y_train)
print('Training R2 score on selected features: %.3f' % train_score)

In [None]:
y_test_pred = linR.predict(X_test[subsetColumnList])
test_score = linR.score(X_test[subsetColumnList], y_test)
print('Testing R2 score on selected features: %.3f' % test_score)

In [None]:
featuresPerfDF = pd.DataFrame({'Model' : [ml_model], 'Features' : [features_used], 'Training R2 Score' : [train_score],
                      'Test R2 Score' : [test_score]})
featuresPerfDF

In [None]:
#Performance on Full feature set

ml_model = 'Linear Regression'
features_used = 'All'

linR = LinearRegression()
linR.fit(X_train, y_train)

In [None]:
y_train_pred = linR.predict(X_train)
train_score = linR.score(X_train, y_train)
print('Training R2 score on all features: %.3f' % train_score)

In [None]:
y_test_pred = linR.predict(X_test)
test_score = linR.score(X_test, y_test)
print('Testing R2 score on all features: %.3f' % test_score)

In [None]:
featuresPerfDF.loc[1] = [ml_model, features_used, train_score, test_score]
featuresPerfDF

INSIGHT: We obtained a higher R2 score when using all the features to build the model. Using only the selected features has a computational advantage, but it gave up predictive power and did not reduce prediction error.

In [None]:
#We desire a scaler that will handle presence of outliers while standardizing the dataset for the estimator 
#- from documentation, RobustScaler does that best
rb_scaler = RobustScaler(quantile_range=(25, 75))

In [None]:
#Feature engineering using PolynomialFeatures to see if there is any improvement on Linear Regression - this will be used
# for comparison later
poly = PolynomialFeatures(degree=2, interaction_only=True)
X_train2 = poly.fit_transform(X_train)
#Reuse the transformer fitted on the training data; do not refit it on the test set
X_test2 = poly.transform(X_test)
X_train2.shape, X_test2.shape, y_train.shape, y_test.shape

## Task 2 - Creating the model and tuning it

### 1. Algorithms that you think will be suitable for this project (at least 3 algorithms). Use Kfold Cross Validation to evaluate model performance. Use appropriate metrics and make a DataFrame to compare models w.r.t their metrics. (15 marks)

In [None]:
num_folds = 10
seed = 42

In [None]:
def model_regression_metrics(true_data, pred_data):
    mae = mean_absolute_error(true_data, pred_data)
    mse = mean_squared_error(true_data, pred_data)
    rmse = math.sqrt(mse)
    r2score = r2_score(true_data, pred_data)  
    return mae, mse, rmse, r2score

In [None]:
def model_regression_plotter(true_data, pred_data):
    # Visualize the model predictions against the actual test values
    true_vals = np.ravel(true_data)
    pred_vals = np.ravel(pred_data)
    # Create a single figure; a separate plt.figure() call would leave an empty figure behind
    fig, ax = plt.subplots(figsize=(20, 10))
    ax.scatter(true_vals, pred_vals, edgecolors=(0, 0, 0))
    # Dashed diagonal marks a perfect prediction
    ax.plot([true_vals.min(), true_vals.max()], [true_vals.min(), true_vals.max()], 'k--', lw=4)
    ax.set_xlabel('Actual')
    ax.set_ylabel('Predicted')
    ax.set_title("Actual Concrete Strength vs Predicted")
    plt.show()

In [None]:
#making pipeline object for all the estimators that will be used
#1. LinearRegression()
#2. LinearRegression() with Polynomial Features
#3. LinearRegression() - comparison on Lasso and Ridge Regression
#4. DecisionTree Regressor - max_depth=none
#5. DecisionTree Regressor - prune, max_depth=5
#6. RandomForest Regressor 
#7. SupportVector Regressor
#8. GradientBoost Regressor

In [None]:
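#Initializing the KFold for Cross validation to evaluate model performance
#shuffle=True is required for random_state to take effect; recent scikit-learn versions raise an error otherwise
kfold = KFold(n_splits=num_folds, shuffle=True, random_state=seed)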

In [None]:
ml_model = 'Linear Regression'
indexer = 0

pipe_lr = make_pipeline(rb_scaler, LinearRegression())
results = cross_val_score(pipe_lr, X, y, cv=kfold)

#Plain mean of the fold R2 scores; taking abs() would hide negative R2 values
cv_avg = results.mean()
cv_std = results.std()

In [None]:
pipe_lr.fit(X_train, y_train)
y_pred= pipe_lr.predict(X_test)

#y_pred = cross_val_predict(pipe_lr, X, y, cv=kfold)
mae, mse, rmse, r2score = model_regression_metrics(y_test, y_pred)

In [None]:
resultsDF = pd.DataFrame({'Model' : [ml_model], 'MAE' : [mae], 'MSE' : [mse],
                      'RMSE' : [rmse], 'R2 Score' : [r2score], 'CV_Score_Avg': [cv_avg], 'CV_Score_STD': [cv_std]})
resultsDF

In [None]:
print("Model Score average and 95 percent confidence interval: %0.5f (+/- %0.5f)" % (results.mean(),  results.std()* 2))

In [None]:
#Checking if Poly features extraction will improve our results any better 
ml_model = 'Linear Regression - PolyFeatures'
indexer += 1

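pipe_lr_poly = make_pipeline(rb_scaler, linear_model.LinearRegression())
#Cross-validate on the polynomial features rather than the raw X, so the CV score reflects this model
results = cross_val_score(pipe_lr_poly, X_train2, y_train, cv=kfold)

cv_avg = results.mean()
cv_std = results.std()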

pipe_lr_poly.fit(X_train2, y_train)

y_pred = pipe_lr_poly.predict(X_test2)

In [None]:
mae, mse, rmse, r2score = model_regression_metrics(y_test, y_pred)

resultsDF.loc[indexer] = [ml_model, mae, mse, rmse, r2score, cv_avg, cv_std]
resultsDF

In [None]:
print("Model Score average and 95 percent confidence interval: %0.5f (+/- %0.5f)" % (results.mean(),  results.std()* 2))

INSIGHT: Polynomial features (interaction terms only) improved the out-of-sample R^2, MSE and other metrics, but at the cost of increasing the number of variables significantly, from 8 to 37.
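
The jump from 8 to 37 columns follows from how degree-2, interaction-only expansion counts its outputs: one bias column, the original features, and one product per pair of features. A small sketch of that arithmetic, with `interaction_feature_count` as a hypothetical helper name:

```python
from math import comb

def interaction_feature_count(n_features, include_bias=True):
    """Number of output columns for degree-2, interaction-only polynomial features."""
    return int(include_bias) + n_features + comb(n_features, 2)

# 1 bias column + 8 original features + 28 pairwise products
print(interaction_feature_count(8))
```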

R2 generally improves as the number of features increases, so the higher R2 score alone does not let us
conclude that this model is better than the plain linear regression model.
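
One common way to account for the number of features is the adjusted R2, which penalizes each added predictor. The sketch below is a minimal illustration; the function name `adjusted_r2` and the numbers passed to it are hypothetical and are not results from this notebook.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_samples - n_features - 1 <= 0:
        raise ValueError("need n_samples > n_features + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

# Hypothetical numbers for illustration only: the same raw R^2 with 8 versus 36
# predictors (36 = the 37 polynomial columns minus the bias column)
print(adjusted_r2(0.75, n_samples=300, n_features=8))
print(adjusted_r2(0.75, n_samples=300, n_features=36))
```

With the same raw R2, the model with more predictors receives the lower adjusted score, so an improvement has to outweigh the penalty before it counts as a real gain.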

In [None]:
#Using DecisionTreeRegressor
# create a regressor object 

ml_model = 'Decision Tree Regressor'
indexer += 1

pipe_dt_rgr = make_pipeline(rb_scaler, DecisionTreeRegressor(random_state = seed))
results = cross_val_score(pipe_dt_rgr, X, y, cv=kfold)

cv_avg = results.mean()
cv_std = results.std()

In [None]:
pipe_dt_rgr.fit(X_train, y_train)

y_pred= pipe_dt_rgr.predict(X_test)
mae, mse, rmse, r2score = model_regression_metrics(y_test, y_pred)

In [None]:
resultsDF.loc[indexer] = [ml_model, mae, mse, rmse, r2score, cv_avg, cv_std]
resultsDF

In [None]:
print("Model Score average and 95 percent confidence interval: %0.5f (+/- %0.5f)" % (results.mean(),  results.std()* 2))

INSIGHT: The Decision Tree Regressor with no restriction on max_depth gave a higher coefficient of determination (R^2) than both forms of Linear Regression.

In [None]:
#Checking if improvement is possible with Decision Tree Regressor, when Max_depth is limited to 5

ml_model = 'Decision TreeR(pruned,max_depth=5)'
indexer += 1

pipe_dt_rgr_pruned = make_pipeline(rb_scaler, DecisionTreeRegressor(max_depth=5, random_state = seed))
results = cross_val_score(pipe_dt_rgr_pruned, X, y, cv=kfold)

cv_avg = results.mean()
cv_std = results.std()

In [None]:
pipe_dt_rgr_pruned.fit(X_train, y_train)

y_pred= pipe_dt_rgr_pruned.predict(X_test)
mae, mse, rmse, r2score = model_regression_metrics(y_test, y_pred)

In [None]:
resultsDF.loc[indexer] = [ml_model, mae, mse, rmse, r2score, cv_avg, cv_std]
resultsDF

In [None]:
print("Model average Score and 95 percent confidence interval: %0.5f (+/- %0.5f)" % (results.mean(),  results.std()* 2))

INSIGHT: The Decision Tree Regressor with max_depth=5 performed worse than the unpruned Decision Tree Regressor. There is an opportunity for hyperparameter tuning here to find the best parameters of the model.

It is important to examine the features of importance in the decision tree regressor

In [None]:
model_regression_plotter(y_test, y_pred)

In [None]:
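#Feature importances of the unpruned tree; make_pipeline names each step after its lowercased class name
importances = pipe_dt_rgr.named_steps['decisiontreeregressor'].feature_importances_
print("Important Features:", pd.Series(importances, index=X_train.columns).sort_values(ascending=False))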

In [None]:
#Implementing RandomForest Regressor

ml_model = 'RandomForest Regressor'
indexer += 1

pipe_rf_rgr = make_pipeline(rb_scaler, RandomForestRegressor(random_state=seed))
results = cross_val_score(pipe_rf_rgr, X, y, cv=kfold)

cv_avg = results.mean()
cv_std = results.std()

In [None]:
pipe_rf_rgr.fit(X_train, y_train)

y_pred= pipe_rf_rgr.predict(X_test)
mae, mse, rmse, r2score = model_regression_metrics(y_test, y_pred)

In [None]:
resultsDF.loc[indexer] = [ml_model, mae, mse, rmse, r2score, cv_avg, cv_std]
resultsDF

In [None]:
print("Model average Score and 95 percent confidence interval: %0.5f (+/- %0.5f)" % (results.mean(),  results.std()* 2))

INSIGHT: RandomForest Regressor has given the best prediction so far, followed by the unpruned Decision Tree. We will examine whether SVR and GBR exceed this performance.

In [None]:
#Implementing SupportVector Regressor

ml_model = 'SupportVector Regressor'
indexer += 1

pipe_svr = make_pipeline(rb_scaler, SVR(C=1.0, epsilon=0.2))
results = cross_val_score(pipe_svr, X, y, cv=kfold)

cv_avg = results.mean()
cv_std = results.std()

In [None]:
pipe_svr.fit(X_train, y_train)

y_pred= pipe_svr.predict(X_test)
mae, mse, rmse, r2score = model_regression_metrics(y_test, y_pred)

In [None]:
resultsDF.loc[indexer] = [ml_model, mae, mse, rmse, r2score, cv_avg, cv_std]
resultsDF

In [None]:
print("Model average Score and 95 percent confidence interval: %0.5f (+/- %0.5f)" % (results.mean(),  results.std()* 2))

In [None]:
#Implementing GradientBoost Regressor

ml_model = 'GradientBoost Regressor'
indexer += 1

pipe_gdr = make_pipeline(rb_scaler, GradientBoostingRegressor(random_state=seed))
results = cross_val_score(pipe_gdr, X, y, cv=kfold)

cv_avg = results.mean()
cv_std = results.std()

In [None]:
pipe_gdr.fit(X_train, y_train)

y_pred= pipe_gdr.predict(X_test)
mae, mse, rmse, r2score = model_regression_metrics(y_test, y_pred)

In [None]:
resultsDF.loc[indexer] = [ml_model, mae, mse, rmse, r2score, cv_avg, cv_std]
resultsDF

In [None]:
print("Model average Score and 95 percent confidence interval: %0.5f (+/- %0.5f)" % (results.mean(),  results.std()* 2))

In [None]:
model_regression_plotter(y_test, y_pred)

INSIGHT: Gradient Boost Regressor gave the best R2 score (coefficient of determination) of all the models used, as well as the lowest Root Mean Square Error and Mean Square Error. It is followed by RandomForest Regressor.

------ Iteration 2 on Linear Regression ---------------

In [None]:
#Can Lasso or Ridge Regression be used to improve the results from the best LinearRegression Model derived 
#from Polynomial Features?

ml_model = 'Lasso Regression - Poly'
indexer += 1

lasso = Lasso(alpha=0.01)

pipe_lr_poly_lasso = make_pipeline(rb_scaler, Lasso(alpha=0.01))
results = cross_val_score(pipe_lr_poly_lasso, X_train2, y_train, cv=kfold)

cv_avg = results.mean()
cv_std = results.std()

In [None]:
pipe_lr_poly_lasso.fit(X_train2, y_train)

y_pred = pipe_lr_poly_lasso.predict(X_test2)

In [None]:
mae, mse, rmse, r2score = model_regression_metrics(y_test, y_pred)

In [None]:
resultsDF.loc[indexer] = [ml_model, mae, mse, rmse, r2score, cv_avg, cv_std]
resultsDF

In [None]:
print("Model average Score and 95 percent confidence interval: %0.5f (+/- %0.5f)" % (results.mean(),  results.std()* 2))

INSIGHT: Lasso on the polynomial features improves on the performance of plain LinearRegression. This implies the feature combinations have better predictive capability than the raw features alone. Thus, we need to find the features with higher contribution to concrete compressive strength within the mix of n

In [None]:
ml_model = 'Ridge Regression - Poly'
indexer += 1

pipe_lr_poly_ridge = make_pipeline(rb_scaler, Ridge(alpha=.3))
results = cross_val_score(pipe_lr_poly_ridge, X_train2, y_train, cv=kfold)

cv_avg = results.mean()
cv_std = results.std()

In [None]:
pipe_lr_poly_ridge.fit(X_train2, y_train)

y_pred = pipe_lr_poly_ridge.predict(X_test2)

In [None]:
mae, mse, rmse, r2score = model_regression_metrics(y_test, y_pred)

In [None]:
resultsDF.loc[indexer] = [ml_model, mae, mse, rmse, r2score, cv_avg, cv_std]
resultsDF

In [None]:
#Sorting Algorithms based on performance on Cross validation scores, 
#which is highly correlated with R2Score on Test data
resultsDF.sort_values(by='CV_Score_Avg', ascending=False).reset_index(drop=True)

### INSIGHT: RandomForest Regressor and GradientBoost Regressor performed best, with cross-validation scores of 93% and 92.7% respectively. We will therefore apply hyperparameter tuning next to see if further improvement is obtainable.

### 2. Techniques employed to squeeze that extra performance out of the model without making it over fit. Use Grid Search or Random Search on any of the two models used above. Make a DataFrame to compare models after hyperparameter tuning and their metrics as above. (15 marks)

In [None]:
#Split the training set 75:25 into train and validation sets. This is necessary to avoid information
#leakage: the tuned model is checked on the unseen validation set rather than on data it has already seen
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=1)
X_train.shape, X_test.shape, y_train.shape, y_test.shape, X_val.shape, y_val.shape

In [None]:
#Using GridSearchCV on best two models above - to examine if extra performance can be obtained 
#1. RandomForestRegressor
#2. GradientBoostRegressor

In [None]:
def showTunedModelfit(ml_alg, performCV=True, printFeatureImportance=True, cv_folds=num_folds):
    
    #Refit the tuned estimator on the validation data, which the grid search did not see
    ml_alg.fit(X_val, y_val)

    #Predict on the test set and compute the metrics:
    dtrain_predictions = ml_alg.predict(X_test)
    mae, mse, rmse, r2score = model_regression_metrics(y_test, dtrain_predictions)
   
    
    #Perform cross-validation:
    if performCV:
        cv_score = cross_val_score(ml_alg, X_val, y_val, cv=cv_folds)
    
    #Print model report:
    print ( "\nModel Report")
    print ("R2 Score : %.4g" %  r2score)
    print ("MSE: %f" % mse)
    
    if performCV:
        print("CV Score : Mean - %.7g | Std - %.7g | Min - %.7g | Max - %.7g" % (np.mean(cv_score),np.std(cv_score),np.min(cv_score),np.max(cv_score))) 
        
    #Print Feature Importance:
    predictors = X_val.columns
    if printFeatureImportance:
        feat_imp = pd.Series(ml_alg.feature_importances_, predictors).sort_values(ascending=False)
        feat_imp.plot(kind='bar', title='Feature Importances')
        plt.ylabel('Feature Importance Score')
    
    return mae, mse, rmse, r2score

In [None]:
#Due to performance constraints, we will limit the number and scope of hyperparameters to tune
#RandomForest Regression
#Desired hyperparameters

# 
#"n_estimators":[100,200,300],
# "max_depth": [3, None],
#"max_features": ["auto", "sqrt", "log2"],
#"min_samples_split": [2, 3, 10],
#"min_samples_leaf": [1, 3, 10],
#"bootstrap": [True, False],
#"criterion": ["mse", "mae"]
#

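# RandomForest Regressor - Hyperparameters use a full grid over all parameters
# Note: recent scikit-learn versions use "squared_error"/"absolute_error" (formerly "mse"/"mae"),
# and max_features=None (all features) replaces the removed "auto" option for regressors
rfr_param_grid = {"n_estimators":[100,200],
                  "max_depth": [3, None],
                  "max_features": [None, "sqrt"],
                  "bootstrap": [True, False],
                  "criterion": ["squared_error", "absolute_error"]}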


#
#"max_depth": [3, None],
#"max_features": ["auto", "sqrt", "log2"],
#"min_samples_split": [2, 3, 10],
# "min_samples_leaf": [1, 3, 10],
# "learning_rate": [0.1, 0.3, 0.5],
# "criterion": ["friedman_mse", "mse", "mae"],
#"loss":["ls", "lad", "huber", "quantile"],
# "n_estimators":[100,200,300]
#


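#use grid over parameters for GradientBoost Regressor
# Note: recent scikit-learn versions renamed "mse" to "squared_error", "ls"/"lad" to
# "squared_error"/"absolute_error", and removed the "mae" criterion and the "auto" max_features option
gdr_param_grid = {"max_depth": [3, None],
                  "max_features": [None, "sqrt"],
                  "criterion": ["friedman_mse", "squared_error"],
                  "loss":["squared_error", "absolute_error"],
                  "n_estimators":[100,200]}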

#Implement grid search
rfr = RandomForestRegressor(random_state=seed)
rfr_gs = GridSearchCV(rfr,rfr_param_grid,cv=num_folds)

rfr_gs.fit(X_train, y_train)
rfr_gs.best_params_
rfr_gs.cv_results_['params']
rfr_gs.cv_results_['mean_test_score']

#Create visual display of the best estimator
mae, mse, rmse, r2score = showTunedModelfit(rfr_gs.best_estimator_)

In [None]:
#Insert Result into DataFrame
ml_model = "Grid-Tuned RandomForest Regressor"
turnedResultsDF = pd.DataFrame({'Model' : [ml_model], 'MAE' : [mae], 'MSE' : [mse],
                      'RMSE' : [rmse], 'R2 Score' : [r2score]})
turnedResultsDF

In [None]:
print(rfr_gs.best_params_)

In [None]:
#Checking for improvement in GradientBoost Regressor

ml_model = "Grid-Tuned GradientBoost Regressor"
gdr = GradientBoostingRegressor(random_state=seed)
gdr_gs = GridSearchCV(gdr,gdr_param_grid,cv=num_folds)

gdr_gs.fit(X_train, y_train)
gdr_gs.best_params_
gdr_gs.cv_results_['params']
gdr_gs.cv_results_['mean_test_score']

#Create visual display of the best estimator
mae, mse, rmse, r2score = showTunedModelfit(gdr_gs.best_estimator_)

In [None]:
turnedResultsDF.loc[1] = [ml_model, mae, mse, rmse, r2score]
turnedResultsDF

In [None]:
print(gdr_gs.best_params_)

### CONCLUSION - 
It can be concluded that the top five most important features or properties affecting the compressive strength of concrete are age, cement, fineagg, coarseagg and water, three of which were also among the top features for linear regression.

Hyperparameter tuning requires substantial computational power, while the default settings of most machine learning algorithms already seem relatively well optimized for fitting and prediction. RandomForest Regressor gave the best result at 93%, closely followed by GradientBoosting Regressor at 92.7%, putting us within the range set by the problem statement.
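
When a full grid is too expensive, a randomized search samples a fixed number of parameter settings instead of trying every combination. The sketch below is a minimal illustration on synthetic data from `make_regression`; the parameter ranges, `n_iter=5` and `cv=3` are assumptions chosen to keep the example fast and are not tuned values for the concrete dataset.

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic regression data standing in for the concrete dataset
X_demo, y_demo = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=42)

# Distributions (or lists) to sample from; the ranges are illustrative only
param_distributions = {
    "n_estimators": randint(20, 60),
    "max_depth": [3, 5, None],
    "min_samples_leaf": randint(1, 5),
}

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=42),
    param_distributions=param_distributions,
    n_iter=5,          # only 5 sampled settings instead of a full grid
    cv=3,
    scoring="r2",
    random_state=42,
)
search.fit(X_demo, y_demo)

print(search.best_params_)
print(round(search.best_score_, 3))
```

The cost is controlled directly by `n_iter` times the number of folds, so the budget can be fixed in advance regardless of how many hyperparameters are included.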