# Model Training Research: Candidate Models and Considerations

This document outlines candidate models to be considered for our wine quality prediction task, along with key factors to evaluate during the research phase.

## Candidate Models

Here are some potential machine learning models that could be explored for predicting wine quality:

1. **Decision Trees:**
   - **Strengths:** Flexible and can handle both continuous and categorical features without extensive data preprocessing. Robust to outliers.
   - **Weaknesses:** Prone to overfitting if not properly tuned. Shallow trees are easy to read, but deep trees can be harder to interpret than linear models.

2. **Random Forests:**
   - **Strengths:** Ensemble method combining multiple decision trees, leading to improved accuracy and reduced overfitting compared to single decision trees.
   - **Weaknesses:** Interpretability can be challenging due to the ensemble nature. May require hyperparameter tuning for optimal performance.

3. **Support Vector Machines (SVMs):**
   - **Strengths:** Powerful for classification tasks, especially with high-dimensional data. Effective at handling non-linear relationships using kernel functions.
   - **Weaknesses:** Sensitive to outliers and feature scaling. Can be computationally expensive for large datasets.

4. **K-Nearest Neighbors (KNN):**
   - **Strengths:** Simple and easy to implement. No explicit model training required. Effective for both classification and regression.
   - **Weaknesses:** Performance can be affected by the "curse of dimensionality." Sensitive to noisy data and choice of distance metric.

5. **Logistic Regression:**
   - **Strengths:** Simple and interpretable. Suitable for binary classification and extends to multiclass targets. Can handle large datasets efficiently.
   - **Weaknesses:** Assumes a linear relationship between the features and the log-odds of the target. May underperform if the classes are not linearly separable.

6. **Gradient Boosting Machines (GBM):**
   - **Strengths:** Builds decision trees sequentially, focusing on correcting errors made by previous trees. Typically achieves high accuracy.
   - **Weaknesses:** Prone to overfitting if not properly regularized. Requires careful tuning of hyperparameters.

7. **Naive Bayes:**
   - **Strengths:** Simple and fast to train. Performs well on text classification tasks and with categorical features.
   - **Weaknesses:** Assumes independence between features, which may not hold true in practice. Can be sensitive to imbalanced class distributions.

8. **Neural Networks:**
   - **Strengths:** Capable of learning complex patterns in data. Can handle high-dimensional inputs and non-linear relationships.
   - **Weaknesses:** Requires large amounts of data for training. May suffer from overfitting if not properly regularized. Computationally intensive.

## Selection Criteria

When evaluating these models, we should consider the following factors:

- **Problem Type:** Since we're predicting wine quality (likely a categorical variable), we'll be focusing on classification models.
- **Data Characteristics:**
  - Feature types (continuous, categorical)
  - Data size and dimensionality
  - Presence of missing values or outliers
- **Model Performance:**
  - Accuracy, precision, recall, F1-score (depending on class imbalance)
  - Overfitting potential
- **Interpretability:**
  - Importance of understanding model predictions and feature relationships
  - Trade-off between accuracy and interpretability
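
To illustrate why the choice of metric depends on class imbalance, here is a minimal sketch in plain Python. The label counts are made up for illustration (they are not the wine data), and `accuracy` and `macro_f1` are hypothetical helper names. A baseline that always predicts the majority class looks good on accuracy but poor on macro-averaged F1.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1; a class that is never hit scores 0."""
    scores = []
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical label counts: 10 low (0), 80 medium (1), 10 high (2).
y_true = [0] * 10 + [1] * 80 + [2] * 10
# A baseline that always predicts the majority class.
y_pred = [1] * 100

baseline_accuracy = accuracy(y_true, y_pred)
baseline_macro_f1 = macro_f1(y_true, y_pred)
print(f"accuracy={baseline_accuracy:.3f}, macro F1={baseline_macro_f1:.3f}")
```

Accuracy alone would rate this baseline at 0.8 even though it never identifies a low or high quality wine, which is why per-class or averaged metrics are worth reporting alongside it.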

## Research Plan

The research will involve:

1. **Data Preprocessing:** Exploring data cleaning, handling missing values, and feature scaling if necessary.
2. **Model Training and Evaluation:** Implementing and training each candidate model with appropriate hyperparameter tuning. Evaluating performance metrics on a separate validation set.
3. **Model Selection:** Choosing the model with the best balance of accuracy, interpretability, and suitability for the data.
4. **Model Interpretation:** Analyzing the chosen model to understand how features influence predictions (if applicable).

Through this research, we aim to identify the most effective model for predicting wine quality based on the given dataset and evaluation criteria.
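
The preprocessing and evaluation steps of the plan could be sketched roughly as follows. This is an illustration only: the data is a synthetic stand-in generated with `make_classification` (not the wine dataset), and the choice of logistic regression and 5 folds is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the wine data (an assumption, not the real dataset):
# 11 numeric features and 3 classes.
X, y = make_classification(
    n_samples=300, n_features=11, n_informative=6,
    n_classes=3, random_state=0,
)

# Scaling lives inside the pipeline, so each CV fold fits the scaler
# on its own training portion only.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")
print("Fold accuracies:", scores.round(3))
print("Mean accuracy:", scores.mean().round(3))
```

Any of the candidate classifiers can be swapped in for `LogisticRegression` without changing the rest of the sketch.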


In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from catboost import CatBoostClassifier
from xgboost import XGBClassifier

In [2]:
df = pd.read_csv("new_df_wine.csv")
df = df.drop('Unnamed: 0',axis = 1)
df.head()

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
0,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,medium
1,7.8,0.88,0.0,2.6,0.098,25.0,67.0,0.9968,3.2,0.68,9.8,medium
2,7.8,0.76,0.04,2.3,0.092,15.0,54.0,0.997,3.26,0.65,9.8,medium
3,11.2,0.28,0.56,1.9,0.075,17.0,60.0,0.998,3.16,0.58,9.8,medium
4,7.4,0.66,0.0,1.8,0.075,13.0,40.0,0.9978,3.51,0.56,9.4,medium


I will encode my target variable as integers, since some of the models cannot handle string labels.

In [3]:
label_mapping = {'low': 0, 'medium': 1, 'high': 2}
# Map the labels to numerical values
df['quality'] = df['quality'].map(label_mapping)
df.head()

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
0,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,1
1,7.8,0.88,0.0,2.6,0.098,25.0,67.0,0.9968,3.2,0.68,9.8,1
2,7.8,0.76,0.04,2.3,0.092,15.0,54.0,0.997,3.26,0.65,9.8,1
3,11.2,0.28,0.56,1.9,0.075,17.0,60.0,0.998,3.16,0.58,9.8,1
4,7.4,0.66,0.0,1.8,0.075,13.0,40.0,0.9978,3.51,0.56,9.4,1
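
One thing worth checking after a dictionary-based encoding is whether every label was covered. A small sketch, using made-up labels rather than the real column: `Series.map` yields `NaN` for any value missing from the mapping, so counting missing values afterwards reveals typos or unexpected categories.

```python
import pandas as pd

label_mapping = {"low": 0, "medium": 1, "high": 2}

# Hypothetical labels; "Medium" (capitalised) is deliberately not in the mapping.
quality = pd.Series(["low", "medium", "high", "Medium"])
encoded = quality.map(label_mapping)

# Series.map returns NaN for values that have no key in the mapping.
unmapped = quality[encoded.isna()].unique().tolist()
print("Unmapped labels:", unmapped)
```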


#### Train-test split
Splitting before any scaling, so that preprocessing statistics are computed from the training set only. This avoids data leakage.

In [4]:
def train_test(df, target_column, test_size, random_state):
    """
    Perform a train-test split on a DataFrame.

    Parameters:
    df (DataFrame): The DataFrame containing features and target variable.
    target_column (str): The name of the target column.
    test_size (float): The proportion of the dataset to include in the test split.
    random_state (int): Random state for reproducibility.

    Returns:
    X_train (DataFrame): The features for training.
    X_test (DataFrame): The features for testing.
    y_train (Series): The target variable for training.
    y_test (Series): The target variable for testing.
    """
    # Separate the features from the target column
    X = df.drop(target_column, axis=1)
    y = df[target_column]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    return X_train, X_test, y_train, y_test

In [5]:
X_train, X_test, y_train, y_test = train_test(df,'quality',0.2,42)

In [6]:
X_train.shape, y_test.shape

((1087, 11), (272,))
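
Because the classes appear to be imbalanced, a stratified split may be worth considering. The sketch below is not what the notebook does (the split above is unstratified). It uses made-up label counts to show how `stratify=y` keeps the class proportions the same in both subsets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels (not the real class counts): 10% / 80% / 10%.
y = np.array([0] * 100 + [1] * 800 + [2] * 100)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder feature column

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# With stratify=y, each class keeps its share in both subsets.
train_share = np.bincount(y_tr) / len(y_tr)
test_share = np.bincount(y_te) / len(y_te)
print("train:", train_share, "test:", test_share)
```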

In [7]:
y_train

865     2
1289    1
394     1
731     1
54      1
       ..
1095    1
1130    1
1294    1
860     1
1126    1
Name: quality, Length: 1087, dtype: int64

In [13]:
len(list(X_train.columns))

11

In [12]:
len(X_train.select_dtypes(exclude="object").columns)

11

In [8]:
from sklearn.compose import ColumnTransformer
num_features = list(X_train.columns)
numeric_transformer = StandardScaler()

In [9]:
preprocessor = ColumnTransformer(
    [
         ("StandardScaler", numeric_transformer, num_features)       
    ]
)

In [10]:
X_train = preprocessor.fit_transform(X_train)
X_test = preprocessor.transform(X_test)

X_train.shape,X_test.shape

((1087, 11), (272, 11))
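
What `fit_transform` on the training set followed by `transform` on the test set amounts to can be sketched with plain NumPy. The arrays are random placeholders, not the wine features. The mean and standard deviation come from the training split only and are reused unchanged for the test split.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical raw features standing in for the wine columns.
train = rng.normal(loc=10.0, scale=2.0, size=(800, 3))
test = rng.normal(loc=10.0, scale=2.0, size=(200, 3))

# Statistics are estimated on the training split only ...
mean = train.mean(axis=0)
std = train.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# ... and then reused unchanged for the test split.
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print("train mean:", train_scaled.mean(axis=0).round(3))
print("test mean: ", test_scaled.mean(axis=0).round(3))
```

The scaled training columns have mean 0 and standard deviation 1 by construction. The test columns are only approximately standardized, which is expected and is the point of not refitting on them.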

In [11]:
X_train,X_test

(array([[ 0.35162311, -0.82832312,  0.64877105, ..., -0.25982192,
          0.51985077,  1.98424001],
        [-0.98154785,  0.96589124, -0.98983279, ...,  0.85726724,
         -0.45848943, -0.2158671 ],
        [-0.86561994,  0.18092246, -1.24586464, ...,  0.85726724,
         -0.17074231, -0.39920936],
        ...,
        [-0.86561994, -0.32370033, -0.98983279, ..., -0.06268854,
         -0.05564346, -0.76589387],
        [ 0.35162311, -1.16473831,  0.18791372, ..., -0.91693319,
         -0.6311377 , -0.03252484],
        [ 0.46755102, -1.05259991,  0.75118379, ..., -0.85122206,
         -0.6311377 ,  0.88418646]]),
 array([[-0.34394435,  0.51733765, -1.19465827, ...,  0.20015597,
         -0.74623655, -0.857565  ],
        [-0.05412458,  0.60144145, -0.88742005, ..., -0.12839966,
          0.51985077,  0.42583081],
        [ 0.06180333, -0.88439232,  0.80239016, ..., -0.91693319,
          0.86514732,  0.7008442 ],
        ...,
        [ 0.35162311, -0.71618472,  0.23912009, ...,  

In [12]:
y_train

865     2
1289    1
394     1
731     1
54      1
       ..
1095    1
1130    1
1294    1
860     1
1126    1
Name: quality, Length: 1087, dtype: int64

#### Evaluation and model training:

In [13]:
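def evaluate_model(true, predicted):
    """Return accuracy and support-weighted precision, recall and F1."""
    accuracy = accuracy_score(true, predicted)
    # average='weighted' is needed because the target has three classes:
    # per-class scores are averaged, weighted by the number of true samples per class
    precision = precision_score(true, predicted, average='weighted')
    recall = recall_score(true, predicted, average='weighted')
    f1 = f1_score(true, predicted, average='weighted')
    return accuracy, precision, recall, f1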
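
A short derivation may help when reading the reported metrics: with `average='weighted'`, recall reduces to accuracy, so the two values always coincide. Here $n_k$ denotes the number of true samples of class $k$, $TP_k$ the number of those predicted correctly, and $N = \sum_k n_k$ the total number of samples (notation introduced here for illustration).

```latex
\mathrm{Recall}_{\text{weighted}}
  = \sum_{k} \frac{n_k}{N} \cdot \frac{TP_k}{n_k}
  = \frac{1}{N} \sum_{k} TP_k
  = \mathrm{Accuracy}
```

Weighted recall therefore adds no information beyond accuracy. Macro-averaged or per-class recall would show how the minority classes fare.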

In [16]:
models = {
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(),
    "Support Vector Machine": SVC(),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Logistic Regression": LogisticRegression(),
    "Gradient Boosting": GradientBoostingClassifier(),
    "Gaussian Naive Bayes": GaussianNB(),
    "Multilayer Perceptron": MLPClassifier(),
    "CatBoost Classifier": CatBoostClassifier(),
    "XGBoost Classifier": XGBClassifier()
}

model_list = []
accuracy_list = []
_accuracy_train = []

for model_name, model in models.items():
    model.fit(X_train, y_train) # Train model

    # Make predictions
    y_train_pred = model.predict(X_train)
    y_test_pred = model.predict(X_test)
    
    # Evaluate Train and Test dataset
    accuracy_train, precision_train, recall_train, f1_train = evaluate_model(y_train, y_train_pred)
    accuracy_test, precision_test, recall_test, f1_test = evaluate_model(y_test, y_test_pred)

    print(model_name)
    model_list.append(model_name)
    
    print('Model performance for Training set')
    print("- Accuracy: {:.4f}".format(accuracy_train))
    print("- Precision: {:.4f}".format(precision_train))
    print("- Recall: {:.4f}".format(recall_train))
    print("- F1 Score: {:.4f}".format(f1_train))
    print('----------------------------------')
    
    print('Model performance for Test set')
    print("- Accuracy: {:.4f}".format(accuracy_test))
    print("- Precision: {:.4f}".format(precision_test))
    print("- Recall: {:.4f}".format(recall_test))
    print("- F1 Score: {:.4f}".format(f1_test))
    print('='*35)
    print('\n')

    _accuracy_train.append(accuracy_train)
    accuracy_list.append(accuracy_test)

Decision Tree
Model performance for Training set
- Accuracy: 1.0000
- Precision: 1.0000
- Recall: 1.0000
- F1 Score: 1.0000
----------------------------------
Model performance for Test set
- Accuracy: 0.7574
- Precision: 0.7615
- Recall: 0.7574
- F1 Score: 0.7592






Random Forest
Model performance for Training set
- Accuracy: 1.0000
- Precision: 1.0000
- Recall: 1.0000
- F1 Score: 1.0000
----------------------------------
Model performance for Test set
- Accuracy: 0.8529
- Precision: 0.7997
- Recall: 0.8529
- F1 Score: 0.8150


Support Vector Machine
Model performance for Training set
- Accuracy: 0.8445
- Precision: 0.8426
- Recall: 0.8445
- F1 Score: 0.8025
----------------------------------
Model performance for Test set
- Accuracy: 0.8640
- Precision: 0.8125
- Recall: 0.8640
- F1 Score: 0.8301


K-Nearest Neighbors
Model performance for Training set
- Accuracy: 0.8638
- Precision: 0.8460
- Recall: 0.8638
- F1 Score: 0.8451
----------------------------------
Model performance for Test set
- Accuracy: 0.8199
- Precision: 0.7761
- Recall: 0.8199
- F1 Score: 0.7966


Logistic Regression
Model performance for Training set
- Accuracy: 0.8362
- Precision: 0.8161
- Recall: 0.8362
- F1 Score: 0.8049
----------------------------------
Model performance f



Multilayer Perceptron
Model performance for Training set
- Accuracy: 0.8712
- Precision: 0.8601
- Recall: 0.8712
- F1 Score: 0.8535
----------------------------------
Model performance for Test set
- Accuracy: 0.8456
- Precision: 0.8282
- Recall: 0.8456
- F1 Score: 0.8272


Learning rate set to 0.079464
0:	learn: 1.0161531	total: 4.06ms	remaining: 4.05s
1:	learn: 0.9523774	total: 7.53ms	remaining: 3.76s
2:	learn: 0.8944809	total: 11.6ms	remaining: 3.84s
3:	learn: 0.8453778	total: 16.1ms	remaining: 4.01s
4:	learn: 0.8029076	total: 20.5ms	remaining: 4.08s
5:	learn: 0.7652276	total: 24.4ms	remaining: 4.05s
6:	learn: 0.7325721	total: 29ms	remaining: 4.11s
7:	learn: 0.7032581	total: 33.7ms	remaining: 4.17s
8:	learn: 0.6784298	total: 38.1ms	remaining: 4.2s
9:	learn: 0.6550096	total: 41.9ms	remaining: 4.15s
10:	learn: 0.6338475	total: 45.7ms	remaining: 4.11s
11:	learn: 0.6137057	total: 50.2ms	remaining: 4.13s
12:	learn: 0.5969070	total: 53.7ms	remaining: 4.08s
13:	learn: 0.5800949	total: 57.5

In [17]:
results = pd.DataFrame(list(zip(model_list, accuracy_list, _accuracy_train)), 
                       columns=['Model Name', 'Accuracy Test', 'Accuracy Train']).sort_values(by=["Accuracy Test"],
                                                                       ascending=False)
results

Unnamed: 0,Model Name,Accuracy Test,Accuracy Train
2,Support Vector Machine,0.863971,0.844526
5,Gradient Boosting,0.863971,0.964121
1,Random Forest,0.852941,1.0
7,Multilayer Perceptron,0.845588,0.871205
8,CatBoost Classifier,0.841912,1.0
4,Logistic Regression,0.838235,0.836247
9,XGBoost Classifier,0.830882,1.0
3,K-Nearest Neighbors,0.819853,0.863845
6,Gaussian Naive Bayes,0.797794,0.762649
0,Decision Tree,0.757353,1.0
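
To make the overfitting comparison explicit, the gap between train and test accuracy can be computed directly from the table above. This is a small sketch in plain Python with the values copied from that table; the `gaps` name is just for illustration.

```python
# (test accuracy, train accuracy) copied from the results table above.
results = {
    "Support Vector Machine": (0.863971, 0.844526),
    "Gradient Boosting": (0.863971, 0.964121),
    "Random Forest": (0.852941, 1.0),
    "Multilayer Perceptron": (0.845588, 0.871205),
    "CatBoost Classifier": (0.841912, 1.0),
    "Logistic Regression": (0.838235, 0.836247),
    "XGBoost Classifier": (0.830882, 1.0),
    "K-Nearest Neighbors": (0.819853, 0.863845),
    "Gaussian Naive Bayes": (0.797794, 0.762649),
    "Decision Tree": (0.757353, 1.0),
}

# Positive gap: the model scores better on data it has already seen.
gaps = {name: train - test for name, (test, train) in results.items()}

for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name:<24} gap = {gap:+.3f}")
```

Models near the top of this listing (small or negative gap) show little sign of overfitting, while the tree-based models with perfect training accuracy sit at the bottom.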


## Top 3 Models:

1. **Support Vector Machine (SVM):**
   - **Accuracy:** Test (0.864) and train (0.845) accuracy are close, indicating little overfitting. Test accuracy slightly above train accuracy is plausible with a test set of only 272 samples.
   - **Interpretability:** SVMs are well established, but only a linear-kernel SVM exposes feature weights directly. The default RBF kernel used here is harder to explain.
   - **Reasoning:** The small gap between train and test accuracy suggests good generalization, making the SVM a reliable candidate.

2. **Gradient Boosting:**
   - **Accuracy:** Test accuracy (0.864) matches the SVM.
   - **Train accuracy:** At 0.964, it is about 0.10 above test accuracy, indicating a moderate degree of overfitting.
   - **Interpretability:** Gradient Boosting offers some interpretability through feature importances, which helps in understanding what drives the predictions.
   - **Reasoning:** Despite the moderate overfitting, Gradient Boosting offers competitive accuracy and partial interpretability.

3. **Logistic Regression:**
   - **Accuracy:** Test accuracy (0.838) is lower than SVM and Gradient Boosting, but it might be sufficient depending on the application's requirements.
   - **Generalization:** Train and test accuracy are very close (0.836 vs 0.838), suggesting good generalization and little overfitting.
   - **Interpretability:** Logistic Regression is highly interpretable: its coefficients show the direction and strength of each feature's effect.
   - **Reasoning:** Despite slightly lower accuracy, Logistic Regression offers excellent interpretability and generalization, making it a reliable choice where transparency matters most.

## Why not Random Forest, CatBoost, or XGBoost:

- Random Forest, CatBoost, and XGBoost all reach perfect training accuracy (1.000), but their test accuracy (0.853, 0.842, and 0.831 respectively) is below that of SVM and Gradient Boosting (0.864). The large gap between train and test accuracy indicates overfitting.
- With default settings, these models fit the training data perfectly and may generalize less reliably to unseen data. Regularization or tuning could change this picture.

## Factors to Consider:

- **Generalization vs. Overfitting:** Models with a significant gap between training and test accuracy may suffer from overfitting, impacting their performance on new data.
- **Interpretability:** For industries requiring model transparency, interpretability is crucial. SVM, Logistic Regression, and Gradient Boosting offer varying degrees of interpretability.
- **Industry Requirements:** The acceptable accuracy threshold and the importance of interpretability can vary based on the specific industry and application.
- **Computational Efficiency:** Consider the training and prediction time of different models, especially for large datasets or real-time predictions. Logistic Regression is generally fast to train and use. Kernel SVMs are fast on a dataset of this size (1,359 rows) but become expensive as the number of samples grows.

## Conclusion:

- The top three models, SVM, Logistic Regression, and Gradient Boosting, offer a balance between accuracy, interpretability, and generalization, making them suitable for various industrial applications.
- Selecting the best model involves a trade-off between accuracy, interpretability, computational efficiency, and industry-specific requirements, ensuring alignment with the organization's goals and constraints.

## Additional Tips:

- Consider techniques like cross-validation to get a more robust estimate of model performance.
- An ensemble combining SVM, Logistic Regression, and Gradient Boosting may further improve predictive performance and robustness, though this would need to be verified on held-out data.
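
As a rough sketch of the ensemble idea, scikit-learn's `VotingClassifier` can combine the three shortlisted model types by majority vote. The data here is a synthetic stand-in (not the wine dataset), and the settings such as `n_estimators=50` and `max_iter=1000` are illustrative assumptions rather than tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data: 11 numeric features, 3 classes.
X, y = make_classification(
    n_samples=300, n_features=11, n_informative=6,
    n_classes=3, random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Hard voting: each model casts one vote per sample and the majority wins.
# SVM and logistic regression get their own scaler; trees do not need one.
voter = VotingClassifier(
    estimators=[
        ("svm", make_pipeline(StandardScaler(), SVC())),
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
    ],
    voting="hard",
)
voter.fit(X_tr, y_tr)

ensemble_accuracy = voter.score(X_te, y_te)
print("Ensemble test accuracy:", round(ensemble_accuracy, 3))
```

Whether such an ensemble beats its best member depends on how different the members' errors are, so it should be compared against the individual models with cross-validation.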


#### Hyperparameter tuning

In [19]:
from sklearn.model_selection import RandomizedSearchCV

# Define a smaller hyperparameter grid with focus on commonly used values
param_grid_svc = {
    'kernel': ['linear', 'rbf'],  # Start with common choices
    'C': [0.01, 0.1, 1, 10],
    'gamma': ['scale', 'auto', 0.1],  # Focus on relevant gamma values
    'class_weight': [None, 'balanced']
}

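# Use RandomizedSearchCV for an efficient search: by default it samples n_iter=10
# of the 48 combinations, and without random_state the sample differs between runs
svm_cv = RandomizedSearchCV(SVC(), param_grid_svc, cv=5, scoring='accuracy')  # 5-fold cross-validation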

# Fit the model with hyperparameter tuning
svm_cv.fit(X_train, y_train)

# Access the best model with tuned hyperparameters
best_svm = svm_cv.best_estimator_
print("Best Hyperparameters:", best_svm)

Best Hyperparameters: SVC(C=0.01, kernel='linear')


In [20]:
# Calculate accuracy for train and test datasets
train_accuracy = best_svm.score(X_train, y_train)
test_accuracy = best_svm.score(X_test, y_test)

print("Train Accuracy:", train_accuracy)
print("Test Accuracy:", test_accuracy)

Train Accuracy: 0.8178472861085556
Test Accuracy: 0.8198529411764706


In [26]:
# Define a larger hyperparameter grid with more options
param_grid_svc = {
    'kernel': ['linear', 'poly', 'rbf'],#, 'sigmoid'],
    'C': [1, 10, 100],
    'gamma': ['scale', 'auto', 0.01, 0.1, 1],
    'class_weight': [None, 'balanced']
}

# This grid has 3 * 3 * 5 * 2 = 90 combinations, so n_iter=200 exceeds it:
# scikit-learn warns and evaluates all 90, which makes this an exhaustive search
svm_cv = RandomizedSearchCV(SVC(), param_grid_svc, cv=5, scoring='accuracy', n_iter=200)

# Fit the model with hyperparameter tuning
svm_cv.fit(X_train, y_train)

# Access the best model with tuned hyperparameters
best_svm = svm_cv.best_estimator_
print("Best Hyperparameters:", best_svm)

# Calculate accuracy for train and test datasets
train_accuracy = best_svm.score(X_train, y_train)
test_accuracy = best_svm.score(X_test, y_test)

print("Train Accuracy:", train_accuracy)
print("Test Accuracy:", test_accuracy)




Best Hyperparameters: SVC(C=1, gamma=1)
Train Accuracy: 0.9586016559337627
Test Accuracy: 0.8455882352941176
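
Before choosing `n_iter`, it can help to count how many combinations a grid actually contains. A minimal sketch using scikit-learn's `ParameterGrid` on the larger SVM grid from the cell above:

```python
from sklearn.model_selection import ParameterGrid

# The larger SVM grid from the cell above.
param_grid_svc = {
    "kernel": ["linear", "poly", "rbf"],
    "C": [1, 10, 100],
    "gamma": ["scale", "auto", 0.01, 0.1, 1],
    "class_weight": [None, "balanced"],
}

# ParameterGrid enumerates every combination, so len() gives the grid size.
n_candidates = len(ParameterGrid(param_grid_svc))
print(n_candidates)  # 90
```

With `n_iter` larger than this count, a randomized search cannot sample more than the full grid, so it behaves like a grid search.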


In [28]:
param_grid_gbr = {
  'learning_rate': [0.05, 0.1, 0.2],
  'n_estimators': [100, 150, 200],
  'max_depth': [3, 5, 8],
  'min_samples_split': [2, 5],
  'min_samples_leaf': [1, 3],
}

# Use RandomizedSearchCV (default n_iter=10); note that no early stopping is configured here
gbr_cv = RandomizedSearchCV(GradientBoostingClassifier(), param_grid_gbr, cv=5, scoring="accuracy")

# Fit the model with hyperparameter tuning
gbr_cv.fit(X_train, y_train)

# Access the best model with tuned hyperparameters
best_gbr = gbr_cv.best_estimator_

train_accuracy = best_gbr.score(X_train, y_train)
test_accuracy = best_gbr.score(X_test, y_test)

print("Train Accuracy:", train_accuracy)
print("Test Accuracy:", test_accuracy)

Train Accuracy: 1.0
Test Accuracy: 0.8272058823529411
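
Since the tuned model reaches a training accuracy of 1.0, early stopping might be one way to rein in overfitting. The sketch below is not part of the search above. It uses synthetic stand-in data and illustrative values for `validation_fraction` and `n_iter_no_change`, both of which are existing `GradientBoostingClassifier` parameters.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in data: 11 numeric features, 3 classes.
X, y = make_classification(
    n_samples=400, n_features=11, n_informative=6,
    n_classes=3, random_state=0,
)

gb = GradientBoostingClassifier(
    n_estimators=200,         # upper bound on boosting rounds
    validation_fraction=0.1,  # share of the training data held out internally
    n_iter_no_change=10,      # stop after 10 rounds without validation improvement
    tol=1e-4,
    random_state=0,
)
gb.fit(X, y)

# n_estimators_ is the number of rounds actually fitted before stopping.
print("Boosting rounds actually used:", gb.n_estimators_)
```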


In [30]:
best_gbr, best_svm

(GradientBoostingClassifier(max_depth=5, min_samples_leaf=3, min_samples_split=5,
                            n_estimators=200),
 SVC(C=1, gamma=1))

In [73]:
from sklearn.model_selection import GridSearchCV

# Define the logistic regression classifier
logistic_regression = LogisticRegression()

# Options shared by every solver. 'balanced_subsample' is a random forest
# option and is not a valid class_weight for LogisticRegression.
common_params = {
  'C': [0.01, 0.1, 1, 10, 100, 1000],  # Inverse of regularization strength
  'class_weight': [None, 'balanced'],  # Class weighting strategies
  'max_iter': [100, 200, 500]  # Maximum iterations for training
}

# Only valid solver/penalty pairs: lbfgs, sag and newton-cg support L2 only,
# while liblinear supports both L1 and L2.
param_grid_lr = [
  {**common_params, 'solver': ['lbfgs', 'sag', 'newton-cg'], 'penalty': ['l2']},
  {**common_params, 'solver': ['liblinear'], 'penalty': ['l1', 'l2']}
]

# Perform Grid Search Cross Validation
grid_search = GridSearchCV(estimator=logistic_regression, param_grid=param_grid_lr, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

# Get the best parameters
best_params = grid_search.best_params_
print("Best Hyperparameters:", best_params)

# Instantiate a logistic regression model with the best hyperparameters
best_logistic_regression = LogisticRegression(**best_params)

# Fit the model on the training data
best_logistic_regression.fit(X_train, y_train)

# Evaluate the model on the test data
test_accuracy = best_logistic_regression.score(X_test, y_test)
print("Test Accuracy:", test_accuracy)

Best Hyperparameters: {'C': 1, 'class_weight': None, 'max_iter': 100, 'penalty': 'l1', 'solver': 'liblinear'}
Test Accuracy: 0.8382352941176471


In [68]:
best_logistic_regression

In [70]:
import warnings
warnings.filterwarnings("ignore")

# Define the hyperparameter grid
param_grid_lr = {
  'C': [0.001, 0.01, 0.1, 1, 10, 100],  # Regularization parameter
  'solver': ['lbfgs', 'liblinear', 'sag'],  # Optimization algorithms
  'class_weight': [None, 'balanced']  # Handle imbalanced classes (optional)
}

# Define RandomizedSearchCV with Logistic Regression. The grid has only
# 6 * 3 * 2 = 36 combinations, so with n_iter=100 all 36 are evaluated
lr_cv = RandomizedSearchCV(LogisticRegression(), param_grid_lr, cv=5, scoring='accuracy', n_iter=100)

# Fit the model with hyperparameter tuning
lr_cv.fit(X_train, y_train)

# Access the best model with tuned hyperparameters
best_lr = lr_cv.best_estimator_
print("Best Hyperparameters:", best_lr)

# Calculate accuracy on train and test sets
train_accuracy = best_lr.score(X_train, y_train)
test_accuracy = best_lr.score(X_test, y_test)

print("Train Accuracy:", train_accuracy)
print("Test Accuracy:", test_accuracy)


Best Hyperparameters: LogisticRegression(C=1, solver='liblinear')
Train Accuracy: 0.8334866605335787
Test Accuracy: 0.8419117647058824


In [71]:
lg = LogisticRegression()

In [72]:
lg.get_params()

{'C': 1.0,
 'class_weight': None,
 'dual': False,
 'fit_intercept': True,
 'intercept_scaling': 1,
 'l1_ratio': None,
 'max_iter': 100,
 'multi_class': 'auto',
 'n_jobs': None,
 'penalty': 'l2',
 'random_state': None,
 'solver': 'lbfgs',
 'tol': 0.0001,
 'verbose': 0,
 'warm_start': False}