## What is bias in machine learning?
- Bias is a phenomenon that skews the result of an algorithm in favor of or against an idea.

- Bias is a systematic error in the machine learning model itself, caused by incorrect or overly simple assumptions in the ML process (for example, assuming a linear relationship when the true relationship is curved).

- Technically, we can define bias as the difference between the average model prediction and the ground truth, where the average is taken over models trained on different training sets. It also describes how well the model fits the training data set:

    - A model with high bias does not fit the training data set closely.
    - A model with low bias fits the training data set closely.
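
Because this definition averages over models trained on different training sets, bias can be estimated by simulation when the true function is known. The sketch below is a hypothetical setup, not part of the original example: it assumes a ground truth of $\sin(2\pi x)$, Gaussian noise, and NumPy polynomial fits (`np.polynomial.Polynomial.fit`) standing in for the model. `estimate_bias_sq` is an illustrative helper name.

```python
import numpy as np


def true_f(x):
    # Assumed ground-truth function for this simulation
    return np.sin(2 * np.pi * x)


def estimate_bias_sq(degree, n_datasets=200, n_train=30, noise=0.3, seed=0):
    """Estimate the squared bias of a polynomial fit, averaged over evaluation points."""
    rng = np.random.default_rng(seed)
    x_eval = np.linspace(0.05, 0.95, 50)       # points where bias is measured
    preds = np.empty((n_datasets, x_eval.size))
    for d in range(n_datasets):
        # Draw a fresh noisy training set from the same distribution
        x = rng.uniform(0, 1, n_train)
        y = true_f(x) + rng.normal(0, noise, n_train)
        preds[d] = np.polynomial.Polynomial.fit(x, y, degree)(x_eval)
    avg_pred = preds.mean(axis=0)              # average model prediction per point
    return np.mean((avg_pred - true_f(x_eval)) ** 2)


bias_line = estimate_bias_sq(degree=1)
bias_cubic = estimate_bias_sq(degree=3)
print(f"squared bias, degree 1: {bias_line:.4f}")
print(f"squared bias, degree 3: {bias_cubic:.4f}")
```

A straight line cannot bend to follow the sine curve, so even its average prediction stays far from the ground truth (high bias), while the cubic's average prediction tracks it much more closely.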




## What is variance in machine learning?
- Variance refers to how much the learned model changes when it is trained on different portions of the training data set.

- Simply stated, variance is the variability of the model's predictions: how much the learned function changes depending on the data set it is given. High variance typically comes from highly complex models, for example models with a large number of features or parameters.


    - Models with high bias tend to have low variance.
    - Models with high variance tend to have low bias.
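
The trade-off in the two bullets above can be checked with a small simulation. This is a minimal sketch under assumptions not taken from the original: the ground truth is $\sin(2\pi x)$, each training set reuses the same 20 evenly spaced inputs with fresh Gaussian noise, and NumPy polynomial fits of increasing degree stand in for models of increasing complexity. `bias_variance` is a hypothetical helper name.

```python
import numpy as np


def true_f(x):
    # Assumed ground-truth function for this simulation
    return np.sin(2 * np.pi * x)


def bias_variance(degree, n_datasets=200, n_train=20, noise=0.3, seed=0):
    """Return (squared bias, variance) of a polynomial fit, averaged over evaluation points."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0, 1, n_train)             # fixed inputs; only the noise changes per data set
    x_eval = np.linspace(0.1, 0.9, 50)
    preds = np.empty((n_datasets, x_eval.size))
    for d in range(n_datasets):
        y = true_f(x) + rng.normal(0, noise, n_train)
        preds[d] = np.polynomial.Polynomial.fit(x, y, degree)(x_eval)
    avg_pred = preds.mean(axis=0)
    bias_sq = np.mean((avg_pred - true_f(x_eval)) ** 2)
    variance = np.mean(preds.var(axis=0))      # spread of predictions across data sets
    return bias_sq, variance


for degree in (1, 3, 9):
    b, v = bias_variance(degree)
    print(f"degree {degree}: bias^2 = {b:.4f}, variance = {v:.4f}")
```

As the degree grows, the squared bias should shrink while the variance grows; the exact numbers depend on the seed and the noise level.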


### Underfitting
Underfitting occurs when the model is unable to capture the relationship between the input data and the target. This happens when the model is not complex enough to represent the patterns in the data, so it performs poorly even on the training dataset.
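
A minimal sketch of underfitting, assuming synthetic data with a quadratic relationship: a straight line (degree 1) cannot follow the curve, so its error stays high even on the data it was trained on, while a degree-2 polynomial fits well. The data and the `train_mse` helper are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical training data: a quadratic relationship plus small Gaussian noise
x = np.linspace(-3, 3, 100)
y = x ** 2 + rng.normal(0, 0.5, x.size)


def train_mse(degree):
    """Fit a polynomial of the given degree and return its error on the training data."""
    model = np.polynomial.Polynomial.fit(x, y, degree)
    return np.mean((model(x) - y) ** 2)


mse_line = train_mse(1)   # too simple: a straight line cannot follow the curve
mse_quad = train_mse(2)   # matches the true shape of the data
print(f"training MSE, degree 1: {mse_line:.2f}")
print(f"training MSE, degree 2: {mse_quad:.2f}")
```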

### Overfitting
- Overfitting occurs when the model fits patterns that do not actually exist, such as random noise in the training data. This typically happens with highly complex models, which can match almost every training data point and therefore perform well on the training dataset. However, such a model fails to generalize to unseen data points in the test dataset and cannot predict their outcomes accurately.
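
To make the training-versus-test gap concrete, here is a small hypothetical experiment on synthetic data: a degree-3 and a degree-15 polynomial are fit to 20 noisy samples of $\sin(2\pi x)$ and scored on fresh test points drawn from the same range. The ground truth, noise level, and degrees are assumptions chosen for illustration.

```python
import numpy as np


def true_f(x):
    # Assumed ground-truth function for this experiment
    return np.sin(2 * np.pi * x)


rng = np.random.default_rng(0)
# Small noisy training set and a larger held-out test set from the same distribution
x_train = np.linspace(0, 1, 20)
y_train = true_f(x_train) + rng.normal(0, 0.3, x_train.size)
x_test = rng.uniform(0, 1, 500)
y_test = true_f(x_test) + rng.normal(0, 0.3, x_test.size)


def mse(model, x, y):
    return np.mean((model(x) - y) ** 2)


results = {}
for degree in (3, 15):
    model = np.polynomial.Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(f"degree {degree:2d}: train MSE = {results[degree][0]:.3f}, "
          f"test MSE = {results[degree][1]:.3f}")
```

With this setup the degree-15 fit should show the lower training error but the higher test error; the exact numbers depend on the random seed.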

### Considering bias & variance is crucial

- Bias and variance are two key components that you must consider when developing any good, accurate machine learning model.

   - Bias creates consistent (systematic) errors, which usually indicates that the model is too simple for the requirements of the task.
   - Variance, on the other hand, creates errors that change with the training data: the model sees trends or data points that do not exist, which leads to incorrect predictions on new data.
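
For squared-error loss, this trade-off is commonly made precise by the bias-variance decomposition. Assuming targets are generated as $y = f(x) + \varepsilon$, where the noise $\varepsilon$ has mean zero and variance $\sigma^2$ and is independent of the training set $\mathcal{D}$, and writing $\hat f_{\mathcal{D}}$ for the model trained on $\mathcal{D}$, the expected error at a point $x$ splits into three terms:

$$
\mathbb{E}_{\mathcal{D},\varepsilon}\left[\left(y - \hat f_{\mathcal{D}}(x)\right)^2\right]
= \underbrace{\left(\mathbb{E}_{\mathcal{D}}\left[\hat f_{\mathcal{D}}(x)\right] - f(x)\right)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}_{\mathcal{D}}\left[\left(\hat f_{\mathcal{D}}(x) - \mathbb{E}_{\mathcal{D}}\left[\hat f_{\mathcal{D}}(x)\right]\right)^2\right]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
$$

Only the first two terms depend on the model, so lowering bias by adding complexity helps only as long as the variance does not grow by more.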



```python
# Requires: pip install scikit-learn tensorflow mlxtend
# Load the necessary libraries
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import tensorflow as tf
from mlxtend.evaluate import bias_variance_decomp
import warnings
warnings.filterwarnings('ignore')

# Load the dataset
X, y = fetch_california_housing(return_X_y=True)

# Split train and test dataset
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.25,
                                                    random_state=23,
                                                    shuffle=True)

# Standardize features (fit on training data only); dense networks train poorly on unscaled inputs
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Build the regression model: one hidden layer of 64 ReLU units and a linear output
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation=tf.nn.relu),
    tf.keras.layers.Dense(1)
])

# Set optimizer and loss
optimizer = tf.keras.optimizers.Adam()
model.compile(loss='mean_squared_error',
              optimizer=optimizer)

# Train the model
model.fit(X_train, y_train, epochs=25, verbose=0)

# Evaluate: with this compile() setup, evaluate() returns the test MSE, not an accuracy
test_mse = model.evaluate(X_test, y_test, verbose=0)
print('Test MSE: %.2f' % test_mse)

# Bias-variance decomposition; `epochs` and `verbose` are forwarded to model.fit().
# num_rounds defaults to 200 training rounds, so this step can take a while.
avg_expected_loss, avg_bias, avg_var = bias_variance_decomp(model,
                                                            X_train, y_train,
                                                            X_test, y_test,
                                                            loss='mse',
                                                            random_seed=23,
                                                            epochs=5,
                                                            verbose=0)

# Print the result
print('Average expected loss: %.2f' % avg_expected_loss)
print('Average bias: %.2f' % avg_bias)
print('Average variance: %.2f' % avg_var)
```
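
To see what the three printed averages represent, the following NumPy sketch re-implements a simplified bootstrap version of the decomposition. As we understand the mlxtend documentation, `bias_variance_decomp` repeatedly trains the estimator on bootstrap samples of the training set and compares its predictions on the test set; this sketch follows that idea but is not mlxtend's source code. `bootstrap_bias_variance` and the `fit_predict` callback are hypothetical names, and a cubic polynomial on synthetic 1-D data stands in for the Keras model.

```python
import numpy as np


def bootstrap_bias_variance(fit_predict, X_train, y_train, X_test, y_test,
                            num_rounds=100, seed=0):
    """Simplified bootstrap bias-variance decomposition for squared loss.

    fit_predict(X_fit, y_fit, X_pred) trains a fresh model on (X_fit, y_fit)
    and returns its predictions for X_pred.
    """
    rng = np.random.default_rng(seed)
    n = len(y_train)
    all_pred = np.empty((num_rounds, len(y_test)))
    for r in range(num_rounds):
        idx = rng.integers(0, n, n)                   # bootstrap sample (with replacement)
        all_pred[r] = fit_predict(X_train[idx], y_train[idx], X_test)
    main_pred = all_pred.mean(axis=0)                 # average prediction per test point
    expected_loss = np.mean((all_pred - y_test) ** 2)
    bias_sq = np.mean((main_pred - y_test) ** 2)      # measured against the noisy targets
    variance = np.mean((all_pred - main_pred) ** 2)
    return expected_loss, bias_sq, variance


def cubic_fit_predict(X_fit, y_fit, X_pred):
    # Hypothetical stand-in model: a cubic polynomial fit
    return np.polynomial.Polynomial.fit(X_fit, y_fit, 3)(X_pred)


rng = np.random.default_rng(1)
X = rng.uniform(0, 1, 200)
y = np.sin(2 * np.pi * X) + rng.normal(0, 0.3, X.size)
X_tr, X_te, y_tr, y_te = X[:150], X[150:], y[:150], y[150:]

loss, bias_sq, var = bootstrap_bias_variance(cubic_fit_predict, X_tr, y_tr, X_te, y_te)
print(f"expected loss {loss:.3f} = bias^2 {bias_sq:.3f} + variance {var:.3f}")
```

Because the bias term here is measured against the noisy test targets rather than the unknown true function, it also absorbs the irreducible noise, and for squared loss the expected loss equals bias plus variance exactly in this setup.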
