### Box-Cox Custom Implementation

```python
import numpy as np
from scipy.special import log1p
from scipy.stats import boxcox, norm

def compute_log_likelihood(data):
    # Normal log-likelihood of the data, with mean and standard deviation set to
    # their sample estimates (a normal distribution is assumed for illustration).
    # NOTE: for Box-Cox this omits the Jacobian term (lambda - 1) * sum(log(x)),
    # so scores for different lambdas are not directly comparable.
    mu = np.mean(data)
    sigma = np.std(data)
    log_likelihood = np.sum(norm.logpdf(data, loc=mu, scale=sigma))
    return log_likelihood

def boxcox_transform(data):
    # Step 1: Candidate lambda values (integers -5..5 only; no finer search)
    lambdas = range(-5, 6)
    log_likelihoods = []

    # Step 2: Calculate Transformed Data for Each lambda
    for lambda_value in lambdas:
        if lambda_value == 0:
            # NOTE: the standard Box-Cox transform uses np.log(data) at lambda = 0;
            # log1p(data) computes log(1 + data), which is a different transform.
            transformed_data = log1p(data)
        else:
            transformed_data = (data**lambda_value - 1) / lambda_value

        # Step 3: Evaluate Log-Likelihood
        log_likelihood = compute_log_likelihood(transformed_data)
        log_likelihoods.append(log_likelihood)

        print(f"Lambda: {lambda_value}, Log-Likelihood: {log_likelihood}")

    # Step 4: Identify Optimal lambda (the grid point with the highest score)
    optimal_lambda = list(lambdas)[np.argmax(log_likelihoods)]

    # Step 5: Apply the transformation with the selected lambda
    # (same lambda = 0 convention as in Step 2)
    if optimal_lambda == 0:
        transformed_data = log1p(data)
    else:
        transformed_data = (data**optimal_lambda - 1) / optimal_lambda

    return transformed_data, optimal_lambda

# Generate a sample dataset
np.random.seed(42)
data = np.random.exponential(size=1000)

# Apply the custom Box-Cox transformation defined above
transformed_data, optimal_lambda = boxcox_transform(data)

# For comparison, SciPy's built-in Box-Cox (estimates lambda by maximum likelihood)
scipy_transformed_data, scipy_optimal_lambda = boxcox(data)

# Print the results
print("\nCustom Box-Cox Transformation:")
print("Optimal Lambda:", optimal_lambda)
print("Transformed Data (Sample):", transformed_data[:5])

print("\nScipy Box-Cox Transformation:")
print("Optimal Lambda:", scipy_optimal_lambda)
print("Transformed Data (Sample):", scipy_transformed_data[:5])
```

```
Lambda: -5, Log-Likelihood: -23451.638674802245
Lambda: -4, Log-Likelihood: -18355.370231047993
Lambda: -3, Log-Likelihood: -13343.38428081157
Lambda: -2, Log-Likelihood: -8501.383730547284
Lambda: -1, Log-Likelihood: -4167.738799399524
Lambda: 0, Log-Likelihood: -549.2350996899013
Lambda: 1, Log-Likelihood: -1390.5632452147552
Lambda: 2, Log-Likelihood: -2132.918397443552
Lambda: 3, Log-Likelihood: -3440.432643524827
Lambda: 4, Log-Likelihood: -5087.411338218459
Lambda: 5, Log-Likelihood: -6901.6385209558175

Custom Box-Cox Transformation:
Optimal Lambda: 0
Transformed Data (Sample): [0.38476438 1.38882152 0.84016348 0.64864266 0.15668307]

Scipy Box-Cox Transformation:
Optimal Lambda: 0.24618452757119594
Transformed Data (Sample): [-0.69029945  1.26594507  0.28469728 -0.09006874 -1.43746402]
```


Ques - Why is the result of the custom Box-Cox different from SciPy's Box-Cox?

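Ans - The custom implementation differs from SciPy's in how it searches for lambda and in what it maximizes.

The custom implementation chooses lambda from a coarse grid of integers from -5 to 5. It can therefore never return a value such as 0.246; the closest it can get here is 0. Each candidate is scored with a user-defined normal log-likelihood that omits the Box-Cox Jacobian term, (lambda - 1) * sum(log(x)). That term is what makes log-likelihoods at different lambdas comparable. In addition, its lambda = 0 branch applies log1p(x) = log(1 + x) instead of the standard log(x). So even at lambda = 0, its output is not the standard Box-Cox log transform.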
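For reference, the standard Box-Cox transform and its profile log-likelihood for observations $x_1, \dots, x_n > 0$ can be written as follows. The first term of $\ell(\lambda)$ is, up to an additive constant, what the custom `compute_log_likelihood` computes. The second term is the Jacobian term it omits.

$$
y_i^{(\lambda)} =
\begin{cases}
\dfrac{x_i^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[6pt]
\ln x_i, & \lambda = 0
\end{cases}
$$

$$
\ell(\lambda) = -\frac{n}{2}\ln \hat{\sigma}^2(\lambda) + (\lambda - 1)\sum_{i=1}^{n} \ln x_i + \text{const},
\qquad
\hat{\sigma}^2(\lambda) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i^{(\lambda)} - \bar{y}^{(\lambda)}\right)^2
$$

The Jacobian term comes from $\frac{d y_i^{(\lambda)}}{d x_i} = x_i^{\lambda - 1}$: the log of its absolute value, summed over $i$, gives $(\lambda - 1)\sum_i \ln x_i$.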

SciPy's `boxcox`, when `lmbda` is not given, instead maximizes the full Box-Cox log-likelihood, including the Jacobian term, over a continuous lambda. It uses a one-dimensional numerical optimizer (Brent's method by default), so it can return non-integer values such as 0.246. It also applies log(x) when lambda is exactly 0.
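Below is a minimal sketch of a corrected custom version, assuming NumPy and SciPy are available. It makes three changes:

- It uses log(x) at lambda = 0.
- It adds the Jacobian term.
- It replaces the integer grid with `scipy.optimize.minimize_scalar` (Brent's method).

The objective follows the formula SciPy documents for `scipy.stats.boxcox_llf`. The helper names `boxcox_loglik` and `boxcox_custom` are hypothetical, written only for this illustration.

```python
import numpy as np
from scipy import optimize, stats


def boxcox_loglik(lmbda, x):
    """Box-Cox profile log-likelihood, up to an additive constant (hypothetical helper)."""
    logx = np.log(x)
    # Standard Box-Cox transform: log(x) at lambda = 0, power transform otherwise
    y = logx if lmbda == 0 else (x**lmbda - 1) / lmbda
    # Normal log-likelihood at the MLE variance (ddof=0), plus the Jacobian term
    return -0.5 * x.size * np.log(np.var(y)) + (lmbda - 1) * logx.sum()


def boxcox_custom(x):
    """Estimate lambda by maximizing boxcox_loglik (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("Box-Cox requires strictly positive data")
    # minimize_scalar minimizes, so minimize the negative log-likelihood;
    # (-2, 2) is only a starting bracket, not a hard bound on lambda
    res = optimize.minimize_scalar(
        lambda lam: -boxcox_loglik(lam, x), bracket=(-2.0, 2.0), method="brent"
    )
    lmbda = res.x
    y = np.log(x) if lmbda == 0 else (x**lmbda - 1) / lmbda
    return y, lmbda


np.random.seed(42)
data = np.random.exponential(size=1000)

custom_y, custom_lmbda = boxcox_custom(data)
scipy_y, scipy_lmbda = stats.boxcox(data)

print("Custom lambda:", custom_lmbda)
print("SciPy lambda: ", scipy_lmbda)
print("Max abs difference in transformed data:", np.max(np.abs(custom_y - scipy_y)))
```

Because the objective is the same function SciPy maximizes, the two lambdas should agree to within optimizer tolerance. A finer grid would also approach the same value, but it needs many more evaluations for the same precision.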

In general, custom implementations and established libraries such as SciPy can disagree because of differences in the objective being optimized, the search method and its resolution, and numerical precision. Small discrepancies are common. A large one, such as 0 vs. 0.246 here, is a sign to check the objective function and the search range and step before tuning optimizer parameters.

Ques - Difference between the Box-Cox and Yeo-Johnson transformations

Handling of zero and negative values:

- Box-Cox: defined only for strictly positive values. If the data contain zeros or negative values, shift them by a constant so that every value is positive before applying the transformation.
- Yeo-Johnson: handles zero and negative values. Its formula uses different expressions for non-negative and negative values.
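For reference, the standard Yeo-Johnson transform of a value $x$ with parameter $\lambda$ is defined piecewise below. For $x \ge 0$ it equals the Box-Cox transform of $x + 1$, which is why its $\lambda = 0$ branch is $\ln(1 + x)$.

$$
\psi(x, \lambda) =
\begin{cases}
\dfrac{(x + 1)^{\lambda} - 1}{\lambda}, & x \ge 0,\ \lambda \neq 0 \\[6pt]
\ln(x + 1), & x \ge 0,\ \lambda = 0 \\[6pt]
-\dfrac{(1 - x)^{2 - \lambda} - 1}{2 - \lambda}, & x < 0,\ \lambda \neq 2 \\[6pt]
-\ln(1 - x), & x < 0,\ \lambda = 2
\end{cases}
$$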

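Computational Complexity:

- Box-Cox: one elementwise power (or a logarithm when lambda = 0) per value.
- Yeo-Johnson: the same kind of elementwise power or logarithm, plus a sign check to pick the non-negative or negative branch. Both transforms take time linear in the number of values, so neither is meaningfully cheaper in practice. The practical difference between them is the input domain, not the cost.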
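Here is a minimal sketch of both transforms at a fixed lambda as vectorized NumPy code, checked against `scipy.stats.boxcox` and `scipy.stats.yeojohnson` with an explicit `lmbda`. The function names `boxcox_manual` and `yeojohnson_manual` are hypothetical. Each function is a single elementwise pass; Yeo-Johnson only adds a sign mask and a second branch. The last lines show the shift-by-a-constant workaround for Box-Cox on data with non-positive values.

```python
import numpy as np
from scipy import stats


def boxcox_manual(x, lmbda):
    """Box-Cox at a fixed lambda; defined only for x > 0 (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("Box-Cox requires strictly positive data")
    if lmbda == 0:
        return np.log(x)
    return (x**lmbda - 1) / lmbda


def yeojohnson_manual(x, lmbda):
    """Yeo-Johnson at a fixed lambda; accepts any real x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    neg = ~pos
    # Non-negative branch: Box-Cox applied to x + 1
    if lmbda == 0:
        out[pos] = np.log1p(x[pos])
    else:
        out[pos] = ((x[pos] + 1) ** lmbda - 1) / lmbda
    # Negative branch: power 2 - lambda applied to 1 - x, then negated
    if lmbda == 2:
        out[neg] = -np.log1p(-x[neg])
    else:
        out[neg] = -((1 - x[neg]) ** (2 - lmbda) - 1) / (2 - lmbda)
    return out


rng = np.random.default_rng(0)
positive = rng.exponential(size=500)  # strictly positive sample
mixed = rng.normal(size=500)          # contains negative values

lmbda = 0.5
bc = boxcox_manual(positive, lmbda)
yj = yeojohnson_manual(mixed, lmbda)

print("Box-Cox matches SciPy:", np.allclose(bc, stats.boxcox(positive, lmbda=lmbda)))
print("Yeo-Johnson matches SciPy:", np.allclose(yj, stats.yeojohnson(mixed, lmbda=lmbda)))

try:
    boxcox_manual(mixed, lmbda)
except ValueError as err:
    print("Box-Cox on mixed-sign data:", err)

# Workaround: shift so the minimum becomes 1.0, then apply Box-Cox
shifted = mixed - mixed.min() + 1.0
print("Shifted Box-Cox, first values:", boxcox_manual(shifted, lmbda)[:3])
```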