# Risk Challenge

Allowed libraries: NumPy, Pandas, SkLearn(or any related machine learning/regression library), YFinance

Used GPT for commenting code, getting information about finance terminology, for documentaiton, and for some code

## Part 1:
a) Create your own statistical metric. It can be based off of some that already exist, or you can entirely create your own. How can this metric be useful to SSMIF, what does the metric tell us? 

b) Write a function that takes in an input of a stock ticker and returns the metric

In [1]:
import yfinance as yf
import numpy as np
import pandas as pd
from scipy.stats import norm
from sklearn.model_selection import ParameterGrid

def calculate_risk_metric(start_date, end_date, ticker, weights):
    """
    Calculates the risk metric for a given stock ticker over a specified date range.

    Args:
        start_date (str): The start date of the analysis period in the format "YYYY-MM-DD".
        end_date (str): The end date of the analysis period in the format "YYYY-MM-DD".
        ticker (str): The stock ticker symbol.
        weights (dict): A dictionary containing the weights for each risk component.

    Returns:
        tuple: A tuple containing the risk metric output as a string and the calculated risk metric value.
            If an error occurs during the calculation, returns (None, np.nan).
    """
    try:
        # Download stock data from Yahoo Finance for the specified ticker and date range
        stock_data = yf.download(ticker, start=start_date, end=end_date)
        
        # Calculate daily returns based on the adjusted closing prices
        daily_returns = stock_data['Adj Close'].pct_change()
        
        # Calculate Beta (measure of a stock's volatility in relation to the market)
        # Download benchmark data (S&P 500) from Yahoo Finance for the same date range
        benchmark_data = yf.download('^GSPC', start=start_date, end=end_date)
        benchmark_returns = benchmark_data['Adj Close'].pct_change()
        
        # Check if there are sufficient data points for beta calculation
        if len(daily_returns) > 1 and len(benchmark_returns) > 1:
            covariance = np.cov(daily_returns[1:], benchmark_returns[1:])[0][1]
            benchmark_variance = np.var(benchmark_returns[1:])
            beta = covariance / benchmark_variance
            beta_calculated = True
        else:
            beta = np.nan
            beta_calculated = False
        
        # Calculate Value at Risk (VaR) - potential loss at a given confidence level
        confidence_level = 0.95
        if len(daily_returns) > 1:
            var = norm.ppf(1 - confidence_level, loc=np.mean(daily_returns[1:]), scale=np.std(daily_returns[1:]))
            var_calculated = True
        else:
            var = np.nan
            var_calculated = False
        
        # Calculate Historical Drawdowns (measure of the decline from a historical peak)
        cumulative_returns = (1 + daily_returns).cumprod()
        peak = cumulative_returns.expanding(min_periods=1).max()
        drawdowns = (cumulative_returns / peak) - 1
        max_drawdown = drawdowns.min()
        max_drawdown_calculated = True
        
        # Calculate Sharpe Ratio (risk-adjusted return)
        risk_free_rate = 0.02  # Assuming a 2% risk-free rate
        if len(daily_returns) > 1:
            sharpe_ratio = (np.mean(daily_returns[1:]) - risk_free_rate) / np.std(daily_returns[1:])
            sharpe_ratio_calculated = True
        else:
            sharpe_ratio = np.nan
            sharpe_ratio_calculated = False
        
        # Calculate Sortino Ratio (risk-adjusted return considering only downside deviation)
        downside_returns = daily_returns[daily_returns < 0]
        if len(downside_returns) > 1:
            downside_deviation = np.std(downside_returns[1:])
            sortino_ratio = (np.mean(daily_returns[1:]) - risk_free_rate) / downside_deviation
            sortino_ratio_calculated = True
        else:
            sortino_ratio = np.nan
            sortino_ratio_calculated = False
        
        # Perform Fundamental Analysis
        stock_info = yf.Ticker(ticker).info
        market_cap = stock_info.get('marketCap', np.nan)
        pe_ratio = stock_info.get('trailingPE', np.nan)
        debt_to_equity = stock_info.get('debtToEquity', np.nan)
        
        # Check if PE ratio is valid and calculate fundamentals factor
        if np.isnan(pe_ratio) or pe_ratio == 0:
            fundamentals_factor = np.nan
            fundamentals_factor_calculated = False
        else:
            fundamentals_factor = debt_to_equity / pe_ratio
            fundamentals_factor_calculated = True
        
        # Calculate the Risk Metric by combining the weighted risk components
        risk_metric_components = [
            beta * weights['beta'] if beta_calculated else np.nan,
            abs(var) * weights['var'] if var_calculated else np.nan,
            abs(max_drawdown) * weights['max_drawdown'] if max_drawdown_calculated else np.nan,
            (1 / sharpe_ratio) * weights['sharpe_ratio'] if sharpe_ratio_calculated else np.nan,
            (1 / sortino_ratio) * weights['sortino_ratio'] if sortino_ratio_calculated else np.nan,
            fundamentals_factor * weights['fundamentals'] if fundamentals_factor_calculated else np.nan
        ]
        
        risk_metric = sum(filter(lambda x: not np.isnan(x), risk_metric_components))
        
        # Prepare the output string with the risk metric components and their calculation status
        output = f"Risk Metric for {ticker}:\n"
        output += f"Beta: {'Calculated' if beta_calculated else 'Insufficient data'}\n"
        output += f"Value at Risk (VaR): {'Calculated' if var_calculated else 'Insufficient data'}\n"
        output += f"Historical Drawdowns: {'Calculated' if max_drawdown_calculated else 'Insufficient data'}\n"
        output += f"Sharpe Ratio: {'Calculated' if sharpe_ratio_calculated else 'Insufficient data'}\n"
        output += f"Sortino Ratio: {'Calculated' if sortino_ratio_calculated else 'Insufficient data'}\n"
        output += f"Fundamental Analysis: {'Calculated' if fundamentals_factor_calculated else 'Insufficient data'}\n"
        output += f"Risk Metric: {risk_metric:.4f}\n"
        
        return output, risk_metric
    
    except (KeyError, ZeroDivisionError, ValueError, Exception) as e:
        print(f"Error for {ticker}: {str(e)}")
        return None, np.nan

def evaluate_risk_metric(start_date, end_date, tickers, weights):
    """
    Evaluates the risk metric for a list of stock tickers using the specified weights.

    Args:
        start_date (str): The start date of the analysis period in the format "YYYY-MM-DD".
        end_date (str): The end date of the analysis period in the format "YYYY-MM-DD".
        tickers (list): A list of stock ticker symbols.
        weights (dict): A dictionary containing the weights for each risk component.

    Returns:
        float: The evaluation metric (e.g., mean absolute error) for the risk metric.
    """
    risk_metrics = []
    for ticker in tickers:
        risk_metric_output, risk_metric = calculate_risk_metric(start_date, end_date, ticker, weights)
        if risk_metric_output is not None:
            risk_metrics.append(risk_metric)
    
    # Calculate the evaluation metric (e.g., mean absolute error)
    evaluation_metric = np.nanmean(np.abs(risk_metrics))
    
    return evaluation_metric

def parameter_tuning(start_date, end_date, tickers):
    """
    Performs parameter tuning to find the best weights for the risk metric components.

    Args:
        start_date (str): The start date of the analysis period in the format "YYYY-MM-DD".
        end_date (str): The end date of the analysis period in the format "YYYY-MM-DD".
        tickers (list): A list of stock ticker symbols.

    Returns:
        dict: The best weights found during the parameter tuning process.
    """
    # Define the parameter grid for weights
    param_grid = {
        'beta': [0.1, 0.2, 0.3],
        'var': [0.2, 0.3, 0.4],
        'max_drawdown': [0.1, 0.2, 0.3],
        'sharpe_ratio': [0.05, 0.1, 0.15],
        'sortino_ratio': [0.05, 0.1, 0.15],
        'fundamentals': [0.05, 0.1, 0.15]
    }
    
    # Generate all combinations of weights using the parameter grid
    grid = ParameterGrid(param_grid)
    
    best_weights = None
    best_metric = float('inf')
    
    # Iterate over each combination of weights and evaluate the risk metric
    for weights in grid:
        evaluation_metric = evaluate_risk_metric(start_date, end_date, tickers, weights)
        if evaluation_metric < best_metric:
            best_metric = evaluation_metric
            best_weights = weights
    
    return best_weights


# Example usage
start_date = "2010-01-01"
end_date = "2024-01-01"
tickers = ['BRK-B', '^GSPC'] # Increased run time with addition of more tickers; tested with many ticker combinations and found current to be best

# Remove tickers that raise exceptions during data download
valid_tickers = []
for ticker in tickers:
    try:
        stock_data = yf.download(ticker, start=start_date, end=end_date)
        valid_tickers.append(ticker)
    except Exception as e:
        print(f"Skipping {ticker} due to error: {str(e)}")


# Uncomment to test parameter_tuning
best_weights = parameter_tuning(start_date, end_date, valid_tickers)
print("Best weights:", best_weights)

# User-defined weights (you can manually set the weights based on the optimal weights found)
user_weights = {
    'beta': 0.2,
    'var': 0.3,
    'max_drawdown': 0.2,
    'sharpe_ratio': 0.1,
    'sortino_ratio': 0.1,
    'fundamentals': 0.1
}


# Calculate risk metric for a specific stock using user-defined weights
ticker = 'RCL'
risk_metric_output, _ = calculate_risk_metric(start_date, end_date, ticker, user_weights)
if risk_metric_output is not None:
    print(risk_metric_output)

[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%*******

Best weights: {'beta': 0.1, 'fundamentals': 0.05, 'max_drawdown': 0.1, 'sharpe_ratio': 0.15, 'sortino_ratio': 0.15, 'var': 0.2}
Risk Metric for RCL:
Beta: Calculated
Value at Risk (VaR): Calculated
Historical Drawdowns: Calculated
Sharpe Ratio: Calculated
Sortino Ratio: Calculated
Fundamental Analysis: Calculated
Risk Metric: 2.3253



To interpret the risk metric, consider the following general guidelines (note that these guidelines can vary drasticaly based on parameter tuning):
- value close to 0 indicates relatively low risk.
- value greater than 1 suggests moderate to high risk.
- value significantly greater than 1 (e.g., 2 or higher) indicates high risk.


The custom risk metric implemented in this code combines several risk measures to provide a comprehensive assessment of a stock's risk such as Beta, VaR, Historical Drawdowns, Sharpe Ratio, Sortino Ratio, and Fundamental Analysis. They are scaled by their respective weights before being summed up. These weightes are modified based on further optimazation. 

Utilizes a grid search or a randomized search to explore different combinations of weights and evaluate their performance using a suitable metric.

The code is written in a modular manner, allowing users to modify the weights assigned to each risk measure or even add/remove risk measures based on their preferences. This flexibility enables users to tailor the risk metric to their specific needs and investment philosophy.

## Part 2: 
a) Using the risk-challenge data fit a linear model (Bonus: do not use sklearn to create the model) to the data to predict the y values. What are the coefficients of the x values? Does the model have any predictive power? How statistically significant is the models predictive power at an alpha of 0.05; complete a full hypothesis test. 

*optional*<br>
b) Fit a non linear model of your choosing to the data, describe the predictive power and show if it is statistically signicant at the same alpha level.

In [6]:
import numpy as np
import pandas as pd

def mean_squared_error():
    """
    Calculates the mean squared error (MSE) for a linear regression model.

    This function reads data from a CSV file, extracts the input features (x1, x2, x3, x4) and target values (y),
    and performs linear regression using the normal equation. It calculates the mean squared error (MSE) between
    the actual and predicted values.

    Returns:
        float: The mean squared error (MSE).
    """
    # Read the CSV file containing the data
    data = pd.read_csv('RiskCodingChallenge.csv')

    # Extract the input features (x1, x2, x3, x4) from the data
    X = data[['x1', 'x2', 'x3', 'x4']].values
    # Extract the target values (y) from the data
    y = data['y'].values

    # Add a column of ones to X for the intercept term
    # This is done to include the bias term in the linear regression model
    X = np.hstack((np.ones((X.shape[0], 1)), X))

    def calculate_coefficients(X, y):
        """
        Calculates the coefficients of the linear regression model using the normal equation.

        The normal equation is a closed-form solution for linear regression that minimizes the sum of squared residuals.
        It is computed as: coefficients = (X^T * X)^(-1) * X^T * y, where X is the input features matrix and y is the target values.

        Args:
            X (numpy.ndarray): The input features matrix.
            y (numpy.ndarray): The target values.

        Returns:
            numpy.ndarray: The coefficients of the linear regression model.
        """
        return np.linalg.inv(X.T @ X) @ X.T @ y

    def predict(X, coefficients):
        """
        Makes predictions using the coefficients of the linear regression model.

        The predictions are computed by multiplying the input features matrix (X) with the coefficients.

        Args:
            X (numpy.ndarray): The input features matrix.
            coefficients (numpy.ndarray): The coefficients of the linear regression model.

        Returns:
            numpy.ndarray: The predicted values.
        """
        return X @ coefficients

    def mean_squared_error(y, y_pred):
        """
        Calculates the mean squared error (MSE) between the actual and predicted values.

        The MSE measures the average squared difference between the actual values (y) and the predicted values (y_pred).
        It is computed as: MSE = (1/n) * sum((y - y_pred)^2), where n is the number of samples.

        Args:
            y (numpy.ndarray): The actual target values.
            y_pred (numpy.ndarray): The predicted values.

        Returns:
            float: The mean squared error (MSE).
        """
        return np.mean((y - y_pred) ** 2)

    def r_squared(y, y_pred):
        """
        Calculates the coefficient of determination (R-squared) between the actual and predicted values.

        R-squared measures the proportion of variance in the target values (y) that is predictable from the input features (X).
        It is computed as: R-squared = 1 - (SSR / SST), where SSR is the sum of squared residuals and SST is the total sum of squares.

        Args:
            y (numpy.ndarray): The actual target values.
            y_pred (numpy.ndarray): The predicted values.

        Returns:
            float: The coefficient of determination (R-squared).
        """
        y_mean = np.mean(y)
        ss_total = np.sum((y - y_mean) ** 2)
        ss_residual = np.sum((y - y_pred) ** 2)
        return 1 - (ss_residual / ss_total)

    # Calculate the coefficients of the linear regression model using the normal equation
    coefficients = calculate_coefficients(X, y)

    # Make predictions using the coefficients
    y_pred = predict(X, coefficients)

    # Calculate the mean squared error (MSE) between the actual and predicted values
    mse = mean_squared_error(y, y_pred)
    return mse

def t_cdf(t, df):
    """
    Calculates the cumulative distribution function (CDF) of the Student's t-distribution.

    The CDF of the Student's t-distribution gives the probability that a value from the distribution is less than or equal to a given value (t).
    It is used to calculate p-values for hypothesis testing.

    Args:
        t (float or numpy.ndarray): The t-value(s) at which to evaluate the CDF.
        df (int): The degrees of freedom of the Student's t-distribution.

    Returns:
        float or numpy.ndarray: The CDF value(s) of the Student's t-distribution.
    """
    x = df / (t**2 + df)
    return 0.5 * (1 + np.sign(t) * (1 - np.power(x, 0.5 * df)))

# Load the data from the CSV file
data = pd.read_csv('RiskCodingChallenge.csv')
X = data[['x1', 'x2', 'x3', 'x4']].values
y = data['y'].values

def linear_model(X, y):
    """
    Fits a linear regression model and calculates various statistical metrics.

    This function fits a linear regression model to the input features (X) and target values (y).
    It calculates the coefficients, R-squared, p-values, standard errors, and t-statistics for the model.

    Args:
        X (numpy.ndarray): The input features matrix.
        y (numpy.ndarray): The target values.

    Returns:
        tuple: A tuple containing the following elements:
            - coefficients (numpy.ndarray): The coefficients of the linear regression model.
            - r_squared (float): The coefficient of determination (R-squared) of the model.
            - p_values (numpy.ndarray): The p-values for each coefficient.
            - se_coefficients (numpy.ndarray): The standard errors for each coefficient.
            - t_statistics (numpy.ndarray): The t-statistics for each coefficient.
    """
    n = len(X)
    # Add a column of ones to X for the intercept term
    X_with_intercept = np.column_stack((np.ones(n), X))
    
    # Calculate the coefficients using the normal equation
    coefficients = np.linalg.pinv(X_with_intercept) @ y
    
    # Make predictions using the coefficients
    y_pred = X_with_intercept @ coefficients
    # Calculate the residuals (differences between actual and predicted values)
    residuals = y - y_pred
    
    # Calculate the sum of squared errors (SSE) and total sum of squares (SST)
    sse = np.sum(residuals ** 2)
    sst = np.sum((y - np.mean(y)) ** 2)
    # Calculate the coefficient of determination (R-squared)
    r_squared = 1 - sse / sst
    
    # Calculate the standard error of the estimate
    std_error = np.sqrt(sse / (n - X_with_intercept.shape[1]))
    # Calculate the standard errors of the coefficients
    se_coefficients = std_error * np.sqrt(np.diag(np.linalg.pinv(X_with_intercept.T @ X_with_intercept)))
    
    # Calculate the t-statistics for each coefficient
    t_statistics = coefficients / se_coefficients
    # Calculate the p-values for each coefficient using the t-distribution
    p_values = 2 * (1 - t_cdf(np.abs(t_statistics), n - X_with_intercept.shape[1]))
    
    return coefficients, r_squared, p_values, se_coefficients, t_statistics

# Fit the linear model
coefficients, r_squared, p_values, se_coefficients, t_statistics = linear_model(X, y)

# Create a DataFrame to display the linear model results in a table
linear_model_results = pd.DataFrame({
    'Term': ['Intercept'] + [f'x{i}' for i in range(1, len(coefficients))],
    'Coefficients': coefficients,
    'Standard Error': se_coefficients,
    't-statistics': t_statistics,
    'p-values': p_values
})

print("Linear Model Results:")
print(linear_model_results.to_string(index=False))
print(f"R-squared: {r_squared:.4f}")

# Check if all coefficients are statistically significant at a significance level of 0.05
if np.all(p_values < 0.05):
    print("All coefficients are statistically significant at α = 0.05.")
else:
    print("Not all coefficients are statistically significant at α = 0.05.")

def polynomial_regression(X, y, degree):
    """
    Fits a polynomial regression model and calculates various statistical metrics.

    This function fits a polynomial regression model of a specified degree to the input features (X) and target values (y).
    It calculates the coefficients, R-squared, p-values, standard errors, and t-statistics for the model.

    Args:
        X (numpy.ndarray): The input features matrix.
        y (numpy.ndarray): The target values.
        degree (int): The degree of the polynomial.

    Returns:
        tuple: A tuple containing the following elements:
            - coefficients (numpy.ndarray): The coefficients of the polynomial regression model.
            - r_squared (float): The coefficient of determination (R-squared) of the model.
            - p_values (numpy.ndarray): The p-values for each coefficient.
            - se_coefficients (numpy.ndarray): The standard errors for each coefficient.
            - t_statistics (numpy.ndarray): The t-statistics for each coefficient.
    """
    n = len(X)
    # Generate polynomial features up to the specified degree
    X_poly = np.column_stack([X[:, i] ** j for i in range(X.shape[1]) for j in range(degree + 1)])
    
    # Calculate the coefficients using the normal equation
    coefficients = np.linalg.pinv(X_poly) @ y
    
    # Make predictions using the coefficients
    y_pred = X_poly @ coefficients
    # Calculate the residuals (differences between actual and predicted values)
    residuals = y - y_pred
    
    # Calculate the sum of squared errors (SSE) and total sum of squares (SST)
    sse = np.sum(residuals ** 2)
    sst = np.sum((y - np.mean(y)) ** 2)
    # Calculate the coefficient of determination (R-squared)
    r_squared = 1 - sse / sst
    
    # Calculate the standard error of the estimate
    std_error = np.sqrt(sse / (n - X_poly.shape[1]))
    # Calculate the standard errors of the coefficients
    se_coefficients = std_error * np.sqrt(np.diag(np.linalg.pinv(X_poly.T @ X_poly)))
    
    # Calculate the t-statistics for each coefficient
    t_statistics = coefficients / se_coefficients
    # Calculate the p-values for each coefficient using the t-distribution
    p_values = 2 * (1 - t_cdf(np.abs(t_statistics), n - X_poly.shape[1]))
    
    return coefficients, r_squared, p_values, se_coefficients, t_statistics

# Fit the non-linear model (polynomial regression) with a degree of 2
degree = 2
coefficients, r_squared, p_values, se_coefficients, t_statistics = polynomial_regression(X, y, degree)

# Create a DataFrame to display the polynomial regression results in a table
poly_terms = [f'x{i}^{j}' for i in range(1, X.shape[1] + 1) for j in range(degree + 1)]
poly_model_results = pd.DataFrame({
    'Term': ['Intercept'] + poly_terms[1:],  # Exclude the first term as it's the intercept
    'Coefficients': coefficients,
    'Standard Error': se_coefficients,
    't-statistics': t_statistics,
    'p-values': p_values
})

print("\nPolynomial Regression Results:")
print(poly_model_results.to_string(index=False))
print(f"R-squared: {r_squared:.4f}")

# Check if all coefficients are statistically significant at a significance level of 0.05
if np.all(p_values < 0.05):
    print("All coefficients are statistically significant at α = 0.05.")
else:
    print("Not all coefficients are statistically significant at α = 0.05.")

# Output the first few rows of the data in a table using Pandas
print("\nData Table:")
print(data.head())

# Calculate and print the mean squared error (MSE) for the linear regression model
print("Mean Squared Error (MSE):", mean_squared_error())

Linear Model Results:
     Term  Coefficients  Standard Error  t-statistics  p-values
Intercept     -4.833401        0.509776     -9.481421  0.000000
       x1      6.531192        0.506976     12.882638  0.000000
       x2      2.063319        0.478670      4.310526  0.000129
       x3      6.974013        0.488805     14.267466  0.000000
       x4      7.735885        0.501710     15.419036  0.000000
R-squared: 0.7115
All coefficients are statistically significant at α = 0.05.

Polynomial Regression Results:
     Term  Coefficients  Standard Error  t-statistics     p-values
Intercept      0.732480        0.090821      8.065124 3.259615e-13
     x1^1     -1.108553        0.877346     -1.263531 4.513151e-01
     x1^2      7.792422        0.862161      9.038242 4.440892e-16
     x2^0      0.732480        0.090821      8.065124 3.259615e-13
     x2^1     -6.457947        0.883376     -7.310530 3.393330e-11
     x2^2      8.940539        0.832005     10.745769 0.000000e+00
     x3^0      

__Introduction:__ 
In this analysis, we aim to predict the y values using the risk-challenge data provided. We will fit a linear model to the data and evaluate its predictive power and statistical significance. Additionally, we will explore a non-linear model and compare its performance with the linear model.
***
__Linear Model:__ A linear regression model was fitted to the data using the least squares method. The coefficients of the x values are as follows:
- Intercept: -4.833401
- x1: 6.531192
- x2: 2.063319
- x3: 6.974013
- x4: 7.735885

The model has an R-squared value of 0.7115, indicating that 71.15% of the variability in the y values can be explained by the linear combination of the x values. This suggests that the model has significant predictive power.

To assess the statistical significance of the model's predictive power, a hypothesis test was conducted at an alpha level of 0.05. The p-values for all coefficients are less than 0.05, indicating that each coefficient is statistically significant. Therefore, we reject the null hypothesis that the coefficients are equal to zero and conclude that the linear model has significant predictive power.
***
__Polynomial Regression Model:__ In addition to the linear model, a polynomial regression model was fitted to the data. The model includes quadratic terms for each x variable. The coefficients and their corresponding p-values are shown in the table above.

The polynomial regression model has an R-squared value of 0.9450, indicating that 94.50% of the variability in the y values can be explained by the polynomial terms of the x values. This suggests that the polynomial model has higher predictive power compared to the linear model.

However, not all coefficients in the polynomial model are statistically significant at an alpha level of 0.05. The p-value for the linear term of x1 (x1^1) is greater than 0.05, indicating that it is not statistically significant. All other coefficients have p-values less than 0.05 and are considered statistically significant.
***
__Conclusion:__ Based on the analysis, both the linear and polynomial regression models have predictive power in estimating the y values using the risk-challenge data. The linear model has an R-squared value of 0.7115, and all coefficients are statistically significant at an alpha level of 0.05. The polynomial model has a higher R-squared value of 0.9450, indicating better predictive power. However, not all coefficients in the polynomial model are statistically significant.

The choice between the linear and polynomial model depends on the specific requirements and trade-offs of the problem at hand. The linear model provides a simpler interpretation and ensures that all coefficients are statistically significant. On the other hand, the polynomial model offers higher predictive power but may include some non-significant terms.