1.
Simple Linear Regression (SLR) models the relationship between a single continuous predictor and an outcome with the equation:

Outcome = β₀ + β₁ * Predictor

In contrast, Multiple Linear Regression (MLR) extends SLR by incorporating multiple predictors, including both continuous and indicator (binary) variables, such as:

Outcome = β₀ + β₁ * Predictor₁ + β₂ * Indicator

This lets MLR estimate each predictor's association with the outcome while holding the other predictors fixed. When an indicator variable is added alongside a continuous variable, β₀ is the intercept for the baseline category (Indicator = 0) and β₂ shifts that intercept for the other category (Indicator = 1), so the two groups get parallel lines with the same slope β₁. Introducing an interaction term, like:

Outcome = β₀ + β₁ * Predictor + β₂ * Indicator + β₃ * Predictor * Indicator

enables the effect of the continuous predictor to vary by category: the slope is β₁ when Indicator = 0 and β₁ + β₃ when Indicator = 1. Additionally, for a non-binary categorical variable with k categories, MLR uses k-1 indicator variables (e.g., Outcome = β₀ + β₁ * Indicator₁ + β₂ * Indicator₂ + ... + βₖ₋₁ * Indicatorₖ₋₁) to represent the categories relative to a baseline group. Using k-1 rather than k indicators avoids perfect multicollinearity with the intercept (all k indicators would always sum to 1), and each coefficient is then the difference between its category and the baseline. Overall, MLR is more flexible than SLR because it handles multiple predictors, categorical predictors, and interactions between them.
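The interaction model and the k-1 indicator coding described above can be sketched in a few lines of plain Python. The coefficient values, the function names `predict` and `dummy_code`, and the category labels are hypothetical, chosen only for illustration.

```python
def predict(predictor, indicator, b0, b1, b2, b3=0.0):
    """Outcome = b0 + b1*Predictor + b2*Indicator + b3*Predictor*Indicator."""
    return b0 + b1 * predictor + b2 * indicator + b3 * predictor * indicator


def dummy_code(value, categories):
    """Return k-1 indicators for `value`, treating categories[0] as the baseline."""
    return [1 if value == c else 0 for c in categories[1:]]


# Hypothetical coefficients (not estimated from any data).
b0, b1, b2, b3 = 10.0, 2.0, 5.0, 1.5

# Slope of the continuous predictor within each group:
# change in prediction when the predictor goes from 0 to 1.
slope_group0 = predict(1, 0, b0, b1, b2, b3) - predict(0, 0, b0, b1, b2, b3)
slope_group1 = predict(1, 1, b0, b1, b2, b3) - predict(0, 1, b0, b1, b2, b3)
print(slope_group0, slope_group1)  # b1 and b1 + b3

# A 3-category variable needs only 2 indicators; the baseline is all zeros.
categories = ["A", "B", "C"]
for value in categories:
    print(value, dummy_code(value, categories))
```

With b3 = 0 both slopes equal b1, which is the parallel-lines model without interaction.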

2.
Without Interaction (Continuous Variables): Sales = β₀ + β₁ * TV_Spend + β₂ * Online_Spend

Assumes the effects are additive: each extra unit of TV spend changes predicted sales by β₁ regardless of online spend, and each extra unit of online spend changes it by β₂ regardless of TV spend.

With Interaction (Continuous Variables): Sales = β₀ + β₁ * TV_Spend + β₂ * Online_Spend + β₃ * TV_Spend * Online_Spend

Allows the effect of one ad type to change with the amount spent on the other: the effect of TV spend becomes β₁ + β₃ * Online_Spend.

Using these models, predictions without interaction assume the combined effect is just the sum of each ad type's effect. With interaction, the combined effect can be larger (β₃ > 0) or smaller (β₃ < 0) than that sum, showing how the ads work together.
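As a rough illustration of the continuous case, the sketch below compares the effect of one extra unit of TV spend under both models. The coefficients and the helper names `predict_sales` and `tv_effect` are hypothetical, not fitted values.

```python
def predict_sales(tv, online, b0, b1, b2, b3=0.0):
    """Sales = b0 + b1*TV + b2*Online + b3*TV*Online (b3 = 0 means no interaction)."""
    return b0 + b1 * tv + b2 * online + b3 * tv * online


def tv_effect(online, b1, b3=0.0):
    """Change in predicted sales per extra unit of TV spend at a given online spend."""
    return b1 + b3 * online


# Hypothetical coefficients for illustration only.
B0, B1, B2, B3 = 100.0, 2.0, 3.0, 0.5

# Without interaction, the TV effect is the same at every online spend level.
no_int_diff = predict_sales(6, 10, B0, B1, B2) - predict_sales(5, 10, B0, B1, B2)

# With interaction, the TV effect depends on online spend.
int_diff = predict_sales(6, 10, B0, B1, B2, B3) - predict_sales(5, 10, B0, B1, B2, B3)

print(no_int_diff, tv_effect(10, B1))      # both equal b1
print(int_diff, tv_effect(10, B1, B3))     # both equal b1 + b3 * 10
```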

If advertising budgets are categorized as "high" or "low" (binary):

Without Interaction (Binary Variables): Sales = β₀ + β₁ * High_TV + β₂ * High_Online

Compares each high category to the baseline (low TV and low online).
With Interaction (Binary Variables): Sales = β₀ + β₁ * High_TV + β₂ * High_Online + β₃ * High_TV * High_Online

Captures the additional effect (β₃) when both TV and online advertising are high, beyond the sum of the two individual effects β₁ + β₂.
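For the binary version, the two models can be compared by listing the predicted sales in each of the four budget combinations. This is a minimal sketch with hypothetical coefficients; `predict_sales` is an illustrative helper, not part of any library.

```python
def predict_sales(high_tv, high_online, b0, b1, b2, b3=0.0):
    """Sales = b0 + b1*High_TV + b2*High_Online + b3*High_TV*High_Online."""
    return b0 + b1 * high_tv + b2 * high_online + b3 * high_tv * high_online


# Hypothetical coefficients for illustration only.
B0, B1, B2, B3 = 50.0, 4.0, 3.0, 2.0

cells = [(tv, online) for tv in (0, 1) for online in (0, 1)]
additive = {cell: predict_sales(*cell, B0, B1, B2) for cell in cells}
interaction = {cell: predict_sales(*cell, B0, B1, B2, B3) for cell in cells}

for cell in cells:
    print(cell, additive[cell], interaction[cell])

# The two models differ only in the (high TV, high online) cell, by exactly b3.
extra = interaction[(1, 1)] - additive[(1, 1)]
print(extra)
```

The baseline cell (0, 0) is β₀ in both models, which is why each coefficient reads as a comparison against low TV and low online.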

3.

4.
The apparent contradiction between a low R-squared (17.6%) and large, significant coefficients arises because R-squared and p-values assess different aspects of the model. R-squared measures the overall proportion of variability in the outcome (HP) that the model explains. A low R-squared indicates that, collectively, the predictors (like Sp. Def and Generation) account for only a small portion of the variation in HP.

On the other hand, p-values assess the evidence against the null hypothesis that an individual coefficient is zero, that is, that the predictor has no linear association with the outcome once the other predictors are accounted for. Large coefficients with small p-values mean the estimated associations with HP are unlikely to be due to sampling variability alone, but most of the variation in HP can still come from factors the model does not include.

In summary, there is strong evidence that the predictors are associated with HP (as shown by the significant coefficients), yet together they capture little of the overall variation in HP, leading to a low R-squared. This highlights that significant predictors do not necessarily result in a model with high explanatory power.
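A small simulation can make this concrete. The sketch below uses synthetic data (not the Pokémon data), with a true slope of 1 and deliberately large noise; the sample size, noise level, seed, and the helper `simple_ols` are all assumptions chosen for illustration.

```python
import math
import random


def simple_ols(x, y):
    """Fit y = a + b*x by least squares; return (slope, t statistic, R-squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - sse / sst
    slope_se = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return slope, slope / slope_se, r_squared


random.seed(0)
x = [random.gauss(0, 1) for _ in range(500)]
# Real signal (slope 1) buried in noise with standard deviation 3.
y = [1.0 * xi + random.gauss(0, 3) for xi in x]

slope, t_stat, r2 = simple_ols(x, y)
print(f"slope={slope:.2f}  t={t_stat:.1f}  R^2={r2:.3f}")
```

With 500 observations the slope is estimated precisely enough to give a large t statistic (hence a small p-value), while the noise still dominates the variance of y and keeps R-squared low.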

5.

6.

7.
Model5 extends Model3 and Model4 by incorporating significant predictors to improve the model's explanatory power. The equation for Model5 is:

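HP = β₀ + β₁·Attack + β₂·Defense + β₃·Speed + β₄·Legendary + β₅·Sp.Def + β₆·Sp.Atk + β₇·(Generation) + β₈·(Type 1) + β₉·(Type 2)

Here Generation, Type 1, and Type 2 are categorical, so each of the terms β₇·(Generation), β₈·(Type 1), and β₉·(Type 2) is shorthand for a set of k-1 indicator variables, each with its own coefficient, relative to a baseline category.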

Model6 refines Model5 by selecting only the most impactful predictors and reducing complexity to address potential multicollinearity. The equation for Model6 is:

HP = β₀ + β₁·Attack + β₂·Speed + β₃·Sp.Def + β₄·Sp.Atk + β₅·Indicator(Type 1 = "Normal") + β₆·Indicator(Type 1 = "Water") + β₇·Indicator(Generation = 2) + β₈·Indicator(Generation = 5)

Model7 builds on Model6 by adding interaction terms among continuous variables to capture synergistic effects and applies centering and scaling to mitigate multicollinearity. The equation for Model7 is:

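HP = β₀ + β₁·Attack + β₂·Speed + β₃·Sp.Def + β₄·Sp.Atk + β₅·Attack·Speed + ... + β₁₅·Attack·Speed·Sp.Def·Sp.Atk + β₁₆·Indicator(Type 1 = "Normal") + β₁₇·Indicator(Type 1 = "Water") + β₁₈·Indicator(Generation = 2) + β₁₉·Indicator(Generation = 5)

The "..." stands for the remaining two-way and three-way interactions among Attack, Speed, Sp.Def, and Sp.Atk, each with its own coefficient (15 terms in total for the four continuous predictors), and the continuous predictors are centered and scaled before the products are formed.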
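In regression formula notation, a product such as Attack * Speed * Sp.Def * Sp.Atk is conventionally shorthand for every main effect plus every interaction among the listed variables, each with its own coefficient. Assuming Model7 is specified that way, the sketch below enumerates those terms and shows one way to center and scale a predictor. The helpers `expand_full_interaction` and `center_scale` are illustrative, and the use of the sample standard deviation is an assumption.

```python
from itertools import combinations
import statistics


def expand_full_interaction(names):
    """List all main effects and interactions implied by a full product of `names`."""
    terms = []
    for order in range(1, len(names) + 1):
        for combo in combinations(names, order):
            terms.append(":".join(combo))
    return terms


def center_scale(values):
    """Subtract the mean and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


terms = expand_full_interaction(["Attack", "Speed", "Sp.Def", "Sp.Atk"])
print(len(terms))  # 4 main effects + 6 two-way + 4 three-way + 1 four-way
for term in terms:
    print(term)

# Toy values only, to show the transformation.
print(center_scale([1, 2, 3]))
```

Centering before forming products tends to reduce the correlation between a main effect and the interactions built from it, which is the multicollinearity concern mentioned above.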

In summary, each model progressively enhances predictive accuracy by adding significant variables, refining selections, and incorporating interactions, while addressing multicollinearity through centering and scaling in the final model.

8.

9.
The illustration highlights the trade-off between model complexity and reliability in regression analysis. Model7_fit is more complex than Model6_fit, incorporating additional interaction terms and scaled variables. While Model7_fit shows improved "out of sample" R-squared, indicating better predictive performance on the held-out data, many of its coefficients have weaker statistical significance. This complexity increases the risk of overfitting, where the model captures noise specific to the training data rather than general patterns, so its advantage may not hold on different data. In contrast, Model6_fit is simpler, with stronger and more consistent coefficient significance, which makes its estimated effects easier to interpret and more likely to generalize. Despite Model7_fit performing better in some metrics, Model6_fit is often preferred because it strikes a better balance between predictive accuracy and the ability to reliably interpret the effects of predictors without being overly tailored to the training data.
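To show what "in sample" versus "out of sample" R-squared means in practice, here is a minimal sketch on synthetic data with a single predictor. It uses 1 - SSE/SST as the R-squared definition, which is one common choice and may differ from the one used for Model6_fit and Model7_fit. The data, seed, 50/50 split, and helper names are all assumptions.

```python
import random


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(y, y_hat):
    """1 - SSE/SST, with SST measured around the mean of the y values supplied."""
    mean_y = sum(y) / len(y)
    sse = sum((yi - hi) ** 2 for yi, hi in zip(y, y_hat))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


random.seed(1)
x = [random.uniform(0, 10) for _ in range(200)]
y = [3.0 + 2.0 * xi + random.gauss(0, 4) for xi in x]

# Fit on the first half only; the second half plays the role of new data.
x_train, y_train = x[:100], y[:100]
x_test, y_test = x[100:], y[100:]
b0, b1 = fit_line(x_train, y_train)

r2_in = r_squared(y_train, [b0 + b1 * xi for xi in x_train])
r2_out = r_squared(y_test, [b0 + b1 * xi for xi in x_test])
print(f"in-sample R^2={r2_in:.3f}  out-of-sample R^2={r2_out:.3f}")
```

A large gap between the two values is a typical warning sign of overfitting. Because a single split can favor either model by chance, repeating the comparison over several random splits gives a more reliable picture.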