# Python Coding Exercise for Module 2, Lesson 1: Linear Regression

Write a Python function that performs linear regression on a given dataset and prints out the regression coefficients, p-values for each coefficient, and the R-squared value of the model.

Your function should accept two parameters: X, a 2D list or NumPy array with the predictor variables, and y, a 1D list or NumPy array with the dependent variable.

### Your tasks are:

- Fit a linear regression model using the provided X and y.
- Print the coefficients of the model.
- Perform a statistical test on each coefficient to determine if it is significantly different from zero, and print the p-value for each test.
- Print the R-squared value of the model.

### Constraints:

- Use the statsmodels library for statistical calculations.
- Assume that X and y are already preprocessed (no missing values, all numeric data, etc.).

In [1]:
import numpy as np
import statsmodels.api as sm

In [2]:
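def linear_regression_analysis(X, y):
    """Fit an OLS model of y on X and print the coefficients, R-squared, and summary."""
    # Add a column of ones so the model estimates an intercept
    X = sm.add_constant(X)

    # Fit the ordinary least squares model
    model = sm.OLS(y, X).fit()

    # Collect the coefficients, R-squared, and the full summary table
    # (the per-coefficient p-values appear in the summary's P>|t| column)
    coefficients = model.params
    summary = model.summary()
    r_squared = model.rsquared

    # Print the results
    print('Coefficients:', coefficients)
    print('R-squared:', r_squared)
    print(summary)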

In [3]:
# Sample input
X = [[1, 2], [2, 1], [2, 3], [3, 4]]
y = [2, 3, 5, 4]

# Function call
linear_regression_analysis(X, y)

Coefficients: [1.33333333 0.66666667 0.33333333]
R-squared: 0.46666666666666656
                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.467
Model:                            OLS   Adj. R-squared:                 -0.600
Method:                 Least Squares   F-statistic:                    0.4375
Date:                Thu, 09 Nov 2023   Prob (F-statistic):              0.730
Time:                        14:37:35   Log-Likelihood:                -4.8648
No. Observations:                   4   AIC:                             15.73
Df Residuals:                       1   BIC:                             13.89
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------

  warn("omni_normtest is not valid with less than 8 observations; %i "


1. The intercept is 1.3333 and the coefficients are $\beta_1 = 0.6667$ and $\beta_2 = 0.3333$.
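2. None of the coefficients is statistically significant, and the overall F-test agrees (Prob (F-statistic) = 0.730). With only 4 observations and 1 residual degree of freedom, this sample is too small to support a linear regression model.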
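3. The coefficient of determination is $R^2 = 0.4667$, so the model accounts for about 47% of the variance in y within this sample. The adjusted $R^2$ of $-0.600$ indicates that this fit does not hold up once the number of predictors is taken into account.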
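As a cross-check on the reported values, $R^2$ and adjusted $R^2$ can be recomputed from the residuals. This is a sketch only: it swaps in NumPy's `np.linalg.lstsq` for statsmodels, so it illustrates the arithmetic rather than the approach the exercise requires, and the variable names are illustrative.

```python
import numpy as np

# Same sample data as in the exercise
X = np.array([[1, 2], [2, 1], [2, 3], [3, 4]], dtype=float)
y = np.array([2, 3, 5, 4], dtype=float)

# Prepend a column of ones so the first coefficient is the intercept
design = np.column_stack([np.ones(len(y)), X])

# Least-squares solution for the coefficients
beta, *_ = np.linalg.lstsq(design, y, rcond=None)

# R-squared = 1 - (residual sum of squares) / (total sum of squares)
residuals = y - design @ beta
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# Adjusted R-squared penalizes for the number of predictors k
n, k = design.shape[0], design.shape[1] - 1
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print("Coefficients:", beta)
print("R-squared:", r_squared)
print("Adjusted R-squared:", adj_r_squared)
```

With $n = 4$ and $k = 2$, the adjustment factor $(n - 1)/(n - k - 1)$ equals 3, which is why a moderate $R^2$ turns into a negative adjusted value.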