# 1. Importing Required Libraries

In [11]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

💫 We import the libraries needed for our analysis:

**pandas** helps us work with data in table format (DataFrames)

**numpy** provides mathematical functions and array operations

**matplotlib** allows us to create visualizations and plots

**sklearn (scikit-learn)** contains machine learning tools for model training and evaluation

# 2. Defining the Linear Regression Function

In [12]:
def linear_regression_from_csv(csv_file, feature_cols, target_col):
    """
    Perform linear regression on data loaded from a CSV file.
    
    Parameters:
    csv_file (str): Path to CSV file
    feature_cols (list): List of feature column names
    target_col (str): Target column name
    
    Returns:
    tuple: (fitted model, mean squared error, R² score)
    """

💫 We create a function that will perform all our linear regression steps

This function takes three inputs:

The path to our data file (**CSV format**)

The column names we want to use as features (**predictors**)

The column name we want to predict (**target variable**)



# 3. Loading and Preparing the Data

In [13]:
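    # Load data from the path passed to the function
    df = pd.read_csv(csv_file)
    
    # Prepare features and target from the column names passed in.
    # Indexing with a list keeps X two-dimensional, as scikit-learn expects.
    X = df[feature_cols]
    y = df[target_col]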

**pd.read_csv()** reads our data file and converts it into a DataFrame (like a spreadsheet)

We separate our data into:

X: The features **(input variables)** we'll use to make predictions

y: The target **(output variable)** we want to predict
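scikit-learn estimators expect the features as a two-dimensional table (rows are samples, columns are features), while the target can be one-dimensional. The sketch below illustrates the difference between selecting with a list of column names and selecting with a single name. The small table is made up for illustration; its column names simply echo a salary-style dataset and are assumptions, not the real file.

```python
import pandas as pd

# Tiny made-up table standing in for the CSV file (values are illustrative only)
df = pd.DataFrame({
    "YearsExperience": [1.0, 2.0, 3.0, 4.0],
    "Salary": [40000, 45000, 50000, 55000],
})

feature_cols = ["YearsExperience"]  # a list, even for a single feature
target_col = "Salary"

X = df[feature_cols]  # list of names -> DataFrame with shape (n_samples, n_features)
y = df[target_col]    # single name   -> Series with shape (n_samples,)

print(type(X).__name__, X.shape)
print(type(y).__name__, y.shape)
```

Passing a list for the features means the same code works unchanged for one feature or several.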

# 4. Splitting Data into Training and Testing Sets

In [None]:
    # Split data
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

💫 We divide our data into two parts:

Training set (80%): **Used to teach our model patterns in the data**

Testing set (20%): **Used to evaluate how well our model performs on new, unseen data**

random_state=42 **ensures we get the same split every time (for reproducibility)**
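To see the 80/20 split and the effect of random_state in isolation, here is a minimal sketch on a made-up array of 20 samples. The data is synthetic and only serves to show the sizes of the two sets and that repeating the call with the same seed reproduces the split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 20 synthetic samples with one feature each (illustrative data only)
X = np.arange(20).reshape(-1, 1)
y = np.arange(20) * 2.0

# First split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Second split with the same random_state
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(len(X_train), len(X_test))        # sizes of the training and testing sets
print(np.array_equal(X_test, X_test2))  # whether both calls chose the same test rows
```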

# 5. Training the Linear Regression Model

In [None]:
    # Train model
    model = LinearRegression()
    model.fit(X_train, y_train)

💫 We create a Linear Regression model object

The **fit()** method **trains the model using our training data**

The model learns the relationship between our features (X_train) and target (y_train)
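As a quick check of what "learning the relationship" means, the sketch below fits the model to noise-free synthetic data generated from a known line, y = 3x + 5. The numbers are invented for illustration; with no noise, the fitted slope and intercept should match the line that produced the data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic training data lying exactly on the line y = 3x + 5
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = 3.0 * X_train.ravel() + 5.0

model = LinearRegression()
model.fit(X_train, y_train)

print(model.coef_)       # fitted slope, one entry per feature
print(model.intercept_)  # fitted intercept
```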

# 6. Making Predictions with Our Model

In [None]:
    # Make predictions
    y_pred = model.predict(X_test)

💫 We use our trained model to predict values for our test features

**y_pred** contains the model's **predicted target values** for the test set

We'll compare these predictions to the actual values (y_test) to evaluate performance

# 7. Evaluating Model Performance

In [None]:
    # Calculate metrics
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)

💫 We calculate two important metrics to evaluate our model:

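**Mean Squared Error (MSE): Measures the average squared difference between predicted and actual values, in squared units of the target (lower is better)**

**R² Score: Measures the share of the variation in the target that our model explains (closer to 1 is better; it can be negative on test data when the model does worse than always predicting the mean)**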
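Both metrics can be reproduced by hand, which makes their definitions concrete. The sketch below uses small made-up arrays for the actual and predicted values and compares the manual formulas with the scikit-learn functions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up actual and predicted values (illustrative only)
y_test = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 10.0])

# MSE: mean of the squared residuals
mse_manual = np.mean((y_test - y_pred) ** 2)

# R²: 1 minus (residual sum of squares / total sum of squares around the mean)
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(mse_manual, mean_squared_error(y_test, y_pred))
print(r2_manual, r2_score(y_test, y_pred))
```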

# 8. Displaying Results

In [None]:
    # Print results
    print("="*50)
    print("LINEAR REGRESSION RESULTS")
    print("="*50)
    print(f"Dataset: {csv_file}")
    print(f"Features: {feature_cols}")
    print(f"Target: {target_col}")
    print(f"Mean Squared Error: {mse:.4f}")
    print(f"R² Score: {r2:.4f}")
    print(f"Coefficients: {model.coef_}")
    print(f"Intercept: {model.intercept_:.4f}")

We print a summary of our analysis results

**Each coefficient is the expected change in the target for a one-unit increase in that feature, with the other features held fixed**

**The intercept is the predicted value when all features are zero**
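The coefficients and intercept fully determine the model's predictions: each prediction is the intercept plus the sum of every feature value times its coefficient. The following sketch checks this on made-up data with two features, generated from y = 2·x1 − 1·x2 + 10; the data and the generating equation are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data with two features, generated from y = 2*x1 - 1*x2 + 10
X = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [2.0, 1.0],
    [1.0, 3.0],
])
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 10.0

model = LinearRegression().fit(X, y)

# Rebuild the predictions by hand from the fitted parameters
manual_pred = X @ model.coef_ + model.intercept_

print(model.coef_, model.intercept_)
print(np.allclose(manual_pred, model.predict(X)))
```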

# 9. Visualizing Results (For Single Feature)

In [None]:
    # Plot results for single feature
    if len(feature_cols) == 1:
        plt.figure(figsize=(10, 6))
        plt.scatter(X_test, y_test, color='blue', alpha=0.6, label='Actual')
        plt.scatter(X_test, y_pred, color='red', alpha=0.6, label='Predicted')
        plt.plot(X_test, y_pred, color='red', linewidth=2)
        plt.xlabel(feature_cols[0])
        plt.ylabel(target_col)
        plt.title('Linear Regression Results')
        plt.legend()
        plt.grid(True)
        plt.show()
    
    return model, mse, r2

If we're using only one feature, we create a visualization

**Blue dots show the actual values from our test data**

**Red dots and line show our model's predictions**

This helps us see how well our line fits the data

# 10. Example Usage

In [None]:
# Example usage
if __name__ == "__main__":
    # Replace with your CSV file path
    csv_file = "your_data.csv"
    
    # Replace with your column names
    feature_columns = ["feature_column"]  # Can be list of multiple columns
    target_column = "target_column"
    
    # Run linear regression
    model, mse, r2 = linear_regression_from_csv(csv_file, feature_columns, target_column)

This section shows how to use our function with your own data

Replace the placeholder values with your actual:

CSV file path

Feature column name(s)

Target column name

When you run the script, it will execute the linear regression analysis
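If no dataset is at hand, the workflow can be tried end to end on synthetic data. The sketch below writes a small made-up CSV to a temporary directory and then runs the same load, split, fit, and evaluate steps inline. The file name demo_data.csv, the generating line (target = 4 × feature + 7 plus noise), and the noise level are all assumptions for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: target = 4 * feature + 7 plus small noise (made up for illustration)
rng = np.random.default_rng(0)
feature = rng.uniform(0, 10, size=50)
target = 4.0 * feature + 7.0 + rng.normal(0, 0.5, size=50)

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_file = os.path.join(tmp_dir, "demo_data.csv")  # hypothetical file name
    pd.DataFrame(
        {"feature_column": feature, "target_column": target}
    ).to_csv(csv_file, index=False)

    # Load the file back, as the function would
    df = pd.read_csv(csv_file)

# Same steps as in the sections above
X = df[["feature_column"]]
y = df["target_column"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)
r2 = r2_score(y_test, model.predict(X_test))

print(f"Coefficient: {model.coef_[0]:.4f}")
print(f"Intercept: {model.intercept_:.4f}")
print(f"R² on the held-out rows: {r2:.4f}")
```

Because the noise is small relative to the spread of the target, the fitted coefficient and intercept should land close to 4 and 7.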

# How to Use This Code:

1. Prepare your data in a CSV file with clear column names
2. Update the example usage section with your specific file path and column names
3. Run the script to perform linear regression on your data
4. Interpret the results:
   - Lower MSE means better predictions
   - R² closer to 1 means the model explains more of the variance
   - Coefficients show the direction and strength of each feature's relationship with the target

This approach helps you understand the relationship between variables and make predictions based on historical data patterns.