<a href="https://colab.research.google.com/github/MohdHassan7721/Customer-Segmentation-Analysis/blob/main/Revenue%20Forecasting%20%26%20Prediction%20Model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Revenue Prediction using Linear Regression
This notebook builds an interactive system that predicts company revenue with Linear Regression from spending figures, employee count, and region. It covers data preprocessing, user input for predictions, and performance metrics (MAE, RMSE, R²).


## Approach to Solving the Problem

1. Dataset Understanding
   * Use `Revenue.csv` as the input dataset.
   * Target variable: Revenue
   * Input features:
     * Numerical: Marketing_Spend, R&D_Spend, Administration_Costs, Number_of_Employees
     * Categorical: Region
   * The dataset is static and is used for both training and evaluation.
2. Feature–Target Separation
   * Separate the dataset into:
     * X (features) → all columns except Revenue
     * y (target) → Revenue
   * This separation must happen before preprocessing and model training.
3. Data Preprocessing
   * Apply preprocessing using a ColumnTransformer.
   * Numerical features:
     * Handle missing values using mean imputation.
     * Apply StandardScaler to standardize values.
   * Categorical features:
     * Handle missing values using the most frequent category.
     * Apply One-Hot Encoding to convert Region into numeric columns.
     * Ignore unknown categories during prediction.
4. Pipeline Construction
   * Combine preprocessing and model training into a single pipeline.
   * Pipeline flow:
     * Preprocessing step
     * Linear Regression model
   * This ensures the same transformations are applied during training and prediction.
5. Train–Test Split
   * Split the dataset into:
     * Training set
     * Test set
   * Use a fixed random state for reproducibility.
6. Model Training
   * Train the Linear Regression model on the training set.
   * The model learns the relationship between business inputs and revenue.
7. Model Evaluation
   * Evaluate model performance on the test set.
   * Display the following metrics:
     * Mean Absolute Error (MAE)
     * Root Mean Squared Error (RMSE)
     * R-squared (R²)
   * These metrics indicate prediction accuracy and goodness of fit.
8. User Input Handling
   * Accept user input for:
     * Marketing Spend
     * R&D Spend
     * Administration Costs
     * Number of Employees
     * Region
   * Validate numeric and categorical inputs before prediction.
9. Revenue Prediction
   * Convert user input into a DataFrame.
   * Pass the input through the trained pipeline.
   * Display the predicted revenue in a readable format.
10. Continuous Interaction Loop
    * Allow users to make multiple predictions in a loop.
    * Provide an option to exit the program cleanly.
11. Program Termination
    * Exit the program when the user chooses to stop.
    * Ensure no further predictions are made after exit.

## Business Revenue Prediction using Linear Regression

Objective:

The objective of this project is to predict company revenue based on key business drivers such as:

* Marketing Spend

* R&D Spend

* Administrative Costs

* Number of Employees

* Business Region

This helps organizations:

* Forecast revenue

* Optimize spending decisions

* Understand the impact of different business factors

## Step 1: Import Required Libraries

**Explanation:**

* pandas → data loading and manipulation

* numpy → numerical computations

* train_test_split → splitting the data for model validation

* LinearRegression → regression algorithm

* mean_absolute_error, mean_squared_error, r2_score → model evaluation (MAE, RMSE, R²)

* StandardScaler → feature scaling

* OneHotEncoder → categorical encoding

* SimpleImputer → filling in missing values

* ColumnTransformer + Pipeline → preprocessing bundled with the model, so the same steps run at training and prediction time

In [None]:
import pandas as pd
import numpy as np


In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer


## Step 2: Load and Preprocess the Dataset
***Machine learning models require:***

* Clean data

* Numerical inputs

* Proper handling of missing values.

***Identify Numerical and Categorical Features:***

* Numerical → scaling required.

* Categorical → encoding required.

***Numerical Data Preprocessing Pipeline.***

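scikit-learn's LinearRegression is affected by:

* Missing values: the model cannot be fit on data that contains NaN.

* Feature scale: ordinary least squares predictions do not change when features are rescaled, but coefficient magnitudes do, so coefficients of unscaled features are hard to compare.

**Explanation:**

* Mean imputation → fills missing numerical values with the column mean

* Standard scaling → rescales each feature to zero mean and unit variance so coefficients are comparable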
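
The imputation and scaling steps can be tried on a toy column before applying them to the real data. The sketch below is illustrative only: the `marketing_spend` values are made up and do not come from `Revenue.csv`.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical single-column sample with one missing value (not from Revenue.csv).
marketing_spend = np.array([[100.0], [200.0], [np.nan], [300.0]])

numerical_pipeline = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
])

scaled = numerical_pipeline.fit_transform(marketing_spend)

# The missing entry is replaced by the column mean (200.0),
# which becomes 0 once the column is standardized.
print(scaled.ravel())
```

After the transform the column has zero mean and unit variance, and the imputed row sits exactly at the mean.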

***Categorical Data Preprocessing Pipeline.***

**Explanation:**

* Most frequent imputation → handles missing categories

* One-Hot Encoding → converts regions into binary features

* handle_unknown='ignore' → encodes a region not seen during training as all zeros instead of raising an error at prediction time
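
To see what the encoder does with a region it has never seen, here is a small sketch. The region values are hypothetical examples; the actual categories depend on the `Region` column of the dataset.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training regions; the real ones come from the dataset.
train_regions = np.array([["Asia"], ["Europe"], ["North America"], ["Europe"]])

encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(train_regions)

# A known region gets a single 1 in its column.
known = encoder.transform([["Europe"]]).toarray()
# An unseen region is encoded as all zeros instead of raising an error.
unseen = encoder.transform([["Antarctica"]]).toarray()

print(encoder.categories_[0])
print(known)
print(unseen)
```

One consequence worth keeping in mind: a misspelled region still produces a prediction, because the all-zero row is valid input to the model.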

In [None]:
# Step 2: Load and preprocess the dataset
def load_and_preprocess(file_path):
    data = pd.read_csv(file_path)

    # Separate features and target
    X = data.drop('Revenue', axis=1)
    y = data['Revenue']

    # Identify columns
    numerical_features = [
        'Marketing_Spend',
        'R&D_Spend',
        'Administration_Costs',
        'Number_of_Employees'
    ]

    categorical_features = ['Region']

    # Numerical preprocessing
    numerical_pipeline = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='mean')),
        ('scaler', StandardScaler())
    ])

    # Categorical preprocessing
    categorical_pipeline = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='most_frequent')),
        ('onehot', OneHotEncoder(handle_unknown='ignore'))
    ])

    # Combine preprocessing
    preprocessor = ColumnTransformer(transformers=[
        ('num', numerical_pipeline, numerical_features),
        ('cat', categorical_pipeline, categorical_features)
    ])

    return X, y, preprocessor


## Step 3: Train the Linear Regression Model

**Explanation:**

* Preprocessing + model combined into one pipeline

* Prevents data leakage: imputation and scaling statistics are learned from the training split only

* Makes deployment easier
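
The leakage point can be checked on toy data. This is a minimal sketch with made-up numbers, using only a scaler and a regressor rather than the full preprocessor from Step 2.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: one feature, target is exactly 3 * x + 5.
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = 3.0 * X_train.ravel() + 5.0
X_test = np.array([[100.0]])  # deliberately far outside the training range

model = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("regressor", LinearRegression()),
])
model.fit(X_train, y_train)

# The scaler's mean comes from the training rows only;
# X_test never influences it.
train_mean = model.named_steps["scaler"].mean_[0]

# predict() scales the test row with the training statistics, then regresses.
prediction = model.predict(X_test)[0]

print(train_mean)
print(prediction)
```

Because `fit` is called on the training rows alone, the stored mean is that of the training feature, and the test row is transformed with it at prediction time.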

In [None]:
def train_model(X_train, y_train, preprocessor):
    model = Pipeline(steps=[
        ('preprocessor', preprocessor),
        ('regressor', LinearRegression())
    ])

    model.fit(X_train, y_train)
    return model


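## Step 4: Model Evaluation

* MAE → average absolute prediction error, in the same units as revenue

* RMSE → square root of the mean squared error; penalizes large errors more heavily than MAE

* R² Score → proportion of the variance in revenue explained by the model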
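
For reference, the three metrics can be computed by hand. The sketch below uses made-up actual and predicted values (not results from this notebook) and plain Python, so each formula is visible.

```python
import math

# Hypothetical actual and predicted revenues (not taken from Revenue.csv).
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 330.0, 400.0]

n = len(y_true)
errors = [t - p for t, p in zip(y_true, y_pred)]

# MAE: mean of the absolute errors.
mae = sum(abs(e) for e in errors) / n

# RMSE: square root of the mean of the squared errors.
rmse = math.sqrt(sum(e ** 2 for e in errors) / n)

# R^2: 1 minus (residual sum of squares / total sum of squares).
mean_true = sum(y_true) / n
ss_res = sum(e ** 2 for e in errors)
ss_tot = sum((t - mean_true) ** 2 for t in y_true)
r2 = 1.0 - ss_res / ss_tot

print(f"MAE  : {mae:.2f}")
print(f"RMSE : {rmse:.2f}")
print(f"R^2  : {r2:.3f}")
```

Here the RMSE is larger than the MAE because the single 30-unit miss is squared before averaging, which is the sense in which RMSE penalizes large errors.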

In [None]:
# Step 4: Evaluate model performance
def evaluate_model(model, X_test, y_test):
    y_pred = model.predict(X_test)

    mae = mean_absolute_error(y_test, y_pred)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    r2 = r2_score(y_test, y_pred)

    print("\nModel Evaluation Metrics:")
    print(f"MAE  : {mae:.2f}")
    print(f"RMSE : {rmse:.2f}")
    print(f"R\u00b2   : {r2:.3f}")

## Step 5: Revenue Prediction for New Inputs

**Explanation:**

* Accepts dynamic user inputs

* Applies same preprocessing automatically

* Outputs predicted revenue
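
The approach outline calls for validating numeric and categorical inputs before prediction. One possible way to do that is sketched below. The helpers `read_number` and `read_region` are hypothetical names, not part of the notebook, and rejecting negative values is an assumption about what counts as valid input. The region list is taken from the prompt text used in the main program.

```python
import math


def read_number(prompt, cast=float, input_fn=input):
    """Prompt until the reply parses as a finite, non-negative number."""
    while True:
        raw = input_fn(prompt).strip()
        try:
            value = cast(raw)
        except ValueError:
            print(f"'{raw}' is not a valid number. Please try again.")
            continue
        # Assumption: spend and headcount cannot be negative, NaN, or infinite.
        if not math.isfinite(value) or value < 0:
            print("Value must be a finite, non-negative number.")
            continue
        return value


def read_region(prompt, allowed, input_fn=input):
    """Prompt until the reply matches an allowed region (case-insensitive)."""
    lookup = {name.lower(): name for name in allowed}
    while True:
        raw = input_fn(prompt).strip().lower()
        if raw in lookup:
            return lookup[raw]
        print(f"Region must be one of: {', '.join(allowed)}")


# Region names as shown in the main program's prompt; adjust to the dataset.
REGIONS = ["North America", "Europe", "Asia"]
```

The `input_fn` parameter defaults to the built-in `input` and exists only so the helpers can be exercised without a keyboard. A call such as `read_number("Number of Employees: ", cast=int)` would re-prompt on `12.5` or `abc`.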

In [None]:
# Step 5: Predict revenue for new input
def predict_revenue(model, user_input):
    input_df = pd.DataFrame([user_input])
    prediction = model.predict(input_df)
    print(f"\nPredicted Revenue: {prediction[0]:.2f}")

## Step 6: Train-Test Split and Execution Flow
* 80% training

* 20% testing

* Reproducible results
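
A quick sketch of what the split settings do, using a made-up 10-row array instead of the real dataset: `test_size=0.2` holds out 20% of the rows, and a fixed `random_state` makes the same rows land in the test set on every run.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 rows, so a 20% test split holds out 2 rows.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Repeating the call with the same random_state selects the same rows.
_, X_test_again, _, y_test_again = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(len(X_train), len(X_test))
print(np.array_equal(y_test, y_test_again))
```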

In [None]:
# Step 6: Main program
def main():
    file_path = "Revenue.csv"  # upload this file if using Colab

    X, y, preprocessor = load_and_preprocess(file_path)

    # Train-test split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # Train model
    model = train_model(X_train, y_train, preprocessor)

    # Evaluate model
    evaluate_model(model, X_test, y_test)

    # Interactive prediction loop
    while True:
        print("\nEnter business details (or type 'exit' to stop):")

        choice = input("Continue? (yes/exit): ").lower()
        if choice == 'exit':
            print("Program terminated.")
            break

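        try:
            user_input = {
                'Marketing_Spend': float(input("Marketing Spend: ")),
                'R&D_Spend': float(input("R&D Spend: ")),
                'Administration_Costs': float(input("Administration Costs: ")),
                'Number_of_Employees': int(input("Number of Employees: ")),
                'Region': input("Region (North America / Europe / Asia): ").strip()
            }
        except ValueError:
            # Non-numeric entry: report it and restart the loop instead of crashing
            print("Invalid number entered. Please try again.")
            continue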

        predict_revenue(model, user_input)


if __name__ == "__main__":
    main()

FileNotFoundError: [Errno 2] No such file or directory: 'Revenue.csv'

## Step 7: File Upload and Main Program Execution Control

This code block ensures that the `Revenue.csv` file, which is crucial for the model, is available in the Colab environment.

- It first checks if `Revenue.csv` already exists in the current directory.
- If the file is *not* found, it prompts the user to upload it using `files.upload()`.
- After a successful upload (or if the file already existed), it proceeds to call the `main()` function, which orchestrates the entire revenue prediction workflow (data loading, preprocessing, model training, evaluation, and interactive prediction).

In [None]:
import os
from google.colab import files

file_name_revenue = 'Revenue.csv'

# Upload the file only if it is not already in the working directory
if not os.path.exists(file_name_revenue):
    print(f"'{file_name_revenue}' not found. Please upload the file.")
    try:
        # Prompt user to upload the file
        uploaded = files.upload()
    except Exception as e:
        print(f"An error occurred during file upload: {e}")
    else:
        if file_name_revenue not in uploaded:
            print(f"Error: '{file_name_revenue}' was not uploaded. Please ensure you select the correct file.")
        else:
            print(f"'{file_name_revenue}' uploaded successfully. Running the main function for Revenue Prediction...")
else:
    print(f"'{file_name_revenue}' already exists. Running the main function for Revenue Prediction...")

# Run the workflow only when the dataset is present. Calling main() outside the
# try block keeps errors raised by main() from being reported as upload errors.
if os.path.exists(file_name_revenue):
    main()

'Revenue.csv' not found. Please upload the file.


Saving Revenue.csv to Revenue.csv
'Revenue.csv' uploaded successfully. Running the main function for Revenue Prediction...

Model Evaluation Metrics:
MAE  : 6648.40
RMSE : 8363.06
R²   : 0.933

Enter business details (or type 'exit' to stop):
Continue? (yes/exit): exit
Program terminated.
