# Machine Learning Project

# Startup Profit Prediction

This notebook explains **each step** of building, training, saving, and deploying a Machine Learning model.


## 1. Import Required Libraries
We import libraries needed for data handling, visualization, preprocessing, modeling, and saving the model.

In [95]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## 2. Load Dataset

In [96]:
df= pd.read_csv(r"C:\Users\PMLS\OneDrive\Desktop\ml\50_Startups (1).csv")
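The absolute Windows path above is specific to one machine. The following is a minimal sketch of a load step that also checks for the expected columns, assuming the CSV has the five columns reported by `df.info()`. The helper name `load_startups` and the in-memory sample standing in for the real file are hypothetical.

```python
import io
import pandas as pd

# Column names as reported by df.info() in this notebook.
EXPECTED_COLUMNS = ["R&D Spend", "Administration", "Marketing Spend", "State", "Profit"]

def load_startups(source):
    """Read the CSV (path or file-like object) and fail early if a column is missing."""
    frame = pd.read_csv(source)
    missing = [c for c in EXPECTED_COLUMNS if c not in frame.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    return frame

# Stand-in for the real file: two rows copied from the dataset preview.
sample_csv = io.StringIO(
    "R&D Spend,Administration,Marketing Spend,State,Profit\n"
    "165349.2,136897.8,471784.1,New York,192261.83\n"
    "162597.7,151377.59,443898.53,California,191792.06\n"
)
sample_df = load_startups(sample_csv)
print(sample_df.shape)
```

With the real data, the same helper could be called with a relative path to the CSV file instead of the `StringIO` object.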

## 3. Basic Data Exploration
We inspect the first rows, summary statistics, column types, missing values, and shape of the dataset.

In [97]:
df.head()

| | R&D Spend | Administration | Marketing Spend | State | Profit |
|---|---|---|---|---|---|
| 0 | 165349.2 | 136897.8 | 471784.1 | New York | 192261.83 |
| 1 | 162597.7 | 151377.59 | 443898.53 | California | 191792.06 |
| 2 | 153441.51 | 101145.55 | 407934.54 | Florida | 191050.39 |
| 3 | 144372.41 | 118671.85 | 383199.62 | New York | 182901.99 |
| 4 | 142107.34 | 91391.77 | 366168.42 | Florida | 166187.94 |


In [98]:
df.describe()

| | R&D Spend | Administration | Marketing Spend | Profit |
|---|---|---|---|---|
| count | 50.0 | 50.0 | 50.0 | 50.0 |
| mean | 73721.6156 | 121344.6396 | 211025.0978 | 112012.6392 |
| std | 45902.256482 | 28017.802755 | 122290.310726 | 40306.180338 |
| min | 0.0 | 51283.14 | 0.0 | 14681.4 |
| 25% | 39936.37 | 103730.875 | 129300.1325 | 90138.9025 |
| 50% | 73051.08 | 122699.795 | 212716.24 | 107978.19 |
| 75% | 101602.8 | 144842.18 | 299469.085 | 139765.9775 |
| max | 165349.2 | 182645.56 | 471784.1 | 192261.83 |


In [99]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 5 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   R&D Spend        50 non-null     float64
 1   Administration   50 non-null     float64
 2   Marketing Spend  50 non-null     float64
 3   State            50 non-null     object 
 4   Profit           50 non-null     float64
dtypes: float64(4), object(1)
memory usage: 2.1+ KB


In [100]:
df.isnull().sum()

R&D Spend          0
Administration     0
Marketing Spend    0
State              0
Profit             0
dtype: int64

In [101]:
df.shape

(50, 5)

In [102]:
df.columns

Index(['R&D Spend', 'Administration', 'Marketing Spend', 'State', 'Profit'], dtype='object')

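## Handle Categorical Data
We apply one-hot encoding to the `State` column. With `drop_first=True`, the first category (`California`) is dropped and acts as the baseline, leaving the indicator columns `State_Florida` and `State_New York`.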

In [103]:
df_encoded = pd.get_dummies(df, drop_first=True)

In [104]:
df_encoded.head()

| | R&D Spend | Administration | Marketing Spend | Profit | State_Florida | State_New York |
|---|---|---|---|---|---|---|
| 0 | 165349.2 | 136897.8 | 471784.1 | 192261.83 | False | True |
| 1 | 162597.7 | 151377.59 | 443898.53 | 191792.06 | False | False |
| 2 | 153441.51 | 101145.55 | 407934.54 | 191050.39 | True | False |
| 3 | 144372.41 | 118671.85 | 383199.62 | 182901.99 | False | True |
| 4 | 142107.34 | 91391.77 | 366168.42 | 166187.94 | True | False |
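To see what `drop_first=True` changes, the sketch below encodes a three-row toy frame with and without it. The frame `toy` is illustrative only; its values are copied from the first three rows of the dataset.

```python
import pandas as pd

# Toy frame with one row per state (values taken from the dataset preview).
toy = pd.DataFrame({
    "State": ["New York", "California", "Florida"],
    "Profit": [192261.83, 191792.06, 191050.39],
})

full = pd.get_dummies(toy)                      # one indicator column per state
reduced = pd.get_dummies(toy, drop_first=True)  # first category (sorted order) is dropped

print(list(full.columns))
print(list(reduced.columns))
```

A row with both remaining indicators set to `False` therefore represents `California`.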


## Feature & Target Separation
`Profit` is our target variable; all remaining columns are used as features.

In [105]:
X = df_encoded.drop('Profit', axis=1)
y = df_encoded['Profit']

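## Train-Test Split
We hold out 20% of the 50 rows (10 rows) as a test set; `random_state=42` makes the split reproducible.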

In [106]:
from sklearn.model_selection import train_test_split

In [107]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
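The effect of `test_size=0.2` and `random_state=42` can be checked on stand-in data of the same length. This is a small sketch; the arrays `X_demo` and `y_demo` are placeholders, not the startup data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with 50 rows, the same number of rows as the dataset.
X_demo = np.arange(100).reshape(50, 2)
y_demo = np.arange(50)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
print(X_tr.shape, X_te.shape)

# Repeating the call with the same seed yields the same test rows.
_, X_te_again, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
print(np.array_equal(X_te, X_te_again))
```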

In [108]:
from sklearn.preprocessing import StandardScaler
scaler= StandardScaler()

In [109]:
from sklearn.tree import DecisionTreeRegressor
dt= DecisionTreeRegressor()

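## Feature Scaling
`StandardScaler` rescales each feature to zero mean and unit variance. The scaler is fitted on the training set only and then applied to the test set, so no test information leaks into preprocessing. Decision trees split on thresholds and are not sensitive to this kind of rescaling, so the step matters mainly for keeping one consistent preprocessing path at training and inference time.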

In [110]:
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
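The sketch below illustrates both points on synthetic data: the scaler's statistics come from the training split only, and a decision tree's predictions are the same with or without standardization. All names and the generated data are illustrative assumptions, not part of the notebook.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data on a scale similar to the spend columns.
rng = np.random.default_rng(0)
X_train_demo = rng.normal(loc=100_000, scale=30_000, size=(40, 3))
X_test_demo = rng.normal(loc=100_000, scale=30_000, size=(10, 3))
y_train_demo = X_train_demo @ np.array([0.8, 0.1, 0.05]) + rng.normal(scale=5_000, size=40)

demo_scaler = StandardScaler()
X_train_std = demo_scaler.fit_transform(X_train_demo)  # statistics learned from train only
X_test_std = demo_scaler.transform(X_test_demo)        # test reuses the train statistics

print(X_train_std.mean(axis=0).round(6))  # close to 0 per column
print(X_train_std.std(axis=0).round(6))   # close to 1 per column

# Fit the same tree on raw and on standardized features and compare predictions.
tree_raw = DecisionTreeRegressor(random_state=0).fit(X_train_demo, y_train_demo)
tree_std = DecisionTreeRegressor(random_state=0).fit(X_train_std, y_train_demo)
same_predictions = np.allclose(tree_raw.predict(X_test_demo), tree_std.predict(X_test_std))
print(same_predictions)
```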

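## Model Training
We train a Decision Tree Regressor with default hyperparameters. No `random_state` is set, so the fitted tree and the metrics reported below can vary slightly between runs.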

In [111]:
dt.fit(X_train_scaled, y_train)

In [112]:
predict= dt.predict(X_test_scaled)

In [113]:
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

In [114]:
# Short aliases for the metric functions
mae = mean_absolute_error
mse = mean_squared_error
r2 = r2_score

## Model Evaluation
We evaluate the predictions on the held-out test set using mean absolute error (MAE), mean squared error (MSE), and the R² score.

In [115]:
mae(y_test, predict)

8705.371999999998

In [116]:
mse(y_test, predict)

129310587.81397995

In [117]:
r2(y_test, predict)

0.8403163083287633
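For reference, the three metrics above can be computed by hand with NumPy. The sketch uses four made-up values (`y_true`, `y_pred`) rather than the notebook's test set, so the arithmetic is easy to follow.

```python
import numpy as np

# Made-up targets and predictions for illustration.
y_true = np.array([100.0, 150.0, 200.0, 250.0])
y_pred = np.array([110.0, 140.0, 190.0, 270.0])

errors = y_true - y_pred                  # [-10, 10, 10, -20]

mae_value = np.mean(np.abs(errors))       # (10 + 10 + 10 + 20) / 4 = 12.5
mse_value = np.mean(errors ** 2)          # (100 + 100 + 100 + 400) / 4 = 175.0
rmse_value = np.sqrt(mse_value)           # same units as the target

# R² = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum(errors ** 2)                        # 700
ss_tot = np.sum((y_true - y_true.mean()) ** 2)      # 12500
r2_value = 1.0 - ss_res / ss_tot                    # 0.944

print(mae_value, mse_value, r2_value)
```

Taking the square root of the MSE reported above gives an RMSE of about 11,371, which is in the same units as `Profit` and easier to compare with the MAE.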

In [118]:
import joblib

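## Save Model & Preprocessing Objects
These files are required for deployment (for example with Gradio, Streamlit, Flask, or FastAPI): the trained model, the fitted scaler, and the list of training columns in their original order.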

In [119]:
# Save model
joblib.dump(dt, "D:\\ML project\\model.pkl")
print('Model saved successfully')

Model saved successfully


In [122]:
import joblib

joblib.dump(dt, "model.pkl")
joblib.dump(scaler, "scaler.pkl")
joblib.dump(X.columns.tolist(), "columns.pkl")

print("Model, scaler, and columns saved")


Model, scaler, and columns saved
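A quick way to confirm that a saved file is usable is to load it back and compare predictions. The following is a sketch on synthetic data; `model_demo` and the temporary file name are hypothetical, and the notebook's own `model.pkl` is not touched.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Small synthetic regression problem, used only for the round-trip check.
rng = np.random.default_rng(1)
X_small = rng.normal(size=(30, 3))
y_small = X_small[:, 0] * 2.0 + rng.normal(scale=0.1, size=30)
model_demo = DecisionTreeRegressor(random_state=0).fit(X_small, y_small)

# Save to a temporary directory and load the model back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_demo.pkl")
    joblib.dump(model_demo, path)
    restored = joblib.load(path)

# The restored model should reproduce the original predictions exactly.
round_trip_ok = np.array_equal(model_demo.predict(X_small), restored.predict(X_small))
print(round_trip_ok)
```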


# Deployment with Gradio

In [121]:
import gradio as gr
import joblib
import pandas as pd

model = joblib.load("model.pkl")
scaler = joblib.load("scaler.pkl")
columns = joblib.load("columns.pkl")

def predict_profit(rd, admin, marketing, state):
    data = {
        "R&D Spend": rd,
        "Administration": admin,
        "Marketing Spend": marketing,
        "State": state
    }

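    # Build a single-row frame and one-hot encode the state.
    df = pd.DataFrame([data])
    df = pd.get_dummies(df)

    # Add any training columns missing from this row (e.g. the other state indicator).
    for col in columns:
        if col not in df.columns:
            df[col] = 0

    # Keep only the training columns, in training order, then scale and predict.
    df = df[columns]
    scaled = scaler.transform(df)
    prediction = model.predict(scaled)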

    return f"Predicted Profit: ${prediction[0]:,.2f}"

app = gr.Interface(
    fn=predict_profit,
    inputs=[
        gr.Number(label="R&D Spend"),
        gr.Number(label="Administration"),
        gr.Number(label="Marketing Spend"),
        gr.Dropdown(["New York", "California", "Florida"], label="State")
    ],
    outputs="text",
    title="Startup Profit Prediction"
)

app.launch()


* Running on local URL:  http://127.0.0.1:7862
* To create a public link, set `share=True` in `launch()`.
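The column-alignment step inside `predict_profit` can be tested without launching the app. Below is a minimal sketch that uses `DataFrame.reindex` as an alternative to the explicit loop; the helper `encode_row` is hypothetical, and `training_columns` is written out by hand on the assumption that it matches the contents of `columns.pkl`.

```python
import pandas as pd

# Assumed to match X.columns from training (see the encoded frame earlier in the notebook).
training_columns = [
    "R&D Spend", "Administration", "Marketing Spend",
    "State_Florida", "State_New York",
]

def encode_row(rd, admin, marketing, state):
    """Encode one input row so its columns match the training layout."""
    row = pd.DataFrame([{
        "R&D Spend": rd,
        "Administration": admin,
        "Marketing Spend": marketing,
        "State": state,
    }])
    encoded = pd.get_dummies(row)
    # reindex adds missing indicator columns as 0 and drops unseen ones (e.g. State_California).
    return encoded.reindex(columns=training_columns, fill_value=0).astype(float)

ny = encode_row(165349.2, 136897.8, 471784.1, "New York")
ca = encode_row(162597.7, 151377.59, 443898.53, "California")
print(ny.iloc[0].tolist())
print(ca.iloc[0].tolist())
```

`California` ends up with both indicators at 0, which matches the baseline category dropped by `drop_first=True` during training.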


