<a href="https://colab.research.google.com/github/Devesh42508/50startup/blob/main/startup_50.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [14]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score

In [15]:
data = pd.read_csv('/content/50_Startups.csv')
data.columns

Index(['R&D Spend', 'Administration', 'Marketing Spend', 'Profit'], dtype='object')

In [16]:
X = data[['R&D Spend', 'Administration','Marketing Spend']]
y = data['Profit']

In [10]:
# Hold out 20% of the rows as the test set; the remaining 80% is used for training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
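
Because the call above passes no `random_state`, each run draws a different split, so the scores reported further down will not reproduce exactly. Below is a minimal sketch of a seeded split on synthetic stand-in data. The names `X_demo` and `y_demo` and the seed `42` are illustrative assumptions, not part of the original notebook:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 50 rows, 3 features (not the startup dataset).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = rng.normal(size=50)

# Passing the same random_state yields the same partition on every call.
split_a = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
split_b = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

X_train_a, X_test_a, y_train_a, y_test_a = split_a
print(X_train_a.shape, X_test_a.shape)
print(np.array_equal(split_a[1], split_b[1]))
```

Fixing the seed trades a little generality for repeatability: the metrics then describe one particular split rather than an average over splits.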

In [17]:
lr_model = LinearRegression()
lr_model.fit(X_train, y_train)

In [18]:
dt_model = DecisionTreeRegressor(random_state=42)
dt_model.fit(X_train, y_train)

In [19]:
rf_model = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)

R^2 = 1: The model explains all of the variance in the target variable. Every prediction equals the observed value exactly.

R^2 = 0: The model explains none of the variance in the target variable. It does no better than always predicting the mean of the target values. R^2 can also be negative on held-out data, which means the model does worse than that mean baseline.

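TSS (total sum of squares) measures the spread of the targets around their mean $\bar{y}$, and RSS (residual sum of squares) measures the error of the predictions $\hat{y}_i$:

$$\mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

$$\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

$$R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}$$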
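
As a sanity check, R^2 can be computed directly as one minus the ratio of the residual sum of squares to the total sum of squares, and compared with scikit-learn's `r2_score`. The arrays below are made-up values chosen for easy arithmetic, not data from this notebook:

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up values for illustration only.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 10.0])

tss = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
rss = np.sum((y_true - y_hat) ** 2)          # residual sum of squares
r2_manual = 1 - rss / tss

# Both values should agree up to floating-point rounding.
print(r2_manual, r2_score(y_true, y_hat))
```

Here the mean is 6, so TSS = 20 and RSS = 1.5, giving R^2 = 1 - 1.5/20 = 0.925.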
​


In [20]:
def evaluate_model(model, X, y):
    """Return (MSE, R^2) of the model's predictions on X against y."""
    y_pred = model.predict(X)
    mse = mean_squared_error(y, y_pred)
    r2 = r2_score(y, y_pred)
    return mse, r2

In [22]:
lr_mse, lr_r2 = evaluate_model(lr_model, X_test, y_test)
dt_mse, dt_r2 = evaluate_model(dt_model, X_test, y_test)
rf_mse, rf_r2 = evaluate_model(rf_model, X_test, y_test)

In [23]:
print("Linear Regression:")
print(f"Mean Squared Error: {lr_mse}")
print(f"R-squared: {lr_r2}")

Linear Regression:
Mean Squared Error: 230472975.5243033
R-squared: 0.8927432593100261


In [25]:
print("Decision Tree Regression:")
print(f"Mean Squared Error: {dt_mse}")
print(f"R-squared: {dt_r2}")

Decision Tree Regression:
Mean Squared Error: 465881687.68864
R-squared: 0.7831895420495464


In [26]:
print("Random Forest Regression:")
print(f"Mean Squared Error: {rf_mse}")
print(f"R-squared: {rf_r2}")

Random Forest Regression:
Mean Squared Error: 368587243.8657841
R-squared: 0.8284681041366787
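
The MSE values above are in squared units of `Profit`, which makes them hard to read at a glance. A small sketch that converts them to root mean squared error (RMSE), which is in the same units as the target. The dictionary names are illustrative; the numbers are copied from the outputs above:

```python
import math

# MSE values copied from the printed outputs above.
mse_by_model = {
    "Linear Regression": 230472975.5243033,
    "Decision Tree Regression": 465881687.68864,
    "Random Forest Regression": 368587243.8657841,
}

# RMSE = sqrt(MSE), expressed in the same units as Profit.
rmse_by_model = {name: math.sqrt(mse) for name, mse in mse_by_model.items()}

for name, rmse in rmse_by_model.items():
    print(f"{name}: RMSE = {rmse:,.0f}")
```

The ordering of the models is unchanged, since the square root is monotonic; only the scale becomes easier to interpret.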


In [27]:
best_model = None
best_r2 = -float('inf')

In [28]:
models = {'Linear Regression': lr_model, 'Decision Tree Regression': dt_model, 'Random Forest Regression': rf_model}

In [29]:
models.items()

dict_items([('Linear Regression', LinearRegression()), ('Decision Tree Regression', DecisionTreeRegressor(random_state=42)), ('Random Forest Regression', RandomForestRegressor(random_state=42))])

In [33]:
for model_name, model in models.items():
    mse, r2 = evaluate_model(model, X_test, y_test)
    if r2 > best_r2:
        best_model = model_name
        best_r2 = r2

In [35]:
print(f"The best model is {best_model} with R-squared:{best_r2}")

The best model is Linear Regression with R-squared:0.8927432593100261
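
The loop above picks a winner from a single held-out split of only 20% of the rows, so the ranking may change from one split to another. One common way to get a steadier comparison is k-fold cross-validation. The following is a rough sketch on synthetic data, not a rerun of this notebook: `X_demo`, `y_demo`, `candidates`, the fold count, and the seeds are all assumptions made for illustration, and the random forest is left out only to keep the example short:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data: 50 rows, 3 features (not the startup dataset).
X_demo, y_demo = make_regression(
    n_samples=50, n_features=3, noise=10.0, random_state=0
)

candidates = {
    "Linear Regression": LinearRegression(),
    "Decision Tree Regression": DecisionTreeRegressor(random_state=42),
}

# Average R^2 over 5 shuffled folds instead of relying on one split.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
mean_r2 = {
    name: cross_val_score(model, X_demo, y_demo, cv=cv, scoring="r2").mean()
    for name, model in candidates.items()
}

best_name = max(mean_r2, key=mean_r2.get)
print(best_name, mean_r2[best_name])
```

To apply the same idea to the notebook's data, `X` and `y` would be passed in place of `X_demo` and `y_demo`.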
