# 📊 Sales Prediction Using Python

---

**Author:** Piyush Ramteke  
**Organization:** CodSoft  
**Task:** Sales Prediction Using Python (Task 2)  

---

## 🎯 Objective

Sales prediction involves forecasting the amount of a product that customers will purchase, taking into account various factors such as **advertising expenditure**, **target audience segmentation**, and **advertising platform selection**.

In this project, we combine **interactive visualizations** 📈 with regression modeling to:
1. **Explore** the Advertising dataset to understand the relationships between ad spending and sales
2. **Visualize** key patterns using **Plotly** for dynamic charts
3. **Build** regression models (linear regression, random forest, gradient boosting) to predict sales
4. **Create an Interactive Widget** to simulate ad budget scenarios

## 📦 Dataset Overview

The dataset contains **200 records** of advertising budgets and sales:

| Feature | Description |
|---------|-------------|
| `TV` | Ad budget for TV (in thousands of $) |
| `Radio` | Ad budget for Radio (in thousands of $) |
| `Newspaper` | Ad budget for Newspaper (in thousands of $) |
| `Sales` | Product sales (in thousands of units); **Target** |

---
## 1️⃣ Importing Libraries 📦

In [1]:
# Core Libraries
import numpy as np
import pandas as pd
import warnings

# Interactive Visualization
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import seaborn as sns
import matplotlib.pyplot as plt

# Interactive Widgets
from ipywidgets import interact, widgets

# Modeling
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

warnings.filterwarnings('ignore')
print('✅ All libraries loaded with Interactive Power! 🚀')

✅ All libraries loaded with Interactive Power! 🚀


---
## 2️⃣ Loading Data 📂

In [2]:
df = pd.read_csv('Advertising.csv')
print(f'📐 Shape: {df.shape}')
df.head()

📐 Shape: (200, 4)

|   | TV | Radio | Newspaper | Sales |
|---|----|-------|-----------|-------|
| 0 | 230.1 | 37.8 | 69.2 | 22.1 |
| 1 | 44.5 | 39.3 | 45.1 | 10.4 |
| 2 | 17.2 | 45.9 | 69.3 | 12.0 |
| 3 | 151.5 | 41.3 | 58.5 | 16.5 |
| 4 | 180.8 | 10.8 | 58.4 | 17.9 |
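
Before exploring, it can help to confirm that the frame has the expected columns, numeric dtypes, and no missing or duplicated rows. The sketch below runs those checks on `sample`, a hypothetical stand-in built from the five rows printed above so that it runs without `Advertising.csv`; in the notebook the same calls would be applied to `df`. The variable names are illustrative and not part of the original notebook.

```python
import pandas as pd

# Hypothetical stand-in for df: the five rows shown by df.head() above.
sample = pd.DataFrame({
    'TV': [230.1, 44.5, 17.2, 151.5, 180.8],
    'Radio': [37.8, 39.3, 45.9, 41.3, 10.8],
    'Newspaper': [69.2, 45.1, 69.3, 58.5, 58.4],
    'Sales': [22.1, 10.4, 12.0, 16.5, 17.9],
})

expected_cols = ['TV', 'Radio', 'Newspaper', 'Sales']

# Columns the rest of the notebook relies on but that are absent
missing_cols = [c for c in expected_cols if c not in sample.columns]

# Total count of NaN cells across the whole frame
n_missing = int(sample.isna().sum().sum())

# True only if every expected column that is present has a numeric dtype
all_numeric = all(
    pd.api.types.is_numeric_dtype(sample[c])
    for c in expected_cols if c in sample.columns
)

# Number of rows that exactly repeat an earlier row
n_duplicates = int(sample.duplicated().sum())

print(f'Missing columns: {missing_cols}')
print(f'Missing values:  {n_missing}')
print(f'All numeric:     {all_numeric}')
print(f'Duplicate rows:  {n_duplicates}')
```

If any of these checks fails on the real data, it is usually easier to fix it here than to debug a misleading chart or model score later.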


---
## 3️⃣ Interactive EDA 🔍

In [3]:
# ── 3.1 Distribution of Features (Interactive) ────────────────

@interact(Column=['TV', 'Radio', 'Newspaper', 'Sales'])
def plot_distribution(Column):
    fig = px.histogram(df, x=Column, nbins=30,
                       title=f'📊 Distribution of {Column}',
                       marginal='box',
                       color_discrete_sequence=['#636EFA'])
    fig.show()

interactive(children=(Dropdown(description='Column', options=('TV', 'Radio', 'Newspaper', 'Sales'), value='TV'…

In [4]:
# ── 3.2 Sales vs Ad Spend (Interactive Scatter Plots) ────────

fig = make_subplots(rows=1, cols=3, subplot_titles=('TV vs Sales', 'Radio vs Sales', 'Newspaper vs Sales'))

for i, col in enumerate(['TV', 'Radio', 'Newspaper'], 1):
    # Add one scatter trace per channel, each in its own subplot column
    fig.add_trace(go.Scatter(x=df[col], y=df['Sales'], mode='markers', name=col), row=1, col=i)

fig.update_layout(title_text="📈 Sales vs Advertising Channels", showlegend=False)
fig.show()
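
The cell above draws markers only. If a fitted line per panel is wanted, one option is an ordinary least-squares line computed with `np.polyfit`. The sketch below is a minimal illustration on synthetic data (the `tv` and `sales` arrays are made-up stand-ins for `df['TV']` and `df['Sales']`, and the slope of 0.05 is an arbitrary assumption, not a property of the Advertising data).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for df['TV'] and df['Sales'] (values are made up)
tv = rng.uniform(0, 300, size=200)
sales = 7.0 + 0.05 * tv + rng.normal(0, 1.5, size=200)

# Degree-1 least-squares fit; coefficients come back highest power first
slope, intercept = np.polyfit(tv, sales, deg=1)

# Two endpoints are enough to draw a straight line
x_line = np.array([tv.min(), tv.max()])
y_line = intercept + slope * x_line

print(f'slope={slope:.4f}, intercept={intercept:.4f}')

# In the notebook, the line could then be added to panel i of the figure:
# fig.add_trace(go.Scatter(x=x_line, y=y_line, mode='lines'), row=1, col=i)
```

Repeating the fit inside the loop over `['TV', 'Radio', 'Newspaper']` would give one line per subplot.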

In [5]:
# ── 3.3 Correlation Heatmap (Interactive) ─────────────────────

corr = df.corr()
fig = px.imshow(corr, text_auto=True, aspect="auto",
                title='🔥 Correlation Heatmap',
                color_continuous_scale='Viridis')
fig.show()

**💡 Insight:** **TV** has the strongest correlation with Sales, followed by **Radio**. **Newspaper** has only a weak correlation with Sales.
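
The ranking in this insight can also be read directly from the correlation matrix rather than from the heatmap colors. The sketch below shows the idea on `demo`, a synthetic frame whose coefficients are arbitrary assumptions chosen so that TV dominates, Radio contributes less, and Newspaper has no effect. It is not the Advertising data, so the printed numbers will differ from the notebook's.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 500

# Synthetic stand-in for df (NOT the Advertising data): Sales is driven
# mostly by TV, partly by Radio, and not at all by Newspaper.
demo = pd.DataFrame({
    'TV': rng.uniform(0, 300, n),
    'Radio': rng.uniform(0, 50, n),
    'Newspaper': rng.uniform(0, 100, n),
})
demo['Sales'] = 5 + 0.05 * demo['TV'] + 0.2 * demo['Radio'] + rng.normal(0, 1, n)

# Pearson correlation of each feature with Sales, strongest first
ranking = demo.corr()['Sales'].drop('Sales').sort_values(ascending=False)
print(ranking.round(3))
```

In the notebook, the same one-liner applied to `df` would list the three channels in order of their correlation with `Sales`.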

In [6]:
# ── 3.4 3D Scatter Plot: TV, Radio & Sales ────────────────────

fig = px.scatter_3d(df, x='TV', y='Radio', z='Sales',
                    color='Sales', opacity=0.7,
                    title='🧊 3D View: Sales vs TV & Radio',
                    color_continuous_scale='Bluered')
fig.show()

---
## 4️⃣ Feature Engineering & Split ✂️

In [7]:
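# Create Total Ad Spend (the sum of the three channel budgets)
df['Total_Ad_Spend'] = df['TV'] + df['Radio'] + df['Newspaper']

# Select Features
# Note: Total_Ad_Spend is an exact linear combination of the other three
# columns, so it gives LinearRegression no new information; the tree-based
# models can still split on it.
X = df[['TV', 'Radio', 'Newspaper', 'Total_Ad_Spend']]
y = df['Sales']

# Split: hold out 20% of the rows for testing (fixed seed for reproducibility)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print('✅ Data Split Successfully!')

✅ Data Split Successfully!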
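
Two properties of this step are easy to verify numerically: the engineered column is linearly dependent on the other three, and a 20% hold-out on 200 rows yields 160 training and 40 test rows. The sketch below checks both on `demo`, a synthetic 200-row frame with made-up values, so it runs without the CSV. The target coefficients are arbitrary assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the 200-row frame (values are made up)
demo = pd.DataFrame({
    'TV': rng.uniform(0, 300, 200),
    'Radio': rng.uniform(0, 50, 200),
    'Newspaper': rng.uniform(0, 100, 200),
})
demo['Total_Ad_Spend'] = demo['TV'] + demo['Radio'] + demo['Newspaper']
target = 5 + 0.05 * demo['TV'] + 0.2 * demo['Radio'] + rng.normal(0, 1, 200)

# Four columns, but only three are linearly independent because the
# fourth is the sum of the other three.
n_features = demo.shape[1]
rank = int(np.linalg.matrix_rank(demo.to_numpy()))
print(f'{n_features} feature columns, matrix rank {rank}')

# Same split settings as the cell above: 20% of rows held out for testing
X_tr, X_te, y_tr, y_te = train_test_split(demo, target, test_size=0.2, random_state=42)
print(f'train rows: {len(X_tr)}, test rows: {len(X_te)}')
```

A rank below the number of columns means the linear model's individual coefficients are not uniquely determined, although its predictions are unaffected. Dropping `Total_Ad_Spend` for the linear model is one way to make the coefficients interpretable.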


---
## 5️⃣ Model Training & Evaluation 🤖

In [8]:
models = {
    'Linear Regression': LinearRegression(),
    'Random Forest': RandomForestRegressor(n_estimators=100, random_state=42),
    'Gradient Boosting': GradientBoostingRegressor(n_estimators=100, random_state=42)
}

results = {}
best_model = None
best_score = -np.inf

print('🚀 Training Models...\n')

for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    # R² (higher is better) and RMSE (lower is better) on the held-out test set
    r2 = r2_score(y_test, y_pred)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))

    results[name] = r2
    print(f'📌 {name:20} | R²: {r2:.4f} | RMSE: {rmse:.4f}')

    # Keep the model with the highest test-set R²
    if r2 > best_score:
        best_score = r2
        best_model = model

print(f'\n🏆 Best Model: {best_model.__class__.__name__} with R²: {best_score:.4f}')

🚀 Training Models...

📌 Linear Regression    | R²: 0.9059 | RMSE: 1.7052
📌 Random Forest        | R²: 0.9066 | RMSE: 1.6986
📌 Gradient Boosting    | R²: 0.9275 | RMSE: 1.4964

🏆 Best Model: GradientBoostingRegressor with R²: 0.9275


In [9]:
# ── 5.1 Interactive Model Performance Comparison ──────────────

scores_df = pd.DataFrame(list(results.items()), columns=['Model', 'R2_Score'])

fig = px.bar(scores_df, x='R2_Score', y='Model', orientation='h',
             text='R2_Score', color='R2_Score',
             title='📊 Model R² Score Comparison',
             color_continuous_scale='RdYlGn')
fig.update_traces(texttemplate='%{text:.4f}', textposition='inside')
fig.show()
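
The comparison above rests on a single 40-row test split, so small differences in R² between models may partly reflect which rows landed in that split. A common complement is k-fold cross-validation, sketched below with scikit-learn's `KFold` and `cross_val_score`. The data here (`X_demo`, `y_demo`) is synthetic with arbitrary assumed coefficients, and `candidates` and `cv_results` are illustrative names; in the notebook, `X`, `y`, and the `models` dict would take their place.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for X and y (made-up values: linear signal plus noise)
X_demo = pd.DataFrame({
    'TV': rng.uniform(0, 300, 200),
    'Radio': rng.uniform(0, 50, 200),
    'Newspaper': rng.uniform(0, 100, 200),
})
y_demo = 5 + 0.05 * X_demo['TV'] + 0.2 * X_demo['Radio'] + rng.normal(0, 1, 200)

candidates = {
    'Linear Regression': LinearRegression(),
    'Random Forest': RandomForestRegressor(n_estimators=100, random_state=42),
}

# Five shuffled folds; every row is used for validation exactly once
cv = KFold(n_splits=5, shuffle=True, random_state=42)

cv_results = {}
for name, model in candidates.items():
    # One R² value per fold
    scores = cross_val_score(model, X_demo, y_demo, cv=cv, scoring='r2')
    cv_results[name] = scores
    print(f'{name:20} | mean R²: {scores.mean():.4f} | std: {scores.std():.4f}')
```

Reporting the mean together with the standard deviation across folds gives a sense of whether one model's lead is larger than the fold-to-fold variation.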

---
## 6️⃣ Interactive Prediction Simulator 🔮

Use the sliders below to adjust the advertising budgets (in thousands of $) and see the predicted sales update in real time!

In [10]:
def predict_sales(TV, Radio, Newspaper):
    total_spend = TV + Radio + Newspaper
    input_data = pd.DataFrame([[TV, Radio, Newspaper, total_spend]],
                              columns=['TV', 'Radio', 'Newspaper', 'Total_Ad_Spend'])

    prediction = best_model.predict(input_data)[0]

    print(f'💰 Predicted Sales: {prediction:.2f} k units')
    print(f'📊 Total Investment: ${total_spend:.2f} k')

# Create Widget
interact(predict_sales,
         TV=widgets.FloatSlider(min=0, max=300, step=5, value=150, description='📺 TV Ad:'),
         Radio=widgets.FloatSlider(min=0, max=100, step=2, value=40, description='📻 Radio Ad:'),
         Newspaper=widgets.FloatSlider(min=0, max=100, step=2, value=25, description='📰 News Ad:')
);

interactive(children=(FloatSlider(value=150.0, description='📺 TV Ad:', max=300.0, step=5.0), FloatSlider(value…
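
The sliders only work in a live notebook. For scripts or quick checks, the same prediction logic can be wrapped in a plain function that returns a number instead of printing. The sketch below is one possible shape for it: `predict_sales_value`, `FEATURES`, and `demo_model` are hypothetical names, and `demo_model` is a linear model fitted on made-up, noise-free data standing in for `best_model`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

FEATURES = ['TV', 'Radio', 'Newspaper', 'Total_Ad_Spend']

def predict_sales_value(model, tv, radio, newspaper):
    """Return predicted sales (thousands of units) for one budget scenario."""
    row = pd.DataFrame([[tv, radio, newspaper, tv + radio + newspaper]],
                       columns=FEATURES)
    return float(model.predict(row)[0])

# Hypothetical stand-in for best_model: a linear model fitted on made-up,
# noise-free data where Sales = 5 + 0.05*TV + 0.2*Radio.
rng = np.random.default_rng(0)
train = pd.DataFrame({
    'TV': rng.uniform(0, 300, 200),
    'Radio': rng.uniform(0, 100, 200),
    'Newspaper': rng.uniform(0, 100, 200),
})
train['Total_Ad_Spend'] = train['TV'] + train['Radio'] + train['Newspaper']
target = 5 + 0.05 * train['TV'] + 0.2 * train['Radio']

demo_model = LinearRegression().fit(train[FEATURES], target)

# Same budgets as the slider defaults above: TV=150, Radio=40, Newspaper=25
pred = predict_sales_value(demo_model, 150, 40, 25)
print(f'Predicted sales: {pred:.2f} k units')
```

In the notebook, passing `best_model` as the first argument would reuse the trained model, and the widget callback could call this function and print its result.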

---