# Advanced iPhone Price Prediction Chatbot

**Enhanced Features:**
- ✅ GitHub real data + synthetic data combined
- ✅ ML model price predictions
- ✅ Gemini LLM price predictions
- ✅ Gemini-powered optimal price analysis
- ✅ Market & technical analysis from Gemini
- ✅ Dashboard tracking with auto-update
- ✅ Real-time predictions with timestamps

---

## 1. Install Dependencies

In [23]:
!pip install google-generativeai pandas numpy scikit-learn joblib -q
print("✅ All dependencies installed")

✅ All dependencies installed


## 2. Load Saved ML Model & Artifacts

In [24]:
import joblib
import pickle
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

print("📦 Loading ML model artifacts...")

model = joblib.load('iphone_price_prediction_model_random_forest.pkl')
scaler = joblib.load('price_prediction_scaler.pkl')

with open('price_prediction_encoders.pkl', 'rb') as f:
    encoders = pickle.load(f)
le_model = encoders['model_encoder']
le_source = encoders['source_encoder']

with open('price_prediction_features.pkl', 'rb') as f:
    feature_info = pickle.load(f)
all_features = feature_info['all_features']
numerical_features = feature_info['numerical_features']

print(f"✅ ML model loaded: {len(all_features)} features")
print(f"✅ Encoders loaded")
print(f"✅ Scaler loaded")

📦 Loading ML model artifacts...
✅ ML model loaded: 33 features
✅ Encoders loaded
✅ Scaler loaded


## 3. Clone GitHub & Load Real Data

In [25]:
import subprocess
import os
import sqlite3

print("📥 Cloning GitHub repository...")

if not os.path.exists('Real-time-competitor-strategy-tracker'):
    subprocess.run(['git', 'clone', 'https://github.com/Techierookies/Real-time-competitor-strategy-tracker.git'],
                   capture_output=True, timeout=60)
    print("✅ Repository cloned")
else:
    print("✅ Repository already exists")

# Load synthetic data first
df = pd.read_csv('enhanced_synthetic_dataset_with_timestamps.csv')
df['Scraped_At'] = pd.to_datetime(df['Scraped_At'])
print(f"✅ Loaded synthetic data: {len(df)} records")

# Try to load real data from GitHub
try:
    db_path = 'Real-time-competitor-strategy-tracker/competitor_tracker.db'
    if os.path.exists(db_path):
        conn = sqlite3.connect(db_path)
        real_df = pd.read_sql_query("""
            SELECT * FROM raw_scrapes
            WHERE model IN ('iPhone 15', 'iPhone 16', 'iPhone 17')
            LIMIT 100
        """, conn)
        conn.close()

        if len(real_df) > 0:
            real_df = real_df.rename(columns={
                'model': 'Model',
                'site': 'Source',
                'price': 'Price',
                'reviews': 'Reviews',
                'rating': 'Rating',
                'url': 'URL'
            })
            if 'Scraped_At' not in real_df.columns:
                real_df['Scraped_At'] = datetime.now()
            else:
                real_df['Scraped_At'] = pd.to_datetime(real_df['Scraped_At'])

            df = pd.concat([df, real_df], ignore_index=True)
            print(f"✅ Loaded real data: {len(real_df)} records")
            print(f"✅ Combined dataset: {len(df)} total records")
        else:
            print("⚠️ No real data found, using synthetic only")
    else:
        print(f"⚠️ Database file not found at {db_path}")
except Exception as e:
    print(f"⚠️ Error loading real data: {e}")

print(f"\n📊 Final dataset: {len(df)} records from {df['Model'].nunique()} models")
print(f"   Models: {df['Model'].unique().tolist()}")
print(f"   Sources: {df['Source'].unique().tolist()}")

📥 Cloning GitHub repository...
✅ Repository already exists
✅ Loaded synthetic data: 2500 records
✅ Loaded real data: 18 records
✅ Combined dataset: 2518 total records

📊 Final dataset: 2518 records from 3 models
   Models: ['iPhone 15', 'iPhone 16', 'iPhone 17']
   Sources: ['Amazon', 'Flipkart']


## 4. Initialize Gemini LLM

In [26]:
import google.generativeai as genai

GEMINI_API_KEY = "AIzaSyDmolAXZRas_1fFAvjIvOdeAfE1HQrIXM8"
genai.configure(api_key=GEMINI_API_KEY)

LATEST_MODEL = 'models/gemini-2.5-flash'

try:
    llm = genai.GenerativeModel(LATEST_MODEL)
    test = llm.generate_content("Say hi")
    print(f"✅ Gemini LLM ready: {LATEST_MODEL}")
except Exception as e:
    print(f"❌ LLM failed: {e}")
    llm = None

✅ Gemini LLM ready: models/gemini-2.5-flash


## 5. ML Model Prediction Function

In [27]:
def predict_iphone_price(model_name, source, rating=4.2, review_text="Good phone", target_date=None):
    if target_date is None:
        target_date = datetime.now()

    pred = pd.DataFrame({
        'Model': [model_name],
        'Source': [source],
        'Rating': [rating],
        'Reviews': [review_text],
        'Scraped_At': [target_date]
    })

    pred['Year'] = pred['Scraped_At'].dt.year
    pred['Month'] = pred['Scraped_At'].dt.month
    pred['Day'] = pred['Scraped_At'].dt.day
    pred['Hour'] = pred['Scraped_At'].dt.hour
    pred['DayOfWeek'] = pred['Scraped_At'].dt.dayofweek
    pred['DayOfYear'] = pred['Scraped_At'].dt.dayofyear
    pred['WeekOfYear'] = pred['Scraped_At'].dt.isocalendar().week
    pred['Quarter'] = pred['Scraped_At'].dt.quarter
    pred['DaysAgo'] = (df['Scraped_At'].max() - pred['Scraped_At']).dt.days

    pred['IsWeekend'] = pred['DayOfWeek'].isin([5, 6]).astype(int)
    pred['IsHolidaySeason'] = pred['Month'].isin([11, 12]).astype(int)
    pred['IsLaunchSeason'] = pred['Month'].isin([9, 10]).astype(int)
    pred['IsSummerSeason'] = pred['Month'].isin([4, 5, 6]).astype(int)

    pred['ReviewLength'] = pred['Reviews'].str.len()
    pred['ReviewWordCount'] = pred['Reviews'].str.split().str.len()
    pred['HasExclamation'] = pred['Reviews'].str.contains('!').astype(int)
    pred['HasQuestion'] = pred['Reviews'].str.contains('\\?').astype(int)

    pred['Model_Encoded'] = le_model.transform([model_name])[0]
    pred['Source_Encoded'] = le_source.transform([source])[0]

    day_name = target_date.strftime('%A')
    month_name = target_date.strftime('%B')

    try:
        pred['DayName_Encoded'] = le_day_name.transform([day_name])[0]
    except:
        pred['DayName_Encoded'] = 3

    try:
        pred['MonthName_Encoded'] = le_month_name.transform([month_name])[0]
    except:
        pred['MonthName_Encoded'] = pred['Month'].iloc[0]

    hour = pred['Hour'].iloc[0]
    if 6 <= hour < 12: tod = 'Morning'
    elif 12 <= hour < 18: tod = 'Afternoon'
    elif 18 <= hour < 22: tod = 'Evening'
    else: tod = 'Night'

    try:
        pred['TimeOfDay_Encoded'] = le_time_of_day.transform([tod])[0]
    except:
        pred['TimeOfDay_Encoded'] = 1

    pred['Model_Source_Interaction'] = pred['Model_Encoded'] * pred['Source_Encoded']
    pred['Rating_Month_Interaction'] = pred['Rating'] * pred['Month']
    pred['Rating_ReviewLength_Interaction'] = pred['Rating'] * pred['ReviewLength']

    model_data = df[df['Model'] == model_name]
    pred['Model_Price_mean'] = model_data['Price'].mean()
    pred['Model_Price_std'] = model_data['Price'].std()
    pred['Model_Price_min'] = model_data['Price'].min()
    pred['Model_Price_max'] = model_data['Price'].max()
    pred['Model_Price_median'] = model_data['Price'].median()

    pred['Price_7Day_MA'] = model_data['Price'].tail(7).mean()
    pred['Price_30Day_MA'] = model_data['Price'].tail(30).mean()

    X_pred = pred[all_features].fillna(0)
    X_pred[numerical_features] = scaler.transform(X_pred[numerical_features])

    return model.predict(X_pred)[0]

print("✅ ML prediction function ready")

✅ ML prediction function ready


## 6. Gemini Prediction & Analysis Functions

In [28]:
def get_gemini_price_prediction(model_name, source, target_date):
    """Get price prediction from Gemini LLM"""
    if not llm:
        return None

    prompt = f"""Based on market trends for iPhone {model_name.split()[-1]},
    predict the most likely market price on {source} for {target_date.strftime('%B %d, %Y')} in Indian Rupees.
    Consider competitor pricing, demand, seasonality.
    Reply with ONLY a number (e.g., 75000)"""

    try:
        result = llm.generate_content(prompt)
        import re
        match = re.search(r'\d+', result.text.replace(',', ''))
        return float(match.group()) if match else None
    except:
        return None

def get_gemini_analysis(model_name, source, ml_price, gemini_price, current_price, target_date):
    """Get market and technical analysis from Gemini"""
    if not llm:
        return "Analysis unavailable"

    prompt = f"""Analyze iPhone {model_name} pricing on {source} for {target_date.strftime('%B %d, %Y')}:
    - ML Model predicts: ₹{ml_price:,.0f}
    - Gemini analysis suggests: ₹{gemini_price:,.0f}
    - Current actual price: ₹{current_price:,.0f}

    Provide BRIEF 2-line analysis on:
    1. Market sentiment (bullish/bearish/neutral)
    2. Technical insight (overpriced/underpriced/fair)

    Format: "📊 Market: [sentiment]. 📈 Technical: [insight]"""

    try:
        result = llm.generate_content(prompt)
        return result.text[:200]
    except:
        return "Analysis unavailable"

def get_gemini_optimal_price(model_name, source, ml_price, gemini_price, current_price):
    """Let Gemini recommend optimal price based on all three signals"""
    if not llm:
        return (ml_price + gemini_price + current_price) / 3 if gemini_price else (ml_price + current_price) / 2

    prompt = f"""Given three price signals for {model_name} on {source}:
    - ML Model prediction: ₹{ml_price:,.0f}
    - Gemini market analysis: ₹{gemini_price:,.0f}
    - Current actual price: ₹{current_price:,.0f}

    Recommend ONE optimal selling price that balances competitiveness and profitability.
    Reply ONLY with a single number (INR)."""

    try:
        result = llm.generate_content(prompt)
        import re
        match = re.search(r'\d+', result.text.replace(',', ''))
        if match:
            return float(match.group())
    except:
        pass

    return (ml_price + gemini_price + current_price) / 3 if gemini_price else (ml_price + current_price) / 2

print("✅ Gemini functions ready")

✅ Gemini functions ready


## 7. Dashboard Management

In [29]:
# ============================================================================
# ENHANCED DASHBOARD MANAGEMENT - Stores ALL sources
# ============================================================================

# Initialize dashboard
dashboard = pd.DataFrame(columns=[
    'DateTime', 'Model', 'Source', 'ML_Predicted', 'Gemini_Predicted',
    'Current_Price', 'Optimal_Price', 'Analysis', 'Review'
])

# Store predictions for all sources
all_predictions = []

def update_dashboard(model_name, source, ml_price, gemini_price, current_price, optimal_price, analysis, review=""):
    """Update dashboard with latest prediction"""
    global dashboard

    # Remove old entry for this model-source combo
    dashboard = dashboard[~((dashboard['Model'] == model_name) & (dashboard['Source'] == source))]

    # Add new entry
    new_row = pd.DataFrame({
        'DateTime': [datetime.now().strftime('%Y-%m-%d %H:%M:%S')],
        'Model': [model_name],
        'Source': [source],
        'ML_Predicted': [f"₹{ml_price:,.0f}"],
        'Gemini_Predicted': [f"₹{gemini_price:,.0f}" if gemini_price else "N/A"],
        'Current_Price': [f"₹{current_price:,.0f}"],
        'Optimal_Price': [f"₹{optimal_price:,.0f}"],
        'Analysis': [analysis[:80]],
        'Review': [review[:40]]
    })

    dashboard = pd.concat([dashboard, new_row], ignore_index=True)
    return dashboard

def update_dashboard_batch(model_name, predictions_list):
    """Update dashboard with multiple source predictions at once"""
    global dashboard

    for pred in predictions_list:
        source = pred['source']
        # Remove old entry
        dashboard = dashboard[~((dashboard['Model'] == model_name) & (dashboard['Source'] == source))]

        # Add new entry
        new_row = pd.DataFrame({
            'DateTime': [datetime.now().strftime('%Y-%m-%d %H:%M:%S')],
            'Model': [model_name],
            'Source': [source],
            'ML_Predicted': [f"₹{pred['ml_pred']:,.0f}"],
            'Gemini_Predicted': [f"₹{pred['gemini_pred']:,.0f}" if pred['gemini_pred'] else "N/A"],
            'Current_Price': [f"₹{pred['current_price']:,.0f}"],
            'Optimal_Price': [f"₹{pred['optimal_price']:,.0f}"],
            'Analysis': [pred['analysis'][:80]],
            'Review': [pred['review'][:40]]
        })

        dashboard = pd.concat([dashboard, new_row], ignore_index=True)

    return dashboard

print("✅ Enhanced dashboard ready (stores all sources)")

✅ Enhanced dashboard ready (stores all sources)


## 8. Advanced Chatbot with All Features

In [30]:
def advanced_iphone_chatbot():
    global dashboard, all_predictions

    print("=" * 80)
    print("🤖 ADVANCED iPHONE PRICE PREDICTION CHATBOT")
    print("=" * 80)
    print("\n📊 Features:")
    print("  • ML Model + Gemini LLM price predictions")
    print("  • Real data from GitHub + synthetic data")
    print("  • Market & technical analysis")
    print("  • Gemini-powered optimal pricing")
    print("  • Real-time dashboard tracking (saves ALL sources)")
    print("\n Commands:")
    print("  • Just ask naturally: 'price for iPhone 16 on Amazon'")
    print("  • 'compare iPhone 15' - Compare both sources")
    print("  • 'update' - Save ALL last predictions to dashboard")
    print("  • 'dashboard' - View all tracked predictions")
    print("  • 'quit' - Exit\n")

    conversation_history = []
    all_predictions = []

    while True:
        user_input = input("You: ").strip()

        if not user_input:
            continue

        if user_input.lower() in ['quit', 'exit', 'bye']:
            print("\n👋 Thank you for using Advanced iPhone Price Predictor!")
            break

        # Handle dashboard view
        if user_input.lower() == 'dashboard':
            if len(dashboard) == 0:
                print("\nAssistant: 📊 Dashboard is empty. Make predictions first!\n")
            else:
                print("\n" + "="*80)
                print("📊 PREDICTION DASHBOARD")
                print("="*80)
                print(dashboard.to_string(index=False))
                print("="*80 + "\n")
            continue

        # Handle update command - saves ALL predictions from last batch
        if user_input.lower() == 'update':
            if len(all_predictions) > 0:
                pred_list = all_predictions
                model_name = pred_list[0]['model']

                dashboard_updated = update_dashboard_batch(model_name, pred_list)
                dashboard = dashboard_updated

                sources_saved = ', '.join([p['source'] for p in pred_list])
                print(f"\nAssistant: ✅ Dashboard updated with {len(pred_list)} predictions!")
                print(f"   Saved: {sources_saved}\n")

                # Clear predictions after saving
                all_predictions = []
            else:
                print(f"\nAssistant: ⚠️ No predictions to save. Make a prediction first!\n")
            continue

        # Parse user input more flexibly
        user_lower = user_input.lower()

        # Extract model - much more flexible
        model_name = None
        if 'iphone 17' in user_lower or 'iphone17' in user_lower or 'model 17' in user_lower or '17' in user_lower.split():
            model_name = 'iPhone 17'
        elif 'iphone 16' in user_lower or 'iphone16' in user_lower or 'model 16' in user_lower or '16' in user_lower.split():
            model_name = 'iPhone 16'
        elif 'iphone 15' in user_lower or 'iphone15' in user_lower or 'model 15' in user_lower or '15' in user_lower.split():
            model_name = 'iPhone 15'

        # Extract sources - support "both" or individual
        sources = []
        if 'both' in user_lower or ('amazon' in user_lower and 'flipkart' in user_lower):
            sources = ['Amazon', 'Flipkart']
        elif 'amazon' in user_lower:
            sources = ['Amazon']
        elif 'flipkart' in user_lower:
            sources = ['Flipkart']

        # Extract date if mentioned
        target_date = datetime.now()
        import re
        date_match = re.search(r'(\d{1,2})\s*(st|nd|rd|th)?\s*(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)', user_lower)
        if date_match:
            day = int(date_match.group(1))
            month_str = date_match.group(3)
            month_map = {'jan':1,'feb':2,'mar':3,'apr':4,'may':5,'jun':6,'jul':7,'aug':8,'sep':9,'oct':10,'nov':11,'dec':12}
            month = month_map.get(month_str, datetime.now().month)
            year = 2025
            try:
                target_date = datetime(year, month, day, 12, 0)
            except:
                target_date = datetime.now()

        # Handle compare command
        if ('compare' in user_lower or 'both' in user_lower) and model_name:
            print("\n⏳ Comparing prices across sources...\n")

            comparisons = {}
            all_predictions = []

            for src in ['Amazon', 'Flipkart']:
                ml_pred = safe_predict_iphone_price(model_name, src, target_date=target_date)
                gemini_pred = safe_get_gemini_price(model_name, src, target_date)
                current_data = df[(df['Model'] == model_name) & (df['Source'] == src)].sort_values('Scraped_At', ascending=False)
                current_price = current_data['Price'].iloc[0] if len(current_data) > 0 else ml_pred
                optimal = safe_get_optimal_price(model_name, src, ml_pred, gemini_pred, current_price)
                review = current_data['Reviews'].iloc[0][:50] if len(current_data) > 0 else ""
                analysis = safe_get_gemini_analysis(model_name, src, ml_pred, gemini_pred, current_price, target_date)

                comparisons[src] = {
                    'ml': ml_pred,
                    'gemini': gemini_pred,
                    'current': current_price,
                    'optimal': optimal
                }

                # Store for batch update
                all_predictions.append({
                    'model': model_name,
                    'source': src,
                    'ml_pred': ml_pred,
                    'gemini_pred': gemini_pred,
                    'current_price': current_price,
                    'optimal_price': optimal,
                    'analysis': analysis,
                    'review': review
                })

            response = f"\n{'='*80}\n"
            response += f"🔄 **PRICE COMPARISON - {model_name}**\n"
            response += f"📅 **Date:** {target_date.strftime('%B %d, %Y')}\n\n"

            for src, prices in comparisons.items():
                response += f"🛒 **{src}:**\n"
                response += f"   🤖 ML Predicted: ₹{prices['ml']:,.0f}\n"
                response += f"   🧠 Gemini LLM: ₹{prices['gemini']:,.0f}\n" if prices['gemini'] else ""
                response += f"   💰 Current: ₹{prices['current']:,.0f}\n"
                response += f"   ✅ Optimal: ₹{prices['optimal']:,.0f}\n\n"

            best_source = min(comparisons, key=lambda x: comparisons[x]['optimal'])
            response += f"🎯 Best deal: {best_source} at ₹{comparisons[best_source]['optimal']:,.0f}\n"
            response += f"💡 Type 'update' to save both to dashboard\n"
            response += f"{'='*80}\n"

            print(f"Assistant: {response}\n")
            continue

        # Handle prediction with analysis (if model is specified)
        if model_name:
            # If no source specified or both sources requested, show both
            if len(sources) == 0:
                sources = ['Amazon', 'Flipkart']

            if len(sources) == 2:
                # Show both sources and store ALL
                print(f"\n⏳ Analyzing prices for {model_name} on both platforms...\n")

                all_predictions = []  # Reset for new batch

                for source in sources:
                    # Get ML prediction
                    ml_pred = safe_predict_iphone_price(model_name, source, target_date=target_date)

                    # Get Gemini prediction
                    gemini_pred = safe_get_gemini_price(model_name, source, target_date)

                    # Get current price
                    current_data = df[(df['Model'] == model_name) & (df['Source'] == source)].sort_values('Scraped_At', ascending=False)
                    current_price = current_data['Price'].iloc[0] if len(current_data) > 0 else ml_pred
                    review = current_data['Reviews'].iloc[0][:50] if len(current_data) > 0 else ""

                    # Get optimal price
                    optimal_price = safe_get_optimal_price(model_name, source, ml_pred, gemini_pred, current_price)

                    # Get analysis
                    analysis = safe_get_gemini_analysis(model_name, source, ml_pred, gemini_pred, current_price, target_date)

                    # Display
                    response = f"\n{'='*80}\n"
                    response += f"🎯 **PRICE ANALYSIS - {model_name} on {source}**\n\n"
                    response += f"📅 **Timestamp:** {target_date.strftime('%Y-%m-%d %H:%M:%S')}\n\n"
                    response += f"📊 **PREDICTIONS:**\n"
                    response += f"   🤖 ML Model: ₹{ml_pred:,.0f}\n"
                    response += f"   🧠 Gemini LLM: ₹{gemini_pred:,.0f}\n" if gemini_pred else ""
                    response += f"   💰 Current Market: ₹{current_price:,.0f}\n\n"
                    response += f"✅ **OPTIMAL PRICE:** ₹{optimal_price:,.0f}\n\n"
                    response += f"📈 **ANALYSIS:** {analysis}\n"
                    response += f"{'='*80}\n"

                    print(f"Assistant: {response}\n")

                    # Store ALL predictions for batch update
                    all_predictions.append({
                        'model': model_name,
                        'source': source,
                        'ml_pred': ml_pred,
                        'gemini_pred': gemini_pred,
                        'current_price': current_price,
                        'optimal_price': optimal_price,
                        'analysis': analysis,
                        'review': review
                    })

                print("💡 Type 'update' to save BOTH predictions to dashboard\n")

            else:
                # Single source
                source = sources[0]
                print(f"\n⏳ Analyzing {model_name} price on {source}...\n")

                ml_pred = safe_predict_iphone_price(model_name, source, target_date=target_date)
                gemini_pred = safe_get_gemini_price(model_name, source, target_date)
                current_data = df[(df['Model'] == model_name) & (df['Source'] == source)].sort_values('Scraped_At', ascending=False)
                current_price = current_data['Price'].iloc[0] if len(current_data) > 0 else ml_pred
                review = current_data['Reviews'].iloc[0][:50] if len(current_data) > 0 else ""
                optimal_price = safe_get_optimal_price(model_name, source, ml_pred, gemini_pred, current_price)
                analysis = safe_get_gemini_analysis(model_name, source, ml_pred, gemini_pred, current_price, target_date)

                response = f"\n{'='*80}\n"
                response += f"🎯 **PRICE ANALYSIS - {model_name} on {source}**\n\n"
                response += f"📅 **Timestamp:** {target_date.strftime('%Y-%m-%d %H:%M:%S')}\n\n"
                response += f"📊 **PREDICTIONS:**\n"
                response += f"   🤖 ML Model: ₹{ml_pred:,.0f}\n"
                response += f"   🧠 Gemini LLM: ₹{gemini_pred:,.0f}\n" if gemini_pred else ""
                response += f"   💰 Current Market: ₹{current_price:,.0f}\n\n"
                response += f"✅ **OPTIMAL PRICE:** ₹{optimal_price:,.0f}\n\n"
                response += f"📈 **ANALYSIS:** {analysis}\n\n"
                response += f"💡 Type 'update' to save to dashboard\n"
                response += f"{'='*80}\n"

                print(f"Assistant: {response}\n")

                all_predictions = [{
                    'model': model_name,
                    'source': source,
                    'ml_pred': ml_pred,
                    'gemini_pred': gemini_pred,
                    'current_price': current_price,
                    'optimal_price': optimal_price,
                    'analysis': analysis,
                    'review': review
                }]

            conversation_history.append({"role": "User", "content": user_input})
        else:
            print("\nAssistant: I couldn't identify the iPhone model. Please mention:\n")
            print("  - iPhone 15, 16, or 17\n")
            print("Examples:")
            print("  - 'iPhone 16 on Amazon'")
            print("  - 'price for 17 on both'")
            print("  - 'compare iPhone 15'\n")

print("✅ Enhanced chatbot with batch update ready!")

✅ Enhanced chatbot with batch update ready!


In [31]:
# ============================================================================
# COMPLETE CLEANUP & ERROR HANDLING CELL
# Run this BEFORE running advanced_iphone_chatbot()
# ============================================================================

print("=" * 80)
print("🧹 COMPREHENSIVE DATA CLEANUP & VALIDATION")
print("=" * 80)

# Step 1: Clean Price column
print("\n1️⃣ Cleaning Price column...")
df['Price'] = pd.to_numeric(df['Price'], errors='coerce')
initial_rows = len(df)
df = df.dropna(subset=['Price'])
df = df[df['Price'] > 0]
print(f"   Removed {initial_rows - len(df)} invalid price records")
print(f"   Price range: ₹{df['Price'].min():,.0f} - ₹{df['Price'].max():,.0f}")

# Step 2: Ensure required columns
print("\n2️⃣ Validating columns...")
required_cols = ['Model', 'Source', 'Price', 'Rating', 'Reviews', 'Scraped_At']
for col in required_cols:
    if col not in df.columns:
        print(f"   Adding missing column: {col}")
        if col == 'Rating':
            df[col] = 4.2
        elif col == 'Reviews':
            df[col] = "Good product"
        else:
            df[col] = ""
    else:
        print(f"   ✅ {col}")

# Step 3: Convert to numeric types
print("\n3️⃣ Converting data types...")
try:
    df['Price'] = df['Price'].astype(float)
    print(f"   ✅ Price: float")
except:
    print(f"   ⚠️ Price conversion failed")

try:
    df['Rating'] = pd.to_numeric(df['Rating'], errors='coerce')
    df['Rating'] = df['Rating'].fillna(4.2)
    print(f"   ✅ Rating: float")
except:
    print(f"   ⚠️ Rating conversion failed")

try:
    df['Reviews'] = df['Reviews'].astype(str)
    print(f"   ✅ Reviews: string")
except:
    print(f"   ⚠️ Reviews conversion failed")

# Step 4: Fix timestamps
print("\n4️⃣ Fixing timestamps...")
if 'Scraped_At' in df.columns:
    df['Scraped_At'] = pd.to_datetime(df['Scraped_At'], errors='coerce')
    df = df.dropna(subset=['Scraped_At'])
    print(f"   ✅ Datetime: valid")
else:
    df['Scraped_At'] = datetime.now()
    print(f"   ✅ Datetime: added current time")

# Step 5: Final validation
print("\n5️⃣ Final validation...")
print(f"   Total records: {len(df)}")
print(f"   Models: {df['Model'].unique().tolist()}")
print(f"   Sources: {df['Source'].unique().tolist()}")
print(f"   Date range: {df['Scraped_At'].min().date()} to {df['Scraped_At'].max().date()}")

# Step 6: Create safe wrapper functions
print("\n6️⃣ Creating error-safe prediction functions...")

def safe_predict_iphone_price(model_name, source, rating=4.2, review_text="Good phone", target_date=None):
    """Predict price with error handling"""
    try:
        return predict_iphone_price(model_name, source, rating, review_text, target_date)
    except Exception as e:
        print(f"   ⚠️ Prediction error: {e}")
        model_data = df[df['Model'] == model_name]
        if len(model_data) > 0:
            return float(model_data['Price'].mean())
        return 75000.0

def safe_get_gemini_price(model_name, source, target_date):
    """Get Gemini price prediction with error handling"""
    try:
        return get_gemini_price_prediction(model_name, source, target_date)
    except Exception as e:
        return None

def safe_get_gemini_analysis(model_name, source, ml_price, gemini_price, current_price, target_date):
    """Get Gemini analysis with error handling"""
    try:
        return get_gemini_analysis(model_name, source, ml_price, gemini_price, current_price, target_date)
    except Exception as e:
        return "📊 Market: Neutral. 📈 Technical: Fair value"

def safe_get_optimal_price(model_name, source, ml_price, gemini_price, current_price):
    """Calculate optimal price with error handling"""
    try:
        return get_gemini_optimal_price(model_name, source, ml_price, gemini_price, current_price)
    except Exception as e:
        prices = [ml_price, current_price]
        if gemini_price and gemini_price > 0:
            prices.append(gemini_price)
        return float(np.mean(prices))

print(f"   ✅ safe_predict_iphone_price()")
print(f"   ✅ safe_get_gemini_price()")
print(f"   ✅ safe_get_gemini_analysis()")
print(f"   ✅ safe_get_optimal_price()")

print("\n" + "=" * 80)
print("✅ ALL CHECKS PASSED - READY TO RUN CHATBOT!")
print("=" * 80 + "\n")

🧹 COMPREHENSIVE DATA CLEANUP & VALIDATION

1️⃣ Cleaning Price column...
   Removed 18 invalid price records
   Price range: ₹53,229 - ₹205,643

2️⃣ Validating columns...
   ✅ Model
   ✅ Source
   ✅ Price
   ✅ Rating
   ✅ Reviews
   ✅ Scraped_At

3️⃣ Converting data types...
   ✅ Price: float
   ✅ Rating: float
   ✅ Reviews: string

4️⃣ Fixing timestamps...
   ✅ Datetime: valid

5️⃣ Final validation...
   Total records: 2500
   Models: ['iPhone 15', 'iPhone 16', 'iPhone 17']
   Sources: ['Amazon', 'Flipkart']
   Date range: 2025-04-19 to 2025-10-16

6️⃣ Creating error-safe prediction functions...
   ✅ safe_predict_iphone_price()
   ✅ safe_get_gemini_price()
   ✅ safe_get_gemini_analysis()
   ✅ safe_get_optimal_price()

✅ ALL CHECKS PASSED - READY TO RUN CHATBOT!



## 9. Run the Advanced Chatbot

In [32]:
advanced_iphone_chatbot()

🤖 ADVANCED iPHONE PRICE PREDICTION CHATBOT

📊 Features:
  • ML Model + Gemini LLM price predictions
  • Real data from GitHub + synthetic data
  • Market & technical analysis
  • Gemini-powered optimal pricing
  • Real-time dashboard tracking (saves ALL sources)

 Commands:
  • Just ask naturally: 'price for iPhone 16 on Amazon'
  • 'compare iPhone 15' - Compare both sources
  • 'update' - Save ALL last predictions to dashboard
  • 'dashboard' - View all tracked predictions
  • 'quit' - Exit

You: Predict price analysis iphone 17  on 15th jan 2026

⏳ Analyzing prices for iPhone 17 on both platforms...

Assistant: 
🎯 **PRICE ANALYSIS - iPhone 17 on Amazon**

📅 **Timestamp:** 2025-01-15 12:00:00

📊 **PREDICTIONS:**
   🤖 ML Model: ₹140,069
   🧠 Gemini LLM: ₹76,900
   💰 Current Market: ₹128,855

✅ **OPTIMAL PRICE:** ₹134,462

📈 **ANALYSIS:** 📊 Market: Neutral, as starkly divergent model predictions (bullish ML, highly bearish Gemini) create significant market uncertainty.
📈 Technical: Unde

## 10. Export Dashboard (Optional)

In [33]:
# Export dashboard to CSV
if len(dashboard) > 0:
    dashboard.to_csv('price_prediction_dashboard.csv', index=False)
    print(f"✅ Dashboard exported to 'price_prediction_dashboard.csv'")
else:
    print("Dashboard is empty. Make predictions first!")

✅ Dashboard exported to 'price_prediction_dashboard.csv'


In [34]:
import pandas as pd

df = pd.read_csv("price_prediction_dashboard.csv")
df.head()

Unnamed: 0,DateTime,Model,Source,ML_Predicted,Gemini_Predicted,Current_Price,Optimal_Price,Analysis,Review
0,2025-10-31 08:07:50,iPhone 17,Amazon,"₹140,069","₹76,900","₹128,855","₹134,462","📊 Market: Neutral, as starkly divergent model ...",Good user interface but camera in low li
1,2025-10-31 08:07:50,iPhone 17,Flipkart,"₹134,907","₹94,900","₹102,157","₹108,530",📊 Market: Neutral. Conflicting strong bullish ...,Great design and good for gaming.
