# Step 1: Data Cleaning and SQL-Like Analysis

**Objective:** Load and prepare datasets for analysis using SQL-like queries, keeping datasets separate to preserve all data detail.

**Strategy:**
1. Load all datasets separately (no up-front merge; per-model summaries are joined to the basic table only on demand)
2. Standardize column names for consistency
3. Create SQL-like query functions for flexible analysis
4. Demonstrate different types of analyses without data loss
5. Keep all original data intact for future use

**Benefits:**
- ✅ Preserves all data detail (no aggregation loss)
- ✅ Flexible analysis with SQL-like queries
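- ✅ Avoids the row multiplication and memory growth that a full merge of all tables would cause
- ✅ Easy to add new analysis types
- ✅ Keeps memory use close to the sum of the individual tables as datasets grow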
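
The row-multiplication point can be illustrated with a minimal sketch. The two frames below are toy stand-ins for the trim and price tables (the `Trim` labels are invented); joining them on `Genmodel_ID` alone pairs every trim row with every price row of the same model.

```python
import pandas as pd

# Toy stand-ins for the real tables; the Trim labels are invented.
trim = pd.DataFrame({
    "Genmodel_ID": ["2_1", "2_1", "2_1"],
    "Trim": ["A", "B", "C"],
})
price = pd.DataFrame({
    "Genmodel_ID": ["2_1", "2_1"],
    "Year": [2016, 2017],
    "Entry_price": [29365, 26665],
})

# Joining on Genmodel_ID alone is many-to-many:
# each of the 3 trim rows matches both price rows.
merged = trim.merge(price, on="Genmodel_ID", how="left")
print(len(trim), len(price), len(merged))  # 3 2 6
```

Keeping the tables apart and joining only aggregated, one-row-per-model summaries avoids this growth.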


## 1. Setup and Imports


In [25]:
# --- Imports ---
import pandas as pd
import numpy as np
import os
import warnings
from typing import Dict, List, Optional, Any
import matplotlib.pyplot as plt
import seaborn as sns

warnings.filterwarnings('ignore')

# --- Data Path Configuration ---
DATA_PATH = '../data/'

# --- Set display options ---
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', 50)

print("✅ Imports and setup completed")
print("📊 Display options configured for better data viewing")


✅ Imports and setup completed
📊 Display options configured for better data viewing


## 2. Load Datasets Separately


In [26]:
# --- Load All Datasets Separately ---
print("📂 LOADING DATASETS SEPARATELY")
print("="*50)

# Dictionary to store all datasets
datasets = {}

# Define files to load
files_to_load = {
    'basic': 'Basic_table.csv',
    'trim': 'Trim_table.csv', 
    'price': 'Price_table.csv',
    'sales': 'Sales_table.csv'
}

print("Loading datasets...")
for name, filename in files_to_load.items():
    file_path = os.path.join(DATA_PATH, filename)
    try:
        datasets[name] = pd.read_csv(file_path)
        print(f"✅ {name.upper()}: {filename} - Shape: {datasets[name].shape}")
        print(f"   Memory: {datasets[name].memory_usage(deep=True).sum() / 1024**2:.2f} MB")
    except FileNotFoundError:
        print(f"❌ ERROR: {filename} not found at {file_path}")

print(f"\n📊 SUMMARY:")
print(f"   Total datasets loaded: {len(datasets)}")
print(f"   Total memory usage: {sum(df.memory_usage(deep=True).sum() for df in datasets.values()) / 1024**2:.2f} MB")

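# Report success only if every file was actually loaded
if len(datasets) == len(files_to_load):
    print("\n✅ All datasets loaded successfully!")
else:
    missing = sorted(set(files_to_load) - set(datasets))
    print(f"\n⚠️ Datasets not loaded: {missing}")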


📂 LOADING DATASETS SEPARATELY
Loading datasets...
✅ BASIC: Basic_table.csv - Shape: (1011, 4)
   Memory: 0.19 MB
✅ TRIM: Trim_table.csv - Shape: (335562, 9)
   Memory: 120.94 MB
✅ PRICE: Price_table.csv - Shape: (6333, 5)
   Memory: 1.23 MB
✅ SALES: Sales_table.csv - Shape: (773, 23)
   Memory: 0.26 MB

📊 SUMMARY:
   Total datasets loaded: 4
   Total memory usage: 122.63 MB

✅ All datasets loaded successfully!
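
The trim table accounts for 120.94 MB of the 122.63 MB loaded, and much of that is probably repeated strings such as automaker or fuel-type labels. The following is a minimal sketch, not part of the original notebook, of how such columns could be stored as pandas `category` dtype. The helper name `shrink_repeated_strings` and the `max_unique_ratio` threshold are assumptions chosen for illustration, and the toy frame is invented.

```python
import pandas as pd

def shrink_repeated_strings(df: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Return a copy in which low-cardinality string columns use 'category' dtype."""
    out = df.copy()
    for col in out.columns:
        is_text = pd.api.types.is_string_dtype(out[col])
        few_values = out[col].nunique(dropna=False) <= max_unique_ratio * len(out)
        if is_text and few_values:
            out[col] = out[col].astype("category")
    return out

# Toy data: two fuel labels repeated many times, plus a numeric column.
toy = pd.DataFrame({
    "Fuel_type": ["Petrol", "Diesel"] * 5000,
    "Price": list(range(10000)),
})
small = shrink_repeated_strings(toy)

before = toy.memory_usage(deep=True).sum()
after = small.memory_usage(deep=True).sum()
print(f"before: {before / 1024:.1f} KB, after: {after / 1024:.1f} KB")
```

The original frame is left untouched, which matches the notebook's goal of keeping the source data intact.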


## 3. Column Name Standardization


In [27]:
# --- Standardize Column Names ---
print("🔧 STANDARDIZING COLUMN NAMES")
print("="*50)

# Track changes
column_changes = {}

for name, df in datasets.items():
    # Standardize 'Maker' to 'Automaker'
    if 'Maker' in df.columns:
        df_renamed = df.rename(columns={'Maker': 'Automaker'})
        datasets[name] = df_renamed
        column_changes[name] = 'Maker → Automaker'
        print(f"✅ {name.upper()}: Renamed 'Maker' to 'Automaker'")
    else:
        column_changes[name] = 'No changes needed'
        print(f"✅ {name.upper()}: No column name changes needed")

print(f"\n📋 COLUMN STANDARDIZATION SUMMARY:")
for name, change in column_changes.items():
    print(f"  {name.upper()}: {change}")

print("\n📊 FINAL COLUMNS BY DATASET:")
for name, df in datasets.items():
    print(f"  {name.upper()}: {list(df.columns)}")

print("\n✅ Column standardization completed")


🔧 STANDARDIZING COLUMN NAMES
✅ BASIC: No column name changes needed
✅ TRIM: Renamed 'Maker' to 'Automaker'
✅ PRICE: Renamed 'Maker' to 'Automaker'
✅ SALES: Renamed 'Maker' to 'Automaker'

📋 COLUMN STANDARDIZATION SUMMARY:
  BASIC: No changes needed
  TRIM: Maker → Automaker
  PRICE: Maker → Automaker
  SALES: Maker → Automaker

📊 FINAL COLUMNS BY DATASET:
  BASIC: ['Automaker', 'Automaker_ID', 'Genmodel', 'Genmodel_ID']
  TRIM: ['Genmodel_ID', 'Automaker', 'Genmodel', 'Trim', 'Year', 'Price', 'Gas_emission', 'Fuel_type', 'Engine_size']
  PRICE: ['Automaker', 'Genmodel', 'Genmodel_ID', 'Year', 'Entry_price']
  SALES: ['Automaker', 'Genmodel', 'Genmodel_ID', '2020', '2019', '2018', '2017', '2016', '2015', '2014', '2013', '2012', '2011', '2010', '2009', '2008', '2007', '2006', '2005', '2004', '2003', '2002', '2001']

✅ Column standardization completed
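
If more column variants turn up later, the rename step could be driven by an alias table. This is a sketch under that assumption: `COLUMN_ALIASES`, `standardize_columns`, and `tables_missing_key` are hypothetical names, and only the `Maker` to `Automaker` mapping comes from the notebook.

```python
import pandas as pd
from typing import Dict, List

# Hypothetical alias table: variant column name -> canonical name.
COLUMN_ALIASES: Dict[str, str] = {"Maker": "Automaker"}

def standardize_columns(tables: Dict[str, pd.DataFrame],
                        aliases: Dict[str, str] = COLUMN_ALIASES) -> Dict[str, pd.DataFrame]:
    """Return renamed copies; aliases that do not occur in a table are ignored."""
    return {name: df.rename(columns=aliases) for name, df in tables.items()}

def tables_missing_key(tables: Dict[str, pd.DataFrame], key: str = "Genmodel_ID") -> List[str]:
    """List the tables that lack the shared key column."""
    return [name for name, df in tables.items() if key not in df.columns]

# Toy tables with the same column layout as the price and basic tables.
toy_tables = {
    "price": pd.DataFrame({"Maker": ["Abarth"], "Genmodel_ID": ["2_1"], "Entry_price": [29365]}),
    "basic": pd.DataFrame({"Automaker": ["Abarth"], "Genmodel_ID": ["2_1"]}),
}
clean = standardize_columns(toy_tables)
print({name: list(df.columns) for name, df in clean.items()})
print("Tables without Genmodel_ID:", tables_missing_key(clean))
```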


## 4. SQL-Like Query Functions


In [28]:
# --- SQL-Like Query Functions ---
print("🔍 CREATING SQL-LIKE QUERY FUNCTIONS")
print("="*50)

class CarDataAnalyzer:
    """
    A class to perform SQL-like queries on car datasets without merging them.
    Keeps all data separate and provides flexible analysis methods.
    """
    
    def __init__(self, datasets: Dict[str, pd.DataFrame]):
        self.datasets = datasets
        self.basic = datasets['basic']
        self.trim = datasets['trim']
        self.price = datasets['price']
        self.sales = datasets['sales']
        
    def get_basic_info(self) -> Dict[str, Any]:
        """Get basic information about all datasets"""
        info = {}
        for name, df in self.datasets.items():
            info[name] = {
                'shape': df.shape,
                'memory_mb': df.memory_usage(deep=True).sum() / 1024**2,
                'columns': list(df.columns),
                'unique_genmodel_ids': df['Genmodel_ID'].nunique() if 'Genmodel_ID' in df.columns else 0
            }
        return info
    
    def query_models_by_automaker(self, automaker: str) -> pd.DataFrame:
        """Get all models for a specific automaker"""
        return self.basic[self.basic['Automaker'] == automaker]
    
    def query_trim_details(self, genmodel_id: str) -> pd.DataFrame:
        """Get all trim details for a specific model"""
        return self.trim[self.trim['Genmodel_ID'] == genmodel_id]
    
    def query_price_history(self, genmodel_id: str) -> pd.DataFrame:
        """Get price history for a specific model"""
        return self.price[self.price['Genmodel_ID'] == genmodel_id]
    
    def query_sales_data(self, genmodel_id: str) -> pd.DataFrame:
        """Get sales data for a specific model"""
        return self.sales[self.sales['Genmodel_ID'] == genmodel_id]
    
    def get_price_range_by_model(self) -> pd.DataFrame:
        """Get price range (min, max, avg) for each model"""
        price_stats = self.price.groupby('Genmodel_ID')['Entry_price'].agg(['min', 'max', 'mean', 'count']).reset_index()
        price_stats.columns = ['Genmodel_ID', 'price_min', 'price_max', 'price_mean', 'price_entries']
        
        # Join with basic info
        result = self.basic.merge(price_stats, on='Genmodel_ID', how='left')
        return result
    
    def get_trim_summary_by_model(self) -> pd.DataFrame:
        """Get trim summary statistics for each model"""
        trim_stats = self.trim.groupby('Genmodel_ID').agg({
            'Price': ['min', 'max', 'mean', 'count'],
            'Year': ['min', 'max'],
            'Fuel_type': lambda x: x.mode().iloc[0] if not x.mode().empty else None,
            'Trim': 'count'
        }).reset_index()
        
        # Flatten column names
        trim_stats.columns = ['Genmodel_ID', 'trim_price_min', 'trim_price_max', 'trim_price_mean', 
                             'trim_price_count', 'year_min', 'year_max', 'most_common_fuel', 'trim_count']
        
        # Join with basic info
        result = self.basic.merge(trim_stats, on='Genmodel_ID', how='left')
        return result
    
    def get_sales_summary(self) -> pd.DataFrame:
        """Get sales summary by model"""
        # Reshape sales data from wide to long
        year_columns = [col for col in self.sales.columns if col.isdigit()]
        sales_long = pd.melt(
            self.sales,
            id_vars=['Automaker', 'Genmodel', 'Genmodel_ID'],
            value_vars=year_columns,
            var_name='Year',
            value_name='Sales_Volume'
        )
        
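        # Calculate summary statistics.
        # Caveat: Genmodel_ID is not unique in the sales table (773 rows, 734 IDs),
        # so mean/max/count run over row-year entries rather than calendar years.
        # 'count' also includes zero-sales years, so 'years_with_data' can exceed
        # the 20 year columns (e.g. 40 for a model that appears in two rows).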
        sales_stats = sales_long.groupby('Genmodel_ID')['Sales_Volume'].agg(['sum', 'mean', 'max', 'count']).reset_index()
        sales_stats.columns = ['Genmodel_ID', 'total_sales', 'avg_sales', 'max_sales', 'years_with_data']
        
        # Join with basic info
        result = self.basic.merge(sales_stats, on='Genmodel_ID', how='left')
        return result
    
    def get_comprehensive_model_info(self, genmodel_id: str) -> Dict[str, pd.DataFrame]:
        """Get comprehensive information for a specific model"""
        return {
            'basic_info': self.basic[self.basic['Genmodel_ID'] == genmodel_id],
            'trim_details': self.query_trim_details(genmodel_id),
            'price_history': self.query_price_history(genmodel_id),
            'sales_data': self.query_sales_data(genmodel_id)
        }

# Initialize the analyzer
analyzer = CarDataAnalyzer(datasets)

print("✅ SQL-like query functions created")
print("📊 Available methods:")
print("  - get_basic_info()")
print("  - query_models_by_automaker(automaker)")
print("  - query_trim_details(genmodel_id)")
print("  - query_price_history(genmodel_id)")
print("  - query_sales_data(genmodel_id)")
print("  - get_price_range_by_model()")
print("  - get_trim_summary_by_model()")
print("  - get_sales_summary()")
print("  - get_comprehensive_model_info(genmodel_id)")


🔍 CREATING SQL-LIKE QUERY FUNCTIONS
✅ SQL-like query functions created
📊 Available methods:
  - get_basic_info()
  - query_models_by_automaker(automaker)
  - query_trim_details(genmodel_id)
  - query_price_history(genmodel_id)
  - query_sales_data(genmodel_id)
  - get_price_range_by_model()
  - get_trim_summary_by_model()
  - get_sales_summary()
  - get_comprehensive_model_info(genmodel_id)
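
Because the sales table has 773 rows but only 734 distinct `Genmodel_ID` values, some models span several rows. One possible alternative to `get_sales_summary()`, sketched here as an assumption rather than the notebook's method, first sums duplicate rows per model and only then computes per-year statistics. The function and output column names are hypothetical; the sample values are the 2016 to 2020 figures of the two `2_1` rows in the sales table.

```python
import pandas as pd

def sales_summary_per_model(sales: pd.DataFrame) -> pd.DataFrame:
    """Summarise sales per model after collapsing duplicate Genmodel_ID rows."""
    year_cols = [c for c in sales.columns if str(c).isdigit()]
    # One row per model, one column per year (duplicate rows are summed).
    yearly = sales.groupby("Genmodel_ID")[year_cols].sum()
    summary = pd.DataFrame({
        "total_sales": yearly.sum(axis=1),
        "avg_sales_per_year": yearly.mean(axis=1),
        "max_sales_in_a_year": yearly.max(axis=1),
        "years_with_sales": (yearly > 0).sum(axis=1),
    })
    return summary.reset_index()

# The two sales rows that share Genmodel_ID 2_1, restricted to 2016-2020.
sample = pd.DataFrame({
    "Genmodel": ["ABARTH 124", "ABARTH SPIDER"],
    "Genmodel_ID": ["2_1", "2_1"],
    "2020": [0, 0],
    "2019": [19, 223],
    "2018": [27, 777],
    "2017": [60, 409],
    "2016": [0, 176],
})
summary = sales_summary_per_model(sample)
print(summary)
```

With this approach the total stays at 1691, while the yearly maximum becomes the combined 2018 figure (804) instead of the largest single-row value.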


## 5. Data Quality Assessment


In [29]:
# --- Data Quality Assessment ---
print("🔍 DATA QUALITY ASSESSMENT")
print("="*60)

# Get basic info about all datasets
basic_info = analyzer.get_basic_info()

print("📊 DATASET OVERVIEW:")
print("-" * 40)
for name, info in basic_info.items():
    print(f"\n{name.upper()}:")
    print(f"  Shape: {info['shape']}")
    print(f"  Memory: {info['memory_mb']:.2f} MB")
    print(f"  Unique Genmodel_IDs: {info['unique_genmodel_ids']}")
    print(f"  Columns: {len(info['columns'])}")

# Check data coverage
print(f"\n📈 DATA COVERAGE ANALYSIS:")
print("-" * 40)

basic_ids = set(datasets['basic']['Genmodel_ID'].unique())
trim_ids = set(datasets['trim']['Genmodel_ID'].unique())
price_ids = set(datasets['price']['Genmodel_ID'].unique())
sales_ids = set(datasets['sales']['Genmodel_ID'].unique())

print(f"Basic table unique Genmodel_IDs: {len(basic_ids)}")
print(f"Trim table unique Genmodel_IDs: {len(trim_ids)}")
print(f"Price table unique Genmodel_IDs: {len(price_ids)}")
print(f"Sales table unique Genmodel_IDs: {len(sales_ids)}")

print(f"\nCoverage percentages:")
trim_coverage = len(basic_ids.intersection(trim_ids)) / len(basic_ids) * 100
price_coverage = len(basic_ids.intersection(price_ids)) / len(basic_ids) * 100
sales_coverage = len(basic_ids.intersection(sales_ids)) / len(basic_ids) * 100

print(f"  Trim coverage: {trim_coverage:.1f}% ({len(basic_ids.intersection(trim_ids))}/{len(basic_ids)})")
print(f"  Price coverage: {price_coverage:.1f}% ({len(basic_ids.intersection(price_ids))}/{len(basic_ids)})")
print(f"  Sales coverage: {sales_coverage:.1f}% ({len(basic_ids.intersection(sales_ids))}/{len(basic_ids)})")

# Check for missing data
print(f"\n❌ MISSING DATA:")
print("-" * 40)
missing_trim = basic_ids - trim_ids
missing_price = basic_ids - price_ids
missing_sales = basic_ids - sales_ids

print(f"  Models missing Trim data: {len(missing_trim)}")
print(f"  Models missing Price data: {len(missing_price)}")
print(f"  Models missing Sales data: {len(missing_sales)}")

if len(missing_trim) > 0:
    print(f"  Sample missing Trim: {list(missing_trim)[:5]}")

print("\n✅ Data quality assessment completed")


🔍 DATA QUALITY ASSESSMENT
📊 DATASET OVERVIEW:
----------------------------------------

BASIC:
  Shape: (1011, 4)
  Memory: 0.19 MB
  Unique Genmodel_IDs: 1011
  Columns: 4

TRIM:
  Shape: (335562, 9)
  Memory: 120.94 MB
  Unique Genmodel_IDs: 647
  Columns: 9

PRICE:
  Shape: (6333, 5)
  Memory: 1.23 MB
  Unique Genmodel_IDs: 647
  Columns: 5

SALES:
  Shape: (773, 23)
  Memory: 0.26 MB
  Unique Genmodel_IDs: 734
  Columns: 23

📈 DATA COVERAGE ANALYSIS:
----------------------------------------
Basic table unique Genmodel_IDs: 1011
Trim table unique Genmodel_IDs: 647
Price table unique Genmodel_IDs: 647
Sales table unique Genmodel_IDs: 734

Coverage percentages:
  Trim coverage: 64.0% (647/1011)
  Price coverage: 64.0% (647/1011)
  Sales coverage: 72.6% (734/1011)

❌ MISSING DATA:
----------------------------------------
  Models missing Trim data: 364
  Models missing Price data: 364
  Models missing Sales data: 277
  Sample missing Trim: ['75_4', '8_50', '51_1', '26_4', '7_24']

✅ Data quality assessment completed
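
The set arithmetic above only looks for basic-table IDs that are missing elsewhere. A complementary check, sketched here with a hypothetical helper `coverage_report` and invented toy IDs, uses the `indicator=True` option of `DataFrame.merge` to count IDs found in both tables, only in the basic table, or only in the other table.

```python
import pandas as pd
from typing import Dict

def coverage_report(basic: pd.DataFrame, other: pd.DataFrame,
                    key: str = "Genmodel_ID") -> Dict[str, int]:
    """Count keys present in both tables, only in `basic`, or only in `other`."""
    left = basic[[key]].drop_duplicates()
    right = other[[key]].drop_duplicates()
    flagged = left.merge(right, on=key, how="outer", indicator=True)
    # '_merge' takes the values 'both', 'left_only', 'right_only'.
    return flagged["_merge"].astype(str).value_counts().to_dict()

# Toy IDs: '9_9' exists only in the second table.
toy_basic = pd.DataFrame({"Genmodel_ID": ["1_1", "2_1", "2_2"]})
toy_trim = pd.DataFrame({"Genmodel_ID": ["2_1", "2_1", "9_9"]})

report = coverage_report(toy_basic, toy_trim)
print(report)
```

A non-zero `right_only` count would point to IDs that have detail rows but no entry in the basic table.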

## 6. Demo: SQL-Like Queries in Action


In [30]:
# --- Demo: SQL-Like Queries in Action ---
print("🚀 DEMO: SQL-LIKE QUERIES IN ACTION")
print("="*60)

# Example 1: Get all BMW models
print("\n1️⃣ EXAMPLE: Get all BMW models")
print("-" * 40)
bmw_models = analyzer.query_models_by_automaker('BMW')
print(f"Found {len(bmw_models)} BMW models:")
display(bmw_models.head())

# Example 2: Get price range for all models
print("\n2️⃣ EXAMPLE: Price range by model")
print("-" * 40)
price_ranges = analyzer.get_price_range_by_model()
print(f"Price ranges for {len(price_ranges)} models:")
display(price_ranges.head())

# Example 3: Get trim summary
print("\n3️⃣ EXAMPLE: Trim summary by model")
print("-" * 40)
trim_summary = analyzer.get_trim_summary_by_model()
print(f"Trim summary for {len(trim_summary)} models:")
display(trim_summary.head())

# Example 4: Get sales summary
print("\n4️⃣ EXAMPLE: Sales summary by model")
print("-" * 40)
sales_summary = analyzer.get_sales_summary()
print(f"Sales summary for {len(sales_summary)} models:")
display(sales_summary.head())

# Example 5: Get comprehensive info for a specific model
print("\n5️⃣ EXAMPLE: Comprehensive info for Abarth 124 Spider")
print("-" * 40)
model_info = analyzer.get_comprehensive_model_info('2_1')
print("Model ID: 2_1 (Abarth 124 Spider)")
for key, df in model_info.items():
    print(f"\n{key.upper()}:")
    if len(df) > 0:
        display(df.head())
    else:
        print("  No data available")

print("\n✅ Demo completed successfully!")


🚀 DEMO: SQL-LIKE QUERIES IN ACTION

1️⃣ EXAMPLE: Get all BMW models
----------------------------------------
Found 50 BMW models:


|  | Automaker | Automaker_ID | Genmodel | Genmodel_ID |
|---|---|---|---|---|
| 97 | BMW | 8 | 1 Series | 8_1 |
| 98 | BMW | 8 | 2 Series | 8_2 |
| 99 | BMW | 8 | 2 Series Active Tourer | 8_3 |
| 100 | BMW | 8 | 2 Series Gran Tourer | 8_4 |
| 101 | BMW | 8 | 3 Series | 8_5 |



2️⃣ EXAMPLE: Price range by model
----------------------------------------
Price ranges for 1011 models:


|  | Automaker | Automaker_ID | Genmodel | Genmodel_ID | price_min | price_max | price_mean | price_entries |
|---|---|---|---|---|---|---|---|---|
| 0 | AC | 1 | Cobra | 1_1 |  |  |  |  |
| 1 | Abarth | 2 | 124 Spider | 2_1 | 26665.0 | 29515.0 | 28052.5 | 4.0 |
| 2 | Abarth | 2 | 500 | 2_2 | 13400.0 | 14325.0 | 13955.0 | 8.0 |
| 3 | Abarth | 2 | 500C | 2_3 | 15775.0 | 17290.0 | 16394.571429 | 7.0 |
| 4 | Abarth | 2 | 595 | 2_4 | 14425.0 | 17675.0 | 15447.142857 | 7.0 |



3️⃣ EXAMPLE: Trim summary by model
----------------------------------------
Trim summary for 1011 models:


|  | Automaker | Automaker_ID | Genmodel | Genmodel_ID | trim_price_min | trim_price_max | trim_price_mean | trim_price_count | year_min | year_max | most_common_fuel | trim_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AC | 1 | Cobra | 1_1 |  |  |  |  |  |  |  |  |
| 1 | Abarth | 2 | 124 Spider | 2_1 | 26665.0 | 35365.0 | 30524.090909 | 11.0 | 2016.0 | 2019.0 | Petrol | 11.0 |
| 2 | Abarth | 2 | 500 | 2_2 | 13400.0 | 15625.0 | 14542.578947 | 19.0 | 2009.0 | 2016.0 | Petrol | 19.0 |
| 3 | Abarth | 2 | 500C | 2_3 | 15775.0 | 17658.0 | 16876.473684 | 19.0 | 2010.0 | 2016.0 | Petrol | 19.0 |
| 4 | Abarth | 2 | 595 | 2_4 | 14425.0 | 23805.0 | 19294.044586 | 157.0 | 2012.0 | 2018.0 | Petrol | 157.0 |



4️⃣ EXAMPLE: Sales summary by model
----------------------------------------
Sales summary for 1011 models:


|  | Automaker | Automaker_ID | Genmodel | Genmodel_ID | total_sales | avg_sales | max_sales | years_with_data |
|---|---|---|---|---|---|---|---|---|
| 0 | AC | 1 | Cobra | 1_1 |  |  |  |  |
| 1 | Abarth | 2 | 124 Spider | 2_1 | 1691.0 | 42.275 | 777.0 | 40.0 |
| 2 | Abarth | 2 | 500 | 2_2 | 5419.0 | 270.95 | 915.0 | 20.0 |
| 3 | Abarth | 2 | 500C | 2_3 |  |  |  |  |
| 4 | Abarth | 2 | 595 | 2_4 | 18128.0 | 906.4 | 3907.0 | 20.0 |



5️⃣ EXAMPLE: Comprehensive info for Abarth 124 Spider
----------------------------------------
Model ID: 2_1 (Abarth 124 Spider)

BASIC_INFO:


|  | Automaker | Automaker_ID | Genmodel | Genmodel_ID |
|---|---|---|---|---|
| 1 | Abarth | 2 | 124 Spider | 2_1 |



TRIM_DETAILS:


|  | Genmodel_ID | Automaker | Genmodel | Trim | Year | Price | Gas_emission | Fuel_type | Engine_size |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2_1 | Abarth | 124 spider | 124 Spider1.4 Turbo MultiAir 170hp 2d | 2016 | 29365 | 148 | Petrol | 1368 |
| 1 | 2_1 | Abarth | 124 spider | 124 Spider1.4 Turbo MultiAir 170hp Sequenziale... | 2016 | 31365 | 153 | Petrol | 1368 |
| 2 | 2_1 | Abarth | 124 spider | 124 Spider1.4 Turbo MultiAir 170hp 2d | 2017 | 29365 | 148 | Petrol | 1368 |
| 3 | 2_1 | Abarth | 124 spider | 124 Spider1.4 Turbo MultiAir 170hp Sequenziale... | 2017 | 31365 | 153 | Petrol | 1368 |
| 4 | 2_1 | Abarth | 124 spider | 124 SpiderScorpione 1.4 Turbo MultiAir 170hp 2d | 2017 | 26665 | 148 | Petrol | 1368 |



PRICE_HISTORY:


|  | Automaker | Genmodel | Genmodel_ID | Year | Entry_price |
|---|---|---|---|---|---|
| 0 | Abarth | 124 Spider | 2_1 | 2016 | 29365 |
| 1 | Abarth | 124 Spider | 2_1 | 2017 | 26665 |
| 2 | Abarth | 124 Spider | 2_1 | 2018 | 26665 |
| 3 | Abarth | 124 Spider | 2_1 | 2019 | 29515 |


SALES_DATA:


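|  | Automaker | Genmodel | Genmodel_ID | 2020 | 2019 | 2018 | 2017 | 2016 | 2015 | 2014 | 2013 | 2012 | 2011 | 2010 | 2009 | 2008 | 2007 | 2006 | 2005 | 2004 | 2003 | 2002 | 2001 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ABARTH | ABARTH 124 | 2_1 | 0 | 19 | 27 | 60 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | ABARTH | ABARTH SPIDER | 2_1 | 0 | 223 | 777 | 409 | 176 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |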



✅ Demo completed successfully!
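
For readers who think in SQL, the query behind `get_price_range_by_model()` can be written out explicitly. The sketch below is an illustration, not part of the original notebook: it loads two small frames (values taken from the outputs above) into an in-memory SQLite database via the standard-library `sqlite3` module and runs the equivalent `LEFT JOIN ... GROUP BY`.

```python
import sqlite3
import pandas as pd

# Small excerpts of the basic and price tables shown above.
basic = pd.DataFrame({
    "Automaker": ["AC", "Abarth"],
    "Genmodel": ["Cobra", "124 Spider"],
    "Genmodel_ID": ["1_1", "2_1"],
})
price = pd.DataFrame({
    "Genmodel_ID": ["2_1", "2_1", "2_1", "2_1"],
    "Year": [2016, 2017, 2018, 2019],
    "Entry_price": [29365, 26665, 26665, 29515],
})

con = sqlite3.connect(":memory:")
basic.to_sql("basic", con, index=False)
price.to_sql("price", con, index=False)

sql = """
SELECT b.Automaker, b.Genmodel, b.Genmodel_ID,
       MIN(p.Entry_price)   AS price_min,
       MAX(p.Entry_price)   AS price_max,
       AVG(p.Entry_price)   AS price_mean,
       COUNT(p.Entry_price) AS price_entries
FROM basic AS b
LEFT JOIN price AS p ON p.Genmodel_ID = b.Genmodel_ID
GROUP BY b.Automaker, b.Genmodel, b.Genmodel_ID
ORDER BY b.Genmodel_ID
"""
result = pd.read_sql_query(sql, con)
con.close()
print(result)
```

One difference from the pandas version: for a model without price rows, SQL `COUNT` returns 0, whereas the pandas left merge leaves `price_entries` empty (NaN).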


## 7. Advanced Query Examples


In [31]:
# --- Advanced Query Examples ---
print("🔬 ADVANCED QUERY EXAMPLES")
print("="*60)

# Example 1: Top 10 most expensive models
print("\n1️⃣ TOP 10 MOST EXPENSIVE MODELS")
print("-" * 40)
price_ranges = analyzer.get_price_range_by_model()
top_expensive = price_ranges.nlargest(10, 'price_max')
print("Top 10 most expensive models (by max price):")
display(top_expensive[['Automaker', 'Genmodel', 'price_min', 'price_max', 'price_mean']])

# Example 2: Models with most trim variations
print("\n2️⃣ MODELS WITH MOST TRIM VARIATIONS")
print("-" * 40)
trim_summary = analyzer.get_trim_summary_by_model()
most_trims = trim_summary.nlargest(10, 'trim_count')
print("Top 10 models with most trim variations:")
display(most_trims[['Automaker', 'Genmodel', 'trim_count', 'trim_price_min', 'trim_price_max']])

# Example 3: Best selling models
print("\n3️⃣ BEST SELLING MODELS")
print("-" * 40)
sales_summary = analyzer.get_sales_summary()
best_selling = sales_summary.nlargest(10, 'total_sales')
print("Top 10 best selling models:")
display(best_selling[['Automaker', 'Genmodel', 'total_sales', 'avg_sales', 'max_sales']])

# Example 4: Fuel type analysis
print("\n4️⃣ FUEL TYPE ANALYSIS")
print("-" * 40)
fuel_analysis = trim_summary.groupby('most_common_fuel').agg({
    'Genmodel_ID': 'count',
    'trim_price_mean': 'mean'
}).reset_index()
fuel_analysis.columns = ['fuel_type', 'model_count', 'avg_price']
fuel_analysis = fuel_analysis.sort_values('model_count', ascending=False)
print("Fuel type distribution:")
display(fuel_analysis)

# Example 5: Year range analysis
print("\n5️⃣ YEAR RANGE ANALYSIS")
print("-" * 40)
year_analysis = trim_summary.groupby(['year_min', 'year_max']).size().reset_index(name='model_count')
year_analysis = year_analysis.sort_values('model_count', ascending=False)
print("Year range distribution:")
display(year_analysis.head(10))

print("\n✅ Advanced query examples completed!")


🔬 ADVANCED QUERY EXAMPLES

1️⃣ TOP 10 MOST EXPENSIVE MODELS
----------------------------------------
Top 10 most expensive models (by max price):


|  | Automaker | Genmodel | price_min | price_max | price_mean |
|---|---|---|---|---|---|
| 786 | Rolls-Royce | Phantom | 252038.0 | 320120.0 | 282802.25 |
| 546 | Maybach | 62 | 281200.0 | 302725.0 | 291846.2 |
| 784 | Rolls-Royce | Dawn | 264000.0 | 275240.0 | 266810.0 |
| 545 | Maybach | 57 | 243600.0 | 266707.0 | 254985.8 |
| 466 | Lamborghini | Aventador | 256020.0 | 262860.0 | 260295.0 |
| 271 | Ferrari | 812 Superfast | 260908.0 | 260908.0 | 260908.0 |
| 788 | Rolls-Royce | Wraith | 228800.0 | 251240.0 | 237584.0 |
| 151 | Bentley | Brooklands | 225100.0 | 241539.0 | 231659.75 |
| 275 | Ferrari | F12berlinetta | 238232.0 | 239908.0 | 239051.142857 |
| 154 | Bentley | Mulsanne | 220000.0 | 238700.0 | 228583.333333 |



2️⃣ MODELS WITH MOST TRIM VARIATIONS
----------------------------------------
Top 10 models with most trim variations:


|  | Automaker | Genmodel | trim_count | trim_price_min | trim_price_max |
|---|---|---|---|---|---|
| 101 | BMW | 3 Series | 12289.0 | 13650.0 | 85130.0 |
| 586 | Mercedes-Benz | C Class | 9861.0 | 17600.0 | 97710.0 |
| 924 | Vauxhall | Astra | 7490.0 | 9430.0 | 29945.0 |
| 53 | Audi | A4 | 7060.0 | 16015.0 | 70920.0 |
| 51 | Audi | A3 | 6602.0 | 14475.0 | 44820.0 |
| 97 | BMW | 1 Series | 5832.0 | 15507.0 | 40175.0 |
| 339 | Ford | Mondeo | 5815.0 | 12810.0 | 35440.0 |
| 932 | Vauxhall | Insignia | 5801.0 | 15376.0 | 36525.0 |
| 326 | Ford | Focus | 5673.0 | 9830.0 | 39040.0 |
| 594 | Mercedes-Benz | E Class | 5378.0 | 23345.0 | 108325.0 |



3️⃣ BEST SELLING MODELS
----------------------------------------
Top 10 best selling models:


|  | Automaker | Genmodel | total_sales | avg_sales | max_sales |
|---|---|---|---|---|---|
| 325 | Ford | Fiesta | 1505740.0 | 75287.0 | 125619.0 |
| 326 | Ford | Focus | 1166989.0 | 58349.45 | 77782.0 |
| 928 | Vauxhall | Corsa | 1043713.0 | 52185.65 | 86840.0 |
| 962 | Volkswagen | Golf | 1018866.0 | 50943.3 | 70714.0 |
| 924 | Vauxhall | Astra | 816382.0 | 40819.1 | 65907.0 |
| 969 | Volkswagen | Polo | 662837.0 | 33141.85 | 52347.0 |
| 528 | MINI | Hatch | 618637.0 | 10310.616667 | 38958.0 |
| 677 | Nissan | Qashqai | 555468.0 | 27773.4 | 58132.0 |
| 101 | BMW | 3 Series | 547991.0 | 27399.55 | 38244.0 |
| 917 | Toyota | Yaris | 474954.0 | 23747.7 | 31025.0 |



4️⃣ FUEL TYPE ANALYSIS
----------------------------------------
Fuel type distribution:


|  | fuel_type | model_count | avg_price |
|---|---|---|---|
| 3 | Petrol | 408 | 41447.351079 |
| 0 | Diesel | 215 | 28080.627042 |
| 2 | Other | 22 | 43503.461766 |
| 1 | Electric Diesel REX | 2 | 30730.0 |



5️⃣ YEAR RANGE ANALYSIS
----------------------------------------
Year range distribution:


|  | year_min | year_max | model_count |
|---|---|---|---|
| 20 | 1998.0 | 2018.0 | 40 |
| 171 | 2016.0 | 2018.0 | 17 |
| 174 | 2017.0 | 2018.0 | 17 |
| 143 | 2010.0 | 2018.0 | 16 |
| 164 | 2014.0 | 2018.0 | 15 |
| 162 | 2013.0 | 2018.0 | 14 |
| 6 | 1998.0 | 2004.0 | 14 |
| 22 | 1998.0 | 2020.0 | 13 |
| 79 | 2003.0 | 2018.0 | 13 |
| 4 | 1998.0 | 2002.0 | 11 |



✅ Advanced query examples completed!
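
The fuel type counts above add up to 647 (408 + 215 + 22 + 2), not 1011, because `groupby` drops rows whose key is missing by default, and the 364 models without trim data have no `most_common_fuel`. A minimal sketch of making that group visible, assuming pandas 1.1 or later for the `dropna=False` option and using an invented toy frame:

```python
import pandas as pd

# Toy summary: one model has no trim data, so its fuel type is missing.
toy = pd.DataFrame({
    "Genmodel_ID": ["1_1", "2_1", "2_2", "8_5"],
    "most_common_fuel": [None, "Petrol", "Petrol", "Diesel"],
})

# Default behaviour: the row with a missing key is silently excluded.
default_counts = toy.groupby("most_common_fuel").size()

# dropna=False keeps a separate group for the missing key.
with_missing = toy.groupby("most_common_fuel", dropna=False).size()

print(default_counts.sum(), with_missing.sum())  # 3 4
```

Reporting the missing group alongside the others makes it clear how many models the fuel and year-range breakdowns leave out.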
