# 1. Introduction

This notebook explores the Automobile CO2 Emissions dataset to identify the attributes that most influence vehicle emissions and to uncover patterns among features such as engine size, fuel consumption, and fuel type.

# 2. Problem Statement

The goal is to identify and visualize how automobile features (engine size, cylinder count, fuel type, and fuel consumption) relate to CO2 emissions, so that the findings can inform regulatory and manufacturing decisions.

# 3. Installing & Importing Libraries

In [None]:
!pip install ydata-profiling --upgrade -q

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# pandas-profiling was renamed to ydata-profiling; the import name changed with it
from ydata_profiling import ProfileReport
import warnings
warnings.filterwarnings('ignore')

# Configuration
np.set_printoptions(precision=4)
pd.set_option('mode.chained_assignment', None)
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_columns', None)
plt.rc('figure', figsize=(10, 8))
sns.set()
%matplotlib inline

# 4. Data Acquisition & Description

In [None]:
df = pd.read_csv('https://raw.githubusercontent.com/insaid2018/Term-2/master/Data/FuelConsumptionCo2.csv')
df.head()

# 5. Data Pre-Profiling

In [None]:
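# Column dtypes and non-null counts (info() prints directly)
df.info()
# Only the last expression in a cell is rendered, so display the summary explicitly
display(df.describe(include='all'))
# Missing values per column
df.isnull().sum()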

# 6. Data Pre-Processing

In [None]:
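# Drop columns that are not used in the analysis
df = df.drop(columns=['MODELYEAR', 'FUELCONSUMPTION_COMB_MPG', 'MODEL'])
# Without MODEL, some rows can become identical, so de-duplicate afterwards
df = df.drop_duplicates().reset_index(drop=True)
df.info()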
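
To see how much the de-duplication step removes, the row count can be compared before and after. The sketch below uses a small made-up frame (the `toy` name and all values are illustrative assumptions, not rows from the dataset) to show how dropping an identifying column such as `MODEL` can turn distinct rows into duplicates.

```python
import pandas as pd

# Toy data: hypothetical values, not taken from FuelConsumptionCo2.csv.
toy = pd.DataFrame({
    'MAKE': ['ACURA', 'ACURA', 'BMW'],
    'MODEL': ['ILX', 'ILX HYBRID', 'X5'],
    'ENGINESIZE': [2.0, 2.0, 3.0],
    'CO2EMISSIONS': [196, 196, 260],
})

# Dropping MODEL leaves the two ACURA rows identical.
reduced = toy.drop(columns=['MODEL'])
n_duplicates = int(reduced.duplicated().sum())

# Keep the first occurrence of each row and rebuild a clean 0..n-1 index.
deduped = reduced.drop_duplicates().reset_index(drop=True)
print(f'{n_duplicates} duplicate row(s) removed, {len(deduped)} remain')
```

On the real data, the same check would be `df.duplicated().sum()` run just before `drop_duplicates()`.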

# 7. Data Post-Profiling

In [None]:
profile = ProfileReport(df, title='Pandas Profiling Report', minimal=True, progress_bar=False)
profile.to_file('Automobile_CO2_Emissions_Profile.html')

# 8. Exploratory Data Analysis

Ten key questions about the data, each answered with a visualization.

### CO2 Emissions by Car Brand

In [None]:
brand_avg = df[['MAKE', 'CO2EMISSIONS']].groupby('MAKE').mean().sort_values('CO2EMISSIONS', ascending=False)
sns.barplot(y=brand_avg.index[:15], x=brand_avg['CO2EMISSIONS'][:15]); plt.title('Top 15 Brands by Avg CO2 Emissions'); plt.xlabel('CO2 EMISSIONS (g/km)'); plt.show()

### Top 10 Vehicle Classes by Average Emissions

In [None]:
veh_class_avg = df[['VEHICLECLASS','CO2EMISSIONS']].groupby('VEHICLECLASS').mean().sort_values('CO2EMISSIONS', ascending=False).head(10)
sns.barplot(y=veh_class_avg.index, x=veh_class_avg['CO2EMISSIONS']); plt.title('Top Vehicle Classes by Avg Emissions'); plt.xlabel('CO2 EMISSIONS (g/km)'); plt.show()

### Engine Size Distribution

In [None]:
sns.histplot(df['ENGINESIZE'], bins=30, kde=True); plt.xlabel('Engine Size (L)'); plt.title('Engine Size Distribution'); plt.show()

### Engine Size vs CO2 Emissions

In [None]:
sns.lineplot(x='ENGINESIZE', y='CO2EMISSIONS', data=df); plt.xlabel('Engine Size (L)'); plt.ylabel('CO2 EMISSIONS (g/km)'); plt.title('Engine Size vs CO2 Emissions'); plt.show()
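
The line plot shows the trend visually; a least-squares line gives a rough numeric summary of it. This is a minimal sketch on made-up points (the `engine_size` and `co2` arrays are hypothetical), and a straight line is only an assumption about the shape of the relationship.

```python
import numpy as np

# Hypothetical points lying exactly on co2 = 40 * engine_size + 100.
engine_size = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
co2 = 40.0 * engine_size + 100.0

# A degree-1 polyfit returns [slope, intercept] of the least-squares line.
slope, intercept = np.polyfit(engine_size, co2, deg=1)
print(f'slope = {slope:.1f} g/km per litre, intercept = {intercept:.1f} g/km')
```

On the real data, the equivalent call would be `np.polyfit(df['ENGINESIZE'], df['CO2EMISSIONS'], deg=1)`.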

### Cylinders Count Distribution

In [None]:
sns.countplot(x='CYLINDERS', data=df, order=df['CYLINDERS'].value_counts().index); plt.title('Cylinders Count Distribution'); plt.xlabel('Cylinders'); plt.show()

### CO2 Emissions by Cylinder Count

In [None]:
cyl_emissions = df[['CYLINDERS','CO2EMISSIONS']].groupby('CYLINDERS').mean().sort_values('CO2EMISSIONS', ascending=False)
sns.barplot(x=cyl_emissions.index, y=cyl_emissions['CO2EMISSIONS']); plt.ylabel('CO2 EMISSIONS (g/km)'); plt.title('CO2 Emissions by Cylinders'); plt.show()

### CO2 Emissions by Transmission Type

In [None]:
trans_emissions = df[['TRANSMISSION','CO2EMISSIONS']].groupby('TRANSMISSION').mean().sort_values('CO2EMISSIONS', ascending=False)
sns.barplot(y=trans_emissions.index, x=trans_emissions['CO2EMISSIONS']); plt.title('CO2 Emissions by Transmission'); plt.xlabel('CO2 EMISSIONS (g/km)'); plt.show()

### CO2 Emissions by Fuel Type

In [None]:
fuel_emissions = df[['FUELTYPE','CO2EMISSIONS']].groupby('FUELTYPE').mean().sort_values('CO2EMISSIONS', ascending=False)
sns.barplot(x=fuel_emissions.index, y=fuel_emissions['CO2EMISSIONS']); plt.ylabel('CO2 EMISSIONS (g/km)'); plt.title('CO2 Emissions by Fuel Type'); plt.show()

### Fuel Consumption (City) vs CO2 Emissions by Fuel Type

In [None]:
sns.scatterplot(x='FUELCONSUMPTION_CITY', y='CO2EMISSIONS', hue='FUELTYPE', data=df); plt.title('Fuel Consumption City vs CO2 Emissions'); plt.xlabel('City Fuel Consumption (L/100km)'); plt.ylabel('CO2 Emissions (g/km)'); plt.show()
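
One reason to colour the scatter by fuel type is that each fuel may trace its own line. A rough way to check this is to compare the ratio of emissions to consumption per fuel type. The sketch below does so on made-up numbers (the `toy` frame and its values are assumptions chosen for illustration, not measurements from the dataset).

```python
import pandas as pd

# Hypothetical rows: each fuel type has a constant emissions/consumption ratio.
toy = pd.DataFrame({
    'FUELTYPE': ['X', 'X', 'E', 'E'],
    'FUELCONSUMPTION_CITY': [10.0, 12.0, 15.0, 20.0],
    'CO2EMISSIONS': [230.0, 276.0, 240.0, 320.0],
})

# Emissions per unit of city fuel consumption, row by row.
ratio = toy['CO2EMISSIONS'] / toy['FUELCONSUMPTION_CITY']

# Average ratio within each fuel type.
per_fuel = ratio.groupby(toy['FUELTYPE']).mean()
print(per_fuel)
```

Clearly different ratios across fuel types would suggest that the scatter is better read as several lines than as one cloud.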

### Overall Correlation Heatmap

In [None]:
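# Correlate numeric columns only; string columns such as MAKE make df.corr() fail in pandas 2.x
sns.heatmap(df.select_dtypes(include='number').corr(), annot=True, cmap='RdBu'); plt.title('Correlation Heatmap'); plt.show()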
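
The heatmap can also be read off as a ranked list of features by their correlation with the target. This is a minimal sketch on a hypothetical frame (`toy` reuses the dataset's column names, but the values are invented).

```python
import pandas as pd

# Hypothetical rows using the dataset's column names.
toy = pd.DataFrame({
    'FUELTYPE': ['X', 'Z', 'X', 'D'],
    'ENGINESIZE': [1.5, 2.0, 3.5, 5.0],
    'CYLINDERS': [4, 4, 6, 8],
    'CO2EMISSIONS': [150, 190, 250, 330],
})

# Keep numeric columns only; string columns such as FUELTYPE are excluded.
corr = toy.select_dtypes(include='number').corr()

# Correlation of each numeric feature with the target, most positive first.
ranking = corr['CO2EMISSIONS'].drop('CO2EMISSIONS').sort_values(ascending=False)
print(ranking)
```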

# 9. Summarization

## 9.1 Conclusion

- Engine size, number of cylinders, and fuel consumption are major contributors to CO2 emissions.
- Vehicle class and transmission type also influence emission levels.
- The per-brand and per-fuel-type bar charts show which brands and fuel types have above-average mean emissions.
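
The "above average" comparison in the last point can be made explicit by comparing each brand's mean with the overall mean. The sketch below assumes a tiny made-up frame (the brand labels `A`, `B`, `C` and all values are placeholders).

```python
import pandas as pd

# Hypothetical emissions values; brand names are placeholders.
toy = pd.DataFrame({
    'MAKE': ['A', 'A', 'B', 'B', 'C'],
    'CO2EMISSIONS': [300, 320, 180, 200, 250],
})

# Mean over all vehicles, and mean per brand.
overall_mean = toy['CO2EMISSIONS'].mean()
brand_mean = toy.groupby('MAKE')['CO2EMISSIONS'].mean()

# Brands whose average emissions strictly exceed the overall average.
above_average = brand_mean[brand_mean > overall_mean].sort_values(ascending=False)
print(above_average)
```

The same pattern applies to `FUELTYPE` or `VEHICLECLASS` by changing the grouping column.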

## 9.2 Actionable Insights

- Regulations can target the vehicle classes and brands with the highest emissions.
- Consumers who want to reduce emissions should prefer models with smaller engines and lower fuel consumption.
- Automotive manufacturers can prioritize emission reduction in larger engine designs.

# 📊 Extended EDA

### 🔹 Q: What is the distribution of vehicle classes in the dataset?

In [None]:
plt.figure(figsize=(10,8))
sns.countplot(y='VEHICLECLASS', data=df, order=df['VEHICLECLASS'].value_counts().index)
plt.title('Distribution of Vehicle Classes'); plt.show()