<a href="https://www.kaggle.com/code/swish9/car-information?scriptVersionId=132235091" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

In [None]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.impute import SimpleImputer

In [None]:
car = pd.read_csv('/kaggle/input/automobile-dataset/Automobile.csv')

In [None]:
car.columns

Each column name corresponds to one attribute of the dataset:

* 'name': The name of the automobile, used as a label for each car in the dataset.
* 'mpg': Miles per gallon, the fuel-efficiency measure. It is the number of miles the car can travel on one gallon of fuel.
* 'cylinders': The number of cylinders in the automobile's engine. More cylinders usually go with a larger, more powerful engine.
* 'displacement': The engine displacement, which is the total volume swept by all the pistons inside the cylinders during one complete cycle. It is typically measured in cubic centimeters (cc) or liters (L).
* 'horsepower': The horsepower rating of the automobile's engine. Horsepower is a unit of power, which is the rate at which work is done.
* 'weight': The weight of the automobile, typically measured in pounds (lbs) or kilograms (kg).
* 'acceleration': The acceleration performance of the automobile, measured in seconds. It is the time the car takes to reach a standard speed, such as going from 0 to 60 mph, so lower values mean faster acceleration.
* 'model_year': The model year of the automobile.
* 'origin': The origin or manufacturing country of the automobile, stored as categorical values such as codes or names.


In [None]:
car.head(10)

In [None]:
car.nunique()

In [None]:
car.isnull().sum()

# Here are the problem statements which I'll be working on:

<h2> Exploratory Data Analysis (EDA) </h2>

<h3>I will conduct a comprehensive exploratory data analysis on the automobile dataset to gain insights into the relationships, distributions, and patterns within the data. I will explore the statistical summaries, perform data visualizations, and analyze the correlations between variables. My goal is to identify any interesting trends, outliers, or patterns in the dataset that can provide a deeper understanding of the data and set the foundation for further analysis or modeling. </h3>
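The summary steps described above can be sketched on a few made-up rows before running them on the full dataset. The `toy` frame and its values are hypothetical and only illustrate the column layout; the sketch restricts the correlation to numeric columns so that text columns such as 'name' do not interfere.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real dataset (illustrative values only)
toy = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e"],
    "mpg": [18.0, 15.0, 30.0, 24.0, 36.0],
    "horsepower": [130.0, 165.0, 70.0, None, 60.0],
    "weight": [3504, 3693, 2100, 2600, 1900],
})

# Keep only numeric columns so corr() is not asked to handle text
numeric = toy.select_dtypes(include="number")

# Pearson correlation of each numeric feature with mpg, most negative first
corr_with_mpg = numeric.corr()["mpg"].drop("mpg").sort_values()
print(corr_with_mpg)

# Number of missing values per column
missing = toy.isnull().sum()
print(missing)
```

On the real data, the same calls would apply to `car` in place of `toy`.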

In [None]:
car.describe()

In [None]:
# Correlation heatmap
# The DataFrame is named `car`; numeric_only skips text columns such as 'name'
correlation = car.corr(numeric_only=True)
sns.heatmap(correlation, annot=True, cmap='coolwarm')
plt.title("Correlation Heatmap")
plt.show()

In [None]:
# Distribution of MPG
sns.histplot(car['mpg'], kde=True)
plt.title("Distribution of MPG")
plt.xlabel("MPG")
plt.show()

In [None]:
# Scatter plot of MPG vs. Horsepower
sns.scatterplot(x='horsepower', y='mpg', data=car)
plt.title("MPG vs. Horsepower")
plt.xlabel("Horsepower")
plt.ylabel("MPG")
plt.show()


<h2> Fuel Efficiency Prediction</h2>

<h3> I will build a machine learning model to predict the fuel efficiency (MPG) of automobiles from cylinders, displacement, horsepower, weight, and acceleration. I will split the dataset into training and testing sets so that the model is evaluated on data it has not seen. Next, I will select a regression algorithm, such as linear regression or decision tree regression, and train it on the training data. Once the model is trained, I will evaluate its performance on the testing data. My aim is a model that predicts the fuel efficiency of automobiles from their attributes accurately enough to support further analysis and decision-making. </h3>
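As a rough sketch of the plan above, the two candidate algorithms can be compared on the same train/test split. The data below is synthetic and generated only for illustration (the coefficients, noise level, and `max_depth=4` are assumptions, not values from the dataset); each model is wrapped in a pipeline so that mean imputation is fitted on the training split only.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): mpg falls as weight and horsepower rise
rng = np.random.default_rng(42)
n = 200
weight = rng.uniform(1600, 5000, n)
horsepower = rng.uniform(50, 230, n)
mpg = 50 - 0.006 * weight - 0.05 * horsepower + rng.normal(0, 1.5, n)

X = np.column_stack([weight, horsepower])
# Blank out a few horsepower values to mimic gaps in a feature column
X[rng.choice(n, 10, replace=False), 1] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, mpg, test_size=0.2, random_state=42
)

# Each candidate imputes missing values with the training-set mean, then fits
candidates = {
    "linear": make_pipeline(SimpleImputer(strategy="mean"), LinearRegression()),
    "tree": make_pipeline(
        SimpleImputer(strategy="mean"),
        DecisionTreeRegressor(max_depth=4, random_state=42),
    ),
}

scores = {}
for name, pipeline in candidates.items():
    pipeline.fit(X_train, y_train)
    scores[name] = mean_squared_error(y_test, pipeline.predict(X_test))
    print(name, "test MSE:", scores[name])
```

Which model scores better on the real data is not implied by this synthetic example; it only shows how the comparison could be set up.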

In [None]:
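# Drop rows with a missing target only; axis=1 would delete every feature column that has a gap
car.dropna(subset=['mpg'], inplace=True)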

In [None]:
# Split the data into features (X) and target (y)
X = car[['cylinders', 'displacement', 'horsepower', 'weight', 'acceleration']]
y = car['mpg']

In [None]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [None]:
# Handle missing values
imputer = SimpleImputer(strategy='mean')
X_train_imputed = imputer.fit_transform(X_train)
X_test_imputed = imputer.transform(X_test)

In [None]:
# Initialize and train the linear regression model
model = LinearRegression()
model.fit(X_train_imputed, y_train)

In [None]:
# Predict on the testing set
y_pred = model.predict(X_test_imputed)

In [None]:
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)
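Mean squared error is expressed in squared MPG, which is hard to read directly. A minimal sketch of two complementary metrics follows, using small hand-made arrays (`y_true` and `y_hat` are hypothetical values, not results from this notebook): the root mean squared error, which is in MPG units, and the R² score.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical true and predicted MPG values, for illustration only
y_true = np.array([20.0, 30.0, 25.0, 15.0])
y_hat = np.array([22.0, 28.0, 25.0, 16.0])

mse = mean_squared_error(y_true, y_hat)  # average squared error
rmse = np.sqrt(mse)                      # same units as the target (MPG)
r2 = r2_score(y_true, y_hat)             # share of variance explained

print("MSE:", mse)
print("RMSE:", rmse)
print("R^2:", r2)
```

In the notebook, the same calls would take `y_test` and `y_pred` in place of the toy arrays.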