# Car Price Prediction using Feedforward Neural Network
This notebook demonstrates the process of building a Feedforward Neural Network (FNN) to predict car prices using the provided dataset. We will go through data loading, cleaning, preprocessing, model building, training, evaluation, and prediction.

## Data Loading
We begin by loading the dataset using pandas.

In [3]:
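import pandas as pd
from google.colab import drive

# Mounting Drive is only needed if the CSV is stored in Google Drive; in that case
# the path passed to read_csv must point inside /content/drive.
drive.mount('/content/drive')
df = pd.read_csv("CarPrice_dataset.csv")
df.head()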

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


| | car_ID | symboling | CarName | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | ... | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3 | alfa-romero giulia | gas | std | two | convertible | rwd | front | 88.6 | ... | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.0 |
| 1 | 2 | 3 | alfa-romero stelvio | gas | std | two | convertible | rwd | front | 88.6 | ... | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.0 |
| 2 | 3 | 1 | alfa-romero Quadrifoglio | gas | std | two | hatchback | rwd | front | 94.5 | ... | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.0 |
| 3 | 4 | 2 | audi 100 ls | gas | std | four | sedan | fwd | front | 99.8 | ... | 109 | mpfi | 3.19 | 3.4 | 10.0 | 102 | 5500 | 24 | 30 | 13950.0 |
| 4 | 5 | 2 | audi 100ls | gas | std | four | sedan | 4wd | front | 99.4 | ... | 136 | mpfi | 3.19 | 3.4 | 8.0 | 115 | 5500 | 18 | 22 | 17450.0 |


## Data Cleaning
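We inspect the column types with `df.info()` and count missing values per column. Every column in the (truncated) output below reports 205 non-null entries and zero missing values. No columns are dropped here; note that the identifier `car_ID` stays in the data and is later treated as a numeric feature.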

In [5]:
df.info()
df.isnull().sum()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 205 entries, 0 to 204
Data columns (total 26 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   car_ID            205 non-null    int64  
 1   symboling         205 non-null    int64  
 2   CarName           205 non-null    object 
 3   fueltype          205 non-null    object 
 4   aspiration        205 non-null    object 
 5   doornumber        205 non-null    object 
 6   carbody           205 non-null    object 
 7   drivewheel        205 non-null    object 
 8   enginelocation    205 non-null    object 
 9   wheelbase         205 non-null    float64
 10  carlength         205 non-null    float64
 11  carwidth          205 non-null    float64
 12  carheight         205 non-null    float64
 13  curbweight        205 non-null    int64  
 14  enginetype        205 non-null    object 
 15  cylindernumber    205 non-null    object 
 16  enginesize        205 non-null    int64  
 1

| | 0 |
|---|---|
| car_ID | 0 |
| symboling | 0 |
| CarName | 0 |
| fueltype | 0 |
| aspiration | 0 |
| doornumber | 0 |
| carbody | 0 |
| drivewheel | 0 |
| enginelocation | 0 |
| wheelbase | 0 |
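If the cleaning step is meant to remove columns that carry no information about price, `car_ID` is the obvious candidate: in the preview it appears to be a sequential row identifier. A minimal sketch on a three-row excerpt of the preview; whether to drop it is a modeling choice (the notebook keeps it, and dropping it would reduce the 25 preprocessed features to 24):

```python
import pandas as pd

# Values from the first three preview rows; car_ID identifies the row, not the car
df_demo = pd.DataFrame({
    "car_ID": [1, 2, 3],
    "horsepower": [111, 111, 154],
    "price": [13495.0, 16500.0, 16500.0],
})

# Total number of missing cells across the whole frame (0 means nothing to impute)
n_missing = int(df_demo.isnull().sum().sum())
print("missing values:", n_missing)

# Drop identifier columns; errors="ignore" skips names that are not present
df_clean = df_demo.drop(columns=["car_ID"], errors="ignore")
print(df_clean.columns.tolist())  # ['horsepower', 'price']
```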


## Data Preprocessing
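We standardize the numeric columns with `StandardScaler`, one-hot encode the binary categorical columns (dropping the first level, which leaves a single 0/1 column each), and label-encode the remaining multi-level categorical columns to integer codes.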
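The transformations applied to the numeric and binary columns can be reproduced by hand with pandas. A small sketch on illustrative toy values (not rows from the dataset); `pd.get_dummies(..., drop_first=True)` stands in for `OneHotEncoder(drop='first')`, and both drop the alphabetically first level:

```python
import pandas as pd

# Illustrative toy data (not rows from the dataset)
toy = pd.DataFrame({
    "fueltype": ["gas", "diesel", "gas", "gas"],
    "horsepower": [100.0, 120.0, 140.0, 160.0],
})

# Binary column -> a single 0/1 column after dropping the first level ("diesel")
onehot = pd.get_dummies(toy["fueltype"], prefix="fueltype", drop_first=True, dtype=int)
print(onehot.columns.tolist())  # ['fueltype_gas']

# Standardization as in StandardScaler: subtract the mean, divide by the population std (ddof=0)
hp = toy["horsepower"]
z = (hp - hp.mean()) / hp.std(ddof=0)
print(z.round(4).tolist())
```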

In [13]:
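import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder, LabelEncoder
from sklearn.compose import ColumnTransformer

class ModifiedLabelEncoder(BaseEstimator, TransformerMixin):
  """Label-encode several columns, keeping one fitted LabelEncoder per column.

  LabelEncoder itself is meant for a single target vector; refitting one shared
  instance column by column would leave only the last column's classes for
  transform(). (scikit-learn's OrdinalEncoder covers this use case for features.)
  """
  def fit(self, X, y=None):
    X = pd.DataFrame(X)
    self.encoders_ = {col: LabelEncoder().fit(X[col]) for col in X.columns}
    return self

  def transform(self, X):
    X = pd.DataFrame(X)
    # Encode each column with the encoder fitted on that same column
    return np.column_stack([self.encoders_[col].transform(X[col]) for col in X.columns])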
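The key requirement for a multi-column label encoder is that each column keeps its own value-to-code mapping, learned at fit time and reused at transform time. A minimal pure-Python sketch of that idea; `PerColumnEncoder` is a hypothetical name, and the sample values are the `carbody` and `drivewheel` entries from the preview:

```python
class PerColumnEncoder:
    """Hypothetical per-column label encoder: one sorted value->code mapping per column."""

    def fit(self, columns):
        # columns: dict of column name -> list of category values
        self.mappings_ = {
            name: {value: code for code, value in enumerate(sorted(set(values)))}
            for name, values in columns.items()
        }
        return self

    def transform(self, columns):
        # Each column is encoded with its *own* mapping; unseen values raise KeyError
        return {
            name: [self.mappings_[name][v] for v in values]
            for name, values in columns.items()
        }


train = {
    "carbody": ["convertible", "convertible", "hatchback", "sedan", "sedan"],
    "drivewheel": ["rwd", "rwd", "rwd", "fwd", "4wd"],
}
enc = PerColumnEncoder().fit(train)
print(enc.transform({"carbody": ["sedan"], "drivewheel": ["fwd"]}))
# {'carbody': [2], 'drivewheel': [1]}
```

Sorting the distinct values mirrors how `LabelEncoder` orders its `classes_`, so the codes match what a per-column `LabelEncoder` would produce.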

In [16]:


# Separate features and target
X = df.drop(columns=['price'])
y = df['price']

# Multi-level categorical columns: label-encoded to integer codes
le = ['CarName', 'carbody', 'drivewheel', 'enginetype', 'cylindernumber', 'fuelsystem']

# Binary categorical columns (exactly two distinct values): one-hot encoded
ohe = [c for c in X.select_dtypes(include='object').columns if X[c].nunique() == 2]

# Numeric feature columns; price is already excluded from X, so no pop() is needed
num = X.select_dtypes(include='number').columns.tolist()

transformer = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), num),
        ('ohe', OneHotEncoder(drop='first'), ohe),
        ('le', ModifiedLabelEncoder(), le),
    ],
    remainder='passthrough'  # any column not listed above is passed through unchanged
)
data = transformer.fit_transform(X)

# 80/20 train/test split with a fixed seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(data, y, test_size=0.2, random_state=42)
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((164, 25), (41, 25), (164,), (41,))
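In the preprocessing cell, `transformer` is fitted on all 205 rows before `train_test_split` runs, so the scaler's means and standard deviations are also computed from the 41 test rows. The usual leakage-free order is to split first, fit on the training part only, and then apply the fitted statistics to both parts. A minimal sketch of that order for one numeric column (horsepower values from the preview; the 3/2 split is arbitrary):

```python
from statistics import fmean, pstdev

# Horsepower values from the preview rows; first three act as "train", last two as "test"
train_hp = [111.0, 111.0, 154.0]
test_hp = [102.0, 115.0]

# Fit: statistics come from the training rows only
mu = fmean(train_hp)
sigma = pstdev(train_hp)  # population std, matching StandardScaler

# Transform: the same train statistics are applied to both splits
train_z = [(x - mu) / sigma for x in train_hp]
test_z = [(x - mu) / sigma for x in test_hp]
print(round(mu, 2), round(sigma, 2))
print([round(v, 3) for v in test_z])
```

With the notebook's label-encoded `CarName` column this reordering is not a drop-in change: car names that occur only in the test split would be unseen at transform time, and `LabelEncoder.transform` raises an error for unseen labels.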

## Model Building
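We build a feedforward network with TensorFlow/Keras: six fully connected ReLU hidden layers whose widths halve from 256 down to 8, followed by a single output unit for the price. The model is compiled with the Adam optimizer and mean absolute error as the loss.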
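As a rough picture of what the Keras model computes, each `Dense` layer is an affine map followed by ReLU. The numpy sketch below runs one forward pass through randomly initialized weights with the same layer widths (25 inputs, matching the shapes printed above, then 256 down to 8, then 1 output). The `forward` helper and the scaled-Gaussian initialization are illustrative assumptions, not Keras's internals:

```python
import numpy as np

rng = np.random.default_rng(0)
widths = [25, 256, 128, 64, 32, 16, 8, 1]  # input width followed by each Dense layer's units

# One (weights, bias) pair per Dense layer; the scaled Gaussian init is an illustrative choice
params = [
    (rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(widths[:-1], widths[1:])
]

def forward(x, params):
    """Apply Dense + ReLU for every layer, including the output (as in the notebook's model)."""
    h = x
    for w, b in params:
        h = np.maximum(0.0, h @ w + b)
    return h

batch = rng.normal(size=(4, 25))  # 4 fake standardized rows
out = forward(batch, params)
print(out.shape)  # (4, 1)
```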

In [32]:
X_train.shape

(164, 25)

In [59]:
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers

# Define the model: six fully connected ReLU hidden layers that halve in width (256 -> 8)
model = Sequential()
model.add(layers.Input(shape=(X_train.shape[1],)))  # one input per preprocessed feature
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(32, activation='relu'))
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dense(8, activation='relu'))
# Single output unit; ReLU keeps predictions non-negative
# (a linear output is the more common choice for regression)
model.add(layers.Dense(1, activation='relu'))

# Compile: Adam optimizer, MAE as the training loss, MSE tracked as an extra metric
model.compile(optimizer='adam', loss='mean_absolute_error', metrics=['mean_absolute_error', 'mean_squared_error'])
model.summary()
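The `model.summary()` table is not included in this export. Assuming the 25 input features shown by `X_train.shape`, the trainable-parameter count it reports can be worked out by hand:

```python
# Layer widths: 25 input features, then the units of each Dense layer
widths = [25, 256, 128, 64, 32, 16, 8, 1]

# A Dense layer mapping n_in -> n_out has n_in * n_out weights plus n_out biases
per_layer = [n_in * n_out + n_out for n_in, n_out in zip(widths[:-1], widths[1:])]
print(per_layer)       # [6656, 32896, 8256, 2080, 528, 136, 9]
print(sum(per_layer))  # 50561
```

About 50,000 parameters is large relative to the 164 training rows, which is worth keeping in mind when reading the validation losses.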

## Model Training
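We train the model for 130 epochs, holding out 20% of the training rows as a validation set (`validation_split=0.2`) to monitor generalization.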
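Assuming Keras's documented `validation_split` behavior (the validation rows are the last fraction of the arrays, taken before shuffling) and the default `batch_size` of 32, the arithmetic below shows how the 164 training rows are divided and accounts for the `5/5` steps per epoch in the training log:

```python
import math

n_train_rows = 164       # X_train.shape[0] from the split above
validation_split = 0.2
batch_size = 32          # Keras default when fit() is called without batch_size

# The *last* fraction of the arrays is held out as validation data
split_at = int(math.floor(n_train_rows * (1.0 - validation_split)))
n_fit, n_val = split_at, n_train_rows - split_at
steps_per_epoch = math.ceil(n_fit / batch_size)
print(n_fit, n_val, steps_per_epoch)  # 131 33 5
```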

In [60]:
history = model.fit(X_train, y_train, epochs=130, validation_split=0.2, verbose=1)

Epoch 1/130
5/5 ━━━━━━━━━━━━━━━━━━━━ 4s 397ms/step - loss: 12463.6094 - mean_absolute_error: 12463.6094 - mean_squared_error: 206057424.0000 - val_loss: 15068.7510 - val_mean_absolute_error: 15068.7510 - val_mean_squared_error: 317550720.0000
Epoch 2/130
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - loss: 12716.9141 - mean_absolute_error: 12716.9141 - mean_squared_error: 216189632.0000 - val_loss: 15050.6113 - val_mean_absolute_error: 15050.6113 - val_mean_squared_error: 317049504.0000
Epoch 3/130
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step - loss: 12877.1133 - mean_absolute_error: 12877.1133 - mean_squared_error: 215124784.0000 - val_loss: 15016.8301 - val_mean_absolute_error: 15016.8301 - val_mean_squared_error: 316121536.0000
Epoch 4/130
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - loss: 13023.9277 - mean_absolute_error: 13023.9277 - mean_squared_error: 220394848.00

## Model Evaluation
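We evaluate the model on the 41 held-out test rows using mean absolute error (MAE), mean squared error (MSE), and the coefficient of determination (R²).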

In [61]:
from sklearn.metrics import mean_absolute_error,mean_squared_error, r2_score

y_pred = model.predict(X_test)

# Calculate metrics
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Absolute Error: {mae}")
print(f"Mean Squared Error: {mse}")
print(f"R-squared Score: {r2}")

2/2 ━━━━━━━━━━━━━━━━━━━━ 1s 263ms/step
Mean Absolute Error: 2416.7369833269818
Mean Squared Error: 14868305.128655318
R-squared Score: 0.8116601490146977
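The three scores follow directly from their definitions, which match scikit-learn's `mean_absolute_error`, `mean_squared_error`, and `r2_score` for a single output without sample weights. A dependency-free sketch (the `regression_metrics` helper is hypothetical) that also adds RMSE, which is in the same units as `price` and therefore easier to read than MSE:

```python
import math

def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE and R^2 computed from their definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    mse = ss_res / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, mse, math.sqrt(mse), r2

# Illustrative prices, not model output
mae, mse, rmse, r2 = regression_metrics([10000, 15000, 20000], [11000, 14000, 20000])
print(mae, mse, rmse, r2)

# RMSE implied by the test-set MSE reported above, back in price units
implied_rmse = math.sqrt(14868305.128655318)
print(round(implied_rmse, 2))
```

For this notebook, RMSE comes to roughly 3,856 in price units against an MAE of about 2,417; RMSE exceeds MAE whenever error sizes vary, and a wide gap points to a few large misses.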


## Prediction on New Data
We demonstrate a single prediction, using the first row of the dataset as a stand-in for new data.
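For genuinely new, raw rows, the fitted preprocessing and the model have to be applied together, and a scikit-learn `Pipeline` is one common way to make that hard to get wrong. The sketch below is an illustration under several assumptions: it uses only three feature columns and five training rows copied from the preview, `OneHotEncoder(handle_unknown="ignore")` in place of the notebook's label encoding, and `LinearRegression` as a stand-in for the Keras network (placing a Keras model inside a pipeline would need a scikit-learn-compatible wrapper, not shown):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Five rows from the preview (a small subset of columns)
train = pd.DataFrame({
    "enginesize": [130, 130, 152, 109, 136],
    "horsepower": [111, 111, 154, 102, 115],
    "carbody": ["convertible", "convertible", "hatchback", "sedan", "sedan"],
    "price": [13495.0, 16500.0, 16500.0, 13950.0, 17450.0],
})
X_raw, y_raw = train.drop(columns=["price"]), train["price"]

preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["enginesize", "horsepower"]),
    # handle_unknown="ignore" encodes unseen categories as all zeros instead of failing
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["carbody"]),
])

# LinearRegression is a stand-in for the Keras network, chosen only to keep the sketch small
pipe = Pipeline([("prep", preprocess), ("reg", LinearRegression())])
pipe.fit(X_raw, y_raw)

# A raw, unprocessed row goes straight in; the fitted preprocessing is applied automatically
new_row = pd.DataFrame({"enginesize": [130], "horsepower": [111], "carbody": ["wagon"]})
pred = pipe.predict(new_row)
print(pred)
```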

In [40]:
# Row 0 of the dataset, already passed through the fitted `transformer` earlier
# (`data` holds the preprocessed feature matrix). Genuinely new raw rows must go
# through the same fitted transformer before they can be fed to the model.
new_data_processed = data[[0]]

predicted_price = model.predict(new_data_processed)
print(f"Predicted Price: {predicted_price[0][0]}")
print(f"Actual Price: {y.iloc[0]}")