# **About Dataset:**


The goal of this project is to build a Random Forest Regression model that predicts petrol consumption from four economic and infrastructure factors: petrol tax, average income, paved highways, and the share of the population holding a driver's licence. The fitted model can be used to analyze how these factors relate to fuel usage, which may inform resource planning.


# **Step 00: Import the libraries**

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

# **Step 01: Load the CSV using pandas**

In [None]:
# Load the data
df = pd.read_csv("data.csv")

# **Step 02: Display the first five rows**

In [None]:
df.head()

| | Petrol_tax | Average_income | Paved_Highways | Population_Driver_licence(%) | Petrol_Consumption |
|---|---|---|---|---|---|
| 0 | 9.0 | 3571 | 1976 | 0.525 | 541 |
| 1 | 9.0 | 4092 | 1250 | 0.572 | 524 |
| 2 | 9.0 | 3865 | 1586 | 0.58 | 561 |
| 3 | 7.5 | 4870 | 2351 | 0.529 | 414 |
| 4 | 8.0 | 4399 | 431 | 0.544 | 410 |
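
Before modelling, it can help to confirm that the loaded frame has the expected shape, no missing values, and only numeric columns. The sketch below is a minimal illustration that rebuilds the five displayed rows as a stand-in for `data.csv`; the names `sample`, `missing_per_column`, and `all_numeric` are assumptions introduced here, not part of the original notebook.

```python
import pandas as pd

# Stand-in for data.csv: only the five rows displayed by df.head()
sample = pd.DataFrame({
    "Petrol_tax": [9.0, 9.0, 9.0, 7.5, 8.0],
    "Average_income": [3571, 4092, 3865, 4870, 4399],
    "Paved_Highways": [1976, 1250, 1586, 2351, 431],
    "Population_Driver_licence(%)": [0.525, 0.572, 0.58, 0.529, 0.544],
    "Petrol_Consumption": [541, 524, 561, 414, 410],
})

# Basic sanity checks: size, missing values, column types
n_rows, n_cols = sample.shape
missing_per_column = sample.isna().sum()
all_numeric = all(pd.api.types.is_numeric_dtype(t) for t in sample.dtypes)

print("shape:", (n_rows, n_cols))
print("missing values per column:")
print(missing_per_column)
print("all columns numeric:", all_numeric)
```

On the real data, the same checks would be run on `df` instead of `sample`.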


# **Step 03: Separate the input and output features**

In [None]:
# Features and target
X = df.drop(columns=["Petrol_Consumption"])
y = df["Petrol_Consumption"]

# **Step 05: Split the dataset into training and testing (80% for training, 20% for testing)**

In [None]:
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
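
To illustrate what `test_size=0.2` and `random_state=42` do, the following sketch splits a small synthetic array (not the petrol data) and checks the resulting sizes. The arrays `X_demo` and `y_demo` are made up for this example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 10 samples, 2 features
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

# Same arguments as in the notebook: 20% test, fixed seed
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Repeating the call with the same seed reproduces the same split
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
same_split = np.array_equal(y_te, y_te2) and np.array_equal(y_tr, y_tr2)

print("train rows:", len(X_tr), "test rows:", len(X_te))
print("identical split on repeat:", same_split)
```

With 10 samples, 8 go to training and 2 to testing. Fixing `random_state` keeps the split stable across runs, which makes later comparisons between models fair.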


# **Step 06: Train the Random Forest Regressor**

In [None]:
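# Create a Random Forest Regressor with 200 trees.
# No random_state is set, so the fitted trees (and predictions) vary slightly between runs.
rf = RandomForestRegressor(
    n_estimators=200
)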
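
Because the forest above is created without a `random_state`, the bootstrap samples differ on every fit, so predictions can change slightly from run to run. The sketch below shows, on synthetic data, that passing a fixed seed makes two independent fits agree. The data, the helper `fit_and_predict`, and the seed value are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data (not the petrol dataset)
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(40, 4))
y_demo = 3 * X_demo[:, 0] - 2 * X_demo[:, 1] + rng.normal(0, 1, size=40)
X_new = rng.uniform(0, 10, size=(5, 4))

def fit_and_predict(seed):
    """Fit a fresh 200-tree forest with the given seed and predict on X_new."""
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    model.fit(X_demo, y_demo)
    return model.predict(X_new)

# Two independent fits with the same seed
seeded_a = fit_and_predict(seed=42)
seeded_b = fit_and_predict(seed=42)

print("same predictions with a fixed seed:", np.allclose(seeded_a, seeded_b))
```

Adding `random_state` to the notebook's model would make its outputs repeatable, although the exact numbers would then differ from the value printed in this document.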

In [None]:
# Train the model on the training split
rf.fit(X_train, y_train)

# Predict petrol consumption for the held-out test split
y_pred = rf.predict(X_test)
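
The test-set predictions are computed above but not scored. One possible way to score them, sketched here with standard scikit-learn metrics, is a small helper that reports MAE, RMSE, and R². The helper name `evaluate_regression` and the demo numbers are hypothetical; they are not results from the petrol data.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def evaluate_regression(y_true, y_pred):
    """Return common regression metrics as a dictionary."""
    mae = mean_absolute_error(y_true, y_pred)
    rmse = float(np.sqrt(mean_squared_error(y_true, y_pred)))
    r2 = r2_score(y_true, y_pred)
    return {"MAE": mae, "RMSE": rmse, "R2": r2}

# Made-up values purely to demonstrate the helper
demo_true = [500, 600, 700]
demo_pred = [510, 590, 700]
demo_metrics = evaluate_regression(demo_true, demo_pred)

for name, value in demo_metrics.items():
    print(f"{name}: {value:.3f}")
```

In the notebook, the equivalent call would be `evaluate_regression(y_test, y_pred)`. MAE and RMSE are in the same units as `Petrol_Consumption`, while R² is unitless.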

In [None]:
# ---- CUSTOM INPUT ----
# Column names and order must match the features used for training
custom_input = pd.DataFrame({
    "Petrol_tax": [8.5],
    "Average_income": [4800],
    "Paved_Highways": [7000],
    "Population_Driver_licence(%)": [0.56]
})

# Predict
prediction = rf.predict(custom_input)

print("Predicted Petrol Consumption:", prediction[0])

Predicted Petrol Consumption: 478.985
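
A random forest's regression output is the average of the predictions of its individual trees. The sketch below checks this on a toy forest fitted to only the five rows shown by `df.head()`, and also prints the forest's feature importances. The toy fit, the `random_state=0` seed, and the variable names are assumptions for illustration, so the numbers it produces will not match the prediction printed above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Toy training data: only the five rows displayed earlier
toy = pd.DataFrame({
    "Petrol_tax": [9.0, 9.0, 9.0, 7.5, 8.0],
    "Average_income": [3571, 4092, 3865, 4870, 4399],
    "Paved_Highways": [1976, 1250, 1586, 2351, 431],
    "Population_Driver_licence(%)": [0.525, 0.572, 0.58, 0.529, 0.544],
    "Petrol_Consumption": [541, 524, 561, 414, 410],
})
X_toy = toy.drop(columns=["Petrol_Consumption"])
y_toy = toy["Petrol_Consumption"]

forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(X_toy, y_toy)

# Same custom input as in the notebook
query = pd.DataFrame({
    "Petrol_tax": [8.5],
    "Average_income": [4800],
    "Paved_Highways": [7000],
    "Population_Driver_licence(%)": [0.56],
})

# Individual trees are fitted on plain arrays, so pass the raw values
per_tree = np.array(
    [tree.predict(query.to_numpy())[0] for tree in forest.estimators_]
)
forest_pred = forest.predict(query)[0]
print("mean of tree predictions:", per_tree.mean())
print("forest prediction:       ", forest_pred)

# Relative contribution of each feature to the forest's splits
importances = pd.Series(
    forest.feature_importances_, index=X_toy.columns
).sort_values(ascending=False)
print(importances)
```

Applied to the notebook's `rf` and `X_train`, the same two steps would show how much the trees disagree on a given input and which features the model relies on most.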
