# **Analysis of Fuel Consumption Data from Vessel Reports**

This notebook provides an in-depth analysis of the fuel consumption data recorded from vessel operations. The dataset includes key metrics such as `total_fo` (total fuel oil consumption), `me_con` (main engine consumption), `ae_con` (auxiliary engine consumption), and `bl_con` (boiler consumption) across various time periods.
The source file (a CSV export, `Vessel_1_Report_3_Fuel.csv`) contains the following key columns:

- **report_date_time**: The date and time when the report was generated.
- **status**: The status of the vessel at the time of the report (e.g., "DRIFTING," "AT SEA," "IN PORT").
- **total_fo**: Total fuel oil consumption.
- **me_con**: Main engine fuel consumption.
- **ae_con**: Auxiliary engine fuel consumption.
- **bl_con**: Boiler fuel consumption.



The primary objectives of this analysis are:
- To identify any discrepancies in fuel consumption records.
- To analyze trends in fuel consumption based on time, such as quarters and months.
- To provide actionable insights for operational efficiency.

In [87]:
import pandas as pd
df = pd.read_csv('Downloads/Vessel_1_Report_3_Fuel.csv')

In [89]:
df.head()

Unnamed: 0,report_date_time,status,total_fo,me_con,ae_con,bl_con
0,02-01-2018 19:48,DRIFTING,0.7,0.0,0.0,0.0
1,14-02-2018 12:00,AT SEA,75.3,0.0,0.0,0.0
2,22-03-2018 14:30,DRIFTING,1.1,0.0,0.0,0.0
3,16-05-2018 21:30,AT SEA,14.2,0.0,0.0,0.0
4,13-12-2018 12:00,IN PORT,2.3,0.0,0.0,0.0


In [91]:
df.describe()

Unnamed: 0,total_fo,me_con,ae_con,bl_con
count,6257.0,6257.0,6257.0,6257.0
mean,11.04758,5.775436,1.474614,0.362691
std,18.734881,14.429418,2.216978,0.72496
min,0.0,0.0,0.0,0.0
25%,0.67,0.0,0.0,0.0
50%,2.79,0.0,0.34,0.0
75%,9.3,1.8,2.3,0.3
max,90.04,77.69,17.88,8.4


## Data Cleaning and Preparation

In [94]:
df['report_date_time']= pd.to_datetime(df['report_date_time'],format='%d-%m-%Y %H:%M')
df.head()

Unnamed: 0,report_date_time,status,total_fo,me_con,ae_con,bl_con
0,2018-01-02 19:48:00,DRIFTING,0.7,0.0,0.0,0.0
1,2018-02-14 12:00:00,AT SEA,75.3,0.0,0.0,0.0
2,2018-03-22 14:30:00,DRIFTING,1.1,0.0,0.0,0.0
3,2018-05-16 21:30:00,AT SEA,14.2,0.0,0.0,0.0
4,2018-12-13 12:00:00,IN PORT,2.3,0.0,0.0,0.0


In [96]:
df.isnull().sum()

report_date_time      0
status              940
total_fo              0
me_con                0
ae_con                0
bl_con                0
dtype: int64

In [98]:
def corrected(row):
    if row['me_con'] == 0 and row['ae_con'] == 0 and row['bl_con'] == 0:
        return row['total_fo']
    else:
        return row['me_con'] + row['ae_con'] + row['bl_con']

df['corrected_total_fo'] = df.apply(corrected, axis=1)
df.head()

Unnamed: 0,report_date_time,status,total_fo,me_con,ae_con,bl_con,corrected_total_fo
0,2018-01-02 19:48:00,DRIFTING,0.7,0.0,0.0,0.0,0.7
1,2018-02-14 12:00:00,AT SEA,75.3,0.0,0.0,0.0,75.3
2,2018-03-22 14:30:00,DRIFTING,1.1,0.0,0.0,0.0,1.1
3,2018-05-16 21:30:00,AT SEA,14.2,0.0,0.0,0.0,14.2
4,2018-12-13 12:00:00,IN PORT,2.3,0.0,0.0,0.0,2.3


**Insight**:
- A `corrected_total_fo` column was created to reconcile the reported total with its components. Where all three components (`me_con`, `ae_con`, `bl_con`) are zero, which is treated here as missing component data, the reported `total_fo` is kept. Otherwise the total is recomputed as the sum of the three components.
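The same rule can also be written without a row-wise `apply`. The sketch below is one possible vectorized form using `numpy.where`; the `sample` frame and its values are made up for illustration and are not taken from the vessel dataset.

```python
import numpy as np
import pandas as pd

# Illustrative rows only (hypothetical values, not from the vessel file).
sample = pd.DataFrame({
    "total_fo": [0.7, 10.0, 5.0],
    "me_con":   [0.0, 6.0, 0.0],
    "ae_con":   [0.0, 3.0, 2.0],
    "bl_con":   [0.0, 0.5, 0.0],
})

components = sample[["me_con", "ae_con", "bl_con"]]

# True where every component is zero, i.e. the component breakdown is missing.
all_zero = components.eq(0).all(axis=1)

# Keep the reported total when the breakdown is missing,
# otherwise recompute the total from the components.
sample["corrected_total_fo"] = np.where(
    all_zero, sample["total_fo"], components.sum(axis=1)
)

print(sample)
```

On a frame of a few thousand rows either approach is fast enough; the vectorized form mainly makes the rule easier to read at a glance.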

In [101]:
# Fill missing 'status' values using forward fill
df['status'] = df['status'].ffill()

# Relabel 'DRIFTING' as 'AT SEA' when it directly follows an 'IN PORT' row
for i in range(1, len(df)):
    if df.loc[i-1, 'status'] == 'IN PORT' and df.loc[i, 'status'] == 'DRIFTING':
        df.loc[i, 'status'] = 'AT SEA'
df.head(17)

Unnamed: 0,report_date_time,status,total_fo,me_con,ae_con,bl_con,corrected_total_fo
0,2018-01-02 19:48:00,DRIFTING,0.7,0.0,0.0,0.0,0.7
1,2018-02-14 12:00:00,AT SEA,75.3,0.0,0.0,0.0,75.3
2,2018-03-22 14:30:00,DRIFTING,1.1,0.0,0.0,0.0,1.1
3,2018-05-16 21:30:00,AT SEA,14.2,0.0,0.0,0.0,14.2
4,2018-12-13 12:00:00,IN PORT,2.3,0.0,0.0,0.0,2.3
5,2017-12-14 12:00:00,AT SEA,43.5,0.0,0.0,0.0,43.5
6,2019-07-24 12:00:00,IN PORT,0.0,0.0,0.0,0.0,0.0
7,2022-06-21 07:47:00,IN PORT,0.0,0.0,0.0,0.0,0.0
8,2017-07-09 23:54:00,IN PORT,2.2,0.0,0.0,0.0,2.2
9,2019-09-04 14:42:00,AT SEA,2.96,0.0,0.0,0.0,2.96


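**The dataset was first cleaned by handling missing values. The `status` column had 940 missing entries, which were forward-filled (`ffill`) on the assumption that a vessel keeps its last reported status until a new one is logged. A `DRIFTING` row that directly follows an `IN PORT` row was then relabeled `AT SEA`.**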
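Both cleaning steps depend on row order, so if the file is not already chronological, sorting by `report_date_time` first may be appropriate (that is an assumption about the data, not something this notebook verifies). The relabeling itself can be sketched without an explicit loop by comparing each row with the previous one via `shift`. The `status_sample` frame below is hypothetical.

```python
import pandas as pd

# Hypothetical status sequence with one missing entry.
status_sample = pd.DataFrame({
    "status": ["IN PORT", None, "DRIFTING", "DRIFTING",
               "AT SEA", "IN PORT", "DRIFTING"]
})

# Step 1: carry the last known status forward into missing rows.
status_sample["status"] = status_sample["status"].ffill()

# Step 2: find DRIFTING rows whose previous row is IN PORT.
previous_status = status_sample["status"].shift(1)
follows_port = previous_status.eq("IN PORT") & status_sample["status"].eq("DRIFTING")

# Relabel only those rows as AT SEA.
status_sample.loc[follows_port, "status"] = "AT SEA"

print(status_sample)
```

This should give the same result as the loop above, because a relabeled row becomes `AT SEA` rather than `IN PORT` and therefore never triggers a relabel on the row after it.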

In [104]:
def process_datetime_columns(df, datetime_col='report_date_time'):
    # Ensure the column is in datetime format
    df[datetime_col] = pd.to_datetime(df[datetime_col])

    # Extract various components from the datetime column
    df['Date'] = df[datetime_col].dt.date
    df['Time'] = df[datetime_col].dt.time
    df['Timetz'] = df[datetime_col].dt.timetz
    df['Year'] = df[datetime_col].dt.year
    df['Month'] = df[datetime_col].dt.month
    df['Quarter'] = df[datetime_col].dt.quarter
    df['Day'] = df[datetime_col].dt.day
    df['Hour'] = df[datetime_col].dt.hour
    df['Minute'] = df[datetime_col].dt.minute
    df['Second'] = df[datetime_col].dt.second
    df['Microsecond'] = df[datetime_col].dt.microsecond
    df['Nanosecond'] = df[datetime_col].dt.nanosecond
    df['Dayofweek'] = df[datetime_col].dt.dayofweek
    df['Day_of_week'] = df[datetime_col].dt.day_of_week
    df['weekday'] = df[datetime_col].dt.weekday
    df['dayofyear'] = df[datetime_col].dt.dayofyear
    df['days_in_month'] = df[datetime_col].dt.days_in_month
    df['month_start'] = df[datetime_col].dt.is_month_start
    df['month_end'] = df[datetime_col].dt.is_month_end
    df['quarter_start'] = df[datetime_col].dt.is_quarter_start
    df['quarter_end'] = df[datetime_col].dt.is_quarter_end
    df['is_year_start'] = df[datetime_col].dt.is_year_start
    df['is_year_end'] = df[datetime_col].dt.is_year_end
    df['is_leap_year'] = df[datetime_col].dt.is_leap_year
    df['daysinmonth'] = df[datetime_col].dt.daysinmonth
    df['period'] = df[datetime_col].dt.to_period('M')
    df['strftime'] = df[datetime_col].dt.strftime('%a')
    df['Round'] = df[datetime_col].dt.round(freq='h')
    df['Floor'] = df[datetime_col].dt.floor('h')
    df['Ceil'] = df[datetime_col].dt.ceil('h')
    df['Month_Name'] = df[datetime_col].dt.month_name()
    df['Day_Name'] = df[datetime_col].dt.day_name()

    return df

df = process_datetime_columns(df)
df.head()

Unnamed: 0,report_date_time,status,total_fo,me_con,ae_con,bl_con,corrected_total_fo,Date,Time,Timetz,...,is_year_end,is_leap_year,daysinmonth,period,strftime,Round,Floor,Ceil,Month_Name,Day_Name
0,2018-01-02 19:48:00,DRIFTING,0.7,0.0,0.0,0.0,0.7,2018-01-02,19:48:00,19:48:00,...,False,False,31,2018-01,Tue,2018-01-02 20:00:00,2018-01-02 19:00:00,2018-01-02 20:00:00,January,Tuesday
1,2018-02-14 12:00:00,AT SEA,75.3,0.0,0.0,0.0,75.3,2018-02-14,12:00:00,12:00:00,...,False,False,28,2018-02,Wed,2018-02-14 12:00:00,2018-02-14 12:00:00,2018-02-14 12:00:00,February,Wednesday
2,2018-03-22 14:30:00,DRIFTING,1.1,0.0,0.0,0.0,1.1,2018-03-22,14:30:00,14:30:00,...,False,False,31,2018-03,Thu,2018-03-22 14:00:00,2018-03-22 14:00:00,2018-03-22 15:00:00,March,Thursday
3,2018-05-16 21:30:00,AT SEA,14.2,0.0,0.0,0.0,14.2,2018-05-16,21:30:00,21:30:00,...,False,False,31,2018-05,Wed,2018-05-16 22:00:00,2018-05-16 21:00:00,2018-05-16 22:00:00,May,Wednesday
4,2018-12-13 12:00:00,IN PORT,2.3,0.0,0.0,0.0,2.3,2018-12-13,12:00:00,12:00:00,...,False,False,31,2018-12,Thu,2018-12-13 12:00:00,2018-12-13 12:00:00,2018-12-13 12:00:00,December,Thursday


**Insight**:
- Time-related features such as the exact date, month, quarter, and year were extracted from the `report_date_time` column. This step was crucial for conducting time-based analyses, including identifying seasonal trends and variations in fuel consumption.
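Only `Year`, `Month`, and `Quarter` are used in the groupings later in this notebook. As a lighter alternative, the sketch below extracts just those three and returns a new frame instead of modifying the input. The helper name `add_core_time_features` and the `demo` frame are hypothetical.

```python
import pandas as pd

def add_core_time_features(frame, datetime_col="report_date_time"):
    """Return a copy of `frame` with Year, Month and Quarter columns added."""
    timestamps = pd.to_datetime(frame[datetime_col])
    return frame.assign(
        Year=timestamps.dt.year,
        Month=timestamps.dt.month,
        Quarter=timestamps.dt.quarter,
    )

# Three timestamps in ISO format, used purely for illustration.
demo = pd.DataFrame({
    "report_date_time": ["2018-01-02 19:48", "2018-05-16 21:30", "2018-12-13 12:00"]
})
demo_features = add_core_time_features(demo)

print(demo_features)
```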


## Analysis of Fuel Consumption

In [108]:
# Create a column to identify mismatches
df['mismatch'] = df['total_fo'] != df['corrected_total_fo']

# Filter the DataFrame to find the mismatches
mismatch_df = df[df['mismatch']]

# Display the mismatches
print("Mismatches between total_fo and corrected_total_fo:")
print(mismatch_df[['report_date_time', 'total_fo', 'corrected_total_fo', 'mismatch']])

Mismatches between total_fo and corrected_total_fo:
        report_date_time  total_fo  corrected_total_fo  mismatch
13   2023-10-18 11:48:00     1.500                1.50      True
17   2019-10-12 08:24:00     3.840                3.84      True
20   2019-10-31 12:00:00     6.371                6.37      True
23   2019-11-10 12:00:00    59.020               59.02      True
28   2019-12-12 17:06:00     1.880                1.88      True
...                  ...       ...                 ...       ...
6246 2020-12-27 11:48:00     9.650                9.66      True
6248 2020-12-28 05:18:00     4.690                4.68      True
6253 2020-12-30 17:48:00     8.670                8.68      True
6255 2020-12-31 16:48:00     1.630                1.63      True
6256 2020-12-31 18:48:00     2.070                2.07      True

[1025 rows x 4 columns]


**Insight**:
- The exact comparison between `total_fo` and `corrected_total_fo` flagged 1,025 rows. In the rows displayed, the two values either print identically (a floating-point representation effect) or differ by about 0.01 or less, which points to rounding in the source reports rather than substantive errors. On that basis the dataset was treated as consistent to two decimal places.
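Because `!=` on floats flags even representation-level differences, a tolerance-based comparison is usually a better way to separate rounding noise from real discrepancies. The sketch below uses `numpy.isclose`; the first three value pairs are copied from the mismatch output above, the last two are made up, and the tolerance of 0.015 is an assumed threshold rather than one taken from the original analysis.

```python
import numpy as np
import pandas as pd

check = pd.DataFrame({
    "total_fo":           [6.371, 9.65, 4.69, 0.1 + 0.2, 10.0],
    "corrected_total_fo": [6.37,  9.66, 4.68, 0.3,       12.5],
})

# Exact comparison: every row here is flagged, including pure float noise.
exact_mismatch = check["total_fo"] != check["corrected_total_fo"]

# Assumed tolerance: differences up to about one unit in the second decimal.
tolerance = 0.015
within_tolerance = np.isclose(
    check["total_fo"], check["corrected_total_fo"], rtol=0, atol=tolerance
)

# Only rows outside the tolerance count as real mismatches.
check["real_mismatch"] = ~within_tolerance

print(check)
print("Exact mismatches:", int(exact_mismatch.sum()))
print("Mismatches beyond tolerance:", int(check["real_mismatch"].sum()))
```

Setting `rtol=0` keeps the check purely absolute, so the threshold means the same thing for small and large consumption values.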


In [111]:
# Group by year and month and sum each consumption component
monthly_analysis = df.groupby(['Year', 'Month']).agg({'me_con': 'sum',
                                                      'ae_con': 'sum',
                                                      'bl_con': 'sum'}).reset_index()
# Find the month and year with the highest me_con, ae_con, and bl_con
max_me_con = monthly_analysis.loc[monthly_analysis['me_con'].idxmax()]
max_ae_con = monthly_analysis.loc[monthly_analysis['ae_con'].idxmax()]
max_bl_con = monthly_analysis.loc[monthly_analysis['bl_con'].idxmax()]

# Display the results
print("Month and Year with Highest me_con:",max_me_con)

Month and Year with Highest me_con: Year      2020.00
Month        4.00
me_con    1399.83
ae_con     338.00
bl_con      19.46
Name: 33, dtype: float64


In [113]:
print("Month and Year with Highest ae_con:", max_ae_con)

Month and Year with Highest ae_con: Year      2020.00
Month        4.00
me_con    1399.83
ae_con     338.00
bl_con      19.46
Name: 33, dtype: float64


In [115]:
print("Month and Year with Highest bl_con:", max_bl_con)

Month and Year with Highest bl_con: Year      2024.00
Month        3.00
me_con     277.18
ae_con     181.70
bl_con      74.25
Name: 79, dtype: float64


**Insight**:
- Both main engine consumption (`me_con`, 1,399.83) and auxiliary engine consumption (`ae_con`, 338.00) peaked in April 2020. Boiler consumption (`bl_con`) was highest in March 2024 (74.25), which may reflect colder weather or increased heating demand, although the data here does not confirm the cause.
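The three peak periods can also be read off in one step by keeping `Year` and `Month` as the index and calling `idxmax` on the grouped totals. This is a sketch on a small made-up frame (`monthly_demo`), not the notebook's actual data.

```python
import pandas as pd

# Hypothetical per-report values, for illustration only.
monthly_demo = pd.DataFrame({
    "Year":   [2020, 2020, 2020, 2024],
    "Month":  [3, 4, 4, 3],
    "me_con": [10.0, 30.0, 25.0, 5.0],
    "ae_con": [2.0, 4.0, 3.0, 1.0],
    "bl_con": [0.5, 0.2, 0.1, 3.0],
})

# Sum each component per (Year, Month); the group keys become the index.
monthly_totals = (
    monthly_demo
    .groupby(["Year", "Month"])[["me_con", "ae_con", "bl_con"]]
    .sum()
)

# For each component, the (Year, Month) label of its largest monthly total.
peak_periods = monthly_totals.idxmax()
peak_values = monthly_totals.max()

print(pd.DataFrame({"peak_period": peak_periods, "peak_value": peak_values}))
```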


In [118]:
# Group data by quarter and calculate the sum of each type of fuel consumption
quarterly_data = df.groupby('Quarter').agg({'me_con': 'sum',
                                            'ae_con': 'sum',
                                            'bl_con': 'sum',
                                            'corrected_total_fo': 'sum'}).reset_index()

# Identify the quarter with the maximum consumption for each type of fuel
max_me_con_quarter = quarterly_data.loc[quarterly_data['me_con'].idxmax()]
max_ae_con_quarter = quarterly_data.loc[quarterly_data['ae_con'].idxmax()]
max_bl_con_quarter = quarterly_data.loc[quarterly_data['bl_con'].idxmax()]
max_corrected_total_fo = quarterly_data.loc[quarterly_data['corrected_total_fo'].idxmax()]

In [120]:
print("Quarter with Highest me_con:",max_me_con_quarter)

Quarter with Highest me_con: Quarter                   4.000
me_con                10663.090
ae_con                 2335.372
bl_con                  538.900
corrected_total_fo    20078.072
Name: 3, dtype: float64


In [122]:
print("Quarter with Highest ae_con:",max_ae_con_quarter)

Quarter with Highest ae_con: Quarter                   2.000
me_con                 9085.296
ae_con                 2767.917
bl_con                  659.340
corrected_total_fo    17838.093
Name: 1, dtype: float64


In [124]:
print("Quarter with Highest bl_con:",max_bl_con_quarter)

Quarter with Highest bl_con: Quarter                   2.000
me_con                 9085.296
ae_con                 2767.917
bl_con                  659.340
corrected_total_fo    17838.093
Name: 1, dtype: float64


In [126]:
print("Quarter with Highest corrected_total_fo:",max_corrected_total_fo)

Quarter with Highest corrected_total_fo: Quarter                   4.000
me_con                10663.090
ae_con                 2335.372
bl_con                  538.900
corrected_total_fo    20078.072
Name: 3, dtype: float64


**Insight**:
- Pooled across all years, Q4 had the highest main engine consumption (`me_con`, 10,663.09) and the highest corrected total (`corrected_total_fo`, 20,078.07), which could indicate increased operational activity toward the year end. Auxiliary engine (`ae_con`) and boiler (`bl_con`) consumption instead peaked in Q2.


**Fuel consumption was grouped by month and year to identify the peak period for each engine component, and by quarter to observe broader consumption trends.**
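Grouping by `Quarter` alone pools every year together, so a single unusually busy year can dominate a quarter's total. One way to check whether a quarterly pattern repeats is to break the totals out by year as well. The sketch below does this on a made-up frame (`quarter_demo`); the values are hypothetical.

```python
import pandas as pd

# Hypothetical quarterly totals for two years.
quarter_demo = pd.DataFrame({
    "Year":               [2019, 2019, 2020, 2020],
    "Quarter":            [2, 4, 2, 4],
    "corrected_total_fo": [100.0, 150.0, 180.0, 120.0],
})

# Rows are years, columns are quarters, cells are summed consumption.
by_year_quarter = quarter_demo.pivot_table(
    index="Year", columns="Quarter",
    values="corrected_total_fo", aggfunc="sum",
)

# Quarter with the highest total within each year.
peak_quarter_per_year = by_year_quarter.idxmax(axis=1)

print(by_year_quarter)
print(peak_quarter_per_year)
```

If the same quarter comes out on top in most years, the seasonal reading is better supported than when the pooled result is driven by one or two years.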

In [134]:
# Create a pivot table
pivot_table = pd.pivot_table(df,
                             values=['me_con', 'ae_con', 'bl_con'],
                             index=['Year', 'Month'],
                             columns='status',
                             aggfunc='sum')

# Display the pivot table
display(pivot_table)

Year,Month,ae_con (AT SEA),ae_con (DRIFTING),ae_con (IN PORT),bl_con (AT SEA),bl_con (DRIFTING),bl_con (IN PORT),me_con (AT SEA),me_con (DRIFTING),me_con (IN PORT)
2017,7,0.00,0.00,0.000,0.00,0.00,0.000,0.00,0.00,0.00
2017,8,0.00,0.00,0.000,0.00,0.00,0.000,0.00,0.00,0.00
2017,9,0.00,0.00,0.000,0.00,0.00,0.000,0.00,0.00,0.00
2017,10,0.00,0.00,0.000,0.00,0.00,0.000,0.00,0.00,0.00
2017,11,0.00,0.00,0.000,0.00,0.00,0.000,0.00,0.00,0.00
...,...,...,...,...,...,...,...,...,...,...
2024,3,96.96,35.53,49.210,37.65,14.60,22.000,244.57,30.83,1.78
2024,4,97.09,35.58,53.546,11.61,8.84,13.432,343.98,33.35,2.61
2024,5,95.01,42.43,41.530,13.96,17.74,21.420,318.22,42.81,0.25
2024,6,108.44,27.94,41.500,16.96,11.42,20.910,282.95,38.24,1.04


**A pivot table was created to summarize the cleaned data, showing each consumption component by year and month, split by vessel status.**
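The pivot table has two-level column labels (component, then status), which can be awkward to export or plot. A possible follow-up is to flatten them into single names. The sketch below works on a small frame (`pivot_demo`) with illustrative values, and the naming pattern `component_status` is an assumption.

```python
import pandas as pd

# Illustrative rows only.
pivot_demo = pd.DataFrame({
    "Year":   [2024, 2024, 2024],
    "Month":  [3, 3, 4],
    "status": ["AT SEA", "IN PORT", "AT SEA"],
    "me_con": [244.57, 1.78, 343.98],
    "ae_con": [96.96, 49.21, 97.09],
})

demo_pivot = pd.pivot_table(
    pivot_demo,
    values=["me_con", "ae_con"],
    index=["Year", "Month"],
    columns="status",
    aggfunc="sum",
)

# Join the two column levels into one name, e.g. ('me_con', 'AT SEA') -> 'me_con_at_sea'.
flat = demo_pivot.copy()
flat.columns = [
    f"{fuel}_{status.lower().replace(' ', '_')}" for fuel, status in flat.columns
]

# Turn Year and Month back into ordinary columns.
flat = flat.reset_index()

print(flat)
```

Month and status combinations with no reports appear as `NaN` in the flattened frame, just as they do in the pivot table itself.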


## Conclusions

The analysis of the fuel consumption data provided several key insights:

1. **Data Accuracy**: After correcting for any discrepancies in fuel consumption records, the dataset was found to be consistent, with no significant mismatches between `total_fo` and `corrected_total_fo`.
2. **Quarterly Trends**: Total and main engine consumption were highest in Q4, which may indicate a period of increased operational activity, while auxiliary engine and boiler consumption were highest in Q2.
3. **Peak Consumption**: The peak periods for each type of fuel consumption were identified:
   - **Main Engine Consumption**: April 2020
   - **Auxiliary Engine Consumption**: April 2020
   - **Boiler Consumption**: March 2024
     
These findings suggest that total fuel consumption tends to peak towards the end of the year, although the monthly peaks for the main and auxiliary engines both fell in April 2020.

Overall, the analysis provides valuable insights into fuel consumption patterns.