# Aalto Pro/Diploma in Artificial Intelligence: Project work

# Recommending Virtual Assistant for Sustainable Waste Management Onboard a Cruise Ship

## Background

In this work, a data product is built with data science tools and machine learning (ML) algorithms to help a cruise ship operator plan future waste management operations in the most sustainable and feasible way. The ML product is constructed from real shipboard operational data and external web sources. The purpose is to create an ML system that gives a reasonable picture and prediction of the best option for the operations.

The main goal is to produce a working tool, not a product that gives a comprehensive picture of the whole situation. The work is done in JupyterLab with the Anaconda Python package management system, so the code is written in Python.

There are two aspects in this work: sustainability and feasibility. Evac Oy supplies integrated cleantech solutions, including waste management systems, to all types of ships. Sustainability is a key value for all major cruise ship owners, not only because of regulations and guidelines but also because of the acceptability of the cruise business as a whole. The sustainability part will be a prediction based on the mass balance of recyclable materials and the carbon balance derived from it. In addition to being sustainable, the operations should be as economical as possible. This part will be covered by market price data for recyclable materials and by the carbon price.
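The link between the mass balance, the carbon balance and the economic value can be sketched roughly as follows. This is a minimal illustration, not the project's actual model: the class `MaterialFlow`, both functions, and every factor and price in the example are hypothetical placeholders, not measured or market data.

```python
from dataclasses import dataclass


@dataclass
class MaterialFlow:
    """One recyclable fraction landed ashore (hypothetical structure)."""
    name: str
    mass_t: float              # mass of the fraction, tonnes
    co2e_saved_per_t: float    # tCO2e avoided per tonne recycled (placeholder)
    price_eur_per_t: float     # market price of the material, EUR/tonne (placeholder)


def carbon_balance_t(flows):
    """Total avoided emissions (tCO2e) derived from the mass balance."""
    return sum(f.mass_t * f.co2e_saved_per_t for f in flows)


def economic_value_eur(flows, carbon_price_eur_per_t):
    """Material revenue plus the monetary value of the avoided emissions."""
    material = sum(f.mass_t * f.price_eur_per_t for f in flows)
    return material + carbon_balance_t(flows) * carbon_price_eur_per_t


# Placeholder figures for illustration only.
flows = [
    MaterialFlow("glass", mass_t=2.0, co2e_saved_per_t=0.5, price_eur_per_t=10.0),
    MaterialFlow("aluminium", mass_t=1.0, co2e_saved_per_t=4.0, price_eur_per_t=100.0),
]
print(carbon_balance_t(flows))          # 5.0
print(economic_value_eur(flows, 20.0))  # 220.0
```

Keeping the carbon price as a separate argument means the same mass balance can be re-evaluated whenever the carbon price changes.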

### Goals and risks

The goal is to study how to extract usable data from the existing sources and how to clean and explore it, so that usable predictive ML models can be produced and later retrained with fresh data from the IoT system. If this goal is achieved, it may be worth extending and developing the work further into a usable data science product.

The IoT system to be used is installed on and connected to our most recent new-build cruise ship project. The first risk is that the data may become available later than planned. In that case, we need to "construct" the data based on our best knowledge. I am planning to parse part of the data from the web. The second risk is that the data is too complex to be parsed and cleaned into a usable form. The third risk is that the relevant data available may not be sufficient for training the constructed ML model.
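If the shipboard data is delayed, a stand-in data set could be constructed along the following lines. This is only a sketch under assumptions: the function `make_synthetic_signal` and its parameters are hypothetical, the column names mirror the CSV exports shown later in this document, and the normally distributed values are a placeholder for real domain knowledge about the signals.

```python
import numpy as np
import pandas as pd


def make_synthetic_signal(var_name, start, periods, step_s=5,
                          mean=50.0, std=2.0, seed=0):
    """Build a stand-in signal in the long layout of the ship CSV exports."""
    rng = np.random.default_rng(seed)  # fixed seed keeps the data reproducible
    times = pd.date_range(start=start, periods=periods,
                          freq=pd.Timedelta(seconds=step_s))
    return pd.DataFrame({
        "VarName": var_name,
        "TimeString": times.strftime("%d.%m.%Y %H:%M:%S"),
        # Placeholder values; a real stand-in should encode known behaviour.
        "VarValue": rng.normal(mean, std, size=periods).round().astype(int),
        "Validity": 1,
    })


synthetic = make_synthetic_signal("SYNTHETIC_TANK_LEVEL",
                                  "2018-02-20 05:10:27", periods=4)
print(synthetic)
```

Because the layout matches the real exports, the cleaning and modelling code can be developed against the stand-in and later pointed at the real files.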

### Data Perspective

Both internal and external data sources will be used. The internal data source is the dry and wet waste production data, delivered daily from the ship IoT system as CSV files: one file for dry waste production and one for wet waste production. The external data consists of historical trends in the market value of recycled materials and in the value of carbon in terms of greenhouse gas emission abatement.

Concerning data availability, I have already contacted the ship owner for the waste data. As mentioned, the raw data will be in the form of CSV files. Some data wrangling with Pandas is needed, as well as calculations for deriving the actual values. The challenge with the shipboard data is the timing of its availability. We are working on that matter. As for market value information, some data has already been obtained. However, I would like to automate this as a continuous retrieval of the information, which would update the output signal of the ML model each time new data becomes available. The challenge here may be to find a data source that produces a single value. For the purposes of this project work, we may need to use a source that is fairly easily available.
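Once a price source is chosen, each newly retrieved price has to be aligned with the daily waste figures. One possible approach is sketched below with `pandas.merge_asof`, which attaches to every day the most recent price published on or before it. The function `attach_latest_price`, the column names and all numbers are hypothetical placeholders; the retrieval step itself is left out because no source has been selected yet.

```python
import pandas as pd


def attach_latest_price(daily_waste, prices):
    """Attach to each day the latest price published on or before that day."""
    # merge_asof requires both frames to be sorted by the key column.
    daily_waste = daily_waste.sort_values("date")
    prices = prices.sort_values("date")
    return pd.merge_asof(daily_waste, prices, on="date", direction="backward")


# Placeholder data for illustration only.
daily_waste = pd.DataFrame({
    "date": pd.to_datetime(["2018-02-20", "2018-02-21", "2018-02-22"]),
    "dry_waste_kg": [100.0, 120.0, 90.0],
})
prices = pd.DataFrame({
    "date": pd.to_datetime(["2018-02-19", "2018-02-22"]),
    "price_eur_per_t": [50.0, 55.0],
})

merged = attach_latest_price(daily_waste, prices)
print(merged)
```

Re-running the merge after each retrieval keeps the model input current without requiring the price source to publish on the same schedule as the ship data.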



# Data sources and methods

## The workflow for the data science and for producing the machine learning model

The text will be added here. Before that, I include a scikit-learn rehearsal below:

In [27]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


In [28]:
FWW_flow = pd.read_csv('MunVene2/BORGEN_PUMP0.csv', delimiter=';')

In [29]:
FWW_flow.head()

Unnamed: 0,VarName,TimeString,VarValue,Validity,Time_ms
0,BORGEN_PUMP_STATUS,20.02.2018 05:10:27,-5,1,431512155897106
1,BORGEN_PUMP_STATUS,20.02.2018 05:10:29,-5,1,431512156130903
2,BORGEN_PUMP_STATUS,20.02.2018 05:10:31,-5,1,431512156364931
3,BORGEN_PUMP_STATUS,20.02.2018 05:10:33,-5,1,431512156598843
4,BORGEN_PUMP_STATUS,20.02.2018 05:10:35,-5,1,43151215683287


In [30]:
FWW_flow.shape

(401195, 5)
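With roughly 400,000 rows sampled every few seconds, the signal probably needs a proper time index and some downsampling before modelling. The sketch below rebuilds a few of the rows shown above as an in-memory CSV so that it runs without the original file. It assumes that `Validity == 1` marks a usable reading, which the source data does not confirm, and the one-minute interval is an arbitrary choice.

```python
import io

import pandas as pd

# A few rows copied from the output above, in the semicolon-delimited layout.
sample = io.StringIO(
    "VarName;TimeString;VarValue;Validity;Time_ms\n"
    "BORGEN_PUMP_STATUS;20.02.2018 05:10:27;-5;1;431512155897106\n"
    "BORGEN_PUMP_STATUS;20.02.2018 05:10:29;-5;1;431512156130903\n"
    "BORGEN_PUMP_STATUS;20.02.2018 05:10:31;-5;1;431512156364931\n"
)
df = pd.read_csv(sample, delimiter=";")

# TimeString is day-first, so an explicit format avoids month/day mix-ups.
df["Time"] = pd.to_datetime(df["TimeString"], format="%d.%m.%Y %H:%M:%S")

# Assumption: Validity == 1 flags a usable reading.
valid = df[df["Validity"] == 1]

# Downsample the readings to one-minute means.
per_minute = valid.set_index("Time")["VarValue"].resample("1min").mean()
print(per_minute)
```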

In [31]:
FWTank1Level = pd.read_csv('MunVene2/LEVEL_TANK_A0.csv', delimiter=';')

In [32]:
FWTank1Level.head()

Unnamed: 0,VarName,TimeString,VarValue,Validity,Time_ms
0,FOODWASTE_TANK_A_LEVEL_TREND,25.02.2018 16:07:23,47,1,431566717885185
1,HARSLEV_PUMP_1_1PM_1_STATUS,25.02.2018 16:07:25,20,1,431566718119097
2,THREEWAY_VALVE_1_1PV4_STATUS,25.02.2018 16:07:25,5,1,431566718119097
3,HARSLEV_PUMP_1_1PM_1_STATUS,25.02.2018 16:07:27,20,1,431566718352894
4,THREEWAY_VALVE_1_1PV4_STATUS,25.02.2018 16:07:27,5,1,431566718352894
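The output above shows that this file interleaves several signals in a long layout, distinguished by `VarName`. A possible way to separate them is to pivot to one column per signal, as sketched below with the five rows shown. Forward-filling the tank level between its own samples is an assumption about how the signal should be interpreted, not something the data confirms.

```python
import pandas as pd

# The five rows shown above, rebuilt by hand so the sketch runs on its own.
long_df = pd.DataFrame({
    "VarName": [
        "FOODWASTE_TANK_A_LEVEL_TREND",
        "HARSLEV_PUMP_1_1PM_1_STATUS",
        "THREEWAY_VALVE_1_1PV4_STATUS",
        "HARSLEV_PUMP_1_1PM_1_STATUS",
        "THREEWAY_VALVE_1_1PV4_STATUS",
    ],
    "TimeString": [
        "25.02.2018 16:07:23",
        "25.02.2018 16:07:25",
        "25.02.2018 16:07:25",
        "25.02.2018 16:07:27",
        "25.02.2018 16:07:27",
    ],
    "VarValue": [47, 20, 5, 20, 5],
})
long_df["Time"] = pd.to_datetime(long_df["TimeString"],
                                 format="%d.%m.%Y %H:%M:%S")

# One row per timestamp, one column per signal; "last" resolves duplicates.
wide = long_df.pivot_table(index="Time", columns="VarName",
                           values="VarValue", aggfunc="last")

# Assumption: the level holds its last value until the next sample arrives.
tank_level = wide["FOODWASTE_TANK_A_LEVEL_TREND"].ffill()
print(wide)
```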


In [33]:
FWTank1Level.shape

(402860, 5)

In [34]:
BWTankLevel = pd.read_csv('MunVene2/LEVEL_BIOSLUDGETANK0.csv', delimiter=';')

In [35]:
BWTankLevel.head()

Unnamed: 0,VarName,TimeString,VarValue,Validity,Time_ms
0,BIO_SLUDGE_MIXING_STATUS,17.02.2018 21:03:26,0,1,431488773880556
1,BIO_SLUDGE_TANK_LEVEL_TREND,17.02.2018 21:03:26,19,1,431488773880556
2,BIO_SLUDGE_MIXING_STATUS,17.02.2018 21:03:31,0,1,431488774465162
3,BIO_SLUDGE_TANK_LEVEL_TREND,17.02.2018 21:03:31,19,1,431488774465162
4,BIO_SLUDGE_MIXING_STATUS,17.02.2018 21:03:36,0,1,43148877505


In [36]:
Burner1 = pd.read_csv('MunVene2/BURNER_10.csv', delimiter=';')

In [37]:
Burner1.head(8)

Unnamed: 0,VarName,TimeString,VarValue,Validity,Time_ms
0,INCI1_DB_PV_MAIN_CHAMBER,06.02.2018 04:27:10,246,1,431371855315972
1,INCI1_DB_PV_MAIN_CHAMBER,06.02.2018 04:27:15,245,1,43137185590081
2,INCI1_DB_PV_MAIN_CHAMBER,06.02.2018 04:27:20,245,1,431371856485417
3,INCI1_DB_PV_MAIN_CHAMBER,06.02.2018 04:27:25,245,1,431371857070139
4,INCI1_DB_PV_MAIN_CHAMBER,06.02.2018 04:27:30,245,1,431371857654745
5,INCI1_DB_PV_MAIN_CHAMBER,06.02.2018 04:27:35,245,1,431371858239468
6,INCI1_DB_PV_MAIN_CHAMBER,06.02.2018 04:27:40,244,1,431371858824074
7,INCI1_DB_PV_MAIN_CHAMBER,06.02.2018 04:27:45,244,1,431371859408796


In [38]:
Burner2 = pd.read_csv('MunVene2/BURNER_20.csv', delimiter=';')

In [39]:
Burner2.head(8)

Unnamed: 0,VarName,TimeString,VarValue,Validity,Time_ms
0,INCI1_DB_PV_SECOND_CHAMBER,06.02.2018 04:27:10,230,1,431371855315972
1,INCI1_DB_PV_SECOND_CHAMBER,06.02.2018 04:27:15,230,1,43137185590081
2,INCI1_DB_PV_SECOND_CHAMBER,06.02.2018 04:27:20,229,1,431371856485417
3,INCI1_DB_PV_SECOND_CHAMBER,06.02.2018 04:27:25,228,1,431371857070139
4,INCI1_DB_PV_SECOND_CHAMBER,06.02.2018 04:27:30,229,1,431371857654745
5,INCI1_DB_PV_SECOND_CHAMBER,06.02.2018 04:27:35,228,1,431371858239468
6,INCI1_DB_PV_SECOND_CHAMBER,06.02.2018 04:27:40,228,1,431371858824074
7,INCI1_DB_PV_SECOND_CHAMBER,06.02.2018 04:27:45,227,1,431371859408796


In [40]:
Burner2.shape

(400478, 5)
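The two chamber signals appear to share the same timestamps, so they could be combined into one table for feature building. The sketch below uses the first three rows of each output above. The helper `to_series`, the column names `main_chamber` and `second_chamber`, and the derived `difference` column are hypothetical choices for illustration.

```python
import pandas as pd

TIME_FORMAT = "%d.%m.%Y %H:%M:%S"
timestamps = ["06.02.2018 04:27:10", "06.02.2018 04:27:15", "06.02.2018 04:27:20"]

# First three rows of each chamber signal, copied from the outputs above.
main = pd.DataFrame({"TimeString": timestamps, "VarValue": [246, 245, 245]})
second = pd.DataFrame({"TimeString": timestamps, "VarValue": [230, 230, 229]})


def to_series(df, name):
    """Turn one long-format signal into a time-indexed, named Series."""
    time = pd.to_datetime(df["TimeString"], format=TIME_FORMAT)
    index = pd.DatetimeIndex(time, name="Time")
    return pd.Series(df["VarValue"].to_numpy(), index=index, name=name)


# The inner join keeps only timestamps present in both signals.
chambers = pd.concat(
    [to_series(main, "main_chamber"), to_series(second, "second_chamber")],
    axis=1,
    join="inner",
)
chambers["difference"] = chambers["main_chamber"] - chambers["second_chamber"]
print(chambers)
```

If the timestamps of two signals turn out not to match exactly in the full files, a tolerance-based join such as `pandas.merge_asof` may be a better fit than an exact inner join.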

In [41]:
Dryer0 = pd.read_csv('MunVene2/DRYER0.csv', delimiter=';')

In [45]:
Dryer0.shape

(406811, 5)

In [44]:
Dryer0.head(20)

Unnamed: 0,VarName,TimeString,VarValue,Validity,Time_ms
0,6S1_MOTOR_CURRENT,10.02.2018 14:39:47,5007233,1,431416109657639
1,6S1_MOTOR_CURRENT,10.02.2018 14:39:52,5007233,1,43141611024213
2,6S1_MOTOR_CURRENT,10.02.2018 14:39:58,5007233,1,431416110826968
3,6S1_MOTOR_CURRENT,10.02.2018 14:40:03,5007233,1,431416111411574
4,6S1_MOTOR_CURRENT,10.02.2018 14:40:08,5007233,1,431416111996296
5,6S1_MOTOR_CURRENT,10.02.2018 14:40:13,5007233,1,431416112580903
6,6S1_MOTOR_CURRENT,10.02.2018 14:40:18,5007233,1,431416113165625
7,6S1_MOTOR_CURRENT,10.02.2018 14:40:23,5007233,1,431416113750231
8,6S1_MOTOR_CURRENT,10.02.2018 14:40:28,5007233,1,431416114334954
9,6S1_MOTOR_CURRENT,10.02.2018 14:40:33,5007233,1,431416114919792


In [50]:
FluegasTemp0 = pd.read_csv('MunVene2/FLUEGAS0.csv', delimiter=';')

In [51]:
FluegasTemp0.shape

(400478, 5)

In [54]:
FluegasTemp0.head()

Unnamed: 0,VarName,TimeString,VarValue,Validity,Time_ms
0,INCI1_DB_PV_FLUEGAS_TEMPERATURE,06.02.2018 04:27:10,75,1,431371855316088
1,INCI1_DB_PV_FLUEGAS_TEMPERATURE,06.02.2018 04:27:15,75,1,43137185590081
2,INCI1_DB_PV_FLUEGAS_TEMPERATURE,06.02.2018 04:27:20,75,1,431371856485417
3,INCI1_DB_PV_FLUEGAS_TEMPERATURE,06.02.2018 04:27:25,74,1,431371857070139
4,INCI1_DB_PV_FLUEGAS_TEMPERATURE,06.02.2018 04:27:30,74,1,431371857654745


In [55]:
FluegasFlow0 = pd.read_csv('MunVene2/ID_FAN0.csv', delimiter=';')

In [56]:
FluegasFlow0.head()

Unnamed: 0,VarName,TimeString,VarValue,Validity,Time_ms
0,INCI1_DB_PV_IDFAN_SPEED,24.02.2018 21:24:51,60,1,431558922612732
1,INCI1_DB_PV_IDFAN_SPEED,24.02.2018 21:24:52,60,1,431558922729745
2,INCI1_DB_PV_IDFAN_SPEED,24.02.2018 21:24:53,60,1,431558922846644
3,INCI1_DB_PV_IDFAN_SPEED,24.02.2018 21:24:54,60,1,431558922963657
4,INCI1_DB_PV_IDFAN_SPEED,24.02.2018 21:24:55,60,1,431558923080556
