# AquaPredict

## Dataset Information

### Location:
An Internet of Things Labelled Dataset for Aquaponics Fish Pond Water Quality Monitoring System,
HiPIC Research Group, Department of Computer Science, University of Nigeria Nsukka, Nigeria

### Contact:
Collins Udanor, email: collins.udanor@unn.edu.ng
Blessing Ogbuokiri, email: blessing.ogbuokiri@unn.edu.ng

### Dataset Information:
Aquaponics meta-data
The enclosed datasets are generated from freshwater aquaponics catfish ponds. The datasets are generated automatically at 5-second intervals using the following water quality sensors, driven by an ESP32 microcontroller: Dallas Instrument Temperature sensor (DS18B20), DF Robot Turbidity sensor, DF Robot Dissolved Oxygen sensor, DF Robot pH sensor V2.2, MQ-137 Ammonia sensor, and MQ-135 Nitrate sensor.
The project is funded by the Lacuna Award for Agriculture in Sub-Saharan Africa 2020 under the management of the Meridian Institute Colorado, USA.
The datasets and results in this section are sensor readings from June to mid-October 2021. There are 12 datasets, one from each of 12 aquaponics catfish ponds. The IoT unit of each pond has six sensors (temperature, turbidity, dissolved oxygen, pH, ammonia, nitrate). At the time of this report, each unit had generated over 170,000 instances. The datasets are downloaded at intervals, cleaned, and labelled.

#### The attributes are:
1) Date/Time
2) Temperature
3) Turbidity
4) Dissolved Oxygen (DO)
5) pH
6) Ammonia
7) Nitrate
8) Population of fish in the pond
9) Length of Fish
10) Weight of Fish

### Attribute Information:
All attributes are continuous

No statistics are available, but standardising the variables is suggested for certain uses (e.g. with classifiers that are NOT scale-invariant).
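
A minimal sketch of such a standardisation (z-scores) with pandas. The values and the column names `temperature` and `ph` are synthetic placeholders chosen for illustration, not readings from the dataset:

```python
import pandas as pd

# Synthetic readings, for illustration only (not taken from the dataset).
readings = pd.DataFrame({
    "temperature": [24.1, 25.3, 26.0, 24.8],
    "ph": [7.1, 6.9, 7.4, 7.0],
})

# Z-score standardisation: subtract each column's mean and divide by its
# (population) standard deviation, so every column has mean 0 and std 1.
standardised = (readings - readings.mean()) / readings.std(ddof=0)
print(standardised)
```

When a model is evaluated on held-out data, the mean and standard deviation would normally be computed on the training split only and reused for the test split.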

*NOTE:* Attributes 9 and 10 are class identifiers that can be used differently. For example, one may need to predict the length of the fish using the water quality parameters (2-7), optionally together with the population or stocking density (parameter 8). The same can be done using the weight attribute.

Source: [https://www.kaggle.com/datasets/ogbuokiriblessing/sensor-based-aquaponics-fish-pond-datasets](https://www.kaggle.com/datasets/ogbuokiriblessing/sensor-based-aquaponics-fish-pond-datasets)

## Imports

In [None]:
%load_ext autoreload
%autoreload 2

%matplotlib inline

In [None]:
import os  # used later to create the output directory

from fastai.imports import pd, plt, np

In [None]:
plt.rcParams['figure.figsize'] = (20, 5)

## Constants

In [None]:
OPTIMAL_WATER_TEMP = 24.5
OPTIMAL_BED_TEMP = 24.5

## Data

Use the data for the first pond from [Kaggle](https://www.kaggle.com/datasets/ogbuokiriblessing/sensor-based-aquaponics-fish-pond-datasets).

In [None]:
df = pd.read_csv("data/IoTpond1.csv", parse_dates=["created_at"])

In [None]:
df.drop(['entry_id', 'Turbidity(NTU)', 'Dissolved Oxygen(g/ml)', 'PH', 'Ammonia(g/ml)', 'Nitrate(g/ml)','Population', 'Fish_Length(cm)', 'Fish_Weight(g)'], axis=1, inplace=True, errors='ignore')

In [None]:
# create DatetimeIndex from created_at
df.index = df.created_at

In [None]:
df

In [None]:
df.dtypes

In [None]:
df.describe(datetime_is_numeric=True)

Let us assume that the given temperature parameter describes the water temperature.

In [None]:
# rename columns Temperature (C) to temp_water
df.rename(columns={'Temperature (C)': 'temp_water'}, inplace=True)

In [None]:
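# remove rows with temp_water == -127 (treated here as invalid sensor readings)
# .copy() avoids a SettingWithCopyWarning when columns are added later
df = df[df.temp_water != -127].copy()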

In [None]:
# plot temp_water as line charts, aggregated by day and by month
plt.plot(df.temp_water.resample('D').mean(), label='temp_water day')
plt.plot(df.temp_water.resample('M').mean(), label='temp_water month')
plt.legend()
plt.show()


Assume the ambient temperature ranges from 21 to 29 °C.

[source](https://www.climatestotravel.com/climate/nigeria)

In [None]:
print(np.mean([21,29]))

The ambient and grow bed temperatures are then assumed to have a higher variance than the water temperature.

In [None]:
# add a 3-day simple moving average of temp_water to df
df['temp_water_sma'] = df["temp_water"].rolling('3D').mean()
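
To illustrate how a time-based rolling window such as `'3D'` behaves, here is a small sketch on a synthetic daily series (values invented for illustration). It uses a `'2D'` window so the result is easy to check by hand. An offset window needs a sorted `DatetimeIndex` and covers the interval `(t - window, t]`:

```python
import pandas as pd

# Synthetic daily series, for illustration only.
index = pd.date_range("2021-08-01", periods=5, freq="D")
temps = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=index)

# Time-based window: each mean covers the current reading and every
# earlier reading less than 2 days old, however many rows that is.
sma_2d = temps.rolling("2D").mean()
print(sma_2d)
```

Unlike `rolling(3)`, which always averages a fixed number of rows, a time-based window keeps its meaning when readings are missing or irregularly spaced.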

In [None]:
# plot temp_water and temp_water_sma
plt.plot(df.temp_water, label='temp_water')
plt.plot(df.temp_water_sma, label='temp_water_sma')
plt.legend()
# figure width is already set via plt.rcParams['figure.figsize'] above
plt.show()

Amplify the distance of `temp_water` from its moving average (`sma`) to synthesise grow bed and ambient temperatures.
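
The idea can be sketched on a toy series. The helper name `amplify_deviation` and the values are hypothetical, chosen only to make the arithmetic easy to follow. A constant baseline stands in for the moving average:

```python
import pandas as pd

def amplify_deviation(values, baseline, factor):
    """Scale the distance of `values` from `baseline` by `factor`."""
    return baseline + (values - baseline) * factor

# Toy data, for illustration only.
water = pd.Series([24.0, 25.0, 26.0, 25.0])
baseline = pd.Series([25.0, 25.0, 25.0, 25.0])  # stand-in for the moving average

bed = amplify_deviation(water, baseline, 2)      # deviations doubled
ambient = amplify_deviation(water, baseline, 3)  # deviations tripled
print(pd.DataFrame({"water": water, "bed": bed, "ambient": ambient}))
```

Because every deviation from the baseline is multiplied by the factor, the derived series swings more widely around the same baseline, which is how the higher variance is obtained.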

In [None]:
# Amplify distance from `sma` to `temp_water`
df['temp_bed'] = df['temp_water_sma'] + (df['temp_water'] - df['temp_water_sma']) * 2
df['temp_ambient'] = df['temp_water_sma'] + (df['temp_water'] - df['temp_water_sma']) * 3

In [None]:
df.describe()

In [None]:
# get slice of df from 2021-08-01 to 2021-08-07
df_slice = df.loc['2021-08-01':'2021-08-07']
# plot temp_water and temp_water_sma and temp_bed and temp_ambient
plt.plot(df_slice.temp_water, label='temp_water')
plt.plot(df_slice.temp_water_sma, label='temp_water_sma')
plt.plot(df_slice.temp_bed, label='temp_bed')
plt.plot(df_slice.temp_ambient, label='temp_ambient')
plt.legend()

plt.show()

Add a time offset (lag) to the water and grow bed temperatures. Shifting by datetime is more precise, but it would need interpolation to match the indexes.
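
A rough sketch of what the datetime-based shift with interpolation could look like. The helper `shift_by_time` is hypothetical (it is not part of this notebook or of pandas), the values are synthetic, and the sketch assumes a sorted `DatetimeIndex` without duplicate timestamps:

```python
import pandas as pd

def shift_by_time(series: pd.Series, lag: str) -> pd.Series:
    """Delay `series` by `lag` and interpolate back onto its original timestamps."""
    shifted = series.copy()
    shifted.index = shifted.index + pd.Timedelta(lag)  # move every reading later in time
    # Combine shifted and original timestamps, interpolate linearly in time,
    # then keep only the original timestamps.
    combined_index = shifted.index.union(series.index)
    interpolated = shifted.reindex(combined_index).interpolate(method="time")
    return interpolated.reindex(series.index)

# Irregularly spaced synthetic readings, for illustration only.
index = pd.to_datetime(["2021-08-01 00:00", "2021-08-01 00:10", "2021-08-01 00:25"])
temps = pd.Series([20.0, 22.0, 25.0], index=index)

lagged = shift_by_time(temps, "5min")
print(lagged)
```

Timestamps earlier than the first shifted reading have nothing to interpolate from and stay `NaN`.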

In [None]:
# Variant 1 (kept for reference, not used below): shift by datetime
# shift temp_bed by 3 hours and keep only timestamps present in both frames
df_shift = df.shift(periods=3, freq='H')
left, right = df.align(df_shift, join='inner', axis=0)
left["temp_bed"] = right["temp_bed"]

# shift temp_water by 5 hours
df_shift = df.shift(periods=5, freq='H')
left_2, right_2 = df.align(df_shift, join='inner', axis=0)
left_2["temp_water"] = right_2["temp_water"]

# Variant 2 (used below): shift by a fixed number of rows
df_shift_simple = df.copy()  # copy so that df itself is not modified
df_shift_simple["temp_bed"] = df["temp_bed"].shift(periods=200)
df_shift_simple["temp_water"] = df["temp_water"].shift(periods=400)

df_shifted = df_shift_simple

In [None]:
# get slice of df_shifted from 2021-08-01 to 2021-08-12
df_slice = df_shifted.loc['2021-08-01':'2021-08-12']
# plot temp_water and temp_water_sma and temp_bed and temp_ambient
plt.plot(df_slice.temp_water, label='temp_water' )
plt.plot(df_slice.temp_water_sma, label='temp_water_sma')
plt.plot(df_slice.temp_bed, label='temp_bed')
plt.plot(df_slice.temp_ambient, label='temp_ambient')

plt.legend()

In [None]:
df = df_shifted

Add heater values
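
A small sketch of the threshold rule on synthetic values: the heater flag is 1 when the temperature is below the optimum and 0 otherwise. The name `OPTIMAL_TEMP` is a placeholder that mirrors the value of the constants defined earlier. Note that a comparison with `NaN` is false, so rows with missing temperatures (such as the first rows after a shift) get a flag of 0:

```python
import numpy as np

OPTIMAL_TEMP = 24.5  # placeholder with the same value as the notebook constants

# Synthetic temperatures, for illustration only; the last one is missing.
temps = np.array([23.0, 24.5, 26.0, np.nan])

# 1 = heater on (too cold), 0 = heater off; NaN < threshold evaluates to False.
heater_on = np.where(temps < OPTIMAL_TEMP, 1, 0).astype("int8")
print(heater_on)
```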

In [None]:
# add heater columns: 1 if the temperature is below its optimum, 0 otherwise
df['water_heater'] = np.where(df['temp_water'] < OPTIMAL_WATER_TEMP, 1, 0).astype('int8')
df['bed_heater'] = np.where(df['temp_bed'] < OPTIMAL_BED_TEMP, 1, 0).astype('int8')
df.dtypes

In [None]:
# count the 0/1 values of each heater column (one column per heater)
counts = df[["bed_heater", "water_heater"]].apply(pd.Series.value_counts)
counts.plot(kind='bar', title='bed_heater vs water_heater')

Save dataset in feather format

In [None]:
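# drop the duplicate created_at column, then move the DatetimeIndex back into
# a regular column (to_feather expects a default index)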
df.drop(['created_at'], axis=1, inplace=True)
df.reset_index(inplace=True)
os.makedirs('tmp', exist_ok=True)
df.to_feather('tmp/aquaponics.feather')

In [None]:
df = pd.read_feather('tmp/aquaponics.feather')
df.index = df.created_at