### This project analyzes soil data received from sensors. Our main goal is to compare the datasets and determine which sensor gives the more reliable measurements.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd

In [None]:
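# dataset names; each one corresponds to a CSV file in datasets/
list_data = ['L10_switch4', 'L11_switch4', 'L13_sensor123', 'L23_sensor123', 'L03_sensor003', 'L14_sensor003']

# load each CSV into a DataFrame keyed by dataset name
data = {}
for name in list_data:
    data[name] = pd.read_csv('datasets/{}.csv'.format(name))

# preview the first three rows of each dataset
for key, value in data.items():
    print('{}\n{}\n'.format(key, value.head(3)))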

---
Some datasets have additional Air_Hum and Air_temp columns. These features were not measured, so we can drop them.
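
If it is not known in advance which datasets carry the extra columns, the same drop can be applied to every dataset. A minimal sketch, assuming pandas' `errors='ignore'` option and small made-up frames (`demo`) in place of the real CSV files:

```python
import pandas as pd

# Made-up stand-ins for two datasets; the real ones are loaded from datasets/*.csv
demo = {
    'with_air': pd.DataFrame({'Moisture': [10.0, 11.0], 'Air_Hum': [0, 0], 'Air_temp': [0, 0]}),
    'without_air': pd.DataFrame({'Moisture': [12.0, 13.0]}),
}

# errors='ignore' skips columns that a frame does not have,
# so the same call works for every dataset
for name in demo:
    demo[name] = demo[name].drop(columns=['Air_Hum', 'Air_temp'], errors='ignore')

print({name: list(df.columns) for name, df in demo.items()})
```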

---

In [None]:
# drop Air_Hum, Air_temp columns

drop_data = ['L13_sensor123', 'L23_sensor123', 'L03_sensor003']

for i in drop_data:
    data[i] = data[i].drop(['Air_Hum', 'Air_temp'], axis = 1)

In [None]:
fig = plt.figure(figsize = (32, 32))

for i in range(0,6):
    ax = fig.add_subplot(3,2,i+1)

    ax.plot(data[list_data[i]]['Moisture'], label = 'Moisture')
    ax.set_title(list_data[i])
    for key in ax.spines:
        ax.spines[key].set_visible(False)
    ax.tick_params(bottom = False, top = False, left = False, right = False)



---
The moisture plots above show that the sensors usually report similar values, except for switch4, whose values jump too high. We observe this pattern in every dataset measured by switch4, and below we can see the large difference between its minimum and maximum values.
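
The same comparison can be made numerically for every dataset at once. A rough sketch, using made-up readings rather than the real measurements; `moisture_spread` is a hypothetical helper name:

```python
import pandas as pd

def moisture_spread(datasets, column='Moisture'):
    """Return min, max and range (max - min) of one column per dataset."""
    rows = {
        name: {'min': df[column].min(), 'max': df[column].max()}
        for name, df in datasets.items()
    }
    summary = pd.DataFrame.from_dict(rows, orient='index')
    summary['range'] = summary['max'] - summary['min']
    # widest spread first
    return summary.sort_values('range', ascending=False)

# Made-up readings for illustration only
demo = {
    'steady': pd.DataFrame({'Moisture': [20.0, 21.0, 20.5]}),
    'jumpy': pd.DataFrame({'Moisture': [20.0, 95.0, 18.0]}),
}
print(moisture_spread(demo))
```

Called as `moisture_spread(data)` on the dictionary loaded earlier, this would list the datasets with the widest spread first.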

---

In [None]:
print('Moisture L10_switch4\n\nmin: {}\nmax: {}'.format(data['L10_switch4']['Moisture'].min(), data['L10_switch4']['Moisture'].max()))


---  
We will focus on sensor003, whose data is more meaningful and sufficient.  
The next cell plots all features for sensor003.  

---

In [None]:
L14 = data['L14_sensor003']
L03 = data['L03_sensor003']

In [None]:
def plot_sensor003(datasets1, datasets2):    
    fig = plt.figure(figsize = (12, 10))
    plt.subplot(3,2,1)
    plt.plot(datasets1['Moisture'], label = 'Moisture')
    plt.title('L14_sensor003')
    plt.legend()

    plt.subplot(3,2,2)
    plt.plot(datasets2['Moisture'], label = 'Moisture')
    plt.title('L03_sensor003')

    plt.subplot(3,2,3)
    plt.plot(datasets1['Soil_temp'], c = 'red', label = 'Soil temp')
    plt.legend()

    plt.subplot(3,2,4)
    plt.plot(datasets2['Soil_temp'], c = 'red', label = 'Soil temp')

    plt.subplot(3,2,5)
    plt.plot(datasets1['EC'], c = 'orange', label = 'EC')
    plt.legend()

    plt.subplot(3,2,6)
    plt.plot(datasets2['EC'], c = 'orange', label = 'EC')
    return fig

sensor003 = plot_sensor003(L14, L03)
                               

---
* Soil temp for L14 has some unreasonable values that we can remove.
---


In [None]:
# statistics for L14, min and max values 

L14['Soil_temp'].describe()

In [None]:
# values with temp higher than 30 degrees

high_temp = L14[L14['Soil_temp'] > 30]
high_temp

In [None]:
# values with temp lower than 20 degrees

low_temp = L14[L14['Soil_temp'] < 20]
low_temp

In [None]:
# filtering out these values

data['L14_sensor003'] = data['L14_sensor003'][(data['L14_sensor003']['Soil_temp'] > 20) & (data['L14_sensor003']['Soil_temp'] < 30)]

In [None]:

data['L14_sensor003']['Soil_temp'].describe()

In [None]:
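# pass the frames explicitly; look L14 up in data again so the filtered version is plotted
sensor003 = plot_sensor003(data['L14_sensor003'], data['L03_sensor003'])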

---

The graphs above actually show two experiments. With L14 we measured the soil continuously without any interference. With L03, the soil was measured for 90 minutes and then heated in an oven for another 90 minutes. We repeated this cycle 3 times per day, and the pattern is visible every time the values jump. For soil_temp it is reasonable that the values increased after heating, but we expected moisture to decrease. The moisture may have increased after heating because the position of the coil in the soil changed.
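
One way to locate the heating cycles is to flag the points where a reading rises sharply from one sample to the next. A minimal sketch on made-up values; the helper name `find_jumps` and the threshold of 5 are assumptions, not values taken from the experiment:

```python
import pandas as pd

def find_jumps(series, threshold):
    """Return index labels where the value rose by more than threshold since the previous sample."""
    rise = series.diff()  # difference to the previous sample; the first entry is NaN
    return rise[rise > threshold].index.tolist()

# Made-up temperature trace with two sudden rises
temps = pd.Series([22.0, 22.5, 35.0, 34.0, 23.0, 36.0])
print(find_jumps(temps, threshold=5))
```

Applied to a column such as `data['L03_sensor003']['Soil_temp']`, the threshold would need tuning against the actual sampling interval.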

---