# 0. Background

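This model predicts whether a plant should be watered on a given future day, based on that day's temperature and humidity and the date of the last watering. It suits beginner plant owners like me, who either forget to water or water too much and need a model to make the call.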

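For example, if the last watering was a week ago, the plant should be watered today; if the weather is very hot or the air is very dry, the interval between waterings should be shorter. If possible, a later version may also take rainfall into account: in thunderstorm weather, for instance, watering should be less frequent.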
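The rule of thumb above can be written down as a small hand-coded baseline, which is useful later as a sanity check for the trained model. This is only a sketch: the function name `should_water`, the thresholds `hot_above` and `dry_below`, and the one-day reductions are all hypothetical choices for illustration; only the one-week base interval comes from the text.

```python
def should_water(elapsed_days, temperature, humidity,
                 base_interval=7, hot_above=30.0, dry_below=40.0):
    """Rule-of-thumb baseline; all thresholds are illustrative assumptions."""
    interval = base_interval
    if temperature > hot_above:   # hot weather: water one day sooner (assumed)
        interval -= 1
    if humidity < dry_below:      # dry air: water one day sooner (assumed)
        interval -= 1
    return elapsed_days >= interval


print(should_water(7, 20, 50))   # a week has passed at mild conditions
print(should_water(5, 35, 30))   # hot and dry, so the interval shrinks to 5 days
```

A baseline like this gives something concrete to compare the learned classifier against.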

Whether or not to water is clearly a binary classification problem.

# 1. Training the model

### 1.1 Imports

In [92]:
import pandas as pd
import time 
import numpy as np
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense

### 1.2 Loading the historical data

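The file records the dates on which the plant was watered, together with the temperature and humidity on each of those days.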

In [93]:
history = pd.read_csv("./flower.csv", usecols=['date', 'temperature','humidity'])
print(history)

          date  temperature  humidity
0     2019/1/1           20        50
1     2019/1/4           20        50
2     2019/1/7           20        50
3    2019/1/10           20        50
4    2019/1/13           20        50
..         ...          ...       ...
123  2020/6/19           30        60
124  2020/6/22           30        60
125  2020/6/25           30        60
126  2020/6/28           30        60
127   2020/7/1           30        60

[128 rows x 3 columns]


### 1.3 Preprocessing the data

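The dates themselves are not model inputs, but the intervals between them are, so the data is preprocessed below (the output further down shows the result).
After preprocessing, data is a two-dimensional list whose elements are lists of the form [days elapsed, temperature, humidity], and label is a two-dimensional list whose elements are [whether to water (1 = yes, 0 = no)].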

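In addition, flower.csv only records the days on which the plant was watered, so the days without watering are generated here as well. Otherwise every value in label would be 1 and binary classification would be impossible.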
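To make the labeling scheme concrete, here is a minimal sketch of how one pair of consecutive watering dates expands into daily rows. It uses the standard `datetime` module instead of the notebook's `time`-based helpers, and `expand_interval` is a hypothetical name introduced only for this illustration. Temperature and humidity interpolation is left out to keep the example short.

```python
from datetime import datetime, timedelta

DATE_FORMAT = "%Y/%m/%d"


def expand_interval(last_date, next_date):
    """Return (date, elapsed_days, watered) for each day after last_date up to next_date."""
    start = datetime.strptime(last_date, DATE_FORMAT).date()
    end = datetime.strptime(next_date, DATE_FORMAT).date()
    gap = (end - start).days
    rows = []
    for k in range(1, gap + 1):
        day = start + timedelta(days=k)
        # Only the final day of the interval is a watering day (label 1)
        rows.append((day.strftime(DATE_FORMAT), k, 1 if k == gap else 0))
    return rows


for row in expand_interval("2019/1/1", "2019/1/4"):
    print(row)
```

Working on `date` objects avoids any dependence on the local time zone, and every generated date comes out zero-padded in one consistent format.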

In [94]:
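# Number of days between two dates (dateStr2 - dateStr1)
def dateDiff(dateStr1, dateStr2):
    date1 = int(time.mktime(time.strptime(dateStr1, "%Y/%m/%d")))
    date2 = int(time.mktime(time.strptime(dateStr2, "%Y/%m/%d")))
    # Round rather than truncate so a daylight-saving shift cannot drop a day
    return int(round((date2 - date1) / 3600 / 24))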

# Add a number of days to a date string
def dateAdd(dateStr, days):
    date = int(time.mktime(time.strptime(dateStr, "%Y/%m/%d")))
    date += days * 24 * 3600
    timeArray = time.localtime(date)
    return time.strftime("%Y/%m/%d", timeArray)

# Build the full dataset, including the days without watering
dateTimeList = []
data = []
label = []

for index in history.index:
    item = list(history.loc[index])
    # print(item)  # type(history.loc[index]) is a pandas Series
    
    if index == 0:
        # The first record has no previous watering, so the elapsed days are 0
        elapsedDays = 0
        date = item[0]
        temperature = item[1]
        humidity = item[2]
        dateTimeList.append(date)
        data.append([float(elapsedDays), float(temperature), float(humidity)])
        label.append([1.])
    else:
        # Days elapsed since the previous watering
        lastDate = list(history.loc[index - 1])
        date = item[0]
        elapsedDays = dateDiff(lastDate[0], date)
        temperature = item[1]
        humidity = item[2]

        # Generate rows for the days strictly between lastDate and item (linearly interpolated)
        if elapsedDays > 1:
            temperatureList = np.linspace(lastDate[1], temperature, elapsedDays+1, dtype="float")
            humidityList = np.linspace(lastDate[2], humidity, elapsedDays+1, dtype="float")
            for k in range(1, elapsedDays):
                dateTimeList.append(dateAdd(lastDate[0], k))
                data.append([float(k), float(temperatureList[k]), float(humidityList[k])])
                label.append([0.])
        
        dateTimeList.append(date)
        data.append([float(elapsedDays), float(temperature), float(humidity)])
        label.append([1.])

Every day now has a row. In the output below, the last column indicates whether the plant was watered:

In [95]:
print(len(data), len(label))
print(np.concatenate((data,label),axis=1))

548 548
[[ 0. 20. 50.  1.]
 [ 1. 20. 50.  0.]
 [ 2. 20. 50.  0.]
 ...
 [ 1. 30. 60.  0.]
 [ 2. 30. 60.  0.]
 [ 3. 30. 60.  1.]]


### 1.4 Building the training and test sets

In [96]:
# The first trainCount rows are training data; the rest are test data
trainCount = 485

# Training features, labels and dates
trainX = tf.reshape(np.array(data)[:trainCount, :], shape=[-1, 3])
trainY = tf.reshape(label[:trainCount], shape=[-1])
trainDateTimeList = dateTimeList[:trainCount]

# Test features, labels and dates
testX = tf.reshape(np.array(data)[trainCount:, :], shape=[-1, 3])
testY = tf.reshape(label[trainCount:], shape=[-1])
testDateTimeList = dateTimeList[trainCount:]

### 1.5 Building the model

In [97]:
def build_model():
    model= Sequential()
    model.add(Dense(4, activation='relu', input_dim=3))
    # Dense(4) is a fully-connected layer with 4 hidden units.
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='sgd')
    return model

def train(model, x, y):
    print('Training........')
    for step in range(1001):
        cost = model.train_on_batch(x, y)
        if step % 100 == 0:
            print('COST:', cost)
            
def test(model, x, y):
    print('Testing ------------')
    cost = model.evaluate(x, y, steps=40)
    print('test cost:', cost)
    W, b = model.layers[0].get_weights()
    print('Weights=', W, '\nbiases=', b)
      

model = build_model()
train(model, trainX, trainY)
test(model, testX, testY)

Training........
COST: 9.554024
COST: 0.5286655
COST: 0.52853525
COST: 0.5284359
COST: 0.52835834
COST: 0.5282964
COST: 0.5282465
COST: 0.52820534
COST: 0.5281713
COST: 0.5281424
COST: 0.5281179
Testing ------------
test cost: 0.01668531894683838
Weights= [[ 0.6021092   0.18700011 -0.20357186 -0.52960354]
 [ 0.07448053 -0.16953301  0.17461216 -0.12051457]
 [-0.26195943  0.23027727 -0.26747638  0.04654618]] 
biases= [ 0.         -0.01334758  0.         -0.00044463]


### 1.6 Validating the model

In [111]:
y_pred = model.predict(testX, steps=1)

# The sigmoid output is thresholded at 0.5: doubling it and truncating to int maps values below 0.5 to 0 and values from 0.5 upward to 1
y_pred = (y_pred*2).astype('int')

with tf.Session() as sess:
    datetimes = np.array(testDateTimeList).reshape((len(testDateTimeList), 1))
    print(np.concatenate((datetimes, testX.eval(), y_pred), axis=1))

[['2020/04/30' '5.0' '20.0' '50.0' '0']
 ['2020/5/1' '6.0' '20.0' '50.0' '0']
 ['2020/05/02' '1.0' '20.0' '50.0' '0']
 ['2020/05/03' '2.0' '20.0' '50.0' '0']
 ['2020/05/04' '3.0' '20.0' '50.0' '0']
 ['2020/05/05' '4.0' '20.0' '50.0' '0']
 ['2020/05/06' '5.0' '20.0' '50.0' '0']
 ['2020/5/7' '6.0' '20.0' '50.0' '0']
 ['2020/05/08' '1.0' '20.0' '50.0' '0']
 ['2020/05/09' '2.0' '20.0' '50.0' '0']
 ['2020/05/10' '3.0' '20.0' '50.0' '0']
 ['2020/05/11' '4.0' '20.0' '50.0' '0']
 ['2020/05/12' '5.0' '20.0' '50.0' '0']
 ['2020/5/13' '6.0' '20.0' '50.0' '0']
 ['2020/05/14' '1.0' '20.0' '50.0' '0']
 ['2020/05/15' '2.0' '20.0' '50.0' '0']
 ['2020/05/16' '3.0' '20.0' '50.0' '0']
 ['2020/05/17' '4.0' '20.0' '50.0' '0']
 ['2020/05/18' '5.0' '20.0' '50.0' '0']
 ['2020/5/19' '6.0' '20.0' '50.0' '0']
 ['2020/05/20' '1.0' '25.0' '50.0' '0']
 ['2020/5/21' '2.0' '30.0' '50.0' '0']
 ['2020/05/22' '1.0' '30.0' '50.0' '0']
 ['2020/5/23' '2.0' '30.0' '50.0' '0']
 ['2020/05/24' '1.0' '30.0' '50.0' '0']
 ['2020/

As the output above shows, the last column of every row is 0, which is clearly inaccurate.
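One possible way to diagnose this is to compare the training loss with the loss of a "model" that ignores its inputs and always outputs the fraction of positive labels. The sketch below computes that reference value; `constant_predictor_loss` is a hypothetical helper, not part of the notebook, and the toy labels are made up for illustration.

```python
import numpy as np


def constant_predictor_loss(labels):
    """Binary cross-entropy of always predicting the positive-label rate."""
    y = np.asarray(labels, dtype=float).ravel()
    # Clip to avoid log(0) when all labels are identical
    p = np.clip(y.mean(), 1e-7, 1 - 1e-7)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


# Toy labels in the same [[1.], [0.], ...] shape as the notebook's label list
toy_labels = [[1.], [0.], [0.], [0.]]
print(constant_predictor_loss(toy_labels))
```

If `constant_predictor_loss(label[:trainCount])` comes out close to the plateau of about 0.528 seen in the training log, the network has probably collapsed to a near-constant probability below 0.5, which would explain the all-zero column. In that case, scaling the input features or weighting the rarer watering days more heavily may be worth trying.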