advanced techniques for improving the performance and generalization power of recurrent neural networks:
- recurrent dropout -- specific, built-in way to use dropout to fight overfitting in recurrent layers
- stacking recurrent layers -- increases the representational power of the network (at the cost of higher computational loads)
- bidirectional recurrent layers -- present the same information to a recurrent network in different ways, increasing accuracy and mitigating forgetting issues

### A Stock Price Forecasting Problem

In [None]:
import os 
import pandas as pd

data_dir = 'C:/Users/jim_c/Desktop/Task/local_base/benchmark_index'
fname = os.path.join(data_dir, '000300.SH.csv')

df = pd.read_csv(fname, header=0, 
                 usecols=[2,3,4,5,6,7],
                 names=["open", "high", "low", "close", "vol", "amount"])
df = df.reindex(index=df.index[::-1])

output_path='C:/Users/jim_c/Desktop/Task/Literature/Deep_Learning_with_Python/Deep_Learning_in_Practice/deep_learning_for_text_and_sequences/HS300.csv'
df.to_csv(output_path,sep=',',index=True,header=True)

In [None]:
# inspecting the data of the HS300 dataset
import os 

data_dir = 'C:/Users/jim_c/Desktop/Task/Literature/Deep_Learning_with_Python/Deep_Learning_in_Practice/deep_learning_for_text_and_sequences'
fname = os.path.join(data_dir, 'HS300.csv')

with open(fname) as f:
    data = f.read()

lines = data.split('\n')
header = lines[0].split(',')
lines = lines[1:]

print(header)
print(len(lines))

In [None]:
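# parsing the data
import numpy as np

# drop blank lines (e.g. the one left by a trailing newline) before parsing
rows = [line for line in lines if line.strip()]
float_data = np.zeros((len(rows), len(header) - 1))
for i, line in enumerate(rows):
    # skip the first column, which holds the row index written by to_csv
    values = [float(x) for x in line.split(',')[1:]]
    float_data[i, :] = values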

In [None]:
# plotting the closing price timeseries
from matplotlib import pyplot as plt

close_price = float_data[:, 3]
plt.plot(range(len(close_price)), close_price)

In [None]:
# plotting the most recent 250 days of the closing price timeseries
plt.plot(range(250), close_price[-250:])

### Preparing the Data

exact formulation:
- given data going as far back as lookback timesteps (a timestep is 1 trading day) and sampled every step timesteps, can you predict the price in delay timesteps?

- lookback = 60 (observations will go back about 3 months of trading days)
- step = 5 (observations will be sampled at one data point per trading week)
- delay = 1 (the target will be 1 trading day in the future)
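
To make the three parameters concrete, the sketch below lists which rows of the data array feed one sample and which row is its target. The anchor row `row = 100` is an arbitrary value chosen for illustration; the index arithmetic mirrors the `range(row - lookback, row, step)` call used by the generator later in this notebook.

```python
lookback = 60   # how far back the input window reaches, in trading days
step = 5        # keep one observation out of every 5 days
delay = 1       # predict 1 trading day past the anchor row

row = 100       # hypothetical anchor row, chosen only for illustration

# rows of the data array that form the input window
input_rows = list(range(row - lookback, row, step))
# row whose value becomes the target
target_row = row + delay

print(input_rows)       # [40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
print(len(input_rows))  # 12, i.e. lookback // step
print(target_row)       # 101
```

Note that the newest input row is `row - step` (95 here), so the last observation in the window is 6 trading days older than the target.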

- preprocess the data into a format a neural network can ingest; normalize each timeseries independently so that they all take small values on a similar scale
- write a Python generator that takes the current array of float data and yields batches of data from the recent past

preprocess the data by subtracting the mean of each timeseries and dividing by the standard deviation; the first 4000 timesteps will be used as the training data, so the mean and standard deviation are computed on that slice only

In [None]:
# normalizing the data
mean = float_data[:4000].mean(axis=0)
float_data -= mean
std = float_data[:4000].std(axis=0)
float_data /= std

the data generator yields a tuple (samples, targets), where samples is one batch of input data and targets is the corresponding array of target prices, with the following arguments:
- data -- the original array of floating-point data, which has been normalized
- lookback -- how many timesteps back the input data should go
- delay -- how many timesteps in the future the target should be
- min_index and max_index -- indices in the data array that delimit which timesteps to draw from; useful for keeping a segment of the data for validation and another for testing
- shuffle -- whether to shuffle the samples or draw them in chronological order
- batch_size -- the number of samples per batch
- step -- the period, in timesteps, at which you sample data; set to 5 in order to draw one data point per week

In [None]:
# generator yielding timeseries samples and their targets
def generator(data, lookback, delay, min_index, max_index,
              shuffle=False, batch_size=64, step=5):
    if max_index is None:
        max_index = len(data) - delay - 1
    i = min_index + lookback
    while True:
        if shuffle:
            # draw random anchor rows from the allowed segment
            rows = np.random.randint(
                min_index + lookback, max_index, size=batch_size)
        else:
            # walk through the segment in order, wrapping around at the end
            if i + batch_size >= max_index:
                i = min_index + lookback
            rows = np.arange(i, min(i + batch_size, max_index))
            i += len(rows)

        samples = np.zeros((len(rows),
                            lookback // step,
                            data.shape[-1]))
        targets = np.zeros((len(rows),))
        for j, row in enumerate(rows):
            indices = range(row - lookback, row, step)
            samples[j] = data[indices]
            # column 1 of the data array is used as the prediction target
            targets[j] = data[row + delay][1]
        yield samples, targets

use the generator function to instantiate three generators: one for training, one for validation, and one for testing:
- training -- the first 4000 timesteps
- validation -- the following 300 timesteps
- test -- the remainder
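
As a rough check on how many batches one chronological pass over a segment produces, the sketch below replays only the index bookkeeping of the generator's non-shuffled branch. `batches_per_pass` is a hypothetical helper written for this note, not part of the notebook's code, and it assumes the segment holds more than one full batch.

```python
def batches_per_pass(min_index, max_index, lookback, batch_size):
    """Count the batches yielded before the chronological cursor wraps around.

    Mirrors the non-shuffled branch of the generator: the cursor starts at
    min_index + lookback and resets once a full batch would reach max_index.
    """
    i = min_index + lookback
    count = 0
    while True:
        if i + batch_size >= max_index:
            # the generator resets the cursor here, which ends the pass
            break
        i += batch_size
        count += 1
    return count


# validation segment used in this notebook: rows 4001 to 4300
print(batches_per_pass(4001, 4300, lookback=60, batch_size=64))  # 3
```

With these settings three steps already cover one pass; drawing more batches from the validation generator simply cycles over the same three batches again.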

In [None]:
# preparing the training, validation, and test generators
lookback = 60
step = 5
delay = 1
batch_size = 64

train_gen = generator(float_data,
                      lookback=lookback,
                      delay=delay,
                      min_index=0,
                      max_index=4000,
                      shuffle=True,
                      step=step,
                      batch_size=batch_size)

val_gen = generator(float_data,
                    lookback=lookback,
                    delay=delay,
                    min_index=4001,
                    max_index=4300,
                    step=step,
                    batch_size=batch_size)

test_gen = generator(float_data,
                     lookback=lookback,
                     delay = delay,
                     min_index=4301,
                     max_index=None,
                     step=step,
                     batch_size=batch_size)

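# number of batches to draw from val_gen to see the validation segment once
val_steps = (4300 - 4001 - lookback) // batch_size

# number of batches to draw from test_gen to see the test segment once
test_steps = (len(float_data) - 4301 - lookback) // batch_size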

### A Common-Sense, Non-Machine-Learning Baseline

assume that the price series is continuous (the price on one day is likely to be close to the price a few days earlier); a common-sense approach is then to predict that the target value will equal the most recent value in the input window

evaluate with the mean absolute error metric: np.mean(np.abs(preds - targets))
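
A small worked example of the metric, using made-up numbers: the arrays `preds` and `targets` and the value `price_std` are placeholders, not results from the HS300 data. Because the series were standardized, multiplying a normalized MAE by the standard deviation of the target column converts it back to price units.

```python
import numpy as np

# made-up normalized predictions and targets, for illustration only
preds = np.array([0.10, -0.20, 0.35, 0.00])
targets = np.array([0.15, -0.10, 0.30, 0.20])

# mean absolute error in normalized units
mae = np.mean(np.abs(preds - targets))
print(mae)

# hypothetical standard deviation of the target column before normalization
price_std = 800.0
# the same error expressed in the original price units
print(mae * price_std)
```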

In [None]:
# computing the common-sense baseline MAE
def evaluate_naive_method():
    batch_maes = []
    for _ in range(val_steps):
        samples, targets = next(val_gen)
        # last timestep of each window, column 1 (the same column as the target)
        preds = samples[:, -1, 1]
        mae = np.mean(np.abs(preds - targets))
        batch_maes.append(mae)
    # report the MAE averaged over all validation batches
    print(np.mean(batch_maes))

evaluate_naive_method()

In [None]:
# converting the normalized MAE back to price units
# 0.29 is a placeholder: replace it with the MAE printed by the baseline above
price_mae = 0.29 * std[1]
price_mae

### A Basic Machine-Learning Approach

it's useful to try simple, cheap machine-learning models (such as small, densely connected networks) before looking into complicated and computationally expensive models such as RNNs; this makes sure that any further complexity you throw at the problem is legitimate and delivers real benefits

note the lack of an activation function on the last Dense layer, which is typical for a regression problem; MAE is used as the loss, and because you evaluate on the exact same data and with the exact same metric as the common-sense approach, the results will be directly comparable

In [None]:
# training and evaluating a densely connected model
from keras.models import Sequential
from keras import layers
from keras.optimizers import RMSprop

model = Sequential()
model.add(layers.Flatten(input_shape=(lookback // step, float_data.shape[-1])))
model.add(layers.Dense(32, activation='relu'))
model.add(layers.Dense(1))

model.compile(optimizer=RMSprop(), loss='mae')
# model.fit accepts generators directly, as in the later cells of this notebook
history = model.fit(train_gen,
                    steps_per_epoch=128,
                    epochs=10,
                    validation_data=val_gen,
                    validation_steps=val_steps)

In [None]:
# plotting results
import matplotlib.pyplot as plt

loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(loss) + 1)

plt.figure()

plt.plot(epochs, loss, 'bo', label='Training Loss')
plt.plot(epochs, val_loss, 'b', label='Validation Loss')
plt.title('Training and Validation Loss')
plt.legend()

plt.show()

You may wonder, if a simple, well-performing model exists to go from the data to the targets (the common-sense baseline), why doesn’t the model you’re training find it and improve on it? Because this simple solution isn’t what your training setup is looking for. The space of models in which you’re searching for a solution—that is, your hypothesis space—is the space of all possible two-layer networks with the configuration you defined. These networks are already fairly complicated. When you’re looking for a solution with a space of complicated models, the simple, well-performing baseline may be unlearnable, even if it’s technically part of the hypothesis space. That is a pretty significant limitation of machine learning in general: unless the learning algorithm is hardcoded to look for a specific kind of simple model, parameter learning can sometimes fail to find a simple solution to a simple problem.
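
To see that the baseline really is inside the hypothesis space, the sketch below hand-sets the weights of a tiny Flatten, Dense (ReLU), Dense network in NumPy so that it copies one input value to the output. It relies on the identity x = relu(x) - relu(-x). The shapes (12 timesteps, 6 features), the use of only 2 hidden units, and the random input batch are assumptions made for illustration; the notebook's model has 32 hidden units, and the remaining 30 could simply carry zero weights.

```python
import numpy as np

timesteps, features = 12, 6          # assumed shapes: lookback // step, number of columns
rng = np.random.default_rng(0)
samples = rng.normal(size=(4, timesteps, features))   # random stand-in batch

# common-sense baseline: last timestep, feature 1
baseline = samples[:, -1, 1]

# position of that value after flattening each sample
k = (timesteps - 1) * features + 1
flat = samples.reshape(len(samples), -1)

# hidden layer with two ReLU units: one sees +x_k, the other sees -x_k
w_hidden = np.zeros((timesteps * features, 2))
w_hidden[k, 0] = 1.0
w_hidden[k, 1] = -1.0
hidden = np.maximum(flat @ w_hidden, 0.0)

# output layer: relu(x_k) - relu(-x_k) == x_k
w_out = np.array([1.0, -1.0])
network_output = hidden @ w_out

print(np.allclose(network_output, baseline))  # True
```

The weights exist, but gradient descent starting from a random initialization is not guaranteed to find them.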

### A First Recurrent Baseline

In [None]:
# training and evaluating a GRU-based model
from keras.models import Sequential
from keras import layers
from keras.optimizers import RMSprop

model = Sequential()
model.add(layers.GRU(32, input_shape=(None, float_data.shape[-1])))
model.add(layers.Dense(1))

model.compile(optimizer=RMSprop(), loss='mae')
history = model.fit(train_gen,
                    steps_per_epoch=128,
                    epochs=12,
                    validation_data=val_gen,
                    validation_steps=val_steps)

In [None]:
# plotting results
import matplotlib.pyplot as plt

loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(loss) + 1)

plt.figure()

plt.plot(epochs, loss, 'bo', label='Training Loss')
plt.plot(epochs, val_loss, 'b', label='Validation Loss')
plt.title('Training and Validation Loss')
plt.legend()

plt.show()

### Using Recurrent Dropout to Fight Overfitting

In [None]:
# training and evaluating a dropout-regularized GRU-based model
from keras.models import Sequential
from keras import layers
from keras.optimizers import RMSprop

model = Sequential()
model.add(layers.GRU(32, 
                     dropout=0.5,
                     recurrent_dropout=0.5,
                     input_shape=(None, float_data.shape[-1])))
model.add(layers.Dense(1))

model.compile(optimizer=RMSprop(), loss='mae')
history = model.fit(train_gen,
                    steps_per_epoch=128,
                    epochs=10,
                    validation_data=val_gen,
                    validation_steps=val_steps)

In [None]:
# plotting results
import matplotlib.pyplot as plt

loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(loss) + 1)

plt.figure()

plt.plot(epochs, loss, 'bo', label='Training Loss')
plt.plot(epochs, val_loss, 'b', label='Validation Loss')
plt.title('Training and Validation Loss')
plt.legend()

plt.show()

### Stacking Recurrent Layers

because the model is no longer overfitting but seems to have hit a performance bottleneck, its capacity should be increased

increasing network capacity is typically done by increasing the number of units in the layers or adding more layers; to stack recurrent layers on top of each other in Keras, all intermediate layers should return their full sequence of outputs rather than their output at the last timestep; this is done by specifying 'return_sequences=True'
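
The difference between returning the full sequence and returning only the last output can be seen in a bare-bones recurrent loop. The function `simple_rnn` below is a hypothetical plain tanh RNN written in NumPy for illustration; it is not the GRU used in this notebook and not Keras code, but its output shapes follow the same pattern as `return_sequences=True` and `return_sequences=False`.

```python
import numpy as np

def simple_rnn(inputs, w_in, w_rec, return_sequences=False):
    """Run a plain tanh RNN over one sequence of shape (timesteps, features)."""
    state = np.zeros(w_rec.shape[0])
    outputs = []
    for x_t in inputs:
        # new state depends on the current input and the previous state
        state = np.tanh(x_t @ w_in + state @ w_rec)
        outputs.append(state)
    outputs = np.stack(outputs)
    return outputs if return_sequences else outputs[-1]

rng = np.random.default_rng(0)
timesteps, features, units = 12, 6, 32      # assumed sizes for illustration
sequence = rng.normal(size=(timesteps, features))
w_in = rng.normal(size=(features, units)) * 0.1
w_rec = rng.normal(size=(units, units)) * 0.1

full = simple_rnn(sequence, w_in, w_rec, return_sequences=True)
last = simple_rnn(sequence, w_in, w_rec)

print(full.shape)   # (12, 32): one output per timestep, usable by another recurrent layer
print(last.shape)   # (32,): only the final output
```

A second recurrent layer needs an input with a time axis, which is why only the full-sequence form can be stacked.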

In [None]:
# training and evaluating a dropout-regularized, stacked GRU model
from keras.models import Sequential
from keras import layers
from keras.optimizers import RMSprop

model = Sequential()
model.add(layers.GRU(32, 
                     dropout=0.2,
                     recurrent_dropout=0.5,
                     input_shape=(None, float_data.shape[-1]),
                     return_sequences=True))
model.add(layers.GRU(64, activation='relu',
                     dropout=0.2,
                     recurrent_dropout=0.5))
model.add(layers.Dense(1))

model.compile(optimizer=RMSprop(), loss='mae')
history = model.fit(train_gen,
                    steps_per_epoch=128,
                    epochs=16,
                    validation_data=val_gen,
                    validation_steps=val_steps)

In [None]:
# plotting results
import matplotlib.pyplot as plt

loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(loss) + 1)

plt.figure()

plt.plot(epochs, loss, 'bo', label='Training Loss')
plt.plot(epochs, val_loss, 'b', label='Validation Loss')
plt.title('Training and Validation Loss')
plt.legend()

plt.show()

### Using Bidirectional RNNs

a bidirectional RNN is a common RNN variant that can offer greater performance than a regular RNN on certain tasks; it is frequently used in natural-language processing

- RNNs are notably order dependent, or time dependent: they process the timesteps of their input sequences in order, and shuffling or reversing the timesteps can completely change the representations the RNN extracts from the sequence
- a bidirectional RNN exploits the order sensitivity of RNNs: it consists of two regular RNNs, such as GRU or LSTM layers, each of which processes the input sequence in one direction (chronologically and antichronologically), and then merges their representations
- by processing a sequence both ways, a bidirectional RNN can catch patterns that may be overlooked by a unidirectional RNN
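
A minimal NumPy sketch of the idea, under stated assumptions: `run_rnn` is a hypothetical plain tanh RNN standing in for the GRU or LSTM layers, the two directions get separate random weights, and the merge is a concatenation, which is the default merge mode of the Keras `Bidirectional` wrapper.

```python
import numpy as np

def run_rnn(inputs, w_in, w_rec):
    """Return the final state of a plain tanh RNN over (timesteps, features)."""
    state = np.zeros(w_rec.shape[0])
    for x_t in inputs:
        state = np.tanh(x_t @ w_in + state @ w_rec)
    return state

rng = np.random.default_rng(0)
timesteps, features, units = 12, 6, 8     # assumed sizes for illustration
sequence = rng.normal(size=(timesteps, features))

# each direction has its own weights
fwd_w_in, fwd_w_rec = rng.normal(size=(features, units)), rng.normal(size=(units, units))
bwd_w_in, bwd_w_rec = rng.normal(size=(features, units)), rng.normal(size=(units, units))

forward_repr = run_rnn(sequence, fwd_w_in, fwd_w_rec)          # chronological order
backward_repr = run_rnn(sequence[::-1], bwd_w_in, bwd_w_rec)   # reversed order

# merge the two views of the sequence
merged = np.concatenate([forward_repr, backward_repr])
print(merged.shape)   # (16,)
```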

In [None]:
# generator yielding reversed timeseries samples and their targets
def generator(data, lookback, delay, min_index, max_index,
              shuffle=False, batch_size=64, step=5):
    if max_index is None:
        max_index = len(data) - delay - 1
    i = min_index + lookback
    while True:
        if shuffle:
            # draw random anchor rows from the allowed segment
            rows = np.random.randint(
                min_index + lookback, max_index, size=batch_size)
        else:
            # walk through the segment in order, wrapping around at the end
            if i + batch_size >= max_index:
                i = min_index + lookback
            rows = np.arange(i, min(i + batch_size, max_index))
            i += len(rows)

        samples = np.zeros((len(rows),
                            lookback // step,
                            data.shape[-1]))
        targets = np.zeros((len(rows),))
        for j, row in enumerate(rows):
            indices = range(row - lookback, row, step)
            samples[j] = data[indices]
            # column 1 of the data array is used as the prediction target
            targets[j] = data[row + delay][1]
        # flip the time axis so each window runs from newest to oldest
        yield samples[:, ::-1, :], targets

In [None]:
# preparing the reversed training, validation, and test generators
lookback = 60
step = 5
delay = 1
batch_size = 64

train_gen = generator(float_data,
                      lookback=lookback,
                      delay=delay,
                      min_index=0,
                      max_index=4000,
                      shuffle=True,
                      step=step,
                      batch_size=batch_size)

val_gen = generator(float_data,
                    lookback=lookback,
                    delay=delay,
                    min_index=4001,
                    max_index=4300,
                    step=step,
                    batch_size=batch_size)

test_gen = generator(float_data,
                     lookback=lookback,
                     delay = delay,
                     min_index=4301,
                     max_index=None,
                     step=step,
                     batch_size=batch_size)

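# number of batches to draw from val_gen to see the validation segment once
val_steps = (4300 - 4001 - lookback) // batch_size

# number of batches to draw from test_gen to see the test segment once
test_steps = (len(float_data) - 4301 - lookback) // batch_size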

In [None]:
# training and evaluating a dropout-regularized, stacked GRU model on reversed sequences
from keras.models import Sequential
from keras import layers
from keras.optimizers import RMSprop

model = Sequential()
model.add(layers.GRU(32, 
                     dropout=0.2,
                     recurrent_dropout=0.5,
                     input_shape=(None, float_data.shape[-1]),
                     return_sequences=True))
model.add(layers.GRU(64, activation='relu',
                     dropout=0.2,
                     recurrent_dropout=0.5))
model.add(layers.Dense(1))

model.compile(optimizer=RMSprop(), loss='mae')
history = model.fit(train_gen,
                    steps_per_epoch=128,
                    epochs=16,
                    validation_data=val_gen,
                    validation_steps=val_steps)

In [None]:
# plotting results
import matplotlib.pyplot as plt

loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(loss) + 1)

plt.figure()

plt.plot(epochs, loss, 'bo', label='Training Loss')
plt.plot(epochs, val_loss, 'b', label='Validation Loss')
plt.title('Training and Validation Loss')
plt.legend()

plt.show()

the reversed-order GRU strongly underperforms even the common-sense baseline, indicating that chronological processing is important to success in this case

In [None]:
# training and evaluating an LSTM using reversed sequences
model = Sequential()
model.add(layers.LSTM(32,
                      dropout=0.2,
                      recurrent_dropout=0.5,
                      input_shape=(None, float_data.shape[-1]),
                      return_sequences=True))
model.add(layers.LSTM(64, activation='relu',
                      dropout=0.2,
                      recurrent_dropout=0.5))
model.add(layers.Dense(1))

model.compile(optimizer=RMSprop(), loss='mae')
history = model.fit(train_gen,
                    steps_per_epoch=128,
                    epochs=16,
                    validation_data=val_gen,
                    validation_steps=val_steps)

In [None]:
# plotting results
import matplotlib.pyplot as plt

loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(loss) + 1)

plt.figure()

plt.plot(epochs, loss, 'bo', label='Training Loss')
plt.plot(epochs, val_loss, 'b', label='Validation Loss')
plt.title('Training and Validation Loss')
plt.legend()

plt.show()

![How a Bidirectional RNN Layer Works](./bidir_rnn.png)

In [None]:
# training and evaluating a bidirectional LSTM
model = Sequential()
model.add(layers.Bidirectional(layers.LSTM(32, 
                                          dropout=0.2,
                                          recurrent_dropout=0.5,
                                          input_shape=(None, float_data.shape[-1]),
                                          return_sequences=True)))
model.add(layers.Bidirectional(layers.LSTM(64, activation='relu',
                                          dropout=0.2,
                                          recurrent_dropout=0.5)))
model.add(layers.Dense(1))

model.compile(optimizer=RMSprop(), loss='mae')
history = model.fit(train_gen,
                    steps_per_epoch=128,
                    epochs=16,
                    validation_data=val_gen,
                    validation_steps=val_steps)

In [None]:
# plotting results
import matplotlib.pyplot as plt

loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(loss) + 1)

plt.figure()

plt.plot(epochs, loss, 'bo', label='Training Loss')
plt.plot(epochs, val_loss, 'b', label='Validation Loss')
plt.title('Training and Validation Loss')
plt.legend()

plt.show()

In [None]:
# training and evaluating a bidirectional GRU
model = Sequential()
model.add(layers.Bidirectional(layers.GRU(32, 
                                         dropout=0.2,
                                         recurrent_dropout=0.5,
                                         input_shape=(None, float_data.shape[-1]),
                                         return_sequences=True)))
model.add(layers.Bidirectional(layers.GRU(64, activation='relu',
                                         dropout=0.2,
                                         recurrent_dropout=0.5)))
model.add(layers.Dense(1))

model.compile(optimizer=RMSprop(), loss='mae')
history = model.fit(train_gen,
                    steps_per_epoch=128,
                    epochs=16,
                    validation_data=val_gen,
                    validation_steps=val_steps)

In [None]:
# plotting results
import matplotlib.pyplot as plt

loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(loss) + 1)

plt.figure()

plt.plot(epochs, loss, 'bo', label='Training Loss')
plt.plot(epochs, val_loss, 'b', label='Validation Loss')
plt.title('Training and Validation Loss')
plt.legend()

plt.show()

### Going Even Further

there are many other things to try in order to improve performance on the stock-price-forecasting problem:
- adjust the number of units in each recurrent layer in the stacked setup; the current choices are largely arbitrary and thus probably suboptimal
- adjust the learning rate used by the RMSprop optimizer
- try using LSTM layers instead of GRU layers
- try using a bigger densely connected regressor on top of the recurrent layers: a bigger Dense layer or even a stack of Dense layers
- don't forget to eventually run the best-performing models (in terms of validation MAE) on the test set; otherwise, you will develop architectures that are overfitting to the validation set

![Markets and Machine Learning](./markets_and_ml.png)

In [None]:
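# inspecting a few test batches; the generator never stops on its own, so cap the loop
from itertools import islice

for value in islice(test_gen, 3):
    print(value)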

In [None]:
import numpy as np
import pandas as pd
from keras.preprocessing.image import ImageDataGenerator
from keras.models import load_model

if __name__ == '__main__':
    # test image directory
    dst_path = 'E:/datasets/Rcam-plusMelangerTaile/8KLSBackWindow/test'
    # model path
    model_file = "D:/CF_new/piglin_alchemy/ckpt/tl_weights.40-0.7715.h5"
    batch_size = 8

    # load the model
    model = load_model(model_file)
    # image generator for the test set
    test_datagen = ImageDataGenerator(rescale=1. / 255)

    test_generator = test_datagen.flow_from_directory(
        dst_path,
        target_size=(128, 128),
        batch_size=batch_size,
        shuffle=False
        )

    labels = test_generator.class_indices  # mapping from class name to label index
    # the generator can be passed directly to the model for prediction
    test_generator.reset()
    pred = model.predict(test_generator, verbose=1)
    # predicted class of each image
    predicted_class_indices = np.argmax(pred, axis=1)
    # true classes of the test set
    true_label = test_generator.classes

    # use pd.crosstab to print a simple confusion matrix
    # (rows are the predicted classes, columns are the true classes)
    table = pd.crosstab(predicted_class_indices, true_label,
                        rownames=['predict'], colnames=['label'])
    print(table)