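To train a Keras model on NumPy arrays without loading the whole dataset into memory at once, you can use the fit_generator method of the Sequential or Model class. In TensorFlow 2.1 and later, fit_generator is deprecated and fit accepts a generator directly with the same arguments.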

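First, create a custom data generator function that yields batches of data on the fly. The function reads a subset of rows from the feather file, converts it to NumPy arrays, and yields the resulting batch. Note that pd.read_feather always reads the whole file and has no chunksize argument (it can only restrict which columns are read), so the row subsets have to be read another way, for example with pyarrow.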
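As a rough sketch of the generator pattern described above, the function below loops forever and yields one (X, y) batch at a time. The read_rows(start, stop) callable is a hypothetical placeholder, not part of pandas or Keras; it stands for whatever code reads just those rows from disk. The in-memory lists exist only so the sketch runs on its own.

```python
def batch_generator(read_rows, n_rows, batch_size):
    """Yield (X, y) batches forever, touching only one batch of rows at a time.

    read_rows(start, stop) is a hypothetical callable that returns the
    features and targets for rows [start, stop).
    """
    while True:  # Keras expects a training generator to loop indefinitely
        for start in range(0, n_rows, batch_size):
            stop = min(start + batch_size, n_rows)
            yield read_rows(start, stop)


# Toy stand-in for on-disk data: 10 rows, 2 features each, binary target.
features = [[i, i * 2] for i in range(10)]
targets = [i % 2 for i in range(10)]


def read_rows(start, stop):
    # A real implementation would read these rows from the file instead.
    return features[start:stop], targets[start:stop]


gen = batch_generator(read_rows, n_rows=len(features), batch_size=4)
first_X, first_y = next(gen)
print(len(first_X), first_y)  # 4 [0, 1, 0, 1]
```

Before handing such batches to Keras, each X and y would still need to be converted to NumPy arrays, for example with np.asarray.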

You can then pass the generator to fit_generator to train the model on the generated batches. The method takes the generator object (the result of calling the generator function, not the function itself), the number of steps per epoch (the number of batches drawn from the generator in each epoch), and the number of epochs to train for.
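To make the steps-per-epoch arithmetic concrete, here is a small sketch. The row count and batch size are made-up numbers, and the loop only imitates how one epoch consumes batches; it is not Keras code.

```python
import math
from itertools import islice

n_rows = 1050      # hypothetical number of rows in the file
batch_size = 32

# One epoch covers every row exactly once when steps_per_epoch equals the
# number of batches needed, counting the final partial batch.
steps_per_epoch = math.ceil(n_rows / batch_size)


def batch_sizes(n_rows, batch_size):
    """Endless generator that yields only the size of each batch."""
    while True:
        for start in range(0, n_rows, batch_size):
            yield min(batch_size, n_rows - start)


# Imitate one epoch: draw exactly steps_per_epoch batches from the generator.
rows_seen = sum(islice(batch_sizes(n_rows, batch_size), steps_per_epoch))
print(steps_per_epoch, rows_seen)  # 33 1050
```

With a fixed value such as steps_per_epoch=100, an epoch sees 100 batches regardless of the file size, which may be less or more than one full pass over the data.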

Here is some sample code to get you started:

In [7]:
from xdata_config import *

In [8]:
ls ../Data/fea/w5_buy0.21.fea

../Data/fea/w5_buy0.21.fea*


In [9]:
feather_file="../Data/fea/w5_buy0.21.fea"

df = pd.read_feather(feather_file, chunksize=500)

TypeError: read_feather() got an unexpected keyword argument 'chunksize'

In [6]:
import numpy as np
import pyarrow.dataset as ds
from keras.models import Sequential
from keras.layers import Dense

feather_file = "../Data/fea/w5_buy0.21.fea"

# pd.read_feather has no chunksize argument, so stream the file with pyarrow instead.
# Assumes a Feather V2 (Arrow IPC) file with numeric columns and a target column named 'target'.
dataset = ds.dataset(feather_file, format="feather")
n_features = len(dataset.schema.names) - 1  # every column except 'target'

# Define data generator function
def data_generator(dataset, batch_size):
    while True:  # loop forever; Keras stops after steps_per_epoch batches
        # batch_size is an upper bound: batches can be smaller, depending on how the file was written
        for batch in dataset.to_batches(batch_size=batch_size):
            if batch.num_rows == 0:
                continue
            chunk = batch.to_pandas()
            X = np.array(chunk.drop(['target'], axis=1))
            y = np.array(chunk['target'])
            yield X, y

# Create Keras model
model = Sequential()
model.add(Dense(64, activation='relu', input_dim=n_features))  # input_dim is the number of feature columns
model.add(Dense(1, activation='sigmoid'))

# Compile the model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train the model using the data generator
# (fit_generator is deprecated since TensorFlow 2.1; model.fit accepts the same generator and arguments)
model.fit_generator(data_generator(dataset, batch_size=32),
                    steps_per_epoch=100, epochs=10)

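In this example, data_generator is meant to read the feather file in chunks of up to batch_size rows and yield each chunk as an (X, y) pair of NumPy arrays. fit_generator then draws 100 batches per epoch from the generator for 10 epochs. One epoch corresponds to a full pass over the file only if steps_per_epoch equals the number of batches in the file, that is, the row count divided by batch_size, rounded up.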
