In [2]:
from keras.datasets import reuters
from keras.utils import pad_sequences
from keras.layers import Embedding, Flatten, Dense
from keras.models import Sequential

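# Load the Reuters newswire dataset bundled with Keras
max_features = 10000  # keep only the 10,000 most frequent words
maxlen = 100          # every newswire is padded or truncated to 100 tokens
num_classes = 46      # number of topic labels in the dataset
(x_train, y_train), (x_test, y_test) = reuters.load_data(num_words=max_features, test_split=0.2)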


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/reuters.npz


Data preprocessing is an essential step in any machine learning project.
It typically includes:

1) Data cleaning: Identifying and handling missing or erroneous data, for example by removing duplicates, imputing missing values, and dealing with outliers.

2) Feature engineering, which includes:
    a) Feature scaling: Features measured on different scales can hurt the performance of some machine learning algorithms. Common scaling methods are normalization and standardization.

    b) Feature encoding: Machine learning algorithms typically require numerical input, so categorical variables need to be encoded. Common methods are one-hot encoding, label encoding, and target encoding.

3) Train-test split: Split the data into separate training and testing sets so the model is evaluated on examples it has not seen. A common choice is about 80% for training and 20% for testing, which is what test_split=0.2 requests in the loading cell above.

4) Data augmentation: Increasing the size of a training dataset by generating new examples through transformations of the existing data. This can help to reduce overfitting and improve model performance.

Not every step applies to every dataset. Here, the Reuters data already arrives as lists of integer word indices with an 80/20 train-test split, so the only preprocessing left is to pad each sequence to a fixed length.
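To make the scaling, encoding, and splitting steps concrete, here is a minimal standard-library sketch. The toy values and the variable names (heights_cm, colors, train_idx, and so on) are invented for illustration and are not part of the Reuters pipeline in this notebook.

```python
import random
from statistics import mean, pstdev

# Toy data, invented for illustration only
heights_cm = [150.0, 160.0, 170.0, 180.0]
colors = ["red", "green", "red", "blue"]

# Feature scaling (standardization): subtract the mean, divide by the std deviation
mu, sigma = mean(heights_cm), pstdev(heights_cm)
standardized = [(h - mu) / sigma for h in heights_cm]

# Feature encoding (one-hot): one column per category, in sorted order
categories = sorted(set(colors))  # ['blue', 'green', 'red']
one_hot = [[1 if c == cat else 0 for cat in categories] for c in colors]

# Train-test split: shuffle example indices with a fixed seed, hold out 20%
indices = list(range(10))
random.Random(0).shuffle(indices)
split = int(len(indices) * 0.8)
train_idx, test_idx = indices[:split], indices[split:]
```

In practice these steps are usually delegated to a library, but the arithmetic is no more than what is shown here.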

In [3]:
# Data preprocessing: pad or truncate every sequence to exactly maxlen tokens
x_train = pad_sequences(x_train, maxlen=maxlen)
x_test = pad_sequences(x_test, maxlen=maxlen)
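With its default arguments, pad_sequences pads short sequences with zeros at the front and truncates long sequences from the front, keeping the last maxlen tokens. The pure-Python function below is only a sketch of that default behaviour for a single sequence; pad_pre is a hypothetical helper name, not a Keras function, and it assumes maxlen is positive.

```python
def pad_pre(seq, maxlen, value=0):
    """Mimic pad_sequences defaults (padding='pre', truncating='pre') for one sequence."""
    seq = list(seq)[-maxlen:]                      # keep only the last maxlen tokens
    return [value] * (maxlen - len(seq)) + seq     # left-pad with the fill value

short = pad_pre([5, 8, 2], maxlen=5)
long = pad_pre([1, 2, 3, 4, 5, 6], maxlen=4)
print(short)  # [0, 0, 5, 8, 2]
print(long)   # [3, 4, 5, 6]
```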


In [4]:
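# Model: learned word embeddings, flattened and fed to a small dense classifier
model = Sequential([
    Embedding(max_features, 32, input_length=maxlen),  # each word index -> 32-d vector
    Flatten(),                                         # (100, 32) -> 3200 features
    Dense(64, activation='relu'),
    Dense(num_classes, activation='softmax')           # one probability per topic
])
# Labels are integer class ids, hence the sparse categorical cross-entropy loss
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])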
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 100, 32)           320000    
                                                                 
 flatten (Flatten)           (None, 3200)              0         
                                                                 
 dense (Dense)               (None, 64)                204864    
                                                                 
 dense_1 (Dense)             (None, 46)                2990      
                                                                 
Total params: 527,854
Trainable params: 527,854
Non-trainable params: 0
_________________________________________________________________
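The parameter counts in the summary can be checked by hand. The sketch below recomputes them from the layer sizes used in the model cell; embed_dim and hidden are names introduced here for the literals 32 and 64.

```python
max_features, embed_dim, maxlen, hidden, num_classes = 10000, 32, 100, 64, 46

embedding_params = max_features * embed_dim          # one 32-d vector per word index
flat_size = maxlen * embed_dim                       # Flatten itself has no weights
dense_params = flat_size * hidden + hidden           # weights plus biases
output_params = hidden * num_classes + num_classes   # weights plus biases
total = embedding_params + dense_params + output_params

print(embedding_params, dense_params, output_params, total)
# 320000 204864 2990 527854
```

Most of the weights sit in the embedding table and in the first dense layer, which has to connect all 3200 flattened features to 64 units.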


In [6]:
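# Train the model for 5 epochs; the test set is reused as validation data here,
# so the validation metrics are not an independent estimate of performance
history = model.fit(x_train, y_train, epochs=5, batch_size=32, validation_data=(x_test, y_test))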

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
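The quantity minimized during training is the sparse categorical cross-entropy: for one example, the negative log of the probability the softmax layer assigns to the true class. The function below is a simplified per-example sketch, not the Keras implementation, which also clips probabilities and averages over the batch. The probabilities are invented for illustration.

```python
import math

def sparse_categorical_crossentropy(probs, label):
    """Per-example loss: -log of the probability given to the true class id."""
    return -math.log(probs[label])

probs = [0.7, 0.2, 0.1]  # hypothetical softmax output over 3 classes
loss_confident = sparse_categorical_crossentropy(probs, 0)  # true class got 0.7
loss_wrong = sparse_categorical_crossentropy(probs, 2)      # true class got 0.1
```

The loss is small when the true class receives high probability and grows as that probability approaches zero. The label is passed as an integer index rather than a one-hot vector, which is what "sparse" refers to.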


In this example, we loaded the Reuters newswire dataset from Keras, padded each sequence to a fixed length, 
built an embedding-based classifier with the Keras Sequential API, and trained it for five epochs.