# Import Signal Labels

In [21]:
import tensorflow as tf
import pandas as pd
import numpy as np

This code creates a dataset that lists the signal files.

In [2]:
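# list every CSV file in the class subfolders; shuffle=False keeps the order deterministic
train_dataset = tf.data.Dataset.list_files("../data/flooddata/*/*.csv", shuffle=False)
# batch(1) wraps each file path in a length-1 tensor
train_dataset = train_dataset.batch(1)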

In [13]:
files = []
for file in train_dataset:
    # each element is a length-1 tensor of byte strings; convert to str and keep the single path
    files.append(file.numpy().astype(str)[0])

We have used a `tf.data.Dataset` to access the files. To train a network that can classify our signals, our data needs to be labeled. Where the labels are stored depends on the data set.
In the flood data set, each signal is stored in a folder named after its class, so the folder name is the label.
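Because the label is just the name of the folder that holds each file, it can be read from the path without relying on a fixed position in the path or on one operating system's separator. The sketch below is a minimal illustration using the standard library's `pathlib`; the helper `label_from_path` and the file names `signal_001.csv` and `signal_002.csv` are hypothetical, not part of the flood data set.

```python
from pathlib import PureWindowsPath


def label_from_path(path: str) -> str:
    """Return the name of the folder that directly contains the file (hypothetical helper)."""
    # PureWindowsPath accepts both "\\" and "/" as separators, so the same
    # code handles paths produced on Windows and on POSIX systems.
    return PureWindowsPath(path).parent.name


# Hypothetical example paths in the same layout as ../data/flooddata/<class>/<file>.csv
paths = [
    "../data/flooddata/depth_0_0/signal_001.csv",
    "..\\data\\flooddata\\depth_4_5\\signal_002.csv",
]
print([label_from_path(p) for p in paths])
```

Taking the parent folder instead of a fixed index such as `[3]` keeps the labeling correct even if the data directory is moved to a deeper or shallower location.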

In [16]:
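labels = []
for file in files:
    # the class is the name of the folder that contains the file;
    # normalizing the separators makes this work for Windows and POSIX paths
    labels.append(file.replace("\\", "/").split("/")[-2])
labels = pd.Categorical(labels)
labels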

['depth_0_0', 'depth_0_0', 'depth_0_0', 'depth_0_0', 'depth_0_0', ..., 'depth_4_5', 'depth_4_5', 'depth_4_5', 'depth_4_5', 'depth_4_5']
Length: 222
Categories (4, object): ['depth_0_0', 'depth_0_19', 'depth_2_5', 'depth_4_5']

The variable `labels` contains a categorical vector. We can use the [value_counts()](https://pandas.pydata.org/docs/reference/api/pandas.Series.value_counts.html#pandas-series-value-counts) method on categorical data to see which labels are in our data set and how many signals belong to each class.

In [17]:
labels.value_counts()

depth_0_0      33
depth_0_19     53
depth_2_5      34
depth_4_5     102
Name: count, dtype: int64
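For a `pd.Categorical`, `value_counts()` reports every declared category in category order, including categories that have no samples, which makes empty or very small classes easy to spot. The sketch below uses a small made-up label vector (not the flood data) and also converts the counts to class proportions, a quick way to check for class imbalance such as the 102 versus 33 signals seen above.

```python
import pandas as pd

# Made-up labels for illustration; "depth_2_5" is declared but has no samples.
labels = pd.Categorical(
    ["depth_0_0", "depth_4_5", "depth_4_5", "depth_0_19"],
    categories=["depth_0_0", "depth_0_19", "depth_2_5", "depth_4_5"],
)

counts = labels.value_counts()   # one entry per category, in category order
share = counts / counts.sum()    # fraction of samples in each class
print(counts)
print(share)
```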

You can see the number of signals in each class. The class labels come from the subfolder names, which correspond to the depth of water that the volunteer walked through while their smartphone data was recorded.
When we classify with our trained deep network, these are the classes that the network will use. We might want to change the labels before training the network; for example, we can combine similar classes into one class. For the flood data set, class names that state the water depth directly, as below, could be more descriptive. We can rename the classes with the [rename_categories()](https://pandas.pydata.org/docs/reference/api/pandas.Series.cat.rename_categories.html#pandas-series-cat-rename-categories) method.

In [19]:
# rename_categories returns a new Categorical; labels itself keeps the original names
labels.rename_categories(["0.0 ft", "0.19 ft", "2.5 ft", "4.5 ft"])

['0.0 ft', '0.0 ft', '0.0 ft', '0.0 ft', '0.0 ft', ..., '4.5 ft', '4.5 ft', '4.5 ft', '4.5 ft', '4.5 ft']
Length: 222
Categories (4, object): ['0.0 ft', '0.19 ft', '2.5 ft', '4.5 ft']
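Two details matter when changing labels. First, `rename_categories()` returns a new object, so the result must be assigned to keep it. Second, the new names must be unique, so `rename_categories()` cannot merge two classes into one. The sketch below assigns the renamed labels and merges classes by mapping each value and rebuilding the `Categorical`. The grouping of the two shallowest depths into `"shallow"` is a hypothetical choice made only for illustration, not a recommendation for the flood data set.

```python
import pandas as pd

labels = pd.Categorical(["depth_0_0", "depth_0_19", "depth_2_5", "depth_4_5", "depth_4_5"])

# Assign the result: rename_categories does not modify labels in place.
renamed = labels.rename_categories(["0.0 ft", "0.19 ft", "2.5 ft", "4.5 ft"])

# Hypothetical grouping, for illustration only: merge the two shallowest depths.
merge = {"depth_0_0": "shallow", "depth_0_19": "shallow"}
combined = pd.Categorical([merge.get(x, x) for x in labels])

print(renamed.categories.tolist())
print(combined.categories.tolist())
print(combined.value_counts())
```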

In [20]:
# np.unique sorts the class names and returns the index where each class first appears
u, indices = np.unique(labels, return_index=True)
print(u, indices)

['depth_0_0' 'depth_0_19' 'depth_2_5' 'depth_4_5'] [  0  33  86 120]
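The indices `[0, 33, 86, 120]` are the positions where each class first appears. Because the files were listed with `shuffle=False` and each class occupies one contiguous run, these positions match the running totals of the class counts from `value_counts()`: 0, 33, 33 + 53 = 86, and 86 + 34 = 120. The sketch below reproduces this check on a small made-up label array, adding `return_counts=True` to get the start and (exclusive) end of each class block.

```python
import numpy as np

# Made-up labels, grouped by class as in a file list built with shuffle=False.
labels = np.array(["depth_0_0", "depth_0_0", "depth_0_19", "depth_2_5", "depth_2_5", "depth_4_5"])

u, first, counts = np.unique(labels, return_index=True, return_counts=True)

# When every class is one contiguous run, each run ends where the next one starts.
contiguous = bool(np.all(first[1:] == (first + counts)[:-1]))

for name, start, n in zip(u, first, counts):
    print(name, start, start + n)  # class name, first index, end index (exclusive)
print("contiguous:", contiguous)
```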
