# Labeling DataFrame
In this notebook, we will create a DataFrame/CSV of labels for the datasets we have created. Each dataset is stored in the model_data directory under images_unscattered.

Each dataset will have a corresponding DataFrame, saved as a CSV under labels_unscattered in the same directory tree. This makes it easier to supply labels for our deep learning models to learn from.

We will use the fastai framework to test and experiment quickly with pretrained models. Its main strength is how easily it lets us experiment with hyperparameters and regularization parameters. This matters when evaluating a model and testing hypotheses about it, because it allows faster prototyping than many other frameworks offer.

Ultimately we want a model that classifies trend movement (Buy, Hold, or Sell), so this is a supervised learning task, and we will test various convolutional neural network architectures.

In [1]:
import pandas as pd
import numpy as np
import os, gc, time

In [5]:
# Creating our path
path = './model_data/images_unscattered/'

# Grabbing raw file names
unscattered_files = os.listdir(path)

# Filtering out any checkpoint folders from the file names
# (building a new list avoids mutating the list while iterating over it,
# which can silently skip entries)
unscattered_files = [f for f in unscattered_files if f != '.ipynb_checkpoints']
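
A slightly more defensive variant is sketched below. It assumes every hidden entry (any name starting with '.') and every subdirectory should be ignored, not only .ipynb_checkpoints; `list_image_files` is a hypothetical helper, and the sample file names are made up to mimic the folder.

```python
import os
import tempfile

def list_image_files(directory):
    """Return sorted names of regular, non-hidden files in `directory`.

    Hypothetical helper: skips hidden entries such as '.ipynb_checkpoints'
    or '.DS_Store', and skips any subdirectories.
    """
    names = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.name.startswith('.'):
                continue  # hidden entry, e.g. Jupyter's checkpoint folder
            if entry.is_file():
                names.append(entry.name)
    return sorted(names)

# Usage on a throwaway directory that mimics the image folder
with tempfile.TemporaryDirectory() as tmp:
    for name in ['0.Buy.png', '1.Sell.png']:
        open(os.path.join(tmp, name), 'w').close()
    os.mkdir(os.path.join(tmp, '.ipynb_checkpoints'))
    files = list_image_files(tmp)

print(files)  # ['0.Buy.png', '1.Sell.png']
```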

In [7]:
# Creating our df
unscattered_df = pd.DataFrame(unscattered_files, columns=['image_name'])

In [9]:
# Creating label column
unscattered_df['label'] = [i.split('.')[1] for i in unscattered_df['image_name']]

In [11]:
# Removing extension
unscattered_df['image_name'] = [i.split('.')[0] + '.' + i.split('.')[1] for i in unscattered_df['image_name']]

In [13]:
# Creating new index
unscattered_df['idx'] = [int(i.split('.')[0]) for i in unscattered_df['image_name']]
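
The three cells above each re-split the file name. A minimal sketch of doing it in one pass is shown below, assuming names follow the pattern `<idx>.<label>.<ext>` (inferred from the split('.') indexing above, not confirmed elsewhere); `parse_image_filename` is a hypothetical helper, and it raises on names with any other number of dots rather than mislabeling them.

```python
import pandas as pd

def parse_image_filename(filename):
    """Split '<idx>.<label>.<ext>' into (idx, name without extension, label).

    Hypothetical helper; assumes exactly three dot-separated parts.
    """
    parts = filename.split('.')
    if len(parts) != 3:
        raise ValueError(f'unexpected file name: {filename!r}')
    idx, label, _ext = parts
    return int(idx), f'{idx}.{label}', label

# Made-up file names for illustration
files = ['2.Sell.png', '0.Buy.png', '1.Hold.png']
records = [parse_image_filename(f) for f in files]
df = pd.DataFrame(records, columns=['idx', 'image_name', 'label'])
df = df.set_index('idx').sort_index()  # rows in numeric idx order
print(df)
```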

In [15]:
# setting the index and sorting by new index so things are in order
unscattered_df.set_index('idx', inplace=True)
unscattered_df.sort_index(ascending=True, inplace=True)
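
Sorting on the integer idx rather than on the image_name strings matters once the numeric prefixes have different digit counts: strings compare character by character, so "10" lands before "2". The small demonstration below uses made-up names.

```python
import pandas as pd

# Hypothetical names whose numeric prefixes have different digit counts
df = pd.DataFrame({'image_name': ['10.Buy', '2.Sell', '1.Hold']})

# Sorting the strings compares them character by character
by_name = df.sort_values('image_name')['image_name'].tolist()

# Sorting on the parsed integer prefix gives true numeric order
df['idx'] = df['image_name'].str.split('.').str[0].astype(int)
by_idx = df.set_index('idx').sort_index()['image_name'].tolist()

print(by_name)  # ['1.Hold', '10.Buy', '2.Sell']
print(by_idx)   # ['1.Hold', '2.Sell', '10.Buy']
```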

In [18]:
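# Saving our labels df as csv
save_path = './model_data/labels_unscattered/'
os.makedirs(save_path, exist_ok=True)  # create the folder if it does not exist yet

# Saving (index=False drops idx; rows keep their sorted order)
unscattered_df.to_csv(os.path.join(save_path, 'unscattered.csv'), index=False)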
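
A quick round-trip check can confirm what the CSV contains. This is a minimal sketch on a tiny made-up DataFrame in a temporary folder. Because index=False drops idx, the reloaded frame gets a fresh 0..n-1 index, which only lines up with idx if the indices are contiguous and start at 0.

```python
import os
import tempfile
import pandas as pd

# Tiny stand-in for unscattered_df (made-up rows)
df = pd.DataFrame(
    {'image_name': ['0.Buy', '1.Hold'], 'label': ['Buy', 'Hold']},
    index=pd.Index([0, 1], name='idx'),
)

with tempfile.TemporaryDirectory() as tmp:
    save_path = os.path.join(tmp, 'labels_unscattered')
    os.makedirs(save_path, exist_ok=True)
    csv_path = os.path.join(save_path, 'unscattered.csv')
    df.to_csv(csv_path, index=False)  # idx is not written
    reloaded = pd.read_csv(csv_path)  # comes back with a RangeIndex

print(reloaded.columns.tolist())  # ['image_name', 'label']
```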

In [21]:
unscattered_df['label'].value_counts()

Hold    5764
Sell    5383
Buy      833
Name: label, dtype: int64
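
The counts above show a clear class imbalance. The sketch below turns them into shares; in the notebook itself, `unscattered_df['label'].value_counts(normalize=True)` gives the same proportions directly. Here the counts are copied from the output above.

```python
import pandas as pd

# Counts copied from the value_counts() output above
counts = pd.Series({'Hold': 5764, 'Sell': 5383, 'Buy': 833}, name='label')
total = counts.sum()
shares = (counts / total).round(3)

print(total)   # 11980
print(shares)  # Hold about 0.481, Sell about 0.449, Buy about 0.070
```

Buy makes up only about 7% of the images, and a model that always predicted Hold would already reach roughly 48% accuracy on data with this distribution, so plain accuracy is a weak yardstick for this task.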