# Combining Data

The purpose of this notebook is to build a single CSV file that lists the many image files in the dataset, which are scattered across different folders.

## Imports

In [1]:
import pandas as pd
import glob
import rarfile

## Creating a DataFrame for the Images in Each Folder

This is a **binary classification** problem. Here is what the labels I'm assigning mean:

- 0 - there is NO FIRE detected in the image
- 1 - there is a FIRE detected in the image

In [35]:
def make_df(folder, label):
    """Return a pandas.DataFrame listing the files in an opened archive.

    folder: an open rarfile.RarFile; label: 0 (no fire) or 1 (fire).
    """
    # read the listing; the archive is closed when the block exits
    with folder as f:
        # full path of every entry in the archive
        file_paths = f.namelist()
    # folder names are kept separate from file names
    ls_folders, ls_files, ls_labels = list(), list(), list()
    # the last item is the folder itself, so we skip it
    for path in file_paths[:-1]:
        # each path has the form 'folder/file'
        folder_name, file_name = path.split('/')
        # add values to each list
        ls_folders.append(folder_name)
        ls_files.append(file_name)
        ls_labels.append(label)
    # create a DataFrame
    data = {
        'Folder': ls_folders,
        'filename': ls_files,
        'label': ls_labels
    }
    return pd.DataFrame(data)
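
To see the shape of the table this produces without opening a RAR archive, here is a minimal sketch that applies the same path-splitting logic to a hand-written listing. The helper `paths_to_df` and the sample file names are hypothetical, and the listing assumes the archive reports the folder itself as its last entry, as the function above does.

```python
import pandas as pd

# Hypothetical archive listing: 'folder/file' entries, folder entry last.
sample_paths = [
    'Normal Images 4/kitchen_01.jpg',
    'Normal Images 4/office_02.jpg',
    'Normal Images 4',
]


def paths_to_df(file_paths, label):
    """Build the Folder/filename/label table from a list of archive paths."""
    rows = []
    for path in file_paths[:-1]:  # skip the trailing folder entry
        folder_name, file_name = path.split('/')
        rows.append({'Folder': folder_name, 'filename': file_name, 'label': label})
    return pd.DataFrame(rows, columns=['Folder', 'filename', 'label'])


df_sample = paths_to_df(sample_paths, 0)
print(df_sample)
```

Each image becomes one row, and every row in a given archive carries the same label.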

### Collect DataFrames for All Folders Together

In [36]:
ls_df = list()

# add all the images with no fire (label 0)
for path in glob.glob('Fire-Detection-Image-Dataset/Normal Images*'):
    folder = rarfile.RarFile(path)
    df_no_fire = make_df(folder, 0)
    ls_df.append(df_no_fire)

# add the images with fire (label 1)
for path in glob.glob('Fire-Detection-Image-Dataset/Fire Images*'):
    folder = rarfile.RarFile(path)
    df_fire = make_df(folder, 1)
    ls_df.append(df_fire)


| Unnamed: 0 | Folder | filename | label |
|---|---|---|---|
| 0 | Normal Images 4 | mps-conv-store-model2.jpg | 0 |
| 1 | Normal Images 4 | mr_00098143.jpg | 0 |
| 2 | Normal Images 4 | Multipurpose-home-office-design-with-unique-ta... | 0 |
| 3 | Normal Images 4 | MW-AU747_wolf_t_20120924164002_MG.jpg | 0 |
| 4 | Normal Images 4 | n36036_kitchen_01.jpg | 0 |
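
The remaining step, merging the per-folder DataFrames into one CSV, might look like the sketch below. It is not the notebook's own code: the stand-in frames, the `Fire Images 1` folder name, and the variable names are hypothetical, and in the notebook the list would be `ls_df`. Passing `index=False` is an assumption about the desired output; it keeps the row index out of the file, which is what produces an `Unnamed: 0` column when such a CSV is read back.

```python
import pandas as pd

# Stand-in frames; in the notebook these would be the entries of ls_df.
ls_df_example = [
    pd.DataFrame({'Folder': ['Normal Images 4'], 'filename': ['a.jpg'], 'label': [0]}),
    pd.DataFrame({'Folder': ['Fire Images 1'], 'filename': ['b.jpg'], 'label': [1]}),
]

# Stack the frames vertically and renumber the rows from 0.
df_all = pd.concat(ls_df_example, ignore_index=True)

# With no path argument, to_csv returns the CSV text instead of writing a file.
# Passing a file path as the first argument would write it to disk.
csv_text = df_all.to_csv(index=False)
print(csv_text)
```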
