# CLEAN
This section shows how to use the data cleaning and moving utilities. The cleaning utilities clean all labels in a CSV file. The moving utilities rename every image file and move it to a new directory, while updating the CSV file so that each new file name stays paired with its corresponding label.

■ **Directory and file structure:** Make sure your directories are laid out as follows:
```
⇀ cleanData
      |- train
      |- val
      
      
⇀ rawData
      |--------- train
      |            |- XXX.jpg .. etc
      |            |- your_csv.csv
      |
      |--------- val
                   |- XXX.jpg .. etc
                   |- your_csv.csv
```
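The layout above expects the `cleanData` split folders to exist before images are moved into them. A minimal sketch that creates them and reports any missing `rawData` input folder, assuming exactly this layout; `prepare_dirs` is a hypothetical helper, not part of `move_utils`:

```python
import os


def prepare_dirs(root='.', splits=('train', 'val')):
    """Hypothetical helper: create cleanData/<split> and check rawData/<split>.

    Returns the list of rawData split folders that do not exist.
    """
    missing = []
    for split in splits:
        raw_dir = os.path.join(root, 'rawData', split)
        clean_dir = os.path.join(root, 'cleanData', split)
        if not os.path.isdir(raw_dir):
            missing.append(raw_dir)
        # exist_ok=True makes the call safe to re-run.
        os.makedirs(clean_dir, exist_ok=True)
    return missing
```

Calling `prepare_dirs()` from the notebook's folder returns an empty list when both input folders are present.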
<hr>  

  
■ **Input:** The labels CSV passed to the function should have the following format:

| filename  | words     |
|:---------:|:---------:|
| XXXX.jpg  | raw label |
| XXXX.png  | raw label |
| XXXX.jpeg | raw label |
|  ...      | ...       |  

■ **Output:** The following example shows what the final output should look like:

| filename  | words       |
|:---------:|:-----------:|
| 0.jpg     | clean label |
| 1.png     | clean label |
| 2.jpeg    | clean label |
|  ...      | ...         |
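Judging from the notebook output further down (`750.jpg` becomes `0.jpg`, `7110.JPG` becomes `6.JPG`), each image appears to be renamed to its row index while keeping its original extension, including its case. A minimal sketch of that naming rule, assuming it is the rule `move_utils` applies; `sequential_names` is hypothetical:

```python
import os


def sequential_names(filenames):
    """Map each original file name to '<row index><original extension>'.

    os.path.splitext keeps the extension exactly as written, so '.JPG'
    stays uppercase.
    """
    return [f"{i}{os.path.splitext(name)[1]}" for i, name in enumerate(filenames)]


print(sequential_names(['750.jpg', '640.jpg', '7110.JPG']))
# ['0.jpg', '1.jpg', '2.JPG']
```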

# Imports

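```python
import os
import sys

# Make panda_utils and move_utils (in the parent directory) importable.
# __file__ is not defined inside a notebook, so resolve '..' from the
# current working directory instead.
sys.path.append(os.path.abspath('..'))

from panda_utils import get_data, clean_df
from move_utils import move
```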

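```python
def move_and_clean(path, new_path, labels_csv, col_to_clean='words'):
    # Load the labels CSV. labels_csv is concatenated directly onto path,
    # so it must start with '/' (e.g. '/labels.csv').
    df = get_data(path + labels_csv)

    # clean_df returns a flag and the DataFrame with col_to_clean cleaned.
    clean, cleaned_df = clean_df(df, col_to_clean)

    # Only rename/move the images and write the updated CSV when the flag is set.
    if clean:
        move(cleaned_df, path, new_path, labels_csv)
```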
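`move` is imported from `move_utils`, whose code is not shown here. The sketch below is a hypothetical stand-in illustrating the behaviour described at the top of this section: copy each image to `new_path` under its sequential name and write the updated labels CSV there, so each new file name stays paired with its label. To stay dependency-free it takes a list of row dicts (for example from `csv.DictReader`) rather than the pandas DataFrame the real `move` receives.

```python
import csv
import os
import shutil


def move_sketch(rows, path, new_path, labels_csv, filename_col='filename'):
    """Hypothetical stand-in for move_utils.move.

    rows: list of dicts, one per CSV row, each containing filename_col.
    Returns the updated rows with the new file names.
    """
    os.makedirs(new_path, exist_ok=True)
    updated = []
    for i, row in enumerate(rows):
        old_name = row[filename_col]
        # New name: row index plus the original extension (case preserved).
        new_name = f"{i}{os.path.splitext(old_name)[1]}"
        # copy2 leaves the original image in place under path.
        shutil.copy2(os.path.join(path, old_name), os.path.join(new_path, new_name))
        # Build a new dict so the caller's rows are not modified.
        updated.append({**row, filename_col: new_name})
    if updated:
        # Same convention as the notebook: labels_csv starts with '/'.
        with open(new_path + labels_csv, 'w', newline='', encoding='utf-8') as f:
            writer = csv.DictWriter(f, fieldnames=list(updated[0].keys()))
            writer.writeheader()
            writer.writerows(updated)
    return updated
```

Copying leaves `rawData` untouched, which makes the step safe to re-run; swapping `shutil.copy2` for `shutil.move` would relocate the images out of `rawData` instead.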

### Train dataset

```python
labels_csv = '/labels.csv'
path = './rawData/train'
new_path = './cleanData/train'

move_and_clean(path, new_path, labels_csv)
```

```
data before cleaning

Unnamed: 0,filename,words
0,750.jpg,غرام
1,640.jpg,و
2,9038.jpg,مصدرة
3,2451.jpg,كل
4,7311.jpg,الطبيعية
5,6696.jpg,يحفظ
6,7110.JPG,الكلية
7,8175.jpg,زيت
8,10197.jpg,الهند
9,2476.JPG,قشطة

Number of entries:  4497
Number of files inside ./rawData/train: 4498
data after moving and cleaning

Unnamed: 0,filename,words
0,0.jpg,غرام
1,1.jpg,و
2,2.jpg,مصدرة
3,3.jpg,كل
4,4.jpg,الطبيعية
5,5.jpg,يحفظ
6,6.JPG,الكلية
7,7.jpg,زيت
8,8.jpg,الهند
9,9.JPG,قشطة

Number of entries:  4497
Number of files inside ./cleanData/train: 4498
```
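Here the file count (4498) is one higher than the number of entries (4497), which is consistent with the folder also holding the labels CSV, as in the directory tree at the top. A quick check that counts only image files and lists CSV entries with no matching file makes this explicit. `count_check` is a hypothetical helper, and the extension set is an assumption based on the Input table:

```python
import os

# Assumed image extensions, taken from the Input table (.jpg, .png, .jpeg).
IMAGE_EXTS = {'.jpg', '.jpeg', '.png'}


def count_check(folder, filenames):
    """Compare CSV entries with the image files actually present in folder.

    Returns (number of entries, number of image files, CSV names with no file).
    Extension matching is case-insensitive, so '.JPG' counts as an image.
    """
    on_disk = {f for f in os.listdir(folder)
               if os.path.splitext(f)[1].lower() in IMAGE_EXTS}
    missing = [name for name in filenames if name not in on_disk]
    return len(filenames), len(on_disk), missing
```

For example, `count_check('./cleanData/train', names)`, where `names` is the `filename` column of the cleaned CSV, should report equal counts and no missing names if every label has its image.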


### Val dataset

```python
labels_csv = '/labels.csv'
path = './rawData/val'
new_path = './cleanData/val'

move_and_clean(path, new_path, labels_csv)
```

```
data before cleaning

Unnamed: 0,filename,words
0,2138.jpg,حلال
1,2344.jpg,مقلي
2,792.jpg,على
3,1098.jpg,صوديوم
4,1400.jpg,طاقة
5,1309.jpg,بريتوس
6,38.jpg,بودرة
7,1838.jpg,المنتج
8,2606.jpg,سكريات
9,1637.jpg,قواما

Number of entries:  1089
Number of files inside ./rawData/val: 1090
data after moving and cleaning

Unnamed: 0,filename,words
0,0.jpg,حلال
1,1.jpg,مقلي
2,2.jpg,على
3,3.jpg,صوديوم
4,4.jpg,طاقة
5,5.jpg,بريتوس
6,6.jpg,بودرة
7,7.jpg,المنتج
8,8.jpg,سكريات
9,9.jpg,قواما

Number of entries:  1089
Number of files inside ./cleanData/val: 1090
```
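Once a split has been processed, the raw and cleaned CSVs can be compared row by row to confirm that the renaming kept every label with its image. A sketch, assuming the cleaned CSV is written as `labels.csv` inside the `cleanData` split folder, keeps the raw row order, and uses the index-plus-extension naming seen in the outputs; `check_alignment` is hypothetical:

```python
import csv
import os


def check_alignment(raw_csv, clean_csv):
    """Check that a raw and a cleaned labels CSV line up row by row.

    Assumes row i of the raw CSV was renamed to '<i><original extension>'.
    Returns a list of (row_index, problem) tuples; an empty list means aligned.
    A row index of -1 marks a row-count mismatch.
    """
    with open(raw_csv, newline='', encoding='utf-8') as f:
        raw = list(csv.DictReader(f))
    with open(clean_csv, newline='', encoding='utf-8') as f:
        clean = list(csv.DictReader(f))
    problems = []
    if len(raw) != len(clean):
        problems.append((-1, f'row count {len(raw)} != {len(clean)}'))
    for i, (r, c) in enumerate(zip(raw, clean)):
        expected = f"{i}{os.path.splitext(r['filename'])[1]}"
        if c['filename'] != expected:
            problems.append((i, f"expected {expected}, got {c['filename']}"))
    return problems
```

For example, `check_alignment('./rawData/val/labels.csv', './cleanData/val/labels.csv')`. If `clean_df` ever drops rows, the row-count entry flags it instead of the mismatch going unnoticed.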
