# Functions for Preprocessing

### Initial Folder Structure

Below is an outline of the file structure I received:
``` 
MICROFICHE/
    BATCH/
        CALLSIGN/
            FILING/
                SCAN.jpg

Batch #/CALLSIGN/MONTH YYYY - NOTE/*(#).jpg 
Example:

~/Batch_25/KNKN991/APRIL 1991 STEP 1 OF 4:
KNKN991-APRIL-1991-STEP 1 OF 4- (1).jpg
KNKN991-APRIL-1991-STEP 1 OF 4- (10).jpg
KNKN991-APRIL-1991-STEP 1 OF 4- (11).jpg
KNKN991-APRIL-1991-STEP 1 OF 4- (12).jpg
KNKN991-APRIL-1991-STEP 1 OF 4- (13).jpg
KNKN991-APRIL-1991-STEP 1 OF 4- (14).jpg
KNKN991-APRIL-1991-STEP 1 OF 4- (2).jpg
KNKN991-APRIL-1991-STEP 1 OF 4- (3).jpg
KNKN991-APRIL-1991-STEP 1 OF 4- (4).jpg
KNKN991-APRIL-1991-STEP 1 OF 4- (5).jpg
KNKN991-APRIL-1991-STEP 1 OF 4- (6).jpg
KNKN991-APRIL-1991-STEP 1 OF 4- (7).jpg
KNKN991-APRIL-1991-STEP 1 OF 4- (8).jpg
KNKN991-APRIL-1991-STEP 1 OF 4- (9).jpg
```

### Issues
- WHITESPACE
    - Files were difficult to pass into functions as arguments. Unless the whitespace is escaped or the name is quoted, a file name is read as *multiple* arguments, which leads to errors.
- SORTING
    - Images were displayed in lexicographic order: 1, 10, 11, 12, ... 19, 2, 20, ... etc.
    - FILING folder names didn't include the CALLSIGN, so combining FILINGs into a single folder for chronological sorting wasn't possible: multiple FILINGs were named "MAY 1994" (with different parent BATCH/CALLSIGN/ folders).
    - The BATCH designation was unclear. It wasn't clear whether a BATCH/CALLSIGN folder was comprehensive for that particular callsign (e.g. KNKN991).
- INTERPRETATION
    - The date and NOTE in each FILING folder name were based on quick readings of the folder contents. However, each FILING contained multiple dates, and it was not clear whether the folder date was the signature date, the date received/filed, or the approval date. Dates in a single FILING could span more than 12 months.
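The WHITESPACE and SORTING problems above can be reproduced in a few lines. This is a minimal sketch. The file name is a made-up example modelled on the scan listing, and `count_args` is a hypothetical helper that only prints how many arguments it receives.

```shell
#!/usr/bin/env bash
# Hypothetical example name, modelled on the scans listed above
name='KNKN991-APRIL-1991-STEP 1 OF 4- (1).jpg'

# Hypothetical helper: print the number of arguments received
count_args() { echo "$#"; }

count_args $name     # unquoted: word splitting gives 5 arguments
count_args "$name"   # quoted: the whole name is 1 argument

# Byte-wise sorting compares character by character, so (10) sorts before (2)
printf '%s\n' '(1)' '(10)' '(2)' | LC_ALL=C sort
# Zero-padded numbers sort in numeric order: (01), (02), (10)
printf '%s\n' '(01)' '(10)' '(02)' | LC_ALL=C sort
```

This is why the steps below replace whitespace and add leading zeros before any image processing.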


### 1 Tidying File Paths

Below are the transformations I used to tidy file paths (folder and file names) before passing the files to ImageMagick's `convert` for image processing.

#### 1.1 Set paths

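```shell
MICROFICHE=~/Desktop/test-scans       # original scans: BATCH/CALLSIGN/FILING/SCAN.jpg
WORK=~/Desktop/test-working           # working folder
clean=${WORK}/cleaningLog.txt         # log of every rename (appended to with >>)
```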

#### 1.2 Replace whitespace in BATCH/ with underscore

Reference: [Parameter Expansion - Search and Replace, bash-hackers.org](http://wiki.bash-hackers.org/syntax/pe#search_and_replace)

```shell
# Remove whitespace from BATCH/ folder names with search and replace:
# ${f// /_} replaces every space in $f with an underscore.
# Loop over every BATCH/ folder (names ending in a digit).

cd "$MICROFICHE"
for f in *[0-9]; do
    echo mv -v "$f" "${f// /_}"     # TEST RUN: only print the commands
    # mv -v "$f" "${f// /_}"        # UNCOMMENT to run
done
```

```
mv -v Batch 25 Batch_25
Batch 25 -> Batch_25
mv -v Batch 26 Batch_26
Batch 26 -> Batch_26
mv -v Batch 27 Batch_27
Batch 27 -> Batch_27
mv -v Batch 28 Batch_28
Batch 28 -> Batch_28
mv -v Batch 29 Batch_29
Batch 29 -> Batch_29
mv -v Batch 30 Batch_30
Batch 30 -> Batch_30
```
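The test-run-then-uncomment workflow above can also be wrapped in a function with a dry-run switch. This is a sketch only: `rename_batches` is a hypothetical name, and it assumes BATCH folders sit directly under the given root and end in a digit, as in the loop above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace every space in top-level BATCH folder names.
# Second argument: 1 (default) = dry run, print commands; 0 = actually rename.
rename_batches() {
    local root=$1 dry=${2:-1} f new
    for f in "$root"/*[0-9]; do
        [ -d "$f" ] || continue                # skip files and an unmatched glob
        new=${f##*/}                           # folder name only
        new=${new// /_}                        # all spaces -> underscores
        [ "${f##*/}" = "$new" ] && continue    # nothing to change
        if [ "$dry" -eq 1 ]; then
            echo mv -v -- "$f" "$root/$new"
        else
            mv -v -- "$f" "$root/$new"
        fi
    done
}

# rename_batches "$MICROFICHE"      # dry run
# rename_batches "$MICROFICHE" 0    # rename
```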


#### 1.2 Remove NOTE from FILING folder name

Reference: [Incrementing using double parentheses, tldp.org](http://tldp.org/LDP/abs/html/dblparens.html)

```shell
cd "$MICROFICHE"

for BATCH in */; do
    cd "$MICROFICHE/$BATCH"
    for CALLSIGN in *; do
        cd "$MICROFICHE/$BATCH/$CALLSIGN"
        echo "$CALLSIGN/:"
        i=0
        for FILING in *; do
            rm1=${FILING/ /_}    # first space -> "_": "APRIL 1991 STEP 1 OF 4" -> "APRIL_1991 STEP 1 OF 4"
            rm2=${rm1%% *}       # drop the NOTE after the next space -> "APRIL_1991"
            rm3=${rm2%%-*}       # drop a trailing "-...", e.g. "OCTOBER_1991-" -> "OCTOBER_1991"
            rm4=${rm3//_/ }      # restore the space -> "APRIL 1991"
            case "$rm4" in
                "JAN"* ) mt='01';;
                "FEB"* ) mt='02';;
                "MAR"* ) mt='03';;
                "APR"* ) mt='04';;
                "MAY"* ) mt='05';;
                "JUN"* ) mt='06';;
                "JUL"* ) mt='07';;
                "AUG"* ) mt='08';;
                "SEP"* ) mt='09';;
                "OCT"* ) mt='10';;
                "NOV"* ) mt='11';;
                "DEC"* ) mt='12';;
                * ) mt='00';;    # unrecognised month: use 00 instead of silently reusing the previous month
            esac
            yyyy=${rm4#* }       # year: everything after the first space
            # crude numbering keeps filings with the same month/year separate;
            # note that -0$i gives a 3-digit suffix ("-010") once i reaches 10
            mv -v "$FILING" "$yyyy-$mt-$CALLSIGN-0$i" >> "$clean"
            ((i++))
        done
        tail -n "$i" "$clean"    # print this CALLSIGN's renames to the screen
    done
done
```


```
KNKN273
JANUARY 1990 STEP 2 (STEP 1 MISSING) -> 1990-01-KNKN273-00
OCTOBER 1991- 1 -> 1991-10-KNKN273-01
OCTOBER 1991- 2 -> 1991-10-KNKN273-02
KNKN298
AUGUST 1990 -> 1990-08-KNKN298-00
DECEMBER 1992 -> 1992-12-KNKN298-01
JANUARY 1994 -> 1994-01-KNKN298-02
JULY 1990 -> 1990-07-KNKN298-03
MARCH 1990 STEP 1 OF 3 -> 1990-03-KNKN298-04
MARCH 1990 STEP 2 OF 3 -> 1990-03-KNKN298-05
MARCH 1990 STEP 3 OF 3 -> 1990-03-KNKN298-06
MARCH 1996 -> 1996-03-KNKN298-07
MAY 1992 STEP 1 OF 2 -> 1992-05-KNKN298-08
MAY 1992 STEP 2 OF 2 -> 1992-05-KNKN298-09
NOVEMBER 1994 -> 1994-11-KNKN298-010
KNKN303
DECEMBER 1991 -> 1991-12-KNKN303-00
JUNE 1996 -> 1996-06-KNKN303-01
JUNE 1997 -> 1997-06-KNKN303-02
OCTOBER 1995 -> 1995-10-KNKN303-03
SEPTEMBER 1994 -> 1994-09-KNKN303-04
KNKN304
JUNE 1997 -> 1997-06-KNKN304-00
OCTOBER 1992 -> 1992-10-KNKN304-01
SEPTEMBER 1998 -> 1998-09-KNKN304-02
KNKN323
OCTOBER 1998 -> 1998-10-KNKN323-00
SEPTEMEBR 1997 -> 1997-09-KNKN323-01
KNKN333
JANUARY 1995 -> 1995-01-KNKN333-00
JULY 1
```
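The `-0$i` concatenation in the loop produces `-010` for the eleventh filing, as the `NOVEMBER 1994` line shows. Below is a sketch of the same rename logic as a function that zero-pads with `printf` instead, so every index is exactly two digits. `filing_name` is a hypothetical helper, and it assumes FILING names start with `MONTH YYYY`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: FILING name + CALLSIGN + index -> yyyy-mm-CALLSIGN-NN
# Assumes names start with "MONTH YYYY", e.g. "MAY 1992 STEP 1 OF 2" or "OCTOBER 1991- 1".
filing_name() {
    local filing=$1 callsign=$2 idx=$3 month yyyy mm
    month=${filing%% *}              # first word, e.g. "APRIL"
    yyyy=${filing#* }                # drop the first word ...
    yyyy=${yyyy:0:4}                 # ... and keep the 4-digit year
    case "$month" in
        JAN*) mm=01;; FEB*) mm=02;; MAR*) mm=03;; APR*) mm=04;;
        MAY*) mm=05;; JUN*) mm=06;; JUL*) mm=07;; AUG*) mm=08;;
        SEP*) mm=09;; OCT*) mm=10;; NOV*) mm=11;; DEC*) mm=12;;
        *) echo "unrecognised month in: $filing" >&2; return 1;;
    esac
    # %02d pads 1 -> 01 and leaves 10 as 10, so the suffix is always 2 digits
    printf '%s-%s-%s-%02d\n' "$yyyy" "$mm" "$callsign" "$idx"
}

filing_name "NOVEMBER 1994" KNKN298 10      # → 1994-11-KNKN298-10
filing_name "OCTOBER 1991- 1" KNKN273 1     # → 1991-10-KNKN273-01
```

Returning an error for an unknown month, rather than reusing the last value, makes misspelled folders visible immediately.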

#### 1.3 Check for irregular FILING folder names

```shell
# Flag FILING folder names whose length isn't 18 characters,
# i.e. not yyyy-mm-CALLSIGN-0i with a 7-character callsign such as KNKN298
cd "$MICROFICHE"

i=0
declare -a fix_list=()
fix_list[0]="Folders to rename"
echo "${fix_list[0]}"
for f in */*/*; do
    FILING=${f##*/}                  # folder name without its parent path
    if [ "${#FILING}" -ne 18 ]; then
        ((i++))
        fix_list[$i]=${MICROFICHE}/${f}
        echo "${i}: ${fix_list[$i]}"
    fi
done

tofix=${i}
echo "${fix_list[0]}: $tofix"
```

```
Folders to rename
1: /Users/cynthiiee/Desktop/test-scans/Batch_25/KNKN298/1994-11-KNKN298-010
Folders to rename: 1
```
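The length test assumes every callsign is exactly 7 characters. Below is a sketch of a pattern check that doesn't depend on callsign length and also rejects impossible months. `is_filing_name` is a hypothetical helper, and the accepted format (`yyyy-mm-CALLSIGN-NN`, uppercase letters and digits in the callsign) is an assumption based on the renamed folders above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only for names shaped like yyyy-mm-CALLSIGN-NN
is_filing_name() {
    # Keeping the regex in a variable avoids quoting problems inside [[ ]]
    local re='^[0-9]{4}-(0[1-9]|1[0-2])-[A-Z0-9]+-[0-9]{2}$'
    [[ $1 =~ $re ]]
}

for name in 1994-11-KNKN298-010 1994-11-KNKN298-10 1990-13-KNKN298-00; do
    is_filing_name "$name" || echo "to rename: $name"
done
```

In the loop above, this could replace `[ "${#FILING}" -ne 18 ]` with `! is_filing_name "$FILING"`.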


```shell
echo "${fix_list[*]}"          # every element of the array
echo "${fix_list[$i]##*/}"     # last path component: the folder name
echo "${fix_list[$i]%/*}/"     # parent folder
```

```
1
Folders to rename /Users/cynthiiee/Desktop/test-scans/Batch_25/KNKN298/1994-11-KNKN298-010
1994-11-KNKN298-010
/Users/cynthiiee/Desktop/test-scans/Batch_25/KNKN298/
```


REMEMBER TO RENAME! Replace `$tooLong` with the flagged folder and `$correctName` with its corrected name:

```
mv -v "$tooLong" "$correctName" >> "$clean"
tail -n 1 "$clean"
```

Exit status `$?`: `0` means no error; any non-zero value (commonly `1`) means an error.

Reference: https://www.linuxjournal.com/article/10844
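Rather than reading `$?` by hand after every `mv -v ... >> $clean; tail` pair, the exit status can be tested directly with `if`. Below is a minimal sketch. `logged_mv` is a hypothetical helper that takes the log file as its third argument and refuses to overwrite an existing target.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move $1 to $2, append mv's -v line to log file $3,
# echo that line to the screen, and return mv's exit status.
logged_mv() {
    local src=$1 dest=$2 log=$3
    if [ -e "$dest" ]; then
        echo "not moving: $dest already exists" >&2
        return 1
    fi
    if mv -v -- "$src" "$dest" >> "$log"; then
        tail -n 1 "$log"                 # exit status 0: show the logged rename
    else
        local status=$?                  # non-zero exit status from mv
        echo "mv failed with exit status $status" >&2
        return "$status"
    fi
}

# logged_mv "$tooLong" "$correctName" "$clean"
```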

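```shell
# Prompt for a new name for each flagged folder. Ask again until the new
# name is non-empty and not already taken in the same parent folder.
i=1
until [ "$i" -gt "$tofix" ]; do
    src=${fix_list[$i]}
    read -r -p "What do you want to rename ${src##*/} to? " "target_list[$i]"
    dest=${src%/*}/${target_list[$i]}    # keep the folder in its original parent
    if [ -n "${target_list[$i]}" ] && [ ! -e "$dest" ]; then
        mv -v "$src" "$dest" >> "$clean"
        tail -n 1 "$clean"
        ((i++))
    fi
done
```

```
What do you want to rename 1994-11-KNKN298-010?
```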


Reference for `test` / `[ ]`: http://wiki.bash-hackers.org/commands/classictest

#### 1.4 Remove whitespace from SCAN.jpg

Install the [rename utility](http://plasmasturm.org/code/rename/) with Homebrew:

```
brew install rename
```

```shell
cd "$MICROFICHE"
rename -v 's/ *//g' */*/*/*.jpg | wc -l    # delete every space in the scan file names
```

```
       0
```
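If the `rename` utility isn't available, the same clean-up can be done with parameter expansion alone. Below is a sketch. `strip_spaces` is a hypothetical helper, and it only touches the file name, not the folders in the path.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove every space from each given file's name.
strip_spaces() {
    local f dir base new
    for f in "$@"; do
        dir=${f%/*}; base=${f##*/}
        [ "$dir" = "$f" ] && dir=.        # bare file name, no folder part
        new=${base// /}                   # remove every space in the name only
        [ "$new" = "$base" ] && continue  # nothing to change
        mv -v -- "$f" "$dir/$new"
    done
}

# Same four-level glob as the rename call above:
# strip_spaces */*/*/*.jpg
```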


#### 1.5 Add Leading zero to SCAN.jpg numbering

```shell
echo "$MICROFICHE"
```

```
/Users/cynthiiee/Desktop/test-scans
```


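```shell
cd "$MICROFICHE"
# Find scans numbered with a single digit in parentheses, e.g. "(1)".
# -print0 / read -d '' keep each path whole even if it contains spaces.
find "$(pwd)" -name "*([1-9])*.jpg" -print0 |
while IFS= read -r -d '' f; do
    dir=${f%/*}; base=${f##*/}
    mv -v "$f" "$dir/${base/\(/\(0}"   # "(" -> "(0" in the file name only, not the folder path
    # >> "$clean"; tail -n 1 "$clean"
done
```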
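Replacing `(` with `(0` works as long as only single-digit numbers are passed in. Below is a sketch of a variant that parses the number and pads it with `printf`, so it is safe to run on any name. `padded_name` is a hypothetical helper that only prints the new name; pair it with `mv` as in the loop above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the file name with its "(N)" scan number padded to 2 digits.
padded_name() {
    local base=$1 re='^(.*)\(([0-9]+)\)(.*)$'
    if [[ $base =~ $re ]]; then
        # 10# forces base 10, so "08" and "09" aren't read as invalid octal
        printf '%s(%02d)%s\n' "${BASH_REMATCH[1]}" "$((10#${BASH_REMATCH[2]}))" "${BASH_REMATCH[3]}"
    else
        printf '%s\n' "$base"            # no "(N)": leave the name unchanged
    fi
}

padded_name 'KNKN991-APRIL-1991-STEP1OF4-(1).jpg'    # → KNKN991-APRIL-1991-STEP1OF4-(01).jpg
padded_name 'KNKN991-APRIL-1991-STEP1OF4-(12).jpg'   # → unchanged
```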

### 2 Crop in half and convert to grayscale

Requires ImageMagick. Check the installation with:

```shell
convert -version
```

```
Version: ImageMagick 7.0.6-0 Q16 x86_64 2017-06-12 http://www.imagemagick.org
Copyright: © 1999-2017 ImageMagick Studio LLC
License: http://www.imagemagick.org/script/license.php
Features: Cipher DPC HDRI Modules 
Delegates (built-in): bzlib freetype jng jpeg ltdl lzma png tiff xml zlib
```


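```shell
# Convert a scan to grayscale and crop it into left and right halves.
# ImageMagick writes the two halves as "<name>-0.jpg" and "<name>-1.jpg";
# the original is deleted only if convert succeeds.
function graycrop {
    convert "$1" -colorspace Gray -crop 50%x100% +repage "$1" &&
        rm -v "$1"    # >> "$clean"; tail -n 1 "$clean"   ($clean is a log of all operations)
}
```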

- Check original number of files
- Run loop
- Check new number of files

```shell
~$ for f in $(find $SCANS -name "*.jpg"); do echo $f; done | wc -l
    4634
## ^ can return an inaccurate count if a file path contains whitespace (spaces or newlines split one path into several words)
~$ for f in $(find $SCANS -name "*.jpg"); do graycrop $f; done | wc -l
    4634
~$ find $SCANS -name "*.jpg" | wc -l
    9268
```
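The `$(find ...)` loops above split find's output on whitespace, which is what makes the count unreliable. Below is a sketch of the NUL-delimited alternative. `count_jpgs` is a hypothetical helper; the same loop body could call `graycrop "$f"` instead of counting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count .jpg files under $1 without word splitting.
# find -print0 and read -d '' separate paths with NUL bytes, so spaces or
# newlines inside a path can't break it into several words.
count_jpgs() {
    local n=0 f
    while IFS= read -r -d '' f; do
        n=$((n + 1))                 # e.g. graycrop "$f" to process instead
    done < <(find "$1" -name '*.jpg' -print0)
    echo "$n"
}

# count_jpgs "$SCANS"
```

Process substitution (`< <(...)`) keeps the loop in the current shell, so `n` is still set after the loop ends.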

```shell
cd "$SCANS"    # $SCANS: folder of scans to split (not set in the cells above)
unsplit=$(find "$(pwd)" -name "*).jpg" | wc -l)   # count only unsplit jpgs [name ends in ).jpg]
echo $unsplit
```

```
0
2802
```


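```shell
# Split every unsplit scan (name ends in ").jpg"). Matching "*.jpg" instead
# would also pick up halves written by graycrop and split them again.
find "$(pwd)" -name "*).jpg" -print0 |
while IFS= read -r -d '' f; do
    graycrop "$f"
done | wc -l      # counts the "rm -v" lines, i.e. scans processed
```

```
    1402
```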


```shell
split=$(find "$(pwd)" -name "*-[01].jpg" | wc -l)   # halves written by graycrop: *-0.jpg and *-1.jpg
echo $split
```

```
2802
```
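The three checks listed before the session above (count, run, recount) can be combined into one pass/fail test. Below is a sketch. `check_split` is a hypothetical helper, and it assumes the count of unsplit `*).jpg` files was recorded before running `graycrop`. Note that in a bracket expression `[01]` matches `0` or `1`, while `[0,1]` would also match a literal comma.

```shell
#!/usr/bin/env bash
# Hypothetical helper: $1 = scans folder, $2 = number of unsplit scans before splitting.
# Succeeds if no unsplit scans remain and there are exactly two halves per scan.
check_split() {
    local dir=$1 before=$2 halves left
    halves=$(find "$dir" -name '*-[01].jpg' | wc -l | tr -d ' ')
    left=$(find "$dir" -name '*).jpg' | wc -l | tr -d ' ')
    echo "unsplit left: $left, halves: $halves, expected halves: $((2 * before))"
    [ "$left" -eq 0 ] && [ "$halves" -eq $((2 * before)) ]
}

# before=$(find "$SCANS" -name '*).jpg' | wc -l)   # record before running graycrop
# check_split "$SCANS" "$before"
```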


```shell
echo "$work"
ls "$work/scripts"
cat "$work/scripts/fcc-project.sh"
```

```
/Users/cynthiiee/Dropbox/FCC/Working/
Grayson Codes.txt	archive			fcc-project.sh
Grayson codes colin.txt	copyFromListToFolder.sh	list file names.txt
#!/bin/bash
# 
# Structure of file
#
# -re loads re-org functions
#   - mvlater(folder)
#   - mvfol(licence,MMM,yyyy):mv to $mwork
#   - Lmvfol():loops current folder
#   - mvv(folder, target):mv with log to $sesslog
#   - Lnumber(folder): renumber all ([1-9]) to (0[1-]) in given folder & sub folders using find
#
# 1. Set locations
# This includes FCC, microfiche, scripts, docs, xls, progress
#
# 2. Load functions and scripts
#   - scripts contain prompts (``startSess, imgpro, dataE, endSess, mvlater)
#   - functions set global variables, run loops, and group commands
#        entryO(yyyy,mm,licence)
#        [*.jpg].nameorder.sh
#        [*.jpg].grapcrop()
#        unsplit(img1,img2)
#        [*.jpg].finalrename()
#        checkD()
#        [*.jpg].Lfindrepjpg(find,rep)
#        ocr(img,doctype)
#        entryC()

# Set locations

## path
```