# Image Modelling - Pipeline Creation (Python file)
In this notebook we will cover:
- how to prepare images for training a neural network, using shell commands instead of pandas
- how to prepare the data so that it can be loaded into TensorFlow, load it, and check that everything went fine
- how to define each step as a function that is written directly into a Python file

In the second notebook we will import and use those functions in order to train a neural network that classifies our pictures.

In [None]:
# Remove any files created by a previous run of this notebook.
# TODO: adjust the list of files to remove
!rm -f image_modeling.py #flowers_train.csv flowers_eval.csv flowers_test.csv

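The following cell registers a custom cell magic that writes the content of a cell into a Python script and then executes the cell as usual. By default it uses mode 'w', which overwrites the file; the `-a` flag switches to mode 'a', which appends to it. Before writing, the cell text is passed through `str.format` with the notebook's global variables, so `{name}` placeholders are filled in, and cells containing literal curly braces (for example dict literals or f-strings) are only written correctly if those braces are doubled.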

In [None]:
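# Let's make some dark cell magic. Why not!
from IPython.core.magic import register_cell_magic

@register_cell_magic
def write_and_run(line, cell):
    """Usage: %%write_and_run [-a] <filename>

    Write the cell content to <filename> and then execute it.
    """
    argz = line.split()
    file = argz[-1]
    mode = 'w'  # default: overwrite the file
    if len(argz) == 2 and argz[0] == '-a':
        mode = 'a'  # -a flag: append to the file
        print('Appended to file:', file)
    else:
        print('Written to file:', file)
    # Fill {name} placeholders from the notebook globals;
    # literal braces must be doubled ({{ and }})
    source = cell.format(**globals())
    with open(file, mode) as f:
        f.write(source)
    # Run exactly the code that was written to the file
    get_ipython().run_cell(source)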
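
To make the write-then-execute idea concrete outside IPython, here is a plain-Python sketch. `write_and_run_text` is a hypothetical helper, not an IPython API: it fills `{name}` placeholders via `str.format`, writes the result to a file and executes that same text with `exec`. The usage shows a placeholder being substituted and a dict literal whose braces had to be doubled.

```python
import os
import tempfile


def write_and_run_text(cell, filename, append=False, namespace=None):
    """Hypothetical plain-Python analogue of a write-and-run cell magic."""
    namespace = {} if namespace is None else namespace
    # Fill {name} placeholders once; literal braces must be written as {{ }}
    source = cell.format(**namespace)
    with open(filename, 'a' if append else 'w') as f:
        f.write(source)
    # Execute exactly the text that was written to the file
    exec(source, namespace)


# Usage: one placeholder, then an appended cell with a doubled-brace dict
ns = {'DATA_DIR': '../images'}
path = os.path.join(tempfile.mkdtemp(), 'demo.py')
write_and_run_text("data_dir = '{DATA_DIR}'\n", path, namespace=ns)
write_and_run_text("counts = {{'turtles': 0}}\n", path, append=True, namespace=ns)

with open(path) as f:
    print(f.read())
print(ns['data_dir'], ns['counts'])
```

Formatting once and reusing the result keeps the written file and the executed code identical, which is the property you want when the file is imported later.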

Import the required libraries. `%%write_and_run image_modeling.py` invokes the cell magic defined above in its default 'w' mode, so `image_modeling.py` is overwritten and the imports end up at its beginning.

In [None]:
%%write_and_run image_modeling.py
import pathlib
import IPython.display as display
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import os

Print the TensorFlow version and set the logging threshold to `INFO`, so that messages of level INFO and above are logged.

In [None]:
print(tf.__version__)
tf.compat.v1.logging.set_verbosity(v=tf.compat.v1.logging.INFO)

Set the path to the image folder and count the files it contains (every file in `../images` is treated as an image).

In [None]:
# Wrap the (relative) data path in a pathlib.Path object
#home_path = str(pathlib.Path.home())
data_dir = '../images'
data_dir = pathlib.Path(data_dir)
print(f'The total number of images is: {len(os.listdir(data_dir))}')
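
`os.listdir` counts every entry in the folder, including stray non-image files or subfolders. If the folder might contain such entries, a stricter count can filter by file extension. A minimal sketch, assuming the images use one of the extensions in the hypothetical `IMAGE_EXTENSIONS` set (adjust it to the files you actually have); `list_images` is likewise a hypothetical helper:

```python
import pathlib
import tempfile

# Assumed set of image extensions; adjust to the files actually present
IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png'}


def list_images(folder):
    """Return the sorted paths of all image files directly inside `folder`."""
    folder = pathlib.Path(folder)
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


# Usage on a throwaway folder with two images and one stray text file
demo_dir = pathlib.Path(tempfile.mkdtemp())
for name in ['a.jpg', 'b.PNG', 'notes.txt']:
    (demo_dir / name).touch()
images = list_images(demo_dir)
print(f'The total number of images is: {len(images)}')
```

Lower-casing the suffix makes the check robust to files saved as `.JPG` or `.PNG`.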

Let's have a look at the first two images.

In [None]:
# Collect the paths of all files in the image folder (the turtle images)
turtles = list(data_dir.glob('*'))

for image in turtles[:2]:
    display.display(Image.open(str(image)))

## Data preparation using shell commands

Now we will use shell commands to look at the data, clean the paths to the images and split the data into a training and an evaluation set.

Let's look at the first five lines of the training set.
First we use the [head](https://linuxhint.com/bash_head_tail_command/) command to output the first five lines of `../data/train.csv`. We redirect this output to `/tmp/input.csv` with the ['>'](https://www.cs.ait.ac.th/~on/O/oreilly/unix/upt/ch13_01.htm#UPT-ART-1023) operator and then print the content of that file with the [cat](https://www.interserver.net/tips/kb/linux-cat-command-usage-examples/) command.

In [None]:
# Let us take a look at the training set
!head -5 ../data/train.csv > /tmp/input.csv 
!cat /tmp/input.csv
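
For comparison, the same "first five lines into a temporary file" step can be written in plain Python with `itertools.islice`, which stops reading after `n` lines just like `head -n`. This is only an illustrative sketch: `head` here is a hypothetical helper, and the CSV below (including its column names) is made up.

```python
import itertools
import pathlib
import tempfile


def head(src, dst, n=5):
    """Copy the first n lines of src to dst, like `head -n src > dst`."""
    with open(src) as fin, open(dst, 'w') as fout:
        fout.writelines(itertools.islice(fin, n))
    return pathlib.Path(dst).read_text()


# Usage with a small made-up CSV (column names are hypothetical)
tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / 'train.csv').write_text(
    'image,label\n' + ''.join(f'img_{i}.jpg,{i % 2}\n' for i in range(10))
)
print(head(tmp / 'train.csv', tmp / 'input.csv'))
```

Note that if the file has a header row, the first five lines contain only four data rows.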

Save a copy of `train.csv` as `train_split.csv` and split it: remove X images from `train_split.csv` and save them as `test_split.csv`.

In [None]:
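#%%bash
# Work on a copy so that the original train.csv stays untouched
!cp ../data/train.csv ../data/train_split.csv

# Draft of the split (inactive; meant for a %%bash cell): shuffle the rows and
# put all but the last 10 into ../data/train_split_aa and the rest into
# ../data/train_split_ab. --random-source needs an existing file of random bytes.
#sort -R ../data/train_split.csv --random-source=random.seed |
#split -l $(( $(wc -l <../data/train_split.csv) - 10)) - ../data/train_split_
#mv ../data/train_split_aa ../data/train_split.csv
#mv ../data/train_split_ab ../data/test_split.csv

# Alternative draft in Python (inactive): 70/30 split without shuffling
#lines = open('../data/train_split.csv').readlines()
#n_train = int(len(lines) * 0.7)
#open('../data/train_split.csv', 'w').writelines(lines[:n_train])
#open('../data/test_split.csv', 'w').writelines(lines[n_train:])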
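
A shuffled hold-out split can also be done in a few lines of standard-library Python. This is a sketch of an alternative, not the notebook's method: it assumes the first line of the CSV is a header (pass `has_header=False` otherwise), uses a fixed seed so the split is reproducible, and writes the header into both output files. `split_csv` is a hypothetical helper, and the data below is made up.

```python
import pathlib
import random
import tempfile


def split_csv(src, train_dst, test_dst, n_test=10, seed=42, has_header=True):
    """Shuffle the data rows of src and hold out n_test of them as a test set."""
    lines = pathlib.Path(src).read_text().splitlines()
    header, rows = (lines[:1], lines[1:]) if has_header else ([], lines)
    if not 0 <= n_test <= len(rows):
        raise ValueError(f'n_test must be between 0 and {len(rows)}')
    random.Random(seed).shuffle(rows)  # seeded, so the split is reproducible
    cut = len(rows) - n_test
    train_rows, test_rows = rows[:cut], rows[cut:]
    # Each output file gets its own copy of the header (if there is one)
    pathlib.Path(train_dst).write_text(''.join(f'{l}\n' for l in header + train_rows))
    pathlib.Path(test_dst).write_text(''.join(f'{l}\n' for l in header + test_rows))
    return len(train_rows), len(test_rows)


# Usage on a small made-up CSV with a header and 50 data rows
tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / 'train.csv').write_text(
    'image,label\n' + ''.join(f'img_{i}.jpg,{i % 2}\n' for i in range(50))
)
counts = split_csv(tmp / 'train.csv', tmp / 'train_split.csv', tmp / 'test_split.csv')
print(counts)  # (40, 10)
```

Keeping the header out of the shuffle avoids the problem that `sort -R` on the whole file would move the header line to a random position.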

In [None]:
#%%bash
# Count the lines in each split (uncomment once the split has been run)
#wc -l ../data/train_split.csv
#wc -l ../data/test_split.csv
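
Besides comparing line counts with `wc -l`, it can be worth checking that no row ended up in both files and that no row was lost. A minimal sketch, assuming all three files start with the same header line; `check_split` is a hypothetical helper and the files in the usage example are made up.

```python
import pathlib
import tempfile


def check_split(original, train, test):
    """Check that train and test are disjoint and together hold every original row."""
    def data_rows(path):
        # Skip the header line (assumed to be present in all three files)
        return pathlib.Path(path).read_text().splitlines()[1:]

    orig_rows, train_rows, test_rows = data_rows(original), data_rows(train), data_rows(test)
    print(f'train: {len(train_rows)} rows, test: {len(test_rows)} rows')
    disjoint = not set(train_rows) & set(test_rows)
    complete = sorted(train_rows + test_rows) == sorted(orig_rows)
    return disjoint and complete


# Usage on three tiny made-up files
tmp = pathlib.Path(tempfile.mkdtemp())
header = 'image,label\n'
(tmp / 'train.csv').write_text(header + 'a.jpg,0\nb.jpg,1\nc.jpg,0\n')
(tmp / 'train_split.csv').write_text(header + 'c.jpg,0\na.jpg,0\n')
(tmp / 'test_split.csv').write_text(header + 'b.jpg,1\n')
ok = check_split(tmp / 'train.csv', tmp / 'train_split.csv', tmp / 'test_split.csv')
print(ok)  # True
```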