# Input Pipeline for image dataset
- Building an input pipeline for a custom cats-and-dogs image dataset collected by web scraping

## Getting Data
- Use `!git clone` to load the dataset from GitHub

In [None]:
!git clone 'https://github.com/DevloperHS/Dockship---Utility.git' 

Cloning into 'Dockship---Utility'...
remote: Enumerating objects: 145, done.
remote: Counting objects: 100% (145/145), done.
remote: Compressing objects: 100% (143/143), done.
remote: Total 145 (delta 2), reused 136 (delta 0), pack-reused 0
Receiving objects: 100% (145/145), 50.96 MiB | 42.57 MiB/s, done.
Resolving deltas: 100% (2/2), done.


## Loading Data
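- The files can be loaded using `tf.data.Dataset.list_files(file_pattern)`
- The dataset stores the image paths, not the actual images
- `*` is a glob wildcard that matches any name at that level of the path, so `images/*/*` matches every file inside every class folder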
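A minimal sketch of what `list_files` returns, assuming a throwaway folder tree built with `tempfile` in place of the cloned repository (the folder and file names are hypothetical). It also shows why `shuffle` needs the boolean `False`: a non-empty string such as `'False'` is truthy.

```python
import os
import tempfile

import tensorflow as tf

# Throwaway folder tree shaped like images/<class>/<file> (hypothetical names)
root = tempfile.mkdtemp()
for cls in ["cat", "dog"]:
    os.makedirs(os.path.join(root, "images", cls))
    for i in range(2):
        # empty placeholder files; list_files only needs the names
        open(os.path.join(root, "images", cls, f"{cls}_{i}.jpg"), "wb").close()

# first * matches the class folder, second * matches the file name
pattern = os.path.join(root, "images", "*", "*")

print(bool("False"))  # True: shuffle='False' would still shuffle

files_ds = tf.data.Dataset.list_files(pattern, shuffle=False)
print(files_ds.element_spec)  # one scalar tf.string per element: a path, not pixels

paths = [p.numpy().decode() for p in files_ds]
print(len(paths))  # 4
```

With `shuffle=False` the order is deterministic, so iterating twice yields the same list of paths.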

In [None]:
# importing modules
import tensorflow as tf
import numpy

In [None]:
# shuffle must be the boolean False: the string 'False' is truthy and would enable shuffling
image_ds = tf.data.Dataset.list_files('/content/Dockship---Utility/Tf Input Pipeline/images/*/*', shuffle=False)

# view the files
for i in image_ds.take(5):
    print(i.numpy())

b'/content/Dockship---Utility/Tf Input Pipeline/images/cat/A cat appears to have caught the....jpg'
b'/content/Dockship---Utility/Tf Input Pipeline/images/dog/Rottweiler Dog Breed Information....jpg'
b'/content/Dockship---Utility/Tf Input Pipeline/images/dog/The History of Dogs as Pets - ABC News.jpg'
b'/content/Dockship---Utility/Tf Input Pipeline/images/dog/Germany_ Dogs must be walked twice a....jpg'
b'/content/Dockship---Utility/Tf Input Pipeline/images/cat/Thinking of getting a cat....png'


***Only the image paths are loaded, not the pixel data.***

## Shuffling
- Shuffle using `shuffle(buffer_size)`
- Use **buffer_size = 200**; since this is larger than the 130 files in the dataset, the whole dataset is shuffled uniformly

In [None]:
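# reshuffle_each_iteration=False stops the order from changing on every pass, so
# (given a fixed input order from list_files) the take/skip split below stays stable
image_ds = image_ds.shuffle(200, reshuffle_each_iteration=False)

# view the files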
for i in image_ds.take(5):
    print(i.numpy())

b'/content/Dockship---Utility/Tf Input Pipeline/images/dog/Colitis in Dogs _ VCA Animal Hospital.jpg'
b'/content/Dockship---Utility/Tf Input Pipeline/images/cat/Is My Cat Normal_.jpg'
b'/content/Dockship---Utility/Tf Input Pipeline/images/dog/Carolina Dog Dog Breed Information....jpg'
b'/content/Dockship---Utility/Tf Input Pipeline/images/dog/How My Dog Knows When I_m Sick - The....jpg'
b'/content/Dockship---Utility/Tf Input Pipeline/images/dog/The Best Dogs of BBC Earth _ Top 5....jpg'


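***The dataset is now in random order, so cat and dog images are mixed instead of grouped by class folder; this keeps the model from seeing long runs of a single class during training.***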
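One caveat worth checking before splitting: by default `shuffle()` reshuffles on every pass over the data (`reshuffle_each_iteration=True`), so a `take`/`skip` split applied afterwards can pick different files on each epoch and leak test images into training. The sketch below uses a toy `tf.data.Dataset.range(10)` rather than the image paths, and the seed value is arbitrary.

```python
import tensorflow as tf

ds = tf.data.Dataset.range(10)

# buffer_size >= number of elements gives a full uniform shuffle
reshuffling = ds.shuffle(buffer_size=10, seed=42)  # reshuffle_each_iteration defaults to True
fixed = ds.shuffle(buffer_size=10, seed=42, reshuffle_each_iteration=False)

def one_pass(d):
    """Iterate the dataset once and return its elements as Python ints."""
    return [int(x) for x in d]

first, second = one_pass(fixed), one_pass(fixed)
print(first == second)  # True: the fixed shuffle repeats the same order on every pass

print(one_pass(reshuffling), one_pass(reshuffling))  # usually two different orders
```

Either way every element appears exactly once per pass; only the stability of the order differs.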

## Split
- Split the dataset into train and test sets
- `take(n)` keeps the first `n` elements
- `skip(n)` skips the first `n` elements and keeps the rest



---
Procedure
* Create the `class_names` list
* Get the dataset size using the `len()` function
* Compute `train_size` as a percentage (here 80%) of that count
* Use `take(train_size)` and `skip(train_size)` to make the split
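The procedure can be sketched on a stand-in dataset of 130 integers, matching the 130 image paths counted in this notebook. The fixed seed and `reshuffle_each_iteration=False` are assumptions added so the two parts stay disjoint on every pass.

```python
import tensorflow as tf

# stand-in for image_ds: 130 elements in a fixed shuffled order
ds = tf.data.Dataset.range(130).shuffle(130, seed=0, reshuffle_each_iteration=False)

img_count = len(ds)                # 130; len() works because the cardinality is known
train_size = int(img_count * 0.8)  # 104

train_ds = ds.take(train_size)  # first 104 elements
test_ds = ds.skip(train_size)   # the remaining 26

train_ids = {int(x) for x in train_ds}
test_ids = {int(x) for x in test_ds}
print(len(train_ids), len(test_ids), train_ids & test_ids)  # 104 26 set()
```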


In [None]:
class_names = ['cat', 'dog']

In [None]:
img_count = len(image_ds)
img_count

130

In [None]:
train_size = int(img_count * 0.8)  # 80% of the data for training
train_size

104

In [None]:
train_ds = image_ds.take(train_size)
test_ds = image_ds.skip(train_size)

In [None]:
len(train_ds)


104

In [None]:
len(test_ds)

26

## Custom functions
### Get Labels
- Write a function `get_labels()` that extracts the label from each file path using `tf.strings.split(file_path, separator)[n]`
- Use the OS path separator, `os.path.sep`, as the separator
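A small sketch of the label extraction on a single made-up path. The `get_label_index` helper is hypothetical and not part of the notebook: it shows one way to turn the folder name into an integer class id using the `class_names` list, since models usually expect numeric labels rather than byte strings.

```python
import os

import tensorflow as tf

class_names = ["cat", "dog"]  # same list as in the notebook

def get_labels(file_path):
    # split ".../images/dog/file.jpg" on the OS separator; index -2 is the class folder
    return tf.strings.split(file_path, os.path.sep)[-2]

def get_label_index(file_path):
    # hypothetical helper: compare the folder name against class_names, take the match position
    matches = tf.equal(get_labels(file_path), tf.constant(class_names))
    return tf.argmax(tf.cast(matches, tf.int32))

path = tf.constant(os.path.sep.join(["", "content", "images", "dog", "pic.jpg"]))
print(get_labels(path).numpy())    # b'dog'
print(int(get_label_index(path)))  # 1
```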

### Read files
- Create a function that takes a file path, `process_img()`
- It gets the label from the `get_labels(file_path)` function
- Load the raw file bytes using `tf.io.read_file(file_path)`
- Decode the bytes into an image tensor using `tf.image.decode_jpeg(img)`
- Resize the image using `tf.image.resize(img, size=[n, n])`
- Finally, return the image and its label
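A sketch of the same read/decode/resize steps, swapping in `tf.io.decode_image` for `decode_jpeg` as an assumption: it detects the format from the file bytes, and `channels=3` with `expand_animations=False` gives a 3-D RGB tensor even for a PNG with an alpha channel. The function name `load_image` and the temporary PNG are hypothetical.

```python
import os
import tempfile

import tensorflow as tf

IMG_SIZE = [224, 224]  # target height and width, as in the notebook

def load_image(file_path):
    """Hypothetical variant of process_img: read, decode, resize, return (image, label)."""
    label = tf.strings.split(file_path, os.path.sep)[-2]
    raw = tf.io.read_file(file_path)
    # channels=3 drops any alpha channel; expand_animations=False keeps a 3-D result
    img = tf.io.decode_image(raw, channels=3, expand_animations=False)
    img = tf.image.resize(img, IMG_SIZE)  # float32, values still in [0, 255]
    return img, label

# Usage: write one random RGBA PNG into a dog/ folder and load it
tmp = tempfile.mkdtemp()
os.makedirs(os.path.join(tmp, "dog"))
path = os.path.join(tmp, "dog", "sample.png")
rgba = tf.cast(tf.random.uniform([50, 80, 4], maxval=256, dtype=tf.int32), tf.uint8)
tf.io.write_file(path, tf.io.encode_png(rgba))

img, label = load_image(tf.constant(path))
print(img.shape, label.numpy())  # (224, 224, 3) b'dog'
```

Fixing the channel count matters once images are batched, because every element in a batch must have the same shape.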


### Scale images
- Create a function that takes in **images** and **labels** and returns:
  - the `rescaled images` (pixel values between 0 and 1)
  - the `labels` unchanged
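A minimal sketch of the scaling step on one made-up pixel. The explicit cast to `float32` is an added assumption that makes the output dtype the same whatever dtype the image arrives in; after `tf.image.resize` the images are already `float32`.

```python
import tensorflow as tf

def scale(img, label):
    # map pixel values from [0, 255] to [0, 1]; the label passes through unchanged
    return tf.cast(img, tf.float32) / 255.0, label

pixels = tf.constant([[[0, 51, 255]]], dtype=tf.uint8)  # one pixel, 3 channels
scaled, label = scale(pixels, tf.constant(b"cat"))
print(scaled.numpy())  # approximately 0.0, 0.2 and 1.0
```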

In [None]:
# sample of how the function will work
s = '/content/Dockship---Utility/Tf Input Pipeline/images/dog/Maltese Dog Breed Information_ Pictures....jpg'
s.split("/")[-2]  # retrieve the second-to-last element


'dog'

In [None]:
import os

# get labels fn: the label is the name of the parent folder ('cat' or 'dog')
def get_labels(file_path):
    return tf.strings.split(file_path, os.path.sep)[-2]

In [None]:
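def process_img(file_path):
    label = get_labels(file_path)

    img = tf.io.read_file(file_path)  # raw file bytes
    # channels=3 forces RGB output; without it, PNGs with transparency decode as RGBA (4 channels)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, [224, 224])  # float32, values still in [0, 255]
    return img, label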


In [None]:
def scale(img, labels):
    # map pixel values from [0, 255] to [0, 1]; labels pass through unchanged
    return img / 255, labels

## Mapping
- Use `map` to apply the custom functions to our dataset:
  - `process_img()`
  - `scale()`
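A sketch of how `map` chains on a dataset of `(image, label)` tuples, using synthetic all-white images in place of `process_img` output. `num_parallel_calls`, `batch` and `prefetch` are additions not used in this notebook; they are common next steps before feeding a model.

```python
import tensorflow as tf

def scale(img, label):
    return tf.cast(img, tf.float32) / 255.0, label

# stand-in for the output of map(process_img): 4 (image, label) pairs
images = tf.fill([4, 8, 8, 3], 255.0)
labels = tf.constant([b"cat", b"dog", b"cat", b"dog"])
ds = tf.data.Dataset.from_tensor_slices((images, labels))

# map() unpacks each (image, label) tuple into the function's two arguments
ds = ds.map(scale, num_parallel_calls=tf.data.AUTOTUNE)
ds = ds.batch(2).prefetch(tf.data.AUTOTUNE)

for batch_imgs, batch_labels in ds:
    print(batch_imgs.shape, float(tf.reduce_max(batch_imgs)), batch_labels.numpy())
```

Because `map` is deterministic by default, element order is preserved even with parallel calls.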

In [None]:
train_ds = train_ds.map(process_img)
test_ds = test_ds.map(process_img)

In [None]:
for img, label in train_ds.take(3):
    print('*** Image', img.numpy()[0][0])
    print('*** Label', label.numpy())

*** Image [39. 22. 12.]
*** Label b'dog'
*** Image [87.35714  49.357143 30.357143]
*** Label b'cat'
*** Image [254. 254. 254.]
*** Label b'dog'


In [None]:
# train_ds and test_ds now yield (image, label) pairs, so scale() receives both;
# the test set must be scaled the same way as the training set
train_ds = train_ds.map(scale)
test_ds = test_ds.map(scale)

In [None]:
for img, label in train_ds.take(3):
    print('img: ', img.numpy()[0][0])
    print('labels', label.numpy())


img:  [0.99607843 0.99607843 0.99607843]
labels b'dog'
img:  [0.32156864 0.32156864 0.32156864 1.        ]
labels b'cat'
img:  [0.9696954  0.20891106 0.24420518]
labels b'dog'


END

