<h3  align="center" style='color:blue'>TF Data Input Pipeline Text</h3>

Movie reviews are stored as individual text files (one file per review) in the reviews folder.

The folder structure looks like this:

    reviews
    |__ positive
        |__pos_1.txt
        |__pos_2.txt
        |__pos_3.txt
    |__ negative
        |__neg_1.txt
        |__neg_2.txt
        |__neg_3.txt
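
To experiment without the original data, the layout above can be recreated with a short script. This is only a sketch: the helper name `make_sample_reviews` and the review texts are placeholders invented for illustration, and the third file in each folder is left empty to mimic the two blank reviews.

```python
from pathlib import Path
import tempfile


def make_sample_reviews(root):
    """Create reviews/<label>/<prefix>_<n>.txt under `root` (hypothetical helper)."""
    # Placeholder texts, not the real reviews; the last file per folder is blank.
    samples = {
        "positive": ["A great movie.", "Loved every minute.", ""],
        "negative": ["A dull movie.", "Not worth the time.", ""],
    }
    reviews_dir = Path(root) / "reviews"
    for label, texts in samples.items():
        label_dir = reviews_dir / label
        label_dir.mkdir(parents=True, exist_ok=True)
        prefix = label[:3]  # "pos" or "neg"
        for i, text in enumerate(texts, start=1):
            (label_dir / f"{prefix}_{i}.txt").write_text(text, encoding="utf-8")
    return reviews_dir


with tempfile.TemporaryDirectory() as tmp:
    reviews_dir = make_sample_reviews(tmp)
    # Collect the created files as sorted, forward-slash relative paths.
    created = sorted(
        p.relative_to(reviews_dir).as_posix() for p in reviews_dir.rglob("*.txt")
    )
    print(created)
```

Pass a permanent directory instead of the temporary one if the files should persist for the cells that follow.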
   
You need to read these reviews using tf.data.Dataset and perform the following transformations:

(1) Read each review's text and generate a label from its folder name. Your dataset should yield the review text and label as a tuple.

(2) Filter out blank reviews. Two files in this dataset are blank.

(3) Do all of the above transformations in a single line of code, and also shuffle the reviews.

In [1]:
import tensorflow as tf

<h3 style='color:purple'>Retrieve review file paths in a TensorFlow dataset</h3>

In [2]:
reviews_ds = tf.data.Dataset.list_files('reviews/*/*', shuffle=False)
reviews_ds

<_TensorSliceDataset element_spec=TensorSpec(shape=(), dtype=tf.string, name=None)>

In [3]:
for file in reviews_ds:
    print(file.numpy())

b'reviews\\negative\\neg_1.txt'
b'reviews\\negative\\neg_2.txt'
b'reviews\\negative\\neg_3.txt'
b'reviews\\positive\\pos_1.txt'
b'reviews\\positive\\pos_2.txt'
b'reviews\\positive\\pos_3.txt'


<b>Extract the review text from these files and the label from the folder name</b>

In [4]:
import os

def extract_review_and_label(file_path):
    # The label is the parent folder name, i.e. the second-to-last path component
    label = tf.strings.split(file_path, os.path.sep)[-2]
    return tf.io.read_file(file_path), label
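
The split above relies on `os.path.sep` matching the separator in the listed paths (backslashes in the Windows output shown earlier). As a sketch of a separator-independent alternative, the path can be normalized to forward slashes first. The helper name `label_from_path` and the sample paths are assumptions made for illustration.

```python
import tensorflow as tf


def label_from_path(file_path):
    """Return the parent folder name of a path tensor (hypothetical helper)."""
    # Replace every backslash with a forward slash, then split on "/".
    normalized = tf.strings.regex_replace(file_path, r"\\", "/")
    return tf.strings.split(normalized, "/")[-2]


# Hard-coded sample paths in both separator styles.
windows_label = label_from_path(tf.constant("reviews\\negative\\neg_1.txt"))
posix_label = label_from_path(tf.constant("reviews/positive/pos_1.txt"))
print(windows_label.numpy(), posix_label.numpy())
```

Taking index `-2` works because the last component is the file name and the one before it is the folder that encodes the label.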

In [5]:
reviews_ds_1 = reviews_ds.map(extract_review_and_label)
for review, label in reviews_ds_1:
    print("Review: ",review.numpy()[:50])
    print("Label: ",label.numpy())

Review:  b"Basically there's a family where a little boy (Jak"
Label:  b'negative'
Review:  b'This show was an amazing, fresh & innovative idea '
Label:  b'negative'
Review:  b''
Label:  b'negative'
Review:  b'One of the other reviewers has mentioned that afte'
Label:  b'positive'
Review:  b'A wonderful little production. <br /><br />The fil'
Label:  b'positive'
Review:  b''
Label:  b'positive'


<b>Filter blank reviews</b>

In [6]:
reviews_ds_2 = reviews_ds_1.filter(lambda review, label: review!="")
for review, label in reviews_ds_2.as_numpy_iterator():
    print("Review: ",review[:50])
    print("Label: ",label)

Review:  b"Basically there's a family where a little boy (Jak"
Label:  b'negative'
Review:  b'This show was an amazing, fresh & innovative idea '
Label:  b'negative'
Review:  b'One of the other reviewers has mentioned that afte'
Label:  b'positive'
Review:  b'A wonderful little production. <br /><br />The fil'
Label:  b'positive'
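
The comparison `review!=""` drops only files that are completely empty. If a file might contain just spaces or a newline, a stricter predicate could strip whitespace before checking the length. The following is a sketch on in-memory data; the sample strings and the name `has_text` are made up for illustration.

```python
import tensorflow as tf

# In-memory stand-in for (review, label) pairs; one review is empty and
# one contains only whitespace.
pairs = tf.data.Dataset.from_tensor_slices(
    (
        ["good film", "", "  \n", "bad film"],
        ["positive", "positive", "negative", "negative"],
    )
)


def has_text(review, label):
    # Keep the pair only if the review has characters left after stripping.
    return tf.strings.length(tf.strings.strip(review)) > 0


kept = list(pairs.filter(has_text).as_numpy_iterator())
for review, label in kept:
    print("Review:", review, "Label:", label)
```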


<h3 style='color:purple'>Perform map, filter and shuffle all in a single line of code</h3>

In [7]:
final_ds = reviews_ds.map(extract_review_and_label).filter(lambda review, label: review!="").shuffle(3)
for review, label in final_ds.as_numpy_iterator():
    print("Review:",review[:50])
    print("Label:",label)

Review: b"Basically there's a family where a little boy (Jak"
Label: b'negative'
Review: b'A wonderful little production. <br /><br />The fil'
Label: b'positive'
Review: b'This show was an amazing, fresh & innovative idea '
Label: b'negative'
Review: b'One of the other reviewers has mentioned that afte'
Label: b'positive'
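
A note on `shuffle(3)`: the argument is the shuffle buffer size, and four reviews remain after filtering. With a buffer smaller than the dataset, the first element emitted is always drawn from the first three items, so the shuffle is not fully uniform. The sketch below illustrates this on a small range dataset standing in for the reviews; the seeds are arbitrary.

```python
import tensorflow as tf

numbers = tf.data.Dataset.range(4)  # stand-in for the four non-blank reviews

# Buffer of 3: the first emitted element comes from the first three items,
# so the value 3 never appears in first position.
first_items = set()
for seed in range(20):
    partial = numbers.shuffle(buffer_size=3, seed=seed)
    first_items.add(int(next(iter(partial.as_numpy_iterator()))))
print("First elements seen with buffer_size=3:", sorted(first_items))

# Buffer at least as large as the dataset: every ordering is possible,
# and all elements are still emitted exactly once.
full = list(numbers.shuffle(buffer_size=4).as_numpy_iterator())
print("One full shuffle:", full)
```

For a dataset this small, a buffer size equal to or larger than the number of elements gives a complete shuffle at negligible memory cost.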
