### **INITIALIZATION:**
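- I put these three lines at the top of each of my notebooks. The first two (`%reload_ext autoreload` and `%autoreload 2`) automatically reload imported modules whenever their source code changes, which prevents stale-code problems while I keep working on the same project. The third line (`%matplotlib inline`) renders plots directly inside the notebook.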

In [1]:
#@ INITIALIZATION: 
%reload_ext autoreload
%autoreload 2
%matplotlib inline

**LIBRARIES AND DEPENDENCIES:**
- I install and import all the libraries and dependencies required for the project in the cells below.

In [3]:
#@ INSTALLING DEPENDENCIES: UNCOMMENT BELOW: 
# !pip install -Uqq fastbook
# import fastbook
# fastbook.setup_book()

In [4]:
#@ IMPORTING LIBRARIES AND DEPENDENCIES: 
from fastbook import *                              # fastbook helpers. 
from fastai.callback.fp16 import *                  # Mixed-precision (fp16) callbacks.
from fastai.text.all import *                       # fastai text (NLP) modules.

**GETTING THE DATASET:**
- I download and extract the **IMDB** movie-review dataset here using fastai's `untar_data`.

In [6]:
#@ GETTING THE DATASET: 
path = untar_data(URLs.IMDB)                       # Getting Path to the Dataset. 
path.ls()                                          # Inspecting the Path.

(#7) [Path('/root/.fastai/data/imdb/train'),Path('/root/.fastai/data/imdb/imdb.vocab'),Path('/root/.fastai/data/imdb/unsup'),Path('/root/.fastai/data/imdb/tmp_lm'),Path('/root/.fastai/data/imdb/tmp_clas'),Path('/root/.fastai/data/imdb/README'),Path('/root/.fastai/data/imdb/test')]

### **DATABLOCK AND DATALOADERS:**

In [7]:
#@ CREATING THE DATALOADERS: IMDB FORMAT DATASET: 
dls = TextDataLoaders.from_folder(path, valid="test")                       # Initializing DataLoaders. 

In [8]:
#@ CREATING THE DATALOADERS: 
path = untar_data(URLs.IMDB)                                                # Path to the Dataset. 
dls = DataBlock(blocks=(TextBlock.from_folder(path), CategoryBlock),        # Initializing Text Block. 
                get_y=parent_label,                                         # Getting Labels. 
                get_items=partial(get_text_files,folders=["train","test"]), # Getting Text Files. 
                splitter=GrandparentSplitter(valid_name="test")             # Splitting the Data. 
                ).dataloaders(path)                                         # Initializing Data Loaders. 
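The `DataBlock` above relies on the IMDB folder layout (reviews stored under folders like `train/pos/` and `test/neg/`): `parent_label` takes the label from the folder that directly contains each file, and `GrandparentSplitter` assigns each file to the training or validation set based on the folder two levels up. Here is a minimal plain-Python sketch of that idea; the helper names `parent_label_sketch` and `grandparent_split_sketch` and the file names are hypothetical, and this is not fastai's actual implementation.

```python
from pathlib import PurePosixPath

def parent_label_sketch(path):
    # Hypothetical helper: the label is the name of the folder containing the file.
    return PurePosixPath(path).parent.name

def grandparent_split_sketch(paths, train_name="train", valid_name="test"):
    # Hypothetical helper: split by the folder two levels up, returning index lists.
    train_idx, valid_idx = [], []
    for i, p in enumerate(paths):
        grandparent = PurePosixPath(p).parent.parent.name
        if grandparent == train_name:
            train_idx.append(i)
        elif grandparent == valid_name:
            valid_idx.append(i)
    return train_idx, valid_idx

# Hypothetical file paths following the IMDB folder layout.
files = ["imdb/train/pos/0_9.txt", "imdb/train/neg/1_2.txt", "imdb/test/pos/2_8.txt"]
print([parent_label_sketch(f) for f in files])   # ['pos', 'neg', 'pos']
print(grandparent_split_sketch(files))           # ([0, 1], [2])
```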

**TRANSFORMS:**

In [9]:
#@ GETTING THE TEXT DATA: 
files = get_text_files(path, folders=["train", "test"])                     # Getting Text Files. 
txts = L(o.open().read() for o in files[:2000])                             # Getting List of Texts. 

In [10]:
#@ INITIALIZING TOKENIZATION: 
tok = Tokenizer.from_folder(path)                                           # Initializing Tokenizer. 
tok.setup(txts)                                                             # Setting up the Tokenizer. 
toks = txts.map(tok)                                                        # Tokenizing every text. 
toks[0]                                                                     # Inspecting Tokens. 

(#78) ['xxbos','xxmaj','wow',',','a','movie','about','xxup','nyc','politics'...]
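The output contains special tokens: `xxbos` marks the beginning of a text, `xxmaj` says the next word was capitalized, and `xxup` says the next word was written in all caps. The word itself is then lowercased. The sketch below reproduces just these three rules with a simple regular-expression splitter; it is a hypothetical simplification (fastai uses spaCy for word splitting and applies more rules), so it will differ on contractions, repeated characters and so on.

```python
import re

def tokenize_sketch(text):
    """Hypothetical, simplified tokenizer illustrating xxbos/xxmaj/xxup (not fastai's)."""
    raw = re.findall(r"\w+|[^\w\s]", text)   # Words, or single punctuation characters.
    out = ["xxbos"]                           # Mark the beginning of the text.
    for t in raw:
        if len(t) > 1 and t.isupper():        # ALL-CAPS word -> xxup + lowercased word.
            out += ["xxup", t.lower()]
        elif len(t) > 1 and t[0].isupper() and t[1:].islower():  # Capitalized -> xxmaj.
            out += ["xxmaj", t.lower()]
        else:
            out.append(t.lower())
    return out

print(tokenize_sketch("Wow, a movie about NYC politics"))
# ['xxbos', 'xxmaj', 'wow', ',', 'a', 'movie', 'about', 'xxup', 'nyc', 'politics']
```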

In [11]:
#@ INITIALIZING NUMERICALIZATION: 
num = Numericalize()                                                        # Initializing Numericalizer. 
num.setup(toks)
nums = toks.map(num)                                                        # Numericalization. 
nums[0][:10]                                                                # Inspection. 

TensorText([   2,    8, 1005,   11,   12,   27,   61,    7, 4040, 2168])
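Numericalization builds a vocabulary (special tokens first, then words frequent enough to keep) and replaces each token with its index; unknown words map to `xxunk`. The sketch below assumes fastai's default special-token order, which is consistent with the ids above (`xxbos` → 2, `xxup` → 7, `xxmaj` → 8), and mirrors the `min_freq=3` / `max_vocab=60000` defaults of `Numericalize`. The helper names are hypothetical, and fastai's real implementation differs in details.

```python
from collections import Counter

# Assumed to match fastai's default special tokens and their order.
SPECIAL = ["xxunk", "xxpad", "xxbos", "xxeos", "xxfld", "xxrep", "xxwrep", "xxup", "xxmaj"]

def make_vocab_sketch(token_lists, min_freq=3, max_vocab=60000):
    # Count every token, keep the most frequent ones seen at least min_freq times.
    counts = Counter(t for toks in token_lists for t in toks)
    words = [w for w, c in counts.most_common(max_vocab)
             if c >= min_freq and w not in SPECIAL]
    return SPECIAL + words

def numericalize_sketch(tokens, vocab):
    # Map each token to its index; unknown tokens map to 0 ('xxunk').
    o2i = {w: i for i, w in enumerate(vocab)}
    return [o2i.get(t, 0) for t in tokens]

def decode_sketch(ids, vocab):
    # Inverse mapping: indices back to string tokens.
    return [vocab[i] for i in ids]

docs = [["xxbos", "a", "movie"], ["xxbos", "a", "film"]]
vocab = make_vocab_sketch(docs, min_freq=2)      # Only 'a' appears twice.
ids = numericalize_sketch(docs[0], vocab)
print(ids, decode_sketch(ids, vocab))            # [2, 9, 0] ['xxbos', 'a', 'xxunk']
```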

In [12]:
#@ CONVERTING INTEGERS INTO STRING TOKENS: 
nums_dec = num.decode(nums[0][:20]); nums_dec                               # Decoding Integers. 

(#20) ['xxbos','xxmaj','wow',',','a','movie','about','xxup','nyc','politics'...]

In [13]:
#@ GETTING TOKENS: 
tok.decode(nums_dec)

'xxbos xxmaj wow , a movie about xxup nyc politics seemingly written by someone who has never set foot in'

In [14]:
#@ IMPLEMENTATION OF TRANSFORMS: TOKENIZATION IN TUPLES: 
tok((txts[0], txts[1]))                                                     # Implementation of Tokenization. 

((#78) ['xxbos','xxmaj','wow',',','a','movie','about','xxup','nyc','politics'...],
 (#292) ['xxbos','i','felt','obliged','to','watch','this','movie','all','the'...])

**CUSTOM TRANSFORM FUNCTION:**

In [15]:
#@ WRITING CUSTOM TRANSFORM FUNCTION: 
def f(x:int): return x + 1                                 # Defining Function. 
tfm = Transform(f)                                         # Initializing a Transform. 
tfm(2.0), tfm(2)                                           # Inspection. 

(2.0, 3)
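Two behaviours show up in the last two cells: a `Transform` applied to a tuple is applied to each element separately, and a `Transform` built from a function with a type annotation (`x:int`) only changes inputs of that type, returning others unchanged. The class below is a hypothetical, much-simplified stand-in (`MiniTransform` is not part of fastai) that reproduces both behaviours.

```python
import inspect

class MiniTransform:
    """Hypothetical simplification of fastai's Transform, not the real API."""
    def __init__(self, enc):
        self.enc = enc
        params = list(inspect.signature(enc).parameters.values())
        ann = params[0].annotation if params else inspect.Parameter.empty
        # No annotation on the first parameter means "accept any type".
        self.type = object if ann is inspect.Parameter.empty else ann

    def __call__(self, x):
        # Tuples are treated as collections: transform each element separately.
        if isinstance(x, tuple):
            return tuple(self(o) for o in x)
        # Type dispatch: only transform inputs matching the annotated type.
        return self.enc(x) if isinstance(x, self.type) else x

def f(x: int): return x + 1
tfm = MiniTransform(f)
print(tfm(2.0), tfm(2))        # 2.0 3
print(tfm((2, 2.0, 5)))        # (3, 2.0, 6)
```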

**DECORATOR:**
- Python has a special syntax for passing a function to another function (or to something that behaves like a function, which is known as a *callable* in **Python**): the **Decorator**. A **Decorator** is used by prefixing a callable with `@` and placing it on the line before a function definition; the decorated name is then bound to whatever that callable returns.
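Here is a minimal plain-Python sketch of how decorator syntax works; `shout`, `greet` and `greet2` are hypothetical names used only for illustration. Writing `@shout` above a definition is equivalent to defining the function and then rebinding its name with `greet = shout(greet)`.

```python
import functools

def shout(func):
    # A decorator: takes a callable and returns a new callable wrapping it.
    @functools.wraps(func)              # Keep the original name and docstring.
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper

@shout
def greet(name): return f"hello, {name}"

# Equivalent to the @ syntax above, written out by hand.
def greet2(name): return f"hello, {name}"
greet2 = shout(greet2)

print(greet("imdb"))    # HELLO, IMDB
print(greet2("imdb"))   # HELLO, IMDB
```

Because the decorated name is rebound, after `@Transform` the name `f` itself refers to the `Transform` object.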

In [16]:
#@ WRITING CUSTOM TRANSFORM FUNCTION WITH DECORATORS: 
@Transform
def f(x:int): return x + 1                                 # Defining Function as a Transform. 
f(2.0), f(2)                                               # Inspection: `f` is now the Transform. 

(2.0, 3)

In [17]:
#@ WRITING CUSTOM TRANSFORM FUNCTION WITH SETUP AND DECODE: 
class NormalizeMean(Transform):                            # Defining a Transform Subclass. 
    def setups(self, items):                               # Defining Setup Function. 
        self.mean = sum(items) / len(items)                # Getting Mean of Items. 
    def encodes(self, x): return x - self.mean             # Defining Encode Function. 
    def decodes(self, x): return x + self.mean             # Defining Decode Function. 

#@ IMPLEMENTATION OF NORMALIZE MEAN CLASS: 
tfm = NormalizeMean()                                      # Instantiating the Transform. 
tfm.setup([1, 2, 3, 4, 5])                                 # Computing the mean via setups. 
start = 2
y = tfm(start)                                             # Encoding: x - mean. 
z = tfm.decode(y)                                          # Decoding: y + mean. 
tfm.mean, y, z                                             # Inspection. 

(3.0, -1.0, 2.0)

**PIPELINE:**
- The **Pipeline** class composes several **Transforms** together: calling it applies each transform in order, and its `decode` method undoes them in reverse order.
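A minimal plain-Python sketch of that idea, assuming each transform exposes `encodes` and `decodes` like the `NormalizeMean` subclass above. `MiniPipeline`, `AddOne` and `Double` are hypothetical names, not fastai's implementation. Decoding must run in reverse order because the last transform applied is the first one that has to be undone.

```python
class AddOne:
    def encodes(self, x): return x + 1
    def decodes(self, x): return x - 1

class Double:
    def encodes(self, x): return x * 2
    def decodes(self, x): return x / 2

class MiniPipeline:
    """Hypothetical simplification of fastai's Pipeline."""
    def __init__(self, tfms): self.tfms = list(tfms)

    def __call__(self, x):
        for t in self.tfms:              # Encode: apply transforms in order.
            x = t.encodes(x)
        return x

    def decode(self, x):
        for t in reversed(self.tfms):    # Decode: undo transforms in reverse order.
            x = t.decodes(x)
        return x

pipe = MiniPipeline([AddOne(), Double()])
y = pipe(3)                  # (3 + 1) * 2
print(y, pipe.decode(y))     # 8 3.0
```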

In [18]:
#@ INITIALIZING PIPELINE CLASS: 
tfms = Pipeline([tok, num])                               # Composing Tokenizer and Numericalizer. 
t = tfms(txts[0]); t[:20]                                 # Encoding a text through the Pipeline. 

TensorText([   2,    8, 1005,   11,   12,   27,   61,    7, 4040, 2168, 1309,  395,   50,  246,   57,   69,  133,  301, 1669,   19])

In [19]:
#@ IMPLEMENTATION OF PIPELINE CLASS: 
tfms.decode(t)[:100]                                      # Decoding back to a string. 

'xxbos xxmaj wow , a movie about xxup nyc politics seemingly written by someone who has never set foo'