<a href="https://colab.research.google.com/github/BHouwens/kitchen_sink/blob/main/NLPPrediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Natural Language Processing (NLP) Prediction**

Natural Language Processing (NLP) modeling takes data in the form of text, and the objective depends on the intent of the model and the modeller. This notebook serves as a boilerplate handler for NLP prediction tasks, such as predicting sentiment, detecting spam, or identifying the author or source website of a text.

This notebook will *not* deal with generation tasks, such as generating new content, translation or summarisation. These are left to a separate notebook.

..

---



**REMEMBER**: This boilerplate is just that: boilerplate! It's a good idea to perform your own exploration in a manner that's specific to your given dataset.

## **Setup**

This section will contain everything you need to get set up in your environment, including all imports and installations that you may require for your project.

In [None]:
%load_ext autoreload
%autoreload 2

%matplotlib inline

In [None]:
# export
COLOURS = {
    'Reset': "\x1b[0m",
    'Bright': "\x1b[1m",
    'Dim': "\x1b[2m",
    'Underscore': "\x1b[4m",
    'Blink': "\x1b[5m",
    'Reverse': "\x1b[7m",
    'Hidden': "\x1b[8m",
    'FgBlack': "\x1b[30m",
    'FgRed': "\x1b[31m",
    'FgGreen': "\x1b[32m",
    'FgYellow': "\x1b[33m",
    'FgBlue': "\x1b[34m",
    'FgMagenta': "\x1b[35m",
    'FgCyan': "\x1b[36m",
    'FgWhite': "\x1b[37m",
    'BgBlack': "\x1b[40m",
    'BgRed': "\x1b[41m",
    'BgGreen': "\x1b[42m",
    'BgYellow': "\x1b[43m",
    'BgBlue': "\x1b[44m",
    'BgMagenta': "\x1b[45m",
    'BgCyan': "\x1b[46m",
    'BgWhite': "\x1b[47m"
}

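def pretty_log(msg, msg_type='info'):
  # Print a colour-coded banner; unrecognised message types fall back to blue
  type_colours = {'info': 'FgBlue', 'warning': 'FgYellow', 'error': 'FgRed'}
  colour = COLOURS[type_colours.get(msg_type, 'FgBlue')]
  print("")
  print("{c}//===== {m} =====//{r}".format(c=colour, m=msg, r=COLOURS['Reset']))
  print("")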

### **Imports and Installs**

In [None]:
# Installs
!pip install -Uqq fastai


In [None]:
# Imports
from fastai.text.all import *  # text (NLP) API; this notebook does not use the vision API
import numpy as np
import math
import seaborn as sn

### **Colab Setup**

This will get you set up in a Colab environment, with your Google Drive mounted and ready for reading and writing data.

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

%cd gdrive/My Drive/

..


---



## **Data Collection**

This section contains all the code necessary to pull in the relevant data for your project.

In [None]:
# FETCH YOUR DATA HERE
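
As one possible starting point, the sketch below reads labelled text from a CSV source into a list of `(text, label)` pairs using only the standard library. The helper name `load_labelled_text`, the column names `text` and `label`, and the inline sample data are all assumptions made for illustration; replace them with whatever matches your own dataset.

```python
import csv
import io

def load_labelled_text(handle, text_col="text", label_col="label"):
    """Read (text, label) pairs from an open CSV handle, skipping rows with empty text."""
    reader = csv.DictReader(handle)
    rows = []
    for row in reader:
        text = (row.get(text_col) or "").strip()
        if text:
            rows.append((text, row.get(label_col)))
    return rows

# Hypothetical inline sample standing in for a real file on disk
sample_csv = io.StringIO(
    "text,label\n"
    "Win a free prize now!,spam\n"
    "Are we still meeting at noon?,ham\n"
    ",ham\n"
)
samples = load_labelled_text(sample_csv)
print(len(samples), samples[0])
```

For a real file, the same function should work with `open(path, newline="", encoding="utf-8")` in place of the `io.StringIO` sample.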

Great! Now that we have our data we can proceed to explore it a little.

..



---



## **Exploratory Data Analysis**

EDA can be performed here, where you'll find cells for showing batches, as well as utility functions for displaying certain analytics. It also contains headings to prompt some thinking about possible exploratory approaches.

In [None]:
# EXPLORE ALL YOUR FANTASTIC DATA HERE!
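
A first pass at exploration often looks at class balance and document length. The sketch below is a minimal, library-independent example of that idea; the helper name `summarise_corpus` and the toy data are hypothetical, and lengths are counted with a naive whitespace split rather than a proper tokeniser. It assumes a non-empty list of `(text, label)` pairs.

```python
from collections import Counter
from statistics import mean, median

def summarise_corpus(samples):
    """Return label counts and whitespace-token length statistics for (text, label) pairs."""
    label_counts = Counter(label for _, label in samples)
    lengths = [len(text.split()) for text, _ in samples]
    return {
        "n_samples": len(samples),
        "label_counts": dict(label_counts),
        "min_len": min(lengths),
        "max_len": max(lengths),
        "mean_len": mean(lengths),
        "median_len": median(lengths),
    }

# Hypothetical toy data
toy_samples = [
    ("win a free prize now", "spam"),
    ("are we still meeting at noon", "ham"),
    ("lunch tomorrow", "ham"),
]
summary = summarise_corpus(toy_samples)
print(summary)
```

A heavily skewed `label_counts` or a very long tail in document length is worth knowing about before choosing a validation split or a maximum sequence length.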

Now that we've had a chance to explore the data, we can start to preprocess it to get it into a state that's appropriate for our modeling.

..



---



## **Preprocessing**

This section is for preprocessing textual data so that it can be fed into a model. A general approach will involve tokenising the text corpus in some way so that we can more easily prepare it for the model, but there are many ways in which tokenisation can be done.

In [None]:
# PERFORM YOUR PREPROCESSING HERE!
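
To make the tokenisation step concrete, here is a minimal sketch of one simple approach: word-level tokenisation with a regular expression, followed by building a vocabulary and mapping tokens to integer ids. The function names, the `<unk>` and `<pad>` special tokens, and the `min_freq` parameter are assumptions for illustration. fastai ships its own tokenisation utilities, and this sketch does not use or reproduce them.

```python
import re
from collections import Counter

# Words (optionally with an apostrophe suffix such as "don't") or single punctuation marks
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")

def tokenise(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text.lower())

def build_vocab(token_lists, min_freq=1, specials=("<unk>", "<pad>")):
    """Map each token seen at least min_freq times to an integer id, specials first."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    itos = list(specials) + [tok for tok, c in counts.most_common() if c >= min_freq]
    return {tok: i for i, tok in enumerate(itos)}

def numericalise(tokens, vocab):
    """Convert tokens to ids, sending unseen tokens to the <unk> id."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokens]

# Hypothetical toy corpus
docs = ["Don't stop!", "Stop the spam."]
token_lists = [tokenise(doc) for doc in docs]
vocab = build_vocab(token_lists)
ids = numericalise(tokenise("Stop the noise"), vocab)
print(token_lists)
print(ids)
```

Word-level tokens are only one option. Subword or character-level schemes trade a smaller vocabulary for longer sequences, so the right choice depends on the corpus and the model.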

Now that the text has been preprocessed we can finally start with our model.

..



---



## **Model**

Model work can be performed here, with utilities to help with cross-validation and architecture construction.

In [None]:
# PERFORM MODEL WORK HERE
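
As an example of a cross-validation utility, the sketch below generates shuffled k-fold train/validation index splits with the standard library only. The name `k_fold_indices` and its `k` and `seed` parameters are hypothetical. It does not stratify by label, so it may be a poor fit for heavily imbalanced datasets.

```python
import random

def k_fold_indices(n_samples, k=5, seed=42):
    """Yield (train_idx, valid_idx) index lists for k shuffled, near-equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal indices round-robin so fold sizes differ by at most one
    folds = [indices[i::k] for i in range(k)]
    for i, valid_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, valid_idx

splits = list(k_fold_indices(10, k=5))
for fold_no, (train_idx, valid_idx) in enumerate(splits):
    print(fold_no, len(train_idx), len(valid_idx))
```

Each pair of index lists can then be used to select the training and validation rows for one training run, with the validation metric averaged across folds.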

Great, now that we have a working model we can proceed to export it and get it ready for use in future projects.

..



---



## **Export and Clean Up**

Our model can be exported in this section, and the environment the notebook ran in can be cleaned up.

In [None]:
!python notebook2script.py NLPPrediction.ipynb

In [None]:
# Tear down the data folder
!rm -rf data
!ls

..