chuachinhon/practical_nlp

Practical NLP

This series of notebooks is aimed at helping fellow NLP enthusiasts think about applying new tools and techniques to practical tasks. My goal is to keep the code and workflow simple, and to focus on actual use cases.

PART 5: Fine-Tune A Transformer Model On A Custom Dataset

Notebook 5.0 is an adaptation of my new repo on using transformer models to detect state trolls on Twitter. I reckon many may not be interested in the subject matter, so I've only ported over the Colab notebook for fine-tuning on a custom dataset, for folks who are specifically looking for examples like this.

This notebook took about 5.5 hours to run on a Colab Pro account on TPU and "high-RAM" settings; it could run faster or slower depending on your set-up. The datasets needed - train_raw.csv and validate.csv - are in the data folder of this repo.
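As a rough sketch of the data-loading step that precedes fine-tuning: the records in the CSVs need to be read into parallel lists of texts and labels before tokenisation. The `text` and `label` column names below are hypothetical - check the actual headers in train_raw.csv and validate.csv and adjust accordingly.

```python
import csv
import io

def load_texts_and_labels(csv_file):
    """Read a labelled dataset from an open CSV file object.

    Assumes hypothetical 'text' and 'label' columns; adjust the
    column names to match train_raw.csv / validate.csv.
    """
    texts, labels = [], []
    for row in csv.DictReader(csv_file):
        texts.append(row["text"])
        labels.append(int(row["label"]))
    return texts, labels

# In-memory CSV standing in for train_raw.csv:
sample = io.StringIO("text,label\nhello world,0\ntroll tweet,1\n")
texts, labels = load_texts_and_labels(sample)
# texts == ["hello world", "troll tweet"]; labels == [0, 1]
# These lists would then be tokenised and wrapped in a Dataset
# before being handed to the fine-tuning loop in the notebook.
```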

PART 4: Batch Machine Translation with Hugging Face+MarianMT and FB/Fairseq

Machine translation doesn't generate as much excitement as other emerging areas in NLP, but recent advances have opened up interesting new possibilities in this space. Over 5 short notebooks, I'll demo a simple workflow for using Hugging Face's version of MarianMT, as well as Facebook's Fairseq toolkit for translation.

The HF-MMT demos cover:

The FB-Fairseq demos cover (Added Dec 29 2020):

Results from neural machine translation models are not (yet) as artful or precise as those by a skilled human translator. But they get 60% or more of the job done, in my view. Depending on your use case, that could be a huge time saver.

Fuller background and details in this Medium post here.

PART 3: Beginner's Guide To Building A Singlish AI Chatbot

AI text generation is one of the most exciting fields in NLP, but also a daunting one for beginners. These 4 notebooks aim to speed up the learning process for newcomers by combining and adapting various existing tutorials into a practical end-to-end walkthrough, with notebooks and sample data for a conversational chatbot that can be used in an interactive app.

  • 3.0: Data preparation
  • 3.1: Fine-tuning a pretrained DialoGPT-medium model on Colab
  • 3.2: Testing the model's performance on an interactive Dash app
  • 3.3: CPU alternative to text generation
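A detail worth knowing before diving into the notebooks above: DialoGPT models a conversation as a single string of turns separated by the tokenizer's end-of-sequence token. A minimal sketch of that formatting step, assuming the standard GPT-2 `<|endoftext|>` token:

```python
EOS = "<|endoftext|>"  # GPT-2 / DialoGPT end-of-sequence token

def build_dialogue_input(history, new_user_message):
    """Concatenate past turns and the new message, EOS-separated,
    in the format DialoGPT-style models expect."""
    turns = history + [new_user_message]
    return EOS.join(turns) + EOS

prompt = build_dialogue_input(["hello", "hi there"], "how are you?")
# "hello<|endoftext|>hi there<|endoftext|>how are you?<|endoftext|>"
# This string would be tokenised and passed to model.generate(...)
# to produce the chatbot's next reply.
```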

Fuller background and details in this Medium post here.

PART 2: Text Summarisation Of Short and Long Speeches Using Hugging Face's Pipeline

Text summarization is a far less common downstream NLP task compared to, say, classification or sentiment analysis. The resources and time needed to do it well are considerable. Hugging Face's transformers pipeline, however, has made the first part of the task much faster and more efficient. More time can then be devoted to analysing the results, and/or building your own benchmarks for assessing the summaries. This notebook incorporates minor work-arounds to handle longer speeches, which are trickier due to sequence length limits in the transformer models/pipeline.
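The work-around for longer speeches boils down to splitting the text into chunks that fit under the model's sequence limit, summarising each chunk, and joining the partial summaries. A rough sketch using word count as a proxy for the token limit - the notebook's own splitting logic may differ, and the 400-word cap here is illustrative:

```python
def chunk_text(text, max_words=400):
    """Split a long speech into chunks of at most max_words words.

    Word count is only a rough proxy for the model's token limit;
    a stricter version would count tokens with the model's tokenizer.
    """
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]

speech = "word " * 1000  # stand-in for a long speech
chunks = chunk_text(speech, max_words=400)
# 3 chunks (400, 400 and 200 words); each would be summarised
# separately, e.g.:
#   summarizer = pipeline("summarization")
#   parts = [summarizer(c)[0]["summary_text"] for c in chunks]
```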

Fuller background and details in this Medium post here.

PART 1: Sentiment Analysis Of Political Speeches Using Hugging Face's Pipeline

Sentiment analysis is a fairly common task in machine learning. Hugging Face's new pipeline feature, however, has made it incredibly easy to use a transformer-based model for this task. In this notebook, I'll explore how the HF pipeline can be used together with Plotly and Google Sheets to produce a detailed analysis of one speech, as well as how the same technique can be adapted for longer-term analysis of political speeches on one topic, or those by a common group of speakers.
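For each input text, the HF sentiment pipeline returns a `{'label', 'score'}` dict. For the kind of speech-level analysis described above, per-sentence results need to be collapsed into a single signed score before plotting; the aggregation below is my own illustration, not the notebook's method:

```python
def mean_signed_sentiment(results):
    """Average per-sentence pipeline outputs into one signed score:
    POSITIVE labels count as +score, NEGATIVE as -score."""
    signed = [
        r["score"] if r["label"] == "POSITIVE" else -r["score"]
        for r in results
    ]
    return sum(signed) / len(signed)

# Mock of what pipeline("sentiment-analysis")(sentences) returns:
fake_results = [
    {"label": "POSITIVE", "score": 0.9},
    {"label": "NEGATIVE", "score": 0.7},
    {"label": "POSITIVE", "score": 0.8},
]
score = mean_signed_sentiment(fake_results)  # (0.9 - 0.7 + 0.8) / 3
```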

Fuller background in this post here.

