TrevorDArcyEvans/themane

themane - Totally Awesome Text Summarisation!

We can take some text, analyse it, and generate a concise summary at the click of a button.
There are even some parameters to play with, so you can tweak the output to your taste, but honestly the defaults work just fine.

[Screenshots: intro, input, output]

Background:

This is a simple app to wrap three different text summarisation algorithms:

  • CodePlex.OpenTextSummarizer
  • Open Text Summarizer
  • Text Rank

The app is written in:

  • C#
  • Blazor
  • .NET Core

There is a database to authenticate users and track usage. Database creation scripts are provided for:

  • Microsoft SQL Server
  • SQLite

There are also various websites which do similar summarisations:

Building

git clone https://github.com/TrevorDArcyEvans/themane.git
cd themane
dotnet restore
dotnet build
cd Themane.Web
dotnet run

Navigate to http://localhost:5000/

Summary:

Extractive summarisation works well; abstractive summarisation does not.

Discussion:

Text summarisation is the task of generating an abstract or summary of an article. There are currently two main types of summarisation:

  • extractive summarisation takes the most relevant or important sentences from the article and uses them directly in the summary.
  • abstractive summarisation writes the summary in much the same way that a human would. This requires an understanding of both the subject matter and the language.

Extractive summarisation is well known, well understood and works reasonably well. Several implementations are available, and most previous research has focused on this technique.
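
As a rough illustration of the extractive idea, here is a minimal sketch that scores each sentence by the average frequency of its words and keeps the top few in their original order. It is written in Python for brevity and is not the code used by this app or by any of the wrapped libraries; the function names, the tiny stop-word list and the naive sentence splitter are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration only.
STOP_WORDS = {"a", "an", "and", "are", "is", "in", "it", "of", "the", "to", "on"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(sentence):
    """Lower-cased words of a sentence with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def summarise(text, max_sentences=2):
    """Return the highest-scoring sentences, joined in their original order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole text act as a crude importance measure.
    freq = Counter(w for s in sentences for w in words(s))

    def score(sentence):
        ws = words(sentence)
        # Average rather than sum, so long sentences are not favoured.
        return sum(freq[w] for w in ws) / len(ws) if ws else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)


if __name__ == "__main__":
    sample = "Cats chase mice. Cats sleep a lot. Dogs bark loudly. Cats like cats."
    print(summarise(sample, max_sentences=1))
```

Every sentence in the result is copied verbatim from the input, which is what makes the approach extractive. Real implementations typically add stemming, a proper stop-word list and better sentence detection.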

Abstractive summarisation is an emerging technique that uses artificial intelligence (AI) and machine learning (ML) methods. There has been a lot of recent activity, probably fuelled by the current interest in AI and ML. Whilst the technique shows a lot of promise, it has several issues:

  • requires a deep understanding of AI
  • needs very large training datasets
  • training is resource intensive
  • algorithms are difficult to train
  • currently only works (at all) with short articles

References:

A very small selection of articles:

A search on Google will no doubt yield other articles.