Insight Extractor

The Insight Extractor was the ML model that Considdr used to identify abstractive sentences in full-text documents on the web. Considdr closed in the summer of 2020, and we are now making our model freely available to all. We'd love to hear the interesting ways people apply this model. All we ask is that you cite this repo.

Abstractive sentences are particularly valuable for understanding the key insights in adjacent documents. For more on this summarization approach, see "Summarization by Adjacent Document."


pip install insight_extractor


  • We use TensorFlow 2.x and recommend Python 3.6 or higher.
  • Python 2 is not supported.

Using insight_extractor

v0.1.1 of insight_extractor exposes one primary function, extract_insights, which takes a list of candidate insight sentences and returns a list of prediction scores. Each score is the model's estimated probability that the corresponding sentence is an insight.


# given a list of input sentences
sentences = [
    'According to the most recent statistics, more than a million people a year are arrested for simple drug possession in the United States -- and more than half a million of those arrests are for marijuana possession.',
    'One study found that for cancer patients considering experimental chemotherapy, trust in their physician was one of the most important reasons they enrolled in a clinical trial -- on par with the belief that the treatment would be effective.',
    'Senate leaders were working to agree on a dual track to try the departing president at the same time it considered the agenda of the incoming one, an exercise never tried before.',
]

Insight Extraction

# import
from insight_extractor.pipeline import extract_insights

# get insight predictions
predictions = extract_insights(sentences)

# print predictions
print(predictions)

[0.7167318, 0.6289567, 0.01138071]

Notes on Interpretation

Of the three sample input sentences, we would label the first two as insights, but not the last. The model agrees: it assigns the first and second sentences insight probabilities of ~72% and ~63% respectively, and the third only ~1%.

Most sentences in a given article are not insight sentences. However, some sentences are more "abstractive" than others. In practice, we found that sentences predicted with >10% probability of being an insight usually have at least some abstractive value. You may want to tune the threshold for your use case and your tolerance for false positives.
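As a rough illustration of applying such a threshold, here is a minimal sketch. The helper name filter_insights and the short labels standing in for the full sentences are assumptions made for this example and are not part of the package; the scores are copied from the sample output above, and the 0.10 default simply mirrors the >10% rule of thumb.

```python
# Scores copied from the sample output above; in practice they would come
# from extract_insights(sentences).
scores = [0.7167318, 0.6289567, 0.01138071]

# Short stand-ins for the three sample sentences (for readability only).
labels = ["drug arrests", "clinical trial trust", "Senate dual track"]


def filter_insights(sentences, scores, threshold=0.10):
    """Hypothetical helper: keep (sentence, score) pairs above threshold,
    sorted from highest to lowest score."""
    if len(sentences) != len(scores):
        raise ValueError("sentences and scores must have the same length")
    kept = [(s, float(p)) for s, p in zip(sentences, scores) if p > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


for sentence, score in filter_insights(labels, scores):
    print(f"{score:.2f}  {sentence}")
```

Raising the threshold trades recall for precision: with the sample scores, a cutoff of 0.65 would keep only the first sentence.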


v0.1.X provides only the bare-minimum functionality of the Considdr insight model. The production system went further in several ways:

  1. In the actual production implementation, we took entire articles (HTML pages) as input and returned the insight sentences from each article.
  2. We leveraged the fact that multiple documents often abstract the same works and built a second, much more complex model to cluster similar insights together.
  3. We also trained various versions of our model on academic documents when we built proofs of concept for academic search engines that were interested in our technology. Citation structure enables a very clear extension of our summarization-by-adjacent-document approach.

Over time, we plan to update this package to better reflect the robustness of the Considdr product. Collaborators and contributors are welcome.
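Until article-level input is supported, something like the following could approximate the first item in the list above. This is only a sketch under stated assumptions, not the Considdr production pipeline: the names html_to_sentences and insights_from_html are hypothetical, the text extraction uses only the Python standard library, and the sentence splitter is a naive regex that will mis-split abbreviations. The scorer is passed in as a callable so the sketch runs without TensorFlow installed.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_sentences(html):
    """Hypothetical helper: strip tags, then split text into sentences."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    # Naive split: break after '.', '!' or '?' when followed by whitespace.
    return [part for part in re.split(r"(?<=[.!?])\s+", text) if part]


def insights_from_html(html, score_fn, threshold=0.10):
    """Hypothetical helper: score each sentence with score_fn and keep
    the (sentence, score) pairs whose score exceeds threshold."""
    sentences = html_to_sentences(html)
    if not sentences:
        return []
    scores = score_fn(sentences)
    return [(s, float(p)) for s, p in zip(sentences, scores) if p > threshold]
```

With the package installed, extract_insights from insight_extractor.pipeline could be passed as score_fn, for example insights_from_html(page_html, extract_insights), where page_html is a string you have already downloaded.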


This insight extraction model benefited from the hard work of many of our team members at Considdr. In particular, hand-labeling thousands of sentences and cross-validating those labels across members of our team was an especially grueling effort. Thank you to Hailey Wahl, Kevin Lane, Derek Yau, and Eddie Korando for all your help here.

A special thank you to Gaurav Sood, who encouraged us to share our model with the broader community and helped walk us through best practices for packaging ML models.

We also heavily utilized the following resources in building our CNN model.


Noah Finberg and Marcus Christiansen


The package is released under the MIT License.


A package of a simple version of the Considdr (2014-2020) ML model used to extract insights from full text documents.







