This repository has been archived by the owner on Jul 22, 2024. It is now read-only.
maheshwarishikha edited this page Dec 12, 2017 · 10 revisions

Welcome to the Engineering Insights wiki!

Umbrella SI Journey

Engineering Insights: Leverage IBM Cloud, Watson services, and open source technologies to derive insights from unstructured text content generated in various business domains

Short Name

Engineering Insights from artifacts generated in the software development and maintenance lifecycle

Short Description

Derive engineering insights from the unstructured text content of artifacts generated in the software development and maintenance lifecycle, using IBM Cloud, IBM Data Science Experience, Python NLTK, Watson services, and OrientDB

Offering Type

Cognitive

Introduction

Every domain generates a large amount of unstructured text content: the software development lifecycle, finance, healthcare, social media, and so on. Analyzing this content and correlating the information across document sources can yield valuable insights.

This composite code pattern uses Watson Natural Language Understanding, Python NLTK, OrientDB, Node-RED, and IBM Data Science Experience to build a complete analytics solution that generates insights for informed decision making.

Author

By Vishal Chahal, Manjula Hosurmath, Balaji Kadambi, Shikha Maheshwari

Code

Demo

N/A

Video

https://youtu.be/O4_eAFaYanY

Overview

This composite pattern combines several individual code patterns to derive insights from unstructured text content across various data sources. It is intended for developers who want a head start on building a complete end-to-end solution for such insights.

This composite pattern demonstrates a methodology to derive insights with IBM Cloud, Watson services, Python NLTK, OrientDB, and IBM Data Science Experience, using the code patterns named in the flow below.

Flow

  1. The unstructured text data that needs to be analyzed and correlated is extracted from the documents using custom Python code.
  2. The text is classified and tagged using the code pattern - Extend Watson text classification
  3. The text is correlated with other text using the code pattern - Correlate documents
  4. The document data and correlations are stored in the OrientDB database using the code pattern - Store, graph, and derive insights from interconnected data
  5. The analytics solution on IBM Data Science Experience is invoked and visualized using the code pattern - Orchestrate data science workflows using Node-RED
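
The first four steps of the flow can be sketched in plain Python. This is only an illustrative outline, not the code from the referenced patterns: the function names, the `TAG_KEYWORDS` vocabulary, and the sample documents are all hypothetical. A keyword lookup stands in for the Watson-based classification, shared tags stand in for document correlation, and a dictionary stands in for the OrientDB graph store.

```python
import re

# Hypothetical tag vocabulary. In the actual patterns, tags would come from
# Watson Natural Language Understanding plus custom classification.
TAG_KEYWORDS = {
    "login": {"login", "password", "authentication"},
    "payment": {"payment", "invoice", "card"},
}

def extract_text(raw):
    """Step 1: normalize raw document text (stand-in for custom extraction)."""
    return re.sub(r"\s+", " ", raw).strip().lower()

def classify(text):
    """Step 2: tag text by keyword lookup (stand-in for the classifier)."""
    tokens = set(re.findall(r"[a-z]+", text))
    return sorted(tag for tag, words in TAG_KEYWORDS.items() if tokens & words)

def correlate(tags):
    """Step 3: link every pair of documents that shares at least one tag."""
    ids = sorted(tags)
    edges = []
    for i, first in enumerate(ids):
        for second in ids[i + 1:]:
            shared = sorted(set(tags[first]) & set(tags[second]))
            if shared:
                edges.append((first, second, shared))
    return edges

def run_flow(raw_docs):
    docs = {doc_id: extract_text(raw) for doc_id, raw in raw_docs.items()}
    tags = {doc_id: classify(text) for doc_id, text in docs.items()}
    # Step 4: a plain dict stands in for the OrientDB graph store.
    return {"vertices": tags, "edges": correlate(tags)}

result = run_flow({
    "REQ-1": "User login   must require a password.",
    "DEF-7": "Authentication fails after password reset.",
    "REQ-2": "Invoice payment by card.",
})
print(result["edges"])
```

In this sample, the requirement and the defect that both concern login end up linked, while the payment requirement stays unconnected.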

Included components

  • IBM Data Science Experience: Analyze data using RStudio, Jupyter, and Python in a configured, collaborative environment that includes IBM value-adds, such as managed Spark.

  • IBM Cloud Object Storage: An IBM Cloud service that provides an unstructured cloud data store to build and deliver cost effective apps and services with high reliability and fast speed to market.

  • Watson Natural Language Understanding: An IBM Cloud service that uses natural language understanding to analyze text and extract metadata such as concepts, entities, keywords, categories, sentiment, emotion, relations, and semantic roles.

  • Node-RED: Node-RED is a programming tool for wiring together APIs and online services.

  • OrientDB: A Multi-Model Open Source NoSQL DBMS.

  • Kubernetes Clusters: An open-source system for automating deployment, scaling, and management of containerized applications.

Featured technologies

  • Data Science: Systems and scientific methods to analyze structured and unstructured data in order to extract knowledge and insights.

  • Natural Language Processing

  • Graph Database: A graph database is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A key concept of the system is the graph (or edge or relationship), which directly relates data items in the store. The relationships allow data in the store to be linked together directly, and in many cases retrieved with one operation.
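
To make the node, edge, and property vocabulary concrete, here is a minimal in-memory sketch in Python. It is not OrientDB and does not use its API; the `TinyGraph` class, the edge labels, and the sample artifacts are hypothetical and serve only to illustrate how a direct relationship lets related items be retrieved in one operation.

```python
from collections import defaultdict

class TinyGraph:
    """Hypothetical in-memory property graph, for illustration only."""

    def __init__(self):
        self.nodes = {}                     # node id -> properties
        self.out_edges = defaultdict(list)  # node id -> [(label, target, properties)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, label, target, **props):
        self.out_edges[source].append((label, target, props))

    def neighbors(self, source, label):
        """Follow edges with the given label directly from one node."""
        return [target
                for edge_label, target, _ in self.out_edges[source]
                if edge_label == label]

graph = TinyGraph()
graph.add_node("REQ-1", kind="requirement", text="User login requires a password")
graph.add_node("DEF-7", kind="defect", severity="high")
graph.add_node("TC-3", kind="testcase")
graph.add_edge("REQ-1", "has_defect", "DEF-7", score=0.8)
graph.add_edge("REQ-1", "tested_by", "TC-3")

# One lookup returns everything related to REQ-1 through a given relationship.
defects = graph.neighbors("REQ-1", "has_defect")
print(defects)
```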

Blog

Title - Explore and get insights on unstructured text data in any domain

A lot of valuable insight is missed when unstructured data is not analyzed. The volume of unstructured data generated in the digital world has grown rapidly over the last decade. To gain a competitive advantage in decision making, it is essential to extract the important insights from that data.

Here is a sample of the problems that IT specialists and data scientists encounter in their day-to-day work:

  • In an existing long-running software project, a huge number of document artifacts are generated: requirements, defects, test cases, tasks. Can the unstructured text content in the artifacts be analyzed to generate a mapping between requirements, defects, test cases, and so on? The insights generated can be used to optimize test case execution or to generate new test cases in areas where there are more defects.
  • A car has multiple systems, and every system has a text manual. Is it possible to extract important information from the manuals to answer common queries? The answers have to be augmented with appropriate responses from online portals. This requires the query text to be correlated with text in the manuals and online portals.
  • A prototype needs to be built to showcase a complete end-to-end analytics solution involving text analytics, with an interactive visualization, to a prospective customer.
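
As a rough illustration of the first problem, the sketch below maps each defect to the requirement whose text it most resembles, then counts defects per requirement to suggest where more test cases may be needed. It assumes a simple bag-of-words cosine similarity, which is a stand-in and not necessarily the correlation method used in the pattern. The function names and sample artifacts are hypothetical.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts over lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def map_defects(requirements, defects):
    """Map each defect to its most similar requirement, if any overlap exists."""
    req_vecs = {rid: vectorize(text) for rid, text in requirements.items()}
    mapping = {}
    for defect_id, text in defects.items():
        defect_vec = vectorize(text)
        best = max(req_vecs, key=lambda rid: cosine(req_vecs[rid], defect_vec))
        if cosine(req_vecs[best], defect_vec) > 0:
            mapping[defect_id] = best
    return mapping

# Hypothetical sample artifacts.
requirements = {
    "REQ-1": "user login with password",
    "REQ-2": "export report as pdf",
}
defects = {
    "DEF-1": "login fails with valid password",
    "DEF-2": "password reset breaks login",
    "DEF-3": "pdf export is blank",
}

mapping = map_defects(requirements, defects)
defect_counts = Counter(mapping.values())
print(mapping)
print(defect_counts.most_common(1))
```

A real solution would likely remove stop words and weight terms before comparing, since common words such as "with" contribute to the score here.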

The composite pattern demonstrates a methodology to build a complete end-to-end text analytics solution for such problems. The methodology can be applied to problems in any domain, and its reusable patterns can shorten the time needed to build a solution.

The pattern also demonstrates an interactive user interface built with D3.js, which allows a user to drill down for more insights on the artifacts. D3.js is a JavaScript library for producing dynamic, interactive data visualizations in web browsers. It makes use of the widely implemented SVG, HTML5, and CSS standards.

By the end of this pattern, the developer will have learned how to build an interactive text analytics solution with customization using IBM Data Science Experience, Python NLTK, IBM Cloud services, Watson services, D3.js, and OrientDB.

View the entire [Engineering Insights from artifacts generated in the software development and maintenance lifecycle](https://github.com/IBM/engineering-insights-composite-pattern/) Journey, including demos, code, and more!

Links