Neha-Setia edited this page Sep 18, 2017 · 26 revisions

Welcome to the OrientDB-Insights wiki!

Umbrella SI Journey

Engineering Insights: Leverage IBM Watson and IBM Enterprise Cloud to deliver breakthrough insights with a cognitive approach across the Application Engineering Lifecycle

Short Name

Insights from OrientDB database

Short Description

Get Insights from OrientDB using IBM Data Science Experience.

Offering Type

Cognitive

Introduction

This journey gives you a head start on working with graphs in OrientDB from IBM Data Science Experience (DSX), which lets you analyze data in Jupyter notebooks. It uses PyOrient, an OrientDB driver for Python, to operate on data and get insights from OrientDB. Graph databases are well suited to analyzing interconnections, which is why there has been a lot of interest in using them to mine data from social media. They are also useful for business data that involves complex relationships and a dynamic schema, and for creating recommendations like "customers who bought this also looked at...". This journey walks through the end-to-end flow: downloading the dataset, cleansing the data, extracting entities and relations from it, connecting to OrientDB, creating a new OrientDB database, populating it with node classes, edge classes, vertices, and relations, and then executing queries to get more insights from the database.

Author

By Vishal Chahal, Neha

Code

Demo

N/A

Video

https://www.youtube.com/watch?v=oGj2Bi_Viqo&t=15s

Overview

This journey gives you a head start on working with graphs in OrientDB from IBM Data Science Experience (DSX), which lets you analyze data in Jupyter notebooks. It uses PyOrient, an OrientDB driver for Python, to operate on data and get insights from OrientDB.

OrientDB is a multi-model database that supports graph, document, key/value, and object models, but it manages relationships as graph databases do, with direct connections between records. Graph databases are well suited to analyzing interconnections, for example when mining data from social media. They are also useful for business data that involves complex relationships and a dynamic schema, and for creating recommendations like "customers who bought this also looked at...". This gives you a great deal of flexibility to represent your data in the way that makes the most sense to everyone involved, while still making the most of the complex interactions within it.

This journey walks through the end-to-end flow: downloading the dataset, cleansing the data, extracting entities and relations from it, connecting to OrientDB, creating a new OrientDB database, populating it with node classes, edge classes, vertices, and relations, and then executing queries to get more insights from the database.
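
The cleansing and extraction steps described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (`source`, `relation`, `target`), the sample rows, and the `extract_graph` helper are all assumptions made for the example, since the layout of the dataset is not described on this page.

```python
import csv
import io

# Hypothetical sample data; the real dataset's columns may differ.
SAMPLE_CSV = """source,relation,target
Alice,follows,Bob
Bob,follows,Carol
Alice,likes,Carol
"""


def extract_graph(csv_text):
    """Return (entities, relations) parsed from CSV text.

    entities  -> sorted list of unique node names
    relations -> list of (source, relation, target) tuples
    """
    entities = set()
    relations = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Basic cleansing: trim whitespace and skip incomplete rows.
        src = (row.get("source") or "").strip()
        rel = (row.get("relation") or "").strip()
        dst = (row.get("target") or "").strip()
        if not (src and rel and dst):
            continue
        entities.update([src, dst])
        relations.append((src, rel, dst))
    return sorted(entities), relations


entities, relations = extract_graph(SAMPLE_CSV)
print(entities)
print(relations)
```

The entities become vertices and the relation tuples become edges once they are written to the database.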

In this journey we will demonstrate:

  • Setting up an IPython notebook on DSX and connecting it to OrientDB using PyOrient.
  • Performing CRUD operations on the OrientDB database and extracting insights from it.

Flow

  1. The developer sets up a Kubernetes cluster using the Kubernetes service on IBM Bluemix.
  2. An OrientDB instance is deployed with a persistent volume on the Kubernetes cluster created in the first step, exposing the ports used by OrientDB (2424 and 2480) on Bluemix.
  3. The developer creates a Jupyter notebook on IBM DSX, powered by Spark. When the notebook is created, an Object Storage instance is attached to it for storing the data the notebook uses.
  4. The developer uploads the configuration file (config.json) and the dataset (graph-insights.csv) to Object Storage.
  5. The Object Storage credentials are updated in the notebook, and the files are loaded from Object Storage to create the graph in OrientDB.
  6. The notebook communicates with OrientDB through the PyOrient driver, and functions written in the notebook perform various operations on the database.
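
The last step of the flow might look something like the sketch below. It is an illustration under stated assumptions, not the journey's notebook code: the helper names (`quote`, `open_database`, `populate`), the `name` property, and the class names in the usage note are hypothetical, and the string escaping is deliberately simplified. The PyOrient calls used (`OrientDB`, `connect`, `db_exists`, `db_create`, `db_open`, `command`) are the driver's client methods, and port 2424 is the binary-protocol port the driver talks to.

```python
def quote(value):
    """Quote a string literal for OrientDB SQL (simplified escaping)."""
    return "'" + str(value).replace("\\", "\\\\").replace("'", "\\'") + "'"


def open_database(host, port, user, password, db_name):
    """Connect to OrientDB and open db_name, creating it if it is missing."""
    import pyorient  # imported lazily so populate() can be tried without it

    client = pyorient.OrientDB(host, port)
    client.connect(user, password)
    if not client.db_exists(db_name, pyorient.STORAGE_TYPE_PLOCAL):
        client.db_create(db_name, pyorient.DB_TYPE_GRAPH,
                         pyorient.STORAGE_TYPE_PLOCAL)
    client.db_open(db_name, user, password)
    return client


def populate(client, node_class, relations):
    """Create a node class, edge classes, vertices, and edges.

    relations is a list of (source, relation, target) tuples; every
    relation name becomes an edge class. Class names are checked because
    they are placed in the SQL text unquoted.
    """
    edge_classes = sorted({rel for _, rel, _ in relations})
    for class_name in [node_class] + edge_classes:
        if not class_name.isidentifier():
            raise ValueError("unsafe class name: %r" % class_name)

    client.command("CREATE CLASS %s EXTENDS V" % node_class)
    for edge_class in edge_classes:
        client.command("CREATE CLASS %s EXTENDS E" % edge_class)

    names = sorted({n for src, _, dst in relations for n in (src, dst)})
    for name in names:
        client.command("CREATE VERTEX %s SET name = %s"
                       % (node_class, quote(name)))

    for src, rel, dst in relations:
        client.command(
            "CREATE EDGE %s FROM (SELECT FROM %s WHERE name = %s) "
            "TO (SELECT FROM %s WHERE name = %s)"
            % (rel, node_class, quote(src), node_class, quote(dst)))
```

With a reachable OrientDB server, a call such as `populate(open_database(host, 2424, user, password, "insights"), "Person", relations)` would build the graph, where the host, credentials, and database name are placeholders for your own values. Because `populate` only needs an object with a `command` method, it can also be exercised with a stand-in client before pointing it at a real database.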

Included components

  • OrientDB: A multi-model open source NoSQL DBMS.

  • IBM Data Science Experience: Analyze data using RStudio, Jupyter, and Python in a configured, collaborative environment that includes IBM value-adds, such as managed Spark.

  • Bluemix Object Storage: A Bluemix service that provides an unstructured cloud data store to build and deliver cost-effective apps and services with high reliability and fast speed to market.

  • Jupyter Notebooks: An open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text.

  • Kubernetes Clusters: An open-source system for automating deployment, scaling, and management of containerized applications.

Featured technologies

  • Data Science: Systems and scientific methods to analyze structured and unstructured data in order to extract knowledge and insights.

  • Graph Database: A graph database is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A key concept of the system is the graph (or edge or relationship), which directly relates data items in the store. The relationships allow data in the store to be linked together directly, and in many cases retrieved with one operation.
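
To make the terms nodes, edges, and properties concrete, here is a tiny in-memory sketch in plain Python. It is only an illustration of the data model, with made-up names and values, and it is not how OrientDB stores data: a graph database keeps direct links between records, whereas this toy version scans a list of edges.

```python
# Vertices: id -> properties (all values here are invented for the example).
vertices = {
    "alice": {"tweets": 120},
    "bob": {"tweets": 45},
    "carol": {"tweets": 300},
}

# Edges: (source, label, target) triples relating the vertices directly.
edges = [
    ("alice", "follows", "bob"),
    ("bob", "follows", "carol"),
    ("carol", "follows", "alice"),
    ("alice", "follows", "carol"),
]


def out_neighbours(vertex, label):
    """Vertices reached by following `label` edges out of `vertex`."""
    return [dst for src, lbl, dst in edges if src == vertex and lbl == label]


def in_neighbours(vertex, label):
    """Vertices that have a `label` edge pointing at `vertex`."""
    return [src for src, lbl, dst in edges if dst == vertex and lbl == label]


print(out_neighbours("alice", "follows"))  # whom alice follows
print(in_neighbours("carol", "follows"))   # who follows carol
```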

Blog

Graphs are already prevalent in the real world and in software development. For example, if you are a Twitter user, you are one node of the Twitter graph. Your attributes include the number of tweets you have written and the number of people you follow, while whom you follow and who follows you are the relationships between you and other Twitter users. In other words, you are dealing with a graph. Graph databases are currently gaining a lot of interest because they offer powerful data modeling tools that provide a closer fit to how your data works in the real world. OrientDB is a multi-model database that supports graph, document, key/value, and object models, but it manages relationships as graph databases do, with direct connections between records.

This journey gives you a head start on working with graphs in OrientDB from IBM Data Science Experience (DSX), which lets you analyze data in Jupyter notebooks. It provides a guide to setting up an IPython notebook on DSX, connecting it to OrientDB, and performing CRUD operations on the database using PyOrient, an OrientDB driver for Python. The journey walks through the end-to-end flow: downloading the dataset, cleansing the data, extracting entities and relations from it, connecting to OrientDB, creating a new OrientDB database, populating it with node classes, edge classes, vertices, and relations, and then executing queries to get more insights from the database.

By the end of this journey, you will have a good understanding of OrientDB, which you can extend to create your own domain-specific knowledge graph for your business requirements and extract interesting information from it.

View the entire [Orient DB operations on IBM Data Science Experience](https://github.com/IBM/graph-db-insights/) Journey, including demos, code, and more!

Links