General Starting Introduction To The Project #2

Closed
Mec-iS opened this Issue Feb 10, 2017 · 7 comments

4 participants
@Mec-iS
Contributor

Mec-iS commented Feb 10, 2017

As the first students have expressed interest in the project, here are some more details about where things stand at this very early stage.

The objective of this project is to create a demo Web API implementing the HYDRA draft, an RDF-based framework. The entities defined in the spec are meant to describe the structure and usage of a generic Web API, so that a HYDRA-enabled ("intelligent" or "smart") client can connect to the API's entrypoint and automatically find out where and how to find the data it needs.

In this scenario the layers involved are:
A. a HYDRA server that can serve data and metadata to a client (this layer can be split into a traditional lower-level server relying on a graph database plus a "HYDRA middleware"),
B. a client that can "understand" HYDRA metadata and connect to HYDRA-enabled services, and possibly "learn and remember" from past interactions with other services, building its own set of concepts to use in its domain.
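
As a rough sketch of layer A, assuming a plain WSGI application (the stack is not decided yet) and the hypothetical paths `/api` and `/api/vocab`, the server could advertise its API documentation through an HTTP `Link` header whose relation is the Hydra `apiDocumentation` IRI. The response bodies here are placeholders, not real API descriptions.

```python
import json
from wsgiref.util import setup_testing_defaults

HYDRA = "http://www.w3.org/ns/hydra/core#"

# Hypothetical paths, chosen only for this illustration.
ENTRYPOINT_PATH = "/api"
API_DOC_PATH = "/api/vocab"


def app(environ, start_response):
    """Serve a tiny entrypoint and point clients at the API documentation."""
    path = environ.get("PATH_INFO", "")
    context = {"hydra": HYDRA}
    if path == ENTRYPOINT_PATH:
        body = {"@context": context, "@id": ENTRYPOINT_PATH}
    elif path == API_DOC_PATH:
        body = {"@context": context, "@id": API_DOC_PATH,
                "@type": "hydra:ApiDocumentation"}
    else:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    headers = [
        ("Content-Type", "application/ld+json"),
        # Lets a client discover the metadata from any response.
        ("Link", '<%s>; rel="%sapiDocumentation"' % (API_DOC_PATH, HYDRA)),
    ]
    start_response("200 OK", headers)
    return [json.dumps(body).encode("utf-8")]


def call(path):
    """Invoke the app in-process (no network); return status, headers, body."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = call(ENTRYPOINT_PATH)
print(status, headers["Link"])
```

The "HYDRA middleware" mentioned above would replace the placeholder bodies with descriptions generated from the graph database.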

The general objective is to let different HYDRA-enabled clients exchange data with each other. These clients can run on any kind of machine, but the focus of this automation is IoT (connected) devices (industrial, consumer or research).
Usage scenario:

  • The IoT client X needs to know at which TIME the OPERATION Y was performed by the DEVICE Z. All it knows at the start is the URI of the API entrypoint.
  • Client X fetches the metadata from the entrypoint and finds out that, to get TIME FOR Y ON Z, it needs to request the endpoint http://entrypoint/gettime with method GET, passing Y and Z as parameters.
  • The client makes the request to pull the data.
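
The scenario above can be sketched as follows. The metadata is a hypothetical, hand-written dictionary standing in for what the entrypoint would return. The `hydra:` terms (`operation`, `method`, `returns`) come from the Hydra Core vocabulary, while the `ex:` names, the query parameter names and both helper functions are assumptions made for this illustration.

```python
from urllib.parse import urlencode

# Hypothetical metadata, standing in for what the entrypoint would return.
METADATA = {
    "@context": {
        "hydra": "http://www.w3.org/ns/hydra/core#",
        "ex": "http://entrypoint/vocab#",
    },
    "@id": "http://entrypoint/",
    "ex:operationTime": {
        "@id": "http://entrypoint/gettime",
        "hydra:operation": [
            {"@type": "hydra:Operation",
             "hydra:method": "GET",
             "hydra:returns": "ex:Time"},
        ],
    },
}


def find_request(metadata, wanted_type):
    """Return (method, url) of the first linked resource that returns wanted_type."""
    for key, value in metadata.items():
        # Skip JSON-LD keywords such as @context and @id, and plain values.
        if key.startswith("@") or not isinstance(value, dict):
            continue
        for op in value.get("hydra:operation", []):
            if op.get("hydra:returns") == wanted_type:
                return op["hydra:method"], value["@id"]
    return None


def build_url(base, operation, device):
    """Attach Y and Z as query parameters (parameter names are assumed)."""
    return base + "?" + urlencode({"operation": operation, "device": device})


method, url = find_request(METADATA, "ex:Time")
print(method, build_url(url, "Y", "Z"))
```

The client starts with nothing but the entrypoint URI; the endpoint and the method both come out of the metadata rather than being hard-coded.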

Different concepts and classes are involved. An RDF domain has to be defined for the metadata exchange to work.

To make the demo interesting, I suggest we leverage space exploration and astronomy, so the graph can be based on these vocabularies: https://github.com/chronos-pramantha/RDFvocab/blob/master/ld%2Bjson/Spacecraft.json I can suggest possible operations to request from the API. The resources are well connected to popular repositories, so we can reach a great amount of knowledge without storing too much.

A very good starting design for the server is https://github.com/antoniogarrote/levanzo
A Python implementation for the server client: https://github.com/pchampin/hydra-py

Please list questions and comments below.

Some resources:

  • a blog post about documenting a Web API and programming a client

PS. The stack has yet to be decided. We should lean towards a full-Python implementation, except for the lower layer, where a graph database is required. We are free to experiment, so anything can be suggested at the beginning. My first impression is that I would avoid triple stores and try a graph database, or prototype something with Apache TinkerPop or Spark GraphX, to keep the flexibility to switch to different solutions. At first I would prefer not to worry about stability and scalability, but just try to reach a first working tool that can then be iterated on.

UPDATE: For better insight, one of the proposed designs is described in #3

UPDATE:
There are different possible designs that I am proposing. I would like to discuss with all of you, students and mentors, which one is the most interesting and viable:

  1. (Astronomy-based) the one you have written about is an idea that comes from the quite recent developments in planetary science concerning exoplanets. If you have a list of star systems with some observed planets, you can have your REST API create the observed star and the observed planets orbiting that star. This implementation uses the Astronomy vocabulary.
  2. (Engineering-based) the one described in #3 is instead the first idea I had. It is about designing simulated spacecraft spare parts (CubeSat COTS) and serving these parts through a REST API. In this case users could create their own parts and put them together (with physical constraints applied) to build their own spacecraft. This implementation uses the Spacecraft and SubSystems vocabulary.
  3. (NLP-based) an idea about a semantic engine that can translate human questions about the Solar System into queries for no. 1 above and reply consistently; e.g. the user asks "How much bigger is Jupiter than Earth?" and the server/client is able to reply "Earth has a mass of 5.97 × 10^24 kg. Jupiter has a mass of 1.8986 × 10^27 kg".
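
For the third design, the reply could go one step further and compare the two values directly. A minimal sketch, using only the two masses quoted in the example (the dictionary and the function name are hypothetical, and a real engine would fetch the values from the API rather than hard-code them):

```python
# Masses in kilograms, taken from the example reply above.
MASS_KG = {"Earth": 5.97e24, "Jupiter": 1.8986e27}


def mass_ratio(body, reference):
    """Return how many times more massive `body` is than `reference`."""
    return MASS_KG[body] / MASS_KG[reference]


ratio = mass_ratio("Jupiter", "Earth")
print("Jupiter is about %.0f times as massive as Earth." % ratio)
```

With these figures the ratio comes out at roughly 318.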

UPDATE: Gitter chat available here

UPDATE: Check also this architectural proposal

@Mec-iS Mec-iS added this to the Getting-into-GSOC milestone Feb 10, 2017

@chrizandr

Member

chrizandr commented Feb 10, 2017

If I may, I would recommend using rdflib. There is also Apache Jena, which is good, but it does not provide a Python API. rdflib has a learning curve, but all in all it is much more powerful than any other tool I have used.
As for the stack, would it be good to use Django? It is pretty robust and would help people focus on the actual aim of the project rather than on the intricacies of the stack itself.
Please let me know if my suggestions are useful. :)

@kkoci

Member

kkoci commented Feb 11, 2017

There is a cool graph database engine called OrientDB: https://github.com/orientechnologies/orientdb It also has a Python client.

@Mec-iS Mec-iS added the wiki label Feb 13, 2017

@pchampin

pchampin commented Feb 23, 2017

A few comments about https://github.com/pchampin/hydra-py
@Mec-iS it is a client implementation, not a server implementation
@chrizandr it is based on rdflib

@chrizandr

Member

chrizandr commented Feb 23, 2017

@Mec-iS suggested that I post this here for better feedback.

I have been going through the Hydra proposal for the last couple of days, trying to understand what both Hydrus and Hydra are.

Here is all that I have understood:

Hydra is an XML namespace that uses RDF and provides tags to define Linked Data in a graph database. Hydra also allows us to define API documentation along with other types of data, which would allow us to expose server APIs so that clients can exchange data with servers without being limited to following hyperlinks.
The main aim of Hydra is to provide a semantic understanding of what the content of each hyperlink is, allowing clients to dereference suitable hyperlinks and ignore unnecessary ones. Hydra also allows HTTP operations directly between the client and server.

Hydrus is a Python-based web app that is used to demonstrate the capabilities of Hydra using a space exploration example.
The general workflow of the app (from my understanding) would be:

  • Users give textual input in their natural language.
  • Hydrus processes this input and matches it to relevant classes and operations using NLP and machine learning.
  • Hydrus gets the API for the relevant operation from the server.
  • Hydrus asks the server to perform the operations on the relevant classes and gives the output directly to the user.

This is similar to the example given on GitHub, where a user requests the distance between Mars and Earth.

Please let me know if what I have understood is correct or not.

If possible, I would also like to know what more I can do to work/start working on this project for GSoC.

@pchampin

pchampin commented Feb 24, 2017

@chrizandr, you wrote

Hydra is an XML namespace

That's not strictly correct, as Hydra has no (direct) relation to XML. Hydra is a vocabulary, i.e. a set of IRIs (in RDF/Linked Data, we use IRIs to identify anything of interest: classes, attributes and relations, instances, datatypes...).
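
To illustrate what "a set of IRIs" means in practice: a compact name such as `hydra:Class` is shorthand for a full IRI in the Hydra Core namespace. A small sketch follows; the `expand` helper is hypothetical and only mimics what an RDF library such as rdflib does through its own namespace handling.

```python
# Prefix-to-namespace mapping; both IRIs are the published namespace IRIs.
PREFIXES = {
    "hydra": "http://www.w3.org/ns/hydra/core#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(compact):
    """Expand 'prefix:local' to a full IRI; leave anything else untouched."""
    prefix, sep, local = compact.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return compact


for term in ("hydra:Class", "hydra:Operation", "rdf:type"):
    print(term, "->", expand(term))
```

No XML is involved at any point: the same IRIs can be written in JSON-LD, Turtle or any other RDF serialization.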

allowing clients to dereference suitable hyperlinks

More generally, Hydra describes the available HTTP operations and what they mean/do. Dereferencing a hyperlink is just one particular HTTP operation (GET), although indeed it is the most common.
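
As a sketch of that point, the following builds HTTP requests from a hypothetical resource description that lists its operations with `hydra:method`; GET (dereferencing) is just one of them. The resource URL, the `ex:Part` type and the `build_request` helper are made up for illustration, and nothing is sent over the network.

```python
import json
import urllib.request

# Hypothetical description of one resource and the operations it advertises.
RESOURCE = {
    "@id": "http://example.org/api/parts/1",
    "hydra:operation": [
        {"hydra:method": "GET"},
        {"hydra:method": "PUT", "hydra:expects": "ex:Part"},
        {"hydra:method": "DELETE"},
    ],
}


def build_request(resource, method, payload=None):
    """Build (but do not send) a request for an advertised operation."""
    allowed = {op["hydra:method"] for op in resource["hydra:operation"]}
    if method not in allowed:
        raise ValueError("%s is not advertised for %s" % (method, resource["@id"]))
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/ld+json"} if data is not None else {}
    return urllib.request.Request(
        resource["@id"], data=data, headers=headers, method=method
    )


for verb in ("GET", "DELETE"):
    request = build_request(RESOURCE, verb)
    print(request.get_method(), request.full_url)
```

A client written this way never guesses which verbs are available; it only uses the ones the description advertises.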

But overall, I think you got it right :)

@Mec-iS

Contributor

Mec-iS commented Feb 24, 2017

@chrizandr

The general workflow of the app (from my understanding) would be:

  • Users give textual input in their natural language.
  • Hydrus processes this input and matches it to relevant classes and operations using NLP and machine learning.
  • Hydrus gets the API for the relevant operation from the server.
  • Hydrus asks the server to perform the operations on the relevant classes and gives the output directly to the user.

There are different possible designs that I am proposing. I would like to discuss with all of you, students and mentors, which one is the most interesting and viable:

  1. (Astronomy-based) the one you have written about is an idea that comes from the quite recent developments in planetary science concerning exoplanets. If you have a list of star systems with some observed planets, you can have your REST API create the observed star and the observed planets orbiting that star. This implementation uses the Astronomy vocabulary.
  2. (Engineering-based) the one described in #3 is instead the first idea I had. It is about designing simulated spacecraft spare parts (CubeSat COTS) and serving these parts through a REST API. In this case users could create their own parts and put them together (with physical constraints applied) to build their own spacecraft. This implementation uses the Spacecraft and SubSystems vocabulary.

I would like to have your opinion on which one (or both?) would best express and test Hydra's features. I would be happy to see both working, but if I had to choose I would say no. 2, because its domain is much better defined and more limited, which is a good thing for a test.

@Mec-iS

Contributor

Mec-iS commented Feb 24, 2017

@chrizandr

If possible, I would also like to know what more I can do to work/start working on this project for GSOC

Keep studying the spec and asking questions (here, or by writing to the public HYDRA list, public-hydra@w3.org, and introducing yourself). Start studying how Levanzo implements a server in Clojure to serve a HYDRA API, and take as much inspiration as possible from it to develop a similar server in Python.
Then you can fork the repository and start coding. Your PR will be reviewed, commented on and finally merged.

@pchampin pchampin referenced this issue Feb 25, 2017

Closed

Backend choice #6

@Mec-iS Mec-iS closed this Feb 25, 2018

xadahiya pushed a commit that referenced this issue Mar 6, 2018

xadahiya added a commit that referenced this issue Oct 2, 2018
