### ***What is ORKG?***

We build next-generation digital libraries for semantic scientific knowledge communicated in scholarly literature. We focus on the communicated content rather than its context (e.g., the people and institutions involved), and that content is semantic, i.e., machine-interpretable.

Scientific knowledge continues to be confined to the document, seemingly as inseparable from its medium as hieroglyphs carved in stone. The global scientific knowledge base is little more than a collection of documents, written by humans for humans, as it has been for a long time. This makes perfect sense: after all, the audience is made up of people, and researchers in particular.

Yet, given the monumental progress in information technology over recent decades, one may wonder why the scientific knowledge communicated in scholarly literature remains largely inaccessible to machines. Surely it would be useful if some of that knowledge were more readily available for automated processing.

The Open Research Knowledge Graph (ORKG) project is working on answers and solutions. The recently initiated, TIB-coordinated project is open to the community and actively engages research infrastructures and research communities in developing technologies and use cases for open graphs about research knowledge.



### ***Who can use the orkg package?***

The short answer is anybody. The package is designed to be minimalistic; basic knowledge of Python and JSON is enough to understand how it works.

You can use the ORKG package to add, edit, and list data on any instance of the Open Research Knowledge Graph. It integrates easily into data science workflows and can fetch data for analysis and visualization.

The sky is your limit :)

# ***Installation***

The ORKG Python package is a simple wrapper for the ORKG API.

Start using it by running this command:

In [None]:
!pip install orkg

Collecting orkg
  Downloading orkg-0.18.0-py3-none-any.whl (46 kB)
Collecting Deprecated<2.0.0,>=1.2.14 (from orkg)
  Downloading Deprecated-1.2.14-py2.py3-none-any.whl (9.6 kB)
Collecting Faker<20.0.0,>=19.1.0 (from orkg)
  Downloading Faker-19.6.2-py3-none-any.whl (1.7 MB)
Collecting Inflector<4.0.0,>=3.1.0 (from orkg)
  Downloading Inflector-3.1.0-py3-none-any.whl (12 kB)
Collecting cardinality<0.2.0,>=0.1.1 (from orkg)
  Downloading cardinality-0.1.1.tar.gz (2.3 kB)
  Preparing metadata (setup.py) ... done
Collecting hammock<0.3.0,>=0.2.4 (from orkg)
  Downloading hammock-0.2.4.tar.gz (4.8 kB)
  Preparing metadata (setup.py) ... done
Collecting pandas<3.0.0,>=2.0.1 (from orkg)
  Downloading pandas-2.1.1-cp310-cp

# ***Usage***

To use the package in your Python code, import it and create an instance of the base class.

In [None]:
from orkg import ORKG # import base class from package

orkg = ORKG(host="<host-address-is-here>", creds=('email-address', 'password')) # create the connector to the ORKG

ValueError: ignored

The package can connect to any ORKG instance, local or remote. The host parameter accepts either an address or an environment name (see below).
Optionally, you can provide credentials to authenticate your requests. If host is not specified, it defaults to sandbox.
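
One way to avoid hardcoding the host and credentials in a notebook is to read them from environment variables. The following is a minimal sketch, not part of the orkg package: the helper name `orkg_connection_kwargs` and the variable names `ORKG_HOST`, `ORKG_EMAIL`, and `ORKG_PASSWORD` are assumptions made for this example. Only the `host` and `creds` keyword arguments come from the usage shown above.

```python
import os


def orkg_connection_kwargs(env=os.environ):
    """Build keyword arguments for ORKG(...) from environment variables.

    ORKG_HOST, ORKG_EMAIL and ORKG_PASSWORD are hypothetical names chosen
    for this sketch; they are not defined by the orkg package.
    """
    kwargs = {}
    host = env.get("ORKG_HOST")
    if host:
        # Leaving host out lets the package fall back to its default (sandbox)
        kwargs["host"] = host
    email, password = env.get("ORKG_EMAIL"), env.get("ORKG_PASSWORD")
    if email and password:
        # Credentials are optional; pass them only when both parts are set
        kwargs["creds"] = (email, password)
    return kwargs


# Usage (requires the orkg package and network access):
# from orkg import ORKG
# orkg = ORKG(**orkg_connection_kwargs())
```

With no variables set, the helper returns an empty dictionary, so the connector would use its default host without authentication.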

# ***Host***

Three different host environments are available:


1. ***Production:*** The most stable version of the ORKG. Use this host to access the current version of the graph and add persistent data.

In [None]:
# from orkg import ORKG, Hosts # import base class from package

# orkg = ORKG(host=Hosts.PRODUCTION) # create the connector to the ORKG



2. ***Sandbox:*** The ORKG playground! It has the same features that exist on production at any given time. Use this host to experiment with the ORKG or Python package features without adding data to the main graph. Data added to sandbox is periodically deleted, and this version of the ORKG may not contain all of the data found in the production graph.



In [None]:
from orkg import ORKG, Hosts # import base class from package

orkg = ORKG(host=Hosts.SANDBOX) # create the connector to the ORKG


3. ***Incubating:*** The ORKG version with the newest features being tested. Since features are still under development, this version may break or not function as expected. Data added to incubating is periodically deleted, and this version of the ORKG may not contain all of the data found in the production graph.



In [None]:
from orkg import ORKG, Hosts # import base class from package

orkg = ORKG(host=Hosts.INCUBATING) # create the connector to the ORKG

There are three main classes to know when using the ORKG Python package:



1. ORKG class (main class to connect to an ORKG instance).
2. OrkgResponse (output encapsulation for the ORKG API).
3. OrkgUnpaginatedResponse (output encapsulation for pageable endpoints of the ORKG API).


In [None]:
connector = orkg # reuse the ORKG instance created above
connector.resources # entry point to manipulate ORKG resources
connector.predicates # entry point to manipulate ORKG predicates
connector.classes # entry point to manipulate ORKG classes
connector.literals # entry point to manipulate ORKG literals
connector.stats # entry point to get ORKG statistics
connector.statements # entry point to manipulate ORKG statements
connector.papers # entry point to manipulate ORKG papers
connector.comparisons # entry point to manipulate ORKG comparisons
connector.objects # entry point to manipulate ORKG objects
connector.templates # entry point to manipulate ORKG templates (Alpha)
connector.harvesters # entry point to run harvesting functionalities on top of ORKG data (Alpha)

The other main component, OrkgResponse, represents the output of every request made through the package.

In [None]:
connector.ping() # checks the availability of the ORKG; returns True if the request succeeds, False otherwise

In [None]:
connector.resources.get()  # this call will fetch a collection of resources from the ORKG
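
Calls such as the one above return a response object rather than raw data, so it can help to check the outcome before reading the payload. The sketch below assumes the response exposes `succeeded` and `status_code` attributes in addition to the `content` attribute used elsewhere in this notebook; verify these names against the package documentation. `StubResponse` and `content_or_raise` are hypothetical names introduced here so the example runs without network access.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class StubResponse:
    """Stand-in for the package's response object, used only in this sketch."""
    succeeded: bool
    content: Any
    status_code: int = 200


def content_or_raise(response):
    """Return response.content, or raise if the request did not succeed."""
    # getattr with a default keeps the helper usable if an attribute is missing
    if not getattr(response, "succeeded", True):
        code = getattr(response, "status_code", "unknown")
        raise RuntimeError(f"ORKG request failed (status {code})")
    return response.content


ok = StubResponse(succeeded=True, content=[{"id": "R36109"}])
print(content_or_raise(ok)[0]["id"])
```

In a live session the same helper could wrap a real call, for example `content_or_raise(connector.resources.get())`.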

Example apps: Streamlit

Welcome to Streamlit 👋
A faster way to build and share data apps.


In [None]:
!pip install streamlit

Collecting streamlit
  Downloading streamlit-1.27.2-py2.py3-none-any.whl (7.6 MB)
Collecting validators<1,>=0.2 (from streamlit)
  Downloading validators-0.22.0-py3-none-any.whl (26 kB)
Collecting gitpython!=3.1.19,<4,>=3.0.7 (from streamlit)
  Downloading GitPython-3.1.37-py3-none-any.whl (190 kB)
Collecting pydeck<1,>=0.8.0b4 (from streamlit)
  Downloading pydeck-0.8.1b0-py2.py3-none-any.whl (4.8 MB)
Collecting watchdog>=2.1.5 (from streamlit)
  Downloading watchdog-3.0.0-py3-none-manylinux2014_x86_64.whl (82 kB)

In [None]:
! streamlit hello


Collecting usage statistics. To deactivate, set browser.gatherUsageStats to False.

  Welcome to Streamlit. Check out our demo in your browser.

  Network URL: http://172.28.0.12:8501
  External URL: http://34.150.230.180:8501

  Ready to create your own Python apps super quickly?
  Head over to https://docs.streamlit.io

  May you create awesome apps!

  Stopping...


Example functions from Yaser

In [None]:
def get_paper_via_doi(doi: str):
  # Look up the literal whose value is exactly this DOI
  dois = orkg.literals.get_all(q=doi, exact=True).content
  if len(dois) == 0:
    return None
  # P26 is the DOI predicate; its id was found here: https://orkg.org/resource/R36109?noRedirect
  statements = orkg.statements.get_by_object_and_predicate(predicate_id="P26", object_id=dois[0]["id"]).content
  # Guard against a DOI literal that is not attached to any paper
  return statements[0]["subject"] if statements else None

get_paper_via_doi("10.1101/2020.03.03.20029983")

{'id': 'R36109',
 'label': 'Transmission interval estimates suggest pre-symptomatic spread of COVID-19',
 'classes': ['Paper', 'FeaturedPaper'],
 'shared': 0,
 'featured': True,
 'unlisted': False,
 'verified': True,
 'extraction_method': 'UNKNOWN',
 '_class': 'resource',
 'created_at': '2020-04-08T15:56:42.986436+02:00',
 'created_by': '3a55ad95-645b-4c1f-b614-d2293718ee0b',
 'observatory_id': '06144227-7efc-478c-8555-25147020f02f',
 'organization_id': 'edc18168-c4ee-4cb8-a98a-136f748e912e',
 'formatted_label': None}
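
The resource dictionary returned above carries many bookkeeping fields. For display or analysis it is often enough to keep the identifying ones. The sketch below is only an illustration: `summarize_resource` is a hypothetical helper, and the URL pattern is assumed from the resource link quoted in the function's comment, so it applies to the production instance only.

```python
def summarize_resource(resource: dict) -> dict:
    """Keep only the fields that identify a resource and say what it is."""
    return {
        "id": resource["id"],
        "label": resource["label"],
        "classes": list(resource.get("classes", [])),
        # Assumed URL pattern, based on https://orkg.org/resource/R36109
        "url": f"https://orkg.org/resource/{resource['id']}",
    }


# A trimmed copy of the dictionary shown above
paper = {
    "id": "R36109",
    "label": "Transmission interval estimates suggest pre-symptomatic spread of COVID-19",
    "classes": ["Paper", "FeaturedPaper"],
    "verified": True,
}
print(summarize_resource(paper))
```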

In [None]:
def get_contributions_of_paper(paper_id: str):
  if orkg.resources.exists(paper_id):
    # P31 is the predicate that connects papers to contributions
    # It can be found when exploring any paper resource in the ORKG
    statements = orkg.statements.get_by_subject_and_predicate(subject_id=paper_id, predicate_id="P31").content
    return [st["object"] for st in statements]
  else:
    raise ValueError("Paper doesn't exist in the system!")

get_contributions_of_paper("R36109")

[{'id': 'R306172',
  'label': 'Contribution 36',
  'classes': ['Contribution'],
  'shared': 1,
  'featured': False,
  'unlisted': False,
  'verified': False,
  'extraction_method': 'UNKNOWN',
  '_class': 'resource',
  'created_at': '2023-06-21T12:04:00.092248+02:00',
  'created_by': '00000000-0000-0000-0000-000000000000',
  'observatory_id': '00000000-0000-0000-0000-000000000000',
  'organization_id': '00000000-0000-0000-0000-000000000000',
  'formatted_label': None},
 {'id': 'R308579',
  'label': 'Contribution 110',
  'classes': ['Contribution'],
  'shared': 1,
  'featured': False,
  'unlisted': False,
  'verified': False,
  'extraction_method': 'UNKNOWN',
  '_class': 'resource',
  'created_at': '2023-08-09T12:04:27.355753+02:00',
  'created_by': '00000000-0000-0000-0000-000000000000',
  'observatory_id': '00000000-0000-0000-0000-000000000000',
  'organization_id': '00000000-0000-0000-0000-000000000000',
  'formatted_label': None},
 {'id': 'R308578',
  'label': 'Contribution 109',
  '

# Ildar: examples of a prototype

In [None]:
from collections import Counter

orkg = ORKG(host=Hosts.PRODUCTION)

#cont_id = 'R8186'
#if orkg.resources.exists(cont_id):
  #response = orkg.contributions.similar(cont_id=cont_id)
  #response = orkg.statements.get_by_subject(subject_id=cont_id)
  #statements = response.content
  #[print(x) for x in statements]
#else:
#  print("Contribution doesn't exist in the system!")

# label = ''
# while label != 'q':
#   label = input('Enter a predicate label')
#   pred = orkg.predicates.get(q=label, exact=False, size=1).content
#   print(pred)

def get_filter(prop_id, obj_label):
  # Search resources by label and pair the first match with the given property id
  objs = orkg.resources.get(q=obj_label, exact=False, size=2).content
  if objs:
    first = objs[0]
    print(first['id'])
    return (prop_id, first['id'])
  print('No objects found')
  return None


filters = []

#filters.append(get_filter('P32', 'KG Completion')) # Research problem
#filters.append(get_filter('P2005', 'Imagenet')) # Dataset
#filters.append(get_filter('P34', 'Accuracy')) # Evaluation
#filters.append(get_filter('P105047', 'Self-supervised')) # Training type
filters.append(('P32', 'R573979')) # research problem text classification
filters.append(('P105047', 'R573991')) # has training type self-supervised

conts = []
for pred, obj in filters:
  # Collect the subject (contribution) of every statement matching this filter
  response = orkg.statements.get_by_object_and_predicate(object_id=obj, predicate_id=pred).content
  if response:
    for cont in response:
      conts.append(cont['subject']['id'])

# Count how often each contribution appears across the filters
print(Counter(conts))

Counter({'R575731': 1, 'R575728': 1, 'R575663': 1, 'R575660': 1, 'R575612': 1, 'R575564': 1, 'R575508': 1, 'R575491': 1, 'R575462': 1, 'R575414': 1, 'R575220': 1, 'R575187': 1, 'R575082': 1, 'R575034': 1, 'R574956': 1, 'R574941': 1, 'R574739': 1, 'R574691': 1, 'R574571': 1, 'R574445': 1, 'R573124': 1, 'R573855': 1, 'R573853': 1, 'R573851': 1, 'R573849': 1, 'R573843': 1, 'R573839': 1, 'R573837': 1, 'R573831': 1, 'R573829': 1, 'R573825': 1, 'R573823': 1, 'R573819': 1, 'R573817': 1, 'R573809': 1, 'R573807': 1, 'R573800': 1, 'R573798': 1, 'R573796': 1, 'R573794': 1})
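
In the output above every count is 1, so no contribution matched both filters. To pick out only the contributions that satisfy every filter, the per-filter results can be intersected. The following is a minimal sketch on plain lists of ids; `matching_all` is a hypothetical helper and the ids in the example are invented for illustration, not real query results.

```python
from collections import Counter


def matching_all(subject_ids_per_filter):
    """Return the ids that appear in the results of every filter, sorted."""
    counts = Counter()
    for ids in subject_ids_per_filter:
        # De-duplicate within one filter so repeated statements count once
        counts.update(set(ids))
    n = len(subject_ids_per_filter)
    return sorted(sid for sid, c in counts.items() if c == n)


# Hypothetical ids for illustration only
per_filter = [["R1", "R2", "R3"], ["R2", "R3", "R3"], ["R3", "R2", "R9"]]
print(matching_all(per_filter))  # ['R2', 'R3']
```

To use it with the loop above, collect the subject ids of each filter into a separate list instead of appending them all to `conts`.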


***To Do:*** Develop a prototype based on the group idea.

Give a general description of all the hackathon ideas.

There will be two different tasks (probably using a Chrome extension or copying):

1. Data collection (manual + automatic): read a paper, or your own research paper, and add it to the contributions. ChatGPT may be used after 1 hour; please do not use an LLM before then. Make an example CSV. Make two different presentations, one for data collection and one for coding.

2. Coding
 1. Search research papers based on a property (you can use anything).
 2. QA on ORKG data. Also make 2 or 3 backup examples (you can use anything). UI: Streamlit.

3. Small presentation: demo and feedback (positive and negative, from the user or developer point of view).

To do:
1. Develop a template for the LLM.
2. Select 15 to 20 papers for the LLM that are not in the ORKG.
