# Introduction and Setup for Sinopia's Knowledge Graph

In [1]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline
%reload_ext lab_black

import helpers
import widgets
from IPython.display import display

## Introduction
This workshop introduces downloading and exploring the RDF created in the Sinopia Linked Data Editing environment. We will then build on these Sinopia data artifacts by applying machine learning techniques for FAST subject heading and template classification, along with summation of specific linked data properties.

## Set-up for running Locally or Remotely
There are multiple ways to run the [Jupyter notebooks](https://jupyter.org/) in this workshop. The easiest is to load each notebook with the [MyBinder](https://mybinder.org/) service, which launches a JupyterLab environment from which you can select and run the notebooks. The most complex is to download and install Python, along with the workshop dependencies, on your local laptop or workstation. In between is running the notebooks in Google's [Colab](https://colab.research.google.com/notebooks/intro.ipynb) environment.

### Run with MyBinder Cloud Service


### Run with Google's Colab Service

### Local Installation Set-up
1. Download and install the latest [Python version](https://python.org/downloads) (**3.9.6** at the time of writing).
1. Once Python 3.9.x is installed, launch a terminal window and change to the directory where you want to install the workshop notebooks repository.
1. Create a Python virtual environment, e.g. `python3 -m venv ld4-env`.
1. Activate the Python virtual environment:
   - `source ld4-env/bin/activate` for macOS or Linux
   - `ld4-env\Scripts\activate` for Windows
1. Clone or copy the workshop repository:
   - If you have [git](https://git-scm.com) installed, run `git clone https://github.com/ld4p/{name-of-repo}`
   - Otherwise, download and unzip the repository
1. Change into the workshop repository directory and run `pip install -r requirements.txt` to install the libraries used in the workshop.
1. Launch JupyterLab from the workshop repository with `jupyter lab`.
1. Open the running JupyterLab instance at http://localhost:8888 (or another port if 8888 is already in use by another program).
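
Before launching Jupyter, a quick check from Python can confirm that the steps above took effect. This is a minimal sketch, not part of the workshop repository: `check_environment` is a hypothetical helper, and the 3.9 minimum is assumed from the version named in the first step.

```python
import sys


def check_environment(min_version=(3, 9)):
    """Report whether the interpreter meets a minimum version and runs in a venv.

    Hypothetical helper for illustration; not part of the workshop repository.
    """
    version_ok = sys.version_info[:2] >= tuple(min_version)
    # Inside a virtual environment, sys.prefix points at the venv while
    # sys.base_prefix still points at the base Python installation.
    in_venv = sys.prefix != sys.base_prefix
    return {"version_ok": version_ok, "in_venv": in_venv}
```

Calling `check_environment()` in the activated environment should report `True` for both keys; a `False` for `in_venv` usually means the activation step was skipped in the current terminal.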

## Sinopia Group Knowledge Graph
We can use the [Sinopia API](https://ld4p.github.io/sinopia_api/#tag/resources/paths/~1resource/get) to retrieve only the resources associated with a Sinopia group. The general URL pattern is

`https://api.{env?}.sinopia.io/resource?group={name}`

where the `{env?}` segment is omitted for production.

Some examples:
- Retrieve PCC resources from the Sinopia stage environment: `https://api.stage.sinopia.io/resource?group=pcc`
- Retrieve Yale resources from Sinopia production: `https://api.sinopia.io/resource?group=yale`
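
The URL pattern can also be assembled in code. The sketch below is an illustration only: `sinopia_group_url` is a hypothetical function, not part of the workshop `helpers` module, and it uses the `/resource` path that appears in the linked API documentation and in the `create_kg` call later in this notebook.

```python
from urllib.parse import urlencode


def sinopia_group_url(group, env=None):
    """Build a Sinopia API URL that filters resources by group.

    Hypothetical helper for illustration. `env` is the environment segment
    of the host name (for example "stage"); leave it as None for production.
    """
    host = f"api.{env}.sinopia.io" if env else "api.sinopia.io"
    # urlencode escapes any characters in the group name that are not URL-safe.
    query = urlencode({"group": group})
    return f"https://{host}/resource?{query}"
```

For instance, `sinopia_group_url("pcc", env="stage")` reproduces the stage example, and `sinopia_group_url("yale")` reproduces the production one.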

To assist in generating the group API URL, we will use the `sinopia_api` widget:

In [2]:
display(widgets.sinopia_api_group_widget)

VBox(children=(HBox(children=(RadioButtons(description='Environment:', options=(('Development', 'https://api.d…

## Retrieving all RDF from Sinopia Stage Environment
Using the `sinopia_api` widget to generate the Sinopia API URL for all groups, we can then use a helper function, `create_kg`, that downloads each resource, extracts the RDF, and returns the knowledge graph.

In [4]:
stage_kg = helpers.create_kg("https://api.stage.sinopia.io/resource")

http://desktop.loc.gov/search?view=document&id=Infobasedcrmg0Dash0Dash0Dash247&hl=true&fq=allresources|true# does not look like a valid URI, trying to serialize this will break.
ld4p:RT:bf2:2D graphic material:Item does not look like a valid URI, trying to serialize this will break.
urn:ld4p:qa:gettyaat:Objects__Object_Groupings and Systems does not look like a valid URI, trying to serialize this will break.



https://api.stage.sinopia.io/resource/e49c5f1d-5e62-4b45-b87f-5d0cf3e573e5 missing data

https://api.stage.sinopia.io/resource/3770137a-bed5-4a97-bd9a-fea4f3822dd7 missing data

https://api.stage.sinopia.io/resource/28961949-72b2-4c94-b1f5-a7788f1ae1f0 missing data

https://api.stage.sinopia.io/resource/c3a1d5dd-a829-4ba7-8fbe-20490c018407 missing data

https://api.stage.sinopia.io/resource/4e80a183-4487-44fd-9bf8-8497c50d27f3 missing data

https://api.stage.sinopia.io/resource/16625687-0208-4ea5-b299-204d36180c45 missing data


https://api.stage.sinopia.io/resource/this is a test does not look like a valid URI, trying to serialize this will break.



https://api.stage.sinopia.io/resource/a6acbbea-1770-468b-904b-51cc4a3d7f27 missing data
Failed to parse {'user': 'mcm104', 'group': 'washington', 'templateId': 'WAU:RT:BF2:Work', 'types': ['http://id.loc.gov/ontologies/bibframe/Work'], 'id': '0398ce54-ff15-4e9f-8948-c44bcc393798', 'uri': 'https://api.stage.sinopia.io/resource/0398ce54-ff15-4e9f-8948-c44bcc393798', 'timestamp': '2021-03-30T22:02:40.077Z'}
'@eng' is not a valid language tag!
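
The "missing data" lines in the output suggest that some resource records come back without an RDF payload and are skipped. The sketch below shows one way such a filter could work. It is not the actual `create_kg` implementation: `collect_jsonld` is a hypothetical function, the record layout (a `uri` key plus a `data` key holding JSON-LD) is an assumption, and the sample URIs are made up.

```python
def collect_jsonld(resources):
    """Split resource records into usable JSON-LD payloads and skipped URIs.

    Assumes (not confirmed by the source) that each record is a dict with a
    "uri" key and, when populated, a "data" key holding its JSON-LD.
    """
    payloads, skipped = [], []
    for record in resources:
        data = record.get("data")
        if not data:
            # Corresponds to a "missing data" line in the log output.
            skipped.append(record.get("uri"))
            continue
        payloads.append(data)
    return payloads, skipped


# Made-up sample records, for illustration only.
sample = [
    {
        "uri": "https://api.stage.sinopia.io/resource/example-1",
        "data": [{"@id": "https://api.stage.sinopia.io/resource/example-1"}],
    },
    {"uri": "https://api.stage.sinopia.io/resource/example-2"},
]
payloads, skipped = collect_jsonld(sample)
```

Returning the skipped URIs alongside the payloads makes it easy to report how many resources were left out of the graph.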


In [4]:
import urllib.parse

In [16]:
# Percent-encode the spaces so the value becomes a usable URI
urllib.parse.quote("https://api.stage.sinopia.io/resource/this is a test", safe=":/")

'https://api.stage.sinopia.io/resource/this%20is%20a%20test'
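
The "does not look like a valid URI" messages in the `create_kg` output match the wording of a warning emitted by the rdflib library, and every flagged value contains a space or a `|`. A rough pre-check along those lines might look like the following sketch. The character set is an assumption modeled on that behavior, and `looks_like_valid_uri` is a hypothetical name.

```python
# Characters assumed to make a URI unsafe to serialize; the exact set is an
# assumption for illustration, not taken from the workshop code.
INVALID_URI_CHARS = set('<>" {}|\\^`')


def looks_like_valid_uri(value):
    """Return False when the value contains any character from the set above."""
    return not any(ch in INVALID_URI_CHARS for ch in value)
```

Running such a check before adding a value to the graph would let problem URIs be logged or percent-encoded up front instead of surfacing at serialization time.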

To save the resulting knowledge graph, we will use the `save_jsonld` method, which serializes the Sinopia stage graph to JSON-LD. We will load and use this file in subsequent Jupyter notebooks in this workshop.

In [29]:
stage_kg.save_jsonld("data/stage.json")
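
A later notebook might reload the saved file along these lines. This sketch uses only the standard library `json` module rather than an RDF library, so it inspects the serialized nodes without rebuilding a graph. `load_jsonld_nodes` is a hypothetical name, and the top-level shape of the file that `save_jsonld` writes is an assumption, so both common JSON-LD layouts are handled.

```python
import json


def load_jsonld_nodes(path):
    """Return the list of node objects stored in a JSON-LD file.

    Handles two common top-level shapes: a bare list of nodes, or a single
    object whose "@graph" key holds the list. A lone node object is wrapped
    in a list.
    """
    with open(path, encoding="utf-8") as jsonld_file:
        doc = json.load(jsonld_file)
    if isinstance(doc, list):
        return doc
    return doc.get("@graph", [doc])
```

With the file saved above, `len(load_jsonld_nodes("data/stage.json"))` would give a quick count of serialized nodes as a sanity check.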

## Exercises

### Exercise 1
Compare the total number of resources for the National Library of Medicine in each Sinopia environment: development, stage, and production.

### Exercise 2