# Introduction and Setup for Sinopia's Knowledge Graph

In [1]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline
%reload_ext lab_black

import datetime

import kglab
import helpers
import widgets
from IPython.display import display


## Introduction
This workshop introduces you to downloading and exploring the RDF created in the Sinopia Linked Data Editing environment. We will then build upon these Sinopia data artifacts by applying machine learning techniques to tasks such as FAST subject heading and template classification. Finally, we will discuss Data Statements and Model Cards and how to apply them to the work we did today.

### Workshop Schedule
This workshop is divided into three 55-minute parts, with a break between sessions.

#### 1. Introduction, Setup, Analysis, and Visualization of Sinopia RDF

#### 2. Using spaCy and HuggingFace Natural Language Processing (NLP)

#### 3. Increasing transparency with Model Cards and Data Statements

## Set-up for running Locally or Remotely
There are multiple ways to run the [Jupyter notebooks](https://jupyter.org/) in this workshop. The easiest is to load each notebook using the [MyBinder][BINDER] service, which launches a Jupyter lab environment from which you can select and run the notebooks. The most complex is to download and install Python, along with the workshop dependencies, on your local laptop or workstation.

### Run with MyBinder Cloud Service (the easiest)
To run this workshop's Jupyter notebooks on [MyBinder][BINDER]:

1. Go to the following link https://mybinder.org/v2/gh/jermnelson/ld4-2021-workshop/HEAD 
1. Launch the container 
1. When the environment has finished loading, add `lab` to the end of the URL and you should see a display similar to this:
   ![MyBinder Jupyter Lab Workshop](images/mybinder-lab-screenshot.png)
1. Click on the `01_IntroSetup.ipynb` to launch this notebook. 


### Local Installation Set-up
1. Download and install the latest [Python version](https://python.org/downloads) (**3.9.6** at the time of writing)
1. Once Python 3.9.x is installed, launch a terminal window and change to the directory where you want to install the workshop notebooks repository
1. Create a Python virtual environment, e.g. `python3 -m venv ld4-env`
1. Activate the Python virtual environment:
   - `source ld4-env/bin/activate` for Macintosh or Linux
   - `ld4-env\Scripts\activate` for Windows
1. Clone or copy the workshop repository:
   -  If you have [git](https://git-scm.com) installed, run `git clone https://github.com/ld4p/{name-of-repo}`
   -  Otherwise, download and unzip the repository
1. Change directories into the workshop repository and run `pip install -r requirements.txt` to install all of the libraries we will be using for the workshop
1. Install Jupyter lab with `pip install jupyterlab`
1. Launch Jupyter lab from the workshop repository with `jupyter lab`
1. Open the locally running Jupyter lab instance at http://localhost:8888 (or another port if 8888 is already in use)
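
After finishing the local installation steps, a quick check from the activated virtual environment can confirm that the interpreter and key libraries are in place. The following is a minimal sketch; the `missing_packages` helper is hypothetical, and the package list is only an assumption based on the imports and install commands in this notebook (`requirements.txt` remains the authoritative list).

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed package list for illustration; requirements.txt is the real source.
REQUIRED = ["kglab", "jupyterlab"]

print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
missing = missing_packages(REQUIRED)
print("Missing packages:", ", ".join(missing) if missing else "none")
```

If any package is reported as missing, re-running `pip install -r requirements.txt` inside the activated environment is the likely fix.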

The data we will be using for this workshop is located at `data/workshop-data.zip`.

[BINDER]: https://mybinder.org/
[COLAB]: https://colab.research.google.com/

In [None]:
! unzip data/workshop-data.zip -d data/
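
The `! unzip` shell command above depends on an `unzip` binary being available, which may not be the case on Windows. As a hedged alternative, the sketch below uses Python's standard `zipfile` module to extract the same archive; the `extract_archive` helper name is hypothetical and not part of the workshop code.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, target_dir):
    """Extract every member of a zip archive into target_dir and return their names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        return archive.namelist()


# Same paths as the shell command above; skipped if the archive is absent.
if Path("data/workshop-data.zip").exists():
    print(extract_archive("data/workshop-data.zip", "data/"))
```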

### Brief Introduction to Jupyter Notebooks
[Jupyter](https://jupyter.org/) notebooks are a popular computing environment in the big data and machine learning communities that runs in your web browser. A notebook is made up of one or more cells that contain either documentation, written in [Markdown][MKDOWN], or Python code. You can move cells around, copy, delete, or change their type using the notebook toolbar:

![Jupyter Notebook Toolbar](images/jupyter-nb-toolbar.png)

Here are the important buttons:

#### Saves the notebook to disk
![Save Notebook](images/notebook-save.png) 
  
#### Adds a new cell to the notebook
![Add cell](images/notebook-add-cell.png)

#### Cuts current cell (it can then be pasted in a new location)
![Cut cell](images/notebook-cut-cell.png) 

#### Copies current cell
![Copy cell](images/notebook-copy-cell.png)

#### Pastes cell at cursor position
![Paste cell](images/notebook-paste-cell.png)

#### Runs current cell, either renders Markdown cell to HTML or executes Python code.
![Run cell](images/notebook-run-cell.png) 

#### Stops the currently running cell
![Stop Running Cell](images/notebook-stop-running-cell.png)

#### Dropdown for changing the current cell type
![Change cell type dropdown](images/notebook-cell-type-select.png) 

[MKDOWN]: https://www.markdownguide.org/

## Sinopia Group Knowledge Graph
We can use the [Sinopia API](https://ld4p.github.io/sinopia_api/#tag/resources/paths/~1resource/get) to retrieve only the resources associated with a Sinopia group. The general URL pattern is

`https://api.{env?}.sinopia.io/resource?group={name}`

Some examples:
- Retrieve PCC resources from the Sinopia stage environment: `https://api.stage.sinopia.io/resource?group=pcc`
- Retrieve Yale resources from Sinopia production: `https://api.sinopia.io/resource?group=yale`
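
The URL pattern can also be assembled in code. The sketch below is illustrative only: `API_HOSTS` and `group_url` are hypothetical names, the host mapping is an assumption drawn from the URLs that appear in this notebook, and the `/resource` path follows the `create_kg` calls used in the cells of this notebook.

```python
from urllib.parse import urlencode

# Assumed mapping of environment name to API host, based on URLs in this notebook.
API_HOSTS = {
    "development": "https://api.development.sinopia.io",
    "stage": "https://api.stage.sinopia.io",
    "production": "https://api.sinopia.io",
}


def group_url(env, group=None):
    """Build the resource URL for an environment, optionally limited to one group."""
    url = f"{API_HOSTS[env]}/resource"
    if group:
        url = f"{url}?{urlencode({'group': group})}"
    return url


print(group_url("stage", "pcc"))
print(group_url("production", "yale"))
```

Calling `group_url("stage")` without a group returns the URL for all resources in that environment.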

To assist in generating the group API URL, we will use the `sinopia_api_group_widget` widget:

In [None]:
display(widgets.sinopia_api_group_widget)

In [None]:
pcc_kg = helpers.create_kg("https://api.stage.sinopia.io/resource?group=pcc")

## Retrieving all RDF from Sinopia Stage Environment
Using the `sinopia_api` widget to generate the Sinopia API URL for all groups, we can then use a helper function, `create_kg`, that downloads each resource, extracts the RDF, and returns the knowledge graph after all of the RDF resources have been parsed.
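
The source of `create_kg` is not shown in this notebook, so the following is only a rough sketch of the download loop such a helper might perform. It rests on a loud assumption: that each API response is a JSON object with a `data` list of resource records and an optional `links.next` URL pointing at the following page. The `iter_resources` function and the injected `fetch_json` callable are hypothetical names, and parsing each record's RDF into a graph is left out.

```python
def iter_resources(first_url, fetch_json):
    """Yield each resource record, following next-page links until none remain.

    ASSUMPTION: every page is a dict with a "data" list of records and an
    optional "links" dict whose "next" entry is the URL of the following page.
    """
    url = first_url
    while url:
        page = fetch_json(url)
        yield from page.get("data", [])
        url = page.get("links", {}).get("next")


# Stand-in pages so the sketch runs without network access.
FAKE_PAGES = {
    "page-1": {"data": [{"uri": "r1"}, {"uri": "r2"}], "links": {"next": "page-2"}},
    "page-2": {"data": [{"uri": "r3"}], "links": {}},
}

uris = [record["uri"] for record in iter_resources("page-1", FAKE_PAGES.__getitem__)]
print(uris)
```

Injecting `fetch_json` keeps the loop testable offline; against the live API it could be replaced with a small function that performs an HTTP GET and decodes the JSON body.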

**NOTE**: Instead of waiting 8+ minutes for `create_kg` to run, you can load the existing stage knowledge graph with the following commands:

```python
stage_kg = kglab.KnowledgeGraph()
stage_kg.load_jsonld("data/stage.json")
```

In [None]:
stage_kg = kglab.KnowledgeGraph()
stage_kg.load_jsonld("data/stage.json")

In [None]:
start = datetime.datetime.utcnow()
print(f"Started creation of knowledge graph for Sinopia Stage at {start}")
stage_kg = helpers.create_kg("https://api.stage.sinopia.io/resource")
end = datetime.datetime.utcnow()
# total_seconds() covers the whole elapsed interval; .seconds drops whole days
print(f"Finished at {end}, total time {(end - start).total_seconds() / 60.0} minutes")

To save the resulting knowledge graph, we will use the `save_jsonld` method, which serializes the Sinopia Stage graph to JSON-LD. We will load and use this file in subsequent Jupyter notebooks in this workshop.

In [None]:
stage_kg.save_jsonld("data/stage.json")

In [None]:
start = datetime.datetime.utcnow()
print(f"Started creation of knowledge graph for Sinopia Production at {start}")
prod_kg = helpers.create_kg("https://api.sinopia.io/resource")
end = datetime.datetime.utcnow()
# total_seconds() covers the whole elapsed interval; .seconds drops whole days
print(f"Finished at {end}, total time {(end - start).total_seconds() / 60.0} minutes")

In [None]:
prod_kg.save_jsonld("data/production.json")
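
The stage and production cells above repeat the same start/end timing bookkeeping. One possible way to factor it out is a small context manager; this is a sketch rather than part of the workshop helpers, and the name `timed` is hypothetical.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label):
    """Print how long the enclosed block took, in minutes."""
    start = time.perf_counter()
    print(f"Started {label}")
    try:
        yield
    finally:
        minutes = (time.perf_counter() - start) / 60.0
        print(f"Finished {label}, total time {minutes:.2f} minutes")


# Example with a placeholder workload instead of a real download.
with timed("placeholder task"):
    sum(range(1_000))
```

With this in place, a download cell could wrap its `helpers.create_kg(...)` call in `with timed("Sinopia Stage knowledge graph"):`.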

## Exercise 1
Compare the total number of triples for the National Library of Medicine in each Sinopia environment: development, stage, and production.
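
As a possible starting point for the comparison, the sketch below counts triples per environment for any graph-like objects that support `len()`. The `triple_counts` helper is hypothetical, and the sample data are stand-ins, not real Sinopia results.

```python
def triple_counts(graphs):
    """Map each environment name to the number of triples in its graph.

    Works with any object supporting len(), which includes rdflib.Graph.
    """
    return {env: len(graph) for env, graph in graphs.items()}


# Stand-in graphs: sets of (subject, predicate, object) tuples.
sample = {
    "development": {("s1", "p", "o")},
    "stage": {("s1", "p", "o"), ("s2", "p", "o")},
    "production": set(),
}

for env, count in sorted(triple_counts(sample).items(), key=lambda item: -item[1]):
    print(f"{env}: {count} triples")
```

For the exercise itself, each value in the dictionary would be the RDF graph of a knowledge graph built with `helpers.create_kg` for the National Library of Medicine group in that environment, assuming kglab's `rdf_graph()` accessor is used to obtain the underlying graph.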