Update nb
Justin Reese committed Sep 29, 2020
1 parent db83d64 commit 5205d0e
Showing 1 changed file with 40 additions and 53 deletions.
93 changes: 40 additions & 53 deletions Run-KG-COVID-19-pipeline.ipynb
@@ -43,7 +43,7 @@
"text": [
"\r",
"Downloading files: 0%| | 0/24 [00:00<?, ?it/s]\r",
"Downloading files: 100%|█████████████████████| 24/24 [00:00<00:00, 25311.36it/s]\r\n"
"Downloading files: 100%|█████████████████████| 24/24 [00:00<00:00, 19807.81it/s]\r\n"
]
}
],
@@ -69,65 +69,67 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"WARNING:tabula.io:Got stderr: Sep 28, 2020 4:06:47 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>\n",
"WARNING:tabula.io:Got stderr: Sep 28, 2020 4:12:20 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>\n",
"WARNING: Using fallback font 'LiberationSans' for 'Arial-BoldMT'\n",
"Sep 28, 2020 4:06:47 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>\n",
"Sep 28, 2020 4:12:20 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>\n",
"WARNING: Using fallback font 'LiberationSans' for 'ArialMT'\n",
"Sep 28, 2020 4:06:47 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>\n",
"Sep 28, 2020 4:12:20 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>\n",
"WARNING: Using fallback font 'LiberationSans' for 'Arial'\n",
"Sep 28, 2020 4:06:47 PM org.apache.pdfbox.rendering.PDFRenderer suggestKCMS\n",
"Sep 28, 2020 4:12:21 PM org.apache.pdfbox.rendering.PDFRenderer suggestKCMS\n",
"INFO: Your current java version is: 1.8.0_161\n",
"Sep 28, 2020 4:06:47 PM org.apache.pdfbox.rendering.PDFRenderer suggestKCMS\n",
"Sep 28, 2020 4:12:21 PM org.apache.pdfbox.rendering.PDFRenderer suggestKCMS\n",
"INFO: To get higher rendering speed on old java 1.8 or 9 versions,\n",
"Sep 28, 2020 4:06:47 PM org.apache.pdfbox.rendering.PDFRenderer suggestKCMS\n",
"Sep 28, 2020 4:12:21 PM org.apache.pdfbox.rendering.PDFRenderer suggestKCMS\n",
"INFO: update to the latest 1.8 or 9 version (>= 1.8.0_191 or >= 9.0.4),\n",
"Sep 28, 2020 4:06:47 PM org.apache.pdfbox.rendering.PDFRenderer suggestKCMS\n",
"Sep 28, 2020 4:12:21 PM org.apache.pdfbox.rendering.PDFRenderer suggestKCMS\n",
"INFO: or\n",
"Sep 28, 2020 4:06:47 PM org.apache.pdfbox.rendering.PDFRenderer suggestKCMS\n",
"Sep 28, 2020 4:12:21 PM org.apache.pdfbox.rendering.PDFRenderer suggestKCMS\n",
"INFO: use the option -Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider\n",
"Sep 28, 2020 4:06:47 PM org.apache.pdfbox.rendering.PDFRenderer suggestKCMS\n",
"Sep 28, 2020 4:12:21 PM org.apache.pdfbox.rendering.PDFRenderer suggestKCMS\n",
"INFO: or call System.setProperty(\"sun.java2d.cmm\", \"sun.java2d.cmm.kcms.KcmsServiceProvider\")\n",
"Sep 28, 2020 4:06:47 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>\n",
"Sep 28, 2020 4:12:21 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>\n",
"WARNING: Using fallback font 'LiberationSans' for 'Arial-BoldMT'\n",
"Sep 28, 2020 4:06:47 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>\n",
"Sep 28, 2020 4:12:21 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>\n",
"WARNING: Using fallback font 'LiberationSans' for 'ArialMT'\n",
"Sep 28, 2020 4:06:47 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>\n",
"Sep 28, 2020 4:12:21 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>\n",
"WARNING: Using fallback font 'LiberationSans' for 'Arial'\n",
"Sep 28, 2020 4:06:48 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>\n",
"Sep 28, 2020 4:12:21 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>\n",
"WARNING: Using fallback font 'LiberationSans' for 'Arial-BoldMT'\n",
"Sep 28, 2020 4:06:48 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>\n",
"Sep 28, 2020 4:12:21 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>\n",
"WARNING: Using fallback font 'LiberationSans' for 'ArialMT'\n",
"Sep 28, 2020 4:06:48 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>\n",
"Sep 28, 2020 4:12:21 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>\n",
"WARNING: Using fallback font 'LiberationSans' for 'Arial'\n",
"Sep 28, 2020 4:06:48 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>\n",
"Sep 28, 2020 4:12:21 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>\n",
"WARNING: Using fallback font 'LiberationSans' for 'ArialMT'\n",
"Sep 28, 2020 4:06:48 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>\n",
"Sep 28, 2020 4:12:22 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>\n",
"WARNING: Using fallback font 'LiberationSans' for 'Arial'\n",
"Sep 28, 2020 4:06:48 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>\n",
"Sep 28, 2020 4:12:22 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>\n",
"WARNING: Using fallback font 'LiberationSans' for 'ArialMT'\n",
"Sep 28, 2020 4:06:48 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>\n",
"Sep 28, 2020 4:12:22 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>\n",
"WARNING: Using fallback font 'LiberationSans' for 'Arial'\n",
"Sep 28, 2020 4:06:48 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>\n",
"Sep 28, 2020 4:12:22 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>\n",
"WARNING: Using fallback font 'LiberationSans' for 'ArialMT'\n",
"Sep 28, 2020 4:06:48 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>\n",
"Sep 28, 2020 4:12:22 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>\n",
"WARNING: Using fallback font 'LiberationSans' for 'Arial'\n",
"Sep 28, 2020 4:06:49 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>\n",
"Sep 28, 2020 4:12:22 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>\n",
"WARNING: Using fallback font 'LiberationSans' for 'ArialMT'\n",
"Sep 28, 2020 4:06:49 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>\n",
"Sep 28, 2020 4:12:22 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>\n",
"WARNING: Using fallback font 'LiberationSans' for 'Arial'\n",
"\n",
"5782864it [00:21, 267769.37it/s]\n",
"4222272it [00:15, 245962.42it/s]^C\n",
"4243208it [00:16, 265139.87it/s]\n",
"\n",
"Aborted!\n"
"5782864it [00:21, 270243.44it/s]\n",
"5782864it [00:21, 271635.01it/s]\n",
"Loading gene info: 28496648it [01:33, 304282.49it/s]\n",
"Loading country codes: 264it [00:00, 238538.62it/s]\n",
"Unzipping files: 100%|███████████████████████████| 2/2 [03:30<00:00, 105.07s/it]\n",
"100%|█████████████████████████████████████| 54137/54137 [11:09<00:00, 80.84it/s]\n",
"100%|█████████████████████████████████████| 75785/75785 [17:06<00:00, 73.86it/s]\n"
]
}
],
@@ -154,19 +156,7 @@
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[KGX][cli_utils.py][ parse_target] INFO: Processing target 'drug-central'\n",
"[KGX][cli_utils.py][ parse_target] INFO: Processing target 'pharmgkb'\n",
"[KGX][cli_utils.py][ parse_target] INFO: Processing target 'STRING'\n",
"[KGX][cli_utils.py][ apply_filters] INFO: with node filters: {'category': ['biolink:Gene', 'biolink:Protein']}\n",
"[KGX][cli_utils.py][ apply_filters] INFO: with edge filters: {'subject_category': ['biolink:Gene', 'biolink:Protein'], 'object_category': ['biolink:Gene', 'biolink:Protein'], 'edge_label': ['biolink:interacts_with', 'biolink:has_gene_product']}\n"
]
}
],
"outputs": [],
"source": [
"!python run.py merge"
]
@@ -187,7 +177,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Make training data for machine learning use case"
"## Make training data for machine learning use case\n",
"\n",
"KG-COVID-19 contains tooling to produce training data for machine learning. Briefly, a training graph is produced that contains 80% of the edges (the default; override with the `-t` parameter). The remaining 20% of edges are removed, chosen such that removing them does not create new components. These graphs are emitted as KGX TSV files in `data/holdouts`."
]
},
{
@@ -290,7 +282,7 @@
"- [CBOW](https://github.com/monarch-initiative/embiggen/blob/master/notebooks/Graph%20embedding%20using%20CBOW.ipynb)\n",
"- [GloVe](https://github.com/monarch-initiative/embiggen/blob/master/notebooks/Graph%20embedding%20using%20GloVe.ipynb)\n",
"\n",
"#### These embeddings can then be used to train MLP, random forest, decision tree, and logistic regression classifiers using [this notebook](https://github.com/monarch-initiative/embiggen/blob/master/notebooks/Link%20Prediction.ipynb).\n",
"#### These embeddings can then be used to train MLP, random forest, decision tree, and logistic regression classifiers using [this notebook](https://github.com/monarch-initiative/embiggen/blob/master/notebooks/Classical%20Link%20Prediction.ipynb).\n",
"\n",
"##### Note: consider running the code in these notebooks on a server with GPUs in order to complete in a reasonable amount of time"
]
@@ -299,7 +291,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Use prebuilt SPARQL queries to query our Blazegraph endpoint on the commandline"
"## Use prebuilt SPARQL queries to query our Blazegraph endpoint on the commandline\n",
"\n",
"KG-COVID-19 has tooling to query our Blazegraph endpoint using predetermined SPARQL queries and emit the results as a TSV file. To run different SPARQL queries, against our endpoint or another one, create a new YAML file and specify it with the `-y` flag."
]
},
{
@@ -325,13 +319,6 @@
" for row in read_tsv:\n",
" print(row)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
Expand Down
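
The markdown cell added in this commit says that held-out edges are removed "such that they do not create new components". The commit does not show how KG-COVID-19 does this, so the following is only a minimal sketch of one way to get that guarantee: build a spanning forest with union-find, always keep the forest edges, and draw the holdout set from the remaining edges. The names `split_edges`, `count_components`, and `find` are hypothetical and are not part of the KG-COVID-19 code.

```python
import random


def find(parent, x):
    """Return the representative of x's set (union-find with path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def count_components(nodes, edges):
    """Count connected components of an undirected graph."""
    parent = {n: n for n in nodes}
    for u, v in edges:
        parent[find(parent, u)] = find(parent, v)
    return len({find(parent, n) for n in nodes})


def split_edges(edges, train_fraction=0.8, seed=0):
    """Split edges into (train, holdout) without adding components.

    Assumption: this is an illustrative technique, not the project's
    actual implementation. Edges of a spanning forest are never held
    out, so the training graph has the same components as the input.
    """
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)

    parent = {}
    for u, v in shuffled:
        parent.setdefault(u, u)
        parent.setdefault(v, v)

    tree_edges, removable = [], []
    for u, v in shuffled:
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:
            # This edge joins two sets, so it is needed for connectivity.
            parent[ru] = rv
            tree_edges.append((u, v))
        else:
            # Both ends are already connected; removing this edge is safe.
            removable.append((u, v))

    # Hold out the requested share, capped by how many edges are removable.
    wanted = len(shuffled) - round(len(shuffled) * train_fraction)
    n_holdout = min(wanted, len(removable))
    holdout = removable[:n_holdout]
    train = tree_edges + removable[n_holdout:]
    return train, holdout
```

Calling `split_edges(edges)` on a list of `(subject, object)` pairs returns the two edge lists; `count_components` can be used to check that the training graph has as many components as the full graph. In a sparse graph there may be fewer removable edges than requested, in which case this sketch holds out fewer than 20%.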

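
The SPARQL cell describes emitting query results as a TSV file, which the notebook then reads back row by row. As a rough illustration of that last step only, the sketch below flattens a document in the standard SPARQL 1.1 JSON results format into tab-separated rows. It does not contact the Blazegraph endpoint and does not reproduce the YAML file passed with `-y`, whose layout is not shown in this commit; `bindings_to_rows` and `write_tsv` are hypothetical names, not KG-COVID-19 functions.

```python
import csv


def bindings_to_rows(results):
    """Flatten a SPARQL 1.1 JSON results document into (header, rows)."""
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        # A variable can be unbound in a solution; emit an empty cell then.
        rows.append([binding.get(var, {}).get("value", "") for var in variables])
    return variables, rows


def write_tsv(results, handle):
    """Write the header and rows to an open text handle as TSV."""
    header, rows = bindings_to_rows(results)
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
```

A file written this way can be read back with `csv.reader(handle, delimiter="\t")`, in the same style as the loop over `read_tsv` in the notebook's final code cell.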