Image source on LucidDraw: Link
CZI adheres to the Contributor Covenant code of conduct. By participating, you are expected to uphold this code. Please report unacceptable behavior to opensource@chanzuckerberg.com.
Please note: If you believe you have found a security issue, please responsibly disclose by contacting us at security@chanzuckerberg.com.
pip install git+https://github.com/chanzuckerberg/czLandscapingTk.git
This library is built on databricks_to_nbdev_template, which is a modified version of nbdev_template tailored to work with Databricks notebooks.
The steps for contributing to the development of this library are based on a development pipeline that uses Databricks. This means that this work will mainly be driven internally from within the CZI tech team:
1. Clone this library from within Databricks.
2. Place your scripts and utility notebooks in subdirectories of the databricks folder in the file hierarchy.
3. Any Databricks notebooks that contain the text "from nbdev import *" will be automatically converted to Jupyter notebooks that live at the root level of the repository.
4. When you push this repository to GitHub from Databricks, Jupyter notebooks will be built, added to the repo, and then processed by nbdev to generate modules and documentation (refer to https://nbdev.fast.ai/ for full documentation on how to do this).
Note that pushing code to GitHub will add and commit more code to GitHub, so you will need to perform another git pull to load and refer to the latest changes in your code.
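Before pushing, it can be useful to check which notebooks contain the marker text and would therefore be converted. The following is a minimal sketch, not part of this library or of the template: the function name find_nbdev_notebooks is hypothetical, and it assumes that notebooks are stored as plain-text .py source files somewhere under the databricks folder.

```python
from pathlib import Path
from typing import List

# Marker text that, per the steps above, flags a notebook for conversion.
MARKER = "from nbdev import *"


def find_nbdev_notebooks(root: str, pattern: str = "*.py") -> List[Path]:
    """Return files under `root` whose text contains the nbdev marker.

    Assumption: notebooks are plain-text source files matching `pattern`.
    """
    matches = []
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        if MARKER in text:
            matches.append(path)
    return matches


if __name__ == "__main__":
    # "databricks" is the folder named in step 2; adjust if your layout differs.
    for notebook in find_nbdev_notebooks("databricks"):
        print(notebook)
```

If the folder does not exist, the function simply returns an empty list, so the script is safe to run from any checkout.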
This project is focused on providing a suite of generalizable tools that knowledge analysts can use to implement solutions for surveying tasks. The basic structure of this class of data analysis can be described in the following way:
An analytic task, where we attempt to answer a question by (A) surveying existing data sources, (B) compiling an intermediate knowledge corpus drawn from those sources, and (C) analysing that corpus to yield an answer to the question.
- Identifying a set of Key Opinion Leaders (KOLs) with specialized expertise in an understudied area.
- Performing a systematic review of available treatments for a specific rare disease.
- Developing (and using) reproducible impact metrics for a funded scientific program to study what is working and what is not.
- Question: A natural language expression of the research question that is the objective of the task.
- Study Data Sources: List of available information sources that can be interrogated by executors of the task.
- Information Retrieval Query (IR Query): A list of logically defined queries that can be run over the data sources.
- Inclusion / Exclusion Criteria: Logical operators to determine if retrieved data should be included in the study.
- Intermediate Corpus: Schema and data of the collection of documents gathered from external information sources.
- Analysis: Workflow specification of analyses to be performed over the intermediate corpus to generate an Answer.
- Answer: The answer to the Question, expressed in natural language with a full explanation of the provenance of how the answer was computed.
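The components listed above can be sketched as a small data structure. This is an illustrative sketch only and does not reflect the actual API of czLandscapingTk: the class name SurveyTask, its field names, the Document type, and the toy records are all hypothetical. Queries and criteria are modelled as simple predicates over in-memory records, which is an assumption made to keep the example self-contained.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical record type: one retrieved document as field name -> value.
Document = Dict[str, str]
Predicate = Callable[[Document], bool]


@dataclass
class SurveyTask:
    """Hypothetical container mirroring the task components listed above."""
    question: str                              # Question
    data_sources: Dict[str, List[Document]]    # Study Data Sources
    ir_queries: List[Predicate]                # Information Retrieval Queries
    inclusion_criteria: List[Predicate]        # Inclusion / Exclusion Criteria
    analysis: Callable[[List[Document]], str]  # Analysis
    corpus: List[Document] = field(default_factory=list)  # Intermediate Corpus

    def build_corpus(self) -> List[Document]:
        # Retrieve documents matched by at least one query, then keep only
        # those that satisfy every inclusion criterion.
        retrieved = [
            doc
            for docs in self.data_sources.values()
            for doc in docs
            if any(query(doc) for query in self.ir_queries)
        ]
        self.corpus = [
            doc for doc in retrieved
            if all(criterion(doc) for criterion in self.inclusion_criteria)
        ]
        return self.corpus

    def answer(self) -> str:
        # Answer: run the analysis over the intermediate corpus.
        if not self.corpus:
            self.build_corpus()
        return self.analysis(self.corpus)


# Toy example with made-up records, for illustration only.
task = SurveyTask(
    question="Which treatments have been reported for disease X?",
    data_sources={"toy_source": [
        {"title": "Treatment A for disease X", "year": "2021"},
        {"title": "Unrelated paper", "year": "2020"},
        {"title": "Treatment B for disease X", "year": "2009"},
    ]},
    ir_queries=[lambda d: "disease X" in d["title"]],
    inclusion_criteria=[lambda d: int(d["year"]) >= 2015],
    analysis=lambda corpus: f"{len(corpus)} document(s) matched: "
    + "; ".join(d["title"] for d in corpus),
)
print(task.answer())
```

Keeping the corpus as an explicit field, rather than computing the answer in one pass, means the intermediate corpus can be inspected on its own, which supports the provenance requirement attached to the Answer.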
Image source on LucidDraw: Link
Adopting the CommonKADS knowledge engineering design process, we consider the interplay between agents (swimlanes), processes, and items in the figure. In particular, we seek to characterize how knowledge is needed, used, or derived in the workflow.
The goal of this project is to provide an extensible set of executable computational tools that automate the processes described above.