- HDRUK_AppliedML: This repository contains code and supporting material for the HDRUK Summer School Course on Applied Deep Learning, August 22nd 2019
- UKBiobank_deep_pretrain: Pretrained neural networks for UK Biobank brain MRI images. SFCN, 3D-ResNet etc.
- biobankAccelerometerAnalysis: Extracting meaningful health information from large accelerometer datasets
- metaviz:
- tpot: A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
- 4C-Initiative: 4C-Initiative
- Linked-EHR-England-2021: Publication: Linked electronic health records for research on a nationwide cohort including over 54 million people in England
- collaborathon-2020-team3: HDR UK Collaborathon Team3 Submission
- openpathology-web: Openpathology website
- openprescribing: A Django app providing a REST API and dashboards for the HSCIC's GP prescribing data
- bedtools2: bedtools - the swiss army knife for genome arithmetic
- bwa: Burrows-Wheeler Aligner for short-read alignment (see minimap2 for long-read alignment)
- crossplatform_mGWAS: Cross-platform genetic discovery of small molecule products of metabolism and application to clinical outcomes Luca A. Lotta, Maik Pietzner, Isobel D. Stewart, Laura B.L. Wittemans, Chen Li, Roberto Bonelli, Johannes Raffler, Emma K. Biggs, Clare Oliver-Williams, Victoria P.W. Auyeung, Jian’an Luan, Eleanor Wheeler, Ellie Paige, Praveen Surendran, Gregory A. Michelotti, Robert A. Scott, Stephen Burgess, Verena Zuber, Eleanor Sanderson, Albert Koulman, Fumiaki Imamura, Nita G. Forouhi, Kay-Tee Khaw, MacTel Consortium, Julian L. Griffin, Angela M. Wood, Gabi Kastenmüller, John Danesh, Adam S. Butterworth, Fiona M. Gribble, Frank Reimann, Melanie Bahlo, Eric Fauman, Nicholas J. Wareham, Claudia Langenberg bioRxiv 2020.02.03.932541; doi: https://doi.org/10.1101/2020.02.03.932541
- ensembl-vep: The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
- gatk: Official code repository for GATK versions 1.0 through 3.7 (core engine). For GATK 4 code, see the https://github.com/broadinstitute/gatk repository
- seqtk: Toolkit for processing sequences in FASTA/Q formats
- 2019nCoV_proportion_asym:
- BSTI COVID-19 Imaging Database:
- CMMID Interactive Applications:
- COVID19_NPIs_vs_Rt:
- COVID19_clustersize: Analysis of COVID-19 transmission using cluster size distribution
- ISARIC 4C prognostic score: Using data from more than 70,000 hospital admissions, we have devised a simple, robust and generalisable prognostic scoring system for hospitalised patients. The 4C Mortality Score is a risk stratification score that predicts in-hospital mortality for hospitalised COVID-19 patients, produced by the ISARIC 4C consortium. It is intended for use by clinicians. It is designed to be easy to use and to require only parameters that are commonly available at hospital presentation. For full details, see the paper introducing the score. This is an infographic that visualises risk, based on observed mortality among hospitalised adult COVID-19 patients recruited into the ISARIC 4C study in the UK.
- SAHSU-COVID-19: Repository for code and results from SAHSU's research into SARS-CoV-2 and COVID-19
- SocialBubble:
- benevolentai-dat: BenevolentAI's Diversity Analysis Tool (DAT) is a software package that can be used to produce demographic analysis reports given health data sets that contain fields for age, sex, ethnicity, race and socio-economic status. For example, you might have data about a cohort of patients and want to know how well you cover various ethnicities, age groups, sex groups and socio-economic status levels. Assuming your data sets have some of these fields, this software will help generate various views of the data to help inform your work. The DAT tool was developed as part of BenevolentAI's Diversity in Data Initiative, which aims to help improve the ways patients are represented in precision medicine. It is meant to help inspire other developers to find ways of assessing the data diversity in their current and prospective health data sets.
- comix_covid-19-first_wave: Code and data from the analysis of the 1st wave of the CoMix 2020 contacts survey
- comix_uk_covid_restrictions: code and data to reproduce analysis for Comix local and national restrictions paper
- covid-mobility-tracker: Generate maps from google mobility data
- covid-sim: This is the COVID-19 CovidSim microsimulation model developed by the MRC Centre for Global Infectious Disease Analysis hosted at Imperial College, London.
- covid-uk: Scenario analyses for COVID-19 outbreak in the United Kingdom
- covid19_asymptomatic_trans:
- covid19_cases_from_deaths: Shiny app to infer cases from new deaths of COVID-19
- gender_covid_uk: How is COVID-19 impacting women and men’s working lives in the UK? Data analysis, interactive visualizations, and integrative dashboard to understand the impact of COVID-19 on women and men's working lives in the UK
- ggquickeda: ggplot2 and table1 summary statistics quick exploration of data
- hit-analysis: Data and references on vaccine efficacy, R0 and seroprevalence estimates
- los_review: Code used to produce the results presented in "A rapid review of evidence for hospital length of stay of COVID-19 patients"
- ncov_measles_Kenya: Analysis of risk of measles outbreak during COVID19 pandemic in Kenya
- nhs_pathways_monitoring: Real-time monitoring of potential COVID-19 cases reported through NHS pathways data
- nsaids-covid-research:
- rt-comparison-uk-public: Evaluating the use of the reproduction number as an epidemiological tool, using spatio-temporal trends of the Covid-19 outbreak in England https://github.com/epiforecasts/rt-comparison-uk-public/blob/master/README.md
- screening_outbreak_delay:
- transmission_inference:
- link-lite: HDR Link Lite - an open source lightweight version of BC-Link
- Winterfell: Generate complex, validated and extendable JSON-based forms in React.
- gateway-api: HDR UK Gateway API
- gateway-web: HDR UK Gateway Web
- Gateway-Auth-Server:
- Gateway-DB: Health Data Gateway - PostgreSQL Database
- Gateway-Frontend: Health Data Gateway Frontend - react App
- Gateway-Middleware: Health Data Gateway - GraphQL Apollo Middleware
- TRE-Survey:
- clinical-trials:
- covid-19: HDR UK - Resources for 🦠COVID-19 Research
- datasets: HDR UK Gateway Datasets
- oss: HDR UK OSS Contributions
- papers: Extract of publications that mention HDR-UK
- schemata: HDR UK Schemas
- common-api: OpenAPI definitions for the Federated Data Sharing Common API
- CALIBER Drugdose: medication dosage instructions in electronic health records are often in the form of text rather than numbers. This program is designed to convert the text into numbers for the dose, frequency, units, duration etc.
- CALIBER health records research toolkit: This project comprises a set of R packages to assist in epidemiological studies using electronic health records databases. CALIBER (http://caliberresearch.org/) is led from the Farr Institute @ London. CALIBER investigators represent a collaboration between epidemiologists, clinicians, statisticians, health informaticians and computer scientists with initial funding from the Wellcome Trust and the National Institute for Health Research. The goal of CALIBER is to provide evidence across different stages of translation, from discovery, through evaluation to implementation where electronic health records provide new scientific opportunities.
- COVID-19 Phenomics:
- OurRisk.CoV: We recently used large-scale samples of NHS patient data to estimate the excess risk of mortality which may be associated with coronavirus (COVID-19). We wanted to develop this research to inform simple tools to allow interaction and exploration by policymakers, researchers and the public. In the OurRisk.CoV tool, we provide our initial learning in relation to the implementation and development of policy responses to the COVID-19 pandemic. We have provided some simple tools to allow interaction and exploration. OurRisk.CoV is initially intended for use by researchers, policy makers and those who seek to work with patient data to inform responses to the current emergency.
- PGS_Catalog: An open database of polygenic scores and relevant metadata needed to apply and evaluate them correctly.
- PhenomicsLibrary:
- chronological-map-phenotypes: Machine-readable version of electronic health record phenotypes for Kuan V. and Denaxas S. et al.
- discrete_frechet: Compute the Fréchet distance between two polygonal curves in Euclidean space.
- docker-ukbiobank-utils: Docker image for running UK Biobank utilities: ukbunpack, ukbfetch, ukblink, ukbgene, ukbmd5, ukbconv
- hdr-caliber-phenotype-library: HDR UK National Phenotype Library
- ohdsi-etl-caliber: ETL scripts for caliber data
- phemap: Functions to map between ICD-10 terms and PheCodes for UK Biobank hospital electronic health records
- phenomenet-vp: A phenotype-based tool for variant prioritization in WES and WGS data
- phenotypes: HDR UK National Phenotype Library
- tofu: Tofu is a Python tool for generating synthetic UK Biobank data.
- ukb-biomarker-phenotypes: Phenotyping algorithms for common biomarkers in primary care EHR for UK Biobank
- ukbiobank-resources: A curated list for preprocessing, cleaning, mapping and analyzing UK Biobank data.
- BioWSD: corpora containing examples of two ambiguities from the biomedical domain (abbreviations and gene names).
- CRIS Natural Language Processing: library of applications available for use within South London and Maudsley (SLaM) on the Clinical Record Interaction Search (CRIS) platform. Access to CRIS must be applied for in order to use applications.
- Cardiovascular research abstracts: corpus containing examples of potentially contradictory claims from Medline abstracts describing cardiovascular research intended as a useful resource for researchers working on similar problems.
- CogStack-NiFi: Building data processing pipelines for documents processing with NLP using Apache NiFi and related services
- CogStack-Pipeline: Distributed, fault tolerant batch processing for Natural Language Applications and Search, using remote partitioning
- CogStack-Recipes: Example deployment recipes based on CogStack
- CogStack-SemEHR: Surfacing Semantic Data from Clinical Notes in Electronic Health Records for Tailored Care, Trial Recruitment and Clinical Research
- EdIE-BERT: a neural network system for named entity recognition and negation detection with a pretrained BERT encoder (BlueBERT) for brain imaging reports.
- EdIE-BiLSTM: a neural network system for named entity recognition and negation detection with a character-aware BiLSTM sentence encoder for brain imaging reports.
- EdIE-R: a rule-based information extraction tool developed for brain imaging reports.
- EdIE-Viz: provides an interface for stroke-related clinical concept recognition and negation detection in brain radiology reports.
- EndoMineR: a rule-based information extraction system for free-text and semi-structured endoscopy reports and their associated pathology specimens.
- ExECT-V2:
- HELIN: Demo Entity Linking API for the HDR Text Analytic Team.
- MedCAT: Medical Concept Annotation Tool
- MedCATservice: Running MedCAT as a RESTful web service
- MedCATtrainer: A simple interface to inspect, improve and add concepts to biomedical NER+L -> MedCAT.
- MedNorm Corpus: MedNorm is a corpus of 27,979 textual descriptions simultaneously mapped to both MedDRA and SNOMED-CT, sourced from five publicly available datasets across biomedical and social media domains. The cross-terminology medical concept embeddings are 64-dimensional vectors for UMLS, MedDRA and SNOMED-CT concepts that are able to capture semantic similarities between concepts from different medical terminologies.
- PheneBank: 4 million MEDLINE abstracts as well as 3.8M open-access PMC full articles annotated with 9 classes of entity - Phenotype, Disease, Anatomy, Cell, Cell_line, GPR, Gene_variant, Molecule, and Pathway mapped to five major ontologies - SNOMED, HPO, MeSH, PRO, and FMA.
- PheneBank- Processed Medline Abstracts and PMC full articles: 24 million MEDLINE abstracts as well as 3.8M open-access PMC full articles annotated with 9 classes of entity: Phenotype, Disease, Anatomy, Cell, Cell_line, GPR, Gene_variant, Molecule, and Pathway mapped to five major ontologies: SNOMED, HPO, MeSH, PRO, and FMA.
- SIPHS: a collection of software and datasets to support linguistic analysis of online health communities.
- SapBERT: Despite the widespread success of self-supervised learning via masked language models, learning representations directly from text to accurately capture complex and fine-grained semantic relationships in the biomedical domain remains a challenge. SapBERT is a pre-training scheme based on BERT. It self-aligns the representation space of biomedical entities with a metric learning objective function leveraging UMLS, a collection of biomedical ontologies with >4M concepts.
- bioreddit: Word embeddings trained on medical subreddits.
- cometa: Corpus of Online Medical EnTities: the cometA corpus
- freetext-matching-algorithm: Source code for the Freetext Matching Algorithm, a natural language processing system for clinical text
- med7:
- nlp2phenome: uses an AI model to infer patient phenotypes from identified named entities (instances of biomedical concepts)
- onset_pipeline:
- rake-nltk: Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.
- risk-score-builder: Map clinical risk score concepts to structured definitions using ontologies, calculate structured risk scores from NLP.
- HDR-UK_Webmapping_in_R: How to use leaflet to make an interactive map using child obesity at reception year data in England.
- hdruk_summerschool_session1: EHR analysis for the HDR UK summer school 2019
- hdruk_summerschool_session_1_2: Docker image for first two sessions at the HDRUK summer school 2019
- 20190927_IntroductionGithub_HDRUK:
- ADR-graph: Predicting adverse drug reactions from a knowledge graph
- ASSIGN:
- BRASS: Breakpoints via assembly - Identifies breaks and attempts to assemble rearrangements.
- Biomarker_Identification: Biomarker Identification for Bravo, Williams, Gkoutos and Acharjee
- CPRD_HES_variability:
- CPRD_UTI_sepsis_elderly: This is the R code and code lists used to analyse the data for the paper "Antibiotic prescribing for lower UTI in elderly patients in primary care and risk of bloodstream infection: A cohort study using electronic health records in England"
- DWGSIM: Whole Genome Simulator for Next-Generation Sequencing
- Data-Science-for-Docs: Data Science For Practicing Clinicians
- ECMO: Extended Clinical Measurement Ontology
- EHRtemporalVariability: R package for delineating temporal dataset shifts in Electronic Health Records
- EPACTS:
- ESCs_models: Agent-based modelling of pattern formation in embryonic stem cells
- Electronic-Monitoring-Device-Adherence-Typology: Study Aim: To construct a data-driven multi-dimensional typology of medication non-adherence in children with asthma.
- FM-summary: Fine-mapping method only using summary statistics
- HDRUK-fellows-day-2018: Machine Learning Resources collected from the HDR UK Fellows' day 2018 (2018-11-15)
- KGPhenotyping: knowledge graph based phenotyping
- MetaXcan: MetaXcan software and manuscript
- PGM:
- Pulse Wave Database: The aim of this project is to develop a database of simulated pulse waves with which to test pulse wave analysis algorithms. A database of simulated pulse waves representative of a sample of healthy adults was created. It contains several types of pulse waves (such as those measured by smart watches and fitness trackers) at common measurement sites for 4,374 virtual subjects aged 25 to 75 years. The virtual subjects exhibit a range of cardiovascular properties across the normal ranges encountered in healthy people. The results were verified against in vivo data. Case studies were published (at https://doi.org/10.1152/ajpheart.00218.2019) demonstrating the utility of the database. The database is a useful resource for research and education on the topics of signal processing, physiology, and computational modelling.
- RDMP: Research Data Management Platform (RDMP) is an open source application for the loading, linking, anonymisation and extraction of datasets stored in relational databases.
- RF-FeatureSelection-PowerAnalysis:
- ScienceCOPEMethodsCode: Code accompanying map generation in Drew et al. Science manuscript
- UNMIREOT: Tools to identify, diagnose, and semi-automatically repair hidden contradictions in biomedical ontologies
- add_synonyms: add a synonym vocabulary to an ontology
- artificialMHR:
- bhpm: Bayesian Hierarchical Poisson Models for Multiple Grouped Outcomes with Clustering
- cCFRDiamondPrincess:
- cardio: Pulse oximetry data processing and classification
- cgpBattenberg: Battenberg algorithm and associated implementation script
- clockwork: CRyPTIC data processing pipelines
- clustering_w2v:
- dpclust: Dirichlet Process based methods for subclonal reconstruction of tumours
- drug_target_lof: Evaluating potential drug targets through human loss-of-function genetic variation
- generate-patients:
- hdr_uk_demo: HDR UK Summer School demo for Image Registration
- hdruk-phd-handbook: Handbook for the HDRUK-Turing PhD Programme in Health Data Science
- hdruk_bbmri_hack:
- hilton-keeling-estimating-R0: The code in this repository accompanies our upcoming paper "Estimation of country-level basic reproductive ratios for novel Coronavirus (COVID-19) using synthetic contact matrices". A preprint version of this paper is available at https://www.medrxiv.org/content/10.1101/2020.02.26.20028167v1. Some of our notation has changed between the preprint and the submission version of the paper. In particular, the age-specific susceptibility is now denoted by sigma rather than z.
- jabberwocky: toolkit for those nonsensical ontologies
- komenti: Semantic query and text mining tool
- ldsc: LD Score Regression (LDSC)
- loftee:
- lrTaps:
- manuscript_code: Manuscript code for our paper: bit.ly/bloodcellgwas
- metaviz:
- mim3val: mimic3 nlp evaluation. needs a NOTEEVENTS.csv, processed files from Komenti
- mimpred: (In development) Reproducible experimental foundation for experiments with MIMIC
- mimsim: Tools for performing semantic similarity over MIMIC-III
- multiomic_AITD: Scripts associated with the paper doi: https://doi.org/10.1101/662957
- mvGWAMA: Python script for multivariate GWAS meta-analysis
- ocimido: An ontology for ocular immune-mediated inflammatory diseases
- ontologies:
- predCAN: Ontology-based prediction of cancer driver genes
- resource_manager: This is a simple resource manager
- ringbp:
- sampledd: Sample MIMIC patients who die during their stay, and associated text records
- software_differences_paper:
- sview-sat-combined: Multimodal deep learning from satellite and street-level imagery for measuring income, overcrowding, and environmental deprivation in urban areas. The code used to implement the proposed methods and produce the experimental results in the multimodal deep learning article
- synonym_expansion_validation:
- treeImbalance: An R package for detecting asymmetry in time-sampled phylogenetic trees.
- treeSeg: Implementation of the treeSeg algorithm as an R package.
- treeseq-inference: Work for the tree sequence inference paper.
- uvR: R package to support analysing UVR data
- vascular-ageing: Container for Python analysis codes regarding "Prediction of vascular ageing based on smartphone-acquired PPG signals"
- vec2sparql: SPARQL Endpoint with functions for computing embedding similarities
Note: Contributions are noted where there are direct or indirect contributions (funding and support) made by HDR UK to the above projects.
- Create a Github.com account if you do not have one already
- Create a fork of this project.
- Edit the oss_projects.yml file and add an entry for your project using this template:

      - name: 'Project Awesome'
        description: 'Describe what your project is about'
        url: 'https://github.com/YOUREPO (Provide your Github Repo URL)'
        keywords:
          - 'example'
          - 'keyword'
        categories:
          - 'HDR UK'
          - 'themes'

- Commit the changes to your fork and submit a Pull Request against the `master` branch of this project.
We'll review all requests and accept them according to HDR UK's policies. If accepted, your project will be listed above.
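
Before opening a Pull Request, it can help to sanity-check the entry you are about to add. The following is a minimal sketch, not part of this repository: the function name `validate_entry`, the example URL, and the rule that every field in the template is required are all assumptions made for illustration. It works on an entry already loaded into a Python dictionary, so it needs only the standard library.

```python
from urllib.parse import urlparse

# Field names come from the template in the contribution steps.
# Treating all of them as required is an assumption, not a documented rule.
TEXT_FIELDS = ("name", "description", "url")
LIST_FIELDS = ("keywords", "categories")


def validate_entry(entry):
    """Return a list of human-readable problems found in one project entry."""
    problems = []

    # Each text field should be a non-empty string.
    for field in TEXT_FIELDS:
        value = entry.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"'{field}' must be a non-empty string")

    # Each list field should be a non-empty list of non-empty strings.
    for field in LIST_FIELDS:
        value = entry.get(field)
        if not isinstance(value, list) or not value:
            problems.append(f"'{field}' must be a non-empty list")
        elif not all(isinstance(item, str) and item.strip() for item in value):
            problems.append(f"'{field}' must contain only non-empty strings")

    # The URL should be a plain http(s) address; a space usually means
    # the placeholder text from the template was left in.
    url = entry.get("url")
    if isinstance(url, str) and url.strip():
        parsed = urlparse(url.strip())
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("'url' must be an http(s) URL")
        elif " " in url.strip():
            problems.append("'url' must not contain spaces")

    return problems


# Hypothetical entry used only to illustrate the check.
example_entry = {
    "name": "Project Awesome",
    "description": "Describe what your project is about",
    "url": "https://github.com/example-org/project-awesome",
    "keywords": ["example", "keyword"],
    "categories": ["HDR UK", "themes"],
}

if __name__ == "__main__":
    for problem in validate_entry(example_entry) or ["no problems found"]:
        print(problem)
```

An empty list means the entry passed these illustrative checks; it does not guarantee that the Pull Request will be accepted, since review follows HDR UK's policies.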