Release v1.3.0
josephwilson8-nhs committed May 12, 2023
1 parent 43bc558 commit ae29527
Showing 26 changed files with 971 additions and 113 deletions.
3 changes: 3 additions & 0 deletions .devcontainer/devcontainer.json
@@ -0,0 +1,3 @@
{"image":"mcr.microsoft.com/devcontainers/universal:2",
"postCreateCommand": "pip3 install --user -r requirements.txt && python -m mkdocs serve"
}
2 changes: 1 addition & 1 deletion .github/workflows/pages-build-deployment.yml
@@ -1,4 +1,4 @@
name: Build and Deploy Website Pages
name: Website Deployment

# Controls when the workflow will run
on:
2 changes: 1 addition & 1 deletion README.md
@@ -1,5 +1,5 @@
# [RAP Community of Practice](https://NHSDigital.github.io/rap-community-of-practice/)
![CI](https://github.com/NHSDigital/rap-community-of-practice/actions/workflows/main.yml/badge.svg "CI badge indicating passing or failing status")
![CI](https://github.com/NHSDigital/rap-community-of-practice/actions/workflows/pages-build-deployment.yml/badge.svg "CI badge indicating passing or failing status")
[![Release Version](https://img.shields.io/github/v/release/nhsdigital/rap-community-of-practice "Release version")](https://github.com/NHSDigital/rap-community-of-practice/releases)
[![MkDocs Material](https://img.shields.io/badge/style-MkDocs%20Material-darkblue "Markdown Style: MkDocs")](https://squidfunk.github.io/mkdocs-material/reference/)
[![licence: MIT](https://img.shields.io/badge/Licence-MIT-yellow.svg)](https://opensource.org/licenses/MIT "MIT License")
26 changes: 25 additions & 1 deletion docs/about.md
@@ -1,5 +1,26 @@
---
hide:
- navigation
---

# RAP Community of Practice

> **This material is maintained by the [NHS Digital Data Science team](mailto:datascience@nhs.net)**.
> You can see some examples of our work [here](https://github.com/NHSDigital/data-analytics-services), including [underlying code to NHS Digital publications](https://github.com/NHSDigital/data-analytics-services#rap-publication-repositories) which have been published as a direct outcome of the service our team provides.
These resources are intended for those interested in adopting [Reproducible Analytical Pipelines (RAP)](https://analysisfunction.civilservice.gov.uk/support/reproducible-analytical-pipelines/).

RAP is becoming the standard for creating analytical outputs in government, combining a number of ways of working that help to improve the reliability, transparency, and speed of statistics publications. Learn more on our [Why RAP is important][17] page.

## RAP in the NHS

The [Goldacre Review](https://www.gov.uk/government/publications/better-broader-safer-using-health-data-for-research-and-analysis), tasked with finding ways to deliver better, broader, and safer use of NHS data for analysis and research, identified RAP as the essential element to ensure high-quality analysis.

The Data Science team at NHS Digital have been championing RAP practices and providing support for analytical teams across our organisation. We have published these resources in the spirit of openness and transparency, and in the hope that other teams in other organisations may find them useful. You can find out more about our incredible contributors on our [Acknowledgements](acknowledgements.md) page.

Learn more about our [RAP service][19].

## Aims

This community of practice aims to support teams in adopting RAP practices through:
@@ -45,7 +66,10 @@ This collection of resources is [© Crown copyright](http://www.nationalarchives
[8]: ./training_resources/git/intro-to-git.md
[11]: ./introduction_to_RAP/why_RAP_is_important.md#aims-of-rap
[12]: ./introduction_to_RAP/levels_of_RAP.md
[13]: ./our_RAP_service/README.md#support
[13]: ./our_RAP_service#support
[14]: ./implementing_RAP/code-review.md
[15]: ./implementing_RAP/tools.md
[16]: ./training_resources/pyspark/README.md
[17]: ./introduction_to_RAP/why_RAP_is_important.md
[18]: ./implementing_RAP/how-to-publish-your-code-in-the-open.md
[19]: ./our_RAP_service
17 changes: 12 additions & 5 deletions docs/acknowledgements.md
@@ -1,14 +1,21 @@
---
hide:
- navigation
- toc
---

# Acknowledgements

It's taken a lot of work to make the NHS Digital RAP Community of Practice and further the cause of RAP within NHS Digital more generally.

Many people have pitched in, doing what they could, often **going the extra mile** and ultimately with the **goal of helping our fellow analysts**.
The **NHS Digital Data Science Skilled Team** has been at the core of this work, with the **Data Science RAP Squad** in particular leading the charge, piece by piece moving mountains and making a lasting difference.

**Many thanks and congratulations** to the following for their incredible hard work.

| [Helen Richardson](https://github.com/helrich)|[Jonny Laidler](https://github.com/JonathanLaidler) |[Harriet Sands](https://github.com/harrietrs) | [Maakhe Ndhlela](https://github.com/maakhe)|
|:----------------------------------|:-----|:----|:---|
| __[Connor Quinn](https://github.com/connor1q)__|__[Alistair Jones](https://github.com/alistair-jones)__ |__[Daniel Goldwater](https://github.com/DanGoldwater1)__ | __[Joseph Wilson](https://github.com/josephwilson8-nhs)__|
| __[Philip Hoang Le](https://github.com/philip-le)__ |__[Sam Hollings](https://github.com/SamHollings)__ |__[Abbie Prescott](https://github.com/abbieprescott)__ |__[Xiyao Zhuang](https://github.com/xiyaozhuang)__ |
| [Helen Richardson](https://github.com/helrich) | [Jonny Laidler](https://github.com/JonathanLaidler) | [Harriet Sands](https://github.com/harrietrs) | [Maakhe Ndhlela](https://github.com/maakhe) | [Scarlett Kynoch](https://github.com/scarlett-k-nhs)|
| :-------------------------------------------------- | :------------------------------------------------------ | :------------------------------------------------------- | :-------------------------------------------------------- | :-------------------------------------------------------- |
| **[Connor Quinn](https://github.com/connor1q)** | **[Alistair Jones](https://github.com/alistair-jones)** | **[Daniel Goldwater](https://github.com/DanGoldwater1)** | **[Joseph Wilson](https://github.com/josephwilson8-nhs)** | |
| **[Philip Hoang Le](https://github.com/philip-le)** | **[Sam Hollings](https://github.com/SamHollings)** | **[Abbie Prescott](https://github.com/abbieprescott)** | **[Xiyao Zhuang](https://github.com/xiyaozhuang)** | |

**You guys really put the "champion" in RAP Champion!!!**
10 changes: 9 additions & 1 deletion docs/example_RAP_CoP_page.md
@@ -108,6 +108,14 @@ You can also have tabs:
And in here something completely different, such as a diagram
![alt text](images/branch_info.JPG "Some random picture")

## Useful Links
## Further Reading

- Provide any useful links people might need to further their learning

??? info "_External Links Disclaimer_"

*NHS Digital makes every effort to ensure that external links are accurate, up to date and relevant; however, we cannot take responsibility for pages maintained by external providers.*

*NHS Digital is not affiliated with any of the websites or companies in the links to external websites.*

*If you come across any external links that do not work, we would be grateful if you could report them by raising an issue on our [RAP Community of Practice GitHub].*
37 changes: 37 additions & 0 deletions docs/glossary.md
@@ -0,0 +1,37 @@
---
hide:
- navigation
---

## RAP

RAP stands for Reproducible Analytical Pipelines. The term comes from UK public sector data scientists; you can [read the ONS description here](https://datasciencecampus.ons.gov.uk/capability/data-science-campus-faculty/reproducible-analytical-pipeline-journey/#:~:text=Reproducible%20Analytical%20Pipelines%20are%20programs,impressive%20efficiencies%20in%20your%20teams.). We also have a page on [why RAP is important](introduction_to_RAP/why_RAP_is_important.md).

## Pipeline

A data pipeline is a series of steps or processes that collect, process, and transform data from various sources into a format that is ready to use. The pipeline typically includes data ingestion, data cleaning, and some analysis. A good pipeline can automate work as well as improve the quality of the outputs.
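
As a minimal sketch of the idea, each stage of a pipeline can be written as a function whose output feeds the next. The CSV of patient counts by region and the function names (`ingest`, `clean`, `analyse`, `run_pipeline`) are hypothetical and only for illustration:

```python
import csv
import io

# Hypothetical raw input standing in for a real data source
RAW_CSV = """region,patients
North,120
South,
East,95
North,80
"""


def ingest(raw_text):
    """Ingestion: read raw CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def clean(rows):
    """Cleaning: drop rows with a missing patient count and convert counts to int."""
    return [
        {"region": row["region"], "patients": int(row["patients"])}
        for row in rows
        if row["patients"].strip()
    ]


def analyse(rows):
    """Analysis: total patients per region."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0) + row["patients"]
    return totals


def run_pipeline(raw_text):
    """Run each step in order, passing the output of one step to the next."""
    return analyse(clean(ingest(raw_text)))


if __name__ == "__main__":
    print(run_pipeline(RAW_CSV))
```

Keeping each step in its own small function means each one can be tested on its own and swapped out (for example, reading from a database instead of a CSV) without touching the rest of the pipeline.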

## Virtual Environments

Virtual environments are a way to ensure that you have maximum control over the code that you're writing and how it will run. Many software packages interact with one another, and not always in a good or predictable way. Virtual environments allow you to develop code as if you were working on a completely clean, separate machine. In a virtual environment, you can install whatever packages you want, at whatever version, without worrying about affecting the other software or projects on your computer.

Best practice is to create a separate virtual environment for each project we work on.

## Venv

`venv` is a module for managing virtual environments in Python. It is part of Python's standard library, so it comes pre-installed with Python. We highly recommend it, although other tools are available. See [our page](training_resources/python/virtual-environments/venv.md) for advice on using it.
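
The usual way to create an environment is to run `python -m venv .venv` from your project folder. The same thing can be done from Python with the standard library `venv` module. The sketch below assumes a `.venv` folder inside each project (a common convention, not a requirement), and the helper names `create_project_env` and `env_python` are hypothetical:

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_project_env(project_dir):
    """Create an isolated virtual environment in <project_dir>/.venv (hypothetical helper)."""
    env_dir = Path(project_dir) / ".venv"
    # with_pip=False keeps this sketch quick; use with_pip=True in practice
    # so that pip is available inside the new environment
    venv.create(env_dir, with_pip=False)
    return env_dir


def env_python(env_dir):
    """Return the path to the Python interpreter inside the environment."""
    # Windows keeps executables in Scripts\, other platforms use bin/
    if sys.platform == "win32":
        return Path(env_dir) / "Scripts" / "python.exe"
    return Path(env_dir) / "bin" / "python"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as project:
        env_dir = create_project_env(project)
        print(env_python(env_dir))
```

Once the environment exists, activate it (`source .venv/bin/activate` on macOS/Linux, `.venv\Scripts\activate` on Windows) before installing packages with `pip`, so they go into the project's environment rather than your system Python.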

## Conda

`conda` is another virtual environment manager for Python. It does not come pre-installed with Python; instead, it comes with Anaconda. You can find our advice on [how to use conda here](training_resources/python/virtual-environments/conda.md).

## Git

Git is a version control system which helps you keep track of changes to code. Git isn't specific to Python, and is used almost universally for coding. Git is a program which runs locally on your computer. Where Git really comes into its own, though, is when you use it to collaborate on code with others. GitHub and GitLab are two websites which help with this.

You can read more about git on our [introduction to Git](training_resources/git/intro-to-git.md) page.

## IDE

An IDE (Integrated Development Environment) is a piece of software you can use to write code. You can write code anywhere that you can write plain text, but IDEs are designed to help with the process. The IDE doesn't affect how the code will run, and you can safely move code between IDEs.
IDEs are packed with useful features like autocompletion, testing tools, Git integration, linting and more. We recommend Visual Studio Code, but you can also try PyCharm, Spyder, or Eclipse, to name a few.
Binary file added docs/images/extensions_img.png
Binary file added docs/images/logo/nhs-blue-on-white.jpg
Binary file added docs/images/logo/nhs-white-on-blue.jpg
Binary file added docs/images/python_extension.png
Binary file added docs/images/vscode_run_file.png
6 changes: 3 additions & 3 deletions docs/implementing_RAP/when-to-stop-coding.md
@@ -118,9 +118,9 @@ The "Straightforward" part of "KISS" refers to how things break and the sophisti

!!! note

KISS originally meant **"Keep It Simple, Stupid"**, with the aerospace engineer Kelly Johnson of the Lockheed Martin Skunk Works credited with coining the phrase. The Skunk Works developed experimental and advanced aeroplanes, including the SR-71 Blackbird spy plane and the F-117 Nighthawk stealth bomber.

Although it may sound churlish, the "Stupid" part of "KISS" means the same as "Straightforward". Kelly Johnson would task design engineers with designing their aeroplanes to be repairable under combat conditions in the field by an average mechanic with a standard set of tools.
KISS originally meant **"Keep It Simple, Stupid"**, with the aerospace engineer Kelly Johnson of the Lockheed Martin Skunk Works credited with coining the phrase. The Skunk Works developed experimental and advanced aeroplanes, including the SR-71 Blackbird spy plane and the F-117 Nighthawk stealth bomber.
Although it may sound churlish, the "Stupid" part of "KISS" means the same as "Straightforward". Kelly Johnson would task design engineers with designing their aeroplanes to be repairable under combat conditions in the field by an average mechanic with a standard set of tools.

With analytical pipelines, this concept can apply to both the code design you use in your functions and your overall approach to your pipeline. Your pipeline overall should be kept simple, which helps with transparency, allowing someone unfamiliar with the process to understand what your pipeline does. In addition, if you are looking to release your pipeline as a package, make sure that you don't try to make the *All-Purpose NHS Data Super-Pipeline Package*, but instead a specific pipeline with a specific range of outputs. Focusing on your pipeline's results can help implement the KISS principle.

8 changes: 8 additions & 0 deletions docs/index.md
@@ -0,0 +1,8 @@
---
hide:
- navigation
- toc
- footer

template: home.html
---
93 changes: 93 additions & 0 deletions docs/introduction_to_RAP/what-is-open-source.md
@@ -0,0 +1,93 @@
# What is open source?

!!! tip "TLDR"

- Open source programming languages are generally **free**! This means we can use them, and other analysts can easily reuse our code without the need for costly software licences.

- Open source programming languages and tools are widely distributed under various open source licences; **these may have conditions which you should be aware of**.

- Open source programming languages and tools often have vibrant and collaborative communities that provide additional libraries (extensions) and documentation (though this isn't always complete).

- Reusing open source code can save time and resources, avoid duplication of effort, and increase efficiency.

??? success "Pre-requisites"

| Pre-requisite | Importance | Note |
|---------------|------------|------|
| [Levels of RAP] | Helpful | Open source is an essential component of Baseline RAP |

## What is an open source programming language?

Open source programming languages are not proprietary to any single vendor. They are widely available, usually maintained by a community, and typically freely distributed under various open source licences.

For example, [Python] is developed under an OSI-approved open source licence, making it freely usable and distributable, even for commercial use. Python's licence is administered by the [Python Software Foundation].

We recommend aligning our practices with [The Technology Code of Practice], a cross-government agreed set of criteria to help government design, build and buy technology. Specifically, its third point states that we should [Be open and use open source].

Producing data in an open source language is an essential component of Baseline RAP. The Baseline level is designated as the _minimum_ standard for a reproducible analytical pipeline, so using an open source programming language is a good place to start adopting RAP practices.

## Why use an open source programming language?

**Licensing and distribution** - Open source programming languages are widely distributed under open source licences, so anyone can install and share them without paying licence fees.

**Collaborative communities** - Resolve common problems using solutions from open source communities.

**Libraries and tools** - There is a vast number of useful open source libraries and tools that are easily accessible and well documented.

**Time and cost** - Open source is free to use! Additionally, code libraries and tools developed for open source programming languages can save a lot of time when writing the source code of a project and lower implementation and running costs.

## What is open source code?

Open source code is provided under a licence that allows anyone to freely access, use, and modify the code, subject to any conditions set by that licence.

This means that, regardless of who produced it, anyone can contribute to its further development or use it for their own purposes without paying a licensing fee or seeking permission from the contributors.

Any open source code can be reused by our developers to reduce costs, avoid duplication of effort, and increase staff efficiency.
**By publishing your code**, and even better **[packaging up your code]**, you add to the body of open source code, **allowing someone else to benefit from your hard work**.

[How to publish your code in the open :material-cursor-default-click:][how to publish your code in the open]{ .md-button }

## Examples of open source projects

**[Splink]** - Splink is a Python package developed by the Ministry of Justice for probabilistic record linkage. It deduplicates and/or links records from datasets that lack a unique identifier. The core linkage algorithm is an implementation of the Fellegi-Sunter model of record linkage, with various customisations to improve accuracy. Check out the publication ['Splink: MoJ’s open source library for probabilistic record linkage at scale'] to find out more.

The package is fully open source and can be found on GitHub. It is accompanied by a set of [interactive demos] that illustrate its functionality and let users run real record linkage jobs in their web browser.

**[Coronavirus Dashboard][coronavirus-dashboard-github]** - Public Health England's coronavirus dashboard repository on GitHub contains the frontend source code for the [Coronavirus Dashboard], as well as the API service that supplies the latest data for the COVID-19 outbreak in the UK. Public Health England also developed software development kits (SDKs) in several programming languages, such as the [Python SDK], to make the API easier to access.

**[NHS.UK frontend]** - NHS.UK frontend contains code to start building user interfaces for NHS websites and services.

**[NHS COVID Pass Verifier]** - The NHS COVID Pass Verifier app is a secure way to scan an individual’s NHS COVID Pass and check that they have been fully vaccinated against COVID-19, had a negative test, or have recovered from COVID-19.

## Further reading

- [Be open and use open source] (GOV.UK)
- [The benefits of coding in the open] (GDS)
- [Open source policy] (NHSX)

??? info "_External Links Disclaimer_"

*NHS Digital makes every effort to ensure that external links are accurate, up to date and relevant; however, we cannot take responsibility for pages maintained by external providers.*

*NHS Digital is not affiliated with any of the websites or companies in the links to external websites.*

*If you come across any external links that do not work, we would be grateful if you could report them by raising an issue on our [RAP Community of Practice GitHub].*

[levels of rap]: ./levels_of_RAP.md
[python]: https://www.python.org/about/
[python software foundation]: https://www.python.org/psf-landing/
[the technology code of practice]: https://www.gov.uk/guidance/the-technology-code-of-practice
[splink]: https://github.com/moj-analytical-services/splink/
[splink: moj’s open source library for probabilistic record linkage at scale]: https://www.gov.uk/government/publications/joined-up-data-in-government-the-future-of-data-linking-methods/splink-mojs-open-source-library-for-probabilistic-record-linkage-at-scale
[interactive demos]: https://github.com/moj-analytical-services/splink_demos
[coronavirus-dashboard-github]: https://github.com/publichealthengland/coronavirus-dashboard
[coronavirus dashboard]: https://coronavirus.data.gov.uk/
[python sdk]: https://github.com/publichealthengland/coronavirus-dashboard-api-python-sdk
[nhs.uk frontend]: https://github.com/nhsuk/nhsuk-frontend
[nhs covid pass verifier]: https://github.com/nhsx/covid-pass-verifier
[how to publish your code in the open]: ../implementing_RAP/how-to-publish-your-code-in-the-open.md
[be open and use open source]: https://www.gov.uk/guidance/be-open-and-use-open-source
[the benefits of coding in the open]: https://gds.blog.gov.uk/2017/09/04/the-benefits-of-coding-in-the-open/
[open source policy]: https://github.com/nhsx/open-source-policy/blob/main/open-source-policy.md
[rap community of practice github]: https://github.com/NHSDigital/rap-community-of-practice/issues
[packaging up your code]: ../training_resources/python/project-structure-and-packaging.md
2 changes: 1 addition & 1 deletion docs/our_RAP_service/building_team_capability.md
@@ -19,7 +19,7 @@ This guide will detail what you need to consider before starting a RAP engagemen

## RAP pre-engagement questionnaire

The analyst team needs to assess their ability to carry out a RAP project end-to-end by completing a RAP pre-engagement questionnaire(Link TBC).
The analyst team needs to assess their ability to carry out a RAP project end-to-end by completing a [RAP pre-engagement questionnaire](./rap-pre-engagement-questionnaire.md).

The questionnaire will contain a series of questions which will revolve around initial project considerations:
