diff --git a/.devcontainer/devcontainer.json b/.devcontainer/devcontainer.json new file mode 100644 index 00000000..9fcfca43 --- /dev/null +++ b/.devcontainer/devcontainer.json @@ -0,0 +1,3 @@ +{"image":"mcr.microsoft.com/devcontainers/universal:2", +"postCreateCommand": "pip3 install --user -r requirements.txt && python -m mkdocs serve" +} \ No newline at end of file diff --git a/.github/workflows/pages-build-deployment.yml b/.github/workflows/pages-build-deployment.yml index 774ead34..28e87b7d 100644 --- a/.github/workflows/pages-build-deployment.yml +++ b/.github/workflows/pages-build-deployment.yml @@ -1,4 +1,4 @@ -name: Build and Deploy Website Pages +name: Website Deployment # Controls when the workflow will run on: diff --git a/README.md b/README.md index 42ff5598..8437a625 100644 --- a/README.md +++ b/README.md @@ -1,5 +1,5 @@ # [RAP Community of Practice](https://NHSDigital.github.io/rap-community-of-practice/) -![CI](https://github.com/NHSDigital/rap-community-of-practice/actions/workflows/main.yml/badge.svg "CI badge indicating passing or failing status") +![CI](https://github.com/NHSDigital/rap-community-of-practice/actions/workflows/pages-build-deployment.yml/badge.svg "CI badge indicating passing or failing status") [![Release Version](https://img.shields.io/github/v/release/nhsdigital/rap-community-of-practice "Release version")](https://github.com/NHSDigital/rap-community-of-practice/releases) [![MkDocs Material](https://img.shields.io/badge/style-MkDocs%20Material-darkblue "Markdown Style: MkDocs")](https://squidfunk.github.io/mkdocs-material/reference/) [![licence: MIT](https://img.shields.io/badge/Licence-MIT-yellow.svg)](https://opensource.org/licenses/MIT "MIT License") diff --git a/docs/about.md b/docs/about.md index a8fe1fc7..2bd19458 100644 --- a/docs/about.md +++ b/docs/about.md @@ -1,5 +1,26 @@ +--- +hide: + - navigation +--- + # RAP Community of Practice +> **This material is maintained by the [NHS Digital Data Science 
team](mailto:datascience@nhs.net)**. + +> You can see some examples of our work [here](https://github.com/NHSDigital/data-analytics-services), including [underlying code to NHS Digital publications](https://github.com/NHSDigital/data-analytics-services#rap-publication-repositories) which have been published as a direct outcome of the service our team provides. + +These resources are intended for those interested in adopting [Reproducible Analytical Pipelines (RAP)](https://analysisfunction.civilservice.gov.uk/support/reproducible-analytical-pipelines/). + +RAP is becoming the standard for creating analytical outputs in government; combining a number of ways of working that help to improve the reliability, transparency, and speed of statistics publications. Learn more on our [Why RAP is important][17] page. + +## RAP in the NHS + +The [Goldacre Review](https://www.gov.uk/government/publications/better-broader-safer-using-health-data-for-research-and-analysis), tasked with finding ways to deliver better, broader, and safer use of NHS data for analysis and research, identified RAP as the essential element to ensure high-quality analysis. + +The Data Science team at NHS Digital have been championing RAP practices and providing support for analytical teams across our organisation. We have published these resources in the spirit of openness and transparency, and in the hope that other teams in other organisations may find them useful. You can find out more about our incredible contributors on our [Acknowledgements](acknowledgements.md) page. + +Learn more about our [RAP service][19]. 
+ ## Aims This community of practice aims to support teams in adopting RAP practices through: @@ -45,7 +66,10 @@ This collection of resources is [© Crown copyright](http://www.nationalarchives [8]: ./training_resources/git/intro-to-git.md [11]: ./introduction_to_RAP/why_RAP_is_important.md#aims-of-rap [12]: ./introduction_to_RAP/levels_of_RAP.md -[13]: ./our_RAP_service/README.md#support +[13]: ./our_RAP_service#support [14]: ./implementing_RAP/code-review.md [15]: ./implementing_RAP/tools.md [16]: ./training_resources/pyspark/README.md +[17]: ./introduction_to_RAP/why_RAP_is_important.md +[18]: ./implementing_RAP/how-to-publish-your-code-in-the-open.md +[19]: ./our_RAP_service diff --git a/docs/acknowledgements.md b/docs/acknowledgements.md index 78aba207..b1cd044b 100644 --- a/docs/acknowledgements.md +++ b/docs/acknowledgements.md @@ -1,14 +1,21 @@ +--- +hide: + - navigation + - toc +--- + # Acknowledgements + It's taken a lot of work to make the NHS Digital RAP Community of Practice and further the cause of RAP within NHS Digital more generally. Many people have pitched in, doing what they could, often **going the extra mile** and ultimately with the **goal of helping our fellow analysts**. The **NHS Digital Data Science Skilled Team** has been the core of this work, but in particular the **Data Science RAP Squad** lead the charge, piece by piece moving mountains and making a lasting difference. -**Many thanks and congratulations** to the following for their incredible hard work. +**Many thanks and congratulations** to the following for their incredible hard work. 
-| [Helen Richardson](https://github.com/helrich)|[Jonny Laidler](https://github.com/JonathanLaidler) |[Harriet Sands](https://github.com/harrietrs) | [Maakhe Ndhlela](https://github.com/maakhe)| -|:----------------------------------|:-----|:----|:---| -| __[Connor Quinn](https://github.com/connor1q)__|__[Alistair Jones](https://github.com/alistair-jones)__ |__[Daniel Goldwater](https://github.com/DanGoldwater1)__ | __[Joseph Wilson](https://github.com/josephwilson8-nhs)__| -| __[Philip Hoang Le](https://github.com/philip-le)__ |__[Sam Hollings](https://github.com/SamHollings)__ |__[Abbie Prescott](https://github.com/abbieprescott)__ |__[Xiyao Zhuang](https://github.com/xiyaozhuang)__ | +| [Helen Richardson](https://github.com/helrich) | [Jonny Laidler](https://github.com/JonathanLaidler) | [Harriet Sands](https://github.com/harrietrs) | [Maakhe Ndhlela](https://github.com/maakhe) | [Scarlett Kynoch](https://github.com/scarlett-k-nhs)| +| :-------------------------------------------------- | :------------------------------------------------------ | :------------------------------------------------------- | :-------------------------------------------------------- | :-------------------------------------------------------- | +| **[Connor Quinn](https://github.com/connor1q)** | **[Alistair Jones](https://github.com/alistair-jones)** | **[Daniel Goldwater](https://github.com/DanGoldwater1)** | **[Joseph Wilson](https://github.com/josephwilson8-nhs)** | | +| **[Philip Hoang Le](https://github.com/philip-le)** | **[Sam Hollings](https://github.com/SamHollings)** | **[Abbie Prescott](https://github.com/abbieprescott)** | **[Xiyao Zhuang](https://github.com/xiyaozhuang)** | | **You guys really put the "champion" in RAP Champion!!!** diff --git a/docs/example_RAP_CoP_page.md b/docs/example_RAP_CoP_page.md index 819fa634..3555e0d4 100644 --- a/docs/example_RAP_CoP_page.md +++ b/docs/example_RAP_CoP_page.md @@ -108,6 +108,14 @@ You can also have tabs: And in here something 
completely different, such as a diagram ![alt text](images/branch_info.JPG "Some random picture") -## Useful Links +## Further Reading - Provide any useful links people might need to further their learning + +??? info "_External Links Disclaimer_" + + *NHS Digital makes every effort to ensure that external links are accurate, up to date and relevant, however we cannot take responsibility for pages maintained by external providers.* + + *NHS Digital is not affiliated with any of the websites or companies in the links to external websites.* + + *If you come across any external links that do not work, we would be grateful if you could report them by raising an issue on our [RAP Community of Practice GitHub].* \ No newline at end of file diff --git a/docs/glossary.md b/docs/glossary.md new file mode 100644 index 00000000..957b495a --- /dev/null +++ b/docs/glossary.md @@ -0,0 +1,37 @@ +--- +hide: + - navigation +--- + +## RAP + +RAP stands for Reproducible Analytical Pipelines. The term comes from UK public sector data scientists - you can [read the ONS description here](https://datasciencecampus.ons.gov.uk/capability/data-science-campus-faculty/reproducible-analytical-pipeline-journey/#:~:text=Reproducible%20Analytical%20Pipelines%20are%20programs,impressive%20efficiencies%20in%20your%20teams.). We also have a page on [why RAP is important](introduction_to_RAP/why_RAP_is_important.md) + +## Pipeline + +A data pipeline is a series of steps or processes that are used to collect, process, and transform data from various sources into a format that can be easily used. The pipeline typically includes data ingestion, data cleaning, and some analysis. A good pipeline can automate work as well as improving the quality of the outcomes. + +## Virtual Environments + +Virtual environments are a way to ensure that you have maximum control over the code that you're writing and how it will run. 
Many software packages interact with one another, and not always in a good or predictable way. Virtual environments allow you to develop code as if you were working on a completely clean, separate machine. In a virtual environment, you can install whatever packages you want, at whatever version, without worrying about affecting the other software or projects you might have on your computer. + +Best practice recommends that we create a different virtual environment for each project that we work on. + +## Venv + +`venv` is a particular package for managing virtual environments in Python, which comes pre-installed with Python. We highly recommend using this package, although others are available. See [our page](training_resources/python/virtual-environments/venv.md) for advice on using it. + +## Conda + +`conda` is also a virtual environment manager for Python. Conda does not come pre-installed with Python, but instead comes with Anaconda. You can find our advice on [how to use conda here](training_resources/python/virtual-environments/conda.md). + +## Git + +Git is a version control system which can help with keeping track of changes to code. Git isn't specific to Python, and is used almost universally for coding. Git is a program which runs locally on your computer. Where Git really comes into its own, though, is when you use it to help with collaborating on code with others. GitHub and GitLab are two websites which help to do this. + +You can read more about Git on our [introduction to Git](training_resources/git/intro-to-git.md) page. + +## IDE + +An IDE (Integrated Development Environment) is a piece of software you can use to write code. You can write code anywhere that you can write plain text, but IDEs are designed to help with the process. Your choice of IDE doesn't affect how the code will run, and you can move the code safely between IDEs. +IDEs are packed with useful features like autocompletion, test suites, Git integration, linting and more. 
We recommend Visual Studio Code, but you can also try PyCharm, Spyder, or Eclipse - to name a few. diff --git a/docs/images/extensions_img.png b/docs/images/extensions_img.png new file mode 100644 index 00000000..e51ee2c6 Binary files /dev/null and b/docs/images/extensions_img.png differ diff --git a/docs/images/logo/nhs-blue-on-white.jpg b/docs/images/logo/nhs-blue-on-white.jpg new file mode 100644 index 00000000..b95c15c7 Binary files /dev/null and b/docs/images/logo/nhs-blue-on-white.jpg differ diff --git a/docs/images/logo/nhs-white-on-blue.jpg b/docs/images/logo/nhs-white-on-blue.jpg new file mode 100644 index 00000000..fa0fc8f4 Binary files /dev/null and b/docs/images/logo/nhs-white-on-blue.jpg differ diff --git a/docs/images/python_extension.png b/docs/images/python_extension.png new file mode 100644 index 00000000..1728b2ca Binary files /dev/null and b/docs/images/python_extension.png differ diff --git a/docs/images/vscode_run_file.png b/docs/images/vscode_run_file.png new file mode 100644 index 00000000..9a76de02 Binary files /dev/null and b/docs/images/vscode_run_file.png differ diff --git a/docs/implementing_RAP/when-to-stop-coding.md b/docs/implementing_RAP/when-to-stop-coding.md index 51ba520c..c615206b 100644 --- a/docs/implementing_RAP/when-to-stop-coding.md +++ b/docs/implementing_RAP/when-to-stop-coding.md @@ -118,9 +118,9 @@ The "Straightforward" part of "KISS" refers to how things break and the sophisti !!! note - KISS originally meant **"Keep It Simple, Stupid"**, with the Aerospace Engineer Kelly Johnson of the Lockheed Martin Skunk Works being attributed with coining the phrase. The Skunk Works developed experimental and advanced aeroplanes, including the SR-71 Blackbird spy plane and the F-117 Nighthawk stealth bomber. - - While sounding like a churlish addition, the "Stupid" part of "KISS" means the same as "Straightforward". 
Kelly Johnson would task Design Engineers to design their aeroplanes to be repairable under combat conditions in the field by an average mechanic with a standard set of tools. + KISS originally meant **"Keep It Simple, Stupid"**, with the Aerospace Engineer Kelly Johnson of the Lockheed Martin Skunk Works being attributed with coining the phrase. The Skunk Works developed experimental and advanced aeroplanes, including the SR-71 Blackbird spy plane and the F-117 Nighthawk stealth bomber. + + While sounding like a churlish addition, the "Stupid" part of "KISS" means the same as "Straightforward". Kelly Johnson would task Design Engineers to design their aeroplanes to be repairable under combat conditions in the field by an average mechanic with a standard set of tools. With analytical pipelines, this concept can apply to both the code design you use in your functions and your overall approach to your pipeline. Your pipeline overall should be kept simple, which helps with transparency, allowing someone unfamiliar with the process to understand what your pipeline does. In addition, if you are looking to release your pipeline as a package, make sure that you don't try to make the *All-Purpose NHS Data Super-Pipeline Package*, but instead a specific pipeline with a specific range of outputs. Focusing on your pipeline's results can help implement the KISS principle. diff --git a/docs/index.md b/docs/index.md new file mode 100644 index 00000000..f4c0d99f --- /dev/null +++ b/docs/index.md @@ -0,0 +1,8 @@ +--- +hide: + - navigation + - toc + - footer + +template: home.html +--- diff --git a/docs/introduction_to_RAP/what-is-open-source.md b/docs/introduction_to_RAP/what-is-open-source.md new file mode 100644 index 00000000..1016018c --- /dev/null +++ b/docs/introduction_to_RAP/what-is-open-source.md @@ -0,0 +1,93 @@ +# What is open source? + +!!! tip "TLDR" + + - Open source programming languages are generally **free**! 
This means we can use them, and other analysts can easily reuse our code without the need for costly software licenses. + + - Open source programming languages and tools are widely distributed under various open source licenses - **these may have conditions which you should be aware of**. + + - Open source programming languages and tools often have vibrant and collaborative communities that provide additional libraries (extensions) and documentation (though this isn't always complete). + + - Save time and resources, avoid duplication of effort, and increase efficiency. + +??? success "Pre-requisites" + + | Pre-requisite | Importance | Note | + |---------------|------------|------| + | [Levels of RAP] | Helpful | Open source is an essential component of Baseline RAP | + +## What is an open source programming language? + +Open source programming languages are not owned by anyone. They are widely available, usually maintained by a community, and are typically freely distributed under various open source licenses. + +For example, [Python] is developed under an OSI-approved open source license, making it freely usable and distributable, even for commercial use. Python's license is administered by the [Python Software Foundation]. + +It is recommended that we align our practices with [The Technology Code of Practice]. It is a cross-government agreed standard on a set of criteria to help government design, build and buy technology. Specifically, the third point states that we should [Be open and use open source]. + +Producing data in an open source language is an essential component of Baseline RAP. The Baseline level is designated as the _minimum_ standard of a reproducible analytical pipeline so using an open source programming language is a good place to start adopting RAP practices. + +## Why use an open source programming language? + +**Licensing and distribution** - Open source programming languages are widely distributed under various open source licenses. 
+ +**Collaborative communities** - Resolve common problems using solutions from open source communities. + +**Libraries and tools** - There is a vast number of useful open source libraries and tools that are easily accessible and well documented. + +**Time and cost** - Open source is free to use! Additionally, code libraries and tools developed for open source programming languages can save a lot of time when writing the source code of a project and lower implementation and running costs. + +## What is open source code? + +Open source code is provided under a licence that means anyone can freely access, utilise, and modify the code for any purpose. + +It means that regardless of who produced it, anyone can contribute to its further development or use it for their own means without paying a licensing fee, or seeking permission from the contributors. + +Any open source code can be reused by our developers to reduce costs, avoid duplication of effort, and increase staff efficiency. +**By publishing your code**, and even better **[packaging up your code]**, you add to the body of open source code, **allowing someone else to benefit from your hard work**. + +[How to publish your code in the open :material-cursor-default-click:][how to publish your code in the open]{ .md-button } + +## Examples of open source projects + +**[Splink]** - Splink is a Python package developed by the Ministry of Justice for probabilistic record linkage. It deduplicates and/or links records from datasets that lack a unique identifier. The core linkage algorithm is an implementation of Fellegi-Sunter's model of record linkage, with various customisations to improve accuracy. Check out the publication '[Splink: MoJ’s open source library for probabilistic record linkage at scale]' to find out more. + +The package is fully open source and can be found on GitHub. 
It is accompanied by a set of [interactive demos] to illustrate its functionality, whereby users can run real record linking jobs in their web browser. + +**[Coronavirus Dashboard][coronavirus-dashboard-github]** - Public Health England's coronavirus dashboard repository on GitHub contains the frontend source code for the [Coronavirus Dashboard]. It also contains the API service that supplies the latest data for the COVID-19 outbreak in the UK. They have also developed software development kits in several programming languages to facilitate access to the API such as the [Python SDK]. + +**[NHS.UK frontend]** - NHS.UK frontend contains code to start building user interfaces for NHS websites and services. + +**[NHS COVID Pass Verifier]** - The NHS COVID Pass Verifier app is a secure way to scan an individual’s NHS COVID Pass and check that they have been fully vaccinated against COVID-19, had a negative test, or have recovered from COVID-19. + +## Further reading + +- [Be open and use open source] (GOV.UK) +- [The benefits of coding in the open] (GDS) +- [Open source policy] (NHSX) + +??? 
info "_External Links Disclaimer_" + + *NHS Digital makes every effort to ensure that external links are accurate, up to date and relevant, however we cannot take responsibility for pages maintained by external providers.* + + *NHS Digital is not affiliated with any of the websites or companies in the links to external websites.* + + *If you come across any external links that do not work, we would be grateful if you could report them by raising an issue on our [RAP Community of Practice GitHub].* + +[levels of rap]: ./levels_of_RAP.md +[python]: https://www.python.org/about/ +[python software foundation]: https://www.python.org/psf-landing/ +[the technology code of practice]: https://www.gov.uk/guidance/the-technology-code-of-practice +[splink]: https://github.com/moj-analytical-services/splink/ +[splink: moj’s open source library for probabilistic record linkage at scale]: https://www.gov.uk/government/publications/joined-up-data-in-government-the-future-of-data-linking-methods/splink-mojs-open-source-library-for-probabilistic-record-linkage-at-scale +[interactive demos]: https://github.com/moj-analytical-services/splink_demos +[coronavirus-dashboard-github]: https://github.com/publichealthengland/coronavirus-dashboard +[coronavirus dashboard]: https://coronavirus.data.gov.uk/ +[python sdk]: https://github.com/publichealthengland/coronavirus-dashboard-api-python-sdk +[nhs.uk frontend]: https://github.com/nhsuk/nhsuk-frontend +[nhs covid pass verifier]: https://github.com/nhsx/covid-pass-verifier +[how to publish your code in the open]: ../implementing_RAP/how-to-publish-your-code-in-the-open.md +[be open and use open source]: https://www.gov.uk/guidance/be-open-and-use-open-source +[the benefits of coding in the open]: https://gds.blog.gov.uk/2017/09/04/the-benefits-of-coding-in-the-open/ +[open source policy]: https://github.com/nhsx/open-source-policy/blob/main/open-source-policy.md +[rap community of practice github]: 
https://github.com/NHSDigital/rap-community-of-practice/issues +[packaging up your code]: ../training_resources/python/project-structure-and-packaging.md diff --git a/docs/our_RAP_service/building_team_capability.md b/docs/our_RAP_service/building_team_capability.md index 5b4da6c8..f0c77845 100644 --- a/docs/our_RAP_service/building_team_capability.md +++ b/docs/our_RAP_service/building_team_capability.md @@ -19,7 +19,7 @@ This guide will detail what you need to consider before starting a RAP engagemen ## RAP pre-engagement questionnaire -The analyst team needs to assess their ability to carry out a RAP project end-to-end by completing a RAP pre-engagement questionnaire(Link TBC). +The analyst team needs to assess their ability to carry out a RAP project end-to-end by completing a [RAP pre-engagement questionnaire](./rap-pre-engagement-questionnaire.md). The questionnaire will contain a series of questions which will revolve around initial project considerations: diff --git a/docs/our_RAP_service/index.md b/docs/our_RAP_service/index.md new file mode 100644 index 00000000..75f78324 --- /dev/null +++ b/docs/our_RAP_service/index.md @@ -0,0 +1,19 @@ +# Our RAP Service + +Following the recommendations in the [Overcoming Barriers to RAP](https://osr.statisticsauthority.gov.uk/publication/reproducible-analytical-pipelines-overcoming-barriers-to-adoption/) +report, we have set up a central RAP team to coordinate efforts and set standards across NHS Digital. + +This team tends to be made up of about 5 data scientists and we flex the resource up and down according to demand. + +Over time, we hope that the central RAP team can be resourced with a mix of all kinds of roles in NHS Digital. We think that this collaborative approach to resourcing will make it more likely that the whole community owns the problem. + +There are already lots of people playing an informal 'RAP champion' role by supporting colleagues. 
This group of people would be the natural candidates to guide the NHS Digital RAP community over time. + +> The rest of this section shares the resources our RAP team has developed for the management, planning, and strategy for embedding RAP practices in a large organisation. We hope that sharing our approach and considerations will help others. + +## NHS Digital rollout strategy + +The single most valuable tool we have had in our work is the report from the Office for Statistics Regulation (OSR) about +[overcoming barriers to RAP adoption](https://osr.statisticsauthority.gov.uk/publication/reproducible-analytical-pipelines-overcoming-barriers-to-adoption/). Almost every one of the considerations discussed in this report has been relevant for us in the past year. + +We would strongly encourage organisations who want to adopt RAP practices to read this report and share it with senior leaders. You might also consider running a [pre-mortem](https://www.atlassian.com/team-playbook/plays/pre-mortem) with the project sponsor and senior analytical leaders to anticipate and avoid some of the problems mentioned in this report. diff --git a/docs/our_RAP_service/rap-pre-engagement-questionnaire.md b/docs/our_RAP_service/rap-pre-engagement-questionnaire.md new file mode 100644 index 00000000..bc3301d8 --- /dev/null +++ b/docs/our_RAP_service/rap-pre-engagement-questionnaire.md @@ -0,0 +1,74 @@ +# RAP Pre-engagement Questionnaire + +!!! tip "TLDR" + - These questions will help you plan out your RAP engagement. + - They focus on what the current situation is, what needs doing, why, and when it will be classed as done. + +??? question "Why should we care?" + - Thoroughly scoping out a piece of work before starting can identify potential pitfalls or opportunities. + - This document should help RAP champions set expectations, especially when "disengaging" at the end of an engagement. + +??? 
success "Pre-requisites" + * Some information on what someone might need to be familiar with before they can use this page + + | Pre-requisite | Importance | Note | + |----------------------------------------------------------|------------|-------------------------------------------------------------------------------------------| + | [Levels of RAP](../introduction_to_RAP/levels_of_RAP.md) | Necessary | Knowing the levels of RAP allows you to understand what to aim for | + | [Building Team Capability](building_team_capability.md) | Helpful | Provides context for this checklist: what your team will need to do RAP | + | [Support Models](support-models.md) | Helpful | Helps you understand the different ways in which RAP champions can help analysts do RAP | + | [Typical Engagement Flow](typical-engagement-flow.md) | Helpful | An example of how an engagement might go - might help RAP champions to plan out their own | + | [Thin Slice Strategy](thin-slice-strategy.md) | Helpful | Describes how to take an existing pipeline and remake it into a RAP | + +This checklist is aimed at **helping RAP Champions plan out an engagement with a team** to help them make a RAP pipeline. + +It's a good idea to go through this checklist with the subject matter experts of the pipeline which needs transforming, so that there are fewer surprises further down the line. + +**Before diving into RAP**, it's highly recommended that colleagues familiarise themselves with the [elements of baseline RAP](../introduction_to_RAP/levels_of_RAP.md), and do some introductory training - this will mean any engagement will be far more efficient. + +!!! note + See [Thin slice Strategy](thin-slice-strategy.md) and [Typical Engagement Flow](typical-engagement-flow.md) for more information on how to do engagements, and [support models](support-models.md) for how to resource RAP engagements and the trade-offs involved. 
+ + +| **Theme** | **Sub-theme** | **Question** | **Tips** | +|----------------------|-------------------|-------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------| +| **Overall** | **Start point** | What situation are we starting with? | How does the current process work, at a high level? | +| | **Aim** | What is the end goal? | For example, they want to get it to baseline RAP and train up their team. | +| | **"The ask"** | What do they need from us? | That is, why can't they do the above themselves? Not enough resources, not enough skills? | +| | **Shutdown** | Who would own this once the project is over? | Good to get an idea of this right at the start - so handover is considered from day 1. | +| | | What is the plan for ongoing support? | For example, follow up / drop-in sessions? Analysts will hit snags after the engagements - they need to know where to turn for help | +| **Engagement model** | | Which type of [support model](support-models.md)? (for example, RAP team leads, analyst team leads, RAP champion) | | +| **Resource** | **RAP Champions** | Who? | Which of your available RAP champions will be working on this? How will they balance this alongside any other commitments? | +| | | For how long? | A default time period for engagements is 2 months, but they might need to be shorter or longer. | +| | **Analyst team** | Who? | Who will be working on this project? | +| | | What is their FTE availability and for what period of time, for instance 2 months? | +| | | What is the level of technical skills and experience in the analyst team? | This should be related to the technology of choice (Python, SQL, and so on) | +| | | If pre-engagement training is needed: who, when, and for how long? | +| | | How well do the team understand the process which needs RAPifying? 
| For example, do they have a diagram which shows the major steps? Are technical details of those understood? Are there any black boxes? | +| | | What happens after the engagement finishes? | (Hint: hopefully they will train others and make more pipelines.) | +| | | Pair programming? | [Pair programming](https://www.techtarget.com/searchsoftwarequality/definition/Pair-programming#:~:text=Pair%20programming%20is%20an%20Agile,code%20and%20test%20user%20stories.) means two people working together on one task. This sounds wasteful, but can result in better code being written, and more learning taking place.| +| | **Both** | Upcoming deadlines for non-RAP stuff? | Other deadlines might derail the RAP work - so good to know about them | +| **Support from other teams** | **Data Management** | Is the pipeline that makes the "base asset" trusted and assured? | The RAP will consume and build on the "base asset" table - if this isn't assured, then the RAP outputs won't be assured. | +| | | Do we have access to data management SMEs? | +| | **Information Asset Owners** | Good to know who these people are in case there are any issues with the data, or if more info is needed on its provenance, use, etc. | | +| **Existing pipeline / publication** | **Access** | What is needed to get access to any of the required data, storage and systems? | We need to arrange access promptly. | +| | | Any other access requirements? | Such as network drives, fileshares, GitLab repos, etc. | +| | **Information governance** | What direction is this done under? Which Unified Register asset numbers are used? | | +| | **Platform** | Which analytical environment are they using? | We need to know what tools and setup they're using. | +| | | Longevity of the Platform? | Is this platform sticking around? Is there a more future proof one you could use? | +| | | Plan in place for future moves to new analytical environments? 
| Have they thought about what will happen when the current environment is eventually replaced? | +| | **Outputs** | Excel outputs, CSVs (do they follow [Open Data Standards for CSVs](https://github.com/NHSDigital/open-data-standards)?) | This is important for consistency, to allow easier reuse| +| | **Existing Work** | **Any other RAP pipelines in the team?** | These can act as a useful template. | +| | | **Can the [RAP package template](https://github.com/NHSDigital/rap-package-template) be used?** | This is a great starting point to ensure your work is consistent with that of your colleagues. | +| | **[Thin slice](./thin-slice-strategy.md)** | What will the measures and breakdowns (if applicable!) be? Or otherwise, how could you chunk up the work into the thin slice approach? | | + +## Further Reading + +- [Are you ready for RAP?](../implementing_RAP/rap-readiness.md) + +??? info "_External Links Disclaimer_" + + *NHS England makes every effort to ensure that external links are accurate, up to date and relevant, however we cannot take responsibility for pages maintained by external providers.* + + *NHS England is not affiliated with any of the websites or companies in the links to external websites.* + + *If you come across any external links that do not work, we would be grateful if you could report them by raising an issue on our [RAP Community of Practice GitHub].* \ No newline at end of file diff --git a/docs/our_RAP_service/typical-engagement-flow.md b/docs/our_RAP_service/typical-engagement-flow.md index af5eb054..d5030db3 100644 --- a/docs/our_RAP_service/typical-engagement-flow.md +++ b/docs/our_RAP_service/typical-engagement-flow.md @@ -36,12 +36,12 @@ Nevertheless, the text below lays out the type of activities that occur in acros - Identify training needs of the publication team through conversations - Dedicated training. 
The content is tailored to the specific team but a typical sequence might look like: - - [Concept of a thin slice and how to choose the thin slice][5] - - Access to off-the-shelf interactive training for self-led training - - [PySpark style guide][6] - - [Version control][7] - - [Writing good functions][8] - - [Unit tests][9] + - [Concept of a thin slice and how to choose the thin slice][5] + - Access to off-the-shelf interactive training for self-led training + - [PySpark style guide][6] + - [Version control][7] + - [Writing good functions][8] + - [Unit tests][9] - Buddy pairs work to replicate the thin slide outputs - Set up automated code testing once the numbers are correct @@ -66,7 +66,7 @@ Nevertheless, the text below lays out the type of activities that occur in acros [1]: ./support-models.md [2]: ../implementing_RAP/rap-readiness.md -[3]: ../README.md +[3]: ../index.md [4]: ../introduction_to_RAP/levels_of_RAP.md [5]: ./thin-slice-strategy.md [6]: ../training_resources/pyspark/pyspark-style-guide.md diff --git a/docs/stylesheets/extra.css b/docs/stylesheets/extra.css index 89430046..c2be2146 100644 --- a/docs/stylesheets/extra.css +++ b/docs/stylesheets/extra.css @@ -29,7 +29,7 @@ } .md-logo-nhs { - height: 80px; + height: auto; } .md-header__button.md-logo-nhs :is(img, svg) { diff --git a/docs/support.md b/docs/support.md index 81b43952..da4848e9 100644 --- a/docs/support.md +++ b/docs/support.md @@ -1,3 +1,8 @@ +--- +hide: + - navigation +--- + # Support If your team is embarking upon a RAP journey, you should understand [why RAP are important][1] and think about which [levels of RAP][2] that you want to aim for. diff --git a/docs/training_resources/python/intro-to-python.md b/docs/training_resources/python/intro-to-python.md new file mode 100644 index 00000000..a11c6b8e --- /dev/null +++ b/docs/training_resources/python/intro-to-python.md @@ -0,0 +1,138 @@ +# Intro to Python + +!!! 
tip "TLDR" + - Python is a general-purpose, open-source programming language, good for data analysis, available on many data platforms + - [VSCode](https://code.visualstudio.com/) is a good tool in which to develop Python. [PEP-8](https://peps.python.org/pep-0008/) is the widely used style guide. + - We have guidance on [writing functions](python-functions.md), [data analysis in python](basic-python-data-analysis-operations.md), [unit testing](unit-testing.md), and much more in the left-hand sidebar. + + +??? success "Pre-requisites" + |Pre-requisite | Importance | Note | + |--------------|-----------|------------| + |[Levels of RAP](../../introduction_to_RAP/levels_of_RAP.md) | Necessary |Python is a key component of our approach to RAP| + |[Coding Best Practice](../../implementing_RAP/coding-best-practice.md)| Helpful |Some basic coding skills will help| + +## What is Python? + +Python is an open-source programming language famous for its ease of use. It has a simple syntax, which is easier to learn and read than that of most languages. + +It's also considered a 'high-level' language - this means you can get on with making code do what you want it to without worrying about overly technical aspects like assigning chunks of memory. This also has a downside: Python can run slower than some other languages. + +Broadly, Python is considered the language of choice for data science, with a growing collection of helpful packages, and it is a popular choice for the kinds of scripting and automation which we need to do when building Reproducible Analytical Pipelines. + + + +!!! note "Programming Environment" + You'll need to have somewhere to work where you can run Python code to follow this guide. GitHub Codespaces or Google Colab can be nice tools to get started with, but they cannot be used for all of your work.
It's a good idea to go over the [RAP readiness - resources](https://nhsdigital.github.io/rap-community-of-practice/implementing_RAP/rap-readiness/#resources) guide to understand what you need (and might be missing) for RAP! + + + +## Getting Started + +You can write python code in any program that allows you to write plain text. However, this is like running a marathon without any shoes on: uncomfortable, and you are likely to run into some issues. Therefore we recommend using an Integrated Development Environment (IDE). IDEs are programs that make developing code much easier by providing a feature-rich toolset with tools such as: + +* Editors +* Run-time environments +* Line numbering +* Variable tracking +* Debugging tools +* Interpreters +* Auto-correction and suggestion + +Whether you're new to coding or an old hand, we recommend using [Visual Studio Code](https://code.visualstudio.com/) (aka VSCode). This has a few benefits: it's feature-rich, lightweight, and works the same across multiple platforms (Windows, Mac, and Linux). It's also already installed on most NHS environments where you might want to write code. + +Other IDEs are available, such as Spyder, Eclipse, PyCharm, Sublime, Atom, and more! +Some of our analytical environments use [Databricks](https://www.databricks.com/), in which you can write and execute Python code. +### Python In VSCode + +Once you've installed and opened VSCode, you'll want to install the python extension for VSCode. +![](../../images/extensions_img.png) +You can find the Extensions panel either by clicking the icon shown in the image above, or by pressing `Ctrl + Shift + X`. +From here you can search for and install the python extension. +![](../../images/python_extension.png) +If you're familiar with working with Jupyter Notebooks, you can also download the Jupyter extension in the same way. + +### Hello World + +Now we want to use python to run a simple script.
Create a file called `hello_python.py` (in VSCode, you can just press `Ctrl + N` to make a file). In this new file, write: + +``` +print("Hello World") +``` + +This is all the code we need to test our python install. There are a few ways you can run this code: + +- In VSCode, press `Ctrl + Shift + P`. This will bring up the Command Palette. You can do most things from here - try typing what you're trying to do, and a matching command will likely appear. In our case, we want to run the file, so we type `run file` and select the appropriate option - `run file in Terminal`. + +![](../../images/vscode_run_file.png) + +- As the screenshot shows, we can also run the file by using the keyboard shortcut `Ctrl + Shift + Enter`. + +- Alternatively, we can run the same file from outside VSCode. Open a terminal and navigate to the folder where you've saved your script, then run the command: + +``` +python hello_python.py +``` + + +Whichever method you use, you ought to see `Hello World` printed to the terminal. + +### Environments and Packages + +It's considered good practice to create a virtual environment for each of your python projects - [we explain why on this page](virtual-environments/why-use-virtual-environments.md). There are many tools which you can use to create and manage your python environments; we recommend reading our [guide to conda](virtual-environments/conda.md) and our [guide to venv](virtual-environments/venv.md) and picking one of these two. + +### Data Analysis in Python + +Many consider python to be the language of choice for data analysis. One of the things which makes python powerful is the wide range of packages and tools available for the language -- this makes it flexible enough to be useful both for small-scale analysis that happens directly on your laptop and for larger projects that use cloud resources.
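
As a rough illustration of small-scale analysis, here is a minimal sketch that groups and averages a tiny dataset using only Python's standard library. The data, column names and the `mean_by_group` helper are all invented for this example; in real work you would more likely use one of the analysis packages covered in the guides linked from this page.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical data, invented purely for illustration.
RAW_CSV = """region,appointments
North,120
North,80
South,100
South,140
"""


def mean_by_group(csv_text, group_col, value_col):
    """Return the mean of value_col for each distinct value of group_col."""
    groups = defaultdict(list)
    # DictReader turns each CSV row into a dict keyed by the header names.
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[group_col]].append(float(row[value_col]))
    return {name: mean(values) for name, values in groups.items()}


if __name__ == "__main__":
    print(mean_by_group(RAW_CSV, "region", "appointments"))
```

Keeping the calculation in a small function like this also makes it straightforward to unit test later.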
+ +To get started with analysis, take a look at our [guide to basic python data analysis operations](basic-python-data-analysis-operations.md). + +If you're using Databricks or otherwise working with large datasets and distributed computing, take a look at the documentation we have on Pyspark in the sidebar. You can start with [our page on what it is](../pyspark/README.md). + +### SQL and Python + +Whatever the scale of your data, you might need to interact with it via SQL queries. You can generate these using python too - [check out our guide on how to do so](using-f-strings-sql-queries.md). + +### Writing Python Code + +There are as many opinions about what constitutes good code as there are coders, so it's always a good idea to adhere to a style guide. + +* For Python, we recommend using [`PEP-8`](https://peps.python.org/pep-0008/), which is also mentioned in our [Levels of RAP](../../introduction_to_RAP/levels_of_RAP.md) +* More specifically, we recommend the [Google Python style-guide](https://github.com/google/styleguide/blob/gh-pages/pyguide.md) (which is also the core of our [Pyspark style guide](../pyspark/pyspark-style-guide.md)) +* There are tools called **linters** which can be used to check for these styles automatically, such as [Pylint](https://pypi.org/project/pylint/) + +Next, you'll want to know [how to structure your python projects](project-structure-and-packaging.md), [how to write good functions](python-functions.md), and [how to approach unit testing in python](unit-testing.md). + +### Where Now? +You've now got python installed and an editor to work in.
There are lots of training resources about how to write python code; here are some we've found useful: + +- [Kaggle introduction to python](https://www.kaggle.com/learn/python) +- [Govt Analysis Function Introduction to Python](https://analysisfunction.civilservice.gov.uk/training/introduction-to-python/) +- [freeCodeCamp](https://www.freecodecamp.org), which has a course tailored to learning [data analysis with Python](https://www.freecodecamp.org/learn/data-analysis-with-python/) + +**There are also plenty of resources right here on this site** -- check our guide to [python functions](python-functions.md). + +??? info "External Links Disclaimer" + *NHS England makes every effort to ensure that external links are accurate, up to date and relevant, however we cannot take responsibility for pages maintained by external providers.* + + *NHS England is not affiliated with any of the websites or companies in the links to external websites.* + + *If you come across any external links that do not work, we would be grateful if you could report them by raising an issue on our [RAP Community of Practice GitHub](https://github.com/NHSDigital/RAP_CoP_dev).* diff --git a/docs/training_resources/python/project-structure-and-packaging.md b/docs/training_resources/python/project-structure-and-packaging.md index 953948e2..11f28f8a 100644 --- a/docs/training_resources/python/project-structure-and-packaging.md +++ b/docs/training_resources/python/project-structure-and-packaging.md @@ -1,6 +1,22 @@ # Project and Package structuring -A python package is a way to bundle your code into a single thing that can be shared and reused. If our goal is to be able to share and reuse code across NHS Digital and externally then there are many benefits to packaging code: +!!!
tip "TLDR" + - Generally you should use a standard repo structure - this page describes how and why + - [RAP Python Package template](https://github.com/NHSDigital/rap-package-template) comes with sections for the different bits of your code, testing and prepares you for making your code into a Python package. + - There is guidance on [how to adapt the RAP Package template for your project](#adapting-package-structure-for-analytical-work) + +??? question "Why should we care?" + - If we use a standard repo structure it will be easier for other people to understand and reuse your code. + - This structure follows current best practice for Python + - Packaging your code makes it easier for other people to re-run it, or use the functions / classes / etc. that you've made. + +??? success "Pre-requisites" + + |Pre-requisite | Importance | Note | + |--------------|------------|------| + |[Levels of RAP](../../introduction_to_RAP/levels_of_RAP.md)|Essential|Having well organised code and following a standard directory format is a component of **Silver RAP**| + +A python package is a way to bundle your code into a single thing that can be shared and reused. If our goal is to be able to share and reuse code across NHS England and externally then there are many benefits to packaging code: - **Shareable**: The most important reason to use packages is that it is the way to share python code. Not using packages runs the risk that other people will not be able to run your code... "It works fine on my machine". - **Databricks**: It looks likely that packaging code will be the easiest way to get your code onto databricks. (NB: we are keeping a close eye on Data Refinery to answer this question) @@ -14,40 +30,42 @@ This is a tricky topic at first so we recommend asking for some support when you In order to package your code, you just need to follow the standard templates for python projects. I describe this briefly here but provide links to the comprehensive official guidance below. 
-Here is an outline of a very basic python project for the Smoking, Drinking, and Drugs publication (SDD): +Here is an outline of a very basic python project for a publication as an example: ```txt -SDD/ +project_name ├── LICENSE ├── README.md ├── requirements.txt -├── setup.py -├── sdd/ +├── pyproject.toml +├── src/ | ├── __init__.py | ├── main.py │   └── example_package/ │   ├── __init__.py │   └── example.py +├── sql/ +├── templates/ └── tests/ ``` -Note that the whole directory is called `SDD` but we also have a sub-folder called `sdd`. All of our code lives inside this `sdd` sub-folder. Sometimes you might see the folder that contains all the code called `src` but the other common convention is to name it the same as the overall project. +Note that the whole directory is called `project_name` but we also have a sub-folder called `src`. All of our code lives inside this `src` sub-folder. Sometimes you might see the folder that contains all the code called `scripts` but the other common convention is to name it the same as the overall project. The README.md file is extremely important as this functions as the package’s landing page. The README provides a bird's eye view of the whole package. You should treat it as the first thing a new starter might read when trying to understand your code. It might contain an overview of the code, details of inputs/outputs, how to install package, ownership, contributing and licence info. -The random-seeming files (setup.py, requirements.txt, etc.) are all involved in dependency management. Your package will depend on certain things being in place in order to run - these files manage those dependencies. Be aware that dependency management is a difficult topic in python and other languages and so over the years many different approaches have fallen in and out of favour. Here, we aim to keep it as simple as possible while still achieving the goal of producing robust, shareable code. 
+The random-seeming files (`pyproject.toml`, `requirements.txt`, etc.) are all involved in dependency management. Your package will depend on certain things being in place in order to run - these files manage those dependencies. Be aware that dependency management has evolved over the years in python and other languages and best practice has changed over time. Here, we aim to keep it as simple as possible while still achieving the goal of producing robust, shareable code. Also note the funny looking `__init__.py`. This file tells python that this code is part of a package. With this file in place you can import functions from the different parts of your code. E.g. `from sdd.example_package.example import my_function`. Again - you can learn more about this in the links below. -Here is another for the diabetes publication so you can see the pattern. We have a more elaborate version of this below. +Here is another for the [NDA publication](https://github.com/NHSDigital/national-diabetes-audit) so you can see the pattern. We have a more elaborate version of this below. ``` -diabetes/ +national-diabetes-audit/ ├── LICENSE ├── README.md ├── requirements.txt ├── setup.py -├── diabetes/ +├── diabetes_code/ | ├── __init__.py | ├── main.py │   └── example_package/ @@ -63,68 +81,119 @@ To help get you started, we have created a generic package structure that you ca ## So what now? - Make your own package either by adding the files yourself or by cloning the [generic template](https://github.com/NHSDigital/rap-package-template) that we provide. -- Once you have the package structure in place, you can install that package on your machine using these two commands: +- Once you have the package structure in place, you can install that package on your machine following any of these options: +- Never used virtual environments before? 
Visit our [RAP COP guidance](https://nhsdigital.github.io/rap-community-of-practice/training_resources/python/virtual-environments/why-use-virtual-environments/)! - ```python - pip install -e . - pip install -r requirements.txt - ``` +### Using pip +``` +python -m venv .venv +.\.venv\Scripts\Activate.ps1 +python -m pip install -r requirements.txt +``` +In Visual Studio Code, you need to change your default interpreter to the virtual environment you just created (`.venv`). To do this, press `Ctrl + Shift + P`, search for `Python: Select Interpreter` and select `.venv` from the list. + +### Using conda +The first line of the `environment.yml` file sets the new environment's name. In this template, the name is `rap_template` - you should change this in the `environment.yml` file, as well as in the following code, to the name of your project. +``` +conda env create -f environment.yml +conda activate rap_template +``` Once the package is installed it will be able to identify all the modules inside your code. If you have populated the `setup.py` and `requirements.txt`, python will install all of the bits that your code needs in order to run. - More importantly, other people can install your finished package from gitlab using `pip install git+path_to_my_repo`. For example `pip install git+https:////diabetes_rap` -- Even better, you can bundle all of your code into a single file called a wheel (.whl). This wheel can be shared, stored in an online repository for others to use, or **installed into databricks**. +- Even better, you can bundle all of your code into a single file called a wheel (.whl). This wheel can be shared, stored in an online repository for others to use, or installed into data processing platforms such as Databricks. ## Adapting package structure for analytical work Every python project should follow the standard package structure to help ensure portability and reliability.
Nevertheless, there is scope to adapt this structure to fit the workflow of specific projects. The [cookie cutter data science template](https://drivendata.github.io/cookiecutter-data-science/#directory-structure) shows how you can include folders for output data, validation reports, figures, etc. -The figure below shows how we have applied this structure to the National Diabetes Audit code. +The figure below shows how we have applied this structure to the [RAP package template](https://github.com/NHSDigital/rap-package-template). ``` -diabetes -├── LICENSE -├── README.md -├── requirements.txt -├── setup.py -├───diabetes -│ ├── create_publication.py -│ ├── params.py -│ ├── __init__.py -│ │ -│ └───utilities -│ ├── data_connections.py -│ ├── field_definitions.py -│ ├── processing_steps.py -│ ├── __init__.py +project_name +| .gitignore <- Files (& file types) automatically removed from version control for security purposes +| config.toml <- Configuration file with parameters we want to be able to change (e.g. date) +| environment.yml <- Conda equivalent of requirements file +| requirements.txt <- Requirements for reproducing the analysis environment +| pyproject.toml <- Configuration file containing package build information +| LICENCE <- License info for public distribution +| README.md <- Quick start guide / explanation of your project +| +| create_publication.py <- Runs the overall pipeline to produce the publication +| ++---src <- Scripts with functions for use in 'create_publication.py'. Contains project's codebase. +| | __init__.py <- Makes the functions folder an importable Python module +| | +| +---utils <- Scripts relating to configuration and handling data connections e.g. importing data, writing to a database etc. +| | __init__.py <- Makes the functions folder an importable Python module +| | file_paths.py <- Configures file paths for the package +| | logging_config.py <- Configures logging +| | data_connections.py <- Handles data connections i.e. 
reading/writing dataframes from SQL Server +| | +| +---processing <- Scripts with modules containing functions to process data i.e. clean and derive new fields +| | __init__.py <- Makes the functions folder an importable Python module +| | clean.py <- Perform cleaning and wrangling processes +| | derive_fields.py <- Create new field definitions, columns, derivations. +| | +| +---data_ingestion <- Scripts with modules containing functions to preprocess read data i.e. perform validation/data quality checks, other preprocessing etc. +| | __init__.py <- Makes the functions folder an importable Python module +| | preprocessing.py <- Perform preprocessing, for example preparing your data for metadata or data quality checks. +| | validation_checks.py <- Perform validation checks e.g. a field has acceptable values. +| | +| +---data_exports +| | __init__.py <- Makes the functions folder an importable Python module +| | write_excel.py <- Populates an excel .xlsx template with values from your CSV output. +| | ++---sql <- SQL scripts for importing data +| example.sql +| ++---templates <- Templates for output files +| publication_template.xlsx | -├───reports -│ ├───input_profile -│ └───output_profile -│ -└───tests - ├───unittests - │ │ test_data_connections.py - │ │ test_field_definitions.py - │ │ test_processing_steps.py ++---tests +| | __init__.py <- Makes the functions folder an importable Python module +| | +| +---backtests <- Comparison tests for the old and new pipeline's outputs +| | backtesting_params.py +| | test_compare_outputs.py +| | __init__.py <- Makes the functions folder an importable Python module +| | +| +---unittests <- Tests for the functional outputs of Python code +| | test_data_connections.py +| | test_processing.py +| | __init__.py <- Makes the functions folder an importable Python module ``` Some things to notice about this structure: -- All of the actual code lives inside the `diabetes_code` directory. 
Everything else at the top level has to do with packaging and testing the code. -- In the diabetes_code repository there are two files: `create_publication.py` and `params.py`. These top level files are the highest level of abstraction and should be the main place where users interact with the code. +- All of the actual code lives inside the `src` directory. Everything else at the top level has to do with packaging and testing the code. +- At the top level of the repository there are two key files: `create_publication.py` and `config.toml`. These top-level files are the highest level of abstraction and should be the main place where users interact with the code. - - The `params.py` file contains all of the parameters that we expect to change frequently, e.g. input data. + - The `config.toml` file contains all of the parameters that we expect to change frequently, e.g. input data. - The `create_publication.py` file organises the steps in a simple, easy-to-understand manner that should be readable by anyone, even if they don't know python. In this way, we aim to reduce risk by making the code accessible to new staff. + - The `requirements.txt` file specifies all the python packages you wish to install. See [venv virtual environments guide](https://nhsdigital.github.io/rap-community-of-practice/training_resources/python/virtual-environments/venv/). + - The `environment.yml` file is Conda's way of exporting and sharing environments via a YAML file. See [Conda virtual environments guide](https://nhsdigital.github.io/rap-community-of-practice/training_resources/python/virtual-environments/conda/) for more info. + - The `pyproject.toml` file contains build system requirements and information, which pip uses to build the package. For more information see [pyproject.toml documentation](https://pip.pypa.io/en/stable/reference/build-system/pyproject-toml/).
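
To make the idea of a readable top-level script more concrete, here is a minimal sketch of how a `create_publication.py`-style entry point might organise its steps. It is not the template's actual code: the function names, the example rows and the `reporting_year` parameter are all hypothetical, and the plain dictionary stands in for settings that would normally be read from `config.toml`.

```python
# Hypothetical sketch of a create_publication.py-style entry point.
# `config` stands in for values that would normally come from config.toml.


def load_data(config):
    """Stand-in for a data_connections-style reader; returns example rows."""
    return [
        {"year": 2022, "value": 10},
        {"year": 2023, "value": 15},
        {"year": 2023, "value": 5},
    ]


def filter_to_year(rows, year):
    """Keep only the rows for the requested reporting year."""
    return [row for row in rows if row["year"] == year]


def summarise(rows):
    """Produce the (made-up) publication figures from the filtered rows."""
    return {"row_count": len(rows), "total": sum(row["value"] for row in rows)}


def main(config):
    # Each step is one clearly named call, so the pipeline reads top to bottom.
    rows = load_data(config)
    rows = filter_to_year(rows, config["reporting_year"])
    return summarise(rows)


if __name__ == "__main__":
    example_config = {"reporting_year": 2023}
    print(main(example_config))
```

The point of this layout is that someone who does not know Python can still follow `main` line by line, while the detail of each step lives in its own small, testable function.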
+ +### `root` + +In the highest level of this repository (known as the 'root'), there is one Python file: `create_publication.py`. This top level file should be the main place where users interact with the code, where you store the steps to create your publication. + +This file currently runs a set of example steps using example data. -- The next level down contains the meaty parts of the code. By organising the code into logical sections, we make it easier to understand but also to maintain and test. Moreover, tucking the complex code out of the way means that users don't need to understand everything about the code all at once. - - The `data_connections.py` file handles reading data in and writing data back out. Since we know that this code will have to migrate to Data Refinery soon, it makes sense to have an interface here. The plan is that when we move to Data Refinery, this should be the only code we need to change. - - The `field_definitions.py` file contains the definitions for each of the fields (columns) derived in the process. By abstracting these definitions out of the code and making them reusable, we achieve some great benefits. First, it becomes much easier to maintain. When the specifications change next year, we only need to make the change in one location. Next, it becomes much easier to test. We write unit tests for each of these definitions and can then reuse these definitions in many places without increasing risk. - - The `processing_steps.py` file contains the core business logic of the diabetes data. We could consider breaking this down into further steps. +### `src` + +This directory contains the meaty parts of the code. By organising the code into logical sections, we make it easier to understand, maintain and test. Moreover, tucking the complex code out of the way means that users don't need to understand everything about the code all at once. + +* `data_connections.py` handles reading data in and writing data back out. 
+* The `processing` folder contains the core business logic. +* The `utils` folder contains reusable functions (e.g. to set up logging and import configuration settings from `config.toml`). +* `write_excel.py` contains functions relating to the final part of the pipeline; any exporting or templating happens here. This is a simple example of writing outputs to an Excel spreadsheet template (.xlsx). A good example of this in use is the [NHS sickness absence rates publication](https://github.com/NHSDigital/absence-rates). We highly recommend using [Automated Excel Production](https://nhsd-git.digital.nhs.uk/data-services/analytics-service/iuod/automated-excel-publications) for a more in-depth Excel template production application. Note that we never store passwords or any sensitive credentials in the repo, to prevent them being mistakenly committed to git. There are several ways to deal with secrets, keys and passwords, such as using Git hooks or a final cleansing process before publishing. -## External links +## Further Reading This is a really big topic and we don't want to replicate material that you can find elsewhere. Here are links to some resources that will give you as much detail as you want on this topic. @@ -134,3 +203,11 @@ This is a really big topic and we don't want to replicate material that you can _Good for understanding on how to organise and import sub-packages._ - [Why we share code as a .whl file](https://packaging.python.org/discussions/wheel-vs-egg/) + +??? 
info "_External Links Disclaimer_" + + *NHS Digital makes every effort to ensure that external links are accurate, up to date and relevant, however we cannot take responsibility for pages maintained by external providers.* + + *NHS Digital is not affiliated with any of the websites or companies in the links to external websites.* + + *If you come across any external links that do not work, we would be grateful if you could report them by raising an issue on our [RAP Community of Practice GitHub].* diff --git a/docs/useful_links.md b/docs/useful_links.md index 1729c969..389cd072 100644 --- a/docs/useful_links.md +++ b/docs/useful_links.md @@ -1,3 +1,8 @@ +--- +hide: + - navigation +--- + # Useful links ## Strategic @@ -13,7 +18,7 @@ - The [ONS best practice team have a useful website - often called the Quack book] covering many of the same topics we cover here. Their [best-practice checklist] is particularly useful. - The Turing Institute's [The Turing Way handbook to reproducible, ethical and collaborative data science] - We have taken inspiration from the [NHS Digital software engineering COP]. 
-- [**NHS PyCom Coding Club Github**][Coding Club]: Lots of great guides and lessons here from the NHS Python Community +- [**NHS PyCom Coding Club Github**][coding club]: Lots of great guides and lessons here from the NHS Python Community ## Examples, Challenges, Benefits @@ -28,45 +33,45 @@ - There are several communities to discuss Python, R and Git in the Health and Public sectors: - - **Government Data Science community** ([website][GDS community website] | - [slack][GDS community slack] #rap_collaboration | [GitHub][GDS community GitHub]) - - **Government Analysis Function RAP Champion network** ([website][Analysis Function website]) - - **NHS Python Community (NHS-pycom)** ([website][NHS-pycom website] | [slack][NHS-pycom slack] | [GitHub][NHS-pycom GitHub] | [Coding Club]) - - **NHS R Community** ([website][NHS-R website] | [slack][NHS-R slack] |[GitHub][NHS-R GitHub]) - - **AnalystX** ([website][AnalystX website] | [Future NHS - AnalystX] | [GitHub][AnalystX GitHub]) + - **Government Data Science community** ([website][gds community website] | + [slack][gds community slack] #rap_collaboration | [GitHub][gds community github]) + - **Government Analysis Function RAP Champion network** ([website][analysis function website]) + - **NHS Python Community (NHS-pycom)** ([website][nhs-pycom website] | [slack][nhs-pycom slack] | [GitHub][nhs-pycom github] | [Coding Club]) + - **NHS R Community** ([website][nhs-r website] | [slack][nhs-r slack] |[GitHub][nhs-r github]) + - **AnalystX** ([website][analystx website] | [Future NHS - AnalystX] | [GitHub][analystx github]) - We have an **NHS Digital RAP Teams group** (internal to NHS Digital - contact us @ [data.science@nhs.net]) - [NHS Digital Github] - Our very own [NHS Digital RAP Community of Practice] -[RAP strategy]: https://analysisfunction.civilservice.gov.uk/policy-store/reproducible-analytical-pipelines-strategy/ -[RAP Strategy Implementation Plan 2023]: 
https://www.ons.gov.uk/aboutus/whatwedo/programmesandprojects/analysisfunctionrapstrategy2023implementationplan -[Goldacre Review]: https://www.gov.uk/government/publications/better-broader-safer-using-health-data-for-research-and-analysis +[rap strategy]: https://analysisfunction.civilservice.gov.uk/policy-store/reproducible-analytical-pipelines-strategy/ +[rap strategy implementation plan 2023]: https://www.ons.gov.uk/aboutus/whatwedo/programmesandprojects/analysisfunctionrapstrategy2023implementationplan +[goldacre review]: https://www.gov.uk/government/publications/better-broader-safer-using-health-data-for-research-and-analysis [helpful summary]: https://www.bennett.ox.ac.uk/blog/2022/07/bennett-insights-an-overview-of-uk-data-policy-developments/ -[AQUA book of analytical standards]: https://www.gov.uk/government/publications/the-aqua-book-guidance-on-producing-quality-analysis-for-government -[ONS best practice team have a useful website - often called the Quack book]: https://best-practice-and-impact.github.io/qa-of-code-guidance/intro.html +[aqua book of analytical standards]: https://www.gov.uk/government/publications/the-aqua-book-guidance-on-producing-quality-analysis-for-government +[ons best practice team have a useful website - often called the quack book]: https://best-practice-and-impact.github.io/qa-of-code-guidance/intro.html [best-practice checklist]: https://best-practice-and-impact.github.io/qa-of-code-guidance/checklist_higher.html -[The Turing Way handbook to reproducible, ethical and collaborative data science]: https://the-turing-way.netlify.app/welcome.html -[NHS Digital software engineering COP]: https://github.com/NHSDigital/software-engineering-quality-framework/blob/master/insights/review.md -[Coding Club]: https://github.com/nhs-pycom/coding-club -[This blog post]: https://dataingovernment.blog.gov.uk/2017/03/27/reproducible-analytical-pipeline/ -[benefits that come from RAP]: 
https://gss.civilservice.gov.uk/reproducible-analytical-pipelines/benefits-to-government-from-reproducible-analytical-pipelines/ -[report on overcoming barriers to RAP adoption]: https://osr.statisticsauthority.gov.uk/publication/reproducible-analytical-pipelines-overcoming-barriers-to-adoption/ -[A Beginner's Guide to Conducting Reproducible Research]: https://doi.org/10.1002/bes2.1801 -[Introduction to RAP course]: https://gss.civilservice.gov.uk/training/introduction-to-reproducible-analytical-pipelines-rap/ -[research on the rollout of RAP across different departments]: https://best-practice-and-impact.github.io/CARS-3/index.html -[GDS community website]: https://www.gov.uk/service-manual/communities/data-science-community -[GDS community slack]: https://govdatascience.slack.com/ -[GDS community GitHub]: https://github.com/ukgovdatascience -[Analysis Function website]: https://analysisfunction.civilservice.gov.uk/support/reproducible-analytical-pipelines/reproducible-analytical-pipeline-rap-champions/ -[NHS-pycom website]: https://nhs-pycom.net/ -[NHS-pycom slack]: https://nhs-pycom.slack.com -[NHS-pycom GitHub]: https://github.com/nhs-pycom -[NHS-R website]: https://nhsrcommunity.com/ -[NHS-R slack]: https://nhsrcommunity.slack.com -[NHS-R GitHub]: https://github.com/nhs-r-community -[AnalystX website]: https://analystx.uk/ -[Future NHS - AnalystX]: https://future.nhs.uk/connect.ti/DataAnalytics/grouphome -[AnalystX GitHub]: https://github.com/nhs-analystx +[the turing way handbook to reproducible, ethical and collaborative data science]: https://the-turing-way.netlify.app/welcome.html +[nhs digital software engineering cop]: https://github.com/NHSDigital/software-engineering-quality-framework/blob/master/insights/review.md +[coding club]: https://github.com/nhs-pycom/coding-club +[this blog post]: https://dataingovernment.blog.gov.uk/2017/03/27/reproducible-analytical-pipeline/ +[benefits that come from rap]: 
https://gss.civilservice.gov.uk/reproducible-analytical-pipelines/benefits-to-government-from-reproducible-analytical-pipelines/ +[report on overcoming barriers to rap adoption]: https://osr.statisticsauthority.gov.uk/publication/reproducible-analytical-pipelines-overcoming-barriers-to-adoption/ +[a beginner's guide to conducting reproducible research]: https://doi.org/10.1002/bes2.1801 +[introduction to rap course]: https://gss.civilservice.gov.uk/training/introduction-to-reproducible-analytical-pipelines-rap/ +[research on the rollout of rap across different departments]: https://best-practice-and-impact.github.io/CARS-3/index.html +[gds community website]: https://www.gov.uk/service-manual/communities/data-science-community +[gds community slack]: https://govdatascience.slack.com/ +[gds community github]: https://github.com/ukgovdatascience +[analysis function website]: https://analysisfunction.civilservice.gov.uk/support/reproducible-analytical-pipelines/reproducible-analytical-pipeline-rap-champions/ +[nhs-pycom website]: https://nhs-pycom.net/ +[nhs-pycom slack]: https://nhs-pycom.slack.com +[nhs-pycom github]: https://github.com/nhs-pycom +[nhs-r website]: https://nhsrcommunity.com/ +[nhs-r slack]: https://nhsrcommunity.slack.com +[nhs-r github]: https://github.com/nhs-r-community +[analystx website]: https://analystx.uk/ +[future nhs - analystx]: https://future.nhs.uk/connect.ti/DataAnalytics/grouphome +[analystx github]: https://github.com/nhs-analystx [data.science@nhs.net]: mailto:data.science@nhs.net -[NHS Digital Github]: https://github.com/NHSDigital -[NHS Digital RAP Community of Practice]: https://github.com/NHSDigital/rap-community-of-practice +[nhs digital github]: https://github.com/NHSDigital +[nhs digital rap community of practice]: https://github.com/NHSDigital/rap-community-of-practice diff --git a/mkdocs.yml b/mkdocs.yml index a16a0688..32c69de0 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -11,21 +11,22 @@ plugins: include_source: True theme: 
dark nav: - - Home: README.md + - Home: index.md - About: about.md - - Support: support.md - - Acknowledgements: acknowledgements.md - Introduction to RAP: - Why RAP is important: introduction_to_RAP/why_RAP_is_important.md - Levels of RAP: introduction_to_RAP/levels_of_RAP.md + - What is open source?: introduction_to_RAP/what-is-open-source.md - Our RAP service: - - The RAP team: our_RAP_service/README.md + - our_RAP_service/index.md - Building team capability: our_RAP_service/building_team_capability.md - Support models: our_RAP_service/support-models.md - Thin slice strategy: our_RAP_service/thin-slice-strategy.md - Typical engagement flow: our_RAP_service/typical-engagement-flow.md + - RAP pre-engagement Questionnaire: our_RAP_service/rap-pre-engagement-questionnaire.md - Programme level reporting: our_RAP_service/programme-level-reporting.md - Service design and user research: our_RAP_service/service-design-and-user-research.md + - Implementing RAP: - Are you ready for RAP?: implementing_RAP/rap-readiness.md - Code review: implementing_RAP/code-review.md @@ -45,6 +46,7 @@ nav: - Using Git collaboratively: training_resources/git/using-git-collaboratively.md - Git Hooks: training_resources/git/githooks.md - Python: + - Intro to Python: training_resources/python/intro-to-python.md - Basic Python data analysis operations: training_resources/python/basic-python-data-analysis-operations.md - Python functions: training_resources/python/python-functions.md - Visualisation in Python: training_resources/python/visualisation-in-python.md @@ -68,7 +70,10 @@ nav: - R: - External resources: training_resources/R/README.md - Git with RStudio: training_resources/R/git_with_RStudio.md + - Support: support.md + - Glossary: glossary.md - Useful links: useful_links.md + - Acknowledgements: acknowledgements.md theme: name: material custom_dir: overrides @@ -77,12 +82,14 @@ theme: primary: indigo font: text: Arial - logo: images/NHS-Digital-logo_LEFT-WHITE-235x183.png - favicon: 
images/NHS Digital logo_WEB_LEFT.svg + logo: images/logo/nhs-blue-on-white.jpg + favicon: images/favicon/favicon.ico features: - search.share - content.code.annotate - content.tabs.link + - navigation.tabs + - navigation.indexes icon: admonition: : material/alert @@ -107,7 +114,7 @@ markdown_extensions: class: mermaid format: !!python/name:pymdownx.superfences.fence_code_format - pymdownx.tabbed: - alternate_style: true + alternate_style: true - pymdownx.arithmatex: generic: true - admonition @@ -128,3 +135,5 @@ extra_javascript: - javascripts/mathjax.js - https://polyfill.io/v3/polyfill.min.js?features=es6 - https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js +watch: + - overrides diff --git a/overrides/home.html b/overrides/home.html new file mode 100644 index 00000000..b7b3c46f --- /dev/null +++ b/overrides/home.html @@ -0,0 +1,351 @@ +{% extends "main.html" %} {% block tabs %} {{ super() }} + + + + +
+ RAP Community of Practice
+ Improve reliability, transparency, and speed of statistics publications.
+ Get started
+ See our work
+ + + + + + +{% endblock %} {% block content %}{% endblock %} {% block footer %} {{ super() +}} {% endblock %}