# Setup

- Required software should already be installed, but confirm the day before the course starts that all participants have access to the required tools: Bash, Git, a GitHub account, and Python with `pip` and `venv`
- TODO need to make a decision about VSCode and whether we will support it 

# [Setting the Scene](http://0.0.0.0:4000/00-setting-the-scene/index.html)

- fill out narrative about target audience and why we are here:
  - we have learned programming to do our research: it is a tool and a means to an end
  - likely we are mostly self-taught or have taken some intro courses
  - but we now find the techniques we have picked up to be inadequate for the software we need to write
  - single scripts no longer cut it and we are collaborating with more people, or have users for the software we are producing
  - TODO think of a question to ask here, such as: how many times have you found yourself coding and thought "there must be a better way to do this", "this software is getting in the way of my research", "why is it so difficult to get this program to work?", "this code is incomprehensible and really difficult to modify", or "I screwed up my Python installation again and need to reinstall my OS"?

- objective of this course is to deal with some of these struggles you might be facing by teaching some intermediate software engineering skills
  - just like maths, statistics, and physics theory, software engineering is a skill you need to continue to develop

- teach intermediate software engineering skills so that you can:
  - restructure existing code and design more robust software from scratch
  - automate the process of testing and verifying software correctness 
  - collaborate with others in a way that mimics a typical collaborative software development process
  - prepare your code for distribution and use by others

- I want this to be seen as a collaborative learning session
  - I'll be doing a bit of instructing, but most of the learning will come from the many exercises and other activities
  - Please type along as I am working at the command line or coding
  - none of us here are computer scientists, and we come from a variety of backgrounds, so we can all contribute to this learning process; this shouldn't be viewed as knowledge being imparted by your instructors from on high, so please speak up and get involved in the conversation

- This course has necessarily made some decisions about the tools used to demonstrate the concepts being taught
  - Python is used as a fairly ubiquitous and syntactically easy language; however, the point needs to be clear that this isn't a course about Python; the course is about software engineering, and it is using Python as the playground to demonstrate the skills and concepts that should be valuable independent of the domain and language
  - to this end, I will be trying to draw connections with other languages and development scenarios when applicable since I know Python isn't necessarily the main development language for a researcher at UKAEA
  - in the long run, you will encounter many more tools than those shown here, and you will form your own preferences; that is fine and we are in no way suggesting these are the definitive tools that should be used by any researcher who codes
  
## Content Overview

Four main sections for the course.

![](../fig/course-overview.png)

I think a single slide will be fine to cover all the sections, giving a brief idea about each.

1. Setting up Software Environment: **PyCharm or VSCode** for editing, testing and debugging, **GitHub** for collaborative development, **virtual environments** for dependency isolation, and **Python code style**.
2. Verifying Software Correctness at Scale: how to set up a **test framework** and automate and scale testing with **Continuous Integration (CI)**
3. Designing Software Architecture: an exploration of different **software design paradigms** and their advantages and disadvantages
4. Managing and Improving Software: learn how to **collaborate** on code through a group project covering **issue tracking** and **software support** (TODO improve this once I go through the content)

## [Section 1: Environment For Collaborative Code Development](http://0.0.0.0:4000/01-section1-intro/index.html)

### Overview Slide / Intro

- get set up with the _tools_ for collaborative code development, and of course there are lots of decisions to make
- the recommendations are opinionated but backed by experience
1. Command Line & Virtual Development Environment: use of Git through command line and then the Python tools `venv` and `pip` to manage dependencies and isolate our project
2. Integrated Development Environment (IDE): course content supports **PyCharm**, but we will do our best to also support **VSCode** assuming you have prior knowledge of it
3. GitHub and Git development workflows
  - _Mod_ there should be some reference to git workflows in this paragraph
4. Python coding style: PEP8

_Mod_ Add reference to python coding style on this page

# [Introduction to Software Design and Development](http://0.0.0.0:4000/02-software-development/index.html)

- _Fix_ this episode feels a bit out of place here
  - the section starts with talking about tools and environment, and then quickly moves to talking about design without any warning, and then immediately back to tools and environment
  - I think the problem is that there is no mention that there will be an introduction to the example project that will be used throughout the course; if there is some warning, then I think this would be fine, and the piece on architecture can probably stay where it is at the bottom of the page
  - Also, the episode could do with a rename! It should be called "Introduction to Our Project" or something like that
  - I have made these modifications in my `mbluteau-modifications` branch
- Give an introduction to the "patient inflammation project"
  - the software project studies inflammation in patients who have been given a new treatment for arthritis and reuses the inflammation dataset from the novice Software Carpentry Python lesson
  - The dataset contains information for 60 patients, who had their inflammation levels recorded for 40 days, so a 2D dataset like below:
  
![](../fig/inflammation-dataset.svg)
 
- The analysis is incomplete and there are some errors that you will need to correct
- First, we need to get the project, so go to the course website and follow the instructions there for copying and then cloning the repository locally on your machine to work on
  - Complete the lesson "Obtain the Software Project Locally"
  - please let us know when you are done by using a sticky note
- Let's take a look at the project structure
  - Demo this from commandline
  - I like to use `tree`
  - With this we see:
    - a README file (which typically describes the project, its usage, installation, authors and how to contribute)
    - the Python script `inflammation-analysis.py`, which provides the main entry point into the application
    - three directories: `inflammation`, `data` and `tests`
    - the `inflammation` directory has two other Python scripts that we will look at in more detail later
    - the `data` directory has the data we will be analysing, in CSV files
    - the `tests` directory has tests for our Python programs that we will be adding to and correcting
  - **Important Point**: the structure of this project is not arbitrary
    - a difference between novice and intermediate software development is that at the intermediate level the structure of the project should be planned in advance, and this includes the structure of abstract entities like software components and how they interact
    - in contrast, a novice will make this structure up as they go along (nothing wrong with that, it is part of learning, but at some point you need to stop doing that and have a think about these things in advance before you start a project).
    - this is probably an appropriate point to link to the Python Cookiecutter project template: https://github.com/ukaea/scientific-python-cookiecutter
- Complete exercise "Have a Peak at the Data". Please post your answers in the shared document.
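
To make the shape of the dataset concrete before learners open the files, here is a minimal sketch using only the standard library. It assumes each CSV row is one patient and each column is one day, as described above. The in-memory data is synthetic (all zeros), and the path `data/inflammation-01.csv` in the comment is an assumption about the file naming in the repository.

```python
import csv
import io


def load_inflammation(handle):
    """Read inflammation CSV data into a list of rows (one row per patient)."""
    return [[float(value) for value in row] for row in csv.reader(handle)]


def daily_means(data):
    """Average each day (column) across all patients (rows)."""
    n_patients = len(data)
    return [sum(day) / n_patients for day in zip(*data)]


# Synthetic stand-in for a real file: 60 patients x 40 days, all zeros.
# With the real project you would open a file instead, for example
# (assumed file name): open("data/inflammation-01.csv", newline="")
synthetic_csv = "\n".join(",".join("0" for _ in range(40)) for _ in range(60))
data = load_inflammation(io.StringIO(synthetic_csv))

print(len(data), "patients x", len(data[0]), "days")
print(len(daily_means(data)), "daily means")
```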

## Software Architecture

- "Software architecture is the fundamental structure of a software system... It refers to a 'bigger picture' of a software system that describes high-level components (modules) of the system and how they interact."
- Modules are an important idea
  - common examples: libraries that we import and use in our own code (e.g. `numpy` and `matplotlib`), classes in object-oriented languages
  - modules are largely self-contained (they can have dependencies) but there should be a well-defined way to interact with them (otherwise, what is the point?)
  - the specification or contract of how to interact with a module is called a **programming interface** and these are everywhere in software engineering (e.g. user interfaces, APIs, and even function signatures)
- Have a read of the Wikipedia articles about different architectures (5 minutes)
- What were some common features? Are any subsets or special cases of others? Any practical differences?
  - MVC (as presented in the course content, which is actually Model-View-Adapter) is sort of a subset of the multitier/layer architecture concept, where the view is the "presentation layer", the controller is the "application/business layer" and the model is the "data layer"
  - client-server is another subcase of multilayer architecture (where the presentation and application layers have been merged on the client-side)
  - in contrast, SOA takes a much more loosely coupled view of software components; good example is HTTP-based REST APIs
    - there is an agreed communication protocol over a network that the services use to communicate
    - the purpose and function of each of these services is much less prescriptive than MVC or multilayer
  - There is clear differentiation on underlying hardware: MVC is implementation agnostic (could be on a single device or multiple) while multitier and client-server require distinct machines communicating with each other; similarly for SOA
- MVC is likely most applicable to research contexts and it is also what is used for the example project
- Traditional MVC is actually a bit different from the course content:
![](../fig/MVC-Process.svg)
By RegisFrey - Own work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=10298177
- There is direct communication between the model and the view, and the controller does not interact with the view
- What is presented here is more accurately Model-View-Presenter/Mediator, where the view and model are completely decoupled and unaware of each other:
![](../fig/mvc-DNA-guide-GUI.png)
- TODO raise an issue about this on central repo?
- For the example project, the MVC corresponds as follows:
  - **Controller** = `inflammation-analysis.py` that performs basic statistical analysis over patient data and provides the main entry point into the application
  - **View** = `inflammation/views.py` TODO add brief summary
  - **Model** = `inflammation/models.py` TODO add brief summary
- Discussion session (5 minutes) in small groups about how MVC might be applicable to their own work. Think of a piece of software you work on, and how this might be used for it. Or think of a new tool you might want to make that uses this.
- Some final words on architecture and these particular patterns:
  - don't get too caught up determining exactly what functionality should be the responsibility of each component
  - the act of splitting things up and thinking about how they will interact through interfaces is where you get the most value
  - it is likely you were already doing this in an informal fashion, but good to think about it more explicitly
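
As a talking point for the discussion, the separation of responsibilities can be sketched in a few lines. This is an illustration only, not the code in the example project: the function names `daily_mean`, `render_table` and `run` are hypothetical, the numbers are made up, and the view produces text rather than a plot.

```python
def daily_mean(data):
    """Model: pure computation, knows nothing about input or display."""
    return [sum(day) / len(data) for day in zip(*data)]


def render_table(title, values):
    """View: turns results into text, knows nothing about how they were computed."""
    lines = [title]
    lines += [f"day {i}: {value:.2f}" for i, value in enumerate(values)]
    return "\n".join(lines)


def run(data):
    """Controller: takes the input, calls the model, passes the result to the view."""
    means = daily_mean(data)
    return render_table("Mean inflammation per day", means)


# Two patients, three days (made-up numbers).
print(run([[0.0, 1.0, 2.0], [2.0, 3.0, 4.0]]))
```

Because the model and the view only meet inside `run`, either one could be swapped out (for example, for a plotting view) without touching the other.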

# [Virtual Environments For Software Development](http://0.0.0.0:4000/03-virtual-environments/index.html)

- Switch to terminal, but follow the notes here
- Try to run the analysis script from the command line: `python3 inflammation-analysis.py`
  - If you are in a clean Python installation, this should raise a `ModuleNotFoundError`, which shows that the project has external dependencies that are not installed and that we need to obtain through a package manager
  - Depending on what learners have in their `PYTHONPATH` and site packages for their current default environment, they may or may not have success with this command
  - Take a look at the top of the views file to see the other dependencies: `head inflammation/views.py`
- Before jumping to install matplotlib and numpy, it is worth thinking about other projects we might be working on now or in the future
  - what if they have a requirement for a different version of numpy or matplotlib? or a different python version?
  - in general, each project is going to have its own unique configuration and set of dependencies
  - to solve this, we set up a virtual environment for each project, containing a specific python version and set of libraries that won't interact with others on the system
- TODO create more notes about virtual envs and package managers
- Get learners to follow content from "Creating a `venv` Environment" to "Exporting/Importing an Environment with `pip`"
  - remember to use sticky notes for status
- Does anyone have opinions on the naming of a virtual environment folder?
- An important thing to note with `venv` is that it cannot install a different Python version: an environment always uses the Python installation it was created with (typically the system one)
  - So, be mindful that if that installation is updated then your virtual environment may stop working, and you will need to get rid of it and create a new one
  - this is the process to do that:
  ```bash
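  rm -r venv/                      # remove the broken environment
  python3 -m venv venv             # recreate it with the current Python
  source venv/bin/activate         # activate it so pip installs into the new environment
  pip install <your_dependencies>  # this is probably one of the best arguments for maintaining a requirements.txt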
  ```
- Now, onto the content about exporting/importing an environment
  - I think there are actually two scenarios here:
  1. If you are providing a python application (i.e. building and deploying something) or doing a project that is a scientific analysis, then it is fine to pin your dependencies as detailed here in a `requirements.txt`
  2. If you are providing a reusable library (i.e. one that might be called from someone else's code or another library) then pinning can be overly restrictive and cause issues for package managers, and it is considered bad practice to pin your dependencies like this
    - Instead, you should specify loose dependency requirements in the `install_requires=[...]` metadata of `setup.py`. A full setup.py project is outside the scope of this course, but there are many good resources on this.
    - <https://packaging.python.org/en/latest/discussions/install-requires-vs-requirements/>
    - <https://caremad.io/posts/2013/07/setup-vs-requirement/>
    - and if you want a template for Python projects that keeps `requirements.txt` and `install_requires` synced: <https://github.com/ukaea/scientific-python-cookiecutter>
  - In general, I would recommend against pinning unless necessary
- Get learners to practice exporting, then deleting their existing virtual env, then recreating it with the requirements.txt file
- Live code the "Running Python Scripts From Command Line"
  - confirm everyone gets the same error
- TODO a note about environment management for other languages
  - Ask John and Kristian about C++
  - Fortran???
  - The "nuclear" option is to develop in a Docker container and specify the environment with a Dockerfile
    - However, this might not be possible for a variety of reasons: performance and developing on a cluster
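
When a learner's result differs from everyone else's, it helps to check which interpreter and environment they are actually running. The following is a minimal sketch using only the standard library. The helper names `in_virtual_env` and `missing_dependencies` are hypothetical, and the dependency list is just the two libraries mentioned above.

```python
import importlib.util
import sys


def in_virtual_env():
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the Python it was created from.
    return sys.prefix != sys.base_prefix


def missing_dependencies(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("interpreter:", sys.executable)
print("in virtual environment:", in_virtual_env())
print("missing:", missing_dependencies(["numpy", "matplotlib"]))
```

Running this before and after activating the environment shows learners that the same command can resolve to a different interpreter with a different set of installed packages.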

# [Integrated Development Environments](http://0.0.0.0:4000/04-ides/index.html)

- Most of us probably started out programming with a simple text editor and ran our programs from the command line with a compiler or interpreter
  - This is fine to start off, but as our projects become more complex with more files and configurations, it is natural that the tools we use to develop need to evolve as well
  - Enter the Integrated Development Environment (IDE)
- Preference for Code Editors and IDEs is one of the more contentious and strongly felt topics among software developers, but the bottom line is that if a tool works for you and helps you be productive, then it is absolutely fine to use that tool
  - But again, for the practicalities of this course, the decision to use PyCharm has been made
  - If you are comfortable enough in another IDE or code editor to get the functionality demonstrated in the content below, then please feel free to use that tool here, but this is a disclaimer that we cannot promise to resolve any issues you have, and if these issues are holding the group up then we will need to move on
- TODO in the intro email, I should make it clear that if they want to use a different IDE, then they should read this section in advance and make sure that they can set up analogous functionality in their IDE
- Let learners read through and try out content from "Using the PyCharm IDE" (~ 30mins)
  - TODO I am currently going through this and at this point: http://0.0.0.0:4000/13-ides/index.html#adding-an-external-library