From e503272ad0d6dc0da32a78a51e94857290a2c5d1 Mon Sep 17 00:00:00 2001
From: Mira
Date: Tue, 14 Jan 2025 14:21:48 +0000
Subject: [PATCH 01/20] start course with optimisation section and finish with
 profiling tools

---
 config.yaml                        | 19 +++++++++----------
 episodes/profiling-introduction.md |  2 +-
 index.md                           | 17 ++++++++---------
 3 files changed, 18 insertions(+), 20 deletions(-)

diff --git a/config.yaml b/config.yaml
index 117da0c9..a0ffb9a9 100644
--- a/config.yaml
+++ b/config.yaml
@@ -11,7 +11,7 @@ carpentry: 'incubator'

 # Overall title for pages.
-title: 'Performance Profiling & Optimisation (Python)'
+title: 'Python Optimisation and Performance Profiling'

 # Date the lesson was created (YYYY-MM-DD, this is empty by default)
 created: 2024-02-01~ # FIXME

@@ -27,13 +27,13 @@ life_cycle: 'alpha'
 license: 'CC-BY 4.0'

 # Link to the source repository for this lesson
-source: 'https://github.com/RSE-Sheffield/pando-python'
+source: 'https://github.com/ICR-RSE-Group/carpentry-pando-python'

 # Default branch of your lesson
 branch: 'main'

 # Who to contact if there are any issues
-contact: 'robert.chisholm@sheffield.ac.uk'
+contact: 'mira.sarkis@icr.ac.uk'

 # Navigation ------------------------------------------------
 #

@@ -59,18 +59,17 @@ contact: 'robert.chisholm@sheffield.ac.uk'

 # Order of episodes in your lesson
 episodes:
-- profiling-introduction.md
-- profiling-functions.md
-- short-break1.md
-- profiling-lines.md
-- profiling-conclusion.md
 - optimisation-introduction.md
 - optimisation-data-structures-algorithms.md
-- long-break1.md
 - optimisation-minimise-python.md
 - optimisation-use-latest.md
 - optimisation-memory.md
 - optimisation-conclusion.md
+- long-break1.md
+- profiling-introduction.md
+- profiling-functions.md
+- profiling-lines.md
+- profiling-conclusion.md

 # Information for Learners
 learners:

@@ -92,4 +91,4 @@ profiles:
 # sandpaper and varnish versions) should live
 varnish: RSE-Sheffield/uos-varnish@main

-url: 'https://rse.shef.ac.uk/pando-python'
+url: 'https://icr-rse-group.github.io/carpentry-pando-python'

diff --git a/episodes/profiling-introduction.md b/episodes/profiling-introduction.md
index ad58dc4e..84915614 100644
--- a/episodes/profiling-introduction.md
+++ b/episodes/profiling-introduction.md
@@ -41,7 +41,7 @@ Increasingly, particularly with relation to HPC, attention is being paid to the

 Profiling is most relevant to working code, when you have reached a stage that the code works and are considering deploying it.

-Any code that will run for more than a few minutes over it's lifetime, that isn't a quick one-shot script can benefit from profiling.
+Any code that will run for more than a few minutes over its lifetime, that isn't a quick one-shot script can benefit from profiling.

 Profiling should be a relatively quick and inexpensive process. If there are no significant bottlenecks in your code you can quickly be confident that your code is reasonably optimised. If you do identify a concerning bottleneck, further work to optimise your code and reduce the bottleneck could see significant improvements to the performance of your code and hence productivity.

diff --git a/index.md b/index.md
index af9940b6..ae27b5a0 100644
--- a/index.md
+++ b/index.md
@@ -3,18 +3,17 @@ site: sandpaper::sandpaper_site
 ---

-**Welcome to Performance Profiling & Optimisation (Python) Training!**
+**Welcome to Python Optimisation and Performance Profiling Training!**

 The training curriculum for this course is designed for researchers that are writing Python and lack formal computer science training.
 The curriculum covers how to assess where time is being spent during execution of a Python program; it also provides a high-level understanding of how code executes and how this maps to the limiting factors of performance and good practice.

-If you are now comfortable using Python, this course may be of interest to supplement and advance your programming knowledge. This course is particularly relevant if you are writing research code and desire greater confidence that your code is both performant and suitable for publication.
-
+If you are now comfortable using Python, this course may be of interest to supplement and advance your programming knowledge. This course is particularly relevant if you are writing code from scratch or re-using existing research code, and desire greater confidence that your code is both performant and suitable for publication.

 This is an all-day course; however, it normally finishes by early afternoon.

 If you would like to register to take the course, check the [registration information](learners/registration.md).

@@ -34,20 +33,20 @@ If you would like to register to take the course, check the [registration inform

 After attending this training, participants will be able to:

-- identify the most expensive functions and lines of code using `cprofile` and `line_profiler`.
-- evaluate code to determine the limiting factors of it's performance.
 - recognise and implement optimisations for common limiting factors of performance.
+- identify the most expensive functions and lines of code using `cProfile` and `line_profiler`.
+- evaluate code to determine the limiting factors of its performance.

 :::::::::::::::::::::::::::::::::::::::::: prereq

 ## Prerequisites

-Before joining Performance Profiling & Optimisation (Python) Training, participants should be able to:
+Before joining Python Optimisation and Performance Profiling Training, participants should be able to:

 - implement basic algorithms in Python.
 - follow the control flow of Python code, and dry run the execution in their head or on paper.

-See the [Research Computing Training Hub](https://sites.google.com/sheffield.ac.uk/research-training/research-training) for other courses to help with learning these skills.
+See the [Python novice carpentry](https://icr-rse-group.github.io/carpentry-python-novice/instructor/index.html) for another course to help with learning these skills.

 ::::::::::::::::::::::::::::::::::::::::::::::::::

From 618fb805b400bb64c3e354db569e8b0e83792a94 Mon Sep 17 00:00:00 2001
From: Mira
Date: Tue, 14 Jan 2025 15:54:49 +0000
Subject: [PATCH 02/20] change the course narrative to start with the
 optimisation

---
 episodes/long-break1.md               |  2 +-
 episodes/optimisation-introduction.md | 22 +++++++++++-----------
 episodes/profiling-introduction.md    |  3 +++
 learners/setup.md                     |  8 ++++++--
 4 files changed, 21 insertions(+), 14 deletions(-)

diff --git a/episodes/long-break1.md b/episodes/long-break1.md
index d28a91bd..2bda1dad 100644
--- a/episodes/long-break1.md
+++ b/episodes/long-break1.md
@@ -1,5 +1,5 @@
 ---
-title: Break
+title: Lunch Break
 teaching: 0
 exercises: 0
 break: 60

diff --git a/episodes/optimisation-introduction.md b/episodes/optimisation-introduction.md
index 1650962a..b9dddd9c 100644
--- a/episodes/optimisation-introduction.md
+++ b/episodes/optimisation-introduction.md
@@ -18,37 +18,37 @@ exercises: 0

 ## Introduction

-
-Now that you're able to find the most expensive components of your code with profiling, it becomes time to learn how to identify whether that expense is reasonable.
-
+
+Think about optimization as the first step on your journey to writing high-performance code.
+It’s like a race: the faster you can go without taking unnecessary detours, the better.
+Code optimisation is all about understanding the principles of efficiency in Python and being conscious of how small changes can yield massive improvements.
+

-In order to optimise code for performance, it is necessary to have an understanding of what a computer is doing to execute it.
+These are the first steps in code optimization: making better choices as you write your code and having an understanding of what a computer is doing to execute it.

-Even a high-level understanding of how you code executes, such as how Python and the most common data-structures and algorithms are implemented, can help you to identify suboptimal approaches when programming. If you have learned to write code informally out of necessity, to get something to work, it's not uncommon to have collected some bad habits along the way.
+A high-level understanding of how your code executes, such as how Python and the most common data-structures and algorithms are implemented, can help you identify suboptimal approaches when programming. If you have learned to write code informally out of necessity, to get something to work, it's not uncommon to have collected some bad habits along the way.

 The remaining content is often abstract knowledge that is transferable to the vast majority of programming languages. This is because the hardware architecture, data-structures and algorithms used are common to many languages and they hold some of the greatest influence over performance bottlenecks.

-## Premature Optimisation
+## Optimising code from scratch: trade-off between performance and maintainability

 > Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: **premature optimization is the root of all evil**. Yet we should not pass up our opportunities in that critical 3%. - Donald Knuth

 This classic quote among computer scientists states: when considering optimisation it is important to focus on the potential impact, both to the performance and maintainability of the code.

-Profiling is a valuable tool in this cause. Should effort be expended to optimise a component which occupies 1% of the runtime? Or would that time be better spent focusing on the most expensive components?
-
 Advanced optimisations, mostly outside the scope of this course, can increase the cost of maintenance by obfuscating what code is doing. Even if you are a solo-developer working on private code, your future self should be able to easily comprehend your implementation.

 Therefore, the balance between the impact to both performance and maintainability should be considered when optimising code.

-This is not to say, don't consider performance when first writing code. The selection of appropriate algorithms and data-structures covered in this course form good practice, simply don't fret over a need to micro-optimise every small component of the code that you write.
+This is not to say, don't consider performance when first writing code. The selection of appropriate algorithms and data-structures covered in this course forms good practice; simply don't fret over a need to micro-optimise every small component of the code that you write.

 ## Ensuring Reproducible Results when Optimising Existing Code

 When optimising existing code, you are making speculative changes. It's easy to make mistakes, many of which can be subtle. Therefore, it's important to have a strategy in place to check that the outputs remain correct.

 Testing is hopefully already a seamless part of your research software development process.
 Tests can be used to clarify how your software should perform, ensuring that new features work as intended and protecting against unintended changes to old functionality.

 There are a plethora of methods for testing code.

diff --git a/episodes/profiling-introduction.md b/episodes/profiling-introduction.md
index 84915614..af8a0c41 100644
--- a/episodes/profiling-introduction.md
+++ b/episodes/profiling-introduction.md
@@ -22,6 +22,9 @@ exercises: 10

 ## Introduction

+
+But what if, despite your best efforts, performance still isn’t up to par? This is where profiling comes into play, and it’s a game-changer. You can’t always guess what’s slow; profiling helps you see hidden inefficiencies that might be buried deep within the code.
+
 Performance profiling is the process of analysing and measuring the performance of a program or script, to understand where time is being spent during execution.

diff --git a/learners/setup.md b/learners/setup.md
index 961e09d0..b478c26c 100644
--- a/learners/setup.md
+++ b/learners/setup.md
@@ -22,8 +22,12 @@ This course uses Python and was developed using Python 3.11, therefore it is rec

 You may want to create a new Python virtual environment for the course; this can be done with your preferred Python environment manager (e.g. `conda`, `pipenv`), and the required packages can all be installed via `pip`.

-
+To create a new Anaconda environment named `py311_env` with Python 3.11, use the following conda commands:
+
+```bash
+conda create --name py311_env python=3.11
+conda activate py311_env
+```

 The non-core Python packages required by the course are `pytest`, `snakeviz`, `line_profiler`, `numpy`, `pandas` and `matplotlib`, which can be installed via `pip`.
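
For convenience, the matching one-line install is sketched below; it simply mirrors the package list above and assumes the `py311_env` environment is active:

```sh
pip install pytest snakeviz line_profiler numpy pandas matplotlib
```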
From c074680b1b9e71d718e79c49546a62504ca019b5 Mon Sep 17 00:00:00 2001
From: Mira
Date: Tue, 14 Jan 2025 16:29:00 +0000
Subject: [PATCH 03/20] remove logo by disconnecting varnish

---
 config.yaml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/config.yaml b/config.yaml
index a0ffb9a9..60e57ad7 100644
--- a/config.yaml
+++ b/config.yaml
@@ -90,5 +90,5 @@ profiles:
 # This space below is where custom yaml items (e.g. pinning
 # sandpaper and varnish versions) should live

-varnish: RSE-Sheffield/uos-varnish@main
-url: 'https://icr-rse-group.github.io/carpentry-pando-python'
+#varnish: RSE-Sheffield/uos-varnish@main
+#url: 'https://icr-rse-group.github.io/carpentry-pando-python'

From a7a09cacdade0379fde0a04673f6ec1ed849d466 Mon Sep 17 00:00:00 2001
From: Mira
Date: Tue, 14 Jan 2025 16:37:10 +0000
Subject: [PATCH 04/20] change carpentry type

---
 config.yaml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/config.yaml b/config.yaml
index 60e57ad7..9886ea23 100644
--- a/config.yaml
+++ b/config.yaml
@@ -3,12 +3,12 @@
 #------------------------------------------------------------

 # Which carpentry is this (swc, dc, lc, or cp)?
 # swc: Software Carpentry
 # dc: Data Carpentry
 # lc: Library Carpentry
 # cp: Carpentries (to use for instructor training for instance)
 # incubator: The Carpentries Incubator
-carpentry: 'incubator'
+carpentry: 'swc'

From c73399a737d1653747ae89a6dfe4446dc1c6ab39 Mon Sep 17 00:00:00 2001
From: Anastasiia Shcherbakova
Date: Tue, 21 Jan 2025 14:33:06 +0000
Subject: [PATCH 05/20] Fixed US/British spelling mix-up and reduced the number
 of new paragraphs

---
 episodes/optimisation-introduction.md | 28 ++++++---------------------
 1 file changed, 6 insertions(+), 22 deletions(-)

diff --git a/episodes/optimisation-introduction.md b/episodes/optimisation-introduction.md
index b9dddd9c..5f9bafec 100644
--- a/episodes/optimisation-introduction.md
+++ b/episodes/optimisation-introduction.md
@@ -19,12 +19,12 @@ exercises: 0

 ## Introduction

-Think about optimization as the first step on your journey to writing high-performance code.
+Think about optimisation as the first step on your journey to writing high-performance code.
 It’s like a race: the faster you can go without taking unnecessary detours, the better.
 Code optimisation is all about understanding the principles of efficiency in Python and being conscious of how small changes can yield massive improvements.

-These are the first steps in code optimization: making better choices as you write your code and having an understanding of what a computer is doing to execute it.
+These are the first steps in code optimisation: making better choices as you write your code and having an understanding of what a computer is doing to execute it.

 A high-level understanding of how your code executes, such as how Python and the most common data-structures and algorithms are implemented, can help you identify suboptimal approaches when programming. If you have learned to write code informally out of necessity, to get something to work, it's not uncommon to have collected some bad habits along the way.

@@ -34,41 +34,25 @@ The remaining content is often abstract knowledge that is transferable to the v

 ## Optimising code from scratch: trade-off between performance and maintainability

-> Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: **premature optimization is the root of all evil**. Yet we should not pass up our opportunities in that critical 3%.
 - Donald Knuth
+> Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: **premature optimisation is the root of all evil**. Yet we should not pass up our opportunities in that critical 3%. - Donald Knuth

-This classic quote among computer scientists states: when considering optimisation it is important to focus on the potential impact, both to the performance and maintainability of the code.
-
-Advanced optimisations, mostly outside the scope of this course, can increase the cost of maintenance by obfuscating what code is doing. Even if you are a solo-developer working on private code, your future self should be able to easily comprehend your implementation.
-
-Therefore, the balance between the impact to both performance and maintainability should be considered when optimising code.
+This classic quote among computer scientists states: when considering optimisation it is important to focus on the potential impact, both to the performance and maintainability of the code. Advanced optimisations, mostly outside the scope of this course, can increase the cost of maintenance by obfuscating what code is doing. Even if you are a solo-developer working on private code, your future self should be able to easily comprehend your implementation. Therefore, the balance between the impact to both performance and maintainability should be considered when optimising code.

 This is not to say, don't consider performance when first writing code. The selection of appropriate algorithms and data-structures covered in this course forms good practice; simply don't fret over a need to micro-optimise every small component of the code that you write.

 ## Ensuring Reproducible Results when Optimising Existing Code

 When optimising existing code, you are making speculative changes. It's easy to make mistakes, many of which can be subtle. Therefore, it's important to have a strategy in place to check that the outputs remain correct.

-Testing is hopefully already a seamless part of your research software development process.
-Tests can be used to clarify how your software should perform, ensuring that new features work as intended and protecting against unintended changes to old functionality.
-
-There are a plethora of methods for testing code.
+Testing is hopefully already a seamless part of your research software development process. Tests can be used to clarify how your software should perform, ensuring that new features work as intended and protecting against unintended changes to old functionality.

 ## pytest Overview

-Most Python developers use the testing package [pytest](https://docs.pytest.org/en/latest/), it's a great place to get started if you're new to testing code.
+There are a plethora of methods for testing code. Most Python developers use the testing package [pytest](https://docs.pytest.org/en/latest/); it's a great place to get started if you're new to testing code. Tests should be created within a project's testing directory by creating files named with the form `test_*.py` or `*_test.py`. pytest looks for these files when running the test suite. Within the created test file, any functions named in the form `test*` are considered tests that will be executed by pytest. The `assert` keyword is used to test whether a condition evaluates to `True`. Here's a quick example of how a test can be used to check your function's output against an expected value.
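
Once such a test file exists, the full suite can be run from the project root using pytest's module interface. A minimal sketch of the usual invocation:

```sh
# Discovers test_*.py / *_test.py files and runs every test* function in them
python -m pytest
```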
-Tests should be created within a project's testing directory, by creating files named with the form `test_*.py` or `*_test.py`.
-pytest looks for these files, when running the test suite.
-Within the created test file, any functions named in the form `test*` are considered tests that will be executed by pytest.
-The `assert` keyword is used, to test whether a condition evaluates to `True`.

 ```python
 # file: test_demonstration.py

From 010a27347f1573b0a8f9ec681b7d85537b6ed188 Mon Sep 17 00:00:00 2001
From: Anastasiia Shcherbakova
Date: Tue, 21 Jan 2025 15:15:03 +0000
Subject: [PATCH 06/20] Fixed comma issues and fixed grammar in the second last
 paragraph

---
 .../optimisation-data-structures-algorithms.md | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/episodes/optimisation-data-structures-algorithms.md b/episodes/optimisation-data-structures-algorithms.md
index 0c1c8d35..30b1a0c4 100644
--- a/episodes/optimisation-data-structures-algorithms.md
+++ b/episodes/optimisation-data-structures-algorithms.md
@@ -65,7 +65,7 @@ CPython for example uses [`newsize + (newsize >> 3) + 6`](https://github.com/pyt

 This has two implications:

-* If you are creating large static lists, they will use upto 12.5% excess memory.
+* If you are creating large static lists, they will use up to 12.5% excess memory.
 * If you are growing a list with `append()`, there will be large amounts of redundant allocations and copies as the list grows.

 ### List Comprehension

@@ -165,7 +165,7 @@ To retrieve or check for the existence of a key within a hashing data structure,

 ### Keys

-Keys will typically be a core Python type such as a number or string. However multiple of these can be combined as a Tuple to form a compound key, or a custom class can be used if the methods `__hash__()` and `__eq__()` have been implemented.
+Keys will typically be a core Python type such as a number or string. However, multiple of these can be combined as a Tuple to form a compound key, or a custom class can be used if the methods `__hash__()` and `__eq__()` have been implemented.

 You can implement `__hash__()` by utilising the ability for Python to hash tuples, avoiding the need to implement a bespoke hash function.
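
As an illustration of that tuple approach, here is a minimal sketch of a compound key class; the class and attribute names are hypothetical, and both methods simply delegate to a tuple of the identifying fields:

```python
class GridCell:
    """Hypothetical compound key, usable in sets and as a dictionary key."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __hash__(self):
        # Reuse Python's built-in hashing of tuples
        return hash((self.x, self.y))

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

# Equal keys collapse to a single entry in hashing data-structures
assert len({GridCell(1, 2), GridCell(1, 2)}) == 1
```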
@@ -265,7 +265,7 @@ Constructing a set with a loop and `add()` (equivalent to a list's `append()`) c

 The naive list approach is 2200x slower than the fastest approach, because of how many times the list is searched. This gap will only grow as the number of items increases.

-Sorting the input list reduces the cost of searching the output list significantly, however it is still 8x slower than the fastest approach. In part because around half of it's runtime is now spent sorting the list.
+Sorting the input list reduces the cost of searching the output list significantly, however it is still 8x slower than the fastest approach, in part because around half of its runtime is now spent sorting the list.

 ```output
 uniqueSet: 0.30ms
 uniqueSetAdd: 1.49ms
 uniqueList: 654.77ms
 uniqueListSort: 2.67ms
 ```

@@ -280,9 +280,9 @@ Independent of the performance to construct a unique set (as covered in the previous section), it's worth identifying the performance to search the data-structure to retrieve an item or check whether it exists.

-The performance of a hashing data structure is subject to the load factor and number of collisions. An item that hashes with no collision can be checked almost directly, whereas one with collisions will probe until it finds the correct item or an empty slot. In the worst possible case, whereby all insert items have collided this would mean checking every single item. In practice, hashing data-structures are designed to minimise the chances of this happening and most items should be found or identified as missing with a single access.
+The performance of a hashing data structure is subject to the load factor and number of collisions. An item that hashes with no collision can be checked almost directly, whereas one with collisions will probe until it finds the correct item or an empty slot. In the worst possible case, whereby all inserted items have collided, this would mean checking every single item. In practice, hashing data-structures are designed to minimise the chances of this happening and most items should be found or identified as missing with a single access.

-In contrast if searching a list or array, the default approach is to start at the first item and check all subsequent items until the correct item has been found. If the correct item is not present, this will require the entire list to be checked. Therefore the worst-case is similar to that of the hashing data-structure, however it is guaranteed in cases where the item is missing. Similarly, on-average we would expect an item to be found half way through the list, meaning that an average search will require checking half of the items.
+In contrast, if searching a list or array, the default approach is to start at the first item and check all subsequent items until the correct item has been found. If the correct item is not present, this will require the entire list to be checked. Therefore, the worst-case is similar to that of the hashing data-structure, however it is guaranteed in cases where the item is missing. Similarly, on-average we would expect an item to be found halfway through the list, meaning that an average search will require checking half of the items.

 If, however, the list or array is sorted, a binary search can be used. A binary search divides the list in half and checks which half the target item would be found in; this continues recursively until the search is exhausted, whereby the item is either found or dismissed. This is significantly faster than performing a linear search of the list, checking a total of `log N` items every time.
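
The standard library's `bisect` module implements this binary search over sorted sequences. A minimal sketch of a membership check built on `bisect_left`, assuming the list is already sorted:

```python
from bisect import bisect_left

def binary_contains(ls, item):
    """Return True if item is present in the sorted list ls."""
    i = bisect_left(ls, item)  # First position where item could be inserted
    return i != len(ls) and ls[i] == item

assert binary_contains([1, 3, 5, 7], 5)
assert not binary_contains([1, 3, 5, 7], 4)
```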
@@ -333,9 +333,7 @@

 print(f"linear_search_list: {timeit(linear_search_list, number=repeats)-gen_time:.2f}ms")
 print(f"binary_search_list: {timeit(binary_search_list, number=repeats)-gen_time:.2f}ms")
 ```

-Searching the set is fastest performing 25,000 searches in 0.04ms.
-This is followed by the binary search of the (sorted) list which is 145x slower, although the list has been filtered for duplicates. A list still containing duplicates would be longer, leading to a more expensive search.
-The linear search of the list is more than 56,600x slower than the fastest, it really shouldn't be used!
+Searching the set is the fastest, performing 25,000 searches in 0.04ms. This is followed by the binary search of the (sorted) list, which is 145x slower, although the list has been filtered for duplicates. A list still containing duplicates would be longer, leading to a more expensive search. The linear search of the list is more than 56,600x slower than searching the set; it really shouldn't be used!

 ```output
 search_set: 0.04ms

From d18ba1c902a077f33e1c35b0d6a5f38f8e21d506 Mon Sep 17 00:00:00 2001
From: Anastasiia Shcherbakova
Date: Wed, 22 Jan 2025 13:30:00 +0000
Subject: [PATCH 07/20] Fix spelling mistakes and changed output/description
 order in vectorisation

---
 episodes/optimisation-minimise-python.md | 32 ++++++++++++------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/episodes/optimisation-minimise-python.md b/episodes/optimisation-minimise-python.md
index 0e2666f6..5ab4e983 100644
--- a/episodes/optimisation-minimise-python.md
+++ b/episodes/optimisation-minimise-python.md
@@ -20,7 +20,7 @@ exercises: 0

-Python is an interpreted programming language. When you execute your `.py` file, the (default) CPython back-end compiles your Python source code to an intermediate bytecode. This bytecode is then interpreted in software at runtime generating instructions for the processor as necessary. This interpretation stage, and other features of the language, harm the performance of Python (whilst improving it's usability).
+Python is an interpreted programming language. When you execute your `.py` file, the (default) CPython back-end compiles your Python source code to an intermediate bytecode. This bytecode is then interpreted in software at runtime, generating instructions for the processor as necessary. This interpretation stage, and other features of the language, harm the performance of Python (whilst improving its usability).

 In comparison, many languages such as C/C++ compile directly to machine code. This allows the compiler to perform low-level optimisations that better exploit hardware nuance to achieve fast performance. This however comes at the cost of compiled software not being cross-platform.

 Whilst Python will rarely be as fast as compiled languages like C/C++, it is pos

 A simple example of this would be to perform a linear search of a list (in the previous episode we did say this is not recommended). The below example creates a list of 2500 integers in the inclusive-exclusive range `[0, 5000)`.

-It then searches for all of the even numbers in that range.
+It then searches for all the even numbers in that range.

 `searchlistPython()` is implemented manually, iterating `ls`, checking each individual item in Python code. `searchListC()` in contrast uses the `in` operator to perform each search, which allows CPython to implement the inner loop in its C back-end.
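
A minimal sketch of the two approaches being contrasted; the function bodies here are illustrative rather than the lesson's exact code:

```python
ls = list(range(0, 5000, 2))  # A list to search, standing in for the example's data

def search_list_python(target):
    # The inner loop executes line-by-line in the Python interpreter
    for item in ls:
        if item == target:
            return True
    return False

def search_list_c(target):
    # The same linear scan, performed inside CPython's C back-end
    return target in ls
```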
@@ -281,7 +281,7 @@ In particular, those which are passed an `iterable` (e.g. lists) are likely to p

 ::::::::::::::::::::::::::::::::::::: callout

-The built-in functions [`filter()`](https://docs.python.org/3/library/functions.html#filter) and [`map()`](https://docs.python.org/3/library/functions.html#map) can be used for processing iterables However list-comprehension is likely to be more performant.
+The built-in functions [`filter()`](https://docs.python.org/3/library/functions.html#filter) and [`map()`](https://docs.python.org/3/library/functions.html#map) can be used for processing iterables. However, list-comprehension is likely to be more performant.

:::::::::::::::::::::::::::::::::::::::::::::

@@ -292,11 +292,11 @@

 ### NumPy

 [NumPy](https://numpy.org/) is a commonly used package for scientific computing, which provides a wide variety of methods.

-It adds restriction via it's own [basic numeric types](https://numpy.org/doc/stable/user/basics.types.html), and static arrays to enable even greater performance than that of core Python. However if these restrictions are ignored, the performance can become significantly worse.
+It adds restriction via its own [basic numeric types](https://numpy.org/doc/stable/user/basics.types.html), and static arrays to enable even greater performance than that of core Python. However, if these restrictions are ignored, the performance can become significantly worse.

 ### Arrays

-NumPy's arrays (not to be confused with the core Python `array` package) are static arrays. Unlike core Python's lists, they do not dynamically resize. Therefore if you wish to append to a NumPy array, you must call `resize()` first. If you treat this like `append()` for a Python list, resizing for each individual append you will be performing significantly more copies and memory allocations than a Python list.
+NumPy's arrays (not to be confused with the core Python `array` package) are static arrays. Unlike core Python's lists, they do not dynamically resize. Therefore, if you wish to append to a NumPy array, you must call `resize()` first. If you treat this like `append()` for a Python list, resizing for each individual append, you will be performing significantly more copies and memory allocations than a Python list.

 The below example sees lists and arrays constructed from `range(100000)`.

@@ -390,7 +390,7 @@ There is however a trade-off, using `numpy.random.choice()` can be clearer to so

 ### Vectorisation

-The manner by which NumPy stores data in arrays enables it's functions to utilise vectorisation, whereby the processor executes one instruction across multiple variables simultaneously, for every mathematical operation between arrays.
+The manner by which NumPy stores data in arrays enables its functions to utilise vectorisation, whereby the processor executes one instruction across multiple variables simultaneously, for every mathematical operation between arrays.

 Earlier in this episode it was demonstrated that using core Python methods over a list will outperform a loop performing the same calculation.

 The below example takes this a step further by demonstrating the calculation of dot product.

 ```python
 from timeit import timeit

 N = 1000000  # Number of elements in list
 gen_list = f"ls = list(range({N}))"
 gen_array = f"import numpy;ar = numpy.arange({N}, dtype=numpy.int64)"
 py_sum_ls = "sum([i*i for i in ls])"
 py_sum_ar = "sum(ar*ar)"
 np_sum_ar = "numpy.sum(ar*ar)"
 np_dot_ar = "numpy.dot(ar, ar)"
 repeats = 1000
 print(f"python_sum_list: {timeit(py_sum_ls, setup=gen_list, number=repeats):.2f}ms")
 print(f"python_sum_array: {timeit(py_sum_ar, setup=gen_array, number=repeats):.2f}ms")
 print(f"numpy_sum_array: {timeit(np_sum_ar, setup=gen_array, number=repeats):.2f}ms")
 print(f"numpy_dot_array: {timeit(np_dot_ar, setup=gen_array, number=repeats):.2f}ms")
 ```

 ```output
 python_sum_list: 46.93ms
 python_sum_array: 33.26ms
 numpy_sum_array: 1.44ms
 numpy_dot_array: 0.29ms
 ```

+* `python_sum_list` uses list comprehension to perform the multiplication, followed by the Python core `sum()`. This comes out at 46.93ms.
+* `python_sum_array` instead directly multiplies the two arrays, taking advantage of NumPy's vectorisation, but uses the core Python `sum()`; this comes in slightly faster at 33.26ms.
+* `numpy_sum_array` again takes advantage of NumPy's vectorisation for the multiplication, and additionally uses NumPy's `sum()` implementation. These two rounds of vectorisation provide a much faster 1.44ms completion.
+* `numpy_dot_array` instead uses NumPy's `dot()` to calculate the dot product in a single operation. This comes out the fastest at 0.29ms, 162x faster than `python_sum_list`.
::::::::::::::::::::::::::::::::::::: callout

## Parallel NumPy

A small number of functions are backed by BLAS and LAPACK, enabling even greater

The [supported functions](https://numpy.org/doc/stable/reference/routines.linalg.html) mostly correspond to linear algebra operations.

-The auto-parallelisation of these functions is hardware dependant, so you won't always automatically get the additional benefit of parallelisation.
+The auto-parallelisation of these functions is hardware-dependent, so you won't always automatically get the additional benefit of parallelisation.

However, HPC systems should be primed to take advantage, so try increasing the number of cores you request when submitting your jobs and see if it improves the performance.

*This might be why `numpy_dot_array` is that much faster than `numpy_sum_array` in the previous example!*

:::::::::::::::::::::::::::::::::::::::::::::

### `vectorize()`

Python's `map()` was introduced earlier, for applying a function to all elements within a list.
-NumPy provides `vectorize()` an equivalent for operating over it's arrays.
+NumPy provides `vectorize()`, an equivalent for operating over its arrays.

This doesn't actually make use of processor-level vectorisation, from the [documentation](https://numpy.org/doc/stable/reference/generated/numpy.vectorize.html):

@@ -497,7 +497,7 @@ Pandas' methods by default operate on columns. Each column or series can be thou

Following the theme of this episode, iterating over the rows of a data frame using a `for` loop is not advised. The pythonic iteration will be slower than other approaches.

-Pandas allows it's own methods to be applied to rows in many cases by passing `axis=1`, where available these functions should be preferred over manual loops. Where you can't find a suitable method, `apply()` can be used, which is similar to `map()`/`vectorize()`, to apply your own function to rows.
+Pandas allows its own methods to be applied to rows in many cases by passing `axis=1`; where available, these functions should be preferred over manual loops. Where you can't find a suitable method, `apply()` can be used, which is similar to `map()`/`vectorize()`, to apply your own function to rows.
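
For instance, a row-wise sum via `apply()` might look like the following sketch, using a hypothetical two-column frame:

```python
import pandas

df = pandas.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=1 means the lambda receives one row (a Series) per call
row_totals = df.apply(lambda row: row["a"] + row["b"], axis=1)
print(row_totals.tolist())  # [11, 22, 33]
```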
 ```python
 from timeit import timeit

@@ -571,7 +571,7 @@ vectorize: 1.48ms

 It won't always be possible to take full advantage of vectorisation, for example you may have conditional logic.

-An alternate approach is converting your dataframe to a Python dictionary using `to_dict(orient='index')`. This creates a nested dictionary, where each row of the outer dictionary is an internal dictionary. This can then be processed via list-comprehension:
+An alternate approach is converting your DataFrame to a Python dictionary using `to_dict(orient='index')`. This creates a nested dictionary, where each row of the outer dictionary is an internal dictionary. This can then be processed via list-comprehension:

 ```python
 def to_dict():

Whilst still nearly 100x slower than pure vectorisation, it's twice as fast as `apply()`.

```output
to_dict: 131.15ms
```

-This is because indexing into Pandas' `Series` (rows) is significantly slower than a Python dictionary. There is a slight overhead to creating the dictionary (40ms in this example), however the stark difference in access speed is more than enough to overcome that cost for any large dataframe.
+This is because indexing into Pandas' `Series` (rows) is significantly slower than indexing into a Python dictionary. There is a slight overhead to creating the dictionary (40ms in this example); however, the stark difference in access speed is more than enough to overcome that cost for any large DataFrame.

 ```python
 from timeit import timeit

From ea7ba46738f25d012fb4ddd2c56170d90475db33 Mon Sep 17 00:00:00 2001
From: Anastasiia Shcherbakova
Date: Wed, 22 Jan 2025 14:14:59 +0000
Subject: [PATCH 08/20] Fixed comma issues and added 'Later' specifier for one
 topic

---
 episodes/profiling-introduction.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/episodes/profiling-introduction.md b/episodes/profiling-introduction.md
index af8a0c41..ed26f755 100644
--- a/episodes/profiling-introduction.md
+++ b/episodes/profiling-introduction.md
@@ -127,7 +127,7 @@ Therefore, it is better described as a tool for **benchmarking**.

 Software is typically comprised of a hierarchy of function calls, both functions written by the developer and those used from the language's standard library and third party packages.

-Function-level profiling analyses where time is being spent with respect to functions. Typically function-level profiling will calculate the number of times each function is called and the total time spent executing each function, inclusive and exclusive of child function calls.
+Function-level profiling analyses where time is being spent with respect to functions. Typically, function-level profiling will calculate the number of times each function is called and the total time spent executing each function, inclusive and exclusive of child function calls.

 This allows functions that occupy a disproportionate amount of the total runtime to be quickly identified and investigated.

@@ -149,7 +149,7 @@ This will identify individual lines of code that occupy a disproportionate amount

-In this course we will cover the usage of the line-level profiler `line_profiler`.
+Later in this course we will cover the usage of the line-level profiler `line_profiler`.

::::::::::::::::::::::::::::::::::::: callout

Line-level profiling can be particularly expensive, a program can execute hundre

`line_profiler` is deterministic, meaning that it tracks every line of code executed. To avoid it being too costly, the profiling is restricted to methods targeted with the decorator `@profile`.
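
To make that targeting concrete, a minimal sketch of the usual `line_profiler` workflow follows; the script and function names are hypothetical:

```python
# file: my_script.py (hypothetical)
@profile  # Injected by kernprof at runtime, so no import is needed here
def expensive_function():
    total = 0
    for i in range(100000):
        total += i * i
    return total

expensive_function()
```

The decorated function is then profiled, and the line-by-line results printed, with:

```sh
kernprof -lv my_script.py
```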
-In contrast [`scalene`](https://github.com/plasma-umass/scalene) is a more advanced Python profiler capable of line-level profiling. It uses a sampling based approach, whereby the profiler halts and samples the line of code currently executing thousands of times per second. This reduces the cost of profiling, whilst still maintaining representative metrics for the most expensive components.
+In contrast, [`scalene`](https://github.com/plasma-umass/scalene) is a more advanced Python profiler capable of line-level profiling. It uses a sampling-based approach, whereby the profiler halts and samples the line of code currently executing thousands of times per second. This reduces the cost of profiling, whilst still maintaining representative metrics for the most expensive components.

:::::::::::::::::::::::::::::::::::::::::::::

## Timeline Profiling

Timeline profiling takes a different approach to visualising where time is being spent during execution.

-Typically a subset of function-level profiling, the execution of the profiled software is instead presented as a timeline highlighting the order of function execution in addition to the time spent in each individual function call.
+Typically a subset of function-level profiling, the execution of the profiled software is presented as a timeline highlighting the order of function execution in addition to the time spent in each individual function call.

By highlighting individual function calls, patterns relating to how performance scales over time can be identified. These would be hidden with the aforementioned aggregate approaches.

@@ Ideally, it should take no more than a few minutes to run the profiled test-case

-For example, you may have a model which normally simulates a year in hourly timesteps.
+For example, you may have a model which normally simulates a year in hourly time steps.

It would be appropriate to begin by profiling the simulation of a single day. If the model scales over time, such as due to population growth, it may be pertinent to profile a single day later into a simulation if the model can be resumed or configured.
A larger population is likely to amplify any bottlenecks that scale with the population, making them easier to identify.

Think about a project where you've been working with Python. Do you know where the time during execution is being spent?

-Write a short plan of the approach you would take to investigate and confirm where the majority of time is being spent during it's execution.
+Write a short plan of the approach you would take to investigate and confirm where the majority of time is being spent during its execution.

::::::::::::::::::::::::::::::::::::: keypoints

- Profiling is a relatively quick process to analyse where time is being spent and bottlenecks during a program's execution.
-- Code should be profiled when ready for deployment if it will be running for more than a few minutes during it's lifetime.
+- Code should be profiled when ready for deployment if it will be running for more than a few minutes during its lifetime.
- There are several types of profiler, each with slightly different purposes.
  - function-level: `cProfile` (visualised with `snakeviz`)
  - line-level: `line_profiler`

From 5a31effbe5a6b7a481011e2c4d38c23c1fa391c9 Mon Sep 17 00:00:00 2001
From: Anastasiia Shcherbakova
Date: Wed, 22 Jan 2025 14:36:44 +0000
Subject: [PATCH 09/20] Fixed grammar issues

---
 episodes/profiling-functions.md | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/episodes/profiling-functions.md b/episodes/profiling-functions.md
index 54ba38b6..9e99c584 100644
--- a/episodes/profiling-functions.md
+++ b/episodes/profiling-functions.md
@@ -46,7 +46,7 @@ As a stack it is a last-in first-out (LIFO) data structure.

 ![A diagram of a call stack](fig/stack.png){alt="A greyscale diagram showing a (call)stack, containing 5 stack frames. Two additional stack frames are shown outside the stack, one is marked as entering the call stack with an arrow labelled push and the other is marked as exiting the call stack labelled pop."}

-When a function is called, a frame to track it's variables and metadata is pushed to the call stack.
+When a function is called, a frame to track its variables and metadata is pushed to the call stack.
 When that same function finishes and returns, it is popped from the stack and variables local to the function are dropped.

 If you've ever seen a stack overflow error, this refers to the call stack becoming too large.

Hence, this prints the following call stack:

 traceback.print_stack()
 ```

-The first line states the file and line number where `a()` was called from (the last line of code in the file shown). The second line states that it was the function `a()` that was called, this could include it's arguments. The third line then repeats this pattern, stating the line number where `b2()` was called inside `a()`. This continues until the call to `traceback.print_stack()` is reached.
+The first line states the file and line number where `a()` was called from (the last line of code in the file shown). The second line states that it was the function `a()` that was called; this could include its arguments. The third line then repeats this pattern, stating the line number where `b2()` was called inside `a()`. This continues until the call to `traceback.print_stack()` is reached.

You may see stack traces like this when an unhandled exception is thrown by your code.

 [`cProfile`](https://docs.python.org/3/library/profile.html#instant-user-s-manual) is a function-level profiler provided as part of the Python standard library.

-It can be called directly within your Python code as an imported package, however it's easier to use it's script interface:
+It can be called directly within your Python code as an imported package; however, it's easier to use its script interface:

```sh
python -m cProfile -o