Skip to content

Latest commit

 

History

History
489 lines (290 loc) · 13.1 KB

packages.rst

File metadata and controls

489 lines (290 loc) · 13.1 KB

Packages

Learning objectives

  • practice to determine the version of a Python package
  • practice to determine that a Python package is not installed
  • practice to have loaded a Python machine learning module
  • practice to install a Python package

For teachers

Teaching goals are:

  • Learners have determined the version of a Python package
  • Learners have determined that a Python package is not installed
  • Learners have loaded a Python machine learning module
  • Learners have installed a Python package

Lesson plan (45 minutes in total):

  • 5 mins: prior knowledge
    • What are Python packages?
    • Why use Python packages?
    • How to find out if a package is already installed?
    • What are some Python package installers?
    • What are the differences?
    • What are some Python package installers used on UPPMAX?
    • What are some Python package installers used on HPC2N?
  • 5 mins: presentation
  • 20 mins: challenge
  • 5 mins: feedback
    • What are Python packages?
    • Why use Python packages?
    • How to find out if a package is already installed?
    • What are some Python package installers?
    • What are the differences?
    • What are some Python package installers used on UPPMAX?
    • What are some Python package installers used on HPC2N?

Compute allocations in this workshop

  • Rackham: naiss2024-22-107
  • Kebnekaise: hpc2n2024-025

Storage space for this workshop

  • Rackham: /proj/r-py-jl
  • Kebnekaise: /proj/nobackup/hpc2n2024-025

Introduction

Packages are pieces of Python code written to be used by others. When possible, using an existing Python package is usually smarter than writing code yourself. In this session, we practice working with packages.

Finding packages

package_user_journey.mmd

The most common Python packages come installed when loading a regular Python module. Some of the more complex packages, are part of a module for more complex Python packages. If a package is not installed, however, you can also install it.

Python package installers

package_installers.mmd

There are two Python package installers, called conda and pip.

In this session, we use pip, as it can be used on the two HPC clusters used in this course:

Package installer HPC2N UPPMAX (Rackham) UPPMAX (Bianca)
conda Unsupported [1] Recommended Recommended
pip Recommended Supported Unsupported [2]

In this session we use pip, because it is a commonly-used package installation system that works on both HPC clusters used in this course.

We have not scheduled to discuss Conda in this course, yet teaching materials can be found at Conda at UPPMAX.

As a first impression, here is a simple comparison between the two:

Parameter conda pip
Installs Python packages Yes Yes
Installs non-Python software Yes No

In this session, we will install packages to your default user folder. Because this one default user folder, installing a different version of one package for one computational experiment, may have consequences for others. These problems are addressed in the session on isolated environments.

Exercises

These exercises follow a common user journey, for a user that needs to use a certain Python packages:

  • In exercise 1, we determine if a Python package is already installed
  • In exercise 2, we determine if a machine learning Python package is already installed
  • If all fails, in exercise 3, we install a Python package ourselves

Like any user, we'll try to be autonomous and read the -hopefully well written!-UPPMAX documentation.

Exercise 1

Learning objectives

  • Practice reading documentation
  • Apply/rehearse the documentation to load a module
  • Apply the documentation to show if a Python package is already installed
  • Observe how it looks like when a package is not installed

Imagine you want to use the Python packages pandas and tensorflow-cpu and mhcnuggets. Here we see that one comes already installed with the module system.

Exercise 1.1

Read the UPPMAX documentation on how to load Python.

Then do:

  • HPC2N: load the modules GCC/12.3.0 and Python 3.11.3
  • UPPMAX: load the module python/3.11.8

Answer HPC2N

Do:

module load GCC/12.3.0 Python/3.11.3

Answer UPPMAX

Do:

module load python/3.11.8

Exercise 1.2

Read the UPPMAX documentation on how to determine if a Python package comes with your Python module.

Is the Python package pandas installed? If yes, which version?

Answer HPC2N

Do:

pip list

So for HPC2N you need to load pandas as a separate module or as part of SciPy-bundle.

Answer UPPMAX

Do:

pip list

Then among the list one can find: pandas 2.2.0

So, yes, the Python package pandasversion 2.2.0 is installed!

Exercise 1.3

Is the Python package tensorflow-cpu installed? If yes, which version?

Answer HPC2N

Do:

pip list

Answer UPPMAX

Do:

pip list

In the list, one cannot find tensorflow-cpu.

So, no, the Python package tensorflow-cpu is not installed.

Exercise 1.4

Is the Python package mhcnuggets installed? If yes, which version?

Answer HPC2N

Do:

pip list

Answer UPPMAX

Do:

pip list

In the list, one cannot find mhcnuggets.

So, no, the Python package mhcnuggets is not installed.

Exercise 2

Learning objectives

  • Practice reading documentation
  • Rehearse the documentation to load a Python machine learning module
  • Apply the documentation to show if a Python package is already installed
  • Observe how it looks like when a package is not installed

Imagine you want to use the Python packages pandas and tensorflow-cpu and mhcnuggets. Here we see that two come already installed with a Python machine learning module.

Exercise 2.1

Read:

Do:

  • UPPMAX: Which of the versions should you use? Load the latest Python machine learning module for that version.
  • HPC2N: Load the latest module

Answer HPC2N

UNTESTED

TensorFlow for CPU is installed as a module that is compatible with Python/3.11.3. The relevant TensorFlow version is 2.13.0.

Find the latest module:

module spider tensorflow

Load the latest module with placeholder version 2.13.0 which is compatible with Python 3.11.3. Note that there are prerequisites

module load GCC/12.3.0 OpenMPI/4.1.5 TensorFlow/2.13.0

Answer UPPMAX

Rackham only has CPUs, hence you will need to load the cpu module:

Do:

module load python_ML_packages/3.11.8-cpu

Exercise 2.2

Read the UPPMAX documentation on how to determine if a Python package comes with your Python module.

Is the Python package pandas installed? If yes, which version?

Answer HPC2N

Do:

pip list

You need to load SciPy-bundle with prerequisites. Here for the one that is compatible with Python 3.11.3

ml GCC/12.3.0 SciPy-bundle/2023.07

If you do pip list now you will see that pandas/2.0.3 is available.

Answer UPPMAX

Do:

pip list

Then among the list one can find: pandas 2.2.0

So, yes, the Python package pandasversion 2.2.0 is installed!

Exercise 2.3

Answer:

  • HPC2N: Is the Python package tensorflow-cpu installed? If yes, which version?
  • UPPMAX: Is the Python package tensorflow-cpu installed? If yes, which version?

Answer HPC2N

UNTESTED

Do:

pip list

Answer UPPMAX

Do:

pip list

In the list, one can find tensorflow-cpu, with version 2.15.0.post1.

So, yes, the Python package tensorflow-cpu is installed.

Exercise 2.4

Is the Python package mhcnuggets installed? If yes, which version?

Answer HPC2N

Do:

pip list

Answer UPPMAX

Do:

pip list

In the list, one cannot find mhcnuggets.

So, no, the Python package mhcnuggets is not installed.

Exercise 3

Learning objectives

  • Practice reading documentation
  • Install a new package.
  • Rehearse determining if a Python package is already installed

Imagine you want to use the Python packages pandas and tensorflow-cpu and mhcnuggets. Even when loading a bigger module, one of the packages was not installed for us. Here we install a Python package ourselves.

Exercise 3.1

Read the UPPMAX documentation on how to install Python packages using pip.

We will be using the first install with --user.

In which folder do the Python packages end up?

Try to come up with a reason why would this be important to know.

Answer

When using --user, your Python packages end up in the .local folder.

This can be important, because it will always be present. That is, it is not part of an isolated environment. If you, for example, work in an 'isolated' environment and run into problems with Python package versions that are not part of it, it is probably those packages in your .local folder. This can be solved by removing that .local folder.

Note that on UPPMAX, one can omit the --user flag, as it is added automatically, as is shown in a warning.

Exercise 3.2

Install the package mhcnuggets.

Answer

Do:

pip install --user mhcnuggets

Exercise 3.3

Confirm that the Python package mhcnuggets is installed now. Which version has been installed?

Answer

Do:

pip list

In the list, one can find mhcnuggets, with version 2.4.1

So, yes, the Python package mhcnuggets is now installed!

Conclusion

You have:

  • determined if a Python package is installed yes/no using pip
  • discovered some Python package are already installed upon loading a module
  • installed a Python package using pip

However, the installed package was put into a shared (as in, not isolated) environment.

Luckily, isolated environments are discussed in this course too :-)