TROUBLESHOOTING.md


Known problems and solutions

  • Problem: docker run crashes

    • Seen on: Windows
    • Solution: Make sure that your machine has enough free disk space (you can check this with a tool such as SpaceSniffer)
  • Problem: docker run works, but the 'work' directory is empty in the Jupyter notebook

    • Seen on: Windows 10, Windows 11 and macOS
    • Solution: For some students $(pwd) returns an empty string. Fix this by hardcoding the location containing the Assignments, for example:
docker run -p 8888:8888 -v "c:\users\jo\my documents\big data":/home/jovyan/work  -e JUPYTER_ENABLE_LAB=yes mtasnim/jupyter-pyspark-duckdb
  • In Windows (PowerShell) you should use curly braces ${pwd} instead of parentheses $(pwd):
docker run -p 8888:8888 -v ${pwd}:/home/jovyan/work  -e JUPYTER_ENABLE_LAB=yes mtasnim/jupyter-pyspark-duckdb
  • Problem: Warning that Docker cannot be used unless virtualization is enabled in the BIOS.

    • Seen on: Windows
    • Solution: Enable virtualization in the BIOS
  • Problem: Docker does not start on a new M1 Mac

    • Seen on: macOS (M1)
    • Solution: Make sure that you installed Docker for M1 processors, and not Docker for Intel processors.
  • Problem: Docker crashes under Ubuntu with a message like exec user process caused: exec format error

    • Seen on: Ubuntu 20
    • Solution: Follow the instructions outlined here to install additional virtualization software
  • Problem: Multiple Jupyter instances running at the same time

    • Seen on: various operating systems
    • Solution: First open localhost:8888 and exit the notebook that is already running there, then start the Docker container (and the Jupyter notebook inside it)
  • Problem: Docker error invalid reference format: repository name must be lowercase

  • Problem: Web browser hangs when connecting to Docker

    • Seen on: macOS (M1)
    • Solution: In the macOS Settings, open 'Full Disk Access', select Docker, and then restart Docker
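
For the empty 'work' directory problem above, one way to catch the issue early on macOS or Linux is to check the host directory before starting the container. The sketch below is only an illustration: build_mount is a hypothetical helper name (it is not part of Docker), and it assumes a POSIX shell. On Windows, the hardcoded path or ${pwd} approach described above still applies.

```shell
#!/bin/sh
# Sketch: build the "-v" mount argument and refuse an empty host directory.
# build_mount is a hypothetical helper, not a Docker command.
build_mount() {
    host_dir="$1"
    if [ -z "$host_dir" ]; then
        # An empty host path is what leaves the 'work' directory empty.
        echo "error: host directory is empty; pass the Assignments folder explicitly" >&2
        return 1
    fi
    # Container path taken from the docker run commands in this document.
    printf '%s:/home/jovyan/work\n' "$host_dir"
}

# Usage (commented out so that running this file does not start a container):
# mount_arg=$(build_mount "$(pwd)") && \
#   docker run -p 8888:8888 -v "$mount_arg" -e JUPYTER_ENABLE_LAB=yes mtasnim/jupyter-pyspark-duckdb
```

If the helper reports an error, pass the full path of the folder containing the Assignments instead of "$(pwd)".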

Manual installation of the environment without Docker under Windows

We strongly recommend that everyone use the Docker image, to ensure that there are no version conflicts and that your code can interact correctly with the grading server. If you cannot get Docker to work under Windows, you can also install the programming environment manually, without Docker. Please follow the steps outlined below:

To run the assignments without Docker, you will need the following dependencies:

Additionally, you will need the following Python libraries:

  • JupyterLab, which you can install with the conda command conda install -c conda-forge jupyterlab
  • DuckDB, which you can install with pip install duckdb==0.3.2
  • PySpark (version 3.2.0) - Follow the installation instructions here. Since you would already have OpenJDK 11 and Anaconda installed, skip to the section “PySpark Install on Windows”. If you are still having problems on Windows after these steps, step 5 from this guide for using PySpark from Jupyter in Windows might help.

Note that the instructions on the linked webpage are written for PySpark version 3.0.0; please adapt them accordingly to download and install PySpark version 3.2.0. To find version 3.2.0 on the Spark Downloads page, navigate to the Spark release archives and download the file spark-3.2.0-bin-hadoop3.2.tgz
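
Because the manual installation depends on exact versions, it can help to confirm them after installing. The following is a minimal sketch, assuming a POSIX shell is available (for example Git Bash or WSL on Windows). The helper names spark_archive_name and check_version are hypothetical and only illustrate the idea; the version numbers are the ones given above.

```shell
#!/bin/sh
# Versions taken from the instructions above.
SPARK_VERSION="3.2.0"
HADOOP_PROFILE="hadoop3.2"
DUCKDB_VERSION="0.3.2"

# Hypothetical helper: format the archive file name to look for
# in the Spark release archives.
spark_archive_name() {
    printf 'spark-%s-bin-%s.tgz\n' "$1" "$2"
}

# Hypothetical helper: compare an expected version with the one found.
check_version() {
    name="$1"; expected="$2"; actual="$3"
    if [ "$expected" = "$actual" ]; then
        echo "$name OK ($actual)"
    else
        echo "$name mismatch: expected $expected, found $actual" >&2
        return 1
    fi
}

# Print the file name to download.
spark_archive_name "$SPARK_VERSION" "$HADOOP_PROFILE"

# Uncomment once Python and the libraries are installed:
# check_version duckdb "$DUCKDB_VERSION" "$(python -c 'import duckdb; print(duckdb.__version__)')"
# check_version pyspark "$SPARK_VERSION" "$(python -c 'import pyspark; print(pyspark.__version__)')"
```

A mismatch here suggests that a different version was installed than the one the assignments expect.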