Skip to content

Commit

Permalink
Merge recent changes (#135)
Browse files Browse the repository at this point in the history
* Use minimum stage name formatting.

* Run copier.

* Add a tad more context for failure email.

* Pin pandas version

* Remove benchmarks for now.

* unpin sphinx versions (#134)

---------

Co-authored-by: Max West <110124344+maxwest-uw@users.noreply.github.com>
  • Loading branch information
delucchi-cmu and maxwest-uw committed Sep 28, 2023
1 parent 757f126 commit 883e7d3
Show file tree
Hide file tree
Showing 19 changed files with 108 additions and 27 deletions.
4 changes: 3 additions & 1 deletion .copier-answers.yml
Original file line number Diff line number Diff line change
@@ -1,10 +1,12 @@
# Changes here will be overwritten by Copier
_commit: v1.3.4
_commit: v1.4.2
_src_path: gh:lincc-frameworks/python-project-template
author_email: lincc-frameworks-team@lists.lsst.org
author_name: LINCC Frameworks
create_example_module: false
custom_install: true
include_benchmarks: false
include_docs: true
include_notebooks: true
mypy_type_checking: basic
package_name: hipscat_import
Expand Down
6 changes: 3 additions & 3 deletions .github/pull_request_template.md
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@ If it fixes an open issue, please link to the issue here. If this PR closes an i


## Code Quality
- [ ] I have read the [Contribution Guide](https://lincc-ppt.readthedocs.io/en/latest/source/contributing.html)
- [ ] I have read the Contribution Guide
- [ ] My code follows the code style of this project
- [ ] My code builds (or compiles) cleanly without any errors or warnings
- [ ] My code contains relevant comments and necessary documentation
Expand All @@ -50,8 +50,8 @@ If it fixes an open issue, please link to the issue here. If this PR closes an i
- [ ] Any updated docstrings use the [NumPy docstring format](https://numpydoc.readthedocs.io/en/latest/format.html)

### Build/CI Change Checklist
- [ ] If required or optional dependencies have changed (including version numbers), I have updated the [README](https://github.com/lincc-frameworks/python-project-template/blob/main/README.md) to reflect this
- [ ] If this is a new CI setup, I have added the associated badge to the [README](https://github.com/lincc-frameworks/python-project-template/blob/main/README.md)
- [ ] If required or optional dependencies have changed (including version numbers), I have updated the README to reflect this
- [ ] If this is a new CI setup, I have added the associated badge to the README

<!-- ### Version Change Checklist [For Future Use] -->

Expand Down
3 changes: 1 addition & 2 deletions .github/workflows/build-documentation.yml
Original file line number Diff line number Diff line change
Expand Up @@ -24,9 +24,8 @@ jobs:
run: |
sudo apt-get update
python -m pip install --upgrade pip
if [ -f docs/requirements.txt ]; then pip install -r docs/requirements.txt; fi
pip install .
pip install .[dev]
if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
- name: Install notebook requirements
run: |
sudo apt-get install pandoc
Expand Down
1 change: 0 additions & 1 deletion .github/workflows/linting.yml
Original file line number Diff line number Diff line change
Expand Up @@ -33,4 +33,3 @@ jobs:

run: |
pylint -rn -sn --recursive=y ./src --rcfile=./src/.pylintrc
2 changes: 1 addition & 1 deletion .github/workflows/publish-to-pypi.yml
Original file line number Diff line number Diff line change
Expand Up @@ -36,4 +36,4 @@ jobs:
uses: pypa/gh-action-pypi-publish@27b31702a0e7fc50959f5ad993c78deac1bdfc29
with:
user: __token__
password: ${{ secrets.PYPI_API_TOKEN }}
password: ${{ secrets.PYPI_API_TOKEN }}
5 changes: 5 additions & 0 deletions .github/workflows/smoke-test.yml
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
# This workflow will run daily at 06:45.
# It will install Python dependencies and run tests with a variety of Python versions.
# See documentation for help debugging smoke test issues:
# https://lincc-ppt.readthedocs.io/en/latest/practices/ci_testing.html#version-culprit

name: Unit test smoke test

Expand Down Expand Up @@ -30,6 +32,9 @@ jobs:
pip install .
pip install .[dev]
if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
- name: List dependencies
run: |
pip list
- name: Run unit tests with pytest
run: |
python -m pytest tests
Expand Down
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -138,3 +138,6 @@ dask-worker-space/

# tmp directory
tmp/

# Mac OS
.DS_Store
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ repos:
# This hook should always pass. It will print a message if the local version
# is out of date.
- repo: https://github.com/lincc-frameworks/pre-commit-hooks
rev: v0.1
rev: v0.1.1
hooks:
- id: check-lincc-frameworks-template-version
name: Check template version
Expand Down
25 changes: 25 additions & 0 deletions .prepare_project.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
#!/usr/bin/env bash

echo "Initializing local git repository"
{
gitversion=( $(git version | sed 's/^.* //;s/\./ /g') )
if let "${gitversion[0]}<2"; then
# manipulate directly
git init . && echo 'ref: refs/heads/main' >.git/HEAD
elif let "${gitversion[0]}==2 & ${gitversion[1]}<34"; then
# rename master to main
git init . && { git branch -m master main 2>/dev/null || true; };
else
# set the initial branch name to main
git init --initial-branch=main >/dev/null
fi
} > /dev/null

echo "Installing package and runtime dependencies in local environment"
pip install -e . > /dev/null

echo "Installing developer dependencies in local environment"
pip install -e .'[dev]' > /dev/null

echo "Installing pre-commit"
pre-commit install > /dev/null
7 changes: 3 additions & 4 deletions docs/requirements.txt
Original file line number Diff line number Diff line change
@@ -1,8 +1,7 @@
sphinx==6.1.3
sphinx_rtd_theme==1.2.0
sphinx-autoapi==2.0.1
sphinx
sphinx-rtd-theme
sphinx-autoapi
nbsphinx
ipykernel
ipython
jupytext
jupyter
Expand Down
13 changes: 8 additions & 5 deletions pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ dependencies = [
"healpy",
"hipscat >= 0.1.2",
"ipykernel", # Support for Jupyter notebooks
"pandas",
"pandas < 2.1.0",
"pyarrow",
"tqdm",
"numpy < 1.25",
Expand All @@ -36,9 +36,9 @@ dev = [
"pytest",
"pytest-cov",
"pytest-timeout",
"sphinx==6.1.3", # Used to automatically generate documentation
"sphinx_rtd_theme==1.2.0", # Used to render documentation
"sphinx-autoapi==2.0.1", # Used to automatically generate api documentation
"sphinx", # Used to automatically generate documentation
"sphinx-rtd-theme", # Used to render documentation
"sphinx-autoapi", # Used to automatically generate api documentation
"mypy", # Used for static type checking of files
# if you add dependencies here while experimenting in a notebook and you
# want that notebook to render in your documentation, please add the
Expand All @@ -53,7 +53,7 @@ dev = [

[build-system]
requires = [
"setuptools>=45", # Used to build and package the Python project
"setuptools>=62", # Used to build and package the Python project
"setuptools_scm>=6.2", # Gets release version from git. Makes it available programmatically
]
build-backend = "setuptools.build_meta"
Expand All @@ -69,6 +69,9 @@ timeout = 1
markers = [
"dask: mark tests as having a dask client runtime dependency",
]
testpaths = [
"tests",
]

[tool.coverage.report]
omit = [
Expand Down
5 changes: 3 additions & 2 deletions src/hipscat_import/catalog/resume_plan.py
Original file line number Diff line number Diff line change
Expand Up @@ -40,7 +40,9 @@ def __post_init__(self):

def gather_plan(self):
"""Initialize the plan."""
with tqdm(total=5, desc="Planning ", disable=not self.progress_bar) as step_progress:
with tqdm(
total=4, desc=self.get_formatted_stage_name("Planning"), disable=not self.progress_bar
) as step_progress:
## Make sure it's safe to use existing resume state.
super().safe_to_resume()
step_progress.update(1)
Expand Down Expand Up @@ -71,7 +73,6 @@ def gather_plan(self):
step_progress.update(1)

## Gather keys for execution.
step_progress.update(1)
if not mapping_done:
mapped_keys = set(self.read_log_keys(self.MAPPING_STAGE))
self.map_files = [
Expand Down
9 changes: 7 additions & 2 deletions src/hipscat_import/catalog/run_import.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@

import hipscat_import.catalog.map_reduce as mr
from hipscat_import.catalog.arguments import ImportArguments
from hipscat_import.pipeline_resume_plan import PipelineResumePlan


def _map_pixels(args, client):
Expand Down Expand Up @@ -106,7 +107,9 @@ def run(args, client):
raise ValueError("args must be type ImportArguments")
_map_pixels(args, client)

with tqdm(total=2, desc="Binning ", disable=not args.progress_bar) as step_progress:
with tqdm(
total=2, desc=PipelineResumePlan.get_formatted_stage_name("Binning"), disable=not args.progress_bar
) as step_progress:
raw_histogram = args.resume_plan.read_histogram(args.mapping_healpix_order)
step_progress.update(1)
if args.constant_healpix_order >= 0:
Expand Down Expand Up @@ -141,7 +144,9 @@ def run(args, client):
_reduce_pixels(args, destination_pixel_map, client)

# All done - write out the metadata
with tqdm(total=6, desc="Finishing", disable=not args.progress_bar) as step_progress:
with tqdm(
total=6, desc=PipelineResumePlan.get_formatted_stage_name("Finishing"), disable=not args.progress_bar
) as step_progress:
catalog_info = args.to_catalog_info(int(raw_histogram.sum()))
io.write_provenance_info(
catalog_base_dir=args.catalog_path,
Expand Down
5 changes: 4 additions & 1 deletion src/hipscat_import/index/run_index.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@

import hipscat_import.index.map_reduce as mr
from hipscat_import.index.arguments import IndexArguments
from hipscat_import.pipeline_resume_plan import PipelineResumePlan


def run(args):
Expand All @@ -16,7 +17,9 @@ def run(args):
rows_written = mr.create_index(args)

# All done - write out the metadata
with tqdm(total=4, desc="Finishing", disable=not args.progress_bar) as step_progress:
with tqdm(
total=4, desc=PipelineResumePlan.get_formatted_stage_name("Finishing"), disable=not args.progress_bar
) as step_progress:
# pylint: disable=duplicate-code
catalog_info = args.to_catalog_info(int(rows_written))
write_metadata.write_provenance_info(
Expand Down
6 changes: 5 additions & 1 deletion src/hipscat_import/pipeline.py
Original file line number Diff line number Diff line change
Expand Up @@ -63,7 +63,11 @@ def _send_failure_email(args: RuntimeArguments, exception: Exception):
message = EmailMessage()
message["Subject"] = "hipscat-import failure."
message["To"] = args.completion_email_address
message.set_content(f"failed with message:\n{exception}")
message.set_content(
f"output_catalog_name: {args.output_catalog_name}"
"\n\nSee logs for more details"
f"\n\nFailed with message:\n\n{exception}"
)

_send_email(message)

Expand Down
16 changes: 15 additions & 1 deletion src/hipscat_import/pipeline_resume_plan.py
Original file line number Diff line number Diff line change
Expand Up @@ -110,9 +110,10 @@ def wait_for_futures(self, futures, stage_name):
stage_name(str): name of the stage (e.g. mapping, reducing)
"""
some_error = False
formatted_stage_name = self.get_formatted_stage_name(stage_name)
for future in tqdm(
as_completed(futures),
desc=stage_name,
desc=formatted_stage_name,
total=len(futures),
disable=(not self.progress_bar),
):
Expand All @@ -123,3 +124,16 @@ def wait_for_futures(self, futures, stage_name):
if some_error: # pragma: no cover
raise RuntimeError(f"Some {stage_name} stages failed. See logs for details.")
self.touch_done_file(stage_name)

@staticmethod
def get_formatted_stage_name(stage_name) -> str:
"""Create a stage name of consistent minimum length. Ensures that the tqdm
progress bars can line up nicely when multiple stages must run.
Args:
stage_name (str): name of the stage (e.g. mapping, reducing)
"""
if stage_name is None or len(stage_name) == 0:
stage_name = "progress"

return f"{stage_name.capitalize(): <10}"
4 changes: 3 additions & 1 deletion src/hipscat_import/soap/resume_plan.py
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,9 @@ def __init__(self, args: SoapArguments):

def gather_plan(self, args):
"""Initialize the plan."""
with tqdm(total=5, desc="Planning ", disable=not self.progress_bar) as step_progress:
with tqdm(
total=3, desc=self.get_formatted_stage_name("Planning"), disable=not self.progress_bar
) as step_progress:
## Make sure it's safe to use existing resume state.
super().safe_to_resume()
step_progress.update(1)
Expand Down
5 changes: 4 additions & 1 deletion src/hipscat_import/soap/run_soap.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
from hipscat.io import file_io, write_metadata
from tqdm import tqdm

from hipscat_import.pipeline_resume_plan import PipelineResumePlan
from hipscat_import.soap.arguments import SoapArguments
from hipscat_import.soap.map_reduce import combine_partial_results, count_joins
from hipscat_import.soap.resume_plan import SoapPlan
Expand Down Expand Up @@ -36,7 +37,9 @@ def run(args, client):
resume_plan.wait_for_counting(futures)

# All done - write out the metadata
with tqdm(total=4, desc="Finishing", disable=not args.progress_bar) as step_progress:
with tqdm(
total=4, desc=PipelineResumePlan.get_formatted_stage_name("Finishing"), disable=not args.progress_bar
) as step_progress:
# pylint: disable=duplicate-code
# Very similar to /index/run_index.py
combine_partial_results(args.tmp_path, args.catalog_path)
Expand Down
14 changes: 14 additions & 0 deletions tests/hipscat_import/test_pipeline_resume_plan.py
Original file line number Diff line number Diff line change
Expand Up @@ -74,3 +74,17 @@ def test_safe_to_resume(tmp_path):
## If there are no more intermediate files, we don't need to set resume.
plan.clean_resume_files()
plan.safe_to_resume()


def test_formatted_stage_name():
formatted = PipelineResumePlan.get_formatted_stage_name(None)
assert formatted == "Progress "

formatted = PipelineResumePlan.get_formatted_stage_name("")
assert formatted == "Progress "

formatted = PipelineResumePlan.get_formatted_stage_name("stage")
assert formatted == "Stage "

formatted = PipelineResumePlan.get_formatted_stage_name("very long stage name")
assert formatted == "Very long stage name"

0 comments on commit 883e7d3

Please sign in to comment.