
Commit ab6b1b2

🚧 fix ci runs (mamba and numpy related) (#81)
* 🚧 switch mamba installation - see if snakemake envs are somehow cached
* 🐛 specify python version, move ls
* 🚧 deactivate some workflows, run relative ls command
* try not to cache
* 🚧 test using venv created by codespace with python 3.12 - might be that I need to create it (not sure what changed in runner configurations)
* try to use full snakemake installation
* 🚧 use miniconda for pypi installation test
* try miniconda again - snakemake environment has its own mamba installation - auto-activate environment "test"
* install build dependencies, fix ubuntu first
* 🐛 try to pin mamba below 2.0 (snakemake/snakemake#3108)
* "test" should be activated by default
* 🚧 conda env not activated...
* 🐛 pip does not install in environment
* 🚧 experiment
* 🐛 shell was not initiated
* 🐛 test installing njab separately
* 🐛 order matters!
* try again with new order, add umap-learn explicitly
* 🐛 do not re-install njab
* restrict scipy (trapz missing in lifelines) - latest scipy not supported by lifelines
* 🐛 exclude numpy 2.0 for now
* numpy try two
* swap numpy and njab, adapt other pkgs to what they were before
* add back umap-learn, relax constraints
* 🚧 in-package single requirement: single packages cannot be specified to just ignore their dependencies.
* ➖ remove scipy dependency - leave it to njab to install dependencies in a second step.
* ⬆️ remove support for python 3.8 (end-of-life)
* 🎨 setuptools_scm uses tags to determine version, add tags
* 🐛 tags not fetched without entire history (see actions/checkout#1471)
* 🎨 clean up workflow file
* ✨ add njab after update to requirements - enable more workflows again (using mamba-constrained snakemake environment)
* 🔥 remove comments, ⏪ add back tests
* 🐛 make order explicit (by feature frequency, or bin and bin count)
* 🐛 fix order of example more explicitly.
* 🐛 actually test latest version of pimms, remove comments
* 🐛 runs natively in colab without issues
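The scipy pin mentioned above ("trapz missing in lifelines") traces back to SciPy dropping its deprecated `trapz` alias in favor of `trapezoid`, which older lifelines releases still import. A minimal sketch (not code from this repo) of the kind of import guard that makes such a pin unnecessary:

# Sketch only, not from this repo: tolerate SciPy versions with and
# without the deprecated `trapz` alias (removed in newer SciPy releases).
import numpy as np

try:
    from scipy.integrate import trapezoid  # modern SciPy
except ImportError:
    from scipy.integrate import trapz as trapezoid  # older SciPy

x = np.linspace(0.0, 1.0, 101)
print(trapezoid(x ** 2, x))  # ~1/3, same result on either SciPy version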
1 parent 6f391c0 commit ab6b1b2

File tree: 13 files changed, +96 −290 lines

.github/workflows/ci.yaml

Lines changed: 14 additions & 20 deletions
@@ -21,35 +21,26 @@ jobs:
           "macos-13",
           # "windows-latest" # rrcovNA cannot be build from source on windows-server
         ]
-        python-version: ["3.8", "3.9", "3.10"]
+        python-version: ["3.9", "3.10", "3.11", "3.12"]
     steps:
       - name: Checkout
         uses: actions/checkout@v4
       - name: Set up Miniconda
-        # ! change action https://github.com/mamba-org/setup-micromamba
         uses: conda-incubator/setup-miniconda@v3
         with:
-          miniforge-variant: Mambaforge
-          # miniforge-version: latest
-          use-mamba: true
-          channel-priority: disabled
           python-version: ${{ matrix.python-version }}
+          channel-priority: strict
           environment-file: snakemake_env.yml
           activate-environment: snakemake
           auto-activate-base: true
-          # auto-update-conda: true
+          auto-update-conda: true
       - name: inspect-conda-environment
         run: |
           conda info
           conda list
           conda env export --from-history --no-builds > environment.yml
           conda env export --no-builds
           conda env export --no-builds > environment_w_versions.yml
-      # - name: test-r-kernel-imports
-      #   run: |
-      #     Rscript -e "library(stringi)"
-      #     Rscript -e "library(stringr)"
-      #     Rscript -e "library(reshape2)"
       - name: Dry-Run demo workflow (integration test)
         run: |
           cd project
@@ -75,8 +66,8 @@ jobs:
           name: ${{ matrix.os }}-${{ matrix.python-version }}-example-workflow-results
           path: |
             project/runs/example/
-            environment.yml
-            environment_w_versions.yml
+            snakemake_env
+            project/.snakemake/conda/

   run-unit-local-pip-installation:
     runs-on: ${{ matrix.os }}
@@ -85,25 +76,28 @@ jobs:
       fail-fast: false
       matrix:
         os: ["ubuntu-latest", "macos-latest", "windows-latest"]
-        python-version: ["3.8", "3.9", "3.10", "3.11", "3.12"]
+        python-version: ["3.9", "3.10", "3.11", "3.12"]
     steps:
       - uses: actions/checkout@v4
+        with:
+          fetch-tags: true
+          fetch-depth: 0

       - uses: actions/setup-python@v5
         with:
           python-version: ${{ matrix.python-version }}

       - name: install pimms
-        run: python -m pip install .
-
+        run: pip install .
+
       - name: Install pytest
-        run: python -m pip install pytest pytest-cov
+        run: pip install pytest pytest-cov

       - name: Run pytest
         run: pytest .

       - name: Install papermill
-        run: python -m pip install papermill ipykernel
+        run: pip install papermill ipykernel

       - name: View papermill help message for notebooks (as scripts)
         run: |
@@ -141,4 +135,4 @@ jobs:
       - uses: pypa/gh-action-pypi-publish@release/v1
         with:
           user: __token__
-          password: ${{ secrets.PYPI_API_TOKEN }}
+          password: ${{ secrets.PYPI_API_TOKEN }}
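The new `fetch-tags: true` / `fetch-depth: 0` checkout settings above relate to the setuptools_scm bullets in the commit message: setuptools_scm reads git tags to determine the package version, and a shallow clone without tags (see actions/checkout#1471) makes it fall back to a default development version. A quick local check, assuming setuptools_scm is installed:

# Sketch: show which version setuptools_scm derives for the current checkout.
# Assumes a clone with full history and tags (fetch-depth: 0, fetch-tags: true);
# on a shallow, tagless clone this falls back to a 0.1.devN-style default.
from setuptools_scm import get_version

print(get_version())  # e.g. a tag-based version, or a .devN+g<sha> variant between tags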

.github/workflows/ci_workflow.yaml

Lines changed: 3 additions & 4 deletions
@@ -1,4 +1,4 @@
-name: run workflow with conda envs
+name: run workflow (v1) with conda envs
 on:
   push:
     branches: [main, dev]
@@ -31,13 +31,12 @@ jobs:
         # ! change action https://github.com/mamba-org/setup-micromamba
         uses: conda-incubator/setup-miniconda@v3
         with:
-          miniforge-variant: Mambaforge
-          use-mamba: true
-          channel-priority: disabled
+          channel-priority: strict
           python-version: ${{ matrix.python-version }}
           environment-file: snakemake_env.yml
           activate-environment: snakemake
           auto-activate-base: true
+          auto-update-conda: true
       - name: inspect-conda-environment
         run: |
           conda info

.github/workflows/test_pkg_on_colab.yaml

Lines changed: 3 additions & 2 deletions
@@ -20,11 +20,12 @@ jobs:
       - name: Install pimms-learn (from branch) and papermill
         if: github.event_name == 'pull_request'
         run: |
-          python3 -m pip install pimms-learn papermill
+          pip install .
+          pip install papermill
       - name: Install pimms-learn (from PyPI) and papermill
         if: github.event_name == 'schedule'
         run: |
-          python3 -m pip install pimms-learn papermill
+          pip install pimms-learn papermill
       - name: Run tutorial
         run: |
           cd project

.github/workflows/workflow_website.yaml

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
-name: Build workflow website on public Alzheimer dataset (for protein groups)
+name: Build workflow (v2) website on public Alzheimer dataset (for protein groups)
 on:
   pull_request:
     branches: [main, dev]
@@ -73,4 +73,4 @@ jobs:
         uses: peaceiris/actions-gh-pages@v4
         with:
           github_token: ${{ secrets.GITHUB_TOKEN }}
-          publish_dir: project/runs/alzheimer_study/_build/
+          publish_dir: project/runs/alzheimer_study/_build/

.readthedocs.yaml

Lines changed: 2 additions & 2 deletions
@@ -9,7 +9,7 @@ version: 2
 build:
   os: ubuntu-22.04
   tools:
-    python: "3.8"
+    python: "3.10"
   # You can also specify other tool versions:
   # nodejs: "19"
   # rust: "1.64"
@@ -32,4 +32,4 @@ python:
   - method: pip
     path: .
     extra_requirements:
-      - docs
+      - docs

environment.yml

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ channels:
   - plotly
   # - defaults
 dependencies:
-  - python>=3.8,<=3.12
+  - python>=3.9,<=3.12
   - numpy
   - pandas>=1
   - scipy>=1.6

pimmslearn/imputation.py

Lines changed: 3 additions & 159 deletions
@@ -5,165 +5,18 @@

 """
-from typing import Tuple, Dict
-from sklearn.neighbors import NearestNeighbors
-import scipy
+import logging
+from typing import Dict, Tuple
+
 import numpy as np
 import pandas as pd
-import logging

 logger = logging.getLogger(__name__)


 RANDOMSEED = 123


-def impute_missing(protein_values, mean=None, std=None):
-    """
-    Imputation is based on the mean and standard deviation
-    from the protein_values.
-    If mean and standard deviation (std) are given,
-    missing values are imputed and protein_values are returned imputed.
-    If no mean and std are given, the mean and std are computed from
-    the non-missing protein_values.
-
-    Parameters
-    ----------
-    protein_values: Iterable
-    mean: float
-    std: float
-
-    Returns
-    ------
-    protein_values: pandas.Series
-    """
-    raise NotImplementedError('Will be the main function combining features')
-    # clip by zero?
-
-
-def _select_data(data: pd.DataFrame, threshold: float):
-    """Select (protein-) columns for imputation.
-
-    Based on the threshold representing the minimum proportion of available
-    data per protein, the columns of a `pandas.DataFrame` are selected.
-
-    Parameters
-    ----------
-    data: pandas.DataFrame
-    threshold: float
-        Threshold of percentage of non-missing values to select a column/feature.
-    """
-    columns_to_impute = data.notnull().mean() >= threshold
-    return columns_to_impute
-
-
-def _sparse_coo_array(data: pd.DataFrame):
-    """Return a sparse scipy matrix from dense `pandas.DataFrame` with many
-    missing values.
-    """
-    indices = np.nonzero(~np.isnan(data.to_numpy()))
-    data_selected_sparse = data.to_numpy()
-    data_selected_sparse = scipy.sparse.coo_matrix(
-        (data_selected_sparse[indices], indices),
-        shape=data_selected_sparse.shape)
-    return data_selected_sparse
-
-
-def _get_weighted_mean(distances, data):
-    """Compute weighted mean ignoring
-    identical entries"""
-    mask = distances > 0.0
-    weights = distances[mask] / distances[mask].sum()
-    weighted_sum = data.loc[mask].mul(weights, axis=0)
-    mean_imputed = weighted_sum.sum() / sum(mask)
-    return mean_imputed
-
-
-# define imputation methods
-# could be done in PCA transformed space
-def imputation_KNN(data, alone=True, threshold=0.5):
-    """
-
-    Parameters
-    ----------
-    data: pandas.DataFrame
-    alone: bool  # is not used
-    threshold: float
-        Threshold of missing data by column in interval (0, 1)
-    """
-    mask_selected = _select_data(data=data, threshold=threshold)
-    data_selected = data.loc[:, mask_selected].copy()
-    data_selected_sparse = _sparse_coo_array(data_selected)
-    # impute
-    knn_fitted = NearestNeighbors(n_neighbors=3, algorithm='brute').fit(
-        data_selected_sparse)
-    fit_distances, fit_neighbors = knn_fitted.kneighbors(data_selected_sparse)
-    for i, (distances, ids) in enumerate(zip(fit_distances, fit_neighbors)):
-        mean_imputed = _get_weighted_mean(distances, data_selected.loc[ids])
-        if all(distances == 0.0):
-            logger.warning(f"Did not find any neighbor for int-id: {i}")
-        else:
-            assert i == ids[distances == 0.0], (
-                "None or more then one identical data points "
-                "for ids: {}".format(ids[distances == 0.0])
-            )
-        mask = data_selected.iloc[i].isna()
-        data_selected.loc[i, mask] = mean_imputed.loc[mask]  # SettingWithCopyError
-
-    data.update(data_selected)
-    return data
-
-
-def imputation_normal_distribution(log_intensities: pd.Series,
-                                   mean_shift=1.8,
-                                   std_shrinkage=0.3,
-                                   copy=True):
-    """Impute missing log-transformed intensity values of a single feature.
-    Samples one value for imputation for all samples.
-
-    Parameters
-    ----------
-    log_intensities: pd.Series
-        Series of normally distributed values of a single feature (for all samples/runs).
-        Here usually log-transformed intensities.
-    mean_shift: integer, float
-        Shift the mean of the log_intensities by factors of their standard
-        deviation to the negative.
-    std_shrinkage: float
-        Value greater than zero by which to shrink (or inflate) the
-        standard deviation of the log_intensities.
-    """
-    np.random.seed(RANDOMSEED)
-    if not isinstance(log_intensities, pd.Series):
-        try:
-            log_intensities.Series(log_intensities)
-            logger.warning("Series created of Iterable.")
-        except BaseException:
-            raise ValueError(
-                "Plese provided data which is a pandas.Series or an Iterable")
-    if mean_shift < 0:
-        raise ValueError(
-            "Please specify a positive float as the std.-dev. is non-negative.")
-    if std_shrinkage <= 0:
-        raise ValueError(
-            "Please specify a positive float as shrinkage factor for std.-dev.")
-    if std_shrinkage >= 1:
-        logger.warning("Standard Deviation will increase for imputed values.")
-
-    mean = log_intensities.mean()
-    std = log_intensities.std()
-
-    mean_shifted = mean - (std * mean_shift)
-    std_shrinked = std * std_shrinkage
-
-    if copy:
-        log_intensities = log_intensities.copy(deep=True)
-
-    return log_intensities.where(log_intensities.notna(),
-                                 np.random.normal(mean_shifted, std_shrinked))
-
-
 def impute_shifted_normal(df_wide: pd.DataFrame,
                           mean_shift: float = 1.8,
                           std_shrinkage: float = 0.3,
@@ -224,15 +77,6 @@ def impute_shifted_normal(df_wide: pd.DataFrame,
     return imputed_shifted_normal


-def imputation_mixed_norm_KNN(data):
-    # impute columns with less than 50% missing values with KNN
-    data = imputation_KNN(data, alone=False)  # ToDo: Alone is not used.
-    # impute remaining columns based on the distribution of the protein
-    data = imputation_normal_distribution(
-        data, mean_shift=1.8, std_shrinkage=0.3)
-    return data
-
-
 def compute_moments_shift(observed: pd.Series, imputed: pd.Series,
                           names: Tuple[str, str] = ('observed', 'imputed')) -> Dict[str, float]:
     """Summary of overall shift of mean and std. dev. of predictions for a imputation method."""

pimmslearn/pandas/__init__.py

Lines changed: 2 additions & 1 deletion
@@ -7,7 +7,8 @@
 import omegaconf
 import pandas as pd

-from pimmslearn.pandas.calc_errors import calc_errors_per_feat, get_absolute_error
+from pimmslearn.pandas.calc_errors import (calc_errors_per_feat,
+                                           get_absolute_error)

 __all__ = [
     'calc_errors_per_feat',

project/workflow/envs/pimms.yaml

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ channels:
   - plotly
   # - defaults
 dependencies:
-  - python>=3.8,<=3.12
+  - python>=3.9,<=3.12
   - numpy
   - pandas>=1
   - scipy>=1.6
