Commit
updates for version 0.15.0
improve tests, change parameter names and output types for non-class
functions - `merge_dicts`, `standardize`, and `change_dtype`. Also, use
numpy operations for CAP metric calculations.
donishadsmith committed Jul 22, 2024
1 parent 1af2bb6 commit 7c54583
Showing 20 changed files with 711 additions and 4,420 deletions.
16 changes: 15 additions & 1 deletion .github/workflows/testing.yaml
@@ -28,7 +28,6 @@ jobs:
      run: |
        pip install pytest
        pytest test_CAP.py
-       pytest test_merge_dicts.py
      shell: bash
      working-directory: tests
    - name: Run TimeseriesExtractor tests
@@ -39,3 +38,18 @@ jobs:
        pytest test_TimeseriesExtractor_modified.py
      shell: bash
      working-directory: tests
+   - name: Run merge_dicts test
+     run: |
+       pytest test_merge_dicts.py
+     shell: bash
+     working-directory: tests
+   - name: Run standardize test
+     run: |
+       pytest test_standardize.py
+     shell: bash
+     working-directory: tests
+   - name: Run change_dtype test
+     run: |
+       pytest test_change_dtype.py
+     shell: bash
+     working-directory: tests
168 changes: 159 additions & 9 deletions CHANGELOG.md
@@ -42,11 +42,161 @@ noted in the changelog (i.e new functions or parameters, changes in parameter de
improvements/enhancements. Fixes and modifications will be backwards compatible.
- *.postN* : Consists of only metadata-related changes, such as updates to type hints or doc strings/documentation.

## [0.15.0] - 2024-07-21
### 🚀 New/Added
- `save_reduced_dicts` parameter to `merge_dicts` so that the reduced dictionaries can also be saved instead of only
being returned.
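
  A hypothetical sketch of the new option (the pickle file names and the `output_dir` usage are assumptions for
  illustration, not taken from this commit):

  ```python
  from neurocaps.analysis import merge_dicts

  # Assumed file names; with save_reduced_dicts=True the reduced dictionaries are
  # saved as well, instead of only the merged dictionary being returned
  merge_dicts(subject_timeseries_list=["task_A.pkl", "task_B.pkl"],
              output_dir="results", save_reduced_dicts=True, return_merged_dict=True)
  ```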

### ♻ Changed
- Some parameter names, inputs, and outputs for the non-class functions - `merge_dicts`, `change_dtype`, and
`standardize` - have changed to improve consistency across these functions (a short sketch follows this list).
  - `merge_dicts`
    - `return_combined_dict` has been changed to `return_merged_dict`.
    - `file_name` has been changed to `file_names` since the reduced dicts can also be saved now.
  - `standardize` & `change_dtype`
    - `subject_timeseries` has been changed to `subject_timeseries_list`, the same as in `merge_dicts`.
    - `file_name` has been changed to `file_names`.
    - `return_dict` has been changed to `return_dicts`.
  - The returned dictionary for `merge_dicts`, `change_dtype`, and `standardize` is now always
    `dict[str, dict[str, dict[str, np.ndarray]]]`.
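
  A minimal sketch of the new conventions (the array shapes are arbitrary; the "dict_0" key mirrors the updated
  docs examples below):

  ```python
  import numpy as np
  from neurocaps.analysis import standardize

  # Hypothetical input: 2 subjects, 2 runs each, 50 timepoints x 100 ROIs
  subject_timeseries = {str(x): {f"run-{y}": np.random.rand(50, 100) for y in range(1, 3)}
                        for x in range(1, 3)}

  # Input is now a list; the output is keyed by position in that list - "dict_0", "dict_1", ...
  standardized = standardize(subject_timeseries_list=[subject_timeseries], return_dicts=True)
  print(standardized["dict_0"]["1"]["run-1"].shape)  # (50, 100)
  ```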

- In `CAP.calculate_metrics`, the metric calculations, except for "temporal_fraction", have been refactored to remove
an import or to use numpy operations to reduce code.
- "counts"
- Previous Code:
```python
# Get frequency
frequency_dict = dict(collections.Counter(predicted_subject_timeseries[subj_id][curr_run]))
# Sort the keys
sorted_frequency_dict = {key: frequency_dict[key] for key in sorted(list(frequency_dict))}
# Add zero to missing CAPs for participants that exhibit zero instances of a certain CAP
if len(sorted_frequency_dict) != len(cap_numbers):
    sorted_frequency_dict = {cap_number: sorted_frequency_dict[cap_number] if cap_number in
                             list(sorted_frequency_dict) else 0 for cap_number in cap_numbers}
# Replace zeros with nan for groups with fewer caps than the group with the max caps
if len(cap_numbers) > group_cap_counts[group]:
    sorted_frequency_dict = {cap_number: sorted_frequency_dict[cap_number] if
                             cap_number <= group_cap_counts[group] else float("nan")
                             for cap_number in cap_numbers}
```
- Refactored Code:
```python
# Get frequency
frequency_dict = {key: np.where(predicted_subject_timeseries[subj_id][curr_run] == key, 1, 0).sum()
                  for key in range(1, group_cap_counts[group] + 1)}
# Replace zeros with nan for groups with fewer caps than the group with the max caps
if max(cap_numbers) > group_cap_counts[group]:
    for i in range(group_cap_counts[group] + 1, max(cap_numbers) + 1):
        frequency_dict.update({i: float("nan")})
```
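- Worked Example: a quick check of the refactored counting on a hypothetical run (the array is illustrative, not from the package):
```python
import numpy as np

# Hypothetical run labels for a group with 3 CAPs
arr = np.array([1, 2, 1, 1, 1, 3])
frequency_dict = {key: np.where(arr == key, 1, 0).sum() for key in range(1, 4)}
# CAP 1 appears 4 times; CAPs 2 and 3 appear once each -> {1: 4, 2: 1, 3: 1}
```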
- "temporal_fraction"
- Previous Code:
```python
proportion_dict = {key: item / (len(predicted_subject_timeseries[subj_id][curr_run]))
                   for key, item in sorted_frequency_dict.items()}
```
- Refactored Code: only some variable names have changed.
```python
proportion_dict = {key: value / (len(predicted_subject_timeseries[subj_id][curr_run]))
                   for key, value in frequency_dict.items()}
```
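- Worked Example (continuing the hypothetical counts above; the run has 6 volumes):
```python
# {1: 4, 2: 1, 3: 1} over 6 volumes -> CAP 1 occupies 4/6 of the run, CAPs 2 and 3 occupy 1/6 each
proportion_dict = {key: value / 6 for key, value in {1: 4, 2: 1, 3: 1}.items()}
```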
- "persistence"
- Previous Code:
```python
# Initialize variables
persistence_dict = {}
uninterrupted_volumes = []
count = 0
# Iterate through caps
for target in cap_numbers:
    # Iterate through each element and count uninterrupted volumes that equal target
    for index in range(0, len(predicted_subject_timeseries[subj_id][curr_run])):
        if predicted_subject_timeseries[subj_id][curr_run][index] == target:
            count += 1
        # Store count in list if interrupted and not zero
        else:
            if count != 0:
                uninterrupted_volumes.append(count)
            # Reset counter
            count = 0
    # In the event a participant only occupies one CAP, and to ensure final counts are added
    if count > 0:
        uninterrupted_volumes.append(count)
    # If uninterrupted_volumes is not empty, sum the elements, divide by the number of segments,
    # and multiply by the repetition time if provided
    if len(uninterrupted_volumes) > 0:
        persistence_value = np.array(uninterrupted_volumes).sum() / len(uninterrupted_volumes)
        if tr:
            persistence_dict.update({target: persistence_value * tr})
        else:
            persistence_dict.update({target: persistence_value})
    else:
        # Zero indicates that a participant has zero instances of the CAP
        persistence_dict.update({target: 0})
    # Reset variables
    count = 0
    uninterrupted_volumes = []

# Replace zeros with nan for groups with fewer caps than the group with the max caps
if len(cap_numbers) > group_cap_counts[group]:
    persistence_dict = {cap_number: persistence_dict[cap_number] if
                        cap_number <= group_cap_counts[group] else float("nan")
                        for cap_number in cap_numbers}
```
- Refactored Code:
```python
# Initialize variable
persistence_dict = {}
# Iterate through caps
for target in cap_numbers:
    # Binary representation of array - if [1,2,1,1,1,3] and target is 1, then it is [1,0,1,1,1,0]
    binary_arr = np.where(predicted_subject_timeseries[subj_id][curr_run] == target, 1, 0)
    # Get indices of values that equal 1; [0,2,3,4]
    target_indices = np.where(binary_arr == 1)[0]
    # Count the transitions; indices where diff > 1 mark a transition; diff of indices = [2,1,1];
    # binary for diff > 1 = [1,0,0]; thus, segments = transitions + first_sequence(1) = 2
    segments = np.where(np.diff(target_indices, n=1) > 1, 1, 0).sum() + 1
    # Sum of ones in the binary array divided by segments, then multiplied by 1 or the tr; segments is
    # always at least 1 due to the + 1; np.where(np.diff(target_indices, n=1) > 1, 1, 0).sum() is 0 when
    # the array is empty or the condition isn't met
    persistence_dict.update({target: (binary_arr.sum() / segments) * (tr if tr else 1)})

# Replace zeros with nan for groups with fewer caps than the group with the max caps
if max(cap_numbers) > group_cap_counts[group]:
    for i in range(group_cap_counts[group] + 1, max(cap_numbers) + 1):
        persistence_dict.update({i: float("nan")})
```
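- Worked Example: verifying the segment logic on the hypothetical run from the comments above, with target CAP 1:
```python
import numpy as np

arr = np.array([1, 2, 1, 1, 1, 3])
binary_arr = np.where(arr == 1, 1, 0)          # [1, 0, 1, 1, 1, 0]
target_indices = np.where(binary_arr == 1)[0]  # [0, 2, 3, 4]
# np.diff gives [2, 1, 1]; one gap > 1, so segments = 1 + 1 = 2
segments = np.where(np.diff(target_indices, n=1) > 1, 1, 0).sum() + 1
print(binary_arr.sum() / segments)  # 4 volumes / 2 segments = 2.0 (multiplied by tr if supplied)
```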
- "transition_frequency"
- Previous Code:
```python
count = 0
# Iterate through predicted values
for index in range(0, len(predicted_subject_timeseries[subj_id][curr_run])):
    if index != 0:
        # If the subsequent element does not equal the previous element, this is considered a transition
        if predicted_subject_timeseries[subj_id][curr_run][index - 1] != predicted_subject_timeseries[subj_id][curr_run][index]:
            count += 1
# Populate DataFrame
new_row = [subj_id, group_name, curr_run, count]
df_dict["transition_frequency"].loc[len(df_dict["transition_frequency"])] = new_row
```

- Refactored Code:
```python
# Sum the differences that are not zero - [1,2,1,1,1,3] becomes [1,-1,0,0,2]; binary representation
# for values that are not zero is [1,1,0,0,1] = 3 transitions
transition_frequency = np.where(np.diff(predicted_subject_timeseries[subj_id][curr_run]) != 0, 1, 0).sum()
```
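- Worked Example (same hypothetical run):
```python
import numpy as np

arr = np.array([1, 2, 1, 1, 1, 3])
# np.diff(arr) = [1, -1, 0, 0, 2]; three non-zero differences = three transitions
print(np.where(np.diff(arr) != 0, 1, 0).sum())  # 3
```
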
### 🐛 Fixes
- When a pickle file was used as input to `standardize` or `change_dtype`, an error was produced. This has been fixed,
and these functions now accept a list of dictionaries or a list of pickle files.
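
  A brief sketch, assuming the pickle files contain previously saved subject timeseries dictionaries (the file names
  are hypothetical):

  ```python
  from neurocaps.analysis import standardize

  # Both plain dictionaries and pickle files are accepted in the list
  standardized_dicts = standardize(subject_timeseries_list=["subject_timeseries_1.pkl", "subject_timeseries_2.pkl"])
  ```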

### 💻 Metadata
- In the documentation for `CAP.caps2corr`, it is now explicitly stated that the type of correlation being used is
the Pearson correlation.

## [0.14.7] - 2024-07-17
### ♻ Changed
- Improved Warning Messages and Print Statements:
- In `TimeseriesExtractor.get_bold`, the subject-specific information output has been reformatted for better readability:

- Previous Format:
```
Subject: 1; run:1 - Message
```
@@ -254,9 +404,9 @@ in earlier Python versions (3.9).
## [0.12.1] - 2024-06-27

### ♻ Changed
- `merge_dicts` now sorts the run keys lexicographically so that subjects that don't have the earliest run-id in the
first dictionary, due to not having that run or the run being excluded, still have ordered run keys in the merged
dictionary.

### 💻 Metadata
- Updates the `runs` parameter's type hints so that it is known that strings can be used too.
@@ -297,7 +447,7 @@ This doesn't affect functionality but it may be better to respect the original u

## [0.11.1] - 2024-06-23
### 🐛 Fixes
- Fix for Python 3.12 when using `CAP.caps2surf()`.
- Changes in pathlib.py in Python 3.12 result in an error message format change. The error message now includes
quotes (e.g., "not 'Nifti1Image'") instead of the previous format without quotes ("not Nifti1Image"). This issue
arises when using ``neuromaps.transforms.mni_to_fslr`` within CAP.caps2surf() as neuromaps captures the error as a
@@ -346,7 +496,7 @@ the docstring header for aesthetics.
### 🚀 New/Added
- Added new function `change_dtype` to make it easier to change the dtypes of each subject's numpy array to assist with
memory usage, especially if doing the CAPs analysis on a local machine.
- Added new parameters - `output_dir`, `file_name`, and `return_dict` - to `standardize` to save the dictionary;
`return_dict` defaults to True.
- Adds a new version attribute so you can check the current version using `neurocaps.__version__`.

@@ -363,7 +513,7 @@ it will provide a default file_name now instead of producing a Nonetype error.

## [0.10.0.post2] - 2024-06-20
### 💻 Metadata
- Minor metadata update to docstrings to remove curly braces from inside the list object of certain parameters, so
it does not look as if strings are supposed to go inside a dictionary that is inside a list, as opposed to directly
in a list.

@@ -394,7 +544,7 @@ surface space. - **new to [0.10.0]**
- Adds nbformat as dependency for plotly. - **new to [0.10.0]**
- In `TimeseriesExtractor.get_bold()`, several checks are done to ensure that subjects have the necessary files for
extraction. Subjects that have zero NIfTI files, confound files (if confounds are requested), event files (if
requested), etc. are automatically eliminated from being added to the list for timeseries extraction. A final check
assesses the run IDs of the files to see if the subject has at least one run with all necessary files, to avoid
retaining subjects who have all the necessary files but spread across different runs. This is most likely a rare
occurrence, but it is better to be safe and ensure that even a rare occurrence doesn't result in a crash. The
continue statement that skips the subject
@@ -425,8 +575,8 @@ python 3.12, you may need to use `pip install setuptools` if you receive an erro

## [0.9.9.post3] - 2024-06-13
### 🐛 Fixes
- Noted an issue with file naming in `CAP.calculate_metrics()` that causes the suffix of the file name to append
to subsequent file names when requesting multiple metrics. While it doesn't affect the content inside the file, it is
an irritating issue. For instance, "-temporal_fraction.csv" became "-counts-temporal_fraction.csv" if the user
requested "counts" before "temporal_fraction".

4,242 changes: 90 additions & 4,152 deletions demo.ipynb

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions docs/examples/dtype.rst
@@ -8,13 +8,13 @@ Changes the dtype of all participant's numpy arrays to assist with memory usage.
from neurocaps.analysis import change_dtype
subject_timeseries = {str(x): {f"run-{y}": np.random.rand(50,100) for y in range(1,3)} for x in range(1,3)}
-converted_subject_timeseries = change_dtype(subject_timeseries=subject_timeseries, dtype=np.float32)
+converted_subject_timeseries = change_dtype(subject_timeseries_list=[subject_timeseries], dtype=np.float32)
for subj_id in subject_timeseries:
    for run in subject_timeseries[subj_id]:
        print(f"""
        subj-{subj_id}; {run}:
        dtype before conversion: {subject_timeseries[subj_id][run].dtype}
-       dtype after conversion: {converted_subject_timeseries[subj_id][run].dtype}
+       dtype after conversion: {converted_subject_timeseries["dict_0"][subj_id][run].dtype}
        """)
.. rst-class:: sphx-glr-script-out
34 changes: 17 additions & 17 deletions docs/examples/merge.rst
@@ -2,7 +2,7 @@ Tutorial 3: Merging Timeseries With ``neurocaps.analysis.merge_dicts``
======================================================================
Combining the timeseries from different tasks is possible with ``merge_dicts``; this permits running analyses to
identify similar CAPs across different tasks, assuming these tasks use the same subjects. The ``merge_dicts()``
-function will produce a combined subject timeseries dictionary that contains only the subject IDs present across both
+function will produce a merged subject timeseries dictionary that contains only the subject IDs present across both
subject dictionaries. Additionally, this function appends similar run-IDs together. For instance, run-1 from one task
is appended to run-1 of the other task. For this to work, all dictionaries must contain the same number of columns/ROIs.
@@ -21,12 +21,12 @@ is appended to run-1 of the other task. For this to work, all dictionaries must
subject_timeseries_2 = {str(x) : {f"run-{y}": np.random.rand(20,100) for y in range(1,2)} for x in range(1,3)}
# subject_timeseries_list also takes pickle files and can save the modified dictionaries as pickles too.
-subject_timeseries_combined = merge_dicts(subject_timeseries_list=[subject_timeseries_1, subject_timeseries_2],
-                                          return_combined_dict=True, return_reduced_dicts=False)
+subject_timeseries_merged = merge_dicts(subject_timeseries_list=[subject_timeseries_1, subject_timeseries_2],
+                                        return_merged_dict=True, return_reduced_dicts=False)
-for subj_id in subject_timeseries_combined:
-    for run_id in subject_timeseries_combined[subj_id]:
-        timeseries = subject_timeseries_combined[subj_id][run_id]
+for subj_id in subject_timeseries_merged["merged"]:
+    for run_id in subject_timeseries_merged["merged"][subj_id]:
+        timeseries = subject_timeseries_merged["merged"][subj_id][run_id]
        print(f"sub-{subj_id}; {run_id} shape is {timeseries.shape}")
.. rst-class:: sphx-glr-script-out
@@ -43,13 +43,13 @@ is appended to run-1 of the other task. For this to work, all dictionaries must
# The original dictionaries can also be returned. The only modification is that the originals will
# only contain the subjects present across all dictionaries in the list
-combined_dicts = merge_dicts(subject_timeseries_list=[subject_timeseries_1, subject_timeseries_2],
-                             return_combined_dict=True, return_reduced_dicts=True)
+merged_dicts = merge_dicts(subject_timeseries_list=[subject_timeseries_1, subject_timeseries_2],
+                           return_merged_dict=True, return_reduced_dicts=True)
-for dict_id in combined_dicts:
-    for subj_id in combined_dicts[dict_id]:
-        for run_id in combined_dicts[dict_id][subj_id]:
-            timeseries = combined_dicts[dict_id][subj_id][run_id]
+for dict_id in merged_dicts:
+    for subj_id in merged_dicts[dict_id]:
+        for run_id in merged_dicts[dict_id][subj_id]:
+            timeseries = merged_dicts[dict_id][subj_id][run_id]
            print(f"For {dict_id} sub-{subj_id}; {run_id} shape is {timeseries.shape}")
.. rst-class:: sphx-glr-script-out
@@ -63,8 +63,8 @@ is appended to run-1 of the other task. For this to work, all dictionaries must
For dict_0 sub-2; run-2 shape is (10, 100)
For dict_1 sub-1; run-1 shape is (20, 100)
For dict_1 sub-2; run-1 shape is (20, 100)
-For combined sub-1; run-1 shape is (30, 100)
-For combined sub-1; run-2 shape is (10, 100)
-For combined sub-1; run-3 shape is (10, 100)
-For combined sub-2; run-1 shape is (30, 100)
-For combined sub-2; run-2 shape is (10, 100)
+For merged sub-1; run-1 shape is (30, 100)
+For merged sub-1; run-2 shape is (10, 100)
+For merged sub-1; run-3 shape is (10, 100)
+For merged sub-2; run-1 shape is (30, 100)
+For merged sub-2; run-2 shape is (10, 100)
8 changes: 4 additions & 4 deletions docs/examples/standardize.rst
@@ -19,13 +19,13 @@ timeseries extraction.
std_vec_1[std_vec_1 < np.finfo(np.float64).eps] = 1.0
std_vec_2[std_vec_2 < np.finfo(np.float64).eps] = 1.0
-standardized_subject_timeseries = standardize(subject_timeseries)
+standardized_subject_timeseries = standardize(subject_timeseries_list=[subject_timeseries])
standardized_1 = (subject_timeseries["1"]["run-1"] - mean_vec_1)/std_vec_1
standardized_2 = (subject_timeseries["1"]["run-2"] - mean_vec_2)/std_vec_2
-print(np.array_equal(standardized_subject_timeseries["1"]["run-1"], standardized_1))
-print(np.array_equal(standardized_subject_timeseries["1"]["run-2"], standardized_2))
+print(np.array_equal(standardized_subject_timeseries["dict_0"]["1"]["run-1"], standardized_1))
+print(np.array_equal(standardized_subject_timeseries["dict_0"]["1"]["run-2"], standardized_2))
.. rst-class:: sphx-glr-script-out

2 changes: 1 addition & 1 deletion neurocaps/__init__.py
@@ -3,4 +3,4 @@
__all__=["analysis", "extraction"]

# Version in single place
-__version__ = "0.14.7"
+__version__ = "0.15.0"