Commit
updates for version 0.15.0
improve tests, change parameter names and output types for non-class
functions - `merge_dicts`, `standardize`, and `change_dtype`. Also, use
numpy operations for CAP metric calculations.
donishadsmith committed Jul 22, 2024
1 parent 1af2bb6 commit 7c54583
Showing 20 changed files with 711 additions and 4,420 deletions.
16 changes: 15 additions & 1 deletion .github/workflows/testing.yaml
@@ -28,7 +28,6 @@ jobs:
      run: |
        pip install pytest
        pytest test_CAP.py
-       pytest test_merge_dicts.py
      shell: bash
      working-directory: tests
    - name: Run TimeseriesExtractor tests
@@ -39,3 +38,18 @@ jobs:
        pytest test_TimeseriesExtractor_modified.py
      shell: bash
      working-directory: tests
+   - name: Run merge_dicts test
+     run: |
+       pytest test_merge_dicts.py
+     shell: bash
+     working-directory: tests
+   - name: Run standardize test
+     run: |
+       pytest test_standardize.py
+     shell: bash
+     working-directory: tests
+   - name: Run change_dtype test
+     run: |
+       pytest test_change_dtype.py
+     shell: bash
+     working-directory: tests
168 changes: 159 additions & 9 deletions CHANGELOG.md
@@ -42,11 +42,161 @@ noted in the changelog (i.e new functions or parameters, changes in parameter de
improvements/enhancements. Fixes and modifications will be backwards compatible.
- *.postN* : Consists of only metadata-related changes, such as updates to type hints or doc strings/documentation.

## [0.15.0] - 2024-07-21
### 🚀 New/Added
- `save_reduced_dicts` parameter to `merge_dicts` so that the reduced dictionaries can also be saved instead of only
being returned.
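
  A hypothetical sketch of the new option (the pickle file names and the `output_dir` usage are assumptions for
  illustration, not taken from this commit):

  ```python
  from neurocaps.analysis import merge_dicts

  # Assumed file names; with save_reduced_dicts=True the reduced dictionaries are
  # saved as well, instead of only the merged dictionary being returned
  merge_dicts(subject_timeseries_list=["task_A.pkl", "task_B.pkl"],
              output_dir="results", save_reduced_dicts=True, return_merged_dict=True)
  ```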

### ♻ Changed
- Some parameter names, inputs, and outputs for the non-class functions - `merge_dicts`, `change_dtype`, and
`standardize` - have changed to improve consistency across these functions (a short sketch follows this list).
  - `merge_dicts`
    - `return_combined_dict` has been changed to `return_merged_dict`.
    - `file_name` has been changed to `file_names` since the reduced dicts can also be saved now.
  - `standardize` & `change_dtype`
    - `subject_timeseries` has been changed to `subject_timeseries_list`, the same as in `merge_dicts`.
    - `file_name` has been changed to `file_names`.
    - `return_dict` has been changed to `return_dicts`.
  - The returned dictionary for `merge_dicts`, `change_dtype`, and `standardize` is now always
    `dict[str, dict[str, dict[str, np.ndarray]]]`.
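
  A minimal sketch of the new conventions (the array shapes are arbitrary; the "dict_0" key mirrors the updated
  docs examples below):

  ```python
  import numpy as np
  from neurocaps.analysis import standardize

  # Hypothetical input: 2 subjects, 2 runs each, 50 timepoints x 100 ROIs
  subject_timeseries = {str(x): {f"run-{y}": np.random.rand(50, 100) for y in range(1, 3)}
                        for x in range(1, 3)}

  # Input is now a list; the output is keyed by position in that list - "dict_0", "dict_1", ...
  standardized = standardize(subject_timeseries_list=[subject_timeseries], return_dicts=True)
  print(standardized["dict_0"]["1"]["run-1"].shape)  # (50, 100)
  ```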

- In `CAP.calculate_metrics`, the metric calculations, except for "temporal_fraction", have been refactored to remove
an import or to use numpy operations to reduce code.
- "counts"
- Previous Code:
```python
# Get frequency
frequency_dict = dict(collections.Counter(predicted_subject_timeseries[subj_id][curr_run]))
# Sort the keys
sorted_frequency_dict = {key: frequency_dict[key] for key in sorted(list(frequency_dict))}
# Add zero to missing CAPs for participants that exhibit zero instances of a certain CAP
if len(sorted_frequency_dict) != len(cap_numbers):
    sorted_frequency_dict = {cap_number: sorted_frequency_dict[cap_number] if cap_number in
                             list(sorted_frequency_dict) else 0 for cap_number in cap_numbers}
# Replace zeros with nan for groups with fewer caps than the group with the max caps
if len(cap_numbers) > group_cap_counts[group]:
    sorted_frequency_dict = {cap_number: sorted_frequency_dict[cap_number] if
                             cap_number <= group_cap_counts[group] else float("nan")
                             for cap_number in cap_numbers}
```
- Refactored Code:
```python
# Get frequency
frequency_dict = {key: np.where(predicted_subject_timeseries[subj_id][curr_run] == key, 1, 0).sum()
                  for key in range(1, group_cap_counts[group] + 1)}
# Replace zeros with nan for groups with fewer caps than the group with the max caps
if max(cap_numbers) > group_cap_counts[group]:
    for i in range(group_cap_counts[group] + 1, max(cap_numbers) + 1):
        frequency_dict.update({i: float("nan")})
```
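- Worked Example: a quick check of the refactored counting on a hypothetical run (the array is illustrative, not from the package):
```python
import numpy as np

# Hypothetical run labels for a group with 3 CAPs
arr = np.array([1, 2, 1, 1, 1, 3])
frequency_dict = {key: np.where(arr == key, 1, 0).sum() for key in range(1, 4)}
# CAP 1 appears 4 times; CAPs 2 and 3 appear once each -> {1: 4, 2: 1, 3: 1}
```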
- "temporal_fraction"
- Previous Code:
```python
proportion_dict = {key: item / (len(predicted_subject_timeseries[subj_id][curr_run]))
                   for key, item in sorted_frequency_dict.items()}
```
- Refactored Code: only some variable names have changed.
```python
proportion_dict = {key: value / (len(predicted_subject_timeseries[subj_id][curr_run]))
                   for key, value in frequency_dict.items()}
```
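- Worked Example (continuing the hypothetical counts above; the run has 6 volumes):
```python
# {1: 4, 2: 1, 3: 1} over 6 volumes -> CAP 1 occupies 4/6 of the run, CAPs 2 and 3 occupy 1/6 each
proportion_dict = {key: value / 6 for key, value in {1: 4, 2: 1, 3: 1}.items()}
```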
- "persistence"
- Previous Code:
```python
# Initialize variables
persistence_dict = {}
uninterrupted_volumes = []
count = 0
# Iterate through caps
for target in cap_numbers:
    # Iterate through each element and count uninterrupted volumes that equal target
    for index in range(0, len(predicted_subject_timeseries[subj_id][curr_run])):
        if predicted_subject_timeseries[subj_id][curr_run][index] == target:
            count += 1
        # Store count in list if interrupted and not zero
        else:
            if count != 0:
                uninterrupted_volumes.append(count)
            # Reset counter
            count = 0
    # In the event a participant only occupies one CAP, and to ensure final counts are added
    if count > 0:
        uninterrupted_volumes.append(count)
    # If uninterrupted_volumes is not empty, sum the elements, divide by the number of segments,
    # and multiply by the repetition time if provided
    if len(uninterrupted_volumes) > 0:
        persistence_value = np.array(uninterrupted_volumes).sum() / len(uninterrupted_volumes)
        if tr:
            persistence_dict.update({target: persistence_value * tr})
        else:
            persistence_dict.update({target: persistence_value})
    else:
        # Zero indicates that a participant has zero instances of the CAP
        persistence_dict.update({target: 0})
    # Reset variables
    count = 0
    uninterrupted_volumes = []

# Replace zeros with nan for groups with fewer caps than the group with the max caps
if len(cap_numbers) > group_cap_counts[group]:
    persistence_dict = {cap_number: persistence_dict[cap_number] if
                        cap_number <= group_cap_counts[group] else float("nan")
                        for cap_number in cap_numbers}
```
- Refactored Code:
```python
# Initialize variable
persistence_dict = {}
# Iterate through caps
for target in cap_numbers:
    # Binary representation of array - if [1,2,1,1,1,3] and target is 1, then it is [1,0,1,1,1,0]
    binary_arr = np.where(predicted_subject_timeseries[subj_id][curr_run] == target, 1, 0)
    # Get indices of values that equal 1; [0,2,3,4]
    target_indices = np.where(binary_arr == 1)[0]
    # Count the transitions; indices where diff > 1 mark a transition; diff of indices = [2,1,1];
    # binary for diff > 1 = [1,0,0]; thus, segments = transitions + first_sequence(1) = 2
    segments = np.where(np.diff(target_indices, n=1) > 1, 1, 0).sum() + 1
    # Sum of ones in the binary array divided by segments, then multiplied by 1 or the tr; segments is
    # always at least 1 due to the + 1; np.where(np.diff(target_indices, n=1) > 1, 1, 0).sum() is 0 when
    # the array is empty or the condition isn't met
    persistence_dict.update({target: (binary_arr.sum() / segments) * (tr if tr else 1)})

# Replace zeros with nan for groups with fewer caps than the group with the max caps
if max(cap_numbers) > group_cap_counts[group]:
    for i in range(group_cap_counts[group] + 1, max(cap_numbers) + 1):
        persistence_dict.update({i: float("nan")})
```
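- Worked Example: verifying the segment logic on the hypothetical run from the comments above, with target CAP 1:
```python
import numpy as np

arr = np.array([1, 2, 1, 1, 1, 3])
binary_arr = np.where(arr == 1, 1, 0)          # [1, 0, 1, 1, 1, 0]
target_indices = np.where(binary_arr == 1)[0]  # [0, 2, 3, 4]
# np.diff gives [2, 1, 1]; one gap > 1, so segments = 1 + 1 = 2
segments = np.where(np.diff(target_indices, n=1) > 1, 1, 0).sum() + 1
print(binary_arr.sum() / segments)  # 4 volumes / 2 segments = 2.0 (multiplied by tr if supplied)
```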
- "transition_frequency"
- Previous Code:
```python
count = 0
# Iterate through predicted values
for index in range(0, len(predicted_subject_timeseries[subj_id][curr_run])):
    if index != 0:
        # If the subsequent element does not equal the previous element, this is considered a transition
        if predicted_subject_timeseries[subj_id][curr_run][index - 1] != predicted_subject_timeseries[subj_id][curr_run][index]:
            count += 1
# Populate DataFrame
new_row = [subj_id, group_name, curr_run, count]
df_dict["transition_frequency"].loc[len(df_dict["transition_frequency"])] = new_row
```

- Refactored Code:
```python
# Sum the differences that are not zero - [1,2,1,1,1,3] becomes [1,-1,0,0,2]; binary representation
# for values that are not zero is [1,1,0,0,1] = 3 transitions
transition_frequency = np.where(np.diff(predicted_subject_timeseries[subj_id][curr_run]) != 0, 1, 0).sum()
```
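- Worked Example (same hypothetical run):
```python
import numpy as np

arr = np.array([1, 2, 1, 1, 1, 3])
# np.diff(arr) = [1, -1, 0, 0, 2]; three non-zero differences = three transitions
print(np.where(np.diff(arr) != 0, 1, 0).sum())  # 3
```
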
### 🐛 Fixes
- When a pickle file was used as input to `standardize` or `change_dtype`, an error was produced. This has been fixed,
and these functions now accept a list of dictionaries or a list of pickle files.
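
  A brief sketch, assuming the pickle files contain previously saved subject timeseries dictionaries (the file names
  are hypothetical):

  ```python
  from neurocaps.analysis import standardize

  # Both plain dictionaries and pickle files are accepted in the list
  standardized_dicts = standardize(subject_timeseries_list=["subject_timeseries_1.pkl", "subject_timeseries_2.pkl"])
  ```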

### 💻 Metadata
- In the documentation for `CAP.caps2corr`, it is now explicitly stated that the type of correlation being used is
the Pearson correlation.

## [0.14.7] - 2024-07-17
### ♻ Changed
- Improved Warning Messages and Print Statements:
- In `TimeseriesExtractor.get_bold`, the subject-specific information output has been reformatted for better readability:

- Previous Format:
```
Subject: 1; run:1 - Message
```
@@ -254,9 +404,9 @@ in earlier Python versions (3.9).
## [0.12.1] - 2024-06-27

### ♻ Changed
- `merge_dicts` now sorts the run keys lexicographically so that subjects that don't have the earliest run-id in the
first dictionary, due to not having that run or the run being excluded, still have ordered run keys in the merged
dictionary.

### 💻 Metadata
- Updates the `runs` parameter's type hints so that it is known that strings can be used too.
@@ -297,7 +447,7 @@ This doesn't affect functionality but it may be better to respect the original u

## [0.11.1] - 2024-06-23
### 🐛 Fixes
- Fix for Python 3.12 when using `CAP.caps2surf()`.
- Changes in pathlib.py in Python 3.12 result in an error message format change. The error message now includes
quotes (e.g., "not 'Nifti1Image'") instead of the previous format without quotes ("not Nifti1Image"). This issue
arises when using ``neuromaps.transforms.mni_to_fslr`` within CAP.caps2surf() as neuromaps captures the error as a
@@ -346,7 +496,7 @@ the docstring header for aesthetics.
### 🚀 New/Added
- Added new function `change_dtype` to make it easier to change the dtypes of each subject's numpy array to assist with
memory usage, especially if doing the CAPs analysis on a local machine.
- Added new parameters - `output_dir`, `file_name`, and `return_dict` - to `standardize` to save the dictionary;
`return_dict` defaults to True.
- Adds a new version attribute so you can check the current version using `neurocaps.__version__`.

@@ -363,7 +513,7 @@ it will provide a default file_name now instead of producing a Nonetype error.

## [0.10.0.post2] - 2024-06-20
### 💻 Metadata
- Minor metadata update to docstrings to remove curly braces from inside the list object of certain parameters, so
it does not look as if strings are supposed to go inside a dictionary that is inside a list, as opposed to directly
in a list.

@@ -394,7 +544,7 @@ surface space. - **new to [0.10.0]**
- Adds nbformat as dependency for plotly. - **new to [0.10.0]**
- In `TimeseriesExtractor.get_bold()`, several checks are done to ensure that subjects have the necessary files for
extraction. Subjects that have zero NIfTI files, confound files (if confounds are requested), event files (if
requested), etc. are automatically eliminated from being added to the list for timeseries extraction. A final check
assesses the run IDs of the files to see if the subject has at least one run with all necessary files, to avoid
retaining subjects who have all the necessary files but spread across different runs. This is most likely a rare
occurrence, but it is better to be safe and ensure that even a rare occurrence doesn't result in a crash. The
continue statement that skips the subject
@@ -425,8 +575,8 @@ python 3.12, you may need to use `pip install setuptools` if you receive an erro

## [0.9.9.post3] - 2024-06-13
### 🐛 Fixes
- Noted an issue with file naming in `CAP.calculate_metrics()` that causes the suffix of the file name to append
to subsequent file names when requesting multiple metrics. While it doesn't affect the content inside the file, it is
an irritating issue. For instance, "-temporal_fraction.csv" became "-counts-temporal_fraction.csv" if the user
requested "counts" before "temporal_fraction".

4,242 changes: 90 additions & 4,152 deletions demo.ipynb

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions docs/examples/dtype.rst
@@ -8,13 +8,13 @@ Changes the dtype of all participant's numpy arrays to assist with memory usage.
from neurocaps.analysis import change_dtype
subject_timeseries = {str(x): {f"run-{y}": np.random.rand(50,100) for y in range(1,3)} for x in range(1,3)}
-converted_subject_timeseries = change_dtype(subject_timeseries=subject_timeseries, dtype=np.float32)
+converted_subject_timeseries = change_dtype(subject_timeseries_list=[subject_timeseries], dtype=np.float32)
for subj_id in subject_timeseries:
    for run in subject_timeseries[subj_id]:
        print(f"""
        subj-{subj_id}; {run}:
        dtype before conversion: {subject_timeseries[subj_id][run].dtype}
-       dtype after conversion: {converted_subject_timeseries[subj_id][run].dtype}
+       dtype after conversion: {converted_subject_timeseries["dict_0"][subj_id][run].dtype}
        """)
.. rst-class:: sphx-glr-script-out
34 changes: 17 additions & 17 deletions docs/examples/merge.rst
@@ -2,7 +2,7 @@ Tutorial 3: Merging Timeseries With ``neurocaps.analysis.merge_dicts``
======================================================================
Combining the timeseries from different tasks is possible with ``merge_dicts``; this permits running analyses to
identify similar CAPs across different tasks, assuming these tasks use the same subjects. The ``merge_dicts()``
-function will produce a combined subject timeseries dictionary that contains only the subject IDs present across both
+function will produce a merged subject timeseries dictionary that contains only the subject IDs present across both
subject dictionaries. Additionally, this function appends similar run-IDs together. For instance, run-1 from one task
is appended to run-1 of the other task. For this to work, all dictionaries must contain the same number of columns/ROIs.
@@ -21,12 +21,12 @@ is appended to run-1 of the other task. For this to work, all dictionaries must
subject_timeseries_2 = {str(x) : {f"run-{y}": np.random.rand(20,100) for y in range(1,2)} for x in range(1,3)}
# subject_timeseries_list also takes pickle files and can save the modified dictionaries as pickles too.
-subject_timeseries_combined = merge_dicts(subject_timeseries_list=[subject_timeseries_1, subject_timeseries_2],
-                                          return_combined_dict=True, return_reduced_dicts=False)
+subject_timeseries_merged = merge_dicts(subject_timeseries_list=[subject_timeseries_1, subject_timeseries_2],
+                                        return_merged_dict=True, return_reduced_dicts=False)
-for subj_id in subject_timeseries_combined:
-    for run_id in subject_timeseries_combined[subj_id]:
-        timeseries = subject_timeseries_combined[subj_id][run_id]
+for subj_id in subject_timeseries_merged["merged"]:
+    for run_id in subject_timeseries_merged["merged"][subj_id]:
+        timeseries = subject_timeseries_merged["merged"][subj_id][run_id]
        print(f"sub-{subj_id}; {run_id} shape is {timeseries.shape}")
.. rst-class:: sphx-glr-script-out
@@ -43,13 +43,13 @@ is appended to run-1 of the other task. For this to work, all dictionaries must
# The original dictionaries can also be returned. The only modification is that the originals will
# only contain the subjects present across all dictionaries in the list
-combined_dicts = merge_dicts(subject_timeseries_list=[subject_timeseries_1, subject_timeseries_2],
-                             return_combined_dict=True, return_reduced_dicts=True)
+merged_dicts = merge_dicts(subject_timeseries_list=[subject_timeseries_1, subject_timeseries_2],
+                           return_merged_dict=True, return_reduced_dicts=True)
-for dict_id in combined_dicts:
-    for subj_id in combined_dicts[dict_id]:
-        for run_id in combined_dicts[dict_id][subj_id]:
-            timeseries = combined_dicts[dict_id][subj_id][run_id]
+for dict_id in merged_dicts:
+    for subj_id in merged_dicts[dict_id]:
+        for run_id in merged_dicts[dict_id][subj_id]:
+            timeseries = merged_dicts[dict_id][subj_id][run_id]
            print(f"For {dict_id} sub-{subj_id}; {run_id} shape is {timeseries.shape}")
.. rst-class:: sphx-glr-script-out
@@ -63,8 +63,8 @@ is appended to run-1 of the other task. For this to work, all dictionaries must
For dict_0 sub-2; run-2 shape is (10, 100)
For dict_1 sub-1; run-1 shape is (20, 100)
For dict_1 sub-2; run-1 shape is (20, 100)
-For combined sub-1; run-1 shape is (30, 100)
-For combined sub-1; run-2 shape is (10, 100)
-For combined sub-1; run-3 shape is (10, 100)
-For combined sub-2; run-1 shape is (30, 100)
-For combined sub-2; run-2 shape is (10, 100)
+For merged sub-1; run-1 shape is (30, 100)
+For merged sub-1; run-2 shape is (10, 100)
+For merged sub-1; run-3 shape is (10, 100)
+For merged sub-2; run-1 shape is (30, 100)
+For merged sub-2; run-2 shape is (10, 100)
8 changes: 4 additions & 4 deletions docs/examples/standardize.rst
@@ -19,13 +19,13 @@ timeseries extraction.
std_vec_1[std_vec_1 < np.finfo(np.float64).eps] = 1.0
std_vec_2[std_vec_2 < np.finfo(np.float64).eps] = 1.0
-standardized_subject_timeseries = standardize(subject_timeseries)
+standardized_subject_timeseries = standardize(subject_timeseries_list=[subject_timeseries])
standardized_1 = (subject_timeseries["1"]["run-1"] - mean_vec_1)/std_vec_1
standardized_2 = (subject_timeseries["1"]["run-2"] - mean_vec_2)/std_vec_2
-print(np.array_equal(standardized_subject_timeseries["1"]["run-1"], standardized_1))
-print(np.array_equal(standardized_subject_timeseries["1"]["run-2"], standardized_2))
+print(np.array_equal(standardized_subject_timeseries["dict_0"]["1"]["run-1"], standardized_1))
+print(np.array_equal(standardized_subject_timeseries["dict_0"]["1"]["run-2"], standardized_2))
.. rst-class:: sphx-glr-script-out

2 changes: 1 addition & 1 deletion neurocaps/__init__.py
@@ -3,4 +3,4 @@
__all__=["analysis", "extraction"]

# Version in single place
-__version__ = "0.14.7"
+__version__ = "0.15.0"