Fix visit merging #64

Closed
wants to merge 9 commits into from

svittoz (Collaborator) commented Apr 17, 2024

Description

The eds_scikit.period.stays.merge_visits function produces non-deterministic results because koalas sort_values(...).first() does not work as expected.

We add the sort_values_first_koalas function as a replacement.
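
To illustrate the kind of behaviour the replacement is meant to guarantee, here is a minimal, framework-agnostic sketch in plain Python. It is not the implementation of sort_values_first_koalas, whose signature is not shown in this PR; the helper name first_per_group and the column names are hypothetical. The idea is to select the first row of each group by comparing explicit sort keys, with a unique tie-breaker, instead of relying on row order being preserved after a sort.

```python
def first_per_group(rows, group_key, sort_keys):
    """Return one row per group: the smallest row under the sort_keys ordering.

    The result does not depend on the input order, as long as sort_keys
    uniquely orders the rows within each group (hence the tie-breaker).
    """
    best = {}
    for row in rows:
        group = row[group_key]
        rank = tuple(row[k] for k in sort_keys)
        # Keep the row with the smallest rank seen so far for this group.
        if group not in best or rank < best[group][0]:
            best[group] = (rank, row)
    # Emit groups in sorted order so the output itself is reproducible.
    return [best[group][1] for group in sorted(best)]


# Hypothetical visit rows; person 2 has two visits starting on the same day.
visits = [
    {"person_id": 1, "visit_id": "b", "visit_start": "2021-01-03"},
    {"person_id": 1, "visit_id": "a", "visit_start": "2021-01-01"},
    {"person_id": 2, "visit_id": "c", "visit_start": "2021-02-01"},
    {"person_id": 2, "visit_id": "d", "visit_start": "2021-02-01"},
]

# visit_id acts as the tie-breaker when visit_start values are equal.
result = first_per_group(visits, "person_id", ["visit_start", "visit_id"])
print([row["visit_id"] for row in result])
```

Adding a unique column as the last sort key is what makes the selection reproducible when several rows share the same start date; without it, which row counts as "first" depends on how the data happens to be ordered or partitioned.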

Checklist

  • If this PR is a bug fix, the bug is documented in the test suite.
  • Changes were documented in the changelog (pending section).
  • If necessary, changes were made to the documentation (e.g. new pipeline).

svittoz force-pushed the fix_visit_merging branch 2 times, most recently from 400ecb6 to 82b46b7 on May 17, 2024 08:12

Coverage Report

Name    Stmts   Miss   ∆ Miss   Cover
TOTAL   2448    154    0        94%
Files without new missing coverage
Name    Stmts   Miss   ∆ Miss   Cover
eds_scikit/utils/test_utils.py

Was already missing at line 50

 def date(s):
-     return dt.strptime(s, "%Y-%m-%d")
Was already missing at lines 88-90
         args = tuple(args)
-     elif type(index_or_key) == str:
-         kwargs[index_or_key] = inputs
Was already missing at lines 114-116
     else:
-         normalized_sum_sq_diff = sum_sq_diff / np.sqrt(sum_sq_diff)
-         assert normalized_sum_sq_diff < 0.001

Stmts: 54 · Miss: 5 · ∆ Miss: 0 · Cover: 91%
eds_scikit/utils/flowchart/flowchart.py

Was already missing at line 152

     def __str__(self) -> str:
-         return self.__repr__()

Stmts: 131 · Miss: 1 · ∆ Miss: 0 · Cover: 99%
eds_scikit/utils/custom_implem/custom_implem.py

Was already missing at line 54

         """
-         return cut(
             x,

Stmts: 22 · Miss: 1 · ∆ Miss: 0 · Cover: 95%
eds_scikit/utils/checks.py

Was already missing at line 127

         if return_index_or_key:
-             return kwargs[argname], argname
         return kwargs[argname]
Was already missing at line 149
         else:
-             to_display_per_concept = [f"- {concept}" for concept in required_concepts]
         str_to_display = "\n".join(to_display_per_concept)
Was already missing at lines 172-189

-         if all(isinstance(table, tuple) for table in required_tables):
  ...
-         super().__init__(message)

Stmts: 71 · Miss: 10 · ∆ Miss: 0 · Cover: 86%
eds_scikit/utils/bunch.py

Was already missing at line 32

     def __setattr__(self, key, value):
-         self[key] = value
Was already missing at line 35
     def __dir__(self):
-         return self.keys()
Was already missing at lines 38-41
     def __getattr__(self, key):
-         try:
-             return self[key]
-         except KeyError:
             raise AttributeError(key)

Stmts: 11 · Miss: 5 · ∆ Miss: 0 · Cover: 55%
eds_scikit/resources/utils.py

Was already missing at line 19

     if len(splited) == 1:
-         return None
     return splited[-1]

Stmts: 6 · Miss: 1 · ∆ Miss: 0 · Cover: 83%
eds_scikit/resources/reg.py

Was already missing at lines 50-78

             # Looking for a match excluding version string
-             candidates = [
  ...
-             func = r.get(candidates[0])
         return func

Stmts: 16 · Miss: 4 · ∆ Miss: 0 · Cover: 75%
eds_scikit/plot/omop_teva.py

Was already missing at line 108

                 if drop_columns:
-                     table = table.merge(
                         visit_occurrence.drop(columns=drop_columns),

Stmts: 40 · Miss: 1 · ∆ Miss: 0 · Cover: 98%
eds_scikit/period/tagging_functions.py

Was already missing at lines 60-63

         # TODO: is this necessary ?
-         logger.warning("No matching were found between the 2 DataFrames")
- 
-         return framework.DataFrame(
             columns=["person_id", "t_start", "t_end", "concept", "value"]
Was already missing at lines 119-123
         return (B_start >= A_start) & (B_end <= A_end)
-     elif algo == interval_algos.from_before_to:
-         return B_end <= A_start
-     elif algo == interval_algos.to_before_from:
-         return A_end <= B_start
     else:

Stmts: 36 · Miss: 6 · ∆ Miss: 0 · Cover: 83%
eds_scikit/period/stays.py

Was already missing at line 409

         if open_stay_end_datetime is None:
-             open_stay_end_datetime = datetime.now()
         vo["visit_end_datetime_calc"] = open_stay_end_datetime

Stmts: 87 · Miss: 1 · ∆ Miss: 0 · Cover: 99%
eds_scikit/io/i2b2_mapping.py

Was already missing at lines 38-211


-     i2b2_table_name = i2b2_tables[db_source][table]
  ...
-     return df
Was already missing at lines 230-234

-     def f(x):
-         return mapping.get(x, default)
- 
-     return F.udf(f)

Stmts: 79 · Miss: 69 · ∆ Miss: 0 · Cover: 13%
eds_scikit/io/base.py

Was already missing at line 13

     def __str__(self):
-         return self.__repr__()

Stmts: 9 · Miss: 1 · ∆ Miss: 0 · Cover: 89%
eds_scikit/event/from_code.py

Was already missing at lines 108-111

     else:
-         event.loc[:, "t_start"] = event.loc[:, columns["code_start_datetime"]]
  ...
-         event = event.drop(
             columns=[columns["code_start_datetime"], columns["code_end_datetime"]]

Stmts: 42 · Miss: 3 · ∆ Miss: 0 · Cover: 93%
eds_scikit/event/diabetes.py

Was already missing at lines 88-102

     """
-     diabetes = conditions_from_icd10(
  ...
- 
-     return diabetes

Stmts: 10 · Miss: 4 · ∆ Miss: 0 · Cover: 60%
eds_scikit/event/consultations.py

Was already missing at line 68

     if type(algo) == str:
-         algo = [algo]

Stmts: 61 · Miss: 1 · ∆ Miss: 0 · Cover: 98%
eds_scikit/emergency/emergency_care_site.py

Was already missing at line 54

     if algo == "from_regex_on_parent_UF":
-         return from_regex_on_parent_UF(care_site)
     elif algo == "from_regex_on_care_site_description":
Was already missing at line 166
     """
-     return attributes.get_parent_attributes(
         care_site,

Stmts: 31 · Miss: 2 · ∆ Miss: 0 · Cover: 94%
eds_scikit/datasets/synthetic/biology.py

Was already missing at lines 37-44

     def reset_to_pandas(self):
-         if self.module == "koalas":
  ...
-             self.module = "pandas"

Stmts: 132 · Miss: 7 · ∆ Miss: 0 · Cover: 95%
eds_scikit/datasets/__init__.py

Was already missing at line 38

 def __dir__():
-     return known_datasets + [func.__name__ for func in __all__]
Was already missing at lines 52-56
 def add_dataset(table: pd.DataFrame, name: str):
-     dataset_path = os.path.abspath(
-         os.path.join(os.path.dirname(__file__), name + ".csv")
-     )
-     table.to_csv(dataset_path, index=False)
Was already missing at line 67
     """
-     return [func.__name__ for func in __all__]

Stmts: 26 · Miss: 4 · ∆ Miss: 0 · Cover: 85%
eds_scikit/biology/viz/plot.py

Was already missing at line 72

     else:
-         logger.error(
             "The folder {} has not been found",
Was already missing at lines 718-720
     else:
-         terminologies_hist = alt.Chart().mark_text()
-         terminologies_time_series = (
             alt.Chart(measurement)

Stmts: 130 · Miss: 3 · ∆ Miss: 0 · Cover: 98%
eds_scikit/biology/viz/aggregate.py

Was already missing at line 83

     if stats_only:
-         return {"measurement_stats": measurement_stats}
Was already missing at line 208
     if overall_only:
-         return measurement_stats_overall

Stmts: 97 · Miss: 2 · ∆ Miss: 0 · Cover: 98%
eds_scikit/biology/utils/config.py

Was already missing at lines 30-66

     """
-     my_custom_config = pd.DataFrame()
  ...
-     register_configs()
Was already missing at lines 73-75
     for config in glob.glob(os.path.join(CONFIGS_PATH, "*.csv")):
-         config_name = Path(config).stem
-         registry.data.register(
             f"get_biology_config.{config_name}",
Was already missing at lines 89-94
     """
-     registered = list(registry.data.get_all().keys())
-     configs = [
-         r.split(".")[-1] for r in registered if r.startswith("get_biology_config")
-     ]
-     return configs

Stmts: 35 · Miss: 22 · ∆ Miss: 0 · Cover: 37%
eds_scikit/biology/cleaning/cohort.py

Was already missing at line 28

     if isinstance(studied_pop, DataFrame.__args__):
-         filtered_measures = measurement.merge(
             studied_pop,

Stmts: 9 · Miss: 1 · ∆ Miss: 0 · Cover: 89%

63 files skipped due to complete coverage.

Coverage success: total of 94% is above 94% 🎉

svittoz closed this on Jun 14, 2024