fix: remove OMOP <>_date columns #59

Merged: 1 commit merged into main on May 30, 2024


Thomzoy (Collaborator) commented Mar 19, 2024

Description

Checklist

  • If this PR is a bug fix, the bug is documented in the test suite.
  • Changes were documented in the changelog (pending section).
  • If necessary, changes were made to the documentation (e.g. new pipeline).


codecov bot commented Mar 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.25%. Comparing base (b7c19e6) to head (03a9b6b).
Report is 5 commits behind head on main.

Current head 03a9b6b differs from pull request most recent head 6a09ff4

Please upload reports for the commit 6a09ff4 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #59      +/-   ##
==========================================
+ Coverage   84.23%   84.25%   +0.01%     
==========================================
  Files          86       86              
  Lines        2550     2553       +3     
==========================================
+ Hits         2148     2151       +3     
  Misses        402      402              

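The Codecov figures can be reproduced from the hit and line counts in the diff. This is a minimal sketch that assumes Codecov's default `codecov.yml` settings (two decimals of precision, rounding down) and treats coverage as hits / lines; the helper names `coverage` and `pct_round_down` are illustrative, not part of any Codecov API.

```python
import math
from fractions import Fraction


def coverage(hits: int, lines: int) -> Fraction:
    """Exact ratio of covered lines to coverable lines."""
    return Fraction(hits, lines)


def pct_round_down(value: Fraction) -> float:
    """Express a ratio as a percentage truncated (rounded down) to 2 decimals."""
    return math.floor(value * 10000) / 100


base = coverage(2148, 2550)  # main (b7c19e6)
head = coverage(2151, 2553)  # PR head (03a9b6b)

print(pct_round_down(base))         # 84.23
print(pct_round_down(head))         # 84.25
print(pct_round_down(head - base))  # 0.01
```

Rounding to nearest instead would report 84.24% for `main` (2148 / 2550 ≈ 84.235%), which does not match the report. Rounding the unrounded difference down also explains why the delta shows +0.01% even though the two displayed values differ by 0.02.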

svittoz (Collaborator) commented Mar 21, 2024

Nice fix! However, it wouldn't resolve the issue for tables without datetime columns, such as the CONCEPT table.

Is there a way to log a message indicating that the date column can be dropped to solve it?
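One possible way to act on this suggestion, sketched under assumptions rather than taken from this PR: pair each OMOP `<x>_date` column with its `<x>_datetime` counterpart, treat only the paired ones as safe to drop, and log the unpaired ones (such as `valid_start_date` / `valid_end_date` in CONCEPT) instead of dropping them. The helper name `split_date_columns` and the log wording are hypothetical.

```python
import logging
from typing import List, Sequence, Tuple

logger = logging.getLogger(__name__)


def split_date_columns(
    columns: Sequence[str], table_name: str
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper (not part of eds-scikit).

    Returns (to_drop, kept):
    - to_drop: `<x>_date` columns that have a matching `<x>_datetime` column,
      e.g. `visit_start_date` next to `visit_start_datetime`;
    - kept: `<x>_date` columns with no datetime counterpart, e.g. CONCEPT's
      `valid_start_date`. These are only logged, never dropped.
    """
    present = set(columns)
    to_drop: List[str] = []
    kept: List[str] = []
    for col in columns:
        if not col.endswith("_date"):
            continue
        # OMOP naming convention: `<x>_date` pairs with `<x>_datetime`
        if col + "time" in present:
            to_drop.append(col)
        else:
            kept.append(col)
    if kept:
        # Message wording is an assumption
        logger.warning(
            "Table %s has date columns without a datetime counterpart (%s); "
            "consider dropping them if they cause errors.",
            table_name,
            ", ".join(kept),
        )
    return to_drop, kept
```

With a pandas DataFrame `df`, the result would be applied as `df.drop(columns=to_drop)`. Keeping the unpaired columns and only warning avoids silently losing the sole date information of tables like CONCEPT.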


Coverage Report

Name    Stmts  Miss  ∆ Miss  Cover
TOTAL    2281   153       0    93%
Files without new missing coverage
Name  Stmts  Miss  ∆ Miss  Cover
eds_scikit/utils/test_utils.py

Was already missing at line 50

 def date(s):
-     return dt.strptime(s, "%Y-%m-%d")
Was already missing at lines 88-90
         args = tuple(args)
-     elif type(index_or_key) == str:
-         kwargs[index_or_key] = inputs
Was already missing at lines 114-116
     else:
-         normalized_sum_sq_diff = sum_sq_diff / np.sqrt(sum_sq_diff)
-         assert normalized_sum_sq_diff < 0.001

Stmts 54, Miss 5, ∆ Miss 0, Cover 91%
eds_scikit/utils/flowchart/flowchart.py

Was already missing at line 152

     def __str__(self) -> str:
-         return self.__repr__()

Stmts 131, Miss 1, ∆ Miss 0, Cover 99%
eds_scikit/utils/custom_implem/custom_implem.py

Was already missing at line 54

         """
-         return cut(
             x,

Stmts 22, Miss 1, ∆ Miss 0, Cover 95%
eds_scikit/utils/checks.py

Was already missing at line 127

         if return_index_or_key:
-             return kwargs[argname], argname
         return kwargs[argname]
Was already missing at line 149
         else:
-             to_display_per_concept = [f"- {concept}" for concept in required_concepts]
         str_to_display = "\n".join(to_display_per_concept)
Was already missing at lines 172-189
-         if all(isinstance(table, tuple) for table in required_tables):
  ...
-         super().__init__(message)

Stmts 71, Miss 10, ∆ Miss 0, Cover 86%
eds_scikit/utils/bunch.py

Was already missing at line 32

     def __setattr__(self, key, value):
-         self[key] = value
Was already missing at line 35
     def __dir__(self):
-         return self.keys()
Was already missing at lines 38-41
     def __getattr__(self, key):
-         try:
-             return self[key]
-         except KeyError:
             raise AttributeError(key)

Stmts 11, Miss 5, ∆ Miss 0, Cover 55%
eds_scikit/resources/utils.py

Was already missing at line 19

     if len(splited) == 1:
-         return None
     return splited[-1]

Stmts 6, Miss 1, ∆ Miss 0, Cover 83%
eds_scikit/resources/reg.py

Was already missing at lines 50-78

             # Looking for a match excluding version string
-             candidates = [
  ...
-             func = r.get(candidates[0])
         return func

Stmts 16, Miss 4, ∆ Miss 0, Cover 75%
eds_scikit/period/tagging_functions.py

Was already missing at lines 60-63

         # TODO: is this necessary ?
-         logger.warning("No matching were found between the 2 DataFrames")
- 
-         return framework.DataFrame(
             columns=["person_id", "t_start", "t_end", "concept", "value"]
Was already missing at lines 119-123
         return (B_start >= A_start) & (B_end <= A_end)
-     elif algo == interval_algos.from_before_to:
-         return B_end <= A_start
-     elif algo == interval_algos.to_before_from:
-         return A_end <= B_start
     else:

Stmts 36, Miss 6, ∆ Miss 0, Cover 83%
eds_scikit/period/stays.py

Was already missing at line 409

         if open_stay_end_datetime is None:
-             open_stay_end_datetime = datetime.now()
         vo["visit_end_datetime_calc"] = open_stay_end_datetime

Stmts 86, Miss 1, ∆ Miss 0, Cover 99%
eds_scikit/io/i2b2_mapping.py

Was already missing at lines 38-211

-     i2b2_table_name = i2b2_tables[db_source][table]
  ...
-     return df
Was already missing at lines 230-234
-     def f(x):
-         return mapping.get(x, default)
- 
-     return F.udf(f)

Stmts 79, Miss 69, ∆ Miss 0, Cover 13%
eds_scikit/io/base.py

Was already missing at line 13

     def __str__(self):
-         return self.__repr__()

Stmts 9, Miss 1, ∆ Miss 0, Cover 89%
eds_scikit/event/from_code.py

Was already missing at lines 108-111

     else:
-         event.loc[:, "t_start"] = event.loc[:, columns["code_start_datetime"]]
  ...
-         event = event.drop(
             columns=[columns["code_start_datetime"], columns["code_end_datetime"]]

Stmts 42, Miss 3, ∆ Miss 0, Cover 93%
eds_scikit/event/diabetes.py

Was already missing at lines 88-102

     """
-     diabetes = conditions_from_icd10(
  ...
- 
-     return diabetes

Stmts 10, Miss 4, ∆ Miss 0, Cover 60%
eds_scikit/event/consultations.py

Was already missing at line 68

     if type(algo) == str:
-         algo = [algo]

Stmts 61, Miss 1, ∆ Miss 0, Cover 98%
eds_scikit/emergency/emergency_care_site.py

Was already missing at line 54

     if algo == "from_regex_on_parent_UF":
-         return from_regex_on_parent_UF(care_site)
     elif algo == "from_regex_on_care_site_description":
Was already missing at line 166
     """
-     return attributes.get_parent_attributes(
         care_site,

Stmts 31, Miss 2, ∆ Miss 0, Cover 94%
eds_scikit/datasets/synthetic/biology.py

Was already missing at lines 37-44

     def reset_to_pandas(self):
-         if self.module == "koalas":
  ...
-             self.module = "pandas"

Stmts 132, Miss 7, ∆ Miss 0, Cover 95%
eds_scikit/datasets/__init__.py

Was already missing at line 38

 def __dir__():
-     return known_datasets + [func.__name__ for func in __all__]
Was already missing at lines 52-56
 def add_dataset(table: pd.DataFrame, name: str):
-     dataset_path = os.path.abspath(
-         os.path.join(os.path.dirname(__file__), name + ".csv")
-     )
-     table.to_csv(dataset_path, index=False)
Was already missing at line 67
     """
-     return [func.__name__ for func in __all__]

Stmts 26, Miss 4, ∆ Miss 0, Cover 85%
eds_scikit/biology/viz/plot.py

Was already missing at line 72

     else:
-         logger.error(
             "The folder {} has not been found",
Was already missing at lines 718-720
     else:
-         terminologies_hist = alt.Chart().mark_text()
-         terminologies_time_series = (
             alt.Chart(measurement)

Stmts 130, Miss 3, ∆ Miss 0, Cover 98%
eds_scikit/biology/viz/aggregate.py

Was already missing at line 83

     if stats_only:
-         return {"measurement_stats": measurement_stats}
Was already missing at line 208
     if overall_only:
-         return measurement_stats_overall

Stmts 97, Miss 2, ∆ Miss 0, Cover 98%
eds_scikit/biology/utils/config.py

Was already missing at lines 30-66

     """
-     my_custom_config = pd.DataFrame()
  ...
-     register_configs()
Was already missing at lines 73-75
     for config in glob.glob(os.path.join(CONFIGS_PATH, "*.csv")):
-         config_name = Path(config).stem
-         registry.data.register(
             f"get_biology_config.{config_name}",
Was already missing at lines 89-94
     """
-     registered = list(registry.data.get_all().keys())
-     configs = [
-         r.split(".")[-1] for r in registered if r.startswith("get_biology_config")
-     ]
-     return configs

Stmts 35, Miss 22, ∆ Miss 0, Cover 37%
eds_scikit/biology/cleaning/cohort.py

Was already missing at line 28

     if isinstance(studied_pop, DataFrame.__args__):
-         filtered_measures = measurement.merge(
             studied_pop,

Stmts 9, Miss 1, ∆ Miss 0, Cover 89%

59 files skipped due to complete coverage.

Coverage success: total of 93% is above 93% 🎉

svittoz self-requested a review on May 30, 2024 15:44

svittoz (Collaborator) left a comment


Nice solution!

svittoz merged commit dd12804 into main on May 30, 2024
6 checks passed