Merged
1 change: 1 addition & 0 deletions .github/workflows/release.yml
@@ -33,6 +33,7 @@ jobs:
uses: pypa/cibuildwheel@v2.16.5
env:
CIBW_ARCHS_MACOS: "x86_64 arm64"
PIP_EXTRA_INDEX_URL: "https://download.pytorch.org/whl/cpu"

- uses: actions/upload-artifact@v2
with:
5 changes: 3 additions & 2 deletions .github/workflows/tests.yml
@@ -154,7 +154,7 @@ jobs:
strategy:
fail-fast: true
matrix:
python-version: ["3.7", "3.8", "3.9"]
python-version: ["3.7", "3.8", "3.9", "3.10", "3.11", "3.12"]
steps:
- uses: actions/checkout@v2

@@ -172,6 +172,7 @@ jobs:

- name: Install library
run: |
pip install .
pip install ".[ml]" pytest
pytest tests/pipelines/test_pipelines.py
# uv venv
# uv pip install .
4 changes: 2 additions & 2 deletions README.md
@@ -34,13 +34,13 @@ Check out our interactive [demo](https://aphp.github.io/edsnlp/demo/) !
You can install EDS-NLP via `pip`. We recommend pinning the library version in your projects, or use a strict package manager like [Poetry](https://python-poetry.org/).

```shell
pip install edsnlp==0.13.0
pip install edsnlp==0.13.1
```

or if you want to use the trainable components (using pytorch)

```shell
pip install "edsnlp[ml]==0.13.0"
pip install "edsnlp[ml]==0.13.1"
```

### A first pipeline
9 changes: 5 additions & 4 deletions changelog.md
@@ -1,25 +1,26 @@
# Changelog

## Unreleased
## v0.13.1

### Added

- `eds.tables` accepts a minimum_table_size (default 2) argument to reduce pollution
- `RuleBasedQualifier` now expose a `process` method that only returns qualified entities and token without actually tagging them, defering this task to the `__call__` method.
- `RuleBasedQualifier` now expose a `process` method that only returns qualified entities and token without actually tagging them, deferring this task to the `__call__` method.
- Added new patterns for metastasis detection. Developed on CT-Scan reports.
- Added citation of articles

### Fixed

- Disorder and Behavor pipes don't use a "PRESENT" or "ABSENT" `status` anymore. Instead, `status=None` by default,
- Disorder and Behavior pipes don't use a "PRESENT" or "ABSENT" `status` anymore. Instead, `status=None` by default,
and `ent._.negation` is set to True instead of setting `status` to "ABSENT". To this end, the *tobacco* and *alcohol*
now use the `NegationQualifier` internaly.
now use the `NegationQualifier` internally.
- Numbers are now only detected without trying to remove the pollution in between digits, ie `55 @ 77777` could be detected as a full number before, but not anymore.
- Fix fsspec open file encoding to "utf-8".

### Changed

- Rename `eds.measurements` to `eds.quantities`
- scikit-learn (used in `eds.endlines`) is no longer installed by default when installing `edsnlp[ml]`
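
As a hedged illustration of what this last entry implies for callers: since `scikit-learn` is no longer pulled in by `edsnlp[ml]`, code that relies on `eds.endlines` may want to check for it up front. The sketch below is not part of EDS-NLP; `require_optional` and its error message are hypothetical, and the only real API used is the standard library's `importlib.util.find_spec`. Note that scikit-learn's import name is `sklearn`.

```python
import importlib.util


def require_optional(import_name: str, pip_name: str) -> None:
    """Raise a helpful ImportError if an optional dependency is missing.

    Hypothetical helper for illustration; not an EDS-NLP function.
    """
    # find_spec returns None when a top-level module cannot be located
    if importlib.util.find_spec(import_name) is None:
        raise ImportError(
            f"This component needs `{pip_name}`. "
            f"Install it with: pip install {pip_name}"
        )


# scikit-learn is distributed as `scikit-learn` but imported as `sklearn`
try:
    require_optional("sklearn", "scikit-learn")
    print("scikit-learn is available")
except ImportError as exc:
    print(exc)
```

Checking with `find_spec` avoids importing the package just to test for its presence, which keeps the check cheap when the dependency is large.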

## v0.13.0

4 changes: 2 additions & 2 deletions docs/index.md
@@ -15,13 +15,13 @@ Check out our interactive [demo](https://aphp.github.io/edsnlp/demo/) !
You can install EDS-NLP via `pip`. We recommend pinning the library version in your projects, or use a strict package manager like [Poetry](https://python-poetry.org/).

```{: data-md-color-scheme="slate" }
pip install edsnlp==0.13.0
pip install edsnlp==0.13.1
```

or if you want to use the trainable components (using pytorch)

```{: data-md-color-scheme="slate" }
pip install "edsnlp[ml]==0.13.0"
pip install "edsnlp[ml]==0.13.1"
```

### A first pipeline
12 changes: 8 additions & 4 deletions edsnlp/pipes/core/endlines/endlines.py
@@ -22,6 +22,10 @@ class EndLinesMatcher(GenericMatcher):
Behind the scenes, it uses a `endlinesmodel` instance, which is an unsupervised
algorithm based on the work of [@zweigenbaum2016].

!!! warning "Installation"

To use this component, you need to install the `scikit-learn` library.

Training
--------
```python
@@ -93,12 +97,12 @@ class EndLinesMatcher(GenericMatcher):

Extensions
----------
The `eds.endlines` pipeline declares one extension, on both `Span` and `Token`
objects. The `end_line` attribute is a boolean, set to `True` if the pipeline
The `eds.endlines` pipe declares one extension, on both `Span` and `Token`
objects. The `end_line` attribute is a boolean, set to `True` if the pipe
predicts that the new line is an end line character. Otherwise, it is set to
`False` if the new line is classified as a space.

The pipeline also sets the `excluded` custom attribute on newlines that are
The pipe also sets the `excluded` custom attribute on newlines that are
classified as spaces. It lets downstream matchers skip excluded tokens
(see [normalisation](/pipes/core/normalisation/)) for more detail.

@@ -113,7 +117,7 @@ class EndLinesMatcher(GenericMatcher):

Authors and citation
--------------------
The `eds.endlines` pipeline was developed by AP-HP's Data Science team based on
The `eds.endlines` pipe was developed by AP-HP's Data Science team based on
the work of [@zweigenbaum2016].
'''

40 changes: 19 additions & 21 deletions edsnlp/pipes/misc/quantities/quantities.py
@@ -612,7 +612,7 @@ def __init__(
as_ents: bool = False,
span_setter: Optional[SpanSetterArg] = None,
use_tables: bool = True,
measurements: Union[str, List[Union[str, MsrConfig]], Dict[str, MsrConfig]] = None # deprecated # noqa: E501
measurements: Optional[Union[str, List[Union[str, MsrConfig]], Dict[str, MsrConfig]]] = None, # deprecated # noqa: E501
):

if measurements:
@@ -632,7 +632,7 @@ def __init__(
"Skipping that step."
)

self.all_quantities = (quantities == "all")
self.all_quantities = quantities == "all"
if self.all_quantities:
quantities = []

@@ -659,9 +659,7 @@ def __init__(
self.extract_ranges = extract_ranges
self.range_patterns = range_patterns
self.span_getter = (
validate_span_getter(span_getter)
if span_getter is not None
else None
validate_span_getter(span_getter) if span_getter is not None else None
)
self.merge_mode = merge_mode
self.before_snippet_limit = before_snippet_limit
@@ -676,10 +674,7 @@ def __init__(
"ents": as_ents,
"measurements": True,
"quantities": True,
**{
name: [name]
for name in self.measure_names.values()
}
**{name: [name] for name in self.measure_names.values()},
}

super().__init__(nlp=nlp, name=name, span_setter=span_setter)
@@ -1033,10 +1028,17 @@ def get_matches_before(i):
table_pd = table._.to_pd_table(as_spans=True)
# Find out the number's row
for _, row in table_pd.iterrows():
start_line = next((item.start for item in row
if item is not None), None)
end_line = next((item.end for item in reversed(row)
if item is not None), None)
start_line = next(
(item.start for item in row if item is not None), None
)
end_line = next(
(
item.end
for item in reversed(row)
if item is not None
),
None,
)
if start_line is None:
continue

@@ -1136,10 +1138,7 @@ def is_within_row(x):

else:
ent.label_ = self.measure_names[dims]
ent._.set(
ent.label_,
SimpleQuantity(value, unit_norm, self.unit_registry)
)
ent._.set(ent.label_, SimpleQuantity(value, unit_norm, self.unit_registry))

quantities.append(ent)

@@ -1224,9 +1223,7 @@ def merge_quantities_in_ranges(self, quantities: List[Span]) -> List[Span]:
]
if len(matching_patterns):
try:
new_value = RangeQuantity.from_quantities(
last._.value, ent._.value
)
new_value = RangeQuantity.from_quantities(last._.value, ent._.value)
merged[-1] = last = last.doc[
last.start
if matching_patterns[0][0] is None
@@ -1296,7 +1293,8 @@ def __call__(self, doc):
existing = (
list(get_spans(doc, self.span_getter))
if self.span_getter is not None
else ())
else ()
)
snippets = (
dict.fromkeys(ent.sent for ent in existing)
if self.span_getter is not None
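
The `start_line` / `end_line` lookup reformatted in the `get_matches_before` hunk relies on `next(generator, default)` to pick the first and last non-empty cell of a table row. The following is a minimal standalone sketch of that idiom, assuming plain Python lists in place of the row objects used in the source; `Cell` and `row_bounds` are hypothetical names introduced only for this illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for a span-like object with start/end offsets
Cell = namedtuple("Cell", ["start", "end"])


def row_bounds(row):
    """Return (start of first non-None cell, end of last non-None cell)."""
    # next() stops at the first match; the second argument is the fallback
    start_line = next((item.start for item in row if item is not None), None)
    end_line = next(
        (item.end for item in reversed(row) if item is not None), None
    )
    return start_line, end_line


print(row_bounds([None, Cell(3, 5), Cell(6, 9), None]))  # (3, 9)
print(row_bounds([None, None]))  # (None, None)
```

Passing `None` as the default means an all-empty row yields `None` rather than raising `StopIteration`, which is what lets the caller skip such rows with a simple `if start_line is None: continue`.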
3 changes: 1 addition & 2 deletions pyproject.toml
@@ -13,7 +13,7 @@ dependencies = [
"pytz",
"pysimstring>=1.2.1",
"regex",
"spacy>=3.1,<3.8",
"spacy>=3.2,<3.8",
"confit>=0.5.5",
"tqdm",
"umls-downloader>=0.1.1",
@@ -105,7 +105,6 @@ ml = [
"safetensors>=0.3.0",
"transformers>=4.0.0,<5.0.0",
"accelerate>=0.20.3,<1.0.0",
"scikit-learn>=1.0.0",
]

[project.urls]
2 changes: 1 addition & 1 deletion tests/pipelines/test_pipelines.py
@@ -12,5 +12,5 @@ def test_import_all():
import edsnlp.pipes

for name in dir(edsnlp.pipes):
if not name.startswith("_"):
if not name.startswith("_") and "endlines" not in name:
getattr(edsnlp.pipes, name)
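
The filter added to `test_import_all` can be tried in isolation. The sketch below assumes that accessing an attribute is what triggers loading of a pipe, and uses a `types.SimpleNamespace` as a stand-in for `edsnlp.pipes`; `touch_public_attributes` and the fake attribute names are hypothetical and not part of EDS-NLP.

```python
import types


def touch_public_attributes(module, skip=("endlines",)):
    """Access every public attribute of `module`, skipping excluded names.

    Returns the list of names that were accessed.
    """
    touched = []
    for name in dir(module):
        # Same condition as the test: public names not containing "endlines"
        if name.startswith("_") or any(part in name for part in skip):
            continue
        getattr(module, name)
        touched.append(name)
    return touched


# Stand-in for the real package namespace (an assumption of this sketch)
fake_pipes = types.SimpleNamespace(
    endlines=object(), sentences=object(), _private=object()
)
print(touch_public_attributes(fake_pipes))  # ['sentences']
```

Skipping by substring keeps the test usable in an environment without `scikit-learn`, at the cost of no longer covering the `endlines` import there.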