!!! hint "Consult the tutorials after changes"
    When we change the API, we make sure that the tutorials show off all possibilities.

See the app-specific tutorials via `tf.about.corpora`.
The TEI converter is still in active development. If you need the latest version, clone the TF repo and in its top-level directory run the command:
pip install -e .
2023-11-28
New conversion tool: from PageXML. Still in its infancy. It uses the PageXML tools by Marijn Koolen.
For an example see translatin/logic.
Fix:
- TF did not fetch an earlier version of a corpus if the newest release
    contains a `complete.zip` (which only has the latest version).
    From now on: TF does not try to download `complete.zip`
    if you pass a `version` argument to the `use()` command.
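
As a minimal sketch of how this fix shows up in practice, the helper below pins a corpus to an earlier version by passing `version` to `use()`. The helper name `load_corpus` and the corpus name `annotation/banks` are illustrative assumptions, not taken from these notes. The helper needs Text-Fabric installed and, on first use, network access.

```python
def load_corpus(corpus="annotation/banks", version=None):
    """Load a TF corpus, optionally pinned to a specific version.

    The default corpus name is only an example (an assumption).
    """
    # Imported lazily, so that defining this helper does not require
    # Text-Fabric to be installed.
    from tf.app import use

    if version is None:
        # No version requested: TF picks the latest version.
        return use(corpus)
    # A version is requested: per the note above, TF then does not
    # try to download complete.zip.
    return use(corpus, version=version)
```

A call such as `A = load_corpus(version="0.1")` would request that version instead of the latest one, assuming such a version exists for the corpus.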
2023-11-15
Various fixes:
- Some package data was not included for the NER annotation tool.
- In the NER tool, the highlighting of hits of the search pattern is now exact; it was sometimes off.
Deleted tf.tools.docsright again, but developed it further in docsright.
2023-11-02
- Improvement in dependencies. Text-Fabric no longer has mandatory dependencies on
    `openpyxl`, `pandas`, `pyarrow`, `lxml`.
    The optional dependencies on `pygithub` and `python-gitlab` remain,
    but most users will never need them, because TF can also fetch the `complete.zip`
    that is available as a release asset for most corpora.

    Whenever TF invokes a module that is not in the mandatory dependencies, it will act gracefully, providing hints to install the modules in question.
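
The graceful handling of missing optional modules can be pictured with a small sketch. This is not TF's actual code, only one common way to get the behaviour described above. The function names `optional_import` and `report_optional_modules` and the wording of the hint are assumptions.

```python
import importlib

# Modules named in this release note as no longer mandatory.
# For these four, the import name equals the pip package name.
OPTIONAL_MODULES = ("openpyxl", "pandas", "pyarrow", "lxml")


def optional_import(name):
    """Return (module, None) if importable, else (None, install hint)."""
    try:
        return importlib.import_module(name), None
    except ImportError:
        hint = f"Module '{name}' is not installed. Try: pip install {name}"
        return None, hint


def report_optional_modules(names=OPTIONAL_MODULES):
    """Map each module name to None (available) or to an install hint."""
    return {name: optional_import(name)[1] for name in names}
```

Code that needs, say, `pandas` could call `optional_import("pandas")` and show the hint instead of raising an error when the module is absent.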
2023-11-01
- API change in the Annotator: calling the annotator is now easier:
    `A.makeNer()`
    (no need to make an additional `import` statement).

    This will give you access to all annotation methods, including reading annotation instructions from a spreadsheet.
- Removal of deprecated commands (on the command line) in version 11:

    - `text-fabric` (has become `tf`)
    - `text-fabric-zip` (has become `tf-zip`)
    - `text-fabric-make` (has become `tf-make`)
- Spell-checked all bits of the TF docs here (33,000 lines). Wrote a script tf.tools.docsright to separate the code content from the markdown content, and to strip bits from the markdown content that lead to false positives for the spell checker. Then had the Vim spell checker run over those lines and corrected all mistakes by hand. Still, there might be grammar errors and content inaccuracies.
- 12.1.4 follows 12.1.3 quickly, because in corpora without a NER configuration file, TF did not start up properly.
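
The command renames listed above, under the removal of deprecated commands, can be captured in a small lookup, which may help when updating old shell scripts or documentation. This is only a sketch. The helper name `modernize_command` is hypothetical and not part of TF.

```python
# Old command names and their replacements, as listed in this release note.
RENAMED_COMMANDS = {
    "text-fabric": "tf",
    "text-fabric-zip": "tf-zip",
    "text-fabric-make": "tf-make",
}


def modernize_command(command_line):
    """Replace a removed command name at the start of a command line."""
    parts = command_line.split(maxsplit=1)
    if not parts:
        # Empty or whitespace-only input: nothing to rename.
        return command_line
    new_name = RENAMED_COMMANDS.get(parts[0])
    if new_name is None:
        # Not one of the removed commands: leave the line untouched.
        return command_line
    return " ".join([new_name] + parts[1:])
```

Only the first word of the line is looked up, so arguments that happen to contain `text-fabric` are left alone.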
2023-10-29
- Bug fix: the mechanism to make individual exceptions when adding named entities
    in the `tf.browser.ner.annotate` tool was broken.
    Thanks to Daniel Swanson for spotting it.
- Additional fixes and enhancements.
2023-10-28
- In the TF browser there will be a new tab in the vertical sidebar: Annotate, which will give access to manual annotation tools. I am developing the first one, a tool to annotate named entities efficiently, both in the TF browser and in a Jupyter Notebook. Read more in `tf.about.annotate`.

    These tools will let you save your work as files on your own computer.
- In `tf.convert.addnlp` we can now extract more NLP information besides tokens and sentences: part-of-speech, morphological tagging, lemmatisation, named entity recognition
- in the TEI converter.
2023-09-13
Trivial fix in code that exports the data from a job in the TF browser.
In the meantime, there is unfinished business in the Annotate tab in the TF browser,
which will come into production in the upcoming 12.1 release.
The Chrome browser has an attractive feature that other browsers such as Safari lack:
it supports the CSS property `content-visibility`.
With this property you can prevent
the browser from doing the expensive rendering of content that is not visible on the screen.
That makes it possible to load a lot of content in a single page without tripping up
the browser. You also need the `IntersectionObserver` API,
but that is generally supported by browsers. With the help of that API you can
restrict the binding of event listeners to elements that are visible on the screen.
So, you can open the TF browser in Chrome by passing the option `--chrome`.
But if Chrome is not installed, it will open in the default browser anyway.
Also, when the opening of the browser fails somehow, the web server is stopped.
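
The fallback behaviour described above can be sketched with Python's standard `webbrowser` module. This is an illustration under assumptions, not TF's actual implementation: the function name `open_tf_page` and the `stop_server` callback are hypothetical, and whether the browser type name `"chrome"` resolves depends on the platform.

```python
import webbrowser


def open_tf_page(url, prefer_chrome=False, stop_server=lambda: None):
    """Open url, preferring Chrome if asked; call stop_server on failure."""
    # The module-level open() uses the default browser.
    browser = webbrowser
    if prefer_chrome:
        try:
            browser = webbrowser.get("chrome")
        except webbrowser.Error:
            # Chrome is not available: keep the default browser.
            pass
    try:
        opened = bool(browser.open(url))
    except Exception:
        opened = False
    if not opened:
        # Opening failed somehow: shut the web server down again.
        stop_server()
    return opened
```

The `stop_server` callback keeps the sketch independent of any particular web server; a caller would pass in whatever shuts its own server down.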
2023-07-10
Fixed references to static files that still went to `/server` instead of `/browser`.
This has to do with the new approach to the TF browser.
2023-07-05
- The TF browser no longer works with a separate process that holds the TF corpus data. Instead, the web server (Flask) loads the corpus itself. This will restrict the usage of the TF browser to local-single-user scenarios.
- TF no longer exposes the installation options `[browser, pandas]`:

        pip install 'text-fabric[browser]'
        pip install 'text-fabric[pandas]'

    If you work with Pandas (like exporting to Pandas) you have to install it yourself:

        pip install pandas pyarrow

    The TF browser is always supported.

    The reason to have these distinct capabilities was that there are Python libraries involved that do not install on the iPad. The simplification of the TF browser makes it possible to be no longer dependent on these modules.

    Hence, TF can be installed on the iPad, although the TF browser is not working there yet. But the auto-downloading of data from GitHub / GitLab works.
- Header. After loading a dataset, a header is shown with all kinds of
    information about the corpus. But so far, it did not show the TF app settings.
    Now they are included in the header. There are two kinds: the explicitly given
    settings and the derived and computed settings.
    The latter ones will be suppressed when loading a dataset in a Jupyter notebook,
    because these settings can become quite big. You can still get them with
    `A.showContext()`. In the TF browser they will always be included; you find them in the Corpus tab.
See `tf.about.releasesold`.