
Commit

fix: typo
severinsimmler committed Apr 25, 2019
1 parent d8345bf commit 777d5a3
Showing 1 changed file with 4 additions and 2 deletions.
6 changes: 4 additions & 2 deletions README.md
@@ -1,15 +1,17 @@
# A library for preprocessing
`cophi` is a Python library for handling, modeling and processing text corpora. You can easily pipe a collection of text files using the high-level API:

```python
import cophi

# Read every file matching the pattern, lowercase the text and tokenize it.
corpus, metadata = cophi.corpus(directory="british-fiction-corpus",
                                pathname_pattern="**/*.txt",
                                encoding="utf-8",
                                lowercase=True,
                                token_pattern=r"\p{L}+\p{P}?\p{L}+")
```
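
To make the parameters above more concrete, here is a minimal standard-library sketch of what such a call might do conceptually: collect files by glob pattern, read them, lowercase them and tokenize them. This is not `cophi`'s implementation. The helper `read_corpus` is hypothetical, and the regular expression is only an approximation of `\p{L}+\p{P}?\p{L}+`, because `\p{...}` classes are not supported by Python's built-in `re` module (they need the third-party `regex` package).

```python
import re
import tempfile
from pathlib import Path

# Stdlib approximation of r"\p{L}+\p{P}?\p{L}+": runs of letters, optionally
# joined by one character that is neither a word character nor whitespace
# (e.g. "-" or "'"). Unlike \p{P}, this also accepts symbols such as "$".
TOKEN = re.compile(r"[^\W\d_]+[^\w\s]?[^\W\d_]+")


def read_corpus(directory, pathname_pattern="**/*.txt", encoding="utf-8", lowercase=True):
    """Hypothetical helper: map each file stem to its list of tokens."""
    documents = {}
    for path in sorted(Path(directory).glob(pathname_pattern)):
        text = path.read_text(encoding=encoding)
        if lowercase:
            text = text.lower()
        documents[path.stem] = TOKEN.findall(text)
    return documents


# Usage on a throwaway directory containing one nested text file.
with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "austen"
    nested.mkdir()
    (nested / "emma.txt").write_text("Well-known Emma, 34 years old.", encoding="utf-8")
    demo = read_corpus(tmp)
    print(demo)
```

Note that the pattern requires at least two letters per token, so single-letter words and bare numbers are dropped, while hyphenated words such as "well-known" stay together.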

You can also plug the [DARIAH-DKPro-Wrapper](https://dariah-de.github.io/DARIAH-DKPro-Wrapper/) into this pipeline to lemmatize text, or just keep certain word types.
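
As a rough illustration of "lemmatize text, or just keep certain word types", the sketch below filters a small tab-separated table of annotated tokens. It does not call the DARIAH-DKPro-Wrapper or `cophi`. The sample data is made up, the column names `Token`, `Lemma` and `CPOS` are assumptions about the wrapper's output, and `lemmas` is a hypothetical helper.

```python
import csv
import io

# Made-up sample; the column names (Token, Lemma, CPOS) and the tag values
# are assumptions, not taken from real wrapper output.
SAMPLE = (
    "Token\tLemma\tCPOS\n"
    "The\tthe\tART\n"
    "sisters\tsister\tNN\n"
    "were\tbe\tV\n"
    "walking\twalk\tV\n"
)


def lemmas(tsv_text, keep=None):
    """Return lemmas, optionally restricted to a set of coarse POS tags."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    return [row["Lemma"] for row in reader
            if keep is None or row["CPOS"] in keep]


all_lemmas = lemmas(SAMPLE)
nouns_and_verbs = lemmas(SAMPLE, keep={"NN", "V"})
print(all_lemmas)
print(nouns_and_verbs)
```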

> Check out the introductory [Jupyter notebook](https://github.com/cophi-wue/cophi-toolbox/blob/master/notebooks/introducing-cophi.ipynb).

## Getting started