Add Training Guides (#212) and Improve Doc
* fix error in documentation and improve it

* add api.rst documentation file

* first draft of training_guidelines

* removed installation and getting started from index to specific files for easier readability

* added training guide & fixed warnings

* Update docs/source/training_guide.rst

* Update docs/source/training_guide.rst

* Update docs/source/training_guide.rst

* Update docs/source/training_guide.rst

* Update docs/source/training_guide.rst

* Update docs/source/training_guide.rst

* Update docs/source/training_guide.rst

* Update docs/source/training_guide.rst

* Update docs/source/training_guide.rst

* Update docs/source/training_guide.rst

* Update docs/source/training_guide.rst

* Update docs/source/training_guide.rst

* Update docs/source/training_guide.rst

* Update docs/source/training_guide.rst

* fix header error

* clean spacing

* clean spacing README

* added details about the data

* changed countries names to english

* Update docs/source/training_guide.rst

Co-authored-by: David Beauchemin <david.beauchemin.5@ulaval.ca>

* Update docs/source/training_guide.rst

Co-authored-by: David Beauchemin <david.beauchemin.5@ulaval.ca>

* formatting - removed blank line

* Update docs/source/training_guide.rst

---------

Co-authored-by: Marouane Yassine <marouane.yassine.1@ulaval.ca>
Co-authored-by: Marouane Yassine <46830666+MAYAS3@users.noreply.github.com>
3 people committed Oct 6, 2023
1 parent 2800d57 commit adcbe41
Showing 10 changed files with 479 additions and 238 deletions.
22 changes: 15 additions & 7 deletions README.md
@@ -244,19 +244,20 @@ Once you have Docker Engine and Docker Compose installed, you can run the follow
docker compose up app
```

#### Sentry:
#### Sentry

Also, you can monitor your application usage with [Sentry](https://sentry.io) by setting the environment variable `SENTRY_DSN` to your Sentry's project
DSN. There is an example of the `.env` file in the project's root named `.env_example`. You can copy it using the following command:

```sh
cp .env_example .env
```
#### Request Examples:
#### Request Examples

Once the application is up and running and port `8000` is exported on your localhost, you can send a request with one
of the following methods:

##### cURL POST request:
##### cURL POST request
```sh
curl -X POST --location "http://127.0.0.1:8000/parse/bpemb-attention" --http1.1 \
-H "Host: 127.0.0.1:8000" \
@@ -267,7 +268,7 @@ curl -X POST --location "http://127.0.0.1:8000/parse/bpemb-attention" --http1.1
]"
```

##### Python POST request:
##### Python POST request

```python
import requests
@@ -395,20 +396,27 @@ Starting at version 0.9.8, we will also release the weights with the GitHub rele

Before installing deepparse, you must have the latest version of [PyTorch](https://pytorch.org/) in your environment.

- **Install the stable version of deepparse:**
- **Install the stable version of Deepparse:**

```sh
pip install deepparse
```

- **Install the stable version of deepparse with the app extra dependencies:**
- **Install the stable version of Deepparse with the app extra dependencies:**

```sh
pip install deepparse[app] # for bash terminal
pip install 'deepparse[app]' # for ZSH terminal
```

- **Install the latest development version of deepparse:**
- **Install the stable version of Deepparse with all extra dependencies:**

```sh
pip install deepparse[all] # for bash terminal
pip install 'deepparse[all]' # for ZSH terminal
```

- **Install the latest development version of Deepparse:**

```sh
pip install -U git+https://github.com/GRAAL-Research/deepparse.git@dev
Binary file added docs/source/_static/img/labeled_addresses.png
2 changes: 2 additions & 0 deletions docs/source/dataset_container.rst
@@ -1,3 +1,5 @@
.. _dataset_container:

.. role:: hidden
    :class: hidden-section

2 changes: 2 additions & 0 deletions docs/source/examples/fine_tuning.rst
@@ -1,3 +1,5 @@
.. _fine_tuning:

.. role:: hidden
    :class: hidden-section

147 changes: 147 additions & 0 deletions docs/source/get_started/get_started.rst
@@ -0,0 +1,147 @@
.. role:: hidden
    :class: hidden-section

Getting Started
===============

.. code-block:: python

    from deepparse.parser import AddressParser
    from deepparse.dataset_container import CSVDatasetContainer

    address_parser = AddressParser(model_type="bpemb", device=0)

    # You can parse one address
    parsed_address = address_parser("350 rue des Lilas Ouest Québec Québec G1L 1B6")

    # or multiple addresses
    parsed_address = address_parser(["350 rue des Lilas Ouest Québec Québec G1L 1B6",
                                     "350 rue des Lilas Ouest Québec Québec G1L 1B6"])

    # or multinational addresses
    # Canada, US, Germany, UK and South Korea
    parsed_address = address_parser(
        ["350 rue des Lilas Ouest Québec Québec G1L 1B6", "777 Brockton Avenue, Abington MA 2351",
         "Ansgarstr. 4, Wallenhorst, 49134", "221 B Baker Street", "서울특별시 종로구 사직로3길 23"])

    # You can also get the probability of the predicted tags
    parsed_address = address_parser("350 rue des Lilas Ouest Québec Québec G1L 1B6",
                                    with_prob=True)

    # Print the parsed address
    print(parsed_address)

    # or use one of our dataset containers
    addresses_to_parse = CSVDatasetContainer("./a_path.csv", column_names=["address_column_name"],
                                             is_training_container=False)
    address_parser(addresses_to_parse)

The default prediction tags are the following:

- ``"StreetNumber"``: for the street number,
- ``"StreetName"``: for the name of the street,
- ``"Unit"``: for the unit (such as apartment),
- ``"Municipality"``: for the municipality,
- ``"Province"``: for the province or local region,
- ``"PostalCode"``: for the postal code,
- ``"Orientation"``: for the street orientation (e.g. west, east),
- ``"GeneralDelivery"``: for other delivery information.
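
To make the tag set concrete, here is a minimal sketch that groups a token-level tagging by tag. The ``(token, tag)``
pairs are hand-labelled for illustration (they are not actual model output), and ``group_by_tag`` is a hypothetical
helper that is not part of Deepparse.

.. code-block:: python

    from collections import defaultdict

    # Hand-labelled (token, tag) pairs for illustration only; not model output.
    tagged_tokens = [
        ("350", "StreetNumber"),
        ("rue", "StreetName"),
        ("des", "StreetName"),
        ("Lilas", "StreetName"),
        ("Ouest", "Orientation"),
        ("Québec", "Municipality"),
        ("Québec", "Province"),
        ("G1L", "PostalCode"),
        ("1B6", "PostalCode"),
    ]


    def group_by_tag(pairs):
        """Join the tokens that share a tag, preserving their order."""
        grouped = defaultdict(list)
        for token, tag in pairs:
            grouped[tag].append(token)
        return {tag: " ".join(tokens) for tag, tokens in grouped.items()}


    components = group_by_tag(tagged_tokens)
    print(components["StreetName"])  # rue des Lilas
    print(components["PostalCode"])  # G1L 1B6

Tags that do not occur in an address (here ``"Unit"`` and ``"GeneralDelivery"``) are simply absent from the result.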

Parse Addresses From the Command Line
*************************************

You can also use our CLI to parse addresses:

.. code-block:: sh

    parse <parsing_model> <dataset_path> <export_file_name>

Parse Addresses Using Your Own Retrained Model
**********************************************

See `here <https://github.com/GRAAL-Research/deepparse/blob/main/examples/retrained_model_parsing.py>`__ for a complete example.

.. code-block:: python

    address_parser = AddressParser(
        model_type="bpemb", device=0, path_to_retrained_model="path/to/retrained/bpemb/model.p")
    address_parser("350 rue des Lilas Ouest Québec Québec G1L 1B6")

Retrain a Model
***************
See `here <https://github.com/GRAAL-Research/deepparse/blob/main/examples/fine_tuning.py>`__ for a complete example
using Pickle and `here <https://github.com/GRAAL-Research/deepparse/blob/main/examples/fine_tuning_with_csv_dataset.py>`__
for a complete example using CSV.
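
The retraining calls in this section pass ``train_ratio``, ``epochs`` and ``batch_size``. As a rough sketch of what
these numbers imply, the arithmetic below assumes a container of 1,000 labelled addresses that is split into a training
part and a validation part according to ``train_ratio``. The dataset size, the rounding rule and both helper functions
are assumptions made for illustration, not Deepparse internals.

.. code-block:: python

    import math


    def split_sizes(n_examples, train_ratio):
        """Return (n_train, n_valid); rounding down is an assumption."""
        n_train = int(n_examples * train_ratio)
        return n_train, n_examples - n_train


    def batches_per_epoch(n_train, batch_size):
        """Number of batches needed to see every training example once."""
        return math.ceil(n_train / batch_size)


    n_train, n_valid = split_sizes(1000, train_ratio=0.8)
    print(n_train, n_valid)  # 800 200
    print(batches_per_epoch(n_train, batch_size=8))  # 100

Under these assumptions, ``epochs=5`` would correspond to 500 training batches in total.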

.. code-block:: python

    address_parser.retrain(training_container, train_ratio=0.8, epochs=5, batch_size=8)

One can also freeze some layers to speed up the training using the ``layers_to_freeze`` parameter.

.. code-block:: python

    address_parser.retrain(training_container, train_ratio=0.8, epochs=5, batch_size=8, layers_to_freeze="seq2seq")

You can also give the retrained model a specific name. This name will be used as the model name (for printing and
as the class name) when reloading it.

.. code-block:: python

    address_parser.retrain(training_container, train_ratio=0.8, epochs=5, batch_size=8, name_of_the_retrain_parser="MyNewParser")

Retrain a Model With an Attention Mechanism
*******************************************
See `here <https://github.com/GRAAL-Research/deepparse/blob/main/examples/retrain_attention_model.py>`__ for a complete example.

.. code-block:: python

    # We will retrain the fasttext version of our pretrained model.
    address_parser = AddressParser(model_type="fasttext", device=0, attention_mechanism=True)
    address_parser.retrain(training_container, train_ratio=0.8, epochs=5, batch_size=8)

Retrain a Model With New Tags
*****************************
See `here <https://github.com/GRAAL-Research/deepparse/blob/main/examples/retrain_with_new_prediction_tags.py>`__ for a complete example.
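
The ``prediction_tags`` dictionary used in this section maps each tag name to an integer index and includes an
``"EOS"`` entry. The sketch below builds such a dictionary from a list of tag names. ``build_prediction_tags`` is a
hypothetical helper (not part of Deepparse), and it assumes that indices are consecutive and that ``"EOS"`` takes the
last index, as in the snippet of this section.

.. code-block:: python

    def build_prediction_tags(tag_names):
        """Map each tag name to a consecutive index and append "EOS" last."""
        if "EOS" in tag_names:
            raise ValueError('"EOS" is added automatically; do not list it.')
        if len(set(tag_names)) != len(tag_names):
            raise ValueError("Tag names must be unique.")
        tags = {name: index for index, name in enumerate(tag_names)}
        tags["EOS"] = len(tag_names)
        return tags


    address_components = build_prediction_tags(["ATag", "AnotherTag"])
    print(address_components)  # {'ATag': 0, 'AnotherTag': 1, 'EOS': 2}

Building the mapping from a list avoids duplicated or skipped indices when the tag set grows.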

.. code-block:: python

    address_components = {"ATag": 0, "AnotherTag": 1, "EOS": 2}
    address_parser.retrain(training_container, train_ratio=0.8, epochs=1, batch_size=128, prediction_tags=address_components)

Retrain a Seq2Seq Model From Scratch
************************************

See `here <https://github.com/GRAAL-Research/deepparse/blob/main/examples/retrain_with_new_seq2seq_params.py>`__ for
a complete example.

.. code-block:: python

    seq2seq_params = {"encoder_hidden_size": 512, "decoder_hidden_size": 512}
    address_parser.retrain(training_container, train_ratio=0.8, epochs=1, batch_size=128, seq2seq_params=seq2seq_params)

Download Our Models
*******************

Here are the URLs to download our pretrained models directly:

- `FastText <https://graal.ift.ulaval.ca/public/deepparse/fasttext.ckpt>`__,
- `FastTextAttention <https://graal.ift.ulaval.ca/public/deepparse/fasttext_attention.ckpt>`__,
- `BPEmb <https://graal.ift.ulaval.ca/public/deepparse/bpemb.ckpt>`__,
- `BPEmbAttention <https://graal.ift.ulaval.ca/public/deepparse/bpemb_attention.ckpt>`__,
- `FastText Light <https://graal.ift.ulaval.ca/public/deepparse/fasttext.magnitude.gz>`__ (using `Magnitude Light <https://github.com/davebulaval/magnitude-light>`__).
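
If you script the download yourself, the links above can be kept in a small lookup table. This is only a sketch: the
file names come from the list above, while the dictionary keys and the ``model_url`` helper are illustrative and are
not part of Deepparse.

.. code-block:: python

    BASE_URL = "https://graal.ift.ulaval.ca/public/deepparse"

    # File names taken from the links listed above; the keys are illustrative.
    MODEL_FILES = {
        "fasttext": "fasttext.ckpt",
        "fasttext_attention": "fasttext_attention.ckpt",
        "bpemb": "bpemb.ckpt",
        "bpemb_attention": "bpemb_attention.ckpt",
        "fasttext_light": "fasttext.magnitude.gz",
    }


    def model_url(name):
        """Return the download URL for one of the listed model files."""
        try:
            return f"{BASE_URL}/{MODEL_FILES[name]}"
        except KeyError:
            raise ValueError(
                f"Unknown model {name!r}; expected one of {sorted(MODEL_FILES)}"
            ) from None


    print(model_url("bpemb"))  # https://graal.ift.ulaval.ca/public/deepparse/bpemb.ckpt

The resulting URL can then be passed to any HTTP client, for example ``urllib.request.urlretrieve`` from the standard
library.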

Or you can use our CLI to download our pretrained models directly:

.. code-block:: sh

    download_model <model_name>
