Skip to content
FashionBrain D2.1: Named Entity Recognition and Linking Methods
Branch: master
Clone or download
Pull request Compare This branch is 1 commit ahead of eXascaleInfolab:master.
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
data
output
src
README.md

README.md

FashionNLP

FashionBrain D2.1: Named Entity Recognition and Linking Methods EU project 732328: "FashionBrain"

In this repository, we provide a natural language processing tool called FashionNLP which is specially designed for fashion textual data. This tool extends existing state of the art NER technique to fashion application. More specifically, FashionNLP has three main components: NER, where fashion entities are recognized on textual data, NEL, where we link the fashion entity to the FashionBrain taxonomy and finally, in case the fashion entity does not exist in the FashionBrain taxonomy, we add it to the taxonomy.

Getting Started

This project requires PyTorch 0.4+ and Python 3.6+. you need to install Flair using this command

pip install flair

Then, you can start using the FashionNLP package:

git clone https://github.com/FashionBrainTeam/fashion_nlp_v2
cd ./fashion_nlp_v2/

Description of the FashionNLP Package

The "fashionnlp" package contains three folders:

  • src contains three python scripts:
    • lstm_fashion.py: the implementation of the LSTM-CRF models
    • bootsrap_lstm.py: the implementation of the bootstraping approach
    • taxonomy_matching.py: the implementation of the taxonomy enrichment
  • data contains the training set (fashion_items_train.txt), the testing set (fashion_items_test.txt) and the FashionBrain taxonomy (FBtaxonomy.csv)
  • output contains the results of the bootstrap approach.

Running the code

  • The arguments needed to train an LSTM-CRF model are:
    • data folder: Path to input folder containing training and testing sets
    • embedding: Type of embedding and it could be one of the three options: ‘no char’, ‘char’ or ‘flair’ The word ‘fashion’ should be filled with either white or black RGB # F1
    • epochs: Number of epochs to train the chosen model
    • output folder: Path to the output folder to save three files: loss.tsv contains the accuracy measures in each epoch, test.tsv contains the the testing set with model labels and training.log contains the log history.

Example:

   python lstm_fashion.py --data_folder '../data/lstm_input' --embedding 'no_char' --epochs 150 --output_folder '../output'
  • The arguments needed to use the bootstrap approach are:
    • model: Path to folder containing the model to load
    • first iteration: Path to folder containing the first iteration input data
    • second iteration: Path to folder containing the second iteration input data
    • epochs: Number of epochs to train the chosen model
    • retrained: Path to the data file trained in the second iteration
    • output folder: Path to the output folder to save three files

Example:

   python bootsrap_lstm.py --model '../output/no_char_1st_iter/final-model.pt' --first_iteration '../data/lstm_input' --second_iteration '../data/lstm_bootstrap' --epochs 100 --retrained '../data/lstm_bootstrap/retrained_data.tsv' --output_folder '../output/no_char_2nd_iter'
  • The arguments needed to enrich the FashionBrain taxonomy:
    • taxonomy: Path to the FashionBrain taxonomy
    • test result: Path to the file containing the testing result filled with orange. Web safe RGB # F16823
   python taxonomy_matching.py --taxonomy '../data/enrichment_input/FBtaxonomy.csv' --test_result '../data/enrichment_input/test_result.txt'
You can’t perform that action at this time.