ICTD-IITD/Voice_App_Custom_Entity_Extraction

Custom Entity Modules

Thesis link

Create virtual environment

  • This step is optional; it is only needed if you want to run the modules inside a virtual environment
  • Note: the modules have been tested on Python 3.6
virtualenv -p /usr/bin/python3.6 <env_name> 
source path_to_<env_name>/bin/activate

pip install -U sentence-transformers
pip install dateparser
pip install textdistance
pip install stanza

OR
pip install -r requirements.txt

Yes/No Entity module

  • This module performs a semantic search based on BERT sentence embeddings, i.e. it compares the user's input query with the phrases present in the dataset. Because it is BERT-based, the model takes some time to load.

  • Following this link

  • Using the multilingual model: distiluse-base-multilingual-cased

from sentence_transformers import SentenceTransformer
model = SentenceTransformer('distiluse-base-multilingual-cased')

Running instructions:

python get_yes_no.py --query="<query_text>"
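
The comparison step described above can be sketched as follows. This is only an illustration, not the code in get_yes_no.py: it assumes cosine similarity as the comparison measure, and the function names, the labelled phrase list, and the toy 3-dimensional vectors are all hypothetical. In practice the vectors would come from model.encode(...) on the query and on the dataset phrases.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def classify_yes_no(query_vec, labelled_vecs):
    """Return the (label, score) of the dataset phrase closest to the query.

    labelled_vecs is a list of (label, embedding) pairs; the labels
    "yes"/"no" are an assumption made for this sketch.
    """
    best_label, best_score = None, -1.0
    for label, vec in labelled_vecs:
        score = cosine_similarity(query_vec, vec)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Toy vectors standing in for real sentence embeddings (hypothetical data).
dataset = [("yes", [0.9, 0.1, 0.0]), ("no", [0.0, 0.2, 0.9])]
label, score = classify_yes_no([0.8, 0.2, 0.1], dataset)
print(label, round(score, 3))
```

Picking the single nearest phrase keeps the sketch short; a similarity threshold could be added to reject queries that match neither class well.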

Age entity module

  • This module is built on top of the dateparser library

  • This module parses a date from the user's response, then takes the absolute value of the date difference to get the user's age

python get_age.py --query="<query_text>"
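
The subtraction step can be sketched as below. This is a minimal illustration and not the code in get_age.py: it assumes the difference is taken against the current date and that the age is reported in whole years, and the function name age_in_years is hypothetical. The parsed date would normally come from dateparser.parse(query), which returns a datetime or None; here plain date objects are used so the sketch runs with the standard library only.

```python
from datetime import date


def age_in_years(parsed, today):
    """Whole years between two dates, regardless of their order.

    Ordering the dates first plays the role of the absolute value, so a
    response that is parsed into a future date still gives a positive age.
    """
    earlier, later = sorted([parsed, today])
    years = later.year - earlier.year
    # Subtract one year if the anniversary has not been reached yet.
    if (later.month, later.day) < (earlier.month, earlier.day):
        years -= 1
    return years


# Hypothetical example values, not taken from the dataset.
print(age_in_years(date(1995, 6, 15), date(2020, 6, 14)))
print(age_in_years(date(2045, 6, 15), date(2020, 6, 15)))
```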

Number entity module

  • This module is built on top of the Stanza NLP library
  • It uses the part-of-speech (PoS) tags that Stanza produces for Hindi
python get_number.py --query="<query_text>"
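
One way PoS tags can be used to pick out numbers is sketched below. The README does not describe the logic inside get_number.py, so this is an assumption: it simply keeps the words whose Universal PoS tag is NUM. The function name extract_numbers and the hand-tagged example sentence are hypothetical; the tags were written by hand and are not real Stanza output.

```python
def extract_numbers(tagged_words):
    """Return the text of every word tagged as a numeral (UPOS "NUM").

    tagged_words is a list of (text, upos) pairs. With Stanza they could
    be collected roughly like this (needs the Hindi models downloaded):

        import stanza
        nlp = stanza.Pipeline("hi", processors="tokenize,pos")
        doc = nlp(query)
        tagged_words = [(w.text, w.upos)
                        for s in doc.sentences for w in s.words]
    """
    return [text for text, upos in tagged_words if upos == "NUM"]


# Hand-tagged example (hypothetical): "My age is 25 years".
example = [
    ("मेरी", "PRON"),
    ("उम्र", "NOUN"),
    ("25", "NUM"),
    ("साल", "NOUN"),
    ("है", "AUX"),
]
print(extract_numbers(example))
```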

Name module

The name module is based on a 5-gram approach: for each window of five words, an SVM model predicts the probability that the center word (i.e. the 3rd word of the window) is the name of a person.
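
The windowing part of the 5-gram approach can be sketched as follows. This only shows how each word can be placed at the center of a five-word window; the feature extraction and the SVM itself are not shown. The function name five_gram_windows and the "<PAD>" filler token used at the sentence edges are assumptions for this sketch, not taken from the Name module.

```python
def five_gram_windows(tokens, pad="<PAD>"):
    """Build one 5-word window per token, with the token at the center.

    Two padding tokens are added on each side so that the first and last
    words of the sentence can also sit in the 3rd (center) position.
    """
    padded = [pad, pad] + list(tokens) + [pad, pad]
    windows = []
    for i, center in enumerate(tokens):
        window = padded[i:i + 5]  # center is always window[2]
        windows.append((center, window))
    return windows


# Hypothetical example sentence: "My name is Ram".
tokens = ["मेरा", "नाम", "राम", "है"]
for center, window in five_gram_windows(tokens):
    print(center, window)
```

Each window would then be turned into a feature vector and scored by the classifier, and the center word with the highest probability could be taken as the predicted name.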

Environment setup for Name module:

source path_to_<env_name>/bin/activate
sudo apt install libpq-dev python3-dev
sudo apt-get install python-numpy libicu-dev
pip install pyicu
pip install polyglot
pip install pycld2
pip install Morfessor
polyglot download embeddings2.hi
polyglot download ner2.hi
cd Name/libsvm-3.23/
rm svm-scale svm-train svm-predict svm.o
make
cd python/
make
cd ../../../  # go back to the repository root

Running instructions (in Python):

from Name import main
pred_name = main.get_name("<query_text>")

Location module

  • The README file for the location module is inside the location folder

Environment setup for location module:

source path_to_<env_name>/bin/activate
sudo apt install libpq-dev python3-dev
sudo apt-get install python-numpy libicu-dev
pip install pyicu
pip install polyglot
pip install pycld2
pip install Morfessor
polyglot download embeddings2.hi
polyglot download ner2.hi

DoB module

  • This module uses a heuristic-based approach to extract the date of birth (DoB). The heuristics were developed after manually analysing how users spoke their date of birth.
python get_dob.py --query="<query_text>"

Voice survey App

  • The code and documentation for the voice survey app are inside the voice_survey_android_app folder

The code for these modules is also available in a single Jupyter notebook, finalEvalForPaper.ipynb

Data

Please request the data at contact@oniondev.com
