A Python 2 tool to extract phone numbers from URLs or text
The phone-number-matcher extracts phone numbers from URLs and text for the DIG project.
Precision matters more than recall here, so a phone number validator based on Google's libphonenumber runs at the end of the extraction process.
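To illustrate the idea of trading recall for precision with a final validation pass, here is a minimal sketch. It does not use libphonenumber and does not show how pnmatcher validates numbers; it swaps in a much simpler rule (a 10-digit North American number whose area code and exchange code both start with 2-9). The names `NANP_PATTERN`, `looks_like_nanp_number`, and `filter_valid` are hypothetical and exist only for this example.

```python
import re

# Simplified stand-in for a real validator (assumption, not libphonenumber):
# ten digits, with the area code and the exchange code each starting with 2-9.
NANP_PATTERN = re.compile(r"^[2-9]\d{2}[2-9]\d{6}$")


def looks_like_nanp_number(digits):
    """Return True if `digits` passes the simplified 10-digit check."""
    return bool(NANP_PATTERN.match(digits))


def filter_valid(candidates):
    """Keep only candidates that pass validation; everything else is dropped."""
    return [c for c in candidates if looks_like_nanp_number(c)]


# Candidates as an extraction step might produce them, including false positives.
candidates = ["2134529851", "0123456789", "4802671904", "30688875"]
valid = filter_valid(candidates)
```

Rejecting anything that fails the check loses some real numbers written in unusual formats, which is the recall cost accepted in exchange for fewer false positives.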
python setup.py install
pip install pnmatcher
from pnmatcher import PhoneNumberMatcher
matcher = PhoneNumberMatcher()
Use pnmatcher on a URL
url_string = "http://2134529851.backpage.com/FemaleEscorts/r-u-t-a-_your-blonde-_-o-b-s-e-s-s-i-o-n-_-23-23-23-23/30688875"
url_phone_numbers = matcher.match(url_string, source_type='url')
# print url_phone_numbers
# ['2134529851']
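As a rough illustration of why only one number comes back from this URL, the sketch below picks out runs of exactly ten digits with a regular expression. This is an assumption about one plausible first step, not pnmatcher's actual implementation; `TEN_DIGITS` and `candidate_numbers_from_url` are hypothetical names.

```python
import re

# Exactly ten consecutive digits that are not part of a longer digit run.
TEN_DIGITS = re.compile(r"(?<!\d)\d{10}(?!\d)")


def candidate_numbers_from_url(url):
    """Return every standalone 10-digit run found in the URL string."""
    return TEN_DIGITS.findall(url)


url = (
    "http://2134529851.backpage.com/FemaleEscorts/"
    "r-u-t-a-_your-blonde-_-o-b-s-e-s-s-i-o-n-_-23-23-23-23/30688875"
)
found = candidate_numbers_from_url(url)
```

The trailing `30688875` has only eight digits and the `23-23-23-23` groups are separated by hyphens, so neither is picked up by this pattern.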
Use pnmatcher on text
text_string = "Sexy new girl in town searching for a great date wiff u Naughty fresh girl here searching 4 a great date wiff you Sweet new girl in town seeking for a good date with u for80 2sixseven one9zerofor 90hr incall or out call"
text_phone_numbers = matcher.match(text_string, source_type='text')
# print text_phone_numbers
# ['4802671904']
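The obfuscated fragment in the text above mixes digits with spelled-out numbers and the homophone "for". A minimal sketch of how such a fragment could be normalized follows. It is not pnmatcher's algorithm: it handles only an isolated fragment and would misread ordinary prose (for example the word "for" in "searching for"). `DIGIT_WORDS`, `WORD_PATTERN`, and `normalize_fragment` are hypothetical names.

```python
import re

# Spelled-out digits plus one homophone seen in the sample text (assumption:
# a real matcher would need a larger vocabulary and context checks).
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "for": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

# Longest words first, so "four" is tried before "for".
WORD_PATTERN = re.compile(
    "|".join(sorted(DIGIT_WORDS, key=len, reverse=True)), re.IGNORECASE
)


def normalize_fragment(fragment):
    """Replace digit words with digits, then drop every non-digit character."""
    replaced = WORD_PATTERN.sub(
        lambda m: DIGIT_WORDS[m.group(0).lower()], fragment
    )
    return re.sub(r"\D", "", replaced)


fragment = "for80 2sixseven one9zerofor"
digits = normalize_fragment(fragment)
```

Deciding where a fragment starts and ends inside free text, and discarding nearby digits such as the `90` in `90hr`, is the harder part and is left out of this sketch.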
Upload the following four files into your Spark environment
spark_workflow.sh is for the Spark workflow.
The pnmatcher/ directory holds the Python code.
The spark_dependencies/ directory contains two zip files that are used by the Spark workflow.
tests/ holds test scripts to evaluate the program.