A pure Python tool to match and extract phone numbers from URLs or text, compatible with Spark.

phone-number-matcher

A Python 2 tool to extract phone numbers from URLs or text

About

phone-number-matcher extracts phone numbers from URLs and free text for the DIG project.

Precision matters more than recall here, so the extraction process ends with a phone number validator based on Google's libphonenumber.
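
To illustrate why a final validation step improves precision, here is a minimal sketch of a validator. It is not the libphonenumber-based validator this project uses; it only applies the basic North American Numbering Plan rule that the area code and the exchange code cannot start with 0 or 1. The name `looks_like_nanp_number` is hypothetical.

```python
import re

# Assumption: a simplified stand-in for the real libphonenumber-based check.
# NANP numbers have the form NXX-NXX-XXXX, where N is a digit from 2 to 9.
NANP_PATTERN = re.compile(r"[2-9]\d{2}[2-9]\d{6}\Z")


def looks_like_nanp_number(digits):
    """Return True if a string of digits has a plausible NANP shape."""
    return NANP_PATTERN.match(digits) is not None


# Candidates that fail the check are dropped, trading recall for precision.
candidates = ["4802671904", "1234567890", "4801671904"]
validated = [c for c in candidates if looks_like_nanp_number(c)]
```

Only the first candidate survives: the second has an area code starting with 1, and the third has an exchange code starting with 1.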

Install

python setup.py install

or

pip install pnmatcher

Example Usage

initialize pnmatcher

from pnmatcher import PhoneNumberMatcher

matcher = PhoneNumberMatcher()

use pnmatcher for url

url_string = "http://2134529851.backpage.com/FemaleEscorts/r-u-t-a-_your-blonde-_-o-b-s-e-s-s-i-o-n-_-23-23-23-23/30688875"

url_phone_numbers = matcher.match(url_string, source_type='url')

# print url_phone_numbers
# ['2134529851']
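
For intuition, a URL candidate search can be sketched with a plain regular expression. This is an assumption about one possible approach, not pnmatcher's actual implementation, and `digit_runs` is a hypothetical helper. It keeps only standalone runs of exactly ten digits, so the shorter post ID at the end of the URL is ignored.

```python
import re

# Hypothetical helper, not part of pnmatcher.
# Lookarounds ensure the run is exactly 10 digits, not a slice of a longer one.
TEN_DIGITS = re.compile(r"(?<!\d)\d{10}(?!\d)")


def digit_runs(url):
    """Return every standalone 10-digit run found in the URL."""
    return TEN_DIGITS.findall(url)


url_string = "http://2134529851.backpage.com/FemaleEscorts/r-u-t-a-_your-blonde-_-o-b-s-e-s-s-i-o-n-_-23-23-23-23/30688875"
candidates = digit_runs(url_string)
```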

use pnmatcher for text

text_string = "Sexy new girl in town searching for a great date wiff u Naughty fresh girl here searching 4 a great date wiff you Sweet new girl in town seeking for a good date with u for80 2sixseven one9zerofor 90hr incall or out call"

text_phone_numbers = matcher.match(text_string, source_type='text')

# print text_phone_numbers
# ['4802671904']
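
The text example works because digit words and common misspellings ("for" instead of "four") are read as digits. The sketch below shows that idea on the obfuscated fragment alone. It is an illustration under stated assumptions, not pnmatcher's algorithm: `WORD_TO_DIGIT` and `normalize_digits` are hypothetical names, and the word list is deliberately small.

```python
import re

# Hypothetical mapping; "for" is included as a common stand-in for "four".
WORD_TO_DIGIT = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "for": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}

# Longer words come first so that "four" is tried before "for".
WORD_PATTERN = re.compile(
    "|".join(sorted(WORD_TO_DIGIT, key=len, reverse=True)), re.IGNORECASE
)


def normalize_digits(fragment):
    """Replace digit words with digits, then drop every non-digit character."""
    replaced = WORD_PATTERN.sub(
        lambda m: WORD_TO_DIGIT[m.group(0).lower()], fragment
    )
    return re.sub(r"\D", "", replaced)


number = normalize_digits("for80 2sixseven one9zerofor")
```

Running this on the whole sentence would also turn every ordinary "for" into a 4, which is why a real matcher needs context around the candidate and a validation step afterwards.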

Spark Usage

  1. Upload the following four files to your Spark environment:

    • spark_workflow.sh
    • spark_workflow.py
    • spark_dependencies/python_main.zip
    • spark_dependencies/python_lib.zip
  2. Run spark_workflow.sh to start the Spark workflow.

Project Layout

  • The pnmatcher/ directory holds the Python code.
  • The spark_dependencies/ directory contains the two zip files used by the Spark workflow.
  • The tests/ directory holds test scripts to evaluate the program.

Credit

Library

Resource
