<a href="https://colab.research.google.com/github/analyticsariel/projects/blob/master/BatchData_SkipTracing_MockData.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Batch Data Skip Tracing Mock Data

## Overview
| Detail Tag            | Information                                                                                        |
|-----------------------|----------------------------------------------------------------------------------------------------|
| Originally Created By | Ariel Herrera arielherrera@analyticsariel.com |
| External References   | API |
| Input Datasets        | Source name |
| Output Datasets       | Source name |
| Input Data Source     | Pandas DataFrame |
| Output Data Source    | Pandas DataFrame |

## History
| Date         | Developed By  | Reason                                                |
|--------------|---------------|-------------------------------------------------------|
| 1st Jun 2022 | Ariel Herrera | Create notebook. |

## Getting Started
1. Copy this notebook: File -> Save a Copy in Drive
2. Directions
  - Sign up for BatchData
  - Create a mock API key and a live API key

## Useful Resources
- [Google Colab Cheat Sheet](https://towardsdatascience.com/cheat-sheet-for-google-colab-63853778c093)
- [BatchData API docs](https://developer.batchdata.com/docs/batchdata/a45e094c668b1-property-skip-trace)
- [Curl Converter](https://curlconverter.com/#)

## <font color="blue">Install Packages</font>

## <font color="blue">Imports</font>

In [1]:
from google.colab import drive, files # google colab specific
import pandas as pd
import requests
import os
import warnings

pd.set_option('display.max_columns', None) # show all columns

## <font color="blue">Functions</font>

## <font color="blue">Locals & Constants</font>

In [2]:
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [3]:
# set working directory
os.chdir('/content/drive/MyDrive/Colab Data/')
dir = os.getcwd()
print('Current working directory:', dir)

# suppress all Python warnings (not only those raised by pandas)
warnings.filterwarnings('ignore')

Current working directory: /content/drive/MyDrive/Colab Data


## <font color="blue">Data</font>

In [4]:
# read in api keys
df_api_keys = pd.read_csv(dir + '/input/api_keys.csv')
# extract api key
batchdata_mock_api_key = df_api_keys.loc[df_api_keys['API'] == 'batchdata_mock']['KEY'].iloc[0]
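
The lookup above implies that `api_keys.csv` has an `API` column and a `KEY` column. Below is a minimal sketch of the same lookup wrapped in a helper that fails with a readable message when the requested row is missing. The name `get_api_key` is hypothetical, and the in-memory frame with placeholder values stands in for the CSV file.

```python
import pandas as pd


def get_api_key(df_keys: pd.DataFrame, api_name: str) -> str:
    """Return the KEY value for the row whose API column equals api_name."""
    matches = df_keys.loc[df_keys['API'] == api_name, 'KEY']
    if matches.empty:
        raise KeyError(f"No row with API == '{api_name}' in the keys table")
    return str(matches.iloc[0])


# stand-in for the CSV read in the cell above; the key values are placeholders
df_keys_example = pd.DataFrame({
    'API': ['batchdata_mock', 'batchdata_live'],
    'KEY': ['mock-key-placeholder', 'live-key-placeholder'],
})
example_key = get_api_key(df_keys_example, 'batchdata_mock')
```

Compared with chaining `.iloc[0]` directly, a missing row raises a `KeyError` that names the API instead of an `IndexError`.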

## <font color="blue">Transformations</font>

### <font color="green">1) Single Property - Skip Trace Search</font>

In [5]:
url = 'https://stoplight.io/mocks/batchdata/batchdata/20349728/property/skip-trace'

json_data = {
  'requests': [
    {
      'propertyAddress': {
        "city": "Franklin Square",
        "street": "1011 Rosegold St",
        "state": "NY",
        "zip": "11010"
      },
    },
  ],
}

headers = {
  'Content-Type': "application/json",
  'Authorization': 'Bearer ' + batchdata_mock_api_key
}

response = requests.post(url, headers=headers, json=json_data)
response.text

'{"status":{"code":200,"text":"OK"},"results":{"persons":[{"_id":"tOdljN72SozLw","bankruptcy":{},"death":{},"dnc":{},"emails":[{"email":"john@example.net"}],"mailingAddress":{},"name":{"first":"john","last":"dow"},"phoneNumbers":[{"number":"123123123","carrier":"test carries","type":"Mobile","tested":true,"reachable":true,"score":100},{"number":"123123","carrier":"test carries","type":"Land Line","tested":false,"reachable":false,"score":95},{"number":"123123","carrier":"test carries.","type":"Land Line","tested":true,"reachable":true,"score":90},{"number":"123123","carrier":"test carries","type":"Land Line","tested":true,"reachable":true,"score":85}],"litigator":false,"propertyAddress":{"houseNumber":"1011","street":"1011 Rosegold St","city":"Franklin Square","county":"Nassau","state":"NY","zip":"11010","zipPlus4":"2507","formattedStreet":"Rosegold St","streetNoUnit":"1011 Rosegold St","hash":"b7bd4cea51b26af459febfbf822c99e2"},"involuntaryLien":[],"property":{"address":{"houseNumber":

In [6]:
response.json()['results']

{'meta': {'apiVersion': '2.10.2',
  'performance': {'endTime': '2022-04-20T11:25:01.179Z',
   'startTime': '2022-04-20T11:25:00.231Z',
   'totalRequestTime': 948},
  'requestId': '1gKSoy00kQLgmyQ',
  'results': {'errorCount': 0,
   'matchCount': 1,
   'noMatchCount': 0,
   'requestCount': 1}},
 'persons': [{'_id': 'tOdljN72SozLw',
   'bankruptcy': {},
   'death': {},
   'dnc': {},
   'emails': [{'email': 'john@example.net'}],
   'involuntaryLien': [],
   'litigator': False,
   'mailingAddress': {},
   'meta': {'error': False, 'matched': True},
   'name': {'first': 'john', 'last': 'dow'},
   'phoneNumbers': [{'carrier': 'test carries',
     'number': '123123123',
     'reachable': True,
     'score': 100,
     'tested': True,
     'type': 'Mobile'},
    {'carrier': 'test carries',
     'number': '123123',
     'reachable': False,
     'score': 95,
     'tested': False,
     'type': 'Land Line'},
    {'carrier': 'test carries.',
     'number': '123123',
     'reachable': True,
     'scor

In [7]:
_df = pd.json_normalize(response.json()['results']['persons'])
print('No of columns:', len(_df.columns))
_df

No of columns: 50


Unnamed: 0,_id,emails,phoneNumbers,litigator,involuntaryLien,name.first,name.last,propertyAddress.houseNumber,propertyAddress.street,propertyAddress.city,propertyAddress.county,propertyAddress.state,propertyAddress.zip,propertyAddress.zipPlus4,propertyAddress.formattedStreet,propertyAddress.streetNoUnit,propertyAddress.hash,property.address.houseNumber,property.address.street,property.address.city,property.address.county,property.address.state,property.address.zip,property.address.zipPlus4,property.address.localities,property.address.hash,property.address.latitude,property.address.longitude,property.address.countyFipsCode,property.address.formattedStreet,property.address.streetNoUnit,property.address.geoStatus,property.owner.name.first,property.owner.name.last,property.owner.mailingAddress.houseNumber,property.owner.mailingAddress.street,property.owner.mailingAddress.city,property.owner.mailingAddress.county,property.owner.mailingAddress.state,property.owner.mailingAddress.zip,property.owner.mailingAddress.formattedStreet,property.owner.mailingAddress.streetNoUnit,property.owner.mailingAddress.hash,property.equity,property.equityPercent,property.absenteeOwner,property.vacant,property.uspsDeliverable,meta.matched,meta.error
0,tOdljN72SozLw,[{'email': 'john@example.net'}],"[{'number': '123123123', 'carrier': 'test carr...",False,[],john,dow,1011,1011 Rosegold St,Franklin Square,Nassau,NY,11010,2507,Rosegold St,1011 Rosegold St,b7bd4cea51b26af459febfbf822c99e2,1011,1011 Rosegold St,Franklin Square,Nassau,NY,11010,2507,[franklin square],b7bd4cea51b26af459febfbf822c99e2,40.704661,-73.677661,36059,Rosegold St,1011 Rosegold St,Rooftop,john,doe,161,161 Sheridan Blvd,Mineola,Nassau,NY,11501,Sheridan Blvd,161 Sheridan Blvd,8659e28750aff33159bdc0dbc484bf6f,171469,30.2,True,False,True,True,False


In [8]:
df_phone_num = pd.DataFrame(_df["phoneNumbers"].iloc[0])
df_phone_num

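|   | number    | carrier       | type      | tested | reachable | score |
|---|-----------|---------------|-----------|--------|-----------|-------|
| 0 | 123123123 | test carries  | Mobile    | True   | True      | 100   |
| 1 | 123123    | test carries  | Land Line | False  | False     | 95    |
| 2 | 123123    | test carries. | Land Line | True   | True      | 90    |
| 3 | 123123    | test carries  | Land Line | True   | True      | 85    |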
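
The cell above expands the phone list of a single person via `.iloc[0]`. When a response holds several persons, `pd.json_normalize` can flatten every nested phone list in one call through its `record_path` and `meta` arguments. This is a sketch under the assumption that each person record carries `_id` and `phoneNumbers` as in the mock response; the sample records are trimmed from that output.

```python
import pandas as pd

# trimmed copy of the 'persons' list from the mock skip-trace response
persons = [
    {
        '_id': 'tOdljN72SozLw',
        'phoneNumbers': [
            {'number': '123123123', 'carrier': 'test carries', 'type': 'Mobile',
             'tested': True, 'reachable': True, 'score': 100},
            {'number': '123123', 'carrier': 'test carries', 'type': 'Land Line',
             'tested': False, 'reachable': False, 'score': 95},
        ],
    },
]

# one row per phone number, with the owning person's _id repeated on each row
df_phones_all = pd.json_normalize(persons, record_path='phoneNumbers', meta=['_id'])
```

Keeping `_id` on every row makes it possible to join the phone table back to the person table later.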


In [9]:
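# filter on valid phone numbers; .copy() makes the result an independent
# DataFrame so the column assignments below do not act on a slice of df_phone_num
df_ph_valid = df_phone_num.loc[
  (df_phone_num['reachable'] == True) & (df_phone_num['score'] >= 90)].copy()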
# change phone numbers for testing purposes only
df_ph_valid['number'] = [6027828692, 5633574823] # mock dnc phone numbers
df_ph_valid['number'] = df_ph_valid['number'].astype(str)
df_ph_valid

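|   | number     | carrier       | type      | tested | reachable | score |
|---|------------|---------------|-----------|--------|-----------|-------|
| 0 | 6027828692 | test carries  | Mobile    | True   | True      | 100   |
| 2 | 5633574823 | test carries. | Land Line | True   | True      | 90    |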
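
The filter in this step keeps numbers that are reachable and score at least 90. A small sketch of that rule as a reusable function follows; `filter_valid_phones` and its `min_score` parameter are hypothetical names, and the sample rows are copied from the phone table shown earlier.

```python
import pandas as pd


def filter_valid_phones(df_phones: pd.DataFrame, min_score: int = 90) -> pd.DataFrame:
    """Keep rows that are reachable and whose score is at least min_score."""
    mask = df_phones['reachable'].eq(True) & (df_phones['score'] >= min_score)
    # return a copy so callers can modify the result without touching the input
    return df_phones.loc[mask].copy()


df_phones_example = pd.DataFrame({
    'number': ['123123123', '123123', '123123', '123123'],
    'reachable': [True, False, True, True],
    'score': [100, 95, 90, 85],
})
df_valid_example = filter_valid_phones(df_phones_example)
```

Rows 0 and 2 pass the filter, matching the notebook output; row 1 is not reachable and row 3 scores below the threshold.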


In [19]:
url = "https://stoplight.io/mocks/batchdata/batchdata/20349728/phone/dnc"

json_data = {
  'requests': df_ph_valid['number'].tolist(), # pass all valid numbers
}

headers = {
  'Content-Type': "application/json",
  'Authorization': 'Bearer ' + batchdata_mock_api_key
}

dnc_response = requests.post(url, headers=headers, json=json_data)
dnc_response.json()

{'results': {'meta': {'apiVersion': '2.0',
   'performance': {'endTime': '2021-10-05T21:18:51.009Z',
    'startTime': '2021-10-05T21:18:50.576Z',
    'totalRequestTime': 433},
   'requestId': 'qTmKWCL3mDLIpmX',
   'results': {'errorCount': 0,
    'matchCount': 3,
    'noMatchCount': 0,
    'requestCount': 3}},
  'phoneNumbers': [{'dnc': False,
    'meta': {'error': False, 'matched': True},
    'number': '6027828692'},
   {'dnc': True,
    'meta': {'error': False, 'matched': True},
    'number': '5633574823'},
   {'dnc': True,
    'meta': {'error': False, 'matched': True},
    'number': '4808602153'}]},
 'status': {'code': 200, 'text': 'OK'}}
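
Both mock responses carry a `status` object next to `results`. Checking it before reading `results` is cheap, as in the sketch below. `extract_results` is a hypothetical helper, and whether error responses from the API use the same `status` shape is an assumption, not something the mock output confirms.

```python
def extract_results(payload: dict) -> dict:
    """Return payload['results'] after checking the status code in the body."""
    status = payload.get('status', {})
    if status.get('code') != 200:
        raise ValueError(f"Unexpected status in response body: {status}")
    return payload['results']


# trimmed copy of the mock DNC response shown above
example_payload = {
    'status': {'code': 200, 'text': 'OK'},
    'results': {
        'phoneNumbers': [
            {'dnc': False, 'meta': {'error': False, 'matched': True},
             'number': '6027828692'},
            {'dnc': True, 'meta': {'error': False, 'matched': True},
             'number': '5633574823'},
        ],
    },
}
example_results = extract_results(example_payload)
```

In the notebook this would be called as `extract_results(dnc_response.json())` in place of indexing `['results']` directly.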

In [12]:
# transform JSON object to a DataFrame and select relevant columns
df_dnc = pd.json_normalize(dnc_response.json()['results']['phoneNumbers'])[['number', 'dnc']]
df_dnc

|   | number     | dnc   |
|---|------------|-------|
| 0 | 6027828692 | False |
| 1 | 5633574823 | True  |
| 2 | 4808602153 | True  |


In [13]:
# convert numbers to string type
df_ph_valid['number'] = df_ph_valid['number'].astype(str)
df_dnc['number'] = df_dnc['number'].astype(str)
# merge valid and dnc tables
df_ph_valid_dnc = pd.merge(df_ph_valid, df_dnc, how='left', on=['number'])
df_ph_valid_dnc

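|   | number     | carrier       | type      | tested | reachable | score | dnc   |
|---|------------|---------------|-----------|--------|-----------|-------|-------|
| 0 | 6027828692 | test carries  | Mobile    | True   | True      | 100   | False |
| 1 | 5633574823 | test carries. | Land Line | True   | True      | 90    | True  |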


In [14]:
# filter on phone numbers that are NOT on DNC (Do Not Call) list
df_ph_valid_dnc_fltr = df_ph_valid_dnc.loc[df_ph_valid_dnc['dnc'] == False]
df_ph_valid_dnc_fltr

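|   | number     | carrier      | type   | tested | reachable | score | dnc   |
|---|------------|--------------|--------|--------|-----------|-------|-------|
| 0 | 6027828692 | test carries | Mobile | True   | True      | 100   | False |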
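
The merge and filter steps can be combined into one function, sketched below with the hypothetical name `drop_dnc_numbers`. Two details from the mock output motivate its shape. The DNC response includes `4808602153`, a number that was not in the request, so a left merge from the valid-number table keeps it out. A valid number that is absent from the DNC response gets `NaN` in `dnc`; this sketch treats such a number as not cleared, which is an assumption about the desired policy.

```python
import pandas as pd


def drop_dnc_numbers(df_phones: pd.DataFrame, df_dnc: pd.DataFrame) -> pd.DataFrame:
    """Left-merge DNC flags onto the phone table and keep rows with dnc == False."""
    merged = df_phones.merge(df_dnc[['number', 'dnc']], how='left', on='number')
    # rows without a DNC answer hold NaN, and NaN.eq(False) is False, so they are dropped
    callable_mask = merged['dnc'].eq(False)
    return merged.loc[callable_mask].reset_index(drop=True)


df_phones_example = pd.DataFrame({
    'number': ['6027828692', '5633574823', '123123'],  # last one has no DNC answer
    'score': [100, 90, 90],
})
df_dnc_example = pd.DataFrame({
    'number': ['6027828692', '5633574823', '4808602153'],
    'dnc': [False, True, True],
})
df_callable_example = drop_dnc_numbers(df_phones_example, df_dnc_example)
```

Both `number` columns must share a dtype for the merge keys to match, which is why the notebook casts them to `str` first.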


In [15]:
# create copy of df
df = _df.copy()

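# create phone number columns (phone1, phone2, ...)
# iterate through the list of valid numbers; enumerate advances the counter
# so each number gets its own column instead of overwriting phone1
for i, num in enumerate(df_ph_valid_dnc_fltr['number'].tolist(), start=1):
  df['phone' + str(i)] = [num] # create column

# create email column from the first entry of the emails list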
df['emails'] = df.apply(lambda x: x['emails'][0], axis=1)
df = pd.concat([df, df["emails"].apply(pd.Series)], axis=1)
df

Unnamed: 0,_id,emails,phoneNumbers,litigator,involuntaryLien,name.first,name.last,propertyAddress.houseNumber,propertyAddress.street,propertyAddress.city,propertyAddress.county,propertyAddress.state,propertyAddress.zip,propertyAddress.zipPlus4,propertyAddress.formattedStreet,propertyAddress.streetNoUnit,propertyAddress.hash,property.address.houseNumber,property.address.street,property.address.city,property.address.county,property.address.state,property.address.zip,property.address.zipPlus4,property.address.localities,property.address.hash,property.address.latitude,property.address.longitude,property.address.countyFipsCode,property.address.formattedStreet,property.address.streetNoUnit,property.address.geoStatus,property.owner.name.first,property.owner.name.last,property.owner.mailingAddress.houseNumber,property.owner.mailingAddress.street,property.owner.mailingAddress.city,property.owner.mailingAddress.county,property.owner.mailingAddress.state,property.owner.mailingAddress.zip,property.owner.mailingAddress.formattedStreet,property.owner.mailingAddress.streetNoUnit,property.owner.mailingAddress.hash,property.equity,property.equityPercent,property.absenteeOwner,property.vacant,property.uspsDeliverable,meta.matched,meta.error,phone1,email
0,tOdljN72SozLw,{'email': 'john@example.net'},"[{'number': '123123123', 'carrier': 'test carr...",False,[],john,dow,1011,1011 Rosegold St,Franklin Square,Nassau,NY,11010,2507,Rosegold St,1011 Rosegold St,b7bd4cea51b26af459febfbf822c99e2,1011,1011 Rosegold St,Franklin Square,Nassau,NY,11010,2507,[franklin square],b7bd4cea51b26af459febfbf822c99e2,40.704661,-73.677661,36059,Rosegold St,1011 Rosegold St,Rooftop,john,doe,161,161 Sheridan Blvd,Mineola,Nassau,NY,11501,Sheridan Blvd,161 Sheridan Blvd,8659e28750aff33159bdc0dbc484bf6f,171469,30.2,True,False,True,True,False,6027828692,john@example.net


In [17]:
output_cols = ['name.first', 'name.last',
       'propertyAddress.street', 'propertyAddress.city',
       'propertyAddress.county', 'propertyAddress.state',
       'propertyAddress.zip', 'property.equity',
       'property.equityPercent', 'property.absenteeOwner', 'property.vacant',
       'property.uspsDeliverable', 'phone1', 'email']

df_output = df[output_cols]
df_output

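|   | name.first | name.last | propertyAddress.street | propertyAddress.city | propertyAddress.county | propertyAddress.state | propertyAddress.zip | property.equity | property.equityPercent | property.absenteeOwner | property.vacant | property.uspsDeliverable | phone1 | email |
|---|------------|-----------|------------------------|----------------------|------------------------|-----------------------|---------------------|-----------------|------------------------|------------------------|-----------------|--------------------------|--------|-------|
| 0 | john | dow | 1011 Rosegold St | Franklin Square | Nassau | NY | 11010 | 171469 | 30.2 | True | False | True | 6027828692 | john@example.net |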


### <font color="green">2) Multi Property - Skip Trace Search</font>
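
This section is empty in the notebook. As a starting point, here is a sketch of how the single-property request body could be extended to several addresses by adding one `propertyAddress` entry per row to the `requests` list. Since `requests` is a list in the single-property call, this looks plausible, but whether the API accepts several entries per call is an assumption to check against the BatchData docs. `build_skip_trace_payload` is a hypothetical helper, and both sample addresses are taken from the mock output above.

```python
import pandas as pd


def build_skip_trace_payload(df_addresses: pd.DataFrame) -> dict:
    """Build a request body with one propertyAddress entry per DataFrame row."""
    address_fields = ['street', 'city', 'state', 'zip']
    # cast to str so zip codes are sent as text rather than integers
    records = df_addresses[address_fields].astype(str).to_dict(orient='records')
    return {'requests': [{'propertyAddress': rec} for rec in records]}


df_addresses_example = pd.DataFrame({
    'street': ['1011 Rosegold St', '161 Sheridan Blvd'],
    'city': ['Franklin Square', 'Mineola'],
    'state': ['NY', 'NY'],
    'zip': ['11010', '11501'],
})
multi_payload = build_skip_trace_payload(df_addresses_example)
```

The resulting dictionary can be passed as `json=multi_payload` to the same `requests.post` call used in section 1. The mock server returns a canned response, so it cannot confirm that every address was processed.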

## <font color="blue">Output</font>

In [18]:
# download file
df_output.to_csv('output.csv', index=False)
files.download('output.csv')


# End Notebook