# This is a combination of 62 commits.
# This is the 1st commit message:
init

This is the initial commit for transportation-data-publishing. Actually, it's a squash of about 70 commits that are now prehistory. To new beginnings...

# This is the commit message #2:

refactor for config life

# This is the commit message #3:

ignore some bs notes

# This is the commit message #4:

delete a line

# This is the commit message #5:

a happy village of python scripts living working loving dying

# This is the commit message #6:

ping and update, then move on

# This is the commit message #7:

write device data to json

# This is the commit message #8:

write to json

# This is the commit message #9:

better config and mind numbing

# This is the commit message #10:

tiny bugs

# This is the commit message #11:

disable pandas csv for now

# This is the commit message #12:

proper log path join

# This is the commit message #13:

restore csv writing with pandas

# This is the commit message #14:

hello worlds

# This is the commit message #15:

hello worlds

# This is the commit message #16:

progress

# This is the commit message #17:

progress

# This is the commit message #18:

progress

# This is the commit message #19:

kits sync

# This is the commit message #20:

hardly working

# This is the commit message #21:

sync knack > kits cctv

# This is the commit message #22:

modules aren't subscriptable bro

# This is the commit message #23:

log dest

# This is the commit message #24:

new log dest

# This is the commit message #25:

whoops debugging

# This is the commit message #26:

signal status pusher for the modern world

# This is the commit message #27:

a tree-hugging logger

# This is the commit message #28:

teach a naive date about the world

# This is the commit message #29:

traffic count processing

# This is the commit message #30:

output directory

# This is the commit message #31:

a quote with a primary key

# This is the commit message #32:

replaced with knack_data_pub

# This is the commit message #33:

query anything!

# This is the commit message #34:

agol > CIFS translation

# This is the commit message #35:

we have a feed

# This is the commit message #36:

file and dataset replacers

# This is the commit message #37:

remove this file

# This is the commit message #38:

logging and email alerts

# This is the commit message #39:

email alerts and traceback logging

# This is the commit message #40:

email alerts and traceback logging

# This is the commit message #41:

fix response parsing keyerror

# This is the commit message #42:

rename signal eng area

# This is the commit message #43:

trace it back

# This is the commit message #44:

handle empty features pt. 1

# This is the commit message #45:

logging and email

# This is the commit message #46:

cleanup

# This is the commit message #47:

logging, etc. ready to deploy

# This is the commit message #48:

rename row_id

# This is the commit message #49:

utc date formatting

# This is the commit message #50:

update for VM deploy

# This is the commit message #51:

happy loaders

# This is the commit message #52:

wip: cleanup

# This is the commit message #53:

WIP: cifs cleanup

# This is the commit message #54:

WIP: in-memory file writing

# This is the commit message #55:

WIP: tabular upsert instead of replace

# This is the commit message #56:

WIP: strip whitespace from input text

# This is the commit message #57:

WIP: use replace instead of strip when parsing csv

# This is the commit message #58:

WIP: logging, email alerts, ready to deploy

# This is the commit message #59:

WIP: more date/time fields

# This is the commit message #60:

WIP: del whitespace

# This is the commit message #61:

Format datetime as ISO 8601 with local timestamp

Prior to this change the datetime fields were stored in UTC time.
With this change datetime fields are formatted according to ISO 8601, with the
US/Central timezone explicitly indicated in the field value, e.g. 2016-06-15T10:00:00-05:00. The datasets also have a TIME field, which was and continues to be stored as hours, minutes, and seconds, in local time. Although we do not explicitly indicate that the TIME field is in local time, it is obvious thanks to the corresponding datetime field, which does indicate the timezone.
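For illustration, a minimal sketch of the convention using arrow, which these scripts already depend on (the sample value is hypothetical):

```python
import arrow

# a naive local timestamp as it might arrive from a source system (hypothetical value)
naive = '2016-06-15 10:00:00'

# attach the US/Central timezone without shifting the clock time,
# then format as ISO 8601 with the UTC offset included
local = arrow.get(naive, 'YYYY-MM-DD HH:mm:ss').replace(tzinfo='US/Central')
print(local.isoformat())         # 2016-06-15T10:00:00-05:00

# the corresponding TIME field stays as local hours, minutes, and seconds
print(local.format('HH:mm:ss'))  # 10:00:00
```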

# This is the commit message #62:

WIP: tabular upsert instead of replace
johnclary committed Jul 20, 2017
0 parents commit a015c08
Showing 30 changed files with 4,554 additions and 0 deletions.
17 changes: 17 additions & 0 deletions .gitignore
@@ -0,0 +1,17 @@
.env
*.pyc
.DS_Store
__pycache__
*.zip
dist
secrets.py
schtask.txt
intersection-database
mapDrive.bat
bcycle_1.py
to_test.py
tp.py
script_status.md
log
shell_scripts
*.log
29 changes: 29 additions & 0 deletions README.md
@@ -0,0 +1,29 @@
# transportation-data-publishing

This repo houses ETL scripts for Austin Transportation's open data projects. They're written in Python.

Check out [sig_pub.py](https://github.com/cityofaustin/transportation-data-publishing/blob/master/sig_pub.py) to see the scripts be happy together.

#### ArcGIS Online Helpers (agol_helpers.py)
Query, add, and delete features from an ArcGIS Online Feature Service

#### Data Helpers (data_helpers.py)
Handy bits of code for common ETL tasks, mostly borrowed from Stack Overflow snippets.

#### Socrata Helpers (socrata_helpers.py)
Use the Socrata Open Data API to publish #opendata.
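A minimal upsert sketch with `requests` (the dataset ID, credentials, and payload below are hypothetical):

```python
import requests

# hypothetical Socrata credentials and dataset ID
auth = ('publisher@austintexas.gov', 'password')
url = 'https://data.austintexas.gov/resource/xxxx-xxxx.json'

records = [{'signal_id': '1', 'status': 'OK'}]  # hypothetical payload

# POSTing a JSON array to a resource endpoint upserts rows
# against the dataset's row identifier
res = requests.post(url, json=records, auth=auth, headers={'X-App-Token': 'app-token'})
res.raise_for_status()
print(res.json())  # counts of rows created, updated, and deleted
```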

#### Knack Helpers (knack_helpers.py)
Scripts for accessing the [Knack API](http://knack.freshdesk.com/support/solutions/articles/5000444173-working-with-the-api).
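A hedged sketch of a raw request to that API (the application ID, API key, and object key are hypothetical):

```python
import requests

# hypothetical Knack application credentials
headers = {
    'X-Knack-Application-Id': 'app_id',
    'X-Knack-REST-API-Key': 'api_key',
}

# fetch one page of records for a Knack object
url = 'https://api.knack.com/v1/objects/object_11/records'
res = requests.get(url, headers=headers, params={'rows_per_page': 1000})
res.raise_for_status()
print(len(res.json()['records']))
```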

#### Email Helpers (email_helpers.py)
Helpers for sending emails with [yagmail](https://github.com/kootenpv/yagmail)
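For example (the sender credentials and recipient are hypothetical):

```python
import yagmail

# hypothetical sender credentials and recipient
yag = yagmail.SMTP('sender@gmail.com', 'app-password')
yag.send(
    to='alerts@austintexas.gov',
    subject='data backup exception',
    contents='Data backup failed when writing csv',
)
```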

#### KITS Helpers (kits_helpers.py)
Scripts for accessing the KITS SQL database which supports Austin Transportation's Advanced Traffic Management System (ATMS).
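A connection sketch with `pymssql` (an assumed driver; the server, credentials, and table name are hypothetical):

```python
import pymssql

# hypothetical connection parameters for the KITS SQL Server database
conn = pymssql.connect(server='kits-db', user='user', password='password', database='KITS')

cursor = conn.cursor(as_dict=True)
cursor.execute('SELECT TOP 10 * FROM CAMERA')  # hypothetical table name
for row in cursor:
    print(row)

conn.close()
```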

#### Fake Secrets (fake_secrets.py)
Reference file for setting up secrets.py

#### GitHub Helpers (github_helpers.py)
Helpers for committing to GitHub programmatically. Code borrowed from @luqmaan and @open-austin's [Construction Permits](https://github.com/open-austin/construction-permits) project.
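A hedged sketch of the underlying GitHub contents API call (the repo, file path, and token are hypothetical; updating an existing file additionally requires its current blob `sha`):

```python
import base64
import requests

# hypothetical repo, file path, and personal access token
url = 'https://api.github.com/repos/cityofaustin/some-repo/contents/data/feed.json'
payload = {
    'message': 'update feed',
    'content': base64.b64encode(b'{"features": []}').decode(),
    'branch': 'master',
}
res = requests.put(url, json=payload, headers={'Authorization': 'token <personal-access-token>'})
res.raise_for_status()
```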
190 changes: 190 additions & 0 deletions agol_helpers.py
@@ -0,0 +1,190 @@
import json
import logging

import requests


logger = logging.getLogger(__name__)


def get_token(creds):
    print('Generate token')
    url = 'https://austin.maps.arcgis.com/sharing/rest/generateToken'
    params = {'username': creds['user'], 'password': creds['password'], 'referer': 'http://www.arcgis.com', 'f': 'pjson'}
    res = requests.post(url, params=params)
    res = res.json()
    token = res['token']
    return token


def query_all_features(url, token):
    url = url + 'query'
    where = 'OBJECTID>0'
    params = {'f': 'json', 'where': where, 'outFields': '*', 'token': token, 'returnGeometry': False}
    res = requests.post(url, params=params)
    res = res.json()
    return res


def query_layer(url, params):
    url = url + 'query'
    res = requests.post(url, params=params)
    res = res.json()
    return res


def add_features(url, token, payload):
    print('add new features to ArcGIS Online feature service')
    url = url + 'addFeatures'
    params = {'f': 'json', 'features': json.dumps(payload), 'token': token}
    res = requests.post(url, data=params)
    res = res.json()

    if 'addResults' not in res:
        print('FAIL!')
    else:
        print('{} features added'.format(len(res['addResults'])))

    return res



def delete_features(url, token):
    print('delete all existing ArcGIS Online features')
    url = url + 'deleteFeatures'
    where = 'OBJECTID>0'
    params = {'f': 'json', 'where': where, 'token': token, 'returnGeometry': False}
    res = requests.post(url, params=params)
    res = res.json()
    success = 0
    fail = 0

    for feature in res['deleteResults']:
        if feature['success']:
            success += 1
        else:
            fail += 1

    print('{} features deleted and {} features failed to delete'.format(success, fail))

    return res



def build_payload(data, **options):
    # assemble an ArcREST feature object dictionary
    # spec: http://resources.arcgis.com/en/help/arcgis-rest-api/#/Feature_object/02r3000000n8000000/
    # when require_locations is set, records without a 'LATITUDE' field are skipped
    print('build data payload')

    if 'require_locations' not in options:
        options['require_locations'] = False

    payload = []

    for record in data:
        new_record = {'attributes': {}, 'geometry': {'spatialReference': {'wkid': 4326}}}

        if options['require_locations'] and 'LATITUDE' not in record:
            continue

        for attribute in record:
            if attribute == 'LATITUDE':
                new_record['geometry']['y'] = record[attribute]
            elif attribute == 'LONGITUDE':
                new_record['geometry']['x'] = record[attribute]

            new_record['attributes'][attribute] = record[attribute]

        payload.append(new_record)

    return payload



def parse_attributes(query_results):
    print('parse feature attributes')
    results = []

    for record in query_results['features']:
        results.append(record['attributes'])

    return results



def query_atx_street(segment_id):
    print('query atx street segment {}'.format(segment_id))

    url = 'http://services.arcgis.com/0L95CJ0VTaxqcmED/arcgis/rest/services/TRANSPORTATION_street_segment/FeatureServer/0/query'

    where = 'SEGMENT_ID={}'.format(segment_id)

    params = {'f': 'json', 'where': where, 'returnGeometry': False, 'outFields': '*'}

    res = requests.post(url, params=params)
    res = res.json()

    if 'features' in res and len(res['features']) > 0:
        return res['features'][0]['attributes']

    return None



def point_in_poly(service_name, layer_id, point_geom, outfields):
    # check if a point is within a polygon feature and return the
    # attributes of the containing feature
    # assumes the input geometry spatial reference is WGS84
    print('point in poly: {}'.format(service_name))
    point = '{},{}'.format(point_geom[0], point_geom[1])

    outfields = ','.join(str(e) for e in outfields)

    query_url = 'http://services.arcgis.com/0L95CJ0VTaxqcmED/ArcGIS/rest/services/{}/FeatureServer/{}/query'.format(service_name, layer_id)

    params = {'f': 'json', 'outFields': outfields, 'geometry': point, 'returnGeometry': False, 'spatialRel': 'esriSpatialRelIntersects', 'inSR': 4326, 'geometryType': 'esriGeometryPoint'}

    res = requests.get(query_url, params=params)
    res = res.json()

    if 'features' in res:
        if len(res['features']) > 0:
            return res['features'][0]['attributes']
        return ''

    raise ValueError('point in poly request failure')



def parse_response(res_msg, req_type):
    print('parse response')
    success = 0
    fail = 0

    for record in res_msg['{}Results'.format(req_type)]:
        if 'success' in record:
            success += 1
        else:
            fail += 1

    return {
        "success": success,
        "fail": fail
    }
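For reference, a hedged sketch of how these helpers chain together (the feature service URL and the shape of `secrets.AGOL_CREDENTIALS` are assumptions; the helpers expect the URL to end at the layer path with a trailing slash):

```python
import agol_helpers
import secrets  # local module holding credentials, per fake_secrets.py

# hypothetical feature service layer URL
SERVICE_URL = 'https://services.arcgis.com/0L95CJ0VTaxqcmED/arcgis/rest/services/MY_SERVICE/FeatureServer/0/'

token = agol_helpers.get_token(secrets.AGOL_CREDENTIALS)  # assumed {'user': ..., 'password': ...}

# pull everything down, flatten to plain attribute dicts, then push back up
res = agol_helpers.query_all_features(SERVICE_URL, token)
records = agol_helpers.parse_attributes(res)

payload = agol_helpers.build_payload(records, require_locations=True)
agol_helpers.add_features(SERVICE_URL, token, payload)
```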
87 changes: 87 additions & 0 deletions backup_objs.py
@@ -0,0 +1,87 @@
if __name__ == '__main__' and __package__ is None:
    import sys
    from os import path
    sys.path.append(path.dirname(path.dirname(path.abspath(__file__))))

import logging

import arrow

import knack_helpers
import email_helpers
import data_helpers
import secrets


now = arrow.now()
now_s = now.format('YYYY_MM_DD')

log_directory = secrets.LOG_DIRECTORY
logfile = '{}/{}_{}.log'.format(log_directory, 'backup_objs', now_s)
logging.basicConfig(filename=logfile, level=logging.INFO)
logging.info('START AT {}'.format(str(now)))

objects = [
    'object_11', 'object_53', 'object_56', 'object_12', 'object_21',
    'object_14', 'object_13', 'object_26', 'object_27', 'object_29',
    'object_36', 'object_63', 'object_31', 'object_70', 'object_35',
    'object_37', 'object_41', 'object_42', 'object_43', 'object_45',
    'object_58', 'object_82', 'object_81', 'object_78', 'object_84',
    'object_85', 'object_89', 'object_91',
]

backup_directory = secrets.BACKUP_DIRECTORY
knack_credentials = secrets.KNACK_CREDENTIALS


def main(date_time):

    try:
        count = 0

        for obj in objects:
            logging.info('backup {}'.format(obj))

            # get field metadata
            fields = knack_helpers.get_all_fields(obj, knack_credentials)

            # assign field metadata to 'raw' field name
            field_list = {}
            for field in fields:
                field_list[field['key'] + '_raw'] = field

            # get knack object data
            data = knack_helpers.get_object_data(obj, knack_credentials)
            logging.info('total records: {}'.format(len(data)))

            # parse data
            parsed = knack_helpers.parse_data(data, field_list, convert_to_unix=True, include_ids=True)

            today = date_time.format('YYYY_MM_DD')

            file_name = '{}/{}_{}.csv'.format(backup_directory, obj, today)

            try:
                data_helpers.write_csv(parsed, file_name=file_name)

            except Exception as e:
                print(e)
                body = 'Data backup failed when writing csv'
                email_helpers.send_email(secrets.ALERTS_DISTRIBUTION, 'data backup exception', body)
                raise e

            count += 1

        return count

    except Exception as e:
        print('Failed to process data for {}'.format(date_time))
        print(e)
        body = 'Data backup failed'
        email_helpers.send_email(secrets.ALERTS_DISTRIBUTION, 'data backup exception', body)
        raise e


r = main(now)

print('{} objects written to file'.format(r))

