This repository has been archived by the owner on Aug 21, 2019. It is now read-only.

Jobs API can take multiple comma-separated organization codes
- Created v3 of jobs API
- Updated README to point at developer API docs, and DRY'd up stuff that appeared in both places
- Removed API v1 support
- Switched to using Agency API v2
- tags param supports both space and comma delimiters

Closes #1
loren committed Jan 22, 2015
1 parent fcfb450 commit 2ced859
Showing 16 changed files with 473,249 additions and 343 deletions.
109 changes: 21 additions & 88 deletions README.md
@@ -4,19 +4,10 @@ Jobs API Server
[![Build Status](https://travis-ci.org/GSA/jobs_api.png)](https://travis-ci.org/GSA/jobs_api)
[![Coverage Status](https://coveralls.io/repos/GSA/jobs_api/badge.png?branch=master)](https://coveralls.io/r/GSA/jobs_api?branch=master)

The unemployment rate has hovered around 8 percent since early 2012. So, not surprisingly, many people are hitting the web to search for jobs. Federal, state, and local government agencies are hiring and have thousands of job openings across the country.
The server code that runs the DigitalGov [Jobs API](http://search.digitalgov.gov/developer/jobs.html) is here on Github. If you're a Ruby developer, keep reading. Fork this repo to add features (such as additional datasets) or fix bugs.

## Current Version

You are reading documentation for Jobs API v2. Documentation for v1 is available [here](https://github.com/GSA/jobs_api/tree/v1).

## Access the Data

Use our [Jobs API](http://usasearch.howto.gov/developer/jobs.html) to tap into a list of current jobs openings with the government. Jobs are searchable by keyword, location, agency, schedule, or any combination of these.

## Contribute to the Code

The server code that runs our [Jobs API](http://usasearch.howto.gov/developer/jobs.html) is here on Github. If you're a Ruby developer, keep reading. Fork this repo to add features (such as additional datasets) or fix bugs.
The documentation on request parameters and response format is on the [API developer page](http://search.digitalgov.gov/developer/jobs.html).
This README just covers software development of the API service itself.

### Ruby

@@ -29,27 +20,30 @@ We use bundler to manage gems. You can install bundler and other required gems l
gem install bundler
bundle install

### ElasticSearch
### Elasticsearch

We're using [ElasticSearch](http://www.elasticsearch.org/) (>= 1.4.0) for fulltext search. On a Mac, it's easy to install with [Homebrew](http://mxcl.github.com/homebrew/).
We're using [Elasticsearch](http://www.elasticsearch.org/) (>= 1.4.0) for fulltext search. On a Mac, it's easy to install with [Homebrew](http://mxcl.github.com/homebrew/).

$ brew install elasticsearch

Otherwise, follow the [instructions](http://www.elasticsearch.org/download/) to download and run it.

### Geonames

We use the United States location data from [Geonames.org](http://www.geonames.org) to help geocode the locations of each job position. By assigning latitude and longitude coordinates to each position location, we can sort job results based on proximity to the user's location, provided that information is sent in with the request.
We use the United States location data from [Geonames.org](http://www.geonames.org) to help geocode the locations of each job position. By assigning latitude and longitude coordinates to each position location, we can sort job results based on proximity to the searcher's location, provided that information is sent in with the request.

Download and extract the 'US.txt' file from [the Geonames archive](http://download.geonames.org/export/dump/US.zip), and import it into ElasticSearch.
The 'US.txt' file from [the Geonames archive](http://download.geonames.org/export/dump/US.zip) contains geocoding information for many entities that we aren't interested in for the purpose of government jobs (e.g., canals, churches), so we pick out just what we need in order to keep the index small with this AWK script:

bundle exec rake geonames:import[/path/to/US.txt]
awk -F $'\\t' '$8 ~ /PPL|ADM\d?|PRK|BLDG|AIR|INSM/' US.txt > doc/filtered_US.txt

The file contains geocoding information for many entities that we aren't interested in for the purpose of government jobs (e.g., canals, churches), so we pick out just what we need in order to keep the index small with this AWK script:
This includes populated places, administrative areas, parks, buildings, airports, and military bases.
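For illustration, the same filter can be sketched in Ruby (this is not part of the repo — the project uses the AWK one-liner above):

```ruby
# Illustrative Ruby equivalent of the AWK filter (not part of the repo).
# Geonames dump rows are tab-separated; the 8th field is the feature code.
KEEP_FEATURE_CODES = /PPL|ADM\d?|PRK|BLDG|AIR|INSM/

def keep_geonames_row?(line)
  feature_code = line.chomp.split("\t")[7].to_s
  # Like the AWK version, this matches the pattern anywhere in the field,
  # so subtypes such as PPLA (a populated place that is a seat of
  # government) also pass the filter.
  !!(feature_code =~ KEEP_FEATURE_CODES)
end
```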

awk -F $'\\t' '$8 ~ /PPL|ADM\d?|PRK|BLDG|AIR|INSM/' US.txt > filtered_US.txt
You can download, unzip, and filter a more recent version of the file if you like, or you can import the one in this repo to get started:

This includes populated places, administrative areas, parks, buildings, airports, and military bases.
bundle exec rake geonames:import[doc/filtered_US.txt]

If you are running Elasticsearch with the default 1g JVM heap, this import process will be pretty slow.
You may want to consider [allocating more memory](http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/heap-sizing.html) to Elasticsearch.

### Seed jobs data

@@ -74,80 +68,23 @@ Fire up a server and try it all out.

<http://127.0.0.1:3000/search.json?query=nursing+jobs&organization_id=VATA&hl=1>
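With v3, multiple comma-separated organization codes can be passed via `organization_ids`. A quick Ruby sketch of building such a request follows; the second organization code is a made-up placeholder, shown only to illustrate the comma-separated form:

```ruby
require 'uri'

# Build a v3 search URL with multiple comma-separated organization codes.
# 'VATA' is from the example above; 'XX11' is a hypothetical placeholder.
uri = URI('http://127.0.0.1:3000/search.json')
uri.query = URI.encode_www_form(query: 'nursing jobs',
                                organization_ids: 'VATA,XX11',
                                hl: 1)
puts uri
# To perform the request (requires the local server to be running):
#   require 'net/http'; body = Net::HTTP.get(uri)
```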

### Parameters

These parameters are accepted:

1. query
2. organization_id
3. hl [highlighting]
4. size
5. from
6. tags
7. lat_lon

Full documentation on the parameters is in our [Jobs API documentation](http://usasearch.howto.gov/developer/jobs.html#parameters).

### Results

* `id`: Job identifier.
* `position_title`: The brief title of the job.
* `organization_name`: The full name of the hiring organization.
* `minimum, maximum`: The remuneration range for this position.
* `rate_interval_code`: This two-letter code specifies the frequency of payment, usually yearly or hourly. The full list of possibilities is [here](https://schemas.usajobs.gov/Enumerations/CodeLists.xml), about halfway down the page.
* `start_date, end_date`: The application period for this position.
* `locations`: Note that a job opening can have multiple locations associated with it.
* `url`: The official listing for the job.

Sample results:

[
{
"id": "usajobs:327358300",
"position_title": "Student Nurse Technicians",
"organization_name": "Veterans Affairs, Veterans Health Administration",
"minimum": 27,
"maximum": 34,
"rate_interval_code": "PH",
"start_date": "2012-12-29",
"end_date": "2013-2-28",
"locations": [
"Odessa, TX",
"Fairfax, VA",
"San Angelo, TX",
"Abilene, TX"
],
"url": "https://www.usajobs.gov/GetJob/ViewDetails/327358300"
},
{
"id": "usajobs:325054900",
"position_title": "Physician (Surgical Critical Care)",
"organization_name": "Veterans Affairs, Veterans Health Administration",
"minimum": 100000,
"maximum": 150000,
"rate_interval_code": "PA",
"start_date": "2012-12-29",
"end_date": "2013-2-28",
"locations": [
"Charleston, SC"
],
"url": "https://www.usajobs.gov/GetJob/ViewDetails/325054900"
}
]
### Parameters and Results

Full documentation on the parameters and result format is in our [Jobs API documentation](http://search.digitalgov.gov/developer/jobs.html).

### Expiration

When a job opening's application end date has passed, the opening is automatically purged from the index and won't show up in search results.
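This purging presumably leans on the `_ttl` field enabled in the Elasticsearch mapping. As a rough sketch (an assumption about the mechanism, not the repo's actual code), a TTL could be derived from an opening's end date like this:

```ruby
require 'date'

# Hypothetical sketch: derive an Elasticsearch TTL from a job's end_date
# so the document expires shortly after the application period closes.
def ttl_for(end_date, today: Date.today)
  days_left = (Date.parse(end_date) - today).to_i
  # If the end date has already passed, expire almost immediately.
  days_left.positive? ? "#{days_left}d" : '1m'
end
```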

### API Versioning

We support API versioning with JSON format. The current version is v2. You can specify a specific JSON API version like this:
We support API versioning with JSON format. The current/default version is v3. You can specify a specific JSON API version like this:

curl -H 'Accept: application/vnd.usagov.position_openings.v2' http://localhost:3000/search.json?query=jobs
curl -H 'Accept: application/vnd.usagov.position_openings.v3' http://localhost:3000/search.json?query=jobs

### Tests

These require an [ElasticSearch](http://www.elasticsearch.org/) server to be running.
These require an [Elasticsearch](http://www.elasticsearch.org/) server to be running.

bundle exec rake spec

@@ -157,11 +94,7 @@ We track test coverage of the codebase over time, to help identify areas where w

After running your tests, view the report by opening `coverage/index.html`.

Click around on the files that have < 100% coverage to see what lines weren't exercised.

## Terms of Use

By accessing this Jobs API server, you agree to our [Terms of Service](http://www.usa.gov/About/developer-resources/terms-of-service.shtml).
Click around on the files that have less than 100% coverage to see what lines weren't exercised by the tests.

Feedback
--------
30 changes: 17 additions & 13 deletions app/classes/query.rb
@@ -1,15 +1,15 @@
class Query

JOB_KEYWORD_TOKENS = '(position|job|employment|career|trabajo|puesto|empleo|vacante)s?'.freeze
NON_CAPTURING_JOB_KEYWORD_TOKENS = JOB_KEYWORD_TOKENS.sub('(','(?:')
NON_CAPTURING_JOB_KEYWORD_TOKENS = JOB_KEYWORD_TOKENS.sub('(', '(?:')
STOPWORDS = 'appl(y|ications?)|for|the|a|and|available|gov(ernment)?|usa|current|civilian|fed(eral)?|(usajob|opening|posting|description|announcement|listing)s?|(opportunit|vacanc)(y|ies)|search(es)?|(posicion|ocupacion|oportunidad|federal)es|gobierno'.freeze

attr_accessor :location, :organization_id, :keywords, :position_offering_type_code, :position_schedule_type_code, :rate_interval_code
attr_accessor :location, :organization_ids, :keywords, :position_offering_type_code, :position_schedule_type_code, :rate_interval_code

def initialize(query, organization_id)
organization_id.upcase! if organization_id.present?
def initialize(query, organization_ids)
organization_ids.each(&:upcase!) if organization_ids.present?
self.keywords = parse(normalize(query)) if query.present?
self.organization_id ||= organization_id
self.organization_ids ||= organization_ids
end

def has_state?
@@ -21,12 +21,16 @@ def has_city?
end

def valid?
keywords.present? || location.present? || organization_id.present? ||
keywords.present? || location.present? || organization_ids.present? ||
position_offering_type_code.present? || position_schedule_type_code.present? || rate_interval_code.present?
end

def organization_format
organization_id.length == 2 ? :prefix : :term
def organization_prefixes
organization_ids.select { |str| str.length == 2 }
end

def organization_terms
organization_ids.select { |str| str.length > 2 }
end

private
@@ -45,12 +49,12 @@ def parse(query)
nil
end
query.gsub!(/ ?(at|with) (.*) in (.*)/) do
self.organization_id = Agencies.find_organization_id($2)
self.organization_ids = Agencies.find_organization_ids($2)
self.location = Location.new($3)
nil
end
query.gsub!(/ ?(at|with) (.*)/) do
self.organization_id = Agencies.find_organization_id($2)
self.organization_ids = Agencies.find_organization_ids($2)
nil
end
query.gsub!(/ ?in (.*)/) do
@@ -61,16 +65,16 @@ def parse(query)
self.location = Location.new(location_str)
query.gsub!(location_str, '')
end
if self.organization_id.nil? && (possible_org = extract_possible_org(query))
if (self.organization_id = Agencies.find_organization_id(possible_org))
if self.organization_ids.nil? && (possible_org = extract_possible_org(query))
if (self.organization_ids = Agencies.find_organization_ids(possible_org))
query.gsub!(possible_org, '')
end
end
query.gsub(/\b#{JOB_KEYWORD_TOKENS}\b/, '').squish
end

def normalize(query)
query.downcase.gsub('.','').gsub(/[^0-9a-z \-]/, ' ').gsub(/\b(#{Date.current.year}|#{STOPWORDS})\b/, ' ').squish
query.downcase.gsub('.', '').gsub(/[^0-9a-z \-]/, ' ').gsub(/\b(#{Date.current.year}|#{STOPWORDS})\b/, ' ').squish
end

def extract_possible_org(query)
16 changes: 0 additions & 16 deletions app/controllers/api/v1/position_openings_controller.rb

This file was deleted.

10 changes: 10 additions & 0 deletions app/controllers/api/v3/position_openings_controller.rb
@@ -0,0 +1,10 @@
class Api::V3::PositionOpeningsController < ApplicationController
include NewRelic::Agent::Instrumentation::ControllerInstrumentation

def search
@position_openings = PositionOpening.search_for(params.slice(:query, :organization_ids, :tags, :size, :from, :hl, :lat_lon))
render
end

add_transaction_tracer :search
end
70 changes: 43 additions & 27 deletions app/models/position_opening.rb
@@ -14,37 +14,37 @@ def create_search_index
settings: {
index: {
analysis: {
analyzer: {custom_analyzer: {type: 'custom', tokenizer: 'whitespace', filter: %w(standard lowercase synonym snowball)}},
filter: {synonym: {type: 'synonym', synonyms: SYNONYMS}}
analyzer: { custom_analyzer: { type: 'custom', tokenizer: 'whitespace', filter: %w(standard lowercase synonym snowball) } },
filter: { synonym: { type: 'synonym', synonyms: SYNONYMS } }
}
}
},
mappings: {
position_opening: {
_timestamp: {enabled: true},
_ttl: {enabled: true},
_timestamp: { enabled: true },
_ttl: { enabled: true },
properties: {
type: {type: 'string'},
source: {type: 'string', index: :not_analyzed},
tags: {type: 'string', analyzer: 'keyword'},
external_id: {type: 'integer'},
position_title: {type: 'string', analyzer: 'custom_analyzer', term_vector: 'with_positions_offsets', store: true},
organization_id: {type: 'string', analyzer: 'keyword'},
organization_name: {type: 'string', index: :not_analyzed},
type: { type: 'string' },
source: { type: 'string', index: :not_analyzed },
tags: { type: 'string', analyzer: 'keyword' },
external_id: { type: 'integer' },
position_title: { type: 'string', analyzer: 'custom_analyzer', term_vector: 'with_positions_offsets', store: true },
organization_id: { type: 'string', analyzer: 'keyword' },
organization_name: { type: 'string', index: :not_analyzed },
locations: {
type: 'nested',
properties: {
city: {type: 'string', analyzer: 'simple'},
state: {type: 'string', analyzer: 'keyword'},
geo: {type: 'geo_point'}}},
start_date: {type: 'date', format: 'YYYY-MM-dd'},
end_date: {type: 'date', format: 'YYYY-MM-dd'},
minimum: {type: 'float'},
maximum: {type: 'float'},
position_offering_type_code: {type: 'integer'},
position_schedule_type_code: {type: 'integer'},
rate_interval_code: {type: 'string', analyzer: 'keyword'},
id: {type: 'string', index: :not_analyzed, include_in_all: false}
city: { type: 'string', analyzer: 'simple' },
state: { type: 'string', analyzer: 'keyword' },
geo: { type: 'geo_point' } } },
start_date: { type: 'date', format: 'YYYY-MM-dd' },
end_date: { type: 'date', format: 'YYYY-MM-dd' },
minimum: { type: 'float' },
maximum: { type: 'float' },
position_offering_type_code: { type: 'integer' },
position_schedule_type_code: { type: 'integer' },
rate_interval_code: { type: 'string', analyzer: 'keyword' },
id: { type: 'string', index: :not_analyzed, include_in_all: false }
}
}
}
@@ -56,9 +56,10 @@ def search_for(options = {})
options.reverse_merge!(size: 10, from: 0, sort_by: :_timestamp)
document_limit = [options[:size].to_i, MAX_RETURNED_DOCUMENTS].min
source = options[:source]
tags = options[:tags].present? ? options[:tags].split : nil
tags = options[:tags].present? ? options[:tags].split(/[ ,]/) : nil
lat, lon = options[:lat_lon].split(',') rescue [nil, nil]
query = Query.new(options[:query], options[:organization_id])
organization_ids = organization_ids_from_options(options)
query = Query.new(options[:query], organization_ids)

search = Tire.search index_name do
query do
@@ -76,7 +77,14 @@ def search_for(options = {})
end
end if query.keywords.present? && query.location.nil?
must { match :rate_interval_code, query.rate_interval_code } if query.rate_interval_code.present?
must { send(query.organization_format, :organization_id, query.organization_id) } if query.organization_id.present?
must do
boolean do
should { terms :organization_id, query.organization_terms } if query.organization_terms.present?
query.organization_prefixes.each do |organization_prefix|
should { prefix :organization_id, organization_prefix }
end if query.organization_prefixes.present?
end
end if query.organization_ids.present?
must do
nested path: 'locations' do
query do
Expand All @@ -90,7 +98,7 @@ def search_for(options = {})
end
end if source.present? || tags || query.valid?

filter :range, start_date: {lte: Date.current}
filter :range, start_date: { lte: Date.current }

if query.keywords.blank?
if lat.blank? || lon.blank?
@@ -109,7 +117,7 @@
end
size document_limit
from options[:from]
highlight position_title: {number_of_fragments: 0}
highlight position_title: { number_of_fragments: 0 }
end

Rails.logger.info("[Query] #{options.merge(result_count: search.results.total).to_json}")
@@ -188,5 +196,13 @@ def url_for_position_opening(position_opening)
nil
end
end

def organization_ids_from_options(options)
organization_ids = []
organization_ids << options[:organization_id] if options[:organization_id].present?
organization_ids.concat options[:organization_ids].split(',') if options[:organization_ids].present?
organization_ids
end

end
end
3 changes: 3 additions & 0 deletions app/views/api/v3/position_openings/search.json.jbuilder
@@ -0,0 +1,3 @@
json.array! @position_openings do |position_opening|
json.(position_opening, :id, :position_title, :organization_name, :rate_interval_code, :minimum, :maximum, :start_date, :end_date, :locations, :url)
end
