This repository has been archived by the owner on Aug 21, 2019. It is now read-only.

Jobs API can take multiple comma-separated organization codes
- Created v3 of jobs API
- Updated README to point at developer API docs, and DRY'd up stuff that appeared in both places
- Removed API v1 support
- Switched to using Agency API v2
- tags param supports both space and comma delimiters

Closes #1
loren committed Jan 22, 2015
1 parent fcfb450 commit 2ced859
Showing 16 changed files with 473,249 additions and 343 deletions.
109 changes: 21 additions & 88 deletions README.md
@@ -4,19 +4,10 @@ Jobs API Server
[![Build Status](https://travis-ci.org/GSA/jobs_api.png)](https://travis-ci.org/GSA/jobs_api)
[![Coverage Status](https://coveralls.io/repos/GSA/jobs_api/badge.png?branch=master)](https://coveralls.io/r/GSA/jobs_api?branch=master)

The unemployment rate has hovered around 8 percent since early 2012. So, not surprisingly, many people are hitting the web to search for jobs. Federal, state, and local government agencies are hiring and have thousands of job openings across the country.
The server code that runs the DigitalGov [Jobs API](http://search.digitalgov.gov/developer/jobs.html) is here on Github. If you're a Ruby developer, keep reading. Fork this repo to add features (such as additional datasets) or fix bugs.

## Current Version

You are reading documentation for Jobs API v2. Documentation for v1 is available [here](https://github.com/GSA/jobs_api/tree/v1).

## Access the Data

Use our [Jobs API](http://usasearch.howto.gov/developer/jobs.html) to tap into a list of current jobs openings with the government. Jobs are searchable by keyword, location, agency, schedule, or any combination of these.

## Contribute to the Code

The server code that runs our [Jobs API](http://usasearch.howto.gov/developer/jobs.html) is here on Github. If you're a Ruby developer, keep reading. Fork this repo to add features (such as additional datasets) or fix bugs.
The documentation on request parameters and response format is on the [API developer page](http://search.digitalgov.gov/developer/jobs.html).
This README just covers software development of the API service itself.

### Ruby

@@ -29,27 +20,30 @@ We use bundler to manage gems. You can install bundler and other required gems l
gem install bundler
bundle install

### ElasticSearch
### Elasticsearch

We're using [ElasticSearch](http://www.elasticsearch.org/) (>= 1.4.0) for fulltext search. On a Mac, it's easy to install with [Homebrew](http://mxcl.github.com/homebrew/).
We're using [Elasticsearch](http://www.elasticsearch.org/) (>= 1.4.0) for fulltext search. On a Mac, it's easy to install with [Homebrew](http://mxcl.github.com/homebrew/).

$ brew install elasticsearch

Otherwise, follow the [instructions](http://www.elasticsearch.org/download/) to download and run it.

### Geonames

We use the United States location data from [Geonames.org](http://www.geonames.org) to help geocode the locations of each job position. By assigning latitude and longitude coordinates to each position location, we can sort job results based on proximity to the user's location, provided that information is sent in with the request.
We use the United States location data from [Geonames.org](http://www.geonames.org) to help geocode the locations of each job position. By assigning latitude and longitude coordinates to each position location, we can sort job results based on proximity to the searcher's location, provided that information is sent in with the request.

Download and extract the 'US.txt' file from [the Geonames archive](http://download.geonames.org/export/dump/US.zip), and import it into ElasticSearch.
The 'US.txt' file from [the Geonames archive](http://download.geonames.org/export/dump/US.zip) contains geocoding information for many entities that we aren't interested in for the purpose of government jobs (e.g., canals, churches), so we pick out just what we need in order to keep the index small with this AWK script:

bundle exec rake geonames:import[/path/to/US.txt]
awk -F $'\\t' '$8 ~ /PPL|ADM\d?|PRK|BLDG|AIR|INSM/' US.txt > doc/filtered_US.txt

The file contains geocoding information for many entities that we aren't interested in for the purpose of government jobs (e.g., canals, churches), so we pick out just what we need in order to keep the index small with this AWK script:
This includes populated places, administrative areas, parks, buildings, airports, and military bases.
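For illustration, the same filter can be sketched in Ruby (this is not part of the repo — the project uses the AWK one-liner above):

```ruby
# Illustrative Ruby equivalent of the AWK filter (not part of the repo).
# Geonames dump rows are tab-separated; the 8th field is the feature code.
KEEP_FEATURE_CODES = /PPL|ADM\d?|PRK|BLDG|AIR|INSM/

def keep_geonames_row?(line)
  feature_code = line.chomp.split("\t")[7].to_s
  # Like the AWK version, this matches the pattern anywhere in the field,
  # so subtypes such as PPLA (a populated place that is a seat of
  # government) also pass the filter.
  !!(feature_code =~ KEEP_FEATURE_CODES)
end
```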

awk -F $'\\t' '$8 ~ /PPL|ADM\d?|PRK|BLDG|AIR|INSM/' US.txt > filtered_US.txt
You can download, unzip, and filter a more recent version of the file if you like, or you can import the one in this repo to get started:

This includes populated places, administrative areas, parks, buildings, airports, and military bases.
bundle exec rake geonames:import[doc/filtered_US.txt]

If you are running Elasticsearch with the default 1g JVM heap, this import process will be pretty slow.
You may want to consider [allocating more memory](http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/heap-sizing.html) to Elasticsearch.

### Seed jobs data

@@ -74,80 +68,23 @@ Fire up a server and try it all out.

<http://127.0.0.1:3000/search.json?query=nursing+jobs&organization_id=VATA&hl=1>
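With v3, multiple comma-separated organization codes can be passed via `organization_ids`. A quick Ruby sketch of building such a request follows; the second organization code is a made-up placeholder, shown only to illustrate the comma-separated form:

```ruby
require 'uri'

# Build a v3 search URL with multiple comma-separated organization codes.
# 'VATA' is from the example above; 'XX11' is a hypothetical placeholder.
uri = URI('http://127.0.0.1:3000/search.json')
uri.query = URI.encode_www_form(query: 'nursing jobs',
                                organization_ids: 'VATA,XX11',
                                hl: 1)
puts uri
# To perform the request (requires the local server to be running):
#   require 'net/http'; body = Net::HTTP.get(uri)
```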

### Parameters

These parameters are accepted:

1. query
2. organization_id
3. hl [highlighting]
4. size
5. from
6. tags
7. lat_lon

Full documentation on the parameters is in our [Jobs API documentation](http://usasearch.howto.gov/developer/jobs.html#parameters).

### Results

* `id`: Job identifier.
* `position_title`: The brief title of the job.
* `organization_name`: The full name of the hiring organization.
* `minimum, maximum`: The remuneration range for this position.
* `rate_interval_code`: This two-letter code specifies the frequency of payment, usually yearly or hourly. The full list of possibilities is [here](https://schemas.usajobs.gov/Enumerations/CodeLists.xml), about halfway down the page.
* `start_date, end_date`: The application period for this position.
* `locations`: Note that a job opening can have multiple locations associated with it.
* `url`: The official listing for the job.

Sample results:

[
{
"id": "usajobs:327358300",
"position_title": "Student Nurse Technicians",
"organization_name": "Veterans Affairs, Veterans Health Administration",
"minimum": 27,
"maximum": 34,
"rate_interval_code": "PH",
"start_date": "2012-12-29",
"end_date": "2013-2-28",
"locations": [
"Odessa, TX",
"Fairfax, VA",
"San Angelo, TX",
"Abilene, TX"
],
"url": "https://www.usajobs.gov/GetJob/ViewDetails/327358300"
},
{
"id": "usajobs:325054900",
"position_title": "Physician (Surgical Critical Care)",
"organization_name": "Veterans Affairs, Veterans Health Administration",
"minimum": 100000,
"maximum": 150000,
"rate_interval_code": "PA",
"start_date": "2012-12-29",
"end_date": "2013-2-28",
"locations": [
"Charleston, SC"
],
"url": "https://www.usajobs.gov/GetJob/ViewDetails/325054900"
}
]
### Parameters and Results

Full documentation on the parameters and result format is in our [Jobs API documentation](http://search.digitalgov.gov/developer/jobs.html).

### Expiration

When a job opening's application end date has passed, the opening is automatically purged from the index and won't show up in search results.
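This purging presumably leans on the `_ttl` field enabled in the Elasticsearch mapping. As a rough sketch (an assumption about the mechanism, not the repo's actual code), a TTL could be derived from an opening's end date like this:

```ruby
require 'date'

# Hypothetical sketch: derive an Elasticsearch TTL from a job's end_date
# so the document expires shortly after the application period closes.
def ttl_for(end_date, today: Date.today)
  days_left = (Date.parse(end_date) - today).to_i
  # If the end date has already passed, expire almost immediately.
  days_left.positive? ? "#{days_left}d" : '1m'
end
```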

### API Versioning

We support API versioning with JSON format. The current version is v2. You can specify a specific JSON API version like this:
We support API versioning with JSON format. The current/default version is v3. You can specify a specific JSON API version like this:

curl -H 'Accept: application/vnd.usagov.position_openings.v2' http://localhost:3000/search.json?query=jobs
curl -H 'Accept: application/vnd.usagov.position_openings.v3' http://localhost:3000/search.json?query=jobs

### Tests

These require an [ElasticSearch](http://www.elasticsearch.org/) server to be running.
These require an [Elasticsearch](http://www.elasticsearch.org/) server to be running.

bundle exec rake spec

@@ -157,11 +94,7 @@ We track test coverage of the codebase over time, to help identify areas where w

After running your tests, view the report by opening `coverage/index.html`.

Click around on the files that have < 100% coverage to see what lines weren't exercised.

## Terms of Use

By accessing this Jobs API server, you agree to our [Terms of Service](http://www.usa.gov/About/developer-resources/terms-of-service.shtml).
Click around on the files that have less than 100% coverage to see what lines weren't exercised by the tests.

Feedback
--------
30 changes: 17 additions & 13 deletions app/classes/query.rb
@@ -1,15 +1,15 @@
class Query

JOB_KEYWORD_TOKENS = '(position|job|employment|career|trabajo|puesto|empleo|vacante)s?'.freeze
NON_CAPTURING_JOB_KEYWORD_TOKENS = JOB_KEYWORD_TOKENS.sub('(','(?:')
NON_CAPTURING_JOB_KEYWORD_TOKENS = JOB_KEYWORD_TOKENS.sub('(', '(?:')
STOPWORDS = 'appl(y|ications?)|for|the|a|and|available|gov(ernment)?|usa|current|civilian|fed(eral)?|(usajob|opening|posting|description|announcement|listing)s?|(opportunit|vacanc)(y|ies)|search(es)?|(posicion|ocupacion|oportunidad|federal)es|gobierno'.freeze

attr_accessor :location, :organization_id, :keywords, :position_offering_type_code, :position_schedule_type_code, :rate_interval_code
attr_accessor :location, :organization_ids, :keywords, :position_offering_type_code, :position_schedule_type_code, :rate_interval_code

def initialize(query, organization_id)
organization_id.upcase! if organization_id.present?
def initialize(query, organization_ids)
organization_ids.each(&:upcase!) if organization_ids.present?
self.keywords = parse(normalize(query)) if query.present?
self.organization_id ||= organization_id
self.organization_ids ||= organization_ids
end

def has_state?
@@ -21,12 +21,16 @@ def has_city?
end

def valid?
keywords.present? || location.present? || organization_id.present? ||
keywords.present? || location.present? || organization_ids.present? ||
position_offering_type_code.present? || position_schedule_type_code.present? || rate_interval_code.present?
end

def organization_format
organization_id.length == 2 ? :prefix : :term
def organization_prefixes
organization_ids.select { |str| str.length == 2 }
end

def organization_terms
organization_ids.select { |str| str.length > 2 }
end

private
@@ -45,12 +49,12 @@ def parse(query)
nil
end
query.gsub!(/ ?(at|with) (.*) in (.*)/) do
self.organization_id = Agencies.find_organization_id($2)
self.organization_ids = Agencies.find_organization_ids($2)
self.location = Location.new($3)
nil
end
query.gsub!(/ ?(at|with) (.*)/) do
self.organization_id = Agencies.find_organization_id($2)
self.organization_ids = Agencies.find_organization_ids($2)
nil
end
query.gsub!(/ ?in (.*)/) do
@@ -61,16 +65,16 @@ def parse(query)
self.location = Location.new(location_str)
query.gsub!(location_str, '')
end
if self.organization_id.nil? && (possible_org = extract_possible_org(query))
if (self.organization_id = Agencies.find_organization_id(possible_org))
if self.organization_ids.nil? && (possible_org = extract_possible_org(query))
if (self.organization_ids = Agencies.find_organization_ids(possible_org))
query.gsub!(possible_org, '')
end
end
query.gsub(/\b#{JOB_KEYWORD_TOKENS}\b/, '').squish
end

def normalize(query)
query.downcase.gsub('.','').gsub(/[^0-9a-z \-]/, ' ').gsub(/\b(#{Date.current.year}|#{STOPWORDS})\b/, ' ').squish
query.downcase.gsub('.', '').gsub(/[^0-9a-z \-]/, ' ').gsub(/\b(#{Date.current.year}|#{STOPWORDS})\b/, ' ').squish
end

def extract_possible_org(query)
16 changes: 0 additions & 16 deletions app/controllers/api/v1/position_openings_controller.rb

This file was deleted.

10 changes: 10 additions & 0 deletions app/controllers/api/v3/position_openings_controller.rb
@@ -0,0 +1,10 @@
class Api::V3::PositionOpeningsController < ApplicationController
include NewRelic::Agent::Instrumentation::ControllerInstrumentation

def search
@position_openings = PositionOpening.search_for(params.slice(:query, :organization_ids, :tags, :size, :from, :hl, :lat_lon))
render
end

add_transaction_tracer :search
end
70 changes: 43 additions & 27 deletions app/models/position_opening.rb
@@ -14,37 +14,37 @@ def create_search_index
settings: {
index: {
analysis: {
analyzer: {custom_analyzer: {type: 'custom', tokenizer: 'whitespace', filter: %w(standard lowercase synonym snowball)}},
filter: {synonym: {type: 'synonym', synonyms: SYNONYMS}}
analyzer: { custom_analyzer: { type: 'custom', tokenizer: 'whitespace', filter: %w(standard lowercase synonym snowball) } },
filter: { synonym: { type: 'synonym', synonyms: SYNONYMS } }
}
}
},
mappings: {
position_opening: {
_timestamp: {enabled: true},
_ttl: {enabled: true},
_timestamp: { enabled: true },
_ttl: { enabled: true },
properties: {
type: {type: 'string'},
source: {type: 'string', index: :not_analyzed},
tags: {type: 'string', analyzer: 'keyword'},
external_id: {type: 'integer'},
position_title: {type: 'string', analyzer: 'custom_analyzer', term_vector: 'with_positions_offsets', store: true},
organization_id: {type: 'string', analyzer: 'keyword'},
organization_name: {type: 'string', index: :not_analyzed},
type: { type: 'string' },
source: { type: 'string', index: :not_analyzed },
tags: { type: 'string', analyzer: 'keyword' },
external_id: { type: 'integer' },
position_title: { type: 'string', analyzer: 'custom_analyzer', term_vector: 'with_positions_offsets', store: true },
organization_id: { type: 'string', analyzer: 'keyword' },
organization_name: { type: 'string', index: :not_analyzed },
locations: {
type: 'nested',
properties: {
city: {type: 'string', analyzer: 'simple'},
state: {type: 'string', analyzer: 'keyword'},
geo: {type: 'geo_point'}}},
start_date: {type: 'date', format: 'YYYY-MM-dd'},
end_date: {type: 'date', format: 'YYYY-MM-dd'},
minimum: {type: 'float'},
maximum: {type: 'float'},
position_offering_type_code: {type: 'integer'},
position_schedule_type_code: {type: 'integer'},
rate_interval_code: {type: 'string', analyzer: 'keyword'},
id: {type: 'string', index: :not_analyzed, include_in_all: false}
city: { type: 'string', analyzer: 'simple' },
state: { type: 'string', analyzer: 'keyword' },
geo: { type: 'geo_point' } } },
start_date: { type: 'date', format: 'YYYY-MM-dd' },
end_date: { type: 'date', format: 'YYYY-MM-dd' },
minimum: { type: 'float' },
maximum: { type: 'float' },
position_offering_type_code: { type: 'integer' },
position_schedule_type_code: { type: 'integer' },
rate_interval_code: { type: 'string', analyzer: 'keyword' },
id: { type: 'string', index: :not_analyzed, include_in_all: false }
}
}
}
@@ -56,9 +56,10 @@ def search_for(options = {})
options.reverse_merge!(size: 10, from: 0, sort_by: :_timestamp)
document_limit = [options[:size].to_i, MAX_RETURNED_DOCUMENTS].min
source = options[:source]
tags = options[:tags].present? ? options[:tags].split : nil
tags = options[:tags].present? ? options[:tags].split(/[ ,]/) : nil
lat, lon = options[:lat_lon].split(',') rescue [nil, nil]
query = Query.new(options[:query], options[:organization_id])
organization_ids = organization_ids_from_options(options)
query = Query.new(options[:query], organization_ids)

search = Tire.search index_name do
query do
@@ -76,7 +77,14 @@ def search_for(options = {})
end
end if query.keywords.present? && query.location.nil?
must { match :rate_interval_code, query.rate_interval_code } if query.rate_interval_code.present?
must { send(query.organization_format, :organization_id, query.organization_id) } if query.organization_id.present?
must do
boolean do
should { terms :organization_id, query.organization_terms } if query.organization_terms.present?
query.organization_prefixes.each do |organization_prefix|
should { prefix :organization_id, organization_prefix }
end if query.organization_prefixes.present?
end
end if query.organization_ids.present?
must do
nested path: 'locations' do
query do
Expand All @@ -90,7 +98,7 @@ def search_for(options = {})
end
end if source.present? || tags || query.valid?

filter :range, start_date: {lte: Date.current}
filter :range, start_date: { lte: Date.current }

if query.keywords.blank?
if lat.blank? || lon.blank?
@@ -109,7 +117,7 @@
end
size document_limit
from options[:from]
highlight position_title: {number_of_fragments: 0}
highlight position_title: { number_of_fragments: 0 }
end

Rails.logger.info("[Query] #{options.merge(result_count: search.results.total).to_json}")
@@ -188,5 +196,13 @@ def url_for_position_opening(position_opening)
nil
end
end

def organization_ids_from_options(options)
organization_ids = []
organization_ids << options[:organization_id] if options[:organization_id].present?
organization_ids.concat options[:organization_ids].split(',') if options[:organization_ids].present?
organization_ids
end

end
end
3 changes: 3 additions & 0 deletions app/views/api/v3/position_openings/search.json.jbuilder
@@ -0,0 +1,3 @@
json.array! @position_openings do |position_opening|
json.(position_opening, :id, :position_title, :organization_name, :rate_interval_code, :minimum, :maximum, :start_date, :end_date, :locations, :url)
end
