# Raw Job Records

##### Summary:

This is the core of our data.  Most models are built from this data or from aggregates of it.  When using any other file, you will generally want to left join it to this file, keeping job records on the left so that any reference information without a job record (e.g., companies with no job records) is dropped.

###### Fields:

- hash
    - Data Type:  String/varchar
    - Summary: Unique Identifier for Job Records.  This is used to join job descriptions (job_hash) to job records.
- title
    - Data Type:  String/varchar
    - Summary:  Job title scraped from the job post's unique URL
- company_id
    - Data Type:  String/varchar
    - Summary: Unique Identifier for Company.  This is used to join reference files to job records.
- company_name
    - Data Type:  String/varchar
    - Summary:  Name of the company associated with the company_id
- city
    - Data Type:  String/varchar
    - Summary:  City location for the job
- state
    - Data Type:  String/varchar
    - Summary:  State location for the job
- zip
    - Data Type:  String/varchar
    - Summary:  ZIP code for the job
- country
    - Data Type:  String/varchar
    - Summary:  Country location for the job
- created
    - Data Type:  Timestamp
    - Summary:  This is the first time that we scraped the site and saw this job
- last_checked
    - Data Type:  Timestamp
    - Summary:  This is the most recent time that we scraped the site and saw this job
- last_updated
    - Data Type:  Timestamp
    - Summary:  This is the most recent time that we scraped the site and saw this job had been modified
- delete_date
    - Data Type:  Timestamp
    - Summary:  This is the most recent time that we scraped the site and did not find this job
- unmapped_location
    - Data Type:  Boolean
    - Summary:  If True, this indicates that we were not able to accurately identify the location of the job
- onet_occupation_code
    - Data Type:  String
    - Summary:  This is the O*NET classification of the job.  It is one way of normalizing job titles.
- url
    - Data Type:  URL
    - Summary:  This is the unique URL for the specific job
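
A minimal sketch of the left join described in the summary, using Python's built-in sqlite3 module and made-up rows. The table names job_records and company_ref and all sample values are hypothetical, and the reference table is simplified to one row per company; the point-in-time reference files also need a date condition on start_date and end_date.

```python
import sqlite3

# Hypothetical sample rows; the real files carry many more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE job_records (hash TEXT, title TEXT, company_id TEXT);
CREATE TABLE company_ref (company_id TEXT, company_name TEXT);
INSERT INTO job_records VALUES
    ('h1', 'Data Engineer', 'c1'),
    ('h2', 'Store Manager', 'c1'),
    ('h3', 'Nurse', 'c2');
INSERT INTO company_ref VALUES
    ('c1', 'Acme Corp'),
    ('c2', 'Beta Health'),
    ('c3', 'No Jobs Inc');  -- has no job records, so it drops out
""")

# Job records on the left: every job is kept, and reference rows with no
# matching job record (c3 here) never appear in the result.
rows = conn.execute("""
    SELECT j.hash, j.title, j.company_id, c.company_name
    FROM job_records AS j
    LEFT JOIN company_ref AS c ON c.company_id = j.company_id
    ORDER BY j.hash
""").fetchall()

for row in rows:
    print(row)
```

Putting job records on the left (rather than using an inner join) also keeps jobs whose company_id has no reference row; those show up with a NULL company_name instead of silently disappearing.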

# PIT Company Reference

##### Summary:

This file shows point-in-time company information for company IDs.  It can be joined to job records to track corporate changes (e.g., company name changes, URL changes).  To use it, join it to job records or aggregates, dropping any company IDs that have no job records.

###### Fields:

- company_id
    - Data Type:  String/varchar
    - Summary: Unique Identifier for Company.  This is used to join reference files to job records or aggregates.
- start_date
    - Data Type:  Date
    - Summary:  Start date for this row.  This is used for joining point-in-time information.  See the join tutorial for help with this join.
- end_date
    - Data Type:  Date
    - Summary:  End date for this row.  This is used for joining point-in-time information.  See the join tutorial for help with this join.
- company_name
    - Data Type:  String/varchar
    - Summary:  Name of the company associated with the company_id
- company_url
    - Data Type:  URL
    - Summary:  This is the URL for the company in our scrape system
- lei
    - Data Type:  String/varchar
    - Summary:  Legal Entity Identifier
- open_perm_id
    - Data Type:  String/varchar
    - Summary:  Open PermID company identifier for joining to other datasets
- naics_code
    - Data Type:  String/varchar
    - Summary:  Industry classification (NAICS code).  This can be used to join to BLS salary data
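
A sketch of a point-in-time join, assuming each job should pick up the reference row whose start_date to end_date window contains the job's created date, with both ends inclusive. The inclusive boundaries, the far-future end_date (2099-12-31) used for the current row, and the table names are assumptions for illustration; the join tutorial is the authority on the actual conventions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE job_records (hash TEXT, company_id TEXT, created TEXT);
CREATE TABLE pit_company_ref (
    company_id TEXT, start_date TEXT, end_date TEXT, company_name TEXT
);
INSERT INTO job_records VALUES
    ('h1', 'c1', '2020-06-01 09:00:00'),
    ('h2', 'c1', '2021-03-15 14:30:00');
-- Hypothetical name change for c1 at the start of 2021.
INSERT INTO pit_company_ref VALUES
    ('c1', '2019-01-01', '2020-12-31', 'Old Name LLC'),
    ('c1', '2021-01-01', '2099-12-31', 'New Name Inc');
""")

# Truncate the created timestamp to a date, then match it to the reference
# row whose [start_date, end_date] window contains it (inclusive here).
rows = conn.execute("""
    SELECT j.hash, r.company_name
    FROM job_records AS j
    LEFT JOIN pit_company_ref AS r
      ON r.company_id = j.company_id
     AND date(j.created) BETWEEN r.start_date AND r.end_date
    ORDER BY j.hash
""").fetchall()
print(rows)
```

Each job gets the company name that was valid when the job was first seen, so the 2020 job keeps the old name even though the company has since been renamed.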


# PIT Scrape Log

##### Summary:

This file documents when scrapes ran and when the scrape code changed.  Its primary use is identifying outliers: you can use it to classify an outlier as either legitimate or caused by a broken scrape.  This can eliminate noise and false signals in the job data.

Scrape changes can also carry meaningful information, beyond reducing noise.  For example, after an influx of financing a company may change Applicant Tracking Systems (e.g., Charming Charlie coming out of bankruptcy).  We know such a change happened because we had to modify the scrape to fix it, and the scrape log documents when that change occurred.

###### Fields:

- company_id
    - Data Type:  String/varchar
    - Summary: Unique Identifier for Company.  This is used to join reference files to job records or aggregates.
- date
    - Data Type:  Date
    - Summary:  Date for this row
- scrape_run_complete
    - Data Type:  Boolean
    - Summary:  If True, this indicates that we scraped the company ID on that date.
- scrape_changed
    - Data Type: Boolean
    - Summary:  If True, this indicates that we modified the code for the scrape on that date.
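
A sketch of the outlier check described above. It assumes a hypothetical daily aggregate, daily_new_jobs, built by counting job records per company per date(created), and an arbitrary spike threshold. Spikes that coincide with a scrape change, an incomplete run, or a missing log row are flagged for review rather than treated as signal. Matching only on the same day is a simplification, since the effect of a scrape fix may show up a day or more later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily_new_jobs (company_id TEXT, date TEXT, n_new INTEGER);
CREATE TABLE scrape_log (
    company_id TEXT, date TEXT,
    scrape_run_complete INTEGER, scrape_changed INTEGER
);
INSERT INTO daily_new_jobs VALUES
    ('c1', '2021-05-01', 4),
    ('c1', '2021-05-02', 250),
    ('c2', '2021-05-02', 180);
INSERT INTO scrape_log VALUES
    ('c1', '2021-05-01', 1, 0),
    ('c1', '2021-05-02', 1, 1),
    ('c2', '2021-05-02', 1, 0);
""")

SPIKE_THRESHOLD = 100  # hypothetical cutoff; tune per company in practice

# Keep only spike days and attach the scrape log entry for the same day.
rows = conn.execute("""
    SELECT d.company_id, d.date, d.n_new,
           s.scrape_run_complete, s.scrape_changed
    FROM daily_new_jobs AS d
    LEFT JOIN scrape_log AS s
      ON s.company_id = d.company_id AND s.date = d.date
    WHERE d.n_new > ?
    ORDER BY d.company_id, d.date
""", (SPIKE_THRESHOLD,)).fetchall()

labels = {}
for company_id, day, n_new, run_complete, changed in rows:
    if run_complete is None:
        label = "review: no scrape log entry"
    elif changed:
        label = "review: scrape code changed"
    elif not run_complete:
        label = "review: scrape did not complete"
    else:
        label = "possible real signal"
    labels[(company_id, day)] = label
print(labels)
```

Here the c1 spike lines up with a scrape change and is sent for review, while the c2 spike happened on an ordinary scrape day and is kept as a candidate signal.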
    

# Raw Job Descriptions

##### Summary:

This file contains only the job descriptions.  To use them, join this file to the job records file on hash (job_hash in this file).  For NLP applications you would typically use a combination of the job title from the job records file and the job description.

###### Fields:

- job_hash
    - Data Type: String/varchar
    - Summary:  Unique Identifier for Job Records.  This matches hash in the job records file and is used to join job descriptions to job records.
- description
    - Data Type: String/varchar
    - Summary:  Full text description that we scrape from the site pertaining to a specific job record
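
A sketch of preparing text for NLP: join descriptions to job records (hash in the job records file, job_hash here) and concatenate each title with its description. The table names, sample rows and the blank-line separator are illustrative choices, not part of the dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE job_records (hash TEXT, title TEXT);
CREATE TABLE job_descriptions (job_hash TEXT, description TEXT);
INSERT INTO job_records VALUES
    ('h1', 'Data Engineer'),
    ('h2', 'Store Manager');
INSERT INTO job_descriptions VALUES
    ('h1', 'Build and maintain data pipelines.'),
    ('h2', 'Lead a retail team of 12.');
""")

# The key has a different name in each file: hash vs job_hash.
rows = conn.execute("""
    SELECT j.hash, j.title, d.description
    FROM job_records AS j
    LEFT JOIN job_descriptions AS d ON d.job_hash = j.hash
    ORDER BY j.hash
""").fetchall()

# One document per job: title, blank line, description. Jobs without a
# description fall back to the title alone.
documents = {
    job_hash: f"{title}\n\n{description or ''}".strip()
    for job_hash, title, description in rows
}
print(documents["h1"])
```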

# FS Company Reference

##### Summary:

This file shows point-in-time ticker information for company IDs.  It can be joined to job records to understand mergers and acquisitions.

###### Fields:

- company_id
    - Data Type:  String/varchar
    - Summary: Unique Identifier for Company.  This is used to join reference files to job records or aggregates.
- factset_entity_id
    - Data Type:  String/varchar
    - Summary: Unique Identifier for FactSet entity.
- start_date
    - Data Type:  Date
    - Summary:  Start date for this row.  This is used for joining point-in-time information.  See the join tutorial for help with this join.
- end_date
    - Data Type:  Date
    - Summary:  End date for this row.  This is used for joining point-in-time information.  See the join tutorial for help with this join.
- stock_ticker
    - Data Type:  String/varchar
    - Summary: Ticker Symbol
- stock_exchange_country
    - Data Type:  String/varchar
    - Summary: Country of the stock exchange the Ticker Symbol is traded on
- stock_exchange_name
    - Data Type:  String/varchar
    - Summary: Stock exchange on which the ticker is traded
- ISIN
    - Data Type:  String/varchar
    - Summary: International Securities Identification Number, mapped from the FactSet database using factset_entity_id
- CUSIP
    - Data Type:  String/varchar
    - Summary: Committee on Uniform Securities Identification Procedures number, mapped from the FactSet database using factset_entity_id
- SEDOL
    - Data Type:  String/varchar
    - Summary: Stock Exchange Daily Official List number, mapped from the FactSet database using factset_entity_id
- primary_flag
    - Data Type:  Boolean
    - Summary: If True, this is the primary ticker for the company.
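
A sketch of a point-in-time ticker lookup that considers only rows with primary_flag set to True. The TickerRow type, the primary_ticker_on helper, the in-memory list, the inclusive date window and the 2099-12-31 end date for the current row are all hypothetical; see the join tutorial for the actual boundary conventions.

```python
from datetime import date
from typing import NamedTuple, Optional


class TickerRow(NamedTuple):
    """Hypothetical subset of the FS Company Reference columns."""
    company_id: str
    start_date: date
    end_date: date
    stock_ticker: str
    primary_flag: bool


def primary_ticker_on(rows: list[TickerRow], company_id: str, on: date) -> Optional[str]:
    """Return the primary ticker valid for company_id on the given date, or None."""
    for row in rows:
        if (
            row.company_id == company_id
            and row.primary_flag
            and row.start_date <= on <= row.end_date  # inclusive window: an assumption
        ):
            return row.stock_ticker
    return None


# Hypothetical history: ticker ABC is replaced by XYZ after an acquisition,
# and ABC.L is a secondary (non-primary) listing.
history = [
    TickerRow("c1", date(2018, 1, 1), date(2020, 6, 30), "ABC", True),
    TickerRow("c1", date(2018, 1, 1), date(2020, 6, 30), "ABC.L", False),
    TickerRow("c1", date(2020, 7, 1), date(2099, 12, 31), "XYZ", True),
]

print(primary_ticker_on(history, "c1", date(2019, 5, 1)))  # ABC
print(primary_ticker_on(history, "c1", date(2021, 1, 1)))  # XYZ
```

Filtering on primary_flag keeps the lookup to one ticker per company and date even when the company is listed on several exchanges.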