# Data Normalization Demo
We'll use the three user-defined modules below to demonstrate how to normalize the data when **new** job postings become available in the future.

In [1]:
# Import relevant modules
import provenir                  # access to job bulletins and customized printing
import individual_toolkit as itk # access to unit functions that operate on one job at a time
import multiple_toolkit as mtk   # access to aggregate functions that operate on multiple jobs at a time

# JOB_CLASS_TITLE

In [27]:
# Overview of all job titles
mtk.jct_print_results(job_path=mtk.raw_path, job_type=mtk.raw_jobs)

311 DIRECTOR
ACCOUNTANT
ACCOUNTING CLERK
ACCOUNTING RECORDS SUPERVISOR
ADMINISTRATIVE ANALYST
ADMINISTRATIVE CLERK
ADMINISTRATIVE HEARING EXAMINER
ADVANCE PRACTICE PROVIDER CORRECTIONAL CARE
AIR CONDITIONING MECHANIC
AIR CONDITIONING MECHANIC SUPERVISOR
AIRPORT AIDE
AIRPORT CHIEF INFORMATION SECURITY OFFICER
AIRPORT ENGINEER
AIRPORT GUIDE
AIRPORT INFORMATION SPECIALIST
AIRPORT LABOR RELATIONS ADVOCATE
AIRPORT MANAGER
AIRPORT POLICE CAPTAIN
AIRPORT POLICE LIEUTENANT
AIRPORT POLICE OFFICER
AIRPORT POLICE SPECIALIST
AIRPORT SUPERINTENDENT OF OPERATIONS
AIRPORTS MAINTENANCE SUPERINTENDENT
AIRPORTS MAINTENANCE SUPERVISOR
AIRPORTS PUBLIC AND COMMUNITY RELATIONS DIRECTOR
ANIMAL CARE ASSISTANT
ANIMAL CARE TECHNICIAN
WATER TREATMENT OPERATOR
ANIMAL CONTROL OFFICER
ANIMAL KEEPER
APPARATUS OPERATOR
APPLICATIONS PROGRAMMER
APPRENTICE - METAL TRADES
APPRENTICE MACHINIST
AQUARIST
AQUARIUM EDUCATOR
AQUATIC DIRECTOR
AQUATIC FACILITY MANAGER
AQUEDUCT AND RESERVOIR KEEPER
AQUEDUCT AND RESERVOIR SUPERVIS

OFFICE SERVICES ASSISTANT
OFFICE TRAINEE
OPERATIONS AND STATISTICAL RESEARCH ANALYST
PAINTER
PAINTER SUPERVISOR
PARK MAINTENANCE SUPERVISOR
PARK RANGER
PARK SERVICES ATTENDANT
PARK SERVICES SUPERVISOR
PARKING ATTENDANT
PARKING ENFORCEMENT MANAGER
PARKING MANAGER
PARKING METER TECHNICIAN
PARKING METER TECHNICIAN SUPERVISOR
PAYROLL ANALYST
PAYROLL SUPERVISOR
PERFORMING ARTS DIRECTOR
PERSONNEL ANALYST
PERSONNEL DIRECTOR
PERSONNEL RECORDS SUPERVISOR
PERSONNEL RESEARCH ANALYST
PHOTOGRAPHER
PILE DRIVER WORKER
PIPEFITTER
PIPEFITTER SUPERVISOR
PLANNING ASSISTANT
PLUMBER
PLUMBER SUPERVISOR
PLUMBING INSPECTOR
POLICE ADMINISTRATOR
POLICE CAPTAIN
POLICE COMMANDER
POLICE DETECTIVE
POLICE LIEUTENANT
POLICE OFFICER
POLICE PERFORMANCE AUDITOR
POLICE SERGEANT
POLICE SERVICE REPRESENTATIVE
POLICE SPECIAL INVESTIGATOR
POLICE SPECIALIST
POLICE SURVEILLANCE SPECIALIST
POLYGRAPH EXAMINER
PORT ELECTRICAL MECHANIC
PORT ELECTRICAL MECHANIC SUPERVISOR
PORT MAINTENANCE SUPERVISOR
PORT PILOT
PORT POLICE CAPTAIN
P

## First Analysis
There are two jobs that have suspiciously long titles:
* DISTRICT SUPERVISOR ANIMAL SERVICES 4320 022318.txt
* MARINE ENVIRONMENTAL SUPERVISOR 9433 071114 (1).txt

We'll use the `provenir.spotlight` function and the unit function `itk.jct_get_one` to inspect each job individually.

DISTRICT SUPERVISOR ANIMAL SERVICES 4320 022318.txt

In [9]:
# Get the content of this job
content = provenir.spotlight(job_name='DISTRICT SUPERVISOR ANIMAL SERVICES 4320 022318.txt',
                             job_path=mtk.raw_path,
                             job_type=mtk.raw_jobs)
content

'DISTRICT SUPERVISOR ANIMAL SERVICES\n(Class Title of District Supervisor Animal Regulation)\n\nClass Code:       4320\nOpen Date:  02-23-18\n(Exam Open to Current City Employees)\n\nANNUAL SALARY\n\n$76,128 to $111,332\n\nNOTES:\n\n1. Annual salary is at the start of the pay range. The current salary range is subject to change. Please confirm the starting salary with the hiring department before accepting a job offer.\n2. Candidates from the eligible list are normally appointed to vacancies in the lower pay grade positions.\n3. A District Supervisor Animal Services (District Supervisor Animal Regulation) must be available for assignment to various shifts, weekends and holidays, at any one of the animal shelters located in Central Los Angeles, South Central Los Angeles, West Los Angeles, San Pedro, and the San Fernando Valley.\n\nDUTIES\n\nA District Supervisor Animal Services (District Supervisor Animal Regulation) plans, organizes, and directs the work of animal care and control pers

In [10]:
# Get job title
itk.jct_get_one(content)

'DISTRICT SUPERVISOR ANIMAL SERVICES (Class Title of District Supervisor Animal Regulation)'

Looking at the content, it's clear why our unit function gives this result: it treats everything that comes before <font color='red'>Class Code</font> as part of the job title. Since the phrase *(Class Title of District Supervisor Animal Regulation)* is not part of the job title (it is not in all capitals), we should consider removing it.
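The behaviour described above can be illustrated with a minimal sketch. Both `title_before_class_code` and `remove_class_title_note` are hypothetical helpers written for this illustration; they are not the actual code of `itk.jct_get_one`, and the sketch assumes the parenthetical sits on its own line, as it does in this bulletin.

```python
import re

def title_before_class_code(text):
    """Hypothetical stand-in for the unit function:
    take everything before 'Class Code' and collapse whitespace."""
    head = text.split("Class Code", 1)[0]
    return " ".join(head.split())

def remove_class_title_note(text):
    """Drop a standalone '(Class Title of ...)' line from the bulletin text."""
    return re.sub(r"^\(Class Title of [^)]*\)[ \t]*\n", "", text,
                  count=1, flags=re.MULTILINE)

# Opening lines of the bulletin, copied from the content shown above
sample = ("DISTRICT SUPERVISOR ANIMAL SERVICES\n"
          "(Class Title of District Supervisor Animal Regulation)\n\n"
          "Class Code:       4320\nOpen Date:  02-23-18\n")

before = title_before_class_code(sample)
after = title_before_class_code(remove_class_title_note(sample))
print(before)
print(after)
```

The cleaning step edits the text of the bulletin, not the extraction function, which matches the data-cleaning approach used throughout this demo.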

MARINE ENVIRONMENTAL SUPERVISOR 9433 071114 (1).txt

In [19]:
# Get the content of this job
content = provenir.spotlight(job_name='MARINE ENVIRONMENTAL SUPERVISOR 9433 071114 (1).txt',
                             job_path=mtk.raw_path,
                             job_type=mtk.raw_jobs)
content

'\n\n\n\n\n\n\nMARINE ENVIRONMENTAL SUPERVISOR    \n                           \n \t\t\t\t\t\t\t\t\t\tClass  Code:      9433    \n                                                                                                       Open Date:  07-11-14\n\nANNUAL SALARY\n\n $92,769 to $115,278\n*$85,503 to $106,216\n\nNOTES:\n\n*1.  Individuals hired on or after July 2, 2013 shall be hired at three (3) premium levels (one premium level equals 2.75%) \n      below the salary range.\n 2.  Candidates from the eligible list are normally appointed to vacancies in the lower pay grade positions.\n 3.  The current salary range is subject to change. You may confirm the starting salary with the hiring department before accepting a job offer. \n\nDUTIES\n\nA Marine Environmental Supervisor oversees the preparation of studies of the Los Angeles Harbor, hazardous materials site assessments, characterization and remedial action plans, air and water quality programs, environmental assessments, enviro

In [26]:
# Get job title
itk.jct_get_one(content)

'MARINE ENVIRONMENTAL SUPERVISOR Class Code: 9433 Open Date: 07-11-14 ANNUAL SALARY $92,769 to $115,278 *$85,503 to $106,216 NOTES: *1. Individuals hired on or after July 2, 2013 shall be hired at three (3) premium levels (one premium level equals 2.75%) below the salary range. 2. Candidates from the eligible list are normally appointed to vacancies in the lower pay grade positions. 3. The current salary range is subject to change. You may confirm the starting salary with the hiring department before accepting a job offer. DUTIES A Marine Environmental Supervisor oversees the preparation of studies of the Los Angeles Harbor, hazardous materials site assessments, characterization and remedial action plans, air and water quality programs, environmental assessments, environmental impact reports, and statements relative to the effect of port development on the environment; supervises the preparation and administration of contracts for technical environmental services and research; supervis

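From the content of this job, we see that the heading <font color='red'>Class Code</font> is written with two white spaces between the words instead of one. This violates the expected section heading, which is most likely why the unit function does not stop there and returns most of the bulletin as the title. We fix it by removing one white space.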
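A minimal sketch of this fix, assuming the title is everything before the exact string `Class Code`. The helper `normalize_class_code_heading` is hypothetical and is not part of `individual_toolkit`; the sample is a shortened version of the content shown above.

```python
import re

def normalize_class_code_heading(text):
    """Collapse any run of spaces or tabs inside the heading to a single space."""
    return re.sub(r"Class[ \t]+Code", "Class Code", text)

# Shortened version of the bulletin header, with the double space kept
sample = ("\n\nMARINE ENVIRONMENTAL SUPERVISOR    \n"
          " \t\tClass  Code:      9433    \n"
          "   Open Date:  07-11-14\n")

fixed = normalize_class_code_heading(sample)

# Everything before the heading, with whitespace collapsed
title = " ".join(fixed.split("Class Code", 1)[0].split())
print(title)
```

In the actual workflow the extra space is removed in the cleaned copy of the file and recorded in the Change Log; the regular expression only shows what the edit amounts to.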

### Conclusion 1
* It is recommended to keep a Change Log (CL) in an Excel file (see CL_UnitFunctions/JOB_CLASS_TITLE/Issue1). This generates **new data** that can be used for further analysis, for example, to investigate the consistency of the source used to generate these .txt files.
* It is also recommended to make a copy of the raw data and perform the data cleaning on this copy. I have two folders, **Job Bulletins** and **JobBulletins_cleaned**: the former contains the pre-cleaning data and the latter contains the post-cleaning data.
* Let's look at all the jobs after the data has been cleaned.
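The first two recommendations can be sketched as follows. This is only an illustration under stated assumptions: the change log here is a CSV file rather than the Excel file mentioned above, the helpers `make_working_copy` and `log_change` are hypothetical, and the example bulletin is made up and written to a temporary directory so that no real data is touched.

```python
import csv
import shutil
import tempfile
from pathlib import Path

def make_working_copy(raw_dir, cleaned_dir):
    """Copy the raw folder once; all later edits go to the copy."""
    shutil.copytree(raw_dir, cleaned_dir)
    return Path(cleaned_dir)

def log_change(log_path, file_name, issue, action):
    """Append one row to a CSV change log, writing the header on first use."""
    log_path = Path(log_path)
    is_new = not log_path.exists()
    with log_path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["file", "issue", "action"])
        writer.writerow([file_name, issue, action])

# Made-up bulletin in a temporary directory
root = Path(tempfile.mkdtemp())
raw = root / "Job Bulletins"
raw.mkdir()
(raw / "EXAMPLE 0000 010101.txt").write_text(
    "EXAMPLE\nClass  Code: 0000\n", encoding="utf-8")

# Clean the copy, never the raw file, and record what was done
cleaned = make_working_copy(raw, root / "JobBulletins_cleaned")
target = cleaned / "EXAMPLE 0000 010101.txt"
target.write_text(
    target.read_text(encoding="utf-8").replace("Class  Code", "Class Code"),
    encoding="utf-8")
log_change(root / "change_log.csv", target.name,
           "two spaces in 'Class  Code'", "removed one space")
```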

In [28]:
# Results after data has been cleaned
mtk.jct_print_results(job_path=mtk.cleaned_path, job_type=mtk.cleaned_jobs)

311 DIRECTOR
ACCOUNTANT
ACCOUNTING CLERK
ACCOUNTING RECORDS SUPERVISOR
ADMINISTRATIVE ANALYST
ADMINISTRATIVE CLERK
ADMINISTRATIVE HEARING EXAMINER
ADVANCE PRACTICE PROVIDER CORRECTIONAL CARE
AIR CONDITIONING MECHANIC
AIR CONDITIONING MECHANIC SUPERVISOR
AIRPORT AIDE
AIRPORT CHIEF INFORMATION SECURITY OFFICER
AIRPORT ENGINEER
AIRPORT GUIDE
AIRPORT INFORMATION SPECIALIST
AIRPORT LABOR RELATIONS ADVOCATE
AIRPORT MANAGER
AIRPORT POLICE CAPTAIN
AIRPORT POLICE LIEUTENANT
AIRPORT POLICE OFFICER
AIRPORT POLICE SPECIALIST
AIRPORT SUPERINTENDENT OF OPERATIONS
AIRPORTS MAINTENANCE SUPERINTENDENT
AIRPORTS MAINTENANCE SUPERVISOR
AIRPORTS PUBLIC AND COMMUNITY RELATIONS DIRECTOR
ANIMAL CARE ASSISTANT
ANIMAL CARE TECHNICIAN
WATER TREATMENT OPERATOR
ANIMAL CONTROL OFFICER
ANIMAL KEEPER
APPARATUS OPERATOR
APPLICATIONS PROGRAMMER
APPRENTICE - METAL TRADES
APPRENTICE MACHINIST
AQUARIST
AQUARIUM EDUCATOR
AQUATIC DIRECTOR
AQUATIC FACILITY MANAGER
AQUEDUCT AND RESERVOIR KEEPER
AQUEDUCT AND RESERVOIR SUPERVIS

PERSONNEL RECORDS SUPERVISOR
PERSONNEL RESEARCH ANALYST
PHOTOGRAPHER
PILE DRIVER WORKER
PIPEFITTER
PIPEFITTER SUPERVISOR
PLANNING ASSISTANT
PLUMBER
PLUMBER SUPERVISOR
PLUMBING INSPECTOR
POLICE ADMINISTRATOR
POLICE CAPTAIN
POLICE COMMANDER
POLICE DETECTIVE
POLICE LIEUTENANT
POLICE OFFICER
POLICE PERFORMANCE AUDITOR
POLICE SERGEANT
POLICE SERVICE REPRESENTATIVE
POLICE SPECIAL INVESTIGATOR
POLICE SPECIALIST
POLICE SURVEILLANCE SPECIALIST
POLYGRAPH EXAMINER
PORT ELECTRICAL MECHANIC
PORT ELECTRICAL MECHANIC SUPERVISOR
PORT MAINTENANCE SUPERVISOR
PORT PILOT
PORT POLICE CAPTAIN
PORT POLICE LIEUTENANT
PORT POLICE OFFICER
PORT POLICE SERGEANT
PORTFOLIO MANAGER
POWER ENGINEERING MANAGER
POWER SHOVEL OPERATOR
PRE-PRESS OPERATOR
PRINCIPAL ACCOUNTANT
PRINCIPAL ANIMAL KEEPER
PRINCIPAL CITY PLANNER
PRINCIPAL CIVIL ENGINEER
PRINCIPAL CIVIL ENGINEERING DRAFTING TECHNICIAN
PRINCIPAL CLERK
PRINCIPAL CLERK POLICE
PRINCIPAL CLERK UTILITY
PRINCIPAL COMMUNICATIONS OPERATOR
PRINCIPAL CONSTRUCTION INSPECTOR
PR

## Second Analysis
If we look carefully at the results above [1], some jobs have strange titles because they start with the phrase *CAMPUS INTERVIEWS ONLY*:
* CityofLA/JobBulletins_cleaned/ARCHITECTURAL ASSOCIATE 7926 013114 REV 032916.txt
* CityofLA/JobBulletins_cleaned/ENVIRONMENTAL ENGINEERING ASSOCIATE  7871 020113 REV 032916.txt
* CityofLA/JobBulletins_cleaned/STREET LIGHTING ENGINEERING ASSOCIATE 7527 101102 REV 032916.txt

This inconsistency comes from the job postings themselves. Thus, rather than deleting the phrase, we move it to the section called <font color='red'>NOTES:</font> (with colon) and surround it with the Python prompt symbol and its mirror image, i.e., `>>>CAMPUS INTERVIEWS ONLY<<<`, so we know where it is and how to retrieve it if we need it later. As always, we record every change we make in our Change Log.
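The move described above can be sketched as follows. The function `move_phrase_to_notes` is hypothetical, and the sketch assumes that the phrase sits on its own line and that the marked phrase goes directly under the `NOTES:` heading; the sample bulletin is made up for illustration.

```python
PHRASE = "CAMPUS INTERVIEWS ONLY"
MARKED = ">>>" + PHRASE + "<<<"

def move_phrase_to_notes(text):
    """Remove the standalone phrase and re-insert it, marked, under NOTES:."""
    kept, removed = [], False
    for line in text.splitlines():
        if not removed and line.strip() == PHRASE:
            removed = True          # drop the first standalone occurrence
            continue
        kept.append(line)
    if not removed:
        return text                 # nothing to move
    for i, line in enumerate(kept):
        if line.strip() == "NOTES:":
            kept.insert(i + 1, MARKED)
            break
    else:
        # Assumption: create the section if the bulletin has none
        kept += ["", "NOTES:", MARKED]
    return "\n".join(kept) + "\n"

# Made-up bulletin header for illustration
sample = ("CAMPUS INTERVIEWS ONLY\n\nARCHITECTURAL ASSOCIATE\n\n"
          "Class Code: 7926\n\nNOTES:\n\n1. Example note.\n")

moved = move_phrase_to_notes(sample)
title = " ".join(moved.split("Class Code", 1)[0].split())
print(title)
```

Because the marker is a fixed string, the phrase can later be found again with a plain search for `>>>` and `<<<`.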

You may ask what happens if you don't notice the problem with the *CAMPUS INTERVIEWS ONLY* job postings, since such an issue is hard to detect. That's where the power of `multiple_toolkit` comes in. This module contains all the **checkpoints** that must pass before you can confidently trust the final results. If a checkpoint fails, Python raises an AssertionError showing which checkpoint did not pass and why. These checkpoints are based on my own experience studying the patterns in these job postings. Thus the code, and in particular the user-defined modules, is written in a way that fosters data cleaning rather than code tweaking when a bug is found. Experience has shown that code tweaking is not suitable for our goal of creating a csv file from the job postings (more details in the README file).

Later on, if you find an issue that is not covered by one of my checkpoints, all you need to do is add a checkpoint and one assert statement to the main function of JOB_CLASS_TITLE in `multiple_toolkit` (more details in the README file). It is very easy to do so, and you will have very reliable results.
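As a rough illustration of the checkpoint pattern, here is a minimal sketch. The function names and the two rules are hypothetical and are not the actual checkpoints in `multiple_toolkit`; they only show how one checkpoint plus one assert statement can stop the pipeline and report the offending titles.

```python
def checkpoint_all_caps(titles):
    """Hypothetical checkpoint: return titles that contain lowercase letters."""
    return [t for t in titles if t != t.upper()]

def checkpoint_no_campus_prefix(titles):
    """Hypothetical checkpoint: return titles that start with the campus phrase."""
    return [t for t in titles if t.startswith("CAMPUS INTERVIEWS ONLY")]

def get_titles_checked(titles):
    """Return the titles only if every checkpoint passes."""
    failed = checkpoint_all_caps(titles)
    assert not failed, f"Checkpoint 'all caps' failed for: {failed}"
    failed = checkpoint_no_campus_prefix(titles)
    assert not failed, f"Checkpoint 'no campus prefix' failed for: {failed}"
    return titles

clean = get_titles_checked(["311 DIRECTOR", "ACCOUNTANT"])

try:
    get_titles_checked(["CAMPUS INTERVIEWS ONLY ARCHITECTURAL ASSOCIATE"])
except AssertionError as err:
    message = str(err)   # names the checkpoint and the offending titles
    print(message)
```

Adding a new rule means writing one more checkpoint function and one more assert line; the fix itself is then made in the data, not in the code.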

[1] Actually, the results above don't have this issue because I already cleaned **my** data. However, if you run my code for the first time using **your** own data, you will see this issue. Furthermore, Python will raise an AssertionError until you normalize the data as suggested.

In [34]:
# Get an n-by-1 dataframe of all job titles
# If a checkpoint fails, AssertionError will be raised
mtk.jct_get_many()

| Unnamed: 0 | JOB_CLASS_TITLE |
|---|---|
| 0 | 311 DIRECTOR |
| 1 | ACCOUNTANT |
| 2 | ACCOUNTING CLERK |
| 3 | ACCOUNTING RECORDS SUPERVISOR |
| 4 | ADMINISTRATIVE ANALYST |
| 5 | ADMINISTRATIVE CLERK |
| 6 | ADMINISTRATIVE HEARING EXAMINER |
| 7 | ADVANCE PRACTICE PROVIDER CORRECTIONAL CARE |
| 8 | AIR CONDITIONING MECHANIC |
| 9 | AIR CONDITIONING MECHANIC SUPERVISOR |


# <font size=8, color='green'>END DEMO</font>