In [84]:
import pandas as pd
data_url = 'https://data.cityofnewyork.us/api/views/kpav-sd4t/rows.csv?accessType=DOWNLOAD'
df = pd.read_csv(data_url)

Method

- We use the midpoint of the salary range for each position.
- For the purposes of this analysis, all daily and hourly positions are annualized as if they were full time:
  - Hourly rates are estimated at 40 hours a week for 50 weeks
  - Daily rates are estimated at 5 days a week for 50 weeks
- The time a job is left open is used as a proxy for how difficult it is to fill
  - This is calculated as the time from the original posting date to the last update
- Duplicate rows are removed; only the first occurrence of each Job ID is kept
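
The open-time proxy from the list above can be sketched as follows. This is a minimal illustration, not part of the original analysis: it assumes the 'Posting Date' and 'Posting Updated' columns hold MM/DD/YYYY timestamps (as in the rows shown further down), and the helper name `add_days_open` and the `days_open` column are hypothetical.

```python
import pandas as pd

DATE_FORMAT = '%m/%d/%Y %H:%M:%S'  # assumed from the sample rows

def add_days_open(frame):
    """Return a copy of frame with a hypothetical 'days_open' column."""
    out = frame.copy()
    posted = pd.to_datetime(out['Posting Date'], format=DATE_FORMAT)
    updated = pd.to_datetime(out['Posting Updated'], format=DATE_FORMAT)
    # Whole days between the original posting and the last update
    out['days_open'] = (updated - posted).dt.days
    return out

# Two postings taken from the Parks rows shown in this notebook
sample = pd.DataFrame({
    'Job ID': [191646, 223426],
    'Posting Date': ['04/24/2015 00:00:00', '02/19/2016 00:00:00'],
    'Posting Updated': ['04/11/2016 00:00:00', '01/19/2017 00:00:00'],
})
print(add_days_open(sample)[['Job ID', 'days_open']])
```

Working on a copy keeps the input frame unchanged, which avoids the chained-assignment problems that arise when adding columns to a slice.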
  
  
Notes:

Titles like City Seasonal Aide and City Park Worker have hundreds of low-paying open positions.
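
One way to check a note like this is to total the open positions per civil service title. The sketch below is only an illustration: the helper name `positions_by_title` is hypothetical, and the toy rows use made-up numbers rather than values from the dataset.

```python
import pandas as pd

def positions_by_title(frame, top=10):
    """Total open positions per civil service title, largest first."""
    deduped = frame.drop_duplicates('Job ID')  # keep first row per Job ID
    totals = deduped.groupby('Civil Service Title')['# Of Positions'].sum()
    return totals.sort_values(ascending=False).head(top)

# Toy rows with made-up numbers, only to show the shape of the result
toy = pd.DataFrame({
    'Job ID': [1, 1, 2, 3],
    'Civil Service Title': ['CITY SEASONAL AIDE', 'CITY SEASONAL AIDE',
                            'FORESTER', 'CITY SEASONAL AIDE'],
    '# Of Positions': [200, 200, 16, 50],
})
print(positions_by_title(toy))
```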

In [105]:
pd.set_option('display.max_columns', None)
df.columns

Index([u'Job ID', u'Agency', u'Posting Type', u'# Of Positions',
       u'Business Title', u'Civil Service Title', u'Title Code No', u'Level',
       u'Salary Range From', u'Salary Range To', u'Salary Frequency',
       u'Work Location', u'Division/Work Unit', u'Job Description',
       u'Minimum Qual Requirements', u'Preferred Skills',
       u'Additional Information', u'To Apply', u'Hours/Shift',
       u'Work Location 1', u'Recruitment Contact', u'Residency Requirement',
       u'Posting Date', u'Post Until', u'Posting Updated', u'Process Date'],
      dtype='object')

In [98]:
deduped_df = df.drop_duplicates('Job ID')

In [106]:
deduped_df[deduped_df['Agency']=='DEPT OF PARKS & RECREATION'].head()

Unnamed: 0,Job ID,Agency,Posting Type,# Of Positions,Business Title,Civil Service Title,Title Code No,Level,Salary Range From,Salary Range To,Salary Frequency,Work Location,Division/Work Unit,Job Description,Minimum Qual Requirements,Preferred Skills,Additional Information,To Apply,Hours/Shift,Work Location 1,Recruitment Contact,Residency Requirement,Posting Date,Post Until,Posting Updated,Process Date
191,191646,DEPT OF PARKS & RECREATION,External,16,Forester,FORESTER,81361,2,50000,52460,Annual,Flushing Meadow Pk Olmsted Ctr,Forestry,"The mission of Forestry, Horticulture, and Nat...",1. A Master’s Degree from an accredited coll...,1. Proficiency in Microsoft Office. 2. Excell...,www.nyc.gov/parks,City employees: 1) Apply through Employee Sel...,,TBD,,This position is exempt from NYC residency req...,04/24/2015 00:00:00,,04/11/2016 00:00:00,02/28/2017 00:00:00
298,223426,DEPT OF PARKS & RECREATION,Internal,3,Architect,ARCHITECT,21215,2,73000,101148,Annual,Flushing Meadow Pk Olmsted Ctr,CP ADMIN,"• Under general supervision, develop designs...",1. A valid New York State Registration as an ...,1. Excellent knowledge of AutoCAD. 2. Excellen...,*Posting period extended. Previous applicants ...,City employees: 1) Apply through Employee Self...,,"Olmsted Ctr., Queens",,This position is exempt from NYC residency req...,02/19/2016 00:00:00,,01/19/2017 00:00:00,02/28/2017 00:00:00
410,231349,DEPT OF PARKS & RECREATION,Internal,1,Senior Specifications Writer - Structures,ASSISTANT ARCHITECT,21210,0,70000,77000,Annual,Flushing Meadow Pk Olmsted Ctr,CP ADMIN,"• Under supervision, write and update specif...",1. A Bachelor or a Master of Architecture degr...,"1. Four years of full-time, paid experience in...",www.nyc.gov/parks,City employees: 1) Apply through Employee Self...,,"Olmsted Center, Queens",,This position is exempt from NYC residency req...,02/19/2016 00:00:00,,06/28/2016 00:00:00,02/28/2017 00:00:00
428,227629,DEPT OF PARKS & RECREATION,Internal,1,Electrical Engineer,ELECTRICAL ENGINEER,20315,1,65000,88000,Annual,Flushing Meadow Pk Olmsted Ctr,CP ADMIN,"• Under general supervision, develop drawing...","(1) Four (4) years of full-time, satisfactory ...",1. Experience with NYC Construction Codes and ...,NOTE: *Posting period extended. Previous appli...,City employees: 1) Apply through Employee Self...,,"Olmsted Center, Queens",,This position is exempt from NYC residency req...,03/25/2016 00:00:00,,06/28/2016 00:00:00,02/28/2017 00:00:00
617,239310,DEPT OF PARKS & RECREATION,Internal,1,Deputy Director of Survey,SURVEYOR,21015,3,80000,90000,Annual,Flushing Meadow Pk Olmsted Ctr,CP ADMIN,• Under general direction of the Director of...,1. A baccalaureate degree from an accredited c...,"1. Knowledge of AutoCAD, Photoshop, PowerPoint...",www.nyc.gov/parks,City Employees: 1) Apply through Employee Sel...,,"Olmsted Center, Queens",,This position is exempt from NYC residency req...,04/22/2016 00:00:00,,06/28/2016 00:00:00,02/28/2017 00:00:00


In [102]:
# Work on an explicit copy so the column assignments below do not raise SettingWithCopyWarning
salary = deduped_df[['Agency', '# Of Positions', 'Salary Range From', 'Salary Range To', 'Salary Frequency']].copy()
salary['salary_midpoint'] = (salary['Salary Range From'] + salary['Salary Range To']) / 2

# Annualize hourly and daily rates (see http://stackoverflow.com/questions/12307099/modifying-a-subset-of-rows-in-a-pandas-dataframe)
hourly = salary['Salary Frequency'] == 'Hourly'
daily = salary['Salary Frequency'] == 'Daily'
salary.loc[hourly, 'salary_midpoint'] *= 40 * 50  # 40 hours a week for 50 weeks
salary.loc[daily, 'salary_midpoint'] *= 5 * 50    # 5 days a week for 50 weeks

salary['salary_total'] = salary['# Of Positions'] * salary['salary_midpoint']

# Position-weighted average salary per agency
salary_grouped = salary.groupby('Agency')[['# Of Positions', 'salary_total']].sum()
salary_grouped['average_salary'] = salary_grouped['salary_total'] / salary_grouped['# Of Positions']
salary_grouped[['# Of Positions', 'average_salary']].sort_values('average_salary', ascending=False)


Agency,# Of Positions,average_salary
TEACHERS RETIREMENT SYSTEM,1,107669.0
CONFLICTS OF INTEREST BOARD,3,103678.333333
CIVILIAN COMPLAINT REVIEW BD,16,95774.71875
FINANCIAL INFO SVCS AGENCY,9,88812.055556
DEPT OF INFO TECH & TELECOMM,156,84214.358974
FIRE DEPARTMENT,64,83087.476562
DISTRICT ATTORNEY RICHMOND COU,9,81632.444444
OFFICE OF MANAGEMENT & BUDGET,2,80500.25
ADMIN FOR CHILDREN'S SVCS,124,80164.173387
DEPARTMENT FOR THE AGING,13,79711.346154
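
The average_salary column above is weighted by the number of positions in each posting, so a single posting with many openings counts more than a posting with one. A small sketch of the difference from a plain per-posting mean, using made-up toy numbers and placeholder agency names:

```python
import pandas as pd

# Toy postings with made-up numbers; agencies 'A' and 'B' are placeholders
toy = pd.DataFrame({
    'Agency': ['A', 'A', 'B'],
    '# Of Positions': [9, 1, 2],
    'salary_midpoint': [40000.0, 100000.0, 60000.0],
})
toy['salary_total'] = toy['# Of Positions'] * toy['salary_midpoint']

grouped = toy.groupby('Agency')[['# Of Positions', 'salary_total']].sum()
# Weighted: total salary divided by total positions
weighted = grouped['salary_total'] / grouped['# Of Positions']
# Unweighted: every posting counts once, regardless of openings
unweighted = toy.groupby('Agency')['salary_midpoint'].mean()

print(pd.DataFrame({'weighted': weighted, 'unweighted': unweighted}))
```

For agency A the nine lower-paid openings dominate the weighted figure, while the unweighted mean treats both postings equally.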
