<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 15px; height: 80px">


# Web Scraping for Indeed.com and Predicting Salaries

### Business Case Overview

You're working as a data scientist for a contracting firm that's rapidly expanding. Now that they have their most valuable employee (you!), they need to leverage data to win more contracts. Your firm offers technology and scientific solutions and wants to be competitive in the hiring market. Your principal wants you to

   - determine the industry factors that are most important in predicting the salary amounts for these data.

To limit the scope, your principal has suggested that you *focus on data-related job postings*, e.g. data scientist, data analyst, research scientist, business intelligence, and any others you might think of. You may also want to decrease the scope by *limiting your search to a single region.*

Hint: Aggregators like [Indeed.com](https://www.indeed.com) regularly pool job postings from a variety of markets and industries.

**Goal:** Scrape your own data from a job aggregation tool like Indeed.com in order to collect the data to best answer this question.

---

### Directions

In this project you will be leveraging a variety of skills. The first will be to use the web-scraping and/or API techniques you've learned to collect data on data jobs from Indeed.com or another aggregator. Once you have collected and cleaned the data, you will use it to address the question above.

### Factors that impact salary

To predict salary the most appropriate approach would be a regression model.
Here instead we just want to estimate which factors (like location, job title, job level, industry sector) lead to high or low salary and work with a classification model. To do so, split the salary into two groups of high and low salary, for example by choosing the median salary as a threshold (in principle you could choose any single or multiple splitting points).

Use all the skills you have learned so far to build a predictive model.
Whatever you decide to use, the most important thing is to justify your choices and interpret your results. *Communication of your process is key.* Note that most listings **DO NOT** come with salary information. You'll need to be able to extrapolate or predict the expected salaries for these listings.

### Scraping job listings from Indeed.com

We will be scraping job listings from Indeed.com using BeautifulSoup. Luckily, Indeed.com is a simple text page where we can easily find relevant entries.

First, look at the source of an Indeed.com page, for example <http://www.indeed.com/jobs?q=data+scientist+%2420%2C000&l=New+York&start=10>.

Notice, each job listing is underneath a `div` tag with a class name of `result`. We can use BeautifulSoup to extract those. 

#### Set up a request (using `requests`) to the URL below. Use BeautifulSoup to parse the page and extract all results (HINT: Look for div tags with class name result).

The URL here has many query parameters:

- `q` for the job search
- The search term is followed by "$20,000" (URL-encoded as `%2420%2C000`) to return results with salaries (or expected salaries) above $20,000
- `l` for a location 
- `start` for what result number to start on
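As a rough illustration of how these parameters combine, the query string can be assembled with the standard library instead of being typed by hand. The helper name `build_search_url` is hypothetical; the parameter names `q`, `l` and `start` are the ones listed above.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.indeed.com/jobs?"  # base URL used in this notebook


def build_search_url(query, location, start=0, base_url=BASE_URL):
    """Return a search URL; urlencode escapes '$' as %24, ',' as %2C and spaces as '+'."""
    params = {"q": query, "l": location, "start": start}
    return base_url + urlencode(params)


url = build_search_url("data scientist $20,000", "New York", start=10)
print(url)
# http://www.indeed.com/jobs?q=data+scientist+%2420%2C000&l=New+York&start=10
```

Building the URL this way reproduces the example URL shown earlier and avoids hand-encoding mistakes when the search term or city changes.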

In [None]:
URL = "http://www.indeed.com/jobs?"

In [37]:
import requests
import bs4
from bs4 import BeautifulSoup
import numpy as np  # np.nan is used below as the missing-value marker
import pandas as pd
from tqdm import tqdm_notebook

pd.set_option('display.max_colwidth',1000, 'display.max_columns',1000)

Let's look at one result more closely. A single `result` looks like

```
<div class=" row result" data-jk="2480d203f7e97210" data-tn-component="organicJob" id="p_2480d203f7e97210" itemscope="" itemtype="http://schema.org/JobPosting">
<h2 class="jobtitle" id="jl_2480d203f7e97210">
<a class="turnstileLink" data-tn-element="jobTitle" onmousedown="return rclk(this,jobmap[0],1);" rel="nofollow" target="_blank" title="AVP/Quantitative Analyst">AVP/Quantitative Analyst</a>
</h2>
<span class="company" itemprop="hiringOrganization" itemtype="http://schema.org/Organization">
<span itemprop="name">
<a href="/cmp/Alliancebernstein?from=SERP&amp;campaignid=serp-linkcompanyname&amp;fromjk=2480d203f7e97210&amp;jcid=b374f2a780e04789" target="_blank">
    AllianceBernstein</a></span>
</span>
<tr>
<td class="snip">
<nobr>$117,500 - $127,500 a year</nobr>
<div>
<span class="summary" itemprop="description">
Conduct quantitative and statistical research as well as portfolio management for various investment portfolios. Collaborate with Quantitative Analysts and</span>
</div>
</div>
</td>
</tr>
</table>
</div>
```

While this has some more verbose elements removed, we can see that there is some structure to the above:
- The salary is in a `span` with `class='salaryText'` (the abridged sample above instead shows it in a `nobr` element).
- The title of a job is in a link with class set to `jobtitle` and a `data-tn-element='jobTitle'`.  
- The location is set in a `span` with `class='location'`. 
- The company is set in a `span` with `class='company'`. 
- Decide which other components could be relevant, for example the region or the summary of the job advert.

### Write 4 functions to extract each item: location, company, job, and salary.

Example: 
```python
def extract_location_from_result(result):
    return result.find ...
```


- **Make sure these functions are robust and can handle cases where the data/field may not be available.**
    - Remember to check if a field is empty or `None` before attempting to call methods on it.
    - Remember to use `try/except` if you anticipate errors.
- **Test** the functions on the results above and simple examples.
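One way to follow this advice is a single helper that checks for `None` before reading the text, so each field extractor becomes a one-line call. This is only a sketch: the name `safe_text` is hypothetical, and the HTML snippet is hand-written for testing, not scraped.

```python
from bs4 import BeautifulSoup


def safe_text(result, tag, attrs, default=None):
    """Return the stripped text of the first matching tag, or `default` if it is absent."""
    node = result.find(tag, attrs=attrs)
    if node is None:  # field missing from this listing
        return default
    return node.get_text(strip=True)


# Small hand-written snippet for testing; not real scraped data.
html = '<div class="result"><span class="company"> Example Co </span></div>'
sample = BeautifulSoup(html, "html.parser").find("div", attrs={"class": "result"})

company = safe_text(sample, "span", {"class": "company"})
salary = safe_text(sample, "span", {"class": "salaryText"})
print(company, salary)
```

Here the company is found and the salary is not, so the second call falls back to the default instead of raising an `AttributeError`.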

In [None]:
## YOUR CODE HERE

def extract_title_from_result(result):
    try:
        return result.find('a', attrs={'class':'jobtitle', 'data-tn-element':'jobTitle'}).text.strip()
    except:
        return np.nan
    
def extract_company_from_result(result):
    try:
        return result.find('span', attrs={'class':'company'}).text.strip()
    except:
        return np.nan
        
def extract_rating_from_result(result):
    try:
        return result.find('span', attrs={'class':'ratingsDisplay'}).text.strip()
    except:
        return np.nan
        
def extract_location_from_result(result):
    try:
        return result.find('span', attrs={'class':'location'}).text.strip()
    except:
        return np.nan

def extract_summary_from_result(result):
    try:
        return result.find('div', attrs={'class':'summary'}).text.strip()
    except:
        return np.nan

def extract_date_from_result(result):
    try:
        return result.find('span', attrs={'class':'date'}).text.strip()
    except:
        return np.nan

def extract_salary_from_result(result):
    try:
        return result.find('span', attrs={'class':'salaryText'}).text.strip()
    except:
        return np.nan
    
def calc_salary_USD_from_salary(result):
    try:
        return extract_salary_from_result(result)*fx_rate[str(city)]
    except:
        return extract_salary_from_result(result)

Now, to scale up our scraping, we need to accumulate more results. We can do this by examining the URL above.

- "http://www.indeed.com/jobs?q=data+scientist+%2420%2C000&l=New+York&start=10"

There are two query parameters here we can alter to collect more results, the `l=New+York` and the `start=10`. The first controls the location of the results (so we can try a different city). The second controls where in the results to start and gives 10 results (thus, we can keep incrementing by 10 to go further in the list).
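The combinations of city, search term and `start` offset can be generated up front, which keeps the request loop itself short. The sketch below assumes a page size of 10, as described above; the generator name `iter_search_params` is hypothetical.

```python
from itertools import product


def iter_search_params(cities, queries, max_results, step=10):
    """Yield one dict of query parameters per (city, query, start offset)."""
    for city, query in product(cities, queries):
        for start in range(0, max_results, step):
            yield {"q": query, "l": city, "start": start}


params = list(iter_search_params(["New York", "Chicago"], ["data scientist"], max_results=30))
print(len(params), params[0], params[-1])
```

When such a dict is passed as `params=` to `requests.get`, the library encodes spaces itself, so the city can be written as `New York`; a literal `+` in the value would be escaped as `%2B`.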

### Complete the following code to collect results from multiple cities and starting points. 
- Enter your city below to add it to the search.
- Remember to convert your salary to U.S. Dollars to match the other cities if the currency is different.

#### Use the functions you wrote above to parse out the 4 fields - location, title, company and salary. Create a dataframe from the results with those 4 columns.

In [None]:
## YOUR CODE HERE

import re

UK_CITY = ['London', 'Edinburgh', 'Cambridge', 'Oxford', 'Manchester', 'Reading', 'Bristol', 'Belfast',
            'Leeds', 'Glasgow', 'Birmingham', 'Nottingham']
UK_FX_rate = 1.3

max_results_per_city = 5000
job_city = {}
run_count = 0

for city in set(['New+York', 'Chicago', 'San+Francisco', 'Austin', 'Seattle', 'Los+Angeles', 'Philadelphia',
                 'Atlanta', 'Dallas', 'Pittsburgh', 'Portland', 'Phoenix', 'Denver', 'Houston',
                 'Miami'] + UK_CITY):
    
    title = []
    company = []
    rating = []
    search_city = []
    location = []
    salary = []
    age = []
    summary = []
    
    run_count+=1
    print('Searching City number: ', run_count)

    for occupation in set(['data scientist', 'data analyst', 'data analytics', 'data engineer',
                           'business intelligence', 'machine learning', 'artificial intelligence']): 
        old_page = 0
        print('Running: ', str(city),' & ',str(occupation))

        for page in tqdm_notebook(range(0, max_results_per_city, 10)):
            if city in UK_CITY:
                URL = "http://www.indeed.co.uk/jobs?"
                PARAMS = dict(as_phr=str(occupation), l=city, start=page)
            else:
                URL = "http://www.indeed.com/jobs?"
                PARAMS = dict(as_phr=str(occupation), l=city, start=page)

            r = requests.get(url=URL, params=PARAMS)
            soup = BeautifulSoup(r.text, 'html.parser')

            if soup.find('div', attrs={'id':'searchCountPages'}) is None:
                break
            else:
                new_page = int(re.findall(re.compile(r'\w+'), soup.find('div', attrs={'id':'searchCountPages'}).text.strip('\n'))[1])

                if new_page > old_page:
                    old_page = new_page
                    for posting in soup.find_all('div', attrs={'class':'result'}):
                        title.append(extract_title_from_result(posting))
                        company.append(extract_company_from_result(posting))
                        rating.append(extract_rating_from_result(posting))
                        search_city.append(city)
                        location.append(extract_location_from_result(posting))
                        salary.append(extract_salary_from_result(posting))
                        age.append(extract_date_from_result(posting))
                        summary.append(extract_summary_from_result(posting))
                else:
                    break

    # creating a df for that city's results and writing it to [insert city].csv...
    job_city_df = pd.DataFrame(dict(title=title,company=company,rating=rating,search_city=search_city,
                                    location=location,salary=salary,age=age,summary=summary))
    job_city_df.to_csv(str(city)+'.csv', index=False)
    
print('FINISHED')

In [None]:
# Load in CSVs and create a dataframe from it:

jobs = pd.DataFrame()

for city in set(['New+York', 'Chicago', 'San+Francisco', 'Austin', 'Seattle', 'Los+Angeles', 'Philadelphia',
                 'Atlanta', 'Dallas', 'Pittsburgh', 'Portland', 'Phoenix', 'Denver', 'Houston',
                 'Miami'] + UK_CITY):

    jobs_city = pd.read_csv(str(city)+'.csv')

    # drop the stray index column if the csv was saved with its index
    if 'Unnamed: 0' in jobs_city.columns:
        jobs_city = jobs_city.drop(columns=['Unnamed: 0'])

    jobs = pd.concat([jobs, jobs_city], ignore_index=True)

print(jobs.shape)
jobs.head()

Lastly, we need to clean up salary data. 

1. Only a small number of the scraped results have salary information - only these will be used for modeling.
1. Some of the salaries are hourly or weekly rather than yearly; these will not be useful to us for now.
1. Some of the entries may be duplicated.
1. The salaries are given as text and usually with ranges.
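The first three points can be handled with a few vectorised pandas calls. The sketch below uses made-up rows purely for illustration; only the column name `salary` is taken from this notebook.

```python
import pandas as pd

# Made-up rows for illustration only; the column name `salary` matches the notebook.
df = pd.DataFrame({"salary": ["$117,500 - $127,500 a year", "$25 an hour", None,
                              "$117,500 - $127,500 a year", "£400 a day"]})

# Drop listings without a salary, then drop exact duplicates.
salaried = df.dropna(subset=["salary"]).drop_duplicates().copy()

# Pull out the pay period; rows with no recognised period get NaN.
salaried["salary_freq"] = salaried["salary"].str.extract(
    r"\b(year|month|week|day|hour)\b", expand=False)

yearly = salaried[salaried["salary_freq"] == "year"]
print(yearly)
```

Only the single yearly listing survives; the hourly and daily rows are kept in `salaried` in case they are wanted later.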

#### Keep only the entries with annual salaries by filtering out entries without salaries and entries whose salary is not yearly (e.g. those that refer to hour or week). Also, remove duplicate entries.

In [None]:
## YOUR CODE HERE: adding salary frequencies, and restricting on yearly salaries, also dropping dupes.
jobs_salaried = jobs[jobs['salary'].notnull()].copy()  # copy so the in-place edits below do not act on a view of jobs

# find and remove dupes first:
print('Total shape:',jobs_salaried.shape)
print('Duplicates:',jobs_salaried[jobs_salaried.duplicated()].shape)

jobs_salaried.drop_duplicates(inplace=True)
print('Any dupes left?:',jobs_salaried[jobs_salaried.duplicated()].shape)

# create salary_freq column and filter for only yearly salaries:
salary_freq = []
for sal in jobs_salaried['salary']:
    if 'year' in sal:
        salary_freq.append('year')
    elif 'month' in sal:
        salary_freq.append('month')
    elif 'week' in sal:
        salary_freq.append('week')
    elif 'day' in sal:
        salary_freq.append('day')
    elif 'hour' in sal:
        salary_freq.append('hour')
    else:
        salary_freq.append(np.nan)
        
jobs_salaried.insert(6, 'salary_freq', salary_freq)
print('Unique Salary Frequencies found:',jobs_salaried.salary_freq.unique())

jobs_salaried_year = jobs_salaried[jobs_salaried['salary_freq']=='year'].copy()  # copy before inserting new columns
print('Checking only year remains:',jobs_salaried_year.salary_freq.unique())
print('Jobs Salaried Yearly:',jobs_salaried_year.shape)

#### Write a function that takes a salary string and converts it to a number, averaging a salary range if necessary.

In [None]:
## YOUR CODE HERE: creating salary_avg from salary ranges, and dollarizing UK salaries

import re
from re import sub
from decimal import Decimal

jobs_salaried_year.insert(7, 'isRange', jobs_salaried_year.apply(lambda x: 'Range' if '-' in x['salary'] else
                                                        'Not Range', axis=1))

salary_range_min = []
salary_range_max = []
salary_avg = []
salary_avg_USD = []

for x in jobs_salaried_year[['salary','isRange']].values:
    if x[1] == 'Not Range':
        salary_range_min.append(Decimal(sub(r'[^\d.]', '', x[0])))
        salary_range_max.append(Decimal(sub(r'[^\d.]', '', x[0])))
        salary_avg.append(Decimal(sub(r'[^\d.]', '', x[0])))

    if x[1] == 'Range':
        range_min = re.findall(r'[\d.,]+',x[0])[0]
        range_min = Decimal(sub(r',','',range_min))
        salary_range_min.append(range_min)
        
        range_max = re.findall(r'[\d.,]+',x[0])[1]
        range_max = Decimal(sub(r',','',range_max))
        salary_range_max.append(range_max)
        
        salary_avg.append((range_min+range_max)/2)
    
jobs_salaried_year.insert(8, 'salary_range_min', salary_range_min)
jobs_salaried_year.insert(9, 'salary_range_max', salary_range_max)
jobs_salaried_year.insert(10, 'salary_avg', salary_avg)

jobs_salaried_year.insert(11, 'salary_avg_USD',
                         jobs_salaried_year.apply(lambda x: round(Decimal(str(UK_FX_rate))*x['salary_avg'],2) 
                                                  if x['search_city'] in UK_CITY
                                                  else x['salary_avg'], axis=1))

jobs_salaried_year.head(1)
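The range-averaging logic above can also be packaged as one reusable function, as the heading asks. This is a minimal sketch rather than the notebook's own implementation: the name `salary_to_number` is hypothetical, and it assumes every number in the string is a salary figure, averaging whatever it finds.

```python
import re

# A run of digits with optional thousands separators and an optional decimal part.
_NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def salary_to_number(text):
    """Return the salary as a float, averaging the figures if a range is given."""
    amounts = [float(match.replace(",", "")) for match in _NUMBER.findall(text)]
    if not amounts:
        return None  # no figure in the string
    return sum(amounts) / len(amounts)


print(salary_to_number("$117,500 - $127,500 a year"))  # 122500.0
print(salary_to_number("£45,000 a year"))              # 45000.0
print(salary_to_number("Competitive"))                 # None
```

Returning `None` for strings without digits makes it easy to drop unparseable rows afterwards.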

### Save your results as a CSV

In [None]:
## YOUR CODE HERE: write to csv
jobs_salaried_year.to_csv('indeed_results.csv', index=False)

### Load in the data of scraped salaries

In [58]:
## YOUR CODE HERE: load in indeed_results.csv
import numpy as np  # np.nan and np.linspace are used in later cells
import pandas as pd
indeed = pd.read_csv('indeed_results.csv')

### We want to predict a binary variable - whether the salary was low or high. Compute the median salary and create a new binary variable that is true when the salary is high (above the median).

We could also perform Linear Regression (or any regression) to predict the salary value here. Instead, we are going to convert this into a _binary_ classification problem, by predicting two classes, HIGH vs LOW salary.

While performing regression may be better, performing classification may help remove some of the noise of the extreme salaries. We don't have to choose the `median` as the splitting point - we could also split on the 75th percentile or any other reasonable breaking point.

In fact, the ideal scenario may be to predict many levels of salaries.
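Both the median split and a multi-level split can be expressed compactly with pandas. The sketch below uses made-up salaries for illustration; the label names mirror the ones used in this notebook, and `pd.qcut` is offered as one possible approach, not the only one.

```python
import pandas as pd

# Made-up salaries for illustration only.
salaries = pd.Series([30000, 50000, 70000, 90000, 110000, 130000, 150000, 170000])

# Four equal-sized bins based on the quartiles of the data.
quartile = pd.qcut(salaries, q=4, labels=["0_25", "25_50", "50_75", "75_100"])

# Binary target: HIGH when strictly above the median, LOW otherwise.
high_low = (salaries > salaries.median()).map({True: "HIGH", False: "LOW"})

print(quartile.tolist())
print(high_low.tolist())
```

With eight evenly spaced values, each quartile receives two salaries and the median split gives four `LOW` and four `HIGH` labels.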

In [59]:
## YOUR CODE HERE: categorise salary into 0-25%-ile, 25-50%-ile, 50-75%-ile, 75-100%-ile

def split_in_quarter(x):
    if x <= indeed.salary_avg_USD.quantile(0.25):
        return '0_25'
    elif x > indeed.salary_avg_USD.quantile(0.25) and x <= indeed.salary_avg_USD.quantile(0.5):
        return '25_50'
    elif x > indeed.salary_avg_USD.quantile(0.5) and x <= indeed.salary_avg_USD.quantile(0.75):
        return '50_75'
    elif x > indeed.salary_avg_USD.quantile(0.75):
        return '75_100'
    else:
        return np.nan

def split_in_half(x):
    if x <= indeed.salary_avg_USD.quantile(0.5):
        return 'LOW'
    elif x > indeed.salary_avg_USD.quantile(0.5):
        return 'HIGH'
    else:
        return np.nan

indeed.insert(12, 'target_quartered', indeed['salary_avg_USD'].apply(split_in_quarter))
indeed.insert(13, 'target_halved', indeed['salary_avg_USD'].apply(split_in_half))

#### Thought experiment: What is the baseline accuracy for this model?

In [60]:
## YOUR CODE HERE: baseline accuracy
baseline = indeed.target_quartered.value_counts(normalize=True).max()
print('Baseline Accuracy:',baseline)
print('Normalized Value Counts:')
print(indeed.target_quartered.value_counts(normalize=True))

Baseline Accuracy: 0.25608536178726243
Normalized Value Counts:
25_50     0.256085
0_25      0.251751
50_75     0.247416
75_100    0.244748
Name: target_quartered, dtype: float64


### Create a classification model to predict High/Low salary. 


- Start by ONLY using the location as a feature.
- Use at least two different classifiers you find suitable.
- Remember that scaling your features might be necessary.
- Display the coefficients/feature importances and write a short summary of what they mean.
- Create a few new variables in your dataframe to represent interesting features of a job title (e.g. whether 'Senior' or 'Manager' is in the title).
- Incorporate other text features from the title or summary that you believe will predict the salary.
- Then build new classification models including also those features. Do they add any value?
- Tune your models by testing parameter ranges, regularization strengths, etc. Discuss how that affects your models.
- Discuss model coefficients or feature importances as applicable.
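As a starting point for the first bullet, location can be one-hot encoded inside a scikit-learn pipeline so that the encoding is refitted on each cross-validation fold. This is a sketch on made-up rows, assuming a binary `target_halved` column like the one created earlier; it is not the model used later in this notebook.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Made-up rows for illustration only; real data would come from indeed_results.csv.
toy = pd.DataFrame({
    "search_city": ["London", "London", "London", "London",
                    "Denver", "Denver", "Denver", "Denver"],
    "target_halved": ["LOW", "LOW", "LOW", "HIGH", "HIGH", "HIGH", "HIGH", "LOW"],
})

model = make_pipeline(OneHotEncoder(handle_unknown="ignore"),
                      LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=1)
scores = cross_val_score(model, toy[["search_city"]], toy["target_halved"], cv=cv)
print(scores.mean())

# Fit on everything to inspect coefficients; in the binary case they refer to classes_[1].
model.fit(toy[["search_city"]], toy["target_halved"])
encoder = model.named_steps["onehotencoder"]
coefs = model.named_steps["logisticregression"].coef_[0]
print(model.classes_[1], dict(zip(encoder.categories_[0], coefs)))
```

A positive coefficient for a city raises the predicted probability of the class printed alongside the dictionary, which is how the location coefficients can be read in the written summary.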

In [61]:
## YOUR CODE HERE: using two different classifiers on location only

from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score, KFold, StratifiedKFold
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier, RandomForestClassifier

# set X,y
X = indeed['search_city']
y = indeed['target_quartered']

# dummify X
X_dum = pd.get_dummies(X, drop_first=True)

# scale
scaler = StandardScaler()
Xs = scaler.fit_transform(X_dum)

# DecisionTreeClassifier:
kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
dt = DecisionTreeClassifier()
dt.fit(Xs, y)
print('Best Training CVScore:', cross_val_score(dt, Xs, y, cv=5).mean())

# GridSearchCV with Bagging (and Decision Tree base_estimator):
bc = BaggingClassifier(n_estimators=100, n_jobs=2)
bc_params = {'max_features': np.linspace(0.1, 1, 5)}

gs_bc = GridSearchCV(bc, bc_params, cv=kf, n_jobs=2, verbose=1)
gs_bc.fit(Xs,y)
print('Best GS BC params:', gs_bc.best_params_)
best_gs_bc = gs_bc.best_estimator_
print('Best Training GS BC CVScore:', gs_bc.best_score_)

Best Training CVScore: 0.26644129104062325
Fitting 5 folds for each of 5 candidates, totalling 25 fits


[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.


Best GS BC params: {'max_features': 1.0}
Best Training GS BC CVScore: 0.342779632721202


[Parallel(n_jobs=2)]: Done  25 out of  25 | elapsed:    7.6s finished


In [62]:
# Summarise Feature Importance:
dt_feature_importance = pd.DataFrame({'importance':dt.feature_importances_}, index=X_dum.columns)
dt_feature_importance.sort_values('importance',ascending=False).T

# we see that Birmingham is the most important location feature: the splits the Decision Tree Classifier made
# on it account for the largest share of the total reduction in Gini impurity

| | Birmingham | Nottingham | Reading | Cambridge | Edinburgh | London | Miami | San+Francisco | Pittsburgh | Chicago | Dallas | Phoenix | Houston | Portland | Austin | Denver | Los+Angeles | New+York | Seattle | Philadelphia |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| importance | 0.164971 | 0.142649 | 0.139754 | 0.127168 | 0.109553 | 0.079659 | 0.047117 | 0.039418 | 0.022913 | 0.021399 | 0.018418 | 0.017877 | 0.017232 | 0.013029 | 0.012889 | 0.009878 | 0.008751 | 0.004295 | 0.002027 | 0.001004 |


In [63]:
# Summarize accuracy scores:

predictions = best_gs_bc.predict(Xs)
print('Confusion Matrix:')
print(confusion_matrix(y, predictions))
print(' ')
print('Classification Report:')
print(classification_report(y, predictions))

# the best classifier model at this stage is the BaggingClassifier. Recall is high for 75_100 (0.81) but low for
# the other classes, and precision for 75_100 itself is low (0.31): the model over-predicts the top quartile.

Confusion Matrix:
[[296  60   9 390]
 [200 101  45 422]
 [129  43  78 492]
 [ 57  20  63 594]]
 
Classification Report:
              precision    recall  f1-score   support

        0_25       0.43      0.39      0.41       755
       25_50       0.45      0.13      0.20       768
       50_75       0.40      0.11      0.17       742
      75_100       0.31      0.81      0.45       734

    accuracy                           0.36      2999
   macro avg       0.40      0.36      0.31      2999
weighted avg       0.40      0.36      0.31      2999
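The figures in the classification report follow directly from the confusion matrix, which is a useful check when interpreting them. The sketch below copies the matrix printed above and recomputes accuracy, recall and precision in plain Python. Note that these predictions were made on the same `Xs` the model was fitted on, so they describe training performance.

```python
# Confusion matrix copied from the output above (rows = true class, columns = predicted).
labels = ["0_25", "25_50", "50_75", "75_100"]
cm = [[296, 60, 9, 390],
      [200, 101, 45, 422],
      [129, 43, 78, 492],
      [57, 20, 63, 594]]

total = sum(sum(row) for row in cm)
accuracy = sum(cm[i][i] for i in range(len(cm))) / total

# Recall: correct predictions divided by the row total (all true members of the class).
recall = {lab: cm[i][i] / sum(cm[i]) for i, lab in enumerate(labels)}
# Precision: correct predictions divided by the column total (everything predicted as the class).
precision = {lab: cm[i][i] / sum(row[i] for row in cm) for i, lab in enumerate(labels)}

print(round(accuracy, 2))                              # 0.36
print({k: round(v, 2) for k, v in recall.items()})     # 75_100 -> 0.81
print({k: round(v, 2) for k, v in precision.items()})  # 75_100 -> 0.31
```

The last column of the matrix holds 1,898 of the 2,999 predictions, which is why `75_100` combines high recall with low precision.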



In [64]:
# Feature-engineering Title column:
# we start with TfidfVectorizer to get the unique features, and fit to regression to assess importance

from sklearn.feature_extraction.text import TfidfVectorizer

tvec = TfidfVectorizer(stop_words='english', token_pattern='[A-Za-z]+', ngram_range=(1,1))
tvec.fit(indeed['title'])

X = tvec.transform(indeed['title'])
y = indeed.target_quartered

lr = LogisticRegression(solver='lbfgs', multi_class='ovr')
print(cross_val_score(lr,X,y,cv=kf).mean())
lr.fit(X,y)
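# lr.coef_[0] holds the coefficients for the first class in lr.classes_ (here '0_25', the lowest quartile),
# so large positive values mark words associated with low salaries
pd.DataFrame(dict(coef=lr.coef_[0], features=tvec.get_feature_names())).sort_values('coef',ascending=False).head(5)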

0.5315225375626043


| | coef | features |
|---|---|---|
| 485 | 4.502704 | graduate |
| 602 | 3.166491 | junior |
| 57 | 3.123792 | apprentice |
| 924 | 3.017499 | recruitment |
| 396 | 2.717922 | executive |


In [65]:
pd.DataFrame(dict(coef=lr.coef_[0], features=tvec.get_feature_names())).sort_values('coef',ascending=True).head()

| | coef | features |
|---|---|---|
| 302 | -2.802431 | director |
| 624 | -2.620541 | lead |
| 363 | -2.275998 | engineer |
| 1019 | -1.977597 | senior |
| 1004 | -1.888458 | scientist |


In [66]:
# We choose the below interesting features to engineer for:

indeed.insert(1, 'isDS', indeed['title'].apply(lambda x: 1 if 'data science' in x.lower() or 'data scientist' in x.lower() else 0))
indeed.insert(2, 'isDA', indeed['title'].apply(lambda x: 1 if 'data analyst' in x.lower() or 'analytics' in x.lower() else 0))
indeed.insert(3, 'isDE', indeed['title'].apply(lambda x: 1 if 'data engineer' in x.lower() else 0))
indeed.insert(4, 'isML', indeed['title'].apply(lambda x: 1 if 'ml' in x.lower() or 'machine learning' in x.lower() else 0))
indeed.insert(5, 'isAI', indeed['title'].apply(lambda x: 1 if 'ai' in x.lower() or 'artificial intelligence' in x.lower() else 0))
indeed.insert(6, 'isBI', indeed['title'].apply(lambda x: 1 if 'bi' in x.lower() or 'business intelligence' in x.lower() else 0))

indeed.insert(7, 'isDirector', indeed['title'].apply(lambda x: 1 if 'director' in x.lower() else 0))
indeed.insert(8, 'isHead', indeed['title'].apply(lambda x: 1 if 'head' in x.lower() else 0))
indeed.insert(9, 'isLead', indeed['title'].apply(lambda x: 1 if 'lead' in x.lower() else 0))
indeed.insert(10, 'isManager', indeed['title'].apply(lambda x: 1 if 'mgr' in x.lower() or 'manager' in x.lower() else 0))
indeed.insert(11, 'isSenior', indeed['title'].apply(lambda x: 1 if 'sr' in x.lower() or 'senior' in x.lower() else 0))
indeed.insert(12, 'isJunior', indeed['title'].apply(lambda x: 1 if 'jr' in x.lower() or 'junior' in x.lower() else 0))
indeed.insert(13, 'isGraduate', indeed['title'].apply(lambda x: 1 if 'grad' in x.lower() or 'graduate' in x.lower() else 0))
indeed.insert(14, 'isApprentice', indeed['title'].apply(lambda x: 1 if 'apprentice' in x.lower() else 0))

indeed.head(1)


Unnamed: 0,title,isDS,isDA,isDE,isML,isAI,isBI,isDirector,isHead,isLead,isManager,isSenior,isJunior,isGraduate,isApprentice,company,rating,search_city,location,salary,salary_freq,isRange,salary_range_min,salary_range_max,salary_avg,salary_avg_USD,target_quartered,target_halved,age,summary
0,account manager,0,0,0,0,0,0,0,0,0,1,0,0,0,0,BusinesStaff,,Denver,"Denver, CO","$170,000 - $220,000 a year",year,Range,170000,220000,195000.0,195000.0,75_100,HIGH,30+ days ago,"Strategic mindset with proven ability to synthesize customer financial reports, industry information and market intelligence to develop customer growth…"
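Plain substring tests such as `'ai' in x.lower()` also match inside longer words (for example the `ai` in "retail"), which can inflate flags built from short abbreviations. A possible alternative, sketched here with a hypothetical helper `has_keyword`, is to match whole words using regular-expression word boundaries.

```python
import re


def has_keyword(title, keywords):
    """Return 1 if any keyword occurs in `title` as a whole word or phrase, else 0."""
    pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"
    return int(re.search(pattern, title, flags=re.IGNORECASE) is not None)


print(has_keyword("Retail Manager", ["ai", "artificial intelligence"]))  # 0: 'ai' inside 'retail' is ignored
print(has_keyword("AI Engineer", ["ai", "artificial intelligence"]))     # 1
print(has_keyword("Sr. Data Scientist", ["sr", "senior"]))               # 1
```

Switching the existing flags to this approach would change their values, so any downstream scores would need to be recomputed.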


In [67]:
# Clean up Rating column: replace null with None

indeed['rating'] = indeed['rating'].fillna('None')  # assign back instead of filling in place on a column selection

In [68]:
indeed.columns

Index(['title', 'isDS', 'isDA', 'isDE', 'isML', 'isAI', 'isBI', 'isDirector',
       'isHead', 'isLead', 'isManager', 'isSenior', 'isJunior', 'isGraduate',
       'isApprentice', 'company', 'rating', 'search_city', 'location',
       'salary', 'salary_freq', 'isRange', 'salary_range_min',
       'salary_range_max', 'salary_avg', 'salary_avg_USD', 'target_quartered',
       'target_halved', 'age', 'summary'],
      dtype='object')

In [69]:
# DecisionTreeClassifier with new features added to assess improvement (not the BaggingClassifier as I want to 
# be able to easily pull out feature importances):

X_dum = pd.get_dummies(indeed[['company', 'rating', 'search_city']], drop_first=True)
X_dum_title = pd.concat([X_dum,indeed.loc[:,'isDS':'isApprentice']], axis=1)

kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
dt = DecisionTreeClassifier()
dt.fit(X_dum_title, y)
print('Best Training CVScore:', cross_val_score(dt, X_dum_title, y, cv=5).mean())

# shows improvement vs. the previous DecisionTreeClassifier; let's display which features matter most:

dt_feature_importance = pd.DataFrame({'importance':dt.feature_importances_}, index=X_dum_title.columns)
dt_feature_importance.sort_values('importance',ascending=False).T

Best Training CVScore: 0.4361157484696717


Unnamed: 0,isGraduate,isBI,isManager,isSenior,isLead,company_University of Oxford,search_city_Birmingham,isJunior,isDS,isApprentice,isML,isDirector,isDE,company_Harnham,rating_None,rating_3.7,isHead,rating_4.2,search_city_London,search_city_Edinburgh,isDA,search_city_Nottingham,isAI,company_Aspire Data Recruitment,search_city_Cambridge,rating_5.0,rating_3.3,search_city_New+York,company_Metrica Recruitment,rating_4.0,search_city_Reading,rating_4.3,rating_4.1,search_city_Dallas,company_Lead Foot Digital,company_Media IQ Recruitment,company_Source,rating_4.4,company_Tessella Ltd,company_Oxford Brookes University,rating_4.5,search_city_Austin,search_city_San+Francisco,search_city_Phoenix,company_Parallel Consulting,rating_4.9,company_Venturi,company_Linux Recruit,company_University of Nottingham,rating_3.9,company_Oho Group,company_Cortex IT Recruitment,company_Inspiring Interns,rating_3.5,search_city_Los+Angeles,company_STEM Graduates,company_Digital Taxonomy,company_Robertson Sumner,company_Understanding Recruitment,company_Datatech Analytics,company_Client Server,company_Riverbright Recruitment,rating_4.6,search_city_Philadelphia,company_New York City DEPT OF HEALTH/MENTAL HYGIENE,company_Warner Scott,company_Harnham US,company_The Yerba Mate Co.,company_Concept Resourcing,company_Vector Recruitment Limited,company_Consortia,company_ZenShin Talent,company_Mayor's Office of Contract Services,company_ShareForce,search_city_Denver,company_E.ON UK,company_Ampersand Consulting,rating_2.0,company_Northpoint Recruitment,company_Michael Page UK,company_Oscar Technology,company_Imperial College London,company_Fanbank,rating_3.0,search_city_Seattle,company_TalentPool,company_New York City NYC HOUSING AUTHORITY,company_Catch Resource Management Ltd,company_Comcast,company_Talentpoint Jobs,company_Harrington Starr,company_J&C Associates Ltd,company_Taylorollinson,search_city_Pittsburgh,company_Digital Gurus,company_Acrotrend Solutions Limited,search_city_Miami,company_Law 
Business Research,company_RecruitmentRevolution.com,company_System Recruitment,company_Bettor Believe,company_DVF Recruitment,company_Circle Recruitment,company_Oregon Health & Science University,rating_2.7,rating_3.6,rating_3.4,company_New York City TAXI & LIMOUSINE COMMISSION,company_Hasson Associates,rating_2.6,company_Chi Square Analytics,company_Lloyds Banking Group,company_Capita IT Resourcing,company_3Search,company_Aspire,company_Langley James IT Recruitment,company_A Closer Look,company_kdr Recruitment,company_Gwinnett County,rating_1.5,company_InterQuest Group,company_Indeed,company_Jenrick Group,company_Ultimate Asset,company_myfuturerole.com,company_Harrison Holgate,company_Richard Wheeler Associates,company_ECM Selection,company_Proactive.IT Appointments,company_Warwickshire County Council,company_Nonstop World (Tandemworld) Ltd,company_Cancer Research UK,company_GCS Recruitment Specialists Ltd,company_Bangura Solutions,company_The People Network,company_Cititec,company_Velocity Black,search_city_Houston,company_Wade Macdonald,company_Pearson Frank,company_IC Resources,company_NP Group,company_Adzuna,company_Virgin Media,company_Syntax Consultancy Limited,company_Growth Intelligence,company_Migacore Technologies,company_Reuters Events,company_MetroPlus Health Plan,company_IQPC,company_Taylor James Resourcing,company_New York City DEPARTMENT OF INVESTIGATION,company_Media Contacts ltd,company_Rise Technical Recruitment Limited,company_Hinduja Global Solutions,company_Serco Group,company_Deerfoot,company_Moriati,company_Eligo Recruitment Ltd,company_Streetbees,company_Church International Ltd.,company_Catalyst forward,company_Omnicell,company_Yobota,company_Marie Stopes International,rating_4.7,company_Tecknuovo,company_City Pantry,"company_Milshar, LLC",company_SCRRA/Metrolink,company_Admiral Instruments,company_NuView Analytics,company_Lloyd Recruitment Services,company_Los Angeles Homeless Services Authority,company_Morgan McKinley,company_HRIS 
Associates Ltd,company_ADLIB,company_STFC,company_New York City Department of Education .,company_Formative Content,company_Streamhub,company_Brunel University,company_Synaptic Resources Ltd,company_Moorfields Eye Hospital NHS Foundation Trust,rating_2.5,company_The AA,company_IntelliSense Systems Inc.,search_city_Portland,company_Be-IT Resourcing,company_Simfoni Analytics Limited,company_Inspire People,company_United States Army,company_GVC Holdings,company_NYCM,company_McGregor Boyall,company_Give A Grad A Go,company_XenZone,company_EMR Marketing Recruitment,company_New York City MAYORS OFFICE OF CONTRACT SVCS,company_Douglas Jackson,company_Cornwaliis Elt,company_Zelus Analytics,company_Opus Recruitment Solutions,company_Cifas,company_Respect Consulting Group,company_Optimus E2E Limited,company_PCS Global Tech,company_Curo Talent,company_Defined Clarity,company_Newbury Building Society,company_Zarathustra Technologies Ltd.,company_Computer Enterprises,company_GVC Careers,"company_Kellington Protection Service, LLC","company_Indigent Legal Services, Office of",company_karros technologies,company_RedTech Recruitment,company_Synaptek,company_we source group,company_RHONDOS,company_Oakmoor Recruitment,company_iKas International,company_focaldata,company_Ezoic,company_Regan & Dean Recruitment Ltd,company_Mortimer Bell International Ltd,company_New York City FINANCIAL INFO SVCS AGENCY,company_Climate Policy Initiative,company_Simplified Recruitment,company_Franklin Bates,company_Purdue University,company_BICP,company_National Entertainment Network,company_Barna Shields Recruitment,company_Florida Grand Opera,company_All Valley Home Health Care,company_State of Colorado Job Opportunities,company_MASS,company_DISYS,company_BrokerCompare.co (eXp Realty),company_PCS GlobalTech,company_Maricopa Community Colleges,company_Spring,company_Datascope Recruitment,company_University of Birmingham,company_Guy's and St Thomas' NHS Foundation Trust,company_University of Colorado 
Boulder,company_Mudano,company_New York City OFFICE OF EMERGENCY MANAGEMENT,"company_Computer Enterprises, Inc. (CEI)",company_Project Start Recruitment Solutions,company_The Chicago Metropolitan Agency for Planning (CMAP...,company_Vesta Home,company_Zap,company_Airfinity,company_Pplumm,company_Snap Finance Ltd (UK),company_US Department of Transportation,company_Navigator CRE,"company_Neumeister & Associates, LLP",company_KIPP Texas Public Schools,company_Echobox,company_Psixty Recruitment,company_Lawrence Harvey,company_Star Recruitment,company_BiggerPockets Inc.,company_Efficient Frontiers International,company_SF Group,company_AKUVO,company_Initi8 Recruitment,company_Vody,"company_Technology Authority, Georgia - GTA",company_Impact Proteomics,company_New York City DEPARTMENT OF FINANCE,company_ADR Markets,company_NHS Blood and Transplant,company_Evolution Recruitment Solutions,company_Brightwell,company_Conduit Data Services,company_Allen Lane,company_Articulate Group Ltd,company_National Network of Public Health Institutes,company_Chesterfield College,company_Therapy Box,company_Benchmark International,company_Vodafone,company_Revoco Limited,company_Newtons Recruitment,rating_4.8,company_Lidl,company_Butler Rose Ltd,company_Emory School of Medicine,company_Trinamix,company_Operation Warm,company_Flight Centre (UK) Limited,company_Maxx Builders,company_Virtuous Software,company_Advaion,company_Chesterfield Royal Hospital NHS Foundation Trust,company_Monarch,company_Denver Public Schools,company_Precision Distribution Consulting,company_US Department of Justice,company_Percepta,company_The SmartList,company_California creative solutions,company_Adaptive Digital,company_Talentedge,company_RaiseMe,company_AvantStay,"company_City of Dallas, TX",company_Guru Systems,company_Novation Solutions Ltd,company_Onyx Infosoft,company_BusinesStaff,company_Zynx Technologies Limited,company_SPEKTRIX,company_Digital Find Recruitment,company_INOV8 Consulting Ltd,company_CKB 
Recruitment,company_Horizon Air,company_Apex Vision FItness,company_Technnect3 Marketing Operations SL,company_Magic Carpet AI,company_Cambridge Advisory Group,company_Opus Interactive,company_EGIS INC,company_Hytalentech,company_Reckon Digital,company_US Department of Defense,company_MAPSCorps,company_City of Sunny Isles Beach,company_Tractable,company_Pertemps Network Group,company_Public Health England,company_Packback Inc.,company_Doctors Without Borders/Médecins Sans Frontières (...,company_Vichara,company_Moixa Energy Holdings,company_Oviva,company_Niche employment solutions,company_Hawke Media,company_MW Appointments,company_Hewett Recruitment,company_Urban Empire Recruitment,company_Tier1 IT,company_Adams County Colorado,company_Market Vector,company_83zero,company_Georgia Department of Public Health,company_Pareto Law,company_Neudesic,company_Myriad,company_DP Connect,company_UK Government - Ministry of Defence,company_SeeByte,company_Ubiqus,company_Birdie,company_Sentinel,company_iProov,company_Cobalt Recruitment,company_Synovus,company_Asset Resourcing,company_Reqiva,company_Blu Digital,company_CV Locator,company_Premium Credit Limited,company_Teleboom Inc D/B/A Deter24 Monitoring,company_Blue Legal,company_University of Cambridge,company_Oliver James Associates,company_ShortList,company_Stealth Mode Startup,company_Moonshot CVE,company_ANIX Valve USA,company_Mazars,company_Phoenix Optics,company_Empirical Search,company_Daniel Alexander Recruitment,company_STATS,company_Wayhome,company_Covered Insurance Solutions,company_University Hospitals of Leicester NHS Trust,company_Blue Pelican,company_Agency Within,company_WORLDWIDE BUSINESS RESEARCH Ltd,company_Stanton House,company_Ultromics,company_Compassion UK,company_Synchro Recruitment,company_New York City BOARD OF CORRECTION,company_Adatis,company_CatchFish,company_DeVries Global,company_Futureheads Recruitment,company_Royal Automobile Club,company_wa consultants,company_Health Education 
England,company_Click IT Consulting,company_Astroscreen,"company_City of Los Angeles, CA",search_city_Chicago,company_SQ Computer Personnel Limited,company_Zebra People,company_AvA-V,company_Informatiq Consulting,company_SmithBayes,company_Marks Sattin Specialist Recruitment,company_Harcourt Matthews,company_Zencargo,company_RoboTech recruitment,company_New York City Employees’ Retirement System (NYCERS...,company_JFL Search & Selection,company_CauseForce LLC,company_University of California Office of the President,company_Adobe,company_Travis Central Appraisal District,company_Capital One - UK,company_University of the Arts London,company_French Selection,company_Compass Associates,company_Oxford University Press,rating_3.8,company_Ocean Media Group Ltd,company_Capstone Hill Search Limited,company_Eason Group,company_Shackleton Duke,company_Mammoth Media,company_Acute Data Systems,company_Prospective,company_MW Recruitment,company_Clarip,company_Sumner & Scott,company_Real Time Recruitment Solutions,company_Neo Prism Solutions,company_Community College of Denver,company_DHA Housing Solutions for North Texas,"company_Research Foundation for Mental Hygiene, Inc.",company_E-Resourcing,company_Soho House & Co.,company_Revere Digital,company_Pearson Whiffin Recruitment,company_Micro Focus,company_Technics Group,company_Mason Blake,company_Saunders Construction Inc.,company_Lyst,company_ComplyAdvantage,company_PR Futures,company_Excelerate Recruitment Partners,company_Hudson Shribman,company_Worldwide Business Research,company_Klanik Corp,company_IBD Registry,company_Target Public Media,company_HeliosX,company_OpenRent,company_The Margolis Team Inc,company_Head Resourcing Limited,company_Technology Resourcing Ltd,company_Veeve,company_Smash Entertainment,company_Advancing Analytics,company_LiveRamp,company_Energy Assets,company_Catman Jobs,company_Petroineos,company_TECHIRE SOLUTIONS,company_TPIRC/ Southern California Food Allergy Institute,company_Runnymede Borough 
Council,company_Chime Communications Plc,company_Mane Contract Services Limited,company_Talent Crew,company_Explore Group,company_Niantic International Technology Limited,company_JP Engineering Recruitment,company_Loyal Retainers,company_Flux,company_Martin and Conley,company_Kite Group,company_Southern Housing,company_ZealNine,rating_2.2,company_Red10,company_Dome Recruitment,company_Linnk Group Limited,company_Divido,company_Skillsearch Limited,company_U.S. Army,company_Emma Technologies LTD,company_Alscient Limited,company_Secret Intelligence Service,company_DCL Search and Selection,company_BT,company_Corecom Consulting,company_Match Digital,"company_ClearBlade, Inc",company_Hernshead Recruitment,company_RMG Digital,company_Savant Recruitment,...,company_Absolute Appointments LTD,company_Scale Media,company_ATA Recruitment Ltd,company_ARC Group Ltd,company_ANB Systems Inc.,company_Satavia,company_Sanderson Recruitment Plc,company_Salvagnini America,company_A for Appointments,company_Salt Recruitment,company_Alameda Alliance,company_Semta,company_umat,rating_3.1,company_upsgs,rating_2.3,company_SilkFred,company_Anya Consultancy Services Limited,company_Siemens AG,company_Anthony Nolan,company_Showsec,company_West Hertfordshire Hospitals NHS Trust,rating_2.9,company_Anglia Ruskin University,company_Amida Recruitment Limited,company_Alcumus,company_Amazing Prospects Ltd,company_Shenwick Recruitment,company_Alvarez & Marsal,company_Shedul,"company_Aunt Bertha, a Public Benefit Corp.",company_Alluma,company_AllocateRite,company_Allianz,company_Allen Associates,company_Aldi,rating_2.8,company_Vohra Wound Physicians,"company_Weee!, Inc.",company_Truthful Trading Inc,company_Astex Pharmaceuticals,company_Tracer Labs,company_Tracsis Rail Technology & Services,company_TransPerfect,company_Transport for London,company_Travis County,company_Aston University,company_Travtus,company_Tredence,company_Tri-Arc Manufacturing,company_U-Haul,company_US Department of the 
Interior,company_UK Government - Crown Commercial Service,"company_UK Government - Department for Business, Energy &...",company_UK Government - Government Actuary's Department,company_Talent International,"company_UK Government - Ministry of Housing, Communities &...",company_UK Government - Office for National Statistics,company_UK Government - Office of Gas and Electricity Mark...,"company_Takeoffs, Inc.",company_TSB Banking,company_US Department of Health And Human Services,company_TechMate,company_TechNET IT Recruitment Ltd,company_Technet IT Recruitment,company_Touch Surgery,company_The Chat Shop,company_Thames Water Utilites,company_Texas Department of Transportation,company_Texas Comptroller of Public Accounts,company_The Horniman Museum,company_The Horticultural Trades Association,company_The Institute of Cancer Research,"company_Terrace Consulting, Inc.",company_The JM Group,company_The Perk Company Refresh Vending,company_The Pioneer Group,company_The Risk Partners,company_The Rosalind Franklin Institute,company_The Shopworks,company_The University of Pittsburgh,company_TheMathCompany Inc.,company_ThirdEye Labs,company_Thomsons Online Benefits,company_Tim Hortons,company_Tim Hortons UK & Ireland LTD,company_Title21 Health Solutions,company_Saffron Resourcing,company_US Department of the Navy,company_Weee!,company_Vivacity Labs,company_Streamline - Real Estate Development,company_Venatrix,company_Arizona Department of Public Safety,company_Verbio Technologies,company_Strategic Employment Partners,company_Veridium,"company_Vertical Careers, Inc.",company_Vertus Partners,company_Vision Consulting,company_State of Illinois,company_Volt Europe,company_UT Southwestern Medical Center,company_Volume Ltd,company_Volvo Group,company_WISMETTAC ASIAN FOODS,company_Waldo Photos,company_Walker Hamill,company_Archangel Group,company_St John Ambulance Employees,company_Watford General Hospital,company_Wecudos,company_Sproutt,company_Arizona State 
University,company_Vadlo Systems,company_Sure Green,company_VBCare Network,company_THIS IS PRIME LIMITED,company_UTMB,company_United Software Corporation,company_THE TALENT STATE CONSULTANT,company_University Hospitals Birmingham NHS Foundation Tru...,company_TEXAS HIGHER EDUCATION COORDINATING BOARD,company_University Hospitals Coventry and Warwickshire NHS...,company_University at Buffalo,company_University of California Berkeley,company_University of Colorado,company_At-Risk International,company_University of Pittsburgh,company_Synchrony Bank,company_Synchrony,company_University of Westminster,company_Synchro Europe,company_University of Wolverhampton,company_UpNest,company_Utility People Ltd,company_Symmetric Health Solutions LLC,company_V Selective Ltd,company_US Department of the Air Force,company_Bluetownonline Ltd,company_SW6 Associates,company_Eleventh Judicial Circuit of Florida,company_Elite Crowdfunding limted,company_Emma Walsh Talent,company_Enterprise Recruitment Limited,company_Estio Technology,company_Estio Training,company_Eurowagens,company_Everpress,company_Exact Sourcing ltd,company_Executive Recruitment Services,company_Expedia Group,company_Express Recruitment,company_FACEIT,company_FDM Group,company_FINTEC recruit,company_FL Group UK Ltd,company_FTD,company_Fair Recruitment,company_Berkeley Research Group (UK) Ltd,company_Farm-Hand Ltd,company_Eligo Recruitment,company_Elevations Credit Union,company_STEP Strategy Advisors,company_Educational Service District 112,company_Digital Catapult,company_Digital Creative Institute,company_Beyond Outsourcing Inc.,company_Digital Uncut,company_Digitive LLC,company_Dimensions UK Ltd,company_Discourse.ai,company_Distinct Recruitment,company_Diverse Talent Solutions,company_Doris IT,company_Drift Net Securities,company_Dudley and Walsall Mental Health Partnership NHS T...,company_Better Homes and Gardens Real Estate Move Time Rea...,company_EEG Enterprises,company_EOS Deal Advisory,company_ERP 
Maestro,company_EZOPS Inc,company_EarthSense Systems,company_Edmonds Community College,company_Five Guys,company_Flight Centre Travel Group,company_Florida International University,company_Focus Multimedia Limited,company_HM Revenue and Customs,company_HOLLA,company_HamlynWilliams,company_Hanami International,company_Handle Recruitment Ltd,company_Hanson Wade,company_Behaviour Lab,company_Hearst,company_Heavens Recruitment Ltd,company_Hero Labs,company_HireBlazer,"company_Holler Technologies, Inc.",company_HomeSphere,company_Homes England,company_Horniman Museum and Gardens,company_Houndstooth Capital Real Estate,company_Howett Thorpe,company_Hunter and Jones,company_IPROS Insurance Professionals,company_HGS Digital,company_HCML,company_HCL America Inc,company_Gemini,company_Forsyth Barnes,company_FourFront,company_Foxtons,company_Frazer-Nash Consultancy,company_Freshtech IT,company_GDS Group,company_GMAD,company_GatenbySanderson,company_Gemini People,company_Greene Lab,company_Gi Group,company_Gigaclear,company_Global Market Summits – Chancery Lane,company_GlobalData,company_GlobalData PLC,"company_Go, Inc.",company_Grandview Corporation,company_Gravity Technology Solutions,company_Diamond Light Source,company_Detail2Recruitment,company_Derotek,company_Cervest,company_CRS Temporary Housing,company_CRU,company_CSL,company_CV Screen,company_Cactus Search,company_Cadent Gas,company_California State University,company_Cambridge Assessment,company_Capita Plc,company_Carbon60,company_Care UK Healthcare,company_Carlton Recruitment,company_Carrot Pharma Recruitment,company_Catapult Learning,company_Blue Owl,company_Cathcart Associates,company_Catsurveys,company_Cedar Recruitment Limited,company_Celsius Graduate Recruitment,company_CPA Global Limited,company_CGA Strategy Ltd,company_CDG,company_Bryant Associates,company_Boston Consulting Group,"company_Boulder County, CO",company_Brewster Partners,company_Brightflag,company_Brightred,company_British Private Equity and 
Venture Capital Associa...,company_British Rowing,company_Bromford,company_Brytecore,company_CD Sales Recruitment,company_Buckinghamshire Healthcare NHS Trust,company_Burns Sheehan,company_CAMILLUS HEALTH CONCERN,company_CBRE,company_CBSbutler,company_CCFE,company_CCG Associates,company_CCS Global Tech.,company_Center for Employment Opportunities,company_Chase & Holland Recruitment,company_Delaware Valley Regional Planning Commission,company_Childhood Cancer Data Lab,company_Cottonwood Financial,company_County of San Mateo,company_Coventry Building Society,company_Coventry University,company_Create Music Group,company_Creative Personnel,company_Crowd Link Consulting,company_CruiTek,company_Cubex LLC,company_Cundall,company_Cute Resource,company_CybSafe,company_Cyber Tech Company,company_DCA Recruitment,company_DVCanvass,company_Data Kraken Consultancy Ltd,company_Data Ninjas Inc,company_DearDoc,"company_DecisionIQ, Inc.",company_Bitcoin.com,company_Core Tech Recruitment,company_Cordius,company_Clearabee,company_Chryselys,company_Cintas,company_Cirrus Selection,company_Cisco,"company_City of Glendale, AZ",company_City of Hillsboro Oregon,"company_City of Houston, TX","company_City of Mesa, AZ",company_Bliss Point Media,company_Cook County Sheriff’s Office,company_Clinical Professionals,company_Clockwork Recruitment,company_Coburg Banks,company_Blayze Group,company_Community College of Aurora,company_Blackstone & Cullen,company_Consilium Recruit,company_Black Hills Energy,company_IPS Group,company_ITTStar Consulting LLC,company_Imperial College Healthcare NHS Trust,company_Playtech,company_Online Filings,company_Online Mortgage Advisor,company_Openview,company_Optima IT Recruitment,company_Orbit Group,company_Awaken Intelligence Limited,company_Ortus-iHealth,company_Avison Young,company_Outsource UK,company_PGS LTD,company_PIPs Rewards,company_Packt Publishing,company_Avenue Homes,company_Pennsylvania Lumbermens Mutual Insurance 
Company,company_Pentasia,company_People Source Consulting,company_Permanent People,company_Picked.ai,company_Pixel Pond,company_One Housing Group,company_On Track Recruitment & Training Ltd,company_On Track Recruitment,company_Bobtrade,company_New York City DEPT OF PARKS & RECREATION,company_New York City DEPT OF YOUTH & COMM DEV SRVS,company_New York City HOUSING PRESERVATION & DVLPMNT,company_New York City LAW DEPARTMENT,company_Newmedica,company_Nexus Jobs,company_Nexus Jobs Limited,company_Nicholson Glover Consulting,company_Noria Water Technologies,company_B2M Solutions,company_BAE Systems,company_Nucleome Therapeutics Ltd,company_Nuffield Health,company_Nutrisystem,"company_Nuvola Staffing & Solutions, LLC.",company_ONE WORLD EXPRESS,company_OSR Recruitment,company_Octagon Group,"company_Pixelogic Media Partners, LLC",company_Plexus Resource Solutions,company_New York City DEPT OF DESIGN & CONSTRUCTION,company_Port of Portland,company_Rentsys Recovery Services,company_Request Technology,company_Research Foundation of The City University of New...,company_Retail-BCG,company_Retention Science,company_Retrace Labs,company_Rivus Fleet Solutions,company_Robert Half United Kingdom,company_Austin Rose,company_Rocky Mountain Industrials,company_Ross4Marketing,company_Round Rock Independent School District,company_Royal Holloway University of London,company_SDH Systems,company_SELDOC (South East London Doctors Cooperative),company_SERG Technologies,"company_SHELTER, Inc.",company_SMIIT Ltd,company_SOLUTE,company_Remarkable Jobs,company_Regis University,company_Redline Group,company_QinetiQ,company_Prestin Technology Ltd,company_Preventa Medical Corporation,company_ProArch,company_Prodsight,company_Profusion,company_Public Sector Information Ltd,company_Pyramid Healthcare,company_QEH2 Business Intelligence,company_Quantexa,company_Reata Realty,company_Qurious Associates,company_R2M Marketing Solutions,company_RELX Group,company_REPL 
Group,company_RICS,company_RTD,company_Rango Recruitment,company_Real Good foods,company_BMS Performance,company_New York City DEPARTMENT OF TRANSPORTATION,company_Incisive Media,company_MAQ Software,company_L&Q,company_L'Oréal,company_LDN Apprenticeships,company_La Fosse Associates,company_Lambeth Council,company_Language Matters Recruitment Consultants Ltd.,company_Barclay Simpson,company_Leading UK Pension Fund,company_Lee College,company_Legal & General Group Plc.,company_Level Agency,company_LevelPrime Limited,company_Liberty Mutual Insurance,company_Livingstone Technologies,company_London Tri-Borough Councils,company_Long Beach City College,"company_Long Island FQHC, Inc.","company_Longs Peak Advisory Services, LLC",company_Lorien Resourcing,company_Kisaco Research,company_King's College London,company_King County,company_JMC Legal Recruitment,company_Inference Solutions,company_Insights Analytics,company_Barrington James,company_Intellidyne Business Systems,company_InterSTEM Recruitment,company_Interaction Recruitment,company_Intermedia Global Ltd,company_Barran Graduate Recruitment,company_Jackson Rose Ltd,company_Kew Gardens,company_Jaguar Land Rover,company_Jonathan Lee Recruitment Ltd,company_Jumpshot,company_K3 Business Technologies,"company_KIZEN Technologies, Inc.",company_Kairos Recruitment Group,company_Kaplan International,company_Karagozian & Case,company_Los Angeles County Department of Human Resources,company_MBN Recruitment Solutions,company_New York City DEPARTMENT OF BUSINESS SERV.,company_MBN Solutions,company_Microsoft,company_Miriad Products Ltd,company_Monzo,company_Morehouse College,company_Morgan Law,company_4OC,company_Motorway,company_MySense,company_NFP People Limited,company_NFU Mutual,company_NFuzionIT,company_Nathan S. 
Kline Institute,company_National Grid,company_National Insurance Crime Bureau,company_Navagis,company_Nelson Global Products Inc.,company_New Benefits,company_New Business People,company_New York City ADMIN FOR CHILDREN'S SVCS,company_BMW Financial Services (GB) Ltd.,company_Metro Bank PLC,company_BRUIN Financial,company_Marie Curie,company_MIND.AI,company_MacGregor Black,company_Madiba Inc.,company_Manchester Airports Group,company_Mandeville Recruitment Group,company_Mango Solutions,company_Manufacturing Recruitment Ltd,"company_Maricopa County, AZ","company_MarketTrust, Inc.",company_Memorial MRI and Diagnostic LLC,company_Matthew Noah,company_Maveneer,"company_BUSINESS INTEGRA, INC",company_McCann Central,"company_Mechsoft Technology (USA) Co., LLC",company_MediaAndLanguageJobs.co.uk,company_Meet Recruitment,"company_Melax Technologies, Inc.",company_Morson International
importance,0.022467,0.022039,0.020102,0.0168,0.015447,0.013669,0.013502,0.013192,0.012535,0.012471,0.012449,0.012214,0.011904,0.011505,0.011391,0.01047,0.010392,0.010305,0.009684,0.009371,0.009032,0.008844,0.008396,0.008047,0.007741,0.007712,0.00726,0.007108,0.006837,0.006809,0.00648,0.006297,0.006223,0.005777,0.005292,0.005102,0.005071,0.005059,0.005047,0.0049,0.004818,0.004731,0.004674,0.004444,0.004434,0.004313,0.004244,0.004242,0.004119,0.003963,0.003915,0.003713,0.003562,0.00345,0.003382,0.003338,0.003306,0.003213,0.003191,0.003183,0.003139,0.003039,0.002948,0.002928,0.002891,0.002841,0.002789,0.002782,0.00276,0.002747,0.002714,0.002713,0.002679,0.002631,0.0026,0.002599,0.002579,0.002506,0.002457,0.002457,0.002425,0.002417,0.002277,0.00225,0.002244,0.002232,0.002232,0.002225,0.002209,0.002207,0.002205,0.002198,0.002181,0.002174,0.00217,0.002169,0.002162,0.00214,0.002111,0.00202,0.002014,0.002013,0.001988,0.001974,0.001974,0.001968,0.001919,0.001913,0.001901,0.0019,0.001897,0.001879,0.001826,0.001825,0.00182,0.00182,0.001743,0.001722,0.001703,0.001669,0.001645,0.001636,0.001613,0.001606,0.001601,0.001595,0.001587,0.001576,0.001514,0.001503,0.001502,0.00149,0.001474,0.001462,0.001452,0.001449,0.001442,0.001438,0.001427,0.001426,0.001421,0.001409,0.001396,0.001392,0.001382,0.001375,0.001368,0.001366,0.001363,0.001361,0.00136,0.001359,0.001347,0.001344,0.001343,0.001333,0.001331,0.00132,0.001318,0.001316,0.001315,0.001312,0.001309,0.001308,0.001308,0.0013,0.001295,0.001294,0.00128,0.001277,0.001275,0.00127,0.001251,0.001251,0.001247,0.001229,0.001229,0.001228,0.001227,0.001225,0.001222,0.001221,0.001216,0.001211,0.001208,0.001208,0.001207,0.001201,0.001198,0.001187,0.001182,0.001181,0.00117,0.001166,0.001161,0.001155,0.001149,0.001108,0.001107,0.001106,0.001105,0.001096,0.00108,0.001076,0.001073,0.001069,0.001069,0.001066,0.001056,0.001051,0.001038,0.001032,0.001027,0.001025,0.001024,0.001008,0.000995,0.000993,0.000986,0.000985,0.000978,0.000975,0.000961,0.00096,0.
000951,0.00095,0.000948,0.000946,0.000945,0.000944,0.000939,0.000937,0.000932,0.000927,0.000925,0.000925,0.000924,0.000921,0.000918,0.00091,0.000909,0.000908,0.000908,0.000906,0.0009,0.000897,0.000894,0.000886,0.000883,0.000881,0.00088,0.000879,0.000877,0.000876,0.000876,0.000874,0.000874,0.00087,0.000865,0.000865,0.000865,0.000863,0.000858,0.000856,0.000856,0.000855,0.000853,0.000853,0.000852,0.000851,0.00085,0.000847,0.000847,0.000845,0.000844,0.000841,0.000841,0.000841,0.000841,0.00084,0.000838,0.000838,0.000835,0.000835,0.000835,0.000832,0.000832,0.00083,0.000829,0.000827,0.000827,0.000827,0.000821,0.000819,0.000819,0.000817,0.000815,0.000815,0.000812,0.000809,0.000809,0.000807,0.000807,0.000807,0.000807,0.000807,0.000807,0.000807,0.000805,0.0008,0.0008,0.000799,0.000799,0.000799,0.000797,0.000795,0.000792,0.00079,0.000789,0.000789,0.000785,0.000783,0.000783,0.000783,0.000779,0.000779,0.000777,0.000776,0.000776,0.000775,0.000774,0.000774,0.000773,0.00077,0.000768,0.000767,0.000765,0.000762,0.00076,0.000759,0.000757,0.000757,0.000757,0.000756,0.000754,0.000754,0.000753,0.000753,0.000752,0.000752,0.000752,0.000751,0.000751,0.00075,0.000744,0.000744,0.000743,0.000742,0.000742,0.000738,0.000736,0.000733,0.000733,0.000732,0.000731,0.00073,0.00073,0.00073,0.000729,0.000729,0.000728,0.000724,0.000724,0.000724,0.000721,0.00072,0.000719,0.000717,0.000716,0.000715,0.000714,0.000711,0.000711,0.000709,0.000709,0.000707,0.000705,0.000701,0.000701,0.0007,0.000699,0.000699,0.000697,0.000697,0.000696,0.000696,0.000696,0.000696,0.000692,0.000689,0.000688,0.000687,0.000686,0.000684,0.00068,0.000678,0.000677,0.000676,0.000676,0.000675,0.000674,0.000674,0.000674,0.000673,0.000673,0.000673,0.000673,0.000673,0.000673,0.000673,0.000672,0.000672,0.000669,0.000668,0.000668,0.000663,0.000661,0.00066,0.000658,0.000656,0.000654,0.000652,0.00065,0.000649,0.000649,0.000649,0.000647,0.000647,0.000647,0.000647,0.000647,0.000646,0.000642,0.00064,0.00064,0.000639,0.000639,0.000636,0.000633,0.000
633,0.000631,0.00063,0.00063,0.000627,0.000627,0.000626,0.000626,0.000622,0.000622,0.00062,0.000616,0.000613,0.000609,0.000607,0.000606,0.000606,0.000606,0.000606,0.000605,0.000603,0.000603,0.000602,0.000601,0.000599,0.000597,0.000596,0.000592,0.00059,0.000588,0.000586,0.000583,0.000581,0.00058,0.000579,0.000578,0.000577,0.000577,0.000575,0.000572,0.000572,0.000571,0.00057,0.000562,0.000562,0.00056,0.00056,0.00056,0.000559,0.000557,0.000557,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [70]:
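# Incorporate other text features from the title or summary that you believe will predict the salary.
# Do they add any value?

# TF-IDF on the job summary: alphabetic unigrams only, English stop words removed, 500 most frequent terms kept.
# Note: the vectorizer is fitted on all rows before the train/test split, so its vocabulary and IDF weights also see the test rows.
tvec = TfidfVectorizer(stop_words='english', token_pattern='[A-Za-z]+', ngram_range=(1,1), max_features=500)
tvec.fit(indeed['summary'])

# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() is its replacement (available from 1.0).
tvec_summary = pd.DataFrame(tvec.transform(indeed['summary']).toarray(), columns=tvec.get_feature_names_out())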

X_summary = pd.concat([X_dum_title,tvec_summary], axis=1)

X_train, X_test, y_train, y_test = train_test_split(X_summary, y, stratify=y, random_state=1)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

dt = DecisionTreeClassifier()
dt.fit(X_train, y_train)
print('Training Score:', dt.score(X_train, y_train))
print('Cross Val Score:', cross_val_score(dt, X_train, y_train, cv=5).mean())
print('Test Score:', dt.score(X_test, y_test))

predictions = dt.predict(X_test)
print('Confusion Matrix:')
print(confusion_matrix(y_test, predictions))
print(' ')
print('Classification Report:')
print(classification_report(y_test, predictions))

dt_feature_importance = pd.DataFrame({'importance':dt.feature_importances_}, index=X_summary.columns)
dt_feature_importance.sort_values('importance',ascending=False).T

# The training score (~1.00) is far above the cross-validation score (~0.45) and the test score (~0.48),
# so the unconstrained tree has overfitted the training data. To regularize it, we need a grid search over
# the decision tree's hyperparameters, or a bagging / boosting classifier.

Training Score: 0.9959982214317474
Cross Val Score: 0.45352635486265774
Test Score: 0.4786666666666667
Confusion Matrix:
[[117  40  15  17]
 [ 29  83  51  29]
 [ 22  48  71  45]
 [ 12  31  52  88]]
 
Classification Report:
              precision    recall  f1-score   support

        0_25       0.65      0.62      0.63       189
       25_50       0.41      0.43      0.42       192
       50_75       0.38      0.38      0.38       186
      75_100       0.49      0.48      0.49       183

    accuracy                           0.48       750
   macro avg       0.48      0.48      0.48       750
weighted avg       0.48      0.48      0.48       750
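
The gap between the training score and the cross-validation score above points to a tree that has grown without limits. As a minimal sketch of the grid search suggested in the cell comment, the block below tunes a decision tree on synthetic four-class data. The data (`X_demo`, `y_demo`), the parameter values in `param_grid`, and the `random_state` are all assumptions for illustration; none of them come from the scraped Indeed data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the scraped features (hypothetical data, not the Indeed set):
# four classes to mirror the four salary quartile bins.
X_demo, y_demo = make_classification(
    n_samples=600, n_features=40, n_informative=8,
    n_classes=4, n_clusters_per_class=1, random_state=1,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, stratify=y_demo, random_state=1
)

# Candidate settings that limit tree growth (values chosen for illustration only).
param_grid = {
    "max_depth": [3, 5, 10, None],
    "min_samples_leaf": [1, 5, 20],
    "ccp_alpha": [0.0, 0.001, 0.01],
}

# 5-fold cross-validated search; the best setting is refit on the full training split.
grid = GridSearchCV(DecisionTreeClassifier(random_state=1), param_grid, cv=5)
grid.fit(X_tr, y_tr)

print("Best params:", grid.best_params_)
print("Best CV score:", grid.best_score_)
print("Training score:", grid.score(X_tr, y_tr))
print("Test score:", grid.score(X_te, y_te))
```

With the notebook's own data, the same pattern would be applied to `X_train` and `y_train`. A smaller gap between the training score and the best cross-validation score would suggest the constraints are reducing the overfitting.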



| feature | importance |
|---|---|
| data | 0.035806 |
| analyst | 0.027501 |
| experience | 0.019975 |
| machine | 0.018469 |
| isGraduate | 0.018092 |
| engineer | 0.015749 |
| company_University of Oxford | 0.014825 |
| analysis | 0.013399 |
| isSenior | 0.012858 |
| intelligence | 0.012536 |
| rating_None | 0.012513 |
| isDirector | 0.012358 |
| business | 0.011485 |
| isDE | 0.011416 |
| team | 0.011387 |
| analytics | 0.01075 |
| search_city_Cambridge | 0.010699 |
| isHead | 0.009708 |
| isLead | 0.009689 |
| artificial | 0.009633 |
| ... | ... |

The output is a single `importance` row sorted in descending order; only the 20 largest values are listed here. The remaining columns (further text tokens and the `company_*`, `rating_*`, and `search_city_*` dummies) have progressively smaller importances, and the tail of the table, dominated by `company_*` dummy columns, has importance 0.0.
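
A one-row table with this many columns is hard to scan. The following is a minimal sketch, not the notebook's own code, of how such a wide feature-importance frame could be reshaped into a short ranked list with pandas. The function name `top_features` and the toy frame `wide` are hypothetical; the toy values are copied from the first four columns of the output above.

```python
import pandas as pd


def top_features(wide: pd.DataFrame, n: int = 20) -> pd.Series:
    """Return the n largest importances from a one-row, wide DataFrame.

    Assumes the frame has a single row labelled 'importance' and one
    column per feature, as in the output above.
    """
    # Pull out the single row as a Series indexed by feature name.
    row = wide.loc["importance"]
    # Sort descending and keep only the n largest entries.
    return row.sort_values(ascending=False).head(n)


# Toy frame (hypothetical) built from the first few values shown above.
wide = pd.DataFrame(
    [[0.035806, 0.027501, 0.019975, 0.018469]],
    index=["importance"],
    columns=["data", "analyst", "experience", "machine"],
)

top = top_features(wide, n=3)
print(top)
```

Printing a ranked Series like this keeps the informative features visible and avoids displaying the long tail of zero-importance dummy columns.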


In [71]:
X_summary.head()

Unnamed: 0,company_3Search,company_4OC,company_83zero,company_A Closer Look,company_A for Appointments,company_ADLIB,company_ADR Markets,company_AKUVO,company_ANB Systems Inc.,company_ANIX Valve USA,company_ARC Group Ltd,company_ATA Recruitment Ltd,company_AXA UK,company_Absolute Appointments LTD,company_Accountancy Action,company_Acrotrend Solutions Limited,company_Acute Data Systems,company_Adams County Colorado,company_Adaptive Digital,company_Adaptive Tech,company_Adatis,company_Additional Resources,company_Admiral Instruments,company_Adobe,company_Adthena,company_Advaion,company_Advancing Analytics,company_Advento Staffing,company_Adzuna,company_Agency Within,company_Agility Recovery,company_Agility Resourcing,company_Aiimi Ltd,company_Air Products,company_Airfinity,company_AkzoNobel,company_Alameda Alliance,company_Alcumus,company_Aldi,company_All Valley Home Health Care,company_Allen Associates,company_Allen Lane,company_Allianz,company_AllocateRite,company_Alluma,company_Alscient Limited,company_Alvarez & Marsal,company_Amazing Prospects Ltd,company_Amida Recruitment Limited,company_Ampersand Consulting,company_Anglia Ruskin University,company_Anthony Nolan,company_Anya Consultancy Services Limited,company_Apex Vision FItness,company_Aptonet Inc,company_Archangel Group,company_Arizona Department of Public Safety,company_Arizona State University,company_Articulate Group Ltd,company_Artos Systems,company_Aspire,company_Aspire Data Recruitment,company_Asset Resourcing,company_Association for Mental Health & Wellness,company_Astex Pharmaceuticals,company_Aston University,company_Astroscreen,company_At-Risk International,company_Atkins,"company_Aunt Bertha, a Public Benefit Corp.",company_Austin Fraser,company_Austin Rose,company_AvA-V,company_AvantStay,company_Avenue Homes,company_Avios Group,company_Avison Young,company_Awaken Intelligence Limited,company_B2M Solutions,company_BAE Systems,company_BERRICLE,company_BICP,company_BMS Performance,company_BMW 
Financial Services (GB) Ltd.,company_BRUIN Financial,company_BT,"company_BUSINESS INTEGRA, INC",company_Babylon Health,company_Bangura Solutions,company_Barclay Simpson,company_Barna Shields Recruitment,company_Barran Graduate Recruitment,company_Barrington James,company_Be-IT Resourcing,company_Behaviour Lab,company_Benchmark International,company_BenevolentAI,company_Berkeley HR,company_Berkeley Research Group (UK) Ltd,company_Better Homes and Gardens Real Estate Move Time Rea...,company_Bettor Believe,company_Beyond Outsourcing Inc.,company_BiSoft,company_BiggerPockets Inc.,company_Birdie,company_Bitcoin.com,company_Black Hills Energy,company_Blackstone & Cullen,company_Blayze Group,company_Bliss Point Media,company_Blu Digital,company_Blue Legal,company_Blue Owl,company_Blue Pelican,company_Bluetownonline Ltd,company_Bobtrade,company_Bond Williams,company_Boston Consulting Group,"company_Boulder County, CO",company_Brewer Direct Inc.,company_Brewster Partners,company_Brightflag,company_Brightred,company_Brightwell,company_British Private Equity and Venture Capital Associa...,company_British Rowing,company_BrokerCompare.co (eXp Realty),company_Bromford,company_Brunel University,company_Bryant Associates,company_Brytecore,company_Buckinghamshire Healthcare NHS Trust,company_Bulletproof,company_Burns Sheehan,company_BusinesStaff,company_Butler Rose Ltd,company_CAMILLUS HEALTH CONCERN,company_CBRE,company_CBSbutler,company_CCFE,company_CCG Associates,company_CCS Global Tech.,company_CD Sales Recruitment,company_CDG,company_CGA Strategy Ltd,company_CKB Recruitment,company_CPA Global Limited,company_CRS Temporary Housing,company_CRU,company_CSL,company_CV Locator,company_CV Screen,company_CYTED Ltd,company_Cactus Search,company_Cadent Gas,company_California State University,company_California creative solutions,company_Cambridge Advisory Group,company_Cambridge Assessment,company_Cancer Research UK,company_Capita IT Resourcing,company_Capita Plc,company_Capital One - 
UK,company_Capstone Hill Search Limited,company_Carbon60,company_Care UK Healthcare,company_Carlton Recruitment,company_Carrot Pharma Recruitment,company_Catalyst forward,company_Catapult Learning,company_Catch Resource Management Ltd,company_CatchFish,company_Cathcart Associates,company_Catman Jobs,company_Catsurveys,company_CauseForce LLC,company_Cedar Recruitment Limited,company_Celsius Graduate Recruitment,company_Center for Employment Opportunities,company_Cervest,company_Chase & Holland Recruitment,company_Chesterfield College,company_Chesterfield Royal Hospital NHS Foundation Trust,company_Chi Square Analytics,company_Childhood Cancer Data Lab,company_Chime Communications Plc,company_Chryselys,company_Church International Ltd.,company_Cifas,company_Cintas,company_Circle Recruitment,company_Cirrus Selection,company_Cisco,company_Cititec,company_City Pantry,"company_City of Dallas, TX","company_City of Glendale, AZ",company_City of Hillsboro Oregon,"company_City of Houston, TX","company_City of Los Angeles, CA","company_City of Mesa, AZ",company_City of Sunny Isles Beach,company_Clarip,"company_ClearBlade, Inc",company_Clearabee,company_Click IT Consulting,company_Client Server,company_Climate Policy Initiative,company_Clinical Professionals,company_Clockwork Recruitment,company_Cobalt Recruitment,company_Coburg Banks,company_Comcast,company_Community College of Aurora,company_Community College of Denver,company_Compass Associates,company_Compassion UK,company_ComplyAdvantage,company_Computer Enterprises,"company_Computer Enterprises, Inc. 
(CEI)",company_Concept Resourcing,company_Conduit Data Services,company_Consilium Recruit,company_Consortia,company_Context Recruitment,company_Cook County Sheriff’s Office,company_Cordius,company_Core Tech Recruitment,company_Corecom Consulting,company_Cornwaliis Elt,company_Cortex IT Recruitment,company_Cottonwood Financial,company_County of San Mateo,company_Coventry Building Society,company_Coventry University,company_Covered Insurance Solutions,company_Create Music Group,company_Creative Personnel,company_Crowd Link Consulting,company_CruiTek,company_Cubex LLC,company_Cundall,company_Curo Talent,company_Cute Resource,company_CybSafe,company_Cyber Tech Company,company_DCA Recruitment,company_DCL Search and Selection,company_DHA Housing Solutions for North Texas,company_DISYS,company_DM People Recruitment Consultants,company_DP Connect,company_DVCanvass,company_DVF Recruitment,company_Daniel Alexander Recruitment,company_Data Kraken Consultancy Ltd,company_Data Ninjas Inc,"company_Data Warehouse Consultants, LLC",company_Datascope Recruitment,company_Datatech Analytics,company_DeVries Global,company_DearDoc,"company_DecisionIQ, Inc.",company_Deerfoot,company_Defined Clarity,company_Degree Analytics,company_Delaware Valley Regional Planning Commission,company_Denver Public Schools,company_Derotek,company_Detail2Recruitment,company_Diamond Light Source,company_Digital Catapult,company_Digital Creative Institute,company_Digital Find Recruitment,company_Digital Gurus,company_Digital Taxonomy,company_Digital Uncut,company_Digitive LLC,company_Dimensions UK Ltd,company_Discourse.ai,company_Distinct Recruitment,company_Diverse Talent Solutions,company_Divido,company_Doctors Without Borders/Médecins Sans Frontières (...,company_Dome Recruitment,company_Doris IT,company_Douglas Jackson,company_Drift Net Securities,company_Dudley and Walsall Mental Health Partnership NHS T...,company_E-Resourcing,company_E.ON UK,company_ECM Selection,company_EEG Enterprises,company_EGIS 
INC,company_EMR Marketing Recruitment,company_EOS Deal Advisory,company_ERP Maestro,company_EZOPS Inc,company_EarthSense Systems,company_Eason Group,company_Echobox,company_Eden Brown,company_Edmonds Community College,company_Educational Service District 112,company_Efficient Frontiers International,company_Elevations Credit Union,company_Eleventh Judicial Circuit of Florida,company_Eligo Recruitment,company_Eligo Recruitment Ltd,company_Elite Crowdfunding limted,company_Emma Technologies LTD,company_Emma Walsh Talent,company_Emory School of Medicine,company_Empirical Search,company_Energy Assets,company_Enterprise Recruitment Limited,company_Estio Technology,company_Estio Training,company_Eurowagens,company_Evermore Global,company_Everpress,company_Evolution Recruitment Solutions,company_Exact Sourcing ltd,company_Excelerate Recruitment Partners,company_Executive Recruitment Services,company_Expedia Group,company_Explore Group,company_Express Recruitment,company_Ezoic,company_FACEIT,company_FDM Group,company_FINTEC recruit,company_FL Group UK Ltd,company_FTD,company_Fair Recruitment,company_Fanbank,company_Farm-Hand Ltd,company_Five Guys,company_Flight Centre (UK) Limited,company_Flight Centre Travel Group,company_Florida Grand Opera,company_Florida International University,company_Flux,company_Focus Multimedia Limited,company_Formative Content,company_Forsyth Barnes,company_Foster Denovo,company_FourFront,company_Foxtons,company_Franklin Bates,company_Frazer-Nash Consultancy,company_French Selection,company_Freshtech IT,company_Futureheads Recruitment,company_G-Research,company_GCS Recruitment Specialists Ltd,company_GDS Group,company_GMAD,company_GVC Careers,company_GVC Holdings,company_GatenbySanderson,company_Gemini,company_Gemini People,company_Georgia Department of Public Health,company_Gi Group,company_Gigaclear,company_Give A Grad A Go,company_Global Accounting Network,company_Global Market Summits – Chancery Lane,company_Global Risk Partners 
Group,company_GlobalData,company_GlobalData PLC,company_Globizz Corp.,"company_Go, Inc.",company_Grandview Corporation,company_Gravity Technology Solutions,company_Greene Lab,company_Greenwell Gleeson,company_Growth Intelligence,company_Guru,company_Guru Systems,company_Guy's and St Thomas' NHS Foundation Trust,company_Gwinnett County,company_HCL America Inc,company_HCML,company_HGS Digital,company_HM Revenue and Customs,company_HOLLA,company_HRIS Associates Ltd,company_HamlynWilliams,company_Hanami International,company_Handle Recruitment Ltd,company_Hanson Wade,company_Harcourt Matthews,company_Harnham,company_Harnham US,company_Harrington Starr,company_Harrison Holgate,company_Harvey Thomas,company_Hasson Associates,company_Hawke Media,company_Head Resourcing Limited,company_Health Education England,company_Hearst,company_Heavens Recruitment Ltd,company_HeliosX,company_Henderson Drake,company_Hernshead Recruitment,company_Hero Labs,company_Hewett Recruitment,company_Hinduja Global Solutions,company_HireBlazer,"company_Holler Technologies, Inc.",company_HomeSphere,company_Homes England,company_Horizon Air,company_Horniman Museum and Gardens,company_Houndstooth Capital Real Estate,company_Howett Thorpe,company_Hudson Shribman,company_Hunter and Jones,company_Huntress,company_Hylink Digital Solution Limited,company_Hytalentech,company_IBD Registry,company_IC Resources,company_INOV8 Consulting Ltd,company_IPROS Insurance Professionals,company_IPS Group,company_IQPC,company_ITTStar Consulting LLC,company_Impact Proteomics,company_Imperial College Healthcare NHS Trust,company_Imperial College London,company_Incisive Media,company_Indeed,"company_Indigent Legal Services, Office of",company_Inference Solutions,company_Informatiq Consulting,company_Initi8 Recruitment,company_Insights Analytics,company_Inspire People,company_Inspiring Interns,company_IntelliSense Systems Inc.,company_IntelliSense.io,company_Intellidyne Business Systems,company_InterQuest 
Group,company_InterSTEM Recruitment,company_Interaction Recruitment,company_Intermedia Global Ltd,company_J&C Associates Ltd,company_JDX Consulting,company_JFL Search & Selection,company_JMC Legal Recruitment,company_JP Engineering Recruitment,company_JPE,company_Jackson Rose Ltd,company_Jaguar Land Rover,company_Jenrick Group,company_Jobs in Letting,company_Jonathan Lee Recruitment Ltd,company_Jumpshot,company_K3 Business Technologies,company_KIPP Texas Public Schools,"company_KIZEN Technologies, Inc.",company_Kairos Recruitment Group,company_Kaplan International,company_Karagozian & Case,"company_Kellington Protection Service, LLC",company_Kew Gardens,company_King County,company_King's College London,company_Kisaco Research,company_Kite Group,company_Klanik Corp,company_L&Q,company_L'Oréal,company_LDN Apprenticeships,company_La Fosse Associates,company_Lambeth Council,company_Langley James IT Recruitment,company_Language Matters Recruitment Consultants Ltd.,company_Law Business Research,company_Lawrence Harvey,company_Lead Foot Digital,company_Leading UK Pension Fund,company_Lee College,company_Legal & General Group Plc.,company_Level Agency,company_LevelPrime Limited,company_Liberty Mutual Insurance,company_Lidl,company_Linnk Group Limited,company_Linux Recruit,company_LiveRamp,company_Livingstone Technologies,company_Lloyd Recruitment Services,company_Lloyds Banking Group,company_London School of Hygiene & Tropical Medicine,company_London Tri-Borough 
Councils,...,ability,able,access,account,accurate,activities,additional,administration,administrative,advanced,agency,agile,ai,algorithms,allows,amounts,analyse,analysing,analysis,analyst,analysts,analytical,analytics,analyze,analyzing,application,applications,applied,apply,architect,architecture,areas,artificial,assist,associate,automation,available,awarded,aws,azure,b,bachelor,background,based,benefits,best,bi,big,bonus,brand,build,building,business,c,candidate,candidates,career,cavity,central,client,clients,closely,closing,cloud,collaboration,collection,colleges,commercial,communication,company,competitive,completion,complex,computer,computing,considered,consultancy,consultant,consulting,content,contributions,control,core,corporate,course,cover,covers,create,creating,crm,cross,current,currently,customer,customers,cutting,cyber,d,data,database,databases,date,day,decisions,deep,degree,degreedegree,degreepostsecondary,deliver,delivering,delivery,dental,department,design,designing,develop,developed,developer,developers,developing,development,devops,different,digital,director,disabilities,disability,doctoral,drive,driven,dynamic,e,edge,education,effectively,employee,employees,employer,employment,encouraged,end,engineer,engineering,engineers,ensure,ensuring,enterprise,environment,equivalent,essential,established,etl,events,excel,excellent,exchanging,exciting,executive,existing,expanding,expenses,experience,experienced,experienceexperience,expert,expertise,exposure,extensive,external,eye,fast,field,finance,financial,focus,focused,following,framework,friendlyindividuals,function,future,g,gain,gathering,global,good,google,graduate,great,group,growing,growth,hadoop,hands,head,health,healthcare,held,help,high,highly,hire,ideal,ideally,identify,identifying,implement,implementation,implementing,improve,include,includes,including,independent,individual,individuals,industry,information,infrastructure,innovative,insight,insights,institute,insurance,insurancea,insurancehealth,ins
ured,integration,intelligence,interacting,interested,internal,international,interpret,interpreted,involved,java,javascript,job,join,junior,k,key,knowledge,language,languages,large,latest,lead,leading,learn,learning,level,life,like,limited,linux,ll,local,location,london,looking,machine,mainly,maintain,maintaining,make,making,manage,management,manager,managing,market,marketing,master,mathematics,media,medical,meet,methods,microsoft,mining,ml,mobile,modeling,modelling,models,monday,monitoring,motivated,multiple,need,needed,needs,network,new,non,offer,office,offpaid,online,open,operates,operational,operations,opportunities,opportunity,order,organisation,organization,package,paid,passionate,patterns,perform,performance,permanent,person,phd,pipelines,plan,planning,platform,platforms,plus,portfolio,position,post,power,practice,practices,predictive,preferred,previous,problem,problems,process,processes,processing,product,production,products,professional,program,programming,project,projects,proven,provide,provider,provides,providing,public,python,pythonan,quality,quantitative,r,range,rapidly,real,reasoning,record,recruiting,recruitment,related,relational,relationships,relevant,report,reporting,reports,require,required,requirements,requires,research,researcher,responsibilities,responsibility,responsible,results,retirement,role,roles,s,salary,sales,savings,scala,scale,schedule,school,science,scientist,scientists,search,sector,security,seeking,senior,serve,server,service,services,sets,similar,skill,skills,skillsexperience,small,social,software,solutions,solving,source,sources,spark,specialist,specialized,specifically,sponsored,sql,sqlprogramming,ssis,ssrs,stack,stakeholders,start,state,statistical,statistics,storage,strategic,strategy,strong,structured,structures,students,study,subject,successful,support,supporting,systems,t,tableau,talented,tasks,tax,team,teams,tech,technical,techniques,technologies,technology,test,testing,time,timean,title,tools,track,training,transformation,t
ravel,trends,type,typically,uk,undergraduate,understand,understanding,universities,use,used,user,users,uses,using,value,variety,various,venturi,verbally,vision,visualisation,visualization,warehouse,warehousing,web,whilst,wide,work,working,world,writing,year,years
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0.261527,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.498139,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.22449,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.274611,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.309022,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.274611,0.22818,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.141085,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.269896,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.286668,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.257775,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.305463,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.477894,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
.0,0.0,0.0,0.0,0.0,0.0,0.0,0.594152,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.646994,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.45996,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.160873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.329466,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.334718,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.319501,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.300982,0.53485,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.251471
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.324951,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.266149,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.342657,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.190641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2065,0.0,0.0,0.0,0.0,0.0,0.350304,0.0,0.0,0.0,0.0,0.208574,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.434188,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.338247,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.311848,0.0,0.0,0.246529,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.47932,0.0,0.0,0.0,0.0,0.0,0.0,0.470353,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.510229,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.537296,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [72]:
# Using different models on my dataset:

models = [BaggingClassifier(n_estimators=100),
          RandomForestClassifier(n_estimators=100),
          GradientBoostingClassifier(n_estimators=100,
                                     random_state=1,
                                     validation_fraction=0.1,
                                     max_depth=3,
                                     n_iter_no_change=20)]
params = [{'max_features': np.linspace(0.3,0.4,3)},
          {'max_depth': list(range(2,20,5)),
           'min_samples_split': np.linspace(0.3,0.4,3)},
          {'learning_rate': np.linspace(.1,1.,4)}]

# Tune each model with its own parameter grid and report test-set performance.
for model, param_grid in zip(models, params):
    gs = GridSearchCV(model, param_grid, n_jobs=2, cv=kf, verbose=1)
    gs.fit(X_train, y_train)
    print('Best params:', gs.best_params_)
    print('Cross Val Score:', gs.best_score_)
    predictions = gs.predict(X_test)
    print('Confusion Matrix:')
    print(' ')
    print(confusion_matrix(y_test, predictions))
    print('Classification Report:')
    print(' ')
    print(classification_report(y_test, predictions))

Fitting 5 folds for each of 3 candidates, totalling 15 fits


[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  15 out of  15 | elapsed:  1.5min finished


Best params: {'max_features': 0.35}
Cross Val Score: 0.5282246968572135
Confusion Matrix:
 
[[137  32  13   7]
 [ 40  82  43  27]
 [ 27  40  74  45]
 [  8  20  38 117]]
Classification Report:
 
              precision    recall  f1-score   support

        0_25       0.65      0.72      0.68       189
       25_50       0.47      0.43      0.45       192
       50_75       0.44      0.40      0.42       186
      75_100       0.60      0.64      0.62       183

    accuracy                           0.55       750
   macro avg       0.54      0.55      0.54       750
weighted avg       0.54      0.55      0.54       750

Fitting 5 folds for each of 12 candidates, totalling 60 fits


[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:   18.3s
[Parallel(n_jobs=2)]: Done  60 out of  60 | elapsed:   26.1s finished


Best params: {'max_depth': 17, 'min_samples_split': 0.4}
Cross Val Score: 0.4784370205394704
Confusion Matrix:
 
[[124  43  11  11]
 [ 51  68  26  47]
 [ 26  42  47  71]
 [ 10  24  26 123]]
Classification Report:
 
              precision    recall  f1-score   support

        0_25       0.59      0.66      0.62       189
       25_50       0.38      0.35      0.37       192
       50_75       0.43      0.25      0.32       186
      75_100       0.49      0.67      0.57       183

    accuracy                           0.48       750
   macro avg       0.47      0.48      0.47       750
weighted avg       0.47      0.48      0.47       750

Fitting 5 folds for each of 4 candidates, totalling 20 fits


[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  20 out of  20 | elapsed: 27.0min finished


KeyboardInterrupt: 

In [None]:
# Using different models on my dataset:

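# Switch to the halved (two-class) target. The train/test split has to be redone,
# otherwise the models below are still fitted on the previous y_train.
# Assumption: the feature matrix is named `X`; the split settings shown here are
# placeholders and should match the ones used for the earlier split.
from sklearn.model_selection import train_test_split

y = indeed['target_halved']
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=1)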

models = [BaggingClassifier(n_estimators=100),
          RandomForestClassifier(n_estimators=100),
          GradientBoostingClassifier(n_estimators=100,
                                     random_state=1,
                                     validation_fraction=0.1,
                                     max_depth=3,
                                     n_iter_no_change=20)]
params = [{'max_features': np.linspace(0.3,0.4,3)},
          {'max_depth': list(range(2,20,5)),
           'min_samples_split': np.linspace(0.3,0.4,3)},
          {'learning_rate': np.linspace(.1,1.,4)}]

# Tune each model with its own parameter grid and report test-set performance.
for model, param_grid in zip(models, params):
    gs = GridSearchCV(model, param_grid, n_jobs=2, cv=kf, verbose=1)
    gs.fit(X_train, y_train)
    print('Best params:', gs.best_params_)
    print('Cross Val Score:', gs.best_score_)
    predictions = gs.predict(X_test)
    print('Confusion Matrix:')
    print(' ')
    print(confusion_matrix(y_test, predictions))
    print('Classification Report:')
    print(' ')
    print(classification_report(y_test, predictions))

In [None]:
# from sklearn.metrics import roc_curve, auc, precision_recall_curve, average_precision_score

# # For class 1, find the area under the curve
# fpr, tpr, threshold = roc_curve(y_train, Y_pp.class_1_pp)
# roc_auc = auc(fpr, tpr)

# # Plot of a ROC curve for class 1
# plt.figure(figsize=[6, 6])
# plt.plot(fpr, tpr, label='ROC curve (area = %0.2f)' % roc_auc)
# plt.plot([0, 1], [0, 1], 'k--')
# plt.xlim([-0.05, 1.0])
# plt.ylim([-0.05, 1.05])
# plt.xlabel('False Positive Rate', fontsize=18)
# plt.ylabel('True Positive Rate', fontsize=18)
# plt.title('ROC curve', fontsize=18)
# plt.legend(loc="lower right")
# plt.show()

### Model evaluation:

Your boss would rather wrongly tell a client that they will get a low-salary job than wrongly tell them that they will get a high-salary job. Adjust one of your models to ease his mind, and explain what the adjustment does and what it trades off.


- Use cross-validation to evaluate your models.
- Evaluate the accuracy, AUC, precision and recall of the models.
- Plot the ROC and precision-recall curves for at least one of your models.
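
One common way to act on the boss's preference is to raise the probability threshold for the "high" class, so the model only predicts "high" when it is fairly sure. The sketch below is a minimal plain-Python illustration on made-up data; the names `y_true`, `p_high`, and `precision_recall_at` are hypothetical and do not come from the notebook.

```python
def precision_recall_at(y_true, p_high, threshold):
    """Precision and recall for the 'high' class (label 1) at a given threshold."""
    preds = [1 if p >= threshold else 0 for p in p_high]
    tp = sum(1 for y, yhat in zip(y_true, preds) if y == 1 and yhat == 1)
    fp = sum(1 for y, yhat in zip(y_true, preds) if y == 0 and yhat == 1)
    fn = sum(1 for y, yhat in zip(y_true, preds) if y == 1 and yhat == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data (hypothetical): 1 = high salary, 0 = low salary.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
# Hypothetical predicted probabilities of the 'high' class.
p_high = [0.10, 0.30, 0.55, 0.65, 0.45, 0.60, 0.80, 0.90]

for threshold in (0.5, 0.7):
    precision, recall = precision_recall_at(y_true, p_high, threshold)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

On this toy data, moving the threshold from 0.5 to 0.7 removes every false "high" prediction (precision rises from 0.60 to 1.00) but misses more genuinely high-salary jobs (recall falls from 0.75 to 0.50). With a fitted scikit-learn classifier, the probabilities would typically come from `predict_proba`, taking the column that corresponds to the high-salary class.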

In [None]:
## YOUR CODE HERE: i.e., take your strongest model, adjust it to meet his aims, and chart it as required.

<img src="http://imgur.com/xDpSobf.png" style="float: left; margin: 25px 15px 0px 0px; height: 25px">

### Bonus:

- Answer the salary discussion by using your model to explain the tradeoffs between detecting high vs low salary positions. 
- Discuss the differences and explain when you want a high-recall or a high-precision model in this scenario.
- Obtain the ROC/precision-recall curves for the different models you studied (at least the tuned model of each category) and compare.
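
To compare models by their ROC curves, it helps to see how the curve and its area are built from predicted scores. The following is a rough plain-Python sketch on made-up data, assuming binary 0/1 labels with both classes present; `roc_points` and `auc_trapezoid` are illustrative helper names, not part of the notebook or of scikit-learn.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) pairs obtained by sweeping the threshold over the scores."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; each step admits more predicted positives.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data (hypothetical): 1 = high salary, 0 = low salary.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.10, 0.30, 0.55, 0.65, 0.45, 0.60, 0.80, 0.90]

curve = roc_points(y_true, scores)
print("ROC points:", curve)
print("AUC:", auc_trapezoid(curve))
```

The resulting AUC equals the fraction of (high, low) pairs in which the high-salary posting receives the higher score, which is why it is a threshold-independent way to compare the tuned models. In the notebook itself, `roc_curve` and `auc` from `sklearn.metrics` provide the same quantities.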

In [None]:
## YOUR CODE HERE

### Summarize your results in an executive summary written for a non-technical audience.
   
- Write-ups should be 500-1000 words long, define any technical terms, and explain your approach along with its risks and limitations.

In [None]:
## YOUR TEXT HERE IN MARKDOWN FORMAT 

<img src="http://imgur.com/xDpSobf.png" style="float: left; margin: 25px 15px 0px 0px; height: 25px">

### BONUS

Convert your executive summary into a public blog post of at least 500 words, in which you document your approach in a tutorial for other aspiring data scientists. Link to this in your notebook.

In [None]:
## YOUR LINK HERE IN MARKDOWN FORMAT 