<a href="https://colab.research.google.com/github/EduHdzVillasana/Technical-Test-Torre/blob/main/Notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Technical Test Torre
----
*Eduardo Alan Hernandez Villasana*

## Data Extraction

The data was downloaded from [data.world](https://data.world/promptcloud/50000-job-board-records-from-reed-uk).

Reed is one of the top employment agencies based in the United Kingdom. This data set contains 50,000 records of the latest job postings on Reed UK.

The data was extracted on March 13th, 2018 and contains job postings from the preceding 15 days. The following data fields are included in the dataset:

* category
* city
* state
* company name
* job title
* job description
* job requirement
* job type
* salary offered
* posting date

In [9]:
# Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [10]:
data_url = "https://query.data.world/s/rut5pr5nm5i4onzlfp6xjgcsw3rvu4"
df_raw = pd.read_csv(data_url)

## Data Exploration

In [11]:
df_raw.sample(5)

|  | category | city | company_name | geo | job_board | job_description | job_requirements | job_title | job_type | post_date | salary_offered | state |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 29388 | marketing jobs | Cardiff | Signet Resources | uk | reed | Apply now Exciting digital marketing opportun... | Required skills CRM Digital Marketing Marketi... | Digital Marketing E-Mail/Automation Manager | Permanent, full-time | 3/11/2018 | £30,000 - £35,000 per annum, negotiable, inc ... | South Glamorgan |
| 14027 | engineering jobs | North London | KBA Recruitment | uk | reed | Apply now Global Technology Company are recru... | Required skills ATMs Customer Service Enginee... | ATM Field Service Engineer - North London | Permanent, full-time | 3/8/2018 | £26,700 - £35,000 per annum | London |
| 8758 | logistics jobs | Huthwaite | Senior Salmon | uk | reed | Apply now We are looking on behalf of our cli... | NaN | Van Driver | Contract, full-time | 3/9/2018 | £7.50 - £8.21 per hour | Nottinghamshire |
| 46384 | banking jobs | City Of London | LMA | uk | reed | Apply now This is an exciting opportunity to ... | NaN | Operational Risk Manager | Contract, full-time | 3/5/2018 | £40,000 per annum | London |
| 40678 | admin secretarial pa jobs | Lichfield | Aprico Limited | uk | reed | Apply now We are looking for a dynamic indivi... | Required skills Administrative Booking Custom... | Administrator / Move Coordinator | Permanent, full-time | 3/7/2018 | Salary negotiable | Staffordshire |


In [12]:
df_raw.dtypes

category            object
city                object
company_name        object
geo                 object
job_board           object
job_description     object
job_requirements    object
job_title           object
job_type            object
post_date           object
salary_offered      object
state               object
dtype: object

In [13]:
df_raw.isnull().sum()

category                0
city                    0
company_name            0
geo                     0
job_board               0
job_description         0
job_requirements    29452
job_title               0
job_type                0
post_date               0
salary_offered          0
state                  20
dtype: int64

In [14]:
df_raw.shape

(50000, 12)

In [15]:
df_raw.describe()

|  | category | city | company_name | geo | job_board | job_description | job_requirements | job_title | job_type | post_date | salary_offered | state |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 50000 | 50000 | 50000 | 50000 | 50000 | 50000 | 20548 | 50000 | 50000 | 50000 | 50000 | 49980 |
| unique | 37 | 2918 | 5166 | 1 | 1 | 42057 | 14887 | 29155 | 9 | 66 | 7345 | 167 |
| top | health jobs | London | Hays Specialist Recruitment Limited | uk | reed | Apply on employer's website Add an annual tur... | Required skills Recruitment | Administrator | Permanent, full-time | 3/7/2018 | Salary negotiable | London |
| freq | 1930 | 4349 | 1830 | 50000 | 50000 | 85 | 123 | 162 | 36864 | 8472 | 4539 | 5900 |


In [52]:
df_raw["city"].sort_values().unique()[[937,1012]]

array(['FRANKFURT', 'Frankfurt'], dtype=object)
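
The `FRANKFURT` / `Frankfurt` pair above was found by inspecting sorted values at known positions. As a rough sketch of a more general check, the snippet below groups distinct spellings by their lower-cased form and keeps the names that appear with more than one capitalization. The `cities` Series is a toy stand-in for `df_raw["city"]`, and the variable names are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for df_raw["city"]; values are illustrative only.
cities = pd.Series(["FRANKFURT", "Frankfurt", "London", "london", "Cardiff"])

# Distinct spellings, grouped by their lower-cased form.
unique_cities = pd.Series(cities.unique())
variants = unique_cities.groupby(unique_cities.str.lower()).agg(list)

# Keep only the names that occur with more than one capitalization.
case_duplicates = variants[variants.map(len) > 1]
print(case_duplicates)
```

Running the same logic on the real column would give a sense of how many cities are affected before deciding to lower-case everything.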

In [62]:
df_raw["job_type"].unique()

array(['Permanent, full-time', 'Permanent, full-time or part-time',
       'Permanent, part-time', 'Contract, full-time',
       'Temporary, part-time', 'Temporary, full-time or part-time',
       'Temporary, full-time', 'Contract, full-time or part-time',
       'Contract, part-time'], dtype=object)

## Data Cleaning
* The `geo` and `job_board` columns will be dropped because each holds a single unique value across all rows.
* Convert cities and states to lower case, because some values appear more than once with different capitalization (for example `FRANKFURT` and `Frankfurt`).
* Extract keywords from `job_requirements` to get a list of requirements.

### Dropping unnecessary columns

In [53]:
# `columns=` already selects the axis, so `axis=1` is redundant.
df = df_raw.drop(columns=["geo", "job_board"])
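
The two column names above are hard-coded from the `describe()` output. A possible alternative, sketched here on a toy frame, is to detect constant columns with `nunique` and drop whatever it finds. The frame `toy` and the names `constant_cols` and `toy_clean` are assumptions made for the example only.

```python
import pandas as pd

# Toy frame standing in for df_raw; values are illustrative only.
toy = pd.DataFrame({
    "city": ["Cardiff", "London", "Lichfield"],
    "geo": ["uk", "uk", "uk"],
    "job_board": ["reed", "reed", "reed"],
})

# Columns whose number of distinct values (NaN counted as a value) is at most one.
n_unique = toy.nunique(dropna=False)
constant_cols = n_unique[n_unique <= 1].index.tolist()

# Drop them in one call.
toy_clean = toy.drop(columns=constant_cols)
print(constant_cols)
```

Counting NaN as a value (`dropna=False`) avoids dropping a column that mixes one real value with missing entries, which may or may not be the behavior wanted for a given data set.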

### Transforming string columns to lower case.

In [59]:
df["city"] = df["city"].str.lower()
df["state"] = df["state"].str.lower()
df["job_requirements"] = df["job_requirements"].str.lower()
df["job_title"] = df["job_title"].str.lower()
df["company_name"] = df["company_name"].str.lower()
df["category"] = df["category"].str.lower()

In [70]:
len(df[df["salary_offered"] == " Salary not specified "])

317

In [71]:
df["salary_offered"] = df["salary_offered"].str.strip()

In [73]:
len(df[df["salary_offered"] == "Salary not specified"])

317
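
The check above confirms that stripping whitespace kept all 317 "Salary not specified" rows. To see whether other columns carry the same kind of padding, something along these lines could be used; it is only a sketch on a toy frame, and `toy` and `padded_counts` are hypothetical names.

```python
import pandas as pd

# Toy frame; values are illustrative only.
toy = pd.DataFrame({
    "salary_offered": [" Salary not specified ", "£40,000 per annum", " Salary negotiable "],
    "state": ["London", "Staffordshire", None],
})

# For each column, count the non-missing values that change when stripped.
padded_counts = {}
for col in toy.columns:
    values = toy[col].dropna().astype(str)
    padded_counts[col] = int((values != values.str.strip()).sum())

print(padded_counts)
```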

### Getting key words of `job_requirements` 

In [74]:
df["job_requirements"].sample(3)

19511     required skills hardworking communicative eng...
37498                                                  NaN
15284     required skills administrative calls communic...
Name: job_requirements, dtype: object

"required skills" is repeated at the start of every non-NaN row.
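
Since the prefix carries no information, one option is to remove it with an anchored regular expression. The snippet below is a minimal sketch on a toy Series, assuming the prefix may be preceded by whitespace; `reqs` and `cleaned` are illustrative names. It does not attempt to split the remaining text into individual skills, because the truncated samples do not show how skills are delimited.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the lower-cased df["job_requirements"] column.
reqs = pd.Series([
    " required skills crm digital marketing",
    np.nan,
    "required skills administrative calls",
])

# Remove the leading "required skills" prefix; NaN rows stay NaN.
cleaned = (
    reqs.str.replace(r"^\s*required skills\s*", "", regex=True)
        .str.strip()
)
print(cleaned)
```

Anchoring the pattern with `^` keeps any later occurrence of the phrase inside the text untouched.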

In [None]:
df["job_requirements"] = df["job_requirements"].str.re