# Data Science jobs in Ireland
This project scrapes current Data Science job listings from [IrishJobs.ie](https://www.irishjobs.ie/) and [NIjobs.com](https://www.nijobs.com/). The data is stored in a pandas data frame and cleaned for further analysis.

In [1]:
import requests
import bs4
from bs4 import BeautifulSoup
import pandas as pd
pd.set_option('display.max_colwidth', None)  # None = no truncation (the older -1 value is deprecated)
import numpy as np
import time

This code can be used to search for any job, but this project focuses on Data Scientist jobs.

In [2]:
job_title = 'data scientist'

### Web Scraping

In [3]:
job_title = job_title.replace(' ', '+')
urls = ['https://www.irishjobs.ie/ShowResults.aspx?Keywords='+job_title+'&autosuggestEndpoint=%2fautosuggest&Location=0&Category=&Recruiter=Company%2cAgency&btnSubmit=+&PerPage=100',
        'https://www.irishjobs.ie/ShowResults.aspx?Keywords='+job_title+'&autosuggestEndpoint=%2fautosuggest&Location=0&Category=&Recruiter=Company%2cAgency&btnSubmit=+&PerPage=100&Page=2',
        'https://www.nijobs.com/ShowResults.aspx?Keywords='+job_title+'&Location=&Category=&Recruiter=Company%2cAgency&btnSubmit=+&PerPage=100']
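The query strings above are assembled by hand. As a hedged alternative, the standard library's `urllib.parse.urlencode` can build them from a dictionary and take care of the escaping (a space becomes `+`). The helper name `build_search_url` is hypothetical. Only `Keywords`, `PerPage` and `Page` are carried over from the URLs above; the sketch assumes the sites still return the same listings without the remaining parameters, which has not been checked.

```python
from urllib.parse import urlencode


def build_search_url(base, keywords, per_page=100, page=1):
    """Return a ShowResults.aspx search URL for the given site (hypothetical helper)."""
    params = {'Keywords': keywords, 'PerPage': per_page}
    if page > 1:
        # The hand-written URLs only add Page for the second page of results
        params['Page'] = page
    # urlencode escapes each value; a space in the keywords becomes '+'
    return base + '/ShowResults.aspx?' + urlencode(params)


example_urls = [
    build_search_url('https://www.irishjobs.ie', 'data scientist'),
    build_search_url('https://www.irishjobs.ie', 'data scientist', page=2),
    build_search_url('https://www.nijobs.com', 'data scientist'),
]
for example_url in example_urls:
    print(example_url)
```

Building the URL from the unmodified job title also means the title itself never has to be rewritten with `+` characters.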

In [4]:
titles = []
companies = []
salaries = []
dates = []
locations = []
descriptions = []
websites = []
links = []

The following code grabs the desired information from each job posting, adding it to the relevant list.
It does this for each URL in the urls list.

The time.sleep() function pauses the program for 5 seconds to limit the chance of crashing or getting blacklisted on either website.

A try and except block handles any errors with companies, as some job postings did not contain a company. This could be done for each column, but it was not required in this instance.
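If the same guard were wanted for every column, it could be factored into one small helper. This is a minimal sketch, assuming (as BeautifulSoup documents) that `find()` returns `None` when nothing matches. `text_or_nan` is a hypothetical name, and `SimpleNamespace` only stands in for a parsed tag so that the snippet runs without a web page.

```python
import math
from types import SimpleNamespace


def text_or_nan(tag):
    """Return tag.text, or NaN when the lookup found nothing (tag is None)."""
    return tag.text if tag is not None else float('nan')


# Stand-ins for the result of job.find(...): a matched tag and a failed lookup
found = SimpleNamespace(text='\nCpl Resources\n')
missing = None

print(repr(text_or_nan(found)))
print(math.isnan(text_or_nan(missing)))
```

For a nested lookup such as the `div` followed by the `h3`, the check would have to be applied at each level, since either `find()` call can come back empty.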

In [5]:
for url in urls:
    page = requests.get(url)
    page.raise_for_status()
    soup = bs4.BeautifulSoup(page.text, 'html.parser')
    time.sleep(5)
    listings = soup.find_all('div', {'class':'module job-result'})
    
    for job in listings:
    
        title = job.find('div', {'class':'job-result-title'}).find('h2').text
        titles.append(title)

        try:
            company = job.find('div', {'class':'job-result-title'}).find('h3').text
            companies.append(company)
        except AttributeError:
            # find() returned None, i.e. this posting has no company element
            companies.append(np.nan)

        salary = job.find('li', {'class':'salary'}).text
        salaries.append(salary)

        date = job.find('li', {'class':'updated-time'}).text
        dates.append(date)

        location = job.find('li', {'class':'location'}).text
        locations.append(location)

        description = job.find('p', {'style':'display: '}).text
        descriptions.append(description)
        
        if url in urls[:2]:
            website = 'Irish Jobs'
            websites.append(website)
        else:
            website = 'NI Jobs'
            websites.append(website)
            
        link = job.find('a', {'class':'show-more'})['href']
        links.append(link)

The lists are used to create the data frame.

In [6]:
df = pd.DataFrame({
        "Title": titles, 
        "Company": companies, 
        "Salary": salaries, 
        "Date": dates, 
        "Location": locations, 
        "Description":descriptions,
        "Website":websites,
        "Link":links
    })

In [7]:
df.sample(10)

Unnamed: 0,Title,Company,Salary,Date,Location,Description,Website,Link
83,Senior Data Analyst\n,\nProsperity Recruitment\n,65000 - 75000,Updated 22/02/2019,\nDublin City Centre\n,\nA leading performance marketing agency are currently seeking a Senior Analyst to join their Data Science Team in the heart of Dublin City.\n,Irish Jobs,/Jobs/Senior-Data-Analyst-8285703.aspx
53,Data Engineer\n,\nRecruiters – Looking For You\n,40000 - 50000,Updated 27/02/2019,\nDublin\n,"\nhis is an opportunity for an experienced coder or software engineer to design, architect, implement, and maintain tools and systems to help the GTI team to detect fraud and collect data from internal data sources and open-sourced intelligence. Calling all Data Engineers!\n",Irish Jobs,/Jobs/Data-Engineer-8288162.aspx
56,Data Engineer\n,\nCpl Resources\n,Negotiable,Updated 27/02/2019,\nTipperary\n / Clare\n / Limerick\n,"\nI am recruiting a Data Engineer for my client in Nenagh - We are looking for an experienced data engineer with a minimum of 3 years experience in industry. To be comfortable in working with large data sets, and have enterprise experience in developing end to end ETL pipelines. Must have 3+ years experience working as a data engineer at an enterprise level.\n",Irish Jobs,/Jobs/Data-Engineer-8294174.aspx
8,Data Analyst\n,\nEden Recruitment Ltd.\n,Negotiable,Updated 03/03/2019,\nDublin\n / Dublin City Centre\n / Dublin South\n,"\nPosess the ability to find trends in data sets and interpret them to offer business insights - We have a requirement for a Reporting Analyst to join our Global IT client. We are looking for a Reporting Analyst to join our client, a Global tech company who has been ranked one of the top performers and fastest growing organisations in the world.\n",Irish Jobs,/Jobs/Data-Analyst-8295766.aspx
5,Data Scientist Contract (Great Day Rate)\n,\nReperio Human Capital Ltd\n,100000 - or more,Updated 16/02/2019,\nDublin\n,"\nWe are looking for a Data Scientist who will support our product, sales, leadership and marketing teams with insights gained from analysing company data. Our leading Telecommunication Services client is actively recruiting for an experienced Data Scientist to join their Business Analytics team based in Dublin for a 6-12 Month Contract.\n",Irish Jobs,/Jobs/Data-Scientist-Contract-Great-Day-8283415.aspx
2,Data Scientist\n,\nNext Generation\n,55000 - 65000,Updated 22/02/2019,\nDublin City Centre\n,"\nAre you a talented programmer that is looking to pivot into Data Science? You have an IT/Computer Science background and have recently started to learn how your skills may apply to Data Science. * Computational MSc (Computer Science, IT, Data Analytics).\n",Irish Jobs,/Jobs/Data-Scientist-8292817.aspx
52,Data Engineer\n,\nComputer Futures\n,Negotiable,Updated 27/02/2019,\nDublin City Centre\n / Dublin South\n / Dublin\n,"\nYou will be working closely with the Data Scientist's to help build predictive models, build API's, mash up data sets and adding value to them. I am on the lookout for a Data Engineer for my client, who are based in heart of Dublin City Centre.\n",Irish Jobs,/Jobs/Data-Engineer-8287643.aspx
177,\nData Engineer (Microsoft BI Stack)\n,\nCPL\n,Negotiable,Updated 19/02/2019,\nBelfast\n,"\nYou'll be at the cutting edge of innovation, using best in class technologies and creative problem-solving techniques to help clients’ manage their data issues.\n",NI Jobs,/Data-Engineer-Microsoft-BI-Stack-Job-1390029.aspx
34,Temporary Business Intelligence Developer\n,\nPramerica Limited\n,Not disclosed,Updated 28/02/2019,\nDonegal\n,\nPramerica are looking for an established hands on Business Intelligence Developer who has Data Visualization experience and some development experience. * Proficient in Power BI (other industry leading Data Visualization technologies would be considered) Overview\n,Irish Jobs,/Jobs/Temporary-Business-Intelligence-Developer-8294798.aspx
164,\nData Engineer (Belfast – R/Python/AWS)\n,\nRealTime Recruitment\n,Negotiable,Updated 23/02/2019,\nBelfast\n,\nData Engineer| Competitive Salary | 2 days WFH | MedTech | R/Python/AWS | On-The-Job Training | Excellent Holidays | Private Medical &Bonus | Why should I be interested in this role as a Data Engineer? What will I be doing as a Data Engineer? What do I need to secure this Data Engineer role? * Some experience with Data Science or a keen interest\n,NI Jobs,/Data-Engineer-Belfast-R-Python-Job-1391392.aspx


In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 184 entries, 0 to 183
Data columns (total 8 columns):
Title          184 non-null object
Company        183 non-null object
Salary         184 non-null object
Date           184 non-null object
Location       184 non-null object
Description    184 non-null object
Website        184 non-null object
Link           184 non-null object
dtypes: object(8)
memory usage: 5.8+ KB


In [9]:
df.describe()

Unnamed: 0,Title,Company,Salary,Date,Location,Description,Website,Link
count,184,183,184,184,184,184,184,184
unique,155,87,26,16,58,179,2,184
top,Data Analyst\n,\nReperio Human Capital Ltd\n,Negotiable,Updated 26/02/2019,\nDublin\n,"\nUse internal and external tools to scrub raw lead data and have it prepared for use by the internal Marketing Teams. Support marketing by mapping out companies in countries based on turnover, advertising spend and other data. * Experience using SalesForce, Google Suite, and LinkedIn to gather and report data\n",Irish Jobs,/Jobs/Biostatistician-Clinical-Assessment-8285698.aspx
freq,7,11,48,28,51,2,129,1


In [10]:
df.sample(10)

Unnamed: 0,Title,Company,Salary,Date,Location,Description,Website,Link
56,Data Engineer\n,\nCpl Resources\n,Negotiable,Updated 27/02/2019,\nTipperary\n / Clare\n / Limerick\n,"\nI am recruiting a Data Engineer for my client in Nenagh - We are looking for an experienced data engineer with a minimum of 3 years experience in industry. To be comfortable in working with large data sets, and have enterprise experience in developing end to end ETL pipelines. Must have 3+ years experience working as a data engineer at an enterprise level.\n",Irish Jobs,/Jobs/Data-Engineer-8294174.aspx
0,Data Scientist - Dublin\n,\nReperio Human Capital Ltd\n,60000 - 70000,Updated 01/03/2019,\nDublin\n,"\nMy client, an exciting financial services company are currently expanding their data science team. My client, an exciting financial services company, are currently hoping to expand their data science team with a talented and ambitious individual to join their team.\n",Irish Jobs,/Jobs/Data-Scientist-Dublin-8289327.aspx
36,Data Marketing Analyst (Spanish & Italian)\n,\nRecruiters – Looking For You\n,25000 - 35000,Updated 27/02/2019,\nDublin City Centre\n,"\nUse internal and external tools to scrub raw lead data and have it prepared for use by the internal Marketing Teams. Support marketing by mapping out companies in countries based on turnover, advertising spend and other data. * Experience using SalesForce, Google Suite, and LinkedIn to gather and report data\n",Irish Jobs,/Jobs/Data-Marketing-Analyst-Spanish-Italian-8287998.aspx
63,Oracle Analyst and Developer\n,\nDunnes Stores\n,See description,Updated 25/02/2019,\nDublin\n,"\nDunnes Stores is Ireland’s leading retailer positioned at the cutting edge of food and fashion retailing. We have over 130 stores in Ireland, Northern Ireland and Spain and are expanding our service and product offer all the time.\n",Irish Jobs,/Jobs/Oracle-Analyst-and-Developer-8283942.aspx
59,Technical Development Analyst (Fixed Term Contract\n,\nMylan\n,Not disclosed,Updated 26/02/2019,\nDublin North\n,"\nFor Us, It's a Mission At Mylan, we mean it when we say we work every day to provide access to high quality medicines to the world's 7 billion people. If you are unconventional, relentless and pass...\n",Irish Jobs,/Jobs/Technical-Development-Analyst-Fixed-Term-8291147.aspx
104,Adobe Analytics Consultant\n,\n@Recruit IT\n,See description,Updated 20/02/2019,\nDublin\n,\nGood Data Analysis skills and Behaviour Analysis Skills - We have an opportunity with a Consulting client for an experienced Adobe Analytics implementation and configuration consultants My client are a leading analytics firm who are looking for a client facing resource for a client facing consultancy role - Expert knowledge on Adobe Analytics suite\n,Irish Jobs,/Jobs/Adobe-Analytics-Consultant-8291514.aspx
136,\nData Scientist ( Java )\n,\nVANRATH IT\n,£60000 - 70000,Updated 19/02/2019,\nBelfast\n / Northern Ireland\n,\nMy client is seeking an experienced Statistical Senior Analyst or Data Scientist to join their expanding Data Science function.\n,NI Jobs,/Data-Scientist-Java-Job-1393731.aspx
145,\nSenior Data Analyst (Software Management)\n,\nHAYS Specialist Recruitment\n,£30000 - 40000,Updated 01/03/2019,\nNorthern Ireland\n,\nWant to work in a data analyst role that will offer you more than the rest and hone business skills?\n,NI Jobs,/Senior-Data-Analyst-Software-Management-Job-1394873.aspx
84,Medical Compliance Reporting Specialist - Galway\n,\nPE Global\n,Negotiable,Updated 22/02/2019,\nGalway\n / Nationwide\n,\nThe successful candidate will be part of the Complaint Handling and Medical Device Regulatory reporting department within a US Multinational organisation based in Galway 12 month contract\n,Irish Jobs,/Jobs/Medical-Compliance-Reporting-Specialist-Galway-8285682.aspx
163,\nPrincipal Statistician\n,\nAbacus Careers\n,Not disclosed,Updated 22/02/2019,\nBelfast\n,\nWork with clinical researchers & scientists who work in the life science remit. This firm has established itself as leaders in the Life Science Analytics market and have produced a leading cloud-based software package that is vital to scientists & researchers.\n,NI Jobs,/Principal-Statistician-Job-1394392.aspx


### Data Cleaning
From looking at the data frame, the most noticeable issue is the '\n' that was scraped as part of the strings. These can easily be removed, along with any whitespace using the following code:

In [11]:
for column in df.columns:
    df[column] = df[column].str.strip()
    df[column] = df[column].str.replace('\n', '')
    df[column] = df[column].str.replace(r'\s+', ' ', regex=True)  # raw string avoids an invalid-escape warning
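To see what these three steps do to a single value, here is a plain-Python sketch using `re` on one of the scraped location strings shown earlier. The function name `clean_text` is only illustrative.

```python
import re


def clean_text(value):
    """Apply the same three steps as the loop above to a single string."""
    value = value.strip()               # drop leading/trailing whitespace
    value = value.replace('\n', '')     # remove embedded newlines
    return re.sub(r'\s+', ' ', value)   # collapse runs of whitespace


print(clean_text('\nTipperary\n / Clare\n / Limerick\n'))
# Tipperary / Clare / Limerick
```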

In [12]:
df.head()

Unnamed: 0,Title,Company,Salary,Date,Location,Description,Website,Link
0,Data Scientist - Dublin,Reperio Human Capital Ltd,60000 - 70000,Updated 01/03/2019,Dublin,"My client, an exciting financial services company are currently expanding their data science team. My client, an exciting financial services company, are currently hoping to expand their data science team with a talented and ambitious individual to join their team.",Irish Jobs,/Jobs/Data-Scientist-Dublin-8289327.aspx
1,"Data Scientist – Customer, Ops & Trading",Paragon Executive Intelligence,60000 - 70000,Updated 26/02/2019,Dublin,"Interesting opportunity to become part of a large, multi-disciplinary Data Science capability working hand in hand with the DWH and BI teams to ensure that Data is supporting all aspects of the organisation.",Irish Jobs,/Jobs/Data-Scientist-Customer-Ops-Trading-8287551.aspx
2,Data Scientist,Next Generation,55000 - 65000,Updated 22/02/2019,Dublin City Centre,"Are you a talented programmer that is looking to pivot into Data Science? You have an IT/Computer Science background and have recently started to learn how your skills may apply to Data Science. * Computational MSc (Computer Science, IT, Data Analytics).",Irish Jobs,/Jobs/Data-Scientist-8292817.aspx
3,Data Scientist,Collins McNicholas Recruitment & HR Services Group,45000 - 55000,Updated 20/02/2019,Cork / Cork city / Nationwide,A Cork-based real-time data analytics software company are looking for a creative and innovative data scientist to help provide our greater insights to their business,Irish Jobs,/Jobs/Data-Scientist-8284510.aspx
4,Data Scientist,Computer Futures,See description,Updated 19/02/2019,Dublin / Dublin North / Dublin South,I'm on the lookout for a Data Scientist to join a very exciting client of mine based in Dublin. As a Data Scientist will be joining a company that apply machine learning/Deep Learning techniques in the Digital Healthcare domain.,Irish Jobs,/Jobs/Data-Scientist-8283728.aspx


Next, check for and remove any duplicates, then check for null values. As this data was scraped from job search webpages, I wouldn't expect there to be many duplicate or null values.

In [13]:
df[df.duplicated()]

Unnamed: 0,Title,Company,Salary,Date,Location,Description,Website,Link


In [14]:
df.drop_duplicates(inplace=True)

Check for any null values

In [15]:
df.isnull().sum()

Title          0
Company        1
Salary         0
Date           0
Location       0
Description    0
Website        0
Link           0
dtype: int64

In [16]:
df[df['Company'].isnull()]

Unnamed: 0,Title,Company,Salary,Date,Location,Description,Website,Link
58,Reporting Analyst,,Negotiable,Updated 26/02/2019,Dublin,"The Reporting Analyst will design and develop metrics, reports and analyses to drive key business decisions, providing a crucial link between raw enterprise data and management. This role is crucial to sound decision-making based on data and evidence. Gather and Input Data (approx. 25% time)",Irish Jobs,/Jobs/Reporting-Analyst-8287740.aspx


The only null value is from the company column. The user that uploaded the job must have made an error and not entered the company.

The try and except block in the for loop handled this and stopped the program from crashing. 

#### Title & Company
The Title and Company columns can be checked for any issues. 

The Title column could be cleaned to only include the job title (e.g. *Data Scientist* cleaned from *Senior Data Scientist*). However, I decided to leave these columns unchanged as I didn't want to unnecessarily remove any information.

In [17]:
df['Title'].value_counts()

Data Analyst                                                        15
Data Engineer                                                       7 
Data Scientist                                                      6 
Business Intelligence Analyst                                       5 
Data Manager                                                        3 
Senior Data Scientist                                               3 
Data Analyst-Personal Lines                                         2 
Reporting Analyst                                                   2 
Senior Data Analyst (Software Management)                           2 
BI Technical Support Analyst                                        2 
SAS Data Analyst (All Levels)                                       2 
Biostatistician - Regulatory Body                                   1 
Java Application Developer's                                        1 
Enterprise Risk Reporting Specialist                                1 
Gradua

In [18]:
df['Company'].value_counts().head()

Reperio Human Capital Ltd    11
VANRATH IT                   8 
RealTime Recruitment         7 
Cpl Resources                7 
MCS Group                    7 
Name: Company, dtype: int64

#### Salary
As the data frame combines jobs in NI and Ireland, the Salary column mixes two currencies (£ and €), so any central tendency calculated across the whole column would be misleading.
To be able to differentiate between the currencies, I created a new Currency column and then removed any currency symbols from the Salary column.

From experience of using job finder websites, when a Salary is *Negotiable* it usually does not contain any information about the salary range. Therefore, I grouped these with *Not disclosed* as most salaries are negotiable.

In [19]:
df['Currency'] = df['Website'].map(lambda x: '£' if 'NI Jobs' in x else '€' if 'Irish Jobs' in x else '')

In [20]:
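# Literal replacements: regex=False gives the same result on old and new pandas versions
df['Salary'] = df['Salary'].str.replace('£', '', regex=False)
df['Salary'] = df['Salary'].str.replace('€', '', regex=False)
df['Salary'] = df['Salary'].str.replace('Negotiable', 'Not disclosed', regex=False)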

In [21]:
df['Salary'].value_counts()

Not disclosed       82
See description     27
50000 - 60000       9 
20000 - 30000       9 
40000 - 50000       8 
30000 - 40000       8 
60000 - 70000       7 
25000 - 35000       7 
35000 - 45000       5 
90000 - 100000      5 
45000 - 55000       5 
55000 - 65000       4 
65000 - 75000       3 
100000 - or more    2 
70000 - 80000       2 
45000 - 50000       1 
Name: Salary, dtype: int64
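
The salary values are still strings, so a mean or median cannot be computed from them directly. One possible next step, which is not part of the original notebook, is to split each range into numeric bounds and then summarise them per currency. `parse_salary_range` is a hypothetical helper, and its pattern only covers the formats that appear in the output above.

```python
import re


def parse_salary_range(salary):
    """Return (low, high) as ints; high is None for open-ended ranges.

    Returns None when the text holds no numeric range.
    """
    match = re.fullmatch(r'(\d+)\s*-\s*(\d+|or more)', salary.strip())
    if match is None:
        return None
    low, high = match.groups()
    return int(low), (int(high) if high.isdigit() else None)


for text in ['60000 - 70000', '100000 - or more', 'Not disclosed']:
    print(text, '->', parse_salary_range(text))
```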

#### Date
As the Date column may be used in future analysis, the data needs to be converted from strings into datetime values.

In [22]:
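df['Date'] = df['Date'].str.replace('Updated', '', regex=False).str.strip()
# The sites write dates day first (e.g. 22/02/2019), so state the format explicitly
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')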
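
Dates in the listings are written day first (e.g. 22/02/2019), so an ambiguous value such as 01/03/2019 has to be read as 1 March rather than 3 January. A minimal standard-library sketch of an explicit-format parse is shown here; `parse_listing_date` is an illustrative name.

```python
from datetime import datetime


def parse_listing_date(text):
    """Parse strings such as 'Updated 01/03/2019' as day/month/year."""
    day_first = text.replace('Updated', '').strip()
    return datetime.strptime(day_first, '%d/%m/%Y').date()


print(parse_listing_date('Updated 01/03/2019'))  # 2019-03-01
print(parse_listing_date('Updated 22/02/2019'))  # 2019-02-22
```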

#### Location
With different recruitment consultants uploading the jobs, there are many occurrences of the major cities being named differently e.g. *Dublin City Centre*, *Dublin South* etc. as seen below. 

In [23]:
df['Location'].value_counts()

Dublin                                                   51
Belfast                                                  22
Dublin City Centre                                       14
Belfast / Northern Ireland                               7 
Limerick                                                 6 
Dublin / Dublin City Centre                              5 
Dublin North                                             4 
Belfast / Antrim / Northern Ireland                      4 
Ulster                                                   4 
Dublin / Dublin City Centre / Dublin South               4 
Northern Ireland                                         3 
Dublin South                                             3 
Belfast / Derry                                          2 
Dublin / Dublin West                                     2 
Dublin / Dublin South                                    2 
Armagh                                                   2 
UK                                      

If this data is to be analysed based on location, these instances need to be grouped together. I decided to rename any location that contained 'Dublin' (e.g. Dublin City Centre) to just 'Dublin', and used the same method for Belfast and Cork. This greatly reduced the number of different locations.

In [24]:
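# regex=True is required on recent pandas, where str.replace treats the pattern literally by default
df['Location'] = df['Location'].str.replace(r'(^.*Dublin.*$)', 'Dublin', regex=True)
df['Location'] = df['Location'].str.replace(r'(^.*Belfast.*$)', 'Belfast', regex=True)
df['Location'] = df['Location'].str.replace(r'(^.*Cork.*$)', 'Cork', regex=True)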
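
Each pattern is anchored with `^` and `$` and padded with `.*`, so it matches the whole string whenever the city name appears anywhere in it. The same idea on single strings, as a small sketch with the hypothetical helper `normalise_location`:

```python
import re

CITIES = ['Dublin', 'Belfast', 'Cork']


def normalise_location(location):
    """Collapse any location that mentions a major city to that city's name."""
    for city in CITIES:
        # '^.*<city>.*$' matches the entire string if it contains the city name
        location = re.sub(r'^.*' + city + r'.*$', city, location)
    return location


for raw in ['Dublin / Dublin City Centre / Dublin South',
            'Belfast / Antrim / Northern Ireland',
            'Tipperary / Clare / Limerick']:
    print(raw, '->', normalise_location(raw))
```

Locations that mention none of the three cities, such as the last example, pass through unchanged.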

In [25]:
df['Location'].value_counts()

Dublin                                        99
Belfast                                       39
Limerick                                      6 
Cork                                          5 
Ulster                                        4 
Northern Ireland                              3 
Donegal                                       2 
Armagh                                        2 
UK                                            2 
Mayo / Sligo                                  1 
Westmeath                                     1 
Tipperary / Clare / Limerick                  1 
Westmeath / Galway / Kildare                  1 
Tipperary / Limerick city / Nationwide        1 
Galway / Nationwide                           1 
Wexford / Kilkenny                            1 
Nationwide                                    1 
Cavan / Meath                                 1 
Dungannon Area / Tyrone / Northern Ireland    1 
Craigavon/Lurgan Area                         1 
Ballymoney Area / No

#### Link
As the scraped `href` only contains the path, the host name has to be prepended to it. This allows the user to copy and paste the link into the browser.

In [26]:
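def urlmaker(row):
    # Leave the link alone if it already has a host, so re-running this cell is safe
    if row['Link'].startswith('http'):
        return row['Link']
    if row['Website'] == 'NI Jobs':
        return 'https://www.nijobs.com' + row['Link']
    else:
        return 'https://www.irishjobs.ie' + row['Link']

In [29]:
df['Link'] = df.apply(urlmaker, axis=1)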
df.head()

Unnamed: 0,Title,Company,Currency,Salary,Date,Location,Description,Website,Link
0,Data Scientist - Dublin,Reperio Human Capital Ltd,€,60000 - 70000,2019-03-01,Dublin,"My client, an exciting financial services company are currently expanding their data science team. My client, an exciting financial services company, are currently hoping to expand their data science team with a talented and ambitious individual to join their team.",Irish Jobs,https://www.irishjobs.ie/Jobs/Data-Scientist-Dublin-8289327.aspx
1,"Data Scientist – Customer, Ops & Trading",Paragon Executive Intelligence,€,60000 - 70000,2019-02-26,Dublin,"Interesting opportunity to become part of a large, multi-disciplinary Data Science capability working hand in hand with the DWH and BI teams to ensure that Data is supporting all aspects of the organisation.",Irish Jobs,https://www.irishjobs.ie/Jobs/Data-Scientist-Customer-Ops-Trading-8287551.aspx
2,Data Scientist,Next Generation,€,55000 - 65000,2019-02-22,Dublin,"Are you a talented programmer that is looking to pivot into Data Science? You have an IT/Computer Science background and have recently started to learn how your skills may apply to Data Science. * Computational MSc (Computer Science, IT, Data Analytics).",Irish Jobs,https://www.irishjobs.ie/Jobs/Data-Scientist-8292817.aspx
3,Data Scientist,Collins McNicholas Recruitment & HR Services Group,€,45000 - 55000,2019-02-20,Cork,A Cork-based real-time data analytics software company are looking for a creative and innovative data scientist to help provide our greater insights to their business,Irish Jobs,https://www.irishjobs.ie/Jobs/Data-Scientist-8284510.aspx
4,Data Scientist,Computer Futures,€,See description,2019-02-19,Dublin,I'm on the lookout for a Data Scientist to join a very exciting client of mine based in Dublin. As a Data Scientist will be joining a company that apply machine learning/Deep Learning techniques in the Digital Healthcare domain.,Irish Jobs,https://www.irishjobs.ie/Jobs/Data-Scientist-8283728.aspx
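
Plain string concatenation is sensitive to being applied more than once. As an alternative sketch, `urllib.parse.urljoin` from the standard library resolves a path against a host and returns an already-absolute URL unchanged. The `HOSTS` mapping and the `absolute_link` helper are hypothetical names introduced for illustration.

```python
from urllib.parse import urljoin

HOSTS = {'Irish Jobs': 'https://www.irishjobs.ie',
         'NI Jobs': 'https://www.nijobs.com'}


def absolute_link(website, link):
    """Join the site's host with a scraped path; absolute links pass through."""
    return urljoin(HOSTS[website], link)


path = '/Jobs/Data-Scientist-8292817.aspx'
once = absolute_link('Irish Jobs', path)
twice = absolute_link('Irish Jobs', once)  # applying it again changes nothing
print(once)
print(once == twice)
```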


Lastly, the columns can be rearranged, and rows sorted by date.
The data frame is saved as a csv file.

In [28]:
cols = ['Title',
 'Company',
 'Currency',
 'Salary',
 'Date',
 'Location',
 'Description',
 'Website',
 'Link']
# Reorder the columns and keep the date-sorted rows so that the saved file is sorted too
df = df[cols].sort_values(by='Date')
df.to_csv(job_title + ' jobs.csv')