# Web scraping the Community Care jobs site

Based on this tutorial: https://www.dataquest.io/blog/web-scraping-beautifulsoup/

This version scrapes MULTIPLE pages.

In [19]:
from requests import get
from bs4 import BeautifulSoup
import pandas as pd
from time import sleep
from time import time 
from random import randint
from warnings import warn
from IPython.display import clear_output



Import get from the requests library, put the URL in a variable called url, call get() on url to fetch the page, then print the first 500 characters of the text to check it worked.
The value returned by get(), stored in response, is a Response object.

In [4]:


url = 'https://jobs.communitycare.co.uk/searchjobs/?Keywords=&radialtown=&LocationId=&RadialLocation=5'

response = get(url)

print(response.text[:500])

<!DOCTYPE html>
	<html data-placeholder-focus="false" lang="en-GB">
	<head>
		<!-- SiteScope -->
	    <meta charset="utf-8">
	    <meta http-equiv="X-UA-Compatible" content="IE=edge">
		
		<title>Search for a Social Care Job | Community Care Jobs</title>
		
			
				<link rel="stylesheet" href="//jobs.communitycare.co.uk/assets/dist/css/package.css;p=master,branding;v=7de15f9dc396b7ff563342f666351c2c" type="text/css">
			
			
			
		
		<!--[if lt IE 10]>
			<link rel="stylesheet" h


## Import beautiful soup and parse 

Then use Beautiful Soup with Python's built-in HTML parser ('html.parser') to parse the content of the web page, and check what type of object comes back (it is a BeautifulSoup object).

In [5]:
from bs4 import BeautifulSoup

html_soup = BeautifulSoup(response.text, 'html.parser')
type(html_soup)

bs4.BeautifulSoup

## Work out what distinguishes each item (job) and use find_all() to grab them in one object

In the browser, work out what tag or class distinguishes each element, by using 'view source' or developer tools.

Then check the type of object (it should be a bs4.element.ResultSet)

Then check the length of the object (the length should match the number of items on the page) 



In [6]:
job_containers = html_soup.find_all('li', class_ = 'lister__item')
print(type(job_containers))
print(len(job_containers))

<class 'bs4.element.ResultSet'>
20
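The `class_` filter matches against each individual CSS class on a tag, which is why searching for `'lister__item'` still finds items that carry several other classes as well. A minimal sketch on a made-up fragment (the HTML below is an assumption for illustration, not copied from the live page):

```python
from bs4 import BeautifulSoup

# Hypothetical fragment imitating the listing markup; not the real page.
sample_html = """
<ul>
  <li class="lister__item cf lister__item--premium-job">Job A</li>
  <li class="lister__item cf">Job B</li>
  <li class="pagination__item">Next</li>
</ul>
"""

sample_soup = BeautifulSoup(sample_html, 'html.parser')

# class_ is compared with every class on the tag, so both 'lister__item'
# elements are found even though they carry extra classes.
sample_items = sample_soup.find_all('li', class_='lister__item')

print(type(sample_items))
print(len(sample_items))
```

If the length does not match the number of items visible in the browser, the chosen class is probably too broad or too narrow.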


## Work out EXACTLY which bits of data to grab

E.g. job title, salary, region, description

## Work out the code needed to drill down to the specific bits of data

To do this, put the first item of the ResultSet into a variable, so there is a single item to drill down into


In [7]:
first_job = job_containers[0]

Then use beautiful soup to drill down. There are different ways of doing it... 

-- If the bit of text you want is inside the first instance of a tag (e.g. the first h3 tag) you can use dot notation, e.g. first_job.h3. If it is nested further in you can chain the tags, e.g. first_job.h3.a. This only works if every tag in the chain is the first instance of its kind, so it won't reach the text in a second h3.

-- If you want to find the first instance of a tag with a specific attribute you can use find(), e.g. first_job.find('div', class_ = 'lister__details'). Another way to do the same thing is to pass the attribute name and value as a dictionary, e.g. first_job.find('div', attrs = {'class': 'lister__details'})

-- If you want to find the second, or third (or fourth etc.) instance of a tag you can use find_all() and index the result, e.g. first_job.find_all('h3')[1] (indexing starts at 0, so [1] is the second instance)

-- If you want to grab the value of an attribute instead of the text, e.g. if you want a link, you can put the name of the attribute in square brackets, e.g. first_job.find('a')['href']
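The four approaches listed above can be tried side by side on a small fragment. This is only a sketch: the HTML is invented for the example (a real listing has one h3, not two), and the variable names are mine.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment for illustration only.
sample_html = """
<li class="lister__item">
  <h3><a href="/job/1/"><span>First title</span></a></h3>
  <h3><a href="/job/2/"><span>Second title</span></a></h3>
  <p class="lister__description">A short description.</p>
</li>
"""
item = BeautifulSoup(sample_html, 'html.parser').li

# Dot notation reaches the first matching tag at each step.
first_title = item.h3.a.text

# find() with class_ and find() with an attrs dictionary give the same result.
by_keyword = item.find('p', class_='lister__description').text
by_attrs = item.find('p', attrs={'class': 'lister__description'}).text

# find_all() returns a list-like ResultSet, so indexing picks later matches.
second_title = item.find_all('h3')[1].text

# Square brackets on a tag read an attribute value instead of the text.
first_link = item.find('a')['href']

print(first_title, second_title, first_link)
```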


In [8]:
print(first_job)

<li class="lister__item cf lister__item--upsell lister__item--has-ribbon brand-highlight lister__item--premium-job lister__item--promoted-job lister__item--premium-job lister__item--promoted-job" id="item-1401610100">
<div class="lister__details cf js-clickable">
<h3 class="lister__header"><a class="js-clickable-area-link" href=" 
	/job/1401610100/supervising-social-worker/?LinkSource=PremiumListing



"><span>Supervising Social Worker</span></a></h3>
<img alt="Orange Grove Fostercare logo" class="lister__logo rec-logo float-right one-quarter portable-two-fifths palm-two-fifths" src="//jobs.communitycare.co.uk/getasset/0e60ec28-5862-499e-8313-cec96f78ad2c/"/>
<ul class="lister__meta">
<li class="lister__meta-item lister__meta-item--location">Stoke-on-Trent, Staffordshire</li>
<li class="lister__meta-item lister__meta-item--salary">£30,000 - £37,000, £3,000 Car Allowance</li>
<li class="lister__meta-item lister__meta-item--recruiter">Orange Grove Fostercare</li>
</ul>
<p class="lis

In [9]:
job_title = first_job.span.text
print(job_title)

Supervising Social Worker


In [10]:
job_area = first_job.find('li', class_ = 'lister__meta-item--location').text
print(job_area)

Stoke-on-Trent, Staffordshire


In [11]:
job_salary = first_job.find('li', class_ = 'lister__meta-item--salary').text
print(job_salary)


£30,000 - £37,000, £3,000 Car Allowance


In [12]:
job_description = first_job.find('p', class_ = 'lister__description').text
print(job_description)

We are looking for a committed social care professional with a sound knowledge of fostering legislation and practice issues.


In [13]:
job_link = first_job.find('a')['href']
print(job_link)

 
	/job/1401610100/supervising-social-worker/?LinkSource=PremiumListing
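The link comes back as a relative path wrapped in whitespace, so it probably needs tidying before it can be used. One possible way, sketched with the standard library: strip the whitespace and join the path onto the site address. The exact whitespace characters in `raw_link` are an assumption based on the printed output, and `base_url` is taken from the search URL used earlier.

```python
from urllib.parse import urljoin

# Site address, taken from the search URL used in this notebook.
base_url = 'https://jobs.communitycare.co.uk'

# Assumed raw value of the href attribute, including the surrounding whitespace.
raw_link = ' \r\n\t/job/1401610100/supervising-social-worker/?LinkSource=PremiumListing\r\n\r\n\r\n'

# strip() removes leading and trailing whitespace; urljoin() adds the scheme and host.
clean_link = urljoin(base_url, raw_link.strip())

print(clean_link)
```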






## Putting it together

Note the plural names inside the loop (e.g. job_titles), which keep the per-item variables distinct from the lists they are appended to (e.g. job_title)

In [14]:
job_title = []
job_area = []
job_salary = []
job_description = []
job_link = []

for jobs in job_containers: 
    
    # The job title
    job_titles = jobs.span.text
    job_title.append(job_titles)
    
    # The area 
    job_areas = jobs.find('li', class_ = 'lister__meta-item--location').text
    job_area.append(job_areas)
    
    # The salary
    job_salaries = jobs.find('li', class_ = 'lister__meta-item--salary').text
    job_salary.append(job_salaries)
    
    # The job description
    job_descriptions = jobs.find('p', class_ = 'lister__description').text 
    job_description.append(job_descriptions)
    
    # The link
    job_links = jobs.find('a')['href']
    job_link.append(job_links)
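One thing to watch in a loop like this: find() returns None when nothing matches, so calling .text on the result raises an AttributeError and stops the whole loop. Whether any listing on the site actually lacks a field is not something this notebook checks, so the following is just a defensive sketch. The helper name `text_or_none` and the sample HTML are hypothetical.

```python
from bs4 import BeautifulSoup


def text_or_none(container, tag, css_class):
    """Return the stripped text of the first matching tag, or None if absent."""
    match = container.find(tag, class_=css_class)
    return match.text.strip() if match is not None else None


# Hypothetical listing with a location but no salary.
sample_html = """
<li class="lister__item">
  <ul>
    <li class="lister__meta-item lister__meta-item--location">Walsall, West Midlands</li>
  </ul>
</li>
"""
job = BeautifulSoup(sample_html, 'html.parser').li

area = text_or_none(job, 'li', 'lister__meta-item--location')
salary = text_or_none(job, 'li', 'lister__meta-item--salary')

print(area)
print(salary)
```

Appending None for a missing field keeps all the lists the same length, which pandas needs when building the DataFrame.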
    
    
    
    
    

## Use pandas to test it worked 

In [15]:
test_df = pd.DataFrame({'job': job_title,
                       'area': job_area,
                        'salary': job_salary,
                        'description': job_description,
                        'link': job_link
                       })

test_df

Unnamed: 0,area,description,job,link,salary
0,"Stoke-on-Trent, Staffordshire",We are looking for a committed social care pro...,Supervising Social Worker,\r\n\t/job/1401610100/supervising-social-work...,"£30,000 - £37,000, £3,000 Car Allowance"
1,"Walsall, West Midlands",Do you love going the extra mile for people to...,Assistant Care Manager,\r\n\t/job/1401609580/assistant-care-manager/...,"£21,500 per annum"
2,"Haverhill, Suffolk",Do you love going the extra mile for people to...,Care Assistant / Support Worker,\r\n\t/job/1401609455/care-assistant-support-...,£8.96 per hour
3,"Cirencester, Gloucestershire",Do you love going the extra mile for people to...,Care Assistant / Support Worker,\r\n\t/job/1401609416/care-assistant-support-...,£10.07 per hour
4,"Banbury, Oxfordshire",Do you love going the extra mile for people to...,Care Assistant / Support Worker,\r\n\t/job/1401609402/care-assistant-support-...,£10.05 per hour
5,"Alvaston, Derby",Do you love going the extra mile for people to...,Care Assistant / Support Worker,\r\n\t/job/1401609418/care-assistant-support-...,£8.61 per hour
6,Lincolnshire,Families change. We change with them. That’s w...,Children's Social Worker - Level 2 - Lincolnshire,\r\n\t/job/1401599304/children-s-social-worke...,"£30,153 - £33,437"
7,"Lambeth, London (Greater)",We are looking for experienced Social Workers ...,Experienced Social Workers - Children's Social...,\r\n\t/job/1401609710/experienced-social-work...,"PO3 Starting salary £37,650 pa rising to £40,6..."
8,"Croydon (City/Town), London (Greater)",Can you champion and drive service improvement...,Team Managers Adolescent Service,\r\n\t/job/1401609342/team-managers-adolescen...,"Salary up £54,953 per annum"
9,Hampshire Countywide (please note you must be ...,What’s not to like about being a Hampshire Chi...,Experienced Children's Qualified Social Worker...,\r\n\t/job/1401608457/experienced-children-s-...,"£32,109 - £36,139 per annum + market supplemen..."
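The columns in the output above appear in alphabetical order rather than the order used in the dictionary, which is what older pandas versions did with dictionary input (an assumption about the version used here). If a particular order matters, the `columns` argument can set it explicitly. A small sketch using two rows copied from the output:

```python
import pandas as pd

# Two rows copied from the output above, with only three of the columns.
sample_data = {
    'job': ['Supervising Social Worker', 'Assistant Care Manager'],
    'area': ['Stoke-on-Trent, Staffordshire', 'Walsall, West Midlands'],
    'salary': ['£30,000 - £37,000, £3,000 Car Allowance', '£21,500 per annum'],
}

# columns= selects the listed keys and fixes their left-to-right order.
sample_df = pd.DataFrame(sample_data, columns=['job', 'area', 'salary'])

print(sample_df.columns.tolist())
```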


## The full script

This has to loop through the different URLs

This is the URL: https://jobs.communitycare.co.uk/searchjobs/?countrycode=GB&Page=1 
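Since only the page number changes, the full list of URLs can be built up front. A minimal sketch: the count of 88 pages is the figure quoted in the script's comments and should be checked against the site, and the names `base_url`, `last_page` and `page_urls` are mine.

```python
# Everything up to the page number, copied from the URL above.
base_url = 'https://jobs.communitycare.co.uk/searchjobs/?countrycode=GB&Page='

# Page count quoted in this notebook; an assumption to verify on the site.
last_page = 88

# range() excludes its stop value, so add 1 to include the last page.
page_urls = [base_url + str(page) for page in range(1, last_page + 1)]

print(len(page_urls))
print(page_urls[0])
print(page_urls[-1])
```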

In [43]:
# Make a list of the bit that varies in the URL. Here it is just the page number, and there are 88 pages.
# Note: range(1, 88) stops at 87, so this covers pages 1 to 87; use range(1, 89) to include page 88.
# The numbers have to be strings so they can be concatenated into the URL.
pages = [str(i) for i in range(1, 88)]

headers = {"Accept-Language": "en-US, en;q=0.5"}

# Create the lists
job_title = []
job_area = []
job_salary = []
job_description = []
job_link = [] 

# Prepare the monitoring of the loop
start_time = time()
requests = 0 

# For every page in the list of pages
for page_number in pages: 
    
    # Make a get request 
    response = get('https://jobs.communitycare.co.uk/searchjobs/?countrycode=GB&Page=' + page_number, headers = headers)
    
    # Pause the loop
    sleep(randint(8, 15))
    
    # Monitor the requests 
    requests += 1
    elapsed_time = time() - start_time 
    print('Request:{}; Frequency: {} requests/s'.format(requests, requests/elapsed_time))
    clear_output(wait = True)
    
    # Throw a warning for non-200 status codes 
    if response.status_code != 200:
        warn('Request: {}; Status code: {}'.format(requests, response.status_code))
        
    # Break the loop if the number of requests is more than the number of pages I want to scrape 
    if requests > 88:
        warn('Number of requests is bigger than expected Rosa!')
        break 
    
    # Parse the content of the request with BeautifulSoup
    page_html = BeautifulSoup(response.text, 'html.parser')
    
    # Select the containers from a single page 
    job_containers = page_html.find_all('li', class_ = 'lister__item')
    
    # For every item on the page 
    for jobs in job_containers: 
        
        # The job title
        job_titles = jobs.span.text
        job_title.append(job_titles)
    
        # The area 
        job_areas = jobs.find('li', class_ = 'lister__meta-item--location').text
        job_area.append(job_areas)

        # The salary
        job_salaries = jobs.find('li', class_ = 'lister__meta-item--salary').text
        job_salary.append(job_salaries)

        # The job description
        job_descriptions = jobs.find('p', class_ = 'lister__description').text 
        job_description.append(job_descriptions)

        # The link
        job_links = jobs.find('a')['href']
        job_link.append(job_links)
    
    

Request:87; Frequency: 0.08049674202684205 requests/s
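The parsing half of the loop can also be pulled into a function, which makes it possible to test on a saved or hand-written page without sending any requests. This is a sketch rather than a replacement for the script above: `parse_jobs` is a hypothetical name, and the sample HTML is a simplified version of the first listing printed earlier.

```python
from bs4 import BeautifulSoup


def parse_jobs(html):
    """Return one dictionary per job listing found in the HTML string."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for job in soup.find_all('li', class_='lister__item'):
        rows.append({
            'job': job.span.text,
            'area': job.find('li', class_='lister__meta-item--location').text,
            'salary': job.find('li', class_='lister__meta-item--salary').text,
            'description': job.find('p', class_='lister__description').text,
            # strip() removes the whitespace that surrounds the href value
            'link': job.find('a')['href'].strip(),
        })
    return rows


# Simplified, hand-written copy of one listing for illustration.
sample_html = """
<li class="lister__item cf" id="item-1401610100">
<h3 class="lister__header"><a href="
    /job/1401610100/supervising-social-worker/?LinkSource=PremiumListing
"><span>Supervising Social Worker</span></a></h3>
<ul class="lister__meta">
<li class="lister__meta-item lister__meta-item--location">Stoke-on-Trent, Staffordshire</li>
<li class="lister__meta-item lister__meta-item--salary">£30,000 - £37,000, £3,000 Car Allowance</li>
</ul>
<p class="lister__description">We are looking for a committed social care professional.</p>
</li>
"""

rows = parse_jobs(sample_html)
print(rows[0]['job'])
print(rows[0]['link'])
```

A list of dictionaries like this can be passed straight to pd.DataFrame(), so the five separate lists would no longer be needed.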


In [44]:
sc_jobs_df = pd.DataFrame({'job': job_title,
                       'area': job_area,
                        'salary': job_salary,
                        'description': job_description,
                        'link': job_link
                       })

sc_jobs_df


Unnamed: 0,area,description,job,link,salary
0,"Stoke-on-Trent, Staffordshire",We are looking for a committed social care pro...,Supervising Social Worker,\r\n\t/job/1401610100/supervising-social-work...,"£30,000 - £37,000, £3,000 Car Allowance"
1,"Walsall, West Midlands",Do you love going the extra mile for people to...,Assistant Care Manager,\r\n\t/job/1401609580/assistant-care-manager/...,"£21,500 per annum"
2,"Haverhill, Suffolk",Do you love going the extra mile for people to...,Care Assistant / Support Worker,\r\n\t/job/1401609455/care-assistant-support-...,£8.96 per hour
3,"Cirencester, Gloucestershire",Do you love going the extra mile for people to...,Care Assistant / Support Worker,\r\n\t/job/1401609416/care-assistant-support-...,£10.07 per hour
4,"Banbury, Oxfordshire",Do you love going the extra mile for people to...,Care Assistant / Support Worker,\r\n\t/job/1401609402/care-assistant-support-...,£10.05 per hour
5,"Alvaston, Derby",Do you love going the extra mile for people to...,Care Assistant / Support Worker,\r\n\t/job/1401609418/care-assistant-support-...,£8.61 per hour
6,Lincolnshire,Families change. We change with them. That’s w...,Children's Social Worker - Level 2 - Lincolnshire,\r\n\t/job/1401599304/children-s-social-worke...,"£30,153 - £33,437"
7,"Lambeth, London (Greater)",We are looking for experienced Social Workers ...,Experienced Social Workers - Children's Social...,\r\n\t/job/1401609710/experienced-social-work...,"PO3 Starting salary £37,650 pa rising to £40,6..."
8,"Croydon (City/Town), London (Greater)",Can you champion and drive service improvement...,Team Managers Adolescent Service,\r\n\t/job/1401609342/team-managers-adolescen...,"Salary up £54,953 per annum"
9,Hampshire Countywide (please note you must be ...,What’s not to like about being a Hampshire Chi...,Experienced Children's Qualified Social Worker...,\r\n\t/job/1401608457/experienced-children-s-...,"£32,109 - £36,139 per annum + market supplemen..."
