## Webscraping

In this exercise, you'll practice using BeautifulSoup to parse the content of a web page. The page that you'll be scraping, https://realpython.github.io/fake-jobs/, contains job listings. Your job is to extract the data on each job and convert it into a pandas DataFrame.

1. Start by performing a GET request on the url above and converting the response into a BeautifulSoup object.  
a. Use the .find method to find the tag containing the first job title ("Senior Python Developer"). Hint: can you find a tag type and/or a class that could be helpful for extracting this information? Extract the text from this title.  
b. Now, use what you did for the first title, but extract the job title for all jobs on this page. Store the results in a list.  
c. Finally, extract the companies, locations, and posting dates for each job. For example, the first job has a company of "Payne, Roberts and Davis", a location of "Stewartbury, AA", and a posting date of "2021-04-08". Ensure that the text that you extract is clean, meaning no extra spaces or other characters at the beginning or end.  
d. Take the lists that you have created and combine them into a pandas DataFrame. 
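For reference, steps 1a to 1d can be condensed into a few lines. This is a minimal sketch, assuming the card markup seen in the cells below (`h2` with class `title`, `h3` with class `company`, `p` with class `location`, and a `<time>` tag). The two-card HTML string is a hypothetical, trimmed stand-in so the example runs without a network request. `get_text(strip=True)` takes care of the whitespace cleanup that step 1c asks for.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical, trimmed copy of two job cards from the fake-jobs page
html = """
<div class="card-content">
  <h2 class="title is-5">Senior Python Developer</h2>
  <h3 class="subtitle is-6 company">Payne, Roberts and Davis</h3>
  <p class="location">
        Stewartbury, AA
      </p>
  <p class="is-small has-text-grey"><time datetime="2021-04-08">2021-04-08</time></p>
</div>
<div class="card-content">
  <h2 class="title is-5">Energy engineer</h2>
  <h3 class="subtitle is-6 company">Vasquez-Davidson</h3>
  <p class="location">
        Christopherville, AA
      </p>
  <p class="is-small has-text-grey"><time datetime="2021-04-08">2021-04-08</time></p>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')


def texts(tags):
    # get_text(strip=True) drops leading/trailing whitespace around each value
    return [tag.get_text(strip=True) for tag in tags]


# class_ matches any one of a tag's classes, e.g. "company" in "subtitle is-6 company"
jobs = pd.DataFrame({
    'title': texts(soup.find_all('h2', class_='title')),
    'company': texts(soup.find_all('h3', class_='company')),
    'location': texts(soup.find_all('p', class_='location')),
    'listing_date': texts(soup.find_all('time')),
})
print(jobs)
```

Building the DataFrame from a dict of equal-length lists in one call does the same job as adding columns one at a time; pandas raises a `ValueError` if the lists differ in length.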

In [2]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

In [3]:
# Check Connection
URL = 'https://realpython.github.io/fake-jobs/'
response = requests.get(URL)
response.status_code

200

In [4]:
# Name the parser explicitly; otherwise BeautifulSoup guesses one and warns
soup = BeautifulSoup(response.text, 'html.parser')

In [5]:
# Peek HTML
print(soup.prettify())

<!DOCTYPE html>
<html>
 <head>
  <meta charset="utf-8"/>
  <meta content="width=device-width, initial-scale=1" name="viewport"/>
  <title>
   Fake Python
  </title>
  <link href="https://cdn.jsdelivr.net/npm/bulma@0.9.2/css/bulma.min.css" rel="stylesheet"/>
 </head>
 <body>
  <section class="section">
   <div class="container mb-5">
    <h1 class="title is-1">
     Fake Python
    </h1>
    <p class="subtitle is-3">
     Fake Jobs for Your Web Scraping Journey
    </p>
   </div>
   <div class="container">
    <div class="columns is-multiline" id="ResultsContainer">
     <div class="column is-half">
      <div class="card">
       <div class="card-content">
        <div class="media">
         <div class="media-left">
          <figure class="image is-48x48">
           <img alt="Real Python Logo" src="https://files.realpython.com/media/real-python-logo-thumbnail.7f0db70c2ed2.jpg?__no_cf_polish=1"/>
          </figure>
         </div>
         <div class="media-content">
          <h2 c

In [6]:
# Initialize Dataframe
fake_jobs = pd.DataFrame()

# Q1

## Job Titles Extraction

In [9]:
# Try to find first job title
soup.find('h2')

<h2 class="title is-5">Senior Python Developer</h2>

In [10]:
# Success... extract text
soup.find('h2').text

'Senior Python Developer'

In [11]:
# Save all h2 tags to list
h2_list = soup.find_all('h2')

In [12]:
# Peek first element in h2_list
h2_list[0]

<h2 class="title is-5">Senior Python Developer</h2>

In [13]:
# Extract text from all h2_list elements
job_titles = [x.text for x in h2_list]

In [14]:
# Peek list and check for success
job_titles

['Senior Python Developer',
 'Energy engineer',
 'Legal executive',
 'Fitness centre manager',
 'Product manager',
 'Medical technical officer',
 'Physiological scientist',
 'Textile designer',
 'Television floor manager',
 'Waste management officer',
 'Software Engineer (Python)',
 'Interpreter',
 'Architect',
 'Meteorologist',
 'Audiological scientist',
 'English as a second language teacher',
 'Surgeon',
 'Equities trader',
 'Newspaper journalist',
 'Materials engineer',
 'Python Programmer (Entry-Level)',
 'Product/process development scientist',
 'Scientist, research (maths)',
 'Ecologist',
 'Materials engineer',
 'Historic buildings inspector/conservation officer',
 'Data scientist',
 'Psychiatrist',
 'Structural engineer',
 'Immigration officer',
 'Python Programmer (Entry-Level)',
 'Neurosurgeon',
 'Broadcast engineer',
 'Make',
 'Nurse, adult',
 'Air broker',
 'Editor, film/video',
 'Production assistant, radio',
 'Engineer, communications',
 'Sales executive',
 'Software Deve

In [15]:
# Save to df fake_jobs
fake_jobs['title'] = job_titles

In [16]:
# Check for success
fake_jobs.head()

|   | title |
|---|---|
| 0 | Senior Python Developer |
| 1 | Energy engineer |
| 2 | Legal executive |
| 3 | Fitness centre manager |
| 4 | Product manager |


## Company Extraction

In [18]:
# Identify company name locations
soup.find('h3')

<h3 class="subtitle is-6 company">Payne, Roberts and Davis</h3>

In [19]:
# Extract all
companies_list = soup.find_all('h3')

In [20]:
# Extract text from all elements
companies = [x.text for x in companies_list]

In [21]:
# Checking...
companies

['Payne, Roberts and Davis',
 'Vasquez-Davidson',
 'Jackson, Chambers and Levy',
 'Savage-Bradley',
 'Ramirez Inc',
 'Rogers-Yates',
 'Kramer-Klein',
 'Meyers-Johnson',
 'Hughes-Williams',
 'Jones, Williams and Villa',
 'Garcia PLC',
 'Gregory and Sons',
 'Clark, Garcia and Sosa',
 'Bush PLC',
 'Salazar-Meyers',
 'Parker, Murphy and Brooks',
 'Cruz-Brown',
 'Macdonald-Ferguson',
 'Williams, Peterson and Rojas',
 'Smith and Sons',
 'Moss, Duncan and Allen',
 'Gomez-Carroll',
 'Manning, Welch and Herring',
 'Lee, Gutierrez and Brown',
 'Davis, Serrano and Cook',
 'Smith LLC',
 'Thomas Group',
 'Silva-King',
 'Pierce-Long',
 'Walker-Simpson',
 'Cooper and Sons',
 'Donovan, Gonzalez and Figueroa',
 'Morgan, Butler and Bennett',
 'Snyder-Lee',
 'Harris PLC',
 'Washington PLC',
 'Brown, Price and Campbell',
 'Mcgee PLC',
 'Dixon Inc',
 'Thompson, Sheppard and Ward',
 'Adams-Brewer',
 'Schneider-Brady',
 'Gonzales-Frank',
 'Smith-Wong',
 'Pierce-Herrera',
 'Aguilar, Rivera and Quinn',
 'Lowe,

In [22]:
# Ensuring length consistency
len(companies)

100

In [23]:
fake_jobs.shape

(100, 1)

In [24]:
# Success... Adding to df fake_jobs
fake_jobs['company'] = companies

In [25]:
# Checking...
fake_jobs.head()

|   | title | company |
|---|---|---|
| 0 | Senior Python Developer | Payne, Roberts and Davis |
| 1 | Energy engineer | Vasquez-Davidson |
| 2 | Legal executive | Jackson, Chambers and Levy |
| 3 | Fitness centre manager | Savage-Bradley |
| 4 | Product manager | Ramirez Inc |


## Location Extraction

In [27]:
# Identify location... location
soup.find('p')

<p class="subtitle is-3">
        Fake Jobs for Your Web Scraping Journey
      </p>

In [28]:
# Hmm... findAll and parse?
soup.find_all('p')

[<p class="subtitle is-3">
         Fake Jobs for Your Web Scraping Journey
       </p>,
 <p class="location">
         Stewartbury, AA
       </p>,
 <p class="is-small has-text-grey">
 <time datetime="2021-04-08">2021-04-08</time>
 </p>,
 <p class="location">
         Christopherville, AA
       </p>,
 <p class="is-small has-text-grey">
 <time datetime="2021-04-08">2021-04-08</time>
 </p>,
 <p class="location">
         Port Ericaburgh, AA
       </p>,
 <p class="is-small has-text-grey">
 <time datetime="2021-04-08">2021-04-08</time>
 </p>,
 <p class="location">
         East Seanview, AP
       </p>,
 <p class="is-small has-text-grey">
 <time datetime="2021-04-08">2021-04-08</time>
 </p>,
 <p class="location">
         North Jamieview, AP
       </p>,
 <p class="is-small has-text-grey">
 <time datetime="2021-04-08">2021-04-08</time>
 </p>,
 <p class="location">
         Davidville, AP
       </p>,
 <p class="is-small has-text-grey">
 <time datetime="2021-04-08">2021-04-08</time>
 </p

In [29]:
# Try this?
soup.find('p', attrs={'class' : 'location'})

<p class="location">
        Stewartbury, AA
      </p>

In [30]:
# Boom... findall
soup.find_all('p', attrs={'class' : 'location'})

[<p class="location">
         Stewartbury, AA
       </p>,
 <p class="location">
         Christopherville, AA
       </p>,
 <p class="location">
         Port Ericaburgh, AA
       </p>,
 <p class="location">
         East Seanview, AP
       </p>,
 <p class="location">
         North Jamieview, AP
       </p>,
 <p class="location">
         Davidville, AP
       </p>,
 <p class="location">
         South Christopher, AE
       </p>,
 <p class="location">
         Port Jonathan, AE
       </p>,
 <p class="location">
         Osbornetown, AE
       </p>,
 <p class="location">
         Scotttown, AP
       </p>,
 <p class="location">
         Ericberg, AE
       </p>,
 <p class="location">
         Ramireztown, AE
       </p>,
 <p class="location">
         Figueroaview, AA
       </p>,
 <p class="location">
         Kelseystad, AA
       </p>,
 <p class="location">
         Williamsburgh, AE
       </p>,
 <p class="location">
         Mitchellburgh, AE
       </p>,
 <p class="location

In [31]:
# Nice... to list!
locs_list = soup.find_all('p', attrs={'class' : 'location'})

In [32]:
# Text Extraction
locations = [x.text for x in locs_list]

In [33]:
# Checking length consistency
len(locations)

100

In [34]:
# Adding to df fake_jobs
fake_jobs['location'] = locations

In [35]:
# Checking
fake_jobs.head()

|   | title | company | location |
|---|---|---|---|
| 0 | Senior Python Developer | Payne, Roberts and Davis | `\n Stewartbury, AA\n` |
| 1 | Energy engineer | Vasquez-Davidson | `\n Christopherville, AA\n` |
| 2 | Legal executive | Jackson, Chambers and Levy | `\n Port Ericaburgh, AA\n` |
| 3 | Fitness centre manager | Savage-Bradley | `\n East Seanview, AP\n` |
| 4 | Product manager | Ramirez Inc | `\n North Jamieview, AP\n` |


In [36]:
# Oops... something's wrong
fake_jobs['location'][0]

'\n        Stewartbury, AA\n      '

In [37]:
fake_jobs['location'][0].strip()

'Stewartbury, AA'

In [38]:
# That worked more gracefully than I thought it would lol
stripped_locs = [x.strip() for x in locations]

In [39]:
stripped_locs

['Stewartbury, AA',
 'Christopherville, AA',
 'Port Ericaburgh, AA',
 'East Seanview, AP',
 'North Jamieview, AP',
 'Davidville, AP',
 'South Christopher, AE',
 'Port Jonathan, AE',
 'Osbornetown, AE',
 'Scotttown, AP',
 'Ericberg, AE',
 'Ramireztown, AE',
 'Figueroaview, AA',
 'Kelseystad, AA',
 'Williamsburgh, AE',
 'Mitchellburgh, AE',
 'West Jessicabury, AA',
 'Maloneshire, AE',
 'Johnsonton, AA',
 'South Davidtown, AP',
 'Port Sara, AE',
 'Marktown, AA',
 'Laurenland, AE',
 'Lauraton, AP',
 'South Tammyberg, AP',
 'North Brandonville, AP',
 'Port Robertfurt, AA',
 'Burnettbury, AE',
 'Herbertside, AA',
 'Christopherport, AP',
 'West Victor, AE',
 'Port Aaron, AP',
 'Loribury, AA',
 'Angelastad, AP',
 'Larrytown, AE',
 'West Colin, AP',
 'West Stephanie, AP',
 'Laurentown, AP',
 'Wrightberg, AP',
 'Alberttown, AE',
 'Brockburgh, AE',
 'North Jason, AE',
 'Arnoldhaven, AE',
 'Lake Destiny, AP',
 'South Timothyburgh, AP',
 'New Jimmyton, AE',
 'New Lucasbury, AP',
 'Port Cory, AE',
 

In [40]:
# Literally amazing. Replace that nasty location column with this
fake_jobs['location'] = stripped_locs

In [41]:
# Checking
fake_jobs.head()

|   | title | company | location |
|---|---|---|---|
| 0 | Senior Python Developer | Payne, Roberts and Davis | Stewartbury, AA |
| 1 | Energy engineer | Vasquez-Davidson | Christopherville, AA |
| 2 | Legal executive | Jackson, Chambers and Levy | Port Ericaburgh, AA |
| 3 | Fitness centre manager | Savage-Bradley | East Seanview, AP |
| 4 | Product manager | Ramirez Inc | North Jamieview, AP |


## Date Extraction

In [43]:
# Getting confident... straight to find_all?
soup.find_all('time')

[<time datetime="2021-04-08">2021-04-08</time>,
 <time datetime="2021-04-08">2021-04-08</time>,
 <time datetime="2021-04-08">2021-04-08</time>,
 <time datetime="2021-04-08">2021-04-08</time>,
 <time datetime="2021-04-08">2021-04-08</time>,
 <time datetime="2021-04-08">2021-04-08</time>,
 <time datetime="2021-04-08">2021-04-08</time>,
 <time datetime="2021-04-08">2021-04-08</time>,
 <time datetime="2021-04-08">2021-04-08</time>,
 <time datetime="2021-04-08">2021-04-08</time>,
 <time datetime="2021-04-08">2021-04-08</time>,
 <time datetime="2021-04-08">2021-04-08</time>,
 <time datetime="2021-04-08">2021-04-08</time>,
 <time datetime="2021-04-08">2021-04-08</time>,
 <time datetime="2021-04-08">2021-04-08</time>,
 <time datetime="2021-04-08">2021-04-08</time>,
 <time datetime="2021-04-08">2021-04-08</time>,
 <time datetime="2021-04-08">2021-04-08</time>,
 <time datetime="2021-04-08">2021-04-08</time>,
 <time datetime="2021-04-08">2021-04-08</time>,
 <time datetime="2021-04-08">2021-04-08<

In [44]:
# Save to list
dates = soup.find_all('time')

In [45]:
# Quite... give me your text
soup.find_all('time')[0].text

'2021-04-08'

In [46]:
# All according to my master plan...
listing_dates = [x.text for x in dates]

In [47]:
# Check length
len(listing_dates)

100

In [48]:
# Perfect
fake_jobs['listing_date'] = listing_dates

In [49]:
# Checking
fake_jobs.head()

|   | title | company | location | listing_date |
|---|---|---|---|---|
| 0 | Senior Python Developer | Payne, Roberts and Davis | Stewartbury, AA | 2021-04-08 |
| 1 | Energy engineer | Vasquez-Davidson | Christopherville, AA | 2021-04-08 |
| 2 | Legal executive | Jackson, Chambers and Levy | Port Ericaburgh, AA | 2021-04-08 |
| 3 | Fitness centre manager | Savage-Bradley | East Seanview, AP | 2021-04-08 |
| 4 | Product manager | Ramirez Inc | North Jamieview, AP | 2021-04-08 |


# Q2

2. Next, add a column that contains the url for the "Apply" button. Try this in two ways.   
    a. First, use the BeautifulSoup find_all method to extract the urls.  
    b. Next, get those same urls in a different way. Examine the urls and see if you can spot the pattern of how they are constructed. Then, build the url using the elements you have already extracted. Ensure that the urls that you created match those that you extracted using BeautifulSoup. Warning: You will need to do some string cleaning and prep in constructing the urls this way. For example, look carefully at the urls for the "Software Engineer (Python)" job and the "Scientist, research (maths)" job.
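For part (a), the "Learn" and "Apply" links on each card share the `card-footer-item` class, so one way to isolate the Apply URLs is to filter on the link text. A minimal sketch: the HTML string is a hypothetical, trimmed footer; only the `card-footer-item` class, the Learn/Apply link text, and the two hrefs come from the notebook.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed card footer; the real page has one per job card
html = """
<div>
  <a class="card-footer-item" href="https://www.realpython.com">Learn</a>
  <a class="card-footer-item" href="https://realpython.github.io/fake-jobs/jobs/senior-python-developer-0.html">Apply</a>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# Both buttons share a class, so keep only links whose visible text is "Apply"
apply_hrefs = [
    a['href']
    for a in soup.find_all('a', class_='card-footer-item')
    if a.get_text(strip=True) == 'Apply'
]
print(apply_hrefs)
```

Comparing against the stripped text rather than `.text` guards against any whitespace around the link label.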

## Apply Button URL extraction (BeautifulSoup)

In [53]:
# find_all URLs
card_footers = soup.find_all('a', attrs={'class' : 'card-footer-item'})

In [54]:
# This picked up both the 'Learn' and 'Apply' buttons; only the link text differs between the two.
# Initialize Lists to separate
apply_buttons = []
learn_buttons = []

In [55]:
# Separate learn from apply
for x in card_footers:
    if x.text == 'Learn':
        learn_buttons.append(x.get('href'))
    elif x.text == 'Apply':
        apply_buttons.append(x.get('href'))

In [56]:
# Checking learn
learn_buttons[0]

'https://www.realpython.com'

In [57]:
# Checking Apply
apply_buttons[0]

'https://realpython.github.io/fake-jobs/jobs/senior-python-developer-0.html'

In [58]:
# Curious...
apply_buttons

['https://realpython.github.io/fake-jobs/jobs/senior-python-developer-0.html',
 'https://realpython.github.io/fake-jobs/jobs/energy-engineer-1.html',
 'https://realpython.github.io/fake-jobs/jobs/legal-executive-2.html',
 'https://realpython.github.io/fake-jobs/jobs/fitness-centre-manager-3.html',
 'https://realpython.github.io/fake-jobs/jobs/product-manager-4.html',
 'https://realpython.github.io/fake-jobs/jobs/medical-technical-officer-5.html',
 'https://realpython.github.io/fake-jobs/jobs/physiological-scientist-6.html',
 'https://realpython.github.io/fake-jobs/jobs/textile-designer-7.html',
 'https://realpython.github.io/fake-jobs/jobs/television-floor-manager-8.html',
 'https://realpython.github.io/fake-jobs/jobs/waste-management-officer-9.html',
 'https://realpython.github.io/fake-jobs/jobs/software-engineer-python-10.html',
 'https://realpython.github.io/fake-jobs/jobs/interpreter-11.html',
 'https://realpython.github.io/fake-jobs/jobs/architect-12.html',
 'https://realpython.gi

## Manual URL Construction

Each url combines the fixed prefix 'https://realpython.github.io/fake-jobs/jobs/' with the job title (sometimes slightly altered), lowercased and with hyphens in place of spaces, followed by a hyphen, the job's index number on the page, and the '.html' extension. Piece of cake.
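For titles made only of letters and spaces, the pattern reduces to a one-line transformation. A minimal sketch, assuming the index is the job's 0-based position on the page (as the Apply URLs above suggest); `simple_apply_url` and `BASE` are illustrative names, and titles with punctuation still need the extra cleaning worked out below.

```python
BASE = 'https://realpython.github.io/fake-jobs/jobs/'


def simple_apply_url(title, index):
    # Lowercase the title, swap spaces for hyphens, then append "-<index>.html"
    slug = title.lower().replace(' ', '-')
    return f'{BASE}{slug}-{index}.html'


titles = ['Senior Python Developer', 'Energy engineer', 'Legal executive']
# enumerate supplies the 0-based index that follows the title in each url
urls = [simple_apply_url(t, i) for i, t in enumerate(titles)]
for url in urls:
    print(url)
```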

### Exploration and Testing

In [62]:
# Let's take a look at the job titles and note the differences.
fake_jobs.head(30)

|   | title | company | location | listing_date |
|---|---|---|---|---|
| 0 | Senior Python Developer | Payne, Roberts and Davis | Stewartbury, AA | 2021-04-08 |
| 1 | Energy engineer | Vasquez-Davidson | Christopherville, AA | 2021-04-08 |
| 2 | Legal executive | Jackson, Chambers and Levy | Port Ericaburgh, AA | 2021-04-08 |
| 3 | Fitness centre manager | Savage-Bradley | East Seanview, AP | 2021-04-08 |
| 4 | Product manager | Ramirez Inc | North Jamieview, AP | 2021-04-08 |
| 5 | Medical technical officer | Rogers-Yates | Davidville, AP | 2021-04-08 |
| 6 | Physiological scientist | Kramer-Klein | South Christopher, AE | 2021-04-08 |
| 7 | Textile designer | Meyers-Johnson | Port Jonathan, AE | 2021-04-08 |
| 8 | Television floor manager | Hughes-Williams | Osbornetown, AE | 2021-04-08 |
| 9 | Waste management officer | Jones, Williams and Villa | Scotttown, AP | 2021-04-08 |


In [63]:
# I notice index 20:
fake_jobs[20:21]

|   | title | company | location | listing_date |
|---|---|---|---|---|
| 20 | Python Programmer (Entry-Level) | Moss, Duncan and Allen | Port Sara, AE | 2021-04-08 |


In [64]:
apply_buttons[20]

'https://realpython.github.io/fake-jobs/jobs/python-programmer-entry-level-20.html'

In [65]:
# and index 10:
fake_jobs[10:11]

|   | title | company | location | listing_date |
|---|---|---|---|---|
| 10 | Software Engineer (Python) | Garcia PLC | Ericberg, AE | 2021-04-08 |


In [66]:
apply_buttons[10]

'https://realpython.github.io/fake-jobs/jobs/software-engineer-python-10.html'

In [67]:
# and index 25:
fake_jobs[25:26]

|   | title | company | location | listing_date |
|---|---|---|---|---|
| 25 | Historic buildings inspector/conservation officer | Smith LLC | North Brandonville, AP | 2021-04-08 |


In [68]:
apply_buttons[25]

'https://realpython.github.io/fake-jobs/jobs/historic-buildings-inspector-conservation-officer-25.html'

The exceptions appear to be titles containing non-alphabetic characters, which are either removed or replaced by a hyphen.
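To see exactly which characters need handling, one option is to collect every character that is neither a letter nor a space. A minimal sketch: the five titles are a sample copied from the scraped list above; running the same set comprehension over the full `fake_jobs['title']` column would give the complete set.

```python
# A few titles copied from the scraped list above
titles = [
    'Senior Python Developer',
    'Software Engineer (Python)',
    'Python Programmer (Entry-Level)',
    'Scientist, research (maths)',
    'Historic buildings inspector/conservation officer',
]

# Collect every character that is neither a letter nor a space
odd_chars = sorted(
    {ch for title in titles for ch in title if not (ch.isalpha() or ch == ' ')}
)
print(odd_chars)  # ['(', ')', ',', '-', '/']
```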

In [70]:
# Test title for the for loop
test = fake_jobs['title'][9]

In [71]:
test.isalpha()

False

In [72]:
# Did not work... try this
test.replace(' ', '').isalpha()

True

In [73]:
test = fake_jobs['title'][10]

In [74]:
test.replace(' ', '').isalpha()

False

In [75]:
# Separate Dirty Strings
dirty_strings = []
for x in fake_jobs['title']:
    if not x.replace(' ', '').isalpha():
        dirty_strings.append(x)

In [76]:
dirty_strings

['Software Engineer (Python)',
 'Python Programmer (Entry-Level)',
 'Product/process development scientist',
 'Scientist, research (maths)',
 'Historic buildings inspector/conservation officer',
 'Python Programmer (Entry-Level)',
 'Nurse, adult',
 'Editor, film/video',
 'Production assistant, radio',
 'Engineer, communications',
 'Software Developer (Python)',
 'Designer, multimedia',
 'Chemist, analytical',
 'Programmer, multimedia',
 'Engineer, broadcasting (operations)',
 'Teacher, primary school',
 'Producer, television/film/video',
 'Scientist, forensic',
 'Back-End Web Developer (Python, Django)',
 'Engineer, automotive',
 'Producer, radio',
 'Designer, fashion/clothing',
 'Back-End Web Developer (Python, Django)',
 'Forest/woodland manager',
 'Python Programmer (Entry-Level)',
 'Warden/ranger',
 'Programmer, applications',
 'Software Developer (Python)',
 'Surveyor, land/geomatics',
 'Librarian, academic',
 'Museum/gallery exhibitions officer',
 'Radiographer, diagnostic']

---

So we have:

- Parentheses surround the trailing text and always appear at the end of a string. The parentheses themselves are removed; the text inside is kept (e.g. 'Software Engineer (Python)' becomes 'software-engineer-python').
- Slashes can appear anywhere in the string and are replaced by a hyphen.
- Commas can appear anywhere in the string and are removed.
- Hyphens already in a title (e.g. 'Entry-Level') stay as hyphens.
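These rules can also be folded into a single translation table with `str.maketrans`, instead of a chain of `.replace()` calls. A minimal sketch, assuming the same prefix and 0-based page index; `apply_url`, `BASE`, and `CLEANUP` are illustrative names, not part of the notebook.

```python
BASE = 'https://realpython.github.io/fake-jobs/jobs/'

# Drop parentheses and commas; turn slashes into spaces so they become word breaks
CLEANUP = str.maketrans({'(': None, ')': None, ',': None, '/': ' '})


def apply_url(title, index):
    # Hyphens already in the title (e.g. "Entry-Level") survive because
    # split() only breaks on whitespace before the words are rejoined with '-'
    words = title.translate(CLEANUP).lower().split()
    return f"{BASE}{'-'.join(words)}-{index}.html"


print(apply_url('Software Engineer (Python)', 10))
print(apply_url('Historic buildings inspector/conservation officer', 25))
```

Splitting on whitespace and rejoining with `'-'` also collapses any double spaces left behind after a character is deleted.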

### Constructing URLs

In [79]:
# Initialize List
apply_urls = []

In [80]:
# Save Beginning String
start = 'https://realpython.github.io/fake-jobs/jobs/'

In [81]:
# And ending string
end = '.html'

In [82]:
# Absolutely RANK for loop + if statement
# enumerate supplies the index number that follows the job title in the url,
# and keeps it in step with the row even if a title fails the checks below.
for i, x in enumerate(fake_jobs['title']):
    # CLEAN
    if x.replace(' ', '').isalpha():
        url_title = x.replace(' ', '-').lower()
        apply_urls.append(start + url_title + '-' + str(i) + end)
    # DIRTY
    else:
        url_title = x
        url_title = url_title.replace('(', '')
        url_title = url_title.replace(')', '')
        url_title = url_title.replace(',', '')
        url_title = url_title.replace('/', ' ')
        url_title = url_title.replace('-', ' ')
        # debugging
        if url_title.replace(' ', '').isalpha():
            url_title = url_title.replace(' ', '-').lower()
            apply_urls.append(start + url_title + '-' + str(i) + end)
        else:
            print('unhandled character in: ' + url_title)

In [83]:
# Checking problem string at index 25
apply_urls[25]

'https://realpython.github.io/fake-jobs/jobs/historic-buildings-inspector-conservation-officer-25.html'

In [84]:
apply_buttons[25]

'https://realpython.github.io/fake-jobs/jobs/historic-buildings-inspector-conservation-officer-25.html'

In [85]:
# Checking problem string at index 10
apply_urls[10]

'https://realpython.github.io/fake-jobs/jobs/software-engineer-python-10.html'

In [86]:
apply_buttons[10]

'https://realpython.github.io/fake-jobs/jobs/software-engineer-python-10.html'

LIKE BUTTER
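Spot-checking indices 10 and 25 is encouraging; comparing the two lists element by element would confirm that all 100 constructed URLs match. A minimal sketch: `compare_urls` is an illustrative helper, and the two-item lists are placeholders so the example runs on its own (in the notebook the call would be `compare_urls(apply_urls, apply_buttons)`).

```python
def compare_urls(built, scraped):
    """Return the indices where the constructed and scraped URLs disagree."""
    if len(built) != len(scraped):
        raise ValueError(f'length mismatch: {len(built)} vs {len(scraped)}')
    return [i for i, (a, b) in enumerate(zip(built, scraped)) if a != b]


# Placeholder lists standing in for apply_urls and apply_buttons
built = [
    'https://realpython.github.io/fake-jobs/jobs/senior-python-developer-0.html',
    'https://realpython.github.io/fake-jobs/jobs/energy-engineer-1.html',
]
scraped = list(built)
print(compare_urls(built, scraped))  # []
```

An empty list means every URL matches; any index returned points straight at the title whose cleaning rule needs another look.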

# Q3

3. Finally, we want to get the job description text for each job.  
    a. Start by looking at the page for the first job, https://realpython.github.io/fake-jobs/jobs/senior-python-developer-0.html. Using BeautifulSoup, extract the job description paragraph.  
    b. We want to be able to do this for all pages. Write a function which takes as input a url and returns the description text on that page. For example, if you input "https://realpython.github.io/fake-jobs/jobs/television-floor-manager-8.html" into your function, it should return the string "At be than always different American address. Former claim chance prevent why measure too. Almost before some military outside baby interview. Face top individual win suddenly. Parent do ten after those scientist. Medical effort assume teacher wall. Significant his himself clearly very. Expert stop area along individual. Three own bank recognize special good along.".  
    c. Use the [.apply method](https://pandas.pydata.org/docs/reference/api/pandas.Series.apply.html) on the url column you created above to retrieve the description text for all of the jobs.
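The job page shown below wraps the description in `<div class="content">`, so a CSS selector can target that paragraph directly instead of relying on it being the second `<p>` on the page. A minimal sketch on a hypothetical, trimmed copy of that markup; `description_from_html` is an illustrative name.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed copy of a job detail page
html = """
<p class="subtitle is-3">Fake Jobs for Your Web Scraping Journey</p>
<div class="box">
  <h1 class="title is-2">Senior Python Developer</h1>
  <h2 class="subtitle is-4 company">Payne, Roberts and Davis</h2>
  <div class="content">
    <p>Professional asset web application environmentally friendly detail-oriented asset.</p>
  </div>
</div>
"""


def description_from_html(page_html):
    soup = BeautifulSoup(page_html, 'html.parser')
    # select_one returns the first match for the CSS selector, or None
    paragraph = soup.select_one('div.content p')
    return paragraph.get_text(strip=True) if paragraph else None


print(description_from_html(html))
```

Returning `None` when the container is missing makes a page with unexpected markup show up as a missing value rather than an `IndexError`.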

In [90]:
URL = 'https://realpython.github.io/fake-jobs/jobs/senior-python-developer-0.html'
response = requests.get(URL)
soup = BeautifulSoup(response.text, 'html.parser')
print(soup.prettify())

<!DOCTYPE html>
<html>
 <head>
  <meta charset="utf-8"/>
  <meta content="width=device-width, initial-scale=1" name="viewport"/>
  <title>
   Fake Python
  </title>
  <link href="https://cdn.jsdelivr.net/npm/bulma@0.9.2/css/bulma.min.css" rel="stylesheet"/>
 </head>
 <body>
  <section class="section">
   <div class="container mb-5">
    <h1 class="title is-1">
     Fake Python
    </h1>
    <p class="subtitle is-3">
     Fake Jobs for Your Web Scraping Journey
    </p>
   </div>
   <div class="container">
    <div class="columns is-multiline" id="ResultsContainer">
     <div class="box">
      <h1 class="title is-2">
       Senior Python Developer
      </h1>
      <h2 class="subtitle is-4 company">
       Payne, Roberts and Davis
      </h2>
      <div class="content">
       <p>
        Professional asset web application environmentally friendly detail-oriented asset. Coordinate educational dashboard agile employ growth opportunity. Company programs CSS explore role. Html educational

In [91]:
# Find second instance of <p> (first is subheader)
soup.find_all('p')[1]

<p>Professional asset web application environmentally friendly detail-oriented asset. Coordinate educational dashboard agile employ growth opportunity. Company programs CSS explore role. Html educational grit web application. Oversea SCRUM talented support. Web Application fast-growing communities inclusive programs job CSS. Css discussions growth opportunity explore open-minded oversee. Css Python environmentally friendly collaborate inclusive role. Django no experience oversee dashboard environmentally friendly willing to learn programs. Programs open-minded programs asset.</p>

In [112]:
# Define a function that returns the description text for a job URL
def get_description(func_url):
    func_response = requests.get(func_url)
    func_soup = BeautifulSoup(func_response.text, 'html.parser')
    # The second <p> holds the description; the first is the page subheader
    return func_soup.find_all('p')[1].get_text(strip=True)

In [114]:
# testing
get_description('https://realpython.github.io/fake-jobs/jobs/senior-python-developer-0.html')

'Professional asset web application environmentally friendly detail-oriented asset. Coordinate educational dashboard agile employ growth opportunity. Company programs CSS explore role. Html educational grit web application. Oversea SCRUM talented support. Web Application fast-growing communities inclusive programs job CSS. Css discussions growth opportunity explore open-minded oversee. Css Python environmentally friendly collaborate inclusive role. Django no experience oversee dashboard environmentally friendly willing to learn programs. Programs open-minded programs asset.'

In [116]:
fake_jobs.head()

|   | title | company | location | listing_date |
|---|---|---|---|---|
| 0 | Senior Python Developer | Payne, Roberts and Davis | Stewartbury, AA | 2021-04-08 |
| 1 | Energy engineer | Vasquez-Davidson | Christopherville, AA | 2021-04-08 |
| 2 | Legal executive | Jackson, Chambers and Levy | Port Ericaburgh, AA | 2021-04-08 |
| 3 | Fitness centre manager | Savage-Bradley | East Seanview, AP | 2021-04-08 |
| 4 | Product manager | Ramirez Inc | North Jamieview, AP | 2021-04-08 |


In [121]:
# Success... for loop to find all
descriptions = []
for x in apply_buttons:
    result = get_description(x)
    descriptions.append(result)

In [123]:
# Checking
apply_buttons[99]

'https://realpython.github.io/fake-jobs/jobs/ship-broker-99.html'

In [125]:
descriptions[99]

'Management common popular project only. Must small hair strong reveal future girl. Mother anything western Congress. You thought PM charge or put upon. Least building military seem. Glass type structure so magazine worker. Message become of Republican life field game look.'
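Step 3c asks for `Series.apply` rather than a loop, and the notebook never stores the Apply URLs as a DataFrame column. A minimal sketch of that version, assuming a hypothetical `apply_url` column; `fetch_html` and `get_description` are illustrative names, and the `fetch` parameter is an assumption added so the function can be exercised without network access.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup


def fetch_html(url):
    # Download one job page; the 10-second timeout is an arbitrary choice
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def get_description(url, fetch=fetch_html):
    # fetch is injectable so the function can be tried on stored HTML
    soup = BeautifulSoup(fetch(url), 'html.parser')
    paragraph = soup.select_one('div.content p')
    return paragraph.get_text(strip=True) if paragraph else None


# Usage in the notebook (assumes the Apply URLs are first stored as a column):
# fake_jobs['apply_url'] = apply_buttons
# fake_jobs['description'] = fake_jobs['apply_url'].apply(get_description)
```

`Series.apply` forwards extra keyword arguments to the function, so `urls.apply(get_description, fetch=some_callable)` swaps in a different fetcher without changing the function itself.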