In [1]:
import requests

Now we will use requests to make an HTTP GET request to the URL 'https://www.python.org/events/python-events/'.

GET - Used to *request* data from a specified resource.

POST - Used to send data to a server to create/update a resource.
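To make the difference between the two methods concrete, here is a minimal sketch that builds one request of each kind without sending it, using requests.Request(...).prepare(). The query parameter, the form field, and the example.com endpoint are made up for illustration and are not part of the python.org site.

```python
import requests

# Build (but do not send) one request of each kind so they can be compared.
get_req = requests.Request(
    "GET",
    "https://www.python.org/events/python-events/",
    params={"page": "2"},  # hypothetical query parameter
).prepare()

post_req = requests.Request(
    "POST",
    "https://example.com/api/events",  # placeholder endpoint, not a real API
    data={"name": "My Meetup"},        # hypothetical form field
).prepare()

print(get_req.method, get_req.url)    # GET parameters travel in the URL
print(get_req.body)                   # a GET normally carries no body
print(post_req.method, post_req.url)
print(post_req.body)                  # POST form data travels in the body
```

With GET, the data ends up in the query string of the URL; with POST, it is encoded into the request body.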

In [2]:
url = 'https://www.python.org/events/python-events/'
req = requests.get(url)

In [3]:
req.text[:200]

'<!doctype html>\n<!--[if lt IE 7]>   <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9">   <![endif]-->\n<!--[if IE 7]>      <html class="no-js ie7 lt-ie8 lt-ie9">          <![endif]-->\n<!--[if IE 8]>      <h'

This shows that we have grabbed the HTML of the web page. We can now use the Beautiful Soup (bs4) module to parse the HTML and retrieve the event data.
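Before parsing, it can be worth confirming that the request actually succeeded, because the text attribute is also filled in for error pages. The sketch below assumes a small helper named html_from, which is hypothetical and not part of the notebook; it relies on raise_for_status(), which raises requests.HTTPError for 4xx and 5xx status codes.

```python
import requests

def html_from(response):
    """Return the HTML of a successful response, or raise on 4xx/5xx."""
    response.raise_for_status()  # raises requests.HTTPError for error codes
    return response.text

# Usage with the URL from this notebook (needs network access):
# html = html_from(requests.get(url, timeout=10))
```

Passing a timeout is optional but avoids waiting indefinitely on an unresponsive server.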

In [4]:
#pip3 install bs4 lxml   (lxml is needed for the 'lxml' parser used below)

from bs4 import BeautifulSoup

Let's create a BeautifulSoup object and pass it the HTML text.

In [5]:
soup = BeautifulSoup(req.text, 'lxml')

In [6]:
events = soup.find('ul', {'class':'list-recent-events'})

In [7]:
type(events.findAll())

bs4.element.ResultSet
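The difference between find and findAll is easier to see on a small inline snippet. This is a sketch on made-up HTML that only mimics the structure of the events list. It uses find_all, the current spelling of findAll, and the built-in 'html.parser' so that it runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Made-up HTML that mimics the structure of the events list.
html = """
<ul class="list-recent-events">
  <li><h3 class="event-title"><a href="/events/1/">Example Conf</a></h3></li>
  <li><h3 class="event-title"><a href="/events/2/">Sample Meetup</a></h3></li>
</ul>
"""

# 'html.parser' ships with Python; the notebook itself uses 'lxml'.
soup = BeautifulSoup(html, "html.parser")

events = soup.find("ul", {"class": "list-recent-events"})  # first match: a Tag
items = events.find_all("li")                              # every match: a ResultSet
missing = soup.find("ul", {"class": "no-such-class"})      # None when nothing matches

print(type(events).__name__, type(items).__name__, missing)
print([li.find("a").text for li in items])
```

Because find returns None when nothing matches, chaining another call onto a failed lookup raises an AttributeError, which is a common source of errors when a page layout changes.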

In [8]:
events.findAll('li')

[<li>
 <h3 class="event-title"><a href="/events/python-events/896/">HackBVICAM National Student’s Convention 2k20</a></h3>
 <p>
 <time datetime="2020-03-13T00:00:00+00:00">13 March<span class="say-no-more"> 2020</span></time>
 <span class="event-location">New Delhi, India</span>
 </p>
 </li>, <li>
 <h3 class="event-title"><a href="/events/python-events/902/">MoscowPythonConf++</a></h3>
 <p>
 <time datetime="2020-03-27T00:00:00+00:00">27 March<span class="say-no-more"> 2020</span></time>
 <span class="event-location">Moscow, Russia</span>
 </p>
 </li>, <li>
 <h3 class="event-title"><a href="/events/python-events/879/">PyCon SK 2020</a></h3>
 <p>
 <time datetime="2020-03-27T00:00:00+00:00">27 March – 29 March <span class="say-no-more"> 2020</span></time>
 <span class="event-location">Bratislava, Slovakia</span>
 </p>
 </li>, <li>
 <h3 class="event-title"><a href="/events/python-events/884/">PyCon Italia 2020</a></h3>
 <p>
 <time datetime="2020-04-02T00:00:00+00:00">02 April – 05 April <s

In [9]:
for event in events.findAll('li'):
    d = dict()
    d['name'] = event.find('h3').find('a').text
    d['time'] = event.find('p').find('time').text
    d['location'] = event.find('span', {'class':'event-location'}).text
    
    print(d)

{'name': 'HackBVICAM National Student’s Convention 2k20', 'time': '13 March 2020', 'location': 'New Delhi, India'}
{'name': 'MoscowPythonConf++', 'time': '27 March 2020', 'location': 'Moscow, Russia'}
{'name': 'PyCon SK 2020', 'time': '27 March – 29 March  2020', 'location': 'Bratislava, Slovakia'}
{'name': 'PyCon Italia 2020', 'time': '02 April – 05 April  2020', 'location': 'Florence, Italy'}
{'name': 'PyCon US 2020', 'time': '15 April – 23 April  2020', 'location': 'Pittsburgh, PA, USA'}
{'name': 'Django Day Copenhagen', 'time': '17 April 2020', 'location': 'Copenhagen, Denmark'}
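The loop above assumes every list item contains a title, a time, and a location. A slightly more defensive variant is sketched below on made-up HTML in which the second event has no location. The helper names clean_text and parse_event are hypothetical; clean_text also collapses the double space that appears in date ranges such as '27 March – 29 March  2020'.

```python
from bs4 import BeautifulSoup

# Made-up HTML; the second event deliberately lacks a location.
html = """
<ul class="list-recent-events">
  <li>
    <h3 class="event-title"><a href="/events/1/">Example Conf</a></h3>
    <p>
      <time datetime="2020-03-27T00:00:00+00:00">27 March – 29 March <span class="say-no-more"> 2020</span></time>
      <span class="event-location">Example City</span>
    </p>
  </li>
  <li>
    <h3 class="event-title"><a href="/events/2/">No Location Meetup</a></h3>
    <p><time datetime="2020-04-17T00:00:00+00:00">17 April<span class="say-no-more"> 2020</span></time></p>
  </li>
</ul>
"""

def clean_text(tag):
    """Return the tag's text with whitespace runs collapsed, or None."""
    if tag is None:
        return None
    return " ".join(tag.get_text().split())

def parse_event(li):
    """Extract name, time, and location from one <li>, tolerating gaps."""
    h3 = li.find("h3")
    return {
        "name": clean_text(h3.find("a") if h3 else None),
        "time": clean_text(li.find("time")),
        "location": clean_text(li.find("span", {"class": "event-location"})),
    }

soup = BeautifulSoup(html, "html.parser")
event_list = soup.find("ul", {"class": "list-recent-events"})
rows = [parse_event(li) for li in event_list.find_all("li")]

for row in rows:
    print(row)
```

Collecting the dictionaries in a list, rather than only printing them, keeps the data available for later steps.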


In [10]:
import pandas as pd

df = pd.DataFrame()

df['name'] = [e.find('a').text for e in events.findAll('h3')]
df['time'] = [e.find('time').text for e in events.findAll('p')]
df['location'] = [e.find('span', {'class':'event-location'}).text for e in events.findAll('li')]

In [11]:
df

Unnamed: 0,name,time,location
0,HackBVICAM National Student’s Convention 2k20,13 March 2020,"New Delhi, India"
1,MoscowPythonConf++,27 March 2020,"Moscow, Russia"
2,PyCon SK 2020,27 March – 29 March 2020,"Bratislava, Slovakia"
3,PyCon Italia 2020,02 April – 05 April 2020,"Florence, Italy"
4,PyCon US 2020,15 April – 23 April 2020,"Pittsburgh, PA, USA"
5,Django Day Copenhagen,17 April 2020,"Copenhagen, Denmark"
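An alternative way to build the same table is to pass a list of dictionaries, like the ones printed by the loop earlier, straight to pd.DataFrame. The sketch below uses two of the rows shown above as literal data, so it runs without fetching the page. Storing plain strings rather than Tag objects keeps the columns easy to filter and export.

```python
import pandas as pd

# Two of the events scraped above, stored as plain strings.
rows = [
    {"name": "MoscowPythonConf++", "time": "27 March 2020",
     "location": "Moscow, Russia"},
    {"name": "PyCon SK 2020", "time": "27 March – 29 March 2020",
     "location": "Bratislava, Slovakia"},
]

# Each dictionary becomes one row; 'columns' fixes the column order.
df = pd.DataFrame(rows, columns=["name", "time", "location"])

print(df.shape)
print(df.loc[0, "location"])
```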


## Things to keep in mind

1. Requests is used to execute HTTP requests. It sends GET and POST requests and returns their responses.

2. The Response object returned by requests.get() holds the result of the request, including the page HTML in its text attribute.

3. We use bs4 to parse the HTML and to find content within it.

## Assignment

Go to https://www.python.org/jobs/ and grab all the recent jobs posted on that page.