**LSE DATA SCIENCE INSTITUTE** 


[DS105M](https://lse-dsi.github.io/lse-ds105-course-notes/) | **Week 04 - Lab Solutions (90 min)**

---



🎯 **OBJECTIVE:** Learn how to collect data from the Web using Python packages

👨‍💻 **AUTHORS:** [@antonboychenko](https://github.com/antonboychenko), [@jonjoncardoso](https://github.com/jonjoncardoso)

📅 **LAST UPDATED:** 09 January 2023


---


## Part 1: Exploring CIVICA

```python
# importing the libraries we need
import requests
from bs4 import BeautifulSoup
```

```python
# sending a GET request
response = requests.get('https://socialdatascience.network/index.html#schedule')
print(response)
```

```
<Response [200]>
```


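```python
# parse the HTML code using Beautiful Soup
# (naming the parser explicitly avoids bs4's "no parser was explicitly specified" warning)
soup = BeautifulSoup(response.text, 'html.parser')
```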
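The next cell selects cards with `attrs={'class': "card mb-4"}`. When a class value contains a space, Beautiful Soup compares it with the whole `class` attribute string, so the order of the classes matters. The sketch below uses made-up HTML (not the real CIVICA page) to contrast this with a CSS selector passed to `select()`, which matches elements carrying both classes in any order. It uses new variable names so it does not overwrite `soup`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup loosely modelled on the event cards; the live page may differ
demo_html = """
<div class="card mb-4"><h6>First event</h6></div>
<div class="mb-4 card"><h6>Same classes, different order</h6></div>
<div class="card"><h6>Not an event card</h6></div>
"""

# 'html.parser' is Python's built-in parser, so no extra install is needed
demo_soup = BeautifulSoup(demo_html, 'html.parser')

# A class string containing a space matches the exact attribute value only
exact_match = demo_soup.find_all('div', attrs={'class': 'card mb-4'})

# A CSS selector matches any <div> that has both classes, in any order
css_match = demo_soup.select('div.card.mb-4')

print([div.h6.get_text() for div in exact_match])  # ['First event']
print([div.h6.get_text() for div in css_match])    # ['First event', 'Same classes, different order']
```

Either approach works for this page as long as its markup keeps the classes in the same order; `select()` is the more forgiving choice if it does not.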

```python
# creating empty lists to store data
links = []
events = []
speakers = []
dates = []

# extracting all the tags that contain separate event cards
event_cards = soup.find_all('div', attrs={'class': "card mb-4"})

# iterating through cards to find the information we need
for card in event_cards:

    # get links, event names, speakers and dates
    link = 'https://socialdatascience.network/' + card.find('a').get('href')
    event = card.find('h6').get_text()
    name_and_date = card.find('p').get_text()

    # cleaning the name of the speaker
    speaker_name = name_and_date.split('Date')[0].replace('Speaker: ', '').strip()

    # cleaning the date
    date = name_and_date.split('Date')[1].replace(': ', '').strip()

    # for each event appending to the appropriate list
    links.append(link)
    events.append(event)
    speakers.append(speaker_name)
    dates.append(date)
```
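The body of the loop can also be packaged as a function, which makes the cleaning steps easy to test on a single card. The sketch below is a hypothetical helper, `parse_card`, run on invented markup and an invented `href`. It makes two deliberate substitutions: `urllib.parse.urljoin` instead of string concatenation (it copes with relative links, links starting with `/`, and absolute URLs), and `split('Date', 1)` so that everything after the first `Date` stays together.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = 'https://socialdatascience.network/'


def parse_card(card, base_url=BASE_URL):
    """Return one event card as a dict (hypothetical helper mirroring the loop body)."""
    # urljoin handles relative hrefs, hrefs starting with '/', and absolute URLs
    link = urljoin(base_url, card.find('a').get('href'))
    # strip() removes stray whitespace around the title text
    event = card.find('h6').get_text().strip()
    name_and_date = card.find('p').get_text()

    # maxsplit=1: text before the first 'Date' is the speaker part, the rest is the date part
    speaker_part, date_part = name_and_date.split('Date', 1)
    speaker = speaker_part.replace('Speaker:', '').strip()
    # drop only the first colon (the one after 'Date')
    date = date_part.replace(':', '', 1).strip()
    return {'name': event, 'speaker': speaker, 'date': date, 'link': link}


# Hypothetical card markup; the <br> contributes no text, so speaker and date run together
sample_html = """
<div class="card mb-4">
  <a href="fall2022/session.html">More</a>
  <h6>Does Epistemic Vice Explain Corporate Misconduct?</h6>
  <p>Speaker: Dr. Marco Meyer<br>Date: Wednesday, 19 October 2022</p>
</div>
"""
demo_card = BeautifulSoup(sample_html, 'html.parser').find('div', attrs={'class': 'card mb-4'})
print(parse_card(demo_card))
```

Splitting on the literal word `Date` still assumes it never appears inside a speaker's name; a card without `Date` makes the unpacking raise `ValueError`.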

## Part 2: Let's make a dataframe

```python
# import pandas
import pandas as pd

# create a dataframe and populate it with the information collected
CIVICA_df = pd.DataFrame({'name': events,
                          'speaker': speakers,
                          'date': dates,
                          'link': links})


# checking that the dataframe was created correctly
CIVICA_df.head()
```

| | name | speaker | date | link |
|---|---|---|---|---|
| 0 | Introducing the Online Harms Observatory: AI p... | Pica Johnsson | Wednesday, 11 January 2023 | https://socialdatascience.network/spring2023/s... |
| 1 | Using Open Source Data Streams and Surveys to ... | Prof. Lisa Singh | Wednesday, 02 November 2022 | https://socialdatascience.network/fall2022/ses... |
| 2 | Does Epistemic Vice Explain Corporate Misconduct? | Dr. Marco Meyer | Wednesday, 19 October 2022 | https://socialdatascience.network/fall2022/ses... |
| 3 | Becoming a data scientist: what it means to pu... | Prof. Anne Beaulieu | Wednesday, 14 September 2022 | https://socialdatascience.network/spring2022/s... |
| 4 | The Making of a French Migration Crisis | Dr. Michelle Reddy & Dr. Hélène Thiollet | Wednesday, 15 June 2022 | https://socialdatascience.network/spring2022/s... |
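Building the dataframe from four parallel lists works because each pass through the loop appends to every list, so they stay aligned; if they ever differ in length, `pd.DataFrame` raises a `ValueError`. A common alternative, sketched here with two rows copied from the output above, is to collect one dictionary per card and pass the list of dictionaries to `pd.DataFrame`, so each row's fields always travel together. The variable names are illustrative.

```python
import pandas as pd

# One dict per event; in the notebook these would be built inside the card loop
records = []
records.append({'name': 'Does Epistemic Vice Explain Corporate Misconduct?',
                'speaker': 'Dr. Marco Meyer',
                'date': 'Wednesday, 19 October 2022'})
records.append({'name': 'The Making of a French Migration Crisis',
                'speaker': 'Dr. Michelle Reddy & Dr. Hélène Thiollet',
                'date': 'Wednesday, 15 June 2022'})

# Each dict becomes a row; the keys become the column names
seminars_demo = pd.DataFrame(records)
print(seminars_demo)
```

If a dictionary is missing a key, pandas fills that cell with `NaN` instead of shifting the remaining values into the wrong rows.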


```python
# save the dataframe as CSV
# (index=False stops pandas writing the row numbers as an extra, unnamed column)
CIVICA_df.to_csv('CIVICA_seminars.csv', index=False)
```
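By default, `DataFrame.to_csv` also writes the row index as a first column with an empty header. When the file is read back with `pd.read_csv`, that column reappears as `Unnamed: 0`. The round trip below uses an in-memory buffer (`io.StringIO`) and a one-row dataframe so no file is written; `demo_df` is an illustrative name.

```python
import io

import pandas as pd

demo_df = pd.DataFrame({'name': ['Does Epistemic Vice Explain Corporate Misconduct?'],
                        'speaker': ['Dr. Marco Meyer']})

# to_csv() with no path returns the CSV text as a string
with_index = pd.read_csv(io.StringIO(demo_df.to_csv()))
print(with_index.columns.tolist())     # ['Unnamed: 0', 'name', 'speaker']

# index=False leaves the index out, so only the real columns come back
without_index = pd.read_csv(io.StringIO(demo_df.to_csv(index=False)))
print(without_index.columns.tolist())  # ['name', 'speaker']
```

Alternatively, keep the default and read the file with `pd.read_csv(path, index_col=0)` to turn that first column back into the index.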

## Bonus task: Tell me more

```python
# create an empty list to store bios
bios = []

# iterate through all links that were saved before
for link in CIVICA_df['link']:

    try:
        # GET a response (the timeout stops one slow page from hanging the loop)
        response = requests.get(link, timeout=10)

        # parse HTML
        soup = BeautifulSoup(response.text, 'html.parser')

        # get the bio of the speaker
        bio = soup.find('p', attrs={'class': 'card-text'}).get_text()

        # add bio to the list
        bios.append(bio)

    except (requests.RequestException, AttributeError):
        # the request failed, or the page has no <p class="card-text"> (find() returned None)
        bios.append('No response')
```
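The loop above mixes downloading and parsing in one `try` block. A sketch that separates them, under the assumption that each session page keeps the bio in the first `<p class="card-text">`, is shown below. The helpers `extract_bio`, `fetch_bio` and `fetch_all_bios` are hypothetical names; the parsing step can be tested on an HTML string without any network access, and `raise_for_status()` treats 4xx/5xx responses as failures instead of parsing an error page. The one-second pause between requests is an arbitrary, polite default.

```python
import time

import requests
from bs4 import BeautifulSoup


def extract_bio(html):
    """Return the text of the first <p class="card-text">, or None if there is none."""
    page = BeautifulSoup(html, 'html.parser')
    tag = page.find('p', attrs={'class': 'card-text'})
    return tag.get_text().strip() if tag is not None else None


def fetch_bio(url, timeout=10):
    """Download one session page and return its bio, or a placeholder string on failure."""
    try:
        response = requests.get(url, timeout=timeout)
        # Raise requests.HTTPError for 4xx/5xx status codes
        response.raise_for_status()
    except requests.RequestException:
        return 'No response'
    bio = extract_bio(response.text)
    return bio if bio is not None else 'No bio found'


def fetch_all_bios(urls, delay=1.0):
    """Fetch bios one at a time, pausing between requests to go easy on the server."""
    bios = []
    for url in urls:
        bios.append(fetch_bio(url))
        time.sleep(delay)
    return bios
```

Usage in the notebook would be `CIVICA_df['speaker_bio'] = fetch_all_bios(CIVICA_df['link'])`; distinguishing "request failed" from "no bio on the page" also makes it easier to see which links need a closer look.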

```python
# adding bios to the dataframe
CIVICA_df['speaker_bio'] = bios

# checking the dataframe
CIVICA_df.head()
```

| | name | speaker | date | link | speaker_bio |
|---|---|---|---|---|---|
| 0 | Introducing the Online Harms Observatory: AI p... | Pica Johnsson | Wednesday, 11 January 2023 | https://socialdatascience.network/spring2023/s... | Pica is a researcher on the Online Safety Team... |
| 1 | Using Open Source Data Streams and Surveys to ... | Prof. Lisa Singh | Wednesday, 02 November 2022 | https://socialdatascience.network/fall2022/ses... | Lisa Singh is the Director of the Massive Data... |
| 2 | Does Epistemic Vice Explain Corporate Misconduct? | Dr. Marco Meyer | Wednesday, 19 October 2022 | https://socialdatascience.network/fall2022/ses... | Dr. Marco Meyer is the principal investigator ... |
| 3 | Becoming a data scientist: what it means to pu... | Prof. Anne Beaulieu | Wednesday, 14 September 2022 | https://socialdatascience.network/spring2022/s... | Prof. Anne Beaulieu holds the Aletta Jacobs Ch... |
| 4 | The Making of a French Migration Crisis | Dr. Michelle Reddy & Dr. Hélène Thiollet | Wednesday, 15 June 2022 | https://socialdatascience.network/spring2022/s... | Michelle Reddy is a Postdoctoral Scholar at CE... |
