# Dataset Creation with Web Scraping 

Typeform is an online SaaS company for creating web forms and surveys. OSMI (Open Sourcing Mental Illness) uses it to run its surveys.
To scrape the details of these online surveys, we use the `requests` and `BeautifulSoup` libraries.

Data Sources:
1. [OSMI 2016 Survey]()
2. [OSMI 2017 Survey]()
3. [OSMI 2018 Survey]()
4. [OSMI 2020 Survey]()

## 01 OSMI 2018 Survey

In [86]:
# Import libraries necessary for web scraping and dataset creation
import re
import sys
import json
import pandas
import requests
from bs4 import BeautifulSoup

# Request typeform survey webpage for contents
res = requests.get(
    'https://osmi.typeform.com/report/xztgPT/NFu2PHjwsMUkkL3h')

# Check the response status and exit if the request did not succeed
status = res.status_code
if status != 200:
    sys.exit(1)
else:
    print("Web scraping response status:\n", status)

# Parse HTML title, head and body contents using BeautifulSoup
soup = BeautifulSoup(res.content, 'html.parser')

print("Dataset Title:\n", soup.title.text)

Web scraping response status:
 200
Dataset Title:
 OSMI Mental Health in Tech Survey 2018
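
The status check above calls `sys.exit(1)` on any non-200 response, which also stops the notebook session. As a rough alternative, the sketch below raises an exception that carries the status code. The helper name `fetch_html` and its `get` parameter are hypothetical and not part of the original notebook; `get` is assumed to be any callable that behaves like `requests.get`.

```python
def fetch_html(url, get, timeout=10.0):
    """Return the body of `url` as text, or raise if the request fails.

    `get` is any callable that accepts (url, timeout=...) and returns an
    object with `status_code` and `text`, for example `requests.get`.
    Passing it in keeps this sketch free of third-party imports.
    """
    res = get(url, timeout=timeout)
    if res.status_code != 200:
        # Raise instead of exiting so the caller can decide how to recover
        raise RuntimeError(
            f"Request to {url} failed with status {res.status_code}")
    return res.text
```

In the notebook this could be called as `fetch_html('https://osmi.typeform.com/report/xztgPT/NFu2PHjwsMUkkL3h', get=requests.get)` and the result passed to `BeautifulSoup` as before.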


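The next cell locates the payload through a fixed script index and a greedy regex, then re-appends the closing brace by hand. This can break if the page adds a script tag or if another `};` appears later on the same line. A more defensive variant, sketched here on the assumption that the payload is a plain JSON object placed directly after `window.__REPORT_PAYLOAD = `, searches the page text for that marker and lets the JSON decoder find where the object ends. The helper name `extract_report_payload` is hypothetical, and the sample string is made up for illustration. In the notebook it would be called as `extract_report_payload(res.text)`.

```python
import json

# Assignment prefix that the regex in the next cell also looks for
MARKER = "window.__REPORT_PAYLOAD = "


def extract_report_payload(page_text, marker=MARKER):
    """Decode the JSON object that directly follows `marker` in `page_text`."""
    start = page_text.find(marker)
    if start == -1:
        raise ValueError(f"Marker {marker!r} not found in page text")
    # raw_decode parses one complete JSON value starting at the given index
    # and ignores whatever follows, so the trailing ';' needs no trimming.
    payload, _end = json.JSONDecoder().raw_decode(page_text, start + len(marker))
    return payload


# Made-up script body (not real survey data)
sample = ('window.__REPORT_PAYLOAD = '
          '{"blocks": [{"title": "Q1"}, {"title": "Q2"}]};\nvar x = {};')
print(len(extract_report_payload(sample)["blocks"]))  # 2
```
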
In [127]:
# Select content inside the script element that contains information about the survey questions and answers
script = soup.select('script')[11]

# Define a regex that captures the report payload assigned to window.__REPORT_PAYLOAD
pattern = re.compile(r"(?<=window\.__REPORT_PAYLOAD = ).*(?=};)")
fields = re.findall(pattern, script.text)

# Restore the closing brace excluded by the lookahead so the string is valid JSON
fields[0] = fields[0] + '}'

# Parse the JSON string into a Python dictionary
json_param = json.loads(fields[0])

# Print all the questions asked in the survey
print("Number of questions in the survey:", len(json_param['blocks']))
print('-'*100)
for question in json_param['blocks']:
    # Remove asterisks from the question title before printing
    title = question['title'].replace('*', '')
    print(title)

Number of questions in the survey: 68
----------------------------------------------------------------------------------------------------
Are you self-employed?
How many employees does your company or organization have?
Is your employer primarily a tech company/organization?
Is your primary role within your company related to tech/IT?
Does your employer provide mental health benefits as part of healthcare coverage?
Do you know the options for mental health care available under your employer-provided health coverage?
Has your employer ever formally discussed mental health (for example, as part of a wellness campaign or other official communication)?
Does your employer offer resources to learn more about mental health disorders and options for seeking help?
Is your anonymity protected if you choose to take advantage of mental health or substance abuse treatment resources provided by your employer?
If a mental health issue prompted you to request a medical leave from work, how easy or di