# Dataset Creation with Web Scraping 

Typeform is a SaaS company that provides online form and survey creation. OSMI uses Typeform to run its surveys.
To scrape the details of these online surveys, we use the ```requests``` and ```BeautifulSoup``` libraries.

Data Sources:
1. [OSMI 2016 Survey]()
2. [OSMI 2017 Survey]()
3. [OSMI 2018 Survey]()
4. [OSMI 2020 Survey]()

## 01 OSMI 2018 Survey

In [86]:
# Import libraries necessary for web scraping and dataset creation
import re
import sys
import json
import pandas
import requests
from bs4 import BeautifulSoup

# Request typeform survey webpage for contents
res = requests.get(
    'https://osmi.typeform.com/report/xztgPT/NFu2PHjwsMUkkL3h')

# Extract response status and validate for successful transaction
status = res.status_code
if status != 200:
    sys.exit(1)
else:
    print("Web scraping response status:\n", status)

# Parse HTML title, head and body contents using BeautifulSoup
soup = BeautifulSoup(res.content, 'html.parser')

print("Dataset Title:\n", soup.title.text)

Web scraping response status:
 200
Dataset Title:
 OSMI Mental Health in Tech Survey 2018


In [121]:
# Select the script element that holds the survey questions and answers
# (index 11 depends on the page layout at the time of scraping)
script = soup.select('script')[11]

# Regex that captures the report payload assigned to window.__REPORT_PAYLOAD;
# the lookahead stops just before the final '};', so the closing brace is excluded
pattern = re.compile(r"(?<=window\.__REPORT_PAYLOAD = ).*(?=};)")
fields = re.findall(pattern, script.text)

# Restore the closing brace so the string is valid JSON
fields[0] = fields[0] + '}'

# Parse the JSON string into a Python dictionary
json_param = json.loads(fields[0])

# Print the titles that are wrapped in asterisks
for question in json_param['blocks']:
    title = question['title']
    if title.startswith('*') and title.endswith('*'):
        print(title)

*Are you self-employed?*
*If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to clients or business contacts?*
*If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to coworkers or employees?*
*Do you have previous employers?*
*Have your previous employers provided mental health benefits?*
*Were you aware of the options for mental health care provided by your previous employers?*
*Would you have been willing to discuss your mental health with your coworkers at previous employers?*
*What disorder(s) have you been diagnosed with?*
*If possibly, what disorder(s) do you believe you have?*
*Have you observed or experienced an unsupportive or badly handled response to a mental health issue in your current or previous workplace?*
*Have you observed or experienced supportive or well handled response to a mental health issue in your current or previous workplace?*
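
The cell above depends on the payload sitting in the twelfth `script` element and on a regex that needs its closing brace restored by hand. A minimal sketch of a less position-dependent alternative follows. It searches the page text for the `window.__REPORT_PAYLOAD = ` marker and lets `json.JSONDecoder.raw_decode` stop at the end of the JSON object. The names `SAMPLE_HTML`, `extract_payload`, and `starred_titles` are hypothetical, and the sample HTML is a made-up stand-in that assumes only the `blocks` and `title` keys used above, not the real survey page.

```python
import json

# Hypothetical stand-in for the page source; the real payload is far larger
SAMPLE_HTML = """
<html><head><title>Sample Survey</title></head><body>
<script>window.__REPORT_PAYLOAD = {"blocks": [
  {"title": "*Are you self-employed?*"},
  {"title": "Why or why not?"},
  {"title": "*Do you have previous employers?*"}
]};</script>
</body></html>
"""

MARKER = "window.__REPORT_PAYLOAD = "


def extract_payload(html: str) -> dict:
    """Return the JSON object assigned to window.__REPORT_PAYLOAD."""
    start = html.find(MARKER)
    if start == -1:
        raise ValueError("report payload marker not found")
    # raw_decode parses one JSON value and ignores whatever follows it,
    # so the trailing ';' needs no special handling
    payload, _ = json.JSONDecoder().raw_decode(html, start + len(MARKER))
    return payload


def starred_titles(payload: dict) -> list:
    """Collect block titles wrapped in asterisks, with the asterisks removed."""
    titles = []
    for block in payload.get("blocks", []):
        title = block.get("title", "")
        if len(title) >= 2 and title.startswith("*") and title.endswith("*"):
            titles.append(title[1:-1])
    return titles


payload = extract_payload(SAMPLE_HTML)
titles = starred_titles(payload)
print(titles)
```

With the real page, `extract_payload(res.text)` could be called in place of the sample string, assuming the marker appears in the page source exactly as written in the regex above.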
