## **Import Libraries**

In [3]:
#!pip install selenium
#!pip install webdriver-manager
#!pip install pyyaml ua-parser user-agents fake-useragent

################ WEB SCRAPING MODULES ############
from selenium import webdriver
from webdriver_manager.firefox import GeckoDriverManager
from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.utils import ChromeType
from selenium.webdriver.common.by import By
import bs4
from fake_useragent import UserAgent
import requests
################ TIME MODULES ####################
import time
from datetime import date 
import datetime
############## DATA MANIPULATION MODULES #########
import os
import pandas as pd
import numpy as np
import re
from nltk.corpus import stopwords
# If the stop-word corpus is missing, run once: import nltk; nltk.download('stopwords')

## **Define web source**

In [4]:
link = 'https://www.autism.org.uk/advice-and-guidance/what-is-autism/asperger-syndrome'

## **Read the list of 100+ questions**

In [5]:
df = pd.read_excel('ASDquestions2.xlsx',engine='openpyxl')

df[link]=np.nan
df

Unnamed: 0,Question,https://birchtreecenter.org/learn/autism,https://www.myautismteam.com/resources/autism-an-overview,https://www.autism.org.uk/advice-and-guidance/what-is-autism/asperger-syndrome
0,What are the Autism Spectrum Disorders (ASD)?,ASD refers to a wide spectrum of neurodevelopm...,,
1,How common is autism?,According to a 2020 report commissioned by the...,It is estimated that in the United States 1.6 ...,
2,What causes autism? Can it be cured?,The causes of this complex disorder remain unc...,,
3,Is autism contagious?,,Autism is not a contagious condition. Autism i...,
4,Are rates of autism increasing?,,Estimates released by the Centers for Disease ...,
...,...,...,...,...
104,What are some ways that parents can reduce the...,,,
105,Do some families deal with stress better than ...,,,
106,Do siblings suffer increased stress as a resul...,,,
107,What can I do about my children’s stress?,,,


## **Scrape QA pairs from website**

In [6]:
# Open webpage in a new window for scraping
#driver = webdriver.Chrome(ChromeDriverManager(chrome_type=ChromeType.GOOGLE).install())   #cannot fix in colab

driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get(link)



Current google-chrome version is 92.0.4515
Get LATEST driver version for 92.0.4515
Get LATEST driver version for 92.0.4515
Trying to download new driver from https://chromedriver.storage.googleapis.com/92.0.4515.107/chromedriver_linux64.zip
Driver has been saved in cache [/home/aceirus/.wdm/drivers/chromedriver/linux64/92.0.4515.107]


In [7]:
# Parse text in webpage
source = driver.page_source
soup = bs4.BeautifulSoup(source, 'html.parser')

In [8]:
# Search the questions mentioned in webpage
quesList = []
for ques in soup.find_all('h2'):
    print(ques.text)
    quesList.append(ques.text)

'Asperger syndrome' was introduced to the world by British psychiatrist Lorna Wing in the 1980s.
How common is Asperger syndrome?
How do people with Asperger syndrome see the world?
The benefits of an Asperger syndrome diagnosis
How Asperger syndrome is diagnosed
Differences in communication 
Differences in social interaction
Repetitive behaviours and routines
Highly focused interests
Different names and terms for autism
The problematic history of Hans Asperger
Join the community
Cookies on this site
Please wait while we check your current settings 
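
The last three headings printed above ("Join the community", "Cookies on this site", "Please wait while we check your current settings") are site chrome rather than page content. A minimal sketch of filtering them out before matching, assuming a hand-written block list (`SITE_CHROME` and `drop_site_chrome` are hypothetical names, not part of the notebook):

```python
# A few headings as scraped above; the last three are site chrome, not content.
headings = [
    "How common is Asperger syndrome?",
    "Differences in communication ",
    "Join the community",
    "Cookies on this site",
    "Please wait while we check your current settings ",
]

# Hypothetical block list, copied from the printed output above.
SITE_CHROME = {
    "Join the community",
    "Cookies on this site",
    "Please wait while we check your current settings",
}

def drop_site_chrome(items, blocked=SITE_CHROME):
    """Strip surrounding whitespace and drop headings found in `blocked`."""
    cleaned = (item.strip() for item in items)
    return [item for item in cleaned if item and item not in blocked]

content_headings = drop_site_chrome(headings)
print(content_headings)
```

Because the dropped headings sit at the end of the scraped list, filtering them would not shift the positions of the earlier headings.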


In [11]:
# Search the answers for questions
ansList = []
#for ans in soup.find_all('p'):
for ans in soup.find_all('dd'):

    print(ans.text)
    print('*'*100)
    ansList.append(ans.text)


Autism, including Asperger syndrome, is much more common than most people think. There are around 700,000 autistic people in the UK – that's more than 1 in 100. People with Asperger syndrome come from all nationalities and cultural, religious and social backgrounds.  Historically, more men have been diagnosed as autistic than women, although this is beginning to change. 

****************************************************************************************************

Some people with Asperger syndrome say the world feels overwhelming and this can cause them considerable anxiety. In particular, understanding and relating to other people, and taking part in everyday family, school, work and social life, can be harder. Other people appear to know, intuitively, how to communicate and interact with each other, yet can also struggle to build rapport with people with Asperger syndrome. People with Asperger syndrome may wonder why they are 'different' and feel their social differences me

## **Check which questions are similar**

In [12]:
stop_words = set(stopwords.words('english'))

def clean_text(sent):
    sent = sent.lower()  # lowercase
    sent = re.sub(r'[^\w\s]', '', sent)  # remove punctuation
    # Compress the term; the text is already lowercase, so match lowercase
    sent = re.sub(r'autism spectrum disorders?', 'asd', sent)
    words = [w for w in sent.split() if w not in stop_words]  # remove stop words
    return " ".join(words)
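
To see what the cleaning steps do to a single question, here is a small sketch that runs without NLTK. It assumes a tiny hand-written stop-word set (`DEMO_STOP_WORDS`) in place of the NLTK English list, and `clean_text_demo` is a hypothetical stand-in for `clean_text`. Since the text is lowercased first, the term pattern is written in lowercase.

```python
import re

# Assumption: a tiny stand-in for NLTK's English stop-word list, so the
# sketch runs without downloading the corpus.
DEMO_STOP_WORDS = {"what", "are", "the", "is", "in", "how", "a", "of"}

def clean_text_demo(sent, stop_words=DEMO_STOP_WORDS):
    """Lowercase, strip punctuation, compress the ASD term, drop stop words."""
    sent = sent.lower()
    sent = re.sub(r"[^\w\s]", "", sent)
    # The text is already lowercase here, so the pattern is lowercase too.
    sent = re.sub(r"autism spectrum disorders?", "asd", sent)
    return " ".join(w for w in sent.split() if w not in stop_words)

print(clean_text_demo("What are the Autism Spectrum Disorders (ASD)?"))  # asd asd
print(clean_text_demo("Differences in communication "))  # differences communication
```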

In [13]:
def jaccard_similarity(list1, list2):
    s1 = set(list1)
    s2 = set(list2)
    union = s1.union(s2)
    if not union:  # both inputs empty, e.g. a heading made only of stop words
        return 0.0
    return len(s1.intersection(s2)) / len(union)
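
Jaccard similarity is the number of shared tokens divided by the number of distinct tokens across both inputs. As a worked example, take the cleaned tokens of "Differences in communication" and "Facilitated Communication", assuming "in" has been dropped as a stop word. They share one token out of three distinct ones, which is consistent with the 0.333 reported by the matching cell later in the notebook. The `jaccard` helper below is a standalone sketch with an added guard for two empty inputs.

```python
def jaccard(tokens_a, tokens_b):
    """Size of the shared vocabulary divided by the size of the combined one."""
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 0.0  # avoid dividing by zero when both inputs are empty
    return len(a & b) / len(a | b)

website = ["differences", "communication"]
ours = ["facilitated", "communication"]

score = jaccard(website, ours)
print(score)  # 1 shared token / 3 distinct tokens
```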

In [14]:
# Match each website heading against our own list of 100+ questions
for c1, site_q in enumerate(quesList):
    site_tokens = clean_text(site_q).split()

    best_sim = 0.0
    best_question = ''
    best_idx = 0

    for c2, own_q in enumerate(df['Question']):
        own_tokens = clean_text(own_q).split()

        sim = jaccard_similarity(site_tokens, own_tokens)
        if sim > best_sim:
            best_sim = sim
            best_question = own_q
            best_idx = c2

    if best_sim >= 0.3:  # similarity threshold
        print('Website --> ', site_q, '(Index {})'.format(c1))
        print('100 questions list --> ', best_question, '(Index {})'.format(best_idx))
        print('similarity:', best_sim)
        print('*'*100)

Website -->  Differences in communication  (Index 5)
100 questions list -->  Facilitated Communication (Index 41)
similarity: 0.3333333333333333
****************************************************************************************************
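
The matching cell prints its results, so they cannot be reused when filling the dataframe. A sketch of the same best-match search that returns index pairs instead is shown below. `best_matches` is a hypothetical helper, the inputs are assumed to be already cleaned strings, and the 0.3 threshold is the one used above.

```python
def jaccard(tokens_a, tokens_b):
    """Shared tokens divided by distinct tokens; 0.0 if both are empty."""
    a, b = set(tokens_a), set(tokens_b)
    return len(a & b) / len(a | b) if (a or b) else 0.0

def best_matches(site_questions, own_questions, threshold=0.3):
    """Return (site_idx, own_idx, score) for every site question whose
    best-scoring own question reaches `threshold`."""
    matches = []
    for i, site_q in enumerate(site_questions):
        site_tokens = site_q.split()
        best_j, best_score = None, 0.0
        for j, own_q in enumerate(own_questions):
            score = jaccard(site_tokens, own_q.split())
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None and best_score >= threshold:
            matches.append((i, best_j, best_score))
    return matches

# Toy, pre-cleaned inputs (assumption: stop words and punctuation removed).
site = ["differences communication", "join community"]
own = ["autism contagious", "facilitated communication"]

matches = best_matches(site, own)
print(matches)
```

Each tuple gives the position of the website heading, the position of the matched question in the existing list, and the similarity score.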


In [15]:
for i, ques in enumerate(quesList):
    print("Index #{0:d}: {1:s}".format(i, ques))

Index #0: 'Asperger syndrome' was introduced to the world by British psychiatrist Lorna Wing in the 1980s.
Index #1: How common is Asperger syndrome?
Index #2: How do people with Asperger syndrome see the world?
Index #3: The benefits of an Asperger syndrome diagnosis
Index #4: How Asperger syndrome is diagnosed
Index #5: Differences in communication 
Index #6: Differences in social interaction
Index #7: Repetitive behaviours and routines
Index #8: Highly focused interests
Index #9: Different names and terms for autism
Index #10: The problematic history of Hans Asperger
Index #11: Join the community
Index #12: Cookies on this site
Index #13: Please wait while we check your current settings 


In [16]:
for i, ans in enumerate(ansList):
    print("Index #{0:d}: {1:s}".format(i, ans))

Index #0: 
Autism, including Asperger syndrome, is much more common than most people think. There are around 700,000 autistic people in the UK – that's more than 1 in 100. People with Asperger syndrome come from all nationalities and cultural, religious and social backgrounds.  Historically, more men have been diagnosed as autistic than women, although this is beginning to change. 

Index #1: 
Some people with Asperger syndrome say the world feels overwhelming and this can cause them considerable anxiety. In particular, understanding and relating to other people, and taking part in everyday family, school, work and social life, can be harder. Other people appear to know, intuitively, how to communicate and interact with each other, yet can also struggle to build rapport with people with Asperger syndrome. People with Asperger syndrome may wonder why they are 'different' and feel their social differences mean people don’t understand them.

Autistic people often do not 'look' disabled. S

In [17]:
# add websource answer to matched question in existing dataframe
#df.loc[1, link] = ansList[2]  # single .loc call avoids chained assignment

In [18]:
df.loc[1, link]

nan

In [31]:
# update with new valid questions list
quesListUpd = quesList[1:5]+quesList[9:10]

quesListUpd

['How common is Asperger syndrome?',
 'How do people with Asperger syndrome see the world?',
 'The benefits of an Asperger syndrome diagnosis',
 'How Asperger syndrome is diagnosed',
 'Different names and terms for autism']
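
The slices above keep headings 1 to 4 and 9, and the next cell picks answers 0 to 3 and 8 to go with them. One way to keep the two hand-made selections in sync is a single mapping from heading index to answer index. The sketch below uses toy lists in place of `quesList` and `ansList`; `QA_INDEX`, `ques_demo` and `ans_demo` are hypothetical names.

```python
# Toy stand-ins for quesList and ansList; only the positions matter here.
ques_demo = [f"heading {i}" for i in range(14)]
ans_demo = [f"answer {i}" for i in range(9)]

# Heading index -> answer index, as selected by hand in the notebook.
QA_INDEX = {1: 0, 2: 1, 3: 2, 4: 3, 9: 8}

# Dicts preserve insertion order, so pairs come out in the order listed above.
qa_pairs = [(ques_demo[q], ans_demo[a]) for q, a in QA_INDEX.items()]

for question, answer in qa_pairs:
    print(question, "->", answer)
```

With the pairing written down in one place, a mismatch between a heading and its answer is easier to spot than when the two lists are sliced separately.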

In [32]:
# manually select answers to updated questions list
ansListUpd = [ansList[0],
              ansList[1],
              ansList[2],
              ansList[3],
              ansList[8]]

ansListUpd

["\nAutism, including Asperger syndrome, is much more common than most people think. There are around 700,000 autistic people in the UK – that's more than 1 in 100. People with Asperger syndrome come from all nationalities and cultural, religious and social backgrounds. \xa0Historically, more men have been diagnosed as autistic than\xa0women, although this is beginning to change.\xa0\n",
 "\nSome people with Asperger syndrome say the world feels overwhelming and this can cause them considerable anxiety. In particular, understanding and relating to other people, and taking part in everyday family, school, work and social life, can be harder. Other people appear to know, intuitively, how to communicate and interact with each other, yet can also struggle to build rapport with people with Asperger syndrome. People with Asperger syndrome may wonder why they are 'different' and feel their social differences mean people don’t understand them.\n\nAutistic people often do not 'look' disabled. S

In [33]:
# Create new dataframe with QA pairs
df2 = pd.DataFrame(zip(quesListUpd,ansListUpd),columns=['Question',link])
df2

Unnamed: 0,Question,https://www.autism.org.uk/advice-and-guidance/what-is-autism/asperger-syndrome
0,How common is Asperger syndrome?,"\nAutism, including Asperger syndrome, is much..."
1,How do people with Asperger syndrome see the w...,\nSome people with Asperger syndrome say the w...
2,The benefits of an Asperger syndrome diagnosis,\nSome people see a formal diagnosis as an unh...
3,How Asperger syndrome is diagnosed,\nThe characteristics of Asperger syndrome var...
4,Different names and terms for autism,"\nOver the years, different diagnostic labels ..."
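
The scraped answers in the table above still carry leading `\n` characters, and the raw strings also contain non-breaking spaces (`\xa0`). A minimal sketch of a clean-up step that could be applied before building the dataframe is shown below. `normalize_whitespace` is a hypothetical helper and the sample string is an abbreviated illustration, not the full scraped answer.

```python
import re

def normalize_whitespace(text):
    """Replace non-breaking spaces and collapse whitespace runs to one space."""
    text = text.replace("\xa0", " ")
    return re.sub(r"\s+", " ", text).strip()

# Abbreviated sample in the style of the scraped answers.
raw = "\nPeople come from all backgrounds. \xa0Historically, more men than\xa0women.\xa0\n"

clean = normalize_whitespace(raw)
print(clean)
```

Note that this also flattens paragraph breaks inside an answer into single spaces, which may or may not be wanted.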


In [34]:
# Concatenate existing and new dataframes
df3 = pd.concat([df,df2],axis=0)
df3 = df3.sort_values(by=list(df3.columns[1:])).reset_index(drop=True)
df3.head(15)

Unnamed: 0,Question,https://birchtreecenter.org/learn/autism,https://www.myautismteam.com/resources/autism-an-overview,https://www.autism.org.uk/advice-and-guidance/what-is-autism/asperger-syndrome
0,What are the Autism Spectrum Disorders (ASD)?,ASD refers to a wide spectrum of neurodevelopm...,,
1,How common is autism?,According to a 2020 report commissioned by the...,It is estimated that in the United States 1.6 ...,
2,What causes autism? Can it be cured?,The causes of this complex disorder remain unc...,,
3,Is autism contagious?,,Autism is not a contagious condition. Autism i...,
4,Are rates of autism increasing?,,Estimates released by the Centers for Disease ...,
5,Is autism a new condition?,,"It is likely that autism has always existed, b...",
6,Is there a cure for autism?,,"There is no cure for autism. However, early in...",
7,How Is Autism Diagnosed?,,There is no one single conclusive test for aut...,
8,Is autism permanent?,,There is some controversy on the topic of whet...,
9,How Is Autism Treated?,,Treatment for autism depends largely on the in...,


## **Save Output**

In [35]:
df3.to_excel('ASDquestions4.xlsx',index=False)