# Webscraping dynamic page content

### My husband needed information from ASQ (American Society for Quality) about newly certified members. ASQ's site offers some search options, but none as specific as he needed, and he would have had to copy and paste dozens of pages of search results into a spreadsheet to filter them further. I offered to find a webscraping option for him. My thanks to Damien Martin for this helpful post, https://kiwidamien.github.io/using-api-calls-via-the-network-panel.html, which explains how to determine the correct URL for accessing search results.

### Takeaways from this:

## 1. In previous projects, I had struggled to webscrape pages that required user input (search fields, buttons, etc.). This project gave me a great opportunity to learn something new and apply it in a way that saved time and reduced manual input.

## 2. Ultimately, I found similar functionality in Excel, which my husband's team uses more widely and can therefore adopt more easily, so this process won't be implemented for now.

## 3. There will often be multiple options for achieving results. Determining which one best fits a team's skills and needs is a critical step.

## 4. This was a quick project I did in the middle of a larger data science project. I hesitated to share it at first, because it only took me a few hours and didn't seem impactful enough to be worthwhile, but EVERY project has the potential both to increase my knowledge and experience and to be helpful to others.

In [17]:
import json
import requests
import pandas as pd

In [18]:
url = 'https://asq.org/cert/asq/VerificationPager.aspx?page=0&size=2000&Certification=&Country=US&State=IA&Province=&Name='

# Make the same request that the page's JavaScript makes
r = requests.get(url, timeout=30)
r.raise_for_status()  # stop here if the server returned an error status
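The query string in the URL above can also be assembled from named parameters, which makes it easier to change the state or page size later. This is a minimal sketch using only the standard library; `build_search_url` is a hypothetical helper, and the parameter names are simply copied from the URL above. How the endpoint reacts to other values is an assumption I have not verified.

```python
from urllib.parse import urlencode

BASE_URL = "https://asq.org/cert/asq/VerificationPager.aspx"


def build_search_url(state, country="US", page=0, size=2000, certification="", name=""):
    """Assemble the search URL from named parameters (hypothetical helper)."""
    # Keys are listed in the same order as in the original URL
    params = {
        "page": page,
        "size": size,
        "Certification": certification,
        "Country": country,
        "State": state,
        "Province": "",
        "Name": name,
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_search_url("IA")
print(url)
```

The resulting string can be passed to `requests.get` exactly like the hard-coded URL.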

In [19]:
#Verifying the correct info was collected
print(r.text)

[1606,[["9601", "Abele, Grant D.", "Garner, IA", "UNITED STATES", "CQI", "March, 2000", "Active"],["516", "Abele, Grant D.", "Garner, IA", "UNITED STATES", "CQIA", "June, 2001", "Active"],["20318", "Abodeely, Paul D.", "Lisbon, IA", "UNITED STATES", "CQT", "March, 2007", "Active"],["7036", "Adams, David A.", "Marion, IA", "UNITED STATES", "CQT", "March, 1992", "Active"],["3529", "Adrian, Edwin M.", "West Union, IA", "UNITED STATES", "CQI", "March, 1992", "Active"],["21845", "Agarwal, Nikhil", "Davenport, IA", "UNITED STATES", "CQT", "October, 2010", "Active"],["2448", "Ahmed, Ovais", "Bettendorf, IA", "UNITED STATES", "CSSBB", "October, 2004", "Active"],["36869", "Ahmed, Ovais", "Bettendorf, IA", "UNITED STATES", "CQE", "June, 1997", "Active"],["15906", "Ahmed, Ovais", "Bettendorf, IA", "UNITED STATES", "CQA", "December, 1996", "Active"],["65338", "Alagappan Thangavel, Saravanan", "Coralville, IA", "UNITED STATES", "CQA", "May, 2016", "Active"],["98838", "Alagappan Thangavel, Saravanan
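The output above looks like a two-element list: a number followed by a list of rows. Assuming the number is the total count of matching records (an inference from the output, not something ASQ documents here), a quick check can compare it with the number of rows actually received. The sketch below runs on a shortened stand-in for `r.text` built from the first two rows shown above; `split_payload` is a hypothetical helper name.

```python
import json

# Shortened stand-in for r.text: the first two rows of the output above
sample_text = (
    '[1606,[["9601", "Abele, Grant D.", "Garner, IA", "UNITED STATES", "CQI", "March, 2000", "Active"],'
    '["516", "Abele, Grant D.", "Garner, IA", "UNITED STATES", "CQIA", "June, 2001", "Active"]]]'
)


def split_payload(text):
    """Return (reported_total, rows) from a [total, [row, ...]] payload."""
    total, rows = json.loads(text)
    return total, rows


total, rows = split_payload(sample_text)
print(total, len(rows))
if total != len(rows):
    # Expected for this shortened sample; on real data it may mean more pages exist
    print("Fewer rows than the reported total; more pages may be needed.")
```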

In [5]:
# Parse the JSON response so it can be put in a dataframe and saved as a .csv file.
# certs[0] is the leading number (1606 in the output above); certs[1] is the list of result rows.
certs = r.json()
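For Iowa, a single request with `size=2000` returned everything. If a search ever matched more records than one page holds, the rows could be gathered page by page. The sketch below is an assumption-heavy illustration: it assumes `page` is zero-based and that every page has the same `[total, rows]` shape, neither of which I have confirmed. `fetch_all_rows` and `fake_fetch` are hypothetical names, and the fake fetcher exists only so the example runs offline.

```python
def fetch_all_rows(fetch_page, size=2000, max_pages=50):
    """Collect rows page by page until the reported total is reached.

    fetch_page(page, size) is a hypothetical callable returning [total, rows].
    """
    all_rows = []
    for page in range(max_pages):
        total, rows = fetch_page(page, size)
        all_rows.extend(rows)
        # Stop on an empty page or once everything reported has been collected
        if not rows or len(all_rows) >= total:
            break
    return all_rows


def fake_fetch(page, size):
    """Stand-in for the real request: serves 5 fake rows in pages."""
    data = [[str(i)] for i in range(5)]
    return [len(data), data[page * size:(page + 1) * size]]


rows = fetch_all_rows(fake_fetch, size=2)
print(len(rows))
```

With `requests`, `fetch_page` could wrap `requests.get(...).json()` for a URL whose `page` and `size` values are filled in.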

In [13]:
#View a sample of the information so I can correctly assign dataframe column names
certs[1][1]

['516',
 'Abele, Grant D.',
 'Garner, IA',
 'UNITED STATES',
 'CQIA',
 'June, 2001',
 'Active']

In [9]:
df = pd.DataFrame(certs[1], columns = ['ASQ ID', 'Name', 'City', 'Country', 'Credential', 'Date Earned', 'Status'])
df

| | ASQ ID | Name | City | Country | Credential | Date Earned | Status |
|---|---|---|---|---|---|---|---|
| 0 | 9601 | Abele, Grant D. | Garner, IA | UNITED STATES | CQI | March, 2000 | Active |
| 1 | 516 | Abele, Grant D. | Garner, IA | UNITED STATES | CQIA | June, 2001 | Active |
| 2 | 20318 | Abodeely, Paul D. | Lisbon, IA | UNITED STATES | CQT | March, 2007 | Active |
| 3 | 7036 | Adams, David A. | Marion, IA | UNITED STATES | CQT | March, 1992 | Active |
| 4 | 3529 | Adrian, Edwin M. | West Union, IA | UNITED STATES | CQI | March, 1992 | Active |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1601 | 841 | Zheng, Zhi | Cedar Falls, IA | UNITED STATES | CQPA | December, 2010 | Active |
| 1602 | 69465 | Zhou, Ning | Urbandale, IA | UNITED STATES | CQA | December, 2018 | Active |
| 1603 | 56278 | Zhou, Ning | Urbandale, IA | UNITED STATES | CMQ/OE | January, 2020 | Active |
| 1604 | 2986 | Ziegenmeyer, Daniel E. | Montezuma, IA | UNITED STATES | CQI | March, 1991 | Active |

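Since the original goal was to find newly certified members, the "Date Earned" values (formatted like "March, 2000") can be parsed and compared against a cutoff date. This is a minimal sketch on three rows copied from the output above; `earned_since` is a hypothetical helper and the 2018 cutoff is only an example. It assumes English month names, which `%B` expects under the default locale.

```python
from datetime import datetime


def earned_since(rows, cutoff):
    """Keep rows whose 'Date Earned' (index 5, e.g. 'March, 2000') is on or after cutoff."""
    recent = []
    for row in rows:
        # A month and year parse to the first day of that month
        earned = datetime.strptime(row[5], "%B, %Y")
        if earned >= cutoff:
            recent.append(row)
    return recent


sample_rows = [
    ["9601", "Abele, Grant D.", "Garner, IA", "UNITED STATES", "CQI", "March, 2000", "Active"],
    ["69465", "Zhou, Ning", "Urbandale, IA", "UNITED STATES", "CQA", "December, 2018", "Active"],
    ["56278", "Zhou, Ning", "Urbandale, IA", "UNITED STATES", "CMQ/OE", "January, 2020", "Active"],
]

recent = earned_since(sample_rows, datetime(2018, 1, 1))
print([row[4] for row in recent])
```

On the dataframe itself, `pd.to_datetime(df['Date Earned'], format='%B, %Y')` should produce a column that can be compared against a cutoff in the same way.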

In [11]:
df.to_csv('../asq.csv', sep=',', index=False)