In [1]:
import pandas as pd
from google import search

# Methodology Overview: The purpose of this document is to flesh out the methods around RQ 1.

RQ 1 states: What is the scope and nature of the network between item bank vendors and assessment system vendors? <p>
Current methods state: The data for RQ 1 is obtained from publicly available information on the internet or provided by school district staff and/or assessment platform vendors upon request (e.g., see Forrester, 2014 for a technical report provided upon request). To constrain the scope of this part of the study, we are focusing attention on existing school district assessment management system vendors among the 50 largest public school districts in the United States.

# Overview of the proposed methodology

The proposed methodology involves the following steps: 
1. Manually gather information on item banks and their associated platform partners (KeyData Systems example presented below).
2. Identify the top 50 school districts in the country, and place these in a list.
3. Programmatically create search terms that will search for the following: ItemBankPartner AND SchoolDistrict AND "assessment"
4. Automatically query either Google or Bing (both pose unique problems), and have the first (most relevant) URL returned.
5. Compile all returned URLs, and visually inspect them to further explore any that come from one of the following three sources: (1) the school district's website, (2) an official document from the school district, (3) the platform's website.
6. Document when a school district or a vendor website explicitly states that the platform is used for a given school district.

## Step 1

Manually gather information on item banks and their associated platform partners (KeyData Systems example presented below):

In [16]:
item_partners = pd.read_csv("/Users/amyburkhardt/Dropbox/ItemBankInvestigation/ItemBank_Network.csv")
item_partners['ItemBankPartners'] = '"' + item_partners['ItemBankPartners'] + '"' # quotes used for google query
item_partners[['ItemBank', 'ItemBankPartners']]

Unnamed: 0,ItemBank,ItemBankPartners
0,KeyDataSystems,"""Five-star technology solutions"""
1,KeyDataSystems,"""onHands Schools"""
2,KeyDataSystems,"""Edluastic"""
3,KeyDataSystems,"""PowerSchool"""
4,KeyDataSystems,"""eduphoria"""
5,KeyDataSystems,"""Otus"""
6,KeyDataSystems,"""eDoctrina"""
7,KeyDataSystems,"""Mastery Connect"""
8,KeyDataSystems,"""performance matters"""
9,KeyDataSystems,"""illuminate education"""


## Step 2

Identify the top 50 school districts in the country (source: Wikipedia):

In [18]:
schools = pd.read_csv("/Users/amyburkhardt/Dropbox/ItemBankInvestigation/schoolDistricts.csv")
schools['School District'] = '"' + schools['School District'] + '"'
schools.head(15)

Unnamed: 0,School District
0,"""New York City Department of Education"""
1,"""Los Angeles Unified School District"""
2,"""Puerto Rico Department of Education"""
3,"""Chicago Public Schools"""
4,"""Miami-Dade County Public Schools"""
5,"""Clark County School District"""
6,"""Broward County Public Schools"""
7,"""Houston Independent School District"""
8,"""Hillsborough County Public Schools"""
9,"""Hawai_i Department of Education"""


## Step 3

Programmatically create search terms that will search for the following: ItemBankPartner AND SchoolDistrict AND "assessment". Note that the code below automatically generates 1000 unique search terms, but only the first 100 are printed out.

In [32]:
search_terms = []
for school in schools['School District']:
    for ven in item_partners['ItemBankPartners']: 
        term = "{} AND {} AND assessment".format(ven, school)
        search_terms.append(term)
search_terms[:100]

['"Five-star technology solutions" AND "New York City Department of Education" AND assessment',
 '"onHands Schools" AND "New York City Department of Education" AND assessment',
 '"Edluastic" AND "New York City Department of Education" AND assessment',
 '"PowerSchool" AND "New York City Department of Education" AND assessment',
 '"eduphoria" AND "New York City Department of Education" AND assessment',
 '"Otus" AND "New York City Department of Education" AND assessment',
 '"eDoctrina" AND "New York City Department of Education" AND assessment',
 '"Mastery Connect" AND "New York City Department of Education" AND assessment',
 '"performance matters" AND "New York City Department of Education" AND assessment',
 '"illuminate education" AND "New York City Department of Education" AND assessment',
 '"FOCALPOINTK12" AND "New York City Department of Education" AND assessment',
 '"engrade" AND "New York City Department of Education" AND assessment',
 '"SchoolCity" AND "New York City Department of

## Step 4

Automatically query either Google or Bing (both pose unique problems), and have the first (most relevant) URL returned. Below we are using a Python module called 'google' to submit the search terms.

In [31]:
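returned_urls = []  # URLs returned by the search
phrase = []         # the search term that produced each URL
for item in search_terms:
    # request one result per search term, pausing 8 seconds between requests
    for url in search(item, start = 1, stop = 2, num = 1, pause = 8):
        returned_urls.append(url)
        phrase.append(item)
returned_urls
phrase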

## Step 5

Compile all returned URLs, and visually inspect them to identify any that meet one of the following criteria: (1) the school district's website, (2) an official document from the school district, (3) the platform's website.

In [27]:
# put the returned urls and the query phrases into a single dataframe. 
returned_list = pd.DataFrame (
{
 'returned_urls': returned_urls,
 'phrase': phrase
    
})

In [28]:
pd.set_option("display.max_colwidth", 999)
returned_list

Unnamed: 0,phrase,returned_urls
0,"""Five-star technology solutions"" AND ""New York City Department of Education"" AND assessment",https://www.pinterest.com/pin/512284526337160384/
1,"""PowerSchool"" AND ""New York City Department of Education"" AND assessment",http://schools.nyc.gov/documents/oaosi/cep/2016-17/cep_Q744.pdf
2,"""eduphoria"" AND ""New York City Department of Education"" AND assessment",https://pt.scribd.com/document/335816426/The-Band-Directors
3,"""Otus"" AND ""New York City Department of Education"" AND assessment",https://www.imsglobal.org/leadership/technical-advisory-board-tab
4,"""eDoctrina"" AND ""New York City Department of Education"" AND assessment",http://www.datag.org/Websites/datagorg/images/Summer_Conference_FINAL_3_bios.pdf
5,"""Mastery Connect"" AND ""New York City Department of Education"" AND assessment",http://schools.nyc.gov/documents/oaosi/cep/2016-17/cep_Q169.pdf
6,"""performance matters"" AND ""New York City Department of Education"" AND assessment",https://www.imsglobal.org/leadership/executive-board-assessment
7,"""illuminate education"" AND ""New York City Department of Education"" AND assessment",https://www.edsurge.com/news/2016-01-13-teachscape-s-tangled-tale
8,"""FOCALPOINTK12"" AND ""New York City Department of Education"" AND assessment",https://www.imsglobal.org/cc/statuschart.cfm
9,"""engrade"" AND ""New York City Department of Education"" AND assessment",http://schools.nyc.gov/NR/rdonlyres/0A3F0EF9-CDB5-42AE-9D7D-A9B5201A436B/165486/Forthewebsite.docx
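
Part of this inspection could be pre-screened by hostname before anyone looks at the links. The sketch below is one possible first pass, not a tested part of the pipeline. The names `classify_url`, `DISTRICT_HOSTS`, and `PLATFORM_HOSTS` are hypothetical, `vendor.example` is a placeholder, and only `schools.nyc.gov` comes from the results above. It assumes Python 3 (`urllib.parse`).

```python
from urllib.parse import urlparse

# Assumption: these host lists would be compiled by hand for each district
# and platform. "vendor.example" is a placeholder, not a real vendor site.
DISTRICT_HOSTS = {"schools.nyc.gov"}
PLATFORM_HOSTS = {"vendor.example"}
DOC_EXTENSIONS = (".pdf", ".doc", ".docx")


def _host_in(host, hosts):
    """True if host equals, or is a subdomain of, any entry in hosts."""
    return any(host == h or host.endswith("." + h) for h in hosts)


def classify_url(url):
    """Label a URL as district_document, district_site, platform_site, or other."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if _host_in(host, DISTRICT_HOSTS):
        # A file hosted on the district's own site is treated as a district document.
        if parsed.path.lower().endswith(DOC_EXTENSIONS):
            return "district_document"
        return "district_site"
    if _host_in(host, PLATFORM_HOSTS):
        return "platform_site"
    return "other"
```

Applied to the table above, this could look like `returned_list['source_type'] = returned_list['returned_urls'].apply(classify_url)`. District documents hosted elsewhere (for example on a state education department site) would still be labeled `other`, so this only narrows the list; it does not replace the visual inspection.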


## Step 6

Document when a school district or a vendor website explicitly states that the platform is used for a given school district.
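
One way this documentation could be recorded is as a table with one row per confirmed statement, which can later be reduced to the item bank, platform, and district links that RQ 1 asks about. This is a minimal sketch under that assumption: the column names and the `evidence` and `edges` variables are hypothetical, and the two rows are taken from the eDoctrina and Mastery Connect examples in this step.

```python
import pandas as pd

# Each row is one documented statement that a platform is used in a district.
# Column names are illustrative assumptions, not an established schema.
evidence = pd.DataFrame(
    [
        {
            "item_bank": "KeyDataSystems",
            "platform": "eDoctrina",
            "district": "New York City Department of Education",
            "source_url": "https://www.regents.nysed.gov/common/regents/files/215p12a3.pdf",
        },
        {
            "item_bank": "KeyDataSystems",
            "platform": "Mastery Connect",
            "district": "New York City Department of Education",
            "source_url": "http://schools.nyc.gov/documents/oaosi/cep/2016-17/cep_M189.pdf",
        },
    ]
)

# Collapse repeated evidence into unique item bank / platform / district links.
edges = evidence[["item_bank", "platform", "district"]].drop_duplicates()

# Number of distinct districts documented for each platform.
districts_per_platform = evidence.groupby("platform")["district"].nunique()
```

Keeping the source URL alongside each link means every edge in the eventual network can be traced back to the document that supports it.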

### Examples

#### "eDoctrina" AND "New York City Department of Education" AND assessment (KeyData Systems)
https://www.regents.nysed.gov/common/regents/files/215p12a3.pdf

"The school uses ongoing formative and summative assessments and evaluation data to inform instructional decisions and promote student learning. Data analysis is facilitated by the use of a tool called <font color='red'>eDoctrina </font>, which provides assessments, a wide verity of report formats and curriculum tracking."

#### "Mastery Connect" AND "New York City Department of Education" AND assessment  <p>

http://schools.nyc.gov/documents/oaosi/cep/2016-17/cep_M189.pdf

"We have partnered with Assessment Matters and <font color='red'>Mastery Connect</font> to implement an assessment tool and structure that will support the school in further developing assessment-informed school culture..."

#### "io education" AND "New York City Department of Education" AND assessment	

http://schools.nyc.gov/documents/oaosi/cep/2016-17/cep_X131.pdf

"Teachers will utilize Skedula and <font color='red'>IO Education</font> to track students grades and assessment data. Students and parents will access Pupil Path to monitor student progress." 


### <font color='red'> Issues to address </font> 

1. Google hates it when you try to search automatically, and it will stop working if it suspects that a bot is doing the Googling. We could try to get around this by varying and lengthening the time between requests, though that is not an entirely professional approach. In the example above, it returned results for 36 of the 1000 requests before erroring out.
2. Alternatively, we can be copacetic and use Bing. For more than 1000 requests in a month, we have to pay something like $30. I have done a little proof of concept to see how well it would perform, and the returned URLs are often not the same as with Google (in some cases, not as good; in other cases, just different), but we could probably improve what is being returned by fine-tuning the search terms to better fit how Bing searches (e.g., making sure we are using the correct logical operators).
3. This still leaves the manual task of reviewing the URLs to see which links are worthy documentation that a school district is using a particular platform (backed by a particular item bank). But I think the visual inspection does go pretty quickly; clearly Pinterest, for example, won't be a link we are interested in pursuing. We could, however, find additional automatic ways to parse the data, so that we are only left with serious candidates to examine.
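
For issue 1, the idea of varying and lengthening the time between requests could be sketched as follows. This is only an illustration: `run_queries` and its parameters are hypothetical names, the 20 to 60 second range is an arbitrary assumption, and nothing here guarantees that the search engine will not still block the requests. The search call is passed in as `search_fn` so that a run that errors out can be resumed from the results already collected instead of starting over.

```python
import random
import time


def run_queries(terms, search_fn, results=None,
                min_pause=20.0, max_pause=60.0, sleep=time.sleep):
    """Query each term once, waiting a random interval between requests.

    results maps search term -> first URL (or None) from an earlier run;
    terms already in it are skipped, so an interrupted run can be resumed.
    """
    results = dict(results or {})
    for term in terms:
        if term in results:
            continue  # already answered in a previous run
        try:
            urls = list(search_fn(term))
        except Exception as err:
            # Stop at the first failure and keep what has been collected so far.
            print("Stopped at {!r}: {}".format(term, err))
            break
        results[term] = urls[0] if urls else None
        # Randomized wait so requests are not evenly spaced.
        sleep(random.uniform(min_pause, max_pause))
    return results
```

With the module used in Step 4, `search_fn` could be something like `lambda q: search(q, start=1, stop=2, num=1)`, and the returned dictionary could be saved between runs and passed back in as `results`. Unlike the loop in Step 4, this version also records search terms that returned no URL (as `None`), which makes it easier to see which of the 1000 terms still need attention.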


## Thoughts??