# Scraping

As an example, suppose we want to crawl the list of "Available Technologies" being licensed by MIT at http://technology.mit.edu and store their basic info; their associated patents; and the reference counts on their associated patents.

### Step 1: Understanding the URL

Okay, let's open the Chrome browser and go to that URL.

<img src="images/mit0.png">

- _First try_:  Aha, a list of links on the right.  Let's click on a few -- what do we see?  Many are empty, the categories are not obviously mutually exclusive, okay.  Maybe there's a better way.
- _Second try_: Let's just search for all technologies at http://technology.mit.edu/technologies.  Okay, better, but it only gives us 50 at a time.  We could just combine the four pages; that's fine.  Let's click on page 2 to see what happens.
- _Third try_: Aha, the URL for page 2 is http://technology.mit.edu/technologies?limit=50&offset=50&query=.  That looks like we can just specify a higher limit and offset 0 and get the whole thing.
- _Final answer_: Indeed, http://technology.mit.edu/technologies?limit=1000 has a giant list.
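The same URL exploration can be reproduced in code. Below is a minimal sketch, assuming only the `limit`, `offset` and `query` parameters seen in the page-2 URL above; the helper name `listing_url` is ours and not part of the site. It makes no network requests, and the `try`/`except` import keeps it runnable on both Python 2 and Python 3.

```python
try:
    from urllib.parse import urlencode, urlparse, parse_qs   # Python 3
except ImportError:
    from urllib import urlencode                              # Python 2
    from urlparse import urlparse, parse_qs

BASE_URL = "http://technology.mit.edu/technologies"

def listing_url(limit=50, offset=0):
    """Build a listing URL (hypothetical helper; parameter names come from the page-2 URL)."""
    return BASE_URL + "?" + urlencode([("limit", limit), ("offset", offset)])

# Take the page-2 URL apart to see which parameters it carries
page2 = "http://technology.mit.edu/technologies?limit=50&offset=50&query="
params = parse_qs(urlparse(page2).query, keep_blank_values=True)
print(sorted(params.items()))

# Ask for everything at once: a large limit, starting at offset 0
print(listing_url(limit=1000))
```

Whether the server honours an arbitrarily large `limit` is something to check by hand, as we did in the browser.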


We are going to use the `urllib2` Python library to get the raw page content as a string. Note: this library is not available in Python 3; use `urllib.request` instead, which has an almost identical API.

In [10]:
import urllib2

url = "http://technology.mit.edu/technologies?limit=1000"
raw_page = urllib2.urlopen(url).read()
#print raw_page

### Step 2: CSS selectors

Let us inspect http://technology.mit.edu/technologies?limit=1000 in Chrome.
Open View->Developer->Developer Tools. Or, in newer versions, simply hover over the element and right click->Inspect.
For example, right click on one of the technology titles and choose "Inspect Element".

<img src="images/mit1.png">

What are we looking at? Well, this is the structure of the web page: nested tags of different types, each with a variety of attributes.

  - All of the technologies are underneath ("_descendants of_")   `<div class="search" id="nouvant-portfolio-content">`
  - In fact, each of them is in its own `<div class="technology" data-images="true" id="technology_XXXX">`
    
Now we're ready to move on: we'll use BeautifulSoup to leverage the above to zoom in on the individual technologies and to get links to the pages with detailed info.

This pattern -- where you have nested finds, each given by conditions on tag type, id, and class -- is very common.  It's so common that there is a special convenience language for such traversals: [CSS selectors](http://www.w3schools.com/cssref/css_selectors.asp).

BeautifulSoup supports a form of CSS selectors, which lets us write such a nested find in a more concise and expressive way:
    >    tech_divs = soup.select('div#nouvant-portfolio-content  div.technology')

All selectors work like a 'find_all'.  Some basic building blocks of selectors are:

 - _'mytag'_ picks out all tags of type _mytag_.
 - _'#myid'_ picks out all tags whose _id_ is equal to _myid_.
 - _'.myclass'_ picks out all tags whose _class_ attribute includes _myclass_.
 - _'mytag#myid'_ picks out all tags of type _mytag_ **and** _id_ equal to _myid_ (analogously for _'mytag.myclass'_).
 - If _'selector1'_ and _'selector2'_ are two selectors, then there is another selector _'selector1 selector2'_.  It picks out all tags satisfying _selector2_ that are __descendants__(*) of something satisfying _selector1_, i.e., it's like our nested find.
 
 (*) It doesn't have to be a _direct_ descendant.  I.e., it can be a great-great-...-grandchild of something satisfying _selector1_.  For direct descendants (children) we'd instead write _'selector1 > selector2'_.
 
Let's just explain how this applies to our example:

1.  Let's start with the first half
        >    tech_divs = soup.select('div#nouvant-portfolio-content  div.technology')
        >                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This picks out all 'div' tags with id 'nouvant-portfolio-content'.
2.  Then the second half
        >    tech_divs = soup.select('div#nouvant-portfolio-content  div.technology')
        >                                                            ^^^^^^^^^^^^^^
This picks out all 'div' tags with class 'technology'.
3.  Finally the whole thing
        >    tech_divs = soup.select('div#nouvant-portfolio-content  div.technology')
        >                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
does exactly the same as a nested find: first find the parent div, then find all technology divs inside it.


In [9]:
from bs4 import BeautifulSoup
soup = BeautifulSoup(raw_page)
#print soup.prettify()

Following the detailed description above, we will first find all tags of interest using two `find` statements, and then using a single `select` statement from BeautifulSoup.

In [3]:
parent_div = soup.find('div', attrs={'id': 'nouvant-portfolio-content'}) #Find (at most) *one*
tech_divs = parent_div.find_all('div', attrs={'class':'technology'})  #Find *all*
print len(tech_divs)

200


These two find statements can be combined into one select statement with a more complex CSS pattern:

In [4]:
tech_divs = soup.select('div#nouvant-portfolio-content div.technology')
print len(tech_divs)

200


Let's check out what we've zoomed in to:

In [5]:
print tech_divs[0].prettify()

<div class="technology" data-images="false" id="technology_10355">
 <h2>
  <a href="/technologies/17704_procedural-sedation-monitoring-systems-and-methods-for-predicting-adverse-events-and-assessing-level-of-sedation">
   Procedural Sedation Monitoring: Systems and Methods for Predicting Adverse Events and Assessing Level of Sedation
  </a>
 </h2>
 <p>
  <strong>
   17704
  </strong>
  –
  <span>
   This invention is a system for monitoring sedation state and detecting adverse events during procedural sedation. The system can be implemented in a standalone monitor or incorporated into commercially available monitoring systems within clinical settings.     Procedural sedation has allowed many painful procedures to be conducted outside the operating room. During such procedures, it is...
   <a href="/technologies/17704_procedural-sedation-monitoring-systems-and-methods-for-predicting-adverse-events-and-assessing-level-of-sedation">
    Read More
   </a>
  </span>
 </p>
</div>



Now we're ready to pull out some key pieces of info:

- The technology's "title" (the text in the `<a>` element)
- The link to follow for more info on the technology (the _href_ attribute of the `<a>`)
- And a short blurb about the technology (in the `<span>`)

Let's write some code to extract this.  But before we do, let's discuss what _form_ the output should take: it is often convenient to store data in _key-value_ form (e.g., as a hashtable), in other words to name the bits of data you are collecting.  One big advantage is that this makes it easier to add extra fields progressively.

Let's see what the code looks like:

In [6]:
firsta = tech_divs[0].find('a')
print firsta.text
print firsta['href']

Procedural Sedation Monitoring: Systems and Methods for Predicting Adverse Events and Assessing Level of Sedation 
/technologies/17704_procedural-sedation-monitoring-systems-and-methods-for-predicting-adverse-events-and-assessing-level-of-sedation


We're going to use a "named tuple" to store our key-value data. We could also have used a dictionary, with strings as keys.

Named tuples have some advantages:
 - Better notation: `x.field_name` instead of `x['field_name']`.
 - If you change your object structure later and fail to update your code to include the new fields, constructing the tuple fails, which makes the omission easy to find.
 - They are immutable, preventing certain sorts of bugs.

... and some disadvantages:
 - If you want to augment the object structure you need a new type (or to go back and update your code).
 - They are immutable.
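These trade-offs are easy to see in a few lines. The sketch below reuses the `TechBasic` definition from this section; the `short` value is a placeholder rather than a real blurb, and the variable names are ours.

```python
from collections import namedtuple

TechBasic = namedtuple('TechBasic', 'title, url, short')

tech = TechBasic(title='The Visual Microphone',
                 url='/technologies/16488_the-visual-microphone',
                 short='placeholder blurb')  # placeholder text, not the real blurb

# Attribute access instead of tech['title']
print(tech.title)

# Leaving out a field fails immediately with a TypeError
try:
    TechBasic(title='Incomplete', url='/technologies/0')
    missing_field_error = None
except TypeError as err:
    missing_field_error = err

# Immutability: assigning to a field raises an AttributeError
try:
    tech.short = 'changed'
    mutation_error = None
except AttributeError as err:
    mutation_error = err

# "Changing" a field means building a new tuple with _replace
updated = tech._replace(short='a new blurb')
print("%s / %s" % (tech.short, updated.short))
```

A plain dictionary would accept both the incomplete record and the in-place assignment without complaint, which is exactly the flexibility (and the risk) being traded away.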

In [7]:
from collections import namedtuple
TechBasic = namedtuple('TechBasic', 'title, url, short')

def td_info(td):
    la = td.select('h2 > a')
    ls = td.select('span')
    if len(la)!=1 or len(ls)!=1:
        print "Uh oh! We did something wrong"
        return None
    return TechBasic (
            title = la[0].text,
            url   = la[0]['href'],
            short = ls[0].text
            )
tech_links=[td_info(td) for td in tech_divs]

print tech_links[0].url
print tech_links[0].short

/technologies/17704_procedural-sedation-monitoring-systems-and-methods-for-predicting-adverse-events-and-assessing-level-of-sedation
    This invention is a system for monitoring sedation state and detecting adverse events during procedural sedation. The system can be implemented in a standalone monitor or incorporated into commercially available monitoring systems within clinical settings.     Procedural sedation has allowed many painful procedures to be conducted outside the operating room. During such procedures, it is... Read More


### Getting the patent information

Let us start from the end. First, I will provide the solution to the problem, and then we will work backwards to understand it:

In [8]:
Patent = namedtuple('Patent', 'name url')
TechDetailed = namedtuple('TechDetailed', 'tech_basic, patents')
def get_tech_details(tech_basic):
    url_base="http://technology.mit.edu/"
    soup = BeautifulSoup( urllib2.urlopen(url_base + tech_basic.url) )
    def patent_info(a):
       return Patent ( 
                name = a.text, 
                url = a['href'] 
                )
    patents = [patent_info(a) for a in soup.select('dd.us_patent_issued a')]
    return TechDetailed ( 
            tech_basic = tech_basic, 
            patents = patents 
            )

tech_basics = map(get_tech_details, tech_links[0:10])  #This takes a list

#okay, some technologies will not have associated patents. So let us filter to keep only those that have
print filter(lambda x: len(x.patents) != 0, tech_basics)

[TechDetailed(tech_basic=TechBasic(title=u'The Visual Microphone', url=u'/technologies/16488_the-visual-microphone', short=u' \n\n   \n\n The inventors have\ndeveloped a method to turn effectively any object into a visual microphone to\nenable the detection of sound from afar. Sound signals produce air pressure\nfluctuations that cause objects in the vicinity to vibrate. This method works\nby analyzing video recordings of these vibrations to convert them back into a\ncorresponding sound signal. This technology is mainly... Read More'), patents=[Patent(name=u'US Patent 2015-0319540', url=u'http://google.com/patents/US20150319540')]), TechDetailed(tech_basic=TechBasic(title=u'Methods of Evaluating Gene Expression Levels', url=u'/technologies/15194_methods-of-evaluating-gene-expression-levels', short=u'    The ability to build a higher level representation of a novel biological design from known parts, with flexible protocol automation and DNA expression characterization, is useful in any
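One detail in `get_tech_details` is worth a second look: `url_base` ends with a slash and `tech_basic.url` starts with one, so plain concatenation produces a double slash after the host name. Servers often tolerate this, but `urljoin` avoids relying on that. A small sketch with no network access, using one of the relative URLs from the output above:

```python
try:
    from urllib.parse import urljoin   # Python 3
except ImportError:
    from urlparse import urljoin       # Python 2

url_base = "http://technology.mit.edu/"
relative = "/technologies/16488_the-visual-microphone"

# Plain string concatenation keeps both slashes
concatenated = url_base + relative
print(concatenated)

# urljoin resolves the relative path against the base URL
joined = urljoin(url_base, relative)
print(joined)
```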

**Note**: 
In the last code segment, we only processed the first ten technologies.  If we try to get them all this way, it'll take a while.  Run the next cell for as long (or not) as you wish, and when you get bored use _Kernel->Interrupt_ to stop it.

The problem is of course that it takes a while to connect to the remote server and fetch each page.  Fortunately, though it takes a long time, it is not actually _computationally expensive_: your computer mostly sits idle waiting for the server, so it would be perfectly happy fetching 20 pages at a time.  The **multiprocessing** package in Python makes it easy to do this kind of (easy) parallelization.
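To see the effect without hitting the server, here is a small sketch in which the download is replaced by a short `time.sleep`. Everything in it is made up for illustration: `fake_fetch` stands in for a real request and the paths are invented. It also uses `multiprocessing.pool.ThreadPool` (threads) rather than the process-based `Pool`, which is enough here because the work is waiting, not computing.

```python
import time
from multiprocessing.pool import ThreadPool

def fake_fetch(url):
    """Stand-in for a page download: sleeps instead of touching the network."""
    time.sleep(0.05)
    return len(url)

urls = ["/technologies/%d" % i for i in range(20)]   # made-up paths

# One page after another
start = time.time()
sequential = list(map(fake_fetch, urls))
sequential_time = time.time() - start

# Ten pages "in flight" at a time
pool = ThreadPool(10)
start = time.time()
pooled = pool.map(fake_fetch, urls)
pooled_time = time.time() - start
pool.close()
pool.join()

print("sequential: %.2fs, pooled: %.2fs" % (sequential_time, pooled_time))
```

`pool.map` returns results in the same order as the input list, so the pooled version is a drop-in replacement for the built-in `map`.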

In [15]:
# Slow version
# Uncomment and run it to see
# import time

# start_time = time.time()
# tech_details = map(get_tech_details, tech_links)  #This takes a list
# end_time = time.time()

# print "Done!", end_time-start_time

In [16]:
# Multi-processor version
import time
import multiprocessing as mp
from multiprocessing import Pool

cpu_count=mp.cpu_count()
workers = Pool(cpu_count-1) 

print "Running with {} cores...".format(cpu_count)

start_time = time.time()
tech_details = workers.map(get_tech_details, tech_links)
end_time = time.time()

print "Done!", end_time-start_time

Running with 4 cores...
Done! 23.4673888683


**Exercise**:

Let's put all of that together.  Write a function 
```python
def get_tech_basics(url):
    ...
```

that returns a `TechBasic` for each technology on the page.  Combine this with the pooled requests to `get_tech_details` to obtain a list of `TechDetailed`.

**Fin.**
That's it, we now have a basic not-entirely-trivial example.  Along the way we took some detours, so let's just take a look at what our code looks like without those detours:

**Exercises:**

1. Modify "get_tech_details" to get other interesting information on the technology, like a long form description and/or the authors' names.  (You'll also want to modify TechDetailed.  Do that first and note that now the code breaks when it tries to construct a TechDetailed with the wrong number of fields.)

2. Modify "get_tech_details" to try to follow the link and to get more information on the patent -- for instance when it was filed and granted, or how many other patents reference it.  (Warning: The patent web site is much less regular than MIT's!)

In [None]:
import urllib2
from bs4 import BeautifulSoup
from collections import namedtuple
from multiprocessing import Pool
import multiprocessing as mp

cpu_count=mp.cpu_count()

# Getting the list of short 'blurbs' about the techs
TechBasic = namedtuple('TechBasic', 'title, url, short')
def get_tech_basics(url):
    # Fetch and parse the listing page at the URL passed in by the caller
    soup = BeautifulSoup(urllib2.urlopen(url))

    ## Get the list of tech blurbs
    tech_divs = soup.select('div#nouvant-portfolio-content  div.technology')

    ## Parse a single 'td' on the index page
    def td_info(td):
        la = td.select('h2 > a')
        ls = td.select('span')
        if len(la)!=1 or len(ls)!=1:
            print "Uh oh! We did something wrong"
            return None
        return TechBasic (
                title = la[0].text,
                url   = la[0]['href'],
                short = ls[0].text
                )
    
    return [td_info(td) for td in tech_divs]


# Adding in some details (just patent info, for now)
Patent = namedtuple('Patent', 'name url')
TechDetailed = namedtuple('TechDetailed', 'tech_basic, patents')
def get_tech_details(tech_basic):
    url_base="http://technology.mit.edu/"
    soup = BeautifulSoup( urllib2.urlopen(url_base + tech_basic.url) )
    def patent_info(a):
       return Patent ( 
                name = a.text, 
                url = a['href'] 
                )
    patents = [patent_info(a) for a in soup.select('dd.us_patent_issued a')]
    return TechDetailed ( 
            tech_basic = tech_basic, 
            patents = patents 
            )

## The main driver code:
tech_basics = get_tech_basics("http://technology.mit.edu/technologies?limit=1000")

workers = Pool(cpu_count)  # number of worker processes
tech_details = workers.map(get_tech_details, tech_basics)

print tech_details[7]

## Example: accessing info from a web page through drop-down menu


Suppose we need to collect the text information for the different reports listed on the **Federal Reserve Board of Governors** web page and output it as a table if possible.

URL: http://www.federalreserve.gov/apps/reportforms/default.aspx

<img src="images/dropdown1.png">

For every form, we need to collect the basic information: 
 - Name of Form
 - Description
 - OMB
 - Purpose
 - Background
 - Respondent Panel
 - Frequency
 
Not every form will have all this information, but whatever is there we’d like to scrape. We do not need to download the forms; we just want the information, and the whole block of text is fine.

<img src="images/dropdown2.png">

In the end, we would like to structure the information as a CSV file:

**Form Number; Description; OMB; Purpose; Background; Respondent Panel; Frequency; Public Release; FR 2004**
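As a sketch of that final step, the snippet below writes a list of dictionaries (one per form) to CSV with the standard `csv` module. It is written for Python 3. The field names follow the element names used later in this section, and the two rows are made-up sample data, not scraped results; `items_to_csv` is a hypothetical helper.

```python
import csv
import io

FIELDS = ['FormNumber', 'Description', 'OMB', 'Background',
          'RespondentPanel', 'Frequency', 'PublicRelease']

# Made-up sample rows; fields that were not found are simply absent or None
sample_items = [
    {'FormNumber': 'FFIEC 001', 'Description': 'Annual Report of Trust Assets (discontinued)', 'OMB': None},
    {'FormNumber': 'EXAMPLE 2', 'Description': 'A second made-up form', 'Frequency': 'Annual'},
]

def items_to_csv(items, out):
    """Write one CSV row per dictionary; missing fields become empty cells."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, restval='')
    writer.writeheader()
    for item in items:
        writer.writerow(item)

# Write to an in-memory buffer here; a real run would pass an open file instead
buf = io.StringIO()
items_to_csv(sample_items, buf)
csv_text = buf.getvalue()
print(csv_text)
```

To get the semicolon-separated layout shown above, pass `delimiter=';'` to `csv.DictWriter`.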

Inspecting the URL and playing with it a bit, we find that there is no URL we could construct that would give us all the information we need. It looks like we need to start from the front page and navigate through it, selecting items from a drop-down menu like a human would!

While we can still use CSS selectors to find web elements on the page, BeautifulSoup does not let us navigate through the page easily (i.e., click buttons) or invoke embedded JavaScript. However, the Selenium library allows us to do all of that!

### Selenium

Selenium WebDriver API supports different possibilities to identify elements: by ID, by CLASS, by NAME, by CSS selector, by XPath, by TAG name. 

To inspect an element you just have to open the desired web page, right-click the desired element and click on Inspect Element. 

In [14]:
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By

### Set-up pyvirtualdisplay



In [22]:
from selenium import webdriver
from pyvirtualdisplay import Display

#Display - read about pyvirtualdisplay
display = Display(visible=0, size=(1024, 768))
display.start()
#webdriver - read about selenium.webdriver
driver = webdriver.Firefox()
    
#this is a starting page we are scraping
driver.get("http://www.federalreserve.gov/apps/reportforms/default.aspx")

Every element on the HTML page can be located using CSS selectors (as in the previous example).
Open the starting page in Chrome or Firefox, right click on the drop-down menu, and click "Inspect". The corresponding tag is highlighted in the developer tools panel, and we copy its id: `MainContent_ddl_ReportForms`.

Knowing the id of dropdown menu, we can locate it with Selenium like this:

In [23]:
main_menu = WebDriverWait(driver,10).until(EC.presence_of_element_located((By.CSS_SELECTOR,"#MainContent_ddl_ReportForms")))

The drop-down menu is a list of `<option>` elements, which can be verified in the Chrome browser (in the Developer Tools that pop up when you right click on an element and press "Inspect").

The following returns all of the options:

In [24]:
form_options = main_menu.find_elements_by_tag_name("option")

In [25]:
#We count them
option_count = len(form_options)
print option_count

128


Next, we iterate through the menu, which is essentially like scrolling down the drop-down menu and clicking on each and every form!

Here is an example for the first item only:

In [26]:
#list to store all scraped data
all_items = list()


#Get web element corresponding to a form
form = form_options[1]
#Click as a mouse click-action in browser 
form.click()
#Get text, because we need to store the form number
form_id = form.text
print form_id

FFIEC 001


In [27]:
#Locate a web element corresponding to the submit button. By CSS selector which we found by inspection in Chrome browser (same logic as above)
submit_button = WebDriverWait(driver,3).until(EC.presence_of_element_located((By.CSS_SELECTOR,"#MainContent_btn_GetForm")))
#Click as a mouse click-action in browser 
submit_button.click()   

In [29]:
#Explore all the info we want to scrape: 'Description','OMB','Background',
#'RespondentPanel','Frequency','PublicRelease'. Some of them might not be available on the page

description = driver.find_element_by_css_selector("#MainContent_lbl_Description_data") 
print description.text

The Board of Governors of the Federal Reserve System discontinued the Annual Report of Trust Assets (FFIEC 001; OMB No. 7100-0031), effective with the December 31, 2001, report. The Federal Reserve had collected the FFIEC 001 report from all state member banks that had been granted trust powers and from trust company subsidiaries of bank holding companies not otherwise supervised by a federal banking agency. The purpose of the report was to provide information on the volume and character of discretionary fiduciary activities exercised by such institutions.


In [40]:
#OMB tag is not present on this particular page... This can be confirmed by manually inspecting the element
#OMB = driver.find_element_by_css_selector("#MainContent_lbl_OMB_data") 
#print OMB.text

## Putting it all together

Now we assemble all the pieces of code above into a function.

In [34]:
def get_all_items(max_num_items=None):
    #list to store all scraped data
    all_items = list()

    #Display - read about pyvirtualdisplay
    display = Display(visible=0, size=(1024, 768))
    display.start()
    #webdriver - read about selenium.webdriver
    driver = webdriver.Firefox()
    
    #this is a starting page we are scraping
    driver.get("http://www.federalreserve.gov/apps/reportforms/default.aspx")
    #Every element on the HTML page can be located using CSS selectors.
    #Open the starting page in Chrome, right click on the drop-down menu and click "Inspect"; the tag is highlighted on the right and we copy its id - MainContent_ddl_ReportForms
    #Knowing the id of the drop-down menu, we can locate it with Selenium like this
    main_menu = WebDriverWait(driver,10).until(EC.presence_of_element_located((By.CSS_SELECTOR,"#MainContent_ddl_ReportForms")))
    #The drop-down menu is a list of <option> elements, which can be verified in the Chrome browser (Developer Tools, which pop up when you right click and press "Inspect" on an element)
    #The following returns all of the options
    form_options = main_menu.find_elements_by_tag_name("option")
    #We count them
    option_count = len(form_options)
    if max_num_items is not None:
        option_count = min(max_num_items,option_count) 
        if option_count < 2: 
            print "Need to inspect at least one option"
            exit(1)
    #Next, we loop over all of them - essentially like scrolling down the drop-down menu and clicking on each and every form
    for form_i in xrange(1,option_count):
        #Get web element corresponding to a form
        form = form_options[form_i]
        #Click as a mouse click-action in browser 
        form.click()
        #Get text, because we need to store the form number
        form_id = form.text
        #Locate a web element corresponding to the submit button. By CSS selector which we found by inspection in Chrome browser (same logic as above)
        submit_button = WebDriverWait(driver,3).until(EC.presence_of_element_located((By.CSS_SELECTOR,"#MainContent_btn_GetForm")))
        #Click as a mouse click-action in browser 
        submit_button.click()      
        #Prepare data structures to store all the info we want to scrape
        a = dict.fromkeys(['Description','OMB','Background','RespondentPanel','Frequency','PublicRelease'])
        #We are on a web page after submit-click, following will search for all items of interest. Or for corresponding
        #web-elements 
        for el in a.keys():
            try:
                item = driver.find_element_by_css_selector("#MainContent_lbl_"+el+"_data") 
                #Once found it will store them in our dictionary, if not it will proceed to "except" section and do nothing
                a[el] = item.text 
            except Exception:
                #case when there is no such field
                pass
        #we need form number as well
        a['FormNumber'] = form_id
        #keeping them all in one list, which will have a dictionary per Form Number - and later, a row in your excel file per Form number
        all_items.append(a)
    
        #Ok, that part bothers me a little: it looks like I have to refresh "form_options" each time... 
        #Otherwise I get following exception: selenium.common.exceptions.StaleElementReferenceException: Message: Element not found in the cache - perhaps the page has changed since it was looked up
        driver.get("http://www.federalreserve.gov/apps/reportforms/default.aspx")
        main_menu = WebDriverWait(driver,10).until(EC.presence_of_element_located((By.CSS_SELECTOR,"#MainContent_ddl_ReportForms")))
        form_options = main_menu.find_elements_by_tag_name("option")

    driver.close()
    display.stop()

    return all_items

In [35]:
all_items = get_all_items(2)
print len(all_items)

1


In [36]:
print all_items

[{'OMB': None, 'Description': None, 'RespondentPanel': None, 'FormNumber': u'FFIEC 001', 'Frequency': None, 'Background': None, 'PublicRelease': None}]


As we see all of the data is "None" - what went wrong? Let us check in the debugger.

**Spoiler**: use WebDriverWait to wait for presence of element looked for.
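The underlying problem is timing: the element lookup can run before the browser has finished loading the result page. An explicit wait polls a condition until it succeeds or a timeout expires. The sketch below imitates that idea in plain Python so it can run without a browser; `FakePage` and `wait_until` are hypothetical names and not part of Selenium's API. In the scraper itself, the counterpart is the `WebDriverWait(driver, 10).until(EC.presence_of_element_located(...))` call already used for the drop-down menu.

```python
import time

class FakePage(object):
    """Hypothetical stand-in for a page whose element only appears after a delay."""
    def __init__(self, ready_after):
        self.ready_at = time.time() + ready_after

    def find(self):
        # Returns None while the "page" is still loading, the text afterwards
        if time.time() < self.ready_at:
            return None
        return "Description text"

def wait_until(condition, timeout=3.0, poll=0.05):
    """Call condition() repeatedly until it returns something truthy, or time out."""
    deadline = time.time() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.time() >= deadline:
            raise RuntimeError("timed out after %.1f seconds" % timeout)
        time.sleep(poll)

page = FakePage(ready_after=0.2)
immediate = page.find()          # looked up too early, nothing there yet
waited = wait_until(page.find)   # keeps polling until the element is available
print("%r / %r" % (immediate, waited))
```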

We can inspect what we have scraped either manually or by loading it into a Pandas dataframe (this assumes the scraped items have been saved to `forms.csv`):

In [37]:
import pandas as pd
scraped_data = pd.read_csv("forms.csv")

In [38]:
scraped_data.head()

Unnamed: 0,FormNumber,Description,OMB,Background,RespondentPanel,Frequency,PublicRelease
0,FFIEC 001,The Board of Governors of the Federal Reserve ...,,,,,


## Scraping web pages that require username/password

If you need to scrape data from a web page that requires a username/password login (for instance, a forum), you can use the mechanize and cookielib Python libraries.

For instance, the following forum requires registration: http://www.mothering.com/forum/443-i-m-not-vaccinating

<img src="images/mothering0.png">

In [1]:
import mechanize
import cookielib

def doLogin():
    # Browser
    br = mechanize.Browser()

    # Cookie Jar
    cj = cookielib.LWPCookieJar()
    br.set_cookiejar(cj)

    # Browser options
    br.set_handle_equiv(True)
    br.set_handle_redirect(True)
    br.set_handle_referer(True)
    br.set_handle_robots(False)
    br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

    br.addheaders = [('User-agent', 'Chrome')]

    # The site we will navigate into, handling its session
    br.open('http://www.mothering.com/forum/login.php?do=login')

    # View available forms
    for f in br.forms():
        print f

    # Select the second (index one) form (the first form is a search query box)
    br.select_form(nr=1)

    # User credentials
    br.form['vb_login_username'] = 'giant_cat'
    br.form['vb_login_password'] = 'TestPassword33!'
    # Login
    br.submit()
    return br

In [5]:
from bs4 import BeautifulSoup
import re

login = doLogin()
main_url = 'http://www.mothering.com/forum/17507-vaccinating-schedule' 
#'http://www.mothering.com/forum/443-i-m-not-vaccinating'
page = login.open(main_url+'/index5.html').read()
soup = BeautifulSoup(page)
links = soup.find_all(href = re.compile(re.escape(main_url) + r"/\d+-.*\.html$"))

<GET http://www.mothering.com/forum/gtsearch.php application/x-www-form-urlencoded
  <HiddenControl(cx=partner-pub-7865546952023728:8370985649) (readonly)>
  <HiddenControl(cof=FORID:11) (readonly)>
  <HiddenControl(ie=UTF-8) (readonly)>
  <TextControl(q=)>
  <SubmitControl(sa=Search) (readonly)>>
<POST http://www.mothering.com/forum/login.php?do=login application/x-www-form-urlencoded
  <TextControl(vb_login_username=User Name)>
  <PasswordControl(vb_login_password=)>
  <SubmitControl(<None>=Log in) (readonly)>
  <CheckboxControl(cookieuser=[1])>
  <HiddenControl(s=) (readonly)>
  <HiddenControl(securitytoken=guest) (readonly)>
  <HiddenControl(do=login) (readonly)>
  <HiddenControl(vb_login_md5password=) (readonly)>
  <HiddenControl(vb_login_md5password_utf=) (readonly)>>
<POST http://www.mothering.com/forum/profile.php?do=dismissnotice application/x-www-form-urlencoded
  <HiddenControl(do=dismissnotice) (readonly)>
  <HiddenControl(securitytoken=guest) (readonly)>
  <HiddenControl(d

In [6]:
print links

[<a class="thread_title_link" href="http://www.mothering.com/forum/17507-vaccinating-schedule/1566201-vaccination-forum-guidelines.html" id="thread_title_1566201" itemprop="headline">Vaccination Forum Guidelines</a>, <a class="thread_title_link" href="http://www.mothering.com/forum/17507-vaccinating-schedule/1551946-guidelines-referencing-articles-studies-another-site.html" id="thread_title_1551946" itemprop="headline">Guidelines For Referencing Articles Or Studies From Another Site</a>, <a class="thread_title_link" href="http://www.mothering.com/forum/17507-vaccinating-schedule/1402834-vaccinating-schedule.html" id="thread_title_1402834" itemprop="headline">Vaccinating on Schedule</a>, <a class="thread_title_link" href="http://www.mothering.com/forum/17507-vaccinating-schedule/1501602-baby-gets-vaccines-friday-wish-there-more-vaccines.html" id="thread_title_1501602" itemprop="headline">Baby gets Vaccines on Friday, Wish there was more vaccines</a>, <a class="thread_title_link" href="h
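To check what a link filter of this kind keeps and what it drops, the pattern can be tried on a few URLs without logging in. In the sketch below the first candidate is taken from the output above and the other two are the index page and the other forum mentioned in this section. The pattern is a slightly stricter variant that escapes `main_url` and the dot in `.html`; that reflects our assumption about the intent, namely thread pages whose names start with a numeric id.

```python
import re

main_url = 'http://www.mothering.com/forum/17507-vaccinating-schedule'

# Thread pages look like <main_url>/<digits>-<slug>.html
thread_link = re.compile(re.escape(main_url) + r"/\d+-.*\.html$")

candidates = [
    main_url + '/1566201-vaccination-forum-guidelines.html',    # a thread page
    main_url + '/index5.html',                                  # a pagination page
    'http://www.mothering.com/forum/443-i-m-not-vaccinating',   # a different forum
]

matches = [url for url in candidates if thread_link.search(url)]
for url in matches:
    print(url)
```

Only the thread page survives the filter; the pagination page has no numeric id after the slash, and the other forum's URL does not start with `main_url`.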