In [61]:
## all imports
from IPython.display import HTML
import numpy as np
import urllib
import bs4 #this is beautiful soup
import time
import operator
import socket
import re # regular expressions
import json # needed for json.dumps / json.loads in the JSON section
from urllib.request import urlopen

from pandas import Series
import pandas as pd
from pandas import DataFrame

import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

import seaborn as sns
sns.set_context("talk")
sns.set_style("white")

from secret import *

API registrations
=================

If you would like to run all the examples in this notebook, you need to register for the following APIs:

* Twitter

https://apps.twitter.com/app/new

* Twitter instructions

https://twittercommunity.com/t/how-to-get-my-api-key/7033

Today's lecture:
===============

* all about data scraping
* ***What is it?***
* How to do it:
    - from a website
    - with an API

Answer: Data scraping means obtaining data from webpages. In low-level scraping you parse the data out of the HTML code of the webpage. Some websites try to make your life a bit easier by offering an API, so you can request the data directly.

IPython Notebooks:
===================

![IPython](images/ipython.png "IPython")

General advice about programming
==================================

* You will find nearly everything on Google
* Try: length of a list in Python
* A programmer is someone who can turn Stack Overflow snippets into running code
* Use tab completion
* Make your variable names meaningful


Python data scraping
====================

* Why scrape the web?
    - vast source of information
    - automate tasks
    - keep up with sites
    - fun!

**Can you think of examples?**

Python data scraping
====================

* copyrights and permission:
    - be careful and polite
    - give credit
    - care about media law
    - don't be evil (no spam, overloading sites, etc.)

Robots.txt
==========

![Robots.txt](images/robots_txt.jpg "Robots.txt")

Robots.txt
==========

* specified by web site owner
* gives instructions to web robots (aka your script)
* is located at the top-level directory of the web server

http://www.example.com/robots.txt

If you want, you can also have a look at

http://google.com/robots.txt

Robots.txt
==========

***What does this one do?***

Answer: This file allows Google to crawl everything on the server, while all other robots should stay away completely.

Things to consider:
-------------------

* can simply be ignored (following it is voluntary)
* can be a security risk - ***Why?***

Answer: You are basically telling everybody who cares to look into the file where you have stored sensitive information.
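
To see how a script can honor such rules, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules below are a made-up robots.txt written to match the description above (Googlebot may fetch everything, everyone else nothing); the user agent name `MyScraper` and the URL are hypothetical as well.

```python
from urllib.robotparser import RobotFileParser

# hypothetical robots.txt content, not copied from a real site
rules = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())  # parse the rules without any network access

page = 'http://www.example.com/private/data.html'
google_allowed = parser.can_fetch('Googlebot', page)
scraper_allowed = parser.can_fetch('MyScraper', page)
print(google_allowed, scraper_allowed)
```

For a live site you would call `set_url(...)` and `read()` on the parser instead of `parse(...)`.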

Scraping with Python:
=====================

* scraping is all about HTML tags
* bad news: 
    - need to learn about tags
    - websites can be ugly

HTML
=====

* HyperText Markup Language

* standard for creating webpages

* HTML tags 
    - have angle brackets
    - typically come in pairs

This is an example of a minimal webpage defined with HTML tags. The root tag is `<html>`, followed by the `<head>` tag. This part of the page typically includes the title of the page and may also carry other meta information, such as the author or keywords that are important for search engines. The `<body>` tag marks the actual content of the page. You can play around with the `<h2>` tag by trying different heading levels, which range from 1 to 6.

In [64]:
htmlString = """<!DOCTYPE html>
<html>
  <head>
    <title>This is a title</title>
  </head>
  <body>
    <h2> Test </h2>
    <p>Hello world!</p>
  </body>
</html>"""

htmlOutput = HTML(htmlString)
htmlOutput

Useful Tags
===========

* heading
`<h1></h1> ... <h6></h6>`

* paragraph
`<p></p>` 

* line break
`<br>` 

* link with attribute

`<a href="http://www.example.com/">An example link</a>`


Scraping with Python:
=====================

* example of a beautifully simple webpage:

http://www.crummy.com/software/BeautifulSoup

Scraping with Python:
=====================

* good news: 
    - some browsers help
    - look for: inspect element
    - need only basic html
    
**Try 'Ctrl-Shift-I' in Chrome**

**Try 'Command-Option-I' in Safari**


Scraping with Python
==================

* different useful libraries:
    - urllib
    - beautifulsoup
    - pattern
    - soupy
    - LXML
    - Selenium
    - ...
    

The following cell defines a URL as a string and then reads the data from that URL using the `urllib` library. The print command shows that we got the whole HTML content of the page into the variable `source` (as a `bytes` object, hence the leading `b'`).

In [65]:
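import ssl
# NOTE: this switches off HTTPS certificate verification for urlopen;
# fine for a classroom demo, but avoid it in real code
ssl._create_default_https_context = ssl._create_unverified_context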

url = 'https://www.crummy.com/software/BeautifulSoup'
source = urlopen(url).read()
print(source)

b'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"\n"http://www.w3.org/TR/REC-html40/transitional.dtd">\n<html>\n<head>\n<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n<title>Beautiful Soup: We called him Tortoise because he taught us.</title>\n<link rev="made" href="mailto:leonardr@segfault.org">\n<link rel="stylesheet" type="text/css" href="/nb/themes/Default/nb.css">\n<meta name="Description" content="Beautiful Soup: a library designed for screen-scraping HTML and XML.">\n<meta name="generator" content="Markov Approximation 1.4 (module: leonardr)">\n<meta name="author" content="Leonard Richardson">\n</head>\n<body bgcolor="white" text="black" link="blue" vlink="660066" alink="red">\n<img align="right" src="10.1.jpg" width="250"><br />\n\n<p>You didn\'t write that awful page. You\'re just trying to get some\ndata out of it. Beautiful Soup is here to help. Since 2004, it\'s been\nsaving programmers hours or days of work on quick-turnaround\nscreen sc

In [67]:
type(source)

bytes

In [68]:
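# note: str() on a bytes object returns its repr (leading b' and escaped newlines);
# source.decode('utf-8') would return the actual text
source = str(source)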

In [69]:
source

'b\'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"\\n"http://www.w3.org/TR/REC-html40/transitional.dtd">\\n<html>\\n<head>\\n<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\\n<title>Beautiful Soup: We called him Tortoise because he taught us.</title>\\n<link rev="made" href="mailto:leonardr@segfault.org">\\n<link rel="stylesheet" type="text/css" href="/nb/themes/Default/nb.css">\\n<meta name="Description" content="Beautiful Soup: a library designed for screen-scraping HTML and XML.">\\n<meta name="generator" content="Markov Approximation 1.4 (module: leonardr)">\\n<meta name="author" content="Leonard Richardson">\\n</head>\\n<body bgcolor="white" text="black" link="blue" vlink="660066" alink="red">\\n<img align="right" src="10.1.jpg" width="250"><br />\\n\\n<p>You didn\\\'t write that awful page. You\\\'re just trying to get some\\ndata out of it. Beautiful Soup is here to help. Since 2004, it\\\'s been\\nsaving programmers hours or days of work on qu
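
Note the leading `b'` and the doubled backslashes in the output above: `str()` applied to a `bytes` object returns its printable representation rather than the decoded text. A small sketch of the difference, using a made-up byte string instead of the downloaded page:

```python
raw = b'<p>Beautiful\nSoup</p>'   # stand-in for urlopen(url).read()

as_repr = str(raw)             # the repr, including b'...' and a literal backslash-n
as_text = raw.decode('utf-8')  # the actual text, with a real newline

print(as_repr)
print(as_text)
```

Substring searches that span a line break behave differently on the two versions, so it matters which one you search.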

Quiz :
======

* Is the word 'Alice' mentioned on the Beautiful Soup homepage?
* How often does the word 'Soup' occur on the site?
    - hint: use `.count()`
* At what index does the substring 'alien video games' occur?
    - hint: use `.find()`

In [70]:
## is 'Alice' in source?
print('Alice' in source)

## count occurrences of 'Soup'
print(source.count('Soup'))

## find index of 'alien video games'
## (.find() returns -1 if the substring does not occur)
position = source.find('alien video games')
print(position)

## quick test to see the substring in the source variable
## you can slice strings like lists
print(source[position:position + 20])

## or the tidier version:
print(source[position:position + len('alien video games')])

False
45
-1
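
The `-1` above is how `.find()` reports that a substring is missing, and slicing from `-1` then yields an empty string. A short sketch on a made-up sentence:

```python
text = 'Beautiful Soup parses soup. Soup!'

has_alice = 'Alice' in text            # membership test
soup_count = text.count('Soup')        # case-sensitive, non-overlapping count
position = text.find('alien')          # -1 because 'alien' is absent
snippet = text[position:position + 5]  # slice from -1 to 4 is empty

print(has_alice, soup_count, position, repr(snippet))
```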




Beautiful Soup
==============

* designed to make your life easier
* many good functions for parsing html code

Some examples
=============


In [38]:
## get bs4 object (naming the parser explicitly avoids a bs4 warning)
soup = bs4.BeautifulSoup(source, 'html.parser')

## compare the two print statements
#print(soup)
#print(soup.prettify())

## show how to find all a tags
soup.find_all('a')

## Why does this not work?
#soup.find_all('Soup')

[<a href="bs4/download/"><h1>Beautiful Soup</h1></a>,
 <a href="#Download">Download</a>,
 <a>Documentation</a>,
 <a href="#HallOfFame">Hall of Fame</a>,
 <a href="https://code.launchpad.net/beautifulsoup">Source</a>,
 <a href="https://bazaar.launchpad.net/%7Eleonardr/beautifulsoup/bs4/view/head:/CHANGELOG">Changelog</a>,
 <a href="https://groups.google.com/forum/?fromgroups#!forum/beautifulsoup">Discussion group</a>,
 <a href="zine/">Zine</a>,
 <a href="https://tidelift.com/subscription/pkg/pypi-beautifulsoup4?utm_source=pypi-beautifulsoup4&amp;utm_medium=referral&amp;utm_campaign=website">Tidelift subscription</a>,
 <a href="zine/"><i>Tool Safety</i></a>,
 <a>the discussion\ngroup</a>,
 <a href="https://bugs.launchpad.net/beautifulsoup/">file it</a>,
 <a>lxml</a>,
 <a>html5lib</a>,
 <a href="bs4/doc/">Read more.</a>,
 <a name="Download"><h2>Download Beautiful Soup</h2></a>,
 <a href="bs4/download/">Beautiful Soup\n4.8.0</a>,
 <a>Here\'s\nthe Beautiful Soup 3 documentation.</a>,
 <a>3.

The last (commented-out) command returns only an empty list, because `Soup` is not an HTML tag. It is just a string that occurs in the webpage.
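
If the goal is to find text rather than tags, `find_all` can also match strings. A minimal sketch on a made-up snippet (the HTML below is an assumption, not the real page), using the `string` argument with a regular expression:

```python
import re
import bs4

html = '<p>Beautiful Soup is <a href="/doc">documented</a>. Soup!</p>'
demo = bs4.BeautifulSoup(html, 'html.parser')

no_such_tag = demo.find_all('Soup')                   # looks for <Soup> tags
text_hits = demo.find_all(string=re.compile('Soup'))  # looks inside text nodes

print(no_such_tag)
print(text_hits)
```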

Some examples
=============

In [71]:
## get attribute value from an element:
## find tag: this only returns the first occurrence, not all tags in the string
first_tag = soup.find('a')

## get attribute `href`
first_tag.get('href')

## get all links in the page
link_list = [l.get('href') for l in soup.find_all('a')]
link_list

['bs4/download/',
 '#Download',
 None,
 '#HallOfFame',
 'https://code.launchpad.net/beautifulsoup',
 'https://bazaar.launchpad.net/%7Eleonardr/beautifulsoup/bs4/view/head:/CHANGELOG',
 'https://groups.google.com/forum/?fromgroups#!forum/beautifulsoup',
 'zine/',
 'https://tidelift.com/subscription/pkg/pypi-beautifulsoup4?utm_source=pypi-beautifulsoup4&utm_medium=referral&utm_campaign=website',
 'zine/',
 None,
 'https://bugs.launchpad.net/beautifulsoup/',
 None,
 None,
 'bs4/doc/',
 None,
 'bs4/download/',
 None,
 None,
 None,
 'http://www.nytimes.com/2007/10/25/arts/design/25vide.html',
 None,
 'http://www.harrowell.org.uk/viktormap.html',
 None,
 'http://www2.ljworld.com/',
 None,
 'http://esrl.noaa.gov/gsd/fab/',
 None,
 None,
 None,
 None,
 'https://bugs.launchpad.net/beautifulsoup/',
 '/source/software/BeautifulSoup/index.bhtml',
 '/self/',
 '/self/contact.html',
 'http://creativecommons.org/licenses/by-sa/2.0/',
 'http://creativecommons.org/licenses/by-sa/2.0/',
 'http://www.crum

In [76]:
## filter all external links
# create an empty list to collect the valid links
external_links = []

# write a loop to filter the links
# if it starts with 'http' we are happy
for l in link_list:
    if l is not None and l[:4] == 'http':
        external_links.append(l)

# without the `l is not None` check this loop throws a TypeError!
# It says something about 'NoneType'
print(external_links)

['https://code.launchpad.net/beautifulsoup', 'https://bazaar.launchpad.net/%7Eleonardr/beautifulsoup/bs4/view/head:/CHANGELOG', 'https://groups.google.com/forum/?fromgroups#!forum/beautifulsoup', 'https://tidelift.com/subscription/pkg/pypi-beautifulsoup4?utm_source=pypi-beautifulsoup4&utm_medium=referral&utm_campaign=website', 'https://bugs.launchpad.net/beautifulsoup/', 'http://www.nytimes.com/2007/10/25/arts/design/25vide.html', 'http://www.harrowell.org.uk/viktormap.html', 'http://www2.ljworld.com/', 'http://esrl.noaa.gov/gsd/fab/', 'https://bugs.launchpad.net/beautifulsoup/', 'http://creativecommons.org/licenses/by-sa/2.0/', 'http://creativecommons.org/licenses/by-sa/2.0/', 'http://www.crummy.com/', 'http://www.crummy.com/software/', 'http://www.crummy.com/software/BeautifulSoup/']


In [75]:
# let's investigate. Have a close look at the link_list:
link_list

# Seems that there are None elements!
# Let's verify
#print(sum(l is None for l in link_list))

# So there are several elements in the list that are None!

['bs4/download/',
 '#Download',
 None,
 '#HallOfFame',
 'https://code.launchpad.net/beautifulsoup',
 'https://bazaar.launchpad.net/%7Eleonardr/beautifulsoup/bs4/view/head:/CHANGELOG',
 'https://groups.google.com/forum/?fromgroups#!forum/beautifulsoup',
 'zine/',
 'https://tidelift.com/subscription/pkg/pypi-beautifulsoup4?utm_source=pypi-beautifulsoup4&utm_medium=referral&utm_campaign=website',
 'zine/',
 None,
 'https://bugs.launchpad.net/beautifulsoup/',
 None,
 None,
 'bs4/doc/',
 None,
 'bs4/download/',
 None,
 None,
 None,
 'http://www.nytimes.com/2007/10/25/arts/design/25vide.html',
 None,
 'http://www.harrowell.org.uk/viktormap.html',
 None,
 'http://www2.ljworld.com/',
 None,
 'http://esrl.noaa.gov/gsd/fab/',
 None,
 None,
 None,
 None,
 'https://bugs.launchpad.net/beautifulsoup/',
 '/source/software/BeautifulSoup/index.bhtml',
 '/self/',
 '/self/contact.html',
 'http://creativecommons.org/licenses/by-sa/2.0/',
 'http://creativecommons.org/licenses/by-sa/2.0/',
 'http://www.crum

In [42]:
# Let's filter those objects out in the for loop
external_links = []

# write a loop to filter the links
# if it is not None and starts with 'http' we are happy
for l in link_list:
    if l is not None and l[:4] == 'http':
        external_links.append(l)
        
external_links

['https://code.launchpad.net/beautifulsoup',
 'https://bazaar.launchpad.net/%7Eleonardr/beautifulsoup/bs4/view/head:/CHANGELOG',
 'https://groups.google.com/forum/?fromgroups#!forum/beautifulsoup',
 'https://tidelift.com/subscription/pkg/pypi-beautifulsoup4?utm_source=pypi-beautifulsoup4&utm_medium=referral&utm_campaign=website',
 'https://bugs.launchpad.net/beautifulsoup/',
 'http://www.nytimes.com/2007/10/25/arts/design/25vide.html',
 'http://www.harrowell.org.uk/viktormap.html',
 'http://www2.ljworld.com/',
 'http://esrl.noaa.gov/gsd/fab/',
 'https://bugs.launchpad.net/beautifulsoup/',
 'http://creativecommons.org/licenses/by-sa/2.0/',
 'http://creativecommons.org/licenses/by-sa/2.0/',
 'http://www.crummy.com/',
 'http://www.crummy.com/software/',
 'http://www.crummy.com/software/BeautifulSoup/']

Note: The above `if` condition works because of short-circuit evaluation in Python. An `and` expression is already known to be false once its first operand is false, so the second operand is never evaluated. Thus a `None` entry in the list is never asked for its first four characters.
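
A minimal sketch of this behavior; the helper name `is_external` and the sample links are made up for illustration:

```python
def is_external(link):
    # the right-hand side is only evaluated when link is not None
    return link is not None and link[:4] == 'http'

results = [is_external(l) for l in ['http://www.example.com/', None, '#Download']]
print(results)

# without the guard, slicing None raises a TypeError
missing = None
try:
    missing[:4] == 'http'
except TypeError as err:
    error_message = str(err)
    print(error_message)
```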

In [43]:
# and we can put this in a list comprehension as well, it almost reads like 
# a sentence.

[l for l in link_list if l is not None and l.startswith('http')]

['https://code.launchpad.net/beautifulsoup',
 'https://bazaar.launchpad.net/%7Eleonardr/beautifulsoup/bs4/view/head:/CHANGELOG',
 'https://groups.google.com/forum/?fromgroups#!forum/beautifulsoup',
 'https://tidelift.com/subscription/pkg/pypi-beautifulsoup4?utm_source=pypi-beautifulsoup4&utm_medium=referral&utm_campaign=website',
 'https://bugs.launchpad.net/beautifulsoup/',
 'http://www.nytimes.com/2007/10/25/arts/design/25vide.html',
 'http://www.harrowell.org.uk/viktormap.html',
 'http://www2.ljworld.com/',
 'http://esrl.noaa.gov/gsd/fab/',
 'https://bugs.launchpad.net/beautifulsoup/',
 'http://creativecommons.org/licenses/by-sa/2.0/',
 'http://creativecommons.org/licenses/by-sa/2.0/',
 'http://www.crummy.com/',
 'http://www.crummy.com/software/',
 'http://www.crummy.com/software/BeautifulSoup/']

Parsing the Tree
================



In [44]:
# defining `s`: the example page without any line breaks
s = """<!DOCTYPE html><html><head><title>This is a title</title></head><body><h3> Test </h3><p>Hello world!</p></body></html>"""
## get bs4 object
tree = bs4.BeautifulSoup(s, 'html.parser')

## get html root node
root_node = tree.html

## get head from root using contents
head = root_node.contents[0]

## get body from root
body = root_node.contents[1]

## could directly access body
tree.body

<body><h3> Test </h3><p>Hello world!</p></body>

In [78]:
tree.html.head.title.text

'This is a title'

Quiz:
=====

* Find the `h3` tag by parsing the tree starting at `body`
* Create a list of all __Hall of Fame__ entries listed on the Beautiful Soup webpage
    - hint: it is the only unordered list in the page (tag `ul`)


In [79]:
## get the first h3 tag on the Beautiful Soup page
## (for the small example tree, tree.body.h3 starts the search at body)
soup.find('h3')

<h3>Beautiful Soup 3</h3>
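
For the first quiz item, the small example tree can also be walked explicitly from `body`. A sketch, assuming the same one-line HTML string as in the "Parsing the Tree" cell:

```python
import bs4

s = ('<!DOCTYPE html><html><head><title>This is a title</title></head>'
     '<body><h3> Test </h3><p>Hello world!</p></body></html>')
tree = bs4.BeautifulSoup(s, 'html.parser')

body = tree.body
child_names = [child.name for child in body.children]  # direct children of body
h3 = body.contents[0]                                   # first child of body

print(child_names)
print(h3, h3.parent.name)
```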

In [82]:
## use ul as entry point
soup.find_all('ul')
## get hall of fame list from entry point
[li.text for li in soup.find('ul').find_all('li')]


['"Movable\\n Type", a work of digital art on display in the lobby of the New\\n York Times building, uses Beautiful Soup to scrape news feeds.\\n\\n',
 "Reddit uses Beautiful Soup to parse\\na page that\\'s been linked to and find a representative image.\\n\\n",
 'Alexander Harrowell uses Beautiful Soup to track the business\\n activities of an arms merchant.\\n\\n',
 'The developers of Python itself used Beautiful Soup to migrate the Python\\nbug tracker from Sourceforge to Roundup.\\n\\n',
 'The Lawrence Journal-World\\nuses Beautiful Soup to gather\\nstatewide election results.\\n\\n',
 'The NOAA\\\'s Forecast\\nApplications Branch uses Beautiful Soup in TopoGrabber, a script for\\ndownloading "high resolution USGS datasets."\\n\\n']

In [99]:
soup.ul

<ul>\n\n<li><a href="http://www.nytimes.com/2007/10/25/arts/design/25vide.html">"Movable\n Type"</a>, a work of digital art on display in the lobby of the New\n York Times building, uses Beautiful Soup to scrape news feeds.\n\n</li><li>Reddit uses Beautiful Soup to <a>parse\na page that\'s been linked to and find a representative image</a>.\n\n</li><li>Alexander Harrowell uses Beautiful Soup to <a href="http://www.harrowell.org.uk/viktormap.html">track the business\n activities</a> of an arms merchant.\n\n</li><li>The developers of Python itself used Beautiful Soup to <a>migrate the Python\nbug tracker from Sourceforge to Roundup</a>.\n\n</li><li>The <a href="http://www2.ljworld.com/">Lawrence Journal-World</a>\nuses Beautiful Soup to <a>gather\nstatewide election results</a>.\n\n</li><li>The <a href="http://esrl.noaa.gov/gsd/fab/">NOAA\'s Forecast\nApplications Branch</a> uses Beautiful Soup in <a>TopoGrabber</a>, a script for\ndownloading "high resolution USGS datasets."\n\n</li></ul

Advanced Example
===============


Scraping data science skills
=============================

- What skills are in demand for data scientists?
- Should we have a lecture on Spark or only on MapReduce?

We want to scrape the information from job advertisements for data scientists on indeed.com.
Let's scrape and find out!

In [48]:
# Fixed url for job postings containing data scientist
url = 'https://www.indeed.com/jobs?q=data+scientist&l=San+Francisco+Bay+Area%2C+CA'
# read the website
source = urlopen(url).read()
# parse html code
bs_tree = bs4.BeautifulSoup(source, 'html.parser')

In [96]:
job_count_string = bs_tree.find(id = 'searchCountPages')
job_count_string=job_count_string.text
float(job_count_string.split()[3].replace(',',''))

2682.0

In [84]:
# see how many job postings we found
job_count_string = bs_tree.find(id = 'searchCountPages').contents[0]
job_count_string = job_count_string.split()[-2]
print("Search yielded %s hits." % (job_count_string))

# note that job_count_string so far is still a string,
# not an integer, and the , separator prevents
# us from just casting it to int

job_count_digits = [int(d) for d in job_count_string if d.isdigit()]
job_count = np.sum([digit*(10**exponent) for digit, exponent in 
                    zip(job_count_digits[::-1], range(len(job_count_digits)))])

print (job_count)

Search yielded 2,682 hits.
2682
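
A shorter route to the same number, sketched here as an alternative to the digit arithmetic above: strip the thousands separator and cast. The string is the value printed above.

```python
job_count_string = '2,682'

# remove the thousands separator, then cast to int
job_count = int(job_count_string.replace(',', ''))

print(job_count)
```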


In [50]:
job_count_string = bs_tree.find(id = 'searchCount').text
job_count_string.split()[3]

'2,682'

In [97]:
# The website is only listing 10 results per page, 
# so we need to scrape them page after page
num_pages = int(np.ceil(job_count/10.0))

base_url = 'http://www.indeed.com'
job_links = []
for i in range(1): #do range(num_pages) if you want them all
    if i%10==0:
        print (num_pages-i)
    url = 'https://www.indeed.com/jobs?q=data+scientist&l=San+Francisco+Bay+Area%2C+CA&start=' + str(i*10)
    print ("opening {}".format(url))
    html_page = urlopen(url).read() 
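    bs_tree = bs4.BeautifulSoup(html_page, 'html.parser')
    job_link_area = bs_tree.find(id = 'resultsCol')
    job_postings = job_link_area.find_all("div")
    # NOTE: this cell never appends anything to job_links,
    # so the count printed below stays at 0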
    

print ("We found a lot of jobs: ", len(job_links))

269
opening https://www.indeed.com/jobs?q=data+scientist&l=San+Francisco+Bay+Area%2C+CA&start=0
We found a lot of jobs:  0
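
The page URLs in the loop above follow a simple pattern: the `start` parameter advances in steps of 10. A sketch that builds them with `urllib.parse.urlencode`; the helper name `page_urls` and its parameters are made up for illustration, and nothing is downloaded:

```python
import math
from urllib.parse import urlencode

def page_urls(job_count, per_page=10, limit=None):
    """Build the search URLs for all result pages (hypothetical helper)."""
    num_pages = math.ceil(job_count / per_page)
    if limit is not None:
        num_pages = min(num_pages, limit)
    base = 'https://www.indeed.com/jobs'
    urls = []
    for i in range(num_pages):
        # urlencode turns spaces into '+' and the comma into '%2C'
        query = urlencode({'q': 'data scientist',
                           'l': 'San Francisco Bay Area, CA',
                           'start': i * per_page})
        urls.append(base + '?' + query)
    return urls

urls = page_urls(2682)
print(len(urls))
print(urls[0])
```

When fetching many pages, pausing between requests (for example with `time.sleep(1)`) keeps the load on the site low.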


Another Example
================
https://github.com/kjam/python-web-scraping-tutorial




Getting Data with an API
=========================

* API: application programming interface
* some sites try to make your life easier
* Twitter, New York Times, IMDb, Rotten Tomatoes, Yelp, ...

API keys
=========

* required for data access
* identifies application (you)
* monitors usage
* limits rates

JSON
======

* JavaScript Object Notation
* human readable
* transmit attribute-value pairs

In [53]:
a = {'a': 1, 'b':2}
s = json.dumps(a)
a2 = json.loads(s)

## a is a dictionary
print (a)
## vs s is a string containing a in JSON encoding
print (s)
## reading it back gives a dictionary again (the keys are str in Python 3)
print (a2)

{'a': 1, 'b': 2}
{"a": 1, "b": 2}
{'a': 1, 'b': 2}
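
The round trip is not always exact, because JSON has fewer types than Python. A short sketch with made-up data: tuples come back as lists and non-string keys come back as strings.

```python
import json

original = {1: (1, 2), 'name': 'soup'}
encoded = json.dumps(original)   # Python object -> JSON string
decoded = json.loads(encoded)    # JSON string -> Python object

print(encoded)
print(decoded)
```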


Twitter Example:
================

* API a bit more complicated
* libraries make life easier
* python-twitter

https://github.com/bear/python-twitter

What we are going to do is scrape Joe's Twitter account and then filter it for the interesting tweets, defining interesting as tweets that have been retweeted more than 20 times.


In [56]:
!pip install python-twitter



In [57]:
import twitter

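## define the necessary keys
## (left blank on purpose: paste your own keys as strings, otherwise this cell raises a SyntaxError)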
cKey = 
cSecret = 
aKey = 
aSecret = 

## create the api object with the python-twitter library
api = twitter.Api(consumer_key=cKey, consumer_secret=cSecret, access_token_key=aKey, access_token_secret=aSecret)


SyntaxError: invalid syntax (<ipython-input-57-f82de1d1ee62>, line 4)

In [58]:
## get the user timeline for the given screen_name
twitter_statuses = api.GetUserTimeline(screen_name = 'gutyril')

## create a data frame
## first get a list of pandas Series or dicts
pdSeriesList = [pd.Series(t.AsDict()) for t in twitter_statuses]

## then create the data frame
data = pd.DataFrame(pdSeriesList)

data.head(10)

NameError: name 'api' is not defined

In [59]:
## filter tweets with enough retweet_count
maybe_interesting = data[data.retweet_count>20]

## get the text of these tweets
tweet_text = maybe_interesting.text

## print them out
text = tweet_text.values

for t in text:
    print ('######')
    print (t)

NameError: name 'data' is not defined

In [100]:
import requests

In [101]:
response = requests.get('http://sitl.diputados.gob.mx/LXIV_leg/curricula.php?dipt=484')

In [102]:
response.status_code

200

In [105]:
# str() of the raw bytes shows their repr; response.text holds the decoded page
str(response.content)

'b\'\\n\\n\\n \\t\\n<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">\\n<html>\\n<head>\\n<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">\\n<script src="./javascripts/anim_imag.js" type="text/javascript" language="JavaScript"></script>\\n<link rel="stylesheet" type="text/css" href="./include/styles_diputados.css" />\\n<link rel="stylesheet" type="text/css" href="./include/formatonvo.css" />\\n<title>Curricula LXIV</title>\\n<style>\\n.fondoimg {\\nbackground:url(images/fondo_fot_5.png) no-repeat center fixed;\\n-webkit-background-size: cover;\\n-moz-background-size: cover;\\n-o-background-size: cover;\\nbackground-size: cover;\\n}\\n.fondoimg2 {\\nbackground-image: url(images/fondo_fot_7.png);\\n  -webkit-background-size: 100% 100%;           /* Safari 3.0 */\\n     -moz-background-size: 100% 100%;           /* Gecko 1.9.2 (Firefox 3.6) */\\n       -o-background-size: 100% 100%;           /* Opera 9.5 */\\n  