# Introduction to Python and Morph.io (for scraping)

This tutorial builds on some basic coding concepts and introduces Morph.io. By the end you should be able to:

* Use GitHub to edit Python files
* Use Morph.io to run Python code hosted on GitHub

Let's get started.


## Get started with Morph.io

1. [Create an account on GitHub](https://github.com/) if you haven't got one already, and [sign in to Morph.io using your GitHub account](https://morph.io/users/auth/github)
2. [Click **New scraper**](https://morph.io/scrapers/new) on the menu at the top of Morph.io. You will be taken to a new page asking you to specify more details
3. On the dropdown menu for *Language*, select **Python**. Give your scraper a name in the next box - something like 'startingtocode' (no spaces), and in the final box write 'none' - this isn't a scraper yet: we're just using Morph.io as a place to learn code.
4. Click **Create scraper**

It will take Morph.io a few moments to create the files for your scraper (the files are being created on GitHub). 

When it has finished, you will be taken to a new page for the scraper. Look on the right where it says *Scraper code*. There should be two links, shown as `startingtocode / scraper.py`. Both lead to GitHub, where the code is now hosted: `scraper.py` links to the file containing the code itself, and `startingtocode` links to the repository containing that file.

Open the link to `scraper.py` in a separate tab or window, but also keep your Morph.io page for this scraper open in another tab or window - you will need to edit the code on GitHub, and run it to see the results on Morph.io.

Now we're ready to start.

## Introducing the template code

When you create a new scraper on Morph.io, it fills the scraper with some template code, shown below.

Each line of code in the template begins with a hash symbol: `#`. The hash symbol is most commonly used in two ways:

* Firstly, as a way of creating **comments** in Python code: any line starting with a `#` does not do anything, so the hash symbol allows you to add comments which are not treated as working code.
* Secondly, as a way of *disabling* code - what's called *commenting out* code. Rather than delete an entire line of code, it is easier to add a `#` at the front to turn it 'off' to test what happens, so you can always turn it back 'on' again quickly by removing the `#`.
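Both uses can be seen in a few lines of Python. This is a minimal sketch rather than part of the Morph.io template, and the variable name `greeting` is made up for illustration:

```python
# This whole line is a comment: Python ignores it.
greeting = "Hello"  # A comment can also follow working code on the same line.

# The next line is commented out, so it never runs:
# greeting = "Goodbye"

print(greeting)  # prints: Hello
```

Removing the `#` from the commented-out line would turn it back 'on', and the code would print `Goodbye` instead.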

In the template code below generated by Morph.io, the *entire* code is commented out. The idea is that you can **uncomment** the sections you want to use in your own code, saving you time writing scraping code from scratch. We'll come back to this later.

```python
# This is a template for a Python scraper on morph.io (https://morph.io)
# including some code snippets below that you should find helpful

# import scraperwiki
# import lxml.html
#
# # Read in a page
# html = scraperwiki.scrape("http://foo.com")
#
# # Find something on the page using css selectors
# root = lxml.html.fromstring(html)
# root.cssselect("div[align='left']")
#
# # Write out to the sqlite database using scraperwiki library
# scraperwiki.sqlite.save(unique_keys=['name'], data={"name": "susan", "occupation": "software developer"})
#
# # An arbitrary query against the database
# scraperwiki.sql.select("* from data where 'name'='peter'")

# You don't have to do things with the ScraperWiki and lxml libraries.
# You can use whatever libraries you want: https://morph.io/documentation/python
# All that matters is that your final data is written to an SQLite database
# called "data.sqlite" in the current working directory which has at least a table
# called "data".
```

## Libraries in Morph.io

Make sure you are viewing `scraper.py` on GitHub, and click the edit button to make some changes.

Uncomment the two lines that start with `import` so the code looks like the example below.

These two lines bring in two **libraries** to Morph.io: 

* `scraperwiki` is a library which has useful functions for scraping webpages and storing the results in a database
* lxml.html is a library which is useful for *parsing* HTML webpages - i.e. drilling down to particular pieces of information you want.

```python
# This is a template for a Python scraper on morph.io (https://morph.io)
# including some code snippets below that you should find helpful

import scraperwiki
import lxml.html
#
# # Read in a page
# html = scraperwiki.scrape("http://foo.com")
#
# # Find something on the page using css selectors
# root = lxml.html.fromstring(html)
# root.cssselect("div[align='left']")
#
# # Write out to the sqlite database using scraperwiki library
# scraperwiki.sqlite.save(unique_keys=['name'], data={"name": "susan", "occupation": "software developer"})
#
# # An arbitrary query against the database
# scraperwiki.sql.select("* from data where 'name'='peter'")

# You don't have to do things with the ScraperWiki and lxml libraries.
# You can use whatever libraries you want: https://morph.io/documentation/python
# All that matters is that your final data is written to an SQLite database
# called "data.sqlite" in the current working directory which has at least a table
# called "data".
```

Next, uncomment the line `html = scraperwiki.scrape("http://foo.com")`. 

This line refers to a URL - foo.com - so it's worth checking that site in another window to see what's there.

The URL is in parentheses, which means it is being passed to a function as an ingredient (an **argument**). The function here is `scrape()`. Specifically, it is `scraperwiki.scrape()`: the `scraperwiki.` prefix means that the function is part of the **scraperwiki library**.
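The same pattern of `library.function(ingredient)` applies to any library. As a small illustration that avoids scraperwiki altogether, the sketch below uses Python's built-in `len()` function and the standard `math` library; the variable names `url`, `length` and `side` are made up for this example:

```python
import math  # bring in the math library, just as the template imports scraperwiki

url = "http://foo.com"

# len() is built into Python; the ingredient in parentheses is the variable url
length = len(url)
print(length)  # prints: 14

# sqrt() belongs to the math library, so it is written as math.sqrt()
side = math.sqrt(16)
print(side)  # prints: 4.0
```

In both cases the result of the function is stored in a new variable, which is what happens with `html` in the scraper.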

When using a library it's always useful to check the **documentation** for that library - [here's the documentation for ScraperWiki](https://classic.scraperwiki.com/docs/python/), or at least its 'Classic' version, which is the one used by Morph.io. That page links to [where the documentation is now hosted, on GitHub](https://github.com/scraperwiki/code-scraper-in-browser-tool/wiki).

The `scrape` function grabs the contents of the given URL and stores it in the new variable `html`:

```python
# This is a template for a Python scraper on morph.io (https://morph.io)
# including some code snippets below that you should find helpful

import scraperwiki
import lxml.html
#
# # Read in a page
html = scraperwiki.scrape("http://foo.com")
#
# # Find something on the page using css selectors
# root = lxml.html.fromstring(html)
# root.cssselect("div[align='left']")
#
# # Write out to the sqlite database using scraperwiki library
# scraperwiki.sqlite.save(unique_keys=['name'], data={"name": "susan", "occupation": "software developer"})
#
# # An arbitrary query against the database
# scraperwiki.sql.select("* from data where 'name'='peter'")

# You don't have to do things with the ScraperWiki and lxml libraries.
# You can use whatever libraries you want: https://morph.io/documentation/python
# All that matters is that your final data is written to an SQLite database
# called "data.sqlite" in the current working directory which has at least a table
# called "data".
```
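To get a feel for what 'grabbing the contents of a URL' involves, here is a rough stand-in written with Python's standard `urllib` library. It is only a sketch, assuming Python 3, and is not how scraperwiki itself is implemented; the helper name `fetch` is hypothetical:

```python
from urllib.request import urlopen


def fetch(url):
    """Download whatever is at `url` and return it as text (hypothetical helper)."""
    with urlopen(url) as response:
        # The response arrives as raw bytes, so decode it into a string
        return response.read().decode("utf-8", errors="replace")


# A data: URL carries its content inside the URL itself,
# so this example runs without an internet connection.
html = fetch("data:text/html,<p>Hello</p>")
print(html)
```

Passing a real web address such as `"http://foo.com"` to `fetch()` would work the same way, but needs a network connection and depends on what that site returns.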

Now **commit** your changes (GitHub's version of saving), and switch back to the scraper in Morph.io. Run the scraper.

The next section of code *converts* the `html` variable into another new variable called `root`, and drills down further into that using something called `cssselect`, which uses **CSS selectors** to grab very specific pieces of information from the page. We'll talk about this in class, but in the meantime search for more information about CSS selectors and think about how they could be used in scraping.

```python
# This is a template for a Python scraper on morph.io (https://morph.io)
# including some code snippets below that you should find helpful

import scraperwiki
import lxml.html
#
# # Read in a page
html = scraperwiki.scrape("http://foo.com")
#
# # Find something on the page using css selectors
root = lxml.html.fromstring(html)
root.cssselect("div[align='left']")
#
# # Write out to the sqlite database using scraperwiki library
# scraperwiki.sqlite.save(unique_keys=['name'], data={"name": "susan", "occupation": "software developer"})
#
# # An arbitrary query against the database
# scraperwiki.sql.select("* from data where 'name'='peter'")

# You don't have to do things with the ScraperWiki and lxml libraries.
# You can use whatever libraries you want: https://morph.io/documentation/python
# All that matters is that your final data is written to an SQLite database
# called "data.sqlite" in the current working directory which has at least a table
# called "data".
```
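The selector `div[align='left']` means 'every `<div>` tag whose `align` attribute is set to `left`'. To make that concrete, the sketch below does the same kind of matching by hand using Python's built-in `html.parser` module instead of lxml. It is an illustration only: the class name `LeftAlignedDivs` and the `sample_html` string are invented for this example, and it is simpler than a real CSS selector engine (for instance, a matching `<div>` nested inside another match is not listed separately).

```python
from html.parser import HTMLParser


class LeftAlignedDivs(HTMLParser):
    """Collect the text inside every <div align="left"> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0     # greater than 0 while we are inside a matching div
        self.matches = []  # the text found in each matching div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            self.depth += 1  # a div nested inside a match
        elif dict(attrs).get("align") == "left":
            self.depth = 1   # a new match starts here
            self.matches.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.matches[-1] += data


sample_html = """
<div align="left">First match</div>
<div align="right">Ignored</div>
<div align="left">Second <b>match</b></div>
"""

parser = LeftAlignedDivs()
parser.feed(sample_html)
left_divs = [text.strip() for text in parser.matches]
print(left_divs)  # prints: ['First match', 'Second match']
```

In the scraper itself, `root.cssselect("div[align='left']")` does this matching for you and returns a list of the matching elements, which is why a single short selector can replace all of the code above.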