In [None]:
%xmode plain

# Anatomy of an error message

Error messages in Python can look obscure and unhelpful, but they are a font of information about what is wrong with your code.

Let's see an example.

We will write a function to output an arbitrary column of a file.

In [None]:
%%bash
# have a look at what's in the file


In [None]:
def read_col(filename, column_number):
    ''' A function that reads one column from a file and returns it
    as a list'''
    
    # First open the file 
    
    # Then loop over each line, split it into columns
    # and add the columns we want to a list          
        
    # Finally return this list

# Run our function

The resulting error message contains several parts:

![](error1.png)

A **`SyntaxError`** is a special type of error: it means that what you've written is not valid Python, so Python can't even start running it.

There are many types of error. Here are some common ones:

* **`NameError`**: You've used a name that doesn't exist.
* **`TypeError`**: You are trying to do something with a value whose type doesn't support that operation.
* **`ValueError`**: The value you are using has the right type, but doesn't make sense for that operation.
* **`IndexError`**/**`KeyError`**: You are trying to access a member of a list (by position) or a dictionary (by key) that doesn't exist.
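
To see each of these in action, the sketch below runs a few deliberately broken one-line snippets and reports which error each one raises. The helper `error_name` and the example values are made up for illustration. A `SyntaxError` normally stops a cell before it runs at all, so here it is triggered by asking the built-in `compile()` to check some text that is missing its closing bracket.

```python
def error_name(snippet):
    """Run snippet() and return the name of the error it raises, or None."""
    try:
        snippet()
    except Exception as err:
        return type(err).__name__
    return None

# Each deliberately broken snippet is labelled with the error we expect it to raise
examples = {
    # compile() checks that the text is valid Python without running it;
    # here the closing bracket is missing
    "SyntaxError": lambda: compile('read_col("my_file", "3"', "<example>", "exec"),
    "NameError": lambda: no_such_name + 1,      # this name was never defined
    "TypeError": lambda: "3" + 4,               # can't add a string and an integer
    "ValueError": lambda: int("three"),         # int() can't turn this text into a number
    "IndexError": lambda: [10, 20, 30][5],      # the list has no position 5
    "KeyError": lambda: {"a": 1}["b"],          # the dictionary has no key "b"
}

for expected, snippet in examples.items():
    print(f"{expected:12} raised {error_name(snippet)}")
```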

In [None]:
def read_col(filename, column_number):
    ''' A function that reads one column from a file and returns it
    as a list'''
    
    # First open the file
    fh = open(filename)
    
    datalist = []
    
    # Then loop over each line, split it into columns
    # and add the columns we want to a list
    for line in fh.readlines():    
        columns = line.split()
        datalist.append(columns[column_number])
        
    # Finally return this list
    return datalist

# Run our function (the closing bracket is missing, so this is not valid Python)
read_col("my_file", "3"

![](error2.png)

* We can see that this error is a <font color="green">**TypeError**</font>.

* This means that we've used a value of the wrong type for an operation.

* The Location part here is longer. It is called the **"traceback"**. <br> It shows us not just what the error is, but how we got there.

![](error3.png)

This error is caused by the line

    datalist.append(columns[column_number])
    
which is **`line 14`** in the file **`<ipython-input-78b9696184f0>`**.

It also tells us this is in the function **`read_col`**.



The traceback tells us how we ended up here:

![](error4.PNG)

It tells us that we are in `read_col` because `read_col` was called on `line 20` of the file `<ipython-input-78b9696184f0>` with the parameters `("my_file", "3")`.
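
The same "how did we get here" information can be reproduced with a small example. In the sketch below the two functions, `read_value` and `first_and_third`, are hypothetical stand-ins for `read_col`. The standard `traceback` module prints the traceback and lists the frames from the outermost call down to the line where the error actually happened, matching the "most recent call last" order Python uses.

```python
import traceback

def read_value(values, position):
    # The error is raised here: list positions must be integers, not strings
    return values[position]

def first_and_third(values):
    # This function calls read_value, so it sits one level up in the traceback
    return read_value(values, 0), read_value(values, "2")

call_chain = []
try:
    first_and_third([1, 2, 3])
except TypeError as err:
    traceback.print_exc()  # prints a full traceback, like the one in the screenshot
    # extract_tb lists the frames from the outermost call to where the error happened
    call_chain = [frame.name for frame in traceback.extract_tb(err.__traceback__)]

print(call_chain)
```

The last entry is the function where the error occurred, and each entry before it is the caller that led there.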

In [None]:
def read_col(filename, column_number):
    ''' A function that reads one column from a file and returns it
    as a list'''
    
    # First open the file
    fh = open(filename)
    
    datalist = []
    
    # Then loop over each line, split it into columns
    # and add the columns we want to a list
    for line in fh.readlines():    
        columns = line.split()
        datalist.append(columns[column_number])
        
    # Finally return this list
    return datalist

# Run our function
read_col("my_file", "3")

The error was caused by asking for column `"3"` of the file when `"3"` is a string.
But when you get an entry from a list, the position must be an integer, not a string. Python also counts list positions from zero, so position `3` would actually be the fourth column. The correct way to get the third column is to ask for `2`:

    read_col("my_file", 2)
    
rather than 

    read_col("my_file", "2")

Of course, we actually wanted the second column, not the third, so the correct call is

    read_col("my_file", 1)
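
Here is a small illustration of both points, using one made-up line of a whitespace-separated file split the same way `read_col` does. It also shows that a column number which arrives as text (for example typed by a user) can be converted with `int()` before being used as a list position.

```python
# One made-up line of a whitespace-separated file, split into columns as read_col does
line = "gene1  0.5  12  chr3\n"
columns = line.split()
print(columns)          # ['gene1', '0.5', '12', 'chr3']
print(columns[1])       # position 1 is the second column: 0.5

# A column number read from text is a string; int() turns it into an integer position
column_number = int("2")
print(columns[column_number])   # position 2 is the third column: 12
```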
    

# Programmatic Web Access

If you know a _little_ about how HTML and the web work, you can get a long way in automating data retrieval.


What happens when I type an address in a web-browser?

![](browser_request.PNG)

A web address is a URL: a Uniform Resource Locator.

A request to send the file `index.html` using the `http` protocol is sent through the internet to the computer `hactar.shef.ac.uk`. 

`hactar` is running a program called a "web server" that is always listening for such requests.

On `hactar` there will be a folder something like `/usr/apache/files/` with a file `index.html` that it will send back to you.
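
The pieces of the address described above can be pulled apart with Python's standard `urllib.parse` module. The sketch only splits the text of the URL; it does not contact the server.

```python
from urllib.parse import urlparse

parts = urlparse("http://hactar.shef.ac.uk/index.html")
print(parts.scheme)   # http               -> the protocol to use
print(parts.netloc)   # hactar.shef.ac.uk  -> the computer to send the request to
print(parts.path)     # /index.html        -> the file being requested
```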

What is in that file?

We can see it by using "view source" in the browser.

Or we can download that file and take a look:

In [None]:
%%bash


The file starts with

    <h1>Hello this is 
    the index file</h1>
    
The things in `<>` are called **tags**. 

`<h1>` is the heading tag. 

Most tags have a matching closing tag with a `/` to end them.

The things in between `<h1>` and `</h1>` form a heading (displayed big, bold etc.)

The result looks like:

<h1>Hello this is the index file</h1>

The next bit is a link that looks like:

      You can get more info from
      <a href="part1.html">Part one of info</a>
      
Here we have an `<a ....>` tag. This means the next bit is a link.

Within the `<a ...>` tag there is `href="part1.html"`. This says "fetch `part1.html` when this link is selected".

Then we have the text we would like to appear in the browser. 

Finally, we end the tag with `</a>`

The result looks like:

You can get more info from <a href="https://hactar.shef.ac.uk">Part one of info</a>
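
Because tags follow these simple rules, a program can find them too. The sketch below feeds the start of `index.html` (as quoted above) to the standard library's `html.parser.HTMLParser` and collects the heading text and the link targets. The class name `LinkFinder` is our own, chosen for illustration.

```python
from html.parser import HTMLParser

# The start of index.html as quoted above
page = """<h1>Hello this is
the index file</h1>
You can get more info from
<a href="part1.html">Part one of info</a>
"""

class LinkFinder(HTMLParser):
    """Collects the href of every <a> tag and the text inside <h1> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.headings = []
        self.in_h1 = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, e.g. [("href", "part1.html")]
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href")
        elif tag == "h1":
            self.in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_h1 = False

    def handle_data(self, data):
        # Only keep text that sits between <h1> and </h1>, with line breaks tidied up
        if self.in_h1:
            self.headings.append(" ".join(data.split()))

finder = LinkFinder()
finder.feed(page)
finder.close()
print(finder.headings)  # ['Hello this is the index file']
print(finder.links)     # ['part1.html']
```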

## Exercise

* On iceberg, use `nedit` to create a file that looks like `index.html`.
* On iceberg start firefox with:

      $ firefox &
  
  Firefox will now be running on iceberg, but showing its window on your desktop.
* Go to File > Open and open the file.
  The URL bar will show an address starting with `file://`, followed by the path to your file.

Difference:
 * A local file is actually stored on your disk (and loaded using the `file` protocol).
 * A served HTML file is stored on a web server (and transferred to you with the `http` protocol).

## Dynamic Web

Sometimes the file has to be created before it can be sent to you. 

For example:

Weather forecast: the page needs to be generated with the forecast for right now, for where you are.


![](dynamic_request2.PNG)

CGI (Common Gateway Interface) is a mechanism for passing extra information (arguments/parameters) to a program on the web server.

The server runs a script (which might be written in Python), passing it the parameters given after the `?` in the URL.

The script creates an HTML file, which is sent back to the requester using `http`.

There are many other ways to pass information to a web server besides CGI.
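
The parameters after the `?` are conventionally written as `name=value` pairs separated by `&`. The sketch below splits a made-up weather-forecast URL (not a real service) into the script path and its parameters using the standard `urllib.parse` module.

```python
from urllib.parse import urlparse, parse_qs

# A hypothetical weather-forecast URL (not a real service)
url = "http://weather.example.com/cgi-bin/forecast.py?city=Sheffield&days=3"

parts = urlparse(url)
print(parts.path)             # /cgi-bin/forecast.py  -> the script the server would run
print(parts.query)            # city=Sheffield&days=3 -> everything after the ?
print(parse_qs(parts.query))  # {'city': ['Sheffield'], 'days': ['3']}
```

`parse_qs` returns a list for each name because the same parameter can appear more than once in a query string.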


### Exercise

In the exercises for today's lecture you will be asked to write a script that fetches a `PDB` file.

(`pdb` files contain the 3D coordinates of the atoms in protein structures. They can be opened in PyMOL, RasMol etc.)

The server `http://oca.weizmann.ac.il/oca-bin/send-pdb` will send this.

It takes an `id` parameter, which is the ID of the protein you want.

eg:

In [None]:
%%bash


### Program to access arbitrary page

In [None]:
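#!/usr/bin/env python3

# import modules
import urllib.request

# set requested url (an example address from these notes; change it to the page you want)
url = "http://hactar.shef.ac.uk/index.html"

# open url and read data from it (the server sends bytes, so decode them to text)
with urllib.request.urlopen(url) as response:
    data = response.read().decode("utf-8")

# output each line
for line in data.splitlines():
    print(line)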


Your program should be runnable with something like:

    get_pdb.py 1lys > 1lys.pdb
    
You will have to work out how to take the command line argument and use it to build a URL that will fetch the PDB file. 
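
One possible starting point for building that URL is sketched below. It assumes the server takes the protein ID as the `id` parameter, as described above; `build_pdb_url` is a hypothetical helper name. It only constructs and prints the URL; fetching it would be done as in the program above.

```python
import sys
from urllib.parse import urlencode

# Server address from the exercise; it takes the protein ID as the "id" parameter
BASE_URL = "http://oca.weizmann.ac.il/oca-bin/send-pdb"

def build_pdb_url(pdb_id):
    """Return the URL that asks the server for one PDB entry."""
    # urlencode turns {"id": "1lys"} into "id=1lys" (and escapes unusual characters)
    return BASE_URL + "?" + urlencode({"id": pdb_id})

if __name__ == "__main__":
    # sys.argv[0] is the script name; sys.argv[1] is the first argument, e.g. 1lys
    if len(sys.argv) > 1:
        print(build_pdb_url(sys.argv[1]))
```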

### Exercise: Getting the temperature

Go to the page `143.167.65.160`

This is a temperature sensor in the department's NMR labs.

One of the NMR Linux servers gets the temperature from this every minute and records it.

**Exercise**: Write a script that reports the temperature in the NMR lab.

## Web-scraping vs web-services

*Web-scraping*: informal retrieval of data by piecing together URLs and sending requests.
* Usually returns `html` designed to be viewed in a browser rather than read by a program
* Often discouraged so as not to overload servers.
* Flooding a server with requests is essentially how DDoS (distributed denial-of-service) attacks work
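
One simple way to avoid overloading a server is to space out your requests. The sketch below uses a hypothetical helper, `fetch_politely` (our own name, not a library function). The one-second default gap is an arbitrary assumption, so check the site's own rules for the real limit. The `fetch` argument is any function that retrieves one URL, which keeps the sketch testable without a network connection.

```python
import time

def fetch_politely(urls, fetch, min_interval=1.0):
    """Call fetch(url) for each URL, waiting so that requests start
    at least min_interval seconds apart."""
    results = []
    last_start = None
    for url in urls:
        if last_start is not None:
            # Sleep for whatever is left of the minimum gap since the last request
            wait = min_interval - (time.monotonic() - last_start)
            if wait > 0:
                time.sleep(wait)
        last_start = time.monotonic()
        results.append(fetch(url))
    return results
```

For real downloads, `fetch` could be something like `lambda u: urllib.request.urlopen(u).read()`.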

*Web-services*: websites designed to be accessed by a program in this way.
* Many academic websites *encourage* access in this manner
* The URLs you need and the options are often documented somewhere.
* Please follow the rules if they exist. e.g. how many requests you can send in an hour.

E.g. 

**NCBI e-utilities**

*esearch*: do a keyword search. Returns a list of NCBI record IDs that contain the keyword.<br>
*efetch*: fetch a computer-readable version of a record using its ID.
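
Both tools are reached with ordinary URLs plus parameters after the `?`. The sketch below only builds the URLs, using the E-utilities base address and the `db`, `term`, `id` and `retmode` parameters from NCBI's documentation. The helper names are our own and the record ID in the usage line is made up. Check NCBI's usage guidelines before sending many requests.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def esearch_url(database, term):
    """URL for a keyword search; the reply lists matching record IDs."""
    return EUTILS + "esearch.fcgi?" + urlencode({"db": database, "term": term})

def efetch_url(database, record_id, retmode="xml"):
    """URL to fetch one record by its ID in a machine-readable format."""
    return EUTILS + "efetch.fcgi?" + urlencode(
        {"db": database, "id": record_id, "retmode": retmode})

print(esearch_url("pubmed", "lysozyme"))
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=lysozyme
print(efetch_url("pubmed", "12345"))  # 12345 is a made-up ID
```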

# XML

HTML is "HyperText Markup Language".<br>
*HyperText* because there are links.<br>
*Markup* because there are tags (things in `<>`) that "mark up" the text with meaning.

In HTML the "markup" tells the browser how to display the text.

XML is "eXtensible Markup Language".

It also uses tags, but here the tags say what the information means, not how to display it.

Many web services will allow you to choose to have data returned in `XML`.

Example:

There are packages designed to "parse" XML files, turning them automatically into things that look a bit like lists/dictionaries.
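
As one example, Python's standard library includes `xml.etree.ElementTree`. The sketch below parses a small made-up XML record (not real data from any service), where the tags say what each piece of text means, and turns it into a dictionary.

```python
import xml.etree.ElementTree as ET

# A small, made-up XML record: the tags describe what each piece of text means
record = """
<record>
  <id>12345</id>
  <title>An example title</title>
  <year>2015</year>
</record>
"""

root = ET.fromstring(record.strip())
print(root.tag)                  # record
print(root.find("title").text)   # An example title

# Turn the child elements into a dictionary of tag -> text
info = {child.tag: child.text for child in root}
print(info)  # {'id': '12345', 'title': 'An example title', 'year': '2015'}
```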

Or in many cases you can process them yourself, particularly with something called "regular expressions" which we will talk about next week.