# **Bioinformatics with Jupyter Notebooks for WormBase:**
## **Data Retrieval 2 - Getting data from WormMine**
Welcome to the second Jupyter notebook in the WormBase tutorial series. Over this series of tutorials, we will write Python code that retrieves data available on the WormBase sites and performs simple analyses with it.

This tutorial deals with WormBase data from WormMine (http://intermine.wormbase.org/tools/wormmine/begin.do).
We will explore the site and the intermine Python package, and extract data of interest. Let's get started!

We start by installing and loading the libraries that are required for this tutorial. 

In [None]:
!pip install intermine
import intermine
from intermine import registry
from intermine.webservice import Service

In [None]:
#getInfo(mine) fetches information about a particular mine, e.g. its description, version,
#and associated organisms.
registry.getInfo("WormMine")

In [None]:
#getData(mine) lists the data sets available in the given mine
registry.getData("WormMine")

In [None]:
#Connect to the WormMine web service; the "new_query" method of the Service class then creates a query object
service = Service("http://intermine.wormbase.org/tools/wormmine/service")
query=service.new_query()

### Simple Queries

In [None]:
#We can query the WormMine database to extract the commonName, genus, name, shortName, species, and taxonId of all 
#organisms.
query=service.new_query("Organism")
query.select("commonName", "genus", "name", "shortName", "species","taxonId")
#Print first 10 rows of the results of the query
for row in query.rows(start=0,size=10):
    print(row)
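
The `start` and `size` arguments used above fetch one slice of the results. A minimal sketch of walking through a result set page by page is shown below. The helper `iter_pages` and the in-memory `fake_fetch` are hypothetical names introduced for illustration; they stand in for a real query so that the example runs without a network connection.

```python
from typing import Callable, Iterator, List, Sequence


def iter_pages(fetch: Callable[[int, int], Sequence], page_size: int = 10) -> Iterator[List]:
    """Yield successive pages until a page comes back empty or short."""
    start = 0
    while True:
        page = list(fetch(start, page_size))
        if not page:
            break
        yield page
        if len(page) < page_size:
            # A short page means there is nothing left to fetch.
            break
        start += page_size


# Stand-in for a query: 25 fake rows held in memory (not WormMine data).
fake_rows = [f"row-{i}" for i in range(25)]


def fake_fetch(start: int, size: int) -> List[str]:
    return fake_rows[start:start + size]


pages = list(iter_pages(fake_fetch, page_size=10))
print([len(page) for page in pages])  # [10, 10, 5]
```

With a real query, `fetch` could be something like `lambda start, size: query.rows(start=start, size=size)`, reusing the same call shown in the cell above.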

In [None]:
#We can query the WormMine database to extract the automatedDescription, biotype, briefDescription, length, operon,
#primaryIdentifier, secondaryIdentifier, and symbol of all genes.
query=service.new_query("Gene")
query.select("automatedDescription", "biotype", "briefDescription", "length", "operon", "primaryIdentifier", "secondaryIdentifier", "symbol")
#Print first 10 rows of the results of the query
for row in query.rows(start=0,size=10):
    print(row)

In [None]:
#Create a query object and query the WormMine database to extract the description of all GO Terms.
query=service.new_query()
query.select("GOTerm.description")
#Add a column to the query with the identifiers of all GO Terms.
query.add_view("GOTerm.identifier")
#Changing the sorting order of the query by a specific column
query.add_sort_order("GOTerm.identifier")
#Print first 10 rows of the results of the query
for row in query.rows(start=0,size=10):
    print(row)

### Constraints

In [None]:
query=service.new_query("Organism")
query.select('*')
#Add a constraint to your query based on the genus column
query.add_constraint("genus","=","Caenorhabditis")
for row in query.rows():
    print(row)

In [None]:
query=service.new_query("Gene")
query.select("primaryIdentifier", "ontologyAnnotations.*")
query.add_constraint("organism.genus","=","Caenorhabditis")
#More than one constraint can be added to the query
query.add_constraint("ontologyAnnotations.ontologyTerm.name","=","kinase activity")
for row in query.rows(size=10):
    print(row)

In [None]:
query=service.new_query("Homologue")
query.select('*', 'gene.primaryIdentifier', 'gene.symbol')
query.add_constraint("gene.organism.genus","=","Caenorhabditis")
query.add_constraint("gene.organism.species","=","elegans")
query.add_constraint("type","=","orthologue")
#Logic operators can be used to combine the constraints on the query (each constraint is referred to by its code: A, B, C)
query.set_logic("A & B & C")
for row in query.rows(size=10):
    print(row)
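
To make the meaning of a logic string such as `"A & B & C"` concrete, the sketch below applies the same three conditions to a handful of made-up records in plain Python. It assumes that constraint codes are assigned in the order the constraints are added (A, then B, then C). The records, the `constraints` dictionary and the `select` helper are hypothetical and only illustrate AND versus OR; the real filtering happens on the WormMine server.

```python
# Made-up records standing in for Homologue rows (not real WormMine data).
records = [
    {"id": "h1", "genus": "Caenorhabditis", "species": "elegans", "type": "orthologue"},
    {"id": "h2", "genus": "Caenorhabditis", "species": "briggsae", "type": "orthologue"},
    {"id": "h3", "genus": "Caenorhabditis", "species": "elegans", "type": "paralogue"},
    {"id": "h4", "genus": "Pristionchus", "species": "pacificus", "type": "paralogue"},
]

# One predicate per constraint code, mirroring the three constraints above.
constraints = {
    "A": lambda r: r["genus"] == "Caenorhabditis",
    "B": lambda r: r["species"] == "elegans",
    "C": lambda r: r["type"] == "orthologue",
}


def select(rows, codes, combine=all):
    """Return ids of rows where the named constraints hold (all = AND, any = OR)."""
    return [r["id"] for r in rows if combine(constraints[c](r) for c in codes)]


and_ids = select(records, ["A", "B", "C"], all)  # corresponds to "A & B & C"
or_ids = select(records, ["B", "C"], any)        # corresponds to "B | C"
print(and_ids)  # ['h1']
print(or_ids)   # ['h1', 'h2', 'h3']
```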

#### Different types of constraints

In [None]:
query=service.new_query("Gene")
#Unary constraints (IS NULL and IS NOT NULL)
query.add_constraint("primaryIdentifier","IS NOT NULL")
for row in query.rows(size=10):
    print(row)

In [None]:
#Binary constraints (=,<=,>=,<,>,!=); this one is added to the query from the previous cell
query.add_constraint("length",">=","12000")
for row in query.rows(size=10):
    print(row)

In [None]:
query=service.new_query()
#Ternary constraint (LOOKUP)
query.add_constraint("Gene","LOOKUP","hlh-2",extra_value="C. elegans")
for row in query.rows():
    print(row)

In [None]:
query=service.new_query("Gene")
#Multi-value constraints (ONE OF and NONE OF)
query.add_constraint("symbol","NONE OF",['hlh-2','unc-26'])
for row in query.rows(size=10):
    print(row)

In [None]:
query=service.new_query()
#List constraint (IN and NOT IN)
query.add_constraint("Gene","IN","C. elegans transcription factor genes")
for row in query.rows(size=10):
    print(row)

#### Creating your own lists

In [None]:
#We can create our own lists, but for this we need to log in to WormMine.
#Replace the placeholder username and password below with your own login information, then run the cell.
service=Service("http://intermine.wormbase.org/tools/wormmine/service",username="YOUR_USERNAME",password="YOUR_PASSWORD")
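
Login details are best kept out of a shared notebook. One possible approach, sketched below, reads them from environment variables. The variable names `WORMMINE_TOKEN`, `WORMMINE_USERNAME` and `WORMMINE_PASSWORD` and the helper `wormmine_credentials` are assumptions made for this example, as is the idea that `Service` accepts a `token` keyword as an alternative to a username and password.

```python
import os


def wormmine_credentials(env=os.environ):
    """Build keyword arguments for Service(...) from environment variables.

    The variable names are hypothetical; pick whatever suits your setup.
    """
    token = env.get("WORMMINE_TOKEN")
    if token:
        # Assumption: Service accepts an API token via the `token` keyword.
        return {"token": token}
    username = env.get("WORMMINE_USERNAME")
    password = env.get("WORMMINE_PASSWORD")
    if username and password:
        return {"username": username, "password": password}
    return {}  # nothing set: fall back to anonymous access


# Demonstration with a stand-in environment instead of real credentials.
demo_env = {
    "WORMMINE_USERNAME": "user@example.org",
    "WORMMINE_PASSWORD": "not-a-real-password",
}
kwargs = wormmine_credentials(demo_env)
print(sorted(kwargs))  # ['password', 'username']
```

The result could then be passed on as `Service("http://intermine.wormbase.org/tools/wormmine/service", **wormmine_credentials())`.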

In [None]:
#Define the symbols that will make up the list
symbols=["ugt-59","sgn-1","kinase"]

In [None]:
#Declare a list manager object to create a list
lm=service.list_manager()

In [None]:
#Get the names of all the lists related to your account in addition to the public lists
lm.get_all_list_names()

In [None]:
#Delete any existing list called "my list", then create a new list from the symbols
lm.delete_lists(["my list"])
lm.create_list(content=symbols,list_type="Gene",name="my list")

In [None]:
#Query WormMine with the newly created list
query=service.new_query("Gene")
query.add_constraint("Gene","IN","my list")

In [None]:
query.add_constraint("symbol","=","sgn-1")

In [None]:
lm.delete_lists(["my list 2"])
lm.create_list(query,name="my list 2")

In [None]:
#We can combine multiple lists in WormMine using the + operator or the list manager's union method
l1=lm.get_list(name="C. elegans genes with a locomotion variant - or descendant - allele phenotype as of WS257")
l2=lm.get_list(name="C. elegans genes with a cell cycle variant - or descendant - allele phenotype as of WS257")
l3=l1+l2

lm.delete_lists(["combination-1"])
l3.set_name("combination-1")

In [None]:
for r in l3:
    print(r)

In [None]:
y=[l1,l2]

In [None]:
lm.delete_lists(["combination-2"])
lm.union(y,name="combination-2")
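
The union above behaves like a set union: every member of either list appears once in the result. The sketch below shows this with plain Python sets, alongside intersection and difference for comparison. The gene names are placeholders, not real list contents, and whether intermine list objects support the `&` and `-` operators in the same way is an assumption that is not confirmed by this notebook.

```python
# Placeholder memberships standing in for two WormMine lists.
list_one = {"gene-1", "gene-2", "gene-3"}
list_two = {"gene-3", "gene-4", "gene-5"}

union = list_one | list_two         # in either list (what l1+l2 and lm.union produce)
intersection = list_one & list_two  # in both lists
only_in_one = list_one - list_two   # in the first list but not the second

print(sorted(union))         # ['gene-1', 'gene-2', 'gene-3', 'gene-4', 'gene-5']
print(sorted(intersection))  # ['gene-3']
print(sorted(only_in_one))   # ['gene-1', 'gene-2']
```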

### Some more query examples

In [None]:
query=service.new_query("Gene")
query.add_constraint("ontologyAnnotations","GOAnnotation")
for row in query.rows(size=10):
    print(row)

In [None]:
query=service.new_query("Gene")
query.add_view("homologues.gene.primaryIdentifier","homologues.homologue.primaryIdentifier")
query.add_constraint("Gene", "IN", "C. elegans transcription factor genes", code = "A")
query.add_constraint("homologues.homologue", "IS NOT", "Gene", code = "B")
for row in query.rows(size=10):
    print(row["homologues.gene.primaryIdentifier"],row["homologues.homologue.primaryIdentifier"])

In [None]:
query=service.new_query()
query.add_view("SequenceFeature.organism.shortName", "SequenceFeature.chromosomeLocation.locatedOn.primaryIdentifier", "SequenceFeature.chromosomeLocation.start", "SequenceFeature.chromosomeLocation.end" )
query.add_constraint("chromosomeLocation", "OVERLAPS", ["I:1..4000"])
for row in query.rows(size=10): 
    print(row)
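
The region string `"I:1..4000"` names a chromosome and a start and end coordinate. As a rough illustration of what an overlap test means, the sketch below parses such a string and compares intervals locally. The `Region` type and both helper functions are hypothetical, and the choice of inclusive coordinates is an assumption; the exact semantics of `OVERLAPS` are decided by the WormMine server.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chromosome: str
    start: int
    end: int


def parse_region(text: str) -> Region:
    """Parse a string of the form 'I:1..4000' into a Region."""
    match = re.fullmatch(r"(\w+):(\d+)\.\.(\d+)", text)
    if match is None:
        raise ValueError(f"not a region string: {text!r}")
    chromosome, start, end = match.groups()
    return Region(chromosome, int(start), int(end))


def overlaps(a: Region, b: Region) -> bool:
    """True if both regions share at least one position (inclusive ends assumed)."""
    return a.chromosome == b.chromosome and a.start <= b.end and b.start <= a.end


window = parse_region("I:1..4000")
print(overlaps(window, Region("I", 3500, 5200)))  # True: 3500..4000 is shared
print(overlaps(window, Region("I", 4001, 6000)))  # False: starts after the window
print(overlaps(window, Region("II", 100, 200)))   # False: different chromosome
```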

In [None]:
query=service.new_query("Gene")
query.select("primaryIdentifier","symbol", "ontologyAnnotations.ontologyTerm.name", "ontologyAnnotations.ontologyTerm.identifier")
query.add_constraint("homologues.type","=","orthologue")
#We can also perform joins on the queries to get columns from different sets of data
query.add_join("ontologyAnnotations", "INNER")
for row in query.rows(size=10):
    print(row)

In [None]:
#The names of columns can be changed to make them more readable
query.add_path_description("ontologyAnnotations.ontologyTerm.identifier","Ontology Term")
for row in query.rows(size=10):
    print(row)

In [None]:
query=service.new_query("Gene")
query.select("expressionClusters.*")
query.add_constraint("Gene","LOOKUP","aap-1",extra_value="C. elegans")
for gene in query.results(row="rr"):
    print(gene)

In [None]:
#Only some columns can be chosen for display
for gene in query.results("list"):
    print(gene[0], gene[1])

In [None]:
#Only some columns and rows can be chosen for display based on different criteria
for row in query.results(row="list"):
    if row[4] is not None:
        print(row[0], row[1])

### Write query results to a file for later use

In [None]:
import csv

In [None]:
with open('results.csv', 'w', newline='') as csvfile:
    csv_writer = csv.writer(csvfile, delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL)
    for gene in query.results(row="rr"):
        csv_writer.writerow(gene)
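
The cell above writes space-delimited values without a header line. If the file is meant for a spreadsheet or for pandas, a comma-separated file with a header may be easier to work with. The sketch below is one way to do that with `csv.DictWriter`. It uses made-up dictionary rows and an in-memory buffer so that it runs on its own; the column names and the `write_rows` helper are hypothetical.

```python
import csv
import io

# Made-up rows standing in for query results converted to dictionaries.
sample_rows = [
    {"primaryIdentifier": "placeholder-id-1", "symbol": "sym-1"},
    {"primaryIdentifier": "placeholder-id-2", "symbol": None},
]


def write_rows(rows, handle):
    """Write dictionaries as comma-separated values with a header line."""
    if not rows:
        return
    writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)  # None values are written as empty fields


buffer = io.StringIO()
write_rows(sample_rows, buffer)
print(buffer.getvalue())
```

To write to disk instead, the buffer could be replaced with `open('results.csv', 'w', newline='')`, as in the cell above.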

### Combinations of constraints and set logic

In [None]:
query=service.new_query()
query.add_view("Gene.organism.name","Gene.symbol")
gene_is_ugt = query.add_constraint("Gene.symbol", "=", "ugt-59")
gene_is_sgn = query.add_constraint("Gene.symbol", "=", "sgn-1")
query.set_logic(gene_is_ugt | gene_is_sgn)
for row in query.rows():
    print(row)

In [None]:
for row in query.rows():
    print(row.to_d())
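
Dictionary rows are convenient for further processing in plain Python. As a small sketch, the example below groups gene symbols by organism. The rows are made up and the organism names are placeholders; it is also an assumption that the dictionaries returned by `to_d()` are keyed by the full view paths used here.

```python
from collections import defaultdict

# Made-up dictionary rows; the keys mirror the view paths selected above.
dict_rows = [
    {"Gene.organism.name": "Organism X", "Gene.symbol": "ugt-59"},
    {"Gene.organism.name": "Organism X", "Gene.symbol": "sgn-1"},
    {"Gene.organism.name": "Organism Y", "Gene.symbol": "sgn-1"},
]

# Collect the symbols reported for each organism.
symbols_by_organism = defaultdict(list)
for row in dict_rows:
    symbols_by_organism[row["Gene.organism.name"]].append(row["Gene.symbol"])

print(dict(symbols_by_organism))
```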

### Get a readable XML serialisation of a query

In [None]:
query.to_xml()
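
If the XML comes back as one long line, the standard library can indent it for reading. The sketch below uses `xml.dom.minidom` on a hand-written stand-in string. That string only resembles a query serialisation and is not guaranteed to match what `to_xml()` actually returns.

```python
from xml.dom import minidom

# Hand-written stand-in for the string returned by query.to_xml().
xml_text = (
    '<query model="genomic" view="Gene.symbol">'
    '<constraint path="Gene.symbol" op="=" value="ugt-59" code="A"/>'
    '</query>'
)

# Parse the string and re-serialise it with one element per line.
pretty = minidom.parseString(xml_text).toprettyxml(indent="  ")
print(pretty)
```

With a real query, `xml_text` would simply be replaced by `query.to_xml()`.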

### Clear the output column list

In [None]:
query.clear_view()

This is the end of the tutorial on querying and extracting WormBase data from WormMine with the intermine package. This tutorial is based on the intermine tutorial notebooks at https://github.com/intermine/intermine-ws-python-docs

In the next tutorial, we will access the WormBase ParaSite data through their RESTful API.