## Circos for visualizing genomic data
### the power of round!

We can visualize the differences between our taxa in some pretty aesthetically pleasing ways, as it turns out. Take a gander at these visualizations, generated by Circos:

- dinosaur genomes: http://islanublar.jurassicworld.com/creation-lab/
- duplications in the human genome: https://www.nytimes.com/2012/05/04/science/it-started-with-genome-omes-proliferate-in-science.html?_r=1
- comparative genomics: https://archive.nytimes.com/www.nytimes.com/imagepages/2007/01/22/science/20070123_SCI_ILLO.html


Our distance matrix tells us a good deal about the minute differences between our taxa; is there a way to visualize these data in a more digestible, intuitive way?

Introducing: the Circos table viewer! http://mkweb.bcgsc.ca/tableviewer/

Ordinarily, the Circos software package requires quite a few moving parts -- different scripts for describing colors, chords, ideograms, etc. But the table viewer accepts a formatted data table with parameters for chord width and color and spits out a ready-made Circos diagram. We'll be making use of this capability to visualize our genetic distances! Check out a diagram that I made using tableviewer, below: 


<img src="circos-table-tbs.png" alt="Drawing" style="width: 700px;"/>

In [3]:
# And this is a snippet of the data file that generated this image: 
   
# abietina arctica bathypathes eques formosa glaberrima 
# abietina 0 0 0 0 0 0 
# arctica 952.1739 0 0 0 0 0 
# bathypathes 950.7278 5.53690724 0 0 0 0 
# eques 2286.3992 2256.43389 2259.83864 0 0 0 
# formosa 901.7674 481.304441 482.894727 2251.58126 0 0
# glaberrima 997.3686 760.269178 762.004693 2142.21841 662.9687 0  

Notice that the data structure above is a full 6x6 table, but that only half of its values are actually assigned -- you might recall that our distance matrices are lower (or upper) triangular matrices. For the sake of formatting, we'll be substituting 0s for the upper triangular half of our distance matrices.
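If your distance matrix is stored as a full, symmetric NumPy array, one way to get the zero-padded layout is `np.tril`. This is only a sketch: the 3x3 array `full` and its values are made up for illustration and are not taken from our data.

```python
import numpy as np

# Hypothetical symmetric distance matrix (values invented for illustration).
full = np.array([
    [0.00, 0.12, 0.30],
    [0.12, 0.00, 0.25],
    [0.30, 0.25, 0.00],
])

# np.tril keeps the diagonal and everything below it,
# and replaces every entry above the diagonal with 0.
lower = np.tril(full)
print(lower)
```

If your matrix is already triangular, this step leaves it unchanged.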

Note that Circos tableviewer also requires that the values in submitted tables be integral. Because we're working with proportions, all of our values are less than 1! So this means we'll have to __scale__ and __round__ our entries before submitting them to Circos. Luckily, this is super easy to do in Python. For the sake of formatting row and column names, we'll be doing all of this within Pandas, using a dataframe. 
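Here is a minimal sketch of scaling and rounding inside a Pandas data frame. The taxon labels, the values, and the scale factor `SCALE = 1000` are all assumptions chosen for illustration; pick whatever factor keeps enough resolution for your own matrix.

```python
import pandas as pd

# Hypothetical table of proportions (all values below 1).
props = pd.DataFrame(
    [[0.0, 0.0], [0.125, 0.0]],
    index=["taxon_a", "taxon_b"],
    columns=["taxon_a", "taxon_b"],
)

SCALE = 1000  # assumed scale factor, not a tableviewer requirement

# Multiply every entry, round to the nearest whole number, then store as ints.
scaled = (props * SCALE).round().astype(int)
print(scaled)
```

Multiplying a data frame by a number applies to every entry at once, so no explicit loop is needed.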

Another consideration we'll have to make is what we want our chords to represent. At the moment, our distance matrices' entries record the amount of __difference__ we've detected in our pairwise comparison of sequences -- so, the higher the entry, the greater the distance between two sequences. This means that thicker chords will correspond to more genetically distinct sequences. This doesn't seem very intuitive, however -- so how could we go about fixing this?
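One possible approach (not the only one) is to turn each distance into a similarity by taking its complement, `1 - distance`, so that thicker chords mean more closely related sequences. The sketch below assumes the distances are proportions between 0 and 1 and that the matrix is lower triangular; the array `dist` is made up for illustration. Leaving the diagonal at 0 is a choice made here, not something the source prescribes.

```python
import numpy as np

# Hypothetical lower-triangular matrix of distances as proportions.
dist = np.array([
    [0.00, 0.00, 0.0],
    [0.10, 0.00, 0.0],
    [0.40, 0.25, 0.0],
])

# True only for entries strictly below the diagonal (the real comparisons).
below_diagonal = np.tril(np.ones_like(dist, dtype=bool), k=-1)

# Take the complement where we have data; keep the zero padding elsewhere.
similarity = np.where(below_diagonal, 1.0 - dist, 0.0)
print(similarity)
```

The mask matters: applying `1 - dist` to the whole array would turn the zero padding in the upper triangle into 1s.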

In [4]:
### how would you design the chords for your Circos diagram?

Jot down the transformations you're planning on making to get your distance matrix into top form for Circos:

-1

-2

-3


In [5]:
# Step 1: updating entries in your distance matrix array. 
    
# how do you access entries in a two-dimensional array? type your answer below.



# how can you take the 'complement' of your distance matrix entries, so that you're getting genetic 'similarity' scores?



# how can you scale these scores so that they're greater than 1 and integral values?
    

In [6]:
# Step 2: labeling your data frame / matrix for Circos table viewer. 

# take a look at the data file snippet pasted in the cell below the Circos image. What do you notice about the formatting?



# how can you label the columns and rows of the distance matrix you've already made? (Hint: think data frames!) 
# to help you out, I've included the list of taxa names matched to their data file number. 

names = ['1: Haloclava producta', '2: Heteractis aurora', '3: Anemonia sulcata', '4: Bartholomea annulata', 
         '5: Edwardsia gilbertensis', '6: Edwardsia timida', '7: Entacmaea quadricolor', '8: Epiactis japonica', 
         '9: Relicanthus daphneae']




In [None]:
# Step 3: outputting your data file and uploading to tableviewer. 

# you have two options here -- you can print your completed and labeled data frame to the console and copy and paste it into
# a text file, or you can write the file directly from within Python. Either way, your text file should be easy for the
# tableviewer to ingest! Remember to save it as a txt, NOT an rtf! (in TextEdit on a Mac, hit shift-command-T to switch from rtf to txt)
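For the second option, here is a minimal sketch using `DataFrame.to_csv`. The data frame `table`, its labels, and the file name `circos_table.txt` are hypothetical. The tab delimiter is also an assumption, so compare the result with the tableviewer's own sample files before uploading.

```python
import pandas as pd

# Hypothetical labeled table of scaled, integer scores.
table = pd.DataFrame(
    [[0, 0], [875, 0]],
    index=["taxon_a", "taxon_b"],
    columns=["taxon_a", "taxon_b"],
)

# With no path given, to_csv returns the table as a string.
# sep="\t" separates cells with tabs (an assumed choice of delimiter).
text = table.to_csv(sep="\t")

# Write plain text, so there is no rtf formatting to worry about.
with open("circos_table.txt", "w") as handle:
    handle.write(text)

print(text)
```

Note that labels containing spaces, such as the entries in `names`, would be split apart in a space-delimited file, which is one reason to prefer tabs or to shorten the labels first.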