#Exploring Patterns of Population Structure and Environmental Associations to Aridity Across the Range of Loblolly Pine

##Introduction

In this set of analyses, we will be making use of data from the Eckert et al. 2010 paper to explore patterns of phenotypic and environmental associations among populations of loblolly pine.


###Abstract

Natural populations of forest trees exhibit striking phenotypic adaptations to diverse environmental
gradients, thereby making them appealing subjects for the study of genes underlying ecologically relevant phenotypes. Here, we use a genome-wide data set of single nucleotide polymorphisms genotyped across 3059 functional genes to study patterns of population structure and identify loci associated with aridity across the natural range of loblolly pine (Pinus taeda L.). Overall patterns of population structure, as inferred using principal components and Bayesian cluster analyses, were consistent with three genetic clusters likely resulting from expansions out of Pleistocene refugia located in Mexico and Florida. A novel application of association analysis, which removes the confounding effects of shared ancestry on correlations between genetic and environmental variation, identified five loci correlated with aridity. These loci were primarily involved with abiotic stress response to temperature and drought. A unique set of 24 loci was identified as FST outliers on the basis of the genetic clusters identified previously and after accounting for expansions out of Pleistocene refugia. These loci were involved with a diversity of physiological processes. Identification of nonoverlapping sets of loci highlights the fundamental differences implicit in the use of either method and suggests a pluralistic, yet complementary, approach to the identification of genes underlying ecologically relevant phenotypes.


##Overview of tasks

In general, you will work your way from loading and saving the data for this study, to correcting for population structure, to looking for associations between genotypes and phenotypes, genotypes and the environment (`Bayenv2`), and genotypes, phenotypes, and environment together (`SQUAT`).

## This notebook

With this notebook, you will explore both environmental and phenotypic associations with your SNP loci using the method we call SQUAT or Berg/Coop.  The citation for this method is in the slides, but you can get the paper [here](http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1004412).

In this notebook you will:

1. Work with population-specific allele frequencies
1. Use SNPassoc in R to obtain genotypic mean estimates for your 3 genotypic classes
1. Calculate $\alpha$ for each of your SNPs
1. Use estimates of frequency and $\alpha$ to compute $Qx$
1. Investigate effects of populations using $z$-scores
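
The last two steps can be sketched in plain Python. This is a simplified illustration, not the SQUAT implementation: it assumes the estimated genetic value of a population is twice the sum over loci of $\alpha$ times the population allele frequency (our reading of the Berg/Coop paper), and it standardizes those values with ordinary z-scores, ignoring the among-population covariance that the full $Qx$ statistic accounts for. The population names, frequencies, and effects are made up.

```python
# Hypothetical effect sizes (alpha) for three SNPs; names and numbers are made up.
alpha = {"snp1": 0.5, "snp2": -0.25, "snp3": 0.125}

# Hypothetical frequencies of the effect allele in each population.
pop_freqs = {
    "pop_a": {"snp1": 0.5, "snp2": 0.5, "snp3": 0.0},
    "pop_b": {"snp1": 1.0, "snp2": 0.0, "snp3": 1.0},
    "pop_c": {"snp1": 0.0, "snp2": 1.0, "snp3": 0.0},
}

def genetic_value(freqs, effects):
    """Twice the frequency-weighted sum of effects across loci (assumed form)."""
    return 2.0 * sum(effects[snp] * freqs[snp] for snp in effects)

def z_scores(values):
    """Plain z-scores: center on the mean, scale by the sample standard deviation."""
    names = sorted(values)
    mean = sum(values[n] for n in names) / len(names)
    var = sum((values[n] - mean) ** 2 for n in names) / (len(names) - 1)
    return {n: (values[n] - mean) / var ** 0.5 for n in names}

genetic_values = {pop: genetic_value(f, alpha) for pop, f in pop_freqs.items()}
standardized = z_scores(genetic_values)
for pop in sorted(genetic_values):
    print("%s genetic value %.3f, z-score %.3f"
          % (pop, genetic_values[pop], standardized[pop]))
```

Populations with large positive or negative z-scores in a table like this are the ones worth a closer look later on.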

As with the previous notebook, execute the cell with the imports and continue.

In [0]:
from __future__ import division  # __future__ imports belong at the top
import os, sys
import pandas as pd
import numpy as np
import rpy2
from rpy2 import robjects as ro
import pandas.rpy.common as com
import matplotlib.pyplot as plt
import seaborn as sns
import operator
import scipy as sp
import traceback
from sklearn import preprocessing
from IPython.parallel import Client
from subprocess import Popen, PIPE
import shutil
from IPython.display import FileLink, FileLinks, Image
import psutil
import multiprocessing
from hdfstorehelper import HDFStoreHelper
import warnings
import pandas
import dill
import statsmodels as sm
import statsmodels.formula.api as smf
from scipy.stats.stats import pearsonr
warnings.simplefilter("ignore", pandas.io.pytables.PerformanceWarning)
%matplotlib inline

%load_ext rpy2.ipython
pd.set_option('display.width', 80)
pd.set_option('max.columns', 30)

%load_ext autoreload
%autoreload 2

sns.set_context("talk")

In [0]:
%%R
library(qvalue)

In [0]:
def convert_to_snpassoc(col):
    if "-" in col.name:
        freqs = af[col.name]
        trans = {11: "%s/%s" % (freqs["A"], freqs["A"]),
                12: "%s/%s" % (freqs["A"], freqs["a"]),
                22: "%s/%s" % (freqs["a"], freqs["a"]),
                "NA":"NA"}
        return col.apply(lambda x: trans[x])
    return col

def get_phenotype(row):
    return np.max(pheno[(pheno.Longitude==row.long) & (pheno.Latitude==row.lat)])

def center_and_standardize_value(val, u, var):
    if val == -1:
        return 0.0
    return (val-u)/np.sqrt(var)

def center_and_standardize(snp):
    maf = af.ix["q",snp.name]
    u = np.mean([x for x in snp if x != -1])
    var = maf*(1-maf)  # variance term; center_and_standardize_value takes the square root
    return snp.apply(center_and_standardize_value, args=(u, var))

def add_county_id(row):
    key = "%s_%s" % (row.county,row.state)
    if key in county_id:
        return county_id[key]
    return np.nan

In [0]:
r = ro.r

In [0]:
pwd

####Let's pull in some data

You'll need:

* your trait of interest
* the dictionary mapping population names to ids
* the PCA covariance matrix estimated earlier
* your phenotypes
* the global allele frequencies

In [0]:
hdf = HDFStoreHelper("data.hd5")

In [0]:
# open in binary mode; dill files are pickles
trait_name = dill.load(open("trait_name.dill", "rb"))
county_id = dill.load(open("county_id.dill", "rb"))
pca_cov = hdf.get('pca_cov')
loc_hierf = hdf.get("loc_hierf")
pheno = hdf.get('pheno')
af = hdf.get("af")

In [0]:
trait = loc_hierf.apply(get_phenotype, axis=1)

In [0]:
trait_loc_hierf = trait.join(loc_hierf, how="inner")

In [0]:
trait_complete = trait_loc_hierf.drop(trait_loc_hierf[np.isnan(trait_loc_hierf[trait_name])].index)
trait_complete['countyid'] = trait_complete.apply(add_county_id, axis=1)

In [0]:
%load_ext rpy2.ipython

In [0]:
%%R
ls()

In [0]:
ro.globalenv['trait_data'] = trait_complete

In [0]:
print r["trait_data"]

In [0]:
trait_complete.head()

####Let's keep only those rows that were assigned a county id

In [0]:
trait_complete = trait_complete[trait_complete.countyid > 0]

In [0]:
trait_complete.shape

####Let's convert to SNPassoc format and join the data with the PCA covariance matrix

This gets all of the data together in one frame so that we can use the PCs to correct for population structure in our genotypic mean estimates.
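
As a rough illustration of what the conversion does, the sketch below maps numeric genotype codes to the `allele/allele` strings that SNPassoc reads when told `sep="/"`. The codes `11`, `12`, `22` and the `NA` passthrough follow `convert_to_snpassoc` above; the allele letters, the sample column, and the helper names are made up.

```python
def genotype_translation(major, minor):
    """Build the code-to-string lookup for one SNP (alleles are assumed inputs)."""
    return {
        11: "%s/%s" % (major, major),  # homozygous for the major allele
        12: "%s/%s" % (major, minor),  # heterozygous
        22: "%s/%s" % (minor, minor),  # homozygous for the minor allele
        "NA": "NA",                    # missing calls pass through unchanged
    }

def convert_column(codes, major, minor):
    """Translate a list of genotype codes for a single SNP."""
    lookup = genotype_translation(major, minor)
    return [lookup[code] for code in codes]

# Hypothetical SNP with major allele G and minor allele T.
converted = convert_column([11, 12, 22, "NA", 11], "G", "T")
print(converted)
```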

In [0]:
trait_snpassoc = trait_complete.apply(convert_to_snpassoc)

In [0]:
trait_snpassoc_pca = trait_snpassoc.join(pca_cov, how="inner")

In [0]:
trait_snpassoc_pca.head()

####Let's make the data more manageable.

Drop the location columns we no longer need, then have a look at it.

In [0]:
trait_snpassoc_pca = trait_snpassoc_pca.drop(['county_state',
                         'Longitude',
                         'Latitude',
                         'county',
                         'state',
                         'lat',
                         'long',
                         'countyid'], axis=1)

In [0]:
trait_snpassoc_pca.head()

####This part gets a little crazy, mostly due to how we chose to run SNPassoc in the past.  Making an R script seems to be the most straightforward way to do it.

In [0]:
trait_snpassoc_pca.to_csv("%s_snpassoc.txt" % trait_name,
                             header=True,
                             index=True,
                             sep="\t")

In [0]:
"%s_snpassoc.txt" % trait_name

In [0]:
def write_snpassoc_file(df, input_file, num_pca_axes):
    pheno = df.columns[0:1]
    out_files = []
    for p in pheno:
        with open("snpassoc_%s_%s.R" % (os.path.basename(input_file), p.lower()), "w") as o:
            print "writing %s" % o.name
            out_files.append(o.name)
            text = '''
library(SNPassoc)

d = read.table('%s', sep="\\t", row.names=1, header=T)

#subtract b/c those are the PCA axes
snp_cols = 2:(ncol(d)-%d)
snp_data = setupSNP(d, colSNPs=snp_cols, sep="/")
pca_cols = (ncol(d)-%d):ncol(d)
pca_data = d[,pca_cols]

wg = WGassociation(%s~1+pca_data$PC1+pca_data$PC2+pca_data$PC3+pca_data$PC4+
pca_data$PC5+pca_data$PC6+pca_data$PC7+pca_data$PC8+pca_data$PC9+pca_data$PC10+
pca_data$PC11+pca_data$PC12+pca_data$PC13+pca_data$PC14, 
data=snp_data, 
model="co", 
genotypingRate=5)

saveRDS(wg, "wg_%s_co.rds")
stats = WGstats(wg)
saveRDS(stats, "wgstats_%s.rds")
''' % (input_file, 
       num_pca_axes,
       num_pca_axes-1,
       p, 
       p.lower(), 
       p.lower())
        
            o.write(text)
    return out_files

In [0]:
write_snpassoc_file(trait_snpassoc_pca, "%s_snpassoc.txt" % trait_name, 14)

##Run in R

```R
source('snpassoc_cfried_melezitose_snpassoc.txt_melezitose.R')
```

If you've used a different file name, just change it so that it's the file that you get back from the function above.  You can do this from the terminal.

####Let's pull that snpassoc data.

The file names may change depending on your trait.  Just find the names of your files and use them in the `readRDS` calls that fill `wg_trait_co.rds` and `wgstats_trait.rds`.

In [0]:
%%R
wg_trait_co.rds = readRDS('wg_melezitose_co.rds')
wgstats_trait.rds = readRDS('wgstats_melezitose.rds')

####This pulls the data from R into python so we can do something useful with it.

In [0]:
wgstats_trait = r['wgstats_trait.rds']
wgstats_trait_labels = r('labels(wg_trait_co.rds)')

In [0]:
wgstats = {trait_name:[wgstats_trait, wgstats_trait_labels.rx2(1)]}
for key, datalist in wgstats.items():
    print "converting %s" % key
    wgstats[key] = [com.convert_robj(x) for x in datalist]

####This set of functions computes $\alpha$

In [0]:
def get_alleles(data):
    a = set()
    for x in data.index:
        for elem in x.split("/"):
            a.add(elem)
    return list(a)  

def get_allele_freqs_wg(data, AA, Aa, aa):
    total = np.sum(data['n'])*2
    A = data.ix[AA, "n"]*2 + data.ix[Aa, "n"]
    a = data.ix[aa, "n"]*2 + data.ix[Aa, "n"]
    return A/total, a/total

def get_genotypes(data, alleles):
    homos = ["%s/%s" % (x,x) for x in alleles]
    Aa = "%s/%s" % (alleles[0], alleles[1])
    if Aa not in data.index:
        Aa = Aa[::-1] #reverse it
    AA, aa = homos
    if data.ix[AA, "n"] < data.ix[aa, "n"]:
        AA, aa = homos[::-1] #reverse it so that major is first
    return AA, Aa, aa

def get_genotypic_values(data, alleles):
    AA, Aa, aa = get_genotypes(data, alleles)
    G_AA = float(data.ix[AA, 'me'])
    G_aa = float(data.ix[aa, 'me'])
    additive = (G_AA-G_aa)/2
    G_Aa = float(data.ix[Aa, 'me'])
    dominance = G_Aa - ((G_AA+G_aa)/2)
    return additive, dominance, AA, Aa, aa
    
def get_alpha(data):
    alleles = get_alleles(data)
    additive, dominance, AA, Aa, aa = get_genotypic_values(data, alleles)
    p, q = get_allele_freqs_wg(data, AA, Aa, aa)
    alpha = additive + (dominance*(q-p))
    return alpha, AA, aa, p, q
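
To make the arithmetic in `get_alpha` concrete, here is a small worked example in plain Python. It mirrors the quantities computed above (additive effect $a$, dominance deviation $d$, and $\alpha = a + d(q - p)$), but the genotype labels, counts, and means are invented for illustration, and `average_effect` is a hypothetical helper rather than part of the notebook.

```python
# Hypothetical genotype counts (n) and genotypic means (me) for one SNP.
genotypes = {
    "A/A": {"n": 60, "me": 10.0},
    "A/G": {"n": 30, "me": 9.0},
    "G/G": {"n": 10, "me": 6.0},
}

def average_effect(table, AA, Aa, aa):
    """Return (alpha, p, q), where AA is the major-allele homozygote."""
    total = 2.0 * sum(row["n"] for row in table.values())
    p = (2 * table[AA]["n"] + table[Aa]["n"]) / total  # major allele frequency
    q = (2 * table[aa]["n"] + table[Aa]["n"]) / total  # minor allele frequency
    additive = (table[AA]["me"] - table[aa]["me"]) / 2.0
    midpoint = (table[AA]["me"] + table[aa]["me"]) / 2.0
    dominance = table[Aa]["me"] - midpoint
    alpha = additive + dominance * (q - p)
    return alpha, p, q

alpha, p, q = average_effect(genotypes, "A/A", "A/G", "G/G")
print("p=%.2f q=%.2f alpha=%.2f" % (p, q, alpha))
```

With these numbers $a = 2$, $d = 1$, $p = 0.75$, and $q = 0.25$, so $\alpha = 2 + 1 \times (0.25 - 0.75) = 1.5$.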

####Compute alpha values for your phenotype

The code is from a larger analysis that I've done, and I didn't change it for here.  The `for p in wgstats:` part is for when you run multiple phenotypes at the same time.  For these examples, though, you just have one.

In [0]:
alpha_vals = {}
for p in wgstats:
    print "running %s" % p
    df = pd.DataFrame(index=["alpha", "p-value", "AA", "aa", "p", "q"])
    alpha_vals[p] = df
    d = wgstats[p][0]
    labels = wgstats[p][1]
    for i, locus in enumerate(d):
        try:
            data = pd.DataFrame(d[locus])
            snp = labels[i]
            genotypes = [g for g in data.index if "/" in g]
            data = data.ix[genotypes,:]
            pvalue = data['p-value'].dropna()[0]
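            if len(genotypes) == 3:
                # freq_p/freq_q keep the phenotype loop variable p from being overwritten
                alpha, AA, aa, freq_p, freq_q = get_alpha(data)
                df[snp] = [alpha, pvalue, AA, aa, freq_p, freq_q]
        except Exception as e: 
            # loci that raise an error are skipped silently
            pass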

####The alpha vals `DataFrame` actually contains more than just $\alpha$

I tend to do this a lot, group stats together. 

In [0]:
alpha_vals[trait_name].head(6)

####Have a look at the p-values.  Does the distribution surprise you?  

Also do the same for the $\alpha$ values.  Ask yourself, or a neighbor, the same question.  

* What does this tell you about the effect sizes of the SNPs?  
* Is it what you would expect for polygenic adaptation?

In [0]:
plt.hist(alpha_vals[trait_name].ix['p-value',:], bins=30)
plt.title("p-values")
plt.show()

In [0]:
plt.hist(alpha_vals[trait_name].ix['alpha',:], bins=30)
plt.title("alpha values $\mu %.4f \pm %.4f \ [%.4f, %.4f]$" % (np.mean(alpha_vals[trait_name].ix['alpha',:]),
                                                            np.std(alpha_vals[trait_name].ix['alpha',:]),
                                                            np.min(alpha_vals[trait_name].ix['alpha',:]),
                                                             np.max(alpha_vals[trait_name].ix['alpha',:])))
plt.show()

In [0]:
def is_homozygous(gt):
    if len(set([x.strip() for x in gt.split("/")])) == 1:
        return True
    return False

def get_allele_counts(counts):
    a = {}
    het = 0
    for gt in counts.index:
        for allele in [x.strip() for x in gt.split("/")]:
            if not allele in a:
                a[allele] = 0
            a[allele] += counts[gt]
        if not is_homozygous(gt):
            het += counts[gt]
    return sorted(a.items(), key=lambda x: x[1], reverse=True), het

def get_correction(n):
    #for finite sample size
    return (2*n)/(2*n-1)

def get_allele_freqs(locus):
    locus = locus[locus != '?/?']
    locus = locus[locus != 'NA']
    c = locus.value_counts()
    c = c.sort(inplace=False, ascending=False)
    allele_counts = get_allele_counts(c)
    total_alleles = 2.0*sum(c)
    num_individuals = sum(c)
    A = ""
    a = ""
    P = 0
    Q = 0
    if len(allele_counts[0]) == 2:
        A = allele_counts[0][0][0]
        a = allele_counts[0][1][0]
        P = allele_counts[0][0][1]
        Q = allele_counts[0][1][1]
    else:
        A = allele_counts[0][0][0]
        P = allele_counts[0][0][1]
    PQ = allele_counts[-1]
    p = P/total_alleles
    q = Q/total_alleles
    assert abs(p + q - 1.0) < 1e-9  # tolerate floating-point rounding
    He = 2 * p * q * get_correction(num_individuals)
    Ho = PQ*1.0/num_individuals
    Fis = 1 - (Ho/He)
    #print p, q, He, Ho, Fis
    ret = pd.Series({"p":p, 
                      "q":q,
                      "P":P,
                      "Q":Q,
                      "He":He,
                      "Ho":Ho, 
                      "Fis":Fis,
                    "PQ": PQ,
                    "total_alleles":total_alleles,
                    "num_indiv":num_individuals,
                    "A":A,
                    "a":a})
    return ret
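
Here is the same bookkeeping on a toy locus, as a hedged sketch. It assumes a polymorphic, biallelic SNP with genotypes written as `allele/allele` strings, and it uses the finite-sample correction $2n/(2n-1)$ from `get_correction`. The genotype list and the `locus_summary` helper are invented for illustration.

```python
from collections import Counter

def locus_summary(genotypes):
    """Allele frequencies, heterozygosities and Fis for one biallelic locus."""
    called = [g for g in genotypes if g not in ("NA", "?/?")]
    n = len(called)
    allele_counts = Counter()
    n_het = 0
    for gt in called:
        first, second = [x.strip() for x in gt.split("/")]
        allele_counts[first] += 1
        allele_counts[second] += 1
        if first != second:
            n_het += 1
    # most_common sorts by count, so the major allele comes first
    (major, P), (minor, Q) = allele_counts.most_common(2)
    p = P / (2.0 * n)
    q = Q / (2.0 * n)
    He = 2 * p * q * (2.0 * n) / (2.0 * n - 1)  # expected heterozygosity
    Ho = n_het / float(n)                        # observed heterozygosity
    return {"A": major, "a": minor, "p": p, "q": q,
            "He": He, "Ho": Ho, "Fis": 1 - Ho / He}

# Hypothetical sample: 6 A/A, 2 A/G, 2 G/G and one missing call.
sample = ["A/A"] * 6 + ["A/G"] * 2 + ["G/G"] * 2 + ["NA"]
summary = locus_summary(sample)
print(summary)
```

For this sample $p = 0.7$, $q = 0.3$, and the observed heterozygosity is 0.2, which is below the expected value, so $F_{IS}$ comes out positive.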

####We need to organize our data by county again, find those which have counties, and then recompute the allele frequencies

In [0]:
trait_snpassoc_pca_county = pd.concat([trait_complete.countyid, trait_snpassoc_pca], axis=1)


In [0]:
trait_snpassoc_pca_county = trait_snpassoc_pca_county.drop(trait_snpassoc_pca_county[np.isnan(trait_snpassoc_pca_county[trait_name])].index)

In [0]:
snpassoc_af = trait_snpassoc_pca_county.ix[:,2:-14].apply(get_allele_freqs)

In [0]:
snpassoc_af

####Additionally, now that we have counties, we can compute allele frequencies for each county.

This takes a minute or two; there are 34 counties right now.  Be patient.

In [0]:
pop_allele_freqs = {}
for pop,data in trait_snpassoc_pca_county.groupby("countyid"):
    print "getting allele freqs for pop %d" % pop
    pop_allele_freqs[pop] = data.ix[:,2:-14].apply(get_allele_freqs)

####We only want to include populations with big enough samples.  Remember doing this earlier?  Let's do it again.

In [0]:
def get_usable_counties(county):
    if county.county_state in county_id:
        return True
    return False

data_ai = hdf.get("data_ai")
data_ai['usable'] = data_ai.apply(get_usable_counties, axis=1)
data_ai = data_ai.drop(data_ai[data_ai.usable == False].index)

####This section below is a nightmare

Reason \#7643 why statisticians should not write software.  The SQUAT method is a compendium of many files and directories that all must play nicely together.  Most of the work has been done (I think) to make this as seamless as possible.  Fingers crossed!

In [0]:
def write_gwas_data_file(df, pheno, outdir):
    out = "%s_gwas_data_file.txt" % pheno
    out = os.path.join(outdir, out)
    df = df.sort_index()
    df[['A1', 'A2', 'EFF', 'FRQ']].to_csv(out,
                                          header=True, 
                                          index=True,
                                          sep="\t")
    print out
    return out

def write_freqs_file(df, pheno, pop_freqs, outdir):
    out = "%s_freqs_file.txt" % pheno
    out = os.path.join(outdir, out)
    print out
    with open(out, "w") as o:
        o.write("SNP\tCLST\tA1\tA2\tFRQ\n")
        for pop, data in pop_freqs.items():
            m = data.T.merge(df, how="inner", left_index=True, right_index=True)
            m['population'] = pop
            m.index.name = 'SNP'
            m = m.sort_index()
            o.write(m[['population','A1','A2','p']].to_csv(header=False, 
                                                             index=True,
                                                             sep="\t"))
def write_match_pop_file(df, pheno, pop_freqs, pop, outdir):
    out = "%s_match_pop_file.txt" % pheno
    out = os.path.join(outdir, out)
    print out
    with open(out, "w") as o:
        o.write("SNP\tCLST\tA1\tA2\tFRQ\n")
        for key, data in pop_freqs.items():
            if key == pop:
                m = data.T.merge(df, how="inner", left_index=True, right_index=True)
                m['population'] = pop
                m.index.name = 'SNP'
                m = m.sort_index()
                o.write(m[['population','A1','A2','p']].to_csv(header=False, 
                                                                 index=True,
                                                                 sep="\t"))
                break
                
def write_full_dataset_file(df, pheno, pop_freqs, outdir):
    out = "%s_full_dataset_file.txt" % pheno
    out = os.path.join(outdir, out)
    print out
    with open(out, "w") as o:
        o.write("SNP\tCLST\tA1\tA2\tFRQ\n")
        for pop, data in pop_freqs.items():
            m = data.T.merge(df, how="inner", left_index=True, right_index=True)
            m['population'] = pop
            m.index.name = 'SNP'
            m = m.sort_index()
            o.write(m[['population','A1','A2','p']].to_csv(header=False, 
                                                             index=True,
                                                             sep="\t"))   
def write_env_var_data_file(pheno, pop_freqs, outdir):
    cols = [x for x in data_ai.columns if "AI" in x]
    for c in cols:
        pop_id = 1
        out = "%s_%s_env_var_data_file.txt" % (c, pheno)
        out = os.path.join(outdir, out)
        print out
        with open(out, "w") as o:
            o.write("CLST\tENV\tREG\n")
            for pop, d in data_ai.groupby('county_state'):
                pop = float(county_id[pop])*1.0
                if pop in pop_freqs:
                    o.write("%.1f\t%f\t%.1f\n" % (pop, d[c], pop_id))
                    pop_id += 1

In [0]:
def get_qvalues(pvalues):
    qvalue = r("qvalue")
    pvalues = r("as.numeric")(pvalues)
    qobj = qvalue(pvalues)
    qvalues = qobj.rx2("qvalues")
    return np.array(qvalues)

####This is a critical section

We must decide which SNPs to include as GWAS SNPs (those SNPs we think might be important).  The better we choose here, the better our results may end up.  This is tricky: with too few you won't find anything, and with too many the model might get messed up.  Until more people start using this, it's hard to know what to do.  In our case, we take all SNPs with raw $p$-values < 0.05.
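
As a rough sanity check on that threshold, the sketch below filters a made-up set of $p$-values at 0.05 and compares the number kept with the number expected if every SNP were null (0.05 times the number of tests). The SNP names, $p$-values, and helper functions are hypothetical.

```python
def choose_candidates(pvalues, threshold=0.05):
    """Return the SNP names whose raw p-value is below the threshold."""
    return sorted(snp for snp, p in pvalues.items() if p < threshold)

def expected_null_hits(num_tests, threshold=0.05):
    """Expected number of SNPs passing the threshold if none had any effect."""
    return num_tests * threshold

# Hypothetical raw p-values for ten SNPs.
pvalues = {
    "0-100-01": 0.001, "0-101-01": 0.20, "0-102-01": 0.049, "0-103-01": 0.77,
    "0-104-01": 0.03, "0-105-01": 0.50, "0-106-01": 0.91, "0-107-01": 0.12,
    "0-108-01": 0.05, "0-109-01": 0.33,
}

candidates = choose_candidates(pvalues)
print("chose %d candidates" % len(candidates))
print("expected under the null: %.1f" % expected_null_hits(len(pvalues)))
```

Here three SNPs pass the filter, against 0.5 expected by chance for ten null tests. Comparing those two numbers for your own trait gives a feel for how much signal the candidate set might carry.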

In [0]:
squat_outdir = "squat" #change for your username
if not os.path.exists(squat_outdir):
    os.mkdir(squat_outdir)

for p in alpha_vals:
    full = alpha_vals[p].T
    full['q-value'] = get_qvalues(full['p-value'])
    full.index = [x.replace(".", "-") for x in full.index]
    full.index = [x[1:] if x.startswith("X") else x for x in full.index]
    full.index.name = "SNP"
    full.AA = full.AA.apply(lambda x: x[0])
    full.aa = full.aa.apply(lambda x: x[0])
    full = full.rename(columns={'alpha':'EFF',
                                'AA':'A1',
                                'aa':'A2',
                                'p': 'FRQ'})
    candidates = full[full['p-value']<0.05]
    plt.hist(full['q-value'], bins=100)
    plt.title("q-value")
    plt.show()
    
    plt.hist(full['p-value'], bins=100)
    plt.title("p-value")
    plt.show()
    print "chose %d candidates" % len(candidates)
    write_gwas_data_file(candidates, p, squat_outdir)
    write_freqs_file(candidates, p, pop_allele_freqs, squat_outdir)
    write_match_pop_file(full, p, pop_allele_freqs, 2, squat_outdir)
    write_full_dataset_file(full, p, pop_allele_freqs, squat_outdir)
    write_env_var_data_file(p, pop_allele_freqs, squat_outdir)

####Examine the output above.

We've written several files.  

1. Can you decipher what each file is for?  Have a look at them with either `!head` or `!cat`. 

1. How many candidate genes are in your GWAS?

1. Why might there be 4 files for AI?
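
For orientation, here is a minimal sketch of the tab-separated layout shared by the frequency files, using the header written by `write_freqs_file` (`SNP`, `CLST`, `A1`, `A2`, `FRQ`), with one row per SNP per population. The rows are invented, the helper name is hypothetical, and the row order is chosen only for readability.

```python
def format_freq_rows(rows):
    """Render (snp, cluster, a1, a2, freq) tuples as a tab-separated table."""
    lines = ["\t".join(["SNP", "CLST", "A1", "A2", "FRQ"])]
    for snp, cluster, a1, a2, freq in sorted(rows):
        lines.append("%s\t%s\t%s\t%s\t%.4f" % (snp, cluster, a1, a2, freq))
    return "\n".join(lines) + "\n"

# Hypothetical frequencies for two SNPs in two populations (clusters).
rows = [
    ("0-101-01", 2.0, "C", "T", 0.5),
    ("0-100-01", 1.0, "A", "G", 0.75),
    ("0-100-01", 2.0, "A", "G", 0.625),
]
table = format_freq_rows(rows)
print(table)
```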

The following bits of code set up the input file for SQUAT.

In [0]:
env_squat_files = !find {squat_outdir} | grep {trait_name} | grep env_var | grep AI
print env_squat_files
env_squat_files = [os.path.basename(x) for x in env_squat_files]


In [0]:
env_var_file_string = "list(%s)" % ", ".join(["'%s'" % x for x in env_squat_files])

In [0]:
squat_scripts_dir = "/gdc_home4/cfried/src/PolygenicAdaptationCode/Scripts"
!rm {squat_outdir}/Scripts
!cd {squat_outdir} && ln -s {squat_scripts_dir} .
def get_squat_vars(pheno):
    d = {"gwas.data.file":"'%s_gwas_data_file.txt'" % pheno,
         "freqs.file":"'%s_freqs_file.txt'" % pheno,
         "env.var.data.files": env_var_file_string,
         "match.pop.file":"'%s_match_pop_file.txt'" % pheno,
         "full.dataset.file":"'%s_full_dataset_file.txt'" % pheno,
         "path":"'%s'" % pheno,
         "match.categories":"c('MAF')",
         "match.bins":"list(seq(0,0.5,0.02), c(2), seq(0,1000,100))",
         "cov.SNPs.per.cycle":5000,
         "cov.cycles":1,
         "null.phenos.per.cycle":1000,
         "null.cycles":1,
         "load.cov.mat":"F",
         "sim.null":"T",
         "check.allele.orientation":"F"}
    return ',\n'.join("%s=%s" % (key,val) for (key,val) in d.items())

def create_squat_run_file(pheno):
    squat_file = os.path.join(squat_outdir, "squat_%s.r" % pheno)
    with open(squat_file, "w") as o:
        o.write('system("rm -rf %s")\n'% pheno)
        o.write("source('%s')\n" % os.path.join(squat_scripts_dir, "CreateTraitFile.R"))
        o.write("source('%s')\n" % os.path.join(squat_scripts_dir, "functions.R"))
        o.write("PolygenicAdaptationFunction(%s)\n" % get_squat_vars(pheno))
    return squat_file

for pheno in alpha_vals:
    squat_file = create_squat_run_file(pheno)
    print squat_file
    !cat $squat_file
    print ""

####If all goes well, you should be able to run SQUAT.  If it fails, come and get me.

In [0]:
def run_squat(p):
    print "running %s" % p
    output = "%s/%s" % (squat_outdir, p)
    if os.path.exists(output):
        !rm -rf {output}
    cmds = ["setwd('%s')" % squat_outdir,
            'source("squat_%s.r")' % (p),
            "setwd('../')"]
    for cmd in cmds:
        print cmd
        r(cmd)
    
run_squat(trait_name)

####You can get the files that SQUAT wrote directly from the file system using `find` and `grep`

In [0]:
rfiles = !find {squat_outdir} | grep Robj | grep Output | grep {trait_name}
bc = {}
for f in rfiles:
    d = f.split("/")
    if not d[1] in bc:
        bc[d[1]] = []
    bc[d[1]].append(f)
bc

####Let's examine the outputs

1. What is the $Qx$ value?  Is it significant?
1. Is anything interesting reported in the decomposition into $F_{ST}$ or LD?

In [0]:
for pheno in bc:
    print pheno
    for obj in bc[pheno]:
        r('load("%s")' % obj)
    print r("the.stats")
    print("------------------")
    print r("p.vals")

####We can now look at the influence of populations in driving environmental correlations

1. Are there populations which show elevated or depressed association with environment?

In [0]:
the_stats = r['the.stats']
the_stats = com.convert_robj(the_stats.rx("ind.Z"))['ind.Z']

In [0]:
def convert_to_county(elem):
    for k, v in county_id.items():
        if int(v) == int(elem):
            return k

In [0]:
sns.barplot(the_stats.index, the_stats.values)
plt.ylabel("Z-score")
plt.xlabel("County")
ax = plt.gca()
ax.set_xticklabels(map(convert_to_county, the_stats.index), rotation=90)
plt.show()