<!-- Title: Insecticide resistance gene copy number variation in Anopheles gambiae -->

*This blog post introduces recent work by the [Anopheles gambiae 1000 Genomes Project](https://www.malariagen.net/projects/ag1000g) to study gene copy number variation, published in ["Whole-genome sequencing reveals high complexity of copy number variation at insecticide resistance loci in malaria mosquitoes" (Lucas et al. 2019)](https://genome.cshlp.org/content/29/8/1250.full?rss=1).*

## Changes in bed net technology

In 2018, 172 million long-lasting insecticidal nets (LLINs) were given out for free in Africa to help control malaria. Here's a breakdown of LLINs distributed by country and year, courtesy of the [AMP Net Mapping Project](https://allianceformalariaprevention.com/net-mapping-project/) (hover over the bars to see details for each country): 


In [1]:
import pandas as pd
df_nets = pd.read_excel(
    'https://allianceformalariaprevention.com/wp-content/uploads/2019/07/Net-Mapping-1st-Q-2019.xlsx', 
    sheet_name='SSA',
    skiprows=2,
    skipfooter=1,
    names=['Country'] + list(range(2004, 2019)),
    usecols=list(range(16)))
df_nets.set_index('Country', inplace=True)
df_nets['all_years'] = df_nets.sum(axis=1)
df_nets.sort_values(by='all_years', inplace=True, ascending=False)
del df_nets['all_years']
df_nets.head()

| Country | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Nigeria | 71400 | 262000 | 2147404 | 2724304 | 15310222 | 19813977 | 29908286 | 2555096 | 5452563 | 26355032 | 42973544 | 23794214 | 11240307 | 35498731 | 18686909 |
| Congo (Democratic Republic of the) | 713053 | 1089997 | 1750841 | 3317755 | 8506216 | 7129370 | 12154287 | 17873788 | 775690 | 13418804 | 20795345 | 26793394 | 13635628 | 13031191 | 28432276 |
| Ethiopia | 642210 | 2432635 | 12294218 | 4639411 | 1935148 | 2196289 | 15146406 | 1302400 | 5349703 | 11625273 | 9457032 | 23563000 | 16000 | 10470950 | 5855870 |
| Kenya | 391883 | 2814594 | 8700429 | 1555150 | 3235173 | 4293195 | 7305749 | 9869447 | 5816422 | 4866832 | 6886113 | 6720361 | 3288720 | 17430066 | 2146646 |
| Uganda | 144000 | 420887 | 2438134 | 1603181 | 1870846 | 1633302 | 8257164 | 1105520 | 3118001 | 13242962 | 10509079 | 4364151 | 20984657 | 6210233 | 5785560 |


In [2]:
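import bokeh as bk
import bokeh.plotting
import bokeh.models
import bokeh.io
import bokeh.palettes  # used below for the bar colours
import bokeh.embed     # used below to generate the embeddable script and div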
bk.io.output_notebook()

In [6]:
# setup figure
# (Bokeh 3.0 removed plot_height/plot_width; use height/width with newer releases)
fig = bk.plotting.figure(x_range=(2003, 2019), plot_height=400, plot_width=700,
                         title="LLINs distributed in sub-Saharan Africa by country and year")

# setup colors
countries = df_nets.index.tolist()
from itertools import cycle, islice
colors = list(islice(cycle(bk.palettes.d3['Category20'][20]), len(countries)))

# plot bars
renderers = fig.vbar_stack(countries, x='index', source=df_nets.T, width=0.9, 
                           color=colors, name=countries)

# add tooltips
ht = bk.models.HoverTool(tooltips=[("Country", "$name"),
                                   ("Year", "$x{int}"), 
                                   ("Nets", "@$name{0,0}"),])
fig.add_tools(ht)

# style axes
fig.xaxis.axis_label = "Year"
fig.yaxis.axis_label = "No. nets"
fig.yaxis.formatter = bk.models.NumeralTickFormatter(format="0,0")

bk.plotting.show(fig)

In [4]:
from IPython.display import HTML, display
script, div = bk.embed.components(fig)
display(HTML(script))
display(HTML(div))

The total cost of distributing LLINs varies, but [can be around \\$6 per net](https://malariajournal.biomedcentral.com/articles/10.1186/s12936-016-1671-1), which includes around \\$2-3 to buy the net itself, and around \\$3 for the logistics required to distribute nets to communities. I don't know the exact number, but with 172 million nets distributed in 2018 it seems reasonable to assume that more than \\$1 billion is spent on distributing LLINs in Africa each year. More than half of that is paid for by the [Global Fund](https://www.theglobalfund.org/en/malaria/), and the remainder by other donors including [PMI](https://www.pmi.gov/how-we-work/technical-areas/insecticide-treated-mosquito-nets-(itns)-pmi), [UNICEF](https://www.unicef.org/supply/index_39977.html) and [AMF](https://www.againstmalaria.com/).
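As a quick sanity check on that estimate, here is the back-of-the-envelope arithmetic using only the two figures quoted above (172 million nets, roughly \\$6 per net). The variable names are just for illustration, and the per-net cost is an approximation, so treat the result as a rough order of magnitude rather than a real budget figure.

```python
# Figures quoted in the text above
nets_distributed_2018 = 172_000_000  # LLINs distributed in Africa in 2018
cost_per_net_usd = 6                 # approximate total cost per net (purchase + logistics)

# Rough annual spend implied by those two numbers
total_cost_usd = nets_distributed_2018 * cost_per_net_usd
print(f"Estimated total: ${total_cost_usd / 1e9:.2f} billion")
```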

The current generation of LLINs is impregnated with a [pyrethroid insecticide](https://en.wikipedia.org/wiki/Pyrethroid) (e.g., [PermaNet® 2.0](https://www.vestergaard.com/permanet-2-0) uses deltamethrin, [Olyset Net](https://sumivector.com/mosquito-nets/olyset-net) uses permethrin). Over the past 20 years, [mosquito populations across Africa have become resistant to pyrethroids](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6215693/). Some studies have shown that [LLINs remain effective despite pyrethroid resistance](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5968369/). However, other studies have shown that a new generation of LLINs, which combine a pyrethroid insecticide with a "resistance-breaking" compound called [piperonyl butoxide (PBO)](https://en.wikipedia.org/wiki/Piperonyl_butoxide), are [more effective than standard nets at preventing malaria transmission](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5910376/).

The debate about efficacy is still open, but [PBO LLINs have been approved for use by WHO](https://www.who.int/malaria/publications/atoz/use-of-pbo-treated-llins/en/), which means they can be bought and distributed as part of public health campaigns to control malaria. National Malaria Control Programmes (NMCPs) are arguing that PBO LLINs are needed to counter high levels of pyrethroid resistance, and PBO LLINs are beginning to be deployed at scale (e.g., see [LLIN priorities work stream meeting, panel 2](https://endmalaria.org/sites/default/files/RBM%20VCWG-14%20Meeting%20Report_Final.pdf)). But PBO LLINs are currently [more expensive than conventional LLINs](https://www.theglobalfund.org/media/5861/psm_llinreferenceprices_table_en.pdf), which makes for some difficult decisions. Buying some PBO LLINs may mean buying fewer nets in total. So how many PBO LLINs should be bought, and where should they be deployed?

## Insecticide resistance surveillance

Clearly we need more data, but what type of data should we collect? Most countries in sub-Saharan Africa have a programme of [entomological monitoring](https://www.pmi.gov/how-we-work/technical-areas/entomological-monitoring), where data about mosquito populations are regularly collected from sentinel sites. This includes [insecticide resistance testing](https://www.who.int/neglected_diseases/vector_ecology/resistance/en/), where mosquitoes are exposed to a controlled dose of an insecticide under standardised conditions and survival rates are recorded. These tests can tell you **if** a mosquito population is resistant to pyrethroids, and can also tell you **how strong** the resistance is. However, they **cannot tell you why** the mosquitoes are resistant. I.e., they cannot tell you which **molecular mechanisms of resistance** are present in the mosquito population.

Insects can become resistant to insecticides via several different molecular mechanisms, each of which is due to changes in a different set of genes. That's been known for more than 40 years. I happen to have on my bookshelf a copy of "[Pest Resistance to Pesticides](https://books.google.co.uk/books?id=SavaBwAAQBAJ&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false)" which contains the proceedings of a conference held in December 1979. I believe it was only a one-off conference, which is a shame because it was clearly an amazing meeting, and there was a lot of cross-over between work on disease vectors and crop pests - we could really use a conference like that today. There were 11 papers presented on molecular mechanisms of insecticide resistance, including two papers on the role of mixed-function oxidases (MFOs), also known as Cytochrome P450 enzymes.

![bookshelf](/assets/bookshelf.jpg)

MFOs cause pyrethroid resistance by breaking down pyrethroid molecules before they have a chance to act. MFOs are naturally present in all mosquitoes, but resistant mosquitoes have genetic changes that either increase the amount of specific MFO molecules in their system, or change an MFO protein sequence so it is better at breaking down insecticides. PBO specifically blocks the action of MFOs. So, if a mosquito population **does** have pyrethroid resistance due to MFOs, then PBO LLINs ought to be more effective than standard LLINs. But if a mosquito population **does not** have MFO-based resistance, then PBO LLINs may be no better than standard LLINs (although there are important caveats, see discussion later).

The bottom line is that if you are deciding whether to deploy PBO LLINs, it's worth knowing which mechanisms of resistance are present in mosquito populations at the locations you're targeting. It's also worth knowing what fraction of mosquitoes carry MFO-based resistance, because it may be present but only at low frequency, in which case PBO may still be effective. And it's worth collecting data year on year, because evolution happens quickly in mosquito populations, and resistance genes can spread between different locations.

## Gene copy number variation

Some molecular mechanisms of pyrethroid resistance are relatively well understood. We know at least some of the underlying genetic changes which cause resistance, and we have developed low-cost genetic tests that can be used to measure the frequency of that resistance mechanism in a given mosquito population. But despite more than 40 years of research, nobody has identified any of the specific genetic changes related to MFO-based resistance in malaria mosquitoes (if I'm wrong about that, please tell me). And this has meant we have had no way of directly measuring the frequency of MFO-based resistance in mosquito populations, nor of tracking its spread from one location to another.

To fill in this blank, we analysed [data from phase 2 of the Anopheles gambiae 1000 Genomes Project](https://www.malariagen.net/data/ag1000g-phase-2-ar1). This dataset includes sequence reads from Illumina whole genome sequencing of 1,142 mosquitoes collected from the field in 13 African countries. We searched for a specific type of genetic change, where some mosquitoes carry more copies of a given gene than others, called gene copy number variation (CNV). CNVs are important because if a mosquito carries more copies of a given gene in its genome, then it will express a greater amount of the corresponding protein in its system. If that gene is an MFO gene that metabolises pyrethroids, then the increase in copy number is very likely to be a cause of resistance.

We first scanned the whole genome of each mosquito for CNVs, then zoomed in for a more detailed study of CNVs at gene families known to be involved in insecticide resistance, including MFOs. To identify CNVs, we looked for changes in the number of sequence reads aligned to the reference genome, fitting a hidden Markov model to the depth of coverage data. For example, here is sequence read coverage data from a single mosquito showing a CNV spanning four MFO genes:

![Example CNV](http://alimanfoo.github.io/slides/20190606-who-geneva/cnv-coverage.png)
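To give a feel for how a hidden Markov model can turn coverage data into copy number calls, here is a minimal sketch in plain Python. It is not the implementation used in the paper. It assumes coverage has already been summarised in fixed-size windows and normalised, that each hidden state is an integer copy number, that emissions are Gaussian around the expected depth for that copy number, and that the model prefers to stay in the same state from one window to the next. The function name, the parameter values (`stay_prob`, `sd_frac`, `max_cn`) and the toy coverage values are all made up for illustration.

```python
import math

def viterbi_copy_number(coverage, baseline, max_cn=6, stay_prob=0.99, sd_frac=0.15):
    """Return the most likely copy number for each window of read depth.

    coverage : read depth per genomic window
    baseline : expected depth for the normal diploid state (2 copies)
    The remaining parameters are illustrative guesses, not fitted values.
    """
    states = list(range(max_cn + 1))   # hidden states: 0..max_cn copies
    per_copy = baseline / 2            # expected depth contributed by one copy
    log_stay = math.log(stay_prob)
    log_switch = math.log((1 - stay_prob) / max_cn)  # spread evenly over other states

    def log_emission(depth, cn):
        # Gaussian log-density (up to a constant) centred on the expected depth
        mean = cn * per_copy
        sd = sd_frac * max(mean, per_copy)  # floor keeps sd > 0 when cn == 0
        return -0.5 * ((depth - mean) / sd) ** 2 - math.log(sd)

    def log_transition(prev, cur):
        return log_stay if prev == cur else log_switch

    # Forward pass: best log-score of any path ending in each state
    scores = [log_emission(coverage[0], cn) for cn in states]
    back_pointers = []
    for depth in coverage[1:]:
        new_scores, pointers = [], []
        for cn in states:
            best_prev = max(states, key=lambda p: scores[p] + log_transition(p, cn))
            pointers.append(best_prev)
            new_scores.append(scores[best_prev] + log_transition(best_prev, cn)
                              + log_emission(depth, cn))
        scores = new_scores
        back_pointers.append(pointers)

    # Traceback: follow the pointers from the best final state
    state = max(states, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(back_pointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Toy example: depth of about 30 is diploid; four windows jump to about 45
toy_coverage = [30, 31, 29, 30, 46, 44, 45, 47, 30, 29, 31]
copy_number = viterbi_copy_number(toy_coverage, baseline=30)
print(copy_number)
```

With these toy values the four elevated windows are called as three copies and the flanking windows as two. The high `stay_prob` is what stops the model from jumping between states on every noisy window.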

There are in fact 9 MFO genes at this genomic location, all situated in close proximity. Genes in this cluster have [previously been linked to pyrethroid resistance](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2583951/), so any CNVs at this genomic location are likely to be important. We found 15 different CNVs spanning one or more genes in this gene cluster, shown in the plot below by the coloured bars labelled "Dup1" to "Dup15":

![MFO CNVs at the cyp6aa - cyp6p gene cluster](http://alimanfoo.github.io/slides/20190606-who-geneva/cyp6p-cnvs.png)

Some of these CNVs were found in only a handful of mosquitoes, but some were much more common. For example, in Cote d'Ivoire we found that Dup7 was present in 32% of mosquitoes, Dup11 in 40%, Dup14 in 46% and Dup15 in 39%. You may have noticed these frequencies add up to more than 100%, which is possible because some mosquitoes were carrying two different CNVs. Overall 98% of the Cote d'Ivoire mosquitoes were carrying some kind of CNV at this gene cluster, which is probably a sign that the population has been under very intense selection pressure.   
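Here is a small sketch of why per-CNV frequencies can add up to more than 100% while the fraction of mosquitoes carrying any CNV stays at or below 100%. The five mosquitoes and their CNV assignments are hypothetical toy data, not values from the study.

```python
# Hypothetical toy data: the set of CNVs carried by each of five mosquitoes
carriers = {
    "m1": {"Dup7", "Dup14"},
    "m2": {"Dup11"},
    "m3": {"Dup7", "Dup15"},
    "m4": {"Dup14"},
    "m5": set(),  # carries no CNV at this gene cluster
}
n = len(carriers)

# Frequency of each CNV: fraction of mosquitoes carrying it
cnv_names = sorted(set().union(*carriers.values()))
freq = {cnv: sum(cnv in c for c in carriers.values()) / n for cnv in cnv_names}

# Fraction of mosquitoes carrying at least one CNV
any_cnv = sum(bool(c) for c in carriers.values()) / n

print(freq)
print(f"sum of per-CNV frequencies: {sum(freq.values()):.0%}")
print(f"carrying at least one CNV: {any_cnv:.0%}")
```

Mosquitoes carrying two CNVs are counted once for each CNV in the per-CNV frequencies, but only once in the "any CNV" figure.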

## Tracking the spread of resistance

Eric Lucas from the Liverpool School of Tropical Medicine, who led the analysis, did a beautifully detailed characterisation of the CNVs we discovered at insecticide resistance genes. In the supplementary material there are complete descriptions of each of the CNVs we found, including their precise breakpoints in the DNA sequence, and their frequency in each of the mosquito populations we sampled, for the [*Cyp6aa* - *Cyp6p*](https://genome.cshlp.org/content/suppl/2019/07/19/gr.245795.118.DC1/Supplementary_Data_S4.pdf), [*Gstu* - *Gste*](https://genome.cshlp.org/content/suppl/2019/07/19/gr.245795.118.DC1/Supplementary_Data_S5.pdf) and [*Cyp6m* - *Cyp6z*](https://genome.cshlp.org/content/suppl/2019/07/19/gr.245795.118.DC1/Supplementary_Data_S6.pdf) gene clusters and the [*Cyp9k1*](https://genome.cshlp.org/content/suppl/2019/07/19/gr.245795.118.DC1/Supplementary_Data_S7.pdf) gene.

CNVs are created when a segment of DNA is not copied perfectly but is duplicated instead, which is a type of accidental mutation that happens occasionally during the normal processes of DNA replication and recombination. The exact beginning and end of the duplicated piece of DNA (known as the breakpoints) can tell us something about its origin, because it's unlikely that two independent CNV mutations will duplicate exactly the same region of DNA. In other words, if you find two mosquitoes that both have a CNV with exactly the same breakpoints, it's fairly safe to assume that they have inherited that CNV from a common ancestor.
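The breakpoint reasoning above can be sketched as a simple grouping step: samples whose CNV has exactly the same start and end are grouped together and treated as carrying the same CNV. The sample IDs and coordinates below are invented for illustration, and real CNV calls would need some care over how precisely breakpoints can be resolved.

```python
from collections import defaultdict

# Hypothetical CNV calls: (sample id, start breakpoint, end breakpoint)
calls = [
    ("A", 1000, 5000),
    ("B", 1000, 5000),
    ("C", 1200, 5000),  # same end, different start: treated as a different CNV
    ("D", 1000, 5000),
]

# Group samples by their exact pair of breakpoints
by_breakpoints = defaultdict(list)
for sample, start, end in calls:
    by_breakpoints[(start, end)].append(sample)

# Breakpoint pairs seen in more than one sample suggest shared ancestry
shared = {bp: samples for bp, samples in by_breakpoints.items() if len(samples) > 1}
print(shared)
```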

When we have mosquitoes collected from multiple geographical locations, we can use this type of reasoning to ask questions about whether CNV mutations are spreading between different mosquito populations. For example, at the *Cyp6aa* - *Cyp6p* gene cluster, we found the Dup15 CNV in mosquitoes from Burkina Faso and Cote d'Ivoire. This tells us that the Dup15 CNV has spread, and has ended up in two different countries. We cannot say exactly where it originated from, because it could have come from Burkina Faso and spread to Cote d'Ivoire, or vice versa, or it could have originated from some other country that we didn't sample any mosquitoes from. 

We found a number of other examples of CNVs that have spread to multiple countries. Here's a list of the CNVs found in more than one country, and the populations in which they were found (population frequency shown in brackets):

* Dup1 - Burkina Faso *An. coluzzii* (8%), Uganda *An. gambiae* (58%)
* Dup7 - Burkina Faso *An. coluzzii* (44%), Cote d'Ivoire *An. coluzzii* (32%), Ghana *An. coluzzii* (5%), Guinea *An. coluzzii* (75%)
* Dup8 - Burkina Faso *An. gambiae* (3%), Guinea *An. gambiae* (3%)
* Dup10 - Burkina Faso *An. coluzzii* (49%), Ghana *An. coluzzii* (5%)
* Dup11 - Burkina Faso *An. coluzzii* (41%), Ghana *An. coluzzii* (5%)
* Dup14 - Burkina Faso *An. coluzzii* (3%), Cote d'Ivoire *An. coluzzii* (46%)
* Dup15 - Burkina Faso *An. coluzzii* (1%), Cote d'Ivoire *An. coluzzii* (39%)

Clearly there is a lot of gene flow connecting the *An. coluzzii* populations in West Africa, with Dup7, Dup10, Dup11, Dup14 and Dup15 all having spread to multiple countries. Dup1 also provides a rather dramatic example of gene flow, having found its way into populations of two different mosquito species, and into both West and East Africa. 
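The same observations can be pulled out of the list above programmatically. This sketch simply re-enters the frequencies from the list as a Python dictionary (the data structure and variable names are my own choice) and summarises, for each CNV, how many countries and which species it was found in.

```python
# Frequencies (%) copied from the list above: CNV -> [(country, species, frequency)]
cnv_records = {
    "Dup1": [("Burkina Faso", "An. coluzzii", 8), ("Uganda", "An. gambiae", 58)],
    "Dup7": [("Burkina Faso", "An. coluzzii", 44), ("Cote d'Ivoire", "An. coluzzii", 32),
             ("Ghana", "An. coluzzii", 5), ("Guinea", "An. coluzzii", 75)],
    "Dup8": [("Burkina Faso", "An. gambiae", 3), ("Guinea", "An. gambiae", 3)],
    "Dup10": [("Burkina Faso", "An. coluzzii", 49), ("Ghana", "An. coluzzii", 5)],
    "Dup11": [("Burkina Faso", "An. coluzzii", 41), ("Ghana", "An. coluzzii", 5)],
    "Dup14": [("Burkina Faso", "An. coluzzii", 3), ("Cote d'Ivoire", "An. coluzzii", 46)],
    "Dup15": [("Burkina Faso", "An. coluzzii", 1), ("Cote d'Ivoire", "An. coluzzii", 39)],
}

# Summarise the countries and species in which each CNV was observed
summary = {}
for name, records in cnv_records.items():
    countries = sorted({country for country, _, _ in records})
    species = sorted({sp for _, sp, _ in records})
    summary[name] = (countries, species)
    print(f"{name}: {len(countries)} countries, species: {', '.join(species)}")

# CNVs observed in more than one species
cross_species = [name for name, (_, species) in summary.items() if len(species) > 1]
print("Found in more than one species:", cross_species)
```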

## Challenges and opportunities

These new data on the genetic changes underpinning insecticide resistance expand our horizons and open up new approaches to molecular surveillance of insecticide resistance in mosquito populations. But how could this be translated from research into practice? And what challenges and complications do we need to be aware of?

Clearly high-throughput sequencing could be used as an operational tool for mosquito surveillance. We're working to scale up mosquito whole genome sequencing at the Wellcome Sanger Institute, and developing approaches to amplicon sequencing that could be deployed in local laboratories. Portable sequencing technology is also moving fast, so more options are likely to open up in the next few years. In short, there is plenty of opportunity, but there is work to do to operationalise technology and build services, and investments are needed to fund translational work.

There are also many caveats and complexities. CNVs are not the only type of genetic change that could cause MFO-based pyrethroid resistance. And there are many factors beyond mosquito genotype that will determine which type of LLINs are most cost-effective in any given setting. Genome sequencing will not replace the need to perform resistance bioassays, or do experimental hut trials, or do randomized controlled trials to compare different interventions. 

But mosquito populations are a moving target, and we need to find a way to keep pace. If we collect the right type of data, we can start with crude approximations -- decision-making heuristics that are not perfect, but are better than nothing -- and as we accumulate more data, our models can be improved over time.