# Bird Co-occurrences
### Analyzing Indian eBird data on co-occurrences for habitats and geography

We analyze the eBird data for _co-occurrences_ of species, i.e., species occurring together more often than expected from their separate frequencies. We can formulate this as the following question.

### Question: What causes bird species to occur together?

We shall see that such _co-occurrences_ arise for two reasons: geography and habitat.

With more data, one could reverse this analysis and divide India into geographical regions and habitats, from a _bird's eye view_, based on co-occurrences. The present eBird data is clearly inadequate for this, as only some regions and habitats are visible in it.

**Code:** This is a Jupyter notebook written in Scala.

## Setup and loading data

We first add all our dependencies, both our own code and the libraries on which it depends. The code here lives in the repository for some of our other code, [ProvingGround](https://github.com/siddhartha-gadgil/ProvingGround).

In [1]:
classpath.addPath("/home/gadgil/code/ProvingGround/deepwalk/target/scala-2.11/deepwalk4s_2.11-0.8.jar")



In [2]:
classpath.add("com.lihaoyi" %% "ammonite-ops" % "0.7.7")


1 new artifact(s)


1 new artifacts in macro
1 new artifacts in runtime
1 new artifacts in compile




In [3]:
import ammonite.ops._

import ammonite.ops._

The eBird data has been pre-processed (not in this notebook) and saved in various files.
We first read how many checklists contain a given bird, and how many contain both birds in a pair.
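
Since the pre-processing is not part of this notebook, here is a minimal sketch of how such counts could be derived, assuming each checklist is available as a set of scientific names. The object `ChecklistCounts` and its methods are hypothetical names introduced for illustration, not the code that produced the files read below.

```scala
// Hypothetical sketch: NOT the pre-processing actually used for this notebook.
object ChecklistCounts {
  type Checklist = Set[String] // scientific names recorded on one checklist

  // Number of checklists containing each species.
  def frequencies(checklists: Seq[Checklist]): Map[String, Int] =
    checklists.flatten.groupBy(identity).map { case (s, xs) => (s, xs.size) }

  // Number of checklists containing both species of an ordered pair (a, b), a != b.
  // Both orders (a, b) and (b, a) are recorded.
  def pairCounts(checklists: Seq[Checklist]): Map[(String, String), Int] = {
    val pairs = for {
      c <- checklists
      a <- c.toSeq
      b <- c.toSeq
      if a != b
    } yield (a, b)
    pairs.groupBy(identity).map { case (p, xs) => (p, xs.size) }
  }
}
```

Storing both orders of each pair is a design choice of this sketch; it keeps later lookups by `(a, b)` simple at the cost of doubling the table.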

In [4]:
val data = pwd / up / 'data

data: Path = /home/gadgil/code/ProvingGround/data

In [5]:
val freqF = data / "frequencies.tsv"
val freqs = read.lines(freqF) map (_.split("\t")) map {case Array(a, n) => (a, n.toInt)} sortBy((an) => - an._2)

freqF: data.ThisType = /home/gadgil/code/ProvingGround/data/frequencies.tsv
freqs: Vector[(String, Int)] = Vector(
  ("Corvus splendens", 12538),
  ("Acridotheres tristis", 12509),
  ("Halcyon smyrnensis", 11176),
  ("Corvus macrorhynchos", 10495),
  ("Dicrurus macrocercus", 9785),
  ("Ardeola grayii", 9660),
  ("Pycnonotus cafer", 9551),
  ("Pycnonotus jocosus", 8892),
  ("Streptopelia chinensis", 8817),
  ("Orthotomus sutorius", 8780),
  ("Copsychus saularis", 8545),
  ("Centropus sinensis", 8394),
  ("Psilopogon viridis", 8126),
  ("Psittacula krameri"

In [6]:
val coF = data / "co-occurences.tsv"
def pairs = read.lines(coF) map (_.split("\t")) map {case Array(a, b, n) => (a, b, n.toInt)}

coF: data.ThisType = /home/gadgil/code/ProvingGround/data/co-occurences.tsv
defined function pairs

We use scientific names for the analysis, but we have also extracted common names since they are easier to read.

In [7]:
val commonF = data / "common-names.tsv"
val commonNames = read.lines(commonF) map (_.split("\t")) map {case Array(sc, comm) => (sc, comm)} toMap

commonF: data.ThisType = /home/gadgil/code/ProvingGround/data/common-names.tsv
commonNames: Map[String, String] = Map(
  "Clamator coromandus" -> "Chestnut-winged Cuckoo",
  "Alauda arvensis/gulgula" -> "Eurasian/Oriental Skylark",
  "Ardenna carneipes" -> "Flesh-footed Shearwater",
  "Ichthyophaga ichthyaetus" -> "Gray-headed Fish-Eagle",
  "Otus lettia" -> "Collared Scops-Owl",
  "Aythya nyroca" -> "Ferruginous Duck",
  "Aegithalos iouschistos" -> "Black-browed Tit",
  "Anatidae sp." -> "waterfowl sp.",
  "Cinclidium leucurum" -> "White-tailed Robin",
  "Xenus cinereus" -> "Terek Sandpiper",
  "Horornis flavolivaceus" -> "Aberrant Bush-Warbler",
  "Catreus wallichii" -> "Cheer Pheasant",
  "Accip

Let us look at the 250 most common birds, where commonness is measured by the number of checklists that contain the bird.

In [8]:
val top250 = freqs.take(250).map(_._1)
show((top250 map (commonNames)).zipWithIndex)

Vector(
  ("House Crow", 0),
  ("Common Myna", 1),
  ("White-throated Kingfisher", 2),
  ("Large-billed Crow", 3),
  ("Black Drongo", 4),
  ("Indian Pond-Heron", 5),
  ("Red-vented Bulbul", 6),
  ("Red-whiskered Bulbul", 7),
  ("Spotted Dove", 8),
  ("Common Tailorbird", 9),
  ("Oriental Magpie-Robin", 10),
  ("Greater Coucal", 11),
  ("White-cheeked Barbet", 12),
  ("Rose-ringed Parakeet", 13),
  ("Rock Pigeon", 14),
  ("Black Kite", 15),
  ("Purple-rumped Sunbird", 16),
  ("Little Cormorant", 17),


top250: Vector[String] = Vector(
  "Corvus splendens",
  "Acridotheres tristis",
  "Halcyon smyrnensis",
  "Corvus macrorhynchos",
  "Dicrurus macrocercus",
  "Ardeola grayii",
  "Pycnonotus cafer",
  "Pycnonotus jocosus",
  "Streptopelia chinensis",
  "Orthotomus sutorius",
  "Copsychus saularis",
  "Centropus sinensis",
  "Psilopogon viridis",
  "Psittacula krameri",
  "Columba livia",
  "Milvus migrans",
  "Leptocoma zeylonica",
  "Microcarbo niger",
  "Eudynamys scolopaceus",
...

We shall analyse which birds are seen together, restricting attention to pairs in which both species are among the top 250. This is because, when we map birds to vectors, including all birds makes the common ones cluster together.

In [9]:
val topSet = top250.toSet
val scientificNames = (commonNames map {case (s, c) => (c, s)}).toMap
val bothSeen = {for {(a, b, n) <- pairs if topSet.contains(a) && topSet.contains(b)} yield((a, b), n)}.toMap

topSet: Set[String] = Set(
  "Psittacula eupatria",
  "Corvus macrorhynchos",
  "Ardea alba",
  "Mycteria leucocephala",
  "Zosterops palpebrosus",
  "Dendrocitta leucogastra",
  "Ploceus philippinus",
  "Artamus fuscus",
  "Cyornis tickelliae",
  "Platalea leucorodia",
  "Dicaeum agile",
  "Picus chlorolophus",
  "Eumyias thalassinus",
  "Dendrocopos nanus",
  "Acridotheres ginginianus",
  "Circus aeruginosus",
  "Merops leschenaulti",
  "Phaenicophaeus viridirostris",
  "Ficedula parva",
...
scientificNames: Map[String, String] = Map(
  "roller sp." -> "Coracias sp.",
  "Whimbrel" -> "Numenius phaeopus",
  "Laughing Dove" -> "Streptopelia senegalensis",
  "Tawny-breaste

In [10]:
val p = freqs.toMap

p: Map[String, Int] = Map(
  "Clamator coromandus" -> 42,
  "Alauda arvensis/gulgula" -> 2,
  "Ardenna carneipes" -> 49,
  "Ichthyophaga ichthyaetus" -> 48,
  "Otus lettia" -> 27,
  "Aythya nyroca" -> 188,
  "Aegithalos iouschistos" -> 14,
  "Anatidae sp." -> 55,
  "Cinclidium leucurum" -> 31,
  "Xenus cinereus" -> 221,
  "Horornis flavolivaceus" -> 37,
  "Catreus wallichii" -> 11,
  "Accipiter butleri" -> 1,
  "Merops apiaster" -> 17,
  "Psittacula eupatria" -> 527,
  "Hydroprogne caspia" -> 190,
  "Elachura formosa" -> 20,
  "Corvus macrorhynchos" -> 10495,
  "Dryocopus javensis" -> 174,
...

The _co-occurrence_ of two species is the ratio of the probability that they are seen together to what this probability would be if their occurrences were independent. We do not actually take this ratio but a constant multiple of it, as this makes no difference to the analysis.
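
To make the "constant multiple" explicit, the ratio can be written in terms of checklist counts. The symbols $N$ (total number of checklists), $n_a$ and $n_b$ (checklists containing each species) and $n_{ab}$ (checklists containing both) are notation introduced here for illustration; they do not appear in the notebook.

$$
\frac{P(a, b)}{P(a)\,P(b)} = \frac{n_{ab}/N}{(n_a/N)\,(n_b/N)} = N \cdot \frac{n_{ab}}{n_a\, n_b}
$$

Replacing the factor $N$ by any other positive constant, such as 10000, rescales every pair by the same amount and so leaves the ranking of pairs unchanged. A true ratio above 1 means the pair is seen together more often than independence would predict.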

In [11]:
def coOccurence(a: String, b: String) = 10000.0 * bothSeen((a, b)) / (p(a) * p(b))

defined function coOccurence

We can already see some interesting patterns in the data. Below are the pairs of species that co-occur the most (first by scientific name, then the top 1000 pairs by common name).

In [12]:
val topPairs = for (a <- top250; b <- top250 if a != b) yield (a, b)


topPairs: Vector[(String, String)] = Vector(
  ("Corvus splendens", "Acridotheres tristis"),
  ("Corvus splendens", "Halcyon smyrnensis"),
  ("Corvus splendens", "Corvus macrorhynchos"),
  ("Corvus splendens", "Dicrurus macrocercus"),
  ("Corvus splendens", "Ardeola grayii"),
  ("Corvus splendens", "Pycnonotus cafer"),
  ("Corvus splendens", "Pycnonotus jocosus"),
  ("Corvus splendens", "Streptopelia chinensis"),
  ("Corvus splendens", "Orthotomus sutorius"),
  ("Corvus splendens", "Copsychus saularis"),
  ("Corvus splendens", "Centropus sinensis"),
  ("Corvus splendens", "Psilopogon viridis"),
  ("Corvus sple

In [13]:
val together = topPairs.sortBy((ab) => - coOccurence(ab._1, ab._2)).filter((ab) => (ab._1 < ab._2))

together: Vector[(String, String)] = Vector(
  ("Hypsipetes leucocephalus", "Psilopogon virens"),
  ("Charadrius alexandrinus", "Charadrius mongolus"),
  ("Anas crecca", "Anas strepera"),
  ("Anas strepera", "Anser indicus"),
  ("Anser indicus", "Tadorna ferruginea"),
  ("Calidris minuta", "Calidris temminckii"),
  ("Phylloscopus xanthoschistos", "Saxicola ferreus"),
  ("Myophonus caeruleus", "Phylloscopus xanthoschistos"),
  ("Pycnonotus leucogenys", "Saxicola ferreus"),
  ("Phoenicurus ochruros", "Sylvia curruca"),
  ("Phylloscopus xanthoschistos", "Pycnonotus leucogenys"),
  ("Anas clypeata", "Anas strepera"),

In [14]:
show(together map {case (x, y) => (commonNames(x), commonNames(y))} take (1000))

Vector(
  ("Black Bulbul", "Great Barbet"),
  ("Kentish Plover", "Lesser Sand-Plover"),
  ("Green-winged Teal", "Gadwall"),
  ("Gadwall", "Bar-headed Goose"),
  ("Bar-headed Goose", "Ruddy Shelduck"),
  ("Little Stint", "Temminck's Stint"),
  ("Gray-hooded Warbler", "Gray Bushchat"),
  ("Blue Whistling-Thrush", "Gray-hooded Warbler"),
  ("Himalayan Bulbul", "Gray Bushchat"),
  ("Black Redstart", "Lesser Whitethroat"),
  ("Gray-hooded Warbler", "Himalayan Bulbul"),
  ("Northern Shoveler", "Gadwall"),
  ("Blue Whistling-Thrush", "Great Barbet"),
  ("Black Bulbul", "Gray-hooded Warbler"),
  ("No



We see that there are two kinds of pairs above:

* Those with the same geography, particularly those confined mainly to the Himalayas or the Western Ghats.
* Those with similar habitats, most obviously water birds. Even better, we see waders co-occurring with other waders, swimmers with other swimmers, and grassland birds with other grassland birds.

## Force-directed layout

We visualize the results with a _force-directed layout_, using the implementation at https://github.com/rsimon/scala-force-layout.
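
For readers unfamiliar with the idea, the following is a minimal sketch of a force-directed layout: every pair of nodes repels, and each weighted edge pulls its endpoints together like a spring. This is an illustration only and is not the scala-force-layout implementation; the object `ForceLayoutSketch`, its parameters and its default values are all assumptions made for the example.

```scala
// Illustrative sketch only: this is NOT the scala-force-layout implementation.
object ForceLayoutSketch {
  final case class Vec(x: Double, y: Double) {
    def +(o: Vec): Vec = Vec(x + o.x, y + o.y)
    def -(o: Vec): Vec = Vec(x - o.x, y - o.y)
    def *(k: Double): Vec = Vec(x * k, y * k)
    def norm: Double = math.hypot(x, y)
  }

  // edges: (i, j, weight) with node indices; a larger weight pulls the pair together harder.
  def layout(n: Int, edges: Seq[(Int, Int, Double)],
             iterations: Int = 500, repulsion: Double = 1.0,
             stiffness: Double = 0.1, step: Double = 0.05): Vector[Vec] = {
    // Deterministic start: nodes evenly spaced on the unit circle.
    var pos = Vector.tabulate(n) { i =>
      val t = 2 * math.Pi * i / n
      Vec(math.cos(t), math.sin(t))
    }
    for (_ <- 1 to iterations) {
      val force = Array.fill(n)(Vec(0, 0))
      // Every pair of nodes repels with magnitude repulsion / distance^2.
      for (i <- 0 until n; j <- 0 until n if i != j) {
        val d = pos(i) - pos(j)
        val r = math.max(d.norm, 1e-3)
        force(i) = force(i) + d * (repulsion / (r * r * r))
      }
      // Each edge attracts its endpoints like a spring of rest length zero.
      for ((i, j, w) <- edges) {
        val d = pos(j) - pos(i)
        force(i) = force(i) + d * (stiffness * w)
        force(j) = force(j) - d * (stiffness * w)
      }
      // Move every node a small step along its net force.
      pos = Vector.tabulate(n)(i => pos(i) + force(i) * step)
    }
    pos
  }
}
```

With co-occurrence as the edge weight, strongly co-occurring species end up near each other while weakly linked ones drift apart, which is what makes clusters visible in the plots.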

In [15]:
classpath.addPath("/home/gadgil/code/scala-force-layout/target/scala-2.11/scala-force-layout_2.11-0.4.0.jar")



In [16]:
import at.ait.dme.forcelayout._

import at.ait.dme.forcelayout._

In [17]:
import deepwalk4s._

import SvgGraphs._


import deepwalk4s._
import SvgGraphs._

In [18]:
val birdNodes = topSet.toVector map ((s) => Node(s, commonNames(s)))

birdNodes: Vector[Node] = Vector(
  Node(
    "Psittacula eupatria",
    "Alexandrine Parakeet",
    1.0,
    0,
    List(),
    List(),
    NodeState(
      Vector2D(0.45907578782237324, 0.019321055374912244),
      Vector2D(0.0, 0.0),
      Vector2D(0.0, 0.0)
    )
  ),
  Node(
    "Corvus macrorhynchos",
    "Large-billed Crow",
    1.0,
    0,
    List(),
...

In [19]:
val birdEdges = for ((x, y) <- topPairs) yield Edge(Node(x, commonNames(x)), Node(y, commonNames(y)), coOccurence(x, y))

birdEdges: Vector[Edge] = Vector(
  Edge(
    Node(
      "Corvus splendens",
      "House Crow",
      1.0,
      0,
      List(),
      List(),
      NodeState(
        Vector2D(-0.2187561722736484, 0.280245108196452),
        Vector2D(0.0, 0.0),
        Vector2D(0.0, 0.0)
      )
    ),
    Node(
      "Acridotheres tristis",
      "Common Myna",
      1.0,
      0,
...

In [20]:
val birdGraph = new SpringGraph(birdNodes, birdEdges)

birdGraph: SpringGraph = at.ait.dme.forcelayout.SpringGraph@106fad9c

In [21]:
birdGraph.doLayout(maxIterations = 3000)




In [22]:
val birdTriples = birdGraph.nodes.toVector map ((n) => (n.state.pos.x, n.state.pos.y, n.label))

birdTriples: Vector[(Double, Double, String)] = Vector(
  (244.18298343815485, 32.43422228152828, "Alexandrine Parakeet"),
  (46.99990366291408, -44.32769751148088, "Large-billed Crow"),
  (103.75698510233357, 21.28726034086808, "Great Egret"),
  (354.34239330408036, -15.263688749163816, "Painted Stork"),
  (-79.83362939002694, 81.0052870677415, "Oriental White-eye"),
  (-625.6861184459806, 22.044998806442837, "White-bellied Treepie"),
  (107.7530442402556, 59.83810036758661, "Baya Weaver"),
  (-116.02449762362188, -73.56173118989649, "Ashy Woodswallow"),
  (-101.5849214576269, -53.550942694970516, "Tickell's Blue-Flyc

In [23]:
val bigBirdPlot = scatterPlot(birdTriples, width = 1200, height = 500, r = 2)

bigBirdPlot: String = """

      <div>
      
    
    <style>
      .labl {
        display: none;
      }
      .labelled:hover + .labl {
        display: inline;
      }
    </style>
    

    <svg version="1.1"
   baseProfile="full"
   width="1200" height="500"
   xmlns="http://www.w3.org/2000/svg">

...

### Co-occurrence plots

On hovering over points in the pictures below, we can see what bird they represent. One can wander around the picture to see how birds have clustered.

In [24]:
display.html(bigBirdPlot)



To the right of the picture are Himalayan birds, and to the left are those from Malabar. The lower part of the main cluster consists mainly of water birds.

As geographical factors dominate the picture, it is worth separating these from others such as habitat. To do this, we take a variant of co-occurrence: the ratio of the probability of a pair of species occurring together to what that probability would be if their occurrences were independent _conditioned on the geographic distribution_.
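
One way to write this conditioned ratio down is sketched here, assuming the per-location totals count checklists, which the notebook does not state explicitly. The symbols are introduced for illustration: $N_\ell$ is the number of checklists at location $\ell$, $n_{a,\ell}$ the number of those containing species $a$, and $n_{ab}$ the number of checklists overall containing both $a$ and $b$.

$$
E_{ab} = \sum_{\ell} N_\ell \cdot \frac{n_{a,\ell}}{N_\ell} \cdot \frac{n_{b,\ell}}{N_\ell} = \sum_{\ell} \frac{n_{a,\ell}\, n_{b,\ell}}{N_\ell},
\qquad
\text{local co-occurrence}(a, b) = \frac{n_{ab}}{E_{ab}}
$$

Here $E_{ab}$ is the number of joint sightings expected if the two species were independent within each location. Under these assumptions the ratio is close to 1 for pairs whose togetherness is explained by geography alone, and larger when something else, such as habitat, brings them together.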

In [25]:
val birdLocFreqF = data / "bird-location-freqs.tsv"
val locFreqF = data / "location-freqs.tsv"

val locationFreqs = read.lines(locFreqF) map(_.split("\t")) map {case Array(l, n) => (l, n.toInt)} toMap

birdLocFreqF: data.ThisType = /home/gadgil/code/ProvingGround/data/bird-location-freqs.tsv
locFreqF: data.ThisType = /home/gadgil/code/ProvingGround/data/location-freqs.tsv
locationFreqs: Map[String, Int] = Map(
  "IN-DL" -> 357,
  "IN-MH" -> 1171,
  "IN-AP" -> 249,
  "IN-LD" -> 2,
  "IN-DD" -> 4,
  "IN-CH" -> 15,
  "IN-MP" -> 139,
  "IN-PY" -> 65,
  "IN-TR" -> 14,
  "IN-OR" -> 83,
  "IN-UL" -> 1354,
  "IN-GA" -> 1396,
  "IN-BR" -> 217,
  "IN-ML" -> 53,
  "IN-GJ" -> 463,
  "IN-DN" -> 1,
  "IN-TN" -> 4127,
  "IN-JK" -> 227,
  "IN-KL" -> 11632,
...

In [26]:
val birdLocFreqs = read.lines(birdLocFreqF) map(_.split("\t")) map {case Array(b, l, n) => ((b, l), n.toInt)} toMap

birdLocFreqs: Map[(String, String), Int] = Map(
  ("Motacilla flava", "IN-BR") -> 2,
  ("Parulidae sp.", "IN-KL") -> 13,
  ("Ficedula hodgsoni", "IN-WB") -> 16,
  ("Mirafra erythroptera", "IN-BR") -> 8,
  ("Prunella immaculata", "IN-WB") -> 23,
  ("Lophura leucomelanos", "IN-NL") -> 29,
  ("Gallirallus striatus", "IN-GJ") -> 1,
  ("Bubulcus ibis", "IN-UP") -> 1552,
  ("Anthus cervinus", "IN-HR") -> 2,
  ("Anas querquedula/crecca", "IN-TN") -> 1,
  ("Mycteria leucocephala", "IN-AS") -> 12,
  ("Circus aeruginosus", "IN-BR"

In [27]:
val places = locationFreqs.keys.toVector

places: Vector[String] = Vector(
  "IN-DL",
  "IN-MH",
  "IN-AP",
  "IN-LD",
  "IN-DD",
  "IN-CH",
  "IN-MP",
  "IN-PY",
  "IN-TR",
  "IN-OR",
  "IN-UL",
  "IN-GA",
  "IN-BR",
  "IN-ML",
  "IN-GJ",
  "IN-DN",
  "IN-TN",
  "IN-JK",
  "IN-KL",
...

In [28]:
def birdLocNum(bird: String, loc: String) = birdLocFreqs.getOrElse((bird, loc), 0).toDouble

defined function birdLocNum

In [29]:
def pairInLoc(x: String, y: String, loc: String) = birdLocNum(x, loc) * birdLocNum(y, loc) / locationFreqs(loc)

defined function pairInLoc

In [30]:
def seenSameLoc(x: String, y: String) = {places map (pairInLoc(x, y, _))}.sum

defined function seenSameLoc

In [31]:
def localCoOccurence(x: String, y: String) = bothSeen(x, y) / seenSameLoc(x, y)

defined function localCoOccurence
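
To see why conditioning on location matters, here is a self-contained toy calculation with made-up counts (none of the numbers come from the eBird data, and the object `LocalCoOccurrenceToy` is a hypothetical name). Two birds are confined to one small region: globally they look strongly associated, but within that region they co-occur exactly as often as independence predicts.

```scala
// Toy illustration with made-up counts; none of these numbers come from the eBird data.
object LocalCoOccurrenceToy {
  // Checklists per location (hypothetical).
  val locationFreqs = Map("North" -> 100, "South" -> 900)
  // Checklists per (bird, location): both birds are confined to "North".
  val birdLocFreqs = Map(("a", "North") -> 50, ("b", "North") -> 40)
  // Checklists containing both a and b.
  val bothSeen = 20

  def n(bird: String, loc: String): Double =
    birdLocFreqs.getOrElse((bird, loc), 0).toDouble

  // Total checklists containing a bird, summed over all locations.
  def total(bird: String): Double =
    locationFreqs.keys.toSeq.map(n(bird, _)).sum

  val allChecklists: Double = locationFreqs.values.sum.toDouble

  // Observed / expected, assuming independence over all checklists.
  val globalRatio: Double = bothSeen * allChecklists / (total("a") * total("b"))

  // Expected joint count, assuming independence within each location.
  val expectedLocal: Double =
    locationFreqs.toSeq.map { case (loc, m) => n("a", loc) * n("b", loc) / m }.sum

  // Observed / expected after conditioning on location.
  val localRatio: Double = bothSeen / expectedLocal
}
```

In this toy setting `globalRatio` is 10 while `localRatio` is 1: the apparent association is entirely geographic, and the conditioned ratio removes it.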

In [32]:
val byLocalCoOcc = {for{ (x, y) <- topPairs if seenSameLoc(x, y) > 0} yield 
                (x, y, localCoOccurence(x, y))} sortBy ((t) => -t._3)

byLocalCoOcc: Vector[(String, String, Double)] = Vector(
  ("Gracula indica", "Rhipidura albicollis", 465.3333333333333),
  ("Dendrocitta leucogastra", "Rhipidura albicollis", 465.3333333333333),
  ("Rhipidura albicollis", "Gracula indica", 465.3333333333333),
  ("Rhipidura albicollis", "Dendrocitta leucogastra", 465.3333333333333),
  ("Mirafra affinis", "Saxicola ferreus", 390.3333333333333),
  ("Saxicola ferreus", "Mirafra affinis", 390.3333333333333),
  ("Psilopogon malabaricus", "Saxicola ferreus", 292.75),
  ("Saxicola ferreus", "Psilopogon malabaricus", 292.75),
  ("Turdoides affinis", "Saxicola ferreus", 234.200

In [33]:
show(byLocalCoOcc map {case (x, y, n) => (commonNames(x), commonNames(y), n)} take 1000)

Vector(
  ("Southern Hill Myna", "White-throated Fantail", 465.3333333333333),
  ("White-bellied Treepie", "White-throated Fantail", 465.3333333333333),
  ("White-throated Fantail", "Southern Hill Myna", 465.3333333333333),
  ("White-throated Fantail", "White-bellied Treepie", 465.3333333333333),
  ("Jerdon's Bushlark", "Gray Bushchat", 390.3333333333333),
  ("Gray Bushchat", "Jerdon's Bushlark", 390.3333333333333),
  ("Malabar Barbet", "Gray Bushchat", 292.75),
  ("Gray Bushchat", "Malabar Barbet", 292.75),
  ("Yellow-billed Babbler", "Gray Bushchat", 234.20000000000002),
  ("Gray Bushchat", "Yellow-billed Babbler", 234.200000



In [34]:
val birdLocEdges = for ((x, y) <- topPairs) yield Edge(Node(x, commonNames(x)), Node(y, commonNames(y)), localCoOccurence(x, y))

birdLocEdges: Vector[Edge] = Vector(
  Edge(
    Node(
      "Corvus splendens",
      "House Crow",
      1.0,
      0,
      List(),
      List(),
      NodeState(
        Vector2D(0.4839669666914944, 0.4848822396937428),
        Vector2D(0.0, 0.0),
        Vector2D(0.0, 0.0)
      )
    ),
    Node(
      "Acridotheres tristis",
      "Common Myna",
      1.0,
      0,
...

In [35]:
val birdLocGraph = new SpringGraph(birdNodes, birdLocEdges)

birdLocGraph: SpringGraph = at.ait.dme.forcelayout.SpringGraph@5d4dab43

In [36]:
birdLocGraph.doLayout(maxIterations = 3000)



In [37]:
val birdLocTriples = birdLocGraph.nodes.toVector map ((n) => (n.state.pos.x, n.state.pos.y, n.label))

birdLocTriples: Vector[(Double, Double, String)] = Vector(
  (1208.5794247404278, 753.6699889774096, "Alexandrine Parakeet"),
  (-458.16670543294015, -536.2105051943871, "Large-billed Crow"),
  (-757.1440727866661, 1287.6958046326067, "Great Egret"),
  (-1431.0742321824405, 3892.769002085436, "Painted Stork"),
  (473.3690936750347, -1639.7080922546318, "Oriental White-eye"),
  (993.5467289042905, -2758.681601265033, "White-bellied Treepie"),
  (-299.3997492786954, 1587.1546889558756, "Baya Weaver"),
  (-24.274335130624408, 125.90195005396666, "Ashy Woodswallow"),
  (684.8407269152327, -1307.350226579547, "Tickell's Blu

In [38]:
val birdLocPlot = scatterPlot(birdLocTriples, width = 1200, height = 500, r = 2)

birdLocPlot: String = """

      <div>
      
    
    <style>
      .labl {
        display: none;
      }
      .labelled:hover + .labl {
        display: inline;
      }
    </style>
    

    <svg version="1.1"
   baseProfile="full"
   width="1200" height="500"
   xmlns="http://www.w3.org/2000/svg">

...

In [39]:
display.html(birdLocPlot)



Once the geography has been separated out, one can see that habitats dominate. On the left end are clearly waterbirds, while on the right we appear to have birds of forested areas.

### Concluding remarks:

Given the limitations of the present eBird data, it is clear that we cannot go much beyond this; indeed, only the geographical regions and habitats that have the greatest impact on birds are visible. Nevertheless, we speculate on analyses that could be done with more data of the same nature.

* We can look for species A and B such that
    * A and B are close in the force-directed graph, but
    * A and B do not co-occur as much as expected given their proximity.

  Such species are likely to be occupying the same niche.

* As mentioned in the introduction, one can naturally partition into _geographical regions_, _habitats_ and even _micro-habitats_, with the clustering into geographical regions based on geodata as well as on co-occurrences.
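
The first suggestion above (species that are close in the layout but rarely seen together) could be explored along the following lines. This is a speculative sketch, not an analysis we have run: the object `NicheCandidates`, the two thresholds, and the convention that a missing pair has co-occurrence 0 are all assumptions made for illustration.

```scala
// Hypothetical sketch of the proposed analysis; names and thresholds are assumptions.
object NicheCandidates {
  // positions: bird -> (x, y) taken from a force-directed layout.
  // coOcc: co-occurrence score for a pair, looked up in either order (missing = 0.0).
  def candidates(
      positions: Map[String, (Double, Double)],
      coOcc: Map[(String, String), Double],
      maxDistance: Double,
      maxCoOcc: Double
  ): Vector[(String, String)] = {
    val birds = positions.keys.toVector.sorted

    // Euclidean distance between two birds in the layout.
    def dist(a: String, b: String): Double = {
      val (ax, ay) = positions(a)
      val (bx, by) = positions(b)
      math.hypot(ax - bx, ay - by)
    }

    def score(a: String, b: String): Double =
      coOcc.getOrElse((a, b), coOcc.getOrElse((b, a), 0.0))

    // Keep each unordered pair once: close in the layout, yet rarely seen together.
    for {
      a <- birds
      b <- birds
      if a < b && dist(a, b) <= maxDistance && score(a, b) <= maxCoOcc
    } yield (a, b)
  }
}
```

Suitable values for `maxDistance` and `maxCoOcc` would depend on the scale of the layout and of the co-occurrence measure, so they would have to be chosen by inspecting the data.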