## Microtask 4 ##
Perform any other analysis you may find interesting, based on GrimoireLab enriched indexes for git and GitHub repositories.

#### Analysing contributions by Geography ####
In this microtask we analyse user contributions by geographical region.

In [7]:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
from pprint import pprint
import matplotlib.pyplot as plt
import pandas as pd
import requests
import os

We will use the _[gmaps](https://github.com/pbugnion/gmaps)_ library to plot the geographical data. It needs a Google Maps API key, which is read here from the `GOOGLE_API_KEY` environment variable.

In [4]:
import gmaps
gmaps.configure(api_key=os.environ["GOOGLE_API_KEY"])

**Step1**: Fetch the repositories of the CHAOSS organization from the GitHub API and store their names in the `repo_names` list. Note that a single request returns only the first page of results (30 repositories by default).

In [8]:
chaoss_repositories = requests.get('https://api.github.com/users/chaoss/repos')
repo_names = [chaoss_repo['name'] for chaoss_repo in chaoss_repositories.json()]
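
If the organization has more repositories than fit on one page, the listing has to be paginated. The following is a minimal sketch of that, not part of the original notebook: `list_repo_names` is a hypothetical helper, and it assumes the standard `per_page` and `page` query parameters of the GitHub REST API. The `get` argument is any callable with the interface of `requests.get`, which keeps the helper easy to try out without network access.

```python
def list_repo_names(get, url="https://api.github.com/users/chaoss/repos", per_page=100):
    """Collect repository names from every page of a GitHub repository listing.

    `get` is any callable that behaves like `requests.get` (hypothetical
    injection point, used so the function can be exercised offline).
    """
    names = []
    page = 1
    while True:
        response = get(url, params={"per_page": per_page, "page": page})
        response.raise_for_status()
        repos = response.json()
        if not repos:  # an empty page means everything has been listed
            break
        names.extend(repo["name"] for repo in repos)
        page += 1
    return names
```

In the notebook this could be called as `repo_names = list_repo_names(requests.get)`.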

**Step2**: Use the `p2o.py` script to fetch the GitHub data for all the repositories, storing the raw records in the `chaoss_github_raw` index and the enriched data in the `chaoss_github` index.

In [None]:
# Assumes a GitHub API token is exported as GITHUB_TOKEN; avoid hard-coding tokens in a notebook
github_token = os.environ["GITHUB_TOKEN"]
for repo_name in repo_names:
    !p2o.py --enrich --index chaoss_github_raw --index-enrich chaoss_github -e http://localhost:9200 --no_inc --debug github chaoss {repo_name} -t {github_token} --sleep-for-rate

**Step3**: Instantiate the Elasticsearch client

In [5]:
es_client = Elasticsearch('http://localhost:9200')

**Step4**: Create a DSL search object to query the enriched `chaoss_github` index created earlier. The query returns only the `user_geolocation` field, and only for items whose `item_type` is `pull request`.

In [27]:
github_search = Search(using=es_client, index='chaoss_github')\
               .source(['user_geolocation'])\
               .filter('term', item_type='pull request')
github_results = github_search.scan()
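
For readers more familiar with raw Elasticsearch queries, the search object above should correspond to roughly the request body sketched below. This is an illustration rather than the notebook's own code: `build_geolocation_query` is a hypothetical helper, and the exact serialisation is best confirmed by comparing it with `github_search.to_dict()`.

```python
def build_geolocation_query(item_type="pull request"):
    """Return a request body that keeps only `user_geolocation` for one item type."""
    return {
        # Limit the returned document to the single field we need
        "_source": ["user_geolocation"],
        # A filter clause matches exactly, without relevance scoring
        "query": {"bool": {"filter": [{"term": {"item_type": item_type}}]}},
    }


query_body = build_geolocation_query()
```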

**Step5**: Pull out the `(latitude, longitude)` values from the pull requests data (when present) into a list

In [28]:
locations = [(result['user_geolocation']['lat'], result['user_geolocation']['lon'])
             for result in github_results
             if 'user_geolocation' in result and result['user_geolocation']]
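
Many pull requests come from the same place, so the list above contains repeated coordinates. One optional refinement, sketched here under assumptions, is to collapse nearby points into a single entry with a count. `weighted_locations` and its `precision` parameter are hypothetical names introduced for illustration, and rounding to two decimals is an arbitrary choice.

```python
from collections import Counter


def weighted_locations(locations, precision=2):
    """Group coordinates rounded to `precision` decimals and count each group."""
    counts = Counter(
        (round(lat, precision), round(lon, precision)) for lat, lon in locations
    )
    points = list(counts)                        # unique rounded coordinates
    weights = [counts[point] for point in points]  # pull requests per coordinate
    return points, weights


# Small made-up sample, only to show the shape of the result
sample_points, sample_weights = weighted_locations(
    [(40.416, -3.703), (40.4161, -3.7031), (51.5, -0.12)]
)
```

The resulting lists could then be passed to the heatmap as `gmaps.heatmap_layer(points, weights=weights)`, and the counts also give a quick way to see which locations contribute the most pull requests.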

**Step6**: Plot a heatmap showing the distribution of pull requests by geographical location using the gmaps library

In [29]:
map_plot = gmaps.figure()
heatmap_layer = gmaps.heatmap_layer(locations, point_radius=10, max_intensity=5)
map_plot.add_layer(heatmap_layer)
map_plot