# LOADING DATABASE IN ELASTIC

**Description:** This notebook runs and explains the data-loading process for the assignment. Its goal is to create an Elasticsearch index with our custom mapping document and load the data into it.

**Data:** The data comes from the Cloud Native Computing Foundation database.

**Team members:** Verónica Gómez, Carlos Grande and Pablo Olmos

**GitHub URL:** https://github.com/charlstown/CNCF_SurvivalAnalysis.git

# INDEX

* [1. Loading data](#loadingData)
* [2. Loading mapping](#loadingMapping)
* [3. Creating Index](#creatingIndex)
* [4. Uploading data to Elastic](#uploadingData)
---

## Libraries needed
These libraries are needed to run all the cells in this notebook.

In [2]:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
import json
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

## Connecting to Elasticsearch

In [2]:
elastic = Elasticsearch(["http://127.0.0.1:9200"])
print(elastic)

<Elasticsearch([{'host': '127.0.0.1', 'port': 9200}])>


## Importing scripts
In this section we import our Python module **elastic_loader**, which is stored in the `02_src` folder.

In [3]:
import sys
sys.path.append('../02_src/')
import elastic_loader as el

## 1. Loading Data <a class="anchor" id="loadingData"></a>

In [4]:
cncf_data = el.load_json('../01_data/cncf_git_data.json')

Loading json file... 

25 percent completed: 122926 documents in 29.04s
51 percent completed: 250769 documents in 36.17s
76 percent completed: 373695 documents in 42.56s
100 percent completed: 491703 documents in 49.03s


json file was loaded succesfully!
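
The implementation of `el.load_json` is not shown in this notebook. As a rough sketch of how a loader with this kind of progress output could work, the hypothetical `load_json_lines` below assumes the file holds one JSON document per line (the real file layout may differ) and reports progress roughly every quarter:

```python
import json
import time


def load_json_lines(path, report_every=0.25):
    """Read a file with one JSON document per line (assumed layout).

    Hypothetical helper, not the actual `elastic_loader.load_json`.
    Prints progress each time another `report_every` fraction is done.
    """
    start = time.time()
    with open(path, encoding="utf-8") as handle:
        # Skip blank lines so they are not parsed as documents.
        lines = [line for line in handle if line.strip()]

    total = len(lines)
    documents = []
    next_report = report_every
    for count, line in enumerate(lines, start=1):
        documents.append(json.loads(line))
        if count / total >= next_report:
            elapsed = time.time() - start
            print(f"{count / total:.0%} completed: {count} documents in {elapsed:.2f}s")
            next_report += report_every
    return documents
```

Reading all lines first makes the total known up front, which is what allows percentage-based reporting; for files that do not fit in memory, a streaming variant would have to report by document count instead.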


## 2. Loading Mapping <a class="anchor" id="loadingMapping"></a>

In [17]:
with open('../01_data/mapping.json') as json_file:
    mapping = json.load(json_file)

In [18]:
print(json.dumps(mapping, indent=4))

{
    "settings": {
        "number_of_shards": 5,
        "number_of_replicas": 2
    },
    "mappings": {
        "properties": {
            "Author_gender": {
                "type": "keyword"
            },
            "demography_min_date": {
                "type": "date"
            },
            "metadata__gelk_backend_name": {
                "type": "keyword"
            },
            "tz": {
                "type": "long"
            },
            "project": {
                "type": "keyword"
            },
            "metadata__timestamp": {
                "type": "date"
            },
            "uuid": {
                "type": "keyword"
            },
            "Author_user_name": {
                "type": "keyword"
            },
            "cm_title": {
                "type": "keyword"
            },
            "Commit_id": {
                "type": "keyword"
            },
            "Commit_user_name": {
                "type": "keyword"
            },


## 3. Creating the index <a class="anchor" id="creatingIndex"></a>

In [8]:
idx = 'cncf_mapped'

In [11]:
elastic.indices.create(index=idx, body=mapping)

{'acknowledged': True, 'shards_acknowledged': True, 'index': 'cncf_mapped'}
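
Running the `create` cell a second time raises an error because the index already exists. A minimal sketch of a guard, assuming a client object that exposes `indices.exists` and `indices.create` as the `elasticsearch` Python client does; `ensure_index` is a hypothetical helper, not part of `elastic_loader`:

```python
def ensure_index(client, index_name, index_body):
    """Create `index_name` only if it does not exist yet.

    Returns True when the index was created, False when it was already there.
    `index_body` is the settings/mappings dictionary, passed as `body`
    in the same way as the cell above.
    """
    if client.indices.exists(index=index_name):
        return False
    client.indices.create(index=index_name, body=index_body)
    return True
```

With the names used in this notebook, the call would be `ensure_index(elastic, idx, mapping)`. The guard does not check that an existing index has the expected mapping; it only avoids the failure on re-runs.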

### Checking the mapping

In [19]:
mapping = elastic.indices.get_mapping(index=idx)  # retrieve the mapping stored in the index
mapping_keys = mapping[idx]["mappings"].keys()
doc_type = list(mapping_keys)[0]  # first key under "mappings"
schema = mapping[idx]['mappings']['properties']  # field definitions
print(json.dumps(schema, indent=4))

{
    "Author_bot": {
        "type": "boolean"
    },
    "Author_domain": {
        "type": "keyword"
    },
    "Author_gender": {
        "type": "keyword"
    },
    "Author_gender_acc": {
        "type": "long"
    },
    "Author_id": {
        "type": "keyword"
    },
    "Author_name": {
        "type": "keyword"
    },
    "Author_org_name": {
        "type": "keyword"
    },
    "Author_user_name": {
        "type": "keyword"
    },
    "Author_uuid": {
        "type": "keyword"
    },
    "Commit_bot": {
        "type": "boolean"
    },
    "Commit_domain": {
        "type": "keyword"
    },
    "Commit_gender": {
        "type": "keyword"
    },
    "Commit_gender_acc": {
        "type": "long"
    },
    "Commit_id": {
        "type": "keyword"
    },
    "Commit_name": {
        "type": "keyword"
    },
    "Commit_org_name": {
        "type": "keyword"
    },
    "Commit_user_name": {
        "type": "keyword"
    },
    "Commit_uuid": {
        "type": "keyword"
    },


## 4. Uploading data to Elastic <a class="anchor" id="uploadingData"></a>

In [9]:
el.upload_to_index(elastic, idx, cncf_data)

Starting indexing... 

25 percent completed: 122926 documents in 1820.79s
51 percent completed: 250769 documents in 3705.57s
76 percent completed: 373695 documents in 5524.62s
100 percent completed: 491703 documents in 7277.13s


json file was loaded succesfully!
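
The run above indexed 491703 documents in 7277.13s, or about 68 documents per second. The implementation of `el.upload_to_index` is not shown here, so whether it already batches requests is unknown. If it sends one request per document, one option is the `bulk` helper in `elasticsearch.helpers`, which accepts an iterable of action dictionaries. A minimal sketch of building those actions, assuming `cncf_data` is a list of dictionaries and that each document's `uuid` field is suitable as the document id (both are assumptions); `generate_actions` is a hypothetical name:

```python
def generate_actions(documents, index_name, id_field="uuid"):
    """Yield one bulk action per document, in the dictionary format
    accepted by `elasticsearch.helpers.bulk`.

    Assumption: `id_field` holds a unique id. Documents without that
    field get no `_id`, so Elasticsearch generates one for them.
    """
    for document in documents:
        action = {"_index": index_name, "_source": document}
        if id_field in document:
            action["_id"] = document[id_field]
        yield action


# Usage against a live cluster (not run here):
# from elasticsearch import helpers
# helpers.bulk(elastic, generate_actions(cncf_data, idx))
```

Using a generator keeps memory use flat, because actions are built only as the helper consumes them. Setting `_id` from a stable field also means that re-running the upload overwrites documents instead of duplicating them.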
