# Welome to the HackZurich Credit Suisse Challenge!

First before you begin you will have to gain access to the IS&P research data and the Dow Jones news feed. 
To get to the data you will need to do the following: 
    
1.  Clone the github repo (which you've probably already done) located here. 
2.  Locate the cert file within the repo and make note of it (you will insert the file path below).
3.  Get the ElasticSearch password from one of the Credit Suisse organizers.
4.  Make sure you have PIP installed the latest SSL and ElasticSearch Libraries.
5.  Next in the code below you will have to point the context variable to the cert you just downloaded from this git     repo and enter the password we will give you in the http_auth argument.  
6.  Run the code and you are off to the races!  

##  So whats in the data? and where and how are you processing it?
We are using ElasticSearch 7.1 hosted within a Kubernetes cluster on the Google Cloud running within Zurich.  
If you are unfamiliar with ElasticSearch you can read more about it here:  https://www.elastic.co, 
be sure to read about the 7.1 version docs.
                           
## There are two indexes within the ES cluster you will be connecting to:
1.  isp: This is the internal Credit Suisse research.  There are 10,000 documents in this index.  The data model is 
    pretty straight forward once you begin hitting the cluster and digging in.
2.  dj:  This is the DowJones machine readable news feed.  The data model is much more robust than IS&P's so below you find an example.

## GOOD LUCK!

In [45]:
import requests, json, os
from elasticsearch import Elasticsearch
from ssl import create_default_context
import datetime
import operator

context = create_default_context(cafile=r"C:\Users\Tyukhova\Desktop\HackZurich\HackZurich2019\client.cer")
es = Elasticsearch(['https://data.schnitzel.tech:9200'], http_auth=('hack_zurich', 'punctualunicorns'), ssl_context=context)


In [46]:
es.info()

{'cluster_name': 'hack-zurich-es-cluster',
 'cluster_uuid': 'HQ9h7ToKTQuFf8D8Sxf69w',
 'name': 'hack-zurich-es-cluster-es-b96tlj45g6',
 'tagline': 'You Know, for Search',
 'version': {'build_date': '2019-05-16T00:43:15.323135Z',
  'build_flavor': 'default',
  'build_hash': '606a173',
  'build_snapshot': False,
  'build_type': 'docker',
  'lucene_version': '8.0.0',
  'minimum_index_compatibility_version': '6.0.0-beta1',
  'minimum_wire_compatibility_version': '6.8.0',
  'number': '7.1.0'}}

### Querying the cluster.
Now you are connected to the ES Cluster and we can begin to query the results.  See below for a sample query and formatting the results.

In [None]:
key_words = {"U.S.":0, 
             "Switzerland":0, 
             "San Paolo":0,
             "China":0,
             "United Nations": 0,
             "EU":0,
             "sanctions":0, 
             "memorandum":0,
             "elections":0,
             "meeting":0,
             "Donald Trump":0,
             "Boris Jonson":0,
             "Kim Kardashian":0,
             "Greta Thunberg" : 0,
             "carbon emission":0,
             "flood":0,
             "rain forest": 0,
             "temperature": 0,
             "pollution": 0,
             "cyber attack": 0,
             "malware":0,
             "smartphones":0,
             "drones":0,
             "network":0,
             "HUAWEI": 0,
             "Siemens": 0,
             "Google":0,
             "Microsoft":0,
             "crude oil":0,
             "measles":0,
             "cancer":0,
             "penguins": 0,
             "film festival":0
            }



In [609]:
my_topic = "cyber security "
for key in key_words:
    res = es.search(index="dj", body={
        "query": {
            "bool":{
                   "must": [
                       { "match": {"body" : key} },
                       { "match": {"body" : my_topic} }
                       ],
                    "filter" : { 
                        "range": {
                        "publication_datetime" : {"gte":"now-60d", "lt" : "now"}
                    }
                }
            }
        }
    }
    )
    key_words[key] = res['hits']['total']['value']

{'U.S.': 2162, 'Switzerland': 82, 'San Paolo': 426, 'China': 1236, 'United Nations': 2947, 'EU': 393, 'sanctions': 378, 'memorandum': 93, 'elections': 562, 'meeting': 1502, 'Donald Trump': 1363, 'Boris Jonson': 138, 'Kim Kardashian': 223, 'Greta Thunberg': 4, 'carbon emission': 201, 'flood': 88, 'rain forest': 235, 'temperature': 79, 'pollution': 67, 'cyber attack': 1705, 'malware': 76, 'smartphones': 93, 'drones': 120, 'network': 1015, 'HUAWEI': 130, 'Siemens': 26, 'Google': 225, 'Microsoft': 139, 'crude oil': 645, 'measles': 7, 'cancer': 136, 'pinguins': 0, 'film festival': 457}


In [640]:
sorted_key_words = sorted(key_words.items(), key=operator.itemgetter(1), reverse = True)
top_values = 5;
list_of_pairs = []
for i in range(0,5):
    for j in range(i+1,5):
        list_of_pairs.append([sorted_key_words[i][0],sorted_key_words[j][0]])

[('United Nations', 2947), ('U.S.', 2162), ('cyber attack', 1705), ('meeting', 1502), ('Donald Trump', 1363), ('China', 1236), ('network', 1015), ('crude oil', 645), ('elections', 562), ('film festival', 457), ('San Paolo', 426), ('EU', 393), ('sanctions', 378), ('rain forest', 235), ('Google', 225), ('Kim Kardashian', 223), ('carbon emission', 201), ('Microsoft', 139), ('Boris Jonson', 138), ('cancer', 136), ('HUAWEI', 130), ('drones', 120), ('memorandum', 93), ('smartphones', 93), ('flood', 88), ('Switzerland', 82), ('temperature', 79), ('malware', 76), ('pollution', 67), ('Siemens', 26), ('measles', 7), ('Greta Thunberg', 4), ('pinguins', 0)]
[['United Nations', 'U.S.'], ['United Nations', 'cyber attack'], ['United Nations', 'meeting'], ['United Nations', 'Donald Trump'], ['U.S.', 'cyber attack'], ['U.S.', 'meeting'], ['U.S.', 'Donald Trump'], ['cyber attack', 'meeting'], ['cyber attack', 'Donald Trump'], ['meeting', 'Donald Trump']]


In [641]:
frequencies_word_pairs = []
for word_pair in list_of_pairs:
    res = es.search(index="dj", body={
        "query": {
            "bool":{
                   "must": [
                       { "match": {"body" : word_pair[0]} },
                       { "match": {"body" : word_pair[1]} },
                       { "match": {"body" : my_topic} }
                       ],
                    "filter" : { 
                        "range": {
                        "publication_datetime" : {"gte":"now-60d", "lt" : "now"}
                    }
                }
            }
        }
    }
    )
    frequencies_word_pairs.append(res['hits']['total']['value'])





In [660]:
frequency_key_pairs = zip(list_of_pairs, frequencies_word_pairs)
sorted_key_pairs = sorted(frequency_key_pairs, key=operator.itemgetter(1), reverse = True)

target_pair = sorted_key_pairs[0][0]



In [723]:
res = es.search(index="dj", body={
    "query": {
        "bool" : {
            "must" : [
                {"match" : { "body" : my_topic},
                 "match" : { "body" : target_pair[0]},
                 "match" : { "body" : target_pair[1]},
                }
            ],
            "filter" : { 
                "range": {
                    "publication_datetime" : {"gte": "now-60d", "lt": "now"}
                }
            }
        }
    },"sort":{         
            "_score": {
                "order":"desc"
                }
            }
})


app_json = json.dumps(res['hits']['hits'][0])
print(app_json)

{'_index': 'dj', '_type': '_doc', '_id': 'RWGsdG0Bzcm26q2GdUeH', '_score': 13.161455, '_source': {'an': 'RVESEN0020190802ef8200e4d', 'modification_datetime': '1564765603000', 'ingestion_datetime': '1564765603000', 'publication_date': '1564765583000', 'publication_datetime': '1564765583000', 'snippet': 'UNITED NATIONS, August 2 (Sputnik) - The United Nations urges all parties to the conflict in Syria to allow a new delivery of humanitarian aid to the Rukban refugee camp, UN Office for the Coordination of Humanitarian Affairs (OCHA) spokesperson Russell Geekie told Sputnik on Friday.\n\n“The United Nations… calls on all parties to allow another delivery," Geekie said.“[It] continues to advocate and call for safe, sustained and unimpeded humanitarian access to Rukban, as well as to all those in need throughout Syria\u200b\u200b\u200b.”', 'body': "In February, a United Nations convoy distributed food, health and nutrition supplies, hygiene kits and recreational materials to refugees in Ruk

In [8]:
# Example DJ Data model:
import json
obj = json.loads("""{
  "_id": "ID",
  "_index": "INDEX NAMEr",
  "_score": 0,
  "_source": {
    "action": "add",
    "an": "ARTICLE ID",
    "art": "",
    "body": "MAIN TEXT",
    "byline": "",
    "company_codes": "COMMA SEPERATED LIST",
    "company_codes_about": "COMMA SEPERATED LIST",
    "company_codes_association": "",
    "company_codes_lineage": "",
    "company_codes_occur": ",ulvr,amesec,",
    "company_codes_relevance": "COMMA SEPERATED LIST",
    "copyright": "CW",
    "credit": "",
    "currency_codes": "",
    "document_type": "article",
    "industry_codes": "COMMA SEPERATED LIST",
    "ingestion_datetime": "INT",
    "language_code": "LANGUAGE CODE",
    "market_index_codes": "",
    "modification_date": "INT",
    "modification_datetime": "INT",
    "person_codes": "",
    "publication_date": "INT",
    "publication_datetime": "INT",
    "publisher_name": "Business Wire, Inc.",
    "region_codes": "COMMA SEPERATED LIST",
    "region_of_origin": "NAMZ USA",
    "snippet": "SAMPLE SNIPPET",
    "source_code": "BWR",
    "source_name": "Business Wire",
    "subject_codes": "COMMA SEPERATED LIST",
    "title": "TITLE",
    "word_count": "INT"
  },
  "_type": "_doc",
  "_version": 1,
  "fields": {
    "modification_datetime": [
      "2014-01-02T19:02:26.000Z"
    ],
    "publication_datetime": [
      "2014-01-02T19:02:00.000Z"
    ]
  }
}""")
