# Welcome to the HackZurich Credit Suisse Challenge!

Before you begin, you need access to the IS&P research data and the Dow Jones news feed.
To get to the data, do the following:

1.  Clone the GitHub repo (which you've probably already done) located here.
2.  Locate the cert file within the repo and make a note of its path (you will insert the file path below).
3.  Get the ElasticSearch password from one of the Credit Suisse organizers.
4.  Make sure you have installed the latest SSL and ElasticSearch libraries with pip.
5.  In the code below, point the context variable to the cert file from this git repo and enter the password we give you in the http_auth argument.
6.  Run the code and you are off to the races!
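
Steps 2 to 5 can be sketched as a small helper that fails early when the cert path is wrong and keeps the password out of the notebook. This is only a sketch: the helper name `connection_kwargs` and the `ES_PASSWORD` environment variable are assumptions made for illustration and are not part of the challenge repo.

```python
import os
from ssl import create_default_context


def connection_kwargs(cert_path, user="hack_zurich", password=None):
    """Build keyword arguments for the Elasticsearch constructor.

    Hypothetical helper: checks the cert path before any network call.
    """
    if not os.path.isfile(cert_path):
        raise FileNotFoundError("cert file not found: %s" % cert_path)
    if password is None:
        # ES_PASSWORD is an assumed variable name, not defined by the organizers.
        password = os.environ.get("ES_PASSWORD")
    if not password:
        raise ValueError("no password given and ES_PASSWORD is not set")
    # Trust the CA certificate shipped in the repo.
    context = create_default_context(cafile=cert_path)
    return {"http_auth": (user, password), "ssl_context": context}
```

The returned dictionary could then be unpacked into the client constructor, for example `Elasticsearch(['https://data.schnitzel.tech:9200'], **connection_kwargs(path))`, where `path` is wherever the cert file lives on your machine.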

## So what's in the data, and where and how are we processing it?
We are using ElasticSearch 7.1, hosted in a Kubernetes cluster on Google Cloud running in Zurich.
If you are unfamiliar with ElasticSearch, you can read more about it here: https://www.elastic.co.
Be sure to read the docs for version 7.1.
                           
## There are two indexes within the ES cluster you will be connecting to:
1.  isp: This is the internal Credit Suisse research. There are 10,000 documents in this index. The data model is
    pretty straightforward once you begin hitting the cluster and digging in.
2.  dj: This is the Dow Jones machine-readable news feed. Its data model is much richer than IS&P's, so you will find an example below.
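
Once connected, a quick way to confirm that both indexes are reachable is to ask the cluster for their document counts. The sketch below assumes an already connected client is passed in; `index_sizes` is a hypothetical helper name, while `count` is the client's standard count call.

```python
def index_sizes(es, index_names=("isp", "dj")):
    """Return {index name: document count} for the given indexes.

    `es` is assumed to be a connected Elasticsearch client.
    """
    return {name: es.count(index=name)["count"] for name in index_names}
```

Calling `index_sizes(es)` after the connection cell has run should return one count per index name.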

## GOOD LUCK!

In [31]:
import requests, json, os
from elasticsearch import Elasticsearch
from ssl import create_default_context

context = create_default_context(cafile="/PATH TO CERT/git/HackZurich2019/client.cer")
es = Elasticsearch(['https://data.schnitzel.tech:9200'], http_auth=('hack_zurich', 'PASSWORD'), ssl_context=context)


In [32]:
es.info()

{'name': 'hack-zurich-es-cluster-es-n8jsk8wbrp',
 'cluster_name': 'hack-zurich-es-cluster',
 'cluster_uuid': 'HQ9h7ToKTQuFf8D8Sxf69w',
 'version': {'number': '7.1.0',
  'build_flavor': 'default',
  'build_type': 'docker',
  'build_hash': '606a173',
  'build_date': '2019-05-16T00:43:15.323135Z',
  'build_snapshot': False,
  'lucene_version': '8.0.0',
  'minimum_wire_compatibility_version': '6.8.0',
  'minimum_index_compatibility_version': '6.0.0-beta1'},
 'tagline': 'You Know, for Search'}

### Querying the cluster.
You are now connected to the ES cluster and can begin querying it. See below for a sample query and how to format the results.

In [33]:
res = es.search(index="isp", body={"query": {"match_all": {}}})

In [34]:
print("Got %d Hits:" % res['hits']['total']['value'])
for hit in res['hits']['hits']:
    print("%(tm)s  %(body)s" % hit["_source"])

Got 10000 Hits:
2019-02-22T04:16:00.317Z  Waertsilae Corporation
2018-12-04T22:50:29.978Z  Hess Corp
2019-02-08T04:14:07.081Z  Investment case◾ With around one-third of earnings from each of Corporate & Investment Banking (CIB) and French retail, SocGen is more exposed to these areas than French peers. With International retail also delivering low returns, SocGen has the lowest 2018E group Return on Tangible Equity (ROTE) forecast at around 8%.
◾ A 50% payout gives SocGen a good dividend yield which is supportive, but its ability to absorb regulatory Risk Weighted Asset (RWA) inflation is lower than the mutually owned French banks, and there is some residual tail risk from litigation.
◾ Risks include performance of international operations (particularly Russia), greater exposure to French retail, and regulatory RWA inflation (where coverage at SocGen is lower than mutually owned peers).

2019-05-03T14:55:23.467Z  ◾ translated text: Der procure.ch Purchasing Managers’ Index (PMI)
◾ tran
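
Because some `body` values run to several paragraphs, the printed hits are hard to scan. A possible way to keep the output to one line per hit is sketched below. `summarize_hits` is a hypothetical helper; it assumes only the `tm` and `body` fields used in the cell above. Note that a search returns just the first 10 hits by default, even when the reported total is larger, unless `size` is set in the request.

```python
def summarize_hits(res, width=80):
    """Return one line per hit: timestamp plus a shortened, single-line body."""
    lines = []
    for hit in res["hits"]["hits"]:
        source = hit["_source"]
        # Collapse newlines and repeated spaces into single spaces.
        body = " ".join(str(source.get("body", "")).split())
        if len(body) > width:
            body = body[: width - 3] + "..."
        lines.append("%s  %s" % (source.get("tm", "?"), body))
    return lines
```

For example, `print("\n".join(summarize_hits(res)))` prints the same hits as above with each body cut to at most 80 characters.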

In [3]:
# Example DJ Data model:
import json
obj = json.loads("""{
  "_id": "ID",
  "_index": "INDEX NAME",
  "_score": 0,
  "_source": {
    "action": "add",
    "an": "ARTICLE ID",
    "art": "",
    "body": "MAIN TEXT",
    "byline": "",
    "company_codes": "COMMA SEPARATED LIST",
    "company_codes_about": "COMMA SEPARATED LIST",
    "company_codes_association": "",
    "company_codes_lineage": "",
    "company_codes_occur": ",ulvr,amesec,",
    "company_codes_relevance": "COMMA SEPARATED LIST",
    "copyright": "CW",
    "credit": "",
    "currency_codes": "",
    "document_type": "article",
    "industry_codes": "COMMA SEPARATED LIST",
    "ingestion_datetime": "INT",
    "language_code": "LANGUAGE CODE",
    "market_index_codes": "",
    "modification_date": "INT",
    "modification_datetime": "INT",
    "person_codes": "",
    "publication_date": "INT",
    "publication_datetime": "INT",
    "publisher_name": "Business Wire, Inc.",
    "region_codes": "COMMA SEPARATED LIST",
    "region_of_origin": "NAMZ USA",
    "snippet": "SAMPLE SNIPPET",
    "source_code": "BWR",
    "source_name": "Business Wire",
    "subject_codes": "COMMA SEPARATED LIST",
    "title": "TITLE",
    "word_count": "INT"
  },
  "_type": "_doc",
  "_version": 1,
  "fields": {
    "modification_datetime": [
      "2014-01-02T19:02:26.000Z"
    ],
    "publication_datetime": [
      "2014-01-02T19:02:00.000Z"
    ]
  }
}""")
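
Several DJ fields hold comma-separated code lists, and the sample value `",ulvr,amesec,"` shows that they can carry leading and trailing commas. A minimal sketch for turning such a string into a clean Python list follows; `split_codes` is a hypothetical helper name.

```python
def split_codes(value):
    """Split a comma-separated code string into a list, dropping empty entries."""
    return [code for code in (part.strip() for part in value.split(",")) if code]
```

With the sample document above, `split_codes(obj["_source"]["company_codes_occur"])` gives `['ulvr', 'amesec']`, and an empty field gives an empty list.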


In [41]:
# Sample query to extract articles about gold and precious metals from the DJ data.

body_json = json.loads(
"""{
  "from": 0,
  "size": 2000,
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": [
            {
              "multi_match": {
                "query": "Gold",
                "fields": [
                  "title^3",
                  "body^2"
                ],
                "type": "most_fields",
                "fuzziness": "AUTO"
              }
            }
          ],
          "should": [
            {
              "multi_match": {
                "query": "Silver and Heavy Metal",
                "fields": [
                  "title^5",
                  "snippet^2",
                  "body^3"
                ],
                "type": "best_fields",
                "fuzziness": "AUTO"
              }
            },
            {
              "range": {
                "publication_date": {
                  "gte": "1569000000",
                  "lte": "1569668677"
                }
              }
            }
          ]
        }
      }
    }
  }
}""")

In [42]:
body_json

{'from': 0,
 'size': 2000,
 'query': {'function_score': {'query': {'bool': {'must': [{'multi_match': {'query': 'Gold',
        'fields': ['title^3', 'body^2'],
        'type': 'most_fields',
        'fuzziness': 'AUTO'}}],
     'should': [{'multi_match': {'query': 'Silver and Heavy Metal',
        'fields': ['title^5', 'snippet^2', 'body^3'],
        'type': 'best_fields',
        'fuzziness': 'AUTO'}},
      {'range': {'publication_date': {'gte': '1569000000',
         'lte': '1569668677'}}}]}}}}}
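
The `range` values in the query look like Unix epoch seconds: read that way, `1569000000` is 2019-09-20 17:20:00 UTC. That reading of `publication_date` is an assumption based on the sample query, not something the data model states. Under that assumption, the bounds can be built from readable dates instead of hand-typed numbers. Both helper names below are hypothetical.

```python
from datetime import datetime, timezone


def epoch_seconds(moment):
    """Convert a datetime to whole Unix epoch seconds."""
    if moment.tzinfo is None:
        # Assumption: naive datetimes are meant as UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    return int(moment.timestamp())


def publication_range(start, end):
    """Build a range clause on publication_date, using strings as in the sample query."""
    return {
        "range": {
            "publication_date": {
                "gte": str(epoch_seconds(start)),
                "lte": str(epoch_seconds(end)),
            }
        }
    }
```

The resulting dictionary has the same shape as the `range` entry in the `should` list above, so it can be swapped in when building the query as a Python dict.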

In [43]:
# Note: this run targets the isp index; pass index="dj" to search the Dow Jones feed instead.
res = es.search(index="isp", body=body_json)

In [44]:
res

{'took': 1317,
 'timed_out': False,
 '_shards': {'total': 10, 'successful': 10, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 406, 'relation': 'eq'},
  'max_score': 67.237755,
  'hits': [{'_index': 'isp',
    '_type': '_doc',
    '_id': 'BOjPc20BlBsXk_7wmvdf',
    '_score': 67.237755,
    '_source': {'d': '◾ WPM provides investors with exposure to silver and gold through a portfolio of producing precious metals streaming agreements and a pipeline of longer-dated early deposit agreements. Silver and gold accounts for 60% and 40%, respectively, of WPM’s revenues providing investors with leverage to precious metal prices.\n◾ WPM’s streaming business provides investors with a high-margin business, while maintaining a fixed (and low) cost basis is a unique differentiator relative to traditional precious metals miners for exposure to the underlying silver and gold price.\n◾ Risks include (i) financial and commodity price risk (silver and gold), (ii) operations risk of miners opera