## Assignment-2: Tweets indexing, query processing and visualization

##### Visualization link:
https://docs.google.com/document/d/1BgA0HOoDmrNcvpjpzYpsJ9aDtGa3a0KgY3EQ6CnXRqU/edit?usp=sharing 

#### Write a Python application that:

#### Prepares the index mapping with the following properties:
- the tweet ID should be of type "keyword"
- the tweet text should be of type "text"
- the tweet creation date should be of type "date"
- the coordinates field should be of type "geo_point"

In [1]:
import sys
import json
from pprint import pprint
import datetime
from elasticsearch import Elasticsearch, helpers

# Create the Elasticsearch client
es = Elasticsearch("http://localhost:9200/")
index_name = "tweet"

# Development only: delete the index if it already exists so the notebook can be rerun
if es.indices.exists(index=index_name):
    es.indices.delete(index=index_name)
    

# Index settings: explicit mapping for the fields used in the queries
settings = {
    "mappings": {
        "properties": {
            "created_at": {
                "type": "date",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                    }
                }
            },
            "id": {
                "type": "keyword",
            },
            "id_str": {
                "type": "keyword",
            },
            "text": {
                "type": "text"
            },
            "coordinates": {
                "type": "geo_point"
            }
        }
    }
}
# Create the index
es.indices.create(index=index_name, body=settings)



ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'tweet'})

## - Reads and inserts tweets into an ES index (the assignment calls the index 'tweets'; the code here names it `tweet`)

In [2]:
# Read a file with one JSON tweet per line and yield each tweet as a dict
def read_file(file_name):
    with open(file_name, encoding="utf8") as f:
        for line in f:
            tweet = json.loads(line)
            # Convert Twitter's date format to ISO 8601 so it fits the "date" mapping
            tweet['created_at'] = datetime.datetime.strptime(tweet['created_at'], '%a %b %d %H:%M:%S %z %Y').isoformat()
            yield tweet
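
The date conversion done while reading the file can be checked on its own. A minimal sketch, assuming the raw `created_at` value follows Twitter's classic timestamp layout; the sample string is made up for illustration and `to_iso` is a hypothetical helper name, not part of the notebook:

```python
import datetime

# Layout of the raw created_at string, same format string as in the notebook
TWITTER_DATE_FORMAT = "%a %b %d %H:%M:%S %z %Y"

def to_iso(raw_created_at):
    """Parse a Twitter-style timestamp and return it as an ISO 8601 string."""
    parsed = datetime.datetime.strptime(raw_created_at, TWITTER_DATE_FORMAT)
    return parsed.isoformat()

# Sample value is an assumption, not read from the dataset file
sample = "Wed Jan 01 04:57:01 +0000 2014"
print(to_iso(sample))
```

The ISO 8601 string keeps the UTC offset, which the `date` field type parses with its default format.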


In [3]:
# Bulk-insert the tweets read from the JSON file into Elasticsearch

helpers.bulk(es, read_file("D:\\tweets\\boulder_flood_geolocated_tweets.json"), index=index_name)

print("data added successfully into index name : ", index_name)


data added successfully into index name :  tweet
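
Since `helpers.bulk` receives bare documents here, Elasticsearch generates an `_id` for each one, so running the insert twice without deleting the index would store every tweet twice. One possible way around that is to wrap each document in a bulk action with an explicit `_id`. A sketch, assuming every tweet carries an `id_str` field (as the mapping suggests); `to_actions` and the sample documents are hypothetical:

```python
def to_actions(docs, index_name):
    """Wrap each document in a bulk action keyed by the tweet ID."""
    for doc in docs:
        yield {
            "_index": index_name,
            "_id": doc["id_str"],  # assumes id_str is present on every tweet
            "_source": doc,
        }

# Made-up documents for illustration only
sample_docs = [{"id_str": "1", "text": "first"}, {"id_str": "2", "text": "second"}]
actions = list(to_actions(sample_docs, "tweet"))
print(actions[0])
```

The generator could then be passed on as `helpers.bulk(es, to_actions(read_file(path), index_name))`, where `path` stands for the dataset file.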


## Queries that combine creation date with text to search for tweets
### First query: filter by date range and distance, then match on text



In [19]:
search_query2 = {
    "query": {
        "bool": {
            "filter": [
                {
                    "range": {
                        "created_at": {
                            "gte": "2013-12-31T07:14:22+00:00",
                            "lt": "2015-12-31T07:14:22+00:00"
                        }
                    }
                },
                # 50000km exceeds any distance on Earth, so this matches every tweet that has coordinates
                {"geo_distance": {"distance": "50000km", "coordinates": {"lat": "-78.96225", "lon": "100.4083"}}},
            ],
            "must": [
                {
                    "match": {
                        "text": "Ringing"
                    }
                }
            ]
        }
    }
}
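
The date range and the search word are the parts that change from one search to the next, so the query body can be produced by a small function. A sketch with hypothetical names (`build_date_text_query` and its parameters are not part of the notebook); it leaves out the geo filter:

```python
def build_date_text_query(word, start, end, date_field="created_at", text_field="text"):
    """Build a bool query: filter on a date range, score on a text match."""
    return {
        "query": {
            "bool": {
                # filter clauses restrict the result set without affecting the score
                "filter": [{"range": {date_field: {"gte": start, "lt": end}}}],
                # must clauses have to match and contribute to the score
                "must": [{"match": {text_field: word}}],
            }
        }
    }

query = build_date_text_query(
    "Ringing", "2013-12-31T07:14:22+00:00", "2015-12-31T07:14:22+00:00"
)
print(query["query"]["bool"]["must"])
```

Keeping the range under `filter` and the word under `must` mirrors the structure used in the notebook's queries.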

### Second query: find tweets published during a certain time interval, within a certain bounding box, and containing a certain word.

In [20]:
search_query = {
    "query": {
        "bool": {
            "filter": [
                {
                    "range": {
                        "created_at": {
                            "gte": "2013-12-31T07:14:22+00:00",
                            "lt": "2015-12-31T07:14:22+00:00"
                        }
                    }
                },
                {
                    "geo_shape": {
                        "coordinates": {
                            "relation": "within",
                            "shape": {
                                "type": "Polygon",
                                # Box corners as [lon, lat]; a GeoJSON ring must repeat its first point at the end
                                "coordinates": [
                                    [
                                        [-125.15606, 44.4083],
                                        [-125.15606, 29.44405],
                                        [-78.96225, 29.44405],
                                        [-78.96225, 44.4083],
                                        [-125.15606, 44.4083]
                                    ]
                                ]
                            }
                        }
                    }
                }
            ],
            "must": [
                {
                    "match": {
                        "text": "Ringing"
                    }
                }
            ]
        }
    }
}
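
For a plain rectangle, Elasticsearch also provides the `geo_bounding_box` query, which takes two opposite corners instead of a polygon ring. A sketch, assuming the box is meant to span the longitudes and latitudes that appear in the query above; `bounding_box_filter` is a hypothetical helper, not part of the notebook:

```python
def bounding_box_filter(field, min_lon, min_lat, max_lon, max_lat):
    """Build a geo_bounding_box filter clause for a geo_point field."""
    return {
        "geo_bounding_box": {
            field: {
                # top-left is the north-west corner, bottom-right the south-east one
                "top_left": {"lat": max_lat, "lon": min_lon},
                "bottom_right": {"lat": min_lat, "lon": max_lon},
            }
        }
    }

box = bounding_box_filter("coordinates", -125.15606, 29.44405, -78.96225, 44.4083)
print(box)
```

The returned clause could take the place of the `geo_shape` entry in the `filter` list.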

In [21]:
# Send the first query (search_query2); pass search_query instead to run the second one
resp = es.search(index=index_name, body=search_query2)
# resp2 = es.search(index=index_name, body=search_query)

  


In [22]:
# Show the number of hits and one line per matching tweet
print("Got %d Hits:" % resp['hits']['total']['value'])
for hit in resp['hits']['hits']:
    print("%(created_at)s id : %(id)s:  text : %(text)s" % hit["_source"])


Got 1 Hits:
2014-01-01T04:57:01+00:00 id : 418244446191239168:  text : Ringing in the #NewYear @BMoCA for their NYE at the Factory event! Surrounded by #art and Warholesque fun :) #Boulder
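
The printing loop only reads a few nested keys of the response, so it can be tried without a running cluster. A sketch using a hand-built dictionary that mimics just those keys (a real response contains more, and the tweet text is shortened here); `format_hits` and `fake_resp` are hypothetical names:

```python
def format_hits(resp):
    """Return a summary line followed by one formatted line per hit."""
    lines = ["Got %d Hits:" % resp["hits"]["total"]["value"]]
    for hit in resp["hits"]["hits"]:
        lines.append("%(created_at)s id : %(id)s:  text : %(text)s" % hit["_source"])
    return lines

# Hand-built stand-in for a search response, for illustration only
fake_resp = {
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "hits": [
            {
                "_source": {
                    "created_at": "2014-01-01T04:57:01+00:00",
                    "id": 418244446191239168,
                    "text": "Ringing in the #NewYear",
                }
            }
        ],
    }
}

for line in format_hits(fake_resp):
    print(line)
```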
