### Problem Statement

The data was originally taken from the [NYC Open Data website](https://opendata.cityofnewyork.us/) and describes park events in the New York City area.

The data provided here contain two collections - **events** and **neighbourhoods**.

**events** collection documents have the following fields - 

- `event_id` - Unique event id

- `title` - Name of the event

- `start_date_time` - The start date and time of the event

- `end_date_time` - The end date and time of the event

- `snippet` - A brief description of the event

- `cost_free` - Indicates whether an event is free (1) or not (0)

- `must_see` - Indicates if event should be featured on Parks website with "Must See" banner. 0 if event is not featured and 1 if event is featured.

- `location_name` - Location name where event takes place

- `location` - Longitude and latitude of the location of event


**neighbourhoods** collection documents have the following fields -

- `properties` - Embedded document containing information related to the neighbourhood

>- `ntacode` - Neighbourhood code
>- `ntaname` - Neighbourhood name
>- `boro_code` - Code of borough in which neighbourhood falls
>- `boro_name` - Name of borough in which neighbourhood falls

- `geometry` - GeoJSON object containing the coordinates of the neighbourhood boundary



----

*The data for **events** collection has been originally taken from - https://data.cityofnewyork.us/browse?Data-Collection_Data-Collection=NYC+Parks+Events&sortBy=most_accessed&utf8=%E2%9C%93*

*The data for **neighbourhoods** collection has been originally taken from - https://data.cityofnewyork.us/City-Government/Neighborhood-Tabulation-Areas-NTA-/cpf4-rkhq*


----

### Connecting to MongoDB


----

In [1]:
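# Importing the required libraries
import pymongo
import pprint as pp

# Override the `sorted` name inside the pprint module so that dictionaries
# are printed in their stored key order instead of alphabetically
pp.sorted = lambda x, key=None: x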

In [2]:
client = pymongo.MongoClient("mongodb://localhost:27017/")

---
### Importing data

----

In [3]:
# # Restore database
# !mongorestore /home/avadmin/Desktop/Mongo/Content/Indexing/Assignment/Data/indexing_assignment

In [3]:
# Data
db = client['events']

In [4]:
# List collections
db.list_collection_names()

['neighbourhoods', 'events']

In [5]:
# Sample document
pp.pprint(
    db.events.find_one()
)

{'_id': ObjectId('60d9cb7310d0be7a77638579'),
 'event_id': 173635,
 'title': 'Central Park Tour: Iconic Views of Central Park',
 'start_date_time': datetime.datetime(2018, 10, 21, 11, 0),
 'end_date_time': datetime.datetime(2018, 10, 21, 12, 30),
 'snippet': 'Some of New York’s most iconic sights are found in Central Park, '
            'including the fountain at Bethesda Terrace and Bow Bridge. Join '
            'Central Park Conservancy guides for an insider’s look.',
 'cost_free': 0,
 'must_see': 0,
 'location_name': 'Dairy Visitor Center & Gift Shop',
 'location': {'type': 'Point',
              'coordinates': [-73.973614931107, 40.769109102536]}}


In [6]:
# Sample document
pp.pprint(
    db.neighbourhoods.find_one()
)

{'_id': ObjectId('60d9d8036fa8d9e558634f2c'),
 'properties': {'ntacode': 'BK88',
                'ntaname': 'Borough Park',
                'boro_name': 'Brooklyn',
                'boro_code': '3'},
 'geometry': {'type': 'MultiPolygon',
              'coordinates': [[[[-73.97604935657381, 40.631275905646774],
                                [-73.97716511994669, 40.63074665412933],
                                [-73.97699848928193, 40.629871496125375],
                                [-73.9768496430902, 40.6290885814784],
                                [-73.97669604371914, 40.628354564208756],
                                [-73.97657775689153, 40.62757318681896],
                                [-73.9765146210018, 40.627294490493874],
                                [-73.97644970441577, 40.627008255472994],
                                [-73.97623453682755, 40.625976350730234],
                                [-73.97726150032737, 40.6258527728136],
                              

----
### Assignment Questions


Note - View all queries before attempting the questions. Use proper indexing to answer the questions.

----

### Q1

How many events were `must see events`?

In [10]:
# Enter your code here
db.events.count_documents({"must_see": 1})

4360

### Q2

How `many events` were must see as well as `cost free`?

In [11]:
# Enter your code here
db.events.count_documents({"must_see": 1, "cost_free": 1})

3643

### Q3

How many `must see and cost free events` were held after `2018-01-01`?

In [87]:
db.events.create_index('start_date_time')

'start_date_time_1'

In [13]:
from datetime import datetime

In [88]:
# Enter your code here
db.events.count_documents({"must_see": 1, "cost_free": 1, "start_date_time": {"$gt": datetime(2018, 1, 1)}})

597
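
One layout that could serve Q1 through Q3 together is a compound index with the two equality fields first and the range field last, since queries on a leading prefix of the keys can also use it. The sketch below is an assumption, not the index used in this notebook (which indexes only `start_date_time`); `COMPOUND_INDEX_KEYS`, `must_see_free_after` and `count_with_compound_index` are hypothetical names.

```python
from datetime import datetime

# Hypothetical compound index: equality fields first, range field last.
COMPOUND_INDEX_KEYS = [("must_see", 1), ("cost_free", 1), ("start_date_time", 1)]


def must_see_free_after(cutoff):
    """Build the Q3 filter; top-level keys are implicitly ANDed."""
    return {"must_see": 1, "cost_free": 1, "start_date_time": {"$gt": cutoff}}


def count_with_compound_index(events, cutoff):
    """Create the index (a no-op if it already exists) and run the count."""
    events.create_index(COMPOUND_INDEX_KEYS)
    return events.count_documents(must_see_free_after(cutoff))


# Usage against the notebook's collection (requires a running MongoDB):
# count_with_compound_index(db.events, datetime(2018, 1, 1))
```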

### Q4

How many indexes did you use to answer the above queries? List the index keys for each index used.

In [11]:
# Answer
# One index was created for the queries above, on the start_date_time field.
# Index name: start_date_time_1 (key: start_date_time, ascending)
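
Whether a query actually used an index can be checked from its query plan rather than from the list of indexes that were created. The following is a minimal sketch, assuming the nested `queryPlanner.winningPlan` shape that `Cursor.explain()` returns; the exact layout can differ between MongoDB versions, and `indexes_in_plan` is a hypothetical helper. A plan whose only stage is `COLLSCAN` used no index, which is what would be expected for Q1 and Q2 if no index covers `must_see` or `cost_free`.

```python
def indexes_in_plan(plan):
    """Recursively collect index names from the IXSCAN stages of a plan."""
    names = []
    if plan.get("stage") == "IXSCAN":
        names.append(plan["indexName"])
    # Child stages live under "inputStage" (single) or "inputStages" (list).
    children = list(plan.get("inputStages", []))
    if "inputStage" in plan:
        children.append(plan["inputStage"])
    for child in children:
        names.extend(indexes_in_plan(child))
    return names


# Usage (requires a running MongoDB):
# explanation = db.events.find(
#     {"start_date_time": {"$gt": datetime(2018, 1, 1)}}
# ).explain()
# indexes_in_plan(explanation["queryPlanner"]["winningPlan"])
```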


### Q5

What was the combined size of all the indexes created for the above queries?

In [12]:
# Answer
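
One way to obtain this figure is from the collection statistics, whose `indexSizes` field maps each index name to its size in bytes. This is a sketch under the assumption that the `collstats` database command is available on the server in use; `combined_index_size` is a hypothetical helper, and no result is shown because the value depends on the local database.

```python
def combined_index_size(coll_stats, index_names):
    """Sum the sizes in bytes of the named indexes from a collStats result."""
    sizes = coll_stats["indexSizes"]
    return sum(sizes[name] for name in index_names)


# Usage (requires a running MongoDB):
# stats = db.command("collstats", "events")
# combined_index_size(stats, ["start_date_time_1"])
```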

### Q6

How many events have the exact term `"Central Park" but not the term "Tour"` in the `title` field? 

***Hint - You will need to create a text index here.***

In [33]:
# Enter your code here
# The commented-out line below creates the text index on the `title` field.
# It only needs to be run once; skip it when the index already exists.
#db.events.create_index([('title', 'text')],default_language ="english")
db.events.count_documents({"$text": {"$search": "Central Park -Tour"}})

18250
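
In a `$text` search string, space-separated words are ORed together, so `Central Park -Tour` matches titles containing either "Central" or "Park" while excluding "Tour". Requiring the exact phrase needs it wrapped in escaped double quotes. A minimal sketch of building such a filter follows; `phrase_without_term` is a hypothetical helper, and the count it produces is not shown because it has not been run against this data.

```python
def phrase_without_term(phrase, excluded):
    """Build a $text filter: exact phrase required, one term negated."""
    return {"$text": {"$search": f'"{phrase}" -{excluded}'}}


query = phrase_without_term("Central Park", "Tour")
# query["$text"]["$search"] is the string: "Central Park" -Tour

# Usage (requires the text index on `title` and a running MongoDB):
# db.events.count_documents(query)
```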

### Q7

How many events were held in `Williamsburg` neighbourhood of `Brooklyn` borough?

***Hint - Create geospatial index for this query. Use the `neighbourhoods` collection for geometry of the neighbourhood. Query on the `ntaname` and `boro_name` fields.***

In [44]:
db.neighbourhoods.create_index([("geometry",pymongo.GEOSPHERE)])

'geometry_2dsphere'

In [91]:
# Enter your code here
db.neighbourhoods.count_documents({"$and": [{"properties":{"ntaname":"Williamsburg"}},{"properties":{"boro_name":"Brooklyn"}}]})


0
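
A filter such as `{"properties": {"ntaname": "Williamsburg"}}` matches only when the whole embedded document equals that value, which is the likely reason the count above is 0; the query also counts neighbourhoods rather than events. A possible approach, sketched under those assumptions, is to look up the neighbourhood with dot notation and then count the events whose `location` falls inside its `geometry`. The helper names are hypothetical, and a `2dsphere` index on `events.location` (not on `neighbourhoods.geometry`) is the one that would support this query.

```python
def neighbourhood_filter(ntaname, boro_name):
    """Dot notation matches individual fields inside the embedded document."""
    return {"properties.ntaname": ntaname, "properties.boro_name": boro_name}


def events_within(geometry):
    """Filter for events whose point lies inside a GeoJSON (Multi)Polygon."""
    return {"location": {"$geoWithin": {"$geometry": geometry}}}


def count_events_in_neighbourhood(db, ntaname, boro_name):
    """Look up the neighbourhood boundary, then count the events inside it."""
    nbhd = db.neighbourhoods.find_one(neighbourhood_filter(ntaname, boro_name))
    if nbhd is None:
        return 0  # no such neighbourhood, so no events to count
    return db.events.count_documents(events_within(nbhd["geometry"]))


# Usage (requires a running MongoDB):
# db.events.create_index([("location", "2dsphere")])
# count_events_in_neighbourhood(db, "Williamsburg", "Brooklyn")
```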

### Q8

List the titles of the `paid and must see events` held after `2018-06-06` within `500 meters` of the `Brooklyn Museum (coordinates = [-73.9636, 40.6712])`.
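
The 500-metre limit needs a unit conversion when `$centerSphere` is used, because that operator takes its radius in radians: the distance is divided by the Earth's radius (about 6378.1 km). Passing `500` directly, as the cell that follows does, describes a radius far larger than the π radians that already span the whole sphere, which would be consistent with titles from other boroughs appearing in its output. A minimal sketch of the converted filter, with hypothetical helper names and `cost_free = 0` taken to mean a paid event, as in the notebook's query:

```python
from datetime import datetime

EARTH_RADIUS_M = 6378100  # equatorial radius in metres (6378.1 km)


def within_metres(center, metres):
    """$geoWithin/$centerSphere filter; the radius is converted to radians."""
    return {"$geoWithin": {"$centerSphere": [center, metres / EARTH_RADIUS_M]}}


def paid_must_see_near(center, metres, after):
    """Filter for paid, must-see events near a point and after a date."""
    return {
        "location": within_metres(center, metres),
        "must_see": 1,
        "cost_free": 0,
        "start_date_time": {"$gt": after},
    }


query = paid_must_see_near([-73.9636, 40.6712], 500, datetime(2018, 6, 6))

# Usage (requires a running MongoDB):
# for doc in db.events.find(query, {"title": 1, "_id": 0}):
#     print(doc)
```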

In [97]:
# Enter your code here
result = db.events.aggregate([
    {"$match":{"$and":[{'location':{"$geoWithin":{"$centerSphere":[[-73.9636,40.6712],500]}}},{"must_see":1},{"cost_free":0},{"start_date_time":{"$gt":datetime(2018,6,6)}}]}
    },
    {"$project":{'title':1,"_id":0}}
])
for i in result:
    print(i)

{'title': 'Taste of the Valley Food Festival'}
{'title': "'Dearly Beloved': 1845 Wedding Reenactment and Reception"}
{'title': 'Annual Pug Fun Day'}
{'title': 'Music in the Garden: Zikrayat'}
{'title': 'THE B/R Football X Steve Nash Foundation Showdown FanFest'}
{'title': "Learn to Sail with TASCA's Basic Sailing Program: Summer Session"}
{'title': 'Bronx Food and Farm Walking Tours: Belmont Community Gardens'}
{'title': "Historic Richmond Town's Independence Day Celebration"}
{'title': 'Farm & Compost Tour'}
{'title': 'Thunderbird 40th Annual Grand Mid-Summer Pow Wow'}
{'title': 'Thunderbird 40th Annual Grand Mid-Summer Pow Wow'}
{'title': 'Thunderbird 40th Annual Grand Mid-Summer Pow Wow'}
{'title': 'An In-Depth Look at Art Deco on the Upper Grand Concourse - Walking Tour'}
{'title': "Music in the Garden: Women's Raga Massive"}
{'title': "A 'Spirited' Summer Saunter with Boroughs of the Dead"}
{'title': 'Snug Harbor Art & Architecture Tour'}
{'title': 'Battle of Brooklyn and Beyond M