# MongoDB Extra Credit

### Creating a GeoJSON File

We begin by loading the original tweet data from `tweets_1M.json` into a Python list of records.

```
import json

# Load the full tweet file into memory as a list of dicts
with open('tweets_1M.json', 'r') as f:
    tweets = json.load(f)
```

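To let MongoDB index and query the tweet locations, we convert each record into a GeoJSON-style document. The `location` field holds a GeoJSON Point, whose coordinates are ordered longitude first, then latitude.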

```
tweets_geojson_format3 = [{"type": "Feature", 
                           "location": {"type": "Point", "coordinates": [d["lng"],d["lat"]]}, 
                           "id": d["id"], "text": d["text"],
                           "user_id": d["user_id"]} for d in tweets]
```

We save the converted documents to disk as a single JSON array, to be imported later.
```
with open('tweets_geojson_format3.json', 'w') as fp:
    json.dump(tweets_geojson_format3, fp)
```

### Importing the GeoJSON File into MongoDB

To import the file into MongoDB, we run the following at the command prompt:

`mongoimport --host=127.0.0.1 --port=27017 --db database_3 --collection twitter_3 --type=json --file tweets_geojson_format3.json --jsonArray`

For the import to work, `mongod` must already be running in a separate command prompt, all required folders must have been created at the correct paths, and the data must be stored in the correct folder.


Next, we connect to the database from the Python notebook.
```
from pymongo import MongoClient

# Connect to the local mongod on the default host and port (localhost:27017)
client = MongoClient()
db = client.database_3
```

Two of the queries are spatial, so we create a `2dsphere` index on the `location` field.
```
#Create a Spatial Index

db.twitter_3.create_index([('location','2dsphere')])
```

### Query all Tweets from User 1138308091

```
cursor = db.twitter_3.find({"user_id": 1138308091})
```

```
for document in cursor:
    print(document)
```
The query returned 3 tweets; two are shown here.
```
    {'user_id': 1138308091, 'text': 'According to a study at #UCBerkeley, each #tech #job in SF creates 5 nontech positions. Who am I supporting... Uber? laundry services? Food?', 'type': 'Feature', 'id': 378189982248091648, '_id': ObjectId('57f57cab01cc00c53b3bf50d'), 'location': {'coordinates': [-122.40190047, 37.78914447], 'type': 'Point'}}
    {'user_id': 1138308091, 'text': 'That moment your #shazam is #backstreetboys ...', 'type': 'Feature', 'id': 379122191872176128, '_id': ObjectId('57f57cb101cc00c53b3d9194'), 'location': {'coordinates': [-122.46826224, 37.65079252], 'type': 'Point'}}...

```

### Query 10 Tweets Nearest to Tweet 378189967014379520

```
cursor = db.twitter_3.find({"id": 378189967014379520})
```

```
for document in cursor:
    print(document)
```
The query returned a single tweet, whose longitude/latitude coordinates are used in the next code cell.
```
    {'user_id': 172710354, 'text': '@DarrenArsenal1 Alexi Lalas', 'type': 'Feature', 'id': 378189967014379520, '_id': ObjectId('57f57cab01cc00c53b3bf50c'), 'location': {'coordinates': [-118.36353256, 34.0971366], 'type': 'Point'}}
```

We pass the coordinates from that result as the `near` point of a `$geoNear` aggregation.
```
cursor = db.twitter_3.aggregate([
   {'$geoNear': {
        'near': { 'type': 'Point', 'coordinates': [ -118.36353256, 34.0971366 ] },
        'num': 10,
        'distanceField': 'dist.calculated',
        'spherical': True}}])
```
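
On MongoDB 4.2 and later, the `num` option of `$geoNear` is no longer accepted, and the result count is capped with a separate `$limit` stage instead. The sketch below builds such a pipeline as plain Python data; the helper name `nearest_pipeline` is hypothetical, and the block deliberately does not connect to a server.

```python
def nearest_pipeline(lng, lat, k=10):
    """Build a $geoNear pipeline that returns the k nearest documents."""
    return [
        {"$geoNear": {
            "near": {"type": "Point", "coordinates": [lng, lat]},
            "distanceField": "dist.calculated",
            "spherical": True,
        }},
        # $geoNear emits documents nearest first, so $limit keeps the k closest
        {"$limit": k},
    ]


pipeline = nearest_pipeline(-118.36353256, 34.0971366, k=10)
# With a live connection this would be run as: db.twitter_3.aggregate(pipeline)
print(pipeline)
```

Keeping the pipeline in a function also avoids copying the coordinates by hand: they can be read from the `location` field of the tweet found in the previous query.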

```
for document in cursor:
    print(document)
```
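The query returned 10 tweets; two are shown here. The first is the reference tweet itself, at distance 0. Because `near` is a GeoJSON point, `dist.calculated` is in meters.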
```
    {'user_id': 172710354, 'dist': {'calculated': 0.0}, 'text': '@DarrenArsenal1 Alexi Lalas', 'type': 'Feature', 'id': 378189967014379520, '_id': ObjectId('57f57cab01cc00c53b3bf50c'), 'location': {'coordinates': [-118.36353256, 34.0971366], 'type': 'Point'}}
    {'user_id': 135323671, 'dist': {'calculated': 7.562498675782954}, 'text': '“@nataliablanco83: Coming out soon!!!! #cwh #wellness #cousin #picoftheday @piamiller01 @ rose bay http://t.co/OG7a9mxhyp” #teamFamily 😉', 'type': 'Feature', 'id': 385990165321089024, '_id': ObjectId('57f57cdf01cc00c53b4a5a97'), 'location': {'coordinates': [-118.36360314, 34.09710197], 'type': 'Point'}}...

```
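
As a sanity check on the reported distances, the gap between the two tweets shown above can be recomputed with the haversine formula. This is a rough sketch rather than MongoDB's own calculation: the function name `haversine_m` is hypothetical, and the Earth radius of 6378.1 km is an assumption about the value the server uses. The result should land close to the roughly 7.56 m reported in `dist.calculated`.

```python
import math


def haversine_m(lng1, lat1, lng2, lat2, radius_m=6378100.0):
    """Great-circle distance in meters between two lng/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))


# Coordinates copied from the two documents in the output above
d = haversine_m(-118.36353256, 34.0971366, -118.36360314, 34.09710197)
print(round(d, 2))
```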

### Query all Tweets within the Polygon

We pass the polygon to `$geoWithin` as a GeoJSON Polygon. Its ring must be closed, so the first and last coordinate pairs are identical.
```
cursor = db.twitter_3.find({
    'location': {
        '$geoWithin': {
            '$geometry': {
                'type': 'Polygon',
                'coordinates': [[[-122.412, 37.810], [-122.412, 37.804],
                                 [-122.403, 37.806], [-122.407, 37.810],
                                 [-122.412, 37.810]]]}}}})
```
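
The polygon can also be checked offline with a plain ray-casting test. The sketch below treats longitude and latitude as flat x/y coordinates, which is only an approximation: `$geoWithin` with a GeoJSON geometry works on the sphere, though for a polygon this small the difference should be negligible. The function name `point_in_ring` is hypothetical; the first test point is taken from this query's output and the second from the `$geoNear` section.

```python
def point_in_ring(lng, lat, ring):
    """Planar ray-casting test against a closed ring of [lng, lat] pairs."""
    inside = False
    # The ring is closed, so consecutive pairs cover every edge
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Does a horizontal ray from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lng < x_cross:
                inside = not inside
    return inside


ring = [[-122.412, 37.810], [-122.412, 37.804], [-122.403, 37.806],
        [-122.407, 37.810], [-122.412, 37.810]]

in_sf = point_in_ring(-122.40376321, 37.80616142, ring)   # tweet from this query
in_la = point_in_ring(-118.36353256, 34.0971366, ring)    # tweet from Los Angeles
print(in_sf, in_la)
```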

```
for document in cursor:
    print(document)
```
The query returned a large set of tweets; two are shown here.
```
    {'user_id': 449285514, 'text': 'Ear cuffs: yay or nay?', 'type': 'Feature', 'id': 386233772888174592, '_id': ObjectId('57f57ce001cc00c53b4ab590'), 'location': {'coordinates': [-122.40376321, 37.80616142], 'type': 'Point'}}
    {'user_id': 308850121, 'text': '@ShellieMaitre @jkg1017 thought it would be too scary!', 'type': 'Feature', 'id': 382577182763003904, '_id': ObjectId('57f57cc701cc00c53b43e730'), 'location': {'coordinates': [-122.40423985, 37.80638461], 'type': 'Point'}}...
```