# GCD - Assignment 7 - MongoDB

MongoDB is a cross-platform, document-oriented database that provides high performance, high availability, and easy scalability. It is built on the concepts of collections and documents.

<b>Database</b>

A database is a physical container for collections. Each database gets its own set of files on the file system. A single MongoDB server typically hosts multiple databases.

<b>Collection</b>

A collection is a group of MongoDB documents. It is the equivalent of an RDBMS table. A collection exists within a single database. Collections do not enforce a schema, so documents within a collection can have different fields. Typically, all documents in a collection serve a similar or related purpose.

<b>Document</b>

A document is a set of key-value pairs. Documents have a dynamic schema: documents in the same collection do not need to have the same set of fields or structure, and common fields in a collection's documents may hold different types of data.
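
The idea of a dynamic schema can be illustrated with plain JavaScript objects, which is how documents appear in the mongo shell. This is only a minimal sketch: the two documents and their field names (`name`, `age`, `tags`) are hypothetical and are not taken from any collection used in this assignment.

```javascript
// Two documents that could live in the same collection (hypothetical data).
const documents = [
  { _id: 1, name: "Ann", age: 31 },
  { _id: 2, name: "Bob", age: "unknown", tags: ["admin", "editor"] },
];

// Collect the field names each document actually carries.
const fieldSets = documents.map((doc) => Object.keys(doc).sort());

// The shared field "age" holds a number in one document and a string in the other.
const ageTypes = documents.map((doc) => typeof doc.age);

console.log(fieldSets);
console.log(ageTypes);
```

The second document has an extra `tags` field, and `age` changes type between the two, yet nothing in the structure itself rejects either document.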

## Part 1

1. Download zips.json
2. Open a command window and run
<img src="mongo-1.png">

3. Inspect the data
    a. View the database
    b. View the collection
    c. View the data
    
<img src="mongo-2.png">

4. Indexes
    a. Run a find query on the data <b>(without index)</b>:
    <img src="mongo-3.png">
    
    b. Run a find query on the data <b>(with index)</b>:
    <img src="mongo-4.png">
    
    c. Explain the difference.
        A query that does not use an index can be identified by its BasicCursor cursor type. Such a query must scan every document in the collection, so the number of scanned documents always equals the collection's count, which makes the query slow for large collections.
        A query that uses an index has a cursor of type BtreeCursor (or GeoBrowse-XYZ for a 2D index). The name of the index used is part of the cursor type (_id_ in this case). MongoDB decides which query plan to use by occasionally running all candidate plans and keeping the first one to finish. You can see all plans for a query by passing true to explain.
        
5. Queries
    a.	How many records does the zips collection consist of?
        db.zips.find().count()
        29353
    b.	How many zipcodes are there in the state of Massachusetts?
        db.zips.find({ "state":"MA"},{"pop" :1}).count()
        474
    c.	Give all state names from the zips collection.
        db.zips.distinct("state")
    d.	Sort the state names alphabetically.
        db.zips.distinct("state").sort()
    e.	How many states are there in the zips collection? 
        db.zips.distinct("state").length

        51
    f.	How many cities have a population of less than 50?
        db.zips.find({"pop":{ $lt: 50 }},{name: 1}).count()
        356
        
    g.	Which cities have a population of less than 50?
        db.zips.find({"pop":{ $lt: 50 }},{name: 1})
        
        { "_id" : "01338" }
        { "_id" : "02163" }
        Type "it" for more
        Only _id is returned because the zips documents have no name field; the city name is stored in the city field.
        
6.	GeoNear index
    a.	Create a 2d index on location: 
        db.zips.ensureIndex({"loc": "2d"})
        
    b.	Find locations within 5000 meters of Flagstaff (the $geometry form of $near requires a 2dsphere index, so one is created first)

        db.zips.ensureIndex({"loc" : "2dsphere"})
        
        {"createdCollectionAutomatically" : false,
        "numIndexesBefore" : 3,
        "numIndexesAfter" : 4,
        "ok" : 1}
        
        db.zips.find({ "loc" : { $near : { $geometry: {type : "Point", coordinates : [ -111.661979, 35.185911]}, $maxDistance : 5000}}})
        
        <b>Result</b> : { "_id" : "86001", "city" : "FLAGSTAFF", "loc" : [ -111.661979, 35.185911 ], "pop" : 30174, "state" : "AZ" }
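
With a GeoJSON point, `$maxDistance : 5000` in the query above is measured in meters along the Earth's surface. To get a feel for what that filter means, the sketch below computes a great-circle distance with the haversine formula. It is an illustration only, not MongoDB's internal implementation: the Earth radius constant is an assumed mean value, and the `nearby` point is hypothetical.

```javascript
// Mean Earth radius in meters (an assumed constant for this sketch).
const EARTH_RADIUS_M = 6371000;

const toRadians = (degrees) => (degrees * Math.PI) / 180;

// Great-circle distance in meters between two [longitude, latitude] pairs,
// the same coordinate order the "loc" field uses.
function haversineMeters([lon1, lat1], [lon2, lat2]) {
  const dLat = toRadians(lat2 - lat1);
  const dLon = toRadians(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
}

const flagstaff = [-111.661979, 35.185911]; // coordinates from the query above
const nearby = [-111.65, 35.2];             // hypothetical point, for illustration only

const distance = haversineMeters(flagstaff, nearby);
const withinRange = distance <= 5000;
console.log(Math.round(distance), withinRange);
```

A point roughly two kilometers away, like `nearby`, passes the 5000 meter check, while a point a third of a degree of longitude to the west would not.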

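The difference explained in step 4c between a full collection scan and an index lookup can be sketched on an in-memory array. This is an analogy rather than MongoDB's actual mechanism: a JavaScript `Map` stands in for the B-tree index, and the sample collection, `findByScan`, and `findByIndex` are hypothetical names made up for the illustration.

```javascript
// A small in-memory stand-in for a collection (hypothetical sample data).
const collection = [];
for (let i = 0; i < 1000; i++) {
  collection.push({ _id: String(i).padStart(5, "0"), pop: i });
}

// Without an index: examine documents one by one and count how many were touched.
function findByScan(docs, id) {
  let scanned = 0;
  const matches = [];
  for (const doc of docs) {
    scanned++;
    if (doc._id === id) matches.push(doc);
  }
  return { matches, scanned };
}

// With an index: build a lookup structure once, then jump straight to the match.
// A Map stands in for MongoDB's B-tree index here.
const idIndex = new Map(collection.map((doc) => [doc._id, doc]));

function findByIndex(index, id) {
  const doc = index.get(id);
  return { matches: doc ? [doc] : [], scanned: doc ? 1 : 0 };
}

const scanResult = findByScan(collection, "00999");
const indexResult = findByIndex(idIndex, "00999");
console.log(scanResult.scanned, indexResult.scanned);
```

Both functions return the same document, but the scan touches all 1000 entries while the lookup touches one, which mirrors the gap in scanned documents that explain reports for the two queries.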

## Part 2

1. Design a MongoDB schema for the case below
    <b>Case 3: [Blog site]</b>
    User --- 1:N --- Post --- 1:N --- Comment --- 1:1 --- Commenter
    Attributes: User (name), Post (date, text), Comment (date, text), Commenter (name)
    
    I chose this schema because I assume that each User may have a Post, made on a particular date with some content (text). He/she may also have a Comment, again posted on a particular date with some text. Each User may also have several Comments and Posts.
    

2. Develop and create the database
    use firstdb
    <img src="mongo-5.png">
3. Insert at least 2 sample documents and subdocuments
    var us = db.commenter.findOne({"Name" : "Dan"})
    > us
    { "_id" : ObjectId("563fcdf9635f85d0bb7d8448"), "Name" : "Dan" }
    > db.comment.insert({ date : "10/15/2015", text : "Testing" , commenter : us._id})

    db.post.insert({"date": "12/30/2014", "text": "Happy New Year", "comment": [{"date":"12/30/2014", text:"Thank you"}, {"date":"01/01/2015", text:"Yooohoo"}]}) 

    db.user.insert({"Name":"Vesko", "post":[{"date":"03/27/2018", "text":"Cool"}, {"date":"03/28/2018", "text":"Hello"}]})

    db.user.insert({"Name": "Nikolay", "post":[{"date":"08/31/2015", "text":"Hi there"}, {"date":"08/13/2016", "text":"Nicee"}]})

4. Give an overview of the data using the .pretty() method
    <img src="mongo-6.png">
    <img src="mongo-7.png">
    
5. Give a query and the result of the query on the sample data

    db.user.findOne({"Name":"Dan"})
    
    <b>Result: </b>{ "_id" : ObjectId("563fcd35635f85d0bb7d8446"), "Name" : "Dan" }
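
One way to read the Case 3 relationships (User 1:N Post 1:N Comment 1:1 Commenter) is as a single user document with everything embedded. The sketch below is one possible shape, not necessarily the layout stored in firstdb: the lowercase field names (`posts`, `comments`, `commenter`) are assumptions, and the sample values are recombined from the inserts above purely for illustration.

```javascript
// One user document with posts, comments, and commenters embedded (sample values only).
const user = {
  name: "Vesko",
  posts: [
    {
      date: "12/30/2014",
      text: "Happy New Year",
      comments: [
        { date: "12/30/2014", text: "Thank you", commenter: { name: "Dan" } },
        { date: "01/01/2015", text: "Yooohoo", commenter: { name: "Nikolay" } },
      ],
    },
    { date: "03/27/2018", text: "Cool", comments: [] },
  ],
};

// Walk the 1:N relationships: every commenter on every post of this user.
const commenterNames = user.posts.flatMap((post) =>
  post.comments.map((comment) => comment.commenter.name)
);

console.log(commenterNames);
```

Embedding keeps a user's posts and their comments in one document, so reading them needs no join. The trade-off is that a commenter's name is repeated in every comment rather than stored once and referenced by `_id`, as the `db.comment.insert` example in step 3 does.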

    

## Conclusion

After completing this assignment, I could clearly see some of the advantages of MongoDB over a relational database. The structure of a single object is very clear (all fields of the object are stored in the object itself), and no complex joins are needed. Every document is automatically assigned a unique _id, which is indexed. A document is a rich data structure capable of holding arrays and other documents. This means you can often represent in a single entity a construct that would require several tables to represent properly in a relational database.
Since MongoDB is schema-free, our code defines our schema, which helps the database to be more scalable.