## MongoDB
### 1.[install MongoDB](#install)
### 2.[introduction](#intro)
### 3.[basic commands](#basic)
  - [database](#database)
  - [collection](#collection)
  - [data type](#datatype)
  - [document](#document)
  - [query](#query)
  - [projection](#projection)
  - [update](#update)
  - [Indexes](#indexes)
  - [aggregation](#aggregation)
  - [pipeline](#pipeline)

## install <a class="anchor" id="install"></a>
**docker**

In [None]:
%%bash
docker run -d -p 27017:27017 --name mongodb mongo

In [2]:
%%bash
docker ps

CONTAINER ID   IMAGE     COMMAND                  CREATED         STATUS         PORTS                                           NAMES
120132e58129   mongo     "docker-entrypoint.s…"   2 minutes ago   Up 2 minutes   0.0.0.0:27017->27017/tcp, :::27017->27017/tcp   mongodb


**install on machine**
[install on machine](https://www.tutorialspoint.com/mongodb/mongodb_environment.htm)

In [None]:
# with docker-compose

services:
    mongodb:
        image: mongo
        container_name: mongodb
        ports:
        - "27017:27017"
        volumes:
        - ./data:/data/db

## intro  <a class="anchor" id="intro"></a>
MongoDB is a document database: it stores data as JSON-like documents (serialized as BSON).
Collections are analogous to tables in relational databases.
Documents are analogous to rows in relational databases.

**Advantages of MongoDB over RDBMS**
* **Schema-less**: one collection can hold documents with different structures. The number of fields, their content and the size of a document can differ from one document to another.
* The structure of a single object is clear: related data can live together in one document.
* No complex joins: related data is often embedded in the same document instead of being spread across tables.
* Deep query-ability: MongoDB supports dynamic queries on documents using a document-based query language that is nearly as powerful as SQL.
* Tuning.
* Easy scale-out: MongoDB is designed to scale horizontally.
* No conversion/mapping is needed between application objects and database objects.
* Uses internal memory (RAM) to hold the working set, enabling faster access to data.
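
To make the schema-less point concrete, here is a minimal Python sketch that models a collection as a plain list of dicts. This is an illustration only (it is not how MongoDB stores data): the documents share a collection but not a field set, and flattening them into a fixed-column table forces `None` placeholders wherever a document lacks a field.

```python
# Hypothetical stand-in for a collection: a list of Python dicts (documents).
customers = [
    {"workspace": "CR", "limit": 10},
    {"workspace": "HDL", "limit": 10, "emails": ["myemail1"]},  # extra array field
    {"workspace": "AI"},  # no "limit" field at all
]

# Each document carries its own set of field names.
field_sets = [sorted(doc) for doc in customers]

# A relational table needs one fixed column list: the union of all fields.
columns = sorted({key for doc in customers for key in doc})

# Each document becomes a row, with None wherever the document lacks a field.
rows = [tuple(doc.get(col) for col in columns) for doc in customers]

print(columns)  # ['emails', 'limit', 'workspace']
print(rows[2])  # (None, None, 'AI')
```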

**Why Use MongoDB?**
* Document-oriented storage: data is stored in the form of JSON-style documents.
* Index on any attribute
* Replication and high availability
* Auto-Sharding
* Rich queries
* Fast in-place updates

**Where to Use MongoDB?**
* Big Data
* Content Management and Delivery
* Mobile and Social Infrastructure
* User Data Management
* Data Hub

In [None]:
%%bash
docker exec -it mongodb /bin/bash
# inside the container, run `mongosh` to open the MongoDB shell
# (or open the shell directly with: docker exec -it mongodb mongosh)

## mongodb basic commands <a class="anchor" id="basic"></a>
### database <a class="anchor" id="database"></a>

```shell
// show databases
show dbs;
show databases;
// switch to mydb (the database is created lazily on the first write)
use mydb;
// drop the current database
db.dropDatabase();
```

###  collection <a class="anchor" id="collection"></a>

```shell
// create collection (collections are also created implicitly on the first insert)
db.createCollection("customers");
// show collections
show collections;
// drop collection
db.customers.drop();
```

### data types <a class="anchor" id="datatype"></a>

|type|description|
|:---|:---|
|String|The most commonly used type for storing data. Strings in MongoDB must be valid UTF-8.|
|Integer|Stores whole numbers. BSON has separate 32-bit (`int`) and 64-bit (`long`) integer types.|
|Boolean|Stores a boolean (`true`/`false`) value.|
|Double|Stores floating-point values.|
|Min/Max keys|Used to compare a value against the lowest and highest BSON elements.|
|Arrays|Stores an array (list) of multiple values under one key.|
|Timestamp|A BSON timestamp (seconds since the Unix epoch plus an incrementing ordinal), mainly used internally by MongoDB. To record when a document was added or modified, Date is usually the better choice.|
|Object|Used for embedded documents.|
|Null|Stores a null value.|
|Symbol|Used like a string, but reserved for languages that have a specific symbol type (deprecated).|
|Date|Stores a date/time as a 64-bit integer counting milliseconds since the Unix epoch. You can create your own date with a `Date` object, e.g. `new Date(2013, 11, 10)`; JavaScript months are zero-based, so this is 10 December 2013.|
|Object ID|Stores the document's ID.|
|Binary data|Stores binary data.|
|Code|Stores JavaScript code in the document.|
|Regular expression|Stores a regular expression.|
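
From Python, you do not pick these types by hand: the driver chooses one from the Python value. The sketch below is a rough, stdlib-only approximation of PyMongo's default encoding (an assumption based on its documented behaviour, not its code), naming each type by its MongoDB `$type` alias. It is not exhaustive: ObjectId, Decimal128, Timestamp, MinKey/MaxKey and Code come from the `bson` package and are left out. `bson_type_alias` and `to_bson_date_millis` are hypothetical helpers.

```python
import datetime
import re

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1

def bson_type_alias(value):
    """Approximate the BSON type PyMongo would use for a Python value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return "bool"
    if isinstance(value, int):
        # ints that fit in 32 bits become int32, larger ones int64
        return "int" if INT32_MIN <= value <= INT32_MAX else "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, bytes):
        return "binData"
    if isinstance(value, datetime.datetime):
        return "date"
    if isinstance(value, re.Pattern):
        return "regex"
    if isinstance(value, (list, tuple)):
        return "array"
    if isinstance(value, dict):
        return "object"  # embedded document
    raise TypeError(f"no mapping in this sketch for {type(value).__name__}")

def to_bson_date_millis(dt):
    """BSON Date payload: signed 64-bit milliseconds since the Unix epoch (UTC)."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=datetime.timezone.utc)  # PyMongo treats naive datetimes as UTC
    epoch = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
    return (dt - epoch) // datetime.timedelta(milliseconds=1)
```

Note that Python's `datetime(2013, 12, 10)` uses a one-based month, whereas JavaScript's `new Date(2013, 11, 10)` is zero-based; both denote 10 December 2013.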


### document <a class="anchor" id="document"></a>
```shell
db.createCollection("post");
// insert() is deprecated in mongosh; use insertMany() for several documents
db.post.insertMany([
    {
        title: "MongoDB Overview",
        description: "MongoDB is no SQL database",
        by: "tutorials point",
        url: "http://www.tutorialspoint.com",
        tags: ["mongodb", "database", "NoSQL"],
        likes: 100
    },
    {
        title: "NoSQL Database",
        description: "NoSQL database doesn't have tables",
        by: "tutorials point",
        url: "http://www.tutorialspoint.com",
        tags: ["mongodb", "database", "NoSQL"],
        likes: 20,
        comments: [
            {
                user: "user1",
                message: "My first comment",
                // JavaScript months are zero-based: 11 = December
                dateCreated: new Date(2013, 11, 10, 2, 35),
                like: 0
            }
        ]
    }
]);
```

```shell
db.customers.insertOne({workspace: "CR",limit:10});
db.customers.insertOne({workspace: "HDL",limit:10});
db.customers.insertOne({workspace: "BI",limit:5});
db.customers.insertOne({workspace: "QA",limit:5});
db.customers.insertMany([
    {workspace: "KM",limit:10},
    {workspace: "MK",limit:10},
    {workspace: "AI",limit:5}
]);
```
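
None of the documents inserted above sets `_id`, so an ObjectId is generated for each one (normally by the driver or mongosh, otherwise by the server). An ObjectId is 12 bytes: a 4-byte big-endian timestamp in seconds, a 5-byte random value, and a 3-byte incrementing counter. The Python sketch below mimics that layout with the standard library only; `new_object_id`, `object_id_timestamp` and `insert_many` are hypothetical helpers, and real code should rely on `bson.ObjectId` from PyMongo instead.

```python
import itertools
import os
import struct
import time

# Per-process 5-byte random value and a counter seeded at a random 3-byte start.
_PROCESS_RANDOM = os.urandom(5)
_COUNTER = itertools.count(int.from_bytes(os.urandom(3), "big"))

def new_object_id(now=None):
    """Return a 24-hex-character id laid out like a MongoDB ObjectId."""
    seconds = int(time.time() if now is None else now)
    count = next(_COUNTER) & 0xFFFFFF  # keep only the low 3 bytes
    raw = struct.pack(">I", seconds) + _PROCESS_RANDOM + count.to_bytes(3, "big")
    return raw.hex()

def object_id_timestamp(oid_hex):
    """Creation time (Unix seconds) stored in the first 4 bytes of an ObjectId."""
    return int(oid_hex[:8], 16)

def insert_many(collection, docs):
    """Hypothetical in-memory insertMany: add _id when missing, return the ids."""
    inserted_ids = []
    for doc in docs:
        doc.setdefault("_id", new_object_id())
        collection.append(doc)
        inserted_ids.append(doc["_id"])
    return inserted_ids
```

Because the timestamp comes first, ObjectIds created later generally sort after earlier ones, and the creation time can be read back from any `_id`, for example one printed in the Python notebook output further down.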




### Query <a class="anchor" id="query"></a>

```shell
db.customers.find();
// pretty-print the documents (mongosh already formats results by default)
db.customers.find().pretty();
// find one document
db.customers.findOne();
```
|Operation|Syntax|Example|RDBMS equivalent|
|:---|:---|:---|:---|
|Equality|`{<key>:{$eq:<value>}}` or `{<key>:<value>}`|`db.customers.find({"workspace":"KM"}).pretty()`|`where workspace = 'KM'`|
|Less than|`{<key>:{$lt:<value>}}`|`db.customers.find({"limit":{$lt:10}}).pretty()`|`where limit < 10`|
|Less than or equal|`{<key>:{$lte:<value>}}`|`db.customers.find({"limit":{$lte:10}}).pretty()`|`where limit <= 10`|
|Greater than|`{<key>:{$gt:<value>}}`|`db.customers.find({"limit":{$gt:10}}).pretty()`|`where limit > 10`|
|Greater than or equal|`{<key>:{$gte:<value>}}`|`db.customers.find({"limit":{$gte:10}}).pretty()`|`where limit >= 10`|
|Not equal|`{<key>:{$ne:<value>}}`|`db.customers.find({"limit":{$ne:50}}).pretty()`|`where limit != 50` (also matches documents without a `limit` field)|
|Value in an array|`{<key>:{$in:[<value1>, <value2>, …, <valueN>]}}`|`db.customers.find({"workspace":{$in:["KM","MK"]}}).pretty()`|`where workspace in ('KM', 'MK')`|
|Value not in an array|`{<key>:{$nin:[<value1>, …, <valueN>]}}`|`db.customers.find({"workspace":{$nin:["KM","MK"]}}).pretty()`|`where workspace not in ('KM', 'MK')`, or the `workspace` field does not exist|
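
The comparison operators in the table can be mimicked in a few lines of Python. This is a simplified sketch, not MongoDB's matcher: it ignores arrays, `null` handling and cross-type ordering, and only compares numbers with numbers and strings with strings. It does reproduce one detail from the table: `$ne` and `$nin` also match documents where the field is missing, while `$lt`, `$gt` and friends never do. `matches`, `find` and `COMPARISON_OPS` are hypothetical names.

```python
MISSING = object()  # marker for "field not present in the document"

def _comparable(a, b):
    # Simplification: only compare number-with-number or str-with-str.
    numbers = (int, float)
    both_numbers = (isinstance(a, numbers) and isinstance(b, numbers)
                    and not isinstance(a, bool) and not isinstance(b, bool))
    return both_numbers or (isinstance(a, str) and isinstance(b, str))

COMPARISON_OPS = {
    "$eq": lambda v, arg: v is not MISSING and v == arg,
    "$ne": lambda v, arg: v is MISSING or v != arg,
    "$lt": lambda v, arg: v is not MISSING and _comparable(v, arg) and v < arg,
    "$lte": lambda v, arg: v is not MISSING and _comparable(v, arg) and v <= arg,
    "$gt": lambda v, arg: v is not MISSING and _comparable(v, arg) and v > arg,
    "$gte": lambda v, arg: v is not MISSING and _comparable(v, arg) and v >= arg,
    "$in": lambda v, arg: v is not MISSING and v in arg,
    "$nin": lambda v, arg: v is MISSING or v not in arg,
}

def matches(doc, query):
    """True if every {field: condition} pair in the query holds for doc."""
    for field, condition in query.items():
        value = doc.get(field, MISSING)
        if isinstance(condition, dict):
            # {"limit": {"$gte": 5, "$lt": 10}}: every operator must hold
            if not all(COMPARISON_OPS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value is MISSING or value != condition:  # shorthand {"workspace": "KM"}
            return False
    return True

customers = [
    {"workspace": "CR", "limit": 10}, {"workspace": "HDL", "limit": 10},
    {"workspace": "BI", "limit": 5}, {"workspace": "QA", "limit": 5},
    {"workspace": "KM", "limit": 10}, {"workspace": "MK", "limit": 10},
    {"workspace": "AI", "limit": 5},
]

def find(docs, query):
    return [d.get("workspace") for d in docs if matches(d, query)]
```

For example, `find(customers, {"limit": {"$lt": 10}})` returns `['BI', 'QA', 'AI']`, the counterpart of `where limit < 10`.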

```shell
// and operator (listing several fields in one query document is an implicit $and)
db.customers.find({$and:[{"workspace":"KM"},{"limit":10}]}).pretty();
db.customers.find({$and:[{"workspace":{$in:["KM","MK"]}},{"limit":{$gt:5}}]}).pretty();
// or operator
db.customers.find({$or:[{"workspace":"KM"},{"limit":10}]}).pretty();
// and + or combined
db.customers.find({$and:[{$or:[{"workspace":"KM"},{"workspace":"MK"}]},{"limit":10}]}).pretty();
// not operator (also matches documents that have no workspace field)
db.customers.find({workspace:{$not:{$eq:"KM"}}}).pretty();
// nor operator
db.customers.find({$nor:[{"workspace":"KM"},{"workspace":"MK"}]}).pretty();
```
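
The logical operators compose sub-queries. The sketch below is again a simplified Python model rather than MongoDB's implementation: at field level it supports only implicit equality, `$eq`, `$in`, `$gt` and `$not`. It shows how `$and`, `$or`, `$nor` and `$not` combine, and that several fields in one query document behave like an implicit `$and`. `matches`, `field_matches` and `find` are hypothetical names; each query from the block above is reproduced in the usage lines.

```python
MISSING = object()  # marker for "field not present in the document"

def field_matches(value, condition):
    """Evaluate one field condition such as "KM" or {"$not": {"$eq": "KM"}}."""
    if not isinstance(condition, dict):
        return value is not MISSING and value == condition  # implicit $eq
    for op, arg in condition.items():
        if op == "$eq":
            ok = value is not MISSING and value == arg
        elif op == "$in":
            ok = value is not MISSING and value in arg
        elif op == "$gt":
            ok = isinstance(value, (int, float)) and value > arg
        elif op == "$not":
            ok = not field_matches(value, arg)  # also true when the field is missing
        else:
            raise NotImplementedError(op)
        if not ok:
            return False
    return True

def matches(doc, query):
    """Every top-level clause must hold: separate fields act as an implicit $and."""
    for key, condition in query.items():
        if key == "$and":
            ok = all(matches(doc, sub) for sub in condition)
        elif key == "$or":
            ok = any(matches(doc, sub) for sub in condition)
        elif key == "$nor":
            ok = not any(matches(doc, sub) for sub in condition)
        else:
            ok = field_matches(doc.get(key, MISSING), condition)
        if not ok:
            return False
    return True

customers = [
    {"workspace": "CR", "limit": 10}, {"workspace": "HDL", "limit": 10},
    {"workspace": "BI", "limit": 5}, {"workspace": "QA", "limit": 5},
    {"workspace": "KM", "limit": 10}, {"workspace": "MK", "limit": 10},
    {"workspace": "AI", "limit": 5},
]

def find(query):
    return [d["workspace"] for d in customers if matches(d, query)]

print(find({"$or": [{"workspace": "KM"}, {"limit": 10}]}))  # ['CR', 'HDL', 'KM', 'MK']
print(find({"$nor": [{"workspace": "KM"}, {"workspace": "MK"}]}))  # ['CR', 'HDL', 'BI', 'QA', 'AI']
```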
**limit**
```shell
db.customers.find().limit(3).pretty();
```
**skip**
```shell
db.customers.find().skip(3).pretty();
```
**sort**
```shell
db.customers.find().sort({workspace:1}).pretty();   // 1 = ascending
db.customers.find().sort({workspace:-1}).pretty();  // -1 = descending
```
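
`sort`, `skip` and `limit` are usually combined for pagination. MongoDB applies them in a fixed order (sort, then skip, then limit) regardless of how the calls are chained. Below is a minimal Python sketch of the same semantics over an in-memory list; `paginate` and `page` are hypothetical helpers.

```python
def paginate(docs, sort_field, direction=1, skip=0, limit=None):
    """Mimic find().sort({field: direction}).skip(n).limit(m) on a list.

    direction: 1 = ascending, -1 = descending (as in MongoDB).
    The order of operations is fixed: sort, then skip, then limit.
    """
    ordered = sorted(docs, key=lambda d: d[sort_field], reverse=(direction == -1))
    window = ordered[skip:]
    return window if limit is None else window[:limit]

customers = [
    {"workspace": "CR", "limit": 10}, {"workspace": "HDL", "limit": 10},
    {"workspace": "BI", "limit": 5}, {"workspace": "QA", "limit": 5},
    {"workspace": "KM", "limit": 10}, {"workspace": "MK", "limit": 10},
    {"workspace": "AI", "limit": 5},
]

def page(number, size=3):
    """1-based page of workspace names, ordered by workspace."""
    skip = (number - 1) * size
    return [d["workspace"] for d in paginate(customers, "workspace", 1, skip=skip, limit=size)]

print(page(2))  # ['HDL', 'KM', 'MK']
```

When the sort key is not unique (for example `limit`), MongoDB does not guarantee a consistent order between pages, so adding a unique field such as `_id` to the sort is a common tie-breaker.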

### projection <a class="anchor" id="projection"></a>
```shell
// projection: 1 = include a field, 0 = exclude it (_id is returned unless excluded explicitly)
db.customers.find({},{workspace:1,limit:1,_id:0}).pretty();
// show only workspace
db.customers.find({},{workspace:1,_id:0}).pretty();
```

### update & remove <a class="anchor" id="update"></a>

```shell
// add an emails array field to the document
db.customers.updateOne({workspace: "CR"},{$set: {emails: ["myemail1"]}});
// append a new email to the emails array
db.customers.updateOne({workspace: "CR"},{$push: {emails: "myemail2"}});
// set limit to 6 wherever limit <= 5 (update() with {multi:true} is the deprecated form)
db.customers.update({'limit':{$lte:5}},{$set:{'limit':6}},{multi:true})
// preferred equivalent
db.customers.updateMany({'limit':{$lte:5}},{$set:{'limit':6}})
```
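
The two update operators used above behave differently when the field is missing: `$set` creates (or overwrites) the field, while `$push` appends to an array field and creates a one-element array if the field does not exist yet (pushing onto a non-array field is an error). A small Python sketch of those rules on a plain dict; `apply_update` is hypothetical and handles only top-level fields.

```python
import copy

def apply_update(doc, update):
    """Return a copy of doc with $set and $push applied (top-level fields only)."""
    unsupported = set(update) - {"$set", "$push"}
    if unsupported:
        raise NotImplementedError(f"operators not handled by this sketch: {sorted(unsupported)}")
    result = copy.deepcopy(doc)  # leave the caller's document untouched
    for field, value in update.get("$set", {}).items():
        result[field] = value  # create or overwrite
    for field, value in update.get("$push", {}).items():
        current = result.setdefault(field, [])  # missing field -> new array
        if not isinstance(current, list):
            raise TypeError(f"$push target {field!r} is not an array")
        current.append(value)
    return result

doc = {"workspace": "CR", "limit": 10}
doc = apply_update(doc, {"$set": {"emails": ["myemail1"]}})
doc = apply_update(doc, {"$push": {"emails": "myemail2"}})
print(doc)  # {'workspace': 'CR', 'limit': 10, 'emails': ['myemail1', 'myemail2']}
```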
```shell
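// remove documents (remove() is deprecated in mongosh; deleteMany() removes every match, deleteOne() only the first)
db.customers.deleteMany({workspace: "CR"});
// remove all documents (the collection and its indexes remain)
db.customers.deleteMany({});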
```

### Indexes <a class="anchor" id="indexes"></a>
Indexes support the efficient resolution of queries. Without indexes, MongoDB must scan every document of a collection to select those that match the query statement. This collection scan is highly inefficient and requires MongoDB to process a large volume of data.

Indexes are special data structures that store a small portion of the data set in an easy-to-traverse form. The index stores the value of a specific field or set of fields, ordered by the value of the field as specified in the index.
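
To illustrate the idea (MongoDB actually uses B-trees), the Python sketch below keeps a sorted list of `(value, position)` pairs for one field and answers equality and range lookups with binary search, compared with a collection scan that inspects every document. `SortedFieldIndex` and `collection_scan` are hypothetical names; documents without the field are simply skipped here, whereas a regular MongoDB index would record them as `null`.

```python
import bisect

class SortedFieldIndex:
    """Ascending single-field index: (field value, document position) pairs, sorted."""

    def __init__(self, docs, field):
        self.entries = sorted((doc[field], pos) for pos, doc in enumerate(docs) if field in doc)
        self.keys = [value for value, _ in self.entries]

    def find_eq(self, value):
        # Two binary searches bound the run of equal keys: O(log n + matches).
        lo = bisect.bisect_left(self.keys, value)
        hi = bisect.bisect_right(self.keys, value)
        return [pos for _, pos in self.entries[lo:hi]]

    def find_range(self, low, high):
        """Positions of documents with low <= value < high, in index order."""
        lo = bisect.bisect_left(self.keys, low)
        hi = bisect.bisect_left(self.keys, high)
        return [pos for _, pos in self.entries[lo:hi]]

def collection_scan(docs, field, value):
    """Without an index, every document has to be examined: O(n)."""
    return [pos for pos, doc in enumerate(docs) if doc.get(field) == value]

customers = [
    {"workspace": "CR", "limit": 10}, {"workspace": "HDL", "limit": 10},
    {"workspace": "BI", "limit": 5}, {"workspace": "QA", "limit": 5},
    {"workspace": "KM", "limit": 10}, {"workspace": "MK", "limit": 10},
    {"workspace": "AI", "limit": 5},
]
index = SortedFieldIndex(customers, "workspace")
print(index.find_eq("KM"))  # [4]
```

Because the entries are already ordered, the same structure can also serve sorted output and range queries without re-sorting, which is why an index on `workspace` helps `sort({workspace: 1})` as well as equality filters.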

```shell
// create an ascending index on workspace
db.customers.createIndex({workspace:1});
// drop the index (identified by its key pattern)
db.customers.dropIndex({workspace:1});
// list indexes
db.customers.getIndexes();
```


### Aggregation <a class="anchor" id="aggregation"></a>
```shell
// sample data: the same documents were already inserted in the "document" section,
// so running insertMany again duplicates every workspace
db.customers.insertMany([
    {workspace: "CR",limit:10},
    {workspace: "HDL",limit:10},
    {workspace: "BI",limit:5},
    {workspace: "QA",limit:5},
    {workspace: "KM",limit:10},
    {workspace: "MK",limit:10},
    {workspace: "AI",limit:5}
]);
// group by workspace and sum limit
db.customers.aggregate([
    {$group: {_id: "$workspace", total: {$sum: "$limit"}}}
]);
// group by workspace and count
db.customers.aggregate([
    {$group: {_id: "$workspace", count: {$sum: 1}}}
]);
// group by workspace and max limit
db.customers.aggregate([
    {$group: {_id: "$workspace", maxLimit: {$max: "$limit"}}}
]);
// group by workspace and min limit
db.customers.aggregate([
    {$group: {_id: "$workspace", minLimit: {$min: "$limit"}}}
]);
// group by workspace and average limit
db.customers.aggregate([
    {$group: {_id: "$workspace", avgLimit: {$avg: "$limit"}}}
]);
// group by workspace and push every limit into an array
db.customers.aggregate([
    {$group: {_id: "$workspace", limits: {$push: "$limit"}}}
]);
// group by workspace and add to set (like $push, but without duplicates)
db.customers.aggregate([
    {$group: {_id: "$workspace", limits: {$addToSet: "$limit"}}}
]);
// first limit per group ($first/$last follow the input order, so add a $sort stage before $group for predictable results)
db.customers.aggregate([
    {$group: {_id: "$workspace", firstLimit: {$first: "$limit"}}}
]);
// last limit per group
db.customers.aggregate([
    {$group: {_id: "$workspace", lastLimit: {$last: "$limit"}}}
]);
```
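
Every `$group` above follows the same recipe: bucket the documents by the `_id` expression, then fold each bucket with an accumulator. The Python sketch below models that recipe in a simplified way: the `group` helper and its accumulator table are hypothetical, field references such as `"$limit"` are reduced to plain field names, and non-string operands (like the `1` in `{$sum: 1}`) are treated as constants. Like MongoDB, `$first`/`$last` here depend on the input order.

```python
from collections import defaultdict

ACCUMULATORS = {
    "$sum": sum,
    "$avg": lambda vals: sum(vals) / len(vals),
    "$max": max,
    "$min": min,
    "$push": list,
    "$addToSet": lambda vals: list(dict.fromkeys(vals)),  # dedupe (MongoDB's set order is unspecified)
    "$first": lambda vals: vals[0],
    "$last": lambda vals: vals[-1],
}

def group(docs, key_field, **outputs):
    """Example: group(docs, "workspace", total=("$sum", "limit"), count=("$sum", 1))."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[key_field]].append(doc)  # bucket by the _id expression
    results = []
    for key, members in buckets.items():
        row = {"_id": key}
        for name, (op, operand) in outputs.items():
            # A string names a field; anything else is used as a constant.
            values = [m[operand] if isinstance(operand, str) else operand for m in members]
            row[name] = ACCUMULATORS[op](values)
        results.append(row)
    return results

customers = [
    {"workspace": "CR", "limit": 10}, {"workspace": "HDL", "limit": 10},
    {"workspace": "BI", "limit": 5}, {"workspace": "QA", "limit": 5},
    {"workspace": "KM", "limit": 10}, {"workspace": "MK", "limit": 10},
    {"workspace": "AI", "limit": 5},
]

# Grouping by limit gives buckets with more than one member.
for row in group(customers, "limit", count=("$sum", 1), workspaces=("$push", "workspace")):
    print(row)
# {'_id': 10, 'count': 4, 'workspaces': ['CR', 'HDL', 'KM', 'MK']}
# {'_id': 5, 'count': 3, 'workspaces': ['BI', 'QA', 'AI']}
```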

### Pipeline Concept <a class="anchor" id="pipeline"></a>
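The aggregation pipeline is how MongoDB performs aggregation: documents pass through a sequence of stages (such as `$match`, `$group` and `$sort`), each stage transforms them and hands its output to the next, and the last stage returns the computed results.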

```shell
db.customers.aggregate([
    {$match: {workspace: "CR"}},
    {$group: {_id: "$workspace", total: {$sum: "$limit"}}}
]);
```
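
Conceptually, each stage is a function from a list of documents to a new list, and `aggregate` feeds the output of one stage into the next. A minimal Python sketch of that composition; the `match`, `sort`, `group_sum` and `run_pipeline` helpers are hypothetical, and `match` supports plain equality only.

```python
from functools import reduce

def match(query):
    """$match stage (equality only): keep documents whose fields equal the query values."""
    return lambda docs: [d for d in docs if all(d.get(k) == v for k, v in query.items())]

def sort(field, direction=1):
    """$sort stage on one field: 1 = ascending, -1 = descending."""
    return lambda docs: sorted(docs, key=lambda d: d[field], reverse=(direction == -1))

def group_sum(key_field, value_field, out="total"):
    """$group stage with a single $sum accumulator."""
    def stage(docs):
        totals = {}
        for d in docs:
            totals[d[key_field]] = totals.get(d[key_field], 0) + d[value_field]
        return [{"_id": k, out: v} for k, v in totals.items()]
    return stage

def run_pipeline(docs, stages):
    """Feed the output of each stage into the next one, like aggregate([...])."""
    return reduce(lambda current, stage: stage(current), stages, docs)

customers = [
    {"workspace": "CR", "limit": 10}, {"workspace": "HDL", "limit": 10},
    {"workspace": "BI", "limit": 5}, {"workspace": "QA", "limit": 5},
    {"workspace": "KM", "limit": 10}, {"workspace": "MK", "limit": 10},
    {"workspace": "AI", "limit": 5},
]

print(run_pipeline(customers, [match({"workspace": "CR"}), group_sum("workspace", "limit")]))
# [{'_id': 'CR', 'total': 10}]
```

Stage order matters: placing `$match` first shrinks the input of every later stage and lets MongoDB use an index, so filtering early is usually preferable.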
```shell
// filter, then sort by limit in descending order
db.customers.aggregate([
    {$match: {workspace: "CR"}},
    {$sort: {limit: -1}}
]);
```

### Replication <a class="anchor" id="replication"></a>
[replication](https://www.tutorialspoint.com/mongodb/mongodb_replication.htm)
### Sharding <a class="anchor" id="sharding"></a>
[sharding](https://www.tutorialspoint.com/mongodb/mongodb_sharding.htm)

### Backup <a class="anchor" id="backup"></a>
```shell
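# dump one collection (written to ./dump/mydb/customers.bson by default)
mongodump --db mydb --collection customers
# dump all databases
mongodump
# restore the collection from the dump (collection name taken from the file name)
mongorestore --db mydb dump/mydb/customers.bson
```
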
### integration with python <a class="anchor" id="python"></a>

**game example**
Find the eggs nearest to the player.


In [4]:
%%capture
!pip install pymongo

In [1]:
#pip install pymongo
import pymongo
client = pymongo.MongoClient("mongodb://localhost:27017/")
db = client["game"]
collection = db["eggs"]



In [2]:
print(client.list_database_names())

['admin', 'config', 'local', 'mydb']


In [3]:
# show all collections
print(db.list_collection_names())

[]


In [5]:
%%capture
# remove all documents from collection
collection.delete_many({})

In [6]:
import json
def ingestandcleandata():
    """
    read and clean data
    """
    with open('eggs.json') as f:
        data = json.load(f)
    # extract value of key common_egg
    eggs = data['common_egg']
    # convert value string to json
    eggs = json.loads(eggs)
    # drop None values
    eggs = [egg for egg in eggs if egg is not None]
    return eggs

def insertdata(eggs):
    """
    insert data into mongodb in a single batch
    """
    # insert_many raises an error for an empty list, so guard against it
    if eggs:
        collection.insert_many(eggs)


eggs = ingestandcleandata()
insertdata(eggs)

In [7]:


# retrieve number of documents in collection 
numberofeggs = collection.count_documents({})
print(f"Number of eggs: {numberofeggs}")
# add a GeoJSON location field built from lon/lat
# (an update with an aggregation pipeline; requires MongoDB 4.2+)
collection.update_many({}, [
    {
        "$set": {
            "location": {
                "type": "Point",
                "coordinates": ["$lon", "$lat"]  # GeoJSON order is [longitude, latitude]
            }
        }
    }
])

Number of eggs: 397


UpdateResult({'n': 397, 'nModified': 397, 'ok': 1.0, 'updatedExisting': True}, acknowledged=True)

In [8]:
# retrieve 5 documents from collection
eggs5 = collection.find().limit(5)
for egg in eggs5:
    print(egg)

{'_id': ObjectId('671e3f0ca2b44fa83f68688e'), 'egg_number': 0, 'unique_egg_Identifier': 'UITO!FeMwK', 'placename': 'Salon Simon', 'lat': 50.0512772, 'lon': 10.3021812, 'location': {'type': 'Point', 'coordinates': [10.3021812, 50.0512772]}}
{'_id': ObjectId('671e3f0ca2b44fa83f68688f'), 'egg_number': 1, 'unique_egg_Identifier': 'g1!V1fKv1r', 'placename': 'Tasos Taverne', 'lat': 50.0764847, 'lon': 10.2387905, 'location': {'type': 'Point', 'coordinates': [10.2387905, 50.0764847]}}
{'_id': ObjectId('671e3f0ca2b44fa83f686890'), 'egg_number': 2, 'unique_egg_Identifier': '6REnnhdbFQ', 'placename': 'GetränkeMarkt Nord', 'lat': 50.0518646, 'lon': 10.220293, 'location': {'type': 'Point', 'coordinates': [10.220293, 50.0518646]}}
{'_id': ObjectId('671e3f0ca2b44fa83f686891'), 'egg_number': 3, 'unique_egg_Identifier': '0UnV&dT68j', 'placename': 'Pappert', 'lat': 50.07106, 'lon': 10.2226915, 'location': {'type': 'Point', 'coordinates': [10.2226915, 50.07106]}}
{'_id': ObjectId('671e3f0ca2b44fa83f68689

In [9]:
# create a 2dsphere index on location; $near and $nearSphere queries require a geospatial index
collection.create_index([("location", pymongo.GEOSPHERE)])

'location_2dsphere'

In [10]:
# fetch collection statistics (collStats command)
collection_stats = db.command("collStats", "eggs")

In [15]:
# "size" is the uncompressed size of all documents in bytes; convert it to megabytes
import random  # used by the benchmark cell below
size = collection_stats['size'] / 1000000
print(f"Size of collection: {size} MB")

Size of collection: 0.080404 MB


In [17]:
# find eggs within `distance` meters of a given location, nearest first

def findnearbyeggs(lat, lon, distance):
    """
    find eggs within `distance` meters of the given location, sorted nearest first
    """
    nearbyeggs = collection.find({
        "location": {
            "$near": {
                "$geometry": {
                    "type": "Point",
                    "coordinates": [lon, lat]
                },
                "$maxDistance": distance
            }
        }
    })
    return nearbyeggs
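
`$near` returns only documents within `$maxDistance` meters, ordered from nearest to farthest. The Python sketch below reproduces that behaviour in memory with the haversine great-circle formula, assuming a spherical Earth with a mean radius of 6,371 km (MongoDB's own spherical computation may give slightly different distances). `nearby` and `haversine_m` are hypothetical helpers, and the four sample eggs are copied from the output of cell `In [8]`.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in meters

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (lon, lat) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def nearby(docs, lat, lon, max_distance):
    """Like $near with $maxDistance: keep docs within range, nearest first."""
    with_dist = []
    for doc in docs:
        doc_lon, doc_lat = doc["location"]["coordinates"]  # GeoJSON order: [lon, lat]
        d = haversine_m(lon, lat, doc_lon, doc_lat)
        if d <= max_distance:
            with_dist.append((d, doc))
    with_dist.sort(key=lambda pair: pair[0])
    return [(round(d), doc["placename"]) for d, doc in with_dist]

sample_eggs = [
    {"placename": "Salon Simon", "location": {"type": "Point", "coordinates": [10.3021812, 50.0512772]}},
    {"placename": "Tasos Taverne", "location": {"type": "Point", "coordinates": [10.2387905, 50.0764847]}},
    {"placename": "GetränkeMarkt Nord", "location": {"type": "Point", "coordinates": [10.220293, 50.0518646]}},
    {"placename": "Pappert", "location": {"type": "Point", "coordinates": [10.2226915, 50.07106]}},
]

for distance_m, name in nearby(sample_eggs, 50.0712772, 10.2021812, 5000):
    print(distance_m, name)
```

With a 5,000 m radius around (50.0712772, 10.2021812), this keeps Pappert (about 1.5 km away), GetränkeMarkt Nord and Tasos Taverne in that order, and drops Salon Simon, which is roughly 7.5 km away.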

In [18]:
%%timeit -n 100 
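# generate latitude between 51.5 and 52.5 and longitude between 11.5 and 12.5
# note: find() returns a lazy cursor, so this measures cursor creation only;
# wrap the call in list(...) to time the actual query

lat = random.uniform(51.5, 52.5)
lon = random.uniform(11.5, 12.5)
findnearbyeggs(lat, lon, 6000)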


14.7 µs ± 4.53 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [20]:
# geomap of eggs
#pip install folium
import folium

def create_map(lat, lon, eggs):
    """
    create a map centred on the given location with one marker per egg
    """
    m = folium.Map(location=[lat, lon], zoom_start=10)
    # small red circle and a marker for the player's location
    folium.Circle([lat, lon], radius=25, color='red', fill=True).add_to(m)
    folium.Marker([lat, lon], popup="You are here").add_to(m)
    for egg in eggs:
        # popup message: place name followed by lat and lon
        message = f"{egg['placename']} {egg['lat']} {egg['lon']}"
        folium.Marker([egg['lat'], egg['lon']], popup=message).add_to(m)
    return m

player_lat, player_lon = 50.0712772, 10.2021812
closest_eggs = findnearbyeggs(player_lat, player_lon, 5000)
# centre the map on the same point that was queried
egg_map = create_map(player_lat, player_lon, closest_eggs)
# show map
egg_map

In [38]:
# retrieve all eggs within 1000 m of lat 51.3685041, lng 12.3399398 with a $nearSphere query

def geteggs(lat,lng,radius):
    """
    get eggs within `radius` meters of the given coordinates, nearest first
    """
    eggs = collection.find(
        {
            "location": {
                "$nearSphere": {
                    "$geometry": {
                        "type": "Point",
                        "coordinates": [lng, lat]  # GeoJSON order: [longitude, latitude]
                    },
                    "$maxDistance": radius
                }
            }
        }
    )
    return eggs
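
GeoJSON, and therefore the `$geometry` of `$near`/`$nearSphere`, expects `[longitude, latitude]`, which is easy to swap when a function takes `lat` first. A small hypothetical helper that builds the point and rejects out-of-range values catches many such swaps (though not when both numbers are valid in either order, as with 51.37/12.34). `geojson_point` and `near_sphere_query` are illustrative names; the `location` field and its 2dsphere index are the ones created earlier.

```python
def geojson_point(lat, lon):
    """Build a GeoJSON Point after validating ranges; coordinates are [lon, lat]."""
    if not -90 <= lat <= 90:
        raise ValueError(f"latitude {lat} outside [-90, 90]")
    if not -180 <= lon <= 180:
        raise ValueError(f"longitude {lon} outside [-180, 180]")
    return {"type": "Point", "coordinates": [lon, lat]}

def near_sphere_query(lat, lon, radius_m):
    """Filter document for collection.find(): $nearSphere within radius_m meters."""
    return {
        "location": {
            "$nearSphere": {
                "$geometry": geojson_point(lat, lon),
                "$maxDistance": radius_m,
            }
        }
    }

query = near_sphere_query(51.3685041, 12.3399398, 1000)
print(query["location"]["$nearSphere"]["$geometry"])
# {'type': 'Point', 'coordinates': [12.3399398, 51.3685041]}
```

The resulting dict can be passed straight to `collection.find(...)`.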



In [39]:
eggs = geteggs(51.3685041,12.3399398,1000)
for egg in eggs:
    print(egg)