# `Viewer Help: Mongolite (R-studio) "Lookup operator"`

# <font color=red>Mr Fugu Data Science</font>

# (◕‿◕✿)

In [None]:
# library(jsonlite)  # send files to Mongo
library(tidyverse)
library(knitr)     # help run code
library(markdown)  # create markdown files i.e. pdf
library(mongolite) # Create connection/Interface R<-> Mongodb

In [None]:
# Create Connection: 'localhost'

# mng_conn<-mongo(collection = 'recruiter_clients',db='berkeley')

In [None]:
# skills_count<-mng_conn$aggregate('[{"$project":{"candidate.first_name":1,
#                         "candidate.last_name":1,
#                         "numberOfSkills":{"$cond":{"if":{"$isArray":"$candidate.skills"},
#                         "then":{"$size":"$candidate.skills"},"else":"NA"}}}}]')

# $lookup: uses a `left outer join` to a collection in the `same` database

+ Adds a new array field to each input document; the array holds the documents from the joined collection whose field matches
    + **The documentation says the joined collection must be in the SAME DATABASE as the collection you are running the aggregation on**
    + As of MongoDB 5.1, `$lookup` also works across sharded collections

`
{
   $lookup:
     {
       from: <collection to join>,
       localField: <field from the input documents>,
       foreignField: <field from the documents of the "from" collection>,
       as: <output array field>
     }
}
`
    
    
# `Let's specify what is going on here:`
    
+ `from`: the collection, within the same database, that you want to left-outer join to
    + Matching documents are taken from this collection and placed in the input documents
+ `localField`: the field of the input documents that the `$lookup` operator uses as the join key
+ `foreignField`: the field of the documents in the `from` collection that is compared for equality with `localField`
+ `$` `lookup`: if `localField` or `foreignField` is missing from a document, the operator treats it as `null` for matching purposes; if nothing matches, you get an empty array rather than Null.
    + Takes the matching documents from the `from` collection and attaches them to each document of the input collection
+ `as`: the name of the new array field that holds the matched documents; an existing field with the same name is overwritten
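
To make the matching rules above concrete, here is a minimal sketch in plain JavaScript that imitates the equality form of the operator on in-memory arrays. It is not MongoDB code: `emulateLookup` and the `orders` / `inventory` sample data are made up for illustration, and it only handles top-level scalar fields (no arrays or dotted paths).

```javascript
// Plain-JavaScript emulation of the equality form of $lookup.
// Hypothetical helper: it only mirrors the matching rules described above.
function emulateLookup(inputDocs, fromDocs, spec) {
  const { localField, foreignField, as } = spec;
  // A missing field is treated as null for matching purposes.
  const keyOf = (doc, field) => (doc[field] === undefined ? null : doc[field]);
  return inputDocs.map((doc) => ({
    ...doc,
    // Every input document is kept (left outer join); unmatched ones get [].
    [as]: fromDocs.filter((f) => keyOf(f, foreignField) === keyOf(doc, localField)),
  }));
}

// Hypothetical sample data, not taken from the notebook.
const orders = [
  { _id: 1, sku: "almonds" },
  { _id: 2, sku: "pecans" },
];
const inventory = [
  { _id: 10, sku: "almonds", instock: 120 },
  { _id: 11, sku: "cashews", instock: 60 },
];

const joined = emulateLookup(orders, inventory, {
  localField: "sku",
  foreignField: "sku",
  as: "inventory_docs",
});
console.log(JSON.stringify(joined, null, 2));
```

Both orders come back: the "almonds" order carries one inventory document, and the "pecans" order carries an empty `inventory_docs` array because nothing matched.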

# `Aggregations:`



# `Ex. 1) Use a pipeline with ` `$` `lookup`

+ `Insta_Post:`

`
{
    "title" : "my first job",
    "author" : "Jimmy",
    "insta_likes" : 4
},
{
    "title" : "my weekend",
    "author" : "Jimmy",
    "insta_likes" : 20
},
{
    "title" : "Hello :- )",
    "author" : "Craig",
    "insta_likes" : 10
}
`

`-----------------------`
+ `Comments:`

`
{
    "postTitle" : "my first job",
    "comment" : "Super fun day",
    "likes" : 5
},
{
    "postTitle" : "my weekend",
    "comment" : "how is your day?",
    "likes" : 2
},
{
    "postTitle" : "my second post ever",
    "comment" : "i like chocolate",
    "likes" : 11
},
{
    "postTitle" : "hello :- )",
    "comment" : "not my favorite day of the week",
    "likes" : 8
},
{
    "postTitle" : "my last post y'all",
    "comment" : null,
    "likes" : 0
}
`

`----------------------------------`

`
db.insta_posts.aggregate([
  {
    $lookup:
      {
        from: "comments",
        let: { post_likes: "$insta_likes", posts_title: "$title" },
        pipeline: [
            { $match:
                { $expr:
                    { $and:
                        [
                           { $gt: [ "$likes", "$$post_likes" ] },
                           { $eq: [ "$$posts_title", "$postTitle" ] }
                        ]
                    }
                }
            }
        ],
        as: "comments"
      }
  }
])
`



# `We should breakdown the above problem:`

We are joining `insta_posts` with the `comments` collection: for each post, we keep the comments whose `postTitle` *(foreign)* equals the post's `title` *(local)* and whose `likes` are greater than the post's `insta_likes`.

+ **`let:`** defines variables from fields of the input documents
    + Variables defined inside of `let` can then be referenced inside the `pipeline`
+ **`pipeline:`** this is executed on the collection being joined (the `from` collection)

+ *`Investigate this with a comparison between collections:`* `{ $gt: [ "$likes", "$$post_likes" ] }`, `{ $eq: [ "$$posts_title", "$postTitle" ] }`
    + `$$:` this is how we reference variables from the `let` statement
    + `$:` fields of the foreign collection
    
Remember the condition we are applying: with the greater-than `$gt` operator and `$eq` together, we only take comments where `postTitle` matches the `title` field and the comment has a greater like count than `post_likes`. The `$eq` comparison is case-sensitive, so "Hello :- )" and "hello :- )" do not match.

`OUTPUT:`


`
{
    "title" : "my first job",
    "author" : "Jimmy",
    "insta_likes" : 4,
    "comments" : [
        {
            "postTitle" : "my first job",
            "comment" : "Super fun day",
            "likes" : 5
        }
    ]
},
{
    "title" : "my weekend",
    "author" : "Jimmy",
    "insta_likes" : 20,
    "comments" : []
},
{
    "title" : "Hello :- )",
    "author" : "Craig",
    "insta_likes" : 10,
    "comments" : []
}
`
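
If you want to check the join logic without a running database, here is a rough sketch in plain JavaScript that applies the same two conditions to the sample documents. It is only an imitation of what the `let` / `pipeline` form does; the names `instaPosts`, `commentDocs` and `postsWithComments` are made up for this illustration.

```javascript
// Sample documents copied from the Insta_Post and Comments listings.
const instaPosts = [
  { title: "my first job", author: "Jimmy", insta_likes: 4 },
  { title: "my weekend", author: "Jimmy", insta_likes: 20 },
  { title: "Hello :- )", author: "Craig", insta_likes: 10 },
];
const commentDocs = [
  { postTitle: "my first job", comment: "Super fun day", likes: 5 },
  { postTitle: "my weekend", comment: "how is your day?", likes: 2 },
  { postTitle: "my second post ever", comment: "i like chocolate", likes: 11 },
  { postTitle: "hello :- )", comment: "not my favorite day of the week", likes: 8 },
  { postTitle: "my last post y'all", comment: null, likes: 0 },
];

const postsWithComments = instaPosts.map((post) => {
  // "let": capture fields of the input document as variables ($$post_likes, $$posts_title).
  const vars = { post_likes: post.insta_likes, posts_title: post.title };
  // "pipeline": runs against the foreign collection; c.field plays the role of "$field".
  const comments = commentDocs.filter(
    (c) => c.likes > vars.post_likes && c.postTitle === vars.posts_title
  );
  return { ...post, comments }; // "as": "comments"
});

for (const p of postsWithComments) {
  console.log(p.title, "->", p.comments.map((c) => c.comment));
}
```

Only "my first job" keeps a comment (5 likes against 4). The "my weekend" comment has too few likes, and the "Hello :- )" post fails both checks: the titles differ in case and 8 is not greater than 10.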

# `Ex. 2) What happens when something doesn't match up`

`
db.Pokemon_types.insert([
   { "_id" : 100, "chr_type" : "Pikachu", description: "electric type", "card number" : 12 },
   { "_id" : 200, "chr_type" : "Mew", description: "psychic type", "card number" : 6 },
   { "_id" : 300, "chr_type" : "Charizard", description: "fire type", "card number" : 34 },
   { "_id" : 400, "chr_type" : "Polywhirl", description: "water type", "card number" : 86 },
   { "_id" : 500, "chr_type": null, description: "Incomplete" },
   { "_id" : 600 }
])
`

`------------------------------`

`
db.Pokemon_orders.insert([
   { "_id" : 100, "item" : "Pikachu", "price" : 6, "quantity" : 3 },
   { "_id" : 200, "item" : "Indeedee", "price" : 1, "quantity" : 1 },
   { "_id" : 300 }
])
`

`------------------------------`

`
db.Pokemon_orders.aggregate([
   {
     $lookup:
       {
         from: "Pokemon_types",
         localField: "item",
         foreignField: "chr_type",
         as: "inventory_sales_dta"
       }
  }
])
`


# `What are we expecting for output?`

`
{
   "_id" : 100,
   "item" : "Pikachu",
   "price" : 6,
   "quantity" : 3,
   "inventory_sales_dta" : [
      { "_id" : 100, "chr_type" : "Pikachu", "description" : "electric type", "card number" : 12 }
   ]
}
{
   "_id" : 200,
   "item" : "Indeedee",
   "price" : 1,
   "quantity" : 1,
   "inventory_sales_dta" : [ ]
}
{
   "_id" : 300,
   "inventory_sales_dta" : [
      { "_id" : 500, "chr_type" : null, "description" : "Incomplete" },
      { "_id" : 600 }
   ]
}
`

No `Pokemon_types` document has a `chr_type` of "Indeedee", so that order gets an empty array. The order without an `item` field is treated as `null`, so it matches the documents whose `chr_type` is `null` or missing.
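
When reading results like these, it can help to label each joined document by how its array was filled. The sketch below is one way to do that in plain JavaScript. The result documents are typed in by hand from the sample data (treat them as an assumption, not captured database output), and `classifyJoin` is a hypothetical helper, not part of MongoDB or mongolite.

```javascript
// Result documents written by hand from the sample data; treat them as an assumption.
const lookupResults = [
  { _id: 100, item: "Pikachu", price: 6, quantity: 3,
    inventory_sales_dta: [
      { _id: 100, chr_type: "Pikachu", description: "electric type", "card number": 12 },
    ] },
  { _id: 200, item: "Indeedee", price: 1, quantity: 1, inventory_sales_dta: [] },
  { _id: 300,
    inventory_sales_dta: [
      { _id: 500, chr_type: null, description: "Incomplete" },
      { _id: 600 },
    ] },
];

// Hypothetical helper: label each joined document by how its array was filled.
function classifyJoin(doc, localField, asField) {
  const key = doc[localField];
  // The join key was null or missing, so any matches are null-to-null matches.
  if (key === undefined || key === null) return "null-key";
  return doc[asField].length > 0 ? "matched" : "unmatched";
}

const labels = lookupResults.map((d) => [d._id, classifyJoin(d, "item", "inventory_sales_dta")]);
console.log(labels);
```

The "null-key" label is the case to watch for: the order has no `item`, yet its array is not empty, which is easy to misread as a real match.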

# `Potential problems with speed, other options...`

+ Without an index on the joined field, each input document may have to be compared against every document in the `from` collection, which slows the join down.
    + Consider an `index on foreignField`
+ Consider profiling to track the speed if needed. 
    + Nested collection scans are also a problem for your speed!
    
+ Where possible, filter with `$match` before `$lookup`, so fewer documents go through the join and fewer comparisons are made
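
As a rough sketch of the `$match`-first idea, the pipeline from Ex. 2 can be written as plain data with a filter stage in front of the join. The filter `{ item: { $ne: null } }` drops orders whose `item` is null or missing, so they never reach the join. The variable names `ordersPipeline` and `pipelineJson` are made up here; since mongolite's `aggregate()` takes the pipeline as a JSON string, the printed text is the kind of string you could pass to it.

```javascript
// Pipeline as plain data: filter first, then join.
const ordersPipeline = [
  // Drop orders whose "item" is null or missing before the join.
  { $match: { item: { $ne: null } } },
  {
    $lookup: {
      from: "Pokemon_types",
      localField: "item",
      foreignField: "chr_type",
      as: "inventory_sales_dta",
    },
  },
];

// JSON text of the pipeline, e.g. for a call that expects a JSON string.
const pipelineJson = JSON.stringify(ordersPipeline);
console.log(pipelineJson);
```

Note that this changes the result: the order with `_id` 300 no longer appears at all, so only use the filter when those documents are not needed. In the mongo shell, an index on the joined field could be added with `db.Pokemon_types.createIndex({ chr_type: 1 })`.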

`-----------------------------------------`

**`Speed Up Options For Your MongoDB Life:`**

+ If you run the same queries frequently, consider storing their results in separate documents for faster reads

+ Creating an `Index` on fields you query often is encouraged

+ Check for slow queries by looking at your logs, then check which indexes those queries use

+ **`Referencing:`** such as saving an _id from one document into another 
    + Many : Many relationships are good for `referencing`, but understand you are making more calls ('round trips')
    + Use it when embedding would push a document past the 16 MB document size limit
    + Use it when part of a document is updated often and keeps growing relative to the rest of the document

+ **`Embedding:`** think of this as nesting a document
    + This helps when you access the same items together, and it improves read times for the document
    + Updates are simpler because the related data is written in a single operation
    + Most, but not all, 1:1 relationships could benefit from embedding into a single document
    + 1:many relationships are a good contender as well if multiple items/objects appear with the parent document
    
+ **`Large Files, Memory, Off loading Resources:`** If you have data that cannot fit into memory
    + `Sharding or Replication` are good options when dealing with large data: sharding splits the data and the work across servers, and replication keeps copies in different locations for redundancy and faster reads.
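
The referencing and embedding bullets above can be pictured as two document shapes. This is a small sketch in plain JavaScript with made-up documents (`embeddedPost`, `referencedPost`, `commentsById` and `resolveComments` are all hypothetical names), just to show where the extra lookup step comes from.

```javascript
// Hypothetical documents, not from the notebook.
// Embedding: the comments live inside the post, so one read returns everything.
const embeddedPost = {
  _id: 1,
  title: "my first job",
  comments: [{ comment: "Super fun day", likes: 5 }],
};

// Referencing: the post stores only the _id values of its comments.
const referencedPost = { _id: 1, title: "my first job", comment_ids: [501] };
const commentsById = new Map([
  [501, { _id: 501, comment: "Super fun day", likes: 5 }],
]);

// Resolving references takes a second step (an extra round trip,
// or a $lookup stage, when working against a real database).
function resolveComments(post, store) {
  return post.comment_ids
    .map((id) => store.get(id))
    .filter((c) => c !== undefined); // skip ids that point to nothing
}

const resolved = resolveComments(referencedPost, commentsById);
console.log(embeddedPost.comments.length, resolved.length);
```

Both shapes end up with the same comment; the difference is that the embedded version has it after one read, while the referenced version needs the extra resolve step.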
    

# Investigate `unwind` operator:



# Like, Share & <font color=red>SUB</font>scribe

# `Citations & Help:`

# ◔̯◔

https://www.stackchief.com/tutorials/%24lookup%20Examples%20%7C%20MongoDB

https://www.mongodb.com/docs/manual/reference/operator/aggregation/lookup/

https://hevodata.com/learn/mongodb-lookup/

https://kb.objectrocket.com/mongo-db/how-to-update-multiple-mongodb-documents-in-python-359

https://cran.r-project.org/web/packages/mongolite/mongolite.pdf

https://www.apiref.com/mongodb/reference/operator/aggregation/lookup.html

https://www.mongodb.com/community/forums/t/lookup-foreignfield-is-objectid/126872/7

https://medium.com/@liams_o/lookup-in-mongodb-if-you-are-using-it-something-is-wrong-45d3fd47ac61 (review!)

https://stackoverflow.com/questions/43742635/poor-lookup-aggregation-performance

https://mongodb.github.io/mongo-java-driver/3.2/builders/aggregation/

https://www.tutorialspoint.com/mongodb/mongodb_database_references.htm

`Examples with pipelines`

https://www.educba.com/lookup-in-mongodb/

https://appradius.co/blog/5-mongodb-aggregate-methods

https://www.mongodb.com/docs/v4.2/reference/operator/aggregation/lookup/

`---------------------- Speed Up ---------------------`

https://medium.com/mongodb-performance-tuning/coding-efficient-mongodb-joins-97fe0627751a

https://medium.com/mongodb-performance-tuning/optimizing-the-order-of-aggregation-pipelines-44c7e3f4d5dd

https://www.mongodb.com/basics/best-practices
