# Sample mini Project ICNDB

# Summary, Overview

Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. 

![components](images/100-mongodb-components-01.png)

Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. 


# Installation, Requirements

In [1]:
! echo $VIRTUAL_ENV
! pip3 list | grep -E 'pymongo|dnspython|pandas'
# ! pip3 install --upgrade --upgrade-strategy only-if-needed pymongo dnspython pandas

dnspython          2.1.0  
pandas             1.1.3  
pymongo            3.11.3 


In [2]:
# 2021-03, Bruno Grossniklaus, https://github.com/it-gro
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

import pandas as pd
import pymongo
from pprint import pprint

# Configuration

In [3]:
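# Fully qualified option names; the short forms can be ambiguous in newer pandas releases
pd.set_option('display.precision', 2)
pd.set_option('display.max_rows', 30)
pd.set_option('display.max_colwidth', 50)
# pd.describe_option('display.max_rows')
# pd.describe_option('display.precision')
# pd.describe_option('display.max_colwidth')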

HOST_mongo = 'localhost'
OPTIONS_mongo = ''
# OPTIONS_mongo = '?retryWrites=true&w=majority'
USER_mongo = ""
PASS_mongo = ""
if USER_mongo:
    credentials=f"{USER_mongo}:{PASS_mongo}@"
else:
    credentials=""
    
DB_ICNDB="imp_demo_icndb"
URL_API="http://api.icndb.com/jokes/random"
NUMEBR_OF_JOKES=100
COLL_SRC="src_jokes"
COLL_STG="stg_jokes"
COLL_JOKES="jokes"

# Connection

In [4]:
# The '/' after the host separates it from any options (e.g. '?retryWrites=true&w=majority')
client = pymongo.MongoClient(f"mongodb://{credentials}{HOST_mongo}/{OPTIONS_mongo}")
icndb = client[DB_ICNDB]
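
The connection string above is assembled from several optional parts. The following is a minimal sketch of that assembly, assuming credentials may contain reserved characters such as `@` or `:` and therefore need percent-escaping. The helper name `build_mongo_uri` and the sample user and password are hypothetical and not part of the notebook.

```python
from urllib.parse import quote_plus


def build_mongo_uri(host, user="", password="", options=""):
    """Assemble a mongodb:// URI (hypothetical helper, not part of the notebook)."""
    # Percent-escape reserved characters such as '@' or ':' in the credentials.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}@" if user else ""
    # A '/' separates the host from the optional '?key=value' options.
    return f"mongodb://{credentials}{host}/{options}"


print(build_mongo_uri("localhost"))
print(build_mongo_uri("localhost", "demo", "p@ss:word", "?retryWrites=true&w=majority"))
```

The returned string can be passed to `pymongo.MongoClient` in the same way as the f-string used in the cell above.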

## Remove all existing documents

In [5]:
icndb[COLL_SRC].drop()
icndb[COLL_STG].drop()
icndb[COLL_JOKES].drop()
c = icndb.list_collections()
pd.DataFrame(c)

| | name | type | options | info | idIndex |
|---|---|---|---|---|---|
| 0 | jokes_2 | collection | {} | `{'readOnly': False, 'uuid': 500cb384-ea0c-4182...` | `{'v': 2, 'key': {'_id': 1}, 'name': '_id_', 'n...` |
| 1 | jokes_1 | collection | {} | `{'readOnly': False, 'uuid': c939d900-2088-456c...` | `{'v': 2, 'key': {'_id': 1}, 'name': '_id_', 'n...` |


# ETL

## Load into source area

In [6]:
%%bash -s "{URL_API}" "{NUMEBR_OF_JOKES}" "{DB_ICNDB}" "{COLL_SRC}" 
request=$1/$2
db=$3
collection=$4

# Quote the expansions so the URL and names are passed as single arguments
curl -sL "${request}" |
  mongoimport --db "${db}" --drop --collection "${collection}"

2021-03-20T17:37:48.392+0100	connected to: mongodb://localhost/
2021-03-20T17:37:48.393+0100	dropping: imp_demo_icndb.src_jokes
2021-03-20T17:37:48.895+0100	1 document(s) imported successfully. 0 document(s) failed to import.


In [7]:
c = icndb[COLL_SRC].aggregate([
    {"$limit": 1},
])

for doc in c:
    pprint(f"{doc}"[:500])

("{'_id': ObjectId('605624dc363f3083a7352987'), 'type': 'success', 'value': "
 "[{'id': 449, 'joke': 'All arrays Chuck Norris declares are of infinite size, "
 "because Chuck Norris knows no bounds.', 'categories': ['nerdy']}, {'id': "
 '538, \'joke\': "Chuck Norris\'s log statements are always at the FATAL '
 'level.", \'categories\': [\'nerdy\']}, {\'id\': 206, \'joke\': \'Chuck '
 'Norris destroyed the periodic table, because Chuck Norris only recognizes '
 "the element of surprise.', 'categories': []}, {'id': 171, 'joke': 'Chuck")


In [8]:
c = icndb[COLL_SRC].aggregate([
      {"$limit": 1},
])

pd.DataFrame(c)

| | _id | type | value |
|---|---|---|---|
| 0 | 605624dc363f3083a7352987 | success | `[{'id': 449, 'joke': 'All arrays Chuck Norris ...` |


## Transform and Load

### Stage: Reformat

In [9]:
c = icndb[COLL_SRC].aggregate([
      {"$unwind": "$value"},
      {"$replaceWith": "$value" },
      {"$out": COLL_STG}
])

# pd.DataFrame(c)

In [10]:
c = icndb[COLL_STG].aggregate([
      {"$limit": 3},
])

pd.DataFrame(c)

| | _id | id | joke | categories |
|---|---|---|---|---|
| 0 | 605624dda37040a69f1514df | 449 | All arrays Chuck Norris declares are of infini... | [nerdy] |
| 1 | 605624dda37040a69f1514e0 | 538 | Chuck Norris's log statements are always at th... | [nerdy] |
| 2 | 605624dda37040a69f1514e1 | 206 | Chuck Norris destroyed the periodic table, bec... | [] |
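
The reformat stage turns the single imported document, whose `value` field holds an array of jokes, into one document per joke. The sketch below mimics `$unwind` followed by `$replaceWith` on plain Python dictionaries. The helper `unwind_and_replace` and the sample data are made up for illustration, and the sketch assumes the field holds an array. It does not generate the new `_id` values that appear in the staged collection.

```python
def unwind_and_replace(docs, field="value"):
    """Emit one document per array element, like $unwind + $replaceWith.

    Hypothetical helper; assumes docs[field] is a list of sub-documents.
    """
    staged = []
    for doc in docs:
        # By default $unwind emits nothing for a missing or empty array.
        for element in doc.get(field) or []:
            # $replaceWith keeps only the element; 'type' and the old '_id' are gone.
            staged.append(dict(element))
    return staged


# Made-up sample mirroring the shape of the imported API response.
src = [{
    "type": "success",
    "value": [
        {"id": 1, "joke": "first sample joke", "categories": ["nerdy"]},
        {"id": 2, "joke": "second sample joke", "categories": []},
    ],
}]

stg = unwind_and_replace(src)
print(stg)
```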


### Validate Fields

In [11]:
c = icndb[COLL_STG].aggregate([
    {"$match": {"categories": {"$exists" : False}}},
])

pd.DataFrame(c)

In [12]:
c = icndb[COLL_STG].aggregate([
    {"$match": {"categories": {"$in" : ["explicit"]}}},
])

pd.DataFrame(c)

| | _id | id | joke | categories |
|---|---|---|---|---|
| 0 | 605624dda37040a69f1514ee | 555 | Chuck Norris doesn't have pubic hairs because ... | [explicit] |
| 1 | 605624dda37040a69f15152d | 608 | Chuck Norris can stand on his head. His dick-h... | [explicit] |


### (Re)Create joke collection

In [13]:
c = icndb[COLL_STG].aggregate([
    {"$match": {"categories": {"$nin" : ["explicit"]}}},
    {"$project": {"_id": "$id", "joke": 1, "categories": 1}},
])

pd.DataFrame(c)

| | joke | categories | _id |
|---|---|---|---|
| 0 | All arrays Chuck Norris declares are of infini... | [nerdy] | 449 |
| 1 | Chuck Norris's log statements are always at th... | [nerdy] | 538 |
| 2 | Chuck Norris destroyed the periodic table, bec... | [] | 206 |
| 3 | Chuck Norris can set ants on fire with a magni... | [] | 171 |
| 4 | Some kids play Kick the can. Chuck Norris play... | [] | 172 |
| ... | ... | ... | ... |
| 93 | Chuck Norris does not &quot;style&quot; his ha... | [] | 155 |
| 94 | The Great Wall of China was originally created... | [] | 56 |
| 95 | Chuck Norris roundhouse kicks don't really kil... | [] | 135 |
| 96 | Bill Gates thinks he's Chuck Norris. Chuck Nor... | [] | 483 |
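
The `$match` with `$nin` keeps every joke whose `categories` array does not contain `explicit`, and the `$project` stage promotes the API's `id` to `_id`. Note that `$nin` also matches documents in which the field does not exist, which is one reason to check for missing `categories` first. The sketch below reproduces that logic on plain dictionaries; `select_jokes` and the sample data are hypothetical.

```python
def select_jokes(staged, excluded="explicit"):
    """Filter out one category and use 'id' as '_id' (hypothetical helper)."""
    jokes = []
    for doc in staged:
        # Like $nin: a document without 'categories' is kept as well.
        if excluded in doc.get("categories", []):
            continue
        out = {"_id": doc["id"], "joke": doc["joke"]}
        # An inclusion projection omits fields that are absent in the input.
        if "categories" in doc:
            out["categories"] = doc["categories"]
        jokes.append(out)
    return jokes


# Made-up staged documents; the last one lacks the 'categories' field.
staged = [
    {"id": 1, "joke": "a", "categories": ["nerdy"]},
    {"id": 2, "joke": "b", "categories": ["explicit"]},
    {"id": 3, "joke": "c", "categories": []},
    {"id": 4, "joke": "d"},
]

jokes = select_jokes(staged)
print([doc["_id"] for doc in jokes])
```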


In [14]:
c = icndb[COLL_STG].aggregate([
    {"$match": {"categories": {"$nin" : ["explicit"]}}},
    {"$project": {"_id": "$id", "joke": 1, "categories": 1}},
    {"$out": COLL_JOKES}
])

# pd.DataFrame(c)

### Cleanup

In [15]:
icndb[COLL_SRC].drop()
icndb[COLL_STG].drop()

# Data analysis

Quisque sit amet turpis lectus. Phasellus tincidunt mi metus, et ornare ipsum consectetur eu. Cras accumsan purus vel leo viverra, at mollis neque interdum. Sed non ultrices odio, vitae sodales neque. Quisque diam odio, gravida quis auctor ut, aliquet ac ex. Integer venenatis elit ex, vitae imperdiet tortor malesuada quis. Vestibulum dignissim est sed libero viverra interdum. 

## Overview

In [16]:
%%bash -s "{DB_ICNDB}" "{COLL_JOKES}" 
mongoeye --db $1 --col $2 --sample all 

MongoEYE v0.4 - MongoDB exploration tool

Connecting: ...OK

Analyzing: ...OK

            KEY           │ COUNT  │   %    
────────────────────────────────────────────
  all documents           │ 98     │        
  analyzed documents      │ 98     │ 100.0  
                          │        │        
  _id ➜ int               │ 98     │ 100.0  
  categories ➜ array      │ 98     │ 100.0  
  └╴[array item] ➜ string │ 23     │        
  joke ➜ string           │ 98     │ 100.0  

OK  0.003s (local analysis)
    98/98 docs (100.0%)
    4 fields, depth 2


### Categories

In [17]:
c = icndb[COLL_JOKES].aggregate([
    {"$project": {"joke": 0}},
    {"$unwind": "$categories"},
    {"$group": {"_id": "$categories", "count": {"$sum": 1}}},
])

pd.DataFrame(c)

| | _id | count |
|---|---|---|
| 0 | nerdy | 23 |
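
Only `nerdy` shows up here because `$unwind` emits nothing for jokes with an empty `categories` array, so uncategorized jokes never reach the `$group` stage. A rough in-memory equivalent of the unwind and group steps, using made-up sample data and a hypothetical helper `count_categories`, could look like this:

```python
from collections import Counter


def count_categories(jokes):
    """Count category occurrences, like $unwind + $group with $sum: 1."""
    counts = Counter()
    for doc in jokes:
        # An empty list adds nothing, just as $unwind drops empty arrays.
        counts.update(doc.get("categories", []))
    return counts


# Made-up sample: two categorized jokes and one without any category.
sample = [
    {"_id": 1, "categories": ["nerdy"]},
    {"_id": 2, "categories": []},
    {"_id": 3, "categories": ["nerdy"]},
]

counts = count_categories(sample)
print(counts)
```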


In [18]:
c = icndb[COLL_JOKES].aggregate([
    {"$match": {"categories":  {"$in" : ["nerdy"]}}},
])

pd.DataFrame(c)

| | _id | joke | categories |
|---|---|---|---|
| 0 | 449 | All arrays Chuck Norris declares are of infini... | [nerdy] |
| 1 | 538 | Chuck Norris's log statements are always at th... | [nerdy] |
| 2 | 451 | Chuck Norris writes code that optimizes itself. | [nerdy] |
| 3 | 537 | Each hair in Chuck Norris's beard contributes ... | [nerdy] |
| 4 | 496 | Chuck Norris went out of an infinite loop. | [nerdy] |
| 5 | 497 | If Chuck Norris writes code with bugs, the bug... | [nerdy] |
| 6 | 494 | Chuck Norris breaks RSA 128-bit encrypted code... | [nerdy] |
| 7 | 506 | Chuck Norris programs do not accept input. | [nerdy] |
| 8 | 461 | Chuck Norris finished World of Warcraft. | [nerdy] |
| 9 | 552 | Chuck Norris knows the value of NULL, and he c... | [nerdy] |


### Jokes

Orci varius natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Donec in risus sed augue blandit tincidunt eu nec leo. Phasellus suscipit ex ut luctus auctor. Mauris efficitur finibus nunc, gravida pulvinar metus commodo eget. Quisque quis orci vehicula, maximus tellus sit amet, dignissim ligula. Proin auctor, tellus eget tempus imperdiet, nunc nisi laoreet tellus, nec viverra ipsum quam in quam. 

Nam ut pellentesque arcu. Ut faucibus elit enim, nec tincidunt massa mattis id. Cras tortor urna, tempus eu viverra quis, suscipit sed magna. Mauris eget eleifend leo, ut tristique justo. In quis lectus eu neque euismod bibendum non in mi. In lobortis iaculis pulvinar. Morbi et mi neque. Etiam maximus elementum metus, non auctor dui eleifend ac.

In [19]:
c = icndb[COLL_JOKES].aggregate([
    {"$match": {"joke":  {"$regex" : "chuck",  "$options": ""}}},
])

pd.DataFrame(c)

| | _id | joke | categories |
|---|---|---|---|
| 0 | 72 | How much wood would a woodchuck chuck if a woo... | [] |


In [20]:
c = icndb[COLL_JOKES].aggregate([
    {"$project": {"joke": 1}},
    {"$match": {"joke":  {"$not": {"$regex" : "chuck",  "$options": "i"}}}},
])

pd.DataFrame(c)
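
The two queries differ only in the `i` option. Without it, `chuck` is matched case-sensitively and therefore skips the capitalized name. With it, every joke mentioning the name matches, so the negated query comes back empty. The sketch below illustrates this with Python's `re` module on made-up sentences. MongoDB uses its own regex engine, so this is only an approximation that holds for a simple literal pattern like this one.

```python
import re

# Made-up sample sentences, not taken from the jokes collection.
jokes = [
    "Chuck Norris counts in made-up sample sentences.",
    "How much wood would a woodchuck chuck?",
]

# Equivalent of {"$regex": "chuck", "$options": ""}
case_sensitive = [j for j in jokes if re.search("chuck", j)]

# Equivalent of {"$regex": "chuck", "$options": "i"}
case_insensitive = [j for j in jokes if re.search("chuck", j, re.IGNORECASE)]

# Equivalent of {"$not": {"$regex": "chuck", "$options": "i"}}
no_chuck = [j for j in jokes if not re.search("chuck", j, re.IGNORECASE)]

print(len(case_sensitive), len(case_insensitive), len(no_chuck))
```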

In [21]:
c = icndb[COLL_JOKES].aggregate([
    {"$match": {"joke":  {"$regex" : "chuck",  "$options": ""}}},
])

pd.DataFrame(c)

| | _id | joke | categories |
|---|---|---|---|
| 0 | 72 | How much wood would a woodchuck chuck if a woo... | [] |


Curabitur vel magna nec ipsum pulvinar imperdiet vitae vitae nisi. Pellentesque mattis ultricies diam eu cursus. Maecenas eleifend ante arcu, at feugiat erat eleifend eu. In volutpat faucibus dui, sed faucibus ligula faucibus et. Maecenas convallis sodales sollicitudin. Ut consectetur, arcu ac imperdiet rutrum, massa nisi sollicitudin odio, vel mattis mi augue et sem. 

Fusce semper porta risus, vitae hendrerit mauris congue vitae. Praesent venenatis varius lacus. Cras tempor augue lectus, at iaculis ex pretium sit amet. In hac habitasse platea dictumst. Nunc pharetra est eu pellentesque hendrerit. Ut nec varius sem. Morbi eu elit id lacus laoreet pharetra.

# Conclusions

Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. 

Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. 

Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat. Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. 

# Remarks

## Learnings

Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat.