
# Connect, Explore, and Import MongoDB

### Mongo database elements
- A Mongo server requires an address to connect to (a URL, or `localhost` for a local server).
- A Mongo server may contain multiple databases.
- Each database contains multiple collections (comparable to tables).
- Each collection contains multiple documents (comparable to rows).
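
The hierarchy above can be pictured with plain nested Python containers. This is only a toy model, not a real connection: the names `shop`, `orders`, and `customers` and the sample documents are made up for illustration. It does mirror the subscript style (`client['database']['collection']`) used in the workflow below.

```python
# Toy stand-in for a Mongo server: nested dicts and lists, not a real connection.
server = {
    "shop": {                                      # database
        "orders": [                                # collection
            {"_id": 1, "item": "pen", "qty": 3},   # document
            {"_id": 2, "item": "ink", "qty": 1},   # document
        ],
        "customers": [],                           # an empty collection
    },
}

database_names = list(server)              # databases on the server
collection_names = list(server["shop"])    # collections in one database
first_document = server["shop"]["orders"][0]
```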

### BSON, JSON, and Dictionaries:
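- `Dictionaries:` a Python data structure of the form {key: value}. Values can be any Python object; keys must be hashable and unique.

- `JSON:` a text-based data format of the form {key: value} that supports a limited set of data types (strings, numbers, booleans, null, arrays, and objects). The JSON specification does not strictly require keys to be unique, although most parsers keep only one value for a repeated key.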

- `BSON:` stands for Binary JSON. 
    - MongoDB stores data in BSON format both internally, and over the network. Anything you can represent in JSON can be natively stored in MongoDB, and retrieved just as easily in JSON.
    - BSON’s binary structure encodes type and length information, which allows it to be traversed much more quickly compared to JSON.
    - BSON adds some non-JSON-native data types, like dates and binary data, without which MongoDB would have been missing some valuable support. 
    - MongoDB uses BSON to offer powerful indexing and querying features on top of the data format.
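
The differences between these formats can be seen with Python's standard library alone. The sketch below is a minimal illustration rather than MongoDB code: the sample keys (`name`, `created`) are made up, and the duplicate-key behavior shown is that of Python's `json` module, which keeps the last value.

```python
import datetime
import json

# A dict holds one value per key; a repeated key in a literal overwrites the earlier one.
as_dict = {"name": "a", "name": "b"}

# JSON text may repeat a key; Python's json module keeps the last occurrence.
as_json = json.loads('{"name": "a", "name": "b"}')

# JSON has no native date type, so json.dumps rejects a datetime value.
doc = {"name": "b", "created": datetime.datetime(2020, 1, 1)}
try:
    json.dumps(doc)
    json_accepts_datetime = True
except TypeError:
    json_accepts_datetime = False

# A common workaround is to store the date as an ISO 8601 string.
# BSON needs no such step because it has a dedicated date type.
encoded = json.dumps({**doc, "created": doc["created"].isoformat()})
```

The string workaround loses the type information: a reader of `encoded` gets back text and has to know that `created` should be parsed as a date.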

### Workflow
0. **Import pymongo**
  ```python
  from pprint import PrettyPrinter
  pp = PrettyPrinter(indent=3)  # improves the readability of printed documents
  import datetime

  import pandas as pd
  from pymongo import MongoClient
  ```

1. **Connect to the Mongo client:**
  ```python
  MongoClient('URL')
  ```

2. **List databases:**
  ```python
  MongoClient().list_databases()
  ```
3. **Choose the database:**
  ```python
  MongoClient()['database']
  ```
4. **List collections in the database:**
  ```python
  MongoClient()['database'].list_collections()
  ```
5. **Choose a collection:**
  ```python
  MongoClient()['database']['collection']
  ```
6. **Examine a document:**
  ```python
  MongoClient()['database']['collection'].find_one({})
  ```
7. **Explore document values using aggregations:**

    ```python
    collection = MongoClient()['database']['collection']
    # count the documents whose 'key' equals value
    collection.count_documents({'key': value})

    # '$match' keeps only the documents where document_key equals value
    # '$group' groups by document_key and counts the documents in each group
    # (the '$count' accumulator requires MongoDB 5.0+; {'$sum': 1} works on older servers)
    collection.aggregate([
        {'$match': {'document_key': value}},
        {'$group': {'_id': '$document_key', 'count': {'$count': {}}}},
    ])
    ```
    
8. **Filter data**

    ```python
    # return every document that satisfies all three conditions
    # projection selects the output fields: 1 keeps a field, '_id': 0 drops the primary key
    data = collection.find(
        {
            'document_key_1': value,
            'document_key_2': value,
            'document_key_3': {'$gte': datetime.datetime(yyyy, mm, dd)},  # substitute a real date
        },
        projection={'document_key_1': 1, 'document_key_2': 1, 'document_key_3': 1, '_id': 0},
    )
    ```
    
    
9. **Import data**
    ```python
    # materialize the cursor into a DataFrame indexed by the time series column
    df = pd.DataFrame(list(data)).set_index('time series column name')
    ```
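
Steps 7 to 9 can be combined into a few small helpers. The sketch below is one possible arrangement, not a definitive implementation: the function names (`build_filter`, `build_projection`, `build_count_pipeline`, `load_frame`) and the field names `status` and `timestamp` are hypothetical. It uses `{'$sum': 1}` for the count, which gives the same result as the `$count` accumulator and also works on servers older than MongoDB 5.0.

```python
import datetime


def build_filter(key, value, date_key, since):
    """Return a find() filter: exact match on `key`, dates on or after `since`."""
    return {key: value, date_key: {"$gte": since}}


def build_projection(keys):
    """Return a projection that keeps `keys` and drops the `_id` field."""
    projection = {k: 1 for k in keys}
    projection["_id"] = 0
    return projection


def build_count_pipeline(key, value):
    """Return a pipeline that filters on `key` and counts documents per value."""
    return [
        {"$match": {key: value}},
        # {'$sum': 1} adds 1 per document, giving a count for each group.
        {"$group": {"_id": f"${key}", "count": {"$sum": 1}}},
    ]


def load_frame(url, database, collection, query, projection, index_column):
    """Run the query and return the matches as a DataFrame indexed by `index_column`.

    An empty result has no columns, so set_index would raise a KeyError.
    """
    # Imported here so the builders above work without pymongo or pandas installed.
    import pandas as pd
    from pymongo import MongoClient

    client = MongoClient(url)
    try:
        cursor = client[database][collection].find(query, projection=projection)
        return pd.DataFrame(list(cursor)).set_index(index_column)
    finally:
        client.close()


# Hypothetical field names, for illustration only.
query = build_filter("status", "ok", "timestamp", datetime.datetime(2020, 1, 1))
projection = build_projection(["status", "timestamp"])
pipeline = build_count_pipeline("status", "ok")
```

With a server running, a call such as `load_frame('mongodb://localhost:27017', 'database', 'collection', query, projection, 'timestamp')` would return the filtered documents as a DataFrame; the URL here assumes a local server on the default port.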
     