<center><img src="https://github.com/insaid2018/Term-1/blob/master/Images/INSAID_Full%20Logo.png?raw=true" width="240" height="100" /></center>

# Data Abstraction End-to-End Series (MongoDB)

---
# **Table of Contents**
---

**1.** [**Introduction**](#Section1)<br>
  - **1.1** [**What is MongoDB?**](#Section11)

**2.** [**Problem Statement**](#Section2)<br>

**3.** [**Installing and Importing Libraries**](#Section3)<br>

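**4.** [**Data Acquisition using MongoDB**](#Section4)<br>

**5.** [**Data Acquisition using MySQL**](#Section5)<br>

**6.** [**Importing CSV to MongoDB**](#Section6)<br>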

---
<a name = Section1></a>
# **1. Introduction**
---

<a name = Section11></a>
### **1.1 What is MongoDB?**

- MongoDB is a **document database**, which means it **stores data** in **JSON-like documents**.

- JSON Document Characteristics:
  - The most **natural and productive** way to work with data.
  - Supports **arrays** and **nested objects** as values.
  - Allows for **flexible and dynamic schemas**.

- **Support** for aggregations and other modern use-cases such as **geo-based search**, **graph search**, and **text search**.

- **Queries are themselves JSON**, and thus easily composable. No more concatenating strings to dynamically generate SQL queries.

- All the power of a relational database:
  - **Support for joins** in queries.
  - Distributed multi-document **ACID transactions** with snapshot isolation.
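
To make the "queries are themselves JSON" point concrete, here is a minimal sketch in Python, where PyMongo filters are ordinary dictionaries. It assumes a hypothetical collection with one document per customer; the helper `combine_filters` and the field values are illustrative only. Nothing here talks to a database: the code only builds the filter that would be passed to `find()`.

```python
import json

def combine_filters(*conditions):
    """Combine several filter dicts with MongoDB's $and operator."""
    return {"$and": list(conditions)}

# Two independent conditions, each a plain dict (values are hypothetical)
on_monthly_contract = {"Contract": "Month-to-month"}
short_tenure = {"tenure": {"$lt": 12}}   # $lt is MongoDB's "less than" operator

# Composing them needs no string concatenation, only dict nesting
query = combine_filters(on_monthly_contract, short_tenure)
print(json.dumps(query, indent=2))
```

Because the filter is data rather than a string, conditions can be added or removed programmatically before the query is sent.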

---
<a name = Section2></a>
# **2. Problem Statement**
---

With the **boom** in the telecom industry driven by **4G technology**, the company is finding it increasingly difficult to **retain its customers**.

<center><img src="https://raw.githubusercontent.com/insaid2018/Domain_Case_Studies/master/Telecom/churn2.png" width="350" height="220"/></center>

- They are in the **middle** of setting up more **cell sites** on the **4G network** to improve their **4G services**.

- It is **plausible** for customers to choose **4G services** over **3G services** because of benefits in **cost, speed, latency, etc**.

- Until now, they have relied on traditional manual methods, which have become difficult to manage as the workload has grown more complex.

<center><img src="https://raw.githubusercontent.com/insaid2018/Term-2/master/images/87217572-51866a00-c368-11ea-90b5-dd2e28fd00de.jpg" width="400" height="280"/></center>

- They have a detailed history of their customers and are looking for an automated solution to identify the likelihood of a customer churning from their services.

- The data is **stored** in their **MongoDB** database, and you need to **extract it before you can apply your Data Science skills**.

---
<a name = Section3></a>
# **3. Installing and Importing Libraries**
---


### **Installing Libraries**

In [1]:
!pip install pymongo[srv]

Collecting dnspython<2.0.0,>=1.16.0; extra == "srv"
  Downloading https://files.pythonhosted.org/packages/ec/d3/3aa0e7213ef72b8585747aa0e271a9523e713813b9a20177ebe1e939deb0/dnspython-1.16.0-py2.py3-none-any.whl (188kB)
     |████████████████████████████████| 194kB 8.5MB/s
Installing collected packages: dnspython
Successfully installed dnspython-1.16.0



### **Importing Libraries**

In [2]:
import pymongo
pymongo.version

'3.11.4'

In [3]:
from pymongo import MongoClient
import urllib.parse   # Import the submodule explicitly; quote() lives in urllib.parse
import pandas as pd
import json
import numpy as np
from bson import ObjectId

---
<a name = Section4></a>
# **4. Data Acquisition using MongoDB**
---


### **Calling MongoClient to connect to our data base**


In [4]:
string_mongo = "mongodb+srv://test:"+urllib.parse.quote('test')+"@cluster0.xznab.mongodb.net/Telecom?retryWrites=true&w=majority"
client = MongoClient(string_mongo)
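
The connection string above escapes the password with `urllib.parse.quote`. As a hedged sketch of a slightly more defensive variant, the helper below (the name `build_mongo_uri`, the host, and the credentials are all hypothetical) escapes both the user name and the password with `quote_plus`, which also encodes `/`, a character that `quote` leaves untouched by default.

```python
from urllib.parse import quote_plus

def build_mongo_uri(user, password, host, database):
    """Assemble a mongodb+srv URI with percent-encoded credentials."""
    return (
        "mongodb+srv://"
        f"{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}/{database}?retryWrites=true&w=majority"
    )

# Hypothetical credentials containing characters that must be escaped
uri = build_mongo_uri("test", "p@ss:w/rd", "cluster0.example.mongodb.net", "Telecom")
print(uri)
```

The resulting string can be passed to `MongoClient` exactly as `string_mongo` is above.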

In [5]:
db = client['Telecom']

In [6]:
db

Database(MongoClient(host=['cluster0-shard-00-01.xznab.mongodb.net:27017', 'cluster0-shard-00-00.xznab.mongodb.net:27017', 'cluster0-shard-00-02.xznab.mongodb.net:27017'], document_class=dict, tz_aware=False, connect=True, retrywrites=True, w='majority', authsource='admin', replicaset='atlas-k11l4q-shard-0', ssl=True), 'Telecom')


### **Getting the list of all the collections in our DB**

In [7]:
db.list_collection_names()   # To see the names of all the collections

['Churn']

In [8]:
db_c = db['Churn']


### **Counting number of Documents in our collection**

In [9]:
db_c.count_documents({})      # To count the number of documents present in the collection

6


### **Finding the Correct Document**

In [13]:
cursor = db_c.find({})        # To see all the documents present in the collection
for document in cursor:
    print(document)

IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)
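
The rate-limit message above appears because each document is very large and all of them are printed in full. One way to inspect the data without flooding the notebook is to print only a truncated preview of the first few documents. The sketch below is an assumption-laden illustration: `preview` is a hypothetical helper, and a plain Python iterator stands in for the cursor so the code runs without a database. PyMongo cursors are iterable, so the same helper should accept `db_c.find({})` directly; `db_c.find_one()` or `cursor.limit(n)` are server-side alternatives.

```python
from itertools import islice

def preview(docs, n=2, max_chars=80):
    """Return short string previews of the first n documents of any iterable."""
    previews = []
    for doc in islice(docs, n):
        text = str(doc)
        if len(text) > max_chars:
            text = text[:max_chars] + "..."   # mark the truncation
        previews.append(text)
    return previews

# Stand-in for a cursor: an iterator over two large documents
fake_cursor = iter([{"_id": i, "payload": "x" * 1000} for i in range(2)])
for line in preview(fake_cursor, n=1):
    print(line)
```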




### **Fetching and calling the data in our notebook**

In [14]:
mongo_docs = db_c.find({'_id': ObjectId('601931bd1529e4bd2df2900b')})
mongo_docs

<pymongo.cursor.Cursor at 0x7f59b06eba50>


### **Converting MongoDB's JSON format to CSV**

In [15]:
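fields = {}
# Note: a cursor can be iterated only once; re-run the find() cell above
# before re-running this one, otherwise the loop sees no documents.
for doc in mongo_docs:
    for key, val in doc.items():
        try:
            # Append to the array already collected for this field
            fields[key] = np.append(fields[key], val)
        except KeyError:
            # First occurrence of this field: start a new array
            fields[key] = np.array([val])

print(fields)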

{}
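
One likely reason for an empty result like the one above is that the cursor had already been consumed by an earlier run of the cell. A PyMongo cursor behaves like a one-shot iterator unless it is rewound with `cursor.rewind()`. The sketch below simulates this with a plain Python iterator so it runs without a database; `fake_find` is a hypothetical stand-in for `db_c.find(...)`.

```python
def fake_find():
    """Stand-in for collection.find(): returns a one-shot iterator."""
    return iter([{"_id": 1, "Churn": "No"}, {"_id": 2, "Churn": "Yes"}])

cursor = fake_find()
first_pass = [doc["_id"] for doc in cursor]    # consumes the iterator
second_pass = [doc["_id"] for doc in cursor]   # nothing left to read

print(first_pass)
print(second_pass)

# Materialising the results once makes them reusable across cells
docs = list(fake_find())
print(len(docs), len(docs))
```

Storing `list(cursor)` in a variable right after the query is a simple way to avoid this when cells are re-run out of order.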


In [16]:
series_list = []
columns = []
for key, val in fields.items():
    if key != "_id":          # Skip MongoDB's internal identifier
        columns.append(key)
        series_list.append(val)

In [None]:
columns

['customerID',
 'tenure',
 'PhoneService',
 'Contract',
 'PaperlessBilling',
 'PaymentMethod',
 'MonthlyCharges',
 'TotalCharges',
 'Churn']

In [17]:
df_series = {}
for num, series in enumerate(series_list):
    # Each field holds one dict of {row label: value}; keep only the values
    df_series[columns[num]] = list(series[0].values())

mongo_df = pd.DataFrame(df_series)
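
The conversion above goes through NumPy arrays before reaching pandas. As a hedged alternative, when a document stores each field as a mapping from row label to value, pandas can build the frame from the nested dictionaries directly. The sample document below is hypothetical and much smaller than the real one; it only mirrors the assumed shape.

```python
import pandas as pd

# Hypothetical document shaped like the one fetched above: every field
# except _id maps a row label (as a string) to that row's value.
doc = {
    "_id": "abc123",
    "customerID": {"0": "A-001", "1": "B-002"},
    "tenure": {"0": 1, "1": 34},
    "Churn": {"0": "No", "1": "Yes"},
}

# Drop the identifier, then let pandas align the inner dicts by row label
df = pd.DataFrame({k: v for k, v in doc.items() if k != "_id"})
df.index = df.index.astype(int)   # row labels arrive as strings
df = df.sort_index()              # restore numeric row order

print(df)
```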


In [19]:
mongo_df.head()

In [20]:
mongo_df.dtypes

Series([], dtype: object)

In [21]:
mongo_df.to_csv("Churn_data.csv", index=False)   # index=False avoids writing the row index as an extra column

---
<a name = Section5></a>
# **5. Data Acquisition using MySQL**
---

---
<a name = Section6></a>
# **6. Importing CSV to MongoDB**
---

In [None]:
df = pd.read_csv("C:/Users/lenovo/Documents/churn_data.csv")

In [None]:
df.to_json('churn.json')

In [None]:
with open('churn.json') as f:   # The file is closed automatically after reading
    data = json.load(f)

In [None]:
string_mongo = "<your MongoDB connection string>"   # Supply your own connection URI here
client = MongoClient(string_mongo)

In [None]:
db = client['Telecom']

In [None]:
db_c = db['Churn']

In [None]:
db_c.insert_many([data])   # Inserts the whole dataset as a single document
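
The call above stores the entire dataset as one column-oriented document, which is the layout the extraction code in Section 4 expects. If one document per customer is preferred instead, the column-oriented dictionary can be reshaped into a list of row dictionaries first. The sketch below is illustrative: `columns_to_records` is a hypothetical helper and the sample data is made up. With pandas, `df.to_dict("records")` yields the same shape directly.

```python
def columns_to_records(data):
    """Turn {column: {row_label: value}} into a list of per-row dicts."""
    row_labels = next(iter(data.values())).keys()
    return [
        {col: values[label] for col, values in data.items()}
        for label in row_labels
    ]

# Hypothetical two-row sample in the shape produced by df.to_json()
data = {
    "customerID": {"0": "A-001", "1": "B-002"},
    "Churn": {"0": "No", "1": "Yes"},
}
records = columns_to_records(data)
print(records)
# With a live connection, records could then be passed to insert_many(records)
```

Note that switching to per-row documents would also require changing the extraction code, since each document would then hold a single customer rather than whole columns.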