

# Data Abstraction End-to-End Series (Mongo DB)

---
# **Table of Contents**
---

**1.** [**Introduction**](#Section1)<br>
  - **1.1** [**What is MongoDB?**](#Section11)

**2.** [**Problem Statement**](#Section2)<br>

**3.** [**Installing and Importing Libraries**](#Section3)<br>

**4.** [**Importing CSV to MongoDB**](#Section6)<br>

---
<a name = Section1></a>
# **1. Introduction**
---

<a name = Section11></a>
### **1.1 What is MongoDB?**

- MongoDB is a **document database**, which means it **stores data** in **JSON-like documents**.

- JSON Document Characteristics:
 - The most **natural and productive** way to work with data.
 - Supports **arrays** and **nested objects** as values.
 - Allows for **flexible and dynamic schemas**.

- **Support** for aggregations and other modern use-cases such as **geo-based search**, **graph search**, and **text search**.

- **Queries are themselves JSON**, and thus easily composable. No more concatenating strings to dynamically generate SQL queries.

- All the power of a relational database:
  - **Support for joins** in queries.
  - Distributed multi-document **ACID transactions** with snapshot isolation

---
<a name = Section2></a>
# **2. Problem Statement**
---

Due to the **boom** in the telecom industry with **4G technology**, it has become a pain in the neck for the company to **retain their customers**.

<center><img src="https://raw.githubusercontent.com/insaid2018/Domain_Case_Studies/master/Telecom/churn2.png"width="350" height="220"/></center>

- They are in the **middle** of setting up more **cell sites** on the **4G network** to improve their **4G services**.

-  It is **plausible** for customers to choose **4G services** over **3G services** due to benefits of **cost,speed, latency etc**.



- Till now they have been using manual traditional ways which now has become a problem to handle due to work complication.

<center><img src = "https://raw.githubusercontent.com/insaid2018/Term-2/master/images/87217572-51866a00-c368-11ea-90b5-dd2e28fd00de.jpg"width="400" height="280"/></center>

- They have a detailed history of their customers and are looking for an automated solution toidentify the likeliness of customer churning from using their services.

- The data is **stored** in their **MongoDB** and you need to **extract to move further with your Data Science skills**.

---
<a name = Section3></a>
# **3. Installing & Importing Libraries**
---


### **Installing Libraries**

In [None]:
!pip install pymongo[srv]


### **Importing Libraries**

In [1]:
import pymongo
pymongo.version

'3.11.2'

In [2]:
from pymongo import MongoClient
import urllib
import pandas as pd
import json
import numpy as np
from bson import ObjectId

---
<a name = Section4></a>
# **4. Data Acquisition using MongoDB**
---


### **Calling MongoClient to connect to our data base**

In [3]:
#"mongodb+srv://test":+urllib.parse.quote('test')+"@cluster0.xznab.mongodb.net/Telecom?retryWrites=true&w=majority"

In [4]:
string_mongo = "mongodb+srv://test:"+urllib.parse.quote('test')+"@cluster0.xznab.mongodb.net/Telecom?retryWrites=true&w=majority"
client = MongoClient(string_mongo)

In [5]:
db = client['Telecom']

In [6]:
db

Database(MongoClient(host=['cluster0-shard-00-01.xznab.mongodb.net:27017', 'cluster0-shard-00-02.xznab.mongodb.net:27017', 'cluster0-shard-00-00.xznab.mongodb.net:27017'], document_class=dict, tz_aware=False, connect=True, retrywrites=True, w='majority', authsource='admin', replicaset='atlas-k11l4q-shard-0', ssl=True), 'Telecom')


### **Getting the list of all the collections in our DB**

In [7]:
db.list_collection_names()   # To see the names of all the collections

['Churn']

In [8]:
db_c = db['Churn']


### **Counting number of Documents in our collection**

In [9]:
db_c.count_documents({})      # To count the number of documents present in the database

1


### **Finding the Correct Document**

In [10]:
cursor = db_c.find({})        # To see all the documents present in the collection
for document in cursor:
    print(document)

{'_id': ObjectId('601931bd1529e4bd2df2900b'), 'customerID': {'0': '7590-VHVEG', '1': '5575-GNVDE', '2': '3668-QPYBK', '3': '7795-CFOCW', '4': '9237-HQITU', '5': '9305-CDSKC', '6': '1452-KIOVK', '7': '6713-OKOMC', '8': '7892-POOKP', '9': '6388-TABGU', '10': '9763-GRSKD', '11': '7469-LKBCI', '12': '8091-TTVAX', '13': '0280-XJGEX', '14': '5129-JLPIS', '15': '3655-SNQYZ', '16': '8191-XWSZG', '17': '9959-WOFKT', '18': '4190-MFLUW', '19': '4183-MYFRB', '20': '8779-QRDMV', '21': '1680-VDCWW', '22': '1066-JKSGK', '23': '3638-WEABW', '24': '6322-HRPFA', '25': '6865-JZNKO', '26': '6467-CHFZW', '27': '8665-UTDHZ', '28': '5248-YGIJN', '29': '8773-HHUOZ', '30': '3841-NFECX', '31': '4929-XIHVW', '32': '6827-IEAUQ', '33': '7310-EGVHZ', '34': '3413-BMNZE', '35': '6234-RAAPL', '36': '6047-YHPVI', '37': '6572-ADKRS', '38': '5380-WJKOV', '39': '8168-UQWWF', '40': '8865-TNMNX', '41': '9489-DEDVP', '42': '9867-JCZSP', '43': '4671-VJLCL', '44': '4080-IIARD', '45': '3714-NTNFO', '46': '5948-UJZLF', '47': '77


### **Fetching and calling the data in our notebook**

In [11]:
mongo_docs = db_c.find({'_id': ObjectId('601931bd1529e4bd2df2900b')})
mongo_docs

<pymongo.cursor.Cursor at 0x1ea85480160>


### **Converting MongoDB's JSON format to CSV**

In [12]:
fields = {}
for doc in mongo_docs:
    for key, val in doc.items():
        try:
            fields[Key] = np.append(fields[key], val)
        except KeyError:
            fields[key] = np.array([val])

print(fields)

{'_id': array([ObjectId('601931bd1529e4bd2df2900b')], dtype=object), 'customerID': array([{'0': '7590-VHVEG', '1': '5575-GNVDE', '2': '3668-QPYBK', '3': '7795-CFOCW', '4': '9237-HQITU', '5': '9305-CDSKC', '6': '1452-KIOVK', '7': '6713-OKOMC', '8': '7892-POOKP', '9': '6388-TABGU', '10': '9763-GRSKD', '11': '7469-LKBCI', '12': '8091-TTVAX', '13': '0280-XJGEX', '14': '5129-JLPIS', '15': '3655-SNQYZ', '16': '8191-XWSZG', '17': '9959-WOFKT', '18': '4190-MFLUW', '19': '4183-MYFRB', '20': '8779-QRDMV', '21': '1680-VDCWW', '22': '1066-JKSGK', '23': '3638-WEABW', '24': '6322-HRPFA', '25': '6865-JZNKO', '26': '6467-CHFZW', '27': '8665-UTDHZ', '28': '5248-YGIJN', '29': '8773-HHUOZ', '30': '3841-NFECX', '31': '4929-XIHVW', '32': '6827-IEAUQ', '33': '7310-EGVHZ', '34': '3413-BMNZE', '35': '6234-RAAPL', '36': '6047-YHPVI', '37': '6572-ADKRS', '38': '5380-WJKOV', '39': '8168-UQWWF', '40': '8865-TNMNX', '41': '9489-DEDVP', '42': '9867-JCZSP', '43': '4671-VJLCL', '44': '4080-IIARD', '45': '3714-NTNFO',

In [13]:
series_list = []
columns = []
for key, val in fields.items():
    if key != "_id":
        #fields[key] = pd.Series(fields[key])
        #fields[key].index = fields["_id"]
        columns.append(key)
        series_list.append(fields[key])

In [14]:
columns

['customerID',
 'tenure',
 'PhoneService',
 'Contract',
 'PaperlessBilling',
 'PaymentMethod',
 'MonthlyCharges',
 'TotalCharges',
 'Churn']

In [25]:
df_series = {}
temp = []
for num, series in enumerate(series_list):
    val = series[0].values()
    df_series[columns[num]] = val
    temp.append(val)

mongo_df = pd.DataFrame(df_series)


In [26]:
mongo_df.head()

Unnamed: 0,customerID,tenure,PhoneService,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,"(7590-VHVEG, 5575-GNVDE, 3668-QPYBK, 7795-CFOC...","(1, 34, 2, 45, 2, 8, 22, 10, 28, 62, 13, 16, 5...","(No, Yes, Yes, No, Yes, Yes, Yes, No, Yes, Yes...","(Month-to-month, One year, Month-to-month, One...","(Yes, No, Yes, No, Yes, Yes, Yes, No, Yes, No,...","(Electronic check, Mailed check, Mailed check,...","(29.85, 56.95, 53.85, 42.3, 70.7, 99.65, 89.1,...","(29.85, 1889.5, 108.15, 1840.75, 151.65, 820.5...","(No, No, Yes, No, Yes, Yes, No, No, Yes, No, N..."
1,"(7590-VHVEG, 5575-GNVDE, 3668-QPYBK, 7795-CFOC...","(1, 34, 2, 45, 2, 8, 22, 10, 28, 62, 13, 16, 5...","(No, Yes, Yes, No, Yes, Yes, Yes, No, Yes, Yes...","(Month-to-month, One year, Month-to-month, One...","(Yes, No, Yes, No, Yes, Yes, Yes, No, Yes, No,...","(Electronic check, Mailed check, Mailed check,...","(29.85, 56.95, 53.85, 42.3, 70.7, 99.65, 89.1,...","(29.85, 1889.5, 108.15, 1840.75, 151.65, 820.5...","(No, No, Yes, No, Yes, Yes, No, No, Yes, No, N..."
2,"(7590-VHVEG, 5575-GNVDE, 3668-QPYBK, 7795-CFOC...","(1, 34, 2, 45, 2, 8, 22, 10, 28, 62, 13, 16, 5...","(No, Yes, Yes, No, Yes, Yes, Yes, No, Yes, Yes...","(Month-to-month, One year, Month-to-month, One...","(Yes, No, Yes, No, Yes, Yes, Yes, No, Yes, No,...","(Electronic check, Mailed check, Mailed check,...","(29.85, 56.95, 53.85, 42.3, 70.7, 99.65, 89.1,...","(29.85, 1889.5, 108.15, 1840.75, 151.65, 820.5...","(No, No, Yes, No, Yes, Yes, No, No, Yes, No, N..."
3,"(7590-VHVEG, 5575-GNVDE, 3668-QPYBK, 7795-CFOC...","(1, 34, 2, 45, 2, 8, 22, 10, 28, 62, 13, 16, 5...","(No, Yes, Yes, No, Yes, Yes, Yes, No, Yes, Yes...","(Month-to-month, One year, Month-to-month, One...","(Yes, No, Yes, No, Yes, Yes, Yes, No, Yes, No,...","(Electronic check, Mailed check, Mailed check,...","(29.85, 56.95, 53.85, 42.3, 70.7, 99.65, 89.1,...","(29.85, 1889.5, 108.15, 1840.75, 151.65, 820.5...","(No, No, Yes, No, Yes, Yes, No, No, Yes, No, N..."
4,"(7590-VHVEG, 5575-GNVDE, 3668-QPYBK, 7795-CFOC...","(1, 34, 2, 45, 2, 8, 22, 10, 28, 62, 13, 16, 5...","(No, Yes, Yes, No, Yes, Yes, Yes, No, Yes, Yes...","(Month-to-month, One year, Month-to-month, One...","(Yes, No, Yes, No, Yes, Yes, Yes, No, Yes, No,...","(Electronic check, Mailed check, Mailed check,...","(29.85, 56.95, 53.85, 42.3, 70.7, 99.65, 89.1,...","(29.85, 1889.5, 108.15, 1840.75, 151.65, 820.5...","(No, No, Yes, No, Yes, Yes, No, No, Yes, No, N..."


In [None]:
mongo_df.dtypes

In [None]:
mongo_df.to_csv("Churn_data.csv")

---
<a name = Section4></a>
# **5. Data Acquisition using MySQL**
---

---
<a name = Section4></a>
# **6. importing CSV to MongoDB**
---

In [None]:
df = pd.read_csv("C:/Users/lenovo/Documents/churn_data.csv")

In [None]:
df.to_json('churn.json')

In [None]:
json_df = open('churn.json').read()
data = json.loads(json_df)

In [None]:
string_mongo = 
client = MongoClient(string_mongo)

In [None]:
db = client['Telecom']

In [None]:
db_c = db['Churn']

In [None]:
db_c.insert_many([data])