In MongoDB:
- we created a database named capstonejigsawdb;
- a SQL table corresponds to a MongoDB collection, and here we have a single collection, loandata;
- a SQL row corresponds to a MongoDB document.

We loaded the CSV file into this collection, so each CSV row became one document.
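The loading step itself is not shown in this notebook. One way to do it from Python is sketched below, assuming the CSV has a header row. The point worth noting is type conversion: the queries later compare `loan_default` to the integer `1`, so numeric columns must be stored as numbers, not strings. The helper names `parse_value` and `read_csv_documents` and the file name in the usage comment are hypothetical.

```python
import csv


def parse_value(text):
    """Convert a CSV cell to int or float when possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def read_csv_documents(path):
    """Read a CSV file into a list of dicts, one dict (document) per row."""
    with open(path, newline="", encoding="utf-8") as f:
        return [
            {key: parse_value(value) for key, value in row.items()}
            for row in csv.DictReader(f)
        ]


# Usage (hypothetical file name; mycol is the collection opened below):
# mycol.insert_many(read_csv_documents("loan_data.csv"))
```

Empty cells stay as empty strings in this sketch; mapping them to `None` instead would be a reasonable variation.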

In [11]:
from pymongo.mongo_client import MongoClient
from prettytable import PrettyTable
from pprint import pprint

In [2]:
uri = "<uri-of-data>"
client = MongoClient(uri)
# Send a ping to confirm a successful connection
try:
    client.admin.command('ping')
    print("Pinged your deployment. You successfully connected to MongoDB!")
except Exception as e:
    print(e)

Pinged your deployment. You successfully connected to MongoDB!
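Rather than pasting the connection string into the notebook, it can be read from an environment variable so credentials stay out of the saved file. A minimal sketch, assuming a variable named `MONGODB_URI` (a hypothetical name) and a hypothetical helper `read_mongo_uri`:

```python
import os


def read_mongo_uri(var_name="MONGODB_URI"):
    """Return the connection string stored in the given environment variable."""
    uri = os.environ.get(var_name)
    if not uri:
        raise RuntimeError(f"Set the {var_name} environment variable to your connection string")
    return uri


# Usage (requires pymongo and a reachable deployment):
# from pymongo import MongoClient
# client = MongoClient(read_mongo_uri(), serverSelectionTimeoutMS=5000)
```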


In [3]:
mydb = client["capstonejigsawdb"]
mycol = mydb["loandata"]


# Reporting:

#### 1. Get the total number of documents in the collection

In [4]:
total_count = mycol.count_documents({})
print("Total number of documents :",total_count)

Total number of documents : 23315


#### 2. Count the loan defaulters and non-defaulters, and express each group as a percentage of all documents

In [24]:
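# Count documents per loan_default value (0 = no default, 1 = default) and
# express each count as a percentage string of total_count
loansDefault_qry = [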
  {
    "$group": {
      "_id": "$loan_default",
      "count": { "$sum": 1 }
    }
  },
  {
    "$project": {
      "_id": 0,
      "Loan_Default": "$_id",
      "count": 1,
      "percentage": {
        "$concat": [
          { "$toString": { "$multiply": [ { "$divide": [ "$count", total_count ] }, 100 ] } },
          "%"
        ]
      }
    }
  }
]

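# aggregate() returns a cursor that can be iterated only once; it is
# consumed by the table-printing loop in the next cell
loan_defaulters = mycol.aggregate(loansDefault_qry)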

#print('Number of Loans Defaulters in terms of percentage')
#for cust1 in loan_defaulters:
#    pprint(cust1)


In [25]:
pt = PrettyTable()
pt.field_names =['Loan Default', 'Count', 'Percentage']
for cust in loan_defaulters:
    pt.add_row([cust['Loan_Default'], cust['count'], cust['percentage']])
    
print(pt)    

+--------------+-------+------------+
| Loan Default | Count | Percentage |
+--------------+-------+------------+
|      0       | 18189 |  78.0142%  |
|      1       |  5126 |  21.9858%  |
+--------------+-------+------------+
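The percentage strings above carry several decimals because `$toString` prints the full double. A sketch that rounds on the server with `$round` (available in MongoDB 4.2 and later) and keeps the value numeric instead of building a string; the function name `default_share_pipeline` is hypothetical.

```python
def default_share_pipeline(total_count, places=2):
    """Count documents per loan_default value, with each count's share
    of total_count (in percent) rounded to `places` decimals."""
    return [
        {"$group": {"_id": "$loan_default", "count": {"$sum": 1}}},
        {
            "$project": {
                "_id": 0,
                "Loan_Default": "$_id",
                "count": 1,
                "percentage": {
                    "$round": [
                        {"$multiply": [{"$divide": ["$count", total_count]}, 100]},
                        places,
                    ]
                },
            }
        },
        {"$sort": {"Loan_Default": 1}},
    ]


# Usage with the collection from earlier cells (requires a live connection):
# for row in mycol.aggregate(default_share_pipeline(total_count)):
#     print(row)

# The same arithmetic in plain Python, using the counts reported above:
print(round(18189 / 23315 * 100, 2))  # 78.01
print(round(5126 / 23315 * 100, 2))   # 21.99
```

Keeping the percentage numeric also lets PrettyTable or pandas format it later without parsing strings.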


#### 3. Count the loan defaulters for each Employment Type

In [27]:
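# Per Employment_Type, count only documents with loan_default == 1,
# then drop employment types that have no defaulters
numEmps_qry = [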
  {
    "$group": {
      "_id": "$Employment_Type",
      "count": { "$sum": { "$cond": [{ "$eq": ["$loan_default", 1] }, 1, 0] } }
    }
  },
  {
    "$match": {
      "count": { "$gt": 0 }
    }
  }
]
etype_defaults = mycol.aggregate(numEmps_qry)


In [29]:
pt = PrettyTable()
pt.field_names =['Employment Type', 'Count']
for cust in etype_defaults:
    pt.add_row([cust['_id'], cust['count']])
    
print(pt)  

+-----------------+-------+
| Employment Type | Count |
+-----------------+-------+
|  Self employed  |  3111 |
|     Salaried    |  2015 |
+-----------------+-------+
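To make the two stages above concrete, the loop below reproduces them in plain Python over a few hypothetical sample documents. Because `loan_default` holds only 0 and 1 (assumed numeric), `{"$sum": "$loan_default"}` would give the same count as the `$cond` expression. The helper `defaulters_by_type` and `SAMPLE_DOCS` are illustrative only.

```python
from collections import defaultdict


def defaulters_by_type(docs):
    """Mirror the $group (conditional sum) and $match (count > 0) stages."""
    counts = defaultdict(int)
    for doc in docs:
        # Same as {"$sum": {"$cond": [{"$eq": ["$loan_default", 1]}, 1, 0]}}
        counts[doc.get("Employment_Type")] += 1 if doc["loan_default"] == 1 else 0
    # Same as {"$match": {"count": {"$gt": 0}}}
    return {etype: n for etype, n in counts.items() if n > 0}


# Hypothetical sample documents, not taken from the real collection
SAMPLE_DOCS = [
    {"Employment_Type": "Salaried", "loan_default": 1},
    {"Employment_Type": "Salaried", "loan_default": 0},
    {"Employment_Type": "Self employed", "loan_default": 1},
    {"Employment_Type": "Self employed", "loan_default": 1},
    {"Employment_Type": None, "loan_default": 0},
]

print(defaulters_by_type(SAMPLE_DOCS))  # {'Salaried': 1, 'Self employed': 2}
```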


#### 4. Examine the number of defaulters and non-defaulters across employment types (percentages are of all documents, not within each employment type)

In [32]:
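# Count documents per (Employment_Type, loan_default) pair; the second $group and
# the $map/$unwind/$replaceRoot stages attach each pair's share of all documents
# and flatten the result back to one row per pair
employeesDefault_qry = [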
  {
    "$group": {
      "_id": {
        "eType": "$Employment_Type",
        "loanType": "$loan_default"
      },
      "total": {
        "$sum": 1
      }
    }
  },
  {
    "$group": {
      "_id": "$_id.eType",
      "eType": {
        "$push": "$$ROOT"
      },
      "total": {
        "$sum": "$total"
      }
    }
  },
  {
    "$addFields": {
      "eType": {
        "$map": {
          "input": "$eType",
          "in": {
            "_id": "$$this._id",
            "count":"$$this.total",
            "percentage": {
              "$multiply": [
                {
                  "$divide": [ "$$this.total", total_count ]
                },
                100
              ]
            }
          }
        }
      }
    }
  },
  {
    "$unwind": "$eType"
  },
  {
    "$replaceRoot": {
      "newRoot": "$eType"
    }
  }
]

default_by_etype = mycol.aggregate(employeesDefault_qry)

In [33]:
pt = PrettyTable()
pt.field_names =['Employment Type','LoanDefault', 'Count', 'Percentage']
for cust in default_by_etype:
    pt.add_row([cust['_id']['eType'], cust['_id']['loanType'], cust['count'], cust['percentage']])
    
print(pt)

+-----------------+-------------+-------+--------------------+
| Employment Type | LoanDefault | Count |     Percentage     |
+-----------------+-------------+-------+--------------------+
|  Self employed  |      0      | 10383 | 44.53356208449496  |
|  Self employed  |      1      |  3111 | 13.343341196654515 |
|     Salaried    |      0      |  7806 | 33.48059189363071  |
|     Salaried    |      1      |  2015 | 8.642504825219815  |
+-----------------+-------------+-------+--------------------+
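The percentages above are shares of all 23315 documents, so they mix group size with default behaviour. The default rate within each employment type divides by the group's own total instead. Assuming `loan_default` is stored as the numbers 0 and 1, `$avg` of the field is exactly that rate. A sketch; the function name `default_rate_pipeline` is hypothetical.

```python
def default_rate_pipeline(group_field):
    """Per value of group_field: loan count, defaulter count, and default rate in %."""
    return [
        {
            "$group": {
                "_id": f"${group_field}",
                "loans": {"$sum": 1},
                "defaulters": {"$sum": "$loan_default"},
                # Mean of a 0/1 field = fraction of documents equal to 1
                "default_rate": {"$avg": "$loan_default"},
            }
        },
        {
            "$project": {
                "_id": 0,
                group_field: "$_id",
                "loans": 1,
                "defaulters": 1,
                "default_rate_pct": {"$round": [{"$multiply": ["$default_rate", 100]}, 2]},
            }
        },
        {"$sort": {"default_rate_pct": -1}},
    ]


# Usage: mycol.aggregate(default_rate_pipeline("Employment_Type"))

# The same rates computed from the counts in the table above:
for etype, non_defaulters, defaulters in [("Self employed", 10383, 3111), ("Salaried", 7806, 2015)]:
    rate = defaulters / (non_defaulters + defaulters) * 100
    print(etype, round(rate, 2))
# Self employed 23.05
# Salaried 20.52
```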


#### 5. Get the number of defaulters per branch, with each branch's share of all documents as a percentage

In [51]:
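# Keep only defaulters, count them per branch_id, attach each branch's share
# of all documents, and sort by count, highest first
branchwise_qry = [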
  {"$match": {"loan_default":1}},
  {  
    "$group": {
      "_id": {
        "branch": "$branch_id"
      },
      "total": {
        "$sum": 1
      }
    }
  },
  {
    "$group": {
      "_id": "$_id.branch",
      "branch": {
        "$push": "$$ROOT"
      },
      "total": {
        "$sum": "$total"
      }
    }
  },
  {
    "$addFields": {
      "branch": {
        "$map": {
          "input": "$branch",
          "in": {
            "_id": "$$this._id",
            "count":"$$this.total",
            "percentage": {
              "$multiply": [
                {
                  "$divide": [ "$$this.total", total_count ]
                },
                100
              ]
            }
          }
        }
      }
    }
  },  
  {
    "$unwind": "$branch"
  },
  {
    "$replaceRoot": {
      "newRoot": "$branch"
    }
  },
  {"$sort": {"count":-1}}    
]

branchWise = mycol.aggregate(branchwise_qry)


In [52]:

pt = PrettyTable()
pt.field_names =['Branch Id', 'Count', 'Percentage']
for cust in branchWise:
    pt.add_row([cust['_id']['branch'], cust['count'], cust['percentage']])
    
print(pt)

+-----------+-------+----------------------+
| Branch Id | Count |      Percentage      |
+-----------+-------+----------------------+
|     36    |  292  |  1.2524126099077846  |
|     2     |  241  |  1.0336693116019728  |
|     67    |  214  |  0.9178640360283079  |
|     5     |  198  |   0.84923868754021   |
|     16    |  180  |  0.7720351704911002  |
|    136    |  167  |  0.7162770748445206  |
|     3     |  162  |  0.6948316534419902  |
|    146    |  153  |  0.6562298949174351  |
|     74    |  139  |  0.5961827149903496  |
|     34    |  139  |  0.5961827149903496  |
|     18    |  133  |  0.5704482093073129  |
|    251    |  120  |  0.5146901136607335  |
|    147    |  115  |  0.4932446922582029  |
|     65    |  113  | 0.48466652369719065  |
|    120    |  108  |  0.4632211022946601  |
|     10    |  107  | 0.45893201801415395  |
|     61    |  104  | 0.44606476517263566  |
|     11    |   99  |  0.424619343770105   |
|    138    |   92  |  0.3945957538065623  |
|     20  
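After `$match`, the two `$group` stages plus `$map`/`$unwind`/`$replaceRoot` in the branch-wise query produce one row per branch, which a single `$group` followed by `$project` can also do. The sketch below takes that shorter route; note that the branch id becomes a top-level `branch` field rather than `_id.branch`, so the printing loop would need adjusting. It also adds a secondary sort key so tied branches (such as 74 and 34, with 139 defaulters each) come out in a stable order, and an optional `$limit` to keep the table short. The function name `branch_defaults_pipeline` is hypothetical.

```python
def branch_defaults_pipeline(total_count, top_n=None):
    """Defaulters per branch_id with their share of all documents, highest first."""
    pipeline = [
        {"$match": {"loan_default": 1}},
        {"$group": {"_id": "$branch_id", "count": {"$sum": 1}}},
        {
            "$project": {
                "_id": 0,
                "branch": "$_id",
                "count": 1,
                "percentage": {"$multiply": [{"$divide": ["$count", total_count]}, 100]},
            }
        },
        # Key order matters: sort by count first, then by branch id to break ties
        {"$sort": {"count": -1, "branch": 1}},
    ]
    if top_n is not None:
        pipeline.append({"$limit": top_n})
    return pipeline


# Usage with the collection and total_count from earlier cells:
# for row in mycol.aggregate(branch_defaults_pipeline(total_count, top_n=10)):
#     pt.add_row([row["branch"], row["count"], row["percentage"]])
```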

#### 6. Get the defaulters by age generation (percentages are of all documents); generation names follow https://en.wikipedia.org/wiki/Generation#List_of_social_generations
#### Birth years used: 1946 - 1964: Baby Boomers; 1965 - 1976: Gen X; 1977 - 1995: Millennials; 1995 - 2010: Gen Z

In [59]:
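# Keep only defaulters, count them per GenCategory, and attach each
# generation's share of all documents
agegen_qry = [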
  {"$match": {"loan_default":1}},  
  {
    "$group": {
      "_id": {
        "genage": "$GenCategory",
        "loanType": "$loan_default"
      },
      "total": {
        "$sum": 1
      }
    }
  },
  {
    "$group": {
      "_id": "$_id.genage",
      "genage": {
        "$push": "$$ROOT"
      },
      "total": {
        "$sum": "$total"
      }
    }
  },
  {
    "$addFields": {
      "genage": {
        "$map": {
          "input": "$genage",
          "in": {
            "_id": "$$this._id",
            "count":"$$this.total",
            "percentage": {
              "$multiply": [
                {
                  "$divide": [ "$$this.total", total_count ]
                },
                100
              ]
            }
          }
        }
      }
    }
  },
  {
    "$unwind": "$genage"
  },
  {
    "$replaceRoot": {
      "newRoot": "$genage"
    }
  }
]
agedefaults = mycol.aggregate(agegen_qry)

In [60]:
pt = PrettyTable()
pt.field_names =['Age-Generation-Name', 'Count', 'Percentage']
for cust in agedefaults:
    pt.add_row([cust['_id']['genage'], cust['count'], cust['percentage']])
    
print(pt)

+---------------------+-------+--------------------+
| Age-Generation-Name | Count |     Percentage     |
+---------------------+-------+--------------------+
|      millenials     |  3239 | 13.892343984559297 |
|       zoomers       |  290  | 1.2438344413467723 |
|         genx        |  1433 | 6.146257773965258  |
|       boomers       |  164  | 0.7034098220030024 |
+---------------------+-------+--------------------+
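The GenCategory field is not derived anywhere in this notebook. A sketch of how it could be computed, assuming a numeric birth-year field named `birth_year` (hypothetical; the real collection may store a date of birth instead). The heading lists 1995 under both Millennials and Gen Z, so this sketch assumes 1995 belongs to Millennials. The labels match the values stored in the collection, including the spelling "millenials"; `gen_category` and `gen_category_stage` are hypothetical helpers.

```python
# (first birth year, last birth year, stored label); 1995 assigned to Millennials by assumption
GENERATIONS = [
    (1946, 1964, "boomers"),
    (1965, 1976, "genx"),
    (1977, 1995, "millenials"),
    (1996, 2010, "zoomers"),
]


def gen_category(birth_year):
    """Return the generation label for a birth year, or None if out of range."""
    for start, end, label in GENERATIONS:
        if start <= birth_year <= end:
            return label
    return None


def gen_category_stage(year_field="birth_year"):
    """Build an $addFields stage that computes GenCategory on the server with $switch."""
    branches = [
        {
            "case": {"$and": [{"$gte": [f"${year_field}", start]}, {"$lte": [f"${year_field}", end]}]},
            "then": label,
        }
        for start, end, label in GENERATIONS
    ]
    return {"$addFields": {"GenCategory": {"$switch": {"branches": branches, "default": None}}}}


# Usage: prepend gen_category_stage() to a pipeline, or persist the field with
# mycol.update_many({}, [gen_category_stage()])
```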
