# **Telecommunications Fraud Detection Using MongoDB and Python**

# **Problem Statement**
Telecommunications companies need to detect fraudulent activities such as unauthorized use of
premium services or fake billing. Building a data pipeline with MongoDB and Python could help
identify suspicious activity by extracting data from billing systems, call logs, and other sources,
transforming the data to identify patterns or anomalies, and storing it in MongoDB for further
analysis.
# **Background Information**
Telecommunications companies generate a vast amount of data daily, which can be used to
detect fraud. Fraudulent activity can lead to substantial financial losses and damage the
company's reputation. With the help of data pipelines, companies can detect fraud before it
escalates.






In [2]:
pip install pymongo

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pymongo
  Downloading pymongo-4.3.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (492 kB)
Collecting dnspython<3.0.0,>=1.16.0
  Downloading dnspython-2.3.0-py3-none-any.whl (283 kB)
Installing collected packages: dnspython, pymongo
Successfully installed dnspython-2.3.0 pymongo-4.3.3


In [3]:
import pandas as pd
import pymongo
import logging
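
The functions in this notebook report progress through `logger.info`, but Python's root logger defaults to the `WARNING` level, so those messages are not displayed unless logging is configured first. A minimal sketch of one way to do that follows; the INFO level and the format string are assumptions for illustration, not settings taken from the notebook.

```python
import logging

# Assumption: INFO-level messages should be shown; the format string is illustrative.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
    force=True,  # Python 3.8+: replace handlers the environment may already have attached
)

logger = logging.getLogger(__name__)
logger.info("Logging configured.")
```

`force=True` is included because hosted notebook environments may attach their own handlers to the root logger, in which case a plain `basicConfig` call has no effect.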

In [4]:

from google.colab import files
uploaded = files.upload()

Saving billing_systems.csv to billing_systems.csv
Saving call_logs.csv to call_logs.csv


# **Extract the Data**

In [25]:
# Extraction functions
def extract_calls():
    """Extract call logs from CSV file and convert call duration to minutes for easier analysis."""
    # Load call log data from CSV file
    call_logs = pd.read_csv('call_logs.csv')

    # Convert call duration to minutes for easier analysis
    call_logs['duration_minutes'] = call_logs['call_duration'] / 60

    # Use Python logging module to log errors and activities
    logger = logging.getLogger(__name__)
    logger.info("Call logs extraction completed.")

    return call_logs

def extract_billing():
    """Extract billing systems from CSV file."""
    # Load billing system data from CSV file
    billing_data = pd.read_csv('billing_systems.csv')

    # Use Python logging module to log errors and activities
    logger = logging.getLogger(__name__)
    logger.info("Billing systems extraction completed.")

    return billing_data


In [26]:
df_calls = extract_calls()
df_calls.head()

Unnamed: 0,call_id,caller_number,receiver_number,call_duration,call_type,call_date,duration_minutes
0,1,700123456,712345678,120,Outgoing,2022-02-21,2.0
1,2,712345678,755555555,60,Incoming,2022-02-21,1.0
2,3,722222222,777777777,180,Outgoing,2022-02-22,3.0
3,4,712345678,766666666,90,Incoming,2022-02-23,1.5
4,5,733333333,722222222,240,Outgoing,2022-02-23,4.0


In [27]:
df_billing = extract_billing()
df_billing.head()

Unnamed: 0,transaction_id,customer_id,transaction_amount,transaction_date,transaction_type
0,1,1001,500.0,2022-02-21,Recharge
1,2,1002,200.0,2022-02-21,Recharge
2,3,1001,50.0,2022-02-22,Data
3,4,1003,1000.0,2022-02-22,Recharge
4,5,1004,500.0,2022-02-23,Recharge
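
Both previews show an `Unnamed: 0` column, which typically appears when a CSV file was written together with its DataFrame index. If that column is not needed, `pd.read_csv` can treat it as the index instead. The sketch below uses a small in-memory CSV built from the first two call-log rows shown above, trimmed to three columns for brevity, rather than the real file.

```python
import io

import pandas as pd

# In-memory stand-in for call_logs.csv (first two rows of the preview, fewer columns)
csv_text = (
    ",call_id,caller_number,call_duration\n"
    "0,1,700123456,120\n"
    "1,2,712345678,60\n"
)

# index_col=0 uses the first (unnamed) column as the index,
# so it no longer shows up as a data column named "Unnamed: 0"
df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(df.columns))
```

Whether to drop the column this way depends on whether the saved index carries meaning for later analysis; here it only repeats the row position.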


# **Transform the Data**

In [42]:
# Transformation functions
def transform_call(call_logs):
    """Clean call logs data and transform it to a list of dictionaries."""
    # Data cleaning and handling missing values
    transformed_data = call_logs.dropna()
    transformed_data = transformed_data.drop_duplicates()

    # Use Python logging module to log errors and activities
    logger = logging.getLogger(__name__)
    logger.info("Call logs transformation completed.")
    
    transformed_data = transformed_data.to_dict('records')
    
    return transformed_data

def transform_billing(billing_systems):
    """Clean billing systems data and transform it to a list of dictionaries."""
    # Data cleaning and handling missing values
    transformed_data = billing_systems.dropna()
    transformed_data = transformed_data.drop_duplicates()

    # Use Python logging module to log errors and activities
    logger = logging.getLogger(__name__)
    logger.info("Billing systems transformation completed.")
    
    transformed_data = transformed_data.to_dict('records')
    
    return transformed_data
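
The transformation functions above clean the data but do not yet mark anything as anomalous. As a rough illustration of how a flagging step could operate on the list of dictionaries they return, the sketch below marks calls whose duration is far above the batch median. The function name `flag_long_calls`, the `is_suspicious` field, and the factor of 3 are all hypothetical choices, not part of the original notebook, and a real fraud rule would need to be tuned on actual data.

```python
from statistics import median


def flag_long_calls(records, factor=3.0):
    """Return copies of the records with a boolean 'is_suspicious' field.

    Assumption (not from the notebook): a call is treated as suspicious when
    its duration exceeds `factor` times the median duration of the batch.
    """
    durations = [record["duration_minutes"] for record in records]
    if not durations:
        return []
    threshold = factor * median(durations)
    return [
        {**record, "is_suspicious": record["duration_minutes"] > threshold}
        for record in records
    ]


# The first three durations match the call log preview; the fourth is a hypothetical outlier
sample = [
    {"call_id": 1, "duration_minutes": 2.0},
    {"call_id": 2, "duration_minutes": 1.0},
    {"call_id": 3, "duration_minutes": 3.0},
    {"call_id": 4, "duration_minutes": 45.0},
]
flagged = flag_long_calls(sample)
suspicious_ids = [record["call_id"] for record in flagged if record["is_suspicious"]]
```

A median-based threshold is used here because a single extreme value shifts the median far less than it shifts the mean.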

# **Load the Data**

In [48]:
# Loading function
from pprint import pprint
from pymongo.errors import BulkWriteError

def load_data(combined_data):
    """Load the combined call and billing records into MongoDB."""
    logger = logging.getLogger(__name__)

    # Connect to MongoDB. All connection options belong after the "?" in the URI.
    # Note: tlsInsecure=True disables certificate validation and is only suitable for testing.
    client = pymongo.MongoClient(
        "mongodb+srv://busaz:changeme_123@cluster0.yj2pr.mongodb.net/?minPoolSize=5&maxPoolSize=50&retryWrites=true&w=majority",
        tls=True,
        tlsInsecure=True,
    )
    db = client["busaz"]
    collection = db["busaz"]

    # Create a descending index on call_duration, stored with snappy block compression
    collection.create_index(
        [('call_duration', pymongo.DESCENDING)],
        storageEngine={'wiredTiger': {'configString': 'block_compressor=snappy'}},
    )

    # Insert all records in a single batch; print the details if any write fails
    try:
        collection.insert_many(combined_data)
    except BulkWriteError as bwe:
        pprint(bwe.details)
        logger.error("Data loading failed.")
        return

    logger.info("Data loading completed.")
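
The loading function embeds the connection string, including the password, directly in the code. One common alternative is to read it from an environment variable. The sketch below assumes a variable named `MONGODB_URI`; that name and the helper `get_mongo_uri` are hypothetical and not part of the original notebook.

```python
import os


def get_mongo_uri(env_var="MONGODB_URI"):
    """Return the MongoDB connection string stored in an environment variable.

    Assumption: the variable name MONGODB_URI is illustrative only.
    """
    uri = os.environ.get(env_var)
    if not uri:
        raise RuntimeError(
            f"Set the {env_var} environment variable to the MongoDB connection string."
        )
    return uri
```

With this helper, the client could be created as `pymongo.MongoClient(get_mongo_uri())`, which keeps credentials out of the notebook itself.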

In [52]:
# Example usage
if __name__ == '__main__':
    call_logs = extract_calls()
    billing_systems = extract_billing()
    transformed_call_logs = transform_call(call_logs)
    transformed_billing_systems = transform_billing(billing_systems)
    combined_data = transformed_call_logs + transformed_billing_systems
    load_data(combined_data)
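
Because the call records and billing records have different fields but end up in one list and one collection, it may help to tag each record with its origin before combining them. The sketch below shows one possible way to do this; the helper `tag_records` and the field name `record_type` are hypothetical additions, and the sample values are taken from the first rows of the previews above.

```python
def tag_records(records, record_type):
    """Return copies of the records with a 'record_type' field added."""
    return [{**record, "record_type": record_type} for record in records]


# One record of each kind, using values from the first preview rows
calls = [{"call_id": 1, "call_duration": 120}]
billing = [{"transaction_id": 1, "transaction_amount": 500.0}]

combined = tag_records(calls, "call") + tag_records(billing, "billing")
```

A query such as `collection.find({"record_type": "call"})` could then select only the call records from the shared collection.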