**Problem Statement**

Equipment failure is a major cause of downtime in the telecommunications industry and can
result in significant financial losses and customer dissatisfaction. To minimize downtime and
ensure optimal performance, it is crucial to identify potential equipment failures early and
schedule maintenance proactively. This requires collecting and analyzing large amounts of
data generated by equipment and network sensors.

The deliverable for this project is a data pipeline that efficiently collects, cleans, and analyzes
equipment and network sensor data. The pipeline should identify potential equipment failures
so that maintenance can be scheduled proactively, minimizing downtime and improving
overall equipment performance. The pipeline is built with Python and PostgreSQL, with the
PostgreSQL database hosted on Google Cloud.


**Install psycopg2**

In [1]:
# Install psycopg2 (the binary wheel needs no local PostgreSQL build tools)
!pip install psycopg2-binary

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting psycopg2-binary
  Downloading psycopg2_binary-2.9.5-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.0 MB)
Installing collected packages: psycopg2-binary
Successfully installed psycopg2-binary-2.9.5


In [3]:
#Import Libraries

import pandas as pd
import psycopg2
from sqlalchemy import create_engine

**Data Extraction**

In [6]:
def extract_data():
    # Equipment sensor dataframe
    equipment_df = pd.read_csv('equipment_sensor.csv')

    # Network sensor dataframe
    network_df = pd.read_csv('network_sensor.csv')

    # Maintenance dataframe
    maintenance_df = pd.read_csv('maintenance_records.csv')

    return equipment_df, network_df, maintenance_df

**Transform Data**

Transform the data by removing duplicates, dropping rows with missing values, and merging the date and time columns into a single timestamp for consistency.

In [19]:
def transform_data(equipment_df, network_df, maintenance_df):
    # Remove duplicate rows and rows with missing values. Working on new
    # copies leaves the caller's DataFrames untouched and avoids pandas'
    # SettingWithCopyWarning when columns are added below.
    equipment_df = equipment_df.drop_duplicates().dropna().copy()
    network_df = network_df.drop_duplicates().dropna().copy()
    maintenance_df = maintenance_df.drop_duplicates().dropna().copy()

    # Normalize the data by merging date and time into a new column then drop the other 2 columns
    equipment_df['date_time'] = pd.to_datetime(equipment_df['date'] + ' ' + equipment_df['time'])
    equipment_df.drop(['date', 'time'], axis=1, inplace=True)

    network_df['date_time'] = pd.to_datetime(network_df['date'] + ' ' + network_df['time'])
    network_df.drop(['date', 'time'], axis=1, inplace=True)

    maintenance_df['date_time'] = pd.to_datetime(maintenance_df['date'] + ' ' + maintenance_df['time'])
    maintenance_df.drop(['date', 'time'], axis=1, inplace=True)

    # Aggregate the readings per sensor ID for the equipment and network data
    equipment_df = equipment_df.groupby('ID').agg({'date_time': ['min', 'max'], 'sensor_reading': ['mean', 'max']})
    equipment_df.columns = ['first_seen', 'last_seen', 'average_reading', 'max_reading']
    network_df = network_df.groupby('ID').agg({'date_time': ['min', 'max'], 'sensor_reading': ['mean', 'max']})
    network_df.columns = ['first_seen', 'last_seen', 'average_reading', 'max_reading']

    # Merge equipment data with network data into one dataset. Both frames share
    # the same column names, so explicit suffixes replace pandas' default _x/_y.
    sensor_df = pd.merge(equipment_df, network_df, how='outer', left_index=True, right_index=True,
                         suffixes=('_equipment', '_network'))
    sensor_df = sensor_df.reset_index()
    sensor_df = sensor_df.rename(columns={'ID': 'equipment_ID'})

    maintenance_df = maintenance_df[['date_time', 'equipment_ID', 'maintenance_type']]

    return sensor_df, maintenance_df
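
To make the transformation steps concrete, here is a minimal sketch that runs the same date/time merge and per-ID aggregation on a tiny in-memory sample. The sample values and the helper name summarize_readings are made up for illustration; only the column names (ID, date, time, sensor_reading) follow the code above.

```python
import pandas as pd

def summarize_readings(df):
    """Hypothetical helper: clean one sensor frame and aggregate it per ID."""
    df = df.drop_duplicates().dropna().copy()
    # Combine the separate date and time strings into one datetime column
    df['date_time'] = pd.to_datetime(df['date'] + ' ' + df['time'])
    df = df.drop(columns=['date', 'time'])
    # Named aggregation gives the output columns readable names directly
    summary = df.groupby('ID').agg(
        first_seen=('date_time', 'min'),
        last_seen=('date_time', 'max'),
        average_reading=('sensor_reading', 'mean'),
        max_reading=('sensor_reading', 'max'),
    )
    return summary

# Made-up sample data: one duplicate row and one row with a missing reading
sample = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2],
    'date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-01', '2023-01-03'],
    'time': ['08:00:00', '08:00:00', '09:30:00', '10:00:00', '11:00:00'],
    'sensor_reading': [10.0, 10.0, 20.0, 5.0, None],
})

summary = summarize_readings(sample)
print(summary)
```

The duplicate row for ID 1 and the incomplete row for ID 2 are both dropped before aggregation, so ID 1 is summarized from two readings and ID 2 from one.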

**Data Analysis**





**Data Loading**
We load the data into a cloud-hosted PostgreSQL database.

In [10]:
#Define Cloud DB connection
POSTGRES_ADDRESS = '35.237.226.12'
POSTGRES_PORT = '5432'
POSTGRES_USERNAME = 'postgres'
POSTGRES_PASSWORD = 'password'
POSTGRES_DBNAME = 'telecommunications_data'
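
Hardcoding credentials in a notebook makes them easy to leak. One alternative, sketched below, assumes the settings are supplied through environment variables. The variable names (PGUSER, PGPASSWORD, PGHOST, PGPORT, PGDATABASE) follow libpq's conventions, and build_postgres_url is a hypothetical helper that is not part of the original pipeline. The quote_plus call escapes characters such as @ or / in the password that would otherwise break the URL.

```python
import os
from urllib.parse import quote_plus

def build_postgres_url(env=os.environ):
    """Hypothetical helper: build a SQLAlchemy-style URL from environment variables."""
    username = env.get('PGUSER', 'postgres')
    password = env['PGPASSWORD']  # no default: fail loudly if it is missing
    host = env.get('PGHOST', 'localhost')
    port = env.get('PGPORT', '5432')
    dbname = env.get('PGDATABASE', 'telecommunications_data')
    # URL-encode the user name and password so special characters stay intact
    return 'postgresql://{}:{}@{}:{}/{}'.format(
        quote_plus(username), quote_plus(password), host, port, dbname)

# Example with made-up values passed in directly instead of real environment variables
example_url = build_postgres_url({'PGPASSWORD': 'p@ss/word', 'PGHOST': 'db.example.com'})
print(example_url)
```

The resulting string can be passed to create_engine in place of a URL assembled from constants.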

In [11]:
# Build the connection URL and create an engine to load data

postgres_url = ('postgresql://{username}:{password}@{ipaddress}:{port}/{dbname}'
                .format(username=POSTGRES_USERNAME,
                        password=POSTGRES_PASSWORD,
                        ipaddress=POSTGRES_ADDRESS,
                        port=POSTGRES_PORT,
                        dbname=POSTGRES_DBNAME))
engine = create_engine(postgres_url)

**Load data**
We use the pandas to_sql method to write each DataFrame to its own table.

In [12]:
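def load_data(sensor_df, maintenance_df):
    # if_exists='replace' drops and recreates each table on every run;
    # index=False keeps the DataFrame's row index out of the tables.
    sensor_df.to_sql('sensor_summary', engine, if_exists='replace', index=False)
    maintenance_df.to_sql('maintenance_records', engine, if_exists='replace', index=False)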
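
The effect of if_exists='replace' can be tried out without a PostgreSQL server. The sketch below swaps in an in-memory SQLite database from Python's standard library, an assumption made purely for illustration since the pipeline itself targets PostgreSQL, and uses made-up rows. Writing the same table twice shows that 'replace' overwrites rather than appends.

```python
import sqlite3
import pandas as pd

# In-memory SQLite stands in for the PostgreSQL engine (illustration only)
conn = sqlite3.connect(':memory:')

# Made-up maintenance rows
first_batch = pd.DataFrame({'equipment_ID': [1, 2],
                            'maintenance_type': ['repair', 'inspection']})
second_batch = pd.DataFrame({'equipment_ID': [3],
                             'maintenance_type': ['replacement']})

# 'replace' drops the existing table before writing, so only the last write survives
first_batch.to_sql('maintenance_records', conn, if_exists='replace', index=False)
second_batch.to_sql('maintenance_records', conn, if_exists='replace', index=False)

stored = pd.read_sql('SELECT * FROM maintenance_records', conn)
print(stored)
conn.close()
```

If each run should add rows to the existing table instead, if_exists='append' is the option to use.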

**Executing the program**

In [21]:
def main():
    equipment_df, network_df, maintenance_df = extract_data()
    sensor_df, maintenance_df = transform_data(equipment_df, network_df, maintenance_df)
    load_data(sensor_df, maintenance_df)
    
if __name__ == '__main__':
    main()

OperationalError: ignored
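
The run above ends in an OperationalError whose details were not captured. With a cloud-hosted database, one common cause is that the server cannot be reached from the notebook, for example because of a firewall or authorized-network rule. That is only an assumption here, since the traceback is missing. A minimal sketch of a reachability check using only the standard library follows; can_connect is a hypothetical helper, demonstrated against a throwaway local listener rather than the real database.

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Hypothetical helper: return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts
        return False

# Demonstration against a throwaway local listener instead of the real database
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(('127.0.0.1', 0))   # port 0 lets the OS pick a free port
listener.listen(1)
open_port = listener.getsockname()[1]

reachable = can_connect('127.0.0.1', open_port)
listener.close()
unreachable = can_connect('127.0.0.1', open_port, timeout=1.0)
print(reachable, unreachable)
```

Calling can_connect(POSTGRES_ADDRESS, POSTGRES_PORT) before main() helps narrow things down: a False result points to a network-level problem, while a True result followed by the same error suggests looking at the credentials or database name instead.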