# Inbound Notebook

This notebook semi-automates the reporting process for the Inbound team. It streamlines data extraction, transformation, and loading into a pre-formatted Excel file.

## Manual Preparation

The first step involves manually preparing the data in Excel:

1. **Filter the Pivot Table:**
   - Apply filters to the pivot table to extract the following categories:
     - Active
     - Canceled
     - Pending Signature
     - Net

2. **Create Separate Sheets:**
   - For each category (Active, Canceled, Pending Signature, Net), create a separate sheet in the Excel file containing the filtered data.

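3. **Save the Excel File:**
   - Save the prepared Excel file under the name the notebook expects (the file name in `excel_file_path`), ensuring it contains the four sheets with the filtered data. The notebook also reads an `Agents` sheet from the same file later on.

4. **Upload the Excel File:**
   - Upload the prepared Excel file to the directory referenced by `excel_file_path`.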
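
Before running the rest of the notebook, it can help to confirm that the prepared workbook really contains the four category sheets. The following is a minimal sketch, assuming the sheet names match the categories exactly. `find_missing_sheets` and `REQUIRED_SHEETS` are hypothetical names, not part of the notebook; the helper takes a plain list of sheet names (for example `load_workbook(path).sheetnames` from openpyxl).

```python
REQUIRED_SHEETS = ["Active", "Canceled", "Pending Signature", "Net"]

def find_missing_sheets(available_sheets, required_sheets=REQUIRED_SHEETS):
    """Return the required sheet names that are absent, in their original order."""
    available = set(available_sheets)
    return [name for name in required_sheets if name not in available]

# Example: a workbook that only has two of the four category sheets
missing = find_missing_sheets(["Active", "Net", "Agents"])
print(missing)
```

An empty result means all four sheets are present and the extraction cells should find what they need.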

## Library Installation

Ensure that the necessary libraries are installed before running the notebook.

In [2]:
# Import necessary libraries
%pip install openpyxl
from openpyxl import load_workbook
import pandas as pd
import os
import re

print("Skeleton setup complete!")

Note: you may need to restart the kernel to use updated packages.
Skeleton setup complete!


## Variable Declaration

Set the variables for file paths, sheet names, and other configurations. Update these variables for each specific project.

In [3]:
# Path to the Excel file (change this for each project)
excel_file_path = '/workspaces/Finetwork-Automation/FORDANI.xlsx'

# Sheet names for different categories
sheet_active = 'Active'
sheet_canceled = 'Canceled'
sheet_pending = 'Pending Signature'
sheet_net = 'Net'

# Range to read (change this for each project)
start_row = 8
end_row = 65
usecols = 'A:AF'

print("Variables defined correctly!")

Variables defined correctly!
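
Because these values change per project, a quick sanity check can catch a swapped row bound or a mistyped column range before any sheet is read. This is only a sketch: `validate_read_range` is a hypothetical helper, and it assumes `usecols` is always a single `LETTERS:LETTERS` span such as `A:AF`.

```python
import re

def validate_read_range(start_row, end_row, usecols):
    """Raise ValueError if the row bounds or the column range look wrong."""
    if start_row < 1 or end_row < start_row:
        raise ValueError(f"Invalid row range: {start_row}..{end_row}")
    # Accept only a single 'LETTERS:LETTERS' span such as 'A:AF'
    if not re.fullmatch(r"[A-Z]{1,3}:[A-Z]{1,3}", usecols):
        raise ValueError(f"Invalid column range: {usecols!r}")
    # Number of worksheet rows covered by the range, bounds included
    return end_row - start_row + 1

row_count = validate_read_range(8, 65, "A:AF")
print(row_count)
```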


## Verify Columns in 'Active' Sheet

Verify the number of columns in the "Active" sheet to ensure the range is within bounds.

In [4]:
# Function to verify the number of columns
def verify_columns(file_path, sheet_name):
    workbook = load_workbook(filename=file_path, data_only=True)
    sheet = workbook[sheet_name]
    max_column = sheet.max_column
    return max_column

# Check the number of columns in the 'Active' sheet
max_column_active = verify_columns(excel_file_path, 'Active')
print(f"Max column in 'Active' sheet: {max_column_active}")

# Check if the number of columns matches the expected range
from openpyxl.utils import get_column_letter

expected_columns = 32  # Columns from A to AF (inclusive)
if max_column_active < expected_columns:
    # get_column_letter handles columns past Z (e.g. 27 -> 'AA'), unlike chr(64 + n)
    usecols = f"A:{get_column_letter(max_column_active)}"
    print(f"Adjusted usecols to: {usecols}")
else:
    print(f"Using default usecols: {usecols}")

Max column in 'Active' sheet: 7
Adjusted usecols to: A:G
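
Converting a 1-based column count to an Excel column letter needs base-26 handling once the count passes 26; a plain `chr(64 + n)` only covers A through Z. Below is a minimal pure-Python sketch of the conversion. `column_letter` is a hypothetical helper shown for illustration; openpyxl ships `openpyxl.utils.get_column_letter` for the same purpose.

```python
def column_letter(index):
    """Convert a 1-based column index to Excel letters (1 -> 'A', 27 -> 'AA')."""
    if index < 1:
        raise ValueError("Column index must be 1 or greater")
    letters = ""
    while index > 0:
        # Shift to 0-based before dividing so that 26 maps to 'Z', not 'A@'
        index, remainder = divmod(index - 1, 26)
        letters = chr(65 + remainder) + letters
    return letters

print(column_letter(7), column_letter(32))
```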


## Extract Data from 'Active' Sheet

Extract data from the "Active" sheet within the specified range and convert it directly to a DataFrame.

In [5]:
def load_sheet_as_dataframe(file_path, sheet_name, start_row, end_row, usecols):
    # Load data from the specified sheet and range into a DataFrame
    df = pd.read_excel(file_path, sheet_name=sheet_name, usecols=usecols, skiprows=start_row-1, nrows=end_row-start_row+1)
    print(f"Data from '{sheet_name}' sheet loaded successfully.")
    return df

# Extract data from 'Active' sheet
active_df = load_sheet_as_dataframe(excel_file_path, 'Active', start_row, end_row, usecols)

Data from 'Active' sheet loaded successfully.
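
It is worth spelling out which worksheet rows the `skiprows`/`nrows` arithmetic in `load_sheet_as_dataframe` actually covers. The sketch below assumes pandas' default behaviour as its documentation describes it: the first unskipped row becomes the header, and `nrows` counts the data rows after it. `describe_read_window` is a hypothetical helper that only does the arithmetic; it does not read any file.

```python
def describe_read_window(start_row, end_row):
    """Map the notebook's row settings to 1-based worksheet row numbers."""
    skiprows = start_row - 1            # rows dropped before parsing starts
    nrows = end_row - start_row + 1     # value passed to read_excel
    header_row = skiprows + 1           # first unskipped row (assumed header)
    return {
        "skiprows": skiprows,
        "nrows": nrows,
        "header_row": header_row,
        "first_data_row": header_row + 1,
        "last_data_row": header_row + nrows,
    }

window = describe_read_window(8, 65)
print(window)
```

Under that assumption the last data row read is one past `end_row`. If row 65 is meant to be the final row, `nrows=end_row - start_row` would be the value to pass.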


## Extract Data from 'Canceled' Sheet

Extract data from the "Canceled" sheet within the specified range and convert it directly to a DataFrame.

In [6]:
# Extract data from 'Canceled' sheet
canceled_df = load_sheet_as_dataframe(excel_file_path, 'Canceled', start_row, end_row, usecols)

Data from 'Canceled' sheet loaded successfully.


## Extract Data from 'Pending Signature' Sheet

Extract data from the "Pending Signature" sheet within the specified range and convert it directly to a DataFrame.

In [7]:
# Extract data from 'Pending Signature' sheet
pending_signature_df = load_sheet_as_dataframe(excel_file_path, 'Pending Signature', start_row, end_row, usecols)

Data from 'Pending Signature' sheet loaded successfully.


## Extract Data from 'Net' Sheet

Extract data from the "Net" sheet within the specified range and convert it directly to a DataFrame.

In [8]:
# Extract data from 'Net' sheet
net_df = load_sheet_as_dataframe(excel_file_path, 'Net', start_row, end_row, usecols)

Data from 'Net' sheet loaded successfully.
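
The four extraction cells differ only in the sheet name, so they could be collapsed into one loop that returns a dictionary keyed by category. This is a sketch only: `load_category_frames` is a hypothetical helper, and `fake_loader` stands in for the real loader so the example runs without an Excel file.

```python
def load_category_frames(sheet_names, load_sheet):
    """Call `load_sheet(name)` once per sheet and collect the results by name."""
    return {name: load_sheet(name) for name in sheet_names}

# Stand-in loader for illustration. In the notebook this could be:
# lambda name: load_sheet_as_dataframe(excel_file_path, name, start_row, end_row, usecols)
def fake_loader(name):
    return f"<data for {name}>"

frames = load_category_frames(
    ["Active", "Canceled", "Pending Signature", "Net"], fake_loader
)
print(list(frames))
```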


## Display DataFrames

Display the first few rows of each DataFrame to verify the data.

In [9]:
# Display the DataFrames
print("Active DataFrame:")
display(active_df.head())

print("Canceled DataFrame:")
display(canceled_df.head())

print("Pending Signature DataFrame:")
display(pending_signature_df.head())

print("Net DataFrame:")
display(net_df.head())

Active DataFrame:


Unnamed: 0,Etiquetas de fila,2024-08-01 00:00:00,2024-08-02 00:00:00,2024-08-03 00:00:00,2024-08-04 00:00:00,2024-08-05 00:00:00,Total general
0,Inbound Telec.Orig.Sevilla,382.0,356.0,217.0,141.0,48.0,1144
1,albaaraujo@originaltelecom.es,11.0,8.0,,,,19
2,albertocanto@originaltelecom.es,9.0,9.0,,,,18
3,albertosanchez@originaltelecom.es,17.0,11.0,,,,28
4,anasanchez@originaltelecom.es,,,24.0,19.0,,43


Canceled DataFrame:


Unnamed: 0,Etiquetas de fila,2024-08-01 00:00:00,2024-08-02 00:00:00,2024-08-03 00:00:00,2024-08-04 00:00:00,2024-08-05 00:00:00,Total general
0,Inbound Telec.Orig.Sevilla,79.0,44.0,23.0,19.0,7.0,172
1,albertocanto@originaltelecom.es,,5.0,,,,5
2,albertosanchez@originaltelecom.es,2.0,,,,,2
3,anasanchez@originaltelecom.es,,,2.0,,,2
4,antonio.reina@originaltelecom.es,,5.0,,,,5


Pending Signature DataFrame:


Unnamed: 0,Etiquetas de fila,2024-08-01 00:00:00,2024-08-02 00:00:00,2024-08-03 00:00:00,2024-08-04 00:00:00,2024-08-05 00:00:00,Total general
0,Inbound Telec.Orig.Sevilla,30.0,27.0,6.0,10.0,9.0,82
1,albaaraujo@originaltelecom.es,,1.0,,,,1
2,albertocanto@originaltelecom.es,4.0,,,,,4
3,antonio.reina@originaltelecom.es,,1.0,,,,1
4,azahara.garcia@originaltelecom.es,,,,1.0,,1


Net DataFrame:


Unnamed: 0,Etiquetas de fila,2024-08-01 00:00:00,2024-08-02 00:00:00,2024-08-03 00:00:00,2024-08-04 00:00:00,2024-08-05 00:00:00,Total general
0,Inbound Telec.Orig.Sevilla,251.0,152.0,63.0,55.0,21.0,542
1,albaaraujo@originaltelecom.es,7.0,3.0,,,,10
2,albertocanto@originaltelecom.es,6.0,5.0,,,,11
3,albertosanchez@originaltelecom.es,11.0,5.0,,,,16
4,anasanchez@originaltelecom.es,,,7.0,6.0,,13


## Replace NaN with 0

Replace all NaN values in the DataFrames with 0 to facilitate further transformations.

In [10]:
def replace_nan_with_zero(df):
    """
    Replace all NaN values in the DataFrame with 0.
    
    Parameters:
    df (pd.DataFrame): The DataFrame to process.
    
    Returns:
    pd.DataFrame: The processed DataFrame with NaN replaced by 0.
    """
    df = df.fillna(0)
    print("Replaced NaN with 0.")
    return df
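
`fillna(0)` on the whole DataFrame also writes 0 into any blank cell of the label column. If that is not wanted, one option is to fill only the numeric columns. This is a minimal sketch of that variant, not the notebook's own function; `fill_numeric_nan_with_zero` and the sample data are made up for illustration.

```python
import pandas as pd

def fill_numeric_nan_with_zero(df):
    """Return a copy of df with NaN replaced by 0 in numeric columns only."""
    result = df.copy()
    numeric_cols = result.select_dtypes(include="number").columns
    result[numeric_cols] = result[numeric_cols].fillna(0)
    return result

# Illustrative data: one missing label and one missing count
sample = pd.DataFrame(
    {"agent": ["a@example.com", None], "2024-08-01": [1.0, None]}
)
cleaned = fill_numeric_nan_with_zero(sample)
print(cleaned)
```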

## Apply Transformation

Apply the transformation to replace NaN values with 0 in each DataFrame.

In [11]:
# Apply the transformation
active_df = replace_nan_with_zero(active_df)
canceled_df = replace_nan_with_zero(canceled_df)
pending_signature_df = replace_nan_with_zero(pending_signature_df)
net_df = replace_nan_with_zero(net_df)

# Display the transformed DataFrames
print("Active DataFrame after replacing NaN:")
display(active_df.head())

print("Canceled DataFrame after replacing NaN:")
display(canceled_df.head())

print("Pending Signature DataFrame after replacing NaN:")
display(pending_signature_df.head())

print("Net DataFrame after replacing NaN:")
display(net_df.head())

Replaced NaN with 0.
Replaced NaN with 0.
Replaced NaN with 0.
Replaced NaN with 0.
Active DataFrame after replacing NaN:


Unnamed: 0,Etiquetas de fila,2024-08-01 00:00:00,2024-08-02 00:00:00,2024-08-03 00:00:00,2024-08-04 00:00:00,2024-08-05 00:00:00,Total general
0,Inbound Telec.Orig.Sevilla,382.0,356.0,217.0,141.0,48.0,1144
1,albaaraujo@originaltelecom.es,11.0,8.0,0.0,0.0,0.0,19
2,albertocanto@originaltelecom.es,9.0,9.0,0.0,0.0,0.0,18
3,albertosanchez@originaltelecom.es,17.0,11.0,0.0,0.0,0.0,28
4,anasanchez@originaltelecom.es,0.0,0.0,24.0,19.0,0.0,43


Canceled DataFrame after replacing NaN:


Unnamed: 0,Etiquetas de fila,2024-08-01 00:00:00,2024-08-02 00:00:00,2024-08-03 00:00:00,2024-08-04 00:00:00,2024-08-05 00:00:00,Total general
0,Inbound Telec.Orig.Sevilla,79.0,44.0,23.0,19.0,7.0,172
1,albertocanto@originaltelecom.es,0.0,5.0,0.0,0.0,0.0,5
2,albertosanchez@originaltelecom.es,2.0,0.0,0.0,0.0,0.0,2
3,anasanchez@originaltelecom.es,0.0,0.0,2.0,0.0,0.0,2
4,antonio.reina@originaltelecom.es,0.0,5.0,0.0,0.0,0.0,5


Pending Signature DataFrame after replacing NaN:


Unnamed: 0,Etiquetas de fila,2024-08-01 00:00:00,2024-08-02 00:00:00,2024-08-03 00:00:00,2024-08-04 00:00:00,2024-08-05 00:00:00,Total general
0,Inbound Telec.Orig.Sevilla,30.0,27.0,6.0,10.0,9.0,82
1,albaaraujo@originaltelecom.es,0.0,1.0,0.0,0.0,0.0,1
2,albertocanto@originaltelecom.es,4.0,0.0,0.0,0.0,0.0,4
3,antonio.reina@originaltelecom.es,0.0,1.0,0.0,0.0,0.0,1
4,azahara.garcia@originaltelecom.es,0.0,0.0,0.0,1.0,0.0,1


Net DataFrame after replacing NaN:


Unnamed: 0,Etiquetas de fila,2024-08-01 00:00:00,2024-08-02 00:00:00,2024-08-03 00:00:00,2024-08-04 00:00:00,2024-08-05 00:00:00,Total general
0,Inbound Telec.Orig.Sevilla,251.0,152.0,63.0,55.0,21.0,542
1,albaaraujo@originaltelecom.es,7.0,3.0,0.0,0.0,0.0,10
2,albertocanto@originaltelecom.es,6.0,5.0,0.0,0.0,0.0,11
3,albertosanchez@originaltelecom.es,11.0,5.0,0.0,0.0,0.0,16
4,anasanchez@originaltelecom.es,0.0,0.0,7.0,6.0,0.0,13


## Load Agents List

Load the list of all agents from the "Agents" sheet.

In [12]:
# Load the list of agents
agents_df = pd.read_excel(excel_file_path, sheet_name='Agents', usecols='A')
agents_list = agents_df.iloc[:, 0].tolist()
print("Agents list loaded successfully!")
print(agents_list)

Agents list loaded successfully!
[nan, nan, nan, nan, nan, nan, nan, 'albaaraujo@originaltelecom.es', 'albertocanto@originaltelecom.es', 'albertosanchez@originaltelecom.es', 'anasanchez@originaltelecom.es', 'antonio.reina@originaltelecom.es', 'azahara.garcia@originaltelecom.es', 'beatriz.gomez@originaltelecom.es', 'carmen.cornejo@originaltelecom.es', 'carolinafuentes@originaltelecom.es', 'cesar.arnaldo@originaltelecom.es', 'chamberlayn.villegascenci@cgi.com', 'david.molero@originaltelecom.es', 'diego.temblador.fi@originaltelecom.es', 'diego.temblador@originaltelecom.es', 'dolores.cortes@originaltelecom.es', 'elenaborrero@originaltelecom.es', 'enrique.miranda@originaltelecom.es', 'estefania.panea@originaltelecom.es', 'formacion10@originaltelecom.es', 'formacion3@originaltelecom.es', 'formacion4@originaltelecom.es', 'franciscacoromoto.tovarnavarro@cgi.com', 'francisco.mariscal@originaltelecom.es', 'francisco.perdomo@originaltelecom.es', 'gonzalofalcon@originaltelecom.es', 'guillermo.hurt
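
The printed list starts with several `nan` entries, presumably blank cells above the first agent in column A. A sketch of one way to clean such a list before using it, assuming every real agent is a non-empty string; `clean_agent_list` is a hypothetical helper and the addresses are placeholders.

```python
def clean_agent_list(raw_values):
    """Keep non-empty string entries, stripped, without duplicates, in order."""
    seen = set()
    cleaned = []
    for value in raw_values:
        if not isinstance(value, str):
            continue  # skips NaN floats and None coming from blank cells
        email = value.strip()
        if email and email not in seen:
            seen.add(email)
            cleaned.append(email)
    return cleaned

agents = clean_agent_list(
    [float("nan"), "a@example.com", " a@example.com ", "b@example.com", None]
)
print(agents)
```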

## Verify and Complete Data

Verify that every agent in the list is present in each DataFrame. If an agent is missing, add a row of zeros for that agent. Rows whose label is not in the agents list (such as the team total row) are removed.

In [14]:
def ensure_all_agents(df, agents_list):
    """
    Ensure all agents are present in the DataFrame. Add missing agents with zero values and remove agents not in the list.
    
    Parameters:
    df (pd.DataFrame): The DataFrame to check and update.
    agents_list (list): The list of all agents.
    
    Returns:
    pd.DataFrame: The updated DataFrame with all agents.
    """
    # Get the list of agents in the DataFrame
    existing_agents = df.iloc[:, 0].tolist()
    
    # Find missing agents
    missing_agents = [agent for agent in agents_list if agent not in existing_agents]
    
    # Add rows for missing agents with zero values
    for agent in missing_agents:
        zero_row = pd.DataFrame([[agent] + [0] * (df.shape[1] - 1)], columns=df.columns)
        df = pd.concat([df, zero_row], ignore_index=True)
    
    # Remove agents not in the agents list
    df = df[df.iloc[:, 0].isin(agents_list)]
    
    print(f"Added {len(missing_agents)} missing agents and removed {len(existing_agents) - len(df)} agents not in the list.")
    return df

# Apply the function to each DataFrame
active_df = ensure_all_agents(active_df, agents_list)
canceled_df = ensure_all_agents(canceled_df, agents_list)
pending_signature_df = ensure_all_agents(pending_signature_df, agents_list)
net_df = ensure_all_agents(net_df, agents_list)

# Display the updated DataFrames
print("Active DataFrame after ensuring all agents:")
display(active_df.head())

print("Canceled DataFrame after ensuring all agents:")
display(canceled_df.head())

print("Pending Signature DataFrame after ensuring all agents:")
display(pending_signature_df.head())

print("Net DataFrame after ensuring all agents:")
display(net_df.head())

Added 10 missing agents and removed 3 agents not in the list.
Added 22 missing agents and removed -9 agents not in the list.
Added 28 missing agents and removed -19 agents not in the list.
Added 10 missing agents and removed 1 agents not in the list.
Active DataFrame after ensuring all agents:


Unnamed: 0,Etiquetas de fila,2024-08-01 00:00:00,2024-08-02 00:00:00,2024-08-03 00:00:00,2024-08-04 00:00:00,2024-08-05 00:00:00,Total general
1,albaaraujo@originaltelecom.es,11.0,8.0,0.0,0.0,0.0,19
2,albertocanto@originaltelecom.es,9.0,9.0,0.0,0.0,0.0,18
3,albertosanchez@originaltelecom.es,17.0,11.0,0.0,0.0,0.0,28
4,anasanchez@originaltelecom.es,0.0,0.0,24.0,19.0,0.0,43
5,antonio.reina@originaltelecom.es,11.0,11.0,0.0,0.0,5.0,27


Canceled DataFrame after ensuring all agents:


Unnamed: 0,Etiquetas de fila,2024-08-01 00:00:00,2024-08-02 00:00:00,2024-08-03 00:00:00,2024-08-04 00:00:00,2024-08-05 00:00:00,Total general
1,albertocanto@originaltelecom.es,0.0,5.0,0.0,0.0,0.0,5
2,albertosanchez@originaltelecom.es,2.0,0.0,0.0,0.0,0.0,2
3,anasanchez@originaltelecom.es,0.0,0.0,2.0,0.0,0.0,2
4,antonio.reina@originaltelecom.es,0.0,5.0,0.0,0.0,0.0,5
5,azahara.garcia@originaltelecom.es,0.0,0.0,0.0,3.0,0.0,3


Pending Signature DataFrame after ensuring all agents:


Unnamed: 0,Etiquetas de fila,2024-08-01 00:00:00,2024-08-02 00:00:00,2024-08-03 00:00:00,2024-08-04 00:00:00,2024-08-05 00:00:00,Total general
1,albaaraujo@originaltelecom.es,0.0,1.0,0.0,0.0,0.0,1
2,albertocanto@originaltelecom.es,4.0,0.0,0.0,0.0,0.0,4
3,antonio.reina@originaltelecom.es,0.0,1.0,0.0,0.0,0.0,1
4,azahara.garcia@originaltelecom.es,0.0,0.0,0.0,1.0,0.0,1
5,carolinafuentes@originaltelecom.es,2.0,0.0,0.0,0.0,0.0,2


Net DataFrame after ensuring all agents:


Unnamed: 0,Etiquetas de fila,2024-08-01 00:00:00,2024-08-02 00:00:00,2024-08-03 00:00:00,2024-08-04 00:00:00,2024-08-05 00:00:00,Total general
1,albaaraujo@originaltelecom.es,7.0,3.0,0.0,0.0,0.0,10
2,albertocanto@originaltelecom.es,6.0,5.0,0.0,0.0,0.0,11
3,albertosanchez@originaltelecom.es,11.0,5.0,0.0,0.0,0.0,16
4,anasanchez@originaltelecom.es,0.0,0.0,7.0,6.0,0.0,13
5,antonio.reina@originaltelecom.es,7.0,5.0,0.0,0.0,2.0,14
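
The negative "removed" counts in the output above follow from `len(existing_agents) - len(df)`: `existing_agents` is measured before rows are added, while `len(df)` is measured after both adding and filtering, so the figure is effectively rows removed minus rows added. Below is a sketch of one alternative that counts both sets directly and uses `reindex` instead of a concat loop. It assumes the labels in the DataFrame are unique; `align_to_agents` and the sample data are hypothetical.

```python
import pandas as pd

def align_to_agents(df, agents, label_col):
    """Return df restricted to `agents` (zero rows for absentees) plus counts."""
    roster = [a for a in agents if isinstance(a, str)]  # drop NaN placeholders
    roster_set = set(roster)
    present = set(df[label_col])
    added = [a for a in roster if a not in present]
    removed = [a for a in df[label_col] if a not in roster_set]
    aligned = (
        df.set_index(label_col)
        .reindex(roster, fill_value=0)  # keeps roster order, zero-fills new rows
        .rename_axis(label_col)
        .reset_index()
    )
    return aligned, len(added), len(removed)

sample = pd.DataFrame({
    "Etiquetas de fila": ["Team total", "a@example.com", "z@example.com"],
    "Total general": [30, 10, 20],
})
aligned, n_added, n_removed = align_to_agents(
    sample, [float("nan"), "a@example.com", "b@example.com"], "Etiquetas de fila"
)
print(f"Added {n_added}, removed {n_removed}")
```

A side effect of `reindex` is that every DataFrame ends up in the same row order as the agents list, which may make the later write to the pre-formatted Excel file easier to line up.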
