### 🧭 __Action Plan: Ineffective Operator Analysis for CallMeMaybe__

1. Introduction & Problem Definition   
__Objective__: Identify ineffective operators based on missed calls, long wait times, and low outbound activity.   

* Why it matters:

    - Helps supervisors optimize team performance.
    - Reduces customer dissatisfaction due to missed or delayed calls.
    - Enables data-driven decisions for staffing and training.

2. Library Setup   
__Action__: Import essential libraries (e.g., pandas, numpy, matplotlib, seaborn, scipy, plotly, dash).

* Justification:

    - Ensures access to powerful tools for data manipulation, visualization, and statistical analysis.
    - Promotes modular, readable, and efficient code.

3. Custom Function Definitions   
__Action__: Define reusable functions for cleaning, plotting, and analysis.

* Justification:

    - Improves code reusability and maintainability.
    - Speeds up repetitive tasks and enforces consistency across analysis steps.

4. Data Loading   
__Action__: Load telecom_dataset_us.csv and telecom_clients_us.csv.

* Justification:

    - Centralizes all relevant data for analysis.
    - Enables merging and cross-referencing between operator activity and client metadata.

5. Data Cleanup   
__Action__:

- Convert column names to snake_case.
- Remove duplicates.
- Handle missing values via imputation or deletion.
- Visualize nulls using heatmaps.

* Justification:

    - Standardizes data for easier manipulation.
    - Prevents bias or errors due to incomplete or redundant records.
    - Heatmaps help identify patterns in missingness.

6. Data Type Casting   
__Action__: Convert columns to appropriate types (e.g., dates, booleans, categories).

* Justification:

    - Enables accurate filtering, grouping, and time-based analysis.
    - Reduces memory usage and improves performance.

7. Exploratory Data Analysis (EDA) – Descriptive Statistics   
__Action__:

- Use .describe() for quantitative and qualitative variables.
- Analyze mean, median, skewness, coefficient of variation.
- Visualize distributions with histograms and boxplots.

* Justification:

    - Reveals central tendencies and variability.
    - Identifies outliers and data quality issues.
    - Supports hypothesis generation and feature engineering.

8. Feature Engineering   
__Action__: Create new features to capture operator efficiency, call ratios, missed call rates, etc.

* Justification:

    - Enhances model interpretability and predictive power.
    - Simplifies identification of trends and inefficiencies.
    - Tailors metrics to business goals.

9. Outlier Processing   
__Action__: Detect and handle outliers using statistical thresholds or domain logic.

* Justification:

    - Prevents distortion of statistical results and visualizations.
    - Improves robustness of models and insights.

10. Cleaned EDA & Distribution Analysis   
__Action__:

- Re-run EDA on cleaned data.
- Use QQ plots to assess normality.

* Justification:

    - Validates data quality post-cleanup.
    - Guides choice of statistical tests (parametric vs non-parametric).

11. Operator Efficiency Visualization   
__Action__:

- Build a dedicated dataframe for operator metrics.
- Define and calculate efficiency score.
- Visualize trends using bar charts, scatter plots, etc.

* Justification:

    - Makes inefficiencies visible and actionable.
    - Supports ranking and segmentation of operators.
    - Facilitates stakeholder communication.

12. Supporting Visualizations with Raw Data   
__Action__: Use original data to reinforce findings and provide context.

* Justification:

    - Ensures transparency and traceability.
    - Helps validate engineered features against raw behavior.

13. Statistical Inference   
__Action__:

- Formulate hypotheses:
    - The average wait time is equal between incoming and outgoing calls
    - The proportion of missed calls is equal between tariff A and tariff C
    - The average number of missed calls is the same on all days of the week
- Apply appropriate tests (t-test, ANOVA, z-test).

* Justification:

    - Quantifies significance of observed patterns.
    - Supports data-driven recommendations with statistical rigor.

14. Dashboard Creation (Dash)   
__Action__: Build an interactive dashboard to display key metrics and visualizations.

* Justification:

    - Empowers supervisors with real-time insights.
    - Enhances usability and decision-making.
    - Facilitates stakeholder engagement and feedback.

15. Data Export for Tableau   
__Action__: Save cleaned and enriched datasets for external visualization tools.

* Justification:

    - Enables advanced reporting and storytelling.
    - Supports integration with enterprise BI platforms.

### 🧭 __Telecommunications: Identifying ineffective operators__

The virtual phone service CallMeMaybe is developing a new feature that will provide supervisors with insight into the least effective operators. An operator is considered ineffective if they have a high number of missed incoming calls (internal and external) and a long wait time for incoming calls. Furthermore, if an operator is supposed to make outgoing calls, a low number of them will also be a sign of ineffectiveness.

- Conduct exploratory data analysis
- Identify ineffective operators
- Test statistical hypotheses

#### 🧾 __Data Dictionary__

The datasets contain information about the use of the CallMeMaybe virtual phone service. Its customers are organizations that need to distribute large numbers of incoming calls among multiple operators or make outgoing calls through their operators. Operators can also make internal calls to communicate with each other. These calls are made through the CallMeMaybe network.

The compressed dataset `telecom_dataset_us.csv` contains the following columns:

- `user_id`: Customer account ID
- `date`: Date statistics were retrieved
- `direction`: Call direction (`out` for outgoing, `in` for incoming)
- `internal`: Whether the call was internal (between a customer's operators)
- `operator_id`: Operator ID
- `is_missed_call`: Whether the call was missed
- `calls_count`: Number of calls
- `call_duration`: Call duration (excluding hold time)
- `total_call_duration`: Call duration (including hold time)

The `telecom_clients_us.csv` dataset has the following columns:

- `user_id`: Customer account ID
- `tariff_plan`: Customer's current tariff plan
- `date_start`: Customer registration date

### 💻 __1. Libraries__

### 🛠️ __2. Functions__

### 🔁 __3. Data Loading__

In [None]:
# Load Dataset from telecom_dataset_new.csv

In [None]:
# Load Dataset from telecom_clients.csv

##### `LSPL`

**_Note_:**

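`keep_default_na=False` is passed when loading the CSV files so that pandas does not automatically parse empty or placeholder strings as `NaN`; missing values are instead converted to `pd.NA` explicitly in a later step. This is convenient because `pd.NA` provides: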

- Consistency between data types
- Preservation of type integrity
- Cleaner logical operations
- Better control over missing data.

Since this analysis does not require high performance or heavy computation, `pd.NA` is an appropriate choice.
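
The loading approach described in the note above can be sketched as follows. This is a minimal illustration, not the project's actual loading code: the in-memory sample, its values, and the list of placeholder tokens are assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical in-memory sample standing in for one of the CSV files.
csv_text = "user_id,operator_id,internal\n1,10,True\n2,,\n3,NA,False\n"

# keep_default_na=False: pandas does not turn '' or 'NA' into NaN while parsing.
# dtype=str keeps every column as raw text so nothing is coerced silently.
df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, dtype=str)

# Explicitly map the placeholders treated as "missing" to pd.NA.
missing_tokens = ["", "NA"]  # assumption: adjust after inspecting the real data
df = df.replace(missing_tokens, pd.NA)

print(df.isna().sum())
```

Reading everything as text first makes the later type-casting step the single place where dtypes are decided.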

### 🧹 __4. Data Cleanup__

In [None]:
# Show Dataframe from telecom_dataset_new.csv using .info()

In [None]:
# Show Dataframe from telecom_clients.csv using .info()

##### **4.1** Normalize String data

In [None]:
# Standardize column names and object/string values to snake_case for telecom_dataset_new.csv

In [None]:
# Standardize column names and object/string values to snake_case for telecom_clients.csv
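
One possible way to normalize column names is sketched below. The helper `to_snake_case` and the sample column labels are hypothetical and only illustrate the idea; the real datasets may already use snake_case names.

```python
import re

import pandas as pd


def to_snake_case(name: str) -> str:
    """Convert a label such as 'User ID' or 'tariffPlan' to snake_case."""
    name = name.strip()
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)  # split camelCase
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)  # other characters -> "_"
    return name.strip("_").lower()


# Hypothetical messy column labels.
df = pd.DataFrame({"User ID": [1], "tariffPlan": ["A"], " date-start ": ["2019-08-01"]})
df.columns = [to_snake_case(col) for col in df.columns]
print(list(df.columns))
```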

##### **4.2** Explicit Duplicate Removal

In [None]:
# Check for explicit duplicate rows in each DataFrame
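
A minimal sketch of counting and dropping fully duplicated rows, using a small made-up frame in place of the real data:

```python
import pandas as pd

# Hypothetical sample: the first two rows are exact duplicates.
df = pd.DataFrame(
    {
        "user_id": [1, 1, 2],
        "operator_id": [10, 10, 20],
        "calls_count": [3, 3, 5],
    }
)

# duplicated() flags every repeat of an earlier identical row.
n_dupes = int(df.duplicated().sum())
df_unique = df.drop_duplicates().reset_index(drop=True)

print(f"{n_dupes} duplicate row(s) removed, {len(df_unique)} rows kept")
```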


##### **4.3** Missing values processing

In [None]:
# Check missing values for telecom_dataset_new.csv


In [None]:
# Check missing values for telecom_clients.csv

In [None]:
# Set missing values to pd.NA for telecom_dataset_new.csv


In [None]:
# Show missing values rate for telecom_clients.csv
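
The missing-value rate per column can be computed along these lines. The sample values are invented for illustration.

```python
import pandas as pd

# Hypothetical sample with gaps in two columns.
df = pd.DataFrame(
    {
        "operator_id": [10, pd.NA, 30, pd.NA],
        "internal": [True, False, pd.NA, True],
        "calls_count": [1, 2, 3, 4],
    }
)

# isna().mean() is the share of missing entries per column; scale to percent.
missing_rate = df.isna().mean().mul(100).round(1).sort_values(ascending=False)
print(missing_rate)
```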


In [None]:
# Show missing values heatmap for telecom_dataset_new.csv


In [None]:
# Show missing values heatmap for telecom_clients.csv

##### `LSPL`

**_Note_:**   
Describe how missing values will be handled (imputation or deletion) and why.


In [None]:
# Check missing values for df_telecom_data


### 📦 __5. Casting Data types__

In [None]:
# Cast into datetime


In [None]:
# Cast into category



In [None]:
# Cast into boolean



In [None]:
# Double check casting performed correctly
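
The casting steps above might look like the sketch below. It assumes the columns were loaded as text; the sample values and the `bool_map` lookup are assumptions, and the date format in the real file may differ.

```python
import pandas as pd

# Hypothetical sample where every column was read as text.
df = pd.DataFrame(
    {
        "date": ["2019-08-04", "2019-08-05"],
        "direction": ["in", "out"],
        "internal": ["True", "False"],
        "is_missed_call": ["False", "True"],
    }
)

df["date"] = pd.to_datetime(df["date"])
df["direction"] = df["direction"].astype("category")

# Map the text flags explicitly, then use the nullable "boolean" dtype,
# which can also hold pd.NA if some flags are missing.
bool_map = {"True": True, "False": False}
for col in ["internal", "is_missed_call"]:
    df[col] = df[col].map(bool_map).astype("boolean")

print(df.dtypes)
```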


### 📚 __6. EDA Descriptive Statistics__

##### **6.1** Descriptive statistics for quantitative data

In [None]:
# Show descriptive statistics for quantitative data with .describe() for telecom_dataset_new.csv

In [None]:
# Show descriptive statistics for quantitative data with .describe() for telecom_clients.csv

In [None]:
# Evaluate central tendency and dispersion (mean, median, coefficient of variation, skewness) for telecom_dataset_new.csv


In [None]:
# Evaluate central tendency and dispersion (mean, median, coefficient of variation, skewness) for telecom_clients.csv
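
A compact way to collect these measures in one table is sketched here, with invented numbers. The coefficient of variation is taken as the sample standard deviation divided by the mean.

```python
import pandas as pd

# Hypothetical numeric sample.
df = pd.DataFrame(
    {
        "calls_count": [1, 2, 2, 3, 50],
        "call_duration": [0, 30, 60, 90, 120],
    }
)

num = df.select_dtypes("number")
summary = pd.DataFrame(
    {
        "mean": num.mean(),
        "median": num.median(),
        "cv": num.std() / num.mean(),  # coefficient of variation
        "skewness": num.skew(),
    }
)
print(summary)
```

A mean far above the median together with positive skewness, as in `calls_count` here, points to a long right tail that is worth checking for outliers.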

In [None]:
# Plot Histogram for telecom_dataset_new.csv quantitative data


In [None]:
# Plot Boxplot for telecom_dataset_new.csv quantitative data so that outliers can be found easily


In [None]:
# Plot Histogram for telecom_clients.csv quantitative data

In [None]:
# Plot Boxplot for telecom_clients.csv quantitative data so that outliers can be found easily

##### **6.2** Descriptive statistics for qualitative data

In [None]:
# Show descriptive statistics for qualitative data with .describe(include=['object', 'boolean', 'category']) for telecom_dataset_new.csv

In [None]:
# Plot bar charts of category counts to show the distribution of qualitative data for telecom_dataset_new.csv


In [None]:
# Show descriptive statistics for qualitative data with .describe(include=['object', 'boolean', 'category']) for telecom_clients.csv

In [None]:
# Plot bar charts of category counts to show the distribution of qualitative data for telecom_clients.csv

### 🛠️ __7. Feature engineering__

In [None]:
# Enrich Data as needed in order to ease the findings of trends and insights
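
Because `call_duration` excludes hold time and `total_call_duration` includes it, their difference can serve as a wait-time feature. The sketch below uses made-up rows; the new column names (`wait_time`, `avg_wait_per_call`, `missed_calls`) are hypothetical choices, not names from the original project.

```python
import pandas as pd

# Hypothetical rows using the columns from the data dictionary.
df = pd.DataFrame(
    {
        "operator_id": [10, 10, 20],
        "direction": ["in", "in", "out"],
        "is_missed_call": [False, True, False],
        "calls_count": [4, 2, 5],
        "call_duration": [200, 0, 300],
        "total_call_duration": [240, 30, 350],
    }
)

# Hold (wait) time is the part of the total duration not spent talking.
df["wait_time"] = df["total_call_duration"] - df["call_duration"]
df["avg_wait_per_call"] = df["wait_time"] / df["calls_count"]

# Number of missed calls per row: calls_count where the row is flagged as missed.
df["missed_calls"] = df["calls_count"].where(df["is_missed_call"], 0)

print(df[["operator_id", "wait_time", "avg_wait_per_call", "missed_calls"]])
```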

### 🛠️ __8. Outliers__

In [None]:
# Show Outliers for telecom_dataset_new.csv


In [None]:
# Show Outliers for telecom_clients.csv

`LSPL`

__Note:__   

Describe the actions taken on outliers (imputation or deletion) based on the findings in the data.


In [None]:
# Remove or impute invalid data and handle outliers
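
One common statistical threshold is the 1.5 × IQR rule. The sketch below is only one option among several (domain rules or percentile caps may fit this data better); the helper `iqr_bounds` and the sample values are hypothetical.

```python
import pandas as pd


def iqr_bounds(series: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical sample with one extreme value.
s = pd.Series([1, 2, 3, 4, 5, 100], name="calls_count")

lower, upper = iqr_bounds(s)
is_outlier = ~s.between(lower, upper)

s_filtered = s[~is_outlier]  # deletion; clipping to the fences is an alternative
print(f"fences: [{lower}, {upper}], outliers flagged: {int(is_outlier.sum())}")
```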


### 🛠️ __9. EDA - Processed Dataset__

In [None]:
# Show related histograms after outliers imputation or deletion (work with relevant data) for telecom_dataset_new.csv


In [None]:
# Show a QQ plot to check visually whether the data is normally distributed for telecom_dataset_new.csv


In [None]:
# Show related histograms after outliers imputation or deletion (work with relevant data) for telecom_clients.csv

In [None]:
# Show a QQ plot to check visually whether the data is normally distributed for telecom_clients.csv

### 📊 __10. EDA - Data Visualization__

##### **10.1** Operators Efficiency

In [None]:
# build an Operator efficiency dataframe
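
A per-operator metrics table could be assembled roughly as follows. The rows are invented, and the output column names (`incoming_calls`, `missed_incoming`, `outgoing_calls`, `missed_rate`) are assumptions for the example.

```python
import pandas as pd

# Hypothetical rows using the columns from the data dictionary.
df = pd.DataFrame(
    {
        "operator_id": [10, 10, 10, 20, 20],
        "direction": ["in", "in", "out", "in", "out"],
        "is_missed_call": [False, True, False, True, False],
        "calls_count": [8, 2, 5, 6, 1],
    }
)

incoming = df[df["direction"] == "in"]
in_total = incoming.groupby("operator_id")["calls_count"].sum()
in_missed = incoming[incoming["is_missed_call"]].groupby("operator_id")["calls_count"].sum()
out_total = df[df["direction"] == "out"].groupby("operator_id")["calls_count"].sum()

# Align the three series on operator_id; operators absent from a series get 0.
ops = pd.DataFrame(
    {
        "incoming_calls": in_total,
        "missed_incoming": in_missed,
        "outgoing_calls": out_total,
    }
).fillna(0)
ops["missed_rate"] = ops["missed_incoming"] / ops["incoming_calls"]

print(ops)
```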


`LSPL`

__Note:__ Efficiency calculation

Describe how the operator efficiency score will be calculated and presented.
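
As one possible starting point, the three signs of ineffectiveness named in the task (many missed incoming calls, long waits, few outgoing calls) can be combined into a rank-based score. This is an illustrative assumption, not the project's defined formula: the equal weighting, the `ineffectiveness_score` name, and the sample metrics are all hypothetical.

```python
import pandas as pd

# Hypothetical per-operator metrics.
ops = pd.DataFrame(
    {
        "operator_id": [10, 20, 30],
        "missed_rate": [0.05, 0.30, 0.10],
        "avg_wait": [12.0, 45.0, 20.0],
        "outgoing_calls": [120, 15, 60],
    }
).set_index("operator_id")

# Percentile ranks in (0, 1]; a higher rank means a worse result.
# For outgoing calls the order is reversed, since fewer calls is worse.
ops["ineffectiveness_score"] = (
    ops["missed_rate"].rank(pct=True)
    + ops["avg_wait"].rank(pct=True)
    + ops["outgoing_calls"].rank(pct=True, ascending=False)
) / 3

worst_first = ops.sort_values("ineffectiveness_score", ascending=False)
print(worst_first)
```

Ranks are used instead of raw values so that metrics on different scales contribute equally; weights could be added if supervisors consider one signal more important.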

In [None]:
# Apply the operator efficiency calculation to the dataset


In [None]:
# Plot a graph to show operator efficiency


##### **10.2** Operators Efficiency EDA Visualizations

In [None]:
# Prepare data for visualizations that support the operator efficiency calculation above


In [None]:
# Plot graphs that make trends and insights easy to see



### 🧪 __11. Inferential Statistics__

##### **11.1** Propose hypotheses about operator and user calling activity


In [None]:
# Process and show Hypotheses
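
For a hypothesis about two proportions, such as missed-call shares under two tariff plans, a two-sided z-test with a pooled proportion is one option. The sketch below implements it with the standard library only; the function name `two_proportion_ztest` and the counts are hypothetical and do not come from the data.

```python
import math


def two_proportion_ztest(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Two-sided z-test for H0: p1 == p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: missed calls out of total calls for two tariff plans.
z, p_value = two_proportion_ztest(x1=30, n1=200, x2=45, n2=200)

alpha = 0.05  # assumed significance level
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.3f}, p-value = {p_value:.4f}, {decision}")
```

The normal approximation behind this test assumes reasonably large counts in both groups.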



### 📊 __12. Dashboard__

In [None]:
# Create a dashboard in Dash and in Tableau based on the project's requirements and recommendations

### __13. Generate a new clean Data set .csv file__

In [None]:
# Save the cleaned and transformed dataset to a CSV file for use with Tableau
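
A minimal export sketch follows. The frame contents and the output file name are hypothetical; the file is written to the system temporary directory only to keep the example self-contained.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical cleaned data.
df_clean = pd.DataFrame(
    {
        "operator_id": [10, 20],
        "date": pd.to_datetime(["2019-08-04", "2019-08-05"]),
        "missed_rate": [0.2, 1.0],
    }
)

out_path = Path(tempfile.gettempdir()) / "telecom_clean_example.csv"  # hypothetical name

# index=False avoids writing the pandas row index as an extra unnamed column.
df_clean.to_csv(out_path, index=False)

# Read the file back to confirm it round-trips as expected.
round_trip = pd.read_csv(out_path, parse_dates=["date"])
print(round_trip.dtypes)
```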
