# Descriptive statistics:
2.1. Calculate summary statistics (mean, median, mode, range) for numerical fields.

2.2. Analyze categorical fields such as quality, stage, source, and product.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import datetime
import plotly.graph_objects as go

In [None]:
deals = pd.read_pickle('deals_df.pkl')
calls = pd.read_pickle('calls_df.pkl')
contacts = pd.read_pickle('contacts_df.pkl')
spend = pd.read_pickle('spend_df.pkl')

In [None]:
deals.info()

# Function for analyzing and visualizing numerical fields

In [None]:
def analyze_numeric_field(df, columns, df_name):
    """Plot a histogram and a box plot for each numeric column in `columns`."""
    print(f"\nAnalysis of numeric fields in the DataFrame '{df_name}':")

    for col in columns:
        print(f"Column: {col}")
        plt.figure(figsize=(15, 4))

        # Histogram
        plt.subplot(1, 2, 1)
        df[col].hist(grid=False, edgecolor='black')
        plt.xlabel(col)
        plt.ylabel('Count')
        plt.title(f'Histogram: {col}')

        # Box plot
        plt.subplot(1, 2, 2)
        sns.boxplot(x=df[col])
        plt.xlabel(col)
        plt.title(f'Box plot: {col}')

        plt.tight_layout()
        plt.show()
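Task 2.1 asks for the mean, median, mode and range, while the plotting helper only draws charts. A minimal sketch of a companion helper that returns these statistics as a table; the name `summarize_numeric_fields` is hypothetical and the toy frame only illustrates usage (on the real data it could be called as `summarize_numeric_fields(calls, ['Call Duration (in seconds)'])`).

```python
import pandas as pd


def summarize_numeric_fields(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Return mean, median, mode and range for each column (NaN values are ignored)."""
    rows = {}
    for col in columns:
        values = df[col].dropna()
        modes = values.mode()  # sorted; several values are possible, keep the smallest
        rows[col] = {
            'mean': values.mean(),
            'median': values.median(),
            'mode': modes.iloc[0] if not modes.empty else float('nan'),
            'range': values.max() - values.min(),
        }
    return pd.DataFrame(rows).T


# Toy data for illustration only
toy = pd.DataFrame({'duration': [0, 4, 8, 8, 97, 300]})
print(summarize_numeric_fields(toy, ['duration']))
```

When a column has several equally frequent values, only the smallest one is reported as the mode; `values.mode()` can be inspected directly if all of them are needed.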

# Function for analyzing and visualizing a categorical field

In [None]:
def analyze_categorical_field(df, column, df_name, plot=True):
    """Count the values of a categorical column, optionally plot them, and return the table."""
    print(f"\nAnalysis of the categorical field '{column}' in the DataFrame '{df_name}':")

    # Counting frequencies (value_counts excludes NaN)
    freq = df[column].value_counts()
    total_count = freq.sum()
    percentages = (freq / total_count * 100).round(1)

    # Visualization
    if plot:
        fig = go.Figure()

        # Adding a bar chart with counts and percentages
        fig.add_trace(go.Bar(
            x=freq.index,
            y=freq.values,
            text=[f'{count} <br><span style="color:red"> {percent} %</span>' for count, percent in zip(freq.values, percentages)],
            textposition='outside',
            name='Field Values',
            marker_color='#1f77b4',
            # customdata carries the bare percentage; %{text} would repeat the count and the HTML label
            customdata=percentages.values,
            hovertemplate='Value: %{x}<br>Count: %{y}<br>Percentage: %{customdata}%<extra></extra>'
        ))

        # Setting up the layout
        fig.update_layout(
            title=f"Distribution of values in the field '{column}' ({df_name})",
            xaxis_title=column,
            yaxis_title="Count",
            bargap=0.1
        )
        fig.show()

    # Return the frequency table so it can also be used without plotting
    return pd.DataFrame({'count': freq, 'percent': percentages})


# Contacts

In [None]:
contacts.info()

In [None]:
display(contacts.describe())

## Numerical fields in the 'Contacts' DataFrame
Numerical analysis is not meaningful for the 'Contacts' DataFrame because it has no relevant numerical indicators. Its only numerical column, 'Id', holds unique identifiers (presumably generated automatically); the values show no anomalies, but their statistics describe the ID range rather than any property of the contacts.

## Analysis of categorical fields in the 'Contacts' DataFrame

In [None]:
print("\n=== Analysis of categorical fields in 'Contacts' ===")
categorical_fields_contacts = ['Contact Owner Name']
for field in categorical_fields_contacts:
    analyze_categorical_field(contacts, field, "contacts")

### Conclusions:

1. Contacts are distributed unevenly among the contact owners (OLs, the people responsible for managing contacts). A few OLs hold a very large number of contacts (for example, Charlie Davis, Ulysses Adams, Julia Nelson), while most OLs have significantly fewer.

2. This may indicate that most of the client-interaction workload falls on a small group of the most active OLs, while the remaining OLs are either less involved in the process or have other responsibilities.

3. Such a concentration of contacts among a few OLs may create a risk of overload and declining service quality. The reasons should be analyzed and, if needed, the workload redistributed more evenly.

4. It is also important to understand whether this distribution of contacts aligns with the company's strategies and goals for client management. A review of the approaches to assigning OLs and distributing responsibilities may be required.

Overall, the data indicates the need for a more detailed analysis of the reasons for such a distribution of contacts among OLs and the search for opportunities to optimize this process.
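The unevenness described above can be quantified as the share of contacts held by the N largest owners. A minimal sketch; `top_n_share` is a hypothetical helper and the toy owner list is illustrative only. On the real data it would be called as `top_n_share(contacts['Contact Owner Name'], 5)`, and the same call on `calls['Call Owner Name']` gives the top-5 share of calls.

```python
import pandas as pd


def top_n_share(values: pd.Series, n: int = 5) -> float:
    """Percentage of non-null records that belong to the n most frequent values."""
    counts = values.value_counts()  # sorted in descending order, NaN excluded
    if counts.sum() == 0:
        return 0.0
    return float(counts.head(n).sum() / counts.sum() * 100)


# Toy data: owner A has 6 contacts, B has 2, C and D have 1 each
owners = pd.Series(['A'] * 6 + ['B'] * 2 + ['C', 'D'])
print(f"Top-1 share: {top_n_share(owners, 1):.1f}%")
print(f"Top-2 share: {top_n_share(owners, 2):.1f}%")
```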

# Calls

In [None]:
calls.info()

In [None]:
display(calls.describe())

## Analysis of numerical fields in the 'Calls' dataframe

**General Information**  
- **Number of records**: 95,874, large enough for stable summary statistics.  
- **Data period**: Call Start Time ranges from June 30, 2023, to June 21, 2024, i.e. just under 12 months. This covers nearly a full annual cycle and allows seasonality and trends over time to be analyzed.  
- **Id**, **CONTACTID**: identifiers, likely generated automatically. No anomalies are visible, and their statistics carry no analytical meaning.  
- **Call Duration (in seconds)**:  
  - **Mean**: 164.83 seconds (about 2 minutes 45 seconds). The mean is far above the median, so the distribution is right-skewed: most calls are short, and a minority of long calls pulls the mean up.  
  - **Minimum (min)**: 0.0 seconds, which may indicate missed, canceled, or failed calls.  
  - **Maximum (max)**: 7,625.0 seconds (about 2 hours 7 minutes).  
  - **Quantiles**:  
    - 25% of calls lasted 4.0 seconds or less,  
    - 50% (median): 8.0 seconds, which suggests a large number of missed, canceled, or quickly ended calls,  
    - 75% of calls lasted 97.0 seconds (1 minute 37 seconds) or less.  
  - **Standard deviation (std)**: 401.27 seconds, large relative to the mean, indicating long calls that differ greatly from the majority.

Calls lasting more than 3,000 seconds (50 minutes) look anomalous. To understand the nature of these abnormally long calls, the following methods can be used:
- a histogram of the call duration distribution;
- outlier detection, for example with box plots.
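A box plot marks as outliers the points beyond 1.5 × IQR from the quartiles (the default whisker rule in seaborn and matplotlib). A minimal sketch that computes the same fences numerically; `iqr_bounds` is a hypothetical helper and the toy durations are illustrative only.

```python
import pandas as pd


def iqr_bounds(values: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Lower and upper fences Q1 - k*IQR and Q3 + k*IQR (box plot whisker rule)."""
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Toy durations in seconds (illustrative only)
durations = pd.Series([0, 2, 4, 6, 8, 10, 40, 97, 120, 3000])
low, high = iqr_bounds(durations)
outliers = durations[(durations < low) | (durations > high)]
print(f"Fences: {low:.1f} .. {high:.1f} s; outliers: {outliers.tolist()}")
```

With the quartiles reported above (4 and 97 seconds), this rule puts the upper fence at 97 + 1.5 × 93 = 236.5 seconds, far below the 3,000-second cut-off, which is why the box plot flags many calls as outliers.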

In [None]:
columns = ['Call Duration (in seconds)']
analyze_numeric_field(calls, columns, 'Calls')

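### Outlier removal and analysis of long calls  
The next cell removes calls longer than a threshold of about 1,369 seconds, which matches the mean plus three standard deviations; the cell after it examines the calls that last more than 5,400 seconds (1.5 hours).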

In [None]:
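# Outlier threshold: mean + 3 standard deviations. With the statistics above
# (164.83 + 3 * 401.27 ≈ 1,368.6 s) this approximately reproduces the
# previously hard-coded value 1368.633444.
duration = calls['Call Duration (in seconds)']
threshold = duration.mean() + 3 * duration.std()
df_cleaned = calls[duration <= threshold]

print(f"Outlier threshold: {threshold:.2f} seconds")
print(f"Number of records before outlier removal: {len(calls)}")
print(f"Number of records after outlier removal: {len(df_cleaned)}")

print("\nDescriptive statistics of the cleaned data:")
print(df_cleaned['Call Duration (in seconds)'].describe())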

In [None]:
# Conversion of duration to minutes for readability
calls['Call Duration (in minutes)'] = calls['Call Duration (in seconds)'] / 60

print(calls[calls['Call Duration (in seconds)'] > 5400][['Call Type', 'Call Duration (in minutes)', 'Call Status', 'Call Start Time', 'Call Owner Name', 'Scheduled in CRM', 'Outgoing Call Status']])

Seven calls last longer than 1.5 hours each. Such unusually long calls may reflect complex cases that required detailed discussion, or problems with ending the call.  
- **Call types:** 6 of the 7 calls are outgoing and 1 is incoming, which may suggest that the company itself initiates these lengthy conversations. The outgoing calls have the status "Completed", i.e. they ended normally. The incoming call has the status "Unknown", a placeholder assigned to missing values during data cleaning.  
- **Call owners:** the calls belong to several employees (Sam Young, Eva Kent, John Doe, Charlie Davis, and Victor Barnes), so long calls are not specific to a single person.  
- **Conclusion:** analyze the reasons for such long calls and check whether they are associated with specific products or services that may need improvement or additional documentation.

## Analysis of categorical fields in the 'Calls' DataFrame

In [None]:
print("\n=== Analysis of categorical fields in Calls ===")
categorical_fields_calls = ['Call Owner Name', 'Call Type', 'Call Status', 'Outgoing Call Status', 'Scheduled in CRM']
for field in categorical_fields_calls:
    analyze_categorical_field(calls, field, "calls")

### Conclusions:
Based on the charts above, the following conclusions can be drawn:

- The distribution of the "Call Status" and "Call Type" values shows that most calls were completed successfully: 76.0% of calls have the status "Attended Dialled", meaning the connection was established and completed. In addition, 91.4% of calls are in the "Outbound" category, i.e. they were outgoing.

- At the same time, there is a significant share of unanswered calls: 16.7% have the status "Unattended Dialled", meaning the called party did not answer. Another 6.2% of calls were missed and have the status "Missed".

- On the positive side, only a negligible share of calls was overdue or delayed: the statuses "Overdue" and "Scheduled Unattended Delay" account for 0.0% after rounding.

- The distribution of "Call Owner Name" shows that a large part of the calls (about 39%) is handled by just the 5 most active operators, while the remaining operators are involved considerably less.

- Overall, the call-handling process appears to work effectively for the most part, but it needs some improvements to reduce the share of unanswered and missed calls.
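The conclusions above look at "Call Status" and "Call Type" separately. A row-normalised cross-tabulation shows the status mix within each call type. A minimal sketch with `pd.crosstab(..., normalize='index')`; the toy frame (including the 'Inbound' label) is illustrative only, and on the real data the call would be `pd.crosstab(calls['Call Type'], calls['Call Status'], normalize='index')`.

```python
import pandas as pd

# Toy calls (illustrative only)
toy_calls = pd.DataFrame({
    'Call Type': ['Outbound', 'Outbound', 'Outbound', 'Outbound', 'Inbound', 'Inbound'],
    'Call Status': ['Attended Dialled', 'Attended Dialled', 'Unattended Dialled',
                    'Attended Dialled', 'Attended Dialled', 'Missed'],
})

# normalize='index' turns counts into shares within each row (each row sums to 1)
status_by_type = pd.crosstab(toy_calls['Call Type'], toy_calls['Call Status'], normalize='index')
print((status_by_type * 100).round(1))
```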

# Spend

In [None]:
spend.info()

In [None]:
display(spend.describe())

## Analysis of numerical fields in the 'Spend' DataFrame

**Number of records**: 19,862, a sufficiently large volume for reliable analysis.  
**Data period**: from July 3, 2023, to June 21, 2024 (just under 12 months), covering nearly a full annual cycle.  
**Mean date**: January 10, 2024, 18:21:56. The midpoint of the period is December 27, 2023, so the mean is only about two weeks later: records are spread fairly evenly over time, with slightly more of them in the later months.

- **Impressions field (number of ad impressions)**
  - Mean: 2,571.70.  
  - Median: 82.00, far below the mean, indicating a right-skewed distribution with outliers.  
  - Minimum: 0, so some records have no impressions (e.g., inactive campaigns).  
  - Maximum: 431,445.00, a very high figure pointing to large campaigns or outliers.  
  - Quantiles: 25%: 1, 50%: 82, 75%: 760.75. Three quarters of the records have at most 760.75 impressions, far below the mean, which confirms the presence of outliers.  
  - Standard deviation: 11,691.23, about 4.5 times the mean, highlighting large deviations and the presence of anomalies.  
  - Range: 431,445.00 - 0 = 431,445.00.  

**Conclusion**: the distribution is heavily right-skewed by a few very large campaigns (up to 431,445 impressions). Most records have low or moderate values (up to 760.75), so typical values are better described by the median or analyzed after filtering outliers.

- **Spend field (advertising costs)**
  - Mean: 7.53, a relatively low average expenditure.  
  - Median: 0.74, about a tenth of the mean, indicating that campaigns with minimal costs predominate.  
  - Minimum: 0.00, so some records have no costs (inactive or test campaigns).  
  - Maximum: 774.00, a significant amount indicating large investments.  
  - Quantiles: 25%: 0.00, 50%: 0.74, 75%: 6.16. At least a quarter of the records have zero spend, and three quarters have at most 6.16, confirming a low typical expenditure.  
  - Standard deviation: 27.33, high relative to the mean, indicating outliers.  
  - Range: 774.00 - 0.00 = 774.00.  

**Conclusion**: expenditures are right-skewed, with mostly low values (median 0.74) and rare large expenses (up to 774). This may indicate a testing strategy or a focus on small campaigns with infrequent large investments.

- **Clicks field (number of clicks)**
  - Mean: 25.10.  
  - Median: 2.00, far below the mean, indicating a skewed distribution.  
  - Minimum: 0.00, so some records have no clicks (e.g., impressions without interaction).  
  - Maximum: 2,415.00, a very high figure pointing to exceptionally successful campaigns or outliers.  
  - Quantiles: 25%: 0.00, 50%: 2.00, 75%: 13.00. Three quarters of the records have at most 13 clicks.  
  - Standard deviation: 87.03, a high value confirming the presence of outliers.  
  - Range: 2,415.00 - 0.00 = 2,415.00.  

**Conclusion**: the distribution of clicks is heavily right-skewed by rare campaigns with many clicks (up to 2,415). Most records show low activity (median 2), which calls for an analysis of campaign effectiveness.

**Relationship between fields**
- CTR (click-through rate): 25.10 / 2,571.70 × 100 ≈ 0.98%. When both means are taken over the same records (no missing values), this ratio of means equals total clicks / total impressions, i.e. the aggregate CTR rather than the average per-record CTR. It is slightly below the 1-2% range often quoted as typical, which may indicate low user engagement.  
- Cost per click (CPC): 7.53 / 25.10 ≈ 0.30, i.e. total spend / total clicks. This is relatively low for advertising campaigns but should be checked for the influence of outliers.  

**Conclusion**: the low CTR and CPC may be driven by the many records with zero or minimal clicks and costs, so the data should be segmented (for example, by source or campaign).  
**Summary**:  
The data show an uneven distribution of advertising activity, with mostly low metrics and rare large campaigns. The low CTR and CPC require further analysis of effectiveness, and outliers and zero values need to be checked.
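A minimal sketch that computes these metrics directly from the records: aggregate CTR and CPC, plus the median per-record CTR, skipping records where the denominator is zero. `ad_metrics` is a hypothetical helper and the toy frame is illustrative only; on the real data it would be called as `ad_metrics(spend)`.

```python
import numpy as np
import pandas as pd


def ad_metrics(df: pd.DataFrame) -> dict[str, float]:
    """Aggregate CTR (%) and CPC, plus the median per-record CTR (%)."""
    total_impr = df['Impressions'].sum()
    total_clicks = df['Clicks'].sum()
    total_spend = df['Spend'].sum()
    # Per-record CTR is undefined when Impressions == 0, so those records become NaN
    row_ctr = df['Clicks'] / df['Impressions'].replace(0, np.nan) * 100
    return {
        'aggregate_ctr_pct': total_clicks / total_impr * 100 if total_impr else np.nan,
        'aggregate_cpc': total_spend / total_clicks if total_clicks else np.nan,
        'median_row_ctr_pct': row_ctr.median(),
    }


# Toy data (illustrative only)
toy_spend = pd.DataFrame({
    'Impressions': [1000, 100, 0, 900],
    'Clicks': [10, 5, 0, 5],
    'Spend': [3.0, 1.5, 0.0, 1.5],
})
print(ad_metrics(toy_spend))
```

Comparing the aggregate CTR with the median per-record CTR shows whether a few large campaigns dominate the overall figure.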

#### Histogram
- **Objective**: to study the distribution of values for `Impressions`, `Spend`, and `Clicks`.  
- **Reason**: the data have a high standard deviation and are skewed (the median is far below the mean).  
- **What to look for**: peaks of the distribution, outliers (for example, values beyond Q3 + 1.5 × IQR), and the proportion of zero values.  

I use `sns.histplot()` with a logarithmic x-axis for all three fields to reduce the visual impact of large values (for example, 431,445 impressions and 2,415 clicks). Zero values cannot be placed on a logarithmic axis, so their share is reported separately.

In [None]:
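# Log-scaled histograms. Zero values cannot be shown on a log axis, so only
# positive values are plotted and the share of zeros is printed separately.
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for ax, col in zip(axes, ['Impressions', 'Spend', 'Clicks']):
    zero_share = (spend[col] == 0).mean() * 100
    print(f"{col}: {zero_share:.1f}% zero values (not shown on the log scale)")
    sns.histplot(data=spend[spend[col] > 0], x=col, ax=ax, log_scale=True)
    ax.set_title(f'Distribution of {col} (values > 0)')
plt.tight_layout()
plt.show()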

#### Correlation Heatmap
- **Purpose**: to evaluate the correlation between `Impressions`, `Spend`, and `Clicks`.  
- **Reason**: the low CTR and CPC call for checking linear dependencies between these fields.  
- **What to look for**: strong correlations, especially between `Spend` and `Clicks`.  

In [None]:
plt.figure(figsize=(8, 6))
sns.heatmap(spend[['Impressions', 'Spend', 'Clicks']].corr(), annot=True, cmap='YlOrRd', vmin=-1, vmax=1)
plt.title('Correlation between Impressions, Spend and Clicks')
plt.tight_layout()
plt.show()
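Pearson correlation (the default of `DataFrame.corr`) is sensitive to the extreme values noted above. As a robustness check, a rank-based Spearman correlation can be computed with `corr(method='spearman')`, e.g. `spend[['Impressions', 'Spend', 'Clicks']].corr(method='spearman')`. The toy frame below only illustrates how one extreme value affects the two measures.

```python
import pandas as pd

# Toy data: a monotonic relation with one extreme value (illustrative only)
toy = pd.DataFrame({
    'Impressions': [1, 2, 3, 4, 5, 1000],
    'Clicks': [0, 1, 1, 2, 3, 4],
})

# Pearson uses the raw values; Spearman uses ranks, so the extreme value weighs less
pearson = toy.corr(method='pearson').loc['Impressions', 'Clicks']
spearman = toy.corr(method='spearman').loc['Impressions', 'Clicks']
print(f"Pearson: {pearson:.2f}, Spearman: {spearman:.2f}")
```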

#### Bar Plot by Categories  
- **Purpose**: analysis by sources or campaigns.  
- **What to observe**: differences in the effectiveness of sources.

In [None]:
plt.figure(figsize=(10, 6))
sorted_sources = spend.groupby('Source', observed=False)['Clicks'].sum().sort_values(ascending=False).index
# errorbar=None: a bootstrapped interval around a sum is not meaningful here
sns.barplot(data=spend, x='Source', y='Clicks', estimator=sum, order=sorted_sources, errorbar=None)
plt.title('Total clicks by source')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()


### Conclusions

After analyzing the presented diagrams, the following conclusions can be made:

1. Distribution of Impressions, Spend, and Clicks (logarithmic x-axis, so zero values are not shown):
   - The distribution of Impressions has two distinct peaks. The main peak lies in the range of 1 to 10 impressions, where the largest number of records is concentrated (approximately 800 to 1,000 records per bin). A second noticeable peak is around 100 impressions. Beyond that, the number of records gradually decreases, although some campaigns reach hundreds of thousands of impressions (up to 431,445).
   - The distribution of Spend has its main peak in the range of 0.1 to 1 unit of expenditure, with the largest number of records (about 800 to 1,000). There is also a slight rise around 10 units. Rare campaigns spend up to several hundred units (maximum 774). Most campaigns therefore incur minimal costs, while a small number have large budgets.
   - The distribution of Clicks declines steeply. The largest number of records (approximately 1,400 to 1,600) falls in the range of 1 to 10 clicks, which makes up the bulk of the data. After that, the number of records drops rapidly, although some campaigns reach thousands of clicks (maximum 2,415). Most campaigns therefore receive few clicks, with rare highly active exceptions.

2. Correlation between Impressions, Spend, and Clicks (Pearson):
   - Impressions and Clicks have a strong positive correlation (0.89): records with more impressions tend to have more clicks.
   - Spend and Clicks have a moderate positive correlation (0.59): higher advertising expenditure is associated with more clicks, although correlation alone does not show that spending causes the clicks.
   - Impressions and Spend have a moderate positive correlation (0.53): more impressions are associated with higher advertising expenditure.

3. Distribution of metrics by sources:
   - The highest total number of clicks comes from Google Ads, followed by YouTube Ads and Facebook Ads. These channels deliver the largest click volume; judging their effectiveness would also require relating clicks to spend.
   - Significant volumes of clicks are also generated through CRM systems, bloggers, and SMM promotion.
   - Other sources, such as TikTok Ads, Telegram, webinars, offline activities, and others, contribute significantly less to the total number of clicks.

## Analysis of categorical fields in the Spend dataframe
- **Source**: the channel on which the advertisement was shown  
- **Campaign**: the campaign within which the advertisement was shown

In [None]:

print("\n=== Analysis of categorical fields in the Spend dataframe ===")
categorical_fields_spend = ['Source', 'Campaign']
for field in categorical_fields_spend:
    analyze_categorical_field(spend, field, "spend")

### Conclusions:

Based on the analysis of the charts above, the following conclusions can be made:
- **Facebook Ads** is the main channel, accounting for the majority (58.1%) of all records in the Spend table.
- Among the campaigns, a few large ones stand out (for example, 'performancemax_eng_DE', 'b_DE'). However, the majority of records have the campaign marked as **Unknown**, indicating that tracking and analytics need improvement.
- Among the identified campaigns, there is a noticeable focus on video formats, webinars, and targeting specific audiences (for example, campaigns with **recentlymoved**, **LAL1**, and **women** in their names).
- Overall, the data show a strong dependence on a few major channels (Facebook, TikTok, YouTube), and smaller sources need optimization to become more effective.
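`analyze_categorical_field` counts rows, not amounts, so the percentages on the Source chart are shares of records in the Spend table. The share of money actually spent per source can be checked by summing `Spend`. A minimal sketch; `spend_share_by_source` is a hypothetical helper, the toy frame is illustrative only, and on the real data it would be called as `spend_share_by_source(spend)`.

```python
import pandas as pd


def spend_share_by_source(df: pd.DataFrame) -> pd.DataFrame:
    """Record share vs. expense share (in %) per source, sorted by expense share."""
    grouped = df.groupby('Source', observed=True)
    result = pd.DataFrame({
        'record_share_pct': grouped.size() / len(df) * 100,
        'spend_share_pct': grouped['Spend'].sum() / df['Spend'].sum() * 100,
    })
    return result.sort_values('spend_share_pct', ascending=False).round(1)


# Toy data (illustrative only): many cheap Facebook rows, few expensive YouTube rows
toy_spend = pd.DataFrame({
    'Source': ['Facebook Ads'] * 6 + ['YouTube Ads'] * 2 + ['Google Ads'] * 2,
    'Spend': [1.0] * 6 + [10.0, 10.0] + [2.0, 2.0],
})
print(spend_share_by_source(toy_spend))
```

In the toy data, Facebook Ads has 60% of the records but only 20% of the spend, which is exactly the kind of gap the record-based chart cannot show.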

# Deals

In [None]:
deals.info()

In [None]:
display(deals.describe().T)

## Analysis of numerical fields in the Deals DataFrame

#### Conclusions:
- Total records: 21,593  

**Field Closing Date**  
- **Mean value**: June 24, 2024, 19:53:23.  
- **Median (50%)**: April 13, 2024.  
- **Range (min -> max)**: from July 3, 2023, to May 5, 2025.  
- **Conclusion**: the data cover the period from July 2023 to May 2025. The mean is more than two months later than the median, so the distribution is right-skewed by a tail of later closing dates; dates after June 2024 may be planned rather than actual closing dates.  

**Field SLA (Service Level)**  
- **Mean value**: 0 days 23:08:40.  
- **Median (50%)**: 1 day 01:45:35.  
- **Range (min -> max)**: from 0 to 311 days 10:34:24.  
- **Conclusion**: the mean (about 23 hours) is even slightly below the median (about 26 hours), which suggests that, despite extreme values of up to 311 days, the bulk of SLA values lies around one day. The extreme values require further analysis.  

**Field Course duration**  
- **Mean value**: 1.69 months.  
- **Median (50%)**: 0 months.  
- **Mode**: 0 months.  
- **Range (min -> max)**: from 0 to 11 months.  
- **Conclusion**: most deals have a course duration of 0 months, which may correspond to dropouts, unsuccessful deals, or deals without a selected course. Deals with an 11-month course are the second most common; 6-month courses are less common.
  
**Field Months of study**  
- **Mean value**: 0.21 months.  
- **Median (50%)**: 0 months.  
- **Mode**: 0 months.  
- **Range (min -> max)**: from 0 to 11 months.  
- **Conclusion**: most students either did not start studying or studied only very briefly.  

**Field Initial Amount Paid**  
- **Mean value**: 184.09.  
- **Median (50%)**: 0.  
- **Mode**: 0.  
- **Range (min -> max)**: from 0 to 11,500.  
- **Conclusion**: at least half of the deals have no initial payment, consistent with a large share of unsuccessful deals.

**Field Offer Total Amount**  
- **Mean value**: 1390.81.  
- **Median (50%)**: 0.  
- **Mode**: 0.  
- **Range (min -> max)**: from 0 to 11,500.  
- **Conclusion**: at least half of the records (median 0) have a zero offer amount, confirming the presence of unsuccessful deals or dropouts.  
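Since the median and mode are 0 for several of these fields, the exact share of zeros is a more telling statistic than the mean. A minimal sketch; `zero_share` is a hypothetical helper, the toy frame is illustrative only, and on the real data it would be called as `zero_share(deals, ['Course duration', 'Months of study', 'Initial Amount Paid', 'Offer Total Amount'])`.

```python
import pandas as pd


def zero_share(df: pd.DataFrame, columns: list[str]) -> pd.Series:
    """Percentage of non-null values equal to 0 in each column."""
    return pd.Series(
        {col: (df[col].dropna() == 0).mean() * 100 for col in columns}
    ).round(1)


# Toy data (illustrative only); None becomes NaN and is excluded
toy_deals = pd.DataFrame({
    'Months of study': [0, 0, 0, 2, 11],
    'Offer Total Amount': [0, 0, 1500, 3000, None],
})
print(zero_share(toy_deals, ['Months of study', 'Offer Total Amount']))
```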

In [None]:
# SLA is a timedelta column, so it is left out of these numeric plots
numeric_fields_deals = ['Months of study', 'Initial Amount Paid', 'Offer Total Amount', 'Course duration']
analyze_numeric_field(deals, numeric_fields_deals, 'Deals')
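SLA is not included in the plots above, presumably because it is a timedelta column. A minimal sketch that converts it to hours with `Series.dt.total_seconds()` so it can be plotted like the other fields; it assumes `SLA` has dtype `timedelta64`, and the toy values (taken from the summary above plus one made-up value) are illustrative only. On the real data, `deals['SLA']` would replace `sla`.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy SLA values (illustrative only); on the real data use deals['SLA']
sla = pd.Series(pd.to_timedelta(['0 days 23:08:40', '1 days 01:45:35',
                                 '0 days 00:30:00', '311 days 10:34:24']))

# timedelta -> float hours, so the histogram and box plot can use it
sla_hours = sla.dt.total_seconds() / 3600

fig, axes = plt.subplots(1, 2, figsize=(15, 4))
sla_hours.hist(ax=axes[0], grid=False, edgecolor='black')
axes[0].set_xlabel('SLA (hours)')
axes[0].set_ylabel('Count')
axes[1].boxplot(sla_hours.dropna())
axes[1].set_ylabel('SLA (hours)')
plt.tight_layout()
plt.show()
```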

## Analysis of categorical fields in the Deals DataFrame

In [None]:
print("=== Analysis of categorical fields in the Deals DataFrame. ===")
categorical_fields_deals = ['Quality', 'Stage', 'Source', 'Product']
for field in categorical_fields_deals:
    analyze_categorical_field(deals, field, "deals")

## Conclusions

#### 1. **Distribution of the "Quality" Field Values**
- **Data:**
  - The largest number of deals belongs to the categories "E - Non Qualified" (35.4%) and "D - Non Target" (29.0%).
  - Higher-quality deals ("A - High" and "B - Medium") together make up only 9.2%.
- **Conclusions:**
  - The predominance of "Non Qualified" and "Non Target" categories indicates inefficiency in current lead generation methods. A detailed audit of marketing channels and qualification criteria may be necessary to increase the share of high-quality leads.

---

#### 2. **Distribution of the "Stage" Field Values**
- **Data:**
  - 72.9% of deals are at the "Lost" stage.
  - Only 4% of deals are completed with payment ("Payment Done").
  - The "Call Delayed" (10.4%) and "Registered" (9.6%) stages indicate some activity.

- **Conclusions:**
  - The high percentage of lost deals (the "Lost" stage) indicates serious issues in the process of converting leads into customers. It is necessary to analyze each step in the sales funnel, especially the transitions between the "Call Delayed," "Registered on Webinar," and final payment stages.

---

#### 3. **Distribution of the "Product" Field Values**
- **Data:**
  - 83.4% of deals fall into the "Unknown" category.
  - The largest number of deals with known products is associated with "Digital Marketing" (9.2%) and "UX/UI Design" (4.7%).

- **Conclusions:**
  - The high share of the "Unknown" category indicates the need for improvements in the CRM system for product tracking.
  - The popularity of "Digital Marketing" and "UX/UI Design" products can be leveraged to strengthen marketing campaigns.

### Correlation Heatmap

**Objective:** Assess the correlation between variables related to courses: number of months of study, initial payment amount, total offer amount, and course duration.

**Reason:** Understanding the relationships between these variables can help analyze the factors affecting the financial outcomes of courses and identify potential areas for improvement.

**What to look for:** Pay attention to strong correlations, especially between variables such as "Initial Amount Paid" and "Offer Total Amount," as well as any unexpected dependencies that may indicate interesting trends or anomalies in the data.

In [None]:
# Correlation heatmap
corr_matrix = deals[['Months of study', 'Initial Amount Paid', 'Offer Total Amount', 'Course duration']].corr()
plt.figure(figsize=(8, 6))
sns.heatmap(corr_matrix, annot=True, cmap='YlOrRd', vmin=-1, vmax=1)
plt.title('Correlation heatmap')
plt.show()

### Multidimensional Analysis

- **High positive correlation** between *course duration* ("Course duration") and *total offer amount* ("Offer Total Amount").

- **Moderate positive correlation** between *initial amount paid* ("Initial Amount Paid") and *total offer amount* ("Offer Total Amount").

- **Weak positive correlation** between *number of months of study* ("Months of study") and *initial amount paid* ("Initial Amount Paid").

# Supplementing the analysis
## Reasons for losses

**Goal:** Visualize the distribution of reasons for deal losses to identify the main issues leading to customer loss and determine areas for improvement.

**Reason:** Understanding the key reasons for deal losses will allow us to focus efforts on the most significant areas and develop effective strategies to reduce losses and improve conversion rates.

**What to look at:** Pay attention to the most common reasons for deal losses displayed as a bar chart. Special attention should be given to the reasons with the highest number of lost deals, as they represent the greatest opportunity for improvement. Also, note any unexpected or unusual reasons that may indicate interesting trends or anomalies in the data.

In [None]:
# Reasons for loss (matplotlib and seaborn are imported at the top of the notebook)

# Filtering deals at the 'Lost' stage
lost_deals = deals[deals['Stage'] == 'Lost']
if 'Lost Reason' in deals.columns:
    lost_reasons = lost_deals['Lost Reason'].value_counts()
    lost_reasons = lost_reasons[lost_reasons > 0]  # drop unused categories of a categorical column
    labels = lost_reasons.index.astype(str)  # string labels keep the bars in value_counts order
    total_lost_deals = lost_deals.shape[0]

    plt.figure(figsize=(8, 6))
    ax = sns.barplot(x=lost_reasons.values, y=labels, hue=labels, palette='Reds_r', dodge=False, legend=False)

    # Annotate each bar with its count (after drawing, so labels match bar positions)
    for i, v in enumerate(lost_reasons.values):
        ax.text(v, i, f' {v}', color='black', ha='left', va='center')

    plt.title(f'Distribution of Loss Reasons (Total Lost Deals: {total_lost_deals})')
    plt.xlabel('Number of Deals')
    plt.ylabel('Loss Reasons')
    plt.show()
else:
    print("Column 'Lost Reason' is not present in the deals DataFrame")


### Conclusions from the "Distribution of Loss Reasons" chart

### Main Reason for Losses
- **"Doesn't Answer"** is the most common reason, with over 4,000 cases. This points to a communication problem during follow-up or after the first contact, which requires immediate attention.

### Significant Reasons
- **"Changed Decision"**: around 2,000 cases. This may indicate customer uncertainty or insufficient conviction in the product/service.
- **"Stopped Answering"**: about 1,500 cases, pointing to lost customer interest or weak engagement from the company.
- **"Invalid Number"**: approximately 1,200 cases. This highlights a problem with the quality of lead contact data.

### Product Perception Issues
- **"Expensive"** and **"Conditions are not suitable"**: around 1,000 and 800 cases respectively. This indicates a mismatch between customer expectations and the company's offerings, particularly regarding price and terms.

### Language Barrier and Qualification
- **"Does not speak English"**: about 700 cases, so the language barrier is a significant problem.
- **"Does not know how to use a computer"**: around 400 cases. Some leads do not meet the technical requirements, which indicates insufficient qualification.

### Rare Reasons
- Reasons such as **"Refugee,"** **"Thought for free,"** and **"The contract did not fit"** (fewer than 200 cases) do not have a significant impact but may be considered for niche segments.
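The counts above are read off the chart and are approximate. A minimal sketch that returns exact counts and shares of lost deals per reason; it assumes the `Stage` and `Lost Reason` columns used in the plotting cell, `loss_reason_table` is a hypothetical helper, and the toy frame is illustrative only (on the real data: `loss_reason_table(deals)`).

```python
import pandas as pd


def loss_reason_table(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of all lost deals per 'Lost Reason', most common first."""
    lost = df.loc[df['Stage'] == 'Lost', 'Lost Reason']
    counts = lost.value_counts()
    counts = counts[counts > 0]  # drop unused categories if the column is categorical
    return pd.DataFrame({
        'count': counts,
        'share_pct': (counts / len(lost) * 100).round(1),  # denominator: all lost deals
    })


# Toy data (illustrative only)
toy_deals = pd.DataFrame({
    'Stage': ['Lost', 'Lost', 'Lost', 'Lost', 'Payment Done'],
    'Lost Reason': ["Doesn't Answer", "Doesn't Answer", 'Expensive', 'Invalid Number', None],
})
print(loss_reason_table(toy_deals))
```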

### Recommendations:
- **Critical Communication Issue:** The more than 4,000 cases of "Doesn't Answer" and 1,500 "Stopped Answering" indicate that current customer interaction processes are ineffective. This is a key growth point—improving communication could significantly reduce losses. Research is needed to understand why customers are not responding and to develop a strategy to enhance engagement.
- **Optimize Lead Qualification Process:** The high number of "Invalid Number" (1,200) and "Does not know how to use a computer" (400) indicates the need for stricter lead qualification at the attraction stage. It is recommended to verify customer contact details and their alignment with the target audience at early stages.
- **Consider Multilingual Support:** about 700 deals were lost because the lead does not speak English; if this segment is large enough to matter, support in additional languages could noticeably improve conversion rates.
- **Work on Pricing Proposal:** The reason "Expensive" may be related to the perception of the product's value. It is advisable to review pricing policy or emphasize the benefits.