#### Exploratory Data Analysis (EDA)

Dataset: 

- _xxx_feature.csv_

Author: Luis Sergio Pastrana Lemus  
Date: 202Y-MM-DD

# Exploratory Data Analysis – xxx Activity Dataset

## __1. Libraries__.

In [None]:
from pathlib import Path
import sys

# Define the project root dynamically: take the notebook's current working directory and move one level up
project_root = Path.cwd().parent

# Add the project root to sys.path if it is not already there, so that the src package can be imported
if str(project_root) not in sys.path:
    sys.path.append(str(project_root))

# Import the project's helper functions (wildcard import; list the names explicitly for tighter control)
from src import *


from IPython.display import display, HTML
import os
import numpy as np  # needed for np.arange in the plotting cells
import pandas as pd

## __2. Path to Data file__.

In [None]:
# Build the path to the data file and load it
data_file_path = project_root / "data" / "processed" / "feature"

df_xxx_feature = load_dataset_from_csv(data_file_path, "xxx_feature.csv", sep=',', header='infer')

## __3. Exploratory Data Analysis__.

### 3.0 Casting Data types.

In [None]:
df_ = cast_datatypes(df_xxx_feature)

In [None]:
df_.info()

### 3.1 Descriptive Statistics.

#### 3.1.1 Descriptive statistics for Original datasets.

In [None]:
# Descriptive statistics for xxx dataset
df_xxx_feature.describe(include='all')

#### 3.1.2 Descriptive statistics for name dataset, quantitative values.

<table>
  <thead>
    <tr>
      <th>CV (%)</th>
      <th>Interpretation for Coefficient of Variation</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><small><strong>0–10%</strong></small></td>
      <td><small><strong>Very low</strong> variability → <strong>very reliable</strong> Mean</small></td>
    </tr>
    <tr>
      <td><small><strong>10–20%</strong></small></td>
      <td><small><strong>Moderate</strong> variability → <strong>reliable</strong> Mean</small></td>
    </tr>
    <tr>
      <td><small><strong>20–30%</strong></small></td>
      <td><small><strong>Considerable</strong> variability → <strong>somewhat skewed</strong> Mean</small></td>
    </tr>
    <tr>
      <td><small><strong>&gt;30%</strong></small></td>
      <td><small><strong>High</strong> variability → <strong>prefer</strong> Median</small></td>
    </tr>
  </tbody>
</table>
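
The bands in the table above can be turned into a small check. The following is a minimal sketch, not the implementation of `evaluate_central_trend` (which lives in `src` and is not shown here): the function names `coefficient_of_variation` and `suggest_central_measure` and the sample values are hypothetical, and the sample standard deviation (`ddof=1`) is an assumption.

```python
import pandas as pd


def coefficient_of_variation(values: pd.Series) -> float:
    """Return the coefficient of variation (sample std / |mean|) as a percentage."""
    mean = values.mean()
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return float(values.std(ddof=1) / abs(mean) * 100)


def suggest_central_measure(cv_percent: float) -> str:
    """Map a CV percentage to the interpretation bands of the table above."""
    if cv_percent <= 10:
        return "very low variability: the mean is very reliable"
    if cv_percent <= 20:
        return "moderate variability: the mean is reliable"
    if cv_percent <= 30:
        return "considerable variability: the mean is somewhat skewed"
    return "high variability: prefer the median"


# Hypothetical sample values, for illustration only
sample_values = pd.Series([10.0, 12.0, 11.0, 13.0, 9.0])
cv = coefficient_of_variation(sample_values)
print(f"CV = {cv:.1f}% -> {suggest_central_measure(cv)}")
```

Values that fall exactly on a band edge (10, 20, 30) are assigned to the lower band here; the table does not say which band owns the edges, so that choice is an assumption.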


In [None]:
df_xxx_feature['column_name'].describe()

In [None]:
# Evaluate the coefficient of variation to select the proper measure of central tendency
evaluate_central_trend(df_xxx_feature, 'column_name')

In [None]:
# Evaluate boundary thresholds and detect potential outliers
outlier_limit_bounds(df_xxx_feature, 'column_name', bound='both', clamp_zero=True)
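
The implementation of `outlier_limit_bounds` is part of `src` and is not shown in this notebook. As a rough illustration of how boundary thresholds are commonly derived, the sketch below uses Tukey's interquartile-range fences. The function name `iqr_bounds`, the multiplier `k=1.5`, the reading of `clamp_zero` as "do not let the lower bound drop below zero", and the sample values are all assumptions.

```python
import pandas as pd


def iqr_bounds(values: pd.Series, k: float = 1.5, clamp_zero: bool = False):
    """Return (lower, upper) Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    if clamp_zero:
        # Assumed meaning: a negative lower bound is raised to zero
        lower = max(lower, 0.0)
    return float(lower), float(upper)


# Hypothetical sample values, for illustration only
sample_series = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 100])
lower, upper = iqr_bounds(sample_series, clamp_zero=True)

# Values outside the fences are flagged as potential outliers
potential_outliers = sample_series[(sample_series < lower) | (sample_series > upper)]
print(lower, upper, potential_outliers.tolist())
```

Clamping the lower fence at zero is useful for quantities that cannot be negative, such as counts or durations.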

In [None]:
# Show data distribution with detailed statistical info
plot_distribution_dispersion(df_, 'column_name', bins=43)

### 3.2 Data Visualization: Distributions and Relationships.

#### 3.2.1 Covariance and Correlation Analysis.

##### 3.2.1.1 Covariance Matrix.

In [None]:
# Covariance between the selected columns
df_xxx_feature[['column_name', 'column_name']].cov()

##### 3.2.1.2 Correlation Matrix.

| Correlation Value     | Interpretation                |
| --------------------- | ----------------------------- |
| `+0.7` to `+1.0`      | Strong positive correlation   |
| `+0.3` to `+0.7`      | Moderate positive correlation |
| `0.0` to `+0.3`       | Weak positive correlation     |
| `0`                   | No correlation                |
| `-0.3` to `0`         | Weak negative correlation     |
| `-0.7` to `-0.3`      | Moderate negative correlation |
| `-1.0` to `-0.7`      | Strong negative correlation   |
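
The table above can be applied programmatically to a Pearson coefficient. This is a minimal sketch, separate from the project's `evaluate_correlation` helper (whose implementation is not shown here): the function name `correlation_strength` and the demo DataFrame are hypothetical, and because the table's ranges share their edges, assigning `0.3` and `0.7` to the stronger band is an assumption.

```python
import pandas as pd


def correlation_strength(r: float) -> str:
    """Label a correlation coefficient using the bands of the table above."""
    if r == 0:
        return "No correlation"
    direction = "positive" if r > 0 else "negative"
    magnitude = abs(r)
    if magnitude >= 0.7:
        strength = "Strong"
    elif magnitude >= 0.3:
        strength = "Moderate"
    else:
        strength = "Weak"
    return f"{strength} {direction} correlation"


# Hypothetical data, for illustration only
demo = pd.DataFrame({"x": [1, 2, 3, 4, 5],
                     "y": [2, 4, 6, 8, 10],
                     "z": [5, 4, 3, 2, 1]})

# Series.corr computes the Pearson coefficient by default
r_xy = demo["x"].corr(demo["y"])
r_xz = demo["x"].corr(demo["z"])
print(correlation_strength(r_xy))
print(correlation_strength(r_xz))
```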


In [None]:
# Correlation between the selected columns
df_xxx_feature[['column_name', 'column_name']].corr()

In [None]:
evaluate_correlation(df_xxx_feature)

In [None]:
plot_scatter_matrix(df_xxx_feature[['column_name', 'column_name']])

### 3.3 Data Visualization: Data dispersion and outliers.

#### 3.3.1 Data dispersion and outliers for ...

In [None]:
# xxx frequency distribution and frequency density
# Note: min, max, and step are placeholders; replace them with numeric values for the chosen column
plot_frequency_density(df_xxx_feature['column_name'], bins=np.arange(min, max, step), color='grey', title='Frequency Density of name',
                       xlabel='Name (units)', ylabel='Density', xticks_range=(min, max, step), show_kde=True, rotation=0)

In [None]:
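# xxx data dispersion
# Note: min, max, and step are placeholders; replace them with numeric values for the chosen column
plot_boxplots(ds_list=[df_xxx_feature['column_name']], xlabels=['name'], ylabel='Values', title='Name Data dispersion',
              yticks_range=(min, max, step), rotation=0, color=['grey'])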

### 3.4 Data visualization for ...

#### 3.4.1 Data visualization for ...

In [None]:
# Plots for insights

## __4. Conclusions and Key Insights__.

### 🎯 Key Findings

#### Behavioral Insights

- **XXX**: xxx 

#### Other Insights

- **XXX**: xxx 

### Final Takeaways

- **XXX**: xxx 

