# Using Custom Functions from .py Files in Jupyter Notebooks

This notebook demonstrates how to import custom functions from a Python (`.py`) file and combine their output with Seaborn visualizations. It covers several ways to import functions from the `utils/data_cleaning_utils.py` file.

## 1. Import Required Libraries

First, import all the standard libraries needed for data analysis and visualization.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import sys
import os

# Set up matplotlib and seaborn styling
plt.style.use('default')
sns.set_palette("husl")
%matplotlib inline

## 2. Import Custom Module

There are several ways to import functions from a .py file. Here are the most common methods:

In [None]:
# Method 1: Import the entire module
from utils import data_cleaning_utils

# Method 2: Import specific functions
from utils.data_cleaning_utils import analyze_missing_data, safe_drop_columns

# Method 3: Import with alias
import utils.data_cleaning_utils as dcu

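# Method 4: Add the utils directory to sys.path (only if the package imports above fail;
# note that a relative path like 'utils' is resolved against the current working directory)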
# sys.path.append('utils')
# import data_cleaning_utils

print("Custom module imported successfully!")
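# List only public callables; a plain dir() would also include imported modules and constants
public_callables = [name for name in dir(data_cleaning_utils)
                    if not name.startswith('_') and callable(getattr(data_cleaning_utils, name))]
print("Available functions:", public_callables)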
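
Method 4 with a relative path such as `'utils'` only works when the notebook's working directory happens to be the project root. The following is a minimal sketch of a sturdier variant, assuming the project keeps a `utils` directory somewhere above the notebook. The helper names `find_project_root` and `add_to_sys_path` are hypothetical and are not part of `data_cleaning_utils.py`.

```python
import sys
from pathlib import Path


def find_project_root(start, marker="utils"):
    """Walk upward from `start` until a directory containing `marker` is found.

    Hypothetical helper for illustration only.
    """
    start = Path(start).resolve()
    for candidate in [start, *start.parents]:
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No '{marker}' directory found at or above {start}")


def add_to_sys_path(directory):
    """Put `directory` at the front of sys.path unless it is already listed."""
    directory = str(directory)
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return directory
```

In a notebook cell this could be used as `add_to_sys_path(find_project_root(Path.cwd()))`, after which `from utils import data_cleaning_utils` should resolve regardless of which subdirectory the notebook lives in.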

## 3. Load Sample Dataset

Load the coupons dataset to demonstrate how to use the custom functions.

In [None]:
# Load the coupons dataset
data = pd.read_csv('data/coupons.csv')

print(f"Dataset shape: {data.shape}")
print(f"Columns: {list(data.columns)}")
print("\nFirst few rows:")
data.head()

## 4. Use Custom Function for Data Analysis

Now let's use the `analyze_missing_data()` function from our custom module. It analyzes missing data patterns in the dataset and returns a dictionary of summary statistics that the following cells reuse.

In [None]:
# Call the function that was imported by name (import Method 2 above)
missing_stats = analyze_missing_data(data)

print("\n" + "="*50)
print("FUNCTION OUTPUT STORED FOR FURTHER USE")
print("="*50)
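
The notebook never shows the body of `analyze_missing_data()`. The sketch below illustrates what such a function could look like, assuming it returns the four dictionary keys that the later cells read (`missing_by_column`, `missing_by_column_pct`, `missing_percentage`, `complete_percentage`). It is inferred from those keys and is not the actual contents of `utils/data_cleaning_utils.py`. The name `analyze_missing_data_sketch` and the example DataFrame are made up.

```python
import pandas as pd


def analyze_missing_data_sketch(df):
    """Summarize missing values in a DataFrame (illustrative stand-in)."""
    if not isinstance(df, pd.DataFrame):
        raise TypeError("df must be a pandas DataFrame")

    total_cells = df.shape[0] * df.shape[1]
    missing_by_column = df.isnull().sum()
    total_missing = int(missing_by_column.sum())

    # Guard against division by zero for empty DataFrames.
    missing_pct = 100.0 * total_missing / total_cells if total_cells else 0.0
    if len(df):
        by_column_pct = missing_by_column / len(df) * 100.0
    else:
        by_column_pct = missing_by_column * 0.0

    return {
        "missing_by_column": {col: int(n) for col, n in missing_by_column.items()},
        "missing_by_column_pct": {col: float(p) for col, p in by_column_pct.items()},
        "missing_percentage": missing_pct,
        "complete_percentage": 100.0 - missing_pct,
    }


# Tiny made-up DataFrame: column "a" has 1 missing value, column "b" has 2.
example = pd.DataFrame({"a": [1, None, 3, 4], "b": ["x", "y", None, None]})
stats = analyze_missing_data_sketch(example)
print(stats["missing_by_column"])
```

Returning plain Python dictionaries rather than pandas objects keeps the result easy to print, serialize, and index by column name.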

## 5. Create Visualizations with Seaborn

Now let's use Seaborn to visualize the missing data analysis results from our custom function.

In [None]:
# Create visualizations using the output from our custom function
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# 1. Missing data heatmap
plt.subplot(2, 2, 1)
# Create a boolean mask for missing values
missing_mask = data.isnull()
sns.heatmap(missing_mask, cbar=True, cmap='viridis', yticklabels=False)
plt.title('Missing Data Heatmap\n(Light = Missing, Dark = Present)')
plt.xlabel('Columns')

# 2. Missing data by column (bar plot)
plt.subplot(2, 2, 2)
missing_counts = pd.Series(missing_stats['missing_by_column'])
missing_counts = missing_counts[missing_counts > 0]  # Only show columns with missing data
if len(missing_counts) > 0:
    sns.barplot(x=missing_counts.values, y=missing_counts.index, palette='viridis')
    plt.title('Missing Data Count by Column')
    plt.xlabel('Number of Missing Values')
else:
    plt.text(0.5, 0.5, 'No Missing Data Found', ha='center', va='center', transform=plt.gca().transAxes)
    plt.title('Missing Data Count by Column')

# 3. Missing data percentage by column
plt.subplot(2, 2, 3)
missing_pct = pd.Series(missing_stats['missing_by_column_pct'])
missing_pct = missing_pct[missing_pct > 0]  # Only show columns with missing data
if len(missing_pct) > 0:
    sns.barplot(x=missing_pct.values, y=missing_pct.index, palette='plasma')
    plt.title('Missing Data Percentage by Column')
    plt.xlabel('Percentage of Missing Values')
else:
    plt.text(0.5, 0.5, 'No Missing Data Found', ha='center', va='center', transform=plt.gca().transAxes)
    plt.title('Missing Data Percentage by Column')

# 4. Overall missing data summary (pie chart)
plt.subplot(2, 2, 4)
sizes = [missing_stats['complete_percentage'], missing_stats['missing_percentage']]
labels = ['Complete Data', 'Missing Data']
colors = ['lightblue', 'lightcoral']
plt.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%', startangle=90)
plt.title('Overall Data Completeness')

plt.tight_layout()
plt.show()

## 6. Handle Function Output

Demonstrate how to work with the dictionary output from the custom function for further analysis.

In [None]:
# Working with the function output dictionary
print("Keys in the missing_stats dictionary:")
for key in missing_stats:
    print(f"  - {key}")

print(f"\nTotal missing percentage: {missing_stats['missing_percentage']:.2f}%")
print(f"Complete data percentage: {missing_stats['complete_percentage']:.2f}%")

# Extract specific information
columns_with_missing = {k: v for k, v in missing_stats['missing_by_column'].items() if v > 0}
if columns_with_missing:
    print("\nColumns with missing data:")
    for col, count in columns_with_missing.items():
        pct = missing_stats['missing_by_column_pct'][col]
        print(f"  - {col}: {count} missing values ({pct:.1f}%)")
else:
    print("\nNo missing data found in any columns!")

# Use the output for conditional logic
if missing_stats['missing_percentage'] > 5:
    print(f"\n⚠️  WARNING: Dataset has {missing_stats['missing_percentage']:.1f}% missing data")
    print("Consider data cleaning strategies.")
else:
    print(f"\n✅ Dataset quality is good with only {missing_stats['missing_percentage']:.1f}% missing data")
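
The same dictionary can also drive column-level decisions, for example by listing the columns whose missing percentage exceeds a chosen threshold. This is a minimal sketch, assuming `missing_by_column_pct` maps column names to percentages as in the cell above. The helper name `columns_above_threshold` and the sample values are hypothetical.

```python
def columns_above_threshold(stats, threshold_pct=5.0):
    """Return (column, pct) pairs whose missing percentage exceeds the threshold,
    sorted from most to least missing."""
    pct_by_column = stats["missing_by_column_pct"]
    flagged = [(col, pct) for col, pct in pct_by_column.items() if pct > threshold_pct]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Made-up statistics for illustration; not taken from the coupons dataset.
sample_stats = {
    "missing_by_column_pct": {"col_a": 99.0, "col_b": 1.2, "col_c": 0.0, "col_d": 12.5}
}

for col, pct in columns_above_threshold(sample_stats, threshold_pct=5.0):
    print(f"{col}: {pct:.1f}% missing")
```

A list like this is a natural input for a follow-up cleaning step, such as deciding which columns to drop or impute.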

## Additional Tips for Using Custom Functions

### Key Points to Remember:

1. **Import Methods**: Choose the import method that works best for your use case:
   - `from utils import data_cleaning_utils` - Import the entire module
   - `from utils.data_cleaning_utils import function_name` - Import specific functions
   - `import utils.data_cleaning_utils as alias` - Import with alias

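2. **Module Reloading**: If you modify your .py file, reload the module so the running kernel picks up the changes. Names imported earlier with `from ... import ...` still point at the old objects and must be re-imported after the reload: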
   ```python
   import importlib
   importlib.reload(data_cleaning_utils)
   ```

3. **Path Issues**: If you get import errors, make sure:
   - The .py file is in the correct directory
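   - The directory has an `__init__.py` file (can be empty). Python 3.3+ can also import the directory as a namespace package without one, but an explicit file avoids ambiguity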
   - You're running the notebook from the correct working directory

4. **Function Dependencies**: Ensure all required libraries are imported in both:
   - The notebook (for interactive use)
   - The .py file (for the function to work)

5. **Error Handling**: Custom functions should validate their inputs, raise informative errors, and include docstrings.
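
Point 2 has a subtlety worth seeing in action: `importlib.reload()` updates the module object, but a name bound earlier with `from module import name` keeps pointing at the old function. The sketch below demonstrates this with a throwaway module written to a temporary directory. The module name `demo_reload_mod` and the function `greet` are made up for the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a throwaway module to a temporary directory (names are made up).
tmp_dir = tempfile.mkdtemp()
module_path = Path(tmp_dir) / "demo_reload_mod.py"
module_path.write_text("def greet():\n    return 'v1'\n")
sys.path.insert(0, tmp_dir)

import demo_reload_mod
from demo_reload_mod import greet  # binds the function object as it exists now

# Simulate editing the file. The new source has a different length,
# so any cached bytecode is treated as stale and recompiled.
module_path.write_text("def greet():\n    return 'version 2'\n")
importlib.invalidate_caches()
importlib.reload(demo_reload_mod)

via_module = demo_reload_mod.greet()  # attribute lookup finds the reloaded function
via_old_name = greet()                # still the function imported before the edit

from demo_reload_mod import greet     # re-import to refresh the bare name
via_new_name = greet()

print(via_module, via_old_name, via_new_name)
```

Accessing functions through the module object (or an alias such as `dcu`) avoids the stale-name problem, because the attribute lookup happens at call time.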

This approach keeps your notebooks clean while allowing you to reuse complex analysis functions across multiple projects!