# Example Notebook

This notebook demonstrates how to use Jupyter notebooks effectively in this project template. It includes examples of:

- Loading and configuring the project environment
- Importing project modules
- Basic data analysis workflow
- Data visualization
- Best practices for notebook organization

## Setup

First, let's set up our environment and imports. We'll use the autoreload extension in mode 2, which reloads imported modules before each cell runs, so edits to the project's source files take effect without restarting the kernel.

In [None]:
# Standard library imports
import os
import sys
from pathlib import Path

# Add the parent directory to the path so we can import from src.
# This assumes the notebook's working directory sits one level below the project root.
module_path = os.path.abspath('..')
if module_path not in sys.path:
    sys.path.append(module_path)

# Configure autoreload extension
%load_ext autoreload
%autoreload 2

In [None]:
# Third-party imports
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Project imports
from src.main import load_config, process_data

# Configure plotting
%matplotlib inline
plt.style.use("ggplot")
plt.rcParams["figure.figsize"] = (10, 6)

# Load the project configuration (replace the placeholder with your config file's path)
config = load_config("path/to/config/file")

## Loading Environment Variables and Configuration

It is good practice to load environment variables and define data paths near the top of the notebook, before any cell that depends on them.

In [None]:
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Define data directories
DATA_DIR = Path('../data')
RAW_DATA_DIR = DATA_DIR / 'raw'
PROCESSED_DATA_DIR = DATA_DIR / 'processed'
SAVED_DATA_DIR = DATA_DIR / 'saved'

print(f"Raw data directory: {RAW_DATA_DIR}")
print(f"Processed data directory: {PROCESSED_DATA_DIR}")
print(f"Saved data directory: {SAVED_DATA_DIR}")
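The paths defined above are only `Path` objects; nothing guarantees the directories exist on disk yet. Here is a minimal sketch of creating them up front, assuming the same `raw`/`processed`/`saved` layout. The `ensure_dirs` helper is hypothetical and not part of the project.

```python
from pathlib import Path


def ensure_dirs(base_dir, names=("raw", "processed", "saved")):
    """Create base_dir/<name> for each name and return the paths keyed by name.

    Hypothetical helper for illustration; not part of the project code.
    """
    base = Path(base_dir)
    created = {}
    for name in names:
        path = base / name
        # parents=True builds missing ancestors; exist_ok=True makes reruns safe
        path.mkdir(parents=True, exist_ok=True)
        created[name] = path
    return created
```

In this notebook it could be called as `ensure_dirs(DATA_DIR)` right after the path definitions.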

## Generate Sample Data

For this example, we'll generate some sample data to work with.

In [None]:
# Set random seed for reproducibility
np.random.seed(42)

# Generate sample time series data
dates = pd.date_range('20220101', periods=100)
values = np.cumsum(np.random.randn(100))

# Create a DataFrame
df = pd.DataFrame({
    'date': dates,
    'value': values,
    'category': np.random.choice(['A', 'B', 'C'], size=100)
})

# Display the first few rows
df.head()
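`np.random.seed` sets NumPy's global random state, which any other cell or imported module can disturb. As an alternative sketch, the same kind of data can be drawn from a local `Generator`. The `make_sample_data` helper is hypothetical, and its numbers will differ from the cell above because the underlying generator is different.

```python
import numpy as np
import pandas as pd


def make_sample_data(n=100, seed=42, start="2022-01-01"):
    """Build a random-walk time series with a random category per row.

    Hypothetical helper for illustration; not part of the project code.
    """
    # A local Generator keeps the randomness independent of global state
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "date": pd.date_range(start, periods=n),
        "value": np.cumsum(rng.standard_normal(n)),
        "category": rng.choice(["A", "B", "C"], size=n),
    })
```

Calling `make_sample_data()` twice with the same seed returns identical frames, regardless of what ran in between.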

## Data Exploration

Let's explore our sample data to understand its characteristics.

In [None]:
# Basic statistics
df.describe()

In [None]:
# Count values by category
category_counts = df['category'].value_counts()
print(category_counts)

# Plot the distribution
plt.figure(figsize=(8, 5))
category_counts.plot(kind='bar')
plt.title('Distribution of Categories')
plt.xlabel('Category')
plt.ylabel('Count')
plt.xticks(rotation=0)
plt.tight_layout()
plt.show()
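Beyond counting rows, it can help to summarize the numeric column within each category. A small sketch, assuming a frame with `category` and `value` columns like the one generated earlier (the `summarize_by_category` name is hypothetical):

```python
import pandas as pd


def summarize_by_category(frame):
    """Return count, mean and standard deviation of 'value' per category."""
    return frame.groupby("category")["value"].agg(["count", "mean", "std"])
```

In this notebook, `summarize_by_category(df)` would return one row per category.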

## Data Visualization

Now let's visualize the time series data.

In [None]:
plt.figure(figsize=(12, 6))

# Plot the time series
plt.plot(df['date'], df['value'], marker='o', linestyle='-', markersize=3, alpha=0.7)

# Customize the plot
plt.title('Sample Time Series Data')
plt.xlabel('Date')
plt.ylabel('Value')
plt.grid(True, alpha=0.3)

# Add a linear trend line fitted against the row index (z[0] = slope, z[1] = intercept)
x = np.arange(len(df))
z = np.polyfit(x, df['value'], 1)
p = np.poly1d(z)
plt.plot(df['date'], p(x), "r--", alpha=0.8, label=f"Trend: {z[0]:.4f}x + {z[1]:.4f}")

plt.legend()
plt.tight_layout()
plt.show()
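The trend line in the plot comes from a degree-1 least-squares fit. `np.polyfit` returns coefficients from the highest degree down, so the first entry is the slope and the second is the intercept. Because the fit is against the row index, the slope is the change per row (per day for this data). A small sanity check on a synthetic, noise-free line:

```python
import numpy as np

# A perfectly linear series with known slope 0.5 and intercept 2.0
x = np.arange(10)
y = 0.5 * x + 2.0

# Degree-1 fit: coefficients come back as [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)

# poly1d wraps the coefficients in a callable polynomial
trend = np.poly1d([slope, intercept])
```

On noise-free input the fit recovers the original coefficients up to floating-point error, and `trend(x)` reproduces `y`.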

## Using Project Functions

We can also use functions from our project modules in the notebook.

In [None]:
# Extract the values to process
data_to_process = df['value'].tolist()

# Use our project's process_data function
processed_data = process_data(data_to_process, scale=2.0)

# Compare original and processed data
plt.figure(figsize=(12, 6))
plt.plot(df['date'], data_to_process, label='Original Data', alpha=0.7)
plt.plot(df['date'], processed_data, label='Processed Data (scale=2.0)', alpha=0.7)
plt.title('Original vs Processed Data')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
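`process_data` is defined in the project's `src/main.py`, which this notebook does not show. Purely to illustrate the call shape used above (a list of values plus a `scale` keyword), a hypothetical stand-in might multiply each value by `scale`. The real function may behave differently.

```python
def process_data_sketch(data, scale=1.0):
    """Hypothetical stand-in for the project's process_data.

    Assumption: scaling means multiplying every value by `scale`.
    """
    return [value * scale for value in data]
```

A stand-in like this can be handy for trying out the plotting code before the project module is importable.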

## Saving Results

Finally, let's save our processed data and results.

In [None]:
# Create a new DataFrame with the processed data
results_df = df.copy()
results_df['processed_value'] = processed_data

# Calculate some additional metrics
results_df['diff'] = results_df['processed_value'] - results_df['value']
# Row-over-row relative change; the first row is NaN because it has no predecessor
results_df['pct_change'] = results_df['processed_value'].pct_change()

# Display the results
results_df.head()

In [None]:
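# Save the results to a CSV file, creating the output directory if it does not exist
output_path = SAVED_DATA_DIR / 'notebook_results.csv'
output_path.parent.mkdir(parents=True, exist_ok=True)
results_df.to_csv(output_path, index=False)
print(f"Results saved to {output_path}")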
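CSV stores dates as plain text, so a later notebook that reads this file gets strings in the `date` column unless it asks pandas to parse them. A minimal sketch of loading the results back (the `read_results` helper is hypothetical):

```python
import pandas as pd


def read_results(csv_path):
    """Load a results CSV, parsing the 'date' column back into datetimes."""
    return pd.read_csv(csv_path, parse_dates=["date"])
```

For the file written above, this would be `read_results(output_path)`.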

## Conclusion

This notebook demonstrated:

1. Setting up the notebook environment for the project
2. Loading configuration and environment variables
3. Data generation, exploration and visualization
4. Using project functions in the notebook
5. Saving results for further processing

For more advanced examples, check out the other notebooks in this project.