# Ungraded Lab: AI-Assisted Documentation

## Overview 
In this hands-on lab, you'll use AI to improve your code documentation and version control practices. Working with functions from your previous analysis work, you'll generate, evaluate, and refine documentation with ChatGPT so that your code is easier to maintain and share with collaborators.

<b>Remember:</b> The screencast covers similar examples – feel free to pause your work and review the relevant section if you need guidance.

## Learning Outcomes 
By the end of this lab, you will be able to:
- Generate clear function documentation using AI tools
- Evaluate and refine AI-generated documentation for accuracy
- Create effective commit messages using AI assistance
- Ensure your Jupyter notebook is ready for documentation updates

## Activities
### Activity 1: Basic Function Documentation

<b>Step 1:</b> Select a Function for Documentation
- Open a notebook from one of your previous labs
- Identify a data cleaning or analysis function to document

In [None]:
import pandas as pd

# Example function to document:
def process_employee_data(df):
    # Your existing code here
    return df  # Placeholder so the cell runs; replace with your own logic

<b>Tip:</b> You don't need to load a DataFrame yet. For now, just define the function; you will call it later, after loading a data file.

<b>Step 2:</b> Generate Initial Documentation

- Copy your function code 
- Ask ChatGPT: "Generate Python docstring documentation for this function following Google style guide."

In [None]:
# ChatGPT Prompt: "Generate Python docstring documentation for this function following Google style guide."
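
For reference, a Google-style docstring usually opens with a one-line summary and then lists `Args:`, `Returns:`, and, where relevant, `Raises:` sections. The sketch below shows that layout on a hypothetical helper, `fill_missing_salaries`, which is not part of this lab. Treat it as a yardstick for judging what ChatGPT returns, not as the expected answer.

```python
def fill_missing_salaries(salaries, default=0.0):
    """Replace missing salary values with a default.

    Args:
        salaries (list[float | None]): Salary values, where None marks a
            missing entry.
        default (float): Value substituted for each missing entry.

    Returns:
        list[float]: A new list with every None replaced by default.

    Raises:
        TypeError: If salaries is not a list.
    """
    if not isinstance(salaries, list):
        raise TypeError("salaries must be a list")
    return [default if value is None else value for value in salaries]


# Hypothetical sample values, used only to exercise the function.
print(fill_missing_salaries([50000.0, None, 70000.0], default=60000.0))
```

When you review the AI output for your own function, check that each section is present and that every parameter in the signature appears under `Args:`.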

### Activity 2: Refining AI-Generated Documentation

<b>Step 1:</b> Review and Enhance
- Paste the AI-generated documentation below 
- Evaluate the documentation for accuracy and completeness

In [None]:
# Paste the AI-generated documentation here.


<b>Step 2:</b> Add Examples

In [None]:
# ChatGPT Prompt: "Add usage examples to this documentation"

<b>Tip:</b> Always verify that the AI-generated examples match your function's actual behavior.
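
One way to carry out this check is to let Python's standard `doctest` module execute the `>>>` examples in a docstring and compare them with the real output. The sketch below is a minimal illustration using a hypothetical function, `normalize_label`, and a hypothetical helper, `check_docstring_examples`; neither comes from the lab itself.

```python
import doctest


def normalize_label(label):
    """Convert a label to lowercase and replace spaces with underscores.

    Examples:
        >>> normalize_label("Work Mode")
        'work_mode'
    """
    return label.lower().replace(" ", "_")


def check_docstring_examples(func):
    """Run the examples in func's docstring and return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # Give the examples access to the function under its own name.
    for test in finder.find(func, name=func.__name__, globs={func.__name__: func}):
        runner.run(test)
    results = runner.summarize(verbose=False)
    return results.failed, results.attempted


failed, attempted = check_docstring_examples(normalize_label)
print(f"{failed} of {attempted} docstring examples failed")
```

If an AI-generated example shows output that your function does not actually produce, `doctest` reports the mismatch and the failure count is nonzero. Note that `doctest` compares printed text exactly, so column spacing in printed DataFrames must match as well.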

### Activity 3: Test Your Code & Commit Message Generation

<b>Step 1:</b> Prepare Changes Summary
- Test the function using employee_insights_cleaned.csv
- List the documentation changes you've made
- Use ChatGPT to generate a commit message

In [None]:
# Test the function using employee_insights_cleaned.csv

In [None]:
# Testing the function
df = pd.read_csv('employee_insights_cleaned.csv')

# your code to test your functions 

# ChatGPT Prompt Template:

"""
I've made these changes to my code:
1. Added documentation to process_employee_data function
2. Included usage examples
Generate a clear, concise git commit message.
"""
# AI Generated Commit Message:

<b>Test Your Work:</b>
1. Run documented functions to verify accuracy
2. Review documentation with a fresh perspective
3. Verify commit message clarity
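
To make the commit message check a little more concrete, the sketch below tests a subject line against a few widely used conventions: a short subject (50 characters is a common guideline, not a Git requirement), a capitalized first word, and no trailing period. The helper name `check_commit_subject` and the chosen thresholds are assumptions for illustration; adjust them to your team's own style.

```python
def check_commit_subject(subject, max_length=50):
    """Return a list of style problems found in a commit subject line."""
    if not subject.strip():
        return ["subject is empty"]

    problems = []
    if len(subject) > max_length:
        problems.append(
            f"subject is {len(subject)} characters; aim for {max_length} or fewer"
        )
    if subject.endswith("."):
        problems.append("subject ends with a period")
    if subject[0].islower():
        problems.append("subject does not start with a capital letter")
    return problems


subject = "Add docstring and usage examples to process_employee_data function"
for problem in check_commit_subject(subject):
    print(problem)
```

An empty result means none of these particular checks failed; it does not by itself mean the message is clear, so still read it with a fresh eye.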

## Success Checklist
- Functions have clear, complete docstrings
- Documentation includes practical examples
- Commit messages are descriptive and follow best practices
- AI-generated content has been reviewed and refined

## Common Issues & Solutions 
- Problem: AI generates incorrect parameter descriptions 
    - Solution: Provide more context about your data and function purpose in the prompt
- Problem: Generated examples don't match your use case 
    - Solution: Specify your exact data types and expected outputs in the prompt
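
Both solutions come down to putting more context into the prompt. As a rough sketch, the hypothetical helper below, `build_docstring_prompt`, assembles a prompt from the function source, a one-line purpose, and the column types. The column types shown are assumptions for illustration; replace them with what `df.dtypes` reports for your data.

```python
def build_docstring_prompt(function_source, purpose, column_types):
    """Assemble a ChatGPT prompt that includes context about the function."""
    column_lines = "\n".join(
        f"- {name}: {dtype}" for name, dtype in column_types.items()
    )
    return (
        "Generate Python docstring documentation for this function "
        "following Google style guide.\n\n"
        f"Purpose: {purpose}\n\n"
        f"Input DataFrame columns and types:\n{column_lines}\n\n"
        f"Function source:\n{function_source}"
    )


# Assumed example values; substitute your own function and column types.
example_source = "def process_employee_data(df):\n    ...\n    return df"
example_columns = {"salary": "float", "department": "string", "work_mode": "string"}

prompt = build_docstring_prompt(
    example_source,
    purpose="Clean and standardize employee data",
    column_types=example_columns,
)
print(prompt)
```

Paste the printed prompt into ChatGPT in place of the one-line request, then compare the parameter descriptions it returns against your actual data.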
    
## Summary
Congratulations on completing this AI-assisted documentation lab! You've learned to leverage ChatGPT for generating and refining code documentation while maintaining high standards for clarity and accuracy. This powerful combination of AI assistance and human oversight will help you create more maintainable, collaborative code in your data science projects.

### Key Points
- AI can accelerate documentation writing but requires human verification
- Good prompts lead to better AI-generated content
- Documentation should always prioritize clarity and accuracy
- Combine AI assistance with human expertise for best results

## Solution Code
Stuck on your code or want to check your solution? Here's a complete reference implementation to guide you. This represents just one effective approach—try solving independently first, then use this to overcome obstacles or compare techniques. The solution is provided to help you move forward and explore alternative approaches to achieve the same results. Happy coding!

### Activity 1: Basic Function Documentation - Solution Code

In [None]:
# Original function without AI-generated documentation
def process_employee_data(df):
    # Add the documentation generated by ChatGPT here

    # Clean column names
    df.columns = df.columns.str.lower().str.replace(' ', '_')
    
    # Handle missing values
    df['salary'] = df['salary'].fillna(df['salary'].mean())
    df['department'] = df['department'].fillna('Unknown')
    
    # Standardize categories
    df['work_mode'] = df['work_mode'].str.upper()
    
    return df

### Activity 2: Refining AI-Generated Documentation - Solution Code

In [None]:
# Step 1: Refined Documentation
def process_employee_data(df):
    """
    Clean and standardize employee data in a pandas DataFrame.
    
    Args:
        df (pandas.DataFrame): Input DataFrame containing employee data with columns
            including 'salary', 'department', and 'work_mode'.
    
    Returns:
        pandas.DataFrame: Cleaned DataFrame with standardized column names,
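            filled missing values, and uppercase work modes.
    
    Raises:
        ValueError: If 'salary', 'department', or 'work_mode' is missing
            after column names are normalized.
    """
    # Normalize column names first so headers such as 'Work Mode' become 'work_mode'
    normalized_cols = df.columns.str.lower().str.replace(' ', '_')
    
    # Verify required columns against the normalized names
    required_cols = {'salary', 'department', 'work_mode'}
    missing_cols = required_cols - set(normalized_cols)
    if missing_cols:
        raise ValueError(f"DataFrame is missing required columns: {sorted(missing_cols)}")
    
    # Clean column names
    df.columns = normalized_cols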
    
    # Handle missing values
    df['salary'] = df['salary'].fillna(df['salary'].mean())
    df['department'] = df['department'].fillna('Unknown')
    
    # Standardize categories
    df['work_mode'] = df['work_mode'].str.upper()
    
    return df

# Step 2: Refined Documentation with Examples
def process_employee_data(df):
    """
    Cleans and standardizes employee data in a DataFrame.

    This function performs the following operations:
    - Converts column names to lowercase and replaces spaces with underscores.
    - Fills missing values in the 'salary' column with the mean salary.
    - Fills missing values in the 'department' column with 'Unknown'.
    - Standardizes the 'work_mode' column by converting its values to uppercase.

    Args:
        df (pandas.DataFrame): A DataFrame containing employee data with at least
            the columns 'salary', 'department', and 'work_mode'.

    Returns:
        pandas.DataFrame: The cleaned and standardized employee DataFrame.

    Examples:
        >>> import pandas as pd
        >>> data = {
        ...     'Salary': [50000, None, 70000],
        ...     'Department': ['HR', None, 'Engineering'],
        ...     'Work Mode': ['remote', 'Hybrid', 'Onsite']
        ... }
        >>> df = pd.DataFrame(data)
        >>> clean_df = process_employee_data(df)
        >>> print(clean_df)
           salary   department work_mode
        0  50000.0           HR    REMOTE
        1  60000.0      Unknown    HYBRID
        2  70000.0  Engineering    ONSITE
    """

    # Clean column names
    df.columns = df.columns.str.lower().str.replace(' ', '_')
    
    # Handle missing values
    df['salary'] = df['salary'].fillna(df['salary'].mean())
    df['department'] = df['department'].fillna('Unknown')
    
    # Standardize categories
    df['work_mode'] = df['work_mode'].str.upper()
    
    return df

### Activity 3: Commit Message Generation - Solution Code

In [None]:
# Testing the function
import pandas as pd

df = pd.read_csv('employee_insights_cleaned.csv')

df_process = process_employee_data(df)
df_process.head()

# AI Generated Commit Message:
"Add docstring and usage examples to process_employee_data function"