### **Table of Contents**
  * [read in data](#read-in-data)
  * [Update cleaning code](#update-cleaning-code)
  * [Generate report](#generate-report)
  * [Plots](#plots)

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import dash
import os
import sys

## read in data
Pseudocode:
- Read in all the files in the data folder,
  - accounting for them being either xlsx or csv.
- Each dataframe's name should end up being the file name minus the extension.

- This lets us drop in any export, under any name, and the notebook should still run.

In [None]:

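# Earlier attempt, kept for reference. It does not work as written:
# sys.path lists module search directories, not the files in the data folder.
# for data in sys.path:
#   if data.endswith('.xlsx') or data.endswith('.csv'):
#     df = pd.read_excel(data) if data.endswith('.xlsx') else pd.read_csv(data)
#     print(f"Data loaded from: {data}")
#     break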

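def load_data_folder(folder_path="data"):
    """Load every .csv and .xlsx file in folder_path into a dict of DataFrames.

    Keys are the file names without their extension.
    """
    dataframes = {}

    for file in os.listdir(folder_path):
        # Skip anything that is not a CSV or Excel export
        if file.endswith(".csv") or file.endswith(".xlsx"):
            file_path = os.path.join(folder_path, file)
            file_name = os.path.splitext(file)[0]  # file name minus extension

            if file.endswith(".csv"):
                df = pd.read_csv(file_path)
            else:
                df = pd.read_excel(file_path)

            dataframes[file_name] = df

    return dataframes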

## Update cleaning code 
- Review the cleaning code we already have.
- Start changing it to work with the dictionary of dataframes returned by `load_data_folder`.
- Make sure the program doesn't crash when something fails.
  - [Try Except logic updates](https://www.w3schools.com/python/python_try_except.asp)
  - Make the error messages meaningful.
- Ideally we will not drop anything from our data.
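
The try/except idea in the bullets above could be sketched roughly as follows, assuming each cleaning step is a function that takes and returns a DataFrame. The names `run_cleaning_step` and `parse_start_date` are hypothetical and are not taken from the files in `src/`; the `Start Date` column is borrowed from the plotting code further down. When a step fails, the wrapper says which step failed and why, then returns the input unchanged so nothing is dropped.

```python
import pandas as pd


def run_cleaning_step(df, step, step_name):
    """Apply one cleaning step; on failure, report it and return df unchanged."""
    try:
        return step(df.copy())
    except KeyError as err:
        # A missing column is a likely failure when an export changes shape
        print(f"[{step_name}] skipped: column {err} not found. "
              f"Available columns: {list(df.columns)}")
    except (ValueError, TypeError) as err:
        print(f"[{step_name}] skipped: could not convert values ({err})")
    return df


def parse_start_date(df):
    """Hypothetical step: convert 'Start Date' to datetime without dropping rows."""
    # errors='coerce' turns unparseable dates into NaT instead of raising
    df["Start Date"] = pd.to_datetime(df["Start Date"], errors="coerce")
    return df


# Made-up sample rows, only to exercise both paths
raw = pd.DataFrame({"Name": ["A", "B"], "Start Date": ["2024-01-15", "not a date"]})
cleaned = run_cleaning_step(raw, parse_start_date, "parse_start_date")

# A frame without the column triggers the KeyError branch and comes back unchanged
no_date = pd.DataFrame({"Name": ["C"]})
unchanged = run_cleaning_step(no_date, parse_start_date, "parse_start_date")
```

Catching specific exceptions rather than a bare `except` keeps the messages informative and avoids hiding unrelated bugs.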


In [None]:
'''
See the functions in files:
- src/Carmen_WORCEmployment.py
- src/cleaning_enrollments_data.py
- src/cleaning.py
'''

## Generate report 

- Overall completion of the program, counting only the new-style classes (m1-m4)
- Completion by year
- Overall completion by pathway
- Completion by year by pathway
- Feel free to get creative here, e.g. adding gender, to give us a better understanding
- Education level combined with the above...
- Export this as a txt file
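
One possible shape for this report, as a sketch only. The column names `Year`, `Pathway` and `Completed` (a boolean) are assumptions, as are the function names and the output file name; the real exports likely use different names. Filtering to the m1-m4 classes is left out because the column that identifies them is not shown here.

```python
import os
import tempfile

import pandas as pd


def completion_rate(df, by=None):
    """Percentage of rows with Completed == True, overall or per group."""
    if by is None:
        return round(df["Completed"].mean() * 100, 1)
    return (df.groupby(by)["Completed"].mean() * 100).round(1)


def build_report(df):
    """Assemble the report sections as plain text."""
    sections = [
        f"Overall completion: {completion_rate(df)}%",
        "Completion by year:\n" + completion_rate(df, "Year").to_string(),
        "Completion by pathway:\n" + completion_rate(df, "Pathway").to_string(),
        "Completion by year and pathway:\n"
        + completion_rate(df, ["Year", "Pathway"]).to_string(),
    ]
    return "\n\n".join(sections) + "\n"


# Tiny made-up sample, only to show the expected input shape
sample = pd.DataFrame({
    "Year": [2023, 2023, 2024, 2024],
    "Pathway": ["Web", "Data", "Web", "Data"],
    "Completed": [True, False, True, True],
})

report_text = build_report(sample)

# Written to the temp directory here so the sketch leaves the project folder alone
out_path = os.path.join(tempfile.gettempdir(), "completion_report.txt")
with open(out_path, "w", encoding="utf-8") as f:
    f.write(report_text)
```

Extra breakdowns such as gender or education level would only need another `completion_rate(df, [...])` call with the relevant column added to the list.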

## Plots 
- Look at the various plots
- Make a consistent color scheme
- Pick the plots that go with the report above
- Make the missing plots
- Give the plot functions options to show and save
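
A rough sketch of how the show/save option and a shared color scheme might look. `PALETTE`, `finish_plot` and the `show` / `save_path` parameters are hypothetical names, and the hex colors are placeholders rather than a chosen scheme. `plot_top_cities` mirrors the function of the same name in `src/Carmen_WORCEmployment_Plots.py`, rewritten on an explicit figure so it can be saved.

```python
import matplotlib.pyplot as plt
import pandas as pd

# One shared palette so every figure uses the same colors (placeholder values)
PALETTE = ["#1b9e77", "#d95f02", "#7570b3", "#e7298a"]


def finish_plot(fig, show=True, save_path=None):
    """Shared ending for plot functions: optionally save, optionally show."""
    if save_path is not None:
        fig.savefig(save_path, dpi=150, bbox_inches="tight")
    if show:
        plt.show()
    else:
        plt.close(fig)  # release the figure when it is not displayed
    return fig


def plot_top_cities(data, show=True, save_path=None):
    """Bar chart of the ten most common values in 'Mailing City'."""
    city_counts = data["Mailing City"].value_counts().head(10)
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(city_counts.index, city_counts.values, color=PALETTE[0])
    ax.set_title("Top Cities by Participant Count")
    ax.set_ylabel("Count")
    return finish_plot(fig, show=show, save_path=save_path)


# Made-up rows, only to demonstrate the call
sample = pd.DataFrame({"Mailing City": ["Louisville", "Lexington", "Louisville"]})
fig = plot_top_cities(sample, show=False)
```

Passing `save_path="top_cities.png"` would write the figure to disk, and routing every plot function through `finish_plot` keeps the show/save behavior identical across plots.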

See `src/notebooks/visualization_examples.ipynb`.

See below from `src/Carmen_WORCEmployment_Plots.py`.

In [None]:
import seaborn as sns  # not imported in the first cell, but needed for sns.boxplot / sns.countplot


def plot_salary_by_gender(data):
    plt.figure(figsize=(8, 5))
    sns.boxplot(data=data, x='Gender', y='Salary')
    plt.title("Salary Distribution by Gender")
    plt.show()


def plot_avg_salary_by_city(data):
    region_salary = data.groupby('Mailing City')['Salary'].mean().sort_values()
    region_salary.plot(kind='barh', figsize=(8, 5), title="Average Salary by KY Region")
    plt.xlabel("Average Salary")
    plt.show()


def plot_placements_over_time(data):
    data.set_index('Start Date').resample('M').size().plot(kind='line', marker='o', figsize=(10, 4))
    plt.title("Number of Placements Over Time")
    plt.ylabel("Placements")
    plt.show()


def plot_placement_type_by_program(data):
    plt.figure(figsize=(10, 6))
    sns.countplot(data=data, x='ATP Placement Type', hue='Program: Program Name')
    plt.xticks(rotation=45)
    plt.title("Placement Type by Program")
    plt.show()


def plot_top_cities(data):
    city_counts = data['Mailing City'].value_counts().head(10)
    city_counts.plot(kind='bar', title='Top Cities by Participant Count', figsize=(8, 4))
    plt.ylabel("Count")
    plt.show()

TOC generator 

In [4]:
import json
import os


def generate_toc_from_notebook(notebook_path):
    """
    Parses a local .ipynb file and generates Markdown for a Table of Contents.
    """
    if not os.path.isfile(notebook_path):
        print(f"❌ Error: File not found at '{notebook_path}'")
        return

    with open(notebook_path, 'r', encoding='utf-8') as f:
        notebook = json.load(f)

    toc_markdown = "### **Table of Contents**\n"
    for cell in notebook.get('cells', []):
        if cell.get('cell_type') == 'markdown':
            for line in cell.get('source', []):
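                stripped = line.strip()
                if stripped.startswith('#'):
                    # Heading level = number of leading '#' only, not every '#' in the line
                    level = len(stripped) - len(stripped.lstrip('#'))
                    title = stripped.strip('#').strip()
                    link = title.lower().replace(' ', '-').strip('-.()')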
                    indent = '  ' * (level - 1)
                    toc_markdown += f"{indent}* [{title}](#{link})\n"

    print("\n--- ✅ Copy the Markdown below and paste it "
          "into a new markdown cell ---\n")
    print(toc_markdown)


notebook_path = 'ideal.ipynb'
generate_toc_from_notebook(notebook_path)



--- ✅ Copy the Markdown below and paste it into a new markdown cell ---

### **Table of Contents**
  * [read in data](#read-in-data)
  * [Update cleaning code](#update-cleaning-code)
  * [Generate report](#generate-report)
  * [Plots](#plots)

