```plaintext
Procedure createDF:
    Input: datasets (Path to the dataset)
    
    1. df = Read CSV from `datasets`, with specific type conversions
    2. Format 'POD' field to datetime
    3. Replace missing values in 'ENCODED_TYPE' with -1, and convert to integer
    4. Drop rows with missing 'ENCODED_TYPE'
    5. Replace missing values in 'RATE' with -1, and convert to float
    6. Drop rows with missing 'RATE'
    7. Convert 'ENCODED_TYPE' to integer
    8. Drop rows with missing values in df and reset index
    9. Select and rearrange columns
    10. df_filtered = df excluding rows where 'POD' year is 2002
    11. Sort df_filtered by 'POD'
    
    Output: df_filtered

Procedure main:
    1. Define paths for old_data and new_data
    2. df1 = createDF(old_data)
    3. Display first few rows of df1
    4. df2 = createDF(new_data)
    5. Display first few rows of df2
```
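
The `createDF` steps above could be sketched in pandas roughly as follows. This is a minimal sketch under stated assumptions, not the original implementation: the "specific type conversions" and the selected columns are not given in the pseudocode, so the sketch assumes only the three columns it names, and the inline CSV is made-up sample data.

```python
import io
import pandas as pd

def create_df(source):
    """Load and clean one dataset; `source` is a path or file-like object."""
    df = pd.read_csv(source)
    # Step 2: parse 'POD' as datetime; empty or unparseable values become NaT.
    df["POD"] = pd.to_datetime(df["POD"], errors="coerce")
    # Steps 3 and 7: fill missing 'ENCODED_TYPE' with the sentinel -1, cast to int.
    df["ENCODED_TYPE"] = df["ENCODED_TYPE"].fillna(-1).astype(int)
    # Step 5: same treatment for 'RATE', cast to float.
    df["RATE"] = df["RATE"].fillna(-1).astype(float)
    # Steps 4, 6, 8: after the fills, only other columns (here 'POD') can be missing.
    df = df.dropna().reset_index(drop=True)
    # Step 9: this column selection is an assumption; the source does not list it.
    df = df[["POD", "ENCODED_TYPE", "RATE"]]
    # Steps 10 and 11: exclude the year 2002, then sort chronologically.
    df_filtered = df[df["POD"].dt.year != 2002]
    return df_filtered.sort_values("POD").reset_index(drop=True)

# Hypothetical sample data standing in for the real dataset file.
sample_csv = io.StringIO(
    "POD,ENCODED_TYPE,RATE\n"
    "2003-05-01,2,150.5\n"
    "2002-01-15,1,100.0\n"
    "2003-01-10,,200.0\n"
    ",3,120.0\n"
)
df1 = create_df(sample_csv)
print(df1.head())
```

Because the fills run before the drops, rows with a missing 'ENCODED_TYPE' or 'RATE' survive with the value -1; only rows missing some other field are removed.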

```plaintext

Procedure getKeyPorts:
    INPUT: keybunch
    SET keybunch_pouch as empty list
    SET key_row_counts as empty dictionary

    FOR each key in keybunch:
        IF the length of the dataframe for this key > 1000:
            SET key_row_counts[key] to length of the dataframe
    END FOR

    SET sorted_keys to keys in key_row_counts sorted in descending order of value

    FOR each key in sorted_keys:
        SET row_count to key_row_counts[key]
        PRINT "Number of rows in {key}: {row_count}"
        APPEND key to keybunch_pouch
    END FOR

    RETURN keybunch_pouch
END Procedure

PRINT "Old Dataset Keybunch:"
SET old_df to getKeyPorts(filtered_dataframe1)
PRINT "\n"

PRINT "New Dataset Keybunch:"
SET new_df to getKeyPorts(filtered_dataframe2)
```
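
A possible Python rendering of `getKeyPorts` is shown below. It assumes `keybunch` is a dictionary mapping each key to its DataFrame, which the pseudocode implies but does not state; the `min_rows` parameter and the `PORT_*` names are illustrative only.

```python
import pandas as pd

def get_key_ports(keybunch, min_rows=1000):
    """Return keys with more than `min_rows` rows, largest first."""
    # Record the row count of every key that passes the size threshold.
    key_row_counts = {
        key: len(frame) for key, frame in keybunch.items() if len(frame) > min_rows
    }
    # Sort the surviving keys by row count, descending.
    sorted_keys = sorted(key_row_counts, key=key_row_counts.get, reverse=True)
    for key in sorted_keys:
        print(f"Number of rows in {key}: {key_row_counts[key]}")
    return sorted_keys

# Hypothetical input: three keys with 1500, 10 and 2000 rows.
keybunch = {
    "PORT_A": pd.DataFrame({"RATE": range(1500)}),
    "PORT_B": pd.DataFrame({"RATE": range(10)}),
    "PORT_C": pd.DataFrame({"RATE": range(2000)}),
}
old_keys = get_key_ports(keybunch)
```

The function returns a list of keys rather than a DataFrame, so a name such as `old_keys` describes the result more accurately than `old_df`.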


```plaintext

    Perform Anderson-Darling test on 'RATE':
        PRINT Statistic
        IF Statistic < Critical Value at 5% significance level:
            PRINT "Data looks normal"
            SET Q1 to 25th percentile of 'RATE'
            SET Q3 to 75th percentile of 'RATE'
            SET IQR to Q3 - Q1
            SET lower_bound to Q1 - 1.5*IQR
            SET upper_bound to Q3 + 1.5*IQR
            REMOVE from dataframe rows where 'RATE' < lower_bound or 'RATE' > upper_bound
        ELSE:
            PRINT "Data does not look normal"
            CALCULATE z-scores for 'RATE'
            SET threshold to 3
            REMOVE from dataframe rows where absolute z-score > threshold
    
    RESET index of dataframe
```
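
The outlier step above might look like this with SciPy's `scipy.stats.anderson`. The sketch mirrors the branches exactly as the pseudocode gives them (IQR fences when the data looks normal, z-scores otherwise). The function name `remove_outliers`, its parameters, and the synthetic data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

def remove_outliers(df, column="RATE", z_threshold=3.0):
    """Drop outliers from `column`, choosing the rule from a normality test."""
    values = df[column].to_numpy(dtype=float)
    result = stats.anderson(values, dist="norm")
    # Pick the critical value that corresponds to the 5% significance level.
    idx = list(result.significance_level).index(5.0)
    critical_5 = result.critical_values[idx]
    print(f"Statistic: {result.statistic:.3f}")

    if result.statistic < critical_5:
        print("Data looks normal")
        q1, q3 = df[column].quantile([0.25, 0.75])
        iqr = q3 - q1
        # Keep values inside the 1.5 * IQR fences (bounds inclusive).
        keep = df[column].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr).to_numpy()
    else:
        print("Data does not look normal")
        # Keep values whose absolute z-score does not exceed the threshold.
        keep = np.abs(stats.zscore(values)) <= z_threshold

    return df[keep].reset_index(drop=True)

# Synthetic data: 500 roughly normal rates plus one extreme value.
rng = np.random.default_rng(0)
rates = np.append(rng.normal(100, 10, 500), 1000.0)
cleaned = remove_outliers(pd.DataFrame({"RATE": rates}))
```

The sketch assumes 'RATE' contains no missing values at this point, since `anderson` and `zscore` do not skip NaN by default.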



```plaintext

SET robust_df to drop duplicates from robust_df based on 'POD' and 'RATE'
RESET index of robust_df

INITIALIZE new_df as an empty DataFrame
SET new_df['POD'] to a range of dates from min 'POD' in robust_df to max 'POD' in robust_df

SET df_interpolated as the result of merging new_df and robust_df on 'POD' using a 'left' merge
PERFORM polynomial interpolation on df_interpolated['RATE'] with order 1
ROUND df_interpolated['RATE'] to 3 decimal places

SET df_interpolated['YearMonthWeek'] to 'POD' minus (dayofweek of 'POD') days in df_interpolated (the start date of that week)

SET all_weeks to a range of dates from min 'POD' in df_interpolated to max 'POD' in df_interpolated with a frequency of 1 week
SET all_weeks_df as a DataFrame with all_weeks as 'POD'
SET all_weeks_df['YearMonthWeek'] to 'POD' minus (dayofweek of 'POD') days in all_weeks_df

SET merged_df as the result of merging all_weeks_df and df_interpolated on 'YearMonthWeek' using a 'left' merge
SET grouped to merged_df grouped by 'YearMonthWeek'

INITIALIZE agg_df as an empty DataFrame with columns 'YearMonthWeek' and 'Rate'

FOR each (group_name, group_df) in grouped:
    SET year_month_week to group_name
    SET rate_sum to the sum of 'RATE' in group_df
    SET rate_skew to the skew of 'RATE' in group_df

    IF absolute rate_skew > 0.5:
        SET rate_metric to median of 'RATE' in group_df
    ELSE:
        SET rate_metric to mean of 'RATE' in group_df

    SET new_row to a dictionary with 'YearMonthWeek' as year_month_week and 'Rate' as rate_metric
    APPEND new_row to agg_df

SORT agg_df by 'YearMonthWeek'
RESET index of agg_df
ROUND 'Rate' in agg_df to 2 decimal places
```
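
The interpolation and weekly aggregation above could be sketched in pandas as follows. Three choices here are assumptions rather than part of the source. First, `method="linear"` stands in for order-1 polynomial interpolation; the two agree on evenly spaced daily rows, and the linear method needs no SciPy. Second, "a frequency of 1 week" is read as 7-day steps from the first date. Third, the per-group loop is replaced by a vectorized `agg`. The function name `weekly_rates` and the sample data are hypothetical.

```python
import pandas as pd

def weekly_rates(robust_df, skew_threshold=0.5):
    """Interpolate 'RATE' to daily values, then aggregate per week."""
    robust_df = robust_df.drop_duplicates(subset=["POD", "RATE"]).reset_index(drop=True)

    # Daily calendar from the first to the last 'POD', left-merged with the data.
    calendar = pd.DataFrame(
        {"POD": pd.date_range(robust_df["POD"].min(), robust_df["POD"].max(), freq="D")}
    )
    daily = calendar.merge(robust_df, on="POD", how="left")
    daily["RATE"] = daily["RATE"].interpolate(method="linear").round(3)

    # Week key: the date minus its weekday offset (Monday is 0 in pandas).
    daily["YearMonthWeek"] = daily["POD"] - pd.to_timedelta(daily["POD"].dt.dayofweek, unit="D")

    # One row per 7-day step, mapped to the same week key.
    all_weeks = pd.DataFrame(
        {"POD": pd.date_range(daily["POD"].min(), daily["POD"].max(), freq="7D")}
    )
    all_weeks["YearMonthWeek"] = all_weeks["POD"] - pd.to_timedelta(
        all_weeks["POD"].dt.dayofweek, unit="D"
    )
    merged = all_weeks[["YearMonthWeek"]].merge(daily, on="YearMonthWeek", how="left")

    # Median for skewed weeks, mean otherwise; a NaN skew falls back to the mean.
    weekly = merged.groupby("YearMonthWeek")["RATE"].agg(["mean", "median", "skew"])
    use_median = weekly["skew"].abs() > skew_threshold
    weekly["Rate"] = weekly["median"].where(use_median, weekly["mean"]).round(2)
    return weekly[["Rate"]].reset_index()

# Hypothetical input covering two full weeks (2024-01-01 is a Monday).
robust_df = pd.DataFrame({
    "POD": pd.to_datetime(["2024-01-01", "2024-01-07", "2024-01-08", "2024-01-14"]),
    "RATE": [10.0, 70.0, 100.0, 100.0],
})
agg_df = weekly_rates(robust_df)
print(agg_df)
```

`groupby` sorts by the week key by default, so no separate sort is needed. Building the result from a single `agg` call also avoids appending rows to a DataFrame inside a loop, which pandas 2.x no longer supports through `DataFrame.append`.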
