## Assembly Functions
These functions work together to take the functions and libraries registered through the function and library management tools and assemble them into a module. The end product prompts the user for a filename and a set of functions, then creates the file and populates it with those functions, the functions they depend on, and the library imports they require.

In [1]:
import base

import pandas
import time

from IPython.display import clear_output

### Part 1: Get module name

**Big Idea:** Get user input to get the name of the file to be created

**Final Product:** A function that can take user input to get the filename. It will check if the filename includes .py at the end and format the name appropriately if it doesn't.

In [3]:
def get_filename():
    filename = input("Enter the filename: ")

    # Check if the filename ends with .py
    if not filename.endswith('.py'):
        filename += '.py'

    return filename

In [30]:
get_filename()

Enter the filename:  wrangle.py


'wrangle.py'
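
The extension check can also be separated from the prompt so it can be exercised without typing anything. The sketch below is one way to do that; `normalize_filename` and its `default` fallback are hypothetical names introduced here, not part of the notebook.

```python
def normalize_filename(raw_name, default="module.py"):
    """Return raw_name with a .py extension, or a default when the input is blank.

    Hypothetical helper: the notebook's get_filename() does this inline.
    """
    name = raw_name.strip()
    if not name:
        # Blank input would otherwise produce a file literally named ".py"
        return default
    if not name.lower().endswith(".py"):
        name += ".py"
    return name


print(normalize_filename("wrangle"))
print(normalize_filename("  wrangle.py "))
```

With a helper like this, `get_filename` could reduce to `return normalize_filename(input("Enter the filename: "))`.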

### Part 2: Get functions to include

**Big Idea:** Get functions from the user and return that list for usage elsewhere.

**Final Product:** A function that displays the list of available functions, reads a comma-separated string from the user, and parses that string into a list of function names, dropping any that don't exist. It then shows the resulting list to the user, who can press Enter to confirm or type 'restart' to start over. Once confirmed, the list is returned.

*Optional:* Use the module name to suggest functions that are associated with that module name.

In [5]:
def select_functions():
    # Retrieve the functions
    df = base.data_saver(load_function=True)
    
    while True:
        # Clear screen
        time.sleep(0.05)
        clear_output()
        
        # Display available functions
        base.view_list(df)
        
        # Get user input
        input_string = input("Enter function names separated by commas: ")
        input_list = [name.strip() for name in input_string.split(',') if name.strip()]

        # Filter against DataFrame
        valid_functions = df[df['name'].isin(input_list)]['name'].tolist()

        if not valid_functions:
            print("None of the entered functions are available. Please try again.")
            time.sleep(1.5)
            continue

        print("Selected functions:", ', '.join(valid_functions))
        # Strip whitespace so an accidental space still counts as a plain Enter or 'restart'
        choice = input("Press Enter to confirm or 'restart' to start over: ").strip().lower()

        if choice == 'restart':
            continue
        else:
            # Return the list of valid functions if user confirms
            return valid_functions

In [84]:
select_functions()

Current list of entries:
df_info: Function takes a dataframe and returns potentially relevant information about it
check_file_exists: Generic function to check if a file exists
drop_extras: Function to drop extra columns that may have a smaller impact on the model
split_categorical: Returns three dataframes split from one for use in model training, validation, and testing
drop_cols: Drops columns
encode_df: Takes a processed dataframe and encodes the object columns for usage in modeling
Xy_sets: Encodes and returns X_sets and y_sets
test_hypothesis: Runs a quick statistical test and informs of rejection or failure to reject



Enter function names separated by commas:  df_info, check_file_exists, drop_cols, Xy_sets


Selected functions: df_info, check_file_exists, drop_cols, Xy_sets


Press Enter to confirm or 'restart' to start over:  


['df_info', 'check_file_exists', 'drop_cols', 'Xy_sets']
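
The parsing step can be sketched on its own, away from the prompt loop. This version assumes a plain list of available names in place of the DataFrame's `name` column, keeps the order the user typed, and reports the names it dropped rather than discarding them silently. `parse_selection` is a hypothetical name used only for this illustration.

```python
def parse_selection(input_string, available_names):
    """Split a comma-separated string into (valid, unknown) name lists.

    Hypothetical helper; order follows the user's input and duplicates are dropped.
    """
    requested = [name.strip() for name in input_string.split(",") if name.strip()]
    available = set(available_names)

    valid, unknown = [], []
    for name in requested:
        target = valid if name in available else unknown
        if name not in target:
            target.append(name)
    return valid, unknown


valid, unknown = parse_selection(
    "Xy_sets, df_info, nope, df_info",
    ["df_info", "check_file_exists", "Xy_sets"],
)
print("Selected:", valid)
print("Not found:", unknown)
```

Showing the unknown names back to the user makes a typo easier to spot than a selection that is quietly shorter than expected.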

### Part 3: Get function dependencies
**Big Idea:** Use the dependencies field of the functions list to get any dependencies related to functions.

**Final Product:** A function that takes a list of function names. It iterates through the list and collects the names of any functions those functions need in order to run, then repeats the search recursively on the newly found names until no new dependencies turn up. It returns the collected dependencies as a list, or `None` if there are none.

In [7]:
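def find_dependencies(func_names, all_dependencies=None):
    # Start a fresh set on each top-level call; a mutable default argument
    # would be shared between calls and leak dependencies from earlier runs
    if all_dependencies is None:
        all_dependencies = set()

    df = base.data_saver(load_function=True)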
    
    new_dependencies = set()

    for func in func_names:
        # Get dependencies for the current function
        current_dependencies = df[df['name'] == func]['dependencies'].iloc[0]
        
        # Extract function dependencies if they exist
        func_dependencies = current_dependencies.get('function', []) if isinstance(current_dependencies, dict) else []

        # Add new dependencies to the set
        for dependency in func_dependencies:
            if dependency not in all_dependencies:
                new_dependencies.add(dependency)

    # Update the overall dependencies set
    all_dependencies.update(new_dependencies)

    # Recursively find dependencies for the new dependencies
    if new_dependencies:
        find_dependencies(new_dependencies, all_dependencies)

    return list(all_dependencies) if all_dependencies else None

In [98]:
functions = ['df_info', 'check_file_exists', 'drop_cols', 'Xy_sets']

find_dependencies(functions)

['encode_df', 'drop_extras']
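
The same search can be written iteratively with a work queue. The sketch below is a minimal illustration, not the notebook's implementation: it assumes a plain dict in place of the DataFrame from `base.data_saver`, and the registry entries (`build_sets`, `encode`, `clean`, `report`) are made-up names. It also leaves out functions the user already selected, so nothing would be listed twice.

```python
from collections import deque


def resolve_dependencies(selected, registry):
    """Return the transitive function dependencies of `selected`, in discovery order.

    `registry` maps a function name to its dependency record (a dict with an
    optional 'function' list) or to None when no record exists.
    """
    seen = set(selected)      # names already selected or already discovered
    ordered = []              # dependencies in the order they were found
    queue = deque(selected)

    while queue:
        current = queue.popleft()
        record = registry.get(current) or {}
        for dependency in record.get("function", []):
            if dependency not in seen:
                seen.add(dependency)
                ordered.append(dependency)
                queue.append(dependency)
    return ordered


# Illustrative registry only; these names are not from the real function list
toy_registry = {
    "build_sets": {"library": ["pandas"], "function": ["encode"]},
    "encode": {"library": ["pandas"], "function": ["clean"]},
    "clean": {"library": ["pandas"]},
    "report": None,
}

print(resolve_dependencies(["build_sets", "report"], toy_registry))
```

Because every name is added to `seen` before it is queued, a circular dependency ends the search instead of recursing forever, and an empty list stands in for "no dependencies" so callers can concatenate the result safely.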

### Part 4: Get library dependencies
**Big Idea:** Use the dependencies field of a functions list to get any dependencies related to libraries.

**Final Product:** A function that takes a list containing the names of functions to include. It will then iterate through this list and get the names of any libraries necessary to run the functions. It will then return this list.

In [9]:
def find_library_dependencies(func_names):
    df = base.data_saver(load_function=True)
    required_libraries = set()

    for func in func_names:
        # Retrieve the dependency record for the function
        dependency_record = df[df['name'] == func]['dependencies'].iloc[0]

        # Check if the record is a dictionary and contains the 'library' key
        if isinstance(dependency_record, dict) and 'library' in dependency_record:
            libraries = dependency_record['library']
            required_libraries.update(libraries)

    return list(required_libraries)

In [106]:
functions = ['encode_df', 'drop_extras','df_info', 'check_file_exists', 'drop_cols', 'Xy_sets']

find_library_dependencies(functions)

['os', 'pandas']

In [117]:
base.data_saver(load_function=True)

Unnamed: 0,name,desc,tags,dependencies,syntax
0,df_info,Function takes a dataframe and returns potenti...,,{'library': ['pandas']},"def df_info(df,include=False,samples=1):\n ..."
1,check_file_exists,Generic function to check if a file exists,,"{'library': ['os', 'pandas']}","def check_file_exists(filename,query,url):\n ..."
2,drop_extras,Function to drop extra columns that may have a...,,{'library': ['pandas']},"def drop_extras(df,target,degree=6):\n """"""..."
3,split_categorical,Returns three dataframes split from one for us...,,"{'library': ['pandas', 'train_test_split']}","def split_categorical(df,strat_var,seed=123):\..."
4,drop_cols,Drops columns,,"{'library': ['pandas'], 'function': ['drop_ext...","def drop_cols(df,cols=[],extras=False,degree=6..."
5,encode_df,Takes a processed dataframe and encodes the ob...,,{'library': ['pandas']},"def encode_df(df,target):\n '''\n Take..."
6,Xy_sets,Encodes and returns X_sets and y_sets,,"{'library': ['pandas'], 'function': ['encode_d...","def Xy_sets(tvt_set,target):\n '''\n E..."
7,test_hypothesis,Runs a quick statistical test and informs of r...,,,"def test_hypothesis(p,\n ..."
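
Some rows, such as `test_hypothesis`, have no dependency record at all, so the lookup has to tolerate missing values. The sketch below shows that handling in isolation. It assumes a plain dict in place of the DataFrame, uses made-up function names, and returns a sorted list so that the order of the import lines is stable between runs; `collect_libraries` is a hypothetical name.

```python
def collect_libraries(func_names, registry):
    """Return the sorted, de-duplicated libraries needed by `func_names`.

    Names that are unknown, or whose record is not a dict, contribute nothing.
    """
    libraries = set()
    for name in func_names:
        record = registry.get(name)
        if isinstance(record, dict):
            libraries.update(record.get("library", []))
    return sorted(libraries)


# Illustrative registry only; these names are not from the real function list
toy_registry = {
    "load": {"library": ["os", "pandas"]},
    "summarize": {"library": ["pandas"]},
    "stats_test": None,   # no dependency record, like an empty cell
}

print(collect_libraries(["load", "summarize", "stats_test", "unknown"], toy_registry))
```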


### Part 5: Syntax Grabber
**Big Idea:** For a given name, return the syntax.

**Final Product:** A function that takes a list of names and a DataFrame, looks up each name in the DataFrame's `name` column, and returns the matching `syntax` entries as a list.

In [11]:
def syntax_grabber(item_list, df):
    syntax_list = []

    for item in item_list:
        # Find the row in the DataFrame where 'name' matches 'item'
        matching_row = df[df['name'] == item]
        
        syntax_list.append(matching_row['syntax'].iloc[0])

    return syntax_list

In [161]:
functions = ['encode_df', 'drop_extras','df_info', 'check_file_exists', 'drop_cols', 'Xy_sets']

syntax_grabber(functions,base.data_saver(load_function=True))

["def encode_df(df,target):\n     '''\n     Takes a processed dataframe and encodes the object columns for usage in modeling.\n          Takes a dataframe and a target variable (assuming the target variable is an object). Target variable keeps the thing the model is being trained on from splitting and altering it.\n          !!! MAKE ME MORE DYNAMIC !!!\n     - Add functionality to check if passed a list or dataframe\n     - If dataframe, then run standard loop\n     - If list then check if each item is a dataframe (checking for train/validate/test)\n     - If list and each item is dataframe, then try loop on each dataframe\n     - Otherwise return an error\n     '''\n     # Get the object columns from the dataframe\n     obj_col = [col for col in df.columns if df[col].dtype == 'O']\n          # remove target variable\n     obj_col.remove(target)\n          # Begin encoding the object columns\n     for col in obj_col:\n         # Grab current column dummies\n         dummies = pd.get_d
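
A name with no matching row would make `.iloc[0]` raise an `IndexError`. The sketch below shows one way to report such names instead. It assumes the rows are available as a list of dicts (the shape `df.to_dict('records')` produces) so it runs without pandas; `lookup_syntax` and the sample records are hypothetical.

```python
def lookup_syntax(names, records):
    """Return (found_syntax, missing_names) for `names`, preserving input order."""
    # Build the name -> syntax index once instead of scanning per name
    syntax_by_name = {row["name"]: row["syntax"] for row in records}

    found, missing = [], []
    for name in names:
        if name in syntax_by_name:
            found.append(syntax_by_name[name])
        else:
            missing.append(name)
    return found, missing


# Illustrative records only
sample_records = [
    {"name": "first", "syntax": "def first():\n    return 1"},
    {"name": "second", "syntax": "def second():\n    return 2"},
]

found, missing = lookup_syntax(["second", "first", "third"], sample_records)
print(len(found), "found; missing:", missing)
```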

### Part 6: Assembly
**Big Idea:** Build a file with the desired functions.

**Final Product:** A function that will get the filename to be created and the functions to be included from the user. It will then assemble these functions with all their dependencies and write it to a file.

In [177]:
# Get the filename from the user
filename = get_filename()

Enter the filename:  wrangle.py


In [181]:
filename

'wrangle.py'

In [90]:
# Select for the functions to be included
main_functions = select_functions()

Current list of entries:
df_info: Function takes a dataframe and returns potentially relevant information about it
check_file_exists: Generic function to check if a file exists
drop_extras: Function to drop extra columns that may have a smaller impact on the model
split_categorical: Returns three dataframes split from one for use in model training, validation, and testing
drop_cols: Drops columns
encode_df: Takes a processed dataframe and encodes the object columns for usage in modeling
Xy_sets: Encodes and returns X_sets and y_sets
test_hypothesis: Runs a quick statistical test and informs of rejection or failure to reject



Enter function names separated by commas:  df_info, check_file_exists, Xy_sets


Selected functions: df_info, check_file_exists, Xy_sets


Press Enter to confirm or 'restart' to start over:  


In [92]:
main_functions

['df_info', 'check_file_exists', 'Xy_sets']

In [94]:
# Get the secondary functions that may be required to run
primary_functions = find_dependencies(main_functions)

In [96]:
primary_functions

['encode_df']

In [62]:
# Establish the list of libraries
libraries = set()

In [98]:
# Add the libraries to the set
libraries.update(set(find_library_dependencies(main_functions) + find_library_dependencies(primary_functions)))

In [100]:
libraries

{'os', 'pandas'}

In [78]:
# Retrieve the library syntaxes
libraries_syntax = syntax_grabber(libraries,
                                  base.data_saver(load_library=True))

In [80]:
libraries_syntax

['import pandas as pd', 'import os']

In [102]:
# Retrieve the function syntaxes
function_syntax = syntax_grabber(primary_functions + main_functions,
                                base.data_saver(load_function=True))

In [104]:
function_syntax

["def encode_df(df,target):\n     '''\n     Takes a processed dataframe and encodes the object columns for usage in modeling.\n          Takes a dataframe and a target variable (assuming the target variable is an object). Target variable keeps the thing the model is being trained on from splitting and altering it.\n          !!! MAKE ME MORE DYNAMIC !!!\n     - Add functionality to check if passed a list or dataframe\n     - If dataframe, then run standard loop\n     - If list then check if each item is a dataframe (checking for train/validate/test)\n     - If list and each item is dataframe, then try loop on each dataframe\n     - Otherwise return an error\n     '''\n     # Get the object columns from the dataframe\n     obj_col = [col for col in df.columns if df[col].dtype == 'O']\n          # remove target variable\n     obj_col.remove(target)\n          # Begin encoding the object columns\n     for col in obj_col:\n         # Grab current column dummies\n         dummies = pd.get_d

In [106]:
for item in libraries_syntax:
    print(item)

import pandas as pd
import os


In [108]:
for item in function_syntax:
    print(item + '\n')

def encode_df(df,target):
     '''
     Takes a processed dataframe and encodes the object columns for usage in modeling.
          Takes a dataframe and a target variable (assuming the target variable is an object). Target variable keeps the thing the model is being trained on from splitting and altering it.
          !!! MAKE ME MORE DYNAMIC !!!
     - Add functionality to check if passed a list or dataframe
     - If dataframe, then run standard loop
     - If list then check if each item is a dataframe (checking for train/validate/test)
     - If list and each item is dataframe, then try loop on each dataframe
     - Otherwise return an error
     '''
     # Get the object columns from the dataframe
     obj_col = [col for col in df.columns if df[col].dtype == 'O']
          # remove target variable
     obj_col.remove(target)
          # Begin encoding the object columns
     for col in obj_col:
         # Grab current column dummies
         dummies = pd.get_dummies(df[col],drop_

In [124]:
# Define the function
def assembler():
    # Get the filename from the user
    filename = get_filename()
    
    # Select for the functions to be included
    main_functions = select_functions()
    
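    # Get the secondary functions that may be required to run.
    # find_dependencies returns None when there are none, so fall back to an
    # empty list, and skip any dependency the user already selected
    primary_functions = [func for func in (find_dependencies(main_functions) or [])
                         if func not in main_functions]
    
    # Collect the libraries needed by the selected functions and their dependencies
    libraries = set(find_library_dependencies(main_functions)
                    + find_library_dependencies(primary_functions))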
    
    # Retrieve the library syntaxes
    libraries_syntax = syntax_grabber(libraries,
                                      base.data_saver(load_library=True))
    
    # Retrieve the function syntaxes
    function_syntax = syntax_grabber(primary_functions + main_functions,
                                    base.data_saver(load_function=True))
    
    # Build the file
    print(f'Creating {filename}...')
    with open(filename,'w') as file:
        for item in libraries_syntax:
            file.write(item + '\n')
            
        file.write('\n')
        
        for item in function_syntax:
            file.write(item + '\n\n')

In [126]:
assembler()

Current list of entries:
df_info: Function takes a dataframe and returns potentially relevant information about it
check_file_exists: Generic function to check if a file exists
drop_extras: Function to drop extra columns that may have a smaller impact on the model
split_categorical: Returns three dataframes split from one for use in model training, validation, and testing
drop_cols: Drops columns
encode_df: Takes a processed dataframe and encodes the object columns for usage in modeling
Xy_sets: Encodes and returns X_sets and y_sets
test_hypothesis: Runs a quick statistical test and informs of rejection or failure to reject



Enter function names separated by commas:  df_info, check_file_exists, split_categorical, Xy_sets


Selected functions: df_info, check_file_exists, split_categorical, Xy_sets


Press Enter to confirm or 'restart' to start over:  


Creating wrangle.py...
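
The file-writing step can be tested without any prompts by splitting it into a function that builds the module text and a function that writes it. The sketch below is one possible arrangement, with hypothetical names (`render_module`, `write_module`) and a throwaway file in a temporary directory; it separates top-level definitions with two blank lines.

```python
import os
import tempfile


def render_module(import_lines, function_sources):
    """Join import lines and function sources into the text of one module."""
    parts = []
    if import_lines:
        parts.append("\n".join(import_lines))
    # Trim stray newlines so the spacing between blocks stays uniform
    parts.extend(source.strip("\n") for source in function_sources)
    return "\n\n\n".join(parts) + "\n"


def write_module(path, import_lines, function_sources):
    """Write the rendered module text to `path`."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(render_module(import_lines, function_sources))


# Write a tiny module to a temporary directory and check that it parses
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_module.py")
    write_module(demo_path, ["import os"], ["def cwd():\n    return os.getcwd()"])
    with open(demo_path, encoding="utf-8") as handle:
        written = handle.read()

compile(written, "demo_module.py", "exec")  # raises SyntaxError if the text is malformed
print(written)
```

Running the generated text through `compile` is a cheap check that the assembled file is at least syntactically valid before it is imported anywhere.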
