
In this notebook, I've developed several helper functions that make visualizations easier to create and keep the code readable. They streamlined the plotting work needed to present model results.

With these helpers, an otherwise intricate plot takes a single function call, so I spend less time on plotting code and more on the analysis itself. They also make the code more modular: each complex operation is broken into a smaller, more manageable piece that is easier to understand and reuse.

In [1]:
import matplotlib.pyplot as plt

def create_pie_chart(dataframe, column_name):
    """Draw a pie chart of the value counts in one column."""
    plot_df = dataframe[column_name].value_counts()
    plot_df.plot(kind="pie")
    plt.show()

In [2]:
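import matplotlib.pyplot as plt
import seaborn as sns

def create_barplot(dataframe, column_name):
    color = sns.color_palette()
    int_level = dataframe[column_name].value_counts()
    plt.figure(figsize=(8,4))
    # Pass x and y by keyword; recent seaborn releases no longer accept them positionally.
    sns.barplot(x=int_level.index, y=int_level.values, alpha=0.8, color=color[1])
    plt.ylabel('Number of Occurrences', fontsize=12)
    plt.xlabel(column_name, fontsize=12)  # label the axis with the actual column name
    plt.show()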

In [4]:
import pandas as pd

def create_freq_tables(dataframe, column_name, desc):
    # One-way frequency table for the column; desc labels the resulting proportion column.
    freq_table = pd.crosstab(dataframe[column_name], desc)

    # Convert the counts to proportions of all rows.
    freq_table = freq_table / len(dataframe)
    print(freq_table)
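
As a quick check of what create_freq_tables computes, here is a minimal sketch on made-up data. The `toy` dataframe and its `species` values are assumptions for illustration only, not data from this notebook. It repeats the two steps of the helper and compares the result with pandas' built-in normalized counts.

```python
import pandas as pd

# Hypothetical toy data; the column name and values are invented for illustration.
toy = pd.DataFrame({"species": ["setosa", "setosa", "virginica", "versicolor"]})

# Same two steps as create_freq_tables: one-way counts, then divide by the row count.
counts = pd.crosstab(toy["species"], "proportion")
proportions = counts / len(toy)

# pandas can also produce the proportions directly from the column.
direct = toy["species"].value_counts(normalize=True).sort_index()

print(proportions)
print(direct)
```

On data without missing values the two approaches agree. If the column contains NaN, the helper still divides by the full row count, so its proportions sum to less than 1, whereas `value_counts(normalize=True)` normalizes over the non-missing rows only.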

Below I define a "my_catplot" function and a "my_histplot" function. I used these two functions in the "fraud detection" project.

my_catplot draws separate bars for each outcome class within every category of the plot. For instance, I can plot the distribution of a categorical variable within the outcome = 1 category.

my_histplot is a function to create separate histograms for different categories. This allows us to visualize the distribution of numeric data within different groups (outcome = 1 or 0). It is particularly useful when we want to compare the distributions of two or more groups.

In [None]:
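def my_catplot(dataframe, column_name, feature=None, ax=None):
    # Grouped count bars: one bar per column_name class for each level of feature.
    # palette must be defined earlier in the notebook; only the first 100,000 rows are plotted.
    ax = sns.countplot(data=dataframe[:100000], x=feature, hue=column_name, palette=(palette[0], palette[-1]), ax=ax)
    # Categories sit on the x-axis and counts on the y-axis.
    ax.set_xlabel(f'{feature}')
    ax.set_ylabel('Number of Observations')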

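def my_histplot(dataframe, column_name, feature=None, ax=None):
    # Overlaid histograms of feature, one per column_name class, on a log-scaled x-axis.
    # Capturing the return value lets the function work when ax is None.
    ax = sns.histplot(data=dataframe[:100000],
                      x=feature,
                      hue=column_name,
                      kde=True,
                      element='step',
                      palette=(palette[0],
                               palette[-1]),
                      ax=ax,
                      log_scale=True)
    ax.set_ylabel('Number of Observations')
    ax.set_xlabel(f'{feature}')
    # Class means use the full dataframe, not only the 100,000 plotted rows.
    mean_value_f = dataframe[dataframe[column_name]==False][feature].mean()
    mean_value_t = dataframe[dataframe[column_name]==True][feature].mean()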
    ax.axvline(x=mean_value_f, 
               color=palette[0])
    ax.axvline(x=mean_value_t, 
                color=palette[-1])
    ax.annotate(f'Mean {feature}\n for regular transactions: ${mean_value_f:,.2f}', 
                 xy=(0.1, 0.5),
                 xycoords='axes fraction',
                 font='roboto',
                 fontstyle='italic')
    ax.annotate(f'Mean {feature}\n for fraudulent transactions: ${mean_value_t:,.2f}', 
                 xy=(0.1, 0.3),
                 xycoords='axes fraction',
                 font='roboto',
                 fontstyle='italic')
    