# Cheat Sheet Functions 

Functions can make or break the efficiency of code. Whether they are built into the language or written by hand, functions are a powerful tool for simplifying, structuring, and reusing logic.

Below are several example functions, with comments that walk through each line. Each function is paired with a working example that shows its effect and why it is useful.

## Functions: 
- #### Sorting and Searching Functions
- #### Kwargs and Args Functions
- #### Encoding Functions

## Sorting and Searching Functions

These custom functions, created with the ```def``` keyword, show different ways to sort and search for information.

#### Fuzzy Matching and Replacing

```python
# Calculates how similar two strings are: the number of distinct characters they share,
# divided by the length of the longer string. Character order and repetition are ignored,
# and the comparison is case-sensitive.
def commonality(sim1, sim2):
    common_characters = set(sim1) & set(sim2)
    commonality_rate = len(common_characters) / max(len(sim1), len(sim2))
    return commonality_rate

# Finds the words in 'data' whose similarity to 'word' meets a customizable threshold
# and returns them in a dictionary mapping each word to its similarity rate.
def fuzzy_mr(word, data, threshold):
    similar_words = {}

    for obj in data:
        similarity_rate = commonality(word, obj)
        if similarity_rate >= threshold:
            similar_words[obj] = similarity_rate
    return similar_words

# EXAMPLE:

word = "apple"
data = ["appel", "aplÈ", "applE", "pear", "PeAr", "gRapefRuit", 'applε', 'applœs']
threshold = 0.2

similar_words = fuzzy_mr(word, data, threshold)

# Check whether 'fuzzy_mr' found any similar words.
# If so, print each match from 'data' along with its similarity rate to the given word.
if similar_words:
    print(f"Similar words to '{word}':")
    for similar_word, similarity in similar_words.items():
        print(f"{similar_word} | Similarity Rate: {similarity:.2f}")

    # Iterate through every index in 'data'.
    # If the similarity rate is >= the threshold, replace the word at that index with the given word.
    for index in range(len(data)):
        similarity = commonality(word, data[index])
        if similarity >= threshold:
            data[index] = word

    print("\nUpdated data list:")
    print(data)
else:
    print(f"No similar words found for '{word}' within the given threshold.")
```

```
Similar words to 'apple':
appel | Similarity Rate: 0.80
aplÈ | Similarity Rate: 0.60
applE | Similarity Rate: 0.60
pear | Similarity Rate: 0.60
PeAr | Similarity Rate: 0.20
gRapefRuit | Similarity Rate: 0.30
applε | Similarity Rate: 0.60
applœs | Similarity Rate: 0.50

Updated data list:
['apple', 'apple', 'apple', 'apple', 'apple', 'apple', 'apple', 'apple']
```
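Because `commonality` compares only sets of characters, it ignores character order and treats uppercase and lowercase letters as different, so a low threshold such as 0.2 also replaces `pear` and `gRapefRuit`. As an alternative, here is a minimal sketch using the standard library's `difflib.SequenceMatcher`, which scores matching character runs in order. The helper names `normalize` and `fuzzy_replace`, the accent-stripping step, and the 0.75 cutoff are illustrative assumptions, not part of the original code.

```python
import difflib
import unicodedata


def normalize(text):
    # Hypothetical helper: decompose accented characters (e.g. "È" -> "E" + a combining
    # accent), drop the combining marks, then casefold for case-insensitive comparison.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def fuzzy_replace(word, data, cutoff=0.75):
    # Hypothetical helper: return a NEW list in which every item whose normalized form
    # has a SequenceMatcher ratio >= cutoff against the normalized target is replaced by 'word'.
    target = normalize(word)
    result = []
    for obj in data:
        score = difflib.SequenceMatcher(None, target, normalize(obj)).ratio()
        result.append(word if score >= cutoff else obj)
    return result


data = ["appel", "aplÈ", "applE", "pear", "PeAr", "gRapefRuit"]
print(fuzzy_replace("apple", data))
# ['apple', 'apple', 'apple', 'pear', 'PeAr', 'gRapefRuit']
```

Unlike the example above, `fuzzy_replace` returns a new list instead of modifying `data` in place, and the order-aware ratio keeps `pear` and `gRapefRuit` unchanged.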


#### Binary Search & Counting Occurrences

```python
# The 'binary_search' function searches a SORTED list ('list1') for a number ('num')
# and returns the index of a matching element, or -1 if the number is not present.
# With duplicates, the returned index may point to any one of the copies, not necessarily the first.
def binary_search(list1, num):
    left_half = 0
    right_half = len(list1) - 1

    # Repeatedly halve the search range: compare 'num' with the middle element and
    # keep only the half that can still contain it.
    # If the range becomes empty, the number is not in the list.
    while left_half <= right_half:
        middle = (left_half + right_half) // 2
        if list1[middle] == num:
            return middle
        elif list1[middle] < num:
            left_half = middle + 1
        else:
            right_half = middle - 1
    return -1


# The 'count_frequency' function sorts a list of values, uses 'binary_search' to locate
# one occurrence of 'num', and then counts the matching neighbors on both sides.
def count_frequency(list1, num):
    list1 = sorted(list1)
    index = binary_search(list1, num)

    if index == -1:
        return 0
    count = 1

    # Starting just after the found index, move right while the values still match.
    upper = index + 1
    while upper < len(list1) and list1[upper] == num:
        count += 1
        upper += 1

    # Starting just before the found index, move left while the values still match.
    lower = index - 1
    while lower >= 0 and list1[lower] == num:
        count += 1
        lower -= 1
    return count

# EXAMPLE:

list1 = [1,5,4,3,6,8,1,3,6,8,3,1,2,4,6,7,5,3,1,3,5,6,8]
num = 5
print(f"Number: {num} | Frequency Amount: {count_frequency(list1, num)}")
```

```
Number: 5 | Frequency Amount: 3
```
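A plain binary search returns *some* matching index when the list contains duplicates, not necessarily the first one. A lower-bound variant keeps narrowing toward the left after a match, and Python's standard `bisect` module applies the same idea to count occurrences without stepping through neighbors one by one. The sketch below assumes the input is already sorted; the function names `first_occurrence` and `count_with_bisect` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def first_occurrence(sorted_list, num):
    # Lower-bound binary search: 'hi' moves left on a match, so the loop ends
    # at the left-most position where 'num' could be inserted.
    lo, hi = 0, len(sorted_list)
    while lo < hi:
        mid = (lo + hi) // 2
        if sorted_list[mid] < num:
            lo = mid + 1
        else:
            hi = mid
    # 'lo' is the insertion point; it is a match only if the value is actually there.
    if lo < len(sorted_list) and sorted_list[lo] == num:
        return lo
    return -1


def count_with_bisect(sorted_list, num):
    # bisect_left/bisect_right give the start and end of the run of 'num' values.
    return bisect_right(sorted_list, num) - bisect_left(sorted_list, num)


sorted_list1 = sorted([1,5,4,3,6,8,1,3,6,8,3,1,2,4,6,7,5,3,1,3,5,6,8])
print(first_occurrence(sorted_list1, 5))   # 12
print(count_with_bisect(sorted_list1, 5))  # 3
print(count_with_bisect(sorted_list1, 9))  # 0
```

Both lookups take O(log n) time on a sorted list, so sorting once and then counting several values avoids re-sorting on every call.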


## Kwargs and Args Functions

These functions show how ```*args``` and ```**kwargs``` make code more flexible by letting a function accept a variable number of arguments.

```*args```: Collects an arbitrary number of positional (non-keyword) arguments into a tuple.

```**kwargs```: Collects an arbitrary number of keyword arguments into a dictionary.
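To see exactly what each parameter receives, a small hypothetical function `show_arguments` can simply return them (the `member=True` keyword is an arbitrary illustration):

```python
def show_arguments(*args, **kwargs):
    # 'args' is a tuple of the extra positional arguments;
    # 'kwargs' is a dict of the extra keyword arguments.
    return args, kwargs


args, kwargs = show_arguments(20.99, 16.25, discount=12, member=True)
print(type(args).__name__, args)      # tuple (20.99, 16.25)
print(type(kwargs).__name__, kwargs)  # dict {'discount': 12, 'member': True}

# At call time, '*' and '**' do the reverse: they unpack a list and a dict
# into separate positional and keyword arguments.
prices = [4.99, 13.49]
options = {"discount": 5}
print(show_arguments(*prices, **options))  # ((4.99, 13.49), {'discount': 5})
```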

#### Calculating Grocery Price and Coupon Discount Using Args and Kwargs

```python
# Calculates the total price of groceries: '*args' collects the individual prices,
# and '**kwargs' can carry an optional 'discount' percentage.
def PriceCalculator(*args, **kwargs):
    total_price = sum(args)
    coupon_discount = kwargs.get('discount', 0)

    # Apply the percentage discount only if one was provided.
    if coupon_discount:
        total_price -= total_price * (coupon_discount / 100)
    return total_price

# EXAMPLE:

grocery_prices = [20.99, 16.25, 4.99, 13.49, 76.46, 36.99, 45.67, 52.39, 78.28]
discount_percent = 12

# The '*' unpacks the list so each price is passed as a separate positional argument.
total = round(PriceCalculator(*grocery_prices, discount=discount_percent), 2)

if discount_percent > 0:
    print(f"Total price after {discount_percent}% discount: ${total}")
else:
    print(f"Total price without discount: ${total}")
```

```
Total price after 12% discount: $304.05
```
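One trade-off of reading the discount from `**kwargs` is that a misspelled keyword (for example `discont=12`) is silently ignored and no discount is applied. A sketch of an alternative keeps `*args` for the prices but makes `discount` an explicit keyword-only parameter, so a typo raises a `TypeError`; the name `price_calculator` is hypothetical.

```python
def price_calculator(*prices, discount=0):
    # 'prices' collects any number of positional prices into a tuple.
    # 'discount' is declared after *prices, so it can only be passed by keyword.
    total = sum(prices)
    return total - total * (discount / 100)


grocery_prices = [20.99, 16.25, 4.99, 13.49, 76.46, 36.99, 45.67, 52.39, 78.28]
print(f"${price_calculator(*grocery_prices, discount=12):.2f}")  # $304.05

try:
    price_calculator(*grocery_prices, discont=12)  # misspelled keyword
except TypeError as error:
    print("TypeError:", error)
```

Use `**kwargs` when the set of options is genuinely open-ended; when the options are known in advance, named keyword-only parameters give clearer errors.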


#### Data Processing Using Kwargs 

```python
import pandas as pd
import numpy as np


# Preprocesses a DataFrame through several optional steps. Extra keyword arguments are
# collected in '**kwargs'; the 'KwargFill' entry is a dictionary of options passed on to
# pandas' 'fillna' and should include a fill value, for example {'value': 0}.
def preprocess_data(df, dropna=False, new_order=None, fillna=False, fillna_col=None, **kwargs):
    # Work on a copy so the caller's DataFrame is not modified.
    df = df.copy()
    if dropna:
        # Drop every row that contains at least one missing value.
        df = df.dropna()
    if new_order is not None:
        # Reorder the columns.
        df = df.reindex(columns=new_order)
    if fillna:
        # Fill all columns if no specific columns were given.
        columns = fillna_col if fillna_col is not None else df.columns
        fill_options = kwargs.get('KwargFill', {})
        for column in columns:
            df[column] = df[column].fillna(**fill_options)
    return df


# EXAMPLE:

data = {
    'A': [1, 2, None, 4, 5, np.nan, 7, np.nan, 9, 10],
    'B': [10, None, 30, 40, np.nan, 60, 70, 80, 90, 100],
    'C': [1.5, 3, 4.5, 6, 7.5, np.nan, 10.5, 12, 13.5, np.nan],
    'D': [100, None, 80, None, np.nan, 20, np.nan, 40, np.nan, 60]}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

df_processed = preprocess_data(df, new_order=['D', 'C', 'B', 'A'], fillna=True, fillna_col=['D', 'C', 'B', 'A'], KwargFill={'value': 0})
print("\nAfter Preprocessing:")
print(df_processed)
```

```
Original DataFrame:
      A      B     C      D
0   1.0   10.0   1.5  100.0
1   2.0    NaN   3.0    NaN
2   NaN   30.0   4.5   80.0
3   4.0   40.0   6.0    NaN
4   5.0    NaN   7.5    NaN
5   NaN   60.0   NaN   20.0
6   7.0   70.0  10.5    NaN
7   NaN   80.0  12.0   40.0
8   9.0   90.0  13.5    NaN
9  10.0  100.0   NaN   60.0

After Preprocessing:
       D     C      B     A
0  100.0   1.5   10.0   1.0
1    0.0   3.0    0.0   2.0
2   80.0   4.5   30.0   0.0
3    0.0   6.0   40.0   4.0
4    0.0   7.5    0.0   5.0
5   20.0   0.0   60.0   0.0
6    0.0  10.5   70.0   7.0
7   40.0  12.0   80.0   0.0
8    0.0  13.5   90.0   9.0
9   60.0   0.0  100.0  10.0
```
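`preprocess_data` nests its fill options under a single `KwargFill` key. Another option is to forward `**kwargs` straight to pandas. A minimal sketch under that assumption (`fill_columns` is a hypothetical name); it also relies on the fact that `DataFrame.fillna` accepts a dictionary mapping column names to fill values:

```python
import numpy as np
import pandas as pd


def fill_columns(df, columns=None, **fillna_kwargs):
    # Hypothetical helper: any extra keyword arguments (e.g. value=0) are passed
    # unchanged to DataFrame.fillna for the selected columns.
    df = df.copy()
    columns = list(df.columns) if columns is None else list(columns)
    df[columns] = df[columns].fillna(**fillna_kwargs)
    return df


df = pd.DataFrame({"A": [1, np.nan, 3], "B": [np.nan, 20, np.nan]})

# Same fill value for every column.
print(fill_columns(df, value=0))

# A different fill value per column: fillna accepts a {column: value} dict.
print(fill_columns(df, value={"A": df["A"].mean(), "B": -1}))
```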


## Encoding Functions 

These functions show how to encode categorical values as numbers, adding the encoded values to a DataFrame as new columns.

#### Label Encoding 

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Defines 'LabelEncoding', which takes a DataFrame ('df') and a list of column names and adds,
# for each column, a new '<column>_encoded' column holding one integer label per category.
# LabelEncoder assigns the labels in sorted order of the category values.
def LabelEncoding(df, columns):
    # Work on a copy so the caller's DataFrame is not modified.
    df = df.copy()
    LE = LabelEncoder()

    for column in columns:
        # fit_transform refits the encoder on each column, so the labels are column-specific.
        df[column + '_encoded'] = LE.fit_transform(df[column])
    return df

# EXAMPLE:

data = {
    'Brand': ['BMW', 'Ford', 'Mercedes-Benz', 'Lancia', 'Lancia', 'BMW', 'BMW', 'Ford', 'Mercedes-Benz', 'Ford', 'BMW', 'Mercedes-Benz', 'Lancia', 'BMW'],
    'Color': ['Red', 'Blue', 'White', 'Green', 'Red', 'Red', 'White', 'Green', 'White', 'Red', 'Blue', 'Blue', 'Blue', 'Red'],
    'Length': [176.4, 182.8, 156.7, 167.5, 176.4, 182.8, 154.1, 183.9, 175.4, 143.6, 154.8, 165.8, 157.1, 187.5]}

df = pd.DataFrame(data)
encoded_df = LabelEncoding(df, ['Brand'])
print(f"One Column Encoded:\n\n{encoded_df}")

encoded_df2 = LabelEncoding(df, ['Brand', 'Color'])
print(f"\n\nMultiple Columns Encoded:\n\n{encoded_df2}")
```

```
One Column Encoded:

            Brand  Color  Length  Brand_encoded
0             BMW    Red   176.4              0
1            Ford   Blue   182.8              1
2   Mercedes-Benz  White   156.7              3
3          Lancia  Green   167.5              2
4          Lancia    Red   176.4              2
5             BMW    Red   182.8              0
6             BMW  White   154.1              0
7            Ford  Green   183.9              1
8   Mercedes-Benz  White   175.4              3
9            Ford    Red   143.6              1
10            BMW   Blue   154.8              0
11  Mercedes-Benz   Blue   165.8              3
12         Lancia   Blue   157.1              2
13            BMW    Red   187.5              0


Multiple Columns Encoded:

            Brand  Color  Length  Brand_encoded  Color_encoded
0             BMW    Red   176.4              0              2
1            Ford   Blue   182.8              1              0
2   Mercedes-Benz  White   156.7
```
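`LabelEncoding` creates its `LabelEncoder` inside the function and does not return it, so the mapping from integer labels back to brand or color names is lost after the call. A minimal sketch that keeps one mapping per column uses `pandas.factorize` with `sort=True`, which, like `LabelEncoder`, numbers the categories in sorted order. The name `label_encode` and the returned `mappings` dictionary are hypothetical.

```python
import pandas as pd


def label_encode(df, columns):
    # Hypothetical helper: returns an encoded copy of 'df' plus, for each column,
    # the sorted categories whose positions are the integer labels.
    df = df.copy()
    mappings = {}
    for column in columns:
        # sort=True numbers the categories in sorted order (like sklearn's LabelEncoder).
        codes, categories = pd.factorize(df[column], sort=True)
        df[column + "_encoded"] = codes
        mappings[column] = categories
    return df, mappings


df = pd.DataFrame({"Brand": ["BMW", "Ford", "Mercedes-Benz", "Lancia", "Lancia", "BMW"],
                   "Color": ["Red", "Blue", "White", "Green", "Red", "Red"]})
encoded, mappings = label_encode(df, ["Brand", "Color"])
print(encoded)
print(list(mappings["Brand"]))  # ['BMW', 'Ford', 'Lancia', 'Mercedes-Benz']

# Decode: index the stored categories with the integer codes.
print(list(mappings["Color"][encoded["Color_encoded"].to_numpy()]))
```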

#### Frequency Encoding

```python
import pandas as pd

# Defines 'FrequencyEncoder', which adds a '<column>_frequency' column for each listed column,
# mapping every value to its relative frequency (its count divided by the number of rows).
def FrequencyEncoder(df, columns):
    # Work on a copy so the caller's DataFrame is not modified.
    df = df.copy()
    for column in columns:
        frequency = df[column].value_counts() / len(df)
        df[column + '_frequency'] = df[column].map(frequency)
    return df


# EXAMPLE:

data = {
    'Sex': ['Male', 'Female', 'Female', 'Male', 'Male', 'Female', 'Female', 'Male', 'Male', 'Male', 'Male', 'Female'],
    'Inches': [67, 56, 72, 43, 83, 76, 62, 53, 75, 66, 54, 62],
    'Weight': [142, 235, 176, 198, 110, 205, 298, 317, 123, 163, 367, 150]}

df = pd.DataFrame(data)
encoded_df = FrequencyEncoder(df, ['Sex'])
print(f"One Column Encoded:\n\n{encoded_df}")

encoded_df2 = FrequencyEncoder(df, ['Sex', 'Inches'])
print(f"\n\nMultiple Columns Encoded:\n\n{encoded_df2}")
```

```
One Column Encoded:

       Sex  Inches  Weight  Sex_frequency
0     Male      67     142       0.583333
1   Female      56     235       0.416667
2   Female      72     176       0.416667
3     Male      43     198       0.583333
4     Male      83     110       0.583333
5   Female      76     205       0.416667
6   Female      62     298       0.416667
7     Male      53     317       0.583333
8     Male      75     123       0.583333
9     Male      66     163       0.583333
10    Male      54     367       0.583333
11  Female      62     150       0.416667


Multiple Columns Encoded:

       Sex  Inches  Weight  Sex_frequency  Inches_frequency
0     Male      67     142       0.583333          0.083333
1   Female      56     235       0.416667          0.083333
2   Female      72     176       0.416667          0.083333
3     Male      43     198       0.583333          0.083333
4     Male      83     110       0.583333          0.083333
5   Female      76     205       0.416667
```
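pandas can also compute the shares directly: `Series.value_counts(normalize=True)` returns each value's proportion, which matches dividing the counts by `len(df)` as long as the column has no missing values (by default `value_counts` excludes NaN from both the counts and the total). A minimal sketch; `frequency_encode` is a hypothetical name.

```python
import pandas as pd


def frequency_encode(df, columns):
    # Hypothetical helper: add '<column>_frequency' holding each value's share of the rows.
    df = df.copy()
    for column in columns:
        # normalize=True turns counts into proportions (NaN values are excluded by default).
        shares = df[column].value_counts(normalize=True)
        df[column + "_frequency"] = df[column].map(shares)
    return df


df = pd.DataFrame({"Sex": ["Male", "Female", "Female", "Male", "Male", "Female",
                           "Female", "Male", "Male", "Male", "Male", "Female"]})
print(frequency_encode(df, ["Sex"]).drop_duplicates())
```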