# Exercise 1
## Conceptual Similarity with WordNet

### Task:
- The exercise involves implementing three similarity measures based on WordNet.

#### Wu & Palmer:
- The Wu & Palmer similarity measure is based on the structure of WordNet.
- LCS (Lowest Common Subsumer) is the most immediate common ancestor between senses s1 and s2, and `depth(x)` is a function that measures the distance from the WordNet root to synset x.
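
The measure itself is the ratio `2 * depth(LCS) / (depth(s1) + depth(s2))`, which is the expression that `custom_wp` evaluates later in this notebook. A minimal sketch on hand-picked depths follows; the function name `wu_palmer` and the depth values are illustrative assumptions, not figures taken from WordNet.

```python
def wu_palmer(depth_lcs, depth_s1, depth_s2):
    """Wu & Palmer similarity computed from the three depths involved."""
    return 2 * depth_lcs / (depth_s1 + depth_s2)


# Hypothetical depths: two senses at depth 5 whose LCS sits at depth 3
sim_siblings = wu_palmer(3, 5, 5)

# A sense compared with itself is its own LCS, so all depths coincide
sim_identical = wu_palmer(5, 5, 5)

print(sim_siblings, sim_identical)
```

The deeper the LCS sits relative to the two senses, the closer the score gets to 1.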

#### Shortest Path:
- For a specific version of WordNet, `depthMax` is a fixed value.
- The similarity between two senses `(s1, s2)` is a function of the shortest path length `len(s1, s2)` from s1 to s2:
  - If `len(s1, s2)` is `0`, `simpath(s1, s2)` gets the maximum value of `2 * depthMax`.
  - If `len(s1, s2)` is `2 * depthMax`, `simpath(s1, s2)` gets the minimum value of `0`.
  - Thus, the values of `simpath(s1, s2)` range between `0` and `2 * depthMax`.
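
Both endpoints are consistent with `simpath(s1, s2) = 2 * depthMax - len(s1, s2)`, the expression used by `shortest_path` later in this notebook. A small sketch of that formula follows; `sim_path` is an illustrative name, and `depth_max = 19` is inferred from the `[0, 38]` range printed in the results table rather than stated by the exercise.

```python
def sim_path(length, depth_max):
    """Shortest-path similarity: larger when the two senses are closer."""
    return 2 * depth_max - length


depth_max = 19  # assumption: matches the [0, 38] range shown in the output

best = sim_path(0, depth_max)               # identical senses
worst = sim_path(2 * depth_max, depth_max)  # longest possible path
print(best, worst)
```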

#### Leacock & Chodorow:
- When s1 and s2 have the same sense, `len(s1, s2) = 0`. In practice, we add 1 to both `len(s1, s2)` and `2 * depthMax` to avoid `log(0)`.
  - Thus, the values of `simLC(s1, s2)` are within the interval `(0, log(2 * depthMax + 1)]`.
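
Written out with the +1 smoothing described above, the measure is `simLC(s1, s2) = -log((len(s1, s2) + 1) / (2 * depthMax + 1))`. The sketch below evaluates it for a few path lengths; `sim_lc` is an illustrative name and `depth_max = 19` is an assumption inferred from the `[0, 38]` range shown in the shortest-path output.

```python
import math


def sim_lc(length, depth_max):
    """Leacock & Chodorow similarity with +1 smoothing on both terms."""
    return -math.log((length + 1) / (2 * depth_max + 1))


depth_max = 19  # assumption, see the lead-in

upper = sim_lc(0, depth_max)  # identical senses give log(2 * depth_max + 1)
near = sim_lc(1, depth_max)
far = sim_lc(10, depth_max)
print(upper, near, far)
```

The score decreases as the path gets longer, and the logarithm compresses differences between long paths more than differences between short ones.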

#### For each of these similarity measures, compute the Spearman and Pearson correlation coefficients between the computed scores and the 'target' values in the annotated file.


### Import the libraries used

In [1]:
from nltk.corpus import wordnet as wn
import numpy as np
import math
import pandas as pd
from scipy import stats
from prettytable import PrettyTable, MARKDOWN
from itertools import product


### Correlation Coefficient Functions

The following functions calculate correlation coefficients between two arrays using different methods.

- **`pearson_correlation(x, y)` Function:**
  - This function calculates the Pearson correlation coefficient between two arrays of values, `x` and `y`. The Pearson correlation coefficient measures the linear relationship between two variables and ranges between -1 (perfect negative correlation) and 1 (perfect positive correlation). It provides an indication of how well the relationship between the two arrays can be described by a straight line.
  - Args:
    - `x` (array-like): The first array of values.
    - `y` (array-like): The second array of values.
  - Returns:
    - `float`: The Pearson correlation coefficient.

- **`spearman_rank_correlation(x, y)` Function:**
  - This function calculates the Spearman rank correlation coefficient between two arrays of values, `x` and `y`. The Spearman rank correlation coefficient assesses the monotonic relationship between two variables, meaning it captures whether the variables tend to change together but not necessarily linearly. It is based on the rank orders of the values rather than their actual values.
  - Args:
    - `x` (array-like): The first array of values.
    - `y` (array-like): The second array of values.
  - Returns:
    - `float`: The Spearman rank correlation coefficient.
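
To make the difference between the two coefficients concrete, here is a small sketch on made-up data where `y` grows monotonically but not linearly with `x`. The helper names `pearson` and `ranks` are illustrative, and `ranks` assumes there are no ties.

```python
import numpy as np


def pearson(x, y):
    """Pearson coefficient taken from NumPy's correlation matrix."""
    return float(np.corrcoef(x, y)[0, 1])


def ranks(values):
    """Ranks 1..n of the values (assumes all values are distinct)."""
    order = np.argsort(values)
    result = np.empty(len(values))
    result[order] = np.arange(1, len(values) + 1)
    return result


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic, but far from a straight line

pearson_r = pearson(x, y)
spearman_r = pearson(ranks(x), ranks(y))  # Spearman = Pearson on the ranks
print(round(pearson_r, 3), round(spearman_r, 3))
```

Spearman reaches 1 because the ordering of the two series is identical, while Pearson stays below 1 because the relationship is not linear.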



In [2]:
def pearson_correlation(x, y):
    """
    Calculate the Pearson correlation coefficient between two arrays.

    Args:
        x (array-like): The first array of values.
        y (array-like): The second array of values.

    Returns:
        float: The Pearson correlation coefficient.
    """
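    # Use the sample standard deviation (ddof=1) so that it matches np.cov,
    # which normalizes by n - 1 by default.
    std_x = np.std(x, ddof=1)
    std_y = np.std(y, ddof=1)
    covariance = np.cov(x, y)

    return covariance[0][1] / (std_x * std_y)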

def spearman_rank_correlation(x, y):
    """
    Calculate the Spearman rank correlation coefficient between two arrays.

    Args:
        x (array-like): The first array of values.
        y (array-like): The second array of values.

    Returns:
        float: The Spearman rank correlation coefficient.
    """
    # Rank the values; tied values receive the average of the ranks they span,
    # which is the usual convention for Spearman's coefficient.
    ranked_x = pd.Series(x).rank(method='average').to_numpy()
    ranked_y = pd.Series(y).rank(method='average').to_numpy()

    # Spearman's coefficient is the Pearson coefficient of the ranks.
    # np.cov normalizes by n - 1, so use ddof=1 for the deviations as well.
    covariance = np.cov(ranked_x, ranked_y)
    std_ranked_x = np.std(ranked_x, ddof=1)
    std_ranked_y = np.std(ranked_y, ddof=1)

    return covariance[0][1] / (std_ranked_x * std_ranked_y)

### Utility Functions

#### `get_depth(s1)` Function

This function computes the depth of a WordNet synset: the number of nodes on the longest hypernym chain from the synset up to a root, so a root synset has depth 1. It takes a WordNet Synset object, `s1`, as input and returns the depth as an integer.

#### `find_lcs(s1, s2)` Function

This function finds the Lowest Common Subsumer (LCS), which is the most specific common ancestor, between two WordNet Synsets, `s1` and `s2`. It takes two WordNet Synset objects, `s1` and `s2`, as input and returns a tuple containing the LCS as a Synset object and the depth of the LCS as an integer. The process involves comparing hypernym paths (ancestor paths) of `s1` and `s2` to identify a common synset that has the maximum depth. The code initializes the LCS to the root synset 'entity.n.01' and iterates through hypernym paths to identify the common ancestor with the greatest depth.
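
The search can be illustrated on a toy taxonomy without loading WordNet. In the sketch below the paths are plain lists of strings written from the root down, the name `lowest_common_subsumer` is illustrative, and depth is simply the 1-based position in the path, which is a simplification of `get_depth`.

```python
def lowest_common_subsumer(paths_a, paths_b):
    """Return (node, depth) of the deepest node shared by any pair of paths."""
    best_node, best_depth = None, 0
    for path_a in paths_a:
        for path_b in paths_b:
            for depth, node in enumerate(path_a, start=1):
                # Keep the shared node that lies furthest from the root
                if node in path_b and depth > best_depth:
                    best_node, best_depth = node, depth
    return best_node, best_depth


# Hypothetical root-to-sense paths
CAT = [["entity", "animal", "mammal", "feline", "cat"]]
DOG = [["entity", "animal", "mammal", "canine", "dog"]]

lcs, lcs_depth = lowest_common_subsumer(CAT, DOG)
print(lcs, lcs_depth)
```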

#### `get_len(s1, s2)` Function

This function computes the length of the shortest path between two synsets. It compares every pair of hypernym paths of `s1` and `s2`, skips the prefix the two paths share, and counts the steps that remain on both paths; the smallest count over all pairs is the result.

- **Args:**
  - `s1`, `s2`: WordNet Synset objects.
  
- **Returns:**
  - Minimum path length (int) between the two synsets.
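
The same prefix-skipping idea can be tried on a toy taxonomy. This is a sketch under stated assumptions: paths are lists of strings from the root down, and `path_length` is an illustrative stand-in for `get_len` that needs no WordNet data.

```python
def path_length(paths_a, paths_b):
    """Smallest number of steps joining two nodes through a shared ancestor."""
    shortest = float("inf")
    for path_a in paths_a:
        for path_b in paths_b:
            # Count how many leading nodes the two paths have in common
            shared = 0
            while (shared < min(len(path_a), len(path_b))
                   and path_a[shared] == path_b[shared]):
                shared += 1
            # What is left on each path is the climb to the shared ancestor
            steps = (len(path_a) - shared) + (len(path_b) - shared)
            shortest = min(shortest, steps)
    return shortest


# Hypothetical root-to-sense paths
CAT = [["entity", "animal", "mammal", "feline", "cat"]]
DOG = [["entity", "animal", "mammal", "canine", "dog"]]
FELINE = [["entity", "animal", "mammal", "feline"]]

cat_dog = path_length(CAT, DOG)
cat_feline = path_length(CAT, FELINE)
cat_cat = path_length(CAT, CAT)
print(cat_dog, cat_feline, cat_cat)
```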

#### `get_max_depth()` Function

This function calculates and returns the maximum depth among all synsets in WordNet. It uses a list comprehension to obtain the depths of all synsets and then finds the maximum depth from the list of depths.

- **Returns:**
  - int: The maximum depth among all synsets.

### Scaling Functions

#### `scale_list_sp(input_list)` Function

This function linearly rescales shortest-path scores from `[0, 2 * depthMax]` (`[0, 38]` for the WordNet version used here) to the range [0, 10], the same scale as the human ratings.

- **Parameters:**
  - `input_list` (list): A list of values to be scaled.

- **Returns:**
  - list: A list of values scaled to the range [0, 10].

#### `scale_list_lc(input_list)` Function

This function linearly rescales Leacock-Chodorow scores from `[0, log(2 * depthMax + 1)]` to the range [0, 10]. The logarithm appears only in the upper bound of the input range; the mapping itself is linear.

- **Parameters:**
  - `input_list` (list): A list of Leacock-Chodorow scores to be rescaled.

- **Returns:**
  - list: A list of values rescaled to the range [0, 10].
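
Both helpers apply the same linear mapping and differ only in the upper bound of the input range. A minimal sketch of that mapping follows; `rescale` and its parameters are illustrative names, and the lower bound is assumed to be 0 as in both functions.

```python
def rescale(values, old_max, new_max=10.0):
    """Map values linearly from [0, old_max] to [0, new_max]."""
    return [value * new_max / old_max for value in values]


# Shortest-path scores live in [0, 38] for the WordNet version used here
scaled = rescale([0, 19, 38], old_max=38)
print(scaled)
```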


In [3]:
def get_depth(s1):
    """
    Function to get the depth of a given synset to its topmost hypernym.

    Parameters:
    - s1: A wordnet Synset object

    Returns:
    - The depth of s1 to the root as an integer
    """
    # print("getting depth of ", s1)
    hypernyms = s1.hypernyms()
    if not hypernyms:
        return 1
    else:
        return 1 + max(get_depth(h) for h in hypernyms)



def find_lcs(s1, s2):
    """
    Find the Lowest Common Subsumer (LCS) and its depth between two WordNet synsets.

    Args:
        s1 (Synset): The first WordNet synset.
        s2 (Synset): The second WordNet synset.

    Returns:
        tuple: A tuple containing the LCS synset and its depth.
    """
    # A default root value for the LCS
    lcs = wn.synset('entity.n.01')
    
    # Get the maximum depth of the default LCS
    max_depth = get_depth(lcs)
    
    # Get hypernym paths (root -> synset) for both synsets
    hypernyms_s1 = s1.hypernym_paths()
    hypernyms_s2 = s2.hypernym_paths()
    
    # Loop through the hypernym paths of the first synset 's1'
    for hypernym_path_s1 in hypernyms_s1:
        # Loop through individual synsets within a specific path from 's1'
        for synset_s1 in hypernym_path_s1:
            # Loop through the hypernym paths of the second synset 's2'
            for hypernym_path_s2 in hypernyms_s2:
                # Loop through individual synsets within a specific path from 's2'
                for synset_s2 in hypernym_path_s2:
                    # Check if the current synsets are equal and if the current synset from 's1'
                    # has a depth greater than the LCS found so far (compared to 'max_depth')
                    if synset_s1 == synset_s2:
                        if get_depth(synset_s1) > max_depth:
                            # If the condition is met, update the LCS and the maximum depth
                            lcs = synset_s1
                            max_depth = get_depth(lcs)

    return lcs, max_depth

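def get_len(s1, s2):
    """
    Compute the shortest path length between two synsets via their hypernym paths.

    Args:
        s1 (Synset): The first WordNet synset.
        s2 (Synset): The second WordNet synset.

    Returns:
        int: The smallest number of steps found over all pairs of hypernym paths.
    """
    # Get the hypernym paths for each synset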
    paths1 = s1.hypernym_paths()
    paths2 = s2.hypernym_paths()

    # Initialize the shortest distance to a large number
    shortest_dist = float('inf')

    # For each pair of paths, compute the distance
    for path1 in paths1:
        for path2 in paths2:
            # Skip the shared prefix: both paths start at the root, so i ends
            # just past the deepest hypernym the two paths have in common
            i = 0
            while i < min(len(path1), len(path2)) and path1[i] == path2[i]:
                i += 1

            # The distance is the sum of the lengths of the remaining parts of the paths
            dist = len(path1[i:]) + len(path2[i:])

            # Update the shortest distance
            if dist < shortest_dist:
                shortest_dist = dist

    return shortest_dist

def get_max_depth():
    """
    Get the maximum depth among all synsets in WordNet.

    Returns:
        int: The maximum depth among all synsets.
    """
    # Use a list comprehension to get the depths of all synsets in WordNet
    depths = [synset.max_depth() for synset in wn.all_synsets()]

    # Find the maximum depth from the list of depths
    max_depth = max(depths)

    return max_depth

max_depth = get_max_depth()


def scale_list_sp(input_list):
    min_val = 0
    max_val = 2 * max_depth  # 38 for the WordNet version used here
    new_min = 0
    new_max = 10

    # Compute the scaling factor
    scale_factor = (new_max - new_min) / (max_val - min_val)

    # Apply the scaling to all elements in the list
    scaled_list = [new_min + (item - min_val) * scale_factor for item in input_list]

    return scaled_list

def scale_list_lc(input_list):
    min_val = 0
    max_val = np.log(2 * max_depth + 1)
    new_min = 0
    new_max = 10

    # Compute the scaling factor
    scale_factor = (new_max - new_min) / (max_val - min_val)

    # Apply the scaling to all elements in the list
    scaled_list = [new_min + (item - min_val) * scale_factor for item in input_list]

    return scaled_list



### Similarity Calculation Functions

#### `max_stock_wp(syn_arr_w1, syn_arr_w2)` Function

This function calculates the maximum Wu-Palmer Similarity, a measure of semantic similarity, between two lists of WordNet Synsets.

- **Parameters:**
  - `syn_arr_w1, syn_arr_w2` (lists of WordNet Synset objects): Lists of WordNet Synset objects to compare for similarity.

- **Returns:**
  - The maximum Wu-Palmer Similarity as a float.

#### `custom_wp(s1_arr, s2_arr)` Function

Calculate the custom Wu-Palmer similarity between arrays of synsets.

- **Args:**
  - `s1_arr` (list): List of synsets from the first array.
  - `s2_arr` (list): List of synsets from the second array.

- **Returns:**
  - float: The maximum custom similarity found between any pair of synsets.
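
Both `max_stock_wp` and `custom_wp` turn a sense-level measure into a word-level score by taking the maximum over all sense pairs. The sketch below shows that pattern in isolation; the sense labels, the scores, and the names `word_similarity` and `lookup` are hypothetical.

```python
from itertools import product


def word_similarity(senses_a, senses_b, sense_similarity):
    """Score two words by their most similar pair of senses."""
    return max(sense_similarity(a, b) for a, b in product(senses_a, senses_b))


# Hypothetical pairwise scores between senses of "bank" and "shore"
SCORES = {
    ("bank#river", "shore#land"): 0.8,
    ("bank#money", "shore#land"): 0.2,
}


def lookup(sense_a, sense_b):
    return SCORES[(sense_a, sense_b)]


best = word_similarity(["bank#river", "bank#money"], ["shore#land"], lookup)
print(best)
```

Taking the maximum means one closely related pair of senses is enough for two ambiguous words to score as similar.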

#### `compute_wp(df)` Function

Calculate the custom and reference (NLTK) Wu-Palmer similarities for each word pair in the DataFrame and collect the target values.

- **Args:**
  - `df` (DataFrame): Data read from the CSV file.

- **Returns:**
  - list: Custom Wu-Palmer similarity values.
  - list: Target (human reference) values.
  - list: Reference Wu-Palmer values computed with NLTK's `wup_similarity`.

#### `display_wup(wp_arr, target_arr, ref_wup_arr, df)` Function

Display the calculated custom Wu-Palmer similarity and target values in a pretty table.

- **Args:**
  - `wp_arr` (list): Custom Wu-Palmer similarity values.
  - `target_arr` (list): Target (human reference) values.
  - `ref_wup_arr` (list): Reference WuPalmer values.
  - `df` (DataFrame): Original data read from the CSV file.

- **Returns:**
  - None.

#### `show_wup_correlations(target_arr, wp_arr)` Function

Calculate and display the Pearson and Spearman correlations between the target values and a list of Wu-Palmer similarity values, computed once with the custom correlation functions and once with SciPy.

- **Args:**
  - `target_arr` (list): Target (human reference) values.
  - `wp_arr` (list): Wu-Palmer similarity values (custom or reference).

- **Returns:**
  - None.


In [4]:
def max_stock_wp(syn_arr_w1, syn_arr_w2):
    """
    Function to find the maximum Wu-Palmer Similarity (a measure of semantic similarity)
    between two lists of wordnet Synsets

    Parameters: 
    syn_arr_w1, syn_arr_w2: lists of wordnet Synset objects 

    Returns: 
    The maximum Wu-Palmer Similarity as a float
    """

    # Initialize empty list to store similarity scores
    similarities = []

    # Calculate Wu-Palmer Similarity for each pair of Synsets, one from each list
    for syn1 in syn_arr_w1:
        for syn2 in syn_arr_w2:
            wup_similarity = syn1.wup_similarity(syn2)
            similarities.append(wup_similarity)  # Add similarity score to list

    # Return the highest similarity score
    return np.max(similarities)

def custom_wp(s1_arr, s2_arr):
    """
    Calculate the custom Wu-Palmer similarity between arrays of synsets.

    Args:
        s1_arr (list): List of synsets from the first array.
        s2_arr (list): List of synsets from the second array.

    Returns:
        float: The maximum custom similarity found between any pair of synsets.
    """
    score_array = []

    # Loop through all possible pairs of synsets from the two input arrays
    for s1 in s1_arr:
        for s2 in s2_arr:
            # Find the Lowest Common Subsumer (LCS) and its depth
            lcs, lcs_depth = find_lcs(s1, s2)

            # Adjust depths to avoid division by zero in calculations
            if lcs_depth == 0:
                lcs_depth = 1
            depth_s1 = get_depth(s1)
            depth_s2 = get_depth(s2)
            if depth_s1 == 0:
                depth_s1 = 1
            if depth_s2 == 0:
                depth_s2 = 1
            
            # Calculate the custom Wu-Palmer similarity
            wp = 2 * lcs_depth / (depth_s1 + depth_s2)

            # Append the similarity score to the array
            if wp <= 1:
                score_array.append(wp)
            else:
                score_array.append(0)

    # Return the maximum custom similarity found between any pair of synsets
    return np.max(score_array)


def compute_wp(df):
    """
    Calculate the custom and reference Wu-Palmer similarities for each word pair
    and collect the target values from the DataFrame.

    Args:
        df (DataFrame): Data read from the CSV file.

    Returns:
        list: Custom Wu-Palmer similarity values.
        list: Target (human reference) values.
        list: Reference Wu-Palmer values from NLTK's wup_similarity.
    """

    # Initialize arrays to store the similarities and the target values
    wp_arr, target_arr, ref_wup = [], [], []

    # Iterate over each row in the DataFrame
    for i in range(len(df)):
        # Get the words from 'Word 1' and 'Word 2' columns of the current row
        w1, w2 = df["Word 1"][i], df["Word 2"][i]

        # Get the synsets (sets of cognitive synonyms) for both words
        syn_arr_w1 = [syn for syn in wn.synsets(w1) ]
        syn_arr_w2 = [syn for syn in wn.synsets(w2) ]

        # Ensure both words have synsets
        if len(syn_arr_w1) > 0 and len(syn_arr_w2) > 0:
            # Get the target value (human reference) from the 'Human (mean)' column of the current row
            target = df["Human (mean)"][i]
            target_arr.append(target)

            # Calculate the custom Wu-Palmer similarity between syn_arr_w1 and syn_arr_w2
            custom_wp_sim = custom_wp(syn_arr_w1, syn_arr_w2)
            wp_arr.append(custom_wp_sim)
            stock_wp = max_stock_wp(syn_arr_w1, syn_arr_w2)
            ref_wup.append(stock_wp)


    # Return the custom similarities, the targets, and the reference similarities
    return wp_arr, target_arr, ref_wup


def display_wup(wp_arr, target_arr, ref_wup_arr, df):
    """
    Display the custom Wu-Palmer similarity, the reference similarity, and the target values in a pretty table.

    Args:
        wp_arr (list): Custom Wu-Palmer similarity values.
        target_arr (list): Target (human reference) values.
        ref_wup_arr (list): Reference Wu-Palmer values.
        df (DataFrame): Original data read from the CSV file.

    Returns:
        None
    """
    # Initialize pretty table
    table = PrettyTable()
    table.set_style(MARKDOWN)
    table.field_names = ['N', 'Word1', 'Word2', 'HumanReference', 'Reference WuPalmer', 'CustomWuPalmer']

    # Add calculated data into the table
    # (rows line up with df only if compute_wp skipped no word pair)
    for i, (w1, w2, target, ref_wup, custom_wp_sim) in enumerate(zip(df["Word 1"], df["Word 2"], target_arr, ref_wup_arr, wp_arr)):

        # Add to table
        table.add_row([i+1, w1, w2, target, ref_wup, custom_wp_sim])

    # Display the table
    print("\n", table)

def show_wup_correlations(target_arr, wp_arr):
    # Rescale the human ratings from [0, 10] to [0, 1]
    target_arr = [x / 10 for x in target_arr]
    # Calculate correlation values
    pearson_corr = pearson_correlation(target_arr, wp_arr)
    spearman_corr = spearman_rank_correlation(target_arr, wp_arr)
    scipy_pearson = stats.pearsonr(target_arr, wp_arr).correlation
    scipy_spearman = stats.spearmanr(target_arr, wp_arr).correlation

    # Set up the table
    table = PrettyTable()

    # Set the field names for the table
    table.field_names = ["", "Pearson Correlation", "Spearman Correlation"]

    # Add rows
    table.add_row(["Custom", pearson_corr, spearman_corr])
    table.add_row(["Scipy", scipy_pearson, scipy_spearman])

    print(table) 

### Demo Function

#### `demo_wup()` Function

This function demonstrates the usage of the custom Wu-Palmer similarity calculation with a dataset from a CSV file.

- **Description:**
  - Reads the CSV file "WordSim353.csv" using Pandas to load the dataset.
  - Calculates custom Wu-Palmer similarity (`wp_arr`), collects target values (`target_arr`), and retrieves reference WuPalmer values (`ref`) using the `compute_wp(df)` function.
  - Displays a pretty table with the custom Wu-Palmer similarity, target values, and reference WuPalmer values using the `display_wup(wp_arr, target_arr, ref, df)` function.
  - Calculates and displays the correlation between custom Wu-Palmer similarity and target values (scaled to [0, 1]) using the `show_wup_correlations(target_arr, wp_arr)` function.
  - Calculates and displays the correlation between reference WuPalmer values and target values (scaled to [0, 1]) using the `show_wup_correlations(target_arr, ref)` function.

- **Returns:**
  - None.
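
Dividing the targets by 10 changes their scale but not their correlation with the similarity scores, because Pearson's coefficient is unaffected by multiplying one variable by a positive constant. A quick check follows, using the human ratings of the first five rows of the results table and their Wu-Palmer scores rounded to two decimals; the variable names are illustrative.

```python
import numpy as np

# First five word pairs from the results table (scores rounded for brevity)
human = np.array([6.77, 7.35, 10.0, 7.46, 7.62])
scores = np.array([0.92, 0.97, 1.0, 0.88, 0.82])

r_raw = np.corrcoef(human, scores)[0, 1]          # targets in [0, 10]
r_scaled = np.corrcoef(human / 10, scores)[0, 1]  # targets in [0, 1]
print(r_raw, r_scaled)
```

The same holds for Spearman's coefficient, since rescaling by a positive constant leaves the ranks unchanged.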

#### Execution:

To run the demo, call `demo_wup()`.

```python
demo_wup()
```


In [5]:
def demo_wup():
    
    # Read the CSV file
    df = pd.read_csv("WordSim353.csv")

    wp_arr, target_arr, ref = compute_wp(df)
    display_wup(wp_arr, target_arr, ref,df)
    print('Custom Wu-Palmer')
    show_wup_correlations(target_arr, wp_arr)
    print('stock Wu-Palmer')
    show_wup_correlations(target_arr, ref)

demo_wup()



 |  N  |     Word1     |     Word2      | HumanReference |  Reference WuPalmer |    CustomWuPalmer   |
|:---:|:-------------:|:--------------:|:--------------:|:-------------------:|:-------------------:|
|  1  |      love     |      sex       |      6.77      |  0.9230769230769231 |  0.9230769230769231 |
|  2  |     tiger     |      cat       |      7.35      |  0.9655172413793104 |  0.9655172413793104 |
|  3  |     tiger     |     tiger      |      10.0      |         1.0         |         1.0         |
|  4  |      book     |     paper      |      7.46      |        0.875        |        0.875        |
|  5  |    computer   |    keyboard    |      7.62      |  0.8235294117647058 |  0.8235294117647058 |
|  6  |    computer   |    internet    |      7.58      |  0.631578947368421  |  0.631578947368421  |
|  7  |     plane     |      car       |      5.77      |  0.7272727272727273 |  0.7272727272727273 |
|  8  |     train     |      car       |      6.31      |  0.7368421052631579 | 

### Shortest Path Functions

#### `max_stock_sp(syn_arr_w1, syn_arr_w2)` Function

This function computes the reference similarity for a word pair: the maximum of NLTK's `path_similarity` over all pairs of synsets, one taken from each list.

- **Args:**
  - `syn_arr_w1` (list): List of synsets from the first array.
  - `syn_arr_w2` (list): List of synsets from the second array.

- **Returns:**
  - float: The maximum reference similarity found between any pair of synsets.

#### `shortest_path(syn_arr_w1, syn_arr_w2)` Function

Calculate the shortest path similarity between two arrays of synsets.

- **Args:**
  - `syn_arr_w1` (list): List of synsets from the first array.
  - `syn_arr_w2` (list): List of synsets from the second array.

- **Returns:**
  - float: The highest value of `2 * max_depth - len(s1, s2)` over all pairs of synsets.


In [6]:
def max_stock_sp(syn_arr_w1, syn_arr_w2):
    """
    Calculate the reference similarity between two arrays of synsets.

    Args:
        syn_arr_w1 (list): List of synsets from the first array.
        syn_arr_w2 (list): List of synsets from the second array.

    Returns:
        float: The maximum reference similarity found between any pair of synsets.
    """
    max_path_similarity = 0  # Initialize the maximum path similarity to 0

    # Nested loops to iterate through all possible pairs of synsets, one from each array
    for synset1 in syn_arr_w1:
        for synset2 in syn_arr_w2:
            path_sim = synset1.path_similarity(synset2)  # Calculate the path similarity between 'synset1' and 'synset2'

            # Check if the calculated path similarity is not 'None' and greater than the current maximum
            if path_sim is not None and path_sim > max_path_similarity:
                max_path_similarity = path_sim  # Update the maximum path similarity if a higher value is found

    # print("\n REFERENCE DONE")
    return max_path_similarity  # Return the maximum path similarity found between any pair of synsets


def shortest_path(syn_arr_w1, syn_arr_w2):
    """
    Calculate the shortest path similarity between two arrays of synsets.

    Args:
        syn_arr_w1 (list): List of synsets from the first array.
        syn_arr_w2 (list): List of synsets from the second array.

    Returns:
        float: The highest value of 2 * max_depth - len(s1, s2) over all synset pairs.
    """
    score_array = []
    # Score every pair of synsets, one from 'syn_arr_w1' and one from 'syn_arr_w2'
    for s1 in syn_arr_w1:
        for s2 in syn_arr_w2:
            len_s1s2 = get_len(s1, s2)
            similarity = 2 * max_depth - len_s1s2
            score_array.append(similarity)

    # Return the best score, i.e. the one obtained for the closest pair of synsets
    return np.max(score_array)


### Similarity Calculation Functions for Shortest Path (SP) Measure

The following functions calculate semantic similarity scores using the Shortest Path (SP) measure between pairs of WordNet Synsets.

- **`compute_sp(df)` Function:**
  - This function computes custom Shortest Path (SP) similarity scores between synsets and collects target values from a DataFrame. It iterates through each row in the DataFrame, retrieves synsets for the corresponding words, calculates SP values, and stores them along with target values.
  - **Parameters:**
    - `df` (DataFrame): Original data read from the CSV file.
  - **Returns:**
    - `sp_arr` (list): List of custom Shortest Path similarity values.
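    - `target_arr` (list): List of target (human reference) values.
    - `ref_sp` (list): List of reference path similarity values computed with NLTK's `path_similarity`.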

- **`display_sp_table(sp_arr, target_arr, df)` Function:**
  - This function displays a formatted table with calculated custom Shortest Path (SP) similarity values, target values, and other information.
  - **Parameters:**
    - `sp_arr` (list): List of custom Shortest Path similarity values.
    - `target_arr` (list): List of target (human reference) values.
    - `df` (DataFrame): Original data read from the CSV file.
  - **Returns:**
    - None

- **`show_sp_correlations(target_arr, sp_arr)` Function:**
  - This function calculates and displays correlation values between target values and custom Shortest Path (SP) similarity values. It calculates both Pearson and Spearman correlation coefficients.
  - **Parameters:**
    - `target_arr` (list): List of target (human reference) values.
    - `sp_arr` (list): List of custom Shortest Path similarity values.
  - **Returns:**
    - None


In [7]:
def compute_sp(df):
    """
    Compute shortest path (SP) values between synsets of word pairs in a DataFrame.

    Parameters:
    - df: DataFrame containing word pairs and target values.

    Returns:
    - sp_arr: List of SP values.
    - target_arr: List of target values.
    - ref_sp: List of reference path similarity values from NLTK.
    """
    sp_arr, target_arr, ref_sp = [], [], []
    for i in range(len(df)):
        w1, w2 = df["Word 1"][i], df["Word 2"][i]
        syn_arr_w1, syn_arr_w2 = wn.synsets(w1), wn.synsets(w2)
        # Skip word pairs for which WordNet has no synsets
        if len(syn_arr_w1) > 0 and len(syn_arr_w2) > 0:
            target = df["Human (mean)"][i]
            target_arr.append(target)
            custom_sp = shortest_path(syn_arr_w1, syn_arr_w2)
            sp_arr.append(custom_sp)
            stock_sp = max_stock_sp(syn_arr_w1, syn_arr_w2)
            ref_sp.append(stock_sp)

    return sp_arr, target_arr, ref_sp


def display_sp_table(sp_arr, target_arr, df):
    """
    Display a table with SP values, target values, and word pairs from a DataFrame.

    Parameters:
    - sp_arr: List of SP values.
    - target_arr: List of target values.
    - df: DataFrame containing word pairs and target values.

    Returns:
    - None
    """
    interval = [0, 2 * max_depth]
    table = PrettyTable()
    table.set_style(MARKDOWN)
    table.field_names = ["N", "Word 1", "Word 2", "Human (mean)", "Range",  "SP Value", "Normalized"]
    norm_array = scale_list_sp(sp_arr)
    for i, (w1, w2, target, custom_sp_sim, norm_sp) in enumerate(zip(df["Word 1"], df["Word 2"], target_arr, sp_arr, norm_array)):
        # Add to table
        table.add_row([i+1, w1, w2, target, interval, custom_sp_sim, norm_sp])

    # Display the table
    print("\n", table)


def show_sp_correlations(target_arr, sp_arr):
    """
    Show correlation values between target values and SP values.

    Parameters:
    - target_arr: List of target values.
    - sp_arr: List of SP values.

    Returns:
    - None
    """
    # Calculate correlation values
    pearson_corr = pearson_correlation(target_arr, sp_arr)
    spearman_corr = spearman_rank_correlation(target_arr, sp_arr)
    scipy_pearson = stats.pearsonr(target_arr, sp_arr).correlation
    scipy_spearman = stats.spearmanr(target_arr, sp_arr).correlation

    # Set up the table
    table = PrettyTable()

    # Set the field names for the table
    table.field_names = ["", "Pearson Correlation", "Spearman Correlation"]

    # Add rows
    table.add_row(["Custom", pearson_corr, spearman_corr])
    table.add_row(["Scipy", scipy_pearson, scipy_spearman])

    # Display the table
    print(table)


### Demo Function

#### `demo_sp()` Function

This demo function computes shortest path (SP) values, displays a table, and shows correlation values.

- **Description:**
  - Reads the CSV file "WordSim353.csv" into a DataFrame using Pandas.
  - Computes shortest path (SP) values (`sp_arr`) and collects target values (`target_arr`) using the `compute_sp(df)` function.
  - Displays a table with SP values, target values, and other related data using the `display_sp_table(sp_arr, target_arr, df)` function.
  - Calculates and displays correlation values between target values and scaled SP values using the `show_sp_correlations(target_arr, sp_arr)` function.
  - Calculates and displays correlation values between target values and reference SP values using the `show_sp_correlations(target_arr, ref)` function.
  - The SP values are scaled using the `scale_list_sp(sp_arr)` function to normalize them within a specific range.

- **Returns:**
  - None.

#### Execution:

To run the demo, call `demo_sp()`.

```python
demo_sp()
```


In [8]:
def demo_sp():
    """
    Demo function to compute shortest path (SP) values, display a table, and show correlation values.

    Parameters:
    - None

    Returns:
    - None
    """
    # Read the CSV file into a DataFrame
    df = pd.read_csv("WordSim353.csv")

    # Compute shortest path (SP) values and target values
    sp_arr, target_arr, ref = compute_sp(df)

    # Display the SP table
    display_sp_table(sp_arr, target_arr, df)

    # Show correlation values between target values and SP values
    sp_arr = scale_list_sp(sp_arr)
    # print(sp_arr)
    show_sp_correlations(target_arr, sp_arr)
    print('Stock')
    show_sp_correlations(target_arr, ref)


# Call the demo_sp function to run the demo
demo_sp()



 |  N  |     Word 1    |     Word 2     | Human (mean) |  Range  | SP Value |     Normalized    |
|:---:|:-------------:|:--------------:|:------------:|:-------:|:--------:|:-----------------:|
|  1  |      love     |      sex       |     6.77     | [0, 38] |    37    | 9.736842105263158 |
|  2  |     tiger     |      cat       |     7.35     | [0, 38] |    37    | 9.736842105263158 |
|  3  |     tiger     |     tiger      |     10.0     | [0, 38] |    38    |        10.0       |
|  4  |      book     |     paper      |     7.46     | [0, 38] |    36    | 9.473684210526315 |
|  5  |    computer   |    keyboard    |     7.62     | [0, 38] |    35    | 9.210526315789473 |
|  6  |    computer   |    internet    |     7.58     | [0, 38] |    31    | 8.157894736842104 |
|  7  |     plane     |      car       |     5.77     | [0, 38] |    32    | 8.421052631578947 |
|  8  |     train     |      car       |     6.31     | [0, 38] |    33    |  8.68421052631579 |
|  9  |   telephone   | comm

### Leacock-Chodorow Similarity Calculation Function

#### `max_stock_lc(syn_arr_w1, syn_arr_w2)` Function

This function finds the maximum Leacock-Chodorow Similarity, a measure of semantic similarity, between two lists of WordNet Synsets.

- **Parameters:**
  - `syn_arr_w1, syn_arr_w2` (lists of WordNet Synset objects): Lists of WordNet Synset objects to compare for similarity.

- **Returns:**
  - float: The maximum Leacock-Chodorow Similarity found between any pair of synsets.

#### `leakcock_chodorow(s1_arr, s2_arr)` Function

Calculate the Leacock-Chodorow similarity between two lists of synsets, returning the highest score over all synset pairs.

- **Args:**
  - `s1_arr` (list): List of synsets from the first array.
  - `s2_arr` (list): List of synsets from the second array.

- **Returns:**
  - float: The Leacock-Chodorow similarity score.

#### `compute_lc(df)` Function

Compute the Leacock-Chodorow Similarity (abbreviated LCS in this part, not to be confused with the Lowest Common Subsumer) for each pair of words in the given DataFrame.

- **Parameters:**
  - `df` (Pandas DataFrame): A DataFrame containing columns "Word 1", "Word 2", and "Human (mean)".

- **Returns:**
  - `lc_arr` (list): A list of LCS values for each pair of words.
  - `target_arr` (list): A list of target values from the "Human (mean)" column.
  - `ref_lc` (list): A list of reference Leacock-Chodorow Similarity values.

#### `display_lc_table(lc_arr, target_arr, df)` Function

Display a table showing the Leacock-Chodorow Similarity (LCS) values, target values, and other information.

- **Parameters:**
  - `lc_arr` (list): A list of LCS values for each pair of words.
  - `target_arr` (list): A list of target values.
  - `df` (Pandas DataFrame): A DataFrame containing columns "Word 1", "Word 2", and "Human (mean)".

- **Returns:**
  - None.

#### `show_lc_correlations(target_arr, lc_arr)` Function

Display a table showing the correlation values between target values and LCS values.

- **Parameters:**
  - `target_arr` (list): A list of target values.
  - `lc_arr` (list): A list of LCS values.

- **Returns:**
  - None.


In [9]:

def max_stock_lc(syn_arr_w1, syn_arr_w2):
    """
    Function to find the maximum Leacock-Chodorow Similarity (a measure of semantic similarity)
    between two lists of wordnet Synsets

    Parameters:
    syn_arr_w1, syn_arr_w2: lists of wordnet Synset objects

    Returns:
    The maximum Leacock-Chodorow Similarity as a float
    """

    # Initialize empty list to store similarity scores
    similarities = []

    # Calculate Leacock-Chodorow Similarity for each pair of Synsets, one from each list
    for syn1 in syn_arr_w1:
        for syn2 in syn_arr_w2:
            # lch_similarity requires both synsets to share a part of speech
            if syn1.pos() == syn2.pos():
                lc_similarity = syn1.lch_similarity(syn2)
            else:
                lc_similarity = 0
            similarities.append(lc_similarity)  # Add similarity score to list

    # Return the highest similarity score
    return np.max(similarities)

def leakcock_chodorow(s1_arr, s2_arr):
    """
    Calculate the maximum Leacock-Chodorow similarity between two lists of synsets.

    Args:
        s1_arr: List of synsets for the first word.
        s2_arr: List of synsets for the second word.

    Returns:
        float: The highest Leacock-Chodorow similarity score over all synset pairs.
    """
    score_array = []
    for s1 in s1_arr:
        for s2 in s2_arr:

            length = get_len(s1, s2)
            # print(f'DISTANCE BETWEEN {s1} AND {s2} = {length}')
            denominator = 2 * max_depth

            if length == 0:  # Identical senses: shift both terms by 1 to avoid log(0)
                length += 1
                denominator += 1

            similarity = -np.log(length / denominator)
            score_array.append(similarity)
    return np.max(score_array)


from nltk.corpus import wordnet as wn

def compute_lc(df):
    """
    Compute the Leacock-Chodorow Similarity (LCS) for each pair of words in the given DataFrame.

    Parameters:
    - df: A pandas DataFrame containing columns "Word 1", "Word 2", and "Human (mean)".

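    Returns:
    - lc_arr: A list of LCS values (custom implementation) for each pair of words.
    - target_arr: A list of target values from the "Human (mean)" column.
    - ref_lc: A list of reference LCS values computed with NLTK's lch_similarity.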
    """
    lc_arr, target_arr, ref_lc = [], [], []

    # Iterate over each row in the DataFrame
    for i in range(len(df)):
        w1, w2 = df["Word 1"][i], df["Word 2"][i]
        syn_arr_w1, syn_arr_w2 = wn.synsets(w1), wn.synsets(w2)

        # Check if both words have synsets
        if len(syn_arr_w1) > 0 and len(syn_arr_w2) > 0:
            target = df["Human (mean)"][i]
            target_arr.append(target)

            # Compute the Leacock-Chodorow Similarity
            custom_lc = leakcock_chodorow(syn_arr_w1, syn_arr_w2)
            lc_arr.append(custom_lc)
            stock_lc = max_stock_lc(syn_arr_w1, syn_arr_w2)
            ref_lc.append(stock_lc)

    return lc_arr, target_arr, ref_lc



def display_lc_table(lc_arr, target_arr, df):
    """
    Display a table showing the Leacock-Chodorow Similarity (LCS) values, target values, and other information.

    Parameters:
    - lc_arr: A list of LCS values for each pair of words.
    - target_arr: A list of target values.
    - df: A pandas DataFrame containing columns "Word 1", "Word 2", and "Human (mean)".

    Returns:
    - None
    """
    interval = [0, np.log(2 * max_depth + 1)]
    table = PrettyTable()
    table.set_style(MARKDOWN)
    table.field_names = ["N", "Word 1", "Word 2", "Human (mean)", "Range", "LC Value", "Normalized"]
    norm_lc_arr = scale_list_lc(lc_arr)
    # Iterate over each row in the DataFrame.
    # Rows are paired by position, so this assumes compute_lc skipped no word pairs.
    for i, (w1, w2, target, custom_lc_sim, norm_lc) in enumerate(zip(df["Word 1"], df["Word 2"], target_arr, lc_arr, norm_lc_arr)):
        # Add to table
        table.add_row([i+1, w1, w2, target, interval, custom_lc_sim, norm_lc])

    # Display the table
    print("\n", table)


def show_lc_correlations(target_arr, lc_arr):
    """
    Display a table showing the correlation values between target values and LCS values.

    Parameters:
    - target_arr: A list of target values.
    - lc_arr: A list of LCS values.

    Returns:
    - None
    """
    # Calculate correlation values
    pearson_corr = pearson_correlation(target_arr, lc_arr)
    spearman_corr = spearman_rank_correlation(target_arr, lc_arr)
    scipy_pearson = stats.pearsonr(target_arr, lc_arr).correlation
    scipy_spearman = stats.spearmanr(target_arr, lc_arr).correlation

    # Set up the table
    table = PrettyTable()

    # Set the field names for the table
    table.field_names = ["", "Pearson Correlation", "Spearman Correlation"]

    # Add rows
    table.add_row(["Custom", pearson_corr, spearman_corr])
    table.add_row(["Scipy", scipy_pearson, scipy_spearman])

    # Display the table
    print(table)


### Demo Function

#### `demo_lc()` Function

This demo function computes Leacock-Chodorow Similarity (LCS) values, displays a table, and shows correlation values.

- **Description:**
  - Reads the CSV file "WordSim353.csv" into a DataFrame using Pandas.
  - Computes Leacock-Chodorow Similarity (LCS) values (`lc_arr`), collects target values (`target_arr`), and collects reference values (`ref`) using the `compute_lc(df)` function.
  - Displays a table with LCS values, their normalized counterparts, and target values using the `display_lc_table(lc_arr, target_arr, df)` function.
  - Scales the LCS values using the `scale_list_lc(lc_arr)` function to normalize them within a specific range.
  - Calculates and displays correlation values between target values and scaled LCS values using the `show_lc_correlations(target_arr, lc_arr)` function.
  - Calculates and displays correlation values between target values and reference LCS values using the `show_lc_correlations(target_arr, ref)` function.

- **Returns:**
  - None.
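
Scaling before correlating should not change the coefficients: Pearson correlation is invariant under positive linear rescaling, and Spearman correlation under any strictly increasing transform. The sketch below checks the Pearson case on toy data. It assumes that `scale_list_lc` maps the interval `[0, log(2 * max_depth + 1)]` linearly onto `[0, 10]`, which is consistent with the printed table but is not confirmed by the code shown here. The helpers `scale_to_ten` and `pearson` and all numeric values are illustrative.

```python
import math

def scale_to_ten(values, upper):
    """Linearly map values from [0, upper] onto [0, 10] (assumed behaviour)."""
    return [10 * v / upper for v in values]

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

upper = math.log(2 * 19 + 1)                  # assumption: max_depth = 19
raw = [upper, 3.2, 2.9, 1.7, 0.8]             # toy LC scores
targets = [10.0, 7.4, 7.5, 6.1, 2.3]          # toy human ratings

scaled = scale_to_ten(raw, upper)
print(pearson(targets, raw))
print(pearson(targets, scaled))
```

Both printed values agree up to floating-point error, so the scaling step mainly serves to make the scores comparable with the 0 to 10 human ratings in the table.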

#### Execution:

To run the demo, call `demo_lc()`.

```python
demo_lc()
```


In [10]:
def demo_lc():
    """
    Demo function to compute Leacock-Chodorow Similarity (LCS) values, display a table, and show correlation values.

    Parameters:
    - None

    Returns:
    - None
    """
    # Read the CSV file into a DataFrame
    df = pd.read_csv("WordSim353.csv")

    # Compute Leacock-Chodorow Similarity (LCS) values and target values
    lc_arr, target_arr, ref = compute_lc(df)

    # Display the LCS table
    display_lc_table(lc_arr, target_arr, df)

    # Scale the LCS values, then show their correlation with the target values
    lc_arr = scale_list_lc(lc_arr)
    show_lc_correlations(target_arr, lc_arr)
    print('stock:')
    show_lc_correlations(target_arr, ref)

# Call the demo_lc function to run the demo
demo_lc()



|  N  |     Word 1    |     Word 2     | Human (mean) |          Range          |      LC Value      |     Normalized     |
|:---:|:-------------:|:--------------:|:------------:|:-----------------------:|:------------------:|:------------------:|
|  1  |      love     |      sex       |     6.77     | [0, 3.6635616461296463] | 3.6375861597263857 | 9.929097722620002  |
|  2  |     tiger     |      cat       |     7.35     | [0, 3.6635616461296463] | 3.6375861597263857 | 9.929097722620002  |
|  3  |     tiger     |     tiger      |     10.0     | [0, 3.6635616461296463] | 3.6635616461296463 |        10.0        |
|  4  |      book     |     paper      |     7.46     | [0, 3.6635616461296463] | 2.9444389791664407 | 8.037094127451303  |
|  5  |    computer   |    keyboard    |     7.62     | [0, 3.6635616461296463] | 2.538973871058276  | 6.930342973048001  |
|  6  |    computer   |    internet    |     7.58     | [0, 3.6635616461296463] | 1.6916760106710724 | 4.617572117172468  |
|  7  