# <center>Clustering Analysis</center>

<p>Team Name: Team Regular
<p>Student Names: Alameen Adeku, Adam Rodi, Adriean Lemoine, Nicholas Burgo

## Instructions
Use generic coding style unless hard-coded values are really necessary.<br>
Your code must be efficient and use self-explanatory naming.<br>
Use appropriate Python library methods for each task instead of using loops.<br>
Run your entire code and save. Then submit this <b>saved</b> copy.

## Imports

```python
import numpy as np
import pandas as pd
import tifffile as tfl
```

```python
def load_tiff_stacks(directory: str, n: int):
    '''Load ZS-1.tif ... ZS-n.tif from directory as a list of 3D float arrays.'''
    return [
        tfl.imread(f'{directory}/ZS-{i}.tif').astype(float)
        for i in range(1, n + 1)
    ]
```
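The instructions ask for generic code over hard-coded values, yet `load_tiff_stacks` needs the stack count `n` up front. Below is a minimal sketch of discovering the files instead. It assumes every stack in the folder follows the `ZS-<number>.tif` pattern that `load_tiff_stacks` reads. `find_tiff_stacks` and `STACK_NAME` are hypothetical names, not part of the notebook.

```python
import re
from pathlib import Path

# Hypothetical pattern: mirrors the ZS-<number>.tif names read by load_tiff_stacks
STACK_NAME = re.compile(r'ZS-(\d+)\.tif')


def find_tiff_stacks(directory):
    '''Return the ZS-<k>.tif paths in directory, ordered by the number k.'''
    numbered = [
        (int(found.group(1)), path)
        for path in Path(directory).iterdir()
        if (found := STACK_NAME.fullmatch(path.name))
    ]
    # Sort numerically so ZS-10.tif comes after ZS-9.tif rather than after ZS-1.tif
    return [path for _, path in sorted(numbered)]
```

Each returned path could then be read as before, e.g. `[tfl.imread(path).astype(float) for path in find_tiff_stacks('../Data/zebrafish-data')]`. This way the number of fish per group follows from the data on disk.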

```python
def normalize_all_tiff_stacks(tiff_stack_array):
    '''Normalize every stack in the list independently.'''
    return [normalize_tiff_stack(stack) for stack in tiff_stack_array]


def normalize_tiff_stack(tiff_stack):
    '''Scale a stack into [0, 1] by dividing by its maximum intensity.

    Assumes non-negative intensities. A stack whose maximum is 0 is returned
    unchanged to avoid division by zero.
    '''
    max_intensity = tiff_stack.max()
    if max_intensity == 0:
        return tiff_stack
    return tiff_stack / max_intensity
```
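Dividing by the maximum maps intensities into [min/max, 1]. The darkest voxel therefore reaches 0 only when its raw intensity is 0. In exchange, the ratios between intensities are preserved. Min-max scaling is a different technique that stretches every stack to the full [0, 1] range. Below is a minimal sketch for comparison; `normalize_stack_minmax` is a hypothetical helper, not part of the notebook.

```python
import numpy as np


def normalize_stack_minmax(stack):
    '''Hypothetical alternative: stretch intensities so min -> 0 and max -> 1.'''
    low, high = stack.min(), stack.max()
    if high == low:
        # Constant stack: there is no contrast to stretch, so return all zeros
        return np.zeros_like(stack, dtype=float)
    return (stack - low) / (high - low)


stack = np.array([[[10.0, 20.0], [30.0, 40.0]]])
print(stack / stack.max())            # max scaling: values span 0.25 to 1.0
print(normalize_stack_minmax(stack))  # min-max scaling: values span 0.0 to 1.0
```

Which one suits the clustering depends on whether per-stack brightness offsets should survive normalization. Min-max scaling removes both the offset and the scale of each stack, while max scaling removes only the scale.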

```python
def get_all_pixel_tables(tif_stack_array):
    '''Convert every 3D stack in the list into a 2D pixel table.'''
    return [stack_to_pixel_table(stack) for stack in tif_stack_array]


def stack_to_pixel_table(tiff_array):
    '''Flatten a 3D (z, y, x) stack into an (N, 4) table of z, y, x, intensity.'''
    depth, height, width = tiff_array.shape

    # Coordinate grid with one entry per voxel; 'ij' indexing keeps (z, y, x) order
    z, y, x = np.meshgrid(
        np.arange(depth),
        np.arange(height),
        np.arange(width),
        indexing='ij'
    )

    # One row per voxel, all in C order; coordinates are upcast to float
    # because the intensity column is float
    pixel_table = np.column_stack((
        z.ravel(),          # z coordinate
        y.ravel(),          # y coordinate
        x.ravel(),          # x coordinate
        tiff_array.ravel()  # intensity
    ))

    return pixel_table
```
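`np.meshgrid(..., indexing='ij')` followed by flattening yields one row per voxel in C order. Below is a minimal sketch of the same table built with `np.indices`, which returns all coordinate grids in a single array and also works for stacks of any dimensionality. `stack_to_pixel_table_indices` is a hypothetical name, and this is an equivalent alternative rather than a required change.

```python
import numpy as np


def stack_to_pixel_table_indices(stack):
    '''Equivalent sketch using np.indices instead of np.meshgrid.'''
    # np.indices(shape) has shape (ndim, *shape): the z, y, x grids stacked together
    coords = np.indices(stack.shape).reshape(stack.ndim, -1).T
    # Append intensities as the last column; the C-order reshape matches stack.ravel()
    return np.column_stack((coords, stack.ravel()))


stack = np.arange(12, dtype=float).reshape(2, 2, 3)
table = stack_to_pixel_table_indices(stack)
print(table[:4])  # rows (z, y, x, intensity): (0,0,0,0), (0,0,1,1), (0,0,2,2), (0,1,0,3)
```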

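```python
def stacks_to_dataframe(stacks_2d, group_label):
    '''Combine a list of 2D pixel tables into one DataFrame labeled by fish_id and group.'''
    columns = ['z', 'y', 'x', 'intensity']
    frames = [
        pd.DataFrame(pixel_table, columns=columns)
        .astype({'z': int, 'y': int, 'x': int})  # column_stack stored coordinates as float
        .assign(fish_id=fish_id, group=group_label)
        for fish_id, pixel_table in enumerate(stacks_2d, start=1)
    ]
    if not frames:
        return pd.DataFrame(columns=columns + ['fish_id', 'group'])
    # A single concat avoids re-copying the accumulated rows on every iteration
    return pd.concat(frames, ignore_index=True)
```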

```python
def build_group_dataframe(directory: str, n: int, group_label='control'):
    '''Load, normalize, and flatten n stacks from directory into one labeled DataFrame.'''
    # Load stacks
    stacks = load_tiff_stacks(directory, n)

    # Normalize each stack to [0, 1]
    stacks_norm = normalize_all_tiff_stacks(stacks)

    # Convert each 3D stack to a 2D pixel table
    stacks_2d = get_all_pixel_tables(stacks_norm)

    # Combine all pixel tables into a single pandas DataFrame
    return stacks_to_dataframe(stacks_2d, group_label)
```

## Read Data

### Load Fish Scans

```python
num_in_control = 3
control_df = build_group_dataframe('../Data/zebrafish-data', num_in_control, 'control')

# Placeholders for fish categories 1 and 2 (not loaded yet)
fish_cat1 = []
num_fish_cat1 = 0

fish_cat2 = []
num_fish_cat2 = 0
```

```python
print(control_df.head())
print()
print(control_df.dtypes)
```

```
   z  y  x  intensity  fish_id    group
0  0  0  0   0.024174        1  control
1  0  0  1   0.021707        1  control
2  0  0  2   0.024174        1  control
3  0  0  3   0.025160        1  control
4  0  0  4   0.025160        1  control

z              int64
y              int64
x              int64
intensity    float64
fish_id        int64
group         object
dtype: object
```
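The `dtypes` listing shows 64-bit integer coordinates and an `object` column for `group`. Because every voxel of every fish becomes a row, the frame grows quickly. Below is a minimal sketch of shrinking the dtypes with `DataFrame.astype`, in the spirit of the notebook's efficiency instruction. It assumes each axis has fewer than 32768 voxels (so `int16` fits) and that `float32` precision is enough for normalized intensities. `compact_pixel_dataframe` is a hypothetical helper, and the demo frame is synthetic.

```python
import numpy as np
import pandas as pd


def compact_pixel_dataframe(df):
    '''Hypothetical helper: shrink the dtypes of a pixel DataFrame to save memory.'''
    return df.astype({
        'z': 'int16', 'y': 'int16', 'x': 'int16',  # assumes each axis has < 32768 voxels
        'intensity': 'float32',                    # about 7 significant digits
        'fish_id': 'int16',
        'group': 'category',                       # stores each label once, not per row
    })


# Small synthetic frame with the same columns as control_df
demo = pd.DataFrame({
    'z': np.zeros(1000, dtype=int), 'y': np.zeros(1000, dtype=int), 'x': np.arange(1000),
    'intensity': np.random.default_rng(0).random(1000),
    'fish_id': 1, 'group': 'control',
})
before = demo.memory_usage(deep=True).sum()
after = compact_pixel_dataframe(demo).memory_usage(deep=True).sum()
print(before, after)  # bytes before and after compaction
```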


**Still to do in this section:**
- Generate texture data for each pixel
- Attach texture data to each sample (pixel)
- Generate gradient data for each pixel
- Attach gradient data to each sample (pixel)
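For the gradient items, `np.gradient` returns central-difference derivatives along each axis, and their Euclidean norm gives a per-voxel gradient magnitude. The notebook does not say which texture measure is intended. This sketch therefore assumes local intensity variance in a cubic window as a simple stand-in; GLCM/Haralick features would be another common choice. The function names are hypothetical. Both outputs have the stack's shape, so `.ravel()` puts them in the same voxel order as `stack_to_pixel_table`, and they can be attached as extra columns.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gradient_magnitude(stack):
    '''Per-voxel gradient magnitude from central differences along z, y and x.'''
    dz, dy, dx = np.gradient(stack)
    return np.sqrt(dz**2 + dy**2 + dx**2)


def local_variance(stack, window=3):
    '''Simple texture stand-in: intensity variance in a window^3 neighbourhood.'''
    if window % 2 == 0:
        raise ValueError('window must be odd so each voxel sits at its centre')
    pad = window // 2
    # Reflect-pad so edge voxels also get a full neighbourhood
    padded = np.pad(stack, pad, mode='reflect')
    # View of shape (depth, height, width, window, window, window); no copy yet
    windows = sliding_window_view(padded, (window,) * 3)
    return windows.var(axis=(-3, -2, -1))


stack = np.random.default_rng(0).random((4, 5, 6))
features = np.column_stack((gradient_magnitude(stack).ravel(),
                            local_variance(stack).ravel()))
print(features.shape)  # (120, 2): one row per voxel, same order as stack.ravel()
```

Computing `var` over the window view creates temporaries roughly window³ times the stack size. For full-size scans, a filter such as `scipy.ndimage.uniform_filter` applied to the stack and to its square (variance = E[x²] - E[x]²) would use less memory.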

## Visual Exploration of Data

### Histograms

### Distributions

### Box-Whisker Plots

### Violin Plots

## Data Quality & Cleaning

Instruction: Add a comment for each method

## Handling Redundancy

### Chi-square Test

### Correlation Analysis

### Visual Exploration (scatter-plot matrix)

## Dimensionality Reduction

### PCA

## Discretization

### Histogram of Discretized Attribute

### Chi-square Test of Discretized Attributes

### Visual Exploration (scatter-plot matrix) of Discretized Attributes

## Feature Selection/Generation

### Select Features

### Generate Features

# Generate Clusters

## K-means

## Hierarchical

# Evaluation of Clusters

See the instructions provided in the report template.

## <center> REFERENCES </center>
List resources (book, internet page, etc.) that you used to complete this challenge.

- NumPy documentation (v2.3): https://numpy.org/doc/2.3/index.html
- pandas documentation: https://pandas.pydata.org/docs/
- tifffile on PyPI: https://pypi.org/project/tifffile/