### [Accelerometer-Based-Alcohol-Consumption-Analysis](https://github.com/Aniruthan-0709/Accelerometer-Based-Alcohol-Consumption-Analysis/tree/main)


In this project, I analyze smartphone accelerometer data to identify patterns associated with heavy drinking episodes. I compute permutation entropy and complexity of the motion signals and relate them to transdermal alcohol content (TAC) readings to see whether motion data reflects variations in alcohol consumption levels.

### Objective
The primary goal was to develop an analytical tool that processes accelerometer readings to differentiate between sober and heavy drinking states, offering insight into an individual's drinking behavior and supporting early intervention when necessary.

### Methodology
The project unfolded in several key phases:

1. **Data Acquisition and Preprocessing**: I concatenated all TAC readings into a single dataframe, converted the accelerometer timestamps from milliseconds to seconds, and merged the accelerometer data with the transdermal alcohol content (TAC) readings. The resulting dataset captures both motion dynamics and measured alcohol levels.

2. **Exploratory Data Analysis (EDA)**: Utilizing visualization techniques, I explored the accelerometer data across three axes in conjunction with TAC values for each participant. This exploration was instrumental in understanding the data's structure and laying the groundwork for subsequent analysis.

3. **Feature Extraction and Analysis**: The core of the analysis is the computation of permutation entropy and complexity, two metrics that measure the disorder and structural complexity of time series data. I used a set of helper functions to calculate them for each axis and TAC level, which describes motion patterns in relation to alcohol consumption.

4. **Results Aggregation and Visualization**: By aggregating the calculated metrics and correlating them with TAC levels, I could discern patterns indicative of heavy drinking. The results were visualized through scatter plots, effectively illustrating the relationship between motion complexity and alcohol intake.
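
For reference, the two metrics can be written out as follows. This is a sketch of the formulation that the helper functions later in this notebook appear to follow, where $p_i$ is the relative frequency of the $i$-th ordinal pattern, $P_e$ is the uniform distribution, and $N$ is the number of patterns. Note the assumption about $N$: the notebook code takes it to be the number of distinct patterns that actually occur in a segment, whereas many references use $N = m!$ for embedding dimension $m$.

$$
S[P] = -\sum_{i=1}^{N} p_i \ln p_i, \qquad H_S[P] = \frac{S[P]}{\ln N}
$$

$$
J[P, P_e] = S\!\left[\frac{P + P_e}{2}\right] - \frac{1}{2} S[P] - \frac{1}{2} \ln N
$$

$$
C_{JS}[P] = Q_0 \, J[P, P_e] \, H_S[P], \qquad Q_0 = -2 \left[ \frac{N+1}{N} \ln(N+1) - 2 \ln(2N) + \ln N \right]^{-1}
$$

Here $H_S$ is the normalized permutation entropy, $J$ is the Jensen-Shannon divergence between the observed pattern distribution and the uniform one, and $Q_0$ normalizes $J$ so that it lies between 0 and 1.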

**Step 1**

The Project 2 folder is downloaded from Canvas and uploaded to my Google Drive. The Drive is mounted and the necessary libraries are imported.

In [None]:
from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact, Dropdown
from IPython.display import display,clear_output
from scipy.stats import entropy
from plotly.subplots import make_subplots
import plotly.graph_objects as go

In [None]:
# loading all the datasets from the drive to dataframes
acc=pd.read_csv('/content/drive/MyDrive/FDA/Project 2/data/all_accelerometer_data_pids_13.csv')
phone=pd.read_csv('/content/drive/MyDrive/FDA/Project 2/data/phone_types.csv')

In [None]:
acc

| | time | pid | x | y | z |
|---|---|---|---|---|---|
| 0 | 0 | JB3156 | 0.000000 | 0.000000 | 0.000000 |
| 1 | 0 | CC6740 | 0.000000 | 0.000000 | 0.000000 |
| 2 | 1493733882409 | SA0297 | 0.075800 | 0.027300 | -0.010200 |
| 3 | 1493733882455 | SA0297 | -0.035900 | 0.079400 | 0.003700 |
| 4 | 1493733882500 | SA0297 | -0.242700 | -0.086100 | -0.016300 |
| ... | ... | ... | ... | ... | ... |
| 14057562 | 1493829248196 | CC6740 | -0.133956 | 0.124726 | -0.010736 |
| 14057563 | 1493829248220 | CC6740 | -0.100764 | 0.180872 | 0.046449 |
| 14057564 | 1493829248245 | CC6740 | -0.131853 | 0.195934 | 0.181088 |
| 14057565 | 1493829248270 | CC6740 | -0.149704 | 0.194482 | 0.202393 |


**Step 2**

The 13 TAC reading files are concatenated to one dataframe.

In [None]:
#loading all the TAC files into a single dataframe along with their PIDs
concatenated_df = pd.concat(
    (pd.read_csv(f'/content/drive/MyDrive/FDA/Project 2/data/clean_tac/{y}_clean_TAC.csv').assign(pid=y) for y in phone['pid']),
    axis=0
)
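
To make the concatenation pattern above easier to follow, here is a minimal, self-contained sketch of the same idea on in-memory data. The participant IDs `P1` and `P2`, the helper `concat_with_pid`, and all values are hypothetical stand-ins, not taken from the project files.

```python
import io
import pandas as pd

# Hypothetical stand-ins for two per-participant TAC files (values are made up).
fake_files = {
    "P1": "timestamp,TAC_Reading\n100,0.01\n200,0.05\n",
    "P2": "timestamp,TAC_Reading\n100,0.00\n",
}

def concat_with_pid(files):
    """Read each CSV, tag its rows with the participant ID, and stack them."""
    frames = (
        pd.read_csv(io.StringIO(text)).assign(pid=pid)
        for pid, text in files.items()
    )
    # ignore_index=True gives a fresh 0..n-1 index instead of repeating
    # each file's own row numbers.
    return pd.concat(frames, axis=0, ignore_index=True)

tac_all = concat_with_pid(fake_files)
print(tac_all)
```

Tagging each frame with `assign(pid=...)` before stacking is what keeps the rows attributable to a participant once they share one dataframe. The `ignore_index=True` argument is an optional addition in this sketch; the notebook cell keeps the per-file indices.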

In [None]:
concatenated_df.to_csv('/content/drive/MyDrive/FDA/Project 2/data/test.csv')
# Storing the concatenated data as a csv for verification
concatenated_df['pid'].nunique()
#verification

13

In [None]:
#converting accelerometer time readings from milliseconds to seconds.
acc['time']=acc['time']//1000

**Step 3**

The concatenated TAC datasets and the accelerometer dataset are merged into a single df.

In [None]:
# performing an inner join of the accelerometer readings and the concatenated TAC readings
df_merged=pd.merge(acc,concatenated_df,left_on=['pid','time'],right_on=['pid','timestamp'])
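
The effect of the millisecond-to-second conversion followed by the inner join can be seen on a small example. This is only a sketch with made-up values and hypothetical names (`acc_demo`, `tac_demo`, `merged_demo`); it assumes, as in the notebook, that the TAC timestamps are in whole seconds.

```python
import pandas as pd

# Hypothetical accelerometer samples with millisecond timestamps (made-up values).
acc_demo = pd.DataFrame({
    "time": [1000, 1025, 1999, 2000, 3500],
    "pid": ["P1"] * 5,
    "x": [0.1, 0.2, 0.3, 0.4, 0.5],
})
# Hypothetical TAC readings with second-resolution timestamps.
tac_demo = pd.DataFrame({
    "timestamp": [1, 2],
    "pid": ["P1", "P1"],
    "TAC_Reading": [0.02, 0.09],
})

# Floor-divide so every sample within the same second shares one integer key.
acc_demo["time"] = acc_demo["time"] // 1000

# Inner join: only seconds that have a TAC reading survive.
merged_demo = pd.merge(
    acc_demo, tac_demo,
    left_on=["pid", "time"], right_on=["pid", "timestamp"],
    how="inner",
)
print(merged_demo)
```

Every accelerometer sample that falls in a second with a TAC reading is kept and receives that reading, so one TAC value is repeated across many motion samples. Samples from seconds without a TAC reading (the last row in this toy input) are dropped.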

**Step 4**

1) Plotting the sampled x, y, z and TAC values for all users. The graphs for the selected pid are displayed.

2) Storing the sampled x, y, z and TAC values in a dictionary with 13 keys, one for each pid.

In [None]:
# Get unique PIDs
unique_pids = df_merged['pid'].unique()
# Function to plot data based on selected PID
def plot_data(pid):
    # Clear previous output
    clear_output(wait=True)

    # Get the selected PID data
    pid_data = df_merged[df_merged['pid'] == pid]
    time = np.arange(pid_data['y'].size) / 40  # time axis in seconds, assuming 40 accelerometer samples per second

    # Create new figure and subplots
    fig, ax = plt.subplots(4, figsize=(15, 8))

    # Plot accelerometer data
    ax[0].plot(time, pid_data['x'], label='X-axis')
    ax[1].plot(time, pid_data['y'], label='Y-axis')
    ax[2].plot(time, pid_data['z'], label='Z-axis')

    # Plot TAC readings
    ax[3].plot(time, pid_data['TAC_Reading'], label='TAC Reading', color='g')

    # Add horizontal line at TAC = 0.08 to indicate the legal intoxication limit
    ax[3].axhline(y=0.08, color='r', linestyle='--')

    # Add legend to each subplot
    for i in range(4):
        ax[i].legend()

    # Show the plot
    plt.xlabel("Time in seconds")
    plt.tight_layout()
    plt.show()



# Create interactive dropdown widget for selecting PID
interact(plot_data, pid=Dropdown(options=unique_pids, description='PID:'))




In [None]:
filtered_dict = {}
for i in df_merged['pid'].unique():
    filtered = df_merged[df_merged['pid'] == i]

    time = np.arange(filtered['x'].size) / 40
    filtered_data = {
        'time': time,
        'x': filtered['x'].values,
        'y': filtered['y'].values,
        'z': filtered['z'].values,
        'TAC': filtered['TAC_Reading'].values
    }

    # Add the DataFrame to the dictionary
    filtered_dict[i] = pd.DataFrame(filtered_data)


**Step 5**

1) Calculating complexity and entropy for each user from the sampled data stored in filtered_dict. The final dataframe has the columns: pid, x_entropy, x_complexity, y_entropy, y_complexity, z_entropy, z_complexity, and TAC.

2) Plotting a line graph of {axis}_entropy and {axis}_complexity against TAC values for the selected pid.

3) Plotting a scatterplot of x_entropy against y_complexity. The size of each point reflects the TAC value, and the color indicates whether it is at or above 0.08 (red) or below 0.08 (green).
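
Before the full implementation, a small worked example may help show what permutation entropy measures. The sketch below is an independent, simplified illustration and not the code used in this notebook: the helper names are hypothetical, ties are broken by position, and the entropy is normalized by log(m!), whereas the notebook's `p_entropy` normalizes by the log of the number of patterns that actually occur.

```python
from collections import Counter
from math import factorial, log

def ordinal_pattern_counts(series, m=3, delay=1):
    """Count the rank-order pattern of every length-m window (hypothetical helper)."""
    counts = Counter()
    for start in range(len(series) - (m - 1) * delay):
        window = [series[start + k * delay] for k in range(m)]
        # The pattern is the order of positions that sorts the window.
        pattern = tuple(sorted(range(m), key=lambda k: window[k]))
        counts[pattern] += 1
    return counts

def normalized_permutation_entropy(series, m=3, delay=1):
    """Shannon entropy of the pattern distribution, divided by log(m!)."""
    counts = ordinal_pattern_counts(series, m, delay)
    total = sum(counts.values())
    entropy = -sum((c / total) * log(c / total) for c in counts.values())
    return entropy / log(factorial(m))

monotone = list(range(20))   # always rising: a single ordinal pattern
zigzag = [0, 1] * 10         # alternates between two ordinal patterns

print(normalized_permutation_entropy(monotone))
print(normalized_permutation_entropy(zigzag))
```

A perfectly regular signal uses one pattern and scores zero, while a signal that spreads its windows over more patterns scores higher. A signal that used all m! patterns equally often would score 1 under this normalization.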

In [None]:
import math  # math.factorial replaces the deprecated np.math.factorial

def s_entropy(freq_list):
    ''' This function computes the Shannon entropy of a given frequency distribution.
    USAGE: s_entropy(freq_list)
    ARGS: freq_list = Numeric vector representing the frequency distribution
    OUTPUT: A numeric value representing Shannon's entropy'''
    freq_list = [element for element in freq_list if element != 0]
    sh_entropy = 0.0
    for freq in freq_list:
        sh_entropy += freq * np.log(freq)
    sh_entropy = -sh_entropy
    return(sh_entropy)

def ordinal_patterns(ts, embdim, embdelay):
    ''' This function computes the ordinal patterns of a time series for a given embedding dimension and embedding delay.
    USAGE: ordinal_patterns(ts, embdim, embdelay)
    ARGS: ts = Numeric vector representing the time series, embdim = embedding dimension (3<=embdim<=7 preferred range), embdelay = embedding delay
    OUTPUT: A numeric vector representing frequencies of ordinal patterns (only patterns that occur at least once)'''
    m, t = embdim, embdelay
    x = ts if isinstance(ts, np.ndarray) else np.array(ts)

    # Build the delay-embedded matrix: column i holds the series shifted by i*t
    tmp = np.zeros((x.shape[0], m))
    for i in range(m):
        tmp[:, i] = np.roll(x, i*t)
    # Drop the first rows, which contain values wrapped around by np.roll
    partition = tmp[(t*(m-1)):, :]
    permutation = np.argsort(partition)
    idx = _hash(permutation)

    counts = np.zeros(math.factorial(m))
    for i in range(counts.shape[0]):
        counts[i] = (idx == i).sum()
    return list(counts[counts != 0].astype(int))

def _hash(x):
    ''' Maps each permutation (row of x) to a unique integer index.'''
    m, n = x.shape
    if n == 1:
        return np.zeros(m)
    return np.sum(np.apply_along_axis(lambda y: y < x[:, 0], 0, x), axis=1) * math.factorial(n-1) + _hash(x[:, 1:])


def p_entropy(op):
    ''' This function computes the normalized permutation entropy.
    Note: the normalization uses the number of observed patterns, len(op).
    ARGS: ordinal pattern frequencies'''
    ordinal_pat = op
    max_entropy = np.log(len(ordinal_pat))
    p = np.divide(np.array(ordinal_pat), float(sum(ordinal_pat)))
    return(s_entropy(p)/max_entropy)

def complexity(op):
    ''' This function computes the complexity of a time series defined as: Comp_JS = Q_o * JSdivergence * pe
    Q_o = Normalizing constant
    JSdivergence = Jensen-Shannon divergence
    pe = permutation entropy
    ARGS: ordinal pattern frequencies'''
    pe = p_entropy(op)
    constant1 = (0.5+((1 - 0.5)/len(op)))* np.log(0.5+((1 - 0.5)/len(op)))
    constant2 = ((1 - 0.5)/len(op))*np.log((1 - 0.5)/len(op))*(len(op) - 1)
    constant3 = 0.5*np.log(len(op))
    Q_o = -1/(constant1+constant2+constant3)

    temp_op_prob = np.divide(op, sum(op))
    temp_op_prob2 = (0.5*temp_op_prob)+(0.5*(1/len(op)))
    JSdivergence = (s_entropy(temp_op_prob2) - 0.5 * s_entropy(temp_op_prob) - 0.5 * np.log(len(op)))
    Comp_JS = Q_o * JSdivergence * pe
    return(Comp_JS)

def weighted_ordinal_patterns(ts, embdim, embdelay):
    ''' Same as ordinal_patterns, but each window is weighted by its variance.
    (Defined for completeness; not used in the analysis below.)'''
    m, t = embdim, embdelay
    x = ts if isinstance(ts, np.ndarray) else np.array(ts)

    tmp = np.zeros((x.shape[0], m))
    for i in range(m):
        tmp[:, i] = np.roll(x, i*t)
    partition = tmp[(t*(m-1)):, :]
    xm = np.mean(partition, axis=1)
    weight = np.mean((partition - xm[:, None])**2, axis=1)
    permutation = np.argsort(partition)
    idx = _hash(permutation)
    counts = np.zeros(math.factorial(m))
    for i in range(counts.shape[0]):
        counts[i] = sum(weight[i == idx])

    return list(counts[counts != 0])

results = []

for pid, df in filtered_dict.items():
    unique_tac_values = df['TAC'].unique()

    for tac_value in unique_tac_values:
        segment = df[df['TAC'] == tac_value]

        for axis in ['x', 'y', 'z']:
            # Use ordinal_patterns to get the patterns
            op = ordinal_patterns(segment[axis].values, embdim=3, embdelay=1)

            # Calculate permutation entropy using the new logic
            pe = p_entropy(op)

            # Calculate complexity using the new logic
            comp = complexity(op)

            # Append the results
            results.append({
                'pid': pid,
                f'{axis}_entropy': pe,
                f'{axis}_complexity': comp,
                'TAC': tac_value
            })

result_df = pd.DataFrame(results)
result_df = result_df[['pid', 'x_entropy', 'x_complexity', 'y_entropy', 'y_complexity', 'z_entropy', 'z_complexity', 'TAC']]




In [None]:
result_df
# this has 486 rows since each row only has one axis value for each pid and TAC value

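| | pid | x_entropy | x_complexity | y_entropy | y_complexity | z_entropy | z_complexity | TAC |
|---|---|---|---|---|---|---|---|---|
| 0 | SA0297 | 0.937429 | 0.062262 | NaN | NaN | NaN | NaN | 0.032672 |
| 1 | SA0297 | NaN | NaN | 0.939361 | 0.055267 | NaN | NaN | 0.032672 |
| 2 | SA0297 | NaN | NaN | NaN | NaN | 0.929069 | 0.061885 | 0.032672 |
| 3 | SA0297 | 0.957442 | 0.040166 | NaN | NaN | NaN | NaN | 0.182644 |
| 4 | SA0297 | NaN | NaN | 0.939857 | 0.061235 | NaN | NaN | 0.182644 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 481 | DC6359 | NaN | NaN | 0.989990 | 0.009645 | NaN | NaN | 0.050705 |
| 482 | DC6359 | NaN | NaN | NaN | NaN | 0.962532 | 0.036533 | 0.050705 |
| 483 | DC6359 | 0.908885 | 0.083686 | NaN | NaN | NaN | NaN | 0.068986 |
| 484 | DC6359 | NaN | NaN | 0.890672 | 0.088809 | NaN | NaN | 0.068986 |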


In [None]:
result_df.fillna(0,inplace=True)
# fill the NaN values with zeros.

In [None]:
# Group the DataFrame by 'pid' and 'TAC', summing entropy and complexity for each axis.
result_df = result_df.groupby(['pid', 'TAC']).agg({
    'x_entropy': 'sum',
    'x_complexity': 'sum',
    'y_entropy': 'sum',
    'y_complexity': 'sum',
    'z_entropy': 'sum',
    'z_complexity': 'sum'
}).reset_index() # Reset index to keep DataFrame format
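
The fill-then-sum step above works because each (pid, TAC) group contains exactly one non-missing value per metric column, so summing simply collects the three per-axis rows into one. The toy example below sketches this with made-up numbers and hypothetical names (`long_df`, `wide_df`), using only the entropy columns for brevity.

```python
import pandas as pd

# Hypothetical long-format rows: one row per axis, as a per-axis loop would produce.
rows = [
    {"pid": "P1", "x_entropy": 0.9, "TAC": 0.03},
    {"pid": "P1", "y_entropy": 0.8, "TAC": 0.03},
    {"pid": "P1", "z_entropy": 0.7, "TAC": 0.03},
    {"pid": "P1", "x_entropy": 0.6, "TAC": 0.10},
    {"pid": "P1", "y_entropy": 0.5, "TAC": 0.10},
    {"pid": "P1", "z_entropy": 0.4, "TAC": 0.10},
]
long_df = pd.DataFrame(rows)  # axis columns missing from a row become NaN

# Replace NaN with 0, then sum within each (pid, TAC) group:
# value + 0 + 0 leaves the single real value per column.
wide_df = (
    long_df.fillna(0)
    .groupby(["pid", "TAC"], as_index=False)
    .sum()
)
print(wide_df)
```

One caveat of this approach is that a metric that is genuinely zero becomes indistinguishable from a missing one after `fillna(0)`. Building one row per (pid, TAC) directly in the loop would avoid the intermediate NaN values, at the cost of restructuring the results cell.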


In [None]:
result_df
# 486/3 = 162 rows: one row per pid and TAC value, holding the entropy and complexity of all three axes

| | pid | TAC | x_entropy | x_complexity | y_entropy | y_complexity | z_entropy | z_complexity |
|---|---|---|---|---|---|---|---|---|
| 0 | BK7610 | 0.041689 | 0.771728 | 0.176715 | 0.701939 | 0.212403 | 0.768767 | 0.180278 |
| 1 | BK7610 | 0.046559 | 0.967383 | 0.031972 | 0.977528 | 0.021814 | 0.960643 | 0.037884 |
| 2 | BK7610 | 0.050424 | 0.912898 | 0.080056 | 0.910012 | 0.077904 | 0.932597 | 0.061309 |
| 3 | BK7610 | 0.065357 | 0.900914 | 0.086503 | 0.875105 | 0.108894 | 0.961033 | 0.037932 |
| 4 | BK7610 | 0.065953 | 0.892937 | 0.094573 | 0.950350 | 0.046908 | 0.986392 | 0.013538 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 157 | SF3079 | 0.107148 | 0.905814 | 0.082710 | 0.982092 | 0.017516 | 0.952369 | 0.047392 |
| 158 | SF3079 | 0.130305 | 0.965278 | 0.033813 | 0.976511 | 0.022224 | 0.959477 | 0.037484 |
| 159 | SF3079 | 0.135561 | 0.901746 | 0.089102 | 0.911236 | 0.080826 | 0.904784 | 0.086394 |
| 160 | SF3079 | 0.159026 | 0.798506 | 0.162395 | 0.802287 | 0.156495 | 0.881665 | 0.100869 |


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact, Dropdown
from IPython.display import clear_output

# Get unique PIDs from the aggregated results
unique_pids = result_df['pid'].unique()

def plot_entropy_complexity_for_pid(pid):
    # Clear previous output
    clear_output(wait=True)

    # Filter data for the selected PID
    pid_data = result_df[result_df['pid'] == pid]

    # Plotting each metric (x, y, z) vs TAC separately
    fig, axes = plt.subplots(3, 2, figsize=(15, 20))

    # Plot for X Entropy vs TAC
    axes[0, 0].plot(pid_data['TAC'], pid_data['x_entropy'], marker='o', linestyle='-', color='blue')
    axes[0, 0].set_title('X Entropy vs TAC')
    axes[0, 0].set_xlabel('TAC')
    axes[0, 0].set_ylabel('X Entropy')
    axes[0, 0].grid(True)

    # Plot for Y Entropy vs TAC
    axes[1, 0].plot(pid_data['TAC'], pid_data['y_entropy'], marker='o', linestyle='-', color='green')
    axes[1, 0].set_title('Y Entropy vs TAC')
    axes[1, 0].set_xlabel('TAC')
    axes[1, 0].set_ylabel('Y Entropy')
    axes[1, 0].grid(True)

    # Plot for Z Entropy vs TAC
    axes[2, 0].plot(pid_data['TAC'], pid_data['z_entropy'], marker='o', linestyle='-', color='red')
    axes[2, 0].set_title('Z Entropy vs TAC')
    axes[2, 0].set_xlabel('TAC')
    axes[2, 0].set_ylabel('Z Entropy')
    axes[2, 0].grid(True)

    # Plot for X Complexity vs TAC
    axes[0, 1].plot(pid_data['TAC'], pid_data['x_complexity'], marker='o', linestyle='-', color='blue')
    axes[0, 1].set_title('X Complexity vs TAC')
    axes[0, 1].set_xlabel('TAC')
    axes[0, 1].set_ylabel('X Complexity')
    axes[0, 1].grid(True)

    # Plot for Y Complexity vs TAC
    axes[1, 1].plot(pid_data['TAC'], pid_data['y_complexity'], marker='o', linestyle='-', color='green')
    axes[1, 1].set_title('Y Complexity vs TAC')
    axes[1, 1].set_xlabel('TAC')
    axes[1, 1].set_ylabel('Y Complexity')
    axes[1, 1].grid(True)

    # Plot for Z Complexity vs TAC
    axes[2, 1].plot(pid_data['TAC'], pid_data['z_complexity'], marker='o', linestyle='-', color='red')
    axes[2, 1].set_title('Z Complexity vs TAC')
    axes[2, 1].set_xlabel('TAC')
    axes[2, 1].set_ylabel('Z Complexity')
    axes[2, 1].grid(True)

    plt.tight_layout()
    plt.show()

# Create interactive dropdown widget for selecting PID
interact(plot_entropy_complexity_for_pid, pid=Dropdown(options=unique_pids, description='Select PID:'))



In [None]:
import ipywidgets as widgets


# result_df is the aggregated dataframe computed above

def plot_data(pid):
    # Filter the data for the selected PID
    result = result_df if pid == 'All' else result_df[result_df['pid'] == pid]

    # Use TAC values for point sizes (using an arbitrary scale for visualization)
    point_sizes = (result['TAC'] / result['TAC'].max()) * 1000

    # Determine point colors based on TAC values
    colors = np.where(result['TAC'] >= 0.08, 'red', 'green')

    # Plotting
    plt.figure(figsize=(10, 6))
    plt.scatter(result['x_entropy'], result['y_complexity'], s=point_sizes, c=colors, alpha=0.6)
    plt.title(f'Scatter Plot for PID: {pid}')
    plt.xlabel('X_Entropy')
    plt.ylabel('Y_Complexity')
    plt.grid(True)
    plt.show()

# Add an "All" option to view all PIDs
unique_pids_with_all = np.append(['All'], result_df['pid'].unique())

# Create a dropdown widget for selecting PIDs, including an "All" option
pid_dropdown = widgets.Dropdown(options=unique_pids_with_all, description='PID:', value='All')

# Display the widget and bind the interactive plot function
widgets.interactive(plot_data, pid=pid_dropdown)


### Key Insights
**Variability Across Axes**
- Entropy and complexity differ across the x, y, and z axes at identical TAC levels, suggesting that alcohol's impact on motion is direction-dependent.

**Relationship Between TAC Levels and Motion Complexity**
- Observations show that with increasing TAC levels, movement complexity can either increase or decrease, suggesting alcohol's differential impact on movement predictability.

**Insight into the Dynamics of Movement**
- Significant differences in entropy and complexity at similar TAC levels across axes highlight alcohol's multi-dimensional effect on movement.

**Potential for Predictive Modeling**
- The metrics offer a dataset for developing models to predict TAC levels from movement complexity and entropy, useful for real-time monitoring and interventions.

**Implications for Intervention Strategies**
- Analyzing movement patterns related to heavy drinking can inform proactive intervention strategies to mitigate alcohol-related harm.

### Conclusion

In this project, I explored how alcohol consumption affects human movement by analyzing accelerometer data with permutation entropy and complexity measures. The analysis showed that these measures vary with alcohol levels and across axes, which shows promise for developing models that predict alcohol consumption from movement. This work points toward using smartphone data to monitor and address excessive drinking, and illustrates the value of time series complexity analysis in health and behavior studies.

