# Full Data Visualization of the Prakriti Dataset

**Aim:** To replicate all visualization types from the provided PDF using the `Prakriti_With_Features.csv` dataset.

This notebook implements every plot type from the reference material. Because the Prakriti dataset is primarily categorical, we will create **synthetic numerical scores** to generate plots that require numerical data, such as histograms, scatter plots, and heatmaps. The assumptions made during this conversion are explained in the relevant sections.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import LabelEncoder

# Suppress future warnings for cleaner output
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

print("Libraries imported successfully!")

In [None]:
# Load data from the provided CSV file.
try:
    df = pd.read_csv('Prakriti_With_Features.csv')
    print("Dataset 'Prakriti_With_Features.csv' loaded successfully.")
except FileNotFoundError:
    print("Error: 'Prakriti_With_Features.csv' not found. Please ensure the file is in the correct directory.")
    df = None

# Set the default style for all plots
if df is not None:
    sns.set(style="whitegrid")

    # --- Data Exploration ---
    print("\n--- Dataset Information ---")
    print("\nHead of the Dataset:")
    display(df.head())
    print(f"\nShape of the Dataset: {df.shape}")

### Creating Synthetic Numerical Data for Visualization

To create plots like histograms, scatter plots, and heatmaps, we need numerical data. We will engineer two synthetic scores: a **`Body Metric Score`** and a **`Lifestyle Score`**.

**1. Body Metric Score:**
- **Body Size**: `Slim`=1, `Medium`=2, `Large`=3
- **Body Weight**: `Low`=1, `Moderate`=2, `Heavy`=3 (stored in the dataset as longer labels such as `Low - difficulties in gaining weight`)
- **Height**: `Short`=1, `Average`=2, `Tall`=3

**2. Lifestyle Score:**
- **Physical Activity Level**: `sedentary`=1, `moderate`=2, `high`=3
- **Water Intake**: `low`=1, `moderate`=2, `high`=3
- **Sleep Patterns**: `short`=1, `moderate`=2, `long`=3
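
The scoring scheme above can be sketched on a few made-up rows. This is only an illustration under assumptions: the `toy` frame and the `"unknown"` value are hypothetical, and unlike the notebook cell (which fills unmapped values with 0) this sketch leaves them as `NaN` so that they stay visible instead of silently lowering the score.

```python
import pandas as pd

# Hypothetical toy rows; the real values come from the CSV.
toy = pd.DataFrame({
    "Physical Activity Level": ["sedentary", "high", "moderate"],
    "Water Intake": ["low", "high", "unknown"],  # "unknown" is deliberately unmapped
    "Sleep Patterns": ["short", "long", "moderate"],
})

# Ordinal maps, copied from the Lifestyle Score definition above.
maps = {
    "Physical Activity Level": {"sedentary": 1, "moderate": 2, "high": 3},
    "Water Intake": {"low": 1, "moderate": 2, "high": 3},
    "Sleep Patterns": {"short": 1, "moderate": 2, "long": 3},
}

# Map each column; categories missing from a map become NaN.
scores = pd.DataFrame({col: toy[col].map(m) for col, m in maps.items()})

# Flag rows where at least one component could not be mapped.
unmapped_rows = scores.isna().any(axis=1)

# Sum the three components; min_count forces NaN unless all three are present.
toy_score = scores.sum(axis=1, min_count=len(maps))

print(toy_score)
print(unmapped_rows)
```

With three components that each range from 1 to 3, a fully mapped row scores between 3 and 9.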

We will also use **Label Encoding** to convert the categorical `Dosha` column into integer codes so that it can be used in the numerical plots later on. These codes are arbitrary identifiers, not an ordinal scale.
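
As a rough illustration of what label encoding produces, the sketch below uses a pandas-only stand-in rather than `LabelEncoder` itself. `LabelEncoder` numbers classes in sorted order, and pandas category codes should yield the same numbering for plain string labels. The labels here are hypothetical placeholders, not values from the dataset.

```python
import pandas as pd

# Hypothetical labels, chosen only to show the ordering behaviour.
labels = pd.Series(["gamma", "alpha", "beta", "alpha"])

# Category codes follow the sorted order of the distinct labels.
as_category = labels.astype("category")
codes = as_category.cat.codes

# Lookup table from integer code back to the original label.
mapping = dict(enumerate(as_category.cat.categories))

print(codes.tolist())
print(mapping)
```

The numbering reflects alphabetical order only, so the distance between two codes carries no meaning.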

In [None]:
if df is not None:
    # --- Create Mappings ---
    body_size_map = {'Slim': 1, 'Medium': 2, 'Large': 3}
    # Simplified mapping for body weight
    weight_map = {
        'Low - difficulties in gaining weight': 1,
        'Moderate - no difficulties in gaining or losing weight': 2,
        'Heavy - difficulties in losing weight': 3
    }
    height_map = {'Short': 1, 'Average': 2, 'Tall': 3}
    activity_map = {'sedentary': 1, 'moderate': 2, 'high': 3}
    water_map = {'low': 1, 'moderate': 2, 'high': 3}
    sleep_map = {'short': 1, 'moderate': 2, 'long': 3}

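    # --- Apply Mappings ---
    # Note: categories missing from a map (or NaN) become 0, which pulls the
    # composite score below its nominal 3-9 range.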
    df['body_size_score'] = df['Body Size'].map(body_size_map).fillna(0)
    df['weight_score'] = df['Body Weight'].map(weight_map).fillna(0)
    df['height_score'] = df['Height'].map(height_map).fillna(0)
    df['activity_score'] = df['Physical Activity Level'].map(activity_map).fillna(0)
    df['water_score'] = df['Water Intake'].map(water_map).fillna(0)
    df['sleep_score'] = df['Sleep Patterns'].map(sleep_map).fillna(0)

    # --- Create Composite Scores ---
    df['Body Metric Score'] = df['body_size_score'] + df['weight_score'] + df['height_score']
    df['Lifestyle Score'] = df['activity_score'] + df['water_score'] + df['sleep_score']

    # --- Create Label Encoded Columns for Heatmap/Pairplot ---
    le = LabelEncoder()
    df['dosha_encoded'] = le.fit_transform(df['Dosha'])

    print("Numerical scores and encoded columns created successfully.")
    display(df[['Dosha', 'Body Metric Score', 'Lifestyle Score', 'dosha_encoded']].head())

## Visualization 1 & 2: Line Plots

**Note:** Line plots are best for showing trends over a continuous, ordered variable such as time. This dataset has no such variable, so a line plot over the row index can be misleading, and a bar chart would be more appropriate. To cover every plot type in the reference material, we still generate line plots over the dataset's index. **They do not represent a meaningful trend.**

In [None]:
# --- 1. Line Plot of Body Metric Score ---
# This plot is for demonstration only and does not show a meaningful trend.
print("Generating Line Plot (Demonstration)...")
plt.figure(figsize=(12, 6))
# Smooth the series with an arbitrary 50-row rolling mean (the first 49 values are NaN)
df_grouped_mean = df['Body Metric Score'].rolling(window=50).mean()
plt.plot(df_grouped_mean)
plt.title('Line Plot: Rolling Average of Body Metric Score (Demonstration Only)', fontsize=16)
plt.xlabel('Index (Not Time)', fontsize=12)
plt.ylabel('Average Body Metric Score', fontsize=12)
plt.grid(True, linestyle='--', alpha=0.6)
plt.tight_layout()
plt.show()

# --- 2. Multi-line Plot by Dosha Type ---
# Again, this is for demonstration. It shows the rolling average score for each Dosha type.
print("\nGenerating Multi-line Plot (Demonstration)...")
plt.figure(figsize=(12, 6))
for dosha_type in df['Dosha'].unique():
    subset = df[df['Dosha'] == dosha_type]
    rolling_mean = subset['Body Metric Score'].rolling(window=20, min_periods=1).mean()
    plt.plot(rolling_mean.index, rolling_mean, label=dosha_type)

plt.title('Multi-line Plot: Rolling Average of Body Metric Score by Dosha (Demonstration Only)', fontsize=16)
plt.xlabel('Index (Not Time)', fontsize=12)
plt.ylabel('Average Body Metric Score', fontsize=12)
plt.legend(title='Dosha Type')
plt.grid(True, linestyle='--', alpha=0.6)
plt.tight_layout()
plt.show()
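
The two line-plot cells above smooth the score with a rolling mean, once with the default `min_periods` and once with `min_periods=1`. A minimal sketch of the difference, on made-up numbers:

```python
import pandas as pd

# Made-up scores, only to show how the window fills up.
scores = pd.Series([3, 6, 9, 6])

# Default: a value is produced only once the window holds 2 observations.
strict = scores.rolling(window=2).mean()

# min_periods=1: partial windows at the start are averaged as well.
lenient = scores.rolling(window=2, min_periods=1).mean()

print(strict.tolist())
print(lenient.tolist())
```

This is why the first line plot starts only after the first full window, while the per-Dosha lines start at the first row of each group.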

## Visualization 3, 4, 8, 9: Bar, Pie, and Donut Plots

These plots are well-suited for categorical data and do not require numerical conversion. They help visualize counts and proportions.
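
Every chart in this section is built from the same two quantities: counts per category and the share each category holds. A small sketch on made-up labels (not the dataset's actual distribution):

```python
import pandas as pd

# Made-up category labels for illustration.
sizes = pd.Series(["Medium", "Slim", "Medium", "Large", "Medium", "Slim"])

# Counts per category, sorted from most to least frequent (used by bar plots).
counts = sizes.value_counts()

# Percentage share per category (what a pie chart's autopct label displays).
shares = sizes.value_counts(normalize=True) * 100

# An explode list sized to the data, offsetting only the largest wedge.
explode = [0.05 if i == 0 else 0 for i in range(len(counts))]

print(counts)
print(shares.round(1))
```

Sizing `explode` from `len(counts)` keeps a pie call valid however many categories the column contains.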

In [None]:
# --- 3. Bar Plot ---
print("Generating Bar Plot for Body Size Counts...")
plt.figure(figsize=(10, 6))
body_size_counts = df['Body Size'].value_counts()
colors = ['skyblue', 'salmon', 'lightgreen']
plt.bar(body_size_counts.index, body_size_counts.values, width=0.6, color=colors)
plt.title('Bar Plot: Body Size Counts', fontsize=16)
plt.xlabel('Body Size', fontsize=12)
plt.ylabel('Count', fontsize=12)
plt.tight_layout()
plt.show()



In [None]:
# --- 4. Horizontal Bar Plot ---
print("\nGenerating Horizontal Bar Plot for Bone Structure Counts...")
plt.figure(figsize=(10, 8))
bone_structure_counts = df['Bone Structure'].value_counts()
bone_structure_counts.sort_values().plot.barh(color='teal')
plt.title('Horizontal Bar Plot: Bone Structure by Count', fontsize=16)
plt.xlabel('Number of Individuals', fontsize=12)
plt.ylabel('Bone Structure Type', fontsize=12)
plt.tight_layout()
plt.show()



In [None]:
# --- 8. Pie Chart ---
print("\nGenerating Pie Chart for Skin Sensitivity...")
plt.figure(figsize=(8, 8))
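skin_sensitivity_counts = df['Skin Sensitivity'].value_counts()
# Build explode from the actual number of categories so the lengths always match
explode = [0.05 if i == 0 else 0 for i in range(len(skin_sensitivity_counts))]
skin_sensitivity_counts.plot.pie(autopct='%1.1f%%', startangle=90, colors=['lightcoral', 'skyblue', 'lightgreen'], explode=explode)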
plt.title('Pie Chart: Proportion of Skin Sensitivity Types', fontsize=16)
plt.ylabel('')
plt.tight_layout()
plt.show()


In [None]:

# --- 9. Donut Plot ---
print("\nGenerating Donut Plot for Dietary Habits...")
plt.figure(figsize=(8, 8))
diet_counts = df['Dietary Habits'].value_counts()
colors = ['#ff9999','#66b3ff','#99ff99','#ffcc99']
explode = [0.05 if i == 0 else 0 for i in range(len(diet_counts))]
plt.pie(diet_counts, labels=diet_counts.index, autopct='%1.1f%%', startangle=90, wedgeprops=dict(width=0.4, edgecolor='w'), colors=colors, explode=explode)
plt.title('Donut Plot: Proportion of Dietary Habits', fontsize=16)
plt.tight_layout()
plt.show()

## Visualization 5 & 6: Grouped and Stacked Bar Plots

These plots are used to visualize the relationship between two categorical variables.
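
Both plot types rest on a contingency table of the two variables. The sketch below uses hypothetical `group` and `habit` columns. It shows that `pd.crosstab` gives the same table as the `groupby(...).size().unstack()` pattern, and adds row shares as one optional normalization.

```python
import pandas as pd

# Hypothetical two-column frame standing in for Dosha and Dietary Habits.
toy = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "habit": ["x", "y", "x", "x", "x"],
})

# Contingency table via groupby; missing combinations are filled with 0.
by_groupby = toy.groupby(["group", "habit"]).size().unstack(fill_value=0)

# The same table via crosstab.
by_crosstab = pd.crosstab(toy["group"], toy["habit"])

# Row-wise shares: each row sums to 1, which makes unequal groups comparable.
row_shares = by_crosstab.div(by_crosstab.sum(axis=1), axis=0)

print(by_crosstab)
print(row_shares)
```

Plotting `row_shares` instead of raw counts is a reasonable option when the groups differ a lot in size.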

In [None]:
# --- 5. Grouped Bar Plot ---
print("Generating Grouped Bar Plot for Dosha by Dietary Habits...")
plt.figure(figsize=(12, 7))
sns.countplot(data=df, x='Dosha', hue='Dietary Habits', order=df['Dosha'].value_counts().index)
plt.title('Grouped Bar Plot: Dosha by Dietary Habits', fontsize=16)
plt.xlabel('Dosha Type', fontsize=12)
plt.ylabel('Number of Individuals', fontsize=12)
plt.legend(title='Dietary Habits')
plt.tight_layout()
plt.show()

In [None]:

# --- 6. Stacked Bar Plot ---
print("\nGenerating Stacked Bar Plot for Dosha and Dietary Habits...")
dosha_diet_counts = df.groupby(['Dosha', 'Dietary Habits']).size().unstack(fill_value=0)
dosha_diet_counts.plot(kind='bar', stacked=True, figsize=(12, 7), colormap='viridis')
plt.title('Stacked Bar Plot: Dietary Habits within Each Dosha Type', fontsize=16)
plt.xlabel('Dosha Type', fontsize=12)
plt.ylabel('Number of Individuals', fontsize=12)
plt.xticks(rotation=45)
plt.legend(title='Dietary Habits')
plt.tight_layout()
plt.show()

## Visualization 5, 6, 8, 9: Plots for Numerical Distributions

Now that we have created the synthetic `Lifestyle Score` and `Body Metric Score`, we can use plots designed for numerical data distributions.

- **Histogram**: Shows the frequency distribution of a single numerical variable.
- **Density Plot**: A smoothed version of a histogram.
- **Box Plot**: Shows the median, quartiles, and outliers of a numerical variable across different categories.
- **Violin Plot**: Combines a box plot with a density plot.
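
To make the box plot's ingredients concrete, the sketch below computes the quartiles and the 1.5 × IQR fences by hand on made-up scores. It assumes the default whisker rule used by matplotlib and seaborn (`whis=1.5`).

```python
import numpy as np

# Made-up scores within the 3-9 range of the synthetic metrics.
scores = np.array([3, 6, 6, 6, 6, 7, 7, 7, 9])

# Quartiles: the box spans q1 to q3, with a line at the median.
q1, median, q3 = np.percentile(scores, [25, 50, 75])
iqr = q3 - q1

# Points beyond these fences are drawn as individual outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = scores[(scores < lower_fence) | (scores > upper_fence)]

print(q1, median, q3, iqr)
print(lower_fence, upper_fence, outliers)
```

Because the synthetic scores take only a handful of integer values, quartiles often coincide and the boxes can look collapsed.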

In [None]:
# --- Histogram of Lifestyle Score ---
print("Generating Histogram for the synthetic 'Lifestyle Score'...")
plt.figure(figsize=(10, 6))
# The score only takes integer values, so use one bar per value instead of 15 arbitrary bins
sns.histplot(df['Lifestyle Score'], discrete=True, kde=True, color='purple')
plt.title('Histogram: Distribution of Synthetic Lifestyle Score', fontsize=16)
plt.xlabel('Lifestyle Score (3-9)', fontsize=12)
plt.ylabel('Frequency (Number of Individuals)', fontsize=12)
plt.tight_layout()
plt.show()


In [None]:

# --- Density Plot of Lifestyle Score by Dosha ---
print("\nGenerating Density Plot of Lifestyle Score by Dosha...")
plt.figure(figsize=(12, 7))
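ax = sns.kdeplot(data=df, x='Lifestyle Score', hue='Dosha', fill=True, common_norm=False)
plt.title('Density Plot: Lifestyle Score Distribution by Dosha Type', fontsize=16)
plt.xlabel('Lifestyle Score', fontsize=12)
plt.ylabel('Density', fontsize=12)
# seaborn already draws the hue legend; calling plt.legend() here can replace it
# with an empty one, so only retitle the existing legend
legend = ax.get_legend()
if legend is not None:
    legend.set_title('Dosha Type')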
plt.tight_layout()
plt.show()


In [None]:

# --- Box Plot of Body Metric Score by Dosha ---
print("\nGenerating Box Plot of Body Metric Score by Dosha...")
plt.figure(figsize=(12, 7))
sns.boxplot(data=df, x='Dosha', y='Body Metric Score', palette='Set2', order=df['Dosha'].value_counts().index)
plt.title('Box Plot: Body Metric Score by Dosha Type', fontsize=16)
plt.xlabel('Dosha Type', fontsize=12)
plt.ylabel('Body Metric Score (3-9)', fontsize=12)
plt.tight_layout()
plt.show()


In [None]:

# --- Violin Plot of Lifestyle Score by Dosha ---
print("\nGenerating Violin Plot of Lifestyle Score by Dosha...")
plt.figure(figsize=(12, 7))
sns.violinplot(data=df, x='Dosha', y='Lifestyle Score', palette='Set2', inner='quartile', order=df['Dosha'].value_counts().index)
plt.title('Violin Plot: Lifestyle Score Distribution by Dosha Type', fontsize=16)
plt.xlabel('Dosha Type', fontsize=12)
plt.ylabel('Lifestyle Score (3-9)', fontsize=12)
plt.tight_layout()
plt.show()

## Visualization 6, 7, 10: Plots for Relationships and Correlations

These plots help visualize the relationships between two or more variables.

- **Scatter Plot**: Shows the relationship between two numerical variables.
- **Heatmap**: Visualizes a correlation matrix, showing how numerical variables relate to each other.
- **Pair Plot**: Creates a grid of scatter plots for multiple numerical variables.

**Reminder:** These plots are based on our **synthetically created numerical scores**.
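
One caveat applies when a label-encoded column enters a correlation matrix: the coefficient depends on how the codes happen to be assigned. A minimal sketch with hypothetical values, in which the same three categories are numbered in two different ways:

```python
import pandas as pd

# Hypothetical ordinal score for three rows, one per category.
score = pd.Series([1, 2, 3])

# Two equally valid integer codings of the same three nominal categories.
codes_sorted = pd.Series([0, 1, 2])
codes_shuffled = pd.Series([0, 2, 1])

# Pearson correlation changes with the coding, although the data did not.
r_sorted = score.corr(codes_sorted)
r_shuffled = score.corr(codes_shuffled)

print(round(r_sorted, 2), round(r_shuffled, 2))
```

Here the coefficient drops from about 1.0 to 0.5 purely because of the renumbering, so heatmap cells involving an encoded nominal column are best read as illustrative.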

In [None]:
# --- Scatter Plot: Body Metric Score vs. Lifestyle Score ---
print("Generating Scatter Plot of Body Metric Score vs. Lifestyle Score...")
plt.figure(figsize=(10, 7))
sns.scatterplot(data=df, x='Body Metric Score', y='Lifestyle Score', hue='Dosha', alpha=0.6, palette='viridis')
plt.title('Scatter Plot: Body Metric Score vs. Lifestyle Score by Dosha', fontsize=16)
plt.xlabel('Body Metric Score (Higher = Larger Body)', fontsize=12)
plt.ylabel('Lifestyle Score (Higher = More Active/Healthy Habits)', fontsize=12)
plt.legend(title='Dosha')
plt.tight_layout()
plt.show()

In [None]:
# --- Heatmap: Correlation Matrix of Encoded Features ---
print("\nGenerating Heatmap of Correlations...")
# Select the numerical and encoded columns for the heatmap
# (dosha_encoded is a nominal code, so its correlations depend on the arbitrary code order)
heatmap_cols = ['body_size_score', 'weight_score', 'height_score', 'activity_score', 'water_score', 'sleep_score', 'dosha_encoded']
corr = df[heatmap_cols].corr()

plt.figure(figsize=(10, 8))
sns.heatmap(corr, cmap='coolwarm', annot=True, fmt=".2f", linewidths=.5, cbar_kws={'label': 'Correlation Coefficient'})
plt.title('Heatmap: Correlation Matrix of Encoded Metrics', fontsize=16)
plt.xticks(rotation=45, ha='right')
plt.yticks(rotation=0)
plt.tight_layout()
plt.show()

In [None]:
# --- Pair Plot: Key Synthetic Metrics ---
print("\nGenerating Pair Plot...")
# Select a subset of columns for the pairplot to keep it readable.
# dosha_encoded is left out: it is constant within each hue group, so its
# panels would carry no information and its KDE would be degenerate.
pairplot_cols = ['Body Metric Score', 'Lifestyle Score']
# Keep the original Dosha labels alongside the scores for the hue legend
pairplot_df = df[pairplot_cols + ['Dosha']].copy()

g = sns.pairplot(pairplot_df, hue='Dosha', palette='viridis', diag_kind='kde')
g.fig.suptitle('Pair Plots of Key Synthetic Metrics', y=1.02, fontsize=16)
plt.tight_layout()
plt.show()