Perform the following operations on the Hepatitis dataset using Python.  
a. Create data subsets for each sex.  
b. Merge the two subsets.  
c. Sort the data by age, SGOT, and PROTIME.  
d. Transpose the data.  
e. Melt the data to long format.  
f. Cast the data to wide format.
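
Tasks (e) and (f) are inverses of each other only when the identifier columns single out each row. The sketch below illustrates this on a tiny made-up table, not the real hepatitis file: the `toy` frame, its values, and the `row_id` column are all hypothetical names introduced for illustration. It assumes pandas 1.1 or later, where `DataFrame.pivot` accepts a list for `index`.

```python
import pandas as pd

# Made-up stand-in for the hepatitis data; two rows share the same age and sex.
toy = pd.DataFrame({
    "age": [30, 30, 50],
    "sex": ["male", "male", "female"],
    "bilirubin": [1.0, 0.9, 0.7],
    "sgot": [18.0, 20.0, 42.0],
})

# Turn the row label into an explicit identifier column (hypothetical name: row_id).
toy_id = toy.reset_index().rename(columns={"index": "row_id"})

# Melt to long format: one row per (row_id, measurement).
long_df = toy_id.melt(
    id_vars=["row_id", "age", "sex"],
    value_vars=["bilirubin", "sgot"],
)

# Cast back to wide format. Because row_id is unique, no aggregation is needed.
wide_df = long_df.pivot(
    index=["row_id", "age", "sex"],
    columns="variable",
    values="value",
).reset_index()
wide_df.columns.name = None  # drop the leftover 'variable' axis label

# For contrast: without row_id, rows sharing (age, sex) collapse into one.
collapsed = long_df.pivot_table(
    index=["age", "sex"],
    columns="variable",
    values="value",
    aggfunc="first",
).reset_index()

print(len(toy), len(wide_df), len(collapsed))
```

With a unique identifier the wide table has as many rows as the input, while pivoting on `age` and `sex` alone returns one row per distinct combination.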

In [4]:
import pandas as pd

# Load the dataset
df = pd.read_csv('../files/hepatitis_csv.csv')  # Replace with your actual file name

# (a) Create data subsets for different sex
male_df = df[df['sex'] == 'male']
female_df = df[df['sex'] == 'female']

print("Male Subset:\n", male_df.head())
print("\nFemale Subset:\n", female_df.head())

# (b) Merge the two subsets by stacking them row-wise (original row labels are kept)
merged_df = pd.concat([male_df, female_df], axis=0)
print("\nMerged DataFrame:\n", merged_df.head())

# (c) Sort data by 'age', then 'sgot', then 'protime' (ascending)
# na_position='last' places rows with a missing sort key after the non-missing ones
sorted_df = df.sort_values(by=['age', 'sgot', 'protime'], na_position='last')
print("\nSorted DataFrame:\n", sorted_df.head())

# (d) Transpose the data
transposed_df = df.transpose()
print("\nTransposed DataFrame:\n", transposed_df.head())

# (e) Melt data to long format: keep 'age' and 'sex' as identifiers and
# stack three lab measurements into 'variable'/'value' columns
melted_df = pd.melt(df, id_vars=['age', 'sex'], value_vars=['bilirubin', 'sgot', 'albumin'])
print("\nMelted DataFrame (long format):\n", melted_df.head())

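# (f) Cast data back to wide format (pivot)
# 'age' and 'sex' do not uniquely identify a patient, so rows sharing the same
# (age, sex) pair are aggregated and the result can have fewer rows than df.
cast_df = melted_df.pivot_table(
    index=['age', 'sex'],
    columns='variable',
    values='value',
    aggfunc='first'  # keeps the first non-missing value per group; 'mean', 'max', etc. also work
).reset_index()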

print("\nCast DataFrame (wide format):\n", cast_df.head())


Male Subset:
     age   sex steroid  antivirals fatigue malaise  ... alk_phosphate  sgot albumin protime histology class
0    30  male   False       False   False   False  ...          85.0  18.0     4.0     NaN     False  live
20   22  male    True        True    True   False  ...          48.0  20.0     4.2    64.0     False  live
24   25  male   False        True   False   False  ...          45.0  18.0     4.3    70.0     False  live
27   58  male    True       False    True   False  ...         175.0  55.0     2.7    36.0     False  live
32   41  male    True        True    True    True  ...          81.0  53.0     5.0    74.0     False  live

[5 rows x 20 columns]

Female Subset:
    age     sex steroid  antivirals fatigue malaise  ... alk_phosphate   sgot albumin protime histology class
1   50  female   False       False    True   False  ...         135.0   42.0     3.5     NaN     False  live
2   78  female    True       False    True   False  ...          96.0   32.0     4.0  