Perform the following operations in Python on the Hepatitis dataset.  
a. Create data subsets for each sex.  
b. Merge the two subsets.  
c. Sort the data by age, SGOT, and PROTIME.  
d. Transpose the data.  
e. Melt the data to long format.  
f. Cast the data to wide format.
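
Operations (e) and (f) are inverses of each other, which is easiest to see on a tiny frame. The sketch below uses made-up values and a hypothetical `patient_id` key (not a column of the Hepatitis dataset) so that each row can be identified uniquely; it is only meant to illustrate the shape changes, not to reproduce the dataset.

```python
import pandas as pd

# Tiny synthetic frame; the values are made up for illustration only.
toy = pd.DataFrame({
    'patient_id': [1, 2, 3],   # hypothetical key, not part of the dataset
    'sex': [1, 2, 1],
    'sgot': [42.0, 18.0, 60.0],
    'protime': [80.0, 75.0, 90.0],
})

# Wide -> long: one row per (patient, measurement) pair
long_toy = pd.melt(toy, id_vars=['patient_id', 'sex'],
                   value_vars=['sgot', 'protime'],
                   var_name='variable', value_name='value')

# Long -> wide: one column per measurement again
wide_toy = (long_toy
            .pivot(index=['patient_id', 'sex'], columns='variable', values='value')
            .reset_index())
wide_toy.columns.name = None  # drop the leftover 'variable' axis label

print(long_toy)
print(wide_toy)
```

Because `patient_id` makes every index entry unique, `pivot` can restore the wide layout without aggregating. When the identifier columns are not unique, `pivot_table` with an aggregation function is needed instead.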

In [3]:
import pandas as pd

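# Load the dataset (replace with the correct path).
# Note: if the file has no header row, pass header=None to read_csv so that
# the first record is not consumed as column names.
df = pd.read_csv('../files/hepatitis.csv')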

# Rename columns for clarity based on the dataset attributes
df.columns = ['class', 'age', 'sex', 'steroid', 'antivirals', 'fatigue', 'malaise', 'anorexia', 'liver_big',
              'liver_firm', 'spleen_palpable', 'spiders', 'ascites', 'varices', 'bilirubin', 'alk_phosphate', 
              'sgot', 'albumin', 'protime', 'histology']

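# 1. Handle Missing Values
# Replace the '?' placeholder with a missing-value marker
df = df.replace('?', pd.NA)

# Convert every column to a numeric type; entries that cannot be parsed become NaN.
# (errors='ignore' is deprecated in recent pandas versions and raises a FutureWarning.)
df = df.apply(pd.to_numeric, errors='coerce')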

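# 2. Create Data Subsets Based on Sex
# The sex column is numerically coded, so comparing against 'male'/'female'
# returns empty subsets. The mapping 1 = male, 2 = female is assumed here;
# verify it against the attribute description of your copy of the dataset.
male_data = df[df['sex'] == 1]
female_data = df[df['sex'] == 2]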

# 3. Merge Two Subsets (stack the male and female subsets row-wise)
merged_data = pd.concat([male_data, female_data], axis=0)

# 4. Sort Data by Age, SGOT, and PROTIME
sorted_data = df.sort_values(by=['age', 'sgot', 'protime'], ascending=[True, False, False])

# 5. Transpose the Data
transposed_data = df.T

# 6. Melting Data to Long Format
long_format_data = pd.melt(df, id_vars=['age', 'sex'], value_vars=['sgot', 'protime', 'bilirubin', 'alk_phosphate'],
                           var_name='variable', value_name='value')

# 7. Casting Data to Wide Format
# (age, sex) pairs are not unique, so repeated pairs are averaged by aggfunc='mean'
wide_format_data = long_format_data.pivot_table(index=['age', 'sex'], columns='variable', values='value', aggfunc='mean')

# Check the Results
print("Original Data (First 5 Rows):")
print(df.head())
print("\nMerged Data (First 5 Rows):")
print(merged_data.head())
print("\nSorted Data (First 5 Rows):")
print(sorted_data.head())
print("\nTransposed Data (First 5 Rows):")
print(transposed_data.head())
print("\nLong Format Data (First 5 Rows):")
print(long_format_data.head())
print("\nWide Format Data (First 5 Rows):")
print(wide_format_data.head())


Original Data (First 5 Rows):
   class  age  sex  steroid  antivirals  fatigue  malaise  anorexia  \
0      2   50    1      1.0           2      1.0      2.0       2.0   
1      2   78    1      2.0           2      1.0      2.0       2.0   
2      2   31    1      NaN           1      2.0      2.0       2.0   
3      2   34    1      2.0           2      2.0      2.0       2.0   
4      2   34    1      2.0           2      2.0      2.0       2.0   

   liver_big  liver_firm  spleen_palpable  spiders  ascites  varices  \
0        1.0         2.0              2.0      2.0      2.0      2.0   
1        2.0         2.0              2.0      2.0      2.0      2.0   
2        2.0         2.0              2.0      2.0      2.0      2.0   
3        2.0         2.0              2.0      2.0      2.0      2.0   
4        2.0         2.0              2.0      2.0      2.0      2.0   

   bilirubin  alk_phosphate   sgot  albumin  protime  histology  
0        0.9          135.0   42.0      3.5 


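The `'?'` placeholders can also be handled at load time, which avoids the separate replace-and-convert step. The following is a minimal sketch under stated assumptions: the file is comma-separated with no header row, and the two records in `sample` are made up for illustration. The names `sample` and `df_sample` are hypothetical.

```python
import io
import pandas as pd

# Column names as used in the cell above
columns = ['class', 'age', 'sex', 'steroid', 'antivirals', 'fatigue', 'malaise',
           'anorexia', 'liver_big', 'liver_firm', 'spleen_palpable', 'spiders',
           'ascites', 'varices', 'bilirubin', 'alk_phosphate', 'sgot', 'albumin',
           'protime', 'histology']

# Two made-up records in the assumed header-less, comma-separated layout
sample = io.StringIO(
    "1,45,1,2,2,1,1,2,2,2,2,2,2,2,1.2,90,30,4.1,?,1\n"
    "2,38,2,?,2,2,2,2,2,2,2,2,2,2,0.7,?,25,3.9,70,1\n"
)

# header=None keeps the first record as data; na_values='?' turns the
# placeholder into NaN, so columns are parsed as numeric directly.
df_sample = pd.read_csv(sample, header=None, names=columns, na_values='?')

print(df_sample.dtypes)
print(df_sample.isna().sum())
```

With a real file, the `io.StringIO` object would be replaced by the file path. Columns that contain a missing value are read as floating point, since NaN cannot be stored in an integer column by default.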