# Heart Murmur Analysis
An exploration of patient data related to heart murmurs.

## Introduction
This notebook provides a detailed analysis of a dataset containing medical evaluations of pediatric patients, particularly focusing on heart murmurs and related characteristics.

In [None]:
import pandas as pd
data = pd.read_csv('/path/to/training_data.csv')
data.head()

## Data Exploration
Summary statistics for every column, numeric and categorical, to complement the first rows displayed above.

In [None]:
data.describe(include='all')
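Clinical tables often have gaps, and `describe` reports them only indirectly through the `count` row. A minimal sketch of a per-column missing-value summary follows. The `toy` frame and the `missing_summary` helper are hypothetical stand-ins, not part of the original notebook; the column names are borrowed from it and the values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the real dataset; values are invented.
toy = pd.DataFrame({
    "Age": ["Child", "Infant", None, "Child"],
    "Height": [110.0, np.nan, 98.0, 123.0],
    "Murmur": ["Present", "Absent", "Absent", "Unknown"],
})

def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and fraction of missing values for each column."""
    counts = df.isna().sum()
    return pd.DataFrame({"missing": counts, "fraction": counts / len(df)})

summary = missing_summary(toy)
print(summary)
```

Calling `missing_summary(data)` on the real frame would show which columns need imputation or an explicit "missing" category before modeling.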

## Distribution Analysis
Exploring the distribution of key features such as age, sex, murmur presence, and outcomes.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
# Plotting distributions of age, sex, murmur presence, and outcomes
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(12, 10))
sns.countplot(data=data, x='Age', ax=axes[0, 0])
sns.countplot(data=data, x='Sex', ax=axes[0, 1])
sns.countplot(data=data, x='Murmur', ax=axes[1, 0])
sns.countplot(data=data, x='Outcome', ax=axes[1, 1])
plt.tight_layout()

## Correlation Analysis
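Investigating the relationships between height, weight, sex, and health outcome. `Sex` and `Outcome` are categorical, so they are converted to integer codes before the Pearson correlations are computed.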

In [None]:
correlation_data = data[['Height', 'Weight', 'Sex', 'Outcome']].copy()
correlation_data['Sex'] = correlation_data['Sex'].astype('category').cat.codes
correlation_data['Outcome'] = correlation_data['Outcome'].astype('category').cat.codes
correlation_data.corr()
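Integer codes impose an arbitrary order on categories, which is harmless for two-level columns but hard to read as a correlation. One alternative view, sketched here on hypothetical toy data rather than the real dataset, is a cross-tabulation of outcome proportions within each sex.

```python
import pandas as pd

# Hypothetical toy data; column names mirror the notebook, values are invented.
toy = pd.DataFrame({
    "Sex": ["Female", "Male", "Female", "Male", "Female", "Male"],
    "Outcome": ["Normal", "Abnormal", "Normal", "Normal", "Abnormal", "Abnormal"],
})

# normalize="index" turns each row into proportions that sum to 1,
# so rows can be compared even when the groups differ in size.
rates = pd.crosstab(toy["Sex"], toy["Outcome"], normalize="index")
print(rates)
```

The same call with `data["Sex"]` and `data["Outcome"]` would give the share of each outcome per sex in the real data.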

## Murmur Location Analysis
Counting health outcomes for each recorded combination of murmur locations.

In [None]:
murmur_location_outcome = data.groupby(['Murmur locations', 'Outcome']).size().unstack(fill_value=0)
murmur_location_outcome
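Grouping on the raw column treats every distinct combination of locations as its own category. If the column stores several locations in one string joined by `+` (an assumption about the file format, not something this notebook confirms), the entries can be split so that each location is counted separately. A sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data; the "+"-separated format is an assumption.
toy = pd.DataFrame({
    "Murmur locations": ["AV+MV", "PV", None, "AV+PV+TV"],
    "Outcome": ["Abnormal", "Normal", "Normal", "Abnormal"],
})

exploded = (
    toy.dropna(subset=["Murmur locations"])  # patients with no recorded location
       # A single-character pattern is treated as a literal string, not a regex.
       .assign(location=lambda df: df["Murmur locations"].str.split("+"))
       .explode("location")                  # one row per (patient, location)
)

per_location = pd.crosstab(exploded["location"], exploded["Outcome"])
print(per_location)
```

A patient with several locations contributes one row per location, so the table counts location occurrences rather than patients.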

## Murmur Characteristics Analysis
Further exploration of murmur characteristics such as pitch and quality.

In [None]:
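# Give each characteristic its own axes; two bare countplot calls would draw over each other
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(12, 5))
sns.countplot(data=data, x='Systolic murmur pitch', hue='Outcome', ax=axes[0])
sns.countplot(data=data, x='Systolic murmur quality', hue='Outcome', ax=axes[1])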

## Predictive Modeling
Training a random forest classifier to predict `Outcome` from four categorical features: age, sex, murmur presence, and most audible location.

In [None]:
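from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
# LabelEncoder only accepts a 1-D target column, so one-hot encode the features with pandas instead
feature_columns = ['Age', 'Sex', 'Murmur', 'Most audible location']
# Missing entries become an explicit 'Unknown' category rather than being dropped
encoded_features = pd.get_dummies(data[feature_columns].fillna('Unknown'))
# Hold out 30% of patients, stratified by outcome; random_state makes the run reproducible
X_train, X_test, y_train, y_test = train_test_split(
    encoded_features, data['Outcome'], test_size=0.3, random_state=42, stratify=data['Outcome'])
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_test)
# Report overall accuracy together with per-class precision and recall
print('Accuracy:', accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))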
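A single train/test split gives one accuracy figure that can shift noticeably with the random partition. A possible complement is stratified k-fold cross-validation, sketched below on synthetic data. The `toy` frame, its random labels, and the choice of five folds are illustrative assumptions, so the scores it prints say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for the real features; values are random, not clinical.
rng = np.random.default_rng(0)
n = 60
toy = pd.DataFrame({
    "Age": rng.choice(["Infant", "Child", "Adolescent"], size=n),
    "Sex": rng.choice(["Female", "Male"], size=n),
    "Murmur": rng.choice(["Present", "Absent", "Unknown"], size=n),
})
# Alternating labels guarantee both classes appear in every fold.
y = np.where(np.arange(n) % 2 == 0, "Normal", "Abnormal")

# Encoding inside the pipeline means each fold fits the encoder on its own
# training portion only, which avoids leaking information from the held-out part.
pipeline = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"),
    RandomForestClassifier(n_estimators=50, random_state=0),
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, toy, y, cv=cv, scoring="accuracy")
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```

On the real data, the spread of the fold accuracies gives a rough sense of how much a single hold-out score can be trusted.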

## Conclusion
Summary of the findings and potential next steps for further analysis or model improvement.