# Lesson 2: Exploring Relationships in Heart Attack Data

## Objective
This lesson guides us through investigating potential relationships between different variables in the heart attack dataset.

## Skills Covered
- Creating scatter plots to explore relationships between two variables
- Grouping data and calculating aggregate statistics
- Interpreting scatter plots

---

## Lesson Steps

### Step 1: Scatter Plot Between Age and Maximum Heart Rate
Examine the relationship between age ('age') and the maximum heart rate achieved during exercise ('thalachh').

Enter this code in the code cell below, after the lines that load the dataset.

```python
# Scatter plot for Age vs. Maximum Heart Rate
plt.scatter(df['age'], df['thalachh'])
plt.title('Age vs. Maximum Heart Rate Achieved')
plt.xlabel('Age')
plt.ylabel('Max Heart Rate')
plt.show()
```

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset
url = 'https://raw.githubusercontent.com/SleeplessOrphan/colabfiles/main/heart.csv'
df = pd.read_csv(url)

# Enter your code here
```


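A scatter plot shows the shape of a relationship; a correlation coefficient summarises its direction and strength in a single number. As an optional extra (not part of the original lesson), this sketch uses pandas' `Series.corr()`, which computes Pearson's r by default, on a small made-up DataFrame that reuses the lesson's column names. The values are illustrative only and are not taken from heart.csv.

```python
import pandas as pd

# Made-up values for illustration only (not from heart.csv);
# the column names match the lesson's dataset.
sample = pd.DataFrame({
    'age':      [35, 42, 48, 54, 60, 67],
    'thalachh': [182, 170, 165, 150, 142, 130],
})

# Pearson correlation (the default method) ranges from -1 to +1;
# negative values mean one variable tends to fall as the other rises
r = sample['age'].corr(sample['thalachh'])
print(f"Pearson r = {r:.2f}")
```

With the real data loaded, the same call is `df['age'].corr(df['thalachh'])`. Pearson's r only measures straight-line association, so read it alongside the scatter plot rather than instead of it.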

### Step 2: Aggregate Analysis on Cholesterol Levels by Age Group
Group the data by age, so that each distinct age value forms its own group, then calculate the average cholesterol level ('chol') for each group and plot the averages as a line.

```python
# Group the data by age and calculate mean cholesterol levels
age_groups = df.groupby('age')['chol'].mean().reset_index()

# Plotting the average cholesterol level by age
plt.plot(age_groups['age'], age_groups['chol'], marker='o')
plt.title('Average Cholesterol Levels by Age')
plt.xlabel('Age')
plt.ylabel('Average Cholesterol Level (mg/dl)')
plt.show()
```

```python
# Enter your code here
```

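Grouping by every individual age can give a jumpy line, especially for ages with only a few people. One option, assumed here rather than taken from the lesson, is to bin ages into ranges with `pd.cut` and average cholesterol within each range. The bin edges, labels and sample values below are hypothetical choices for illustration.

```python
import pandas as pd

# Made-up values for illustration only (not from heart.csv)
sample = pd.DataFrame({
    'age':  [34, 38, 45, 47, 52, 55, 58, 63, 66, 71],
    'chol': [180, 200, 210, 230, 240, 250, 260, 270, 280, 290],
})

# Hypothetical decade-wide bins: (30, 40], (40, 50], ... (right edge included)
bins = [30, 40, 50, 60, 70, 80]
labels = ['31-40', '41-50', '51-60', '61-70', '71-80']
sample['age_group'] = pd.cut(sample['age'], bins=bins, labels=labels)

# Mean cholesterol per age group; observed=True skips any empty bins
age_group_chol = sample.groupby('age_group', observed=True)['chol'].mean()
print(age_group_chol)
```

On the real data, check `df['age'].min()` and `df['age'].max()` first so that every age falls inside a bin: `pd.cut` assigns NaN to values outside the edges, and those rows are left out of the grouped means.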


### Step 3: Relationship Between Blood Pressure and Heart Rate
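Create a scatter plot to explore whether higher resting blood pressure ('trtbps') is associated with lower maximum heart rate ('thalachh'). Setting `alpha=0.5` makes the markers semi-transparent, so overlapping points show up as darker areas.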

```python
# Scatter plot for Resting Blood Pressure vs. Max Heart Rate
plt.scatter(df['trtbps'], df['thalachh'], alpha=0.5)
plt.title('Resting Blood Pressure vs. Maximum Heart Rate Achieved')
plt.xlabel('Resting Blood Pressure (mm Hg)')
plt.ylabel('Max Heart Rate')
plt.show()
```

```python
# Enter your code here
```

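To put a direction on the Step 3 question, one option (not part of the original lesson) is to fit a least-squares straight line with NumPy's `np.polyfit`: a negative slope means maximum heart rate tends to fall as resting blood pressure rises. The sample values below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only; column names mirror the lesson's dataset
sample = pd.DataFrame({
    'trtbps':   [110, 120, 125, 130, 140, 150, 160],
    'thalachh': [172, 165, 168, 160, 155, 150, 148],
})

# Degree-1 least-squares fit: thalachh is approximately slope * trtbps + intercept
slope, intercept = np.polyfit(sample['trtbps'], sample['thalachh'], deg=1)
print(f"slope = {slope:.3f} bpm per mm Hg, intercept = {intercept:.1f} bpm")
```

With the real data, pass `df['trtbps']` and `df['thalachh']` instead, and overlay the fitted line on the scatter plot with `plt.plot(df['trtbps'], slope * df['trtbps'] + intercept, color='red')` before `plt.show()`. A slope close to zero, or points scattered widely around the line, would suggest the association is weak.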


### Step 4: Grouping by Chest Pain Type and Analysing Heart Rate
Group the dataset by chest pain type ('cp') and compare the average maximum heart rate ('thalachh') for each type.

```python
# Group the data by chest pain type and calculate mean max heart rate
cp_groups = df.groupby('cp')['thalachh'].mean().reset_index()

# Bar plot for different chest pain types vs. average max heart rate
cp_groups.plot(kind='bar', x='cp', y='thalachh', legend=False)  # one series, so no legend needed
plt.title('Average Max Heart Rate by Chest Pain Type')
plt.xlabel('Chest Pain Type')
plt.ylabel('Average Max Heart Rate')
plt.xticks(rotation=0)  # keep the chest pain type labels horizontal
plt.show()
```

```python
# Enter your code here
```

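An average taken over a handful of people is less reliable than one taken over many, so it can help to report each group's size next to its mean. This sketch uses `agg(['mean', 'count'])` on made-up values; the 'cp' codes here are arbitrary and carry no clinical meaning.

```python
import pandas as pd

# Made-up values for illustration only (not from heart.csv)
sample = pd.DataFrame({
    'cp':       [0, 0, 0, 1, 1, 2, 2, 2, 3],
    'thalachh': [140, 150, 130, 165, 175, 160, 170, 150, 155],
})

# Mean max heart rate and number of people in each chest pain group
summary = sample.groupby('cp')['thalachh'].agg(['mean', 'count'])
print(summary)
```

On the real data the equivalent is `df.groupby('cp')['thalachh'].agg(['mean', 'count'])`; a group with a very small count deserves more caution when comparing its bar with the others.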


### Step 5: Wrap Up
We've used scatter plots and grouping to explore relationships between age, cholesterol levels, blood pressure, chest pain types, and heart rate. These techniques help us to start uncovering patterns that could be important for predicting or understanding heart attacks.

---

# Independent Study Tasks for Lesson 2
## Hints are shown below


#### Task 1: Investigate the 'Oldpeak' Variable
The 'oldpeak' variable represents the ST depression induced by exercise relative to rest, a risk marker for heart attacks. Examine the distribution of 'oldpeak' values and assess any potential skewness in the data.

**Instructions:**
- Create a histogram to visualise the distribution of the 'oldpeak' values.
- Calculate and interpret the skewness of the distribution.
- Reflect on what the skewness might suggest about the cardiac health of the individuals in the dataset.

```python
# Enter your code here
```



#### Task 2: Chest Pain and Exercise-Induced Angina
Explore the relationship between the type of chest pain ('cp') and the occurrence of exercise-induced angina ('exng').

**Instructions:**
- Plot a bar chart showing the count of individuals with exercise-induced angina for each chest pain category.
- Determine if there's a noticeable trend or pattern relating the type of chest pain to the likelihood of experiencing exercise-induced angina.
- Consider the implications of any patterns found for understanding the risks associated with different types of chest pain.

```python
# Enter your code here
```



#### Task 3: Fasting Blood Sugar and Cholesterol Levels
Fasting blood sugar ('fbs') and cholesterol levels ('chol') are both indicators of potential heart health issues. Analyse how they interact within this dataset.

**Instructions:**
- Compare the average cholesterol levels between two groups: individuals with fasting blood sugar above 120 mg/dl ('fbs' = 1, considered high) and those with fasting blood sugar of 120 mg/dl or below ('fbs' = 0).
- Create a box plot to show the distribution of cholesterol levels within these two groups.
- Discuss any differences observed and what they might imply about the relationship between fasting blood sugar and cholesterol levels.

```python
# Enter your code here
```



# Hints

Here are some hints for each independent study task to help guide you in your exploration without giving away the full solution:

### Hints for Task 1: Investigate the 'Oldpeak' Variable

1. Consider using the `hist()` function from Matplotlib to create the histogram; this will give you a visual representation of the distribution.
2. To calculate skewness, you may want to use the pandas `Series.skew()` method.
3. When interpreting skewness, remember that a value closer to 0 suggests a symmetrical distribution, while a positive or negative value indicates a skew to the right or left, respectively.
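To see what `skew()` reports before trying it on 'oldpeak', here is a sketch on a small hypothetical Series (made-up numbers, not the dataset's values) with a long right tail. Comparing the mean with the median is an extra check assumed here, not one the task requires.

```python
import pandas as pd

# Hypothetical values, NOT the dataset's 'oldpeak' column:
# most values are small, with a few large ones stretching the right tail
values = pd.Series([0.0, 0.0, 0.0, 0.2, 0.5, 0.8, 1.0, 1.4, 2.6, 4.0])

print(values.skew())                    # positive: longer right tail
print(values.mean(), values.median())   # mean above median often accompanies right skew
```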

### Hints for Task 2: Chest Pain and Exercise-Induced Angina

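1. To create the bar chart, first aggregate the data using `groupby` on 'cp', then count the individuals with 'exng' = 1 in each group. Because 'exng' is coded 0/1, summing it gives that count, whereas `.count()` would count every row in the group, with or without angina.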
2. When plotting, consider using the `bar()` function from Matplotlib and make sure to label your axes clearly.
3. Reflect on the clinical significance of chest pain that triggers angina during exercise and what that might mean for patients with different types of chest pain.
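The difference between counting rows and counting positive cases is easy to miss, so this sketch contrasts `count()`, `sum()` and `mean()` on a hypothetical toy DataFrame (made-up values, not heart.csv) where 'exng' is 1 for exercise-induced angina and 0 otherwise.

```python
import pandas as pd

# Hypothetical toy data (not heart.csv): 'exng' is 1 for angina, 0 for none
toy = pd.DataFrame({
    'cp':   [0, 0, 0, 1, 1, 2, 2, 2],
    'exng': [1, 1, 0, 0, 0, 1, 0, 0],
})

grouped = toy.groupby('cp')['exng']
print(grouped.count())  # rows per group, regardless of the 0/1 value
print(grouped.sum())    # individuals with exng == 1 per group
print(grouped.mean())   # proportion with exng == 1 per group
```

Proportions can be easier to compare than raw counts when the chest pain groups differ in size.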

### Hints for Task 3: Fasting Blood Sugar and Cholesterol Levels

1. Use `groupby` to segment the data into groups where 'fbs' is 0 or 1, and then calculate the mean of 'chol' for these groups.
2. The `boxplot()` function from Matplotlib can help visualise the cholesterol level distributions for the two 'fbs' groups. Remember to look for differences in the median, as well as the spread and outliers in the data.
3. Discuss the potential implications considering what high fasting blood sugar levels might indicate for a person's metabolism and how this could be related to cholesterol levels.
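Matplotlib's `boxplot()` expects one sequence of values per box, so the data needs splitting by 'fbs' first. This sketch shows one way to do that on a hypothetical toy DataFrame (made-up values, not heart.csv), along with the per-group mean and median.

```python
import pandas as pd

# Hypothetical toy data (not heart.csv)
toy = pd.DataFrame({
    'fbs':  [0, 0, 0, 0, 1, 1, 1],
    'chol': [200, 220, 240, 260, 230, 250, 300],
})

# Mean and median cholesterol for each fbs group
stats = toy.groupby('fbs')['chol'].agg(['mean', 'median'])
print(stats)

# One Series of cholesterol values per group, in the order fbs = 0, then fbs = 1
groups = [toy.loc[toy['fbs'] == v, 'chol'] for v in (0, 1)]
```

Passing `groups` to `plt.boxplot(groups)` draws the boxes at x positions 1 and 2, which can then be labelled with `plt.xticks([1, 2], ['fbs = 0', 'fbs = 1'])`.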

