# Assignment 5: Datasets


In this assignment, we'll work with a simple biological dataset containing information about hypothetical patients and their white blood cell (WBC) counts. We'll use Python and pandas to load, inspect, and analyze the data.


## <font color = "pink" > Task 1: Load and Inspect the Dataset



**Task:**

- Load the dataset into Python.
- Display the first 5 rows of the dataset.
- Count the total number of patients in the dataset.

**Instructions:**

- Use `pd.read_csv('filename.csv')` to read the CSV file.
- Use the `.head()` method to display the first few rows.
- Use `len(data)` or `data.shape[0]` to count the number of rows.
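
As a minimal sketch of these three steps, the example below reads a tiny in-memory CSV instead of the assignment file. The rows and the `Patient_ID` column are made up for illustration; the real `patient_data.csv` may contain different columns and values.

```python
import io

import pandas as pd

# Hypothetical sample rows; not taken from patient_data.csv.
csv_text = """Patient_ID,Age,WBC_Count,Condition
1,34,7200,Healthy
2,58,12500,Infection
3,45,4100,Leukopenia
"""

# pd.read_csv accepts a file path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.head())    # head() shows the first 5 rows by default
print(len(sample))      # number of rows
print(sample.shape[0])  # same count, read from the (rows, columns) tuple
```

Both `len(sample)` and `sample.shape[0]` give the row count; which one to use is a matter of preference.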



In [None]:
import pandas as pd

data = pd.read_csv('patient_data.csv')  # load the dataset into a DataFrame
print(data.head(5))  # first 5 rows
rows = len(data)  # row count, used here as the patient count
print(f'number of patients: {rows}')


## <font color = "pink" > Task 2: Summarize the data

**Task:**

- Calculate the average (mean) WBC count.
- Find the minimum and maximum WBC count.
- Determine the number of unique conditions in the dataset.

**Instructions:**

- Use `data['WBC_Count'].mean()` to compute the mean WBC count.
- Use `.min()` and `.max()` to find the minimum and maximum WBC counts.
- Use `data['Condition'].nunique()` to find the number of unique conditions.
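
These calls can be tried on a small made-up DataFrame first. The values below are illustrative only and are not taken from the assignment data. The `agg()` call at the end is an optional shortcut, not something the task requires.

```python
import pandas as pd

# Made-up values for illustration only.
sample = pd.DataFrame({
    "WBC_Count": [7200, 12500, 4100, 9800],
    "Condition": ["Healthy", "Infection", "Healthy", "Infection"],
})

mean_wbc = sample["WBC_Count"].mean()
min_wbc = sample["WBC_Count"].min()
max_wbc = sample["WBC_Count"].max()
n_conditions = sample["Condition"].nunique()  # distinct values, NaN excluded

print(mean_wbc, min_wbc, max_wbc, n_conditions)

# agg() computes several statistics in one call and returns a Series.
summary = sample["WBC_Count"].agg(["mean", "min", "max"])
print(summary)
```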



In [None]:
wbc = data['WBC_Count']  # WBC count column
print(f'mean WBC count: {wbc.mean()}')
print(f'min WBC count: {wbc.min()}')
print(f'max WBC count: {wbc.max()}')
# Count distinct values in the Condition column (not WBC_Count)
unique_conditions = data['Condition'].unique()
num_unq_con = data['Condition'].nunique()
print(f'number of unique conditions: {num_unq_con}: {unique_conditions}')


## <font color = "pink" >Task 3: Filtering the data

**Task:**

- Extract patients with WBC count above 10,000.
- Extract patients with the condition "Healthy".

**Instructions:**

- Use boolean indexing to filter the DataFrame.
- Store the filtered DataFrames in variables and display them.
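
Boolean indexing works in two steps, sketched below on made-up data: a comparison produces a boolean Series (a mask), and indexing the DataFrame with that mask returns only the matching rows. The `sample` frame and its values are assumptions for illustration.

```python
import pandas as pd

# Made-up values for illustration only.
sample = pd.DataFrame({
    "Age": [34, 58, 45],
    "WBC_Count": [7200, 12500, 4100],
    "Condition": ["Healthy", "Infection", "Healthy"],
})

# Step 1: the comparison alone yields a boolean Series (a mask), not a DataFrame.
mask = sample["WBC_Count"] > 10000
print(mask.tolist())

# Step 2: indexing the DataFrame with the mask keeps only the rows where it is True.
high = sample[mask]
print(high)

# Equality tests use ==; a single = would assign to the column instead.
healthy = sample[sample["Condition"] == "Healthy"]
print(len(healthy))
```

Printing the mask on its own shows `True`/`False` per row, which is a quick way to check a condition before filtering with it.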



In [None]:
# Filter patients with WBC count > 10,000
# TODO: Create a DataFrame 'high_wbc' containing patients with WBC_Count > 10000
high_wbc = data[data['WBC_Count'] > 10000]
print("\nPatients with WBC count above 10,000:")
print(high_wbc)

# Filter patients with the condition 'Healthy'
# TODO: Create a DataFrame 'healthy_patients' containing patients with Condition == 'Healthy'
healthy_patients = data[data['Condition'] == 'Healthy']
print("\nPatients with the condition 'Healthy':")
print(healthy_patients)


## <font color = "pink" >Task 4: Counting the Data

In [None]:
# Count patients by condition
# TODO: Use a method to count the number of patients in each condition and store in 'condition_counts'
condition_counts = data['Condition'].value_counts()
print("\nNumber of patients by condition:")
print(condition_counts)
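
For reference, a small sketch of what `value_counts()` returns, using made-up data: a Series indexed by the distinct values and sorted by count in descending order. The `normalize=True` option shown at the end is not required by this task; it returns proportions instead of counts.

```python
import pandas as pd

# Made-up values for illustration only.
conditions = pd.Series(
    ["Healthy", "Infection", "Healthy", "Healthy"], name="Condition"
)

# Counts per distinct value, most frequent first.
counts = conditions.value_counts()
print(counts)

# Same grouping, expressed as fractions of the total.
shares = conditions.value_counts(normalize=True)
print(shares)
```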


## <font color ='pink'> Task 5: Visualize Data

**Task:**

- Create a bar chart showing the number of patients for each condition.
- Create a scatter plot showing the relationship between age and WBC count.

**Instructions:**

- Use matplotlib for plotting.
- Customize the plots with titles and labels.
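
The sketch below draws both chart types on made-up data. It uses matplotlib's axes-object interface (`plt.subplots()` with `ax.bar()` and `ax.scatter()`), which is one of several ways to build these plots; the `plt.*` functions and `Series.plot()` work as well. The `sample` values are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values for illustration only.
sample = pd.DataFrame({
    "Age": [34, 58, 45, 23],
    "WBC_Count": [7200, 12500, 4100, 9800],
    "Condition": ["Healthy", "Infection", "Healthy", "Infection"],
})

# One figure with two side-by-side axes.
fig, (ax_bar, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Bar chart: one bar per condition, height = number of patients.
counts = sample["Condition"].value_counts()
ax_bar.bar(counts.index, counts.values)
ax_bar.set_title("Patients by Condition")
ax_bar.set_xlabel("Condition")
ax_bar.set_ylabel("Number of Patients")

# Scatter plot: each point is one patient.
ax_scatter.scatter(sample["Age"], sample["WBC_Count"])
ax_scatter.set_title("Age vs. WBC Count")
ax_scatter.set_xlabel("Age")
ax_scatter.set_ylabel("WBC Count")

fig.tight_layout()
```

In a notebook the figure renders at the end of the cell; in a plain script, call `plt.show()` afterwards.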






In [None]:
import matplotlib.pyplot as plt

# Bar chart for number of patients by condition
# TODO: Use 'condition_counts' to create a bar chart (you can use condition_counts.plot())
plt.figure(figsize=(8, 5))
condition_counts.plot(kind='bar', color='skyblue', edgecolor='black')
plt.title('Number of Patients by Condition', fontsize=14)
plt.xlabel('Condition', fontsize=12)
plt.ylabel('Number of Patients', fontsize=12)
plt.xticks(rotation=0, fontsize=10)
plt.yticks(fontsize=10)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()

# Scatter plot of Age vs. WBC Count
# TODO: Create a scatter plot with 'Age' on the x-axis and 'WBC_Count' on the y-axis (you can use plt.scatter())
plt.figure(figsize=(8, 5))
plt.scatter(data['Age'], data['WBC_Count'], color='purple', alpha=0.7)
plt.title('Age vs. WBC Count', fontsize=14)
plt.xlabel('Age', fontsize=12)
plt.ylabel('WBC Count', fontsize=12)
plt.grid(linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()



