chankjen/Python_Data_Analysis

# 🌟 Ubuntu-Powered Python Data Analysis: “I Am Because We Are” in Code 🌍🏥


🎯 Lesson Goal:

By the end of this lesson, you’ll be able to create or load, explore, clean, and visualize hospital data using Python, all while embracing the Ubuntu philosophy: “I am because we are.” In data, no row is an island. Every patient, every record, every number matters because it is part of a community. Let’s honor that together.


🧑‍⚕️ What You’ll Need:

  • Python (with Jupyter Notebook or Google Colab recommended)
  • Libraries: pandas, matplotlib, seaborn, numpy, scikit-learn (used in Step 5)
  • A fun, beginner-friendly attitude!

💡 Ubuntu Tip: Just like in a village, we help each other learn. If you get stuck, ask a friend, search online, or revisit an earlier step. Learning is a shared journey.


📦 Step 1: Create Our Ubuntu Hospital Dataset 🏥

Since we’re focusing on fun and accessibility, let’s create a fictional hospital dataset inspired by Ubuntu values, where every patient’s story contributes to the health of the whole community.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set style for beautiful plots, because beauty honors the community!
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (10, 6)

# Fix the random seed so every run (and every classmate) sees the same patients
np.random.seed(42)

# 🌿 Ubuntu Hospital Dataset: "We care because we are together."
data = {
    'Patient_ID': range(1, 101),
    'Age': np.random.randint(1, 90, size=100),
    'Gender': np.random.choice(['Male', 'Female'], size=100),
    'Condition': np.random.choice(['Flu', 'Diabetes', 'Hypertension', 'Asthma', 'Healthy'], size=100, p=[0.3, 0.2, 0.2, 0.2, 0.1]),
    'Treatment_Duration_Days': np.random.randint(1, 15, size=100),
    'Satisfaction_Score': np.random.randint(1, 11, size=100),  # 1-10 scale
    'Follow_Up_Needed': np.random.choice([True, False], size=100, p=[0.4, 0.6])
}

# Create DataFrame: our digital village square 🌳
df = pd.DataFrame(data)

print("🏥 Welcome to Ubuntu General Hospital!")
print("Where every patient’s data is honored and cared for.\n")
print(df.head(10))  # Show first 10 villagers (patients)

💬 Ubuntu Reflection: Each row is a person. Their age, condition, and satisfaction aren’t just numbers. They represent lived experiences. Handle them with care.


🔍 Step 2: Explore the Data - Know Your Community

Before we act, we listen. In Ubuntu, understanding comes before action.

# 🧭 Basic exploration: who are our patients?
print("=== Community Snapshot ===")
print(f"Total Patients: {len(df)}")
print(f"Average Age: {df['Age'].mean():.1f} years")
print(f"Most Common Condition: {df['Condition'].mode()[0]}")
print(f"Average Satisfaction: {df['Satisfaction_Score'].mean():.2f}/10")

# 📊 Let’s see the gender distribution (balance matters in Ubuntu)
print("\n=== Gender Harmony ===")
print(df['Gender'].value_counts())

# 📈 Visualize conditions: see what the community is facing together
plt.figure(figsize=(10, 6))
sns.countplot(data=df, x='Condition', palette='viridis')
# Emoji left out of the title: matplotlib's default font cannot draw most of them
plt.title("Community Health Conditions - We Rise By Lifting All", fontsize=16)
plt.xlabel("Condition")
plt.ylabel("Number of Patients")
plt.xticks(rotation=45)
plt.show()

💬 Ubuntu Reflection: When we visualize data, we’re not just making charts; we’re giving voice to the community’s needs. Who is suffering most? Who needs more care?
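
A chart shows the shape of the community; a small table can put numbers on it. The sketch below is only an illustration: it builds its own tiny `demo` frame with made-up values (a hypothetical stand-in for the lesson's `df`) so it runs on its own, then reports each condition's share of patients and a short age summary per condition.

```python
import pandas as pd

# Hypothetical mini-dataset standing in for the lesson's df
demo = pd.DataFrame({
    "Condition": ["Flu", "Flu", "Asthma", "Diabetes", "Flu", "Asthma"],
    "Age": [34, 71, 12, 58, 45, 9],
})

# Share of patients per condition (the fractions sum to 1)
condition_share = demo["Condition"].value_counts(normalize=True)
print(condition_share.round(2))

# Age summary per condition: how many patients, and how old they are
age_summary = demo.groupby("Condition")["Age"].agg(["count", "mean", "min", "max"])
print(age_summary)
```

With the lesson's data, swapping `demo` for `df` should give the same two tables for all 100 patients.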


🧹 Step 3: Clean the Data - Healing the Gaps Together

In Ubuntu, we don’t leave anyone behind, not even missing data!

Let’s imagine some records have missing satisfaction scores. We’ll fill them with the community average, because we support each other.

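# 🤕 Introduce some missing data (for teaching purposes)
# Cast to float first: recent pandas versions warn when NaN is written into an integer column
df['Satisfaction_Score'] = df['Satisfaction_Score'].astype(float)
# replace=False picks 10 distinct patients, so exactly 10 scores go missing
missing_idx = np.random.choice(df.index, size=10, replace=False)
df.loc[missing_idx, 'Satisfaction_Score'] = np.nan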

print("\n=== Before Healing ===")
print(f"Missing Satisfaction Scores: {df['Satisfaction_Score'].isnull().sum()}")

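# 🤝 Ubuntu Clean: Fill missing scores with the community average (collective care!)
avg_satisfaction = df['Satisfaction_Score'].mean()
# Assign the result back: fillna(inplace=True) on a selected column is deprecated in recent pandas
df['Satisfaction_Score'] = df['Satisfaction_Score'].fillna(avg_satisfaction)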

print("=== After Ubuntu Healing ===")
print(f"Missing Satisfaction Scores: {df['Satisfaction_Score'].isnull().sum()}")
print(f"Filled with community average: {avg_satisfaction:.2f}")

💬 Ubuntu Reflection: Instead of deleting incomplete records (abandoning people), we uplift them using the wisdom of the whole. That’s Ubuntu data science.
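
Filling every gap with one overall mean is the simplest choice, but it pulls all filled rows toward the same value and shrinks the spread of the column. One possible refinement, sketched here under the assumption that patients with the same condition resemble each other, is to fill each gap with the median of that patient's own condition group and to keep a flag marking which scores were filled. The `demo` frame and the `Score_Was_Missing` column are hypothetical names used only for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical records; NaN marks a missing satisfaction score
demo = pd.DataFrame({
    "Condition": ["Flu", "Flu", "Flu", "Asthma", "Asthma", "Asthma"],
    "Satisfaction_Score": [8.0, 6.0, np.nan, 3.0, np.nan, 5.0],
})

# Median score within each condition, aligned back to every row
group_median = demo.groupby("Condition")["Satisfaction_Score"].transform("median")

# Remember which rows were filled before overwriting them
demo["Score_Was_Missing"] = demo["Satisfaction_Score"].isna()
demo["Satisfaction_Score"] = demo["Satisfaction_Score"].fillna(group_median)
print(demo)
```

Keeping the flag means later analysis can still tell real answers from filled ones, so no one's actual voice gets confused with an estimate.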


📈 Step 4: Analyze & Visualize - Wisdom Through Sharing

Let’s find insights that help our hospital serve better, together.

# 🧠 Question: Do certain conditions lead to longer treatments?
plt.figure(figsize=(12, 6))
sns.boxplot(data=df, x='Condition', y='Treatment_Duration_Days', palette='Set2')
plt.title("Treatment Duration by Condition - Understanding to Serve Better", fontsize=16)
plt.xlabel("Condition")
plt.ylabel("Days of Treatment")
plt.xticks(rotation=45)
plt.show()

# 💡 Insight: Maybe Asthma patients need longer care? Let's check!
avg_treatment_by_condition = df.groupby('Condition')['Treatment_Duration_Days'].mean()
print("\n=== Average Treatment Duration by Condition ===")
print(avg_treatment_by_condition.round(2))

# โค๏ธ Question: Is satisfaction linked to treatment duration?
plt.figure(figsize=(10, 6))
sns.scatterplot(data=df, x='Treatment_Duration_Days', y='Satisfaction_Score', hue='Condition', palette='deep', s=100)
plt.title("๐Ÿ˜Š Satisfaction vs Treatment Duration โ€” Are Longer Stays Less Happy?", fontsize=16)
plt.xlabel("Treatment Duration (Days)")
plt.ylabel("Satisfaction (1-10)")
plt.legend(title='Condition')
plt.show()

# 🔢 Correlation check
corr = df['Treatment_Duration_Days'].corr(df['Satisfaction_Score'])
print(f"\nCorrelation between Treatment Duration and Satisfaction: {corr:.2f}")
# Both columns were generated independently in Step 1, so expect a value near 0 here
print("Close to zero? That is expected for randomly generated data. With real records, let’s dig deeper together!")

💬 Ubuntu Reflection: Data reveals patterns, but only when we ask compassionate questions. Why might longer treatments lower satisfaction? Can we improve the experience?
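
Before reading a story into a small correlation, it helps to ask how large a correlation chance alone would produce. A permutation test is one simple way to check, and the sketch below shows the idea. It uses only numpy and generates its own stand-in columns (`duration` and `satisfaction` are hypothetical names, drawn independently on the same scales as the lesson's columns), so the numbers it prints say nothing about real patients.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Hypothetical stand-ins for the two columns, generated independently
duration = rng.integers(1, 15, size=100)
satisfaction = rng.integers(1, 11, size=100)

# Pearson correlation actually observed in this sample
observed = np.corrcoef(duration, satisfaction)[0, 1]

# Shuffle one column many times to see what correlations chance alone produces
n_shuffles = 2000
shuffled = np.array([
    np.corrcoef(duration, rng.permutation(satisfaction))[0, 1]
    for _ in range(n_shuffles)
])

# Share of shuffles at least as extreme as the observed value
p_value = np.mean(np.abs(shuffled) >= abs(observed))
print(f"observed r = {observed:.3f}, permutation p-value = {p_value:.3f}")
```

A large p-value means the observed correlation looks like ordinary noise; a very small one suggests the link is worth a closer, more careful look.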


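🤖 Step 5: Simple Prediction - Serving the Future Together

Let’s predict whether a patient will need follow-up, based on their age, gender, condition, treatment duration, and satisfaction, so we can prepare resources before they ask. Because Follow_Up_Needed was generated at random in Step 1, expect accuracy close to chance here; the goal is to practice the workflow.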

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix

# 🔄 Convert categorical to numeric (so the machine can understand our village)
df_encoded = pd.get_dummies(df, columns=['Condition', 'Gender'], drop_first=True)

# 🎯 Target: Follow_Up_Needed
X = df_encoded.drop(['Follow_Up_Needed', 'Patient_ID'], axis=1)
y = df_encoded['Follow_Up_Needed']

# 🤲 Ubuntu Split: Share data between training (learning) and testing (serving)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# 🌱 Grow a Forest of Wisdom (Random Forest Classifier)
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# 🎯 Predict
y_pred = model.predict(X_test)

# 📋 Ubuntu Report: How well did we serve?
print("\n=== 🌿 Ubuntu Care Prediction Report ===")
print(classification_report(y_test, y_pred))

# 🖼️ Confusion Matrix: Who did we miss? Let’s learn together.
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=['No Follow-Up', 'Follow-Up'], yticklabels=['No Follow-Up', 'Follow-Up'])
plt.title("Confusion Matrix - Where Can We Improve Our Care?")
plt.ylabel("Actual")
plt.xlabel("Predicted")
plt.show()

💬 Ubuntu Reflection: Even machines must learn with humility. If our model misclassifies, we don’t blame the data. We ask: “What did we miss? How can we listen better next time?”
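
Part of that humility is comparing the model against a very plain baseline: always predicting the most common answer seen in training. The sketch below computes that baseline accuracy with numpy alone. The labels are hypothetical, drawn with roughly the same 40% follow-up rate as the lesson's target, and `y_train_demo` / `y_test_demo` are illustrative names rather than the lesson's variables.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Hypothetical labels: True means follow-up needed (about 40% of patients)
y_train_demo = rng.random(70) < 0.4
y_test_demo = rng.random(30) < 0.4

# Baseline rule: always predict whichever class is more common in training
majority_class = bool(y_train_demo.mean() >= 0.5)
baseline_pred = np.full(y_test_demo.shape, majority_class)

# Accuracy of that rule on the held-out labels
baseline_accuracy = float(np.mean(baseline_pred == y_test_demo))
print(f"Majority class in training: {majority_class}")
print(f"Baseline accuracy on test:  {baseline_accuracy:.2f}")
```

If the Random Forest's accuracy is no better than this number, the features are probably not telling the model anything useful yet, which is exactly what to expect when the target is random.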


🌈 Step 6: Celebrate & Reflect - Ubuntu Closing Circle

print("\n🎉 CONGRATULATIONS! You’ve completed the Ubuntu Data Journey.")
print("You didn’t just analyze data: you honored stories, healed gaps, and predicted needs with compassion.")

print("\n🌿 Ubuntu Principles in Your Code:")
print("→ You treated missing data with community averages, so no one was left behind.")
print("→ You visualized conditions to understand collective burdens.")
print("→ You predicted needs to prepare resources (proactive care).")
print("→ You reflected on meaning, not just metrics.")

print("\n💡 Next Steps:")
print("→ Try this with a real Kaggle health dataset (e.g., ‘Heart Disease UCI’ or ‘Diabetes Health Indicators’).")
print("→ Add more Ubuntu: Ask ‘Who is not represented in this data?’")
print("→ Share your notebook with a friend. Learning grows when shared.")

print("\n🌍 Because in data, as in life: I am because we are.")

📚 Bonus: Recommended Kaggle Datasets (for your next Ubuntu project!)

💡 Ubuntu Challenge: When you use real data, ask:
“Whose voices are missing? How can my analysis serve the underserved?”
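
One rough way to start answering that question is to count how many records fall into each group and flag the thin ones. The sketch below does this for age bands. Everything in it is an assumption made for illustration: the ages are made up, the band edges are arbitrary, and the 15% threshold is just a starting point, not a standard.

```python
import pandas as pd

# Hypothetical ages; in practice this would be the dataset's Age column
ages = pd.Series([3, 25, 31, 36, 42, 47, 55, 63, 67, 88], name="Age")

# Group ages into bands (the band edges here are an arbitrary choice)
bands = pd.cut(
    ages,
    bins=[0, 17, 39, 64, 120],
    labels=["0-17", "18-39", "40-64", "65+"],
)

# Count per band, keeping the band order rather than sorting by size
band_counts = bands.value_counts().sort_index()
print(band_counts)

# Flag bands holding less than 15% of records (illustrative threshold)
underrepresented = band_counts[band_counts / len(ages) < 0.15].index.tolist()
print("Possibly under-represented:", underrepresented)
```

A flagged band is a prompt to ask why those people are scarce in the data, not proof that anything is wrong.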


✅ Summary Checklist (Ubuntu Edition):

  • Created or loaded a health dataset with care 🏥
  • Explored the community’s needs 📊
  • Cleaned data with compassion (no one deleted!) 🤝
  • Visualized to understand, not just to impress 🎨
  • Built a model to serve, not just to score 🤖
  • Reflected on the human stories behind the numbers ❤️
  • Shared your learning or notebook with someone 🌍

🙏 Final Ubuntu Blessing:

“May your code be clean, your charts be clear, and your heart remember: behind every row, there is a person.
Analyze with wisdom. Serve with compassion.
I am because we are.”


📬 Share your Ubuntu Data Notebook with #UbuntuDataScience. Let’s build a global village of compassionate analysts!

Happy coding, data healer. 🌿🐍📊

About

PLP ACADEMY WEEK 7