# Ubuntu-Powered Python Data Analysis: "I Am Because We Are" in Code
By the end of this lesson, you'll be able to load, explore, clean, and visualize hospital data using Python, all while embracing the Ubuntu philosophy: "I am because we are." In data, no row is an island. Every patient, every record, every number matters because each is part of a community. Let's honor that together.
- Python (with Jupyter Notebook or Google Colab recommended)
- Libraries: pandas, matplotlib, seaborn, numpy, scikit-learn
- A fun, beginner-friendly attitude!
Ubuntu Tip: Just like in a village, we help each other learn. If you get stuck, ask a friend, search online, or revisit the lesson. Learning is a shared journey.
Since we're focusing on fun and accessibility, let's create a fictional hospital dataset inspired by Ubuntu values, where every patient's story contributes to the health of the whole community.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Set style for beautiful plots, because beauty honors the community!
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (10, 6)
# Ubuntu Hospital Dataset: "We care because we are together."
data = {
'Patient_ID': range(1, 101),
'Age': np.random.randint(1, 90, size=100),
'Gender': np.random.choice(['Male', 'Female'], size=100),
'Condition': np.random.choice(['Flu', 'Diabetes', 'Hypertension', 'Asthma', 'Healthy'], size=100, p=[0.3, 0.2, 0.2, 0.2, 0.1]),
'Treatment_Duration_Days': np.random.randint(1, 15, size=100),
'Satisfaction_Score': np.random.randint(1, 11, size=100), # 1-10 scale
'Follow_Up_Needed': np.random.choice([True, False], size=100, p=[0.4, 0.6])
}
# Create DataFrame: our digital village square
df = pd.DataFrame(data)
print("Welcome to Ubuntu General Hospital!")
print("Where every patient's data is honored and cared for.\n")
print(df.head(10)) # Show first 10 villagers (patients)
Ubuntu Reflection: Each row is a person. Their age, condition, and satisfaction aren't just numbers. They represent lived experiences. Handle them with care.
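One thing worth knowing about the dataset above: it is drawn at random, so every run produces different patients and slightly different charts. If you want to share results with a friend and see the same numbers, you can seed the random generator. The sketch below is one way to do that, assuming NumPy's `default_rng`; the helper name `make_patients` and the reduced set of columns are illustrative, not part of the lesson's code.

```python
import numpy as np
import pandas as pd

def make_patients(n=100, seed=42):
    """Build a small fictional patient table; the same seed gives the same rows."""
    rng = np.random.default_rng(seed)  # local generator, leaves global random state alone
    return pd.DataFrame({
        "Patient_ID": range(1, n + 1),
        "Age": rng.integers(1, 90, size=n),                 # upper bound is excluded
        "Satisfaction_Score": rng.integers(1, 11, size=n),  # 1-10 scale
    })

first = make_patients()
second = make_patients()
print(first.equals(second))  # same seed, same villagers
```

Changing `seed` gives a different but equally repeatable village.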
Before we act, we listen. In Ubuntu, understanding comes before action.
# Basic exploration: Who are our patients?
print("=== Community Snapshot ===")
print(f"Total Patients: {len(df)}")
print(f"Average Age: {df['Age'].mean():.1f} years")
print(f"Most Common Condition: {df['Condition'].mode()[0]}")
print(f"Average Satisfaction: {df['Satisfaction_Score'].mean():.2f}/10")
# Let's see the gender distribution, because balance matters in Ubuntu
print("\n=== Gender Harmony ===")
print(df['Gender'].value_counts())
# Visualize Conditions: see what the community is facing together
plt.figure(figsize=(10, 6))
sns.countplot(data=df, x='Condition', palette='viridis')
plt.title("Community Health Conditions: We Rise By Lifting All", fontsize=16)
plt.xlabel("Condition")
plt.ylabel("Number of Patients")
plt.xticks(rotation=45)
plt.show()
Ubuntu Reflection: When we visualize data, we're not just making charts. We're giving voice to the community's needs. Who is suffering most? Who needs more care?
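A bar chart shows which condition is most common, but shares are often easier to talk about than raw counts. Here is a minimal sketch that puts counts and percentages side by side. The tiny `conditions` series is a hypothetical stand-in for `df['Condition']`, made up so the example runs on its own.

```python
import pandas as pd

# Hypothetical mini-sample standing in for df['Condition']
conditions = pd.Series(
    ["Flu", "Flu", "Asthma", "Diabetes", "Flu", "Healthy", "Asthma", "Flu"],
    name="Condition",
)

counts = conditions.value_counts()                               # patients per condition
shares = conditions.value_counts(normalize=True).mul(100).round(1)  # same, as percentages

# Both series share the same index, so they line up row by row
summary = pd.DataFrame({"patients": counts, "percent": shares})
print(summary)
```

On the lesson's data you would swap `conditions` for `df['Condition']`.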
In Ubuntu, we don't leave anyone behind, not even missing data!
Let's imagine some records have missing satisfaction scores. We'll fill them with the community average, because we support each other.
# Introduce some missing data (for teaching purposes)
# Cast to float first: an integer column cannot hold NaN
df['Satisfaction_Score'] = df['Satisfaction_Score'].astype(float)
# replace=False picks 10 distinct rows, so exactly 10 scores go missing
df.loc[np.random.choice(df.index, size=10, replace=False), 'Satisfaction_Score'] = np.nan
print("\n=== Before Healing ===")
print(f"Missing Satisfaction Scores: {df['Satisfaction_Score'].isnull().sum()}")
# Ubuntu Clean: Fill missing scores with community average. Collective care!
avg_satisfaction = df['Satisfaction_Score'].mean()
# Assign the result back; fillna(inplace=True) on a selected column may not update df in recent pandas
df['Satisfaction_Score'] = df['Satisfaction_Score'].fillna(avg_satisfaction)
print("=== After Ubuntu Healing ===")
print(f"Missing Satisfaction Scores: {df['Satisfaction_Score'].isnull().sum()}")
print(f"Filled with community average: {avg_satisfaction:.2f}")
Ubuntu Reflection: Instead of deleting incomplete records (abandoning people), we uplift them using the wisdom of the whole. That's Ubuntu data science.
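Filling with one overall average is the simplest form of "wisdom of the whole", but it treats every patient as if they belonged to the same group. A gentle variation, sketched below, fills each gap with the average of patients who share the same condition, and keeps a flag so we remember which values were filled. The `toy` table and the new column names are assumptions made up for this illustration, not part of the lesson's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table with two missing scores
toy = pd.DataFrame({
    "Condition": ["Flu", "Flu", "Flu", "Asthma", "Asthma", "Asthma"],
    "Satisfaction_Score": [8.0, 6.0, np.nan, 3.0, np.nan, 5.0],
})

# Remember which rows were filled, so the imputation stays visible later
toy["Score_Was_Missing"] = toy["Satisfaction_Score"].isna()

# Option 1: one average for the whole community
global_mean = toy["Satisfaction_Score"].mean()
toy["Filled_Global"] = toy["Satisfaction_Score"].fillna(global_mean)

# Option 2: the average of patients with the same condition
group_mean = toy.groupby("Condition")["Satisfaction_Score"].transform("mean")
toy["Filled_By_Condition"] = toy["Satisfaction_Score"].fillna(group_mean)

print(toy)
```

Neither option is "the right one". Both invent values, so it is worth reporting how many rows were filled alongside any result that uses them.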
Let's find insights that help our hospital serve better, together.
# Question: Do certain conditions lead to longer treatments?
plt.figure(figsize=(12, 6))
sns.boxplot(data=df, x='Condition', y='Treatment_Duration_Days', palette='Set2')
plt.title("Treatment Duration by Condition: Understanding to Serve Better", fontsize=16)
plt.xlabel("Condition")
plt.ylabel("Days of Treatment")
plt.xticks(rotation=45)
plt.show()
# Insight: Maybe Asthma patients need longer care? Let's check!
avg_treatment_by_condition = df.groupby('Condition')['Treatment_Duration_Days'].mean()
print("\n=== Average Treatment Duration by Condition ===")
print(avg_treatment_by_condition.round(2))
# Question: Is satisfaction linked to treatment duration?
plt.figure(figsize=(10, 6))
sns.scatterplot(data=df, x='Treatment_Duration_Days', y='Satisfaction_Score', hue='Condition', palette='deep', s=100)
plt.title("Satisfaction vs Treatment Duration: Are Longer Stays Less Happy?", fontsize=16)
plt.xlabel("Treatment Duration (Days)")
plt.ylabel("Satisfaction (1-10)")
plt.legend(title='Condition')
plt.show()
# Correlation check
corr = df['Treatment_Duration_Days'].corr(df['Satisfaction_Score'])
print(f"\nCorrelation between Treatment Duration and Satisfaction: {corr:.2f}")
print("Close to zero? That is expected here, because our fictional data is random.")
print("On real data, a negative value would hint that longer stays go with lower satisfaction. Let's dig deeper together!")
Ubuntu Reflection: Data reveals patterns, but only when we ask compassionate questions. Why might longer treatments lower satisfaction? Can we improve the experience?
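One way to dig deeper is to check whether the relationship looks different inside each condition, since a single overall number can hide what individual groups experience. The sketch below computes the correlation per condition. The `sample` table is randomly generated here just so the example runs on its own (an assumption, not the lesson's `df`), so expect values near zero.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Hypothetical random sample with the same column names as the lesson
sample = pd.DataFrame({
    "Condition": rng.choice(["Flu", "Asthma"], size=n),
    "Treatment_Duration_Days": rng.integers(1, 15, size=n),
    "Satisfaction_Score": rng.integers(1, 11, size=n),
})

# Overall Pearson correlation across everyone
overall = sample["Treatment_Duration_Days"].corr(sample["Satisfaction_Score"])

# The same correlation, computed separately for each condition
cols = ["Treatment_Duration_Days", "Satisfaction_Score"]
by_condition = sample.groupby("Condition")[cols].apply(
    lambda g: g["Treatment_Duration_Days"].corr(g["Satisfaction_Score"])
)

print(f"Overall: {overall:.2f}")
print(by_condition.round(2))
```

Remember that correlation only describes how two columns move together. It does not tell us that longer stays cause lower satisfaction.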
Let's predict whether a patient will need follow-up, based on their age, gender, condition, treatment duration, and satisfaction, so we can prepare resources before they ask.
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
# Convert categorical to numeric (so the machine can understand our village)
df_encoded = pd.get_dummies(df, columns=['Condition', 'Gender'], drop_first=True)
# Target: Follow_Up_Needed
X = df_encoded.drop(['Follow_Up_Needed', 'Patient_ID'], axis=1)
y = df_encoded['Follow_Up_Needed']
# Ubuntu Split: Share data between training (learning) and testing (serving)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Grow a Forest of Wisdom (Random Forest Classifier)
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
# Predict
y_pred = model.predict(X_test)
# Ubuntu Report: How well did we serve?
print("\n=== Ubuntu Care Prediction Report ===")
print(classification_report(y_test, y_pred))
# Confusion Matrix: Who did we miss? Let's learn together.
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=['No Follow-Up', 'Follow-Up'], yticklabels=['No Follow-Up', 'Follow-Up'])
plt.title("Confusion Matrix: Where Can We Improve Our Care?")
plt.ylabel("Actual")
plt.xlabel("Predicted")
plt.show()
Ubuntu Reflection: Even machines must learn with humility. If our model misclassifies, we don't blame the data. We ask: "What did we miss? How can we listen better next time?"
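A humble way to read the report is to compare it against a model that learns nothing at all. In our fictional data, `Follow_Up_Needed` is drawn at random, so there is no real pattern to find, and a forest that scores about the same as "always guess the most common answer" is behaving honestly. The sketch below computes that majority-class baseline. The `y_true` array is a made-up stand-in for `y_test`, generated here so the example runs on its own.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in for y_test: about 40% of patients need follow-up
y_true = rng.choice([True, False], size=30, p=[0.4, 0.6])

# Find the most common label and predict it for everyone
values, counts = np.unique(y_true, return_counts=True)
majority_label = values[np.argmax(counts)]
baseline_pred = np.full_like(y_true, majority_label)

# Share of patients this "always the same answer" rule gets right
baseline_accuracy = (baseline_pred == y_true).mean()
print(f"Majority label: {majority_label}, baseline accuracy: {baseline_accuracy:.2f}")
```

If the accuracy in the classification report is not clearly above this number, the model has not yet learned anything we could act on.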
print("\nCONGRATULATIONS! You've completed the Ubuntu Data Journey.")
print("You didn't just analyze data. You honored stories, healed gaps, and predicted needs with compassion.")
print("\nUbuntu Principles in Your Code:")
print("- You treated missing data with community averages: no one left behind.")
print("- You visualized conditions to understand collective burdens.")
print("- You predicted needs to prepare resources: proactive care.")
print("- You reflected on meaning, not just metrics.")
print("\nNext Steps:")
print("- Try this with a real Kaggle health dataset (e.g., 'Heart Disease UCI' or 'Diabetes Health Indicators').")
print("- Add more Ubuntu. Ask: 'Who is not represented in this data?'")
print("- Share your notebook with a friend. Learning grows when shared.")
print("\nBecause in data, as in life: I am because we are.")
Ubuntu Challenge: When you use real data, ask:
"Whose voices are missing? How can my analysis serve the underserved?"
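One practical way to start answering that question is to count how many records fall into each combination of groups and look for empty or thin cells. The sketch below does this for age bands and gender. The `people` table, the band edges, and the labels are all assumptions chosen for illustration, and real data may call for different groupings.

```python
import pandas as pd

# Hypothetical mini-sample of patients
people = pd.DataFrame({
    "Age": [4, 15, 33, 41, 58, 67, 72, 29],
    "Gender": ["Female", "Male", "Female", "Female", "Male", "Female", "Male", "Female"],
})

# Group ages into bands; right=False means each band includes its lower edge only
bins = [0, 18, 40, 65, 120]
labels = ["0-17", "18-39", "40-64", "65+"]
people["Age_Band"] = pd.cut(people["Age"], bins=bins, labels=labels, right=False)

# Count patients in every age band and gender combination
coverage = pd.crosstab(people["Age_Band"], people["Gender"])

# Rows where at least one group has no records at all
gaps = coverage[(coverage == 0).any(axis=1)]

print(coverage)
print("\nGroups with no records:")
print(gaps)
```

An empty cell does not tell you why a group is missing, only that any conclusion about that group would rest on no evidence.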
- Created or loaded a health dataset with care
- Explored the community's needs
- Cleaned data with compassion (no one deleted!)
- Visualized to understand, not just to impress
- Built a model to serve, not just to score
- Reflected on the human stories behind the numbers
- Shared your learning or notebook with someone
"May your code be clean, your charts be clear, and your heart remember: behind every row, there is a person.
Analyze with wisdom. Serve with compassion.
I am because we are."
Share your Ubuntu Data Notebook with #UbuntuDataScience. Let's build a global village of compassionate analysts!
Happy coding, data healer.