# Python Data Visualization Course
## 2-Hour Beginner-Friendly Training

**Course Goal:** By the end of this session, participants will be able to produce clean, formal, and scientifically acceptable charts and diagrams using Python.

---

## Part 1: Python Basics for Data Analysis

### 1.1 Introduction to Python

**What is Programming?**
Programming is giving instructions to a computer to perform tasks.

**What is Python?**
- General-purpose programming language (its reference implementation, CPython, is written in C)
- Easy to learn and read
- Used for data analysis, machine learning, web development, and more
- Fast development: write less code, do more

**Why Python for Data Analysis?**
- Easy syntax (readable like English)
- Powerful libraries (pandas, NumPy, Matplotlib, seaborn)
- Large community and extensive documentation

In [2]:
# Quick setup check
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
print("All libraries loaded successfully!")
print("Python is ready for data analysis!")

# Your first Python program
print("\nHello, mighty SERG-Analyst! 🌱📊")
print("Python is working correctly!")

All libraries loaded successfully!
Python is ready for data analysis!

Hello, mighty SERG-Analyst! 🌱📊
Python is working correctly!


### 1.2 Variables and Basic Data Types

**What is a Variable?**
A variable is a named container that stores data. Think of it as a labeled box.

**Understanding Data Types:**
Every piece of data has a "type", which tells Python what kind of data it is and which operations it supports.

In [3]:
# Creating variables with different types
temperature = 23.5      # float - decimal number
sensor_id = "S001"      # string - text
is_active = True        # boolean - True/False
reading_count = 100     # integer - whole number

# Check the type of each variable
print(f"temperature is: {type(temperature)}")
print(f"sensor_id is: {type(sensor_id)}")
print(f"is_active is: {type(is_active)}")
print(f"reading_count is: {type(reading_count)}")

# Why types matter: operations depend on type
print(f"\nTemperature + 10 = {temperature + 10}")  # Works: number + number
# print(sensor_id + 10)  # ERROR: can't add string + number!
print(f"sensor_id + '_active' = {sensor_id + '_active'}")  # Works: string + string

temperature is: <class 'float'>
sensor_id is: <class 'str'>
is_active is: <class 'bool'>
reading_count is: <class 'int'>

Temperature + 10 = 33.5
sensor_id + '_active' = S001_active
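Because operations depend on type, values often need converting before use. The sketch below shows the built-in `float()`, `int()`, and `str()` conversions; the variable names `raw_temperature`, `raw_count`, and `sensor_number` are made up for illustration.

```python
# Values read from files or typed by users usually arrive as strings
raw_temperature = "23.5"
raw_count = "100"

# Convert text to numbers before doing arithmetic
temperature = float(raw_temperature)   # string -> float
reading_count = int(raw_count)         # string -> integer
print(f"temperature + 10 = {temperature + 10}")
print(f"reading_count * 2 = {reading_count * 2}")

# Convert a number to text before joining it to a string
sensor_number = 1
sensor_id = "S" + str(sensor_number).zfill(3)   # zfill pads with leading zeros
print(f"sensor_id = {sensor_id}")
```

`float()` and `int()` raise a `ValueError` if the text is not a valid number, so a failed conversion is a useful early sign of messy data.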


### 1.3 Lists

**What are Lists?**
A list is an ordered collection of items. Lists are:
- **Ordered:** Items stay in order
- **Indexed:** Each item has a position (starting from 0!)
- **Mutable:** You can change, add, or remove items

In [4]:
# Creating lists
temperatures = [20.5, 21.0, 22.3, 23.1, 24.0]
# Index:         0      1      2      3      4

# Understanding indexes (start from 0!)
print(f"First element (index 0): {temperatures[0]}")
print(f"Second element (index 1): {temperatures[1]}")
print(f"Last element (index -1): {temperatures[-1]}")

# Slicing (getting portions of the list)
print(f"\nSlice [1:3]: {temperatures[1:3]}")  # Items at index 1 and 2
print(f"First 3 items [:3]: {temperatures[:3]}")
print(f"From index 2 to end [2:]: {temperatures[2:]}")

# Common operations
temperatures.append(25.0)  # Add to end
temperatures.insert(0, 19.5)  # Insert at beginning
print(f"\nAfter adding elements: {temperatures}")
print(f"Length: {len(temperatures)}")
print(f"Is 22.3 in list? {22.3 in temperatures}")

# Real-world: finding max, min, average
print(f"\nMax temperature: {max(temperatures)}")
print(f"Min temperature: {min(temperatures)}")
print(f"Average: {sum(temperatures) / len(temperatures):.2f}")

First element (index 0): 20.5
Second element (index 1): 21.0
Last element (index -1): 24.0

Slice [1:3]: [21.0, 22.3]
First 3 items [:3]: [20.5, 21.0, 22.3]
From index 2 to end [2:]: [22.3, 23.1, 24.0]

After adding elements: [19.5, 20.5, 21.0, 22.3, 23.1, 24.0, 25.0]
Length: 7
Is 22.3 in list? True

Max temperature: 25.0
Min temperature: 19.5
Average: 22.20
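The cell above adds items; changing and removing items works in a similar way. A minimal sketch of the "mutable" property, using a fresh copy of the same readings (the names `last` and `descending` are illustrative):

```python
temperatures = [20.5, 21.0, 22.3, 23.1, 24.0]

temperatures[0] = 20.0         # Change: overwrite the item at index 0
temperatures.remove(22.3)      # Remove: delete the first item equal to 22.3
last = temperatures.pop()      # Remove and return the last item
print(f"After changes: {temperatures}")
print(f"Popped value: {last}")

# sorted() returns a new list and leaves the original untouched
descending = sorted(temperatures, reverse=True)
print(f"Sorted (high to low): {descending}")
print(f"Original order kept: {temperatures}")
```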


### 1.4 Dictionaries

**What are Dictionaries?**
A dictionary stores data as key-value pairs. Think of it like a phone book:
you look up a name (key) to find a number (value).

In [None]:
# Creating dictionaries
sensor_info = {
    "sensor_id": "S001",      # key: "sensor_id", value: "S001"
    "location": "Room A",      # key: "location", value: "Room A"
    "temperature": 23.5,        # key: "temperature", value: 23.5
    "humidity": 45.0,
    "is_active": True
}

# Accessing values (using keys, not indexes!)
print(f"Sensor ID: {sensor_info['sensor_id']}")
print(f"Temperature: {sensor_info.get('temperature')}")  # Safer: returns None instead of raising KeyError if the key is missing

# Get all keys, values, and items
print(f"\nKeys: {list(sensor_info.keys())}")
print(f"Values: {list(sensor_info.values())}")

# Modifying dictionaries
sensor_info["pressure"] = 1013.25  # Add new key-value pair
sensor_info["temperature"] = 24.0   # Update existing value
print(f"\nUpdated sensor_info: {sensor_info}")

# Real-world: Multiple sensors
sensors = {
    "S001": {"temp": 23.5, "humidity": 45, "location": "Room A"},
    "S002": {"temp": 24.1, "humidity": 48, "location": "Room B"},
    "S003": {"temp": 22.8, "humidity": 42, "location": "Room C"}
}
print(f"\nSensor S001 temperature: {sensors['S001']['temp']}°C")
print(f"This structure is similar to pandas DataFrames!")
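Two things the cell above only hints at are what happens when a key is missing and how to visit every entry of a nested dictionary. The sketch below uses cut-down copies of `sensor_info` and `sensors`; the default value `0.0` and the names `temps` and `average_temp` are assumptions chosen for illustration.

```python
sensor_info = {"sensor_id": "S001", "temperature": 23.5}

# Square brackets raise an error for a missing key:
# sensor_info["humidity"]   # KeyError: 'humidity'

humidity = sensor_info.get("humidity")                  # None when the key is missing
humidity_or_default = sensor_info.get("humidity", 0.0)  # Fallback value instead of None
has_temperature = "temperature" in sensor_info          # Check before accessing
print(humidity, humidity_or_default, has_temperature)

# Looping over a nested dictionary: .items() yields (key, value) pairs
sensors = {
    "S001": {"temp": 23.5, "humidity": 45},
    "S002": {"temp": 24.1, "humidity": 48},
    "S003": {"temp": 22.8, "humidity": 42},
}
temps = []
for sensor_id, reading in sensors.items():
    print(f"{sensor_id}: {reading['temp']}°C")
    temps.append(reading["temp"])

average_temp = sum(temps) / len(temps)
print(f"Average across sensors: {average_temp:.2f}°C")
```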

### 1.5 Tuples and Sets

- **Tuples:** Like lists, but immutable (cannot be changed after creation)
- **Sets:** Unordered collections of unique items

In [None]:
# Tuples (immutable - cannot change)
coordinates = (10, 20)
sensor_location = ("Building A", "Floor 2", "Room 101")
print(f"Coordinates: {coordinates}")
print(f"First coordinate: {coordinates[0]}")

# Common use: returning multiple values from functions
def get_sensor_info():
    return ("S001", 23.5, True)  # Returns a tuple

sensor_id, temp, active = get_sensor_info()  # Unpacking
print(f"\nUnpacked: Sensor {sensor_id}, Temp: {temp}°C, Active: {active}")

# Sets (unique values only, unordered)
unique_temperatures = {20.5, 21.0, 22.3, 20.5}  # Duplicate removed
print(f"\nUnique temperatures: {unique_temperatures}")

# Finding unique values from a list
temperatures = [20.5, 21.0, 22.3, 20.5, 21.0, 22.3]
unique = set(temperatures)
print(f"Unique from list: {unique}")

# Fast membership testing
print(f"\nIs 20.5 in set? {20.5 in unique_temperatures}")

# Set operations
set1 = {1, 2, 3}
set2 = {3, 4, 5}
print(f"\nUnion: {set1.union(set2)}")  # All items
print(f"Intersection: {set1.intersection(set2)}")  # Items in both

### 1.6 Strings

**What are Strings?**
A string is a sequence of characters (text). Essential for data analysis:
column names, labels, data cleaning, file paths.

In [None]:
# String basics
sensor_name = "Temperature_Sensor_01"
location = "Building A - Room 101"

# Formatting with f-strings (preferred method)
sensor_id = "S001"
temp = 23.5
humidity = 45
message = f"Sensor {sensor_id} reads {temp}°C"
print(message)

# Multiple variables in f-string
report = f"Sensor {sensor_id}: Temp={temp}°C, Humidity={humidity}%"
print(f"\nReport: {report}")

# Common string methods
print(f"\nLowercase: {sensor_name.lower()}")
print(f"Replace underscore: {sensor_name.replace('_', ' ')}")
print(f"Split into list: {sensor_name.split('_')}")

# String indexing (like lists)
sensor_id = "S001"
print(f"\nFirst character: {sensor_id[0]}")
print(f"Last character: {sensor_id[-1]}")

# Data cleaning example
messy_name = "  TEMPERATURE_SENSOR_01  "
clean_name = messy_name.strip().lower().replace("_", " ")
print(f"\nCleaned: '{clean_name}'")
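String methods and type conversion often work together when reading raw text. As a rough illustration, the sketch below parses one comma-separated reading; the line format in `raw_line` is an assumption made up for this example, not the format of any particular file.

```python
# One raw line of text, with stray spaces and a trailing newline (hypothetical format)
raw_line = " S001, 23.5 ,45 \n"

# strip() removes outer whitespace; split(",") cuts the text at each comma
sensor_id, temp_text, humidity_text = raw_line.strip().split(",")

sensor_id = sensor_id.strip()
temp = float(temp_text)        # float() tolerates surrounding spaces
humidity = int(humidity_text)  # so does int()

report = f"{sensor_id}: {temp:.1f}°C, {humidity}%"
print(report)
```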

### Quick Check: Part 1 Review

In [None]:
# Exercise: Create a small dataset
temperatures = [20.5, 21.0, 22.3, 23.1, 24.0, 23.5, 22.8]

sensor = {
    "id": "S001",
    "readings": temperatures,
    "location": "Room A"
}

# Calculate average temperature
average = sum(temperatures) / len(temperatures)
print(f"Average temperature: {average:.2f}°C")

## Part 2: Data Manipulation with pandas

### 2.1 Loading Data

In [None]:
# Load sensor data from CSV
df = pd.read_csv("sensor_data.csv")

# Display first few rows
df.head()

In [None]:
# Display basic information
df.info()

In [None]:
# Summary statistics
df.describe()
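If `sensor_data.csv` is not at hand, a small stand-in can be read from a string instead. The column names below are the ones used in later cells of this course; the timestamps and readings are invented purely for practice and do not come from the real file.

```python
import io
import pandas as pd

# Invented sample data with the same column names the course uses
csv_text = """timestamp,sensor_id,temperature,humidity,pressure
2024-01-01 00:00,S001,21.5,45.0,1013.2
2024-01-01 00:00,S002,22.1,47.5,1012.8
2024-01-01 01:00,S001,21.9,44.0,1013.0
2024-01-01 01:00,S002,22.6,46.5,1012.5
2024-01-01 02:00,S001,23.4,42.5,1012.7
2024-01-01 02:00,S002,24.0,45.0,1012.1
"""

# io.StringIO lets read_csv treat a string like an open file
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
print(f"Rows and columns: {df.shape}")
```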

### 2.2 Exploring Data

In [None]:
# Basic operations
print(f"Shape: {df.shape}")
print(f"Columns: {df.columns.tolist()}")
print(f"Data types:\n{df.dtypes}")

# Count missing values
print(f"\nMissing values:\n{df.isnull().sum()}")

In [None]:
# Selecting data
print("Temperature column:")
print(df["temperature"].head())

print("\nMultiple columns:")
print(df[["sensor_id", "temperature"]].head())

print("\nFiltered data (temp > 23):")
print(df[df["temperature"] > 23].head())
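Filters can be combined. The sketch below uses a tiny invented DataFrame to show `&` (and), `|` (or), and `.loc` for filtering rows and choosing columns in one step. Each condition needs its own parentheses, because `&` and `|` bind more tightly than comparisons.

```python
import pandas as pd

# Small invented dataset for illustration
df = pd.DataFrame({
    "sensor_id": ["S001", "S002", "S001", "S002"],
    "temperature": [21.5, 22.1, 23.4, 24.0],
    "humidity": [45.0, 47.5, 42.5, 45.0],
})

# AND: both conditions must hold
warm_s001 = df[(df["temperature"] > 23) & (df["sensor_id"] == "S001")]
print(warm_s001)

# OR: at least one condition must hold
extremes = df[(df["temperature"] < 22) | (df["temperature"] > 23.5)]
print(extremes)

# .loc[rows, columns]: filter rows and pick columns together
subset = df.loc[df["temperature"] > 23, ["sensor_id", "temperature"]]
print(subset)
```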

### 2.3 Data Cleaning

In [None]:
# Convert timestamp to datetime
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Ensure numeric types
df["temperature"] = pd.to_numeric(df["temperature"])
df["humidity"] = pd.to_numeric(df["humidity"])
df["pressure"] = pd.to_numeric(df["pressure"])

# Check data types
df.dtypes
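By default, `pd.to_numeric` raises an error when it meets text it cannot parse. One common option is `errors="coerce"`, which turns unparseable entries into `NaN` so they can be counted and handled. The sketch below uses an invented DataFrame named `raw` with one deliberately bad value; whether the course dataset contains such values is not known.

```python
import pandas as pd

# Invented data: the second temperature is not a number
raw = pd.DataFrame({
    "timestamp": ["2024-01-01 00:00", "2024-01-01 01:00", "2024-01-01 02:00"],
    "temperature": ["21.5", "error", "23.4"],
})

raw["timestamp"] = pd.to_datetime(raw["timestamp"])

# errors="coerce" replaces unparseable entries with NaN instead of raising
raw["temperature"] = pd.to_numeric(raw["temperature"], errors="coerce")

n_missing = raw["temperature"].isna().sum()
print(f"Values that could not be converted: {n_missing}")

# Drop rows where temperature is missing (one of several possible strategies)
clean = raw.dropna(subset=["temperature"])
print(clean)
```

Dropping rows is only one choice; depending on the analysis, keeping the `NaN` values or filling them may be more appropriate.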

In [None]:
# Sorting data
df_sorted = df.sort_values("temperature", ascending=False)
df_sorted.head()

### 2.4 Basic Calculations

In [None]:
# Aggregations
print(f"Mean temperature: {df['temperature'].mean():.2f}°C")
print(f"Median temperature: {df['temperature'].median():.2f}°C")
print(f"Std deviation: {df['temperature'].std():.2f}°C")

# Group by sensor
print("\nAverage temperature by sensor:")
print(df.groupby("sensor_id")["temperature"].mean())

# Add new column
df["temp_fahrenheit"] = df["temperature"] * 9/5 + 32
df.head()
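`groupby` can also compute several statistics at once with `.agg()`. A minimal sketch on invented readings (the DataFrame below is a stand-in, not the course dataset):

```python
import pandas as pd

# Invented readings for two sensors
df = pd.DataFrame({
    "sensor_id": ["S001", "S002", "S001", "S002", "S001", "S002"],
    "temperature": [21.5, 22.1, 21.9, 22.6, 23.4, 24.0],
})

# One row per sensor, one column per statistic
summary = df.groupby("sensor_id")["temperature"].agg(["mean", "min", "max", "count"])
print(summary.round(2))
```

The result is itself a DataFrame indexed by `sensor_id`, so a single value can be read with, for example, `summary.loc["S001", "max"]`.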

## Part 3: Data Visualization with Matplotlib & Seaborn

### 3.1 Basic Matplotlib Setup

In [None]:
# Set style for scientific plots
plt.style.use('seaborn-v0_8-whitegrid')

# Simple line plot
temperatures = [20.5, 21.0, 22.3, 23.1, 24.0]
time_points = [1, 2, 3, 4, 5]

plt.figure(figsize=(8, 5))
plt.plot(time_points, temperatures, marker='o', linewidth=2)
plt.xlabel("Time (hours)", fontsize=12)
plt.ylabel("Temperature (°C)", fontsize=12)
plt.title("Temperature Over Time", fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

### 3.2 Professional Line Plot

In [None]:
plt.figure(figsize=(10, 6))
plt.plot(df["timestamp"], df["temperature"], 
         marker='o', linewidth=2, markersize=6, color='#2E86AB')
plt.xlabel("Time", fontsize=12, fontweight='bold')
plt.ylabel("Temperature (°C)", fontsize=12, fontweight='bold')
plt.title("Temperature Readings Over Time", fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3, linestyle='--')
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig("temperature_plot.png", dpi=300, bbox_inches='tight')
plt.show()

### 3.3 Scatter Plot

In [None]:
plt.figure(figsize=(8, 6))
plt.scatter(df["temperature"], df["humidity"], 
           s=100, alpha=0.6, color='#A23B72')
plt.xlabel("Temperature (°C)", fontsize=12, fontweight='bold')
plt.ylabel("Humidity (%)", fontsize=12, fontweight='bold')
plt.title("Temperature vs Humidity", fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

### 3.4 Bar Chart

In [None]:
sensor_means = df.groupby("sensor_id")["temperature"].mean()

plt.figure(figsize=(8, 6))
plt.bar(sensor_means.index, sensor_means.values, 
        color='#F18F01', edgecolor='black', linewidth=1.5)
plt.xlabel("Sensor ID", fontsize=12, fontweight='bold')
plt.ylabel("Average Temperature (°C)", fontsize=12, fontweight='bold')
plt.title("Average Temperature by Sensor", fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()

### 3.5 Histogram

In [None]:
plt.figure(figsize=(8, 6))
plt.hist(df["temperature"], bins=20, color='#6A994E', 
         edgecolor='black', alpha=0.7)
plt.xlabel("Temperature (°C)", fontsize=12, fontweight='bold')
plt.ylabel("Frequency", fontsize=12, fontweight='bold')
plt.title("Temperature Distribution", fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()
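The charts so far each use `plt.figure(...)` and draw one plot per figure. To place several plots in one figure, Matplotlib also offers `plt.subplots`, which returns a figure and one `Axes` object per panel. The sketch below is one possible layout, using the short temperature list from section 3.1; the names `ax_line` and `ax_hist` and the choice of 5 bins are illustrative.

```python
import matplotlib.pyplot as plt

time_points = [1, 2, 3, 4, 5]
temperatures = [20.5, 21.0, 22.3, 23.1, 24.0]

# One row, two columns of panels; each panel is its own Axes object
fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(12, 5))

# Left panel: line plot (Axes methods use set_xlabel, set_ylabel, set_title)
ax_line.plot(time_points, temperatures, marker="o", linewidth=2)
ax_line.set_xlabel("Time (hours)")
ax_line.set_ylabel("Temperature (°C)")
ax_line.set_title("Temperature Over Time")

# Right panel: histogram of the same values
ax_hist.hist(temperatures, bins=5, edgecolor="black", alpha=0.7)
ax_hist.set_xlabel("Temperature (°C)")
ax_hist.set_ylabel("Frequency")
ax_hist.set_title("Temperature Distribution")

fig.tight_layout()
```

In a notebook the figure is displayed when the cell finishes; in a plain script, add `plt.show()` at the end.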

### 3.6 Introduction to Seaborn

In [None]:
# Set Seaborn style
sns.set_style("whitegrid")
sns.set_palette("husl")

In [None]:
# Line plot with confidence interval
# (when several readings share a timestamp, seaborn plots their mean with a 95% confidence band)
plt.figure(figsize=(10, 6))
sns.lineplot(data=df, x="timestamp", y="temperature", 
             marker='o', linewidth=2)
plt.xlabel("Time", fontsize=12, fontweight='bold')
plt.ylabel("Temperature (°C)", fontsize=12, fontweight='bold')
plt.title("Temperature Over Time with Confidence Interval", 
          fontsize=14, fontweight='bold')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

In [None]:
# Scatter plot with regression line
plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x="temperature", y="humidity", s=100)
sns.regplot(data=df, x="temperature", y="humidity", 
            scatter=False, color='red')
plt.xlabel("Temperature (°C)", fontsize=12, fontweight='bold')
plt.ylabel("Humidity (%)", fontsize=12, fontweight='bold')
plt.title("Temperature vs Humidity with Trend Line", 
          fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

In [None]:
# Box plot
plt.figure(figsize=(8, 6))
sns.boxplot(data=df, x="sensor_id", y="temperature")
plt.xlabel("Sensor ID", fontsize=12, fontweight='bold')
plt.ylabel("Temperature (°C)", fontsize=12, fontweight='bold')
plt.title("Temperature Distribution by Sensor", 
          fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

In [None]:
# Heatmap (correlation matrix)
plt.figure(figsize=(8, 6))
correlation = df[["temperature", "humidity", "pressure"]].corr()
sns.heatmap(correlation, annot=True, cmap="coolwarm", 
            center=0, square=True, linewidths=1)
plt.title("Correlation Matrix", fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

## Part 4: Hands-on Project

### Task 1: Load and Explore Data

In [None]:
# Your code here
# Load the sensor_data.csv file
# Display first few rows
# Show basic information

### Task 2: Clean Data

In [None]:
# Your code here
# Convert timestamp to datetime
# Ensure numeric types
# Check for missing values

### Task 3: Create Visualizations

#### 3a: Time Series Plot

In [None]:
# Your code here
# Create a line plot showing temperature over time
# Color-code by sensor_id
# Add proper labels and title
# Save as high-resolution image

#### 3b: Distribution Analysis

In [None]:
# Your code here
# Create a histogram of temperature distribution
# Create a box plot showing temperature by sensor
# Use subplots to show both side-by-side

#### 3c: Correlation Analysis

In [None]:
# Your code here
# Create a correlation heatmap
# Include temperature, humidity, and pressure
# Add annotations showing correlation values