# Lab 01: Getting Started with Python Libraries for Data Visualization

**Objectives**

- Understand lab tools and environments.
- Learn basic operations with NumPy, Pandas, Matplotlib, and Seaborn.
- Understand common types of data and simple ways to visualize them.

**Tools Used**

- Google Colab
- NumPy
- Pandas
- Matplotlib
- Seaborn

**Datasets Used**

NYC Weather Dataset (nyc_weather)
This dataset contains daily weather records for New York City and is sourced from Kaggle (https://www.kaggle.com/datasets/jacob55/nyc-weather). It includes observations such as maximum and minimum temperatures (in Fahrenheit), precipitation, snowfall, and snow depth for each day. The dataset helps students work with real-world data involving missing values, continuous variables, and temporal attributes. It will be used in this lab to demonstrate data loading, inspection, and basic visualization using Pandas and Matplotlib. This dataset will be accessed by uploading it to Google Drive and loading it from there.

Tips Dataset (tips)
The tips dataset is a built-in dataset provided by the Seaborn library. It contains information about restaurant bills and tips, along with attributes such as the gender of the customer, smoking status, day of the week, time of the meal, and party size. This dataset is often used for introductory data visualization tasks because of its mix of numerical and categorical variables. In this lab, students will use it to create histograms and scatter plots with color encoding based on categorical values. The dataset will be loaded directly using Seaborn's load_dataset function.

Iris Dataset (iris)
The iris dataset is another classic dataset included in the Seaborn library. It contains measurements of iris flowers from three species: setosa, versicolor, and virginica. Each entry includes values for sepal length, sepal width, petal length, and petal width. This dataset is widely used for demonstrating basic statistical analysis and visualization due to its clean structure and balanced class labels. In this lab, students will use it to construct a correlation heatmap and gain insights into the relationships between numerical features.

# Load Files

In [None]:
# !unzip /content/lab01.zip

# NumPy

NumPy (short for Numerical Python) is a powerful Python library used for numerical and scientific computing. It provides efficient tools for working with large arrays, matrices, and performing mathematical operations on them.

A NumPy array (also called ndarray) is a fast and flexible container for storing and manipulating numerical data. It looks like a list or list of lists, but all of its elements share a single data type, and arithmetic is applied to the whole array at once instead of through explicit Python loops.
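As a rough illustration of how an array differs from a plain list, the sketch below applies the same `* 2` to a list and to an array built from it. The variable names (`values`, `arr`, `repeated`, `scaled`) are made up for this example.

```python
import numpy as np

values = [1, 2, 3]          # plain Python list
arr = np.array(values)      # ndarray built from the list

# On a list, * repeats the sequence; on an array, it scales every element.
repeated = values * 2
scaled = arr * 2

print(repeated)   # [1, 2, 3, 1, 2, 3]
print(scaled)     # [2 4 6]
print(arr.dtype)  # one integer dtype for all elements (width is platform-dependent)
```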

## Import NumPy

In [None]:
import numpy as np

## Task: Given two matrices A and B (of the same dimensions), find C = 2A - 3B

In [None]:
# Pure-Python approach: visit every row and column with nested loops
a = [[3, 4], [7, 8]]
b = [[1, 2], [5, 6]]

c = []

for i in range(len(a)):
    row = []
    for j in range(len(a[i])):
        value = 2 * a[i][j] - 3 * b[i][j]
        row.append(value)
    c.append(row)

print(c)

In [None]:
# NumPy approach: the same arithmetic is applied element-wise, with no loops
a_np = np.array([[3, 4], [7, 8]])
b_np = np.array([[1, 2], [5, 6]])
c_np = 2 * a_np - 3 * b_np
print(c_np)

## Creating NumPy Array

In [None]:
np.array([1, 2, 3])

In [None]:
np.zeros((2, 3))

In [None]:
np.ones((3, 3))

In [None]:
np.eye(4)

In [None]:
np.random.rand(2, 2)

In [None]:
np.random.randn(2, 2)

In [None]:
np.arange(0, 10, 2)

In [None]:
np.linspace(0, 10, 5)

## Array Properties

In [None]:
a = np.random.rand(3, 4)

print(a)

In [None]:
print(a.shape)
print(a.dtype)
print(a.size)

## Indexing and Slicing

In [None]:
a = np.array([[10, 20, 30], [40, 50, 60]])

In [None]:
print(a)

In [None]:
print(a[0, 2])

In [None]:
print(a[:, 1])

In [None]:
print(a[1, :])

In [None]:
print(a[0:2, 1:3])

In [None]:
print(a[a > 25])

## Operations and Broadcasting

In [None]:
a = np.array([[1, 2], [3, 4]])
print(a)

In [None]:
print(a + 10)

In [None]:
print(a * 2)

In [None]:
print(a @ a)

In [None]:
print(a.T)

In [None]:
# -1 asks NumPy to infer that dimension from the array's size
print(a.reshape(2, -1))

In [None]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a.dot(b))
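The cells above only combine an array with a scalar or with an array of the same shape. A minimal sketch of broadcasting between different shapes follows; the arrays `matrix`, `row`, and `col` are made up for this example.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])     # shape (2, 3)
row = np.array([10, 20, 30])       # shape (3,): added to every row
col = np.array([[100], [200]])     # shape (2, 1): added to every column

row_sum = matrix + row
col_sum = matrix + col

print(row_sum)
print(col_sum)
```

NumPy compares shapes from the trailing dimension backwards; two dimensions are compatible when they are equal or one of them is 1.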

## Useful Functions

In [None]:
a = np.array([[1, 5], [2, 6]])
print(a)

In [None]:
print(a.min())

In [None]:
print(np.max(a))

In [None]:
print(np.mean(a))

In [None]:
print(np.argmax(a))
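Note that on a 2D array, argmax returns a position in the flattened array rather than a (row, column) pair. The sketch below shows one way to convert it, and how the `axis` argument restricts a reduction to columns or rows. The result names (`flat_index`, `col_max`, `row_mean`) are chosen for this example.

```python
import numpy as np

a = np.array([[1, 5], [2, 6]])

flat_index = np.argmax(a)                          # position in the flattened array
row, col = np.unravel_index(flat_index, a.shape)   # convert to (row, column)

col_max = a.max(axis=0)    # maximum of each column
row_mean = a.mean(axis=1)  # mean of each row

print(flat_index)   # 3
print(row, col)     # 1 1
print(col_max)      # [2 6]
print(row_mean)     # [3. 4.]
```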

## Task

- Create a 3x3 matrix A of random integers between 0 and 9.
- Transpose the matrix A and keep the result in B.
- Replace all values of B below the mean with 0 and keep the result in C.
- Multiply this C matrix with the original A matrix.

In [None]:
# YOUR CODE HERE

# Pandas

A Pandas DataFrame is a 2D labeled data structure whose columns can hold different data types, similar to an Excel sheet or SQL table.
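A small sketch of what "labeled" and "different types" mean in practice. The table `sample` and its values are hypothetical and serve only to show mixed column types.

```python
import pandas as pd

# Hypothetical values, chosen only to show mixed column types.
sample = pd.DataFrame({
    "day": ["1/1/2017", "1/2/2017"],   # text
    "temperature": [32, 35],           # integers
    "windspeed": [6.5, 7.0],           # floats
})

print(sample.dtypes)          # one dtype per column
print(list(sample.columns))   # column labels
print(list(sample.index))     # default row labels: [0, 1]
```

The exact dtype reported for the text column depends on the installed pandas version.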

## Import Pandas

In [None]:
import pandas as pd

## Creating DataFrame

### From Dictionary

In [None]:
weather_data = {
 'day': ['1/1/2017','1/2/2017','1/3/2017','1/4/2017','1/5/2017','1/6/2017'],
 'temperature': [32,35,28,24,32,31],
 'windspeed': [6,7,2,7,4,2],
 'event': ['Rain', 'Sunny', 'Snow','Snow','Rain', 'Sunny']
}

df = pd.DataFrame(weather_data)
display(df)

In [None]:
print(df.shape)

### From List of Tuples

In [None]:
weather_data = [
 ('1/1/2017',32,6,'Rain'),
 ('1/2/2017',35,7,'Sunny'),
 ('1/3/2017',28,2,'Snow')
]

df = pd.DataFrame(data=weather_data, columns=['day','temperature','windspeed','event'])

In [None]:
df

### From List of Dictionaries

In [None]:
weather_data = [
 {'day': '1/1/2017', 'temperature': 32, 'windspeed': 6, 'event': 'Rain'},
 {'day': '1/2/2017', 'temperature': 35, 'windspeed': 7, 'event': 'Sunny'},
 {'windspeed': 2, 'day': '1/3/2017', 'temperature': 28, 'event': 'Snow'},
]

df = pd.DataFrame(data=weather_data, columns=['day','temperature','windspeed','event'])

In [None]:
df

### From CSV File

In [None]:
df = pd.read_csv('/content/lab01/nyc_weather.csv')

In [None]:
print(df.shape)

In [None]:
df.head(10)

## Viewing Columns

In [None]:
df[['Temperature']].head()

In [None]:
df[['EST', 'Temperature']].head()

In [None]:
df.EST

In [None]:
print(df.columns)

## Viewing Rows

In [None]:
df[1:5]

In [None]:
df[5:10]

In [None]:
df[2:]

## Operations on DataFrame

In [None]:
df['Temperature'].max()

In [None]:
df['Temperature'].min()

In [None]:
df['Temperature'].mean()

In [None]:
df[df['Temperature'] > 40]

In [None]:
df[df['Temperature'] == df['Temperature'].max()]

In [None]:
df.describe()

## Task

From the nyc_weather dataset, find the average WindSpeedMPH of days where the temperature is greater than or equal to 30.

In [None]:
# YOUR CODE HERE

# Matplotlib

Matplotlib is a versatile plotting library in Python. It allows the creation of static, animated, and interactive visualizations. The most commonly used module is pyplot.
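Functions such as plt.plot and plt.title act on a "current" figure and axes that pyplot tracks behind the scenes. The sketch below makes those two objects explicit, which is the same pattern used for subplots later in this lab. The data and the title text are made up for this example.

```python
import matplotlib.pyplot as plt

# plt.subplots() creates a Figure (the canvas) and an Axes (one plot area).
fig, ax = plt.subplots()

ax.plot([1, 2, 3], [1, 4, 9])   # same kind of call as plt.plot, on an explicit Axes
ax.set_title("Explicit Axes")   # counterpart of plt.title for this Axes
ax.set_xlabel("x")
ax.set_ylabel("y")

plt.show()
```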

## Import Matplotlib

In [None]:
import matplotlib.pyplot as plt

## Basic Plotting

In [None]:
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

plt.plot(x, y)
plt.title("Line Plot Example")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.grid(False)
plt.show()

## Bar Plot

In [None]:
categories = ["A", "B", "C"]
values = [4, 7, 1]

plt.bar(categories, values, color='skyblue')
plt.title("Bar Plot")
plt.show()

## Histogram

In [None]:
data = np.random.randn(1000)

plt.hist(data, bins=30, color='orange', edgecolor='black')
plt.title("Histogram of Normally Distributed Data")
plt.show()

## Scatter Plot

In [None]:
x = np.random.randn(50)
y = np.random.randn(50)

plt.scatter(x, y, color='green')
plt.title("Scatter Plot")
plt.xlabel("X")
plt.ylabel("Y")
plt.show()

## Customization and Subplots

In [None]:
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)

plt.plot(x, y1, label='sin(x)')
plt.plot(x, y2, label='cos(x)')
plt.legend()
plt.title("Sine and Cosine")
plt.show()

In [None]:
fig, axs = plt.subplots(1, 2, figsize=(10, 4))
axs[0].plot(x, y1)
axs[0].set_title("Sine")
axs[1].plot(x, y2)
axs[1].set_title("Cosine")
plt.show()

## Task

- Create a line plot of any quadratic function.
- Generate a histogram of 500 uniform random numbers.
- Create a figure with two subplots: the line plot and the histogram.

In [None]:
# YOUR CODE HERE

# Seaborn

Seaborn is built on top of Matplotlib and tightly integrates with Pandas. It provides a high-level interface for drawing attractive and informative statistical graphics.
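A minimal sketch of both points, assuming a tiny hand-made DataFrame (`df_small`, with hypothetical values) in place of a real dataset: columns are passed by name, and the object Seaborn returns is a regular Matplotlib Axes that accepts the usual Matplotlib methods.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.axes import Axes

# Hypothetical mini-table; the column names mirror the tips dataset.
df_small = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.5, 3.0, 4.0, 6.5],
    "sex": ["Female", "Male", "Female", "Male"],
})

# Columns are referenced by name; Seaborn returns a Matplotlib Axes.
ax = sns.scatterplot(data=df_small, x="total_bill", y="tip", hue="sex")
ax.set_title("Seaborn plot, Matplotlib Axes")  # regular Matplotlib method

print(isinstance(ax, Axes))
plt.show()
```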

## Import Seaborn

In [None]:
import seaborn as sns

## Load Built-in Dataset

In [None]:
tips = sns.load_dataset("tips")
tips.head()

## Histogram

In [None]:
sns.histplot(tips["total_bill"], kde=True)
plt.title("Distribution of Total Bill")
plt.xlabel("Total Bill")
plt.ylabel("Frequency")
plt.show()

## Scatter Plot

In [None]:
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="sex")
plt.title("Tip vs Total Bill by Gender")
plt.xlabel("Total Bill")
plt.ylabel("Tip")
plt.show()

## Correlation Heatmap

In [None]:
iris = sns.load_dataset("iris")

corr = iris.corr(numeric_only=True)

sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.title("Correlation Heatmap (Iris Dataset)")
plt.show()

## Task

Use the tips dataset to:
- Plot a histogram of the tip column.
- Create a scatter plot of tip vs size, colored by 'smoker'.

Use the iris dataset to:
- Create a correlation heatmap and describe any two strong correlations you find.

In [None]:
# YOUR CODE HERE