Objective: This notebook complements your introductory ML lecture. Here, you will learn the absolute basics of Pandas, the primary tool we use to load, look at, and understand our data before we build models.

Dataset: We will use the classic Iris dataset. It's simple, clean, and perfect for practice.

# Step 1: Importing Libraries
### First, we need to import the tools for our job. Run the cell below.

In [None]:
# The package is published on PyPI as "scikit-learn"; it is imported as "sklearn".
%pip install scikit-learn

In [None]:
# Pandas is for data manipulation and analysis. It's our main tool.
import pandas as pd

# NumPy is for numerical operations. Pandas is built on top of it.
import numpy as np

# Matplotlib is for creating visualizations. We'll use it for basic plots.
import matplotlib.pyplot as plt

print("All libraries imported successfully!")

# Step 2: Loading a Dataset
### We need data to work with! We'll use a built-in dataset from the `scikit-learn` library. Don't worry about the `load_iris()` function for now; just know that it gives us a well-known dataset of 150 iris flowers, each described by four measurements.

In [None]:
from sklearn.datasets import load_iris

# Load the iris dataset
iris_data = load_iris()

# Now, let's create a Pandas DataFrame from this data.
# A DataFrame is like a spreadsheet or a SQL table – it's how we store our data.
df = pd.DataFrame(data=iris_data['data'], columns=iris_data['feature_names'])

# Let's see what we've created!
print("Type of the object:", type(df))
print("")
print("Shape of the DataFrame (Rows, Columns):", df.shape)
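The DataFrame above holds only the four measurements. The object returned by `load_iris()` also carries the species of each flower under the `target` and `target_names` keys. As a minimal sketch, the cell below attaches those labels as an extra column; the column name `species` is our own choice for illustration, not something the dataset defines.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris_data = load_iris()
df = pd.DataFrame(data=iris_data['data'], columns=iris_data['feature_names'])

# iris_data['target'] holds an integer class code (0, 1 or 2) for each flower.
# iris_data['target_names'] holds the species name for each code.
# Indexing the names array with the codes gives one species name per row.
# 'species' is a column name chosen here for illustration.
df['species'] = iris_data['target_names'][iris_data['target']]

print("Shape with the new column:", df.shape)
print(df['species'].value_counts())
```

The rest of this notebook works the same way with or without this extra column.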

# Step 3: Your First Look at the Data
### Now that the data is loaded, let's see what it looks like.

In [None]:
# .head() shows the first 5 rows of the DataFrame by default. This is your first look at the data.
df.head()

In [None]:
# .tail() shows the last 5 rows by default.
df.tail()
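Both methods accept an optional number of rows, which is handy when five rows is too many or too few. A small sketch (the variable names `first_three` and `last_two` are just for illustration):

```python
import pandas as pd
from sklearn.datasets import load_iris

iris_data = load_iris()
df = pd.DataFrame(data=iris_data['data'], columns=iris_data['feature_names'])

# Pass a number to control how many rows you get back.
first_three = df.head(3)
last_two = df.tail(2)

print(first_three)
print(last_two)
```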

# Step 4: Getting Basic Information
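### The `.info()` method gives us a technical summary of the DataFrame: the number of rows, each column's name, its count of non-missing values, its data type, and the memory used. This is crucial for understanding your data's structure.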

In [None]:
df.info()
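If you only need one piece of that summary, you can ask for it directly. The sketch below pulls out the data types and counts missing values per column; `missing_per_column` is a name made up for this example.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris_data = load_iris()
df = pd.DataFrame(data=iris_data['data'], columns=iris_data['feature_names'])

# .dtypes lists the data type of every column.
print(df.dtypes)

# .isnull() marks missing values as True; summing counts them per column.
missing_per_column = df.isnull().sum()
print(missing_per_column)
```

For Iris every count should be zero, which is part of why the dataset is described as clean.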

# Step 5: Getting Basic Statistics
### Machine learning is about understanding patterns in numbers. The `.describe()` method gives a quick statistical summary of all numerical columns: count, mean, standard deviation, minimum, quartiles, and maximum.

In [None]:
df.describe()
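Each number in that table can also be computed on its own. As a quick check, the sketch below compares two values from `.describe()` with the matching single-statistic methods; `summary` and `col` are illustrative names.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris_data = load_iris()
df = pd.DataFrame(data=iris_data['data'], columns=iris_data['feature_names'])

summary = df.describe()
col = 'sepal length (cm)'

# .loc[row_label, column_label] picks a single cell out of the summary table.
print("Mean from describe():", summary.loc['mean', col])
print("Mean from .mean():   ", df[col].mean())

# The '50%' row of describe() is the median.
print("Median from describe():", summary.loc['50%', col])
print("Median from .median(): ", df[col].median())
```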

# Step 6: Selecting Data
### We often need to work with specific columns or rows. Let's see how to do that.
### Selecting a Single Column
### You can select a column by using its name, like this: `df['column_name']`.

In [None]:
# Select the 'sepal length (cm)' column
sepal_lengths = df['sepal length (cm)']
print("Type of the selected column:", type(sepal_lengths))
print("First 10 values:")
print(sepal_lengths.head(10))

### Selecting Multiple Columns
### To select multiple columns, you pass a list of column names: `df[['col1', 'col2']]`.

In [None]:
# Select just the 'sepal' columns
sepal_data = df[['sepal length (cm)', 'sepal width (cm)']]
sepal_data.head()
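Rows can be selected too. The sketch below shows two common ways: by position with `.iloc`, and by a condition (often called boolean filtering). The 7.0 cm threshold and the variable names are arbitrary choices for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris_data = load_iris()
df = pd.DataFrame(data=iris_data['data'], columns=iris_data['feature_names'])

# .iloc selects by position. A single position returns one row as a Series.
first_row = df.iloc[0]
print(first_row)

# A slice returns several rows. The end position is excluded, so 10:15 gives rows 10 to 14.
rows_10_to_14 = df.iloc[10:15]
print(rows_10_to_14)

# A condition inside the brackets keeps only the rows where it is True.
long_sepals = df[df['sepal length (cm)'] > 7.0]
print("Flowers with sepal length above 7.0 cm:", len(long_sepals))
```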

# Step 7: Simple Data Visualization
### Let's make a simple plot to see the distribution of a feature. This is a basic form of Exploratory Data Analysis (EDA).

In [None]:
# Let's create a histogram of the 'sepal length (cm)' column.
# A histogram splits the range of values into intervals (bins) and shows how many values fall into each one.

plt.figure(figsize=(8, 5)) # This sets the size of the plot
plt.hist(df['sepal length (cm)'], color='skyblue', edgecolor='black', bins=15)
plt.title('Distribution of Sepal Lengths')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Count')
plt.show()
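To see the numbers behind the bars, you can compute the same bins with NumPy. This is a sketch using `np.histogram` with the same `bins=15` setting as the plot; it returns the count in each bin and the bin edges.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

iris_data = load_iris()
df = pd.DataFrame(data=iris_data['data'], columns=iris_data['feature_names'])

# counts has one entry per bin; bin_edges has one more entry than counts,
# because 15 bins need 16 boundaries.
counts, bin_edges = np.histogram(df['sepal length (cm)'], bins=15)

# Pair each bin's left and right edge with its count.
for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:.2f} to {right:.2f} cm: {count} flowers")
```

The counts add up to the number of rows, since every flower lands in exactly one bin.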