# Introduction to Pandas and DataFrames

### What is Pandas?

**Pandas** is one of the most widely used Python libraries for **data analysis and manipulation**. It helps us work with **structured data** (think: tables like Excel or CSV files), and it’s one of the first tools we use in our AI/ML journey.

Pandas gives us easy access to:

- Rows and columns (like a spreadsheet)
- Fast filtering, grouping, sorting
- Handling missing values
- Working with real-world datasets

It’s built on top of **NumPy**, so column operations run as fast, vectorized array operations, and its data converts easily into the arrays that machine learning libraries like **Scikit-learn**, **TensorFlow**, and **PyTorch** expect.
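
The NumPy connection can be seen in a minimal sketch like the one below. The `df_demo` table and its values are made up for illustration and are not taken from any real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for this example.
df_demo = pd.DataFrame({"Age": [22.0, 38.0, 26.0], "Fare": [7.25, 71.28, 7.92]})

# A numeric column can be handed over as a NumPy array.
ages = df_demo["Age"].to_numpy()
print(type(ages), ages.dtype)

# NumPy functions accept pandas objects directly and return pandas objects.
log_fare = np.log1p(df_demo["Fare"])
print(type(log_fare))

# The whole table converts to a 2D array, the shape most ML libraries work with.
X = df_demo.to_numpy()
print(X.shape)
```

Calling `to_numpy()` is usually the last step before passing data to a model, once the cleaning has been done in Pandas.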

### Why Do We Use Pandas?

- **Easy CSV & Excel Import/Export**
    
    Load and save datasets with a single line.
    
- **Real-World Data Handling**
    
    Missing values, duplicates, messy column names? Pandas has dedicated methods for each, such as `fillna`, `dropna`, `drop_duplicates`, and `rename`.
    
- **Preprocessing for AI/ML**
    
    Prepare data for models: filtering, feature engineering, splitting, etc.
    
- **Readable & Powerful Syntax**
    
    Method calls can be chained, so a multi-step transformation often reads top to bottom like a short description of what it does.
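
As a rough illustration of the data-cleaning and chaining points above, here is one possible cleanup pipeline. The `raw` table is hypothetical, and filling missing ages with the median is just one common choice, not a recommendation for every dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical messy data: one duplicated row, missing ages,
# and a column name with a stray trailing space.
raw = pd.DataFrame({
    "Name ": ["Ann", "Bob", "Bob", "Cara"],
    "Age": [29.0, np.nan, np.nan, 41.0],
    "Pclass": [1, 3, 3, 2],
})

clean = (
    raw
    .rename(columns=str.strip)              # "Name " becomes "Name"
    .drop_duplicates()                      # remove the repeated Bob row
    .fillna({"Age": raw["Age"].median()})   # fill missing ages with the median
    .sort_values("Age")                     # order rows by age
    .reset_index(drop=True)                 # renumber rows 0..n-1
)
print(clean)
```

Each step returns a new DataFrame, which is what makes the chain possible; `raw` itself is left unchanged.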
    

### Importing Pandas

In [1]:
import pandas as pd

By convention, Pandas is imported under the short alias `pd`; nearly all documentation and tutorials use it.

### Loading a Real Dataset (Titanic)

We’ll use the **Kaggle Titanic Dataset** (specifically `train.csv`). It’s a great starter dataset: each row is a passenger, and the columns hold details such as age, sex, and ticket class. We’ll use this one file to learn all the important Pandas concepts.

In [2]:
df = pd.read_csv('data/train.csv')
print(type(df))

<class 'pandas.core.frame.DataFrame'>
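
If `data/train.csv` is not available locally, the same loading step can be tried on an in-memory sample. The sketch below assumes a few made-up rows that only mimic the layout of the Titanic file; the values are illustrative, not real passenger records.

```python
import io
import pandas as pd

# Invented rows that imitate the structure of train.csv.
csv_text = """PassengerId,Survived,Pclass,Sex,Age
1,0,3,male,22
2,1,1,female,38
3,1,3,female,
"""

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)          # (number of rows, number of columns)
print(sample.head())         # first rows of the table
print(sample.isna().sum())   # count of missing values per column

# Saving works the same way in reverse; index=False omits the row labels.
buffer = io.StringIO()
sample.to_csv(buffer, index=False)
```

The empty `Age` field in the last row is read as a missing value (`NaN`), which is how Pandas represents gaps in numeric columns.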


### What's a DataFrame?

A **DataFrame** is the main object in Pandas: a two-dimensional table with labels for both rows and columns, much like an Excel sheet. Each column is a **Series**, a one-dimensional labeled array of values.

In [3]:
print(type(df))
print(type(df['Age']))

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.series.Series'>
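
To make the DataFrame and Series distinction more concrete, here is a small sketch built from a Python dictionary. The `people` table, its names, and its row labels are hypothetical.

```python
import pandas as pd

# Hypothetical mini table; names and values are made up.
people = pd.DataFrame(
    {"Name": ["Ann", "Bob", "Cara"], "Age": [29, 35, 41]},
    index=["a", "b", "c"],   # row labels
)

ages = people["Age"]         # single brackets give a Series (1D)
ages_df = people[["Age"]]    # double brackets give a DataFrame (2D)

print(type(ages).__name__, ages.shape)
print(type(ages_df).__name__, ages_df.shape)

# A Series keeps the row labels and the column name it came from.
print(ages["b"])
print(ages.name)
```

The single versus double bracket difference is a frequent source of confusion: some functions expect 2D input, so `people[["Age"]]` is the safer form when a table with one column is needed.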


This structure makes it super easy to:

- Select columns: `df['Age']`
- Filter rows: `df[df['Age'] > 30]`
- Group, sort, modify, and clean
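
The operations in the list above can be sketched on a tiny table. The `passengers` DataFrame below reuses a few Titanic column names, but its values are invented for illustration.

```python
import pandas as pd

# Hypothetical passengers; values are made up.
passengers = pd.DataFrame({
    "Pclass": [1, 3, 3, 2, 1],
    "Age": [38, 22, 35, 27, 54],
    "Survived": [1, 0, 0, 1, 0],
})

# Filter rows with a boolean mask.
over_30 = passengers[passengers["Age"] > 30]

# Combine conditions with & (and) or | (or); each condition needs parentheses.
first_class_over_30 = passengers[
    (passengers["Age"] > 30) & (passengers["Pclass"] == 1)
]

# Group by ticket class and average the survival flag.
survival_by_class = passengers.groupby("Pclass")["Survived"].mean()

# Sort from oldest to youngest.
oldest_first = passengers.sort_values("Age", ascending=False)

print(over_30)
print(survival_by_class)
```

Filtering returns a new DataFrame with the matching rows, so the original `passengers` table stays intact.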

We’ll dig deeper into how to explore and analyze this data in later topics.

### Summary

Pandas is the **foundation of tabular data handling** in Python, just as NumPy is the base for numeric operations. It gives us the **DataFrame**, which is our primary tool to explore, clean, transform, and analyze data in rows and columns.

In this first step, we:

- Imported Pandas and loaded a real-world CSV dataset (`train.csv`)
- Learned that `pd.read_csv` returns the dataset as a DataFrame, a table-like structure
- Got a feel for what kind of data we're working with (names, ages, survival, ticket class)
- Understood the difference between a **DataFrame** (2D table) and a **Series** (1D column)

This sets us up for more advanced topics like data creation, filtering, grouping, and analysis — all crucial steps before building any AI/ML model.