# 🐼 Part 1: Pandas Basics - DataFrame vs. Series

**Goal:** Build a working foundation in Pandas, the workhorse library for data science in Python. This notebook covers creating, inspecting, and understanding the two core Pandas data structures.

---
### Key Learning Objectives
1.  Understand the difference between a **DataFrame** (2D table) and a **Series** (1D column).
2.  Learn how to create a simple DataFrame from a Python dictionary.
3.  Use essential attributes for quick data inspection (`.shape`, `.columns`, `.dtypes`).

In [3]:
import pandas as pd
import os

# Build a tiny table from a Python dict
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'London', 'Tokyo', 'Paris']
}
df = pd.DataFrame(data)

print("--- Data Frame Preview ---")
print(df)

--- Data Frame Preview ---
      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   35     Tokyo
3    David   40     Paris


## 2. Creating and Previewing a DataFrame

A Pandas **DataFrame** is the object you will use most often for data analysis. It is a 2-dimensional labeled data structure, similar to a SQL table or a spreadsheet. DataFrames are typically created by loading a file (such as a CSV), but they can also be built directly from a Python dictionary: the keys become the column headers, and the values (lists of equal length) become the column data.
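The two creation routes described above can be sketched side by side. This is a minimal illustration, not part of the original notebook: the CSV text is hypothetical and is held in memory with `io.StringIO` so the example runs without a file on disk. The names `csv_text`, `df_from_csv`, and `df_from_dict` are made up for this sketch.

```python
import io

import pandas as pd

# Hypothetical CSV content, kept in memory so no file is required
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,London
"""

# Route 1: load from a CSV source (a file path works the same way)
df_from_csv = pd.read_csv(io.StringIO(csv_text))

# Route 2: build from a dict; keys become column names,
# and each list supplies the values for that column
df_from_dict = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30],
    'City': ['New York', 'London'],
})

print(df_from_csv)
print(df_from_dict)
```

Both routes should produce a table with the same columns and rows. In practice the dictionary route is handy for small demos and tests, while `pd.read_csv` is the usual entry point for real datasets.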

In [4]:
print('Shape (Rows x Columns):', df.shape)
print('Column Names:', list(df.columns))
print('\nData Types (Dtypes):')
print(df.dtypes)

Shape (Rows x Columns): (4, 3)
Column Names: ['Name', 'Age', 'City']

Data Types (Dtypes):
Name    object
Age      int64
City    object
dtype: object


## 3. Quick Inspection Attributes

These attributes provide essential metadata about the DataFrame without needing to print the entire contents:

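* **`.shape`**: Returns a tuple of (rows, columns), so you know the size of your dataset at a glance.
* **`.columns`**: Returns the column labels as an `Index` object; wrap it in `list()` to get a plain Python list.
* **`.dtypes`**: Shows the data type of each column (e.g., `int64` for integers and `object` for strings in the output above). Data types determine which operations are valid: arithmetic works on numeric columns, while `object` columns are treated as generic Python objects.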
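To see why data types matter for calculations, here is a small sketch using hypothetical data in which ages arrive as text, as they might from a messy file. The frame `raw` and the other variable names are assumptions made for this illustration. It uses `pd.api.types.is_numeric_dtype` to check the column and `pd.to_numeric` to convert it.

```python
import pandas as pd

# Hypothetical data: the ages are strings, not numbers
raw = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': ['25', '30']})

# The column holds text, so it does not count as numeric
numeric_before = pd.api.types.is_numeric_dtype(raw['Age'])
print('Numeric before conversion:', numeric_before)

# Convert the text to numbers so arithmetic behaves as expected
raw['Age'] = pd.to_numeric(raw['Age'])

numeric_after = pd.api.types.is_numeric_dtype(raw['Age'])
mean_age = raw['Age'].mean()
print('Numeric after conversion:', numeric_after)
print('Mean age:', mean_age)
```

Checking `.dtypes` right after loading data is a cheap way to catch this kind of problem before it skews an analysis.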

In [5]:
# Select a single column, which returns a Series
names_series = df['Name']

print("--- 'Name' Column Type ---")
print(type(names_series))  # <class 'pandas.core.series.Series'>

print("\n--- Series Preview (Head) ---")
print(names_series.head())

print("\n--- DataFrame Type ---")
print(type(df))            # <class 'pandas.core.frame.DataFrame'>

--- 'Name' Column Type ---
<class 'pandas.core.series.Series'>

--- Series Preview (Head) ---
0      Alice
1        Bob
2    Charlie
3      David
Name: Name, dtype: object

--- DataFrame Type ---
<class 'pandas.core.frame.DataFrame'>


## 4. Understanding Series

A **Series** is a one-dimensional labeled array: a sequence of values paired with an index. Each column of a DataFrame is a Series, and a Series can also exist on its own.

| Feature | DataFrame | Series |
| :--- | :--- | :--- |
| **Dimensions** | 2D (Rows and Columns) | 1D (Single Column) |
| **Data Type** | One dtype per column, so different columns may have different types | A single dtype for the whole Series (mixed values are stored as `object`) |

When you select a single column using bracket notation (`df['Name']`), Pandas returns a Series object.
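A short sketch of the distinction, using a hypothetical two-row frame named `people` so it does not overwrite the notebook's `df`. It contrasts single-bracket selection (a Series) with double-bracket selection (a one-column DataFrame), and shows a standalone Series with mixed values.

```python
import pandas as pd

# Hypothetical demo frame, separate from the notebook's df
people = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

single = people['Name']    # single brackets: one column label -> Series
table = people[['Name']]   # double brackets: list of labels -> DataFrame

print(type(single).__name__, single.shape)  # Series (2,)
print(type(table).__name__, table.shape)    # DataFrame (2, 1)

# A Series can be created on its own, without any DataFrame
ages = pd.Series([25, 30], name='Age')
print(ages.dtype)

# Mixed values still share one dtype: Pandas falls back to object
mixed = pd.Series([1, 'two', 3.0])
print(mixed.dtype)  # object
```

The shape is the quickest tell: a Series has a one-element shape tuple such as `(2,)`, while a DataFrame always has two elements, even with a single column.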

In [6]:
# --- Summary text: a quick recap built from the DataFrame's own attributes ---
summary = f"""
PANDAS BASICS QUICK REVIEW
--------------------------
Rows × Columns: {df.shape[0]} × {df.shape[1]}
Data Types:
{df.dtypes.to_string()}
"""
print(summary)

# --- File Export ---
# Ensure the target directory exists and save the demo data
os.makedirs('data-visualization/data', exist_ok=True)
df.to_csv('data-visualization/data/part1_demo_people.csv', index=False)
print('\nSaved demo file to: data-visualization/data/part1_demo_people.csv')


PANDAS BASICS QUICK REVIEW
--------------------------
Rows × Columns: 4 × 3
Data Types:
Name    object
Age      int64
City    object


Saved demo file to: data-visualization/data/part1_demo_people.csv
