# Level 1: Introduction & Basics

## 1.1 What is Pandas?

Pandas is a powerful, open-source data analysis and manipulation library for Python. It is built on top of the NumPy library, so many column-wise operations run as vectorized array operations instead of Python loops, which keeps them fast on large datasets.
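
As a rough illustration of the NumPy connection, the sketch below applies arithmetic to a whole Series at once and then exposes the underlying array. The variable names and sample values are assumptions chosen for this example, not part of any dataset used later.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, chosen only for illustration.
ages = pd.Series([25, 30, 35, 40])

# Arithmetic is applied to every element at once; no explicit Python loop.
ages_next_year = ages + 1

# For a plain integer Series like this one, the values live in a NumPy array.
backing = ages.to_numpy()

print(type(backing).__name__)
print(ages_next_year.tolist())
```

The same element-wise style works for comparisons (`ages > 30`) and for most arithmetic between columns.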

### Use Cases:
- **Data Cleaning:** Handling missing data, removing duplicates, and transforming data into a usable format.
- **Data Analysis:** Calculating statistics, aggregating data, and exploring relationships between variables.
- **Data Transformation:** Reshaping datasets, merging multiple sources, and preparing data for machine learning models.

### Key Data Structures:
Pandas introduces two main data structures:
- **Series:** A one-dimensional labeled array capable of holding any data type.
- **DataFrame:** A two-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or a SQL table.
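
The two structures described above can be sketched as follows. This is a minimal example under assumed names (`ages`, `people`); it only shows how a Series and a DataFrame are built and how they relate.

```python
import pandas as pd

# A Series: one-dimensional values plus an index of labels.
ages = pd.Series([25, 30, 35], index=['Alice', 'Bob', 'Charlie'], name='Age')
print(ages['Bob'])  # look up a value by its label

# A DataFrame: columns of different types that share one row index.
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],  # text column
    'Age': [25, 30, 35],                  # integer column
})
print(people.shape)  # (number of rows, number of columns)

# Selecting a single column of a DataFrame gives back a Series.
age_column = people['Age']
print(type(age_column).__name__)
```

In practice, a DataFrame can be thought of as a collection of Series that share the same index.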

## 1.2 Installation & Setup

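To use Pandas, you first need to install it. You can do this using `pip`, the Python package installer. The leading `!` in the cell below tells Jupyter to run the command in the system shell; from a terminal, run `pip install pandas` without it.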

In [1]:
!pip install pandas




Once installed, you need to import it into your Python script or Jupyter Notebook. The standard convention is to import it with the alias `pd`.

In [2]:
import pandas as pd

You can check which version of Pandas is installed. The number printed depends on your environment.

In [3]:
print(pd.__version__)

2.3.1


## 1.3 Jupyter & Data Workflow

Jupyter Notebooks, VS Code with the Jupyter extension, and Google Colab are popular environments for working with Pandas. They allow you to write and execute code in cells, making it easy to experiment and see the results of your data manipulation immediately.

Let's create a simple DataFrame to see how it works.

In [4]:
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva', 'Frank', 'Grace', 'Henry'],
    'Age': [25, 30, 35, 40, 22, 45, 28, 50],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix', 'Philadelphia', 'San Antonio', 'San Diego']
}

df = pd.DataFrame(data)

print("DataFrame created successfully!")

DataFrame created successfully!


### Viewing DataFrames
When working with large datasets, you'll want to inspect a small portion of the data rather than printing the entire DataFrame. Pandas provides several useful methods for this:

#### `.head()`
The `.head()` method returns the first `n` rows of the DataFrame (by default, `n=5`).

In [5]:
df.head()

|   | Name | Age | City |
|---|------|-----|------|
| 0 | Alice | 25 | New York |
| 1 | Bob | 30 | Los Angeles |
| 2 | Charlie | 35 | Chicago |
| 3 | David | 40 | Houston |
| 4 | Eva | 22 | Phoenix |
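
To show how the `n` argument changes what `.head()` returns, here is a small self-contained sketch. It rebuilds a reduced copy of the example data under the assumed name `people` so it can run on its own.

```python
import pandas as pd

# Reduced copy of the example data; `people` is a name assumed for this sketch.
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva', 'Frank', 'Grace', 'Henry'],
    'Age': [25, 30, 35, 40, 22, 45, 28, 50],
})

# Ask for the first 3 rows instead of the default 5.
first_three = people.head(3)

# A negative n returns every row except the last |n| rows.
all_but_last_two = people.head(-2)

print(first_three)
print(len(all_but_last_two))
```

`.head()` returns a new DataFrame and leaves the original unchanged, so it is safe to call as often as needed while exploring.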


#### `.tail()`
The `.tail()` method returns the last `n` rows of the DataFrame (by default, `n=5`).

In [6]:
df.tail()

|   | Name | Age | City |
|---|------|-----|------|
| 3 | David | 40 | Houston |
| 4 | Eva | 22 | Phoenix |
| 5 | Frank | 45 | Philadelphia |
| 6 | Grace | 28 | San Antonio |
| 7 | Henry | 50 | San Diego |
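
`.tail()` accepts the same `n` argument. The sketch below, again using an assumed stand-alone DataFrame named `people`, also shows that the returned rows keep their original index labels.

```python
import pandas as pd

# Reduced copy of the example data; `people` is a name assumed for this sketch.
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva', 'Frank', 'Grace', 'Henry'],
    'Age': [25, 30, 35, 40, 22, 45, 28, 50],
})

# Ask for the last 2 rows instead of the default 5.
last_two = people.tail(2)

# The rows keep the index labels they had in the full DataFrame.
print(last_two.index.tolist())
print(last_two['Name'].tolist())
```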


#### `.sample()`
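The `.sample()` method returns a random sample of rows from the DataFrame. Because the rows are chosen at random, your output will usually differ from the one shown here and from one run to the next.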

In [7]:
df.sample(3)  # Get 3 random rows

|   | Name | Age | City |
|---|------|-----|------|
| 3 | David | 40 | Houston |
| 4 | Eva | 22 | Phoenix |
| 5 | Frank | 45 | Philadelphia |
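
If you need the same random rows on every run, for example in a tutorial or a test, `.sample()` takes a `random_state` seed. The sketch below is one way to use it; the DataFrame name `people` and the seed values are arbitrary assumptions.

```python
import pandas as pd

# Reduced copy of the example data; `people` is a name assumed for this sketch.
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva', 'Frank', 'Grace', 'Henry'],
    'Age': [25, 30, 35, 40, 22, 45, 28, 50],
})

# The same seed gives the same rows each time (the seed value is arbitrary).
seeded_a = people.sample(3, random_state=42)
seeded_b = people.sample(3, random_state=42)
print(seeded_a.equals(seeded_b))

# frac requests a fraction of the rows instead of a fixed count.
half = people.sample(frac=0.5, random_state=0)
print(len(half))
```

By default, sampling is done without replacement, so no row appears twice in the result.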
