In [1]:
import numpy as np
import pandas as pd
import mercury as mr
app = mr.App(description="it goes left to right",
             show_code = True,
             static_notebook=False)


# Introduction to Pandas

In this chapter, we will introduce you to the basics of Pandas, a powerful Python library for data manipulation and analysis.

## What is Pandas?

Pandas is an open-source data manipulation and analysis library for Python. It provides data structures and functions for working with structured data, such as spreadsheets, SQL tables, and time-series data. Some of its key features include:

- **DataFrame:** A two-dimensional, size-mutable, and heterogeneous tabular data structure, similar to a spreadsheet or SQL table.

- **Series:** A one-dimensional labeled array capable of holding data of any type.

- **Data Cleaning:** Pandas allows you to clean, transform, and preprocess data efficiently.

- **Data Analysis:** It provides powerful tools for data analysis, including grouping, aggregation, and statistical analysis.

- **Data Visualization:** While Pandas has some basic plotting capabilities, it can be combined with libraries like Matplotlib and Seaborn for more advanced visualizations.
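The cleaning and analysis features listed above can be sketched in a few lines. The `sales` table, its column names, and its values are invented purely for illustration; this is a minimal sketch, not a recommended cleaning strategy.

```python
import pandas as pd

# Hypothetical sales data, invented for illustration only
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "amount": [100.0, None, 150.0, 200.0],
})

# Data cleaning: replace the missing amount with 0.0
cleaned = sales.fillna({"amount": 0.0})

# Data analysis: group rows by region and sum the amounts
totals = cleaned.groupby("region")["amount"].sum()
print(totals)
```

Filling missing values with zero is only one option; depending on the data, dropping the row or using another fill value may be more appropriate.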


# Key Data Structures
## DataFrame
A DataFrame is the primary data structure in Pandas. It is a two-dimensional, size-mutable, and heterogeneous tabular data structure with labeled axes (rows and columns). You can think of it as a spreadsheet or a SQL table.
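To make "heterogeneous" and "size-mutable" concrete, here is a small sketch. The `people` table and the `Senior` column are hypothetical names chosen for this example.

```python
import pandas as pd

people = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# Heterogeneous: each column carries its own dtype
print(people.dtypes)

# Size-mutable: a new column can be added to an existing DataFrame
people["Senior"] = people["Age"] >= 30
print(people.shape)
```

After the assignment the DataFrame has three columns: one holding text, one holding integers, and one holding booleans.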

## Series
A Series is a one-dimensional labeled array capable of holding data of any type. It behaves like a single column of a DataFrame, and it can represent a single variable or one row or column taken from a DataFrame.
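The relationship between a Series and a DataFrame can be checked directly. In this sketch the `people` table is a hypothetical example; the point is that both a column and a row come back as a Series, differing only in what labels them.

```python
import pandas as pd

people = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# A column is a Series labeled by the row index
age_column = people["Age"]
print(type(age_column).__name__, list(age_column.index))

# A row is also a Series, labeled by the column names
first_row = people.iloc[0]
print(type(first_row).__name__, list(first_row.index))
```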

## Creating DataFrames and Series
You can create DataFrames and Series from various data sources, including:

- Lists or NumPy arrays
- Dictionaries
- CSV files
- Excel spreadsheets
- SQL databases

Let's see some basic examples:

In [2]:
# Creating a DataFrame from a dictionary
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35]}
df = pd.DataFrame(data)

# Creating a Series from a list
numbers = [1, 2, 3, 4, 5]
series = pd.Series(numbers)
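Two of the other sources listed above, NumPy arrays and CSV data, can be sketched the same way. To keep the example self-contained, the CSV content is an in-memory string rather than a file on disk; the column names `a` and `b` and the CSV text are assumptions made for illustration.

```python
import io

import numpy as np
import pandas as pd

# From a NumPy array: column names are supplied explicitly
array_df = pd.DataFrame(np.array([[1, 2], [3, 4]]), columns=["a", "b"])

# From CSV: read_csv accepts a file path or any file-like object
csv_text = "Name,Age\nAlice,25\nBob,30\n"
csv_df = pd.read_csv(io.StringIO(csv_text))

print(array_df)
print(csv_df)
```

With a real file, you would pass its path to `pd.read_csv` instead of the `StringIO` object. Excel spreadsheets and SQL databases are handled by `pd.read_excel` and `pd.read_sql`, which need additional dependencies and are not shown here.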

## Accessing Data
You can access data in a DataFrame or Series using various methods:

- **Column Selection**: You can select a single column or multiple columns by name.

In [3]:
df['Name']  # Selects the 'Name' column
df[['Name', 'Age']]  # Selects both 'Name' and 'Age' columns


|   | Name | Age |
|---|---|---|
| 0 | Alice | 25 |
| 1 | Bob | 30 |
| 2 | Charlie | 35 |


- **Row Selection**: You can select rows based on conditions.

In [4]:
df[df['Age'] > 30]  # Selects rows where 'Age' is greater than 30

|   | Name | Age |
|---|---|---|
| 2 | Charlie | 35 |
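Condition-based selection works by building a boolean Series (a mask) and passing it to the DataFrame. The sketch below rebuilds the same `df` so it runs on its own, then combines conditions with `|` (or) and `&` (and). The variable names `mask`, `older_or_alice`, and `between` are chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35]})

# The comparison itself produces a boolean Series, one value per row
mask = df['Age'] > 30
print(mask.tolist())

# Combine conditions with | and &; each condition needs its own parentheses
older_or_alice = df[(df['Age'] > 30) | (df['Name'] == 'Alice')]
between = df[(df['Age'] >= 25) & (df['Age'] < 35)]

print(older_or_alice)
print(between)
```

The parentheses matter because `|` and `&` bind more tightly than comparison operators in Python.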


- **Indexing**: You can use `.loc[]` for label-based indexing and `.iloc[]` for integer position-based indexing.

In [5]:
df.loc[0]  # Selects the row whose index label is 0
df.iloc[0]  # Selects the first row by integer position

Name    Alice
Age        25
Name: 0, dtype: object
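With the default index, labels and positions coincide, so `.loc[0]` and `.iloc[0]` return the same row. The difference shows up once the index holds other labels. The sketch below assumes a hypothetical string index `'a'`, `'b'`, `'c'` to illustrate this.

```python
import pandas as pd

# Same data as before, but with string labels instead of 0, 1, 2
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35]},
                  index=['a', 'b', 'c'])

by_label = df.loc['b']       # row whose index label is 'b'
by_position = df.iloc[1]     # second row, regardless of its label
age = df.loc['c', 'Age']     # a single cell: row label 'c', column 'Age'

print(by_label)
print(by_position)
print(age)
```

Here `df.loc[1]` would raise a `KeyError`, because `1` is not one of the index labels.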

In the next chapter, Data Manipulation with Pandas, we will dive deeper into data manipulation and analysis.