## Introduction to Pandas

**_Pandas_** is an open-source Python library for data manipulation and analysis.
It provides flexible data structures, primarily the one-dimensional Series and the two-dimensional DataFrame, which make it easy to work with structured data such as spreadsheets, SQL tables, or CSV files.

It supports automatic alignment of data by label, handling of missing values, and a rich set of data manipulation functions.
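
As a rough illustration of alignment and missing-data handling, the sketch below adds two Series whose labels only partly overlap. The values and labels are made up for the example.

```python
import pandas as pd

# Two Series whose index labels only partly overlap (made-up values).
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20], index=["y", "z"])

# Addition aligns on labels, not positions. "x" has no partner in b,
# so the result holds NaN (missing) for that label.
total = a + b

# fillna replaces the missing entry with a chosen default.
filled = total.fillna(0)

print(total)
print(filled)
```

The labels that appear in only one operand are kept in the result and marked as missing rather than silently dropped.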


##### _Pandas provides a convenient way to analyze and clean data._

##### _The Pandas library introduces two new data structures to Python - Series and DataFrame, both of which are built on top of NumPy._

Pandas offers a wide range of functions for analyzing, cleaning, exploring, and transforming data. Common tasks include handling missing values, filtering and merging datasets, grouping and summarizing data, and preparing data for visualization or machine learning. It is especially valued in data science for its ability to efficiently process **large datasets** and streamline repetitive data-wrangling tasks.
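
A few of these tasks can be sketched on toy data. The `orders` and `customers` tables below are hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical example tables.
orders = pd.DataFrame({
    "customer_id": [1, 2, 1, 3],
    "amount": [20.0, 35.5, 12.5, 50.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "city": ["Oslo", "Lima", "Oslo"],
})

# Filtering: keep only orders above 15.
large_orders = orders[orders["amount"] > 15]

# Merging: attach each order's city via the shared customer_id column.
merged = orders.merge(customers, on="customer_id", how="left")

# Grouping and summarizing: total amount per city.
totals_by_city = merged.groupby("city")["amount"].sum()

print(totals_by_city)
```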

#### Analogy (Expanded):

- Think of a Pandas DataFrame like an Excel spreadsheet in Python.

- It has rows and columns, labels, and allows you to perform operations like sorting, filtering, and calculations, but with the full power and speed of Python and NumPy behind it.
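
To make the spreadsheet analogy concrete, here is a small sketch with a made-up `sheet` table showing a calculated column, a sort, and a filter.

```python
import pandas as pd

# A small made-up table, comparable to a three-row spreadsheet.
sheet = pd.DataFrame({
    "item": ["pen", "book", "lamp"],
    "price": [1.5, 12.0, 30.0],
    "qty": [10, 2, 1],
})

# A calculated column, like a spreadsheet formula filled down a column.
sheet["total"] = sheet["price"] * sheet["qty"]

# Sorting by a column, highest total first.
by_total = sheet.sort_values("total", ascending=False)

# Filtering rows, like a spreadsheet filter on the price column.
cheap = sheet[sheet["price"] < 20]

print(by_total)
print(cheap)
```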

### What is Pandas Used for?

Pandas is a powerful library generally used for:

- Data Cleaning
- Data Transformation
- Data Analysis
- Machine Learning
- Data Visualization

### Why Use Pandas?

Some of the reasons why we should use Pandas are as follows:

1. Handle Large Data Efficiently

Pandas is designed for handling large datasets. It provides powerful tools that simplify tasks like data filtering, transforming, and merging.

It also provides built-in functions to work with formats like CSV, JSON, TXT, Excel, and SQL databases.

2. Tabular Data Representation

Pandas DataFrames, the primary data structure of Pandas, handle data in tabular format. This allows easy indexing, selecting, replacing, and slicing of data.

3. Data Cleaning and Preprocessing

Data cleaning and preprocessing are essential steps in the data analysis pipeline, and Pandas provides powerful tools to facilitate these tasks. It has methods for handling missing values, removing duplicates, handling outliers, data normalization, etc.

4. Time Series Functionality

Pandas contains an extensive set of tools for working with dates, times, and time-indexed data, because it was originally developed for financial modeling.

5. Free and Open-Source

Pandas is free and open-source software released under the permissive BSD 3-Clause license, so you can use and distribute it at no cost, even for commercial use.
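
Several of the reasons above can be seen together in one short sketch. It assumes a tiny, made-up CSV held in a string (standing in for a file on disk) and uses the conventional `pd` alias for the import.

```python
import io
import pandas as pd

# Made-up CSV text standing in for a file on disk.
csv_text = """date,city,temp
2024-01-01,Oslo,-3.0
2024-01-01,Oslo,-3.0
2024-01-02,Oslo,
2024-01-08,Oslo,1.0
"""

# Reason 1: read a common format; parse_dates converts the column to datetimes.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Reason 3: remove the duplicated row, then fill the missing temperature
# with the mean of the remaining values.
clean = df.drop_duplicates().copy()
clean["temp"] = clean["temp"].fillna(clean["temp"].mean())

# Reason 2: label-based selection of a single cell (row label 0, column "temp").
first_temp = clean.loc[0, "temp"]

# Reason 4: index by date and compute a weekly mean.
weekly = clean.set_index("date")["temp"].resample("W").mean()

print(clean)
print(weekly)
```

Filling with the mean is only one possible choice for missing values; dropping the row or interpolating may suit other datasets better.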


#### Import Pandas in Python

We can import Pandas in Python using the import statement.


```python
# This code imports the pandas library into our program with the alias pd.
import pandas as pd
```

After this import statement, we can use Pandas functions and objects through the `pd` prefix, for example `pd.DataFrame()`.

##### **Notes:**

- If we import pandas without an alias using `import pandas`, we can create a DataFrame using the `pandas.DataFrame()` function.

- Using the alias `pd` is a common convention among Python programmers, as it makes it easier and quicker to refer to the pandas library in your code.
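
The two import styles from the notes can be compared side by side. This is a minimal sketch; the column name `a` and its values are placeholders.

```python
import pandas
import pandas as pd

# Both names refer to the same module object.
same_module = pandas is pd

# Without the alias, the full module name is spelled out.
df_long = pandas.DataFrame({"a": [1, 2]})

# With the alias, the call is identical apart from the shorter prefix.
df_short = pd.DataFrame({"a": [1, 2]})

print(same_module)
print(df_long.equals(df_short))
```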

#### Main Data Structures of Pandas

Pandas is built around **two** primary data structures: Series and DataFrame.

##### **Series**

A Series is a ***one-dimensional*** labeled array capable of holding data of any type (integer, string, float, etc.).

Each element in a Series has an associated label, called an index, which allows for fast and flexible data access and manipulation.

You can think of a Series as similar to a single column in a spreadsheet or a database table.
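
A minimal sketch of a Series with string labels follows. The product names and prices are invented for illustration.

```python
import pandas as pd

# Hypothetical data: prices keyed by product name.
prices = pd.Series([1.5, 12.0, 30.0], index=["pen", "book", "lamp"], name="price")

# Access a single value by its index label.
by_label = prices["book"]

# Access a single value by integer position.
by_position = prices.iloc[0]

# Select several labels at once; the result is again a Series.
subset = prices[["pen", "lamp"]]

print(prices)
print(by_label, by_position)
```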

##### **DataFrame**

A DataFrame is a ***two-dimensional***, size-mutable, and potentially heterogeneous tabular data structure.

It consists of an ordered collection of columns, each of which can be a different data type (numeric, string, boolean, etc.).

DataFrames are analogous to spreadsheets or SQL tables, with both row and column indices.

Each column in a DataFrame is essentially a Series, and the DataFrame organizes these Series into a table-like structure.
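
The relationship between a DataFrame and its columns can be sketched as below. The `people` table and its row labels are made up for the example.

```python
import pandas as pd

# A made-up table with columns of different types and custom row labels.
people = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Chi"],
        "age": [31, 25, 40],
        "member": [True, False, True],
    },
    index=["a", "b", "c"],
)

# Selecting one column returns a Series that shares the row index.
ages = people["age"]
is_series = isinstance(ages, pd.Series)

# Size-mutable: a new column can be added after creation.
people["age_next_year"] = people["age"] + 1

# .shape reports (rows, columns).
n_rows, n_cols = people.shape

print(people)
print(is_series, n_rows, n_cols)
```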

***Additional Notes***

Both Series and DataFrame support a wide range of data types, including numeric, boolean, string (object), categorical, and datetime types.
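
One way to see these data types is to inspect the `dtypes` attribute of a small DataFrame. The column names here are illustrative, and the dtype reported for the string column can differ between pandas versions.

```python
import pandas as pd

# A made-up table with one column per kind of data.
mixed = pd.DataFrame(
    {
        "count": [1, 2, 3],
        "flag": [True, False, True],
        "label": ["x", "y", "x"],
        "when": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
    }
)

# Convert the repetitive string column to the categorical dtype.
mixed["label_cat"] = mixed["label"].astype("category")

# One dtype per column.
dtypes = mixed.dtypes

print(dtypes)
```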


In summary, the core data structures of Pandas, Series (1D) and DataFrame (2D), enable efficient and flexible handling of labeled data, making them essential tools for data science and analytics in Python.