# **Introduction to Fundamental Data Structures in Pandas**

1. Pandas objects as enhanced NumPy arrays:
   * Pandas objects are similar to NumPy structured arrays, but with added features.
   * The key difference is that rows and columns in Pandas objects have labels, not just integer indices.

2. Three fundamental Pandas data structures:
   a. Series
   b. DataFrame
   c. Index

Let's briefly explain each of these:

a. Series:
   * Can be thought of as a one-dimensional labeled array.
   * Comparable to a single column in a spreadsheet or a single column of a DataFrame.
   * It holds one sequence of values that share a single data type (dtype), such as a list of names or a list of numbers. Each value has a label, and the labels together form the index, which lets you look up a value quickly.
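The description above can be sketched with a small labeled Series. The data and the variable name `ages` are made up for illustration only.

```python
import pandas as pd

# Hypothetical data: three ages, each labeled with a name
ages = pd.Series([31, 25, 42], index=["alice", "bob", "carol"], name="age")

print(ages["bob"])        # look up a value by its label
print(ages.dtype)         # one dtype is shared by every value
print(list(ages.index))   # the labels are stored in the index
```

Looking up `ages["bob"]` uses the label rather than a position, which is the main practical difference from a plain list.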

b. DataFrame:
   * A two-dimensional labeled data structure.
   * Like a spreadsheet or SQL table, or several Series objects placed side by side that share the same index.
   * Each column is a Series and can have a different data type, just as one spreadsheet column might hold names (text) and another ages (numbers).
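As a minimal sketch of the "multiple Series sharing an index" idea, the example below builds a DataFrame from two Series. The column names and values are hypothetical.

```python
import pandas as pd

# Two hypothetical Series that share the same row labels
names = pd.Series(["Alice", "Bob", "Carol"], index=["a", "b", "c"])
ages = pd.Series([31, 25, 42], index=["a", "b", "c"])

# Each dictionary key becomes a column name
people = pd.DataFrame({"name": names, "age": ages})

print(people)
print(people.dtypes)   # each column keeps its own dtype
print(people.shape)    # (rows, columns)
```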

c. Index:
   * The object responsible for axis labeling in both Series and DataFrame.
   * Comparable to the row labels down the side of a spreadsheet; a DataFrame also uses an Index for its column labels.
   * Allows for easy data alignment and provides a means to identify and look up data.
   * By default, pandas uses integers starting from 0 as the index, but you can use other hashable labels instead, such as dates, names, or custom labels.
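The following sketch illustrates the default integer index, a date-based index, and label alignment. All values are invented for the example.

```python
import pandas as pd

# Default index: integers starting from 0
default_idx = pd.Series([10, 20, 30])
print(default_idx.index)

# Custom index: dates used as row labels
dated = pd.Series(
    [10, 20, 30],
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
)
print(dated.loc[pd.Timestamp("2024-01-02")])

# Alignment: values are matched by label, not by position
x = pd.Series([1, 2], index=["a", "b"])
y = pd.Series([10, 20], index=["b", "a"])
total = x + y
print(total)
```

In the last step, `a` is paired with `a` and `b` with `b` even though the two Series list their labels in a different order.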

3. Standard imports:
   When working with Pandas, it's common to start your Python scripts or notebook cells with these import statements:


In [8]:
import numpy as np
import pandas as pd

Here, np is the conventional alias for NumPy, and pd is the conventional alias for Pandas.

# Python Objects and Pandas Objects

## Python Objects:
1. In Python, everything is an object. This includes numbers, strings, functions, classes, and more.
2. An object is an instance of a class. It has:
   * Attributes (data)
   * Methods (functions that operate on the data)
3. Objects are created from classes, which act as blueprints.
4. Python supports object-oriented programming (OOP), including encapsulation, inheritance, and polymorphism.
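A minimal sketch of these ideas is shown below. The `Temperature` class is hypothetical and exists only to show an attribute and a method.

```python
class Temperature:
    """Hypothetical class used only for illustration."""

    def __init__(self, celsius):
        self.celsius = celsius          # attribute (data)

    def to_fahrenheit(self):            # method (operates on the data)
        return self.celsius * 9 / 5 + 32


t = Temperature(100)                    # an object is an instance of a class
print(type(t).__name__)
print(t.celsius)
print(t.to_fahrenheit())

# Numbers and functions are objects too
print(isinstance(42, object), isinstance(len, object))
```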

## Pandas Objects:
Pandas introduces its own set of objects, built on top of Python and NumPy. The main Pandas objects are:

1. Series:
   * One-dimensional labeled array
   * Can hold data of any type (integers, floats, strings, Python objects, etc.)
   * Similar to a column in a spreadsheet or a single column of a DataFrame

2. DataFrame:
   * Two-dimensional labeled data structure
   * Like a spreadsheet or SQL table
   * Contains columns of potentially different types
   * Can be thought of as a dictionary of Series objects

3. Index:
   * Immutable array-like object holding axis labels
   * Used for alignment and identification in Series and DataFrame
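Two of the points above, the dictionary-like nature of a DataFrame and the immutability of an Index, can be sketched as follows. The data and column names are hypothetical.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [31, 25]})

# Dictionary-style access: a column name maps to a Series
age_column = df["age"]
print(type(age_column).__name__)
print("age" in df)                  # membership tests column names

# Both axes are labeled by Index objects
print(df.index)
print(df.columns)

# An Index cannot be modified element by element
error = None
try:
    df.columns[0] = "full_name"
except TypeError as exc:
    error = exc
    print("TypeError:", exc)

# To relabel, build a new object instead
renamed = df.rename(columns={"name": "full_name"})
print(list(renamed.columns))
```

Because the labels cannot change underneath you, several pandas objects can safely share the same Index.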

## Key Points about Pandas Objects:
* They're built on top of NumPy arrays, providing additional functionality.
* They include labels for rows and columns, unlike plain NumPy arrays.
* They can handle heterogeneous data types across columns (in a DataFrame).
* They have built-in handling for missing data (typically represented as NaN).
* They come with many methods for data manipulation, analysis, and cleaning.
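The sketch below touches on two of these points, mixed column types and missing data, using made-up values.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a text column and a numeric column with a gap
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [31, np.nan, 42],
})

print(df.dtypes)               # different dtype per column
print(df["age"].isna())        # flags the missing entry

# Aggregations skip NaN by default
mean_age = df["age"].mean()
print(mean_age)

# Missing values can also be filled explicitly
filled = df["age"].fillna(0)
print(filled)
```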

## Relationship between Python and Pandas Objects:
* Pandas objects are Python objects, created from classes defined in the Pandas library.
* They follow Python's OOP principles but are specialized for data manipulation.
* You can use standard Python operations on them, but they also have their own specialized methods.
* Purpose:
   * Python objects: General-purpose programming
   * Pandas objects: Specialized for data analysis and manipulation
* Structure:
   * Python objects: Can have any structure defined by their class
   * Pandas objects: Have predefined structures (Series for 1D data, DataFrame for 2D data)
* Performance:
   * Python objects: Standard interpreter speed, with per-element overhead in loops
   * Pandas objects: Optimized for numerical operations and large datasets
* Data handling:
   * Python objects: Data operations usually have to be implemented by hand
   * Pandas objects: Built-in methods for common data operations (filtering, grouping, merging)
* Indexing:
   * Python objects: Typically use integer indexing for sequences
   * Pandas objects: Support label-based indexing, integer-based indexing, and boolean indexing
* Vectorization:
   * Python objects: Often require explicit loops for element-wise operations
   * Pandas objects: Support vectorized operations that apply to entire columns or datasets at once
* Data types:
   * Python objects: Use Python's built-in types (int, float, str, etc.)
   * Pandas objects: Mostly use NumPy data types (plus pandas extension types), allowing for more efficient storage and computation
* Memory usage:
   * Python objects: Each object has its own memory overhead
   * Pandas objects: Usually more memory-efficient for large numeric datasets due to the underlying NumPy arrays
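The indexing entry in the comparison above, along with the point that standard Python operations still work on Pandas objects, can be sketched like this. The `scores` data is hypothetical.

```python
import pandas as pd

scores = pd.Series([88, 92, 79], index=["alice", "bob", "carol"])

# Standard Python operations still apply
print(isinstance(scores, object))
print(len(scores))
print("bob" in scores)          # membership tests the labels

# Three ways to select data
by_label = scores.loc["bob"]        # label-based
by_position = scores.iloc[0]        # integer position
above_80 = scores[scores > 80]      # boolean mask

print(by_label, by_position)
print(above_80)
```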

In [17]:
# Python list (a Python object)
python_list = [1, 2, 3, 4, 5]

# Pandas Series (a pandas object)
import pandas as pd
pandas_series = pd.Series([1, 2, 3, 4, 5])

# Adding 1 to each element
# Python way (explicit iteration, written here as a list comprehension)
python_result = [x + 1 for x in python_list]

# Pandas way (vectorized operation)
pandas_result = pandas_series + 1

print("Python result:", python_result)
print("Pandas result:", pandas_result)

Python result: [2, 3, 4, 5, 6]
Pandas result: 0    2
1    3
2    4
3    5
4    6
dtype: int64
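The memory-usage point from the comparison can be checked roughly as follows. This is only a sketch: exact byte counts depend on the Python build and platform, so no specific numbers are claimed, and the size `n` is an arbitrary choice.

```python
import sys
import pandas as pd

n = 100_000                      # arbitrary size for illustration
values = list(range(n))
series = pd.Series(values)

# A list stores references to separate int objects, so count both parts
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# A Series stores the values in one typed buffer (index excluded here)
series_bytes = series.memory_usage(index=False)

print("list bytes:  ", list_bytes)
print("Series bytes:", series_bytes)
```

The gap comes from the per-object overhead of Python integers, which the typed buffer behind the Series avoids.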
