## Introduction to Pandas

### What is Pandas?
Pandas is a popular open-source Python library used for data manipulation and analysis. It offers powerful data structures like DataFrame and Series, along with a wide range of functions to perform various operations on structured data.

### Why Pandas?
- Simplifies data manipulation tasks such as cleaning, transforming, and analyzing data.
- Provides intuitive and flexible data structures suitable for handling both structured and semi-structured data.
- Integrates well with other Python libraries such as NumPy, Matplotlib, and Scikit-learn, making it a crucial tool in the data science ecosystem.

## Getting Started

### Installation
You can install Pandas using pip:

```bash
pip install pandas
```

### Importing Pandas
After installation, you can import Pandas into your Python environment:


In [None]:
import pandas as pd

## Pandas Data Structures

### 1. Series
A Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, etc.).


In [None]:
# Creating a Series
s = pd.Series([1, 3, 5, 7, 9])
print(s)
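
The Series above uses the default integer index. As a minimal sketch of what "labeled" means in practice, the example below builds a Series with string labels; the `temps` name and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical temperature readings keyed by weekday labels
temps = pd.Series([21.5, 23.0, 19.8], index=["Mon", "Tue", "Wed"])

# Label-based access goes through the index
print(temps["Tue"])    # 23.0

# Position-based access goes through iloc
print(temps.iloc[0])   # 21.5

# All values in a Series share one dtype
print(temps.dtype)     # float64
```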


### 2. DataFrame
A DataFrame is a two-dimensional labeled data structure with columns of potentially different types. It is similar to a spreadsheet or SQL table.


In [None]:
# Creating a DataFrame from a dictionary
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40]}
df = pd.DataFrame(data)
print(df)
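
To illustrate "columns of potentially different types", the sketch below inspects per-column dtypes and derives a new column. The `people` and `AgeInMonths` names are assumptions introduced for this example.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 40],
})

# Each column is a Series with its own dtype
print(people.dtypes)

# Derive a new column from an existing one (element-wise arithmetic)
people["AgeInMonths"] = people["Age"] * 12

# shape is (rows, columns)
print(people.shape)  # (4, 3)
```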

## Data Input and Output

### 1. CSV
Pandas provides functions to read and write data in CSV format.


In [None]:
# Reading a CSV file
df = pd.read_csv('https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv')

# Writing to a CSV file
df.to_csv('output.csv', index=False)
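
The cell above needs network access and writes a file to disk. As a self-contained sketch of the same round trip, the example below reads from and writes to in-memory text buffers instead; the `csv_text` content is invented for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content held in memory instead of a file
csv_text = "name,score\nAlice,90\nBob,85\n"
scores = pd.read_csv(io.StringIO(csv_text))

# Write to an in-memory buffer; index=False omits the row index column
buffer = io.StringIO()
scores.to_csv(buffer, index=False)

# Reading the written text back should reproduce the original DataFrame
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()))
print(round_trip.equals(scores))
```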

### 2. Excel
Pandas also supports reading and writing Excel files.


In [None]:
# Reading an Excel file
# ('data.xlsx' is a placeholder path; .xlsx support requires the openpyxl package)
df2 = pd.read_excel('data.xlsx', sheet_name='Sheet1')

# Writing to an Excel file
df2.to_excel('output.xlsx', sheet_name='Sheet1', index=False)


## Essential DataFrame Operations

### 1. Viewing Data
You can quickly view the first or last rows of a DataFrame with `head()` and `tail()`, which return five rows by default.


In [None]:
# Display the first few rows
print(df.head())

print("--"*20)
# Display the last few rows
print(df.tail())

### 2. Data Information
You can get information about the DataFrame using functions like `info()` and `describe()`.


In [None]:
# Summary of DataFrame (info() prints directly and returns None, so no print() is needed)
df.info()

print('--'*20)
# Summary statistics
print(df.describe())


### 3. Selection and Indexing
Pandas allows selecting specific rows, columns, or elements using indexing.


In [None]:
# Selecting a single column
print("\nSelecting a single column:")
print(df['species'])

# Selecting multiple columns
print("\nSelecting multiple columns:")
print(df[['sepal_length', 'sepal_width']])

# Selecting rows by index
print("\nSelecting rows by index:")
print(df.iloc[0])
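
The cell above uses `iloc`, which selects by integer position. A short sketch of how it differs from label-based `loc` follows; the small `flowers` DataFrame and its string index are assumptions made up for this example.

```python
import pandas as pd

# Hypothetical data with a non-integer index, so labels and positions differ
flowers = pd.DataFrame(
    {
        "sepal_length": [5.1, 7.0, 6.3],
        "species": ["setosa", "versicolor", "virginica"],
    },
    index=["a", "b", "c"],
)

# loc selects by label; iloc selects by integer position
by_label = flowers.loc["b", "species"]
by_position = flowers.iloc[1, 1]
print(by_label, by_position)

# loc slices include the end label; iloc slices exclude the end position
label_slice = flowers.loc["a":"b"]   # rows "a" and "b"
position_slice = flowers.iloc[0:1]   # row at position 0 only
print(len(label_slice), len(position_slice))
```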

### 4. Filtering Data
Filtering allows you to select rows based on specific conditions.


In [None]:
# Filtering based on condition
print("\nFiltering based on condition:")
df[df['sepal_length'] > 5.0]
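
Conditions can also be combined. The sketch below uses a small made-up `flowers` DataFrame to show `&` for combining boolean masks (each condition needs its own parentheses), `isin()` for membership tests, and `query()` as an alternative string-based syntax.

```python
import pandas as pd

# Hypothetical sample resembling the iris columns used above
flowers = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.3],
    "species": ["setosa", "setosa", "versicolor", "virginica"],
})

# Combine conditions with & (and) or | (or); parentheses are required
long_non_setosa = flowers[
    (flowers["sepal_length"] > 5.0) & (flowers["species"] != "setosa")
]

# Keep rows whose value is in a given list
selected = flowers[flowers["species"].isin(["setosa", "virginica"])]

# The same combined filter expressed with query()
same = flowers.query("sepal_length > 5.0 and species != 'setosa'")

print(long_non_setosa)
print(selected)
```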

### 5. Sorting Data
You can sort the DataFrame based on column values.


In [None]:
# Sorting by column
print("\nSorting by column:")
print(df.sort_values(by='sepal_length', ascending=False))
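
`sort_values` also accepts a list of columns, with a matching list of sort directions. The following is a minimal sketch on made-up data, not part of the iris example above.

```python
import pandas as pd

# Hypothetical sample data
flowers = pd.DataFrame({
    "species": ["virginica", "setosa", "setosa", "virginica"],
    "sepal_length": [6.3, 5.1, 4.9, 7.1],
})

# Sort by species ascending, then by sepal_length descending within each species
ordered = flowers.sort_values(
    by=["species", "sepal_length"], ascending=[True, False]
).reset_index(drop=True)  # renumber rows after sorting

print(ordered)
```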


### 6. Grouping Data
Grouping allows you to aggregate data based on specific criteria.


In [None]:
# Grouping by a column
print("\nGrouping by a column:")
grouped = df.groupby('species')
print(grouped.mean())
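
Besides a single aggregate such as `mean()`, a grouped column can be summarized with several functions at once through `agg()`. The sketch below uses a small invented DataFrame so the numbers are easy to check by hand.

```python
import pandas as pd

# Hypothetical sample data
flowers = pd.DataFrame({
    "species": ["setosa", "setosa", "virginica", "virginica"],
    "sepal_length": [5.0, 5.2, 6.0, 7.0],
})

# One row per species, one column per aggregate function
summary = flowers.groupby("species")["sepal_length"].agg(["mean", "max", "count"])
print(summary)
```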

## Advanced Operations

### 1. Handling Missing Data
Pandas provides functions to handle missing data, such as `dropna()` and `fillna()`.


In [None]:
# Handling missing values
print("\nHandling missing values:")

# Dropping missing values
print("Dropping missing values:")
print(df.dropna())

# Filling missing values
print("Filling missing values:")
print(df.fillna(0))
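
If the DataFrame has no missing values, `dropna()` and `fillna()` return it unchanged, so their effect is easier to see on data that does contain gaps. The sketch below uses a made-up `readings` DataFrame with one `NaN`, and fills it with the column mean as one possible strategy.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor data with one missing value
readings = pd.DataFrame({
    "sensor": ["a", "b", "c"],
    "value": [1.0, np.nan, 3.0],
})

# Count missing values per column
missing_per_column = readings.isna().sum()
print(missing_per_column)

# Drop every row that contains at least one missing value
dropped = readings.dropna()

# Fill missing entries in 'value' with that column's mean (NaN is skipped by mean())
filled = readings.fillna({"value": readings["value"].mean()})
print(filled)
```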

### 2. Combining DataFrames
You can concatenate or merge multiple DataFrames together.


In [None]:
# Combining DataFrames
print("\nCombining DataFrames:")

# Concatenating DataFrames
df1 = df.head(2)
df2 = df.tail(2)

print("Concatenating DataFrames:")
print(pd.concat([df1, df2]))

In [None]:
# Merging DataFrames
print("Merging DataFrames:")
# Creating a DataFrame from a dictionary
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40]}
df_left = pd.DataFrame(data)
print(df_left)

In [None]:
# Creating a DataFrame from a dictionary
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Salary': [1500, 900, 1800, 1200]}
df_right = pd.DataFrame(data)
print(df_right)

In [None]:
pd.merge(df_left, df_right, on='Name')
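
In the merge above, every name appears in both DataFrames, so the join type does not matter. The sketch below uses made-up `left` and `right` frames with partly different names to show the default inner join next to an outer join; `indicator=True` adds a `_merge` column recording where each row came from.

```python
import pandas as pd

# Hypothetical frames whose keys only partly overlap
left = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})
right = pd.DataFrame({"Name": ["Bob", "Charlie", "Eve"], "Salary": [900, 1800, 1100]})

# Default how='inner': keep only names present in both frames
inner = pd.merge(left, right, on="Name")

# how='outer': keep all names; unmatched cells become NaN
outer = pd.merge(left, right, on="Name", how="outer", indicator=True)

print(inner)
print(outer)
```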

### 3. Pivot Tables
Pivot tables allow you to summarize and aggregate data in a DataFrame.


In [None]:
import pandas as pd

# Create a sample sales DataFrame
data = {
    'Date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02', '2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02'],
    'Product': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'Region': ['North', 'North', 'North', 'North', 'South', 'South', 'South', 'South'],
    'Sales': [100, 200, 150, 250, 120, 220, 130, 240]
}

sales_df = pd.DataFrame(data)
sales_df.head(3)

In [None]:
# Creating a pivot table
print("\nCreating a pivot table:")
pivot_table = pd.pivot_table(sales_df, values='Sales', index=['Region'], columns=['Product'], aggfunc='sum')
print(pivot_table)
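
One way to understand a pivot table is as a group-by followed by a reshape. The sketch below, on a reduced made-up `sales` DataFrame, builds the same table both ways and compares the values.

```python
import pandas as pd

# Hypothetical reduced sales data
sales = pd.DataFrame({
    "Product": ["A", "B", "A", "B"],
    "Region": ["North", "North", "South", "South"],
    "Sales": [100, 200, 120, 220],
})

# Pivot table: regions as rows, products as columns, summed sales as values
pivot = pd.pivot_table(
    sales, values="Sales", index="Region", columns="Product", aggfunc="sum"
)

# Same result via groupby, then moving 'Product' from the index to the columns
via_groupby = sales.groupby(["Region", "Product"])["Sales"].sum().unstack("Product")

print(pivot)
print((pivot.values == via_groupby.values).all())
```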

### 4. MultiIndexing
A MultiIndex (hierarchical index) uses two or more index levels, which lets you represent and select higher-dimensional data within a two-dimensional DataFrame.


In [None]:
# MultiIndexing
print("\nMultiIndexing:")
multi_index_df = sales_df.set_index(['Region', 'Date'])
multi_index_df
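
Once a MultiIndex is set, rows can be selected by one or more levels. The following is a minimal sketch on a reduced made-up `sales` DataFrame; sorting the index first is a common precaution for efficient label lookups.

```python
import pandas as pd

# Hypothetical reduced sales data
sales = pd.DataFrame({
    "Region": ["North", "North", "South", "South"],
    "Date": ["2024-01-01", "2024-01-02", "2024-01-01", "2024-01-02"],
    "Sales": [100, 150, 120, 130],
})

indexed = sales.set_index(["Region", "Date"]).sort_index()

# Select every row for one value of the outer level
north = indexed.loc["North"]

# Select a single cell with a full (Region, Date) key
one_value = indexed.loc[("South", "2024-01-02"), "Sales"]

# Cross-section on an inner level with xs()
jan_first = indexed.xs("2024-01-01", level="Date")

print(north)
print(one_value)
print(jan_first)
```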

### 5. Time Series Analysis
Pandas has built-in support for time series data, including date ranges, datetime columns, and time zone handling.


In [None]:
# Create a sample sales DataFrame with time series data and categorical data
data = {
    'Date': pd.date_range(start='2024-01-01', periods=8),
    'Product': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'Region': ['North', 'North', 'North', 'North', 'South', 'South', 'South', 'South'],
    'Sales': [100, 200, 150, 250, 120, 220, 130, 240]
}

sales_df = pd.DataFrame(data)
sales_df.head(3)

In [None]:
# Convert 'Product' column to categorical data type
sales_df['Product'] = sales_df['Product'].astype('category')

# Display the first few rows of the dataset
print("First few rows of the sales dataset:")
print(sales_df.head())

In [None]:
# Time Zone Handling
print("\nTime Zone Handling:")

# Localize the 'Date' column to a specific time zone
sales_df['Date'] = sales_df['Date'].dt.tz_localize('UTC')
print(sales_df.head())
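
With dates in the index, a Series can also be aggregated over time periods. The sketch below reuses the sales figures from above as a daily Series (an assumption made for illustration) and shows weekly totals with `resample()` and a 3-day moving average with `rolling()`.

```python
import pandas as pd

# Daily values indexed by date; 2024-01-01 is a Monday
dates = pd.date_range(start="2024-01-01", periods=8, freq="D")
daily = pd.Series([100, 200, 150, 250, 120, 220, 130, 240], index=dates)

# Weekly totals; the "W" frequency groups into weeks ending on Sunday
weekly = daily.resample("W").sum()

# 3-day moving average; the first two entries are NaN (not enough data yet)
moving_avg = daily.rolling(window=3).mean()

print(weekly)
print(moving_avg)
```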

### 6. Working with Big Data
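Pandas loads data into memory, so datasets that do not fit in RAM are usually processed in chunks. The cell below simulates this by iterating over an in-memory DataFrame in fixed-size slices.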


In [None]:
# Working with Big Data
print("\nWorking with Big Data:")


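chunk_size = 2
# Ceiling division, so an evenly divisible length does not produce an empty final chunk
num_chunks = (len(sales_df) + chunk_size - 1) // chunk_size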

for i in range(num_chunks):
    start_idx = i * chunk_size
    end_idx = min((i + 1) * chunk_size, len(sales_df))
    chunk = sales_df.iloc[start_idx:end_idx]
    print(f"Processing chunk {i+1}/{num_chunks}:")
    # Perform operations on the chunk
    print(chunk)
    print("--"*20)
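
The loop above slices a DataFrame that is already in memory. For a file that is too large to load at once, `read_csv` accepts a `chunksize` argument and then returns an iterator of DataFrames. The sketch below uses a small in-memory CSV as a stand-in for a large file; the `csv_text` content is invented for illustration.

```python
import io

import pandas as pd

# Stand-in for a large file: 10 data rows of hypothetical sales
rows = [f"{'North' if i % 2 == 0 else 'South'},{i}" for i in range(10)]
csv_text = "region,sales\n" + "\n".join(rows)

total_sales = 0
chunk_count = 0

# With chunksize set, read_csv yields DataFrames of at most 4 rows each
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total_sales += chunk["sales"].sum()  # aggregate per chunk, keep a running total
    chunk_count += 1

print(total_sales, chunk_count)
```

Only one chunk is held in memory at a time, so this pattern suits aggregates that can be accumulated incrementally, such as sums and counts.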

## Plots

Pandas provides basic plotting through the `plot()` method on DataFrame and Series, so you do not need to call Matplotlib directly. These methods are built on top of Matplotlib, which must still be installed, and offer a simplified interface for quick data visualization. Let's create some plots using these built-in Pandas functions.

In [None]:
# Importing necessary libraries
import pandas as pd

# Create a sample DataFrame
data = {
    'x': [1, 2, 3, 4, 5],
    'y': [2, 4, 6, 8, 10],
    'z': [1, 3, 5, 7, 9]
}

df = pd.DataFrame(data)

# Line plot
print("Line plot:")
df.plot(kind='line', title='Line Plot')

In [None]:
# Bar plot
print("\nBar plot:")
df.plot(kind='bar', title='Bar Plot')

In [None]:
# Box plot
print("\nBox plot:")
df.plot(kind='box', title='Box Plot')

In [None]:
# Density plot (kernel density estimate; requires SciPy to be installed)
print("\nDensity plot:")
df.plot(kind='density', title='Density Plot')

In [None]:
# Pie chart (using a Series)
print("\nPie chart:")
df['y'].plot(kind='pie', title='Pie Chart')


## Conclusion

This notebook covered the core Pandas data structures, data input and output, essential DataFrame operations, and more advanced topics such as MultiIndexing, time series handling, categorical data, chunked processing of large datasets, and plotting. Together, these cover most everyday data manipulation and analysis tasks.
