# Python `pandas` Module: Methods, Functions, and Examples

This notebook provides detailed notes, explanations, and examples for working with Python's `pandas` library. Each cell covers a specific concept or function.

## Introduction to `pandas`

`pandas` is a Python library for data analysis and manipulation. Its two core data structures, the one-dimensional `Series` and the two-dimensional `DataFrame`, handle tabular and time series data.

## Importing `pandas`

To use `pandas`, import it first. By convention, it is imported under the alias `pd`: `import pandas as pd`.

In [None]:
import pandas as pd

## Creating a Series

A `Series` is a one-dimensional labeled array. You can create one from a list, a NumPy array, or a dictionary; the optional `index` argument sets the labels.

In [None]:
# Create a Series
s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
print('Series:')
print(s)
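
As a complement to the list-based example, here is a minimal sketch of building a `Series` from a dictionary, where the keys become the index labels. The name `prices` and its values are illustrative and not part of the original notebook.

```python
import pandas as pd

# Build a Series from a dictionary: the keys become the index labels
prices = pd.Series({'apple': 1.5, 'banana': 0.5, 'cherry': 3.0})

# Look up a single value by its label
banana_price = prices['banana']

# Arithmetic is applied element-wise and keeps the labels
doubled = prices * 2
print(doubled)
```

Label-based lookup and element-wise arithmetic are the two behaviors that most distinguish a `Series` from a plain Python list.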

## Creating a DataFrame

A `DataFrame` is a two-dimensional labeled data structure. You can create it from dictionaries, lists, or arrays.

In [None]:
# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
print('DataFrame:')
print(df)
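
A `DataFrame` can also be built from a list of dictionaries, one per row. The sketch below assumes illustrative records (`df_records` is a made-up name) and shows that a key missing from one record is filled with `NaN`.

```python
import pandas as pd

# Each dictionary is one row; keys become column names
records = [
    {'Name': 'Alice', 'Age': 25},
    {'Name': 'Bob', 'Age': 30},
    {'Name': 'Charlie'},  # no 'Age' key, so this cell becomes NaN
]
df_records = pd.DataFrame(records)
print(df_records)
```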

## Inspecting Data

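Use `head()`, `tail()`, and `info()` to inspect the contents and structure of a DataFrame. `head()` and `tail()` return the first and last five rows by default, and `info()` prints the column names, non-null counts, and dtypes.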

In [None]:
# Inspecting data
print('Head:')
print(df.head())
print('Tail:')
print(df.tail())
print('Info:')
df.info()
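
Beyond these methods, a few attributes give a quick structural overview. The following is a small sketch on an illustrative frame (`df_demo` is an assumed name that mirrors the data above).

```python
import pandas as pd

df_demo = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

# shape is a (rows, columns) tuple
n_rows, n_cols = df_demo.shape

# columns holds the column labels
column_names = list(df_demo.columns)

# head(n) limits the preview to the first n rows
first_two = df_demo.head(2)

# dtypes lists the data type of each column
print(df_demo.dtypes)
```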

## Selecting Data

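You can select columns with bracket notation and rows with `loc` (by index label) or `iloc` (by integer position). With the default integer index used here, labels and positions happen to coincide.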

In [None]:
# Selecting data
print('Select column:')
print(df['Name'])
print('Select row by label:')
print(df.loc[0])
print('Select row by position:')
print(df.iloc[1])
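
The difference between `loc` and `iloc` is easier to see when the index is not the default integer range. This sketch assumes a hypothetical frame `df_labeled` with string labels.

```python
import pandas as pd

df_labeled = pd.DataFrame({'Age': [25, 30, 35]}, index=['alice', 'bob', 'charlie'])

# loc uses index labels and column names
age_by_label = df_labeled.loc['bob', 'Age']

# iloc uses integer positions for both rows and columns
age_by_position = df_labeled.iloc[1, 0]

# Label slices include both endpoints; position slices exclude the stop
label_slice = df_labeled.loc['alice':'bob']
position_slice = df_labeled.iloc[0:1]
print(len(label_slice), len(position_slice))
```

Note that the two slices return different numbers of rows, which is a common source of off-by-one surprises.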

## Filtering Data

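You can filter rows with boolean indexing: a comparison such as `df['Age'] > 25` produces a boolean Series, and indexing the DataFrame with it keeps only the rows where the value is `True`.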

In [None]:
# Filtering data
filtered_df = df[df['Age'] > 25]
print('Filtered DataFrame:')
print(filtered_df)
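
Conditions can be combined with `&` (and) and `|` (or); each condition needs its own parentheses because these operators bind more tightly than comparisons. A minimal sketch, using an illustrative frame named `people`:

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

# Combine two conditions; parentheses around each one are required
in_range = people[(people['Age'] > 25) & (people['Age'] < 35)]

# isin() tests membership in a list of values
selected = people[people['Name'].isin(['Alice', 'Charlie'])]

print(in_range)
print(selected)
```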

## Descriptive Statistics

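Use methods like `mean()`, `sum()`, and `describe()` to compute statistics on your data. By default, `describe()` summarizes only the numeric columns (count, mean, standard deviation, minimum, quartiles, and maximum).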

In [None]:
# Descriptive statistics
print('Mean age:', df['Age'].mean())
print('Sum of ages:', df['Age'].sum())
print('Describe:')
print(df.describe())
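
Several statistics can be requested in one call with `agg()`. The sketch below uses an illustrative `ages` Series with the same values as the `Age` column above.

```python
import pandas as pd

ages = pd.Series([25, 30, 35], name='Age')

# agg() accepts a list of statistic names and returns one value per name
summary = ages.agg(['min', 'max', 'mean'])

# std() computes the sample standard deviation (ddof=1) by default
age_std = ages.std()

print(summary)
print('Std:', age_std)
```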

## Handling Missing Data

Use `isnull()` (an alias of `isna()`), `dropna()`, and `fillna()` to detect and handle missing values. `dropna()` and `fillna()` return new objects and leave the original unchanged.

In [None]:
# Handling missing data
import numpy as np
df_missing = df.copy()
df_missing.loc[1, 'Age'] = np.nan
print('With missing value:')
print(df_missing)
print('Is null:')
print(df_missing.isnull())
print('Drop NA:')
print(df_missing.dropna())
print('Fill NA:')
print(df_missing.fillna(0))
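
Filling with `0` can distort statistics such as the mean. One common alternative, sketched here on an illustrative frame `df_nan`, is to fill a column with its own mean; whether that is appropriate depends on the data.

```python
import numpy as np
import pandas as pd

df_nan = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, np.nan, 35]})

# Count missing values in one column
missing_count = int(df_nan['Age'].isna().sum())

# mean() skips NaN by default
mean_age = df_nan['Age'].mean()

# A dict passed to fillna() maps column names to fill values
df_filled = df_nan.fillna({'Age': mean_age})
print(df_filled)
```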

## Grouping Data

Use `groupby()` to split rows into groups by the values of one or more columns, then apply an aggregate such as `sum()` to each group.

In [None]:
# Grouping data
df_group = pd.DataFrame({'Team': ['A', 'A', 'B', 'B'], 'Points': [10, 15, 20, 25]})
grouped = df_group.groupby('Team')['Points'].sum()
print('Grouped sum:')
print(grouped)
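
To compute several aggregates per group at once, `agg()` can be chained after `groupby()`. A short sketch, using an illustrative frame `scores` with the same values as above:

```python
import pandas as pd

scores = pd.DataFrame({'Team': ['A', 'A', 'B', 'B'], 'Points': [10, 15, 20, 25]})

# One row per team, one column per aggregate
team_stats = scores.groupby('Team')['Points'].agg(['sum', 'mean', 'count'])
print(team_stats)
```

The result is a DataFrame indexed by `Team`, so individual values can be read with `team_stats.loc['A', 'sum']`.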

## Merging and Joining DataFrames

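Use `merge()` to combine DataFrames on shared key columns and `join()` to combine them on their indexes. The `how` argument (`'inner'`, `'left'`, `'right'`, or `'outer'`) controls which keys are kept.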

In [None]:
# Merging DataFrames
df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'val1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['A', 'B', 'D'], 'val2': [4, 5, 6]})
merged = pd.merge(df1, df2, on='key', how='outer')
print('Merged DataFrame:')
print(merged)
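
The following sketch compares `how` options and shows an index-based `join()`. The frames `left` and `right` are illustrative copies of the data above; `indicator=True` adds a `_merge` column recording where each row came from.

```python
import pandas as pd

left = pd.DataFrame({'key': ['A', 'B', 'C'], 'val1': [1, 2, 3]})
right = pd.DataFrame({'key': ['A', 'B', 'D'], 'val2': [4, 5, 6]})

# Inner merge keeps only keys present in both frames
inner = pd.merge(left, right, on='key', how='inner')

# Left merge keeps every key from the left frame
left_merge = pd.merge(left, right, on='key', how='left')

# Outer merge keeps all keys; indicator=True labels each row's origin
outer = pd.merge(left, right, on='key', how='outer', indicator=True)

# join() aligns on the index and performs a left join by default
joined = left.set_index('key').join(right.set_index('key'))

print(outer)
print(joined)
```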

## Summary

- `pandas` is a widely used library for data analysis in Python.
- Key features: Series, DataFrame, data selection, filtering, statistics, missing data handling, grouping, and merging.
- Explore more in the [pandas documentation](https://pandas.pydata.org/docs/).