### 1. Introduction to Pandas

Pandas is a versatile library that facilitates data analysis and manipulation. It integrates well with NumPy and other scientific libraries like SciPy, and is especially useful for working with structured or tabular data.

In [10]:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame


### 2. Pandas Data Structures

#### 2.1 Series
A Series is a one-dimensional array-like object with labels (the index).

In [11]:
# Creating a Series
obj = pd.Series([4, 7, -5, 3])
print(obj)


0    4
1    7
2   -5
3    3
dtype: int64


Explanation: A Series consists of the data and its corresponding labels, called the index. Because no index was specified, pandas creates a default integer index from 0 to N - 1.
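As a small sketch of the two parts of a Series, the snippet below pulls the labels and the data out separately. The variable names `labels` and `values` are illustrative only.

```python
import pandas as pd

obj = pd.Series([4, 7, -5, 3])

# The index holds the labels; with no index given it is a default integer range
labels = obj.index

# to_numpy() returns the underlying data as a NumPy array
values = obj.to_numpy()

print(labels)
print(values)
```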

In [12]:
# Custom index for a Series
obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])
print(obj2)


d    4
b    7
a   -5
c    3
dtype: int64


Explanation: Passing index= assigns a label to each value in order, so 'd' labels 4, 'b' labels 7, and so on.
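One reason to set a custom index is label-based selection. The following sketch, reusing the same Series, shows a few common selection patterns; the names `value_a`, `subset`, and `positive` are illustrative.

```python
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])

# Select a single value by its label
value_a = obj2['a']

# Select several values by passing a list of labels (order is preserved)
subset = obj2[['c', 'a', 'd']]

# Boolean filtering keeps each surviving value attached to its label
positive = obj2[obj2 > 0]

print(value_a)
print(subset)
print(positive)
```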

### 3. DataFrame

A DataFrame is a two-dimensional tabular structure with labeled rows and columns.

In [13]:
# Creating a DataFrame
data = {'state': ['Ohio', 'Nevada'], 'year': [2000, 2001], 'pop': [1.5, 2.4]}
frame = pd.DataFrame(data)
print(frame)


    state  year  pop
0    Ohio  2000  1.5
1  Nevada  2001  2.4


Explanation: This creates a DataFrame from a dictionary, with each key-value pair becoming a column.

In [14]:
# Accessing specific columns
print(frame['state'])


0      Ohio
1    Nevada
Name: state, dtype: object


Explanation: You can retrieve a column as a Series using either dictionary-style indexing (frame['state']) or attribute access (frame.state).
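The two access styles can be compared directly. This sketch also illustrates a caveat: attribute access only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method. The 'pop' column from the example above is such a clash, because DataFrame already has a pop() method.

```python
import pandas as pd

data = {'state': ['Ohio', 'Nevada'], 'year': [2000, 2001], 'pop': [1.5, 2.4]}
frame = pd.DataFrame(data)

# Both styles return the same Series for the 'state' column
by_key = frame['state']
by_attr = frame.state
same = by_key.equals(by_attr)

# 'pop' collides with the DataFrame.pop method, so frame.pop is the method,
# not the column; dictionary-style indexing is the reliable way to get it
pop_col = frame['pop']
pop_attr_is_method = callable(frame.pop)

print(same)
print(pop_col)
print(pop_attr_is_method)
```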

### 4. Essential Functionality

#### 4.1 Reindexing

Reindexing allows you to realign data to a new index.

In [15]:
# Reindexing a Series
obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])
print(obj2)


a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN
dtype: float64


Explanation: Reindexing can introduce missing values (NaN) when the new index has labels not present in the original index.
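If NaN is not the desired placeholder, reindex() accepts options for filling the gaps. The sketch below shows two of them: `fill_value` for a constant, and `method='ffill'` to carry the previous value forward. The second Series (`colors`) is made-up illustrative data with a sorted index, which forward filling requires.

```python
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

# Use a constant instead of NaN for labels missing from the original index
filled = obj.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=0.0)

# Forward-fill: each new label takes the value of the closest earlier label
colors = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])
ffilled = colors.reindex(list(range(6)), method='ffill')

print(filled)
print(ffilled)
```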

#### 4.2 Dropping Entries

You can drop specific entries from an axis in a Series or DataFrame.

In [16]:
# Dropping an entry
obj = pd.Series(np.arange(5.), index=['a', 'b', 'c', 'd', 'e'])
new_obj = obj.drop('c')
print(new_obj)


a    0.0
b    1.0
d    3.0
e    4.0
dtype: float64


Explanation: drop() returns a new object with the given label removed; the original obj is left unchanged.
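The same method works with several labels at once and on either axis of a DataFrame. A minimal sketch follows; the DataFrame contents and labels are made up for illustration.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(5.), index=['a', 'b', 'c', 'd', 'e'])

# Drop several labels at once by passing a list
trimmed = obj.drop(['d', 'c'])

frame = pd.DataFrame(np.arange(16).reshape(4, 4),
                     index=['Ohio', 'Colorado', 'Utah', 'New York'],
                     columns=['one', 'two', 'three', 'four'])

# By default, labels are dropped from the row index
no_rows = frame.drop(['Colorado', 'Ohio'])

# Use the columns keyword to drop columns instead
no_cols = frame.drop(columns=['two'])

print(trimmed)
print(no_rows)
print(no_cols)
```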

### 5. Arithmetic and Data Alignment

#### 5.1 Arithmetic Operations

Pandas automatically aligns data for arithmetic operations based on the index.

In [17]:
s1 = pd.Series([7.3, -2.5, 3.4], index=['a', 'c', 'e'])
s2 = pd.Series([-2.1, 3.6, -1.5, 4.0], index=['a', 'c', 'f', 'g'])
result = s1 + s2
print(result)


a    5.2
c    1.1
e    NaN
f    NaN
g    NaN
dtype: float64


Explanation: Arithmetic between Series aligns on index labels. The result's index is the union of both indexes, and any label present in only one Series yields NaN.
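When NaN for non-overlapping labels is not wanted, the method form of the operator accepts a `fill_value`. The sketch below treats a label missing from one Series as 0 before adding; whether 0 is an appropriate substitute depends on the data.

```python
import pandas as pd

s1 = pd.Series([7.3, -2.5, 3.4], index=['a', 'c', 'e'])
s2 = pd.Series([-2.1, 3.6, -1.5, 4.0], index=['a', 'c', 'f', 'g'])

# add() is the method equivalent of +; fill_value substitutes for a label
# that is missing from one of the two Series
filled = s1.add(s2, fill_value=0)

print(filled)
```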

### 6. Function Application and Mapping

Apply a function to each column or row of a DataFrame, or to each element of a Series.

In [18]:
# Applying a function to each column of the DataFrame
frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])
f = lambda x: x.max() - x.min()
result = frame.apply(f)
print(result)


b    1.481313
d    2.974441
e    3.055326
dtype: float64


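Explanation: apply() calls the function once per column by default (axis=0), so the result is a Series indexed by column name; pass axis=1 to call it once per row instead. Because the data comes from np.random.randn without a fixed seed, the values will differ on each run.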
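To make the axis behavior concrete, here is a sketch on a small fixed DataFrame (made-up values, so the results are reproducible). It also shows Series.map() for a true element-wise transformation. The helper name `spread` is illustrative.

```python
import pandas as pd

frame = pd.DataFrame([[1.0, 4.0, 2.0], [3.0, -1.0, 5.0]],
                     columns=list('bde'), index=['Utah', 'Ohio'])

def spread(x):
    """Return the range (max minus min) of a Series."""
    return x.max() - x.min()

# Default axis=0: the function receives each column as a Series
per_column = frame.apply(spread)

# axis=1: the function receives each row as a Series
per_row = frame.apply(spread, axis=1)

# Series.map applies a function to every individual element
formatted = frame['e'].map(lambda v: f'{v:.2f}')

print(per_column)
print(per_row)
print(formatted)
```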



### 7. Sorting and Ranking

#### 7.1 Sorting
Sort by labels or values.

In [19]:
# Sorting by index
obj = pd.Series(range(4), index=['d', 'a', 'b', 'c'])
sorted_obj = obj.sort_index()
print(sorted_obj)


a    1
b    2
c    3
d    0
dtype: int64


Explanation: You can sort a Series by its index labels using the sort_index() method.
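Sorting by value and ranking follow the same pattern. The sketch below uses a made-up Series containing a tie to show sort_values(), a descending sort_index(), and two tie-breaking rules for rank().

```python
import pandas as pd

obj = pd.Series([7, -5, 7, 4], index=['d', 'a', 'b', 'c'])

# Sort by the values instead of the labels
by_value = obj.sort_values()

# Sort by label in descending order
by_label_desc = obj.sort_index(ascending=False)

# rank() assigns ranks starting at 1; by default, ties share the average rank
avg_rank = obj.rank()

# method='first' breaks ties by order of appearance in the data
first_rank = obj.rank(method='first')

print(by_value)
print(by_label_desc)
print(avg_rank)
print(first_rank)
```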



### 8. Summarizing and Computing Descriptive Statistics

Pandas offers several methods for summarizing data, such as sum(), mean(), and describe().

In [20]:
# Summary statistics for a DataFrame
df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5], [np.nan, np.nan], [0.75, -1.3]], 
                  index=['a', 'b', 'c', 'd'], columns=['one', 'two'])
summary = df.describe()
print(summary)


            one       two
count  3.000000  2.000000
mean   3.083333 -2.900000
std    3.493685  2.262742
min    0.750000 -4.500000
25%    1.075000 -3.700000
50%    1.400000 -2.900000
75%    4.250000 -2.100000
max    7.100000 -1.300000


Explanation: describe() computes summary statistics for each numeric column. Missing values (NaN) are excluded, which is why count is 3 for column one and 2 for column two.
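The individual reductions behave the same way with respect to missing data. As a sketch on the same DataFrame, the snippet below shows sum() along each axis and the effect of skipna=False on mean().

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5], [np.nan, np.nan], [0.75, -1.3]],
                  index=['a', 'b', 'c', 'd'], columns=['one', 'two'])

# Column sums: NaN values are skipped by default
col_sums = df.sum()

# Row sums: axis=1 reduces across the columns; an all-NaN row sums to 0.0
row_sums = df.sum(axis=1)

# skipna=False makes any NaN in a row propagate to that row's result
strict_means = df.mean(axis=1, skipna=False)

print(col_sums)
print(row_sums)
print(strict_means)
```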

### 9. Correlation and Covariance

Calculate correlations and covariances between data.

In [21]:
# Example of correlation
returns = df.pct_change()
corr_matrix = returns.corr()
print(corr_matrix)


     one  two
one  1.0  1.0
two  1.0  1.0


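Explanation: pct_change() computes the fractional change between consecutive rows, and corr() then computes pairwise correlations over the rows where both columns have values. Here only two such rows remain, so every correlation is trivially 1.0; the exact output also depends on how the installed pandas version treats missing values in pct_change().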
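A dataset without missing values makes the behavior easier to see. The following sketch uses made-up columns in which y is exactly twice x and z decreases as x increases, so the expected correlations are 1 and -1. The column names are illustrative.

```python
import pandas as pd

# Hypothetical data: y = 2 * x, and z falls by 1 each time x rises by 1
data = pd.DataFrame({'x': [1.0, 2.0, 3.0, 4.0],
                     'y': [2.0, 4.0, 6.0, 8.0],
                     'z': [4.0, 3.0, 2.0, 1.0]})

# Pairwise correlation and covariance between two Series
corr_xy = data['x'].corr(data['y'])
corr_xz = data['x'].corr(data['z'])
cov_xy = data['x'].cov(data['y'])

# Full correlation and covariance matrices for all columns
corr_matrix = data.corr()
cov_matrix = data.cov()

print(corr_matrix)
print(cov_matrix)
```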