A pandas DataFrame is a two-dimensional, tabular data structure in the pandas library for Python, designed for efficient data manipulation and analysis. It can be thought of as a spreadsheet or SQL table made of rows and columns, where each column is a pandas Series and all columns share a common index. Below, I explain DataFrames from a coding perspective, covering their creation, key attributes, indexing, operations, and practical examples, building on the earlier discussion of Series.

### Pandas DataFrame
- A DataFrame is a two-dimensional, labeled data structure with rows and columns, capable of holding heterogeneous data (e.g., integers, floats, strings, objects).
- **Components**:
  - **Data**: Values organized in a tabular format.
  - **Index**: Labels for rows (default is integer-based, starting from 0).
  - **Columns**: Labels for columns, allowing each column to act as a Series.
- Built on NumPy arrays, DataFrames combine the performance of NumPy with flexible indexing and data alignment features.

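The relationship between these components can be sketched in a few lines. This is a minimal illustration, assuming a small hand-built DataFrame; the variable name col_a is only for this example.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3], "B": ["x", "y", "z"]},
    index=["row1", "row2", "row3"],
)

col_a = df["A"]                        # selecting one column yields a Series
print(type(col_a).__name__)            # Series
print(col_a.index.equals(df.index))    # True: the column carries the row index
print(df["B"].index.equals(df.index))  # True: every column shares that index
print(list(df.columns))                # ['A', 'B']
```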
#### Creating a DataFrame
DataFrames can be created from various data sources.

1. From a Dictionary of Lists or Arrays:

In [5]:
import pandas as pd
data = {'A': [1, 2, 3], 'B': ['x', 'y', 'z']}
df = pd.DataFrame(data, index=['row1', 'row2', 'row3'])
print(df)

      A  B
row1  1  x
row2  2  y
row3  3  z


2. From a List of Dictionaries:

In [8]:
data = [{'A': 1, 'B': 'x'}, {'A': 2, 'B': 'y'}, {'A': 3, 'B': 'z'}]
df = pd.DataFrame(data)
print(df)

   A  B
0  1  x
1  2  y
2  3  z


3. From a NumPy Array:

In [11]:
import numpy as np
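# A plain np.array would coerce this mixed data to strings; dtype=object keeps the integers as integers
data = np.array([[1, 'x'], [2, 'y'], [3, 'z']], dtype=object)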
df = pd.DataFrame(data, columns=['A', 'B'], index=['row1', 'row2', 'row3'])
print(df)

      A  B
row1  1  x
row2  2  y
row3  3  z
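One caveat with mixed-type arrays: a NumPy array holds a single dtype, so building one from integers and strings without dtype=object coerces every element to a string, even though the printed DataFrame looks the same. The sketch below shows the effect and one way to recover numbers; the names mixed, df_str, and a_numeric are illustrative.

```python
import numpy as np
import pandas as pd

mixed = np.array([[1, "x"], [2, "y"], [3, "z"]])
print(mixed.dtype.kind)        # 'U': every element became a Unicode string

df_str = pd.DataFrame(mixed, columns=["A", "B"])
print(df_str["A"].tolist())    # ['1', '2', '3'] (strings, not integers)

# pd.to_numeric parses the strings back into numbers.
a_numeric = pd.to_numeric(df_str["A"])
print(a_numeric.sum())         # 6
```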


4. From a Dictionary of Series:

In [14]:
s1 = pd.Series([1, 2, 3], index=['row1', 'row2', 'row3'])
s2 = pd.Series(['x', 'y', 'z'], index=['row1', 'row2', 'row3'])
df = pd.DataFrame({'A': s1, 'B': s2})
print(df)

      A  B
row1  1  x
row2  2  y
row3  3  z


#### Key Attributes of a DataFrame
Access metadata using attributes:

- **.values**: Returns the underlying data as a NumPy array (the pandas documentation recommends .to_numpy() for new code).

In [17]:
print(df.values)

[[1 'x']
 [2 'y']
 [3 'z']]


- **.index**: Returns the row index.

In [20]:
print(df.index)

Index(['row1', 'row2', 'row3'], dtype='object')


- **.columns**: Returns the column labels.

In [23]:
print(df.columns)

Index(['A', 'B'], dtype='object')


- **.dtypes**: Returns the data type of each column.

In [26]:
print(df.dtypes)

A     int64
B    object
dtype: object


- **.shape**: Returns the dimensions (rows, columns).

In [29]:
print(df.shape)

(3, 2)


#### Indexing and Selection
DataFrames support flexible indexing for accessing rows, columns, or specific elements:

**1. Selecting Columns** (returns a Series):

In [32]:
print(df['A'])

row1    1
row2    2
row3    3
Name: A, dtype: int64


- **Multiple columns** (returns a DataFrame):

In [40]:
print(df[['A', 'B']])

      A  B
row1  1  x
row2  2  y
row3  3  z


**2. Label-Based Indexing** (using .loc):

In [43]:
print(df.loc['row1', 'A'])         # single value
print(df.loc['row1':'row2', 'A'])  # label slice; the end label is included

1
row1    1
row2    2
Name: A, dtype: int64


**3. Integer-Based Indexing** (using .iloc):

In [46]:
print(df.iloc[0, 0])    # single value at row 0, column 0
print(df.iloc[0:2, 0])  # position slice; the end position is excluded

1
row1    1
row2    2
Name: A, dtype: int64
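The two slicing styles differ in how they treat the end of the range: .loc includes the end label, while .iloc follows Python's convention and excludes the end position. A small sketch, assuming the same three-row DataFrame used above:

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3], "B": ["x", "y", "z"]},
    index=["row1", "row2", "row3"],
)

by_label = df.loc["row1":"row2", "A"]   # label slice: 'row2' is included
by_position = df.iloc[0:2, 0]           # position slice: position 2 is excluded

print(by_label.equals(by_position))     # True: both select row1 and row2
print(len(df.loc["row1":"row3"]))       # 3
print(len(df.iloc[0:2]))                # 2
```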


**4. Boolean Indexing:**

In [49]:
print(df[df['A'] > 1])

      A  B
row2  2  y
row3  3  z
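Conditions can be combined with & (and), | (or), and ~ (not). Each condition needs its own parentheses because these operators bind more tightly than comparisons. The following sketch assumes the same example DataFrame; the names mask, either, and in_set are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3], "B": ["x", "y", "z"]},
    index=["row1", "row2", "row3"],
)

# Both conditions must hold.
mask = (df["A"] > 1) & (df["B"] != "z")
print(df[mask].index.tolist())    # ['row2']

# Either condition may hold.
either = df[(df["A"] == 1) | (df["B"] == "z")]
print(either.index.tolist())      # ['row1', 'row3']

# isin tests membership in a collection of values.
in_set = df[df["A"].isin([1, 3])]
print(in_set.index.tolist())      # ['row1', 'row3']
```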


**5. Fast Scalar Access** (using .at or .iat):

In [52]:
print(df.at['row1', 'A'])  # label-based scalar lookup
print(df.iat[0, 0])        # position-based scalar lookup

1
1


#### Operations on DataFrames
DataFrames support vectorized operations, aggregation, and transformations.

**1. Arithmetic Operations**:

In [55]:
df['A'] = df['A'] + 10
print(df)

       A  B
row1  11  x
row2  12  y
row3  13  z
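Arithmetic between two DataFrames aligns on row and column labels rather than on position, which is the data alignment feature mentioned earlier. Labels present in only one operand produce NaN unless a fill value is supplied. A minimal sketch with two illustrative frames, left and right:

```python
import pandas as pd

left = pd.DataFrame({"A": [1, 2]}, index=["x", "y"])
right = pd.DataFrame({"A": [10, 20]}, index=["y", "z"])

total = left + right
print(total["A"].isna().tolist())   # [True, False, True]: only 'y' is in both
print(total.loc["y", "A"])          # 12.0

# add() with fill_value treats a label missing from one side as 0.
filled = left.add(right, fill_value=0)
print(filled["A"].tolist())         # [1.0, 12.0, 20.0]
```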


**2. Applying Functions**:

- Element-wise with map (applies a function to every element of the DataFrame; added in pandas 2.1, where it replaces the deprecated applymap):

In [59]:
df_numeric = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
print(df_numeric.map(lambda x: x * 2))

   A   B
0  2   8
1  4  10
2  6  12


- Column/row-wise with apply:

In [62]:
print(df_numeric.apply(lambda x: x.sum(), axis=0))

A     6
B    15
dtype: int64
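With axis=1, apply passes each row to the function instead of each column. The sketch below computes a per-row total that way and compares it with the equivalent vectorized expression, which is usually the faster choice when one exists. The names row_totals and vectorized are illustrative.

```python
import pandas as pd

df_numeric = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# axis=1: the lambda receives one row at a time as a Series.
row_totals = df_numeric.apply(lambda row: row["A"] + row["B"], axis=1)
print(row_totals.tolist())             # [5, 7, 9]

# The same result without a Python-level loop over rows.
vectorized = df_numeric["A"] + df_numeric["B"]
print(vectorized.tolist())             # [5, 7, 9]
```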


**3. Statistical Operations**:

In [65]:
print(df_numeric.mean())

print(df_numeric.describe())  # Summary statistics

A    2.0
B    5.0
dtype: float64
         A    B
count  3.0  3.0
mean   2.0  5.0
std    1.0  1.0
min    1.0  4.0
25%    1.5  4.5
50%    2.0  5.0
75%    2.5  5.5
max    3.0  6.0


**4. GroupBy Operations**:
- Split data into groups, apply a function, and combine results.

In [68]:
df = pd.DataFrame({'A': ['x', 'y', 'x'], 'B': [1, 2, 3]})
print(df.groupby('A').sum())

   B
A   
x  4
y  2
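Several aggregations can be computed in one pass by handing agg a list of function names. A short sketch on the same data, with summary as an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({"A": ["x", "y", "x"], "B": [1, 2, 3]})

# One row per group, one column per aggregation.
summary = df.groupby("A")["B"].agg(["sum", "mean", "count"])
print(summary)

print(summary.loc["x", "sum"])     # 4
print(summary.loc["x", "mean"])    # 2.0
print(summary.loc["y", "count"])   # 1
```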


#### Handling Missing Data
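DataFrames represent missing data as NaN in numeric columns and as None or NaN in object columns. A None placed in a numeric column is converted to NaN, which also turns an integer column into float64: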

- **Detecting**:

In [71]:
df = pd.DataFrame({'A': [1, None, 3], 'B': ['x', 'y', None]})
print(df.isna())

       A      B
0  False  False
1   True  False
2  False   True


- **Filling**:

In [74]:
print(df.fillna({'A': 0, 'B': 'z'}))

     A  B
0  1.0  x
1  0.0  y
2  3.0  z


- **Dropping**:

In [77]:
print(df.dropna())

     A  B
0  1.0  x
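Dropping every row that contains any missing value is often too aggressive. The sketch below shows two narrower options: limiting the check to chosen columns with subset, and filling a numeric column with its own mean. The names only_a and a_filled are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, np.nan, 3], "B": ["x", "y", None]})

print(df.isna().sum().tolist())        # [1, 1]: one missing value per column

# Drop rows only when column A is missing.
only_a = df.dropna(subset=["A"])
print(only_a.index.tolist())           # [0, 2]

# how='all' drops a row only if every value in it is missing (none here).
print(len(df.dropna(how="all")))       # 3

# Fill missing values in A with the mean of the observed values.
a_filled = df["A"].fillna(df["A"].mean())
print(a_filled.tolist())               # [1.0, 2.0, 3.0]
```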


#### Merging, Joining, and Concatenating
DataFrames support SQL-like operations to combine datasets:

**1. Concatenation** (stack vertically or horizontally):

In [80]:
df1 = pd.DataFrame({'A': [1, 2], 'B': ['x', 'y']})
df2 = pd.DataFrame({'A': [3, 4], 'B': ['z', 'w']})
print(pd.concat([df1, df2]))

   A  B
0  1  x
1  2  y
0  3  z
1  4  w
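Note that the original row labels are kept, so 0 and 1 each appear twice in the result above. Passing ignore_index=True renumbers the rows, and axis=1 places frames side by side instead of stacking them. A brief sketch; df3 is a hypothetical extra frame added only for the horizontal case.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2], "B": ["x", "y"]})
df2 = pd.DataFrame({"A": [3, 4], "B": ["z", "w"]})
df3 = pd.DataFrame({"C": [True, False]})   # hypothetical extra column

# Vertical stack with a fresh 0..n-1 index.
stacked = pd.concat([df1, df2], ignore_index=True)
print(stacked.index.tolist())    # [0, 1, 2, 3]

# Horizontal concatenation aligns rows on the index.
side = pd.concat([df1, df3], axis=1)
print(side.columns.tolist())     # ['A', 'B', 'C']
print(side.shape)                # (2, 3)
```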


**2. Merging** (SQL-like joins):

In [83]:
df1 = pd.DataFrame({'key': ['x', 'y'], 'A': [1, 2]})
df2 = pd.DataFrame({'key': ['x', 'z'], 'B': [3, 4]})
print(pd.merge(df1, df2, on='key', how='inner'))

  key  A  B
0   x  1  3
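The how parameter controls which keys survive the merge. As a sketch of the alternatives to an inner join, the example below runs a left join and an outer join on the same two frames, using indicator=True to record where each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["x", "y"], "A": [1, 2]})
df2 = pd.DataFrame({"key": ["x", "z"], "B": [3, 4]})

# Left join: keep every key from df1; unmatched rows get NaN in B.
left = pd.merge(df1, df2, on="key", how="left")
print(left["B"].isna().tolist())      # [False, True]

# Outer join: keep keys from both sides; _merge records each row's origin.
outer = pd.merge(df1, df2, on="key", how="outer", indicator=True)
origin = outer.set_index("key")["_merge"].astype(str).to_dict()
print(origin)   # {'x': 'both', 'y': 'left_only', 'z': 'right_only'}
```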


**3. Joining** (index-based):

In [86]:
df1 = pd.DataFrame({'A': [1, 2]}, index=['x', 'y'])
df2 = pd.DataFrame({'B': [3, 4]}, index=['x', 'z'])
print(df1.join(df2, how='inner'))

   A  B
x  1  3


#### Reshaping and Pivoting
DataFrames support reshaping for different data formats:

**1. Pivot** (long to wide format):

In [89]:
df = pd.DataFrame({'foo': ['one', 'one', 'two'], 'bar': ['A', 'B', 'A'], 'baz': [1, 2, 3]})
print(df.pivot(index='foo', columns='bar', values='baz'))

bar    A    B
foo          
one  1.0  2.0
two  3.0  NaN
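pivot requires each index/column pair to be unique and raises a ValueError otherwise. When duplicates are expected, pivot_table aggregates them instead. The sketch below uses an illustrative frame in which the pair ('one', 'A') appears twice; pivot_failed and wide are names chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "foo": ["one", "one", "one", "two"],
    "bar": ["A", "A", "B", "A"],
    "baz": [1, 2, 3, 4],
})

# pivot cannot decide which value to keep for the duplicated pair.
try:
    df.pivot(index="foo", columns="bar", values="baz")
    pivot_failed = False
except ValueError:
    pivot_failed = True
print(pivot_failed)               # True

# pivot_table combines the duplicates with the given aggregation.
wide = df.pivot_table(index="foo", columns="bar", values="baz", aggfunc="sum")
print(wide.loc["one", "A"])       # 3.0 (1 + 2)
print(wide.loc["two", "A"])       # 4.0
```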


**2. Melt** (wide to long format):

In [92]:
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
print(pd.melt(df, id_vars='A', value_vars='B'))

   A variable  value
0  1        B      3
1  2        B      4
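melt can unpivot several columns at once, and var_name and value_name replace the default variable and value headers. A minimal sketch with an illustrative frame; the column names id, measure, and reading are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "B": [3, 4], "C": [5, 6]})

long = pd.melt(
    df,
    id_vars="id",
    value_vars=["B", "C"],
    var_name="measure",     # header for the former column names
    value_name="reading",   # header for the former cell values
)
print(long)
print(long.shape)           # (4, 3): two rows for each melted column
```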


#### Time-Series Functionality
DataFrames excel at handling time-series data:

In [None]:
dates = pd.date_range('2023-01-01', periods=3)
df = pd.DataFrame({'value': [10, 20, 30]}, index=dates)
# 'ME' (month end) requires pandas 2.2+; earlier versions use 'M', which is now deprecated
print(df.resample('ME').mean())
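
A DatetimeIndex enables a few other common operations beyond monthly resampling. The sketch below, which assumes six days of illustrative data, shows date-string slicing, resampling into two-day bins, and a rolling mean. The '2D' frequency is chosen because it behaves the same across pandas versions.

```python
import pandas as pd

dates = pd.date_range("2023-01-01", periods=6)   # six consecutive days
df = pd.DataFrame({"value": [10, 20, 30, 40, 50, 60]}, index=dates)

# Date strings slice by label, so both endpoints are included.
window = df.loc["2023-01-02":"2023-01-04"]
print(window["value"].tolist())      # [20, 30, 40]

# Downsample into two-day bins and sum each bin.
two_day = df.resample("2D").sum()
print(two_day["value"].tolist())     # [30, 70, 110]

# Rolling mean over 3 rows; the first two results are NaN.
rolling = df["value"].rolling(window=3).mean()
print(rolling.tolist()[2:])          # [20.0, 30.0, 40.0, 50.0]
```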