## Reindexing and Iteration

### 1: Reindexing a DataFrame
#### This code creates a DataFrame and reindexes it to keep only rows 0, 2 and 5 and columns A, C and B, in that order. Labels that do not exist in the original DataFrame (here, column B) are filled with NaN, and the unlisted columns x, y and D are dropped.


```python
import pandas as pd
import numpy as np

N = 20

df = pd.DataFrame({
    'A': pd.date_range(start='2016-01-01', periods=N, freq='D'),
    'x': np.linspace(0, stop=N-1, num=N),
    'y': np.random.rand(N),
    'C': np.random.choice(['Low', 'Medium', 'High'], N).tolist(),
    'D': np.random.normal(100, 10, size=(N)).tolist()
})

df_reindexed = df.reindex(index=[0, 2, 5], columns=['A', 'C', 'B'])
print(df_reindexed)
```

```
           A     C   B
0 2016-01-01  High NaN
2 2016-01-03  High NaN
5 2016-01-06   Low NaN
```

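By default, reindex fills labels that are missing from the original with NaN. A minimal sketch, using a small hand-built DataFrame rather than the random data above, of the `fill_value` parameter, which substitutes a chosen value instead; it also checks that reindex returns a new object and leaves the original unchanged:

```python
import pandas as pd

# Small deterministic frame (illustrative values only)
df = pd.DataFrame({'A': [1, 2, 3], 'C': ['x', 'y', 'z']})

# Row 5 and column 'B' are not in df, so they receive fill_value instead of NaN
out = df.reindex(index=[0, 2, 5], columns=['A', 'C', 'B'], fill_value=0)
print(out)

# reindex returns a new DataFrame; df keeps its original labels
print(list(df.columns), list(df.index))
```
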

### 2: Reindex to Align with Other Objects
#### The reindex_like method conforms df1 to the row and column labels of df2. Because df2 has only 7 rows, rows 7 to 9 of df1 are dropped; any label present in df2 but missing from df1 would be filled with NaN.

```python
import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(10, 3), columns=['col1', 'col2', 'col3'])
df2 = pd.DataFrame(np.random.randn(7, 3), columns=['col1', 'col2', 'col3'])

df1 = df1.reindex_like(df2)
print(df1)
```

```
       col1      col2      col3
0  0.782214 -0.135319  0.182093
1  1.345943 -0.181890 -0.095614
2  0.504307 -0.001843  0.048715
3  0.751621 -0.697434 -0.282950
4 -0.612639  0.089298  0.574106
5 -0.733311  1.196204 -0.834162
6  0.112335  0.008859  0.175290
```

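The pandas documentation describes `reindex_like(other)` as equivalent to calling `reindex` with `other`'s index and columns. A small sketch (the frames `small` and `big` are hypothetical) that checks this equivalence and shows the NaN filling that happens when the target has labels the source lacks, the opposite of the row dropping seen above:

```python
import numpy as np
import pandas as pd

small = pd.DataFrame(np.arange(6.0).reshape(3, 2), columns=['col1', 'col2'])
big = pd.DataFrame(np.zeros((5, 3)), columns=['col1', 'col2', 'col3'])

# Conform small to big's row and column labels
a = small.reindex_like(big)
# The explicit two-axis reindex should give the same result
b = small.reindex(index=big.index, columns=big.columns)

print(a.equals(b))
print(a)  # rows 3 and 4, and all of col3, are NaN
```
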

### 3: Filling while Reindexing
#### This code reindexes df2 (2 rows) to match df1 (6 rows). Without a fill method, the new rows 2 to 5 contain NaN. With method='ffill', each new row instead takes the values of the last existing row before it (row 1).

```python
df1 = pd.DataFrame(np.random.randn(6, 3), columns=['col1', 'col2', 'col3'])
df2 = pd.DataFrame(np.random.randn(2, 3), columns=['col1', 'col2', 'col3'])

print(df2.reindex_like(df1))
print("Data Frame with Forward Fill:")
print(df2.reindex_like(df1, method='ffill'))
```

```
       col1      col2      col3
0 -0.451607 -1.448622  0.297669
1 -0.480563 -0.760197  0.363539
2       NaN       NaN       NaN
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN
Data Frame with Forward Fill:
       col1      col2      col3
0 -0.451607 -1.448622  0.297669
1 -0.480563 -0.760197  0.363539
2 -0.480563 -0.760197  0.363539
3 -0.480563 -0.760197  0.363539
4 -0.480563 -0.760197  0.363539
5 -0.480563 -0.760197  0.363539
```

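The `method` argument also accepts `'bfill'`, which copies the next existing label's values backward; both options require the original index to be monotonic. A minimal sketch on a small Series with a gap in its index (values chosen only for illustration):

```python
import pandas as pd

# Original labels 0 and 3; the target adds 1, 2 and 4
s = pd.Series([10.0, 20.0], index=[0, 3])
new_index = [0, 1, 2, 3, 4]

plain = s.reindex(new_index)                    # 1, 2 and 4 become NaN
ffilled = s.reindex(new_index, method='ffill')  # take the previous label's value
bfilled = s.reindex(new_index, method='bfill')  # take the next label's value

print(pd.DataFrame({'plain': plain, 'ffill': ffilled, 'bfill': bfilled}))
```

Label 4 stays NaN under `bfill` because no later label exists to copy from.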

### 4: Limits on Filling while Reindexing
#### Passing limit=1 caps the forward fill at one consecutive new row, so only row 2 is filled from row 1; rows 3 to 5 remain NaN.

```python
df1 = pd.DataFrame(np.random.randn(6, 3), columns=['col1', 'col2', 'col3'])
df2 = pd.DataFrame(np.random.randn(2, 3), columns=['col1', 'col2', 'col3'])

print(df2.reindex_like(df1))
print("Data Frame with Forward Fill limiting to 1:")
print(df2.reindex_like(df1, method='ffill', limit=1))
```

```
       col1      col2      col3
0  0.548612 -0.601636  0.355207
1 -0.080276  0.944091 -1.262286
2       NaN       NaN       NaN
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN
Data Frame with Forward Fill limiting to 1:
       col1      col2      col3
0  0.548612 -0.601636  0.355207
1 -0.080276  0.944091 -1.262286
2 -0.080276  0.944091 -1.262286
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN
```

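The limit counts consecutive filled labels after each original label, so it resets at every label that exists in the source. A sketch with a hypothetical Series whose index has a gap in the middle as well as new labels at the end:

```python
import pandas as pd

s = pd.Series([1.0, 2.0], index=[0, 3])
target = list(range(7))  # labels 0 to 6

# Each original label may fill at most one following new label
limited = s.reindex(target, method='ffill', limit=1)
print(limited)  # label 1 copies 0, label 4 copies 3; labels 2, 5 and 6 stay NaN
```
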

### 5: Renaming Columns and Rows
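#### This code renames the columns col1 to c1 and col2 to c2, and the row labels 0, 1 and 2 to apple, banana and durian. Labels not listed in a mapping (col3 and rows 3 to 5) are left unchanged, and rename returns a new DataFrame rather than modifying df1.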

```python
df1 = pd.DataFrame(np.random.randn(6, 3), columns=['col1', 'col2', 'col3'])
print(df1)

print("After renaming the rows and columns:")
print(df1.rename(columns={'col1': 'c1', 'col2': 'c2'},
                 index={0: 'apple', 1: 'banana', 2: 'durian'}))
```

```
       col1      col2      col3
0  0.391220 -0.998930 -1.385806
1  0.532465  1.544081 -0.386162
2  0.621266  0.040398  0.239738
3  0.909737  0.373803  1.372684
4 -0.448955  0.441225 -2.649434
5  0.914784 -2.399003 -0.565606
After renaming the rows and columns:
              c1        c2      col3
apple   0.391220 -0.998930 -1.385806
banana  0.532465  1.544081 -0.386162
durian  0.621266  0.040398  0.239738
3       0.909737  0.373803  1.372684
4      -0.448955  0.441225 -2.649434
5       0.914784 -2.399003 -0.565606
```

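rename also accepts a function instead of a dictionary, a single mapper combined with `axis`, and `errors='raise'` to reject keys that match no label. A minimal sketch of those three variants on a small hand-built DataFrame (the names `upper`, `fruit_rows` and `raised` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# A callable is applied to every label on the chosen axis
upper = df.rename(columns=str.upper)

# mapper + axis is an alternative spelling of index=/columns=
fruit_rows = df.rename({0: 'apple', 1: 'banana'}, axis='index')

# By default, keys that match no label are silently ignored;
# errors='raise' turns that into a KeyError, which helps catch typos
try:
    df.rename(columns={'colX': 'c'}, errors='raise')
    raised = False
except KeyError:
    raised = True

print(list(upper.columns), list(fruit_rows.index), raised)
```
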

### 6: Iterating Over DataFrame Columns
#### Iterating over a DataFrame directly yields its column labels, so this loop prints each column name.

```python
N = 20
df = pd.DataFrame({
    'A': pd.date_range(start='2016-01-01', periods=N, freq='D'),
    'x': np.linspace(0, stop=N-1, num=N),
    'y': np.random.rand(N),
    'C': np.random.choice(['Low', 'Medium', 'High'], N).tolist(),
    'D': np.random.normal(100, 10, size=(N)).tolist()
})

for col in df:
    print(col)
```

```
A
x
y
C
D
```

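Because a DataFrame behaves like a dict keyed by column label, membership tests follow the same rule as iteration. A minimal sketch on a small hand-built frame (illustrative values) confirming that iteration matches `df.columns` and that `in` checks column labels rather than cell values:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'x': [0.5, 1.5]})

# Iterating a DataFrame yields its column labels, in column order
labels = [col for col in df]
print(labels, labels == list(df.columns))

# Membership tests likewise check column labels, not cell values
print('A' in df, 1 in df)
```
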

### 7: Iterating Over DataFrame Columns with items()
#### The items() method iterates over the columns, yielding each column name (key) together with the column data (value) as a Series. It replaces iteritems(), which was deprecated in pandas 1.5 and removed in pandas 2.0.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(4, 3), columns=['col1', 'col2', 'col3'])
for key, value in df.items():
    print(key, value)
```

```
col1 0    0.362136
1    0.334225
2    0.331824
3   -0.863047
Name: col1, dtype: float64
col2 0   -0.081730
1   -1.362476
2    0.629966
3    0.112758
Name: col2, dtype: float64
col3 0    1.510487
1    1.592317
2    0.066958
3   -2.066370
Name: col3, dtype: float64
```

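Since each value yielded by items() is an ordinary Series, any Series method can be applied per column. A minimal sketch, with illustrative data, that builds a dictionary of column means this way and compares it with the equivalent vectorized call, which is usually preferable for simple reductions:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1.0, 3.0], 'col2': [2.0, 6.0]})

# items() yields (column label, column Series) pairs
means = {name: column.mean() for name, column in df.items()}
print(means)

# For a simple reduction like this, the vectorized call gives the same result
print(means == df.mean().to_dict())
```
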

### 8: Iterating Over DataFrame Rows with iterrows()
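#### The iterrows() method iterates over the rows, yielding each row index together with the row data as a Series. Because a Series holds a single dtype, values from columns of different dtypes may be converted (for example, integers are upcast to float64 when the row also contains floats).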

```python
df = pd.DataFrame(np.random.randn(4, 3), columns=['col1', 'col2', 'col3'])
for row_index, row in df.iterrows():
    print(row_index, row)
```

```
0 col1   -1.944606
col2    0.737124
col3    2.036539
Name: 0, dtype: float64
1 col1   -0.560856
col2    0.331619
col3   -0.769374
Name: 1, dtype: float64
2 col1   -0.532753
col2    0.275520
col3   -1.102272
Name: 2, dtype: float64
3 col1   -1.499655
col2   -1.938137
col3   -0.856091
Name: 3, dtype: float64
```

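The dtype conversion is easy to miss with all-float data like the example above. A minimal sketch with a hypothetical mixed-type frame showing the integer column coming back as float64 inside the row Series, while the column itself keeps its dtype:

```python
import pandas as pd

df = pd.DataFrame({'int_col': [1, 2], 'float_col': [0.5, 1.5]})

# Each row becomes one Series, so int_col is upcast to share float_col's dtype
_, first_row = next(df.iterrows())
print(first_row.dtype)      # float64
print(df['int_col'].dtype)  # int64: the column in df is unchanged
```
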

### 9: Iterating Over DataFrame Rows with itertuples()
#### The itertuples() method iterates over the rows, returning each row as a namedtuple (named Pandas by default) whose first field, Index, holds the row label. It is generally faster than iterrows() and preserves the dtypes of the values.

```python
df = pd.DataFrame(np.random.randn(4, 3), columns=['col1', 'col2', 'col3'])
for row in df.itertuples():
    print(row)
```

```
Pandas(Index=0, col1=0.8261887695976218, col2=0.9303177528653156, col3=0.10186352610818401)
Pandas(Index=1, col1=0.4225426788912268, col2=0.16867164231862153, col3=-0.2543201270325081)
Pandas(Index=2, col1=0.3869847530617681, col2=0.021062862361256148, col3=0.5448058466607731)
Pandas(Index=3, col1=0.07810378438912467, col2=0.8857889645730975, col3=0.04378990086095557)
```

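Fields of each namedtuple can be read by attribute, and the `index` and `name` parameters control the tuple's shape. A minimal sketch on a small hand-built frame with string row labels (illustrative values): `index=False` drops the Index field and `name=None` yields plain tuples:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [0.5, 1.5]}, index=['a', 'b'])

rows = list(df.itertuples())
for row in rows:
    # Index holds the row label; the remaining fields are named after the columns
    print(row.Index, row.col1, row.col2)

# Plain tuples of the values only, without the row label
plain = list(df.itertuples(index=False, name=None))
print(plain)
```
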

### 10: Modifying Rows While Iterating
#### This code tries to set a value on each row while iterating, but the change is not reflected in the original DataFrame. iterrows() yields each row as a separate Series, so row['a'] = 10 only changes that Series (here it adds a new label 'a' to it) and never adds a column to df. Modifications need to be applied to the DataFrame itself.

```python
df = pd.DataFrame(np.random.randn(4, 3), columns=['col1', 'col2', 'col3'])

# row is a separate Series; this assignment does not add column 'a' to df
for index, row in df.iterrows():
    row['a'] = 10
print(df)
```

```
       col1      col2      col3
0 -1.283132 -2.240721 -1.150408
1 -0.695113  0.313063  0.798360
2  0.915406  0.973817 -1.481904
3 -1.486234 -1.846104 -0.364376
```

#### Taken together, these examples cover common DataFrame operations: reindexing, renaming, and iterating over rows or columns.

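A minimal sketch of two ways to apply such a change to the DataFrame itself, using a small deterministic frame instead of random data (the columns `b` and `b_loop` are illustrative): a vectorized column assignment, and, when per-row logic is unavoidable, collecting results during iteration and assigning them once afterwards, which also avoids modifying the DataFrame while iterating over it:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1.0, -2.0, 3.0], 'col2': [0.0, 1.0, 2.0]})

# Vectorized: assign whole columns on the DataFrame itself
df['a'] = 10
df['b'] = df['col1'] * 2

# Row-wise fallback: compute values while iterating, then assign once
# afterwards, instead of writing into the row Series that iterrows() yields
doubled = [row['col1'] * 2 for _, row in df.iterrows()]
df['b_loop'] = doubled

print(df)
```
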