#### Setting and Resetting the Index

Setting the index in pandas allows you to specify which column(s) should be used as the index of the DataFrame. Resetting the index, on the other hand, moves the index back into the DataFrame as a column and creates a default integer index.

In [1]:
import pandas as pd

# Creating a sample dataframe
data = {'A': [1, 2, 3],
        'B': [4, 5, 6]}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Setting 'A' column as the index
df.set_index('A', inplace=True)
print("\nDataFrame after setting index:")
print(df)

# Resetting the index
df.reset_index(inplace=True)
print("\nDataFrame after resetting index:")
print(df)


Original DataFrame:
   A  B
0  1  4
1  2  5
2  3  6

DataFrame after setting index:
   B
A   
1  4
2  5
3  6

DataFrame after resetting index:
   A  B
0  1  4
1  2  5
2  3  6
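
Both methods also return a new DataFrame when inplace is omitted, which is often easier to chain. The following is a minimal sketch on the same sample data; the names indexed, restored, and dropped are illustrative, and drop=True is shown as one way to discard the old index rather than restore it as a column.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Without inplace=True, set_index returns a new DataFrame and leaves df untouched
indexed = df.set_index('A')
print(indexed.index.name)

# reset_index() moves 'A' back into the columns and restores a 0..n-1 index
restored = indexed.reset_index()
print(list(restored.columns))

# drop=True discards the old index instead of turning it into a column
dropped = indexed.reset_index(drop=True)
print(list(dropped.columns))
```

Returning a new object keeps the original DataFrame available for comparison, at the cost of holding two objects in memory.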


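#### Renaming the Index and Column Labels

You can rename index and column labels in pandas using the rename() method, passing a mapping from old labels to new ones. Labels that do not appear in the mapping are left unchanged, and mapping keys that match no label are ignored. In the example below, the index after the reset is 0, 1, 2, so label 0 is kept and the entry for 3 has no effect.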

In [2]:
# Renaming index and column labels
df.rename(index={1: 'one', 2: 'two', 3: 'three'}, columns={'A': 'Alpha', 'B': 'Beta'}, inplace=True)
print("\nDataFrame after renaming:")
print(df)


DataFrame after renaming:
     Alpha  Beta
0        1     4
one      2     5
two      3     6
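
As a small sketch of how rename() treats its mappers, the example below rebuilds the sample data and shows a dict mapper, a function mapper, and the errors='raise' option. The variable names renamed and lowered are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# A dict only renames the labels it contains; 3 is not an index label here,
# so that entry is silently ignored and label 0 stays as it is
renamed = df.rename(index={1: 'one', 2: 'two', 3: 'three'})
print(list(renamed.index))

# A function mapper is applied to every label
lowered = df.rename(columns=str.lower)
print(list(lowered.columns))

# errors='raise' turns a missing label into a KeyError instead of skipping it
try:
    df.rename(index={3: 'three'}, errors='raise')
except KeyError as exc:
    print("KeyError:", exc)
```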


#### Retrieving Data Using .loc and .iloc
The .loc[] indexer selects rows and columns by label, while .iloc[] selects them by integer position.

In [3]:
# Retrieving data using .loc[] and .iloc[]
print("\nUsing .loc[]:")
print(df.loc['one'])

print("\nUsing .iloc[]:")
print(df.iloc[0])


Using .loc[]:
Alpha    2
Beta     5
Name: one, dtype: int64

Using .iloc[]:
Alpha    1
Beta     4
Name: 0, dtype: int64
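
The difference matters most when slicing. The sketch below assumes a DataFrame with an all-string index ('zero', 'one', 'two'), chosen here for illustration rather than taken from the cells above, and contrasts label slices with positional slices.

```python
import pandas as pd

df = pd.DataFrame({'Alpha': [1, 2, 3], 'Beta': [4, 5, 6]},
                  index=['zero', 'one', 'two'])

# .loc slices by label and includes the end label
by_label = df.loc['one':'two', 'Beta']
print(by_label)

# .iloc slices by position and excludes the end position
by_position = df.iloc[1:2, 1]
print(by_position)

# A single cell: (row label, column label) vs. (row position, column position)
cell_loc = df.loc['two', 'Alpha']
cell_iloc = df.iloc[2, 0]
print(cell_loc == cell_iloc)
```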


#### Creating Random Sample with the sample() Method
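You can draw a random sample of rows (or of columns, with axis=1) using the sample() method. Because the selection is random, the rows shown in the output below will vary between runs unless random_state is set.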

In [4]:
# Creating a random sample
sample = df.sample(n=2)  # Get 2 random rows
print("\nRandom sample:")
print(sample)


Random sample:
     Alpha  Beta
two      3     6
0        1     4
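
To make a sample reproducible, a seed can be passed through random_state. This is a minimal sketch on freshly built sample data; the seed values 42 and 0 are arbitrary choices.

```python
import pandas as pd

df = pd.DataFrame({'Alpha': [1, 2, 3], 'Beta': [4, 5, 6]})

# The same random_state yields the same rows on every call
first = df.sample(n=2, random_state=42)
second = df.sample(n=2, random_state=42)
print(first.equals(second))

# frac draws a fraction of the rows; frac=1 shuffles the whole DataFrame
shuffled = df.sample(frac=1, random_state=0)
print(shuffled)

# axis=1 samples columns instead of rows
one_column = df.sample(n=1, axis=1, random_state=0)
print(one_column)
```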


#### Using the nsmallest()/nlargest() Methods
These methods return the n smallest or largest values from a Series, sorted in ascending or descending order respectively.

In [5]:
# Using the nsmallest()/nlargest() methods
smallest = df['Beta'].nsmallest(2)
largest = df['Beta'].nlargest(2)

print("\nSmallest values:")
print(smallest)

print("\nLargest values:")
print(largest)


Smallest values:
0      4
one    5
Name: Beta, dtype: int64

Largest values:
two    6
one    5
Name: Beta, dtype: int64
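
The same methods exist on DataFrames, where they return whole rows ordered by a chosen column. The sketch below uses deliberately unsorted sample values (an assumption for illustration) and compares the result with an equivalent sort_values() call.

```python
import pandas as pd

df = pd.DataFrame({'Alpha': [3, 1, 2], 'Beta': [4, 6, 5]})

# Rows holding the two largest Beta values, largest first
top_rows = df.nlargest(2, 'Beta')
print(top_rows)

# Row holding the smallest Alpha value
bottom_rows = df.nsmallest(1, 'Alpha')
print(bottom_rows)

# Equivalent result obtained by sorting and taking the head
same_top = df.sort_values('Beta', ascending=False).head(2)
print(top_rows.equals(same_top))
```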


#### Extraction Using the where() Method and query() Method
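The where() method keeps values where a condition is True and replaces the rest (with NaN by default), so the result has the same shape as the original; the inserted NaN values also cause the integer columns to be upcast to float. The query() method instead filters rows, returning only those that satisfy a condition written as a string expression.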

In [6]:
# Extraction using where method
where_result = df.where(df['Beta'] > 4)
print("\nExtraction using where method:")
print(where_result)

# Extraction using query method
query_result = df.query("Alpha > 1")
print("\nExtraction using query method:")
print(query_result)


Extraction using where method:
     Alpha  Beta
0      NaN   NaN
one    2.0   5.0
two    3.0   6.0

Extraction using query method:
     Alpha  Beta
one      2     5
two      3     6
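
A few related variations can be sketched as follows, on freshly built sample data: plain boolean indexing, where() with an explicit replacement value, and query() referencing a Python variable. The names mask, threshold, and the replacement value 0 are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({'Alpha': [1, 2, 3], 'Beta': [4, 5, 6]})

# Boolean indexing drops the rows that fail the condition
mask = df['Beta'] > 4
filtered = df[mask]
print(filtered)

# where() keeps the shape; other= supplies the replacement instead of NaN
replaced = df.where(df > 2, other=0)
print(replaced)

# query() can reference local Python variables with the @ prefix
threshold = 1
queried = df.query("Alpha > @threshold and Beta < 6")
print(queried)
```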


#### Extraction Using the apply() Method
The apply() method applies a function along an axis of the DataFrame: to each column by default (axis=0), or to each row with axis=1.

In [7]:
# Extraction using apply method
apply_result = df.apply(lambda x: x * 2)
print("\nExtraction using apply method:")
print(apply_result)


Extraction using apply method:
     Alpha  Beta
0        2     8
one      4    10
two      6    12
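
The effect of the axis argument can be sketched as below, on freshly built sample data. The lambda functions are arbitrary examples, and the final line shows a vectorized expression that gives the same row totals without apply().

```python
import pandas as pd

df = pd.DataFrame({'Alpha': [1, 2, 3], 'Beta': [4, 5, 6]})

# axis=0 (default): the function receives each column as a Series
col_range = df.apply(lambda col: col.max() - col.min())
print(col_range)

# axis=1: the function receives each row as a Series
row_total = df.apply(lambda row: row['Alpha'] + row['Beta'], axis=1)
print(row_total)

# Vectorized equivalent of the row-wise sum, usually faster on large data
vectorized = df['Alpha'] + df['Beta']
print(vectorized)
```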


#### Extraction Using the copy() Method
The copy() method returns a copy of the DataFrame. By default it is a deep copy (deep=True), so later changes to the copy do not affect the original.

In [8]:
# Extraction using copy method
copy_df = df.copy()
print("\nCopy of DataFrame:")
print(copy_df)


Copy of DataFrame:
     Alpha  Beta
0        1     4
one      2     5
two      3     6
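
The difference between a copy and a plain assignment can be sketched as follows. The names alias and copy_df, and the values written into them, are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Alpha': [1, 2, 3], 'Beta': [4, 5, 6]})

alias = df           # a second name for the same object
copy_df = df.copy()  # an independent deep copy

# Changing the copy leaves the original untouched
copy_df.loc[0, 'Alpha'] = 100
print(df.loc[0, 'Alpha'])

# Changing the alias changes the original, because they are the same object
alias.loc[0, 'Beta'] = -1
print(df.loc[0, 'Beta'])
```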


#### Understanding the Data by Using Mean, Median, Summary Statistics, and Cumulative Methods
Pandas provides methods to calculate the mean, the median, a table of summary statistics (describe()), and cumulative values such as running sums.

In [9]:
# Calculating mean, median, summary statistics, and cumulative values
print("\nMean:")
print(df.mean())

print("\nMedian:")
print(df.median())

print("\nSummary statistics:")
print(df.describe())

print("\nCumulative sum:")
print(df.cumsum())


Mean:
Alpha    2.0
Beta     5.0
dtype: float64

Median:
Alpha    2.0
Beta     5.0
dtype: float64

Summary statistics:
       Alpha  Beta
count    3.0   3.0
mean     2.0   5.0
std      1.0   1.0
min      1.0   4.0
25%      1.5   4.5
50%      2.0   5.0
75%      2.5   5.5
max      3.0   6.0

Cumulative sum:
     Alpha  Beta
0        1     4
one      3     9
two      6    15
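
Other cumulative methods follow the same pattern as cumsum(), and agg() can collect several statistics in one call. This is a minimal sketch on freshly built sample data; the choice of statistics is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({'Alpha': [1, 2, 3], 'Beta': [4, 5, 6]})

# Running maximum and running product, computed column by column
running_max = df.cummax()
running_product = df.cumprod()
print(running_product)

# agg() computes several statistics at once, one row per statistic
summary = df.agg(['mean', 'median', 'std'])
print(summary)
```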


#### Use of Groupby, Crosstab, and Pivot Tables
Groupby allows you to group data based on some criteria, crosstab computes a cross-tabulation of two or more factors, and pivot tables provide a way to summarize and aggregate data in a DataFrame.

In [10]:
# Groupby, crosstab, and pivot tables
grouped = df.groupby('Alpha').sum()
print("\nGroupby:")
print(grouped)

crosstab_result = pd.crosstab(df['Alpha'], df['Beta'])
print("\nCrosstab:")
print(crosstab_result)

# 'Beta' is the only remaining column, so aggregate it as the values;
# using it for columns= would leave nothing to aggregate
pivot_table = df.pivot_table(index='Alpha', values='Beta', aggfunc='sum')
print("\nPivot Table:")
print(pivot_table)


Groupby:
       Beta
Alpha      
1         4
2         5
3         6

Crosstab:
Beta   4  5  6
Alpha         
1      1  0  0
2      0  1  0
3      0  0  1

Pivot Table:
       Beta
Alpha      
1         4
2         5
3         6
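
These three tools are easier to tell apart on data with repeated categories. The sketch below uses a small hypothetical sales table (the column names region, product, and units are invented for illustration) with a separate numeric column to aggregate.

```python
import pandas as pd

# Hypothetical data: two categorical columns and one numeric column
sales = pd.DataFrame({
    'region':  ['North', 'North', 'South', 'South'],
    'product': ['apple', 'pear', 'apple', 'apple'],
    'units':   [10, 5, 7, 3],
})

# groupby: total units per region
totals = sales.groupby('region')['units'].sum()
print(totals)

# crosstab: how many rows fall into each region/product combination
counts = pd.crosstab(sales['region'], sales['product'])
print(counts)

# pivot_table: units summed per region (rows) and product (columns);
# fill_value=0 replaces combinations that have no rows
pivot = sales.pivot_table(index='region', columns='product',
                          values='units', aggfunc='sum', fill_value=0)
print(pivot)
```

Here crosstab counts occurrences, while pivot_table aggregates the units column, which is why the two tables differ even though they share the same row and column labels.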
