## pandas
- Used for exploratory data analysis (EDA)
- Used for data cleaning and preparation
- Importing and exporting data, creating and deleting columns

#### Topics:
- Series: difference between pandas **Series** and pandas **DataFrame**, accessing elements
- Series: statistical operations, element-wise functions, boolean filtering, mapping/transformation, missing values, arithmetic, etc.
- DataFrame: data collection -> read from CSV, XLSX, JSON, pickle, web
- DataFrame: access rows/columns (loc, iloc), slicing
- DataFrame: add column, drop column, add row, remove row, replace values in a column
- DataFrame: indexing

In [102]:
import pandas as pd
import numpy as np

print(pd.__version__)
data_file = 'data.csv'

2.2.0


# Series
## Difference between pandas Series and pandas DataFrame
```
Feature          Series                 DataFrame
Dimensions       1D                     2D
Shape            (n,)                   (rows, columns)
Data structure   Single column          Table with multiple columns
Index            Yes                    Yes (rows & columns)
Usage            Single column or row   Full dataset
```
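
The difference shows up directly in `ndim` and `shape`. The snippet below is a minimal sketch; the variable names and sample values are made up for illustration.

```python
import pandas as pd

# A Series is one-dimensional: a single labelled column of values.
s = pd.Series([10, 20, 30], name="score")

# A DataFrame is two-dimensional: several columns sharing one row index.
df = pd.DataFrame({"score": [10, 20, 30], "passed": [False, True, True]})

print(s.ndim, s.shape)    # 1 (3,)
print(df.ndim, df.shape)  # 2 (3, 2)

# Selecting one column of a DataFrame gives back a Series.
col = df["score"]
print(type(col).__name__)  # Series
```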

In [103]:
# Series declaration

s = pd.Series([10, 20, 30, 40]) # From a List
print(type(s)) # <class 'pandas.core.series.Series'>
print(s)

print("#####################")
s = pd.Series([10, 20, 30], index=['a', 'b', 'c']) # From a List with Custom Index
print(s)

print("$$$$$$$$$$$$$$$$$$$$$$$$$")
data = {'Alice': 25, 'Bob': 30, 'Charlie': 35} #  From a Dictionary
s = pd.Series(data)
print(s)
print("%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
s = pd.Series(np.random.randint(10,20,5))
print(s)


<class 'pandas.core.series.Series'>
0    10
1    20
2    30
3    40
dtype: int64
#####################
a    10
b    20
c    30
dtype: int64
$$$$$$$$$$$$$$$$$$$$$$$$$
Alice      25
Bob        30
Charlie    35
dtype: int64
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0    15
1    14
2    18
3    18
4    10
dtype: int32


In [104]:
# access series
s = pd.Series([10, 20, 30, 40, 50])
print(s[0])
print(s[:3])
print(s[2:])
print(s[1:3])


10
0    10
1    20
2    30
dtype: int64
2    30
3    40
4    50
dtype: int64
1    20
2    30
dtype: int64
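
With the default index 0, 1, 2, ... the label and the position of an element coincide, so `s[0]` looks unambiguous. Once the index holds other integers that is no longer true, and being explicit with `.loc` (label) or `.iloc` (position) avoids surprises. A small sketch with an illustrative, deliberately reversed index:

```python
import pandas as pd

# Integer labels that do not match the positions
s = pd.Series([10, 20, 30], index=[2, 1, 0])

print(s[0])       # 30: with an integer index, [] looks up the label 0
print(s.loc[0])   # 30: explicit label lookup
print(s.iloc[0])  # 10: explicit positional lookup
```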


### Operations on Series

In [105]:
# 1. Arithmetic Operations
s = pd.Series([10, 20, 30])

print(s + 5)       # Add 5 to each element
print(s * 2)       # Multiply each element by 2
print(s / 10)      # Divide each element by 10

# 2. Statistical Operations
s = pd.Series([10, 20, 30, 40])

print(s.mean())      # Average
print(s.median())    # Median
print(s.std())       # Standard deviation
print(s.max())       # Maximum
print(s.min())       # Minimum
print(s.sum())       # Sum of all elements
print(s.cumsum())
print(s.cumprod())
print(s.describe())
print(s.quantile(0.25))  # 25th percentile

print(s.sem())  # Standard error of the mean
print(s.nunique())  # Number of unique values

print(s.value_counts())  # Value counts (frequency of each unique value)

print(s.idxmin())  # Index of first min value
print(s.idxmax())  # Index of first max value



# 3. Element-wise Functions
s = pd.Series([1, 2, 3, 4])
print(np.sqrt(s))      # Square root
print(np.exp(s))       # Exponential
print(np.log(s))       # Logarithm

# 4. Boolean Filtering
s = pd.Series([10, 20, 30, 40])
print(s>25)
print(s[s > 25])       # Filter elements greater than 25

# 5. Value Counts & Uniqueness
s = pd.Series(['apple', 'banana', 'apple', 'orange'])
print(s.value_counts())   # Frequency of unique values
print(s.unique())         # Unique values

# 6. Mapping / Transformation
s = pd.Series([1, 2, 3])
print(s.map(lambda x: x * 10))  # Apply a function to each element

# 7. String Operations (for string Series)
s = pd.Series(['hello', 'my friend'])
print(s.str.upper())     # Convert to uppercase
print(s.str.len())       # Length of each string

# 8. Handling Missing Data
s = pd.Series([1, 2, None, 4])
print(s.isnull())        # Check for NaNs
print(s.fillna(0))       # Replace NaNs with 0
print(s.dropna())        # Drop NaNs

# 9. Sorting
s = pd.Series([10, 2, 30])
print(s.sort_values())   # Sort by value
print(s.sort_index())    # Sort by index

# 10. Combine / Arithmetic Between Series
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([4, 5, 6], index=['a', 'b', 'c'])
print(s1 + s2)   # Aligns by index and adds

0    15
1    25
2    35
dtype: int64
0    20
1    40
2    60
dtype: int64
0    1.0
1    2.0
2    3.0
dtype: float64
25.0
25.0
12.909944487358056
40
10
100
0     10
1     30
2     60
3    100
dtype: int64
0        10
1       200
2      6000
3    240000
dtype: int64
count     4.000000
mean     25.000000
std      12.909944
min      10.000000
25%      17.500000
50%      25.000000
75%      32.500000
max      40.000000
dtype: float64
17.5
6.454972243679028
4
10    1
20    1
30    1
40    1
Name: count, dtype: int64
0
3
0    1.000000
1    1.414214
2    1.732051
3    2.000000
dtype: float64
0     2.718282
1     7.389056
2    20.085537
3    54.598150
dtype: float64
0    0.000000
1    0.693147
2    1.098612
3    1.386294
dtype: float64
0    False
1    False
2     True
3     True
dtype: bool
2    30
3    40
dtype: int64
apple     2
banana    1
orange    1
Name: count, dtype: int64
['apple' 'banana' 'orange']
0    10
1    20
2    30
dtype: int64
0        HELLO
1    MY FRIEND
dtype: object
0    5
1
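
In the "Arithmetic Between Series" step above both Series share the same labels, so the alignment is invisible. The sketch below uses two illustrative Series whose indexes only partly overlap, to show what alignment does when labels are missing on one side, and how `Series.add` with `fill_value` changes that.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Labels present in only one Series produce NaN in the result.
total = s1 + s2
print(total)

# Series.add with fill_value treats a missing label as 0 instead.
filled = s1.add(s2, fill_value=0)
print(filled)
```

The result index is the union of both indexes, and the dtype becomes float64 because NaN is a float.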

## DataFrame

In [106]:
# Create a DataFrame: Example 1, hardcoded data
data = {
    'A': [1, 3, 6, 9],
    'B': [True, False, True, True],
    'C': ['ash', 'timmy', 'jimmy', 'Samantha'],
    
}
df = pd.DataFrame(data)

print(df)

   A      B         C
0  1   True       ash
1  3  False     timmy
2  6   True     jimmy
3  9   True  Samantha


In [107]:
# Create a DataFrame: Example 2, hardcoded data
data = {
    'A': pd.Series(pd.date_range("2023-01-01", periods=5, freq='D')),  # Time data
    'B': pd.Series([120.5, 123.0, 121.3, 125.6, 124.2]),
    'C': pd.Series(['Buy', 'Sell', 'Hold', 'Buy', 'Sell']),
    'D': pd.Series(np.random.randint(1, 11, size=5))
}
df = pd.DataFrame(data)

print(df)

           A      B     C   D
0 2023-01-01  120.5   Buy   3
1 2023-01-02  123.0  Sell  10
2 2023-01-03  121.3  Hold   6
3 2023-01-04  125.6   Buy   4
4 2023-01-05  124.2  Sell   3


In [108]:
# Create a DataFrame: Example 3, hardcoded data
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva', 'Frank', 'Grace', 'Helen', 'Helen', 'Helen', 'Jerry'],
    'Age': [25, 30, 35, 40, 22, 28, 32, 26, 26, 26, 23],
    'City': ['Chicago', 'Los Angeles', 'Chicago', 'Houston', 'Houston', np.nan, 'San Antonio', 'San Diego', 'San Diego', 'San Diego', 'Phoenix'],
    'Experience': [2, 5, 7, 10, 1, 3, 6, 2, 2, 2, 6],
    'Experience2': [2, 5, 7, 10, 1, 3, 6, 2, 2, 2, 6],
    'Salary': [70000.0, 80000.0, np.nan, 90000.0, 48000.0, 72000.0, 85000.0, 62000.0, 62000.0, 62000.0, 78000.0]
}

df = pd.DataFrame(data)
print(df)

       Name  Age         City  Experience  Experience2   Salary
0     Alice   25      Chicago           2            2  70000.0
1       Bob   30  Los Angeles           5            5  80000.0
2   Charlie   35      Chicago           7            7      NaN
3     David   40      Houston          10           10  90000.0
4       Eva   22      Houston           1            1  48000.0
5     Frank   28          NaN           3            3  72000.0
6     Grace   32  San Antonio           6            6  85000.0
7     Helen   26    San Diego           2            2  62000.0
8     Helen   26    San Diego           2            2  62000.0
9     Helen   26    San Diego           2            2  62000.0
10    Jerry   23      Phoenix           6            6  78000.0


In [109]:
series_age = df['Age']
print(type(series_age)) # <class 'pandas.core.series.Series'>
print(series_age)

<class 'pandas.core.series.Series'>
0     25
1     30
2     35
3     40
4     22
5     28
6     32
7     26
8     26
9     26
10    23
Name: Age, dtype: int64


In [110]:
row_labels = ['Row1', 'Row2', 'Row3', 'Row4']
column_headings = ['A', 'B', 'C', 'D', 'E']
data = np.random.randint(10, 100, size=(4, 5))

df = pd.DataFrame(data, index=row_labels, columns=column_headings)

print(f"Generated DataFrame:\n{df}")

Generated DataFrame:
       A   B   C   D   E
Row1  35  74  57  91  57
Row2  64  77  78  23  84
Row3  59  30  84  71  54
Row4  89  20  50  83  60


### Importing data from a file (and exporting)

In [111]:
# Build a DataFrame, then export it to CSV and other formats
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva', 'Frank', 'Grace', 'Helen', 'Helen', 'Helen', 'Jerry'],
    'Age': [25, 30, 35, 40, 22, 28, 32, 26, 26, 26, 23],
    'City': ['Chicago', 'Los Angeles', 'Chicago', 'Houston', 'Houston', np.nan, 'San Antonio', 'San Diego', 'San Diego', 'San Diego', 'Phoenix'],
    'Experience': [2, 5, 7, 10, 1, 3, 6, 2, 2, 2, 6],
    'Experience2': [2, 5, 7, 10, 1, 3, 6, 2, 2, 2, 6],
    'Salary': [70000.0, 80000.0, np.nan, 90000.0, 48000.0, 72000.0, 85000.0, 62000.0, 62000.0, 62000.0, 78000.0]
}

df = pd.DataFrame(data)
print(df)

# Different ways to export data to a file
df.to_csv(data_file, index=False)
df.to_excel('data.xlsx', index=False, sheet_name='Sheet1')
df.to_json('data.json', orient='records', lines=True)
df.to_pickle('data.pkl')

# Skip the following for now
# df.to_sql('my_table', conn, if_exists='replace', index=False)  # Requires SQLAlchemy or sqlite3


       Name  Age         City  Experience  Experience2   Salary
0     Alice   25      Chicago           2            2  70000.0
1       Bob   30  Los Angeles           5            5  80000.0
2   Charlie   35      Chicago           7            7      NaN
3     David   40      Houston          10           10  90000.0
4       Eva   22      Houston           1            1  48000.0
5     Frank   28          NaN           3            3  72000.0
6     Grace   32  San Antonio           6            6  85000.0
7     Helen   26    San Diego           2            2  62000.0
8     Helen   26    San Diego           2            2  62000.0
9     Helen   26    San Diego           2            2  62000.0
10    Jerry   23      Phoenix           6            6  78000.0
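
The exports above pass `index=False`. The sketch below shows what that argument changes, using an in-memory string instead of a file so nothing is written to disk; the two-row DataFrame is illustrative.

```python
import io
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# index=False: only the data columns are written.
without_index = df.to_csv(index=False)
print(without_index.splitlines()[0])  # Name,Age

# Default (index=True): the row labels become an extra, unnamed first column.
with_index = df.to_csv()
print(with_index.splitlines()[0])     # ,Name,Age

# Reading the second form back yields an 'Unnamed: 0' column ...
round_trip = pd.read_csv(io.StringIO(with_index))
print(round_trip.columns.tolist())    # ['Unnamed: 0', 'Name', 'Age']

# ... unless the first column is declared as the index.
restored = pd.read_csv(io.StringIO(with_index), index_col=0)
print(restored.columns.tolist())      # ['Name', 'Age']
```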


In [112]:
# Import from a CSV file with a comma separator
# from pathlib import Path
# data_file = Path.cwd() / 'data.csv'

df = pd.read_csv(data_file)

print("\nDataFrame1:\n", df)
# print(f"\nDataFrame1:\n{df}")


DataFrame1:
        Name  Age         City  Experience  Experience2   Salary
0     Alice   25      Chicago           2            2  70000.0
1       Bob   30  Los Angeles           5            5  80000.0
2   Charlie   35      Chicago           7            7      NaN
3     David   40      Houston          10           10  90000.0
4       Eva   22      Houston           1            1  48000.0
5     Frank   28          NaN           3            3  72000.0
6     Grace   32  San Antonio           6            6  85000.0
7     Helen   26    San Diego           2            2  62000.0
8     Helen   26    San Diego           2            2  62000.0
9     Helen   26    San Diego           2            2  62000.0
10    Jerry   23      Phoenix           6            6  78000.0


In [113]:
# different ways to import

# df = pd.read_csv('semi-colon.txt', sep=';') # file has , in the data
# print(df)

df = pd.read_csv('https://raw.githubusercontent.com/ash322ash422/tut_pandas_numpy/refs/heads/master/titanic.csv', sep=',')
print(df.head(5))

# df = pd.read_excel('data.xlsx', sheet_name='Sheet1')  # Requires openpyxl or xlrd
# print(df.head(5))


# df = pd.read_excel('data.xlsx', names = ['a', 'b', 'c', 'd', 'e', 'f'], skiprows=[1], sheet_name='Sheet1')  # Requires openpyxl or xlrd
# print(df.head(5))



# df = pd.read_json('data.json',  lines=True)
# print(df.head(5))

# df = pd.read_pickle('data.pkl')
# print(df.head(5))

# NOTE: before scraping a website, check that it is legal and allowed by the site's terms of use
# url = 'https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population'
# tables = pd.read_html(url)  # returns a list of DataFrames
# print(f"Total tables: {len(tables)}")
# df = tables[2]
# print(df.head(5))

# url = 'https://en.wikipedia.org/wiki/Minnesota'
# tables = pd.read_html(url)  # returns a list of DataFrames
# print(f"Total tables: {len(tables)}")
# df = tables[2]
# print(df.head(5))

# Skip the following for now
# import sqlite3
# conn = sqlite3.connect('my_database.db')
# df = pd.read_sql('SELECT * FROM my_table', conn)



   PassengerId  Survived  Pclass  \
0            1         0       3   
1            2         1       1   
2            3         1       3   
3            4         1       1   
4            5         0       3   

                                                Name     Sex   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1   
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                             Heikkinen, Miss. Laina  female  26.0      0   
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
4                           Allen, Mr. William Henry    male  35.0      0   

   Parch            Ticket     Fare Cabin Embarked  
0      0         A/5 21171   7.2500   NaN        S  
1      0          PC 17599  71.2833   C85        C  
2      0  STON/O2. 3101282   7.9250   NaN        S  
3      0            113803  53.1000  C123        S  
4      0            373450   8.0500   NaN        S  
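
The commented-out `sep=';'` call above is for files whose values themselves contain commas. A minimal sketch of that case, reading from an in-memory string; the content of `raw` is a made-up stand-in for such a file.

```python
import io
import pandas as pd

# Hypothetical file content: fields separated by ';', with a comma inside a value.
raw = "name;city\nAlice;Chicago, IL\nBob;Houston, TX\n"

df_semi = pd.read_csv(io.StringIO(raw), sep=";")
print(df_semi.shape)           # (2, 2)
print(df_semi.loc[0, "city"])  # Chicago, IL
```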


## Access columns

In [114]:
df = pd.read_csv(data_file)

# style1
series_age = df['Age']
print(type(series_age)) # <class 'pandas.core.series.Series'>
print(series_age)

# style2
series_age = df.Age
print(type(series_age)) # <class 'pandas.core.series.Series'>
print(series_age)

<class 'pandas.core.series.Series'>
0     25
1     30
2     35
3     40
4     22
5     28
6     32
7     26
8     26
9     26
10    23
Name: Age, dtype: int64
<class 'pandas.core.series.Series'>
0     25
1     30
2     35
3     40
4     22
5     28
6     32
7     26
8     26
9     26
10    23
Name: Age, dtype: int64


In [115]:
# Access multiple columns (a list of names returns a DataFrame)
df_temp = df[['Name', 'City']]
print(df_temp)

       Name         City
0     Alice      Chicago
1       Bob  Los Angeles
2   Charlie      Chicago
3     David      Houston
4       Eva      Houston
5     Frank          NaN
6     Grace  San Antonio
7     Helen    San Diego
8     Helen    San Diego
9     Helen    San Diego
10    Jerry      Phoenix


In [116]:
names_of_col = df.columns.tolist()
print(names_of_col)

['Name', 'Age', 'City', 'Experience', 'Experience2', 'Salary']
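
Both access styles shown earlier (`df['Age']` and `df.Age`) return the same Series, but the attribute style only works for some column names. The sketch below uses illustrative column names chosen to hit the two common failure cases: a name that clashes with a DataFrame method and a name that contains a space.

```python
import pandas as pd

df = pd.DataFrame({"Age": [25, 30], "count": [1, 2], "Home City": ["LA", "SD"]})

print(df.Age.tolist())           # [25, 30], same as df['Age']

# 'count' is also a DataFrame method, so df.count is the method, not the column.
print(callable(df.count))        # True
print(df["count"].tolist())      # [1, 2]

# A name with a space cannot be written as an attribute at all.
print(df["Home City"].tolist())  # ['LA', 'SD']
```

Bracket access works for every column name, which is why it is the safer default.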


### Slicing

In [117]:
# Slice rows from index 2 to 8
print("\nSlicing rows [2:9]:\n", df[2:9])  # Rows 2,3,... 8

# Slice rows from index 2 to 8, step 2
print("\nSlicing rows [2:9:2]:\n", df[2:9:2])  # Rows 2, 4, 6, 8

# Slice all rows with a step of 3
print("\nEvery 3rd row:\n", df[::3])  # Rows 0, 3, 6, 9

# Reverse the DataFrame
print("\nReversed DataFrame:\n", df[::-1])  # From last to first


# Example: a step larger than the slice doesn't raise an error
print("\nSlicing [1:3:20] with an oversized step just returns the first row in range:\n", df[1:3:20])  # Row 1 only (rows 1 and 2 are in range; step 20 jumps past row 2)


Slicing rows [2:9]:
       Name  Age         City  Experience  Experience2   Salary
2  Charlie   35      Chicago           7            7      NaN
3    David   40      Houston          10           10  90000.0
4      Eva   22      Houston           1            1  48000.0
5    Frank   28          NaN           3            3  72000.0
6    Grace   32  San Antonio           6            6  85000.0
7    Helen   26    San Diego           2            2  62000.0
8    Helen   26    San Diego           2            2  62000.0

Slicing rows [2:9:2]:
       Name  Age         City  Experience  Experience2   Salary
2  Charlie   35      Chicago           7            7      NaN
4      Eva   22      Houston           1            1  48000.0
6    Grace   32  San Antonio           6            6  85000.0
8    Helen   26    San Diego           2            2  62000.0

Every 3rd row:
     Name  Age         City  Experience  Experience2   Salary
0  Alice   25      Chicago           2            2  7000

## loc, iloc
- `.loc[]`: label-based selection
- `.iloc[]`: integer position-based selection

In [118]:
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 30, 35, 40, 22],
    'City': ['Chicago', 'LA', 'Chicago', 'Houston', 'Houston']
}

df = pd.DataFrame(data, index=['a', 'b', 'c', 'd', 'e'])
print(df)
#############################################
# 1. Access a single row by label:
df.loc['b']

# 2. Access multiple rows by labels:
df.loc[['a', 'c', 'e']]

# 3. Access specific rows and columns:
df.loc[['a', 'b'], ['Name', 'City']]

# 4. Filter rows with a condition:
df.loc[df['Age'] > 30]
df.loc[df['Age'] > 30 , ['Name', 'City']]
###########################################
# 1. Access a single row by index:
df.iloc[1]  # second row

# 2. Access a specific cell (row 1, column 2):
df.iloc[1, 2]  # LA

# 3. Access multiple rows and columns:
df.iloc[0:3, 0:2]  # First 3 rows, first 2 columns

# 4. Modify a value:
df.iloc[0, 1] = 26  # Change Alice's age from 25 to 26

      Name  Age     City
a    Alice   25  Chicago
b      Bob   30       LA
c  Charlie   35  Chicago
d    David   40  Houston
e      Eva   22  Houston
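
One difference between the two accessors that the examples above do not show is how slices behave: a `.loc` slice includes its end label, while an `.iloc` slice excludes its end position, like a normal Python slice. A minimal sketch on a small illustrative DataFrame:

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]},
    index=["a", "b", "c"],
)

by_label = df.loc["a":"b"]   # rows 'a' and 'b': end label included
by_position = df.iloc[0:1]   # row at position 0 only: end excluded

print(by_label.index.tolist())     # ['a', 'b']
print(by_position.index.tolist())  # ['a']

# Assigning through .loc updates the DataFrame by label.
df.loc["a", "Age"] = 26
print(df.loc["a", "Age"])          # 26
```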


## Add rows/columns; remove rows/columns

In [119]:
df = pd.read_csv('data.csv')
print(df)

       Name  Age         City  Experience  Experience2   Salary
0     Alice   25      Chicago           2            2  70000.0
1       Bob   30  Los Angeles           5            5  80000.0
2   Charlie   35      Chicago           7            7      NaN
3     David   40      Houston          10           10  90000.0
4       Eva   22      Houston           1            1  48000.0
5     Frank   28          NaN           3            3  72000.0
6     Grace   32  San Antonio           6            6  85000.0
7     Helen   26    San Diego           2            2  62000.0
8     Helen   26    San Diego           2            2  62000.0
9     Helen   26    San Diego           2            2  62000.0
10    Jerry   23      Phoenix           6            6  78000.0


In [120]:
# Add a record (row) at the next integer label
df.loc[len(df)] = ['Mihindou', 29, 'Tokyo', 4, 4, 67000.0]
print(df)

        Name  Age         City  Experience  Experience2   Salary
0      Alice   25      Chicago           2            2  70000.0
1        Bob   30  Los Angeles           5            5  80000.0
2    Charlie   35      Chicago           7            7      NaN
3      David   40      Houston          10           10  90000.0
4        Eva   22      Houston           1            1  48000.0
5      Frank   28          NaN           3            3  72000.0
6      Grace   32  San Antonio           6            6  85000.0
7      Helen   26    San Diego           2            2  62000.0
8      Helen   26    San Diego           2            2  62000.0
9      Helen   26    San Diego           2            2  62000.0
10     Jerry   23      Phoenix           6            6  78000.0
11  Mihindou   29        Tokyo           4            4  67000.0
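
`df.loc[len(df)] = ...` appends one row and relies on the index being the default 0..n-1; if a row with that label already exists, it is overwritten instead. For adding several rows at once, `pd.concat` is one alternative. The sketch below uses made-up names and ages.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# New rows as a list of dicts, one dict per row
new_rows = pd.DataFrame([
    {"Name": "Carol", "Age": 41},
    {"Name": "Dan", "Age": 37},
])

# ignore_index=True renumbers the rows 0..n-1 in the combined result.
combined = pd.concat([df, new_rows], ignore_index=True)
print(combined)
print(len(df), len(combined))  # 2 4
```

`pd.concat` returns a new DataFrame and leaves `df` unchanged.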


In [121]:
# Drop a row by its index label
df = df.drop([11])
print(df)

       Name  Age         City  Experience  Experience2   Salary
0     Alice   25      Chicago           2            2  70000.0
1       Bob   30  Los Angeles           5            5  80000.0
2   Charlie   35      Chicago           7            7      NaN
3     David   40      Houston          10           10  90000.0
4       Eva   22      Houston           1            1  48000.0
5     Frank   28          NaN           3            3  72000.0
6     Grace   32  San Antonio           6            6  85000.0
7     Helen   26    San Diego           2            2  62000.0
8     Helen   26    San Diego           2            2  62000.0
9     Helen   26    San Diego           2            2  62000.0
10    Jerry   23      Phoenix           6            6  78000.0


In [122]:
df['company'] = "Lucent Technologies" # create a new column with same values all across rows
print("\nAfter :\n",df)


After :
        Name  Age         City  Experience  Experience2   Salary  \
0     Alice   25      Chicago           2            2  70000.0   
1       Bob   30  Los Angeles           5            5  80000.0   
2   Charlie   35      Chicago           7            7      NaN   
3     David   40      Houston          10           10  90000.0   
4       Eva   22      Houston           1            1  48000.0   
5     Frank   28          NaN           3            3  72000.0   
6     Grace   32  San Antonio           6            6  85000.0   
7     Helen   26    San Diego           2            2  62000.0   
8     Helen   26    San Diego           2            2  62000.0   
9     Helen   26    San Diego           2            2  62000.0   
10    Jerry   23      Phoenix           6            6  78000.0   

                company  
0   Lucent Technologies  
1   Lucent Technologies  
2   Lucent Technologies  
3   Lucent Technologies  
4   Lucent Technologies  
5   Lucent Technologies  
6  

In [123]:
# Let's drop the 'company' column added above, along with the duplicate 'Experience2' column
df.drop(columns=['company','Experience2'], inplace=True)
print("\nAfter dropping a column:\n",df)


After dropping a column:
        Name  Age         City  Experience   Salary
0     Alice   25      Chicago           2  70000.0
1       Bob   30  Los Angeles           5  80000.0
2   Charlie   35      Chicago           7      NaN
3     David   40      Houston          10  90000.0
4       Eva   22      Houston           1  48000.0
5     Frank   28          NaN           3  72000.0
6     Grace   32  San Antonio           6  85000.0
7     Helen   26    San Diego           2  62000.0
8     Helen   26    San Diego           2  62000.0
9     Helen   26    San Diego           2  62000.0
10    Jerry   23      Phoenix           6  78000.0


In [124]:
df['City'] = df['City'].replace({'Los Angeles': 'LA', 'San Diego': 'SD'})
print(df)

       Name  Age         City  Experience   Salary
0     Alice   25      Chicago           2  70000.0
1       Bob   30           LA           5  80000.0
2   Charlie   35      Chicago           7      NaN
3     David   40      Houston          10  90000.0
4       Eva   22      Houston           1  48000.0
5     Frank   28          NaN           3  72000.0
6     Grace   32  San Antonio           6  85000.0
7     Helen   26           SD           2  62000.0
8     Helen   26           SD           2  62000.0
9     Helen   26           SD           2  62000.0
10    Jerry   23      Phoenix           6  78000.0
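
`replace` with a dict leaves every value that is not a key untouched, which is why 'Chicago' and 'Houston' survive above. `map` with the same dict behaves differently: values without a match become NaN. A short sketch with illustrative data:

```python
import pandas as pd

city = pd.Series(["Los Angeles", "Chicago", "San Diego"])
short = {"Los Angeles": "LA", "San Diego": "SD"}

replaced = city.replace(short)  # unmatched values are kept
mapped = city.map(short)        # unmatched values become NaN

print(replaced.tolist())  # ['LA', 'Chicago', 'SD']
print(mapped.tolist())    # ['LA', nan, 'SD']
```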


## Indexing

In [2]:
import pandas as pd

# Sample DataFrame
data = {
    'Name': ['Rahim', 'Alice', 'Timmy', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'LA', 'Chicago', 'Houston']
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
print("000000000000000000000")

# 1. Set a column as the index
df_indexed = df.set_index('Name')
print("\nSet 'Name' as index:")
print(df_indexed)
print("11111111111111111111")

# 2. Reset index
df_reset = df_indexed.reset_index()
print("\nReset index:")
print(df_reset)
print("22222222222222222222222")

# 3. Set index inplace
df.set_index('City', inplace=True)
print("\nSet 'City' as index inplace:")
print(df)
print("3333333333333333333333333")

# 4. Change the index manually
df.index = ['row1', 'row2', 'row3', 'row4'] 
print("\nManual index change:")
print(df)

# 5. Accessing rows using index
df = df_reset  # Resetting to original

# Set index to 'Name'
df.set_index('Name', inplace=True)

print("\nAccess row using label 'Alice':")
print(df.loc['Alice'])

print("\nAccess row using position 0:")
print(df.iloc[0])

# 6. Multi-indexing
df_reset = df.reset_index()
df_multi = df_reset.set_index(['City', 'Name'])
print("\nMulti-indexed DataFrame:")
print(df_multi)

# 7. Sorting index
df_sorted = df.sort_index()
print("\nSorted by index:")
print(df_sorted)


Original DataFrame:
    Name  Age      City
0  Rahim   25  New York
1  Alice   30        LA
2  Timmy   35   Chicago
3  David   40   Houston
000000000000000000000

Set 'Name' as index:
       Age      City
Name                
Rahim   25  New York
Alice   30        LA
Timmy   35   Chicago
David   40   Houston
11111111111111111111

Reset index:
    Name  Age      City
0  Rahim   25  New York
1  Alice   30        LA
2  Timmy   35   Chicago
3  David   40   Houston
22222222222222222222222

Set 'City' as index inplace:
           Name  Age
City                
New York  Rahim   25
LA        Alice   30
Chicago   Timmy   35
Houston   David   40
3333333333333333333333333

Manual index change:
       Name  Age
row1  Rahim   25
row2  Alice   30
row3  Timmy   35
row4  David   40

Access row using label 'Alice':
Age     30
City    LA
Name: Alice, dtype: object

Access row using position 0:
Age           25
City    New York
Name: Rahim, dtype: object

Multi-indexed DataFrame:
                Age
Cit