# Pandas

## 1. Introduction to Pandas
Pandas is a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation library, built on top of the Python programming language.

**Why Pandas?**

While NumPy is great for homogeneous numerical data, Pandas offers labeled data structures, making it far superior for handling the messy, real-world data typically found in spreadsheets or databases.

**Labeled Data**: It allows you to refer to data by meaningful column names (like 'Date', 'Sales', 'Product ID') and row names (indices).

**Missing Data Handling**: Pandas handles missing data (represented as **NaN**) easily.

**Data Alignment**: Operations automatically align data by labels, preventing calculation errors when working with non-aligned datasets.
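As a minimal sketch of label alignment, the example below adds two made-up Series whose labels only partly overlap (`store_a` and `store_b` are illustrative names, not data used elsewhere in this tutorial). Values are matched by label rather than by position, and a label present in only one Series yields NaN:

```python
import pandas as pd

# Two hypothetical Series with partially overlapping labels in different orders
store_a = pd.Series([10, 20, 30], index=['apple', 'banana', 'cherry'])
store_b = pd.Series([1, 2, 3], index=['cherry', 'apple', 'date'])

# Addition pairs values by label, not by position;
# labels found in only one Series produce NaN
total = store_a + store_b
print(total)
```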

Pandas introduces two core data structures:

**Series**: A 1D labeled array (think of a single column of data). It is built on a 1D NumPy array.

**DataFrame**: A 2D labeled data structure with columns of potentially different types (think of an entire spreadsheet or SQL table). Each column is a Series, so the underlying data is typically held in NumPy arrays.

## 2. The Pandas Series (1D Labeled Array)
A **Series** is like a 1D NumPy array but with an explicit index associated with each element.

### 2.1 Creating a **Series**
We start by importing Pandas, conventionally aliased as **pd**.

In [1]:
import pandas as pd # importing pandas as pd
import numpy as np # importing numpy as np

In [2]:
# creating numpy array
numpy_array=np.array([10,15,20,25,30])
print(f"Numpy array: {numpy_array}")

# creating pandas series with numpy array
pandas_series=pd.Series(numpy_array)
print("Pandas series: ")
print(pandas_series)

Numpy array: [10 15 20 25 30]
Pandas series: 
0    10
1    15
2    20
3    25
4    30
dtype: int64


**Explanation**: Notice the two columns. The left column (**0, 1, 2, 3, 4**) is the Index (the labels). The right column holds the Values (the data itself).
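The two parts of a Series can also be inspected separately. A small sketch, rebuilding the same Series so the snippet runs on its own:

```python
import numpy as np
import pandas as pd

pandas_series = pd.Series(np.array([10, 15, 20, 25, 30]))

# The labels: a RangeIndex is created by default when no index is given
print(pandas_series.index)

# The values, returned as a plain NumPy array
print(pandas_series.to_numpy())
```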

### 2.2 Adding Custom Labels (Index)

In [3]:
# creating series with custom labels (index)
sales_data=[11,21,31,41,51,61,71]
sales_labels=['sun','mon','tue','wed','thu','fri','sat']


sales_series=pd.Series(data=sales_data, index=sales_labels)
print("Week sales series: ")
print(sales_series)

Week sales series: 
sun    11
mon    21
tue    31
wed    41
thu    51
fri    61
sat    71
dtype: int64


**Key Takeaway**: The data is stored efficiently in a NumPy array, but the labels make it easy to understand and access.

### 2.3 Series Indexing and Operations
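You can select elements by integer position, as with NumPy arrays, or by their custom labels. With a labeled index, use **.iloc** for positions, because plain integer keys such as **sales_series[1]** are deprecated for positional access in recent Pandas versions.

In [4]:
# Accessing data using label indexing (like a dictionary key)
wed_sales=sales_series['wed']
print(f"Wed sales: {wed_sales}")
# Accessing data using position indexing (like a list index) via .iloc
mon_sales=sales_series.iloc[1]
print(f"Mon sales: {mon_sales}")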

Wed sales: 41
Mon sales: 21
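Beyond single-element access, a Series supports NumPy-style vectorized operations and label-based slicing. The sketch below rebuilds the weekly sales Series so it is self-contained; the variable names `doubled`, `high_days`, and `midweek` are illustrative:

```python
import pandas as pd

sales_series = pd.Series(
    [11, 21, 31, 41, 51, 61, 71],
    index=['sun', 'mon', 'tue', 'wed', 'thu', 'fri', 'sat'],
)

# Arithmetic is applied to every element; the labels are preserved
doubled = sales_series * 2

# A Boolean condition keeps only the matching elements
high_days = sales_series[sales_series > 40]

# Label slices with .loc include BOTH endpoints
midweek = sales_series.loc['tue':'thu']

print(doubled)
print(high_days)
print(midweek)
```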




## 3. The Pandas DataFrame (2D Labeled Structure)

A **DataFrame** is the central Pandas object. It is a collection of **Series** objects that share the same index, effectively forming a table.

### 3.1 Creating a DataFrame
The most common way to create a **DataFrame** manually is from a dictionary of Python lists or NumPy arrays. The keys of the dictionary become the column names.

In [5]:
# Data structured as a dictionary (Keys are Column Headers)
students_data = {
    'Exam Score': [85, 92, 78, 95, 88],
    'Attendance': [95, 98, 85, 100, 92],
    'Result': ['Pass', 'Pass', 'Fail', 'Pass', 'Pass']
}

# creating a DataFrame with dict
df_students=pd.DataFrame(students_data)
print("students DataFrame: ")
print(df_students)

students DataFrame: 
   Exam Score  Attendance Result
0          85          95   Pass
1          92          98   Pass
2          78          85   Fail
3          95         100   Pass
4          88          92   Pass


### 3.2 Creating Custom Row Labels (Index)

In [6]:
# Set custom row labels (Student IDs)
students_ids = ['S101', 'S102', 'S103', 'S104', 'S105']

df_students=pd.DataFrame(students_data, index=students_ids)
print("Students DataFrame: ")
print(df_students)

Students DataFrame: 
      Exam Score  Attendance Result
S101          85          95   Pass
S102          92          98   Pass
S103          78          85   Fail
S104          95         100   Pass
S105          88          92   Pass
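To illustrate the idea that a DataFrame is a collection of Series sharing one index, the sketch below pulls out a single column and checks its type and index. The DataFrame is rebuilt here so the snippet runs independently:

```python
import pandas as pd

df_students = pd.DataFrame(
    {
        'Exam Score': [85, 92, 78, 95, 88],
        'Attendance': [95, 98, 85, 100, 92],
        'Result': ['Pass', 'Pass', 'Fail', 'Pass', 'Pass'],
    },
    index=['S101', 'S102', 'S103', 'S104', 'S105'],
)

# Selecting one column returns a Series
scores = df_students['Exam Score']
print(type(scores).__name__)

# The column carries the same row labels as the DataFrame
print(scores.index.equals(df_students.index))

# Each column has its own data type
print(df_students.dtypes)
```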


## 4. Selection and Filtering with .loc and .iloc
In Pandas, direct bracket indexing (like **df['Column']**) is used mainly for selecting columns. For selecting rows and columns together, especially with filtering, we use the special accessors **.loc** and **.iloc**.

Syntax: **df.loc[row\_label\_selection, column\_label\_selection]**

In [7]:
# Data structured as a dictionary (Keys are Column Headers)
week_sales_data = {
    'Stock': [105,110,115,120,125],
    'Price': [21000,65000,1500,2000,5000],
    'Sun': [45000, 32000, 78000, 95000, 88000],
    'Mon': [85000, 92000, 68000, 75000, 98000],
    'Tue': [55000, 98000, 85000, 100000, 92000],
    'Wed': [65000, 62000, 48000, 35000, 58000],
    'Thu': [65000, 68000, 65000, 100000, 72000],
    'Fri': [35000, 82000, 88000, 85000, 58000],
    'Sat': [95000, 68000, 95000, 100000, 92000]
}
Product=['Phone','Laptop','Mouse','Keyboard','Speaker']
# create Dataframe with Dict
df_sales= pd.DataFrame(week_sales_data, index=Product)
print(df_sales)

          Stock  Price    Sun    Mon     Tue    Wed     Thu    Fri     Sat
Phone       105  21000  45000  85000   55000  65000   65000  35000   95000
Laptop      110  65000  32000  92000   98000  62000   68000  82000   68000
Mouse       115   1500  78000  68000   85000  48000   65000  88000   95000
Keyboard    120   2000  95000  75000  100000  35000  100000  85000  100000
Speaker     125   5000  88000  98000   92000  58000   72000  58000   92000


#### Adding the Total_value Column (Vectorised Operation)

This is where the NumPy foundation shines! You create a new column by multiplying two existing Series objects (columns). The multiplication is applied element-wise, with rows matched by their index labels, so no explicit loop is needed.

In [8]:
# Create a new column, Total_value (Stock multiplied by Price)
# Element-wise multiplication of two columns (vectorised, no loop)
df_sales['Total_value']= df_sales['Stock'] * df_sales['Price']
print(df_sales)
print(df_sales)

          Stock  Price    Sun    Mon     Tue    Wed     Thu    Fri     Sat  \
Phone       105  21000  45000  85000   55000  65000   65000  35000   95000   
Laptop      110  65000  32000  92000   98000  62000   68000  82000   68000   
Mouse       115   1500  78000  68000   85000  48000   65000  88000   95000   
Keyboard    120   2000  95000  75000  100000  35000  100000  85000  100000   
Speaker     125   5000  88000  98000   92000  58000   72000  58000   92000   

          Total_value  
Phone         2205000  
Laptop        7150000  
Mouse          172500  
Keyboard       240000  
Speaker        625000  


### 4.1 Accessor .loc (Label-based Indexing)
The **.loc** accessor uses labels (index names and column names) for selection.

**Syntax**: **df.loc[row\_label\_selection, column\_label\_selection]**

In [9]:
# Example 1: Selecting a Single Row and Single Column by Label
# Access the 'Stock' value for 'Keyboard'
keyboard_stock = df_sales.loc['Keyboard', 'Stock']
print(f"1. Keyboard Stock (loc): {keyboard_stock}")

# Example 2: Selecting Multiple Rows and Multiple Columns by Label
# Access 'Stock' and 'Price' for 'Laptop' and 'Speaker'
laptop_and_speaker_data = df_sales.loc[['Laptop', 'Speaker'], ['Stock', 'Price']]
print(f"\n2. Laptop & Speaker Data (loc):\n{laptop_and_speaker_data}")

# Example 3: Slicing by Label (inclusive!)
# Get all data from 'Mouse' through 'Speaker' (both endpoints are included with .loc)
mouse_to_speaker = df_sales.loc['Mouse':'Speaker', :] 
print(f"\n3. Data from Mouse through Speaker (loc slice):\n{mouse_to_speaker}")

1. Keyboard Stock (loc): 120

2. Laptop & Speaker Data (loc):
         Stock  Price
Laptop     110  65000
Speaker    125   5000

3. Data from Mouse through Speaker (loc slice):
          Stock  Price    Sun    Mon     Tue    Wed     Thu    Fri     Sat  \
Mouse       115   1500  78000  68000   85000  48000   65000  88000   95000   
Keyboard    120   2000  95000  75000  100000  35000  100000  85000  100000   
Speaker     125   5000  88000  98000   92000  58000   72000  58000   92000   

          Total_value  
Mouse          172500  
Keyboard       240000  
Speaker        625000  


### 4.2 Accessor .iloc (Integer-based Indexing)
The **.iloc** accessor uses integer positions (starting from 0) for selection, just like NumPy array indexing. This is useful when you don't care about the labels, only the position.

Syntax: **df.iloc[row\_position\_selection, column\_position\_selection]**

In [10]:
# Example 4: Selecting a Single Cell by Position
# Access the 'Price' (column position 1) for the 4th item ('Keyboard', row position 3)
keyboard_price_iloc = df_sales.iloc[3, 1] 
print(f"\n4. Keyboard Price (iloc): {keyboard_price_iloc}")

# Example 5: Selecting a Subset using Slicing (exclusive!)
# Get the first two rows (positions 0 and 1) and the columns at positions 1 and 2 ('Price' and 'Sun')
subset_iloc = df_sales.iloc[0:2, 1:3]
print(f"\n5. First two items, Price and Sun (iloc slice):\n{subset_iloc}")


4. Keyboard Price (iloc): 2000

5. First two items, Price and Sun (iloc slice):
        Price    Sun
Phone   21000  45000
Laptop  65000  32000


### 4.3 Conditional Filtering (Boolean Indexing in Pandas)
This is the most common use case and directly leverages your knowledge of NumPy's Boolean Indexing!

Concept: You create a Boolean Series (a column of **True/False** values) by applying a condition to a column. You then pass that Boolean Series to **.loc** to filter the rows.

Syntax: **df.loc[df['Column'] operator value]**

In [11]:
# Find all items where the Price is greater than $10000
# Step 1: Create the Boolean Series (the mask)
price_filter_mask = df_sales['Price'] > 10000
print(f"\nPrice Filter Mask (Boolean Series):\n{price_filter_mask}")

# Step 2: Use .loc to apply the mask to the rows
expensive_items = df_sales.loc[price_filter_mask]

# Or in one line:
expensive_items = df_sales.loc[df_sales['Price'] > 10000]

print(f"\n6. Items with Price > $10000 (Filtered):\n{expensive_items}")


Price Filter Mask (Boolean Series):
Phone        True
Laptop       True
Mouse       False
Keyboard    False
Speaker     False
Name: Price, dtype: bool

6. Items with Price > $10000 (Filtered):
        Stock  Price    Sun    Mon    Tue    Wed    Thu    Fri    Sat  \
Phone     105  21000  45000  85000  55000  65000  65000  35000  95000   
Laptop    110  65000  32000  92000  98000  62000  68000  82000  68000   

        Total_value  
Phone       2205000  
Laptop      7150000  
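Conditions can also be combined, and the same **.loc** call can restrict the columns returned. A minimal sketch on a reduced copy of the product table (only the 'Stock' and 'Price' columns are rebuilt here, and the thresholds are arbitrary):

```python
import pandas as pd

df_sales = pd.DataFrame(
    {
        'Stock': [105, 110, 115, 120, 125],
        'Price': [21000, 65000, 1500, 2000, 5000],
    },
    index=['Phone', 'Laptop', 'Mouse', 'Keyboard', 'Speaker'],
)

# Wrap each condition in parentheses; use & for AND and | for OR
# (Python's `and` / `or` do not work element-wise on Series)
mask = (df_sales['Price'] < 10000) & (df_sales['Stock'] >= 120)

# Rows are chosen by the mask, columns by a list of labels
cheap_and_stocked = df_sales.loc[mask, ['Stock', 'Price']]
print(cheap_and_stocked)
```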


## 5. Grouping and Summarizing Data (Groupby)

The next major topic is the **groupby()** method, one of the most important tools for exploratory data analysis (EDA). It allows you to split your data into groups based on some criterion, apply a function (like **mean**, **sum**, **count**) to each group, and then combine the results.

This is often referred to as the Split-Apply-Combine strategy.

Concept: Split-Apply-Combine

**Split**: The DataFrame is split into groups based on the unique values in a specified column (e.g., grouping sales data by 'Region').

**Apply**: An aggregation function (like **.sum()** or **.mean()**) is applied to the columns of each individual group.

**Combine**: The results are combined into a new DataFrame.

Code Example: Grouping Data
Let's use a new, slightly larger DataFrame representing sales data across regions.

In [12]:
sales_data = {
    'Region': ['North', 'South', 'North', 'West', 'South', 'West'],
    'Salesperson': ['Alice', 'Bob', 'Alice', 'Charlie', 'Bob', 'David'],
    'Revenue': [1500, 2200, 1800, 3000, 1900, 2500],
    'Items_Sold': [10, 15, 12, 20, 14, 18]
}
df_sales = pd.DataFrame(sales_data)

print("Original Sales Data:")
print(df_sales)

# 1. Group the data by the 'Region' column
by_region = df_sales.groupby('Region')

# 2. Apply an aggregation function (e.g., calculate the SUM of Revenue and Items_Sold for each region)
# Select the numeric columns first; summing the text column 'Salesperson' would concatenate its strings.
regional_totals = by_region[['Revenue', 'Items_Sold']].sum() 

print(f"\nRegional Sales Totals (Sum):\n{regional_totals}")

Original Sales Data:
  Region Salesperson  Revenue  Items_Sold
0  North       Alice     1500          10
1  South         Bob     2200          15
2  North       Alice     1800          12
3   West     Charlie     3000          20
4  South         Bob     1900          14
5   West       David     2500          18

Regional Sales Totals (Sum):
        Revenue  Items_Sold
Region                     
North      3300          22
South      4100          29
West       5500          38
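The Split-Apply-Combine steps can also be spelled out by hand, which may help clarify what **groupby()** does internally. This is only an illustrative sketch (the loop is far slower than the built-in aggregation and is not how you would write production code):

```python
import pandas as pd

df_sales = pd.DataFrame({
    'Region': ['North', 'South', 'North', 'West', 'South', 'West'],
    'Revenue': [1500, 2200, 1800, 3000, 1900, 2500],
})

totals = {}
# Split: iterating over a groupby yields (group key, sub-DataFrame) pairs
for region, group in df_sales.groupby('Region'):
    # Apply: aggregate one group at a time
    totals[region] = group['Revenue'].sum()

# Combine: collect the per-group results into one labeled object
manual_totals = pd.Series(totals, name='Revenue')
print(manual_totals)

# The built-in one-liner gives the same numbers
builtin_totals = df_sales.groupby('Region')['Revenue'].sum()
```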


#### Common Groupby Operations

| Aggregation Function | Description |
| --- | --- |
| **.sum()** | Total of all values in the group. |
| **.mean()** | Average of all values in the group. |
| **.count()** | Number of non-NaN values in the group. |
| **.max()/.min()** | Maximum/Minimum value in the group. |
| **.describe()** | Returns multiple summary statistics (count, mean, std, min, max, quartiles). |

In [13]:
data = {
    'Region': ['East', 'East', 'West', 'East', 'West', 'West', 'East', 'West'],
    'Month': ['Jan', 'Feb', 'Jan', 'Mar', 'Feb', 'Mar', 'Apr', 'Apr'],
    'Sales': [100, 150, 200, 120, 180, 250, 90, 220],
    'Returns': [5, 10, np.nan, 8, 12, 15, np.nan, 18] # Note the NaN (missing) values
}

df = pd.DataFrame(data)
print("--- Original DataFrame ---")
print(df)

--- Original DataFrame ---
  Region Month  Sales  Returns
0   East   Jan    100      5.0
1   East   Feb    150     10.0
2   West   Jan    200      NaN
3   East   Mar    120      8.0
4   West   Feb    180     12.0
5   West   Mar    250     15.0
6   East   Apr     90      NaN
7   West   Apr    220     18.0


#### 1. .sum(): Total of all values in the group

In [14]:
# Calculate the total Sales and total Returns for each Region
# (select the numeric columns first; summing the text column 'Month' would concatenate its strings)
result_sum = df.groupby('Region')[['Sales', 'Returns']].sum()
print('.sum()')
print(result_sum)

.sum()
        Sales  Returns
Region                
East      460     23.0
West      850     45.0


#### 2. .mean(): Average of 'Sales' and 'Returns' columns in the group

In [15]:
# Select ONLY the 'Sales' and 'Returns' columns for the mean calculation
result_mean = df.groupby('Region')[['Sales', 'Returns']].mean()
print('.mean()')
print(result_mean)

.mean()
        Sales    Returns
Region                  
East    115.0   7.666667
West    212.5  15.000000


#### 3. .count(): Number of non-NaN values in the group

In [16]:
# Count the number of non-NaN entries (transactions) for each column
result_count = df.groupby('Region').count()
print('.count()')
print(result_count)

.count()
        Month  Sales  Returns
Region                       
East        4      4        3
West        4      4        3


#### 4. .max() / .min(): Maximum/Minimum value in the group

In [17]:
# Find the maximum value in each group
# (for the text column 'Month', max/min compare strings alphabetically, not chronologically)
result_max = df.groupby('Region').max()
print(".max()")
print(result_max)

# Find the minimum value in each group
result_min = df.groupby('Region').min()
print(".min()")
print(result_min)

.max()
       Month  Sales  Returns
Region                      
East     Mar    150     10.0
West     Mar    250     18.0
.min()
       Month  Sales  Returns
Region                      
East     Apr     90      5.0
West     Apr    180     12.0


#### 5. .describe(): Multiple Summary Statistics

In [18]:
# Returns a summary table of statistics for each numerical column
result_describe = df.groupby('Region').describe()
print(".describe()")
# Only printing the 'Sales' part for brevity, as the output is wide
print(result_describe['Sales'])

.describe()
        count   mean        std    min    25%    50%    75%    max
Region                                                            
East      4.0  115.0  26.457513   90.0   97.5  110.0  127.5  150.0
West      4.0  212.5  29.860788  180.0  195.0  210.0  227.5  250.0
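When different columns need different aggregations, the **.agg()** method accepts several at once. The sketch below uses named aggregation on the same Region/Sales/Returns data; the output column names (`total_sales`, `avg_returns`, `n_returns`) are arbitrary choices made for this example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'East', 'West', 'West', 'East', 'West'],
    'Sales': [100, 150, 200, 120, 180, 250, 90, 220],
    'Returns': [5, 10, np.nan, 8, 12, 15, np.nan, 18],
})

# Each keyword is new_column_name=(source_column, aggregation)
summary = df.groupby('Region').agg(
    total_sales=('Sales', 'sum'),
    avg_returns=('Returns', 'mean'),   # NaN values are skipped
    n_returns=('Returns', 'count'),    # counts non-NaN entries only
)
print(summary)
```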


## 6. Handling Missing Data (NaN)

The next major topic is dealing with missing values, which Pandas usually represents as **NaN** (Not a Number), a special floating-point value provided by NumPy. Real-world data is almost always messy, so this is critical.

### 6.1 Identifying Missing Data
The methods **.isnull()** (or **.isna()**) return Boolean DataFrames/Series that are **True** where data is missing, while **.notnull()** (or **.notna()**) return **True** where data is present.

In [19]:
# Create a DataFrame with missing data
data_missing = {
    'A': [1, 2, np.nan, 4],
    'B': [5, np.nan, 7, 8],
    'C': [9, 10, 11, 12]
}
df_nan = pd.DataFrame(data_missing)

# Check where the values are missing
print("Is Null Check:\n", df_nan.isnull())

# Get a count of missing values per column (chaining methods)
print("\nTotal Missing Values per Column:")
print(df_nan.isnull().sum())

Is Null Check:
        A      B      C
0  False  False  False
1  False   True  False
2   True  False  False
3  False  False  False

Total Missing Values per Column:
A    1
B    1
C    0
dtype: int64


### 6.2 Dropping Missing Data (.dropna())
The simplest way to handle missing data is to remove the rows or columns that contain it.

| Argument | Description |
| --- | --- |
| **axis=0** | Drop rows (default). |
| **axis=1** | Drop columns. |
| **how='any'** | Drop the row/column if any value is NaN (default). |
| **how='all'** | Drop the row/column only if all values are NaN. |

In [20]:
# Drop any row that contains at least one NaN value
df_dropped_rows = df_nan.dropna(axis=0, how='any')
print("\nDataFrame after dropping rows with ANY NaN:\n", df_dropped_rows)

# Drop columns where ALL values are NaN (not applicable here, but common)
# df_dropped_cols = df_nan.dropna(axis=1, how='all')


DataFrame after dropping rows with ANY NaN:
      A    B   C
0  1.0  5.0   9
3  4.0  8.0  12
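To see the column-wise options in action, the sketch below adds a hypothetical all-NaN column 'D' to the same data (this column is not part of the tutorial's dataset) and also shows the **subset** argument, which limits the NaN check to chosen columns:

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [5, np.nan, 7, 8],
    'C': [9, 10, 11, 12],
    'D': [np.nan] * 4,   # hypothetical column with no data at all
})

# Drop only the columns in which EVERY value is NaN
no_empty_cols = df_demo.dropna(axis=1, how='all')

# Drop every column that contains at least one NaN
complete_cols = df_demo.dropna(axis=1, how='any')

# Drop rows only when 'A' is missing; NaN in other columns is tolerated
rows_with_a = df_demo.dropna(subset=['A'])

print(no_empty_cols)
print(complete_cols)
print(rows_with_a)
```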


### 6.3 Filling Missing Data (.fillna())
Instead of dropping data, you can fill the missing values with a replacement value (imputation).

In [22]:
# Fill all NaN values with a single fixed value (e.g., 0)
df_filled_zero = df_nan.fillna(0)
print("\nDataFrame after filling NaN with 0:\n", df_filled_zero)

# Fill NaN with the MEAN of that column (common imputation method)
mean_a = df_nan['A'].mean()
df_filled_mean = df_nan.fillna({'A': mean_a}) # Fill only column 'A' with its mean
print(f"\nMean of A ({mean_a:.2f}) used for imputation:\n", df_filled_mean)


DataFrame after filling NaN with 0:
      A    B   C
0  1.0  5.0   9
1  2.0  0.0  10
2  0.0  7.0  11
3  4.0  8.0  12

Mean of A (2.33) used for imputation:
           A    B   C
0  1.000000  5.0   9
1  2.000000  NaN  10
2  2.333333  7.0  11
3  4.000000  8.0  12
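Two further filling patterns are sketched below on the same data: passing the Series of column means fills each column with its own mean in one call, and **.ffill()** carries the previous row's value forward. Which strategy is appropriate depends on the data, so treat these as options rather than recommendations:

```python
import numpy as np
import pandas as pd

df_nan = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [5, np.nan, 7, 8],
    'C': [9, 10, 11, 12],
})

# df_nan.mean() is a Series of per-column means; fillna matches it by column label
df_filled_means = df_nan.fillna(df_nan.mean())

# Forward fill: each NaN takes the last valid value above it in the same column
df_forward = df_nan.ffill()

print(df_filled_means)
print(df_forward)
```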


## 7. Merging and Joining DataFrames (The "SQL JOIN" of Pandas)
In real-world analysis, data often lives in multiple tables. Merging is how you combine DataFrames based on a shared column (a key), much like a JOIN operation in SQL. The primary function is **pd.merge()**.

#### Key Arguments of pd.merge()
**left** / **right**: The two DataFrames to merge.

**on**: The column name(s) to join on (the common key).

**how**: Specifies the type of join:

**'inner'** (Default): Returns only rows that have matching keys in both DataFrames.

**'left'**: Returns all rows from the left DataFrame, and the matching rows from the right. **NaN** is filled where there is no match.

**'right'**: Returns all rows from the right DataFrame, and the matching rows from the left. **NaN** is filled where there is no match.

**'outer'**: Returns all rows from both DataFrames, matched where the keys agree and filled with **NaN** where they do not (full union).

In [24]:
# DataFrame 1: Employee Names and IDs
df_employees = pd.DataFrame({'Emp_ID': [1, 2, 3, 4], 
                             'Name': ['Alice', 'Bob', 'Charlie', 'David']})

# DataFrame 2: Project Assignments and IDs (Missing Emp_ID 4)
df_projects = pd.DataFrame({'Emp_ID': [1, 2, 3, 5], 
                            'Project': ['Alpha', 'Beta', 'Gamma', 'Zeta']})

# Inner Merge: Only includes employees found in BOTH tables (1, 2, 3)
merged_inner = pd.merge(df_employees, df_projects, on='Emp_ID', how='inner')

print("Inner Merge Result:")
print(merged_inner)

Inner Merge Result:
   Emp_ID     Name Project
0       1    Alice   Alpha
1       2      Bob    Beta
2       3  Charlie   Gamma
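For comparison, here is a sketch of the left and outer joins on the same two tables. The **indicator=True** argument adds a `_merge` column showing where each row came from, which can be handy when checking for unmatched keys:

```python
import pandas as pd

df_employees = pd.DataFrame({'Emp_ID': [1, 2, 3, 4],
                             'Name': ['Alice', 'Bob', 'Charlie', 'David']})
df_projects = pd.DataFrame({'Emp_ID': [1, 2, 3, 5],
                            'Project': ['Alpha', 'Beta', 'Gamma', 'Zeta']})

# Left merge: every employee is kept; Emp_ID 4 has no project, so Project is NaN
merged_left = pd.merge(df_employees, df_projects, on='Emp_ID', how='left')
print(merged_left)

# Outer merge: every key from either table is kept.
# _merge is 'both', 'left_only', or 'right_only' for each row
merged_outer = pd.merge(df_employees, df_projects, on='Emp_ID',
                        how='outer', indicator=True)
print(merged_outer)
```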


## 8. Applying Custom Functions (.apply() and .map())
While vectorized operations (like **df['A'] + df['B']**) are fastest, sometimes you need to run complex, custom logic that is easier to express as an ordinary Python function.

#### **.apply()**

The **.apply()** method applies a function along an axis of a DataFrame (to each row or each column) or to each element of a **Series**. It's great for complex column transformations.

In [27]:
# Create a simple DataFrame
df_scores = pd.DataFrame({'Score': [85, 92, 78, 95, 60]})

# Function to assign a letter grade
def get_grade(score):
    if score >= 90:
        return 'A'
    elif score >= 80:
        return 'B'
    elif score >= 70:
        return 'C'
    else:
        return 'F'

# Apply the function to the 'Score' column (which is a Series)
df_scores['Grade'] = df_scores['Score'].apply(get_grade)

print("\nApplying Custom Function (.apply()):")
print(df_scores)


Applying Custom Function (.apply()):
   Score Grade
0     85     B
1     92     A
2     78     C
3     95     A
4     60     F
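On a DataFrame, passing **axis=1** hands each row to the function, which is useful when the logic depends on several columns. The sketch below uses a made-up eligibility rule (`is_eligible` and its thresholds are hypothetical) and also shows the equivalent vectorized expression, which is usually much faster on large data:

```python
import pandas as pd

df_students = pd.DataFrame({
    'Exam Score': [85, 92, 78, 95, 88],
    'Attendance': [95, 98, 85, 100, 92],
})

# Hypothetical rule: score of at least 80 AND attendance of at least 90
def is_eligible(row):
    return row['Exam Score'] >= 80 and row['Attendance'] >= 90

# axis=1 passes each row to the function as a Series
df_students['Eligible'] = df_students.apply(is_eligible, axis=1)

# The same rule written as a vectorized Boolean expression
vectorized = (df_students['Exam Score'] >= 80) & (df_students['Attendance'] >= 90)

print(df_students)
```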


#### **.map()** (Element-wise Mapping on a Series)

The **.map()** method of a Series is ideal for substituting values based on a dictionary or another Series.

In [30]:
df_scores['Pass_Fail'] = df_scores['Grade'].map({'A': 'Pass', 'B': 'Pass', 'C': 'Pass', 'F': 'Fail'})
print("\nApplying Dictionary Mapping (.map()):")
print(df_scores)


Applying Dictionary Mapping (.map()):
   Score Grade Pass_Fail
0     85     B      Pass
1     92     A      Pass
2     78     C      Pass
3     95     A      Pass
4     60     F      Fail
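One behavior worth knowing: when a dictionary is passed to **.map()**, any value without a matching key becomes NaN. A small sketch with a made-up grade 'D' that is missing from the lookup:

```python
import pandas as pd

grades = pd.Series(['A', 'B', 'D', 'F'])
lookup = {'A': 'Pass', 'B': 'Pass', 'C': 'Pass', 'F': 'Fail'}

# 'D' is not a key in the dictionary, so it maps to NaN
mapped = grades.map(lookup)

# Unmapped values can be given a fallback afterwards
mapped_filled = mapped.fillna('Unknown')

print(mapped)
print(mapped_filled)
```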


## 9. Pivot Tables and pd.crosstab() (Reshaping for Summaries)
These tools are essential for restructuring and summarizing data, often to make it resemble a traditional financial or statistical report.

#### .pivot_table()

This is used to create a spreadsheet-style pivot table as a DataFrame. It's similar to **.groupby()**, but allows you to place one categorical variable on the rows and another on the columns.

**index**: The column(s) to become the row index.

**columns**: The column(s) to become the new column headers.

**values**: The column(s) to aggregate.

**aggfunc**: The aggregation function (sum, mean, count, etc.).

In [32]:
# Sample data with Region, Product, and Sales
df_data = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West', 'East'],
    'Product': ['A', 'B', 'A', 'B', 'A'],
    'Sales': [100, 150, 120, 200, 130]
})

# Pivot: Summarise total Sales by Region AND Product
sales_pivot = df_data.pivot_table(
    index='Region', 
    columns='Product', 
    values='Sales', 
    aggfunc='sum'
)

print("\nPivot Table (Region vs. Product Sales):")
print(sales_pivot)


Pivot Table (Region vs. Product Sales):
Product    A    B
Region           
East     230  150
West     120  200
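To make the link with **.groupby()** concrete, the sketch below builds the same table by grouping on both columns and unstacking the inner level. It also shows **margins=True**, which appends row and column totals under the label 'All':

```python
import pandas as pd

df_data = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West', 'East'],
    'Product': ['A', 'B', 'A', 'B', 'A'],
    'Sales': [100, 150, 120, 200, 130],
})

sales_pivot = df_data.pivot_table(index='Region', columns='Product',
                                  values='Sales', aggfunc='sum')

# Equivalent result: group by both keys, then move 'Product' into the columns
via_groupby = df_data.groupby(['Region', 'Product'])['Sales'].sum().unstack()

# margins=True adds an 'All' row and an 'All' column holding the totals
with_totals = df_data.pivot_table(index='Region', columns='Product',
                                  values='Sales', aggfunc='sum', margins=True)

print(via_groupby)
print(with_totals)
```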


#### pd.crosstab() (Frequency Tables)
A special case of a pivot table used to compute a simple frequency table of two or more categorical factors.

In [34]:
# Count how many times each Product appears in each Region
frequency_table = pd.crosstab(df_data['Region'], df_data['Product'])
print("\nCrosstab (Frequency Table):")
print(frequency_table)


Crosstab (Frequency Table):
Product  A  B
Region       
East     2  1
West     1  1
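Counts are often easier to compare as proportions. As a sketch, **normalize='index'** divides each row by its row total, and **margins=True** adds totals to the raw counts:

```python
import pandas as pd

df_data = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West', 'East'],
    'Product': ['A', 'B', 'A', 'B', 'A'],
    'Sales': [100, 150, 120, 200, 130],
})

# Share of each Product within each Region (every row sums to 1)
shares = pd.crosstab(df_data['Region'], df_data['Product'], normalize='index')

# Raw counts with an extra 'All' row and column of totals
counts_with_totals = pd.crosstab(df_data['Region'], df_data['Product'], margins=True)

print(shares)
print(counts_with_totals)
```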


## 10. MultiIndex (Hierarchical Indexing)
A MultiIndex allows a Pandas object to have multiple levels of indexing on an axis (rows or columns), which is often the result of using **.groupby()** on multiple keys or using **.pivot\_table()**.

Example: Multi-Level Grouping

When you group by two columns, the result naturally has a MultiIndex:

In [35]:
# Group by both Region AND Product, and calculate the mean Sales
multi_group = df_data.groupby(['Region', 'Product'])['Sales'].mean()

print("\nSeries with MultiIndex (Region, Product):")
print(multi_group)
print(f"Index: {multi_group.index}")


Series with MultiIndex (Region, Product):
Region  Product
East    A          115.0
        B          150.0
West    A          120.0
        B          200.0
Name: Sales, dtype: float64
Index: MultiIndex([('East', 'A'),
            ('East', 'B'),
            ('West', 'A'),
            ('West', 'B')],
           names=['Region', 'Product'])


#### Key Point: 
You access data in a MultiIndex using tuples for indexing (e.g., **multi\_group.loc[('East', 'A')]** will give you the mean sales for Product A in the East region).

In [36]:
multi_group.loc[('East','A')] # with .loc you can select values by their index labels

np.float64(115.0)
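Besides full tuples, a MultiIndex supports partial selection. The sketch below rebuilds the grouped Series and shows three common patterns: selecting by the outer level only, taking a cross-section on an inner level with **.xs()**, and pivoting a level into columns with **.unstack()**:

```python
import pandas as pd

df_data = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West', 'East'],
    'Product': ['A', 'B', 'A', 'B', 'A'],
    'Sales': [100, 150, 120, 200, 130],
})
multi_group = df_data.groupby(['Region', 'Product'])['Sales'].mean()

# Outer level only: all products in the East region
east = multi_group.loc['East']

# Cross-section on the inner level: Product A in every region
product_a = multi_group.xs('A', level='Product')

# Move the inner level into columns to get a regular 2D table
wide = multi_group.unstack()

print(east)
print(product_a)
print(wide)
```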