
# 🐾 Multi-Level Indexing (a.k.a. Hierarchical Indexing) & Index Concepts in Pandas

---

#### 📌 Index Fundamentals Recap

- A **DataFrame** has three key components:
  - A NumPy array of data.
  - A **row index**.
  - A **column index**.
- `df.columns` → shows column names (an Index object).
- `df.index` → shows row index values (also an Index object).
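
A minimal sketch of the three components, assuming the seven-dog `dogs` DataFrame whose values are copied from the printed outputs further down in these notes. `.to_numpy()` is used here only as a convenient way to view the data as an array:

```python
import pandas as pd

# Sample data reconstructed from the outputs printed in these notes
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max", "Stella", "Bernie"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Schnauzer",
              "Labrador", "Chihuahua", "St. Bernard"],
    "color": ["Brown", "Black", "Brown", "Grey", "Black", "Tan", "White"],
    "height_cm": [56, 43, 46, 49, 59, 18, 77],
    "weight_kg": [25, 23, 22, 17, 29, 2, 74],
})

data = dogs.to_numpy()   # the values as a 2-D NumPy array
print(data.shape)        # (7, 5)
print(dogs.columns)      # column labels (an Index object)
print(dogs.index)        # row labels (a RangeIndex by default)
```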

---

#### 🔁 Setting & Resetting Index

```python
# Setting a column as index
dogs_ind = dogs.set_index("name")
print(dogs_ind)
```

> Setting an index moves a column out of the body of the DataFrame and into the index (printed left-aligned). This is useful for subsetting.

```python
# Resetting the index
dogs_reset = dogs_ind.reset_index()
```

```python
# Reset and drop the index (fully remove it)
dogs_reset = dogs_ind.reset_index(drop=True)
```
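
To see the difference between the two resets, here is a small self-contained sketch. It uses a three-row excerpt of the dogs data; the variable names `kept` and `dropped` are made up for illustration:

```python
import pandas as pd

dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy"],
    "breed": ["Labrador", "Poodle", "Chow Chow"],
    "height_cm": [56, 43, 46],
})
dogs_ind = dogs.set_index("name")

kept = dogs_ind.reset_index()               # "name" comes back as a column
dropped = dogs_ind.reset_index(drop=True)   # "name" is discarded entirely

print(list(kept.columns))     # ['name', 'breed', 'height_cm']
print(list(dropped.columns))  # ['breed', 'height_cm']
```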

---

#### 🔍 Why Use Indexes?

* Subsetting becomes **cleaner and easier**:

```python
# Subsetting using a regular column
dogs[dogs["name"].isin(["Bella", "Stella"])]
```

vs

```python
# After setting name as index
dogs_ind.loc[["Bella", "Stella"]]
```

> ✔ Index-based subsetting is simpler and more readable.
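
One behavioural difference is worth knowing. The sketch below, on a three-row excerpt of the dogs data, suggests that the `.isin()` filter keeps the original row order while `.loc[]` returns rows in the order of the list you pass:

```python
import pandas as pd

dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Stella"],
    "height_cm": [56, 43, 18],
})
dogs_ind = dogs.set_index("name")

by_column = dogs[dogs["name"].isin(["Stella", "Bella"])]
by_index = dogs_ind.loc[["Stella", "Bella"]]

print(by_column["name"].tolist())  # ['Bella', 'Stella'] (original row order)
print(by_index.index.tolist())     # ['Stella', 'Bella'] (order of the list)
```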

---

#### 🚨 Index Notes

* **Index values don’t need to be unique**:

  * If "Labrador" appears twice, `loc["Labrador"]` will return **both rows**.
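
A quick sketch of the duplicate-label case, using a three-row excerpt of the dogs data with `breed` as the index (the name `dogs_by_breed` is made up for this example):

```python
import pandas as pd

dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Max"],
    "breed": ["Labrador", "Poodle", "Labrador"],
})
dogs_by_breed = dogs.set_index("breed")

labradors = dogs_by_breed.loc["Labrador"]   # matches two rows

print(dogs_by_breed.index.is_unique)  # False
print(labradors["name"].tolist())     # ['Bella', 'Max']
```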

---

### 🧱 Multi-Level Indexing (Hierarchical Indexing)

#### 🏗️ Creating a Multi-Level Index

```python
# Setting both 'breed' and 'color' as multi-level index
dogs_ind3 = dogs.set_index(["breed", "color"])
print(dogs_ind3)
```

```
                        name  height_cm  weight_kg
breed       color                                
Labrador    Brown     Bella         56         25
Poodle      Black   Charlie         43         23
Chow Chow   Brown      Lucy         46         22
Schnauzer   Grey     Cooper         49         17
Labrador    Black       Max         59         29
Chihuahua   Tan      Stella         18          2
St. Bernard White    Bernie         77         74
```

> 🧠 `breed` is outer level, `color` is inner level. This nesting is useful for grouped operations.

---

#### 🔍 Subsetting by Index

```python
# Subset by outer level (breed)
dogs_ind3.loc[["Labrador", "Chihuahua"]]
```

```
                    name  height_cm  weight_kg
breed     color                             
Labrador  Brown   Bella         56         25
          Black     Max         59         29
Chihuahua Tan    Stella         18          2
```

```python
# Subset by all levels using tuples
dogs_ind3.loc[[("Labrador", "Brown"), ("Chihuahua", "Tan")]]
```

```
                    name  height_cm  weight_kg
breed     color                             
Labrador  Brown   Bella         56         25
Chihuahua Tan    Stella         18          2
```

---

#### 🔃 Sorting by Index

```python
# Sort by full index (default: outer to inner, ascending)
dogs_ind3.sort_index()
```

```
                        name  height_cm  weight_kg
breed       color                                
Chihuahua   Tan      Stella         18          2
Chow Chow   Brown      Lucy         46         22
Labrador    Black       Max         59         29
            Brown     Bella         56         25
Poodle      Black   Charlie         43         23
Schnauzer   Grey     Cooper         49         17
St. Bernard White    Bernie         77         74
```

```python
# Sort by custom levels & order
dogs_ind3.sort_index(level=["color", "breed"], ascending=[True, False])
```

```
                        name  height_cm  weight_kg
breed       color                                
Poodle      Black   Charlie         43         23
Labrador    Black       Max         59         29
Labrador    Brown     Bella         56         25
Chow Chow   Brown      Lucy         46         22
Schnauzer   Grey     Cooper         49         17
Chihuahua   Tan      Stella         18          2
St. Bernard White    Bernie         77         74
```

> 🎯 You can sort by specific levels in any order using `sort_index()` with `level=` and `ascending=`.

---

### ⚠️ The Downside of Indexes

* Indexes violate **"tidy data"**:

  * Tidy data: Each column is a variable, each row an observation.
  * Index values aren't part of the columns → ❌ not tidy.
* You must learn **two syntaxes** (index-based vs column-based), making code **harder to reason about** and easier to break.

> ✅ It's OK to avoid using indexes, but **understanding** them is necessary for reading others' code.

---

### 🌡️ Bonus: Temperature Dataset

```python
import pandas as pd

# A temperature dataset with multiple columns
temperature = pd.DataFrame({
    "date": ["2000-01-01", "2000-02-01", "2000-03-01", "2000-04-01", "2000-05-01"],
    "city": ["Abidjan"] * 5,
    "country": ["Côte D'Ivoire"] * 5,
    "avg_temp_c": [27.2931, 27.6852, 29.0613, 28.1624, 27.547]
})
print(temperature)
```

```
         date     city        country  avg_temp_c
0  2000-01-01  Abidjan  Côte D'Ivoire     27.2931
1  2000-02-01  Abidjan  Côte D'Ivoire     27.6852
2  2000-03-01  Abidjan  Côte D'Ivoire     29.0613
3  2000-04-01  Abidjan  Côte D'Ivoire     28.1624
4  2000-05-01  Abidjan  Côte D'Ivoire     27.5470
```

> 🌍 This dataset will be used to practice time-series and indexed data in later sections.

---

### ✅ Summary

* `set_index(["col1", "col2"])` → multi-level index
* Use `.loc[]` for subsetting by index
* Indexes simplify subsetting but make data less tidy
* Index-based operations can replace column-based logic
* Understand both to navigate real-world pandas workflows



In [None]:
# Exercise
# Setting and removing indexes
# pandas allows you to designate columns as an index. This enables cleaner code when taking subsets (as well as providing more efficient lookup under some circumstances).

# In this chapter, you'll be exploring temperatures, a DataFrame of average temperatures in cities around the world. pandas is loaded as pd.

# Instructions
# 100 XP
# Look at temperatures.
# Set the index of temperatures to "city", assigning to temperatures_ind.
# Look at temperatures_ind. How is it different from temperatures?
# Reset the index of temperatures_ind, keeping its contents.
# Reset the index of temperatures_ind, dropping its contents.

# Look at temperatures
print(temperatures)

# Set the index of temperatures to city
temperatures_ind = temperatures.set_index("city")

# Look at temperatures_ind
print(temperatures_ind)

# Reset the temperatures_ind index, keeping its contents
print(temperatures_ind.reset_index())

# Reset the temperatures_ind index, dropping its contents
print(temperatures_ind.reset_index(drop=True))

In [None]:
# Exercise
# Subsetting with .loc[]
# The killer feature for indexes is .loc[]: a subsetting method that accepts index values. When you pass it a single argument, it will take a subset of rows.

# The code for subsetting using .loc[] can be easier to read than standard square bracket subsetting, which can make your code less burdensome to maintain.

# pandas is loaded as pd. temperatures and temperatures_ind are available; the latter is indexed by city.

# Instructions
# 100 XP
# Create a list called cities that contains "London" and "Paris".
# Use [] subsetting to filter temperatures for rows where the city column takes a value in the cities list.
# Use .loc[] subsetting to filter temperatures_ind for rows where the city is in the cities list.
# Make a list of cities to subset on
cities = ["London", "Paris"]

# Subset temperatures using square brackets
print(temperatures[temperatures['city'].isin(cities)])

# Subset temperatures_ind using .loc[]
print(temperatures_ind.loc[cities])




In [None]:
# Exercise
# Setting multi-level indexes
# Indexes can also be made out of multiple columns, forming a multi-level index (sometimes called a hierarchical index). There is a trade-off to using these.

# The benefit is that multi-level indexes make it more natural to reason about nested categorical variables. For example, in a clinical trial, you might have control and treatment groups. Then each test subject belongs to one or another group, and we can say that a test subject is nested inside the treatment group. Similarly, in the temperature dataset, the city is located in the country, so we can say a city is nested inside the country.

# The main downside is that the code for manipulating indexes is different from the code for manipulating columns, so you have to learn two syntaxes and keep track of how your data is represented.

# pandas is loaded as pd. temperatures is available.

# Instructions
# 100 XP
# Set the index of temperatures to the "country" and "city" columns, and assign this to temperatures_ind.
# Specify two country/city pairs to keep: "Brazil"/"Rio De Janeiro" and "Pakistan"/"Lahore", assigning to rows_to_keep.
# Print and subset temperatures_ind for rows_to_keep using .loc[].

# Index temperatures by country & city
temperatures_ind = temperatures.set_index(['country','city'])

# List of tuples: Brazil, Rio De Janeiro & Pakistan, Lahore
rows_to_keep = [("Brazil", "Rio De Janeiro"), ("Pakistan", "Lahore")]

# Subset for rows to keep
print(temperatures_ind.loc[rows_to_keep])

In [None]:
# Exercise
# Sorting by index values
# Previously, you changed the order of the rows in a DataFrame by calling .sort_values(). It's also useful to be able to sort by elements in the index. For this, you need to use .sort_index().

# pandas is loaded as pd. temperatures_ind has a multi-level index of country and city, and is available.

# Instructions
# 100 XP
# Sort temperatures_ind by the index values.
# Sort temperatures_ind by the index values at the "city" level.
# Sort temperatures_ind by ascending country then descending city.

# Sort temperatures_ind by index values
print(temperatures_ind.sort_index())

# Sort temperatures_ind by index values at the city level
print(temperatures_ind.sort_index(level=['city']))

# Sort temperatures_ind by country then descending city
print(temperatures_ind.sort_index(level=['country', 'city'], ascending=[True, False]))


# 🐼 Data Manipulation with Pandas: Slicing & Subsetting Cheatsheet

## 🔹 1. Slicing Lists
Python lists can be sliced using `list[start:stop]`.

```python
breeds = ["Chihuahua", "Chow Chow", "Labrador", "Poodle", "Schnauzer"]
breeds[1:4]  # ['Chow Chow', 'Labrador', 'Poodle']
breeds[:3]   # First three elements (positions 0-2; position 3 is excluded)
breeds[:]    # Whole list
```

---

## 🔹 2. Slicing with `.loc[]` (label-based)

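Sort the index with `.sort_index()` before slicing rows by label. Slicing works with both **row** and **column names**. In the examples below, `dogs_srt` is the breed/color-indexed DataFrame with its index sorted (`dogs_ind3.sort_index()`).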

```python
dogs_srt.loc[:, "name":"height_cm"]
```

✅ **Final row/column is included**
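
As a sketch of the inclusive endpoint, the example below slices rows and columns by label. It assumes a simplified `dogs_srt` with a single-level `breed` index (a four-row excerpt of the dogs data), not the two-level index used elsewhere in these notes:

```python
import pandas as pd

dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Schnauzer"],
    "height_cm": [56, 43, 46, 49],
    "weight_kg": [25, 23, 22, 17],
})
dogs_srt = dogs.set_index("breed").sort_index()

rows = dogs_srt.loc["Chow Chow":"Poodle"]       # "Poodle" is included
cols = dogs_srt.loc[:, "name":"height_cm"]      # "height_cm" is included

print(rows.index.tolist())    # ['Chow Chow', 'Labrador', 'Poodle']
print(cols.columns.tolist())  # ['name', 'height_cm']
```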

---

## 🔹 3. MultiIndex Slicing

You must pass tuples for multi-level index slicing.

```python
dogs_srt.loc[("Labrador", "Brown"):("Schnauzer", "Grey"), "name":"height_cm"]
```

⛔ Slicing on inner-level values alone (like `"Tan":"Grey"`) won't work: bare labels are matched against the outer level, so the result is not what you want.

✅ Instead, always slice with **tuples**: `("Breed", "Color")`
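
A runnable sketch of tuple slicing, assuming `dogs_srt` is built from the seven-dog sample data shown in the outputs earlier in these notes:

```python
import pandas as pd

dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max", "Stella", "Bernie"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Schnauzer",
              "Labrador", "Chihuahua", "St. Bernard"],
    "color": ["Brown", "Black", "Brown", "Grey", "Black", "Tan", "White"],
    "height_cm": [56, 43, 46, 49, 59, 18, 77],
    "weight_kg": [25, 23, 22, 17, 29, 2, 74],
})
dogs_srt = dogs.set_index(["breed", "color"]).sort_index()

# Each endpoint is an (outer, inner) tuple; both endpoints are included
subset = dogs_srt.loc[("Labrador", "Brown"):("Schnauzer", "Grey"), "name":"height_cm"]

print(subset["name"].tolist())  # ['Bella', 'Charlie', 'Cooper']
```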

---

## 🔹 4. Slicing by Dates

Set the date as index first:

```python
dogs = dogs.set_index("date_of_birth").sort_index()
```

Then slice with:

```python
dogs.loc["2014-08-25":"2016-09-16"]
```

✅ If the index holds datetimes (a `DatetimeIndex`), you can also slice with **partial dates**:

```python
dogs.loc["2014":"2016"]  # Includes full 2014–2016 range
```
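
The sketch below shows full and partial date slicing side by side. The dog names and birth dates are illustrative values made up for this example, and `dogs_by_dob` is a hypothetical variable name. The column is converted with `pd.to_datetime()` first so that the index is a `DatetimeIndex`:

```python
import pandas as pd

# Illustrative birth dates (not taken from the course dataset)
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Stella", "Max"],
    "date_of_birth": ["2013-07-01", "2016-09-16", "2014-08-25",
                      "2011-12-11", "2015-04-20", "2016-12-20"],
})
dogs["date_of_birth"] = pd.to_datetime(dogs["date_of_birth"])
dogs_by_dob = dogs.set_index("date_of_birth").sort_index()

full = dogs_by_dob.loc["2014-08-25":"2016-09-16"]
partial = dogs_by_dob.loc["2014":"2016"]   # start of 2014 through end of 2016

print(full["name"].tolist())     # ['Lucy', 'Stella', 'Charlie']
print(partial["name"].tolist())  # ['Lucy', 'Stella', 'Charlie', 'Max']
```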

---

## 🔹 5. Slicing with `.iloc[]` (position-based)

Uses index numbers of rows and columns.

```python
dogs.iloc[2:5, 1:4]
```

⛔ The final position is NOT included: this returns rows 2, 3, 4 and columns 1, 2, 3.
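
A small sketch on a made-up 6×5 grid (each cell holds `row * 10 + column`, so the values reveal which positions were selected). It also shows that a single cell needs a comma between the row and column positions, not a colon:

```python
import pandas as pd

# Hypothetical grid: the value at (row r, column c) is r * 10 + c
df = pd.DataFrame(
    [[r * 10 + c for c in range(5)] for r in range(6)],
    columns=["c0", "c1", "c2", "c3", "c4"],
)

block = df.iloc[2:5, 1:4]   # rows 2-4 and columns 1-3
cell = df.iloc[2, 1]        # one value: row position 2, column position 1

print(block.shape)  # (3, 3)
print(cell)         # 21
```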

---

## 🔹 6. Slice Rows AND Columns Together

```python
dogs_srt.loc[("Labrador", "Brown"):("Schnauzer", "Grey"), "name":"height_cm"]
```

This slices both rows (by index values) and columns (by names) in a single call.

---

## 🧠 Quick Recap Table

| What You Want To Do        | Syntax                          | Final Included? | Uses             |
| -------------------------- | ------------------------------- | --------------- | ---------------- |
| Slice by labels (names)    | `df.loc["a":"d"]`               | ✅ Yes           | Row/column names |
| Slice by position          | `df.iloc[1:4]`                  | ❌ No            | Row/col position |
| Slice columns only         | `df.loc[:, "col1":"col3"]`      | ✅ Yes           | Column names     |
| Slice by full/partial date | `df.loc["2020":"2021"]`         | ✅ Yes           | DatetimeIndex    |
| MultiIndex slice           | `df.loc[("A", "x"):("B", "y")]` | ✅ Yes           | Tuple values     |

---

## ✅ Summary Tips

* `.loc[]` → uses names, includes final
* `.iloc[]` → uses numbers, excludes final
* Always sort index before slicing
* For MultiIndex: slice using tuples!
* Dates can be partially specified like `"2020"` or `"2020-05"`

---



In [None]:
# Exercise
# Slicing index values
# Slicing lets you select consecutive elements of an object using first:last syntax. DataFrames can be sliced by index values or by row/column number; we'll start with the first case. This involves slicing inside the .loc[] method.

# Compared to slicing lists, there are a few things to remember.

# You can only slice an index if the index is sorted (using .sort_index()).
# To slice at the outer level, first and last can be strings.
# To slice at inner levels, first and last should be tuples.
# If you pass a single slice to .loc[], it will slice the rows.
# pandas is loaded as pd. temperatures_ind has country and city in the index, and is available.

# Instructions
# 100 XP
# Sort the index of temperatures_ind.
# Use slicing with .loc[] to get these subsets:
# from Pakistan to Philippines.
# from Lahore to Manila. (This will return nonsense.)
# from Pakistan, Lahore to Philippines, Manila.


# Sort the index of temperatures_ind
temperatures_srt = temperatures_ind.sort_index()

# Subset rows from Pakistan to Philippines
print(temperatures_srt.loc['Pakistan':'Philippines'])

# Try to subset rows from Lahore to Manila
print(temperatures_srt.loc['Lahore':'Manila'])

# Subset rows from Pakistan, Lahore to Philippines, Manila
print(temperatures_srt.loc[('Pakistan', 'Lahore'):('Philippines', 'Manila')])

In [None]:
# Exercise
# Slicing in both directions
# You've seen slicing DataFrames by rows and by columns, but since DataFrames are two-dimensional objects, it is often natural to slice both dimensions at once. That is, by passing two arguments to .loc[], you can subset by rows and columns in one go.

# pandas is loaded as pd. temperatures_srt is indexed by country and city, has a sorted index, and is available.

# Instructions
# 100 XP
# Use .loc[] slicing to subset rows from India, Hyderabad to Iraq, Baghdad.
# Use .loc[] slicing to subset columns from date to avg_temp_c.
# Slice in both directions at once from Hyderabad to Baghdad, and date to avg_temp_c.

# Subset rows from India, Hyderabad to Iraq, Baghdad
print(temperatures_srt.loc[('India','Hyderabad'):('Iraq','Baghdad')])

# Subset columns from date to avg_temp_c
print(temperatures_srt.loc[:, 'date':'avg_temp_c'])

# Subset in both directions at once
print(temperatures_srt.loc[('India', 'Hyderabad'):('Iraq', 'Baghdad'), 'date':'avg_temp_c'])


In [None]:
# Exercise
# Slicing time series
# Slicing is particularly useful for time series since it's a common thing to want to filter for data within a date range. Add the date column to the index, then use .loc[] to perform the subsetting. The important thing to remember is to keep your dates in ISO 8601 format, that is, "yyyy-mm-dd" for year-month-day, "yyyy-mm" for year-month, and "yyyy" for year.

# Recall from Chapter 1 that you can combine multiple Boolean conditions using logical operators, such as &. To do so in one line of code, you'll need to add parentheses () around each condition.

# pandas is loaded as pd and temperatures, with no index, is available.

# Instructions
# 100 XP
# Use Boolean conditions, not .isin() or .loc[], and the full date "yyyy-mm-dd", to subset temperatures for rows where the date column is in 2010 and 2011 and print the results.
# Set the index of temperatures to the date column and sort it.
# Use .loc[] to subset temperatures_ind for rows in 2010 and 2011.
# Use .loc[] to subset temperatures_ind for rows from August 2010 to February 2011.

# Use Boolean conditions to subset temperatures for rows in 2010 and 2011
temperatures_bool = temperatures[(temperatures["date"] >= "2010-01-01") & (temperatures["date"] <= "2011-12-31")]
print(temperatures_bool)

# Set date as the index and sort the index
temperatures_ind = temperatures.set_index("date").sort_index()

# Use .loc[] to subset temperatures_ind for rows in 2010 and 2011
print(temperatures_ind.loc["2010":"2011"])

# Use .loc[] to subset temperatures_ind for rows from Aug 2010 to Feb 2011
print(temperatures_ind.loc["2010-08":"2011-02"])


In [2]:
# Subsetting by row/column number
# The most common ways to subset rows are the ways we've previously discussed: using a Boolean condition or by index labels. However, it is also occasionally useful to pass row numbers.

# This is done using .iloc[], and like .loc[], it can take two arguments to let you subset by rows and columns.

# pandas is loaded as pd. temperatures (without an index) is available.

# Instructions
# 100 XP
# Use .iloc[] on temperatures to take subsets.

# Get the 23rd row, 2nd column (index positions 22 and 1).
# Get the first 5 rows (index positions 0 to 5).
# Get all rows, columns 3 and 4 (index positions 2 to 4).
# Get the first 5 rows, columns 3 and 4.

# Get 23rd row, 2nd column (index 22, 1)
print(temperatures.iloc[22, 1])

# Use slicing to get the first 5 rows
print(temperatures.iloc[:5])

# Use slicing to get columns 3 to 4
print(temperatures.iloc[:, 2:4])

# Use slicing in both directions at once
print(temperatures.iloc[0:5, 2:4])



# 🐶 Working with Pivot Tables in Pandas

In this section, we explore how to work with **pivot tables** using pandas. Pivot tables help you reorganize and summarize your data — they’re like Excel’s pivot tables, but with code! You'll learn to create pivot tables, subset them, and calculate statistics across rows and columns.

---

## 📊 A Bigger Dog Dataset

```python
print(dog_pack)
```

The `dog_pack` DataFrame contains many dog records, each with:

* `breed` (e.g., Boxer, Beagle)
* `color` (e.g., Black, White)
* `height_cm` (numeric)
* `weight_kg` (numeric)

This rich dataset is used to demonstrate pivoting and statistical summarization.

---

## 🔄 Creating a Pivot Table

To analyze average height of dogs grouped by breed and color:

```python
dogs_height_by_breed_vs_color = dog_pack.pivot_table(
    "height_cm", index="breed", columns="color"
)
print(dogs_height_by_breed_vs_color)
```

This creates a table where:

* Rows are dog **breeds**
* Columns are dog **colors**
* Cell values are the **average height** of dogs in that group

### 📌 Example Output (abridged)

| breed     | Black | Brown | Gray  | Tan   | White |
| --------- | ----- | ----- | ----- | ----- | ----- |
| Beagle    | 34.50 | 36.45 | 36.31 | 35.74 | 38.81 |
| Boxer     | 57.20 | 62.64 | 58.28 | 62.31 | 56.36 |
| Chihuahua | 18.55 | NaN   | 21.66 | 20.10 | 17.93 |
| Poodle    | 48.04 | 57.13 | 56.65 | NaN   | 44.74 |

🧠 **Takeaway**: `pivot_table()` is ideal for comparing stats like mean height across combinations of categories.
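
A tiny runnable sketch of the same call, using a made-up four-row `dog_pack` (the heights are illustrative, not the course data). It shows that the default aggregation is the mean and that a breed/color combination with no rows becomes `NaN`:

```python
import pandas as pd

# Illustrative data: there is no Brown Boxer, so that cell will be NaN
dog_pack = pd.DataFrame({
    "breed": ["Beagle", "Beagle", "Beagle", "Boxer"],
    "color": ["Black", "Black", "Brown", "Black"],
    "height_cm": [34.0, 36.0, 37.0, 57.0],
})

heights = dog_pack.pivot_table(values="height_cm", index="breed", columns="color")
print(heights)

print(heights.loc["Beagle", "Black"])          # 35.0, the mean of 34.0 and 36.0
print(pd.isna(heights.loc["Boxer", "Brown"]))  # True
```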

---

## 🔍 Subsetting with `.loc[]` and Slicing

You can use `.loc[]` to select specific rows from a pivot table — especially useful when your index is sorted.

```python
dogs_height_by_breed_vs_color.loc["Chow Chow":"Poodle"]
```

This selects a range of breeds alphabetically from **Chow Chow to Poodle**.

### 📌 Example Output

| breed     | Black | Brown | Gray  | Tan   | White |
| --------- | ----- | ----- | ----- | ----- | ----- |
| Chow Chow | 51.26 | 50.48 | NaN   | 53.50 | 54.41 |
| Dachshund | 21.19 | 19.72 | NaN   | 19.38 | 20.66 |
| Labrador  | 57.13 | NaN   | NaN   | 55.19 | 55.31 |
| Poodle    | 48.04 | 57.13 | 56.65 | NaN   | 44.74 |

---

## 🔁 Using the `axis` Argument for Calculations

You can use `.mean()` or similar methods to compute summary statistics **across rows or columns** using the `axis` argument.

### ➕ Average Height per Color (Across Breeds)

```python
dogs_height_by_breed_vs_color.mean(axis="index")
```

Equivalent to: *"average for each color column"*

| Color | Mean Height (cm) |
| ----- | ---------------- |
| Black | 43.97            |
| Brown | 48.72            |
| Gray  | 48.11            |
| Tan   | 44.93            |
| White | 44.47            |

🧠 **Tip**: `axis="index"` (or `axis=0`) = column-wise operation = average across rows (i.e., across breeds)

---

### ➕ Average Height per Breed (Across Colors)

```python
dogs_height_by_breed_vs_color.mean(axis="columns")
```

Equivalent to: *"average for each breed row"*

| Breed       | Mean Height (cm) |
| ----------- | ---------------- |
| Beagle      | 36.36            |
| Boxer       | 59.36            |
| Chihuahua   | 19.56            |
| Chow Chow   | 52.41            |
| Dachshund   | 20.24            |
| Labrador    | 55.88            |
| Poodle      | 51.64            |
| St. Bernard | 66.65            |

🧠 **Tip**: `axis="columns"` (or `axis=1`) = row-wise operation = average across columns (i.e., across colors)
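
To check the direction of each `axis` value, here is a sketch on a made-up 2×2 table of heights. Note that `.mean()` skips `NaN` cells by default:

```python
import pandas as pd

# Illustrative pivot-style table with one missing cell
heights = pd.DataFrame(
    {"Black": [35.0, 57.0], "Brown": [37.0, float("nan")]},
    index=pd.Index(["Beagle", "Boxer"], name="breed"),
)

per_color = heights.mean(axis="index")    # one value per column
per_breed = heights.mean(axis="columns")  # one value per row

print(per_color)  # Black 46.0, Brown 37.0
print(per_breed)  # Beagle 36.0, Boxer 57.0
```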

---

## ✅ Summary and Key Takeaways

* **Pivot tables** allow quick summaries of grouped data in tabular form.
* Use `.pivot_table(values=..., index=..., columns=...)` to create one.
* Use `.loc[]` and slicing for subsetting rows in pivot tables.
* Use `.mean(axis=...)`, `.sum(axis=...)`, etc. to compute summaries.

  * `axis="index"` (or `0`) → operate across rows (per column)
  * `axis="columns"` (or `1`) → operate across columns (per row)
* Pivot tables are perfect when your data has multiple categories (like breed and color), and you want to compute stats like mean height or weight.

---

💡 **Why this matters**: Pivot tables help simplify complex group-wise calculations, making your data analysis faster, cleaner, and more readable — especially in large datasets!

---



In [None]:
# Exercise
# Pivot temperature by city and year
# It's interesting to see how temperatures for each city change over time—looking at every month results in a big table, which can be tricky to reason about. Instead, let's look at how temperatures change by year.

# You can access the components of a date (year, month and day) using code of the form dataframe["column"].dt.component. For example, the month component is dataframe["column"].dt.month, and the year component is dataframe["column"].dt.year.

# Once you have the year column, you can create a pivot table with the data aggregated by city and year, which you'll explore in the coming exercises.

# pandas is loaded as pd. temperatures is available.

# Instructions
# 100 XP
# Add a year column to temperatures, from the year component of the date column.
# Make a pivot table of the avg_temp_c column, with country and city as rows, and year as columns. Assign to temp_by_country_city_vs_year, and look at the result.



# Add a year column to temperatures
temperatures['year'] = temperatures['date'].dt.year

# Pivot avg_temp_c by country and city vs year
temp_by_country_city_vs_year = temperatures.pivot_table(values='avg_temp_c', index=['country', 'city'], columns='year')

# See the result
print(temp_by_country_city_vs_year)

In [None]:
# Exercise
# Subsetting pivot tables
# A pivot table is just a DataFrame with sorted indexes, so the techniques you have learned already can be used to subset them. In particular, the .loc[] + slicing combination is often helpful.

# pandas is loaded as pd. temp_by_country_city_vs_year is available.

# Instructions
# 100 XP
# Use .loc[] on temp_by_country_city_vs_year to take subsets.

# From Egypt to India.
# From Egypt, Cairo to India, Delhi.
# From Egypt, Cairo to India, Delhi, and 2005 to 2010.


# Subset for Egypt to India
temp_by_country_city_vs_year.loc['Egypt':'India']

# Subset for Egypt, Cairo to India, Delhi
temp_by_country_city_vs_year.loc[('Egypt','Cairo'):('India','Delhi')]


# Subset for Egypt, Cairo to India, Delhi, and 2005 to 2010
temp_by_country_city_vs_year.loc[('Egypt','Cairo'):('India','Delhi'), 2005:2010]



# year                       2005       2006       2007       2008       2009       2010
# country  city                                                                         
# Egypt    Cairo        22.006500  22.050000  22.361000  22.644500  22.625000  23.718250
#          Gizeh        22.006500  22.050000  22.361000  22.644500  22.625000  23.718250
# Ethiopia Addis Abeba  18.312833  18.427083  18.142583  18.165000  18.765333  18.298250
# France   Paris        11.552917  11.788500  11.750833  11.278250  11.464083  10.409833
# Germany  Berlin        9.919083  10.545333  10.883167  10.657750  10.062500   8.606833
# India    Ahmadabad    26.828083  27.282833  27.511167  27.048500  28.095833  28.017833
#          Bangalore    25.476500  25.418250  25.464333  25.352583  25.725750  25.705250
#          Bombay       27.035750  27.381500  27.634667  27.177750  27.844500  27.765417
#          Calcutta     26.729167  26.986250  26.584583  26.522333  27.153250  27.288833
#          Delhi        25.716083  26.365917  26.145667  25.675000  26.554250  26.520250


In [None]:
# Exercise
# Calculating on a pivot table
# Pivot tables are filled with summary statistics, but they are only a first step to finding something insightful. Often you'll need to perform further calculations on them. A common thing to do is to find the rows or columns where the highest or lowest value occurs.

# Recall from Chapter 1 that you can easily subset a Series or DataFrame to find rows of interest using a logical condition inside of square brackets. For example: series[series > value].

# pandas is loaded as pd and the DataFrame temp_by_country_city_vs_year is available. The .head() for this DataFrame is shown below, with only a few of the year columns displayed:

# country       city        2000     2001     2002    ...     2013
# ---------------------------------------------------------------
# Afghanistan   Kabul       15.823   15.848   15.715  ...     16.206
# Angola        Luanda      24.410   24.427   24.791  ...     24.554
# Australia     Melbourne   14.320   14.180   14.076  ...     14.742
# Australia     Sydney      17.567   17.854   17.734  ...     18.090
# Bangladesh    Dhaka       25.905   25.931   26.095  ...     26.587

# Instructions
# 100 XP
# Calculate the mean temperature for each year, assigning to mean_temp_by_year.
# Filter mean_temp_by_year for the year that had the highest mean temperature.
# Calculate the mean temperature for each city (across columns), assigning to mean_temp_by_city.
# Filter mean_temp_by_city for the city that had the lowest mean temperature.


# Get the worldwide mean temp by year
mean_temp_by_year = temp_by_country_city_vs_year.mean()

# Filter for the year that had the highest mean temp
print(mean_temp_by_year[mean_temp_by_year == mean_temp_by_year.max()])

# Get the mean temp by city
mean_temp_by_city = temp_by_country_city_vs_year.mean(axis=1)

# Filter for the city that had the lowest mean temp
print(mean_temp_by_city[mean_temp_by_city == mean_temp_by_city.min()])


# <script.py> output:
#     year
#     2013    20.312285
#     dtype: float64
#     country  city  
#     China    Harbin    4.876551
#     dtype: float64
