<a href="https://colab.research.google.com/github/ddaeducation/DataAnalyst/blob/main/_Indexes_in_Pandas___Exercises.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Examples
### Objectives
1. Understand the concept of indexes in Pandas and their importance in data manipulation.
2. Learn how to create and modify indexes in a Pandas DataFrame.
3. Explore the different types of indexes available in Pandas, including default, custom, and multi-level indexes.
4. Analyze a real dataset using indexes to enhance data retrieval and analysis.
5. Develop skills to perform operations such as filtering, sorting, and grouping using indexes.

### Introduction
Pandas is a powerful data manipulation library in Python that provides data structures like Series and DataFrames to handle structured data efficiently. One of the key features of Pandas is its indexing capabilities, which allow for fast data retrieval and manipulation. An index in Pandas serves as a reference point for accessing data, similar to an index in a book. It can be thought of as a label for rows in a DataFrame, enabling users to perform operations like selection, filtering, and aggregation more intuitively.
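To make the "index as label" idea concrete, here is a minimal sketch that contrasts label-based and position-based lookup. The Series values and the labels `a`, `b`, `c` are made up for illustration and are not taken from the Iris dataset.

```python
import pandas as pd

# Hypothetical toy data: three measurements labelled with made-up sample IDs.
petal_length = pd.Series([1.4, 4.7, 6.0], index=["a", "b", "c"])

# .loc looks a value up by its index label.
by_label = petal_length.loc["b"]

# .iloc looks a value up by its integer position, ignoring the labels.
by_position = petal_length.iloc[1]

print(by_label, by_position)
```

Both lookups return the same element here because label `b` happens to sit at position 1; after sorting or filtering, labels stay attached to their rows while positions change.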

In this notebook, we will explore the concept of indexes in Pandas using a real dataset. We will use the "Iris" dataset, which contains measurements of different species of iris flowers. This dataset is widely used for demonstrating data analysis techniques and ships with scikit-learn, from which we load it below.

### Dataset
The Iris dataset consists of 150 samples of iris flowers, with four features: sepal length, sepal width, petal length, and petal width, along with the species of the iris flower.

### Questions
The ten questions below explore the Iris dataset; each is answered in its own cell.


In [None]:
# Question 1: What are the dimensions of the Iris dataset?
import pandas as pd
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
iris_df['species'] = iris.target_names[iris.target]

# Display the dimensions
iris_df.shape

(150, 5)

In [None]:
# Question 2: What are the first five rows of the Iris dataset?
iris_df.head()

| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |


In [None]:
# Question 3: How many unique species are present in the Iris dataset?
iris_df['species'].nunique()

3

In [None]:
# Question 4: What is the average sepal length for each species?
iris_df.groupby('species')['sepal length (cm)'].mean()

species
setosa        5.006
versicolor    5.936
virginica     6.588
Name: sepal length (cm), dtype: float64
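Note that the result above is itself indexed by `species`: `groupby` uses the group keys as the index of its output. The sketch below shows this on a small stand-in DataFrame; the name `toy` and its values are illustrative assumptions, not rows from the real dataset.

```python
import pandas as pd

# Toy stand-in for iris_df (illustrative values, not the real measurements).
toy = pd.DataFrame({
    "species": ["setosa", "setosa", "virginica", "virginica"],
    "sepal length (cm)": [5.0, 5.5, 6.0, 6.5],
})

# The group keys ("species") become the index of the aggregated Series.
means = toy.groupby("species")["sepal length (cm)"].mean()

# Because species is the index, a single group's mean is a label lookup.
setosa_mean = means.loc["setosa"]

# reset_index() turns the species index back into an ordinary column.
flat = means.reset_index()

print(means.index.name, setosa_mean)
```

Keeping the keys as the index is convenient for lookups; calling `reset_index()` is the usual way back to a flat table when the keys are needed as a column.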

In [None]:
# Question 5: What is the maximum petal width in the dataset?
iris_df['petal width (cm)'].max()

2.5

In [None]:
# Question 6: How many samples are there for each species?
iris_df['species'].value_counts()

species
setosa        50
versicolor    50
virginica     50
Name: count, dtype: int64

In [None]:
# Question 7: What is the standard deviation of the sepal width for the entire dataset?
iris_df['sepal width (cm)'].std()

0.435866284936698

In [None]:
# Question 8: Create a custom index using the sepal length and sepal width.
iris_df.set_index(['sepal length (cm)', 'sepal width (cm)'], inplace=True)
iris_df.index

MultiIndex([(5.1, 3.5),
            (4.9, 3.0),
            (4.7, 3.2),
            (4.6, 3.1),
            (5.0, 3.6),
            (5.4, 3.9),
            (4.6, 3.4),
            (5.0, 3.4),
            (4.4, 2.9),
            (4.9, 3.1),
            ...
            (6.7, 3.1),
            (6.9, 3.1),
            (5.8, 2.7),
            (6.8, 3.2),
            (6.7, 3.3),
            (6.7, 3.0),
            (6.3, 2.5),
            (6.5, 3.0),
            (6.2, 3.4),
            (5.9, 3.0)],
           names=['sepal length (cm)', 'sepal width (cm)'], length=150)
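Once a MultiIndex like the one above is in place, rows can be selected by their index tuples. The following is a sketch under stated assumptions: `sample` is a hand-built copy of the first five rows shown in Question 2, not the full `iris_df`, and the index is sorted first so that lookups behave predictably.

```python
import pandas as pd

# The first five Iris rows, copied from the output of Question 2.
sample = pd.DataFrame({
    "sepal length (cm)": [5.1, 4.9, 4.7, 4.6, 5.0],
    "sepal width (cm)": [3.5, 3.0, 3.2, 3.1, 3.6],
    "petal length (cm)": [1.4, 1.4, 1.3, 1.5, 1.4],
    "petal width (cm)": [0.2, 0.2, 0.2, 0.2, 0.2],
    "species": ["setosa"] * 5,
})

# Build the two-level index and sort it.
indexed = sample.set_index(["sepal length (cm)", "sepal width (cm)"]).sort_index()

# A full (length, width) tuple selects one row, returned as a Series.
row = indexed.loc[(5.1, 3.5)]

# xs() selects on a single level; that level is dropped from the result.
narrow = indexed.xs(4.9, level="sepal length (cm)")

print(row["species"], len(narrow))
```

Float measurements make fragile index keys, because lookups require exact equality and the full dataset contains repeated (length, width) pairs. In practice a MultiIndex is more often built from categorical columns such as `species`.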

In [None]:
# Question 9: How can we reset the index back to the default integer index?
iris_df.reset_index(inplace=True)
iris_df.head()

| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
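`reset_index` can also be used without `inplace=True`, and its `drop` parameter controls whether the old index is kept as a column. A minimal sketch on a toy DataFrame follows; the name `sample` and its values are illustrative assumptions.

```python
import pandas as pd

# Toy data for illustration only.
sample = pd.DataFrame({
    "species": ["setosa", "versicolor", "virginica"],
    "petal width (cm)": [0.2, 1.3, 2.5],
})
indexed = sample.set_index("species")

# Default: the old index returns as a regular column, and a fresh
# RangeIndex (0, 1, 2, ...) takes its place.
restored = indexed.reset_index()

# drop=True discards the old index instead of keeping it as a column.
dropped = indexed.reset_index(drop=True)

print(list(restored.columns), list(dropped.columns))
```

Returning a new DataFrame, rather than mutating with `inplace=True`, leaves `indexed` untouched, which makes notebook cells safer to re-run.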


In [None]:
# Question 10: What is the correlation between petal length and petal width?
iris_df[['petal length (cm)', 'petal width (cm)']].corr()

| | petal length (cm) | petal width (cm) |
|---|---|---|
| petal length (cm) | 1.0 | 0.962865 |
| petal width (cm) | 0.962865 | 1.0 |



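These ten questions give a basic exploration of the Iris dataset. Questions 8 and 9 work with the index directly: `set_index` builds a two-level MultiIndex from the sepal measurements, and `reset_index` restores the default integer index. Questions 4 and 6 also return results indexed by species, because `groupby` and `value_counts` use the grouped values as the index of their output.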

### Exercises
To build a DataFrame from the URL below and preview its first rows, use the `pandas` library:

1. Import the `pandas` library.
2. Use the `read_csv` function to read the CSV file from the URL.
3. Use the `head` method to display the first rows of the DataFrame. With no argument, `head()` returns five rows; `head(10)` returns ten.

The code below does this:


In [None]:
import pandas as pd
# Load the data from the URL
url = 'https://raw.githubusercontent.com/bbwieland/ncaa-projections/main/data/TeamRatings.csv'
df = pd.read_csv(url)
# Display the first five rows (use df.head(10) for ten)
df.head()

| | team | season_ortg | season_drtg | season_nrtg | avg_poss | off_rk | def_rk | net_rk | tempo_rk |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Houston | 11.336137 | 17.116614 | 28.452751 | 64.214595 | 12 | 3 | 1 | 333 |
| 1 | UConn | 15.391627 | 12.184026 | 27.575654 | 67.672105 | 1 | 15 | 2 | 188 |
| 2 | Alabama | 9.73644 | 16.823355 | 26.559795 | 73.336216 | 22 | 4 | 3 | 9 |
| 3 | Tennessee | 5.257544 | 20.826315 | 26.083858 | 65.479444 | 58 | 1 | 4 | 299 |
| 4 | UCLA | 8.589127 | 16.337035 | 24.926162 | 66.617838 | 25 | 5 | 5 | 244 |
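Since this notebook is about indexes, a natural next step is to use the `team` column as the index. The sketch below rebuilds the five rows shown above by hand (a subset of columns, so it runs without network access); applying the same calls to the full `df` is an assumption about how the rest of the file looks.

```python
import pandas as pd

# Five rows and three columns copied from the preview above.
teams = pd.DataFrame({
    "team": ["Houston", "UConn", "Alabama", "Tennessee", "UCLA"],
    "season_nrtg": [28.452751, 27.575654, 26.559795, 26.083858, 24.926162],
    "net_rk": [1, 2, 3, 4, 5],
    "tempo_rk": [333, 188, 9, 299, 244],
}).set_index("team")

# Look up one team's value by its label.
uconn_net = teams.loc["UConn", "season_nrtg"]

# Sort by a column; the team labels travel with their rows.
lowest_tempo_rk = teams.sort_values("tempo_rk").index[0]

# Sort by the index itself to get alphabetical order.
alphabetical = teams.sort_index()

print(uconn_net, lowest_tempo_rk, list(alphabetical.index))
```

With team names as the index, selections read like `teams.loc["UConn"]` instead of a boolean filter on the `team` column.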



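In the questions below, "rating" can be read as the net rating column `season_nrtg`; the same analysis applies to `season_ortg` and `season_drtg`.

1. **What are the top 5 teams based on their ratings?**
   - This question can help identify the highest-rated teams in the dataset.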

2. **How many teams are included in the dataset?**
   - This question will provide insight into the size of the dataset.

3. **What is the average rating of all teams?**
   - This question can help understand the overall performance level of the teams.

4. **Which team has the lowest rating, and what is that rating?**
   - This question can identify the team that is currently rated the lowest.

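5. **How do the ratings vary by conference (if the dataset includes a conference column)?**
   - This question can explore the distribution of ratings across different conferences. The preview above shows no conference column, so this may require joining in additional data.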

6. **What is the distribution of team ratings (e.g., mean, median, standard deviation)?**
   - This question can provide statistical insights into the ratings.

7. **Are there any teams with ratings significantly higher or lower than the average?**
   - This question can help identify outliers in the dataset.

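8. **How many teams have a rating above a certain threshold (e.g., a `season_nrtg` above 10)?**
   - This question can help assess the number of high-performing teams. Choose the threshold from the data: the highest net rating in the preview is about 28.5.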

9. **What is the correlation between team ratings and other numerical features in the dataset?**
   - This question can explore relationships between ratings and other metrics.

10. **How have team ratings changed over time (if the dataset includes a time component)?**
    - This question can analyze trends in team performance over different seasons or years.

These questions can guide further analysis and exploration of the dataset, providing insights into team performance and comparisons.
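As a starting point for questions 1, 4, 6, and 8, the sketch below applies a few index-aware methods to the five `season_nrtg` values shown in the preview. The threshold of 26 is an arbitrary illustrative choice, and results on the full dataset will differ.

```python
import pandas as pd

# Net ratings for the five previewed teams, indexed by team name.
ratings = pd.Series(
    [28.452751, 27.575654, 26.559795, 26.083858, 24.926162],
    index=pd.Index(["Houston", "UConn", "Alabama", "Tennessee", "UCLA"], name="team"),
    name="season_nrtg",
)

# Question 1 (adapted): the highest-rated teams, labels included.
top3 = ratings.nlargest(3)

# Question 4: idxmin() returns the index label of the lowest rating.
lowest_team = ratings.idxmin()

# Question 6: several summary statistics in one call.
summary = ratings.agg(["mean", "median", "std"])

# Question 8: count teams above an illustrative threshold.
above = int((ratings > 26).sum())

print(list(top3.index), lowest_team, above)
```

Because the team names are the index, `nlargest` and `idxmin` report teams directly, with no separate lookup step.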