<a href="https://colab.research.google.com/github/JulTob/Python/blob/master/Pandas.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🐼 Welcome to the Grand Library of DataFrames!

Step into a world where data transforms into organized knowledge. This is the Grand Library of DataFrames, a place where we'll learn to manage, explore, and understand our data using the powerful magic of Pandas in Python.

Think of a DataFrame as a grand catalog within this library, filled with structured information. Let's begin our magical journey by opening the gates to this library.

### Creating Your First Catalog (DataFrame)

Every library needs a catalog to keep track of its treasures. In our Grand Library, we create catalogs (DataFrames) from various sources, like enchanted scrolls (dictionaries), lists of artifacts, or even ancient texts (CSV files).

We will build a simple catalog of magical artifacts in the Creating a DataFrame section, right after pandas is imported.

### Exploring Your Catalog

Now that we have our magical catalog, let's learn how to browse through its entries.

**Peeking at the first or last entries:**

You can quickly peek at the first few artifacts using the `.head()` spell or the last few with the `.tail()` spell.

**Understanding Your Catalog's Secrets:**

To truly understand the nature of your catalog, you can use special incantations.

The `.info()` spell reveals a concise summary of the catalog, including the type of magic (data type) in each column and how many entries are not empty.
The `.describe()` spell conjures up descriptive statistics of the numerical aspects of your artifacts, like the average power level.

**Focusing on Specific Columns:**

Sometimes you only need to focus on specific types of information in your catalog, like just the 'Artifact Name' or 'Power Level'. You can select a single column by calling its name in square brackets `[]` or multiple columns by listing their names.

## Introduction to Pandas DataFrames

Pandas is a powerful open-source library for data analysis and manipulation in Python. Its core data structure is the DataFrame, which is a two-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or a SQL table.

Let's start by importing the pandas library.

In [1]:
import pandas as pd

print("The gates to the Grand Library of DataFrames are now open!")

The gates to the Grand Library of DataFrames are now open!


### Creating a DataFrame

You can create a DataFrame from various data sources, such as dictionaries, lists, or CSV files. Here's an example of creating a DataFrame from a dictionary:

In [2]:
data = {'Artifact Name': ['Phoenix Feather', 'Dragon Scale', 'Unicorn Horn', 'Griffin Claw'],
        'Power Level': [10, 9, 8, 7],
        'Location': ['Forbidden Forest', 'Dragon Mountains', 'Mystical Meadow', 'Sky Peaks']}
magic_artifacts_df = pd.DataFrame(data)
print("Behold! Your first magical catalog (DataFrame):")
display(magic_artifacts_df)

Behold! Your first magical catalog (DataFrame):


| | Artifact Name | Power Level | Location |
|---|---|---|---|
| 0 | Phoenix Feather | 10 | Forbidden Forest |
| 1 | Dragon Scale | 9 | Dragon Mountains |
| 2 | Unicorn Horn | 8 | Mystical Meadow |
| 3 | Griffin Claw | 7 | Sky Peaks |
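
The text above also mentions lists as a source. As a minimal sketch (the names `artifact_rows` and `artifacts_from_list_df` are illustrative and not part of the notebook), the same kind of catalog can be built from a list of rows by supplying the column names separately:

```python
import pandas as pd

# Each inner list is one row of the catalog; column names are passed separately.
artifact_rows = [
    ['Phoenix Feather', 10, 'Forbidden Forest'],
    ['Dragon Scale', 9, 'Dragon Mountains'],
]
artifacts_from_list_df = pd.DataFrame(
    artifact_rows,
    columns=['Artifact Name', 'Power Level', 'Location'],
)
print(artifacts_from_list_df)
```

The dictionary form is usually easier to read when you think column by column; the list-of-rows form fits data that arrives one record at a time.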


### Basic DataFrame Operations

Once you have a DataFrame, you can perform various operations on it.

**Viewing data:**

You can view the first few rows using `.head()` and the last few rows using `.tail()`.

In [3]:
print("Peeking at the first 2 artifacts:")
display(magic_artifacts_df.head(2))

print("\nLooking at the last artifact:")
display(magic_artifacts_df.tail(1))

Peeking at the first 2 artifacts:


| | Artifact Name | Power Level | Location |
|---|---|---|---|
| 0 | Phoenix Feather | 10 | Forbidden Forest |
| 1 | Dragon Scale | 9 | Dragon Mountains |



Looking at the last artifact:


| | Artifact Name | Power Level | Location |
|---|---|---|---|
| 3 | Griffin Claw | 7 | Sky Peaks |
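
Both spells also work without an argument, in which case pandas returns five rows. A small sketch, using a made-up seven-entry catalog (`numbers_df` is a hypothetical name) so the default is visible:

```python
import pandas as pd

# A hypothetical catalog with seven numbered entries.
numbers_df = pd.DataFrame({'Entry': range(1, 8)})

first_rows = numbers_df.head()   # no argument: the first 5 rows
last_rows = numbers_df.tail(2)   # explicit argument: the last 2 rows

print(len(first_rows))
print(last_rows['Entry'].tolist())
```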


**Getting information about the DataFrame:**

`.info()` provides a concise summary of the DataFrame, including the data types of each column and the number of non-null values.
`.describe()` generates descriptive statistics of the numerical columns.

In [4]:
print("Unveiling the catalog's information:")
magic_artifacts_df.info()  # .info() prints its summary itself and returns None, so it is not wrapped in display()

print("\nDescribing the magical properties (numerical columns):")
display(magic_artifacts_df.describe())

Unveiling the catalog's information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Artifact Name  4 non-null      object
 1   Power Level    4 non-null      int64 
 2   Location       4 non-null      object
dtypes: int64(1), object(2)
memory usage: 228.0+ bytes


Describing the magical properties (numerical columns):


| | Power Level |
|---|---|
| count | 4.0 |
| mean | 8.5 |
| std | 1.290994 |
| min | 7.0 |
| 25% | 7.75 |
| 50% | 8.5 |
| 75% | 9.25 |
| max | 10.0 |
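
To see where these numbers come from, the sketch below recomputes a few of them directly from the four power levels. It is only an illustration (the variable names are assumptions); the one detail worth noting is that `.std()` in pandas is the sample standard deviation, which divides by n - 1 rather than n.

```python
import pandas as pd

power_levels = pd.Series([10, 9, 8, 7], name='Power Level')

mean_power = power_levels.mean()            # (10 + 9 + 8 + 7) / 4
std_power = power_levels.std()              # sample standard deviation (ddof=1)
median_power = power_levels.quantile(0.5)   # the "50%" row
lower_quartile = power_levels.quantile(0.25)  # the "25%" row

print(mean_power, round(std_power, 6), median_power, lower_quartile)
```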



**Selecting columns:**

You can select a single column using square brackets `[]` or multiple columns using a list of column names.

In [5]:
print("\nFocusing on just the names of the artifacts:")
display(magic_artifacts_df['Artifact Name'])

print("\nExamining the power levels and locations:")
display(magic_artifacts_df[['Power Level', 'Location']])


Focusing on just the names of the artifacts:


| | Artifact Name |
|---|---|
| 0 | Phoenix Feather |
| 1 | Dragon Scale |
| 2 | Unicorn Horn |
| 3 | Griffin Claw |



Examining the power levels and locations:


| | Power Level | Location |
|---|---|---|
| 0 | 10 | Forbidden Forest |
| 1 | 9 | Dragon Mountains |
| 2 | 8 | Mystical Meadow |
| 3 | 7 | Sky Peaks |
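
The two forms of the square-bracket spell return different kinds of objects, which matters once you start chaining operations. A minimal sketch with a two-row stand-in catalog (`catalog_df` is a hypothetical name):

```python
import pandas as pd

catalog_df = pd.DataFrame({
    'Artifact Name': ['Phoenix Feather', 'Dragon Scale'],
    'Power Level': [10, 9],
})

single_column = catalog_df['Power Level']        # one name -> a Series
one_column_frame = catalog_df[['Power Level']]   # a list of names -> a DataFrame

print(type(single_column).__name__)
print(type(one_column_frame).__name__)
```

Even with a single name inside, the list form keeps the result two-dimensional.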


**Selecting rows:**

You can select rows by their position in the catalog using `.iloc[]` (integer-based) or by their index label using `.loc[]` (label-based).

In [25]:
print("\nRetrieving the first artifact in the catalog using iloc (by position):")
display(magic_artifacts_df.iloc[0])


Retrieving the first artifact in the catalog using iloc (by position):


| | 0 |
|---|---|
| Artifact Name | Phoenix Feather |
| Power Level | 10 |
| Location | Forbidden Forest |


In [26]:
print("\nGetting the second and third artifacts by their position using iloc:")
display(magic_artifacts_df.iloc[1:3])


Getting the second and third artifacts by their position using iloc:


| | Artifact Name | Power Level | Location |
|---|---|---|---|
| 1 | Dragon Scale | 9 | Dragon Mountains |
| 2 | Unicorn Horn | 8 | Mystical Meadow |


In [24]:
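# Example using .loc with a label-indexed DataFrame.
# Build it here with set_index() so this cell runs on its own, in reading order.
magic_artifacts_indexed_df = magic_artifacts_df.set_index('Artifact Name')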
print("\nRetrieving the 'Dragon Scale' artifact using loc (by magical name):")
display(magic_artifacts_indexed_df.loc['Dragon Scale'])

print("\nRetrieving multiple artifacts using loc:")
display(magic_artifacts_indexed_df.loc[['Phoenix Feather', 'Unicorn Horn']])

print("\nRetrieving 'Power Level' and 'Location' for 'Dragon Scale' using loc:")
display(magic_artifacts_indexed_df.loc['Dragon Scale', ['Power Level', 'Location']])

print("\nRetrieving 'Power Level' for multiple artifacts using loc:")
display(magic_artifacts_indexed_df.loc[['Phoenix Feather', 'Unicorn Horn'], 'Power Level'])

print("\nRetrieving all columns for artifacts from 'Dragon Scale' to 'Griffin Claw' using loc:")
display(magic_artifacts_indexed_df.loc['Dragon Scale':'Griffin Claw', :])

# Selecting by row and column together, with loc (labels) and iloc (positions):
print("\nRetrieving 'Power Level' for 'Dragon Scale' using loc (row and column label):")
display(magic_artifacts_indexed_df.loc['Dragon Scale', 'Power Level'])

print("\nRetrieving 'Power Level' for 'Dragon Scale' using iloc (row and column index):")
display(magic_artifacts_indexed_df.iloc[1, 0]) # Dragon Scale is at index 1, Power Level is at index 0

print("\nRetrieving 'Power Level' for 'Dragon Scale' and 'Unicorn Horn' using loc (row labels and column label):")
display(magic_artifacts_indexed_df.loc[['Dragon Scale', 'Unicorn Horn'], 'Power Level'])

print("\nRetrieving 'Power Level' for 'Dragon Scale' and 'Unicorn Horn' using iloc (row indices and column index):")
display(magic_artifacts_indexed_df.iloc[[1, 2], 0]) # Dragon Scale at 1, Unicorn Horn at 2, Power Level at 0

print("\nRetrieving 'Power Level' and 'Location' for 'Dragon Scale' and 'Unicorn Horn' using loc (row labels and column labels):")
display(magic_artifacts_indexed_df.loc[['Dragon Scale', 'Unicorn Horn'], ['Power Level', 'Location']])

print("\nRetrieving 'Power Level' and 'Location' for 'Dragon Scale' and 'Unicorn Horn' using iloc (row indices and column indices):")
display(magic_artifacts_indexed_df.iloc[[1, 2], [0, 1]]) # Dragon Scale at 1, Unicorn Horn at 2, Power Level at 0, Location at 1


Retrieving the 'Dragon Scale' artifact using loc (by magical name):


| | Dragon Scale |
|---|---|
| Power Level | 9 |
| Location | Dragon Mountains |



Retrieving multiple artifacts using loc:


| Artifact Name | Power Level | Location |
|---|---|---|
| Phoenix Feather | 10 | Forbidden Forest |
| Unicorn Horn | 8 | Mystical Meadow |



Retrieving 'Power Level' and 'Location' for 'Dragon Scale' using loc:


| | Dragon Scale |
|---|---|
| Power Level | 9 |
| Location | Dragon Mountains |



Retrieving 'Power Level' for multiple artifacts using loc:


| Artifact Name | Power Level |
|---|---|
| Phoenix Feather | 10 |
| Unicorn Horn | 8 |



Retrieving all columns for artifacts from 'Dragon Scale' to 'Griffin Claw' using loc:


| Artifact Name | Power Level | Location |
|---|---|---|
| Dragon Scale | 9 | Dragon Mountains |
| Unicorn Horn | 8 | Mystical Meadow |
| Griffin Claw | 7 | Sky Peaks |



Retrieving 'Power Level' for 'Dragon Scale' using loc (row and column label):


np.int64(9)


Retrieving 'Power Level' for 'Dragon Scale' using iloc (row and column index):


np.int64(9)


Retrieving 'Power Level' for 'Dragon Scale' and 'Unicorn Horn' using loc (row labels and column label):


| Artifact Name | Power Level |
|---|---|
| Dragon Scale | 9 |
| Unicorn Horn | 8 |



Retrieving 'Power Level' for 'Dragon Scale' and 'Unicorn Horn' using iloc (row indices and column index):


| Artifact Name | Power Level |
|---|---|
| Dragon Scale | 9 |
| Unicorn Horn | 8 |



Retrieving 'Power Level' and 'Location' for 'Dragon Scale' and 'Unicorn Horn' using loc (row labels and column labels):


| Artifact Name | Power Level | Location |
|---|---|---|
| Dragon Scale | 9 | Dragon Mountains |
| Unicorn Horn | 8 | Mystical Meadow |



Retrieving 'Power Level' and 'Location' for 'Dragon Scale' and 'Unicorn Horn' using iloc (row indices and column indices):


| Artifact Name | Power Level | Location |
|---|---|---|
| Dragon Scale | 9 | Dragon Mountains |
| Unicorn Horn | 8 | Mystical Meadow |
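
One difference between the two spells is easy to miss in the outputs above: a `.loc[]` slice includes its end label, while an `.iloc[]` slice stops before its end position. A minimal sketch on a small stand-in catalog (`catalog_df` is a hypothetical name):

```python
import pandas as pd

catalog_df = pd.DataFrame(
    {'Power Level': [10, 9, 8, 7]},
    index=['Phoenix Feather', 'Dragon Scale', 'Unicorn Horn', 'Griffin Claw'],
)

by_label = catalog_df.loc['Dragon Scale':'Unicorn Horn']  # end label is included
by_position = catalog_df.iloc[1:2]                        # end position is excluded

print(by_label.index.tolist())
print(by_position.index.tolist())
```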


### Setting a Magical Identifier (Index)

By default, your catalog uses a simple numerical order as its identifier. However, you can set one of the columns as a unique magical identifier (index) for easier retrieval of artifacts. We can use the `set_index()` spell for this.

In [7]:
print("\nSetting 'Artifact Name' as the magical identifier:")
magic_artifacts_indexed_df = magic_artifacts_df.set_index('Artifact Name')
display(magic_artifacts_indexed_df)


Setting 'Artifact Name' as the magical identifier:


| Artifact Name | Power Level | Location |
|---|---|---|
| Phoenix Feather | 10 | Forbidden Forest |
| Dragon Scale | 9 | Dragon Mountains |
| Unicorn Horn | 8 | Mystical Meadow |
| Griffin Claw | 7 | Sky Peaks |


Now you can retrieve artifacts directly using their magical name:

In [12]:
print("\nRetrieving the 'Dragon Scale' artifact using its magical name:")
display(magic_artifacts_indexed_df.loc['Dragon Scale'])


Retrieving the 'Dragon Scale' artifact using its magical name:


| | Dragon Scale |
|---|---|
| Power Level | 9 |
| Location | Dragon Mountains |
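
Note that `set_index()` hands back a new catalog and leaves the original untouched, and that `reset_index()` undoes it. The sketch below illustrates both points on a two-row stand-in (`catalog_df`, `indexed_df`, and `restored_df` are hypothetical names):

```python
import pandas as pd

catalog_df = pd.DataFrame({
    'Artifact Name': ['Phoenix Feather', 'Dragon Scale'],
    'Power Level': [10, 9],
})

indexed_df = catalog_df.set_index('Artifact Name')  # returns a new DataFrame
restored_df = indexed_df.reset_index()              # the index becomes a column again

print(catalog_df.columns.tolist())     # the original still has both columns
print(indexed_df.columns.tolist())     # the name column has moved into the index
print(restored_df.equals(catalog_df))
```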


### Assigning New Magical Identifiers (Index)

Instead of using an existing column, you can also assign a new list of magical identifiers to your catalog.

In [13]:
magic_artifacts_df_new_index = magic_artifacts_df.copy() # Create a copy to keep the original DataFrame

new_magical_ids = ['Artifact_1', 'Artifact_2', 'Artifact_3', 'Artifact_4']
magic_artifacts_df_new_index.index = new_magical_ids

print("\nCatalog with new magical identifiers:")
display(magic_artifacts_df_new_index)


Catalog with new magical identifiers:


| | Artifact Name | Power Level | Location |
|---|---|---|---|
| Artifact_1 | Phoenix Feather | 10 | Forbidden Forest |
| Artifact_2 | Dragon Scale | 9 | Dragon Mountains |
| Artifact_3 | Unicorn Horn | 8 | Mystical Meadow |
| Artifact_4 | Griffin Claw | 7 | Sky Peaks |
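
One thing to keep in mind when assigning identifiers this way: the list needs exactly one label per row. The sketch below (with a hypothetical `catalog_df`) shows pandas rejecting a list that is too short:

```python
import pandas as pd

catalog_df = pd.DataFrame({'Power Level': [10, 9, 8, 7]})

try:
    # Three labels for four rows: pandas refuses the assignment.
    catalog_df.index = ['Artifact_1', 'Artifact_2', 'Artifact_3']
    mismatch_rejected = False
except ValueError as error:
    mismatch_rejected = True
    print(f"Rejected: {error}")
```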


Now you can use these new magical identifiers to retrieve artifacts:

In [14]:
print("\nRetrieving 'Artifact_3' using its new magical identifier:")
display(magic_artifacts_df_new_index.loc['Artifact_3'])


Retrieving 'Artifact_3' using its new magical identifier:


| | Artifact_3 |
|---|---|
| Artifact Name | Unicorn Horn |
| Power Level | 8 |
| Location | Mystical Meadow |


### Adding a New Artifact to Your Catalog

To add a new artifact to your catalog, you can create a new DataFrame for the artifact and then use the `pd.concat()` spell to append it to your existing `magic_artifacts_df`.


In [17]:
new_artifact_data = {'Artifact Name': ["Goblin's Gold Coin"],
                     'Power Level': [6],
                     'Location': ["Goblin's Lair"]}
new_artifact_df = pd.DataFrame(new_artifact_data)

print("\nOur new artifact:")
display(new_artifact_df)

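# Concatenate the new artifact to the existing DataFrame.
# Note: this reassigns magic_artifacts_df, so every re-run of this cell appends the coin again.
# That is why the coin appears twice in the output below.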
magic_artifacts_df = pd.concat([magic_artifacts_df, new_artifact_df], ignore_index=True)

print("\nCatalog with the new artifact added:")
display(magic_artifacts_df)


Our new artifact:


| | Artifact Name | Power Level | Location |
|---|---|---|---|
| 0 | Goblin's Gold Coin | 6 | Goblin's Lair |



Catalog with the new artifact added:


| | Artifact Name | Power Level | Location |
|---|---|---|---|
| 0 | Phoenix Feather | 10 | Forbidden Forest |
| 1 | Dragon Scale | 9 | Dragon Mountains |
| 2 | Unicorn Horn | 8 | Mystical Meadow |
| 3 | Griffin Claw | 7 | Sky Peaks |
| 4 | Goblin's Gold Coin | 6 | Goblin's Lair |
| 5 | Goblin's Gold Coin | 6 | Goblin's Lair |
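
The `ignore_index=True` argument is what gives the combined catalog a fresh 0, 1, 2, ... numbering. The sketch below compares it with the default on a small stand-in catalog, and also shows `drop_duplicates()` as one possible way to clean up a row that was appended twice. All variable names here are illustrative assumptions.

```python
import pandas as pd

catalog_df = pd.DataFrame({'Artifact Name': ['Phoenix Feather', 'Dragon Scale']})
new_artifact_df = pd.DataFrame({'Artifact Name': ["Goblin's Gold Coin"]})

kept_labels = pd.concat([catalog_df, new_artifact_df])
renumbered = pd.concat([catalog_df, new_artifact_df], ignore_index=True)

print(kept_labels.index.tolist())   # original labels are kept, so 0 repeats
print(renumbered.index.tolist())    # fresh labels 0..n-1

# Appending the same artifact a second time, then removing the repeated row.
twice_added = pd.concat([renumbered, new_artifact_df], ignore_index=True)
cleaned = twice_added.drop_duplicates().reset_index(drop=True)
print(len(twice_added), len(cleaned))
```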


### Sorting Your Magical Artifacts

To bring order to your magical catalog, you can sort the artifacts based on the values in one or more columns. The `.sort_values()` spell allows you to arrange your artifacts. Let's sort them by 'Power Level' to see which are the most powerful!

In [18]:
print("\nSorting artifacts by Power Level (ascending):")
sorted_artifacts_ascending = magic_artifacts_df.sort_values(by='Power Level')
display(sorted_artifacts_ascending)

print("\nSorting artifacts by Power Level (descending):")
sorted_artifacts_descending = magic_artifacts_df.sort_values(by='Power Level', ascending=False)
display(sorted_artifacts_descending)


Sorting artifacts by Power Level (ascending):


| | Artifact Name | Power Level | Location |
|---|---|---|---|
| 5 | Goblin's Gold Coin | 6 | Goblin's Lair |
| 4 | Goblin's Gold Coin | 6 | Goblin's Lair |
| 3 | Griffin Claw | 7 | Sky Peaks |
| 2 | Unicorn Horn | 8 | Mystical Meadow |
| 1 | Dragon Scale | 9 | Dragon Mountains |
| 0 | Phoenix Feather | 10 | Forbidden Forest |



Sorting artifacts by Power Level (descending):


| | Artifact Name | Power Level | Location |
|---|---|---|---|
| 0 | Phoenix Feather | 10 | Forbidden Forest |
| 1 | Dragon Scale | 9 | Dragon Mountains |
| 2 | Unicorn Horn | 8 | Mystical Meadow |
| 3 | Griffin Claw | 7 | Sky Peaks |
| 4 | Goblin's Gold Coin | 6 | Goblin's Lair |
| 5 | Goblin's Gold Coin | 6 | Goblin's Lair |
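
Sorting by more than one column works by passing lists to `by` and `ascending`. In the sketch below, ties in 'Power Level' are broken alphabetically by name; the 'Amulet' entry is made up purely to create a tie, and `reset_index(drop=True)` is an optional extra that renumbers the rows after sorting.

```python
import pandas as pd

catalog_df = pd.DataFrame({
    'Artifact Name': ['Griffin Claw', "Goblin's Gold Coin", 'Amulet', 'Phoenix Feather'],
    'Power Level': [7, 6, 6, 10],
})

# Strongest first; among equal power levels, alphabetical by name.
sorted_df = catalog_df.sort_values(
    by=['Power Level', 'Artifact Name'],
    ascending=[False, True],
).reset_index(drop=True)

print(sorted_df)
```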


### Importing Artifacts from Ancient Scrolls (CSV Files)

Sometimes your magical artifacts are stored in ancient scrolls (CSV files). You can import this data directly into your catalog using the `pd.read_csv()` spell. You can even specify a column to be the magical identifier (index) when you import it using the `index_col` parameter.

In [19]:
# Let's use a sample CSV file available in this environment
csv_file_path = '/content/sample_data/california_housing_train.csv'

print(f"\nImporting artifacts from the ancient scroll: {csv_file_path}")
housing_df = pd.read_csv(csv_file_path, index_col='longitude')

print("\nBehold! Your new catalog conjured from the ancient scroll:")
display(housing_df.head())


Importing artifacts from the ancient scroll: /content/sample_data/california_housing_train.csv

Behold! Your new catalog conjured from the ancient scroll:


| longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value |
|---|---|---|---|---|---|---|---|---|
| -114.31 | 34.19 | 15.0 | 5612.0 | 1283.0 | 1015.0 | 472.0 | 1.4936 | 66900.0 |
| -114.47 | 34.4 | 19.0 | 7650.0 | 1901.0 | 1129.0 | 463.0 | 1.82 | 80100.0 |
| -114.56 | 33.69 | 17.0 | 720.0 | 174.0 | 333.0 | 117.0 | 1.6509 | 85700.0 |
| -114.57 | 33.64 | 14.0 | 1501.0 | 337.0 | 515.0 | 226.0 | 3.1917 | 73400.0 |
| -114.57 | 33.57 | 20.0 | 1454.0 | 326.0 | 624.0 | 262.0 | 1.925 | 65500.0 |
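
The sample file above only exists inside Colab. If you want to try `index_col` anywhere else, the sketch below reads a tiny made-up scroll from an in-memory string instead of a file; the CSV text and the name `scroll_df` are assumptions for illustration.

```python
import io

import pandas as pd

# A small CSV "scroll" held in memory, standing in for a file on disk.
scroll_text = """Artifact Name,Power Level,Location
Phoenix Feather,10,Forbidden Forest
Dragon Scale,9,Dragon Mountains
"""

scroll_df = pd.read_csv(io.StringIO(scroll_text), index_col='Artifact Name')

print(scroll_df)
print(scroll_df.loc['Dragon Scale', 'Power Level'])
```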


### Selecting Artifacts by Numerical Position (Slicing)

Just like selecting items from a list, you can select a range of artifacts from your catalog using numerical intervals within square brackets `[]`. This is often called "slicing". Remember that the end of the interval is *exclusive*, meaning the artifact at the end index is not included.

In [22]:
print("\nRetrieving the first two artifacts using slicing:")
display(magic_artifacts_df[0:2])

print("\nRetrieving artifacts from the third to the fifth (index 2 to 4) using slicing:")
display(magic_artifacts_df[2:5])


Retrieving the first two artifacts using slicing:


| | Artifact Name | Power Level | Location |
|---|---|---|---|
| 0 | Phoenix Feather | 10 | Forbidden Forest |
| 1 | Dragon Scale | 9 | Dragon Mountains |



Retrieving artifacts from the third to the fifth (index 2 to 4) using slicing:


| | Artifact Name | Power Level | Location |
|---|---|---|---|
| 2 | Unicorn Horn | 8 | Mystical Meadow |
| 3 | Griffin Claw | 7 | Sky Peaks |
| 4 | Goblin's Gold Coin | 6 | Goblin's Lair |
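
On a catalog with the default numerical index, this kind of slicing picks the same rows as `.iloc[]` with the same interval. A minimal sketch, assuming a five-row stand-in catalog named `catalog_df`:

```python
import pandas as pd

catalog_df = pd.DataFrame({'Power Level': [10, 9, 8, 7, 6]})

sliced = catalog_df[1:3]          # rows at positions 1 and 2; position 3 is excluded
with_iloc = catalog_df.iloc[1:3]  # the same rows, spelled explicitly

print(sliced['Power Level'].tolist())
print(sliced.equals(with_iloc))
```

Writing `.iloc[]` makes the intent explicit, which can help readers who might otherwise take the brackets for a column selection.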


This is just the beginning of our adventure in the Grand Library of DataFrames! There are many more spells (operations) to learn for manipulating, cleaning, and analyzing your data.

Good spells to learn next:

*   **Filter** artifacts based on their properties (e.g., find all artifacts with Power Level greater than 8).
*   Perform magical **calculations** on the Power Levels.
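
As a small preview of the filtering and calculation spells mentioned above, here is a minimal sketch. It rebuilds the original four-artifact catalog under the hypothetical name `catalog_df`, keeps the artifacts with a Power Level greater than 8, and computes the average power.

```python
import pandas as pd

catalog_df = pd.DataFrame({
    'Artifact Name': ['Phoenix Feather', 'Dragon Scale', 'Unicorn Horn', 'Griffin Claw'],
    'Power Level': [10, 9, 8, 7],
})

# A comparison on a column yields one True/False value per row...
is_powerful = catalog_df['Power Level'] > 8
# ...and indexing with that mask keeps only the rows marked True.
powerful_df = catalog_df[is_powerful]

average_power = catalog_df['Power Level'].mean()

print(powerful_df)
print(average_power)
```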