# DataFrame and Series Indices

## DataFrame Indices

In Pandas, a DataFrame is a two-dimensional labeled data structure with columns that can be of different data types. Each column in a DataFrame is a Pandas Series, and the entire DataFrame has both row and column indices [Pandas Developers, 2023].

Row indices can be customized or left as default. The default row index is a sequence of integers starting from 0. However, you can set a specific column to be the index or assign custom index labels to rows.

<font color='Blue'><b>Example:</b></font>

In [None]:
import pandas as pd
import numpy as np
import string

# Define the number of rows
n = 10

# Generate random data with n rows and 2 columns
data = np.random.randint(0, 100, size=(n, 2))

# Use the first n uppercase letters (A, B, C, ...) as index labels
index_labels = list(string.ascii_uppercase[:n])

# Create a DataFrame with random data, named columns, and custom index
df = pd.DataFrame(data = data,
                  columns=['Col 1', 'Col 2'],
                  index=index_labels)

# Print the DataFrame
print("Generated DataFrame:")
display(df)
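
The other option mentioned above, promoting an existing column to the row index, can be sketched as follows. The column names and values here are hypothetical and chosen only for illustration; `set_index` returns a new DataFrame rather than modifying the original.

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({
    'station': ['S1', 'S2', 'S3'],
    'value': [10, 20, 30],
})

# Default row index: the integers 0, 1, 2
print(df.index)

# Promote the 'station' column to the row index (returns a new DataFrame)
df_by_station = df.set_index('station')
print(df_by_station)

# Alternatively, assign labels directly; the length must match the number of rows
df_custom = df.copy()
df_custom.index = ['a', 'b', 'c']
print(df_custom)
```

After `set_index`, `station` is no longer a regular column; `reset_index()` would move it back.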

## Series Indices

A Series is a one-dimensional labeled array in Pandas. Like a DataFrame, a Series has an index that provides a label for each element. The default index is the same as the default row index of a DataFrame (a sequence of integers starting from 0), but you can supply custom labels instead [Pandas Developers, 2023].

<font color='Blue'><b>Example:</b></font>

In [None]:
import pandas as pd
import numpy as np
import string

# Define the number of elements
n = 10

# Generate n random integers in the range [0, 100)
data = np.random.randint(0, 100, size=n)

# Use the first n uppercase letters (A, B, C, ...) as index labels
index_labels = list(string.ascii_uppercase[:n])

# Create a Pandas Series with random data and custom index
series = pd.Series(data, index=index_labels)

# Print the Series
display(series)
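
Once a Series has custom labels, elements can be retrieved either by label or by position. The short sketch below uses a small hand-written Series (an assumption made for readability, not the random one generated above) to show both.

```python
import pandas as pd

# Small illustrative Series with custom labels
series = pd.Series([10, 20, 30], index=['A', 'B', 'C'])

# The index object that holds the labels
print(series.index)

# Label-based access with .loc and position-based access with .iloc
print(series.loc['B'])
print(series.iloc[1])

# Relabel selected entries; rename returns a new Series
renamed = series.rename(index={'A': 'alpha'})
print(renamed)
```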

## Index Alignment in Pandas

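Index alignment means that when Pandas combines two Series or DataFrames, it matches values by index label rather than by position. This alignment is crucial for accurately combining, comparing, and performing arithmetic operations on data with different structures but related indices.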

<font color='Blue'><b>Example - Series Alignment:</b></font>

In [None]:
import pandas as pd

# Create two Pandas Series
data1 = pd.Series([10, 20, 30], index=['A', 'B', 'C'])
data2 = pd.Series([5, 15, 25], index=['B', 'C', 'D'])

# Perform element-wise addition on the Series
result = data1 + data2

# Display the result
print(result)

<center>
<img src="https://raw.githubusercontent.com/HatefDastour/hatefdastour.github.io/master/_notes/Introduction_to_Digital_Engineering/_images/Index_Alignment_Fig1.png" alt="picture" width="700">
</center>

In this example, the Series `data1` and `data2` have different indices. When the addition is performed, Pandas aligns the two Series on their index labels. Sums are computed only for labels present in both Series (`B` and `C`); labels present in only one of them (`A` and `D`) receive NaN (Not a Number).
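
The same rule applies to DataFrames, where alignment happens on both the row index and the column labels. The sketch below uses two small hypothetical DataFrames that overlap in a single cell.

```python
import pandas as pd

# Two hypothetical DataFrames that share only row 'B' and column 'y'
df1 = pd.DataFrame({'x': [1, 2], 'y': [3, 4]}, index=['A', 'B'])
df2 = pd.DataFrame({'y': [10, 20], 'z': [30, 40]}, index=['B', 'C'])

# The result covers the union of row labels and the union of column labels
total = df1 + df2
print(total)
```

Only the cell at row `B`, column `y` exists in both inputs, so it is the only cell with a numeric result; every other cell is NaN.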

NaN is a special floating-point value that Pandas uses to represent missing or undefined numeric data. Because NaN is a float, a result that contains it is stored with a floating-point data type even when the inputs were integers.
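
When NaN is not the desired outcome for unmatched labels, there are a few common ways to handle it. The following sketch reuses the two Series from the alignment example and shows three options: detecting the missing entries, treating a missing operand as 0 via the `fill_value` argument of `Series.add`, and dropping the unmatched labels.

```python
import pandas as pd

data1 = pd.Series([10, 20, 30], index=['A', 'B', 'C'])
data2 = pd.Series([5, 15, 25], index=['B', 'C', 'D'])

# Plain addition: labels found in only one Series become NaN
result = data1 + data2

# Detect which entries are missing
print(result.isna())

# Treat a label that is missing from one Series as 0 before adding
filled = data1.add(data2, fill_value=0)
print(filled)

# Or keep only the labels that were present in both Series
common = result.dropna()
print(common)
```

Which option is appropriate depends on what a missing label means in the data at hand; substituting 0 is only one possible assumption.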

## Creating New Rows and Columns

You can assign new values to rows and columns even if they don't exist in the DataFrame.

```python
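# Assign new values; the row or column is created if it does not exist yet
# (new_row_label, new_row_values, and new_column_values are placeholders)
df.loc[new_row_label] = new_row_values
df['new_column'] = new_column_values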
```
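
A minimal runnable version of this pattern, using a small hypothetical DataFrame, might look like this:

```python
import pandas as pd

# Hypothetical starting DataFrame with two rows and two columns
df = pd.DataFrame({'Col 1': [1, 2], 'Col 2': [3, 4]}, index=['A', 'B'])

# Label 'C' does not exist yet, so .loc adds it as a new row
df.loc['C'] = [5, 6]

# Column 'Total' does not exist yet, so the assignment creates it
df['Total'] = df['Col 1'] + df['Col 2']

print(df)
```

The list assigned to a new row must have one value per existing column, and a Series assigned to a new column is aligned on the row index.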

<font color='Blue'><b>Example:</b></font>

In [None]:
import pandas as pd

# This example is from
# https://pandas.pydata.org/docs/getting_started/intro_tutorials/05_add_columns.html
# The Air Quality NO2 dataset is from
# http://dhhagan.github.io/py-openaq/index.html
df = pd.read_csv('https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/air_quality_no2.csv')
print('Air Quality NO2:')
df.head(8)

The goal is to express the $NO_2$ concentration at the London station in milligrams per cubic meter (mg/m³). Assuming a temperature of 25 degrees Celsius and a pressure of 1013 hPa, the conversion factor is 1.882 (further information can be found [here](https://www.ncbi.nlm.nih.gov/books/NBK138707/)).

In [None]:
df["london_mg_per_cubic"] = df["station_london"] * 1.882
df.head(8)
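
The assignment above modifies `df` in place. If the original DataFrame should stay unchanged, `DataFrame.assign` is one alternative: it returns a new DataFrame with the extra column. The sketch below uses a few hypothetical readings instead of the downloaded dataset so that it runs without network access.

```python
import pandas as pd

# Hypothetical readings, for illustration only (not taken from the dataset)
df = pd.DataFrame({'station_london': [10.0, 20.0, 30.0]})

# assign returns a new DataFrame; df itself is left untouched
converted = df.assign(london_mg_per_cubic=df['station_london'] * 1.882)

print(converted)
print(df.columns.tolist())
```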