# 5. Pandas
Pandas is a powerful open-source library for data manipulation and analysis in Python. It is built on top of NumPy and provides data structures for working efficiently with structured data. Thanks to its flexibility and ease of use, it is widely used in data science, machine learning and data analysis. Compared with plain NumPy arrays, Pandas is better suited to structured, labeled data, especially when the columns hold heterogeneous data types and when complex operations are performed on tabular data (think of a spreadsheet, for example). This makes it very useful for the large labeled datasets often used in Artificial Intelligence.

## 5.1 Series & DataFrames
Pandas introduces two key data structures: the **Series** and the **DataFrame**. We will discuss how they work and why they are useful.

As with NumPy and Matplotlib, Pandas is conventionally imported as follows:

In [None]:
import pandas as pd

### 5.1.1 Series
`Series` is a fundamental data structure in Pandas: a one-dimensional labeled array capable of holding any data type. It is akin to a single column in a spreadsheet or a database table. Every element has a label, and the collection of labels is called the index. The following example shows how a Series differs from a NumPy array (`ndarray`).

In [2]:
import pandas as pd
import numpy as np

# Creating a NumPy array and a Series from the same list of values
array = np.array([1, 3, 5, np.nan, 6, 8])
series = pd.Series([1, 3, 5, np.nan, 6, 8])

# Displaying the Series
print(array)
print(series)

[ 1.  3.  5. nan  6.  8.]
0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
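To make the difference concrete, here is a small sketch (the labels `'w'` to `'z'` and the values are arbitrary) showing that arithmetic between two Series matches elements by label rather than by position, something a plain NumPy array cannot do:

```python
import numpy as np
import pandas as pd

# Two Series with string labels; the labels are not in the same order
a = pd.Series([1.0, 2.0, 3.0], index=['x', 'y', 'z'])
b = pd.Series([10.0, 20.0, 30.0], index=['z', 'y', 'w'])

# Addition aligns on labels: 'y' -> 2 + 20, 'z' -> 3 + 10.
# Labels present in only one of the Series ('w', 'x') produce NaN.
total = a + b
print(total)

# The underlying NumPy arrays are simply added position by position
print(a.to_numpy() + b.to_numpy())  # [11. 22. 33.]
```

If NaN is unwanted, `a.add(b, fill_value=0)` treats a label missing from one Series as 0 instead.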


As you can see, every value of the Series is labeled with an index (0 to 5 by default), whereas the NumPy array only shows the values. Elements of a Series can be accessed as follows:

In [None]:
# Accessing an element by its index label
print(series[2])  # Output: 5.0

# Slicing the Series (integer slices are positional and exclude the end: elements 1, 2 and 3)
print(series[1:4])
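With the default integer index, labels and positions coincide, so `series[2]` does not reveal which kind of indexing is used. The explicit `.loc` (label) and `.iloc` (position) accessors make the intent clear. A minimal sketch with arbitrary string labels:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

print(s.loc['b'])      # label-based access -> 20
print(s.iloc[1])       # position-based access -> 20

# Label slices include the end label, position slices exclude the end position
print(s.loc['a':'c'])  # a, b and c
print(s.iloc[0:2])     # a and b only
```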

`Series` supports both label-based and integer position-based indexing: you can access single elements by their index or select a range of elements with slicing. A Series is also similar to a standard Python dictionary, with the index labels playing the role of the keys. The following code creates a Series from a dictionary:

In [None]:
# Creating a dictionary
data_dict = {'airplanes': 10, 'rockets': 20, 'satellites': 30}

# Creating a Series from the dictionary
s = pd.Series(data_dict)

# Displaying the Series
print(s)
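The dictionary analogy also works in the other direction. A short sketch reusing the same `data_dict`: values are looked up by key, `in` tests the index labels just as it tests dictionary keys, and `Series.to_dict()` converts the Series back into a dictionary:

```python
import pandas as pd

data_dict = {'airplanes': 10, 'rockets': 20, 'satellites': 30}
s = pd.Series(data_dict)

print(s['rockets'])          # look up a value by its label -> 20
print('rockets' in s)        # `in` checks the index labels -> True
print(20 in s)               # values are not searched -> False

# Converting the Series back into a plain dictionary
back = s.to_dict()
print(back == data_dict)     # True
```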

### 5.1.2 DataFrames
A DataFrame is a two-dimensional labeled data structure whose columns can hold different data types. It resembles a spreadsheet or SQL table, consisting of rows and columns. A Pandas DataFrame can be thought of as a dictionary of Series: each column is a Series, and the dictionary keys become the column names. The following example shows how a DataFrame is created from a two-dimensional NumPy array.

In [None]:
data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])
df = pd.DataFrame(data, columns=['Column1', 'Column2', 'Column3'])

# Displaying the DataFrame
print(df)
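Since a DataFrame behaves like a dictionary of Series, it can also be built directly from a dictionary whose keys become the column names. In this sketch (column names and values are purely illustrative), each column holds a different data type, and selecting one column returns a `Series`:

```python
import pandas as pd

df_dict = pd.DataFrame({
    'name': ['airplanes', 'rockets', 'satellites'],  # text
    'count': [10, 20, 30],                            # integers
    'share': [0.25, 0.25, 0.5],                       # floats
})

print(df_dict)
print(df_dict.dtypes)            # one data type per column
print(type(df_dict['count']))    # each column is a pandas Series
```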

Columns and individual elements can be selected from a DataFrame as follows:

In [None]:
# Accessing a specific column: pass the column name as a string inside [].
print(df['Column2'])

# Accessing a single element: use the df.at accessor with [row label, column label].
# This example selects the row with label 1 (the second row) of 'Column3'.
print(df.at[1, 'Column3'])
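`df.at` is one of several ways to reach a single element. As a sketch that rebuilds the same 3-by-3 DataFrame, the label-based `.at` and `.loc` and the position-based `.iat` below all return the same value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 3],
                            [4, 5, 6],
                            [7, 8, 9]]),
                  columns=['Column1', 'Column2', 'Column3'])

print(df.at[1, 'Column3'])   # row label 1, column label 'Column3' -> 6
print(df.loc[1, 'Column3'])  # same labels via .loc -> 6
print(df.iat[1, 2])          # row position 1, column position 2 -> 6
```

`.at` and `.iat` only access single elements, while `.loc` and `.iloc` also accept lists and slices.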

A new column can be added by assigning values to a new column name:

In [None]:
# Adding a new column
df['Column4'] = pd.Series([4, 7, 10])

# Displaying the updated DataFrame
print(df)
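Assigning a Series aligns its values with the DataFrame by index label, not by position. The sketch below (the column names `Sum12`, `Partial` and `Constant` are made up for illustration) shows a column computed from other columns, a partially matching Series that leaves NaN in the unmatched row, and a scalar that is repeated for every row:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 3],
                            [4, 5, 6],
                            [7, 8, 9]]),
                  columns=['Column1', 'Column2', 'Column3'])

# A column computed element-wise from two existing columns
df['Sum12'] = df['Column1'] + df['Column2']

# A Series with labels 1 and 2 only: row 0 has no match and becomes NaN
df['Partial'] = pd.Series([100, 200], index=[1, 2])

# A scalar is broadcast to every row
df['Constant'] = 0

print(df)
```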

## 5.2 File Handling with Pandas
Pandas provides versatile tools for reading data from various file formats. This capability is crucial for importing external data into your Python environment.

### 5.2.1 Reading Data

Reading a CSV file is straightforward in Pandas, because the rows and columns of a CSV file map directly onto a DataFrame. In the following example, the coordinates of an airfoil are read from a CSV file and plotted with Matplotlib.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Reading a CSV file into a DataFrame
df_csv = pd.read_csv('assets/eppler-airfoil.csv', skiprows=2) # skiprows=2 skips the first two rows

# Displaying the DataFrame
print(df_csv)

# Plotting the airfoil
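# Name the figure `fig` so that `plt` keeps referring to the pyplot module
fig, ax = plt.subplots()
ax.plot(df_csv['x'][:31], df_csv['y'][:31], label='Top Surface')
ax.plot(-df_csv['x'][32:], -df_csv['y'][32:], label='Bottom Surface')
ax.set_aspect('equal', adjustable='box')
ax.set_title('EPPLER 67 AIRFOIL')
ax.legend()  # show the 'Top Surface' / 'Bottom Surface' labels
plt.show()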

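Note that Pandas took the column names `x` and `y` from the header row of the file, i.e. the first line remaining after `skiprows` removed the two lines above it. Pandas can read an Excel file in a very similar way using the `pd.read_excel()` function (reading `.xlsx` files requires the `openpyxl` package).
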
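How `skiprows` and the header row interact can be checked without the airfoil file. In this sketch the CSV content is a made-up string wrapped in `io.StringIO` (standing in for the real file), with two description lines followed by a header row:

```python
import io
import pandas as pd

# Made-up CSV text: two description lines, a header row, then data rows
csv_text = """EXAMPLE AIRFOIL
coordinates listed below
x,y
1.0,0.0
0.5,0.05
0.0,0.0
"""

# Skip the two description lines; the next line becomes the header
df_demo = pd.read_csv(io.StringIO(csv_text), skiprows=2)
print(df_demo.columns.tolist())  # ['x', 'y']

# Alternatively, skip the header row too and supply the names yourself
df_named = pd.read_csv(io.StringIO(csv_text), skiprows=3,
                       header=None, names=['x', 'y'])
print(df_named.shape)  # (3, 2)
```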
### 5.2.2 Writing Data
Pandas also allows you to export DataFrames to various file formats, using `to_*` methods that mirror the `read_*` functions:

In [None]:
# Writing a DataFrame to a CSV file (index=False leaves out the row index)
df.to_csv('assets/output.csv', index=False)

# Writing a DataFrame to an Excel file (.xlsx output requires the openpyxl package)
df.to_excel('assets/output.xlsx', sheet_name='Sheet1', index=False)
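The effect of `index=False` becomes visible when a file is read back. This sketch uses a small illustrative DataFrame and, since `to_csv()` returns the CSV text as a string when no path is given, avoids writing any file:

```python
import io
import pandas as pd

df_out = pd.DataFrame({'x': [0.0, 0.5, 1.0], 'y': [0.0, 0.05, 0.0]})

# With no path, to_csv returns the CSV text as a string
with_index = df_out.to_csv()               # index written as an unnamed first column
without_index = df_out.to_csv(index=False)
print(with_index)
print(without_index)

# Reading the first version back yields an extra 'Unnamed: 0' column ...
back = pd.read_csv(io.StringIO(with_index))
print(back.columns.tolist())  # ['Unnamed: 0', 'x', 'y']

# ... unless index_col=0 turns that column back into the index
restored = pd.read_csv(io.StringIO(with_index), index_col=0)
print(restored.columns.tolist())  # ['x', 'y']
```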

## 5.3 Data Manipulation
Pandas provides powerful mechanisms for manipulating and preparing data in DataFrames.

For example, one can extract columns from a DataFrame in the following way:

In [None]:
# Selecting a single column
single_column = df['Column1']

# Selecting multiple columns
multiple_columns = df[['Column1', 'Column2']]

multiple_columns
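The number of brackets determines the result type: a single column name returns a `Series`, while a list of names, even a list with one name, returns a `DataFrame`. A quick check on a rebuilt version of the example DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 3],
                            [4, 5, 6],
                            [7, 8, 9]]),
                  columns=['Column1', 'Column2', 'Column3'])

print(type(df['Column1']))               # Series: one-dimensional
print(type(df[['Column1']]))             # DataFrame with a single column
print(df[['Column1', 'Column2']].shape)  # (3, 2)
```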

Individual rows can be selected with the `loc` (by index label) and `iloc` (by integer position) indexers, which are used with square brackets:

In [None]:
# Selecting a row by its index label
row_by_index = df.loc[1]

# Selecting a row by its integer position
row_by_position = df.iloc[1]

row_by_index
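In the example above, `df.loc[1]` and `df.iloc[1]` return the same row only because the default index labels are 0, 1, 2. With a different index the two differ; a sketch with arbitrary labels:

```python
import pandas as pd

# Index labels deliberately out of order
df_idx = pd.DataFrame({'value': [10, 20, 30]}, index=[2, 0, 1])

print(df_idx.loc[1])   # row whose label is 1 -> value 30
print(df_idx.iloc[1])  # second row by position (label 0) -> value 20
```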

Rows can also be selected with a boolean condition, similar to boolean indexing in NumPy:

In [None]:
# Selecting rows based on a condition
condition_selection = df[df['Column1'] > 4]

condition_selection
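Several conditions can be combined with `&` (and), `|` (or) and `~` (not). Each condition needs its own parentheses, because these operators bind more tightly than comparisons. A condition can also be passed to `.loc` together with a column label. A sketch on the same 3-by-3 example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 3],
                            [4, 5, 6],
                            [7, 8, 9]]),
                  columns=['Column1', 'Column2', 'Column3'])

# Both conditions must hold: only the middle row (Column1 = 4, Column2 = 5)
both = df[(df['Column1'] > 1) & (df['Column2'] < 8)]

# At least one condition must hold: first and last rows
either = df[(df['Column1'] < 2) | (df['Column3'] > 8)]

# Negation: every row where Column1 is NOT greater than 4
excluded = df[~(df['Column1'] > 4)]

# Rows selected by a condition, column selected by label
picked = df.loc[df['Column1'] > 1, 'Column3']

print(both)
print(either)
print(excluded)
print(picked)
```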