# Practical Session 7 Bonus: `pandas`

### What is pandas?

- `pandas` is a Python library for working with structured/tabular data, such as Excel sheets, CSVs, or SQL tables.

- It is built on top of `numpy`, and integrates well with other Python tools.

- The core data structures in pandas are:

    - Series – a 1D labeled array

    - DataFrame – a 2D table with labeled rows and columns

- It’s ideal for loading, cleaning, analyzing, filtering, and aggregating data.
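
As a minimal sketch of the two core structures listed above (the names `heights` and `table` and their values are made up for illustration):

```python
import pandas as pd

# A Series is a 1D labeled array: values plus an index of labels.
heights = pd.Series([1.2, 3.4, 5.6], index=['a', 'b', 'c'], name='Height')
print(heights['b'])          # look up a value by its label

# A DataFrame is a 2D table with labeled rows and columns.
table = pd.DataFrame(
    {'Height': [1.2, 3.4, 5.6], 'Width': [10, 20, 30]},
    index=['a', 'b', 'c'],
)

# Selecting a single column of a DataFrame gives back a Series.
print(type(table['Height']))
```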

In [None]:
import pandas as pd

### `pandas` tutorial

Tables can be loaded from files, for example:
```python
df = pd.read_csv("data.csv")
```
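
If a file has no header row, column names can be supplied when loading. The sketch below assumes a small headerless CSV; `io.StringIO` stands in for a file on disk so the example runs on its own, and the name `df_example` and the data are hypothetical.

```python
import io
import pandas as pd

# Hypothetical file contents with no header row.
raw = io.StringIO("0,0.0\n1,4.9\n2,9.8\n")

# header=None says the first line is data, not column names;
# names supplies the column labels to use instead.
df_example = pd.read_csv(raw, header=None, names=['Time', 'Position'])
print(df_example)
```

With a real file, the first argument would be the file path instead of the `StringIO` object.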

Alternatively, DataFrames can be created from scratch, e.g. from a dictionary:

In [None]:
data = {'Time': [0, 1, 2], 'Position': [0.0, 4.9, 9.8]}

df = pd.DataFrame(data)

Or from `numpy` arrays:

In [None]:
import numpy as np

arr = np.array([[0, 0.0],
                [1, 4.9],
                [2, 9.8]])

df = pd.DataFrame(arr, columns=['Time', 'Position'])

In [None]:
df # Display the DataFrame

In a `pandas` DataFrame, the index is the label for each row. By default, pandas assigns a simple numerical index starting from 0, shown on the left side of the table. This is similar to row numbers in a spreadsheet.

You can:

- Leave the default index (0, 1, 2, …)

- Set a custom index (e.g. time, IDs, dates)

- Use the index to select, filter, or merge rows

You can read the index, or assign a new one, through
```python
df.index
```
for example, to loop over the row labels:

In [None]:
for i in df.index:
    print(i)
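
Setting a custom index can be sketched in two ways: assigning labels directly, or promoting an existing column with `set_index`. The names `df_demo`, `relabelled`, and `by_time` and the labels are chosen only for illustration.

```python
import pandas as pd

df_demo = pd.DataFrame({'Time': [0, 1, 2], 'Position': [0.0, 4.9, 9.8]})

# Option 1: assign new labels directly (one label per row).
relabelled = df_demo.copy()
relabelled.index = ['start', 'middle', 'end']

# Option 2: promote an existing column to the index.
# set_index returns a new DataFrame and leaves df_demo unchanged.
by_time = df_demo.set_index('Time')

print(relabelled.index)
print(by_time.index)
```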

`pandas` has various methods and attributes for exploring the data:
```python
df.head()        # First 5 rows
df.tail(3)       # Last 3 rows
df.shape         # (rows, columns)
df.columns       # Column names
df.dtypes        # Data types
df.info()        # Summary of structure
df.describe()    # Summary statistics
```

Columns can be accessed by name:

In [None]:
df['Time']          # Single column (returns Series)

Rows can be accessed by integer position with `iloc`. To select rows by index label instead (for example, when you have set a custom index), use `loc`.

In [None]:
df.iloc[0]
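
The difference between `iloc` and `loc` is easiest to see when the index labels are not 0, 1, 2. In this sketch the labels 10, 20, 30 and the name `df_demo` are arbitrary choices for illustration.

```python
import pandas as pd

df_demo = pd.DataFrame(
    {'Position': [0.0, 4.9, 9.8]},
    index=[10, 20, 30],   # custom labels, chosen for illustration
)

first_by_position = df_demo.iloc[0]   # first row, whatever its label is
row_by_label = df_demo.loc[20]        # the row whose index label is 20

# Slices also differ: iloc excludes the end position,
# while loc includes the end label.
n_iloc = len(df_demo.iloc[0:2])
n_loc = len(df_demo.loc[10:30])
print(n_iloc, n_loc)
```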

Rows can be filtered with boolean conditions:

In [None]:
df[df['Position'] > 5]              # Rows where position > 5

In [None]:
df[(df['Position'] > 5) & (df['Time'] < 2)]  # Multiple conditions
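
A condition such as `df['Position'] > 5` produces a boolean Series with one entry per row, and indexing with it keeps the rows marked `True`. Conditions are combined with `&` (and), `|` (or), and `~` (not); Python's `and`/`or` keywords do not work element-wise on a Series, and each comparison needs its own parentheses because `&` and `|` bind more tightly than `>` or `==`. A small sketch, with `df_demo` and the other names made up for illustration:

```python
import pandas as pd

df_demo = pd.DataFrame({'Time': [0, 1, 2], 'Position': [0.0, 4.9, 9.8]})

# A comparison gives a boolean Series: one True/False per row.
mask = df_demo['Position'] > 5
print(mask.tolist())

# OR: rows where either condition holds.
either = df_demo[(df_demo['Position'] > 5) | (df_demo['Time'] == 0)]

# NOT: rows where the mask is False.
not_above = df_demo[~mask]
```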

Note that these expressions create new DataFrames and do not alter `df` itself. To modify `df`, assign the result back to it:

In [None]:
df['Position'] = df['Position'] * 100
df
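
The same assignment syntax also adds a column when the name does not exist yet. The sketch below uses a separate `df_demo` so it does not touch `df`; the column name `Position_cm` is hypothetical.

```python
import pandas as pd

df_demo = pd.DataFrame({'Time': [0, 1, 2], 'Position': [0.0, 4.9, 9.8]})

# Filtering returns a new DataFrame; df_demo still has all 3 rows.
filtered = df_demo[df_demo['Position'] > 5]
print(len(filtered), len(df_demo))

# Assigning to a column name that does not exist yet adds a new column.
df_demo['Position_cm'] = df_demo['Position'] * 100
print(df_demo.columns.tolist())
```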

DataFrames can be saved with
```python
df.to_csv('new_data.csv', index=False)
```
The `index=False` argument means the indexes aren't saved in the file.
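
To see what `index=False` changes, the following sketch writes a small DataFrame both ways and reads each file back. It uses a temporary directory so nothing is left on disk; the file name `demo.csv` and the variable names are hypothetical.

```python
import os
import tempfile
import pandas as pd

df_demo = pd.DataFrame({'Time': [0, 1, 2], 'Position': [0.0, 4.9, 9.8]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.csv')   # hypothetical file name

    # Without the index: the file holds only the two data columns.
    df_demo.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

    # With the index (the default): the row labels are written as an
    # extra, unnamed first column, which read_csv loads as a column.
    df_demo.to_csv(path)
    with_index = pd.read_csv(path)

print(reloaded.columns.tolist())
print(with_index.columns.tolist())
```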

### Task: Projectile Motion with `pandas`

We are going to work with `projectile_analysis.csv`, the file we created in the Practical Session 7 worksheet.

First, load the data into a `pandas` DataFrame, giving the columns appropriate names.

Use `.info()` and `.describe()` to understand the structure and summary of the dataset.

Calculate and print the average acceleration during freefall (i.e., when the velocity is negative and position is above 0.5 m).

Add a new column to the DataFrame with the ball’s kinetic energy:
$$
KE = \frac{1}{2} m v^2
$$
Assume the mass is 0.2 kg.

Save this new DataFrame as `projection_final.csv`.