# Practical Session 7 Bonus: `pandas`

### What is pandas?

- `pandas` is a Python library for working with structured/tabular data, such as Excel sheets, CSV files, or SQL tables.

- It is built on top of `numpy`, and integrates well with other Python tools.

- The core data structures in pandas are:

    - Series – a 1D labeled array

    - DataFrame – a 2D table with labeled rows and columns

- It’s ideal for loading, cleaning, analyzing, filtering, and aggregating data.
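To make the two core structures concrete, here is a minimal sketch. The variable names `s` and `demo` are chosen only for illustration; the values mirror the small time/position table used later in this tutorial.

```python
import pandas as pd

# A Series: one column of values, each with a label (the index).
s = pd.Series([0.0, 4.9, 9.8], index=[0, 1, 2], name="Position")

# A DataFrame: a table whose columns are Series sharing one index.
demo = pd.DataFrame({"Time": [0, 1, 2], "Position": [0.0, 4.9, 9.8]})

print(type(demo["Position"]))  # selecting one column gives a Series
print(demo.shape)              # (3, 2): three rows, two columns
```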

In [2]:
import pandas as pd

### `pandas` tutorial

Tables can be loaded from files, for example a CSV file:
```python
df = pd.read_csv("data.csv")
```

Alternatively, DataFrames can be created from scratch, e.g. from a dictionary

In [3]:
data = {'Time': [0, 1, 2], 'Position': [0.0, 4.9, 9.8]}

df = pd.DataFrame(data)

Or from `numpy` arrays

In [4]:
import numpy as np

arr = np.array([[0, 0.0],
                [1, 4.9],
                [2, 9.8]])

df = pd.DataFrame(arr, columns=['Time', 'Position'])

In [5]:
df # Display the DataFrame

   Time  Position
0   0.0       0.0
1   1.0       4.9
2   2.0       9.8


In a `pandas` DataFrame, the index is the label for each row. By default, pandas assigns a simple numerical index starting from 0, shown on the left side of the table. This is similar to row numbers in a spreadsheet.

You can:

- Leave the default index (0, 1, 2, …)

- Set a custom index (e.g. time, IDs, dates)

- Use the index to select, filter, or merge rows

You can access the index, or assign a new one, using
```python
df.index
```
for example, to loop over the row labels:

In [6]:
for i in df.index:
    print(i)

0
1
2
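Setting a custom index could look like the sketch below. The labels `"start"`, `"middle"` and `"end"` are made up for illustration, and `labelled` and `by_time` are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame({"Time": [0, 1, 2], "Position": [0.0, 4.9, 9.8]})

# Option 1: assign labels directly (done on a copy so df keeps its default index).
labelled = df.copy()
labelled.index = ["start", "middle", "end"]

# Option 2: promote an existing column to the index.
# set_index returns a new DataFrame and leaves df unchanged.
by_time = df.set_index("Time")

print(list(labelled.index))   # ['start', 'middle', 'end']
print(list(by_time.columns))  # ['Position']
```

After `set_index`, `Time` is no longer a regular column of `by_time`; it has become the row labels.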


`pandas` has various functions for exploring the data
```python
df.head()        # First 5 rows
df.tail(3)       # Last 3 rows
df.shape         # (rows, columns)
df.columns       # Column names
df.dtypes        # Data types
df.info()        # Summary of structure
df.describe()    # Summary statistics
```

Columns can be accessed through

In [7]:
df['Time']          # Single column (returns Series)

0    0.0
1    1.0
2    2.0
Name: Time, dtype: float64

Rows can be accessed by integer position through `iloc`, or by index label through `loc` (useful if you have a custom index)

In [8]:
df.iloc[0]

Time        0.0
Position    0.0
Name: 0, dtype: float64
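The difference between `iloc` and `loc` only shows up once the index is not the default 0, 1, 2. A small sketch, using hypothetical index labels 10, 20 and 30:

```python
import pandas as pd

df = pd.DataFrame(
    {"Position": [0.0, 4.9, 9.8]},
    index=[10, 20, 30],  # hypothetical custom labels
)

first_row = df.iloc[0]     # first row by position, whatever its label is
row_by_label = df.loc[20]  # the row whose index label is 20

print(first_row["Position"])     # 0.0
print(row_by_label["Position"])  # 4.9
```

Here `df.iloc[20]` would fail, because there is no 21st row, while `df.loc[20]` works because 20 is a label.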

Filtering can be done with boolean conditions, like so

In [9]:
df[df['Position'] > 5]              # Rows where Position > 5

   Time  Position
2   2.0       9.8


In [10]:
df[(df['Position'] > 5) & (df['Time'] < 2)]  # Multiple conditions

Unnamed: 0,Time,Position


Note that these create new DataFrames and don't alter `df` itself. To modify `df`, the result has to be assigned back, for example to a column

In [11]:
df['Position'] = df['Position'] * 100
df

   Time  Position
0   0.0       0.0
1   1.0     490.0
2   2.0     980.0
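The sketch below illustrates the same point on a fresh copy of the small table: filtering returns a new DataFrame, whereas assigning through `.loc` with a boolean mask changes only the matching rows of `df` itself. The cap value of 5.0 is arbitrary and used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Time": [0, 1, 2], "Position": [0.0, 4.9, 9.8]})

high = df[df["Position"] > 5]  # a new DataFrame; df still has 3 rows
print(len(df), len(high))      # 3 1

# Assign through .loc with the same mask to modify df in place.
df.loc[df["Position"] > 5, "Position"] = 5.0
print(df["Position"].tolist())  # [0.0, 4.9, 5.0]
```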


Dataframes can be saved with
```python
df.to_csv('new_data.csv', index=False)
```
The `index=False` argument means the indexes aren't saved in the file.
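To see what `index=False` changes without writing any files, `to_csv` can be called without a path, in which case it returns the CSV text as a string. This is only a sketch; the variable names are illustrative.

```python
import io
import pandas as pd

df = pd.DataFrame({"Time": [0, 1, 2], "Position": [0.0, 4.9, 9.8]})

with_index = df.to_csv()                # no path given: returns a string
without_index = df.to_csv(index=False)

print(with_index.splitlines()[0])       # ,Time,Position
print(without_index.splitlines()[0])    # Time,Position

# Reading the index-free text back recovers the original columns.
restored = pd.read_csv(io.StringIO(without_index))
print(list(restored.columns))           # ['Time', 'Position']

# Reading the version with the index back adds an extra, unnamed column.
extra = pd.read_csv(io.StringIO(with_index))
print(list(extra.columns))              # ['Unnamed: 0', 'Time', 'Position']
```

If the index carries real information (for example timestamps), it is usually better to keep it and pass `index_col=0` to `pd.read_csv` when loading.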

### Task: Projectile Motion with `pandas`

We are going to work with `projectile_analysis.csv`, which we created in the Practical Session 7 worksheet.

First, load the data into a `pandas` DataFrame, giving appropriate column names.

In [12]:
# Load the CSV file with headers
df = pd.read_csv('projectile_analysis.csv')

# Check the first few rows
print(df.head())

   time  position  velocity  acceleration
0   0.0  0.000000   0.99019       -0.0981
1   0.1  0.099019   0.98038       -0.0981
2   0.2  0.197057   0.97057       -0.0981
3   0.3  0.294114   0.96076       -0.0981
4   0.4  0.390190   0.95095       -0.0981


Use `.info()` and `.describe()` to understand the structure and summary of the dataset.

In [13]:
# Display concise summary of the DataFrame
print(df.info())

# Display descriptive statistics of the DataFrame
print(df.describe())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 998 entries, 0 to 997
Data columns (total 4 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   time          998 non-null    float64
 1   position      998 non-null    float64
 2   velocity      998 non-null    float64
 3   acceleration  998 non-null    float64
dtypes: float64(4)
memory usage: 31.3 KB
None
             time    position      velocity  acceleration
count  998.000000  998.000000  9.980000e+02    998.000000
mean    49.850000    1.359681  2.669875e-18     -0.009922
std     28.824209    1.472504  3.648042e-01      0.775774
min      0.000000    0.000000 -9.816200e-01     -0.098100
25%     24.925000    0.154425 -1.986523e-01     -0.098100
50%     49.850000    0.777263  0.000000e+00     -0.098100
75%     74.775000    2.088826  1.998358e-01     -0.098100
max     99.700000    5.046869  9.901900e-01     16.489040


Calculate and print the average acceleration during freefall (i.e., when the velocity is negative and position is above 0.5 m).

In [14]:
# Filter rows where velocity is negative and position is above 0.5
freefall = df[(df['velocity'] < 0) & (df['position'] > 0.5)]

# Calculate the average acceleration during freefall
avg_acceleration = freefall['acceleration'].mean()

print(f"Average acceleration during freefall: {avg_acceleration:.2f} m/s²")


Average acceleration during freefall: -0.10 m/s²
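The combined filter can also be written with `DataFrame.query`, which some people find easier to read. The sketch below uses a handful of made-up rows with the same column names as the CSV, not the real data.

```python
import pandas as pd

# Made-up rows for illustration only; column names match the CSV.
df = pd.DataFrame({
    "position":     [0.2, 0.8, 0.9, 0.6, 0.3],
    "velocity":     [0.5, 0.1, -0.1, -0.4, -0.6],
    "acceleration": [-0.0981] * 5,
})

# Boolean-mask version: each condition needs its own parentheses,
# because & binds more tightly than < and >.
mask = (df["velocity"] < 0) & (df["position"] > 0.5)
freefall = df[mask]

# The same filter expressed as a query string.
freefall_q = df.query("velocity < 0 and position > 0.5")

print(len(freefall))                # 2
print(freefall.equals(freefall_q))  # True
```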


Add a new column to the DataFrame with the ball’s kinetic energy:
$$
KE = \frac{1}{2} m v^2
$$
Assume the mass is 0.2 kg.

In [17]:
m = 0.2
df['kinetic_energy'] = 0.5 * m * df['velocity']**2
df

Unnamed: 0,time,position,velocity,acceleration,kinetic_energy
0,0.0,0.000000,0.99019,-0.0981,0.098048
1,0.1,0.099019,0.98038,-0.0981,0.096114
2,0.2,0.197057,0.97057,-0.0981,0.094201
3,0.3,0.294114,0.96076,-0.0981,0.092306
4,0.4,0.390190,0.95095,-0.0981,0.090431
...,...,...,...,...,...
993,99.3,0.000000,0.00000,0.0000,0.000000
994,99.4,0.000000,0.00000,0.0000,0.000000
995,99.5,0.000000,0.00000,0.0000,0.000000
996,99.6,0.000000,0.00000,0.0000,0.000000
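As a quick sanity check on the first row (assuming SI units, so the velocity is in m/s), with $m = 0.2$ and $v = 0.99019$:
$$
KE = \frac{1}{2} \times 0.2 \times 0.99019^2 \approx 0.098048 \ \text{J}
$$
This agrees with the `kinetic_energy` value shown for row 0.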


Save this new dataframe as `projection_final.csv`

In [18]:
df.to_csv('projection_final.csv', index=False)