# DataFrame Operations with pandas

## Pandas: A Brief Overview

[Pandas](https://pandas.pydata.org/) is a widely used Python library for data manipulation and analysis. Its core data structures, `Series` and `DataFrame`, are designed for labeled, tabular data, which makes the library well suited to cleaning, exploring, and analyzing datasets.

## Relevance in Astrophysics

In astrophysics, where data comes in many formats and structures, Pandas is a valuable tool. A DataFrame lets astronomers and researchers organize, filter, and analyze data efficiently, whether they are working with properties of celestial bodies, observational data, or simulation output.

## Exploring DataFrame Operations

Here we'll explore basic DataFrame operations using Pandas with a simple dataset. The dataset contains information about celestial bodies, and we'll sort it by distance, filter for bodies with known temperatures, calculate summary statistics, and add a new column of size categories.


#### Importing Pandas

In [1]:
```python
import pandas as pd
```


#### Creating the DataFrame

In [2]:
```python
# Creating a DataFrame with data
# None marks an unknown value; pandas stores it as NaN in a numeric column
data = {
    'Celestial Body': ['Sun', 'Proxima Centauri', 'Andromeda Galaxy', 'Sirius', 'Betelgeuse'],
    'Distance (light-years)': [0, 4.24, 2.537e6, 8.6, 642.5],
    'Average Temperature (K)': [5778, 3040, None, 9940, 3600],
    'Diameter (solar radii)': [1, 0.141, None, 1.711, 887]
}

celestial_bodies_df = pd.DataFrame(data)
```


#### Viewing the DataFrame


In [3]:
```python
# Viewing the DataFrame
celestial_bodies_df
```


| | Celestial Body | Distance (light-years) | Average Temperature (K) | Diameter (solar radii) |
|---|---|---|---|---|
| 0 | Sun | 0.0 | 5778.0 | 1.0 |
| 1 | Proxima Centauri | 4.24 | 3040.0 | 0.141 |
| 2 | Andromeda Galaxy | 2537000.0 | NaN | NaN |
| 3 | Sirius | 8.6 | 9940.0 | 1.711 |
| 4 | Betelgeuse | 642.5 | 3600.0 | 887.0 |
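Beyond displaying the whole table, it often helps to check its shape, column types, and missing values before doing any analysis. The following is a minimal sketch of that kind of inspection; it rebuilds the same illustrative data so it can run on its own, and the variable names `shape`, `missing_counts`, and `dtypes` are chosen here for illustration only.

```python
import pandas as pd

# Same illustrative data as above, rebuilt so this snippet runs on its own
celestial_bodies_df = pd.DataFrame({
    'Celestial Body': ['Sun', 'Proxima Centauri', 'Andromeda Galaxy', 'Sirius', 'Betelgeuse'],
    'Distance (light-years)': [0, 4.24, 2.537e6, 8.6, 642.5],
    'Average Temperature (K)': [5778, 3040, None, 9940, 3600],
    'Diameter (solar radii)': [1, 0.141, None, 1.711, 887],
})

# (number of rows, number of columns)
shape = celestial_bodies_df.shape

# Number of missing (NaN) values in each column
missing_counts = celestial_bodies_df.isna().sum()

# Column dtypes; a numeric column that contains NaN is stored as float64
dtypes = celestial_bodies_df.dtypes

print(shape)
print(missing_counts)
print(dtypes)
```

This explains why the integer temperatures appear as `5778.0` in the output: the missing entry forces the whole column to a floating-point type.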


#### Sorting by Distance

In [4]:
```python
# Sorting by distance
sorted_df = celestial_bodies_df.sort_values(by='Distance (light-years)')
sorted_df
```


| | Celestial Body | Distance (light-years) | Average Temperature (K) | Diameter (solar radii) |
|---|---|---|---|---|
| 0 | Sun | 0.0 | 5778.0 | 1.0 |
| 1 | Proxima Centauri | 4.24 | 3040.0 | 0.141 |
| 3 | Sirius | 8.6 | 9940.0 | 1.711 |
| 4 | Betelgeuse | 642.5 | 3600.0 | 887.0 |
| 2 | Andromeda Galaxy | 2537000.0 | NaN | NaN |
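Note that the sorted output keeps the original index labels (0, 1, 3, 4, 2). As a small variation, the sketch below sorts by temperature in descending order, places missing values last, and renumbers the index. The name `sorted_by_temp_df` is hypothetical, and the data is rebuilt so the snippet is self-contained.

```python
import pandas as pd

# Same illustrative data as above, rebuilt so this snippet runs on its own
celestial_bodies_df = pd.DataFrame({
    'Celestial Body': ['Sun', 'Proxima Centauri', 'Andromeda Galaxy', 'Sirius', 'Betelgeuse'],
    'Distance (light-years)': [0, 4.24, 2.537e6, 8.6, 642.5],
    'Average Temperature (K)': [5778, 3040, None, 9940, 3600],
    'Diameter (solar radii)': [1, 0.141, None, 1.711, 887],
})

# Hottest first; rows with a missing temperature go to the end
sorted_by_temp_df = celestial_bodies_df.sort_values(
    by='Average Temperature (K)',
    ascending=False,
    na_position='last',
)

# Replace the old index labels with a fresh 0..n-1 index
sorted_by_temp_df = sorted_by_temp_df.reset_index(drop=True)

print(sorted_by_temp_df[['Celestial Body', 'Average Temperature (K)']])
```

`sort_values` returns a new DataFrame, so `celestial_bodies_df` itself is left in its original order.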


#### Data Filtering with pandas

In [5]:
```python
# Filtering bodies with known temperatures
known_temperatures_df = celestial_bodies_df.dropna(subset=['Average Temperature (K)'])
known_temperatures_df
```


| | Celestial Body | Distance (light-years) | Average Temperature (K) | Diameter (solar radii) |
|---|---|---|---|---|
| 0 | Sun | 0.0 | 5778.0 | 1.0 |
| 1 | Proxima Centauri | 4.24 | 3040.0 | 0.141 |
| 3 | Sirius | 8.6 | 9940.0 | 1.711 |
| 4 | Betelgeuse | 642.5 | 3600.0 | 887.0 |
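`dropna` handles the "value is known" case; filtering on the values themselves is usually done with a boolean mask. The sketch below shows one way this might look. The 5000 K and 10 light-year cutoffs, and the names `hot_df` and `nearby_known_df`, are arbitrary choices made for illustration.

```python
import pandas as pd

# Same illustrative data as above, rebuilt so this snippet runs on its own
celestial_bodies_df = pd.DataFrame({
    'Celestial Body': ['Sun', 'Proxima Centauri', 'Andromeda Galaxy', 'Sirius', 'Betelgeuse'],
    'Distance (light-years)': [0, 4.24, 2.537e6, 8.6, 642.5],
    'Average Temperature (K)': [5778, 3040, None, 9940, 3600],
    'Diameter (solar radii)': [1, 0.141, None, 1.711, 887],
})

# Boolean mask: True where the temperature exceeds 5000 K.
# Comparisons with NaN are False, so rows with no temperature are excluded.
hot_mask = celestial_bodies_df['Average Temperature (K)'] > 5000
hot_df = celestial_bodies_df[hot_mask]

# Masks combine with & (and) and | (or); each condition needs its own parentheses
nearby_known_df = celestial_bodies_df[
    (celestial_bodies_df['Distance (light-years)'] < 10)
    & (celestial_bodies_df['Average Temperature (K)'].notna())
]

print(hot_df['Celestial Body'].tolist())
print(nearby_known_df['Celestial Body'].tolist())
```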


#### Data Statistics with pandas

In [6]:
```python
# Calculating statistics
statistics = celestial_bodies_df.describe()
statistics
```


| | Distance (light-years) | Average Temperature (K) | Diameter (solar radii) |
|---|---|---|---|
| count | 5.0 | 4.0 | 4.0 |
| mean | 507531.1 | 5589.5 | 222.463 |
| std | 1134508.0 | 3131.583359 | 443.025132 |
| min | 0.0 | 3040.0 | 0.141 |
| 25% | 4.24 | 3460.0 | 0.78525 |
| 50% | 8.6 | 4689.0 | 1.3555 |
| 75% | 642.5 | 6818.5 | 223.03325 |
| max | 2537000.0 | 9940.0 | 887.0 |
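The summary above shows how a single extreme value can dominate the mean: the mean distance is about 507,531 light-years, while the median (the 50% row) is only 8.6, because the Andromeda Galaxy is far more distant than the stars. Individual statistics can also be computed directly on a column, as in this minimal sketch. The variable names are hypothetical, and the data is rebuilt so the snippet is self-contained.

```python
import pandas as pd

# Same illustrative data as above, rebuilt so this snippet runs on its own
celestial_bodies_df = pd.DataFrame({
    'Celestial Body': ['Sun', 'Proxima Centauri', 'Andromeda Galaxy', 'Sirius', 'Betelgeuse'],
    'Distance (light-years)': [0, 4.24, 2.537e6, 8.6, 642.5],
    'Average Temperature (K)': [5778, 3040, None, 9940, 3600],
    'Diameter (solar radii)': [1, 0.141, None, 1.711, 887],
})

distances = celestial_bodies_df['Distance (light-years)']

# The mean is pulled up by the one very distant object; the median is not
mean_distance = distances.mean()
median_distance = distances.median()

# Several statistics at once for one column; NaN values are skipped by default
temperature_summary = celestial_bodies_df['Average Temperature (K)'].agg(['min', 'max', 'mean'])

print(mean_distance, median_distance)
print(temperature_summary)
```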


#### Adding a New Column 'Size Category'

In [7]:
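```python
# Adding a new column 'Size Category'; a missing diameter is stored as NaN, and
# NaN < size_threshold is False, so bodies of unknown size are labeled 'Large'
size_threshold = 1.5
celestial_bodies_df['Size Category'] = [
    'Small' if diameter < size_threshold else 'Large'
    for diameter in celestial_bodies_df['Diameter (solar radii)']
]
celestial_bodies_df
```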


| | Celestial Body | Distance (light-years) | Average Temperature (K) | Diameter (solar radii) | Size Category |
|---|---|---|---|---|---|
| 0 | Sun | 0.0 | 5778.0 | 1.0 | Small |
| 1 | Proxima Centauri | 4.24 | 3040.0 | 0.141 | Small |
| 2 | Andromeda Galaxy | 2537000.0 | NaN | NaN | Large |
| 3 | Sirius | 8.6 | 9940.0 | 1.711 | Large |
| 4 | Betelgeuse | 642.5 | 3600.0 | 887.0 | Large |
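In the table above, the Andromeda Galaxy is labeled 'Large' even though its diameter is missing: the value is stored as `NaN`, and any comparison with `NaN` is false, so it falls through to the `else` branch. If missing sizes should be kept distinct, one possible approach is to test for them explicitly. The sketch below assumes an extra 'Unknown' label, a helper named `categorize_size`, and a separate `categorized_df`, none of which are part of the original example.

```python
import pandas as pd

# Same illustrative data as above, rebuilt so this snippet runs on its own
categorized_df = pd.DataFrame({
    'Celestial Body': ['Sun', 'Proxima Centauri', 'Andromeda Galaxy', 'Sirius', 'Betelgeuse'],
    'Distance (light-years)': [0, 4.24, 2.537e6, 8.6, 642.5],
    'Average Temperature (K)': [5778, 3040, None, 9940, 3600],
    'Diameter (solar radii)': [1, 0.141, None, 1.711, 887],
})

size_threshold = 1.5


def categorize_size(diameter, threshold=size_threshold):
    """Hypothetical helper: label a diameter, keeping missing values separate."""
    if pd.isna(diameter):
        return 'Unknown'  # assumed label for bodies with no measured diameter
    return 'Small' if diameter < threshold else 'Large'


# Series.map applies the helper to every value, including NaN
categorized_df['Size Category'] = categorized_df['Diameter (solar radii)'].map(categorize_size)

print(categorized_df[['Celestial Body', 'Size Category']])
```

With this variant the Andromeda Galaxy is reported as 'Unknown' rather than being grouped with Sirius and Betelgeuse.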
