# More on Pandas
Pandas provides many functions and methods that make data wrangling much easier. Let's import the example files from `Pandas.ipynb`.

In [None]:
import numpy as np
import pandas as pd
# Create the dataframe from the 'Global Carbon Budget' sheet
dfcarbon = pd.read_excel('data/GlobalCarbonBudget2022.xlsx', sheet_name='Global Carbon Budget', header=0, skiprows=20)
dfcarbon

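Let's rebuild the `newdfcarbon` dataframe from `dfcarbon` as we did the other day. The code below renames the columns with shorter names (the first two words of each original name), adds a `total carbon` column, and reorders the columns: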

In [None]:
# Rename columns: keep only the first two words of each original name
name_dict = {}
for name in dfcarbon.columns:
    namelist = name.split(' ')
    name_dict[name] = ' '.join(namelist[0:2])
newdfcarbon = dfcarbon.rename(columns=name_dict)
# Create new column
newdfcarbon['total carbon'] = newdfcarbon['fossil emissions'] + newdfcarbon['land-use change']
# Reorder columns: move 'total carbon' (last column, position 8) to position 3
newdfcarbon = newdfcarbon.iloc[:, [0, 1, 2, 8, 3, 4, 5, 6, 7]]

In [None]:
newdfcarbon

## Filtering
Dataframes can be sliced (indexed) using logical statements, which is useful for filtering data based on the values within each record. To select rows from any dataframe, use a boolean expression as the index. Combine several conditions with `&` (and) or `|` (or), and wrap each condition in parentheses.

In [None]:
# boolean indexing
newdfcarbon[(newdfcarbon['land-use change']<1.32) & (newdfcarbon['total carbon']<6.1)]

In [None]:
newdfcarbon['total carbon'].tolist()

In [None]:
# Select rows and a column in one step with .loc instead of chained indexing
newdfcarbon.loc[newdfcarbon.Year > 2010, 'total carbon']

In [None]:
# Use the Series max method, which skips missing values
maxcarbon = newdfcarbon['total carbon'].max()
newdfcarbon[newdfcarbon['total carbon'] == maxcarbon]
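To make the mechanics of boolean indexing explicit, here is a minimal sketch on a small made-up dataframe. The name `toy` and its values are hypothetical and are not taken from the carbon budget file; only the column names mirror the ones used above.

```python
import pandas as pd

# Hypothetical toy data (not the real carbon budget values)
toy = pd.DataFrame({
    "Year": [2008, 2009, 2010, 2011, 2012],
    "total carbon": [9.5, 9.4, 10.1, 10.4, 10.6],
})

# A comparison on a column produces a boolean Series (the mask)
mask = toy["Year"] > 2010

# Indexing with the mask keeps only the rows where the mask is True
recent = toy[mask]

# Combine conditions with & and wrap each condition in parentheses
both = toy[(toy["Year"] > 2008) & (toy["total carbon"] < 10.2)]

# .loc selects the masked rows and a single column in one step
recent_totals = toy.loc[mask, "total carbon"]

# Summing a mask counts how many rows satisfy the condition
n_recent = int(mask.sum())

print(recent)
print(both)
print(recent_totals.tolist(), n_recent)
```

The parentheses matter because `&` binds more tightly than comparison operators such as `>` and `<`.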

You can also use the `query` method to filter data based on values. Query strings take a little more care to write: for example, column names that contain spaces or hyphens must be wrapped in backticks. Please refer to [Dataframe query](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.query.html) for formatting information.

In [None]:
newdf = newdfcarbon.query('Year>2000')
newdf1 = newdfcarbon.query('`total carbon`>10 and `land-use change`>1.3')

In [None]:
newdf1
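The following sketch illustrates the query syntax on a small made-up dataframe. The name `toy`, its values, and the variable `threshold` are hypothetical. It shows plain column names, backtick-quoted names, and the `@` prefix for referring to a Python variable inside the query string.

```python
import pandas as pd

# Hypothetical toy data (not the real carbon budget values)
toy = pd.DataFrame({
    "Year": [1999, 2001, 2005, 2010],
    "total carbon": [8.0, 8.3, 9.6, 10.4],
    "land-use change": [1.4, 1.2, 1.3, 1.5],
})

# Simple column names can be used directly in the query string
after_2000 = toy.query("Year > 2000")

# Names with spaces or hyphens need backticks; 'and' / 'or' combine conditions
high = toy.query("`total carbon` > 9 and `land-use change` > 1.35")

# Prefix a Python variable with @ to use its value inside the query
threshold = 9.0
above = toy.query("`total carbon` > @threshold")

print(after_2000)
print(high)
print(above)
```

Each query returns a new dataframe and leaves `toy` unchanged, just like boolean indexing.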

## Filter Method
`Pandas` also contains an explicit `filter` method that picks out rows or columns of particular interest by their labels. Despite its name, it does not filter on the data values themselves.

In [None]:
newdfcarbon

In [None]:
# filter returns a new dataframe and has no inplace option, so reassign the result
newdfcarbon = newdfcarbon.filter(items=["Year", "total carbon", "fossil emissions", "land-use change"])
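Besides `items`, the `filter` method accepts `like` (substring match) and `regex` (regular expression match) for choosing labels, and `axis` for choosing between rows and columns. The sketch below uses a small made-up dataframe; the name `toy` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical toy data (not the real carbon budget values)
toy = pd.DataFrame({
    "Year": [2020, 2021],
    "fossil emissions": [9.5, 9.9],
    "land-use change": [1.1, 1.2],
    "ocean sink": [3.0, 2.9],
    "land sink": [3.1, 3.5],
})

# items: keep exactly the listed column labels
by_items = toy.filter(items=["Year", "ocean sink"])

# like: keep columns whose label contains the substring
by_like = toy.filter(like="sink")

# regex: keep columns whose label matches the regular expression
by_regex = toy.filter(regex="^land")

# axis=0: apply the selection to row labels instead of column labels
by_rows = toy.filter(items=[0], axis=0)

print(by_items.columns.tolist())
print(by_like.columns.tolist())
print(by_regex.columns.tolist())
print(by_rows)
```

In every case `filter` returns a new dataframe, so the result has to be assigned to a name if you want to keep it.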

# Exploration
Let's use some of the `pandas` functionality to explore the exoplanet data in our file `Exoplanet_Archive10.2025.csv`.

In [None]:
# Create exoplanet dataframe
dfplanets = pd.read_csv('data/Exoplanet_Archive10.2025.csv', header=0, skiprows=95)
dfplanets

In [None]:
dfplanets.columns.to_list()

In [None]:
dfplanets.filter(items = ["pl_name","default_flag"], axis=1)

In [None]:
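# Keep only the rows flagged as default (default_flag == 1)
dfplanets = dfplanets[dfplanets.default_flag == 1]
# Setting the index is not strictly needed. Note that set_index returns a new
# dataframe; dfplanets itself is unchanged unless the result is assigned.
dfplanets.set_index("pl_name")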
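As a small illustration of how `set_index` behaves, the sketch below uses a made-up two-row dataframe. The name `toy` and the planet names are hypothetical. It shows that the original dataframe keeps its integer index, while the returned copy can be looked up by planet name.

```python
import pandas as pd

# Hypothetical toy data (not taken from the exoplanet archive)
toy = pd.DataFrame({
    "pl_name": ["planet a", "planet b"],
    "sy_snum": [1, 2],
})

# set_index returns a new dataframe; toy keeps its default integer index
indexed = toy.set_index("pl_name")

# With the name as the index, rows can be looked up by label
stars_b = indexed.loc["planet b", "sy_snum"]

print(toy.index.tolist())
print(indexed.index.tolist())
print(stars_b)
```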

In [None]:
# Sum of the star counts over all rows that belong to multi-star systems
dfplanets.loc[dfplanets.sy_snum > 1, 'sy_snum'].sum()

In [None]:
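import statistics as st
# Kepler's third law ratio a^3/P^2 for a one-solar-mass star,
# with a in AU and P in days: 1 / 365.25**2 is about 7.496e-6
Kratio = 7.496*10**(-6)
# a^3/P^2 for each planet (semi-major axis cubed over orbital period squared)
ratio = dfplanets.pl_orbsmax**3/dfplanets.pl_orbper**2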
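One way to read `Kratio` and `ratio` is through Kepler's third law: for a planet much lighter than its star, a^3/P^2 is proportional to the stellar mass, so dividing `ratio` by `Kratio` gives a rough estimate of the host star's mass in solar masses. The sketch below is only an illustration under the assumption that `pl_orbsmax` is in AU and `pl_orbper` is in days. The dataframe `toy`, the constant name `K_SUN`, and the column `implied_mass` are hypothetical names, and the two rows use approximate values for Earth and Mars rather than archive data.

```python
import pandas as pd

# a^3/P^2 for a one-solar-mass star, in AU^3 per day^2 (about 1 / 365.25**2)
K_SUN = 7.496e-6

# Hypothetical toy rows using approximate Solar System values
toy = pd.DataFrame({
    "pl_name": ["Earth-like", "Mars-like"],
    "pl_orbsmax": [1.0, 1.524],     # semi-major axis, assumed to be in AU
    "pl_orbper": [365.25, 686.98],  # orbital period, assumed to be in days
})

# Same expression as in the cell above, applied to the toy data
ratio = toy["pl_orbsmax"] ** 3 / toy["pl_orbper"] ** 2

# Dividing by the solar value gives the implied stellar mass in solar masses
toy["implied_mass"] = ratio / K_SUN

print(toy)
```

Both toy rows orbit the Sun, so the implied mass comes out close to one solar mass. Applied to `dfplanets`, the same division could be compared against the `st_mass` column, assuming that column is given in solar masses.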

In [None]:
len(dfplanets.st_mass)

In [None]:
len(dfplanets.st_mass[dfplanets.sy_snum==1])

In [None]:
sum(dfplanets.pl_orbsmax<0.5)

In [None]:
dfplanets.pl_rade.hist(bins=20)