# Updating data frames
Lecture Notes
10-16-23
https://carmengg.github.io/eds-220-book/lectures/lesson-5-updating-dataframes.html

We will use the Palmer penguins dataset (Horst et al., 2020). This time we will import it via the seaborn package since it is included as one of seaborn’s example datasets.

```python
# standard libraries
import pandas as pd
import numpy as np

# import seaborn with its standard abbreviation
import seaborn as sns

# we will use the random library to create some random numbers
import random

# load seaborn's penguins example dataset
penguins = sns.load_dataset("penguins")

# look at the data frame's head
penguins.head()
```

| | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
|---|---|---|---|---|---|---|---|
| 0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | Male |
| 1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | Female |
| 2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | Female |
| 3 | Adelie | Torgersen | NaN | NaN | NaN | NaN | NaN |
| 4 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | Female |


## Add a column
```
df['new_col_name'] = new_column_values
```

`new_column_values` could be:
- a `pd.Series` or a NumPy array with the same length as the data frame (one value per row)
- a single scalar (a single number or a single string), which is repeated in every row
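As a minimal sketch of both kinds of values, the snippet below uses a small toy data frame with hypothetical values (not the penguins data) and hypothetical column names `row_id`, `site` and `bad_col`. It also shows that an array whose length does not match the number of rows is rejected.

```python
import numpy as np
import pandas as pd

# Toy data frame with hypothetical values (not the penguins data)
df = pd.DataFrame({"species": ["Adelie", "Gentoo", "Chinstrap"],
                   "body_mass_g": [3750.0, 5000.0, np.nan]})

# A NumPy array with one value per row: its length must equal len(df)
df["row_id"] = np.arange(len(df))

# A pd.Series computed from an existing column (one value per row)
df["body_mass_kg"] = df["body_mass_g"] / 1000

# A single scalar: pandas repeats it in every row
df["site"] = "unknown"

# An array of the wrong length is rejected with a ValueError
try:
    df["bad_col"] = np.array([1, 2])  # 2 values for 3 rows
except ValueError as err:
    print("ValueError:", err)

print(df)
```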

Example: create a new column with the body mass in kilograms instead of grams. Since 1 kg = 1000 g, we divide the `body_mass_g` column by 1000.

```python
# add a new column body_mass_kg
# same syntax as adding a new key to a dictionary
penguins['body_mass_kg'] = penguins.body_mass_g/1000

# confirm the new column is in the data frame
print('body_mass_kg' in penguins.columns)

# take a look at the new column
penguins.head()
```

```
True
```


| | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | body_mass_kg |
|---|---|---|---|---|---|---|---|---|
| 0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | Male | 3.75 |
| 1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | Female | 3.80 |
| 2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | Female | 3.25 |
| 3 | Adelie | Torgersen | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | Female | 3.45 |
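Row 3 has no body mass, and its `body_mass_kg` value is missing too. The minimal sketch below, on a toy Series with hypothetical values, checks that dividing a Series by a scalar is applied element by element and that missing values simply stay missing (NaN).

```python
import numpy as np
import pandas as pd

# Toy body masses in grams (hypothetical values); one is missing
body_mass_g = pd.Series([3750.0, np.nan, 3450.0])

# Division by a scalar is applied element-wise; NaN stays NaN
body_mass_kg = body_mass_g / 1000

print(body_mass_kg)
# count how many converted values are missing
print(body_mass_kg.isna().sum())
```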


If we want to create a new column and insert it at a particular position, we use the data frame method `insert()`:

```
df.insert(loc = integer_index,  # location of new column
          column = 'new_col_name',
          value = new_column_values)
```
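A minimal sketch on a toy data frame (hypothetical values, hypothetical `code` column) showing two behaviours of `insert()` worth knowing: it modifies the data frame in place and returns `None`, and with its default `allow_duplicates=False` it refuses to add a column whose name already exists.

```python
import pandas as pd

# Toy data frame with hypothetical values
df = pd.DataFrame({"species": ["Adelie", "Gentoo"],
                   "island": ["Torgersen", "Biscoe"]})

# insert() modifies df in place and returns None
result = df.insert(loc=1, column="code", value=[101, 202])
print(result)
print(list(df.columns))

# With the default allow_duplicates=False, reusing a column name fails
try:
    df.insert(loc=0, column="code", value=[303, 404])
except ValueError as err:
    print("ValueError:", err)
```

Because `insert()` works in place, writing `df = df.insert(...)` would replace `df` with `None`.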

```python
# create random 3-digit codes
# random.sample draws without replacement, so no code repeats
codes = random.sample(range(100, 1000), len(penguins))

# insert codes at the front of the data frame (index 0)
penguins.insert(loc=0,
                column='code',
                value=codes)
```
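The codes above change every time the cell runs. A minimal sketch, assuming we want reproducible codes, seeds Python's random number generator with `random.seed` before sampling; the seed value 42 is an arbitrary choice and `n_rows` is a hypothetical stand-in for `len(penguins)`.

```python
import random

n_rows = 344  # hypothetical stand-in for len(penguins)

random.seed(42)  # fix the seed so the codes are reproducible
codes = random.sample(range(100, 1000), n_rows)

# Sampling without replacement: no code repeats
print(len(codes) == len(set(codes)))
# Every code has exactly 3 digits
print(min(codes) >= 100 and max(codes) <= 999)

# Re-seeding with the same value reproduces the same codes
random.seed(42)
print(codes == random.sample(range(100, 1000), n_rows))
```

Since `range(100, 1000)` only holds 900 values, `random.sample` would raise a `ValueError` for a data frame with more than 900 rows.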

```python
penguins.head()
```

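| | code | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | body_mass_kg |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 388 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | Male | 3.75 |
| 1 | 979 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | Female | 3.80 |
| 2 | 167 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | Female | 3.25 |
| 3 | 957 | Adelie | Torgersen | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 823 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | Female | 3.45 |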

