In [None]:
import pandas as pd
import numpy as np

We'd be remiss if we didn't cover _what DataFrames are_ and how to _create_ them with Pandas, so here's a quick primer.


A Pandas DataFrame is a way to store tabular data in Python, similar to an Excel spreadsheet or a table in a relational database. Like a spreadsheet, it organizes data into rows and columns, which makes it easy to inspect and work with. Like a database table, each column holds a single data type (for example, all integers or all text), while each row is a record whose values can differ in type from one column to the next.
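To make the "one type per column" idea concrete, here is a small sketch that builds a DataFrame from a dictionary of columns and inspects the per-column dtypes. The column names and values are made up for illustration; the exact dtype shown for the text column depends on your pandas version.

```python
import pandas as pd

# Hypothetical example data: each dictionary key becomes a column
people = pd.DataFrame(
    {
        "name": ["Ada", "Grace", "Linus"],
        "age": [36, 45, 28],
        "height_m": [1.65, 1.52, 1.80],
    }
)

# Each column has exactly one dtype; a single row mixes types across columns
print(people.dtypes)
```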

If you're familiar with databases, you know that you can filter, join, and aggregate data. DataFrames let you do the same in Python: you can filter rows, join DataFrames together the way you would join tables, and group and summarize data. This makes DataFrames a powerful tool for data analysis, and a familiar one if you already work with spreadsheets or database tables.
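As a rough sketch of those three database-style operations, the snippet below filters with a boolean mask, joins with `merge`, and aggregates with `groupby`. The `orders` and `customers` tables are hypothetical; the SQL in the comments is only an analogy.

```python
import pandas as pd

# Hypothetical tables, analogous to two database tables
orders = pd.DataFrame(
    {
        "order_id": [1, 2, 3, 4],
        "customer_id": [10, 20, 10, 30],
        "amount": [25.0, 40.0, 15.0, 60.0],
    }
)
customers = pd.DataFrame(
    {"customer_id": [10, 20, 30], "region": ["East", "West", "East"]}
)

# Filter rows (roughly: WHERE amount > 20)
large_orders = orders[orders["amount"] > 20]

# Join tables (roughly: INNER JOIN customers USING (customer_id))
joined = orders.merge(customers, on="customer_id", how="inner")

# Aggregate (roughly: SELECT region, SUM(amount) ... GROUP BY region)
totals = joined.groupby("region")["amount"].sum()

print(large_orders)
print(totals)
```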

DataFrames can be constructed like any other object: pass the `data`, plus (optionally) an `index` of row labels and a list of `columns` labels.

In [None]:
new_df = pd.DataFrame(
    columns=["A", "B", "C", "D", "E"],
    index=["V", "W", "X", "Y", "Z"],
    # Generate a 5x5 Vandermonde matrix (https://en.wikipedia.org/wiki/Vandermonde_matrix)
    data=np.vander((5, 4, 3, 2, 1), 5),
)
display(new_df)

Columns can be added through assignment. Arithmetic between columns is element-wise and aligned on the index, so numeric columns can be combined directly.

In [None]:
new_df["F"] = new_df["A"] + new_df["B"] + new_df["C"] + new_df["D"] + new_df["E"]
display(new_df)
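For a row-wise total like column `F`, `DataFrame.sum(axis=1)` is a shorter alternative. The sketch below rebuilds the same Vandermonde frame (as a local `df`) and checks that both approaches agree. Note that `sum(axis=1)` adds *every* column, so it only matches the explicit sum while `A` through `E` are the only columns present.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    columns=["A", "B", "C", "D", "E"],
    index=["V", "W", "X", "Y", "Z"],
    data=np.vander((5, 4, 3, 2, 1), 5),
)

# Explicit column-by-column addition (element-wise, aligned on the index)
explicit = df["A"] + df["B"] + df["C"] + df["D"] + df["E"]

# Row-wise sum across all columns in one call
row_sums = df.sum(axis=1)

print(row_sums)
```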

Columns can also be dropped or renamed. These methods return a new DataFrame by default, which is usually the safer choice; passing `inplace=True` modifies the original instead:

In [None]:
# Hopefully, you're a bit more creative than us :)

# Drop and assign to a new dataframe
new_new_df = new_df.drop("F", axis=1)

# Drop the data in place
new_df.drop("F", axis=1, inplace=True)
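Renaming works the same way. Here is a small sketch using `rename` with a `{old: new}` mapping, along with the `columns=` keyword form of `drop`, which some find more readable than `axis=1`. The new column names are arbitrary examples, and `df` is a fresh copy of the Vandermonde frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    columns=["A", "B", "C", "D", "E"],
    index=["V", "W", "X", "Y", "Z"],
    data=np.vander((5, 4, 3, 2, 1), 5),
)

# rename returns a new DataFrame; the mapping is old label -> new label
renamed = df.rename(columns={"A": "x4", "E": "x0"})

# Same effect as drop(..., axis=1), using the columns= keyword
dropped = df.drop(columns=["D", "E"])

print(renamed.columns.tolist())
print(dropped.columns.tolist())
```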

We can also operate on entire DataFrames at once; scalar arithmetic is applied to every element:

In [None]:
double_new_df = new_df * 2
display(double_new_df)
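Arithmetic between two DataFrames also works, but pandas aligns rows and columns by label first. The sketch below (with made-up frames `left` and `right`) shows that cells without a match on both sides become `NaN`, and how `add(..., fill_value=0)` treats a cell missing on only one side as zero. Cells missing on both sides stay `NaN`.

```python
import pandas as pd

left = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["X", "Y"])
right = pd.DataFrame({"B": [10, 20], "C": [30, 40]}, index=["Y", "Z"])

# Labels are aligned before adding; only (Y, B) exists in both frames
total = left + right
print(total)

# fill_value substitutes 0 where exactly one side is missing
total_filled = left.add(right, fill_value=0)
print(total_filled)
```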