# Data Frames

A DataFrame is pandas' notion of a table.  Data frames are made of "Series", with each series representing one column of the table.

# Series


In [1]:
import pandas as pd

my_series = pd.Series([5,6,7])
print(my_series)

0    5
1    6
2    7
dtype: int64



Once you have done that, you have made it to the very simplest of Excel tables :)  It seemed like a lot of work to get here, but we now have tools that are much more powerful than anything Excel can give us, and those tools work in ways that reduce the likelihood of errors.

Note that each value we put in got a "row number", just like we might expect in Excel, except they start at 0.  The "row number" is actually referred to as the _index_ in pandas, and we can control it when we create our series:

In [5]:
adjusted_series = pd.Series({1:5,2:6,3:7})
print(adjusted_series)

1    5
2    6
3    7
dtype: int64
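A dictionary is not the only way to set the index. As a small sketch (the names `labelled_series` and `named_series` are just for illustration), the labels can also be passed separately through the `index=` argument of `pd.Series`:

```python
import pandas as pd

# Same values as before, but the labels are passed separately via `index=`.
labelled_series = pd.Series([5, 6, 7], index=[1, 2, 3])
print(labelled_series)

# The labels do not have to be numbers.
named_series = pd.Series([5, 6, 7], index=["x", "y", "z"])
print(named_series)
```

The first series should come out the same as `adjusted_series` above.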


Series _don't have column names_.  If you are familiar with other programming languages, they are like arrays, associative arrays, or dictionaries.
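To make the dictionary comparison a bit more concrete, here is a minimal sketch of looking values up by their index label. It rebuilds `adjusted_series` so it can run on its own; `value_at_2` and `as_dict` are illustrative names.

```python
import pandas as pd

adjusted_series = pd.Series({1: 5, 2: 6, 3: 7})

# Look a value up by its index label, much like a dictionary key.
value_at_2 = adjusted_series.loc[2]
print(value_at_2)                  # 6

# `in` checks the index labels, not the values (again like a dictionary).
print(2 in adjusted_series)        # True
print(5 in adjusted_series)        # False

# Convert back to a plain Python dictionary.
as_dict = adjusted_series.to_dict()
print(as_dict)                     # {1: 5, 2: 6, 3: 7}
```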

# Arithmetic on Series

pandas is built on the `numpy` library, which lets us do arithmetic on whole series as if they were single values.  When you add, multiply, subtract, or divide a series by a number, you get another series back with every value adjusted.

In [32]:
import numpy as np

print("-- adding one --")
print(adjusted_series + 1)

print("-- multiply by two --")
print(adjusted_series * 2)

-- adding one --
1    6
2    7
3    8
dtype: int64
-- multiply by two --
1    10
2    12
3    14
dtype: int64
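The other operations work the same way. The sketch below (which rebuilds `adjusted_series` so it runs on its own; `minus_one`, `halved` and `doubled` are illustrative names) shows subtraction, division, and adding one series to another:

```python
import pandas as pd

adjusted_series = pd.Series({1: 5, 2: 6, 3: 7})

print("-- subtracting one --")
minus_one = adjusted_series - 1
print(minus_one)          # values 4, 5, 6

print("-- dividing by two --")
halved = adjusted_series / 2
print(halved)             # values 2.5, 3.0, 3.5 (dtype becomes float64)

print("-- adding a series to itself --")
doubled = adjusted_series + adjusted_series
print(doubled)            # values 10, 12, 14, matched up by index label
```

Note that division turns the whole-number values into floating point numbers, even where the result happens to be whole.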




# Data Frame

We want titles on our columns though, and we want multiple columns.  Data frames give us that.  We can promote a Series to a data frame:

In [11]:
first_frame = adjusted_series.to_frame()
first_frame

   0
1  5
2  6
3  7


It has given our one column a name: `0`.  I don't love that name; I would prefer "A":

In [12]:
second_frame = adjusted_series.to_frame("A")
second_frame

   A
1  5
2  6
3  7


Let's now add a second column to this data frame.  When doing so, I need to say which "column slot" to use.  These are also numbered from 0, so the second one is slot 1 :/  Notice I can name the column when I insert it.

In [15]:
second_frame.insert(1,"B", [50,50,70])
second_frame

   A   B
1  5  50
2  6  50
3  7  70
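The slot number matters when the new column should not go last. As a small sketch on a fresh one-column frame (the name `frame` and the column "Z" are made up for this example), slot 0 puts the new column first:

```python
import pandas as pd

frame = pd.Series({1: 5, 2: 6, 3: 7}).to_frame("A")

# Slot 0 puts the new column in front of "A".
# `insert` changes the frame in place and returns None,
# so there is no need to assign its result to anything.
frame.insert(0, "Z", [1, 2, 3])
print(frame)
```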


You can get a series back from a data frame using the _square bracket_ notation:

In [16]:
second_frame["A"]

1    5
2    6
3    7
Name: A, dtype: int64
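What comes back really is a Series, and it remembers both the column name and the row index it came from. A quick check, using a fresh frame so it runs on its own (`frame` and `column` are illustrative names):

```python
import pandas as pd

frame = pd.Series({1: 5, 2: 6, 3: 7}).to_frame("A")

column = frame["A"]
print(type(column).__name__)   # Series
print(column.name)             # A
print(list(column.index))      # [1, 2, 3]
```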

And you can add a column using the same notation (this does the same job as `insert`, with the new column always going at the end, but it is a nicer form):

In [20]:
second_frame["C"] = [500,600,700]
second_frame

   A   B    C
1  5  50  500
2  6  50  600
3  7  70  700


# Exercise

Adjust the following code block so that it adds a new column (called "B") to the one-column data frame `table`.  Column "B" should have values one more (`+1`) than the corresponding existing value, i.e. replicate our table from the transition notebook.

![simple table](small_table.png)

Everything you need was covered in this notebook, but you might have to get creative in how you combine the ideas.


In [27]:
table = pd.Series({1:5,2:6,3:7}).to_frame("A")
print(table)

# put your code here such that the next print of table has all the new values in column "B"

print(table)

   A
1  5
2  6
3  7
   A
1  5
2  6
3  7


# Conclusion

Once you have done that, you have made it to the very simplest of Excel tables :)  It seemed like a lot of work to get here, but we now have tools that are much more powerful than anything Excel can give us, and those tools work in ways that reduce the likelihood of errors.  In [the next notebook](less_errors.ipynb) we will explore exactly how.