# Constructing DataFrames from Series

This lesson introduces methods for constructing a DataFrame from multiple
Series.

This first block loads the variables created in an earlier lesson.  A
later lesson will cover loading and saving data.

In [1]:
# Setup: Load data created in an earlier lesson

import pandas as pd

hdf_file = "data/dataframes.h5"

sep_04 = pd.read_hdf(hdf_file, "sep_04")
sep_05 = pd.read_hdf(hdf_file, "sep_05")
sep_06 = pd.read_hdf(hdf_file, "sep_06")
sep_07 = pd.read_hdf(hdf_file, "sep_07")
sep_10 = pd.read_hdf(hdf_file, "sep_10")
sep_11 = pd.read_hdf(hdf_file, "sep_11")
sep_12 = pd.read_hdf(hdf_file, "sep_12")
sep_13 = pd.read_hdf(hdf_file, "sep_13")
sep_14 = pd.read_hdf(hdf_file, "sep_14")
sep_17 = pd.read_hdf(hdf_file, "sep_17")
sep_18 = pd.read_hdf(hdf_file, "sep_18")
sep_19 = pd.read_hdf(hdf_file, "sep_19")

spy = pd.read_hdf(hdf_file, "spy")
aapl = pd.read_hdf(hdf_file, "aapl")
goog = pd.read_hdf(hdf_file, "goog")

dates = pd.to_datetime(pd.read_hdf(hdf_file, "dates"))

prices = pd.read_hdf(hdf_file, "prices")

## Problem: Construct a DataFrame from rows

Create a DataFrame named `prices_row` from the row vectors previously
entered such that the results are identical to `prices`. For example, the first
two days' worth of data are:

```python
dates_2 = pd.to_datetime(["2018-09-04", "2018-09-05"])
prices_row = pd.DataFrame([sep_04, sep_05])
# Set the index after building the DataFrame from the rows
prices_row.index = dates_2
```

Verify that the DataFrame is identical by printing the difference with
`prices`.

```python
print(prices_row - prices)
```
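
The row-wise pattern can also be tried without the HDF file. The following is a minimal sketch, assuming two hypothetical Series (`day_1` and `day_2`) with made-up values standing in for the lesson's data. The `expected` frame is likewise only an illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the lesson's row Series; the values are made up
columns = ["SPY", "AAPL", "GOOG"]
day_1 = pd.Series([100.0, 200.0, 300.0], index=columns)
day_2 = pd.Series([101.0, 199.0, 305.0], index=columns)

# Each Series in the list becomes one row; the Series index supplies the columns
rows = pd.DataFrame([day_1, day_2])
rows.index = pd.to_datetime(["2018-09-04", "2018-09-05"])

# Reference frame built from a nested list, used only for comparison
expected = pd.DataFrame(
    [[100.0, 200.0, 300.0], [101.0, 199.0, 305.0]],
    columns=columns,
    index=rows.index,
)

# The difference is zero everywhere when the two frames match
diff = rows - expected
print(diff)
```

Subtracting aligns the two frames on both the index and the columns, so a frame of zeros indicates that the labels and the values agree.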

In [2]:
prices_row = pd.DataFrame(
    [
        sep_04,
        sep_05,
        sep_06,
        sep_07,
        sep_10,
        sep_11,
        sep_12,
        sep_13,
        sep_14,
        sep_17,
        sep_18,
        sep_19,
    ]
)
prices_row.index = dates
print(prices_row)
print(prices - prices_row)

               SPY    AAPL     GOOG
2018-09-04  289.81  228.36  1197.00
2018-09-05  289.03  226.87  1186.48
2018-09-06  288.16  223.10  1171.44
2018-09-07  287.60  221.30  1164.83
2018-09-10  288.10  218.33  1164.64
2018-09-11  289.05  223.85  1177.36
2018-09-12  289.12  221.07  1162.82
2018-09-13  290.83  226.41  1175.33
2018-09-14  290.88  223.84  1172.53
2018-09-17  289.34  217.88  1156.05
2018-09-18  290.91  218.24  1161.22
2018-09-19  291.44  216.64  1158.78
            SPY  AAPL  GOOG
2018-09-04  0.0   0.0   0.0
2018-09-05  0.0   0.0   0.0
2018-09-06  0.0   0.0   0.0
2018-09-07  0.0   0.0   0.0
2018-09-10  0.0   0.0   0.0
2018-09-11  0.0   0.0   0.0
2018-09-12  0.0   0.0   0.0
2018-09-13  0.0   0.0   0.0
2018-09-14  0.0   0.0   0.0
2018-09-17  0.0   0.0   0.0
2018-09-18  0.0   0.0   0.0
2018-09-19  0.0   0.0   0.0


## Problem: Construct a DataFrame from columns

Create a DataFrame named `prices_col` from the 3 column vectors entered
such that the results are identical to `prices`.

*Note*: `DataFrame` treats each Series in a list as a row, so `.T` is
used to transpose the result and place each Series in a column.

Verify that the DataFrame is identical by printing the difference with
`prices`.
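
As an aside, transposing is not the only way to place Series in columns. The sketch below compares the transpose approach with `pd.concat(..., axis=1)`, which is a different technique from the one this lesson uses. The Series `x` and `y` and their values are hypothetical.

```python
import pandas as pd

dates = pd.to_datetime(["2018-09-04", "2018-09-05"])
# Hypothetical column Series; each name becomes a column label
x = pd.Series([1.0, 2.0], index=dates, name="X")
y = pd.Series([3.0, 4.0], index=dates, name="Y")

# A list of Series is stacked as rows, so transpose to get columns
by_transpose = pd.DataFrame([x, y]).T

# pd.concat with axis=1 places each Series in its own column directly
by_concat = pd.concat([x, y], axis=1)

print(by_transpose)
print(by_transpose - by_concat)
```

One practical difference is that a transpose passes through a row-oriented frame first, which can change column dtypes when the Series hold different types of data.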

In [3]:
# No need to set the index or the column names since the index
# and name are already set in each Series
prices_col = pd.DataFrame([spy, aapl, goog]).T
print(prices_col)
print(prices - prices_col)

               SPY    AAPL     GOOG
2018-09-04  289.81  228.36  1197.00
2018-09-05  289.03  226.87  1186.48
2018-09-06  288.16  223.10  1171.44
2018-09-07  287.60  221.30  1164.83
2018-09-10  288.10  218.33  1164.64
2018-09-11  289.05  223.85  1177.36
2018-09-12  289.12  221.07  1162.82
2018-09-13  290.83  226.41  1175.33
2018-09-14  290.88  223.84  1172.53
2018-09-17  289.34  217.88  1156.05
2018-09-18  290.91  218.24  1161.22
2018-09-19  291.44  216.64  1158.78
            SPY  AAPL  GOOG
2018-09-04  0.0   0.0   0.0
2018-09-05  0.0   0.0   0.0
2018-09-06  0.0   0.0   0.0
2018-09-07  0.0   0.0   0.0
2018-09-10  0.0   0.0   0.0
2018-09-11  0.0   0.0   0.0
2018-09-12  0.0   0.0   0.0
2018-09-13  0.0   0.0   0.0
2018-09-14  0.0   0.0   0.0
2018-09-17  0.0   0.0   0.0
2018-09-18  0.0   0.0   0.0
2018-09-19  0.0   0.0   0.0


## Problem: Construct a DataFrame from a dictionary

Create a DataFrame named `prices_dict` from the 3 column vectors entered
such that the results are identical to `prices`.

Verify that the DataFrame is identical by printing the difference with
`prices`.
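
When a dictionary is used, the keys supply the column labels. A small sketch of this behavior follows, using hypothetical Series whose names deliberately differ from the dictionary keys:

```python
import pandas as pd

dates = pd.to_datetime(["2018-09-04", "2018-09-05"])
# Hypothetical Series; the names are chosen to differ from the dict keys
x = pd.Series([1.0, 2.0], index=dates, name="x_series")
y = pd.Series([3.0, 4.0], index=dates, name="y_series")

# The dictionary keys set the column labels; the Series names are not used
frame = pd.DataFrame({"X": x, "Y": y})
print(frame)
print(list(frame.columns))
```

No transpose is needed here because each dictionary value is treated as a column.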

In [4]:
prices_dict = pd.DataFrame({"SPY": spy, "AAPL": aapl, "GOOG": goog}, index=dates)
print(prices_dict)
print(prices - prices_dict)

               SPY    AAPL     GOOG
2018-09-04  289.81  228.36  1197.00
2018-09-05  289.03  226.87  1186.48
2018-09-06  288.16  223.10  1171.44
2018-09-07  287.60  221.30  1164.83
2018-09-10  288.10  218.33  1164.64
2018-09-11  289.05  223.85  1177.36
2018-09-12  289.12  221.07  1162.82
2018-09-13  290.83  226.41  1175.33
2018-09-14  290.88  223.84  1172.53
2018-09-17  289.34  217.88  1156.05
2018-09-18  290.91  218.24  1161.22
2018-09-19  291.44  216.64  1158.78
            SPY  AAPL  GOOG
2018-09-04  0.0   0.0   0.0
2018-09-05  0.0   0.0   0.0
2018-09-06  0.0   0.0   0.0
2018-09-07  0.0   0.0   0.0
2018-09-10  0.0   0.0   0.0
2018-09-11  0.0   0.0   0.0
2018-09-12  0.0   0.0   0.0
2018-09-13  0.0   0.0   0.0
2018-09-14  0.0   0.0   0.0
2018-09-17  0.0   0.0   0.0
2018-09-18  0.0   0.0   0.0
2018-09-19  0.0   0.0   0.0


## Exercises

### Exercise: Create a DataFrame from rows

Use the three series populated below to create a DataFrame using each
as a row.

**Note**: Notice what happens in the resulting `DataFrame` since one of the
`Series` has 4 elements while the others have 3.
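
The same effect can be explored on a smaller case first. This sketch assumes two hypothetical Series, `short` and `long`, where `long` carries one label that `short` lacks:

```python
import pandas as pd

# Hypothetical Series: `long` has a label ("z") that `short` does not
short = pd.Series([1.0, 2.0], index=["x", "y"], name="short")
long = pd.Series([3.0, 4.0, 5.0], index=["x", "y", "z"], name="long")

# Rows are aligned on the index labels, not on position
frame = pd.DataFrame([short, long])
print(frame)

# Show which cells are missing because no value was supplied for that label
print(frame.isna())
```
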

In [5]:
# Setup: Data for the Exercises
import pandas as pd

index = ["Num", "Let", "Date"]
a = pd.Series([1, "A", pd.Timestamp(2018, 12, 31)], name="a", index=index)
b = pd.Series([2, "B", pd.Timestamp(2018, 12, 31)], name="b", index=index)
index = ["Num", "Let", "Date", "Float"]
c = pd.Series([3, "C", pd.Timestamp(2018, 12, 31), 3.0], name="c", index=index)

In [6]:
df = pd.DataFrame([a, b, c])
df

   Num Let        Date  Float
a    1   A  2018-12-31    NaN
b    2   B  2018-12-31    NaN
c    3   C  2018-12-31    3.0


### Exercise: Build a DataFrame from Columns

Build a `DataFrame` from the three series where each is used as a column.


In [7]:
df = pd.DataFrame([a, b, c]).T
df

                         a                    b                    c
Num                      1                    2                    3
Let                      A                    B                    C
Date   2018-12-31 00:00:00  2018-12-31 00:00:00  2018-12-31 00:00:00
Float                  NaN                  NaN                  3.0


In [8]:
# Note: a.name is "a"
d = {a.name: a, b.name: b, c.name: c}
df = pd.DataFrame(d)
df

                         a                    b                    c
Date   2018-12-31 00:00:00  2018-12-31 00:00:00  2018-12-31 00:00:00
Float                  NaN                  NaN                  3.0
Let                      A                    B                    C
Num                      1                    2                    3
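
In the output above, the row labels come out as Date, Float, Let, Num rather than in the order used by the Series. This can happen when the Series in a dictionary do not share an identical index and pandas has to combine the labels. If a particular row order matters, `reindex` can restore it. A minimal sketch, assuming hypothetical Series `p` and `q`:

```python
import pandas as pd

# Hypothetical Series whose indexes differ: `q` has an extra label "c"
p = pd.Series([1.0, 2.0], index=["b", "a"], name="p")
q = pd.Series([3.0, 4.0, 5.0], index=["b", "a", "c"], name="q")

frame = pd.DataFrame({"p": p, "q": q})
print(frame.index.tolist())

# reindex returns a new frame with the rows in the requested order
ordered = frame.reindex(["b", "a", "c"])
print(ordered)
```
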
