## Data structures

### Basic data structures

There are two types of data structure in pandas:

**Series**: a 1-d labelled array holding data of any type

**DataFrame**: a 2-d labelled data structure, like a table with rows and columns, where each column can hold a different type
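
As a rough illustration of the difference, the sketch below builds one object of each kind and inspects its dimensions. The names `s_demo` and `df_demo` and their values are made up for this example.

```python
import pandas as pd

# A Series is one-dimensional: a single axis of labels (the index).
s_demo = pd.Series([10, 20, 30], index=["a", "b", "c"])

# A DataFrame is two-dimensional: row labels (index) and column labels.
df_demo = pd.DataFrame(
    {"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]},
    index=["a", "b", "c"],
)

print(s_demo.ndim, s_demo.shape)
print(df_demo.ndim, df_demo.shape)

# Selecting a single column of a DataFrame gives back a Series.
col = df_demo["x"]
print(type(col).__name__)
```

Each column of a DataFrame behaves like a Series that shares the same row index.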

### Creating data structures
You can create a Series by passing a list of values, letting pandas set up a default index:

In [None]:
import pandas as pd
import numpy as np

s = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s)
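
To see what the default index looks like, and how it compares with labels you supply yourself, here is a small sketch. The names `s_default` and `s_labelled` are illustrative only.

```python
import numpy as np
import pandas as pd

# With no index argument, pandas numbers the rows 0, 1, 2, ...
s_default = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s_default.index)
# np.nan is a float, so the integers are stored as floats too.
print(s_default.dtype)

# Passing index= replaces the default numbering with your own labels.
s_labelled = pd.Series([1, 3, 5], index=["a", "b", "c"])
print(s_labelled["b"])
```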

You can create a DataFrame by passing a NumPy array, letting pandas set up a default index and column names:

In [None]:
df = pd.DataFrame(np.random.randn(6, 4))
print(df)

You can create a DataFrame by passing a NumPy array, defining the index as an array and letting pandas set up default column names:

In [None]:
index_array = np.array(['Row 1', 'Row 2', 'Row 3', 'Row 4', 'Row 5', 'Row 6'])
df = pd.DataFrame(np.random.randn(6, 4), index=index_array)
print(df)

# Note that the length of index_array must be equal to the number of rows in the DataFrame
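
The sketch below shows what happens when the lengths do not match: pandas raises a `ValueError`. The variable names (`data`, `short_index`, `mismatch_raised`) are made up for this example.

```python
import numpy as np
import pandas as pd

data = np.random.randn(6, 4)
short_index = ["Row 1", "Row 2", "Row 3"]  # only 3 labels for 6 rows

try:
    pd.DataFrame(data, index=short_index)
    mismatch_raised = False
except ValueError as err:
    # pandas reports that the data and the index disagree on the number of rows
    mismatch_raised = True
    print("ValueError:", err)
```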

You can create a DataFrame by passing a NumPy array, defining the index as an array and defining column names as a list:

In [None]:
column_list = ['A', 'B', 'C', 'D']
df = pd.DataFrame(np.random.randn(6, 4), index=index_array, columns=column_list)
print(df)

# As above, the length of column_list must be equal to the number of columns

You can create a DataFrame by passing a Python dictionary where the keys become the column names, and the values contain the data:

In [None]:
from datetime import time as dtt

df2 = pd.DataFrame(
    {"A": 1.0,
     "B": pd.Timestamp("20130102"),
     "C": pd.Series(1, index=list(range(4)), dtype="float32"),
     "D": np.array([3] * 4, dtype="int32"),
     "E": pd.Categorical(["test", "train", "test", "train"]),
     "F": "foo",
     "Time": [dtt(3), dtt(6), dtt(9), dtt(12)]
    }
)

# Note that all array-like values in the dictionary (lists, arrays, Series) must have the same length
# Scalar values, such as those in columns A, B and F, are repeated
# across all the rows

# Returns a new DataFrame with the "Time" column as the row index; df2 itself is unchanged
df2.set_index("Time")
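
Because `set_index` returns a new DataFrame, you need to assign the result if you want to keep it. A minimal sketch, using a smaller made-up frame called `small` rather than `df2`:

```python
from datetime import time as dtt

import pandas as pd

# "F" is a scalar, so it is repeated for every row of the "Time" list.
small = pd.DataFrame({"F": "foo", "Time": [dtt(3), dtt(6)]})

# Assign the result to keep the re-indexed DataFrame.
indexed = small.set_index("Time")

print(small.columns.tolist())    # "Time" is still a column of the original
print(indexed.columns.tolist())  # "Time" has moved out of the columns
print(indexed.index.name)
```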

### Exploring a DataFrame

You can view the index and column names of the DataFrame ``df``.

In [None]:
df.index

In [None]:
df.columns

You can view the top and bottom rows of the DataFrame ``df``.

In [None]:
df.head(2) # View the top 2 rows

In [None]:
df.tail(3) # View the bottom 3 rows

In [None]:
df.head() # Without providing a number, it will return the top 5 rows
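
With random data it can be hard to tell which rows came back, so this sketch repeats the calls on a made-up DataFrame `demo` filled with consecutive integers.

```python
import numpy as np
import pandas as pd

# 6 rows and 4 columns holding the integers 0 to 23.
demo = pd.DataFrame(np.arange(24).reshape(6, 4), columns=["A", "B", "C", "D"])

top_two = demo.head(2)
bottom_three = demo.tail(3)

print(top_two.index.tolist())       # row labels of the first 2 rows
print(bottom_three.index.tolist())  # row labels of the last 3 rows
print(len(demo.head()))             # the default is 5 rows
```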

### Exercise
Print out the index and column names of `df2`.

In [None]:
# space for completing exercise

### Exercise
Print out the bottom 2 rows of `df2`.

In [None]:
# space for completing exercise

### Transposing a DataFrame

You can transpose a DataFrame (swap rows and columns):

In [None]:
df.T
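
The effect is easier to follow on a small example with known values. The sketch below uses a made-up 2 x 3 DataFrame `demo` and checks that row and column labels swap places.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame(
    np.arange(6).reshape(2, 3),
    index=["r1", "r2"],
    columns=["A", "B", "C"],
)
t = demo.T

print(demo.shape, t.shape)
print(t.index.tolist(), t.columns.tolist())

# The value at (row, column) in demo sits at (column, row) in t.
print(demo.loc["r2", "B"], t.loc["B", "r2"])
```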

### Reshaping a DataFrame
You can set up a DataFrame with a ``MultiIndex`` (an index with more than one level):

In [None]:
index_1 = ["1993", "1993", "1993", "1994", "1994", "1994", "1995", "1995", "1995"]
index_2 = ["Mar", "Apr", "May", "Mar", "Apr", "May", "Mar", "Apr", "May"]

# Note that both arrays must have the same length

indexes = [index_1, index_2]

index = pd.MultiIndex.from_arrays(indexes, names=["Year", "Month"])

df = pd.DataFrame(np.random.randn(9, 2), index=index, columns=["T", "P"])

df
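
When the index covers every combination of two lists, as it does here, `pd.MultiIndex.from_product` is one alternative to writing out both arrays in full. This is a sketch of that option, not the approach used in the rest of this section; the names `years`, `months`, `idx_arrays` and `idx_product` are illustrative.

```python
import pandas as pd

years = ["1993", "1994", "1995"]
months = ["Mar", "Apr", "May"]

# Written out in full, as in the cell above.
idx_arrays = pd.MultiIndex.from_arrays(
    [
        ["1993", "1993", "1993", "1994", "1994", "1994", "1995", "1995", "1995"],
        ["Mar", "Apr", "May", "Mar", "Apr", "May", "Mar", "Apr", "May"],
    ],
    names=["Year", "Month"],
)

# Built from every (year, month) combination instead.
idx_product = pd.MultiIndex.from_product([years, months], names=["Year", "Month"])

# Both contain the same 9 (Year, Month) pairs in the same order.
print(list(idx_arrays) == list(idx_product))
```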

You can reshape this ``DataFrame`` with ``stack``, which moves the column labels into a new innermost level of the row ``MultiIndex``:

In [None]:
stacked = df.stack()
stacked

You can also perform the reverse using ``unstack``.

In [None]:
stacked.unstack()

By default, ``unstack`` moves the last (innermost) index level into the columns. Pass a level number or name to choose a different level:

In [None]:
stacked.unstack(1)
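
To check where each value ends up after reshaping, the sketch below repeats the steps on a smaller made-up DataFrame `demo` with known integer values. Passing the level name `"Month"` is assumed here to be equivalent to passing its position, `1`.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["1993", "1993", "1994", "1994"], ["Mar", "Apr", "Mar", "Apr"]],
    names=["Year", "Month"],
)
demo = pd.DataFrame(np.arange(8).reshape(4, 2), index=index, columns=["T", "P"])

# stack: the column labels T and P become a third index level.
stacked = demo.stack()
print(type(stacked).__name__, stacked.index.nlevels)

# unstack with no argument: the last level (T/P) goes back to the columns.
by_last = stacked.unstack()
print(by_last.loc[("1993", "Mar"), "T"])

# unstack by name: the Month level becomes the columns instead.
by_month = stacked.unstack("Month")
print(by_month.loc[("1994", "T"), "Mar"])
```

Note that `unstack` may return the rows in a different order from the original DataFrame, so compare values by label rather than by position.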

### Exercise
Reshape `df` to use a new row for each value.

In [None]:
# space for completing exercise