# Demonstration of Long vs. Wide Format

First, we read a table in wide format into a `pandas` DataFrame.

In [1]:
import pandas as pd
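column_names = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
                'acceleration', 'model_year', 'origin', 'car_name']
# The raw string keeps the regex \s+ (one or more whitespace characters) from being read as an invalid escape sequence.
auto = pd.read_csv('../data/auto-mpg.csv', sep=r'\s+', header=None, names=column_names)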
auto



## Approach 1
Melt the table without keeping a unique row identifier.

In [None]:
LongFormWithoutEntities = pd.melt(auto, value_vars=column_names, var_name='Attribute', value_name='Value')
LongFormWithoutEntities

Every value in the original table has its own row in this table, so the long table has 398 * 9 = 3582 rows.

The above table does not have enough information to reconstitute a wide table.  We have no way to find out which attribute values belong to the same original row (unless we rely on row order).
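To make the point concrete, here is a minimal sketch on a hypothetical three-column, two-row table (the `toy` frame and its values are invented for illustration and are not part of the auto data). After melting every column, only the attribute name and the value remain, so nothing ties a value back to its original row.

```python
import pandas as pd

# Hypothetical two-row table standing in for the auto data.
toy = pd.DataFrame({
    "mpg": [18.0, 15.0],
    "cylinders": [8, 8],
    "car_name": ["car a", "car b"],
})

# Melt every column; no identifier column is kept.
long_no_id = pd.melt(toy, value_vars=list(toy.columns),
                     var_name="Attribute", value_name="Value")

# One row per cell of the original table, and only two columns:
# there is no column that says which original row a value came from.
print(long_no_id.shape)
print(list(long_no_id.columns))
```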

## Approach 2
Melt the table using `car_name` as our id.

In [None]:
LongFormByCarName = pd.melt(auto, id_vars=['car_name'], var_name='Attribute', value_name='Value')
LongFormByCarName

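Note that we have fewer rows in the long format than in Approach 1.  Because `car_name` is now an identifier column rather than a melted attribute, only the other 8 columns are melted, which gives 398 * 8 = 3184 rows.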
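The row-count arithmetic can be checked on a small example. This is a sketch on a hypothetical `toy` frame (invented values, not the auto data): with one id column, the long table should have one row per original row and non-id column.

```python
import pandas as pd

# Hypothetical three-row table; "car a" appears twice on purpose.
toy = pd.DataFrame({
    "mpg": [18.0, 15.0, 16.0],
    "cylinders": [8, 8, 6],
    "car_name": ["car a", "car b", "car a"],
})

# Keep car_name as the identifier and melt the remaining columns.
long_by_name = pd.melt(toy, id_vars=["car_name"],
                       var_name="Attribute", value_name="Value")

# Expected rows: number of original rows times number of non-id columns.
expected_rows = len(toy) * (toy.shape[1] - 1)
print(len(long_by_name), expected_rows)
```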

In [None]:
WideFormUniqueCarName = pd.pivot_table(LongFormByCarName, index="car_name", columns="Attribute", values="Value", aggfunc='max')
WideFormUniqueCarName[column_names[:-1]] # order the columns the way they were in the original table

In the last example, we lost some rows: the original table has 398 rows, but the pivoted table has only 305.  Presumably, the 93 missing rows were for car names that are already listed among the 305 car names above.  These rows likely represent the same model in a different model year.  The values in these replicated rows get aggregated into single values.  Our aggregation function was `max`, which might not be appropriate.  Also, the car names are now the row index, which may or may not be desirable.
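The collapsing of duplicate names can be seen on a small example. This is a sketch on a hypothetical `toy` frame (invented values, not the auto data) in which one car name appears twice; `pd.pivot_table` with `aggfunc='max'` then returns a single row for that name.

```python
import pandas as pd

# Hypothetical table in which "car a" appears in two rows.
toy = pd.DataFrame({
    "mpg": [18.0, 15.0, 16.0],
    "cylinders": [8, 8, 6],
    "car_name": ["car a", "car b", "car a"],
})

long_by_name = pd.melt(toy, id_vars=["car_name"],
                       var_name="Attribute", value_name="Value")

# Pivot back using car_name as the row key; duplicates are aggregated with max.
wide = pd.pivot_table(long_by_name, index="car_name", columns="Attribute",
                      values="Value", aggfunc="max")

# Three original rows become two, one per distinct car name.
print(len(toy), len(wide))
print(wide)
```

For "car a", each column holds the larger of the two original values, so the result row mixes values from different original rows and matches neither of them exactly.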

## Approach 3

In our next approach, we create a column whose value is unique for each table row.  The row index values are already unique, so we can copy them into this new column.

In [None]:
auto["index"] = auto.index
auto

In [None]:
LongFormWithUniqueID = pd.melt(auto, id_vars="index", var_name='Attribute', value_name='Value')
LongFormWithUniqueID

Every value in the original table has its own row in this long format table.  

In [None]:
WideAgain = pd.pivot_table(LongFormWithUniqueID, index="index", columns="Attribute", values="Value", aggfunc='max')
WideAgain[column_names] # order the columns the way they were in the original table

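The last table is (almost) exactly like the original table: it has the same rows and columns in the same order, although the row index is now named `index` and the column dtypes may differ from the original.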
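The round trip can be verified on a small example. This is a sketch on a hypothetical `toy` frame (invented values, not the auto data). It uses `DataFrame.pivot` instead of `pd.pivot_table`, a deliberate swap: because the id is unique, no aggregation function is needed.

```python
import pandas as pd

# Hypothetical table with a repeated car name and mixed column types.
toy = pd.DataFrame({
    "mpg": [18.0, 15.0, 16.0],
    "cylinders": [8, 8, 6],
    "car_name": ["car a", "car b", "car a"],
})
original_columns = list(toy.columns)

# Copy the row index into a column so each row has a unique id.
with_id = toy.copy()
with_id["index"] = with_id.index

long_form = pd.melt(with_id, id_vars="index",
                    var_name="Attribute", value_name="Value")

# Each (index, Attribute) pair occurs once, so pivot needs no aggregation.
wide_again = long_form.pivot(index="index", columns="Attribute", values="Value")
wide_again = wide_again[original_columns]  # restore the original column order

# Compare cell by cell; dtypes and axis names are not part of this check.
same_values = (wide_again.to_numpy() == toy.to_numpy()).all()
print(same_values)
print(wide_again.index.name)
```

The cell values survive the round trip, while the index name shows one of the ways the result differs from the original table.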